Ingestion workers
Five worker families, each producing typed signal rows: Apify for LinkedIn, Indeed, and jobs.ch job posts; GitHub GraphQL for dependency references; HN Algolia for Who is Hiring threads; Crunchbase for funding rounds; BuiltWith for stack enrichment.
CogniLead is signal-first by construction. The ingestion layer is the wedge: almost everything downstream converges with what other outbound tools already do, but the ingestion choices are what produce the high-intent leads in the first place.
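To make "typed signal rows" concrete, here is a minimal sketch of what one such row could look like. The `SignalSource` and `SignalRow` names, every field, and the `kind` label are assumptions made for illustration; the actual schema is not described on this page.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any


class SignalSource(str, Enum):
    """One member per worker family named above (member names are hypothetical)."""
    APIFY = "apify"
    GITHUB_GRAPHQL = "github_graphql"
    HN_ALGOLIA = "hn_algolia"
    CRUNCHBASE = "crunchbase"
    BUILTWITH = "builtwith"


@dataclass(frozen=True)
class SignalRow:
    """Hypothetical row shape: who it belongs to, where it came from, what was seen."""
    tenant_id: str
    source: SignalSource
    kind: str                  # assumed free-form label, e.g. "job_post"
    company_domain: str
    observed_at: datetime      # must be timezone-aware
    payload: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject naive timestamps so rows from different workers stay comparable.
        if self.observed_at.tzinfo is None:
            raise ValueError("observed_at must be timezone-aware")


# Example: a row a Who is Hiring worker might emit (all values are made up).
row = SignalRow(
    tenant_id="tenant-123",
    source=SignalSource.HN_ALGOLIA,
    kind="who_is_hiring_post",
    company_domain="example.com",
    observed_at=datetime(2024, 1, 2, tzinfo=timezone.utc),
    payload={"title": "Example Co is hiring"},
)
```

Keeping the source as an enum rather than a free string means a downstream scorer can branch on the worker family without string matching. This is one possible design, not necessarily the one CogniLead uses.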
Worker families
- Apify actors. LinkedIn job posts, Indeed listings, and jobs.ch listings. We pull on a per-tenant schedule and never resell scraped data. Apify cost is allocated per tenant in the unit economics; see /docs/api for the meter shape.
- GitHub GraphQL. Finds repositories that depend on a target package, optionally filtered to those with commits in the last N days. Useful for dev tools and developer-facing infrastructure.
- HN Algolia. The monthly "Who is Hiring" thread, plus Show HN and Ask HN posts for signal-rich technical context. Algolia is queried on a fixed schedule, not in real time.
- Crunchbase webhook. Funding round notifications. We do not pull the database — we subscribe to events, which is materially cheaper and respects Crunchbase ToS.
- BuiltWith. Tech-stack enrichment only. Never used as the primary signal — only to refine the fit score after another worker has flagged a company.
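The rule that BuiltWith only refines the fit score after another worker has flagged a company can be sketched as a small gate. This is an illustrative sketch under stated assumptions: `Company`, `apply_signal`, the source labels, and the additive scoring are all hypothetical and are not taken from the CogniLead codebase.

```python
from dataclasses import dataclass, field

# Hypothetical source labels: primary workers may flag a company,
# enrichment workers may only adjust a company that is already flagged.
PRIMARY_SOURCES = {"apify", "github_graphql", "hn_algolia", "crunchbase"}
ENRICHMENT_SOURCES = {"builtwith"}


@dataclass
class Company:
    domain: str
    fit_score: float = 0.0
    flagged_by: set[str] = field(default_factory=set)


def apply_signal(company: Company, source: str, score_delta: float) -> bool:
    """Apply a signal to a company; return True if it changed the fit score.

    Assumed rule: enrichment signals are ignored until at least one
    primary worker has flagged the company.
    """
    if source in PRIMARY_SOURCES:
        company.flagged_by.add(source)
        company.fit_score += score_delta
        return True
    if source in ENRICHMENT_SOURCES:
        if not company.flagged_by:
            return False  # enrichment is never the primary signal
        company.fit_score += score_delta
        return True
    raise ValueError(f"unknown source: {source}")


# Usage: stack data alone does nothing; after a funding signal it refines the score.
acme = Company(domain="example.com")
ignored = apply_signal(acme, "builtwith", 0.25)   # not yet flagged, so skipped
flagged = apply_signal(acme, "crunchbase", 0.5)   # primary signal flags the company
refined = apply_signal(acme, "builtwith", 0.25)   # now enrichment is applied
```

Dropping the early enrichment signal is the simplest choice for a sketch. A real pipeline might instead hold it back and replay it once a primary signal arrives.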
What we do not ingest
No private databases of email addresses. No purchased lists. No LinkedIn member scraping in violation of LinkedIn ToS — we read job posts (public artifacts) only. The wedge is finding high-intent behavior, not buying intent data.
Two free MCP tools surface this pipeline inside Cursor or Claude Desktop — no key required.