§03 · 6 min read · Last updated 2026-06-04

Ingestion workers

Five worker families, each producing typed signal rows: Apify for LinkedIn, Indeed, and jobs.ch job posts; GitHub GraphQL for dependency references; HN Algolia for "Who is Hiring" threads; Crunchbase for funding events; BuiltWith for stack enrichment.

CogniLead is signal-first by construction. The ingestion layer is the wedge: almost everything downstream resembles what other outbound tools already do, but the ingestion choices are what produce the high-intent leads in the first place.
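To make "typed signal rows" concrete, here is a minimal sketch of what one row might look like. The class name `SignalRow`, the `Source` enum, and every field name are assumptions made for illustration; the actual CogniLead schema is not described in this section.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Source(Enum):
    # One member per worker family named in this section.
    APIFY = "apify"
    GITHUB_GRAPHQL = "github_graphql"
    HN_ALGOLIA = "hn_algolia"
    CRUNCHBASE = "crunchbase"
    BUILTWITH = "builtwith"


@dataclass(frozen=True)
class SignalRow:
    # Hypothetical fields: the real row shape may differ.
    tenant_id: str            # ingestion runs on a per-tenant schedule
    source: Source            # which worker family produced the row
    kind: str                 # e.g. "job_post", "dependency", "funding_round"
    company_domain: str       # the company the signal is about
    observed_at: datetime     # when the worker saw the signal
    payload: dict = field(default_factory=dict)  # source-specific details


# Example row, as an HN Algolia worker might emit it.
row = SignalRow(
    tenant_id="tenant-123",
    source=Source.HN_ALGOLIA,
    kind="job_post",
    company_domain="example.com",
    observed_at=datetime(2026, 6, 1, tzinfo=timezone.utc),
    payload={"thread": "Who is Hiring"},
)
```

Typing `source` as an enum rather than a free-form string means a downstream consumer can tell primary signals from enrichment without parsing the payload.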

Worker families

  • Apify actors. LinkedIn job posts, Indeed listings, jobs.ch. We pull on a per-tenant schedule and never resell scraped data. Apify cost is allocated per-tenant in the unit economics — see /docs/api for the meter shape.
  • GitHub GraphQL. Finds repositories that depend on a target package, optionally filtered to those with commits in the last N days. Useful for dev-tools and developer-facing infra.
  • HN Algolia. The monthly "Who is Hiring" thread plus Show HN / Ask HN for signal-rich technical context. Algolia is queried on a fixed schedule, not in real time.
  • Crunchbase webhook. Funding round notifications. We do not pull the database — we subscribe to events, which is materially cheaper and respects Crunchbase ToS.
  • BuiltWith. Tech-stack enrichment only. Never used as the primary signal — only to refine the fit score after another worker has flagged a company.
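The rule that BuiltWith only refines a fit score after another worker has flagged a company can be sketched as a small gate. This is an illustrative sketch, not the CogniLead scoring code: the `Signal` type, the `weight` field, the additive scoring, and the `fit_score` function are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Worker families that can flag a company on their own (per the list above).
PRIMARY_SOURCES = {"apify", "github_graphql", "hn_algolia", "crunchbase"}
# Worker families that may only refine an existing flag.
ENRICHMENT_SOURCES = {"builtwith"}


@dataclass(frozen=True)
class Signal:
    company_domain: str
    source: str
    weight: float  # hypothetical contribution to the fit score


def fit_score(signals: Iterable[Signal]) -> Optional[float]:
    """Return a fit score, or None if no primary worker flagged the company."""
    signals = list(signals)
    primary = [s for s in signals if s.source in PRIMARY_SOURCES]
    if not primary:
        # Enrichment alone never creates a lead.
        return None
    base = sum(s.weight for s in primary)
    refinement = sum(s.weight for s in signals if s.source in ENRICHMENT_SOURCES)
    return base + refinement


# BuiltWith data on its own produces no score.
only_enrichment = fit_score([Signal("example.com", "builtwith", 0.5)])

# A funding event flags the company; BuiltWith then refines the score.
flagged = fit_score([
    Signal("example.com", "crunchbase", 1.0),
    Signal("example.com", "builtwith", 0.5),
])
```

Returning `None` rather than `0.0` for an unflagged company keeps "never flagged" distinct from "flagged with a low score".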

What we do not ingest

No private databases of email addresses. No purchased lists. No LinkedIn member scraping in violation of LinkedIn ToS — we read job posts (public artifacts) only. The wedge is finding high-intent behavior, not buying intent data.

Evaluate the runtime

Two free MCP tools surface this pipeline inside Cursor or Claude Desktop — no key required.
