14 min read · Published 2026-06-04 · Updated 2026-06-04

How to find companies actively building ISO 20022 / FHIR / HIPAA — and why public job posts beat LinkedIn lists

TL;DR

Most cold outbound starts from a roster — a 270M-contact database filtered by firmographics and tech-stack tags. The hit rate is what you would expect when you are message-bombing a generic list: 1-3% reply on a good day, with the reputation tax that comes with it. The alternative is to start from observed behavior — public artifacts that announce, at a specific point in time, that a target company is actively building in your domain and lacks the capacity to staff it from existing headcount. A job post asking for a Postgres + FHIR engineer is a higher-intent signal than "works at a healthcare SaaS." A GitHub repo with a fresh swissqrbill dependency announces a Swiss invoicing build in a way no firmographic row ever can.

This playbook is the dev-to-dev version of the CogniLead wedge — the signal-extraction discipline we run internally and recommend whichever tool you use. We walk through five public artifact classes (HN Who's Hiring, GitHub dependencies, official career pages, funding announcements, conference attendee lists), give working extraction code in Python and JavaScript, and document the verification step that drops signals that look like a fit but are not. It is dogfood content: the signal-first posture is also what we ourselves use to find customers.

The playbook is technical and assumes you are comfortable with REST APIs and a parser. If you are looking for the strategic framing, the ingestion section of the spec is the shorter read.

The premise: signal-first vs roster-first

Roster-first outbound asks "who is plausibly in our ICP?" and contacts everyone who fits. The implicit bet is that volume will find the few who happen to be ready. The bet has worked for a decade because deliverability has been the bottleneck, not targeting. As deliverability tightens — Gmail and Yahoo's 2024 sender rules, Apple Mail Privacy Protection, the slow death of the open-rate metric — the volume bet stops paying.

Signal-first outbound asks "who has just published a public artifact that announces they are in our ICP today?" The implicit bet is that observable behavior beats inferred firmographics. Public artifacts are higher-intent than rostered attributes because the company chose to publish them. A job post is a budget commitment in a way a firmographic tag is not. A GitHub dependency is a code commitment in a way "tech stack: Node.js" is not. A fundraise press release is an explicit budget signal that a tag cannot fake.

The dev-to-dev variant matters because the target population — engineers, technical founders, CTOs at small SaaS — is unusually responsive to outreach that demonstrates the sender actually read the artifact. The hit-rate flip from 2% on roster-first to 12-18% on signal-first is not magic; it is the recipient recognizing that you opened with something specific they published, not a Mad-Libs template with their company name substituted in.

Section 1 · HN Who's Hiring monthly threads

The Hacker News "Who is hiring?" monthly thread is the single highest-signal public artifact for technical hiring. Each thread surfaces 800-1,500 company comments per month, each following a roughly standard format: company name, location, remote policy, role, stack, contact. The signal-to-noise ratio is unusually good because HN comment culture penalizes recruiter-style copy.

The Algolia HN Search API exposes the threads programmatically. Each monthly thread is a root post whose children are the company comments. Fetch the root, then page through children:

# Python: fetch the most recent "Who's hiring?" thread + comments
import re
import httpx

UA = "cognilead-signal-bot/1.0 (+mailto:signals@your-domain.dev)"
ALGOLIA = "https://hn.algolia.com/api/v1"

def latest_hiring_thread() -> dict:
    r = httpx.get(
        f"{ALGOLIA}/search_by_date",
        params={
            "query": "Ask HN: Who is hiring?",
            "tags": "story,author_whoishiring",
            "hitsPerPage": 1,
        },
        headers={"user-agent": UA},
        timeout=15,
    )
    r.raise_for_status()
    return r.json()["hits"][0]

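def thread_comments(story_id: int) -> list[dict]:
    # Algolia returns every comment in the thread as a flat list, replies
    # included. Page through it and keep only top-level comments (the job
    # posts), i.e. those whose parent is the story itself.
    story_id = int(story_id)  # Algolia's objectID arrives as a string
    comments: list[dict] = []
    page = 0
    while True:
        r = httpx.get(
            f"{ALGOLIA}/search",
            params={
                "tags": f"comment,story_{story_id}",
                "hitsPerPage": 1000,
                "page": page,
            },
            headers={"user-agent": UA},
            timeout=30,
        )
        r.raise_for_status()
        data = r.json()
        comments += [c for c in data["hits"] if c.get("parent_id") == story_id]
        page += 1
        if page >= data.get("nbPages", 0):
            return comments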

def filter_by_stack(comments: list[dict], pattern: str) -> list[dict]:
    rx = re.compile(pattern, re.IGNORECASE)
    return [c for c in comments if rx.search(c.get("comment_text") or "")]

if __name__ == "__main__":
    thread = latest_hiring_thread()
    comments = thread_comments(thread["objectID"])
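    # FHIR / HL7 / HIPAA hits. "Epic" stays case-sensitive via (?-i:...) so
    # the adjective "epic" does not match under re.IGNORECASE.
    hits = filter_by_stack(comments, r"\b(FHIR|HL7|HIPAA|EHR|(?-i:Epic))\b")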
    for c in hits[:5]:
        print(c["author"], "—", c["story_id"], "—", c["comment_text"][:200])

Filter regexes are doing the load-bearing work. A target stack like ISO 20022, FHIR, or HIPAA encodes well as a literal token with word boundaries; a more general signal like "hiring a senior Postgres engineer" needs a looser pattern (seniority word near the technology name) and a post-filter to drop generalist roles.
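
One way to encode both cases, as a minimal sketch: literal standards tokens get word-boundary patterns, and the broader Postgres pattern is paired with a post-filter. The names clean_comment, stack_tokens, and is_specific_postgres_role, and the generalist keyword list, are illustrative assumptions rather than part of any published pipeline. The sketch also assumes comment_text arrives as HTML (paragraph tags and entities), so it strips tags and unescapes entities before matching.

```python
import html
import re

# Literal standards tokens: word boundaries stop matches inside longer words.
STACK_RX = re.compile(r"\b(ISO[ -]?20022|FHIR|HL7|HIPAA)\b", re.IGNORECASE)
# Broader role signal: a seniority word followed closely by Postgres.
POSTGRES_RX = re.compile(
    r"\b(senior|staff|lead)\b[^.|]{0,40}\bpostgres(ql)?\b", re.IGNORECASE
)
# Hypothetical generalist markers used as the post-filter.
GENERALIST_RX = re.compile(r"\b(full[- ]?stack|generalist)\b", re.IGNORECASE)
TAG_RX = re.compile(r"<[^>]+>")


def clean_comment(raw: str) -> str:
    """Replace HTML tags with spaces, then decode entities such as &amp;."""
    return html.unescape(TAG_RX.sub(" ", raw))


def stack_tokens(text: str) -> list[str]:
    """Return the distinct standards tokens found, upper-cased and sorted."""
    return sorted({m.group(1).upper() for m in STACK_RX.finditer(text)})


def is_specific_postgres_role(text: str) -> bool:
    """True when a senior Postgres role is named and no generalist marker appears."""
    return bool(POSTGRES_RX.search(text)) and not GENERALIST_RX.search(text)
```

Run clean_comment first, then apply both checks to the cleaned text; a comment that passes neither is not a signal for this stack.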

Why HN beats LinkedIn job descriptions for dev-targeted outbound: HN comments are written by the engineer who will work with the hire, not by an HR copy team. The language is specific about the stack, the problem, and the bottleneck. A LinkedIn listing is optimized for recruiter SEO; an HN comment is optimized to attract the actual engineer the company wants. That asymmetry is the signal.

Section 2 · GitHub repos with revealing dependencies

A package dependency is a code commitment that announces a company's build intent in a way no firmographic row can. The company chose to add swissqrbill to package.json — they are building Swiss QR-bill invoicing right now. The company chose to add iso-20022-xml to requirements.txt — they are wiring payment infrastructure. These are observable build signals.

The GitHub Code Search API returns repos where a given filename contains a given string. Combine with the Repos API to surface owner / pushed_at / language to filter for active, company-affiliated repos:

// JavaScript: find recent repos depending on swissqrbill
// Requires a GitHub PAT in GITHUB_TOKEN — anonymous quota is too thin.
import { setTimeout as sleep } from 'node:timers/promises'

const GH = 'https://api.github.com'
const UA = 'cognilead-signal-bot/1.0 (+mailto:signals@your-domain.dev)'
const TOKEN = process.env.GITHUB_TOKEN
const HEADERS = {
  'user-agent': UA,
  accept: 'application/vnd.github+json',
  authorization: `Bearer ${TOKEN}`,
  'x-github-api-version': '2022-11-28',
}

async function codeSearch(query, page = 1, retriesLeft = 1) {
  // q syntax: filename:package.json swissqrbill in:file
  const url = `${GH}/search/code?q=${encodeURIComponent(query)}&per_page=30&page=${page}`
  const r = await fetch(url, { headers: HEADERS })
  if ((r.status === 403 || r.status === 429) && retriesLeft > 0) {
    // rate limited: back off, then retry once (retriesLeft bounds the recursion)
    const wait = Number(r.headers.get('retry-after') ?? 60)
    await sleep(wait * 1000)
    return codeSearch(query, page, retriesLeft - 1)
  }
  if (!r.ok) throw new Error(`gh search ${r.status}: ${await r.text()}`)
  return r.json()
}

async function fetchPackageJson(repoFullName, path) {
  const url = `${GH}/repos/${repoFullName}/contents/${path}`
  const r = await fetch(url, { headers: HEADERS })
  if (!r.ok) return null
  const json = await r.json()
  // Contents API returns base64 content for blobs under 1MB.
  try {
    return JSON.parse(Buffer.from(json.content, 'base64').toString('utf8'))
  } catch {
    return null // missing content or malformed JSON: skip this repo
  }
}

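// Code search results typically carry a trimmed repository object, so fetch
// the full record from the Repos API for homepage / pushed_at / license.
async function fetchRepo(repoFullName) {
  const r = await fetch(`${GH}/repos/${repoFullName}`, { headers: HEADERS })
  return r.ok ? r.json() : null
}

const hits = await codeSearch('filename:package.json swissqrbill in:file')
for (const item of hits.items) {
  const pkg = await fetchPackageJson(item.repository.full_name, item.path)
  if (!pkg) continue
  const dep = pkg.dependencies?.swissqrbill ?? pkg.devDependencies?.swissqrbill
  if (!dep) continue
  const repo = await fetchRepo(item.repository.full_name)
  console.log({
    company_domain: repo?.homepage || null,
    repo: item.repository.full_name,
    package_version: dep,
    pushed_at: repo?.pushed_at ?? null,
    license: repo?.license?.spdx_id ?? null,
  })
  await sleep(500) // be polite: the secondary limit forgives slow callers
}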

Practical targets we mine for dev-to-dev outbound include:

  • Healthcare: fhir.js, @types/fhir, hl7-fhir, fhirclient (Python).
  • Swiss / EU payments: swissqrbill, iso-20022-xml, sepa-payment-initiation.
  • Compliance plumbing: node-cron + audit-log packages, json-schema-traverse alongside RLS migrations.
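
Once a manifest has been fetched, checking it against a watchlist is a few lines. The following is a minimal Python sketch, assuming you already hold the manifest text; TARGETS, npm_hits, and pip_hits are hypothetical names, and the requirements.txt parsing is deliberately naive (it reads the leading package name on each line and ignores includes, URLs, and extras).

```python
import json
import re

# Watchlist built from the bullets above; extend it for your own vertical.
TARGETS = {
    "swissqrbill", "iso-20022-xml", "sepa-payment-initiation",
    "fhir.js", "@types/fhir", "hl7-fhir", "fhirclient",
}

# Leading package name on a requirements.txt line, before any version specifier.
REQ_NAME_RX = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def npm_hits(package_json_text: str) -> dict[str, str]:
    """Return {package: version_range} for watched packages in a package.json."""
    try:
        pkg = json.loads(package_json_text)
    except json.JSONDecodeError:
        return {}
    if not isinstance(pkg, dict):
        return {}
    found: dict[str, str] = {}
    for section in ("dependencies", "devDependencies"):
        for name, version in (pkg.get(section) or {}).items():
            if name in TARGETS:
                found.setdefault(name, version)
    return found


def pip_hits(requirements_text: str) -> set[str]:
    """Return watched package names that appear in a requirements.txt."""
    found: set[str] = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        match = REQ_NAME_RX.match(line)
        if match and match.group(1).lower() in TARGETS:
            found.add(match.group(1).lower())
    return found
```

A non-empty result is the raw signal; the version range is worth keeping because a freshly bumped version is evidence the dependency is still in active use.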

Section 3 · Official career pages

Career pages on company-owned domains are a higher-trust source than aggregators because the company controls the copy. The information density per post is lower than HN Who's Hiring, but the volume is dramatically higher and the staleness is bounded by the company's own posting cadence.

Ethical scraping rules apply unmodified:

  • Fetch /robots.txt first. If the careers path is disallowed, skip the domain. The marginal cost of obeying robots is approximately zero; the reputational cost of being a publicly known robots-violator is high.
  • Rate-limit per origin to at most one request every 2-3 seconds. The cap is so far below what any career page sees on a busy launch day that you will never be a problem; respecting it keeps you out of WAF rules.
  • Cache fetched pages for at least 24 hours. Career pages move on a multi-day cadence at most; re-fetching every hour adds load without adding signal.
  • Identify the bot with a User-Agent that includes a contact email. When a webmaster wants to ask you something, give them a way to ask.
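
The four rules above fit in one small wrapper. This is a sketch under stated assumptions, not a production crawler: PoliteFetcher is a hypothetical class, the fetch callable is injected (for example a thin wrapper around an HTTP client) and is assumed to return an empty string when robots.txt is missing, the interval is fixed at 3 seconds, and the cache lives in memory only. The robots.txt check itself uses the standard library's urllib.robotparser.

```python
import time
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

UA = "cognilead-signal-bot/1.0 (+mailto:signals@your-domain.dev)"


class PoliteFetcher:
    def __init__(self, fetch, min_interval=3.0, cache_ttl=24 * 3600,
                 clock=time.monotonic, sleep=time.sleep):
        self.fetch = fetch            # callable(url, user_agent) -> str
        self.min_interval = min_interval
        self.cache_ttl = cache_ttl
        self.clock = clock
        self.sleep = sleep
        self._robots = {}             # origin -> RobotFileParser
        self._last_hit = {}           # origin -> clock() of the last request
        self._cache = {}              # url -> (fetched_at, body)

    def _wait_turn(self, origin):
        """Sleep until min_interval has passed since the last hit on this origin."""
        last = self._last_hit.get(origin)
        if last is not None:
            remaining = self.min_interval - (self.clock() - last)
            if remaining > 0:
                self.sleep(remaining)
        self._last_hit[origin] = self.clock()

    def _robots_for(self, origin):
        """Fetch and parse robots.txt once per origin."""
        if origin not in self._robots:
            self._wait_turn(origin)
            parser = RobotFileParser()
            parser.parse(self.fetch(f"{origin}/robots.txt", UA).splitlines())
            self._robots[origin] = parser
        return self._robots[origin]

    def get(self, url):
        """Return the page body, or None when robots.txt disallows the path."""
        cached = self._cache.get(url)
        if cached and self.clock() - cached[0] < self.cache_ttl:
            return cached[1]
        parts = urlsplit(url)
        origin = f"{parts.scheme}://{parts.netloc}"
        if not self._robots_for(origin).can_fetch(UA, url):
            return None
        self._wait_turn(origin)
        body = self.fetch(url, UA)
        self._cache[url] = (self.clock(), body)
        return body
```

Injecting the clock and sleep functions keeps the wrapper testable without real waiting; in production the defaults apply.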

The tells that turn a career-page post into a usable signal:

  • "Looking for an engineer with HIPAA experience" — explicit compliance build commitment. High-intent for HIPAA tooling sellers.
  • "You will own our Stripe integration end to end" — payments infrastructure scope. High-intent for payments orchestration tooling.
  • "Helping us migrate from MongoDB to Postgres" — explicit transition. High-intent for DB migration / Postgres ops tooling.
  • "SOC 2 readiness is a current priority" — compliance audit timeline. High-intent for compliance automation tooling.

Generic copy ("full-stack engineer, modern stack, ping-pong table") is signal-negative — drop the row. The verification step in section 6 is where the drop happens.
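
The tells can be expressed as a small pattern table. The sketch below is illustrative only: the regexes are hand-written approximations of the four example phrases, and TELLS and classify_post are hypothetical names. Real career-page copy varies more than this, so treat the patterns as a starting point.

```python
import re

# (pattern, what the copy commits to, tooling category it is high-intent for)
TELLS = [
    (re.compile(r"\bHIPAA\b", re.I),
     "compliance build", "HIPAA tooling"),
    (re.compile(r"\bown\b.{0,40}\bStripe integration\b", re.I),
     "payments infrastructure scope", "payments orchestration"),
    (re.compile(r"\bmigrat\w*\b.{0,60}\bto Postgres", re.I),
     "database transition", "DB migration / Postgres ops"),
    (re.compile(r"\bSOC ?2\b", re.I),
     "compliance audit timeline", "compliance automation"),
]


def classify_post(text: str) -> list[dict]:
    """Return the tells found in a post; an empty list means drop the row."""
    return [
        {"tell": tell, "sells_to": category}
        for rx, tell, category in TELLS
        if rx.search(text)
    ]
```

An empty result corresponds to the signal-negative case: the row carries no specific build commitment and should not reach a send.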

Section 4 · Funding announcements

A newly-funded company has, by definition, fresh budget and a runway-defined pressure to deploy it. The signal is loud but broad: not every freshly-funded company is in your ICP, so filtering is everything.

Public sources for funding data include Crunchbase (paid for good coverage, partial public feed), press releases on company-owned blogs, and the dedicated venture trackers (Sifted for EU/UK, PitchBook for paid US coverage). The signal-quality hierarchy is roughly: company-owned announcement > named venture-firm press release > aggregator listing.

Practical filter dimensions:

  • Round size and stage. Series A is usually the sweet spot for technical-tooling outbound — Seed companies have not built the engineering org yet, Series C and later have settled on incumbents.
  • Vertical match. If you sell HIPAA tooling, a HealthTech Series A is in scope; a fintech Series A is probably not. Use the announcement's own framing ("builds an EHR for behavioral health") rather than firmographic tags.
  • Geo match. CogniLead's EU-residency targeting means EU-headquartered fundraises map cleanly to our jurisdiction-routed sends. Map to your own residency posture.
  • Time window. Funding signals decay fast — the first 30-60 days after announcement is when the team is actively hiring and buying. After 90 days you are competing with everyone else who waited.

We do not recommend buying a bulk Crunchbase export and dumping it into outbound — that converts the funding signal back into a roster and gives up the wedge. Curate small, respect the time window, and pair the funding fact with one other artifact (career-page post, GitHub repo) before contacting.
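
The filter dimensions and the pairing rule can be combined into one check that reports why a funding signal was dropped. This is a sketch with assumed defaults: FundingSignal and funding_drop_reasons are hypothetical names, and the stage set, vertical terms, country list, and 60-day window are placeholders to replace with your own ICP.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FundingSignal:
    company: str
    stage: str                  # e.g. "Seed", "Series A"
    description: str            # the announcement's own framing
    hq_country: str             # ISO 3166-1 alpha-2 code
    announced_on: date
    second_artifact: Optional[str] = None  # e.g. "career_page", "github_dep"


def funding_drop_reasons(
    sig: FundingSignal,
    today: date,
    stages=frozenset({"Series A"}),
    vertical_terms=("ehr", "fhir", "hipaa"),
    countries=frozenset({"CH", "DE", "FR"}),
    max_age_days: int = 60,
) -> list[str]:
    """Return the failed checks in a fixed order; an empty list means in scope."""
    reasons = []
    if sig.stage not in stages:
        reasons.append("stage")
    text = sig.description.lower()
    if not any(term in text for term in vertical_terms):
        reasons.append("vertical")
    if sig.hq_country not in countries:
        reasons.append("geo")
    age_days = (today - sig.announced_on).days
    if age_days < 0 or age_days > max_age_days:
        reasons.append("time_window")
    if sig.second_artifact is None:
        reasons.append("uncorroborated")  # funding fact alone is not enough
    return reasons
```

Returning reasons instead of a bare boolean makes the drops auditable, which helps when you tune the window or the vertical terms later.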

Section 5 · Public conference attendee lists

Some conferences publish attendee lists, exhibitor lists, or speaker lists. Examples include HIMSS (healthcare), Money 20/20 (fintech), Slush (European startups), Web Summit, and Open Source Summit. The signal is "company sent a human to a vertical-specific event", a budget and prioritization marker that maps cleanly to vertical-tooling outbound.

Extraction is standard scraping with the polite-rate caveats above. The GDPR boundary matters here in a way it does not for the other artifact classes:

  • Personal names from attendee badges were collected by the conference under a privacy notice that did not contemplate downstream commercial reuse. The conference is the controller; you are not their processor. Drop personal names. Keep only the corporate fact ("Company X had a stand at Money 20/20 2026").
  • Speaker lists are typically published for self-promotion and the speaker has effectively consented to public association — corporate fact plus speaker name is fine, but the contact step is to find the corporate generic address, not to email the speaker's personal address from the badge.
  • Exhibitor lists are corporate facts in public — fully fair game. The lead is the company; the contact is found through your normal corporate-channel enrichment.
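
In code, the boundary above amounts to a sanitizing step applied before anything is stored. The sketch below assumes scraped rows are dicts with role, company, event, person_name, and email keys; that row shape, ConferenceFact, and to_corporate_fact are all hypothetical. It illustrates the data-minimization rule and is not a compliance guarantee.

```python
from dataclasses import dataclass
from typing import Optional

ROLES = {"attendee", "speaker", "exhibitor"}


@dataclass(frozen=True)
class ConferenceFact:
    company: str
    event: str
    role: str                          # "attendee", "speaker", or "exhibitor"
    speaker_name: Optional[str] = None


def to_corporate_fact(row: dict) -> Optional[ConferenceFact]:
    """Reduce a scraped row to the corporate fact; None if no company is named."""
    role = row.get("role")
    company = (row.get("company") or "").strip()
    if role not in ROLES or not company:
        return None
    # Only speakers keep a name. Email and badge fields are never copied over.
    name = row.get("person_name") if role == "speaker" else None
    return ConferenceFact(company, row.get("event", ""), role, name)
```

Because the output type has no email field at all, a personal address cannot leak downstream by accident; contact lookup happens later through corporate channels.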

Section 6 · The verification step

Signal extraction without a verification step is just lower-precision rostering. The drop rate on signals that look like a fit but turn out not to be is structurally 20-35%, by design. We log the verification outcome on every signal and drop the ones that fail.

The CogniLead pipeline exposes this as a technical_hook_verified boolean on the lead row, decided by a stage that combines:

  • Re-fetch the artifact. If the GitHub repo no longer references the dependency at HEAD, the signal is stale — drop.
  • Cross-corroboration. Does at least one other public artifact mention the same build? A repo dependency plus a career-page post for the same stack is two-signal corroboration; a single signal in isolation is one-signal. We require two-signal for high-confidence sends.
  • Recency. Signals older than 90 days are downgraded to background context and do not by themselves trigger a send. The signal pollers re-poll on a cadence so a stale signal eventually drops out.
  • Negative-signal filter. If the company announced sunsetting the relevant product line, or laid off the relevant team, the signal is negative and the lead is dropped to a 12-month cooldown.
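
A sketch of how the four checks might combine into the technical_hook_verified boolean. The Signal shape, the verify function, the outcome labels, and the order of the checks are assumptions made for illustration, as is treating a single uncorroborated signal as not verified and approximating the 12-month cooldown as 365 days; the source only states that the stage combines these checks.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Signal:
    source_class: str
    observed_at: datetime
    still_present: bool                        # result of re-fetching the artifact
    corroborating_classes: tuple[str, ...] = ()
    negative_signal: bool = False              # sunset product line, laid-off team


def verify(sig: Signal, now: datetime, max_age_days: int = 90) -> dict:
    """Return the verification outcome for one signal."""
    if sig.negative_signal:
        return {
            "technical_hook_verified": False,
            "outcome": "negative_signal",
            "cooldown_until": now + timedelta(days=365),
        }
    if not sig.still_present:
        return {"technical_hook_verified": False, "outcome": "stale_artifact"}
    if now - sig.observed_at > timedelta(days=max_age_days):
        return {"technical_hook_verified": False, "outcome": "background_context"}
    # Corroboration must come from a different artifact class.
    others = set(sig.corroborating_classes) - {sig.source_class}
    if not others:
        return {"technical_hook_verified": False, "outcome": "single_signal"}
    return {"technical_hook_verified": True, "outcome": "verified"}
```

Logging the outcome label alongside the boolean is what lets you measure the drop rate per check later.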

The reason this matters: a lead that fails verification is a lead that produces a low-quality send if you let it through, which costs you reputation, which costs you the next ten legitimate sends. The 20-35% drop rate is what the signal-first wedge buys — a smaller, higher-precision pipeline at comparable absolute reply volume to a 4× larger roster send.
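
A back-of-envelope check of that trade, using only rates quoted in this playbook: a 20-35% verification drop, a 12-18% signal-first reply rate, and a 2% roster-first reply rate. The starting count of 1,000 raw signals is hypothetical, and reading "4× larger" as four times the raw signal count is an assumption.

```python
def expected_replies(candidates: int, drop_rate: float, reply_rate: float) -> float:
    """Sends that survive verification, multiplied by the reply rate."""
    return candidates * (1 - drop_rate) * reply_rate


signals = 1_000  # hypothetical number of raw signals

# Pessimistic and optimistic ends of the quoted signal-first ranges.
signal_low = expected_replies(signals, drop_rate=0.35, reply_rate=0.12)
signal_high = expected_replies(signals, drop_rate=0.20, reply_rate=0.18)

# Roster send four times the size, no verification step, 2% reply rate.
roster = expected_replies(4 * signals, drop_rate=0.0, reply_rate=0.02)

print(round(signal_low), round(signal_high), round(roster))
```

Under those assumptions the roster send lands at about 80 replies from 4,000 sends, against roughly 78 to 144 replies from 650 to 800 verified sends.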

Section 7 · Signal feature schema

Once verification is in place the next discipline is learning from outcomes. Every signal that produces a sent message produces, eventually, a reply or a non-reply within the 14-day horizon. Persist the signal features alongside the outcome and a future model can learn which signal shapes convert.
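
Labeling the outcome is the first step. A minimal sketch of the 14-day horizon rule, assuming timezone-aware timestamps; label_outcome and the "reply" / "no_reply" labels are hypothetical names, not the stored column values.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

HORIZON = timedelta(days=14)


def label_outcome(
    sent_at: datetime, replied_at: Optional[datetime], now: datetime
) -> Optional[str]:
    """Return "reply", "no_reply", or None while the horizon is still open."""
    if replied_at is not None and replied_at - sent_at <= HORIZON:
        return "reply"
    if now - sent_at >= HORIZON:
        return "no_reply"  # includes replies that arrived after the horizon
    return None
```

Returning None for open horizons keeps unfinished sends out of the training data instead of counting them as non-replies too early.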

CogniLead persists this in the signal_outcomes table — schema in the data-model section of the spec. The relevant column is signal_features, a JSON blob with the per-signal feature set. A workable schema:

// signal_outcomes.signal_features — JSON schema sketch
{
  // What artifact produced the signal
  "source_class": "hn_hiring" | "github_dep" | "career_page" | "fundraise" | "conference",
  "source_url": "https://...",
  "observed_at": "2026-06-04T12:00:00Z",

  // Recency / freshness
  "artifact_age_days": 7,
  "is_fresh_within_30d": true,

  // Strength
  "specificity_score": 0.82,     // 0-1, regex/NLP-derived
  "explicit_role_match": true,   // job title or repo file path matches
  "second_signal_present": true, // cross-corroborated
  "second_signal_class": "career_page",

  // Vertical / stack hits — denormalised for analytics
  "stack_hits": ["fhir.js", "@types/fhir", "node-cron"],
  "vertical_tags": ["healthcare", "fhir", "ehr"],

  // Verification
  "hook_verified_at": "2026-06-04T12:15:00Z",
  "verification_method": "re_fetch_artifact",
  "negative_signal_check": "clean",

  // Geo / jurisdiction — load-bearing for residency routing
  "company_country": "CH",
  "company_jurisdiction": "CH"
}

Two design notes. First, denormalise the things you will want to analyse — keeping stack_hits as a flat array makes the eventual learn-from-replies query fast. Second, keep the feature set narrow and stable; adding features later is fine, renaming them six months in is painful when you want to look at year-over-year reply rates.
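
To illustrate the first note, here is what a learn-from-replies aggregation over the flat stack_hits array could look like in Python. The row shape (a signal_features dict plus an outcome string) is an assumption for this sketch; the real table layout is defined in the spec, and reply_rate_by_stack_hit is a hypothetical helper.

```python
from collections import defaultdict


def reply_rate_by_stack_hit(rows: list[dict]) -> dict[str, float]:
    """Share of sends that got a reply, grouped by each stack hit."""
    sent = defaultdict(int)
    replied = defaultdict(int)
    for row in rows:
        # set() so a package listed twice in one signal is counted once
        for hit in set(row["signal_features"].get("stack_hits", [])):
            sent[hit] += 1
            if row["outcome"] == "reply":
                replied[hit] += 1
    return {hit: replied[hit] / sent[hit] for hit in sorted(sent)}
```

Because stack_hits is a flat array of stable names, the grouping needs no joins or parsing; renamed features would split one package's history across two keys, which is the pain the second note warns about.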

Next step

The signal-extraction discipline is the wedge whether you build it yourself or use CogniLead. If you build it yourself, the five artifact classes above, the verification step, and the signal-feature schema together are the minimum viable surface. If you use CogniLead, the same shape is available via the MCP server and the REST API; see /mcp for the server tools and /docs/api for the contract.

For the compliance wrapper around all of this — the LIA, the DPA inquiry response motion, the evidence pack — see the companion playbooks GDPR cold email LIA 2026 and DPA inquiry response in 30 days.

Not legal advice

This playbook is published by CogniLead for orientation. It is not legal advice and should not replace counsel from a Data Protection Officer or qualified lawyer. The applicable rules depend on your jurisdiction, your data subjects, and the specific facts of the processing.