pano-add + top-1000 watchlist ingest — 2026-04-19 · Memos

Loader: scripts/seed_pano_adds_20260419.py (reuses the primitives from cb_eco_seed.py).

Inputs (Crunchbase exports, pinned paths):

/Users/bradleygallaher/Downloads/add-pano-4-19-2026.csv — 15 curated healthcare / semiconductor / fintech adds.
/Users/bradleygallaher/Downloads/companies-4-19-2026.csv — 999 top-rank, heat/growth-scored watchlist entries.

Every row lands as an organization entity, is deduped against existing KG orgs by lowercase name, and is linked to env 40 (Company-Deep Similar watchlist). New rows are embedded via the local llama.cpp Jina-v5 server on :8081.

Results

add-pano-4-19-2026.csv   (pano_add)
  parsed=15   new=9   already-exist=6
  upserted=9  env-linked=15  embedded=9

companies-4-19-2026.csv   (top1000)
  parsed=999  new=466  already-exist=533
  upserted=466  env-linked=979  embedded=466

TOTAL
  parsed=1014  new=475  existing-linked=539
  upserted=475  env-linked=994  embedded=475
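
The TOTAL row can be cross-checked against the two per-file rows. The following is a minimal sketch of that arithmetic; `Tally` is a hypothetical container used only for illustration and is not part of the loader. The env-linked counts are summed exactly as reported and are not assumed to equal the parsed counts.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Tally:
    """Hypothetical per-file counters, mirroring the fields in the results block."""
    parsed: int
    new: int
    existing: int
    upserted: int
    env_linked: int
    embedded: int

    def __add__(self, other):
        # Field-wise sum of two tallies.
        return Tally(*(a + b for a, b in zip(astuple(self), astuple(other))))


pano_add = Tally(parsed=15, new=9, existing=6, upserted=9, env_linked=15, embedded=9)
top1000 = Tally(parsed=999, new=466, existing=533, upserted=466, env_linked=979, embedded=466)
total = pano_add + top1000

for t in (pano_add, top1000, total):
    # Every parsed row is either new or already existing.
    assert t.new + t.existing == t.parsed
    # Only new rows are upserted and embedded.
    assert t.upserted == t.new == t.embedded

print(total)
```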

env 40 total linked entities grew from ~50 (Similar-50 baseline) to 1042 after the ingest.

One transient Supabase read timeout occurred during the embed-persist phase for Granite Ridge Resources (#1586777). A post-hoc check confirmed that the embedding did land, so final embed coverage is 475/475.

Source tags (for downstream filtering)

Every new row carries a metadata_json.source key so later analyses can partition env 40 cleanly:

pano_add_20260419 — the 15-row curated adds.
pano_top1000_20260419 — the 999-row top-rank watchlist.

Rows under both tags also carry the original Crunchbase URL, slug, CB rank, industries, and HQ location, plus stage (for the adds) or heat/growth score tier (for the top-1000).
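
As an illustration of the downstream filtering these tags allow, here is a minimal sketch that groups already-fetched entity rows by their source tag. The row shape (dicts with a `metadata_json` field that may be a dict or a JSON string) and the sample rows are assumptions for the example, not the actual schema.

```python
import json
from collections import defaultdict


def partition_by_source(rows):
    """Group rows by metadata_json.source; rows with no source tag go under None."""
    groups = defaultdict(list)
    for row in rows:
        meta = row.get("metadata_json") or {}
        if isinstance(meta, str):
            # Assumed: the column may come back as serialized JSON.
            meta = json.loads(meta)
        groups[meta.get("source")].append(row)
    return dict(groups)


# Hypothetical sample rows, not real ingest data.
rows = [
    {"id": 1, "metadata_json": {"source": "pano_add_20260419"}},
    {"id": 2, "metadata_json": '{"source": "pano_top1000_20260419"}'},
    {"id": 3, "metadata_json": None},  # e.g. a pre-existing org that was only env-linked
]
groups = partition_by_source(rows)
print({source: [r["id"] for r in members] for source, members in groups.items()})
```

Pre-existing orgs matched by dedup were left untouched, so they are not expected to carry either tag.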

Dedup behavior

The loader dedupes by lowercase name against every existing organization entity. When a CSV row matches an existing entity, the loader:

Leaves entities + entity_details untouched (no overwrite).
Adds the env-40 link (idempotent upsert on environment_entity_links).
Skips the embed step.

This preserves prior enrichment scores on pre-existing orgs (Anthropic, Meta, Stripe, etc.).
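
The three rules above can be sketched with in-memory stand-ins for the tables. This is an illustration under stated assumptions, not the loader's code: `plan_ingest`, the dict keyed by lowercase name, the set of `(env_id, entity_id)` pairs, and the id allocation are all hypothetical.

```python
def plan_ingest(csv_names, existing_ids, env_links, env_id=40, next_id=1):
    """Apply the dedup rules to a list of CSV names.

    existing_ids: dict mapping lowercase org name -> entity id (mutated for new rows).
    env_links:    set of (env_id, entity_id) pairs (mutated in place).
    Returns the ids of newly created entities, i.e. the ones that still need embedding.
    """
    to_embed = []
    for name in csv_names:
        key = name.lower()  # dedup key: lowercase name
        if key in existing_ids:
            # Match: leave the entity untouched and skip the embed step.
            entity_id = existing_ids[key]
        else:
            # No match: create the entity (hypothetical id allocation) and queue it.
            entity_id = next_id
            next_id += 1
            existing_ids[key] = entity_id
            to_embed.append(entity_id)
        # Adding to a set is idempotent, like the upsert on environment_entity_links.
        env_links.add((env_id, entity_id))
    return to_embed


existing = {"stripe": 101}  # hypothetical pre-existing org
links = set()
to_embed = plan_ingest(["Stripe", "Abridge", "STRIPE"], existing, links, next_id=500)
print(to_embed, sorted(links))
```

Re-running the same input after the first pass creates nothing new and adds no links, which is the property that makes the loader safe to re-run.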

Infra used

Inference: not used for ingest.
Embedding: llama-server --embedding on :8081 with Jina-v5 small retrieval Q8 (1024-dim).

Both the inference and embedding servers were already running from the pass documented in enrichment_run_20260419.md earlier in the day.
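
For reference, a request to the embedding server might look like the sketch below. It assumes the loader talks to llama-server's OpenAI-compatible `/v1/embeddings` route on localhost; the memo only states the port, so the host, route, helper names, and timeout are assumptions. The 1024-dimension check reflects the model noted above.

```python
import json
import urllib.request

# Assumed host and route; the memo only specifies port 8081.
EMBED_URL = "http://127.0.0.1:8081/v1/embeddings"


def build_embed_request(text, url=EMBED_URL):
    """Build a POST request carrying one input string as JSON."""
    payload = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embedding(body, expected_dim=1024):
    """Extract the first vector from an OpenAI-style embeddings response."""
    vector = body["data"][0]["embedding"]
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} dims, got {len(vector)}")
    return vector


def embed(text, timeout=30):
    """Send the request and return the vector (requires the server to be running)."""
    with urllib.request.urlopen(build_embed_request(text), timeout=timeout) as resp:
        return parse_embedding(json.load(resp))
```

Validating the dimension at parse time catches a wrong model being loaded on the port before any vectors are persisted.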

Follow-up

The 475 new entities currently sit at the 0.20–0.30 baseline enrichment_score. Running enrich_cli.py entity <id> against the top-CB-rank entries would lift them the same way the 2026-04-19 enrichment pass lifted the 15 canonical companies (mean score 0.839). Per the API-rate-limit rule, scope that as a follow-up rather than running another 475-entity enrichment in this session.
Consider promoting the highest-signal adds (Abridge, Waystar, Hippocratic AI, Tennr, Assort Health) into the first-class frontend/src/lib/company-library.ts so they get the full /companies/[slug] deep view.
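
To scope the enrichment follow-up, one option is to rank the new entities by CB rank and generate a bounded batch of enrich_cli.py invocations. The sketch below only builds the command lists and does not run them. The `id` and `cb_rank` field names, the `python` launcher, the batch limit, and the assumption that a lower CB rank means higher priority are all illustrative.

```python
def enrichment_commands(entities, limit):
    """Return enrich_cli.py argv lists for the `limit` best-ranked entities.

    Assumes each entity is a dict with hypothetical `id` and `cb_rank` keys,
    and that a numerically lower CB rank means a higher-priority company.
    """
    ranked = sorted(entities, key=lambda e: e["cb_rank"])
    return [["python", "enrich_cli.py", "entity", str(e["id"])] for e in ranked[:limit]]


# Hypothetical sample entities, not real ingest data.
sample = [
    {"id": 9001, "cb_rank": 310},
    {"id": 9002, "cb_rank": 12},
    {"id": 9003, "cb_rank": 87},
]
for argv in enrichment_commands(sample, limit=2):
    print(" ".join(argv))
```

Keeping `limit` small is what lets a batch stay within the API-rate-limit rule.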
