Loader: scripts/seed_pano_adds_20260419.py (reuses the primitives
from cb_eco_seed.py).
Inputs (Crunchbase exports, pinned paths):
- /Users/bradleygallaher/Downloads/add-pano-4-19-2026.csv: 15 curated healthcare / semiconductor / fintech adds.
- /Users/bradleygallaher/Downloads/companies-4-19-2026.csv: 999 top-rank, heat/growth-scored watchlist entries.
All rows land as organization entities, dedupe against existing
KG orgs by lowercase name, and get linked to env 40
(Company-Deep Similar watchlist). Embeddings via the local
llama.cpp Jina-v5 server on :8081.
Results
add-pano-4-19-2026.csv (pano_add)
parsed=15 new=9 already-exist=6
upserted=9 env-linked=15 embedded=9
companies-4-19-2026.csv (top1000)
parsed=999 new=466 already-exist=533
upserted=466 env-linked=979 embedded=466
TOTAL
parsed=1014 new=475 existing-linked=539
upserted=475 env-linked=994 embedded=475
env 40 total linked entities grew from ~50 (Similar-50 baseline) to 1042 after the ingest.
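The per-file lines above should reconcile with the TOTAL line and with the dedupe rules. The following consistency check makes that explicit. It copies the reported counts into a hypothetical LoadStats record and verifies three things: parsed rows split into new plus already-existing, only new rows are upserted and embedded, and TOTAL is the field-wise sum. It does not compare env-linked with parsed, because the top-1000 file reports 979 links for 999 parsed rows.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class LoadStats:
    parsed: int
    new: int
    existing: int  # "already-exist" per file, "existing-linked" on the TOTAL line
    upserted: int
    env_linked: int
    embedded: int


# Per-file counts copied from the run summary.
PANO_ADD = LoadStats(parsed=15, new=9, existing=6, upserted=9, env_linked=15, embedded=9)
TOP1000 = LoadStats(parsed=999, new=466, existing=533, upserted=466, env_linked=979, embedded=466)


def total(*runs: LoadStats) -> LoadStats:
    """Field-wise sum of per-file stats."""
    return LoadStats(**{f.name: sum(getattr(r, f.name) for r in runs) for f in fields(LoadStats)})


def check_run(run: LoadStats) -> None:
    """Invariants implied by the dedupe rules for a single input file."""
    # Every parsed row is either new or matched an existing org.
    if run.parsed != run.new + run.existing:
        raise ValueError(f"parsed != new + existing: {run}")
    # Only new rows are upserted and embedded; dedupe hits skip both steps.
    if not (run.upserted == run.new == run.embedded):
        raise ValueError(f"upserted/new/embedded mismatch: {run}")
    # env_linked is deliberately not compared with parsed (979 vs 999 for top1000).


for run in (PANO_ADD, TOP1000):
    check_run(run)

print(total(PANO_ADD, TOP1000))
# → LoadStats(parsed=1014, new=475, existing=539, upserted=475, env_linked=994, embedded=475)
```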
One transient Supabase read-timeout during the embed-persist phase
for Granite Ridge Resources (#1586777); verified post-hoc that
the embedding actually did land, so final embed coverage is
475/475.
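The post-hoc verification is a coverage count over the new entity IDs. The sketch below assumes the persisted vectors have already been fetched into a dict keyed by entity ID. The fetch is not shown, and any table or column names it would use are assumptions. The 1024-dim length check comes from the Infra notes. embed_coverage is a hypothetical helper.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping, Sequence

EMBED_DIM = 1024  # Jina-v5 small retrieval Q8, per the Infra notes


def embed_coverage(
    new_ids: Iterable[int],
    stored: Mapping[int, Sequence[float] | None],
    dim: int = EMBED_DIM,
) -> tuple[int, int, list[int]]:
    """Return (covered, expected, missing_ids) for a batch of newly inserted entities.

    `stored` maps entity id -> persisted vector (None or absent if nothing landed).
    A vector of the wrong length also counts as missing.
    """
    expected = 0
    missing: list[int] = []
    for entity_id in new_ids:
        expected += 1
        vec = stored.get(entity_id)
        if vec is None or len(vec) != dim:
            missing.append(entity_id)
    return expected - len(missing), expected, missing
```

A clean run reports covered == expected with an empty missing list, which is what "475/475" records.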
Source tags (for downstream filtering)
Every new row carries a metadata_json.source key so later
analyses can partition env 40 cleanly:
- pano_add_20260419: the 15-row curated adds.
- pano_top1000_20260419: the 999-row top-rank watchlist.
Rows under both tags also carry the original Crunchbase URL, slug, CB rank, industries, and HQ location, plus stage (for the adds) or heat/growth score tier (for the top-1000).
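Partitioning env 40 by these tags is a group-by on metadata_json.source. The sketch below assumes rows come back with metadata_json already decoded into a dict. partition_by_source is a hypothetical helper. The loader left orgs that matched existing entities untouched, so those rows land in whatever group their prior source tag puts them in. Rows with no tag are grouped under None.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any

PANO_ADD_SOURCE = "pano_add_20260419"
PANO_TOP1000_SOURCE = "pano_top1000_20260419"


def partition_by_source(rows: list[dict[str, Any]]) -> dict[str | None, list[dict[str, Any]]]:
    """Group env-40 entity rows by metadata_json["source"].

    Rows with no metadata_json, or no source key in it, are grouped under None.
    """
    groups: defaultdict[str | None, list[dict[str, Any]]] = defaultdict(list)
    for row in rows:
        meta = row.get("metadata_json") or {}
        groups[meta.get("source")].append(row)
    return dict(groups)
```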
Dedup behavior
The loader dedupes by lowercase name against every existing
organization entity. When a CSV row matches an existing
entity, the loader:
- Leaves entities + entity_details untouched (no overwrite).
- Adds the env-40 link (idempotent upsert on environment_entity_links).
- Skips the embed step.
This is deliberate: it preserves prior enrichment scores on pre-existing orgs (Anthropic, Meta, Stripe, etc.).
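The dedupe-and-link rules can be sketched as follows. This is not the code in seed_pano_adds_20260419.py or cb_eco_seed.py. split_rows, link_to_env, and the row["name"] column key are all hypothetical. The set of (env_id, entity_id) pairs stands in for the idempotent upsert on environment_entity_links.

```python
from __future__ import annotations

ENV_ID = 40  # Company-Deep Similar watchlist


def split_rows(
    rows: list[dict], existing_by_name: dict[str, int]
) -> tuple[list[dict], list[int]]:
    """Split CSV rows into (new rows, IDs of matching existing orgs).

    existing_by_name maps name.lower() -> entity id for every organization entity.
    """
    new_rows: list[dict] = []
    existing_ids: list[int] = []
    for row in rows:
        entity_id = existing_by_name.get(row["name"].lower())
        if entity_id is None:
            new_rows.append(row)  # upsert entity + details, link, embed
        else:
            existing_ids.append(entity_id)  # leave entity untouched, link only, no embed
    return new_rows, existing_ids


def link_to_env(links: set[tuple[int, int]], entity_ids: list[int], env_id: int = ENV_ID) -> int:
    """Idempotently add (env_id, entity_id) pairs; return how many were new."""
    before = len(links)
    links.update((env_id, entity_id) for entity_id in entity_ids)
    return len(links) - before
```

Calling link_to_env again with the same IDs returns 0. That is the property that makes re-linking an existing org to env 40 safe.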
Infra used
- Inference: not used for ingest.
- Embedding: llama-server --embedding on :8081 with Jina-v5 small retrieval Q8 (1024-dim).
Both servers (inference and embedding) were already running from the enrichment_run_20260419.md pass earlier in the day.
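A client for the embedding server might look like the sketch below. It assumes llama-server's OpenAI-compatible /v1/embeddings route on :8081 and checks the 1024-dim length listed above. EMBED_URL, embed_texts, and the injectable post function are hypothetical; the post function lets the response parsing be exercised without a live server. What text gets embedded per org is not specified in these notes.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any, Callable

EMBED_URL = "http://localhost:8081/v1/embeddings"  # assumed OpenAI-compatible route
EXPECTED_DIM = 1024  # Jina-v5 small retrieval Q8

Poster = Callable[[str, dict], dict]


def http_post(url: str, payload: dict[str, Any], timeout: float = 60.0) -> dict[str, Any]:
    """POST a JSON body and decode the JSON response (stdlib only)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def embed_texts(texts: list[str], post: Poster = http_post, url: str = EMBED_URL) -> list[list[float]]:
    """Embed a batch of texts and validate count and dimensionality."""
    body = post(url, {"input": texts})
    # OpenAI-style responses hold one item per input; order by index when present.
    items = sorted(body["data"], key=lambda d: d.get("index", 0))
    vectors = [item["embedding"] for item in items]
    if len(vectors) != len(texts):
        raise ValueError(f"expected {len(texts)} embeddings, got {len(vectors)}")
    for i, vec in enumerate(vectors):
        if len(vec) != EXPECTED_DIM:
            raise ValueError(f"embedding {i} has dim {len(vec)}, expected {EXPECTED_DIM}")
    return vectors
```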
Follow-up
- The 475 new entities currently sit at the 0.20–0.30 baseline enrichment_score. Running enrich_cli.py entity <id> against the top-CB-rank entries should lift them the same way the 2026-04-19 enrichment pass lifted the 15 canonical companies (mean score 0.839). Scope that as a follow-up rather than running another 475-entity enrichment in this session, per the API-rate-limit rule.
- Consider promoting the highest-signal adds (Abridge, Waystar, Hippocratic AI, Tennr, Assort Health) into the first-class frontend/src/lib/company-library.ts so they get the full /companies/[slug] deep view.
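One way to pick the top-CB-rank entries for enrich_cli.py entity <id> while staying under a per-session cap is sketched below. The id and cb_rank keys and the limit parameter are hypothetical. The rows carry CB rank, but the exact key isn't given. Invoking the CLI via python is also an assumption. A lower Crunchbase rank number means a higher-ranked company, so the sort is ascending.

```python
from __future__ import annotations

from typing import Any


def enrichment_commands(entities: list[dict[str, Any]], limit: int) -> list[str]:
    """Return enrich_cli.py invocations for the `limit` best-ranked entities.

    Crunchbase rank 1 is the top company, so sort ascending; unranked rows are skipped.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    ranked = sorted(
        (e for e in entities if e.get("cb_rank") is not None),
        key=lambda e: e["cb_rank"],
    )
    return [f"python enrich_cli.py entity {e['id']}" for e in ranked[:limit]]
```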