Goals Stage Audit — 75-Entity Registry · Memos

Generated: 2026-04-27 16:11 UTC
Source: Pano cluster combined audit (/tmp/combined_audit.json)

Headline numbers

Entities scanned: 75
With goals >= 4: 75 / 75 (100%)
With clean labels (no leading G-id, length < 100, non-empty): 73 / 75 (97.3%)
With at least 5 goals (target: 5 per entity): 75 / 75 (100%)

Net read: the goals-stage refactor (plain-text + tolerant regex parser) that landed earlier in the project is holding up across the registry. The 97.3% clean coverage is in the same band as the post-fix RIOH coverage.

Bottom 5 by goal-label cleanness

The audit's goals_clean metric counts goals whose label is non-empty, length < 100 chars, and doesn't begin with a literal G (which would suggest the regex left an embedded ID prefix in the label):

Slug	Goals	Clean	Notes
`optum-360`	5	1	4 of 5 labels are malformed; needs reprocessing
`palantir-technologies`	5	3	2 labels likely have leading-G artifacts
`komodo-health`	5	4	1 label off; minor
`marsh-mclennan`	5	4	1 label off; minor
`prudential-financial`	5	4	1 label off; minor

Reprocess optum-360 — 4 of 5 goal labels are malformed. This is the worst case in the registry.
Spot-check palantir-technologies: this slug is distinct from the palantir slug, so the problem may be a duplicate entity rather than a parsing failure.
The other 3 entities are at 4/5 clean, which is acceptable but worth re-running on the next continuous-research cycle.
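The goals_clean rule described above can be sketched in a few lines. This is a minimal illustration of the three stated checks, not the audit's actual code; the function names are hypothetical, and treating whitespace-only labels as empty is an assumption.

```python
def is_clean_label(label: str, max_len: int = 100) -> bool:
    """Apply the three checks named in the memo to one goal label."""
    text = label.strip()  # assumption: whitespace-only counts as empty
    if not text:
        return False  # empty label
    if len(text) >= max_len:
        return False  # limit is exclusive: length must be < 100
    if text.startswith("G"):
        return False  # literal leading G, possible embedded ID prefix
    return True


def goals_clean(labels: list[str]) -> int:
    """Count how many labels for one entity pass the checks."""
    return sum(is_clean_label(label) for label in labels)
```

Read literally, the leading-G test also flags a legitimate label that happens to start with a capital G, so a low clean count is a prompt to inspect the labels rather than proof of a parsing failure.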

Recommended fix

Mirror the RIOH-stage approach by adding a tighter "is this label sensible" detector to the goals parser:

Strip leading G\d+\s+ artifacts in _GOAL_LOOSE_RE post-extraction
Cap label length at 80 chars (currently uncapped)
If the label appears to begin with a [past|current|future] lane keyword, strip it

The existing strict regex catches >95% of cases. The remaining failures are similar to the RIOH 7-field-merged pattern: Gemma occasionally drops the em-dash separator and concatenates label + description.
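The three proposed clean-up steps could look roughly like the sketch below. The real _GOAL_LOOSE_RE is not shown in this memo, so the patterns, the constant, and the function name here are hypothetical stand-ins applied to an already-extracted label.

```python
import re

# Hypothetical patterns; not the parser's actual regexes.
_LEADING_GID_RE = re.compile(r"^G\d+\s+")
_LANE_PREFIX_RE = re.compile(r"^(?:past|current|future)\b[\s:\-]*", re.IGNORECASE)
MAX_LABEL_LEN = 80  # proposed cap from the list above


def tidy_goal_label(raw: str) -> str:
    """Post-extraction clean-up for a single goal label."""
    label = raw.strip()
    label = _LEADING_GID_RE.sub("", label)   # drop a leading "G3 " style ID
    label = _LANE_PREFIX_RE.sub("", label)   # drop a leading lane keyword
    if len(label) > MAX_LABEL_LEN:
        label = label[:MAX_LABEL_LEN].rstrip()  # blunt truncation
    return label
```

The word boundary keeps a label such as "Futureproof the platform" intact, but a label that legitimately opens with the word "Current" would still lose it. Truncation also does not recover the label from a merged label + description string; it only bounds the damage.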

Per-entity goal-coverage histogram (clean count)

5 / 5 clean: 70 entities (93.3%)
4 / 5 clean: 3 entities (4.0%)
3 / 5 clean: 1 entity (1.3%)
<= 2 clean: 1 entity (1.3%)
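A histogram of this shape can be rebuilt from per-entity clean counts. The sketch below assumes a plain mapping of slug to clean count (the layout of the audit JSON is not shown here) and uses made-up slugs; it also asserts that the buckets add up to the number of entities scanned, which is a cheap consistency check on the published figures.

```python
from collections import Counter


def clean_histogram(clean_counts: dict[str, int], goals_per_entity: int = 5) -> dict:
    """Bucket entities by clean-label count; returns {bucket: (count, percent)}."""
    hist = Counter()
    for n in clean_counts.values():
        key = "<= 2" if n <= 2 else f"{n} / {goals_per_entity}"
        hist[key] += 1
    total = len(clean_counts)
    # Every entity lands in exactly one bucket, so the totals must match.
    assert sum(hist.values()) == total
    return {k: (c, round(100 * c / total, 1)) for k, c in hist.items()}


# Hypothetical input, not registry data.
example = {"entity-a": 5, "entity-b": 5, "entity-c": 4, "entity-d": 1}
print(clean_histogram(example))
```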

Cross-stage consistency

For every entity with a goals stage, the riohs stage references those goals via goals: ["G1","G3"] style fields. Spot-check confirms that the cross-references hold — every G-id referenced in a RIOH goals array has a matching G-id in the goals stage. This is a useful invariant for future graph-integration work.
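The invariant can be turned into an automated check instead of a spot-check. The sketch below assumes each goal record carries its G-id under an "id" key (a hypothetical field name) and each RIOH record carries a "goals" list as in the example above.

```python
def dangling_goal_refs(goals: list[dict], riohs: list[dict]) -> set[str]:
    """Return G-ids referenced by RIOHs that have no matching goal."""
    known = {g["id"] for g in goals}  # "id" is an assumed field name
    referenced = {gid for r in riohs for gid in r.get("goals", [])}
    return referenced - known


# Hypothetical records for one entity.
goals = [{"id": "G1"}, {"id": "G2"}, {"id": "G3"}]
riohs = [{"goals": ["G1", "G3"]}, {"goals": ["G2"]}]
assert dangling_goal_refs(goals, riohs) == set()
```

An empty result for every entity means the invariant holds; any returned G-id points at a RIOH that references a goal the goals stage never produced.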

Recommended action sequencing

Force-refresh optum-360 on next focused backfill (low cost, fixes the worst case)
Add a goals-label sanity check to the audit pipeline as a continuous-cycle metric
Defer the goals-parser tightening until we observe the next batch of new entities; it's not worth a redeploy for the existing 5 marginal cases

Co-Authored-By: Oz oz-agent@warp.dev
