@pentatonic-ai/ai-agent-sdk 0.10.18 → 0.10.20
- package/dist/index.cjs +1 -1
- package/dist/index.js +1 -1
- package/package.json +1 -1
- package/packages/memory-engine-v2/RFC-decay-and-fusion.md +122 -8
- package/packages/memory-engine-v2/compat/server.py +18 -4
- package/packages/memory-engine-v2/docs/redistill-execution-plan-2026-06-22.md +269 -0
- package/packages/memory-engine-v2/docs/redistill-plan-2026-06-21.md +101 -0
- package/packages/memory-engine-v2/extractor-async/extraction_diff.py +218 -0
- package/packages/memory-engine-v2/extractor-async/test_email_alias_guard.py +78 -0
- package/packages/memory-engine-v2/extractor-async/test_extraction_diff.py +180 -0
- package/packages/memory-engine-v2/extractor-async/test_prompt_rules.py +58 -0
- package/packages/memory-engine-v2/extractor-async/worker.py +116 -0
- package/packages/memory-engine-v2/scripts/build_retrain_corpus.py +240 -0
- package/packages/memory-engine-v2/scripts/fusion_defrag.py +440 -0
- package/packages/memory-engine-v2/scripts/redistill.py +236 -0
package/dist/index.cjs
CHANGED
@@ -878,7 +878,7 @@ function fireAndForgetEmit(clientConfig, sessionOpts, messages, result, model) {
 }
 
 // src/telemetry.js
-var VERSION = "0.10.
+var VERSION = "0.10.20";
 var TELEMETRY_URL = "https://sdk-telemetry.philip-134.workers.dev";
 function machineId() {
   const raw = typeof process !== "undefined" ? `${process.env?.USER || process.env?.USERNAME || "u"}:${process.platform || "x"}:${process.arch || "x"}` : "browser";
package/dist/index.js
CHANGED
@@ -847,7 +847,7 @@ function fireAndForgetEmit(clientConfig, sessionOpts, messages, result, model) {
 }
 
 // src/telemetry.js
-var VERSION = "0.10.
+var VERSION = "0.10.20";
 var TELEMETRY_URL = "https://sdk-telemetry.philip-134.workers.dev";
 function machineId() {
   const raw = typeof process !== "undefined" ? `${process.env?.USER || process.env?.USERNAME || "u"}:${process.platform || "x"}:${process.arch || "x"}` : "browser";
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@pentatonic-ai/ai-agent-sdk",
-  "version": "0.10.
+  "version": "0.10.20",
   "description": "TES SDK — LLM observability and lifecycle tracking via Pentatonic Thing Event System. Track token usage, tool calls, and conversations. Manage things through event-sourced lifecycle stages with AI enrichment and vector search.",
   "type": "module",
   "main": "./dist/index.cjs",
package/packages/memory-engine-v2/RFC-decay-and-fusion.md
CHANGED
@@ -1,10 +1,18 @@
 # RFC: the Fusion Drive — v2 memory self-healing (cross-run node fusion + decay)
 
 > **Fusion Drive** = the continuous, arena-scoped background engine that keeps the v2
-> memory graph self-healing
->
->
->
+> memory graph self-healing. It triages every node into one of **three** outcomes:
+> it *fuses* duplicate/near-duplicate nodes from different distillation runs into a single
+> master node (horizontal convergence); it *re-distills* high-value extractions produced by
+> a superseded teacher/prompt — regenerating them from the still-present source event through
+> the current clean teacher (depth refresh); and it *decays* stale, low-value, and junk nodes
+> out of existence (vertical aging). Named for the drive that does the fusing — the re-distill
+> and decay passes ride the same engine.
+>
+> *(Revised 2026-06-22: added Part B′ — re-distillation — as the third triage verb, with the
+> prompt-version-drift trigger. Motivated by the clean-prompt deploy (SDK 0.10.19, #126 +
+> #129) which made "the current teacher is materially better than the one that produced most
+> of the graph" concrete and measurable via `system_prompt_hash`.)*
 
 **Status:** draft / spec — 2026-06-12
 **Builds on:** `RFC-entity-reconciliation.md`, `scripts/entity_resolution_v2.py` (#82),
@@ -139,15 +147,101 @@ sparse backfill.
 
 ---
 
+## Part B′ — Re-distillation: regenerate stale-prompt extractions from source
+
+Fusion (A) needs a *correct counterpart* to converge toward; Decay (B) just *deletes*. But
+the common case after a teacher/prompt upgrade is a **high-value node with no correct
+counterpart yet** — the only extraction that exists is the stale-prompt one. Fusion has
+nothing to fuse to; decay would throw away real information. The cure is the third verb: the
+**source event still exists** (`events` table, 376k rows live), so regenerate the extraction
+by re-running that event through the *current clean teacher*. Fusion converges horizontally,
+decay ages vertically; re-distill refreshes **in depth**.
+
+### B′1. Trigger — prompt-version drift, not raw age
+The defect population is *exactly* the facts/entities whose provenance traces an **old
+`system_prompt_hash`** — `bbdaba6b…` / `f1e0ff55…` / `ef0647c7…` (pre-clean), vs the clean
+`6ccfe70f…` deployed with 0.10.19 (#126 modality/attribution + #129 email-discipline &
+entity-separation). #118 propagated source onto facts, so provenance → the event's
+`distillation_traces.system_prompt_hash` is queryable. **Age is a weak proxy; prompt-version
+selects the defect set directly** — a months-old node the clean teacher would extract
+identically needs nothing; a two-day-old node from the dirty prompt is a defect. Prioritize
+by `salience` (B1) so high-value stale nodes go first.
+
+### B′2. Triage routing — 3-way, by salience × prompt-version
+Per assessed node/event:
+
+| condition | outcome |
+|---|---|
+| stale prompt-hash **+** high salience **+** source event present | **re-distill** (this part) |
+| has a correct newer-teacher counterpart in the arena | **fuse** (Part A) |
+| low salience, junk-born (B2), no corroboration, never accessed | **decay** (Part B) |
+
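One way to read the B′2 routing table is as a small decision function. The sketch below is illustrative only and is not taken from the package: the `Node` fields, the two salience thresholds, the fourth `keep` outcome, and the precedence (fuse, then re-distill, then decay) are all assumptions. The RFC itself specifies only the three conditions.

```python
from dataclasses import dataclass

# Assumed thresholds; the RFC leaves salience calibration as an open question.
HIGH_SALIENCE = 0.7
LOW_SALIENCE = 0.2


@dataclass
class Node:
    salience: float
    stale_prompt: bool            # provenance traces an old system_prompt_hash
    source_event_present: bool    # the source row still exists in `events`
    has_clean_counterpart: bool   # a newer-teacher duplicate exists in the arena
    junk_born: bool = False
    corroborated: bool = False
    accessed: bool = False


def triage(node: Node) -> str:
    """Return 'fuse', 'redistill', 'decay', or 'keep' (hypothetical fourth outcome)."""
    # Assumed precedence: a correct counterpart makes fusion the cheapest repair.
    if node.has_clean_counterpart:
        return "fuse"
    if (node.stale_prompt and node.salience >= HIGH_SALIENCE
            and node.source_event_present):
        return "redistill"
    if (node.salience <= LOW_SALIENCE and node.junk_born
            and not node.corroborated and not node.accessed):
        return "decay"
    return "keep"
```

Putting the fuse check first follows the Part C reading that re-distill is for nodes with no correct counterpart yet. A different ordering would be equally easy to express here.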
+### B′3. Mechanism — re-enqueue, don't mutate in place
+Re-distill = re-insert the source `event_id` into `distillation_queue` (`status='pending'`,
+`attempts=0`). The existing **extractor-async** worker claims it, runs the clean teacher,
+writes the new extraction **and a fresh `6ccfe70f` trace**. No new pipeline — it reuses the
+distiller, the combined-demand **autoscaler**, and the trace ledger. (Re-distill is a
+*producer* of queue demand; the autoscaler's student-aware floor already keeps a teacher box
+warm for it — see the deploy notes.)
+
+### B′4. Supersedence — the load-bearing requirement
+The store is **pure-accretion** (the whole motivation of this RFC). A naive re-enqueue makes
+the clean extraction land **beside** the dirty one → it *worsens* fragmentation. So
+re-distill MUST close the loop through Fusion's tombstone machinery — it is **sequenced into
+the Fusion Drive, not bolted on**:
+
+1. Each re-distill is recorded in a `redistill_runs` ledger with its triggering
+   `(event_id, old_prompt_hash)`.
+2. When the clean extraction completes, **Fusion converges old ↔ new for that event** using
+   the teacher-version master signal (A2/A3): the new `6ccfe70f` extraction wins as master;
+   the old extraction's now-orphaned nodes (those whose **only** provenance was this event
+   under the old hash) are tombstoned/repointed via `entity_merges` / `fact_merges`.
+3. Where an old node carries **other live provenance** (multi-event corroboration), only this
+   event's contribution is repointed — **never blind-delete a multi-source node** (the
+   over-merge failure mode: a hotel email wrongly attached to a person must not let one
+   event's repoint nuke an otherwise-corroborated node).
+
+This dependency is hard: **re-distill is unsafe until Fusion's cross-run / teacher-version
+master selection (E3) is live.** Until then a re-distill loop accretes. An interim cheaper
+option (Open Q): explicit **event-scoped supersede** — delete only the facts/entities whose
+provenance set is exactly `{this event}` under the old hash before re-enqueue — covers the
+single-provenance majority without the full fusion adjudicator.
+
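The event-scoped supersede rule in B′4 can be sketched as a pure function over provenance sets. This is an assumption-laden illustration, not the shipped `scripts/redistill.py` logic: `event_scoped_supersede` is a hypothetical name, and the input is assumed to be pre-filtered to nodes extracted under the old prompt hash.

```python
def event_scoped_supersede(provenance, event_id):
    """Split the nodes touched by `event_id` into (delete, repoint).

    provenance: dict mapping node id -> set of event ids backing that node
    (assumed already limited to nodes written under the old prompt hash).
    """
    delete, repoint = [], []
    for node_id, events in provenance.items():
        if event_id not in events:
            continue  # this node is not backed by the event being re-distilled
        if events == {event_id}:
            # Sole provenance is this event: safe to supersede before re-enqueue.
            delete.append(node_id)
        else:
            # Corroborated by other events: only this event's contribution moves.
            repoint.append(node_id)
    return sorted(delete), sorted(repoint)
```

For example, with `{"a": {1}, "b": {1, 2}, "c": {3}}` and event `1`, node `a` is deleted, node `b` is only repointed, and node `c` is left alone, which is the "never blind-delete a multi-source node" rule in miniature.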
+### B′5. Corpus-as-byproduct — one loop, three wins
+Every re-distill emits a clean `6ccfe70f` `distillation_trace`. A prompt-version-drift
+re-distill loop therefore **builds the student retrain corpus while it repairs the graph**
+(`scripts/build_retrain_corpus.py` consumes those traces). It subsumes the one-shot full
+re-distill: gradual, rate-limited, no nuke — graph repair **+** corpus **+** self-healing
+from a single engine. This is the durable answer to "is the corpus building?": it is, as a
+side effect of the gardener.
+
+### B′6. Cadence + cost + safety
+Rolling, rate-limited, autoscaler-aware, off-peak. Budget *N* events/hour against teacher
+capacity; order by `salience × staleness`. **Never big-bang the full backlog** — gradual
+migration is the point. Arena-scoped, dry-run → `--apply`, `redistill_runs` ledger for
+observability and rollback. Same operational shape as fusion/decay/autoscaler.
+
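The B′6 cadence rule (a budget of *N* events per hour, ordered by `salience × staleness`) amounts to a ranked, truncated selection. The sketch below assumes staleness is already available as a numeric score. The RFC does not define how it is computed, and `next_slice` is a hypothetical helper name.

```python
def next_slice(candidates, budget):
    """Pick up to `budget` event ids, highest salience * staleness first.

    candidates: iterable of (event_id, salience, staleness) tuples, where
    staleness is an assumed numeric score (larger means more stale).
    """
    ranked = sorted(candidates, key=lambda c: c[1] * c[2], reverse=True)
    return [event_id for event_id, _, _ in ranked[:budget]]
```

Calling this once per hour with `budget=N` gives the rolling, rate-limited shape described above. The real loop would also need to respect teacher capacity and arena scoping.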
+---
+
 ## Part C — Ordering & how they combine
 
-Per arena, on schedule: **(1)
-
-
+Per arena, on schedule: **(1) triage → re-distill the high-value stale-prompt set (async via
+the queue) → (2) fusion → (3) decay.** Re-distill is enqueued first so that by the time
+fusion runs, the clean counterpart exists for it to crown as master (else fusion has only
+stale renderings to choose between). Fusion then absorbs each master's duplicates'
+provenance/salience *before* decay judges it (else a real node split across two weak dupes
+could wrongly decay out). Then decay ages + evicts the survivors.
+
+*(Re-distill is asynchronous — it completes on the teacher's schedule — so in practice a
+node re-distilled in this pass is fused/decayed in the **next** per-arena pass, once its
+clean trace + extraction have landed. The ledger links the two.)*
 
 **This is what finally cures immortal pollution:**
 - 7B polluted node *with* a correct Qwen3.6 counterpart → **fused**, correct one as master,
   polluted demoted to alias / tombstoned.
+- stale-prompt node, *high-value*, *no* correct counterpart, source event present →
+  **re-distilled** through the clean teacher → new master extraction; old superseded via
+  fusion (B′4). The information is *recovered*, not lost.
 - 7B pure-junk node with *no* correct counterpart (numeric-ID-person, ungrounded) → born-low
   salience + no corroboration + never accessed → **decays out and is evicted**.
 
@@ -165,8 +259,15 @@ reset, but no longer the *only* path).
 - `relationships`: `+ salience REAL`, `+ last_accessed` (already has `weight`,
   `first/last_seen`).
 - new `fact_merges` audit (mirror `entity_merges` incl. `rollback_payload`).
-- new `fusion_runs` + `decay_runs` ledgers for observability.
+- new `fusion_runs` + `decay_runs` + `redistill_runs` ledgers for observability. `redistill_runs`:
+  `(id, arena, event_id, old_prompt_hash, new_prompt_hash, salience_at_trigger, enqueued_at,
+  completed_at, fused_at, mode)` — links a re-distill to its triggering node and to the fusion
+  that superseded the old extraction.
 - `/search` gains a `last_accessed = NOW()` bump on returned nodes (batched).
+- re-distill trigger needs provenance → prompt-version: either denormalize `system_prompt_hash`
+  onto `facts`/`entities` at write time (cheap filter), or join through
+  `distillation_traces(event_id → system_prompt_hash)` on the provenance event ids (no schema
+  change, costlier query). Prefer the join until the trigger volume justifies denormalizing.
 
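The join-based trigger in the last bullet can be illustrated in memory. The sketch below is a hedged illustration rather than the production query: `stale_prompt_counts` is a hypothetical helper, the `0.7` salience bucket boundary is assumed, and only the `6ccfe70f` prefix quoted in the RFC is used because the full hash is elided there.

```python
from collections import Counter

CLEAN_HASH_PREFIX = "6ccfe70f"  # prefix quoted in the RFC; full hash not shown


def stale_prompt_counts(facts, trace_hash_by_event, clean_prefix=CLEAN_HASH_PREFIX):
    """Count facts per (prompt hash, salience bucket) backed by a non-clean trace.

    facts: iterable of dicts with 'provenance_event_ids' and 'salience'.
    trace_hash_by_event: dict event_id -> system_prompt_hash, standing in for
    the distillation_traces join.
    """
    counts = Counter()
    for fact in facts:
        for event_id in fact["provenance_event_ids"]:
            prompt_hash = trace_hash_by_event.get(event_id)
            if prompt_hash is not None and not prompt_hash.startswith(clean_prefix):
                bucket = "high" if fact["salience"] >= 0.7 else "low"  # assumed cut
                counts[(prompt_hash, bucket)] += 1
                break  # count each fact once, at its first stale provenance event
    return counts
```

A count shaped like this is what a dry-run triage could report to size the work before any `--apply` slice. Denormalizing the hash onto `facts` would replace the inner loop with a direct column read.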
 ## Part E — Rollout (each flag-gated, arena-scoped, dry-run-first, audited)
 
@@ -176,6 +277,13 @@ reset, but no longer the *only* path).
 3. **Fusion extension** — scored canonical selection (fix typo-crowning) + cross-run
    detection + fact fusion, dry-run → apply.
 4. **Online/continuous** — wire fusion+decay to run after distillation per arena.
+5. **Re-distill loop (Part B′)** — dry-run triage first (count stale-prompt nodes by
+   `system_prompt_hash` × salience bucket to size the work), then a **bounded `--apply` slice**
+   on one curated arena (re-enqueue + verify clean trace + verify fusion supersedes the old
+   extraction), then wire continuous. **Gated on step 3** (Fusion cross-run / teacher-version
+   master selection): until that's live, re-distill must use the interim **event-scoped
+   supersede** (B′4) or it accretes. Ships as `scripts/redistill.py` (dry-run default,
+   `--apply` gate, arena-scoped, `redistill_runs` ledger).
 
 ## Open questions
 - Half-life constants per category — needs a calibration pass against real arenas.
@@ -183,3 +291,9 @@ reset, but no longer the *only* path).
 - Directory authority source for canonical anchoring — HubSpot contacts? a curated table?
 - Interaction with the (still-open) source_id supersede mode — fusion partly subsumes it,
   but explicit supersede is cheaper for known-mutable sources.
+- **Re-distill supersedence before full fusion is live** — is event-scoped supersede (delete
+  only nodes whose provenance set is exactly `{this event}` under the old hash) a safe enough
+  interim, or do we hard-gate the loop on E3? Single-provenance nodes are the majority, but
+  the multi-provenance tail is where the over-merge risk concentrates.
+- **Re-distill prioritization** — pure `salience × staleness`, or weight toward the entities
+  behind known user-visible confabulations (Vickers/Boedecker) first?
package/packages/memory-engine-v2/compat/server.py
CHANGED
@@ -896,6 +896,8 @@ class GraphQueryRequest(BaseModel):
     entity_type: str | None = None
     name: str | None = None  # canonical_name (ILIKE)
     subject: str | None = None  # entity name OR canonical_name (facts.subject_entity)
+    subject_entity_id: str | None = None  # EXACT facts.subject_entity_id — strict, no name bleed
+    object_entity_id: str | None = None  # EXACT facts.object_entity_id
     predicate: str | None = None
     category: str | None = None  # facts.category
     from_name: str | None = None  # relationships.from_entity.canonical_name
@@ -944,9 +946,14 @@ async def list_entities(req: GraphQueryRequest):
 
 @app.post("/facts")
 async def list_facts(req: GraphQueryRequest):
-    """Filter facts by arena + optional category/predicate +
-
-    subject_entity_id
+    """Filter facts by arena + optional category/predicate + subject.
+
+    PREFER `subject_entity_id` (exact id match) over `subject` (name ILIKE):
+    name matching bleeds one person's facts into another's answer when names
+    collide or fragment (the Will Vickers ⟵ Will Spencer confabulation — a
+    query resolved to one entity must NOT pull a same/similar-named entity's
+    facts). The name path is kept for back-compat callers that haven't resolved
+    an id yet, but entity-id is the strict, bleed-free path."""
     arenas = _resolve_arenas(req)
     conditions = ["f.arena = ANY(%s)"]
     params: list[Any] = [arenas]
@@ -956,7 +963,14 @@ async def list_facts(req: GraphQueryRequest):
     if req.predicate:
         conditions.append("f.predicate ILIKE %s")
         params.append(f"%{req.predicate}%")
-    if req.
+    if req.subject_entity_id:
+        conditions.append("f.subject_entity_id = %s")
+        params.append(req.subject_entity_id)
+    if req.object_entity_id:
+        conditions.append("f.object_entity_id = %s")
+        params.append(req.object_entity_id)
+    # Name path: only when no exact id was given (back-compat / unresolved callers).
+    if req.subject and not req.subject_entity_id:
         conditions.append("EXISTS (SELECT 1 FROM entities e WHERE e.id = f.subject_entity_id AND (e.canonical_name ILIKE %s OR %s = ANY(e.aliases)))")
         params.extend([f"%{req.subject}%", req.subject])
     sql = f"""
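The precedence introduced in the `list_facts` hunk above (exact entity id wins, and the name path is used only when no `subject_entity_id` is supplied) can be exercised in isolation. The standalone helper below is a sketch for illustration: `build_subject_filters` is a hypothetical name and not part of `server.py`, though the SQL fragments are copied from the hunk.

```python
def build_subject_filters(subject_entity_id=None, object_entity_id=None, subject=None):
    """Return (conditions, params) for the subject/object part of a /facts query."""
    conditions, params = [], []
    if subject_entity_id:
        conditions.append("f.subject_entity_id = %s")
        params.append(subject_entity_id)
    if object_entity_id:
        conditions.append("f.object_entity_id = %s")
        params.append(object_entity_id)
    # Name path only when no exact subject id was supplied (back-compat callers).
    if subject and not subject_entity_id:
        conditions.append(
            "EXISTS (SELECT 1 FROM entities e WHERE e.id = f.subject_entity_id "
            "AND (e.canonical_name ILIKE %s OR %s = ANY(e.aliases)))"
        )
        params.extend([f"%{subject}%", subject])
    return conditions, params
```

When a caller passes both an id and a name, the name is ignored, so a query that has already been resolved to one entity cannot pick up a similarly named entity's facts.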
package/packages/memory-engine-v2/docs/redistill-execution-plan-2026-06-22.md
ADDED
@@ -0,0 +1,269 @@
+# Re-distill EXECUTION plan — pentatonic-team, #126 modality fix (2026-06-22)
+
+> **For review (Phil H) before any prod write.** This is the concrete, gated
+> execution plan for deploying the #126 distiller fix and re-distilling the
+> `pentatonic-team` graph. It supersedes the operational detail in
+> `redistill-plan-2026-06-21.md` where live findings differ (scale, deploy
+> mechanism, queue schema). The *why* and the guardrails there still hold.
+>
+> **Nothing here has been run except the outage fix in §1.** The irreversible
+> full delete (§4) is explicitly gated on the pilot audit (§3) + a human go.
+>
+> **⚠️ Three prerequisites added after review (§0) are BLOCKING** — without them
+> the re-distill would (a) re-introduce the same fabrications via the cascade
+> student, (b) leave the vector index inconsistent, and (c) starve every other
+> tenant's live ingest. Read §0 first.
+
+## Why (recap)
+
+A fact-by-fact audit found ~18.7% of distilled `pentatonic-team` facts were
+wrong — dominated by **modality collapse** (future/scheduled/planned content
+asserted as established fact), plus attribution errors and same-name
+conflations. The teacher prompt is fixed in **ai-agent-sdk PR #126** (TENSE &
+MODALITY / ATTRIBUTION FIDELITY / IDENTITY rules, in both `BATCH_SYSTEM_PROMPT`
+and the active `GUIDED_JSON_SYSTEM_PROMPT`). The prompt only governs *future*
+extractions, so the historical graph must be re-distilled to inherit the fix.
+
+## Live findings that change the 2026-06-21 runbook
+
+| Claim in 06-21 runbook | Live reality (verified 2026-06-22) |
+|---|---|
+| "~163k events", "434 facts" | **272,893 events · 483,097 facts · 291,797 rels** for `arena LIKE 'pentatonic-team%'`. The 434 was a 49-entity sample. |
+| Deploy = "the L4 box running extractor-async" | **The distiller runs on the DB box `i-0559922cf59ac6975`** (`pme-prod-us-east-1`), container `pme2-extractor-async`. The `seesa-distiller-bakeoff` box (`i-0d65…`) is **NOT** the host (empty, stopped since 06-10) — earlier handoff was wrong. Models are external: teacher `seesa-distiller-l40s` (`172.31.26.202:8005`, `qwen3.6-27b-fp8`), student `seesa-student-l4` (`172.31.29.121:8005`). |
+| `UPDATE distillation_queue ... WHERE arena LIKE` | **`distillation_queue` has NO `arena` column** (cols: id, event_id, enqueued_at, claimed_by, claimed_at, claim_expires_at, status, attempts, last_error, completed_at). Scope via a join to `events.arena` (see §3/§4). |
+| disk: "prior 30GB root outage — watch disk" | Root is now **485G, 412G free (16%)** — ample. |
+| "Merge #126 and deploy the new distiller" | The running container **may** build from a local copy `/opt/engine-v2/extractor-async/worker.py`, but the engine-deploy path also installs the SDK tarball **from S3 into `node_modules`** and builds from there — these have caused a *week of drift* before (editing a reference-only copy). **§2.0 makes verifying the real build context a hard step, not an assumption.** |
+
+## 0. BLOCKING prerequisites (added 2026-06-22 review)
+
+### 0.1 The re-distill MUST run TEACHER-ONLY (else it re-creates the bug)
+
+`CASCADE_ENABLED=true` is live in prod (the student→teacher cascade, #99). Under
+it, re-enqueued events flow **student-first**: ~75–80% are handled by the
+fine-tuned student, only gated/escalated events reach the teacher. **#126 fixes
+the *teacher* prompt only.** The deployed student is the `f1e0ff`-trained
+fine-tune — it *learned the modality-collapse behaviour from the old teacher* —
+so re-distilling through the live cascade would have the **student re-assert the
+same future-as-fact fabrications on the majority of events**, and the audit
+would not reach `<3%`.
+
+**Therefore, for the entire re-distill (pilot §3 + full §4):**
+
+```bash
+# turn the cascade OFF so every re-enqueued event goes to the #126 teacher
+aws ssm put-parameter --name /pme/prod-us-east-1/CASCADE_ENABLED --value false \
+  --type String --overwrite --region us-east-1
+# then redeploy/restart extractor-async so the env reaches the worker (§2),
+# and CONFIRM in the startup log: "cascade DISABLED" / no "student-primary" line.
+```
+
+Re-enable the cascade (`=true` + restart) only **after** §4 completes and the
+audit passes. While it's off, the teacher fleet carries 100% of distillation —
+factor that into §4.4 fleet sizing.
+
+**Forward note (not part of this run):** deploying #126 advances the teacher
+`prompt_hash`, which strands the live student (trained on `f1e0ff`). Once the
+cascade is re-enabled, watch the random-sample student↔teacher agreement for
+drift; the student will eventually want a refresh on #126 traces. Tracked
+separately from this re-distill.
+
+### 0.2 Vector index (Qdrant) must be reconciled — not just Postgres
+
+Hybrid retrieval is live (Qdrant 1.18.2, `evidence` collection). The §4 DELETE
+removes 483k facts from Postgres but **leaves their vectors orphaned in Qdrant**,
+and re-distilled facts are content-hash-distinct → *new* vectors. Net without
+reconciliation: search returns deleted facts + stale duplicates.
+
+- **Confirm** what the `evidence` collection is keyed on (fact id vs event id)
+  and how a PG fact delete maps to vector points (`vector_provenance` table).
+- **Purge** the arena's old vectors as part of §4.2 (delete the corresponding
+  Qdrant points / `vector_provenance` rows), and **verify** re-distilled facts
+  get re-embedded (the embedder lanes must keep up — `NV_EMBED_URL` interactive
+  + bulk).
+- The §4.1 snapshot + §Rollback are **Postgres-only**; add the vector state (see
+  §4.1 / Rollback below) or a rollback leaves Qdrant inconsistent with restored
+  PG.
+
+### 0.3 Re-enqueue must NOT starve live ingest (all tenants)
+
+`claim_next_batch` orders by `ORDER BY id`. Re-enqueued events are *old* → **low
+ids** → they would be claimed **ahead of** live ingest (high ids) for **every
+tenant**, stalling all forward memory ingest behind a multi-day job. Mitigate by
+**dripping the re-enqueue in batches** (§4.3) rather than flipping all 273k to
+`pending` at once — keep the pending re-distill backlog bounded (e.g. ≤ a few k)
+so live ingest interleaves. Monitor that non-`pentatonic-team` pending age
+doesn't climb.
+
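The bounded drip described in §0.3 reduces to one rule: never enqueue more than the room left under the backlog cap. The sketch below is a minimal illustration under stated assumptions. `next_batch` is a hypothetical helper, `pending_now` stands for a count the operator would query from `distillation_queue` joined to `events`, and the default cap of 3000 is only one reading of the plan's "a few k".

```python
def next_batch(remaining_ids, pending_now, cap=3000):
    """Return the event ids to re-enqueue now so the pending backlog stays <= cap.

    remaining_ids: ordered list of arena event ids not yet re-enqueued.
    pending_now: current count of pending re-distill rows (queried by the caller).
    """
    room = max(0, cap - pending_now)  # zero when the backlog is already at or over cap
    return remaining_ids[:room]
```

Run on a timer, this only tops the backlog up as the fleet drains it, so live ingest from other tenants keeps interleaving rather than queuing behind all 273k low-id rows.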
97
|
+
## 1. Outage already fixed (2026-06-22 ~08:26 UTC)
|
|
98
|
+
|
|
99
|
+
`pme2-extractor-async` had been SIGKILL'd at 05:03 UTC (`OOMKilled=false` → a
|
|
100
|
+
failed deploy attempt) and silently not draining for ~3.5h while forward ingest
|
|
101
|
+
kept enqueuing — **no alert fired** (see Seesa SEE-189). Fixed: `docker start
|
|
102
|
+
pme2-extractor-async`; reset 60 orphaned claims (`status='claimed' AND
|
|
103
|
+
claim_expires_at < now()` → `pending`); stopped the wrongly-started bakeoff box.
|
|
104
|
+
Verified draining (~60/min, 0 new failures, active claims with future expiry).
|
|
105
|
+
**This restored the OLD prompt** (`prompt_hash=f1e0ff554f708d05`); §2 replaces it.
|
|
106
|
+
|
|
107
|
+
## 2. Deploy #126 (prompt fix) + disable cascade
|
|
108
|
+
|
|
109
|
+
SDK version bumped **0.10.18 → 0.10.19** (this PR) and tarball built
|
|
110
|
+
(`ai-agent-sdk-0.10.19.tgz`, bundled `worker.py` carries the 4 fix markers).
|
|
111
|
+
|
|
112
|
+
0. **Verify the real build context FIRST** (do not assume local-copy — this is
|
|
113
|
+
the "week of drift" failure mode). On `i-0559…`:
|
|
114
|
+
`docker inspect pme2-extractor-async --format '{{.Config.Image}}'` and read
|
|
115
|
+
the `extractor-async` service `build.context` in the *deployed* compose
|
|
116
|
+
(`/opt/engine-v2/...`). Confirm the file you are about to edit is the one that
|
|
117
|
+
image actually builds from. If the context resolves into
|
|
118
|
+
`node_modules/@pentatonic-ai/ai-agent-sdk/...` (the S3-tarball install), edit
|
|
119
|
+
THAT path (or replace the tarball), not a stray `/opt/engine-v2/extractor-async`
|
|
120
|
+
copy. Resolve Open-Q #1 (npm/S3 source of truth) here too.
|
|
121
|
+
1. **Diff** (safety — don't silently revert any box-local change): copy the #126
|
|
122
|
+
`worker.py` to `/tmp/worker.new.py`, then `diff` it against the file the build
|
|
123
|
+
context actually uses (from step 0). Expect **only** the
|
|
124
|
+
TENSE/MODALITY/ATTRIBUTION/IDENTITY prompt additions. If other diffs appear,
|
|
125
|
+
STOP and reconcile.
|
|
126
|
+
2. Replace that worker.py with the #126 version (keep a timestamped backup for
|
|
127
|
+
the Rollback worker-revert).
|
|
128
|
+
3. **Set `CASCADE_ENABLED=false`** (§0.1) so the re-distill is teacher-only.
|
|
129
|
+
4. `cd /opt/engine-v2 && sudo docker compose up -d --build extractor-async`.
|
|
130
|
+
5. **Verify**: startup log prints a NEW `prompt_hash` (≠ `f1e0ff554f708d05`)
|
|
131
|
+
AND shows the cascade is OFF (no "cascade ENABLED — student-primary" line);
|
|
132
|
+
`grep -c MODALITY` in the running container's worker.py > 0; a few fresh
|
|
133
|
+
completions look sane.
|
|
134
|
+
|
|
135
|
+
## 3. Pilot re-distill (~100 events) — GATE
|
|
136
|
+
|
|
137
|
+
1. Pilot set: the known-bad source events (Will Vickers, Catherine Hayes, Johann
|
|
138
|
+
Boedecker, Katrin) + ~80 random `pentatonic-team` events.
|
|
139
|
+
2. Scoped clear + re-enqueue (note the `events` join — no `arena` on the queue;
|
|
140
|
+
reset `attempts` so the rows are claim-eligible — eligibility is
|
|
141
|
+
`attempts < MAX_ATTEMPTS`):
|
|
142
|
+
```sql
|
|
143
|
+
-- clear old facts for the pilot events. NB provenance_event_ids[1] catches
|
|
144
|
+
-- facts whose FIRST source is a pilot event; a fact corroborated by (but not
|
|
145
|
+
-- first-seen from) a pilot event is missed — acceptable for a pilot, exact
|
|
146
|
+
-- scope comes in §4's whole-arena delete.
|
|
147
|
+
DELETE FROM facts f
|
|
148
|
+
WHERE f.arena LIKE 'pentatonic-team%'
|
|
149
|
+
AND f.provenance_event_ids[1] = ANY(:pilot_event_ids);
|
|
150
|
+
-- re-enqueue (queue has no arena; scope by event_id set; reset attempts)
|
|
151
|
+
UPDATE distillation_queue
|
|
152
|
+
SET status='pending', claimed_by=NULL, claimed_at=NULL,
|
|
153
|
+
claim_expires_at=NULL, attempts=0
|
|
154
|
+
WHERE event_id = ANY(:pilot_event_ids);
|
|
155
|
+
```
|
|
156
|
+
(Cascade is OFF per §0.1, so these run on the #126 teacher — the pilot
|
|
157
|
+
actually tests the fix, not the stale student.)
|
|
158
|
+
3. Let the fleet drain. **Audit** the pilot entities with the validation harness:
|
|
159
|
+
modality/attribution rate should fall toward ~0; "attended/is" → "is scheduled
|
|
160
|
+
to / plans to" or dropped. Vickers "Board Observer" should be modal/absent.
|
|
161
|
+
4. **🚦 GATE: report the audit; require an explicit human go before §4.**
|
|
162
|
+
|
|
163
|
+
## 4. Full re-distill (only after the pilot passes + go)
|
|
164
|
+
|
|
165
|
+
1. **Snapshot** (rollback point), off-box — Postgres AND vector state:
|
|
166
|
+
```bash
|
|
167
|
+
pg_dump ... -t facts -t entities -t relationships -t vector_provenance \
|
|
168
|
+
--where "arena LIKE 'pentatonic-team%'" > pt_graph_pre_redistill_2026-06-22.sql
|
|
169
|
+
# also snapshot/record the Qdrant evidence points for the arena (or a full
|
|
170
|
+
# collection snapshot) so §Rollback can restore the index, not just PG.
|
|
171
|
+
```
|
|
172
|
+
2. Clear (keep entities — Fusion + projection sweeps rebuild downstream) +
|
|
173
|
+
purge the orphaned vectors (§0.2):
|
|
174
|
+
```sql
|
|
175
|
+
DELETE FROM facts WHERE arena LIKE 'pentatonic-team%';
|
|
176
|
+
DELETE FROM relationships WHERE arena LIKE 'pentatonic-team%';
|
|
177
|
+
-- + delete the corresponding Qdrant points and vector_provenance rows for
|
|
178
|
+
-- the arena (see §0.2 — confirm the keying first).
|
|
179
|
+
```
|
|
180
|
+
Note: entities are retained, so old-prompt **junk entities** the new prompt
|
|
181
|
+
won't re-create (e.g. the "Pentatonic GmbH" footer affiliation) will persist
|
|
182
|
+
until Fusion decay/eviction clears them — flag for a post-run entity sweep if
|
|
183
|
+
the audit still spots them.
|
|
184
|
+
3. Re-enqueue in **bounded batches** (§0.3 — do NOT flip all 273k at once, or
|
|
185
|
+
live ingest for all tenants starves behind the low-id backlog). First confirm
|
|
186
|
+
completeness, then drip:
|
|
187
|
+
```sql
|
|
188
|
+
-- completeness check: how many arena events have a queue row vs not?
|
|
189
|
+
SELECT count(*) FILTER (WHERE q.id IS NOT NULL) AS have_row,
|
|
190
|
+
count(*) FILTER (WHERE q.id IS NULL) AS missing
|
|
191
|
+
FROM events e LEFT JOIN distillation_queue q ON q.event_id = e.id
|
|
192
|
+
WHERE e.arena LIKE 'pentatonic-team%';
|
|
193
|
+
-- for events WITH a row: re-enqueue a batch (reset attempts), repeat as the
|
|
194
|
+
-- pending re-distill backlog drains below a cap (e.g. 3k):
|
|
195
|
+
UPDATE distillation_queue q
|
|
196
|
+
SET status='pending', claimed_by=NULL, claimed_at=NULL,
|
|
197
|
+
claim_expires_at=NULL, attempts=0
|
|
198
|
+
FROM events e
|
|
199
|
+
WHERE e.id = q.event_id AND e.arena LIKE 'pentatonic-team%'
|
|
200
|
+
AND q.event_id IN (:next_batch_event_ids);
|
|
201
|
+
-- for events MISSING a row (pruned post-done): re-insert from events.
|
|
202
|
+
INSERT INTO distillation_queue (event_id, status, enqueued_at, attempts)
|
|
203
|
+
SELECT e.id, 'pending', now(), 0 FROM events e
|
|
204
|
+
WHERE e.arena LIKE 'pentatonic-team%'
|
|
205
|
+
AND NOT EXISTS (SELECT 1 FROM distillation_queue q WHERE q.event_id = e.id)
|
|
206
|
+
AND e.id IN (:next_batch_event_ids);
|
|
207
|
+
```
|
|
208
|
+
**Assert** the total re-enqueued (UPDATE + INSERT) across all batches == the
|
|
209
|
+
arena event count (~272,893) before declaring the enqueue complete.
|
|
210
|
+
4. **Scale the fleet** for ~273k events, **teacher-only** (cascade off → the
|
|
211
|
+
teacher carries 100%, not the usual ~25%). Single-worker ~60/min ⇒ ~76h;
|
|
212
|
+
`extractor-async-2/-3` + `distiller-autoscale.sh` help, BUT **g6e capacity is
|
|
213
|
+
currently severe** — this week we could not launch *or restart a stopped*
|
|
214
|
+
g6e. So: do **not** let the autoscaler scale the fleet to zero mid-run (a
|
|
215
|
+
stopped box may not come back); treat the currently-running L40S boxes as the
|
|
216
|
+
ceiling and the 76h as optimistic. Monitor `distillation_queue` depth → 0,
|
|
217
|
+
GPU, disk, and the `failed`/`ReadTimeout` rate (teacher ~20–49s/call).
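The bounded drip in step 3 can be sketched as a small planner. This is a minimal illustration under stated assumptions, not the deployed tooling: `plan_batches`, `next_batch_size`, and `assert_enqueue_complete` are hypothetical helper names, and the default cap of 3,000 is only the example figure from the SQL comment in step 3.

```python
from typing import Iterator, List, Sequence


def plan_batches(event_ids: Sequence[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of event_ids, each at most batch_size long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(event_ids), batch_size):
        yield list(event_ids[start:start + batch_size])


def next_batch_size(pending_backlog: int, cap: int = 3000) -> int:
    """Events that can be enqueued now without pushing the backlog past cap."""
    return max(0, cap - pending_backlog)


def assert_enqueue_complete(updated: int, inserted: int, arena_event_count: int) -> None:
    """Raise if UPDATE + INSERT row counts do not add up to the arena event count."""
    total = updated + inserted
    if total != arena_event_count:
        raise AssertionError(
            f"re-enqueued {total} events, expected {arena_event_count}"
        )
```

Each yielded batch would be bound to `:next_batch_event_ids`; the running sums of rows affected by the UPDATE and the INSERT feed the final assertion.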
|
|
218
|
+
|
|
219
|
+
## 5. Downstream reconciliation
|
|
220
|
+
|
|
221
|
+
1. **Fusion Drive** re-run for the arena (de-dup; run AFTER extraction settles) —
|
|
222
|
+
`backfill_entity_reconciliation.py` / `fusion-drive-*.sh`.
|
|
223
|
+
2. **Projection re-fold** (Seesa side): org/person projection sweeps re-fold from
|
|
224
|
+
the corrected graph — the Seesa **SEE-184** watermark sweep (now live, 03:45
|
|
225
|
+
UTC) detects the advanced `graphAsOf` and marks projections stale → SEE-168
|
|
226
|
+
refresh re-folds. Worker D1 read-models inherit the heal.
|
|
227
|
+
3. **Unblocks Seesa SEE-183** — the content-sensitivity retag (forward detection
|
|
228
|
+
already live) was deliberately held until this re-distill completes, to avoid
|
|
229
|
+
piling re-emitted events onto the shared queue mid-run.
|
|
230
|
+
4. **Re-enable the cascade** (§0.1): `CASCADE_ENABLED=true` + restart
|
|
231
|
+
extractor-async; confirm "cascade ENABLED — student-primary" returns. Then
|
|
232
|
+
watch student↔teacher agreement (the student is now `f1e0ff`-stale vs the
|
|
233
|
+
#126 teacher — schedule a student refresh if drift shows).
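The plan does not say how student↔teacher agreement is measured. One possible sketch, assuming each model's output for an event can be reduced to a set of comparable fact keys, is a Jaccard overlap per event. The `agreement` and `is_drifting` names and the 0.8 floor are hypothetical, chosen only for illustration.

```python
from typing import Hashable, Iterable, Set


def agreement(student: Set[Hashable], teacher: Set[Hashable]) -> float:
    """Jaccard overlap of two fact-key sets; two empty sets count as full agreement."""
    union = student | teacher
    if not union:
        return 1.0
    return len(student & teacher) / len(union)


def is_drifting(scores: Iterable[float], floor: float = 0.8) -> bool:
    """True when the mean per-event agreement falls below the (assumed) floor."""
    values = list(scores)
    if not values:
        return False
    return sum(values) / len(values) < floor
```

A sustained `is_drifting(...)` result would be the signal to schedule the student refresh mentioned in step 4.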
|
|
234
|
+
|
|
235
|
+
## Validation gate (before declaring done)
|
|
236
|
+
Re-run the audit harness over a fresh 30–50 entity sample → **bad-fact rate
|
|
237
|
+
< 3%** (from ~18.7%). Spot the canonical failures: Vickers "Board Observer"
|
|
238
|
+
(modal/absent), Catherine "will send / to organise" (→ commitments), Matvii
|
|
239
|
+
"Pentatonic GmbH" footer affiliation (gone), Sebastian conflation (split/keyed).
|
|
240
|
+
|
|
241
|
+
## Rollback
|
|
242
|
+
Restore `facts`/`relationships`/`vector_provenance` for `pentatonic-team%` from
|
|
243
|
+
the §4.1 snapshot AND restore the Qdrant `evidence` points (new-prompt
|
|
244
|
+
extractions are content-hash-distinct, so a restore is clean — but PG and Qdrant
|
|
245
|
+
must be rolled back *together* or search desyncs). Worker revert = redeploy the
|
|
246
|
+
prior worker.py (the §2.2 backup) + rebuild. Cascade: set `CASCADE_ENABLED=true`
|
|
247
|
+
back if it was changed.
|
|
248
|
+
|
|
249
|
+
## Open questions for Phil
|
|
250
|
+
1. **Durable SDK release / source of truth.** Resolve in §2.0: is the deployed
|
|
251
|
+
worker built from a local copy, or from the S3/npm SDK install? If the
|
|
252
|
+
lockfile/`npm install` path is authoritative, a future redeploy reverts #126
|
|
253
|
+
unless 0.10.19 is published there + the `/opt/engine-v2` pin bumped. (S3
|
|
254
|
+
`sdk/` has tarballs to 0.10.18; the lockfile root version was stale at 0.10.1
|
|
255
|
+
— confirm the real install path before relying on either.)
|
|
256
|
+
2. **Fleet sizing** for the **teacher-only** 273k run — given g6e capacity is
|
|
257
|
+
currently unreliable (can't restart stopped boxes), is the running L40S fleet
|
|
258
|
+
enough, or do we need reserved/alt-region capacity (or to accept a longer
|
|
259
|
+
wall-clock)?
|
|
260
|
+
3. ~~Re-enqueue completeness~~ → **resolved in §4.3** (completeness check +
|
|
261
|
+
INSERT-from-events fallback + count assertion).
|
|
262
|
+
4. Run **off-peak / batched**? §0.3 + §4.3 make it batched to protect live
|
|
263
|
+
ingest; still worth starting off-peak given the multi-day duration.
|
|
264
|
+
|
|
265
|
+
## Authorization note (Seesa-side / Claude Code)
|
|
266
|
+
The auto-mode Bash classifier blocks these shared-prod writes (SSM/ec2/psql) even
|
|
267
|
+
with verbal authority; they must run via the `!` prefix or a settings permission
|
|
268
|
+
rule. Read-only SSM queries are fine. The full DELETE (§4) will be kept behind an
|
|
269
|
+
explicit human go-ahead regardless of any rule.
|
|
@@ -0,0 +1,101 @@
|
|
|
1
|
+
# Re-distill plan — pentatonic-team, post modality/attribution fix (2026-06-21)
|
|
2
|
+
|
|
3
|
+
**Why.** A fact-by-fact audit of `org_model` (49 entities, 434 facts) found **~18.7%
|
|
4
|
+
of distilled facts wrong**, dominated by *modality collapse* (future/scheduled/
|
|
5
|
+
planned content asserted as established fact), plus attribution errors (unauthored
|
|
6
|
+
docs → person, attendee → organiser, org activity → person) and a few same-name
|
|
7
|
+
conflations. The teacher prompt is fixed in **ai-agent-sdk PR #126** (TENSE &
|
|
8
|
+
MODALITY / ATTRIBUTION FIDELITY / IDENTITY rules). The prompt only governs
|
|
9
|
+
*future* extractions, so the historical graph must be **re-distilled** to inherit
|
|
10
|
+
the fix.
|
|
11
|
+
|
|
12
|
+
**Already done (2026-06-21):** a targeted cleanup retracted **229** facts whose
|
|
13
|
+
provenance source event is dated in the future (`emitted_at > now()`) and whose
|
|
14
|
+
category is established (`state/mention/decision`), scoped to `pentatonic-team`.
|
|
15
|
+
Future *commitments* (29) were spared; the frozen `pip-agents` legacy was
|
|
16
|
+
untouched. ~11 "attends standup"-class facts (future *content* but a past-dated
|
|
17
|
+
source event) are NOT deterministically catchable and remain for this re-distill.
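The deterministic rule behind that cleanup can be sketched as a predicate. This is an illustration of the rule as described above, not the script that was run; `is_retractable` is a hypothetical name, and only the `emitted_at` comparison and the three category names come from the text.

```python
from datetime import datetime, timedelta, timezone

# Categories that assert something as already established.
ESTABLISHED = {"state", "mention", "decision"}


def is_retractable(category: str, emitted_at: datetime, now: datetime) -> bool:
    """True when an established-category fact cites a source event dated in the future."""
    return category in ESTABLISHED and emitted_at > now


now = datetime(2026, 6, 21, tzinfo=timezone.utc)
future = now + timedelta(days=3)
past = now - timedelta(days=3)
```

A fact whose *content* is about the future but whose source event is past-dated returns `False` here, which is why the "attends standup"-class facts need the re-distill rather than this filter.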
|
|
18
|
+
|
|
19
|
+
---
|
|
20
|
+
|
|
21
|
+
## Scope & guardrails
|
|
22
|
+
- **Arena: `pentatonic-team%` ONLY.** `org_model` is multi-tenant — `pip-agents`
|
|
23
|
+
(~246k facts) is LEGACY/frozen and must never be touched. Every statement here
|
|
24
|
+
carries `WHERE arena LIKE 'pentatonic-team%'`.
|
|
25
|
+
- **Idempotency caveat (load-bearing).** `worker.py` IDs entities/facts/rels by
|
|
26
|
+
content-hash, so re-running the *same* prompt converges. But the **prompt
|
|
27
|
+
changed** (`SYSTEM_PROMPT_HASH` is new), so re-extraction yields *different*
|
|
28
|
+
facts with *different* IDs — the old wrong facts will **coexist** unless
|
|
29
|
+
cleared. ⇒ the re-distill MUST delete the existing pentatonic-team facts for
|
|
30
|
+
each re-processed event before/with re-extraction.
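The coexistence problem can be illustrated with a toy content-hash ID. The real ID scheme in `worker.py` is not shown in this plan, so `fact_id` and its field layout are assumptions made purely to demonstrate the behaviour.

```python
import hashlib


def fact_id(arena: str, entity: str, statement: str) -> str:
    """Hypothetical content-hash ID: identical content always maps to the same ID."""
    payload = "\x1f".join((arena, entity, statement)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


old = fact_id("pentatonic-team", "person:example", "attended the board meeting")
same = fact_id("pentatonic-team", "person:example", "attended the board meeting")
new = fact_id("pentatonic-team", "person:example", "is scheduled to attend the board meeting")

# Keyed by ID, the reworded fact is a second row; it does not overwrite the old one.
store = {old: "old-prompt fact", new: "new-prompt fact"}
```

Re-running the same prompt reproduces `old` (an upsert, hence convergence), while the reworded extraction lands under a new key next to it. That is why the old rows have to be deleted explicitly.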
|
|
31
|
+
|
|
32
|
+
## Prerequisites
|
|
33
|
+
1. **Merge PR #126** and **deploy the new distiller** to the extractor fleet
|
|
34
|
+
(the L4 box running `extractor-async/worker.py`). Confirm the running worker's
|
|
35
|
+
`SYSTEM_PROMPT_HASH` matches the new prompt (it's logged on the trace line).
|
|
36
|
+
2. **Snapshot** (rollback point):
|
|
37
|
+
a filtered `COPY (SELECT … WHERE arena LIKE 'pentatonic-team%') TO …` per table
(`facts`, `entities`, `relationships`). `pg_dump` has no row-level `--where`
option, so it cannot scope a dump to one arena. Store off-box.
|
|
39
|
+
|
|
40
|
+
## Procedure
|
|
41
|
+
|
|
42
|
+
### Stage 0 — pilot (≈100 events incl. the known-bad ones)
|
|
43
|
+
1. Pick a pilot set: the Will Vickers / Catherine Hayes / Johann / Katrin source
|
|
44
|
+
events + a random ~80 pentatonic-team events.
|
|
45
|
+
2. Delete existing facts for those events (scoped), then re-enqueue:
|
|
46
|
+
```sql
|
|
47
|
+
-- clear old facts for the pilot events
|
|
48
|
+
DELETE FROM facts f
|
|
49
|
+
WHERE f.arena LIKE 'pentatonic-team%'
|
|
50
|
+
AND f.provenance_event_ids[1] = ANY(:pilot_event_ids);
|
|
51
|
+
-- re-enqueue them
|
|
52
|
+
UPDATE distillation_queue
|
|
53
|
+
SET status='pending', claimed_by=NULL, claim_expires_at=NULL
|
|
54
|
+
WHERE arena LIKE 'pentatonic-team%'
|
|
55
|
+
AND event_id = ANY(:pilot_event_ids);
|
|
56
|
+
```
|
|
57
|
+
3. Let the fleet drain the queue. **Validate**: re-run the audit harness over the
|
|
58
|
+
pilot entities; the modality/attribution error rate should fall toward ~0. Eyeball
|
|
59
|
+
the Vickers/Katrin facts — "attended/is" should become "is scheduled to /
|
|
60
|
+
plans to" (or drop).
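Assembling the pilot set from step 1 might look like the sketch below. It assumes event IDs are integers and that the known-bad IDs have already been looked up; `pick_pilot_events` and its parameters are hypothetical names.

```python
import random
from typing import Iterable, List, Optional


def pick_pilot_events(
    known_bad: Iterable[int],
    arena_event_ids: Iterable[int],
    n_random: int = 80,
    seed: Optional[int] = None,
) -> List[int]:
    """Return the known-bad events plus a random sample of other arena events."""
    known = sorted(set(known_bad))
    # Exclude the known-bad events so the sample adds n_random distinct events.
    pool = sorted(set(arena_event_ids) - set(known))
    rng = random.Random(seed)
    sampled = rng.sample(pool, min(n_random, len(pool)))
    return known + sorted(sampled)
```

Passing a fixed `seed` makes the pilot reproducible, so the same events can be re-audited after the run.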
|
|
61
|
+
|
|
62
|
+
### Stage 1 — full pentatonic-team re-distill (only if the pilot passes)
|
|
63
|
+
1. Clear pentatonic-team facts (keep entities; Fusion + the projection sweep
|
|
64
|
+
rebuild downstream). Relationships likewise.
|
|
65
|
+
```sql
|
|
66
|
+
DELETE FROM facts WHERE arena LIKE 'pentatonic-team%';
|
|
67
|
+
DELETE FROM relationships WHERE arena LIKE 'pentatonic-team%';
|
|
68
|
+
```
|
|
69
|
+
2. Re-enqueue every pentatonic-team event:
|
|
70
|
+
```sql
|
|
71
|
+
UPDATE distillation_queue
|
|
72
|
+
SET status='pending', claimed_by=NULL, claim_expires_at=NULL
|
|
73
|
+
WHERE arena LIKE 'pentatonic-team%';
|
|
74
|
+
```
|
|
75
|
+
(If queue rows were pruned post-done, re-insert from `events` for the arena.)
|
|
76
|
+
3. Scale the fleet for the backlog; monitor `distillation_queue` depth until 0.
|
|
77
|
+
|
|
78
|
+
### Stage 2 — downstream reconciliation
|
|
79
|
+
1. **Fusion Drive** re-run for the arena (consolidate the freshly-extracted
|
|
80
|
+
nodes/facts) — it's a de-duplicator, so run AFTER extraction settles.
|
|
81
|
+
`backfill_entity_reconciliation.py` covers the entity side.
|
|
82
|
+
2. **Projection re-fold** (Seesa side): the org/person projection sweeps re-fold
|
|
83
|
+
from the corrected graph (SEE-168 nightly sweep does this automatically; or
|
|
84
|
+
trigger manually). The Worker D1 read-models then inherit the heal.
|
|
85
|
+
|
|
86
|
+
## Validation (gate before declaring done)
|
|
87
|
+
- Re-run the audit harness over a fresh 30–50 entity sample → **bad-fact rate
|
|
88
|
+
target < 3%** (from ~18.7%).
|
|
89
|
+
- Spot-check the canonical failures: Vickers "Board Observer" (should be modal/absent),
|
|
90
|
+
Catherine's "will send / to organise" (→ commitments), Matvii's "Pentatonic
|
|
91
|
+
GmbH" footer affiliation (gone), Sebastian conflation (split or correctly keyed).
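The gate itself is simple arithmetic over the audit verdicts. A minimal sketch, assuming the audit harness yields a count of bad facts and a count of facts reviewed (`bad_fact_rate` and `passes_gate` are hypothetical names; the 3% threshold is the target stated above):

```python
def bad_fact_rate(bad: int, total: int) -> float:
    """Fraction of audited facts judged wrong."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= bad <= total:
        raise ValueError("bad must be between 0 and total")
    return bad / total


def passes_gate(bad: int, total: int, threshold: float = 0.03) -> bool:
    """True when the bad-fact rate is strictly below the threshold."""
    return bad_fact_rate(bad, total) < threshold
```

For example, 11 bad facts out of 400 audited (2.75%) would pass, while 20 out of 400 (5%) would not; these counts are illustrative, not audit results.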
|
|
92
|
+
|
|
93
|
+
## Rollback
|
|
94
|
+
Restore `facts`/`relationships` for `pentatonic-team%` from the Prerequisites (step 2) snapshot;
|
|
95
|
+
the new-prompt extractions are content-hash-distinct so a restore is clean.
|
|
96
|
+
|
|
97
|
+
## Cost / time
|
|
98
|
+
~163k pentatonic-team events × 7B extraction on the L4 fleet. Bounded by fleet
|
|
99
|
+
size × per-batch latency; run off-peak, monitor GPU/disk (prior outage: v2 wrote
|
|
100
|
+
to a 30 GB root — watch disk). Pilot first to estimate throughput before the full
|
|
101
|
+
run.
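Extrapolating wall-clock time from the pilot can be sketched as follows. The helper names are hypothetical, and the estimate assumes throughput scales linearly with worker count, which shared GPU or teacher capacity may not allow in practice.

```python
def events_per_minute(events_drained: int, elapsed_minutes: float) -> float:
    """Observed throughput of the pilot."""
    if elapsed_minutes <= 0:
        raise ValueError("elapsed_minutes must be positive")
    return events_drained / elapsed_minutes


def eta_hours(total_events: int, rate_per_worker: float, workers: int = 1) -> float:
    """Projected hours to drain total_events, assuming linear scaling across workers."""
    if rate_per_worker <= 0 or workers <= 0:
        raise ValueError("rate_per_worker and workers must be positive")
    return total_events / (rate_per_worker * workers) / 60.0
```

With illustrative numbers (not measurements), a 100-event pilot that drains in 2 minutes on one worker gives 50 events/min, so ~163k events would take roughly 54 hours on a single worker and about a quarter of that on four.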
|