@pentatonic-ai/ai-agent-sdk 0.10.18 → 0.10.20

package/dist/index.cjs CHANGED
@@ -878,7 +878,7 @@ function fireAndForgetEmit(clientConfig, sessionOpts, messages, result, model) {
  }

  // src/telemetry.js
- var VERSION = "0.10.18";
+ var VERSION = "0.10.20";
  var TELEMETRY_URL = "https://sdk-telemetry.philip-134.workers.dev";
  function machineId() {
    const raw = typeof process !== "undefined" ? `${process.env?.USER || process.env?.USERNAME || "u"}:${process.platform || "x"}:${process.arch || "x"}` : "browser";
package/dist/index.js CHANGED
@@ -847,7 +847,7 @@ function fireAndForgetEmit(clientConfig, sessionOpts, messages, result, model) {
  }

  // src/telemetry.js
- var VERSION = "0.10.18";
+ var VERSION = "0.10.20";
  var TELEMETRY_URL = "https://sdk-telemetry.philip-134.workers.dev";
  function machineId() {
    const raw = typeof process !== "undefined" ? `${process.env?.USER || process.env?.USERNAME || "u"}:${process.platform || "x"}:${process.arch || "x"}` : "browser";
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@pentatonic-ai/ai-agent-sdk",
-   "version": "0.10.18",
+   "version": "0.10.20",
    "description": "TES SDK — LLM observability and lifecycle tracking via Pentatonic Thing Event System. Track token usage, tool calls, and conversations. Manage things through event-sourced lifecycle stages with AI enrichment and vector search.",
    "type": "module",
    "main": "./dist/index.cjs",
@@ -1,10 +1,18 @@
1
1
  # RFC: the Fusion Drive — v2 memory self-healing (cross-run node fusion + decay)
2
2
 
3
3
  > **Fusion Drive** = the continuous, arena-scoped background engine that keeps the v2
4
- > memory graph self-healing: it *fuses* duplicate/near-duplicate nodes from different
5
- > distillation runs into a single master node (horizontal convergence) and *decays* stale,
6
- > low-value, and junk nodes out of existence (vertical aging). Named for the drive that
7
- > does the fusing; the decay pass rides the same engine.
4
+ > memory graph self-healing. It triages every node into one of **three** outcomes:
5
+ > it *fuses* duplicate/near-duplicate nodes from different distillation runs into a single
6
+ > master node (horizontal convergence); it *re-distills* high-value extractions produced by
7
+ > a superseded teacher/prompt, regenerating them from the still-present source event through
8
+ > the current clean teacher (depth refresh); and it *decays* stale, low-value, and junk nodes
9
+ > out of existence (vertical aging). Named for the drive that does the fusing — the re-distill
10
+ > and decay passes ride the same engine.
11
+ >
12
+ > *(Revised 2026-06-22: added Part B′ — re-distillation — as the third triage verb, with the
13
+ > prompt-version-drift trigger. Motivated by the clean-prompt deploy (SDK 0.10.19, #126 +
14
+ > #129) which made "the current teacher is materially better than the one that produced most
15
+ > of the graph" concrete and measurable via `system_prompt_hash`.)*
8
16
 
9
17
  **Status:** draft / spec — 2026-06-12
10
18
  **Builds on:** `RFC-entity-reconciliation.md`, `scripts/entity_resolution_v2.py` (#82),
@@ -139,15 +147,101 @@ sparse backfill.
139
147
 
140
148
  ---
141
149
 
150
+ ## Part B′ — Re-distillation: regenerate stale-prompt extractions from source
151
+
152
+ Fusion (A) needs a *correct counterpart* to converge toward; Decay (B) just *deletes*. But
153
+ the common case after a teacher/prompt upgrade is a **high-value node with no correct
154
+ counterpart yet** — the only extraction that exists is the stale-prompt one. Fusion has
155
+ nothing to fuse to; decay would throw away real information. The cure is the third verb: the
156
+ **source event still exists** (`events` table, 376k rows live), so regenerate the extraction
157
+ by re-running that event through the *current clean teacher*. Fusion converges horizontally,
158
+ decay ages vertically; re-distill refreshes **in depth**.
159
+
160
+ ### B′1. Trigger — prompt-version drift, not raw age
161
+ The defect population is *exactly* the facts/entities whose provenance traces an **old
162
+ `system_prompt_hash`** — `bbdaba6b…` / `f1e0ff55…` / `ef0647c7…` (pre-clean), vs the clean
163
+ `6ccfe70f…` deployed with 0.10.19 (#126 modality/attribution + #129 email-discipline &
164
+ entity-separation). #118 propagated source onto facts, so provenance → the event's
165
+ `distillation_traces.system_prompt_hash` is queryable. **Age is a weak proxy; prompt-version
166
+ selects the defect set directly** — a months-old node the clean teacher would extract
167
+ identically needs nothing; a two-day-old node from the dirty prompt is a defect. Prioritize
168
+ by `salience` (B1) so high-value stale nodes go first.
169
+
170
+ ### B′2. Triage routing — 3-way, by salience × prompt-version
171
+ Per assessed node/event:
172
+
173
+ | condition | outcome |
174
+ |---|---|
175
+ | stale prompt-hash **+** high salience **+** source event present | **re-distill** (this part) |
176
+ | has a correct newer-teacher counterpart in the arena | **fuse** (Part A) |
177
+ | low salience, junk-born (B2), no corroboration, never accessed | **decay** (Part B) |
178
+
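Reviewer sketch (not part of the diff): the 3-way routing in the B′2 table can be read as a small decision function. The field names, the `HIGH_SALIENCE` cut-off, and the extra `keep` outcome for nodes matching no row are all assumptions made for illustration; the precedence (fuse whenever a clean counterpart exists, re-distill only when none does) follows the Part B′ prose rather than anything the RFC states as a rule.

```python
from dataclasses import dataclass

# Hypothetical names and values throughout; none are taken from the codebase.
CLEAN_PROMPT_HASH_PREFIX = "6ccfe70f"  # prefix of the clean hash quoted in B′1
HIGH_SALIENCE = 0.7                    # assumed cut-off; the RFC leaves it open


@dataclass
class NodeAssessment:
    prompt_hash: str             # hash (or prefix) of the prompt that produced the node
    salience: float
    source_event_present: bool
    has_clean_counterpart: bool  # a correct newer-teacher node exists in the arena
    junk_born: bool
    corroborated: bool
    ever_accessed: bool


def route(node: NodeAssessment) -> str:
    """Return 'fuse', 're-distill', 'decay', or 'keep' for one assessed node."""
    stale = not node.prompt_hash.startswith(CLEAN_PROMPT_HASH_PREFIX)
    # A correct counterpart already exists: converge toward it (Part A).
    if node.has_clean_counterpart:
        return "fuse"
    # High-value, stale-prompt, and the source event can be replayed (Part B′).
    if stale and node.salience >= HIGH_SALIENCE and node.source_event_present:
        return "re-distill"
    # Low-value junk with no corroboration and no reads (Part B).
    if (node.salience < HIGH_SALIENCE and node.junk_born
            and not node.corroborated and not node.ever_accessed):
        return "decay"
    # Assumed fallback: leave the node alone this pass.
    return "keep"
```

A node from the clean prompt with no counterpart falls through to `keep`, which matches the B′1 point that age alone is not a trigger.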
179
+ ### B′3. Mechanism — re-enqueue, don't mutate in place
180
+ Re-distill = re-insert the source `event_id` into `distillation_queue` (`status='pending'`,
181
+ `attempts=0`). The existing **extractor-async** worker claims it, runs the clean teacher,
182
+ writes the new extraction **and a fresh `6ccfe70f` trace**. No new pipeline — it reuses the
183
+ distiller, the combined-demand **autoscaler**, and the trace ledger. (Re-distill is a
184
+ *producer* of queue demand; the autoscaler's student-aware floor already keeps a teacher box
185
+ warm for it — see the deploy notes.)
186
+
187
+ ### B′4. Supersedence — the load-bearing requirement
188
+ The store is **pure-accretion** (the whole motivation of this RFC). A naive re-enqueue makes
189
+ the clean extraction land **beside** the dirty one → it *worsens* fragmentation. So
190
+ re-distill MUST close the loop through Fusion's tombstone machinery — it is **sequenced into
191
+ the Fusion Drive, not bolted on**:
192
+
193
+ 1. Each re-distill is recorded in a `redistill_runs` ledger with its triggering
194
+ `(event_id, old_prompt_hash)`.
195
+ 2. When the clean extraction completes, **Fusion converges old ↔ new for that event** using
196
+ the teacher-version master signal (A2/A3): the new `6ccfe70f` extraction wins as master;
197
+ the old extraction's now-orphaned nodes (those whose **only** provenance was this event
198
+ under the old hash) are tombstoned/repointed via `entity_merges` / `fact_merges`.
199
+ 3. Where an old node carries **other live provenance** (multi-event corroboration), only this
200
+ event's contribution is repointed — **never blind-delete a multi-source node** (the
201
+ over-merge failure mode: a hotel email wrongly attached to a person must not let one
202
+ event's repoint nuke an otherwise-corroborated node).
203
+
204
+ This dependency is hard: **re-distill is unsafe until Fusion's cross-run / teacher-version
205
+ master selection (E3) is live.** Until then a re-distill loop accretes. An interim cheaper
206
+ option (Open Q): explicit **event-scoped supersede** — delete only the facts/entities whose
207
+ provenance set is exactly `{this event}` under the old hash before re-enqueue — covers the
208
+ single-provenance majority without the full fusion adjudicator.
209
+
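Reviewer sketch (not part of the diff): the interim event-scoped supersede described in B′4 amounts to partitioning the nodes touched by one event into "provenance is exactly this event" versus "has other live provenance". The dict keys below are hypothetical, and storing a `prompt_hash` directly on each node is an assumption; per Part D the real lookup may go through `distillation_traces` instead.

```python
def split_for_supersede(nodes, event_id, old_hash):
    """Partition nodes affected by re-distilling `event_id`.

    `nodes` is an iterable of dicts with assumed keys:
    'id', 'provenance_event_ids' (list), 'prompt_hash'.
    Returns (delete_ids, repoint_ids).
    """
    delete_ids, repoint_ids = [], []
    for node in nodes:
        provenance = set(node["provenance_event_ids"])
        if event_id not in provenance or node["prompt_hash"] != old_hash:
            continue  # not produced from this event under the old prompt
        if provenance == {event_id}:
            # Single-provenance: safe to supersede before re-enqueue.
            delete_ids.append(node["id"])
        else:
            # Multi-source: only this event's contribution may be repointed.
            repoint_ids.append(node["id"])
    return delete_ids, repoint_ids
```

Only `delete_ids` is covered by the interim option; anything in `repoint_ids` still needs the fusion adjudicator, which is the multi-provenance tail the Open Questions call out.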
210
+ ### B′5. Corpus-as-byproduct — one loop, three wins
211
+ Every re-distill emits a clean `6ccfe70f` `distillation_trace`. A prompt-version-drift
212
+ re-distill loop therefore **builds the student retrain corpus while it repairs the graph**
213
+ (`scripts/build_retrain_corpus.py` consumes those traces). It subsumes the one-shot full
214
+ re-distill: gradual, rate-limited, no nuke — graph repair **+** corpus **+** self-healing
215
+ from a single engine. This is the durable answer to "is the corpus building?": it is, as a
216
+ side effect of the gardener.
217
+
218
+ ### B′6. Cadence + cost + safety
219
+ Rolling, rate-limited, autoscaler-aware, off-peak. Budget *N* events/hour against teacher
220
+ capacity; order by `salience × staleness`. **Never big-bang the full backlog** — gradual
221
+ migration is the point. Arena-scoped, dry-run → `--apply`, `redistill_runs` ledger for
222
+ observability and rollback. Same operational shape as fusion/decay/autoscaler.
223
+
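Reviewer sketch (not part of the diff): B′6 asks for a budget of *N* events per hour ordered by `salience × staleness`. A minimal selection step might look like the following, assuming staleness is already available as a non-negative numeric score (the RFC does not define its unit) and that candidates arrive as plain tuples.

```python
def next_redistill_batch(candidates, budget):
    """Pick up to `budget` event ids, highest salience * staleness first.

    `candidates` is a list of (event_id, salience, staleness) tuples;
    the tuple shape and the staleness score are illustrative assumptions.
    """
    ranked = sorted(candidates, key=lambda c: c[1] * c[2], reverse=True)
    return [event_id for event_id, _, _ in ranked[:max(budget, 0)]]
```

Calling this once per hour with the current budget gives the rolling, rate-limited shape the section describes; it never returns more than the budget, so the full backlog cannot be enqueued in one pass.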
224
+ ---
225
+
142
226
  ## Part C — Ordering & how they combine
143
227
 
144
- Per arena, on schedule: **(1) fusion → (2) decay.** Fusion first so a master node absorbs
145
- its duplicates' provenance/salience *before* decay judges it (else a real node split across
146
- two weak dupes could wrongly decay out). Then decay ages + evicts the survivors.
228
+ Per arena, on schedule: **(1) triage → re-distill the high-value stale-prompt set (async via
229
+ the queue) → (2) fusion → (3) decay.** Re-distill is enqueued first so that by the time
230
+ fusion runs, the clean counterpart exists for it to crown as master (else fusion has only
231
+ stale renderings to choose between). Fusion then absorbs each master's duplicates'
232
+ provenance/salience *before* decay judges it (else a real node split across two weak dupes
233
+ could wrongly decay out). Then decay ages + evicts the survivors.
234
+
235
+ *(Re-distill is asynchronous — it completes on the teacher's schedule — so in practice a
236
+ node re-distilled in this pass is fused/decayed in the **next** per-arena pass, once its
237
+ clean trace + extraction have landed. The ledger links the two.)*
147
238
 
148
239
  **This is what finally cures immortal pollution:**
149
240
  - 7B polluted node *with* a correct Qwen3.6 counterpart → **fused**, correct one as master,
150
241
  polluted demoted to alias / tombstoned.
242
+ - stale-prompt node, *high-value*, *no* correct counterpart, source event present →
243
+ **re-distilled** through the clean teacher → new master extraction; old superseded via
244
+ fusion (B′4). The information is *recovered*, not lost.
151
245
  - 7B pure-junk node with *no* correct counterpart (numeric-ID-person, ungrounded) → born-low
152
246
  salience + no corroboration + never accessed → **decays out and is evicted**.
153
247
 
@@ -165,8 +259,15 @@ reset, but no longer the *only* path).
165
259
  - `relationships`: `+ salience REAL`, `+ last_accessed` (already has `weight`,
166
260
  `first/last_seen`).
167
261
  - new `fact_merges` audit (mirror `entity_merges` incl. `rollback_payload`).
168
- - new `fusion_runs` + `decay_runs` ledgers for observability.
262
+ - new `fusion_runs` + `decay_runs` + `redistill_runs` ledgers for observability. `redistill_runs`:
263
+ `(id, arena, event_id, old_prompt_hash, new_prompt_hash, salience_at_trigger, enqueued_at,
264
+ completed_at, fused_at, mode)` — links a re-distill to its triggering node and to the fusion
265
+ that superseded the old extraction.
169
266
  - `/search` gains a `last_accessed = NOW()` bump on returned nodes (batched).
267
+ - re-distill trigger needs provenance → prompt-version: either denormalize `system_prompt_hash`
268
+ onto `facts`/`entities` at write time (cheap filter), or join through
269
+ `distillation_traces(event_id → system_prompt_hash)` on the provenance event ids (no schema
270
+ change, costlier query). Prefer the join until the trigger volume justifies denormalizing.
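Reviewer sketch (not part of the diff): the `redistill_runs` columns listed above map naturally onto a small record type. The column names come from the RFC; the Python types, the `mode` values, and the `stage()` helper are assumptions added for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class RedistillRun:
    # Column names follow the RFC's tuple; types are guesses.
    id: int
    arena: str
    event_id: str
    old_prompt_hash: str
    new_prompt_hash: Optional[str]
    salience_at_trigger: float
    enqueued_at: datetime
    completed_at: Optional[datetime] = None  # set when the clean extraction lands
    fused_at: Optional[datetime] = None      # set when fusion supersedes the old one
    mode: str = "dry-run"                    # assumed values: "dry-run" or "apply"

    def stage(self) -> str:
        """Hypothetical helper: where this run sits in the enqueue/complete/fuse chain."""
        if self.fused_at is not None:
            return "fused"
        if self.completed_at is not None:
            return "completed"
        return "enqueued"
```

Rows stuck at `completed` with no `fused_at` would be the accretion case B′4 warns about, so a count by stage is a cheap health check.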
170
271
 
171
272
  ## Part E — Rollout (each flag-gated, arena-scoped, dry-run-first, audited)
172
273
 
@@ -176,6 +277,13 @@ reset, but no longer the *only* path).
176
277
  3. **Fusion extension** — scored canonical selection (fix typo-crowning) + cross-run
177
278
  detection + fact fusion, dry-run → apply.
178
279
  4. **Online/continuous** — wire fusion+decay to run after distillation per arena.
280
+ 5. **Re-distill loop (Part B′)** — dry-run triage first (count stale-prompt nodes by
281
+ `system_prompt_hash` × salience bucket to size the work), then a **bounded `--apply` slice**
282
+ on one curated arena (re-enqueue + verify clean trace + verify fusion supersedes the old
283
+ extraction), then wire continuous. **Gated on step 3** (Fusion cross-run / teacher-version
284
+ master selection): until that's live, re-distill must use the interim **event-scoped
285
+ supersede** (B′4) or it accretes. Ships as `scripts/redistill.py` (dry-run default,
286
+ `--apply` gate, arena-scoped, `redistill_runs` ledger).
179
287
 
180
288
  ## Open questions
181
289
  - Half-life constants per category — needs a calibration pass against real arenas.
@@ -183,3 +291,9 @@ reset, but no longer the *only* path).
183
291
  - Directory authority source for canonical anchoring — HubSpot contacts? a curated table?
184
292
  - Interaction with the (still-open) source_id supersede mode — fusion partly subsumes it,
185
293
  but explicit supersede is cheaper for known-mutable sources.
294
+ - **Re-distill supersedence before full fusion is live** — is event-scoped supersede (delete
295
+ only nodes whose provenance set is exactly `{this event}` under the old hash) a safe enough
296
+ interim, or do we hard-gate the loop on E3? Single-provenance nodes are the majority, but
297
+ the multi-provenance tail is where the over-merge risk concentrates.
298
+ - **Re-distill prioritization** — pure `salience × staleness`, or weight toward the entities
299
+ behind known user-visible confabulations (Vickers/Boedecker) first?
@@ -896,6 +896,8 @@ class GraphQueryRequest(BaseModel):
      entity_type: str | None = None
      name: str | None = None  # canonical_name (ILIKE)
      subject: str | None = None  # entity name OR canonical_name (facts.subject_entity)
+     subject_entity_id: str | None = None  # EXACT facts.subject_entity_id — strict, no name bleed
+     object_entity_id: str | None = None  # EXACT facts.object_entity_id
      predicate: str | None = None
      category: str | None = None  # facts.category
      from_name: str | None = None  # relationships.from_entity.canonical_name
@@ -944,9 +946,14 @@ async def list_entities(req: GraphQueryRequest):

  @app.post("/facts")
  async def list_facts(req: GraphQueryRequest):
-     """Filter facts by arena + optional category/predicate + optional
-     subject-entity name. Subject filter joins facts → entities via
-     subject_entity_id."""
+     """Filter facts by arena + optional category/predicate + subject.
+
+     PREFER `subject_entity_id` (exact id match) over `subject` (name ILIKE):
+     name matching bleeds one person's facts into another's answer when names
+     collide or fragment (the Will Vickers ⟵ Will Spencer confabulation — a
+     query resolved to one entity must NOT pull a same/similar-named entity's
+     facts). The name path is kept for back-compat callers that haven't resolved
+     an id yet, but entity-id is the strict, bleed-free path."""
      arenas = _resolve_arenas(req)
      conditions = ["f.arena = ANY(%s)"]
      params: list[Any] = [arenas]
@@ -956,7 +963,14 @@ async def list_facts(req: GraphQueryRequest):
      if req.predicate:
          conditions.append("f.predicate ILIKE %s")
          params.append(f"%{req.predicate}%")
-     if req.subject:
+     if req.subject_entity_id:
+         conditions.append("f.subject_entity_id = %s")
+         params.append(req.subject_entity_id)
+     if req.object_entity_id:
+         conditions.append("f.object_entity_id = %s")
+         params.append(req.object_entity_id)
+     # Name path: only when no exact id was given (back-compat / unresolved callers).
+     if req.subject and not req.subject_entity_id:
          conditions.append("EXISTS (SELECT 1 FROM entities e WHERE e.id = f.subject_entity_id AND (e.canonical_name ILIKE %s OR %s = ANY(e.aliases)))")
          params.extend([f"%{req.subject}%", req.subject])
      sql = f"""
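Reviewer sketch (not part of the diff): the new filter precedence in `list_facts` can be isolated as a pure function, which makes the "id wins over name" rule easy to check. The function name and signature are hypothetical; the SQL fragments and parameter order are copied from the hunk above.

```python
from typing import Any, Optional


def subject_filters(subject_entity_id: Optional[str] = None,
                    object_entity_id: Optional[str] = None,
                    subject: Optional[str] = None) -> tuple[list[str], list[Any]]:
    """Build the subject/object WHERE fragments and their bound parameters."""
    conditions: list[str] = []
    params: list[Any] = []
    if subject_entity_id:
        conditions.append("f.subject_entity_id = %s")
        params.append(subject_entity_id)
    if object_entity_id:
        conditions.append("f.object_entity_id = %s")
        params.append(object_entity_id)
    # Name path: only when no exact subject id was given.
    if subject and not subject_entity_id:
        conditions.append(
            "EXISTS (SELECT 1 FROM entities e WHERE e.id = f.subject_entity_id "
            "AND (e.canonical_name ILIKE %s OR %s = ANY(e.aliases)))"
        )
        params.extend([f"%{subject}%", subject])
    return conditions, params
```

When both `subject_entity_id` and `subject` are supplied, the name is silently ignored, so a caller that has already resolved an entity cannot pick up a same-named entity's facts. Note that `object_entity_id` does not suppress the name path; only the subject id does.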
@@ -0,0 +1,269 @@
1
+ # Re-distill EXECUTION plan — pentatonic-team, #126 modality fix (2026-06-22)
2
+
3
+ > **For review (Phil H) before any prod write.** This is the concrete, gated
4
+ > execution plan for deploying the #126 distiller fix and re-distilling the
5
+ > `pentatonic-team` graph. It supersedes the operational detail in
6
+ > `redistill-plan-2026-06-21.md` where live findings differ (scale, deploy
7
+ > mechanism, queue schema). The *why* and the guardrails there still hold.
8
+ >
9
+ > **Nothing here has been run except the outage fix in §1.** The irreversible
10
+ > full delete (§4) is explicitly gated on the pilot audit (§3) + a human go.
11
+ >
12
+ > **⚠️ Three prerequisites added after review (§0) are BLOCKING** — without them
13
+ > the re-distill would (a) re-introduce the same fabrications via the cascade
14
+ > student, (b) leave the vector index inconsistent, and (c) starve every other
15
+ > tenant's live ingest. Read §0 first.
16
+
17
+ ## Why (recap)
18
+
19
+ A fact-by-fact audit found ~18.7% of distilled `pentatonic-team` facts were
20
+ wrong — dominated by **modality collapse** (future/scheduled/planned content
21
+ asserted as established fact), plus attribution errors and same-name
22
+ conflations. The teacher prompt is fixed in **ai-agent-sdk PR #126** (TENSE &
23
+ MODALITY / ATTRIBUTION FIDELITY / IDENTITY rules, in both `BATCH_SYSTEM_PROMPT`
24
+ and the active `GUIDED_JSON_SYSTEM_PROMPT`). The prompt only governs *future*
25
+ extractions, so the historical graph must be re-distilled to inherit the fix.
26
+
27
+ ## Live findings that change the 2026-06-21 runbook
28
+
29
+ | Claim in 06-21 runbook | Live reality (verified 2026-06-22) |
30
+ |---|---|
31
+ | "~163k events", "434 facts" | **272,893 events · 483,097 facts · 291,797 rels** for `arena LIKE 'pentatonic-team%'`. The 434 was a 49-entity sample. |
32
+ | Deploy = "the L4 box running extractor-async" | **The distiller runs on the DB box `i-0559922cf59ac6975`** (`pme-prod-us-east-1`), container `pme2-extractor-async`. The `seesa-distiller-bakeoff` box (`i-0d65…`) is **NOT** the host (empty, stopped since 06-10) — earlier handoff was wrong. Models are external: teacher `seesa-distiller-l40s` (`172.31.26.202:8005`, `qwen3.6-27b-fp8`), student `seesa-student-l4` (`172.31.29.121:8005`). |
33
+ | `UPDATE distillation_queue ... WHERE arena LIKE` | **`distillation_queue` has NO `arena` column** (cols: id, event_id, enqueued_at, claimed_by, claimed_at, claim_expires_at, status, attempts, last_error, completed_at). Scope via a join to `events.arena` (see §3/§4). |
34
+ | disk: "prior 30GB root outage — watch disk" | Root is now **485G, 412G free (16%)** — ample. |
35
+ | "Merge #126 and deploy the new distiller" | The running container **may** build from a local copy `/opt/engine-v2/extractor-async/worker.py`, but the engine-deploy path also installs the SDK tarball **from S3 into `node_modules`** and builds from there — these have caused a *week of drift* before (editing a reference-only copy). **§2.0 makes verifying the real build context a hard step, not an assumption.** |
36
+
37
+ ## 0. BLOCKING prerequisites (added 2026-06-22 review)
38
+
39
+ ### 0.1 The re-distill MUST run TEACHER-ONLY (else it re-creates the bug)
40
+
41
+ `CASCADE_ENABLED=true` is live in prod (the student→teacher cascade, #99). Under
42
+ it, re-enqueued events flow **student-first**: ~75–80% are handled by the
43
+ fine-tuned student, only gated/escalated events reach the teacher. **#126 fixes
44
+ the *teacher* prompt only.** The deployed student is the `f1e0ff`-trained
45
+ fine-tune — it *learned the modality-collapse behaviour from the old teacher* —
46
+ so re-distilling through the live cascade would have the **student re-assert the
47
+ same future-as-fact fabrications on the majority of events**, and the audit
48
+ would not reach `<3%`.
49
+
50
+ **Therefore, for the entire re-distill (pilot §3 + full §4):**
51
+
52
+ ```bash
53
+ # turn the cascade OFF so every re-enqueued event goes to the #126 teacher
54
+ aws ssm put-parameter --name /pme/prod-us-east-1/CASCADE_ENABLED --value false \
55
+ --type String --overwrite --region us-east-1
56
+ # then redeploy/restart extractor-async so the env reaches the worker (§2),
57
+ # and CONFIRM in the startup log: "cascade DISABLED" / no "student-primary" line.
58
+ ```
59
+
60
+ Re-enable the cascade (`=true` + restart) only **after** §4 completes and the
61
+ audit passes. While it's off, the teacher fleet carries 100% of distillation —
62
+ factor that into §4.4 fleet sizing.
63
+
64
+ **Forward note (not part of this run):** deploying #126 advances the teacher
65
+ `prompt_hash`, which strands the live student (trained on `f1e0ff`). Once the
66
+ cascade is re-enabled, watch the random-sample student↔teacher agreement for
67
+ drift; the student will eventually want a refresh on #126 traces. Tracked
68
+ separately from this re-distill.
69
+
70
+ ### 0.2 Vector index (Qdrant) must be reconciled — not just Postgres
71
+
72
+ Hybrid retrieval is live (Qdrant 1.18.2, `evidence` collection). The §4 DELETE
73
+ removes 483k facts from Postgres but **leaves their vectors orphaned in Qdrant**,
74
+ and re-distilled facts are content-hash-distinct → *new* vectors. Net without
75
+ reconciliation: search returns deleted facts + stale duplicates.
76
+
77
+ - **Confirm** what the `evidence` collection is keyed on (fact id vs event id)
78
+ and how a PG fact delete maps to vector points (`vector_provenance` table).
79
+ - **Purge** the arena's old vectors as part of §4.2 (delete the corresponding
80
+ Qdrant points / `vector_provenance` rows), and **verify** re-distilled facts
81
+ get re-embedded (the embedder lanes must keep up — `NV_EMBED_URL` interactive
82
+ + bulk).
83
+ - The §4.1 snapshot + §Rollback are **Postgres-only**; add the vector state (see
84
+ §4.1 / Rollback below) or a rollback leaves Qdrant inconsistent with restored
85
+ PG.
86
+
87
+ ### 0.3 Re-enqueue must NOT starve live ingest (all tenants)
88
+
89
+ `claim_next_batch` orders by `ORDER BY id`. Re-enqueued events are *old* → **low
90
+ ids** → they would be claimed **ahead of** live ingest (high ids) for **every
91
+ tenant**, stalling all forward memory ingest behind a multi-day job. Mitigate by
92
+ **dripping the re-enqueue in batches** (§4.3) rather than flipping all 273k to
93
+ `pending` at once — keep the pending re-distill backlog bounded (e.g. ≤ a few k)
94
+ so live ingest interleaves. Monitor that non-`pentatonic-team` pending age
95
+ doesn't climb.
96
+
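Reviewer sketch (not part of the diff): the "keep the pending re-distill backlog bounded" rule in §0.3 reduces to a headroom calculation before each drip. The 3,000 cap echoes the "e.g. 3k" figure in §4.3; the per-drip maximum and the function name are assumptions.

```python
def drip_size(pending_redistill: int, cap: int = 3000, max_batch: int = 1000) -> int:
    """How many events to re-enqueue now without exceeding the backlog cap.

    `pending_redistill` is the current count of pending re-distill rows;
    `cap` and `max_batch` are illustrative defaults, not values from the plan.
    """
    headroom = cap - pending_redistill
    return max(0, min(headroom, max_batch))
```

Running this on a timer and enqueuing only `drip_size(...)` events each tick keeps the low-id backlog small enough for live ingest to interleave; it returns 0 while the backlog is at or above the cap.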
97
+ ## 1. Outage already fixed (2026-06-22 ~08:26 UTC)
98
+
99
+ `pme2-extractor-async` had been SIGKILL'd at 05:03 UTC (`OOMKilled=false` → a
100
+ failed deploy attempt) and silently not draining for ~3.5h while forward ingest
101
+ kept enqueuing — **no alert fired** (see Seesa SEE-189). Fixed: `docker start
102
+ pme2-extractor-async`; reset 60 orphaned claims (`status='claimed' AND
103
+ claim_expires_at < now()` → `pending`); stopped the wrongly-started bakeoff box.
104
+ Verified draining (~60/min, 0 new failures, active claims with future expiry).
105
+ **This restored the OLD prompt** (`prompt_hash=f1e0ff554f708d05`); §2 replaces it.
106
+
107
+ ## 2. Deploy #126 (prompt fix) + disable cascade
108
+
109
+ SDK version bumped **0.10.18 → 0.10.19** (this PR) and tarball built
110
+ (`ai-agent-sdk-0.10.19.tgz`, bundled `worker.py` carries the 4 fix markers).
111
+
112
+ 0. **Verify the real build context FIRST** (do not assume local-copy — this is
113
+ the "week of drift" failure mode). On `i-0559…`:
114
+ `docker inspect pme2-extractor-async --format '{{.Config.Image}}'` and read
115
+ the `extractor-async` service `build.context` in the *deployed* compose
116
+ (`/opt/engine-v2/...`). Confirm the file you are about to edit is the one that
117
+ image actually builds from. If the context resolves into
118
+ `node_modules/@pentatonic-ai/ai-agent-sdk/...` (the S3-tarball install), edit
119
+ THAT path (or replace the tarball), not a stray `/opt/engine-v2/extractor-async`
120
+ copy. Resolve Open-Q #1 (npm/S3 source of truth) here too.
121
+ 1. **Diff** (safety — don't silently revert any box-local change): copy the #126
122
+ `worker.py` to `/tmp/worker.new.py`, then `diff` it against the file the build
123
+ context actually uses (from step 0). Expect **only** the
124
+ TENSE/MODALITY/ATTRIBUTION/IDENTITY prompt additions. If other diffs appear,
125
+ STOP and reconcile.
126
+ 2. Replace that worker.py with the #126 version (keep a timestamped backup for
127
+ the Rollback worker-revert).
128
+ 3. **Set `CASCADE_ENABLED=false`** (§0.1) so the re-distill is teacher-only.
129
+ 4. `cd /opt/engine-v2 && sudo docker compose up -d --build extractor-async`.
130
+ 5. **Verify**: startup log prints a NEW `prompt_hash` (≠ `f1e0ff554f708d05`)
131
+ AND shows the cascade is OFF (no "cascade ENABLED — student-primary" line);
132
+ `grep -c MODALITY` in the running container's worker.py > 0; a few fresh
133
+ completions look sane.
134
+
135
+ ## 3. Pilot re-distill (~100 events) — GATE
136
+
137
+ 1. Pilot set: the known-bad source events (Will Vickers, Catherine Hayes, Johann
138
+ Boedecker, Katrin) + ~80 random `pentatonic-team` events.
139
+ 2. Scoped clear + re-enqueue (note the `events` join — no `arena` on the queue;
140
+ reset `attempts` so the rows are claim-eligible — eligibility is
141
+ `attempts < MAX_ATTEMPTS`):
142
+ ```sql
143
+ -- clear old facts for the pilot events. NB provenance_event_ids[1] catches
144
+ -- facts whose FIRST source is a pilot event; a fact corroborated by (but not
145
+ -- first-seen from) a pilot event is missed — acceptable for a pilot, exact
146
+ -- scope comes in §4's whole-arena delete.
147
+ DELETE FROM facts f
148
+ WHERE f.arena LIKE 'pentatonic-team%'
149
+ AND f.provenance_event_ids[1] = ANY(:pilot_event_ids);
150
+ -- re-enqueue (queue has no arena; scope by event_id set; reset attempts)
151
+ UPDATE distillation_queue
152
+ SET status='pending', claimed_by=NULL, claimed_at=NULL,
153
+ claim_expires_at=NULL, attempts=0
154
+ WHERE event_id = ANY(:pilot_event_ids);
155
+ ```
156
+ (Cascade is OFF per §0.1, so these run on the #126 teacher — the pilot
157
+ actually tests the fix, not the stale student.)
158
+ 3. Let the fleet drain. **Audit** the pilot entities with the validation harness:
159
+ modality/attribution rate should fall toward ~0; "attended/is" → "is scheduled
160
+ to / plans to" or dropped. Vickers "Board Observer" should be modal/absent.
161
+ 4. **🚦 GATE: report the audit; require an explicit human go before §4.**
162
+
163
+ ## 4. Full re-distill (only after the pilot passes + go)
164
+
165
+ 1. **Snapshot** (rollback point), off-box — Postgres AND vector state:
166
+ ```bash
167
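+ # NB: pg_dump has no --where option, so export the arena rows per table with psql \copy (assumes every listed table has an `arena` column; confirm for vector_provenance):
168
+ for t in facts entities relationships vector_provenance; do psql ... -c "\copy (SELECT * FROM $t WHERE arena LIKE 'pentatonic-team%') TO 'pt_${t}_pre_redistill_2026-06-22.csv' CSV HEADER"; done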
169
+ # also snapshot/record the Qdrant evidence points for the arena (or a full
170
+ # collection snapshot) so §Rollback can restore the index, not just PG.
171
+ ```
172
+ 2. Clear (keep entities — Fusion + projection sweeps rebuild downstream) +
173
+ purge the orphaned vectors (§0.2):
174
+ ```sql
175
+ DELETE FROM facts WHERE arena LIKE 'pentatonic-team%';
176
+ DELETE FROM relationships WHERE arena LIKE 'pentatonic-team%';
177
+ -- + delete the corresponding Qdrant points and vector_provenance rows for
178
+ -- the arena (see §0.2 — confirm the keying first).
179
+ ```
180
+ Note: entities are retained, so old-prompt **junk entities** the new prompt
181
+ won't re-create (e.g. the "Pentatonic GmbH" footer affiliation) will persist
182
+ until Fusion decay/eviction clears them — flag for a post-run entity sweep if
183
+ the audit still spots them.
184
+ 3. Re-enqueue in **bounded batches** (§0.3 — do NOT flip all 273k at once, or
185
+ live ingest for all tenants starves behind the low-id backlog). First confirm
186
+ completeness, then drip:
187
+ ```sql
188
+ -- completeness check: how many arena events have a queue row vs not?
189
+ SELECT count(*) FILTER (WHERE q.id IS NOT NULL) AS have_row,
190
+ count(*) FILTER (WHERE q.id IS NULL) AS missing
191
+ FROM events e LEFT JOIN distillation_queue q ON q.event_id = e.id
192
+ WHERE e.arena LIKE 'pentatonic-team%';
193
+ -- for events WITH a row: re-enqueue a batch (reset attempts), repeat as the
194
+ -- pending re-distill backlog drains below a cap (e.g. 3k):
195
+ UPDATE distillation_queue q
196
+ SET status='pending', claimed_by=NULL, claimed_at=NULL,
197
+ claim_expires_at=NULL, attempts=0
198
+ FROM events e
199
+ WHERE e.id = q.event_id AND e.arena LIKE 'pentatonic-team%'
200
+ AND q.event_id IN (:next_batch_event_ids);
201
+ -- for events MISSING a row (pruned post-done): re-insert from events.
202
+ INSERT INTO distillation_queue (event_id, status, enqueued_at, attempts)
203
+ SELECT e.id, 'pending', now(), 0 FROM events e
204
+ WHERE e.arena LIKE 'pentatonic-team%'
205
+ AND NOT EXISTS (SELECT 1 FROM distillation_queue q WHERE q.event_id = e.id)
206
+ AND e.id IN (:next_batch_event_ids);
207
+ ```
208
+ **Assert** the total re-enqueued (UPDATE + INSERT) across all batches == the
209
+ arena event count (~272,893) before declaring the enqueue complete.
210
+ 4. **Scale the fleet** for ~273k events, **teacher-only** (cascade off → the
211
+ teacher carries 100%, not the usual ~25%). Single-worker ~60/min ⇒ ~76h;
212
+ `extractor-async-2/-3` + `distiller-autoscale.sh` help, BUT **g6e capacity is
213
+ currently severe** — this week we could not launch *or restart a stopped*
214
+ g6e. So: do **not** let the autoscaler scale the fleet to zero mid-run (a
215
+ stopped box may not come back); treat the currently-running L40S boxes as the
216
+ ceiling and the 76h as optimistic. Monitor `distillation_queue` depth → 0,
217
+ GPU, disk, and the `failed`/`ReadTimeout` rate (teacher ~20–49s/call).
218
+
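Reviewer sketch (not part of the diff): the "~60/min ⇒ ~76h" estimate in §4.4 can be reproduced with simple arithmetic. The event count and rate are the plan's own figures; the assumption that throughput scales linearly with worker count is mine, and the plan itself treats the single-worker figure as optimistic.

```python
def drain_hours(events: int, per_worker_per_min: float, workers: int = 1) -> float:
    """Wall-clock hours to drain `events`, assuming linear scaling across workers."""
    return events / (per_worker_per_min * workers) / 60


single = drain_hours(272_893, 60)            # one worker at ~60 events/min
triple = drain_hours(272_893, 60, workers=3)  # hypothetical three-worker fleet
print(f"{single:.1f}h single-worker, {triple:.1f}h with three workers")
```

Failures, `ReadTimeout` retries, and the bounded drip from §0.3 all push the real figure above this lower bound.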
219
+ ## 5. Downstream reconciliation
220
+
221
+ 1. **Fusion Drive** re-run for the arena (de-dup; run AFTER extraction settles) —
222
+ `backfill_entity_reconciliation.py` / `fusion-drive-*.sh`.
223
+ 2. **Projection re-fold** (Seesa side): org/person projection sweeps re-fold from
224
+ the corrected graph — the Seesa **SEE-184** watermark sweep (now live, 03:45
225
+ UTC) detects the advanced `graphAsOf` and marks projections stale → SEE-168
226
+ refresh re-folds. Worker D1 read-models inherit the heal.
227
+ 3. **Unblocks Seesa SEE-183** — the content-sensitivity retag (forward detection
228
+ already live) was deliberately held until this re-distill completes, to avoid
229
+ piling re-emitted events onto the shared queue mid-run.
230
+ 4. **Re-enable the cascade** (§0.1): `CASCADE_ENABLED=true` + restart
231
+ extractor-async; confirm "cascade ENABLED — student-primary" returns. Then
232
+ watch student↔teacher agreement (the student is now `f1e0ff`-stale vs the
233
+ #126 teacher — schedule a student refresh if drift shows).
234
+
235
+ ## Validation gate (before declaring done)
236
+ Re-run the audit harness over a fresh 30–50 entity sample → **bad-fact rate
237
+ < 3%** (from ~18.7%). Spot the canonical failures: Vickers "Board Observer"
238
+ (modal/absent), Catherine "will send / to organise" (→ commitments), Matvii
239
+ "Pentatonic GmbH" footer affiliation (gone), Sebastian conflation (split/keyed).
240
+
241
+ ## Rollback
242
+ Restore `facts`/`relationships`/`vector_provenance` for `pentatonic-team%` from
243
+ the §4.1 snapshot AND restore the Qdrant `evidence` points (new-prompt
244
+ extractions are content-hash-distinct, so a restore is clean — but PG and Qdrant
245
+ must be rolled back *together* or search desyncs). Worker revert = redeploy the
246
+ prior worker.py (the §2.2 backup) + rebuild. Cascade: restore `CASCADE_ENABLED=true`
247
+ if it was changed.
248
+
249
+ ## Open questions for Phil
250
+ 1. **Durable SDK release / source of truth.** Resolve in §2.0: is the deployed
251
+ worker built from a local copy, or from the S3/npm SDK install? If the
252
+ lockfile/`npm install` path is authoritative, a future redeploy reverts #126
253
+ unless 0.10.19 is published there + the `/opt/engine-v2` pin bumped. (S3
254
+ `sdk/` has tarballs to 0.10.18; the lockfile root version was stale at 0.10.1
255
+ — confirm the real install path before relying on either.)
256
+ 2. **Fleet sizing** for the **teacher-only** 273k run — given g6e capacity is
257
+ currently unreliable (can't restart stopped boxes), is the running L40S fleet
258
+ enough, or do we need reserved/alt-region capacity (or to accept a longer
259
+ wall-clock)?
260
+ 3. ~~Re-enqueue completeness~~ → **resolved in §4.3** (completeness check +
261
+ INSERT-from-events fallback + count assertion).
262
+ 4. Run **off-peak / batched**? §0.3 + §4.3 make it batched to protect live
263
+ ingest; still worth starting off-peak given the multi-day duration.
264
+
265
+ ## Authorization note (Seesa-side / Claude Code)
266
+ The auto-mode Bash classifier blocks these shared-prod writes (SSM/ec2/psql) even
267
+ with verbal authority; they must run via the `!` prefix or a settings permission
268
+ rule. Read-only SSM queries are fine. The full DELETE (§4) will be kept behind an
269
+ explicit human go regardless of any rule.
@@ -0,0 +1,101 @@
1
+ # Re-distill plan — pentatonic-team, post modality/attribution fix (2026-06-21)
2
+
3
+ **Why.** A fact-by-fact audit of `org_model` (49 entities, 434 facts) found **~18.7%
4
+ of distilled facts wrong**, dominated by *modality collapse* (future/scheduled/
5
+ planned content asserted as established fact), plus attribution errors (unauthored
6
+ docs → person, attendee → organiser, org activity → person) and a few same-name
7
+ conflations. The teacher prompt is fixed in **ai-agent-sdk PR #126** (TENSE &
8
+ MODALITY / ATTRIBUTION FIDELITY / IDENTITY rules). The prompt only governs
9
+ *future* extractions, so the historical graph must be **re-distilled** to inherit
10
+ the fix.
11
+
12
+ **Already done (2026-06-21):** a targeted cleanup retracted **229** facts whose
13
+ provenance source event is dated in the future (`emitted_at > now()`) and whose
14
+ category is established (`state/mention/decision`), scoped to `pentatonic-team`.
15
+ Future *commitments* (29) were spared; the frozen `pip-agents` legacy was
16
+ untouched. ~11 "attends standup"-class facts (future *content* but a past-dated
17
+ source event) are NOT deterministically catchable and remain for this re-distill.
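The cleanup rule described above can be stated as a predicate. This is a sketch under assumptions: the dictionary keys (`arena`, `category`, `emitted_at`) are illustrative stand-ins for whatever columns the real cleanup joined on, and `emitted_at` is taken to be the provenance event's timestamp.

```python
from datetime import datetime, timezone

# Categories the document treats as "established" (as opposed to commitments).
ESTABLISHED = {"state", "mention", "decision"}


def should_retract(fact, now):
    """Retract established-category facts whose source event is dated in the future."""
    return (
        fact["arena"].startswith("pentatonic-team")  # same scope as LIKE 'pentatonic-team%'
        and fact["category"] in ESTABLISHED
        and fact["emitted_at"] > now
    )


now = datetime(2026, 6, 21, tzinfo=timezone.utc)
future = datetime(2026, 7, 1, tzinfo=timezone.utc)
past = datetime(2026, 6, 1, tzinfo=timezone.utc)

facts = [
    {"arena": "pentatonic-team", "category": "state", "emitted_at": future},
    {"arena": "pentatonic-team", "category": "commitment", "emitted_at": future},
    {"arena": "pip-agents", "category": "state", "emitted_at": future},
    {"arena": "pentatonic-team", "category": "state", "emitted_at": past},
]
retracted = [f for f in facts if should_retract(f, now)]
```

The last sample row shows why the "attends standup"-class facts escape: the content is future-tense but the source event is past-dated, so no timestamp rule catches it.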
18
+
19
+ ---
20
+
21
+ ## Scope & guardrails
22
+ - **Arena: `pentatonic-team%` ONLY.** `org_model` is multi-tenant — `pip-agents`
23
+ (~246k facts) is LEGACY/frozen and must never be touched. Every statement here
24
+ carries `WHERE arena LIKE 'pentatonic-team%'`.
25
+ - **Idempotency caveat (load-bearing).** `worker.py` IDs entities/facts/rels by
26
+ content-hash, so re-running the *same* prompt converges. But the **prompt
27
+ changed** (`SYSTEM_PROMPT_HASH` is new), so re-extraction yields *different*
28
+ facts with *different* IDs — the old wrong facts will **coexist** unless
29
+ cleared. ⇒ the re-distill MUST delete the existing pentatonic-team facts for
30
+ each re-processed event before/with re-extraction.
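The idempotency caveat can be illustrated with a toy content-hash scheme. The exact fields `worker.py` hashes are not shown in this document, so `fact_id` and its inputs below are assumptions made for illustration only.

```python
import hashlib
import json


def fact_id(arena, subject, statement):
    """Hypothetical content-hash ID: identical content always maps to the same ID."""
    payload = json.dumps([arena, subject, statement], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


store = {}

# Old prompt: modality collapsed, a planned attendance asserted as fact.
old = ("pentatonic-team", "person:example", "attended the board meeting")
store[fact_id(*old)] = old

# Re-running the SAME prompt yields the same content, hence the same ID: no new row.
store[fact_id(*old)] = old
rows_after_same_prompt = len(store)

# New prompt: different wording, so a different ID. The old row is NOT overwritten.
new = ("pentatonic-team", "person:example", "is scheduled to attend the board meeting")
store[fact_id(*new)] = new
rows_after_new_prompt = len(store)
```

After the new-prompt extraction the store holds both rows, which is why the old facts have to be deleted explicitly rather than relying on upsert convergence.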
31
+
32
+ ## Prerequisites
33
+ 1. **Merge PR #126** and **deploy the new distiller** to the extractor fleet
34
+ (the L4 box running `extractor-async/worker.py`). Confirm the running worker's
35
+ `SYSTEM_PROMPT_HASH` matches the new prompt (it's logged on the trace line).
36
+ 2. **Snapshot** (rollback point):
37
+ a filtered `COPY (SELECT … WHERE arena LIKE 'pentatonic-team%') TO …` per table
38
+ (`facts`, `entities`, `relationships`); `pg_dump` has no `--where` row filter. Store off-box.
39
+
40
+ ## Procedure
41
+
42
+ ### Stage 0 — pilot (≈100 events incl. the known-bad ones)
43
+ 1. Pick a pilot set: the Will Vickers / Catherine Hayes / Johann / Katrin source
44
+ events + a random ~80 pentatonic-team events.
45
+ 2. Delete existing facts for those events (scoped), then re-enqueue:
46
+ ```sql
47
+ -- clear old facts for the pilot events (matches on the first provenance event only)
48
+ DELETE FROM facts f
49
+ WHERE f.arena LIKE 'pentatonic-team%'
50
+ AND f.provenance_event_ids[1] = ANY(:pilot_event_ids);
51
+ -- re-enqueue them
52
+ UPDATE distillation_queue
53
+ SET status='pending', claimed_by=NULL, claim_expires_at=NULL
54
+ WHERE arena LIKE 'pentatonic-team%'
55
+ AND event_id = ANY(:pilot_event_ids);
56
+ ```
57
+ 3. Let the fleet drain the queue. **Validate**: re-run the audit harness over the
58
+ pilot entities; the modality/attribution error rate should fall toward ~0. Eyeball
59
+ the Vickers/Katrin facts — "attended/is" should become "is scheduled to /
60
+ plans to" (or drop).
61
+
62
+ ### Stage 1 — full pentatonic-team re-distill (only if the pilot passes)
63
+ 1. Clear pentatonic-team facts (keep entities; Fusion + the projection sweep
64
+ rebuild downstream). Relationships likewise.
65
+ ```sql
66
+ DELETE FROM facts WHERE arena LIKE 'pentatonic-team%';
67
+ DELETE FROM relationships WHERE arena LIKE 'pentatonic-team%';
68
+ ```
69
+ 2. Re-enqueue every pentatonic-team event:
70
+ ```sql
71
+ UPDATE distillation_queue
72
+ SET status='pending', claimed_by=NULL, claim_expires_at=NULL
73
+ WHERE arena LIKE 'pentatonic-team%';
74
+ ```
75
+ (If queue rows were pruned post-done, re-insert from `events` for the arena.)
76
+ 3. Scale the fleet for the backlog; monitor `distillation_queue` depth until 0.
77
+
78
+ ### Stage 2 — downstream reconciliation
79
+ 1. **Fusion Drive** re-run for the arena (consolidate the freshly-extracted
80
+ nodes/facts) — it's a de-duplicator, so run AFTER extraction settles.
81
+ `backfill_entity_reconciliation.py` covers the entity side.
82
+ 2. **Projection re-fold** (Seesa side): the org/person projection sweeps re-fold
83
+ from the corrected graph (SEE-168 nightly sweep does this automatically; or
84
+ trigger manually). The Worker D1 read-models then inherit the heal.
85
+
86
+ ## Validation (gate before declaring done)
87
+ - Re-run the audit harness over a fresh 30–50 entity sample → **bad-fact rate
88
+ target < 3%** (from ~18.7%).
89
+ - Spot-check the canonical failures: Vickers "Board Observer" (should be modal/absent),
90
+ Catherine's "will send / to organise" (→ commitments), Matvii's "Pentatonic
91
+ GmbH" footer affiliation (gone), Sebastian conflation (split or correctly keyed).
92
+
93
+ ## Rollback
94
+ Restore `facts`/`relationships` for `pentatonic-team%` from the Prerequisites (step 2) snapshot;
95
+ the new-prompt extractions are content-hash-distinct so a restore is clean.
96
+
97
+ ## Cost / time
98
+ ~163k pentatonic-team events × 7B extraction on the L4 fleet. Bounded by fleet
99
+ size × per-batch latency; run off-peak, monitor GPU/disk (prior outage: v2 wrote
100
+ to a 30 GB root — watch disk). Pilot first to estimate throughput before the full
101
+ run.
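The "pilot first" advice amounts to a linear extrapolation from the pilot's drain time. A rough sketch, assuming throughput stays constant between the pilot and the full run (it may not, given batching and fleet scaling); the function name and the pilot timing used in the example are hypothetical:

```python
def estimate_full_run_hours(pilot_events, pilot_seconds, total_events, fleet_scale=1.0):
    """Extrapolate full-run wall-clock hours from a measured pilot.

    fleet_scale is the ratio of full-run fleet size to pilot fleet size
    (assumes throughput scales linearly with fleet size, which is optimistic).
    """
    if pilot_events <= 0 or pilot_seconds <= 0 or fleet_scale <= 0:
        raise ValueError("pilot_events, pilot_seconds and fleet_scale must be positive")
    events_per_second = (pilot_events / pilot_seconds) * fleet_scale
    return total_events / events_per_second / 3600


# Example with made-up pilot timing: 100 events drained in 20 minutes.
hours = estimate_full_run_hours(pilot_events=100, pilot_seconds=20 * 60, total_events=163_000)
print(f"estimated full run: {hours:.0f} h on the pilot-sized fleet")
```

Treat the result as a lower bound on planning time: retries, `ReadTimeout` failures, and disk pressure all push the real figure up.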