npm - @pentatonic-ai/ai-agent-sdk - Versions diffs - 0.10.6 → 0.10.8 - Mend

@pentatonic-ai/ai-agent-sdk 0.10.6 → 0.10.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (27) hide show

package/dist/index.cjs CHANGED Viewed

@@ -878,7 +878,7 @@ function fireAndForgetEmit(clientConfig, sessionOpts, messages, result, model) {
 }
 // src/telemetry.js
-var VERSION = "0.10.6";
+var VERSION = "0.10.8";
 var TELEMETRY_URL = "https://sdk-telemetry.philip-134.workers.dev";
 function machineId() {
   const raw = typeof process !== "undefined" ? `${process.env?.USER || process.env?.USERNAME || "u"}:${process.platform || "x"}:${process.arch || "x"}` : "browser";

package/dist/index.js CHANGED Viewed

@@ -847,7 +847,7 @@ function fireAndForgetEmit(clientConfig, sessionOpts, messages, result, model) {
 }
 // src/telemetry.js
-var VERSION = "0.10.6";
+var VERSION = "0.10.8";
 var TELEMETRY_URL = "https://sdk-telemetry.philip-134.workers.dev";
 function machineId() {
   const raw = typeof process !== "undefined" ? `${process.env?.USER || process.env?.USERNAME || "u"}:${process.platform || "x"}:${process.arch || "x"}` : "browser";

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@pentatonic-ai/ai-agent-sdk",
-  "version": "0.10.6",
+  "version": "0.10.8",
   "description": "TES SDK — LLM observability and lifecycle tracking via Pentatonic Thing Event System. Track token usage, tool calls, and conversations. Manage things through event-sourced lifecycle stages with AI enrichment and vector search.",
   "type": "module",
   "main": "./dist/index.cjs",

package/packages/memory-engine-v2/RFC-decay-and-fusion.md ADDED Viewed

@@ -0,0 +1,185 @@
+# RFC: the Fusion Drive — v2 memory self-healing (cross-run node fusion + decay)
+> **Fusion Drive** = the continuous, arena-scoped background engine that keeps the v2
+> memory graph self-healing: it *fuses* duplicate/near-duplicate nodes from different
+> distillation runs into a single master node (horizontal convergence) and *decays* stale,
+> low-value, and junk nodes out of existence (vertical aging). Named for the drive that
+> does the fusing — the decay pass rides the same engine.
+**Status:** draft / spec — 2026-06-12
+**Builds on:** `RFC-entity-reconciliation.md`, `scripts/entity_resolution_v2.py` (#82),
+`org-model/migrations/002_entity_merges_audit.sql`.
+**Motivated by:** the v2 store is currently **pure-accretion** — three independent
+properties, all verified in code, mean nothing ever leaves or improves in place:
+1. **No supersede by source_id** — event identity is `sha256(arena:content)`; re-emitting
+   edited content appends a new event, the old persists.
+2. **Accrete-only graph writes** — entity/fact upserts are `ON CONFLICT (id) DO UPDATE`
+   that only merge aliases/provenance and bump confidence; a *corrected* extraction has a
+   different deterministic id, so it lands **beside** the polluted node, never replacing it.
+3. **No decay/eviction** — v2 has no GC; fact confidence only moves up; recency affects
+   search ranking only, never retention.
+Net: improving the extractor/teacher only helps **new** content. Accumulated 7B-era
+pollution (hallucinated emails, numeric-ID-as-person, ungrounded entities) is immortal.
+`pentatonic-team` had to be **nuked** rather than re-distilled because of this; `pip-agents`
+(87k events) still carries all of it.
+This RFC makes the store **self-healing** via two complementary mechanisms:
+**fusion** (horizontal — converge duplicate/near-duplicate nodes from different
+distillation runs into one *master* node) and **decay** (vertical — age out stale and
+low-value nodes). Both are gated, arena-scoped, audited, and reversible.
+---
+## Part A — Fusion: converge near-duplicate nodes into a master
+Extends the existing entity-resolution machinery along four axes.
+### A1. Online + continuous (today it's dry-run batch)
+Run fusion as a scheduled per-arena pass (systemd timer on the engine box, same pattern as
+the distiller autoscaler) **and** opportunistically after a distillation run touches an
+arena's entities. Keep #82's invariants: dry-run default, `--apply` gate, arena scoping,
+`entity_merges` rollback. Add a `fusion_runs` ledger (arena, started_at, candidates,
+merged, mode) for observability.
+### A2. Cross-distillation-run detection (the actual pollution cure)
+The hard case #82 misses: 7B `"1716801984"` (numeric-ID person) and Qwen3.6 `"Katie Cooper"`
+are the same real entity but share **no name similarity**, so name-blocking never compares
+them. New candidate signals beyond name trigrams / embedding-on-name:
+- **Shared-provenance co-reference** — two entities of the same `entity_type` citing the
+  same `event_id` in `provenance_event_ids`, where one is low-quality (numeric / ungrounded
+  / single-token). The shared event's content is the adjudication context ("does this event
+  support these being the same person?").
+- **Context embedding** — embed the *facts/statements about* an entity (not just its name),
+  so name-divergent dupes still cluster. Reuses the bulk-embed lane.
+- **Teacher-version signal** — provenance maps to `distillation_traces.llm_model` /
+  `system_prompt_hash`. Prefer the newer-teacher extraction as master; an entity *only* ever
+  produced by the superseded teacher and never re-confirmed by the new one is both a fusion
+  candidate (likely a worse rendering of a node the new teacher got right) and a decay
+  candidate (stale-teacher orphan — see B).
+### A3. Master-node selection — replace richest-row-wins
+#82 uses "richest-row-wins", which (flagged in review) would crown the typo **"Phil Mossop"**
+over **"Philip Mossop"**. Replace with a **scored** canonical pick:
+| Signal | Effect |
+|---|---|
+| **Directory/authority anchor** (name matches an org-directory / HubSpot contact / Pip `contact_email`+`contact_name`) | dominant + → canonical |
+| Grounding (name appears verbatim in a provenance event's content) | + |
+| Teacher recency (newer `llm_model`) | + |
+| Corroboration (`cardinality(provenance_event_ids)`) | + |
+| Looks-like-ID (digit-ratio > 0.5) / hallucinated-email flag / single-token bare name | − − |
+Master = highest score. Losers' surface forms become **aliases** on the master (so existing
+lookups still resolve), facts/relationships are repointed, losers tombstoned in
+`entity_merges` with `rollback_payload`. Directory-anchored selection is the key fix: an
+authoritative source, when present, beats any heuristic.
+### A4. Fact + relationship fusion (today only entities fuse)
+After entity fusion (so subject/object ids are canonical):
+- **Facts** — exact `(arena, subject, predicate, object)` dupes already collapse via the
+  content-id. **Semantic** dupes (same assertion, different surface — "joined Acme" vs "works
+  at Acme") need statement-embedding similarity + LLM adjudication ("same assertion?").
+  Master fact = max confidence + best-grounded statement; union provenance; tombstone dupes.
+  New `fact_merges` audit mirroring `entity_merges`.
+- **Relationships** — `(from,to,type)` already collapses; a controlled rel-type vocabulary
+  ("works at" ≡ "employed by") is a later optional canonicalization.
+### A5. Audit, reversibility, safety rails
+Reuse `entity_merges`; add `fact_merges`. Every fusion carries `rollback_payload`.
+LLM-adjudicated merges store prompt+verdict. **Disclosure rail:** never send
+`disclosure_class='restricted'` rows to the LLM adjudicator (data-egress; the #82 review
+item). Auto-merge only above a high confidence band; everything else → human-review queue.
+---
+## Part B — Decay: age out stale and low-value nodes
+### B1. Separate `salience` from `confidence` (important)
+Do **not** decay `confidence` — it means "how corroborated/true is this", and decaying it
+would lie about corroboration. Add a separate **`salience`** (retention priority) to
+entities/facts/relationships. Decay acts on salience; eviction keys on salience.
+`salience(t) = salience₀ · exp(−ln2 · Δt / half_life[category])`, bumped on access or
+re-corroboration. Per-category half-life:
+| category | half-life | rationale |
+|---|---|---|
+| decision, commitment | very long / ∞ | durable record |
+| state, preference | medium | changes but matters |
+| mention, observation | short | ephemeral |
+`Δt` = time since `last_seen` **or** a new `last_accessed` (bumped when a node is returned by
+`/search` — cheap write, makes retrieval keep memories alive). Re-corroboration (new
+provenance) resets the clock and bumps salience.
+### B2. Born-salience — the cheap partial cure
+Seed `salience₀` from extraction-quality signals already computed (the trap detectors:
+ungrounded, numeric-ID-person, hallucinated-email, `noise_filter` hits). **Junk is born
+low**, so it decays below threshold and self-evicts fast — pollution cleans itself even
+without a fusion match.
+### B3. Eviction (GC)
+Node is evictable when: `salience < min_threshold` **AND** `last_seen`/`last_accessed`
+older than a floor **AND** not referenced by a surviving higher-salience node (an entity
+that's the subject/object of a live fact survives). Eviction = **tombstone** (soft-delete +
+retention window) → hard-delete after grace, cascading to the node's Qdrant points +
+`vector_provenance`. Never evict `disclosure_class='restricted'` without sign-off.
+### B4. Capacity bound (optional)
+Per-arena soft cap; when exceeded, evict lowest-salience first. Backstop against unbounded
+arenas.
+### B5. Cadence + safety
+Background per-arena pass (timer on the engine box), dry-run → `--apply` in a quiet window,
+counts logged, fully arena-scoped. Same operational shape as the distiller autoscaler /
+sparse backfill.
+---
+## Part C — Ordering & how they combine
+Per arena, on schedule: **(1) fusion → (2) decay.** Fusion first so a master node absorbs
+its duplicates' provenance/salience *before* decay judges it (else a real node split across
+two weak dupes could wrongly decay out). Then decay ages + evicts the survivors.
+**This is what finally cures immortal pollution:**
+- 7B polluted node *with* a correct Qwen3.6 counterpart → **fused**, correct one as master,
+  polluted demoted to alias / tombstoned.
+- 7B pure-junk node with *no* correct counterpart (numeric-ID-person, ungrounded) → born-low
+  salience + no corroboration + never accessed → **decays out and is evicted**.
+Together they convert the accrete-only store into a self-healing one. `pip-agents` could
+then self-clean over time instead of requiring a nuke (a nuke is still faster for a one-shot
+reset, but no longer the *only* path).
+---
+## Part D — Schema changes
+- `entities`: `+ salience REAL DEFAULT …`, `+ last_accessed TIMESTAMPTZ`.
+- `facts`: `+ salience REAL`, `+ last_accessed TIMESTAMPTZ` (keep `confidence` as-is =
+  corroboration truth; `asserted_at`/`expires_at` already exist).
+- `relationships`: `+ salience REAL`, `+ last_accessed` (already has `weight`,
+  `first/last_seen`).
+- new `fact_merges` audit (mirror `entity_merges` incl. `rollback_payload`).
+- new `fusion_runs` + `decay_runs` ledgers for observability.
+- `/search` gains a `last_accessed = NOW()` bump on returned nodes (batched).
+## Part E — Rollout (each flag-gated, arena-scoped, dry-run-first, audited)
+1. **Salience scoring only** — add columns, born-salience + decay math, NO eviction.
+   Observe distributions; confirm junk scores low and durable facts stay high.
+2. **Eviction** — dry-run (count what *would* evict) → `--apply` in a quiet window.
+3. **Fusion extension** — scored canonical selection (fix typo-crowning) + cross-run
+   detection + fact fusion, dry-run → apply.
+4. **Online/continuous** — wire fusion+decay to run after distillation per arena.
+## Open questions
+- Half-life constants per category — needs a calibration pass against real arenas.
+- `last_accessed` write amplification on hot search paths — batch/throttle the bump.
+- Directory authority source for canonical anchoring — HubSpot contacts? a curated table?
+- Interaction with the (still-open) source_id supersede mode — fusion partly subsumes it,
+  but explicit supersede is cheaper for known-mutable sources.

package/packages/memory-engine-v2/RFC-fusion-drive.md ADDED Viewed

@@ -0,0 +1,193 @@
+# RFC: the Fusion Drive — v2 memory self-healing (cross-run node fusion + decay)
+> **Fusion Drive** = the continuous, arena-scoped background engine that keeps the v2
+> memory graph self-healing: it *fuses* duplicate/near-duplicate nodes from different
+> distillation runs into a single master node (horizontal convergence) and *decays* stale,
+> low-value, and junk nodes out of existence (vertical aging). Named for the drive that
+> does the fusing — the decay pass rides the same engine.
+**Status:** spec + initial implementation (PR #92) — 2026-06-12. Implemented: salience
+scoring + decay, **eviction** (`fusion_drive_decay.py --evict`, reversible via
+`node_evictions`), and **fusion** of exact + cross-run-shared-provenance entity dupes and
+exact-triple fact dupes (`fusion_drive_fuse.py --apply`, reversible via `entity_merges`/
+`fact_merges`), with scored directory-anchored master selection. All arena-scoped,
+dry-run-default, transactional, audited. TODO (later PRs): embedding-band + LLM-adjudicated
+detection (in `entity_resolution_v2.py`), semantic fact fusion, authority-table wiring for
+canonical scoring, continuous scheduling, and a half-life/threshold calibration pass before
+`--evict` runs in prod.
+**Builds on:** `RFC-entity-reconciliation.md`, `scripts/entity_resolution_v2.py` (#82),
+`org-model/migrations/002_entity_merges_audit.sql`.
+**Motivated by:** the v2 store is currently **pure-accretion** — three independent
+properties, all verified in code, mean nothing ever leaves or improves in place:
+1. **No supersede by source_id** — event identity is `sha256(arena:content)`; re-emitting
+   edited content appends a new event, the old persists.
+2. **Accrete-only graph writes** — entity/fact upserts are `ON CONFLICT (id) DO UPDATE`
+   that only merge aliases/provenance and bump confidence; a *corrected* extraction has a
+   different deterministic id, so it lands **beside** the polluted node, never replacing it.
+3. **No decay/eviction** — v2 has no GC; fact confidence only moves up; recency affects
+   search ranking only, never retention.
+Net: improving the extractor/teacher only helps **new** content. Accumulated 7B-era
+pollution (hallucinated emails, numeric-ID-as-person, ungrounded entities) is immortal.
+`pentatonic-team` had to be **nuked** rather than re-distilled because of this; `pip-agents`
+(87k events) still carries all of it.
+This RFC makes the store **self-healing** via two complementary mechanisms:
+**fusion** (horizontal — converge duplicate/near-duplicate nodes from different
+distillation runs into one *master* node) and **decay** (vertical — age out stale and
+low-value nodes). Both are gated, arena-scoped, audited, and reversible.
+---
+## Part A — Fusion: converge near-duplicate nodes into a master
+Extends the existing entity-resolution machinery along four axes.
+### A1. Online + continuous (today it's dry-run batch)
+Run fusion as a scheduled per-arena pass (systemd timer on the engine box, same pattern as
+the distiller autoscaler) **and** opportunistically after a distillation run touches an
+arena's entities. Keep #82's invariants: dry-run default, `--apply` gate, arena scoping,
+`entity_merges` rollback. Add a `fusion_runs` ledger (arena, started_at, candidates,
+merged, mode) for observability.
+### A2. Cross-distillation-run detection (the actual pollution cure)
+The hard case #82 misses: 7B `"1716801984"` (numeric-ID person) and Qwen3.6 `"Katie Cooper"`
+are the same real entity but share **no name similarity**, so name-blocking never compares
+them. New candidate signals beyond name trigrams / embedding-on-name:
+- **Shared-provenance co-reference** — two entities of the same `entity_type` citing the
+  same `event_id` in `provenance_event_ids`, where one is low-quality (numeric / ungrounded
+  / single-token). The shared event's content is the adjudication context ("does this event
+  support these being the same person?").
+- **Context embedding** — embed the *facts/statements about* an entity (not just its name),
+  so name-divergent dupes still cluster. Reuses the bulk-embed lane.
+- **Teacher-version signal** — provenance maps to `distillation_traces.llm_model` /
+  `system_prompt_hash`. Prefer the newer-teacher extraction as master; an entity *only* ever
+  produced by the superseded teacher and never re-confirmed by the new one is both a fusion
+  candidate (likely a worse rendering of a node the new teacher got right) and a decay
+  candidate (stale-teacher orphan — see B).
+### A3. Master-node selection — replace richest-row-wins
+#82 uses "richest-row-wins", which (flagged in review) would crown the typo **"Phil Mossop"**
+over **"Philip Mossop"**. Replace with a **scored** canonical pick:
+| Signal | Effect |
+|---|---|
+| **Directory/authority anchor** (name matches an org-directory / HubSpot contact / Pip `contact_email`+`contact_name`) | dominant + → canonical |
+| Grounding (name appears verbatim in a provenance event's content) | + |
+| Teacher recency (newer `llm_model`) | + |
+| Corroboration (`cardinality(provenance_event_ids)`) | + |
+| Looks-like-ID (digit-ratio > 0.5) / hallucinated-email flag / single-token bare name | − − |
+Master = highest score. Losers' surface forms become **aliases** on the master (so existing
+lookups still resolve), facts/relationships are repointed, losers tombstoned in
+`entity_merges` with `rollback_payload`. Directory-anchored selection is the key fix: an
+authoritative source, when present, beats any heuristic.
+### A4. Fact + relationship fusion (today only entities fuse)
+After entity fusion (so subject/object ids are canonical):
+- **Facts** — exact `(arena, subject, predicate, object)` dupes already collapse via the
+  content-id. **Semantic** dupes (same assertion, different surface — "joined Acme" vs "works
+  at Acme") need statement-embedding similarity + LLM adjudication ("same assertion?").
+  Master fact = max confidence + best-grounded statement; union provenance; tombstone dupes.
+  New `fact_merges` audit mirroring `entity_merges`.
+- **Relationships** — `(from,to,type)` already collapses; a controlled rel-type vocabulary
+  ("works at" ≡ "employed by") is a later optional canonicalization.
+### A5. Audit, reversibility, safety rails
+Reuse `entity_merges`; add `fact_merges`. Every fusion carries `rollback_payload`.
+LLM-adjudicated merges store prompt+verdict. **Disclosure rail:** never send
+`disclosure_class='restricted'` rows to the LLM adjudicator (data-egress; the #82 review
+item). Auto-merge only above a high confidence band; everything else → human-review queue.
+---
+## Part B — Decay: age out stale and low-value nodes
+### B1. Separate `salience` from `confidence` (important)
+Do **not** decay `confidence` — it means "how corroborated/true is this", and decaying it
+would lie about corroboration. Add a separate **`salience`** (retention priority) to
+entities/facts/relationships. Decay acts on salience; eviction keys on salience.
+`salience(t) = salience₀ · exp(−ln2 · Δt / half_life[category])`, bumped on access or
+re-corroboration. Per-category half-life:
+| category | half-life | rationale |
+|---|---|---|
+| decision, commitment | very long / ∞ | durable record |
+| state, preference | medium | changes but matters |
+| mention, observation | short | ephemeral |
+`Δt` = time since `last_seen` **or** a new `last_accessed` (bumped when a node is returned by
+`/search` — cheap write, makes retrieval keep memories alive). Re-corroboration (new
+provenance) resets the clock and bumps salience.
+### B2. Born-salience — the cheap partial cure
+Seed `salience₀` from extraction-quality signals already computed (the trap detectors:
+ungrounded, numeric-ID-person, hallucinated-email, `noise_filter` hits). **Junk is born
+low**, so it decays below threshold and self-evicts fast — pollution cleans itself even
+without a fusion match.
+### B3. Eviction (GC)
+Node is evictable when: `salience < min_threshold` **AND** `last_seen`/`last_accessed`
+older than a floor **AND** not referenced by a surviving higher-salience node (an entity
+that's the subject/object of a live fact survives). Eviction = **tombstone** (soft-delete +
+retention window) → hard-delete after grace, cascading to the node's Qdrant points +
+`vector_provenance`. Never evict `disclosure_class='restricted'` without sign-off.
+### B4. Capacity bound (optional)
+Per-arena soft cap; when exceeded, evict lowest-salience first. Backstop against unbounded
+arenas.
+### B5. Cadence + safety
+Background per-arena pass (timer on the engine box), dry-run → `--apply` in a quiet window,
+counts logged, fully arena-scoped. Same operational shape as the distiller autoscaler /
+sparse backfill.
+---
+## Part C — Ordering & how they combine
+Per arena, on schedule: **(1) fusion → (2) decay.** Fusion first so a master node absorbs
+its duplicates' provenance/salience *before* decay judges it (else a real node split across
+two weak dupes could wrongly decay out). Then decay ages + evicts the survivors.
+**This is what finally cures immortal pollution:**
+- 7B polluted node *with* a correct Qwen3.6 counterpart → **fused**, correct one as master,
+  polluted demoted to alias / tombstoned.
+- 7B pure-junk node with *no* correct counterpart (numeric-ID-person, ungrounded) → born-low
+  salience + no corroboration + never accessed → **decays out and is evicted**.
+Together they convert the accrete-only store into a self-healing one. `pip-agents` could
+then self-clean over time instead of requiring a nuke (a nuke is still faster for a one-shot
+reset, but no longer the *only* path).
+---
+## Part D — Schema changes
+- `entities`: `+ salience REAL DEFAULT …`, `+ last_accessed TIMESTAMPTZ`.
+- `facts`: `+ salience REAL`, `+ last_accessed TIMESTAMPTZ` (keep `confidence` as-is =
+  corroboration truth; `asserted_at`/`expires_at` already exist).
+- `relationships`: `+ salience REAL`, `+ last_accessed` (already has `weight`,
+  `first/last_seen`).
+- new `fact_merges` audit (mirror `entity_merges` incl. `rollback_payload`).
+- new `fusion_runs` + `decay_runs` ledgers for observability.
+- `/search` gains a `last_accessed = NOW()` bump on returned nodes (batched).
+## Part E — Rollout (each flag-gated, arena-scoped, dry-run-first, audited)
+1. **Salience scoring only** — add columns, born-salience + decay math, NO eviction.
+   Observe distributions; confirm junk scores low and durable facts stay high.
+2. **Eviction** — dry-run (count what *would* evict) → `--apply` in a quiet window.
+3. **Fusion extension** — scored canonical selection (fix typo-crowning) + cross-run
+   detection + fact fusion, dry-run → apply.
+4. **Online/continuous** — wire fusion+decay to run after distillation per arena.
+## Open questions
+- Half-life constants per category — needs a calibration pass against real arenas.
+- `last_accessed` write amplification on hot search paths — batch/throttle the bump.
+- Directory authority source for canonical anchoring — HubSpot contacts? a curated table?
+- Interaction with the (still-open) source_id supersede mode — fusion partly subsumes it,
+  but explicit supersede is cheaper for known-mutable sources.

package/packages/memory-engine-v2/docker-compose.aws.yml CHANGED Viewed

@@ -19,6 +19,14 @@
 services:
   org-model:
+    # max_connections + shared_buffers must be passed via `-c` flags;
+    # the postgres:16-alpine image does NOT honor POSTGRES_MAX_CONNECTIONS
+    # or POSTGRES_SHARED_BUFFERS env vars (only POSTGRES_USER/PASSWORD/DB).
+    # 2026-05-19: bumped from compiled default 100 -> 200 after Pip's
+    # aborted-forget incident saturated the slots (4 stuck DELETEs +
+    # baseline pools). Shared_buffers raised to match the operator intent
+    # that was previously expressed in the unread env vars.
+    command: ["postgres", "-c", "max_connections=200", "-c", "shared_buffers=1GB"]
     environment:
       # Production tuning: bigger shared_buffers for the materialised
       # views, more connection slots for the extractor + compat pools.
@@ -45,8 +53,53 @@ services:
       PG_DSN: ${PME_V2_PG_DSN}
       LLM_ENDPOINT: ${PME_V2_LLM_ENDPOINT:-}
       LLM_API_KEY: ${PENTATONIC_AI_GATEWAY_KEY:-}
+      # Default model id for the AWS self-hosted distiller (Qwen2.5-7B-Instruct
+      # via vLLM on i-0d658d1aa70b497a6, served as `qwen2.5-7b-instruct`).
+      # When PME_V2_LLM_ENDPOINT points back at the Lambda 30B gateway,
+      # override LLM_MODEL via env to that gateway's model id.
+      LLM_MODEL: ${LLM_MODEL:-qwen2.5-7b-instruct}
+      # Self-hosted distiller (Qwen3.6-27B-FP8 on L40S, served via the
+      # autoscaled fleet). Tuning vs the Lambda 30B fleet: smaller
+      # per-call chunks, higher concurrency, longer timeout.
+      #
+      # EVENTS_PER_LLM_CALL=3 (was 5) + LLM_MAX_TOKENS_PER_EVENT_JSON=900
+      # (was the 400 default): the guided-JSON max_tokens budget is
+      # SHARED across the chunk's events, so dense events (full email/doc
+      # bodies maxing 8 ent/6 fct/6 rel ≈ ~1.1k output tokens each)
+      # clustering in a 5-event chunk overran the old 2000-tok ceiling
+      # and truncated the JSON array tail — 15% of calls finished on
+      # `length` not `stop` (measured 2026-06-12). 3×900=2700 output +
+      # ~2100 prompt = ~4.8k, well inside the L40S's 8192 max-model-len
+      # (16384 OOMs the L40S), giving every event real headroom.
+      # Quality over throughput — the autoscaler adds boxes to recover
+      # the per-box throughput lost to smaller chunks.
+      EVENTS_PER_LLM_CALL: "3"
+      CONCURRENT_LLM_CALLS: "20"
+      LLM_MAX_TOKENS_PER_EVENT_JSON: "900"
+      LLM_TIMEOUT_SEC: "300"
       POLL_INTERVAL_SEC: "10"
-      CLAIM_TTL_SEC: "600"
+      CLAIM_TTL_SEC: "900"
+      POLL_INTERVAL_SEC_AFTER_EMPTY: "5"
+      # Skip-source list — never distil agent's-own-output, code ingest,
+      # orchestrator briefings, manual triage events into the graph.
+      # Source labels enumerated as they were observed leaking into prod
+      # over the weekend. New agent producers should be added here AND
+      # source_kind='agent' filtering should already drop them via worker.py.
+      DISTILL_SKIP_SOURCES: "pip-code-ingest,claude-code-plugin,openclaw-seesa,openclaw-plugin,openclaw-philip-mossop,openclaw-jamie,seesa,seesa-direct-curl-test,seesa-dedup-probe,orchestrator-web,briefing-morning,briefing-eod,triage-email,triage-manual"
+      # Trace logging — captures raw teacher I/O per distilled event into
+      # the distillation_traces table for student-model training data.
+      # Opt-in: defaults false here; set DISTILL_TRACE_ENABLED=true in
+      # SSM Parameter Store to flip on. See ai-events-sdk PR #74 for the
+      # worker-side logic + the migration that creates the table.
+      DISTILL_TRACE_ENABLED: ${DISTILL_TRACE_ENABLED:-false}
+      DISTILL_OUTPUT_MODE: ${DISTILL_OUTPUT_MODE:-kv}
+      DISTILL_GUIDED_PARAM_STYLE: ${DISTILL_GUIDED_PARAM_STYLE:-response_format}
+      # Chat-template switches forwarded verbatim on every completion
+      # (vLLM `chat_template_kwargs`). Required for thinking-capable
+      # teachers — Qwen3.x defaults enable_thinking=true, which burns
+      # the token budget on reasoning the distiller never reads. Set in
+      # SSM to '{"enable_thinking": false}' for the Qwen3.6 teacher.
+      DISTILL_CHAT_TEMPLATE_KWARGS: ${DISTILL_CHAT_TEMPLATE_KWARGS:-}
   compat:
     environment:
@@ -54,8 +107,15 @@ services:
       VECTOR_INDEX_URL: http://vector-index:6333
       EXTRACTOR_SYNC_URL: http://extractor-sync:8101
       NV_EMBED_URL: ${NV_EMBED_URL}
+      # Bulk embed lane (PR #76 ai-events-sdk) — separate box from the
+      # interactive lane so heavy backfills don't queue behind chat
+      # query embeds. Set in SSM to a different IP from NV_EMBED_URL.
+      NV_EMBED_URL_BULK: ${NV_EMBED_URL_BULK}
       NV_EMBED_API_KEY: ${PENTATONIC_AI_GATEWAY_KEY}
       NV_EMBED_PROVIDER: pentatonic-gateway
+      SEARCH_HYBRID_ENABLED: ${SEARCH_HYBRID_ENABLED:-}
+      SEARCH_MMR_ENABLED: ${SEARCH_MMR_ENABLED:-1}
+      SEARCH_INTENT_BOOST: ${SEARCH_INTENT_BOOST:-1}
       EMBED_DIM: "4096"
   # Cloudflared tunnel — same pattern as v1. Optional; only start if
@@ -76,3 +136,4 @@ services:
     depends_on:
       compat:
         condition: service_healthy

package/packages/memory-engine-v2/docker-compose.yml CHANGED Viewed

@@ -74,7 +74,14 @@ services:
   # --------------------------------------------------------------------
   vector-index:
     <<: *engine-base
-    image: qdrant/qdrant:v1.12.4
+    # v1.18.2: minimum version whose API can ADD a named (sparse) vector
+    # to an existing collection (PUT /collections/{c}/vectors/{v}) —
+    # required by hybrid retrieval's 'lex' migration. Upgraded in prod
+    # 2026-06-11 by stepping minors 1.13.6→…→1.18.2 (the 1.12→1.18
+    # direct jump fails: segment.json "unknown variant `on_disk`").
+    # Do NOT lower this pin: 1.18-migrated storage cannot be read by
+    # older servers.
+    image: qdrant/qdrant:v1.18.2
     container_name: pme2-vector-index
     ports:
       - "127.0.0.1:${PME_V2_QDRANT_HTTP_PORT:-16333}:6333"

package/packages/memory-engine-v2/extractor-async/confidence.py CHANGED Viewed

@@ -60,3 +60,40 @@ def corroborated_confidence(n_sources: int) -> float:
     if bumped > _CONF_CAP:
         return _CONF_CAP
     return round(bumped, 2)
+# ── born salience (Fusion Drive) ─────────────────────────────────────
+# Retention priority a node is stamped with at extraction time, SEPARATE
+# from confidence (confidence = corroboration/truth; salience = how long
+# it's worth keeping). Junk — flagged by the extractor's own quality
+# detectors (noise name, numeric-ID-as-person, hallucinated email,
+# ungrounded, etc.) — is born near the floor so the Fusion Drive decay
+# pass evicts it on a short clock instead of the multi-year default.
+#
+# This MUST stay byte-identical to fusion_drive/salience.py:born_salience
+# (the decay side uses the same scale). test_born_salience_parity.py
+# guards the two against drift — same pattern as entity_id.py's parity
+# test across the sync/async build contexts.
+_SAL_BASE = 0.50
+_SAL_CORROB_PER_SOURCE = 0.10
+_SAL_CORROB_CAP = 0.30
+_SAL_FLOOR = 0.01
+_SAL_CEIL = 1.00
+_SAL_PENALTIES = {
+    "noise_name": 0.45,
+    "numeric_id_person": 0.45,
+    "hallucinated_email": 0.40,
+    "ungrounded": 0.35,
+    "subject_undeclared": 0.25,
+    "low_signal": 0.15,
+}
+def born_salience(n_sources: int = 1, quality_flags: list[str] | None = None) -> float:
+    """Salience to stamp on a freshly extracted node. See the module note."""
+    s = _SAL_BASE
+    if n_sources > 1:
+        s += min(_SAL_CORROB_CAP, _SAL_CORROB_PER_SOURCE * (n_sources - 1))
+    for flag in quality_flags or []:
+        s -= _SAL_PENALTIES.get(flag, 0.0)
+    return round(max(_SAL_FLOOR, min(_SAL_CEIL, s)), 4)

package/packages/memory-engine-v2/extractor-async/test_born_salience_parity.py ADDED Viewed

@@ -0,0 +1,35 @@
+"""Parity guard: confidence.born_salience (worker, copied into the container)
+must stay byte-equivalent to fusion_drive/salience.born_salience (the decay
+side). Same pattern as test_entity_id_parity.py — the two live across a Docker
+build-context boundary and would silently drift otherwise."""
+from __future__ import annotations
+import os
+import sys
+import confidence as worker
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "fusion_drive"))
+import salience as drive  # noqa: E402
+def test_constants_match():
+    assert worker._SAL_BASE == drive.BASE_SALIENCE
+    assert worker._SAL_CORROB_PER_SOURCE == drive.CORROB_PER_SOURCE
+    assert worker._SAL_CORROB_CAP == drive.CORROB_CAP
+    assert worker._SAL_FLOOR == drive.SALIENCE_FLOOR
+    assert worker._SAL_CEIL == drive.SALIENCE_CEIL
+    assert worker._SAL_PENALTIES == drive.QUALITY_PENALTIES
+def test_output_matches_across_input_matrix():
+    flagsets = [
+        None, [], ["noise_name"], ["numeric_id_person"], ["hallucinated_email"],
+        ["ungrounded"], ["subject_undeclared"], ["low_signal"],
+        ["numeric_id_person", "hallucinated_email", "ungrounded"],
+        ["noise_name"] * 5,
+    ]
+    for n in (1, 2, 3, 5, 100):
+        for flags in flagsets:
+            assert worker.born_salience(n, flags) == drive.born_salience(n_sources=n, quality_flags=flags), (n, flags)

package/packages/memory-engine-v2/extractor-async/test_guided_json_parser.py CHANGED Viewed

@@ -409,3 +409,47 @@ def test_guided_prompt_keeps_content_rules() -> None:
     # Pipe scaffolding gone
     assert "COUNT THE PIPES" not in p
     assert "PIPE-DELIMITED" not in p
+# ----------------------------------------------------------------------
+# DISTILL_CHAT_TEMPLATE_KWARGS — thinking-teacher template switch
+# ----------------------------------------------------------------------
+def test_default_body_has_no_chat_template_kwargs(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    """Unset env → the request body is byte-identical to before the
+    knob existed (Qwen2.5-class teachers need no template switches)."""
+    monkeypatch.delenv("DISTILL_CHAT_TEMPLATE_KWARGS", raising=False)
+    w = _load_worker("worker_no_ctk")
+    assert w.DISTILL_CHAT_TEMPLATE_KWARGS is None
+    assert "chat_template_kwargs" not in w._build_request_body("PROMPT", 5)
+def test_chat_template_kwargs_forwarded(monkeypatch: pytest.MonkeyPatch) -> None:
+    """The Qwen3.x swap case: {"enable_thinking": false} must land
+    verbatim in every request body, in both output modes."""
+    monkeypatch.setenv("DISTILL_CHAT_TEMPLATE_KWARGS", '{"enable_thinking": false}')
+    w = _load_worker("worker_ctk")
+    assert w.DISTILL_CHAT_TEMPLATE_KWARGS == {"enable_thinking": False}
+    body = w._build_request_body("PROMPT", 5)
+    assert body["chat_template_kwargs"] == {"enable_thinking": False}
+    monkeypatch.setenv("DISTILL_OUTPUT_MODE", "guided_json")
+    w2 = _load_worker("worker_ctk_guided")
+    body2 = w2._build_request_body("PROMPT", 5)
+    assert body2["chat_template_kwargs"] == {"enable_thinking": False}
+    assert "response_format" in body2
+def test_chat_template_kwargs_invalid_ignored(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    """Malformed JSON or a non-object must not take the worker down —
+    log + ignore, requests stay clean."""
+    for bad in ("{not json", '["a", "list"]', '"a string"'):
+        monkeypatch.setenv("DISTILL_CHAT_TEMPLATE_KWARGS", bad)
+        w = _load_worker(f"worker_ctk_bad_{abs(hash(bad))}")
+        assert w.DISTILL_CHAT_TEMPLATE_KWARGS is None
+        assert "chat_template_kwargs" not in w._build_request_body("PROMPT", 5)