npm - @pentatonic-ai/ai-agent-sdk - Versions diffs - 0.10.7 → 0.10.9 - Mend

@pentatonic-ai/ai-agent-sdk 0.10.7 → 0.10.9

This diff shows the changes between publicly released versions of the package as they appear in their public registries. It is provided for informational purposes only.

Files changed (30)

package/dist/index.cjs CHANGED

@@ -878,7 +878,7 @@ function fireAndForgetEmit(clientConfig, sessionOpts, messages, result, model) {
 }
 // src/telemetry.js
-var VERSION = "0.10.7";
+var VERSION = "0.10.9";
 var TELEMETRY_URL = "https://sdk-telemetry.philip-134.workers.dev";
 function machineId() {
   const raw = typeof process !== "undefined" ? `${process.env?.USER || process.env?.USERNAME || "u"}:${process.platform || "x"}:${process.arch || "x"}` : "browser";

package/dist/index.js CHANGED

@@ -847,7 +847,7 @@ function fireAndForgetEmit(clientConfig, sessionOpts, messages, result, model) {
 }
 // src/telemetry.js
-var VERSION = "0.10.7";
+var VERSION = "0.10.9";
 var TELEMETRY_URL = "https://sdk-telemetry.philip-134.workers.dev";
 function machineId() {
   const raw = typeof process !== "undefined" ? `${process.env?.USER || process.env?.USERNAME || "u"}:${process.platform || "x"}:${process.arch || "x"}` : "browser";

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@pentatonic-ai/ai-agent-sdk",
-  "version": "0.10.7",
+  "version": "0.10.9",
   "description": "TES SDK — LLM observability and lifecycle tracking via Pentatonic Thing Event System. Track token usage, tool calls, and conversations. Manage things through event-sourced lifecycle stages with AI enrichment and vector search.",
   "type": "module",
   "main": "./dist/index.cjs",

package/packages/memory-engine-v2/RFC-decay-and-fusion.md ADDED

@@ -0,0 +1,185 @@
+# RFC: the Fusion Drive — v2 memory self-healing (cross-run node fusion + decay)
+> **Fusion Drive** = the continuous, arena-scoped background engine that keeps the v2
+> memory graph self-healing: it *fuses* duplicate/near-duplicate nodes from different
+> distillation runs into a single master node (horizontal convergence) and *decays* stale,
+> low-value, and junk nodes out of existence (vertical aging). Named for the drive that
+> does the fusing — the decay pass rides the same engine.
+**Status:** draft / spec — 2026-06-12
+**Builds on:** `RFC-entity-reconciliation.md`, `scripts/entity_resolution_v2.py` (#82),
+`org-model/migrations/002_entity_merges_audit.sql`.
+**Motivated by:** the v2 store is currently **pure-accretion** — three independent
+properties, all verified in code, mean nothing ever leaves or improves in place:
+1. **No supersede by source_id** — event identity is `sha256(arena:content)`; re-emitting
+   edited content appends a new event, the old persists.
+2. **Accrete-only graph writes** — entity/fact upserts are `ON CONFLICT (id) DO UPDATE`
+   that only merge aliases/provenance and bump confidence; a *corrected* extraction has a
+   different deterministic id, so it lands **beside** the polluted node, never replacing it.
+3. **No decay/eviction** — v2 has no GC; fact confidence only moves up; recency affects
+   search ranking only, never retention.
+Net: improving the extractor/teacher only helps **new** content. Accumulated 7B-era
+pollution (hallucinated emails, numeric-ID-as-person, ungrounded entities) is immortal.
+`pentatonic-team` had to be **nuked** rather than re-distilled because of this; `pip-agents`
+(87k events) still carries all of it.
+This RFC makes the store **self-healing** via two complementary mechanisms:
+**fusion** (horizontal — converge duplicate/near-duplicate nodes from different
+distillation runs into one *master* node) and **decay** (vertical — age out stale and
+low-value nodes). Both are gated, arena-scoped, audited, and reversible.
+---
+## Part A — Fusion: converge near-duplicate nodes into a master
+Extends the existing entity-resolution machinery along four axes.
+### A1. Online + continuous (today it's dry-run batch)
+Run fusion as a scheduled per-arena pass (systemd timer on the engine box, same pattern as
+the distiller autoscaler) **and** opportunistically after a distillation run touches an
+arena's entities. Keep #82's invariants: dry-run default, `--apply` gate, arena scoping,
+`entity_merges` rollback. Add a `fusion_runs` ledger (arena, started_at, candidates,
+merged, mode) for observability.
+### A2. Cross-distillation-run detection (the actual pollution cure)
+The hard case #82 misses: 7B `"1716801984"` (numeric-ID person) and Qwen3.6 `"Katie Cooper"`
+are the same real entity but share **no name similarity**, so name-blocking never compares
+them. New candidate signals beyond name trigrams / embedding-on-name:
+- **Shared-provenance co-reference** — two entities of the same `entity_type` citing the
+  same `event_id` in `provenance_event_ids`, where one is low-quality (numeric / ungrounded
+  / single-token). The shared event's content is the adjudication context ("does this event
+  support these being the same person?").
+- **Context embedding** — embed the *facts/statements about* an entity (not just its name),
+  so name-divergent dupes still cluster. Reuses the bulk-embed lane.
+- **Teacher-version signal** — provenance maps to `distillation_traces.llm_model` /
+  `system_prompt_hash`. Prefer the newer-teacher extraction as master; an entity *only* ever
+  produced by the superseded teacher and never re-confirmed by the new one is both a fusion
+  candidate (likely a worse rendering of a node the new teacher got right) and a decay
+  candidate (stale-teacher orphan — see B).
+### A3. Master-node selection — replace richest-row-wins
+#82 uses "richest-row-wins", which (flagged in review) would crown the typo **"Phil Mossop"**
+over **"Philip Mossop"**. Replace with a **scored** canonical pick:
+| Signal | Effect |
+|---|---|
+| **Directory/authority anchor** (name matches an org-directory / HubSpot contact / Pip `contact_email`+`contact_name`) | dominant + → canonical |
+| Grounding (name appears verbatim in a provenance event's content) | + |
+| Teacher recency (newer `llm_model`) | + |
+| Corroboration (`cardinality(provenance_event_ids)`) | + |
+| Looks-like-ID (digit-ratio > 0.5) / hallucinated-email flag / single-token bare name | − − |
+Master = highest score. Losers' surface forms become **aliases** on the master (so existing
+lookups still resolve), facts/relationships are repointed, losers tombstoned in
+`entity_merges` with `rollback_payload`. Directory-anchored selection is the key fix: an
+authoritative source, when present, beats any heuristic.
+### A4. Fact + relationship fusion (today only entities fuse)
+After entity fusion (so subject/object ids are canonical):
+- **Facts** — exact `(arena, subject, predicate, object)` dupes already collapse via the
+  content-id. **Semantic** dupes (same assertion, different surface — "joined Acme" vs "works
+  at Acme") need statement-embedding similarity + LLM adjudication ("same assertion?").
+  Master fact = max confidence + best-grounded statement; union provenance; tombstone dupes.
+  New `fact_merges` audit mirroring `entity_merges`.
+- **Relationships** — `(from,to,type)` already collapses; a controlled rel-type vocabulary
+  ("works at" ≡ "employed by") is a later optional canonicalization.
+### A5. Audit, reversibility, safety rails
+Reuse `entity_merges`; add `fact_merges`. Every fusion carries `rollback_payload`.
+LLM-adjudicated merges store prompt+verdict. **Disclosure rail:** never send
+`disclosure_class='restricted'` rows to the LLM adjudicator (data-egress; the #82 review
+item). Auto-merge only above a high confidence band; everything else → human-review queue.
+---
+## Part B — Decay: age out stale and low-value nodes
+### B1. Separate `salience` from `confidence` (important)
+Do **not** decay `confidence` — it means "how corroborated/true is this", and decaying it
+would lie about corroboration. Add a separate **`salience`** (retention priority) to
+entities/facts/relationships. Decay acts on salience; eviction keys on salience.
+`salience(t) = salience₀ · exp(−ln2 · Δt / half_life[category])`, bumped on access or
+re-corroboration. Per-category half-life:
+| category | half-life | rationale |
+|---|---|---|
+| decision, commitment | very long / ∞ | durable record |
+| state, preference | medium | changes but matters |
+| mention, observation | short | ephemeral |
+`Δt` = time since `last_seen` **or** a new `last_accessed` (bumped when a node is returned by
+`/search` — cheap write, makes retrieval keep memories alive). Re-corroboration (new
+provenance) resets the clock and bumps salience.
+### B2. Born-salience — the cheap partial cure
+Seed `salience₀` from extraction-quality signals already computed (the trap detectors:
+ungrounded, numeric-ID-person, hallucinated-email, `noise_filter` hits). **Junk is born
+low**, so it decays below threshold and self-evicts fast — pollution cleans itself even
+without a fusion match.
+### B3. Eviction (GC)
+Node is evictable when: `salience < min_threshold` **AND** `last_seen`/`last_accessed`
+older than a floor **AND** not referenced by a surviving higher-salience node (an entity
+that's the subject/object of a live fact survives). Eviction = **tombstone** (soft-delete +
+retention window) → hard-delete after grace, cascading to the node's Qdrant points +
+`vector_provenance`. Never evict `disclosure_class='restricted'` without sign-off.
+### B4. Capacity bound (optional)
+Per-arena soft cap; when exceeded, evict lowest-salience first. Backstop against unbounded
+arenas.
+### B5. Cadence + safety
+Background per-arena pass (timer on the engine box), dry-run → `--apply` in a quiet window,
+counts logged, fully arena-scoped. Same operational shape as the distiller autoscaler /
+sparse backfill.
+---
+## Part C — Ordering & how they combine
+Per arena, on schedule: **(1) fusion → (2) decay.** Fusion first so a master node absorbs
+its duplicates' provenance/salience *before* decay judges it (else a real node split across
+two weak dupes could wrongly decay out). Then decay ages + evicts the survivors.
+**This is what finally cures immortal pollution:**
+- 7B polluted node *with* a correct Qwen3.6 counterpart → **fused**, correct one as master,
+  polluted demoted to alias / tombstoned.
+- 7B pure-junk node with *no* correct counterpart (numeric-ID-person, ungrounded) → born-low
+  salience + no corroboration + never accessed → **decays out and is evicted**.
+Together they convert the accrete-only store into a self-healing one. `pip-agents` could
+then self-clean over time instead of requiring a nuke (a nuke is still faster for a one-shot
+reset, but no longer the *only* path).
+---
+## Part D — Schema changes
+- `entities`: `+ salience REAL DEFAULT …`, `+ last_accessed TIMESTAMPTZ`.
+- `facts`: `+ salience REAL`, `+ last_accessed TIMESTAMPTZ` (keep `confidence` as-is =
+  corroboration truth; `asserted_at`/`expires_at` already exist).
+- `relationships`: `+ salience REAL`, `+ last_accessed` (already has `weight`,
+  `first/last_seen`).
+- new `fact_merges` audit (mirror `entity_merges` incl. `rollback_payload`).
+- new `fusion_runs` + `decay_runs` ledgers for observability.
+- `/search` gains a `last_accessed = NOW()` bump on returned nodes (batched).
+## Part E — Rollout (each flag-gated, arena-scoped, dry-run-first, audited)
+1. **Salience scoring only** — add columns, born-salience + decay math, NO eviction.
+   Observe distributions; confirm junk scores low and durable facts stay high.
+2. **Eviction** — dry-run (count what *would* evict) → `--apply` in a quiet window.
+3. **Fusion extension** — scored canonical selection (fix typo-crowning) + cross-run
+   detection + fact fusion, dry-run → apply.
+4. **Online/continuous** — wire fusion+decay to run after distillation per arena.
+## Open questions
+- Half-life constants per category — needs a calibration pass against real arenas.
+- `last_accessed` write amplification on hot search paths — batch/throttle the bump.
+- Directory authority source for canonical anchoring — HubSpot contacts? a curated table?
+- Interaction with the (still-open) source_id supersede mode — fusion partly subsumes it,
+  but explicit supersede is cheaper for known-mutable sources.

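The decay rule (B1) and the eviction predicate (B3) in the RFC above can be sketched as follows. This is a minimal illustration, not the package's `fusion_drive` code: the half-life values, `MIN_SALIENCE`, `MIN_AGE`, and the function names are all hypothetical (the RFC defers the real constants to a calibration pass), and taking the later of `last_seen` and `last_accessed` as the clock start is an assumption.

```python
from __future__ import annotations

import math
from datetime import datetime, timedelta, timezone

# Hypothetical half-lives in days. None stands in for "very long / infinite".
HALF_LIFE_DAYS: dict[str, float | None] = {
    "decision": None,
    "commitment": None,
    "state": 180.0,
    "preference": 180.0,
    "mention": 14.0,
    "observation": 14.0,
}

MIN_SALIENCE = 0.05           # hypothetical eviction threshold
MIN_AGE = timedelta(days=30)  # hypothetical "older than a floor" value


def decayed_salience(salience0: float, category: str,
                     last_touch: datetime, now: datetime) -> float:
    """salience(t) = salience0 * exp(-ln2 * dt / half_life[category])."""
    half_life = HALF_LIFE_DAYS.get(category)
    if half_life is None:
        # Infinite half-life. In this sketch unknown categories also
        # land here, which errs on the side of keeping data.
        return salience0
    dt_days = max(0.0, (now - last_touch).total_seconds() / 86400.0)
    return salience0 * math.exp(-math.log(2) * dt_days / half_life)


def is_evictable(salience: float, last_touch: datetime, now: datetime,
                 referenced_by_live_node: bool, restricted: bool) -> bool:
    """B3: low salience AND old enough AND not referenced by a survivor."""
    if restricted:
        return False  # restricted rows require sign-off, never auto-evict
    return (salience < MIN_SALIENCE
            and now - last_touch > MIN_AGE
            and not referenced_by_live_node)


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
touched = now - timedelta(days=45)  # max(last_seen, last_accessed), assumed
junk = decayed_salience(0.01, "mention", touched, now)
print(junk, is_evictable(junk, touched, now, False, False))
```

Because a re-corroboration or a `/search` hit moves `last_touch` forward, the same function also models "retrieval keeps memories alive" without any extra state.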
package/packages/memory-engine-v2/RFC-fusion-drive.md ADDED

@@ -0,0 +1,199 @@
+# RFC: the Fusion Drive — v2 memory self-healing (cross-run node fusion + decay)
+> **Fusion Drive** = the continuous, arena-scoped background engine that keeps the v2
+> memory graph self-healing: it *fuses* duplicate/near-duplicate nodes from different
+> distillation runs into a single master node (horizontal convergence) and *decays* stale,
+> low-value, and junk nodes out of existence (vertical aging). Named for the drive that
+> does the fusing — the decay pass rides the same engine.
+**Status:** spec + implementation (PR #92, then completion PR) — 2026-06-13.
+**Implemented:** salience scoring + decay; **eviction** (`fusion_drive_decay.py --evict`,
+reversible via `node_evictions`); **entity AND relationship decay**; **fusion** of exact +
+cross-run-shared-provenance entity dupes and exact-triple fact dupes, plus an **LLM
+adjudication tier via the in-VPC distiller** (Qwen3.6 — NO egress) for ambiguous cross-run
+entities and semantic (same-assertion-different-words) facts; **authority signals** wired
+into canonical scoring (`grounded` = name verbatim in a provenance event;
+`from_current_teacher` = `distillation_traces.llm_model`); **born-salience** in BOTH the
+async distiller and the sync extractor (+ backfill for existing rows); **continuous
+scheduling** (the `fusion-drive-sweep` 6h timer — dry-run-default, never `--evict` from
+cron). All arena-scoped, dry-run-default, transactional, reversible, audited.
+**Remaining:** `in_directory` anchoring (needs an authoritative directory/contacts source —
+no such table exists yet; the scorer already supports it for when one lands); and the
+**half-life / threshold / salience-constant CALIBRATION pass on a real arena before
+`--evict` is ever run in prod** — eviction stays a deliberate manual op until then.
+**Builds on:** `RFC-entity-reconciliation.md`, `scripts/entity_resolution_v2.py` (#82),
+`org-model/migrations/002_entity_merges_audit.sql`.
+**Motivated by:** the v2 store is currently **pure-accretion** — three independent
+properties, all verified in code, mean nothing ever leaves or improves in place:
+1. **No supersede by source_id** — event identity is `sha256(arena:content)`; re-emitting
+   edited content appends a new event, the old persists.
+2. **Accrete-only graph writes** — entity/fact upserts are `ON CONFLICT (id) DO UPDATE`
+   that only merge aliases/provenance and bump confidence; a *corrected* extraction has a
+   different deterministic id, so it lands **beside** the polluted node, never replacing it.
+3. **No decay/eviction** — v2 has no GC; fact confidence only moves up; recency affects
+   search ranking only, never retention.
+Net: improving the extractor/teacher only helps **new** content. Accumulated 7B-era
+pollution (hallucinated emails, numeric-ID-as-person, ungrounded entities) is immortal.
+`pentatonic-team` had to be **nuked** rather than re-distilled because of this; `pip-agents`
+(87k events) still carries all of it.
+This RFC makes the store **self-healing** via two complementary mechanisms:
+**fusion** (horizontal — converge duplicate/near-duplicate nodes from different
+distillation runs into one *master* node) and **decay** (vertical — age out stale and
+low-value nodes). Both are gated, arena-scoped, audited, and reversible.
+---
+## Part A — Fusion: converge near-duplicate nodes into a master
+Extends the existing entity-resolution machinery along four axes.
+### A1. Online + continuous (today it's dry-run batch)
+Run fusion as a scheduled per-arena pass (systemd timer on the engine box, same pattern as
+the distiller autoscaler) **and** opportunistically after a distillation run touches an
+arena's entities. Keep #82's invariants: dry-run default, `--apply` gate, arena scoping,
+`entity_merges` rollback. Add a `fusion_runs` ledger (arena, started_at, candidates,
+merged, mode) for observability.
+### A2. Cross-distillation-run detection (the actual pollution cure)
+The hard case #82 misses: 7B `"1716801984"` (numeric-ID person) and Qwen3.6 `"Katie Cooper"`
+are the same real entity but share **no name similarity**, so name-blocking never compares
+them. New candidate signals beyond name trigrams / embedding-on-name:
+- **Shared-provenance co-reference** — two entities of the same `entity_type` citing the
+  same `event_id` in `provenance_event_ids`, where one is low-quality (numeric / ungrounded
+  / single-token). The shared event's content is the adjudication context ("does this event
+  support these being the same person?").
+- **Context embedding** — embed the *facts/statements about* an entity (not just its name),
+  so name-divergent dupes still cluster. Reuses the bulk-embed lane.
+- **Teacher-version signal** — provenance maps to `distillation_traces.llm_model` /
+  `system_prompt_hash`. Prefer the newer-teacher extraction as master; an entity *only* ever
+  produced by the superseded teacher and never re-confirmed by the new one is both a fusion
+  candidate (likely a worse rendering of a node the new teacher got right) and a decay
+  candidate (stale-teacher orphan — see B).
+### A3. Master-node selection — replace richest-row-wins
+#82 uses "richest-row-wins", which (flagged in review) would crown the typo **"Phil Mossop"**
+over **"Philip Mossop"**. Replace with a **scored** canonical pick:
+| Signal | Effect |
+|---|---|
+| **Directory/authority anchor** (name matches an org-directory / HubSpot contact / Pip `contact_email`+`contact_name`) | dominant + → canonical |
+| Grounding (name appears verbatim in a provenance event's content) | + |
+| Teacher recency (newer `llm_model`) | + |
+| Corroboration (`cardinality(provenance_event_ids)`) | + |
+| Looks-like-ID (digit-ratio > 0.5) / hallucinated-email flag / single-token bare name | − − |
+Master = highest score. Losers' surface forms become **aliases** on the master (so existing
+lookups still resolve), facts/relationships are repointed, losers tombstoned in
+`entity_merges` with `rollback_payload`. Directory-anchored selection is the key fix: an
+authoritative source, when present, beats any heuristic.
+### A4. Fact + relationship fusion (today only entities fuse)
+After entity fusion (so subject/object ids are canonical):
+- **Facts** — exact `(arena, subject, predicate, object)` dupes already collapse via the
+  content-id. **Semantic** dupes (same assertion, different surface — "joined Acme" vs "works
+  at Acme") need statement-embedding similarity + LLM adjudication ("same assertion?").
+  Master fact = max confidence + best-grounded statement; union provenance; tombstone dupes.
+  New `fact_merges` audit mirroring `entity_merges`.
+- **Relationships** — `(from,to,type)` already collapses; a controlled rel-type vocabulary
+  ("works at" ≡ "employed by") is a later optional canonicalization.
+### A5. Audit, reversibility, safety rails
+Reuse `entity_merges`; add `fact_merges`. Every fusion carries `rollback_payload`.
+LLM-adjudicated merges store prompt+verdict. **Disclosure rail:** never send
+`disclosure_class='restricted'` rows to the LLM adjudicator (data-egress; the #82 review
+item). Auto-merge only above a high confidence band; everything else → human-review queue.
+---
+## Part B — Decay: age out stale and low-value nodes
+### B1. Separate `salience` from `confidence` (important)
+Do **not** decay `confidence` — it means "how corroborated/true is this", and decaying it
+would lie about corroboration. Add a separate **`salience`** (retention priority) to
+entities/facts/relationships. Decay acts on salience; eviction keys on salience.
+`salience(t) = salience₀ · exp(−ln2 · Δt / half_life[category])`, bumped on access or
+re-corroboration. Per-category half-life:
+| category | half-life | rationale |
+|---|---|---|
+| decision, commitment | very long / ∞ | durable record |
+| state, preference | medium | changes but matters |
+| mention, observation | short | ephemeral |
+`Δt` = time since `last_seen` **or** a new `last_accessed` (bumped when a node is returned by
+`/search` — cheap write, makes retrieval keep memories alive). Re-corroboration (new
+provenance) resets the clock and bumps salience.
+### B2. Born-salience — the cheap partial cure
+Seed `salience₀` from extraction-quality signals already computed (the trap detectors:
+ungrounded, numeric-ID-person, hallucinated-email, `noise_filter` hits). **Junk is born
+low**, so it decays below threshold and self-evicts fast — pollution cleans itself even
+without a fusion match.
+### B3. Eviction (GC)
+Node is evictable when: `salience < min_threshold` **AND** `last_seen`/`last_accessed`
+older than a floor **AND** not referenced by a surviving higher-salience node (an entity
+that's the subject/object of a live fact survives). Eviction = **tombstone** (soft-delete +
+retention window) → hard-delete after grace, cascading to the node's Qdrant points +
+`vector_provenance`. Never evict `disclosure_class='restricted'` without sign-off.
+### B4. Capacity bound (optional)
+Per-arena soft cap; when exceeded, evict lowest-salience first. Backstop against unbounded
+arenas.
+### B5. Cadence + safety
+Background per-arena pass (timer on the engine box), dry-run → `--apply` in a quiet window,
+counts logged, fully arena-scoped. Same operational shape as the distiller autoscaler /
+sparse backfill.
+---
+## Part C — Ordering & how they combine
+Per arena, on schedule: **(1) fusion → (2) decay.** Fusion first so a master node absorbs
+its duplicates' provenance/salience *before* decay judges it (else a real node split across
+two weak dupes could wrongly decay out). Then decay ages + evicts the survivors.
+**This is what finally cures immortal pollution:**
+- 7B polluted node *with* a correct Qwen3.6 counterpart → **fused**, correct one as master,
+  polluted demoted to alias / tombstoned.
+- 7B pure-junk node with *no* correct counterpart (numeric-ID-person, ungrounded) → born-low
+  salience + no corroboration + never accessed → **decays out and is evicted**.
+Together they convert the accrete-only store into a self-healing one. `pip-agents` could
+then self-clean over time instead of requiring a nuke (a nuke is still faster for a one-shot
+reset, but no longer the *only* path).
+---
+## Part D — Schema changes
+- `entities`: `+ salience REAL DEFAULT …`, `+ last_accessed TIMESTAMPTZ`.
+- `facts`: `+ salience REAL`, `+ last_accessed TIMESTAMPTZ` (keep `confidence` as-is =
+  corroboration truth; `asserted_at`/`expires_at` already exist).
+- `relationships`: `+ salience REAL`, `+ last_accessed` (already has `weight`,
+  `first/last_seen`).
+- new `fact_merges` audit (mirror `entity_merges` incl. `rollback_payload`).
+- new `fusion_runs` + `decay_runs` ledgers for observability.
+- `/search` gains a `last_accessed = NOW()` bump on returned nodes (batched).
+## Part E — Rollout (each flag-gated, arena-scoped, dry-run-first, audited)
+1. **Salience scoring only** — add columns, born-salience + decay math, NO eviction.
+   Observe distributions; confirm junk scores low and durable facts stay high.
+2. **Eviction** — dry-run (count what *would* evict) → `--apply` in a quiet window.
+3. **Fusion extension** — scored canonical selection (fix typo-crowning) + cross-run
+   detection + fact fusion, dry-run → apply.
+4. **Online/continuous** — wire fusion+decay to run after distillation per arena.
+## Open questions
+- Half-life constants per category — needs a calibration pass against real arenas.
+- `last_accessed` write amplification on hot search paths — batch/throttle the bump.
+- Directory authority source for canonical anchoring — HubSpot contacts? a curated table?
+- Interaction with the (still-open) source_id supersede mode — fusion partly subsumes it,
+  but explicit supersede is cheaper for known-mutable sources.

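The scored canonical pick from section A3 could look roughly like the sketch below. The signal names `in_directory`, `grounded`, and `from_current_teacher` come from the RFC's status header; every weight, the `Candidate` class, and the function names are hypothetical, since the RFC only gives the direction of each signal (plus, dominant plus, double minus).

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical weights chosen only to respect the A3 table's directions.
W_DIRECTORY = 10.0        # dominant: an authoritative source beats heuristics
W_GROUNDED = 2.0
W_CURRENT_TEACHER = 1.0
W_PER_PROVENANCE = 0.25
W_PROVENANCE_CAP = 2.0
W_JUNK_PENALTY = 4.0


@dataclass
class Candidate:
    name: str
    provenance_count: int = 1
    in_directory: bool = False
    grounded: bool = False
    from_current_teacher: bool = False
    hallucinated_email: bool = False


def looks_like_id(name: str) -> bool:
    """Digit ratio above 0.5, per the A3 table."""
    chars = [c for c in name if not c.isspace()]
    return bool(chars) and sum(c.isdigit() for c in chars) / len(chars) > 0.5


def canonical_score(c: Candidate) -> float:
    # Corroboration counts, but is capped so row richness cannot dominate.
    score = min(W_PROVENANCE_CAP, W_PER_PROVENANCE * c.provenance_count)
    if c.in_directory:
        score += W_DIRECTORY
    if c.grounded:
        score += W_GROUNDED
    if c.from_current_teacher:
        score += W_CURRENT_TEACHER
    single_token = len(c.name.split()) == 1
    if looks_like_id(c.name) or c.hallucinated_email or single_token:
        score -= W_JUNK_PENALTY
    return score


def pick_master(cands: list[Candidate]) -> tuple[Candidate, list[str]]:
    """Return the master plus the losers' names, which become aliases."""
    master = max(cands, key=canonical_score)
    return master, [c.name for c in cands if c is not master]


typo = Candidate("Phil Mossop", provenance_count=9)
real = Candidate("Philip Mossop", provenance_count=3, in_directory=True)
master, aliases = pick_master([typo, real])
print(master.name, aliases)
```

With these assumed weights the richer but misspelled row no longer wins: the directory anchor outweighs its larger provenance count, and the typo survives only as an alias.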
package/packages/memory-engine-v2/extractor-async/confidence.py CHANGED

@@ -60,3 +60,40 @@ def corroborated_confidence(n_sources: int) -> float:
     if bumped > _CONF_CAP:
         return _CONF_CAP
     return round(bumped, 2)
+# ── born salience (Fusion Drive) ─────────────────────────────────────
+# Retention priority a node is stamped with at extraction time, SEPARATE
+# from confidence (confidence = corroboration/truth; salience = how long
+# it's worth keeping). Junk — flagged by the extractor's own quality
+# detectors (noise name, numeric-ID-as-person, hallucinated email,
+# ungrounded, etc.) — is born near the floor so the Fusion Drive decay
+# pass evicts it on a short clock instead of the multi-year default.
+#
+# This MUST stay byte-identical to fusion_drive/salience.py:born_salience
+# (the decay side uses the same scale). test_born_salience_parity.py
+# guards the two against drift — same pattern as entity_id.py's parity
+# test across the sync/async build contexts.
+_SAL_BASE = 0.50
+_SAL_CORROB_PER_SOURCE = 0.10
+_SAL_CORROB_CAP = 0.30
+_SAL_FLOOR = 0.01
+_SAL_CEIL = 1.00
+_SAL_PENALTIES = {
+    "noise_name": 0.45,
+    "numeric_id_person": 0.45,
+    "hallucinated_email": 0.40,
+    "ungrounded": 0.35,
+    "subject_undeclared": 0.25,
+    "low_signal": 0.15,
+}
+def born_salience(n_sources: int = 1, quality_flags: list[str] | None = None) -> float:
+    """Salience to stamp on a freshly extracted node. See the module note."""
+    s = _SAL_BASE
+    if n_sources > 1:
+        s += min(_SAL_CORROB_CAP, _SAL_CORROB_PER_SOURCE * (n_sources - 1))
+    for flag in quality_flags or []:
+        s -= _SAL_PENALTIES.get(flag, 0.0)
+    return round(max(_SAL_FLOOR, min(_SAL_CEIL, s)), 4)

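A few worked values may help when reading the constants in the hunk above. The block restates `born_salience` and its constants from the diff so it runs on its own; the flag name `made_up_flag` is invented here purely to show that unknown flags carry no penalty.

```python
from __future__ import annotations

# Restated from the confidence.py hunk so this example is self-contained.
_SAL_BASE = 0.50
_SAL_CORROB_PER_SOURCE = 0.10
_SAL_CORROB_CAP = 0.30
_SAL_FLOOR = 0.01
_SAL_CEIL = 1.00
_SAL_PENALTIES = {
    "noise_name": 0.45,
    "numeric_id_person": 0.45,
    "hallucinated_email": 0.40,
    "ungrounded": 0.35,
    "subject_undeclared": 0.25,
    "low_signal": 0.15,
}


def born_salience(n_sources: int = 1, quality_flags: list[str] | None = None) -> float:
    s = _SAL_BASE
    if n_sources > 1:
        s += min(_SAL_CORROB_CAP, _SAL_CORROB_PER_SOURCE * (n_sources - 1))
    for flag in quality_flags or []:
        s -= _SAL_PENALTIES.get(flag, 0.0)
    return round(max(_SAL_FLOOR, min(_SAL_CEIL, s)), 4)


print(born_salience())                    # 0.5  (base, one source, no flags)
print(born_salience(3))                   # 0.7  (two extra sources, +0.10 each)
print(born_salience(100))                 # 0.8  (corroboration bonus capped at 0.30)
print(born_salience(1, ["ungrounded"]))   # 0.15
print(born_salience(1, ["numeric_id_person", "hallucinated_email"]))  # 0.01 (floor)
print(born_salience(2, ["low_signal", "made_up_flag"]))               # 0.45
```

Two properties follow from the constants: a clean node can never be born above 0.80, and any node carrying two of the heavier penalties starts at the 0.01 floor regardless of corroboration.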
package/packages/memory-engine-v2/extractor-async/source_time.py ADDED

@@ -0,0 +1,63 @@
+"""source_time — robust ISO-8601 source-time parsing for graph stamping.
+The memory graph must stamp `events.emitted_at` and the graph rows'
+`first_seen` / `last_seen` / `asserted_at` from the SOURCE time of the
+content (when the email/meeting/message actually happened), NOT the
+ingest wall-clock (`NOW()`). The source time is carried on the event as
+`attributes.timestamp` (ISO-8601). This helper promotes it.
+Mirrors `compat/server.py:_parse_ts` (handles the bare `Z` suffix that
+`datetime.fromisoformat` only learned in 3.11) but returns a tz-aware
+`datetime` rather than a unix float, because the destination columns are
+`TIMESTAMPTZ` and we want psycopg to bind a datetime, not an epoch.
+CONTRACT (load-bearing): callers MUST fall back to the existing default
+(received / NOW) when the source time is absent or unparseable. This
+helper NEVER raises and returns `None` on anything it can't parse — the
+caller is responsible for the `or NOW()` fallback so we never NULL a
+NOT NULL column or crash the ingest/distill path.
+NOTE: keep this byte-identical with the copy in extractor-sync/. Same
+convention as entity_id.py — two services, one parsing rule.
+"""
+from __future__ import annotations
+from datetime import datetime, timezone
+from typing import Any
+def parse_source_time(value: Any) -> datetime | None:
+    """Best-effort ISO-8601 -> tz-aware datetime. Returns None on
+    anything we can't parse (caller falls back to NOW()).
+    Accepts both the bare `Z` suffix and explicit offsets. A parsed
+    value with no offset is assumed UTC (the producers emit UTC ISO
+    strings; a naive datetime would break TIMESTAMPTZ comparisons)."""
+    if not isinstance(value, str) or not value:
+        return None
+    try:
+        # `fromisoformat` handles `+00:00` but not the bare `Z` suffix
+        # until Python 3.11; normalise to be safe across runtime
+        # versions on the engine box.
+        dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
+    except Exception:
+        return None
+    if dt.tzinfo is None:
+        # Producer emitted a naive ISO string; treat as UTC rather than
+        # letting psycopg interpret it in the server's local zone.
+        dt = dt.replace(tzinfo=timezone.utc)
+    return dt
+def event_source_time(event: dict[str, Any]) -> datetime | None:
+    """Pull the source time off an event dict's attributes.
+    Precedence: `attributes.timestamp` (the source/content time) wins
+    over `attributes.emitted_at` (a producer-supplied emit-now, which is
+    closer to ingest time). Returns None if neither parses — caller
+    falls back to NOW()."""
+    attrs = event.get("attributes") or {}
+    return parse_source_time(attrs.get("timestamp")) or parse_source_time(
+        attrs.get("emitted_at")
+    )

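The module's contract puts the `or NOW()` fallback on the caller. A minimal sketch of that caller side is shown below; `stamp_time` and its `now` parameter are hypothetical and do not appear in the package, and the two helpers are restated from the hunk above only so the example runs alone.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any


def parse_source_time(value: Any) -> datetime | None:
    # Restated from source_time.py: never raises, None on anything unparseable.
    if not isinstance(value, str) or not value:
        return None
    try:
        dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except Exception:
        return None
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


def event_source_time(event: dict[str, Any]) -> datetime | None:
    attrs = event.get("attributes") or {}
    return parse_source_time(attrs.get("timestamp")) or parse_source_time(
        attrs.get("emitted_at")
    )


def stamp_time(event: dict[str, Any], now: datetime | None = None) -> datetime:
    """Hypothetical caller-side helper: source time if usable, else ingest time.

    The result is always tz-aware, so it is safe to bind to a NOT NULL
    TIMESTAMPTZ column.
    """
    return event_source_time(event) or now or datetime.now(timezone.utc)


ingest = datetime(2026, 1, 1, tzinfo=timezone.utc)
print(stamp_time({"attributes": {"timestamp": "2025-03-14T09:30:00Z"}}, ingest))
print(stamp_time({"attributes": {"timestamp": "garbage"}}, ingest))
print(stamp_time({}, ingest))
```

Passing `now` explicitly keeps the fallback deterministic in tests; in a real write path the database's own `NOW()` default could serve the same purpose.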
package/packages/memory-engine-v2/extractor-async/test_born_salience_parity.py ADDED

@@ -0,0 +1,35 @@
+"""Parity guard: confidence.born_salience (worker, copied into the container)
+must stay byte-equivalent to fusion_drive/salience.born_salience (the decay
+side). Same pattern as test_entity_id_parity.py — the two live across a Docker
+build-context boundary and would silently drift otherwise."""
+from __future__ import annotations
+import os
+import sys
+import confidence as worker
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "fusion_drive"))
+import salience as drive  # noqa: E402
+def test_constants_match():
+    assert worker._SAL_BASE == drive.BASE_SALIENCE
+    assert worker._SAL_CORROB_PER_SOURCE == drive.CORROB_PER_SOURCE
+    assert worker._SAL_CORROB_CAP == drive.CORROB_CAP
+    assert worker._SAL_FLOOR == drive.SALIENCE_FLOOR
+    assert worker._SAL_CEIL == drive.SALIENCE_CEIL
+    assert worker._SAL_PENALTIES == drive.QUALITY_PENALTIES
+def test_output_matches_across_input_matrix():
+    flagsets = [
+        None, [], ["noise_name"], ["numeric_id_person"], ["hallucinated_email"],
+        ["ungrounded"], ["subject_undeclared"], ["low_signal"],
+        ["numeric_id_person", "hallucinated_email", "ungrounded"],
+        ["noise_name"] * 5,
+    ]
+    for n in (1, 2, 3, 5, 100):
+        for flags in flagsets:
+            assert worker.born_salience(n, flags) == drive.born_salience(n_sources=n, quality_flags=flags), (n, flags)

package/packages/memory-engine-v2/extractor-async/test_source_time.py ADDED

@@ -0,0 +1,102 @@
+"""Tests for source_time — promoting source event time onto graph rows.
+The contract under test: source time present and parseable → used;
+absent, empty, or garbage → returns None so the caller falls back to
+NOW() (never crashes, never NULLs a NOT NULL column).
+Run: pytest packages/memory-engine-v2/extractor-async/test_source_time.py
+"""
+from __future__ import annotations
+from datetime import datetime, timezone
+import pytest
+from source_time import event_source_time, parse_source_time
+class TestParseSourceTime:
+    def test_iso_with_z_suffix(self):
+        dt = parse_source_time("2025-03-14T09:30:00Z")
+        assert dt == datetime(2025, 3, 14, 9, 30, tzinfo=timezone.utc)
+    def test_iso_with_explicit_offset(self):
+        dt = parse_source_time("2025-03-14T09:30:00+00:00")
+        assert dt == datetime(2025, 3, 14, 9, 30, tzinfo=timezone.utc)
+    def test_iso_with_nonzero_offset_preserved(self):
+        dt = parse_source_time("2025-03-14T12:30:00+03:00")
+        # 12:30+03:00 == 09:30 UTC
+        assert dt.utcoffset().total_seconds() == 3 * 3600
+        assert dt.astimezone(timezone.utc) == datetime(
+            2025, 3, 14, 9, 30, tzinfo=timezone.utc
+        )
+    def test_naive_iso_assumed_utc(self):
+        # No offset → must NOT come back naive (would break TIMESTAMPTZ
+        # comparisons); we assume UTC.
+        dt = parse_source_time("2025-03-14T09:30:00")
+        assert dt is not None
+        assert dt.tzinfo is not None
+        assert dt == datetime(2025, 3, 14, 9, 30, tzinfo=timezone.utc)
+    # --- fallback cases: must return None, never raise ---
+    @pytest.mark.parametrize(
+        "bad",
+        [
+            None,
+            "",
+            "not-a-date",
+            "2025-13-99T99:99:99Z",  # structurally ISO-ish but invalid
+            "14/03/2025",            # wrong format
+            12345,                    # not a string
+            [],                       # not a string
+            {"timestamp": "x"},      # not a string
+        ],
+    )
+    def test_garbage_or_absent_returns_none(self, bad):
+        assert parse_source_time(bad) is None
+class TestEventSourceTime:
+    def test_prefers_timestamp_over_emitted_at(self):
+        ev = {
+            "attributes": {
+                "timestamp": "2025-01-01T00:00:00Z",     # source time
+                "emitted_at": "2025-06-01T00:00:00Z",    # producer emit-now
+            }
+        }
+        assert event_source_time(ev) == datetime(
+            2025, 1, 1, 0, 0, tzinfo=timezone.utc
+        )
+    def test_falls_back_to_emitted_at_when_no_timestamp(self):
+        ev = {"attributes": {"emitted_at": "2025-06-01T00:00:00Z"}}
+        assert event_source_time(ev) == datetime(
+            2025, 6, 1, 0, 0, tzinfo=timezone.utc
+        )
+    def test_none_when_neither_present(self):
+        assert event_source_time({"attributes": {}}) is None
+    def test_none_when_no_attributes(self):
+        # Must not crash on an event with a missing/None attributes bag.
+        assert event_source_time({}) is None
+        assert event_source_time({"attributes": None}) is None
+    def test_garbage_timestamp_falls_back_to_emitted_at(self):
+        ev = {
+            "attributes": {
+                "timestamp": "garbage",
+                "emitted_at": "2025-06-01T00:00:00Z",
+            }
+        }
+        assert event_source_time(ev) == datetime(
+            2025, 6, 1, 0, 0, tzinfo=timezone.utc
+        )
+    def test_all_garbage_returns_none(self):
+        ev = {"attributes": {"timestamp": "nope", "emitted_at": "also-nope"}}
+        assert event_source_time(ev) is None