@pentatonic-ai/ai-agent-sdk 0.10.15 → 0.10.17

package/dist/index.cjs CHANGED
@@ -878,7 +878,7 @@ function fireAndForgetEmit(clientConfig, sessionOpts, messages, result, model) {
  }
 
  // src/telemetry.js
- var VERSION = "0.10.15";
+ var VERSION = "0.10.17";
  var TELEMETRY_URL = "https://sdk-telemetry.philip-134.workers.dev";
  function machineId() {
    const raw = typeof process !== "undefined" ? `${process.env?.USER || process.env?.USERNAME || "u"}:${process.platform || "x"}:${process.arch || "x"}` : "browser";
package/dist/index.js CHANGED
@@ -847,7 +847,7 @@ function fireAndForgetEmit(clientConfig, sessionOpts, messages, result, model) {
  }
 
  // src/telemetry.js
- var VERSION = "0.10.15";
+ var VERSION = "0.10.17";
  var TELEMETRY_URL = "https://sdk-telemetry.philip-134.workers.dev";
  function machineId() {
    const raw = typeof process !== "undefined" ? `${process.env?.USER || process.env?.USERNAME || "u"}:${process.platform || "x"}:${process.arch || "x"}` : "browser";
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@pentatonic-ai/ai-agent-sdk",
-   "version": "0.10.15",
+   "version": "0.10.17",
    "description": "TES SDK — LLM observability and lifecycle tracking via Pentatonic Thing Event System. Track token usage, tool calls, and conversations. Manage things through event-sourced lifecycle stages with AI enrichment and vector search.",
    "type": "module",
    "main": "./dist/index.cjs",
@@ -0,0 +1,148 @@
+ # RFC — Student→Teacher Distillation Cascade (#99)
+
+ **Status:** proposed (plan of record) · **Date:** 2026-06-18 · **Owner:** Phil
+
+ ## Goal
+
+ Cut distillation GPU cost/latency by making the cheap fine-tuned **student**
+ (`numind/NuExtract-2.0-4B`, full-FT on teacher traces) the **primary** distiller,
+ and reaching the expensive **teacher** (Qwen3.6-27B on the L40S fleet) only for a
+ **sampled + gated subset**. The teacher's output on that subset both corrects the
+ graph and feeds continuous improvement of the student.
+
+ This is a model **cascade**, not a replacement: the teacher stays the source of
+ truth on the hard/sampled tail; the student carries the bulk.
+
+ ## What we already have (grounding)
+
+ - **Trained student** at `s3://pme-deploy-prod-us-east-1-170649632502/backups/training/nuextract-2.0-4b-ft-final/` (7 GB). Train/eval loss 0.112/0.106.
+ - **Quality vs the CURRENT teacher** (`f1e0ff` prompt): median entity-F1 **0.909**, ~**0% invalid JSON**, ~1% email-hallucination, **no drift** despite a teacher-prompt change since training (the student trained on `bbdaba`). So it's deployable without a retrain.
+ - **Routing signals are weak.** Single-pass token-confidence and a feature-based verifier (HistGBM, AUC 0.63) only modestly beat random at predicting student↔teacher disagreement. Cheap features don't pinpoint the student's errors. → routing leans on deterministic gates + sampling, not a confidence/verifier gate.
+ - **Infra to reuse:** `distillation_queue` + the `extractor-async` consumer pattern; the combined-demand distiller **autoscaler**; the `fusion_queue` consumer pattern; `distillation_traces` with `system_prompt_hash` (teacher-version segmentation); the Fusion Drive (fuzzy self-healing on top).
+ - **Caveat carried forward:** the "disagreement rate" we've quoted (~12%) is a crude entity-name-exact-match proxy — entities only, penalizes normalization, teacher-as-gold. A proper structured-diff metric is a dependency for trustworthy monitoring + verifier labels (see Open Questions).
+
+ ## Architecture (steady state, post-flip)
+
+ ```
+ event → extractor-sync (deterministic provisional write, unchanged) → distillation_queue
+
+
+ STUDENT consumer (cheap, always-on small GPU)
+   • distils the event, writes entities/facts/relationships to the graph
+     tagged producer='student'
+   • computes the escalation decision (below)
+
+   └── ESCALATE subset → distillation_queue-teacher (the existing 27B flow,
+                         autoscaled L40S fleet)
+         • teacher distils
+         • SUPERSEDES the student's rows for that event
+         • writes distillation_traces (gold) → monitoring + retraining
+ ```
+
+ The student is a new consumer; escalation is an enqueue onto the existing teacher
+ path. No new scheduler — the autoscaler already scales the teacher fleet on queue
+ depth (incl. fusion). The student runs on a cheap always-on GPU (L4/g6 — 4B fits
+ in ~8 GB; the teacher L40S fleet stays scale-to-zero for the escalation subset).
+
+ ## Escalation policy (the "sample out")
+
+ Escalate to the teacher iff **any** of:
+
+ 1. **Deterministic gate-fail** (cheap, high-precision):
+    - student output isn't valid JSON / violates schema, OR
+    - **grounding violation** — student emits an email/entity contradicting the
+      event's structured envelope (the #111 hard-key bag). *This is the gate that
+      catches "confidently wrong about a known fact," which confidence can't.*
+ 2. **High-value event class** — e.g. `decision`/`commitment` facts, VIP arenas —
+    always teacher (cheap metadata routing, decided pre-student where possible).
+ 3. **Random sample** (e.g. 3–5%) — *not* a quality lever; this exists to (a)
+    monitor student↔teacher agreement over time (drift), and (b) generate fresh
+    teacher-gold on live traffic for retraining the student + verifier (active
+    learning). The events the student is *unsure* on are the highest-value
+    retraining data.
+ 4. *(soft)* **verifier score** — verifier-v0 is weak (AUC 0.63); use it only as a
+    low-weight tiebreak to nudge borderline events toward the teacher, never as
+    the primary gate. Revisit if a stronger signal (self-consistency) is built.
+
+ Everything else: the student's write stands.
+
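The gates above can be sketched as one pure function. This is a minimal illustration under stated assumptions, not the worker's actual code: the function names, the reason strings, the `high_value` set, and the `sample_rate` parameter are hypothetical (the reason strings mirror the unit tests later in this diff). Grounding is simplified to "every email the student emits must already appear in the event", the high-value check reads the student's fact categories rather than pre-student metadata, and the soft verifier tiebreak is omitted.

```python
import random
import re

# Deliberately loose email matcher; an assumption for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def known_emails(event: dict) -> set[str]:
    """Emails found in the event content and in its attribute values."""
    found = set(EMAIL_RE.findall(event.get("content") or ""))
    for value in (event.get("attributes") or {}).values():
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, str):
                found.update(EMAIL_RE.findall(item))
    return {e.lower() for e in found}


def emitted_emails(result: dict) -> set[str]:
    """Emails the student placed in entity names or aliases."""
    out: set[str] = set()
    for ent in result.get("entities") or []:
        for text in [ent.get("name") or "", *(ent.get("aliases") or [])]:
            out.update(m.lower() for m in EMAIL_RE.findall(text))
    return out


def escalation_decision(event, result, *, high_value=frozenset({"decision", "commitment"}),
                        sample_rate=0.05, rng=random.random):
    """Return (escalate, reason); gates are checked in precedence order."""
    if result is None:                        # gate 1a: unparseable output
        return True, "student_parse_fail"
    if not any(result.get(k) for k in ("entities", "facts", "relationships")):
        return True, "student_empty"
    if emitted_emails(result) - known_emails(event):   # gate 1b: ungrounded email
        return True, "grounding_violation"
    if any(f.get("category") in high_value for f in result.get("facts") or []):
        return True, "high_value_class"       # gate 2
    if rng() < sample_rate:                   # gate 3: monitoring sample
        return True, "random_sample"
    return False, None                        # the student's write stands
```

Passing `sample_rate=0.0` isolates the deterministic gates, which is how the unit tests in this diff pin the sample off.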
+ ## Supersede-on-escalation (the one genuinely new mechanism)
+
+ The store is **pure-accretion** — event identity is `content-hash`, there is **no
+ supersede-by-source_id**, and graph upserts only accrete (see
+ `pme2-dedup-supersede-semantics`). So when the teacher re-distils an escalated
+ event, its output must **replace** the student's rows for that event, not pile on
+ top. Options:
+
+ - **(A) Producer-tagged supersede (recommended).** Tag every graph row written by
+   distillation with `producer` (`student`/`teacher`) + `event_id` (already in
+   `provenance_event_ids`). On a teacher escalation, in one transaction: delete the
+   `producer='student'` rows whose provenance is that single event **and** that no
+   other event corroborates (don't delete a row a second event also supports —
+   decrement/repoint instead), then write the teacher's rows. Mirrors the Fusion
+   Drive's repoint/audit discipline; reversible via an audit receipt.
+ - **(B) Defer-write.** Decide escalation *before* the student writes (gates that
+   don't need the student output: high-value class, random sample), and for those
+   skip the student write entirely — teacher-only. Gate-fails (which need the
+   student output) still need (A). Cuts most supersede churn.
+ - Recommended: **B for the pre-decidable escalations + A for gate-fail
+   escalations.** Most escalations (class/random) are pre-decidable → no student
+   write to undo; only the (rarer) gate-fails incur a supersede.
+
+ Open: define "no other event corroborates" precisely against the accretion graph;
+ reuse Fusion Drive's `entity_merges`/`fact_merges` audit tables for reversibility.
+
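Option (A) can be sketched against a toy relational store. Everything here is hypothetical: the three tables (`graph_rows`, `row_provenance`, `supersede_audit`) stand in for the real graph schema and the Fusion Drive audit tables, and "no other event corroborates" is reduced to "no other provenance link exists", which the RFC explicitly leaves open. SQLite is used only so the sketch runs anywhere.

```python
import sqlite3

# Hypothetical schema: one row per graph item, one link per supporting event.
SCHEMA = """
CREATE TABLE graph_rows (id TEXT PRIMARY KEY, producer TEXT NOT NULL, payload TEXT NOT NULL);
CREATE TABLE row_provenance (row_id TEXT NOT NULL, event_id TEXT NOT NULL,
                             PRIMARY KEY (row_id, event_id));
CREATE TABLE supersede_audit (event_id TEXT, row_id TEXT, action TEXT, payload TEXT);
"""


def supersede(conn: sqlite3.Connection, event_id: str, teacher_rows: list[tuple[str, str]]) -> None:
    """Replace the student's rows for one event with the teacher's, atomically."""
    with conn:  # commits on success, rolls back if anything raises
        student_ids = [r[0] for r in conn.execute(
            "SELECT g.id FROM graph_rows g JOIN row_provenance p ON p.row_id = g.id "
            "WHERE p.event_id = ? AND g.producer = 'student'", (event_id,))]
        for row_id in student_ids:
            (others,) = conn.execute(
                "SELECT COUNT(*) FROM row_provenance WHERE row_id = ? AND event_id <> ?",
                (row_id, event_id)).fetchone()
            (payload,) = conn.execute(
                "SELECT payload FROM graph_rows WHERE id = ?", (row_id,)).fetchone()
            # Always drop this event's support for the row.
            conn.execute("DELETE FROM row_provenance WHERE row_id = ? AND event_id = ?",
                         (row_id, event_id))
            if others == 0:
                # Only this event supported the row: remove it outright.
                conn.execute("DELETE FROM graph_rows WHERE id = ?", (row_id,))
                action = "deleted"
            else:
                # Another event corroborates it: keep the row, just unlink.
                action = "unlinked"
            # Audit receipt so the supersede can be reversed later.
            conn.execute("INSERT INTO supersede_audit VALUES (?, ?, ?, ?)",
                         (event_id, row_id, action, payload))
        for row_id, payload in teacher_rows:
            conn.execute("INSERT OR REPLACE INTO graph_rows VALUES (?, 'teacher', ?)",
                         (row_id, payload))
            conn.execute("INSERT OR IGNORE INTO row_provenance VALUES (?, ?)",
                         (row_id, event_id))
```

Because the delete, the unlink, and the teacher writes share one transaction, a failure partway through leaves the student's rows untouched rather than half-replaced.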
+ ## Rollout sequence
+
+ 1. **Shadow** (no graph impact): student runs alongside; teacher still does
+    everything; log student-vs-teacher per event → validate on live traffic +
+    accumulate verifier/quality-metric labels. (We've already done a *batch* shadow
+    over recent traces; a brief standing shadow confirms on live flow.)
+ 2. **Flip to student-primary + sampled teacher** (the diagram above), starting
+    with a **conservative escalation rate** (high random %, broad high-value
+    classes), tighten as monitoring confirms quality.
+ 3. **Iterate**: retrain student on accumulated teacher-gold (esp. escalated/hard
+    events); rebuild the verifier when a proper metric + more data exist.
+
+ Kill switch + dry-run posture mirror the Fusion Drive (a flag to fall back to
+ teacher-primary instantly).
+
+ ## Monitoring & active learning
+
+ - **Drift:** the random-sample agreement rate, segmented by `system_prompt_hash`.
+   A teacher-prompt change (like `bbdaba`→`f1e0ff`) shows up as an agreement drop →
+   trigger a student refresh. (The hash segmentation already exists.)
+ - **Active learning:** escalated + sampled events with teacher-gold are the next
+   training corpus; the student improves on exactly its weak spots.
+ - **Cost model:** worth it iff escalation rate × teacher-cost ≪
+   teacher-on-everything. Student-on-everything (cheap GPU, always-on) + teacher
+   on ~10–20% beats teacher-on-100% comfortably; the L40S fleet scale-to-zero
+   already assumes bursty teacher load.
+
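The cost-model bullet reduces to simple arithmetic. The sketch below is illustrative only: it treats both models as having a per-event cost (in reality the always-on student GPU is closer to a fixed cost), and the 0.1 student-to-teacher cost ratio used in the usage note is a made-up number, not a measurement from the RFC.

```python
def cascade_relative_cost(escalation_rate: float, student_cost: float, teacher_cost: float) -> float:
    """Cost of the cascade as a fraction of running the teacher on every event.

    Assumes the student runs on 100% of events and the teacher on the escalated share.
    """
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be within [0, 1]")
    if teacher_cost <= 0:
        raise ValueError("teacher_cost must be positive")
    return (student_cost + escalation_rate * teacher_cost) / teacher_cost


def break_even_rate(student_cost: float, teacher_cost: float) -> float:
    """Escalation rate above which the cascade costs more than teacher-on-everything."""
    return max(0.0, 1.0 - student_cost / teacher_cost)
```

With a hypothetical student at one tenth of the teacher's per-event cost, `cascade_relative_cost(0.15, 0.1, 1.0)` comes to about a quarter of the teacher-only cost, and the cascade only stops paying off once escalation approaches 90%.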
+ ## Open questions / risks
+
+ - **Disagreement metric.** Replace the entity-name-exact-match proxy with a
+   structured diff (entities w/ type+email, facts as s·p·o, relationships; fuzzy
+   name matching; small independent rubric for semantic equivalence). Dependency
+   for trustworthy monitoring **and** better verifier labels. *(Highest-leverage
+   next build.)*
+ - **Prompt-version coupling.** The student is bound to a teacher prompt version
+   (`system_prompt_hash`). Every teacher-prompt change risks staling it → the
+   monitoring must watch the hash and the refresh pipeline must be cheap.
+ - **Routing is unsolved.** No cheap signal cleanly predicts student errors yet;
+   self-consistency (K-sample) is the most promising unbuilt option but costs K×.
+   Until then, gates + sampling carry it and we accept the residual tail.
+ - **Supersede correctness** in the accretion store (above) — the riskiest piece
+   to get exactly right.
+
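One way the entity half of the structured diff could look is sketched below. It is an assumption-laden illustration, not the planned scorer: the entity shape (`type`, `name`, optional `email`), the 0.85 similarity threshold, and the greedy one-to-one matching are all hypothetical choices, and facts and relationships are left out. Fuzzy name matching uses the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def _norm(value) -> str:
    """Lowercase and collapse whitespace so formatting differences are not penalized."""
    return " ".join(str(value or "").lower().split())


def _same_entity(a: dict, b: dict, threshold: float) -> bool:
    if _norm(a.get("type")) != _norm(b.get("type")):
        return False
    email_a, email_b = _norm(a.get("email")), _norm(b.get("email"))
    if email_a and email_b:
        # When both sides carry an email, it is the deciding key.
        return email_a == email_b
    ratio = SequenceMatcher(None, _norm(a.get("name")), _norm(b.get("name"))).ratio()
    return ratio >= threshold


def entity_f1(student: list[dict], teacher: list[dict], threshold: float = 0.85) -> float:
    """F1 of student entities against teacher entities, greedy one-to-one matching."""
    if not student and not teacher:
        return 1.0
    unmatched = list(teacher)
    true_pos = 0
    for s in student:
        for i, t in enumerate(unmatched):
            if _same_entity(s, t, threshold):
                true_pos += 1
                del unmatched[i]   # each teacher entity can be matched once
                break
    precision = true_pos / len(student) if student else 0.0
    recall = true_pos / len(teacher) if teacher else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Greedy matching is order-dependent; an optimal assignment would be more faithful but is beyond what this sketch is meant to show.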
+ ## Build phases (components)
+
+ 1. **Metric** — structured-diff scorer (offline, no GPU). *Do first.*
+ 2. **Student service** — always-on cheap-GPU server (4B) + a `distillation_queue`
+    student consumer; producer-tagging on graph writes.
+ 3. **Escalation + supersede** — gate/class/random logic; supersede-on-escalation
+    (B+A); reuse audit tables.
+ 4. **Shadow wiring** → **flip** → **monitoring dashboard** (agreement by hash).
+ 5. **Retrain loop** — periodic student/verifier retrain on accumulated gold.
+
+ 🤖 Generated with [Claude Code](https://claude.com/claude-code)
@@ -928,7 +928,7 @@ async def list_entities(req: GraphQueryRequest):
      params.extend([pattern, pattern])
      sql = f"""
          SELECT id, arena, entity_type, canonical_name, aliases,
-                provenance_event_ids, last_seen
+                provenance_event_ids, attributes, last_seen
          FROM entities
          WHERE {' AND '.join(conditions)}
          ORDER BY last_seen DESC
@@ -55,3 +55,23 @@ def entity_id(arena: str, entity_type: str, canonical_name: str) -> str:
      """
      key = f"{arena}|{entity_type}|{normalize_surface_form(canonical_name)}"
      return "e_" + hashlib.sha256(key.encode()).hexdigest()[:24]
+
+
+ def person_id_key(name: str | None, email: str | None) -> str:
+     """The string `entity_id()` should hash to mint a PERSON's node id — the
+     EMAIL (the person's deterministic hard key) when present, else the name.
+
+     Lives HERE, in the byte-identical shared file, BECAUSE both extractors mint
+     person nodes and must agree: the sync pass builds them from the envelope at
+     ingest (it has the email), the async pass builds them from prose (email
+     promoted to an alias). Keying both on the email means the same person mints
+     the SAME id from either pass and in any processing order — converging to one
+     node instead of sync's name-keyed node and async's node racing, or a
+     re-distill re-homing them differently. The name is the fallback for a person
+     with no email (Fusion still merges those fuzzily). `entity_id()` normalises
+     (lowercase + trim) the result, so casing/whitespace variants of an email
+     collapse. (Org keying is the async-only `org_node_id_key`; person keying is
+     cross-pass, so it belongs in this parity-guarded file.)
+     """
+     e = (email or "").strip()
+     return e if e else (name or "")
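The convergence the docstring describes can be checked with a small self-contained script. `entity_id` and `person_id_key` are copied from the hunk above; `normalize_surface_form` is not shown in this diff, so the version below is an assumed stand-in that only lowercases and trims, as the docstring says the real one does. The arena and names are made up.

```python
from __future__ import annotations

import hashlib


def normalize_surface_form(text: str) -> str:
    # ASSUMPTION: stand-in for the real helper, which this diff does not show.
    return text.strip().lower()


def entity_id(arena: str, entity_type: str, canonical_name: str) -> str:
    key = f"{arena}|{entity_type}|{normalize_surface_form(canonical_name)}"
    return "e_" + hashlib.sha256(key.encode()).hexdigest()[:24]


def person_id_key(name: str | None, email: str | None) -> str:
    e = (email or "").strip()
    return e if e else (name or "")


# Sync pass: name and email come from the structured envelope.
sync_id = entity_id("acme", "person", person_id_key("Alice Smith", "Alice@Acme.com"))
# Async pass: a different surface name from prose, same email with stray whitespace.
async_id = entity_id("acme", "person", person_id_key("A. Smith", " alice@acme.com "))
# No email on either side: the ids fall back to the (different) names.
name_only_a = entity_id("acme", "person", person_id_key("Alice Smith", None))
name_only_b = entity_id("acme", "person", person_id_key("A. Smith", None))

print(sync_id == async_id)         # same person, same node id from either pass
print(name_only_a == name_only_b)  # name-keyed ids differ; left to fuzzy merging
```

The second comparison shows why the name is only a fallback: without an email, two surface forms of the same person mint different ids.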
@@ -0,0 +1,175 @@
+ """Unit tests for the student→teacher distillation cascade (#99).
+
+ Covers the pure decision surface — the JSON-salvage of student output, the
+ grounding email set derived from an event, and the escalation gates (parse
+ fail, empty, grounding violation, high-value class, random sample, pass). The
+ network call (call_student_one) and the DB writes (_apply_extraction,
+ _record_distillation) are integration-tested elsewhere; here we pin the routing
+ logic that decides student-XOR-teacher.
+ """
+
+ from __future__ import annotations
+
+ import importlib.util
+ from pathlib import Path
+
+ import pytest
+
+ _THIS = Path(__file__).resolve().parent
+
+
+ def _load_worker(name: str = "extractor_async_worker_cascade"):
+     spec = importlib.util.spec_from_file_location(name, _THIS / "worker.py")
+     assert spec and spec.loader
+     mod = importlib.util.module_from_spec(spec)
+     spec.loader.exec_module(mod)
+     return mod
+
+
+ try:
+     worker = _load_worker()
+ except ImportError as e:
+     pytest.skip(f"extractor-async deps unavailable: {e}", allow_module_level=True)
+
+
+ # ----------------------------------------------------------------------
+ # _salvage_json_object — the JSON-validity gate's parser
+ # ----------------------------------------------------------------------
+
+ def test_salvage_plain_object() -> None:
+     assert worker._salvage_json_object('{"entities": []}') == {"entities": []}
+
+
+ def test_salvage_strips_code_fence() -> None:
+     txt = '```json\n{"facts": [{"statement": "x"}]}\n```'
+     assert worker._salvage_json_object(txt) == {"facts": [{"statement": "x"}]}
+
+
+ def test_salvage_extracts_embedded_object() -> None:
+     txt = 'Sure! Here is the extraction:\n{"entities": [{"name": "Acme"}]}\nDone.'
+     assert worker._salvage_json_object(txt) == {"entities": [{"name": "Acme"}]}
+
+
+ def test_salvage_returns_none_on_garbage() -> None:
+     assert worker._salvage_json_object("not json at all") is None
+     assert worker._salvage_json_object("") is None
+     # a bare JSON array is not an event object
+     assert worker._salvage_json_object("[1, 2, 3]") is None
+
+
+ # ----------------------------------------------------------------------
+ # _event_known_emails — grounding set (content + structured envelope)
+ # ----------------------------------------------------------------------
+
+ def test_known_emails_from_content_and_envelope() -> None:
+     event = {
+         "content": "Reach me at alice@acme.com tomorrow.",
+         "attributes": {
+             "contact_email": "Bob <bob@acme.com>",
+             "to_emails": ["carol@acme.com", "dave@x.io"],
+             "cc_emails": ["erin@acme.com"],
+         },
+     }
+     known = worker._event_known_emails(event)
+     assert known == {
+         "alice@acme.com", "bob@acme.com", "carol@acme.com",
+         "dave@x.io", "erin@acme.com",
+     }
+
+
+ def test_known_emails_lowercased_and_empty_safe() -> None:
+     assert worker._event_known_emails({"content": "FOO@BAR.COM"}) == {"foo@bar.com"}
+     assert worker._event_known_emails({}) == set()
+
+
+ # ----------------------------------------------------------------------
+ # escalation_decision — the gates (precedence + each trigger)
+ # ----------------------------------------------------------------------
+
+ def _no_sample(monkeypatch):
+     """Pin the random sample off so the deterministic gates are isolated."""
+     monkeypatch.setattr(worker, "STUDENT_SAMPLE_RATE", 0.0)
+
+
+ def test_escalate_on_parse_fail(monkeypatch) -> None:
+     _no_sample(monkeypatch)
+     esc, reason = worker.escalation_decision({"content": ""}, None)
+     assert esc and reason == "student_parse_fail"
+
+
+ def test_escalate_on_empty_extraction(monkeypatch) -> None:
+     _no_sample(monkeypatch)
+     empty = {"entities": [], "facts": [], "relationships": []}
+     esc, reason = worker.escalation_decision({"content": ""}, empty)
+     assert esc and reason == "student_empty"
+
+
+ def test_escalate_on_grounding_violation(monkeypatch) -> None:
+     _no_sample(monkeypatch)
+     event = {"content": "Met with the vendor.", "attributes": {}}
+     # student invents an email not present anywhere in the event
+     result = {
+         "entities": [{"type": "person", "name": "X", "aliases": ["ghost@evil.com"]}],
+         "facts": [],
+         "relationships": [],
+     }
+     esc, reason = worker.escalation_decision(event, result)
+     assert esc and reason == "grounding_violation"
+
+
+ def test_no_escalation_when_email_is_grounded(monkeypatch) -> None:
+     _no_sample(monkeypatch)
+     event = {"content": "ping alice@acme.com", "attributes": {}}
+     result = {
+         "entities": [{"type": "person", "name": "Alice", "aliases": ["alice@acme.com"]}],
+         "facts": [],
+         "relationships": [],
+     }
+     esc, reason = worker.escalation_decision(event, result)
+     assert not esc and reason is None
+
+
+ def test_escalate_on_high_value_class(monkeypatch) -> None:
+     _no_sample(monkeypatch)
+     monkeypatch.setattr(worker, "HIGH_VALUE_CATEGORIES", {"decision", "commitment"})
+     event = {"content": "We will ship Friday.", "attributes": {}}
+     result = {
+         "entities": [],
+         "facts": [{"category": "decision", "statement": "ship Friday", "subject": "team"}],
+         "relationships": [],
+     }
+     esc, reason = worker.escalation_decision(event, result)
+     assert esc and reason == "high_value_class"
+
+
+ def test_escalate_on_random_sample(monkeypatch) -> None:
+     monkeypatch.setattr(worker, "STUDENT_SAMPLE_RATE", 1.0)  # always sample
+     monkeypatch.setattr(worker, "HIGH_VALUE_CATEGORIES", set())
+     event = {"content": "low value note", "attributes": {}}
+     result = {"entities": [{"type": "other", "name": "thing"}], "facts": [], "relationships": []}
+     esc, reason = worker.escalation_decision(event, result)
+     assert esc and reason == "random_sample"
+
+
+ def test_student_handles_clean_low_value_event(monkeypatch) -> None:
+     """The common case: valid, grounded, non-high-value, not sampled → the
+     student's write stands (no escalation)."""
+     _no_sample(monkeypatch)
+     monkeypatch.setattr(worker, "HIGH_VALUE_CATEGORIES", {"decision", "commitment"})
+     event = {"content": "Acme released a new SKU.", "attributes": {}}
+     result = {
+         "entities": [{"type": "org", "name": "Acme"}],
+         "facts": [{"category": "state", "statement": "Acme released a SKU", "subject": "Acme"}],
+         "relationships": [],
+     }
+     esc, reason = worker.escalation_decision(event, result)
+     assert not esc and reason is None
+
+
+ # ----------------------------------------------------------------------
+ # Flag contract — cascade is a no-op until CASCADE_ENABLED is flipped.
+ # ----------------------------------------------------------------------
+
+ def test_cascade_default_off() -> None:
+     """Default env ⇒ teacher-only. The flag is the kill switch."""
+     assert worker.CASCADE_ENABLED is False
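The `_salvage_json_object` tests above pin a contract (plain object, fenced object, object embedded in chatter, `None` otherwise) without showing the parser. A minimal sketch that would satisfy that contract follows; it is not the worker's implementation, and the name `salvage_json_object` and the fence regex are assumptions. It relies on the standard library's `json.JSONDecoder.raw_decode`, which parses one JSON value starting at a given index and ignores trailing text.

```python
from __future__ import annotations

import json
import re

# Matches a whole string wrapped in a Markdown code fence, with an optional language tag.
_FENCE = re.compile(r"^```[a-zA-Z0-9]*\s*\n(.*?)\n?```\s*$", re.DOTALL)


def salvage_json_object(text: str | None) -> dict | None:
    """Best-effort recovery of a single JSON object from model output."""
    if not text or not text.strip():
        return None
    candidate = text.strip()
    fenced = _FENCE.match(candidate)
    if fenced:
        candidate = fenced.group(1).strip()
    try:
        parsed = json.loads(candidate)
        # Valid JSON that is not an object (e.g. a bare array) is rejected.
        return parsed if isinstance(parsed, dict) else None
    except json.JSONDecodeError:
        pass
    # Fall back to scanning for the first position where an object parses.
    decoder = json.JSONDecoder()
    for index, char in enumerate(candidate):
        if char != "{":
            continue
        try:
            obj, _end = decoder.raw_decode(candidate, index)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    return None
```

Returning `None` rather than raising keeps the caller's gate simple: a `None` result maps directly onto the parse-fail escalation.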
@@ -0,0 +1,62 @@
+ """Tests for fact_source — deriving the source label stamped onto facts.
+
+ The contract under test (SoR-drift foundation): prefer the finer
+ producer label `attributes.source` (gmail / hubspot / ...) over the
+ coarse `source_kind` enum; fall back to `source_kind` when no finer
+ label; return None only when neither is present (NULL == source-unknown,
+ the pre-009 state). Pure + total: never raises.
+
+ Run: pytest packages/memory-engine-v2/extractor-async/test_fact_source.py
+ """
+
+ from __future__ import annotations
+
+ import pytest
+
+ from worker import fact_source
+
+
+ class TestFactSource:
+     def test_prefers_finer_attributes_source(self):
+         # attributes.source (hubspot) wins over the coarse source_kind
+         # (system) — this is the CRM-vs-email granularity SoR-drift needs.
+         ev = {"source_kind": "system", "attributes": {"source": "hubspot"}}
+         assert fact_source(ev) == "hubspot"
+
+     def test_email_finer_than_note_kind(self):
+         ev = {"source_kind": "note", "attributes": {"source": "gmail"}}
+         assert fact_source(ev) == "gmail"
+
+     def test_falls_back_to_source_kind_when_no_attribute(self):
+         ev = {"source_kind": "chat", "attributes": {}}
+         assert fact_source(ev) == "chat"
+
+     def test_falls_back_when_attributes_missing(self):
+         ev = {"source_kind": "doc"}
+         assert fact_source(ev) == "doc"
+
+     def test_strips_whitespace(self):
+         ev = {"source_kind": "system", "attributes": {"source": " slack "}}
+         assert fact_source(ev) == "slack"
+
+     def test_blank_attribute_falls_through_to_kind(self):
+         # An empty/whitespace `source` must NOT win over a real kind.
+         ev = {"source_kind": "doc", "attributes": {"source": " "}}
+         assert fact_source(ev) == "doc"
+
+     # --- None cases: source-unknown, column stays NULL ---
+
+     @pytest.mark.parametrize(
+         "ev",
+         [
+             {},
+             {"attributes": {}},
+             {"attributes": {"source": ""}},
+             {"source_kind": "", "attributes": {"source": None}},
+             {"source_kind": None, "attributes": None},
+             # Non-string types must not crash and must not be stamped.
+             {"source_kind": 7, "attributes": {"source": 42}},
+         ],
+     )
+     def test_returns_none_when_no_usable_source(self, ev):
+         assert fact_source(ev) is None
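For readers without `worker.py` at hand, the contract these tests describe can be met by a few lines. The function below is a sketch inferred from the tests alone, not the shipped implementation: prefer a non-blank string in `attributes.source`, fall back to a non-blank string `source_kind`, and otherwise return `None` without raising.

```python
from __future__ import annotations


def fact_source(event: dict | None) -> str | None:
    """Derive the source label for a fact; total over malformed input."""
    if not isinstance(event, dict):
        return None
    attributes = event.get("attributes")
    finer = attributes.get("source") if isinstance(attributes, dict) else None
    # Candidates in priority order: finer producer label, then the coarse kind.
    for candidate in (finer, event.get("source_kind")):
        if isinstance(candidate, str) and candidate.strip():
            return candidate.strip()
    return None
```

The `isinstance` checks are what make the function total: non-string values such as `7` or `42` are skipped instead of being stamped or causing an exception.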