@pentatonic-ai/ai-agent-sdk 0.10.16 → 0.10.18

package/dist/index.cjs CHANGED
@@ -878,7 +878,7 @@ function fireAndForgetEmit(clientConfig, sessionOpts, messages, result, model) {
  }
 
  // src/telemetry.js
- var VERSION = "0.10.16";
+ var VERSION = "0.10.18";
  var TELEMETRY_URL = "https://sdk-telemetry.philip-134.workers.dev";
  function machineId() {
  const raw = typeof process !== "undefined" ? `${process.env?.USER || process.env?.USERNAME || "u"}:${process.platform || "x"}:${process.arch || "x"}` : "browser";
package/dist/index.js CHANGED
@@ -847,7 +847,7 @@ function fireAndForgetEmit(clientConfig, sessionOpts, messages, result, model) {
  }
 
  // src/telemetry.js
- var VERSION = "0.10.16";
+ var VERSION = "0.10.18";
  var TELEMETRY_URL = "https://sdk-telemetry.philip-134.workers.dev";
  function machineId() {
  const raw = typeof process !== "undefined" ? `${process.env?.USER || process.env?.USERNAME || "u"}:${process.platform || "x"}:${process.arch || "x"}` : "browser";
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@pentatonic-ai/ai-agent-sdk",
- "version": "0.10.16",
+ "version": "0.10.18",
  "description": "TES SDK — LLM observability and lifecycle tracking via Pentatonic Thing Event System. Track token usage, tool calls, and conversations. Manage things through event-sourced lifecycle stages with AI enrichment and vector search.",
  "type": "module",
  "main": "./dist/index.cjs",
@@ -0,0 +1,148 @@
1
+ # RFC — Student→Teacher Distillation Cascade (#99)
2
+
3
+ **Status:** proposed (plan of record) · **Date:** 2026-06-18 · **Owner:** Phil
4
+
5
+ ## Goal
6
+
7
+ Cut distillation GPU cost/latency by making the cheap fine-tuned **student**
8
+ (`numind/NuExtract-2.0-4B`, full-FT on teacher traces) the **primary** distiller,
9
+ and reaching the expensive **teacher** (Qwen3.6-27B on the L40S fleet) only for a
10
+ **sampled + gated subset**. The teacher's output on that subset both corrects the
11
+ graph and feeds continuous improvement of the student.
12
+
13
+ This is a model **cascade**, not a replacement: the teacher stays the source of
14
+ truth on the hard/sampled tail; the student carries the bulk.
15
+
16
+ ## What we already have (grounding)
17
+
18
+ - **Trained student** at `s3://pme-deploy-prod-us-east-1-170649632502/backups/training/nuextract-2.0-4b-ft-final/` (7 GB). Train/eval loss 0.112/0.106.
19
+ - **Quality vs the CURRENT teacher** (`f1e0ff` prompt): median entity-F1 **0.909**, ~**0% invalid JSON**, ~1% email-hallucination, **no drift** despite a teacher-prompt change since training (the student trained on `bbdaba`). So it's deployable without a retrain.
20
+ - **Routing signals are weak.** Single-pass token-confidence and a feature-based verifier (HistGBM, AUC 0.63) only modestly beat random at predicting student↔teacher disagreement. Cheap features don't pinpoint the student's errors. → routing leans on deterministic gates + sampling, not a confidence/verifier gate.
21
+ - **Infra to reuse:** `distillation_queue` + the `extractor-async` consumer pattern; the combined-demand distiller **autoscaler**; the `fusion_queue` consumer pattern; `distillation_traces` with `system_prompt_hash` (teacher-version segmentation); the Fusion Drive (fuzzy self-healing on top).
22
+ - **Caveat carried forward:** the "disagreement rate" we've quoted (~12%) is a crude entity-name-exact-match proxy — entities only, penalizes normalization, teacher-as-gold. A proper structured-diff metric is a dependency for trustworthy monitoring + verifier labels (see Open Questions).
23
+
24
+ ## Architecture (steady state, post-flip)
25
+
26
+ ```
27
+ event → extractor-sync (deterministic provisional write, unchanged) → distillation_queue
28
+
29
+
30
+ STUDENT consumer (cheap, always-on small GPU)
31
+ • distils the event, writes entities/facts/relationships to the graph
32
+ tagged producer='student'
33
+ • computes the escalation decision (below)
34
+
35
+ └── ESCALATE subset → distillation_queue-teacher (the existing 27B flow,
36
+ autoscaled L40S fleet)
37
+ • teacher distils
38
+ • SUPERSEDES the student's rows for that event
39
+ • writes distillation_traces (gold) → monitoring + retraining
40
+ ```
41
+
42
+ The student is a new consumer; escalation is an enqueue onto the existing teacher
43
+ path. No new scheduler — the autoscaler already scales the teacher fleet on queue
44
+ depth (incl. fusion). The student runs on a cheap always-on GPU (L4/g6 — 4B fits
45
+ in ~8 GB; the teacher L40S fleet stays scale-to-zero for the escalation subset).
46
+
47
+ ## Escalation policy (the "sample out")
48
+
49
+ Escalate to the teacher iff **any** of:
50
+
51
+ 1. **Deterministic gate-fail** (cheap, high-precision):
52
+ - student output isn't valid JSON / violates schema, OR
53
+ - **grounding violation** — student emits an email/entity contradicting the
54
+ event's structured envelope (the #111 hard-key bag). *This is the gate that
55
+ catches "confidently wrong about a known fact," which confidence can't.*
56
+ 2. **High-value event class** — e.g. `decision`/`commitment` facts, VIP arenas —
57
+ always teacher (cheap metadata routing, decided pre-student where possible).
58
+ 3. **Random sample** (e.g. 3–5%) — *not* a quality lever; this exists to (a)
59
+ monitor student↔teacher agreement over time (drift), and (b) generate fresh
60
+ teacher-gold on live traffic for retraining the student + verifier (active
61
+ learning). The events the student is *unsure* on are the highest-value
62
+ retraining data.
63
+ 4. *(soft)* **verifier score** — verifier-v0 is weak (AUC 0.63); use it only as a
64
+ low-weight tiebreak to nudge borderline events toward the teacher, never as
65
+ the primary gate. Revisit if a stronger signal (self-consistency) is built.
66
+
67
+ Everything else: the student's write stands.
68
+
69
+ ## Supersede-on-escalation (the one genuinely new mechanism)
70
+
71
+ The store is **pure-accretion** — event identity is `content-hash`, there is **no
72
+ supersede-by-source_id**, and graph upserts only accrete (see
73
+ `pme2-dedup-supersede-semantics`). So when the teacher re-distils an escalated
74
+ event, its output must **replace** the student's rows for that event, not pile on
75
+ top. Options:
76
+
77
+ - **(A) Producer-tagged supersede (recommended).** Tag every graph row written by
78
+ distillation with `producer` (`student`/`teacher`) + `event_id` (already in
79
+ `provenance_event_ids`). On a teacher escalation, in one transaction: delete the
80
+ `producer='student'` rows whose provenance is that single event **and** that no
81
+ other event corroborates (don't delete a row a second event also supports —
82
+ decrement/repoint instead), then write the teacher's rows. Mirrors the Fusion
83
+ Drive's repoint/audit discipline; reversible via an audit receipt.
84
+ - **(B) Defer-write.** Decide escalation *before* the student writes (gates that
85
+ don't need the student output: high-value class, random sample), and for those
86
+ skip the student write entirely — teacher-only. Gate-fails (which need the
87
+ student output) still need (A). Cuts most supersede churn.
88
+ - Recommended: **B for the pre-decidable escalations + A for gate-fail
89
+ escalations.** Most escalations (class/random) are pre-decidable → no student
90
+ write to undo; only the (rarer) gate-fails incur a supersede.
91
+
92
+ Open: define "no other event corroborates" precisely against the accretion graph;
93
+ reuse Fusion Drive's `entity_merges`/`fact_merges` audit tables for reversibility.
94
+
95
+ ## Rollout sequence
96
+
97
+ 1. **Shadow** (no graph impact): student runs alongside; teacher still does
98
+ everything; log student-vs-teacher per event → validate on live traffic +
99
+ accumulate verifier/quality-metric labels. (We've already done a *batch* shadow
100
+ over recent traces; a brief standing shadow confirms on live flow.)
101
+ 2. **Flip to student-primary + sampled teacher** (the diagram above), starting
102
+ with a **conservative escalation rate** (high random %, broad high-value
103
+ classes), tighten as monitoring confirms quality.
104
+ 3. **Iterate**: retrain student on accumulated teacher-gold (esp. escalated/hard
105
+ events); rebuild the verifier when a proper metric + more data exist.
106
+
107
+ Kill switch + dry-run posture mirror the Fusion Drive (a flag to fall back to
108
+ teacher-primary instantly).
109
+
110
+ ## Monitoring & active learning
111
+
112
+ - **Drift:** the random-sample agreement rate, segmented by `system_prompt_hash`.
113
+ A teacher-prompt change (like `bbdaba`→`f1e0ff`) shows up as an agreement drop →
114
+ trigger a student refresh. (The hash segmentation already exists.)
115
+ - **Active learning:** escalated + sampled events with teacher-gold are the next
116
+ training corpus; the student improves on exactly its weak spots.
117
+ - **Cost model:** worth it iff escalation rate × teacher-cost ≪ teacher-on-
118
+ everything. Student-on-everything (cheap GPU, always-on) + teacher on ~10–20%
119
+ beats teacher-on-100% comfortably; the L40S fleet scale-to-zero already assumes
120
+ bursty teacher load.
121
+
122
+ ## Open questions / risks
123
+
124
+ - **Disagreement metric.** Replace the entity-name-exact-match proxy with a
125
+ structured diff (entities w/ type+email, facts as s·p·o, relationships; fuzzy
126
+ name matching; small independent rubric for semantic equivalence). Dependency
127
+ for trustworthy monitoring **and** better verifier labels. *(Highest-leverage
128
+ next build.)*
129
+ - **Prompt-version coupling.** The student is bound to a teacher prompt version
130
+ (`system_prompt_hash`). Every teacher-prompt change risks staling it → the
131
+ monitoring must watch the hash and the refresh pipeline must be cheap.
132
+ - **Routing is unsolved.** No cheap signal cleanly predicts student errors yet;
133
+ self-consistency (K-sample) is the most promising unbuilt option but costs K×.
134
+ Until then, gates + sampling carry it and we accept the residual tail.
135
+ - **Supersede correctness** in the accretion store (above) — the riskiest piece
136
+ to get exactly right.
137
+
138
+ ## Build phases (components)
139
+
140
+ 1. **Metric** — structured-diff scorer (offline, no GPU). *Do first.*
141
+ 2. **Student service** — always-on cheap-GPU server (4B) + a `distillation_queue`
142
+ student consumer; producer-tagging on graph writes.
143
+ 3. **Escalation + supersede** — gate/class/random logic; supersede-on-escalation
144
+ (B+A); reuse audit tables.
145
+ 4. **Shadow wiring** → **flip** → **monitoring dashboard** (agreement by hash).
146
+ 5. **Retrain loop** — periodic student/verifier retrain on accumulated gold.
147
+
148
+ 🤖 Generated with [Claude Code](https://claude.com/claude-code)
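Option (A) in the RFC above leaves "no other event corroborates" as an open definition. A minimal in-memory sketch of one possible reading follows: a student row is deleted only when the escalated event is its sole provenance, and otherwise the event is dropped from the row's provenance (the "repoint" case). `GraphRow`, `supersede_student_rows`, and the audit tuples are hypothetical names used for illustration; the real store, transaction handling, and audit tables are not shown in this diff.

```python
from dataclasses import dataclass, field


@dataclass
class GraphRow:
    """Hypothetical stand-in for one distilled graph row."""
    key: str
    producer: str  # "student" or "teacher"
    provenance_event_ids: set[str] = field(default_factory=set)


def supersede_student_rows(rows, event_id, teacher_rows):
    """Apply one reading of option (A) for a single escalated event.

    Returns (new_rows, audit). The audit list plays the role of the
    reversible receipt mentioned in the RFC.
    """
    kept, audit = [], []
    for row in rows:
        # Rows from other producers, or unrelated to this event, are untouched.
        if row.producer != "student" or event_id not in row.provenance_event_ids:
            kept.append(row)
            continue
        # Sole provenance: nothing else corroborates the row, so delete it.
        if row.provenance_event_ids == {event_id}:
            audit.append(("deleted", row.key))
            continue
        # Corroborated by another event: keep the row, drop this event only.
        remaining = row.provenance_event_ids - {event_id}
        kept.append(GraphRow(row.key, row.producer, remaining))
        audit.append(("repointed", row.key))
    return kept + list(teacher_rows), audit
```

In a real implementation the delete, repoint, and teacher write would presumably share one database transaction, as the RFC text requires.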
@@ -0,0 +1,175 @@
1
+ """Unit tests for the student→teacher distillation cascade (#99).
2
+
3
+ Covers the pure decision surface — the JSON-salvage of student output, the
4
+ grounding email set derived from an event, and the escalation gates (parse
5
+ fail, empty, grounding violation, high-value class, random sample, pass). The
6
+ network call (call_student_one) and the DB writes (_apply_extraction,
7
+ _record_distillation) are integration-tested elsewhere; here we pin the routing
8
+ logic that decides student-XOR-teacher.
9
+ """
10
+
11
+ from __future__ import annotations
12
+
13
+ import importlib.util
14
+ from pathlib import Path
15
+
16
+ import pytest
17
+
18
+ _THIS = Path(__file__).resolve().parent
19
+
20
+
21
+ def _load_worker(name: str = "extractor_async_worker_cascade"):
22
+ spec = importlib.util.spec_from_file_location(name, _THIS / "worker.py")
23
+ assert spec and spec.loader
24
+ mod = importlib.util.module_from_spec(spec)
25
+ spec.loader.exec_module(mod)
26
+ return mod
27
+
28
+
29
+ try:
30
+ worker = _load_worker()
31
+ except ImportError as e:
32
+ pytest.skip(f"extractor-async deps unavailable: {e}", allow_module_level=True)
33
+
34
+
35
+ # ----------------------------------------------------------------------
36
+ # _salvage_json_object — the JSON-validity gate's parser
37
+ # ----------------------------------------------------------------------
38
+
39
+ def test_salvage_plain_object() -> None:
40
+ assert worker._salvage_json_object('{"entities": []}') == {"entities": []}
41
+
42
+
43
+ def test_salvage_strips_code_fence() -> None:
44
+ txt = '```json\n{"facts": [{"statement": "x"}]}\n```'
45
+ assert worker._salvage_json_object(txt) == {"facts": [{"statement": "x"}]}
46
+
47
+
48
+ def test_salvage_extracts_embedded_object() -> None:
49
+ txt = 'Sure! Here is the extraction:\n{"entities": [{"name": "Acme"}]}\nDone.'
50
+ assert worker._salvage_json_object(txt) == {"entities": [{"name": "Acme"}]}
51
+
52
+
53
+ def test_salvage_returns_none_on_garbage() -> None:
54
+ assert worker._salvage_json_object("not json at all") is None
55
+ assert worker._salvage_json_object("") is None
56
+ # a bare JSON array is not an event object
57
+ assert worker._salvage_json_object("[1, 2, 3]") is None
58
+
59
+
60
+ # ----------------------------------------------------------------------
61
+ # _event_known_emails — grounding set (content + structured envelope)
62
+ # ----------------------------------------------------------------------
63
+
64
+ def test_known_emails_from_content_and_envelope() -> None:
65
+ event = {
66
+ "content": "Reach me at alice@acme.com tomorrow.",
67
+ "attributes": {
68
+ "contact_email": "Bob <bob@acme.com>",
69
+ "to_emails": ["carol@acme.com", "dave@x.io"],
70
+ "cc_emails": ["erin@acme.com"],
71
+ },
72
+ }
73
+ known = worker._event_known_emails(event)
74
+ assert known == {
75
+ "alice@acme.com", "bob@acme.com", "carol@acme.com",
76
+ "dave@x.io", "erin@acme.com",
77
+ }
78
+
79
+
80
+ def test_known_emails_lowercased_and_empty_safe() -> None:
81
+ assert worker._event_known_emails({"content": "FOO@BAR.COM"}) == {"foo@bar.com"}
82
+ assert worker._event_known_emails({}) == set()
83
+
84
+
85
+ # ----------------------------------------------------------------------
86
+ # escalation_decision — the gates (precedence + each trigger)
87
+ # ----------------------------------------------------------------------
88
+
89
+ def _no_sample(monkeypatch):
90
+ """Pin the random sample off so the deterministic gates are isolated."""
91
+ monkeypatch.setattr(worker, "STUDENT_SAMPLE_RATE", 0.0)
92
+
93
+
94
+ def test_escalate_on_parse_fail(monkeypatch) -> None:
95
+ _no_sample(monkeypatch)
96
+ esc, reason = worker.escalation_decision({"content": ""}, None)
97
+ assert esc and reason == "student_parse_fail"
98
+
99
+
100
+ def test_escalate_on_empty_extraction(monkeypatch) -> None:
101
+ _no_sample(monkeypatch)
102
+ empty = {"entities": [], "facts": [], "relationships": []}
103
+ esc, reason = worker.escalation_decision({"content": ""}, empty)
104
+ assert esc and reason == "student_empty"
105
+
106
+
107
+ def test_escalate_on_grounding_violation(monkeypatch) -> None:
108
+ _no_sample(monkeypatch)
109
+ event = {"content": "Met with the vendor.", "attributes": {}}
110
+ # student invents an email not present anywhere in the event
111
+ result = {
112
+ "entities": [{"type": "person", "name": "X", "aliases": ["ghost@evil.com"]}],
113
+ "facts": [],
114
+ "relationships": [],
115
+ }
116
+ esc, reason = worker.escalation_decision(event, result)
117
+ assert esc and reason == "grounding_violation"
118
+
119
+
120
+ def test_no_escalation_when_email_is_grounded(monkeypatch) -> None:
121
+ _no_sample(monkeypatch)
122
+ event = {"content": "ping alice@acme.com", "attributes": {}}
123
+ result = {
124
+ "entities": [{"type": "person", "name": "Alice", "aliases": ["alice@acme.com"]}],
125
+ "facts": [],
126
+ "relationships": [],
127
+ }
128
+ esc, reason = worker.escalation_decision(event, result)
129
+ assert not esc and reason is None
130
+
131
+
132
+ def test_escalate_on_high_value_class(monkeypatch) -> None:
133
+ _no_sample(monkeypatch)
134
+ monkeypatch.setattr(worker, "HIGH_VALUE_CATEGORIES", {"decision", "commitment"})
135
+ event = {"content": "We will ship Friday.", "attributes": {}}
136
+ result = {
137
+ "entities": [],
138
+ "facts": [{"category": "decision", "statement": "ship Friday", "subject": "team"}],
139
+ "relationships": [],
140
+ }
141
+ esc, reason = worker.escalation_decision(event, result)
142
+ assert esc and reason == "high_value_class"
143
+
144
+
145
+ def test_escalate_on_random_sample(monkeypatch) -> None:
146
+ monkeypatch.setattr(worker, "STUDENT_SAMPLE_RATE", 1.0) # always sample
147
+ monkeypatch.setattr(worker, "HIGH_VALUE_CATEGORIES", set())
148
+ event = {"content": "low value note", "attributes": {}}
149
+ result = {"entities": [{"type": "other", "name": "thing"}], "facts": [], "relationships": []}
150
+ esc, reason = worker.escalation_decision(event, result)
151
+ assert esc and reason == "random_sample"
152
+
153
+
154
+ def test_student_handles_clean_low_value_event(monkeypatch) -> None:
155
+ """The common case: valid, grounded, non-high-value, not sampled → the
156
+ student's write stands (no escalation)."""
157
+ _no_sample(monkeypatch)
158
+ monkeypatch.setattr(worker, "HIGH_VALUE_CATEGORIES", {"decision", "commitment"})
159
+ event = {"content": "Acme released a new SKU.", "attributes": {}}
160
+ result = {
161
+ "entities": [{"type": "org", "name": "Acme"}],
162
+ "facts": [{"category": "state", "statement": "Acme released a SKU", "subject": "Acme"}],
163
+ "relationships": [],
164
+ }
165
+ esc, reason = worker.escalation_decision(event, result)
166
+ assert not esc and reason is None
167
+
168
+
169
+ # ----------------------------------------------------------------------
170
+ # Flag contract — cascade is a no-op until CASCADE_ENABLED is flipped.
171
+ # ----------------------------------------------------------------------
172
+
173
+ def test_cascade_default_off() -> None:
174
+ """Default env ⇒ teacher-only. The flag is the kill switch."""
175
+ assert worker.CASCADE_ENABLED is False
@@ -0,0 +1,69 @@
1
+ """Unit tests for the distillation_queue attempts/retry accounting.
2
+
3
+ Regression guard for the lease-reclaim bug (gotcha #11): claiming must NOT
4
+ consume the retry budget — only genuine processing failures do — so a worker
5
+ restart (deploy recreating the container) can re-claim stranded in-flight work
6
+ indefinitely instead of stranding it in `claimed` forever. The DB-touching
7
+ claim/release/fail SQL isn't unit-testable here (no DB in this suite), but the
8
+ give-up decision is pure logic, so we pin it.
9
+ """
10
+
11
+ from __future__ import annotations
12
+
13
+ import importlib.util
14
+ from pathlib import Path
15
+
16
+ import pytest
17
+
18
+ _THIS = Path(__file__).resolve().parent
19
+
20
+
21
+ def _load_worker(name: str = "extractor_async_worker_qa"):
22
+ spec = importlib.util.spec_from_file_location(name, _THIS / "worker.py")
23
+ assert spec and spec.loader
24
+ mod = importlib.util.module_from_spec(spec)
25
+ spec.loader.exec_module(mod)
26
+ return mod
27
+
28
+
29
+ try:
30
+ worker = _load_worker()
31
+ except ImportError as e:
32
+ pytest.skip(f"extractor-async deps unavailable: {e}", allow_module_level=True)
33
+
34
+
35
+ def test_attempts_exhausted_gives_exactly_max_genuine_tries(monkeypatch) -> None:
36
+ """`attempts` is the count of PRIOR genuine failures at claim time. With
37
+ MAX_ATTEMPTS=3 the sequence is: fail#1 (attempts=0)→retry, fail#2
38
+ (attempts=1)→retry, fail#3 (attempts=2)→terminal. Exactly 3 tries."""
39
+ monkeypatch.setattr(worker, "MAX_ATTEMPTS", 3)
40
+ assert worker._attempts_exhausted(0) is False # 1st failure → retry
41
+ assert worker._attempts_exhausted(1) is False # 2nd failure → retry
42
+ assert worker._attempts_exhausted(2) is True # 3rd failure → give up
43
+ assert worker._attempts_exhausted(3) is True
44
+
45
+
46
+ def test_attempts_exhausted_respects_max(monkeypatch) -> None:
47
+ monkeypatch.setattr(worker, "MAX_ATTEMPTS", 1)
48
+ assert worker._attempts_exhausted(0) is True # single try, no retry
49
+ monkeypatch.setattr(worker, "MAX_ATTEMPTS", 5)
50
+ assert worker._attempts_exhausted(3) is False
51
+ assert worker._attempts_exhausted(4) is True
52
+
53
+
54
+ def test_claim_sql_does_not_increment_attempts() -> None:
55
+ """The fix: claiming must not touch `attempts` (only release/fail do). Guard
56
+ against a regression that reintroduces the increment at claim time. We check
57
+ the source of claim_next_batch rather than execute it (no DB here)."""
58
+ import inspect
59
+ src = inspect.getsource(worker.claim_next_batch)
60
+ # the claim UPDATE must not bump attempts; the only attempts reference is the
61
+ # eligibility predicate `attempts < %s`.
62
+ assert "attempts = attempts + 1" not in src
63
+ assert "attempts <" in src # eligibility gate still present
64
+
65
+
66
+ def test_release_and_fail_increment_attempts() -> None:
67
+ import inspect
68
+ assert "attempts = attempts + 1" in inspect.getsource(worker.release_claim)
69
+ assert "attempts = attempts + 1" in inspect.getsource(worker.mark_failed)
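The `_attempts_exhausted` helper pinned by these tests does not appear in this diff. One implementation consistent with every assertion above is sketched here; the function body and the default of 3 are assumptions inferred from the tests, not the worker's actual source.

```python
MAX_ATTEMPTS = 3  # assumed default; the real value is defined in worker.py


def attempts_exhausted(prior_failures: int, max_attempts: int = MAX_ATTEMPTS) -> bool:
    """Return True when the failure being handled now is the last allowed try.

    `prior_failures` is the row's `attempts` value at claim time, i.e. the
    number of genuine failures recorded before this one.
    """
    return prior_failures + 1 >= max_attempts
```

With `max_attempts=3` this yields retry, retry, give up for prior counts 0, 1, 2, which is the "exactly 3 tries" sequence the docstring describes.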
@@ -30,6 +30,7 @@ import hashlib
  import json
  import logging
  import os
+ import random
  import re
  import socket
  import time
@@ -94,6 +95,45 @@ DISTILL_TRACE_ENABLED = os.environ.get(
94
95
  ).strip().lower() in ("true", "1", "yes", "on")
95
96
 
96
97
 
98
+ def _envflag(name: str, default: str = "false") -> bool:
99
+ return os.environ.get(name, default).strip().lower() in ("true", "1", "yes", "on")
100
+
101
+
102
+ # --------------------------------------------------------------------
103
+ # Student→teacher distillation cascade (#99)
104
+ #
105
+ # When CASCADE_ENABLED, the cheap fine-tuned student (NuExtract-2.0-4B,
106
+ # served behind STUDENT_ENDPOINT) is the PRIMARY distiller. Per event the
107
+ # worker runs the student first, applies deterministic gates + sampling, and
108
+ # the event is handled by student XOR teacher — DISJOINT writes, so nothing in
109
+ # the accretion store is ever superseded. The 27B teacher (the existing
110
+ # call_llm_batch path) handles only the escalated subset.
111
+ #
112
+ # Hard requirement on this load-bearing ingestion path: the whole cascade is
113
+ # behind this single flag. Flip it off ⇒ byte-for-byte the prior teacher-only
114
+ # behaviour (no student call, no ledger write). That is the instant kill switch
115
+ # (cf. the 0.10.9 outage). Every graph row's producer is recorded in the
116
+ # event_distillations ledger (migration 010) so escalation can be monitored and
117
+ # student rows can be re-distilled later ("mop up").
118
+ CASCADE_ENABLED = _envflag("CASCADE_ENABLED")
119
+ STUDENT_ENDPOINT = os.environ.get("STUDENT_ENDPOINT", "")
120
+ STUDENT_API_KEY = os.environ.get("STUDENT_API_KEY", "")
121
+ STUDENT_MODEL = os.environ.get("STUDENT_MODEL", "nuextract-2.0-4b-ft")
122
+ STUDENT_TIMEOUT_SEC = float(os.environ.get("STUDENT_TIMEOUT_SEC", "60"))
123
+ STUDENT_MAX_TOKENS = int(os.environ.get("STUDENT_MAX_TOKENS", "768"))
124
+ # Fraction of student-passing events ALSO sent to the teacher — NOT a quality
125
+ # lever, this is the monitoring + active-learning sample (drift detection and
126
+ # fresh teacher-gold on live traffic). Default conservative-ish 5%.
127
+ STUDENT_SAMPLE_RATE = float(os.environ.get("STUDENT_SAMPLE_RATE", "0.05"))
128
+ # Fact categories that always go to the teacher regardless of the student's
129
+ # output (high-value, cheap to over-escalate). Comma-separated, lowercased.
130
+ HIGH_VALUE_CATEGORIES = {
131
+ c.strip().lower()
132
+ for c in os.environ.get("HIGH_VALUE_CATEGORIES", "decision,commitment").split(",")
133
+ if c.strip()
134
+ }
135
+
136
+
97
137
  # KV-text output format constants. We dropped JSON output (and the
98
138
  # `guided_json` schema enforcement that went with it) because a single
99
139
  # invalid char inside a 13k-character JSON blob nukes the whole 10-event
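The parsing rules used by the new configuration block can be checked in isolation. The sketch below restates the two expressions from the hunk above as standalone functions; `parse_flag` and `parse_categories` are illustrative names, not functions in worker.py.

```python
def parse_flag(raw: str) -> bool:
    """Same truthiness rule as _envflag: case-insensitive, whitespace-tolerant."""
    return raw.strip().lower() in ("true", "1", "yes", "on")


def parse_categories(raw: str) -> set[str]:
    """Same rule as HIGH_VALUE_CATEGORIES: comma-split, trimmed, lowercased,
    with empty items dropped."""
    return {c.strip().lower() for c in raw.split(",") if c.strip()}
```

One consequence worth noting: because the default only applies when the variable is unset, exporting `HIGH_VALUE_CATEGORIES` as an empty string yields an empty set and so turns the high-value gate off.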
@@ -779,6 +819,143 @@ async def call_llm_batch(
779
819
  return parsed
780
820
 
781
821
 
822
+ # --------------------------------------------------------------------
823
+ # Student→teacher cascade (#99) — student call + escalation gates
824
+ # --------------------------------------------------------------------
825
+
826
+ _EMAIL_RE = re.compile(r"[\w.+-]+@[\w.-]+\.\w+")
827
+ _JSON_OBJ_RE = re.compile(r"\{.*\}", re.DOTALL)
828
+
829
+
830
+ def _salvage_json_object(text: str) -> dict[str, Any] | None:
831
+ """Best-effort parse of the student's single-event output into a dict.
832
+ The student was trained to emit one event object verbatim (the teacher's
833
+ per-event raw_slice), so a bare json.loads usually works; we also strip
834
+ ```json fences and grab the outermost {...} as a fallback. Returns None on
835
+ any failure — that None IS the JSON-validity gate (→ escalate)."""
836
+ t = text.strip()
837
+ if t.startswith("```"):
838
+ t = t.strip("`")
839
+ if t[:4].lower() == "json":
840
+ t = t[4:]
841
+ try:
842
+ obj = json.loads(t)
843
+ return obj if isinstance(obj, dict) else None
844
+ except Exception:
845
+ pass
846
+ m = _JSON_OBJ_RE.search(t)
847
+ if m:
848
+ try:
849
+ obj = json.loads(m.group(0))
850
+ return obj if isinstance(obj, dict) else None
851
+ except Exception:
852
+ return None
853
+ return None
854
+
855
+
856
+ async def call_student_one(
857
+ client: httpx.AsyncClient, event: dict[str, Any]
858
+ ) -> dict[str, Any] | None:
859
+ """Distil ONE event with the fine-tuned student. The student trained on
860
+ single-event (build_event_block → per-event object) pairs with NO system
861
+ prompt, so the request is a single user turn. Output is normalised through
862
+ the SAME _parse_guided_json the teacher path uses (wrap the lone object in
863
+ the {"events":[...]} envelope), so a student-produced result dict is
864
+ byte-shape-identical to a teacher one — every downstream upsert is
865
+ producer-agnostic. Returns the result dict, or None on transport/parse
866
+ failure (→ the JSON-validity gate escalates it)."""
867
+ block = build_event_block(0, event)
868
+ headers = {"Content-Type": "application/json"}
869
+ if STUDENT_API_KEY:
870
+ headers["X-API-Key"] = STUDENT_API_KEY
871
+ headers["Authorization"] = f"Bearer {STUDENT_API_KEY}"
872
+ body = {
873
+ "model": STUDENT_MODEL,
874
+ "messages": [{"role": "user", "content": block}],
875
+ "temperature": 0.0,
876
+ "max_tokens": STUDENT_MAX_TOKENS,
877
+ }
878
+ try:
879
+ r = await client.post(
880
+ STUDENT_ENDPOINT, json=body, headers=headers, timeout=STUDENT_TIMEOUT_SEC
881
+ )
882
+ r.raise_for_status()
883
+ data = r.json()
884
+ text = (data.get("choices") or [{}])[0].get("message", {}).get("content", "")
885
+ if not text:
886
+ text = data.get("message", {}).get("content", "")
887
+ except Exception as exc:
888
+ log.warning(f"student call failed event_id={event.get('id')}: {exc}")
889
+ return None
890
+ obj = _salvage_json_object(text or "")
891
+ if obj is None:
892
+ return None
893
+ parsed = _parse_guided_json(json.dumps({"events": [obj]}), 1)
894
+ result = parsed[0]
895
+ result["raw_slice"] = json.dumps(obj, ensure_ascii=False)
896
+ return result
897
+
898
+
899
+ def _event_known_emails(event: dict[str, Any]) -> set[str]:
900
+ """Emails grounded in the event — its content plus the structured envelope
901
+ (the #111 hard-key bag). The grounding gate escalates any student output
902
+ that asserts an email NOT in this set: 'confidently wrong about a known
903
+ fact', which token-confidence can't catch."""
904
+ known = {e.lower() for e in _EMAIL_RE.findall(event.get("content") or "")}
905
+ attrs = event.get("attributes") or {}
906
+ for k in ("contact_email", "author", "user_id"):
907
+ v = attrs.get(k)
908
+ if isinstance(v, str):
909
+ known |= {e.lower() for e in _EMAIL_RE.findall(v)}
910
+ for k in ("to_emails", "cc_emails"):
911
+ v = attrs.get(k)
912
+ if isinstance(v, list):
913
+ for item in v:
914
+ if isinstance(item, str):
915
+ known |= {e.lower() for e in _EMAIL_RE.findall(item)}
916
+ return known
917
+
918
+
919
+ def escalation_decision(
920
+ event: dict[str, Any], student_result: dict[str, Any] | None
921
+ ) -> tuple[bool, str | None]:
922
+ """Decide whether an event escalates from student to teacher. Escalate iff
923
+ ANY gate fires (see RFC-student-cascade §Escalation policy). Returns
924
+ (escalate, reason). reason is the producer-tag stored in the ledger so the
925
+ escalation mix is queryable.
926
+
927
+ Routing leans on deterministic gates + sampling, NOT on token-confidence /
928
+ the weak verifier (AUC 0.63) — cheap signals don't pinpoint student errors."""
929
+ # 1a. JSON/schema validity gate — None means the student produced nothing
930
+ # parseable.
931
+ if student_result is None:
932
+ return True, "student_parse_fail"
933
+ # 1b. A parseable-but-empty extraction is also a fail — the student gave us
934
+ # no graph signal; let the teacher try.
935
+ if not (
936
+ student_result.get("entities")
937
+ or student_result.get("facts")
938
+ or student_result.get("relationships")
939
+ ):
940
+ return True, "student_empty"
941
+ # 1c. Grounding gate — student asserts an email the event doesn't contain.
942
+ known = _event_known_emails(event)
943
+ blob = json.dumps(student_result.get("entities", [])) + json.dumps(
944
+ student_result.get("facts", [])
945
+ )
946
+ for em in {e.lower() for e in _EMAIL_RE.findall(blob)}:
947
+ if em not in known:
948
+ return True, "grounding_violation"
949
+ # 2. High-value fact class — always teacher.
950
+ for f in student_result.get("facts", []):
951
+ if (f.get("category") or "").lower() in HIGH_VALUE_CATEGORIES:
952
+ return True, "high_value_class"
953
+ # 3. Random monitoring/active-learning sample (not a quality lever).
954
+ if STUDENT_SAMPLE_RATE > 0 and random.random() < STUDENT_SAMPLE_RATE:
955
+ return True, "random_sample"
956
+ return False, None
957
+
958
+
782
959
  # --------------------------------------------------------------------
783
960
  # Upsert helpers (mirror extractor-sync's idempotent shape)
784
961
  # --------------------------------------------------------------------
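The call site that combines `call_student_one` and `escalation_decision` is not part of this hunk. A minimal sketch of the student-XOR-teacher routing described in the configuration comments might look as follows. `route_event` and its callable parameters are hypothetical, and the sketch is synchronous for brevity, whereas the real student call is `async`.

```python
from typing import Any, Callable, Optional

Event = dict[str, Any]
Result = Optional[dict[str, Any]]


def route_event(
    event: Event,
    *,
    cascade_enabled: bool,
    run_student: Callable[[Event], Result],
    decide: Callable[[Event, Result], tuple[bool, Optional[str]]],
) -> tuple[str, Optional[str]]:
    """Return (producer, escalate_reason) for one event.

    With the flag off the student is never called, which is the kill-switch
    behaviour the comments describe. With the flag on, exactly one producer
    is chosen, so the two never write rows for the same event.
    """
    if not cascade_enabled:
        return "teacher", None
    student_result = run_student(event)
    escalate, reason = decide(event, student_result)
    return ("teacher", reason) if escalate else ("student", None)
```

Passing the student and the gate function in as parameters keeps the routing testable without a network endpoint, in the same spirit as the unit tests that isolate `escalation_decision`.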
@@ -1508,6 +1685,41 @@ def _insert_trace(
          )
 
 
+ def _record_distillation(
+     conn: psycopg.Connection,
+     *,
+     event_id: str,
+     producer: str,
+     llm_model: str,
+     escalated: bool | None,
+     escalate_reason: str | None,
+ ) -> None:
+     """Append a (event_id, producer) row to the cascade ledger (migration 010).
+     Audit-only — caller wraps in try/except, never poisons the upsert path.
+     Records WHICH producer wrote this event's graph rows so escalation can be
+     monitored and student rows re-distilled later. ON CONFLICT refreshes (a
+     gate-fail re-distill legitimately re-stamps the teacher row)."""
+     with conn.cursor() as cur:
+         cur.execute(
+             """
+             INSERT INTO event_distillations (
+                 event_id, producer, llm_model, system_prompt_hash,
+                 escalated, escalate_reason
+             ) VALUES (%s, %s, %s, %s, %s, %s)
+             ON CONFLICT (event_id, producer) DO UPDATE SET
+                 llm_model = EXCLUDED.llm_model,
+                 system_prompt_hash = EXCLUDED.system_prompt_hash,
+                 escalated = EXCLUDED.escalated,
+                 escalate_reason = EXCLUDED.escalate_reason,
+                 distilled_at = now()
+             """,
+             (
+                 event_id, producer, llm_model, SYSTEM_PROMPT_HASH,
+                 escalated, escalate_reason,
+             ),
+         )
+
+
  # --------------------------------------------------------------------
  # Queue mechanics
  # --------------------------------------------------------------------
@@ -1693,8 +1905,16 @@ def claim_next_batch(conn: psycopg.Connection) -> list[dict[str, Any]]:
                  status = 'claimed',
                  claimed_by = %s,
                  claimed_at = NOW(),
-                 claim_expires_at = NOW() + (%s || ' seconds')::interval,
-                 attempts = attempts + 1
+                 claim_expires_at = NOW() + (%s || ' seconds')::interval
+                 -- NB: claiming does NOT increment `attempts`. `attempts` counts
+                 -- genuine PROCESSING failures (release_claim / mark_failed), not
+                 -- claim-grabs. A worker that dies mid-batch (e.g. a deploy
+                 -- recreates the container) leaves its rows in `claimed`; the lease
+                 -- expires and they are re-claimed here WITHOUT burning the retry
+                 -- budget — so restarts can't strand in-flight work. (Pre-fix, the
+                 -- increment lived here and ~3 deploys could push a row to
+                 -- attempts=MAX, making it forever-ineligible for reclaim AND never
+                 -- marked failed → orphaned in `claimed`. See gotcha #11.)
              WHERE id IN (
                  SELECT id FROM distillation_queue
                  WHERE (
@@ -1731,14 +1951,21 @@ def mark_done(conn: psycopg.Connection, queue_id: int) -> None:
 
 
  def mark_failed(conn: psycopg.Connection, queue_id: int, error: str) -> None:
+     # Terminal genuine-failure path → count the attempt (claiming no longer
+     # does; see claim_next_batch). Leaves the row's `attempts` reflecting the
+     # true number of processing attempts on a failed row.
      with conn.cursor() as cur:
          cur.execute(
-             "UPDATE distillation_queue SET status = 'failed', last_error = %s WHERE id = %s",
+             "UPDATE distillation_queue SET status = 'failed', "
+             "attempts = attempts + 1, last_error = %s WHERE id = %s",
              (error[:1024], queue_id),
          )
 
 
  def release_claim(conn: psycopg.Connection, queue_id: int, error: str) -> None:
+     # Recoverable genuine-failure path (will retry) → count the attempt. This is
+     # where the retry budget is spent — NOT at claim time — so a deploy-induced
+     # reclaim never consumes it.
      with conn.cursor() as cur:
          cur.execute(
              """
@@ -1747,6 +1974,7 @@ def release_claim(conn: psycopg.Connection, queue_id: int, error: str) -> None:
                  claimed_by = NULL,
                  claimed_at = NULL,
                  claim_expires_at = NULL,
+                 attempts = attempts + 1,
                  last_error = %s
              WHERE id = %s
              """,
@@ -1754,6 +1982,15 @@ def release_claim(conn: psycopg.Connection, queue_id: int, error: str) -> None:
          )
 
 
+ def _attempts_exhausted(attempts: int) -> bool:
+     """Whether THIS processing failure should be terminal (mark_failed) rather
+     than retried (release_claim). `attempts` is the row's value at claim time =
+     the count of PRIOR genuine failures (claiming no longer increments it). This
+     failure is attempt #(attempts+1), so we give up once that reaches
+     MAX_ATTEMPTS — giving exactly MAX_ATTEMPTS genuine tries before failing."""
+     return attempts + 1 >= MAX_ATTEMPTS
+
+
  # --------------------------------------------------------------------
  # Main loop
  # --------------------------------------------------------------------
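The retry-budget change in the hunks above can be checked with a small in-memory model. This is only a sketch under stated assumptions: `QueueRow`, `claim`, `record_failure` and `simulate` are hypothetical stand-ins for the SQL in `claim_next_batch`, `release_claim` and `mark_failed`, the value `MAX_ATTEMPTS = 3` is assumed, and `"pending"` is an assumed name for the retryable status (the diff does not show it).

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # assumed value; the worker defines the real constant elsewhere


@dataclass
class QueueRow:
    status: str = "pending"  # "pending" is an assumed name for the retryable state
    attempts: int = 0


def claim(row: QueueRow) -> None:
    """Claim or re-claim a row; the retry budget is untouched."""
    row.status = "claimed"


def attempts_exhausted(attempts: int) -> bool:
    """Same comparison as _attempts_exhausted in the diff."""
    return attempts + 1 >= MAX_ATTEMPTS


def record_failure(row: QueueRow) -> None:
    """A genuine processing failure: terminal or retryable, both count the attempt."""
    terminal = attempts_exhausted(row.attempts)
    row.attempts += 1
    row.status = "failed" if terminal else "pending"


def simulate(reclaims: int, failures: int) -> QueueRow:
    row = QueueRow()
    for _ in range(reclaims):  # worker died mid-batch, lease expired, row re-claimed
        claim(row)
    for _ in range(failures):  # worker processed the row and it failed
        claim(row)
        record_failure(row)
    return row


print(simulate(reclaims=5, failures=0))  # still claimable, attempts stays 0
print(simulate(reclaims=5, failures=2))  # retryable, attempts is 2
print(simulate(reclaims=0, failures=3))  # terminal on the third genuine failure
```

In this model any number of lease-expiry reclaims leaves `attempts` at 0, and only the third genuine failure becomes terminal, which matches the "exactly MAX_ATTEMPTS genuine tries" wording of the docstring.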
@@ -1829,12 +2066,81 @@ async def process_batch(
      if not callable_items:
          return
 
+     # Cascade (#99): student-primary with a gated/sampled teacher escalation.
+     # Disjoint writes — each event is handled by student XOR teacher — so the
+     # accretion store is never superseded. Flag-off ⇒ the teacher-only path
+     # below, byte-for-byte the prior behaviour.
+     if CASCADE_ENABLED and not stub_mode and STUDENT_ENDPOINT:
+         await _process_cascade(http, conn, callable_items, events_by_qid)
+         return
+
+     await _run_teacher(http, conn, callable_items, events_by_qid, stub_mode, None)
+
+
+ async def _process_cascade(
+     http: httpx.AsyncClient,
+     conn: psycopg.Connection,
+     callable_items: list[dict[str, Any]],
+     events_by_qid: dict[int, dict[str, Any] | None],
+ ) -> None:
+     """Student-primary cascade for one claim. Runs the student over every
+     callable event (bounded concurrency), then per event applies the escalation
+     gates: a pass writes the student's extraction (producer='student'); a fail/
+     sample escalates to the teacher. Disjoint — the student writes XOR the event
+     is escalated, so there is nothing to supersede."""
+     sem = asyncio.Semaphore(CONCURRENT_LLM_CALLS)
+
+     async def _student(item):
+         async with sem:
+             ev = events_by_qid[item["id"]]
+             return item, await call_student_one(http, ev)
+
+     outcomes = await asyncio.gather(*[_student(i) for i in callable_items])
+
+     escalate_items: list[dict[str, Any]] = []
+     reason_by_qid: dict[int, str] = {}
+     for item, sresult in outcomes:
+         event = events_by_qid[item["id"]]
+         escalate, reason = escalation_decision(event, sresult)
+         if escalate:
+             escalate_items.append(item)
+             reason_by_qid[item["id"]] = reason or "escalated"
+         else:
+             _apply_extraction(
+                 conn, item=item, event=event, result=sresult, llm_ms=0.0,
+                 local_idx=0, stub_mode=False, producer="student",
+                 escalated=False, escalate_reason=None,
+             )
+
+     log.info(
+         f"cascade: {len(callable_items) - len(escalate_items)} student-handled, "
+         f"{len(escalate_items)} escalated to teacher"
+     )
+     if escalate_items:
+         await _run_teacher(http, conn, escalate_items, events_by_qid, False, reason_by_qid)
+
+
+ async def _run_teacher(
+     http: httpx.AsyncClient,
+     conn: psycopg.Connection,
+     teacher_items: list[dict[str, Any]],
+     events_by_qid: dict[int, dict[str, Any] | None],
+     stub_mode: bool,
+     reason_by_qid: dict[int, str] | None,
+ ) -> None:
+     """Teacher (27B) distillation over the given items — the existing
+     multi-event batched path, factored out so both the cascade-off (all items)
+     and cascade escalation (subset) flows share it. `reason_by_qid` non-None
+     means these items were escalated by the cascade (tags the ledger row);
+     None means pure teacher-only (no ledger)."""
+     if not teacher_items:
+         return
      # Build chunks of EVENTS_PER_LLM_CALL items each (last chunk may be
      # short). Each chunk → one LLM call. Up to CONCURRENT_LLM_CALLS run
      # concurrently; asyncio.gather queues the rest.
      chunks: list[tuple[list[dict[str, Any]], list[dict[str, Any]]]] = []
-     for s in range(0, len(callable_items), EVENTS_PER_LLM_CALL):
-         chunk_items = callable_items[s : s + EVENTS_PER_LLM_CALL]
+     for s in range(0, len(teacher_items), EVENTS_PER_LLM_CALL):
+         chunk_items = teacher_items[s : s + EVENTS_PER_LLM_CALL]
          chunk_events = [events_by_qid[i["id"]] for i in chunk_items]
          chunks.append((chunk_items, chunk_events))
 
@@ -1853,7 +2159,6 @@ async def process_batch(
      for (chunk_items, _chunk_events), (per_item, llm_ms) in zip(chunks, chunk_outcomes):
          for local_idx, (item, result) in enumerate(zip(chunk_items, per_item)):
              queue_id = item["id"]
-             event_id = item["event_id"]
              attempts = item["attempts"]
              event = events_by_qid[queue_id]
 
@@ -1862,83 +2167,123 @@ async def process_batch(
                  log.warning(
                      f"extraction failed queue_id={queue_id} attempts={attempts}: {err}"
                  )
-                 if attempts >= MAX_ATTEMPTS:
+                 if _attempts_exhausted(attempts):
                      mark_failed(conn, queue_id, err)
                  else:
                      release_claim(conn, queue_id, err)
                  continue
 
-             ents = result.get("entities") or []
-             facts = result.get("facts") or []
-             rels = result.get("relationships") or []
-             arena = event["arena"]
-             participant_set = event.get("participant_set") or [arena]
-             disclosure = event.get("disclosure_class") or "private"
-             # SOURCE time of this event: prefer the parsed
-             # `attributes.timestamp` (canonical), falling back to the
-             # stored `emitted_at` column (which the sync path now also
-             # stamps from source time). `None` ⇒ upserts fall back to
-             # NOW() in-SQL. NEVER crash on a bad/absent source time.
-             event_time = event_source_time(event) or event.get("emitted_at")
-             # A structured deadline on the source event, if the producer
-             # supplied one — populates facts.effective_until. Absent or
-             # unparseable ⇒ None (column stays NULL, its existing
-             # behaviour). Only `attributes.due_at` is honoured; we do NOT
-             # guess deadlines from free text here.
-             due_at = parse_source_time((event.get("attributes") or {}).get("due_at"))
-             # ORIGINATING SOURCE of this event, stamped onto its facts so
-             # downstream can tell CRM-asserted from email-asserted (the
-             # SoR-drift foundation). Finer `attributes.source` else coarse
-             # `source_kind`; None ⇒ column stays NULL (source-unknown).
-             src = fact_source(event)
+             escalated = None if reason_by_qid is None else True
+             reason = None if reason_by_qid is None else reason_by_qid.get(queue_id)
+             _apply_extraction(
+                 conn, item=item, event=event, result=result, llm_ms=llm_ms,
+                 local_idx=local_idx, stub_mode=stub_mode, producer="teacher",
+                 escalated=escalated, escalate_reason=reason,
+             )
+
 
+ def _apply_extraction(
+     conn: psycopg.Connection,
+     *,
+     item: dict[str, Any],
+     event: dict[str, Any],
+     result: dict[str, Any],
+     llm_ms: float,
+     local_idx: int,
+     stub_mode: bool,
+     producer: str,
+     escalated: bool | None,
+     escalate_reason: str | None,
+ ) -> bool:
+     """Write one event's extraction to the graph (+ trace + cascade ledger).
+     Shared by the student-pass and teacher paths so producer tagging and the
+     upsert semantics are identical; `result` is a parsed extraction dict (never
+     an Exception). Returns True on success; on a DB failure releases/fails the
+     claim per attempts and returns False."""
+     queue_id = item["id"]
+     event_id = item["event_id"]
+     attempts = item["attempts"]
+     ents = result.get("entities") or []
+     facts = result.get("facts") or []
+     rels = result.get("relationships") or []
+     arena = event["arena"]
+     participant_set = event.get("participant_set") or [arena]
+     disclosure = event.get("disclosure_class") or "private"
+     # SOURCE time of this event: prefer the parsed `attributes.timestamp`
+     # (canonical), falling back to the stored `emitted_at` column. `None` ⇒
+     # upserts fall back to NOW() in-SQL. NEVER crash on a bad/absent source time.
+     event_time = event_source_time(event) or event.get("emitted_at")
+     # A structured deadline on the source event, if the producer supplied one —
+     # populates facts.effective_until. Only `attributes.due_at` is honoured.
+     due_at = parse_source_time((event.get("attributes") or {}).get("due_at"))
+     # ORIGINATING SOURCE of this event, stamped onto its facts so downstream can
+     # tell CRM-asserted from email-asserted (the SoR-drift foundation). Finer
+     # `attributes.source` else coarse `source_kind`; None ⇒ column stays NULL.
+     src = fact_source(event)
+
+     try:
+         name_to_id = upsert_entities(
+             conn, arena, event_id, participant_set, disclosure, ents,
+             event_time, event.get("attributes"),
+         )
+         n_facts = upsert_facts(
+             conn, arena, event_id, participant_set, disclosure, facts, name_to_id,
+             event_time, due_at, src,
+         )
+         n_rels = upsert_relationships(
+             conn, arena, event_id, participant_set, disclosure, rels, name_to_id,
+             event_time,
+         )
+         mark_done(conn, queue_id)
+         log.info(
+             f"completed queue_id={queue_id} event_id={event_id} producer={producer} "
+             f"entities={len(name_to_id)} facts={n_facts} relationships={n_rels}"
+             + (f" llm_ms={llm_ms:.0f}" if not stub_mode and llm_ms else "")
+         )
+         # Trace logging — best-effort. ONLY the teacher produces training gold;
+         # logging student output would train the student on itself.
+         if DISTILL_TRACE_ENABLED and not stub_mode and producer == "teacher":
              try:
-                 name_to_id = upsert_entities(
-                     conn, arena, event_id, participant_set, disclosure, ents,
-                     event_time, event.get("attributes"),
+                 _insert_trace(
+                     conn,
+                     event_id=event_id,
+                     user_prompt=build_event_block(local_idx, event),
+                     raw_response=result.get("raw_slice", ""),
+                     llm_chunk_ms=llm_ms,
                  )
-                 n_facts = upsert_facts(
-                     conn, arena, event_id, participant_set, disclosure, facts, name_to_id,
-                     event_time, due_at, src,
-                 )
-                 n_rels = upsert_relationships(
-                     conn, arena, event_id, participant_set, disclosure, rels, name_to_id,
-                     event_time,
+             except Exception as trace_exc:
+                 log.warning(
+                     f"trace insert failed queue_id={queue_id} "
+                     f"event_id={event_id}: {trace_exc}"
                  )
-                 mark_done(conn, queue_id)
-                 log.info(
-                     f"completed queue_id={queue_id} event_id={event_id} "
-                     f"entities={len(name_to_id)} facts={n_facts} "
-                     f"relationships={n_rels}"
-                     + (f" llm_ms={llm_ms:.0f}/chunk" if not stub_mode else "")
+         # Cascade ledger — best-effort, records which producer wrote this
+         # event's rows (only when the cascade is on).
+         if CASCADE_ENABLED:
+             try:
+                 _record_distillation(
+                     conn,
+                     event_id=event_id,
+                     producer=producer,
+                     llm_model=LLM_MODEL if producer == "teacher" else STUDENT_MODEL,
+                     escalated=escalated,
+                     escalate_reason=escalate_reason,
                  )
-                 # Trace logging — best-effort, never breaks the worker.
-                 # Captures (input, output) so a student model can be
-                 # trained on the teacher's distribution. Skipped in
-                 # stub mode (no real LLM output to record).
-                 if DISTILL_TRACE_ENABLED and not stub_mode:
-                     try:
-                         _insert_trace(
-                             conn,
-                             event_id=event_id,
-                             user_prompt=build_event_block(local_idx, event),
-                             raw_response=result.get("raw_slice", ""),
-                             llm_chunk_ms=llm_ms,
-                         )
-                     except Exception as trace_exc:
-                         log.warning(
-                             f"trace insert failed queue_id={queue_id} "
-                             f"event_id={event_id}: {trace_exc}"
-                         )
-             except Exception as exc:
-                 err = f"{type(exc).__name__}: {exc}"
+             except Exception as ledger_exc:
                  log.warning(
-                     f"db upsert failed queue_id={queue_id} attempts={attempts}: {err}"
+                     f"ledger insert failed queue_id={queue_id} "
+                     f"event_id={event_id}: {ledger_exc}"
                  )
-                 if attempts >= MAX_ATTEMPTS:
-                     mark_failed(conn, queue_id, err)
-                 else:
-                     release_claim(conn, queue_id, err)
+         return True
+     except Exception as exc:
+         err = f"{type(exc).__name__}: {exc}"
+         log.warning(
+             f"db upsert failed queue_id={queue_id} attempts={attempts}: {err}"
+         )
+         if _attempts_exhausted(attempts):
+             mark_failed(conn, queue_id, err)
+         else:
+             release_claim(conn, queue_id, err)
+         return False
 
 
  async def amain():
@@ -1951,6 +2296,18 @@ async def amain():
          f"output_mode={DISTILL_OUTPUT_MODE}, "
          f"prompt_hash={SYSTEM_PROMPT_HASH})"
      )
+     if CASCADE_ENABLED:
+         log.info(
+             f"cascade ENABLED — student-primary "
+             f"(model={STUDENT_MODEL}, endpoint={STUDENT_ENDPOINT or '(unset!)'}, "
+             f"sample_rate={STUDENT_SAMPLE_RATE}, "
+             f"high_value={sorted(HIGH_VALUE_CATEGORIES)})"
+         )
+         if not STUDENT_ENDPOINT:
+             log.warning(
+                 "CASCADE_ENABLED but STUDENT_ENDPOINT unset — falling back to "
+                 "teacher-only (no student call)."
+             )
      stub_mode = not LLM_ENDPOINT
      if stub_mode:
          log.warning("LLM_ENDPOINT not set — running in stub mode (no extraction).")
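The student-primary flow in `_process_cascade` above amounts to: run the student over every event under a concurrency bound, then split the events into a student-handled set and an escalated set. The following is a minimal sketch of that split, not the worker's code. `call_student`, `should_escalate`, `cascade`, the `CONCURRENCY` and `SAMPLE_RATE` values and the `HIGH_VALUE` set are all hypothetical, and `should_escalate` mirrors only the two gates visible in this diff (high-value class and random sample).

```python
from __future__ import annotations

import asyncio
import random

CONCURRENCY = 4            # stands in for CONCURRENT_LLM_CALLS (assumed value)
SAMPLE_RATE = 0.0          # stands in for STUDENT_SAMPLE_RATE; 0 disables sampling
HIGH_VALUE = {"contract"}  # hypothetical high-value category set


async def call_student(event: dict) -> dict:
    """Hypothetical student call: returns one fact echoing the event's category."""
    await asyncio.sleep(0)
    return {"facts": [{"category": event["category"]}]}


def should_escalate(result: dict) -> tuple[bool, str | None]:
    """Only the two gates shown in the diff; any earlier gate is omitted here."""
    for f in result.get("facts") or []:
        if (f.get("category") or "").lower() in HIGH_VALUE:
            return True, "high_value_class"
    if SAMPLE_RATE > 0 and random.random() < SAMPLE_RATE:
        return True, "random_sample"
    return False, None


async def cascade(events: list[dict]) -> tuple[list[dict], dict[int, str]]:
    sem = asyncio.Semaphore(CONCURRENCY)

    async def run(ev: dict) -> tuple[dict, dict]:
        async with sem:  # at most CONCURRENCY student calls in flight
            return ev, await call_student(ev)

    outcomes = await asyncio.gather(*(run(e) for e in events))

    student_handled: list[dict] = []
    reasons: dict[int, str] = {}
    for ev, res in outcomes:
        escalate, reason = should_escalate(res)
        if escalate:
            reasons[ev["id"]] = reason or "escalated"  # goes to the teacher
        else:
            student_handled.append(ev)                 # student output is written
    return student_handled, reasons


events = [{"id": 1, "category": "chitchat"}, {"id": 2, "category": "Contract"}]
handled, escalated = asyncio.run(cascade(events))
print([e["id"] for e in handled], escalated)
```

Each event lands in exactly one of the two collections, which is the "student XOR teacher" property the comments in the diff rely on.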
@@ -0,0 +1,40 @@
+ -- 010_distillation_ledger.sql — student→teacher cascade audit ledger (#99)
+ --
+ -- The cascade runs the cheap fine-tuned student as the primary distiller and
+ -- escalates a gated/sampled subset to the 27B teacher. Graph writes are
+ -- DISJOINT per event (student XOR teacher), so we never supersede in the
+ -- accretion store — but we still need to know WHICH producer wrote each
+ -- event's rows, both for monitoring (escalation rate, agreement) and for the
+ -- "mop up" path (find + re-distill any student rows later). This ledger is
+ -- that record: one row per (event, producer).
+ --
+ -- It is intentionally OUTSIDE the entities/facts/relationships tables — no
+ -- column added to the load-bearing graph tables — so the cascade is fully
+ -- reversible: drop the flag and this table is simply no longer written. Rows
+ -- are best-effort (the worker wraps the insert in try/except, exactly like
+ -- distillation_traces) and never gate the upsert path.
+ --
+ -- Disjoint writes mean normally one producer per event. A gate-fail re-distill
+ -- (mop-up: teacher re-runs an event the student first handled) legitimately
+ -- adds a second producer row — the PK(event_id, producer) permits that and the
+ -- ON CONFLICT refreshes the existing row.
+
+ CREATE TABLE IF NOT EXISTS event_distillations (
+     event_id text NOT NULL,
+     producer text NOT NULL, -- 'student' | 'teacher'
+     llm_model text NOT NULL,
+     system_prompt_hash text,
+     escalated boolean, -- NULL when cascade disabled (pure teacher)
+     escalate_reason text, -- why this event went to the teacher
+     distilled_at timestamptz NOT NULL DEFAULT now(),
+     PRIMARY KEY (event_id, producer)
+ );
+
+ -- Monitoring: escalation rate / producer mix over time.
+ CREATE INDEX IF NOT EXISTS idx_event_distillations_producer_time
+     ON event_distillations (producer, distilled_at DESC);
+
+ -- Active learning + mop-up: find the events a given gate escalated.
+ CREATE INDEX IF NOT EXISTS idx_event_distillations_reason
+     ON event_distillations (escalate_reason)
+     WHERE escalate_reason IS NOT NULL;
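The composite primary key and the `ON CONFLICT` refresh described in the migration comments can be tried locally. The sketch below substitutes SQLite (via Python's standard `sqlite3` module, SQLite 3.24 or newer for upsert support) for PostgreSQL, so the types and placeholders differ from the migration. The `record` helper, the model names and the `"mop_up"` reason are hypothetical, and the `system_prompt_hash` and `distilled_at` columns are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE event_distillations (
        event_id        TEXT NOT NULL,
        producer        TEXT NOT NULL,   -- 'student' | 'teacher'
        llm_model       TEXT NOT NULL,
        escalated       INTEGER,         -- SQLite has no boolean type
        escalate_reason TEXT,
        PRIMARY KEY (event_id, producer)
    )
    """
)

UPSERT = """
    INSERT INTO event_distillations (
        event_id, producer, llm_model, escalated, escalate_reason
    ) VALUES (?, ?, ?, ?, ?)
    ON CONFLICT (event_id, producer) DO UPDATE SET
        llm_model = excluded.llm_model,
        escalated = excluded.escalated,
        escalate_reason = excluded.escalate_reason
"""


def record(event_id, producer, model, escalated, reason):
    """Hypothetical helper: one ledger write per (event, producer)."""
    conn.execute(UPSERT, (event_id, producer, model, escalated, reason))


record("e1", "student", "student-model", False, None)
record("e2", "teacher", "teacher-model", True, "high_value_class")
# Same (event_id, producer) again: the existing row is refreshed, not duplicated.
record("e2", "teacher", "teacher-model", True, "random_sample")
# A later teacher re-run of e1 adds a second producer row for that event.
record("e1", "teacher", "teacher-model", True, "mop_up")

rows = conn.execute(
    "SELECT event_id, producer, escalate_reason "
    "FROM event_distillations ORDER BY event_id, producer"
).fetchall()
for row in rows:
    print(row)
```

Four writes leave three rows: the repeated teacher write for `e2` overwrites its reason, while `e1` ends up with one row per producer, as the migration's closing comment describes.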