agentfootprint 6.28.1 → 6.29.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (38)
  1. package/README.md +69 -4
  2. package/dist/esm/lib/context-bisect/index.js +2 -0
  3. package/dist/esm/lib/context-bisect/index.js.map +1 -1
  4. package/dist/esm/lib/context-bisect/missingContext.js +63 -0
  5. package/dist/esm/lib/context-bisect/missingContext.js.map +1 -0
  6. package/dist/esm/lib/influence-core/attributability.js +136 -0
  7. package/dist/esm/lib/influence-core/attributability.js.map +1 -0
  8. package/dist/esm/lib/influence-core/index.js +2 -1
  9. package/dist/esm/lib/influence-core/index.js.map +1 -1
  10. package/dist/esm/lib/influence-core/types.js +27 -0
  11. package/dist/esm/lib/influence-core/types.js.map +1 -1
  12. package/dist/esm/observe.js +2 -2
  13. package/dist/esm/observe.js.map +1 -1
  14. package/dist/lib/context-bisect/index.js +4 -1
  15. package/dist/lib/context-bisect/index.js.map +1 -1
  16. package/dist/lib/context-bisect/missingContext.js +67 -0
  17. package/dist/lib/context-bisect/missingContext.js.map +1 -0
  18. package/dist/lib/influence-core/attributability.js +142 -0
  19. package/dist/lib/influence-core/attributability.js.map +1 -0
  20. package/dist/lib/influence-core/index.js +8 -1
  21. package/dist/lib/influence-core/index.js.map +1 -1
  22. package/dist/lib/influence-core/types.js +28 -1
  23. package/dist/lib/influence-core/types.js.map +1 -1
  24. package/dist/observe.js +9 -2
  25. package/dist/observe.js.map +1 -1
  26. package/dist/types/lib/context-bisect/index.d.ts +1 -0
  27. package/dist/types/lib/context-bisect/index.d.ts.map +1 -1
  28. package/dist/types/lib/context-bisect/missingContext.d.ts +72 -0
  29. package/dist/types/lib/context-bisect/missingContext.d.ts.map +1 -0
  30. package/dist/types/lib/influence-core/attributability.d.ts +73 -0
  31. package/dist/types/lib/influence-core/attributability.d.ts.map +1 -0
  32. package/dist/types/lib/influence-core/index.d.ts +3 -2
  33. package/dist/types/lib/influence-core/index.d.ts.map +1 -1
  34. package/dist/types/lib/influence-core/types.d.ts +95 -0
  35. package/dist/types/lib/influence-core/types.d.ts.map +1 -1
  36. package/dist/types/observe.d.ts +2 -2
  37. package/dist/types/observe.d.ts.map +1 -1
  38. package/package.json +1 -1
package/README.md CHANGED
@@ -107,6 +107,16 @@ A: verifyAuditBundle → valid: false, brokenAt: #16 — the tampered record, na
 
  And you don't have to read the trace yourself — **we provide the tools for an LLM to track it for you**: the trace toolpack let a debugger model find a planted bug while reading **9.5% of the trace** ([guide](docs/guides/trace-debugging.md)).
 
+ **And all this watching costs the run nothing.** Your agent *is* the event loop: a stage runs on the call stack and feeds its trace events into a queue; in the idle beat the dispatcher delivers them to your listeners and files them in trace memory — **one beat behind**, never blocking the hot path:
+
+ <p align="center">
+ <picture>
+ <source media="(prefers-color-scheme: dark)" srcset="docs/assets/event-loop-dark.svg">
+ <source media="(prefers-color-scheme: light)" srcset="docs/assets/event-loop-light.svg">
+ <img alt="Your agent is the event loop — animated. Left: your agent code (Context, Call LLM, Tool Calls) looping turn after turn. Right: the JS event loop drawn as two bold curved arrows with a traveling cursor and two stops — the call stack, where each stage runs as a frame and feeds four trace events (structure, data, control, emit) into the trace queue at the loop's center; and idle time, where the dispatcher flies the queued events into TRACE MEMORY and every listener (onStageAdded, onCommit, onDecision, onEmit) receives every event, one beat behind. Grey is JavaScript's own machinery, green is footprintjs, colors are your code and its trace." src="docs/assets/event-loop-light.svg" width="100%"/>
+ </picture>
+ </p>
+
  ## One contextual error, walked end to end
 
  The third question above, in full — every value below is the captured output of
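Reviewer note on the hunk above: the added README paragraph says trace events are queued on the hot path and delivered to listeners one beat later. A minimal TypeScript sketch of that deferred-dispatch idea follows. `TraceQueue`, `emit`, `drain`, and the injected `schedule` callback are hypothetical names chosen for illustration, and nothing here is taken from the package's dispatcher, which this diff does not show.

```typescript
// Hypothetical sketch of "one beat behind" dispatch. Not agentfootprint's implementation.
type TraceEvent = { kind: 'structure' | 'data' | 'control' | 'emit'; stageId: string };
type Listener = (event: TraceEvent) => void;
type Schedule = (task: () => void) => void;

class TraceQueue {
  readonly memory: TraceEvent[] = []; // delivered events, in emit order
  private pending: TraceEvent[] = [];
  private listeners: Listener[] = [];
  private scheduled = false;
  private schedule: Schedule;

  constructor(schedule: Schedule) {
    this.schedule = schedule; // e.g. a wrapper around setImmediate or setTimeout
  }

  on(listener: Listener): void {
    this.listeners.push(listener);
  }

  // Hot path: an array push and at most one scheduling call. Listeners never run here.
  emit(event: TraceEvent): void {
    this.pending.push(event);
    if (!this.scheduled) {
      this.scheduled = true;
      this.schedule(() => this.drain());
    }
  }

  // Idle beat: hand every queued event to every listener, then file it in memory.
  private drain(): void {
    this.scheduled = false;
    const batch = this.pending;
    this.pending = [];
    for (const event of batch) {
      for (const listener of this.listeners) listener(event);
      this.memory.push(event);
    }
  }
}

// Usage: idle beats are collected in an array so the example runs deterministically.
const idleBeats: Array<() => void> = [];
const queue = new TraceQueue((task) => {
  idleBeats.push(task);
});
const seen: string[] = [];
queue.on((e) => seen.push(`${e.kind}:${e.stageId}`));

queue.emit({ kind: 'structure', stageId: 'call-llm#1' });
queue.emit({ kind: 'data', stageId: 'call-llm#1' });
const seenOnHotPath = seen.length; // nothing delivered while the stage is still running

idleBeats.forEach((beat) => beat()); // the idle beat arrives; listeners catch up
```

Injecting the scheduler keeps the sketch independent of any runtime timer API. In a real process the same class could be given a timer-based scheduler instead of the array used here.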
@@ -140,6 +150,36 @@ flips APPROVED → DECLINED in **3/3 seeded reruns**; the benign style fact and
  the lookup tool come back not-confirmed, 0/3. Scores are proxies; only the
  ablation verdict makes a causal claim — the report says so itself.
 
+ **And when the proxy *can't* rank — it says so.** Output-similarity scoring is
+ structurally blind to **absence/crowding** bugs (a key instruction truncated out
+ of the window, context diluted by filler): the culprit doesn't resemble the
+ answer, so it ranks low under an innocent. `rankingConfidence` is the honesty
+ marker for that — when no source clearly wins, it returns a **shortlist to
+ confirm by ablation** instead of a confident, wrong #1:
+
+ ```typescript
+ import { rankingConfidence, ratioStrategy } from 'agentfootprint/observe';
+
+ const c = rankingConfidence(scores); // over a scoreInfluence() result
+ if (!c.clearWinner) ablate(c.shortlist); // too flat to trust → escalate to truth
+ // decisiveness rule is pluggable: marginStrategy (default) · ratioStrategy
+ // (scale-invariant) · bring-your-own. See docs/guides/ranking-confidence.md
+ ```
+
+ **Three interfaces, one for each shape of the bug** — ship-a-default, bring-your-own:
+
+ | interface | finds the culprit when it is… | confirm by |
+ |---|---|---|
+ | **influence ranking** (`scoreInfluence` + `rankingConfidence`) | **present** — orders suspects, says when it can't | — |
+ | **ablation** (`localizeContextBug`) | **present** — *remove* it, see the outcome flip | removal |
+ | **missing-context finder** (`findDroppedContext`) | **absent** — available but never reached the model (`available − sent`) | restoration |
+
+ The third closes the gap the first two are blind to — a key instruction truncated
+ out of the window has nothing to ablate. `findDroppedContext` is a cheap, exact id
+ diff (no embeddings, no LLM); confirm by *restoration* — add the dropped unit back,
+ see if the outcome flips. [Guide](docs/guides/missing-context.md) ·
+ [example](examples/observability/10-missing-context.ts).
+
  **The same walk, visual.** One call serializes the report for
  [AgentThinkingUI](https://github.com/footprintjs/agentThinkingUI)'s
  `<BacktrackView>` — the "why?" board, triggerable from **any** decision point
@@ -174,7 +214,7 @@ board is a `runtimeStageId` a debugger LLM can drill with the
  | 🔧 Building an agent? | 🐛 Agent misbehaving? | 🏛️ Need audit / compliance? |
  |---|---|---|
  | Typed agents with skills, steering, RAG, memory, guardrails — and the trace for free. | Lint your tool catalog in 5 minutes — works on **any** framework's tool list (plain JSON / MCP / OpenAI / Anthropic shapes). Then causal slices, context bisection, and the debugger-LLM toolpack. | Hash-chained, tamper-evident run records with an offline verifier — record-keeping in the EU-AI-Act shape. |
- | [→ Quick start](#quick-start--runs-offline-no-api-key) | [→ Tool-catalog lint](docs/guides/tool-catalog-lint.md) · [→ Trace debugging](docs/guides/trace-debugging.md) | [→ Tamper-evident audit](docs/guides/security.md) |
+ | [→ Quick start](#quick-start--runs-offline-no-api-key) · [→ Build ↓](#-build--design-your-agent-or-system-of-agents) | [→ Debug ↓](#-debug--see-what-your-agent-did) · [→ Tool-catalog lint](docs/guides/tool-catalog-lint.md) · [→ Trace debugging](docs/guides/trace-debugging.md) | [→ Audit ↓](#-audit--prove-what-happened) · [→ Security guide](docs/guides/security.md) |
 
  ---
 
@@ -265,7 +305,7 @@ const pipeline = Sequence.create()
  await pipeline.run({ message: 'URGENT: refund dispute on order #4411' });
  ```
 
- The fourth primitive is `Loop` — `Loop.repeat(agent).until(guard).times(5)`, with a mandatory budget guard. And the named patterns from the research literature ship pre-composed from the same four: `selfConsistency` · `reflection` · `debate` · `mapReduce` · `tot` · `swarm`. Because every composition is a flowchart, the structure you wrote is the structure you see in the UI — and the trace spans the whole pipeline, not one agent at a time. [Designing systems of agents ↓](#how-do-i-design-my-agent-or-system-of-agents)
+ The fourth primitive is `Loop` — `Loop.repeat(agent).until(guard).times(5)`, with a mandatory budget guard. And the named patterns from the research literature ship pre-composed from the same four: `selfConsistency` · `reflection` · `debate` · `mapReduce` · `tot` · `swarm`. Because every composition is a flowchart, the structure you wrote is the structure you see in the UI — and the trace spans the whole pipeline, not one agent at a time. [Designing systems of agents ↓](#-build--design-your-agent-or-system-of-agents)
 
  ---
 
@@ -348,7 +388,7 @@ So we used the budget those abstractions would have cost us to invest deeply in
 
  ---
 
- ## How do I design my agent or system of agents?
+ ## 🔧 Build — design your agent or system of agents
 
  Two scales — same alphabet. Four control flows are the entire vocabulary.
 
@@ -510,7 +550,7 @@ Same trick as the injection model: instead of N libraries for N patterns, we fou
 
  ---
 
- ## How do I see what my agent did?
+ ## 🐛 Debug — see what your agent did
 
  <p align="center">
  <img src="docs/assets/lens-run.png" alt="A real agent run in the Lens: the conversation (with live PII redaction), the executed path lit on the merge-tree flowchart, the WHAT-HAPPENED timeline of every iteration/context/LLM turn/route, run stats, and the step inspector — all generated from the run's own trace." width="100%">
@@ -618,6 +658,31 @@ off the hot path.
 
  ---
 
+ ## 🏛️ Audit — prove what happened
+
+ Answering *"why was the loan rejected?"* from captured evidence is the [debug door above](#-debug--see-what-your-agent-did). The audit door adds the integrity layer: prove the **record itself** hasn't been edited since capture. `auditExport()` hash-chains every typed event — decisions, tool calls, validation rejections, permission verdicts, costs — into an append-only bundle (EU AI Act Art. 12 record-keeping shape); `verifyAuditBundle()` re-checks it **offline** — no agent, no LLM — and names the exact record any tamper broke.
+
+ ```ts
+ import { auditExport, verifyAuditBundle } from 'agentfootprint/observability-providers';
+
+ const audit = auditExport({ agent: 'ledger-auditor' });
+ const stop = agent.enable.observability({ strategy: audit });
+ await agent.run({ message: 'audit account ACCT-1142' });
+ stop();
+
+ const bundle = audit.bundle(); // plain JSON — store anywhere
+ verifyAuditBundle(bundle); // { valid: true, recordsChecked: 50 }
+ // flip one byte anywhere → { valid: false, brokenAt: 13, reason: 'hash mismatch — …' }
+ ```
+
+ Payloads are PII-bounded by default (tool args as key names, results as a type, content as `[N chars]` markers). And it's honest about its limits: tamper-**evident**, not tamper-proof — for non-repudiation, anchor both chain ends in external storage (WORM store, signed log).
+
+ > 📖 **[Tamper-evident audit guide](docs/guides/security.md#tamper-evident-audit-export--auditexport--verifyauditbundle)** ·
+ > [`examples/features/19-audit-export.ts`](examples/features/19-audit-export.ts) — capture → verify → tamper → drain ·
+ > [`20-regulated-decisioning.ts`](examples/features/20-regulated-decisioning.ts) — an offline auditor reconstructs a loan decline from persisted files, both chain ends anchored
+
+ ---
+
  ## Mocks first, production second
 
  Build the entire app against in-memory mocks with **zero API cost**, then swap real infrastructure one boundary at a time.
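Reviewer note on the Audit section added in the hunk above: the README describes hash-chaining events into an append-only bundle and verifying it offline. The sketch below shows the general hash-chain technique in TypeScript using Node's `crypto` module. The record shape (`ChainedRecord`), the digest layout, and the names `appendRecord` and `verifyChain` are assumptions made for illustration. The real bundle format and the internals of `auditExport` / `verifyAuditBundle` are not visible in this diff.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical record shape. The package's actual bundle format is not shown in this diff.
interface ChainedRecord {
  index: number;
  payload: string;
  prevHash: string;
  hash: string;
}

type Verdict =
  | { valid: true; recordsChecked: number }
  | { valid: false; brokenAt: number; reason: string };

const GENESIS = '0'.repeat(64);

// Each hash covers the record's position, its payload, and the previous record's hash.
const digest = (index: number, payload: string, prevHash: string): string =>
  createHash('sha256').update(`${index}\n${prevHash}\n${payload}`).digest('hex');

function appendRecord(chain: ChainedRecord[], payload: string): void {
  const last = chain[chain.length - 1];
  const prevHash = last ? last.hash : GENESIS;
  const index = chain.length;
  chain.push({ index, payload, prevHash, hash: digest(index, payload, prevHash) });
}

// Offline check: recompute every link and report the first record that fails.
function verifyChain(chain: readonly ChainedRecord[]): Verdict {
  let prevHash = GENESIS;
  for (const record of chain) {
    if (record.prevHash !== prevHash) {
      return { valid: false, brokenAt: record.index, reason: 'broken link' };
    }
    if (record.hash !== digest(record.index, record.payload, record.prevHash)) {
      return { valid: false, brokenAt: record.index, reason: 'hash mismatch' };
    }
    prevHash = record.hash;
  }
  return { valid: true, recordsChecked: chain.length };
}

// Usage: build a three-record chain, then edit the middle payload without re-hashing.
const chain: ChainedRecord[] = [];
for (const payload of ['decision: DECLINED', 'tool: lookup', 'cost: 0.002']) {
  appendRecord(chain, payload);
}
const clean = verifyChain(chain);
const tampered = chain.map((r, i) => (i === 1 ? { ...r, payload: 'tool: lookup (edited)' } : { ...r }));
const broken = verifyChain(tampered);
```

A chain like this only detects edits; anyone able to rewrite every record can recompute every hash. That matches the README's own caveat that the export is tamper-evident rather than tamper-proof, and its advice to anchor the chain ends in external storage.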
@@ -15,6 +15,8 @@
   * claims; slice completeness is bounded by tracking — and says so.
   */
  export { llmEdgeWeigher, stepOutputText, } from './llmEdgeWeigher.js';
+ // Interface #3 — missing-context finder (available − sent; confirm by restoration).
+ export { findDroppedContext, } from './missingContext.js';
  export { defaultSuspectClassifier, formatContextBugReport, llmCallIdsFromEvents, localizeContextBug, suspectLabel, } from './localize.js';
  export { toBacktrackTrace, } from './toBacktrackTrace.js';
  export { ablationForSuspect, applyAblations, defaultOutcomeComparator, probeFlipped, runAblationProbe, verdictFor, } from './ablation.js';
@@ -1 +1 @@
- {"version":3,"file":"index.js","sourceRoot":"","sources":["../../../../src/lib/context-bisect/index.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;GAeG;AAEH,OAAO,EACL,cAAc,EACd,cAAc,GAIf,MAAM,qBAAqB,CAAC;AAE7B,OAAO,EACL,wBAAwB,EACxB,sBAAsB,EACtB,oBAAoB,EACpB,kBAAkB,EAClB,YAAY,GAKb,MAAM,eAAe,CAAC;AAEvB,OAAO,EACL,gBAAgB,GAOjB,MAAM,uBAAuB,CAAC;AAE/B,OAAO,EACL,kBAAkB,EAClB,cAAc,EACd,wBAAwB,EACxB,YAAY,EACZ,gBAAgB,EAChB,UAAU,GAGX,MAAM,eAAe,CAAC;AAEvB,OAAO,EACL,cAAc,GAIf,MAAM,aAAa,CAAC;AAErB,OAAO,EACL,uBAAuB,GAoBxB,MAAM,YAAY,CAAC"}
+ {"version":3,"file":"index.js","sourceRoot":"","sources":["../../../../src/lib/context-bisect/index.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;GAeG;AAEH,OAAO,EACL,cAAc,EACd,cAAc,GAIf,MAAM,qBAAqB,CAAC;AAE7B,oFAAoF;AACpF,OAAO,EACL,kBAAkB,GAInB,MAAM,qBAAqB,CAAC;AAE7B,OAAO,EACL,wBAAwB,EACxB,sBAAsB,EACtB,oBAAoB,EACpB,kBAAkB,EAClB,YAAY,GAKb,MAAM,eAAe,CAAC;AAEvB,OAAO,EACL,gBAAgB,GAOjB,MAAM,uBAAuB,CAAC;AAE/B,OAAO,EACL,kBAAkB,EAClB,cAAc,EACd,wBAAwB,EACxB,YAAY,EACZ,gBAAgB,EAChB,UAAU,GAGX,MAAM,eAAe,CAAC;AAEvB,OAAO,EACL,cAAc,GAIf,MAAM,aAAa,CAAC;AAErB,OAAO,EACL,uBAAuB,GAoBxB,MAAM,YAAY,CAAC"}
@@ -0,0 +1,63 @@
+ /**
+  * missingContext — interface #3: find context that was AVAILABLE but never
+  * reached the model (RFC-003).
+  *
+  * The localizer's influence ranking (#1) + ablation (#2) handle culprits that
+  * are PRESENT in the context. They are blind to the opposite failure: a needed
+  * unit that was *dropped* — truncated out of the window, or never selected —
+  * so the model never saw it. You cannot ablate what isn't there.
+  *
+  * This finder is the cheap, exact, deterministic half of that case: a SET
+  * DIFFERENCE over unit ids. The library tracks context as identified units
+  * (each injection / memory entry / tool result has a stable id), so "what got
+  * dropped" is `available − sent` — no embeddings, no LLM, O(n).
+  *
+  * Causal confirmation is the MIRROR of ablation: RESTORATION. Add a dropped
+  * unit back, re-run, and an outcome flip is the causal proof. Like ablation,
+  * the re-run is consumer-supplied (the library doesn't own your agent loop);
+  * see `findDroppedContext` docs + example 10 for the pattern.
+  *
+  * Honest claim: a dropped unit is a CANDIDATE missing-context culprit, never a
+  * confirmed cause — most dropped context is correctly dropped. Only restoration
+  * makes a causal claim.
+  */
+ /**
+  * Find context that was available for a turn but never reached the model —
+  * `available − sent` by id. Pure, deterministic, O(n); no model or embedder.
+  *
+  * Ids are assumed stable and unique per side (duplicates are de-duplicated,
+  * first occurrence wins). Units in `sent` but not `available` are ignored.
+  *
+  * Confirm a candidate causally by RESTORATION (the mirror of ablation): add the
+  * dropped unit back into the context and re-run; an outcome flip is the proof.
+  *
+  * @example
+  * const { dropped, anyDropped } = findDroppedContext(assembled, sentToModel);
+  * if (anyDropped) {
+  * for (const unit of dropped) {
+  * if (await rerunWith(unit).outcomeFlips()) report(unit); // restoration = causal
+  * }
+  * }
+  */
+ export function findDroppedContext(available, sent) {
+     const sentIds = new Set();
+     for (const u of sent)
+         sentIds.add(u.id);
+     const dropped = [];
+     const seenAvailable = new Set();
+     for (const u of available) {
+         if (seenAvailable.has(u.id))
+             continue; // de-dup by id, first wins
+         seenAvailable.add(u.id);
+         if (!sentIds.has(u.id))
+             dropped.push(u.content === undefined ? { id: u.id } : { id: u.id, content: u.content });
+     }
+     const availableCount = seenAvailable.size;
+     const sentCount = sentIds.size;
+     const anyDropped = dropped.length > 0;
+     const reason = anyDropped
+         ? `${dropped.length} of ${availableCount} available unit(s) never reached the model — candidate(s) for a missing-context bug (truncation / dilution). Confirm by RESTORATION: add a unit back and re-run; an outcome flip is the causal proof (mirror of ablation). Most dropped context is correctly dropped — only restoration confirms.`
+         : `All ${availableCount} available unit(s) reached the model — no missing-context bug here (nothing was dropped).`;
+     return { dropped, availableCount, sentCount, anyDropped, reason };
+ }
+ //# sourceMappingURL=missingContext.js.map
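Reviewer note on `missingContext.js`: the file's comments describe a two-step pattern, an id set difference to list candidates followed by restoration to confirm one. The TypeScript sketch below works through that pattern on a toy input. `findDropped` is a local, typed re-statement of the set difference so the example runs on its own, and `runAgent` is a hypothetical synchronous stand-in for the consumer-supplied re-run (a real one would call a model and be asynchronous). The unit ids and outcomes are invented for illustration.

```typescript
interface ContextUnit {
  id: string;
  content?: string;
}

// Local re-statement of `available − sent` by id (first occurrence wins on duplicates).
function findDropped(available: readonly ContextUnit[], sent: readonly ContextUnit[]): ContextUnit[] {
  const sentIds = new Set(sent.map((u) => u.id));
  const seen = new Set<string>();
  const dropped: ContextUnit[] = [];
  for (const unit of available) {
    if (seen.has(unit.id)) continue;
    seen.add(unit.id);
    if (!sentIds.has(unit.id)) dropped.push(unit);
  }
  return dropped;
}

// Hypothetical re-run: this toy agent declines only when the policy unit is in context.
type Outcome = 'APPROVED' | 'DECLINED';
const runAgent = (context: readonly ContextUnit[]): Outcome =>
  context.some((u) => u.id === 'policy:income-check') ? 'DECLINED' : 'APPROVED';

const available: ContextUnit[] = [
  { id: 'policy:income-check', content: 'Decline if income is unverified.' },
  { id: 'style:tone', content: 'Be concise.' },
  { id: 'memory:greeting' },
  { id: 'style:tone', content: 'Be concise.' }, // duplicate id, ignored
];
const sent: ContextUnit[] = [{ id: 'memory:greeting' }];

const baseline = runAgent(sent);
const candidates = findDropped(available, sent).map((u) => u.id);

// Restoration: add one dropped unit back at a time and keep those that flip the outcome.
const confirmed = findDropped(available, sent)
  .filter((unit) => runAgent([...sent, unit]) !== baseline)
  .map((unit) => unit.id);
```

Both dropped units are candidates, but only the policy unit flips the outcome when restored, which is the distinction the file's "Honest claim" paragraph draws between a candidate and a confirmed cause.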
@@ -0,0 +1 @@
+ {"version":3,"file":"missingContext.js","sourceRoot":"","sources":["../../../../src/lib/context-bisect/missingContext.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;;;;;;;;GAsBG;AAkCH;;;;;;;;;;;;;;;;;GAiBG;AACH,MAAM,UAAU,kBAAkB,CAChC,SAAiC,EACjC,IAA4B;IAE5B,MAAM,OAAO,GAAG,IAAI,GAAG,EAAU,CAAC;IAClC,KAAK,MAAM,CAAC,IAAI,IAAI;QAAE,OAAO,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,CAAC,CAAC;IAExC,MAAM,OAAO,GAAkB,EAAE,CAAC;IAClC,MAAM,aAAa,GAAG,IAAI,GAAG,EAAU,CAAC;IACxC,KAAK,MAAM,CAAC,IAAI,SAAS,EAAE,CAAC;QAC1B,IAAI,aAAa,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,CAAC;YAAE,SAAS,CAAC,2BAA2B;QAClE,aAAa,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,CAAC,CAAC;QACxB,IAAI,CAAC,OAAO,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,CAAC;YAAE,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,OAAO,KAAK,SAAS,CAAC,CAAC,CAAC,EAAE,EAAE,EAAE,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC,CAAC,EAAE,EAAE,EAAE,CAAC,CAAC,EAAE,EAAE,OAAO,EAAE,CAAC,CAAC,OAAO,EAAE,CAAC,CAAC;IAClH,CAAC;IAED,MAAM,cAAc,GAAG,aAAa,CAAC,IAAI,CAAC;IAC1C,MAAM,SAAS,GAAG,OAAO,CAAC,IAAI,CAAC;IAC/B,MAAM,UAAU,GAAG,OAAO,CAAC,MAAM,GAAG,CAAC,CAAC;IACtC,MAAM,MAAM,GAAG,UAAU;QACvB,CAAC,CAAC,GAAG,OAAO,CAAC,MAAM,OAAO,cAAc,mSAAmS;QAC3U,CAAC,CAAC,OAAO,cAAc,2FAA2F,CAAC;IAErH,OAAO,EAAE,OAAO,EAAE,cAAc,EAAE,SAAS,EAAE,UAAU,EAAE,MAAM,EAAE,CAAC;AACpE,CAAC"}
@@ -0,0 +1,136 @@
+ import { DEFAULT_CLEAR_WINNER_MARGIN, DEFAULT_CLEAR_WINNER_RATIO, DEFAULT_SHORTLIST_BAND } from './types.js';
+ const nonNegative = (label, x) => {
+     // `!(x >= 0)` rejects negatives AND NaN (a plain `< 0` would let NaN through).
+     if (!(x >= 0))
+         throw new Error(`${label} must be >= 0 (got ${x})`);
+     return x;
+ };
+ /**
+  * Default strategy: ABSOLUTE top-2 gap `s0 − s1 >= threshold`. Simple and
+  * interpretable, but embedder-relative (the gap scale depends on the embedding
+  * geometry). Use `ratioStrategy` for cross-embedder transfer.
+  */
+ export function marginStrategy(threshold = DEFAULT_CLEAR_WINNER_MARGIN) {
+     nonNegative('marginStrategy: threshold', threshold);
+     return {
+         name: `margin>=${threshold}`,
+         isClearWinner: (s) => s.length >= 2 && s[0] - s[1] >= threshold,
+     };
+ }
+ /**
+  * Scale-invariant strategy: top-2 gap as a FRACTION of the top score,
+  * `(s0 − s1) / |s0| >= threshold`. Transfers across embedders / answer lengths
+  * where the absolute margin does not. A zero (or all-equal) top is never a
+  * clear winner.
+  */
+ export function ratioStrategy(threshold = DEFAULT_CLEAR_WINNER_RATIO) {
+     nonNegative('ratioStrategy: threshold', threshold);
+     return {
+         name: `ratio>=${threshold}`,
+         isClearWinner: (s) => {
+             if (s.length < 2)
+                 return false;
+             const denom = Math.abs(s[0]);
+             if (denom === 0)
+                 return false; // flat at zero → no clear winner (avoid div-by-zero)
+             return (s[0] - s[1]) / denom >= threshold;
+         },
+     };
+ }
+ /** Finite score, or −Infinity for a malformed (NaN/+Infinity/−Infinity) one —
+  * so a bad embedder degrades that item to "ranked last", never corrupts the
+  * ordering. Note +Infinity is demoted too: a meaningless score is never a win. */
+ const finiteScore = (s) => (Number.isFinite(s.score) ? s.score : -Infinity);
+ /** Total, NaN-free comparator (descending) — correctness does not rest on the
+  * engine's handling of a NaN comparator return for the all-malformed case. */
+ const byScoreDesc = (a, b) => {
+     const x = finiteScore(a);
+     const y = finiteScore(b);
+     return x > y ? -1 : x < y ? 1 : 0;
+ };
+ /**
+  * Assess whether an influence ranking has a clear winner to trust as a lead,
+  * or is too close to call and should be confirmed by ablation.
+  *
+  * Guarantees (relied on by the localizer): the returned `shortlist` always
+  * contains `lead` when there is one, and — when there is NO clear winner and
+  * there are ≥2 suspects — always contains the runner-up too (so ablation over
+  * the shortlist covers the real culprit even if it ranked below an innocent).
+  *
+  * @param scores `scoreInfluence` output (any order — re-sorted defensively).
+  * Ids are assumed unique (as `scoreInfluence` enforces); the
+  * shortlist is de-duplicated defensively regardless.
+  * @throws Error on negative or NaN options.
+  */
+ export function rankingConfidence(scores, options = {}) {
+     // strategy WINS over clearWinnerMargin; the default builds a margin strategy
+     // (which validates its own threshold).
+     const strategy = options.strategy ?? marginStrategy(options.clearWinnerMargin ?? DEFAULT_CLEAR_WINNER_MARGIN);
+     const shortlistBand = nonNegative('rankingConfidence: shortlistBand', options.shortlistBand ?? DEFAULT_SHORTLIST_BAND);
+     if (scores.length === 0) {
+         return { clearWinner: false, margin: undefined, lead: undefined, shortlist: [], reason: 'No suspects to rank.' };
+     }
+     const ranked = [...scores].sort(byScoreDesc);
+     const top = ranked[0];
+     const topScore = finiteScore(top);
+     if (ranked.length === 1) {
+         return {
+             clearWinner: true,
+             margin: undefined,
+             lead: top.id,
+             shortlist: [top.id],
+             reason: `Only one suspect "${top.id}" — clear by default (nothing to compare against); confirm by ablation for a causal claim.`,
+         };
+     }
+     const secondScore = finiteScore(ranked[1]);
+     // Clear winner, robust to malformed scores (framework invariants, NOT the
+     // strategy's concern):
+     // - top itself malformed (e.g. all-malformed) → no clear winner, no margin.
+     // - clean finite top, malformed runner-up → unambiguous lead → clear winner
+     // (the inverse of suppressing it); no meaningful finite gap to report.
+     // - both finite → the pluggable STRATEGY decides, over all finite scores.
+     let clearWinner;
+     let margin;
+     if (!Number.isFinite(topScore)) {
+         clearWinner = false;
+         margin = undefined;
+     }
+     else if (!Number.isFinite(secondScore)) {
+         clearWinner = true;
+         margin = undefined;
+     }
+     else {
+         margin = topScore - secondScore;
+         const finiteRanked = ranked.map(finiteScore).filter((x) => Number.isFinite(x));
+         clearWinner = strategy.isClearWinner(finiteRanked);
+     }
+     // Shortlist = the band of FINITE scores within shortlistBand of a finite top.
+     // Then enforce the guarantees: lead always present; when there is no clear
+     // winner with ≥2 suspects, the runner-up is present too.
+     const shortlist = [];
+     const seen = new Set();
+     const add = (id) => {
+         if (!seen.has(id)) {
+             seen.add(id);
+             shortlist.push(id);
+         }
+     };
+     if (Number.isFinite(topScore)) {
+         for (const s of ranked) {
+             const sc = finiteScore(s);
+             if (Number.isFinite(sc) && topScore - sc <= shortlistBand)
+                 add(s.id);
+         }
+     }
+     add(top.id); // guarantee: lead always in the shortlist
+     if (!clearWinner)
+         add(ranked[1].id); // guarantee: no-clear-winner shortlist covers the runner-up
+     const gap = margin === undefined ? 'n/a' : margin.toFixed(3);
+     const reason = clearWinner
+         ? margin === undefined
+             ? `Clear winner [${strategy.name}]: "${top.id}" leads clearly (runner-up score unavailable). A clear lead is a similarity PROXY, not a proven cause — confirm by ablation.`
+             : `Clear winner [${strategy.name}]: "${top.id}" leads (top-2 margin ${gap}). A clear lead is a similarity PROXY, not a proven cause — confirm by ablation.`
+         : `Too close to call [${strategy.name}]: top-2 margin ${gap} — no suspect stands out by output similarity. Double-check the ${shortlist.length} shortlisted suspect(s) by ABLATION. Similarity scoring is blind to absence/crowding bugs (history truncation, context dilution), where the culprit need not resemble the answer; a flat top can also mean genuinely co-equal sources.`;
+     return { clearWinner, margin, lead: top.id, shortlist, reason };
+ }
+ //# sourceMappingURL=attributability.js.map
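Reviewer note on `attributability.js`: the comments claim the ratio rule is scale-invariant where the absolute margin rule is not. The TypeScript sketch below checks that claim on one ranking expressed at two scales. `margin` and `ratio` are local re-statements of the two predicates from the file above, written so the example is self-contained; the score values are invented, and 0.05 matches the defaults added in `types.js`.

```typescript
type Strategy = {
  name: string;
  isClearWinner: (sortedDesc: readonly number[]) => boolean;
};

// Returns the top two scores, or undefined when fewer than two are present.
const topTwo = (s: readonly number[]): [number, number] | undefined => {
  const [top, second] = s;
  return top === undefined || second === undefined ? undefined : [top, second];
};

// Absolute rule: s0 - s1 >= threshold.
const margin = (threshold: number): Strategy => ({
  name: `margin>=${threshold}`,
  isClearWinner: (s) => {
    const pair = topTwo(s);
    return pair !== undefined && pair[0] - pair[1] >= threshold;
  },
});

// Relative rule: (s0 - s1) / |s0| >= threshold, with a zero top never winning.
const ratio = (threshold: number): Strategy => ({
  name: `ratio>=${threshold}`,
  isClearWinner: (s) => {
    const pair = topTwo(s);
    if (pair === undefined || pair[0] === 0) return false;
    return (pair[0] - pair[1]) / Math.abs(pair[0]) >= threshold;
  },
});

// The same ranking on two scales, as two different embedders might produce it.
const smallScale = [0.5, 0.47, 0.2];
const largeScale = [5.0, 4.7, 2.0];

const verdicts = {
  marginSmall: margin(0.05).isClearWinner(smallScale), // gap 0.03, below 0.05
  marginLarge: margin(0.05).isClearWinner(largeScale), // gap 0.3, above 0.05
  ratioSmall: ratio(0.05).isClearWinner(smallScale), // gap is about 6% of the top
  ratioLarge: ratio(0.05).isClearWinner(largeScale), // still about 6% of the top
};
```

The margin rule flips its verdict when the scores are multiplied by ten, while the ratio rule gives the same answer at both scales. That is the trade-off the file's comments describe when they recommend `ratioStrategy` for cross-embedder use.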
@@ -0,0 +1 @@
+ {"version":3,"file":"attributability.js","sourceRoot":"","sources":["../../../../src/lib/influence-core/attributability.ts"],"names":[],"mappings":"AA2BA,OAAO,EAAE,2BAA2B,EAAE,0BAA0B,EAAE,sBAAsB,EAAE,MAAM,YAAY,CAAC;AAE7G,MAAM,WAAW,GAAG,CAAC,KAAa,EAAE,CAAS,EAAU,EAAE;IACvD,+EAA+E;IAC/E,IAAI,CAAC,CAAC,CAAC,IAAI,CAAC,CAAC;QAAE,MAAM,IAAI,KAAK,CAAC,GAAG,KAAK,sBAAsB,CAAC,GAAG,CAAC,CAAC;IACnE,OAAO,CAAC,CAAC;AACX,CAAC,CAAC;AAEF;;;;GAIG;AACH,MAAM,UAAU,cAAc,CAAC,YAAoB,2BAA2B;IAC5E,WAAW,CAAC,2BAA2B,EAAE,SAAS,CAAC,CAAC;IACpD,OAAO;QACL,IAAI,EAAE,WAAW,SAAS,EAAE;QAC5B,aAAa,EAAE,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC,CAAC,MAAM,IAAI,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,IAAI,SAAS;KAChE,CAAC;AACJ,CAAC;AAED;;;;;GAKG;AACH,MAAM,UAAU,aAAa,CAAC,YAAoB,0BAA0B;IAC1E,WAAW,CAAC,0BAA0B,EAAE,SAAS,CAAC,CAAC;IACnD,OAAO;QACL,IAAI,EAAE,UAAU,SAAS,EAAE;QAC3B,aAAa,EAAE,CAAC,CAAC,EAAE,EAAE;YACnB,IAAI,CAAC,CAAC,MAAM,GAAG,CAAC;gBAAE,OAAO,KAAK,CAAC;YAC/B,MAAM,KAAK,GAAG,IAAI,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC;YAC7B,IAAI,KAAK,KAAK,CAAC;gBAAE,OAAO,KAAK,CAAC,CAAC,qDAAqD;YACpF,OAAO,CAAC,CAAC,CAAC,CAAC,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,CAAC,GAAG,KAAK,IAAI,SAAS,CAAC;QAC5C,CAAC;KACF,CAAC;AACJ,CAAC;AAoBD;;mFAEmF;AACnF,MAAM,WAAW,GAAG,CAAC,CAAiB,EAAU,EAAE,CAAC,CAAC,MAAM,CAAC,QAAQ,CAAC,CAAC,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC,QAAQ,CAAC,CAAC;AAEpG;+EAC+E;AAC/E,MAAM,WAAW,GAAG,CAAC,CAAiB,EAAE,CAAiB,EAAU,EAAE;IACnE,MAAM,CAAC,GAAG,WAAW,CAAC,CAAC,CAAC,CAAC;IACzB,MAAM,CAAC,GAAG,WAAW,CAAC,CAAC,CAAC,CAAC;IACzB,OAAO,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC,CAAC;AACpC,CAAC,CAAC;AAEF;;;;;;;;;;;;;GAaG;AACH,MAAM,UAAU,iBAAiB,CAC/B,MAAiC,EACjC,UAAoC,EAAE;IAEtC,6EAA6E;IAC7E,uCAAuC;IACvC,MAAM,QAAQ,GAAG,OAAO,CAAC,QAAQ,IAAI,cAAc,CAAC,OAAO,CAAC,iBAAiB,IAAI,2BAA2B,CAAC,CAAC;IAC9G,MAAM,aAAa,GAAG,WAAW,CAAC,kCAAkC,EAAE,OAAO,CAAC,aAAa,IAAI,sBAAsB,CAAC,CAAC;IAEvH,IAAI,MAAM,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;QACxB,OAAO,EAAE,WAAW,EAAE,KAAK,
EAAE,MAAM,EAAE,SAAS,EAAE,IAAI,EAAE,SAAS,EAAE,SAAS,EAAE,EAAE,EAAE,MAAM,EAAE,sBAAsB,EAAE,CAAC;IACnH,CAAC;IAED,MAAM,MAAM,GAAG,CAAC,GAAG,MAAM,CAAC,CAAC,IAAI,CAAC,WAAW,CAAC,CAAC;IAC7C,MAAM,GAAG,GAAG,MAAM,CAAC,CAAC,CAAC,CAAC;IACtB,MAAM,QAAQ,GAAG,WAAW,CAAC,GAAG,CAAC,CAAC;IAElC,IAAI,MAAM,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;QACxB,OAAO;YACL,WAAW,EAAE,IAAI;YACjB,MAAM,EAAE,SAAS;YACjB,IAAI,EAAE,GAAG,CAAC,EAAE;YACZ,SAAS,EAAE,CAAC,GAAG,CAAC,EAAE,CAAC;YACnB,MAAM,EAAE,qBAAqB,GAAG,CAAC,EAAE,4FAA4F;SAChI,CAAC;IACJ,CAAC;IAED,MAAM,WAAW,GAAG,WAAW,CAAC,MAAM,CAAC,CAAC,CAAC,CAAC,CAAC;IAE3C,0EAA0E;IAC1E,uBAAuB;IACvB,6EAA6E;IAC7E,6EAA6E;IAC7E,0EAA0E;IAC1E,2EAA2E;IAC3E,IAAI,WAAoB,CAAC;IACzB,IAAI,MAA0B,CAAC;IAC/B,IAAI,CAAC,MAAM,CAAC,QAAQ,CAAC,QAAQ,CAAC,EAAE,CAAC;QAC/B,WAAW,GAAG,KAAK,CAAC;QACpB,MAAM,GAAG,SAAS,CAAC;IACrB,CAAC;SAAM,IAAI,CAAC,MAAM,CAAC,QAAQ,CAAC,WAAW,CAAC,EAAE,CAAC;QACzC,WAAW,GAAG,IAAI,CAAC;QACnB,MAAM,GAAG,SAAS,CAAC;IACrB,CAAC;SAAM,CAAC;QACN,MAAM,GAAG,QAAQ,GAAG,WAAW,CAAC;QAChC,MAAM,YAAY,GAAG,MAAM,CAAC,GAAG,CAAC,WAAW,CAAC,CAAC,MAAM,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,MAAM,CAAC,QAAQ,CAAC,CAAC,CAAC,CAAC,CAAC;QAC/E,WAAW,GAAG,QAAQ,CAAC,aAAa,CAAC,YAAY,CAAC,CAAC;IACrD,CAAC;IAED,8EAA8E;IAC9E,2EAA2E;IAC3E,yDAAyD;IACzD,MAAM,SAAS,GAAa,EAAE,CAAC;IAC/B,MAAM,IAAI,GAAG,IAAI,GAAG,EAAU,CAAC;IAC/B,MAAM,GAAG,GAAG,CAAC,EAAU,EAAE,EAAE;QACzB,IAAI,CAAC,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,EAAE,CAAC;YAClB,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,CAAC;YACb,SAAS,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;QACrB,CAAC;IACH,CAAC,CAAC;IACF,IAAI,MAAM,CAAC,QAAQ,CAAC,QAAQ,CAAC,EAAE,CAAC;QAC9B,KAAK,MAAM,CAAC,IAAI,MAAM,EAAE,CAAC;YACvB,MAAM,EAAE,GAAG,WAAW,CAAC,CAAC,CAAC,CAAC;YAC1B,IAAI,MAAM,CAAC,QAAQ,CAAC,EAAE,CAAC,IAAI,QAAQ,GAAG,EAAE,IAAI,aAAa;gBAAE,GAAG,CAAC,CAAC,CAAC,EAAE,CAAC,CAAC;QACvE,CAAC;IACH,CAAC;IACD,GAAG,CAAC,GAAG,CAAC,EAAE,CAAC,CAAC,CAAC,0CAA0C;IACvD,IAAI,CAAC,WAAW;QAAE,GAAG,CAAC,MAAM,CAAC,CAAC,CAAC,CAAC,EAAE,CAAC,CAAC,CAAC,4DAA4D;IAEjG,MAAM,GAAG,GAAG,MAAM,KAAK,SAAS,CAAC,CAAC,CAAC,KAAK,CAAC,CAAC,CAAC,MAAM,CAAC,OAAO,CAAC,CAAC,CAAC,CAAC;IAC7D,MAAM,MA
AM,GAAG,WAAW;QACxB,CAAC,CAAC,MAAM,KAAK,SAAS;YACpB,CAAC,CAAC,iBAAiB,QAAQ,CAAC,IAAI,OAAO,GAAG,CAAC,EAAE,8HAA8H;YAC3K,CAAC,CAAC,iBAAiB,QAAQ,CAAC,IAAI,OAAO,GAAG,CAAC,EAAE,yBAAyB,GAAG,kFAAkF;QAC7J,CAAC,CAAC,sBAAsB,QAAQ,CAAC,IAAI,mBAAmB,GAAG,mEAAmE,SAAS,CAAC,MAAM,wOAAwO,CAAC;IAEzX,OAAO,EAAE,WAAW,EAAE,MAAM,EAAE,IAAI,EAAE,GAAG,CAAC,EAAE,EAAE,SAAS,EAAE,MAAM,EAAE,CAAC;AAClE,CAAC"}
@@ -24,7 +24,8 @@
   * embedding-geometry PROXY — semantic alignment, never model internals
   * and never causal attribution.
   */
- export { DEFAULT_INFLUENCE_WEIGHTS, DEFAULT_MARGIN_THRESHOLD, DEFAULT_PERSISTENCE_THRESHOLD, } from './types.js';
+ export { DEFAULT_CLEAR_WINNER_MARGIN, DEFAULT_CLEAR_WINNER_RATIO, DEFAULT_INFLUENCE_WEIGHTS, DEFAULT_MARGIN_THRESHOLD, DEFAULT_PERSISTENCE_THRESHOLD, DEFAULT_SHORTLIST_BAND, } from './types.js';
+ export { marginStrategy, rankingConfidence, ratioStrategy, } from './attributability.js';
  export { contentHash, EmbeddingCache, embeddingCache, } from './cache.js';
  export { adaptWeights, averageRelevancy, compositeScore, finalAnswerSimilarity, persistence, scoreInfluence, structuralProximity, } from './signals.js';
  export { pairwiseSimilarity } from './similarity.js';
@@ -1 +1 @@
- {"version":3,"file":"index.js","sourceRoot":"","sources":["../../../../src/lib/influence-core/index.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;;;;;;;;;;;GAyBG;AAiBH,OAAO,EACL,yBAAyB,EACzB,wBAAwB,EACxB,6BAA6B,GAC9B,MAAM,YAAY,CAAC;AAEpB,OAAO,EACL,WAAW,EACX,cAAc,EACd,cAAc,GAGf,MAAM,YAAY,CAAC;AAEpB,OAAO,EACL,YAAY,EACZ,gBAAgB,EAChB,cAAc,EACd,qBAAqB,EACrB,WAAW,EACX,cAAc,EACd,mBAAmB,GAEpB,MAAM,cAAc,CAAC;AAEtB,OAAO,EAAE,kBAAkB,EAA+B,MAAM,iBAAiB,CAAC;AAElF,OAAO,EAAE,WAAW,EAAwB,MAAM,aAAa,CAAC"}
+ {"version":3,"file":"index.js","sourceRoot":"","sources":["../../../../src/lib/influence-core/index.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;;;;;;;;;;;GAyBG;AAmBH,OAAO,EACL,2BAA2B,EAC3B,0BAA0B,EAC1B,yBAAyB,EACzB,wBAAwB,EACxB,6BAA6B,EAC7B,sBAAsB,GACvB,MAAM,YAAY,CAAC;AAEpB,OAAO,EACL,cAAc,EACd,iBAAiB,EACjB,aAAa,GAEd,MAAM,sBAAsB,CAAC;AAE9B,OAAO,EACL,WAAW,EACX,cAAc,EACd,cAAc,GAGf,MAAM,YAAY,CAAC;AAEpB,OAAO,EACL,YAAY,EACZ,gBAAgB,EAChB,cAAc,EACd,qBAAqB,EACrB,WAAW,EACX,cAAc,EACd,mBAAmB,GAEpB,MAAM,cAAc,CAAC;AAEtB,OAAO,EAAE,kBAAkB,EAA+B,MAAM,iBAAiB,CAAC;AAElF,OAAO,EAAE,WAAW,EAAwB,MAAM,aAAa,CAAC"}
@@ -32,4 +32,31 @@ export const DEFAULT_INFLUENCE_WEIGHTS = Object.freeze({
32
32
  export const DEFAULT_PERSISTENCE_THRESHOLD = 0.3;
33
33
  /** RFC-002 §4 default: margins below this flag the choice as `narrow`. */
34
34
  export const DEFAULT_MARGIN_THRESHOLD = 0.05;
35
+ /**
36
+ * RFC-003 default: an influence ranking whose top-1 vs top-2 score margin is
37
+ * below this has NO clear winner — a shortlist, not a verdict. Escalate to
38
+ * ablation.
39
+ *
40
+ * UNCALIBRATED proxy starting point, chosen for interpretability. `margin`
41
+ * is an ABSOLUTE difference on the same scale as `scoreInfluence`'s composite
42
+ * (S ∈ ≈[−0.7, 1]), so this threshold is EMBEDDER-RELATIVE — recalibrate by
43
+ * sweeping clear-winner vs flat rankings on your embedder. The numeric
44
+ * coincidence with `DEFAULT_MARGIN_THRESHOLD` is NOT a shared derivation: that
45
+ * one measures `scoreMargin`'s chosen-vs-not-chosen distribution, a different
46
+ * statistic.
47
+ */
48
+ export const DEFAULT_CLEAR_WINNER_MARGIN = 0.05;
49
+ /**
50
+ * RFC-003 default: when there is no clear winner, suspects scoring within this
51
+ * band of the top form the shortlist ablation should COVER (the culprit may be
52
+ * any of them — or, for absence bugs, none). UNCALIBRATED proxy; embedder-
53
+ * relative (see `DEFAULT_CLEAR_WINNER_MARGIN`).
54
+ */
55
+ export const DEFAULT_SHORTLIST_BAND = 0.1;
56
+ /**
57
+ * RFC-003 default for `ratioStrategy`: the top-2 gap as a FRACTION of the top
58
+ * score `(s0 − s1) / |s0|`. Unlike the absolute margin this is scale-invariant,
59
+ * so it transfers across embedders / answer lengths. UNCALIBRATED proxy.
60
+ */
61
+ export const DEFAULT_CLEAR_WINNER_RATIO = 0.05;
35
62
  //# sourceMappingURL=types.js.map
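The three new constants above describe two ways to decide whether an influence ranking has a clear winner (absolute margin vs. ratio) plus a band for building a shortlist. The sketch below illustrates that arithmetic only. The helper names (`topTwo`, `clearWinnerByMargin`, `clearWinnerByRatio`, `shortlist`) are hypothetical and are not the package's `marginStrategy`, `ratioStrategy`, or `rankingConfidence`, whose signatures do not appear in this part of the diff. It also assumes at least two scores and treats a top score of zero as "no clear winner", neither of which the source specifies.

```javascript
// Values copied from the types.js hunk above.
const DEFAULT_CLEAR_WINNER_MARGIN = 0.05;
const DEFAULT_SHORTLIST_BAND = 0.1;
const DEFAULT_CLEAR_WINNER_RATIO = 0.05;

// HYPOTHETICAL helper: top-1 and top-2 scores, highest first.
function topTwo(scores) {
  const sorted = [...scores].sort((a, b) => b - a);
  return [sorted[0], sorted[1]];
}

// Absolute margin: s0 - s1 below the threshold means no clear winner.
function clearWinnerByMargin(scores, threshold = DEFAULT_CLEAR_WINNER_MARGIN) {
  const [s0, s1] = topTwo(scores);
  return s0 - s1 >= threshold;
}

// Ratio: (s0 - s1) / |s0|, the gap as a fraction of the top score.
// ASSUMPTION: a top score of exactly 0 is reported as "no clear winner".
function clearWinnerByRatio(scores, threshold = DEFAULT_CLEAR_WINNER_RATIO) {
  const [s0, s1] = topTwo(scores);
  if (s0 === 0) return false;
  return (s0 - s1) / Math.abs(s0) >= threshold;
}

// Shortlist: every score within `band` of the top score.
function shortlist(scores, band = DEFAULT_SHORTLIST_BAND) {
  const [s0] = topTwo(scores);
  return scores.filter((s) => s0 - s <= band);
}

const flat = [0.62, 0.6, 0.55, 0.3]; // top-2 gap of about 0.02
const decisive = [0.8, 0.4, 0.1];    // top-2 gap of about 0.4
const rescaled = [0.08, 0.04, 0.01]; // `decisive` on a 10x smaller scale

console.log(clearWinnerByMargin(flat), clearWinnerByRatio(flat));
console.log(clearWinnerByMargin(decisive), clearWinnerByRatio(decisive));
console.log(clearWinnerByMargin(rescaled), clearWinnerByRatio(rescaled));
console.log(shortlist(flat));
```

The `rescaled` ranking shows why the comment calls the absolute margin embedder-relative: the same relative ordering fails the margin check once the scores shrink, while the ratio check is unaffected.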
@@ -1 +1 @@
1
- {"version":3,"file":"types.js","sourceRoot":"","sources":["../../../../src/lib/influence-core/types.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;;;;;;;;GAsBG;AA0BH,mEAAmE;AACnE,MAAM,CAAC,MAAM,yBAAyB,GAAqB,MAAM,CAAC,MAAM,CAAC;IACvE,EAAE,EAAE,GAAG;IACP,GAAG,EAAE,GAAG;IACR,OAAO,EAAE,GAAG;IACZ,KAAK,EAAE,GAAG;CACX,CAAC,CAAC;AAEH,yDAAyD;AACzD,MAAM,CAAC,MAAM,6BAA6B,GAAG,GAAG,CAAC;AAEjD,0EAA0E;AAC1E,MAAM,CAAC,MAAM,wBAAwB,GAAG,IAAI,CAAC"}
1
+ {"version":3,"file":"types.js","sourceRoot":"","sources":["../../../../src/lib/influence-core/types.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;;;;;;;;GAsBG;AA0BH,mEAAmE;AACnE,MAAM,CAAC,MAAM,yBAAyB,GAAqB,MAAM,CAAC,MAAM,CAAC;IACvE,EAAE,EAAE,GAAG;IACP,GAAG,EAAE,GAAG;IACR,OAAO,EAAE,GAAG;IACZ,KAAK,EAAE,GAAG;CACX,CAAC,CAAC;AAEH,yDAAyD;AACzD,MAAM,CAAC,MAAM,6BAA6B,GAAG,GAAG,CAAC;AAEjD,0EAA0E;AAC1E,MAAM,CAAC,MAAM,wBAAwB,GAAG,IAAI,CAAC;AAE7C;;;;;;;;;;;;GAYG;AACH,MAAM,CAAC,MAAM,2BAA2B,GAAG,IAAI,CAAC;AAEhD;;;;;GAKG;AACH,MAAM,CAAC,MAAM,sBAAsB,GAAG,GAAG,CAAC;AAE1C;;;;GAIG;AACH,MAAM,CAAC,MAAM,0BAA0B,GAAG,IAAI,CAAC"}
@@ -65,7 +65,7 @@ export { typedEmit } from './recorders/core/typedEmit.js';
65
65
  // edge weigher). Honest claim: every score is an embedding-geometry
66
66
  // PROXY — semantic alignment, never model internals, never causal
67
67
  // attribution.
68
- export { adaptWeights, averageRelevancy, compositeScore, contentHash, DEFAULT_INFLUENCE_WEIGHTS, DEFAULT_MARGIN_THRESHOLD, DEFAULT_PERSISTENCE_THRESHOLD, EmbeddingCache, embeddingCache, finalAnswerSimilarity, pairwiseSimilarity, persistence, scoreInfluence, scoreMargin, structuralProximity, } from './lib/influence-core/index.js';
68
+ export { adaptWeights, averageRelevancy, compositeScore, contentHash, DEFAULT_CLEAR_WINNER_MARGIN, DEFAULT_CLEAR_WINNER_RATIO, DEFAULT_INFLUENCE_WEIGHTS, DEFAULT_MARGIN_THRESHOLD, DEFAULT_PERSISTENCE_THRESHOLD, DEFAULT_SHORTLIST_BAND, EmbeddingCache, embeddingCache, finalAnswerSimilarity, marginStrategy, pairwiseSimilarity, persistence, rankingConfidence, ratioStrategy, scoreInfluence, scoreMargin, structuralProximity, } from './lib/influence-core/index.js';
69
69
  // Introspection toolpack (RFC-003 Part C) — footprintjs trace evidence
70
70
  // exposed as TOOLS a debugging LLM calls over a COMPLETED run's artifacts.
71
71
  // Bounded, honest (⚠ markers), redaction-respecting, id-navigable.
@@ -81,7 +81,7 @@ export { buildSelfExplainSkill, buildSelfExplainToolProvider, SelfExplainBinding
81
81
  // counterfactual ablation. §B2 claim tiers: scores/weights are
82
82
  // embedding-geometry PROXIES; ablation verdicts are the ONLY causal
83
83
  // claims; slice completeness is bounded by tracking — and says so.
84
- export { ablationForSuspect, applyAblations, bisectCulprits, CONTEXT_BISECT_DEFAULTS, defaultOutcomeComparator, defaultSuspectClassifier, formatContextBugReport, llmCallIdsFromEvents, llmEdgeWeigher, localizeContextBug, probeFlipped, runAblationProbe, stepOutputText, suspectLabel, verdictFor, } from './lib/context-bisect/index.js';
84
+ export { ablationForSuspect, applyAblations, bisectCulprits, CONTEXT_BISECT_DEFAULTS, defaultOutcomeComparator, defaultSuspectClassifier, findDroppedContext, formatContextBugReport, llmCallIdsFromEvents, llmEdgeWeigher, localizeContextBug, probeFlipped, runAblationProbe, stepOutputText, suspectLabel, verdictFor, } from './lib/context-bisect/index.js';
85
85
  // BacktrackTrace serializer — feeds agentThinkingUI's <BacktrackView>
86
86
  // (the "why?" board) straight off a localizer report. Pure mapping, no
87
87
  // UI dependency; the interfaces mirror agentthinkingui's contract.
@@ -1 +1 @@
1
- {"version":3,"file":"observe.js","sourceRoot":"","sources":["../../src/observe.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;GA+BG;AAEH,4BAA4B;AAC5B,OAAO,EAAE,eAAe,EAA+B,MAAM,qCAAqC,CAAC;AACnG,OAAO,EAAE,cAAc,EAA8B,MAAM,oCAAoC,CAAC;AAEhG,+BAA+B;AAC/B,OAAO,EACL,mBAAmB,GAEpB,MAAM,yCAAyC,CAAC;AACjD,OAAO,EAAE,aAAa,EAA6B,MAAM,mCAAmC,CAAC;AAC7F,OAAO,EACL,gBAAgB,EAChB,gBAAgB,GAgBjB,MAAM,+CAA+C,CAAC;AACvD,OAAO,EACL,aAAa,EACb,eAAe,EACf,eAAe,GAQhB,MAAM,8CAA8C,CAAC;AACtD,OAAO,EACL,eAAe,EACf,cAAc,GAQf,MAAM,gDAAgD,CAAC;AACxD,OAAO,EACL,iBAAiB,EACjB,iBAAiB,EACjB,cAAc,EACd,eAAe,EACf,oBAAoB,GAKrB,MAAM,gDAAgD,CAAC;AAExD,6BAA6B;AAC7B,OAAO,EAAE,YAAY,EAA4B,MAAM,kCAAkC,CAAC;AAC1F,OAAO,EAAE,aAAa,EAA6B,MAAM,mCAAmC,CAAC;AAC7F,OAAO,EACL,wBAAwB,GAEzB,MAAM,8CAA8C,CAAC;AACtD,OAAO,EAAE,YAAY,EAA4B,MAAM,kCAAkC,CAAC;AAC1F,OAAO,EAAE,cAAc,EAA8B,MAAM,oCAAoC,CAAC;AAChG,OAAO,EACL,kBAAkB,GAEnB,MAAM,wCAAwC,CAAC;AAChD,OAAO,EAAE,aAAa,EAA6B,MAAM,mCAAmC,CAAC;AAC7F,OAAO,EACL,aAAa,EACb,cAAc,GAIf,MAAM,8CAA8C,CAAC;AACtD,OAAO,EACL,YAAY,GAGb,MAAM,6CAA6C,CAAC;AACrD,4EAA4E;AAC5E,gFAAgF;AAChF,OAAO,EACL,mBAAmB,GAMpB,MAAM,kDAAkD,CAAC;AAC1D,gFAAgF;AAChF,gFAAgF;AAChF,OAAO,EACL,kBAAkB,GAQnB,MAAM,yDAAyD,CAAC;AAEjE,uDAAuD;AACvD,OAAO,EAAE,SAAS,EAAE,MAAM,+BAA+B,CAAC;AAE1D,uEAAuE;AACvE,uEAAuE;AACvE,uEAAuE;AACvE,oEAAoE;AACpE,oEAAoE;AACpE,kEAAkE;AAClE,eAAe;AACf,OAAO,EACL,YAAY,EACZ,gBAAgB,EAChB,cAAc,EACd,WAAW,EACX,yBAAyB,EACzB,wBAAwB,EACxB,6BAA6B,EAC7B,cAAc,EACd,cAAc,EACd,qBAAqB,EACrB,kBAAkB,EAClB,WAAW,EACX,cAAc,EACd,WAAW,EACX,mBAAmB,GAiBpB,MAAM,+BAA+B,CAAC;AACvC,uEAAuE;AACvE,2EAA2E;AAC3E,mEAAmE;AACnE,OAAO,EACL,aAAa,EACb,iBAAiB,EACjB,wBAAwB,EACxB,kBAAkB,EAClB,aAAa,GAGd,MAAM,+BAA+B,CAAC;AACvC,uEAAuE;AACvE,yEAAyE;AACzE,0EAA0E;AAC1E,kEAAkE;AAClE,OAAO,EACL,qBAAqB,EACrB,4BAA4B,EAC5B,kBAAkB,EAClB,eAAe,GAGhB,MAAM,+BAA+B,CAAC;AACvC,qEAAqE;AACrE,sEAAsE;AACtE,sEAAsE;AACtE,+DAA+D;AAC/D,oEAAoE;AACpE,mEAAmE;AACnE,OAAO,EACL,kBAAkB,EAClB,cAAc,EACd,cAAc,EACd,uBAAuB,EACvB,wBAAwB,EACxB,wBAAwB,EACxB,sBAAsB,EACtB,oBAAoB,EACpB,cAAc,E
ACd,kBAAkB,EAClB,YAAY,EACZ,gBAAgB,EAChB,cAAc,EACd,YAAY,EACZ,UAAU,GA+BX,MAAM,+BAA+B,CAAC;AACvC,sEAAsE;AACtE,uEAAuE;AACvE,mEAAmE;AACnE,OAAO,EACL,gBAAgB,GAOjB,MAAM,+BAA+B,CAAC;AACvC,wEAAwE;AACxE,8EAA8E;AAC9E,qEAAqE;AACrE,wEAAwE;AACxE,uEAAuE;AACvE,gEAAgE;AAChE,gDAAgD;AAChD,OAAO,EACL,kBAAkB,EAClB,gBAAgB,EAChB,aAAa,EACb,iBAAiB,EACjB,+BAA+B,EAC/B,qBAAqB,EACrB,kBAAkB,EAClB,iBAAiB,EACjB,sBAAsB,EACtB,eAAe,EACf,mBAAmB,EACnB,eAAe,EACf,uBAAuB,EACvB,yBAAyB,EACzB,iBAAiB,EACjB,cAAc,EACd,mBAAmB,GAepB,MAAM,0BAA0B,CAAC;AAClC,sEAAsE;AACtE,uEAAuE;AACvE,qEAAqE;AACrE,oEAAoE;AACpE,OAAO,EACL,kBAAkB,EAClB,kBAAkB,GAOnB,MAAM,iDAAiD,CAAC"}
1
+ {"version":3,"file":"observe.js","sourceRoot":"","sources":["../../src/observe.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;GA+BG;AAEH,4BAA4B;AAC5B,OAAO,EAAE,eAAe,EAA+B,MAAM,qCAAqC,CAAC;AACnG,OAAO,EAAE,cAAc,EAA8B,MAAM,oCAAoC,CAAC;AAEhG,+BAA+B;AAC/B,OAAO,EACL,mBAAmB,GAEpB,MAAM,yCAAyC,CAAC;AACjD,OAAO,EAAE,aAAa,EAA6B,MAAM,mCAAmC,CAAC;AAC7F,OAAO,EACL,gBAAgB,EAChB,gBAAgB,GAgBjB,MAAM,+CAA+C,CAAC;AACvD,OAAO,EACL,aAAa,EACb,eAAe,EACf,eAAe,GAQhB,MAAM,8CAA8C,CAAC;AACtD,OAAO,EACL,eAAe,EACf,cAAc,GAQf,MAAM,gDAAgD,CAAC;AACxD,OAAO,EACL,iBAAiB,EACjB,iBAAiB,EACjB,cAAc,EACd,eAAe,EACf,oBAAoB,GAKrB,MAAM,gDAAgD,CAAC;AAExD,6BAA6B;AAC7B,OAAO,EAAE,YAAY,EAA4B,MAAM,kCAAkC,CAAC;AAC1F,OAAO,EAAE,aAAa,EAA6B,MAAM,mCAAmC,CAAC;AAC7F,OAAO,EACL,wBAAwB,GAEzB,MAAM,8CAA8C,CAAC;AACtD,OAAO,EAAE,YAAY,EAA4B,MAAM,kCAAkC,CAAC;AAC1F,OAAO,EAAE,cAAc,EAA8B,MAAM,oCAAoC,CAAC;AAChG,OAAO,EACL,kBAAkB,GAEnB,MAAM,wCAAwC,CAAC;AAChD,OAAO,EAAE,aAAa,EAA6B,MAAM,mCAAmC,CAAC;AAC7F,OAAO,EACL,aAAa,EACb,cAAc,GAIf,MAAM,8CAA8C,CAAC;AACtD,OAAO,EACL,YAAY,GAGb,MAAM,6CAA6C,CAAC;AACrD,4EAA4E;AAC5E,gFAAgF;AAChF,OAAO,EACL,mBAAmB,GAMpB,MAAM,kDAAkD,CAAC;AAC1D,gFAAgF;AAChF,gFAAgF;AAChF,OAAO,EACL,kBAAkB,GAQnB,MAAM,yDAAyD,CAAC;AAEjE,uDAAuD;AACvD,OAAO,EAAE,SAAS,EAAE,MAAM,+BAA+B,CAAC;AAE1D,uEAAuE;AACvE,uEAAuE;AACvE,uEAAuE;AACvE,oEAAoE;AACpE,oEAAoE;AACpE,kEAAkE;AAClE,eAAe;AACf,OAAO,EACL,YAAY,EACZ,gBAAgB,EAChB,cAAc,EACd,WAAW,EACX,2BAA2B,EAC3B,0BAA0B,EAC1B,yBAAyB,EACzB,wBAAwB,EACxB,6BAA6B,EAC7B,sBAAsB,EACtB,cAAc,EACd,cAAc,EACd,qBAAqB,EACrB,cAAc,EACd,kBAAkB,EAClB,WAAW,EACX,iBAAiB,EACjB,aAAa,EACb,cAAc,EACd,WAAW,EACX,mBAAmB,GAoBpB,MAAM,+BAA+B,CAAC;AACvC,uEAAuE;AACvE,2EAA2E;AAC3E,mEAAmE;AACnE,OAAO,EACL,aAAa,EACb,iBAAiB,EACjB,wBAAwB,EACxB,kBAAkB,EAClB,aAAa,GAGd,MAAM,+BAA+B,CAAC;AACvC,uEAAuE;AACvE,yEAAyE;AACzE,0EAA0E;AAC1E,kEAAkE;AAClE,OAAO,EACL,qBAAqB,EACrB,4BAA4B,EAC5B,kBAAkB,EAClB,eAAe,GAGhB,MAAM,+BAA+B,CAAC;AACvC,qEAAqE;AACrE,sEAAsE;AACtE,sEAAsE;AACtE,+DAA+D;AAC/D,oEAAoE;AACpE,mEAAmE;AACnE,OAAO,EACL,kBAAkB,EAClB,cAAc,EACd,cAAc,EACd
,uBAAuB,EACvB,wBAAwB,EACxB,wBAAwB,EACxB,kBAAkB,EAClB,sBAAsB,EACtB,oBAAoB,EACpB,cAAc,EACd,kBAAkB,EAClB,YAAY,EACZ,gBAAgB,EAChB,cAAc,EACd,YAAY,EACZ,UAAU,GAkCX,MAAM,+BAA+B,CAAC;AACvC,sEAAsE;AACtE,uEAAuE;AACvE,mEAAmE;AACnE,OAAO,EACL,gBAAgB,GAOjB,MAAM,+BAA+B,CAAC;AACvC,wEAAwE;AACxE,8EAA8E;AAC9E,qEAAqE;AACrE,wEAAwE;AACxE,uEAAuE;AACvE,gEAAgE;AAChE,gDAAgD;AAChD,OAAO,EACL,kBAAkB,EAClB,gBAAgB,EAChB,aAAa,EACb,iBAAiB,EACjB,+BAA+B,EAC/B,qBAAqB,EACrB,kBAAkB,EAClB,iBAAiB,EACjB,sBAAsB,EACtB,eAAe,EACf,mBAAmB,EACnB,eAAe,EACf,uBAAuB,EACvB,yBAAyB,EACzB,iBAAiB,EACjB,cAAc,EACd,mBAAmB,GAepB,MAAM,0BAA0B,CAAC;AAClC,sEAAsE;AACtE,uEAAuE;AACvE,qEAAqE;AACrE,oEAAoE;AACpE,OAAO,EACL,kBAAkB,EAClB,kBAAkB,GAOnB,MAAM,iDAAiD,CAAC"}
@@ -16,10 +16,13 @@
16
16
  * claims; slice completeness is bounded by tracking — and says so.
17
17
  */
18
18
  Object.defineProperty(exports, "__esModule", { value: true });
19
- exports.CONTEXT_BISECT_DEFAULTS = exports.bisectCulprits = exports.verdictFor = exports.runAblationProbe = exports.probeFlipped = exports.defaultOutcomeComparator = exports.applyAblations = exports.ablationForSuspect = exports.toBacktrackTrace = exports.suspectLabel = exports.localizeContextBug = exports.llmCallIdsFromEvents = exports.formatContextBugReport = exports.defaultSuspectClassifier = exports.stepOutputText = exports.llmEdgeWeigher = void 0;
19
+ exports.CONTEXT_BISECT_DEFAULTS = exports.bisectCulprits = exports.verdictFor = exports.runAblationProbe = exports.probeFlipped = exports.defaultOutcomeComparator = exports.applyAblations = exports.ablationForSuspect = exports.toBacktrackTrace = exports.suspectLabel = exports.localizeContextBug = exports.llmCallIdsFromEvents = exports.formatContextBugReport = exports.defaultSuspectClassifier = exports.findDroppedContext = exports.stepOutputText = exports.llmEdgeWeigher = void 0;
20
20
  var llmEdgeWeigher_js_1 = require("./llmEdgeWeigher.js");
21
21
  Object.defineProperty(exports, "llmEdgeWeigher", { enumerable: true, get: function () { return llmEdgeWeigher_js_1.llmEdgeWeigher; } });
22
22
  Object.defineProperty(exports, "stepOutputText", { enumerable: true, get: function () { return llmEdgeWeigher_js_1.stepOutputText; } });
23
+ // Interface #3 — missing-context finder (available − sent; confirm by restoration).
24
+ var missingContext_js_1 = require("./missingContext.js");
25
+ Object.defineProperty(exports, "findDroppedContext", { enumerable: true, get: function () { return missingContext_js_1.findDroppedContext; } });
23
26
  var localize_js_1 = require("./localize.js");
24
27
  Object.defineProperty(exports, "defaultSuspectClassifier", { enumerable: true, get: function () { return localize_js_1.defaultSuspectClassifier; } });
25
28
  Object.defineProperty(exports, "formatContextBugReport", { enumerable: true, get: function () { return localize_js_1.formatContextBugReport; } });
@@ -1 +1 @@
1
- {"version":3,"file":"index.js","sourceRoot":"","sources":["../../../src/lib/context-bisect/index.ts"],"names":[],"mappings":";AAAA;;;;;;;;;;;;;;;GAeG;;;AAEH,yDAM6B;AAL3B,mHAAA,cAAc,OAAA;AACd,mHAAA,cAAc,OAAA;AAMhB,6CAUuB;AATrB,uHAAA,wBAAwB,OAAA;AACxB,qHAAA,sBAAsB,OAAA;AACtB,mHAAA,oBAAoB,OAAA;AACpB,iHAAA,kBAAkB,OAAA;AAClB,2GAAA,YAAY,OAAA;AAOd,6DAQ+B;AAP7B,uHAAA,gBAAgB,OAAA;AASlB,6CASuB;AARrB,iHAAA,kBAAkB,OAAA;AAClB,6GAAA,cAAc,OAAA;AACd,uHAAA,wBAAwB,OAAA;AACxB,2GAAA,YAAY,OAAA;AACZ,+GAAA,gBAAgB,OAAA;AAChB,yGAAA,UAAU,OAAA;AAKZ,yCAKqB;AAJnB,2GAAA,cAAc,OAAA;AAMhB,uCAqBoB;AApBlB,mHAAA,uBAAuB,OAAA"}
1
+ {"version":3,"file":"index.js","sourceRoot":"","sources":["../../../src/lib/context-bisect/index.ts"],"names":[],"mappings":";AAAA;;;;;;;;;;;;;;;GAeG;;;AAEH,yDAM6B;AAL3B,mHAAA,cAAc,OAAA;AACd,mHAAA,cAAc,OAAA;AAMhB,oFAAoF;AACpF,yDAK6B;AAJ3B,uHAAA,kBAAkB,OAAA;AAMpB,6CAUuB;AATrB,uHAAA,wBAAwB,OAAA;AACxB,qHAAA,sBAAsB,OAAA;AACtB,mHAAA,oBAAoB,OAAA;AACpB,iHAAA,kBAAkB,OAAA;AAClB,2GAAA,YAAY,OAAA;AAOd,6DAQ+B;AAP7B,uHAAA,gBAAgB,OAAA;AASlB,6CASuB;AARrB,iHAAA,kBAAkB,OAAA;AAClB,6GAAA,cAAc,OAAA;AACd,uHAAA,wBAAwB,OAAA;AACxB,2GAAA,YAAY,OAAA;AACZ,+GAAA,gBAAgB,OAAA;AAChB,yGAAA,UAAU,OAAA;AAKZ,yCAKqB;AAJnB,2GAAA,cAAc,OAAA;AAMhB,uCAqBoB;AApBlB,mHAAA,uBAAuB,OAAA"}
@@ -0,0 +1,67 @@
1
+ "use strict";
2
+ /**
3
+ * missingContext — interface #3: find context that was AVAILABLE but never
4
+ * reached the model (RFC-003).
5
+ *
6
+ * The localizer's influence ranking (#1) + ablation (#2) handle culprits that
7
+ * are PRESENT in the context. They are blind to the opposite failure: a needed
8
+ * unit that was *dropped* — truncated out of the window, or never selected —
9
+ * so the model never saw it. You cannot ablate what isn't there.
10
+ *
11
+ * This finder is the cheap, exact, deterministic half of that case: a SET
12
+ * DIFFERENCE over unit ids. The library tracks context as identified units
13
+ * (each injection / memory entry / tool result has a stable id), so "what got
14
+ * dropped" is `available − sent` — no embeddings, no LLM, O(n).
15
+ *
16
+ * Causal confirmation is the MIRROR of ablation: RESTORATION. Add a dropped
17
+ * unit back, re-run, and an outcome flip is the causal proof. Like ablation,
18
+ * the re-run is consumer-supplied (the library doesn't own your agent loop);
19
+ * see `findDroppedContext` docs + example 10 for the pattern.
20
+ *
21
+ * Honest claim: a dropped unit is a CANDIDATE missing-context culprit, never a
22
+ * confirmed cause — most dropped context is correctly dropped. Only restoration
23
+ * makes a causal claim.
24
+ */
25
+ Object.defineProperty(exports, "__esModule", { value: true });
26
+ exports.findDroppedContext = void 0;
27
+ /**
28
+ * Find context that was available for a turn but never reached the model —
29
+ * `available − sent` by id. Pure, deterministic, O(n); no model or embedder.
30
+ *
31
+ * Ids are assumed stable and unique per side (duplicates are de-duplicated,
32
+ * first occurrence wins). Units in `sent` but not `available` are ignored.
33
+ *
34
+ * Confirm a candidate causally by RESTORATION (the mirror of ablation): add the
35
+ * dropped unit back into the context and re-run; an outcome flip is the proof.
36
+ *
37
+ * @example
38
+ * const { dropped, anyDropped } = findDroppedContext(assembled, sentToModel);
39
+ * if (anyDropped) {
40
+ * for (const unit of dropped) {
41
+ * if (await rerunWith(unit).outcomeFlips()) report(unit); // restoration = causal
42
+ * }
43
+ * }
44
+ */
45
+ function findDroppedContext(available, sent) {
46
+ const sentIds = new Set();
47
+ for (const u of sent)
48
+ sentIds.add(u.id);
49
+ const dropped = [];
50
+ const seenAvailable = new Set();
51
+ for (const u of available) {
52
+ if (seenAvailable.has(u.id))
53
+ continue; // de-dup by id, first wins
54
+ seenAvailable.add(u.id);
55
+ if (!sentIds.has(u.id))
56
+ dropped.push(u.content === undefined ? { id: u.id } : { id: u.id, content: u.content });
57
+ }
58
+ const availableCount = seenAvailable.size;
59
+ const sentCount = sentIds.size;
60
+ const anyDropped = dropped.length > 0;
61
+ const reason = anyDropped
62
+ ? `${dropped.length} of ${availableCount} available unit(s) never reached the model — candidate(s) for a missing-context bug (truncation / dilution). Confirm by RESTORATION: add a unit back and re-run; an outcome flip is the causal proof (mirror of ablation). Most dropped context is correctly dropped — only restoration confirms.`
63
+ : `All ${availableCount} available unit(s) reached the model — no missing-context bug here (nothing was dropped).`;
64
+ return { dropped, availableCount, sentCount, anyDropped, reason };
65
+ }
66
+ exports.findDroppedContext = findDroppedContext;
67
+ //# sourceMappingURL=missingContext.js.map
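The module comment describes a two-step pattern: compute `available − sent` by id, then confirm each candidate by restoration. A minimal sketch of that pattern follows. To stay runnable without the package, it carries a local copy of the `findDroppedContext` logic from the hunk above (the `reason` string is omitted for brevity). `rerunWith`, `confirmByRestoration`, the unit ids, and the outcome strings are all hypothetical stand-ins for a consumer's own agent loop, since the library leaves the re-run to the caller.

```javascript
// Local copy of the set-difference logic shown in the diff (without `reason`).
function findDroppedContext(available, sent) {
  const sentIds = new Set();
  for (const u of sent) sentIds.add(u.id);
  const dropped = [];
  const seenAvailable = new Set();
  for (const u of available) {
    if (seenAvailable.has(u.id)) continue; // de-dup by id, first wins
    seenAvailable.add(u.id);
    if (!sentIds.has(u.id)) {
      dropped.push(u.content === undefined ? { id: u.id } : { id: u.id, content: u.content });
    }
  }
  return {
    dropped,
    availableCount: seenAvailable.size,
    sentCount: sentIds.size,
    anyDropped: dropped.length > 0,
  };
}

// HYPOTHETICAL re-run: a real one would call the consumer's agent again.
// Here the outcome changes only when the refund policy is in the context.
async function rerunWith(units) {
  const sawPolicy = units.some((u) => u.id === 'refund-policy');
  return { outcome: sawPolicy ? 'refund-approved' : 'refund-denied' };
}

// Restoration: add one dropped unit back, re-run, and keep the units whose
// re-run outcome differs from the baseline.
async function confirmByRestoration(available, sent, baselineOutcome) {
  const { dropped } = findDroppedContext(available, sent);
  const confirmed = [];
  for (const unit of dropped) {
    const result = await rerunWith([...sent, unit]); // await the re-run first
    if (result.outcome !== baselineOutcome) confirmed.push(unit.id);
  }
  return confirmed;
}

const available = [
  { id: 'system-prompt', content: 'You are a support agent.' },
  { id: 'refund-policy', content: 'Refunds are allowed within 30 days.' },
  { id: 'tool-result-1', content: 'Order placed 12 days ago.' },
  { id: 'refund-policy', content: 'duplicate entry, ignored' },
  { id: 'old-chat' }, // no content: only the id is carried through
];
const sent = [available[0], available[2]];

const diff = findDroppedContext(available, sent);
console.log(diff.dropped.map((u) => u.id), diff.availableCount, diff.sentCount);

confirmByRestoration(available, sent, 'refund-denied').then((ids) => {
  console.log('confirmed by restoration:', ids);
});
```

In this toy setup both `refund-policy` and `old-chat` are dropped candidates, but only `refund-policy` flips the outcome when restored, which matches the comment's point that most dropped context is correctly dropped.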
@@ -0,0 +1 @@
1
+ {"version":3,"file":"missingContext.js","sourceRoot":"","sources":["../../../src/lib/context-bisect/missingContext.ts"],"names":[],"mappings":";AAAA;;;;;;;;;;;;;;;;;;;;;;GAsBG;;;AAkCH;;;;;;;;;;;;;;;;;GAiBG;AACH,SAAgB,kBAAkB,CAChC,SAAiC,EACjC,IAA4B;IAE5B,MAAM,OAAO,GAAG,IAAI,GAAG,EAAU,CAAC;IAClC,KAAK,MAAM,CAAC,IAAI,IAAI;QAAE,OAAO,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,CAAC,CAAC;IAExC,MAAM,OAAO,GAAkB,EAAE,CAAC;IAClC,MAAM,aAAa,GAAG,IAAI,GAAG,EAAU,CAAC;IACxC,KAAK,MAAM,CAAC,IAAI,SAAS,EAAE,CAAC;QAC1B,IAAI,aAAa,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,CAAC;YAAE,SAAS,CAAC,2BAA2B;QAClE,aAAa,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,CAAC,CAAC;QACxB,IAAI,CAAC,OAAO,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,CAAC;YAAE,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,OAAO,KAAK,SAAS,CAAC,CAAC,CAAC,EAAE,EAAE,EAAE,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC,CAAC,EAAE,EAAE,EAAE,CAAC,CAAC,EAAE,EAAE,OAAO,EAAE,CAAC,CAAC,OAAO,EAAE,CAAC,CAAC;IAClH,CAAC;IAED,MAAM,cAAc,GAAG,aAAa,CAAC,IAAI,CAAC;IAC1C,MAAM,SAAS,GAAG,OAAO,CAAC,IAAI,CAAC;IAC/B,MAAM,UAAU,GAAG,OAAO,CAAC,MAAM,GAAG,CAAC,CAAC;IACtC,MAAM,MAAM,GAAG,UAAU;QACvB,CAAC,CAAC,GAAG,OAAO,CAAC,MAAM,OAAO,cAAc,mSAAmS;QAC3U,CAAC,CAAC,OAAO,cAAc,2FAA2F,CAAC;IAErH,OAAO,EAAE,OAAO,EAAE,cAAc,EAAE,SAAS,EAAE,UAAU,EAAE,MAAM,EAAE,CAAC;AACpE,CAAC;AAvBD,gDAuBC"}