npm - @dogfood-lab/study-swarm - Versions diffs - 1.1.0 → 1.3.0 - Mend

@dogfood-lab/study-swarm 1.1.0 → 1.3.0

Files changed (19)

package/CHANGELOG.md +40 -0
package/PROTOCOL.md +18 -0
package/README.es.md +27 -2
package/README.fr.md +47 -22
package/README.hi.md +27 -2
package/README.it.md +58 -33
package/README.ja.md +60 -35
package/README.md +27 -2
package/README.pt-BR.md +27 -2
package/README.zh.md +27 -2
package/bin/study-swarm.mjs +452 -1
package/examples/study-swarm-canon-rollback.dispatch.md +97 -0
package/examples/study-swarm-canon-rollback.lock.json +78 -0
package/examples/study-swarm-canon-rollback.orchestration.json +761 -0
package/examples/study-swarm-ci.yml +3 -0
package/examples/study-swarm-lock.dispatch.md +137 -0
package/examples/study-swarm-lock.lock.json +62 -0
package/examples/study-swarm-lock.orchestration.json +369 -0
package/package.json +1 -1

package/examples/study-swarm-canon-rollback.dispatch.md ADDED

@@ -0,0 +1,97 @@
+<!-- study-swarm v1.3.0 · protocol-sha256:b1e10bc705ed8fec · created:2026-06-30 -->
+# Study-swarm dispatch: the canon-rollback compensator (`withdraw` / `requalify`)
+> **Design dispatch.** This grounds the design of the **canon-rollback compensator** — the executable
+> form of the protocol's named `requalify_dependent_slices` undo. A verified finding becomes **canon**
+> (it informs a downstream design decision); when that finding is later **withdrawn** — a citation turns
+> out fabricated/misattributed on a re-run, a cited paper is **retracted** upstream, or the gate flips it —
+> a `git revert` is not enough, because the finding already propagated. The compensator flags every
+> dependent dispatch `evidence-withdrawn`, gates them fail-closed until each is removed or re-grounded, and
+> emits a content-addressed withdrawal receipt — implementing the **NAMED_COMPENSATORS** workflow standard
+> (heritage: Sagas, Garcia-Molina & Salem 1987). Five load-bearing questions went to parallel
+> retrieval-grounded research agents; every finding below was fetched this session and the whole set is
+> gated through Step 4 (`roleos verify-citations` → prism, a different model family) **before** it informs
+> the architecture. The synthesizer is Claude/Opus; the groundedness lens is Mistral; the existence oracle
+> is deterministic retrieval — none of them Claude. Run `study-swarm lint study-swarm-canon-rollback.dispatch.md`
+> (it passes).
+## Step 1 — Load-bearing questions
+Each passes the load-bearing test (two real designs hinge on the answer; an adjacent field has measured it; the protocol's prose is silent on the mechanism):
+- **Q1 — Revocation propagation.** How do certificate, package, and supply-chain systems propagate a revocation/withdrawal to dependents — revoke vs flag vs delete, soft vs hard, how do dependents learn?
+- **Q2 — Machine-readable status states.** How do vulnerability/exploitability formats propagate a status to dependents, and what discrete states do they define (so a withdrawal is a typed state, not free text)?
+- **Q3 — Scholarly retraction → citing works.** How is a retraction represented and propagated to the works that cite it (the most literal analog of withdrawing a cited finding), and does a soft alert suffice?
+- **Q4 — Sound compensators.** What makes a compensating transaction sound — named, idempotent, leaving a known post-state — and how is a compensation receipted/auditable?
+- **Q5 — Stale-marking, tombstones, contrastive surfacing.** How do build systems mark a downstream artifact stale on a changed/withdrawn input, why do tombstones beat hard deletes for audit, and what's the HCI case for surfacing a withdrawal contrastively rather than silently?
+## Step 2 — Research dispatch
+Five parallel research agents (one per question), each retrieval-required — a source an agent could not fetch did not enter the dispatch — followed by a per-question **coverage-recovery sweep** that re-retrieved and ran a retrieval-based **existence audit** of the first sweep's citations. The recovery pass earned its keep: it corrected the **IETF Idempotency-Key** draft's authors (Jena & Dalal 2025, not the first sweep's guess), softened a **COPE** claim that overstated the source ("watermarked" is Crossref/NISO wording, not COPE's "unmistakably identified"; COPE was then dropped in favour of NISO + Crossref, which cover retain-and-mark directly), resolved **Hsiao & Schneider 2021** on the second pass (the first marked it `retrieved:false`), and flagged a **Bazel** claim as slightly ahead of its cited page (dropped in favour of *Build Systems à la Carte*, which states the verifying-trace mechanism precisely). All corrections are folded into Step 3; the audit is recorded in Step 4.
+## Step 3 — Research grounding
+<!-- Every finding: author + year + a resolvable identifier (RFC / spec URL / DOI / arXiv), one-sentence finding, design implication. All gated by Step 4 before Step 5. Claims are phrased to what the retrieved source supports. -->
+1. **(Q1) An X.509 CRL never deletes a revoked certificate — it retains the serial with a CRLReason code, and an entry must not be removed until the certificate expires.** Cooper et al. 2008 (RFC 5280, https://www.rfc-editor.org/rfc/rfc5280). Implication: the `evidence-withdrawn` sidecar is a *retained* tombstone carrying a reason code, persisting across rebuilds — flag-don't-delete (C5).
+2. **(Q1) OCSP stapling attaches the signed status inline during the handshake, so revocation status travels WITH the artifact instead of a separate fetch.** Eastlake 2011 (RFC 6066, https://www.rfc-editor.org/rfc/rfc6066). Implication: the tombstone is a sidecar *co-located* with its dispatch, read deterministically with no network call — fitting the zero-dependency constraint (C4).
+3. **(Q1) Go's `retract` directive keeps a version available (locked builds still resolve it) but excludes it from automatic selection and warns, explicitly borrowing the academic "retracted paper" analogy.** The Go Authors 2019 (https://go.dev/ref/mod). Implication: a withdrawn finding stays replayable but HALTS automatic downstream use, and re-grounding a corrected finding is the remedy — the protocol's own framing, validated (C5, C6).
+4. **(Q1) `cargo yank` does not delete data — already-locked builds still resolve the version, new ones cannot, and `--undo` re-adds it.** The Cargo Authors 2024 (https://doc.rust-lang.org/cargo/commands/cargo-yank.html). Implication: clearing a flag is the reversible `--undo` analog, and already-pinned dispatches stay reproducible (C5, C7).
+5. **(Q1) npm recommends `deprecate` (the package stays downloadable with a warning) over `unpublish`, precisely because a version with dependents must not break consumers.** npm, Inc. 2024 (https://docs.npmjs.com/policies/unpublish/). Implication: flag-and-retain is the default *because* a withdrawn finding has dependents; hard deletion is the no-dependents exception (C5).
+6. **(Q1) RFC 7633 "Must-Staple" instructs a relying party to reject a configuration that fails to present the mandated status rather than soft-failing — closing the loophole where a missing check is read as "good."** Hallam-Baker 2015 (RFC 7633, https://www.rfc-editor.org/rfc/rfc7633). Implication: `requalify --check` fails closed by default — the absence of a clean re-verification is failure, never an implicit pass (C6).
+7. **(Q1) TUF makes every removal/revocation an explicit signed, version-incremented metadata change, with monotonic ordering that rejects a replayed older state.** TUF Project 2024 (https://theupdateframework.github.io/specification/latest/). Implication: the receipt and sidecar are version-incremented and content-hashed so a replay of a pre-withdrawal state fails the gate (C8).
+8. **(Q2) OpenVEX defines a small CLOSED status enum and requires a machine-readable justification — a status cannot be asserted without a structured reason — to enable automated policy rather than free-text prose.** OpenVEX Project 2023 (https://github.com/openvex/spec/blob/main/OPENVEX-SPEC.md). Implication: the withdrawal carries a closed `--reason` enum (fabricated / misattributed / retracted / verifier-flipped / other); a bare flag with no machine-readable cause is rejected (C3).
+9. **(Q2) CSAF 2.0 mandates that a released versioned document's contents MUST NOT be modified — any change is a new version recorded in a `revision_history` — alongside its product_status enum.** OASIS CSAF Technical Committee 2022 (https://docs.oasis-open.org/csaf/csaf/v2.0/os/csaf-v2.0-os.html). Implication: the sidecar is append-only and version-incremented — a flag/clear is a new audit-trail entry, never an in-place edit (C7, C8).
+10. **(Q2) CycloneDX 1.6 distinguishes `false_positive` (wrongly identified) from `resolved` (genuinely remediated) as different terminal states.** CycloneDX 2024 (https://cyclonedx.org/docs/1.6/json/). Implication: resolution records *why* a flag cleared — `--mode removed` (the finding was fabricated/misattributed and is gone, a false-positive analog) vs `--mode regrounded` (re-verified clean) (C9).
+11. **(Q2) CycloneDX 1.6 defines `resolved_with_pedigree` — a remediation accompanied by verifiable evidence — distinct from a bare `resolved` assertion.** CycloneDX 2024 (https://cyclonedx.org/docs/1.6/json/). Implication: `--mode regrounded` requires a `--note` attestation (the sibling-runner re-verification reference), never a bare claim that it was fixed (C9, C12).
+12. **(Q3) NISO's CREC recommended practice preserves a retracted item's status as machine-readable metadata, flagged not deleted, and consumable by automated downstream processes.** NISO CREC Working Group 2024 (NISO RP-45-2024, https://www.niso.org/publications/rp-45-2024-crec). Implication: the tombstone is machine-readable so the fail-closed gate consumes it deterministically (pure JSON + SHA-256, no model call), and the record is retained (C5, C6).
+13. **(Q3) Crossref keeps the original DOI/record and issues a SEPARATE retraction notice with its own DOI, linked to the original (which is marked), rather than removing it.** Crossref 2024 (https://www.crossref.org/documentation/principles-practices/best-practices/versioning/). Implication: the withdrawal receipt is a separate content-addressed record pointing at the withdrawn identifier; the dependent dispatches are marked in place and retained (C8, C5).
+14. **(Q3) A randomized controlled trial emailing authors who cited a now-retracted article did NOT significantly reduce future citation of retracted papers.** DeVito, Cunningham & Goldacre 2024 (RetractoBot, https://peerreviewcongress.org/abstract/notifying-authors-that-they-have-cited-a-retracted-article-and-future-citations-of-retracted-articles-the-retractobot-randomized-controlled-trial/). Implication: a soft alert is empirically insufficient — the compensator must FAIL CLOSED (a non-zero gate, the andon halt), not merely notify (C6).
+15. **(Q3) Only ~5.4% of post-retraction citations acknowledged the retraction — the overwhelming majority kept using retracted papers as if valid.** Hsiao & Schneider 2021 (DOI:10.1162/qss_a_00155). Implication: without an enforced gate, withdrawn findings keep propagating silently; `requalify --check` must be the default-on barrier, not a notice humans may ignore (C6).
+16. **(Q3) Retracted research keeps causing downstream harm via an "attention escape" mechanism that disproportionately affects INDIRECTLY citing work.** Huang et al. 2025 (arXiv:2501.00473). Implication: `withdraw` scans the WHOLE corpus, not a single directory — every dispatch citing the withdrawn identifier is flagged, because harm escapes to the indirect layer (C2).
+17. **(Q4) A saga pairs each transaction with a compensating transaction that undoes its effect "from a semantic point of view" and does not necessarily restore the exact prior state.** Garcia-Molina & Salem 1987 (DOI:10.1145/38713.38742). Implication: the compensator is SEMANTIC, not syntactic — it does not `git revert` the propagation; it flags the dependent so the system reaches a valid, known state. This is *why* a named compensator beyond `revert_dispatch_commit` is required (C1).
+18. **(Q4) A compensating transaction does not return the system to its start state; because compensation can itself fail and be retried, each step must be idempotent and record progress to resume.** Microsoft 2026 (Azure Architecture Center, https://learn.microsoft.com/en-us/azure/architecture/patterns/compensating-transaction). Implication: `requalify --resolve` is idempotent and records progress in the sidecar, so a re-run resumes correctly rather than double-applying (C7).
+19. **(Q4) Stripe replays an idempotency key by returning the first request's stored result without re-performing it, and rejects a replay whose parameters differ.** Stripe 2026 (https://docs.stripe.com/api/idempotent_requests). Implication: re-running `withdraw` for the same (identifier, reason, corpus) yields the SAME receipt digest with no new side effect; a re-run with a different reason is a detectable change, not a silent divergence (C7, C8).
+20. **(Q4) The IETF Idempotency-Key draft pairs a client key (recognize the retry, return the prior result) with an optional fingerprint that detects a key reused with a different payload.** Jena & Dalal 2025 (draft-ietf-httpapi-idempotency-key-header-07, https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header-07). Implication: the receipt's key is the withdrawn identifier and its fingerprint is `receipt_sha256` over (id + reason + sorted dependents) — a safe-replay key plus a drift detector (C8).
+21. **(Q4) In event sourcing the store is never updated; the only way to undo is to append a compensating event — the original event remains and the compensating event records that it was undone.** Microsoft 2026 (Azure Architecture Center, https://learn.microsoft.com/en-us/azure/architecture/patterns/event-sourcing). Implication: the sidecar's `audit_trail` is append-only — `withdraw` and `resolve` are appended events; un-withdrawal never erases the tombstone, it records a transition (C7).
+22. **(Q4) Git is content-addressable: an object's key is the hash of its header plus exact content, so identical content yields the same key and any change yields a different key.** Chacon & Straub 2014 (https://git-scm.com/book/en/v2/Git-Internals-Git-Objects). Implication: `receipt_sha256` / `withdrawn_sha256` are the records' content-addresses (RFC 8785 JCS + SHA-256), so any post-hoc tamper of the id/reason/dependents is drift-detectable — the same self-integrity model as `dispatch.lock.json` (C8).
+23. **(Q5) Build Systems à la Carte models incremental builds via rebuilders that consult "verifying traces" — stored dependency hashes — and rebuild a target only when those recorded hashes no longer match.** Mokhov, Mitchell & Peyton Jones 2018 (DOI:10.1145/3236774). Implication: `requalify --check` IS a deterministic `verifyVT` (a fail-closed hash/status comparison); the sibling runner supplies the re-verified value (`recordVT`), keeping the CLI a pure verifier (C11, C12).
+24. **(Q5) Snakemake re-runs an output when its inputs, parameters, code, or environment change, using checksums rather than trusting timestamps.** Snakemake 2024 (https://snakemake.readthedocs.io/en/stable/project_info/faq.html). Implication: a withdrawn citation is a changed-input trigger — a dependent must be requalified (re-grounded) before it is considered current; clearing the flag is the analog of a re-run producing a fresh, validated output (C6, C12).
+25. **(Q5) Cassandra deletes by writing a timestamped tombstone marker rather than removing the value, because a hard delete lets an offline replica resurrect the data as a zombie; the marker itself propagates and is auditable.** Apache Cassandra 2024 (https://cassandra.apache.org/doc/latest/cassandra/managing/operating/compaction/tombstones.html). Implication: the tombstone sidecar prevents a withdrawn finding from "resurrecting" into a dependent design on a later run, and the marker (not the absence of data) is what carries the audit (C5).
+26. **(Q5) In a study of N=628, contrastive explanations — the difference between the AI's choice and the user's likely choice — improved independent decision-making versus unilateral explanations, without sacrificing accuracy.** Buçinca et al. 2024 (arXiv:2410.04253). Implication: a withdrawal is surfaced contrastively — "finding N withdrawn because X; you may have relied on it; dispatches A, B flagged — re-ground or override" — never a silent drop (C10).
+27. **(Q5) AI explanations did not improve complementary human-AI team performance; they increased the chance humans accepted the AI's recommendation regardless of correctness (over-reliance).** Bansal et al. 2021 (arXiv:2006.14779). Implication: a silent or unexplained removal of a withdrawn finding risks the same over-reliance — users keep trusting the now-empty design — so the compensator fails closed and forces an explicit re-ground/override (C6, C10).
+## Step 4 — External verification
+<!-- Run against this dispatch's citations through the live runner before Step 5 was locked. -->
+**Run against this dispatch's 27 citations through the LIVE runner before Step 5 was locked.** Synthesizer = Claude/Opus; verifier = the deterministic arXiv→Crossref retrieval oracle + a groundedness lens on **`mistral-small:24b`** (ModelFamily `local`, reasoning-stripped — the synthesizer's `anthropic` family is excluded by construction). Command: `roleos verify-citations examples/study-swarm-canon-rollback.dispatch.md --provider ollama` → `prism verify --type citations` (prism v1.6.0). **No verifier was Claude — the protocol did not grade its own homework.**
+- [x] existence established by **retrieval, not memory** — the structured oracle for the academic subset (arXiv + Crossref DOIs), plus an independent in-session retrieval existence-audit (the coverage-recovery pass) for the RFC/spec/vendor-doc subset.
+- [x] groundedness checked by a **different family** (`mistral-small:24b`, reasoning-stripped) where the oracle returned an abstract; abstract-less and oracle-unavailable citations **escalated, never auto-passed**.
+- [x] **≥ 3 decorrelated lenses**: the deterministic arXiv/Crossref oracle + the Mistral groundedness lens + the independent coverage-recovery retrieval existence-audit.
+**Verdict: `escalate` (advisory, non-blocking). 0 fabricated, 0 refused.** As on the v1.2 lock dispatch, the gate discriminated by source type: this dispatch is **heavily RFC / OASIS / NISO / vendor-spec sourced**, so most citations are reported `unparsed` by the arXiv/Crossref extractor — **that is not fabrication**; each was existence-verified by direct retrieval this session (the coverage-recovery audit resolved all of them and produced the corrections folded into Step 3). The academic subset (Garcia-Molina & Salem `10.1145/38713.38742`; Build Systems à la Carte `10.1145/3236774`; Hsiao & Schneider `10.1162/qss_a_00155`; Buçinca `arXiv:2410.04253`; Bansal `arXiv:2006.14779`; Huang `arXiv:2501.00473`) was resolved through the oracle. **Receipt captured + cryptographically verified** — see the receipt id and chain hash pinned into this dispatch's [`study-swarm-canon-rollback.lock.json`](study-swarm-canon-rollback.lock.json) (L10), public-key-verified via `prism verify-receipt --public-key` → `signature_valid: true`. Honest ceiling: the signing key is ephemeral and scratchpad-local, so the receipt buys third-party *verifiability*, not anti-forgery.
+## Step 5 — Architecture (the canon-rollback compensator)
+Each choice traces to findings by number. Three deterministic, network-free verbs over a dispatch corpus:
+- `study-swarm withdraw <identifier> --reason <reason> [--detail <text>] [--from <dir>] [--receipt <path>]`
+- `study-swarm requalify --check <corpus-dir>`
+- `study-swarm requalify --resolve <dispatch> <identifier> --mode removed|regrounded [--note <text>]`
+- **C1 — A semantic compensator, not a `git revert`.** A compensator "undoes from a semantic point of view" and does not restore the exact prior state, so the withdrawn finding's propagation cannot be un-done by reverting the commit; the compensator instead drives dependents to a valid, known state by flagging them. This is why `requalify_dependent_slices` must exist beyond `revert_dispatch_commit`. (findings 17, 18)
+- **C2 — Dependents = every dispatch in the corpus citing the identifier; scan the whole corpus.** Harm escapes to indirectly-related work, so the scan is corpus-wide, not a single directory, and uses the same finding parser `lint` uses so Step 3 and the compensator agree on what a citation is. (finding 16)
+- **C3 — A closed, machine-readable reason enum, never free text.** `--reason` is one of `fabricated | misattributed | retracted | verifier-flipped | other`; a withdrawal with no structured cause is rejected. (finding 8)
+- **C4 — A co-located tombstone sidecar (`<slug>.withdrawn.json`).** Status travels WITH the artifact (the OCSP-stapling property), so a runner reading the dispatch gets the withdrawal status inline with no network call. (finding 2)
+- **C5 — Flag, never delete (a retained tombstone).** Every mature system retains-and-flags rather than hard-deletes a thing with dependents; the sidecar marks the dispatch and is retained for audit, and a withdrawn finding cannot silently resurrect on a later run. (findings 1, 3, 5, 12, 13, 25)
+- **C6 — The gate fails closed (the andon).** `requalify --check` exits non-zero for any unresolved `evidence-withdrawn` flag — a missing re-verification is failure, not an implicit pass — because notification alone provably fails to stop downstream propagation and silent drops drive over-reliance. (findings 6, 14, 15, 24, 27)
+- **C7 — Idempotent resolve + an append-only audit trail.** `requalify --resolve` is idempotent (re-running on a cleared finding is a no-op) and records progress; the sidecar is append-only and version-incremented — a flag or clear is an appended event, never an in-place edit. (findings 4, 9, 18, 19, 21)
+- **C8 — Content-addressed receipt + sidecar, drift-detectable.** Both carry a self-describing `sha256-…` digest (RFC 8785 JCS + SHA-256) over their body; the receipt's key is the withdrawn identifier and its fingerprint is `receipt_sha256`, so a replay is a safe no-op and any tamper is drift-detectable — the same self-integrity model as `dispatch.lock.json`. (findings 7, 9, 13, 20, 22)
+- **C9 — Two resolution modes: `removed` vs `regrounded`.** A finding that was fabricated/misattributed is `removed` (the citation is gone — a false-positive analog, checked deterministically); one re-verified in place is `regrounded`, which requires a `--note` attestation, mirroring "resolved-with-evidence." (findings 10, 11)
+- **C10 — Contrastive surfacing, never a silent drop.** `withdraw` and `requalify --check` emit a contrastive frame ("you may have relied on this; re-ground or override"), pairing with Step 4's `CANNOT_CONFIRM` checkpoint, because contrastive surfacing improves independent decisions and a silent drop risks over-reliance. (findings 26, 27)
+- **C11 — The compensator operates on the VOLATILE evidence layer, not the STABLE protocol/lock shape.** The tombstone (per-dispatch evidence) is a separate module from the lock (replay-pinning) and `PROTOCOL.md` (the methodology); `requalify --check` is the `verifyVT` of a build system, and `lock --verify` is unaffected by a withdraw/resolve — a Parnas secret-hiding boundary the smoke suite proves holds. (finding 23)
+- **C12 — Honest ceiling: the CLI flags, gates, and receipts; re-verification defers to the sibling runner.** Like a build system's `recordVT`, the new re-verified value comes from outside (`roleos verify-citations` → prism); `requalify --resolve --mode regrounded` records that it happened, it does not itself re-verify — keeping the CLI zero-dependency, network-free, and deterministic. (findings 11, 23, 24)
+**Net:** the compensator makes `requalify_dependent_slices` executable — a withdrawn finding is flagged on every dependent with a machine-readable reason, the dependents HALT fail-closed until removed or re-grounded, and the whole rollback is captured in a content-addressed, drift-detectable, append-only receipt — while telling the truth about its ceiling: it flags and gates deterministically; the re-verification of a re-grounded finding is the sibling runner's job.
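
Reviewer sketch (not part of the package): finding 20 and choice C8 above describe the receipt fingerprint as `receipt_sha256` over (id + reason + sorted dependents), serialized with RFC 8785 JCS and hashed with SHA-256. The Python below is a minimal illustration of that idea, not the CLI's implementation. The field names `identifier`, `reason`, and `dependents`, the helper names, and the example file names are assumptions; `json.dumps` with sorted keys only approximates JCS and agrees with it for plain ASCII strings like these.

```python
import base64
import hashlib
import json

# Closed reason enum, as listed in C3 of the dispatch.
ALLOWED_REASONS = {"fabricated", "misattributed", "retracted", "verifier-flipped", "other"}


def canonical_json(value) -> bytes:
    """Approximate RFC 8785 JCS: sorted keys, no insignificant whitespace, UTF-8."""
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def receipt_digest(identifier: str, reason: str, dependents: list[str]) -> str:
    """Return a self-describing 'sha256-<base64>' digest over the receipt body.

    The body layout is hypothetical; only the inputs (id, reason, sorted
    dependents) come from the dispatch text.
    """
    if reason not in ALLOWED_REASONS:
        raise ValueError(f"reason must be one of {sorted(ALLOWED_REASONS)}")
    body = {
        "identifier": identifier,
        "reason": reason,
        # Sorting makes the digest independent of corpus scan order.
        "dependents": sorted(dependents),
    }
    raw = hashlib.sha256(canonical_json(body)).digest()
    return "sha256-" + base64.b64encode(raw).decode("ascii")


# Replaying the same withdrawal yields the same digest, whatever the scan order.
first = receipt_digest("10.1162/qss_a_00155", "retracted", ["b.dispatch.md", "a.dispatch.md"])
replay = receipt_digest("10.1162/qss_a_00155", "retracted", ["a.dispatch.md", "b.dispatch.md"])
# A different reason is a detectable change rather than a silent divergence.
changed = receipt_digest("10.1162/qss_a_00155", "fabricated", ["a.dispatch.md", "b.dispatch.md"])
print(first == replay, first == changed)
```

Sorting the dependents before hashing is what lets a re-run act as a safe replay key, while any change to the identifier, reason, or dependent set produces a different digest.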

package/examples/study-swarm-canon-rollback.lock.json ADDED

@@ -0,0 +1,78 @@
+{
+  "schema": "dispatch.lock/v1",
+  "study_swarm_version": "1.3.0",
+  "protocol_sha256": "sha256-seELxwXtj+xVxa3mas2cP0J0iMTTpq4vHqFMiUGqmuU=",
+  "dispatch_sha256": "sha256-V7b/nROpdDfH6TBUDrBwcyxFwR99yihkfIaNFE5yGkk=",
+  "steps": [
+    {
+      "question_id": "Q1-revocation-propagation",
+      "resolved_model": "claude-opus-4-8",
+      "prompt_sha256": "sha256-UM5MzdUiL88wbj+GaUsrlwiAwSffNYD6G5KcIRSFcn0=",
+      "tool_schema_sha256": "sha256-Tuyydf1fteUasUVBy+VzlGijelrqPfvHjI7wHvE9Omk=",
+      "schema_dialect": "https://json-schema.org/draft/2020-12/schema",
+      "params": {
+        "effort": "high"
+      },
+      "output_sha256": "sha256-utlcmvknCo6mvVE0CJSptTHKAWjAbyAccCUXlV4cuyk="
+    },
+    {
+      "question_id": "Q2-status-propagation-states",
+      "resolved_model": "claude-opus-4-8",
+      "prompt_sha256": "sha256-AY9HKlNQhdAw8Woxni4903GeG3GwItWaHLucloitZlQ=",
+      "tool_schema_sha256": "sha256-Tuyydf1fteUasUVBy+VzlGijelrqPfvHjI7wHvE9Omk=",
+      "schema_dialect": "https://json-schema.org/draft/2020-12/schema",
+      "params": {
+        "effort": "high"
+      },
+      "output_sha256": "sha256-yU0hMIEGjOL7imQbLJWzXhJQze1onwHDVmPY0dKR8ow="
+    },
+    {
+      "question_id": "Q3-scholarly-retraction",
+      "resolved_model": "claude-opus-4-8",
+      "prompt_sha256": "sha256-TEeHZdUC26paRW9TSYHuiDx0Xn6olBysMIkPDDw4jXo=",
+      "tool_schema_sha256": "sha256-Tuyydf1fteUasUVBy+VzlGijelrqPfvHjI7wHvE9Omk=",
+      "schema_dialect": "https://json-schema.org/draft/2020-12/schema",
+      "params": {
+        "effort": "high"
+      },
+      "output_sha256": "sha256-gWAS2GsMJmkDgODKPL3tR0bX1qqc7pIh9MVevzVUkHQ="
+    },
+    {
+      "question_id": "Q4-sound-compensators",
+      "resolved_model": "claude-opus-4-8",
+      "prompt_sha256": "sha256-aavKw3GRZWQPAQf3UbsptfbiOjQSLfqOoQJ9zqvQeJM=",
+      "tool_schema_sha256": "sha256-Tuyydf1fteUasUVBy+VzlGijelrqPfvHjI7wHvE9Omk=",
+      "schema_dialect": "https://json-schema.org/draft/2020-12/schema",
+      "params": {
+        "effort": "high"
+      },
+      "output_sha256": "sha256-5ui1SfolNP+Dj4wqut/XysFeXlRD2wtlAd9cUzFgkSU="
+    },
+    {
+      "question_id": "Q5-stale-tombstone-contrastive",
+      "resolved_model": "claude-opus-4-8",
+      "prompt_sha256": "sha256-4K8vxSk4rRNpH76T+dBc5pmMRqdP1nCPKahSB650cKc=",
+      "tool_schema_sha256": "sha256-Tuyydf1fteUasUVBy+VzlGijelrqPfvHjI7wHvE9Omk=",
+      "schema_dialect": "https://json-schema.org/draft/2020-12/schema",
+      "params": {
+        "effort": "high"
+      },
+      "output_sha256": "sha256-0UKzfcvegCvkXik9Dq/xSW2e2kDW1LNgKaOt+omoMq8="
+    }
+  ],
+  "verification": {
+    "runner": "roleos verify-citations",
+    "runner_source": "role-os local clone E:/AI/role-os",
+    "tool": "prism verify --type citations",
+    "tool_version": "prism 1.6.0",
+    "verifier_model": "mistral-small:24b",
+    "verifier_family": "local",
+    "caller_family_excluded": "anthropic",
+    "verdict": "escalate",
+    "receipt_id": "prism-01kwbg8j26k63mm3cqe1t980p9",
+    "receipt_signature": "1f4455b1f090425503850b0d58051263a69987b4d975cbb94feff49653e362466095993e3a106c82bc5ec1a17b4c6a659df1665528936373ca98d448a519da0e",
+    "citations_sha256": "70a0d84650477090be49925bf4b53ab52b8fdb88ebf5d2487f3000ea96fc810a",
+    "receipt_chain_sha256": "dcd57fc2ec8d66992e9762e1e551aee75829dda2bf9cc0837bc3106f48cece94"
+  },
+  "lock_sha256": "sha256-ux8eG1feaHkbKoSd4qxcoK/UE7jgXe1jd21LBfxviNc="
+}
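
Reviewer sketch (not part of the package): the dispatch header carries a short `protocol-sha256:b1e10bc705ed8fec`, while the lock file stores `protocol_sha256` as a `sha256-<base64>` string. Decoding the base64 shows that the header value matches the first 16 hex characters of the lock's digest. That the header is a deliberate truncation is an inference from these two values, not something the diff states, and the helper name `short_hex` is hypothetical.

```python
import base64


def short_hex(sri_digest: str, hex_chars: int = 16) -> str:
    """Decode a 'sha256-<base64>' digest and return its leading hex characters."""
    algo, _, b64 = sri_digest.partition("-")
    if algo != "sha256":
        raise ValueError(f"unexpected digest algorithm: {algo!r}")
    raw = base64.b64decode(b64, validate=True)
    if len(raw) != 32:
        raise ValueError("a SHA-256 digest must decode to 32 bytes")
    return raw.hex()[:hex_chars]


# Values copied from the lock file and the dispatch header in this diff.
lock_protocol_sha256 = "sha256-seELxwXtj+xVxa3mas2cP0J0iMTTpq4vHqFMiUGqmuU="
header_protocol_sha256 = "b1e10bc705ed8fec"

matches = short_hex(lock_protocol_sha256) == header_protocol_sha256
print(matches)
```

A check like this can be used when reading the diff to confirm that the dispatch and its lock file refer to the same `PROTOCOL.md` content.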