@lcv-ideas-software/cross-review 4.2.5 → 4.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (24)

package/CHANGELOG.md +30 -0
package/README.md +9 -2
package/dist/scripts/eval-fixtures.d.ts +54 -0
package/dist/scripts/eval-fixtures.js +216 -0
package/dist/scripts/eval-fixtures.js.map +1 -0
package/dist/scripts/smoke.js +187 -0
package/dist/scripts/smoke.js.map +1 -1
package/dist/src/core/config.d.ts +1 -1
package/dist/src/core/config.js +1 -1
package/dist/src/core/orchestrator.js +27 -2
package/dist/src/core/orchestrator.js.map +1 -1
package/dist/src/core/reports.d.ts +2 -1
package/dist/src/core/reports.js +30 -0
package/dist/src/core/reports.js.map +1 -1
package/dist/src/core/session-store.d.ts +2 -1
package/dist/src/core/session-store.js +141 -0
package/dist/src/core/session-store.js.map +1 -1
package/dist/src/core/types.d.ts +30 -0
package/dist/src/mcp/server.js +15 -0
package/dist/src/mcp/server.js.map +1 -1
package/docs/apresentacao-cross-review.md +10 -8
package/docs/apresentacao.md +23 -20
package/docs/architecture.md +5 -0
package/package.json +2 -1

package/CHANGELOG.md CHANGED

@@ -7,6 +7,36 @@ standard `v00.00.00`; npm package versions remain SemVer.
 ## [Unreleased]
+## [v04.03.00] — 2026-06-05
+**Minor — P1/P2/P3 audit follow-up.** This release closes the first concrete
+items from the post-v4.2.5 runtime/session audit: unresolved evidence is harder
+to miss at finalization time, fixture-level regressions can be evaluated
+offline, and operators get a read-only peer reliability report without changing
+peer selection.
+### Added
+- Added `session_peer_reliability_report`, a read-only MCP tool that aggregates
+  per-peer parser warnings, decision quality, rejected/provider failures,
+  evidence checklist dispositions, fabrication-related events, latency and
+  cost.
+- Added `npm run eval:fixtures`, an offline fixture harness for truthfulness
+  preflight, parser diagnostics and report rendering contracts. It does not
+  start provider sessions or call reviewers.
+- `session_report` now includes an **Unresolved Evidence Disposition** section
+  when checklist items remain `open` or `not_resurfaced`.
+### Changed
+- Automatic convergence with unresolved checklist items now finalizes with
+  `unanimous_ready_with_unresolved_evidence` or
+  `recovered_unanimity_with_unresolved_evidence` instead of a plain success
+  reason.
+- Finalization now emits `session.evidence_checklist_unresolved_on_finalize`
+  with unresolved counts and item summaries when a session closes while
+  evidence asks are still open or only inferred as not resurfaced.
 ## [v04.02.05] — 2026-06-05
 **Patch — session audit hardening.** This release closes follow-ups from the
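The finalization change described in the v04.03.00 entry can be pictured with a minimal sketch. This is not the package's implementation: `finalizeReason`, the plain reason names passed in, and the shape of the event `data` payload are all assumptions made for illustration. Only the `*_with_unresolved_evidence` suffix, the `open` and `not_resurfaced` statuses, and the event type come from the changelog text.

```javascript
// Hypothetical helper, not taken from the package source.
// Checklist statuses that the changelog treats as unresolved at finalization.
const UNRESOLVED_STATUSES = new Set(["open", "not_resurfaced"]);

function finalizeReason(baseReason, checklist) {
  const unresolved = checklist.filter((item) => UNRESOLVED_STATUSES.has(item.status));
  if (unresolved.length === 0) {
    // Nothing outstanding: keep the plain success reason and emit no event.
    return { reason: baseReason, event: null };
  }
  return {
    reason: `${baseReason}_with_unresolved_evidence`,
    event: {
      type: "session.evidence_checklist_unresolved_on_finalize",
      // Payload field names below are assumed for this sketch.
      data: {
        unresolved_count: unresolved.length,
        items: unresolved.map(({ id, peer, status, ask }) => ({ id, peer, status, ask })),
      },
    },
  };
}

const outcome = finalizeReason("unanimous_ready", [
  { id: "eval-1", peer: "codex", status: "not_resurfaced", ask: "attach raw npm test output" },
  { id: "eval-2", peer: "codex", status: "addressed", ask: "link the CI run" },
]);
console.log(outcome.reason);
```

Deriving the reason from the checklist at close-out keeps the unresolved state visible in the final metadata without blocking convergence, which matches the "harder to miss" framing of the release notes.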

package/README.md CHANGED

@@ -24,7 +24,7 @@ npm install -g @lcv-ideas-software/cross-review
 npm install -g @lcv-ideas-software/cross-review --registry=https://npm.pkg.github.com
 ```
-**Status.** Stable. Current release: **v04.02.05** (npm package `4.2.5`). See [CHANGELOG.md](./CHANGELOG.md) for the full release history.
+**Status.** Stable. Current release: **v04.03.00** (npm package `4.3.0`). See [CHANGELOG.md](./CHANGELOG.md) for the full release history.
 > **Project renamed 2026-05-15.** This project was previously published as
 > [`@lcv-ideas-software/cross-review-v2`](https://www.npmjs.com/package/@lcv-ideas-software/cross-review-v2)
@@ -38,6 +38,7 @@ The version history at a glance:
 | Release              | Scope                                                                                                                                                                                                              |
 | -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| **`v04.03.00`**      | Minor — P1/P2/P3 follow-up with unresolved-evidence close-out visibility, an offline fixture eval harness, and a read-only peer reliability report.                                                                |
 | **`v04.02.05`**      | Patch — harden session auditability with terminal events, cost split reporting, `not_resurfaced` visibility, and relator provenance checks for session IDs/GitHub URLs.                                            |
 | **`v04.02.04`**      | Patch — harden truthfulness preflight auditability, add a read-only preflight retest tool, and reduce false parser warnings for attached/log evidence.                                                             |
 | **`v04.02.03`**      | Patch — promote the Gemini canonical default to `gemini-3.1-pro-preview` and refresh the active local Gemini rate card.                                                                                            |
@@ -211,6 +212,7 @@ these environment variables before running real sessions (example):
 - `session_metrics`
 - `session_doctor`
 - `session_report`
+- `session_peer_reliability_report`
 - `session_check_convergence`
 - `session_truthfulness_preflight_check`
 - `session_attach_evidence`
@@ -228,7 +230,12 @@ these environment variables before running real sessions (example):
 lack terminal events, and reports peer-call cost separately from generation
 artifact cost. `session_report` uses the same split and calls out
 `not_resurfaced` evidence checklist items as inference-only, not proof that the
-requested evidence was satisfied.
+requested evidence was satisfied. If a session otherwise reaches unanimity with
+open or `not_resurfaced` checklist items, finalization records an
+`*_with_unresolved_evidence` outcome reason and emits a durable unresolved
+evidence event. `session_peer_reliability_report` is read-only and aggregates
+per-peer parser warnings, evidence ask status, provider failures, cost and
+latency.
 ## Repository conventions
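The read-only aggregation that the README attributes to `session_peer_reliability_report` can be sketched as a pure function over stored session metadata. The counter names `ready`, `needs_evidence`, `parser_warnings_total`, `provider_errors` and `not_resurfaced_asks` mirror those asserted in the package's smoke test; the function name `peerReliability`, the `total_cost` and `latency_ms_total` fields, and the overall algorithm are assumptions, not the package's code.

```javascript
// Illustrative aggregation only; the real report lives in SessionStore and may differ.
function peerReliability(rounds, checklist) {
  const byPeer = {};
  // Lazily create a zeroed counter row for each peer we encounter.
  const row = (peer) =>
    (byPeer[peer] ??= {
      ready: 0,
      needs_evidence: 0,
      parser_warnings_total: 0,
      provider_errors: 0,
      not_resurfaced_asks: 0,
      total_cost: 0, // assumed field name
      latency_ms_total: 0, // assumed field name
    });

  for (const round of rounds) {
    for (const p of round.peers) {
      const r = row(p.peer);
      if (p.status === "READY") r.ready += 1;
      if (p.status === "NEEDS_EVIDENCE") r.needs_evidence += 1;
      r.parser_warnings_total += p.parser_warnings.length;
      r.total_cost += p.cost.total_cost;
      r.latency_ms_total += p.latency_ms;
    }
    for (const rejected of round.rejected) {
      if (rejected.failure_class === "provider_error") row(rejected.peer).provider_errors += 1;
    }
  }
  for (const item of checklist) {
    if (item.status === "not_resurfaced") row(item.peer).not_resurfaced_asks += 1;
  }
  // The inputs are only read, never mutated, in keeping with the read-only contract.
  return { scope: "all", by_peer: byPeer };
}

const reliability = peerReliability(
  [
    {
      peers: [
        { peer: "claude", status: "NEEDS_EVIDENCE", parser_warnings: [], latency_ms: 50, cost: { total_cost: 1 } },
        { peer: "grok", status: "READY", parser_warnings: ["verified_without_concrete_evidence_sources"], latency_ms: 100, cost: { total_cost: 2 } },
      ],
      rejected: [{ peer: "perplexity", failure_class: "provider_error" }],
    },
  ],
  [{ peer: "claude", status: "not_resurfaced" }],
);
console.log(JSON.stringify(reliability.by_peer.grok));
```

Because the function only reads its inputs and returns a fresh object, it cannot influence peer selection, which is the property the release notes emphasize.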

package/dist/scripts/eval-fixtures.d.ts ADDED

@@ -0,0 +1,54 @@
+export declare const truthfulnessCases: readonly [{
+    readonly name: "current runtime contradiction is blocked";
+    readonly input: {
+        readonly task: "The current cross-review runtime is 4.2.4.";
+        readonly runtimeFacts: {
+            readonly runtime_version: "4.2.5";
+            readonly release_date: "2026-06-05";
+        };
+        readonly attachmentsPresent: false;
+    };
+    readonly expectPass: false;
+    readonly expectIssueClass: "runtime_contradiction";
+}, {
+    readonly name: "matching current runtime facts pass";
+    readonly input: {
+        readonly task: "server_info shows current cross-review runtime 4.2.5.";
+        readonly runtimeFacts: {
+            readonly runtime_version: "4.2.5";
+            readonly release_date: "2026-06-05";
+        };
+        readonly attachmentsPresent: false;
+    };
+    readonly expectPass: true;
+}, {
+    readonly name: "historical timing claim needs snapshot evidence";
+    readonly input: {
+        readonly task: "When the audit began, cross-review was running 4.2.4.";
+        readonly runtimeFacts: {
+            readonly runtime_version: "4.2.5";
+            readonly release_date: "2026-06-05";
+        };
+        readonly attachmentsPresent: false;
+    };
+    readonly expectPass: false;
+    readonly expectIssueClass: "unsupported_historical_claim";
+}];
+export declare const parserCases: readonly [{
+    readonly name: "verified with empty evidence gets empty-evidence warning";
+    readonly text: string;
+    readonly expectStatus: "READY";
+    readonly expectWarning: "verified_without_evidence_sources";
+}, {
+    readonly name: "verified with attached evidence path is concrete";
+    readonly text: string;
+    readonly expectStatus: "READY";
+    readonly absentWarning: "verified_without_evidence_sources";
+}];
+export declare const reportCases: readonly [{
+    readonly name: "cost split and unresolved evidence are surfaced";
+    readonly peerCost: 14.652426;
+    readonly generationCost: 1.876718;
+    readonly totalCost: 16.529144;
+    readonly unresolvedAsk: "attach raw npm test output";
+}];

package/dist/scripts/eval-fixtures.js ADDED

@@ -0,0 +1,216 @@
+import assert from "node:assert/strict";
+import fs from "node:fs";
+import os from "node:os";
+import path from "node:path";
+import { loadConfig } from "../src/core/config.js";
+import { truthfulnessPreflight } from "../src/core/orchestrator.js";
+import { sessionReportMarkdown } from "../src/core/reports.js";
+import { SessionStore } from "../src/core/session-store.js";
+import { parsePeerStatus } from "../src/core/status.js";
+function evalTmpDir(label) {
+    return fs.mkdtempSync(path.join(os.tmpdir(), `cross-review-eval-${label}-`));
+}
+export const truthfulnessCases = [
+    {
+        name: "current runtime contradiction is blocked",
+        input: {
+            task: "The current cross-review runtime is 4.2.4.",
+            runtimeFacts: { runtime_version: "4.2.5", release_date: "2026-06-05" },
+            attachmentsPresent: false,
+        },
+        expectPass: false,
+        expectIssueClass: "runtime_contradiction",
+    },
+    {
+        name: "matching current runtime facts pass",
+        input: {
+            task: "server_info shows current cross-review runtime 4.2.5.",
+            runtimeFacts: { runtime_version: "4.2.5", release_date: "2026-06-05" },
+            attachmentsPresent: false,
+        },
+        expectPass: true,
+    },
+    {
+        name: "historical timing claim needs snapshot evidence",
+        input: {
+            task: "When the audit began, cross-review was running 4.2.4.",
+            runtimeFacts: { runtime_version: "4.2.5", release_date: "2026-06-05" },
+            attachmentsPresent: false,
+        },
+        expectPass: false,
+        expectIssueClass: "unsupported_historical_claim",
+    },
+];
+export const parserCases = [
+    {
+        name: "verified with empty evidence gets empty-evidence warning",
+        text: JSON.stringify({
+            status: "READY",
+            summary: "ok",
+            confidence: "verified",
+            evidence_sources: [],
+            caller_requests: [],
+            follow_ups: [],
+        }),
+        expectStatus: "READY",
+        expectWarning: "verified_without_evidence_sources",
+    },
+    {
+        name: "verified with attached evidence path is concrete",
+        text: JSON.stringify({
+            status: "READY",
+            summary: "ok",
+            confidence: "verified",
+            evidence_sources: ["evidence/2026-06-05T00-00-00Z-raw-smoke.txt: npm test 42 passed"],
+            caller_requests: [],
+            follow_ups: [],
+        }),
+        expectStatus: "READY",
+        absentWarning: "verified_without_evidence_sources",
+    },
+];
+export const reportCases = [
+    {
+        name: "cost split and unresolved evidence are surfaced",
+        peerCost: 14.652426,
+        generationCost: 1.876718,
+        totalCost: 16.529144,
+        unresolvedAsk: "attach raw npm test output",
+    },
+];
+for (const testCase of truthfulnessCases) {
+    const result = truthfulnessPreflight({
+        task: testCase.input.task,
+        runtimeFacts: testCase.input.runtimeFacts,
+        attachmentsPresent: testCase.input.attachmentsPresent,
+    });
+    assert.equal(result.pass, testCase.expectPass, testCase.name);
+    if ("expectIssueClass" in testCase) {
+        assert.ok(result.issue_classes.includes(testCase.expectIssueClass), testCase.name);
+    }
+}
+for (const testCase of parserCases) {
+    const result = parsePeerStatus(testCase.text);
+    assert.equal(result.status, testCase.expectStatus, testCase.name);
+    if ("expectWarning" in testCase) {
+        assert.ok(result.parser_warnings.includes(testCase.expectWarning), testCase.name);
+    }
+    if ("absentWarning" in testCase) {
+        assert.ok(!result.parser_warnings.includes(testCase.absentWarning), testCase.name);
+    }
+}
+for (const testCase of reportCases) {
+    const store = new SessionStore({
+        ...loadConfig(),
+        data_dir: evalTmpDir("report"),
+    });
+    const session = await store.init(`eval report fixture: ${testCase.name}`, "operator", []);
+    const meta = store.read(session.session_id);
+    const ts = new Date().toISOString();
+    meta.rounds = [
+        {
+            round: 1,
+            started_at: ts,
+            completed_at: ts,
+            caller_status: "READY",
+            prompt_file: "agent-runs/round-1-prompt.md",
+            peers: [
+                {
+                    peer: "codex",
+                    provider: "openai",
+                    model: "gpt-5.5",
+                    status: "READY",
+                    structured: {
+                        status: "READY",
+                        summary: "ready",
+                        confidence: "verified",
+                        evidence_sources: ["server_info: version 4.2.5"],
+                        caller_requests: [],
+                        follow_ups: [],
+                    },
+                    text: "{}",
+                    raw: { fixture: true },
+                    decision_quality: "clean",
+                    parser_warnings: [],
+                    attempts: 1,
+                    latency_ms: 1,
+                    usage: { input_tokens: 1, output_tokens: 1, total_tokens: 2 },
+                    cost: {
+                        currency: "USD",
+                        estimated: false,
+                        source: "configured-rate",
+                        total_cost: testCase.peerCost,
+                    },
+                },
+            ],
+            rejected: [],
+            convergence: {
+                converged: true,
+                reason: "fixture",
+                ready_peers: ["codex"],
+                not_ready_peers: [],
+                needs_evidence_peers: [],
+                rejected_peers: [],
+                skipped_peers: [],
+                decision_quality: {
+                    codex: "clean",
+                    claude: "clean",
+                    gemini: "clean",
+                    deepseek: "clean",
+                    grok: "clean",
+                    perplexity: "clean",
+                },
+                blocking_details: [],
+            },
+        },
+    ];
+    meta.generation_files = [
+        {
+            round: 0,
+            peer: "codex",
+            label: "initial_draft",
+            path: "agent-runs/round-0-initial-draft.md",
+            ts,
+            usage: { input_tokens: 1, output_tokens: 1, total_tokens: 2 },
+            cost: {
+                currency: "USD",
+                estimated: false,
+                source: "configured-rate",
+                total_cost: testCase.generationCost,
+            },
+        },
+    ];
+    meta.totals.cost = {
+        currency: "USD",
+        estimated: false,
+        source: "configured-rate",
+        total_cost: testCase.totalCost,
+    };
+    meta.evidence_checklist = [
+        {
+            id: "eval-1",
+            peer: "codex",
+            first_round: 1,
+            last_round: 1,
+            round_count: 1,
+            ask: testCase.unresolvedAsk,
+            first_seen_at: ts,
+            last_seen_at: ts,
+            status: "not_resurfaced",
+            addressed_at_round: 2,
+            address_method: "resurfacing",
+        },
+    ];
+    fs.writeFileSync(store.metaPath(session.session_id), JSON.stringify(meta));
+    const report = sessionReportMarkdown(store.read(session.session_id), []);
+    assert.ok(report.includes("$16.529144 USD = $14.652426 peer + $1.876718 generation"));
+    assert.ok(report.includes("## Unresolved Evidence Disposition"));
+    assert.ok(report.includes(testCase.unresolvedAsk));
+}
+console.log(JSON.stringify({
+    ok: true,
+    truthfulness_cases: truthfulnessCases.length,
+    parser_cases: parserCases.length,
+    report_cases: reportCases.length,
+}));
+//# sourceMappingURL=eval-fixtures.js.map
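The report fixture above pins the cost line `$16.529144 USD = $14.652426 peer + $1.876718 generation`. As a quick check of the arithmetic behind it, here is a minimal sketch under two assumptions: the total is the sum of peer and generation cost, and amounts are rendered with six decimal places. `formatCostSplit` is a hypothetical helper, not the package's `sessionReportMarkdown`, which reads the total from session metadata.

```javascript
// Hypothetical formatter reproducing the string the fixture asserts on.
function formatCostSplit(peerCost, generationCost) {
  const total = peerCost + generationCost;
  // toFixed(6) absorbs binary floating-point noise in the sum.
  const usd = (amount) => `$${amount.toFixed(6)}`;
  return `${usd(total)} USD = ${usd(peerCost)} peer + ${usd(generationCost)} generation`;
}

const costLine = formatCostSplit(14.652426, 1.876718);
console.log(costLine);
```

The fixture's three figures are consistent with each other: 14.652426 + 1.876718 = 16.529144.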

package/dist/scripts/eval-fixtures.js.map ADDED

@@ -0,0 +1 @@

+ {"version":3,"file":"eval-fixtures.js","sourceRoot":"","sources":["../../scripts/eval-fixtures.ts"],"names":[],"mappings":"AAAA,OAAO,MAAM,MAAM,oBAAoB,CAAC;AACxC,OAAO,EAAE,MAAM,SAAS,CAAC;AACzB,OAAO,EAAE,MAAM,SAAS,CAAC;AACzB,OAAO,IAAI,MAAM,WAAW,CAAC;AAC7B,OAAO,EAAE,UAAU,EAAE,MAAM,uBAAuB,CAAC;AACnD,OAAO,EAAE,qBAAqB,EAAE,MAAM,6BAA6B,CAAC;AACpE,OAAO,EAAE,qBAAqB,EAAE,MAAM,wBAAwB,CAAC;AAC/D,OAAO,EAAE,YAAY,EAAE,MAAM,8BAA8B,CAAC;AAC5D,OAAO,EAAE,eAAe,EAAE,MAAM,uBAAuB,CAAC;AAExD,SAAS,UAAU,CAAC,KAAa;IAC/B,OAAO,EAAE,CAAC,WAAW,CAAC,IAAI,CAAC,IAAI,CAAC,EAAE,CAAC,MAAM,EAAE,EAAE,qBAAqB,KAAK,GAAG,CAAC,CAAC,CAAC;AAC/E,CAAC;AAED,MAAM,CAAC,MAAM,iBAAiB,GAAG;IAC/B;QACE,IAAI,EAAE,0CAA0C;QAChD,KAAK,EAAE;YACL,IAAI,EAAE,4CAA4C;YAClD,YAAY,EAAE,EAAE,eAAe,EAAE,OAAO,EAAE,YAAY,EAAE,YAAY,EAAE;YACtE,kBAAkB,EAAE,KAAK;SAC1B;QACD,UAAU,EAAE,KAAK;QACjB,gBAAgB,EAAE,uBAAuB;KAC1C;IACD;QACE,IAAI,EAAE,qCAAqC;QAC3C,KAAK,EAAE;YACL,IAAI,EAAE,uDAAuD;YAC7D,YAAY,EAAE,EAAE,eAAe,EAAE,OAAO,EAAE,YAAY,EAAE,YAAY,EAAE;YACtE,kBAAkB,EAAE,KAAK;SAC1B;QACD,UAAU,EAAE,IAAI;KACjB;IACD;QACE,IAAI,EAAE,iDAAiD;QACvD,KAAK,EAAE;YACL,IAAI,EAAE,uDAAuD;YAC7D,YAAY,EAAE,EAAE,eAAe,EAAE,OAAO,EAAE,YAAY,EAAE,YAAY,EAAE;YACtE,kBAAkB,EAAE,KAAK;SAC1B;QACD,UAAU,EAAE,KAAK;QACjB,gBAAgB,EAAE,8BAA8B;KACjD;CACO,CAAC;AAEX,MAAM,CAAC,MAAM,WAAW,GAAG;IACzB;QACE,IAAI,EAAE,0DAA0D;QAChE,IAAI,EAAE,IAAI,CAAC,SAAS,CAAC;YACnB,MAAM,EAAE,OAAO;YACf,OAAO,EAAE,IAAI;YACb,UAAU,EAAE,UAAU;YACtB,gBAAgB,EAAE,EAAE;YACpB,eAAe,EAAE,EAAE;YACnB,UAAU,EAAE,EAAE;SACf,CAAC;QACF,YAAY,EAAE,OAAO;QACrB,aAAa,EAAE,mCAAmC;KACnD;IACD;QACE,IAAI,EAAE,kDAAkD;QACxD,IAAI,EAAE,IAAI,CAAC,SAAS,CAAC;YACnB,MAAM,EAAE,OAAO;YACf,OAAO,EAAE,IAAI;YACb,UAAU,EAAE,UAAU;YACtB,gBAAgB,EAAE,CAAC,iEAAiE,CAAC;YACrF,eAAe,EAAE,EAAE;YACnB,UAAU,EAAE,EAAE;SACf,CAAC;QACF,YAAY,EAAE,OAAO;QACrB,aAAa,EAAE,mCAAmC;KACnD;CACO,CAAC;AAEX,MAAM,CAAC,MAAM,WAAW,GAAG;IACzB;QACE,IAAI,EAAE,iDAAiD;QACvD,QAAQ,EAAE,SAAS;QACnB,cAAc,EAAE,QAAQ;QACxB,SAAS,EAAE,SAAS;QACpB,aAAa,EAAE,4BAA4B;KAC5C;CACO,CAAC;AAEX,KAAK,MAAM,QAAQ,IAAI,iBAAiB,EAAE,CAAC
;IACzC,MAAM,MAAM,GAAG,qBAAqB,CAAC;QACnC,IAAI,EAAE,QAAQ,CAAC,KAAK,CAAC,IAAI;QACzB,YAAY,EAAE,QAAQ,CAAC,KAAK,CAAC,YAAY;QACzC,kBAAkB,EAAE,QAAQ,CAAC,KAAK,CAAC,kBAAkB;KACtD,CAAC,CAAC;IACH,MAAM,CAAC,KAAK,CAAC,MAAM,CAAC,IAAI,EAAE,QAAQ,CAAC,UAAU,EAAE,QAAQ,CAAC,IAAI,CAAC,CAAC;IAC9D,IAAI,kBAAkB,IAAI,QAAQ,EAAE,CAAC;QACnC,MAAM,CAAC,EAAE,CAAC,MAAM,CAAC,aAAa,CAAC,QAAQ,CAAC,QAAQ,CAAC,gBAAgB,CAAC,EAAE,QAAQ,CAAC,IAAI,CAAC,CAAC;IACrF,CAAC;AACH,CAAC;AAED,KAAK,MAAM,QAAQ,IAAI,WAAW,EAAE,CAAC;IACnC,MAAM,MAAM,GAAG,eAAe,CAAC,QAAQ,CAAC,IAAI,CAAC,CAAC;IAC9C,MAAM,CAAC,KAAK,CAAC,MAAM,CAAC,MAAM,EAAE,QAAQ,CAAC,YAAY,EAAE,QAAQ,CAAC,IAAI,CAAC,CAAC;IAClE,IAAI,eAAe,IAAI,QAAQ,EAAE,CAAC;QAChC,MAAM,CAAC,EAAE,CAAC,MAAM,CAAC,eAAe,CAAC,QAAQ,CAAC,QAAQ,CAAC,aAAa,CAAC,EAAE,QAAQ,CAAC,IAAI,CAAC,CAAC;IACpF,CAAC;IACD,IAAI,eAAe,IAAI,QAAQ,EAAE,CAAC;QAChC,MAAM,CAAC,EAAE,CAAC,CAAC,MAAM,CAAC,eAAe,CAAC,QAAQ,CAAC,QAAQ,CAAC,aAAa,CAAC,EAAE,QAAQ,CAAC,IAAI,CAAC,CAAC;IACrF,CAAC;AACH,CAAC;AAED,KAAK,MAAM,QAAQ,IAAI,WAAW,EAAE,CAAC;IACnC,MAAM,KAAK,GAAG,IAAI,YAAY,CAAC;QAC7B,GAAG,UAAU,EAAE;QACf,QAAQ,EAAE,UAAU,CAAC,QAAQ,CAAC;KAC/B,CAAC,CAAC;IACH,MAAM,OAAO,GAAG,MAAM,KAAK,CAAC,IAAI,CAAC,wBAAwB,QAAQ,CAAC,IAAI,EAAE,EAAE,UAAU,EAAE,EAAE,CAAC,CAAC;IAC1F,MAAM,IAAI,GAAG,KAAK,CAAC,IAAI,CAAC,OAAO,CAAC,UAAU,CAAC,CAAC;IAC5C,MAAM,EAAE,GAAG,IAAI,IAAI,EAAE,CAAC,WAAW,EAAE,CAAC;IACpC,IAAI,CAAC,MAAM,GAAG;QACZ;YACE,KAAK,EAAE,CAAC;YACR,UAAU,EAAE,EAAE;YACd,YAAY,EAAE,EAAE;YAChB,aAAa,EAAE,OAAO;YACtB,WAAW,EAAE,8BAA8B;YAC3C,KAAK,EAAE;gBACL;oBACE,IAAI,EAAE,OAAO;oBACb,QAAQ,EAAE,QAAQ;oBAClB,KAAK,EAAE,SAAS;oBAChB,MAAM,EAAE,OAAO;oBACf,UAAU,EAAE;wBACV,MAAM,EAAE,OAAO;wBACf,OAAO,EAAE,OAAO;wBAChB,UAAU,EAAE,UAAU;wBACtB,gBAAgB,EAAE,CAAC,4BAA4B,CAAC;wBAChD,eAAe,EAAE,EAAE;wBACnB,UAAU,EAAE,EAAE;qBACf;oBACD,IAAI,EAAE,IAAI;oBACV,GAAG,EAAE,EAAE,OAAO,EAAE,IAAI,EAAE;oBACtB,gBAAgB,EAAE,OAAO;oBACzB,eAAe,EAAE,EAAE;oBACnB,QAAQ,EAAE,CAAC;oBACX,UAAU,EAAE,CAAC;oBACb,KAAK,EAAE,EAAE,YAAY,EAAE,CAAC,EAAE,aAAa,EAAE,CAAC,EAAE,YAAY,EAAE,CAAC,EAAE;oBAC7D,IAAI,EAAE;wBACJ,QAAQ,EAAE,KAAK;wBACf,SA
AS,EAAE,KAAK;wBAChB,MAAM,EAAE,iBAAiB;wBACzB,UAAU,EAAE,QAAQ,CAAC,QAAQ;qBAC9B;iBACF;aACF;YACD,QAAQ,EAAE,EAAE;YACZ,WAAW,EAAE;gBACX,SAAS,EAAE,IAAI;gBACf,MAAM,EAAE,SAAS;gBACjB,WAAW,EAAE,CAAC,OAAO,CAAC;gBACtB,eAAe,EAAE,EAAE;gBACnB,oBAAoB,EAAE,EAAE;gBACxB,cAAc,EAAE,EAAE;gBAClB,aAAa,EAAE,EAAE;gBACjB,gBAAgB,EAAE;oBAChB,KAAK,EAAE,OAAO;oBACd,MAAM,EAAE,OAAO;oBACf,MAAM,EAAE,OAAO;oBACf,QAAQ,EAAE,OAAO;oBACjB,IAAI,EAAE,OAAO;oBACb,UAAU,EAAE,OAAO;iBACpB;gBACD,gBAAgB,EAAE,EAAE;aACrB;SACF;KACF,CAAC;IACF,IAAI,CAAC,gBAAgB,GAAG;QACtB;YACE,KAAK,EAAE,CAAC;YACR,IAAI,EAAE,OAAO;YACb,KAAK,EAAE,eAAe;YACtB,IAAI,EAAE,qCAAqC;YAC3C,EAAE;YACF,KAAK,EAAE,EAAE,YAAY,EAAE,CAAC,EAAE,aAAa,EAAE,CAAC,EAAE,YAAY,EAAE,CAAC,EAAE;YAC7D,IAAI,EAAE;gBACJ,QAAQ,EAAE,KAAK;gBACf,SAAS,EAAE,KAAK;gBAChB,MAAM,EAAE,iBAAiB;gBACzB,UAAU,EAAE,QAAQ,CAAC,cAAc;aACpC;SACF;KACF,CAAC;IACF,IAAI,CAAC,MAAM,CAAC,IAAI,GAAG;QACjB,QAAQ,EAAE,KAAK;QACf,SAAS,EAAE,KAAK;QAChB,MAAM,EAAE,iBAAiB;QACzB,UAAU,EAAE,QAAQ,CAAC,SAAS;KAC/B,CAAC;IACF,IAAI,CAAC,kBAAkB,GAAG;QACxB;YACE,EAAE,EAAE,QAAQ;YACZ,IAAI,EAAE,OAAO;YACb,WAAW,EAAE,CAAC;YACd,UAAU,EAAE,CAAC;YACb,WAAW,EAAE,CAAC;YACd,GAAG,EAAE,QAAQ,CAAC,aAAa;YAC3B,aAAa,EAAE,EAAE;YACjB,YAAY,EAAE,EAAE;YAChB,MAAM,EAAE,gBAAgB;YACxB,kBAAkB,EAAE,CAAC;YACrB,cAAc,EAAE,aAAa;SAC9B;KACF,CAAC;IACF,EAAE,CAAC,aAAa,CAAC,KAAK,CAAC,QAAQ,CAAC,OAAO,CAAC,UAAU,CAAC,EAAE,IAAI,CAAC,SAAS,CAAC,IAAI,CAAC,CAAC,CAAC;IAC3E,MAAM,MAAM,GAAG,qBAAqB,CAAC,KAAK,CAAC,IAAI,CAAC,OAAO,CAAC,UAAU,CAAC,EAAE,EAAE,CAAC,CAAC;IACzE,MAAM,CAAC,EAAE,CAAC,MAAM,CAAC,QAAQ,CAAC,yDAAyD,CAAC,CAAC,CAAC;IACtF,MAAM,CAAC,EAAE,CAAC,MAAM,CAAC,QAAQ,CAAC,oCAAoC,CAAC,CAAC,CAAC;IACjE,MAAM,CAAC,EAAE,CAAC,MAAM,CAAC,QAAQ,CAAC,QAAQ,CAAC,aAAa,CAAC,CAAC,CAAC;AACrD,CAAC;AAED,OAAO,CAAC,GAAG,CACT,IAAI,CAAC,SAAS,CAAC;IACb,EAAE,EAAE,IAAI;IACR,kBAAkB,EAAE,iBAAiB,CAAC,MAAM;IAC5C,YAAY,EAAE,WAAW,CAAC,MAAM;IAChC,YAAY,EAAE,WAAW,CAAC,MAAM;CACjC,CAAC,CACH,CAAC"}

package/dist/scripts/smoke.js CHANGED

@@ -1248,6 +1248,193 @@ assert.equal(Object.hasOwn(metrics.decision_quality, "undefined"), false);
     assert.ok(notResurfacedReport.includes("not_resurfaced means the ask was not repeated; it is not proof that evidence was satisfied."), "v4.2.5 / not_resurfaced: session report must state the not_resurfaced semantics");
     console.log("[smoke] terminal_cost_evidence_audit_test: PASS");
 }
+// v4.3.0 / P1: unanimous READY with unresolved evidence must not look like a
+// plain unanimous_ready close-out. `not_resurfaced` is inference-only: it may
+// allow convergence, but the final metadata/report must keep that disposition
+// visible for operators.
+{
+    const { sessionReportMarkdown } = await import("../src/core/reports.js");
+    const unresolvedEvents = [];
+    const unresolvedConfig = {
+        ...loadConfig(),
+        data_dir: smokeTmpDir("unresolved-evidence-finalize"),
+        budget: {
+            ...loadConfig().budget,
+            max_session_cost_usd: 10000,
+            preflight_max_round_cost_usd: 10000,
+            until_stopped_max_cost_usd: 10000,
+        },
+        evidence_judge_autowire: {
+            ...loadConfig().evidence_judge_autowire,
+            mode: "off",
+            active: false,
+        },
+    };
+    const unresolvedOrch = new CrossReviewOrchestrator(unresolvedConfig, (event) => unresolvedEvents.push(event.type));
+    const unresolvedR1 = await unresolvedOrch.askPeers({
+        task: "P1 unresolved evidence finalization guard fixture.",
+        draft: "FORCE_NEEDS_EVIDENCE",
+        caller: "operator",
+        peers: ["claude"],
+    });
+    const unresolvedR2 = await unresolvedOrch.askPeers({
+        session_id: unresolvedR1.session.session_id,
+        task: "P1 unresolved evidence finalization guard fixture.",
+        draft: "Clean revised draft, no test marker present.",
+        caller: "operator",
+        peers: ["claude"],
+    });
+    assert.equal(unresolvedR2.converged, true);
+    assert.equal(unresolvedR2.session.outcome, "converged");
+    assert.equal(unresolvedR2.session.outcome_reason, "unanimous_ready_with_unresolved_evidence", "v4.3.0 / P1: convergence with not_resurfaced evidence must not finalize as plain unanimous_ready");
+    assert.ok(unresolvedEvents.includes("session.evidence_checklist_unresolved_on_finalize"), "v4.3.0 / P1: unresolved evidence close-out must emit an audit event");
+    const unresolvedReport = sessionReportMarkdown(unresolvedOrch.store.read(unresolvedR2.session.session_id), unresolvedOrch.store.readEvents(unresolvedR2.session.session_id));
+    assert.ok(unresolvedReport.includes("## Unresolved Evidence Disposition"), "v4.3.0 / P1: session_report must include unresolved-evidence disposition table");
+    assert.ok(unresolvedReport.includes("not_resurfaced"), "v4.3.0 / P1: session_report must name not_resurfaced unresolved items");
+    console.log("[smoke] unresolved_evidence_finalization_guard_test: PASS");
+}
+// v4.3.0 / P3: read-only peer reliability telemetry. This is deliberately
+// observational; it must not change peer selection or mutate sessions.
+{
+    const { SessionStore } = await import("../src/core/session-store.js");
+    const reliabilityStore = new SessionStore({
+        ...config,
+        data_dir: smokeTmpDir("peer-reliability"),
+    });
+    const reliabilitySession = await reliabilityStore.init("peer reliability report fixture", "operator", []);
+    const reliabilityMeta = reliabilityStore.read(reliabilitySession.session_id);
+    const ts = new Date().toISOString();
+    reliabilityMeta.rounds = [
+        {
+            round: 1,
+            started_at: ts,
+            completed_at: ts,
+            caller_status: "READY",
+            prompt_file: "agent-runs/round-1-prompt.md",
+            peers: [
+                {
+                    peer: "claude",
+                    provider: "anthropic",
+                    model: "claude-opus-4-8",
+                    status: "NEEDS_EVIDENCE",
+                    structured: {
+                        status: "NEEDS_EVIDENCE",
+                        summary: "needs log",
+                        confidence: "verified",
+                        evidence_sources: ["src/core/session-store.ts:1"],
+                        caller_requests: ["attach raw npm test output"],
+                        follow_ups: [],
+                    },
+                    text: "{}",
+                    raw: { fixture: true },
+                    decision_quality: "clean",
+                    parser_warnings: [],
+                    attempts: 1,
+                    latency_ms: 50,
+                    usage: { input_tokens: 10, output_tokens: 5, total_tokens: 15 },
+                    cost: { currency: "USD", estimated: false, source: "configured-rate", total_cost: 1 },
+                },
+                {
+                    peer: "grok",
+                    provider: "xai",
+                    model: "grok-4.3",
+                    status: "READY",
+                    structured: {
+                        status: "READY",
+                        summary: "ready",
+                        confidence: "verified",
+                        evidence_sources: ["server_info: version 4.2.5"],
+                        caller_requests: [],
+                        follow_ups: [],
+                    },
+                    text: "{}",
+                    raw: { fixture: true },
+                    decision_quality: "format_warning",
+                    parser_warnings: ["verified_without_concrete_evidence_sources"],
+                    attempts: 1,
+                    latency_ms: 100,
+                    usage: { input_tokens: 20, output_tokens: 10, total_tokens: 30 },
+                    cost: { currency: "USD", estimated: false, source: "configured-rate", total_cost: 2 },
+                },
+            ],
+            rejected: [
+                {
+                    peer: "perplexity",
+                    provider: "perplexity",
+                    model: "sonar-reasoning-pro",
+                    failure_class: "provider_error",
+                    message: "fixture provider error",
+                    retryable: false,
+                    attempts: 1,
+                    latency_ms: 0,
+                },
+            ],
+            convergence: {
+                converged: false,
+                reason: "fixture",
+                ready_peers: ["grok"],
+                not_ready_peers: [],
+                needs_evidence_peers: ["claude"],
+                rejected_peers: ["perplexity"],
+                skipped_peers: [],
+                decision_quality: {
+                    codex: "clean",
+                    claude: "clean",
+                    gemini: "clean",
+                    deepseek: "clean",
+                    grok: "format_warning",
+                    perplexity: "failed",
+                },
+                blocking_details: ["claude:NEEDS_EVIDENCE", "perplexity:provider_error"],
+            },
+        },
+    ];
+    reliabilityMeta.evidence_checklist = [
+        {
+            id: "rel-1",
+            peer: "claude",
+            first_round: 1,
+            last_round: 1,
+            round_count: 1,
+            ask: "attach raw npm test output",
+            first_seen_at: ts,
+            last_seen_at: ts,
+            status: "not_resurfaced",
+            addressed_at_round: 2,
+            address_method: "resurfacing",
+        },
+    ];
+    fs.writeFileSync(reliabilityStore.metaPath(reliabilitySession.session_id), JSON.stringify(reliabilityMeta));
+    await reliabilityStore.appendEvent({
+        ts,
+        type: "session.lead_meta_audit_fabrication_detected",
+        session_id: reliabilitySession.session_id,
+        message: "fixture fabrication event",
+        data: { peer: "grok" },
+    });
+    const reliability = reliabilityStore.peerReliabilityReport();
+    assert.equal(reliability.scope, "all");
+    assert.equal(reliability.by_peer.claude?.needs_evidence, 1);
+    assert.equal(reliability.by_peer.claude?.not_resurfaced_asks, 1);
+    assert.equal(reliability.by_peer.grok?.ready, 1);
+    assert.equal(reliability.by_peer.grok?.parser_warnings_total, 1);
+    assert.equal(reliability.by_peer.grok?.fabrication_events, 1);
+    assert.equal(reliability.by_peer.perplexity?.provider_errors, 1);
+    console.log("[smoke] peer_reliability_report_test: PASS");
+}
+// v4.3.0 / P2: offline declarative eval harness. This pins the existence of a
+// no-provider-call fixture runner so regressions found in real sessions can be
+// replayed without growing the ad hoc smoke body indefinitely.
+{
+    const pkg = JSON.parse(fs.readFileSync("package.json", "utf8"));
+    assert.equal(pkg.scripts?.["eval:fixtures"], "tsx scripts/eval-fixtures.ts", "v4.3.0 / P2: package.json must expose the offline fixture eval runner");
+    const evalHarness = fs.readFileSync("scripts/eval-fixtures.ts", "utf8");
+    assert.ok(/truthfulnessCases/.test(evalHarness) &&
+        /parserCases/.test(evalHarness) &&
+        /reportCases/.test(evalHarness), "v4.3.0 / P2: eval-fixtures must use declarative truthfulness/parser/report case tables");
+    assert.ok(!/askPeers\(|runUntilUnanimous\(|session_start_round/.test(evalHarness), "v4.3.0 / P2: eval-fixtures must stay offline and avoid provider-review entry points");
+    console.log("[smoke] offline_fixture_eval_contract_test: PASS");
+}
 // v2.22.0 (B.P3): session.budget_warning event emit + idempotency. The
 // orchestrator emits a one-shot warning when cumulative cost crosses
 // 75% of cost_ceiling_usd; the budget_warning_emitted flag persists