@mmerterden/multi-agent-pipeline 10.7.4 → 10.9.0

Files changed (29)

package/CHANGELOG.md +93 -0
package/README.md +2 -0
package/docs/engineering.md +76 -0
package/docs/features.md +49 -33
package/package.json +1 -1
package/pipeline/commands/multi-agent/refs/features/verify-by-test.md +41 -0
package/pipeline/commands/multi-agent/refs/phases/log-format.md +10 -0
package/pipeline/commands/multi-agent/refs/phases/operations.md +15 -2
package/pipeline/commands/multi-agent/refs/phases/phase-0-init.md +9 -0
package/pipeline/commands/multi-agent/refs/phases/phase-3-dev.md +12 -0
package/pipeline/commands/multi-agent/refs/phases/phase-4-review.md +12 -1
package/pipeline/commands/multi-agent/refs/rules.md +1 -0
package/pipeline/commands/multi-agent/resume.md +7 -4
package/pipeline/commands/multi-agent/review.md +33 -1
package/pipeline/schemas/diff-risk.schema.json +5 -4
package/pipeline/schemas/prefs.schema.json +59 -0
package/pipeline/schemas/token-budget.json +7 -7
package/pipeline/schemas/triage-output.schema.json +35 -4
package/pipeline/scripts/README.md +3 -0
package/pipeline/scripts/diff-risk-score.mjs +11 -1
package/pipeline/scripts/fixtures/diff-risk-test-removal.diff +40 -0
package/pipeline/scripts/fixtures/install-layout.tsv +3 -3
package/pipeline/scripts/smoke-diff-risk.sh +30 -1
package/pipeline/scripts/smoke-handoff-contract.sh +92 -0
package/pipeline/scripts/smoke-update-check.sh +122 -0
package/pipeline/scripts/smoke-verify-by-test.sh +148 -0
package/pipeline/scripts/update-check.sh +82 -0
package/pipeline/scripts/validate-diff-risk.mjs +2 -1
package/pipeline/scripts/validate-triage.mjs +31 -2

package/pipeline/commands/multi-agent/resume.md CHANGED

@@ -23,10 +23,13 @@ Resume a paused or failed task from the last successful phase.
    - `haltReason`  -  if set, show it so the user knows why the run stopped; clear it on successful re-entry
    - `autopilot`  -  preserve the mode
-3. **Load context**  -  read prior-phase findings from `agent-log.md`:
-   - Phase 1 analysis → use it from Phase 2+
-   - Phase 2 plan → use it from Phase 3+
-   - Phase 3 code → already in the worktree
+3. **Load context**  -  rebuild working context from durable artifacts, never from conversation memory:
+   - **Handoff first (v10.8.0)**: read the LATEST `## Handoff` block in `agent-log.md`  -  it carries done/remaining/decisions/open-findings and the exact re-entry point (phase + subStep). When present, it is the primary context source; cross-check its `Next:` line against `state.currentPhase` and trust state on mismatch (state is the machine truth, handoff is the narrative).
+   - Fall back to per-phase findings for logs written before v10.8 (no handoff blocks):
+     - Phase 1 analysis → use it from Phase 2+
+     - Phase 2 plan → use it from Phase 3+
+     - Phase 3 code → already in the worktree
+   - Recent `git log --oneline -10` in the worktree grounds what was actually committed vs. claimed.
 4. **Continue the pipeline**  -  start from the next phase (same pipeline as the main multi-agent command).
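
The handoff-first rule in Step 3 above can be sketched as a small resolver. This is only an illustration: the `## Handoff` heading and the `- Next:` prefix come from the diff, but the `Phase N` value format, the numeric `currentPhase`, and every function name here are assumptions, not the package's implementation.

```javascript
// Sketch only: names and the "Phase N" format of the Next line are hypothetical.

// Return the text of the LAST "## Handoff" block in an agent-log.md string, or null.
function latestHandoffBlock(logText) {
  const lines = logText.split("\n");
  let start = -1;
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].startsWith("## Handoff")) start = i; // keep the last match
  }
  if (start === -1) return null;
  let end = lines.length;
  for (let i = start + 1; i < lines.length; i++) {
    if (lines[i].startsWith("## ")) { end = i; break; } // next section closes the block
  }
  return lines.slice(start, end).join("\n");
}

// Pull the phase number out of the block's "- Next:" line (assumed "Phase N ...").
function nextPhaseFromHandoff(block) {
  const m = block.match(/^- Next:.*?Phase (\d+)/m);
  return m ? Number(m[1]) : null;
}

// Decide the re-entry phase. State is the machine truth, so it always wins;
// a disagreeing handoff is only reported as a mismatch.
function resolveReentry(logText, state) {
  const block = latestHandoffBlock(logText);
  if (block === null) {
    // Log written before v10.8: fall back to per-phase findings.
    return { phase: state.currentPhase, source: "per-phase-fallback", mismatch: false };
  }
  const narrated = nextPhaseFromHandoff(block);
  const mismatch = narrated !== null && narrated !== state.currentPhase;
  return { phase: state.currentPhase, source: "handoff", mismatch };
}
```

Returning the mismatch flag instead of throwing keeps resume non-blocking while still letting the caller surface the disagreement to the user.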

package/pipeline/commands/multi-agent/review.md CHANGED

@@ -109,6 +109,38 @@ The `credential-store.sh` wrapper handles macOS Keychain (`security`), Linux lib
 Save the diff to `/tmp/multi-agent-review-${TASK_ID}-diff.patch` so reviewers can re-read it.
+### 2b. Module review guides  -  path-scoped convention files
+A module in the repo may carry its own CLAUDE guide  -  a convention/checklist file living somewhere in the module's directory tree that the host CLI never auto-loads. When a changed file's module has such a guide, the review must consult it. Discovery is deterministic, from the diff's changed paths:
+```bash
+# Changed file paths from the patch:
+grep -E '^\+\+\+ b/' "$DIFF_FILE" | sed 's|^+++ b/||' | sort -u > /tmp/multi-agent-review-${TASK_ID}-paths.txt
+# For each changed path, walk its directory chain up to the repo root and
+# collect guide files matching: CLAUDE.md, *-CLAUDE.md, AGENTS.md.
+# Root-level CLAUDE.md/AGENTS.md are excluded  -  the host CLI already loads those.
+guides=()
+while IFS= read -r p; do
+  d=$(dirname "$p")
+  while [ "$d" != "." ] && [ "$d" != "/" ]; do
+    for g in "$d"/CLAUDE.md "$d"/*-CLAUDE.md "$d"/AGENTS.md; do
+      [ -e "$g" ] && guides+=("$g")
+    done
+    d=$(dirname "$d")
+  done
+done < /tmp/multi-agent-review-${TASK_ID}-paths.txt
+# dedupe, cap at 5 (log any dropped so truncation is never silent)
+```
+Existence checks are resolved against the local checkout when the cwd is the target repo. In PR mode without a local checkout, probe the candidate paths via the provider API instead (`gh api /repos/{o}/{r}/contents/{path}?ref={headSha}` / Bitbucket `GET /projects/{KEY}/repos/{slug}/browse/{path}?at={headSha}`) and fetch the matching files' raw content the same way. No hit → step is a silent no-op.
+Persist `agent-state.review.moduleGuides = [<repo-relative paths>]` and inject into every reviewer prompt (Step 3):
+> MODULE REVIEW GUIDES: before reviewing, read each of these guide files. Apply a guide's rules/checklist to every changed file under its directory. Guide violations are findings like any other  -  triage them by severity.
+Scope note: a guide governs only files under its own directory  -  a guide found under one module must not be applied to a sibling module's changes in the same PR.
 ### 3. Launch parallel reviewers  -  host-CLI dependent
 **Claude Code (2 in parallel):**
@@ -120,7 +152,7 @@ Save the diff to `/tmp/multi-agent-review-${TASK_ID}-diff.patch` so reviewers ca
 - Agent 2: `gpt-5.4` → edge cases, alternate perspective
 - Agent 3: `claude-sonnet-4-6` → general quality
-Each reviewer receives the diff plus the standard reviewer system prompt (see `refs/phases/phase-4-review.md` for the prompt contract). Output: structured `findings[]` per reviewer.
+Each reviewer receives the diff, the module review guides from Step 2b (when any were found), plus the standard reviewer system prompt (see `refs/phases/phase-4-review.md` for the prompt contract). Output: structured `findings[]` per reviewer.
 ### 4. Store-compliance cross-reference
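
The bash in Step 2b leaves the dedupe and the cap of 5 as a comment. One way those two steps, plus the scope rule, might look is sketched here. The guide patterns and the cap are taken from the diff; the function names and the in-memory `repoFiles` list (a stand-in for the `-e` existence checks) are hypothetical.

```javascript
// Sketch only: function names and the repoFiles stand-in are hypothetical.
const MAX_GUIDES = 5;

// POSIX-style dirname for repo-relative paths ("a/b/c.txt" -> "a/b", "c.txt" -> ".").
function dirnameOf(p) {
  const i = p.lastIndexOf("/");
  return i === -1 ? "." : p.slice(0, i);
}

// Matches CLAUDE.md, *-CLAUDE.md and AGENTS.md.
function isGuideName(name) {
  return name === "CLAUDE.md" || name === "AGENTS.md" || name.endsWith("-CLAUDE.md");
}

// changedPaths: repo-relative paths from the diff.
// repoFiles: repo-relative paths known to exist in the checkout.
function discoverGuides(changedPaths, repoFiles) {
  const found = [];
  const seen = new Set();
  for (const p of changedPaths) {
    // Walk up the directory chain and stop before the repo root ("."),
    // so root-level guides are never collected.
    for (let d = dirnameOf(p); d !== "."; d = dirnameOf(d)) {
      for (const f of repoFiles) {
        if (dirnameOf(f) !== d) continue;
        const name = f.slice(d.length + 1);
        if (isGuideName(name) && !seen.has(f)) {
          seen.add(f); // dedupe across changed paths
          found.push(f);
        }
      }
    }
  }
  // Cap at MAX_GUIDES; return the overflow so the caller can log it.
  return { guides: found.slice(0, MAX_GUIDES), dropped: found.slice(MAX_GUIDES) };
}

// Scope rule: a guide applies only to changed files under its own directory.
function guideGoverns(guidePath, changedPath) {
  return changedPath.startsWith(dirnameOf(guidePath) + "/");
}
```

Returning `dropped` alongside `guides` is what makes the truncation visible rather than silent.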

package/pipeline/schemas/diff-risk.schema.json CHANGED

@@ -1,16 +1,16 @@
 {
   "$schema": "https://json-schema.org/draft/2020-12/schema",
   "$id": "https://github.com/mmerterden/multi-agent-pipeline/pipeline/schemas/diff-risk.schema.json",
-  "version": "1.0.0",
+  "version": "1.1.0",
   "title": "Multi-Agent Pipeline  -  Phase 4 diff risk score",
-  "description": "Output contract for diff-risk-score.mjs. Heuristic, deterministic, no LLM. Produced before Phase 4 Step 2 to give reviewer prompts a priority ordering  -  never used as a gate.",
+  "description": "Output contract for diff-risk-score.mjs. Heuristic, deterministic, no LLM. Produced before Phase 4 Step 2 to give reviewer prompts a priority ordering  -  never used as a gate. v1.1.0 adds the test_lines_removed signal (immutable-test backstop: a test file whose diff removes more lines than it adds).",
   "type": "object",
   "additionalProperties": false,
   "required": ["schemaVersion", "task", "totals", "files"],
   "properties": {
     "schemaVersion": {
       "type": "string",
-      "const": "1.0.0"
+      "const": "1.1.0"
     },
     "task": {
       "type": "object",
@@ -63,7 +63,8 @@
                     "no_test_change",
                     "complexity_delta",
                     "ui_critical",
-                    "migration"
+                    "migration",
+                    "test_lines_removed"
                   ]
                 },
                 "weight": { "type": "number" },

package/pipeline/schemas/prefs.schema.json CHANGED

@@ -701,6 +701,65 @@
           "default": false,
           "description": "v6.1.0+ \u2014 Phase 4 Step 2.5 rebuttal round. When reviewers disagree (mixed blocker/approved verdict), each reviewer is re-prompted with the others' opposing arguments for one additional round before triage. Lifts signal quality on ambiguous findings at ~1\u00d7 Step 2 token cost. Off by default \u2014 flip for security-critical or release-branch reviews."
         },
+        "updateCheck": {
+          "type": "object",
+          "additionalProperties": false,
+          "description": "v10.9+ - Phase 0 Step 0.6 advisory version check. Once per ttlHours window, a bounded (3s) registry read compares the installed version against dist-tags.latest. Newer version found: interactive modes ask 'Update now / Continue' (yes runs the /multi-agent:update flow, then the run continues); autopilot logs one line and never asks. Offline/failed checks are silent; the step never blocks the pipeline.",
+          "properties": {
+            "enabled": {
+              "type": "boolean",
+              "default": true,
+              "description": "Master switch. On by default - the cost is at most one 3s-bounded curl per ttlHours."
+            },
+            "ttlHours": {
+              "type": "integer",
+              "minimum": 1,
+              "maximum": 168,
+              "default": 24,
+              "description": "Cache window for the registry read."
+            },
+            "autoUpdate": {
+              "type": "boolean",
+              "default": false,
+              "description": "When true, skip the question and run the update flow automatically before the run starts (interactive AND autopilot). Off by default - self-modifying ~/.claude without asking is a surprise."
+            }
+          }
+        },
+        "verifyByTest": {
+          "type": "object",
+          "additionalProperties": false,
+          "description": "v10.8+ - Phase 4 Step 3.7 verify-by-test. When enabled, accepted BLOCKING findings are empirically validated before the Phase 3 rework loop: one verifier agent writes a minimal repro test per finding and runs only that test. Confirmed findings hand their failing test to Phase 3 as the RED step; non-reproducible findings are downgraded to deferred under evidence-gate. Only blocking findings are ever verified (fixed behavior, not a knob). Adds one model call plus up to maxFindings single-test runs per iteration with accepted blockers; default off. Flip on for security-critical work, release branches, or repos with noisy reviewers. Full spec: refs/features/verify-by-test.md.",
+          "properties": {
+            "enabled": {
+              "type": "boolean",
+              "default": false,
+              "description": "Master switch."
+            },
+            "maxFindings": {
+              "type": "integer",
+              "minimum": 1,
+              "maximum": 10,
+              "default": 3,
+              "description": "Max accepted blocking findings verified per review iteration. Findings beyond the cap keep their judgment-only verdict."
+            },
+            "model": {
+              "type": "string",
+              "enum": [
+                "sonnet",
+                "opus"
+              ],
+              "default": "sonnet",
+              "description": "Verifier agent model. Writing a minimal repro test is mechanical work; Sonnet is the cost-sane default."
+            },
+            "stepTimeoutSec": {
+              "type": "integer",
+              "minimum": 60,
+              "maximum": 1800,
+              "default": 600,
+              "description": "Wall-clock budget for the whole Step 3.7 pass. On breach, remaining findings keep judgment-only verdicts and the pipeline proceeds (never blocks)."
+            }
+          }
+        },
         "review": {
           "type": "object",
           "additionalProperties": false,

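To make the bounds and defaults of the two new preference blocks concrete, here is a hedged sketch of a resolver that fills defaults and rejects out-of-range values. The numbers are copied from the schema hunk above; `SPEC`, `checkValue` and `resolveBlock` are hypothetical names. Applying `default` values is itself an assumption about the consumer, since JSON Schema treats `default` as an annotation only.

```javascript
// Sketch only: SPEC, checkValue and resolveBlock are hypothetical names.
const SPEC = {
  updateCheck: {
    enabled: { type: "boolean", default: true },
    ttlHours: { type: "integer", min: 1, max: 168, default: 24 },
    autoUpdate: { type: "boolean", default: false },
  },
  verifyByTest: {
    enabled: { type: "boolean", default: false },
    maxFindings: { type: "integer", min: 1, max: 10, default: 3 },
    model: { type: "enum", values: ["sonnet", "opus"], default: "sonnet" },
    stepTimeoutSec: { type: "integer", min: 60, max: 1800, default: 600 },
  },
};

// Return an error string for an invalid value, or null when it is valid.
function checkValue(key, rule, value) {
  if (rule.type === "boolean" && typeof value !== "boolean") return `${key}: expected boolean`;
  if (rule.type === "integer") {
    if (!Number.isInteger(value)) return `${key}: expected integer`;
    if (value < rule.min || value > rule.max) return `${key}: out of range ${rule.min}..${rule.max}`;
  }
  if (rule.type === "enum" && !rule.values.includes(value)) {
    return `${key}: expected one of ${rule.values.join(", ")}`;
  }
  return null;
}

// Fill defaults for one block and collect errors. Unknown keys are rejected
// to mirror "additionalProperties": false.
function resolveBlock(blockName, userBlock = {}) {
  const rules = SPEC[blockName];
  const errors = [];
  const resolved = {};
  for (const key of Object.keys(userBlock)) {
    if (!Object.hasOwn(rules, key)) errors.push(`${blockName}.${key}: unknown property`);
  }
  for (const [key, rule] of Object.entries(rules)) {
    const value = Object.hasOwn(userBlock, key) ? userBlock[key] : rule.default;
    const err = checkValue(`${blockName}.${key}`, rule, value);
    if (err) errors.push(err);
    else resolved[key] = value;
  }
  return { resolved, errors };
}
```

For example, `resolveBlock("verifyByTest", { enabled: true })` would turn the feature on while keeping `maxFindings`, `model` and `stepTimeoutSec` at their schema defaults.
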
package/pipeline/schemas/token-budget.json CHANGED

@@ -3,15 +3,15 @@
   "$id": "https://github.com/mmerterden/multi-agent-pipeline/pipeline/schemas/token-budget.json",
   "description": "Per-phase token budget for lazy-loaded pipeline docs. Enforced by smoke-token-budget.sh.",
   "phases": {
-    "phase-0-init":      { "max_tokens": 11250, "warn_tokens": 9900 },
+    "phase-0-init":      { "max_tokens": 12400, "warn_tokens": 10900 },
     "phase-1-analysis":  { "max_tokens": 3750, "warn_tokens": 3300 },
     "phase-2-planning":  { "max_tokens": 6500, "warn_tokens": 5750 },
-    "phase-3-dev":       { "max_tokens": 7650, "warn_tokens": 6750 },
-    "phase-4-review":    { "max_tokens": 11100, "warn_tokens": 9750 },
-    "phase-5-test":      { "max_tokens": 2300, "warn_tokens": 2000 },
-    "phase-6-commit":    { "max_tokens": 5550, "warn_tokens": 4900 },
+    "phase-3-dev":       { "max_tokens": 7900, "warn_tokens": 6950 },
+    "phase-4-review":    { "max_tokens": 13250, "warn_tokens": 11650 },
+    "phase-5-test":      { "max_tokens": 2550, "warn_tokens": 2250 },
+    "phase-6-commit":    { "max_tokens": 6150, "warn_tokens": 5400 },
     "phase-7-report":    { "max_tokens": 5600, "warn_tokens": 4950 }
   },
-  "total_max_tokens": 46500,
-  "note": "Token estimate = ceil(chars / 4). Per-phase budget rule: warn = current+10% (rounded to nearest 50), max = current+25%. Gives ~6 edit cycles of headroom before warn trips  -  intentionally quiet under normal maintenance, loud when a phase grows unusually. Only the active phase is loaded (lazy). Recalibrated at v10.0.0 after the validator/consistency/simplifier/lesson gate contracts landed in phases 1-4 (prose already compressed; the residual growth is the gate contracts themselves)."
+  "total_max_tokens": 50000,
+  "note": "Token estimate = ceil(chars / 4). Per-phase budget rule: warn = current+10% (rounded to nearest 50), max = current+25%. Gives ~6 edit cycles of headroom before warn trips  -  intentionally quiet under normal maintenance, loud when a phase grows unusually. Only the active phase is loaded (lazy). Recalibrated at v10.0.0 after the validator/consistency/simplifier/lesson gate contracts landed in phases 1-4. Recalibrated again at v10.9.0 after the verify-by-test (Phase 4 Step 3.7), update-check (Phase 0 Step 0.6), immutable-test (Phase 3 GREEN) and redTests re-entry contracts landed  -  Step 3.7 prose was compressed to a pointer into refs/features/verify-by-test.md before the recalibration."
 }
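
The budget rule in the `note` field is plain arithmetic and can be worked through in a few lines. The sketch assumes that `max` is rounded to the nearest 50 like `warn` (the note states rounding only for `warn`) and that a threshold trips when it is strictly exceeded; the function names are hypothetical.

```javascript
// Sketch only: names are hypothetical; max rounding and the strict ">" are assumptions.

// Token estimate = ceil(chars / 4); length counts UTF-16 code units here.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

function roundTo50(n) {
  return Math.round(n / 50) * 50;
}

// warn = current + 10%, max = current + 25%, both rounded to the nearest 50.
function budgetFor(currentTokens) {
  return {
    warn_tokens: roundTo50((currentTokens * 110) / 100),
    max_tokens: roundTo50((currentTokens * 125) / 100),
  };
}

// Classify a measured size against a budget entry.
function classify(tokens, budget) {
  if (tokens > budget.max_tokens) return "fail";
  if (tokens > budget.warn_tokens) return "warn";
  return "ok";
}
```

Under these assumptions, a measured size of 10600 tokens yields `warn_tokens: 11650` and `max_tokens: 13250`, which is the new `phase-4-review` pair. The diff does not state the actual measured size, so that input is back-derived for illustration only.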

package/pipeline/schemas/triage-output.schema.json CHANGED

@@ -1,9 +1,9 @@
 {
   "$schema": "https://json-schema.org/draft/2020-12/schema",
   "$id": "https://github.com/mmerterden/multi-agent-pipeline/pipeline/schemas/triage-output.schema.json",
-  "version": "3.1.0",
+  "version": "3.2.0",
   "title": "Multi-Agent Pipeline  -  Phase 4 triage output",
-  "description": "Contract for the Opus triage agent's JSON output in Phase 4 Step 3. Triage consumes merged reviewer findings and splits them into accepted/deferred/rejected. Only `accepted` blocking/important items trigger Phase 3 rework. v3.1.0 adds the optional `consensus` block so triage can surface reviewer-agreement risk (false consensus among same-base-model reviewers) instead of silently merging.",
+  "description": "Contract for the Opus triage agent's JSON output in Phase 4 Step 3. Triage consumes merged reviewer findings and splits them into accepted/deferred/rejected. Only `accepted` blocking/important items trigger Phase 3 rework. v3.1.0 adds the optional `consensus` block so triage can surface reviewer-agreement risk (false consensus among same-base-model reviewers) instead of silently merging. v3.2.0 adds the optional per-finding `verification` block written by Phase 4 Step 3.7 (verify-by-test): the empirical repro-test outcome for accepted blocking findings.",
   "type": "object",
   "additionalProperties": false,
   "required": ["accepted", "deferred", "rejected", "approved"],
@@ -114,6 +114,35 @@
         }
       }
     },
+    "verification": {
+      "type": "object",
+      "additionalProperties": false,
+      "description": "v3.2.0 verify-by-test outcome (Phase 4 Step 3.7, opt-in via prefs.global.verifyByTest). confirmed = repro test failed as the finding predicts (finding stands, test kept as the Phase 3 RED test); not-reproduced = repro test passed under evidence-gate (finding downgraded to deferred); inconclusive = compile error / timeout / not unit-testable (judgment verdict stands).",
+      "required": ["result"],
+      "properties": {
+        "result": {
+          "type": "string",
+          "enum": ["confirmed", "not-reproduced", "inconclusive"]
+        },
+        "testRef": {
+          "type": "string",
+          "minLength": 1,
+          "description": "Single-test reference, e.g. 'AuthTests/LoginTests/testExpiredTokenRejected' or 'tests/test_auth.py::test_expired_token'."
+        },
+        "evidencePath": {
+          "type": "string",
+          "minLength": 1,
+          "description": "Path to the test-run log verified by evidence-gate.mjs, e.g. '.pipeline/verify-1.test.log'."
+        },
+        "note": { "type": "string" }
+      },
+      "if": {
+        "properties": { "result": { "enum": ["confirmed", "not-reproduced"] } }
+      },
+      "then": {
+        "required": ["result", "testRef", "evidencePath"]
+      }
+    },
     "rawFinding": {
       "type": "object",
       "additionalProperties": false,
@@ -124,7 +153,8 @@
         "line": { "type": "integer", "minimum": 0 },
         "issue": { "type": "string", "minLength": 4 },
         "fix": { "type": "string" },
-        "reviewer": { "$ref": "#/$defs/reviewer" }
+        "reviewer": { "$ref": "#/$defs/reviewer" },
+        "verification": { "$ref": "#/$defs/verification" }
       }
     },
     "acceptedFinding": {
@@ -144,7 +174,8 @@
               "type": "string",
               "minLength": 4,
               "description": "Concrete change the dev agent must make. Required for accepted items so Phase 3 re-entry has actionable direction."
-            }
+            },
+            "verification": { "$ref": "#/$defs/verification" }
           }
         }
       ]
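
The `if`/`then` pair on the new `verification` block makes `testRef` and `evidencePath` mandatory only for `confirmed` and `not-reproduced` results. A hand-rolled check equivalent to that rule might look like the sketch below. It is not the package's `validate-triage.mjs`, and `validateVerification` is a hypothetical name.

```javascript
// Sketch only: a hand-written equivalent of the schema rule, not validate-triage.mjs.
const RESULTS = ["confirmed", "not-reproduced", "inconclusive"];
const ALLOWED_KEYS = ["result", "testRef", "evidencePath", "note"];

// Return a list of error strings; an empty list means the block is valid.
function validateVerification(v) {
  const errors = [];
  for (const key of Object.keys(v)) {
    // Mirrors "additionalProperties": false.
    if (!ALLOWED_KEYS.includes(key)) errors.push(`unknown property: ${key}`);
  }
  if (!RESULTS.includes(v.result)) {
    errors.push(`result must be one of: ${RESULTS.join(", ")}`);
  }
  for (const key of ["testRef", "evidencePath"]) {
    // minLength: 1 applies whenever the property is present.
    if (Object.hasOwn(v, key) && (typeof v[key] !== "string" || v[key].length < 1)) {
      errors.push(`${key} must be a non-empty string`);
    }
  }
  if (Object.hasOwn(v, "note") && typeof v.note !== "string") {
    errors.push("note must be a string");
  }
  // The if/then rule: an empirical outcome must point at its test and its log.
  if (v.result === "confirmed" || v.result === "not-reproduced") {
    for (const key of ["testRef", "evidencePath"]) {
      if (!Object.hasOwn(v, key)) errors.push(`${key} is required when result is ${v.result}`);
    }
  }
  return errors;
}
```

An `inconclusive` result passes with `result` alone, which fits the schema description: a compile error or timeout may leave no test reference or log to point at.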

package/pipeline/scripts/README.md CHANGED

@@ -22,6 +22,9 @@ Validate contracts. Each emits `══ <name> smoke: N passed, M failed ══`
 - `smoke-phase-6-multi.sh`  -  Phase 6 multi-repo commit/PR cross-linking
 - `smoke-phase-banner.sh` + `smoke-phase-tracker.sh`  -  Phase UI output contracts
 - `smoke-phase4-triage.sh`  -  Phase 4 reviewer → triage flow
+- `smoke-verify-by-test.sh`  -  Phase 4 Step 3.7 verify-by-test contract (v10.8.0)
+- `smoke-handoff-contract.sh`  -  phase-boundary structured handoff + handoff-first resume (v10.8.0)
+- `smoke-update-check.sh`  -  Phase 0 Step 0.6 advisory update-check contract (v10.9.0)
 ### Schema + state
 - `smoke-schema-validation.sh`  -  all JSON schemas validate

package/pipeline/scripts/diff-risk-score.mjs CHANGED

@@ -15,6 +15,7 @@
  *   complexity_delta  -  added if/guard/case/switch/while count     w=1.5
  *   ui_critical       -  *View.swift / *Screen.kt / Configuration   w=1.5
  *   migration         -  DB schema / migration path                 w=4.0
+ *   test_lines_removed -  test file shrinks (removed > added)       w=3.0
  *
  * Inputs:
  *   --base <ref>     Base ref. Default: origin/main, fallback: main
@@ -275,6 +276,15 @@ function buildRow(stat, addedLines, allChangedPaths) {
     }
   }
+  // Test-lines-removed: a test-classified file whose diff removes more lines
+  // than it adds. Shrinking tests is the classic get-to-green shortcut the
+  // immutable-test rule forbids (refs/rules.md); surface it to reviewers.
+  if (isTestPath(path) && stat.removed > stat.added) {
+    const w = 3.0;
+    signals.push({ name: "test_lines_removed", weight: w, value: stat.removed - stat.added });
+    score += 12 * w;
+  }
   return {
     path,
     score: Math.round(score * 100) / 100,
@@ -306,7 +316,7 @@ function main() {
   };
   const out = {
-    schemaVersion: "1.0.0",
+    schemaVersion: "1.1.0",
     task: {
       id: TASK_ID,
       base: BASE || "(diff-file)",
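
The new signal reduces to counting added and removed lines per file and comparing them for test-classified paths. A self-contained sketch follows. The weight of 3.0 and the `12 * w` score contribution are taken from the hunk above, while `isTestPath` here is a hypothetical stand-in (the real classifier is not shown in this diff) and `diffStats` and the `scoreDelta` wrapper are illustrative names.

```javascript
// Sketch only: isTestPath is a hypothetical stand-in for the real classifier.
function isTestPath(path) {
  return /(^|\/)[^/]*Tests?\//.test(path) || /Tests?\.[a-z]+$/i.test(path);
}

// Count added/removed lines per file from unified-diff text.
// File headers are only recognised outside hunks, so a removed line that
// happens to start with "-- " is still counted as content.
function diffStats(diffText) {
  const stats = new Map();
  let current = null;
  let inHunk = false;
  for (const line of diffText.split("\n")) {
    if (line.startsWith("diff --git ")) {
      inHunk = false;
    } else if (!inHunk && line.startsWith("+++ b/")) {
      current = { added: 0, removed: 0 };
      stats.set(line.slice("+++ b/".length), current);
    } else if (line.startsWith("@@")) {
      inHunk = true;
    } else if (inHunk && current) {
      if (line.startsWith("+")) current.added += 1;
      else if (line.startsWith("-")) current.removed += 1;
    }
  }
  return stats;
}

// Fire only when a test file shrinks (removed > added).
function testLinesRemovedSignal(path, stat) {
  if (!isTestPath(path) || stat.removed <= stat.added) return null;
  const weight = 3.0;
  return {
    signal: { name: "test_lines_removed", weight, value: stat.removed - stat.added },
    scoreDelta: 12 * weight, // flat contribution, independent of value
  };
}
```

Note that the score contribution is flat: per the hunk, a test file that loses one net line adds the same 36 points as one that loses sixteen, and only the `value` field records the magnitude.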

package/pipeline/scripts/fixtures/diff-risk-test-removal.diff ADDED

@@ -0,0 +1,40 @@
+diff --git a/MyAppTests/LoginViewModelTests.swift b/MyAppTests/LoginViewModelTests.swift
+index 1111111..2222222 100644
+--- a/MyAppTests/LoginViewModelTests.swift
++++ b/MyAppTests/LoginViewModelTests.swift
+@@ -10,30 +10,20 @@ final class LoginViewModelTests: XCTestCase {
+     func testLoginWithValidCredentials_Succeeds() {
+         let sut = LoginViewModel(service: MockAuthService())
++        sut.retryPolicy = .none
+         sut.login(email: "user@example.com", password: "correct")
++        XCTAssertTrue(sut.isAuthenticated)
+     }
+-
+-    func testLoginWithInvalidEmail_ShowsError() {
+-        let sut = LoginViewModel(service: MockAuthService())
+-        sut.login(email: "not-an-email", password: "irrelevant")
+-        XCTAssertEqual(sut.errorMessage, "Invalid email")
+-    }
+-
+-    func testLoginWithExpiredToken_Rejects() {
+-        let sut = LoginViewModel(service: MockAuthService(tokenState: .expired))
+-        sut.login(email: "user@example.com", password: "correct")
+-        XCTAssertFalse(sut.isAuthenticated)
+-    }
+-
+-    func testLogout_ClearsSession() {
+-        let sut = LoginViewModel(service: MockAuthService())
+-        sut.logout()
+-        XCTAssertNil(sut.session)
+-    }
+ }
+diff --git a/MyApp/Sources/Auth/LoginViewModel.swift b/MyApp/Sources/Auth/LoginViewModel.swift
+index 3333333..4444444 100644
+--- a/MyApp/Sources/Auth/LoginViewModel.swift
++++ b/MyApp/Sources/Auth/LoginViewModel.swift
+@@ -20,6 +20,8 @@ final class LoginViewModel {
+     func login(email: String, password: String) {
++        guard email.contains("@") else { return }
++        service.authenticate(email: email, password: password)
+     }
+ }

package/pipeline/scripts/fixtures/install-layout.tsv CHANGED

@@ -1,16 +1,16 @@
 .claude/CLAUDE.md	1
 .claude/agents	8
-.claude/commands	88
+.claude/commands	89
 .claude/lib	23
 .claude/multi-agent-preferences.json	1
 .claude/rules	12
 .claude/schemas	23
-.claude/scripts	167
+.claude/scripts	171
 .claude/settings.json	1
 .claude/skills	560
 .copilot/agents	8
 .copilot/copilot-instructions.md	1
 .copilot/lib	23
 .copilot/schemas	23
-.copilot/scripts	167
+.copilot/scripts	171
 .copilot/skills	596

package/pipeline/scripts/smoke-diff-risk.sh CHANGED

@@ -12,6 +12,7 @@
 #   8. phase-4-review.md ref doc declares Step 1.75 + diff-risk-score.mjs
 #   9. code-reviewer.md agent template carries the priority-files placeholder
 #   10. prefs.schema.json exposes diffRisk advisory toggle
+#   11. test-removal fixture fires the test_lines_removed signal (v1.1.0)
 #
 # Exit 0 = all pass, 1 = any failure.
@@ -26,6 +27,7 @@ REVIEWER="$ROOT/pipeline/agents/code-reviewer.md"
 PREFS="$ROOT/pipeline/schemas/prefs.schema.json"
 FIX_IOS="$ROOT/pipeline/scripts/fixtures/diff-risk-ios.diff"
 FIX_AND="$ROOT/pipeline/scripts/fixtures/diff-risk-android.diff"
+FIX_TESTRM="$ROOT/pipeline/scripts/fixtures/diff-risk-test-removal.diff"
 pass=0
 fail=0
@@ -38,10 +40,11 @@ printf '→ smoke-diff-risk (v8.3.0): pre-review risk scoring contract\n'
 [ -f "$SCHEMA" ]   || { record_fail "schema missing: $SCHEMA"; exit 1; }
 [ -f "$FIX_IOS" ]  || { record_fail "fixture missing: $FIX_IOS"; exit 1; }
 [ -f "$FIX_AND" ]  || { record_fail "fixture missing: $FIX_AND"; exit 1; }
+[ -f "$FIX_TESTRM" ] || { record_fail "fixture missing: $FIX_TESTRM"; exit 1; }
 # --- 1: iOS fixture produces JSON ---
 out_ios=$(node "$SCORE" --diff "$FIX_IOS" 2>/dev/null)
-if jq -e '.schemaVersion == "1.0.0"' <<< "$out_ios" >/dev/null 2>&1; then
+if jq -e '.schemaVersion == "1.1.0"' <<< "$out_ios" >/dev/null 2>&1; then
   record_pass "iOS fixture renders schema-versioned JSON"
 else
   record_fail "iOS fixture JSON malformed or missing schemaVersion"
@@ -150,6 +153,32 @@ else
   record_fail "prefs.schema.json missing global.diffRiskAdvisory"
 fi
+# --- 11: test_lines_removed signal fires on the test-removal fixture ---
+out_testrm=$(node "$SCORE" --diff "$FIX_TESTRM" 2>/dev/null)
+sig_value=$(jq -r '.files[] | select(.path == "MyAppTests/LoginViewModelTests.swift")
+                   | .signals[] | select(.name == "test_lines_removed") | .value' <<< "$out_testrm")
+if [ "$sig_value" = "16" ]; then
+  record_pass "test_lines_removed fires with value=16 (18 removed - 2 added)"
+else
+  record_fail "test_lines_removed should fire with value=16, got: ${sig_value:-missing}"
+fi
+sig_on_source=$(jq -r '[.files[] | select(.path == "MyApp/Sources/Auth/LoginViewModel.swift")
+                        | .signals[] | select(.name == "test_lines_removed")] | length' <<< "$out_testrm")
+if [ "$sig_on_source" = "0" ]; then
+  record_pass "test_lines_removed does not fire on source files"
+else
+  record_fail "test_lines_removed must only fire on test-classified paths"
+fi
+set +e
+echo "$out_testrm" | node "$VALIDATE" - >/dev/null 2>&1
+rc_testrm=$?
+set -e
+if [ "$rc_testrm" -eq 0 ]; then
+  record_pass "validator accepts output carrying test_lines_removed"
+else
+  record_fail "validator rejected test_lines_removed output (rc=$rc_testrm)"
+fi
 # --- Summary ---
 total=$((pass + fail))
 printf '\n→ smoke-diff-risk: %d/%d passed\n' "$pass" "$total"

package/pipeline/scripts/smoke-handoff-contract.sh ADDED

@@ -0,0 +1,92 @@
+#!/usr/bin/env bash
+# smoke-handoff-contract.sh
+#
+# Verifies the v10.8.0 structured-handoff contract (fresh-context re-entry):
+#   1. operations.md documents the Handoff block with all 5 required lines
+#   2. operations.md compaction trigger re-reads state AND the latest handoff
+#   3. log-format.md documents the Handoff section in the canonical log shape
+#   4. resume.md Step 3 reads the latest handoff FIRST with pre-v10.8 fallback
+#
+# Exit 0 = all pass, 1 = any failure.
+set -euo pipefail
+ROOT="$(cd "$(dirname "$0")/../.." && pwd)"
+OPS="$ROOT/pipeline/commands/multi-agent/refs/phases/operations.md"
+LOGFMT="$ROOT/pipeline/commands/multi-agent/refs/phases/log-format.md"
+RESUME="$ROOT/pipeline/commands/multi-agent/resume.md"
+pass=0
+fail=0
+failures=()
+record_pass() { pass=$((pass + 1)); printf '  \033[0;32mPASS\033[0m %s\n' "$1"; }
+record_fail() { fail=$((fail + 1)); failures+=("$1"); printf '  \033[0;31mFAIL\033[0m %s\n' "$1"; }
+printf '→ smoke-handoff-contract: structured handoff (fresh-context re-entry)\n'
+# 1. operations.md documents the Handoff block with the 5 required lines
+if [ ! -f "$OPS" ]; then
+  record_fail "operations.md missing"
+else
+  if grep -qF "Handoff block (v10.8.0)" "$OPS"; then
+    record_pass "operations.md documents the Handoff block"
+  else
+    record_fail "operations.md missing 'Handoff block (v10.8.0)' spec"
+  fi
+  for line in "- Done:" "- Remaining:" "- Decisions:" "- Open findings:" "- Next:"; do
+    if grep -qF -- "$line" "$OPS"; then
+      record_pass "operations.md handoff spec has '$line'"
+    else
+      record_fail "operations.md handoff spec missing '$line'"
+    fi
+  done
+  if grep -qF "no agent dispatch, no extra LLM call" "$OPS"; then
+    record_pass "operations.md states handoff is orchestrator-written (no LLM call)"
+  else
+    record_fail "operations.md must state the handoff costs no LLM call"
+  fi
+fi
+# 2. Compaction trigger re-reads state AND latest handoff
+if grep -qE 'agent-state\.json.*AND the latest.*Handoff' "$OPS"; then
+  record_pass "compaction trigger re-reads state + latest handoff"
+else
+  record_fail "operations.md compaction trigger must re-read agent-state.json AND the latest Handoff block"
+fi
+# 3. log-format.md documents the Handoff section
+if grep -qF "## Handoff - end of Phase" "$LOGFMT"; then
+  record_pass "log-format.md documents the Handoff section"
+else
+  record_fail "log-format.md missing the Handoff section"
+fi
+if grep -qF "LATEST block is authoritative" "$LOGFMT"; then
+  record_pass "log-format.md states latest-block-wins semantics"
+else
+  record_fail "log-format.md must state the latest handoff block is authoritative"
+fi
+# 4. resume.md reads handoff first, with fallback for older logs
+if grep -qE 'LATEST .?## Handoff.? block' "$RESUME"; then
+  record_pass "resume.md Step 3 reads the latest Handoff block first"
+else
+  record_fail "resume.md Step 3 must read the latest Handoff block first"
+fi
+if grep -qiF "fall back to per-phase findings" "$RESUME"; then
+  record_pass "resume.md keeps the pre-v10.8 per-phase fallback"
+else
+  record_fail "resume.md must keep the pre-v10.8 per-phase findings fallback"
+fi
+if grep -qF "trust state on mismatch" "$RESUME"; then
+  record_pass "resume.md defines state-wins conflict rule"
+else
+  record_fail "resume.md must define the handoff-vs-state conflict rule (state wins)"
+fi
+printf '\n══ handoff-contract smoke: %d passed, %d failed ══\n' "$pass" "$fail"
+if [ "$fail" -gt 0 ]; then
+  printf '\nFailures:\n'
+  for msg in "${failures[@]}"; do printf '  - %s\n' "$msg"; done
+  exit 1
+fi
+exit 0

package/pipeline/scripts/smoke-update-check.sh ADDED

@@ -0,0 +1,122 @@
+#!/usr/bin/env bash
+# smoke-update-check.sh
+#
+# Verifies the Phase 0 Step 0.6 advisory update-check contract:
+#   1. update-check.sh exists, parses (bash -n), and honors the advisory contract
+#      offline: exit 0 + empty stdout when the registry is unreachable
+#   2. Cached path: fresh cache short-circuits without a network call
+#   3. Newer latest -> "<local>|<latest>"; same or older latest -> silent
+#   4. prefs.schema.json exposes updateCheck.{enabled,ttlHours,autoUpdate}
+#      with the documented defaults (enabled=true, autoUpdate=false)
+#   5. phase-0-init.md documents Step 0.6 with the autopilot log-only rule
+#
+# Exit 0 = all pass, 1 = any failure.
+set -euo pipefail
+ROOT="$(cd "$(dirname "$0")/../.." && pwd)"
+SCRIPT="$ROOT/pipeline/scripts/update-check.sh"
+PREFS_SCHEMA="$ROOT/pipeline/schemas/prefs.schema.json"
+PHASE0_DOC="$ROOT/pipeline/commands/multi-agent/refs/phases/phase-0-init.md"
+pass=0
+fail=0
+failures=()
+record_pass() { pass=$((pass + 1)); printf '  \033[0;32mPASS\033[0m %s\n' "$1"; }
+record_fail() { fail=$((fail + 1)); failures+=("$1"); printf '  \033[0;31mFAIL\033[0m %s\n' "$1"; }
+printf '→ smoke-update-check: Phase 0 Step 0.6 advisory contract\n'
+tmpdir=$(mktemp -d)
+trap 'rm -rf "$tmpdir"' EXIT
+CACHE="$tmpdir/update-check-cache"
+# 1. Script parses
+if bash -n "$SCRIPT" 2>/dev/null; then
+  record_pass "update-check.sh parses (bash -n)"
+else
+  record_fail "update-check.sh has syntax errors"
+fi
+now=$(date +%s)
+# 2. Fresh cache with newer latest -> reports, no network needed
+printf '%s|99.0.0\n' "$now" > "$CACHE"
+out=$(UPDATE_CHECK_CACHE="$CACHE" bash "$SCRIPT" --local 10.0.0 2>/dev/null); rc=$?
+if [ "$rc" -eq 0 ] && [ "$out" = "10.0.0|99.0.0" ]; then
+  record_pass "newer cached latest -> '<local>|<latest>' (exit 0)"
+else
+  record_fail "newer cached latest should print '10.0.0|99.0.0' (got '$out', rc=$rc)"
+fi
+# 3. Same version -> silent
+printf '%s|10.0.0\n' "$now" > "$CACHE"
+out=$(UPDATE_CHECK_CACHE="$CACHE" bash "$SCRIPT" --local 10.0.0 2>/dev/null); rc=$?
+if [ "$rc" -eq 0 ] && [ -z "$out" ]; then
+  record_pass "same version -> silent"
+else
+  record_fail "same version should be silent (got '$out', rc=$rc)"
+fi
+# 3b. Local ahead of registry (dev machine) -> silent
+printf '%s|10.0.0\n' "$now" > "$CACHE"
+out=$(UPDATE_CHECK_CACHE="$CACHE" bash "$SCRIPT" --local 10.1.0 2>/dev/null); rc=$?
+if [ "$rc" -eq 0 ] && [ -z "$out" ]; then
+  record_pass "local ahead of registry -> silent (no downgrade prompt)"
+else
+  record_fail "local-ahead should be silent (got '$out', rc=$rc)"
+fi
+# 3c. Offline + stale cache -> silent exit 0 (advisory: never blocks)
+printf '0|10.0.0\n' > "$CACHE"
+out=$(UPDATE_CHECK_CACHE="$CACHE" http_proxy="http://127.0.0.1:1" https_proxy="http://127.0.0.1:1" \
+      bash "$SCRIPT" --local 10.0.0 2>/dev/null); rc=$?
+if [ "$rc" -eq 0 ] && [ -z "$out" ]; then
+  record_pass "offline + stale cache -> silent exit 0"
+else
+  record_fail "offline should be a silent no-op (got '$out', rc=$rc)"
+fi
+# 4. Prefs schema knobs + defaults
+for prop in enabled ttlHours autoUpdate; do
+  if jq -e ".properties.global.properties.updateCheck.properties.${prop}" "$PREFS_SCHEMA" >/dev/null 2>&1; then
+    record_pass "prefs schema exposes updateCheck.${prop}"
+  else
+    record_fail "prefs schema missing updateCheck.${prop}"
+  fi
+done
+if jq -e '.properties.global.properties.updateCheck.properties.enabled.default == true' "$PREFS_SCHEMA" >/dev/null 2>&1; then
+  record_pass "updateCheck.enabled defaults to true (notify-only, bounded cost)"
+else
+  record_fail "updateCheck.enabled must default to true"
+fi
+if jq -e '.properties.global.properties.updateCheck.properties.autoUpdate | has("default") and .default == false' "$PREFS_SCHEMA" >/dev/null 2>&1; then
+  record_pass "updateCheck.autoUpdate defaults to false (no silent self-modify)"
+else
+  record_fail "updateCheck.autoUpdate must default to false"
+fi
+# 5. Phase 0 doc wiring
+if grep -qF "Step 0.6 - Update check" "$PHASE0_DOC"; then
+  record_pass "phase-0-init.md documents Step 0.6"
+else
+  record_fail "phase-0-init.md missing Step 0.6"
+fi
+if grep -qF "update-check.sh" "$PHASE0_DOC"; then
+  record_pass "phase-0-init.md invokes update-check.sh"
+else
+  record_fail "phase-0-init.md must invoke update-check.sh"
+fi
+if grep -qF "log-only" "$PHASE0_DOC" && grep -qF "never ask (zero-interaction contract)" "$PHASE0_DOC"; then
+  record_pass "phase-0-init.md states the autopilot log-only rule"
+else
+  record_fail "phase-0-init.md must state autopilot never asks (log-only)"
+fi
+printf '\n══ update-check smoke: %d passed, %d failed ══\n' "$pass" "$fail"
+if [ "$fail" -gt 0 ]; then
+  printf '\nFailures:\n'
+  for msg in "${failures[@]}"; do printf '  - %s\n' "$msg"; done
+  exit 1
+fi
+exit 0
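
The behaviour this smoke test pins down (a fresh cache short-circuits, a newer version prints `<local>|<latest>`, everything else is silent) can be modelled in a few lines. The body of `update-check.sh` is not part of this excerpt, so what follows is a hypothetical re-implementation of the asserted contract only. The `<epoch>|<version>` cache line matches what the test writes; the function names, the plain `x.y.z` version format, and the injected `fetchLatest` callback are assumptions.

```javascript
// Sketch only: models the contract the smoke test asserts, not update-check.sh itself.

// Parse "x.y.z" into three numbers; anything else is treated as unknown.
function parseVersion(v) {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(String(v).trim());
  return m ? m.slice(1).map(Number) : null;
}

// Numeric comparison, so 10.10.0 is newer than 10.9.0.
function isNewer(latest, local) {
  const a = parseVersion(latest);
  const b = parseVersion(local);
  if (!a || !b) return false;
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] > b[i];
  }
  return false;
}

// Return the cached version if the "<epoch>|<version>" line is within the TTL.
function readFreshCache(cacheLine, nowSec, ttlHours) {
  if (!cacheLine) return null;
  const [ts, version] = cacheLine.trim().split("|");
  const age = nowSec - Number(ts);
  if (!Number.isFinite(age) || age < 0 || age >= ttlHours * 3600) return null;
  return version || null;
}

// Advisory check: returns "<local>|<latest>" or "" and never throws.
function updateNotice({ local, cacheLine, nowSec, ttlHours = 24, fetchLatest = () => null }) {
  let latest = readFreshCache(cacheLine, nowSec, ttlHours);
  if (latest === null) {
    try {
      latest = fetchLatest(); // stand-in for the bounded registry read
    } catch {
      latest = null; // offline or failed: stay silent
    }
  }
  return latest && isNewer(latest, local) ? `${local}|${latest}` : "";
}
```

Passing `fetchLatest` in as a callback keeps the network read out of the comparison logic, which is also what lets the smoke test exercise the cached path without a registry.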