npm - @mmerterden/multi-agent-pipeline - Versions diffs - 10.8.0 → 10.9.0 - Mend

@mmerterden/multi-agent-pipeline 10.8.0 → 10.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14) hide show

package/CHANGELOG.md +39 -0
package/README.md +2 -0
package/docs/engineering.md +76 -0
package/docs/features.md +40 -36
package/package.json +1 -1
package/pipeline/commands/multi-agent/refs/phases/phase-0-init.md +9 -0
package/pipeline/commands/multi-agent/refs/phases/phase-3-dev.md +10 -0
package/pipeline/commands/multi-agent/refs/phases/phase-4-review.md +2 -23
package/pipeline/schemas/prefs.schema.json +24 -0
package/pipeline/schemas/token-budget.json +7 -7
package/pipeline/scripts/README.md +1 -0
package/pipeline/scripts/fixtures/install-layout.tsv +2 -2
package/pipeline/scripts/smoke-update-check.sh +122 -0
package/pipeline/scripts/update-check.sh +82 -0

package/CHANGELOG.md CHANGED Viewed

@@ -14,6 +14,45 @@ Internal file-layout changes that don't affect the slash-command surface are sti
 ---
+## [10.9.0] - 2026-07-03
+### Added
+- **Update check at run start (Phase 0 Step 0.6, opt-out via
+  `prefs.global.updateCheck`).** Once per `ttlHours` window (default 24h,
+  cached, 3s-bounded curl straight to the npm registry - never `npm view`),
+  the pipeline compares the installed version against `dist-tags.latest`.
+  Newer version found: interactive modes ask ONE question ("Update now?" ->
+  yes runs the `/multi-agent:update` flow and the run continues; the update
+  takes full effect next session); autopilot logs one line and never asks
+  (zero-interaction contract). `autoUpdate: true` skips the question and
+  updates before the worktree is created. Offline, ahead-of-registry, and
+  failed checks are silent - the step never blocks. New
+  `pipeline/scripts/update-check.sh` + `smoke-update-check.sh` (13 assertions).
+- **Design fidelity contract (Phase 3, BLOCKING on Figma-referenced tasks).**
+  The analysis doc's captured frames are the 1:1 reference for every UI line:
+  a Code Connect-mapped component must be used verbatim (no sound-alikes,
+  missing modifiers go into `+Modifiers`, never forks), unmapped UI becomes a
+  NEW component under the standard architecture, and inter-component spacing
+  comes from the design's measured values mapped to tokens - never invented.
+  Missing design data halts with a re-run-analysis instruction; Figma MCP
+  stays forbidden outside analysis.
+- **`docs/engineering.md`** - version-free catalog of the discipline layer
+  (bounded loops, evidence gates, token-budgeted phase docs, immutable tests,
+  fresh-context handoffs, memory hedges, install hygiene), linked from README.
+### Changed
+- **Docs drop inline version markers.** `docs/features.md` headers and bullets
+  no longer carry `(vX.Y.Z)` tags - the docs describe the current state;
+  history lives in this CHANGELOG. Same principle applied to the website copy.
+- **Phase-doc token budgets recalibrated** (`token-budget.json`, total
+  46500 -> 50000) after the verify-by-test, update-check, immutable-test, and
+  design-fidelity contracts landed - Step 3.7 prose was compressed to a
+  pointer into `refs/features/verify-by-test.md` before recalibration.
+---
 ## [10.8.0] - 2026-07-02
 Three additive loop-quality features, all sourced from the 2026 agentic-loop research sweep (Anthropic long-running-agent harness guidance + adversarial-review findings): empirical verification of blocking findings, an immutable-test contract, and structured phase-boundary handoffs.

package/README.md CHANGED Viewed

@@ -51,6 +51,8 @@ One command runs 8 phases, with a gate between the risky ones:
 Under the hood: each task runs in its own **git worktree** (or the current branch with `:local`), commits use the **git identity routed from the repo's origin URL**, and **multi-repo** tasks get per-repo worktrees plus an integration build. Tokens stay in the OS keychain; nothing is committed or logged. `/multi-agent:review` can also review an existing GitHub/Bitbucket PR — per-finding inline comments anchored to `file:line` + an explicit Approve / Needs-Work state.
+The discipline behind all of this — bounded loops, evidence gates, token-budgeted phase docs, immutable tests, fresh-context handoffs — is catalogued in [docs/engineering.md](./docs/engineering.md). The full feature list lives in [docs/features.md](./docs/features.md).
 ## Modes
 | Mode | Command | Flow |

package/docs/engineering.md ADDED Viewed

@@ -0,0 +1,76 @@
+# Engineering Discipline
+The pipeline's output quality comes less from any single model call and more from the constraints wrapped around every call. This page lists those constraints - the internals that keep long agentic runs correct, cheap, and auditable. No version markers here; this document always describes the current state (history lives in the [CHANGELOG](../CHANGELOG.md)).
+## Loop discipline
+Every loop in the pipeline has a deterministic exit condition and a hard iteration cap. Nothing "keeps trying".
+- **TDD micro-loop** (Phase 3): red -> green -> refactor per task; build-fix retries capped at 3 attempts.
+- **Dev-critic loop** (Phase 3.5, opt-in): generator vs critic on deterministic criteria; max 2 iterations, round 3 escalates to the user.
+- **Review-fix macro-loop** (Phase 3 <-> 4): only triage-accepted blocking findings re-enter development; max 3 iterations, then hard-kill and escalate.
+- **Disagreement round** (Phase 4, opt-in): when reviewers split, exactly one rebuttal round - never more.
+- **Global rule**: any retry loop stops after 3 attempts and asks the user. No exceptions, no infinite loops.
+## Evidence over claims
+- **Default-FAIL evidence gate**: a "build passed" or "tests passed" claim is rejected unless the captured log actually shows success. Missing, empty, or failure-marked logs fail CLOSED.
+- **Verify-by-test** (opt-in): an accepted blocking review finding can be empirically confirmed - a verifier agent writes a minimal repro test and runs it. Test fails as predicted: the finding stands and the failing test is handed to the rework loop as its RED test. Test passes: the finding is downgraded, and even the downgrade must pass the evidence gate.
+- **Deterministic validator gates**: triage output, reviewer output, diff-risk output, and agent state are schema-validated by zero-dependency scripts whose exit codes decide the flow - one self-correction rework, then halt with a recovery hint. The LLM never grades its own homework.
+## Test integrity
+- **Immutable tests**: deleting, renaming, or weakening an existing test to get a green run is a rule violation. A test changes only when the task changes the spec it encodes, named in the commit body.
+- **Deterministic backstop**: the diff-risk scorer flags any test file whose diff removes more lines than it adds (`test_lines_removed`), so a shrinking test suite surfaces to reviewers even if the rule is ignored.
+- **Test-gap detection**: newly added public symbols with no paired test are reported before the test phase.
+## Context management
+- **Token-budgeted phase docs**: every phase document has a per-file and total token budget enforced by a smoke gate (`token-budget.json`). Growth is compressed first; budgets are recalibrated only when real contract text lands. This keeps the orchestrator's working context lean across an 8-phase run.
+- **Lazy loading**: only the active phase's document is in context; references load on demand.
+- **Structured handoff blocks**: at every phase boundary the orchestrator appends a Done / Remaining / Decisions / Open findings / Next block to the run log - written from state it already holds, no extra model call. Resume and post-compaction re-entry read the latest handoff plus the state file, never the conversation history.
+- **Proactive compaction**: past ~50% context usage the run compacts itself and re-grounds from durable artifacts instead of waiting for lossy auto-compaction.
+## Review architecture
+- **Deterministic gates before AI review**: build, lint, tests, and secret scan must be green before a single reviewer token is spent.
+- **Cross-model parallel review**: independent reviewers with different vantage points, then a triage pass that filters false positives and out-of-scope noise. Reviewer findings are raw signals, not commands - nothing auto-loops on a "blocking" tag.
+- **Consensus surfacing**: unanimous agreement between same-base-model reviewers on judgment-heavy surfaces is flagged as unverified instead of being trusted as independent confirmation.
+- **Diff risk scoring**: a sub-second, zero-LLM heuristic ranks changed files (security paths, migrations, missing tests, shrinking tests) and feeds reviewers a priority hint - advisory, never a gate.
+- **Prompt-cache-aware dispatch**: reviewer and triage prompts share a byte-identical leading block so parallel dispatches hit the prompt cache instead of re-billing the diff.
+## Design fidelity
+- **Single source of design truth**: Figma is fetched once, during analysis, through a 3-tier fallback chain (MCP -> REST -> user screenshot). Every later phase reads the frozen analysis artifact; calling Figma from dev or review phases is a contract violation enforced by a smoke gate.
+- **1:1 implementation contract**: a Code Connect-mapped component must be used verbatim (no sound-alike substitutes; missing modifiers are added to the component, never forked). Unmapped UI becomes a new component under the standard architecture. Spacing between components comes from the design's measured values mapped to tokens - never invented numbers.
+## Cost transparency
+- **Per-phase token ledger**: every billable call forwards token counts to the tracker; each run ends with a cost table naming the top cost driver and pricing prompt-cache reads at their discounted rate.
+- **Budget ceilings**: a cost budget can cap a run; crossing it triggers model downgrade or halt rather than silent overspend.
+- **Model fallback contract**: premium-tier dispatches degrade deterministically (top tier -> mid -> base) on dispatch errors, date gates, or budget ceilings - never by improvisation.
+## Memory without bias
+- **Learnings ledger**: durable per-repo facts and rejected review preferences replay into analysis and triage, so agents stop re-discovering and reviewers stop re-flagging the same things.
+- **Hedged injection**: every memory injection carries an explicit "context, not command" hedge, and blocking-severity rejections are never memorized - a wrong rejection must not permanently suppress a finding class.
+- **Prior-art caps**: triage sees at most a fixed number of highest-similarity past findings; memory is advisory context, not a verdict multiplier.
+## Update and install hygiene
+- **Wipe-then-copy install**: pipeline-owned directories are cleared before copying, so files deleted upstream never linger as ghost commands or dead scripts on user machines.
+- **Layout fingerprint**: a fixture pins the installed file layout; structural drift fails the suite.
+- **Advisory update check**: at run start (cached, bounded, offline-silent) the pipeline compares its installed version with the registry; interactive runs offer a one-question update, autopilot only logs. It never blocks and never self-modifies mid-run.
+- **Token-preserving uninstall**: removing the pipeline never touches the user's stored credentials.
+## Security and hygiene
+- **Secret scanning**: staged diffs are scanned with provider-prefix patterns plus Shannon-entropy detection before every commit; findings block.
+- **Personal-data genericization**: the public repo is generated from the private source with a scrub pass (names, hosts, project keys) and a zero-hit grep gate before push.
+- **Skill scanner**: tiered pattern scanning over skill content guards against supply-chain surprises.
+- **Intent guard**: a deterministic classifier separates "answer my question" from "change my code", so a conceptual question never spawns a worktree.
+## Cross-CLI parity
+- **One source of truth, two runtimes**: Claude Code and Copilot CLI command surfaces are byte-identical where shared; a parity smoke fails on drift.
+- **Portability gates**: every shell script passes `bash -n` and a GNU/BSD portability grep; the CI matrix runs macOS, Linux, and Windows.

package/docs/features.md CHANGED Viewed

@@ -43,7 +43,7 @@ Compose freely: `--dev --local autopilot` = shortest, least-friction path.
 Build commands, test runners, lint tools, and review focus areas all adapt to the detected stack.
-### Stack Selection (marketplace plugins, v10.5.0+)
+### Stack Selection (marketplace plugins)
 Stack skill sets ship as versioned plugins in the `multi-agent-plugins` marketplace. Selecting a stack enables the matching plugin(s) in the current repo's `.claude/settings.json` `enabledPlugins` — no skill copying, no session restart tricks, no directory shuffling. The `ai-common-engineering-toolkit` (accessibility audit, humanizer, Firebase) is always enabled alongside the stack plugin.
@@ -57,7 +57,7 @@ Stack skill sets ship as versioned plugins in the `multi-agent-plugins` marketpl
 /multi-agent:stack all         # every stack plugin
 ```
-### Task Type Detection (v2.0.0)
+### Task Type Detection
 Phase 0 Step 9 classifies every task before Phase 1 starts. Deterministic priority order: Figma URL → instruction file path → git diff heuristic → Jira issue type → branch name → description keywords → user prompt (autopilot defaults to `feature`).
@@ -71,26 +71,26 @@ Result persisted to `agent-state.taskType`:
 | `refactor`  | Phase 4 emphasizes behavior preservation; Phase 6 uses `refactor(...)` prefix |
 | `chore`     | Lightweight flow; Phase 6 uses `chore(...)` prefix                            |
-### SubPhase Convention (v2.0.0)
+### SubPhase Convention
-When a specialized skill takes over a main pipeline phase, progress is reported as SubPhases (e.g. `SubPhase 3.0: Init`, `SubPhase 3.1: Gather`). The top-level pipeline stays fixed at 8 phases (0-7 — Phase 6.5 was merged into Phase 7 REPORT in v5.1.0) — specialized work slots into its parent phase without inflating the count.
+When a specialized skill takes over a main pipeline phase, progress is reported as SubPhases (e.g. `SubPhase 3.0: Init`, `SubPhase 3.1: Gather`). The top-level pipeline stays fixed at 8 phases (0-7) — specialized work slots into its parent phase without inflating the count.
 ## PR & Review Flow
-### Default Reviewers (v1.6.1, hardened in v2.0.0)
+### Default Reviewers
 - **Bitbucket**: Fetches via `/rest/default-reviewers/1.0/`. Every PUT must re-send `reviewers`, `fromRef`, `toRef`, `draft` (regression guarded by smoke test).
 - **GitHub**: Honors `CODEOWNERS` + falls back to `prefs.projects[].githubDefaultReviewers`.
 - PR author always filtered out (Bitbucket: 409, GitHub: GraphQL error).
-### Draft vs Ready Prompt (v1.7.0)
+### Draft vs Ready Prompt
 Phase 6 asks `DRAFT or READY?` before creating the PR and persists the choice in `prefs.projects[].defaultPrMode`.
 - Bitbucket: `draft: true` flag (DC 8.x+) with `[DRAFT]` title fallback for older servers.
 - GitHub: `gh pr create --draft` + `gh pr ready` for promotion.
-### `channels` Command (v5.7.0 — replaces `enrich`)
+### `channels` Command
 Multi-channel reporter — Phase 7 delegates to it, and it's also invocable post-hoc for fixes closed outside the pipeline:
@@ -102,9 +102,9 @@ Multi-channel reporter — Phase 7 delegates to it, and it's also invocable post
 /multi-agent:channels ABC-1234 --channels jira,confluence --content test
 ```
-Multi-select **channels** (Jira / Confluence / Wiki / PR description) × multi-select **content** (normal analysis / test scenarios / auto-diff summary / manual note). Each body runs through the humanizer skill per-channel. Bitbucket PR updates use the reviewer-preserving PUT pattern (title + description + reviewers + fromRef + toRef + version mandatory). Replaces the pre-v5.7 `enrich` command — all its capabilities (diff auto-summarize, manual mode, reviewer-preserving) are preserved; Confluence + Wiki are new.
+Multi-select **channels** (Jira / Confluence / Wiki / PR description) × multi-select **content** (normal analysis / test scenarios / auto-diff summary / manual note). Each body runs through the humanizer skill per-channel. Bitbucket PR updates use the reviewer-preserving PUT pattern (title + description + reviewers + fromRef + toRef + version mandatory). Replaces the earlier `enrich` command — all its capabilities (diff auto-summarize, manual mode, reviewer-preserving) are preserved; Confluence + Wiki are new.
-### Body Preservation Contract (v1.6.1, verified by smoke test in v2.0.0)
+### Body Preservation Contract (smoke-verified)
 Every external-system body (PR description, Jira comment, GitHub issue) uses `jq -n --rawfile body body.md '{description: $body}'` → `curl --data-binary @payload.json`. No literal `\n` strings, no HTML entities (`&amp;`, `&lt;`, `&quot;`). UTF-8 preserved end-to-end. `scripts/smoke-add-detail.sh` runs 14 contract assertions.
@@ -137,7 +137,7 @@ The reviewer set is **CLI-aware**: Claude Code dispatches 2 reviewers in paralle
 **Fable Triage** (Phase 4 Step 3, Opus on Copilot CLI): Evaluates merged raw findings against task scope. Classifies each as `accepted` (fix now), `deferred` (out of scope, log for later), or `rejected` (false positive / noise). Only triage-accepted blocking items loop back to Phase 3.
-### Runtime Triage Validator (v2.3.0)
+### Runtime Triage Validator
 After triage returns, output is validated by `validate-triage.mjs`:
@@ -148,19 +148,23 @@ After triage returns, output is validated by `validate-triage.mjs`:
 | **2** | Over-rejection guard tripped — pause for human               |
 | **3** | Contradiction auto-corrected — proceed with corrected output |
-### Bidirectional Approved↔Blocking Auto-Correction (v2.6.1)
+### Bidirectional Approved↔Blocking Auto-Correction
-If triage returns `approved: false` but has no blocking items, the validator forces `approved: true`. Conversely, if `approved: true` but blocking items exist, it forces `approved: false`. Hardened with `if`/`then` constraint in schema v3.0.0.
+If triage returns `approved: false` but has no blocking items, the validator forces `approved: true`. Conversely, if `approved: true` but blocking items exist, it forces `approved: false`. Hardened with an `if`/`then` constraint in the schema itself.
-### Verify-by-Test Triage (Phase 4 Step 3.7, v10.8.0, opt-in)
+### Verify-by-Test Triage (Phase 4 Step 3.7, opt-in)
 A triage verdict is a judgment call; a failing repro test is proof. When `prefs.global.verifyByTest.enabled` is on, one verifier agent (default Sonnet) writes a minimal repro test per accepted blocking finding (cap: `maxFindings`=3) and runs only that test. Fails as predicted -> finding confirmed, the repro test becomes the Phase 3 rework RED test. Passes under `evidence-gate.mjs` -> finding downgraded to `deferred`. Compile error / timeout -> `inconclusive`, judgment stands. Timeout-bounded, never blocks. Full spec: `refs/features/verify-by-test.md`.
-### Immutable-Test Rule + `test_lines_removed` Signal (v10.8.0)
+### Immutable-Test Rule + `test_lines_removed` Signal
 Existing tests are immutable during a task: deleting, renaming, or weakening an assertion to reach green is a violation (`refs/rules.md`, Phase 3 GREEN step). A test changes only when the task changes the spec it encodes, named in the commit body. Deterministic backstop: `diff-risk-score.mjs` emits `test_lines_removed` (w=3.0) for any test-classified file whose diff removes more lines than it adds.
-### Structured Handoff Blocks (v10.8.0)
+### Update Check at Run Start
+Phase 0 Step 0.6, opt-out via `prefs.global.updateCheck`. Once per `ttlHours` window (cached, 3s-bounded curl to the npm registry), the installed version is compared against `dist-tags.latest`. Newer version: interactive modes ask "Update now?" (yes runs the `/multi-agent:update` flow, then the run continues); autopilot logs one line and never asks. `autoUpdate: true` updates silently before the worktree exists. Offline and failed checks are silent - never blocks.
+### Structured Handoff Blocks
 Every phase transition appends a `## Handoff` block (Done / Remaining / Decisions / Open findings / Next) to `agent-log.md` - orchestrator-written from existing state, no LLM call. `/multi-agent:resume` and post-`/compact` re-grounding read the latest handoff first, so long runs re-enter from durable artifacts instead of conversation memory (fresh-context discipline from Anthropic's long-running-agent harness guidance).
@@ -174,7 +178,7 @@ If changes include UI files, reviewers check for:
 Pure code analysis — no simulator needed. Device-level audits run in Phase 5 when requested.
-### Status Enforcement (v1.5.0)
+### Status Enforcement
 Phase 3 marks issue-tracker status update as MANDATORY with a post-mutation verify step that re-reads the field and retries once on silent `VALIDATION` failures (e.g. stale Projects V2 option IDs after a board rebuild).
@@ -187,49 +191,49 @@ Phase 3 marks issue-tracker status update as MANDATORY with a post-mutation veri
 ## Testing & Quality
-### Schema-Validated State (v2.0.0)
+### Schema-Validated State
 All critical state files are schema-validated at read and write time:
 - `agent-state.schema.json` — validates `$HOME/.claude/logs/multi-agent/.../agent-state.json`
-- `prefs.schema.json` — validates `$HOME/.claude/multi-agent-preferences.json` (extended v2.5.0)
-- `triage-output.schema.json` — validates triage output (v2.3.0, hardened v2.6.1, `if`/`then` constraint v3.0.0)
+- `prefs.schema.json` — validates `$HOME/.claude/multi-agent-preferences.json`
+- `triage-output.schema.json` — validates triage output (contradiction `if`/`then` constraint built in)
-### Smoke Test Suites (v2.0.0–v2.5.0)
+### Smoke Test Suites
 10 suites: add-detail (14 body-preservation assertions), review-triage (validator exit codes), prefs (schema round-trip), state (agent-state lifecycle), metrics (telemetry emission), sync (instruction parity), secret-scan (hook patterns), phase-banner (terminal UI), token-budget (per-phase limits), phase-tracker (progress tracking).
-### Adversarial Eval Fixtures (v2.5.0–v2.6.0)
+### Adversarial Eval Fixtures
 10 fixtures that test triage resilience against adversarial reviewer output: over-rejection, hallucinated findings, contradictions, invalid JSON, schema violations, duplicate findings, scope creep, empty results, timeout simulation, and combined edge cases.
-### Sync Parity Check (v2.3.0)
+### Sync Parity Check
 Detects drift between Claude Code instructions (`~/.claude/commands/multi-agent.md`), Copilot CLI instructions (`~/.copilot/copilot-instructions.md`), and the repo's pipeline spec files. Reports discrepancies during Phase 0 Init.
-### Token Budget Enforcement (v3.0.0)
+### Token Budget Enforcement
 Per-phase token budgets prevent runaway sessions. If a phase exceeds its budget, the pipeline pauses and offers: continue (extend budget), skip phase, or abort. Budgets are configurable in `prefs.global.tokenBudgets`.
 ## Telemetry & Observability
-- **Pipeline Metrics** (v2.3.0): Structured metrics to `metrics.jsonl` via `log-metric.sh`. Aggregated by `aggregate-metrics.mjs`.
-- **Cost Telemetry** (v2.5.0): Per-phase token cost tracking (`tokens_in`, `tokens_out`, `model`, `duration_ms`). Omitted fields handled gracefully.
-- **Phase Tracker** (v2.4.0): Cross-CLI visual progress (current phase, elapsed time, iteration count).
-- **Phase Banner** (v2.4.0): Terminal UI for phase transitions with Unicode box-drawing characters.
-- **Per-task Cost Breakdown in agent-log.md** (v8.3.0): Phase 7 appends a 4-column block (Phase · Model · Tokens in/out · Est. USD) to every run's `agent-log.md`. Sourced from `phase-tracker.sh tokens` accumulators × `cost-table.json` prices. Independent of the channels-side `reportContent.costSummary` toggle. The `LOG_METRIC_FORWARD_TO_TRACKER=1` env flag mirrors `tokens_in`/`tokens_out`/`model` from `log-metric.sh` into the tracker so JSONL metrics and the cost block stay in sync from one call site.
+- **Pipeline Metrics**: Structured metrics to `metrics.jsonl` via `log-metric.sh`. Aggregated by `aggregate-metrics.mjs`.
+- **Cost Telemetry**: Per-phase token cost tracking (`tokens_in`, `tokens_out`, `model`, `duration_ms`). Omitted fields handled gracefully.
+- **Phase Tracker**: Cross-CLI visual progress (current phase, elapsed time, iteration count).
+- **Phase Banner**: Terminal UI for phase transitions with Unicode box-drawing characters.
+- **Per-task Cost Breakdown in agent-log.md**: Phase 7 appends a 4-column block (Phase · Model · Tokens in/out · Est. USD) to every run's `agent-log.md`. Sourced from `phase-tracker.sh tokens` accumulators × `cost-table.json` prices. Independent of the channels-side `reportContent.costSummary` toggle. The `LOG_METRIC_FORWARD_TO_TRACKER=1` env flag mirrors `tokens_in`/`tokens_out`/`model` from `log-metric.sh` into the tracker so JSONL metrics and the cost block stay in sync from one call site.
-### Diff Risk Scoring (v8.3.0)
+### Diff Risk Scoring
 `pipeline/scripts/diff-risk-score.mjs` runs at Phase 4 Step 1.75 — before reviewer dispatch. Heuristic, deterministic, sub-second, no LLM. Top-N risk-ranked files inject into each reviewer's prompt as a `${PRIORITY_FILES}` block; reviewers read those files first but still review the entire diff.
-Signals + weights: `security_path` ×3, `migration` ×4, `public_api` ×2, `no_test_change` ×2.5, `test_lines_removed` ×3 (v10.8.0: test file shrinks - immutable-test backstop), `complexity_delta` ×1.5, `ui_critical` ×1.5, `loc_changed` ×1. Toggle via `prefs.global.diffRiskAdvisory` (default ON).
+Signals + weights: `security_path` ×3, `migration` ×4, `public_api` ×2, `no_test_change` ×2.5, `test_lines_removed` ×3 (test file shrinks - immutable-test backstop), `complexity_delta` ×1.5, `ui_critical` ×1.5, `loc_changed` ×1. Toggle via `prefs.global.diffRiskAdvisory` (default ON).
-### Test Gap Detection (v8.3.0)
+### Test Gap Detection
 `pipeline/scripts/test-gap-scan.mjs` runs at Phase 5 Step 0. Walks the diff for newly added public symbols and reports those with no paired test. Stack-specific rules ship for iOS, Android, Python, Node.js. iOS Views and Android `@Composable` symbols default to `important`; other public API additions to `suggestion`. Optional gating via `prefs.testGap.blockingThreshold` — when set, the report becomes a Phase 4 rework finding once `important + blocking` count exceeds the threshold.
-### Triage Memory (v8.3.0)
+### Triage Memory
 Per-repo append-only JSONL corpus at `~/.claude/memory/multi-agent/<repo-slug>/triage-corpus.jsonl`. Phase 7 ingests every triage output (idempotent), Phase 1 enriches the analysis with similar past tasks, Phase 4 triage attaches prior-art hits to each raw finding with an explicit bias hedge. Token-overlap recall, zero deps, Node-18-compatible. `/multi-agent:search "<text>" --semantic` routes the query to the corpus instead of agent-log grep. Toggle via `prefs.global.priorArtEnrichment.enabled` (default ON).
@@ -280,10 +284,10 @@ Token registry maps logical names (`jira`, `bitbucket`, `github`, `confluence`)
 ## Schemas & Validation
-- `pipeline/schemas/agent-state.schema.json` — validates agent state lifecycle (v2.0.0)
-- `pipeline/schemas/prefs.schema.json` — validates preferences (v2.0.0, extended v2.5.0)
-- `pipeline/schemas/triage-output.schema.json` — validates triage output (v2.3.0, hardened v2.6.1, `if`/`then` constraint v3.0.0)
+- `pipeline/schemas/agent-state.schema.json` — validates agent state lifecycle
+- `pipeline/schemas/prefs.schema.json` — validates preferences
+- `pipeline/schemas/triage-output.schema.json` — validates triage output (contradiction `if`/`then` constraint built in)
-## File Layout (v1.9.0)
+## File Layout
 `commands/multi-agent/{add-detail,setup,help}.md` → slash commands. `commands/multi-agent/refs/**` → guides, phase specs, rules (`refs:` prefix signals "not an action"). Modifier flags and ops stay inline in `multi-agent.md`.

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@mmerterden/multi-agent-pipeline",
-  "version": "10.8.0",
+  "version": "10.9.0",
   "description": "8-phase AI development pipeline with full orchestration on Claude Code and Copilot CLI. Analysis, planning, TDD, CLI-aware parallel review with consensus surfacing + Fable triage, default-FAIL evidence gates, secret + intent guards, per-phase cost ledger, persistent learnings memory, wiki generation, commit automation. Token-preserving uninstall.",
   "type": "module",
   "main": "index.js",

package/pipeline/commands/multi-agent/refs/phases/phase-0-init.md CHANGED Viewed

@@ -98,6 +98,15 @@ Log the resolved tier in the agent log:
 Full chain definition, REST endpoints, URL parsing, canonical-component contract: `pipeline/rules/figma-pipeline.md` "MUST: Figma access - 3-tier fallback chain (BLOCKING, pipeline-wide)". Do not duplicate it here.
+#### Step 0.6 - Update check (advisory, opt-out via `prefs.global.updateCheck.enabled`)
+Run `bash pipeline/scripts/update-check.sh` (cached per `updateCheck.ttlHours`, default 24h; 3s-bounded; every failure path silent). Empty output → continue. Output `<local>|<latest>` means a newer version exists:
+- **Interactive**: log `→ update available: v<local> -> v<latest>`, ask ONE AskUserQuestion  -  **Update now** (recommended) / **Continue without updating**. On *Update now* (or `updateCheck.autoUpdate: true`, which skips the question): run the `/multi-agent:update` flow, log `→ updated to v<latest>`, continue the run (note in the log: already-loaded phase docs finish this run on the old version; full effect next session). On *Continue*: no re-ask until the TTL expires.
+- **Autopilot**: never ask (zero-interaction contract). Log `→ update available: ... (log-only; run /multi-agent:update)` and continue  -  unless `autoUpdate: true`, then update silently first.
+Advisory  -  NEVER blocks. Must run BEFORE Step 6 (worktree creation) so an accepted update cannot mutate `~/.claude` under a mid-phase run.
 #### Step 1  -  Parse Input
 **Branch from input**: If the user provided a branch name after the issue reference (space-separated), store as `baseBranch` and skip Step 3. Otherwise Step 3 asks interactively.

package/pipeline/commands/multi-agent/refs/phases/phase-3-dev.md CHANGED Viewed

@@ -49,6 +49,16 @@ When Phase 0 Step 7 classified the task as `component`, Phase 3 delegates the en
 For non-component taskTypes (`bugfix`, `feature`, `refactor`, `chore`), continue with the standard TDD section below.
+#### Design fidelity contract (BLOCKING when the task carries a Figma reference)
+Applies to EVERY task whose analysis doc carries design ground truth (Section 6 rows, node IDs, screenshots)  -  component-dispatch AND TDD-path UI work alike. MCP stays forbidden here (analysis-only rule); the analysis doc IS the design:
+1. **1:1, not interpretation.** The captured frames are the reference for every UI line. Layout, variant states, copy, and colours come from the analysis doc's measured values  -  never eyeballed, never "close enough".
+2. **Code Connect decides the component.** A mapped component (`CodeConnectSnippet` / repo `*.figma.swift` / `*.figma.kt` row) MUST be used verbatim  -  sound-alike substitutes are forbidden; a missing modifier is added to the component's `+Modifiers` extension, never forked. No mapping exists -> write a NEW component per the Configuration/View/Modifiers architecture; never inline ad-hoc UI into the consumer screen.
+3. **Inter-component spacing is part of the design.** Gaps, paddings, and alignment BETWEEN components must match the frame's measured values, mapped to spacing tokens (`Spacing*`)  -  never invented numbers. A spacing/layout deviation is a `review_blocking` finding in Phase 4 Step 1.8, not a nitpick.
+Missing design data (variant, padding, copy) → HALT and instruct the user to re-run `/multi-agent:analysis`; never guess and never call Figma MCP from this phase.
 #### Re-entry from Phase 4 triage
 Phase 3 runs twice in the pipeline lifetime: first for initial development, then optionally for rework after Phase 4 review. **Phase 3 never acts on raw reviewer output.** It only consumes `triage.accepted` findings  -  Fable triage in Phase 4 already filtered false-positives, deferred out-of-scope items, and rejected noise.

package/pipeline/commands/multi-agent/refs/phases/phase-4-review.md CHANGED Viewed

@@ -390,30 +390,9 @@ After the triage verdict is computed, populate `triage.consensus`:
 ##### 3.7 Verify-by-test (opt-in, empirical validation of blocking findings)
-**Rationale:** a triage verdict is still a judgment call; a failing repro test is proof. Debating a finding costs tokens, running it costs one test invocation and kills false positives that survive adversarial framing.
+A triage verdict is judgment; a failing repro test is proof. Runs only when `prefs.global.verifyByTest.enabled` is `true` AND `accepted` contains a `blocking` finding; otherwise skip silently. **Full contract (verdict table, cleanup invariant, prompts): `refs/features/verify-by-test.md`  -  read it before executing this step.**
-**Gate:** runs only when `prefs.global.verifyByTest.enabled` is `true` AND the validated triage output contains at least one `accepted` finding with `severity: "blocking"`. Otherwise skip silently (no log noise). Full behavior spec: `refs/features/verify-by-test.md`.
-1. **Dispatch ONE verifier agent** (subagent_type: `general-purpose`, model: `prefs.global.verifyByTest.model`, default `sonnet`) for the iteration  -  not one per finding. Input: up to `verifyByTest.maxFindings` (default 3) accepted blocking findings (ordered as triage returned them), the diff hunks for their files, and the project's test conventions from Phase 1. Findings beyond the cap keep their judgment-only verdict; log `verify_by_test=cap-exceeded count=<n>`.
-2. **Per finding, the verifier writes ONE minimal repro test** asserting the CORRECT behavior the finding claims is broken, following the platform test conventions (framework, naming, location per phase-3-dev.md), then runs ONLY that test using the platform single-test invocation from Phase 3 (`xcodebuild test -only-testing:`, `pytest {file}::{name}`, `npm test -- --testPathPattern=`, `./gradlew test --tests`), wrapped in `acquire_build_lock`/`release_build_lock`, log tee'd to `$WORKTREE/.pipeline/verify-<i>.test.log`.
-3. **Verdict mapping (fails toward keeping blockers):**
-| Repro test outcome | `verification.result` | Action on finding |
-| --- | --- | --- |
-| Test FAILS as the finding predicts | `confirmed` | Stays `accepted` blocking. Repro test is KEPT in the worktree and recorded in `redTests[]` as the RED test for the Phase 3 rework loop. |
-| Test PASSES (defect not reproducible) | `not-reproduced` | Downgrade is evidence-gated: `node pipeline/scripts/evidence-gate.mjs --claim test --status passed --evidence "$WORKTREE/.pipeline/verify-<i>.test.log"` must exit 0. On exit 0: move finding from `accepted[]` to `deferred[]` with reason `verify-by-test: not reproduced - repro test <testRef> passed`, delete the repro test file. On exit non-zero: treat as `inconclusive`. |
-| Compile error, timeout, or defect not expressible as a unit test | `inconclusive` | Stays `accepted` blocking (judgment-only verdict stands). Partial test file deleted, cause noted in `verification.note`. |
-4. **Cleanup invariant:** after Step 3.7 the only verifier artifacts left in the worktree are the confirmed repro tests listed in `redTests[]` (they get committed with the fix, satisfying the TDD contract) and logs under `$WORKTREE/.pipeline/` (already outside Phase 6 commit scope).
-5. **Persist + re-validate:** stamp each verified finding with a `verification` object (schema v3.2.0), persist `state.reviewIterations[-1].verifyByTest = { attempted, confirmed, downgraded, inconclusive, redTests: [{file, testRef, issue}] }`, recompute `approved`, then re-run `validate-triage.mjs` on the mutated `$TRIAGE_FILE` under the same 3.2.1 gate protocol.
-6. **Timeout/fallback (mirrors 3.3):** the whole step is bounded by `verifyByTest.stepTimeoutSec` (default 600). On verifier crash or budget breach: no retry; remaining findings keep judgment-only verdicts, log `triage=verify-by-test-timeout`, proceed to Step 4. Never blocks the pipeline.
-Telemetry (per 3.4 conventions, best-effort):
-```bash
-LOG_METRIC_FORWARD_TO_TRACKER=1 pipeline/scripts/log-metric.sh "$TASK_ID" 4 review.verify_by_test \
-  attempted=$A confirmed=$C downgraded=$D inconclusive=$I duration_ms=$MS
-```
+Compressed flow: dispatch ONE verifier agent (model `verifyByTest.model`, default `sonnet`) for up to `maxFindings` (default 3) accepted blocking findings. Per finding it writes ONE minimal repro test and runs ONLY that test (Phase 3 single-test invocation, build lock, log tee'd to `$WORKTREE/.pipeline/verify-<i>.test.log`). Outcomes: test FAILS as predicted -> `confirmed`, finding stays blocking and the test is KEPT in `redTests[]` as the Phase 3 rework RED test; test PASSES -> `not-reproduced` ONLY if `evidence-gate.mjs --claim test --status passed` exits 0 on the log, finding moves to `deferred[]`, test deleted; compile error / timeout / not unit-testable -> `inconclusive`, judgment verdict stands. Stamp findings with `verification` (schema v3.2.0), persist `state.reviewIterations[-1].verifyByTest = {attempted, confirmed, downgraded, inconclusive, redTests[]}`, recompute `approved`, re-run `validate-triage.mjs` under the 3.2.1 gate. Whole step bounded by `stepTimeoutSec` (default 600); on breach or crash remaining findings keep judgment verdicts  -  never blocks. Telemetry per 3.4: `review.verify_by_test attempted= confirmed= downgraded= inconclusive= duration_ms=`.
 #### Step 4  -  Consensus + Action (triage-driven)

package/pipeline/schemas/prefs.schema.json CHANGED Viewed

@@ -701,6 +701,30 @@
           "default": false,
           "description": "v6.1.0+ \u2014 Phase 4 Step 2.5 rebuttal round. When reviewers disagree (mixed blocker/approved verdict), each reviewer is re-prompted with the others' opposing arguments for one additional round before triage. Lifts signal quality on ambiguous findings at ~1\u00d7 Step 2 token cost. Off by default \u2014 flip for security-critical or release-branch reviews."
         },
+        "updateCheck": {
+          "type": "object",
+          "additionalProperties": false,
+          "description": "v10.9+ - Phase 0 Step 0.6 advisory version check. Once per ttlHours window, a bounded (3s) registry read compares the installed version against dist-tags.latest. Newer version found: interactive modes ask 'Update now / Continue' (yes runs the /multi-agent:update flow, then the run continues); autopilot logs one line and never asks. Offline/failed checks are silent; the step never blocks the pipeline.",
+          "properties": {
+            "enabled": {
+              "type": "boolean",
+              "default": true,
+              "description": "Master switch. On by default - the cost is at most one 3s-bounded curl per ttlHours."
+            },
+            "ttlHours": {
+              "type": "integer",
+              "minimum": 1,
+              "maximum": 168,
+              "default": 24,
+              "description": "Cache window for the registry read."
+            },
+            "autoUpdate": {
+              "type": "boolean",
+              "default": false,
+              "description": "When true, skip the question and run the update flow automatically before the run starts (interactive AND autopilot). Off by default - self-modifying ~/.claude without asking is a surprise."
+            }
+          }
+        },
         "verifyByTest": {
           "type": "object",
           "additionalProperties": false,

package/pipeline/schemas/token-budget.json CHANGED Viewed

@@ -3,15 +3,15 @@
   "$id": "https://github.com/mmerterden/multi-agent-pipeline/pipeline/schemas/token-budget.json",
   "description": "Per-phase token budget for lazy-loaded pipeline docs. Enforced by smoke-token-budget.sh.",
   "phases": {
-    "phase-0-init":      { "max_tokens": 11250, "warn_tokens": 9900 },
+    "phase-0-init":      { "max_tokens": 12400, "warn_tokens": 10900 },
     "phase-1-analysis":  { "max_tokens": 3750, "warn_tokens": 3300 },
     "phase-2-planning":  { "max_tokens": 6500, "warn_tokens": 5750 },
-    "phase-3-dev":       { "max_tokens": 7650, "warn_tokens": 6750 },
-    "phase-4-review":    { "max_tokens": 11100, "warn_tokens": 9750 },
-    "phase-5-test":      { "max_tokens": 2300, "warn_tokens": 2000 },
-    "phase-6-commit":    { "max_tokens": 5550, "warn_tokens": 4900 },
+    "phase-3-dev":       { "max_tokens": 7900, "warn_tokens": 6950 },
+    "phase-4-review":    { "max_tokens": 13250, "warn_tokens": 11650 },
+    "phase-5-test":      { "max_tokens": 2550, "warn_tokens": 2250 },
+    "phase-6-commit":    { "max_tokens": 6150, "warn_tokens": 5400 },
     "phase-7-report":    { "max_tokens": 5600, "warn_tokens": 4950 }
   },
-  "total_max_tokens": 46500,
-  "note": "Token estimate = ceil(chars / 4). Per-phase budget rule: warn = current+10% (rounded to nearest 50), max = current+25%. Gives ~6 edit cycles of headroom before warn trips  -  intentionally quiet under normal maintenance, loud when a phase grows unusually. Only the active phase is loaded (lazy). Recalibrated at v10.0.0 after the validator/consistency/simplifier/lesson gate contracts landed in phases 1-4 (prose already compressed; the residual growth is the gate contracts themselves)."
+  "total_max_tokens": 50000,
+  "note": "Token estimate = ceil(chars / 4). Per-phase budget rule: warn = current+10% (rounded to nearest 50), max = current+25%. Gives ~6 edit cycles of headroom before warn trips  -  intentionally quiet under normal maintenance, loud when a phase grows unusually. Only the active phase is loaded (lazy). Recalibrated at v10.0.0 after the validator/consistency/simplifier/lesson gate contracts landed in phases 1-4. Recalibrated again at v10.9.0 after the verify-by-test (Phase 4 Step 3.7), update-check (Phase 0 Step 0.6), immutable-test (Phase 3 GREEN) and redTests re-entry contracts landed  -  Step 3.7 prose was compressed to a pointer into refs/features/verify-by-test.md before the recalibration."
 }

package/pipeline/scripts/README.md CHANGED Viewed

@@ -24,6 +24,7 @@ Validate contracts. Each emits `══ <name> smoke: N passed, M failed ══`
 - `smoke-phase4-triage.sh`  -  Phase 4 reviewer → triage flow
 - `smoke-verify-by-test.sh`  -  Phase 4 Step 3.7 verify-by-test contract (v10.8.0)
 - `smoke-handoff-contract.sh`  -  phase-boundary structured handoff + handoff-first resume (v10.8.0)
+- `smoke-update-check.sh`  -  Phase 0 Step 0.6 advisory update-check contract (v10.9.0)
 ### Schema + state
 - `smoke-schema-validation.sh`  -  all JSON schemas validate

package/pipeline/scripts/fixtures/install-layout.tsv CHANGED Viewed

@@ -5,12 +5,12 @@
 .claude/multi-agent-preferences.json	1
 .claude/rules	12
 .claude/schemas	23
-.claude/scripts	169
+.claude/scripts	171
 .claude/settings.json	1
 .claude/skills	560
 .copilot/agents	8
 .copilot/copilot-instructions.md	1
 .copilot/lib	23
 .copilot/schemas	23
-.copilot/scripts	169
+.copilot/scripts	171
 .copilot/skills	596

package/pipeline/scripts/smoke-update-check.sh ADDED Viewed

@@ -0,0 +1,122 @@
+#!/usr/bin/env bash
+# smoke-update-check.sh
+#
+# Verifies the Phase 0 Step 0.6 advisory update-check contract:
+#   1. update-check.sh exists, parses (bash -n), and honors the advisory contract
+#      offline: exit 0 + empty stdout when the registry is unreachable
+#   2. Cached path: fresh cache short-circuits without a network call
+#   3. Newer latest -> "<local>|<latest>"; same or older latest -> silent
+#   4. prefs.schema.json exposes updateCheck.{enabled,ttlHours,autoUpdate}
+#      with the documented defaults (enabled=true, autoUpdate=false)
+#   5. phase-0-init.md documents Step 0.6 with the autopilot log-only rule
+#
+# Exit 0 = all pass, 1 = any failure.
+set -euo pipefail
+ROOT="$(cd "$(dirname "$0")/../.." && pwd)"
+SCRIPT="$ROOT/pipeline/scripts/update-check.sh"
+PREFS_SCHEMA="$ROOT/pipeline/schemas/prefs.schema.json"
+PHASE0_DOC="$ROOT/pipeline/commands/multi-agent/refs/phases/phase-0-init.md"
+pass=0
+fail=0
+failures=()
+record_pass() { pass=$((pass + 1)); printf '  \033[0;32mPASS\033[0m %s\n' "$1"; }
+record_fail() { fail=$((fail + 1)); failures+=("$1"); printf '  \033[0;31mFAIL\033[0m %s\n' "$1"; }
+printf '→ smoke-update-check: Phase 0 Step 0.6 advisory contract\n'
+tmpdir=$(mktemp -d)
+trap 'rm -rf "$tmpdir"' EXIT
+CACHE="$tmpdir/update-check-cache"
+# 1. Script parses
+if bash -n "$SCRIPT" 2>/dev/null; then
+  record_pass "update-check.sh parses (bash -n)"
+else
+  record_fail "update-check.sh has syntax errors"
+fi
+now=$(date +%s)
+# 2. Fresh cache with newer latest -> reports, no network needed
+printf '%s|99.0.0\n' "$now" > "$CACHE"
+out=$(UPDATE_CHECK_CACHE="$CACHE" bash "$SCRIPT" --local 10.0.0 2>/dev/null); rc=$?
+if [ "$rc" -eq 0 ] && [ "$out" = "10.0.0|99.0.0" ]; then
+  record_pass "newer cached latest -> '<local>|<latest>' (exit 0)"
+else
+  record_fail "newer cached latest should print '10.0.0|99.0.0' (got '$out', rc=$rc)"
+fi
+# 3. Same version -> silent
+printf '%s|10.0.0\n' "$now" > "$CACHE"
+out=$(UPDATE_CHECK_CACHE="$CACHE" bash "$SCRIPT" --local 10.0.0 2>/dev/null); rc=$?
+if [ "$rc" -eq 0 ] && [ -z "$out" ]; then
+  record_pass "same version -> silent"
+else
+  record_fail "same version should be silent (got '$out', rc=$rc)"
+fi
+# 3b. Local ahead of registry (dev machine) -> silent
+printf '%s|10.0.0\n' "$now" > "$CACHE"
+out=$(UPDATE_CHECK_CACHE="$CACHE" bash "$SCRIPT" --local 10.1.0 2>/dev/null); rc=$?
+if [ "$rc" -eq 0 ] && [ -z "$out" ]; then
+  record_pass "local ahead of registry -> silent (no downgrade prompt)"
+else
+  record_fail "local-ahead should be silent (got '$out', rc=$rc)"
+fi
+# 3c. Offline + stale cache -> silent exit 0 (advisory: never blocks)
+printf '0|10.0.0\n' > "$CACHE"
+out=$(UPDATE_CHECK_CACHE="$CACHE" http_proxy="http://127.0.0.1:1" https_proxy="http://127.0.0.1:1" \
+      bash "$SCRIPT" --local 10.0.0 2>/dev/null); rc=$?
+if [ "$rc" -eq 0 ] && [ -z "$out" ]; then
+  record_pass "offline + stale cache -> silent exit 0"
+else
+  record_fail "offline should be a silent no-op (got '$out', rc=$rc)"
+fi
+# 4. Prefs schema knobs + defaults
+for prop in enabled ttlHours autoUpdate; do
+  if jq -e ".properties.global.properties.updateCheck.properties.${prop}" "$PREFS_SCHEMA" >/dev/null 2>&1; then
+    record_pass "prefs schema exposes updateCheck.${prop}"
+  else
+    record_fail "prefs schema missing updateCheck.${prop}"
+  fi
+done
+if jq -e '.properties.global.properties.updateCheck.properties.enabled.default == true' "$PREFS_SCHEMA" >/dev/null 2>&1; then
+  record_pass "updateCheck.enabled defaults to true (notify-only, bounded cost)"
+else
+  record_fail "updateCheck.enabled must default to true"
+fi
+if jq -e '.properties.global.properties.updateCheck.properties.autoUpdate | has("default") and .default == false' "$PREFS_SCHEMA" >/dev/null 2>&1; then
+  record_pass "updateCheck.autoUpdate defaults to false (no silent self-modify)"
+else
+  record_fail "updateCheck.autoUpdate must default to false"
+fi
+# 5. Phase 0 doc wiring
+if grep -qF "Step 0.6 - Update check" "$PHASE0_DOC"; then
+  record_pass "phase-0-init.md documents Step 0.6"
+else
+  record_fail "phase-0-init.md missing Step 0.6"
+fi
+if grep -qF "update-check.sh" "$PHASE0_DOC"; then
+  record_pass "phase-0-init.md invokes update-check.sh"
+else
+  record_fail "phase-0-init.md must invoke update-check.sh"
+fi
+if grep -qF "log-only" "$PHASE0_DOC" && grep -qF "never ask (zero-interaction contract)" "$PHASE0_DOC"; then
+  record_pass "phase-0-init.md states the autopilot log-only rule"
+else
+  record_fail "phase-0-init.md must state autopilot never asks (log-only)"
+fi
+printf '\n══ update-check smoke: %d passed, %d failed ══\n' "$pass" "$fail"
+if [ "$fail" -gt 0 ]; then
+  printf '\nFailures:\n'
+  for msg in "${failures[@]}"; do printf '  - %s\n' "$msg"; done
+  exit 1
+fi
+exit 0

package/pipeline/scripts/update-check.sh ADDED Viewed

@@ -0,0 +1,82 @@
+#!/usr/bin/env bash
+# update-check.sh  -  cached advisory version check (Phase 0 Step 0.6)
+#
+# Compares the locally installed pipeline version against the npm registry's
+# dist-tags.latest. Cached with a TTL so at most one network call per TTL
+# window; the call is bounded by a short timeout and every failure path is
+# silent  -  this script NEVER blocks or fails the pipeline.
+#
+# stdout: "<local>|<latest>" when a newer version exists, nothing otherwise.
+# Exit code: always 0 (advisory contract).
+#
+# Usage:
+#   bash pipeline/scripts/update-check.sh                 # auto-detect local version
+#   bash pipeline/scripts/update-check.sh --local 10.8.0  # explicit local version
+#   bash pipeline/scripts/update-check.sh --ttl-hours 24  # cache window (default 24)
+#   bash pipeline/scripts/update-check.sh --force         # ignore cache
+#
+# Cache file: ~/.claude/logs/multi-agent/.update-check ("epoch|latest").
+# Registry read is a plain curl  -  never `npm view` (a user-level .npmrc scope
+# mapping can silently reroute npm to a different registry; curl cannot lie).
+set -euo pipefail
+PKG="@mmerterden/multi-agent-pipeline"
+REGISTRY_URL="https://registry.npmjs.org/${PKG/\//%2F}"
+CACHE_FILE="${UPDATE_CHECK_CACHE:-$HOME/.claude/logs/multi-agent/.update-check}"
+TTL_HOURS=24
+LOCAL_VERSION=""
+FORCE=0
+while [ $# -gt 0 ]; do
+  case "$1" in
+    --local) LOCAL_VERSION="${2:-}"; shift 2 ;;
+    --ttl-hours) TTL_HOURS="${2:-24}"; shift 2 ;;
+    --force) FORCE=1; shift ;;
+    *) shift ;;
+  esac
+done
+# Local version: explicit arg, else read from the pipeline repo clone.
+if [ -z "$LOCAL_VERSION" ]; then
+  for candidate in "$HOME/multi-agent-pipeline" "$HOME/dev/multi-agent-pipeline" "$HOME/projects/multi-agent-pipeline"; do
+    if [ -f "$candidate/package.json" ]; then
+      LOCAL_VERSION=$(node -p "require('$candidate/package.json').version" 2>/dev/null || true)
+      [ -n "$LOCAL_VERSION" ] && break
+    fi
+  done
+fi
+[ -z "$LOCAL_VERSION" ] && exit 0  # cannot determine local version -> silent no-op
+now=$(date +%s)
+latest=""
+# Fresh cache?
+if [ "$FORCE" -eq 0 ] && [ -f "$CACHE_FILE" ]; then
+  cached_epoch=$(cut -d'|' -f1 "$CACHE_FILE" 2>/dev/null || echo 0)
+  cached_latest=$(cut -d'|' -f2 "$CACHE_FILE" 2>/dev/null || echo "")
+  case "$cached_epoch" in (*[!0-9]*|"") cached_epoch=0 ;; esac
+  if [ $((now - cached_epoch)) -lt $((TTL_HOURS * 3600)) ] && [ -n "$cached_latest" ]; then
+    latest="$cached_latest"
+  fi
+fi
+# Stale or missing cache -> one bounded registry call (silent on any failure).
+if [ -z "$latest" ]; then
+  latest=$(curl -sm 3 "$REGISTRY_URL" 2>/dev/null \
+    | { command -v jq >/dev/null 2>&1 && jq -r '."dist-tags".latest // empty' \
+        || sed -n 's/.*"latest":"\([^"]*\)".*/\1/p'; } | head -1) || true
+  [ -z "$latest" ] && exit 0
+  mkdir -p "$(dirname "$CACHE_FILE")" 2>/dev/null || exit 0
+  printf '%s|%s\n' "$now" "$latest" > "$CACHE_FILE" 2>/dev/null || true
+fi
+[ "$latest" = "$LOCAL_VERSION" ] && exit 0
+# Update available only when latest sorts strictly ABOVE local (a dev machine
+# running ahead of the registry must not see an "update" prompt).
+highest=$(printf '%s\n%s\n' "$LOCAL_VERSION" "$latest" | sort -t. -k1,1n -k2,2n -k3,3n | tail -1)
+if [ "$highest" = "$latest" ]; then
+  printf '%s|%s\n' "$LOCAL_VERSION" "$latest"
+fi
+exit 0