npm - @mmerterden/multi-agent-pipeline - Versions diffs - 10.7.3 → 10.8.0 - Mend

@mmerterden/multi-agent-pipeline 10.7.3 → 10.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (58)

package/CHANGELOG.md CHANGED

@@ -14,6 +14,77 @@ Internal file-layout changes that don't affect the slash-command surface are sti
 ---
+## [10.8.0] - 2026-07-02
+Three additive loop-quality features, all sourced from the 2026 agentic-loop research sweep (Anthropic long-running-agent harness guidance + adversarial-review findings): empirical verification of blocking findings, an immutable-test contract, and structured phase-boundary handoffs.
+### Added
+- **Verify-by-test triage (Phase 4 Step 3.7, opt-in via `prefs.global.verifyByTest`).**
+  Accepted blocking findings are no longer judgment-final: one verifier agent
+  (default Sonnet, capped at `maxFindings`=3) writes a minimal repro test per
+  finding and runs ONLY that test. Test fails as predicted -> finding confirmed
+  and the repro test is handed to Phase 3 as the rework RED test
+  (`state.reviewIterations[-1].verifyByTest.redTests[]`). Test passes under
+  `evidence-gate.mjs` -> finding downgraded to `deferred` (never `rejected`).
+  Compile error / timeout -> `inconclusive`, judgment verdict stands. The whole
+  step is timeout-bounded and never blocks the pipeline. Schema
+  `triage-output` v3.2.0 adds the optional per-finding `verification` block;
+  `validate-triage.mjs` validates it. New feature doc
+  `refs/features/verify-by-test.md` + `smoke-verify-by-test.sh` (20 assertions).
+- **Immutable-test rule + `test_lines_removed` diff-risk signal.** New rule in
+  `refs/rules.md` and the Phase 3 GREEN step: deleting, renaming, or weakening
+  an existing test to reach green is a violation; a test changes only when the
+  task changes the spec it encodes, named in the commit body. Deterministic
+  backstop: `diff-risk-score.mjs` v1.1.0 emits `test_lines_removed` (w=3.0)
+  when a test-classified file removes more lines than it adds; wired into the
+  Phase 4 Step 1.75 signal table, `diff-risk.schema.json` v1.1.0, and
+  `validate-diff-risk.mjs`. New fixture `diff-risk-test-removal.diff` +
+  3 new `smoke-diff-risk.sh` assertions.
+- **Structured handoff blocks (fresh-context re-entry discipline).** The
+  phase-boundary checkpoint in `refs/phases/operations.md` now appends a
+  structured `## Handoff` block (Done / Remaining / Decisions / Open findings /
+  Next) to `agent-log.md` at every phase transition - written by the
+  orchestrator from state it already holds, no LLM call, ~15 lines. The
+  post-`/compact` re-grounding and `resume.md` Step 3 read the latest handoff
+  FIRST (state wins on mismatch), falling back to per-phase findings for
+  pre-v10.8 logs. Documented in `log-format.md`; guarded by
+  `smoke-handoff-contract.sh` (13 assertions).
+- **Module review guides in `/multi-agent:review` (Step 2b).** When a changed
+  file's module carries its own convention file (`CLAUDE.md`, `*-CLAUDE.md`,
+  `AGENTS.md` below repo root), the review discovers it deterministically from
+  the diff paths (capped at 5, truncation logged), injects it into every
+  reviewer prompt, and scopes each guide to files under its own directory.
+  Works with a local checkout or via provider API in PR mode.
+### Fixed
+- **`validate-triage.mjs` reviewer enum accepts `fable` and `gpt`.** The runtime
+  validator still rejected the schema-v3.1.0 reviewer values restored in
+  v10.7.4 (`fable` is Reviewer 1 on Claude Code, `gpt` on Copilot CLI), so any
+  accepted finding attributed to them failed the 3.2.1 gate. Enum now matches
+  `triage-output.schema.json`.
+---
+## [10.7.4] - 2026-07-02
+Deep-consistency sweep: the v10.6.0 Fable restore and the v10.7.0 adapter removal are now reflected on every surface, and the test suite is green again end to end.
+### Fixed
+- **`cost-table.json` regains the `fable` price row** (`claude-fable-5`, $10/$50 per MTok, cache-read $1) — the v10.6.0 restore had left every architect/Reviewer-1/triage dispatch unpriced ("USD unavailable") in the cost ledger. `cost-budget-check.mjs` and `prefs.schema.json` `costBudget.pricingModel` now default to `fable` (conservative upper bound); `triage-output` / `reviewer-output` schemas and the telemetry enums (`log-format`, `progress-contract`) accept `fable` (+ `gpt` as reviewer source).
+- **`uninstall` legacy adapter cleanup actually works again.** The four `--cursor` / `--copilot-chat` / `--antigravity` / `--codex` blocks imported adapter modules deleted in v10.7.0, so every run silently no-opped. They now perform inline cleanup (multi-agent-* files + managed-marker blocks) with user files untouched; verified by a round-trip test.
+- **`smoke-install-layout` fixture regenerated** (167 scripts; it was stale at 174 since the v10.7.0 deletions and failed `npm test` + CI). `smoke-readme-counts.sh` deleted — it asserted a README counts table that the v10.7.1 concise-README rewrite intentionally removed. The 10.7.1 changelog entry no longer spells the corporate Jira key it genericized, so the tarball leak gate passes.
+- **`multi-agent-sync` skill command inventory reconciled** — it still listed the v7-era 26 commands; now the canonical 35 (incl. `finish`), matching `sync.md` and `cross-cli-contract.md`.
+### Changed
+- **Fable-restore consistency sweep.** Every doc that still said "Opus triage" / "Opus top tier" now says Fable where Claude Code dispatches Fable (Phase 1/2 headers, phase-4-review tables + metrics, help EN+TR, review command + skill, orchestration SKILL, features/architecture/performance docs, examples, CLAUDE.md template). The Copilot CLI reviewer trio stays explicitly pinned at GPT-5.4 + Opus + Sonnet (Fable 5 is not offered there) — now stated in `model-fallback.md`, which also drops its stale version-tagged title. The stray `claude-opus-4.6` / `claude-sonnet-4.6` ids are normalized to `claude-opus-4-8` / `claude-sonnet-4-6`.
+- **Adapter-era residue swept**: `index.js` help no longer advertises removed install flags; `package.json` description/keywords say Claude Code + Copilot CLI only; `.gitignore` drops the dead `sync-adapters.mjs` note; `tracker-contract.md` drops the Cursor detection branch; ADR-0007 is marked Superseded and `GENERICITY-REVIEW.md` carries a pre-v10.7.0 banner; ROADMAP's "Current Release" jumps from the stale 10.1.0 (which still claimed Fable retired) to 10.7.x and dead future items are resolved.
+- **Plugin-model docs**: the Copilot instructions template and `scripts/README` no longer reference the deleted `stack-swap.sh` session-start mechanic; `features.md` "Stack Swap" section rewritten around marketplace-plugin enablement; `architecture.md` counts refreshed (39 commands, 8 personas, 17 schemas, 100+ smokes, 41 figma + 38 core + 143 external skills); `help` stack args include `mobile`.
 ## [10.7.3] - 2026-07-02
 ### Changed
@@ -103,8 +174,8 @@ Fable 5 restored as the top model tier; `stack-swap` fully removed; setup gains
 ### Fixed
-- Genericized a leaked corporate Jira key (`DIJITAL` → `PROJ`/`{JIRA_KEY}`) in the
-  `finish` command + skill so the personal-data gate passes.
+- Genericized a leaked corporate Jira project key (replaced with `PROJ`/`{JIRA_KEY}`)
+  in the `finish` command + skill so the personal-data gate passes.
 ## [10.5.0] - 2026-07-02
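The verify-by-test step in the 10.8.0 entry above maps each repro-test result onto a finding status. The following is a minimal Python sketch of that mapping, not the package's implementation (which lives in its `.mjs` scripts and `refs/features/verify-by-test.md`). The names `ReproOutcome`, `verdict_for`, `verify_findings`, and the `id` field are hypothetical, introduced only for illustration.

```python
from enum import Enum


class ReproOutcome(Enum):
    """Possible results of running a single repro test (hypothetical names)."""
    FAILED_AS_PREDICTED = "failed_as_predicted"
    PASSED = "passed"
    COMPILE_ERROR = "compile_error"
    TIMEOUT = "timeout"


MAX_FINDINGS = 3  # changelog: capped at maxFindings=3


def verdict_for(outcome: ReproOutcome) -> str:
    """Map one repro-test outcome to a verification verdict."""
    if outcome is ReproOutcome.FAILED_AS_PREDICTED:
        return "confirmed"
    if outcome is ReproOutcome.PASSED:
        return "deferred"  # downgraded, never "rejected"
    return "inconclusive"  # compile error or timeout: judgment verdict stands


def verify_findings(findings, run_repro_test):
    """Verify at most MAX_FINDINGS accepted blocking findings.

    run_repro_test is a hypothetical callable: finding -> ReproOutcome.
    Returns (verdict per finding id, ids whose repro test becomes a RED test).
    """
    verdicts, red_tests = {}, []
    for finding in findings[:MAX_FINDINGS]:
        verdict = verdict_for(run_repro_test(finding))
        verdicts[finding["id"]] = verdict
        if verdict == "confirmed":
            red_tests.append(finding["id"])
    return verdicts, red_tests
```

Only confirmed findings contribute a RED test, which matches the changelog's description of handing the repro test to Phase 3 for rework.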

package/docs/adr/0001-three-model-triage.md CHANGED

@@ -1,6 +1,6 @@
-# 1. CLI-aware parallel review with Opus triage
+# 1. CLI-aware parallel review with top-tier triage
-**Status:** Accepted · 2025 · Amended 2026-04 (CLI-aware reviewer set)
+**Status:** Accepted · 2025 · Amended 2026-04 (CLI-aware reviewer set) · Amended 2026-07 (v10.6.0: Fable 5 restored — Reviewer 1 and triage run on Fable on Claude Code; Copilot CLI pins Opus. "Opus" below reads as "the top tier of the day")
 ## Context

package/docs/adr/0007-multi-tool-adapter-framework.md CHANGED

@@ -1,6 +1,6 @@
 # 7. Multi-tool adapter framework + token-preserving uninstall
-**Status:** Accepted · 2026-04-27 (v7.7.0 / v7.9.0)
+**Status:** Superseded · 2026-07-02 (v10.7.0 removed all non-native adapters — the pipeline targets Claude Code + Copilot CLI only). Original acceptance: 2026-04-27 (v7.7.0 / v7.9.0). Kept as the historical record of why the adapter framework existed; install flags documented below no longer exist.
 ## Context

package/docs/adr/README.md CHANGED

@@ -10,13 +10,13 @@ Format: lightly adapted from [Michael Nygard's ADR template](https://cognitect.c
 | # | Title | Status |
 |---|-------|--------|
-| [0001](./0001-three-model-triage.md)   | CLI-aware parallel review with Opus triage         | Accepted (amended v5.2.2) |
+| [0001](./0001-three-model-triage.md)   | CLI-aware parallel review with top-tier triage      | Accepted (amended v5.2.2, v10.6.0 Fable) |
 | [0002](./0002-instruction-driven-flag.md) | instructionDriven flag as explicit pipeline fork  | Accepted |
 | [0003](./0003-unified-shared-skills.md) | Unified `skills/shared/` for Claude Code + Copilot | Accepted (amended v5.3.3) |
 | [0004](./0004-zero-dependency-philosophy.md) | Keep the package zero-runtime-deps                | Accepted |
 | [0005](./0005-lazy-phase-docs.md)      | Lazy-loaded phase docs with per-phase token budget | Accepted |
 | [0006](./0006-skills-core-external-split.md) | `shared/core/` vs `shared/external/` source org    | Accepted |
-| [0007](./0007-multi-tool-adapter-framework.md) | Multi-tool adapter framework + token-preserving uninstall | Accepted (v7.7.0 / v7.9.0) |
+| [0007](./0007-multi-tool-adapter-framework.md) | Multi-tool adapter framework + token-preserving uninstall | Superseded by v10.7.0 (adapters removed; Claude Code + Copilot CLI only) |
 ## Writing a New ADR

package/docs/architecture.md CHANGED

@@ -1,6 +1,6 @@
 # Architecture
-## 9-Phase Pipeline Flow
+## 8-Phase Pipeline Flow (0-7)
 ```mermaid
 graph TD
@@ -9,7 +9,7 @@ graph TD
     P1["Phase 1: Analysis<br/>Codebase scan (parallel Explore agents)"]
     P2["Phase 2: Planning<br/>Task breakdown, architecture review"]
     P3["Phase 3: Dev<br/>TDD: RED → GREEN → REFACTOR"]
-    P4["Phase 4: Review<br/>Parallel + Opus triage<br/>(Claude: 2-model · Copilot: 3-model)"]
+    P4["Phase 4: Review<br/>Parallel + Fable triage<br/>(Claude: 2-model · Copilot: 3-model)"]
     P5["Phase 5: Test<br/>Optional manual testing"]
     P6["Phase 6: Commit<br/>Git commit, PR creation"]
     P7["Phase 7: Report<br/>Jira · Wiki+Figma · Confluence · Log · Knowledge"]
@@ -53,11 +53,11 @@ graph LR
 graph TD
     DIFF["Code Diff"]
-    DIFF --> OPUS["Opus<br/>Security + Architecture"]
+    DIFF --> R1["Fable (Claude Code) / Opus (Copilot)<br/>Security + Architecture"]
     DIFF --> GPT["GPT-5.4<br/>Quality + Edge Cases"]
     DIFF --> SON["Sonnet<br/>Correctness + Style"]
-    OPUS --> TRIAGE["Opus Triage"]
+    R1 --> TRIAGE["Fable Triage<br/>(Opus on Copilot CLI)"]
     GPT --> TRIAGE
     SON --> TRIAGE
@@ -117,19 +117,19 @@ graph TB
     end
     subgraph "Pipeline Specs"
-        CMD[commands/<br/>25 sub-commands]
-        AGT[agents/<br/>6 agent definitions]
+        CMD[commands/<br/>39 command files<br/>(34 user-facing)]
+        AGT[agents/<br/>8 agent personas]
         RUL[rules/<br/>12 domain rules]
-        PHS[refs/phases/<br/>8 phase specs]
-        FIG[skills/figma-{ios,android,common}/<br/>37 figma skills]
-        CMP[skills/shared/core/<br/>28 orchestration skills<br/>incl. compliance]
-        EXT[skills/shared/external/<br/>127 curated skills]
+        PHS[refs/phases/<br/>8 phase specs + 3 contracts]
+        FIG[skills/figma-{ios,android,common}/<br/>41 figma skills]
+        CMP[skills/shared/core/<br/>38 orchestration skills<br/>incl. compliance]
+        EXT[skills/shared/external/<br/>143 curated skills<br/>(authoring source for the<br/>multi-agent-plugins marketplace)]
     end
     subgraph "Quality Gates"
-        SCH[schemas/<br/>7 JSON schemas + token-budget]
-        EVL[eval/triage/<br/>10 regression fixtures]
-        SMK[scripts/smoke-*<br/>45 smoke suites]
+        SCH[schemas/<br/>17 JSON schemas]
+        EVL[eval/triage/<br/>12 regression fixtures]
+        SMK[scripts/smoke-*<br/>100+ smoke suites]
     end
     IDX --> INS
@@ -167,7 +167,7 @@ User Input → Phase 0 (Init)
 ```mermaid
 graph TD
     CC["Claude Code<br/>(source of truth)"]
-    COP["Copilot CLI<br/>(instructions + 145 unified skills)"]
+    COP["Copilot CLI<br/>(instructions + unified skills)"]
     REPO["Pipeline Repo<br/>(npm package)"]
     WEB["Website<br/>(optional)"]
     RC["Remote Control<br/>(optional)"]

package/docs/features.md CHANGED

@@ -4,15 +4,15 @@ Comprehensive list of every feature the pipeline ships. The top-level `README.md
 ## Core Pipeline
-### 9-Phase Orchestration
+### 8-Phase Orchestration (0-7)
 ```
 Phase 0: Init      Project selection, branch setup, identity, worktree
 Phase 1: Analysis  Stack detection, codebase exploration (parallel Explore agents)
 Phase 2: Planning  Task decomposition, architecture review, user approval
 Phase 3: Dev       TDD cycle: test → code → build (Sonnet)
-Phase 4: Review    Deterministic gates + parallel AI review + Opus triage
-                   (Claude Code: Opus + Sonnet · Copilot CLI: GPT-5.4 + Opus + Sonnet)
+Phase 4: Review    Deterministic gates + parallel AI review + Fable triage
+                   (Claude Code: Fable + Sonnet · Copilot CLI: GPT-5.4 + Opus + Sonnet)
 Phase 5: Test      Optional manual testing + on-demand device audits
 Phase 6: Commit    Git commit, push, PR with default reviewers + draft/ready prompt
 Phase 7: Report   External: Jira comment · Wiki + Figma screenshots · Confluence
@@ -39,21 +39,22 @@ Compose freely: `--dev --local autopilot` = shortest, least-friction path.
 | Android/Kotlin | `build.gradle[.kts]`                         | `refs/android-guide.md`  |
 | Backend        | `requirements.txt`, `package.json`, `go.mod` | `refs/backend-guide.md`  |
 | Frontend       | `package.json` + framework detection         | `refs/frontend-guide.md` |
-| Docker         | `Dockerfile`, `docker-compose.yml`           | `refs/docker-guide.md`   |
+| Docker         | `Dockerfile`, `docker-compose.yml`           | `refs/backend-guide.md`  |
 Build commands, test runners, lint tools, and review focus areas all adapt to the detected stack.
-### Stack Swap
+### Stack Selection (marketplace plugins, v10.5.0+)
-Switch skill sets mid-session without restarting the CLI:
+Stack skill sets ship as versioned plugins in the `multi-agent-plugins` marketplace. Selecting a stack enables the matching plugin(s) in the current repo's `.claude/settings.json` `enabledPlugins` — no skill copying, no session restart tricks, no directory shuffling. The `ai-common-engineering-toolkit` (accessibility audit, humanizer, Firebase) is always enabled alongside the stack plugin.
 ```bash
-/multi-agent stack ios         # SwiftUI, Xcode, ViewInspector
-/multi-agent stack android     # Compose, Gradle, Hilt
-/multi-agent stack backend     # FastAPI, Node.js, Docker
-/multi-agent stack frontend    # React, Next.js, Vue
-/multi-agent stack mobile      # iOS + Android combined
-/multi-agent stack all         # Load everything
+/multi-agent:stack ios         # ai-ios-engineering-toolkit (SwiftUI, Xcode, HIG)
+/multi-agent:stack android     # ai-android-engineering-toolkit (Compose, Gradle, Hilt)
+/multi-agent:stack mobile      # iOS + Android combined
+/multi-agent:stack backend     # ai-backend-toolkit (spec-driven APIs)
+/multi-agent:stack frontend    # ai-frontend-engineering-toolkit (React/TSX)
+/multi-agent:stack fullstack   # backend + frontend
+/multi-agent:stack all         # every stack plugin
 ```
 ### Task Type Detection (v2.0.0)
@@ -124,21 +125,21 @@ Cheap, objective checks run BEFORE any AI token is spent:
 If any gate fails, fix first. Don't waste AI tokens reviewing broken code.
-### 3-Model Parallel Review + Opus Triage (Phase 4 Steps 2–3)
+### CLI-Aware Parallel Review + Fable Triage (Phase 4 Steps 2–3)
-| Reviewer   | Model               | Focus                             |
-| ---------- | ------------------- | --------------------------------- |
-| Reviewer 1 | `claude-opus-4.6`   | Deep security + architecture      |
-| Reviewer 2 | `gpt-5.4`           | Edge cases, different perspective |
-| Reviewer 3 | `claude-sonnet-4.6` | Quality + correctness + naming    |
+| Reviewer   | Model               | Focus                             | Where it runs        |
+| ---------- | ------------------- | --------------------------------- | -------------------- |
+| Reviewer 1 | `claude-fable-5` (Claude Code) / `claude-opus-4-8` (Copilot CLI) | Deep security + architecture | Both CLIs |
+| Reviewer 2 | `gpt-5.4`           | Edge cases, different perspective | **Copilot CLI only** |
+| Reviewer 3 | `claude-sonnet-4-6` | Quality + correctness + naming    | Both CLIs            |
-All three dispatched in parallel. Each returns structured JSON for deterministic aggregation. Cross-model diversity catches blind spots that any single model family would miss.
+The reviewer set is **CLI-aware**: Claude Code dispatches 2 reviewers in parallel (Fable + Sonnet — GPT-5.4 is not available there); Copilot CLI dispatches all 3. Each returns structured JSON for deterministic aggregation. Cross-model diversity catches blind spots that any single model family would miss.
-**Opus Triage** (Phase 4 Step 3): Evaluates merged raw findings against task scope. Classifies each as `accepted` (fix now), `deferred` (out of scope, log for later), or `rejected` (false positive / noise). Only triage-accepted blocking items loop back to Phase 3.
+**Fable Triage** (Phase 4 Step 3, Opus on Copilot CLI): Evaluates merged raw findings against task scope. Classifies each as `accepted` (fix now), `deferred` (out of scope, log for later), or `rejected` (false positive / noise). Only triage-accepted blocking items loop back to Phase 3.
 ### Runtime Triage Validator (v2.3.0)
-After Opus triage returns, output is validated by `validate-triage.mjs`:
+After triage returns, output is validated by `validate-triage.mjs`:
 | Exit  | Meaning                                                      |
 | ----- | ------------------------------------------------------------ |
@@ -151,6 +152,18 @@ After Opus triage returns, output is validated by `validate-triage.mjs`:
 If triage returns `approved: false` but has no blocking items, the validator forces `approved: true`. Conversely, if `approved: true` but blocking items exist, it forces `approved: false`. Hardened with `if`/`then` constraint in schema v3.0.0.
+### Verify-by-Test Triage (Phase 4 Step 3.7, v10.8.0, opt-in)
+A triage verdict is a judgment call; a failing repro test is proof. When `prefs.global.verifyByTest.enabled` is on, one verifier agent (default Sonnet) writes a minimal repro test per accepted blocking finding (cap: `maxFindings`=3) and runs only that test. Fails as predicted -> finding confirmed, the repro test becomes the Phase 3 rework RED test. Passes under `evidence-gate.mjs` -> finding downgraded to `deferred`. Compile error / timeout -> `inconclusive`, judgment stands. Timeout-bounded, never blocks. Full spec: `refs/features/verify-by-test.md`.
+### Immutable-Test Rule + `test_lines_removed` Signal (v10.8.0)
+Existing tests are immutable during a task: deleting, renaming, or weakening an assertion to reach green is a violation (`refs/rules.md`, Phase 3 GREEN step). A test changes only when the task changes the spec it encodes, named in the commit body. Deterministic backstop: `diff-risk-score.mjs` emits `test_lines_removed` (w=3.0) for any test-classified file whose diff removes more lines than it adds.
+### Structured Handoff Blocks (v10.8.0)
+Every phase transition appends a `## Handoff` block (Done / Remaining / Decisions / Open findings / Next) to `agent-log.md` - orchestrator-written from existing state, no LLM call. `/multi-agent:resume` and post-`/compact` re-grounding read the latest handoff first, so long runs re-enter from durable artifacts instead of conversation memory (fresh-context discipline from Anthropic's long-running-agent harness guidance).
 ### Accessibility Code Review (Phase 4 Step 1.5)
 If changes include UI files, reviewers check for:
@@ -210,7 +223,7 @@ Per-phase token budgets prevent runaway sessions. If a phase exceeds its budget,
 `pipeline/scripts/diff-risk-score.mjs` runs at Phase 4 Step 1.75 — before reviewer dispatch. Heuristic, deterministic, sub-second, no LLM. Top-N risk-ranked files inject into each reviewer's prompt as a `${PRIORITY_FILES}` block; reviewers read those files first but still review the entire diff.
-Signals + weights: `security_path` ×3, `migration` ×4, `public_api` ×2, `no_test_change` ×2.5, `complexity_delta` ×1.5, `ui_critical` ×1.5, `loc_changed` ×1. Toggle via `prefs.global.diffRiskAdvisory` (default ON).
+Signals + weights: `security_path` ×3, `migration` ×4, `public_api` ×2, `no_test_change` ×2.5, `test_lines_removed` ×3 (v10.8.0: test file shrinks - immutable-test backstop), `complexity_delta` ×1.5, `ui_critical` ×1.5, `loc_changed` ×1. Toggle via `prefs.global.diffRiskAdvisory` (default ON).
 ### Test Gap Detection (v8.3.0)

package/docs/performance.md CHANGED

@@ -36,7 +36,7 @@ Each event in `metrics.jsonl` is a single-line JSON object written by
   JSON retries, timeouts.
 - **Phase 3 retries** — build / test / lint retry distribution per task.
 - **Cost per model** — calls, duration, tokens in/out, broken down by model
-  (Opus, Sonnet, GPT-5.4).
+  (Fable, Opus, Sonnet, GPT-5.4).
 - **Language preference** — distribution of EN vs TR prompts.
 ## Typical Output (Markdown)
@@ -67,8 +67,8 @@ _Source: ~/.claude/logs/multi-agent/metrics.jsonl · Events: 421 (0 parse errors
 | Model | Calls | Duration (ms) | Tokens In | Tokens Out |
 |-------|-------|---------------|-----------|------------|
-| `claude-opus-4.6` | 124 | 612430 | 380221 | 92114 |
-| `claude-sonnet-4.6` | 89 | 412318 | 201445 | 58903 |
+| `claude-fable-5` | 124 | 612430 | 380221 | 92114 |
+| `claude-sonnet-4-6` | 89 | 412318 | 201445 | 58903 |
 | `gpt-5.4` | 88 | 398214 | 194302 | 55128 |
 ```
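For a sense of scale, the sample `claude-fable-5` row above can be priced with the `fable` rate quoted in the 10.7.4 changelog entry ($10/$50 per MTok). This Python sketch assumes those two figures are the input and output prices respectively and ignores the $1 cache-read rate. The function and constant names are hypothetical and do not reflect how `cost-table.json` is actually structured.

```python
# Assumption: $10 per million input tokens, $50 per million output tokens.
FABLE_PRICE_PER_MTOK = {"input": 10.0, "output": 50.0}


def usd_cost(tokens_in: int, tokens_out: int, price: dict) -> float:
    """Price a token count at per-million-token rates."""
    return (
        tokens_in / 1_000_000 * price["input"]
        + tokens_out / 1_000_000 * price["output"]
    )


# Token counts taken from the sample claude-fable-5 row in the table above.
cost = usd_cost(380_221, 92_114, FABLE_PRICE_PER_MTOK)
print(f"{cost:.2f}")  # prints 8.41
```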

package/index.js CHANGED

@@ -41,30 +41,26 @@ if (!command || command === "install") {
     npx @mmerterden/multi-agent-pipeline install              Install for Claude Code (default)
     npx @mmerterden/multi-agent-pipeline install --copilot    Install for Copilot CLI
     npx @mmerterden/multi-agent-pipeline install --all        Both Claude + Copilot
-    npx @mmerterden/multi-agent-pipeline install --cursor       Cursor full orchestration (rules + subagents + /multi-agent + MCP)
-    npx @mmerterden/multi-agent-pipeline install --copilot-chat GitHub Copilot Chat (.github/copilot-instructions.md)
-    npx @mmerterden/multi-agent-pipeline install --antigravity  Antigravity full orchestration (.agent/ + AGENTS.md + MCP)
-    npx @mmerterden/multi-agent-pipeline install --all-tools    Every supported tool (Claude + Copilot + Cursor + Copilot Chat + Antigravity)
     npx @mmerterden/multi-agent-pipeline install --link       Use symlinks (saves tokens, dev mode)
   Uninstall (token-preserving — Keychain/Credential Manager untouched):
     npx @mmerterden/multi-agent-pipeline uninstall            Interactive: remove from all installed targets
     npx @mmerterden/multi-agent-pipeline uninstall --yes      Skip prompt
     npx @mmerterden/multi-agent-pipeline uninstall --dry-run  Report what would be removed
-    npx @mmerterden/multi-agent-pipeline uninstall --cursor   Only Cursor (use --target=<path> to override cwd)
+    npx @mmerterden/multi-agent-pipeline uninstall --claude   Only Claude Code
+    npx @mmerterden/multi-agent-pipeline uninstall --cursor   Legacy pre-v10.7 adapter-file cleanup (also --copilot-chat / --antigravity / --codex; --target=<path> overrides cwd)
   Help:
     npx @mmerterden/multi-agent-pipeline help
   Options:
     --no-color           Disable colored output
-    --target=<path>      Adapter target dir (defaults to cwd; ignored for Claude/Copilot)
+    --target=<path>      Target dir for legacy adapter cleanup on uninstall (defaults to cwd)
     --platform=ios|android|all   Filter external skills by platform (default: all)
   After installation:
     Claude Code:  /multi-agent "MOBILE-123"
     Copilot CLI:  Describe your task naturally — pipeline instructions are loaded
-    Cursor / Antigravity / VS Code Copilot Chat: full orchestration (subagents + /multi-agent + MCP)
   More info: https://github.com/mmerterden/multi-agent-pipeline
   `);

package/install/templates/copilot-instructions.md CHANGED

@@ -240,9 +240,9 @@ Cost block reads `phase-tracker.sh tokens` accumulators × `cost-table.json` pri
 - Every public method must have tests
 - Commit format: {type}({scope}): description [{jiraId}]
-## Stack Detection
+## Stack Selection
-Auto-detected at session start by `~/.copilot/scripts/stack-swap.sh` (uses project markers). Manual override:
+Stack skill sets ship as versioned plugins in the `multi-agent-plugins` marketplace. Selecting a stack enables the matching plugin(s) in the target repo's `.claude/settings.json` `enabledPlugins`; the `ai-common-engineering-toolkit` is always enabled alongside. There is no session-start auto-swap script. Select or change the stack with:
 ```bash
 multi-agent-stack [ios|android|mobile|backend|frontend|fullstack|all]

package/package.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "@mmerterden/multi-agent-pipeline",
-  "version": "10.7.3",
-  "description": "8-phase AI development pipeline with full orchestration on Claude Code, Copilot CLI, Cursor, Antigravity, and VS Code Copilot Chat. Analysis, planning, TDD, CLI-aware parallel review with consensus surfacing + Opus triage, default-FAIL evidence gates, secret + intent guards, per-phase cost ledger, persistent learnings memory, wiki generation, commit automation. Token-preserving uninstall.",
+  "version": "10.8.0",
+  "description": "8-phase AI development pipeline with full orchestration on Claude Code and Copilot CLI. Analysis, planning, TDD, CLI-aware parallel review with consensus surfacing + Fable triage, default-FAIL evidence gates, secret + intent guards, per-phase cost ledger, persistent learnings memory, wiki generation, commit automation. Token-preserving uninstall.",
   "type": "module",
   "main": "index.js",
   "exports": {
@@ -34,9 +34,6 @@
     "copilot-cli",
     "copilot",
     "claude",
-    "cursor",
-    "windsurf",
-    "cline",
     "ios",
     "android",
     "backend",

package/pipeline/agents/dev-critic.md CHANGED

@@ -7,7 +7,7 @@ modelRationale: "Critic tier  -  deterministic checklist + build/test verificati
 # Dev Critic Agent  -  Phase 3.5
-You are the in-loop critic for Phase 3 (Dev). The generator (Sonnet/Opus during Phase 3) has just finished its last edit. **You run BEFORE Phase 4**, on the same worktree, against deterministic criteria that already exist on disk. Your job: catch failures the generator would otherwise send into Phase 4 and waste 2-3 reviewer calls + Opus triage on.
+You are the in-loop critic for Phase 3 (Dev). The generator (Sonnet/Opus during Phase 3) has just finished its last edit. **You run BEFORE Phase 4**, on the same worktree, against deterministic criteria that already exist on disk. Your job: catch failures the generator would otherwise send into Phase 4 and waste 2-3 reviewer calls + Fable triage on.
 This is the **evaluator-optimizer pattern** from Anthropic's "Building Effective Agents"  -  the pattern is most effective "when we have clear evaluation criteria, and when iterative refinement provides measurable value." Phase 3 satisfies both: criteria are written in `rules/*.md`, refinement value is measured by Phase 4 fix-cycles avoided.

package/pipeline/claude-md-template.md CHANGED

@@ -16,7 +16,7 @@
 1. Analysis (Opus) -> scope, impact analysis
 2. Planning (Opus) -> spec, task breakdown
 3. Development (Sonnet) -> TDD, code, build
-4. Review -> deterministic gates + parallel review + Opus triage
+4. Review -> deterministic gates + parallel review + Fable triage
    - Claude Code: Opus + Sonnet (2 paralel)
    - Copilot CLI: GPT-5.4 + Opus + Sonnet (3 paralel)

package/pipeline/commands/multi-agent/dev-autopilot.md CHANGED

@@ -33,7 +33,7 @@ Phase 7: Report     → short terminal summary
 1. **Parse input**  -  standard multi-agent input formats (Issue URL, Jira ID, free text)
 2. **Phase 0: Init**  -  set `"mode": "dev", "autopilot": true` in `agent-state.json`
-3. **Phase 3: Dev**  -  write code directly on `claude-opus-4.6` and verify the build
+3. **Phase 3: Dev**  -  write code directly on `claude-opus-4-8` and verify the build
 4. **Phase 6: Commit**  -  auto commit + push + PR
 5. **Phase 7: Report**  -  terminal summary

package/pipeline/commands/multi-agent/finish.md CHANGED

@@ -33,7 +33,7 @@ You already did the work locally  -  wrote code on the current branch and maybe
 ```
 Phase 0: Init          → project/branch detect, resolve base + diff (work-already-done), Jira id, state (NO worktree)
-Phase 4: Review        → deterministic gates + parallel review (Opus + Sonnet) + Opus triage
+Phase 4: Review        → deterministic gates + parallel review (Fable + Sonnet) + Fable triage
 Phase 5: Build+Test    → stack-aware build gate + run existing tests; SUCCESS required (automated, not the interactive user-test)
 Phase 6: Commit        → commit remaining local changes + push + open PR if none exists
 Phase 7: Report        → technical analysis + Jira comment with test scenarios (channels: Jira / PR / Confluence / Wiki)
@@ -51,7 +51,7 @@ Phases 1-3 (Analysis / Planning / Dev) are skipped by design  -  `finish` treats
 ## Phase execution (reuse the existing phase contracts)
-- **Phase 4 Review** — run per `refs/phases/phase-4-review.md` against the resolved diff: deterministic gates (Step 1.x), stack-specific parallel reviewers (Opus + Sonnet on Claude Code; +GPT on Copilot CLI), Opus triage → `triage.accepted`. Blocking/important accepted findings:
+- **Phase 4 Review** — run per `refs/phases/phase-4-review.md` against the resolved diff: deterministic gates (Step 1.x), stack-specific parallel reviewers (Fable + Sonnet on Claude Code; GPT + Opus + Sonnet on Copilot CLI), Fable triage → `triage.accepted`. Blocking/important accepted findings:
   - interactive: present them and ask (`AskUserQuestion`) whether to fix now (loop back through a minimal Phase-3-style TDD fix) or proceed;
   - `autopilot` (or `prefs.global.finish.autoFix == true`): auto-fix accepted blocking/important findings, then re-review the fix, before advancing.
 - **Phase 5 Build+Test** — the **automated success gate** (this is what "build+test success" means here; the interactive device user-test is `/multi-agent:manual-test`). Stack-aware: build via `figma-config.build` (iOS scheme / Android gradle / detected backend/frontend build) and run the existing test suite if present (`swift test` / `xcodebuild test` / `./gradlew test` / `pytest` / `npm test` / `vitest`). Require success to advance; on failure, surface logs and (interactive) stop or (autopilot) attempt a bounded fix loop. **If the repo has no tests, report "no tests present" — never fabricate test results.**
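The CLI-aware reviewer set that this release threads through the docs (Fable + Sonnet on Claude Code, GPT-5.4 + Opus + Sonnet on Copilot CLI, with triage on Fable or Opus respectively) can be summarized as a lookup. This is a Python sketch for orientation only: the model ids come from the features.md table in this diff, while the `claude-code` / `copilot-cli` keys and the `review_plan` helper are hypothetical.

```python
# Model ids as listed in the features.md reviewer table; dict keys are hypothetical.
REVIEWERS_BY_CLI = {
    "claude-code": ["claude-fable-5", "claude-sonnet-4-6"],
    "copilot-cli": ["claude-opus-4-8", "gpt-5.4", "claude-sonnet-4-6"],
}

# Triage runs on Fable on Claude Code and on Opus on Copilot CLI.
TRIAGE_BY_CLI = {
    "claude-code": "claude-fable-5",
    "copilot-cli": "claude-opus-4-8",
}


def review_plan(cli: str) -> dict:
    """Return the parallel reviewers and the triage model for a given CLI."""
    if cli not in REVIEWERS_BY_CLI:
        raise ValueError(f"unsupported CLI: {cli!r}")
    return {"reviewers": REVIEWERS_BY_CLI[cli], "triage": TRIAGE_BY_CLI[cli]}
```

Keeping the mapping in one place mirrors the intent of the consistency sweep: a model-tier change then touches a single table instead of every doc that names a reviewer.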

package/pipeline/commands/multi-agent/help.md CHANGED

@@ -52,13 +52,13 @@ How It Works (Phase 0  -  Interactive Flow):
 Pipeline (after Phase 0)  -  shown as visual cards in terminal:
   Phase 0:   Init       -> The 8 steps above
-  Phase 1:   Analysis   -> Stack detection + codebase scan (Opus)
+  Phase 1:   Analysis   -> Stack detection + codebase scan (Fable)
   Phase 2:   Planning   -> Task breakdown + architecture review + Plan Approval Gate
                           (clarification max 2 rounds + approval loop  -  normal mode only;
                            skipped for --dev, autopilot, --dev autopilot)
   Phase 3:   Dev        -> TDD: test -> code -> build (Sonnet) + build queue
-  Phase 4:   Review     -> Deterministic gates + parallel AI review + Opus triage
-                          (Claude Code: Opus + Sonnet · Copilot CLI: GPT-5.4 + Opus + Sonnet)
+  Phase 4:   Review     -> Deterministic gates + parallel AI review + Fable triage
+                          (Claude Code: Fable + Sonnet · Copilot CLI: GPT-5.4 + Opus + Sonnet)
   Phase 5:   Test       -> Optional: switch to branch, test in Xcode
                           (runs in dev + full; skipped in every autopilot and local variant)
   Phase 6:   Commit     -> Commit -> push -> PR + issue body update (never auto-closes)
@@ -73,7 +73,7 @@ Pipeline (after Phase 0)  -  shown as visual cards in terminal:
 Modes:
-  (normal)             Full 8 phases, Sonnet dev, Plan Approval Gate active, parallel review + Opus triage
+  (normal)             Full 8 phases, Sonnet dev, Plan Approval Gate active, parallel review + Fable triage
   --dev                Fast: Init -> Dev(Opus) -> Commit -> Report (no plan gate)
   --local              No worktree  -  works directly on local branch
   autopilot            Skip all confirmations INCLUDING plan gate, auto commit/PR
@@ -120,7 +120,7 @@ Setup & Maintenance:
   /multi-agent:setup          First-run wizard  -  Keychain tokens + Git identity + language
   /multi-agent:language [en|tr]   Show or set outputLanguage (promptLanguage stays English)
-  /multi-agent:stack <id>     Select stack by enabling the matching marketplace plugin(s) (ios / android / backend / frontend / fullstack / all)
+  /multi-agent:stack <id>     Select stack by enabling the matching marketplace plugin(s) (ios / android / mobile / backend / frontend / fullstack / all)
   /multi-agent:sync           Sync ecosystem (Claude Code + Copilot CLI + pipeline + website + remote-control)
   /multi-agent:update         Pull latest pipeline + reinstall + run migrations
   /multi-agent:delete         Uninstall pipeline from every CLI (Keychain tokens left intact, double confirm)
@@ -195,7 +195,7 @@ Quality & Telemetry (advisory, on by default  -  flip prefs.global.* to disable)
   Triage Memory    Phase 7 ingests accepted/deferred/rejected findings into a per-repo corpus
   Prior-Art Lookup Phase 1 + Phase 4 query the corpus for similar past findings, inject as context
   Per-Persona      Reviewer/agent dispatch reads `preferredModel` from persona file; per-call override
-                   via PHASE_MODEL_OVERRIDE; falls back to opus
+                   via PHASE_MODEL_OVERRIDE; ladder fable -> opus -> sonnet -> haiku
 ------------------------------------------------------------
@@ -260,13 +260,13 @@ Nasıl Çalışır (Phase 0  -  İnteraktif Akış):
 Pipeline (Phase 0'dan sonra)  -  terminalde görsel kart olarak görünür:
   Phase 0:   Init       -> Yukarıdaki 8 adım
-  Phase 1:   Analysis   -> Stack tespiti + codebase taraması (Opus)
+  Phase 1:   Analysis   -> Stack tespiti + codebase taraması (Fable)
   Phase 2:   Planning   -> Task kırılımı + mimari inceleme + Plan Onay Kapısı
                           (clarification max 2 tur + onay döngüsü  -  sadece normal mode;
                            --dev, autopilot, --dev autopilot'ta skip)
   Phase 3:   Dev        -> TDD: test -> kod -> build (Sonnet) + build queue
-  Phase 4:   Review     -> Deterministik kapılar + paralel AI review + Opus triage
-                          (Claude Code: Opus + Sonnet · Copilot CLI: GPT-5.4 + Opus + Sonnet)
+  Phase 4:   Review     -> Deterministik kapılar + paralel AI review + Fable triage
+                          (Claude Code: Fable + Sonnet · Copilot CLI: GPT-5.4 + Opus + Sonnet)
   Phase 5:   Test       -> Opsiyonel: branch'e geç, Xcode'da test (--dev / autopilot'ta skip)
   Phase 6:   Commit     -> Commit -> push -> PR + issue body güncelleme (hiç auto-close yok)
   Phase 7:   Report     -> Channels dispatcher (PR · Jira · Confluence · Wiki, multi-select)
@@ -280,7 +280,7 @@ Pipeline (Phase 0'dan sonra)  -  terminalde görsel kart olarak görünür:
 Modlar:
-  (normal)             Tam 8 faz, Sonnet dev, Plan Onay Kapısı aktif, paralel review + Opus triage
+  (normal)             Tam 8 faz, Sonnet dev, Plan Onay Kapısı aktif, paralel review + Fable triage
   --dev                Hızlı: Init -> Dev(Opus) -> Commit -> Report (plan gate yok)
   --local              Worktree yok  -  doğrudan local branch'te çalışır
   autopilot            Plan gate dahil tüm onayları atla, otomatik commit/PR
@@ -327,7 +327,7 @@ Setup & Maintenance:
   /multi-agent:setup              İlk kurulum sihirbazı  -  Keychain token + Git kimliği + dil
   /multi-agent:language [en|tr]   outputLanguage'ı göster veya ayarla (promptLanguage İngilizce kalır)
-  /multi-agent:stack <id>         Stack'i eşleşen marketplace plugin'i etkinleştirerek seç (ios / android / backend / frontend / fullstack / all)
+  /multi-agent:stack <id>         Stack'i eşleşen marketplace plugin'i etkinleştirerek seç (ios / android / mobile / backend / frontend / fullstack / all)
   /multi-agent:sync               Ekosistemi senkronize et (Claude Code + Copilot CLI + pipeline + website + remote-control)
   /multi-agent:update             En son pipeline'ı çek + reinstall + migration çalıştır
   /multi-agent:delete             Pipeline'ı tüm CLI'lerden kaldır (Keychain token dokunulmaz, çift onay)
@@ -402,7 +402,7 @@ Quality & Telemetry (advisory, default açık  -  prefs.global.* ile kapatılabi
   Triage Memory    Phase 7 accepted/deferred/rejected bulguları repo başına corpus'a yazar
   Prior-Art Lookup Phase 1 + Phase 4 corpus'tan benzer geçmiş bulgu sorgular, ek context olarak enjekte
   Per-Persona      Reviewer/agent dispatch persona dosyasından `preferredModel` okur;
-                   per-call override PHASE_MODEL_OVERRIDE ile; fallback opus
+                   per-call override PHASE_MODEL_OVERRIDE ile; merdiven fable -> opus -> sonnet -> haiku
 ------------------------------------------------------------
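
The "Per-Persona" entry in the help text above describes a ladder `fable -> opus -> sonnet -> haiku` with a per-call override. A rough sketch of how such a resolution could work follows. `resolveModel` and its parameters are hypothetical names, and several behaviours are assumptions rather than documented facts: the override is passed in as a plain argument, it is itself subject to fallback, an unknown tier starts from the top, and the walk only ever moves down the ladder.

```javascript
// Hypothetical sketch of the per-persona model ladder; not the pipeline's code.
const LADDER = ["fable", "opus", "sonnet", "haiku"];

function resolveModel({ preferredModel, override, available }) {
  // A per-call override (PHASE_MODEL_OVERRIDE in the help text) takes
  // precedence over the persona file's preferredModel.
  const requested = override ?? preferredModel ?? LADDER[0];
  const start = LADDER.indexOf(requested);
  // Assumption: an unknown tier name starts from the top of the ladder.
  const candidates = LADDER.slice(start === -1 ? 0 : start);
  // Walk down the ladder to the first tier that is currently available;
  // null means no tier at or below the requested one can be used.
  return candidates.find((m) => available.has(m)) ?? null;
}
```

For example, `resolveModel({ preferredModel: "fable", available: new Set(["opus", "sonnet"]) })` returns `"opus"`, since `fable` is unavailable and `opus` is the next rung.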

package/pipeline/commands/multi-agent/local.md CHANGED

@@ -30,7 +30,7 @@ Phase 0: Init       → project detection, branch check, state (NO worktree)
 Phase 1: Analysis   → codebase scan (parallel explore agents, Opus)
 Phase 2: Planning   → task breakdown, Plan Approval Gate (approval loop)
 Phase 3: Dev        → TDD (Sonnet), build queue
-Phase 4: Review     → deterministic gates + parallel review + Opus triage
+Phase 4: Review     → deterministic gates + parallel review + Fable triage
 Phase 6: Commit     → pre-commit checkout prompt, commit + push + PR
 Phase 7: Report     → Jira / Wiki / Confluence + log + knowledge/memory
 ```

package/pipeline/commands/multi-agent/refs/features/dev-critic.md CHANGED

@@ -37,7 +37,7 @@ Introduces ~1× Sonnet call per Dev iteration. On simple bug fixes the cost outw
 ## Why this fits orchestrator-workers + evaluator-optimizer hybrid
-Phase 4 is parallelization-with-voting  -  good for *adversarial* perspectives (security, architecture). Phase 3.5 is evaluator-optimizer  -  good for *deterministic* criteria (build, tests, checklists). Sending failing builds into Phase 4 wastes 2-3 reviewer calls + Opus triage; Phase 3.5 absorbs that cost at one Sonnet call.
+Phase 4 is parallelization-with-voting  -  good for *adversarial* perspectives (security, architecture). Phase 3.5 is evaluator-optimizer  -  good for *deterministic* criteria (build, tests, checklists). Sending failing builds into Phase 4 wastes 2-3 reviewer calls + Fable triage; Phase 3.5 absorbs that cost at one Sonnet call.
 ## Reference

package/pipeline/commands/multi-agent/refs/features/model-fallback.md CHANGED

@@ -1,4 +1,6 @@
-# Model Fallback Contract (v10.1.0)
+# Model Fallback Contract
+> Contract last revised in **v10.6.0** (Fable 5 restored as top tier). The version tag here tracks the last substantive change to this contract, not the pipeline release.
 Personas route to the top available intelligence tier they declare in
 `preferredModel`. That tier can be quota-limited or temporarily unavailable.
@@ -95,5 +97,7 @@ per-phase `model` field already carries the override).
   deterministic over clever).
 - No edits to `pipeline/agents/*.md` at runtime; frontmatter is install-time
   configuration only.
-- Copilot CLI reviewer set and adapter-platform model pins are out of scope
-  (they pin their own models and do not use this persona ladder).
+- Copilot CLI reviewer set is out of scope: Copilot CLI pins its own three
+  reviewer models (GPT-5.4 + Opus + Sonnet — Fable 5 is not offered there) and
+  does not use this persona ladder. Only Claude Code dispatches Reviewer-1 on
+  Fable.
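
The scope note above separates the two platforms: Copilot CLI pins three reviewer models and bypasses the persona ladder, while Claude Code runs Fable + Sonnet. A small lookup table is one way to picture that split. The platform keys, model-name strings, `reviewerPlan`, and the `usesPersonaLadder` flag are all hypothetical names chosen for illustration.

```javascript
// Hypothetical sketch of the per-platform reviewer sets named in the diff.
const REVIEWERS = {
  "claude-code": ["fable", "sonnet"],
  "copilot-cli": ["gpt-5.4", "opus", "sonnet"],
};

function reviewerPlan(platform) {
  const models = REVIEWERS[platform];
  if (!models) throw new Error(`unknown platform: ${platform}`);
  // Per the scope note, only Claude Code routes reviewers through the
  // persona ladder; Copilot CLI uses its pinned models as-is.
  return { models, usesPersonaLadder: platform === "claude-code" };
}
```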