@mmerterden/multi-agent-pipeline 10.7.3 → 10.8.0

Files changed (58)
  1. package/CHANGELOG.md +73 -2
  2. package/docs/adr/0001-three-model-triage.md +2 -2
  3. package/docs/adr/0007-multi-tool-adapter-framework.md +1 -1
  4. package/docs/adr/README.md +2 -2
  5. package/docs/architecture.md +14 -14
  6. package/docs/features.md +35 -22
  7. package/docs/performance.md +3 -3
  8. package/index.js +3 -7
  9. package/install/templates/copilot-instructions.md +2 -2
  10. package/package.json +2 -5
  11. package/pipeline/agents/dev-critic.md +1 -1
  12. package/pipeline/claude-md-template.md +1 -1
  13. package/pipeline/commands/multi-agent/dev-autopilot.md +1 -1
  14. package/pipeline/commands/multi-agent/finish.md +2 -2
  15. package/pipeline/commands/multi-agent/help.md +12 -12
  16. package/pipeline/commands/multi-agent/local.md +1 -1
  17. package/pipeline/commands/multi-agent/refs/features/dev-critic.md +1 -1
  18. package/pipeline/commands/multi-agent/refs/features/model-fallback.md +7 -3
  19. package/pipeline/commands/multi-agent/refs/features/verify-by-test.md +41 -0
  20. package/pipeline/commands/multi-agent/refs/knowledge.md +1 -1
  21. package/pipeline/commands/multi-agent/refs/phases/log-format.md +11 -1
  22. package/pipeline/commands/multi-agent/refs/phases/modes.md +1 -1
  23. package/pipeline/commands/multi-agent/refs/phases/operations.md +15 -2
  24. package/pipeline/commands/multi-agent/refs/phases/phase-1-analysis.md +2 -2
  25. package/pipeline/commands/multi-agent/refs/phases/phase-2-planning.md +2 -2
  26. package/pipeline/commands/multi-agent/refs/phases/phase-3-dev.md +3 -1
  27. package/pipeline/commands/multi-agent/refs/phases/phase-4-review.md +51 -19
  28. package/pipeline/commands/multi-agent/refs/progress-contract.md +1 -1
  29. package/pipeline/commands/multi-agent/refs/rules.md +1 -0
  30. package/pipeline/commands/multi-agent/refs/tracker-contract.md +1 -2
  31. package/pipeline/commands/multi-agent/resume.md +7 -4
  32. package/pipeline/commands/multi-agent/review.md +41 -9
  33. package/pipeline/commands/multi-agent/sync.md +3 -3
  34. package/pipeline/commands/multi-agent.md +7 -7
  35. package/pipeline/schemas/agent-state.schema.json +1 -1
  36. package/pipeline/schemas/diff-risk.schema.json +5 -4
  37. package/pipeline/schemas/prefs.schema.json +38 -3
  38. package/pipeline/schemas/reviewer-output.schema.json +1 -1
  39. package/pipeline/schemas/triage-output.schema.json +37 -6
  40. package/pipeline/scripts/README.md +3 -2
  41. package/pipeline/scripts/cost-budget-check.mjs +1 -1
  42. package/pipeline/scripts/cost-table.json +7 -0
  43. package/pipeline/scripts/diff-risk-score.mjs +11 -1
  44. package/pipeline/scripts/fixtures/diff-risk-test-removal.diff +40 -0
  45. package/pipeline/scripts/fixtures/install-layout.tsv +5 -5
  46. package/pipeline/scripts/smoke-diff-risk.sh +30 -1
  47. package/pipeline/scripts/smoke-handoff-contract.sh +92 -0
  48. package/pipeline/scripts/smoke-verify-by-test.sh +148 -0
  49. package/pipeline/scripts/uninstall.mjs +53 -57
  50. package/pipeline/scripts/validate-diff-risk.mjs +2 -1
  51. package/pipeline/scripts/validate-triage.mjs +31 -2
  52. package/pipeline/skills/shared/core/multi-agent/SKILL.md +11 -11
  53. package/pipeline/skills/shared/core/multi-agent-dev-autopilot/SKILL.md +1 -1
  54. package/pipeline/skills/shared/core/multi-agent-finish/SKILL.md +1 -1
  55. package/pipeline/skills/shared/core/multi-agent-help/SKILL.md +8 -8
  56. package/pipeline/skills/shared/core/multi-agent-review/SKILL.md +5 -5
  57. package/pipeline/skills/shared/core/multi-agent-sync/SKILL.md +7 -5
  58. package/pipeline/scripts/smoke-readme-counts.sh +0 -120
package/CHANGELOG.md CHANGED
@@ -14,6 +14,77 @@ Internal file-layout changes that don't affect the slash-command surface are sti
 
  ---
 
+ ## [10.8.0] - 2026-07-02
+
+ Three additive loop-quality features, all sourced from the 2026 agentic-loop research sweep (Anthropic long-running-agent harness guidance + adversarial-review findings): empirical verification of blocking findings, an immutable-test contract, and structured phase-boundary handoffs.
+
+ ### Added
+
+ - **Verify-by-test triage (Phase 4 Step 3.7, opt-in via `prefs.global.verifyByTest`).**
+ Accepted blocking findings are no longer judgment-final: one verifier agent
+ (default Sonnet, capped at `maxFindings`=3) writes a minimal repro test per
+ finding and runs ONLY that test. Test fails as predicted -> finding confirmed
+ and the repro test is handed to Phase 3 as the rework RED test
+ (`state.reviewIterations[-1].verifyByTest.redTests[]`). Test passes under
+ `evidence-gate.mjs` -> finding downgraded to `deferred` (never `rejected`).
+ Compile error / timeout -> `inconclusive`, judgment verdict stands. The whole
+ step is timeout-bounded and never blocks the pipeline. Schema
+ `triage-output` v3.2.0 adds the optional per-finding `verification` block;
+ `validate-triage.mjs` validates it. New feature doc
+ `refs/features/verify-by-test.md` + `smoke-verify-by-test.sh` (20 assertions).
+ - **Immutable-test rule + `test_lines_removed` diff-risk signal.** New rule in
+ `refs/rules.md` and the Phase 3 GREEN step: deleting, renaming, or weakening
+ an existing test to reach green is a violation; a test changes only when the
+ task changes the spec it encodes, named in the commit body. Deterministic
+ backstop: `diff-risk-score.mjs` v1.1.0 emits `test_lines_removed` (w=3.0)
+ when a test-classified file removes more lines than it adds; wired into the
+ Phase 4 Step 1.75 signal table, `diff-risk.schema.json` v1.1.0, and
+ `validate-diff-risk.mjs`. New fixture `diff-risk-test-removal.diff` +
+ 3 new `smoke-diff-risk.sh` assertions.
+ - **Structured handoff blocks (fresh-context re-entry discipline).** The
+ phase-boundary checkpoint in `refs/phases/operations.md` now appends a
+ structured `## Handoff` block (Done / Remaining / Decisions / Open findings /
+ Next) to `agent-log.md` at every phase transition - written by the
+ orchestrator from state it already holds, no LLM call, ~15 lines. The
+ post-`/compact` re-grounding and `resume.md` Step 3 read the latest handoff
+ FIRST (state wins on mismatch), falling back to per-phase findings for
+ pre-v10.8 logs. Documented in `log-format.md`; guarded by
+ `smoke-handoff-contract.sh` (13 assertions).
+
+ - **Module review guides in `/multi-agent:review` (Step 2b).** When a changed
+ file's module carries its own convention file (`CLAUDE.md`, `*-CLAUDE.md`,
+ `AGENTS.md` below repo root), the review discovers it deterministically from
+ the diff paths (capped at 5, truncation logged), injects it into every
+ reviewer prompt, and scopes each guide to files under its own directory.
+ Works with a local checkout or via provider API in PR mode.
+
+ ### Fixed
+
+ - **`validate-triage.mjs` reviewer enum accepts `fable` and `gpt`.** The runtime
+ validator still rejected the schema-v3.1.0 reviewer values restored in
+ v10.7.4 (`fable` is Reviewer 1 on Claude Code, `gpt` on Copilot CLI), so any
+ accepted finding attributed to them failed the 3.2.1 gate. Enum now matches
+ `triage-output.schema.json`.
+
+ ---
+
+ ## [10.7.4] - 2026-07-02
+
+ Deep-consistency sweep: the v10.6.0 Fable restore and the v10.7.0 adapter removal are now reflected on every surface, and the test suite is green again end to end.
+
+ ### Fixed
+
+ - **`cost-table.json` regains the `fable` price row** (`claude-fable-5`, $10/$50 per MTok, cache-read $1) — the v10.6.0 restore had left every architect/Reviewer-1/triage dispatch unpriced ("USD unavailable") in the cost ledger. `cost-budget-check.mjs` and `prefs.schema.json` `costBudget.pricingModel` now default to `fable` (conservative upper bound); `triage-output` / `reviewer-output` schemas and the telemetry enums (`log-format`, `progress-contract`) accept `fable` (+ `gpt` as reviewer source).
+ - **`uninstall` legacy adapter cleanup actually works again.** The four `--cursor` / `--copilot-chat` / `--antigravity` / `--codex` blocks imported adapter modules deleted in v10.7.0, so every run silently no-opped. They now perform inline cleanup (multi-agent-* files + managed-marker blocks) with user files untouched; verified by a round-trip test.
+ - **`smoke-install-layout` fixture regenerated** (167 scripts; it was stale at 174 since the v10.7.0 deletions and failed `npm test` + CI). `smoke-readme-counts.sh` deleted — it asserted a README counts table that the v10.7.1 concise-README rewrite intentionally removed. The 10.7.1 changelog entry no longer spells the corporate Jira key it genericized, so the tarball leak gate passes.
+ - **`multi-agent-sync` skill command inventory reconciled** — it still listed the v7-era 26 commands; now the canonical 35 (incl. `finish`), matching `sync.md` and `cross-cli-contract.md`.
+
+ ### Changed
+
+ - **Fable-restore consistency sweep.** Every doc that still said "Opus triage" / "Opus top tier" now says Fable where Claude Code dispatches Fable (Phase 1/2 headers, phase-4-review tables + metrics, help EN+TR, review command + skill, orchestration SKILL, features/architecture/performance docs, examples, CLAUDE.md template). The Copilot CLI reviewer trio stays explicitly pinned at GPT-5.4 + Opus + Sonnet (Fable 5 is not offered there) — now stated in `model-fallback.md`, which also drops its stale version-tagged title. The stray `claude-opus-4.6` / `claude-sonnet-4.6` ids are normalized to `claude-opus-4-8` / `claude-sonnet-4-6`.
+ - **Adapter-era residue swept**: `index.js` help no longer advertises removed install flags; `package.json` description/keywords say Claude Code + Copilot CLI only; `.gitignore` drops the dead `sync-adapters.mjs` note; `tracker-contract.md` drops the Cursor detection branch; ADR-0007 is marked Superseded and `GENERICITY-REVIEW.md` carries a pre-v10.7.0 banner; ROADMAP's "Current Release" jumps from the stale 10.1.0 (which still claimed Fable retired) to 10.7.x and dead future items are resolved.
+ - **Plugin-model docs**: the Copilot instructions template and `scripts/README` no longer reference the deleted `stack-swap.sh` session-start mechanic; `features.md` "Stack Swap" section rewritten around marketplace-plugin enablement; `architecture.md` counts refreshed (39 commands, 8 personas, 17 schemas, 100+ smokes, 41 figma + 38 core + 143 external skills); `help` stack args include `mobile`.
+
  ## [10.7.3] - 2026-07-02
 
  ### Changed
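A minimal sketch of the verify-by-test outcome mapping described in the 10.8.0 entry above, offered as an illustration rather than the package's implementation. The function names (`applyVerification`, `verifyByTest`), the `run.status` values, the `blocking` flag, and the `not_reproduced` outcome label are all assumptions made for this example; only the `confirmed` / `deferred` / `inconclusive` behaviour and the cap of 3 come from the changelog text.

```javascript
// Hypothetical sketch: names and shapes below are illustrative assumptions,
// not the package's API.
const MAX_FINDINGS = 3; // mirrors the documented `maxFindings` default

// Map one repro-test run onto a finding. `run.status` is an assumed enum:
// "failed_as_predicted" | "passed" | "compile_error" | "timeout".
function applyVerification(finding, run) {
  switch (run.status) {
    case "failed_as_predicted":
      // The repro test becomes the RED test for the Phase 3 rework.
      return { ...finding, verification: { outcome: "confirmed", redTest: run.testPath } };
    case "passed":
      // Downgrade only to `deferred`, never to `rejected`.
      return { ...finding, classification: "deferred", verification: { outcome: "not_reproduced" } };
    default:
      // Compile error / timeout: the judgment verdict stands.
      return { ...finding, verification: { outcome: "inconclusive" } };
  }
}

// Verify at most MAX_FINDINGS accepted blocking findings; others pass through.
function verifyByTest(findings, runRepro) {
  let budget = MAX_FINDINGS;
  const redTests = [];
  const results = findings.map((finding) => {
    const eligible = finding.classification === "accepted" && finding.blocking;
    if (!eligible || budget === 0) return finding;
    budget -= 1;
    const run = runRepro(finding);
    const verified = applyVerification(finding, run);
    if (verified.verification.outcome === "confirmed") redTests.push(run.testPath);
    return verified;
  });
  return { results, redTests };
}
```

Here `runRepro` stands in for whatever writes and runs the single repro test; keeping it injectable makes the mapping testable without a real test runner.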
@@ -103,8 +174,8 @@ Fable 5 restored as the top model tier; `stack-swap` fully removed; setup gains
 
  ### Fixed
 
- - Genericized a leaked corporate Jira key (`DIJITAL` `PROJ`/`{JIRA_KEY}`) in the
- `finish` command + skill so the personal-data gate passes.
+ - Genericized a leaked corporate Jira project key (replaced with `PROJ`/`{JIRA_KEY}`)
+ in the `finish` command + skill so the personal-data gate passes.
 
  ## [10.5.0] - 2026-07-02
 
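The module-review-guide discovery from the 10.8.0 changelog entry can be sketched roughly as follows. This is an illustration under stated assumptions, not the `/multi-agent:review` implementation: `discoverGuides`, `guideAppliesTo`, and the `existingFiles` listing (standing in for a local checkout or a provider API) are hypothetical, and treating every ancestor directory below the repo root as a candidate "module" is a guess at what the entry means. The three file-name patterns, the cap of 5, and the per-directory scoping come from the changelog.

```javascript
// Hypothetical sketch of Step 2b guide discovery; not the package's code.
const GUIDE_NAMES = [/^CLAUDE\.md$/, /-CLAUDE\.md$/, /^AGENTS\.md$/];
const MAX_GUIDES = 5; // documented cap

// "a/b/c.js" -> ["a/b", "a"]; the repo root itself is excluded.
function dirsBelowRoot(filePath) {
  const parts = filePath.split("/").slice(0, -1);
  const dirs = [];
  for (let i = parts.length; i > 0; i--) dirs.push(parts.slice(0, i).join("/"));
  return dirs;
}

// Find convention files living in any directory on a changed file's path.
function discoverGuides(changedFiles, existingFiles) {
  const found = new Set();
  for (const file of changedFiles) {
    for (const dir of dirsBelowRoot(file)) {
      for (const candidate of existingFiles) {
        const slash = candidate.lastIndexOf("/");
        if (slash === -1) continue; // root-level file: out of scope
        const candidateDir = candidate.slice(0, slash);
        const base = candidate.slice(slash + 1);
        if (candidateDir === dir && GUIDE_NAMES.some((re) => re.test(base))) {
          found.add(candidate);
        }
      }
    }
  }
  // Sorting keeps the result deterministic (an assumed ordering).
  const sorted = [...found].sort();
  return { guides: sorted.slice(0, MAX_GUIDES), truncated: sorted.length > MAX_GUIDES };
}

// A guide is scoped to files under its own directory.
function guideAppliesTo(guidePath, filePath) {
  const dir = guidePath.slice(0, guidePath.lastIndexOf("/") + 1);
  return filePath.startsWith(dir);
}
```

The `truncated` flag is where a caller would log that more than five guides matched, as the entry requires.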
package/docs/adr/0001-three-model-triage.md CHANGED
@@ -1,6 +1,6 @@
- # 1. CLI-aware parallel review with Opus triage
+ # 1. CLI-aware parallel review with top-tier triage
 
- **Status:** Accepted · 2025 · Amended 2026-04 (CLI-aware reviewer set)
+ **Status:** Accepted · 2025 · Amended 2026-04 (CLI-aware reviewer set) · Amended 2026-07 (v10.6.0: Fable 5 restored — Reviewer 1 and triage run on Fable on Claude Code; Copilot CLI pins Opus. "Opus" below reads as "the top tier of the day")
 
  ## Context
 
package/docs/adr/0007-multi-tool-adapter-framework.md CHANGED
@@ -1,6 +1,6 @@
  # 7. Multi-tool adapter framework + token-preserving uninstall
 
- **Status:** Accepted · 2026-04-27 (v7.7.0 / v7.9.0)
+ **Status:** Superseded · 2026-07-02 (v10.7.0 removed all non-native adapters — the pipeline targets Claude Code + Copilot CLI only). Original acceptance: 2026-04-27 (v7.7.0 / v7.9.0). Kept as the historical record of why the adapter framework existed; install flags documented below no longer exist.
 
  ## Context
 
package/docs/adr/README.md CHANGED
@@ -10,13 +10,13 @@ Format: lightly adapted from [Michael Nygard's ADR template](https://cognitect.c
 
  | # | Title | Status |
  |---|-------|--------|
- | [0001](./0001-three-model-triage.md) | CLI-aware parallel review with Opus triage | Accepted (amended v5.2.2) |
+ | [0001](./0001-three-model-triage.md) | CLI-aware parallel review with top-tier triage | Accepted (amended v5.2.2, v10.6.0 Fable) |
  | [0002](./0002-instruction-driven-flag.md) | instructionDriven flag as explicit pipeline fork | Accepted |
  | [0003](./0003-unified-shared-skills.md) | Unified `skills/shared/` for Claude Code + Copilot | Accepted (amended v5.3.3) |
  | [0004](./0004-zero-dependency-philosophy.md) | Keep the package zero-runtime-deps | Accepted |
  | [0005](./0005-lazy-phase-docs.md) | Lazy-loaded phase docs with per-phase token budget | Accepted |
  | [0006](./0006-skills-core-external-split.md) | `shared/core/` vs `shared/external/` source org | Accepted |
- | [0007](./0007-multi-tool-adapter-framework.md) | Multi-tool adapter framework + token-preserving uninstall | Accepted (v7.7.0 / v7.9.0) |
+ | [0007](./0007-multi-tool-adapter-framework.md) | Multi-tool adapter framework + token-preserving uninstall | Superseded by v10.7.0 (adapters removed; Claude Code + Copilot CLI only) |
 
  ## Writing a New ADR
 
package/docs/architecture.md CHANGED
@@ -1,6 +1,6 @@
  # Architecture
 
- ## 9-Phase Pipeline Flow
+ ## 8-Phase Pipeline Flow (0-7)
 
  ```mermaid
  graph TD
@@ -9,7 +9,7 @@ graph TD
  P1["Phase 1: Analysis<br/>Codebase scan (parallel Explore agents)"]
  P2["Phase 2: Planning<br/>Task breakdown, architecture review"]
  P3["Phase 3: Dev<br/>TDD: RED → GREEN → REFACTOR"]
- P4["Phase 4: Review<br/>Parallel + Opus triage<br/>(Claude: 2-model · Copilot: 3-model)"]
+ P4["Phase 4: Review<br/>Parallel + Fable triage<br/>(Claude: 2-model · Copilot: 3-model)"]
  P5["Phase 5: Test<br/>Optional manual testing"]
  P6["Phase 6: Commit<br/>Git commit, PR creation"]
  P7["Phase 7: Report<br/>Jira · Wiki+Figma · Confluence · Log · Knowledge"]
@@ -53,11 +53,11 @@ graph LR
  graph TD
  DIFF["Code Diff"]
 
- DIFF --> OPUS["Opus<br/>Security + Architecture"]
+ DIFF --> R1["Fable (Claude Code) / Opus (Copilot)<br/>Security + Architecture"]
  DIFF --> GPT["GPT-5.4<br/>Quality + Edge Cases"]
  DIFF --> SON["Sonnet<br/>Correctness + Style"]
 
- OPUS --> TRIAGE["Opus Triage"]
+ R1 --> TRIAGE["Fable Triage<br/>(Opus on Copilot CLI)"]
  GPT --> TRIAGE
  SON --> TRIAGE
 
@@ -117,19 +117,19 @@ graph TB
  end
 
  subgraph "Pipeline Specs"
- CMD[commands/<br/>25 sub-commands]
- AGT[agents/<br/>6 agent definitions]
+ CMD[commands/<br/>39 command files<br/>(34 user-facing)]
+ AGT[agents/<br/>8 agent personas]
  RUL[rules/<br/>12 domain rules]
- PHS[refs/phases/<br/>8 phase specs]
- FIG[skills/figma-{ios,android,common}/<br/>37 figma skills]
- CMP[skills/shared/core/<br/>28 orchestration skills<br/>incl. compliance]
- EXT[skills/shared/external/<br/>127 curated skills]
+ PHS[refs/phases/<br/>8 phase specs + 3 contracts]
+ FIG[skills/figma-{ios,android,common}/<br/>41 figma skills]
+ CMP[skills/shared/core/<br/>38 orchestration skills<br/>incl. compliance]
+ EXT[skills/shared/external/<br/>143 curated skills<br/>(authoring source for the<br/>multi-agent-plugins marketplace)]
  end
 
  subgraph "Quality Gates"
- SCH[schemas/<br/>7 JSON schemas + token-budget]
- EVL[eval/triage/<br/>10 regression fixtures]
- SMK[scripts/smoke-*<br/>45 smoke suites]
+ SCH[schemas/<br/>17 JSON schemas]
+ EVL[eval/triage/<br/>12 regression fixtures]
+ SMK[scripts/smoke-*<br/>100+ smoke suites]
  end
 
  IDX --> INS
@@ -167,7 +167,7 @@ User Input → Phase 0 (Init)
  ```mermaid
  graph TD
  CC["Claude Code<br/>(source of truth)"]
- COP["Copilot CLI<br/>(instructions + 145 unified skills)"]
+ COP["Copilot CLI<br/>(instructions + unified skills)"]
  REPO["Pipeline Repo<br/>(npm package)"]
  WEB["Website<br/>(optional)"]
  RC["Remote Control<br/>(optional)"]
package/docs/features.md CHANGED
@@ -4,15 +4,15 @@ Comprehensive list of every feature the pipeline ships. The top-level `README.md
 
  ## Core Pipeline
 
- ### 9-Phase Orchestration
+ ### 8-Phase Orchestration (0-7)
 
  ```
  Phase 0: Init Project selection, branch setup, identity, worktree
  Phase 1: Analysis Stack detection, codebase exploration (parallel Explore agents)
  Phase 2: Planning Task decomposition, architecture review, user approval
  Phase 3: Dev TDD cycle: test → code → build (Sonnet)
- Phase 4: Review Deterministic gates + parallel AI review + Opus triage
- (Claude Code: Opus + Sonnet · Copilot CLI: GPT-5.4 + Opus + Sonnet)
+ Phase 4: Review Deterministic gates + parallel AI review + Fable triage
+ (Claude Code: Fable + Sonnet · Copilot CLI: GPT-5.4 + Opus + Sonnet)
  Phase 5: Test Optional manual testing + on-demand device audits
  Phase 6: Commit Git commit, push, PR with default reviewers + draft/ready prompt
  Phase 7: Report External: Jira comment · Wiki + Figma screenshots · Confluence
@@ -39,21 +39,22 @@ Compose freely: `--dev --local autopilot` = shortest, least-friction path.
  | Android/Kotlin | `build.gradle[.kts]` | `refs/android-guide.md` |
  | Backend | `requirements.txt`, `package.json`, `go.mod` | `refs/backend-guide.md` |
  | Frontend | `package.json` + framework detection | `refs/frontend-guide.md` |
- | Docker | `Dockerfile`, `docker-compose.yml` | `refs/docker-guide.md` |
+ | Docker | `Dockerfile`, `docker-compose.yml` | `refs/backend-guide.md` |
 
  Build commands, test runners, lint tools, and review focus areas all adapt to the detected stack.
 
- ### Stack Swap
+ ### Stack Selection (marketplace plugins, v10.5.0+)
 
- Switch skill sets mid-session without restarting the CLI:
+ Stack skill sets ship as versioned plugins in the `multi-agent-plugins` marketplace. Selecting a stack enables the matching plugin(s) in the current repo's `.claude/settings.json` `enabledPlugins` — no skill copying, no session restart tricks, no directory shuffling. The `ai-common-engineering-toolkit` (accessibility audit, humanizer, Firebase) is always enabled alongside the stack plugin.
 
  ```bash
- /multi-agent stack ios # SwiftUI, Xcode, ViewInspector
- /multi-agent stack android # Compose, Gradle, Hilt
- /multi-agent stack backend # FastAPI, Node.js, Docker
- /multi-agent stack frontend # React, Next.js, Vue
- /multi-agent stack mobile # iOS + Android combined
- /multi-agent stack all # Load everything
+ /multi-agent:stack ios # ai-ios-engineering-toolkit (SwiftUI, Xcode, HIG)
+ /multi-agent:stack android # ai-android-engineering-toolkit (Compose, Gradle, Hilt)
+ /multi-agent:stack mobile # iOS + Android combined
+ /multi-agent:stack backend # ai-backend-toolkit (spec-driven APIs)
+ /multi-agent:stack frontend # ai-frontend-engineering-toolkit (React/TSX)
+ /multi-agent:stack fullstack # backend + frontend
+ /multi-agent:stack all # every stack plugin
  ```
 
  ### Task Type Detection (v2.0.0)
@@ -124,21 +125,21 @@ Cheap, objective checks run BEFORE any AI token is spent:
 
  If any gate fails, fix first. Don't waste AI tokens reviewing broken code.
 
- ### 3-Model Parallel Review + Opus Triage (Phase 4 Steps 2–3)
+ ### CLI-Aware Parallel Review + Fable Triage (Phase 4 Steps 2–3)
 
- | Reviewer | Model | Focus |
- | ---------- | ------------------- | --------------------------------- |
- | Reviewer 1 | `claude-opus-4.6` | Deep security + architecture |
- | Reviewer 2 | `gpt-5.4` | Edge cases, different perspective |
- | Reviewer 3 | `claude-sonnet-4.6` | Quality + correctness + naming |
+ | Reviewer | Model | Focus | Where it runs |
+ | ---------- | ------------------- | --------------------------------- | -------------------- |
+ | Reviewer 1 | `claude-fable-5` (Claude Code) / `claude-opus-4-8` (Copilot CLI) | Deep security + architecture | Both CLIs |
+ | Reviewer 2 | `gpt-5.4` | Edge cases, different perspective | **Copilot CLI only** |
+ | Reviewer 3 | `claude-sonnet-4-6` | Quality + correctness + naming | Both CLIs |
 
- All three dispatched in parallel. Each returns structured JSON for deterministic aggregation. Cross-model diversity catches blind spots that any single model family would miss.
+ The reviewer set is **CLI-aware**: Claude Code dispatches 2 reviewers in parallel (Fable + Sonnet — GPT-5.4 is not available there); Copilot CLI dispatches all 3. Each returns structured JSON for deterministic aggregation. Cross-model diversity catches blind spots that any single model family would miss.
 
- **Opus Triage** (Phase 4 Step 3): Evaluates merged raw findings against task scope. Classifies each as `accepted` (fix now), `deferred` (out of scope, log for later), or `rejected` (false positive / noise). Only triage-accepted blocking items loop back to Phase 3.
+ **Fable Triage** (Phase 4 Step 3, Opus on Copilot CLI): Evaluates merged raw findings against task scope. Classifies each as `accepted` (fix now), `deferred` (out of scope, log for later), or `rejected` (false positive / noise). Only triage-accepted blocking items loop back to Phase 3.
 
  ### Runtime Triage Validator (v2.3.0)
 
- After Opus triage returns, output is validated by `validate-triage.mjs`:
+ After triage returns, output is validated by `validate-triage.mjs`:
 
  | Exit | Meaning |
  | ----- | ------------------------------------------------------------ |
@@ -151,6 +152,18 @@ After Opus triage returns, output is validated by `validate-triage.mjs`:
 
  If triage returns `approved: false` but has no blocking items, the validator forces `approved: true`. Conversely, if `approved: true` but blocking items exist, it forces `approved: false`. Hardened with `if`/`then` constraint in schema v3.0.0.
 
+ ### Verify-by-Test Triage (Phase 4 Step 3.7, v10.8.0, opt-in)
+
+ A triage verdict is a judgment call; a failing repro test is proof. When `prefs.global.verifyByTest.enabled` is on, one verifier agent (default Sonnet) writes a minimal repro test per accepted blocking finding (cap: `maxFindings`=3) and runs only that test. Fails as predicted -> finding confirmed, the repro test becomes the Phase 3 rework RED test. Passes under `evidence-gate.mjs` -> finding downgraded to `deferred`. Compile error / timeout -> `inconclusive`, judgment stands. Timeout-bounded, never blocks. Full spec: `refs/features/verify-by-test.md`.
+
+ ### Immutable-Test Rule + `test_lines_removed` Signal (v10.8.0)
+
+ Existing tests are immutable during a task: deleting, renaming, or weakening an assertion to reach green is a violation (`refs/rules.md`, Phase 3 GREEN step). A test changes only when the task changes the spec it encodes, named in the commit body. Deterministic backstop: `diff-risk-score.mjs` emits `test_lines_removed` (w=3.0) for any test-classified file whose diff removes more lines than it adds.
+
+ ### Structured Handoff Blocks (v10.8.0)
+
+ Every phase transition appends a `## Handoff` block (Done / Remaining / Decisions / Open findings / Next) to `agent-log.md` - orchestrator-written from existing state, no LLM call. `/multi-agent:resume` and post-`/compact` re-grounding read the latest handoff first, so long runs re-enter from durable artifacts instead of conversation memory (fresh-context discipline from Anthropic's long-running-agent harness guidance).
+
  ### Accessibility Code Review (Phase 4 Step 1.5)
 
  If changes include UI files, reviewers check for:
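The structured handoff block added in the hunk above could be rendered and re-read along these lines. This is a sketch under assumptions, not the orchestrator's code: the exact block layout is defined in `log-format.md`, which this diff excerpt does not show, so the bold-label layout, the `Phase:` line, the state keys (`done`, `remaining`, `decisions`, `openFindings`, `next`), and both function names are hypothetical. Only the `## Handoff` heading and the five field names come from the text.

```javascript
// Hypothetical sketch of a handoff writer/reader; layout and names are assumed.
const HANDOFF_FIELDS = [
  ["Done", "done"],
  ["Remaining", "remaining"],
  ["Decisions", "decisions"],
  ["Open findings", "openFindings"],
  ["Next", "next"],
];

// Build the block purely from state the orchestrator already holds (no LLM call).
function renderHandoff(phase, state) {
  const lines = ["## Handoff", `Phase: ${phase}`];
  for (const [label, key] of HANDOFF_FIELDS) {
    const items = state[key] ?? [];
    lines.push(`**${label}:** ${items.length ? items.join("; ") : "none"}`);
  }
  return lines.join("\n") + "\n";
}

// Return the most recent handoff block, or null for logs that have none
// (the caller would then fall back to per-phase findings).
function latestHandoff(logText) {
  const starts = [...logText.matchAll(/^## Handoff$/gm)].map((m) => m.index);
  if (starts.length === 0) return null;
  const start = starts[starts.length - 1];
  const rest = logText.slice(start + 1);
  const nextHeading = rest.search(/^## /m);
  return nextHeading === -1
    ? logText.slice(start)
    : logText.slice(start, start + 1 + nextHeading);
}
```

Returning `null` rather than an empty string keeps the pre-v10.8 fallback path explicit for the caller.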
@@ -210,7 +223,7 @@ Per-phase token budgets prevent runaway sessions. If a phase exceeds its budget,
 
  `pipeline/scripts/diff-risk-score.mjs` runs at Phase 4 Step 1.75 — before reviewer dispatch. Heuristic, deterministic, sub-second, no LLM. Top-N risk-ranked files inject into each reviewer's prompt as a `${PRIORITY_FILES}` block; reviewers read those files first but still review the entire diff.
 
- Signals + weights: `security_path` ×3, `migration` ×4, `public_api` ×2, `no_test_change` ×2.5, `complexity_delta` ×1.5, `ui_critical` ×1.5, `loc_changed` ×1. Toggle via `prefs.global.diffRiskAdvisory` (default ON).
+ Signals + weights: `security_path` ×3, `migration` ×4, `public_api` ×2, `no_test_change` ×2.5, `test_lines_removed` ×3 (v10.8.0: test file shrinks - immutable-test backstop), `complexity_delta` ×1.5, `ui_critical` ×1.5, `loc_changed` ×1. Toggle via `prefs.global.diffRiskAdvisory` (default ON).
 
  ### Test Gap Detection (v8.3.0)
 
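The `test_lines_removed` rule in the signal list above (a test-classified file that removes more lines than it adds, weight 3) might look something like the following. This is a hedged sketch, not the contents of `diff-risk-score.mjs`: the test-path heuristic, the `{ path, added, removed }` input shape, and the function name are assumptions, since the script's own classifier is not part of this excerpt.

```javascript
// Hypothetical sketch of the immutable-test backstop signal.
const TEST_LINES_REMOVED_WEIGHT = 3.0; // documented weight

// Assumed heuristic for "test-classified"; the real classifier may differ.
const TEST_PATH =
  /(^|\/)(tests?|__tests__|spec)\/|\.(test|spec)\.[cm]?[jt]sx?$|Tests?\.(swift|kt|java)$/;

// `file` is an assumed shape: { path, added, removed } with per-file line counts.
function testLinesRemovedSignal(file) {
  const shrinks = file.removed > file.added;
  if (!TEST_PATH.test(file.path) || !shrinks) return null;
  return {
    signal: "test_lines_removed",
    weight: TEST_LINES_REMOVED_WEIGHT,
    path: file.path,
  };
}
```

Because the check is a plain comparison of line counts, it stays deterministic and needs no LLM call, in keeping with the Step 1.75 description.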
package/docs/performance.md CHANGED
@@ -36,7 +36,7 @@ Each event in `metrics.jsonl` is a single-line JSON object written by
  JSON retries, timeouts.
  - **Phase 3 retries** — build / test / lint retry distribution per task.
  - **Cost per model** — calls, duration, tokens in/out, broken down by model
- (Opus, Sonnet, GPT-5.4).
+ (Fable, Opus, Sonnet, GPT-5.4).
  - **Language preference** — distribution of EN vs TR prompts.
 
  ## Typical Output (Markdown)
@@ -67,8 +67,8 @@ _Source: ~/.claude/logs/multi-agent/metrics.jsonl · Events: 421 (0 parse errors
 
  | Model | Calls | Duration (ms) | Tokens In | Tokens Out |
  |-------|-------|---------------|-----------|------------|
- | `claude-opus-4.6` | 124 | 612430 | 380221 | 92114 |
- | `claude-sonnet-4.6` | 89 | 412318 | 201445 | 58903 |
+ | `claude-fable-5` | 124 | 612430 | 380221 | 92114 |
+ | `claude-sonnet-4-6` | 89 | 412318 | 201445 | 58903 |
  | `gpt-5.4` | 88 | 398214 | 194302 | 55128 |
  ```
 
package/index.js CHANGED
@@ -41,30 +41,26 @@ if (!command || command === "install") {
  npx @mmerterden/multi-agent-pipeline install Install for Claude Code (default)
  npx @mmerterden/multi-agent-pipeline install --copilot Install for Copilot CLI
  npx @mmerterden/multi-agent-pipeline install --all Both Claude + Copilot
- npx @mmerterden/multi-agent-pipeline install --cursor Cursor full orchestration (rules + subagents + /multi-agent + MCP)
- npx @mmerterden/multi-agent-pipeline install --copilot-chat GitHub Copilot Chat (.github/copilot-instructions.md)
- npx @mmerterden/multi-agent-pipeline install --antigravity Antigravity full orchestration (.agent/ + AGENTS.md + MCP)
- npx @mmerterden/multi-agent-pipeline install --all-tools Every supported tool (Claude + Copilot + Cursor + Copilot Chat + Antigravity)
  npx @mmerterden/multi-agent-pipeline install --link Use symlinks (saves tokens, dev mode)
 
  Uninstall (token-preserving — Keychain/Credential Manager untouched):
  npx @mmerterden/multi-agent-pipeline uninstall Interactive: remove from all installed targets
  npx @mmerterden/multi-agent-pipeline uninstall --yes Skip prompt
  npx @mmerterden/multi-agent-pipeline uninstall --dry-run Report what would be removed
- npx @mmerterden/multi-agent-pipeline uninstall --cursor Only Cursor (use --target=<path> to override cwd)
+ npx @mmerterden/multi-agent-pipeline uninstall --claude Only Claude Code
+ npx @mmerterden/multi-agent-pipeline uninstall --cursor Legacy pre-v10.7 adapter-file cleanup (also --copilot-chat / --antigravity / --codex; --target=<path> overrides cwd)
 
  Help:
  npx @mmerterden/multi-agent-pipeline help
 
  Options:
  --no-color Disable colored output
- --target=<path> Adapter target dir (defaults to cwd; ignored for Claude/Copilot)
+ --target=<path> Target dir for legacy adapter cleanup on uninstall (defaults to cwd)
  --platform=ios|android|all Filter external skills by platform (default: all)
 
  After installation:
  Claude Code: /multi-agent "MOBILE-123"
  Copilot CLI: Describe your task naturally — pipeline instructions are loaded
- Cursor / Antigravity / VS Code Copilot Chat: full orchestration (subagents + /multi-agent + MCP)
 
  More info: https://github.com/mmerterden/multi-agent-pipeline
  `);
package/install/templates/copilot-instructions.md CHANGED
@@ -240,9 +240,9 @@ Cost block reads `phase-tracker.sh tokens` accumulators × `cost-table.json` pri
  - Every public method must have tests
  - Commit format: {type}({scope}): description [{jiraId}]
 
- ## Stack Detection
+ ## Stack Selection
 
- Auto-detected at session start by `~/.copilot/scripts/stack-swap.sh` (uses project markers). Manual override:
+ Stack skill sets ship as versioned plugins in the `multi-agent-plugins` marketplace. Selecting a stack enables the matching plugin(s) in the target repo's `.claude/settings.json` `enabledPlugins`; the `ai-common-engineering-toolkit` is always enabled alongside. There is no session-start auto-swap script. Select or change the stack with:
 
  ```bash
  multi-agent-stack [ios|android|mobile|backend|frontend|fullstack|all]
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
  "name": "@mmerterden/multi-agent-pipeline",
- "version": "10.7.3",
- "description": "8-phase AI development pipeline with full orchestration on Claude Code, Copilot CLI, Cursor, Antigravity, and VS Code Copilot Chat. Analysis, planning, TDD, CLI-aware parallel review with consensus surfacing + Opus triage, default-FAIL evidence gates, secret + intent guards, per-phase cost ledger, persistent learnings memory, wiki generation, commit automation. Token-preserving uninstall.",
+ "version": "10.8.0",
+ "description": "8-phase AI development pipeline with full orchestration on Claude Code and Copilot CLI. Analysis, planning, TDD, CLI-aware parallel review with consensus surfacing + Fable triage, default-FAIL evidence gates, secret + intent guards, per-phase cost ledger, persistent learnings memory, wiki generation, commit automation. Token-preserving uninstall.",
  "type": "module",
  "main": "index.js",
  "exports": {
@@ -34,9 +34,6 @@
  "copilot-cli",
  "copilot",
  "claude",
- "cursor",
- "windsurf",
- "cline",
  "ios",
  "android",
  "backend",
package/pipeline/agents/dev-critic.md CHANGED
@@ -7,7 +7,7 @@ modelRationale: "Critic tier - deterministic checklist + build/test verificati
 
  # Dev Critic Agent - Phase 3.5
 
- You are the in-loop critic for Phase 3 (Dev). The generator (Sonnet/Opus during Phase 3) has just finished its last edit. **You run BEFORE Phase 4**, on the same worktree, against deterministic criteria that already exist on disk. Your job: catch failures the generator would otherwise send into Phase 4 and waste 2-3 reviewer calls + Opus triage on.
+ You are the in-loop critic for Phase 3 (Dev). The generator (Sonnet/Opus during Phase 3) has just finished its last edit. **You run BEFORE Phase 4**, on the same worktree, against deterministic criteria that already exist on disk. Your job: catch failures the generator would otherwise send into Phase 4 and waste 2-3 reviewer calls + Fable triage on.
 
  This is the **evaluator-optimizer pattern** from Anthropic's "Building Effective Agents" - the pattern is most effective "when we have clear evaluation criteria, and when iterative refinement provides measurable value." Phase 3 satisfies both: criteria are written in `rules/*.md`, refinement value is measured by Phase 4 fix-cycles avoided.
 
@@ -16,7 +16,7 @@
  1. Analysis (Opus) -> scope, impact analysis
  2. Planning (Opus) -> spec, task breakdown
  3. Development (Sonnet) -> TDD, code, build
- 4. Review -> deterministic gates + parallel review + Opus triage
+ 4. Review -> deterministic gates + parallel review + Fable triage
  - Claude Code: Opus + Sonnet (2 paralel)
  - Copilot CLI: GPT-5.4 + Opus + Sonnet (3 paralel)
 
@@ -33,7 +33,7 @@ Phase 7: Report → short terminal summary
 
  1. **Parse input** - standard multi-agent input formats (Issue URL, Jira ID, free text)
  2. **Phase 0: Init** - set `"mode": "dev", "autopilot": true` in `agent-state.json`
- 3. **Phase 3: Dev** - write code directly on `claude-opus-4.6` and verify the build
+ 3. **Phase 3: Dev** - write code directly on `claude-opus-4-8` and verify the build
  4. **Phase 6: Commit** - auto commit + push + PR
  5. **Phase 7: Report** - terminal summary
 
@@ -33,7 +33,7 @@ You already did the work locally - wrote code on the current branch and maybe
 
  ```
  Phase 0: Init → project/branch detect, resolve base + diff (work-already-done), Jira id, state (NO worktree)
- Phase 4: Review → deterministic gates + parallel review (Opus + Sonnet) + Opus triage
+ Phase 4: Review → deterministic gates + parallel review (Fable + Sonnet) + Fable triage
  Phase 5: Build+Test → stack-aware build gate + run existing tests; SUCCESS required (automated, not the interactive user-test)
  Phase 6: Commit → commit remaining local changes + push + open PR if none exists
  Phase 7: Report → technical analysis + Jira comment with test scenarios (channels: Jira / PR / Confluence / Wiki)
@@ -51,7 +51,7 @@ Phases 1-3 (Analysis / Planning / Dev) are skipped by design - `finish` treats
 
  ## Phase execution (reuse the existing phase contracts)
 
- - **Phase 4 Review** — run per `refs/phases/phase-4-review.md` against the resolved diff: deterministic gates (Step 1.x), stack-specific parallel reviewers (Opus + Sonnet on Claude Code; +GPT on Copilot CLI), Opus triage → `triage.accepted`. Blocking/important accepted findings:
+ - **Phase 4 Review** — run per `refs/phases/phase-4-review.md` against the resolved diff: deterministic gates (Step 1.x), stack-specific parallel reviewers (Fable + Sonnet on Claude Code; GPT + Opus + Sonnet on Copilot CLI), Fable triage → `triage.accepted`. Blocking/important accepted findings:
  - interactive: present them and ask (`AskUserQuestion`) whether to fix now (loop back through a minimal Phase-3-style TDD fix) or proceed;
  - `autopilot` (or `prefs.global.finish.autoFix == true`): auto-fix accepted blocking/important findings, then re-review the fix, before advancing.
  - **Phase 5 Build+Test** — the **automated success gate** (this is what "build+test success" means here; the interactive device user-test is `/multi-agent:manual-test`). Stack-aware: build via `figma-config.build` (iOS scheme / Android gradle / detected backend/frontend build) and run the existing test suite if present (`swift test` / `xcodebuild test` / `./gradlew test` / `pytest` / `npm test` / `vitest`). Require success to advance; on failure, surface logs and (interactive) stop or (autopilot) attempt a bounded fix loop. **If the repo has no tests, report "no tests present" — never fabricate test results.**
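Reviewer note: the Phase 4 fix decision and the Phase 5 "never fabricate test results" rule described in this hunk can be sketched roughly as follows. This is a minimal illustration, not the package's code: `decideFindingAction`, `testGate`, the `severity` field, and the returned strings are all hypothetical names chosen for the example. Only `autopilot`, `prefs.global.finish.autoFix`, and the "no tests present" wording come from the text above.

```javascript
// Illustrative sketch only: every function and field name is hypothetical.

// Phase 4: decide what to do with accepted blocking/important findings.
function decideFindingAction({ autopilot = false, prefs = {} }, acceptedFindings) {
  const actionable = acceptedFindings.filter(
    (f) => f.severity === "blocking" || f.severity === "important"
  );
  if (actionable.length === 0) return "advance";
  // Auto-fix when running in autopilot or when the preference is enabled.
  const autoFix = autopilot || prefs?.global?.finish?.autoFix === true;
  return autoFix ? "auto-fix-then-re-review" : "ask-user";
}

// Phase 5: report test status without ever inventing a result.
// `testCommand` is null when no test suite was detected; `runTests` is an
// injected callback that runs the command and returns its exit code.
function testGate(testCommand, runTests) {
  if (testCommand === null) {
    return { status: "no tests present", ran: false };
  }
  const exitCode = runTests(testCommand);
  return { status: exitCode === 0 ? "success" : "failure", ran: true };
}
```

Injecting `runTests` keeps the sketch free of process spawning, so the reporting rule can be checked in isolation.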
@@ -52,13 +52,13 @@ How It Works (Phase 0 - Interactive Flow):
  Pipeline (after Phase 0) - shown as visual cards in terminal:
 
  Phase 0: Init -> The 8 steps above
- Phase 1: Analysis -> Stack detection + codebase scan (Opus)
+ Phase 1: Analysis -> Stack detection + codebase scan (Fable)
  Phase 2: Planning -> Task breakdown + architecture review + Plan Approval Gate
  (clarification max 2 rounds + approval loop - normal mode only;
  skipped for --dev, autopilot, --dev autopilot)
  Phase 3: Dev -> TDD: test -> code -> build (Sonnet) + build queue
- Phase 4: Review -> Deterministic gates + parallel AI review + Opus triage
- (Claude Code: Opus + Sonnet · Copilot CLI: GPT-5.4 + Opus + Sonnet)
+ Phase 4: Review -> Deterministic gates + parallel AI review + Fable triage
+ (Claude Code: Fable + Sonnet · Copilot CLI: GPT-5.4 + Opus + Sonnet)
  Phase 5: Test -> Optional: switch to branch, test in Xcode
  (runs in dev + full; skipped in every autopilot and local variant)
  Phase 6: Commit -> Commit -> push -> PR + issue body update (never auto-closes)
@@ -73,7 +73,7 @@ Pipeline (after Phase 0) - shown as visual cards in terminal:
 
  Modes:
 
- (normal) Full 8 phases, Sonnet dev, Plan Approval Gate active, parallel review + Opus triage
+ (normal) Full 8 phases, Sonnet dev, Plan Approval Gate active, parallel review + Fable triage
  --dev Fast: Init -> Dev(Opus) -> Commit -> Report (no plan gate)
  --local No worktree - works directly on local branch
  autopilot Skip all confirmations INCLUDING plan gate, auto commit/PR
@@ -120,7 +120,7 @@ Setup & Maintenance:
 
  /multi-agent:setup First-run wizard - Keychain tokens + Git identity + language
  /multi-agent:language [en|tr] Show or set outputLanguage (promptLanguage stays English)
- /multi-agent:stack <id> Select stack by enabling the matching marketplace plugin(s) (ios / android / backend / frontend / fullstack / all)
+ /multi-agent:stack <id> Select stack by enabling the matching marketplace plugin(s) (ios / android / mobile / backend / frontend / fullstack / all)
  /multi-agent:sync Sync ecosystem (Claude Code + Copilot CLI + pipeline + website + remote-control)
  /multi-agent:update Pull latest pipeline + reinstall + run migrations
  /multi-agent:delete Uninstall pipeline from every CLI (Keychain tokens left intact, double confirm)
@@ -195,7 +195,7 @@ Quality & Telemetry (advisory, on by default - flip prefs.global.* to disable)
  Triage Memory Phase 7 ingests accepted/deferred/rejected findings into a per-repo corpus
  Prior-Art Lookup Phase 1 + Phase 4 query the corpus for similar past findings, inject as context
  Per-Persona Reviewer/agent dispatch reads `preferredModel` from persona file; per-call override
- via PHASE_MODEL_OVERRIDE; falls back to opus
+ via PHASE_MODEL_OVERRIDE; ladder fable -> opus -> sonnet -> haiku
 
  ------------------------------------------------------------
 
@@ -260,13 +260,13 @@ Nasıl Çalışır (Phase 0 - İnteraktif Akış):
  Pipeline (Phase 0'dan sonra) - terminalde görsel kart olarak görünür:
 
  Phase 0: Init -> Yukarıdaki 8 adım
- Phase 1: Analysis -> Stack tespiti + codebase taraması (Opus)
+ Phase 1: Analysis -> Stack tespiti + codebase taraması (Fable)
  Phase 2: Planning -> Task kırılımı + mimari inceleme + Plan Onay Kapısı
  (clarification max 2 tur + onay döngüsü - sadece normal mode;
  --dev, autopilot, --dev autopilot'ta skip)
  Phase 3: Dev -> TDD: test -> kod -> build (Sonnet) + build queue
- Phase 4: Review -> Deterministik kapılar + paralel AI review + Opus triage
- (Claude Code: Opus + Sonnet · Copilot CLI: GPT-5.4 + Opus + Sonnet)
+ Phase 4: Review -> Deterministik kapılar + paralel AI review + Fable triage
+ (Claude Code: Fable + Sonnet · Copilot CLI: GPT-5.4 + Opus + Sonnet)
  Phase 5: Test -> Opsiyonel: branch'e geç, Xcode'da test (--dev / autopilot'ta skip)
  Phase 6: Commit -> Commit -> push -> PR + issue body güncelleme (hiç auto-close yok)
  Phase 7: Report -> Channels dispatcher (PR · Jira · Confluence · Wiki, multi-select)
@@ -280,7 +280,7 @@ Pipeline (Phase 0'dan sonra) - terminalde görsel kart olarak görünür:
 
  Modlar:
 
- (normal) Tam 8 faz, Sonnet dev, Plan Onay Kapısı aktif, paralel review + Opus triage
+ (normal) Tam 8 faz, Sonnet dev, Plan Onay Kapısı aktif, paralel review + Fable triage
  --dev Hızlı: Init -> Dev(Opus) -> Commit -> Report (plan gate yok)
  --local Worktree yok - doğrudan local branch'te çalışır
  autopilot Plan gate dahil tüm onayları atla, otomatik commit/PR
@@ -327,7 +327,7 @@ Setup & Maintenance:
 
  /multi-agent:setup İlk kurulum sihirbazı - Keychain token + Git kimliği + dil
  /multi-agent:language [en|tr] outputLanguage'ı göster veya ayarla (promptLanguage İngilizce kalır)
- /multi-agent:stack <id> Stack'i eşleşen marketplace plugin'i etkinleştirerek seç (ios / android / backend / frontend / fullstack / all)
+ /multi-agent:stack <id> Stack'i eşleşen marketplace plugin'i etkinleştirerek seç (ios / android / mobile / backend / frontend / fullstack / all)
  /multi-agent:sync Ekosistemi senkronize et (Claude Code + Copilot CLI + pipeline + website + remote-control)
  /multi-agent:update En son pipeline'ı çek + reinstall + migration çalıştır
  /multi-agent:delete Pipeline'ı tüm CLI'lerden kaldır (Keychain token dokunulmaz, çift onay)
@@ -402,7 +402,7 @@ Quality & Telemetry (advisory, default açık - prefs.global.* ile kapatılabi
  Triage Memory Phase 7 accepted/deferred/rejected bulguları repo başına corpus'a yazar
  Prior-Art Lookup Phase 1 + Phase 4 corpus'tan benzer geçmiş bulgu sorgular, ek context olarak enjekte
  Per-Persona Reviewer/agent dispatch persona dosyasından `preferredModel` okur;
- per-call override PHASE_MODEL_OVERRIDE ile; fallback opus
+ per-call override PHASE_MODEL_OVERRIDE ile; merdiven fable -> opus -> sonnet -> haiku
 
  ------------------------------------------------------------
 
@@ -30,7 +30,7 @@ Phase 0: Init → project detection, branch check, state (NO worktree)
  Phase 1: Analysis → codebase scan (parallel explore agents, Opus)
  Phase 2: Planning → task breakdown, Plan Approval Gate (approval loop)
  Phase 3: Dev → TDD (Sonnet), build queue
- Phase 4: Review → deterministic gates + parallel review + Opus triage
+ Phase 4: Review → deterministic gates + parallel review + Fable triage
  Phase 6: Commit → pre-commit checkout prompt, commit + push + PR
  Phase 7: Report → Jira / Wiki / Confluence + log + knowledge/memory
  ```
@@ -37,7 +37,7 @@ Introduces ~1× Sonnet call per Dev iteration. On simple bug fixes the cost outw
 
  ## Why this fits orchestrator-workers + evaluator-optimizer hybrid
 
- Phase 4 is parallelization-with-voting - good for *adversarial* perspectives (security, architecture). Phase 3.5 is evaluator-optimizer - good for *deterministic* criteria (build, tests, checklists). Sending failing builds into Phase 4 wastes 2-3 reviewer calls + Opus triage; Phase 3.5 absorbs that cost at one Sonnet call.
+ Phase 4 is parallelization-with-voting - good for *adversarial* perspectives (security, architecture). Phase 3.5 is evaluator-optimizer - good for *deterministic* criteria (build, tests, checklists). Sending failing builds into Phase 4 wastes 2-3 reviewer calls + Fable triage; Phase 3.5 absorbs that cost at one Sonnet call.
 
  ## Reference
 
@@ -1,4 +1,6 @@
- # Model Fallback Contract (v10.1.0)
+ # Model Fallback Contract
+
+ > Contract last revised in **v10.6.0** (Fable 5 restored as top tier). The version tag here tracks the last substantive change to this contract, not the pipeline release.
 
  Personas route to the top available intelligence tier they declare in
  `preferredModel`. That tier can be quota-limited or temporarily unavailable.
@@ -95,5 +97,7 @@ per-phase `model` field already carries the override).
  deterministic over clever).
  - No edits to `pipeline/agents/*.md` at runtime; frontmatter is install-time
  configuration only.
- - Copilot CLI reviewer set and adapter-platform model pins are out of scope
- (they pin their own models and do not use this persona ladder).
+ - Copilot CLI reviewer set is out of scope: Copilot CLI pins its own three
+ reviewer models (GPT-5.4 + Opus + Sonnet; Fable 5 is not offered there) and
+ does not use this persona ladder. Only Claude Code dispatches Reviewer-1 on
+ Fable.
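
Reviewer note: the persona ladder this contract describes (`fable -> opus -> sonnet -> haiku`, per the help text in this diff) can be sketched as a downward walk from the declared `preferredModel`. This is a hedged illustration under stated assumptions: `resolveModel`, `isAvailable`, and the `override` parameter are hypothetical names. It is also an assumption that a per-call override takes precedence and that the walk never climbs above the preferred tier, since neither rule is stated in the lines shown here.

```javascript
// Hypothetical sketch: the tier order comes from the help text; the function,
// its parameters, and the precedence rules are assumptions for illustration.
const LADDER = ["fable", "opus", "sonnet", "haiku"];

function resolveModel(preferredModel, isAvailable, override) {
  // Assumed precedence: an explicit per-call override wins outright.
  if (override) return override;

  const start = LADDER.indexOf(preferredModel);
  if (start === -1) {
    throw new Error(`unknown tier: ${preferredModel}`);
  }
  // Walk down from the preferred tier and take the first available one.
  for (const tier of LADDER.slice(start)) {
    if (isAvailable(tier)) return tier;
  }
  return null; // nothing at or below the preferred tier is available
}
```

For example, with `fable` quota-limited, `resolveModel("fable", (t) => t !== "fable")` resolves to `opus`. Keeping the walk a plain ordered scan matches the contract's stated preference for deterministic over clever.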