@mmerterden/multi-agent-pipeline 10.7.2 → 10.7.4
- package/CHANGELOG.md +33 -2
- package/README.md +20 -4
- package/docs/adr/0001-three-model-triage.md +2 -2
- package/docs/adr/0007-multi-tool-adapter-framework.md +1 -1
- package/docs/adr/README.md +2 -2
- package/docs/architecture.md +14 -14
- package/docs/features.md +22 -21
- package/docs/performance.md +3 -3
- package/index.js +3 -7
- package/install/templates/copilot-instructions.md +2 -2
- package/package.json +2 -5
- package/pipeline/agents/dev-critic.md +1 -1
- package/pipeline/agents/task-clarifier.md +1 -3
- package/pipeline/claude-md-template.md +1 -1
- package/pipeline/commands/multi-agent/dev-autopilot.md +1 -1
- package/pipeline/commands/multi-agent/finish.md +2 -2
- package/pipeline/commands/multi-agent/help.md +12 -12
- package/pipeline/commands/multi-agent/local.md +1 -1
- package/pipeline/commands/multi-agent/refs/features/dev-critic.md +1 -1
- package/pipeline/commands/multi-agent/refs/features/model-fallback.md +7 -3
- package/pipeline/commands/multi-agent/refs/features/shadow-git.md +1 -1
- package/pipeline/commands/multi-agent/refs/knowledge.md +1 -1
- package/pipeline/commands/multi-agent/refs/phases/log-format.md +1 -1
- package/pipeline/commands/multi-agent/refs/phases/modes.md +1 -1
- package/pipeline/commands/multi-agent/refs/phases/phase-0-init.md +1 -1
- package/pipeline/commands/multi-agent/refs/phases/phase-1-analysis.md +2 -2
- package/pipeline/commands/multi-agent/refs/phases/phase-2-planning.md +3 -3
- package/pipeline/commands/multi-agent/refs/phases/phase-3-dev.md +2 -2
- package/pipeline/commands/multi-agent/refs/phases/phase-4-review.md +18 -18
- package/pipeline/commands/multi-agent/refs/progress-contract.md +1 -1
- package/pipeline/commands/multi-agent/refs/tracker-contract.md +1 -3
- package/pipeline/commands/multi-agent/review.md +8 -8
- package/pipeline/commands/multi-agent/sync.md +3 -3
- package/pipeline/commands/multi-agent.md +7 -7
- package/pipeline/lib/plan-todos.sh +2 -5
- package/pipeline/lib/post-pr-review.sh +2 -2
- package/pipeline/lib/review-watch.sh +2 -6
- package/pipeline/lib/shadow-git.sh +3 -5
- package/pipeline/schemas/agent-state.schema.json +1 -1
- package/pipeline/schemas/clarify-output.schema.json +1 -1
- package/pipeline/schemas/plan-todos.schema.json +1 -1
- package/pipeline/schemas/prefs.schema.json +8 -8
- package/pipeline/schemas/reviewer-output.schema.json +1 -1
- package/pipeline/schemas/triage-output.schema.json +2 -2
- package/pipeline/scripts/README.md +1 -2
- package/pipeline/scripts/cost-budget-check.mjs +1 -1
- package/pipeline/scripts/cost-table.json +7 -0
- package/pipeline/scripts/fixtures/install-layout.tsv +5 -5
- package/pipeline/scripts/smoke-review-watch.sh +2 -2
- package/pipeline/scripts/smoke-shadow-git.sh +1 -1
- package/pipeline/scripts/uninstall.mjs +53 -57
- package/pipeline/skills/shared/core/multi-agent/SKILL.md +11 -11
- package/pipeline/skills/shared/core/multi-agent-dev-autopilot/SKILL.md +1 -1
- package/pipeline/skills/shared/core/multi-agent-finish/SKILL.md +1 -1
- package/pipeline/skills/shared/core/multi-agent-help/SKILL.md +8 -8
- package/pipeline/skills/shared/core/multi-agent-review/SKILL.md +5 -5
- package/pipeline/skills/shared/core/multi-agent-sync/SKILL.md +7 -5
- package/pipeline/scripts/smoke-readme-counts.sh +0 -120
package/CHANGELOG.md
CHANGED
|
@@ -14,6 +14,37 @@ Internal file-layout changes that don't affect the slash-command surface are sti
|
|
|
14
14
|
|
|
15
15
|
---
|
|
16
16
|
|
|
17
|
+
## [10.7.4] - 2026-07-02
|
|
18
|
+
|
|
19
|
+
Deep-consistency sweep: the v10.6.0 Fable restore and the v10.7.0 adapter removal are now reflected on every surface, and the test suite is green again end to end.
|
|
20
|
+
|
|
21
|
+
### Fixed
|
|
22
|
+
|
|
23
|
+
- **`cost-table.json` regains the `fable` price row** (`claude-fable-5`, $10/$50 per MTok, cache-read $1) — the v10.6.0 restore had left every architect/Reviewer-1/triage dispatch unpriced ("USD unavailable") in the cost ledger. `cost-budget-check.mjs` and `prefs.schema.json` `costBudget.pricingModel` now default to `fable` (conservative upper bound); `triage-output` / `reviewer-output` schemas and the telemetry enums (`log-format`, `progress-contract`) accept `fable` (+ `gpt` as reviewer source).
|
|
24
|
+
- **`uninstall` legacy adapter cleanup actually works again.** The four `--cursor` / `--copilot-chat` / `--antigravity` / `--codex` blocks imported adapter modules deleted in v10.7.0, so every run silently no-opped. They now perform inline cleanup (multi-agent-* files + managed-marker blocks) with user files untouched; verified by a round-trip test.
|
|
25
|
+
- **`smoke-install-layout` fixture regenerated** (167 scripts; it was stale at 174 since the v10.7.0 deletions and failed `npm test` + CI). `smoke-readme-counts.sh` deleted — it asserted a README counts table that the v10.7.1 concise-README rewrite intentionally removed. The 10.7.1 changelog entry no longer spells the corporate Jira key it genericized, so the tarball leak gate passes.
|
|
26
|
+
- **`multi-agent-sync` skill command inventory reconciled** — it still listed the v7-era 26 commands; now the canonical 35 (incl. `finish`), matching `sync.md` and `cross-cli-contract.md`.
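To make the restored `fable` price row concrete, here is a minimal sketch of how one ledger entry could be priced from it. Only the three rates ($10 input, $50 output, $1 cache-read per million tokens) come from the changelog entry above. The field names (`inputPerMTok`, `outputPerMTok`, `cacheReadPerMTok`) and the `priceDispatch` helper are hypothetical and are not taken from `cost-table.json` or `cost-budget-check.mjs`.

```javascript
// Hypothetical row shape; only the three rates are from the changelog.
const FABLE_RATES = { inputPerMTok: 10, outputPerMTok: 50, cacheReadPerMTok: 1 };

// Price one dispatch in USD from its token counts.
// Returns null when the model has no price row, which is the situation
// the changelog describes as "USD unavailable".
function priceDispatch(rates, { tokensIn = 0, tokensOut = 0, cacheRead = 0 } = {}) {
  if (!rates) return null;
  const weighted =
    tokensIn * rates.inputPerMTok +
    tokensOut * rates.outputPerMTok +
    cacheRead * rates.cacheReadPerMTok;
  return weighted / 1_000_000; // rates are per million tokens
}

const usd = priceDispatch(FABLE_RATES, {
  tokensIn: 200_000,
  tokensOut: 40_000,
  cacheRead: 500_000,
});
console.log(usd.toFixed(2)); // prints "4.50"
```

Under these assumptions, a missing row yields `null` rather than a zero cost, so an unpriced model stays visibly unpriced instead of silently looking free.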
|
|
27
|
+
|
|
28
|
+
### Changed
|
|
29
|
+
|
|
30
|
+
- **Fable-restore consistency sweep.** Every doc that still said "Opus triage" / "Opus top tier" now says Fable where Claude Code dispatches Fable (Phase 1/2 headers, phase-4-review tables + metrics, help EN+TR, review command + skill, orchestration SKILL, features/architecture/performance docs, examples, CLAUDE.md template). The Copilot CLI reviewer trio stays explicitly pinned at GPT-5.4 + Opus + Sonnet (Fable 5 is not offered there) — now stated in `model-fallback.md`, which also drops its stale version-tagged title. The stray `claude-opus-4.6` / `claude-sonnet-4.6` ids are normalized to `claude-opus-4-8` / `claude-sonnet-4-6`.
|
|
31
|
+
- **Adapter-era residue swept**: `index.js` help no longer advertises removed install flags; `package.json` description/keywords say Claude Code + Copilot CLI only; `.gitignore` drops the dead `sync-adapters.mjs` note; `tracker-contract.md` drops the Cursor detection branch; ADR-0007 is marked Superseded and `GENERICITY-REVIEW.md` carries a pre-v10.7.0 banner; ROADMAP's "Current Release" jumps from the stale 10.1.0 (which still claimed Fable retired) to 10.7.x and dead future items are resolved.
|
|
32
|
+
- **Plugin-model docs**: the Copilot instructions template and `scripts/README` no longer reference the deleted `stack-swap.sh` session-start mechanic; `features.md` "Stack Swap" section rewritten around marketplace-plugin enablement; `architecture.md` counts refreshed (39 commands, 8 personas, 17 schemas, 100+ smokes, 41 figma + 38 core + 143 external skills); `help` stack args include `mobile`.
|
|
33
|
+
|
|
34
|
+
## [10.7.3] - 2026-07-02
|
|
35
|
+
|
|
36
|
+
### Changed
|
|
37
|
+
|
|
38
|
+
- **Zero external-tool residue.** Genericized every remaining editor/tool citation
|
|
39
|
+
in the pipeline's own files (Windsurf Cascade / Cursor Plan Mode / Cursor Bugbot /
|
|
40
|
+
Cline / Devin) across schemas, lib comments, refs, and the task-clarifier persona —
|
|
41
|
+
feature behavior is unchanged, only the third-party naming is dropped. (Vendored
|
|
42
|
+
`shared/external` knowledge and the `~/.codex` prune path in `update` are unaffected.)
|
|
43
|
+
- **README** gained a **Tokens & integrations** table (keychain-mapping model + the
|
|
44
|
+
services the pipeline talks to: Jira, GitHub, Bitbucket, Confluence, Figma, Fortify,
|
|
45
|
+
Firebase, Jenkins, npm) and an explicit **Platform support** section (macOS / Linux /
|
|
46
|
+
Windows, keychain backends).
|
|
47
|
+
|
|
17
48
|
## [10.7.2] - 2026-07-02
|
|
18
49
|
|
|
19
50
|
### Changed
|
|
@@ -89,8 +120,8 @@ Fable 5 restored as the top model tier; `stack-swap` fully removed; setup gains
|
|
|
89
120
|
|
|
90
121
|
### Fixed
|
|
91
122
|
|
|
92
|
-
- Genericized a leaked corporate Jira key (
|
|
93
|
-
`finish` command + skill so the personal-data gate passes.
|
|
123
|
+
- Genericized a leaked corporate Jira project key (replaced with `PROJ`/`{JIRA_KEY}`)
|
|
124
|
+
in the `finish` command + skill so the personal-data gate passes.
|
|
94
125
|
|
|
95
126
|
## [10.5.0] - 2026-07-02
|
|
96
127
|
|
package/README.md
CHANGED
|
@@ -84,11 +84,27 @@ The pipeline runs natively on **Claude Code** and **Copilot CLI** — both insta
|
|
|
84
84
|
|
|
85
85
|
Filter skills by stack with `--platform=ios\|android\|all`.
|
|
86
86
|
|
|
87
|
-
##
|
|
87
|
+
## Tokens & integrations
|
|
88
88
|
|
|
89
|
-
|
|
90
|
-
|
|
91
|
-
|
|
89
|
+
`setup` scans your OS keychain and maps each token by a **logical name** (e.g. `jira`) to its real keychain entry — the pipeline resolves tokens through that mapping (`credential-store.sh`), so literal keychain names never appear in synced files. Tokens stay in the keychain (macOS Keychain / Windows Credential Manager / Linux libsecret), are **never committed or logged**, and are all **optional** — the pipeline asks for any it needs at Phase 0.
|
|
90
|
+
|
|
91
|
+
| Token | Used for | Phase |
|
|
92
|
+
|---|---|---|
|
|
93
|
+
| `jira` | fetch the issue · post the report comment | 0, 7 |
|
|
94
|
+
| `github` | issues · PRs · `gh` auth | 0, 6 |
|
|
95
|
+
| `bitbucket` | PR create/update (reviewer-preserving) · diff | 6 |
|
|
96
|
+
| `confluence` | publish analysis / wiki pages | 7 |
|
|
97
|
+
| `figma` + `figma_mcp` | fetch design context | analysis only |
|
|
98
|
+
| `fortify` | security-scan findings gate | 4 |
|
|
99
|
+
| `firebase` | Firebase config (base64 JSON) for Firebase projects | as needed |
|
|
100
|
+
| `jenkins` | CI trigger / status | build / deploy |
|
|
101
|
+
| `npm` | package publish (mostly CI) | release |
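The logical-name indirection described above can be sketched as follows. This is an illustration only: the real lookup goes through `credential-store.sh`, whose interface is not shown in this diff. The `tokenMap` entries, the `resolveToken` helper, and the in-memory stand-in for the keychain are all hypothetical.

```javascript
// Hypothetical mapping as `setup` might record it:
// logical name -> real keychain entry name.
const tokenMap = { jira: "corp-jira-api-token", github: "gh-personal-pat" };

// Resolve a token by logical name. Returns null when the name is unmapped,
// since every token is optional.
function resolveToken(logicalName, mapping, readKeychain) {
  if (!Object.hasOwn(mapping, logicalName)) return null;
  return readKeychain(mapping[logicalName]);
}

// In-memory stand-in for the OS keychain so the sketch runs anywhere.
const fakeKeychain = new Map([["corp-jira-api-token", "example-secret"]]);
const readFake = (entry) => fakeKeychain.get(entry) ?? null;

console.log(resolveToken("jira", tokenMap, readFake) !== null); // prints true
console.log(resolveToken("figma", tokenMap, readFake)); // prints null
```

Callers only ever name `jira` or `github`, so the literal keychain entry names stay out of any file that gets synced.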
|
|
102
|
+
|
|
103
|
+
The **secret scan** runs as a `PreToolUse` hook on Claude Code (hard-blocks a commit on a hit) and as a pre-push check elsewhere.
|
|
104
|
+
|
|
105
|
+
## Platform support
|
|
106
|
+
|
|
107
|
+
Runs on **macOS**, **Linux**, and **Windows** (Git Bash / WSL). Shell and credential access go through a platform-agnostic layer — the keychain resolves automatically to **macOS Keychain**, **Linux libsecret** (`secret-tool`), or **Windows Credential Manager**, and scripts fall back between BSD and GNU tool variants. Node.js 18 / 20 / 22.
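As a rough illustration of the automatic keychain resolution, the sketch below maps Node's `process.platform` value to the three backends named above. The `keychainBackend` function is hypothetical; the package's own detection lives in its shell layer and may use different signals.

```javascript
// Map a Node.js `process.platform` value to the keychain backend
// named in the README. Returns null for platforms the README does not list.
function keychainBackend(platform) {
  switch (platform) {
    case "darwin":
      return "macOS Keychain";
    case "linux":
      return "Linux libsecret (secret-tool)";
    case "win32":
      return "Windows Credential Manager";
    default:
      return null;
  }
}

console.log(keychainBackend(process.platform));
```

Note that Node running inside WSL reports `linux`, so under this sketch a WSL session would select libsecret rather than Windows Credential Manager.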
|
|
92
108
|
|
|
93
109
|
## Companion repos
|
|
94
110
|
|
package/docs/adr/0001-three-model-triage.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
|
-
# 1. CLI-aware parallel review with
|
|
1
|
+
# 1. CLI-aware parallel review with top-tier triage
|
|
2
2
|
|
|
3
|
-
**Status:** Accepted · 2025 · Amended 2026-04 (CLI-aware reviewer set)
|
|
3
|
+
**Status:** Accepted · 2025 · Amended 2026-04 (CLI-aware reviewer set) · Amended 2026-07 (v10.6.0: Fable 5 restored — Reviewer 1 and triage run on Fable on Claude Code; Copilot CLI pins Opus. "Opus" below reads as "the top tier of the day")
|
|
4
4
|
|
|
5
5
|
## Context
|
|
6
6
|
|
package/docs/adr/0007-multi-tool-adapter-framework.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# 7. Multi-tool adapter framework + token-preserving uninstall
|
|
2
2
|
|
|
3
|
-
**Status:**
|
|
3
|
+
**Status:** Superseded · 2026-07-02 (v10.7.0 removed all non-native adapters — the pipeline targets Claude Code + Copilot CLI only). Original acceptance: 2026-04-27 (v7.7.0 / v7.9.0). Kept as the historical record of why the adapter framework existed; install flags documented below no longer exist.
|
|
4
4
|
|
|
5
5
|
## Context
|
|
6
6
|
|
package/docs/adr/README.md
CHANGED
|
@@ -10,13 +10,13 @@ Format: lightly adapted from [Michael Nygard's ADR template](https://cognitect.c
|
|
|
10
10
|
|
|
11
11
|
| # | Title | Status |
|
|
12
12
|
|---|-------|--------|
|
|
13
|
-
| [0001](./0001-three-model-triage.md) | CLI-aware parallel review with
|
|
13
|
+
| [0001](./0001-three-model-triage.md) | CLI-aware parallel review with top-tier triage | Accepted (amended v5.2.2, v10.6.0 Fable) |
|
|
14
14
|
| [0002](./0002-instruction-driven-flag.md) | instructionDriven flag as explicit pipeline fork | Accepted |
|
|
15
15
|
| [0003](./0003-unified-shared-skills.md) | Unified `skills/shared/` for Claude Code + Copilot | Accepted (amended v5.3.3) |
|
|
16
16
|
| [0004](./0004-zero-dependency-philosophy.md) | Keep the package zero-runtime-deps | Accepted |
|
|
17
17
|
| [0005](./0005-lazy-phase-docs.md) | Lazy-loaded phase docs with per-phase token budget | Accepted |
|
|
18
18
|
| [0006](./0006-skills-core-external-split.md) | `shared/core/` vs `shared/external/` source org | Accepted |
|
|
19
|
-
| [0007](./0007-multi-tool-adapter-framework.md) | Multi-tool adapter framework + token-preserving uninstall |
|
|
19
|
+
| [0007](./0007-multi-tool-adapter-framework.md) | Multi-tool adapter framework + token-preserving uninstall | Superseded by v10.7.0 (adapters removed; Claude Code + Copilot CLI only) |
|
|
20
20
|
|
|
21
21
|
## Writing a New ADR
|
|
22
22
|
|
package/docs/architecture.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# Architecture
|
|
2
2
|
|
|
3
|
-
##
|
|
3
|
+
## 8-Phase Pipeline Flow (0-7)
|
|
4
4
|
|
|
5
5
|
```mermaid
|
|
6
6
|
graph TD
|
|
@@ -9,7 +9,7 @@ graph TD
|
|
|
9
9
|
P1["Phase 1: Analysis<br/>Codebase scan (parallel Explore agents)"]
|
|
10
10
|
P2["Phase 2: Planning<br/>Task breakdown, architecture review"]
|
|
11
11
|
P3["Phase 3: Dev<br/>TDD: RED → GREEN → REFACTOR"]
|
|
12
|
-
P4["Phase 4: Review<br/>Parallel +
|
|
12
|
+
P4["Phase 4: Review<br/>Parallel + Fable triage<br/>(Claude: 2-model · Copilot: 3-model)"]
|
|
13
13
|
P5["Phase 5: Test<br/>Optional manual testing"]
|
|
14
14
|
P6["Phase 6: Commit<br/>Git commit, PR creation"]
|
|
15
15
|
P7["Phase 7: Report<br/>Jira · Wiki+Figma · Confluence · Log · Knowledge"]
|
|
@@ -53,11 +53,11 @@ graph LR
|
|
|
53
53
|
graph TD
|
|
54
54
|
DIFF["Code Diff"]
|
|
55
55
|
|
|
56
|
-
DIFF -->
|
|
56
|
+
DIFF --> R1["Fable (Claude Code) / Opus (Copilot)<br/>Security + Architecture"]
|
|
57
57
|
DIFF --> GPT["GPT-5.4<br/>Quality + Edge Cases"]
|
|
58
58
|
DIFF --> SON["Sonnet<br/>Correctness + Style"]
|
|
59
59
|
|
|
60
|
-
|
|
60
|
+
R1 --> TRIAGE["Fable Triage<br/>(Opus on Copilot CLI)"]
|
|
61
61
|
GPT --> TRIAGE
|
|
62
62
|
SON --> TRIAGE
|
|
63
63
|
|
|
@@ -117,19 +117,19 @@ graph TB
|
|
|
117
117
|
end
|
|
118
118
|
|
|
119
119
|
subgraph "Pipeline Specs"
|
|
120
|
-
CMD[commands/<br/>
|
|
121
|
-
AGT[agents/<br/>
|
|
120
|
+
CMD[commands/<br/>39 command files<br/>(34 user-facing)]
|
|
121
|
+
AGT[agents/<br/>8 agent personas]
|
|
122
122
|
RUL[rules/<br/>12 domain rules]
|
|
123
|
-
PHS[refs/phases/<br/>8 phase specs]
|
|
124
|
-
FIG[skills/figma-{ios,android,common}/<br/>
|
|
125
|
-
CMP[skills/shared/core/<br/>
|
|
126
|
-
EXT[skills/shared/external/<br/>
|
|
123
|
+
PHS[refs/phases/<br/>8 phase specs + 3 contracts]
|
|
124
|
+
FIG[skills/figma-{ios,android,common}/<br/>41 figma skills]
|
|
125
|
+
CMP[skills/shared/core/<br/>38 orchestration skills<br/>incl. compliance]
|
|
126
|
+
EXT[skills/shared/external/<br/>143 curated skills<br/>(authoring source for the<br/>multi-agent-plugins marketplace)]
|
|
127
127
|
end
|
|
128
128
|
|
|
129
129
|
subgraph "Quality Gates"
|
|
130
|
-
SCH[schemas/<br/>
|
|
131
|
-
EVL[eval/triage/<br/>
|
|
132
|
-
SMK[scripts/smoke-*<br/>
|
|
130
|
+
SCH[schemas/<br/>17 JSON schemas]
|
|
131
|
+
EVL[eval/triage/<br/>12 regression fixtures]
|
|
132
|
+
SMK[scripts/smoke-*<br/>100+ smoke suites]
|
|
133
133
|
end
|
|
134
134
|
|
|
135
135
|
IDX --> INS
|
|
@@ -167,7 +167,7 @@ User Input → Phase 0 (Init)
|
|
|
167
167
|
```mermaid
|
|
168
168
|
graph TD
|
|
169
169
|
CC["Claude Code<br/>(source of truth)"]
|
|
170
|
-
COP["Copilot CLI<br/>(instructions +
|
|
170
|
+
COP["Copilot CLI<br/>(instructions + unified skills)"]
|
|
171
171
|
REPO["Pipeline Repo<br/>(npm package)"]
|
|
172
172
|
WEB["Website<br/>(optional)"]
|
|
173
173
|
RC["Remote Control<br/>(optional)"]
|
package/docs/features.md
CHANGED
|
@@ -4,15 +4,15 @@ Comprehensive list of every feature the pipeline ships. The top-level `README.md
|
|
|
4
4
|
|
|
5
5
|
## Core Pipeline
|
|
6
6
|
|
|
7
|
-
###
|
|
7
|
+
### 8-Phase Orchestration (0-7)
|
|
8
8
|
|
|
9
9
|
```
|
|
10
10
|
Phase 0: Init Project selection, branch setup, identity, worktree
|
|
11
11
|
Phase 1: Analysis Stack detection, codebase exploration (parallel Explore agents)
|
|
12
12
|
Phase 2: Planning Task decomposition, architecture review, user approval
|
|
13
13
|
Phase 3: Dev TDD cycle: test → code → build (Sonnet)
|
|
14
|
-
Phase 4: Review Deterministic gates + parallel AI review +
|
|
15
|
-
(Claude Code:
|
|
14
|
+
Phase 4: Review Deterministic gates + parallel AI review + Fable triage
|
|
15
|
+
(Claude Code: Fable + Sonnet · Copilot CLI: GPT-5.4 + Opus + Sonnet)
|
|
16
16
|
Phase 5: Test Optional manual testing + on-demand device audits
|
|
17
17
|
Phase 6: Commit Git commit, push, PR with default reviewers + draft/ready prompt
|
|
18
18
|
Phase 7: Report External: Jira comment · Wiki + Figma screenshots · Confluence
|
|
@@ -39,21 +39,22 @@ Compose freely: `--dev --local autopilot` = shortest, least-friction path.
|
|
|
39
39
|
| Android/Kotlin | `build.gradle[.kts]` | `refs/android-guide.md` |
|
|
40
40
|
| Backend | `requirements.txt`, `package.json`, `go.mod` | `refs/backend-guide.md` |
|
|
41
41
|
| Frontend | `package.json` + framework detection | `refs/frontend-guide.md` |
|
|
42
|
-
| Docker | `Dockerfile`, `docker-compose.yml` | `refs/
|
|
42
|
+
| Docker | `Dockerfile`, `docker-compose.yml` | `refs/backend-guide.md` |
|
|
43
43
|
|
|
44
44
|
Build commands, test runners, lint tools, and review focus areas all adapt to the detected stack.
|
|
45
45
|
|
|
46
|
-
### Stack
|
|
46
|
+
### Stack Selection (marketplace plugins, v10.5.0+)
|
|
47
47
|
|
|
48
|
-
|
|
48
|
+
Stack skill sets ship as versioned plugins in the `multi-agent-plugins` marketplace. Selecting a stack enables the matching plugin(s) in the current repo's `.claude/settings.json` `enabledPlugins` — no skill copying, no session restart tricks, no directory shuffling. The `ai-common-engineering-toolkit` (accessibility audit, humanizer, Firebase) is always enabled alongside the stack plugin.
|
|
49
49
|
|
|
50
50
|
```bash
|
|
51
|
-
/multi-agent
|
|
52
|
-
/multi-agent
|
|
53
|
-
/multi-agent
|
|
54
|
-
/multi-agent
|
|
55
|
-
/multi-agent
|
|
56
|
-
/multi-agent
|
|
51
|
+
/multi-agent:stack ios # ai-ios-engineering-toolkit (SwiftUI, Xcode, HIG)
|
|
52
|
+
/multi-agent:stack android # ai-android-engineering-toolkit (Compose, Gradle, Hilt)
|
|
53
|
+
/multi-agent:stack mobile # iOS + Android combined
|
|
54
|
+
/multi-agent:stack backend # ai-backend-toolkit (spec-driven APIs)
|
|
55
|
+
/multi-agent:stack frontend # ai-frontend-engineering-toolkit (React/TSX)
|
|
56
|
+
/multi-agent:stack fullstack # backend + frontend
|
|
57
|
+
/multi-agent:stack all # every stack plugin
|
|
57
58
|
```
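The stack arguments above can be read as a lookup from stack id to plugin names. The sketch below builds the `enabledPlugins` object for a given stack. The plugin names and the always-on common toolkit come from the text; the `name@marketplace: true` key format, the assumption that `all` means exactly the four listed stack plugins, and the `enabledPluginsFor` helper are hypothetical.

```javascript
const COMMON_PLUGIN = "ai-common-engineering-toolkit";

// Single-stack plugins as listed in the command comments above.
const BASE = {
  ios: ["ai-ios-engineering-toolkit"],
  android: ["ai-android-engineering-toolkit"],
  backend: ["ai-backend-toolkit"],
  frontend: ["ai-frontend-engineering-toolkit"],
};

// Composite stacks are unions of the single stacks.
const STACK_PLUGINS = {
  ...BASE,
  mobile: [...BASE.ios, ...BASE.android],
  fullstack: [...BASE.backend, ...BASE.frontend],
  all: Object.values(BASE).flat(), // assumption: "every stack plugin" = these four
};

// Build the enabledPlugins object; the key format is an assumption.
function enabledPluginsFor(stack, marketplace = "multi-agent-plugins") {
  if (!Object.hasOwn(STACK_PLUGINS, stack)) {
    throw new Error(`unknown stack: ${stack}`);
  }
  const enabled = {};
  for (const name of [COMMON_PLUGIN, ...STACK_PLUGINS[stack]]) {
    enabled[`${name}@${marketplace}`] = true;
  }
  return enabled;
}

console.log(JSON.stringify({ enabledPlugins: enabledPluginsFor("mobile") }, null, 2));
```

Because selection only toggles entries in a settings object, switching stacks is a data change rather than a file copy, which matches the "no skill copying" claim above.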
|
|
58
59
|
|
|
59
60
|
### Task Type Detection (v2.0.0)
|
|
@@ -124,21 +125,21 @@ Cheap, objective checks run BEFORE any AI token is spent:
|
|
|
124
125
|
|
|
125
126
|
If any gate fails, fix first. Don't waste AI tokens reviewing broken code.
|
|
126
127
|
|
|
127
|
-
###
|
|
128
|
+
### CLI-Aware Parallel Review + Fable Triage (Phase 4 Steps 2–3)
|
|
128
129
|
|
|
129
|
-
| Reviewer | Model | Focus |
|
|
130
|
-
| ---------- | ------------------- | --------------------------------- |
|
|
131
|
-
| Reviewer 1 | `claude-opus-4
|
|
132
|
-
| Reviewer 2 | `gpt-5.4` | Edge cases, different perspective |
|
|
133
|
-
| Reviewer 3 | `claude-sonnet-4
|
|
130
|
+
| Reviewer | Model | Focus | Where it runs |
|
|
131
|
+
| ---------- | ------------------- | --------------------------------- | -------------------- |
|
|
132
|
+
| Reviewer 1 | `claude-fable-5` (Claude Code) / `claude-opus-4-8` (Copilot CLI) | Deep security + architecture | Both CLIs |
|
|
133
|
+
| Reviewer 2 | `gpt-5.4` | Edge cases, different perspective | **Copilot CLI only** |
|
|
134
|
+
| Reviewer 3 | `claude-sonnet-4-6` | Quality + correctness + naming | Both CLIs |
|
|
134
135
|
|
|
135
|
-
|
|
136
|
+
The reviewer set is **CLI-aware**: Claude Code dispatches 2 reviewers in parallel (Fable + Sonnet — GPT-5.4 is not available there); Copilot CLI dispatches all 3. Each returns structured JSON for deterministic aggregation. Cross-model diversity catches blind spots that any single model family would miss.
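A minimal sketch of the CLI-aware reviewer set, using the model ids from the table above. The CLI identifiers (`claude-code`, `copilot-cli`) and the `reviewersFor` helper are hypothetical names chosen for illustration, not the pipeline's actual dispatch code.

```javascript
// Return the reviewer models dispatched in parallel for a given CLI.
// Model ids are taken from the reviewer table; the CLI ids are assumptions.
function reviewersFor(cli) {
  switch (cli) {
    case "claude-code":
      // GPT-5.4 is not available here, so two reviewers run.
      return ["claude-fable-5", "claude-sonnet-4-6"];
    case "copilot-cli":
      // Fable 5 is not offered here, so Reviewer 1 is pinned to Opus.
      return ["claude-opus-4-8", "gpt-5.4", "claude-sonnet-4-6"];
    default:
      throw new Error(`unsupported CLI: ${cli}`);
  }
}

console.log(reviewersFor("claude-code").length); // prints 2
console.log(reviewersFor("copilot-cli").length); // prints 3
```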
|
|
136
137
|
|
|
137
|
-
**
|
|
138
|
+
**Fable Triage** (Phase 4 Step 3, Opus on Copilot CLI): Evaluates merged raw findings against task scope. Classifies each as `accepted` (fix now), `deferred` (out of scope, log for later), or `rejected` (false positive / noise). Only triage-accepted blocking items loop back to Phase 3.
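The loop-back rule can be sketched as a filter over triaged findings. The finding shape (`id`, `severity`, `triage`) and the `findingsToFix` helper are hypothetical; the real structure is defined by the `triage-output` schema, which this diff does not show in full.

```javascript
// Keep only the findings that loop back to Phase 3:
// triage-accepted AND blocking. Deferred and rejected items never loop back.
function findingsToFix(findings) {
  return findings.filter(
    (f) => f.triage === "accepted" && f.severity === "blocking",
  );
}

const findings = [
  { id: "F1", severity: "blocking", triage: "accepted" }, // fix now
  { id: "F2", severity: "blocking", triage: "deferred" }, // out of scope, logged
  { id: "F3", severity: "minor", triage: "accepted" }, // accepted, not blocking
  { id: "F4", severity: "blocking", triage: "rejected" }, // false positive
];

console.log(findingsToFix(findings).map((f) => f.id)); // prints [ 'F1' ]
```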
|
|
138
139
|
|
|
139
140
|
### Runtime Triage Validator (v2.3.0)
|
|
140
141
|
|
|
141
|
-
After
|
|
142
|
+
After triage returns, output is validated by `validate-triage.mjs`:
|
|
142
143
|
|
|
143
144
|
| Exit | Meaning |
|
|
144
145
|
| ----- | ------------------------------------------------------------ |
|
package/docs/performance.md
CHANGED
|
@@ -36,7 +36,7 @@ Each event in `metrics.jsonl` is a single-line JSON object written by
|
|
|
36
36
|
JSON retries, timeouts.
|
|
37
37
|
- **Phase 3 retries** — build / test / lint retry distribution per task.
|
|
38
38
|
- **Cost per model** — calls, duration, tokens in/out, broken down by model
|
|
39
|
-
(Opus, Sonnet, GPT-5.4).
|
|
39
|
+
(Fable, Opus, Sonnet, GPT-5.4).
|
|
40
40
|
- **Language preference** — distribution of EN vs TR prompts.
|
|
41
41
|
|
|
42
42
|
## Typical Output (Markdown)
|
|
@@ -67,8 +67,8 @@ _Source: ~/.claude/logs/multi-agent/metrics.jsonl · Events: 421 (0 parse errors
|
|
|
67
67
|
|
|
68
68
|
| Model | Calls | Duration (ms) | Tokens In | Tokens Out |
|
|
69
69
|
|-------|-------|---------------|-----------|------------|
|
|
70
|
-
| `claude-
|
|
71
|
-
| `claude-sonnet-4
|
|
70
|
+
| `claude-fable-5` | 124 | 612430 | 380221 | 92114 |
|
|
71
|
+
| `claude-sonnet-4-6` | 89 | 412318 | 201445 | 58903 |
|
|
72
72
|
| `gpt-5.4` | 88 | 398214 | 194302 | 55128 |
|
|
73
73
|
```
|
|
74
74
|
|
package/index.js
CHANGED
|
@@ -41,30 +41,26 @@ if (!command || command === "install") {
|
|
|
41
41
|
npx @mmerterden/multi-agent-pipeline install Install for Claude Code (default)
|
|
42
42
|
npx @mmerterden/multi-agent-pipeline install --copilot Install for Copilot CLI
|
|
43
43
|
npx @mmerterden/multi-agent-pipeline install --all Both Claude + Copilot
|
|
44
|
-
npx @mmerterden/multi-agent-pipeline install --cursor Cursor full orchestration (rules + subagents + /multi-agent + MCP)
|
|
45
|
-
npx @mmerterden/multi-agent-pipeline install --copilot-chat GitHub Copilot Chat (.github/copilot-instructions.md)
|
|
46
|
-
npx @mmerterden/multi-agent-pipeline install --antigravity Antigravity full orchestration (.agent/ + AGENTS.md + MCP)
|
|
47
|
-
npx @mmerterden/multi-agent-pipeline install --all-tools Every supported tool (Claude + Copilot + Cursor + Copilot Chat + Antigravity)
|
|
48
44
|
npx @mmerterden/multi-agent-pipeline install --link Use symlinks (saves tokens, dev mode)
|
|
49
45
|
|
|
50
46
|
Uninstall (token-preserving — Keychain/Credential Manager untouched):
|
|
51
47
|
npx @mmerterden/multi-agent-pipeline uninstall Interactive: remove from all installed targets
|
|
52
48
|
npx @mmerterden/multi-agent-pipeline uninstall --yes Skip prompt
|
|
53
49
|
npx @mmerterden/multi-agent-pipeline uninstall --dry-run Report what would be removed
|
|
54
|
-
npx @mmerterden/multi-agent-pipeline uninstall --
|
|
50
|
+
npx @mmerterden/multi-agent-pipeline uninstall --claude Only Claude Code
|
|
51
|
+
npx @mmerterden/multi-agent-pipeline uninstall --cursor Legacy pre-v10.7 adapter-file cleanup (also --copilot-chat / --antigravity / --codex; --target=<path> overrides cwd)
|
|
55
52
|
|
|
56
53
|
Help:
|
|
57
54
|
npx @mmerterden/multi-agent-pipeline help
|
|
58
55
|
|
|
59
56
|
Options:
|
|
60
57
|
--no-color Disable colored output
|
|
61
|
-
--target=<path>
|
|
58
|
+
--target=<path> Target dir for legacy adapter cleanup on uninstall (defaults to cwd)
|
|
62
59
|
--platform=ios|android|all Filter external skills by platform (default: all)
|
|
63
60
|
|
|
64
61
|
After installation:
|
|
65
62
|
Claude Code: /multi-agent "MOBILE-123"
|
|
66
63
|
Copilot CLI: Describe your task naturally — pipeline instructions are loaded
|
|
67
|
-
Cursor / Antigravity / VS Code Copilot Chat: full orchestration (subagents + /multi-agent + MCP)
|
|
68
64
|
|
|
69
65
|
More info: https://github.com/mmerterden/multi-agent-pipeline
|
|
70
66
|
`);
|
|
@@ -240,9 +240,9 @@ Cost block reads `phase-tracker.sh tokens` accumulators × `cost-table.json` pri
|
|
|
240
240
|
- Every public method must have tests
|
|
241
241
|
- Commit format: {type}({scope}): description [{jiraId}]
|
|
242
242
|
|
|
243
|
-
## Stack
|
|
243
|
+
## Stack Selection
|
|
244
244
|
|
|
245
|
-
|
|
245
|
+
Stack skill sets ship as versioned plugins in the `multi-agent-plugins` marketplace. Selecting a stack enables the matching plugin(s) in the target repo's `.claude/settings.json` `enabledPlugins`; the `ai-common-engineering-toolkit` is always enabled alongside. There is no session-start auto-swap script. Select or change the stack with:
|
|
246
246
|
|
|
247
247
|
```bash
|
|
248
248
|
multi-agent-stack [ios|android|mobile|backend|frontend|fullstack|all]
|
package/package.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@mmerterden/multi-agent-pipeline",
|
|
3
|
-
"version": "10.7.
|
|
4
|
-
"description": "8-phase AI development pipeline with full orchestration on Claude Code
|
|
3
|
+
"version": "10.7.4",
|
|
4
|
+
"description": "8-phase AI development pipeline with full orchestration on Claude Code and Copilot CLI. Analysis, planning, TDD, CLI-aware parallel review with consensus surfacing + Fable triage, default-FAIL evidence gates, secret + intent guards, per-phase cost ledger, persistent learnings memory, wiki generation, commit automation. Token-preserving uninstall.",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"main": "index.js",
|
|
7
7
|
"exports": {
|
|
@@ -34,9 +34,6 @@
|
|
|
34
34
|
"copilot-cli",
|
|
35
35
|
"copilot",
|
|
36
36
|
"claude",
|
|
37
|
-
"cursor",
|
|
38
|
-
"windsurf",
|
|
39
|
-
"cline",
|
|
40
37
|
"ios",
|
|
41
38
|
"android",
|
|
42
39
|
"backend",
|
package/pipeline/agents/dev-critic.md
CHANGED
|
@@ -7,7 +7,7 @@ modelRationale: "Critic tier - deterministic checklist + build/test verificati
|
|
|
7
7
|
|
|
8
8
|
# Dev Critic Agent - Phase 3.5
|
|
9
9
|
|
|
10
|
-
You are the in-loop critic for Phase 3 (Dev). The generator (Sonnet/Opus during Phase 3) has just finished its last edit. **You run BEFORE Phase 4**, on the same worktree, against deterministic criteria that already exist on disk. Your job: catch failures the generator would otherwise send into Phase 4 and waste 2-3 reviewer calls +
|
|
10
|
+
You are the in-loop critic for Phase 3 (Dev). The generator (Sonnet/Opus during Phase 3) has just finished its last edit. **You run BEFORE Phase 4**, on the same worktree, against deterministic criteria that already exist on disk. Your job: catch failures the generator would otherwise send into Phase 4 and waste 2-3 reviewer calls + Fable triage on.
|
|
11
11
|
|
|
12
12
|
This is the **evaluator-optimizer pattern** from Anthropic's "Building Effective Agents" - the pattern is most effective "when we have clear evaluation criteria, and when iterative refinement provides measurable value." Phase 3 satisfies both: criteria are written in `rules/*.md`, refinement value is measured by Phase 4 fix-cycles avoided.
|
|
13
13
|
|
package/pipeline/agents/task-clarifier.md
CHANGED
|
@@ -7,7 +7,7 @@ modelRationale: "Ambiguity scoring is a low-stakes classification + targeted que
|
|
|
7
7
|
|
|
8
8
|
# Task Clarifier Agent - Phase 0 Step 9
|
|
9
9
|
|
|
10
|
-
You score how clearly the task is specified and, when score is below the threshold, emit up to N clarifying questions the user must answer before Phase 1 (Analysis) begins.
|
|
10
|
+
You score how clearly the task is specified and, when score is below the threshold, emit up to N clarifying questions the user must answer before Phase 1 (Analysis) begins.
|
|
11
11
|
|
|
12
12
|
**You do NOT solve the task.** You only assess whether the task is solvable as stated. The user is the one who answers; you produce the questions.
|
|
13
13
|
|
|
@@ -108,6 +108,4 @@ Cost expectation on Haiku (input ~1.5k tokens issue body, output ~400 tokens JSO
|
|
|
108
108
|
|
|
109
109
|
## Pattern citation
|
|
110
110
|
|
|
111
|
-
- Devin docs (clarifying via Knowledge triggers + Ask Devin sessions): <https://docs.devin.ai/work-with-devin/devin-review>
|
|
112
|
-
- Cursor Plan Mode (asks clarifying questions before producing the plan): <https://cursor.com/docs/agent/planning>
|
|
113
111
|
- Anthropic Building Effective Agents - orchestrator-workers pattern, prefer cheap classification before expensive synthesis: <https://www.anthropic.com/engineering/building-effective-agents>
|
|
@@ -16,7 +16,7 @@
|
|
|
16
16
|
1. Analysis (Opus) -> scope, impact analysis
|
|
17
17
|
2. Planning (Opus) -> spec, task breakdown
|
|
18
18
|
3. Development (Sonnet) -> TDD, code, build
|
|
19
|
-
4. Review -> deterministic gates + parallel review +
|
|
19
|
+
4. Review -> deterministic gates + parallel review + Fable triage
|
|
20
20
|
- Claude Code: Opus + Sonnet (2 parallel)
|
|
21
21
|
- Copilot CLI: GPT-5.4 + Opus + Sonnet (3 parallel)
|
|
22
22
|
|
package/pipeline/commands/multi-agent/dev-autopilot.md
CHANGED
|
@@ -33,7 +33,7 @@ Phase 7: Report → short terminal summary
|
|
|
33
33
|
|
|
34
34
|
1. **Parse input** - standard multi-agent input formats (Issue URL, Jira ID, free text)
|
|
35
35
|
2. **Phase 0: Init** - set `"mode": "dev", "autopilot": true` in `agent-state.json`
|
|
36
|
-
3. **Phase 3: Dev** - write code directly on `claude-opus-4
|
|
36
|
+
3. **Phase 3: Dev** - write code directly on `claude-opus-4-8` and verify the build
|
|
37
37
|
4. **Phase 6: Commit** - auto commit + push + PR
|
|
38
38
|
5. **Phase 7: Report** - terminal summary
|
|
39
39
|
|
package/pipeline/commands/multi-agent/finish.md
CHANGED
|
@@ -33,7 +33,7 @@ You already did the work locally - wrote code on the current branch and maybe
|
|
|
33
33
|
|
|
34
34
|
```
|
|
35
35
|
Phase 0: Init → project/branch detect, resolve base + diff (work-already-done), Jira id, state (NO worktree)
|
|
36
|
-
Phase 4: Review → deterministic gates + parallel review (
|
|
36
|
+
Phase 4: Review → deterministic gates + parallel review (Fable + Sonnet) + Fable triage
|
|
37
37
|
Phase 5: Build+Test → stack-aware build gate + run existing tests; SUCCESS required (automated, not the interactive user-test)
|
|
38
38
|
Phase 6: Commit → commit remaining local changes + push + open PR if none exists
|
|
39
39
|
Phase 7: Report → technical analysis + Jira comment with test scenarios (channels: Jira / PR / Confluence / Wiki)
|
|
@@ -51,7 +51,7 @@ Phases 1-3 (Analysis / Planning / Dev) are skipped by design - `finish` treats
|
|
|
51
51
|
|
|
52
52
|
## Phase execution (reuse the existing phase contracts)
|
|
53
53
|
|
|
54
|
-
- **Phase 4 Review** — run per `refs/phases/phase-4-review.md` against the resolved diff: deterministic gates (Step 1.x), stack-specific parallel reviewers (
|
|
54
|
+
- **Phase 4 Review** — run per `refs/phases/phase-4-review.md` against the resolved diff: deterministic gates (Step 1.x), stack-specific parallel reviewers (Fable + Sonnet on Claude Code; GPT + Opus + Sonnet on Copilot CLI), Fable triage → `triage.accepted`. Blocking/important accepted findings:
|
|
55
55
|
- interactive: present them and ask (`AskUserQuestion`) whether to fix now (loop back through a minimal Phase-3-style TDD fix) or proceed;
|
|
56
56
|
- `autopilot` (or `prefs.global.finish.autoFix == true`): auto-fix accepted blocking/important findings, then re-review the fix, before advancing.
|
|
57
57
|
- **Phase 5 Build+Test** — the **automated success gate** (this is what "build+test success" means here; the interactive device user-test is `/multi-agent:manual-test`). Stack-aware: build via `figma-config.build` (iOS scheme / Android gradle / detected backend/frontend build) and run the existing test suite if present (`swift test` / `xcodebuild test` / `./gradlew test` / `pytest` / `npm test` / `vitest`). Require success to advance; on failure, surface logs and (interactive) stop or (autopilot) attempt a bounded fix loop. **If the repo has no tests, report "no tests present" — never fabricate test results.**
|
|
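The Phase 5 rule above (run the existing suite if present, otherwise report "no tests present") can be sketched as a small selector. This is a minimal illustration, not the pipeline's actual detection logic: the function name `detect_test_cmd` and the marker-file mapping are assumptions, and a marker file only suggests a runner, it does not prove that tests exist.

```bash
#!/usr/bin/env bash
# Hypothetical helper: choose a test command from marker files in a repo root.
# The marker-to-command mapping is an assumption made for illustration only.
detect_test_cmd() {
  local root="${1:-.}"
  if [ -f "$root/Package.swift" ]; then
    echo "swift test"
  elif [ -f "$root/gradlew" ]; then
    echo "./gradlew test"
  elif [ -f "$root/pytest.ini" ]; then
    echo "pytest"
  elif [ -f "$root/package.json" ]; then
    echo "npm test"
  else
    # No recognizable suite: say so instead of inventing a result.
    echo "no tests present"
  fi
}
```

A caller would run the printed command and gate on its exit status, treating the "no tests present" string as a report rather than a pass.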
@@ -52,13 +52,13 @@ How It Works (Phase 0 - Interactive Flow):
|
|
|
52
52
|
Pipeline (after Phase 0) - shown as visual cards in terminal:
|
|
53
53
|
|
|
54
54
|
Phase 0: Init -> The 8 steps above
|
|
55
|
-
Phase 1: Analysis -> Stack detection + codebase scan (
|
|
55
|
+
Phase 1: Analysis -> Stack detection + codebase scan (Fable)
|
|
56
56
|
Phase 2: Planning -> Task breakdown + architecture review + Plan Approval Gate
|
|
57
57
|
(clarification max 2 rounds + approval loop - normal mode only;
|
|
58
58
|
skipped for --dev, autopilot, --dev autopilot)
|
|
59
59
|
Phase 3: Dev -> TDD: test -> code -> build (Sonnet) + build queue
|
|
60
|
-
Phase 4: Review -> Deterministic gates + parallel AI review +
|
|
61
|
-
(Claude Code:
|
|
60
|
+
Phase 4: Review -> Deterministic gates + parallel AI review + Fable triage
|
|
61
|
+
(Claude Code: Fable + Sonnet · Copilot CLI: GPT-5.4 + Opus + Sonnet)
|
|
62
62
|
Phase 5: Test -> Optional: switch to branch, test in Xcode
|
|
63
63
|
(runs in dev + full; skipped in every autopilot and local variant)
|
|
64
64
|
Phase 6: Commit -> Commit -> push -> PR + issue body update (never auto-closes)
|
|
@@ -73,7 +73,7 @@ Pipeline (after Phase 0) - shown as visual cards in terminal:
|
|
|
73
73
|
|
|
74
74
|
Modes:
|
|
75
75
|
|
|
76
|
-
(normal) Full 8 phases, Sonnet dev, Plan Approval Gate active, parallel review +
|
|
76
|
+
(normal) Full 8 phases, Sonnet dev, Plan Approval Gate active, parallel review + Fable triage
|
|
77
77
|
--dev Fast: Init -> Dev(Opus) -> Commit -> Report (no plan gate)
|
|
78
78
|
--local No worktree - works directly on local branch
|
|
79
79
|
autopilot Skip all confirmations INCLUDING plan gate, auto commit/PR
|
|
@@ -120,7 +120,7 @@ Setup & Maintenance:
|
|
|
120
120
|
|
|
121
121
|
/multi-agent:setup First-run wizard - Keychain tokens + Git identity + language
|
|
122
122
|
/multi-agent:language [en|tr] Show or set outputLanguage (promptLanguage stays English)
|
|
123
|
-
/multi-agent:stack <id> Select stack by enabling the matching marketplace plugin(s) (ios / android / backend / frontend / fullstack / all)
|
|
123
|
+
/multi-agent:stack <id> Select stack by enabling the matching marketplace plugin(s) (ios / android / mobile / backend / frontend / fullstack / all)
|
|
124
124
|
/multi-agent:sync Sync ecosystem (Claude Code + Copilot CLI + pipeline + website + remote-control)
|
|
125
125
|
/multi-agent:update Pull latest pipeline + reinstall + run migrations
|
|
126
126
|
/multi-agent:delete Uninstall pipeline from every CLI (Keychain tokens left intact, double confirm)
|
|
@@ -195,7 +195,7 @@ Quality & Telemetry (advisory, on by default - flip prefs.global.* to disable)
|
|
|
195
195
|
Triage Memory Phase 7 ingests accepted/deferred/rejected findings into a per-repo corpus
|
|
196
196
|
Prior-Art Lookup Phase 1 + Phase 4 query the corpus for similar past findings, inject as context
|
|
197
197
|
Per-Persona Reviewer/agent dispatch reads `preferredModel` from persona file; per-call override
|
|
198
|
-
via PHASE_MODEL_OVERRIDE;
|
|
198
|
+
via PHASE_MODEL_OVERRIDE; ladder fable -> opus -> sonnet -> haiku
|
|
199
199
|
|
|
200
200
|
------------------------------------------------------------
|
|
201
201
|
|
|
@@ -260,13 +260,13 @@ Nasıl Çalışır (Phase 0 - İnteraktif Akış):
|
|
|
260
260
|
Pipeline (Phase 0'dan sonra) - terminalde görsel kart olarak görünür:
|
|
261
261
|
|
|
262
262
|
Phase 0: Init -> Yukarıdaki 8 adım
|
|
263
|
-
Phase 1: Analysis -> Stack tespiti + codebase taraması (
|
|
263
|
+
Phase 1: Analysis -> Stack tespiti + codebase taraması (Fable)
|
|
264
264
|
Phase 2: Planning -> Task kırılımı + mimari inceleme + Plan Onay Kapısı
|
|
265
265
|
(clarification max 2 tur + onay döngüsü - sadece normal mode;
|
|
266
266
|
--dev, autopilot, --dev autopilot'ta skip)
|
|
267
267
|
Phase 3: Dev -> TDD: test -> kod -> build (Sonnet) + build queue
|
|
268
|
-
Phase 4: Review -> Deterministik kapılar + paralel AI review +
|
|
269
|
-
(Claude Code:
|
|
268
|
+
Phase 4: Review -> Deterministik kapılar + paralel AI review + Fable triage
|
|
269
|
+
(Claude Code: Fable + Sonnet · Copilot CLI: GPT-5.4 + Opus + Sonnet)
|
|
270
270
|
Phase 5: Test -> Opsiyonel: branch'e geç, Xcode'da test (--dev / autopilot'ta skip)
|
|
271
271
|
Phase 6: Commit -> Commit -> push -> PR + issue body güncelleme (hiç auto-close yok)
|
|
272
272
|
Phase 7: Report -> Channels dispatcher (PR · Jira · Confluence · Wiki, multi-select)
|
|
@@ -280,7 +280,7 @@ Pipeline (Phase 0'dan sonra) - terminalde görsel kart olarak görünür:
|
|
|
280
280
|
|
|
281
281
|
Modlar:
|
|
282
282
|
|
|
283
|
-
(normal) Tam 8 faz, Sonnet dev, Plan Onay Kapısı aktif, paralel review +
|
|
283
|
+
(normal) Tam 8 faz, Sonnet dev, Plan Onay Kapısı aktif, paralel review + Fable triage
|
|
284
284
|
--dev Hızlı: Init -> Dev(Opus) -> Commit -> Report (plan gate yok)
|
|
285
285
|
--local Worktree yok - doğrudan local branch'te çalışır
|
|
286
286
|
autopilot Plan gate dahil tüm onayları atla, otomatik commit/PR
|
|
@@ -327,7 +327,7 @@ Setup & Maintenance:
|
|
|
327
327
|
|
|
328
328
|
/multi-agent:setup İlk kurulum sihirbazı - Keychain token + Git kimliği + dil
|
|
329
329
|
/multi-agent:language [en|tr] outputLanguage'ı göster veya ayarla (promptLanguage İngilizce kalır)
|
|
330
|
-
/multi-agent:stack <id> Stack'i eşleşen marketplace plugin'i etkinleştirerek seç (ios / android / backend / frontend / fullstack / all)
|
|
330
|
+
/multi-agent:stack <id> Stack'i eşleşen marketplace plugin'i etkinleştirerek seç (ios / android / mobile / backend / frontend / fullstack / all)
|
|
331
331
|
/multi-agent:sync Ekosistemi senkronize et (Claude Code + Copilot CLI + pipeline + website + remote-control)
|
|
332
332
|
/multi-agent:update En son pipeline'ı çek + reinstall + migration çalıştır
|
|
333
333
|
/multi-agent:delete Pipeline'ı tüm CLI'lerden kaldır (Keychain token dokunulmaz, çift onay)
|
|
@@ -402,7 +402,7 @@ Quality & Telemetry (advisory, default açık - prefs.global.* ile kapatılabi
|
|
|
402
402
|
Triage Memory Phase 7 accepted/deferred/rejected bulguları repo başına corpus'a yazar
|
|
403
403
|
Prior-Art Lookup Phase 1 + Phase 4 corpus'tan benzer geçmiş bulgu sorgular, ek context olarak enjekte
|
|
404
404
|
Per-Persona Reviewer/agent dispatch persona dosyasından `preferredModel` okur;
|
|
405
|
-
per-call override PHASE_MODEL_OVERRIDE ile;
|
|
405
|
+
per-call override PHASE_MODEL_OVERRIDE ile; merdiven fable -> opus -> sonnet -> haiku
|
|
406
406
|
|
|
407
407
|
------------------------------------------------------------
|
|
408
408
|
|
|
@@ -30,7 +30,7 @@ Phase 0: Init → project detection, branch check, state (NO worktree)
|
|
|
30
30
|
Phase 1: Analysis → codebase scan (parallel explore agents, Opus)
|
|
31
31
|
Phase 2: Planning → task breakdown, Plan Approval Gate (approval loop)
|
|
32
32
|
Phase 3: Dev → TDD (Sonnet), build queue
|
|
33
|
-
Phase 4: Review → deterministic gates + parallel review +
|
|
33
|
+
Phase 4: Review → deterministic gates + parallel review + Fable triage
|
|
34
34
|
Phase 6: Commit → pre-commit checkout prompt, commit + push + PR
|
|
35
35
|
Phase 7: Report → Jira / Wiki / Confluence + log + knowledge/memory
|
|
36
36
|
```
|
|
@@ -37,7 +37,7 @@ Introduces ~1× Sonnet call per Dev iteration. On simple bug fixes the cost outw
|
|
|
37
37
|
|
|
38
38
|
## Why this fits orchestrator-workers + evaluator-optimizer hybrid
|
|
39
39
|
|
|
40
|
-
Phase 4 is parallelization-with-voting - good for *adversarial* perspectives (security, architecture). Phase 3.5 is evaluator-optimizer - good for *deterministic* criteria (build, tests, checklists). Sending failing builds into Phase 4 wastes 2-3 reviewer calls +
|
|
40
|
+
Phase 4 is parallelization-with-voting - good for *adversarial* perspectives (security, architecture). Phase 3.5 is evaluator-optimizer - good for *deterministic* criteria (build, tests, checklists). Sending failing builds into Phase 4 wastes 2-3 reviewer calls + Fable triage; Phase 3.5 absorbs that cost at one Sonnet call.
|
|
41
41
|
|
|
42
42
|
## Reference
|
|
43
43
|
|
|
@@ -1,4 +1,6 @@
|
|
|
1
|
-
# Model Fallback Contract
|
|
1
|
+
# Model Fallback Contract
|
|
2
|
+
|
|
3
|
+
> Contract last revised in **v10.6.0** (Fable 5 restored as top tier). The version tag here tracks the last substantive change to this contract, not the pipeline release.
|
|
2
4
|
|
|
3
5
|
Personas route to the top available intelligence tier they declare in
|
|
4
6
|
`preferredModel`. That tier can be quota-limited or temporarily unavailable.
|
|
@@ -95,5 +97,7 @@ per-phase `model` field already carries the override).
|
|
|
95
97
|
deterministic over clever).
|
|
96
98
|
- No edits to `pipeline/agents/*.md` at runtime; frontmatter is install-time
|
|
97
99
|
configuration only.
|
|
98
|
-
- Copilot CLI reviewer set
|
|
99
|
-
(
|
|
100
|
+
- Copilot CLI reviewer set is out of scope: Copilot CLI pins its own three
|
|
101
|
+
reviewer models (GPT-5.4 + Opus + Sonnet — Fable 5 is not offered there) and
|
|
102
|
+
does not use this persona ladder. Only Claude Code dispatches Reviewer-1 on
|
|
103
|
+
Fable.
|
|
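To make the ladder concrete, here is a minimal sketch of a resolver for the order named in the help text (fable -> opus -> sonnet -> haiku). It assumes fallback walks down one tier at a time from the persona's `preferredModel`; the diff does not show the contract's actual algorithm, and `resolve_model` plus its space-separated availability list are hypothetical names used only for illustration.

```bash
#!/usr/bin/env bash
# Ladder order as stated in the help text.
LADDER="fable opus sonnet haiku"

# resolve_model PREFERRED "AVAILABLE TIERS"   (hypothetical helper)
# Prints the first tier at or below PREFERRED that is listed as available.
# Returns 1 when nothing at or below PREFERRED is available.
resolve_model() {
  local preferred="$1" available=" $2 " started=0 tier
  for tier in $LADDER; do
    # Start considering tiers once we reach the preferred one.
    [ "$tier" = "$preferred" ] && started=1
    if [ "$started" -eq 1 ]; then
      case "$available" in
        *" $tier "*) echo "$tier"; return 0 ;;
      esac
    fi
  done
  return 1
}
```

For example, `resolve_model fable "opus sonnet"` prints `opus`. Consistent with the note above, this applies only to Claude Code; the Copilot CLI reviewer set is pinned and does not use the ladder.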
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# Feature: Shadow-Git Checkpoints (Phase 3)
|
|
2
2
|
|
|
3
|
-
**Gated by `prefs.global.shadowGit.enabled`** (default: `false`). The orchestrator snapshots the worktree via `pipeline/lib/shadow-git.sh` so sub-phase rollback is possible without polluting the project's real `.git` history.
|
|
3
|
+
**Gated by `prefs.global.shadowGit.enabled`** (default: `false`). The orchestrator snapshots the worktree via `pipeline/lib/shadow-git.sh` so sub-phase rollback is possible without polluting the project's real `.git` history.
|
|
4
4
|
|
|
5
5
|
```bash
|
|
6
6
|
# Phase 0 (one-time per task): initialize shadow repo + baseline snapshot.
|
|
@@ -88,7 +88,7 @@ Phase 4: Review (CLI-aware parallel + triage)
|
|
|
88
88
|
|-- Reviewer (gpt-5.4) -> edge cases, cross-provider diversity
|
|
89
89
|
+-- Reviewer (sonnet) -> quality + correctness + naming
|
|
90
90
|
Context for ALL: Phase 1 files + Phase 2 decisions + Phase 3 diff
|
|
91
|
-
Then: single
|
|
91
|
+
Then: single Fable triage pass filters false-positives
|
|
92
92
|
```
|
|
93
93
|
|
|
94
94
|
### Context-Aware Agent Prompting
|
|
@@ -91,7 +91,7 @@ Every phase that dispatches a billable LLM agent MUST forward its token totals t
|
|
|
91
91
|
|
|
92
92
|
```bash
|
|
93
93
|
pipeline/scripts/log-metric.sh "$TASK_ID" <phase-id> <event> \
|
|
94
|
-
model=<opus|sonnet|haiku|gpt-5.4> tokens_in=$IN tokens_out=$OUT tokens_cached=$CACHED duration_ms=$DUR
|
|
94
|
+
model=<fable|opus|sonnet|haiku|gpt-5.4> tokens_in=$IN tokens_out=$OUT tokens_cached=$CACHED duration_ms=$DUR
|
|
95
95
|
LOG_METRIC_FORWARD_TO_TRACKER=1 pipeline/scripts/log-metric.sh "$TASK_ID" <phase-id> tokens \
|
|
96
96
|
model=<...> tokens_in=$IN tokens_out=$OUT tokens_cached=$CACHED
|
|
97
97
|
```
|
|
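Because the `model=` field now accepts `fable` alongside the earlier labels, a caller may want to check the label before forwarding a metric. The guard below is a sketch under that assumption: `is_known_model` is a hypothetical helper, and the diff does not show whether `log-metric.sh` validates the field itself.

```bash
#!/usr/bin/env bash
# Hypothetical guard: accept only the model labels listed in the log-format line.
is_known_model() {
  case "$1" in
    fable|opus|sonnet|haiku|gpt-5.4) return 0 ;;
    *) return 1 ;;
  esac
}
```

A caller could write `is_known_model "$MODEL" || echo "unknown model label: $MODEL" >&2` ahead of the `log-metric.sh` invocation, so that a mistyped label is noticed before it reaches the tracker.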
@@ -85,7 +85,7 @@ Phase 0: Init -> Phase 3: Dev (self-contained) -> Phase 5: Test -> Phase 6: Comm
|
|
|
85
85
|
| Phase 1 (Analysis) | Parallel Explore agents | **SKIP** | **SKIP** | **SKIP** | **SKIP** |
|
|
86
86
|
| Phase 2 (Planning) | TaskCreate + architecture review + **Plan Approval Gate** (clarification max 2 rounds + approval loop) | **SKIP** (plan gate not applicable - fast path) | **SKIP** | **SKIP** | **SKIP** |
|
|
87
87
|
| Phase 3 (Dev) | Follows Phase 2 plan, TDD cycle (Sonnet) | **Self-contained** (Opus): agent scans relevant files, implements with TDD, builds | Same as `--dev` | Same as `--dev` | Same as `--dev` |
|
|
88
|
-
| Phase 4 (Review) | Parallel review +
|
|
88
|
+
| Phase 4 (Review) | Parallel review + Fable triage (Claude: 2-model / Copilot: 3-model) | **SKIP** | **SKIP** | **SKIP** | **SKIP** |
|
|
89
89
|
| Phase 5 (User Test) | Interactive prompt (ask user to test) | **Interactive prompt** (ask user to test) | **Skip** (autopilot suppresses interactive prompts) | **Interactive prompt** (same as `--dev`) | **Skip** (same as `--dev autopilot`) |
|
|
90
90
|
| Phase 6 (Commit) | Commit + PR | Same - still asks (unless autopilot) | Auto commit + push + PR | Same as `--dev` | Auto commit + push + PR |
|
|
91
91
|
| Phase 7 (Report) | Full report + channels multi-select | Simplified - no review/analysis sections, BUT **channels menu still pauses** | Channels menu **STILL PAUSES** | Same as `--dev` | Channels menu **STILL PAUSES** |
|