@mmerterden/multi-agent-pipeline 10.7.3 → 10.7.4
- package/CHANGELOG.md +19 -2
- package/docs/adr/0001-three-model-triage.md +2 -2
- package/docs/adr/0007-multi-tool-adapter-framework.md +1 -1
- package/docs/adr/README.md +2 -2
- package/docs/architecture.md +14 -14
- package/docs/features.md +22 -21
- package/docs/performance.md +3 -3
- package/index.js +3 -7
- package/install/templates/copilot-instructions.md +2 -2
- package/package.json +2 -5
- package/pipeline/agents/dev-critic.md +1 -1
- package/pipeline/claude-md-template.md +1 -1
- package/pipeline/commands/multi-agent/dev-autopilot.md +1 -1
- package/pipeline/commands/multi-agent/finish.md +2 -2
- package/pipeline/commands/multi-agent/help.md +12 -12
- package/pipeline/commands/multi-agent/local.md +1 -1
- package/pipeline/commands/multi-agent/refs/features/dev-critic.md +1 -1
- package/pipeline/commands/multi-agent/refs/features/model-fallback.md +7 -3
- package/pipeline/commands/multi-agent/refs/knowledge.md +1 -1
- package/pipeline/commands/multi-agent/refs/phases/log-format.md +1 -1
- package/pipeline/commands/multi-agent/refs/phases/modes.md +1 -1
- package/pipeline/commands/multi-agent/refs/phases/phase-1-analysis.md +2 -2
- package/pipeline/commands/multi-agent/refs/phases/phase-2-planning.md +2 -2
- package/pipeline/commands/multi-agent/refs/phases/phase-3-dev.md +1 -1
- package/pipeline/commands/multi-agent/refs/phases/phase-4-review.md +18 -18
- package/pipeline/commands/multi-agent/refs/progress-contract.md +1 -1
- package/pipeline/commands/multi-agent/refs/tracker-contract.md +1 -2
- package/pipeline/commands/multi-agent/review.md +8 -8
- package/pipeline/commands/multi-agent/sync.md +3 -3
- package/pipeline/commands/multi-agent.md +7 -7
- package/pipeline/schemas/agent-state.schema.json +1 -1
- package/pipeline/schemas/prefs.schema.json +3 -3
- package/pipeline/schemas/reviewer-output.schema.json +1 -1
- package/pipeline/schemas/triage-output.schema.json +2 -2
- package/pipeline/scripts/README.md +1 -2
- package/pipeline/scripts/cost-budget-check.mjs +1 -1
- package/pipeline/scripts/cost-table.json +7 -0
- package/pipeline/scripts/fixtures/install-layout.tsv +5 -5
- package/pipeline/scripts/uninstall.mjs +53 -57
- package/pipeline/skills/shared/core/multi-agent/SKILL.md +11 -11
- package/pipeline/skills/shared/core/multi-agent-dev-autopilot/SKILL.md +1 -1
- package/pipeline/skills/shared/core/multi-agent-finish/SKILL.md +1 -1
- package/pipeline/skills/shared/core/multi-agent-help/SKILL.md +8 -8
- package/pipeline/skills/shared/core/multi-agent-review/SKILL.md +5 -5
- package/pipeline/skills/shared/core/multi-agent-sync/SKILL.md +7 -5
- package/pipeline/scripts/smoke-readme-counts.sh +0 -120
@@ -1,6 +1,6 @@
 ### Phase 4: Review (deterministic gates + parallel + triage)
 
-> **TLDR** - Three-stage review. Stage 1: deterministic gates (build + lint + test + secret scan) that MUST pass. Stage 2: AI models in parallel - reviewer set is **CLI-aware**: Claude Code dispatches 2 reviewers (
+> **TLDR** - Three-stage review. Stage 1: deterministic gates (build + lint + test + secret scan) that MUST pass. Stage 2: AI models in parallel - reviewer set is **CLI-aware**: Claude Code dispatches 2 reviewers (Fable + Sonnet); Copilot CLI dispatches 3 reviewers (GPT-5.4 + Opus + Sonnet — Fable 5 is not offered on Copilot CLI). Stage 3: Fable triage (Opus on Copilot CLI) - evaluates raw findings, filters false-positives/out-of-scope, keeps only actionable items. Only triage-accepted blocking items loop back to Phase 3.
 
 <!-- progress-contract: applied -->
 Progress emission per `refs/progress-contract.md` - lines for each gate, each reviewer dispatch + finish, triage start, triage verdict, fix dispatch.
@@ -181,17 +181,17 @@ Launch Agent instances **in parallel** using the shared `code-reviewer` subagent
 
 | Reviewer | subagent_type | Model | Focus | Skills Referenced | Where it runs |
 | ---------- | ----------------- | ------------------- | --------------------------------- | --------------------------------------------- | -------------------- |
-| Reviewer 1 | `code-reviewer` | `claude-opus-4
+| Reviewer 1 | `code-reviewer` | `claude-fable-5` (Claude Code) / `claude-opus-4-8` (Copilot CLI) | Deep security + architecture | `api-security-best-practices`, `architecture` | Both CLIs |
 | Reviewer 2 | `code-reviewer` | `gpt-5.4` | Edge cases, different perspective | cross-model diversity | **Copilot CLI only** |
-| Reviewer 3 | `code-reviewer` | `claude-sonnet-4
+| Reviewer 3 | `code-reviewer` | `claude-sonnet-4-6` | Quality + correctness + naming | `clean-code`, stack-specific skill | Both CLIs |
 
 Each reviewer inherits the `code-reviewer` agent's focus areas (Security, Architecture, Quality, Performance) and output contract. The orchestrator overrides only the model and the stack-specific skill per-reviewer - no prompt duplication.
 
-**Model override wiring:** `code-reviewer.md` declares `preferredModel: fable`, so Reviewer 1 uses the persona default (Fable 5). Reviewer 2 (Copilot-only, `gpt-5.4`) and Reviewer 3 (`claude-sonnet-4
+**Model override wiring:** `code-reviewer.md` declares `preferredModel: fable`, so Reviewer 1 uses the persona default (Fable 5). Reviewer 2 (Copilot-only, `gpt-5.4`) and Reviewer 3 (`claude-sonnet-4-6`) set `PHASE_MODEL_OVERRIDE=<model>` before dispatch - the orchestrator exports `CLAUDE_CODE_SUBAGENT_MODEL` on Claude Code, or passes `--model` on Copilot CLI. Full precedence rule: `skills/shared/core/multi-agent/SKILL.md#agent-dispatch--per-persona-model-routing-v610`. Fable dispatches are subject to the fallback contract (`refs/features/model-fallback.md`): dispatch-error retry walks `fable -> opus -> sonnet` and budget-ceiling downgrade.
 
 **Stack-specific skills loaded per reviewer** (from Phase 1 `detectedStack`). On Claude Code, Reviewer 2 (GPT-5.4) is not dispatched - its skill column is ignored. On Copilot CLI all three columns are used.
 
-| Stack | Reviewer 1 (Opus) | Reviewer 2 (GPT-5.4 - Copilot CLI only) | Reviewer 3 (Sonnet) |
+| Stack | Reviewer 1 (Fable / Opus on Copilot) | Reviewer 2 (GPT-5.4 - Copilot CLI only) | Reviewer 3 (Sonnet) |
 |-------|-------------------|-----------------------------------------|---------------------|
 | iOS/Swift | `ios-security`, `swiftui-performance`, `hig-patterns` | `swift-concurrency`, `ios-accessibility` | `swiftui-pro`, `swift-testing` |
 | Android/Kotlin | `android-security`, `android-performance` | `compose-testing`, `android-architecture` | `compose-components`, `kotlin-coroutines-expert` |
@@ -204,11 +204,11 @@ Skills are injected into reviewer prompt context - the reviewer uses them as r
 
 **iOS/Swift - interaction & convention skills (conditional).** When the diff touches SwiftUI UI files (`*View.swift`, `*Screen.swift`, `*Configuration.swift`, `*+Modifiers.swift`), additionally inject the relevant `figma-common` convention skills as reference for the iOS reviewers: `figma-navigation`, `figma-overlays`, `figma-bottom-sheets` (interaction: emit-intent vs self-route/self-present; native-SwiftUI-first vs the project's `ui.*` custom system), and the enriched `figma-to-swiftui` accessibility rules (minimalism). These back the Step 1.5 iOS convention checks. Generic across SwiftUI projects - not tied to any one app. Omit when the diff has no SwiftUI UI changes (keeps the reviewer prompt lean).
 
-**Dispatch timeout (required, mirrors triage 3.3).** Reviewers run in parallel and triage waits on all of them, so one stalled reviewer hangs the phase. Bound each reviewer dispatch by `REVIEWER_TIMEOUT_SECONDS` (default 180). If a reviewer has not returned by the budget: log `review.reviewer_timeout reviewer=<name>`, treat that reviewer as absent, and proceed to triage with the reviewers that did return. The merged-findings count and `consensus.reviewerCount` reflect only the reviewers that returned. If **zero** reviewers return, retry
+**Dispatch timeout (required, mirrors triage 3.3).** Reviewers run in parallel and triage waits on all of them, so one stalled reviewer hangs the phase. Bound each reviewer dispatch by `REVIEWER_TIMEOUT_SECONDS` (default 180). If a reviewer has not returned by the budget: log `review.reviewer_timeout reviewer=<name>`, treat that reviewer as absent, and proceed to triage with the reviewers that did return. The merged-findings count and `consensus.reviewerCount` reflect only the reviewers that returned. If **zero** reviewers return, retry Reviewer 1 once; on a second total failure HALT with `ERR: no reviewer returned within ${REVIEWER_TIMEOUT_SECONDS}s; resume with /multi-agent:resume #N.`. The Step 2.5 rebuttal round uses the same per-dispatch timeout. Never block indefinitely on a slow or dead reviewer dispatch.
 
 #### Output contract - reviewer step
 
-Step 2 produces N reviewer-output objects (one per dispatched reviewer), each conforming to `pipeline/schemas/reviewer-output.schema.json`. They are persisted to `state.reviewIterations[<iteration>].reviewers[]` and consumed by Step 3 (
+Step 2 produces N reviewer-output objects (one per dispatched reviewer), each conforming to `pipeline/schemas/reviewer-output.schema.json`. They are persisted to `state.reviewIterations[<iteration>].reviewers[]` and consumed by Step 3 (Fable triage) - never by Phase 6 directly. The triage step (below) is the producer of the only review artifact Phase 6 reads, conforming to `pipeline/schemas/triage-output.schema.json`.
 
 **Subagent return format** - each reviewer returns JSON conforming to `pipeline/schemas/reviewer-output.schema.json`:
 
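The dispatch-timeout rule in the hunk above can be read as a small decision table. The following is a minimal sketch of that reading, not the package's implementation: the function name, the `dispatches` input shape, and the `action` labels are all hypothetical, and only the log line format and the retry-once-then-halt behaviour come from the text.

```javascript
// Hypothetical helper: decide how Phase 4 proceeds once the reviewer budget
// (REVIEWER_TIMEOUT_SECONDS) has elapsed.
// `dispatches` is an assumed shape: [{ name: string, returned: boolean }].
function resolveReviewerTimeouts(dispatches, attempt = 1) {
  const returned = dispatches.filter((d) => d.returned);
  const timedOut = dispatches.filter((d) => !d.returned);

  // One log line per stalled reviewer, in the format quoted in the hunk.
  const logs = timedOut.map((d) => `review.reviewer_timeout reviewer=${d.name}`);

  if (returned.length > 0) {
    // Proceed to triage with whoever came back; reviewerCount counts only them.
    return { action: "triage", reviewerCount: returned.length, logs };
  }

  // Zero reviewers returned: retry Reviewer 1 once, halt on the second failure.
  const action = attempt === 1 ? "retry-reviewer-1" : "halt";
  return { action, reviewerCount: 0, logs };
}
```

A caller would pass `attempt = 2` after the single Reviewer 1 retry, so a second total failure maps to `halt`.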
@@ -248,9 +248,9 @@ Exit 0 = valid. Exit 2 = contradiction (approved=true with blocking findings) -
 
 **Off by default reason:** mixed-verdict cases are ~8% of runs in practice; the extra ~$0.20-$0.50 per run isn't worth automating for users who'd rather let triage resolve it cleanly. Users with high-stakes tasks (security-critical, release branches) can flip the flag.
 
-#### Step 3 -
+#### Step 3 - Fable Triage (filter before acting)
 
-**CRITICAL**: Reviewer findings are **raw signals**, not commands. Never auto-loop on every "blocking" tag - reviewers hallucinate, misread scope, or repeat each other. Run
+**CRITICAL**: Reviewer findings are **raw signals**, not commands. Never auto-loop on every "blocking" tag - reviewers hallucinate, misread scope, or repeat each other. Run Fable triage (Opus on Copilot CLI) to evaluate merged findings against task scope.
 
 ##### 3.1 Short-circuit: no findings
 
@@ -258,7 +258,7 @@ If merged findings `length === 0`, **skip triage**: write empty result `{"accept
 
 ##### 3.2 Launch triage agent
 
-Launch **1 Agent** (subagent_type: `general-purpose`, model: `opus`) with:
+Launch **1 Agent** (subagent_type: `general-purpose`, model: `fable` on Claude Code / `opus` on Copilot CLI) with:
 
 - Raw findings from Reviewer 1 + Reviewer 2 (merged JSON)
 - Task scope (Phase 1 analysis summary + Phase 2 plan)
@@ -307,11 +307,11 @@ Step 3 produces a single triage-output object conforming to `pipeline/schemas/tr
 
 Return ONLY valid JSON conforming to pipeline/schemas/triage-output.schema.json:
 {
-"accepted": [{ "severity": "blocking|important|suggestion", "file": "...", "line": N, "issue": "...", "fix": "...", "reviewer": "opus|sonnet" }],
+"accepted": [{ "severity": "blocking|important|suggestion", "file": "...", "line": N, "issue": "...", "fix": "...", "reviewer": "fable|opus|sonnet|gpt" }],
 "deferred": [{ "finding": {...}, "reason": "..." }],
 "rejected": [{ "finding": {...}, "reason": "..." }],
 "approved": true|false, // true if no accepted blocking items remain
-"consensus": { "reviewerCount": N, "verdict": "unanimous-pass|unanimous-block|split|unverified", "disagreements": [{ "file": "...", "line": N, "issue": "...", "note": "
+"consensus": { "reviewerCount": N, "verdict": "unanimous-pass|unanimous-block|split|unverified", "disagreements": [{ "file": "...", "line": N, "issue": "...", "note": "Fable blocking, Sonnet approved" }] } // optional, see 3.6
 }
 ```
 
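The `approved` comment in the triage JSON above ("true if no accepted blocking items remain") implies a simple consistency check. Below is a rough sketch of it, assuming the object shape shown in the hunk; `checkTriageApproval` is a hypothetical name and is not the package's `validate-triage.mjs`.

```javascript
// Hypothetical check: flag a triage result that claims approval while
// still carrying accepted items with severity "blocking".
function checkTriageApproval(triage) {
  const accepted = Array.isArray(triage.accepted) ? triage.accepted : [];
  const blockingCount = accepted.filter((f) => f.severity === "blocking").length;
  return {
    blockingCount,
    // Only approved=true alongside blocking items is treated as a contradiction here.
    contradiction: triage.approved === true && blockingCount > 0,
  };
}
```

Whether `approved: false` with zero blocking items should also be rejected is not stated in the hunk, so the sketch leaves that case alone.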
@@ -352,12 +352,12 @@ Failure fallback (timeout >120s, or agent crash before any JSON is produced): re
 Emit metrics per review pass for Phase 7 cost rollup:
 
 ```bash
-LOG_METRIC_FORWARD_TO_TRACKER=1 pipeline/scripts/log-metric.sh "$TASK_ID" 4 review.reviewer_call model=
+LOG_METRIC_FORWARD_TO_TRACKER=1 pipeline/scripts/log-metric.sh "$TASK_ID" 4 review.reviewer_call model=fable duration_ms=$R1_DURATION tokens_in=$R1_IN tokens_out=$R1_OUT # model=opus on Copilot CLI
 # GPT-5.4 metric emitted only on Copilot CLI (skip on Claude Code):
 [ "${CLI_HOST:-claude}" = "copilot" ] && \
 LOG_METRIC_FORWARD_TO_TRACKER=1 pipeline/scripts/log-metric.sh "$TASK_ID" 4 review.reviewer_call model=gpt-5.4 duration_ms=$GPT_DURATION tokens_in=$GPT_IN tokens_out=$GPT_OUT
 LOG_METRIC_FORWARD_TO_TRACKER=1 pipeline/scripts/log-metric.sh "$TASK_ID" 4 review.reviewer_call model=sonnet duration_ms=$SONNET_DURATION tokens_in=$SONNET_IN tokens_out=$SONNET_OUT
-LOG_METRIC_FORWARD_TO_TRACKER=1 pipeline/scripts/log-metric.sh "$TASK_ID" 4 review.triage_call model=
+LOG_METRIC_FORWARD_TO_TRACKER=1 pipeline/scripts/log-metric.sh "$TASK_ID" 4 review.triage_call model=fable duration_ms=$TRIAGE_DURATION tokens_in=$TRIAGE_IN tokens_out=$TRIAGE_OUT
 pipeline/scripts/log-metric.sh "$TASK_ID" 4 review.completed raw_count=$RAW accepted=$ACC deferred=$DEF rejected=$REJ approved=$APPROVED duration_ms=$DURATION
 ```
 
@@ -365,18 +365,18 @@ pipeline/scripts/log-metric.sh "$TASK_ID" 4 review.completed raw_count=$RAW acce
 
 ##### 3.5 Optional cross-check (single-point-of-failure mitigation)
 
-Opt-in via `prefs.global.triageCrossCheck.enabled` (default `false`). Sampled runs dispatch a **Sonnet** triage agent as second opinion, validated via `validate-triage.mjs` (same fallback rules). Disagreements logged as `triage.cross_check_diff`; `blockOnDisagreement` pauses for user (autopilot: proceed with
+Opt-in via `prefs.global.triageCrossCheck.enabled` (default `false`). Sampled runs dispatch a **Sonnet** triage agent as second opinion, validated via `validate-triage.mjs` (same fallback rules). Disagreements logged as `triage.cross_check_diff`; `blockOnDisagreement` pauses for user (autopilot: proceed with the Fable verdict). Doubles triage cost on sampled runs.
 
 ##### 3.6 Consensus surfacing (anti-correlation)
 
-**Rationale:** Reviewer 1 (
+**Rationale:** Reviewer 1 (Fable) and Reviewer 3 (Sonnet) are both Anthropic Claude models, so unanimous agreement on a *judgment call* is not independent confirmation - same-family models drift the same way on ambiguous prompts. Treating "both approved" as proof produces false-consensus passes. Triage therefore records a `consensus` block (schema v3.1.0) and surfaces disagreement and unverified agreement to the user rather than burying it.
 
 After the triage verdict is computed, populate `triage.consensus`:
 
 1. `reviewerCount` = number of reviewers dispatched this iteration (`2` on Claude Code, `3` on Copilot CLI).
 2. Classify the iteration `verdict`:
 - `unanimous-block` -> all reviewers returned at least one overlapping `blocking` finding.
-- `split` -> reviewers disagreed on existence or severity of one or more findings (the Step 2.5 disagreement definition). List each split in `disagreements[]` with a `note` naming who held which position (e.g. "
+- `split` -> reviewers disagreed on existence or severity of one or more findings (the Step 2.5 disagreement definition). List each split in `disagreements[]` with a `note` naming who held which position (e.g. "Fable blocking, Sonnet approved").
 - `unanimous-pass` -> all reviewers approved AND the diff is low-risk (no security/auth/concurrency surface per Phase 1 `touchedAreas`). Clear-cut; trust it.
 - `unverified` -> all reviewers approved BUT the diff touches a judgment-heavy surface (security, auth, concurrency, money, data migration). Agreement here may be correlated; do NOT treat it as a confirmed pass. Surface it.
 3. `disagreements[]` is populated for `split` and is also used to carry `unverified` notes (e.g. "both approved a keychain change - agreement unverified, confirm manually").
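The verdict rules in section 3.6 above can be sketched as a single classifier. This is an illustrative reading only: the input shapes, the `file:line` keys, and the area labels (including `data-migration`) are assumptions, and severity-only disagreements between approving reviewers, which the text also counts as `split`, are not modelled.

```javascript
// Surfaces the text names as judgment-heavy; the exact label strings are assumed.
const JUDGMENT_HEAVY = ["security", "auth", "concurrency", "money", "data-migration"];

// Hypothetical inputs (assumes at least one reviewer returned):
//   reviewers    = [{ approved: boolean, blocking: ["file:line", ...] }]
//   touchedAreas = string[] from Phase 1
function classifyConsensus(reviewers, touchedAreas) {
  const reviewerCount = reviewers.length;

  if (reviewers.every((r) => r.approved)) {
    // Agreement on a judgment-heavy surface may be correlated, so it stays unverified.
    const risky = touchedAreas.some((a) => JUDGMENT_HEAVY.includes(a));
    return { reviewerCount, verdict: risky ? "unverified" : "unanimous-pass" };
  }

  // unanimous-block: every reviewer shares at least one blocking finding.
  const shared = reviewers[0].blocking.filter((key) =>
    reviewers.every((r) => r.blocking.includes(key))
  );
  if (shared.length > 0) return { reviewerCount, verdict: "unanimous-block" };

  // Anything else is treated as disagreement in this sketch.
  return { reviewerCount, verdict: "split" };
}
```

On Claude Code this would be called with two reviewer objects, on Copilot CLI with three, matching the `reviewerCount` rule in item 1.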
@@ -430,7 +430,7 @@ for proj in $(jq -r '.projects[] | "\(.name)\t\(.worktreePath)\t\(.baseBranch)"'
 done
 ```
 
-Same
+Same reviewer set (Fable-or-Opus / GPT-5.4 / Sonnet) receive `COMBINED_DIFF` with a multi-repo prefix in the system prompt:
 
 ```
 This is a multi-repo task spanning {N} repos: {repo names}.
@@ -128,7 +128,7 @@ Every phase that dispatches a billable LLM agent MUST forward the call's token t
 
 ```bash
 LOG_METRIC_FORWARD_TO_TRACKER=1 pipeline/scripts/log-metric.sh "$TASK_ID" <phase-id> <event> \
-model=<opus|sonnet|haiku|gpt-5.4> \
+model=<fable|opus|sonnet|haiku|gpt-5.4> \
 tokens_in=$IN tokens_out=$OUT duration_ms=$DUR
 ```
 
@@ -24,8 +24,7 @@ The agent detects which CLI it's running in and uses the appropriate visual mech
 ```
 1. system prompt mentions "Claude Code" → claude-code
 2. system prompt mentions "Copilot" / "GitHub Copilot" → copilot
-3.
-5. None of the above → generic (bash stdout)
+3. None of the above → generic (bash stdout)
 ```
 
 Visual mechanism per CLI:
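The three-step detection order in the hunk above reduces to a short function. This is a sketch under assumptions: `detectCli` is a hypothetical name, and plain case-sensitive substring matching is a guess at how "mentions" is evaluated.

```javascript
// Hypothetical detector following the order listed in the hunk:
// Claude Code first, then Copilot, otherwise the generic fallback.
function detectCli(systemPrompt) {
  const text = String(systemPrompt || "");
  if (text.includes("Claude Code")) return "claude-code";
  // "GitHub Copilot" also contains "Copilot", so one test covers both spellings.
  if (text.includes("Copilot")) return "copilot";
  return "generic"; // bash stdout
}
```

Because the Claude Code test runs first, a prompt that mentions both products resolves to `claude-code`.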
@@ -1,5 +1,5 @@
 ---
-description: "Run parallel review on a branch's diff or a Pull Request: 2 models on Claude Code (
+description: "Run parallel review on a branch's diff or a Pull Request: 2 models on Claude Code (Fable + Sonnet), 3 models on Copilot CLI (GPT + Opus + Sonnet). On PR input, posts per-finding inline comments and sets approve/needs-work review state."
 argument-hint: "[#N | repo#N | PR-URL | branch] - optional: PR by number/URL, repo+number, or local branch. Supports GitHub and Bitbucket Server URLs. If omitted, the current branch is used."
 ---
 
@@ -112,13 +112,13 @@ Save the diff to `/tmp/multi-agent-review-${TASK_ID}-diff.patch` so reviewers ca
 ### 3. Launch parallel reviewers - host-CLI dependent
 
 **Claude Code (2 in parallel):**
-- Agent 1: `claude-
-- Agent 2: `claude-sonnet-4
+- Agent 1: `claude-fable-5` → security + architecture
+- Agent 2: `claude-sonnet-4-6` → general quality
 
 **Copilot CLI (3 in parallel):**
-- Agent 1: `claude-opus-4
+- Agent 1: `claude-opus-4-8` → security + architecture (Fable 5 is not offered on Copilot CLI)
 - Agent 2: `gpt-5.4` → edge cases, alternate perspective
-- Agent 3: `claude-sonnet-4
+- Agent 3: `claude-sonnet-4-6` → general quality
 
 Each reviewer receives the diff plus the standard reviewer system prompt (see `refs/phases/phase-4-review.md` for the prompt contract). Output: structured `findings[]` per reviewer.
 
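The host-dependent reviewer lists in the hunk above amount to a lookup keyed on the CLI. A minimal sketch follows, with model IDs and focus strings copied from the added lines; the function name and the `"copilot"` host label (borrowed from the `CLI_HOST` check in the metrics snippet) are assumptions.

```javascript
// Hypothetical lookup: which reviewers to dispatch for a given host CLI.
// Any host other than "copilot" is treated as Claude Code, mirroring the
// `${CLI_HOST:-claude}` default seen in the metrics hunk.
function reviewerSet(cliHost) {
  if (cliHost === "copilot") {
    return [
      { model: "claude-opus-4-8", focus: "security + architecture" },
      { model: "gpt-5.4", focus: "edge cases, alternate perspective" },
      { model: "claude-sonnet-4-6", focus: "general quality" },
    ];
  }
  return [
    { model: "claude-fable-5", focus: "security + architecture" },
    { model: "claude-sonnet-4-6", focus: "general quality" },
  ];
}
```

The length of the returned array is the value the consensus step would record as `reviewerCount` when every dispatch returns.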
@@ -137,7 +137,7 @@ Each finding gets the `ruleID` from the catalog plus the platform policy ref:
 
 Catalog-only - does NOT invoke binaries. For a full scan, use `/multi-agent:test "store-ready"`.
 
-### 5. Triage (
+### 5. Triage (Fable)
 
 Classify findings into:
 - 🔴 **Blocking** → must fix
@@ -152,10 +152,10 @@ Triage also marks each finding as `accepted` (real issue), `deferred` (real but
 🔍 Review Complete · PR #1250 · 3 files +120 -45
 | Model | Verdict | Blocking | Important | Suggestion |
 |----------|-----------|----------|-----------|------------|
-|
+| Fable | approved | 0 | 1 | 3 |
 | Sonnet | rejected | 1 | 2 | 5 |
 
-Consensus: ⚠ DISAGREEMENT - see
+Consensus: ⚠ DISAGREEMENT - see Fable triage
 ```
 
 This summary ALWAYS prints, regardless of input mode. The chat is the live conversation; on the PR side, the durable artifacts are inline comments + the review state (Step 7).
@@ -58,7 +58,7 @@ Run every step automatically:
 Step 0: FIGMA_SYNC SKIP (deprecated - feedback_figma_source_deprecated)
 Step 1: PLATFORM Detect macOS / Linux / Windows (Git Bash / WSL); export PLATFORM env
 Step 1.5: DETECT Compare timestamps, find stale targets
-Step 2: COPILOT Claude Code -> Copilot CLI (instructions +
+Step 2: COPILOT Claude Code -> Copilot CLI (instructions + 35 sub-command skills)
 Step 3: REPO Claude Code -> pipeline repo (genericized, personal data scrub, bash -n on all sh)
 Step 3c: PLUGINS pipeline shared/external -> multi-agent-plugins marketplace (rebuild knowledge/,
 bump changed plugins' patch version, commit + push the plugins repo)
@@ -277,11 +277,11 @@ This runs on the Claude <-> Copilot axis — the two CLIs the pipeline supports
 |-------------|-------------|
 | `~/.claude/commands/multi-agent/{cmd}.md` | `~/.copilot/skills/multi-agent-{cmd}/SKILL.md` |
 
-**
+**35 commands are synced** (canonical inventory - must match `cross-cli-contract.md` section 1; drift = contract violation):
 
 ```
 analysis, analysis-resolve, autopilot, build-optimize, channels, delete, dev,
-dev-autopilot, dev-local, dev-local-autopilot, diff-explain, garbage-collect,
+dev-autopilot, dev-local, dev-local-autopilot, diff-explain, finish, garbage-collect,
 help, issue, jira, kill, language, local, local-autopilot, log, manual-test,
 prune-logs, purge, refactor, resume, review, scan, search, setup, stack, status,
 sync, test, update
@@ -1,5 +1,5 @@
 ---
-description: "Task orchestrator - full pipeline via Jira ID + branch or GitHub Issue URL: analysis, plan, TDD development, parallel review +
+description: "Task orchestrator - full pipeline via Jira ID + branch or GitHub Issue URL: analysis, plan, TDD development, parallel review + Fable triage (CLI-aware: 2-model on Claude Code, 3-model on Copilot CLI), commit, log"
 allowed-tools: Agent, Bash, Read, Write, Edit, Glob, Grep, TaskCreate, TaskUpdate, TaskList, TaskGet, AskUserQuestion, WebFetch, WebSearch, NotebookEdit, Skill
 ---
 
@@ -140,14 +140,14 @@ This command uses lazy loading for token efficiency. Read the relevant sub-file
 - Multiple stacks -> load all relevant guides
 
 **Agent definitions** (used in Phase 1 and Phase 4):
-- `$HOME/.claude/agents/code-reviewer.md` - Phase 4 reviewer persona (`preferredModel:
+- `$HOME/.claude/agents/code-reviewer.md` - Phase 4 reviewer persona (`preferredModel: fable`; Phase 4 overrides Reviewer 3 to `sonnet`)
 - `$HOME/.claude/agents/explorer.md` - Phase 1 codebase scan persona (`preferredModel: sonnet` - scan work, cost-efficient)
-- `$HOME/.claude/agents/ios-architect.md` - iOS architecture review (`preferredModel:
-- `$HOME/.claude/agents/android-architect.md` - Android architecture review (`preferredModel:
-- `$HOME/.claude/agents/backend-architect.md` - Backend/API architecture review (`preferredModel:
+- `$HOME/.claude/agents/ios-architect.md` - iOS architecture review (`preferredModel: fable`)
+- `$HOME/.claude/agents/android-architect.md` - Android architecture review (`preferredModel: fable`)
+- `$HOME/.claude/agents/backend-architect.md` - Backend/API architecture review (`preferredModel: fable`)
 - `$HOME/.claude/agents/security-auditor.md` - Security audit (`preferredModel: opus`)
 
-**Per-persona model routing:** Before each Agent dispatch, the orchestrator reads `preferredModel` from the persona file and exports `CLAUDE_CODE_SUBAGENT_MODEL` (Claude Code) / passes `--model` (Copilot CLI). Precedence: per-dispatch `PHASE_MODEL_OVERRIDE` > persona `preferredModel` > `
+**Per-persona model routing:** Before each Agent dispatch, the orchestrator reads `preferredModel` from the persona file and exports `CLAUDE_CODE_SUBAGENT_MODEL` (Claude Code) / passes `--model` (Copilot CLI). Precedence: per-dispatch `PHASE_MODEL_OVERRIDE` > persona `preferredModel` > `fable` (falls back per `refs/features/model-fallback.md`). Full contract: `skills/shared/core/multi-agent/SKILL.md#agent-dispatch--per-persona-model-routing-v610`.
 
 ---
 
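The precedence rule in the routing paragraph above (`PHASE_MODEL_OVERRIDE` > persona `preferredModel` > `fable`) is easy to sketch, together with the `fable -> opus -> sonnet` retry walk quoted in the Phase 4 hunk. Both function names and the argument shape are hypothetical, and returning only the model itself for labels outside the chain is an assumption.

```javascript
// Retry order quoted for dispatch errors: fable -> opus -> sonnet.
const FALLBACK_CHAIN = ["fable", "opus", "sonnet"];

// Hypothetical resolver: per-dispatch override wins, then the persona's
// preferredModel, then the pipeline-wide default.
function resolveModel({ phaseModelOverride, preferredModel } = {}) {
  return phaseModelOverride || preferredModel || "fable";
}

// Models to try, in order, starting from the resolved one.
function fallbackCandidates(model) {
  const start = FALLBACK_CHAIN.indexOf(model);
  return start === -1 ? [model] : FALLBACK_CHAIN.slice(start);
}
```

For Reviewer 3, for instance, the override `sonnet` would win over the persona's `fable`, and `fallbackCandidates("sonnet")` leaves nothing further to walk to.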
@@ -247,7 +247,7 @@ When called with `review`:
 1. Detect current branch and project from cwd (or ask)
 2. Get diff: `git diff HEAD` (unstaged + staged)
 3. If no diff, get diff against base branch: `git diff origin/{baseBranch}...HEAD`
-4. Launch Phase 4 review (parallel +
+4. Launch Phase 4 review (parallel + Fable triage - 2-model on Claude Code, 3-model on Copilot CLI) on the diff
 5. No worktree, no state file - lightweight one-shot review
 6. Print findings to terminal
 
@@ -183,7 +183,7 @@
 "planEditRequests": {
 "type": "array",
 "items": { "type": "string" },
-"description": "v5.3.0 Phase 2 - free-text edit instructions the user typed between plan renders. Preserved verbatim for audit;
+"description": "v5.3.0 Phase 2 - free-text edit instructions the user typed between plan renders. Preserved verbatim for audit; the planning model (Fable top tier) parses them conversationally to revise the plan."
 }
 }
 }
@@ -831,9 +831,9 @@
 },
 "pricingModel": {
 "type": "string",
-"enum": ["opus", "sonnet", "haiku"],
-"default": "
-"description": "Which cost-table.json rate to price accumulated tokens at. Defaults to
+"enum": ["fable", "opus", "sonnet", "haiku"],
+"default": "fable",
+"description": "Which cost-table.json rate to price accumulated tokens at. Defaults to fable (the top tier since v10.6.0) for a deliberately conservative (upper-bound) estimate, so the ceiling trips early rather than late."
 }
 }
 },
@@ -19,7 +19,7 @@
 },
 "reviewer": {
 "type": "string",
-"description": "Model label for this output (e.g. 'opus', 'sonnet', 'gpt'). Present once the parallel reviewer outputs are merged into the Phase 4 array so triage/consensus can attribute each finding to its source. Optional on a single reviewer's raw pre-merge output."
+"description": "Model label for this output (e.g. 'fable', 'opus', 'sonnet', 'gpt'). Present once the parallel reviewer outputs are merged into the Phase 4 array so triage/consensus can attribute each finding to its source. Optional on a single reviewer's raw pre-merge output."
 }
 },
 "$defs": {
@@ -74,8 +74,8 @@
 },
 "reviewer": {
 "type": "string",
-"enum": ["opus", "sonnet"],
-"description": "Which reviewer produced the raw finding. Haiku was removed in v2.1.0."
+"enum": ["fable", "opus", "sonnet", "gpt"],
+"description": "Which reviewer produced the raw finding. Claude Code Reviewer 1 is fable (opus when fallback engages); Copilot CLI adds gpt. Haiku was removed in v2.1.0."
 },
 "consensus": {
 "type": "object",
@@ -64,12 +64,11 @@ Installed into `~/.claude/scripts/` and invoked by settings.json hook configurat
 - `pre-push-check.sh` - runs before `git push` (smoke-cross-cli-behavior + smoke-personal-data)
 - `output-quality-check.sh` - runs after PR body / Jira comment generation (newline / HTML entity guard)
 
-## Runtime helpers
+## Runtime helpers
 Shell scripts invoked during pipeline execution.
 
 - `phase-banner.sh` - renders phase headers
 - `phase-tracker.sh` - live tracker state + tokens accumulation + render
-- `stack-swap.sh` - stack detection + skill set swap
 - `keychain-save.sh` - store PAT in macOS Keychain
 - `audit-log.sh` + `audit-log-rotate.sh` - opt-in audit trail
 - `log-metric.sh` - opt-in metric capture
@@ -66,7 +66,7 @@ if (flags.help || flags.h) {
 }
 
 // --- resolve config: prefs first, CLI overrides -----------------------------
-const cfg = { enabled: false, maxUsd: 5.0, warnPct: 80, onExceed: "warn", pricingModel: "
+const cfg = { enabled: false, maxUsd: 5.0, warnPct: 80, onExceed: "warn", pricingModel: "fable" };
 
 if (flags.prefs) {
 if (!existsSync(flags.prefs)) die(`prefs file not found: ${flags.prefs}`);
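The `cfg` defaults in the hunk above suggest a ceiling with a warning threshold. The sketch below is one plausible reading, not the logic of `cost-budget-check.mjs`: it assumes `warnPct` is a percentage of `maxUsd`, and the function name and status labels are hypothetical.

```javascript
// Defaults copied from the added line in the hunk.
const DEFAULT_CFG = { enabled: false, maxUsd: 5.0, warnPct: 80, onExceed: "warn", pricingModel: "fable" };

// Hypothetical status check for an accumulated spend in USD.
function budgetStatus(spentUsd, overrides = {}) {
  const cfg = { ...DEFAULT_CFG, ...overrides };
  if (!cfg.enabled) return "disabled";
  if (spentUsd >= cfg.maxUsd) return "exceeded";
  // Assumption: warnPct is the share of maxUsd at which warnings start.
  if (spentUsd >= (cfg.maxUsd * cfg.warnPct) / 100) return "warn";
  return "ok";
}
```

With the defaults and `enabled: true`, warnings would begin at 4.00 USD and the ceiling would trip at 5.00 USD; what happens next depends on `onExceed`, which the hunk does not spell out.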
@@ -2,6 +2,13 @@
 "_readme": "Per-model unit prices in USD per million tokens. Source: Anthropic public pricing (verified 2026-04-21). Update when Anthropic publishes new tiers. Unknown models render USD as ' - ' and emit a footnote - never block PR-body generation. cacheReadPerMtok is the discounted rate for prompt-cache hits (~10% of inPerMtok); the renderer prices a phase's tokens_cached at this rate when the tracker records it, so resume/cache reuse is visible in the ledger.",
 "schemaVersion": "1.1.0",
 "prices": {
+"fable": {
+"inPerMtok": 10.0,
+"outPerMtok": 50.0,
+"cacheReadPerMtok": 1.0,
+"modelId": "claude-fable-5",
+"note": "Top tier (restored v10.6.0) - architects, Reviewer 1, triage. Verified against Anthropic pricing 2026-07-02."
+},
 "opus": {
 "inPerMtok": 5.0,
 "outPerMtok": 25.0,
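The per-million-token rates in the cost table above translate into a straightforward estimate. The following sketch uses only the rates visible in this hunk; `estimateUsd` is a hypothetical name, it assumes cached tokens are counted separately from `tokensIn`, and pricing cached tokens at the full input rate when no cache rate is known is also an assumption.

```javascript
// Rates copied from the hunk (USD per million tokens). The opus cache rate
// is not visible in the diff, so it is left out here.
const PRICES = {
  fable: { inPerMtok: 10.0, outPerMtok: 50.0, cacheReadPerMtok: 1.0 },
  opus: { inPerMtok: 5.0, outPerMtok: 25.0 },
};

// Hypothetical estimator: returns USD, or null for an unknown model so the
// caller can render ' - ' instead of blocking.
function estimateUsd(model, { tokensIn = 0, tokensOut = 0, tokensCached = 0 } = {}) {
  const p = PRICES[model];
  if (!p) return null;
  // Assumption: without a known cache rate, cached tokens cost the full input rate.
  const cacheRate = p.cacheReadPerMtok ?? p.inPerMtok;
  const total = tokensIn * p.inPerMtok + tokensOut * p.outPerMtok + tokensCached * cacheRate;
  return total / 1e6;
}
```

As a worked case, 1,000,000 input, 100,000 output and 500,000 cached tokens on `fable` come to 10 + 5 + 0.5 = 15.5 USD, twice the input and output cost of the same call at the `opus` rates.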
@@ -1,16 +1,16 @@
 .claude/CLAUDE.md 1
 .claude/agents 8
-.claude/commands
+.claude/commands 88
 .claude/lib 23
 .claude/multi-agent-preferences.json 1
 .claude/rules 12
 .claude/schemas 23
-.claude/scripts
+.claude/scripts 167
 .claude/settings.json 1
-.claude/skills
+.claude/skills 560
 .copilot/agents 8
 .copilot/copilot-instructions.md 1
 .copilot/lib 23
 .copilot/schemas 23
-.copilot/scripts
-.copilot/skills
+.copilot/scripts 167
+.copilot/skills 596
@@ -2,15 +2,17 @@
|
|
|
2
2
|
|
|
3
3
|
/**
|
|
4
4
|
* @file Token-preserving uninstaller - removes the multi-agent-pipeline
|
|
5
|
-
* footprint from Claude Code
|
|
6
|
-
*
|
|
7
|
-
*
|
|
5
|
+
* footprint from Claude Code and Copilot CLI without touching personal access
|
|
6
|
+
* tokens stored in the OS credential store (macOS Keychain / Windows
|
|
7
|
+
* Credential Manager / Linux libsecret). The legacy adapter flags
|
|
8
|
+
* (--cursor / --copilot-chat / --antigravity / --codex) clean up files left
|
|
9
|
+
* behind by pre-v10.7.0 adapter installs; the adapters themselves are gone.
|
|
8
10
|
*
|
|
9
11
|
* Invocation:
|
|
10
12
|
* node uninstall.mjs # interactive, removes from all installed targets
|
|
11
13
|
* node uninstall.mjs --yes # skip prompts, remove from all
|
|
12
14
|
* node uninstall.mjs --claude # only Claude Code
|
|
13
|
-
* node uninstall.mjs --cursor --target=/path/to/repo
|
|
15
|
+
* node uninstall.mjs --cursor --target=/path/to/repo # legacy adapter-file cleanup
|
|
14
16
|
* node uninstall.mjs --dry-run # report what would be removed, change nothing
|
|
15
17
|
*
|
|
16
18
|
* Targeted by:
|
|
@@ -29,12 +31,9 @@
|
|
|
29
31
|
*/
|
|
30
32
|
|
|
31
33
|
import { existsSync, readdirSync, readFileSync, rmSync, writeFileSync } from "fs";
|
|
32
|
-
import { join
|
|
33
|
-
import { fileURLToPath } from "url";
|
|
34
|
+
import { join } from "path";
|
|
34
35
|
import { createInterface } from "readline";
|
|
35
36
|
|
|
36
|
-
const __dirname = dirname(fileURLToPath(import.meta.url));
|
|
37
|
-
const PIPELINE_ROOT = join(__dirname, "..");
|
|
38
37
|
const HOME = process.env.HOME || process.env.USERPROFILE;
|
|
39
38
|
|
|
40
39
|
const flags = process.argv.slice(2).filter((a) => a !== "uninstall");
|
|
@@ -106,6 +105,23 @@ function rmMatchingDirs(parent, predicate) {
|
|
|
106
105
|
return count;
|
|
107
106
|
}
|
|
108
107
|
|
|
108
|
+
/**
|
|
109
|
+
* Remove every plain file under `parent` whose name matches a predicate.
|
|
110
|
+
* Used by the legacy adapter cleanup (pre-v10.7.0 generated files).
|
|
111
|
+
* @param {string} parent
|
|
112
|
+
* @param {(name: string) => boolean} predicate
|
|
113
|
+
*/
|
|
114
|
+
function rmMatchingFiles(parent, predicate) {
|
|
115
|
+
if (!existsSync(parent)) return 0;
|
|
116
|
+
let count = 0;
|
|
117
|
+
for (const entry of readdirSync(parent, { withFileTypes: true })) {
|
|
118
|
+
if (!entry.isFile()) continue;
|
|
119
|
+
if (!predicate(entry.name)) continue;
|
|
120
|
+
if (rmIfExists(join(parent, entry.name))) count++;
|
|
121
|
+
}
|
|
122
|
+
return count;
|
|
123
|
+
}
|
|
124
|
+
|
|
109
125
|
/**
|
|
110
126
|
* Strip a marker-wrapped `<!-- multi-agent-pipeline:begin/end -->` block from
|
|
111
127
|
* a user-owned file. Preserves everything outside the markers. Deletes the
|
|
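The doc comment at the end of this hunk describes stripping a marker-wrapped block while preserving everything outside the markers. The idea can be sketched on a plain string as below. `stripManagedText` is a hypothetical helper, not the package's `stripManagedBlock`, and the exact marker spellings are an assumption expanded from the `begin/end` shorthand in the comment.

```javascript
// Assumed marker spellings; the diff only shows the shorthand "begin/end".
const BEGIN = "<!-- multi-agent-pipeline:begin -->";
const END = "<!-- multi-agent-pipeline:end -->";

// Hypothetical pure helper: returns the text with the managed block removed,
// or the input unchanged when either marker is missing.
function stripManagedText(text) {
  const start = text.indexOf(BEGIN);
  if (start === -1) return text;
  const end = text.indexOf(END, start + BEGIN.length);
  if (end === -1) return text;
  const before = text.slice(0, start).trimEnd();
  const after = text.slice(end + END.length).trimStart();
  // Rejoin the surviving halves with one blank line between them.
  return [before, after].filter(Boolean).join("\n\n");
}

const sample = `# My rules\n\n${BEGIN}\npipeline content\n${END}\n\nKeep this line.\n`;
console.log(JSON.stringify(stripManagedText(sample)));
```

Leaving the input untouched when a marker is missing is the conservative choice for a user-owned file: a half-present block is never guessed at.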
@@ -287,68 +303,48 @@ async function main() {
|
|
|
287
303
|
stripManagedBlock(join(COP, "copilot-instructions.md"));
|
|
288
304
|
}
|
|
289
305
|
|
|
306
|
+
// Legacy adapter-file cleanup (adapters removed in v10.7.0). These blocks
|
|
307
|
+
// delete files a pre-v10.7.0 install generated; they never touch user files
|
|
308
|
+
// outside the multi-agent-* namespace / managed markers.
|
|
290
309
|
if (forCursor) {
|
|
291
310
|
console.log("");
|
|
292
|
-
console.log(` [Cursor] Removing from ${adapterTarget}...`);
|
|
293
|
-
|
|
294
|
-
|
|
295
|
-
|
|
296
|
-
|
|
297
|
-
console.log(` removed: ${result.removed} file(s)`);
|
|
298
|
-
} catch (e) {
|
|
299
|
-
console.log(` skipped (adapter unavailable): ${e.message}`);
|
|
300
|
-
}
|
|
301
|
-
} else {
|
|
302
|
-
report("would invoke", "cursor adapter uninstall");
|
|
303
|
-
}
|
|
311
|
+
console.log(` [Cursor - legacy cleanup] Removing from ${adapterTarget}...`);
|
|
312
|
+
const isOurs = (name) => name.startsWith("multi-agent-") || name.startsWith("multi-agent.");
|
|
313
|
+
const n = rmMatchingFiles(join(adapterTarget, ".cursor", "rules"), isOurs);
|
|
314
|
+
if (n > 0) console.log(` removed ${n} rule file(s)`);
|
|
315
|
+
stripManagedBlock(join(adapterTarget, ".cursorrules"));
|
|
304
316
|
}
|
|
305
317
|
|
|
306
318
|
if (forCopilotChat) {
|
|
307
319
|
console.log("");
|
|
308
|
-
console.log(` [GitHub Copilot Chat] Removing from ${adapterTarget}...`);
|
|
309
|
-
|
|
310
|
-
|
|
311
|
-
|
|
312
|
-
|
|
313
|
-
|
|
314
|
-
} catch (e) {
|
|
315
|
-
console.log(` skipped (adapter unavailable): ${e.message}`);
|
|
316
|
-
}
|
|
317
|
-
} else {
|
|
318
|
-
report("would invoke", "copilot-chat adapter uninstall");
|
|
319
|
-
}
|
|
320
|
+
console.log(` [GitHub Copilot Chat - legacy cleanup] Removing from ${adapterTarget}...`);
|
|
321
|
+
stripManagedBlock(join(adapterTarget, ".github", "copilot-instructions.md"));
|
|
322
|
+
const n = rmMatchingFiles(join(adapterTarget, ".github", "instructions"), (name) =>
|
|
323
|
+
name.startsWith("multi-agent-"),
|
|
324
|
+
);
|
|
325
|
+
if (n > 0) console.log(` removed ${n} instruction file(s)`);
|
|
320
326
|
}
|
|
321
327
|
|
|
322
328
|
if (forAntigravity) {
|
|
323
329
|
console.log("");
|
|
324
|
-
console.log(` [Antigravity] Removing from ${adapterTarget}...`);
|
|
325
|
-
|
|
326
|
-
|
|
327
|
-
|
|
328
|
-
|
|
329
|
-
|
|
330
|
-
|
|
331
|
-
|
|
332
|
-
}
|
|
333
|
-
} else {
|
|
334
|
-
report("would invoke", "antigravity adapter uninstall");
|
|
335
|
-
}
|
|
330
|
+
console.log(` [Antigravity - legacy cleanup] Removing from ${adapterTarget}...`);
|
|
331
|
+
const isOurs = (name) => name.startsWith("multi-agent-") || name.startsWith("multi-agent.");
|
|
332
|
+
let n = rmMatchingFiles(join(adapterTarget, ".agent", "rules"), isOurs);
|
|
333
|
+
n += rmMatchingFiles(join(adapterTarget, ".agent", "workflows"), isOurs);
|
|
334
|
+
if (n > 0) console.log(` removed ${n} .agent file(s)`);
|
|
335
|
+
stripManagedBlock(join(adapterTarget, "AGENTS.md"));
|
|
336
|
+
if (existsSync(join(adapterTarget, ".agent", "mcp_config.json")))
|
|
337
|
+
console.log(" note: .agent/mcp_config.json left untouched (may hold user servers) - remove pipeline entries manually if present");
|
|
336
338
|
}
|
|
337
339
|
|
|
338
|
-
if (forCodex) {
|
|
340
|
+
if (forCodex && HOME) {
|
|
339
341
|
console.log("");
|
|
340
|
-
console.log(" [OpenAI Codex CLI] Removing from ~/.codex...");
|
|
341
|
-
|
|
342
|
-
|
|
343
|
-
|
|
344
|
-
|
|
345
|
-
|
|
346
|
-
} catch (e) {
|
|
347
|
-
console.log(` skipped (adapter unavailable): ${e.message}`);
|
|
348
|
-
}
|
|
349
|
-
} else {
|
|
350
|
-
report("would invoke", "codex adapter uninstall");
|
|
351
|
-
}
|
|
342
|
+
console.log(" [OpenAI Codex CLI - legacy cleanup] Removing from ~/.codex...");
|
|
343
|
+
const CODEX = join(HOME, ".codex");
|
|
344
|
+
rmIfExists(join(CODEX, "prompts", "multi-agent.md"));
|
|
345
|
+
stripManagedBlock(join(CODEX, "AGENTS.md"));
|
|
346
|
+
if (existsSync(join(CODEX, "config.toml")))
|
|
347
|
+
console.log(" note: ~/.codex/config.toml left untouched (may hold user MCP servers) - remove pipeline entries manually if present");
|
|
352
348
|
}
|
|
353
349
|
|
|
354
350
|
console.log("");
|
|
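The legacy cleanup blocks above delete only files whose names fall inside the `multi-agent-` / `multi-agent.` namespace. The short sketch below shows which names that predicate selects. The predicate is copied from the Cursor and Antigravity blocks; the file names in `listing` are made up for illustration.

```javascript
// Predicate copied from the Cursor and Antigravity blocks in the diff.
const isOurs = (name) => name.startsWith("multi-agent-") || name.startsWith("multi-agent.");

// Hypothetical directory listing, used only to show what the predicate selects.
const listing = [
  "multi-agent-core.mdc",
  "multi-agent.md",
  "multiagent.md",
  "team-style.mdc",
  "my-multi-agent-notes.md",
];

const removed = listing.filter(isOurs);
const kept = listing.filter((name) => !isOurs(name));
console.log({ removed, kept });
// removed: multi-agent-core.mdc, multi-agent.md
// kept:    multiagent.md, team-style.mdc, my-multi-agent-notes.md
```

Because the match is a prefix test, a user file that merely contains `multi-agent` later in its name is left alone. The Copilot Chat block in the diff is narrower still and matches only the `multi-agent-` prefix.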
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: multi-agent
|
|
3
3
|
language: en
|
|
4
|
-
description: "Task orchestrator: runs the full pipeline from a Jira ID or GitHub Issue URL - analysis → plan → TDD development → parallel review (
|
|
4
|
+
description: "Task orchestrator: runs the full pipeline from a Jira ID or GitHub Issue URL - analysis → plan → TDD development → parallel review (Fable + Sonnet on Claude Code, GPT + Opus + Sonnet on Copilot CLI) → commit → log. Every step is written to agent-log.md."
|
|
5
5
|
user-invocable: true
|
|
6
6
|
argument-hint: '"PROJ-12345" "feature/PROJ-12345-flight-filter" | "https://github.com/.../issues/316" | status | log #1 | resume #1 | kill #1 | clear-logs | purge | review'
|
|
7
7
|
---
|
|
@@ -397,7 +397,7 @@ Full contract: `refs/tracker-contract.md` section "TaskCreate ordering (strict)"
|
|
|
397
397
|
|
|
398
398
|
**IMPORTANT**: Update `agent-state.json` at EVERY phase transition. This file is the resume source of truth.
|
|
399
399
|
|
|
400
|
-
### Phase 1: Analysis (claude-
|
|
400
|
+
### Phase 1: Analysis (claude-fable-5)
|
|
401
401
|
1. Launch **explore agents** (parallel) to scan codebase:
|
|
402
402
|
- Related files to the task
|
|
403
403
|
- Existing patterns and conventions
|
|
@@ -406,17 +406,17 @@ Full contract: `refs/tracker-contract.md` section "TaskCreate ordering (strict)"
|
|
|
406
406
|
3. Summarize findings
|
|
407
407
|
4. Log: `📊 Phase 1: Analysis - {N} files identified, {summary}`
|
|
408
408
|
|
|
409
|
-
### Phase 2: Planning (claude-
|
|
409
|
+
### Phase 2: Planning (claude-fable-5)
|
|
410
410
|
1. Create task breakdown → todos with dependencies
|
|
411
411
|
2. Launch **ios-architect** agent for architecture review (if structural changes)
|
|
412
412
|
3. Determine development approach per todo (new file, modify, refactor)
|
|
413
413
|
4. Log: `🧠 Phase 2: Plan - {N} todos created`
|
|
414
414
|
5. **Plan Approval Gate** (normal mode only - skipped for `--dev`, `autopilot`, `--dev autopilot`). Full flow in `refs/phases/phase-2-planning.md` Step 5:
|
|
415
415
|
- **5a - Clarification** (conditional, max 2 rounds): if the Jira/issue description is ambiguous (vague acceptance, UI task without Figma, API task without endpoint contract, `ambiguityScore >= 2`, parent-story scope drift), ask structured questions before rendering the plan. User answers → plan regenerated. If it is still unclear after the 2nd round, render the plan with a "best-effort" banner.
|
|
416
|
-
- **5b - Approval loop**: render plan → user: `onayla`/`iptal`/free-text. A free-text edit request →
|
|
416
|
+
- **5b - Approval loop**: render plan → user: `onayla`/`iptal`/free-text. A free-text edit request → the planning model (Fable) revises the plan → show it again. No iteration cap; user controls exit via `onayla` or `iptal`.
|
|
417
417
|
- Persist `clarificationRounds`, `clarificationQuestions`, `clarificationAnswers`, `planIterations`, `planApprovedAt`, `planEditRequests` to `state.phases["2"]`.
|
|
418
418
|
|
|
419
|
-
### Phase 3: Dev (claude-sonnet-4
|
|
419
|
+
### Phase 3: Dev (claude-sonnet-4-6)
|
|
420
420
|
For each todo (respecting dependency order):
|
|
421
421
|
1. Update todo status: `in_progress`
|
|
422
422
|
2. **TDD cycle**:
|
|
@@ -431,9 +431,9 @@ For each todo (respecting dependency order):
|
|
|
431
431
|
### Phase 4: Review (parallel + triage)
|
|
432
432
|
0. **Diff Risk Scoring (advisory, v8.3+)** - before reviewer dispatch run `node pipeline/scripts/diff-risk-score.mjs --base "$BASE_BRANCH" --top 5` and inject the top-N risk-ranked files as a `${PRIORITY_FILES}` block into each reviewer's prompt. Heuristic, deterministic, sub-second, never gates the pipeline. Disabled when `prefs.global.diffRiskAdvisory = false`. Signals: security paths (×3), schema migrations (×4), public API surfaces (×2), no-test-change (×2.5), complexity delta (×1.5), UI-critical paths (×1.5), loc changed (×1).
|
|
433
433
|
1. Launch **code-reviewer** agents in parallel. Reviewer set depends on which CLI is hosting the pipeline:
|
|
434
|
-
- **Claude Code** (2 reviewers): `claude-
|
|
435
|
-
- **Copilot CLI** (3 reviewers): `gpt-5.4` (edge cases, different perspective) + `claude-opus-4
|
|
436
|
-
- Triage (
|
|
434
|
+
- **Claude Code** (2 reviewers): `claude-fable-5` (deep security + architecture) + `claude-sonnet-4-6` (quality + correctness)
|
|
435
|
+
- **Copilot CLI** (3 reviewers): `gpt-5.4` (edge cases, different perspective) + `claude-opus-4-8` + `claude-sonnet-4-6` (Fable 5 is not offered on Copilot CLI)
|
|
436
|
+
- Triage: single top-tier pass over merged findings (`claude-fable-5` on Claude Code, `claude-opus-4-8` on Copilot CLI)
|
|
437
437
|
2. Collect findings, classify:
|
|
438
438
|
- 🔴 **Blocking** → must fix → back to Phase 3 (max 3 iterations)
|
|
439
439
|
- 🟡 **Important** → fix and re-review
|
|
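Step 0 of Phase 4 in the hunk above lists the diff-risk signals and their multipliers but not how `diff-risk-score.mjs` combines them. The sketch below is one possible reading: it assumes each signal is normalized to the range 0..1 and that the score is a plain weighted sum. The function names, signal keys, and file paths are all hypothetical.

```javascript
// Multipliers as listed in the Phase 4 text; the combination rule is assumed.
const WEIGHTS = {
  security: 3,
  schemaMigration: 4,
  publicApi: 2,
  noTestChange: 2.5,
  complexityDelta: 1.5,
  uiCritical: 1.5,
  locChanged: 1,
};

// Assumption: every signal is a number in 0..1; missing signals count as 0.
function riskScore(signals) {
  return Object.entries(WEIGHTS).reduce((sum, [name, weight]) => sum + weight * (signals[name] ?? 0), 0);
}

// Rank files by score (ties broken by path so the order is deterministic)
// and keep the top n, similar in spirit to the `--top 5` flag.
function topRisk(files, n) {
  return files
    .map((file) => ({ path: file.path, score: riskScore(file.signals) }))
    .sort((a, b) => b.score - a.score || a.path.localeCompare(b.path))
    .slice(0, n);
}

const files = [
  { path: "src/ui/Button.js", signals: { uiCritical: 1, locChanged: 0.5 } },
  { path: "src/auth/login.js", signals: { security: 1, publicApi: 1 } },
  { path: "db/migrations/001.sql", signals: { schemaMigration: 1, noTestChange: 1 } },
];
console.log(topRisk(files, 2));
```

A pure, sort-based ranking like this stays deterministic and fast, which fits the "heuristic, deterministic, sub-second" description in the text.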
@@ -872,11 +872,11 @@ First show how it works:
|
|
|
872
872
|
```
|
|
873
873
|
🤖 How it works (8 phases):
|
|
874
874
|
0. Init - Project detection, worktree creation, state file
|
|
875
|
-
1. Analysis - Codebase scan (parallel explore agents,
|
|
875
|
+
1. Analysis - Codebase scan (parallel explore agents, Fable)
|
|
876
876
|
2. Planning - Task breakdown, architecture review, user approval
|
|
877
877
|
3. Dev - TDD loop: write test → write code → build (Sonnet)
|
|
878
|
-
4. Review - Parallel review +
|
|
879
|
-
• Claude Code →
|
|
878
|
+
4. Review - Parallel review + Fable triage. Reviewer set by CLI:
|
|
879
|
+
• Claude Code → Fable + Sonnet (2 parallel)
|
|
880
880
|
• Copilot CLI → GPT-5.4 + Opus + Sonnet (3 parallel)
|
|
881
881
|
5. Test - Optional: switch to the branch, manual test in Xcode
|
|
882
882
|
6. Commit - Commit + push + PR + issue body update (PR links + progress flags)
|
|
@@ -35,7 +35,7 @@ Phase 7: Report → Short terminal summary
|
|
|
35
35
|
|
|
36
36
|
1. **Parse the input** - Normal multi-agent formats (Issue URL, Jira ID, free-text)
|
|
37
37
|
2. **Phase 0: Init** - `"mode": "dev", "autopilot": true` in `agent-state.json`
|
|
38
|
-
3. **Phase 3: Dev** - Write code + build directly with `claude-opus-4
|
|
38
|
+
3. **Phase 3: Dev** - Write code + build directly with `claude-opus-4-8`
|
|
39
39
|
4. **Phase 6: Commit** - Create automatic commit + push + PR
|
|
40
40
|
5. **Phase 7: Report** - Terminal summary
|
|
41
41
|
|