@zigrivers/scaffold 3.15.0 → 3.16.0

package/README.md CHANGED
@@ -947,23 +947,22 @@ You don't need both — Scaffold works with whichever CLIs are available. Having
947
947
  #### How mmr Works
948
948
 
949
949
  ```
950
- mmr review --pr 47 ──→ Dispatches to all channels in background
951
- Returns job ID immediately
952
- Agent continues working
950
+ # Recommended: single-command pipeline (--sync)
951
+ mmr review --pr 47 --sync ──→ Dispatches to all channels
952
+ Runs compensating passes for unavailable channels
953
+ Parses outputs, reconciles findings
954
+ Applies severity gate, derives verdict
955
+ Exit code: 0=pass, 2=blocked, 3=needs-decision
953
956
 
954
- mmr status mmr-a1b2c3 ──→ Poll progress (which channels done?)
955
- Exit code: 0=done, 1=running, 4=failed
956
-
957
- mmr results mmr-a1b2c3 ──→ Reconcile findings across channels
958
- Run compensating passes for unavailable channels
959
- Apply severity gate
960
- Output unified findings
961
- Exit code: 0=passed, 2=gate failed, 3=degraded
957
+ # Alternative: step-by-step (for async workflows)
958
+ mmr review --pr 47 ──→ Dispatch and await all channels
959
+ mmr results mmr-a1b2c3 ──→ Reconcile findings, output verdict
962
960
  ```
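
A caller such as a CI step might interpret the `--sync` exit codes listed above along these lines. This is a minimal sketch: the names `SyncOutcome`, `interpret_sync_exit_code`, and `should_fail_ci` are hypothetical, and treating every non-zero code as a CI failure is an assumed policy, not documented `mmr` behavior.

```python
from enum import Enum


class SyncOutcome(Enum):
    PASS = "pass"
    BLOCKED = "blocked"
    NEEDS_DECISION = "needs-decision"
    UNKNOWN = "unknown"  # hypothetical bucket for codes the README does not list


# Exit codes exactly as listed for `mmr review --sync` in the block above.
_EXIT_CODE_MEANINGS = {
    0: SyncOutcome.PASS,
    2: SyncOutcome.BLOCKED,
    3: SyncOutcome.NEEDS_DECISION,
}


def interpret_sync_exit_code(code: int) -> SyncOutcome:
    """Map a process exit code to the outcome named in the README."""
    return _EXIT_CODE_MEANINGS.get(code, SyncOutcome.UNKNOWN)


def should_fail_ci(code: int) -> bool:
    """Assumed policy: anything other than a pass fails the CI step."""
    return interpret_sync_exit_code(code) is not SyncOutcome.PASS
```

In practice the code would come from something like `subprocess.run(["mmr", "review", "--pr", "47", "--sync"]).returncode`.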
963
961
 
964
962
  **Key features:**
965
963
 
966
- - **Async job model** — reviews run in background processes. The agent fires `mmr review` and continues working. No blocking for 4-6 minutes.
964
+ - **--sync mode** — single-command pipeline: dispatch, parse, reconcile, verdict. The recommended entry point for agents and CI.
965
+ - **Compensating passes** — when a channel is unavailable, a Claude-based review focused on that channel's strength area runs automatically.
967
966
  - **Per-channel auth verification** — checks authentication before every dispatch. Auth failures are never silent — `mmr` tells you exactly what expired and the command to fix it.
968
967
  - **Immutable core prompt** — every channel gets the same severity definitions (P0-P3), output format spec (JSON), and review criteria. No prompt drift between channels.
969
968
  - **Automated reconciliation** — when two channels flag the same location, that's consensus (high confidence). When only one channel flags something, it's unique (medium confidence). P0 from any single source is always high confidence.
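
The confidence rule in the last bullet can be sketched as follows. This is an illustration under stated assumptions, not the package's reconciliation code: findings are plain dicts, "same location" means an exact string match, and the optional `compensating` flag (hypothetical) marks findings from compensating passes, which the skill docs treat as single-source.

```python
from collections import defaultdict


def assign_confidence(findings):
    """Group findings by location and label each group high or medium confidence.

    Each finding is a dict with 'channel', 'location', 'severity' (P0-P3) and an
    optional boolean 'compensating'. Exact-match grouping is an assumption.
    """
    by_location = defaultdict(list)
    for finding in findings:
        by_location[finding["location"]].append(finding)

    reconciled = []
    for location, group in by_location.items():
        # Only real channels count toward consensus.
        real_channels = {f["channel"] for f in group if not f.get("compensating")}
        # "P0" < "P1" < ... as strings, so min() picks the most severe.
        severity = min(f["severity"] for f in group)
        if len(real_channels) >= 2 or severity == "P0":
            confidence = "high"
        else:
            confidence = "medium"
        reconciled.append({
            "location": location,
            "severity": severity,
            "channels": sorted({f["channel"] for f in group}),
            "confidence": confidence,
        })
    return reconciled
```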
@@ -1079,6 +1078,14 @@ You can also adjust per-channel timeouts, the default severity threshold, and na
1079
1078
  **After creating a PR:**
1080
1079
 
1081
1080
  ```bash
1081
+ # Recommended: single-command review
1082
+ mmr review --pr 47 --sync --focus "auth flow, session handling"
1083
+ # → Full review output with verdict and findings
1084
+
1085
+ # Or with text output for readability:
1086
+ mmr review --pr 47 --sync --format text
1087
+
1088
+ # Step-by-step (when you want to continue working while review runs):
1082
1089
  mmr review --pr 47 --focus "auth flow, session handling"
1083
1090
  # → Job mmr-a1b2c3 started. 2/2 channels dispatched.
1084
1091
  ```
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: automated-review-tooling
3
- description: Patterns for setting up automated PR code review using AI models (Codex, Gemini) via local CLI, including dual-model review, reconciliation, and CI integration
3
+ description: Patterns for automated PR code review using AI CLI tools (Codex, Gemini, Claude), including orchestration, reconciliation, compensating passes, and CI integration
4
4
  topics: [code-review, automation, codex, gemini, pull-requests, ci-cd, review-tooling]
5
5
  ---
6
6
 
@@ -24,10 +24,10 @@ These are the authoritative verdict definitions. Tool files (`review-code.md`, `
24
24
 
25
25
  | Verdict | Condition |
26
26
  |---------|-----------|
27
- | `pass` | All configured channels ran, no unresolved P0/P1/P2 |
28
- | `degraded-pass` | Channels skipped, compensated, or have non-full coverage (e.g., partial timeout), no unresolved P0/P1/P2 |
29
- | `blocked` | Unresolved P0/P1/P2 after 3 fix rounds |
30
- | `needs-user-decision` | Contradictions or unresolvable findings |
27
+ | `pass` | All channels completed, no unresolved P0/P1/P2 |
28
+ | `degraded-pass` | Some channels unavailable, compensating passes ran, no unresolved P0/P1/P2 |
29
+ | `blocked` | Findings at or above fix threshold remain unresolved |
30
+ | `needs-user-decision` | No channels completed; insufficient data for a determination |
31
31
 
32
32
  **Verdict precedence:** `needs-user-decision` > `blocked` > `degraded-pass` > `pass`. When multiple conditions apply, the higher-precedence verdict wins.
33
33
 
@@ -35,13 +35,13 @@ These are the authoritative verdict definitions. Tool files (`review-code.md`, `
35
35
 
36
36
  #### Status Model
37
37
 
38
- `compensating` is a **coverage label** applied to a channel's output, not a replacement for the root-cause status. Each channel retains its root-cause status (`not_installed`, `auth_failed`, `auth_timeout`, `failed`) AND gains a coverage label (`compensating (X-equivalent)`) when a compensating pass ran. The fix cycle uses the **root-cause status** to decide whether to retry (never retry `not_installed`, `auth_failed`, `auth_timeout`). The report uses the **coverage label** to show the reader what ran.
38
+ `compensating` is a **coverage label** applied to a channel's output, not a replacement for the root-cause status. Each channel retains its root-cause status (`not_installed`, `auth_failed`, `timeout`, `failed`) AND gains a coverage label (`compensating (X-equivalent)`) when a compensating pass ran. The fix cycle uses the **root-cause status** to decide whether to retry (never retry `not_installed`, `auth_failed`, `timeout`). The report uses the **coverage label** to show the reader what ran.
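
One way to picture the two-field model is a record that keeps the root-cause status and the coverage label side by side. The class name, field names, and report format below are hypothetical; only the status values and the `compensating (X-equivalent)` label shape come from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChannelStatus:
    name: str                             # e.g. "Codex" or "Gemini"
    root_cause: str                       # "completed", "not_installed", "auth_failed", "timeout", "failed"
    coverage_label: Optional[str] = None  # set only when a compensating pass ran

    def mark_compensated(self) -> None:
        """Attach the coverage label without touching the root-cause status."""
        self.coverage_label = f"compensating ({self.name}-equivalent)"

    def report_line(self) -> str:
        """Assumed report format: show the root cause and, if present, the label."""
        line = f"{self.name} channel: {self.root_cause}"
        if self.coverage_label:
            line += f"; coverage: {self.coverage_label}"
        return line
```

Keeping the fields separate lets the fix cycle read `root_cause` while the report reads `coverage_label`, so neither use overwrites the other.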
39
39
 
40
40
  #### Compensating Passes
41
41
 
42
- When an external channel (Codex or Gemini) is unavailable, run a compensating Claude self-review pass:
42
+ When a channel (Codex or Gemini) is unavailable, the CLI dispatches a compensating pass via `claude -p`:
43
43
 
44
- - Same prompt structure as the missing channel, executed as a Claude self-review pass.
44
+ - Same prompt structure as the missing channel, executed as a `claude -p` dispatch.
45
45
  - Labeled `[compensating: Codex-equivalent]` or `[compensating: Gemini-equivalent]` in the review summary.
46
46
  - Missing Codex → focus on implementation correctness, security, API contracts.
47
47
  - Missing Gemini → focus on architectural patterns, design reasoning, broad context.
@@ -49,8 +49,6 @@ When an external channel (Codex or Gemini) is unavailable, run a compensating Cl
49
49
  - Compensating-pass findings are **single-source confidence** — they do NOT raise to high confidence even if they agree with another channel's findings.
50
50
  - Normal mandatory-fix thresholds apply: P0/P1/P2 findings from compensating passes still require fixing.
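
The bullets above could translate into a small helper that prepares a compensating pass. This is a sketch only: `build_compensating_pass` and the prompt wording are hypothetical, and the helper builds the `claude -p` argument list without running it.

```python
# Focus areas per missing channel, as listed in the bullets above.
FOCUS_AREAS = {
    "Codex": "implementation correctness, security, API contracts",
    "Gemini": "architectural patterns, design reasoning, broad context",
}


def build_compensating_pass(missing_channel: str, base_prompt: str) -> dict:
    """Describe the compensating pass for an unavailable channel.

    The prompt layout is an assumption; the docs only say the pass reuses the
    missing channel's prompt structure and focuses on its strength area.
    """
    if missing_channel not in FOCUS_AREAS:
        raise ValueError(f"no compensating pass defined for {missing_channel!r}")
    focus = FOCUS_AREAS[missing_channel]
    prompt = f"{base_prompt}\n\nFocus on: {focus}."
    return {
        "label": f"[compensating: {missing_channel}-equivalent]",
        "argv": ["claude", "-p", prompt],
        # Compensating findings never count as consensus.
        "confidence": "single-source",
    }
```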
51
51
 
52
- **Superpowers channel:** No compensating pass needed — Superpowers is a Claude subagent and is always available. If the Superpowers plugin is not installed, run available external CLIs and warn the user that review coverage is reduced.
53
-
54
52
  #### Foreground-Only Execution
55
53
 
56
54
  Always run Codex and Gemini CLI commands as foreground Bash calls. Never use `run_in_background`, `&`, or `nohup`. Background execution produces empty or truncated output from Codex and Gemini CLIs. Multiple foreground calls can still run in parallel if the tool runner supports parallel tool invocations.
@@ -67,7 +65,7 @@ Reconciliation normalizes findings from all channels (real and compensating) to
67
65
 
68
66
  The reconciliation output is a deduplicated list of findings with confidence scores. High-confidence findings (agreed by 2+ real channels) are actionable without further discussion. Low-confidence findings (single-source, or from compensating passes) still require action at P0/P1/P2 but should be noted as lower-confidence in the review summary.
69
67
 
70
- Findings that appear in all three channels (Codex, Gemini, Superpowers) are considered maximum-confidence and should be surfaced first in the review summary. Findings that appear in only one channel should include the channel name in the finding description to help the developer assess confidence independently.
68
+ Findings that appear in all three channels (Codex, Gemini, Claude) are considered maximum-confidence and should be surfaced first in the review summary. Findings that appear in only one channel should include the channel name in the finding description to help the developer assess confidence independently.
71
69
 
72
70
  ```bash
73
71
  # Orchestration reconciliation workflow
@@ -80,16 +78,15 @@ Findings that appear in all three channels (Codex, Gemini, Superpowers) are cons
80
78
 
81
79
  ### Channel Dispatch Pattern and Orchestration
82
80
 
83
- Each external channel (Codex, Gemini) follows the same dispatch pattern: check installation, check auth, then dispatch as a foreground call. If any step fails, record the root-cause status, queue a compensating pass, and continue to the next channel. The Superpowers channel is always available as a Claude subagent and does not require installation or auth checks.
81
+ Each channel (Codex, Gemini, Claude) follows the same dispatch pattern: check installation, check auth, then dispatch as a foreground call. If any step fails, record the root-cause status, queue a compensating pass (for Codex/Gemini), and continue to the next channel.
84
82
 
85
83
  ```bash
86
84
  # Channel dispatch pattern
87
- # For each external channel (Codex, Gemini):
85
+ # For each channel (codex, gemini, claude):
88
86
  # 1. command -v <tool> >/dev/null 2>&1 || { status=not_installed; queue_compensating; continue; }
89
87
  # 2. <auth_check> || { status=auth_failed; queue_compensating; continue; }
90
88
  # 3. <dispatch_foreground> || { status=failed; queue_compensating; continue; }
91
- # For Superpowers: dispatch subagent (always available)
92
- # After all: run queued compensating passes → reconcile → verdict
89
+ # After all: run queued compensating passes (via claude -p) → reconcile → verdict
93
90
  ```
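
The same pattern, written as a Python sketch for readers who prefer it to the commented shell outline. The `auth_ok` and `run_foreground` callables are hypothetical stand-ins for the real auth check and dispatch; only the status names and the Codex/Gemini-only compensating rule come from the text above.

```python
import shutil

# Only these channels get a compensating pass, per the paragraph above.
COMPENSATABLE = {"codex", "gemini"}


def dispatch_channels(channels, auth_ok, run_foreground,
                      is_installed=lambda tool: shutil.which(tool) is not None):
    """Run each channel in order; return (statuses, compensating_queue).

    auth_ok(tool) and run_foreground(tool) are caller-supplied callables that
    return True on success. A failure records the root-cause status, queues a
    compensating pass where applicable, and moves on to the next channel.
    """
    statuses = {}
    compensating_queue = []
    for tool in channels:
        if not is_installed(tool):
            status = "not_installed"
        elif not auth_ok(tool):
            status = "auth_failed"
        elif not run_foreground(tool):
            status = "failed"
        else:
            status = "completed"
        statuses[tool] = status
        if status != "completed" and tool in COMPENSATABLE:
            compensating_queue.append(tool)
    return statuses, compensating_queue
```

Injecting the three checks as callables keeps the loop testable without any of the CLIs installed.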
94
91
 
95
92
  After all channels and compensating passes complete, run the reconciliation workflow above and apply the verdict decision flow. Channel results and compensating-pass labels must be preserved in the review output for auditability — do not collapse or omit them even when findings are empty.
@@ -99,14 +96,14 @@ After all channels and compensating passes complete, run the reconciliation work
99
96
  When Codex is unavailable (not installed or auth failure), the orchestration proceeds as follows:
100
97
 
101
98
  1. The installation check (`command -v codex`) fails. Codex channel status is set to `not_installed`.
102
- 2. A compensating Codex-equivalent pass is queued: a Claude self-review focused on implementation correctness, security, and API contracts.
103
- 3. Gemini and Superpowers channels run normally.
99
+ 2. A compensating Codex-equivalent pass is queued: a `claude -p` dispatch focused on implementation correctness, security, and API contracts.
100
+ 3. Gemini and Claude channels run normally.
104
101
  4. The compensating pass runs, producing findings labeled `[compensating: Codex-equivalent]`.
105
- 5. Reconciliation merges findings from all three sources (Gemini, Superpowers, compensating-Codex).
102
+ 5. Reconciliation merges findings from all three sources (Gemini, Claude, compensating-Codex).
106
103
  6. Maximum achievable verdict is `degraded-pass` because a real channel was absent.
107
104
  7. The review summary notes: "Codex channel: not_installed (compensating: Codex-equivalent pass ran)."
108
105
 
109
- **Fix-cycle channel rule:** Only re-run channels that originally completed or ran as compensating passes. `failed` channels are covered by their compensating pass and are not retried during fix rounds. Never retry a channel with status `not_installed`, `auth_failed`, or `auth_timeout` — these indicate persistent environment conditions that will not resolve between fix rounds.
106
+ **Fix-cycle channel rule:** Only re-run channels that originally completed or ran as compensating passes. `failed` channels are covered by their compensating pass and are not retried during fix rounds. Never retry a channel with status `not_installed`, `auth_failed`, or `timeout` — these indicate persistent environment conditions that will not resolve between fix rounds.
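
Read literally, the rule selects fix-round targets as sketched below. The function name and the shape of the `channels` mapping are hypothetical; the selection logic follows the rule as stated.

```python
def fix_round_targets(channels):
    """Pick what to re-run in a fix round.

    channels maps a channel name to {'root_cause': str, 'compensated': bool}.
    Returns (name, mode) pairs where mode is 'real' or 'compensating'.
    """
    targets = []
    for name, info in channels.items():
        if info["root_cause"] == "completed":
            # The real tool ran successfully, so it runs again.
            targets.append((name, "real"))
        elif info.get("compensated"):
            # not_installed / auth_failed / timeout / failed: never retry the
            # real tool; re-run its compensating pass instead.
            targets.append((name, "compensating"))
    return targets
```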
110
107
 
111
108
  ### Verdict Decision Flow
112
109
 
@@ -114,19 +111,17 @@ Apply the following evaluation order to determine the final verdict. The first m
114
111
 
115
112
  ```
116
113
  Verdict evaluation order:
117
- 1. Any contradictions or unresolvable findings? → needs-user-decision
114
+ 1. No channels completed? → needs-user-decision
118
115
  2. Any unresolved P0/P1/P2 after 3 fix rounds? → blocked
119
116
  3. Any channel not at full coverage? → degraded-pass
120
117
  4. All channels completed, no unresolved P0/P1/P2? → pass
121
118
  ```
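
As a sketch, the evaluation order can be expressed as a first-match function. The signature is hypothetical, and it assumes the function is called only after fix rounds are exhausted, so any remaining finding at or above the threshold blocks.

```python
def derive_verdict(channel_states, unresolved_severities, fix_threshold="P2"):
    """Return the verdict using the first-match order listed above.

    channel_states maps a channel name to its terminal state ("completed",
    "timeout", "failed", "not_installed", ...). unresolved_severities lists the
    severities (P0-P3) still open after the final fix round.
    """
    # 1. No channels completed?
    if not any(state == "completed" for state in channel_states.values()):
        return "needs-user-decision"

    # 2. Any unresolved finding at or above the threshold (P0 is most severe)?
    limit = int(fix_threshold[1:])
    if any(int(sev[1:]) <= limit for sev in unresolved_severities):
        return "blocked"

    # 3. Any channel not at full coverage?
    if any(state != "completed" for state in channel_states.values()):
        return "degraded-pass"

    # 4. Everything completed and nothing blocking remains.
    return "pass"
```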
122
119
 
123
- A "contradiction" exists when two channels report opposite conclusions about the same code location; for example, Codex flags a function as insecure while Gemini explicitly approves it. Contradictions cannot be resolved by the agent alone and must be surfaced to the user.
124
-
125
- A channel is "not at full coverage" when: it ran as a compensating pass instead of a real tool, it timed out partially, or the Superpowers plugin is not installed and available channels do not cover the full diff.
120
+ A channel is "not at full coverage" when: it ran as a compensating pass instead of a real tool, or it timed out.
126
121
 
127
- **Verdict precedence reminder:** `needs-user-decision` > `blocked` > `degraded-pass` > `pass`. If multiple conditions apply simultaneously (for example, both a contradiction and an unresolved P0 exist), the higher-precedence verdict wins.
122
+ **Verdict precedence reminder:** `needs-user-decision` > `blocked` > `degraded-pass` > `pass`. When multiple conditions apply simultaneously, the higher-precedence verdict wins.
128
123
 
129
- The verdict is always computed after all fix rounds are exhausted — do not emit a partial verdict mid-cycle. If a fix round resolves all P0/P1/P2 findings and no contradictions remain, the verdict upgrades from `blocked` to `pass` or `degraded-pass` depending on channel coverage. This upgrade must be verified explicitly by re-running the reconciliation step after each fix round, not assumed from the fact that fixes were applied.
124
+ The verdict is always computed after all fix rounds are exhausted — do not emit a partial verdict mid-cycle. If a fix round resolves all P0/P1/P2 findings, the verdict upgrades from `blocked` to `pass` or `degraded-pass` depending on channel coverage. This upgrade must be verified explicitly by re-running the reconciliation step after each fix round, not assumed from the fact that fixes were applied.
130
125
 
131
126
  ### Security-Focused Review Checklist
132
127
 
@@ -197,4 +192,4 @@ When external CLIs are unavailable, the degraded-mode behavior defined in the Su
197
192
  5. When both external channels are unavailable, note "All findings are single-model (Claude only). External validation was unavailable." in the review summary.
198
193
  6. Never silently drop unavailable channels — always record the channel status and compensating coverage label in the review output.
199
194
 
200
- **Superpowers channel exception:** Superpowers is a Claude subagent and requires no external CLI or auth. It is always available as long as the Superpowers plugin is installed in the Claude Code environment. If the plugin is not installed, run available external CLIs and warn the user that review coverage is reduced, but do not run a compensating pass for Superpowers (the compensating-pass mechanism only applies to external CLIs that have an installation/auth gate).
195
+ **Claude CLI channel:** Claude CLI handles its own auth and is generally always available. The compensating-pass mechanism applies to external CLIs (Codex, Gemini) that have an installation/auth gate. When Codex or Gemini are unavailable, compensating passes are dispatched via `claude -p` with focused prompts targeting the missing channel's strength area.
@@ -1,27 +1,18 @@
1
1
  ---
2
2
  name: multi-model-review-dispatch
3
- description: Patterns for dispatching reviews to external AI models (Codex, Gemini) at depth 4+, including fallback strategies and finding reconciliation
4
- topics: [multi-model, code-review, depth-scaling, codex, gemini, review-synthesis]
3
+ description: Patterns for dispatching reviews to AI CLI tools (Codex, Gemini, Claude), including fallback strategies and finding reconciliation
4
+ topics: [multi-model, code-review, codex, gemini, claude, review-synthesis]
5
5
  ---
6
6
 
7
7
  # Multi-Model Review Dispatch
8
8
 
9
- At higher methodology depths (4+), reviews benefit from independent validation by external AI models. Different models have different blind spots — Codex excels at code-centric analysis while Gemini brings strength in design and architectural reasoning. Dispatching to multiple models and reconciling their findings produces higher-quality reviews than any single model alone. This knowledge covers when to dispatch, how to dispatch, how to handle failures, and how to reconcile disagreements.
9
+ Reviews benefit from independent validation by multiple AI models. Different models have different blind spots — Codex excels at code-centric analysis, Gemini brings strength in design and architectural reasoning, and Claude provides plan alignment and code quality assessment. Dispatching to multiple models and reconciling their findings produces higher-quality reviews than any single model alone. This knowledge covers how to dispatch, how to handle failures, and how to reconcile disagreements.
10
10
 
11
11
  ## Summary
12
12
 
13
13
  ### When to Dispatch
14
14
 
15
- Multi-model review activates at depth 4+ in the methodology scaling system:
16
-
17
- | Depth | Review Approach |
18
- |-------|----------------|
19
- | 1-2 | Claude-only, reduced pass count |
20
- | 3 | Claude-only, full pass count |
21
- | 4 | Full passes + one external model (if available) |
22
- | 5 | Full passes + multi-model with reconciliation |
23
-
24
- Dispatch is always optional. If no external model CLI is available, the review proceeds as a Claude-only enhanced review with additional self-review passes to partially compensate.
15
+ Multi-model review runs all enabled channels on every review. The MMR CLI (`mmr review --sync`) is the primary entry point and handles dispatch, parsing, reconciliation, and verdict derivation automatically.
25
16
 
26
17
  ### Model Selection
27
18
 
@@ -29,15 +20,16 @@ Dispatch is always optional. If no external model CLI is available, the review p
29
20
  |-------|----------|----------|
30
21
  | **Codex** (OpenAI) | Code analysis, implementation correctness, API contract validation | Code reviews, security reviews, API reviews, database schema reviews |
31
22
  | **Gemini** (Google) | Design reasoning, architectural patterns, broad context understanding | Architecture reviews, PRD reviews, UX reviews, domain model reviews |
23
+ | **Claude** (Anthropic) | Plan alignment, code quality, testing thoroughness | Code reviews, plan verification, test coverage |
32
24
 
33
- When both models are available at depth 5, dispatch to both and reconcile. At depth 4, choose the model best suited to the artifact type.
25
+ All enabled channels run on every review. When a channel is unavailable, a compensating pass is dispatched via `claude -p` focused on the missing channel's strength area.
34
26
 
35
27
  ### Graceful Fallback
36
28
 
37
29
  External models are never required. The fallback chain:
38
30
  1. Attempt dispatch to selected model(s)
39
31
  2. If CLI unavailable → skip that model, note in report
40
- 3. If timeout → use partial results if any, note incompleteness
32
+ 3. If timeout → CLI kills the process; no partial output preserved; compensating pass runs
41
33
  4. If all external models fail → Claude-only enhanced review (additional self-review passes)
42
34
 
43
35
  The review never blocks on external model availability.
@@ -82,15 +74,15 @@ If auth fails, report status `auth_failed` and surface recovery to the user:
82
74
  - Codex: "Codex auth expired — run `! codex login` to re-authenticate"
83
75
  - Gemini: "Gemini auth expired — run `! gemini -p \"hello\"` to re-authenticate"
84
76
 
85
- If auth check times out (~5 seconds), retry once. If still failing, report `auth_timeout`.
77
+ If auth check times out (~5 seconds), retry once. If still failing, report `timeout`.
86
78
  If auth succeeds, report `ready` and proceed to dispatch.
87
79
 
88
80
  **Post-dispatch terminal states:**
89
81
  - `completed` — channel produced results, use normally
90
- - `partial_timeout` — partial output before timeout; use what was received, note incompleteness. Does NOT trigger compensating pass.
91
- - `failed` — crashed or unparseable output; triggers compensating pass.
82
+ - `timeout` — channel exceeded time limit; CLI kills the process and marks it as `timeout`; triggers compensating pass
83
+ - `failed` — crashed or unparseable output; triggers compensating pass
92
84
 
93
- Verdict impact: `partial_timeout` and `failed` channels mean the review is degraded. Maximum verdict is `degraded-pass` when any channel has a non-`completed` terminal state.
85
+ Verdict impact: `timeout` and `failed` channels mean the review is degraded. Maximum verdict is `degraded-pass` when any channel has a non-`completed` terminal state.
94
86
 
95
87
  #### Prompt Formatting
96
88
 
@@ -126,10 +118,12 @@ Respond with a JSON array of findings:
126
118
  "severity": "P0|P1|P2|P3",
127
119
  "category": "coverage|consistency|correctness|completeness",
128
120
  "location": "section or line reference",
129
- "finding": "description of the issue",
121
+ "description": "description of the issue",
130
122
  "suggestion": "recommended fix"
131
123
  }
132
124
  ]
125
+
126
+ Note: `id` and `category` are optional — the CLI auto-generates IDs (F-001, F-002, ...) when omitted.
133
127
  ```
134
128
 
135
129
  #### Output Parsing
@@ -137,15 +131,11 @@ Respond with a JSON array of findings:
137
131
  External model output is parsed as JSON. Handle common parsing issues:
138
132
  - Strip markdown code fences (```json ... ```) if the model wraps output
139
133
  - Handle trailing commas in JSON arrays
140
- - Validate that each finding has the required fields (severity, category, finding)
134
+ - Validate that each finding has the required fields (severity, location, description, suggestion)
141
135
  - Discard malformed entries rather than failing the entire parse
142
136
 
143
- Store raw output for audit:
144
- ```
145
- docs/reviews/{artifact}/codex-review.json — raw Codex findings
146
- docs/reviews/{artifact}/gemini-review.json — raw Gemini findings
147
- docs/reviews/{artifact}/review-summary.md — reconciled synthesis
148
- ```
137
+ The CLI stores raw output at `~/.mmr/jobs/{job-id}/` per channel. Review results
138
+ are available via `mmr results <job-id>`.
149
139
 
150
140
  ### Timeout Handling
151
141
 
@@ -158,14 +148,7 @@ External model calls can hang or take unreasonably long. Set reasonable timeouts
158
148
  | Medium artifact review (2000-10000 words) | 120 seconds | Needs more processing time |
159
149
  | Large artifact review (>10000 words) | 180 seconds | Maximum reasonable wait |
160
150
 
161
- #### Partial Result Handling
162
-
163
- If a timeout occurs mid-response:
164
- 1. Check if the partial output contains valid JSON entries
165
- 2. If yes, use the valid entries and note "partial results" in the report
166
- 3. If no, treat as a model failure and fall back
167
-
168
- Never wait indefinitely. A review that completes in 3 minutes with Claude-only findings is better than one that blocks for 10 minutes waiting for an external model.
151
+ Never wait indefinitely. A review that completes in 3 minutes with Claude-only findings is better than one that blocks for 10 minutes waiting for an external model. When a channel times out, the CLI kills the process — no partial output is preserved. A compensating pass runs in its place.
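
The kill-on-timeout behavior described here matches what Python's `subprocess.run` does with a `timeout`, so a wrapper can be sketched as follows. `run_channel` and its return shape are hypothetical; the status names come from this document, and mapping any non-zero exit to `failed` is an assumption.

```python
import subprocess


def run_channel(argv, timeout_seconds):
    """Run a channel command in the foreground and return (status, stdout).

    On timeout, subprocess.run kills the child and raises TimeoutExpired; this
    wrapper discards any partial output, as the text above describes.
    """
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        return "timeout", None
    except FileNotFoundError:
        # The executable does not exist on PATH.
        return "not_installed", None
    if proc.returncode != 0:
        return "failed", None
    return "completed", proc.stdout
```

A caller would queue a compensating pass for any status other than `completed`.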
169
152
 
170
153
  ### Finding Reconciliation
171
154
 
@@ -224,9 +207,9 @@ When synthesizing multi-model findings, classify each finding:
224
207
  # Multi-Model Review Summary: [Artifact Name]
225
208
 
226
209
  ## Models Used
227
- - Claude (primary reviewer)
228
- - Codex (external, depth 4+) — [available/unavailable/timeout]
229
- - Gemini (external, depth 5) — [available/unavailable/timeout]
210
+ - Claude CLI — [available/unavailable/timeout]
211
+ - Codex CLI — [available/unavailable/timeout]
212
+ - Gemini CLI — [available/unavailable/timeout]
230
213
 
231
214
  ## Consensus Findings
232
215
  | # | Severity | Finding | Models | Confidence |
@@ -251,14 +234,10 @@ or areas where external models provided unique value]
251
234
 
252
235
  #### Raw JSON Preservation
253
236
 
254
- Always preserve the raw JSON output from external models, even after reconciliation. The raw findings serve as an audit trail and enable re-analysis if the reconciliation logic is later improved.
237
+ Always preserve the raw JSON output from each channel, even after reconciliation. The raw findings serve as an audit trail and enable re-analysis if the reconciliation logic is later improved.
255
238
 
256
- ```
257
- docs/reviews/{artifact}/
258
- codex-review.json — raw output from Codex
259
- gemini-review.json — raw output from Gemini
260
- review-summary.md — reconciled synthesis
261
- ```
239
+ The CLI stores raw output at `~/.mmr/jobs/{job-id}/` with per-channel result files.
240
+ Results are accessible via `mmr results <job-id>`.
262
241
 
263
242
  ### Quality Gates
264
243
 
@@ -266,20 +245,18 @@ Minimum standards for a multi-model review to be considered complete:
266
245
 
267
246
  | Gate | Threshold | Rationale |
268
247
  |------|-----------|-----------|
269
- | Minimum finding count | At least 3 findings across all models | A review with zero findings likely missed something |
270
- | Coverage threshold | Every review pass has at least one finding or explicit "no issues found" note | Ensures all passes were actually executed |
248
+ | Coverage threshold | Every channel has at least one finding or explicit "no issues found" note | Ensures all channels were actually executed |
271
249
  | Reconciliation completeness | All cross-model disagreements have documented resolutions | No unresolved conflicts |
272
- | Raw output preserved | JSON files exist for all models that were dispatched | Audit trail |
250
+ | Raw output preserved | Per-channel results exist for all dispatched channels | Audit trail |
273
251
 
274
- If the primary Claude review produces zero findings and external models are unavailable, the review should explicitly note this as unusual and recommend a targeted re-review at a later stage.
252
+ Zero findings across all channels is a valid outcome when the diff is clean.
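
The coverage threshold gate could be checked with a helper like the one below. The function name and the `no_issues_note` flag are hypothetical; the rule itself (each channel has at least one finding or an explicit "no issues found" note) comes from the table above.

```python
def coverage_gate_failures(channel_outputs):
    """Return the channels that have neither findings nor an explicit note.

    channel_outputs maps a channel name to
    {'findings': list, 'no_issues_note': bool}. An empty result means the
    gate passes, including the valid case where every channel reports a
    clean diff through an explicit note.
    """
    failures = []
    for name, output in channel_outputs.items():
        has_findings = bool(output.get("findings"))
        has_note = bool(output.get("no_issues_note"))
        if not has_findings and not has_note:
            failures.append(name)
    return failures
```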
275
253
 
276
254
  #### Degraded-Mode Gate Adaptation
277
255
 
278
256
  When channels are skipped and compensating passes are used:
279
257
 
280
- - **Minimum finding count** gate: compensating passes count toward the total but are not treated as separate external channels for consensus purposes.
281
258
  - **Reconciliation completeness** gate (cross-model disagreement documentation): applies whenever 2+ distinct model perspectives participate (Claude + one external counts). N/A only when Claude is the sole perspective (no external models and no compensating passes that introduce genuinely different framing).
282
- - **Coverage threshold** gate: compensating passes satisfy the "every pass has at least one finding or explicit no-issues note" requirement.
259
+ - **Coverage threshold** gate: compensating passes satisfy the "every channel has at least one finding or explicit no-issues note" requirement.
283
260
  - The reconciled output must record which channels were real, which were compensating, and which were skipped, so the orchestration layer can apply appropriate verdict logic.
284
261
 
285
262
  ### Common Anti-Patterns
@@ -288,12 +265,10 @@ When channels are skipped and compensating passes are used:
288
265
 
289
266
  **Ignoring disagreements.** Two models disagree, and the reviewer picks one without analysis. Fix: disagreements are the most valuable signal in multi-model review. They identify areas of genuine ambiguity or complexity. Always investigate and document the resolution.
290
267
 
291
- **Dispatching at low depth.** Running external model reviews at depth 1-2 where the review scope is intentionally minimal. The external model does a full analysis anyway, producing findings that are out of scope. Fix: only dispatch at depth 4+. Lower depths use Claude-only review with reduced pass count.
292
-
293
- **No fallback plan.** The review pipeline assumes external models are always available. When Codex is down, the review fails entirely. Fix: external dispatch is always optional. The fallback to Claude-only enhanced review must be implemented and tested.
268
+ **No fallback plan.** The review pipeline assumes external models are always available. When Codex is down, the review fails entirely. Fix: external dispatch is always optional. The CLI automatically dispatches compensating passes via `claude -p` when channels are unavailable.
294
269
 
295
270
  **Over-weighting consensus.** Two models agree on a finding, so it must be correct. But both models may share the same bias (e.g., both flag a pattern as an anti-pattern that is actually appropriate for this project's constraints). Fix: consensus increases confidence but does not guarantee correctness. All findings still require artifact-level verification.
296
271
 
297
272
  **Dispatching the full pipeline context.** Sending the entire project context (all docs, all code) to the external model. This exceeds context limits and dilutes focus. Fix: send only the artifact under review and the minimal upstream context needed for that specific review.
298
273
 
299
- **Ignoring partial results.** A model times out after producing 3 of 5 findings. The reviewer discards all results because the review is "incomplete." Fix: partial results are still valuable. Include them with a note about incompleteness. Three real findings are better than zero.
274
+ **Treating a timeout as a silent skip.** A channel times out and the reviewer proceeds without documenting it. Fix: when a channel times out, record the root-cause status as `timeout`, queue a compensating pass, and include it in the review summary. The CLI kills timed-out processes, so no partial output is available, but the compensating pass ensures coverage.
@@ -26,7 +26,7 @@ comprehensive quality check before releasing or handing off the project.
26
26
  The three channels are:
27
27
  1. **Codex CLI** — Implementation correctness, security, API contracts
28
28
  2. **Gemini CLI** — Design reasoning, architectural patterns, broad context
29
- 3. **Superpowers code-reviewer** — Plan alignment, code quality, testing
29
+ 3. **Claude CLI** — Plan alignment, code quality, testing
30
30
 
31
31
  ## Inputs
32
32
 
@@ -191,6 +191,7 @@ Return ALL findings as valid JSON:
191
191
  {
192
192
  "severity": "P0|P1|P2|P3",
193
193
  "category": "architecture-alignment|security|error-handling|test-coverage|complexity|dependencies",
194
+ "location": "relative/path/to/file.ts:42",
194
195
  "file": "relative/path/to/file.ts",
195
196
  "line": 42,
196
197
  "description": "Specific description of the issue",
@@ -226,7 +227,7 @@ If not installed: queue a compensating pass (implementation correctness, securit
226
227
  codex login status 2>/dev/null && echo "codex authenticated" || echo "codex NOT authenticated"
227
228
  ```
228
229
 
229
- If not authenticated: tell the user "Codex auth expired. Run: `! codex login`". Do NOT silently skip. Wait for re-auth and retry once. If auth cannot be recovered (auth_timeout or user declines): queue a compensating pass (implementation correctness, security, API contracts, labeled `[compensating: Codex-equivalent]`).
230
+ If not authenticated: tell the user "Codex auth expired. Run: `! codex login`". Do NOT silently skip. Wait for re-auth and retry once. If auth cannot be recovered (timeout or user declines): queue a compensating pass (implementation correctness, security, API contracts, labeled `[compensating: Codex-equivalent]`).
230
231
 
231
232
  If Codex fails during execution (non-zero exit, malformed output, timeout): queue a compensating pass with the same focus and label.
232
233
 
@@ -254,7 +255,7 @@ If not installed: queue a compensating pass (architectural patterns, design reas
254
255
  NO_BROWSER=true gemini -p "respond with ok" -o json 2>&1
255
256
  ```
256
257
 
257
- If exit code is 41: tell the user "Gemini auth expired. Run: `! gemini -p \"hello\"`". Do NOT silently skip. Wait for re-auth and retry once. If auth cannot be recovered (auth_timeout or user declines): queue a compensating pass (architectural patterns, design reasoning, broad context, labeled `[compensating: Gemini-equivalent]`).
258
+ If exit code is 41: tell the user "Gemini auth expired. Run: `! gemini -p \"hello\"`". Do NOT silently skip. Wait for re-auth and retry once. If auth cannot be recovered (timeout or user declines): queue a compensating pass (architectural patterns, design reasoning, broad context, labeled `[compensating: Gemini-equivalent]`).
258
259
 
259
260
  If Gemini fails during execution (non-zero exit, malformed output, timeout): queue a compensating pass with the same focus and label.
260
261
 
@@ -291,6 +292,7 @@ surfaces to this format before returning):
291
292
  {
292
293
  "severity": "P0|P1|P2|P3",
293
294
  "category": "architecture-alignment|security|error-handling|test-coverage|complexity|dependencies",
295
+ "location": "relative/path/to/file.ts:42",
294
296
  "file": "relative/path/to/file.ts",
295
297
  "line": 42,
296
298
  "description": "Specific description of the issue",
@@ -300,6 +302,10 @@ surfaces to this format before returning):
300
302
  }
301
303
  ```
302
304
 
305
+ **MMR compatibility:** The `location` field (`file:line` format) is required for
306
+ `mmr reconcile` injection. The `file` and `line` fields are retained for backward
307
+ compatibility with direct channel consumers.
308
+
303
309
  Store as `SUPERPOWERS_PHASE1_FINDINGS`.
304
310
 
305
311
  ### Step 5: Run Phase 2 — Parallel User Story Review
@@ -441,8 +447,8 @@ before returning. Then return all three channels' findings plus channel status:
441
447
  {
442
448
  "story": "[STORY_TITLE]",
443
449
  "channel_status": {
444
- "codex": { "root_cause": "null|not_installed|auth_failed|auth_timeout|failed", "coverage_status": "full|compensating" },
445
- "gemini": { "root_cause": "null|not_installed|auth_failed|auth_timeout|failed", "coverage_status": "full|compensating" },
450
+ "codex": { "root_cause": "null|not_installed|auth_failed|timeout|failed", "coverage_status": "full|compensating" },
451
+ "gemini": { "root_cause": "null|not_installed|auth_failed|timeout|failed", "coverage_status": "full|compensating" },
446
452
  "superpowers": { "root_cause": null, "coverage_status": "full" }
447
453
  },
448
454
  "codex": { "findings": [...] },
@@ -453,6 +459,29 @@ before returning. Then return all three channels' findings plus channel status:
453
459
 
454
460
  Collect findings from all subagents. Store as `PHASE2_FINDINGS`.
455
461
 
462
+ ### Step 5e: Optional — Inject Findings into MMR for Unified Reconciliation
463
+
464
+ If an MMR job exists (e.g., from a prior `mmr review` run on the same branch), the
465
+ agent can inject its post-implementation review findings into MMR for unified
466
+ reconciliation across all channels:
467
+
468
+ ```bash
469
+ # Inject Phase 1 and Phase 2 findings into an existing MMR job
470
+ # Write agent findings to a temp file for mmr reconcile
471
+ echo "$AGENT_FINDINGS" > /tmp/agent-findings.json
472
+ mmr reconcile "$JOB_ID" --channel superpowers --input /tmp/agent-findings.json
473
+ ```
474
+
475
+ All findings injected via `mmr reconcile` must use MMR-compatible schema: each
476
+ finding needs `severity` (P0-P3), `location` (file:line), and `description`
477
+ (`suggestion` is optional). The strict validator will reject findings with
478
+ missing or invalid required fields.
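The required fields described above can be checked before calling `mmr reconcile`. The following is a minimal pre-flight sketch, not the CLI's own validator: the function names are hypothetical, and the rules it enforces (a non-empty path without a trailing colon, a 1-based integer line number, a non-empty description) are assumptions about what "valid" means.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight checks; the authoritative validator is inside `mmr reconcile`.

# Succeeds when the severity is one of P0, P1, P2, P3.
is_valid_severity() {
  case "$1" in
    P0|P1|P2|P3) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds when the location looks like path:line.
# Assumption: line numbers are positive integers without leading zeros.
is_valid_location() {
  local path line
  case "$1" in
    *:*) ;;
    *) return 1 ;;
  esac
  path="${1%:*}"    # everything before the last colon
  line="${1##*:}"   # everything after the last colon
  [ -n "$path" ] || return 1
  case "$line" in
    ''|*[!0-9]*|0*) return 1 ;;
  esac
  return 0
}

# Succeeds only when all three required fields pass the checks above.
is_valid_finding() {
  local severity="$1" location="$2" description="$3"
  is_valid_severity "$severity" &&
    is_valid_location "$location" &&
    [ -n "$description" ]
}
```

For example, `is_valid_finding "P1" "src/app.ts:42" "Unchecked null dereference"` succeeds, while a finding with severity `P4` or a location without a line number fails.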
479
+
480
+ This step is optional — post-implementation review is a full-codebase review (not
481
+ diff-only), so it operates independently of `mmr review`. Use `mmr reconcile` only
482
+ when you want to merge post-implementation findings into an existing MMR job for a
483
+ single unified verdict.
484
+
456
485
  ### Step 6: Consolidate Findings
457
486
 
458
487
  Merge all findings from Phase 1 (`CODEX_PHASE1_FINDINGS`, `GEMINI_PHASE1_FINDINGS`,
@@ -656,9 +685,9 @@ the user they require manual attention before the project is ready to release.
656
685
  | Codex not installed (`command -v` fails) | Queue compensating pass (implementation correctness, security, API contracts, labeled `[compensating: Codex-equivalent]`); document as "not_installed" in report |
657
686
  | Gemini not installed (`command -v` fails) | Queue compensating pass (architectural patterns, design reasoning, broad context, labeled `[compensating: Gemini-equivalent]`); document as "not_installed" in report |
658
687
  | Codex auth expired — user recovers | Re-run auth check; proceed with full Codex channel |
659
- | Codex auth expired — user declines or auth_timeout | Queue compensating pass (implementation correctness, security, API contracts, labeled `[compensating: Codex-equivalent]`); document as "auth_failed" or "auth_timeout" in report |
688
+ | Codex auth expired — user declines or timeout | Queue compensating pass (implementation correctness, security, API contracts, labeled `[compensating: Codex-equivalent]`); document as "auth_failed" or "timeout" in report |
660
689
  | Gemini auth expired (exit 41) — user recovers | Re-run auth check; proceed with full Gemini channel |
661
- | Gemini auth expired — user declines or auth_timeout | Queue compensating pass (architectural patterns, design reasoning, broad context, labeled `[compensating: Gemini-equivalent]`); document as "auth_failed" or "auth_timeout" in report |
690
+ | Gemini auth expired — user declines or timeout | Queue compensating pass (architectural patterns, design reasoning, broad context, labeled `[compensating: Gemini-equivalent]`); document as "auth_failed" or "timeout" in report |
662
691
  | Channel fails during execution (non-zero exit, malformed output, timeout) | Queue compensating pass for that channel with same focus and label; document root cause in report |
663
692
  | Both external CLIs unavailable (any combination of not_installed / auth failure) | Run all compensating passes plus Superpowers code-reviewer; report coverage as "degraded-coverage"; warn user that review coverage is reduced |
664
693
  | Superpowers unavailable | Document as "unavailable" in report; proceed with remaining channels; Superpowers is a Claude subagent and should always be available |
@@ -23,7 +23,7 @@ anything leaves the machine.
23
23
  The three channels are:
24
24
  1. **Codex CLI** — implementation correctness, security, API contracts
25
25
  2. **Gemini CLI** — architectural patterns, broad-context reasoning
26
- 3. **Superpowers code-reviewer** — Claude subagent review of code quality, tests, and plan alignment
26
+ 3. **Claude CLI** — Claude subagent review of code quality, tests, and plan alignment
27
27
 
28
28
  ## Inputs
29
29
 
@@ -46,6 +46,31 @@ The three channels are:
46
46
 
47
47
  ## Instructions
48
48
 
49
+ ### Primary: MMR CLI + Agent Reconcile
50
+
51
+ When the MMR CLI is installed, use it as the primary entry point:
52
+
53
+ ```bash
54
+ # Staged changes
55
+ mmr review --staged --sync --format json
56
+
57
+ # Branch diff against main
58
+ mmr review --base main --sync --format json
59
+ ```
60
+
61
+ After the CLI review completes, dispatch the agent's code-reviewer skill (4th channel) and inject findings into the MMR job for unified reconciliation:
62
+
63
+ ```bash
64
+ # job_id is captured from mmr review --sync --format json output
65
+ # Write agent findings to a temp file for mmr reconcile
66
+ echo "$AGENT_FINDINGS" > /tmp/agent-findings.json
67
+ mmr reconcile "$JOB_ID" --channel superpowers --input /tmp/agent-findings.json
68
+ ```
69
+
70
+ The agent's review output must use MMR-compatible finding schema: each finding needs `severity` (P0-P3), `location` (file:line), and `description` (`suggestion` is optional).
71
+
72
+ If `mmr` is not installed (`command -v mmr` fails), fall back to the manual multi-channel flow below.
73
+
49
74
  ### Step 1: Detect Mode
50
75
 
51
76
  Parse `$ARGUMENTS` and set:
@@ -173,7 +198,7 @@ codex login status 2>/dev/null
173
198
  - If `codex` is not installed: skip this channel and record root-cause `not_installed`
174
199
  - If auth fails: tell the user to run `! codex login`, retry after recovery, and if recovery is not possible, record root-cause `auth_failed` and continue with the remaining channels
175
200
 
176
- If auth cannot be recovered, or if Codex is not installed, queue a compensating Claude self-review pass focused on implementation correctness, security, and API contracts. Label findings as `[compensating: Codex-equivalent]`. If auth check times out (~5s), retry once; if still failing, record `auth timeout` and queue compensating pass. This pass runs after all channel dispatch attempts complete.
201
+ If auth cannot be recovered, or if Codex is not installed, queue a compensating Claude self-review pass focused on implementation correctness, security, and API contracts. Label findings as `[compensating: Codex-equivalent]`. If auth check times out (~5s), retry once; if still failing, record `timeout` and queue compensating pass. This pass runs after all channel dispatch attempts complete.
177
202
 
178
203
  Build the prompt in a temporary file and pass it over stdin:
179
204
 
@@ -209,9 +234,9 @@ NO_BROWSER=true gemini -p "$(cat "$PROMPT_FILE")" --output-format json --approva
209
234
 
210
235
  If the CLI exits with a non-zero code, produces malformed/unparseable output, or is killed by the tool runner timeout, record root-cause `failed` and queue a compensating pass for that channel.
211
236
 
212
- #### Channel 3: Superpowers code-reviewer
237
+ #### Channel 3: Claude CLI
213
238
 
214
- Dispatch the `superpowers:code-reviewer` subagent.
239
+ Dispatch via `claude -p` with the review prompt.
215
240
 
216
241
  - If explicit refs are being reviewed, provide `BASE_SHA` and `HEAD_SHA`
217
242
  - Otherwise provide:
@@ -297,7 +322,7 @@ Otherwise:
297
322
  3. Repeat for up to 3 fix rounds
298
323
  4. If any finding remains unresolved after 3 rounds, stop with verdict `needs-user-decision`
299
324
 
300
- **Fix cycle channel rule:** Re-run only channels that originally completed or ran as compensating passes. Never retry a channel marked `not installed`, `auth failed`, or `auth timeout` during fix rounds — its availability does not change within a session.
325
+ **Fix cycle channel rule:** Re-run only channels that originally completed or ran as compensating passes. Never retry a channel marked `not_installed`, `auth_failed`, or `timeout` during fix rounds — its availability does not change within a session.
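One way to read the fix cycle channel rule is as a lookup from a channel's recorded root cause to what gets re-run in a fix round. The sketch below is illustrative only: `rerun_target` and its output labels are hypothetical, and the handling of `failed` is an assumption, since the rule names only `not_installed`, `auth_failed`, and `timeout`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decide what to re-run for a channel during a fix round.

rerun_target() {
  case "$1" in
    # The channel ran normally, so re-run the channel itself.
    completed) echo "channel" ;;
    # Never retry these channels; re-run their compensating pass instead.
    not_installed|auth_failed|timeout) echo "compensating-pass" ;;
    # Assumption: a failed channel was covered by a compensating pass,
    # so that pass is what gets re-run.
    failed) echo "compensating-pass" ;;
    *) echo "unspecified" ;;
  esac
}
```

With this mapping, `rerun_target completed` prints `channel` and `rerun_target auth_failed` prints `compensating-pass`.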
301
326
 
302
327
  ### Step 8: Final Verdict
303
328
 
@@ -321,9 +346,9 @@ Output a concise summary in this format:
321
346
  [scope label]
322
347
 
323
348
  ### Channels Executed
324
- - Codex CLI — root cause: [completed / not installed / auth failed / auth timeout / failed], coverage: [full / compensating (Codex-equivalent)]
325
- - Gemini CLI — root cause: [completed / not installed / auth failed / auth timeout / failed], coverage: [full / compensating (Gemini-equivalent)]
326
- - Superpowers code-reviewer — [completed / failed]
349
+ - Codex CLI — root cause: [completed / not_installed / auth_failed / timeout / failed], coverage: [full / compensating (Codex-equivalent)]
350
+ - Gemini CLI — root cause: [completed / not_installed / auth_failed / timeout / failed], coverage: [full / compensating (Gemini-equivalent)]
351
+ - Claude CLI - root cause: [completed / not_installed / auth_failed / timeout / failed], coverage: [full / compensating]
327
352
 
328
353
  ### Findings
329
354
  [consensus findings first, then single-source findings]
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: review-pr
3
- description: Run all configured code review channels on a PR (Codex CLI, Gemini CLI, Superpowers code-reviewer)
3
+ description: Run all configured code review channels on a PR (Codex CLI, Gemini CLI, Claude CLI)
4
4
  phase: null
5
5
  order: null
6
6
  dependencies: []
@@ -21,15 +21,16 @@ of remembering three separate review invocations.
21
21
  The three channels are:
22
22
  1. **Codex CLI** — OpenAI's code analysis (implementation correctness, security, API contracts)
23
23
  2. **Gemini CLI** — Google's design reasoning (architectural patterns, broad context)
24
- 3. **Superpowers code-reviewer** — Claude subagent review (plan alignment, code quality, testing)
24
+ 3. **Claude CLI** — Anthropic's code review (plan alignment, code quality, testing)
25
25
 
26
26
  ## Inputs
27
27
 
28
28
  - $ARGUMENTS — PR number (optional; auto-detected from current branch if omitted)
29
- - docs/review-standards.md (optional): severity definitions and review criteria
30
- - docs/coding-standards.md (required) — coding conventions for review context
31
- - docs/tdd-standards.md (optional): test coverage expectations
32
- - AGENTS.md (optional): reviewer instructions with project-specific rules
29
+ - `.mmr.yaml`: MMR CLI configuration (channels, review_criteria, defaults)
30
+
31
+ The CLI handles review context via config (`review_criteria` in `.mmr.yaml`).
32
+ Project-specific standards (coding-standards, review-standards) are referenced
33
+ in the review criteria config rather than read at dispatch time.
33
34
 
34
35
  ## Expected Outputs
35
36
 
@@ -48,119 +49,98 @@ PR_NUMBER="${ARGUMENTS:-$(gh pr view --json number -q .number 2>/dev/null)}"
48
49
 
49
50
  If no PR is found, stop and tell the user to create a PR first.
50
51
 
51
- ### Step 2: Gather Review Context
52
+ ### Step 2: Run MMR Review
52
53
 
53
- Collect the PR diff and project standards for review prompts:
54
+ Use the MMR CLI as the primary entry point for automated dispatch, reconciliation, and verdict:
54
55
 
55
56
  ```bash
56
- PR_DIFF=$(gh pr diff "$PR_NUMBER")
57
+ MMR_RESULT=$(mmr review --pr "$PR_NUMBER" --sync --format json)
58
+ # Extract job_id from JSON output for use in mmr reconcile
59
+ JOB_ID=$(echo "$MMR_RESULT" | grep -o '"job_id": "[^"]*"' | head -1 | cut -d'"' -f4)
57
60
  ```
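The `grep` pattern in the block above matches only when exactly one space follows the colon in `"job_id": "..."`. A whitespace-tolerant variant might look like the sketch below. `extract_job_id` is a hypothetical helper name, and the sketch still assumes the key and its value sit on the same line of the CLI's JSON output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first "job_id" string value found on stdin,
# tolerating any amount of whitespace around the colon.

extract_job_id() {
  grep -oE '"job_id"[[:space:]]*:[[:space:]]*"[^"]*"' |
    head -n 1 |
    sed 's/^"job_id"[[:space:]]*:[[:space:]]*"//; s/"$//'
}

# Usage (not executed here):
#   JOB_ID=$(printf '%s\n' "$MMR_RESULT" | extract_job_id)
```

If `jq` is available, a JSON-aware query is more robust than text matching, although the exact position of `job_id` in the output is not specified here.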
58
61
 
59
- Read these files for review context (skip any that don't exist):
60
- - `docs/coding-standards.md`
61
- - `docs/tdd-standards.md`
62
- - `docs/review-standards.md`
63
- - `AGENTS.md`
62
+ The CLI handles:
63
+ - Installation and auth checks for each channel (codex, gemini, claude)
64
+ - Compensating passes when channels are unavailable (dispatched via `claude -p`)
65
+ - Output parsing and finding reconciliation
66
+ - Verdict derivation (pass/degraded-pass/blocked/needs-user-decision)
67
+ - Exit codes: 0=pass/degraded-pass, 2=blocked, 3=needs-user-decision
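A caller can branch on the exit codes listed above. The following is a minimal sketch: `verdict_for_exit_code` and the label `pass-or-degraded-pass` are hypothetical, and treating every other code as `unknown` is an assumption, since only 0, 2, and 3 are documented here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map an `mmr review --sync` exit code to a verdict group.

verdict_for_exit_code() {
  case "$1" in
    0) echo "pass-or-degraded-pass" ;;  # exit 0 covers both pass and degraded-pass
    2) echo "blocked" ;;
    3) echo "needs-user-decision" ;;
    *) echo "unknown" ;;                # assumption: any other code is an error
  esac
}

# Usage (not executed here):
#   MMR_RESULT=$(mmr review --pr "$PR_NUMBER" --sync --format json)
#   verdict_for_exit_code "$?"
```

A plain assignment such as `MMR_RESULT=$(...)` leaves the command's exit status in `$?`, so the check has to run immediately after it. Because exit 0 does not distinguish `pass` from `degraded-pass`, the exact verdict still has to be read from the JSON output.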
64
68
 
65
- ### Step 3: Run All Three Review Channels
69
+ The CLI supports multiple input modes:
70
+ - `--pr <number>` — review a GitHub PR (fetches diff via `gh pr diff`)
71
+ - `--diff <file>` — review a diff file
72
+ - `--staged` — review staged changes (`git diff --cached`)
73
+ - `--base <ref> --head <ref>` — review diff between two refs
66
74
 
67
- Run all three channels. Track which ones complete successfully.
75
+ **Manual fallback** (when MMR CLI is not installed):
68
76
 
69
- **Foreground only:** Always run Codex and Gemini CLI commands as foreground Bash calls. Never use `run_in_background`, `&`, or `nohup`. Background execution produces empty output.
77
+ Run Codex, Gemini, and Claude CLI commands individually as foreground Bash calls.
78
+ Never use `run_in_background`, `&`, or `nohup`.
70
79
 
71
80
  #### Channel 1: Codex CLI
72
81
 
73
- **Installation check:**
74
- ```bash
75
- command -v codex >/dev/null 2>&1
76
- ```
77
- - If `codex` is not installed: queue a compensating Claude self-review pass focused on implementation correctness, security, and API contracts. Record root-cause `not_installed`. Skip to next channel.
78
-
79
- **Auth check first** (auth tokens expire — always re-verify):
80
-
81
- ```bash
82
- codex login status 2>/dev/null && echo "codex authenticated" || echo "codex NOT authenticated"
83
- ```
84
-
85
- If auth fails, tell the user: "Codex auth expired. Run: `! codex login`" — do NOT
86
- silently fall back. After the user re-authenticates, retry.
87
-
88
- If auth cannot be recovered, queue a compensating pass (same focus as above). Record root-cause `auth_failed`.
89
- If auth check times out (~5s), retry once. If still failing, record root-cause `auth_timeout` and queue compensating pass.
90
-
91
- **Run the review:**
92
-
93
82
  ```bash
83
+ command -v codex >/dev/null 2>&1 || echo "Codex not installed"
84
+ codex login status 2>/dev/null
94
85
  codex exec --skip-git-repo-check -s read-only --ephemeral "REVIEW_PROMPT" 2>/dev/null
95
86
  ```
96
87
 
97
- The review prompt must include:
98
- - The PR diff
99
- - Coding standards from docs/coding-standards.md
100
- - Review standards from docs/review-standards.md (if exists)
101
- - Instruction to report P0/P1/P2 findings as JSON with severity, location (file:line), description, and suggestion
102
-
103
- If the CLI exits with a non-zero code, produces malformed/unparseable output, or is killed by the tool runner timeout, record root-cause `failed` and queue a compensating pass for that channel.
88
+ If not installed or auth fails, queue a compensating pass focused on implementation
89
+ correctness, security, and API contracts. Auth failure recovery: `! codex login`.
104
90
 
105
91
  #### Channel 2: Gemini CLI
106
92
 
107
- **Installation check:**
108
93
  ```bash
109
- command -v gemini >/dev/null 2>&1
94
+ command -v gemini >/dev/null 2>&1 || echo "Gemini not installed"
95
+ NO_BROWSER=true gemini -p "respond with ok" -o json 2>&1
96
+ NO_BROWSER=true gemini -p "REVIEW_PROMPT" --output-format json --approval-mode yolo 2>/dev/null
110
97
  ```
111
- - If `gemini` is not installed: queue a compensating Claude self-review pass focused on architectural patterns, design reasoning, and broad context. Record root-cause `not_installed`. Label findings as `[compensating: Gemini-equivalent]`. Skip to next channel.
112
98
 
113
- **Auth check first:**
99
+ If not installed or auth fails, queue a compensating pass focused on architectural
100
+ patterns, design reasoning, and broad context. Auth failure recovery: `! gemini -p "hello"`.
101
+
102
+ #### Channel 3: Claude CLI
114
103
 
115
104
  ```bash
116
- GEMINI_AUTH_CHECK=$(NO_BROWSER=true gemini -p "respond with ok" -o json 2>&1)
117
- GEMINI_EXIT=$?
118
- if [ "$GEMINI_EXIT" -eq 0 ]; then
119
- echo "gemini authenticated"
120
- elif [ "$GEMINI_EXIT" -eq 41 ]; then
121
- echo "gemini NOT authenticated (exit 41: auth error)"
122
- fi
105
+ claude -p "REVIEW_PROMPT" --output-format json 2>/dev/null
123
106
  ```
124
107
 
125
- If auth fails (exit 41), tell the user: "Gemini auth expired. Run: `! gemini -p \"hello\"`" — do NOT silently fall back. After the user re-authenticates, retry.
108
+ Claude CLI handles its own auth. Focus: plan alignment, code quality, testing.
126
109
 
127
- If auth cannot be recovered, queue a compensating pass focused on architectural patterns, design reasoning, and broad context. Record root-cause `auth_failed`. Label findings as `[compensating: Gemini-equivalent]`.
128
- If auth check times out (~5s), retry once. If still failing, record root-cause `auth_timeout` and queue compensating pass labeled `[compensating: Gemini-equivalent]`.
110
+ **After all channels:** Run any queued compensating passes as additional `claude -p`
111
+ dispatches with focused prompts. Label findings as `[compensating: Codex-equivalent]`
112
+ or `[compensating: Gemini-equivalent]`.
129
113
 
130
- **Run the review:**
114
+ ### Step 3: Run Agent Code Review (4th channel)
131
115
 
132
- ```bash
133
- NO_BROWSER=true gemini -p "REVIEW_PROMPT" --output-format json --approval-mode yolo 2>/dev/null
134
- ```
116
+ Dispatch your platform's code-reviewer skill for a complementary review:
117
+ - **Claude Code:** dispatch `superpowers:code-reviewer` subagent with the PR diff and review criteria
118
+ - **Other platforms:** use your platform's equivalent agent review skill
135
119
 
136
- Same review prompt content as Codex. Do NOT share one model's output with the other;
137
- each reviews independently.
120
+ The agent skill runs inside your agent's context: it has access to conversation history, project knowledge, and plan context that external CLIs lack.
138
121
 
139
- If the CLI exits with a non-zero code, produces malformed/unparseable output, or is killed by the tool runner timeout, record root-cause `failed` and queue a compensating pass for that channel.
122
+ **Important:** The agent's review output must use MMR-compatible finding schema: each finding needs `severity` (P0-P3), `location` (file:line), and `description` (`suggestion` is optional). The strict validator in `mmr reconcile` will reject findings with missing or invalid required fields.
140
123
 
141
- #### Channel 3: Superpowers Code-Reviewer Subagent
124
+ ### Step 4: Inject Agent Review into MMR
142
125
 
143
- Dispatch the `superpowers:code-reviewer` subagent. This channel always runs (it uses
144
- Claude, which is always available).
126
+ Feed the agent review findings into MMR for unified reconciliation:
145
127
 
146
128
  ```bash
147
- BASE_SHA=$(gh pr view "$PR_NUMBER" --json baseRefOid -q .baseRefOid)
148
- HEAD_SHA=$(gh pr view "$PR_NUMBER" --json headRefOid -q .headRefOid)
129
+ # job_id is captured from mmr review --sync --format json output
130
+ # Write agent findings to a temp file for mmr reconcile
131
+ echo "$AGENT_FINDINGS" > /tmp/agent-findings.json
132
+ mmr reconcile "$JOB_ID" --channel superpowers --input /tmp/agent-findings.json
149
133
  ```
150
134
 
151
- Dispatch with the Agent tool using `superpowers:code-reviewer` as the subagent type,
152
- providing:
153
- - `WHAT_WAS_IMPLEMENTED`: PR title and description
154
- - `PLAN_OR_REQUIREMENTS`: coding standards and review standards
155
- - `BASE_SHA` — base commit
156
- - `HEAD_SHA` — head commit
157
- - `DESCRIPTION` — PR summary
135
+ The `reconcile` command:
136
+ - Adds the agent's findings as a new channel in the job
137
+ - Re-runs reconciliation across ALL channels (CLI + agent)
138
+ - Outputs the unified verdict with all sources included
158
139
 
159
- **After all channels:** Run any queued compensating passes as foreground Claude self-review passes. Each uses the same review prompt as the missing channel, focused on that channel's strength area. Label findings as `[compensating: Codex-equivalent]` or `[compensating: Gemini-equivalent]`.
140
+ ### Step 5: Reconcile Findings
160
141
 
161
- ### Step 4: Reconcile Findings
162
-
163
- After all channels complete, reconcile findings:
142
+ When using `mmr review --sync`, reconciliation is automatic. For manual fallback,
143
+ reconcile findings after all channels complete:
164
144
 
165
145
  | Scenario | Confidence | Action |
166
146
  |----------|-----------|--------|
@@ -171,7 +151,7 @@ After all channels complete, reconcile findings:
171
151
  | Channels contradict each other | **Low** | Present to user for adjudication |
172
152
  | Compensating-pass P0/P1/P2 finding | **Single-source** | Fix per normal thresholds, label as compensating |
173
153
 
174
- ### Step 5: Report Results
154
+ ### Step 6: Report Results
175
155
 
176
156
  Output a review summary in this format:
177
157
 
@@ -179,9 +159,10 @@ Output a review summary in this format:
179
159
  ## Code Review Summary — PR #[number]
180
160
 
181
161
  ### Channels Executed
182
- - [ ] Codex CLI — root cause: [completed / not installed / auth failed / auth timeout / failed], coverage: [full / compensating (Codex-equivalent)]
183
- - [ ] Gemini CLI — root cause: [completed / not installed / auth failed / auth timeout / failed], coverage: [full / compensating (Gemini-equivalent)]
184
- - [ ] Superpowers code-reviewer — [completed / failed]
162
+ - [ ] Codex CLI — root cause: [completed / not installed / auth failed / timeout / failed], coverage: [full / compensating (Codex-equivalent)]
163
+ - [ ] Gemini CLI — root cause: [completed / not installed / auth failed / timeout / failed], coverage: [full / compensating (Gemini-equivalent)]
164
+ - [ ] Claude CLI - root cause: [completed / not_installed / auth_failed / timeout / failed], coverage: [full / compensating]
165
+ - [ ] Agent review — [completed / skipped], injected via mmr reconcile
185
166
 
186
167
  ### Consensus Findings (High Confidence)
187
168
  [Findings flagged by 2+ channels]
@@ -196,7 +177,7 @@ Output a review summary in this format:
196
177
  [pass / degraded-pass / blocked / needs-user-decision]
197
178
  ```
198
179
 
199
- ### Step 5a: Final Verdict
180
+ ### Step 6a: Final Verdict
200
181
 
201
182
  Return exactly one verdict:
202
183
 
@@ -209,17 +190,19 @@ Verdict precedence: `needs-user-decision` > `blocked` > `degraded-pass` > `pass`
209
190
 
210
191
  When compensating passes ran, maximum achievable verdict is `degraded-pass`. When both external channels were compensated, note "All findings are single-model."
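The precedence order above amounts to picking the highest-ranked verdict among the candidates that apply. The sketch below illustrates that reading with two hypothetical helpers, `verdict_rank` and `highest_verdict`. Defaulting to `pass` when nothing else applies and ignoring unrecognized values are both assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the verdict precedence:
# needs-user-decision > blocked > degraded-pass > pass

# Print a numeric rank for a verdict; a higher number takes precedence.
verdict_rank() {
  case "$1" in
    needs-user-decision) echo 3 ;;
    blocked) echo 2 ;;
    degraded-pass) echo 1 ;;
    pass) echo 0 ;;
    *) echo -1 ;;   # unrecognized values never win
  esac
}

# Print the highest-precedence verdict among the arguments.
highest_verdict() {
  local best="pass" candidate
  for candidate in "$@"; do
    if [ "$(verdict_rank "$candidate")" -gt "$(verdict_rank "$best")" ]; then
      best="$candidate"
    fi
  done
  echo "$best"
}
```

For example, `highest_verdict pass degraded-pass blocked` prints `blocked`. A run where compensating passes executed would contribute `degraded-pass` as a candidate, which matches the cap described above.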
211
192
 
212
- ### Step 6: Fix P0/P1/P2 Findings
193
+ ### Step 7: Fix P0/P1/P2 Findings
213
194
 
214
195
  If any P0, P1, or P2 findings exist:
215
196
  1. Fix them in the code
216
197
  2. Push the fixes: `git push`
217
- 3. Re-run the channels that produced findings to verify fixes
198
+ 3. Re-run the review to verify fixes: `mmr review --pr "$PR_NUMBER" --sync --format json`
218
199
  4. After 3 fix rounds with unresolved P0/P1/P2 findings, stop and ask the user for direction — do NOT merge automatically. Document remaining findings and let the user decide whether to continue fixing, create follow-up issues, or override.
219
200
 
220
- **Fix cycle channel rule:** Re-run only channels that originally completed or ran as compensating passes. Never retry a channel marked `not installed`, `auth failed`, or `auth timeout` during fix rounds — its availability does not change within a session.
201
+ **Note:** Fix cycles are an orchestration concern: the caller (agent or human) handles the fix loop. The CLI provides the review and verdict; the caller decides whether to fix and re-run.
202
+
203
+ **Fix cycle channel rule:** Re-run only channels that originally completed or ran as compensating passes. Never retry a channel marked `not_installed`, `auth_failed`, or `timeout` during fix rounds — its availability does not change within a session.
221
204
 
222
- ### Step 7: Confirm Completion
205
+ ### Step 8: Confirm Completion
223
206
 
224
207
  After all findings are resolved (or 3 rounds complete), output:
225
208
 
@@ -236,21 +219,22 @@ Do NOT proceed to the next task or merge until this confirmation is output.
236
219
  | Channel not installed | Queue compensating pass, report root-cause `not_installed` |
237
220
  | Auth expired, user recovers | Retry dispatch |
238
221
  | Auth expired, user declines | Queue compensating pass, report root-cause `auth_failed` |
239
- | Auth check timeout (after retry) | Queue compensating pass, report root-cause `auth_timeout` |
222
+ | Auth check timeout (after retry) | Queue compensating pass, report root-cause `timeout` |
240
223
  | Channel fails during execution | Queue compensating pass, report root-cause `failed` |
241
224
  | Both external channels unavailable | Two compensating passes, max verdict: `degraded-pass`, note "All findings single-model" |
242
- | Superpowers unavailable | Run available CLIs, warn user (Superpowers is always-available Claude — no compensating pass) |
243
225
 
244
226
  ## Process Rules
245
227
 
246
- 1. **Foreground only** — Always run Codex and Gemini CLI commands as foreground Bash calls. Never use `run_in_background`, `&`, or `nohup`.
247
- 2. **All three channels are mandatory** — skip only when a tool is genuinely not installed, never by choice.
228
+ 1. **Foreground only** — Always run Codex, Gemini, and Claude CLI commands as foreground Bash calls. Never use `run_in_background`, `&`, or `nohup`.
229
+ 2. **All three channels are mandatory** — Codex CLI, Gemini CLI, and Claude CLI. Skip only when a tool is genuinely not installed, never by choice.
248
230
  3. **Auth failures are not silent** — always surface to the user with the exact recovery command.
249
231
  4. **Independence** — never share one channel's output with another. Each reviews the diff independently.
250
232
  5. **Fix before proceeding** — P0/P1/P2 findings must be resolved before moving to the next task.
251
233
  6. **3-round limit** — never attempt more than 3 fix rounds. Surface unresolved findings to the user.
252
234
  7. **Document everything** — the review summary must show which channels ran and which were skipped, with reasons.
253
- 8. **Dispatch pattern** follows `multi-model-review-dispatch` knowledge entry. When modifying channel dispatch in this file, verify consistency with `review-code.md` and `post-implementation-review.md`.
235
+ 8. **CLI-first**: use `mmr review --sync` as the primary entry point. Manual dispatch is a fallback only.
236
+ 9. **Job storage** — the CLI stores job data at `~/.mmr/jobs/{job-id}/results.json`. Review results are available via `mmr results <job-id>`.
237
+ 10. **Dispatch pattern** follows `multi-model-review-dispatch` knowledge entry. When modifying channel dispatch in this file, verify consistency with `review-code.md` and `post-implementation-review.md`.
254
238
 
255
239
  ---
256
240
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@zigrivers/scaffold",
3
- "version": "3.15.0",
3
+ "version": "3.16.0",
4
4
  "description": "AI-powered software project scaffolding pipeline",
5
5
  "type": "module",
6
6
  "workspaces": [