@zigrivers/scaffold 3.15.0 → 3.16.0
- package/README.md +19 -12
- package/content/knowledge/core/automated-review-tooling.md +21 -26
- package/content/knowledge/core/multi-model-review-dispatch.md +30 -55
- package/content/tools/post-implementation-review.md +36 -7
- package/content/tools/review-code.md +33 -8
- package/content/tools/review-pr.md +79 -95
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -947,23 +947,22 @@ You don't need both — Scaffold works with whichever CLIs are available. Having
|
|
|
947
947
|
#### How mmr Works
|
|
948
948
|
|
|
949
949
|
```
|
|
950
|
-
|
|
951
|
-
|
|
952
|
-
|
|
950
|
+
# Recommended: single-command pipeline (--sync)
|
|
951
|
+
mmr review --pr 47 --sync ──→ Dispatches to all channels
|
|
952
|
+
Runs compensating passes for unavailable channels
|
|
953
|
+
Parses outputs, reconciles findings
|
|
954
|
+
Applies severity gate, derives verdict
|
|
955
|
+
Exit code: 0=pass, 2=blocked, 3=needs-decision
|
|
953
956
|
|
|
954
|
-
|
|
955
|
-
|
|
956
|
-
|
|
957
|
-
mmr results mmr-a1b2c3 ──→ Reconcile findings across channels
|
|
958
|
-
Run compensating passes for unavailable channels
|
|
959
|
-
Apply severity gate
|
|
960
|
-
Output unified findings
|
|
961
|
-
Exit code: 0=passed, 2=gate failed, 3=degraded
|
|
957
|
+
# Alternative: step-by-step (for async workflows)
|
|
958
|
+
mmr review --pr 47 ──→ Dispatch and await all channels
|
|
959
|
+
mmr results mmr-a1b2c3 ──→ Reconcile findings, output verdict
|
|
962
960
|
```
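For CI scripts, the exit codes listed in the new `--sync` block can be mapped back to a verdict label. The sketch below is illustrative only: `EXIT_VERDICTS`, `describe_exit`, and `run_sync_review` are hypothetical names, and any exit code the README does not list is reported as unknown rather than guessed.

```python
import subprocess

# Exit codes as documented in the README hunk above.
EXIT_VERDICTS = {0: "pass", 2: "blocked", 3: "needs-decision"}


def describe_exit(code):
    """Translate an mmr exit code into the verdict label named in the README."""
    return EXIT_VERDICTS.get(code, "unknown")


def run_sync_review(pr_number):
    """Run the documented single-command pipeline and return its verdict label.

    Assumes the `mmr` CLI is installed and on PATH.
    """
    result = subprocess.run(["mmr", "review", "--pr", str(pr_number), "--sync"])
    return describe_exit(result.returncode)
```

A CI job could call `run_sync_review(47)` and fail the build on anything other than `pass`.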
|
|
963
961
|
|
|
964
962
|
**Key features:**
|
|
965
963
|
|
|
966
|
-
-
|
|
964
|
+
- **--sync mode** — single-command pipeline: dispatch, parse, reconcile, verdict. The recommended entry point for agents and CI.
|
|
965
|
+
- **Compensating passes** — when a channel is unavailable, a Claude-based review focused on that channel's strength area runs automatically.
|
|
967
966
|
- **Per-channel auth verification** — checks authentication before every dispatch. Auth failures are never silent — `mmr` tells you exactly what expired and the command to fix it.
|
|
968
967
|
- **Immutable core prompt** — every channel gets the same severity definitions (P0-P3), output format spec (JSON), and review criteria. No prompt drift between channels.
|
|
969
968
|
- **Automated reconciliation** — when two channels flag the same location, that's consensus (high confidence). When only one channel flags something, it's unique (medium confidence). P0 from any single source is always high confidence.
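The confidence rule in the last bullet can be sketched in Python. This is a minimal illustration, not mmr's implementation: the `reconcile` helper and the `channel`/`location`/`severity` keys are assumptions, and the special handling of compensating passes is left out.

```python
from collections import defaultdict


def reconcile(findings):
    """Group findings by location and attach a confidence label.

    `findings` is a list of dicts with the assumed keys
    "channel", "location", and "severity" (P0-P3).
    """
    by_location = defaultdict(list)
    for finding in findings:
        by_location[finding["location"]].append(finding)

    reconciled = []
    for location, group in by_location.items():
        channels = sorted({f["channel"] for f in group})
        # "P0" < "P1" < "P2" < "P3" as strings, so min() picks the most severe.
        severity = min(f["severity"] for f in group)
        if len(channels) >= 2:
            agreement, confidence = "consensus", "high"
        else:
            agreement, confidence = "unique", "medium"
        if severity == "P0":
            # P0 from any single source is always high confidence.
            confidence = "high"
        reconciled.append({
            "location": location,
            "severity": severity,
            "channels": channels,
            "agreement": agreement,
            "confidence": confidence,
        })
    return reconciled
```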
|
|
@@ -1079,6 +1078,14 @@ You can also adjust per-channel timeouts, the default severity threshold, and na
|
|
|
1079
1078
|
**After creating a PR:**
|
|
1080
1079
|
|
|
1081
1080
|
```bash
|
|
1081
|
+
# Recommended: single-command review
|
|
1082
|
+
mmr review --pr 47 --sync --focus "auth flow, session handling"
|
|
1083
|
+
# → Full review output with verdict and findings
|
|
1084
|
+
|
|
1085
|
+
# Or with text output for readability:
|
|
1086
|
+
mmr review --pr 47 --sync --format text
|
|
1087
|
+
|
|
1088
|
+
# Step-by-step (when you want to continue working while review runs):
|
|
1082
1089
|
mmr review --pr 47 --focus "auth flow, session handling"
|
|
1083
1090
|
# → Job mmr-a1b2c3 started. 2/2 channels dispatched.
|
|
1084
1091
|
```
|
|
package/content/knowledge/core/automated-review-tooling.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: automated-review-tooling
|
|
3
|
-
description: Patterns for
|
|
3
|
+
description: Patterns for automated PR code review using AI CLI tools (Codex, Gemini, Claude) — orchestration, reconciliation, compensating passes, and CI integration
|
|
4
4
|
topics: [code-review, automation, codex, gemini, pull-requests, ci-cd, review-tooling]
|
|
5
5
|
---
|
|
6
6
|
|
|
@@ -24,10 +24,10 @@ These are the authoritative verdict definitions. Tool files (`review-code.md`, `
|
|
|
24
24
|
|
|
25
25
|
| Verdict | Condition |
|
|
26
26
|
|---------|-----------|
|
|
27
|
-
| `pass` | All
|
|
28
|
-
| `degraded-pass` |
|
|
29
|
-
| `blocked` |
|
|
30
|
-
| `needs-user-decision` |
|
|
27
|
+
| `pass` | All channels completed, no unresolved P0/P1/P2 |
|
|
28
|
+
| `degraded-pass` | Some channels unavailable, compensating passes ran, no unresolved P0/P1/P2 |
|
|
29
|
+
| `blocked` | Findings at or above fix threshold remain unresolved |
|
|
30
|
+
| `needs-user-decision` | No channels completed — insufficient data for a determination |
|
|
31
31
|
|
|
32
32
|
**Verdict precedence:** `needs-user-decision` > `blocked` > `degraded-pass` > `pass`. When multiple conditions apply, the higher-precedence verdict wins.
|
|
33
33
|
|
|
@@ -35,13 +35,13 @@ These are the authoritative verdict definitions. Tool files (`review-code.md`, `
|
|
|
35
35
|
|
|
36
36
|
#### Status Model
|
|
37
37
|
|
|
38
|
-
`compensating` is a **coverage label** applied to a channel's output, not a replacement for the root-cause status. Each channel retains its root-cause status (`not_installed`, `auth_failed`, `
|
|
38
|
+
`compensating` is a **coverage label** applied to a channel's output, not a replacement for the root-cause status. Each channel retains its root-cause status (`not_installed`, `auth_failed`, `timeout`, `failed`) AND gains a coverage label (`compensating (X-equivalent)`) when a compensating pass ran. The fix cycle uses the **root-cause status** to decide whether to retry (never retry `not_installed`, `auth_failed`, `timeout`). The report uses the **coverage label** to show the reader what ran.
|
|
39
39
|
|
|
40
40
|
#### Compensating Passes
|
|
41
41
|
|
|
42
|
-
When
|
|
42
|
+
When a channel (Codex or Gemini) is unavailable, the CLI dispatches a compensating pass via `claude -p`:
|
|
43
43
|
|
|
44
|
-
- Same prompt structure as the missing channel, executed as a
|
|
44
|
+
- Same prompt structure as the missing channel, executed as a `claude -p` dispatch.
|
|
45
45
|
- Labeled `[compensating: Codex-equivalent]` or `[compensating: Gemini-equivalent]` in the review summary.
|
|
46
46
|
- Missing Codex → focus on implementation correctness, security, API contracts.
|
|
47
47
|
- Missing Gemini → focus on architectural patterns, design reasoning, broad context.
|
|
@@ -49,8 +49,6 @@ When an external channel (Codex or Gemini) is unavailable, run a compensating Cl
|
|
|
49
49
|
- Compensating-pass findings are **single-source confidence** — they do NOT raise to high confidence even if they agree with another channel's findings.
|
|
50
50
|
- Normal mandatory-fix thresholds apply: P0/P1/P2 findings from compensating passes still require fixing.
|
|
51
51
|
|
|
52
|
-
**Superpowers channel:** No compensating pass needed — Superpowers is a Claude subagent and is always available. If the Superpowers plugin is not installed, run available external CLIs and warn the user that review coverage is reduced.
|
|
53
|
-
|
|
54
52
|
#### Foreground-Only Execution
|
|
55
53
|
|
|
56
54
|
Always run Codex and Gemini CLI commands as foreground Bash calls. Never use `run_in_background`, `&`, or `nohup`. Background execution produces empty or truncated output from Codex and Gemini CLIs. Multiple foreground calls can still run in parallel if the tool runner supports parallel tool invocations.
|
|
@@ -67,7 +65,7 @@ Reconciliation normalizes findings from all channels (real and compensating) to
|
|
|
67
65
|
|
|
68
66
|
The reconciliation output is a deduplicated list of findings with confidence scores. High-confidence findings (agreed by 2+ real channels) are actionable without further discussion. Low-confidence findings (single-source, or from compensating passes) still require action at P0/P1/P2 but should be noted as lower-confidence in the review summary.
|
|
69
67
|
|
|
70
|
-
Findings that appear in all three channels (Codex, Gemini,
|
|
68
|
+
Findings that appear in all three channels (Codex, Gemini, Claude) are considered maximum-confidence and should be surfaced first in the review summary. Findings that appear in only one channel should include the channel name in the finding description to help the developer assess confidence independently.
|
|
71
69
|
|
|
72
70
|
```bash
|
|
73
71
|
# Orchestration reconciliation workflow
|
|
@@ -80,16 +78,15 @@ Findings that appear in all three channels (Codex, Gemini, Superpowers) are cons
|
|
|
80
78
|
|
|
81
79
|
### Channel Dispatch Pattern and Orchestration
|
|
82
80
|
|
|
83
|
-
Each
|
|
81
|
+
Each channel (Codex, Gemini, Claude) follows the same dispatch pattern: check installation, check auth, then dispatch as a foreground call. If any step fails, record the root-cause status, queue a compensating pass (for Codex/Gemini), and continue to the next channel.
|
|
84
82
|
|
|
85
83
|
```bash
|
|
86
84
|
# Channel dispatch pattern
|
|
87
|
-
# For each
|
|
85
|
+
# For each channel (codex, gemini, claude):
|
|
88
86
|
# 1. command -v <tool> >/dev/null 2>&1 || { status=not_installed; queue_compensating; continue; }
|
|
89
87
|
# 2. <auth_check> || { status=auth_failed; queue_compensating; continue; }
|
|
90
88
|
# 3. <dispatch_foreground> || { status=failed; queue_compensating; continue; }
|
|
91
|
-
#
|
|
92
|
-
# After all: run queued compensating passes → reconcile → verdict
|
|
89
|
+
# After all: run queued compensating passes (via claude -p) → reconcile → verdict
|
|
93
90
|
```
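The commented shell pattern above can also be written as a small Python loop. This is a sketch under stated assumptions: `run_channels` is a hypothetical helper, and each channel is passed in as a `(name, auth_check, dispatch)` tuple of callables that return `True` on success, standing in for the real auth and dispatch commands.

```python
import shutil


def run_channels(channels, installed=shutil.which):
    """Apply the install -> auth -> dispatch checks to each channel in turn.

    Returns (statuses, compensating_queue). Only Codex and Gemini queue a
    compensating pass, matching the rule described above.
    """
    statuses = {}
    compensating_queue = []
    for name, auth_check, dispatch in channels:
        if not installed(name):
            statuses[name] = "not_installed"
        elif not auth_check():
            statuses[name] = "auth_failed"
        elif not dispatch():
            statuses[name] = "failed"
        else:
            statuses[name] = "completed"
            continue
        # Record the root cause, queue compensation, move to the next channel.
        if name in ("codex", "gemini"):
            compensating_queue.append(name)
    return statuses, compensating_queue
```

The queued names would then be handed to the compensating-pass step before reconciliation.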
|
|
94
91
|
|
|
95
92
|
After all channels and compensating passes complete, run the reconciliation workflow above and apply the verdict decision flow. Channel results and compensating-pass labels must be preserved in the review output for auditability — do not collapse or omit them even when findings are empty.
|
|
@@ -99,14 +96,14 @@ After all channels and compensating passes complete, run the reconciliation work
|
|
|
99
96
|
When Codex is unavailable (not installed or auth failure), the orchestration proceeds as follows:
|
|
100
97
|
|
|
101
98
|
1. The installation check (`command -v codex`) fails. Codex channel status is set to `not_installed`.
|
|
102
|
-
2. A compensating Codex-equivalent pass is queued: a
|
|
103
|
-
3. Gemini and
|
|
99
|
+
2. A compensating Codex-equivalent pass is queued: a `claude -p` dispatch focused on implementation correctness, security, and API contracts.
|
|
100
|
+
3. Gemini and Claude channels run normally.
|
|
104
101
|
4. The compensating pass runs, producing findings labeled `[compensating: Codex-equivalent]`.
|
|
105
|
-
5. Reconciliation merges findings from all three sources (Gemini,
|
|
102
|
+
5. Reconciliation merges findings from all three sources (Gemini, Claude, compensating-Codex).
|
|
106
103
|
6. Maximum achievable verdict is `degraded-pass` because a real channel was absent.
|
|
107
104
|
7. The review summary notes: "Codex channel: not_installed (compensating: Codex-equivalent pass ran)."
|
|
108
105
|
|
|
109
|
-
**Fix-cycle channel rule:** Only re-run channels that originally completed or ran as compensating passes. `failed` channels are covered by their compensating pass and are not retried during fix rounds. Never retry a channel with status `not_installed`, `auth_failed`, or `
|
|
106
|
+
**Fix-cycle channel rule:** Only re-run channels that originally completed or ran as compensating passes. `failed` channels are covered by their compensating pass and are not retried during fix rounds. Never retry a channel with status `not_installed`, `auth_failed`, or `timeout` — these indicate persistent environment conditions that will not resolve between fix rounds.
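One way to read the fix-cycle rule is as a per-channel plan for each fix round. The sketch below assumes the `root_cause` / `coverage_status` shape used in the tool files' `channel_status` object; `fix_round_plan` and its three plan labels are hypothetical.

```python
def fix_round_plan(channel_status):
    """Decide what to re-run for each channel during a fix round.

    `channel_status` maps a channel name to a dict with "root_cause"
    (None when the channel completed) and "coverage_status"
    ("full" or "compensating").
    """
    plan = {}
    for name, status in channel_status.items():
        if status["root_cause"] is None:
            # The real channel completed originally, so it is re-run.
            plan[name] = "rerun-channel"
        elif status["coverage_status"] == "compensating":
            # The real channel is never retried; its compensating pass is.
            plan[name] = "rerun-compensating-pass"
        else:
            plan[name] = "skip"
    return plan
```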
|
|
110
107
|
|
|
111
108
|
### Verdict Decision Flow
|
|
112
109
|
|
|
@@ -114,19 +111,17 @@ Apply the following evaluation order to determine the final verdict. The first m
|
|
|
114
111
|
|
|
115
112
|
```
|
|
116
113
|
Verdict evaluation order:
|
|
117
|
-
1.
|
|
114
|
+
1. No channels completed? → needs-user-decision
|
|
118
115
|
2. Any unresolved P0/P1/P2 after 3 fix rounds? → blocked
|
|
119
116
|
3. Any channel not at full coverage? → degraded-pass
|
|
120
117
|
4. All channels completed, no unresolved P0/P1/P2? → pass
|
|
121
118
|
```
|
|
122
119
|
|
|
123
|
-
A
|
|
124
|
-
|
|
125
|
-
A channel is "not at full coverage" when: it ran as a compensating pass instead of a real tool, it timed out partially, or the Superpowers plugin is not installed and available channels do not cover the full diff.
|
|
120
|
+
A channel is "not at full coverage" when: it ran as a compensating pass instead of a real tool, or it timed out.
|
|
126
121
|
|
|
127
|
-
**Verdict precedence reminder:** `needs-user-decision` > `blocked` > `degraded-pass` > `pass`.
|
|
122
|
+
**Verdict precedence reminder:** `needs-user-decision` > `blocked` > `degraded-pass` > `pass`. When multiple conditions apply simultaneously, the higher-precedence verdict wins.
|
|
128
123
|
|
|
129
|
-
The verdict is always computed after all fix rounds are exhausted — do not emit a partial verdict mid-cycle. If a fix round resolves all P0/P1/P2 findings
|
|
124
|
+
The verdict is always computed after all fix rounds are exhausted — do not emit a partial verdict mid-cycle. If a fix round resolves all P0/P1/P2 findings, the verdict upgrades from `blocked` to `pass` or `degraded-pass` depending on channel coverage. This upgrade must be verified explicitly by re-running the reconciliation step after each fix round, not assumed from the fact that fixes were applied.
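The evaluation order in this section maps directly onto a first-match function. A minimal sketch, assuming the caller has already counted completed channels and unresolved P0/P1/P2 findings; the function and parameter names are illustrative, not part of the CLI.

```python
def derive_verdict(channels_completed, unresolved_blocking, all_full_coverage):
    """Return the first matching verdict, checked in precedence order.

    channels_completed: number of channels that produced results
    unresolved_blocking: count of unresolved P0/P1/P2 findings after fix rounds
    all_full_coverage: False if any channel was compensating or timed out
    """
    if channels_completed == 0:
        return "needs-user-decision"
    if unresolved_blocking > 0:
        return "blocked"
    if not all_full_coverage:
        return "degraded-pass"
    return "pass"
```

Re-running this after each fix round, with fresh counts from reconciliation, is one way to verify an upgrade from `blocked` instead of assuming it.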
|
|
130
125
|
|
|
131
126
|
### Security-Focused Review Checklist
|
|
132
127
|
|
|
@@ -197,4 +192,4 @@ When external CLIs are unavailable, the degraded-mode behavior defined in the Su
|
|
|
197
192
|
5. When both external channels are unavailable, note "All findings are single-model (Claude only). External validation was unavailable." in the review summary.
|
|
198
193
|
6. Never silently drop unavailable channels — always record the channel status and compensating coverage label in the review output.
|
|
199
194
|
|
|
200
|
-
**
|
|
195
|
+
**Claude CLI channel:** Claude CLI handles its own auth and is generally always available. The compensating-pass mechanism applies to external CLIs (Codex, Gemini) that have an installation/auth gate. When Codex or Gemini are unavailable, compensating passes are dispatched via `claude -p` with focused prompts targeting the missing channel's strength area.
|
|
package/content/knowledge/core/multi-model-review-dispatch.md
CHANGED
|
@@ -1,27 +1,18 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: multi-model-review-dispatch
|
|
3
|
-
description: Patterns for dispatching reviews to
|
|
4
|
-
topics: [multi-model, code-review,
|
|
3
|
+
description: Patterns for dispatching reviews to AI CLI tools (Codex, Gemini, Claude), including fallback strategies and finding reconciliation
|
|
4
|
+
topics: [multi-model, code-review, codex, gemini, claude, review-synthesis]
|
|
5
5
|
---
|
|
6
6
|
|
|
7
7
|
# Multi-Model Review Dispatch
|
|
8
8
|
|
|
9
|
-
|
|
9
|
+
Reviews benefit from independent validation by multiple AI models. Different models have different blind spots — Codex excels at code-centric analysis, Gemini brings strength in design and architectural reasoning, and Claude provides plan alignment and code quality assessment. Dispatching to multiple models and reconciling their findings produces higher-quality reviews than any single model alone. This knowledge covers how to dispatch, how to handle failures, and how to reconcile disagreements.
|
|
10
10
|
|
|
11
11
|
## Summary
|
|
12
12
|
|
|
13
13
|
### When to Dispatch
|
|
14
14
|
|
|
15
|
-
Multi-model review
|
|
16
|
-
|
|
17
|
-
| Depth | Review Approach |
|
|
18
|
-
|-------|----------------|
|
|
19
|
-
| 1-2 | Claude-only, reduced pass count |
|
|
20
|
-
| 3 | Claude-only, full pass count |
|
|
21
|
-
| 4 | Full passes + one external model (if available) |
|
|
22
|
-
| 5 | Full passes + multi-model with reconciliation |
|
|
23
|
-
|
|
24
|
-
Dispatch is always optional. If no external model CLI is available, the review proceeds as a Claude-only enhanced review with additional self-review passes to partially compensate.
|
|
15
|
+
Multi-model review runs all enabled channels on every review. The MMR CLI (`mmr review --sync`) is the primary entry point and handles dispatch, parsing, reconciliation, and verdict derivation automatically.
|
|
25
16
|
|
|
26
17
|
### Model Selection
|
|
27
18
|
|
|
@@ -29,15 +20,16 @@ Dispatch is always optional. If no external model CLI is available, the review p
|
|
|
29
20
|
|-------|----------|----------|
|
|
30
21
|
| **Codex** (OpenAI) | Code analysis, implementation correctness, API contract validation | Code reviews, security reviews, API reviews, database schema reviews |
|
|
31
22
|
| **Gemini** (Google) | Design reasoning, architectural patterns, broad context understanding | Architecture reviews, PRD reviews, UX reviews, domain model reviews |
|
|
23
|
+
| **Claude** (Anthropic) | Plan alignment, code quality, testing thoroughness | Code reviews, plan verification, test coverage |
|
|
32
24
|
|
|
33
|
-
|
|
25
|
+
All enabled channels run on every review. When a channel is unavailable, a compensating pass is dispatched via `claude -p` focused on the missing channel's strength area.
|
|
34
26
|
|
|
35
27
|
### Graceful Fallback
|
|
36
28
|
|
|
37
29
|
External models are never required. The fallback chain:
|
|
38
30
|
1. Attempt dispatch to selected model(s)
|
|
39
31
|
2. If CLI unavailable → skip that model, note in report
|
|
40
|
-
3. If timeout →
|
|
32
|
+
3. If timeout → CLI kills the process; no partial output preserved; compensating pass runs
|
|
41
33
|
4. If all external models fail → Claude-only enhanced review (additional self-review passes)
|
|
42
34
|
|
|
43
35
|
The review never blocks on external model availability.
|
|
@@ -82,15 +74,15 @@ If auth fails, report status `auth_failed` and surface recovery to the user:
|
|
|
82
74
|
- Codex: "Codex auth expired — run `! codex login` to re-authenticate"
|
|
83
75
|
- Gemini: "Gemini auth expired — run `! gemini -p \"hello\"` to re-authenticate"
|
|
84
76
|
|
|
85
|
-
If auth check times out (~5 seconds), retry once. If still failing, report `
|
|
77
|
+
If auth check times out (~5 seconds), retry once. If still failing, report `timeout`.
|
|
86
78
|
If auth succeeds, report `ready` and proceed to dispatch.
|
|
87
79
|
|
|
88
80
|
**Post-dispatch terminal states:**
|
|
89
81
|
- `completed` — channel produced results, use normally
|
|
90
|
-
- `
|
|
91
|
-
- `failed` — crashed or unparseable output; triggers compensating pass
|
|
82
|
+
- `timeout` — channel exceeded time limit; CLI kills the process and marks it as `timeout`; triggers compensating pass
|
|
83
|
+
- `failed` — crashed or unparseable output; triggers compensating pass
|
|
92
84
|
|
|
93
|
-
Verdict impact: `
|
|
85
|
+
Verdict impact: `timeout` and `failed` channels mean the review is degraded. Maximum verdict is `degraded-pass` when any channel has a non-`completed` terminal state.
|
|
94
86
|
|
|
95
87
|
#### Prompt Formatting
|
|
96
88
|
|
|
@@ -126,10 +118,12 @@ Respond with a JSON array of findings:
|
|
|
126
118
|
"severity": "P0|P1|P2|P3",
|
|
127
119
|
"category": "coverage|consistency|correctness|completeness",
|
|
128
120
|
"location": "section or line reference",
|
|
129
|
-
"
|
|
121
|
+
"description": "description of the issue",
|
|
130
122
|
"suggestion": "recommended fix"
|
|
131
123
|
}
|
|
132
124
|
]
|
|
125
|
+
|
|
126
|
+
Note: `id` and `category` are optional — the CLI auto-generates IDs (F-001, F-002, ...) when omitted.
|
|
133
127
|
```
|
|
134
128
|
|
|
135
129
|
#### Output Parsing
|
|
@@ -137,15 +131,11 @@ Respond with a JSON array of findings:
|
|
|
137
131
|
External model output is parsed as JSON. Handle common parsing issues:
|
|
138
132
|
- Strip markdown code fences (```json ... ```) if the model wraps output
|
|
139
133
|
- Handle trailing commas in JSON arrays
|
|
140
|
-
- Validate that each finding has the required fields (severity,
|
|
134
|
+
- Validate that each finding has the required fields (severity, location, description, suggestion)
|
|
141
135
|
- Discard malformed entries rather than failing the entire parse
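The parsing steps in this list can be sketched as follows. This is an illustrative helper, not the CLI's parser: `parse_findings` is a hypothetical name, and the trailing-comma cleanup is deliberately naive (it would also alter a comma followed by a closing bracket inside a string value).

```python
import json
import re

REQUIRED_FIELDS = ("severity", "location", "description", "suggestion")


def parse_findings(raw):
    """Parse model output into a list of well-formed finding dicts."""
    text = raw.strip()
    # Strip a wrapping markdown code fence (three backticks, optional language tag).
    fenced = re.match(r"^`{3}[a-zA-Z]*\s*\n(.*?)\n?`{3}\s*$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    # Drop trailing commas that appear right before a closing bracket or brace.
    text = re.sub(r",\s*([\]}])", r"\1", text)
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    # Discard malformed entries instead of failing the whole parse.
    return [
        entry for entry in data
        if isinstance(entry, dict) and all(key in entry for key in REQUIRED_FIELDS)
    ]
```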
|
|
142
136
|
|
|
143
|
-
|
|
144
|
-
|
|
145
|
-
docs/reviews/{artifact}/codex-review.json — raw Codex findings
|
|
146
|
-
docs/reviews/{artifact}/gemini-review.json — raw Gemini findings
|
|
147
|
-
docs/reviews/{artifact}/review-summary.md — reconciled synthesis
|
|
148
|
-
```
|
|
137
|
+
The CLI stores raw output at `~/.mmr/jobs/{job-id}/` per channel. Review results
|
|
138
|
+
are available via `mmr results <job-id>`.
|
|
149
139
|
|
|
150
140
|
### Timeout Handling
|
|
151
141
|
|
|
@@ -158,14 +148,7 @@ External model calls can hang or take unreasonably long. Set reasonable timeouts
|
|
|
158
148
|
| Medium artifact review (2000-10000 words) | 120 seconds | Needs more processing time |
|
|
159
149
|
| Large artifact review (>10000 words) | 180 seconds | Maximum reasonable wait |
|
|
160
150
|
|
|
161
|
-
|
|
162
|
-
|
|
163
|
-
If a timeout occurs mid-response:
|
|
164
|
-
1. Check if the partial output contains valid JSON entries
|
|
165
|
-
2. If yes, use the valid entries and note "partial results" in the report
|
|
166
|
-
3. If no, treat as a model failure and fall back
|
|
167
|
-
|
|
168
|
-
Never wait indefinitely. A review that completes in 3 minutes with Claude-only findings is better than one that blocks for 10 minutes waiting for an external model.
|
|
151
|
+
Never wait indefinitely. A review that completes in 3 minutes with Claude-only findings is better than one that blocks for 10 minutes waiting for an external model. When a channel times out, the CLI kills the process — no partial output is preserved. A compensating pass runs in its place.
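The timeout behavior described here (kill the process, keep no partial output) resembles what Python's `subprocess.run` does with its `timeout` argument. The sketch below is an analogy under that assumption, not the CLI's code; `run_channel` and `SLOW_CHILD` are hypothetical names.

```python
import subprocess
import sys

# Example child process that sleeps well past a short time limit.
SLOW_CHILD = [sys.executable, "-c", "import time; time.sleep(5)"]


def run_channel(cmd, timeout_seconds):
    """Run a channel command and return (terminal_state, output).

    On timeout, subprocess.run kills the child before raising, and this
    helper discards any partial output, mirroring the rule above.
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        return "timeout", None
    if proc.returncode != 0:
        return "failed", None
    return "completed", proc.stdout
```

A caller would queue a compensating pass whenever the first element is `timeout` or `failed`.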
|
|
169
152
|
|
|
170
153
|
### Finding Reconciliation
|
|
171
154
|
|
|
@@ -224,9 +207,9 @@ When synthesizing multi-model findings, classify each finding:
|
|
|
224
207
|
# Multi-Model Review Summary: [Artifact Name]
|
|
225
208
|
|
|
226
209
|
## Models Used
|
|
227
|
-
- Claude
|
|
228
|
-
- Codex
|
|
229
|
-
- Gemini
|
|
210
|
+
- Claude CLI — [available/unavailable/timeout]
|
|
211
|
+
- Codex CLI — [available/unavailable/timeout]
|
|
212
|
+
- Gemini CLI — [available/unavailable/timeout]
|
|
230
213
|
|
|
231
214
|
## Consensus Findings
|
|
232
215
|
| # | Severity | Finding | Models | Confidence |
|
|
@@ -251,14 +234,10 @@ or areas where external models provided unique value]
|
|
|
251
234
|
|
|
252
235
|
#### Raw JSON Preservation
|
|
253
236
|
|
|
254
|
-
Always preserve the raw JSON output from
|
|
237
|
+
Always preserve the raw JSON output from each channel, even after reconciliation. The raw findings serve as an audit trail and enable re-analysis if the reconciliation logic is later improved.
|
|
255
238
|
|
|
256
|
-
|
|
257
|
-
|
|
258
|
-
codex-review.json — raw output from Codex
|
|
259
|
-
gemini-review.json — raw output from Gemini
|
|
260
|
-
review-summary.md — reconciled synthesis
|
|
261
|
-
```
|
|
239
|
+
The CLI stores raw output at `~/.mmr/jobs/{job-id}/` with per-channel result files.
|
|
240
|
+
Results are accessible via `mmr results <job-id>`.
|
|
262
241
|
|
|
263
242
|
### Quality Gates
|
|
264
243
|
|
|
@@ -266,20 +245,18 @@ Minimum standards for a multi-model review to be considered complete:
|
|
|
266
245
|
|
|
267
246
|
| Gate | Threshold | Rationale |
|
|
268
247
|
|------|-----------|-----------|
|
|
269
|
-
|
|
|
270
|
-
| Coverage threshold | Every review pass has at least one finding or explicit "no issues found" note | Ensures all passes were actually executed |
|
|
248
|
+
| Coverage threshold | Every channel has at least one finding or explicit "no issues found" note | Ensures all channels were actually executed |
|
|
271
249
|
| Reconciliation completeness | All cross-model disagreements have documented resolutions | No unresolved conflicts |
|
|
272
|
-
| Raw output preserved |
|
|
250
|
+
| Raw output preserved | Per-channel results exist for all dispatched channels | Audit trail |
|
|
273
251
|
|
|
274
|
-
|
|
252
|
+
Zero findings across all channels is a valid outcome when the diff is clean.
|
|
275
253
|
|
|
276
254
|
#### Degraded-Mode Gate Adaptation
|
|
277
255
|
|
|
278
256
|
When channels are skipped and compensating passes are used:
|
|
279
257
|
|
|
280
|
-
- **Minimum finding count** gate: compensating passes count toward the total but are not treated as separate external channels for consensus purposes.
|
|
281
258
|
- **Reconciliation completeness** gate (cross-model disagreement documentation): applies whenever 2+ distinct model perspectives participate (Claude + one external counts). N/A only when Claude is the sole perspective (no external models and no compensating passes that introduce genuinely different framing).
|
|
282
|
-
- **Coverage threshold** gate: compensating passes satisfy the "every
|
|
259
|
+
- **Coverage threshold** gate: compensating passes satisfy the "every channel has at least one finding or explicit no-issues note" requirement.
|
|
283
260
|
- The reconciled output must record which channels were real, which were compensating, and which were skipped, so the orchestration layer can apply appropriate verdict logic.
|
|
284
261
|
|
|
285
262
|
### Common Anti-Patterns
|
|
@@ -288,12 +265,10 @@ When channels are skipped and compensating passes are used:
|
|
|
288
265
|
|
|
289
266
|
**Ignoring disagreements.** Two models disagree, and the reviewer picks one without analysis. Fix: disagreements are the most valuable signal in multi-model review. They identify areas of genuine ambiguity or complexity. Always investigate and document the resolution.
|
|
290
267
|
|
|
291
|
-
**
|
|
292
|
-
|
|
293
|
-
**No fallback plan.** The review pipeline assumes external models are always available. When Codex is down, the review fails entirely. Fix: external dispatch is always optional. The fallback to Claude-only enhanced review must be implemented and tested.
|
|
268
|
+
**No fallback plan.** The review pipeline assumes external models are always available. When Codex is down, the review fails entirely. Fix: external dispatch is always optional. The CLI automatically dispatches compensating passes via `claude -p` when channels are unavailable.
|
|
294
269
|
|
|
295
270
|
**Over-weighting consensus.** Two models agree on a finding, so it must be correct. But both models may share the same bias (e.g., both flag a pattern as an anti-pattern that is actually appropriate for this project's constraints). Fix: consensus increases confidence but does not guarantee correctness. All findings still require artifact-level verification.
|
|
296
271
|
|
|
297
272
|
**Dispatching the full pipeline context.** Sending the entire project context (all docs, all code) to the external model. This exceeds context limits and dilutes focus. Fix: send only the artifact under review and the minimal upstream context needed for that specific review.
|
|
298
273
|
|
|
299
|
-
**
|
|
274
|
+
**Treating a timeout as a silent skip.** A channel times out and the reviewer proceeds without documenting it. Fix: when a channel times out, record the root-cause status as `timeout`, queue a compensating pass, and include it in the review summary. The CLI kills timed-out processes — no partial output is available, but the compensating pass ensures coverage.
|
|
package/content/tools/post-implementation-review.md
CHANGED
|
@@ -26,7 +26,7 @@ comprehensive quality check before releasing or handing off the project.
|
|
|
26
26
|
The three channels are:
|
|
27
27
|
1. **Codex CLI** — Implementation correctness, security, API contracts
|
|
28
28
|
2. **Gemini CLI** — Design reasoning, architectural patterns, broad context
|
|
29
|
-
3. **
|
|
29
|
+
3. **Claude CLI** — Plan alignment, code quality, testing
|
|
30
30
|
|
|
31
31
|
## Inputs
|
|
32
32
|
|
|
@@ -191,6 +191,7 @@ Return ALL findings as valid JSON:
|
|
|
191
191
|
{
|
|
192
192
|
"severity": "P0|P1|P2|P3",
|
|
193
193
|
"category": "architecture-alignment|security|error-handling|test-coverage|complexity|dependencies",
|
|
194
|
+
"location": "relative/path/to/file.ts:42",
|
|
194
195
|
"file": "relative/path/to/file.ts",
|
|
195
196
|
"line": 42,
|
|
196
197
|
"description": "Specific description of the issue",
|
|
@@ -226,7 +227,7 @@ If not installed: queue a compensating pass (implementation correctness, securit
|
|
|
226
227
|
codex login status 2>/dev/null && echo "codex authenticated" || echo "codex NOT authenticated"
|
|
227
228
|
```
|
|
228
229
|
|
|
229
|
-
If not authenticated: tell the user "Codex auth expired. Run: `! codex login`". Do NOT silently skip. Wait for re-auth and retry once. If auth cannot be recovered (
|
|
230
|
+
If not authenticated: tell the user "Codex auth expired. Run: `! codex login`". Do NOT silently skip. Wait for re-auth and retry once. If auth cannot be recovered (timeout or user declines): queue a compensating pass (implementation correctness, security, API contracts, labeled `[compensating: Codex-equivalent]`).
|
|
230
231
|
|
|
231
232
|
If Codex fails during execution (non-zero exit, malformed output, timeout): queue a compensating pass with the same focus and label.
|
|
232
233
|
|
|
@@ -254,7 +255,7 @@ If not installed: queue a compensating pass (architectural patterns, design reas
|
|
|
254
255
|
NO_BROWSER=true gemini -p "respond with ok" -o json 2>&1
|
|
255
256
|
```
|
|
256
257
|
|
|
257
|
-
If exit code is 41: tell the user "Gemini auth expired. Run: `! gemini -p \"hello\"`". Do NOT silently skip. Wait for re-auth and retry once. If auth cannot be recovered (
|
|
258
|
+
If exit code is 41: tell the user "Gemini auth expired. Run: `! gemini -p \"hello\"`". Do NOT silently skip. Wait for re-auth and retry once. If auth cannot be recovered (timeout or user declines): queue a compensating pass (architectural patterns, design reasoning, broad context, labeled `[compensating: Gemini-equivalent]`).
|
|
258
259
|
|
|
259
260
|
If Gemini fails during execution (non-zero exit, malformed output, timeout): queue a compensating pass with the same focus and label.
|
|
260
261
|
|
|
@@ -291,6 +292,7 @@ surfaces to this format before returning):
|
|
|
291
292
|
{
|
|
292
293
|
"severity": "P0|P1|P2|P3",
|
|
293
294
|
"category": "architecture-alignment|security|error-handling|test-coverage|complexity|dependencies",
|
|
295
|
+
"location": "relative/path/to/file.ts:42",
|
|
294
296
|
"file": "relative/path/to/file.ts",
|
|
295
297
|
"line": 42,
|
|
296
298
|
"description": "Specific description of the issue",
|
|
@@ -300,6 +302,10 @@ surfaces to this format before returning):
|
|
|
300
302
|
}
|
|
301
303
|
```
|
|
302
304
|
|
|
305
|
+
**MMR compatibility:** The `location` field (`file:line` format) is required for
|
|
306
|
+
`mmr reconcile` injection. The `file` and `line` fields are retained for backward
|
|
307
|
+
compatibility with direct channel consumers.
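For findings that only carry the older `file` and `line` fields, the `location` value can be derived before injection. A minimal sketch; `with_location` is a hypothetical helper, not part of the tool.

```python
def with_location(finding):
    """Return a copy of the finding with `location` set in file:line format.

    An existing `location` is left untouched; `file` and `line` are kept
    for backward compatibility.
    """
    updated = dict(finding)
    if "location" not in updated:
        updated["location"] = f"{updated['file']}:{updated['line']}"
    return updated
```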
|
|
308
|
+
|
|
303
309
|
Store as `SUPERPOWERS_PHASE1_FINDINGS`.
|
|
304
310
|
|
|
305
311
|
### Step 5: Run Phase 2 — Parallel User Story Review
|
|
@@ -441,8 +447,8 @@ before returning. Then return all three channels' findings plus channel status:
|
|
|
441
447
|
{
|
|
442
448
|
"story": "[STORY_TITLE]",
|
|
443
449
|
"channel_status": {
|
|
444
|
-
"codex": { "root_cause": "null|not_installed|auth_failed|
|
|
445
|
-
"gemini": { "root_cause": "null|not_installed|auth_failed|
|
|
450
|
+
"codex": { "root_cause": "null|not_installed|auth_failed|timeout|failed", "coverage_status": "full|compensating" },
|
|
451
|
+
"gemini": { "root_cause": "null|not_installed|auth_failed|timeout|failed", "coverage_status": "full|compensating" },
|
|
446
452
|
"superpowers": { "root_cause": null, "coverage_status": "full" }
|
|
447
453
|
},
|
|
448
454
|
"codex": { "findings": [...] },
|
|
@@ -453,6 +459,29 @@ before returning. Then return all three channels' findings plus channel status:
|
|
|
453
459
|
|
|
454
460
|
Collect findings from all subagents. Store as `PHASE2_FINDINGS`.
|
|
455
461
|
|
|
462
|
+
### Step 5e: Optional — Inject Findings into MMR for Unified Reconciliation
|
|
463
|
+
|
|
464
|
+
If an MMR job exists (e.g., from a prior `mmr review` run on the same branch), the
|
|
465
|
+
agent can inject its post-implementation review findings into MMR for unified
|
|
466
|
+
reconciliation across all channels:
|
|
467
|
+
|
|
468
|
+
```bash
|
|
469
|
+
# Inject Phase 1 and Phase 2 findings into an existing MMR job
|
|
470
|
+
# Write agent findings to a temp file for mmr reconcile
|
|
471
|
+
echo "$AGENT_FINDINGS" > /tmp/agent-findings.json
|
|
472
|
+
mmr reconcile "$JOB_ID" --channel superpowers --input /tmp/agent-findings.json
|
|
473
|
+
```
|
|
474
|
+
|
|
475
|
+
All findings injected via `mmr reconcile` must use MMR-compatible schema: each
|
|
476
|
+
finding needs `severity` (P0-P3), `location` (file:line), and `description`
|
|
477
|
+
(`suggestion` is optional). The strict validator will reject findings with
|
|
478
|
+
missing or invalid required fields.
|
|
479
|
+
|
|
480
|
+
This step is optional — post-implementation review is a full-codebase review (not
|
|
481
|
+
diff-only), so it operates independently of `mmr review`. Use `mmr reconcile` only
|
|
482
|
+
when you want to merge post-implementation findings into an existing MMR job for a
|
|
483
|
+
single unified verdict.
|
|
484
|
+
|
|
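The schema requirements in Step 5e can be checked locally before calling `mmr reconcile`. The following is a hypothetical pre-flight sketch, not MMR's own validator: the function names `is_valid_severity` and `is_valid_location` are illustrative, and the exact rules the strict validator applies are an assumption here (severity is one of P0 to P3; location is a non-empty path, a colon, and a numeric line).

```bash
# Hypothetical local checks; MMR's strict validator may apply different rules.

# Succeeds only for the four severities named in the schema.
is_valid_severity() {
  case "$1" in
    P0|P1|P2|P3) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds when the value looks like "path:line", e.g. relative/path/to/file.ts:42.
is_valid_location() {
  printf '%s\n' "$1" | grep -Eq '^.+:[0-9]+$'
}
```

Findings that fail either check can be fixed or dropped by the agent before injection, rather than discovering the rejection after the `mmr reconcile` call.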
456
485
|
### Step 6: Consolidate Findings
|
|
457
486
|
|
|
458
487
|
Merge all findings from Phase 1 (`CODEX_PHASE1_FINDINGS`, `GEMINI_PHASE1_FINDINGS`,
|
|
@@ -656,9 +685,9 @@ the user they require manual attention before the project is ready to release.
|
|
|
656
685
|
| Codex not installed (`command -v` fails) | Queue compensating pass (implementation correctness, security, API contracts, labeled `[compensating: Codex-equivalent]`); document as "not_installed" in report |
|
|
657
686
|
| Gemini not installed (`command -v` fails) | Queue compensating pass (architectural patterns, design reasoning, broad context, labeled `[compensating: Gemini-equivalent]`); document as "not_installed" in report |
|
|
658
687
|
| Codex auth expired — user recovers | Re-run auth check; proceed with full Codex channel |
|
|
659
|
-
| Codex auth expired — user declines or
|
|
688
|
+
| Codex auth expired — user declines or timeout | Queue compensating pass (implementation correctness, security, API contracts, labeled `[compensating: Codex-equivalent]`); document as "auth_failed" or "timeout" in report |
|
|
660
689
|
| Gemini auth expired (exit 41) — user recovers | Re-run auth check; proceed with full Gemini channel |
|
|
661
|
-
| Gemini auth expired — user declines or
|
|
690
|
+
| Gemini auth expired — user declines or timeout | Queue compensating pass (architectural patterns, design reasoning, broad context, labeled `[compensating: Gemini-equivalent]`); document as "auth_failed" or "timeout" in report |
|
|
662
691
|
| Channel fails during execution (non-zero exit, malformed output, timeout) | Queue compensating pass for that channel with same focus and label; document root cause in report |
|
|
663
692
|
| Both external CLIs unavailable (any combination of not_installed / auth failure) | Run all compensating passes plus Superpowers code-reviewer; report coverage as "degraded-coverage"; warn user that review coverage is reduced |
|
|
664
693
|
| Superpowers unavailable | Document as "unavailable" in report; proceed with remaining channels; Superpowers is a Claude subagent and should always be available |
|
|
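The fallback rows above reduce to a small mapping from a channel's root cause to its coverage status and finding label. A minimal sketch, assuming the `root_cause` and `coverage_status` values shown in the channel status schema; the function names are hypothetical and not part of any tool in this package.

```bash
# Hypothetical helpers; value names are taken from the channel_status schema.

# Map a root_cause value to a coverage_status value.
coverage_status_for() {
  case "$1" in
    null|"") echo "full" ;;
    not_installed|auth_failed|timeout|failed) echo "compensating" ;;
    *) echo "unknown"; return 1 ;;
  esac
}

# Return the label applied to findings from a compensating pass.
compensating_label_for() {
  case "$1" in
    codex)  echo "[compensating: Codex-equivalent]" ;;
    gemini) echo "[compensating: Gemini-equivalent]" ;;
    *) return 1 ;;
  esac
}
```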
@@ -23,7 +23,7 @@ anything leaves the machine.
|
|
|
23
23
|
The three channels are:
|
|
24
24
|
1. **Codex CLI** — implementation correctness, security, API contracts
|
|
25
25
|
2. **Gemini CLI** — architectural patterns, broad-context reasoning
|
|
26
|
-
3. **
|
|
26
|
+
3. **Claude CLI** — Claude subagent review of code quality, tests, and plan alignment
|
|
27
27
|
|
|
28
28
|
## Inputs
|
|
29
29
|
|
|
@@ -46,6 +46,31 @@ The three channels are:
|
|
|
46
46
|
|
|
47
47
|
## Instructions
|
|
48
48
|
|
|
49
|
+
### Primary: MMR CLI + Agent Reconcile
|
|
50
|
+
|
|
51
|
+
When the MMR CLI is installed, use it as the primary entry point:
|
|
52
|
+
|
|
53
|
+
```bash
|
|
54
|
+
# Staged changes
|
|
55
|
+
mmr review --staged --sync --format json
|
|
56
|
+
|
|
57
|
+
# Branch diff against main
|
|
58
|
+
mmr review --base main --sync --format json
|
|
59
|
+
```
|
|
60
|
+
|
|
61
|
+
After the CLI review completes, dispatch the agent's code-reviewer skill (4th channel) and inject findings into the MMR job for unified reconciliation:
|
|
62
|
+
|
|
63
|
+
```bash
|
|
64
|
+
# job_id is captured from mmr review --sync --format json output
|
|
65
|
+
# Write agent findings to a temp file for mmr reconcile
|
|
66
|
+
echo "$AGENT_FINDINGS" > /tmp/agent-findings.json
|
|
67
|
+
mmr reconcile "$JOB_ID" --channel superpowers --input /tmp/agent-findings.json
|
|
68
|
+
```
|
|
69
|
+
|
|
70
|
+
The agent's review output must use the MMR-compatible finding schema: each finding needs `severity` (P0-P3), `location` (file:line), and `description` (`suggestion` is optional).
|
|
71
|
+
|
|
72
|
+
If `mmr` is not installed (`command -v mmr` fails), fall back to the manual multi-channel flow below.
|
|
73
|
+
|
|
49
74
|
### Step 1: Detect Mode
|
|
50
75
|
|
|
51
76
|
Parse `$ARGUMENTS` and set:
|
|
@@ -173,7 +198,7 @@ codex login status 2>/dev/null
|
|
|
173
198
|
- If `codex` is not installed: skip this channel and record root-cause `not_installed`
|
|
174
199
|
- If auth fails: tell the user to run `! codex login`, retry after recovery, and if recovery is not possible, record root-cause `auth_failed` and continue with the remaining channels
|
|
175
200
|
|
|
176
|
-
If auth cannot be recovered, or if Codex is not installed, queue a compensating Claude self-review pass focused on implementation correctness, security, and API contracts. Label findings as `[compensating: Codex-equivalent]`. If auth check times out (~5s), retry once; if still failing, record `
|
|
201
|
+
If auth cannot be recovered, or if Codex is not installed, queue a compensating Claude self-review pass focused on implementation correctness, security, and API contracts. Label findings as `[compensating: Codex-equivalent]`. If the auth check times out (~5s), retry once; if it still fails, record `timeout` and queue a compensating pass. The compensating pass runs after all channel dispatch attempts complete.
|
|
177
202
|
|
|
178
203
|
Build the prompt in a temporary file and pass it over stdin:
|
|
179
204
|
|
|
@@ -209,9 +234,9 @@ NO_BROWSER=true gemini -p "$(cat "$PROMPT_FILE")" --output-format json --approva
|
|
|
209
234
|
|
|
210
235
|
If the CLI exits with a non-zero code, produces malformed/unparseable output, or is killed by the tool runner timeout, record root-cause `failed` and queue a compensating pass for that channel.
|
|
211
236
|
|
|
212
|
-
#### Channel 3:
|
|
237
|
+
#### Channel 3: Claude CLI
|
|
213
238
|
|
|
214
|
-
Dispatch
|
|
239
|
+
Dispatch via `claude -p` with the review prompt.
|
|
215
240
|
|
|
216
241
|
- If explicit refs are being reviewed, provide `BASE_SHA` and `HEAD_SHA`
|
|
217
242
|
- Otherwise provide:
|
|
@@ -297,7 +322,7 @@ Otherwise:
|
|
|
297
322
|
3. Repeat for up to 3 fix rounds
|
|
298
323
|
4. If any finding remains unresolved after 3 rounds, stop with verdict `needs-user-decision`
|
|
299
324
|
|
|
300
|
-
**Fix cycle channel rule:** Re-run only channels that originally completed or ran as compensating passes. Never retry a channel marked `
|
|
325
|
+
**Fix cycle channel rule:** Re-run only channels that originally completed or ran as compensating passes. Never retry a channel marked `not_installed`, `auth_failed`, or `timeout` during fix rounds — its availability does not change within a session.
|
|
301
326
|
|
|
302
327
|
### Step 8: Final Verdict
|
|
303
328
|
|
|
@@ -321,9 +346,9 @@ Output a concise summary in this format:
|
|
|
321
346
|
[scope label]
|
|
322
347
|
|
|
323
348
|
### Channels Executed
|
|
324
|
-
- Codex CLI — root cause: [completed /
|
|
325
|
-
- Gemini CLI — root cause: [completed /
|
|
326
|
-
-
|
|
349
|
+
- Codex CLI — root cause: [completed / not_installed / auth_failed / timeout / failed], coverage: [full / compensating (Codex-equivalent)]
|
|
350
|
+
- Gemini CLI — root cause: [completed / not_installed / auth_failed / timeout / failed], coverage: [full / compensating (Gemini-equivalent)]
|
|
351
|
+
- Claude CLI — root cause: [completed / not_installed / auth_failed / timeout / failed], coverage: [full / compensating]
|
|
327
352
|
|
|
328
353
|
### Findings
|
|
329
354
|
[consensus findings first, then single-source findings]
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: review-pr
|
|
3
|
-
description: Run all configured code review channels on a PR (Codex CLI, Gemini CLI,
|
|
3
|
+
description: Run all configured code review channels on a PR (Codex CLI, Gemini CLI, Claude CLI)
|
|
4
4
|
phase: null
|
|
5
5
|
order: null
|
|
6
6
|
dependencies: []
|
|
@@ -21,15 +21,16 @@ of remembering three separate review invocations.
|
|
|
21
21
|
The three channels are:
|
|
22
22
|
1. **Codex CLI** — OpenAI's code analysis (implementation correctness, security, API contracts)
|
|
23
23
|
2. **Gemini CLI** — Google's design reasoning (architectural patterns, broad context)
|
|
24
|
-
3. **
|
|
24
|
+
3. **Claude CLI** — Anthropic's code review (plan alignment, code quality, testing)
|
|
25
25
|
|
|
26
26
|
## Inputs
|
|
27
27
|
|
|
28
28
|
- $ARGUMENTS — PR number (optional; auto-detected from current branch if omitted)
|
|
29
|
-
-
|
|
30
|
-
|
|
31
|
-
|
|
32
|
-
-
|
|
29
|
+
- `.mmr.yaml` — MMR CLI configuration (channels, review_criteria, defaults)
|
|
30
|
+
|
|
31
|
+
The CLI handles review context via config (`review_criteria` in `.mmr.yaml`).
|
|
32
|
+
Project-specific standards (coding-standards, review-standards) are referenced
|
|
33
|
+
in the review criteria config rather than read at dispatch time.
|
|
33
34
|
|
|
34
35
|
## Expected Outputs
|
|
35
36
|
|
|
@@ -48,119 +49,98 @@ PR_NUMBER="${ARGUMENTS:-$(gh pr view --json number -q .number 2>/dev/null)}"
|
|
|
48
49
|
|
|
49
50
|
If no PR is found, stop and tell the user to create a PR first.
|
|
50
51
|
|
|
51
|
-
### Step 2:
|
|
52
|
+
### Step 2: Run MMR Review
|
|
52
53
|
|
|
53
|
-
|
|
54
|
+
Use the MMR CLI as the primary entry point for automated dispatch, reconciliation, and verdict:
|
|
54
55
|
|
|
55
56
|
```bash
|
|
56
|
-
|
|
57
|
+
MMR_RESULT=$(mmr review --pr "$PR_NUMBER" --sync --format json)
|
|
58
|
+
# Extract job_id from JSON output for use in mmr reconcile
|
|
59
|
+
JOB_ID=$(echo "$MMR_RESULT" | grep -o '"job_id": "[^"]*"' | head -1 | cut -d'"' -f4)
|
|
57
60
|
```
|
|
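The `grep`/`cut` pipeline above depends on the exact spacing `"job_id": "` in the JSON output. A slightly more tolerant sketch follows, assuming the output contains a string field named `job_id`; the helper name `extract_job_id` is hypothetical. It accepts any amount of whitespace around the colon.

```bash
# Hypothetical helper: pull the first "job_id" string value out of JSON text.
# Assumes the value contains no escaped double quotes.
extract_job_id() {
  printf '%s\n' "$1" \
    | sed -n 's/.*"job_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
    | head -n 1
}
```

Usage would look like `JOB_ID=$(extract_job_id "$MMR_RESULT")`. Where `jq` is installed, a real JSON parser is the safer choice; this sketch only removes the dependence on spacing.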
58
61
|
|
|
59
|
-
|
|
60
|
-
-
|
|
61
|
-
- `
|
|
62
|
-
-
|
|
63
|
-
-
|
|
62
|
+
The CLI handles:
|
|
63
|
+
- Installation and auth checks for each channel (codex, gemini, claude)
|
|
64
|
+
- Compensating passes when channels are unavailable (dispatched via `claude -p`)
|
|
65
|
+
- Output parsing and finding reconciliation
|
|
66
|
+
- Verdict derivation (pass/degraded-pass/blocked/needs-user-decision)
|
|
67
|
+
- Exit codes: 0=pass/degraded-pass, 2=blocked, 3=needs-user-decision
|
|
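The exit codes listed above can drive the caller's next step. A minimal sketch, assuming only the three codes documented here; the function name `describe_mmr_exit` is hypothetical.

```bash
# Hypothetical helper: translate an mmr exit code into the documented verdict class.
describe_mmr_exit() {
  case "$1" in
    0) echo "pass or degraded-pass" ;;
    2) echo "blocked" ;;
    3) echo "needs-user-decision" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}
```

In a script, `$?` read immediately after a plain assignment such as `MMR_RESULT=$(mmr review ...)` holds the exit status of the command substitution, so it can be passed straight to `describe_mmr_exit "$?"`. Exit code 0 does not distinguish `pass` from `degraded-pass`; the JSON output is needed for that.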
64
68
|
|
|
65
|
-
|
|
69
|
+
The CLI supports multiple input modes:
|
|
70
|
+
- `--pr <number>` — review a GitHub PR (fetches diff via `gh pr diff`)
|
|
71
|
+
- `--diff <file>` — review a diff file
|
|
72
|
+
- `--staged` — review staged changes (`git diff --cached`)
|
|
73
|
+
- `--base <ref> --head <ref>` — review diff between two refs
|
|
66
74
|
|
|
67
|
-
|
|
75
|
+
**Manual fallback** (when MMR CLI is not installed):
|
|
68
76
|
|
|
69
|
-
|
|
77
|
+
Run Codex, Gemini, and Claude CLI commands individually as foreground Bash calls.
|
|
78
|
+
Never use `run_in_background`, `&`, or `nohup`.
|
|
70
79
|
|
|
71
80
|
#### Channel 1: Codex CLI
|
|
72
81
|
|
|
73
|
-
**Installation check:**
|
|
74
|
-
```bash
|
|
75
|
-
command -v codex >/dev/null 2>&1
|
|
76
|
-
```
|
|
77
|
-
- If `codex` is not installed: queue a compensating Claude self-review pass focused on implementation correctness, security, and API contracts. Record root-cause `not_installed`. Skip to next channel.
|
|
78
|
-
|
|
79
|
-
**Auth check first** (auth tokens expire — always re-verify):
|
|
80
|
-
|
|
81
|
-
```bash
|
|
82
|
-
codex login status 2>/dev/null && echo "codex authenticated" || echo "codex NOT authenticated"
|
|
83
|
-
```
|
|
84
|
-
|
|
85
|
-
If auth fails, tell the user: "Codex auth expired. Run: `! codex login`" — do NOT
|
|
86
|
-
silently fall back. After the user re-authenticates, retry.
|
|
87
|
-
|
|
88
|
-
If auth cannot be recovered, queue a compensating pass (same focus as above). Record root-cause `auth_failed`.
|
|
89
|
-
If auth check times out (~5s), retry once. If still failing, record root-cause `auth_timeout` and queue compensating pass.
|
|
90
|
-
|
|
91
|
-
**Run the review:**
|
|
92
|
-
|
|
93
82
|
```bash
|
|
83
|
+
command -v codex >/dev/null 2>&1 || echo "Codex not installed"
|
|
84
|
+
codex login status 2>/dev/null
|
|
94
85
|
codex exec --skip-git-repo-check -s read-only --ephemeral "REVIEW_PROMPT" 2>/dev/null
|
|
95
86
|
```
|
|
96
87
|
|
|
97
|
-
|
|
98
|
-
|
|
99
|
-
- Coding standards from docs/coding-standards.md
|
|
100
|
-
- Review standards from docs/review-standards.md (if exists)
|
|
101
|
-
- Instruction to report P0/P1/P2 findings as JSON with severity, location (file:line), description, and suggestion
|
|
102
|
-
|
|
103
|
-
If the CLI exits with a non-zero code, produces malformed/unparseable output, or is killed by the tool runner timeout, record root-cause `failed` and queue a compensating pass for that channel.
|
|
88
|
+
If not installed or auth fails, queue a compensating pass focused on implementation
|
|
89
|
+
correctness, security, and API contracts. Auth failure recovery: `! codex login`.
|
|
104
90
|
|
|
105
91
|
#### Channel 2: Gemini CLI
|
|
106
92
|
|
|
107
|
-
**Installation check:**
|
|
108
93
|
```bash
|
|
109
|
-
command -v gemini >/dev/null 2>&1
|
|
94
|
+
command -v gemini >/dev/null 2>&1 || echo "Gemini not installed"
|
|
95
|
+
NO_BROWSER=true gemini -p "respond with ok" -o json 2>&1
|
|
96
|
+
NO_BROWSER=true gemini -p "REVIEW_PROMPT" --output-format json --approval-mode yolo 2>/dev/null
|
|
110
97
|
```
|
|
111
|
-
- If `gemini` is not installed: queue a compensating Claude self-review pass focused on architectural patterns, design reasoning, and broad context. Record root-cause `not_installed`. Label findings as `[compensating: Gemini-equivalent]`. Skip to next channel.
|
|
112
98
|
|
|
113
|
-
|
|
99
|
+
If not installed or auth fails, queue a compensating pass focused on architectural
|
|
100
|
+
patterns, design reasoning, and broad context. Auth failure recovery: `! gemini -p "hello"`.
|
|
101
|
+
|
|
102
|
+
#### Channel 3: Claude CLI
|
|
114
103
|
|
|
115
104
|
```bash
|
|
116
|
-
|
|
117
|
-
GEMINI_EXIT=$?
|
|
118
|
-
if [ "$GEMINI_EXIT" -eq 0 ]; then
|
|
119
|
-
echo "gemini authenticated"
|
|
120
|
-
elif [ "$GEMINI_EXIT" -eq 41 ]; then
|
|
121
|
-
echo "gemini NOT authenticated (exit 41: auth error)"
|
|
122
|
-
fi
|
|
105
|
+
claude -p "REVIEW_PROMPT" --output-format json 2>/dev/null
|
|
123
106
|
```
|
|
124
107
|
|
|
125
|
-
|
|
108
|
+
Claude CLI handles its own auth. Focus: plan alignment, code quality, testing.
|
|
126
109
|
|
|
127
|
-
|
|
128
|
-
|
|
110
|
+
**After all channels:** Run any queued compensating passes as additional `claude -p`
|
|
111
|
+
dispatches with focused prompts. Label findings as `[compensating: Codex-equivalent]`
|
|
112
|
+
or `[compensating: Gemini-equivalent]`.
|
|
129
113
|
|
|
130
|
-
|
|
114
|
+
### Step 3: Run Agent Code Review (4th channel)
|
|
131
115
|
|
|
132
|
-
|
|
133
|
-
|
|
134
|
-
|
|
116
|
+
Dispatch your platform's code-reviewer skill for a complementary review:
|
|
117
|
+
- **Claude Code:** dispatch `superpowers:code-reviewer` subagent with the PR diff and review criteria
|
|
118
|
+
- **Other platforms:** use your platform's equivalent agent review skill
|
|
135
119
|
|
|
136
|
-
|
|
137
|
-
each reviews independently.
|
|
120
|
+
The agent skill runs inside your agent's context — it has access to conversation history, project knowledge, and plan context that external CLIs lack.
|
|
138
121
|
|
|
139
|
-
|
|
122
|
+
**Important:** The agent's review output must use the MMR-compatible finding schema: each finding needs `severity` (P0-P3), `location` (file:line), and `description` (`suggestion` is optional). The strict validator in `mmr reconcile` will reject findings with missing or invalid required fields.
|
|
140
123
|
|
|
141
|
-
|
|
124
|
+
### Step 4: Inject Agent Review into MMR
|
|
142
125
|
|
|
143
|
-
|
|
144
|
-
Claude, which is always available).
|
|
126
|
+
Feed the agent review findings into MMR for unified reconciliation:
|
|
145
127
|
|
|
146
128
|
```bash
|
|
147
|
-
|
|
148
|
-
|
|
129
|
+
# job_id is captured from mmr review --sync --format json output
|
|
130
|
+
# Write agent findings to a temp file for mmr reconcile
|
|
131
|
+
echo "$AGENT_FINDINGS" > /tmp/agent-findings.json
|
|
132
|
+
mmr reconcile "$JOB_ID" --channel superpowers --input /tmp/agent-findings.json
|
|
149
133
|
```
|
|
150
134
|
|
|
151
|
-
|
|
152
|
-
|
|
153
|
-
-
|
|
154
|
-
-
|
|
155
|
-
- `BASE_SHA` — base commit
|
|
156
|
-
- `HEAD_SHA` — head commit
|
|
157
|
-
- `DESCRIPTION` — PR summary
|
|
135
|
+
The `reconcile` command:
|
|
136
|
+
- Adds the agent's findings as a new channel in the job
|
|
137
|
+
- Re-runs reconciliation across ALL channels (CLI + agent)
|
|
138
|
+
- Outputs the unified verdict with all sources included
|
|
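The Step 4 snippet writes findings to the fixed path `/tmp/agent-findings.json` using `echo`. A hedged alternative sketch uses `mktemp` and `printf`, so concurrent runs do not overwrite each other's file and backslashes in the JSON are not interpreted by `echo` implementations that process escapes. The helper name `write_findings_file` is hypothetical.

```bash
# Hypothetical helper: write findings JSON to a unique temp file and print its path.
write_findings_file() {
  local findings="$1" tmp
  tmp="$(mktemp "${TMPDIR:-/tmp}/agent-findings.XXXXXX")" || return 1
  # printf '%s' writes the text verbatim, without escape processing.
  printf '%s\n' "$findings" > "$tmp"
  printf '%s\n' "$tmp"
}
```

The printed path can then be used as the `--input` argument, for example `FINDINGS_FILE=$(write_findings_file "$AGENT_FINDINGS")` followed by `mmr reconcile "$JOB_ID" --channel superpowers --input "$FINDINGS_FILE"`.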
158
139
|
|
|
159
|
-
|
|
140
|
+
### Step 5: Reconcile Findings
|
|
160
141
|
|
|
161
|
-
|
|
162
|
-
|
|
163
|
-
After all channels complete, reconcile findings:
|
|
142
|
+
When using `mmr review --sync`, reconciliation is automatic. For manual fallback,
|
|
143
|
+
reconcile findings after all channels complete:
|
|
164
144
|
|
|
165
145
|
| Scenario | Confidence | Action |
|
|
166
146
|
|----------|-----------|--------|
|
|
@@ -171,7 +151,7 @@ After all channels complete, reconcile findings:
|
|
|
171
151
|
| Channels contradict each other | **Low** | Present to user for adjudication |
|
|
172
152
|
| Compensating-pass P0/P1/P2 finding | **Single-source** | Fix per normal thresholds, label as compensating |
|
|
173
153
|
|
|
174
|
-
### Step
|
|
154
|
+
### Step 6: Report Results
|
|
175
155
|
|
|
176
156
|
Output a review summary in this format:
|
|
177
157
|
|
|
@@ -179,9 +159,10 @@ Output a review summary in this format:
|
|
|
179
159
|
## Code Review Summary — PR #[number]
|
|
180
160
|
|
|
181
161
|
### Channels Executed
|
|
182
|
-
- [ ] Codex CLI — root cause: [completed / not installed / auth failed /
|
|
183
|
-
- [ ] Gemini CLI — root cause: [completed / not installed / auth failed /
|
|
184
|
-
- [ ]
|
|
162
|
+
- [ ] Codex CLI — root cause: [completed / not installed / auth failed / timeout / failed], coverage: [full / compensating (Codex-equivalent)]
|
|
163
|
+
- [ ] Gemini CLI — root cause: [completed / not installed / auth failed / timeout / failed], coverage: [full / compensating (Gemini-equivalent)]
|
|
164
|
+
- [ ] Claude CLI — root cause: [completed / not_installed / auth_failed / timeout / failed], coverage: [full / compensating]
|
|
165
|
+
- [ ] Agent review — [completed / skipped], injected via mmr reconcile
|
|
185
166
|
|
|
186
167
|
### Consensus Findings (High Confidence)
|
|
187
168
|
[Findings flagged by 2+ channels]
|
|
@@ -196,7 +177,7 @@ Output a review summary in this format:
|
|
|
196
177
|
[pass / degraded-pass / blocked / needs-user-decision]
|
|
197
178
|
```
|
|
198
179
|
|
|
199
|
-
### Step
|
|
180
|
+
### Step 6a: Final Verdict
|
|
200
181
|
|
|
201
182
|
Return exactly one verdict:
|
|
202
183
|
|
|
@@ -209,17 +190,19 @@ Verdict precedence: `needs-user-decision` > `blocked` > `degraded-pass` > `pass`
|
|
|
209
190
|
|
|
210
191
|
When compensating passes ran, maximum achievable verdict is `degraded-pass`. When both external channels were compensated, note "All findings are single-model."
|
|
211
192
|
|
|
212
|
-
### Step
|
|
193
|
+
### Step 7: Fix P0/P1/P2 Findings
|
|
213
194
|
|
|
214
195
|
If any P0, P1, or P2 findings exist:
|
|
215
196
|
1. Fix them in the code
|
|
216
197
|
2. Push the fixes: `git push`
|
|
217
|
-
3. Re-run the
|
|
198
|
+
3. Re-run the review to verify fixes: `mmr review --pr "$PR_NUMBER" --sync --format json`
|
|
218
199
|
4. After 3 fix rounds with unresolved P0/P1/P2 findings, stop and ask the user for direction — do NOT merge automatically. Document remaining findings and let the user decide whether to continue fixing, create follow-up issues, or override.
|
|
219
200
|
|
|
220
|
-
**Fix
|
|
201
|
+
**Note:** Fix cycles are an orchestration concern — the caller (agent or human) handles the fix loop. The CLI provides the review and verdict; the caller decides whether to fix and re-run.
|
|
202
|
+
|
|
203
|
+
**Fix cycle channel rule:** Re-run only channels that originally completed or ran as compensating passes. Never retry a channel marked `not_installed`, `auth_failed`, or `timeout` during fix rounds — its availability does not change within a session.
|
|
221
204
|
|
|
222
|
-
### Step
|
|
205
|
+
### Step 8: Confirm Completion
|
|
223
206
|
|
|
224
207
|
After all findings are resolved (or 3 rounds complete), output:
|
|
225
208
|
|
|
@@ -236,21 +219,22 @@ Do NOT proceed to the next task or merge until this confirmation is output.
|
|
|
236
219
|
| Channel not installed | Queue compensating pass, report root-cause `not_installed` |
|
|
237
220
|
| Auth expired, user recovers | Retry dispatch |
|
|
238
221
|
| Auth expired, user declines | Queue compensating pass, report root-cause `auth_failed` |
|
|
239
|
-
| Auth check timeout (after retry) | Queue compensating pass, report root-cause `
|
|
222
|
+
| Auth check timeout (after retry) | Queue compensating pass, report root-cause `timeout` |
|
|
240
223
|
| Channel fails during execution | Queue compensating pass, report root-cause `failed` |
|
|
241
224
|
| Both external channels unavailable | Two compensating passes, max verdict: `degraded-pass`, note "All findings single-model" |
|
|
242
|
-
| Superpowers unavailable | Run available CLIs, warn user (Superpowers is always-available Claude — no compensating pass) |
|
|
243
225
|
|
|
244
226
|
## Process Rules
|
|
245
227
|
|
|
246
|
-
1. **Foreground only** — Always run Codex and
|
|
247
|
-
2. **All three channels are mandatory** —
|
|
228
|
+
1. **Foreground only** — Always run Codex, Gemini, and Claude CLI commands as foreground Bash calls. Never use `run_in_background`, `&`, or `nohup`.
|
|
229
|
+
2. **All three channels are mandatory** — Codex CLI, Gemini CLI, and Claude CLI. Skip only when a tool is genuinely not installed, never by choice.
|
|
248
230
|
3. **Auth failures are not silent** — always surface to the user with the exact recovery command.
|
|
249
231
|
4. **Independence** — never share one channel's output with another. Each reviews the diff independently.
|
|
250
232
|
5. **Fix before proceeding** — P0/P1/P2 findings must be resolved before moving to the next task.
|
|
251
233
|
6. **3-round limit** — never attempt more than 3 fix rounds. Surface unresolved findings to the user.
|
|
252
234
|
7. **Document everything** — the review summary must show which channels ran and which were skipped, with reasons.
|
|
253
|
-
8. **
|
|
235
|
+
8. **CLI-first** — use `mmr review --sync` as the primary entry point. Manual dispatch is a fallback only.
|
|
236
|
+
9. **Job storage** — the CLI stores job data at `~/.mmr/jobs/{job-id}/results.json`. Review results are available via `mmr results <job-id>`.
|
|
237
|
+
10. **Dispatch pattern** follows the `multi-model-review-dispatch` knowledge entry. When modifying channel dispatch in this file, verify consistency with `review-code.md` and `post-implementation-review.md`.
|
|
254
238
|
|
|
255
239
|
---
|
|
256
240
|
|