@zigrivers/scaffold 3.24.3 → 3.25.0
- package/content/knowledge/core/automated-review-tooling.md +8 -8
- package/content/pipeline/build/multi-agent-resume.md +3 -3
- package/content/pipeline/build/multi-agent-start.md +3 -3
- package/content/pipeline/build/single-agent-resume.md +3 -3
- package/content/pipeline/build/single-agent-start.md +3 -3
- package/content/pipeline/environment/automated-pr-review.md +15 -3
- package/content/skills/mmr/SKILL.md +8 -1
- package/content/skills/multi-model-dispatch/SKILL.md +5 -5
- package/content/tools/post-implementation-review.md +35 -15
- package/content/tools/review-code.md +36 -16
- package/content/tools/review-pr.md +28 -13
- package/package.json +1 -1
- package/skills/mmr/SKILL.md +8 -1
- package/skills/multi-model-dispatch/SKILL.md +5 -5
@@ -14,7 +14,7 @@ Automated code review leverages AI models to provide consistent, thorough code r

 See `review-methodology` for severity definitions (P0-P3). See `multi-model-review-dispatch` for finding reconciliation rules.

-**Action thresholds:**
+**Action thresholds:** Findings at or above the configured `fix_threshold` (read from `results.fix_threshold` in the verdict JSON; default `P2`) must be fixed before proceeding to the next task. Findings below threshold are recorded as advisory but not actioned.

 ### Degraded-Mode Behavior

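The threshold rule added in this hunk can be sketched as a small shell check. This is an illustration only: `meets_threshold` is a hypothetical helper, not part of `mmr` or scaffold, and it assumes severities are always written `P0` through `P3`.

```bash
# Hypothetical helper (not part of mmr): succeeds when a finding's severity
# is at or above the threshold. P0 is the most severe tier, so "at or above"
# means the numeric suffix is less than or equal to the threshold's suffix.
meets_threshold() {
  sev="${1#P}"        # e.g. "P1" -> "1"
  threshold="${2#P}"  # e.g. "P2" -> "2"
  [ "$sev" -le "$threshold" ]
}

# With the default threshold P2: P0, P1 and P2 must be fixed, P3 is advisory.
for s in P0 P1 P2 P3; do
  if meets_threshold "$s" P2; then
    echo "$s: fix before proceeding"
  else
    echo "$s: advisory"
  fi
done
```

Comparing the numeric suffix keeps the check independent of which threshold a project configures.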
@@ -24,8 +24,8 @@ These are the authoritative verdict definitions. Tool files (`review-code.md`, `

 | Verdict | Condition |
 |---------|-----------|
-| `pass` | All channels completed, no unresolved
-| `degraded-pass` | Some channels unavailable, compensating passes ran, no unresolved
+| `pass` | All channels completed, no unresolved findings at or above `fix_threshold` |
+| `degraded-pass` | Some channels unavailable, compensating passes ran, no unresolved findings at or above `fix_threshold` |
 | `blocked` | Findings at or above fix threshold remain unresolved |
 | `needs-user-decision` | No channels completed — insufficient data for a determination |

@@ -47,7 +47,7 @@ When a channel (Codex or Gemini) is unavailable, the CLI dispatches a compensati
 - Missing Gemini → focus on architectural patterns, design reasoning, broad context.
 - Missing both → two compensating passes (one per missing channel's strength area).
 - Compensating-pass findings are **single-source confidence** — they do NOT raise to high confidence even if they agree with another channel's findings.
-- Normal mandatory-fix thresholds apply:
+- Normal mandatory-fix thresholds apply: findings at or above `fix_threshold` from compensating passes still require fixing.

 #### Foreground-Only Execution

@@ -63,7 +63,7 @@ After all channels complete (including compensating passes), reconcile findings

 Reconciliation normalizes findings from all channels (real and compensating) to a common schema, then matches findings across channels by location and category. The purpose is to detect when multiple independent channels agree on a finding (raising confidence) and to surface contradictions that require human judgment. A finding reported by Codex alone has lower confidence than the same finding reported by both Codex and Gemini.

-The reconciliation output is a deduplicated list of findings with confidence scores. High-confidence findings (agreed by 2+ real channels) are actionable without further discussion. Low-confidence findings (single-source, or from compensating passes) still require action at
+The reconciliation output is a deduplicated list of findings with confidence scores. High-confidence findings (agreed by 2+ real channels) are actionable without further discussion. Low-confidence findings (single-source, or from compensating passes) still require action when at or above `fix_threshold` but should be noted as lower-confidence in the review summary.

 Findings that appear in all three channels (Codex, Gemini, Claude) are considered maximum-confidence and should be surfaced first in the review summary. Findings that appear in only one channel should include the channel name in the finding description to help the developer assess confidence independently.

@@ -112,16 +112,16 @@ Apply the following evaluation order to determine the final verdict. The first m
 ```
 Verdict evaluation order:
 1. No channels completed? → needs-user-decision
-2. Any unresolved
+2. Any unresolved findings at or above `fix_threshold` after 3 fix rounds? → blocked
 3. Any channel not at full coverage? → degraded-pass
-4. All channels completed, no unresolved
+4. All channels completed, no unresolved findings at or above `fix_threshold`? → pass
 ```

 A channel is "not at full coverage" when: it ran as a compensating pass instead of a real tool, or it timed out.

 **Verdict precedence reminder:** `needs-user-decision` > `blocked` > `degraded-pass` > `pass`. When multiple conditions apply simultaneously, the higher-precedence verdict wins.

-The verdict is always computed after all fix rounds are exhausted — do not emit a partial verdict mid-cycle. If a fix round resolves all
+The verdict is always computed after all fix rounds are exhausted — do not emit a partial verdict mid-cycle. If a fix round resolves all findings at or above `fix_threshold`, the verdict upgrades from `blocked` to `pass` or `degraded-pass` depending on channel coverage. This upgrade must be verified explicitly by re-running the reconciliation step after each fix round, not assumed from the fact that fixes were applied.

 ### Security-Focused Review Checklist

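The evaluation order in this hunk is a first-match-wins cascade, which might be sketched as follows. `compute_verdict` and its three counters are hypothetical names chosen for illustration; the real CLI may derive the verdict differently.

```bash
# Hypothetical helper mirroring the evaluation order; the first match wins.
#   $1 = number of channels that completed
#   $2 = unresolved findings at or above fix_threshold once fix rounds are exhausted
#   $3 = channels not at full coverage (ran as a compensating pass, or timed out)
compute_verdict() {
  if [ "$1" -eq 0 ]; then
    echo "needs-user-decision"
  elif [ "$2" -gt 0 ]; then
    echo "blocked"
  elif [ "$3" -gt 0 ]; then
    echo "degraded-pass"
  else
    echo "pass"
  fi
}

compute_verdict 3 0 0   # prints: pass
compute_verdict 3 0 1   # prints: degraded-pass
compute_verdict 2 4 1   # prints: blocked
compute_verdict 0 0 3   # prints: needs-user-decision
```

Because the branches are tested in precedence order, a run that is both degraded and blocked reports `blocked`, matching the precedence reminder in the hunk.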
@@ -168,7 +168,7 @@ Once in-progress work is complete (or if there was none):
 - This reviews the local delivery candidate without requiring a PR
 - Surface auth failures immediately and retry after recovery
 - If recovery is not possible, document reduced review coverage and continue with the available channels
-- Fix any
+- Fix any findings at or above `fix_threshold` before proceeding

 3. **Create PR** (if not already created for in-progress work)
 - Push the branch: `git push -u origin HEAD`
@@ -184,7 +184,7 @@ Once in-progress work is complete (or if there was none):
 4. **Superpowers code-reviewer** (4th channel): dispatch `superpowers:code-reviewer` subagent with BASE_SHA and HEAD_SHA
 - Verify auth before each CLI (`mmr config test` pre-flights all three at once)
 - All four channels should execute. Missing Codex or Gemini → MMR runs a compensating Claude pass in its place (degraded-pass verdict). Missing Claude CLI → review proceeds without compensation.
-- Fix any
+- Fix any findings at or above `fix_threshold` before proceeding
 - Do NOT move to the next task until the review completes

 5. **Between-task cleanup**
@@ -239,7 +239,7 @@ Once in-progress work is complete (or if there was none):
 5. **TDD is not optional** — Continue the red-green-refactor cycle for any in-progress work.
 6. **Quality gates before PR** — Never create a PR with failing checks.
 7. **Honor pre-push review when requested** — If the user or project workflow asks for pre-push multi-model review, run `scaffold run review-code` after quality gates and before `git push`.
-8. **Code review before next task** — After creating a PR, run `scaffold run review-pr`: three CLI channels (Codex CLI, Gemini CLI, Claude CLI) via MMR plus the Superpowers code-reviewer agent as a complementary 4th channel. Fix all
+8. **Code review before next task** — After creating a PR, run `scaffold run review-pr`: three CLI channels (Codex CLI, Gemini CLI, Claude CLI) via MMR plus the Superpowers code-reviewer agent as a complementary 4th channel. Fix all findings at or above `fix_threshold` before moving on.
 9. **Follow CLAUDE.md** — It is the authority on project conventions and commands.

 ---
@@ -171,7 +171,7 @@ For each task:
 - This reviews the local delivery candidate without requiring a PR
 - Surface auth failures immediately and retry after recovery
 - If recovery is not possible, document reduced review coverage and continue with the available channels
-- Fix any
+- Fix any findings at or above `fix_threshold` before proceeding

 7. **Create PR**
 - Push the branch: `git push -u origin HEAD`
@@ -188,7 +188,7 @@ For each task:
 4. **Superpowers code-reviewer** (4th channel): dispatch `superpowers:code-reviewer` subagent with BASE_SHA and HEAD_SHA
 - Verify auth before each CLI (`mmr config test` pre-flights all three at once)
 - All four channels should execute. Missing Codex or Gemini → MMR runs a compensating Claude pass in its place (degraded-pass verdict). Missing Claude CLI → review proceeds without compensation.
-- Fix any
+- Fix any findings at or above `fix_threshold` before proceeding
 - Do NOT move to the next task until the review completes

 9. **Between-task cleanup**
@@ -231,7 +231,7 @@ For each task:
 4. **TDD is not optional** — Write failing tests before implementation. No exceptions.
 5. **Quality gates before PR** — Never create a PR with failing checks.
 6. **Honor pre-push review when requested** — If the user or project workflow asks for pre-push multi-model review, run `scaffold run review-code` after quality gates and before `git push`.
-7. **Code review before next task** — After creating a PR, run `scaffold run review-pr`: three CLI channels (Codex CLI, Gemini CLI, Claude CLI) via MMR plus the Superpowers code-reviewer agent as a complementary 4th channel. Fix all
+7. **Code review before next task** — After creating a PR, run `scaffold run review-pr`: three CLI channels (Codex CLI, Gemini CLI, Claude CLI) via MMR plus the Superpowers code-reviewer agent as a complementary 4th channel. Fix all findings at or above `fix_threshold` before moving on.
 8. **Avoid task conflicts** — Check what other agents are working on before claiming.
 9. **Follow CLAUDE.md** — It is the authority on project conventions and commands.

@@ -145,7 +145,7 @@ Once in-progress work is complete (or if there was none):
 - This reviews the local delivery candidate without requiring a PR
 - Surface auth failures immediately and retry after recovery
 - If recovery is not possible, document reduced review coverage and continue with the available channels
-- Fix any
+- Fix any findings at or above `fix_threshold` before proceeding

 3. **Create PR** (if not already created for in-progress work)
 - Push the branch: `git push -u origin HEAD`
@@ -161,7 +161,7 @@ Once in-progress work is complete (or if there was none):
 4. **Superpowers code-reviewer** (4th channel): dispatch `superpowers:code-reviewer` subagent with BASE_SHA and HEAD_SHA
 - Verify auth before each CLI (`mmr config test` pre-flights all three at once)
 - All four channels should execute. Missing Codex or Gemini → MMR runs a compensating Claude pass in its place (degraded-pass verdict). Missing Claude CLI → review proceeds without compensation.
-- Fix any
+- Fix any findings at or above `fix_threshold` before proceeding
 - Do NOT move to the next task until the review completes

 5. **Claim next task**
@@ -204,7 +204,7 @@ Once in-progress work is complete (or if there was none):
 4. **TDD is not optional** — Continue the red-green-refactor cycle for any in-progress work.
 5. **Quality gates before PR** — Never create a PR with failing checks.
 6. **Honor pre-push review when requested** — If the user or project workflow asks for pre-push multi-model review, run `scaffold run review-code` after quality gates and before `git push`.
-7. **Code review before next task** — After creating a PR, run `scaffold run review-pr`: three CLI channels (Codex CLI, Gemini CLI, Claude CLI) via MMR plus the Superpowers code-reviewer agent as a complementary 4th channel. Fix all
+7. **Code review before next task** — After creating a PR, run `scaffold run review-pr`: three CLI channels (Codex CLI, Gemini CLI, Claude CLI) via MMR plus the Superpowers code-reviewer agent as a complementary 4th channel. Fix all findings at or above `fix_threshold` before moving on.
 8. **Follow CLAUDE.md** — It is the authority on project conventions and commands.

 ---
@@ -150,7 +150,7 @@ For each task:
 - This reviews the local delivery candidate without requiring a PR
 - Surface auth failures immediately and retry after recovery
 - If recovery is not possible, document reduced review coverage and continue with the available channels
-- Fix any
+- Fix any findings at or above `fix_threshold` before proceeding

 7. **Create PR**
 - Push the branch: `git push -u origin HEAD`
@@ -167,7 +167,7 @@ For each task:
 4. **Superpowers code-reviewer** (4th channel): dispatch `superpowers:code-reviewer` subagent with BASE_SHA and HEAD_SHA
 - Verify auth before each CLI (`mmr config test` pre-flights all three at once)
 - All four channels should execute. Missing Codex or Gemini → MMR runs a compensating Claude pass in its place (degraded-pass verdict). Missing Claude CLI → review proceeds without compensation.
-- Fix any
+- Fix any findings at or above `fix_threshold` before proceeding
 - Do NOT move to the next task until the review completes

 9. **Update status**
@@ -202,7 +202,7 @@ For each task:
 2. **One task at a time** — Complete the current task fully before starting the next.
 3. **Quality gates before PR** — Never create a PR with failing checks.
 4. **Honor pre-push review when requested** — If the user or project workflow asks for pre-push multi-model review, run `scaffold run review-code` after quality gates and before `git push`.
-5. **Code review before next task** — After creating a PR, run `scaffold run review-pr`: three CLI channels (Codex CLI, Gemini CLI, Claude CLI) via MMR plus the Superpowers code-reviewer agent as a complementary 4th channel. Fix all
+5. **Code review before next task** — After creating a PR, run `scaffold run review-pr`: three CLI channels (Codex CLI, Gemini CLI, Claude CLI) via MMR plus the Superpowers code-reviewer agent as a complementary 4th channel. Fix all findings at or above `fix_threshold` before moving on.
 6. **Update status immediately** — Mark tasks complete as soon as review passes.
 7. **Consult lessons.md** — Check for relevant anti-patterns before each task.
 8. **Follow CLAUDE.md** — It is the authority on project conventions and commands.
@@ -100,6 +100,18 @@ Check if AGENTS.md exists first. If it exists, check for scaffold tracking comme

 ## Instructions

+### MMR Configuration
+
+If `.mmr.yaml` does not exist in the project root and `mmr` is on `PATH`,
+run `mmr config init` once to create one. The generated file pins
+`fix_threshold: P2` (the recommended default for typical software work)
+with an explanatory comment block describing each severity tier — edit
+the value if your project warrants a different gate (`P1` for low-friction
+prototypes; `P3` for security-sensitive work).
+
+If `mmr` is not installed, install it before running multi-model review;
+otherwise channels will degrade.
+
 ### Configure Review Enforcement Hook

 Add a Claude Code hook to the project's `.claude/settings.json` that fires after
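The new "MMR Configuration" instructions amount to a guarded one-time setup, which might look like the sketch below. `ensure_mmr_config` is a hypothetical wrapper name; only `mmr config init` and the `.mmr.yaml` filename come from the text above, and the sketch assumes it is run from the project root.

```bash
# Hypothetical wrapper: create .mmr.yaml only when it is missing and mmr is
# available. Assumes the current directory is the project root.
ensure_mmr_config() {
  if [ -f .mmr.yaml ]; then
    echo "mmr config: .mmr.yaml already present"
  elif command -v mmr >/dev/null 2>&1; then
    mmr config init
  else
    echo "mmr config: mmr is not on PATH; install it before multi-model review" >&2
  fi
}
```

Calling `ensure_mmr_config` a second time is a no-op once the file exists, which fits the "run once" wording.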
@@ -118,7 +130,7 @@ Add this to `.claude/settings.json`:
 "hooks": [
 {
 "type": "command",
-"command": "if echo \"$CC_BASH_COMMAND\" | grep -q 'gh pr create'; then echo '\\n⚠️ MANDATORY: Run all 3 CLI review channels plus the Superpowers 4th channel before proceeding to the next task:\\n\\n 1. Codex CLI:\\n Auth: codex login status 2>/dev/null\\n Run: codex exec --skip-git-repo-check -s read-only --ephemeral \"REVIEW_PROMPT\" 2>/dev/null\\n\\n 2. Gemini CLI:\\n Auth: NO_BROWSER=true gemini -p \"respond with ok\" -o json 2>&1\\n Run: NO_BROWSER=true gemini -p \"REVIEW_PROMPT\" --output-format json --approval-mode yolo 2>/dev/null\\n\\n 3. Claude CLI:\\n Auth: claude -p \"respond with ok\" 2>/dev/null\\n Run: claude -p \"REVIEW_PROMPT\" --output-format json 2>/dev/null\\n\\n 4. Superpowers code-reviewer (complementary 4th channel):\\n Dispatch superpowers:code-reviewer subagent with BASE_SHA and HEAD_SHA\\n\\nIf auth fails: tell user to run ! codex login, ! gemini -p \"hello\", or ! claude login (as applicable).\\nDo not silently skip channels — surface auth failures and let MMR decide: missing Codex/Gemini get compensating Claude passes (degraded-pass verdict); missing Claude proceeds without compensation.\\nFix all
+"command": "if echo \"$CC_BASH_COMMAND\" | grep -q 'gh pr create'; then echo '\\n⚠️ MANDATORY: Run all 3 CLI review channels plus the Superpowers 4th channel before proceeding to the next task:\\n\\n 1. Codex CLI:\\n Auth: codex login status 2>/dev/null\\n Run: codex exec --skip-git-repo-check -s read-only --ephemeral \"REVIEW_PROMPT\" 2>/dev/null\\n\\n 2. Gemini CLI:\\n Auth: NO_BROWSER=true gemini -p \"respond with ok\" -o json 2>&1\\n Run: NO_BROWSER=true gemini -p \"REVIEW_PROMPT\" --output-format json --approval-mode yolo 2>/dev/null\\n\\n 3. Claude CLI:\\n Auth: claude -p \"respond with ok\" 2>/dev/null\\n Run: claude -p \"REVIEW_PROMPT\" --output-format json 2>/dev/null\\n\\n 4. Superpowers code-reviewer (complementary 4th channel):\\n Dispatch superpowers:code-reviewer subagent with BASE_SHA and HEAD_SHA\\n\\nIf auth fails: tell user to run ! codex login, ! gemini -p \"hello\", or ! claude login (as applicable).\\nDo not silently skip channels — surface auth failures and let MMR decide: missing Codex/Gemini get compensating Claude passes (degraded-pass verdict); missing Claude proceeds without compensation.\\nFix all findings at or above the configured fix_threshold (see results.fix_threshold in the verdict JSON; default P2).\\nFull instructions: scaffold run review-pr'; fi"
 }
 ]
 }
@@ -151,8 +163,8 @@ markers, replace it in place and add the markers.
 <!-- scaffold:automated-pr-review:claude-md start -->
 **Mandatory after `gh pr create`** — run `/scaffold:review-pr <PR#>` to execute
 all three review channels (Codex CLI, Gemini CLI, Claude CLI), plus the
-Superpowers code-reviewer agent as a complementary 4th channel. Fix
-
+Superpowers code-reviewer agent as a complementary 4th channel. Fix findings
+at or above `fix_threshold` before moving to the next task. A post-hook on `gh pr create` will
 remind you.

 **Optional but supported** for non-PR targets — the review is not PR-gated.
@@ -119,13 +119,20 @@ Re-run `mmr config test` after re-authenticating to verify.

 ## Severity Gate

-Default threshold is P2 (
+Default threshold is `P2` (the verdict gate blocks on P0, P1, and P2;
+P3 findings are kept in the result as **advisory** but don't cause
+`blocked`). Override per-review:

 ```bash
 mmr review --pr 47 --fix-threshold P1 # Only fix P0 and P1
 mmr review --pr 47 --fix-threshold P0 # Only fix critical issues
 ```

+The verdict JSON includes `advisory_count` (count of findings strictly
+below the threshold). Formatted output shows `Advisory: N` (text) or
+`**Advisory:** N` (markdown) when non-zero — useful for spotting real
+findings that the gate didn't block.
+
 ## Output Formats

 ```bash
@@ -147,13 +147,13 @@ When dispatching a review, bundle all relevant context into the prompt. Each CLI
 ### Template for Artifact Review

 ```
-You are reviewing a project artifact for quality issues. Report P0
+You are reviewing a project artifact for quality issues. Report all P0, P1, P2, and P3 findings; the project's fix threshold is applied downstream.

 ## Severity Definitions
 - P0: Will cause implementation failure, data loss, security vulnerability, or fundamental architectural flaw
 - P1: Will cause bugs in normal usage, inconsistency across documents, or blocks downstream work
 - P2: Improvement opportunity — style, naming, documentation, minor optimization
--
+- P3: Personal preference, trivial nits — included so a strict project (`fix_threshold: P3`) can act on them; otherwise advisory

 ## Review Standards
 [paste contents of docs/review-standards.md if it exists, otherwise use severity definitions above]
@@ -170,7 +170,7 @@ Respond with a JSON object:
 "approved": true/false,
 "findings": [
 {
-"severity": "P0" or "P1" or "P2",
+"severity": "P0" or "P1" or "P2" or "P3",
 "location": "section or line reference",
 "description": "what's wrong",
 "suggestion": "specific fix"
@@ -179,13 +179,13 @@ Respond with a JSON object:
 "summary": "one-line assessment"
 }

-If no
+If no findings, respond with: { "approved": true, "findings": [], "summary": "No issues found." }
 ```

 ### Template for PR Diff Review

 ```
-You are reviewing a pull request diff. Report P0, P1, and
+You are reviewing a pull request diff. Report all P0, P1, P2, and P3 findings; the project's fix threshold is applied downstream.

 ## Review Standards
 [paste docs/review-standards.md]
@@ -10,7 +10,7 @@ conditional: null
 stateless: true
 category: tool
 knowledge-base: [multi-model-review-dispatch, automated-review-tooling, post-implementation-review-methodology]
-argument-hint: "[--report-only]"
+argument-hint: "[--report-only] [--fix-threshold P0|P1|P2|P3]"
 ---

 ## Purpose
@@ -30,7 +30,7 @@ The three channels are:

 ## Inputs

-- `$ARGUMENTS` — `--report-only` flag
+- `$ARGUMENTS` — `--report-only` flag and/or `--fix-threshold P0|P1|P2|P3` (both optional)
 - `docs/user-stories.md` (required) — user stories with acceptance criteria; organizing manifest for Phase 2
 - `docs/implementation-plan.md` (optional) — implementation tasks; used to cross-check that all planned deliverables were built
 - `docs/coding-standards.md` (required) — coding conventions for review context
@@ -43,13 +43,13 @@ The three channels are:
 ## Expected Outputs

 - `docs/reviews/post-implementation-review.md` — consolidated findings report
-- Fixed code (
+- Fixed code (findings at or above `fix_threshold` resolved) — in review+fix and update modes

 ## Mode Detection

 | Condition | Mode |
 |-----------|------|
-| No prior report, no `--report-only` | **Review + Fix** — run all phases, then fix
+| No prior report, no `--report-only` | **Review + Fix** — run all phases, then fix findings at or above `fix_threshold` |
 | No prior report, `--report-only` | **Report Only** — run all phases, write report, no code changes |
 | Prior report exists, no `--report-only` | **Update Mode** — load prior findings, skip to Phase 3 fix execution |
 | Prior report exists, `--report-only` | **Re-review** — run full review fresh, overwrite prior report |
@@ -63,6 +63,12 @@ The three channels are:
 REPORT_ONLY=false
 [[ "$ARGUMENTS" == *"--report-only"* ]] && REPORT_ONLY=true

+# Detect --fix-threshold flag
+FIX_THRESHOLD=""
+if [[ "$ARGUMENTS" =~ (^|[[:space:]])--fix-threshold[[:space:]]+(P[0-3])($|[[:space:]]) ]]; then
+FIX_THRESHOLD="${BASH_REMATCH[2]}"
+fi
+
 # Detect prior report
 PRIOR_REPORT="docs/reviews/post-implementation-review.md"
 [[ -f "$PRIOR_REPORT" ]] && PRIOR_EXISTS=true || PRIOR_EXISTS=false
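To see which argument strings the new `--fix-threshold` detection accepts, here is a rough tokenizing equivalent. It is deliberately not the regex from the hunk: `parse_fix_threshold` is a hypothetical function that splits the arguments on whitespace, which also works in shells without `BASH_REMATCH`. Like the regex, it only accepts `P0` through `P3` as a separate word directly after the flag.

```bash
# Hypothetical tokenizing variant of the flag detection (not the tool's code).
# Prints the threshold when "--fix-threshold P<0-3>" appears in the argument
# string; prints nothing otherwise. The body runs in a subshell, so the
# positional parameters and "set -f" do not leak into the caller.
parse_fix_threshold() (
  set -f       # disable globbing while the string is word-split
  set -- $1    # split the argument string on whitespace
  while [ "$#" -gt 1 ]; do
    if [ "$1" = "--fix-threshold" ]; then
      case "$2" in
        P[0-3]) echo "$2"; exit 0 ;;
      esac
    fi
    shift
  done
)

parse_fix_threshold "--report-only --fix-threshold P1"   # prints: P1
parse_fix_threshold "--fix-threshold P9"                 # prints nothing
```

An out-of-range value such as `P9` leaves the threshold empty, so the project's configured value would still apply.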
@@ -482,6 +488,12 @@ diff-only), so it operates independently of `mmr review`. Use `mmr reconcile` on
 when you want to merge post-implementation findings into an existing MMR job for a
 single unified verdict.

+If `$FIX_THRESHOLD` is set and a fresh `mmr review` is dispatched as part
+of this flow (e.g., to seed a job for `mmr reconcile`), forward it to that
+invocation: `mmr review … --fix-threshold "$FIX_THRESHOLD" …`. The
+existing `mmr reconcile` call does not take `--fix-threshold` directly —
+the job's threshold is set at `mmr review` time.
+
 ### Step 6: Consolidate Findings

 Merge all findings from Phase 1 (`CODEX_PHASE1_FINDINGS`, `GEMINI_PHASE1_FINDINGS`,
@@ -495,8 +507,12 @@ entry. Record all source channels in a `sources` array on the merged finding.

 **Sorting:** P0 first, then P1, then P2, then P3.

-**Fix queue:**
-
+**Fix queue:** Findings at or above the configured `fix_threshold` enter the
+fix queue. The threshold defaults to `P2` (so P0, P1, P2 enter the queue and
+P3 is advisory) and is configurable via `.mmr.yaml`, `--fix-threshold`
+passed to this command, or the user's `~/.mmr/config.yaml`. The agent
+reads the active threshold from `$FIX_THRESHOLD` if set; otherwise from
+`.mmr.yaml` or the built-in default.

 ### Step 7: Write the Findings Report

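The lookup order described for the fix queue (flag value, then `.mmr.yaml`, then the built-in default) could be sketched as below. `resolve_fix_threshold` is a hypothetical helper, and it assumes the project file carries a top-level `fix_threshold: P<n>` line as the generated config is described elsewhere in this diff. The user-level `~/.mmr/config.yaml` is left out because the text does not say where it ranks.

```bash
# Hypothetical resolver: $FIX_THRESHOLD, then .mmr.yaml, then the default P2.
# Assumes .mmr.yaml holds a top-level "fix_threshold: P<n>" line.
resolve_fix_threshold() {
  if [ -n "${FIX_THRESHOLD:-}" ]; then
    echo "$FIX_THRESHOLD"
    return 0
  fi
  if [ -f .mmr.yaml ]; then
    from_file="$(sed -n 's/^fix_threshold:[[:space:]]*\(P[0-3]\).*/\1/p' .mmr.yaml | head -n 1)"
    if [ -n "$from_file" ]; then
      echo "$from_file"
      return 0
    fi
  fi
  echo "P2"   # built-in default named in the text
}
```

Run from the project root, `resolve_fix_threshold` prints exactly one tier label, so its output can be passed straight to a severity comparison.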
@@ -543,7 +559,7 @@ Create `docs/reviews/` if it does not exist. Write the following to
 - [criterion]: satisfied | partial | not-satisfied

 **Findings:**
-[
+[Findings sorted by severity, or "No findings."]

 [Repeat for each story]

@@ -574,7 +590,10 @@ PRE_FIX_SHA=$(git rev-parse HEAD)
 This is used in Step 9 to identify all files modified across all fix commits,
 regardless of how many severity-tier commits are made.

-Process the fix queue in priority order:
+Process the fix queue in priority order: iterate severity tiers from most
+critical to least, processing every tier from `P0` down to and including
+the configured `fix_threshold` (default `P2`). At threshold `P3` this
+includes all four tiers; at `P0` only critical findings are processed.
 Within each severity tier, fix high-confidence findings (multi-source) first.

 For each finding:
@@ -593,15 +612,16 @@ For each finding:
 - Stop attempting to fix it
 - Continue to the next finding in the queue

-After all
-before moving to
+After all findings in a severity tier are fixed, re-read each modified file
+once to confirm correctness before moving to the next tier.

-Commit after each severity tier
+Commit after each severity tier processed (the tier label varies by run —
+`P0`, `P1`, `P2`, or `P3` depending on the configured threshold):

 ```bash
 git add [modified source files only — not the report]
-git commit -m "fix: resolve
-#
+git commit -m "fix: resolve <tier> post-implementation review findings"
+# Substitute <tier> with the severity label of the tier you just processed
 ```

 ### Step 9: Final Verification Pass
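The tier-by-tier processing and per-tier commit described in the two hunks above can be pictured with a short loop. This is a sketch under assumptions: `tiers_for_threshold` is a hypothetical helper, and the loop only echoes the commit message instead of running `git`.

```bash
# Hypothetical helper: list the tiers to process for a given threshold,
# most critical first (P0 down to and including the threshold).
tiers_for_threshold() {
  max="${1#P}"   # e.g. "P2" -> "2"
  n=0
  while [ "$n" -le "$max" ]; do
    echo "P$n"
    n=$((n + 1))
  done
}

# Dry run: show the commit message each processed tier would get.
# Falls back to the default threshold P2 when FIX_THRESHOLD is unset.
for tier in $(tiers_for_threshold "${FIX_THRESHOLD:-P2}"); do
  echo "fix: resolve $tier post-implementation review findings"
done
```

At threshold `P3` the loop visits all four tiers, and at `P0` it visits only the critical tier, matching the wording in the hunk.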
@@ -616,7 +636,7 @@ recorded at the start of Step 8:
 git diff --name-only $PRE_FIX_SHA..HEAD
 ```

-This captures files from every severity-tier commit
+This captures files from every severity-tier commit, not just
 the most recent one.

 Dispatch `superpowers:code-reviewer` with:
@@ -701,7 +721,7 @@ the user they require manual attention before the project is ready to release.
 3. **Auth failures are not silent** — always surface to the user with the exact recovery command (`! codex login` or `! gemini -p "hello"`). Wait for user response before queuing a compensating pass.
 4. **Independence** — never share one channel's output with another. Each reviews independently.
 5. **Verify every fix** — run tests (or re-read the file) immediately after each fix before moving on.
-6. **3-round limit (per finding)** — never attempt to fix the *same*
+6. **3-round limit (per finding)** — never attempt to fix the *same* blocking finding more than 3 times. Each round that surfaces a *new, different, fixable* finding is healthy iteration — keep going. Stop only when the same finding recurs across 3 attempts, channels contradict each other, or the user asks to stop. Surface unresolved findings to the user.
 7. **Document everything** — the report must show which channels ran, which were compensating, which were skipped, and the root cause for any degraded channel.
 8. **No auto-merge** — this tool modifies local files only. It never pushes, merges, or creates PRs.
 9. **Dispatch pattern cross-reference** — Phase 2 parallel dispatch uses `superpowers:dispatching-parallel-agents`. Each story subagent dispatches its own `superpowers:code-reviewer` as Channel 3. This two-level nesting is intentional and supported.
|
|
@@ -10,7 +10,7 @@ conditional: null
|
|
|
10
10
|
stateless: true
|
|
11
11
|
category: tool
|
|
12
12
|
knowledge-base: [multi-model-review-dispatch, automated-review-tooling]
|
|
13
|
-
argument-hint: "[--base <ref>] [--head <ref>] [--staged] [--report-only]"
|
|
13
|
+
argument-hint: "[--base <ref>] [--head <ref>] [--staged] [--report-only] [--fix-threshold P0|P1|P2|P3]"
|
|
14
14
|
---
|
|
15
15
|
|
|
16
16
|
## Purpose
|
|
@@ -44,6 +44,7 @@ brand-new files.
|
|
|
44
44
|
- `--head <ref>` — explicit head ref for diff review
|
|
45
45
|
- `--staged` — review only staged changes (`git diff --cached`)
|
|
46
46
|
- `--report-only` — collect findings and verdict, but do not apply fixes
|
|
47
|
+
- `--fix-threshold P0|P1|P2|P3` — override the project's configured threshold for this run
|
|
47
48
|
- `docs/coding-standards.md` (required) — coding conventions for review context
|
|
48
49
|
- `docs/tdd-standards.md` (optional) — test expectations
|
|
49
50
|
- `docs/review-standards.md` (optional) — severity definitions and review criteria
|
|
@@ -63,6 +64,17 @@ brand-new files.
|
|
|
63
64
|
When the MMR CLI is installed, use it as the primary entry point. Pick the
|
|
64
65
|
invocation that matches the scope the user asked for:
|
|
65
66
|
|
|
67
|
+
A common helper across all four invocation modes — set `MMR_FLAGS` once
|
|
68
|
+
and reuse it. **Note:** `FIX_THRESHOLD` is parsed from `$ARGUMENTS` in
|
|
69
|
+
Step 1 below; if you're skipping ahead to the invocations, run Step 1's
|
|
70
|
+
detection block first so the `--fix-threshold` flag actually flows
|
|
71
|
+
through.
|
|
72
|
+
|
|
73
|
+
```bash
|
|
74
|
+
MMR_FLAGS=(--sync --format json)
|
|
75
|
+
[ -n "$FIX_THRESHOLD" ] && MMR_FLAGS+=(--fix-threshold "$FIX_THRESHOLD")
|
|
76
|
+
```
|
|
77
|
+
|
|
66
78
|
```bash
|
|
67
79
|
# Default (no flags) — full local delivery candidate:
|
|
68
80
|
# committed branch diff (vs origin/main or main) + staged + unstaged.
|
|
@@ -93,16 +105,16 @@ fi
|
|
|
93
105
|
# that covers committed branch work + staged + unstaged edits, with
|
|
94
106
|
# repeated edits to the same file collapsed into a single final hunk.
|
|
95
107
|
MERGE_BASE=$(git merge-base "$BASE_REF" HEAD 2>/dev/null || echo "$BASE_REF")
|
|
96
|
-
git diff "$MERGE_BASE" | mmr review --diff -
|
|
108
|
+
git diff "$MERGE_BASE" | mmr review --diff - "${MMR_FLAGS[@]}"
|
|
97
109
|
|
|
98
110
|
# Staged changes only:
|
|
99
|
-
mmr review --staged
|
|
111
|
+
mmr review --staged "${MMR_FLAGS[@]}"
|
|
100
112
|
|
|
101
113
|
# Branch diff against main (committed only, no staged/unstaged):
|
|
102
|
-
mmr review --base main
|
|
114
|
+
mmr review --base main "${MMR_FLAGS[@]}"
|
|
103
115
|
|
|
104
116
|
# Explicit ref range:
|
|
105
|
-
mmr review --base <base-ref> --head <head-ref>
|
|
117
|
+
mmr review --base <base-ref> --head <head-ref> "${MMR_FLAGS[@]}"
|
|
106
118
|
```
|
|
107
119
|
|
|
108
120
|
Routing rules:
|
|
@@ -133,6 +145,14 @@ Parse `$ARGUMENTS` and set:
|
|
|
133
145
|
- `STAGED_ONLY=true` if `$ARGUMENTS` contains `--staged`
|
|
134
146
|
- `BASE_REF` from `--base <ref>` if present
|
|
135
147
|
- `HEAD_REF` from `--head <ref>` if present
|
|
148
|
+
- `FIX_THRESHOLD` from `--fix-threshold <value>` if present (must match `P0`, `P1`, `P2`, or `P3`); leave empty to defer to `.mmr.yaml`/built-in default
|
|
149
|
+
|
|
150
|
+
```bash
|
|
151
|
+
FIX_THRESHOLD=""
|
|
152
|
+
if [[ "$ARGUMENTS" =~ (^|[[:space:]])--fix-threshold[[:space:]]+(P[0-3])($|[[:space:]]) ]]; then
|
|
153
|
+
FIX_THRESHOLD="${BASH_REMATCH[2]}"
|
|
154
|
+
fi
|
|
155
|
+
```
|
|
136
156
|
|
|
137
157
|
If `--head` is provided without `--base`, stop and tell the user both refs are
|
|
138
158
|
required for explicit-range review.
|
|
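The detection block in Step 1 leaves `FIX_THRESHOLD` empty when the value is malformed (for example `--fix-threshold P9`), so a typo silently falls back to the project default. The sketch below is one possible stricter variant, not the package's own parser: `parse_fix_threshold` is a hypothetical helper name, and the exit code `2` for bad input is an assumption made for illustration.

```bash
# Hypothetical stricter parser (not part of the package).
# Prints the threshold when the flag is present and valid, prints nothing
# when the flag is absent, and fails with a message on a bad or missing value.
# Relies on word splitting, so it assumes the argument string has no glob characters.
parse_fix_threshold() {
  prev=""
  for word in $1; do
    if [ "$prev" = "--fix-threshold" ]; then
      case "$word" in
        P0|P1|P2|P3) echo "$word"; return 0 ;;
        *) echo "invalid --fix-threshold value: $word" >&2; return 2 ;;
      esac
    fi
    prev="$word"
  done
  if [ "$prev" = "--fix-threshold" ]; then
    echo "--fix-threshold requires a value" >&2
    return 2
  fi
  return 0
}

# Example: an empty result means "defer to .mmr.yaml or the built-in default".
FIX_THRESHOLD=$(parse_fix_threshold "--staged --fix-threshold P1") \
  && echo "threshold: ${FIX_THRESHOLD:-project default}"
```

Failing early on a malformed value trades a little convenience for a clearer signal that the override was not applied.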
@@ -312,14 +332,14 @@ clean ref range exists.
|
|
|
312
332
|
All channels should receive an equivalent prompt bundle built from the local review scope:
|
|
313
333
|
|
|
314
334
|
```text
|
|
315
|
-
You are reviewing local code changes before commit or push. Report
|
|
316
|
-
and
|
|
335
|
+
You are reviewing local code changes before commit or push. Report all P0, P1,
|
|
336
|
+
P2, and P3 findings; the project's fix threshold is applied downstream.
|
|
317
337
|
|
|
318
338
|
## Scope
|
|
319
339
|
[scope label]
|
|
320
340
|
|
|
321
341
|
## Review Standards
|
|
322
|
-
[docs/review-standards.md if present, otherwise define P0
|
|
342
|
+
[docs/review-standards.md if present, otherwise define P0–P3]
|
|
323
343
|
|
|
324
344
|
## Coding Standards
|
|
325
345
|
[docs/coding-standards.md]
|
|
@@ -342,7 +362,7 @@ Respond with JSON:
|
|
|
342
362
|
"approved": true/false,
|
|
343
363
|
"findings": [
|
|
344
364
|
{
|
|
345
|
-
"severity": "P0" | "P1" | "P2",
|
|
365
|
+
"severity": "P0" | "P1" | "P2" | "P3",
|
|
346
366
|
"location": "file:line or section",
|
|
347
367
|
"description": "what is wrong",
|
|
348
368
|
"suggestion": "specific fix"
|
|
@@ -364,7 +384,7 @@ Use these rules:
|
|
|
364
384
|
| Any single P2 | Fix unless clearly inapplicable; if disputed, surface to user |
|
|
365
385
|
| All executed channels approve | Candidate passes review |
|
|
366
386
|
| Strong contradiction on a medium-severity issue | Verdict becomes `needs-user-decision` |
|
|
367
|
-
| Compensating-pass
|
|
387
|
+
| Compensating-pass blocking finding | Single-source confidence — fix per normal thresholds, but label as compensating in summary |
|
|
368
388
|
|
|
369
389
|
### Step 7: Apply Fixes Unless in Report-Only Mode
|
|
370
390
|
|
|
@@ -374,10 +394,10 @@ If `REPORT_ONLY=true`:
|
|
|
374
394
|
- Stop
|
|
375
395
|
|
|
376
396
|
Otherwise:
|
|
377
|
-
1. Fix all
|
|
397
|
+
1. Fix all findings at or above `fix_threshold` (read from `results.fix_threshold` in the verdict JSON; default `P2`)
|
|
378
398
|
2. Re-run the channels that produced findings
|
|
379
399
|
3. Keep iterating as long as each new round surfaces *different, concrete, fixable* findings — that is healthy review/fix iteration, not a stuck loop
|
|
380
|
-
4. The 3-round limit is **per finding**: stop and surface to the user when the *same*
|
|
400
|
+
4. The 3-round limit is **per finding**: stop and surface to the user when the *same* blocking finding (or set) recurs across 3 attempts without progress. Other stop conditions: a finding is genuinely ambiguous (channels contradict each other), or the user explicitly asks to stop. Use verdict `needs-user-decision` for ambiguity, `blocked` for stuck-loop cases.
|
|
381
401
|
|
|
382
402
|
**Fix cycle channel rule:** Re-run only channels that originally completed or ran as compensating passes. Never retry a channel marked `not_installed`, `auth_failed`, or `timeout` during fix rounds — its availability does not change within a session.
|
|
383
403
|
|
|
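Step 7 reads the effective threshold from `results.fix_threshold` in the verdict JSON and falls back to `P2`. A minimal sketch of that lookup follows, using the same textual `grep`/`cut` style the package uses to extract `job_id`. The helper name `read_fix_threshold` and the sample JSON shape (including the `verdict` key) are assumptions for illustration; the real output may be laid out differently.

```bash
# Hypothetical helper (not part of the package): read verdict JSON on stdin
# and print the fix threshold, defaulting to P2 when the field is absent.
# The match is textual, so it does not check that the field sits under "results".
read_fix_threshold() {
  value=$(grep -o '"fix_threshold": *"P[0-3]"' | head -1 | cut -d'"' -f4)
  echo "${value:-P2}"
}

# Illustrative input only; the surrounding keys are assumed.
printf '%s' '{"results": {"fix_threshold": "P1", "verdict": "blocked"}}' | read_fix_threshold
printf '%s' '{"results": {"verdict": "pass"}}' | read_fix_threshold
```

A JSON-aware tool would be more robust than a text match if one is available in the environment.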
@@ -385,9 +405,9 @@ Otherwise:
|
|
|
385
405
|
|
|
386
406
|
Return exactly one verdict:
|
|
387
407
|
|
|
388
|
-
- `pass` — all channels completed with `full` coverage, no unresolved
|
|
389
|
-
- `degraded-pass` — at least one channel was skipped/compensated (coverage is not all `full`), but all executed and compensating channels have no unresolved
|
|
390
|
-
- `blocked` — gate failed: at least one unresolved finding sits at or above the fix threshold (typically the *same* finding(s) remain unresolved after 3 fix attempts;
|
|
408
|
+
- `pass` — all channels completed with `full` coverage, no unresolved findings at or above `fix_threshold`
|
|
409
|
+
- `degraded-pass` — at least one channel was skipped/compensated (coverage is not all `full`), but all executed and compensating channels have no unresolved findings at or above `fix_threshold`
|
|
410
|
+
- `blocked` — gate failed: at least one unresolved finding sits at or above the fix threshold (typically the *same* finding(s) remain unresolved after 3 fix attempts; the threshold defaults to `P2` but is configurable via `.mmr.yaml` or `--fix-threshold`)
|
|
391
411
|
- `needs-user-decision` — no channels completed (no reconciled result was possible), reviewer disagreement / contradictions, or a finding requires human judgment that automated iteration can't resolve
|
|
392
412
|
|
|
393
413
|
When compensating passes ran for any channel, the maximum achievable verdict is `degraded-pass` — never `pass`, even if all findings are resolved. When both external channels were compensated, the review summary must note: "All findings are single-model (Claude only)."
|
|
@@ -424,5 +444,5 @@ for the next delivery step (commit, push, or PR creation).
|
|
|
424
444
|
2. **All 3 channels are mandatory** — skip only when a tool is genuinely not installed, never by choice.
|
|
425
445
|
3. **Auth failures are not silent** — always surface to the user with recovery instructions.
|
|
426
446
|
4. **Independence** — never share one channel's output with another.
|
|
427
|
-
5. **Fix before proceeding** —
|
|
447
|
+
5. **Fix before proceeding** — findings at or above `fix_threshold` must be resolved before moving to the next task.
|
|
428
448
|
6. **Dispatch pattern** follows `multi-model-review-dispatch` knowledge entry. When modifying channel dispatch in this file, verify consistency with `review-pr.md` and `post-implementation-review.md`.
|
|
@@ -9,7 +9,7 @@ conditional: null
|
|
|
9
9
|
stateless: true
|
|
10
10
|
category: tool
|
|
11
11
|
knowledge-base: [multi-model-review-dispatch, automated-review-tooling]
|
|
12
|
-
argument-hint: "<PR
|
|
12
|
+
argument-hint: "<PR# or blank> [--fix-threshold P0|P1|P2|P3]"
|
|
13
13
|
---
|
|
14
14
|
|
|
15
15
|
## Purpose
|
|
@@ -44,7 +44,7 @@ The three channels are:
|
|
|
44
44
|
|
|
45
45
|
## Inputs
|
|
46
46
|
|
|
47
|
-
- $ARGUMENTS — PR number (optional; auto-detected from current branch if omitted)
|
|
47
|
+
- $ARGUMENTS — PR number (optional; auto-detected from current branch if omitted) and/or `--fix-threshold P0|P1|P2|P3` to override the project's configured threshold for this run
|
|
48
48
|
- `.mmr.yaml` — MMR CLI configuration (channels, review_criteria, defaults)
|
|
49
49
|
|
|
50
50
|
The CLI handles review context via config (`review_criteria` in `.mmr.yaml`).
|
|
@@ -54,7 +54,7 @@ in the review criteria config rather than read at dispatch time.
|
|
|
54
54
|
## Expected Outputs
|
|
55
55
|
|
|
56
56
|
- All three CLI review channels executed (or fallback documented) plus the Superpowers code-reviewer 4th channel reconciled via `mmr reconcile`
|
|
57
|
-
-
|
|
57
|
+
- Findings at or above the configured `fix_threshold` (read from `results.fix_threshold` in the verdict JSON; default `P2`) fixed before proceeding
|
|
58
58
|
- Review summary with per-channel results and reconciliation
|
|
59
59
|
|
|
60
60
|
## Instructions
|
|
@@ -62,8 +62,21 @@ in the review criteria config rather than read at dispatch time.
|
|
|
62
62
|
### Step 1: Identify the PR
|
|
63
63
|
|
|
64
64
|
```bash
|
|
65
|
-
#
|
|
66
|
-
|
|
65
|
+
# Strip --fix-threshold from $ARGUMENTS if present; remainder is the PR number.
|
|
66
|
+
# Strip the entire matched span (BASH_REMATCH[0]) — including whatever
|
|
67
|
+
# whitespace separator was used (space, tab, multi-space). Replacing with a
|
|
68
|
+
# single space preserves token boundaries; tr -d '[:space:]' below drops
|
|
69
|
+
# everything else.
|
|
70
|
+
FIX_THRESHOLD=""
|
|
71
|
+
ARGS_REMAINING="$ARGUMENTS"
|
|
72
|
+
if [[ "$ARGS_REMAINING" =~ (^|[[:space:]])--fix-threshold[[:space:]]+(P[0-3])($|[[:space:]]) ]]; then
|
|
73
|
+
FIX_THRESHOLD="${BASH_REMATCH[2]}"
|
|
74
|
+
ARGS_REMAINING="${ARGS_REMAINING//${BASH_REMATCH[0]}/ }"
|
|
75
|
+
fi
|
|
76
|
+
|
|
77
|
+
# Use remaining argument if provided, otherwise detect from current branch
|
|
78
|
+
PR_NUMBER="$(echo "$ARGS_REMAINING" | tr -d '[:space:]')"
|
|
79
|
+
PR_NUMBER="${PR_NUMBER:-$(gh pr view --json number -q .number 2>/dev/null)}"
|
|
67
80
|
```
|
|
68
81
|
|
|
69
82
|
If no PR is found, stop and tell the user to create a PR first.
|
|
@@ -73,7 +86,9 @@ If no PR is found, stop and tell the user to create a PR first.
|
|
|
73
86
|
Use the MMR CLI as the primary entry point for automated dispatch, reconciliation, and verdict:
|
|
74
87
|
|
|
75
88
|
```bash
|
|
76
|
-
|
|
89
|
+
MMR_FLAGS=(--pr "$PR_NUMBER" --sync --format json)
|
|
90
|
+
[ -n "$FIX_THRESHOLD" ] && MMR_FLAGS+=(--fix-threshold "$FIX_THRESHOLD")
|
|
91
|
+
MMR_RESULT=$(mmr review "${MMR_FLAGS[@]}")
|
|
77
92
|
# Extract job_id from JSON output for use in mmr reconcile
|
|
78
93
|
JOB_ID=$(echo "$MMR_RESULT" | grep -o '"job_id": "[^"]*"' | head -1 | cut -d'"' -f4)
|
|
79
94
|
```
|
|
@@ -168,7 +183,7 @@ reconcile findings after all channels complete:
|
|
|
168
183
|
| One channel flags P0, others approve | **High** | Fix it — P0 is critical from any source |
|
|
169
184
|
| One channel flags P1, others approve | **Medium** | Fix it — P1 findings are mandatory regardless of source count |
|
|
170
185
|
| Channels contradict each other | **Low** | Present to user for adjudication |
|
|
171
|
-
| Compensating-pass
|
|
186
|
+
| Compensating-pass blocking finding | **Single-source** | Fix per normal thresholds, label as compensating |
|
|
172
187
|
|
|
173
188
|
### Step 6: Report Results
|
|
174
189
|
|
|
@@ -200,7 +215,7 @@ Output a review summary in this format:
|
|
|
200
215
|
|
|
201
216
|
Return exactly one verdict:
|
|
202
217
|
|
|
203
|
-
- `pass` — all channels completed and the gate passed (no unresolved findings at or above the configured fix threshold;
|
|
218
|
+
- `pass` — all channels completed and the gate passed (no unresolved findings at or above the configured fix threshold; the threshold defaults to `P2` but is configurable via `.mmr.yaml` or `--fix-threshold`)
|
|
204
219
|
- `degraded-pass` — gate passed but some channels were skipped or replaced by compensating passes (max achievable verdict when any channel was compensated)
|
|
205
220
|
- `blocked` — gate failed: at least one unresolved finding sits at or above the fix threshold (typically the *same* finding(s) remain unresolved after 3 fix attempts)
|
|
206
221
|
- `needs-user-decision` — no channels completed (no reconciled result was possible), reviewer disagreement / contradictions, or a finding requires human judgment that automated iteration can't resolve
|
|
@@ -209,15 +224,15 @@ Verdict precedence: `needs-user-decision` > `blocked` > `degraded-pass` > `pass`
|
|
|
209
224
|
|
|
210
225
|
When compensating passes ran, maximum achievable verdict is `degraded-pass`. When both external channels were compensated, note "All findings are single-model."
|
|
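The verdict precedence for this tool (`needs-user-decision` > `blocked` > `degraded-pass` > `pass`) and the compensating-pass cap can be combined in a few lines. This is a sketch under stated assumptions rather than the CLI's implementation: the function names `verdict_rank`, `strongest_verdict`, and `final_verdict` are hypothetical, and the `"true"`/`"false"` flag for "a compensating pass ran" is an illustrative input.

```bash
# Hypothetical helpers (not part of the package).
# Rank verdicts by the stated precedence; unknown values rank lowest.
verdict_rank() {
  case "$1" in
    needs-user-decision) echo 3 ;;
    blocked) echo 2 ;;
    degraded-pass) echo 1 ;;
    *) echo 0 ;;
  esac
}

# Print whichever candidate verdict has the highest precedence.
strongest_verdict() {
  best="pass"
  for v in "$@"; do
    if [ "$(verdict_rank "$v")" -gt "$(verdict_rank "$best")" ]; then
      best="$v"
    fi
  done
  echo "$best"
}

# Apply the cap: once any compensating pass ran, "pass" becomes "degraded-pass".
# $1 = verdict from the gate, $2 = "true" if any compensating pass ran.
final_verdict() {
  if [ "$2" = "true" ]; then
    strongest_verdict "$1" degraded-pass
  else
    echo "$1"
  fi
}

final_verdict pass true      # prints degraded-pass
final_verdict blocked true   # prints blocked
```

Treating the cap as one more candidate verdict keeps it from ever downgrading `blocked` or `needs-user-decision`.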
211
226
|
|
|
212
|
-
### Step 7: Fix
|
|
227
|
+
### Step 7: Fix Blocking Findings
|
|
213
228
|
|
|
214
|
-
If any
|
|
229
|
+
If any findings sit at or above `fix_threshold` (read from `results.fix_threshold` in the verdict JSON; default `P2`):
|
|
215
230
|
1. Fix them in the code
|
|
216
231
|
2. Push the fixes: `git push`
|
|
217
232
|
3. Re-run the review to verify fixes, reusing the `MMR_FLAGS` array from the initial dispatch so that any `--fix-threshold` override still applies: `mmr review "${MMR_FLAGS[@]}"`
|
|
218
233
|
4. The 3-round limit is **per finding**, not total rounds:
|
|
219
234
|
- **Keep going** when each new round surfaces *different, concrete, fixable* findings — that is healthy review/fix iteration.
|
|
220
|
-
- **Stop and ask the user** when (a) the *same*
|
|
235
|
+
- **Stop and ask the user** when (a) the *same* blocking finding (or set) recurs across 3 attempts without progress, (b) a finding is genuinely ambiguous (channels contradict each other), or (c) the user explicitly asks to stop.
|
|
221
236
|
- **When stopped**, do NOT merge automatically. Document the unresolved findings (severity, location, attempt count) and let the user decide whether to continue fixing, create follow-up issues, or override.
|
|
222
237
|
|
|
223
238
|
**Note:** Fix cycles are an orchestration concern — the caller (agent or human) handles the fix loop. The CLI provides the review and verdict; the caller decides whether to fix and re-run.
|
|
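Because the fix loop belongs to the caller, the per-finding 3-round limit from Step 7 also has to be tracked by the caller. One way to do that is sketched below. It assumes a finding can be identified by a key built from its severity and location (for example `P1|src/app.ts:42`); that keying scheme and the helper names are hypothetical and not something the CLI defines.

```bash
# Hypothetical caller-side bookkeeping (not part of the package).
MAX_ATTEMPTS=3
ATTEMPTS_LOG=""   # one finding key per line, one line per fix attempt

# Record one fix attempt for a finding key.
record_attempt() {
  ATTEMPTS_LOG="${ATTEMPTS_LOG}$1
"
}

# Print how many attempts have been recorded for a finding key.
attempt_count() {
  printf '%s' "$ATTEMPTS_LOG" | grep -cxF -- "$1"
}

# Succeed while the same finding may still be retried.
may_retry() {
  [ "$(attempt_count "$1")" -lt "$MAX_ATTEMPTS" ]
}

record_attempt "P1|src/app.ts:42"
record_attempt "P1|src/app.ts:42"
record_attempt "P2|README.md:7"    # a different finding does not use up the P1 budget
may_retry "P1|src/app.ts:42" && echo "retry the P1 finding"
record_attempt "P1|src/app.ts:42"
may_retry "P1|src/app.ts:42" || echo "stop and surface the P1 finding to the user"
```

Counting per key rather than per round is what lets new, different findings keep the loop going while a recurring one trips the limit.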
@@ -261,8 +276,8 @@ In either path, output the message and stop. Do NOT proceed to the next task wit
|
|
|
261
276
|
2. **All three CLI channels are mandatory** — Codex CLI, Gemini CLI, and Claude CLI. Plus the Superpowers code-reviewer agent as a complementary 4th channel reconciled via `mmr reconcile` (Step 3). Skip a CLI channel only when a tool is genuinely not installed or auth cannot be recovered (in which case MMR emits a compensating pass for missing Codex/Gemini channels; a missing Claude CLI has no compensator). Never skip by choice.
|
|
262
277
|
3. **Auth failures are not silent** — always surface to the user with the exact recovery command.
|
|
263
278
|
4. **Independence** — never share one channel's output with another. Each reviews the diff independently.
|
|
264
|
-
5. **Fix before proceeding** —
|
|
265
|
-
6. **3-round limit (per finding)** — never attempt to fix the *same*
|
|
279
|
+
5. **Fix before proceeding** — findings at or above `fix_threshold` must be resolved before moving to the next task.
|
|
280
|
+
6. **3-round limit (per finding)** — never attempt to fix the *same* blocking finding more than 3 times. Each round that surfaces a *new* fixable finding is healthy iteration — keep going. Stop only when the same finding recurs across 3 attempts, channels contradict each other, or the user asks to stop.
|
|
266
281
|
7. **Document everything** — the review summary must show which channels ran and which were skipped, with reasons.
|
|
267
282
|
8. **CLI-first** — use `mmr review --sync` as the primary entry point. Manual dispatch is a fallback only.
|
|
268
283
|
9. **Job storage** — the CLI stores job data at `~/.mmr/jobs/{job-id}/results.json`. Review results are available via `mmr results <job-id>`.
|
package/package.json
CHANGED
package/skills/mmr/SKILL.md
CHANGED
|
@@ -119,13 +119,20 @@ Re-run `mmr config test` after re-authenticating to verify.
|
|
|
119
119
|
|
|
120
120
|
## Severity Gate
|
|
121
121
|
|
|
122
|
-
Default threshold is P2 (
|
|
122
|
+
Default threshold is `P2` (the verdict gate blocks on P0, P1, and P2;
|
|
123
|
+
P3 findings are kept in the result as **advisory** but don't cause
|
|
124
|
+
`blocked`). Override per-review:
|
|
123
125
|
|
|
124
126
|
```bash
|
|
125
127
|
mmr review --pr 47 --fix-threshold P1 # Only fix P0 and P1
|
|
126
128
|
mmr review --pr 47 --fix-threshold P0 # Only fix critical issues
|
|
127
129
|
```
|
|
128
130
|
|
|
131
|
+
The verdict JSON includes `advisory_count` (count of findings strictly
|
|
132
|
+
below the threshold). Formatted output shows `Advisory: N` (text) or
|
|
133
|
+
`**Advisory:** N` (markdown) when non-zero — useful for spotting real
|
|
134
|
+
findings that the gate didn't block.
|
|
135
|
+
|
|
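The gate described above boils down to a numeric comparison: `P0` is the most severe level, so "at or above the threshold" means a severity number less than or equal to the threshold's number, and everything strictly below the threshold is what `advisory_count` reports. The following sketch illustrates that split; `is_blocking` and `gate_counts` are hypothetical names, and the code is not the CLI's implementation.

```bash
# Hypothetical helpers (not part of the package).
# Succeed when severity $1 is at or above threshold $2 (P0 is most severe).
is_blocking() {
  _sev="${1#P}"
  _thr="${2#P}"
  [ "$_sev" -le "$_thr" ]
}

# Count blocking vs. advisory findings.
# Usage: gate_counts THRESHOLD SEVERITY...
gate_counts() {
  threshold="$1"; shift
  blocking=0
  advisory=0
  for s in "$@"; do
    if is_blocking "$s" "$threshold"; then
      blocking=$((blocking + 1))
    else
      advisory=$((advisory + 1))
    fi
  done
  echo "blocking=$blocking advisory=$advisory"
}

gate_counts P2 P0 P2 P3 P3   # prints blocking=2 advisory=2
gate_counts P1 P0 P2 P3 P3   # prints blocking=1 advisory=3
```

Lowering the threshold to `P1` moves the `P2` finding from the blocking bucket to the advisory one without dropping it from the result.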
129
136
|
## Output Formats
|
|
130
137
|
|
|
131
138
|
```bash
|
|
@@ -147,13 +147,13 @@ When dispatching a review, bundle all relevant context into the prompt. Each CLI
|
|
|
147
147
|
### Template for Artifact Review
|
|
148
148
|
|
|
149
149
|
```
|
|
150
|
-
You are reviewing a project artifact for quality issues. Report P0
|
|
150
|
+
You are reviewing a project artifact for quality issues. Report all P0, P1, P2, and P3 findings; the project's fix threshold is applied downstream.
|
|
151
151
|
|
|
152
152
|
## Severity Definitions
|
|
153
153
|
- P0: Will cause implementation failure, data loss, security vulnerability, or fundamental architectural flaw
|
|
154
154
|
- P1: Will cause bugs in normal usage, inconsistency across documents, or blocks downstream work
|
|
155
155
|
- P2: Improvement opportunity — style, naming, documentation, minor optimization
|
|
156
|
-
-
|
|
156
|
+
- P3: Personal preference, trivial nits — included so a strict project (`fix_threshold: P3`) can act on them; otherwise advisory
|
|
157
157
|
|
|
158
158
|
## Review Standards
|
|
159
159
|
[paste contents of docs/review-standards.md if it exists, otherwise use severity definitions above]
|
|
@@ -170,7 +170,7 @@ Respond with a JSON object:
|
|
|
170
170
|
"approved": true/false,
|
|
171
171
|
"findings": [
|
|
172
172
|
{
|
|
173
|
-
"severity": "P0" or "P1" or "P2",
|
|
173
|
+
"severity": "P0" or "P1" or "P2" or "P3",
|
|
174
174
|
"location": "section or line reference",
|
|
175
175
|
"description": "what's wrong",
|
|
176
176
|
"suggestion": "specific fix"
|
|
@@ -179,13 +179,13 @@ Respond with a JSON object:
|
|
|
179
179
|
"summary": "one-line assessment"
|
|
180
180
|
}
|
|
181
181
|
|
|
182
|
-
If no
|
|
182
|
+
If no findings, respond with: { "approved": true, "findings": [], "summary": "No issues found." }
|
|
183
183
|
```
|
|
184
184
|
|
|
185
185
|
### Template for PR Diff Review
|
|
186
186
|
|
|
187
187
|
```
|
|
188
|
-
You are reviewing a pull request diff. Report P0, P1, and
|
|
188
|
+
You are reviewing a pull request diff. Report all P0, P1, P2, and P3 findings; the project's fix threshold is applied downstream.
|
|
189
189
|
|
|
190
190
|
## Review Standards
|
|
191
191
|
[paste docs/review-standards.md]
|