@zigrivers/scaffold 3.11.0 → 3.12.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +4 -3
- package/content/knowledge/core/automated-review-tooling.md +137 -140
- package/content/knowledge/core/multi-model-research-dispatch.md +27 -16
- package/content/knowledge/core/multi-model-review-dispatch.md +47 -6
- package/content/skills/multi-model-dispatch/SKILL.md +20 -22
- package/content/tools/post-implementation-review.md +71 -26
- package/content/tools/review-code.md +37 -11
- package/content/tools/review-pr.md +65 -23
- package/package.json +1 -1
- package/skills/multi-model-dispatch/SKILL.md +20 -22
package/README.md
CHANGED
|
@@ -38,7 +38,7 @@ Either way, Scaffold constructs the prompt and the target AI tool does the work.
|
|
|
38
38
|
|
|
39
39
|
**Depth scale** (1-5) — Controls how thorough each step's output is, from "focus on the core deliverable" (1) to "explore all angles, tradeoffs, and edge cases" (5). Depth resolves with 4-level precedence: CLI flag > step override > custom default > preset default.
|
|
40
40
|
|
|
41
|
-
**Multi-model validation** — At depth 4-5, all 19 review and validation steps can dispatch independent reviews to Codex and/or Gemini CLIs. Two independent models catch more blind spots than one. When both CLIs are available, findings are reconciled by confidence level (both agree = high confidence, single model P0 = still actionable).
|
|
41
|
+
**Multi-model validation** — At depth 4-5, all 19 review and validation steps can dispatch independent reviews to Codex and/or Gemini CLIs. Two independent models catch more blind spots than one. When both CLIs are available, findings are reconciled by confidence level (both agree = high confidence, single model P0 = still actionable). When a channel is unavailable, a compensating Claude self-review pass runs in its place (labeled `[compensating: Codex-equivalent]` or `[compensating: Gemini-equivalent]`, single-source confidence). CLI commands must always run in the foreground — background execution produces empty output. See the [Multi-Model Review](#multi-model-review) section.
|
|
42
42
|
|
|
43
43
|
**State management** — Pipeline progress is tracked in `.scaffold/state.json` with atomic file writes and crash recovery. An advisory lock prevents concurrent runs. Decisions are logged to an append-only `decisions.jsonl`.
|
|
44
44
|
|
|
@@ -931,12 +931,13 @@ mmr review --pr 47 ──→ Dispatches to all channels in background
|
|
|
931
931
|
Agent continues working
|
|
932
932
|
|
|
933
933
|
mmr status mmr-a1b2c3 ──→ Poll progress (which channels done?)
|
|
934
|
-
Exit code: 0=done, 1=running,
|
|
934
|
+
Exit code: 0=done, 1=running, 4=failed
|
|
935
935
|
|
|
936
936
|
mmr results mmr-a1b2c3 ──→ Reconcile findings across channels
|
|
937
|
+
Run compensating passes for unavailable channels
|
|
937
938
|
Apply severity gate
|
|
938
939
|
Output unified findings
|
|
939
|
-
Exit code: 0=passed,
|
|
940
|
+
Exit code: 0=passed, 2=gate failed, 3=degraded
|
|
940
941
|
```
|
|
941
942
|
|
|
942
943
|
**Key features:**
|
|
@@ -10,194 +10,191 @@ Automated PR review leverages AI models to provide consistent, thorough code rev
|
|
|
10
10
|
|
|
11
11
|
## Summary
|
|
12
12
|
|
|
13
|
-
###
|
|
13
|
+
### Review Severity and Reconciliation
|
|
14
14
|
|
|
15
|
-
|
|
16
|
-
- **No CI secrets required** — models run locally via CLI tools
|
|
17
|
-
- **Dual-model review** — run Codex and Gemini (when available) for independent perspectives
|
|
18
|
-
- **Agent-managed loop** — Claude orchestrates the review-fix cycle locally
|
|
15
|
+
See `review-methodology` for severity definitions (P0-P3). See `multi-model-review-dispatch` for finding reconciliation rules.
|
|
19
16
|
|
|
20
|
-
|
|
21
|
-
- `AGENTS.md` — reviewer instructions with project-specific rules
|
|
22
|
-
- `docs/review-standards.md` — severity definitions (P0-P3) and criteria
|
|
23
|
-
- `scripts/cli-pr-review.sh` — dual-model review script
|
|
24
|
-
- `scripts/await-pr-review.sh` — polling script for external bot mode
|
|
17
|
+
**Action thresholds:** P0/P1/P2 findings must be fixed before proceeding to the next task. P3 findings are recorded but not actioned.
|
|
25
18
|
|
|
26
|
-
###
|
|
19
|
+
### Degraded-Mode Behavior
|
|
27
20
|
|
|
28
|
-
|
|
29
|
-
- **P0 (blocking)** — must fix before merge (security, data loss, broken functionality)
|
|
30
|
-
- **P1 (important)** — should fix before merge (bugs, missing tests, performance)
|
|
31
|
-
- **P2 (suggestion)** — consider fixing (style, naming, documentation)
|
|
32
|
-
- **P3 (nit)** — optional (personal preference, minor optimization)
|
|
21
|
+
#### Verdict Definitions
|
|
33
22
|
|
|
34
|
-
|
|
23
|
+
These are the authoritative verdict definitions. Tool files (`review-code.md`, `review-pr.md`) reference these.
|
|
35
24
|
|
|
36
|
-
|
|
37
|
-
|
|
38
|
-
|
|
39
|
-
|
|
40
|
-
|
|
25
|
+
| Verdict | Condition |
|
|
26
|
+
|---------|-----------|
|
|
27
|
+
| `pass` | All configured channels ran, no unresolved P0/P1/P2 |
|
|
28
|
+
| `degraded-pass` | Channels skipped, compensated, or have non-full coverage (e.g., partial timeout), no unresolved P0/P1/P2 |
|
|
29
|
+
| `blocked` | Unresolved P0/P1/P2 after 3 fix rounds |
|
|
30
|
+
| `needs-user-decision` | Contradictions or unresolvable findings |
|
|
41
31
|
|
|
42
|
-
|
|
32
|
+
**Verdict precedence:** `needs-user-decision` > `blocked` > `degraded-pass` > `pass`. When multiple conditions apply, the higher-precedence verdict wins.
|
|
43
33
|
|
|
44
|
-
|
|
45
|
-
1. Agent creates PR
|
|
46
|
-
2. Agent runs `scripts/cli-pr-review.sh` (or review runs automatically)
|
|
47
|
-
3. Review findings are posted as PR comments or written to a local file
|
|
48
|
-
4. Agent addresses P0/P1/P2 findings, pushes fixes
|
|
49
|
-
5. Re-review until no P0/P1/P2 findings remain
|
|
50
|
-
6. PR is ready for merge
|
|
34
|
+
**Both external channels missing:** Maximum achievable verdict is `degraded-pass` — never `pass`. Review summary must note: "All findings are single-model (Claude only). External validation was unavailable."
|
|
51
35
|
|
|
52
|
-
|
|
36
|
+
#### Status Model
|
|
53
37
|
|
|
54
|
-
|
|
38
|
+
`compensating` is a **coverage label** applied to a channel's output, not a replacement for the root-cause status. Each channel retains its root-cause status (`not_installed`, `auth_failed`, `auth_timeout`, `failed`) AND gains a coverage label (`compensating (X-equivalent)`) when a compensating pass ran. The fix cycle uses the **root-cause status** to decide whether to retry (never retry `not_installed`, `auth_failed`, `auth_timeout`). The report uses the **coverage label** to show the reader what ran.
|
|
55
39
|
|
|
56
|
-
|
|
40
|
+
#### Compensating Passes
|
|
57
41
|
|
|
58
|
-
|
|
59
|
-
# Code Review Instructions
|
|
42
|
+
When an external channel (Codex or Gemini) is unavailable, run a compensating Claude self-review pass:
|
|
60
43
|
|
|
61
|
-
|
|
62
|
-
[
|
|
44
|
+
- Same prompt structure as the missing channel, executed as a Claude self-review pass.
|
|
45
|
+
- Labeled `[compensating: Codex-equivalent]` or `[compensating: Gemini-equivalent]` in the review summary.
|
|
46
|
+
- Missing Codex → focus on implementation correctness, security, API contracts.
|
|
47
|
+
- Missing Gemini → focus on architectural patterns, design reasoning, broad context.
|
|
48
|
+
- Missing both → two compensating passes (one per missing channel's strength area).
|
|
49
|
+
- Compensating-pass findings are **single-source confidence** — they do NOT raise to high confidence even if they agree with another channel's findings.
|
|
50
|
+
- Normal mandatory-fix thresholds apply: P0/P1/P2 findings from compensating passes still require fixing.
|
|
63
51
|
|
|
64
|
-
|
|
65
|
-
- Security: [project-specific security concerns]
|
|
66
|
-
- Performance: [known hot paths or constraints]
|
|
67
|
-
- Testing: [coverage requirements, test patterns]
|
|
52
|
+
**Superpowers channel:** No compensating pass needed — Superpowers is a Claude subagent and is always available. If the Superpowers plugin is not installed, run available external CLIs and warn the user that review coverage is reduced.
|
|
68
53
|
|
|
69
|
-
|
|
70
|
-
See docs/coding-standards.md for:
|
|
71
|
-
- Naming conventions
|
|
72
|
-
- Error handling patterns
|
|
73
|
-
- Logging standards
|
|
54
|
+
#### Foreground-Only Execution
|
|
74
55
|
|
|
75
|
-
|
|
76
|
-
[Project-specific patterns reviewers should enforce]
|
|
56
|
+
Always run Codex and Gemini CLI commands as foreground Bash calls. Never use `run_in_background`, `&`, or `nohup`. Background execution produces empty or truncated output from Codex and Gemini CLIs. Multiple foreground calls can still run in parallel if the tool runner supports parallel tool invocations.
|
|
77
57
|
|
|
78
|
-
|
|
79
|
-
|
|
80
|
-
|
|
58
|
+
This constraint is intentionally duplicated from `multi-model-review-dispatch`. Knowledge entries are injected independently by the assembly engine — an agent may receive this entry without `multi-model-review-dispatch`, so both need the constraint.
|
|
59
|
+
|
|
60
|
+
## Deep Guidance
|
|
61
|
+
|
|
62
|
+
### Finding Reconciliation
|
|
63
|
+
|
|
64
|
+
After all channels complete (including compensating passes), reconcile findings using the rules in `multi-model-review-dispatch`. This orchestration entry triggers reconciliation; the dispatch entry defines how to perform it.
|
|
81
65
|
|
|
82
|
-
|
|
66
|
+
Reconciliation normalizes findings from all channels (real and compensating) to a common schema, then matches findings across channels by location and category. The purpose is to detect when multiple independent channels agree on a finding (raising confidence) and to surface contradictions that require human judgment. A finding reported by Codex alone has lower confidence than the same finding reported by both Codex and Gemini.
|
|
83
67
|
|
|
84
|
-
The
|
|
68
|
+
The reconciliation output is a deduplicated list of findings with confidence scores. High-confidence findings (agreed by 2+ real channels) are actionable without further discussion. Low-confidence findings (single-source, or from compensating passes) still require action at P0/P1/P2 but should be noted as lower-confidence in the review summary.
|
|
69
|
+
|
|
70
|
+
Findings that appear in all three channels (Codex, Gemini, Superpowers) are considered maximum-confidence and should be surfaced first in the review summary. Findings that appear in only one channel should include the channel name in the finding description to help the developer assess confidence independently.
|
|
85
71
|
|
|
86
72
|
```bash
|
|
87
|
-
|
|
88
|
-
|
|
89
|
-
|
|
90
|
-
#
|
|
91
|
-
|
|
92
|
-
|
|
93
|
-
# 2. Run Codex review (if available)
|
|
94
|
-
if command -v codex &>/dev/null; then
|
|
95
|
-
codex_findings=$(echo "$diff" | codex review --context AGENTS.md)
|
|
96
|
-
fi
|
|
97
|
-
|
|
98
|
-
# 3. Run Gemini review (if available)
|
|
99
|
-
if command -v gemini &>/dev/null; then
|
|
100
|
-
gemini_findings=$(echo "$diff" | gemini review --context AGENTS.md)
|
|
101
|
-
fi
|
|
102
|
-
|
|
103
|
-
# 4. Reconcile findings
|
|
104
|
-
# - Findings from both models: HIGH confidence
|
|
105
|
-
# - Findings from one model: MEDIUM confidence
|
|
106
|
-
# - Contradictions: flagged for human review
|
|
73
|
+
# Orchestration reconciliation workflow
|
|
74
|
+
# 1. Collect findings from all channels (real + compensating)
|
|
75
|
+
# 2. Normalize to common schema (severity, category, location, description)
|
|
76
|
+
# 3. Match findings across channels by location + category
|
|
77
|
+
# 4. Apply consensus rules from multi-model-review-dispatch
|
|
78
|
+
# 5. Produce reconciled findings list with confidence scores
|
|
107
79
|
```
|
|
108
80
|
|
|
109
|
-
###
|
|
81
|
+
### Channel Dispatch Pattern and Orchestration
|
|
110
82
|
|
|
111
|
-
|
|
112
|
-
- Severity levels with concrete examples per project
|
|
113
|
-
- What constitutes a blocking review (P0/P1/P2 threshold)
|
|
114
|
-
- Auto-approve criteria (when review can be skipped)
|
|
115
|
-
- Review SLA (how long before auto-approve kicks in)
|
|
83
|
+
Each external channel (Codex, Gemini) follows the same dispatch pattern: check installation, check auth, then dispatch as a foreground call. If any step fails, record the root-cause status, queue a compensating pass, and continue to the next channel. The Superpowers channel is always available as a Claude subagent and does not require installation or auth checks.
|
|
116
84
|
|
|
117
|
-
|
|
85
|
+
```bash
|
|
86
|
+
# Channel dispatch pattern
|
|
87
|
+
# For each external channel (Codex, Gemini):
|
|
88
|
+
# 1. command -v <tool> >/dev/null 2>&1 || { status=not_installed; queue_compensating; continue; }
|
|
89
|
+
# 2. <auth_check> || { status=auth_failed; queue_compensating; continue; }
|
|
90
|
+
# 3. <dispatch_foreground> || { status=failed; queue_compensating; continue; }
|
|
91
|
+
# For Superpowers: dispatch subagent (always available)
|
|
92
|
+
# After all: run queued compensating passes → reconcile → verdict
|
|
93
|
+
```
|
|
94
|
+
|
|
95
|
+
After all channels and compensating passes complete, run the reconciliation workflow above and apply the verdict decision flow. Channel results and compensating-pass labels must be preserved in the review output for auditability — do not collapse or omit them even when findings are empty.
|
|
96
|
+
|
|
97
|
+
### Degraded-Mode Worked Example
|
|
118
98
|
|
|
119
|
-
|
|
120
|
-
1. Claude performs an enhanced self-review of the diff
|
|
121
|
-
2. Focus on the AGENTS.md review criteria
|
|
122
|
-
3. Apply the same severity classification
|
|
123
|
-
4. Document that the review was single-model
|
|
99
|
+
When Codex is unavailable (not installed or auth failure), the orchestration proceeds as follows:
|
|
124
100
|
|
|
125
|
-
|
|
101
|
+
1. The installation check (`command -v codex`) fails. Codex channel status is set to `not_installed`.
|
|
102
|
+
2. A compensating Codex-equivalent pass is queued: a Claude self-review focused on implementation correctness, security, and API contracts.
|
|
103
|
+
3. Gemini and Superpowers channels run normally.
|
|
104
|
+
4. The compensating pass runs, producing findings labeled `[compensating: Codex-equivalent]`.
|
|
105
|
+
5. Reconciliation merges findings from all three sources (Gemini, Superpowers, compensating-Codex).
|
|
106
|
+
6. Maximum achievable verdict is `degraded-pass` because a real channel was absent.
|
|
107
|
+
7. The review summary notes: "Codex channel: not_installed (compensating: Codex-equivalent pass ran)."
|
|
126
108
|
|
|
127
|
-
|
|
128
|
-
- Add new review focus areas when new patterns emerge
|
|
129
|
-
- Remove rules that linters now enforce automatically
|
|
130
|
-
- Update AGENTS.md when architecture changes
|
|
131
|
-
- Track false-positive rates and adjust thresholds
|
|
109
|
+
**Fix-cycle channel rule:** Only re-run channels that originally completed or ran as compensating passes. `failed` channels are covered by their compensating pass and are not retried during fix rounds. Never retry a channel with status `not_installed`, `auth_failed`, or `auth_timeout` — these indicate persistent environment conditions that will not resolve between fix rounds.
|
|
132
110
|
|
|
133
|
-
###
|
|
111
|
+
### Verdict Decision Flow
|
|
134
112
|
|
|
135
|
-
|
|
113
|
+
Apply the following evaluation order to determine the final verdict. The first matching condition wins; all subsequent conditions are skipped.
|
|
136
114
|
|
|
137
115
|
```
|
|
138
|
-
|
|
139
|
-
|
|
140
|
-
|
|
141
|
-
|
|
142
|
-
|
|
143
|
-
│ Unique finding │ Found │ - │ MEDIUM confidence │
|
|
144
|
-
│ Unique finding │ - │ Found │ MEDIUM confidence │
|
|
145
|
-
│ Contradiction │ Fix X │ Keep X │ Flag for agent │
|
|
146
|
-
└─────────────────┴──────────┴──────────┴───────────────────┘
|
|
116
|
+
Verdict evaluation order:
|
|
117
|
+
1. Any contradictions or unresolvable findings? → needs-user-decision
|
|
118
|
+
2. Any unresolved P0/P1/P2 after 3 fix rounds? → blocked
|
|
119
|
+
3. Any channel not at full coverage? → degraded-pass
|
|
120
|
+
4. All channels completed, no unresolved P0/P1/P2? → pass
|
|
147
121
|
```
|
|
148
122
|
|
|
149
|
-
|
|
123
|
+
A "contradiction" exists when two channels report opposite conclusions about the same code location — for example, Codex flags a function as insecure while Gemini explicitly approves it. Contradictions cannot be resolved by the agent alone and must be surfaced to the user.
|
|
124
|
+
|
|
125
|
+
A channel is "not at full coverage" when: it ran as a compensating pass instead of a real tool, it timed out partially, or the Superpowers plugin is not installed and available channels do not cover the full diff.
|
|
126
|
+
|
|
127
|
+
**Verdict precedence reminder:** `needs-user-decision` > `blocked` > `degraded-pass` > `pass`. If multiple conditions apply simultaneously (for example, both a contradiction and an unresolved P0 exist), the higher-precedence verdict wins.
|
|
128
|
+
|
|
129
|
+
The verdict is always computed after all fix rounds are exhausted — do not emit a partial verdict mid-cycle. If a fix round resolves all P0/P1/P2 findings and no contradictions remain, the verdict upgrades from `blocked` to `pass` or `degraded-pass` depending on channel coverage. This upgrade must be verified explicitly by re-running the reconciliation step after each fix round, not assumed from the fact that fixes were applied.
|
|
150
130
|
|
|
151
131
|
### Security-Focused Review Checklist
|
|
152
132
|
|
|
153
133
|
Every automated review should check:
|
|
154
|
-
- No secrets or credentials in the diff (API keys, passwords, tokens)
|
|
155
|
-
- No `eval()` or equivalent unsafe operations introduced
|
|
156
|
-
- SQL queries use parameterized queries
|
|
157
|
-
- User input is validated before use
|
|
158
|
-
- Authentication/authorization checks are present on new endpoints
|
|
159
|
-
- Dependencies added are from trusted sources with known versions
|
|
134
|
+
- No secrets or credentials in the diff (API keys, passwords, tokens, private keys)
|
|
135
|
+
- No `eval()` or equivalent unsafe operations introduced (dynamic code execution, shell injection)
|
|
136
|
+
- SQL queries use parameterized queries — no string concatenation with user input
|
|
137
|
+
- User input is validated and sanitized before use in queries, commands, or output
|
|
138
|
+
- Authentication/authorization checks are present on all new endpoints and operations
|
|
139
|
+
- Dependencies added are from trusted sources with known, pinned versions
|
|
140
|
+
- No new global state or singletons that could cause cross-request data leaks
|
|
141
|
+
- Error messages do not expose internal paths, stack traces, or sensitive system details
|
|
142
|
+
- File system operations use safe path handling (no path traversal vulnerabilities)
|
|
143
|
+
- Cryptographic operations use approved algorithms and key lengths
|
|
144
|
+
|
|
145
|
+
When reviewing diffs that touch authentication, authorization, or data handling, elevate any security-related finding by one severity level. A finding that would normally be P2 (recommended) becomes P1 (required) in security-sensitive code paths. This conservative stance reflects the asymmetric cost of security failures versus the cost of over-caution during review.
|
|
160
146
|
|
|
161
147
|
### Performance Review Patterns
|
|
162
148
|
|
|
163
|
-
Look for these performance anti-patterns:
|
|
164
|
-
- N+1 queries (loop
|
|
165
|
-
- Missing pagination on list endpoints
|
|
166
|
-
- Synchronous operations that should be async
|
|
167
|
-
- Large objects passed by value instead of reference
|
|
168
|
-
- Missing caching for expensive computations
|
|
169
|
-
- Unbounded growth in arrays or maps
|
|
149
|
+
Look for these performance anti-patterns in the diff:
|
|
150
|
+
- N+1 queries (loop containing individual DB calls — use batch queries or eager loading)
|
|
151
|
+
- Missing pagination on list endpoints (unbounded result sets)
|
|
152
|
+
- Synchronous operations that should be async (blocking I/O in hot paths)
|
|
153
|
+
- Large objects passed by value instead of reference (unnecessary deep copies)
|
|
154
|
+
- Missing caching for expensive computations that are called repeatedly
|
|
155
|
+
- Unbounded growth in arrays or maps (no eviction, no size limits)
|
|
156
|
+
- Missing indexes on columns used in WHERE clauses of new queries
|
|
157
|
+
- Eager loading where lazy loading would suffice (over-fetching)
|
|
158
|
+
- Missing connection pooling or connection reuse for external services
|
|
170
159
|
|
|
171
|
-
###
|
|
160
|
+
### Common False Positives
|
|
172
161
|
|
|
173
|
-
|
|
162
|
+
Track and suppress recurring false positives to reduce noise in future reviews:
|
|
163
|
+
- Test files flagged for "hardcoded values" (test fixtures and expected values are intentional)
|
|
164
|
+
- Migration files flagged for "raw SQL" (migrations must use raw SQL for schema changes)
|
|
165
|
+
- Generated files flagged for style issues (generated code follows its own generator's conventions)
|
|
166
|
+
- Intentional use of `any` types in TypeScript adapter layers or third-party type overrides
|
|
167
|
+
- Deliberate `eslint-disable` comments that are already justified in surrounding context
|
|
168
|
+
- Seed data files flagged for hardcoded credentials (test-only, not production)
|
|
174
169
|
|
|
175
|
-
|
|
176
|
-
## Code Review
|
|
177
|
-
| Command | Purpose |
|
|
178
|
-
|---------|---------|
|
|
179
|
-
| `scripts/cli-pr-review.sh <PR#>` | Run dual-model review |
|
|
180
|
-
| `scripts/await-pr-review.sh <PR#>` | Poll for external review |
|
|
181
|
-
```
|
|
170
|
+
Add suppressions to AGENTS.md under "Out of Scope" to prevent repeated false findings across review cycles.
|
|
182
171
|
|
|
183
|
-
|
|
172
|
+
### Review Metrics and Continuous Improvement
|
|
184
173
|
|
|
185
|
-
|
|
174
|
+
Track these metrics over time to improve review quality and calibrate thresholds:
|
|
186
175
|
|
|
187
|
-
|
|
188
|
-
|
|
189
|
-
|
|
190
|
-
|
|
176
|
+
| Metric | Definition | Use |
|
|
177
|
+
|--------|------------|-----|
|
|
178
|
+
| False positive rate | Findings dismissed without action / total findings | Calibrate severity thresholds |
|
|
179
|
+
| Escape rate | Bugs reaching production despite review / total bugs | Identify coverage gaps |
|
|
180
|
+
| Time to resolve | Average time between finding logged and fix merged | Identify bottlenecks |
|
|
181
|
+
| Coverage | PRs receiving automated review / total PRs merged | Track adoption |
|
|
182
|
+
| Model agreement rate | Findings agreed by 2+ channels / total findings | Tune reconciliation rules |
|
|
183
|
+
| Compensating-pass rate | Reviews using compensating passes / total reviews | Track environment health |
|
|
191
184
|
|
|
192
|
-
|
|
185
|
+
Use the false positive rate to determine whether a severity category is over-triggering. Use the escape rate to determine whether the review is missing entire classes of bugs. Use the compensating-pass rate to identify when the review environment needs maintenance (expired auth tokens, broken CLI installs).
|
|
193
186
|
|
|
194
|
-
|
|
187
|
+
Log metric snapshots in AGENTS.md after each major project milestone. A declining model agreement rate over time suggests either that the review prompts are drifting in quality or that the codebase is accumulating technical debt in areas where models diverge. A rising escape rate despite consistent review coverage is a signal to revisit the severity thresholds or the focus areas in the review prompts.
|
|
188
|
+
|
|
189
|
+
### Fallback When Models Unavailable
|
|
190
|
+
|
|
191
|
+
When external CLIs are unavailable, the degraded-mode behavior defined in the Summary section applies. To summarize the operational steps:
|
|
195
192
|
|
|
196
|
-
|
|
197
|
-
|
|
198
|
-
-
|
|
199
|
-
-
|
|
200
|
-
-
|
|
201
|
-
|
|
193
|
+
1. For each unavailable external channel, queue a compensating Claude self-review pass focused on that channel's strength area.
|
|
194
|
+
2. Label findings as `[compensating: Codex-equivalent]` or `[compensating: Gemini-equivalent]`.
|
|
195
|
+
3. Treat compensating findings as single-source confidence — they do not raise to high confidence even when they agree with another channel.
|
|
196
|
+
4. Maximum verdict is `degraded-pass` when any channel ran as compensating instead of real.
|
|
197
|
+
5. When both external channels are unavailable, note "All findings are single-model (Claude only). External validation was unavailable." in the review summary.
|
|
198
|
+
6. Never silently drop unavailable channels — always record the channel status and compensating coverage label in the review output.
|
|
202
199
|
|
|
203
|
-
|
|
200
|
+
**Superpowers channel exception:** Superpowers is a Claude subagent and requires no external CLI or auth. It is always available as long as the Superpowers plugin is installed in the Claude Code environment. If the plugin is not installed, run available external CLIs and warn the user that review coverage is reduced — but do not run a compensating pass for Superpowers (the compensating-pass mechanism only applies to external CLIs that have an installation/auth gate).
|
|
@@ -18,12 +18,13 @@ At higher methodology depths (4+), idea exploration and adversarial challenge be
|
|
|
18
18
|
| 5 | Multi-model with reconciliation | Multi-model with reconciliation |
|
|
19
19
|
|
|
20
20
|
### Graceful Fallback Chain
|
|
21
|
-
1. Check if external CLI is available (`
|
|
22
|
-
2. If
|
|
23
|
-
3. If auth
|
|
24
|
-
4. If
|
|
25
|
-
5. If
|
|
26
|
-
6.
|
|
21
|
+
1. Check if external CLI is available (`command -v codex`, `command -v gemini`)
|
|
22
|
+
2. If not installed, skip that model silently — note in Session Metadata
|
|
23
|
+
3. If installed, check auth (`codex login status`, `NO_BROWSER=true gemini -p "respond with ok" -o json`)
|
|
24
|
+
4. If auth fails, surface loudly to the user with `!` recovery command — do NOT silently skip
|
|
25
|
+
5. If auth succeeds, dispatch with timeout
|
|
26
|
+
6. If no external models available, fall back to primary model with distinct framing prompts
|
|
27
|
+
7. Never block the session waiting for unavailable tools
|
|
27
28
|
|
|
28
29
|
### Reconciliation Rules
|
|
29
30
|
- **2+ models agree** on the same finding = **consensus** — high confidence, present as validated
|
|
@@ -36,19 +37,29 @@ At higher methodology depths (4+), idea exploration and adversarial challenge be
|
|
|
36
37
|
|
|
37
38
|
Before dispatching, verify CLI tools are installed and authenticated:
|
|
38
39
|
|
|
39
|
-
|
|
40
|
-
|
|
41
|
-
|
|
42
|
-
# Exit 0 = ready. Non-zero = skip Codex.
|
|
40
|
+
### Foreground-Only Execution
|
|
41
|
+
|
|
42
|
+
When an AI agent dispatches research or challenge prompts via a tool runner, always run commands in the foreground. Background execution (`run_in_background`, `&`, `nohup`) produces empty or truncated output from Codex and Gemini CLIs. Multiple foreground calls can still run in parallel if the tool runner supports parallel tool invocations.
|
|
43
43
|
|
|
44
|
-
|
|
45
|
-
|
|
46
|
-
|
|
44
|
+
```bash
|
|
45
|
+
# Codex CLI — step 1: check installed
|
|
46
|
+
command -v codex >/dev/null 2>&1 || { echo "Codex not installed — skipping"; exit 0; }
|
|
47
|
+
# step 2: check auth
|
|
48
|
+
codex login status 2>/dev/null
|
|
49
|
+
# Exit 0 = ready. Non-zero = auth failure (surface to user).
|
|
50
|
+
|
|
51
|
+
# Gemini CLI — step 1: check installed
|
|
52
|
+
command -v gemini >/dev/null 2>&1 || { echo "Gemini not installed — skipping"; exit 0; }
|
|
53
|
+
# step 2: check auth
|
|
54
|
+
NO_BROWSER=true gemini -p "respond with ok" -o json 2>&1
|
|
55
|
+
# Check for "ok" in response. Exit 41 = auth failure (surface to user).
|
|
47
56
|
```
|
|
48
57
|
|
|
49
|
-
|
|
50
|
-
-
|
|
51
|
-
-
|
|
58
|
+
Two distinct failure modes:
|
|
59
|
+
- **Not installed** (`command -v` fails): skip silently, note in Session Metadata
|
|
60
|
+
- **Auth failed** (non-zero after install check): surface loudly — tell the user which tool failed and how to fix it:
|
|
61
|
+
- Codex: "Codex auth expired — run `! codex login` to re-authenticate"
|
|
62
|
+
- Gemini: "Gemini auth expired — run `! gemini -p \"hello\"` to re-authenticate"
|
|
52
63
|
|
|
53
64
|
Auth failures are NOT silent fallbacks — surface them explicitly.
|
|
54
65
|
|
|
@@ -44,21 +44,53 @@ The review never blocks on external model availability.
|
|
|
44
44
|
|
|
45
45
|
## Deep Guidance
|
|
46
46
|
|
|
47
|
+
See `review-methodology` for severity definitions (P0-P3). This entry uses those severities but does not define them.
|
|
48
|
+
|
|
47
49
|
### Dispatch Mechanics
|
|
48
50
|
|
|
51
|
+
#### Foreground-Only Execution
|
|
52
|
+
|
|
53
|
+
When an AI agent dispatches CLI reviews via a tool runner (Claude Code Bash tool, Codex exec, etc.), always run commands in the foreground. Background execution (`run_in_background`, `&`, `nohup`) produces empty or truncated output from Codex and Gemini CLIs. Multiple foreground calls can still run in parallel if the tool runner supports parallel tool invocations.
|
|
54
|
+
|
|
49
55
|
#### CLI Availability Check
|
|
50
56
|
|
|
51
|
-
Before dispatching, verify the model CLI is installed and authenticated:
|
|
57
|
+
Before dispatching, verify the model CLI is installed and authenticated using a two-step process that produces distinct statuses for the orchestration layer:
|
|
58
|
+
|
|
59
|
+
**Step 1 — Installation check:**
|
|
60
|
+
|
|
61
|
+
```bash
|
|
62
|
+
# Codex: not found -> status: "not_installed"
|
|
63
|
+
command -v codex >/dev/null 2>&1
|
|
64
|
+
|
|
65
|
+
# Gemini: not found -> status: "not_installed"
|
|
66
|
+
command -v gemini >/dev/null 2>&1
|
|
67
|
+
```
|
|
68
|
+
|
|
69
|
+
If the CLI is not found, report status `not_installed` to the orchestration layer. Do not prompt the user to install it.
|
|
70
|
+
|
|
71
|
+
**Step 2 — Auth verification (only if installed):**
|
|
52
72
|
|
|
53
73
|
```bash
|
|
54
|
-
# Codex
|
|
55
|
-
|
|
74
|
+
# Codex: fail -> status: "auth_failed"
|
|
75
|
+
codex login status 2>/dev/null
|
|
56
76
|
|
|
57
|
-
# Gemini
|
|
58
|
-
|
|
77
|
+
# Gemini: exit 41 -> status: "auth_failed"
|
|
78
|
+
NO_BROWSER=true gemini -p "respond with ok" -o json 2>&1
|
|
59
79
|
```
|
|
60
80
|
|
|
61
|
-
If
|
|
81
|
+
If auth fails, report status `auth_failed` and surface recovery to the user:
|
|
82
|
+
- Codex: "Codex auth expired — run `! codex login` to re-authenticate"
|
|
83
|
+
- Gemini: "Gemini auth expired — run `! gemini -p \"hello\"` to re-authenticate"
|
|
84
|
+
|
|
85
|
+
If auth check times out (~5 seconds), retry once. If still failing, report `auth_timeout`.
|
|
86
|
+
If auth succeeds, report `ready` and proceed to dispatch.
|
|
87
|
+
|
|
88
|
+
**Post-dispatch terminal states:**
|
|
89
|
+
- `completed` — channel produced results, use normally
|
|
90
|
+
- `partial_timeout` — partial output before timeout; use what was received, note incompleteness. Does NOT trigger compensating pass.
|
|
91
|
+
- `failed` — crashed or unparseable output; triggers compensating pass.
|
|
92
|
+
|
|
93
|
+
Verdict impact: `partial_timeout` and `failed` channels mean the review is degraded. Maximum verdict is `degraded-pass` when any channel has a non-`completed` terminal state.
|
|
62
94
|
|
|
63
95
|
#### Prompt Formatting
|
|
64
96
|
|
|
@@ -241,6 +273,15 @@ Minimum standards for a multi-model review to be considered complete:
|
|
|
241
273
|
|
|
242
274
|
If the primary Claude review produces zero findings and external models are unavailable, the review should explicitly note this as unusual and recommend a targeted re-review at a later stage.
|
|
243
275
|
|
|
276
|
+
#### Degraded-Mode Gate Adaptation
|
|
277
|
+
|
|
278
|
+
When channels are skipped and compensating passes are used:
|
|
279
|
+
|
|
280
|
+
- **Minimum finding count** gate: compensating passes count toward the total but are not treated as separate external channels for consensus purposes.
|
|
281
|
+
- **Reconciliation completeness** gate (cross-model disagreement documentation): applies whenever 2+ distinct model perspectives participate (Claude + one external counts). N/A only when Claude is the sole perspective (no external models and no compensating passes that introduce genuinely different framing).
|
|
282
|
+
- **Coverage threshold** gate: compensating passes satisfy the "every pass has at least one finding or explicit no-issues note" requirement.
|
|
283
|
+
- The reconciled output must record which channels were real, which were compensating, and which were skipped, so the orchestration layer can apply appropriate verdict logic.
|
|
284
|
+
|
|
244
285
|
### Common Anti-Patterns
|
|
245
286
|
|
|
246
287
|
**Blind trust of external findings.** An external model flags an issue and the reviewer includes it without verification. External models hallucinate — they may flag a "missing section" that actually exists, or cite a "contradiction" based on a misread. Fix: every external finding must be verified against the actual artifact before inclusion in the final report.
|
|
@@ -64,7 +64,7 @@ fi
|
|
|
64
64
|
|
|
65
65
|
The `!` prefix runs the command in the user's terminal session, allowing interactive auth flows (browser OAuth, Y/n prompts) that can't work in headless mode.
|
|
66
66
|
|
|
67
|
-
**If neither CLI is available or authenticated**:
|
|
67
|
+
**If neither CLI is available or authenticated**: Queue a compensating Claude pass focused on the failed channel's strength area. Document this as "single-model review (no external CLIs available)."
|
|
68
68
|
|
|
69
69
|
## Correct Invocation Patterns
|
|
70
70
|
|
|
@@ -134,6 +134,12 @@ NO_BROWSER=true gemini -p "REVIEW_PROMPT_HERE" --output-format json -s --approva
|
|
|
134
134
|
|
|
135
135
|
**Output**: JSON on stdout with `{ response, stats, error }` structure.
|
|
136
136
|
|
|
137
|
+
## Foreground-Only Execution
|
|
138
|
+
|
|
139
|
+
Always run Codex and Gemini CLI commands as foreground Bash calls. Never use `run_in_background`, `&`, or `nohup`. Background execution produces empty or truncated output from both CLIs. Multiple foreground calls in a single message are fine — the tool runner supports parallel invocations.
|
|
140
|
+
|
|
141
|
+
This means: when dispatching reviews, make each CLI call a separate foreground Bash tool invocation. Do NOT use shell `&` or background subshells.
|
|
142
|
+
|
|
137
143
|
## Context Bundling
|
|
138
144
|
|
|
139
145
|
When dispatching a review, bundle all relevant context into the prompt. Each CLI gets the same bundle — do NOT share one model's review with the other.
|
|
@@ -208,35 +214,27 @@ You are reviewing a pull request diff. Report P0, P1, and P2 issues.
|
|
|
208
214
|
| PR diff | Full diff | If >2000 lines, split into file groups |
|
|
209
215
|
| Implementation plan | Task list + representative tasks | Include full task list, detail for flagged tasks |
|
|
210
216
|
|
|
211
|
-
##
|
|
212
|
-
|
|
213
|
-
When both CLIs produce results, reconcile findings using these rules:
|
|
217
|
+
## Finding Reconciliation
|
|
214
218
|
|
|
215
|
-
|
|
216
|
-
|----------|-----------|--------|
|
|
217
|
-
| Both flag same issue | **High** | Fix immediately — two independent models agree |
|
|
218
|
-
| Both approve (no findings) | **High** | Proceed confidently |
|
|
219
|
-
| One flags P0, other approves | **High** | Fix it — P0 is critical enough from a single source |
|
|
220
|
-
| One flags P1, other approves | **Medium** | Review the finding carefully before fixing. If the finding is specific and actionable, fix it. If vague, skip. |
|
|
221
|
-
| Models contradict each other | **Low** | Present both findings to the user for adjudication |
|
|
219
|
+
When multiple models produce findings, reconcile them using the rules defined in `multi-model-review-dispatch`. Key principles:
|
|
222
220
|
|
|
223
|
-
**Independence rule**: Never share one model's review output with the other. Each model must review the artifact independently to avoid confirmation bias.
|
|
221
|
+
- **Independence rule**: Never share one model's review output with the other. Each model must review the artifact independently to avoid confirmation bias.
|
|
222
|
+
- **Round tracking**: For iterative reviews (like PR review loops), track the round number. After 3 fix rounds with unresolved findings, stop and surface the verdict (`blocked` or `needs-user-decision`) to the user. Do NOT auto-merge.
|
|
224
223
|
|
|
225
|
-
|
|
224
|
+
For the full consensus rules, confidence scoring, and disagreement resolution process, see `multi-model-review-dispatch`.
|
|
226
225
|
|
|
227
226
|
## Fallback Behavior
|
|
228
227
|
|
|
229
228
|
| Situation | Fallback |
|
|
230
229
|
|-----------|----------|
|
|
231
|
-
| Neither CLI available |
|
|
232
|
-
| Codex only | Single-model review with Codex |
|
|
233
|
-
| Gemini only | Single-model review with Gemini |
|
|
234
|
-
| **CLI auth expired** | **Surface to user with recovery command — do NOT silently fall back** |
|
|
235
|
-
| One CLI fails mid-review
|
|
236
|
-
| Both CLIs fail
|
|
237
|
-
|
|
238
|
-
|
|
239
|
-
**Auth failures are NOT silent fallbacks.** The difference between "CLI not installed" (fall back quietly) and "CLI auth expired" (user action required) is critical. Auth can be fixed in 30 seconds with an interactive command — silently skipping wastes the user's review infrastructure.
|
|
230
|
+
| Neither CLI available | Queue two compensating Claude passes (one per missing channel's strength area). Label findings. Max verdict: `degraded-pass`. |
|
|
231
|
+
| Codex only | Single-model review with Codex + compensating Claude pass for Gemini |
|
|
232
|
+
| Gemini only | Single-model review with Gemini + compensating Claude pass for Codex |
|
|
233
|
+
| **CLI auth expired** | **Surface to user with `!` recovery command — do NOT silently fall back** |
|
|
234
|
+
| One CLI fails mid-review | Use partial results if available, else queue compensating pass. Note failure in summary. |
|
|
235
|
+
| Both CLIs fail | Two compensating passes, max verdict: `degraded-pass`. Warn user. |
|
|
236
|
+
|
|
237
|
+
Auth failures are NOT silent fallbacks.
|
|
240
238
|
|
|
241
239
|
## Integration with Review Steps
|
|
242
240
|
|
|
@@ -208,17 +208,27 @@ Severity:
|
|
|
208
208
|
Return ONLY valid JSON. No markdown, no explanation outside the JSON object.
|
|
209
209
|
```
|
|
210
210
|
|
|
211
|
+
**Foreground only:** Always run Codex and Gemini CLI commands as foreground Bash calls. Never use `run_in_background`, `&`, or `nohup`. Background execution produces empty output.
|
|
212
|
+
|
|
211
213
|
#### Channel 1: Codex CLI
|
|
212
214
|
|
|
213
|
-
**
|
|
215
|
+
**Installation check first:**
|
|
216
|
+
|
|
217
|
+
```bash
|
|
218
|
+
command -v codex >/dev/null 2>&1 || echo "Codex not installed"
|
|
219
|
+
```
|
|
220
|
+
|
|
221
|
+
If not installed: queue a compensating pass (implementation correctness, security, API contracts, labeled `[compensating: Codex-equivalent]`). Skip to Channel 2.
|
|
222
|
+
|
|
223
|
+
**Auth check** (tokens expire — always re-verify):
|
|
214
224
|
|
|
215
225
|
```bash
|
|
216
226
|
codex login status 2>/dev/null && echo "codex authenticated" || echo "codex NOT authenticated"
|
|
217
227
|
```
|
|
218
228
|
|
|
219
|
-
If not authenticated: tell the user "Codex auth expired. Run: `! codex login`".
|
|
220
|
-
|
|
221
|
-
If Codex
|
|
229
|
+
If not authenticated: tell the user "Codex auth expired. Run: `! codex login`". Do NOT silently skip. Wait for re-auth and retry once. If auth cannot be recovered (auth_timeout or user declines): queue a compensating pass (implementation correctness, security, API contracts, labeled `[compensating: Codex-equivalent]`).
|
|
230
|
+
|
|
231
|
+
If Codex fails during execution (non-zero exit, malformed output, timeout): queue a compensating pass with the same focus and label.
|
|
222
232
|
|
|
223
233
|
**Run the review:**
|
|
224
234
|
|
|
@@ -230,15 +240,23 @@ Store the JSON output as `CODEX_PHASE1_FINDINGS`.
|
|
|
230
240
|
|
|
231
241
|
#### Channel 2: Gemini CLI
|
|
232
242
|
|
|
233
|
-
**
|
|
243
|
+
**Installation check first:**
|
|
244
|
+
|
|
245
|
+
```bash
|
|
246
|
+
command -v gemini >/dev/null 2>&1 || echo "Gemini not installed"
|
|
247
|
+
```
|
|
248
|
+
|
|
249
|
+
If not installed: queue a compensating pass (architectural patterns, design reasoning, broad context, labeled `[compensating: Gemini-equivalent]`). Skip to Channel 3.
|
|
250
|
+
|
|
251
|
+
**Auth check:**
|
|
234
252
|
|
|
235
253
|
```bash
|
|
236
254
|
NO_BROWSER=true gemini -p "respond with ok" -o json 2>&1
|
|
237
255
|
```
|
|
238
256
|
|
|
239
|
-
If exit code is 41: tell the user "Gemini auth expired. Run: `! gemini -p \"hello\"`".
|
|
240
|
-
|
|
241
|
-
If Gemini
|
|
257
|
+
If exit code is 41: tell the user "Gemini auth expired. Run: `! gemini -p \"hello\"`". Do NOT silently skip. Wait for re-auth and retry once. If auth cannot be recovered (auth_timeout or user declines): queue a compensating pass (architectural patterns, design reasoning, broad context, labeled `[compensating: Gemini-equivalent]`).
|
|
258
|
+
|
|
259
|
+
If Gemini fails during execution (non-zero exit, malformed output, timeout): queue a compensating pass with the same focus and label.
|
|
242
260
|
|
|
243
261
|
**Run the review (independent — do NOT include Codex output in this prompt):**
|
|
244
262
|
|
|
@@ -248,6 +266,8 @@ NO_BROWSER=true gemini -p "[PHASE 1 REVIEW PROMPT]" --output-format json --appro
|
|
|
248
266
|
|
|
249
267
|
Store as `GEMINI_PHASE1_FINDINGS`.
|
|
250
268
|
|
|
269
|
+
**After all Phase 1 channels:** Run any queued compensating passes. Record which channels were real, compensating, or skipped. This availability map is used for Phase 2.
|
|
270
|
+
|
|
251
271
|
#### Channel 3: Superpowers Code-Reviewer
|
|
252
272
|
|
|
253
273
|
Dispatch the `superpowers:code-reviewer` subagent. This channel always runs.
|
|
@@ -349,13 +369,21 @@ Acceptance Criteria:
|
|
|
349
369
|
Run all three review channels. Each runs independently — do NOT share one channel's
|
|
350
370
|
output with another.
|
|
351
371
|
|
|
372
|
+
**Foreground only:** Always run Codex and Gemini CLI commands as foreground Bash calls. Never use `run_in_background`, `&`, or `nohup`. Background execution produces empty output.
|
|
373
|
+
|
|
374
|
+
**Session-scoped channel availability:** Phase 1 has already probed channel installation and auth. Pass the Phase 1 channel availability results to each Phase 2 subagent. Subagents do NOT re-probe — if Codex was `auth failed` in Phase 1, every Phase 2 subagent treats Codex as unavailable and runs a compensating pass immediately.
|
|
375
|
+
|
|
376
|
+
Phase 2 compensating passes adapt the focus to story context:
|
|
377
|
+
- Missing Codex → focus compensating pass on implementation correctness and edge cases for this story's acceptance criteria.
|
|
378
|
+
- Missing Gemini → focus compensating pass on design coherence and architectural alignment for this story.
|
|
379
|
+
|
|
352
380
|
Channel 1 — Codex CLI:
|
|
353
|
-
Auth: codex login status 2>/dev/null
|
|
354
381
|
Run: codex exec --skip-git-repo-check -s read-only --ephemeral "[PER-STORY PROMPT]" 2>/dev/null
|
|
382
|
+
(Only if Codex was available in Phase 1. Otherwise run compensating pass immediately.)
|
|
355
383
|
|
|
356
384
|
Channel 2 — Gemini CLI (independent):
|
|
357
|
-
Auth: NO_BROWSER=true gemini -p "respond with ok" -o json 2>&1 (exit 41 = auth failure)
|
|
358
385
|
Run: NO_BROWSER=true gemini -p "[PER-STORY PROMPT]" --output-format json --approval-mode yolo 2>/dev/null
|
|
386
|
+
(Only if Gemini was available in Phase 1. Otherwise run compensating pass immediately.)
|
|
359
387
|
|
|
360
388
|
Channel 3 — Superpowers code-reviewer:
|
|
361
389
|
Dispatch superpowers:code-reviewer with:
|
|
@@ -409,9 +437,14 @@ The per-story review prompt for Codex and Gemini:
|
|
|
409
437
|
|
|
410
438
|
Normalize the Superpowers code-reviewer findings to the same JSON shape as
|
|
411
439
|
Codex/Gemini (severity, acceptance_criterion, file, line, description, suggestion)
|
|
412
|
-
before returning. Then return all three channels' findings:
|
|
440
|
+
before returning. Then return all three channels' findings plus channel status:
|
|
413
441
|
{
|
|
414
442
|
"story": "[STORY_TITLE]",
|
|
443
|
+
"channel_status": {
|
|
444
|
+
"codex": { "root_cause": "null|not_installed|auth_failed|auth_timeout|failed", "coverage_status": "full|compensating" },
|
|
445
|
+
"gemini": { "root_cause": "null|not_installed|auth_failed|auth_timeout|failed", "coverage_status": "full|compensating" },
|
|
446
|
+
"superpowers": { "root_cause": null, "coverage_status": "full" }
|
|
447
|
+
},
|
|
415
448
|
"codex": { "findings": [...] },
|
|
416
449
|
"gemini": { "findings": [...] },
|
|
417
450
|
"superpowers": { "findings": [...] }
|
|
@@ -448,7 +481,9 @@ Create `docs/reviews/` if it does not exist. Write the following to
|
|
|
448
481
|
|
|
449
482
|
- **Date:** [YYYY-MM-DD]
|
|
450
483
|
- **Mode:** [Review + Fix | Report Only | Update Mode | Re-review]
|
|
451
|
-
- **
|
|
484
|
+
- **Coverage:** [full-coverage / degraded-coverage / partial-coverage]
|
|
485
|
+
- **Channels (Phase 1):** Codex [completed | compensating | skipped — reason] | Gemini [completed | compensating | skipped — reason] | Superpowers [completed]
|
|
486
|
+
- **Channels (Phase 2):** [N] stories reviewed, [N] with full channels, [N] with compensating passes
|
|
452
487
|
- **Findings:** P0: [N] | P1: [N] | P2: [N] | P3: [N]
|
|
453
488
|
- **Fixed:** [N findings fixed | N/A — report-only]
|
|
454
489
|
|
|
@@ -602,7 +637,9 @@ Output the completion summary:
|
|
|
602
637
|
```
|
|
603
638
|
Post-implementation review complete.
|
|
604
639
|
|
|
605
|
-
|
|
640
|
+
Coverage: [full-coverage / degraded-coverage / partial-coverage]
|
|
641
|
+
Channels (Phase 1): Codex [completed|compensating|skipped — reason] | Gemini [completed|compensating|skipped — reason] | Superpowers [completed]
|
|
642
|
+
Channels (Phase 2): [N] stories reviewed, [N] with full channels, [N] with compensating passes
|
|
606
643
|
Findings: P0: [N] | P1: [N] | P2: [N] | P3: [N]
|
|
607
644
|
Fixed: [N] | Remaining: [N]
|
|
608
645
|
|
|
@@ -616,23 +653,29 @@ the user they require manual attention before the project is ready to release.
|
|
|
616
653
|
|
|
617
654
|
| Situation | Action |
|
|
618
655
|
|-----------|--------|
|
|
619
|
-
| Codex not installed |
|
|
620
|
-
| Gemini not installed |
|
|
621
|
-
| Codex auth expired
|
|
622
|
-
|
|
|
623
|
-
|
|
|
656
|
+
| Codex not installed (`command -v` fails) | Queue compensating pass (implementation correctness, security, API contracts, labeled `[compensating: Codex-equivalent]`); document as "not_installed" in report |
|
|
657
|
+
| Gemini not installed (`command -v` fails) | Queue compensating pass (architectural patterns, design reasoning, broad context, labeled `[compensating: Gemini-equivalent]`); document as "not_installed" in report |
|
|
658
|
+
| Codex auth expired — user recovers | Re-run auth check; proceed with full Codex channel |
|
|
659
|
+
| Codex auth expired — user declines or auth_timeout | Queue compensating pass (implementation correctness, security, API contracts, labeled `[compensating: Codex-equivalent]`); document as "auth_failed" or "auth_timeout" in report |
|
|
660
|
+
| Gemini auth expired (exit 41) — user recovers | Re-run auth check; proceed with full Gemini channel |
|
|
661
|
+
| Gemini auth expired — user declines or auth_timeout | Queue compensating pass (architectural patterns, design reasoning, broad context, labeled `[compensating: Gemini-equivalent]`); document as "auth_failed" or "auth_timeout" in report |
|
|
662
|
+
| Channel fails during execution (non-zero exit, malformed output, timeout) | Queue compensating pass for that channel with same focus and label; document root cause in report |
|
|
663
|
+
| Both external CLIs unavailable (any combination of not_installed / auth failure) | Run all compensating passes plus Superpowers code-reviewer; report coverage as "degraded-coverage"; warn user that review coverage is reduced |
|
|
664
|
+
| Superpowers unavailable | Document as "unavailable" in report; proceed with remaining channels; Superpowers is a Claude subagent and should always be available |
|
|
624
665
|
| `docs/user-stories.md` missing | Skip Phase 2; run Phase 1 only; warn user that functional review is incomplete |
|
|
625
666
|
| `docs/coding-standards.md` missing | Proceed without it; note its absence in the report summary |
|
|
626
667
|
|
|
627
668
|
## Process Rules
|
|
628
669
|
|
|
629
|
-
1. **
|
|
630
|
-
2. **
|
|
631
|
-
3. **
|
|
632
|
-
4. **
|
|
633
|
-
5. **
|
|
634
|
-
6. **
|
|
635
|
-
7. **
|
|
670
|
+
1. **Foreground only** — Always run Codex and Gemini CLI commands as foreground Bash calls. Never use `run_in_background`, `&`, or `nohup`. Background execution produces empty output.
|
|
671
|
+
2. **All three channels are mandatory** — a channel enters degraded mode (compensating pass) when not installed, auth cannot be recovered, or it fails during execution. Never skip silently by choice.
|
|
672
|
+
3. **Auth failures are not silent** — always surface to the user with the exact recovery command (`! codex login` or `! gemini -p "hello"`). Wait for user response before queuing a compensating pass.
|
|
673
|
+
4. **Independence** — never share one channel's output with another. Each reviews independently.
|
|
674
|
+
5. **Verify every fix** — run tests (or re-read the file) immediately after each fix before moving on.
|
|
675
|
+
6. **3-round limit** — never attempt to fix the same finding more than 3 times. Surface unresolved findings to the user.
|
|
676
|
+
7. **Document everything** — the report must show which channels ran, which were compensating, which were skipped, and the root cause for any degraded channel.
|
|
677
|
+
8. **No auto-merge** — this tool modifies local files only. It never pushes, merges, or creates PRs.
|
|
678
|
+
9. **Dispatch pattern cross-reference** — Phase 2 parallel dispatch uses `superpowers:dispatching-parallel-agents`. Each story subagent dispatches its own `superpowers:code-reviewer` as Channel 3. This two-level nesting is intentional and supported.
|
|
636
679
|
|
|
637
680
|
## After This Step
|
|
638
681
|
|
|
@@ -642,7 +685,9 @@ When the review is complete, tell the user:
|
|
|
642
685
|
**Post-implementation review complete.**
|
|
643
686
|
|
|
644
687
|
Results:
|
|
645
|
-
-
|
|
688
|
+
- Coverage: [full-coverage / degraded-coverage / partial-coverage]
|
|
689
|
+
- Channels (Phase 1): Codex [completed|compensating|skipped — reason] | Gemini [completed|compensating|skipped — reason] | Superpowers [completed]
|
|
690
|
+
- Channels (Phase 2): [N] stories reviewed, [N] with full channels, [N] with compensating passes
|
|
646
691
|
- Phase 1 (systemic): [N] findings — [N] fixed, [N] remaining
|
|
647
692
|
- Phase 2 (functional): [N] findings across [N] stories — [N] fixed, [N] remaining
|
|
648
693
|
- Report: `docs/reviews/post-implementation-review.md`
|
|
@@ -159,6 +159,8 @@ Format the changed-file context like:
|
|
|
159
159
|
|
|
160
160
|
Each channel reviews independently. Do NOT share one channel's output with another.
|
|
161
161
|
|
|
162
|
+
**Foreground only:** Always run Codex and Gemini CLI commands as foreground Bash calls. Never use `run_in_background`, `&`, or `nohup`. Background execution produces empty output. Multiple foreground calls in a single message are fine.
|
|
163
|
+
|
|
162
164
|
#### Channel 1: Codex CLI
|
|
163
165
|
|
|
164
166
|
Check installation and auth:
|
|
@@ -168,8 +170,10 @@ command -v codex >/dev/null 2>&1
|
|
|
168
170
|
codex login status 2>/dev/null
|
|
169
171
|
```
|
|
170
172
|
|
|
171
|
-
- If `codex` is not installed: skip this channel and record
|
|
172
|
-
- If auth fails: tell the user to run `! codex login`, retry after recovery, and if recovery is not possible, record
|
|
173
|
+
- If `codex` is not installed: skip this channel and record root-cause `not_installed`
|
|
174
|
+
- If auth fails: tell the user to run `! codex login`, retry after recovery, and if recovery is not possible, record root-cause `auth_failed` and continue with the remaining channels
|
|
175
|
+
|
|
176
|
+
If auth cannot be recovered, or if Codex is not installed, queue a compensating Claude self-review pass focused on implementation correctness, security, and API contracts. Label findings as `[compensating: Codex-equivalent]`. If auth check times out (~5s), retry once; if still failing, record `auth timeout` and queue compensating pass. This pass runs after all channel dispatch attempts complete.
|
|
173
177
|
|
|
174
178
|
Build the prompt in a temporary file and pass it over stdin:
|
|
175
179
|
|
|
@@ -179,6 +183,8 @@ PROMPT_FILE=$(mktemp)
|
|
|
179
183
|
codex exec --skip-git-repo-check -s read-only --ephemeral - < "$PROMPT_FILE" 2>/dev/null
|
|
180
184
|
```
|
|
181
185
|
|
|
186
|
+
If the CLI exits with a non-zero code, produces malformed/unparseable output, or is killed by the tool runner timeout, record root-cause `failed` and queue a compensating pass for that channel.
|
|
187
|
+
|
|
182
188
|
#### Channel 2: Gemini CLI
|
|
183
189
|
|
|
184
190
|
Check installation and auth:
|
|
@@ -188,8 +194,10 @@ command -v gemini >/dev/null 2>&1
|
|
|
188
194
|
NO_BROWSER=true gemini -p "respond with ok" -o json 2>&1
|
|
189
195
|
```
|
|
190
196
|
|
|
191
|
-
- If `gemini` is not installed: skip this channel and record
|
|
192
|
-
- If auth fails (including exit 41): tell the user to run `! gemini -p "hello"`, retry after recovery, and if recovery is not possible, record
|
|
197
|
+
- If `gemini` is not installed: skip this channel and record root-cause `not_installed`
|
|
198
|
+
- If auth fails (including exit 41): tell the user to run `! gemini -p "hello"`, retry after recovery, and if recovery is not possible, record root-cause `auth_failed` and continue with the remaining channels
|
|
199
|
+
|
|
200
|
+
If auth cannot be recovered, or if Gemini is not installed, queue a compensating Claude self-review pass focused on architectural patterns, design reasoning, and broad context. Label findings as `[compensating: Gemini-equivalent]`. If auth check times out (~5s), retry once; if still failing, record `auth timeout` and queue compensating pass. This pass runs after all channel dispatch attempts complete.
|
|
193
201
|
|
|
194
202
|
Build the prompt in a temporary file and pass it as a single prompt string:
|
|
195
203
|
|
|
@@ -199,6 +207,8 @@ PROMPT_FILE=$(mktemp)
|
|
|
199
207
|
NO_BROWSER=true gemini -p "$(cat "$PROMPT_FILE")" --output-format json --approval-mode yolo 2>/dev/null
|
|
200
208
|
```
|
|
201
209
|
|
|
210
|
+
If the CLI exits with a non-zero code, produces malformed/unparseable output, or is killed by the tool runner timeout, record root-cause `failed` and queue a compensating pass for that channel.
|
|
211
|
+
|
|
202
212
|
#### Channel 3: Superpowers code-reviewer
|
|
203
213
|
|
|
204
214
|
Dispatch the `superpowers:code-reviewer` subagent.
|
|
@@ -213,6 +223,8 @@ Dispatch the `superpowers:code-reviewer` subagent.
|
|
|
213
223
|
This channel must review the same local delivery candidate, even when no PR or
|
|
214
224
|
clean ref range exists.
|
|
215
225
|
|
|
226
|
+
**After all channels:** Run any queued compensating passes as foreground Claude self-review passes. Each compensating pass uses the same review prompt as the missing channel, focusing on that channel's strength area.
|
|
227
|
+
|
|
216
228
|
### Step 5: Use This Review Prompt
|
|
217
229
|
|
|
218
230
|
All channels should receive an equivalent prompt bundle built from the local review scope:
|
|
@@ -270,6 +282,7 @@ Use these rules:
|
|
|
270
282
|
| Any single P2 | Fix unless clearly inapplicable; if disputed, surface to user |
|
|
271
283
|
| All executed channels approve | Candidate passes review |
|
|
272
284
|
| Strong contradiction on a medium-severity issue | Verdict becomes `needs-user-decision` |
|
|
285
|
+
| Compensating-pass P0/P1/P2 finding | Single-source confidence — fix per normal thresholds, but label as compensating in summary |
|
|
273
286
|
|
|
274
287
|
### Step 7: Apply Fixes Unless in Report-Only Mode
|
|
275
288
|
|
|
@@ -284,14 +297,18 @@ Otherwise:
|
|
|
284
297
|
3. Repeat for up to 3 fix rounds
|
|
285
298
|
4. If any finding remains unresolved after 3 rounds, stop with verdict `needs-user-decision`
|
|
286
299
|
|
|
300
|
+
**Fix cycle channel rule:** Re-run only channels that originally completed or ran as compensating passes. Never retry a channel marked `not installed`, `auth failed`, or `auth timeout` during fix rounds — its availability does not change within a session.
|
|
301
|
+
|
|
287
302
|
### Step 8: Final Verdict
|
|
288
303
|
|
|
289
304
|
Return exactly one verdict:
|
|
290
305
|
|
|
291
|
-
- `pass` — all
|
|
292
|
-
- `degraded-pass` — at least one channel was skipped
|
|
293
|
-
- `blocked` —
|
|
294
|
-
- `needs-user-decision` — reviewer disagreement or findings still unresolved after 3
|
|
306
|
+
- `pass` — all channels completed with `full` coverage, no unresolved P0/P1/P2
|
|
307
|
+
- `degraded-pass` — at least one channel was skipped/compensated (coverage is not all `full`), but all executed and compensating channels have no unresolved P0/P1/P2
|
|
308
|
+
- `blocked` — unresolved P0/P1/P2 findings remain after 3 fix rounds, OR no reconciled result was possible
|
|
309
|
+
- `needs-user-decision` — reviewer disagreement on a finding, or findings still unresolved after 3 rounds that require human judgment
|
|
310
|
+
|
|
311
|
+
When compensating passes ran for any channel, the maximum achievable verdict is `degraded-pass` — never `pass`, even if all findings are resolved. When both external channels were compensated, the review summary must note: "All findings are single-model (Claude only)."
|
|
295
312
|
|
|
296
313
|
### Step 9: Report Results
|
|
297
314
|
|
|
@@ -304,9 +321,9 @@ Output a concise summary in this format:
|
|
|
304
321
|
[scope label]
|
|
305
322
|
|
|
306
323
|
### Channels Executed
|
|
307
|
-
- Codex CLI — [completed /
|
|
308
|
-
- Gemini CLI — [completed /
|
|
309
|
-
- Superpowers code-reviewer — [completed /
|
|
324
|
+
- Codex CLI — root cause: [completed / not installed / auth failed / auth timeout / failed], coverage: [full / compensating (Codex-equivalent)]
|
|
325
|
+
- Gemini CLI — root cause: [completed / not installed / auth failed / auth timeout / failed], coverage: [full / compensating (Gemini-equivalent)]
|
|
326
|
+
- Superpowers code-reviewer — [completed / failed]
|
|
310
327
|
|
|
311
328
|
### Findings
|
|
312
329
|
[consensus findings first, then single-source findings]
|
|
@@ -317,3 +334,12 @@ Output a concise summary in this format:
|
|
|
317
334
|
|
|
318
335
|
If the verdict is `pass` or `degraded-pass`, explicitly say the code is ready
|
|
319
336
|
for the next delivery step (commit, push, or PR creation).
|
|
337
|
+
|
|
338
|
+
## Process Rules
|
|
339
|
+
|
|
340
|
+
1. **Foreground only** — Always run Codex and Gemini CLI commands as foreground Bash calls. Never use `run_in_background`, `&`, or `nohup`.
|
|
341
|
+
2. **All 3 channels are mandatory** — skip only when a tool is genuinely not installed, never by choice.
|
|
342
|
+
3. **Auth failures are not silent** — always surface to the user with recovery instructions.
|
|
343
|
+
4. **Independence** — never share one channel's output with another.
|
|
344
|
+
5. **Fix before proceeding** — P0/P1/P2 findings must be resolved before moving to the next task.
|
|
345
|
+
6. **Dispatch pattern** follows `multi-model-review-dispatch` knowledge entry. When modifying channel dispatch in this file, verify consistency with `review-pr.md` and `post-implementation-review.md`.
|
|
@@ -66,18 +66,28 @@ Read these files for review context (skip any that don't exist):
|
|
|
66
66
|
|
|
67
67
|
Run all three channels. Track which ones complete successfully.
|
|
68
68
|
|
|
69
|
+
**Foreground only:** Always run Codex and Gemini CLI commands as foreground Bash calls. Never use `run_in_background`, `&`, or `nohup`. Background execution produces empty output.
|
|
70
|
+
|
|
69
71
|
#### Channel 1: Codex CLI
|
|
70
72
|
|
|
73
|
+
**Installation check:**
|
|
74
|
+
```bash
|
|
75
|
+
command -v codex >/dev/null 2>&1
|
|
76
|
+
```
|
|
77
|
+
- If `codex` is not installed: queue a compensating Claude self-review pass focused on implementation correctness, security, and API contracts. Record root-cause `not_installed`. Skip to next channel.
|
|
78
|
+
|
|
71
79
|
**Auth check first** (auth tokens expire — always re-verify):
|
|
72
80
|
|
|
73
81
|
```bash
|
|
74
82
|
codex login status 2>/dev/null && echo "codex authenticated" || echo "codex NOT authenticated"
|
|
75
83
|
```
|
|
76
84
|
|
|
77
|
-
If Codex is not installed, skip this channel and note it in the summary.
|
|
78
85
|
If auth fails, tell the user: "Codex auth expired. Run: `! codex login`" — do NOT
|
|
79
86
|
silently fall back. After the user re-authenticates, retry.
|
|
80
87
|
|
|
88
|
+
If auth cannot be recovered, queue a compensating pass (same focus as above). Record root-cause `auth_failed`.
|
|
89
|
+
If auth check times out (~5s), retry once. If still failing, record root-cause `auth_timeout` and queue compensating pass.
|
|
90
|
+
|
|
81
91
|
**Run the review:**
|
|
82
92
|
|
|
83
93
|
```bash
|
|
@@ -90,8 +100,16 @@ The review prompt must include:
|
|
|
90
100
|
- Review standards from docs/review-standards.md (if exists)
|
|
91
101
|
- Instruction to report P0/P1/P2 findings as JSON with severity, location (file:line), description, and suggestion
|
|
92
102
|
|
|
103
|
+
If the CLI exits with a non-zero code, produces malformed/unparseable output, or is killed by the tool runner timeout, record root-cause `failed` and queue a compensating pass for that channel.
|
|
104
|
+
|
|
93
105
|
#### Channel 2: Gemini CLI
|
|
94
106
|
|
|
107
|
+
**Installation check:**
|
|
108
|
+
```bash
|
|
109
|
+
command -v gemini >/dev/null 2>&1
|
|
110
|
+
```
|
|
111
|
+
- If `gemini` is not installed: queue a compensating Claude self-review pass focused on architectural patterns, design reasoning, and broad context. Record root-cause `not_installed`. Label findings as `[compensating: Gemini-equivalent]`. Skip to next channel.
|
|
112
|
+
|
|
95
113
|
**Auth check first:**
|
|
96
114
|
|
|
97
115
|
```bash
|
|
@@ -104,9 +122,11 @@ elif [ "$GEMINI_EXIT" -eq 41 ]; then
|
|
|
104
122
|
fi
|
|
105
123
|
```
|
|
106
124
|
|
|
107
|
-
If Gemini is not installed, skip this channel and note it in the summary.
|
|
108
125
|
If auth fails (exit 41), tell the user: "Gemini auth expired. Run: `! gemini -p \"hello\"`" — do NOT silently fall back. After the user re-authenticates, retry.
|
|
109
126
|
|
|
127
|
+
If auth cannot be recovered, queue a compensating pass focused on architectural patterns, design reasoning, and broad context. Record root-cause `auth_failed`. Label findings as `[compensating: Gemini-equivalent]`.
|
|
128
|
+
If auth check times out (~5s), retry once. If still failing, record root-cause `auth_timeout` and queue compensating pass labeled `[compensating: Gemini-equivalent]`.
|
|
129
|
+
|
|
110
130
|
**Run the review:**
|
|
111
131
|
|
|
112
132
|
```bash
|
|
@@ -116,6 +136,8 @@ NO_BROWSER=true gemini -p "REVIEW_PROMPT" --output-format json --approval-mode y
|
|
|
116
136
|
Same review prompt content as Codex. Do NOT share one model's output with the other —
|
|
117
137
|
each reviews independently.
|
|
118
138
|
|
|
139
|
+
If the CLI exits with a non-zero code, produces malformed/unparseable output, or is killed by the tool runner timeout, record root-cause `failed` and queue a compensating pass for that channel.
|
|
140
|
+
|
|
119
141
|
#### Channel 3: Superpowers Code-Reviewer Subagent
|
|
120
142
|
|
|
121
143
|
Dispatch the `superpowers:code-reviewer` subagent. This channel always runs (it uses
|
|
@@ -134,6 +156,8 @@ providing:
|
|
|
134
156
|
- `HEAD_SHA` — head commit
|
|
135
157
|
- `DESCRIPTION` — PR summary
|
|
136
158
|
|
|
159
|
+
**After all channels:** Run any queued compensating passes as foreground Claude self-review passes. Each uses the same review prompt as the missing channel, focused on that channel's strength area. Label findings as `[compensating: Codex-equivalent]` or `[compensating: Gemini-equivalent]`.
|
|
160
|
+
|
|
137
161
|
### Step 4: Reconcile Findings
|
|
138
162
|
|
|
139
163
|
After all channels complete, reconcile findings:
|
|
@@ -145,6 +169,7 @@ After all channels complete, reconcile findings:
|
|
|
145
169
|
| One channel flags P0, others approve | **High** | Fix it — P0 is critical from any source |
|
|
146
170
|
| One channel flags P1, others approve | **Medium** | Fix it — P1 findings are mandatory regardless of source count |
|
|
147
171
|
| Channels contradict each other | **Low** | Present to user for adjudication |
|
|
172
|
+
| Compensating-pass P0/P1/P2 finding | **Single-source** | Fix per normal thresholds, label as compensating |
|
|
148
173
|
|
|
149
174
|
### Step 5: Report Results
|
|
150
175
|
|
|
@@ -154,9 +179,9 @@ Output a review summary in this format:
|
|
|
154
179
|
## Code Review Summary — PR #[number]
|
|
155
180
|
|
|
156
181
|
### Channels Executed
|
|
157
|
-
- [ ] Codex CLI — [completed /
|
|
158
|
-
- [ ] Gemini CLI — [completed /
|
|
159
|
-
- [ ] Superpowers code-reviewer — [completed /
|
|
182
|
+
- [ ] Codex CLI — root cause: [completed / not installed / auth failed / auth timeout / failed], coverage: [full / compensating (Codex-equivalent)]
|
|
183
|
+
- [ ] Gemini CLI — root cause: [completed / not installed / auth failed / auth timeout / failed], coverage: [full / compensating (Gemini-equivalent)]
|
|
184
|
+
- [ ] Superpowers code-reviewer — [completed / failed]
|
|
160
185
|
|
|
161
186
|
### Consensus Findings (High Confidence)
|
|
162
187
|
[Findings flagged by 2+ channels]
|
|
@@ -168,9 +193,22 @@ Output a review summary in this format:
|
|
|
168
193
|
[Contradictions between channels]
|
|
169
194
|
|
|
170
195
|
### Verdict
|
|
171
|
-
[
|
|
196
|
+
[pass / degraded-pass / blocked / needs-user-decision]
|
|
172
197
|
```
|
|
173
198
|
|
|
199
|
+
### Step 5a: Final Verdict
|
|
200
|
+
|
|
201
|
+
Return exactly one verdict:
|
|
202
|
+
|
|
203
|
+
- `pass` — all channels ran, no unresolved P0/P1/P2
|
|
204
|
+
- `degraded-pass` — channels skipped/compensated, no unresolved P0/P1/P2
|
|
205
|
+
- `blocked` — unresolved P0/P1/P2 after 3 fix rounds
|
|
206
|
+
- `needs-user-decision` — contradictions or unresolvable findings
|
|
207
|
+
|
|
208
|
+
Verdict precedence: `needs-user-decision` > `blocked` > `degraded-pass` > `pass`.
|
|
209
|
+
|
|
210
|
+
When compensating passes ran, maximum achievable verdict is `degraded-pass`. When both external channels were compensated, note "All findings are single-model."
|
|
211
|
+
|
|
174
212
|
### Step 6: Fix P0/P1/P2 Findings
|
|
175
213
|
|
|
176
214
|
If any P0, P1, or P2 findings exist:
|
|
@@ -179,12 +217,14 @@ If any P0, P1, or P2 findings exist:
|
|
|
179
217
|
3. Re-run the channels that produced findings to verify fixes
|
|
180
218
|
4. After 3 fix rounds with unresolved P0/P1/P2 findings, stop and ask the user for direction — do NOT merge automatically. Document remaining findings and let the user decide whether to continue fixing, create follow-up issues, or override.
|
|
181
219
|
|
|
220
|
+
**Fix cycle channel rule:** Re-run only channels that originally completed or ran as compensating passes. Never retry a channel marked `not installed`, `auth failed`, or `auth timeout` during fix rounds — its availability does not change within a session.
|
|
221
|
+
|
|
182
222
|
### Step 7: Confirm Completion
|
|
183
223
|
|
|
184
224
|
After all findings are resolved (or 3 rounds complete), output:
|
|
185
225
|
|
|
186
226
|
```
|
|
187
|
-
Code review complete.
|
|
227
|
+
Code review complete. Verdict: [pass/degraded-pass]. Channels: [N] executed, [N] compensating. PR #[number] is ready for merge.
|
|
188
228
|
```
|
|
189
229
|
|
|
190
230
|
Do NOT proceed to the next task or merge until this confirmation is output.
|
|
@@ -193,19 +233,24 @@ Do NOT proceed to the next task or merge until this confirmation is output.
|
|
|
193
233
|
|
|
194
234
|
| Situation | Action |
|
|
195
235
|
|-----------|--------|
|
|
196
|
-
|
|
|
197
|
-
|
|
|
198
|
-
|
|
|
199
|
-
|
|
|
200
|
-
|
|
|
236
|
+
| Channel not installed | Queue compensating pass, report root-cause `not_installed` |
|
|
237
|
+
| Auth expired, user recovers | Retry dispatch |
|
|
238
|
+
| Auth expired, user declines | Queue compensating pass, report root-cause `auth_failed` |
|
|
239
|
+
| Auth check timeout (after retry) | Queue compensating pass, report root-cause `auth_timeout` |
|
|
240
|
+
| Channel fails during execution | Queue compensating pass, report root-cause `failed` |
|
|
241
|
+
| Both external channels unavailable | Two compensating passes, max verdict: `degraded-pass`, note "All findings single-model" |
|
|
242
|
+
| Superpowers unavailable | Run available CLIs, warn user (Superpowers is always-available Claude — no compensating pass) |
|
|
201
243
|
|
|
202
244
|
## Process Rules
|
|
203
245
|
|
|
204
|
-
1. **
|
|
205
|
-
2. **
|
|
206
|
-
3. **
|
|
207
|
-
4. **
|
|
208
|
-
5. **
|
|
246
|
+
1. **Foreground only** — Always run Codex and Gemini CLI commands as foreground Bash calls. Never use `run_in_background`, `&`, or `nohup`.
|
|
247
|
+
2. **All three channels are mandatory** — skip only when a tool is genuinely not installed, never by choice.
|
|
248
|
+
3. **Auth failures are not silent** — always surface to the user with the exact recovery command.
|
|
249
|
+
4. **Independence** — never share one channel's output with another. Each reviews the diff independently.
|
|
250
|
+
5. **Fix before proceeding** — P0/P1/P2 findings must be resolved before moving to the next task.
|
|
251
|
+
6. **3-round limit** — never attempt more than 3 fix rounds. Surface unresolved findings to the user.
|
|
252
|
+
7. **Document everything** — the review summary must show which channels ran and which were skipped, with reasons.
|
|
253
|
+
8. **Dispatch pattern** follows `multi-model-review-dispatch` knowledge entry. When modifying channel dispatch in this file, verify consistency with `review-code.md` and `post-implementation-review.md`.
|
|
209
254
|
|
|
210
255
|
---
|
|
211
256
|
|
|
@@ -214,16 +259,13 @@ Do NOT proceed to the next task or merge until this confirmation is output.
|
|
|
214
259
|
When code review is complete, tell the user:
|
|
215
260
|
|
|
216
261
|
---
|
|
217
|
-
**Code review complete** — All channels executed for PR #[number].
|
|
262
|
+
**Code review complete** — Verdict: [pass/degraded-pass]. All channels executed for PR #[number].
|
|
218
263
|
|
|
219
264
|
**Results:**
|
|
220
|
-
- Channels run: [list which of the 3 ran]
|
|
265
|
+
- Channels run: [list which of the 3 ran, noting any compensating]
|
|
221
266
|
- Findings fixed: [count]
|
|
222
267
|
- Remaining: [none / list]
|
|
223
268
|
|
|
224
|
-
**Next:** Return to the task execution loop
|
|
225
|
-
the next unblocked task with `/scaffold:single-agent-start`.
|
|
226
|
-
|
|
227
|
-
**Pipeline reference:** `/scaffold:prompt-pipeline`
|
|
269
|
+
**Next:** Return to the task execution loop.
|
|
228
270
|
|
|
229
271
|
---
|
package/package.json
CHANGED
|
@@ -64,7 +64,7 @@ fi
|
|
|
64
64
|
|
|
65
65
|
The `!` prefix runs the command in the user's terminal session, allowing interactive auth flows (browser OAuth, Y/n prompts) that can't work in headless mode.
|
|
66
66
|
|
|
67
|
-
**If neither CLI is available or authenticated**:
|
|
67
|
+
**If neither CLI is available or authenticated**: Queue a compensating Claude pass focused on the failed channel's strength area. Document this as "single-model review (no external CLIs available)."
|
|
68
68
|
|
|
69
69
|
## Correct Invocation Patterns
|
|
70
70
|
|
|
@@ -134,6 +134,12 @@ NO_BROWSER=true gemini -p "REVIEW_PROMPT_HERE" --output-format json -s --approva
|
|
|
134
134
|
|
|
135
135
|
**Output**: JSON on stdout with `{ response, stats, error }` structure.
|
|
136
136
|
|
|
137
|
+
## Foreground-Only Execution
|
|
138
|
+
|
|
139
|
+
Always run Codex and Gemini CLI commands as foreground Bash calls. Never use `run_in_background`, `&`, or `nohup`. Background execution produces empty or truncated output from both CLIs. Multiple foreground calls in a single message are fine — the tool runner supports parallel invocations.
|
|
140
|
+
|
|
141
|
+
This means: when dispatching reviews, make each CLI call a separate foreground Bash tool invocation. Do NOT use shell `&` or background subshells.
|
|
142
|
+
|
|
137
143
|
## Context Bundling
|
|
138
144
|
|
|
139
145
|
When dispatching a review, bundle all relevant context into the prompt. Each CLI gets the same bundle — do NOT share one model's review with the other.
|
|
@@ -208,35 +214,27 @@ You are reviewing a pull request diff. Report P0, P1, and P2 issues.
|
|
|
208
214
|
| PR diff | Full diff | If >2000 lines, split into file groups |
|
|
209
215
|
| Implementation plan | Task list + representative tasks | Include full task list, detail for flagged tasks |
|
|
210
216
|
|
|
211
|
-
##
|
|
212
|
-
|
|
213
|
-
When both CLIs produce results, reconcile findings using these rules:
|
|
217
|
+
## Finding Reconciliation
|
|
214
218
|
|
|
215
|
-
|
|
216
|
-
|----------|-----------|--------|
|
|
217
|
-
| Both flag same issue | **High** | Fix immediately — two independent models agree |
|
|
218
|
-
| Both approve (no findings) | **High** | Proceed confidently |
|
|
219
|
-
| One flags P0, other approves | **High** | Fix it — P0 is critical enough from a single source |
|
|
220
|
-
| One flags P1, other approves | **Medium** | Review the finding carefully before fixing. If the finding is specific and actionable, fix it. If vague, skip. |
|
|
221
|
-
| Models contradict each other | **Low** | Present both findings to the user for adjudication |
|
|
219
|
+
When multiple models produce findings, reconcile them using the rules defined in `multi-model-review-dispatch`. Key principles:
|
|
222
220
|
|
|
223
|
-
**Independence rule**: Never share one model's review output with the other. Each model must review the artifact independently to avoid confirmation bias.
|
|
221
|
+
- **Independence rule**: Never share one model's review output with the other. Each model must review the artifact independently to avoid confirmation bias.
|
|
222
|
+
- **Round tracking**: For iterative reviews (like PR review loops), track the round number. After 3 fix rounds with unresolved findings, stop and surface the verdict (`blocked` or `needs-user-decision`) to the user. Do NOT auto-merge.
|
|
224
223
|
|
|
225
|
-
|
|
224
|
+
For the full consensus rules, confidence scoring, and disagreement resolution process, see `multi-model-review-dispatch`.
|
|
226
225
|
|
|
227
226
|
## Fallback Behavior
|
|
228
227
|
|
|
229
228
|
| Situation | Fallback |
|
|
230
229
|
|-----------|----------|
|
|
231
|
-
| Neither CLI available |
|
|
232
|
-
| Codex only | Single-model review with Codex |
|
|
233
|
-
| Gemini only | Single-model review with Gemini |
|
|
234
|
-
| **CLI auth expired** | **Surface to user with recovery command — do NOT silently fall back** |
|
|
235
|
-
| One CLI fails mid-review
|
|
236
|
-
| Both CLIs fail
|
|
237
|
-
|
|
238
|
-
|
|
239
|
-
**Auth failures are NOT silent fallbacks.** The difference between "CLI not installed" (fall back quietly) and "CLI auth expired" (user action required) is critical. Auth can be fixed in 30 seconds with an interactive command — silently skipping wastes the user's review infrastructure.
|
|
230
|
+
| Neither CLI available | Queue two compensating Claude passes (one per missing channel's strength area). Label findings. Max verdict: `degraded-pass`. |
|
|
231
|
+
| Codex only | Single-model review with Codex + compensating Claude pass for Gemini |
|
|
232
|
+
| Gemini only | Single-model review with Gemini + compensating Claude pass for Codex |
|
|
233
|
+
| **CLI auth expired** | **Surface to user with `!` recovery command — do NOT silently fall back** |
|
|
234
|
+
| One CLI fails mid-review | Use partial results if available, else queue compensating pass. Note failure in summary. |
|
|
235
|
+
| Both CLIs fail | Two compensating passes, max verdict: `degraded-pass`. Warn user. |
|
|
236
|
+
|
|
237
|
+
Auth failures are NOT silent fallbacks.
|
|
240
238
|
|
|
241
239
|
## Integration with Review Steps
|
|
242
240
|
|