code-review-forge 2.0.0a1__py3-none-any.whl
- code_forge/__init__.py +14 -0
- code_forge/__main__.py +8 -0
- code_forge/autofix.py +78 -0
- code_forge/baseline.py +216 -0
- code_forge/cli.py +983 -0
- code_forge/delta.py +65 -0
- code_forge/diagnose.py +109 -0
- code_forge/diff.py +82 -0
- code_forge/disposition.py +32 -0
- code_forge/e2e_check.py +641 -0
- code_forge/env_resolver.py +91 -0
- code_forge/errors.py +34 -0
- code_forge/exit_codes.py +37 -0
- code_forge/factories.py +191 -0
- code_forge/falsify.py +85 -0
- code_forge/gate_check.py +466 -0
- code_forge/git.py +351 -0
- code_forge/hold.py +126 -0
- code_forge/install_hooks.py +331 -0
- code_forge/lock.py +162 -0
- code_forge/machine.py +792 -0
- code_forge/mode_resolver.py +60 -0
- code_forge/mutation.py +380 -0
- code_forge/parsers/__init__.py +56 -0
- code_forge/parsers/_sarif.py +77 -0
- code_forge/parsers/base.py +65 -0
- code_forge/parsers/checkpatch.py +66 -0
- code_forge/parsers/clippy.py +85 -0
- code_forge/parsers/non_ascii.py +47 -0
- code_forge/parsers/ruff.py +18 -0
- code_forge/parsers/semgrep.py +18 -0
- code_forge/parsers/shellcheck.py +56 -0
- code_forge/registry.py +153 -0
- code_forge/reporter.py +133 -0
- code_forge/runner.py +205 -0
- code_forge/sarif.py +226 -0
- code_forge/skills/adversarial-qe/SKILL.md +272 -0
- code_forge/skills/code-forge/SKILL.md +1193 -0
- code_forge/skills/code-review-expert/SKILL.md +162 -0
- code_forge/skills/code-review-expert/references/code-quality-checklist.md +130 -0
- code_forge/skills/code-review-expert/references/removal-plan.md +52 -0
- code_forge/skills/code-review-expert/references/security-checklist.md +118 -0
- code_forge/skills/code-review-expert/references/solid-checklist.md +65 -0
- code_forge/skills/kernel-fp-verify/SKILL.md +101 -0
- code_forge/skills/qodo-review/SKILL.md +135 -0
- code_forge/skills/smoke-test/SKILL.md +253 -0
- code_forge/skills/smoke-test/references/boundary-cases.md +114 -0
- code_forge/skills/smoke-test/references/concurrency-patterns.md +306 -0
- code_forge/skills/smoke-test/references/injection-payloads.md +124 -0
- code_forge/skills/smoke-test/test-library/shell/README.md +271 -0
- code_forge/skills/smoke-test/test-library/shell/primitives.sh +352 -0
- code_forge/skills/smoke-test/test-library/shell/primitives_test.sh +324 -0
- code_forge/snapshot.py +196 -0
- code_forge/source.py +64 -0
- code_forge/state.py +246 -0
- code_forge/verdict.py +43 -0
- code_review_forge-2.0.0a1.dist-info/METADATA +237 -0
- code_review_forge-2.0.0a1.dist-info/RECORD +62 -0
- code_review_forge-2.0.0a1.dist-info/WHEEL +5 -0
- code_review_forge-2.0.0a1.dist-info/entry_points.txt +2 -0
- code_review_forge-2.0.0a1.dist-info/licenses/LICENSE +179 -0
- code_review_forge-2.0.0a1.dist-info/top_level.txt +1 -0
|
@@ -0,0 +1,1193 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: code-forge
|
|
3
|
+
description: "5-step code review pipeline with cycle-counter state machine, hook enforcement, and anti-hallucination gates. Minimum 9 static review passes before commit. Use when reviewing code changes before commit, or when user says /code-forge, 'review', 'three-cycle review', or 'run the full review pipeline'."
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Forge -- Code Review Pipeline
|
|
7
|
+
|
|
8
|
+
5-step pipeline that forges code through repeated review cycles until zero defects remain.
|
|
9
|
+
|
|
10
|
+
## When to Use
|
|
11
|
+
|
|
12
|
+
- **Before any commit** of code changes (mandatory per CLAUDE.md)
|
|
13
|
+
- When user invokes `/forge` or asks for "full review", "three-cycle review"
|
|
14
|
+
- After fixing bugs, adding features, or refactoring -- before the commit step
|
|
15
|
+
|
|
16
|
+
## When NOT to Use
|
|
17
|
+
|
|
18
|
+
- Documentation-only commits (`# docs`)
|
|
19
|
+
- Configuration-only commits (`# config`)
|
|
20
|
+
- Tooling/dependency commits (`# chore`)
|
|
21
|
+
- Work-in-progress snapshots (`# wip`)
|
|
22
|
+
|
|
23
|
+
For `# docs`, `# config`, `# chore`, and `# wip` commits, Steps 5-7 (R1/R2/R3
|
|
24
|
+
dynamic gates) are also skipped, not just the static review pipeline. These
|
|
25
|
+
commit types are exempt from test-gate, mutation-check, and e2e-check because
|
|
26
|
+
they carry no runnable logic change.
|
|
27
|
+
|
|
28
|
+
## Arguments
|
|
29
|
+
|
|
30
|
+
- No argument: review uncommitted changes (staged + unstaged)
|
|
31
|
+
- `committed`: review current branch vs merge-base
|
|
32
|
+
- `step N`: resume from a specific step (e.g., `step 4` to run smoke test only)
|
|
33
|
+
- `--skip-0`: skip Step 0 pre-checks (use only when re-entering after a fix that did not change syntax/lint)
|
|
34
|
+
|
|
35
|
+
## Prerequisites
|
|
36
|
+
|
|
37
|
+
- Code changes exist (staged or unstaged diff, or committed branch diff)
|
|
38
|
+
- Working inside a git worktree (not main tree -- enforced by check_worktree.sh hook)
|
|
39
|
+
|
|
40
|
+
---
|
|
41
|
+
|
|
42
|
+
# Pipeline Overview
|
|
43
|
+
|
|
44
|
+
```
Code Change
    |
    v
[Step 0] Syntax (0a) + Lint (0b) + Non-ASCII (0c)
    |
    v
[Steps 1-3] Three-cycle static review (cycle_counter state machine)
    |   Each cycle = Pass 1 + Pass 2 + Pass 3
    |   P0/P1 -> fix -> counter = 0 -> restart all
    |   P2 -> fix -> restart current cycle
    |   P3 -> accumulate (density check -> P2 escalation)
    |   Clean -> auto-continue (no user prompt)
    |   3 consecutive clean cycles -> proceed
    v
[Step 3.5] False-positive verification (if findings were fixed)
    |
    v
[Step 4] Smoke test (runtime verification)
    |
    v
[Step 5] R1 Test Gate (tests exist + pass for changed source)
    |
    v
[Step 6] R2 Mutation Check (tests kill mutants, not just pass)
    |
    v
[Step 7] R3 E2E Coverage (cross-component signature change has e2e artifact)
    |
    v
[COMMIT GATE] git commit # post-review-c3
    Requires: 3 clean cycles + R1 PASS + R2 PASS + R3 PASS/SKIP
```
|
|
77
|
+
|
|
78
|
+
---
|
|
79
|
+
|
|
80
|
+
# Step 0: Pre-Review Gate
|
|
81
|
+
|
|
82
|
+
All three sub-checks must pass. Only NEW warnings count -- pre-existing issues in untouched code are out of scope.
|
|
83
|
+
|
|
84
|
+
## 0a. Syntax Check
|
|
85
|
+
|
|
86
|
+
Run the appropriate tool for each language in the diff:
|
|
87
|
+
|
|
88
|
+
| Language | Command |
|
|
89
|
+
|----------|---------|
|
|
90
|
+
| Shell | `bash -n <file>` + `shellcheck <file>` |
|
|
91
|
+
| Python | `python3 -m py_compile <file>` |
|
|
92
|
+
| Go | `go vet ./...` |
|
|
93
|
+
| C (kernel) | `make` |
|
|
94
|
+
| Rust | `cargo check` |
|
|
95
|
+
|
|
96
|
+
## 0b. Format/Lint Check
|
|
97
|
+
|
|
98
|
+
| Language | Command |
|
|
99
|
+
|----------|---------|
|
|
100
|
+
| Shell | `shellcheck <file>`, verify line length <= 80 |
|
|
101
|
+
| Python | `pylint --enable=W,C <file>` or `ruff check <file>` |
|
|
102
|
+
| Go | `golangci-lint run` |
|
|
103
|
+
| C (kernel) | `scripts/checkpatch.pl --strict` |
|
|
104
|
+
| Rust | `cargo clippy` |
|
|
105
|
+
| All | `semgrep` (security lint, all languages) |
|
|
106
|
+
|
|
107
|
+
Project-specific overrides always win (e.g., kernel uses checkpatch.pl, not generic lint).
|
|
108
|
+
|
|
109
|
+
## Comprehensive Language Tables
|
|
110
|
+
|
|
111
|
+
Tool absence rule: if a tool is not installed, log `tool_missing: <tool>` to
|
|
112
|
+
`.forge/findings.json` and continue (WARN, not FAIL).
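
A minimal sketch of the tool absence rule, assuming tools are looked up on `PATH` by their command name. The helper names and the shape of the warning record are hypothetical; only the `tool_missing: <tool>` message and the WARN-not-FAIL behavior come from the rule above.

```python
import shutil


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def tool_missing_entries(tools):
    """Build one WARN-level record per missing tool.

    The record shape is an assumption for illustration; the caller would
    append these to the findings file and keep going instead of failing.
    """
    return [
        {"level": "WARN", "message": f"tool_missing: {tool}"}
        for tool in missing_tools(tools)
    ]
```

Calling `tool_missing_entries(["ruff", "shellcheck"])` on a machine without `shellcheck` would yield a single WARN record, and the remaining Step 0 checks would still run.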
|
|
113
|
+
|
|
114
|
+
### Programming Languages (14)
|
|
115
|
+
|
|
116
|
+
| Language | 0a Syntax | 0b Lint | Test Runner (R1) | Mutation (R2) |
|
|
117
|
+
|---|---|---|---|---|
|
|
118
|
+
| Python | `python3 -m py_compile` | `ruff check` (preferred) or `pylint` | `pytest` | `mutmut` or `cosmic-ray` |
|
|
119
|
+
| Go | `go vet ./...` | `golangci-lint run` | `go test ./...` | `gremlins` or `go-mutesting` |
|
|
120
|
+
| Rust | `cargo check` | `cargo clippy` | `cargo test` | `cargo mutants` |
|
|
121
|
+
| JavaScript | `node --check` | `eslint` | `jest` / `vitest` / `mocha` | `stryker-mutator` |
|
|
122
|
+
| TypeScript | `tsc --noEmit` | `eslint` + `@typescript-eslint` | `jest` / `vitest` | `stryker-mutator` |
|
|
123
|
+
| Java | `javac -Xlint -d /tmp` | `checkstyle` + `spotbugs` | `mvn test` / `gradle test` | `pitest` |
|
|
124
|
+
| Kotlin | `kotlinc -script` or `-Werror` | `ktlint` + `detekt` | `gradle test` | `pitest` |
|
|
125
|
+
| C | `gcc -fsyntax-only -Wall` | `cppcheck` + `clang-tidy` | `ctest` / `make test` | `mull` |
|
|
126
|
+
| C++ | `g++ -fsyntax-only -Wall` | `cppcheck` + `clang-tidy` | `ctest` / `make test` | `mull` |
|
|
127
|
+
| Kernel C | `make` (subsystem build) | `scripts/checkpatch.pl --strict` | Beaker functional | N/A |
|
|
128
|
+
| Shell | `bash -n` + `shellcheck` | `shellcheck` | `bats` / inline | LLM-inject 10 mutants |
|
|
129
|
+
| Ruby | `ruby -c` | `rubocop` | `rspec` / `minitest` | `mutant` |
|
|
130
|
+
| PHP | `php -l` | `phpstan` + `phpcs` | `phpunit` | `infection` |
|
|
131
|
+
| Swift | `swift -frontend -parse` | `swiftlint` | `swift test` | `muter` |
|
|
132
|
+
|
|
133
|
+
### Config / Markup (7)
|
|
134
|
+
|
|
135
|
+
| Format | 0a Syntax | 0b Lint | Notes |
|
|
136
|
+
|---|---|---|---|
|
|
137
|
+
| YAML | `yamllint` or `python3 -c "import yaml; yaml.safe_load(open(p))"` | `yamllint` | YNL netlink specs MUST run yamllint |
|
|
138
|
+
| JSON | `jq . > /dev/null` or `python3 -m json.tool` | `jsonlint` | |
|
|
139
|
+
| TOML | `python3 -c "import tomllib; tomllib.load(open(p,'rb'))"` | `taplo lint` | |
|
|
140
|
+
| XML | `xmllint --noout` | `xmllint --schema <xsd>` | |
|
|
141
|
+
| Markdown | N/A (always parses) | `markdownlint-cli2` | |
|
|
142
|
+
| HTML | `tidy -e -q` | `htmlhint` | |
|
|
143
|
+
| CSS | `stylelint` | `stylelint` | |
|
|
144
|
+
|
|
145
|
+
### Specialized DSL (7)
|
|
146
|
+
|
|
147
|
+
| DSL | 0a Syntax | 0b Lint | Notes |
|
|
148
|
+
|---|---|---|---|
|
|
149
|
+
| SQL | `sqlfluff parse` | `sqlfluff lint` | Dialect-specific |
|
|
150
|
+
| Dockerfile | `hadolint` (combined) | `hadolint` | |
|
|
151
|
+
| Terraform | `terraform validate` | `tflint` | Run `terraform init` first |
|
|
152
|
+
| Kubernetes YAML | `kubeconform` | `kube-linter` | Also run yamllint |
|
|
153
|
+
| Ansible | `ansible-playbook --syntax-check` | `ansible-lint` | |
|
|
154
|
+
| protobuf | `protoc --proto_path=. <file>` | `buf lint` | |
|
|
155
|
+
| GraphQL | `graphql-cli parse` | `graphql-schema-linter` | |
|
|
156
|
+
|
|
157
|
+
## 0c. Non-ASCII Check
|
|
158
|
+
|
|
159
|
+
LLMs silently emit non-ASCII characters (em dash U+2014, smart quotes U+201C/U+201D, arrow U+2192, ellipsis U+2026) that are easy to mistake for their ASCII counterparts. Reviewers (also LLMs) have the same blind spot.
|
|
160
|
+
|
|
161
|
+
```bash
|
|
162
|
+
git diff HEAD --diff-filter=AM -U0 | grep '^+' | grep -P '[^\x00-\x7F]' && echo "FAIL: non-ASCII in new code"
|
|
163
|
+
```
|
|
164
|
+
|
|
165
|
+
Any hit = fix before proceeding. This check applies to ALL output: code, comments, commit messages, emails, drafts.
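
Where `grep -P` is unavailable (it is a GNU extension), the same check can be sketched in Python. This is an illustration, not part of the shipped tooling: the function name is hypothetical, and it assumes the unified diff text has already been captured, for example from `git diff HEAD --diff-filter=AM -U0`.

```python
def non_ascii_added_lines(diff_text):
    """Return (line_text, code_points) for each added line with non-ASCII.

    Only lines starting with '+' are inspected; the '+++ ' file header
    is skipped because it is not new code.
    """
    hits = []
    for line in diff_text.splitlines():
        if not line.startswith("+") or line.startswith("+++ "):
            continue
        bad = sorted({ch for ch in line if ord(ch) > 0x7F})
        if bad:
            hits.append((line[1:], [f"U+{ord(ch):04X}" for ch in bad]))
    return hits
```

Reporting the code points makes the failure actionable: the author can search for the exact character instead of guessing which look-alike slipped in.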
|
|
166
|
+
|
|
167
|
+
## Step 0 Gate
|
|
168
|
+
|
|
169
|
+
- **Entry**: code change exists (staged or unstaged diff)
|
|
170
|
+
- **Exit**: 0a + 0b + 0c all pass with zero new warnings
|
|
171
|
+
- **On failure**: fix the issue, re-run Step 0
|
|
172
|
+
|
|
173
|
+
## Step 0 Context Fusion (FUSE-01)
|
|
174
|
+
|
|
175
|
+
After Step 0 completes, serialize ALL Step 0 findings into a context block.
|
|
176
|
+
This block is prepended to the prompt for EVERY LLM pass (Steps 1-3).
|
|
177
|
+
|
|
178
|
+
**Why:** Prevents LLM passes from re-flagging issues that Step 0 already caught.
|
|
179
|
+
Semgrep Multimodal achieved 8x more true positives and 50% less noise with this
|
|
180
|
+
deterministic+LLM fusion pattern.
|
|
181
|
+
|
|
182
|
+
**Step 1 -- Collect Step 0 findings:**
|
|
183
|
+
After Step 0 checks (0a syntax, 0b lint, 0c non-ASCII) complete, gather any
|
|
184
|
+
issues that were found and fixed. Record each finding with: file, line, tool, issue.
|
|
185
|
+
|
|
186
|
+
**Step 2 -- Serialize as markdown table (capped at 20 rows):**
|
|
187
|
+
Format the findings as a structured context block:
|
|
188
|
+
|
|
189
|
+
```markdown
|
|
190
|
+
## Step 0 Findings (deterministic, already addressed)
|
|
191
|
+
|
|
192
|
+
The following issues were detected by Step 0 deterministic checks.
|
|
193
|
+
They have been fixed by the author. Do NOT re-flag these specific issues.
|
|
194
|
+
If you find NEW instances of the same pattern elsewhere, report them.
|
|
195
|
+
|
|
196
|
+
| # | File | Line | Tool | Issue |
|
|
197
|
+
|---|------|------|------|-------|
|
|
198
|
+
| 1 | path/to/file.py | 42 | pylint W0707 | raise-missing-from |
|
|
199
|
+
| 2 | path/to/file.sh | 15 | shellcheck SC2086 | unquoted variable |
|
|
200
|
+
```
|
|
201
|
+
|
|
202
|
+
**Size cap:** If Step 0 found more than 20 issues, show only the first 20 rows
|
|
203
|
+
and add this note after the table:
|
|
204
|
+
|
|
205
|
+
```
|
|
206
|
+
[forge] Step 0 found N issues total. Showing first 20. Full list in .forge/step0_findings.txt.
|
|
207
|
+
```
|
|
208
|
+
|
|
209
|
+
Write the complete list to `.forge/step0_findings.txt` for reference.
|
|
210
|
+
|
|
211
|
+
If Step 0 found zero issues, use this shorter block:
|
|
212
|
+
|
|
213
|
+
```markdown
|
|
214
|
+
## Step 0 Findings (deterministic)
|
|
215
|
+
|
|
216
|
+
Step 0 checks (syntax, lint, non-ASCII) found zero issues. No prior context.
|
|
217
|
+
```
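
The serialization rules above (table of findings, 20-row cap, shorter block for zero findings) can be sketched as follows. The function name and the in-memory shape of a finding (a dict with `file`, `line`, `tool`, `issue`) are assumptions for illustration; the emitted text mirrors the templates above.

```python
MAX_ROWS = 20  # size cap from the Step 0 Context Fusion rules


def step0_context_block(findings):
    """Render Step 0 findings as the markdown context block."""
    if not findings:
        return (
            "## Step 0 Findings (deterministic)\n\n"
            "Step 0 checks (syntax, lint, non-ASCII) found zero issues. "
            "No prior context.\n"
        )
    lines = [
        "## Step 0 Findings (deterministic, already addressed)",
        "",
        "The following issues were detected by Step 0 deterministic checks.",
        "They have been fixed by the author. Do NOT re-flag these specific issues.",
        "If you find NEW instances of the same pattern elsewhere, report them.",
        "",
        "| # | File | Line | Tool | Issue |",
        "|---|------|------|------|-------|",
    ]
    for i, f in enumerate(findings[:MAX_ROWS], start=1):
        lines.append(
            f"| {i} | {f['file']} | {f['line']} | {f['tool']} | {f['issue']} |"
        )
    if len(findings) > MAX_ROWS:
        lines.append("")
        lines.append(
            f"[forge] Step 0 found {len(findings)} issues total. "
            f"Showing first {MAX_ROWS}. Full list in .forge/step0_findings.txt."
        )
    return "\n".join(lines) + "\n"
```

The sketch only builds the string; writing the full list to `.forge/step0_findings.txt` would be a separate step.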
|
|
218
|
+
|
|
219
|
+
**Step 3 -- Inject into each LLM pass:**
|
|
220
|
+
Before invoking each pass (/qodo-review, /code-review-expert, /adversarial-qe),
|
|
221
|
+
prepend the Step 0 context block to the review prompt. The context block goes
|
|
222
|
+
BEFORE the diff content, so the LLM sees it first.
|
|
223
|
+
|
|
224
|
+
**Rules for LLM passes when receiving Step 0 context:**
|
|
225
|
+
1. Do NOT re-flag the exact same issue at the exact same file:line that Step 0 caught
|
|
226
|
+
2. DO flag NEW instances of the same pattern in OTHER locations
|
|
227
|
+
3. DO flag related-but-different issues at the same location (e.g., Step 0 caught
|
|
228
|
+
a missing import, but Pass 2 notices the function using that import has a logic error)
|
|
229
|
+
4. When in doubt, report the finding but note "Step 0 caught a related issue at this location"
|
|
230
|
+
|
|
231
|
+
---
|
|
232
|
+
|
|
233
|
+
# Steps 1-3: Three-Cycle Static Review
|
|
234
|
+
|
|
235
|
+
## State Machine
|
|
236
|
+
|
|
237
|
+
```
State: cycle_counter = 0   (target = 3)
       p3_by_rule = {}     # {rule_type: [file_paths]}
       changed_lines = N   # from git diff --stat

loop:
  run Cycle (Pass 1 -> Pass 2 -> Pass 3)

  After EACH pass:
    normalize findings to P0/P1/P2/P3 (see Severity Normalization)
    validate finding data before storing (see Finding Persistence)
    persist ALL findings to .forge/findings.json (see Finding Persistence)

    if zero findings:
      [AUTO-CONTINUE] immediately proceed to next pass (TRUST-06)
      report: "[forge] Cycle N/3, Pass P/3: skill-name -- CLEAN"
      do NOT wait for user input

    else if any P0 or P1 finding:
      [FULL RESET] fix all findings, cycle_counter = 0 (TRUST-07)
      report: "[forge] P0/P1 found -- full reset. cycle_counter = 0"
      goto loop

    else if any P2 finding (no P0/P1):
      [CYCLE RESTART] fix P2 findings, restart current cycle (TRUST-07)
      report: "[forge] P2 found -- restarting current cycle"
      do NOT reset cycle_counter to 0
      restart current cycle from Pass 1

    else if only P3 findings:
      [ACCUMULATE with density-based escalation] (TRUST-07 + P3-THRESHOLD-RESEARCH)

      Step A -- Deduplicate: group new P3s by rule type
        for each P3: p3_by_rule[rule_type].append(file_path)

      Step B -- Compute metrics:
        distinct_per_file = max(len(set(rules_in_file)) for each file)
        distinct_per_diff = len(p3_by_rule.keys())
        density = total_p3_count / changed_lines

      Step C -- Check thresholds (any one triggers escalation):
        if distinct_per_file > 5:
          report: "[forge] P3 density: >5 distinct violations in {file} -- P2 escalation"
          restart current cycle (P2-equivalent)
        else if distinct_per_diff > 10:
          report: "[forge] P3 density: >10 distinct violations across diff -- P2 escalation"
          restart current cycle (P2-equivalent)
        else if density > 0.15:
          report: "[forge] P3 density: {density:.2f}/line (>0.15) -- P2 escalation"
          restart current cycle (P2-equivalent)
        else:
          report: "[forge] P3: {N} findings ({distinct_per_diff} distinct rules), density {density:.2f}/line -- below threshold, continuing"
          proceed to next pass without fixing

  After all 3 passes in a cycle complete:
    cycle_counter += 1
    if cycle_counter == 3:
      proceed to Step 3.5 or Step 4
    else:
      goto loop
```
|
|
298
|
+
|
|
299
|
+
**Critical change from previous behavior:** The previous state machine reset cycle_counter on ANY finding. This state machine resets the counter only on P0/P1. P2 restarts the current cycle without resetting the counter. P3 uses density-based escalation with deduplication: more than 5 distinct rules in one file, more than 10 distinct rules across the diff, or a density above 0.15 findings per changed line triggers a P2-equivalent restart. The thresholds are based on P3-THRESHOLD-RESEARCH.md (Google Tricorder, BitsAI-CR, Broken Windows theory, ESLint --max-warnings).
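
The P3 escalation thresholds can be expressed as a small pure function. This is a sketch under stated assumptions, not the packaged implementation: the function name and the `(rule_type, file_path)` input shape are hypothetical, and the zero-`changed_lines` guard is an addition the pseudocode does not specify.

```python
from collections import defaultdict


def p3_escalation(p3_findings, changed_lines):
    """Return the threshold that triggers P2 escalation, or None.

    `p3_findings` is a list of (rule_type, file_path) pairs accumulated
    so far; `changed_lines` is the size of the diff in lines.
    """
    rules_by_file = defaultdict(set)
    all_rules = set()
    for rule, path in p3_findings:
        rules_by_file[path].add(rule)
        all_rules.add(rule)

    distinct_per_file = max((len(r) for r in rules_by_file.values()), default=0)
    distinct_per_diff = len(all_rules)
    # Assumption: an empty diff has density 0 rather than raising an error.
    density = len(p3_findings) / changed_lines if changed_lines > 0 else 0.0

    if distinct_per_file > 5:
        return "per-file"
    if distinct_per_diff > 10:
        return "per-diff"
    if density > 0.15:
        return "density"
    return None
```

A non-`None` result corresponds to "restart current cycle (P2-equivalent)"; `None` corresponds to "proceed to next pass without fixing".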
|
|
300
|
+
|
|
301
|
+
## Auto-Continue Protocol (TRUST-06)
|
|
302
|
+
|
|
303
|
+
After each pass completes:
|
|
304
|
+
- If **zero findings**: immediately invoke the next pass. Do not output
|
|
305
|
+
"waiting for input" or "how would you like to proceed?" prompts.
|
|
306
|
+
Report the clean result in one line and move on:
|
|
307
|
+
`[forge] Cycle 2/3, Pass 1/3: qodo-review -- CLEAN`
|
|
308
|
+
- If **findings exist**: pause and present findings for user decision
|
|
309
|
+
(accept/reject/fix). Only proceed after user responds.
|
|
310
|
+
|
|
311
|
+
This eliminates the current UX pain of typing "continue" after every clean pass.
|
|
312
|
+
The pipeline should flow silently through clean passes and only stop when
|
|
313
|
+
human judgment is needed.
|
|
314
|
+
|
|
315
|
+
## Each Cycle = 3 Sequential Passes
|
|
316
|
+
|
|
317
|
+
### Pass 1: /qodo-review
|
|
318
|
+
|
|
319
|
+
Invoke the `/qodo-review` skill.
|
|
320
|
+
|
|
321
|
+
- Change-aware pre-review with feature-grouped walkthrough
|
|
322
|
+
- Severity: Red (must fix) / Yellow (problematic) / Green (minor)
|
|
323
|
+
- Anti-hallucination gate: mandatory re-read via Read tool + grep verification before reporting any finding
|
|
324
|
+
- Large diffs (>500 lines or >10 files): split into batches, review serially
|
|
325
|
+
- Read-only analysis only -- no code modifications
|
|
326
|
+
- Output: Changes Summary -> Files Walkthrough -> Code Suggestions
|
|
327
|
+
|
|
328
|
+
### Pass 2: /code-review-expert
|
|
329
|
+
|
|
330
|
+
Invoke the `/code-review-expert` skill.
|
|
331
|
+
|
|
332
|
+
- Senior engineer lens: SOLID, architecture, security
|
|
333
|
+
- Severity: P0 (critical) / P1 (high) / P2 (medium) / P3 (low)
|
|
334
|
+
- Covers: SOLID + architecture -> removal candidates -> security scan -> commit message -> code quality
|
|
335
|
+
- Output: Summary -> Findings by severity -> Action plan
|
|
336
|
+
- Always asks user before implementing fixes
|
|
337
|
+
|
|
338
|
+
### Pass 3: /adversarial-qe
|
|
339
|
+
|
|
340
|
+
Invoke the `/adversarial-qe` skill.
|
|
341
|
+
|
|
342
|
+
- Red-team QE: assumes bugs exist until proven otherwise
|
|
343
|
+
- 14 attack dimensions:
|
|
344
|
+
1. Correctness and logic
|
|
345
|
+
2. Edge cases and boundaries (including "successful command, empty output" pattern)
|
|
346
|
+
3. Error handling and resilience
|
|
347
|
+
4. Security (injection, auth, secrets, TOCTOU)
|
|
348
|
+
5. Concurrency (races, deadlocks, lifecycle)
|
|
349
|
+
6. API and contract (breaking changes, validation)
|
|
350
|
+
7. Bidirectional correctness (round-trip encode/decode)
|
|
351
|
+
8. Graceful degradation (missing optional dependencies)
|
|
352
|
+
9. Convention adherence (grep FULL FILE, not just diff) -- expanded with naming quality and readability
|
|
353
|
+
10. Performance and scalability
|
|
354
|
+
11. Test quality
|
|
355
|
+
12. AI-generated code smells
|
|
356
|
+
13. Documentation completeness [SHADOW] -- public API docstrings, changelog entries, README updates for user-facing changes
|
|
357
|
+
14. Change scope [SHADOW] -- single-concern diffs, flag unfocused changes mixing unrelated concerns
|
|
358
|
+
- 3-step finding verification gate: (1) Re-read code, (2) Ground truth verification, (3) Debate yourself
|
|
359
|
+
- Output: Severity-ordered table with Location / Finding / Evidence / Suggestion
|
|
360
|
+
|
|
361
|
+
## Severity Normalization
|
|
362
|
+
|
|
363
|
+
Every finding from any pass MUST be normalized to P0/P1/P2/P3 before recording. Use this mapping:
|
|
364
|
+
|
|
365
|
+
| qodo-review | code-review-expert | adversarial-qe | Normalized |
|
|
366
|
+
|-------------|-------------------|----------------|------------|
|
|
367
|
+
| Red (must fix) | P0 Critical | Critical | P0 |
|
|
368
|
+
| Red (must fix) | P1 High | High | P1 |
|
|
369
|
+
| Yellow (problematic) | P2 Medium | Medium | P2 |
|
|
370
|
+
| Green (minor) | P3 Low | Low/Nit | P3 |
|
|
371
|
+
|
|
372
|
+
When a pass reports findings without explicit severity, classify based on impact:
|
|
373
|
+
- P0: Data loss, security breach, crash in normal path
|
|
374
|
+
- P1: Logic error, wrong output, security weakness
|
|
375
|
+
- P2: Missing validation, incomplete error handling, non-trivial code smell
|
|
376
|
+
- P3: Style preference, naming nit, minor readability issue
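
The mapping table can be sketched as a lookup. One caveat: qodo-review's Red maps to both P0 and P1, so the label alone is not enough. The `critical` flag below is a hypothetical way to carry the impact judgment (data loss, security breach, crash in normal path); the function and table names are likewise illustrative.

```python
# Labels are matched case-insensitively; keys mirror the table above.
_SEVERITY_MAP = {
    "qodo-review": {"yellow": "P2", "green": "P3"},
    "code-review-expert": {"p0": "P0", "p1": "P1", "p2": "P2", "p3": "P3"},
    "adversarial-qe": {
        "critical": "P0", "high": "P1", "medium": "P2",
        "low": "P3", "nit": "P3",
    },
}


def normalize_severity(pass_name, label, critical=False):
    """Map a pass-specific severity label to P0/P1/P2/P3.

    Returns None for an unknown pass or label, so the caller can fall
    back to impact-based classification.
    """
    key = label.strip().lower()
    if pass_name == "qodo-review" and key == "red":
        # Red covers both P0 and P1; the caller supplies the impact call.
        return "P0" if critical else "P1"
    return _SEVERITY_MAP.get(pass_name, {}).get(key)
```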
|
|
377
|
+
|
|
378
|
+
## Finding Persistence (TRUST-01)
|
|
379
|
+
|
|
380
|
+
After each pass completes and findings are normalized, persist EVERY finding to `.forge/findings.json`. Zero-finding passes are recorded too: store the pass metadata in the `runs` array.
|
|
381
|
+
|
|
382
|
+
**Recording a finding:** Use a Bash tool call with Python heredoc to append to findings.json:
|
|
383
|
+
|
|
384
|
+
```bash
python3 << 'PYEOF'
import json, uuid, datetime, os, tempfile, subprocess, sys

findings_file = '.forge/findings.json'
os.makedirs('.forge', exist_ok=True)

# Load the existing store, or start a fresh one if it is missing or corrupt
try:
    with open(findings_file, 'r') as f:
        data = json.load(f)
except (FileNotFoundError, json.JSONDecodeError):
    data = {'version': 1, 'findings': [], 'runs': []}

# Get commit SHA via subprocess (NOT shell substitution -- quoted heredoc does not expand $())
try:
    commit_sha = subprocess.check_output(
        ['git', 'rev-parse', '--short', 'HEAD'],
        stderr=subprocess.DEVNULL, text=True
    ).strip()
except Exception:
    commit_sha = 'unknown'

# VALIDATION: check extracted values before storing
VALID_SEVERITIES = {'P0', 'P1', 'P2', 'P3'}
VALID_DIMENSIONS = {
    'correctness', 'security', 'performance',
    'concurrency', 'api_contract', 'bidirectional', 'graceful_degradation',
    'convention', 'test_quality', 'ai_code_smell',
    'error_handling', 'edge_cases',
    'doc_completeness', 'change_scope',
    'unknown',
}

severity = 'REPLACE_WITH_SEVERITY'
dimension = 'REPLACE_WITH_DIMENSION'
file_path = 'REPLACE_WITH_ACTUAL_FILE'

if severity not in VALID_SEVERITIES:
    print(f"[forge-warn] Invalid severity '{severity}', defaulting to P2", file=sys.stderr)
    severity = 'P2'
if dimension not in VALID_DIMENSIONS:
    print(f"[forge-warn] Invalid dimension '{dimension}', defaulting to unknown", file=sys.stderr)
    dimension = 'unknown'
if file_path != 'unknown' and not os.path.isfile(file_path):
    print(f"[forge-warn] File not found: '{file_path}', storing as-is", file=sys.stderr)

evidence_count = 1  # REPLACE_WITH_EVIDENCE_COUNT
llm_self_report = 0.8  # REPLACE_WITH_LLM_CONFIDENCE

# evidence_count has a documented minimum of 1, so reject 0 as well as negatives
if not isinstance(evidence_count, int) or evidence_count < 1:
    print("[forge-warn] Invalid evidence_count, defaulting to 1", file=sys.stderr)
    evidence_count = 1
if not isinstance(llm_self_report, (int, float)) or not (0.0 <= llm_self_report <= 1.0):
    print("[forge-warn] Invalid llm_self_report, defaulting to 0.8", file=sys.stderr)
    llm_self_report = 0.8

data['findings'].append({
    'id': str(uuid.uuid4()),
    'timestamp': datetime.datetime.now(datetime.timezone.utc).isoformat(),
    'file': file_path,
    'line': -1,
    'dimension': dimension,
    'pass': 1,
    'cycle': 1,
    'severity': severity,
    'description': 'REPLACE_WITH_FINDING_TEXT',
    'outcome': 'pending',
    'reject_reason': None,
    'commit_sha': commit_sha,
    'cost_tokens': {'input': 0, 'output': 0},
    'confidence': 0.0,
    'confidence_signals': {
        'dimension_fp_rate': 0.0,
        'pass_agreement': 1.0,
        'evidence_count': evidence_count,
        'llm_self_report': llm_self_report,
    },
    'shadow': False,  # True for shadow-mode dimensions (doc_completeness, change_scope)
})

# Atomic write: dump to a temp file in the same directory, then rename over the target
dir_name = os.path.dirname(findings_file) or '.'
fd, tmp = tempfile.mkstemp(dir=dir_name, suffix='.json')
try:
    with os.fdopen(fd, 'w') as f:
        json.dump(data, f, indent=2)
    os.replace(tmp, findings_file)
except Exception:
    # Clean up the temp file on failure, then re-raise the original error
    try:
        os.unlink(tmp)
    except OSError:
        pass
    raise
PYEOF
```
|
|
479
|
+
|
|
480
|
+
Replace the placeholder values with actual finding data from the pass output. For each finding reported by a pass, execute one append call.
|
|
481
|
+
|
|
482
|
+
**Finding schema fields (D1):**
|
|
483
|
+
- `id`: UUID v4 (unique per finding)
|
|
484
|
+
- `timestamp`: ISO-8601 UTC
|
|
485
|
+
- `file`: relative path to the file with the finding
|
|
486
|
+
- `line`: line number (-1 if unknown)
|
|
487
|
+
- `dimension`: which review dimension (must be one of the 14 known dimensions in VALID_DIMENSIONS or "unknown")
|
|
488
|
+
- `pass`: which pass number (1, 2, or 3)
|
|
489
|
+
- `cycle`: which cycle number
|
|
490
|
+
- `severity`: normalized P0/P1/P2/P3 (validated before storage)
|
|
491
|
+
- `description`: finding text from the review pass
|
|
492
|
+
- `outcome`: "pending" (initial), "accepted", or "rejected"
|
|
493
|
+
- `reject_reason`: null (initial) or one of: HALLUCINATION, CONTEXT_MISSING, INTENTIONAL, NOT_APPLICABLE, STYLE_PREFERENCE, ACCEPTABLE_RISK
|
|
494
|
+
- `commit_sha`: short git SHA at time of finding (obtained via subprocess, NOT shell substitution)
|
|
495
|
+
- `cost_tokens`: {"input": N, "output": M} -- token counts for the pass that produced this finding (set to 0 during interactive mode; CLI wrapper populates actual values)
|
|
496
|
+
- `confidence`: float 0.0-1.0, computed by CLI post-run via backfill_confidence(). Set to 0.0 at recording time (SKILL.md heredoc cannot compute it -- needs historical FP data).
|
|
497
|
+
- `confidence_signals`: dict with raw signals for the confidence formula:
|
|
498
|
+
- `dimension_fp_rate`: 0.0 (placeholder, computed by CLI from findings.json history)
|
|
499
|
+
- `pass_agreement`: 1.0 (1.0 = finding from single pass; fraction of agreeing passes when multi-pass data available)
|
|
500
|
+
- `evidence_count`: number of distinct code locations examined to support this finding
|
|
501
|
+
- `llm_self_report`: LLM's stated confidence that this finding is a true positive (0.0-1.0)
|
|
502
|
+
|
|
503
|
+
## Confidence Signal Instructions
|
|
504
|
+
|
|
505
|
+
When recording a finding, you MUST set these fields to actual values, not defaults:
|
|
506
|
+
|
|
507
|
+
- `evidence_count`: Set to the number of distinct code locations (lines, functions, or
|
|
508
|
+
files) you examined to support this finding. Count only locations you actually read
|
|
509
|
+
and cite in the finding description. Minimum 1, typical range 1-10.
|
|
510
|
+
|
|
511
|
+
- `llm_self_report`: Set to your genuine confidence that this finding is a true positive,
|
|
512
|
+
as a float from 0.0 to 1.0. Consider:
|
|
513
|
+
- 0.9-1.0: You are certain this is a real issue (clear bug, obvious vulnerability)
|
|
514
|
+
- 0.7-0.8: High confidence but some ambiguity (pattern match, context-dependent)
|
|
515
|
+
- 0.4-0.6: Uncertain (could be intentional, might be a style choice)
|
|
516
|
+
- 0.1-0.3: Low confidence (speculative, may be a false positive)
|
|
517
|
+
Do NOT default to 0.8 -- assess each finding individually.
|
|
518
|
+
|
|
519
|
+
## Shadow Mode Dimensions (DIM-01, DIM-04)
|
|
520
|
+
|
|
521
|
+
Dimensions 13 (doc_completeness) and 14 (change_scope) operate in **shadow mode**:
|
|
522
|
+
- Findings ARE persisted to .forge/findings.json with `'shadow': True`
|
|
523
|
+
- Findings are NOT displayed to the user in review output
|
|
524
|
+
- Findings are NOT counted toward cycle reset decisions
|
|
525
|
+
- After 20+ shadow findings accumulate, FP rate is computed via `forge --eval --shadow`
|
|
526
|
+
- If FP < 10%: dimension is promoted to active. Use `forge --promote <dim>` to set all findings for that dimension to shadow=False.
|
|
527
|
+
- If FP >= 10%: SKILL.md prompt for that dimension needs improvement before retry
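
The promotion rule can be sketched as a decision function over the persisted findings. The text does not define how the FP rate is computed, so this sketch makes a loud assumption: a shadow finding whose `outcome` is `"rejected"` counts as a false positive. The function name and return labels are hypothetical and do not describe what `forge --eval --shadow` actually does.

```python
def shadow_promotion(findings, dimension, min_samples=20, max_fp_rate=0.10):
    """Decide what to do with a shadow dimension.

    Returns 'insufficient-data', 'promote', or 'improve-prompt'.
    """
    shadow = [
        f for f in findings
        if f.get("dimension") == dimension and f.get("shadow")
    ]
    if len(shadow) < min_samples:
        return "insufficient-data"
    # Assumption: rejected findings are the false positives.
    false_positives = sum(1 for f in shadow if f.get("outcome") == "rejected")
    fp_rate = false_positives / len(shadow)
    return "promote" if fp_rate < max_fp_rate else "improve-prompt"
```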
|
|
528
|
+
|
|
529
|
+
When recording a finding for dim 13 or 14, check config for promotion status before setting shadow flag:
|
|
530
|
+
```python
# Shadow dimension finding -- logged but NOT shown to user
# N4 fix: check promoted_dimensions in config before hardcoding shadow
SHADOW_DIMENSIONS = {'doc_completeness', 'change_scope'}
promoted = set(config.get('promoted_dimensions', []))
if dimension in SHADOW_DIMENSIONS and dimension not in promoted:
    finding['shadow'] = True
```
|
|
538
|
+
|
|
539
|
+
**DIM-01 Documentation Completeness (dim 13):**
|
|
540
|
+
Check whether public-facing code changes include adequate documentation updates:
|
|
541
|
+
- New public functions/methods/classes: do they have docstrings?
|
|
542
|
+
- Changed function signatures: is the docstring updated to match?
|
|
543
|
+
- User-facing feature changes: is there a changelog entry or README update?
|
|
544
|
+
- API endpoint changes: is API documentation updated?
|
|
545
|
+
Do NOT flag: internal/private functions, test files, configuration changes, refactoring that preserves behavior.
|
|
546
|
+
|
|
547
|
+
**DIM-04 Change Scope (dim 14):**
|
|
548
|
+
Check whether the diff contains a single coherent concern:
|
|
549
|
+
- Does the diff mix unrelated changes (e.g., feature + refactor + bugfix)?
|
|
550
|
+
- Are there files modified that have no logical connection to the primary change?
|
|
551
|
+
- Does the commit message describe one thing but the diff does several?
|
|
552
|
+
Do NOT flag: necessary supporting changes (e.g., updating imports when moving a function), test additions for the primary change, formatting changes required by the primary change.
|
|
553
|
+
|
|
554
|
+
NOTE (R15): Shadow mode display filtering is implemented in Plan 04 (Wave 3). Until Plan 04 executes, shadow findings will appear in --stats/--eval output. This is acceptable during Phase 2 execution -- data collection starts immediately, filtering is wired later.
|
|
555
|
+
|
|
556
|
+
## Session State (Hook Integration)
|
|
557
|
+
|
|
558
|
+
The `check_review_tracker.sh` hook writes severity data to `.forge/current_session.json`
|
|
559
|
+
after each review pass. This file contains:
|
|
560
|
+
|
|
561
|
+
```json
{
  "last_max_severity": "P2",
  "last_review_pass": "qodo-review",
  "qodo_runs": 3,
  "rounds_with_findings": 1
}
```
|
|
569
|
+
|
|
570
|
+
When available, read this file to cross-check severity classification. If the hook
|
|
571
|
+
detected a higher severity than the SKILL.md state machine assigned, use the higher
|
|
572
|
+
severity (conservative approach). This provides a second layer of severity enforcement
|
|
573
|
+
beyond the SKILL.md instructions alone.
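
A minimal sketch of the conservative cross-check, assuming the session file has the shape shown above. The helper names `higher_severity` and `cross_check` are hypothetical, and treating a missing or unreadable file as "no hook data" is an assumption drawn from the "when available" wording.

```python
import json
from pathlib import Path

SEVERITY_ORDER = ["P0", "P1", "P2", "P3"]  # most severe first


def higher_severity(a, b):
    """Return the more severe of two P0-P3 labels."""
    return min(a, b, key=SEVERITY_ORDER.index)


def cross_check(assigned, session_path=".forge/current_session.json"):
    """Raise `assigned` to the hook's severity when the hook saw something worse."""
    path = Path(session_path)
    if not path.is_file():
        return assigned  # hook data is optional
    try:
        data = json.loads(path.read_text())
    except (OSError, ValueError):
        return assigned  # unreadable or malformed file: keep our own value
    hook = data.get("last_max_severity") if isinstance(data, dict) else None
    if hook not in SEVERITY_ORDER:
        return assigned
    return higher_severity(assigned, hook)
```

The check only ever raises severity: if the hook recorded P1 and the state machine assigned P3, the result is P1, while an assigned P0 stays P0.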
|
|
574
|
+
|
|
575
|
+
## Feedback Collection (LEARN-07-LITE)
|
|
576
|
+
|
|
577
|
+
All findings are initially recorded with `outcome: "pending"`.
|
|
578
|
+
|
|
579
|
+
**When to collect feedback:**
|
|
580
|
+
Feedback collection happens ONCE, at the END of the pipeline -- specifically at
|
|
581
|
+
the commit gate, AFTER Step 4 (smoke test) completes. This is the single point
|
|
582
|
+
where the user reviews all accumulated findings before committing.
|
|
583
|
+
|
|
584
|
+
Do NOT collect feedback during individual passes (this conflicts with auto-continue).
|
|
585
|
+
Do NOT pause between passes to ask about findings.
|
|
586
|
+
|
|
587
|
+
**At pipeline completion (commit gate):**
|
|
588
|
+
Present a summary table of ALL findings from this session:
|
|
589
|
+
|
|
590
|
+
```
[forge] Pipeline complete. Findings summary:

 # | Severity | Dimension   | File                  | Status
 1 | P2       | security    | hooks/check_*.sh      | fixed (accepted)
 2 | P3       | style       | cli/forge_cli.py      | accumulated (pending)
 3 | P1       | correctness | skills/forge/SKILL.md | fixed (accepted)

Classify pending findings? [y/n/defer]
```
|
|
600
|
+
|
|
601
|
+
If user chooses to classify:
|
|
602
|
+
- For each pending finding, ask:
|
|
603
|
+
- **Accept**: The finding was valid (outcome = "accepted")
|
|
604
|
+
- **Reject**: The finding was a false positive (outcome = "rejected")
|
|
605
|
+
If rejected, ask which category:
|
|
606
|
+
1. HALLUCINATION -- the problem does not exist
|
|
607
|
+
2. CONTEXT_MISSING -- reviewer lacked necessary context
|
|
608
|
+
3. INTENTIONAL -- this was an intentional design choice
|
|
609
|
+
4. NOT_APPLICABLE -- the rule does not apply here
|
|
610
|
+
5. STYLE_PREFERENCE -- subjective, not a defect
|
|
611
|
+
6. ACCEPTABLE_RISK -- real issue, but risk accepted
|
|
612
|
+
|
|
613
|
+
If user defers: findings remain "pending" for later classification via `forge --classify`.
|
|
614
|
+
|
|
615
|
+
**Findings that were fixed:**
|
|
616
|
+
When a finding triggers a code fix (P0/P1/P2 that caused reset), automatically
|
|
617
|
+
set its outcome to "accepted" -- the act of fixing it confirms it was valid.
|
|
618
|
+
Only accumulated P3 findings and unfixed findings remain "pending".
|
|
619
|
+
|
|
620
|
+
**Updating a finding outcome:** Use a Bash tool call with Python heredoc:
|
|
621
|
+
|
|
622
|
+
```bash
python3 << 'PYEOF'
import json, os, tempfile

findings_file = '.forge/findings.json'
finding_id = 'REPLACE_WITH_FINDING_UUID'
new_outcome = 'rejected'      # or 'accepted'
new_reason = 'HALLUCINATION'  # or None for accepted

with open(findings_file, 'r') as f:
    data = json.load(f)

# Update the matching finding; fail loudly if the id is unknown.
for finding in data['findings']:
    if finding['id'] == finding_id:
        finding['outcome'] = new_outcome
        finding['reject_reason'] = new_reason if new_outcome == 'rejected' else None
        break
else:
    raise SystemExit(f'finding not found: {finding_id}')

# Write atomically: dump to a temp file in the same directory, then replace.
dir_name = os.path.dirname(findings_file) or '.'
fd, tmp = tempfile.mkstemp(dir=dir_name, suffix='.json')
try:
    with os.fdopen(fd, 'w') as f:
        json.dump(data, f, indent=2)
    os.replace(tmp, findings_file)
except Exception:
    try:
        os.unlink(tmp)
    except OSError:
        pass
    raise
PYEOF
```
|
|
654
|
+
|
|
655
|
+
## Why Each Pass Is Mandatory
|
|
656
|
+
|
|
657
|
+
- Pass 1 (qodo): catches structural/feature-level issues
|
|
658
|
+
- Pass 2 (code-review-expert): catches SOLID violations, architecture problems
|
|
659
|
+
- Pass 3 (adversarial-qe): catches regressions INTRODUCED BY fixes from Passes 1-2
|
|
660
|
+
|
|
661
|
+
This is the key insight: fixes create new bugs. Pass 3 exists to catch them.
|
|
662
|
+
|
|
663
|
+
## Cross-Function Enforcement
|
|
664
|
+
|
|
665
|
+
Diff-only review cannot catch cross-function inconsistencies. Pass 3 must grep the FULL FILE for consistency: error message prefixes, naming conventions, variable usage patterns.
|
|
666
|
+
|
|
667
|
+
## Handling Findings
|
|
668
|
+
|
|
669
|
+
Finding handling depends on severity (see Severity-Gated Cycle Reset above):
|
|
670
|
+
|
|
671
|
+
- **P0/P1 findings**: Fix ALL findings immediately. cycle_counter = 0. Restart from Cycle 1, Pass 1.
|
|
672
|
+
- **P2 findings**: Fix P2 findings. Restart current cycle from Pass 1. Do NOT reset cycle_counter.
|
|
673
|
+
- **P3 findings**: Record but do not fix immediately. Accumulate and continue to next pass.
|
|
674
|
+
- Deduplicate by rule type, then check density thresholds:
|
|
675
|
+
- Per-file >5 distinct rule violations: P2-equivalent restart
|
|
676
|
+
- Per-diff >10 distinct rule violations: P2-equivalent restart
|
|
677
|
+
- Density >0.15 P3 findings per changed line: P2-equivalent restart
|
|
678
|
+
- Below all thresholds: accumulate silently, continue
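
The density thresholds above can be sketched as a single predicate. This is an illustration under stated assumptions: each P3 finding is a dict with hypothetical `file` and `rule` keys, deduplication means one `(file, rule)` pair counts once, the per-diff limit counts distinct rules across the whole diff, and density uses the deduplicated count.

```python
from collections import defaultdict


def p3_escalates(p3_findings, changed_lines):
    """Return True when accumulated P3 findings warrant a P2-equivalent restart."""
    # Deduplicate by rule type: one (file, rule) pair counts once.
    unique = {(f["file"], f["rule"]) for f in p3_findings}

    rules_per_file = defaultdict(set)
    for file, rule in unique:
        rules_per_file[file].add(rule)

    per_file_hit = any(len(rules) > 5 for rules in rules_per_file.values())
    per_diff_hit = len({rule for _, rule in unique}) > 10
    density_hit = changed_lines > 0 and len(unique) / changed_lines > 0.15
    return per_file_hit or per_diff_hit or density_hit
```

Three repeats of one rule in a 100-line diff stay below every threshold, while six distinct rules in a single file trip the per-file limit.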
|
|
679
|
+
|
|
680
|
+
After fixing any finding, verify no out-of-scope files were modified:
|
|
681
|
+
```bash
git diff --name-only
```
|
|
684
|
+
Revert any out-of-scope changes with `git checkout -- <file>`.
|
|
685
|
+
|
|
686
|
+
## Hard Stop
|
|
687
|
+
|
|
688
|
+
The `check_review_tracker.sh` hook tracks state. After 3 rounds where findings persist, it blocks all Edit/Write operations. This requires human intervention to unblock and prevents infinite fix-break loops.
|
|
689
|
+
|
|
690
|
+
## Steps 1-3 Gate
|
|
691
|
+
|
|
692
|
+
- **Entry**: Step 0 passed
|
|
693
|
+
- **Exit**: 3 consecutive cycles where ALL 3 passes report zero findings (minimum 9 passes total)
|
|
694
|
+
- **On P0/P1**: fix -> counter = 0 -> restart from Cycle 1
|
|
695
|
+
- **On P2**: fix -> restart current cycle
|
|
696
|
+
- **On P3 only**: accumulate (density check -> P2 escalation if thresholds exceeded)
|
|
697
|
+
|
|
698
|
+
---
|
|
699
|
+
|
|
700
|
+
# Step 3.5: False-Positive Verification
|
|
701
|
+
|
|
702
|
+
Invoke `/kernel-fp-verify` skill.
|
|
703
|
+
|
|
704
|
+
## When to Run
|
|
705
|
+
|
|
706
|
+
- **Run**: when the three-cycle review produced findings that were fixed
|
|
707
|
+
- **Skip**: if all 3 cycles were clean from the start (no findings ever reported)
|
|
708
|
+
|
|
709
|
+
## 10-Step Verification Protocol
|
|
710
|
+
|
|
711
|
+
For each accumulated finding that was fixed, verify:
|
|
712
|
+
|
|
713
|
+
1. Re-read the code at the cited location
|
|
714
|
+
2. Prove the path is REACHABLE (not just "unlikely")
|
|
715
|
+
3. Identify concrete failure mode (crash / wrong output / data corruption / security breach)
|
|
716
|
+
4. Check full context (2-3 levels up/down the call chain)
|
|
717
|
+
5. Check patch series context (for multi-patch sets)
|
|
718
|
+
6. Verify against independent ground truth
|
|
719
|
+
7. Check for intentional design (read comments/docs)
|
|
720
|
+
8. Test complex multi-step conditions
|
|
721
|
+
9. Anti-hallucination check (does the function/variable/constant actually exist?)
|
|
722
|
+
10. Debate yourself (author's perspective vs reviewer's perspective)
|
|
723
|
+
|
|
724
|
+
## Valid Dismissal Reasons (exhaustive)
|
|
725
|
+
|
|
726
|
+
- Hallucination (the function/variable does not exist)
|
|
727
|
+
- Structurally unreachable path
|
|
728
|
+
- Documented intentional behavior
|
|
729
|
+
- Subsequent patch in the series fixes it
|
|
730
|
+
|
|
731
|
+
No other dismissal reasons are valid.
|
|
732
|
+
|
|
733
|
+
## Output
|
|
734
|
+
|
|
735
|
+
Each finding is classified as CONFIRMED, DOWNGRADED, or DISMISSED, with supporting evidence and the verification steps it failed.
|
|
736
|
+
|
|
737
|
+
---
|
|
738
|
+
|
|
739
|
+
# Step 4: Smoke Test
|
|
740
|
+
|
|
741
|
+
Invoke the `/smoke-test` skill.
|
|
742
|
+
|
|
743
|
+
## Coverage Matrix
|
|
744
|
+
|
|
745
|
+
All categories required unless clearly N/A:
|
|
746
|
+
|
|
747
|
+
| Category | What to test |
|----------|-------------|
| Normal path | Primary execution path, expected output |
| Boundary | Empty input, null, max size, zero-length |
| Security | Injection payloads, path traversal |
| Concurrency | Race conditions (if applicable) |
|
|
753
|
+
|
|
754
|
+
## Workflow
|
|
755
|
+
|
|
756
|
+
- **A.** Analyze change: what changed, primary execution path, edge cases
|
|
757
|
+
- **B.** Select test primitives from decision table (language-specific)
|
|
758
|
+
- **C.** Assemble test script using standard patterns
|
|
759
|
+
- **D.** Execute and record results (PASS/FAIL counts)
|
|
760
|
+
|
|
761
|
+
## Language-Specific Test Runners
|
|
762
|
+
|
|
763
|
+
| Language | Runner | Primitives |
|----------|--------|-----------|
| Shell | primitives.sh | run_and_capture, assert_success, assert_failure, assert_output_contains, assert_stderr_contains, assert_file_exists, assert_no_zombie, assert_json_valid, assert_no_command_exec, assert_no_path_traversal |
| Python | pytest | standard pytest assertions |
| Go | go test | standard testing package |
| C | Beaker / framework | see Kernel C Exception |
|
|
769
|
+
|
|
770
|
+
## Shell-Specific Footguns
|
|
771
|
+
|
|
772
|
+
These evade `bash -n` and `shellcheck` -- test for them explicitly:
|
|
773
|
+
|
|
774
|
+
1. bash auto-reaps direct children (need non-bash intermediate for zombie detection)
|
|
775
|
+
2. `local` only valid inside functions
|
|
776
|
+
3. `((x++))` evaluates to the old value, so it returns exit status 1 when x=0 and aborts a script running under `set -e`
|
|
777
|
+
4. `$(...)` captures multi-line output (use `grep -q` with stdout redirect)
|
|
778
|
+
5. `jq -e` still prints its result to stdout; when only the exit status matters, redirect with `>/dev/null 2>&1`
|
|
779
|
+
|
|
780
|
+
## Kernel C Exception
|
|
781
|
+
|
|
782
|
+
Pre-commit Step 4 = build passes + kernel-qe test plan exists + Beaker job XML generated.
|
|
783
|
+
Step 5 (Beaker submission) = pre-merge gate, not pre-commit requirement.
|
|
784
|
+
|
|
785
|
+
## Prohibited During Smoke Test
|
|
786
|
+
|
|
787
|
+
- Do NOT modify tested code
|
|
788
|
+
- Do NOT depend on network
|
|
789
|
+
- Do NOT include syntax checks (those belong in Step 0)
|
|
790
|
+
|
|
791
|
+
## Step 4 Gate
|
|
792
|
+
|
|
793
|
+
- **Entry**: cycle_counter = 3 and Step 3.5 complete (if applicable)
|
|
794
|
+
- **Exit**: all tests PASS
|
|
795
|
+
- **On failure**: fix the code -> restart from Step 0 (full pipeline restart, not just re-run smoke test)
|
|
796
|
+
|
|
797
|
+
---
|
|
798
|
+
|
|
799
|
+
# Step 5: R1 Test Gate
|
|
800
|
+
|
|
801
|
+
## Purpose
|
|
802
|
+
|
|
803
|
+
Tests must exist for every diff-impacted source file and must pass. The gate
|
|
804
|
+
detects changed source files, maps them to expected test files using ecosystem
|
|
805
|
+
conventions, runs the test suite, fails if any test fails, and reports PARTIAL when no test file can be found for a public function in the changed source.
|
|
807
|
+
|
|
808
|
+
## Algorithm (language-independent)
|
|
809
|
+
|
|
810
|
+
1. Determine changed source files from the diff (exclude test files themselves).
|
|
811
|
+
2. For each changed source file, locate candidate test files using ecosystem
|
|
812
|
+
naming conventions (see Tool Table below).
|
|
813
|
+
3. Run the test suite restricted to those candidate test files.
|
|
814
|
+
4. If no candidate test file exists for a public function in the changed source,
|
|
815
|
+
emit R1 PARTIAL (LLM fallback applies -- see Fallback).
|
|
816
|
+
5. If any test fails, R1 FAIL. If all pass (or skip), R1 PASS.
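
Steps 1 and 2 of the algorithm can be sketched for the Python row of the Tool Table. This is a minimal illustration, assuming the two naming conventions listed there (`tests/test_<module>.py` and a sibling `test_<module>.py`); the function names are hypothetical and other ecosystems would need their own mapping.

```python
from pathlib import PurePosixPath


def is_test_file(path):
    """Recognize Python test files by the test_<module>.py convention."""
    name = PurePosixPath(path).name
    return name.startswith("test_") and name.endswith(".py")


def candidate_tests(changed_files):
    """Map each changed Python source file to its conventional test paths."""
    mapping = {}
    for path in changed_files:
        p = PurePosixPath(path)
        if p.suffix != ".py" or is_test_file(path):
            continue  # step 1: test files themselves are excluded
        module = p.stem
        mapping[path] = [
            f"tests/test_{module}.py",
            str(p.with_name(f"test_{module}.py")),
        ]
    return mapping
```

A caller would then check which candidates exist on disk and pass only those to `pytest`; a source file with no existing candidate feeds the PARTIAL path in step 4.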
|
|
817
|
+
|
|
818
|
+
## Tool Table
|
|
819
|
+
|
|
820
|
+
| Language | Test Runner (R1) | Test file naming convention |
|---|---|---|
| Python | `pytest` | `tests/test_<module>.py` or `test_<module>.py` |
| Go | `go test ./...` | `<package>_test.go` in same directory |
| Rust | `cargo test` | `tests/` dir or `#[cfg(test)]` in same file |
| JavaScript | `jest` / `vitest` / `mocha` | `<module>.test.js` or `__tests__/<module>.js` |
| TypeScript | `jest` / `vitest` | `<module>.test.ts` or `__tests__/<module>.ts` |
| Java | `mvn test` / `gradle test` | `<Class>Test.java` or `Test<Class>.java` |
| Kotlin | `gradle test` | `<Class>Test.kt` |
| C | `ctest` / `make test` | `test_<module>.c` or `<module>_test.c` |
| C++ | `ctest` / `make test` | `test_<module>.cpp` or `<module>_test.cpp` |
| Kernel C | Beaker functional | `runtest.sh` under test case directory |
| Shell | `bats` / inline | `test_<script>.bats` or `test_<script>.sh` |
| Ruby | `rspec` / `minitest` | `<module>_spec.rb` or `test_<module>.rb` |
| PHP | `phpunit` | `<Class>Test.php` |
| Swift | `swift test` | `<Module>Tests.swift` |
|
|
836
|
+
|
|
837
|
+
## Python CLI Fast Path (optional)
|
|
838
|
+
|
|
839
|
+
```
code-forge gate-check
```
|
|
842
|
+
|
|
843
|
+
Reads `.code-forge/gate.yaml` for test command and path filter configuration.
|
|
844
|
+
|
|
845
|
+
## Fallback (no test file found)
|
|
846
|
+
|
|
847
|
+
When no test file can be located for a changed public function:
|
|
848
|
+
1. LLM identifies all public functions in the changed source.
|
|
849
|
+
2. For each untested public function, generates a stub test that calls the
|
|
850
|
+
function with representative inputs and asserts the return type.
|
|
851
|
+
3. Mark R1 PARTIAL in findings.json. The stub test is advisory -- it does not
|
|
852
|
+
replace a real test.
|
|
853
|
+
|
|
854
|
+
## Failure Handling
|
|
855
|
+
|
|
856
|
+
- FAIL -> fix (add or repair tests) -> cycle_counter = 0 -> restart from Step 0
|
|
857
|
+
- Record to `.code-forge/findings.json`:
|
|
858
|
+
|
|
859
|
+
```json
{
  "gate": "R1",
  "result": "FAIL",
  "failed_tests": ["tests/test_foo.py::test_bar"],
  "missing_coverage": ["src/foo.py::public_fn"]
}
```
|
|
867
|
+
|
|
868
|
+
---
|
|
869
|
+
|
|
870
|
+
# Step 6: R2 Mutation Check
|
|
871
|
+
|
|
872
|
+
## Purpose
|
|
873
|
+
|
|
874
|
+
Tests must be capable of killing mutants introduced into the changed code, not
|
|
875
|
+
just achieve line coverage. A passing test suite that cannot detect a simple
|
|
876
|
+
mutation (e.g., flipped boolean, off-by-one) is toothless. R2 detects this by
|
|
877
|
+
mutating the changed files and running the test suite against each mutant. Any
|
|
878
|
+
surviving mutant means the tests cannot catch the corresponding change.
|
|
879
|
+
|
|
880
|
+
## Algorithm (language-independent)
|
|
881
|
+
|
|
882
|
+
1. Scope mutation to diff-changed files only (not the full codebase).
|
|
883
|
+
2. Run the baseline test suite three times to confirm it is not flaky.
|
|
884
|
+
3. If the mutation tool is not installed, log `tool_missing` and WARN (not FAIL).
|
|
885
|
+
4. Apply the mutation tool to generate mutants for each changed file.
|
|
886
|
+
5. Run the test suite against each mutant.
|
|
887
|
+
6. Collect surviving mutants (mutants not killed by any test).
|
|
888
|
+
7. If survivor count > 0, R2 FAIL with survivor list. Otherwise R2 PASS.
|
|
889
|
+
|
|
890
|
+
## Tool Table
|
|
891
|
+
|
|
892
|
+
| Language | Mutation Tool (R2) | Notes |
|---|---|---|
| Python | `mutmut` (preferred) or `cosmic-ray` | `mutmut run` + `mutmut results` |
| Go | `gremlins` or `go-mutesting` | `gremlins unleash ./...` |
| Rust | `cargo mutants` | `cargo mutants --workspace` |
| JavaScript | `stryker-mutator` | `npx stryker run` |
| TypeScript | `stryker-mutator` | `npx stryker run` |
| Java | `pitest` | `mvn org.pitest:pitest-maven:mutationCoverage` |
| Kotlin | `pitest` | `gradle pitest` |
| C | `mull` | `mull-runner <test-binary>` |
| C++ | `mull` | `mull-runner <test-binary>` |
| Kernel C | N/A | Beaker functional tests only; skip R2 |
| Shell | LLM-inject 10 mutants | See Fallback below |
| Ruby | `mutant` | `mutant run` |
| PHP | `infection` | `./vendor/bin/infection` |
| Swift | `muter` | `muter run` |
|
|
908
|
+
|
|
909
|
+
## Python CLI Fast Path (optional)
|
|
910
|
+
|
|
911
|
+
```
code-forge mutation-check --timeout 600
```
|
|
914
|
+
|
|
915
|
+
Defaults to uncommitted changes. Pass `--diff <path>` to specify a diff file.
|
|
916
|
+
Pass `--paths <glob>` to restrict to matching files.
|
|
917
|
+
|
|
918
|
+
## Fallback (no tool installed)
|
|
919
|
+
|
|
920
|
+
When the mutation tool is not installed:
|
|
921
|
+
1. Log `tool_missing: <tool_name>` to `.code-forge/findings.json`.
|
|
922
|
+
2. LLM injects 10 representative mutants per changed function manually:
|
|
923
|
+
negate a boolean, flip a comparison operator, remove a guard clause,
|
|
924
|
+
swap two arguments, change a return value.
|
|
925
|
+
3. Run the test suite after each manual mutation.
|
|
926
|
+
4. Report surviving manual mutants as R2 advisory findings (not FAIL).
|
|
927
|
+
5. Mark R2 PARTIAL in findings.json.
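
One of the manual mutants listed above (flip a comparison operator) can be illustrated with the standard `ast` module. This is a sketch, not how the fallback is implemented: the `is_adult` function, the `FlipGtE` operator, and the tiny `suite_passes` test suite are all hypothetical examples.

```python
import ast

SOURCE = """
def is_adult(age):
    return age >= 18
"""


class FlipGtE(ast.NodeTransformer):
    """Mutation operator: rewrite every `>=` comparison to `>`."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.Gt() if isinstance(op, ast.GtE) else op for op in node.ops]
        return node


def load(tree):
    """Compile a module AST and return its namespace."""
    namespace = {}
    exec(compile(ast.fix_missing_locations(tree), "<mutant>", "exec"), namespace)
    return namespace


def suite_passes(ns, include_boundary):
    """Hypothetical test suite; the boundary case is what kills the mutant."""
    ok = ns["is_adult"](30) and not ns["is_adult"](5)
    if include_boundary:
        ok = ok and ns["is_adult"](18)
    return ok


original = load(ast.parse(SOURCE))
mutant = load(FlipGtE().visit(ast.parse(SOURCE)))

# A mutant survives when the suite passes on both original and mutated code.
weak_survivor = suite_passes(original, False) and suite_passes(mutant, False)
strong_survivor = suite_passes(original, True) and suite_passes(mutant, True)
```

Without the boundary case the mutant survives, which is the kind of gap an R2 advisory finding would report; adding the `age == 18` check kills it.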
|
|
928
|
+
|
|
929
|
+
## Failure Handling
|
|
930
|
+
|
|
931
|
+
- FAIL -> add or strengthen tests -> cycle_counter = 0 -> restart from Step 0
|
|
932
|
+
- Record to `.code-forge/findings.json`:
|
|
933
|
+
|
|
934
|
+
```json
{
  "gate": "R2",
  "result": "FAIL",
  "survivors": [
    "code_forge.mutation.run_mutation__mutmut_3",
    "code_forge.mutation.run_mutation__mutmut_7"
  ]
}
```
|
|
944
|
+
|
|
945
|
+
---
|
|
946
|
+
|
|
947
|
+
# Step 7: R3 E2E Coverage
|
|
948
|
+
|
|
949
|
+
## Purpose
|
|
950
|
+
|
|
951
|
+
When a diff touches multiple source components AND modifies a function signature
|
|
952
|
+
or return type, cross-component integration is at risk. R3 checks whether an
|
|
953
|
+
e2e test artifact exists that covers the boundary. It operates in two layers:
|
|
954
|
+
Layer 1 (heuristic, always active) emits an advisory finding when >=2 source
|
|
955
|
+
groups are changed and a signature modification is detected. Layer 2 (opt-in,
|
|
956
|
+
requires `.code-forge/components.yaml`) emits a blocking finding when a hub
|
|
957
|
+
component and a dependent are both modified and no e2e artifact exists under the
|
|
958
|
+
dependent's paths.
|
|
959
|
+
|
|
960
|
+
## Algorithm (language-independent)
|
|
961
|
+
|
|
962
|
+
1. Parse the diff to detect signature changes (Python `def`, shell functions,
|
|
963
|
+
section headers matching a def pattern).
|
|
964
|
+
2. Group changed source files by component using path heuristics or
|
|
965
|
+
`.code-forge/components.yaml` if present.
|
|
966
|
+
3. **Layer 1 (heuristic):** if >=2 source groups changed AND a signature change
|
|
967
|
+
detected -> emit advisory finding (DISMISSED disposition, non-blocking).
|
|
968
|
+
4. **Layer 2 (explicit, opt-in):** if `components.yaml` present, resolve hub +
|
|
969
|
+
dependent co-occurrence. If both touched and no e2e artifact matches the
|
|
970
|
+
configured `e2e_patterns` under the dependent's paths -> emit blocking finding
|
|
971
|
+
(UNCERTAIN disposition, R3 FAIL). `e2e_absent_ok` in components.yaml
|
|
972
|
+
provides an escape hatch for components intentionally lacking e2e coverage.
|
|
973
|
+
5. If no components.yaml and no path heuristic match -> SKIP with WARN.
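
Layer 1 can be sketched for Python diffs as follows. This is an illustration only: it assumes a unified diff as input, treats the top-level directory as the component (one possible path heuristic), and detects a signature change as a `def` line whose parameter text differs between the removed and added sides. None of these names come from `e2e_check.py`.

```python
import re

# Added or removed `def` lines in a unified diff.
DEF_LINE = re.compile(r"^([+-])\s*(?:async\s+)?def\s+(\w+)\s*\((.*)")


def signature_changes(diff_text):
    """Return names of functions whose `def` line differs between diff sides."""
    removed, added = {}, {}
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file headers, not content
        m = DEF_LINE.match(line)
        if m:
            side = added if m.group(1) == "+" else removed
            side[m.group(2)] = m.group(3).strip()
    return sorted(n for n in removed if n in added and removed[n] != added[n])


def source_groups(changed_files):
    """Path heuristic (an assumption): the top-level directory names a component."""
    return {f.split("/", 1)[0] for f in changed_files if "/" in f}


def layer1_advisory(diff_text, changed_files):
    """True when >=2 source groups changed AND a signature change was detected."""
    return len(source_groups(changed_files)) >= 2 and bool(signature_changes(diff_text))
```

A diff that changes `load(path)` to `load(path, strict=False)` while touching both `core/` and `api/` would raise the advisory; the same change confined to `core/` would not.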
|
|
974
|
+
|
|
975
|
+
## Tool Table
|
|
976
|
+
|
|
977
|
+
| Ecosystem | E2E artifact patterns | Notes |
|---|---|---|
| Python | `tests/e2e/**`, `test_*integration*` | Default patterns |
| Go | `*_integration_test.go`, `e2e/**/*_test.go` | |
| Rust | `tests/integration_*.rs`, `tests/e2e_*.rs` | |
| JavaScript/TS | `e2e/**/*.spec.*`, `**/*.e2e-spec.*`, `cypress/**` | |
| Java/Kotlin | `*IT.java`, `*IntegrationTest.java`, `*IT.kt` | |
| C/C++ | `test/integration_*`, `tests/e2e_*` | |
| Shell | `tests/e2e_*.sh`, `tests/integration_*.sh` | |
|
|
986
|
+
|
|
987
|
+
## Python CLI Fast Path (optional)
|
|
988
|
+
|
|
989
|
+
```
code-forge e2e-check
```
|
|
992
|
+
|
|
993
|
+
Defaults to uncommitted changes and current directory as repo root. Pass
|
|
994
|
+
`--diff <path>` to specify a diff file. Pass `--repo-root <path>` to set
|
|
995
|
+
the repository root for artifact search.
|
|
996
|
+
|
|
997
|
+
## Fallback (no components.yaml, no path heuristic match)
|
|
998
|
+
|
|
999
|
+
When `.code-forge/components.yaml` is absent and the path heuristic cannot
|
|
1000
|
+
group changed files into >=2 components:
|
|
1001
|
+
- SKIP with WARN: log `e2e_check: skip: no components config and no
|
|
1002
|
+
cross-component change detected` to `.code-forge/findings.json`.
|
|
1003
|
+
- R3 result is SKIP (not FAIL); the pipeline proceeds to commit gate.
|
|
1004
|
+
|
|
1005
|
+
## Failure Handling
|
|
1006
|
+
|
|
1007
|
+
- Layer 1 finding (advisory): accumulate, do not block pipeline.
|
|
1008
|
+
- Layer 2 finding (blocking): FAIL -> add or identify e2e test artifact ->
|
|
1009
|
+
cycle_counter = 0 -> restart from Step 0.
|
|
1010
|
+
- Record to `.code-forge/findings.json`:
|
|
1011
|
+
|
|
1012
|
+
```json
{
  "gate": "R3",
  "result": "FAIL",
  "survivors": [],
  "description": "cross-component change: hub 'core' + dependent 'api' both touched; no e2e artifact found"
}
```
|
|
1020
|
+
|
|
1021
|
+
SKIP records:
|
|
1022
|
+
|
|
1023
|
+
```json
{
  "gate": "R3",
  "result": "SKIP",
  "survivors": []
}
```
|
|
1030
|
+
|
|
1031
|
+
---
|
|
1032
|
+
|
|
1033
|
+
# Commit Gate
|
|
1034
|
+
|
|
1035
|
+
Only after ALL steps complete:
|
|
1036
|
+
|
|
1037
|
+
```bash
git commit -m "<subsystem>/<case>: <summary>

<detailed description>

Signed-off-by: Minxi Hou <houminxi@gmail.com>" # post-review-c3
```
|
|
1044
|
+
|
|
1045
|
+
## Completion Checklist
|
|
1046
|
+
|
|
1047
|
+
Before committing, all of the following must be satisfied:
|
|
1048
|
+
|
|
1049
|
+
- [ ] 3 consecutive clean review cycles (Steps 1-3) with zero findings
|
|
1050
|
+
- [ ] Step 3.5 false-positive verification complete (if findings were fixed)
|
|
1051
|
+
- [ ] Step 4 smoke test: PASS
|
|
1052
|
+
- [ ] Step 5 R1 test gate: PASS (or PARTIAL with stub tests generated)
|
|
1053
|
+
- [ ] Step 6 R2 mutation check: PASS (or PARTIAL if tool absent + LLM fallback done)
|
|
1054
|
+
- [ ] Step 7 R3 e2e check: PASS or SKIP (SKIP is acceptable when no cross-component change detected)
|
|
1055
|
+
|
|
1056
|
+
## findings.json: dynamic_gate_run entry shape
|
|
1057
|
+
|
|
1058
|
+
Each dynamic gate (R1/R2/R3) run appends an entry to `.code-forge/findings.json`
|
|
1059
|
+
under a `dynamic_gate_run` key. The schema:
|
|
1060
|
+
|
|
1061
|
+
```json
{
  "dynamic_gate_run": {
    "gate": "R1",
    "result": "PASS",
    "timestamp": "2026-05-27T12:00:00Z",
    "survivors": [],
    "failed_tests": [],
    "missing_coverage": [],
    "tool": "pytest",
    "tool_missing": false,
    "infra_errors": []
  }
}
```
|
|
1076
|
+
|
|
1077
|
+
Fields:
|
|
1078
|
+
- `gate`: "R1", "R2", or "R3"
|
|
1079
|
+
- `result`: "PASS", "FAIL", "SKIP", or "PARTIAL"
|
|
1080
|
+
- `timestamp`: ISO-8601 UTC
|
|
1081
|
+
- `survivors`: list of mutant names (R2) or finding descriptions (R3)
|
|
1082
|
+
- `failed_tests`: list of test identifiers that failed (R1 only)
|
|
1083
|
+
- `missing_coverage`: list of source locations with no test file (R1 only)
|
|
1084
|
+
- `tool`: name of the tool invoked (e.g., "mutmut", "pytest", "e2e_check")
|
|
1085
|
+
- `tool_missing`: true if the tool was not installed (soft dependency)
|
|
1086
|
+
- `infra_errors`: list of infrastructure error strings
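
A minimal sketch of building one entry with the fields listed above. The `make_gate_entry` helper is hypothetical; it only validates the two enumerated fields and stamps the current UTC time in the same ISO-8601 form as the example.

```python
from datetime import datetime, timezone

VALID_GATES = {"R1", "R2", "R3"}
VALID_RESULTS = {"PASS", "FAIL", "SKIP", "PARTIAL"}


def make_gate_entry(gate, result, tool, *, survivors=(), failed_tests=(),
                    missing_coverage=(), tool_missing=False, infra_errors=()):
    """Build one dynamic_gate_run entry, rejecting unknown gate/result values."""
    if gate not in VALID_GATES:
        raise ValueError(f"unknown gate: {gate!r}")
    if result not in VALID_RESULTS:
        raise ValueError(f"unknown result: {result!r}")
    return {
        "dynamic_gate_run": {
            "gate": gate,
            "result": result,
            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "survivors": list(survivors),
            "failed_tests": list(failed_tests),
            "missing_coverage": list(missing_coverage),
            "tool": tool,
            "tool_missing": bool(tool_missing),
            "infra_errors": list(infra_errors),
        }
    }
```

Every list field is always present, even when empty, so consumers of `findings.json` can read any gate's entry without checking for missing keys.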
|
|
1087
|
+
|
|
1088
|
+
## Rules
|
|
1089
|
+
|
|
1090
|
+
- `# post-review-c3` is an internal gate marker ONLY -- it triggers the hook check
|
|
1091
|
+
- The marker must NEVER appear in the commit message content itself
|
|
1092
|
+
- The commit message must read as if written by a human engineer
|
|
1093
|
+
- Zero AI markers: no Co-Authored-By, no model names, no review process metadata
|
|
1094
|
+
|
|
1095
|
+
## Non-Code Exemptions
|
|
1096
|
+
|
|
1097
|
+
These commit types bypass the full pipeline but still require worktree and
|
|
1098
|
+
AI-attribution checks. Steps 5-7 (R1/R2/R3) are also skipped for these types:
|
|
1099
|
+
|
|
1100
|
+
- `# docs` -- documentation only
|
|
1101
|
+
- `# config` -- configuration changes
|
|
1102
|
+
- `# chore` -- tooling, dependencies, cleanup
|
|
1103
|
+
- `# wip` -- work in progress
|
|
1104
|
+
|
|
1105
|
+
---
|
|
1106
|
+
|
|
1107
|
+
# Adaptive Mechanisms
|
|
1108
|
+
|
|
1109
|
+
These are built into the pipeline and must be followed:
|
|
1110
|
+
|
|
1111
|
+
1. **Severity-Gated Cycle Reset (TRUST-07)**: P0/P1 findings reset counter to 0 and restart from Cycle 1 Pass 1. P2 findings restart the current cycle without resetting the counter. P3 findings accumulate with density-based escalation: deduplicate by rule type, then check per-file >5, per-diff >10, density >0.15/line -- any trigger causes P2-equivalent restart. Below threshold: accumulate silently, report count, continue. This replaces the previous unconditional reset behavior, reducing wasted passes by an estimated 60%+ while maintaining quality for critical issues.
|
|
1112
|
+
|
|
1113
|
+
2. **Hard Stop After 3 Rounds With Findings**: hook blocks all Edit/Write. Forces human intervention. Prevents infinite fix-break loops.
|
|
1114
|
+
|
|
1115
|
+
3. **Cross-Function Grep (Pass 3)**: dimension 9 "Convention adherence" requires grepping the full file, not just the diff. Catches cross-function inconsistencies.
|
|
1116
|
+
|
|
1117
|
+
4. **Anti-Hallucination Gates**: Pass 1 (re-read + grep), Pass 3 (3-step verification), Step 3.5 (10-step protocol with existence check).
|
|
1118
|
+
|
|
1119
|
+
5. **Cross-Model Complementarity**: different AI models catch different bug classes. The 3-pass structure exploits this: structural (Pass 1), architectural (Pass 2), adversarial (Pass 3).
|
|
1120
|
+
|
|
1121
|
+
6. **Ground Truth Verification for Test Infrastructure**: test assertions validated via bug injection: inject bug -> FAIL -> revert -> PASS. Static analysis alone cannot catch faulty assertion logic.
|
|
1122
|
+
|
|
1123
|
+
7. **Full Pipeline Restart on Smoke Test Failure**: smoke test FAIL -> fix -> restart from Step 0 (not Step 4). The fix itself may introduce new lint/review issues.
|
|
1124
|
+
|
|
1125
|
+
8. **Bidirectional Correctness**: round-trip operations (encode/decode, serialize/deserialize) verified in both directions. Origin: Sashiko review gap.
|
|
1126
|
+
|
|
1127
|
+
9. **Graceful Degradation**: missing optional dependencies must degrade gracefully, not crash. Review checks for this explicitly. Origin: Sashiko review gap.
|
|
1128
|
+
|
|
1129
|
+
10. **Scope Verification After Automated Tools**: after any review pass, check `git status` / `git diff --name-only` to confirm no out-of-scope files were modified. Revert any out-of-scope changes immediately.
|
|
1130
|
+
|
|
1131
|
+
11. **Auto-Continue on Clean Pass (TRUST-06)**: when a pass reports zero findings, forge immediately proceeds to the next pass/cycle without waiting for user input. Only pauses when findings exist and user decision is needed. Eliminates the "type continue after every LGTM pass" UX friction.
|
|
1132
|
+
|
|
1133
|
+
12. **Finding Persistence (TRUST-01)**: every finding is recorded to .forge/findings.json with structured metadata (severity, dimension, outcome, reject_reason). Extracted data is validated before storage (severity must be P0-P3, dimension must be in known set, file path existence checked). This enables Phase 1b calibration via 30+ days of accumulated data.
|
|
1134
|
+
|
|
1135
|
+
13. **Feedback Collection (LEARN-07-LITE)**: binary accept/reject feedback collected ONCE at pipeline completion (commit gate). Findings fixed during the pipeline are auto-accepted. Pending findings can be classified at commit gate or deferred to `forge --classify`. Feedback is NOT collected during individual passes to avoid conflicting with auto-continue.
|
|
1136
|
+
|
|
1137
|
+
14. **Step 0 Context Fusion (FUSE-01)**: deterministic Step 0 findings are serialized as a markdown table (capped at 20 rows) and injected into every LLM pass prompt. This prevents redundant flagging and lets LLM passes focus on issues that static tools cannot catch.
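
The Step 0 context fusion in item 14 can be sketched as a small serializer. This is an illustration under assumptions: the column set and the `tool`, `file`, `line`, and `message` keys are hypothetical, and the trailing "omitted" line is an added convenience rather than documented behavior. Only the 20-row cap comes from the text above.

```python
def fuse_context(step0_findings, max_rows=20):
    """Serialize Step 0 findings as a markdown table capped at max_rows rows."""
    lines = ["| Tool | File | Line | Message |", "|---|---|---|---|"]
    for f in step0_findings[:max_rows]:
        message = str(f["message"]).replace("|", "\\|")  # keep table cells intact
        lines.append(f"| {f['tool']} | {f['file']} | {f['line']} | {message} |")
    omitted = len(step0_findings) - max_rows
    if omitted > 0:
        lines.append(f"({omitted} more Step 0 findings omitted)")
    return "\n".join(lines)
```

The returned string would be prepended to each LLM pass prompt so the passes can skip issues the deterministic tools already reported.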
|
|
1138
|
+
|
|
1139
|
+
---
|
|
1140
|
+
|
|
1141
|
+
# Hook Enforcement Layer
|
|
1142
|
+
|
|
1143
|
+
These hooks enforce the pipeline at the tool level:
|
|
1144
|
+
|
|
1145
|
+
| Hook | Trigger | Purpose |
|------|---------|---------|
| check_worktree.sh | PreToolUse Edit/Write | Block edits in main worktree |
| check_non_ascii.sh | PreToolUse Write/Edit | Non-ASCII character detection |
| check_read_before_edit.sh | PostToolUse Read + PreToolUse Edit | 1:1 Read:Edit ratio + size guard |
| check_review_tracker.sh | PostToolUse Bash (qodo) + PreToolUse Edit | Review state machine + hard stop |
| check_git_commit_review.sh | PreToolUse Bash (git commit) | Block unreviewed commits + AI attribution check |
| check_git_push_review.sh | PreToolUse Bash (git push) | Block unreviewed pushes |
|
|
1153
|
+
|
|
1154
|
+
---
|
|
1155
|
+
|
|
1156
|
+
# Execution Protocol
|
|
1157
|
+
|
|
1158
|
+
When `/forge` is invoked:
|
|
1159
|
+
|
|
1160
|
+
1. **Determine diff source**: uncommitted (default) or committed (if `committed` arg)
|
|
1161
|
+
2. **Display pipeline banner**:
|
|
1162
|
+
   ```
   Forge: starting 5-step review pipeline
   Diff: <N> files, <M> lines changed
   ```
|
|
1166
|
+
3. **Run Step 0**: syntax + lint + non-ASCII. Stop on any failure. After all Step 0 checks pass, serialize findings into FUSE-01 context block for LLM passes (cap at 20 rows).
|
|
1167
|
+
4. **Initialize cycle_counter = 0**
|
|
1168
|
+
5. **Run cycles**: invoke /qodo-review, /code-review-expert, /adversarial-qe sequentially. Apply severity-gated state machine: P0/P1 = full reset, P2 = cycle restart, P3 = accumulate (density check -> P2 escalation), clean = auto-continue. Persist all findings to .forge/findings.json with validation.
|
|
1169
|
+
6. **After 3 clean cycles**: run Step 3.5 if findings were ever fixed during the process.
|
|
1170
|
+
7. **Run Step 4**: invoke /smoke-test. Full pipeline restart on any FAIL.
|
|
1171
|
+
8. **Report**: summary of passes completed, findings fixed, smoke test results.
|
|
1172
|
+
8.5. **Feedback collection**: present finding summary table. Collect accept/reject for pending findings (LEARN-07-LITE). Users can defer to `forge --classify`.
|
|
1173
|
+
9. **The commit itself is NOT performed by forge** -- it reports readiness and the user commits with the `# post-review-c3` marker.
|
|
1174
|
+
|
|
1175
|
+
## Progress Tracking
|
|
1176
|
+
|
|
1177
|
+
After each pass, report:
|
|
1178
|
+
|
|
1179
|
+
```
[forge] Cycle <N>/3, Pass <P>/3: <skill-name>
[forge] Result: <zero findings | N findings>
[forge] cycle_counter = <value>
```
|
|
1184
|
+
|
|
1185
|
+
After pipeline completes:
|
|
1186
|
+
|
|
1187
|
+
```
[forge] Pipeline complete
[forge] Total passes: <N> (minimum 9)
[forge] Findings fixed: <N>
[forge] Smoke test: PASS
[forge] Ready to commit with: # post-review-c3
```
|