@codexstar/bug-hunter 3.0.0 → 3.0.5
- package/CHANGELOG.md +149 -83
- package/README.md +150 -15
- package/SKILL.md +94 -27
- package/agents/openai.yaml +4 -0
- package/bin/bug-hunter +9 -3
- package/docs/images/2026-03-12-fix-plan-rollout.png +0 -0
- package/docs/images/2026-03-12-hero-bug-hunter-overview.png +0 -0
- package/docs/images/2026-03-12-machine-readable-artifacts.png +0 -0
- package/docs/images/2026-03-12-pr-review-flow.png +0 -0
- package/docs/images/2026-03-12-security-pack.png +0 -0
- package/docs/images/adversarial-debate.png +0 -0
- package/docs/images/doc-verify-fix-plan.png +0 -0
- package/docs/images/hero.png +0 -0
- package/docs/images/pipeline-overview.png +0 -0
- package/docs/images/security-finding-card.png +0 -0
- package/docs/plans/2026-03-11-structured-output-migration-plan.md +288 -0
- package/docs/plans/2026-03-12-audit-bug-fixes-surgical-plan.md +193 -0
- package/docs/plans/2026-03-12-enterprise-security-pack-e2e-plan.md +59 -0
- package/docs/plans/2026-03-12-local-security-skills-integration-plan.md +39 -0
- package/docs/plans/2026-03-12-pr-review-strategic-fix-flow.md +78 -0
- package/evals/evals.json +366 -102
- package/modes/extended.md +2 -2
- package/modes/fix-loop.md +30 -30
- package/modes/fix-pipeline.md +32 -6
- package/modes/large-codebase.md +14 -15
- package/modes/local-sequential.md +44 -20
- package/modes/loop.md +56 -56
- package/modes/parallel.md +3 -3
- package/modes/scaled.md +2 -2
- package/modes/single-file.md +3 -3
- package/modes/small.md +11 -11
- package/package.json +10 -1
- package/prompts/fixer.md +37 -23
- package/prompts/hunter.md +39 -20
- package/prompts/referee.md +34 -20
- package/prompts/skeptic.md +25 -22
- package/schemas/coverage.schema.json +67 -0
- package/schemas/examples/findings.invalid.json +13 -0
- package/schemas/examples/findings.valid.json +17 -0
- package/schemas/findings.schema.json +76 -0
- package/schemas/fix-plan.schema.json +94 -0
- package/schemas/fix-report.schema.json +105 -0
- package/schemas/fix-strategy.schema.json +99 -0
- package/schemas/recon.schema.json +31 -0
- package/schemas/referee.schema.json +46 -0
- package/schemas/shared.schema.json +51 -0
- package/schemas/skeptic.schema.json +21 -0
- package/scripts/bug-hunter-state.cjs +35 -12
- package/scripts/code-index.cjs +11 -4
- package/scripts/fix-lock.cjs +95 -25
- package/scripts/payload-guard.cjs +24 -10
- package/scripts/pr-scope.cjs +181 -0
- package/scripts/render-report.cjs +346 -0
- package/scripts/run-bug-hunter.cjs +667 -32
- package/scripts/schema-runtime.cjs +273 -0
- package/scripts/schema-validate.cjs +40 -0
- package/scripts/tests/bug-hunter-state.test.cjs +68 -3
- package/scripts/tests/code-index.test.cjs +15 -0
- package/scripts/tests/fix-lock.test.cjs +60 -2
- package/scripts/tests/fixtures/flaky-worker.cjs +6 -1
- package/scripts/tests/fixtures/low-confidence-worker.cjs +8 -2
- package/scripts/tests/fixtures/success-worker.cjs +6 -1
- package/scripts/tests/payload-guard.test.cjs +154 -2
- package/scripts/tests/pr-scope.test.cjs +212 -0
- package/scripts/tests/render-report.test.cjs +180 -0
- package/scripts/tests/run-bug-hunter.test.cjs +686 -2
- package/scripts/tests/security-skills-integration.test.cjs +29 -0
- package/scripts/tests/skills-packaging.test.cjs +30 -0
- package/scripts/tests/worktree-harvest.test.cjs +66 -0
- package/scripts/worktree-harvest.cjs +62 -9
- package/skills/README.md +19 -0
- package/skills/commit-security-scan/SKILL.md +63 -0
- package/skills/security-review/SKILL.md +57 -0
- package/skills/threat-model-generation/SKILL.md +47 -0
- package/skills/vulnerability-validation/SKILL.md +59 -0
- package/templates/subagent-wrapper.md +12 -3
- package/modes/_dispatch.md +0 -121
package/modes/extended.md
CHANGED
@@ -35,7 +35,7 @@ After Recon completes, read `.bug-hunter/recon.md` to extract the risk map and t

 Partition files from `triage.scanOrder` (or the Recon risk map if no triage) into chunks:
 - **Service-aware partitioning (preferred):** If triage detected multiple domains, partition by domain.
-- **Risk-tier partitioning (fallback):** Process CRITICAL files first, then HIGH, then MEDIUM.
+- **Risk-tier partitioning (fallback):** Process CRITICAL files first, then HIGH, then MEDIUM, then LOW.
 - Chunk size: FILE_BUDGET ÷ 2 files per chunk (keep chunks small to avoid compaction).
 - Keep same-directory files together when possible.

@@ -67,7 +67,7 @@ For each chunk:

 ### 5d. Merge all findings

-After all chunks complete, merge findings from state into `.bug-hunter/findings.
+After all chunks complete, merge findings from state into `.bug-hunter/findings.json`.

 If TOTAL FINDINGS: 0, skip Skeptic and Referee. Go to Step 7 (Final Report) in SKILL.md.

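The risk-tier fallback and the chunk-size rule in `extended.md` can be sketched in a few lines. This is a minimal illustration, not code from the package: the `{ path, tier }` file shape, the function name `partitionByRiskTier`, and rounding `FILE_BUDGET ÷ 2` down with a floor of one file are all assumptions.

```javascript
// Sketch only: tier names come from the mode file; everything else
// (file shape, function name, rounding) is assumed for illustration.
const TIER_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW"];

function partitionByRiskTier(files, fileBudget) {
  // Chunk size is FILE_BUDGET / 2, rounded down, never below one file.
  const chunkSize = Math.max(1, Math.floor(fileBudget / 2));

  // Unknown tiers sort last instead of jumping ahead of CRITICAL.
  const rank = (tier) => {
    const i = TIER_ORDER.indexOf(tier);
    return i === -1 ? TIER_ORDER.length : i;
  };
  const dirOf = (p) => p.slice(0, p.lastIndexOf("/") + 1);

  // Sort by tier first, then by directory so same-directory files stay adjacent.
  const ordered = [...files].sort((a, b) => {
    const byTier = rank(a.tier) - rank(b.tier);
    return byTier !== 0 ? byTier : dirOf(a.path).localeCompare(dirOf(b.path));
  });

  const chunks = [];
  for (let i = 0; i < ordered.length; i += chunkSize) {
    chunks.push(ordered.slice(i, i + chunkSize).map((f) => f.path));
  }
  return chunks;
}
```

A chunk can straddle two tiers at a boundary; the sketch only guarantees that no lower-tier file is processed before a higher-tier one.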
package/modes/fix-loop.md
CHANGED
@@ -17,58 +17,56 @@ When `LOOP_MODE=true` AND `FIX_MODE=true`, before running the first pipeline ite
 2. Call the `ralph_start` tool:

 ```
+MAX_FIX_LOOP_ITERATIONS = max(
+  15,
+  min(250, ceil(SCANNABLE_FILES / max(FILE_BUDGET, 1)) + ELIGIBLE_BUG_COUNT + 8)
+)
+
 ralph_start({
   name: "bug-hunter-fix-audit",
   taskContent: <the TODO.md content below>,
-  maxIterations:
+  maxIterations: MAX_FIX_LOOP_ITERATIONS
 })
 ```

 3. The ralph-loop system will then drive iteration. Each iteration:
    - You receive the task prompt with the current checklist state.
    - You execute one iteration of find + fix.
-   - You update `.bug-hunter/coverage.
-   - If all bugs are FIXED and all
+   - You update `.bug-hunter/coverage.json` with results and render `.bug-hunter/coverage.md`.
+   - If all bugs are FIXED and all queued scannable source files are DONE → output `<promise>COMPLETE</promise>`.
    - Otherwise → call `ralph_done` to proceed to the next iteration.

 **Do NOT manually loop or re-invoke yourself.** The ralph-loop system handles iteration automatically.

 ## Coverage file extension for fix mode

-The `.bug-hunter/coverage.
+The `.bug-hunter/coverage.json` file carries the same loop state, plus fix
+entries:

-```
-
-
-
-
-
-
-BUG-12|FIXED|2|src/api/users.ts
-
-## Test Results
-<!-- One line per iteration. Format: ITERATION|PASSED|FAILED|NEW_FAILURES|RESOLVED -->
-1|45|3|2|0
-2|47|1|0|1
+```json
+{
+  "fixes": [
+    { "bugId": "BUG-3", "status": "FIXED" },
+    { "bugId": "BUG-12", "status": "FIX_FAILED" }
+  ]
+}
 ```

-**Parsing rule:** For each BUG-ID, use the LAST entry in the Fixes section. Earlier entries for the same BUG-ID are history — only the latest matters.
-
 ## Loop iteration logic

 ```
 For each iteration:
-1. Read coverage
-2. Collect
-   - Unfixed bugs: latest
-   - Unscanned files:
+1. Read coverage.json
+2. Collect:
+   - Unfixed bugs: latest fix status in {FIX_REVERTED, FIX_FAILED, FIX_CONFLICT, SKIPPED, FIXER_BUG, MANUAL_REVIEW}
+   - Unscanned files: file status != done
 3. If unfixed bugs exist OR unscanned files exist:
    a. If unscanned files -> run Phase 1 (find pipeline) on them -> get new confirmed bugs
    b. Combine: unfixed bugs + newly confirmed bugs
    c. Run Phase 2 (fix + verify) on combined list
-   d. Update coverage
+   d. Update coverage.json and re-render coverage.md
    e. Call ralph_done to proceed to next iteration
-4. If all bugs FIXED and all
+4. If all bugs FIXED and all queued scannable source files are DONE:
    -> Run final test suite one more time
    -> If no new failures:
       Output <promise>COMPLETE</promise>
@@ -87,6 +85,8 @@ Use this as the `taskContent` parameter when calling `ralph_start`:
 ## Discovery Tasks
 - [ ] All CRITICAL files scanned
 - [ ] All HIGH files scanned
+- [ ] All MEDIUM files scanned
+- [ ] All LOW files scanned
 - [ ] Findings verified through Skeptic+Referee pipeline

 ## Fix Tasks
@@ -100,13 +100,13 @@ Use this as the `taskContent` parameter when calling `ralph_start`:
 - [ ] ALL_TASKS_COMPLETE

 ## Instructions
-1. Read .bug-hunter/coverage.
-2. Parse
-3. Parse
+1. Read .bug-hunter/coverage.json for previous iteration state
+2. Parse the `files` array — collect unscanned CRITICAL/HIGH/MEDIUM/LOW files
+3. Parse the `fixes` array — collect unfixed bugs (latest entry not FIXED)
 4. If unscanned files exist: run Phase 1 (find pipeline) on them
 5. If unfixed bugs exist: run Phase 2 (fix pipeline) on them
-6. Update coverage
-7. Output <promise>COMPLETE</promise> when all bugs are FIXED and no new test failures
+6. Update coverage.json with results and render coverage.md
+7. Output <promise>COMPLETE</promise> only when all queued files are DONE, all discovered bugs are FIXED, and no new test failures remain
 8. Otherwise call ralph_done to continue to the next iteration
 ```

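The iteration cap and the worklist rules that `fix-loop.md` now states can be read as two small functions. The sketch below is an illustration under assumptions: the function names are hypothetical, and it treats later entries in the `fixes` array as newer, which the diff implies ("latest entry") but does not spell out.

```javascript
// Sketch only: the formula and the status set are quoted from fix-loop.md;
// function names and the "array order is chronological" rule are assumptions.
const UNFIXED_STATUSES = new Set([
  "FIX_REVERTED", "FIX_FAILED", "FIX_CONFLICT",
  "SKIPPED", "FIXER_BUG", "MANUAL_REVIEW",
]);

function maxFixLoopIterations(scannableFiles, fileBudget, eligibleBugCount) {
  const chunks = Math.ceil(scannableFiles / Math.max(fileBudget, 1));
  return Math.max(15, Math.min(250, chunks + eligibleBugCount + 8));
}

function collectFixLoopWork(coverage) {
  // Keep only the latest status per bug: later entries overwrite earlier ones.
  const latest = new Map();
  for (const fix of coverage.fixes ?? []) latest.set(fix.bugId, fix.status);

  const unfixedBugs = [...latest]
    .filter(([, status]) => UNFIXED_STATUSES.has(status))
    .map(([bugId]) => bugId);

  const unscannedFiles = (coverage.files ?? [])
    .filter((file) => file.status !== "done")
    .map((file) => file.path);

  return { unfixedBugs, unscannedFiles };
}
```

An empty worklist is necessary but not sufficient for `<promise>COMPLETE</promise>`: the mode file still requires a final test run with no new failures.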
package/modes/fix-pipeline.md
CHANGED
@@ -50,11 +50,14 @@ DYNAMIC_TTL = max(1800, ELIGIBLE_COUNT * 600) # 10 min per bug, minimum 30 min
 ```
 node "$SKILL_DIR/scripts/fix-lock.cjs" acquire ".bug-hunter/fix.lock" $DYNAMIC_TTL
 ```
+Record `LOCK_OWNER_TOKEN` from the returned JSON (`lock.ownerToken`).
 If lock cannot be acquired, stop Phase 2 to avoid concurrent mutation.

+**Owner token:** `acquire` returns `lock.ownerToken`; renew/release now require that token. Persist it for the entire Phase 2 run as `LOCK_OWNER_TOKEN`.
+
 **Lock renewal:** During Step 9 execution, renew the lock after each bug fix to prevent TTL expiry on long runs:
 ```
-node "$SKILL_DIR/scripts/fix-lock.cjs" renew ".bug-hunter/fix.lock"
+node "$SKILL_DIR/scripts/fix-lock.cjs" renew ".bug-hunter/fix.lock" "$LOCK_OWNER_TOKEN"
 ```

 **8b. Detect verification commands**
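The owner-token change means only the process that acquired the lock can renew or release it. The in-memory model below illustrates that contract together with the `DYNAMIC_TTL` rule. It is not the implementation in `fix-lock.cjs`, which works on a lock file and is not shown in this diff; the class name, method names, and the `ok` flag are hypothetical, and only `lock.ownerToken` and the TTL formula come from the source.

```javascript
// Illustrative in-memory model only; names and return shapes are assumed.
function dynamicTtlSeconds(eligibleCount) {
  // fix-pipeline.md: 10 minutes per eligible bug, minimum 30 minutes.
  return Math.max(1800, eligibleCount * 600);
}

class OwnerTokenLock {
  constructor(now = () => Date.now()) {
    this.now = now; // injectable clock in milliseconds, so expiry is testable
    this.held = null; // { ownerToken, expiresAt } while the lock is held
  }

  acquire(ttlSeconds) {
    // Refuse while another holder's lock is still within its TTL.
    if (this.held && this.held.expiresAt > this.now()) return { ok: false };
    // Non-cryptographic token; good enough for a sketch.
    const ownerToken = "tok-" + Math.random().toString(36).slice(2);
    this.held = { ownerToken, expiresAt: this.now() + ttlSeconds * 1000 };
    return { ok: true, lock: { ownerToken } };
  }

  renew(ownerToken, ttlSeconds) {
    // Only the current holder may extend the TTL.
    if (!this.held || this.held.ownerToken !== ownerToken) return { ok: false };
    this.held.expiresAt = this.now() + ttlSeconds * 1000;
    return { ok: true };
  }

  release(ownerToken) {
    // Only the current holder may release.
    if (!this.held || this.held.ownerToken !== ownerToken) return { ok: false };
    this.held = null;
    return { ok: true };
  }
}
```

The token check is what prevents a second run, started after the first run's lock expired, from having its lock released by the first run's cleanup path.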
@@ -81,7 +84,16 @@ If `TEST_COMMAND` is not null:

 If baseline cannot run, set `BASELINE=null` and `FLAKY_TESTS={}` and continue with manual-verification warning.

-**8d. Build sequential fix plan**
+**8d. Build fix strategy + sequential fix plan**
+
+Before deciding what to patch, write `.bug-hunter/fix-strategy.json` and `.bug-hunter/fix-strategy.md`.
+The strategy artifact must classify each confirmed bug into one of:
+- `safe-autofix`
+- `manual-review`
+- `larger-refactor`
+- `architectural-remediation`
+
+If `PLAN_ONLY_MODE=true`, stop after the strategy artifact and fix-plan preview are written.

 Prepare bug queue:
 1. Apply confidence gate:
@@ -185,7 +197,7 @@ For each batch in order:

 8a. Renew lock after each batch:
 ```
-node "$SKILL_DIR/scripts/fix-lock.cjs" renew ".bug-hunter/fix.lock"
+node "$SKILL_DIR/scripts/fix-lock.cjs" renew ".bug-hunter/fix.lock" "$LOCK_OWNER_TOKEN"
 ```

 **Path B — Direct mode (`WORKTREE_MODE=false`):**
@@ -201,7 +213,7 @@ For each batch in order:
 7b. Record commit hash per BUG-ID in a fix ledger.
 8b. **Renew lock** after each bug fix:
 ```
-node "$SKILL_DIR/scripts/fix-lock.cjs" renew ".bug-hunter/fix.lock"
+node "$SKILL_DIR/scripts/fix-lock.cjs" renew ".bug-hunter/fix.lock" "$LOCK_OWNER_TOKEN"
 ```

 If a bug cannot be fixed, mark `SKIPPED` and continue.
@@ -260,7 +272,9 @@ Use exact fixed scope from the real base commit:
 2. Build changed hunks list.
 3. Run one lightweight Hunter on changed hunks only with a **severity floor of MEDIUM**:
    - Only report fixer-introduced bugs at MEDIUM severity or above.
-   - LOW-severity issues from the fixer are logged
+   - LOW-severity issues from the fixer are logged in `.bug-hunter/fix-report.json`
+     (and optional derived `.bug-hunter/fix-report.md`) as informational notes
+     but do NOT trigger `FIXER_BUG` status.

 This removes ambiguity from `<base-branch>` and works for path scans, staged scans, and branch scans.

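The severity floor amounts to a two-way split of the post-fix findings. A minimal sketch, assuming each finding carries a `severity` string: the function name and the choice to treat unknown severities as blocking are assumptions, not behaviour confirmed by the diff.

```javascript
// Sketch only: severity names follow the mode files; the finding shape,
// function name, and handling of unknown severities are assumed.
const SEVERITY_RANK = { LOW: 0, MEDIUM: 1, HIGH: 2, CRITICAL: 3 };

function applySeverityFloor(findings, floor = "MEDIUM") {
  const blocking = []; // these would trigger FIXER_BUG
  const notes = []; // these are only logged as informational notes
  for (const finding of findings) {
    const rank = SEVERITY_RANK[String(finding.severity).toUpperCase()];
    // Unknown severities are kept as blocking so nothing is silently dropped.
    if (rank === undefined || rank >= SEVERITY_RANK[floor]) {
      blocking.push(finding);
    } else {
      notes.push(finding);
    }
  }
  return { blocking, notes };
}
```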
@@ -301,7 +315,7 @@ If stash was created (not applicable in dry-run mode):

 Always release single-writer lock at the end (success or failure path):
 ```
-node "$SKILL_DIR/scripts/fix-lock.cjs" release ".bug-hunter/fix.lock"
+node "$SKILL_DIR/scripts/fix-lock.cjs" release ".bug-hunter/fix.lock" "$LOCK_OWNER_TOKEN"
 ```
 If an earlier step aborts Phase 2, run the same release command AND worktree cleanup-all in best-effort cleanup before returning.

@@ -375,6 +389,18 @@ Write `.bug-hunter/fix-report.json` alongside the markdown report:
 }
 ```

+Validate it immediately:
+
+```bash
+node "$SKILL_DIR/scripts/schema-validate.cjs" fix-report ".bug-hunter/fix-report.json"
+```
+
+Render the Markdown companion when humans need it:
+
+```bash
+node "$SKILL_DIR/scripts/render-report.cjs" fix-report ".bug-hunter/fix-report.json" > ".bug-hunter/fix-report.md"
+```
+
 Rules:
 - `dry_run: true` when `DRY_RUN_MODE=true` — the `fixes` array contains planned diffs instead of commit hashes.
 - `circuit_breaker_tripped: true` when the circuit breaker halted the pipeline.
package/modes/large-codebase.md
CHANGED
@@ -67,7 +67,7 @@ This is fast — no file reading, just directory listing and heuristic classific
 Process ONE domain at a time, running the **full pipeline** (Recon → Hunter → Skeptic → Referee) within each domain:

 ```
-For each domain (CRITICAL first, then HIGH, then MEDIUM):
+For each domain (CRITICAL first, then HIGH, then MEDIUM, then LOW):
   1. Get this domain's file list:
      - If triage exists: use triage.domainFileLists[domainPath]
      - If no triage: use fd/find to list files in this domain's directory
@@ -78,9 +78,9 @@ For each domain (CRITICAL first, then HIGH, then MEDIUM):

 Write domain results to:
 .bug-hunter/domains/<domain-name>/recon.md
-.bug-hunter/domains/<domain-name>/findings.
-.bug-hunter/domains/<domain-name>/skeptic.
-.bug-hunter/domains/<domain-name>/referee.
+.bug-hunter/domains/<domain-name>/findings.json
+.bug-hunter/domains/<domain-name>/skeptic.json
+.bug-hunter/domains/<domain-name>/referee.json

 Record in state:
 node "$SKILL_DIR/scripts/bug-hunter-state.cjs" record-findings ...
@@ -123,7 +123,7 @@ Write boundary results to `.bug-hunter/domains/_boundaries/`.

 After all domains + boundaries are audited:

-1. Read all domain `referee.
+1. Read all domain `referee.json` files and boundary results.
 2. Merge findings, deduplicate by file + line + claim.
 3. Renumber BUG-IDs globally.
 4. Build the final report per Step 7 in SKILL.md.
@@ -163,17 +163,16 @@ Use `.bug-hunter/state.json` with domain-aware structure:
 - Iteration N-1: Tier 3 merge and report
 - Iteration N: Coverage check → DONE or continue with missed domains

-The ralph-loop's coverage check reads the state file and only marks DONE when all
+The ralph-loop's coverage check reads the state file and only marks DONE when all queued domains show status `done`.

-##
+## Default autonomous behavior

-
-
-
-
-
-
-```
+Autonomous mode is exhaustive by default:
+- Finish all CRITICAL domains first.
+- Then continue through HIGH domains.
+- Then continue through MEDIUM domains.
+- Then continue through LOW domains.
+- Only stop when the domain queue is exhausted, the user interrupts, or a hard blocker prevents safe progress.

 ## Optimization: Delta-first for repeat scans

@@ -209,4 +208,4 @@ When executing large-codebase mode:
 - [ ] Tier 3: Merge all domain + boundary findings
 - [ ] Tier 3: Deduplicate and renumber
 - [ ] Tier 3: Build final report with per-domain breakdown
-- [ ] Coverage: All
+- [ ] Coverage: All queued domains done? If not, continue.
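The Tier 3 merge in `large-codebase.md` (merge, deduplicate by file + line + claim, renumber BUG-IDs globally) can be sketched as below. The field names `file`, `lines`, `claim`, and `bugId` are assumptions modelled on the `coverage.json` example elsewhere in this diff; the package's real merge logic is not part of the diff.

```javascript
// Sketch only: field names and the function name are assumed for illustration.
function mergeDomainFindings(domainResults) {
  const seen = new Set();
  const merged = [];
  for (const findings of domainResults) {
    for (const finding of findings) {
      // Two findings are duplicates when file, line range, and claim all match.
      const key = JSON.stringify([finding.file, finding.lines, finding.claim]);
      if (seen.has(key)) continue;
      seen.add(key);
      merged.push(finding);
    }
  }
  // Renumber in merge order so IDs are unique across domains.
  return merged.map((finding, index) => ({ ...finding, bugId: `BUG-${index + 1}` }));
}
```

Because per-domain runs each start numbering from their own `BUG-1`, the original IDs cannot serve as the deduplication key; that is why the key is built from the finding's location and claim instead.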
package/modes/local-sequential.md
CHANGED
@@ -6,7 +6,10 @@ This is NOT a degraded mode. The skill is designed to work fully here.

 ## How It Works

-You (the orchestrating agent) play each role yourself, sequentially. Between
+You (the orchestrating agent) play each role yourself, sequentially. Between
+phases you write canonical JSON artifacts so later phases can reference them
+without holding everything in working memory. Markdown reports are derived from
+those JSON files when humans need them.

 All state files go in `.bug-hunter/` relative to the working directory.

@@ -26,7 +29,9 @@ All state files go in `.bug-hunter/` relative to the working directory.
    - Use `triage.scanOrder` as the file order for Phase B.
    - Recon's remaining job: read 3-5 key files from CRITICAL domains to identify **tech stack** (framework, auth mechanism, database, key dependencies) and **trust boundary patterns** (how routes are defined, how auth middleware is applied, etc.).
    - If git is available, check recently changed files with `git log`.
-   - Write your Recon output to `.bug-hunter/recon.
+   - Write your Recon output to `.bug-hunter/recon.json` if structured output is
+     requested; otherwise keep `.bug-hunter/recon.md` as a temporary fallback
+     until the Recon prompt is migrated.

 3. **If `.bug-hunter/triage.json` does NOT exist** (fallback — Recon called directly):
    - Execute the full Recon instructions: discover files, classify, compute FILE_BUDGET.
@@ -44,12 +49,16 @@ All state files go in `.bug-hunter/` relative to the working directory.
 2. Read `SKILL_DIR/prompts/doc-lookup.md` with the Read tool.
 3. **Switch mindset**: you are now a Bug Hunter. Your ONLY job is to find behavioral bugs.
 4. Execute the Hunter instructions yourself:
-   - Read files in risk-map order: CRITICAL → HIGH → MEDIUM.
+   - Read files in risk-map order: CRITICAL → HIGH → MEDIUM → LOW.
    - For each file, use the Read tool. Do NOT rely on memory from earlier phases.
    - Apply the mandatory security checklist sweep (Phase 3 in hunter.md) on every CRITICAL and HIGH file.
    - Track which files you actually read — be honest about coverage.
    - For each bug found, record it in the exact BUG-N format specified in hunter.md.
-5. Write your complete findings to `.bug-hunter/findings.
+5. Write your complete findings to `.bug-hunter/findings.json`.
+6. Validate the artifact immediately:
+   ```bash
+   node "$SKILL_DIR/scripts/schema-validate.cjs" findings ".bug-hunter/findings.json"
+   ```

 **Context management:** If you notice earlier files becoming hazy in your memory:
 - STOP expanding to new files.
@@ -84,9 +93,10 @@ If the Recon risk map contains more files than FILE_BUDGET, do NOT try to read t
    ```bash
    node "$SKILL_DIR/scripts/bug-hunter-state.cjs" mark-chunk ".bug-hunter/state.json" "<chunk-id>" done
    ```
-3. After all chunks: merge findings from `.bug-hunter/state.json` into
+3. After all chunks: merge findings from `.bug-hunter/state.json` into
+   `.bug-hunter/findings.json`.

-**Gap-fill:** After scanning, compare FILES SCANNED against the risk map. If any
+**Gap-fill:** After scanning, compare FILES SCANNED against the risk map. If any queued scannable files are in FILES SKIPPED, read them now in priority order (CRITICAL → HIGH → MEDIUM → LOW) and append any new findings. If you truly cannot read them (context exhaustion), leave them in FILES SKIPPED so loop mode can resume them next.

 If TOTAL FINDINGS: 0, skip Phases C and D. Go directly to Step 7 (Final Report) in SKILL.md.

@@ -95,7 +105,7 @@ If TOTAL FINDINGS: 0, skip Phases C and D. Go directly to Step 7 (Final Report)
 1. Read `SKILL_DIR/prompts/skeptic.md` with the Read tool.
 2. Read `SKILL_DIR/prompts/doc-lookup.md` with the Read tool.
 3. **Switch mindset completely**: you are now the Skeptic. Your job is to DISPROVE false positives. Forget the pride of finding them — you want to kill weak claims.
-4. Read `.bug-hunter/findings.
+4. Read `.bug-hunter/findings.json` to get the findings list.
 5. For EACH finding:
    - Re-read the actual code at the reported file and line with the Read tool. This is MANDATORY — do not evaluate from memory.
    - Read all cross-referenced files.
@@ -103,7 +113,12 @@ If TOTAL FINDINGS: 0, skip Phases C and D. Go directly to Step 7 (Final Report)
    - Check framework/middleware protections the Hunter may have missed.
    - Apply the risk calculation: `EV = (confidence% × points) - ((100 - confidence%) × 2 × points)`. Only DISPROVE when EV is positive (confidence > 67%).
    - For Critical bugs: need >67% confidence AND all cross-references read.
-6. Write your complete Skeptic output to `.bug-hunter/skeptic.
+6. Write your complete Skeptic output to `.bug-hunter/skeptic.json` in the
+   format from skeptic.md.
+7. Validate it immediately:
+   ```bash
+   node "$SKILL_DIR/scripts/schema-validate.cjs" skeptic ".bug-hunter/skeptic.json"
+   ```

 **Important:** When switching from Hunter to Skeptic, genuinely try to disprove your own findings. The point of this phase is adversarial review. If you cannot genuinely argue against a finding, ACCEPT it and move on — do not waste time rubber-stamping.

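The Skeptic's risk calculation is easier to check with numbers. The formula simplifies to `points × (3 × confidence% − 200)`, so it crosses zero at about 66.7% confidence, which the prompt rounds to 67%. The sketch below evaluates it directly; the function names are hypothetical, and because confidence is a percentage the result is scaled by 100 relative to the raw point value.

```javascript
// Sketch only: the formula is quoted from local-sequential.md; names are assumed.
function skepticExpectedValue(confidencePct, points) {
  // Gain `points` when the disproof is right, lose 2x `points` when it is wrong.
  return confidencePct * points - (100 - confidencePct) * 2 * points;
}

function shouldDisprove(confidencePct, points) {
  return skepticExpectedValue(confidencePct, points) > 0;
}
```

For a 10-point finding, 67% confidence gives an EV of 10 (disprove) while 66% gives −20 (accept). The extra rule for Critical bugs, that all cross-references must also have been read, is a separate gate and is not modelled here.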
@@ -111,13 +126,21 @@ If TOTAL FINDINGS: 0, skip Phases C and D. Go directly to Step 7 (Final Report)

 1. Read `SKILL_DIR/prompts/referee.md` with the Read tool.
 2. **Switch mindset**: you are the impartial Referee. You trust neither the Hunter nor the Skeptic.
-3. Read both `.bug-hunter/findings.
+3. Read both `.bug-hunter/findings.json` and `.bug-hunter/skeptic.json`.
 4. For each finding:
    - **Tier 1 (all Critical + top 15 by severity):** Re-read the actual code yourself a THIRD time using the Read tool. Construct the runtime trigger independently. Make your own judgment.
    - **Tier 2 (remaining):** Evaluate evidence quality. Whose code quotes are more specific? Whose runtime trigger is more concrete?
 5. Make final REAL BUG / NOT A BUG verdicts with severity calibration.
-6. Write the final Referee
-7.
+6. Write the final Referee verdicts to `.bug-hunter/referee.json`.
+7. Validate them immediately:
+   ```bash
+   node "$SKILL_DIR/scripts/schema-validate.cjs" referee ".bug-hunter/referee.json"
+   ```
+8. Render `.bug-hunter/report.md` from the JSON artifacts:
+   ```bash
+   node "$SKILL_DIR/scripts/render-report.cjs" report ".bug-hunter/findings.json" ".bug-hunter/referee.json" > ".bug-hunter/report.md"
+   ```
+9. Proceed to Step 7 (Final Report) in SKILL.md.

 ## State Files Summary

@@ -125,10 +148,11 @@ After a complete local-sequential run, these files should exist:

 | File | Phase | Content |
 |------|-------|---------|
-| `.bug-hunter/recon.
-| `.bug-hunter/findings.
-| `.bug-hunter/skeptic.
-| `.bug-hunter/referee.
+| `.bug-hunter/recon.json` | A | Recon artifact when structured output is used |
+| `.bug-hunter/findings.json` | B | All Hunter findings in canonical JSON |
+| `.bug-hunter/skeptic.json` | C | Skeptic challenges in canonical JSON |
+| `.bug-hunter/referee.json` | D | Final verdicts in canonical JSON |
+| `.bug-hunter/report.md` | D | Human-readable report rendered from JSON |
 | `.bug-hunter/state.json` | B (chunked) | Chunk progress, findings ledger |
 | `.bug-hunter/source-files.json` | A | Source file list (for state init) |

@@ -136,8 +160,8 @@ After a complete local-sequential run, these files should exist:

 After Phase D, check coverage:

-- If all
-- If any
-  - If `--loop` mode: the ralph-loop
-  - If not `--loop`: include a coverage WARNING in the Final Report and recommend
-- Do NOT claim "full coverage" or "audit complete" unless every
+- If all queued scannable source files were scanned: proceed to Final Report.
+- If any queued scannable files were skipped:
+  - If `--loop` mode: the ralph-loop must iterate and cover the remaining queue next.
+  - If not `--loop`: include a coverage WARNING in the Final Report and recommend loop mode.
+- Do NOT claim "full coverage" or "audit complete" unless every queued scannable source file was actually read with the Read tool and has status DONE.
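The post-Phase-D coverage check, combined with the gap-fill ordering, reduces to a set difference followed by a three-way decision. A minimal sketch under assumptions: the `{ path, tier }` queue shape, the function name, and the action labels are hypothetical and only stand in for the outcomes the mode file describes.

```javascript
// Sketch only: queue shape, function name, and action labels are assumed.
const GAP_FILL_PRIORITY = ["CRITICAL", "HIGH", "MEDIUM", "LOW"];

function checkCoverage(queue, scannedPaths, loopMode) {
  const scanned = new Set(scannedPaths);
  // Skipped files, ordered the way gap-fill should read them.
  const skipped = queue
    .filter((file) => !scanned.has(file.path))
    .sort((a, b) => GAP_FILL_PRIORITY.indexOf(a.tier) - GAP_FILL_PRIORITY.indexOf(b.tier))
    .map((file) => file.path);

  let action;
  if (skipped.length === 0) action = "FINAL_REPORT";
  else if (loopMode) action = "CONTINUE_LOOP";
  else action = "FINAL_REPORT_WITH_COVERAGE_WARNING";

  // "Full coverage" may only be claimed when nothing was skipped.
  return { fullCoverage: skipped.length === 0, skipped, action };
}
```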
package/modes/loop.md
CHANGED
@@ -1,6 +1,6 @@
 # Ralph-Loop Mode (`--loop`)

-When `--loop` is present, the bug-hunter wraps itself in a ralph-loop that keeps iterating until the audit achieves full coverage. This is for thorough, autonomous audits where you want every file examined.
+When `--loop` is present, the bug-hunter wraps itself in a ralph-loop that keeps iterating until the audit achieves full queued coverage. This is for thorough, autonomous audits where you want every queued scannable source file examined unless the user interrupts.

 ## CRITICAL: Starting the ralph-loop

@@ -12,65 +12,63 @@ When `LOOP_MODE=true` is set (from `--loop` flag), before running the first pipe
 2. Call the `ralph_start` tool:

 ```
+MAX_LOOP_ITERATIONS = max(12, min(200, ceil(SCANNABLE_FILES / max(FILE_BUDGET, 1)) + 8))
+
 ralph_start({
   name: "bug-hunter-audit",
   taskContent: <the TODO.md content below>,
-  maxIterations:
+  maxIterations: MAX_LOOP_ITERATIONS
 })
 ```

 3. The ralph-loop system will then drive iteration. Each iteration:
    - You receive the task prompt with the current checklist state.
    - You execute one iteration of the bug-hunt pipeline (steps below).
-   - You update `.bug-hunter/coverage.
-   - If ALL
+   - You update `.bug-hunter/coverage.json` with results and render `.bug-hunter/coverage.md` from it.
+   - If ALL queued scannable source files are DONE → output `<promise>COMPLETE</promise>` to end the loop.
    - Otherwise → call `ralph_done` to proceed to the next iteration.

 **Do NOT manually loop or re-invoke yourself.** The ralph-loop system handles iteration automatically after you call `ralph_start`.

 ## How it works

-1. **First iteration**: Run the normal pipeline (Recon → Hunters → Skeptics →
+1. **First iteration**: Run the normal pipeline (Recon → Hunters → Skeptics →
+   Referee). At the end, write canonical coverage state to
+   `.bug-hunter/coverage.json` and render `.bug-hunter/coverage.md` from it.

 2. **Coverage check**: After each iteration, evaluate:
-   - If ALL
-   - If any
-   -
-
-3. **Subsequent iterations**: Each new iteration reads
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-MEDIUM|src/utils/format.ts|SKIPPED|0|
-TEST|src/auth/login.test.ts|CONTEXT|1|
-
-## Bugs
-<!-- One line per confirmed bug. Format: BUG-ID|SEVERITY|FILE|LINES|ONE_LINE_DESCRIPTION -->
-BUG-3|Critical|src/auth/login.ts|45-52|JWT token not validated before use
-BUG-7|Medium|src/auth/login.ts|89|Password comparison uses timing-unsafe equality
-BUG-12|Low|src/api/users.ts|120-125|Missing null check on optional profile field
+   - If ALL queued scannable source files show status DONE → output `<promise>COMPLETE</promise>` → loop ends
+   - If any queued scannable source files are SKIPPED or PARTIAL → call `ralph_done` → loop continues
+   - Do NOT stop just because the current prioritized tier is clean; continue descending through MEDIUM and LOW files automatically
+
+3. **Subsequent iterations**: Each new iteration reads
+   `.bug-hunter/coverage.json` to see what's already been done, then runs the
+   pipeline ONLY on uncovered files. New findings are appended to the
+   cumulative bug list.
+
+## Coverage file format (canonical)
+
+**`.bug-hunter/coverage.json`:**
+```json
+{
+  "schemaVersion": 1,
+  "iteration": 1,
+  "status": "IN_PROGRESS",
+  "files": [
+    { "path": "src/auth/login.ts", "status": "done" },
+    { "path": "src/api/payments.ts", "status": "pending" }
+  ],
+  "bugs": [
+    { "bugId": "BUG-3", "severity": "Critical", "file": "src/auth/login.ts", "claim": "JWT token not validated before use" }
+  ],
+  "fixes": [
+    { "bugId": "BUG-3", "status": "MANUAL_REVIEW" }
+  ]
+}
 ```

+**`.bug-hunter/coverage.md`** is derived from the JSON artifact for humans.
+
 ## TODO.md task content for ralph_start

 Use this as the `taskContent` parameter when calling `ralph_start`:
@@ -82,44 +80,46 @@ Use this as the `taskContent` parameter when calling `ralph_start`:
|
|
|
82
80
|
## Coverage Tasks
|
|
83
81
|
- [ ] All CRITICAL files scanned
|
|
84
82
|
- [ ] All HIGH files scanned
|
|
83
|
+
- [ ] All MEDIUM files scanned
|
|
84
|
+
- [ ] All LOW files scanned
|
|
85
85
|
- [ ] Findings verified through Skeptic+Referee pipeline
|
|
86
86
|
|
|
87
87
|
## Completion
|
|
88
88
|
- [ ] ALL_TASKS_COMPLETE
|
|
89
89
|
|
|
90
90
|
## Instructions
|
|
91
|
-
1. Read .bug-hunter/coverage.
|
|
92
|
-
2. Parse the
|
|
91
|
+
1. Read .bug-hunter/coverage.json for previous iteration state
|
|
92
|
+
2. Parse the `files` array — collect all entries where `status` is not `done`
|
|
93
93
|
3. Run bug-hunter pipeline on those files only
|
|
94
|
-
4. Update coverage
|
|
95
|
-
5. Output <promise>COMPLETE</promise> when all
|
|
94
|
+
4. Update coverage JSON: change file status to `done`, append bug summaries, and render coverage.md
|
|
95
|
+
5. Output <promise>COMPLETE</promise> only when all queued source files are DONE
|
|
96
96
|
6. Otherwise call ralph_done to continue to the next iteration
|
|
97
97
|
```
|
|
98
98
|
|
|
99
99
|
## Coverage file validation
|
|
100
100
|
|
|
101
101
|
At the start of each iteration, validate the coverage file:
|
|
102
|
-
1.
|
|
103
|
-
2.
|
|
104
|
-
|
|
105
|
-
|
|
106
|
-
|
|
107
|
-
Update the CHECKSUM every time you write to the coverage file.
|
|
102
|
+
1. Validate `.bug-hunter/coverage.json` against the local coverage schema.
|
|
103
|
+
2. If validation fails, rename the bad file to `.bug-hunter/coverage.json.bak`
|
|
104
|
+
and start fresh. Warn the user.
|
|
105
|
+
3. Always regenerate `.bug-hunter/coverage.md` from the JSON artifact after a
|
|
106
|
+
successful write.
|
|
108
107
|
|
|
109
108
|
## Iteration behavior
|
|
110
109
|
|
|
111
110
|
Each iteration after the first:
|
|
112
|
-
1. Read `.bug-hunter/coverage.
|
|
113
|
-
2. Collect all
|
|
111
|
+
1. Read `.bug-hunter/coverage.json`
|
|
112
|
+
2. Collect all file entries where `status != "done"`
|
|
114
113
|
3. If none remain → output `<promise>COMPLETE</promise>` (this ends the ralph-loop)
|
|
115
114
|
4. Otherwise, run the pipeline on remaining files only (use small/parallel mode based on count)
|
|
116
|
-
5. Update
|
|
115
|
+
5. Update `coverage.json`, then render `coverage.md`
|
|
117
116
|
6. Increment ITERATION counter
|
|
118
117
|
7. Call `ralph_done` to proceed to the next iteration
|
|
119
118
|
|
|
120
119
|
## Safety
|
|
121
120
|
|
|
122
|
-
- Max
|
|
121
|
+
- Max iterations should scale with the queue size so autonomous runs do not stop early
|
|
123
122
|
- Each iteration only scans NEW files — no re-scanning already-DONE files
|
|
124
123
|
- User can stop anytime with ESC or `/ralph-stop`
|
|
125
|
-
-
|
|
124
|
+
- Canonical state is in `.bug-hunter/coverage.json`; `coverage.md` is derived
|
|
125
|
+
and fully resumable from that JSON
|
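The validation and iteration steps in the `loop.md` changes above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the package's implementation: the helper names (`fresh_state`, `is_valid`, `load_coverage`, `pending_files`, `next_action`) are hypothetical, and the hand-rolled shape check only stands in for the "local coverage schema", whose contents are not shown in this diff. Status values are compared in the lowercase form used in the JSON example.

```python
import json
import os
from pathlib import Path

# Hypothetical helpers; bug-hunter's real implementation is not shown in this diff.
REQUIRED_KEYS = ("schemaVersion", "iteration", "status", "files", "bugs", "fixes")


def fresh_state() -> dict:
    """Empty coverage state used when no valid file exists (assumed shape)."""
    return {"schemaVersion": 1, "iteration": 1, "status": "IN_PROGRESS",
            "files": [], "bugs": [], "fixes": []}


def is_valid(state) -> bool:
    """Minimal shape check; stands in for the real schema validation."""
    if not isinstance(state, dict):
        return False
    if any(key not in state for key in REQUIRED_KEYS):
        return False
    files = state["files"]
    return isinstance(files, list) and all(
        isinstance(f, dict) and "path" in f and "status" in f for f in files
    )


def load_coverage(path: Path) -> dict:
    """Read coverage.json; on failure, move it to coverage.json.bak and start fresh."""
    if not path.exists():
        return fresh_state()
    try:
        state = json.loads(path.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, UnicodeDecodeError):
        state = None
    if not is_valid(state):
        backup = path.with_name(path.name + ".bak")
        os.replace(path, backup)  # overwrites an older backup if one exists
        print(f"warning: invalid coverage file moved to {backup}")
        return fresh_state()
    return state


def pending_files(state: dict) -> list[str]:
    """Paths of all file entries whose status is not "done"."""
    return [f["path"] for f in state["files"] if f["status"] != "done"]


def next_action(state: dict) -> str:
    """Decide between ending the loop and requesting another iteration."""
    return "ralph_done" if pending_files(state) else "<promise>COMPLETE</promise>"
```

Note that a fresh state has an empty `files` queue, so a caller would need to repopulate the queue before asking `next_action` for a decision; otherwise the loop would report completion immediately.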
package/modes/parallel.md
CHANGED
|
@@ -70,7 +70,7 @@ Pass to the Hunter:
|
|
|
70
70
|
- If scout hints exist (from Step 5), use them to prioritize certain code sections, but scan all files regardless.
|
|
71
71
|
- `doc-lookup.md` contents as phase-specific context.
|
|
72
72
|
|
|
73
|
-
After completion, read `.bug-hunter/findings.
|
|
73
|
+
After completion, read `.bug-hunter/findings.json`.
|
|
74
74
|
|
|
75
75
|
**Merge scout + deep findings:** If scout pass ran, compare scout findings with deep Hunter findings. Promote any scout-only findings (bugs the deep Hunter missed) into the findings list for Skeptic review.
|
|
76
76
|
|
|
@@ -80,7 +80,7 @@ If TOTAL FINDINGS: 0, skip Skeptic and Referee. Go to Step 7 (Final Report) in S
|
|
|
80
80
|
|
|
81
81
|
## Step 5-verify: Gap-fill check
|
|
82
82
|
|
|
83
|
-
Same as small mode: compare FILES SCANNED vs risk map, re-scan any missed
|
|
83
|
+
Same as small mode: compare FILES SCANNED vs risk map, then re-scan any missed queued scannable files in priority order.
|
|
84
84
|
|
|
85
85
|
---
|
|
86
86
|
|
|
@@ -104,7 +104,7 @@ Dispatch Referee using the standard dispatch pattern (see `_dispatch.md`, role=`
|
|
|
104
104
|
|
|
105
105
|
Pass the merged Hunter findings + Skeptic challenges.
|
|
106
106
|
|
|
107
|
-
After completion, read `.bug-hunter/referee.md
|
|
107
|
+
After completion, read `.bug-hunter/referee.json`, then render `.bug-hunter/report.md` from the JSON artifacts.
|
|
108
108
|
|
|
109
109
|
---
|
|
110
110
|
|
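The "Merge scout + deep findings" step in `parallel.md` above could look something like the sketch below. The finding shape (`file`, `lines`, `claim`) and the rule of matching findings by file and line range are assumptions made for illustration; the diff does not say how the package decides that two findings describe the same bug. The function name `merge_scout_findings` and the `source` marker are likewise hypothetical.

```python
# Hypothetical finding shape: {"bugId", "file", "lines", "claim"}.
# Matching by (file, lines) is an assumption, not a documented rule.

def merge_scout_findings(deep: list[dict], scout: list[dict]) -> list[dict]:
    """Return the deep findings plus any scout findings the deep pass missed."""
    seen = {(f["file"], f["lines"]) for f in deep}
    merged = list(deep)
    for finding in scout:
        key = (finding["file"], finding["lines"])
        if key not in seen:
            seen.add(key)
            # Tag promoted findings so the Skeptic can see where they came from.
            merged.append({**finding, "source": "scout"})
    return merged
```

Deep findings are kept unchanged and listed first, so promoted scout-only findings are easy to tell apart during Skeptic review.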
package/modes/scaled.md
CHANGED
|
@@ -45,7 +45,7 @@ For each chunk: dispatch Hunter, record findings, mark done — same pattern as
|
|
|
45
45
|
### 5c. Cross-chunk consistency
|
|
46
46
|
|
|
47
47
|
After all chunks complete:
|
|
48
|
-
1. Merge findings from state into `.bug-hunter/findings.
|
|
48
|
+
1. Merge findings from state into `.bug-hunter/findings.json`.
|
|
49
49
|
2. Run consistency check: look for duplicate BUG-IDs across chunks and conflicting claims on the same file/line.
|
|
50
50
|
3. Resolve conflicts: keep the finding with the stronger evidence.
|
|
51
51
|
|
|
@@ -73,4 +73,4 @@ Pass merged Hunter findings + Skeptic challenges.
|
|
|
73
73
|
|
|
74
74
|
Proceed to **Step 7** (Final Report) in SKILL.md.
|
|
75
75
|
|
|
76
|
-
If `--loop` was specified and coverage is incomplete, the ralph-loop will iterate to cover remaining files.
|
|
76
|
+
If `--loop` was specified and coverage is incomplete, the ralph-loop will iterate to cover the remaining queued files until the queue is exhausted or the user interrupts.
|
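The cross-chunk consistency check (step 5c in `scaled.md` above) can be illustrated with a short sketch. It assumes each finding carries a numeric `evidenceScore` field, which is hypothetical: the diff only says to keep "the finding with the stronger evidence" and does not define how evidence strength is measured. The function name `check_consistency` is also an assumption.

```python
from collections import defaultdict

# Hypothetical finding shape: {"bugId", "file", "lines", "claim", "evidenceScore"}.
# "evidenceScore" is an assumed stand-in for "stronger evidence".

def check_consistency(findings: list[dict]) -> tuple[list[str], list[dict]]:
    """Return (duplicate BUG-IDs, findings with one winner per file/line)."""
    by_id = defaultdict(list)
    for finding in findings:
        by_id[finding["bugId"]].append(finding)
    duplicate_ids = sorted(bug_id for bug_id, group in by_id.items() if len(group) > 1)

    # For findings on the same file/line, keep the one with the higher score.
    by_location: dict[tuple[str, str], dict] = {}
    for finding in findings:
        key = (finding["file"], finding["lines"])
        current = by_location.get(key)
        if current is None or finding["evidenceScore"] > current["evidenceScore"]:
            by_location[key] = finding
    return duplicate_ids, list(by_location.values())
```

Duplicate IDs are only reported here, not renumbered; how to reassign them would depend on conventions that this diff does not show.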