@codexstar/bug-hunter 3.0.0 → 3.0.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (78)
  1. package/CHANGELOG.md +149 -83
  2. package/README.md +150 -15
  3. package/SKILL.md +94 -27
  4. package/agents/openai.yaml +4 -0
  5. package/bin/bug-hunter +9 -3
  6. package/docs/images/2026-03-12-fix-plan-rollout.png +0 -0
  7. package/docs/images/2026-03-12-hero-bug-hunter-overview.png +0 -0
  8. package/docs/images/2026-03-12-machine-readable-artifacts.png +0 -0
  9. package/docs/images/2026-03-12-pr-review-flow.png +0 -0
  10. package/docs/images/2026-03-12-security-pack.png +0 -0
  11. package/docs/images/adversarial-debate.png +0 -0
  12. package/docs/images/doc-verify-fix-plan.png +0 -0
  13. package/docs/images/hero.png +0 -0
  14. package/docs/images/pipeline-overview.png +0 -0
  15. package/docs/images/security-finding-card.png +0 -0
  16. package/docs/plans/2026-03-11-structured-output-migration-plan.md +288 -0
  17. package/docs/plans/2026-03-12-audit-bug-fixes-surgical-plan.md +193 -0
  18. package/docs/plans/2026-03-12-enterprise-security-pack-e2e-plan.md +59 -0
  19. package/docs/plans/2026-03-12-local-security-skills-integration-plan.md +39 -0
  20. package/docs/plans/2026-03-12-pr-review-strategic-fix-flow.md +78 -0
  21. package/evals/evals.json +366 -102
  22. package/modes/extended.md +2 -2
  23. package/modes/fix-loop.md +30 -30
  24. package/modes/fix-pipeline.md +32 -6
  25. package/modes/large-codebase.md +14 -15
  26. package/modes/local-sequential.md +44 -20
  27. package/modes/loop.md +56 -56
  28. package/modes/parallel.md +3 -3
  29. package/modes/scaled.md +2 -2
  30. package/modes/single-file.md +3 -3
  31. package/modes/small.md +11 -11
  32. package/package.json +11 -1
  33. package/prompts/fixer.md +37 -23
  34. package/prompts/hunter.md +39 -20
  35. package/prompts/referee.md +34 -20
  36. package/prompts/skeptic.md +25 -22
  37. package/schemas/coverage.schema.json +67 -0
  38. package/schemas/examples/findings.invalid.json +13 -0
  39. package/schemas/examples/findings.valid.json +17 -0
  40. package/schemas/findings.schema.json +76 -0
  41. package/schemas/fix-plan.schema.json +94 -0
  42. package/schemas/fix-report.schema.json +105 -0
  43. package/schemas/fix-strategy.schema.json +99 -0
  44. package/schemas/recon.schema.json +31 -0
  45. package/schemas/referee.schema.json +46 -0
  46. package/schemas/shared.schema.json +51 -0
  47. package/schemas/skeptic.schema.json +21 -0
  48. package/scripts/bug-hunter-state.cjs +35 -12
  49. package/scripts/code-index.cjs +11 -4
  50. package/scripts/fix-lock.cjs +95 -25
  51. package/scripts/payload-guard.cjs +24 -10
  52. package/scripts/pr-scope.cjs +181 -0
  53. package/scripts/prepublish-guard.cjs +82 -0
  54. package/scripts/render-report.cjs +346 -0
  55. package/scripts/run-bug-hunter.cjs +669 -33
  56. package/scripts/schema-runtime.cjs +273 -0
  57. package/scripts/schema-validate.cjs +40 -0
  58. package/scripts/tests/bug-hunter-state.test.cjs +68 -3
  59. package/scripts/tests/code-index.test.cjs +15 -0
  60. package/scripts/tests/fix-lock.test.cjs +60 -2
  61. package/scripts/tests/fixtures/flaky-worker.cjs +6 -1
  62. package/scripts/tests/fixtures/low-confidence-worker.cjs +8 -2
  63. package/scripts/tests/fixtures/success-worker.cjs +6 -1
  64. package/scripts/tests/payload-guard.test.cjs +154 -2
  65. package/scripts/tests/pr-scope.test.cjs +212 -0
  66. package/scripts/tests/render-report.test.cjs +180 -0
  67. package/scripts/tests/run-bug-hunter.test.cjs +686 -2
  68. package/scripts/tests/security-skills-integration.test.cjs +29 -0
  69. package/scripts/tests/skills-packaging.test.cjs +30 -0
  70. package/scripts/tests/worktree-harvest.test.cjs +67 -1
  71. package/scripts/worktree-harvest.cjs +62 -9
  72. package/skills/README.md +19 -0
  73. package/skills/commit-security-scan/SKILL.md +63 -0
  74. package/skills/security-review/SKILL.md +57 -0
  75. package/skills/threat-model-generation/SKILL.md +47 -0
  76. package/skills/vulnerability-validation/SKILL.md +59 -0
  77. package/templates/subagent-wrapper.md +12 -3
  78. package/modes/_dispatch.md +0 -121
package/modes/extended.md CHANGED
@@ -35,7 +35,7 @@ After Recon completes, read `.bug-hunter/recon.md` to extract the risk map and t
35
35
 
36
36
  Partition files from `triage.scanOrder` (or the Recon risk map if no triage) into chunks:
37
37
  - **Service-aware partitioning (preferred):** If triage detected multiple domains, partition by domain.
38
- - **Risk-tier partitioning (fallback):** Process CRITICAL files first, then HIGH, then MEDIUM.
38
+ - **Risk-tier partitioning (fallback):** Process CRITICAL files first, then HIGH, then MEDIUM, then LOW.
39
39
  - Chunk size: FILE_BUDGET ÷ 2 files per chunk (keep chunks small to avoid compaction).
40
40
  - Keep same-directory files together when possible.
41
41
 
@@ -67,7 +67,7 @@ For each chunk:
67
67
 
68
68
  ### 5d. Merge all findings
69
69
 
70
- After all chunks complete, merge findings from state into `.bug-hunter/findings.md`.
70
+ After all chunks complete, merge findings from state into `.bug-hunter/findings.json`.
71
71
 
72
72
  If TOTAL FINDINGS: 0, skip Skeptic and Referee. Go to Step 7 (Final Report) in SKILL.md.
73
73
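The risk-tier fallback and the chunk-size rule in this file can be sketched as follows. This is a minimal illustration, not the package's implementation: the function name `partitionByRiskTier`, the `{ path, tier }` input shape, and rounding FILE_BUDGET ÷ 2 down are all assumptions. Sorting by path within a tier is one simple way to keep same-directory files together.

```javascript
// Hypothetical helper: the name, the input shape, and the rounding rule are
// illustrative assumptions, not part of the package.
const TIER_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW"];

// Unknown tiers sort after LOW instead of jumping to the front.
function tierRank(tier) {
  const index = TIER_ORDER.indexOf(tier);
  return index === -1 ? TIER_ORDER.length : index;
}

function partitionByRiskTier(files, fileBudget) {
  // Chunk size is FILE_BUDGET / 2 (rounded down here), never below 1.
  const chunkSize = Math.max(1, Math.floor(fileBudget / 2));

  // Order by tier first, then by path so same-directory files stay adjacent.
  const ordered = [...files].sort((a, b) => {
    const tierDiff = tierRank(a.tier) - tierRank(b.tier);
    if (tierDiff !== 0) return tierDiff;
    return a.path < b.path ? -1 : a.path > b.path ? 1 : 0;
  });

  // Slice the ordered list into fixed-size chunks of paths.
  const chunks = [];
  for (let i = 0; i < ordered.length; i += chunkSize) {
    chunks.push(ordered.slice(i, i + chunkSize).map((file) => file.path));
  }
  return chunks;
}
```

With a FILE_BUDGET of 4, each chunk holds two files, and LOW files are queued last rather than dropped.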
 
package/modes/fix-loop.md CHANGED
@@ -17,58 +17,56 @@ When `LOOP_MODE=true` AND `FIX_MODE=true`, before running the first pipeline ite
17
17
  2. Call the `ralph_start` tool:
18
18
 
19
19
  ```
20
+ MAX_FIX_LOOP_ITERATIONS = max(
21
+ 15,
22
+ min(250, ceil(SCANNABLE_FILES / max(FILE_BUDGET, 1)) + ELIGIBLE_BUG_COUNT + 8)
23
+ )
24
+
20
25
  ralph_start({
21
26
  name: "bug-hunter-fix-audit",
22
27
  taskContent: <the TODO.md content below>,
23
- maxIterations: 15
28
+ maxIterations: MAX_FIX_LOOP_ITERATIONS
24
29
  })
25
30
  ```
26
31
 
27
32
  3. The ralph-loop system will then drive iteration. Each iteration:
28
33
  - You receive the task prompt with the current checklist state.
29
34
  - You execute one iteration of find + fix.
30
- - You update `.bug-hunter/coverage.md` with results.
31
- - If all bugs are FIXED and all CRITICAL/HIGH files are DONE → output `<promise>COMPLETE</promise>`.
35
+ - You update `.bug-hunter/coverage.json` with results and render `.bug-hunter/coverage.md`.
36
+ - If all bugs are FIXED and all queued scannable source files are DONE → output `<promise>COMPLETE</promise>`.
32
37
  - Otherwise → call `ralph_done` to proceed to the next iteration.
33
38
 
34
39
  **Do NOT manually loop or re-invoke yourself.** The ralph-loop system handles iteration automatically.
35
40
 
36
41
  ## Coverage file extension for fix mode
37
42
 
38
- The `.bug-hunter/coverage.md` file gains additional sections:
43
+ The `.bug-hunter/coverage.json` file carries the same loop state, plus fix
44
+ entries:
39
45
 
40
- ```markdown
41
- ## Fixes
42
- <!-- One line per bug. LATEST entry per BUG-ID is current status. -->
43
- <!-- Format: BUG-ID|STATUS|ITERATION_FIXED|FILES_MODIFIED -->
44
- <!-- STATUS: FIXED | FIX_REVERTED | FIX_FAILED | PARTIAL | FIX_CONFLICT | SKIPPED | FIXER_BUG -->
45
- BUG-3|FIXED|1|src/auth/login.ts
46
- BUG-7|FIXED|1|src/auth/login.ts
47
- BUG-12|FIXED|2|src/api/users.ts
48
-
49
- ## Test Results
50
- <!-- One line per iteration. Format: ITERATION|PASSED|FAILED|NEW_FAILURES|RESOLVED -->
51
- 1|45|3|2|0
52
- 2|47|1|0|1
46
+ ```json
47
+ {
48
+ "fixes": [
49
+ { "bugId": "BUG-3", "status": "FIXED" },
50
+ { "bugId": "BUG-12", "status": "FIX_FAILED" }
51
+ ]
52
+ }
53
53
  ```
54
54
 
55
- **Parsing rule:** For each BUG-ID, use the LAST entry in the Fixes section. Earlier entries for the same BUG-ID are history — only the latest matters.
56
-
57
55
  ## Loop iteration logic
58
56
 
59
57
  ```
60
58
  For each iteration:
61
- 1. Read coverage file
62
- 2. Collect (using LAST entry per BUG-ID):
63
- - Unfixed bugs: latest STATUS in {FIX_REVERTED, FIX_FAILED, FIX_CONFLICT, SKIPPED, FIXER_BUG}
64
- - Unscanned files: STATUS != DONE in Files section (CRITICAL/HIGH only)
59
+ 1. Read coverage.json
60
+ 2. Collect:
61
+ - Unfixed bugs: latest fix status in {FIX_REVERTED, FIX_FAILED, FIX_CONFLICT, SKIPPED, FIXER_BUG, MANUAL_REVIEW}
62
+ - Unscanned files: file status != done
65
63
  3. If unfixed bugs exist OR unscanned files exist:
66
64
  a. If unscanned files -> run Phase 1 (find pipeline) on them -> get new confirmed bugs
67
65
  b. Combine: unfixed bugs + newly confirmed bugs
68
66
  c. Run Phase 2 (fix + verify) on combined list
69
- d. Update coverage file (append new entries to Fixes section)
67
+ d. Update coverage.json and re-render coverage.md
70
68
  e. Call ralph_done to proceed to next iteration
71
- 4. If all bugs FIXED and all CRITICAL/HIGH files DONE:
69
+ 4. If all bugs FIXED and all queued scannable source files are DONE:
72
70
  -> Run final test suite one more time
73
71
  -> If no new failures:
74
72
  Output <promise>COMPLETE</promise>
@@ -87,6 +85,8 @@ Use this as the `taskContent` parameter when calling `ralph_start`:
87
85
  ## Discovery Tasks
88
86
  - [ ] All CRITICAL files scanned
89
87
  - [ ] All HIGH files scanned
88
+ - [ ] All MEDIUM files scanned
89
+ - [ ] All LOW files scanned
90
90
  - [ ] Findings verified through Skeptic+Referee pipeline
91
91
 
92
92
  ## Fix Tasks
@@ -100,13 +100,13 @@ Use this as the `taskContent` parameter when calling `ralph_start`:
100
100
  - [ ] ALL_TASKS_COMPLETE
101
101
 
102
102
  ## Instructions
103
- 1. Read .bug-hunter/coverage.md for previous iteration state
104
- 2. Parse Files table — collect unscanned CRITICAL/HIGH files
105
- 3. Parse Fixes table — collect unfixed bugs (latest entry not FIXED)
103
+ 1. Read .bug-hunter/coverage.json for previous iteration state
104
+ 2. Parse the `files` array — collect unscanned CRITICAL/HIGH/MEDIUM/LOW files
105
+ 3. Parse the `fixes` array — collect unfixed bugs (latest entry not FIXED)
106
106
  4. If unscanned files exist: run Phase 1 (find pipeline) on them
107
107
  5. If unfixed bugs exist: run Phase 2 (fix pipeline) on them
108
- 6. Update coverage file with results
109
- 7. Output <promise>COMPLETE</promise> when all bugs are FIXED and no new test failures
108
+ 6. Update coverage.json with results and render coverage.md
109
+ 7. Output <promise>COMPLETE</promise> only when all queued files are DONE, all discovered bugs are FIXED, and no new test failures remain
110
110
  8. Otherwise call ralph_done to continue to the next iteration
111
111
  ```
112
112
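The iteration cap and the "collect" step of the fix loop can be sketched as below. The function names are hypothetical. The sketch also assumes that entries in the `fixes` array are in chronological order, so the last entry per `bugId` is the latest status; the diff does not state this ordering.

```javascript
// Sketch only: function names are assumptions, and `fixes` is assumed to be
// ordered oldest to newest.
function maxFixLoopIterations(scannableFiles, fileBudget, eligibleBugCount) {
  // ceil(SCANNABLE_FILES / max(FILE_BUDGET, 1)) + ELIGIBLE_BUG_COUNT + 8,
  // clamped to the range [15, 250].
  const chunkCount = Math.ceil(scannableFiles / Math.max(fileBudget, 1));
  return Math.max(15, Math.min(250, chunkCount + eligibleBugCount + 8));
}

// Statuses that keep a bug on the unfixed list, as listed in the loop logic.
const UNFIXED_STATUSES = new Set([
  "FIX_REVERTED",
  "FIX_FAILED",
  "FIX_CONFLICT",
  "SKIPPED",
  "FIXER_BUG",
  "MANUAL_REVIEW",
]);

function collectPendingWork(coverage) {
  // Later entries overwrite earlier ones, leaving the latest status per bug.
  const latestStatus = new Map();
  for (const fix of coverage.fixes ?? []) {
    latestStatus.set(fix.bugId, fix.status);
  }

  const unfixedBugs = [...latestStatus]
    .filter(([, status]) => UNFIXED_STATUSES.has(status))
    .map(([bugId]) => bugId);

  // Any file whose status is not "done" still needs a Phase 1 scan.
  const unscannedFiles = (coverage.files ?? [])
    .filter((file) => file.status !== "done")
    .map((file) => file.path);

  return { unfixedBugs, unscannedFiles };
}
```

The loop would emit `<promise>COMPLETE</promise>` only when both returned lists are empty and the final test run shows no new failures.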
 
package/modes/fix-pipeline.md CHANGED
@@ -50,11 +50,14 @@ DYNAMIC_TTL = max(1800, ELIGIBLE_COUNT * 600) # 10 min per bug, minimum 30 min
50
50
  ```
51
51
  node "$SKILL_DIR/scripts/fix-lock.cjs" acquire ".bug-hunter/fix.lock" $DYNAMIC_TTL
52
52
  ```
53
+ Record `LOCK_OWNER_TOKEN` from the returned JSON (`lock.ownerToken`).
53
54
  If lock cannot be acquired, stop Phase 2 to avoid concurrent mutation.
54
55
 
56
+ **Owner token:** `acquire` returns `lock.ownerToken`; renew/release now require that token. Persist it for the entire Phase 2 run as `LOCK_OWNER_TOKEN`.
57
+
55
58
  **Lock renewal:** During Step 9 execution, renew the lock after each bug fix to prevent TTL expiry on long runs:
56
59
  ```
57
- node "$SKILL_DIR/scripts/fix-lock.cjs" renew ".bug-hunter/fix.lock"
60
+ node "$SKILL_DIR/scripts/fix-lock.cjs" renew ".bug-hunter/fix.lock" "$LOCK_OWNER_TOKEN"
58
61
  ```
59
62
 
60
63
  **8b. Detect verification commands**
@@ -81,7 +84,16 @@ If `TEST_COMMAND` is not null:
81
84
 
82
85
  If baseline cannot run, set `BASELINE=null` and `FLAKY_TESTS={}` and continue with manual-verification warning.
83
86
 
84
- **8d. Build sequential fix plan**
87
+ **8d. Build fix strategy + sequential fix plan**
88
+
89
+ Before deciding what to patch, write `.bug-hunter/fix-strategy.json` and `.bug-hunter/fix-strategy.md`.
90
+ The strategy artifact must classify each confirmed bug into one of:
91
+ - `safe-autofix`
92
+ - `manual-review`
93
+ - `larger-refactor`
94
+ - `architectural-remediation`
95
+
96
+ If `PLAN_ONLY_MODE=true`, stop after the strategy artifact and fix-plan preview are written.
85
97
 
86
98
  Prepare bug queue:
87
99
  1. Apply confidence gate:
@@ -185,7 +197,7 @@ For each batch in order:
185
197
 
186
198
  8a. Renew lock after each batch:
187
199
  ```
188
- node "$SKILL_DIR/scripts/fix-lock.cjs" renew ".bug-hunter/fix.lock"
200
+ node "$SKILL_DIR/scripts/fix-lock.cjs" renew ".bug-hunter/fix.lock" "$LOCK_OWNER_TOKEN"
189
201
  ```
190
202
 
191
203
  **Path B — Direct mode (`WORKTREE_MODE=false`):**
@@ -201,7 +213,7 @@ For each batch in order:
201
213
  7b. Record commit hash per BUG-ID in a fix ledger.
202
214
  8b. **Renew lock** after each bug fix:
203
215
  ```
204
- node "$SKILL_DIR/scripts/fix-lock.cjs" renew ".bug-hunter/fix.lock"
216
+ node "$SKILL_DIR/scripts/fix-lock.cjs" renew ".bug-hunter/fix.lock" "$LOCK_OWNER_TOKEN"
205
217
  ```
206
218
 
207
219
  If a bug cannot be fixed, mark `SKIPPED` and continue.
@@ -260,7 +272,9 @@ Use exact fixed scope from the real base commit:
260
272
  2. Build changed hunks list.
261
273
  3. Run one lightweight Hunter on changed hunks only with a **severity floor of MEDIUM**:
262
274
  - Only report fixer-introduced bugs at MEDIUM severity or above.
263
- - LOW-severity issues from the fixer are logged to `.bug-hunter/fix-report.md` as informational notes but do NOT trigger `FIXER_BUG` status.
275
+ - LOW-severity issues from the fixer are logged in `.bug-hunter/fix-report.json`
276
+ (and optional derived `.bug-hunter/fix-report.md`) as informational notes
277
+ but do NOT trigger `FIXER_BUG` status.
264
278
 
265
279
  This removes ambiguity from `<base-branch>` and works for path scans, staged scans, and branch scans.
266
280
 
@@ -301,7 +315,7 @@ If stash was created (not applicable in dry-run mode):
301
315
 
302
316
  Always release single-writer lock at the end (success or failure path):
303
317
  ```
304
- node "$SKILL_DIR/scripts/fix-lock.cjs" release ".bug-hunter/fix.lock"
318
+ node "$SKILL_DIR/scripts/fix-lock.cjs" release ".bug-hunter/fix.lock" "$LOCK_OWNER_TOKEN"
305
319
  ```
306
320
  If an earlier step aborts Phase 2, run the same release command AND worktree cleanup-all in best-effort cleanup before returning.
307
321
 
@@ -375,6 +389,18 @@ Write `.bug-hunter/fix-report.json` alongside the markdown report:
375
389
  }
376
390
  ```
377
391
 
392
+ Validate it immediately:
393
+
394
+ ```bash
395
+ node "$SKILL_DIR/scripts/schema-validate.cjs" fix-report ".bug-hunter/fix-report.json"
396
+ ```
397
+
398
+ Render the Markdown companion when humans need it:
399
+
400
+ ```bash
401
+ node "$SKILL_DIR/scripts/render-report.cjs" fix-report ".bug-hunter/fix-report.json" > ".bug-hunter/fix-report.md"
402
+ ```
403
+
378
404
  Rules:
379
405
  - `dry_run: true` when `DRY_RUN_MODE=true` — the `fixes` array contains planned diffs instead of commit hashes.
380
406
  - `circuit_breaker_tripped: true` when the circuit breaker halted the pipeline.
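The lock lifecycle in these hunks (compute a TTL, acquire, keep the owner token, pass it to every renew and release) can be sketched as below. The helper names are hypothetical. Only the CLI argument order and the `lock.ownerToken` field come from the diff; the rest of the acquire JSON is not shown there and is not assumed here.

```javascript
// Sketch under assumptions: helper names are hypothetical. Only the argument
// order and the `lock.ownerToken` field are taken from the diff.
function dynamicTtlSeconds(eligibleCount) {
  // 10 minutes per eligible bug, with a 30-minute floor.
  return Math.max(1800, eligibleCount * 600);
}

function ownerTokenFromAcquire(stdout) {
  // `acquire` prints JSON; the token lives at lock.ownerToken.
  const parsed = JSON.parse(stdout);
  const token = parsed && parsed.lock && parsed.lock.ownerToken;
  if (typeof token !== "string" || token.length === 0) {
    throw new Error("acquire output did not contain lock.ownerToken");
  }
  return token;
}

// Builds the argv passed to `node` for acquire (extra = TTL) and for
// renew/release (extra = owner token).
function fixLockArgs(skillDir, action, lockPath, extra) {
  return [`${skillDir}/scripts/fix-lock.cjs`, action, lockPath, String(extra)];
}
```

A caller would run the acquire argv once, store the token as `LOCK_OWNER_TOKEN` for the whole Phase 2 run, and reuse it for each renew and for the final release.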
package/modes/large-codebase.md CHANGED
@@ -67,7 +67,7 @@ This is fast — no file reading, just directory listing and heuristic classific
67
67
  Process ONE domain at a time, running the **full pipeline** (Recon → Hunter → Skeptic → Referee) within each domain:
68
68
 
69
69
  ```
70
- For each domain (CRITICAL first, then HIGH, then MEDIUM):
70
+ For each domain (CRITICAL first, then HIGH, then MEDIUM, then LOW):
71
71
  1. Get this domain's file list:
72
72
  - If triage exists: use triage.domainFileLists[domainPath]
73
73
  - If no triage: use fd/find to list files in this domain's directory
@@ -78,9 +78,9 @@ For each domain (CRITICAL first, then HIGH, then MEDIUM):
78
78
 
79
79
  Write domain results to:
80
80
  .bug-hunter/domains/<domain-name>/recon.md
81
- .bug-hunter/domains/<domain-name>/findings.md
82
- .bug-hunter/domains/<domain-name>/skeptic.md
83
- .bug-hunter/domains/<domain-name>/referee.md
81
+ .bug-hunter/domains/<domain-name>/findings.json
82
+ .bug-hunter/domains/<domain-name>/skeptic.json
83
+ .bug-hunter/domains/<domain-name>/referee.json
84
84
 
85
85
  Record in state:
86
86
  node "$SKILL_DIR/scripts/bug-hunter-state.cjs" record-findings ...
@@ -123,7 +123,7 @@ Write boundary results to `.bug-hunter/domains/_boundaries/`.
123
123
 
124
124
  After all domains + boundaries are audited:
125
125
 
126
- 1. Read all domain `referee.md` files and boundary results.
126
+ 1. Read all domain `referee.json` files and boundary results.
127
127
  2. Merge findings, deduplicate by file + line + claim.
128
128
  3. Renumber BUG-IDs globally.
129
129
  4. Build the final report per Step 7 in SKILL.md.
@@ -163,17 +163,16 @@ Use `.bug-hunter/state.json` with domain-aware structure:
163
163
  - Iteration N-1: Tier 3 merge and report
164
164
  - Iteration N: Coverage check → DONE or continue with missed domains
165
165
 
166
- The ralph-loop's coverage check reads the state file and only marks DONE when all CRITICAL and HIGH domains show status `done`.
166
+ The ralph-loop's coverage check reads the state file and only marks DONE when all queued domains show status `done`.
167
167
 
168
- ## Optimization: Skip LOW domains
168
+ ## Default autonomous behavior
169
169
 
170
- For truly huge codebases (1,000+ files), skip LOW-tier domains entirely unless `--exhaustive` is specified. UI components, test utilities, and formatting helpers rarely contain runtime bugs worth the context cost.
171
-
172
- Report skipped domains in the final report:
173
- ```
174
- ℹ️ Skipped [N] LOW-tier domains ([M] files) for efficiency.
175
- Use `--exhaustive` to include all domains.
176
- ```
170
+ Autonomous mode is exhaustive by default:
171
+ - Finish all CRITICAL domains first.
172
+ - Then continue through HIGH domains.
173
+ - Then continue through MEDIUM domains.
174
+ - Then continue through LOW domains.
175
+ - Only stop when the domain queue is exhausted, the user interrupts, or a hard blocker prevents safe progress.
177
176
 
178
177
  ## Optimization: Delta-first for repeat scans
179
178
 
@@ -209,4 +208,4 @@ When executing large-codebase mode:
209
208
  - [ ] Tier 3: Merge all domain + boundary findings
210
209
  - [ ] Tier 3: Deduplicate and renumber
211
210
  - [ ] Tier 3: Build final report with per-domain breakdown
212
- - [ ] Coverage: All CRITICAL/HIGH domains done? If not, continue.
211
+ - [ ] Coverage: All queued domains done? If not, continue.
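The Tier 3 merge described in this file (merge findings, deduplicate by file + line + claim, renumber BUG-IDs globally) might look like the sketch below. The function name and the `file`, `line`, `claim`, and `bugId` field names are assumptions for illustration; the actual shape of `referee.json` is defined by the package's schemas, which are not shown in this section.

```javascript
// Hypothetical sketch: field names and the function name are assumptions.
function mergeDomainFindings(findingsPerDomain) {
  const seen = new Set();
  const merged = [];

  for (const finding of findingsPerDomain.flat()) {
    // The dedup key is the (file, line, claim) triple.
    const key = JSON.stringify([finding.file, finding.line, finding.claim]);
    if (seen.has(key)) continue;
    seen.add(key);
    merged.push(finding);
  }

  // Renumber globally so IDs are unique across domains and boundaries.
  return merged.map((finding, index) => ({
    ...finding,
    bugId: `BUG-${index + 1}`,
  }));
}
```

Keeping the first occurrence of each duplicate is a design choice of this sketch; a real merge might instead prefer the entry with the higher severity or confidence.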
package/modes/local-sequential.md CHANGED
@@ -6,7 +6,10 @@ This is NOT a degraded mode. The skill is designed to work fully here.
6
6
 
7
7
  ## How It Works
8
8
 
9
- You (the orchestrating agent) play each role yourself, sequentially. Between phases you **write outputs to files** so later phases can reference them without holding everything in working memory.
9
+ You (the orchestrating agent) play each role yourself, sequentially. Between
10
+ phases you write canonical JSON artifacts so later phases can reference them
11
+ without holding everything in working memory. Markdown reports are derived from
12
+ those JSON files when humans need them.
10
13
 
11
14
  All state files go in `.bug-hunter/` relative to the working directory.
12
15
 
@@ -26,7 +29,9 @@ All state files go in `.bug-hunter/` relative to the working directory.
26
29
  - Use `triage.scanOrder` as the file order for Phase B.
27
30
  - Recon's remaining job: read 3-5 key files from CRITICAL domains to identify **tech stack** (framework, auth mechanism, database, key dependencies) and **trust boundary patterns** (how routes are defined, how auth middleware is applied, etc.).
28
31
  - If git is available, check recently changed files with `git log`.
29
- - Write your Recon output to `.bug-hunter/recon.md` include the tech stack, patterns, and the triage-provided risk map.
32
+ - Write your Recon output to `.bug-hunter/recon.json` if structured output is
33
+ requested; otherwise keep `.bug-hunter/recon.md` as a temporary fallback
34
+ until the Recon prompt is migrated.
30
35
 
31
36
  3. **If `.bug-hunter/triage.json` does NOT exist** (fallback — Recon called directly):
32
37
  - Execute the full Recon instructions: discover files, classify, compute FILE_BUDGET.
@@ -44,12 +49,16 @@ All state files go in `.bug-hunter/` relative to the working directory.
44
49
  2. Read `SKILL_DIR/prompts/doc-lookup.md` with the Read tool.
45
50
  3. **Switch mindset**: you are now a Bug Hunter. Your ONLY job is to find behavioral bugs.
46
51
  4. Execute the Hunter instructions yourself:
47
- - Read files in risk-map order: CRITICAL → HIGH → MEDIUM.
52
+ - Read files in risk-map order: CRITICAL → HIGH → MEDIUM → LOW.
48
53
  - For each file, use the Read tool. Do NOT rely on memory from earlier phases.
49
54
  - Apply the mandatory security checklist sweep (Phase 3 in hunter.md) on every CRITICAL and HIGH file.
50
55
  - Track which files you actually read — be honest about coverage.
51
56
  - For each bug found, record it in the exact BUG-N format specified in hunter.md.
52
- 5. Write your complete findings to `.bug-hunter/findings.md`.
57
+ 5. Write your complete findings to `.bug-hunter/findings.json`.
58
+ 6. Validate the artifact immediately:
59
+ ```bash
60
+ node "$SKILL_DIR/scripts/schema-validate.cjs" findings ".bug-hunter/findings.json"
61
+ ```
53
62
 
54
63
  **Context management:** If you notice earlier files becoming hazy in your memory:
55
64
  - STOP expanding to new files.
@@ -84,9 +93,10 @@ If the Recon risk map contains more files than FILE_BUDGET, do NOT try to read t
84
93
  ```bash
85
94
  node "$SKILL_DIR/scripts/bug-hunter-state.cjs" mark-chunk ".bug-hunter/state.json" "<chunk-id>" done
86
95
  ```
87
- 3. After all chunks: merge findings from `.bug-hunter/state.json` into `.bug-hunter/findings.md`.
96
+ 3. After all chunks: merge findings from `.bug-hunter/state.json` into
97
+ `.bug-hunter/findings.json`.
88
98
 
89
- **Gap-fill:** After scanning, compare FILES SCANNED against the risk map. If any CRITICAL or HIGH files are in FILES SKIPPED, read them now and append any new findings. If you truly cannot read them (context exhaustion), leave them in FILES SKIPPED.
99
+ **Gap-fill:** After scanning, compare FILES SCANNED against the risk map. If any queued scannable files are in FILES SKIPPED, read them now in priority order (CRITICAL → HIGH → MEDIUM → LOW) and append any new findings. If you truly cannot read them (context exhaustion), leave them in FILES SKIPPED so loop mode can resume them next.
90
100
 
91
101
  If TOTAL FINDINGS: 0, skip Phases C and D. Go directly to Step 7 (Final Report) in SKILL.md.
92
102
 
@@ -95,7 +105,7 @@ If TOTAL FINDINGS: 0, skip Phases C and D. Go directly to Step 7 (Final Report)
95
105
  1. Read `SKILL_DIR/prompts/skeptic.md` with the Read tool.
96
106
  2. Read `SKILL_DIR/prompts/doc-lookup.md` with the Read tool.
97
107
  3. **Switch mindset completely**: you are now the Skeptic. Your job is to DISPROVE false positives. Forget the pride of finding them — you want to kill weak claims.
98
- 4. Read `.bug-hunter/findings.md` to get the findings list.
108
+ 4. Read `.bug-hunter/findings.json` to get the findings list.
99
109
  5. For EACH finding:
100
110
  - Re-read the actual code at the reported file and line with the Read tool. This is MANDATORY — do not evaluate from memory.
101
111
  - Read all cross-referenced files.
@@ -103,7 +113,12 @@ If TOTAL FINDINGS: 0, skip Phases C and D. Go directly to Step 7 (Final Report)
103
113
  - Check framework/middleware protections the Hunter may have missed.
104
114
  - Apply the risk calculation: `EV = (confidence% × points) - ((100 - confidence%) × 2 × points)`. Only DISPROVE when EV is positive (confidence > 67%).
105
115
  - For Critical bugs: need >67% confidence AND all cross-references read.
106
- 6. Write your complete Skeptic output to `.bug-hunter/skeptic.md` in the format from skeptic.md.
116
+ 6. Write your complete Skeptic output to `.bug-hunter/skeptic.json` in the
117
+ format from skeptic.md.
118
+ 7. Validate it immediately:
119
+ ```bash
120
+ node "$SKILL_DIR/scripts/schema-validate.cjs" skeptic ".bug-hunter/skeptic.json"
121
+ ```
107
122
 
108
123
  **Important:** When switching from Hunter to Skeptic, genuinely try to disprove your own findings. The point of this phase is adversarial review. If you cannot genuinely argue against a finding, ACCEPT it and move on — do not waste time rubber-stamping.
109
124
 
@@ -111,13 +126,21 @@ If TOTAL FINDINGS: 0, skip Phases C and D. Go directly to Step 7 (Final Report)
111
126
 
112
127
  1. Read `SKILL_DIR/prompts/referee.md` with the Read tool.
113
128
  2. **Switch mindset**: you are the impartial Referee. You trust neither the Hunter nor the Skeptic.
114
- 3. Read both `.bug-hunter/findings.md` and `.bug-hunter/skeptic.md`.
129
+ 3. Read both `.bug-hunter/findings.json` and `.bug-hunter/skeptic.json`.
115
130
  4. For each finding:
116
131
  - **Tier 1 (all Critical + top 15 by severity):** Re-read the actual code yourself a THIRD time using the Read tool. Construct the runtime trigger independently. Make your own judgment.
117
132
  - **Tier 2 (remaining):** Evaluate evidence quality. Whose code quotes are more specific? Whose runtime trigger is more concrete?
118
133
  5. Make final REAL BUG / NOT A BUG verdicts with severity calibration.
119
- 6. Write the final Referee report to `.bug-hunter/referee.md`.
120
- 7. Proceed to Step 7 (Final Report) in SKILL.md.
134
+ 6. Write the final Referee verdicts to `.bug-hunter/referee.json`.
135
+ 7. Validate them immediately:
136
+ ```bash
137
+ node "$SKILL_DIR/scripts/schema-validate.cjs" referee ".bug-hunter/referee.json"
138
+ ```
139
+ 8. Render `.bug-hunter/report.md` from the JSON artifacts:
140
+ ```bash
141
+ node "$SKILL_DIR/scripts/render-report.cjs" report ".bug-hunter/findings.json" ".bug-hunter/referee.json" > ".bug-hunter/report.md"
142
+ ```
143
+ 9. Proceed to Step 7 (Final Report) in SKILL.md.
121
144
 
122
145
  ## State Files Summary
123
146
 
@@ -125,10 +148,11 @@ After a complete local-sequential run, these files should exist:
125
148
 
126
149
  | File | Phase | Content |
127
150
  |------|-------|---------|
128
- | `.bug-hunter/recon.md` | A | Risk map, file metrics, tech stack |
129
- | `.bug-hunter/findings.md` | B | All Hunter findings in BUG-N format |
130
- | `.bug-hunter/skeptic.md` | C | Skeptic challenges and decisions |
131
- | `.bug-hunter/referee.md` | D | Final verdicts and confirmed bugs |
151
+ | `.bug-hunter/recon.json` | A | Recon artifact when structured output is used |
152
+ | `.bug-hunter/findings.json` | B | All Hunter findings in canonical JSON |
153
+ | `.bug-hunter/skeptic.json` | C | Skeptic challenges in canonical JSON |
154
+ | `.bug-hunter/referee.json` | D | Final verdicts in canonical JSON |
155
+ | `.bug-hunter/report.md` | D | Human-readable report rendered from JSON |
132
156
  | `.bug-hunter/state.json` | B (chunked) | Chunk progress, findings ledger |
133
157
  | `.bug-hunter/source-files.json` | A | Source file list (for state init) |
134
158
 
@@ -136,8 +160,8 @@ After a complete local-sequential run, these files should exist:
136
160
 
137
161
  After Phase D, check coverage:
138
162
 
139
- - If all CRITICAL and HIGH files were scanned: proceed to Final Report.
140
- - If any CRITICAL/HIGH files were skipped:
141
- - If `--loop` mode: the ralph-loop will iterate and cover them next.
142
- - If not `--loop`: include a coverage WARNING in the Final Report and recommend `--loop`.
143
- - Do NOT claim "full coverage" or "audit complete" unless every CRITICAL and HIGH file was actually read with the Read tool and has status DONE.
163
+ - If all queued scannable source files were scanned: proceed to Final Report.
164
+ - If any queued scannable files were skipped:
165
+ - If `--loop` mode: the ralph-loop must iterate and cover the remaining queue next.
166
+ - If not `--loop`: include a coverage WARNING in the Final Report and recommend loop mode.
167
+ - Do NOT claim "full coverage" or "audit complete" unless every queued scannable source file was actually read with the Read tool and has status DONE.
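The Skeptic's risk calculation in Phase C, `EV = (confidence% × points) - ((100 - confidence%) × 2 × points)`, can be checked with a short worked sketch. The function names are hypothetical. Because confidence is kept in percent units, the result is scaled by 100 and only its sign matters. The extra rule for Critical bugs (all cross-references read) is not modeled.

```javascript
// Worked sketch of the Skeptic formula; names are illustrative assumptions.
function skepticExpectedValue(confidencePct, points) {
  // Gain if the disproof is right, minus a doubled penalty if it is wrong.
  return confidencePct * points - (100 - confidencePct) * 2 * points;
}

function shouldDisprove(confidencePct, points) {
  // EV simplifies to points * (3 * confidence - 200), so the break-even
  // point is 200 / 3, roughly 66.7% confidence.
  return skepticExpectedValue(confidencePct, points) > 0;
}
```

For a 10-point finding, 70% confidence gives 700 - 600 = 100 (disprove), while 60% gives 600 - 800 = -200 (accept the finding).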
package/modes/loop.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # Ralph-Loop Mode (`--loop`)
2
2
 
3
- When `--loop` is present, the bug-hunter wraps itself in a ralph-loop that keeps iterating until the audit achieves full coverage. This is for thorough, autonomous audits where you want every file examined.
3
+ When `--loop` is present, the bug-hunter wraps itself in a ralph-loop that keeps iterating until the audit achieves full queued coverage. This is for thorough, autonomous audits where you want every queued scannable source file examined unless the user interrupts.
4
4
 
5
5
  ## CRITICAL: Starting the ralph-loop
6
6
 
@@ -12,65 +12,63 @@ When `LOOP_MODE=true` is set (from `--loop` flag), before running the first pipe
12
12
  2. Call the `ralph_start` tool:
13
13
 
14
14
  ```
15
+ MAX_LOOP_ITERATIONS = max(12, min(200, ceil(SCANNABLE_FILES / max(FILE_BUDGET, 1)) + 8))
16
+
15
17
  ralph_start({
16
18
  name: "bug-hunter-audit",
17
19
  taskContent: <the TODO.md content below>,
18
- maxIterations: 10
20
+ maxIterations: MAX_LOOP_ITERATIONS
19
21
  })
20
22
  ```
21
23
 
22
24
  3. The ralph-loop system will then drive iteration. Each iteration:
23
25
  - You receive the task prompt with the current checklist state.
24
26
  - You execute one iteration of the bug-hunt pipeline (steps below).
25
- - You update `.bug-hunter/coverage.md` with results.
26
- - If ALL CRITICAL/HIGH files are DONE → output `<promise>COMPLETE</promise>` to end the loop.
27
+ - You update `.bug-hunter/coverage.json` with results and render `.bug-hunter/coverage.md` from it.
28
+ - If ALL queued scannable source files are DONE → output `<promise>COMPLETE</promise>` to end the loop.
27
29
  - Otherwise → call `ralph_done` to proceed to the next iteration.
28
30
 
29
31
  **Do NOT manually loop or re-invoke yourself.** The ralph-loop system handles iteration automatically after you call `ralph_start`.
30
32
 
31
33
  ## How it works
32
34
 
33
- 1. **First iteration**: Run the normal pipeline (Recon → Hunters → Skeptics → Referee). At the end, write a coverage report to `.bug-hunter/coverage.md` using the machine-parseable format below.
35
+ 1. **First iteration**: Run the normal pipeline (Recon → Hunters → Skeptics →
36
+ Referee). At the end, write canonical coverage state to
37
+ `.bug-hunter/coverage.json` and render `.bug-hunter/coverage.md` from it.
34
38
 
35
39
  2. **Coverage check**: After each iteration, evaluate:
- - If ALL CRITICAL and HIGH files show status DONE → output `<promise>COMPLETE</promise>` → loop ends
- - If any CRITICAL/HIGH files are SKIPPED or PARTIAL → call `ralph_done` → loop continues
- - If only MEDIUM files remain uncovered output `<promise>COMPLETE</promise>` (MEDIUM gaps are acceptable)
-
- 3. **Subsequent iterations**: Each new iteration reads `.bug-hunter/coverage.md` to see what's already been done, then runs the pipeline ONLY on uncovered files. New findings are appended to the cumulative bug list.
-
- ## Coverage file format (machine-parseable)
-
- **`.bug-hunter/coverage.md`:**
- ```markdown
- # Bug Hunt Coverage
- SCHEMA_VERSION: 2
-
- ## Meta
- ITERATION: [N]
- STATUS: [IN_PROGRESS | COMPLETE]
- TOTAL_BUGS_FOUND: [N]
- TIMESTAMP: [ISO 8601]
- CHECKSUM: [line_count of Files section]|[line_count of Bugs section]
-
- ## Files
- <!-- One line per file. Format: TIER|PATH|STATUS|ITERATION_SCANNED|BUGS_FOUND -->
- <!-- STATUS: DONE | PARTIAL | SKIPPED -->
- <!-- BUGS_FOUND: comma-separated BUG-IDs, or NONE -->
- CRITICAL|src/auth/login.ts|DONE|1|BUG-3,BUG-7
- CRITICAL|src/auth/middleware.ts|DONE|1|NONE
- HIGH|src/api/users.ts|DONE|1|BUG-12
- HIGH|src/api/payments.ts|SKIPPED|0|
- MEDIUM|src/utils/format.ts|SKIPPED|0|
- TEST|src/auth/login.test.ts|CONTEXT|1|
-
- ## Bugs
- <!-- One line per confirmed bug. Format: BUG-ID|SEVERITY|FILE|LINES|ONE_LINE_DESCRIPTION -->
- BUG-3|Critical|src/auth/login.ts|45-52|JWT token not validated before use
- BUG-7|Medium|src/auth/login.ts|89|Password comparison uses timing-unsafe equality
- BUG-12|Low|src/api/users.ts|120-125|Missing null check on optional profile field
+ - If ALL queued scannable source files show status DONE → output `<promise>COMPLETE</promise>` → loop ends
+ - If any queued scannable source files are SKIPPED or PARTIAL → call `ralph_done` → loop continues
+ - Do NOT stop just because the current prioritized tier is clean; continue descending through MEDIUM and LOW files automatically
+
+ 3. **Subsequent iterations**: Each new iteration reads
+ `.bug-hunter/coverage.json` to see what's already been done, then runs the
+ pipeline ONLY on uncovered files. New findings are appended to the
+ cumulative bug list.
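
The coverage check in step 2 of the new version can be sketched as a small decision function. This is only an illustration, assuming each entry in the coverage state's `files` array carries a `path` and a `status`; the function name `next_action` and its return strings are hypothetical and not part of bug-hunter.

```python
from typing import Iterable, Mapping


def next_action(files: Iterable[Mapping[str, str]]) -> str:
    """Return "COMPLETE" when every queued file is done, else "CONTINUE"."""
    # Compare case-insensitively so both "done" and "DONE" count as finished.
    remaining = [f["path"] for f in files if f.get("status", "").lower() != "done"]
    # Any pending, skipped, or partial file keeps the loop going,
    # regardless of its priority tier.
    return "COMPLETE" if not remaining else "CONTINUE"


files = [
    {"path": "src/auth/login.ts", "status": "done"},
    {"path": "src/api/payments.ts", "status": "pending"},
]
print(next_action(files))  # CONTINUE
```

In this sketch, "COMPLETE" stands for emitting `<promise>COMPLETE</promise>` and "CONTINUE" stands for calling `ralph_done`.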
+
+ ## Coverage file format (canonical)
+
+ **`.bug-hunter/coverage.json`:**
+ ```json
+ {
+   "schemaVersion": 1,
+   "iteration": 1,
+   "status": "IN_PROGRESS",
+   "files": [
+     { "path": "src/auth/login.ts", "status": "done" },
+     { "path": "src/api/payments.ts", "status": "pending" }
+   ],
+   "bugs": [
+     { "bugId": "BUG-3", "severity": "Critical", "file": "src/auth/login.ts", "claim": "JWT token not validated before use" }
+   ],
+   "fixes": [
+     { "bugId": "BUG-3", "status": "MANUAL_REVIEW" }
+   ]
+ }
  ```
 
+ **`.bug-hunter/coverage.md`** is derived from the JSON artifact for humans.
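
One way to derive the human-readable file from the JSON artifact is sketched here. The diff does not specify the layout of the rendered `coverage.md`, so the headings and bullet format in this example are assumptions, and `render_coverage_md` is a hypothetical helper name.

```python
def render_coverage_md(state: dict) -> str:
    """Render a coverage state dict as Markdown (layout is an assumption)."""
    lines = [
        "# Bug Hunt Coverage",
        "",
        f"Iteration: {state['iteration']}",
        f"Status: {state['status']}",
        "",
        "## Files",
    ]
    # One bullet per tracked file with its current status.
    lines += [f"- {f['path']}: {f['status']}" for f in state.get("files", [])]
    lines += ["", "## Bugs"]
    # One bullet per recorded bug summary.
    lines += [
        f"- {b['bugId']} ({b['severity']}) {b['file']}: {b['claim']}"
        for b in state.get("bugs", [])
    ]
    return "\n".join(lines) + "\n"


state = {
    "schemaVersion": 1,
    "iteration": 1,
    "status": "IN_PROGRESS",
    "files": [{"path": "src/auth/login.ts", "status": "done"}],
    "bugs": [
        {
            "bugId": "BUG-3",
            "severity": "Critical",
            "file": "src/auth/login.ts",
            "claim": "JWT token not validated before use",
        }
    ],
}
print(render_coverage_md(state))
```

Because the Markdown is always regenerated from the JSON, nothing ever needs to parse it back.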
+
  ## TODO.md task content for ralph_start
 
  Use this as the `taskContent` parameter when calling `ralph_start`:
@@ -82,44 +80,46 @@ Use this as the `taskContent` parameter when calling `ralph_start`:
  ## Coverage Tasks
  - [ ] All CRITICAL files scanned
  - [ ] All HIGH files scanned
+ - [ ] All MEDIUM files scanned
+ - [ ] All LOW files scanned
  - [ ] Findings verified through Skeptic+Referee pipeline
 
  ## Completion
  - [ ] ALL_TASKS_COMPLETE
 
  ## Instructions
- 1. Read .bug-hunter/coverage.md for previous iteration state
- 2. Parse the Files table — collect all lines where STATUS is not DONE and TIER is CRITICAL or HIGH
+ 1. Read .bug-hunter/coverage.json for previous iteration state
+ 2. Parse the `files` array — collect all entries where `status` is not `done`
  3. Run bug-hunter pipeline on those files only
- 4. Update coverage file: change STATUS to DONE, add BUG-IDs
- 5. Output <promise>COMPLETE</promise> when all CRITICAL/HIGH files are DONE
+ 4. Update coverage JSON: change file status to `done`, append bug summaries, and render coverage.md
+ 5. Output <promise>COMPLETE</promise> only when all queued source files are DONE
  6. Otherwise call ralph_done to continue to the next iteration
  ```
 
  ## Coverage file validation
 
  At the start of each iteration, validate the coverage file:
- 1. Check `SCHEMA_VERSION: 2` exists on line 2 — if missing, this is a v1 file; migrate by adding the header
- 2. Parse the CHECKSUM field: `[file_lines]|[bug_lines]` count actual lines in Files and Bugs sections
- 3. If counts don't match the checksum, the file may be corrupted. Warn: "Coverage file checksum mismatch (expected X|Y, got A|B). Re-scanning affected files." Then set any files with mismatched data to STATUS=PARTIAL for re-scan.
- 4. If the file fails to parse entirely (malformed lines, missing sections), rename it to `.bug-hunter/coverage.md.bak` and start fresh. Warn user.
-
- Update the CHECKSUM every time you write to the coverage file.
+ 1. Validate `.bug-hunter/coverage.json` against the local coverage schema.
+ 2. If validation fails, rename the bad file to `.bug-hunter/coverage.json.bak`
+ and start fresh. Warn the user.
+ 3. Always regenerate `.bug-hunter/coverage.md` from the JSON artifact after a
+ successful write.
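
The validate-or-reset behaviour in the new steps might look roughly like this. The local coverage schema is not shown in this diff, so the sketch substitutes a minimal required-keys check (`REQUIRED_KEYS` is an assumption), and `load_or_reset` is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Optional

# Assumed minimal subset of the real schema, for illustration only.
REQUIRED_KEYS = {"schemaVersion", "iteration", "status", "files"}


def load_or_reset(path: Path) -> Optional[dict]:
    """Return the parsed coverage state, or None after moving a bad file aside."""
    try:
        state = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(state, dict) or not REQUIRED_KEYS.issubset(state):
            raise ValueError("missing required coverage keys")
        return state
    except FileNotFoundError:
        return None  # no previous state: this is the first iteration
    except ValueError as err:  # also covers json.JSONDecodeError
        backup = path.with_name(path.name + ".bak")
        path.replace(backup)  # coverage.json -> coverage.json.bak
        print(f"Warning: invalid coverage file ({err}); starting fresh")
        return None
```

A caller would pass `Path(".bug-hunter/coverage.json")` and treat a `None` result as a fresh start.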
 
  ## Iteration behavior
 
  Each iteration after the first:
- 1. Read `.bug-hunter/coverage.md` — parse the Files table
- 2. Collect all lines where STATUS != DONE and TIER is CRITICAL or HIGH
+ 1. Read `.bug-hunter/coverage.json`
+ 2. Collect all file entries where `status != "done"`
  3. If none remain → output `<promise>COMPLETE</promise>` (this ends the ralph-loop)
  4. Otherwise, run the pipeline on remaining files only (use small/parallel mode based on count)
- 5. Update the coverage file: set STATUS to DONE for scanned files, append new bugs to the Bugs section
+ 5. Update `coverage.json`, then render `coverage.md`
  6. Increment ITERATION counter
  7. Call `ralph_done` to proceed to the next iteration
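
Steps 5 and 6 amount to a small state update, sketched below under the assumption that the coverage state has `files`, `bugs`, and `iteration` keys as in the JSON format this diff introduces. `record_iteration` is a hypothetical helper, not a bug-hunter function.

```python
from typing import Iterable, List


def record_iteration(state: dict, scanned: Iterable[str], new_bugs: List[dict]) -> dict:
    """Mark scanned files done, append new bugs, and bump the iteration counter."""
    scanned_paths = set(scanned)
    for entry in state["files"]:
        if entry["path"] in scanned_paths:
            entry["status"] = "done"
    # Findings are cumulative: new bugs are appended, never replaced.
    state.setdefault("bugs", []).extend(new_bugs)
    state["iteration"] += 1
    return state


state = {
    "iteration": 1,
    "files": [
        {"path": "src/auth/login.ts", "status": "done"},
        {"path": "src/api/payments.ts", "status": "pending"},
    ],
    "bugs": [],
}
record_iteration(state, ["src/api/payments.ts"], [{"bugId": "BUG-4"}])
print(state["iteration"], [f["status"] for f in state["files"]])
```

After this update the state would be written back to `coverage.json` and the Markdown view regenerated from it.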
 
  ## Safety
 
- - Max 10 iterations by default (set via `ralph_start({ maxIterations: 10 })`)
+ - Max iterations should scale with the queue size so autonomous runs do not stop early
  - Each iteration only scans NEW files — no re-scanning already-DONE files
  - User can stop anytime with ESC or `/ralph-stop`
- - All state is in `.bug-hunter/coverage.md` fully resumable, machine-parseable
+ - Canonical state is in `.bug-hunter/coverage.json`; `coverage.md` is derived
+ and fully resumable from that JSON
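
The diff does not say how the iteration cap should scale with the queue. One plausible rule, shown purely as an assumption, divides the queue size by the number of files handled per iteration and adds a little slack. The name `max_iterations_for` and both tuning parameters are hypothetical.

```python
import math


def max_iterations_for(queue_size: int, files_per_iteration: int, slack: int = 2) -> int:
    """Estimate an iteration cap large enough to drain the whole queue."""
    if files_per_iteration <= 0:
        raise ValueError("files_per_iteration must be positive")
    # Round up so a partially filled final batch still gets its own iteration.
    needed = math.ceil(queue_size / files_per_iteration)
    return max(1, needed + slack)


print(max_iterations_for(120, 15))  # 10
```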
package/modes/parallel.md CHANGED
@@ -70,7 +70,7 @@ Pass to the Hunter:
  - If scout hints exist (from Step 5), use them to prioritize certain code sections, but scan all files regardless.
  - `doc-lookup.md` contents as phase-specific context.
 
- After completion, read `.bug-hunter/findings.md`.
+ After completion, read `.bug-hunter/findings.json`.
 
  **Merge scout + deep findings:** If scout pass ran, compare scout findings with deep Hunter findings. Promote any scout-only findings (bugs the deep Hunter missed) into the findings list for Skeptic review.
 
@@ -80,7 +80,7 @@ If TOTAL FINDINGS: 0, skip Skeptic and Referee. Go to Step 7 (Final Report) in S
 
  ## Step 5-verify: Gap-fill check
 
- Same as small mode: compare FILES SCANNED vs risk map, re-scan any missed CRITICAL/HIGH files.
+ Same as small mode: compare FILES SCANNED vs risk map, then re-scan any missed queued scannable files in priority order.
 
  ---
 
@@ -104,7 +104,7 @@ Dispatch Referee using the standard dispatch pattern (see `_dispatch.md`, role=`
 
  Pass the merged Hunter findings + Skeptic challenges.
 
- After completion, read `.bug-hunter/referee.md`.
+ After completion, read `.bug-hunter/referee.json`, then render `.bug-hunter/report.md` from the JSON artifacts.
 
  ---
 
package/modes/scaled.md CHANGED
@@ -45,7 +45,7 @@ For each chunk: dispatch Hunter, record findings, mark done — same pattern as
  ### 5c. Cross-chunk consistency
 
  After all chunks complete:
  1. Merge findings from state into `.bug-hunter/findings.json`.
- 1. Merge findings from state into `.bug-hunter/findings.md`.
+ 1. Merge findings from state into `.bug-hunter/findings.json`.
  2. Run consistency check: look for duplicate BUG-IDs across chunks and conflicting claims on the same file/line.
  3. Resolve conflicts: keep the finding with the stronger evidence.
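
The consistency check and conflict resolution above could be approximated as follows. The shape of a finding is not given in this hunk, so the `chunk`, `lines`, and `confidence` fields are assumptions, with `confidence` standing in for "stronger evidence". Note that this sketch treats any two findings at the same file and line range as competing, which may be stricter than the real pipeline.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_duplicate_ids(findings: List[dict]) -> List[str]:
    """Return BUG-IDs that were assigned in more than one chunk."""
    chunks_by_id = defaultdict(set)
    for f in findings:
        chunks_by_id[f["bugId"]].add(f["chunk"])
    return sorted(bug_id for bug_id, chunks in chunks_by_id.items() if len(chunks) > 1)


def resolve_conflicts(findings: List[dict]) -> List[dict]:
    """Keep one finding per (file, lines), preferring the higher confidence."""
    best: Dict[Tuple[str, str], dict] = {}
    for f in findings:
        key = (f["file"], f["lines"])
        if key not in best or f["confidence"] > best[key]["confidence"]:
            best[key] = f
    return list(best.values())


findings = [
    {"bugId": "BUG-1", "chunk": 1, "file": "a.ts", "lines": "10-12", "confidence": 0.6},
    {"bugId": "BUG-1", "chunk": 2, "file": "b.ts", "lines": "5", "confidence": 0.9},
    {"bugId": "BUG-2", "chunk": 2, "file": "a.ts", "lines": "10-12", "confidence": 0.8},
]
print(find_duplicate_ids(findings))
print([f["bugId"] for f in resolve_conflicts(findings)])
```

Here `BUG-1` is flagged because two chunks reused the same ID, and the `a.ts` conflict is settled in favour of the finding with the higher assumed confidence.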
 
@@ -73,4 +73,4 @@ Pass merged Hunter findings + Skeptic challenges.
 
  Proceed to **Step 7** (Final Report) in SKILL.md.
 
- If `--loop` was specified and coverage is incomplete, the ralph-loop will iterate to cover remaining files.
+ If `--loop` was specified and coverage is incomplete, the ralph-loop will iterate to cover the remaining queued files until the queue is exhausted or the user interrupts.