@codexstar/bug-hunter 3.0.0

Files changed (51)
  1. package/CHANGELOG.md +151 -0
  2. package/LICENSE +21 -0
  3. package/README.md +665 -0
  4. package/SKILL.md +624 -0
  5. package/bin/bug-hunter +222 -0
  6. package/evals/evals.json +362 -0
  7. package/modes/_dispatch.md +121 -0
  8. package/modes/extended.md +94 -0
  9. package/modes/fix-loop.md +115 -0
  10. package/modes/fix-pipeline.md +384 -0
  11. package/modes/large-codebase.md +212 -0
  12. package/modes/local-sequential.md +143 -0
  13. package/modes/loop.md +125 -0
  14. package/modes/parallel.md +113 -0
  15. package/modes/scaled.md +76 -0
  16. package/modes/single-file.md +38 -0
  17. package/modes/small.md +86 -0
  18. package/package.json +56 -0
  19. package/prompts/doc-lookup.md +44 -0
  20. package/prompts/examples/hunter-examples.md +131 -0
  21. package/prompts/examples/skeptic-examples.md +87 -0
  22. package/prompts/fixer.md +103 -0
  23. package/prompts/hunter.md +146 -0
  24. package/prompts/recon.md +159 -0
  25. package/prompts/referee.md +122 -0
  26. package/prompts/skeptic.md +143 -0
  27. package/prompts/threat-model.md +122 -0
  28. package/scripts/bug-hunter-state.cjs +537 -0
  29. package/scripts/code-index.cjs +541 -0
  30. package/scripts/context7-api.cjs +133 -0
  31. package/scripts/delta-mode.cjs +219 -0
  32. package/scripts/dep-scan.cjs +343 -0
  33. package/scripts/doc-lookup.cjs +316 -0
  34. package/scripts/fix-lock.cjs +167 -0
  35. package/scripts/init-test-fixture.sh +19 -0
  36. package/scripts/payload-guard.cjs +197 -0
  37. package/scripts/run-bug-hunter.cjs +892 -0
  38. package/scripts/tests/bug-hunter-state.test.cjs +87 -0
  39. package/scripts/tests/code-index.test.cjs +57 -0
  40. package/scripts/tests/delta-mode.test.cjs +47 -0
  41. package/scripts/tests/fix-lock.test.cjs +36 -0
  42. package/scripts/tests/fixtures/flaky-worker.cjs +63 -0
  43. package/scripts/tests/fixtures/low-confidence-worker.cjs +73 -0
  44. package/scripts/tests/fixtures/success-worker.cjs +42 -0
  45. package/scripts/tests/payload-guard.test.cjs +41 -0
  46. package/scripts/tests/run-bug-hunter.test.cjs +403 -0
  47. package/scripts/tests/test-utils.cjs +59 -0
  48. package/scripts/tests/worktree-harvest.test.cjs +297 -0
  49. package/scripts/triage.cjs +528 -0
  50. package/scripts/worktree-harvest.cjs +516 -0
  51. package/templates/subagent-wrapper.md +109 -0
@@ -0,0 +1,121 @@
1
+ # Shared Dispatch Patterns
2
+
3
+ This file defines how to dispatch each pipeline role (Recon, Hunter, Skeptic, Referee, Fixer) using any `AGENT_BACKEND`. Mode files reference this instead of duplicating dispatch boilerplate.
4
+
5
+ ---
6
+
7
+ ## Dispatch by Backend
8
+
9
+ ### local-sequential
10
+
11
+ You execute the role yourself:
12
+
13
+ 1. Read the prompt file: `read({ path: "$SKILL_DIR/prompts/<role>.md" })`
14
+ 2. If the role needs doc-lookup: also read `$SKILL_DIR/prompts/doc-lookup.md`
15
+ 3. **Switch mindset** to the role (important for Skeptic, which must be genuinely adversarial, and for Referee, which must be impartial)
16
+ 4. Execute the role's instructions using the Read tool to examine source files
17
+ 5. Write output to the role's output file (see Output Files table below)
18
+
19
+ ### subagent
20
+
21
+ 1. Read the prompt file: `read({ path: "$SKILL_DIR/prompts/<role>.md" })`
22
+ 2. Read the wrapper template: `read({ path: "$SKILL_DIR/templates/subagent-wrapper.md" })`
23
+ 3. Generate payload:
24
+ ```bash
25
+ node "$SKILL_DIR/scripts/payload-guard.cjs" generate <role> ".bug-hunter/payloads/<role>-<context>.json"
26
+ ```
27
+ 4. Edit the payload JSON — fill in `skillDir`, `targetFiles`, and role-specific fields
28
+ 5. Validate:
29
+ ```bash
30
+ node "$SKILL_DIR/scripts/payload-guard.cjs" validate <role> ".bug-hunter/payloads/<role>-<context>.json"
31
+ ```
32
+ 6. Fill the subagent-wrapper template variables:
33
+ - `{ROLE_NAME}` = role name (see table below)
34
+ - `{ROLE_DESCRIPTION}` = role description (see table below)
35
+ - `{PROMPT_CONTENT}` = full contents of the prompt .md file
36
+ - `{TARGET_DESCRIPTION}` = what is being scanned
37
+ - `{SKILL_DIR}` = absolute path to skill directory
38
+ - `{FILE_LIST}` = files in scan order (CRITICAL first)
39
+ - `{RISK_MAP}` = risk classification from triage or Recon
40
+ - `{TECH_STACK}` = framework, auth, DB from Recon
41
+ - `{PHASE_SPECIFIC_CONTEXT}` = role-specific context (see below)
42
+ - `{OUTPUT_FILE_PATH}` = output file path
43
+ 7. Dispatch:
44
+ ```
45
+ subagent({ agent: "<role>-agent", task: "<filled template>", output: "<output-path>" })
46
+ ```
47
+ 8. Read the output file after completion
48
+
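Step 6 above amounts to placeholder substitution in the wrapper template. The following is a minimal sketch of that step, not the package's own code: `fillTemplate` is a hypothetical helper, and it assumes every placeholder is written as `{UPPER_SNAKE_CASE}`, as in the variable list above. It fails loudly when a placeholder is left unfilled, so a half-filled task is not dispatched.

```javascript
// Hypothetical helper (not part of the package): fill {UPPER_SNAKE} placeholders.
function fillTemplate(template, vars) {
  const missing = new Set();
  // Single pass over the template: substituted values are not rescanned,
  // so braces inside {PROMPT_CONTENT} are left untouched.
  const filled = template.replace(/\{([A-Z][A-Z0-9_]*)\}/g, (match, name) => {
    if (Object.prototype.hasOwnProperty.call(vars, name)) {
      return String(vars[name]);
    }
    missing.add(name);
    return match;
  });
  if (missing.size > 0) {
    throw new Error(`Unfilled template variables: ${[...missing].join(", ")}`);
  }
  return filled;
}

// Example usage with two of the variables listed above.
const task = fillTemplate("You are {ROLE_NAME}. Write to {OUTPUT_FILE_PATH}.", {
  ROLE_NAME: "hunter",
  OUTPUT_FILE_PATH: ".bug-hunter/findings.md",
});
console.log(task);
```

Throwing on a missing variable is a design choice of this sketch; the real wrapper may tolerate optional variables.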
49
+ ### teams
50
+
51
+ Same as subagent, but dispatch with:
52
+ ```
53
+ teams({ tasks: [{ text: "<filled template>" }], maxTeammates: 1 })
54
+ ```
55
+
56
+ ### interactive_shell
57
+
58
+ ```
59
+ interactive_shell({ command: 'pi "<filled task prompt>"', mode: "dispatch" })
60
+ ```
61
+
62
+ ---
63
+
64
+ ## Role Reference
65
+
66
+ | Role | Prompt File | Role Description | Output File | Phase-Specific Context |
67
+ |------|-------------|-----------------|-------------|----------------------|
68
+ | `recon` | `prompts/recon.md` | Reconnaissance agent — map the codebase and classify files by risk | `.bug-hunter/recon.md` | Triage JSON path (if exists) |
69
+ | `hunter` | `prompts/hunter.md` | Bug Hunter — find behavioral bugs in source code | `.bug-hunter/findings.md` | `doc-lookup.md` + risk map + tech stack |
70
+ | `skeptic` | `prompts/skeptic.md` | Skeptic — adversarial review to disprove false positives | `.bug-hunter/skeptic.md` | Hunter findings (compact: bugId, severity, file, lines, claim, evidence, runtimeTrigger) + `doc-lookup.md` |
71
+ | `referee` | `prompts/referee.md` | Referee — impartial final judge of all findings | `.bug-hunter/referee.md` | Hunter findings + Skeptic challenges |
72
+ | `fixer` | `prompts/fixer.md` | Surgical code fixer — implement minimal fixes for confirmed bugs | `.bug-hunter/fix-report.md` | Confirmed bugs from Referee + tech stack + `doc-lookup.md` |
73
+
74
+ ---
75
+
76
+ ## Fixer Dispatch: Worktree Isolation (subagent/teams only)
77
+
78
+ When `WORKTREE_MODE=true`, the Fixer runs in a managed git worktree for isolation. The orchestrator handles the full lifecycle — the Fixer just edits and commits.
79
+
80
+ **Key differences from other role dispatches:**
81
+
82
+ 1. The worktree is created by the orchestrator via `worktree-harvest.cjs prepare` BEFORE dispatch.
83
+ 2. The Fixer's working directory is set to the worktree's absolute path, not the project root.
84
+ 3. The Fixer MUST `git add` + `git commit` each fix (uncommitted work = `FIX_FAILED`).
85
+ 4. The orchestrator harvests commits via `worktree-harvest.cjs harvest` AFTER dispatch.
86
+ 5. The orchestrator cleans up via `worktree-harvest.cjs cleanup` AFTER harvest.
87
+
88
+ **CRITICAL — do NOT use `isolation: "worktree"` on the Agent tool:**
89
+ The Agent tool's built-in worktree isolation creates an ephemeral branch and auto-cleans on exit, which loses Fixer commits. We manage worktrees ourselves so the Fixer commits land directly on the fix branch.
90
+
91
+ **Fixer-specific template variables for `{PHASE_SPECIFIC_CONTEXT}`:**
92
+ - `WORKTREE_DIR: <absolute path to worktree>`
93
+ - `FIX_BRANCH: <branch name>`
94
+ - `COMMIT_FORMAT: fix(bug-hunter): BUG-N — [description]`
95
+ - Worktree isolation rules (see `{WORKTREE_RULES}` in subagent-wrapper.md)
96
+
97
+ **Lifecycle diagram:**
98
+ ```
99
+ Orchestrator Fixer (in worktree)
100
+ | |
101
+ |-- prepare (worktree-harvest.cjs) -->|
102
+ | |-- read code
103
+ | |-- edit files
104
+ | |-- git add + commit per bug
105
+ | |-- report done
106
+ |<-- harvest (worktree-harvest.cjs) --|
107
+ |-- cleanup (worktree-harvest.cjs) |
108
+ |-- verify on fix branch |
109
+ ```
110
+
111
+ ---
112
+
113
+ ## Context Pruning Rules
114
+
115
+ When passing data between phases, include only what the receiving role needs:
116
+
117
+ **To Skeptic:** For each bug: BUG-ID, severity, file, lines, claim, evidence, runtimeTrigger, cross-references. Omit: Hunter's internal reasoning, scan coverage stats, FILES SCANNED/SKIPPED metadata.
118
+
119
+ **To Referee:** Full Hunter findings + full Skeptic challenges. The Referee needs both sides to judge.
120
+
121
+ **To Fixer:** For each confirmed bug: BUG-ID, severity, file, line range, description, suggested fix direction, tech stack context. Omit: Skeptic challenges, Referee reasoning.
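
The pruning rules above can be sketched as a field whitelist per receiving role. This is an illustrative sketch only. The Skeptic keys `bugId`, `severity`, `file`, `lines`, `claim`, `evidence`, and `runtimeTrigger` come from the Role Reference table; `crossReferences`, `description`, and `suggestedFix` are assumed key names, and the finding objects are assumed to be plain JSON records.

```javascript
// Whitelists per receiving role. Key names marked "assumed" are hypothetical.
const SKEPTIC_FIELDS = [
  "bugId", "severity", "file", "lines", "claim", "evidence", "runtimeTrigger",
  "crossReferences", // assumed key name for cross-references
];
const FIXER_FIELDS = [
  "bugId", "severity", "file", "lines",
  "description",  // assumed key name
  "suggestedFix", // assumed key name for "suggested fix direction"
];

// Copy only the whitelisted keys that are actually present.
function pick(record, fields) {
  const out = {};
  for (const field of fields) {
    if (record[field] !== undefined) out[field] = record[field];
  }
  return out;
}

function pruneForSkeptic(findings) {
  return findings.map((finding) => pick(finding, SKEPTIC_FIELDS));
}

function pruneForFixer(confirmedBugs) {
  return confirmedBugs.map((bug) => pick(bug, FIXER_FIELDS));
}
```

A whitelist rather than a blacklist means any new internal field a Hunter emits is dropped by default instead of leaking into the next phase.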
@@ -0,0 +1,94 @@
1
+ # Extended Mode (FILE_BUDGET+1 to FILE_BUDGET×2 files) — chunked sequential
2
+
3
+ This mode handles larger targets that don't fit in a single Hunter pass.
4
+ Files are split into chunks processed sequentially with persistent state.
5
+ All phases are dispatched using the `AGENT_BACKEND` selected during SKILL preflight.
6
+
7
+ ---
8
+
9
+ ## Triage Integration
10
+
11
+ Before any phase, check for `.bug-hunter/triage.json` (written by Step 1). If present:
12
+ - Use `triage.riskMap` as the risk map — skip Recon's file classification.
13
+ - Use `triage.scanOrder` as the chunk-building source (files already priority-ordered).
14
+ - Use `triage.fileBudget` as FILE_BUDGET and chunk size cap.
15
+ - Use `triage.domains` for service-aware partitioning if available.
16
+ - Recon becomes an enrichment pass: identify tech stack and trust boundary patterns only.
17
+
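The triage check above can be sketched as follows. This is a minimal sketch under stated assumptions: `loadTriage` and `planFromTriage` are hypothetical helpers, `triage.domains` is assumed to be an array, and the returned plan object is an illustration rather than a format the package defines.

```javascript
const fs = require("node:fs");

// Returns the parsed triage object, or null when the file is absent.
function loadTriage(triagePath = ".bug-hunter/triage.json") {
  if (!fs.existsSync(triagePath)) return null;
  return JSON.parse(fs.readFileSync(triagePath, "utf8"));
}

// Derive how the later steps should behave from the triage data (or its absence).
function planFromTriage(triage) {
  if (triage === null) {
    // No triage: Recon does full discovery and its risk map drives chunking.
    return { reconMode: "full", chunkSource: "recon-risk-map", fileBudget: null, partitionBy: "risk-tier" };
  }
  // Assumption: domains is an array; more than one entry enables service-aware partitioning.
  const hasDomains = Array.isArray(triage.domains) && triage.domains.length > 1;
  return {
    reconMode: "enrichment",
    chunkSource: "triage.scanOrder",
    fileBudget: triage.fileBudget,
    partitionBy: hasDomains ? "domain" : "risk-tier",
  };
}
```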
18
+ ---
19
+
20
+ ## Step 4: Run Recon
21
+
22
+ Dispatch Recon using the standard dispatch pattern (see `_dispatch.md`, role=`recon`).
23
+
24
+ **If triage data exists**, tell Recon to use the triage risk map and only identify tech stack + patterns.
25
+
26
+ **If no triage data**, Recon does full file discovery and classification.
27
+
28
+ After Recon completes, read `.bug-hunter/recon.md` to extract the risk map and tech stack.
29
+
30
+ ---
31
+
32
+ ## Step 5: Run Chunked Hunters
33
+
34
+ ### 5a. Build chunks
35
+
36
+ Partition files from `triage.scanOrder` (or the Recon risk map if no triage) into chunks:
37
+ - **Service-aware partitioning (preferred):** If triage detected multiple domains, partition by domain.
38
+ - **Risk-tier partitioning (fallback):** Process CRITICAL files first, then HIGH, then MEDIUM.
39
+ - Chunk size: FILE_BUDGET ÷ 2 files per chunk (keep chunks small to avoid context compaction).
40
+ - Keep same-directory files together when possible.
41
+
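The risk-tier fallback above can be sketched like this. It is illustrative only: `buildChunks` is a hypothetical function, and the input shape `{ path, risk }` is an assumption, not the format of `triage.scanOrder` or the Recon risk map. Files are ordered by tier, then by directory so that same-directory files tend to land in the same chunk, and are then cut into chunks of FILE_BUDGET ÷ 2.

```javascript
const path = require("node:path");

const RISK_ORDER = ["CRITICAL", "HIGH", "MEDIUM"];

// Unknown tiers (for example LOW) sort after the known ones.
function riskRank(risk) {
  const index = RISK_ORDER.indexOf(risk);
  return index === -1 ? RISK_ORDER.length : index;
}

function buildChunks(files, fileBudget) {
  const chunkSize = Math.max(1, Math.floor(fileBudget / 2));
  const ordered = [...files].sort((a, b) => {
    const tierDiff = riskRank(a.risk) - riskRank(b.risk);
    if (tierDiff !== 0) return tierDiff;
    const dirA = path.posix.dirname(a.path);
    const dirB = path.posix.dirname(b.path);
    return dirA < dirB ? -1 : dirA > dirB ? 1 : 0;
  });
  const chunks = [];
  for (let i = 0; i < ordered.length; i += chunkSize) {
    chunks.push(ordered.slice(i, i + chunkSize).map((file) => file.path));
  }
  return chunks;
}
```

Service-aware partitioning would group by domain before this step; that variant is omitted here because the shape of `triage.domains` is not shown in this file.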
42
+ ### 5b. Initialize state
43
+
44
+ ```bash
45
+ node "$SKILL_DIR/scripts/bug-hunter-state.cjs" init ".bug-hunter/state.json" "extended" ".bug-hunter/source-files.json" 30
46
+ ```
47
+
48
+ ### 5c. Execute chunks sequentially
49
+
50
+ For each chunk:
51
+
52
+ 1. Get next chunk and mark in-progress:
53
+ ```bash
54
+ node "$SKILL_DIR/scripts/bug-hunter-state.cjs" next-chunk ".bug-hunter/state.json"
55
+ node "$SKILL_DIR/scripts/bug-hunter-state.cjs" mark-chunk ".bug-hunter/state.json" "<chunk-id>" in_progress
56
+ ```
57
+
58
+ 2. Dispatch Hunter on this chunk's files using the standard dispatch pattern (see `_dispatch.md`, role=`hunter`).
59
+
60
+ 3. Record findings and mark done:
61
+ ```bash
62
+ node "$SKILL_DIR/scripts/bug-hunter-state.cjs" record-findings ".bug-hunter/state.json" ".bug-hunter/chunk-<id>-findings.json" "extended"
63
+ node "$SKILL_DIR/scripts/bug-hunter-state.cjs" mark-chunk ".bug-hunter/state.json" "<chunk-id>" done
64
+ ```
65
+
66
+ 4. Continue to next chunk.
67
+
68
+ ### 5d. Merge all findings
69
+
70
+ After all chunks complete, merge findings from state into `.bug-hunter/findings.md`.
71
+
72
+ If the merged findings report `TOTAL FINDINGS: 0`, skip Skeptic and Referee (Steps 6 and 7 of this file) and go directly to the Final Report (Step 7 in SKILL.md).
73
+
74
+ ---
75
+
76
+ ## Step 6: Run Skeptic(s)
77
+
78
+ Dispatch 1-2 Skeptics by directory using the standard dispatch pattern (see `_dispatch.md`, role=`skeptic`).
79
+
80
+ Split bugs by directory/service so each Skeptic has a focused scope. Merge results after completion.
81
+
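One way to split bugs across one or two Skeptics by directory is sketched below. It is an assumption-laden illustration: `splitForSkeptics` is a hypothetical function, each bug is assumed to carry a `file` path, and the balancing rule (largest directory group first, each group assigned to the least-loaded Skeptic) is this sketch's choice, not something the mode file specifies.

```javascript
const path = require("node:path");

function splitForSkeptics(bugs, maxSkeptics = 2) {
  // Group bugs by the directory of the file they were found in.
  const byDir = new Map();
  for (const bug of bugs) {
    const dir = path.posix.dirname(bug.file);
    if (!byDir.has(dir)) byDir.set(dir, []);
    byDir.get(dir).push(bug);
  }
  // Largest groups first, so the big directories are placed before the small ones.
  const groups = [...byDir.values()].sort((a, b) => b.length - a.length);
  const count = Math.min(maxSkeptics, groups.length);
  const buckets = Array.from({ length: count }, () => []);
  for (const group of groups) {
    // Keep each directory whole and give it to the currently lightest Skeptic.
    const target = buckets.reduce((min, bucket) => (bucket.length < min.length ? bucket : min));
    target.push(...group);
  }
  return buckets;
}
```

Each returned bucket becomes the compact findings list for one Skeptic dispatch; the results are merged afterwards as described above.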
82
+ ---
83
+
84
+ ## Step 7: Run Referee
85
+
86
+ Dispatch Referee using the standard dispatch pattern (see `_dispatch.md`, role=`referee`).
87
+
88
+ Pass merged Hunter findings + Skeptic challenges.
89
+
90
+ ---
91
+
92
+ ## After Step 7
93
+
94
+ After the Referee completes, proceed to **Step 7** (Final Report) in SKILL.md. That step number refers to SKILL.md, not to this file's Step 7.
@@ -0,0 +1,115 @@
1
+ # Fix Loop Mode (`--loop --fix`)
2
+
3
+ When both `--loop` and `--fix` are set, the ralph-loop wraps the ENTIRE pipeline (find + fix). Each iteration:
4
+
5
+ 1. **Phase 1**: Find bugs (or read from previous coverage file for remaining bugs)
6
+ 2. **Phase 2**: Fix confirmed bugs
7
+ 3. **Verify**: Run tests with baseline diff
8
+ 4. **Evaluate**: Update coverage file with fix status
9
+
10
+ ## CRITICAL: Starting the ralph-loop
11
+
12
+ **You MUST call the `ralph_start` tool to begin the loop.** Without this call, the loop will not iterate.
13
+
14
+ When `LOOP_MODE=true` AND `FIX_MODE=true`, before running the first pipeline iteration:
15
+
16
+ 1. Build the task content from the TODO.md template below.
17
+ 2. Call the `ralph_start` tool:
18
+
19
+ ```
20
+ ralph_start({
21
+ name: "bug-hunter-fix-audit",
22
+ taskContent: <the TODO.md content below>,
23
+ maxIterations: 15
24
+ })
25
+ ```
26
+
27
+ 3. The ralph-loop system will then drive iteration. Each iteration:
28
+ - You receive the task prompt with the current checklist state.
29
+ - You execute one iteration of find + fix.
30
+ - You update `.bug-hunter/coverage.md` with results.
31
+ - If all bugs are FIXED and all CRITICAL/HIGH files are DONE → output `<promise>COMPLETE</promise>`.
32
+ - Otherwise → call `ralph_done` to proceed to the next iteration.
33
+
34
+ **Do NOT manually loop or re-invoke yourself.** The ralph-loop system handles iteration automatically.
35
+
36
+ ## Coverage file extension for fix mode
37
+
38
+ The `.bug-hunter/coverage.md` file gains additional sections:
39
+
40
+ ```markdown
41
+ ## Fixes
42
+ <!-- One line per bug. LATEST entry per BUG-ID is current status. -->
43
+ <!-- Format: BUG-ID|STATUS|ITERATION_FIXED|FILES_MODIFIED -->
44
+ <!-- STATUS: FIXED | FIX_REVERTED | FIX_FAILED | PARTIAL | FIX_CONFLICT | SKIPPED | FIXER_BUG -->
45
+ BUG-3|FIXED|1|src/auth/login.ts
46
+ BUG-7|FIXED|1|src/auth/login.ts
47
+ BUG-12|FIXED|2|src/api/users.ts
48
+
49
+ ## Test Results
50
+ <!-- One line per iteration. Format: ITERATION|PASSED|FAILED|NEW_FAILURES|RESOLVED -->
51
+ 1|45|3|2|0
52
+ 2|47|1|0|1
53
+ ```
54
+
55
+ **Parsing rule:** For each BUG-ID, use the LAST entry in the Fixes section. Earlier entries for the same BUG-ID are history — only the latest matters.
56
+
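The parsing rule above (last entry per BUG-ID wins) can be sketched as a small parser over the `## Fixes` section. This is a sketch, not the package's parser: `latestFixStatus` is a hypothetical function, and it keeps `FILES_MODIFIED` as a raw string because the separator for multiple files is not specified here.

```javascript
// Returns a Map of BUG-ID -> latest entry from the "## Fixes" section.
function latestFixStatus(coverageMd) {
  const latest = new Map();
  let inFixes = false;
  for (const rawLine of coverageMd.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("## ")) {
      // Only lines under "## Fixes" are parsed; any other heading ends the section.
      inFixes = line === "## Fixes";
      continue;
    }
    if (!inFixes || line === "" || line.startsWith("<!--")) continue;
    const [bugId, status, iteration, filesModified] = line.split("|");
    if (!bugId || !status) continue;
    // Later lines overwrite earlier ones, so the last entry per BUG-ID wins.
    latest.set(bugId, { status, iteration: Number(iteration), filesModified: filesModified ?? "" });
  }
  return latest;
}

const sample = [
  "## Fixes",
  "<!-- Format: BUG-ID|STATUS|ITERATION_FIXED|FILES_MODIFIED -->",
  "BUG-3|FIX_FAILED|1|src/auth/login.ts",
  "BUG-7|FIXED|1|src/auth/login.ts",
  "BUG-3|FIXED|2|src/auth/login.ts",
  "",
  "## Test Results",
  "1|45|3|2|0",
].join("\n");

console.log(latestFixStatus(sample).get("BUG-3").status);
```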
57
+ ## Loop iteration logic
58
+
59
+ ```
60
+ For each iteration:
61
+ 1. Read coverage file
62
+ 2. Collect (using LAST entry per BUG-ID):
63
+ - Unfixed bugs: latest STATUS in {FIX_REVERTED, FIX_FAILED, PARTIAL, FIX_CONFLICT, SKIPPED, FIXER_BUG}
64
+ - Unscanned files: STATUS != DONE in Files section (CRITICAL/HIGH only)
65
+ 3. If unfixed bugs exist OR unscanned files exist:
66
+ a. If unscanned files -> run Phase 1 (find pipeline) on them -> get new confirmed bugs
67
+ b. Combine: unfixed bugs + newly confirmed bugs
68
+ c. Run Phase 2 (fix + verify) on combined list
69
+ d. Update coverage file (append new entries to Fixes section)
70
+ e. Call ralph_done to proceed to next iteration
71
+ 4. If all bugs FIXED and all CRITICAL/HIGH files DONE:
72
+ -> Run final test suite one more time
73
+ -> If no new failures:
74
+ Output <promise>COMPLETE</promise>
75
+ -> If pre-existing failures only:
76
+ Note "pre-existing test failures — not caused by bug fixes"
77
+ Output <promise>COMPLETE</promise>
78
+ ```
79
+
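The per-iteration decision above can be sketched as a pure function. This is a hedged illustration: `nextAction` is a hypothetical function, the input shapes (a BUG-ID to status object, and file records of the form `{ path, risk, status }`) are assumptions, and it treats any latest status other than `FIXED` as unfixed, following the TODO.md instruction "latest entry not FIXED".

```javascript
function nextAction(latestStatuses, files) {
  // Bugs whose latest status is anything other than FIXED still need work.
  const retryBugs = Object.entries(latestStatuses)
    .filter(([, status]) => status !== "FIXED")
    .map(([bugId]) => bugId);
  // Only CRITICAL/HIGH files block completion.
  const scanFiles = files
    .filter((file) => (file.risk === "CRITICAL" || file.risk === "HIGH") && file.status !== "DONE")
    .map((file) => file.path);
  if (retryBugs.length > 0 || scanFiles.length > 0) {
    // Run Phase 1 on scanFiles (if any), then Phase 2 on retryBugs plus new findings,
    // update the coverage file, and call ralph_done.
    return { action: "iterate", scanFiles, retryBugs };
  }
  // Everything is FIXED and scanned: run the final test suite before emitting COMPLETE.
  return { action: "final-test", scanFiles: [], retryBugs: [] };
}
```

Keeping the decision separate from the tool calls (`ralph_done`, the test run) makes the stopping condition easy to check on its own.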
80
+ ## TODO.md task content for ralph_start
81
+
82
+ Use this as the `taskContent` parameter when calling `ralph_start`:
83
+
84
+ ```markdown
85
+ # Bug Hunt + Fix Audit
86
+
87
+ ## Discovery Tasks
88
+ - [ ] All CRITICAL files scanned
89
+ - [ ] All HIGH files scanned
90
+ - [ ] Findings verified through Skeptic+Referee pipeline
91
+
92
+ ## Fix Tasks
93
+ - [ ] All Critical bugs fixed
94
+ - [ ] All Medium bugs fixed
95
+ - [ ] All Low bugs fixed (best effort)
96
+ - [ ] No new test failures introduced
97
+ - [ ] Build and typecheck pass
98
+
99
+ ## Completion
100
+ - [ ] ALL_TASKS_COMPLETE
101
+
102
+ ## Instructions
103
+ 1. Read .bug-hunter/coverage.md for previous iteration state
104
+ 2. Parse Files table — collect unscanned CRITICAL/HIGH files
105
+ 3. Parse Fixes table — collect unfixed bugs (latest entry not FIXED)
106
+ 4. If unscanned files exist: run Phase 1 (find pipeline) on them
107
+ 5. If unfixed bugs exist: run Phase 2 (fix pipeline) on them
108
+ 6. Update coverage file with results
109
+ 7. Output <promise>COMPLETE</promise> when all bugs are FIXED and no new test failures
110
+ 8. Otherwise call ralph_done to continue to the next iteration
111
+ ```
112
+
113
+ ## Ralph-loop state file for fix mode
114
+
115
+ When `--loop --fix` is set, `.bug-hunter/ralph-loop.local.md` is created automatically by the `ralph_start` tool. You do NOT need to create this file manually; just call `ralph_start` with the correct parameters.