@codexstar/bug-hunter 3.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (51)
  1. package/CHANGELOG.md +151 -0
  2. package/LICENSE +21 -0
  3. package/README.md +665 -0
  4. package/SKILL.md +624 -0
  5. package/bin/bug-hunter +222 -0
  6. package/evals/evals.json +362 -0
  7. package/modes/_dispatch.md +121 -0
  8. package/modes/extended.md +94 -0
  9. package/modes/fix-loop.md +115 -0
  10. package/modes/fix-pipeline.md +384 -0
  11. package/modes/large-codebase.md +212 -0
  12. package/modes/local-sequential.md +143 -0
  13. package/modes/loop.md +125 -0
  14. package/modes/parallel.md +113 -0
  15. package/modes/scaled.md +76 -0
  16. package/modes/single-file.md +38 -0
  17. package/modes/small.md +86 -0
  18. package/package.json +56 -0
  19. package/prompts/doc-lookup.md +44 -0
  20. package/prompts/examples/hunter-examples.md +131 -0
  21. package/prompts/examples/skeptic-examples.md +87 -0
  22. package/prompts/fixer.md +103 -0
  23. package/prompts/hunter.md +146 -0
  24. package/prompts/recon.md +159 -0
  25. package/prompts/referee.md +122 -0
  26. package/prompts/skeptic.md +143 -0
  27. package/prompts/threat-model.md +122 -0
  28. package/scripts/bug-hunter-state.cjs +537 -0
  29. package/scripts/code-index.cjs +541 -0
  30. package/scripts/context7-api.cjs +133 -0
  31. package/scripts/delta-mode.cjs +219 -0
  32. package/scripts/dep-scan.cjs +343 -0
  33. package/scripts/doc-lookup.cjs +316 -0
  34. package/scripts/fix-lock.cjs +167 -0
  35. package/scripts/init-test-fixture.sh +19 -0
  36. package/scripts/payload-guard.cjs +197 -0
  37. package/scripts/run-bug-hunter.cjs +892 -0
  38. package/scripts/tests/bug-hunter-state.test.cjs +87 -0
  39. package/scripts/tests/code-index.test.cjs +57 -0
  40. package/scripts/tests/delta-mode.test.cjs +47 -0
  41. package/scripts/tests/fix-lock.test.cjs +36 -0
  42. package/scripts/tests/fixtures/flaky-worker.cjs +63 -0
  43. package/scripts/tests/fixtures/low-confidence-worker.cjs +73 -0
  44. package/scripts/tests/fixtures/success-worker.cjs +42 -0
  45. package/scripts/tests/payload-guard.test.cjs +41 -0
  46. package/scripts/tests/run-bug-hunter.test.cjs +403 -0
  47. package/scripts/tests/test-utils.cjs +59 -0
  48. package/scripts/tests/worktree-harvest.test.cjs +297 -0
  49. package/scripts/triage.cjs +528 -0
  50. package/scripts/worktree-harvest.cjs +516 -0
  51. package/templates/subagent-wrapper.md +109 -0
package/README.md ADDED
<p align="center">
<img src="docs/images/hero.png" alt="Bug Hunter: AI-powered adversarial code security scanner with multi-agent pipeline for automated vulnerability detection, false-positive elimination, and safe auto-fix" width="720">
</p>

<h1 align="center">🐛 Bug Hunter</h1>
<p align="center"><strong>AI-powered adversarial bug finding that argues with itself to surface real vulnerabilities, and auto-fixes them safely.</strong></p>
<p align="center">
<a href="#install">Install</a> ·
<a href="#usage">Usage</a> ·
<a href="#how-the-adversarial-pipeline-works">How It Works</a> ·
<a href="#features">Features</a> ·
<a href="#security-classification-stride-cwe-cvss">Security</a> ·
<a href="#supported-languages-and-frameworks">Languages</a>
</p>

---

## Install

```bash
# One-line install for any IDE (Claude Code, Cursor, Windsurf, Copilot, Kiro)
npx skills add codexstar69/bug-hunter
```

**Or install globally via npm:**

```bash
npm install -g @codexstar/bug-hunter
bug-hunter install   # auto-detects your IDE/agent
bug-hunter doctor    # verify environment
```

**Or clone manually:**

```bash
git clone https://github.com/codexstar69/bug-hunter.git ~/.agents/skills/bug-hunter
```

**Optional (recommended):** Install [Context Hub](https://github.com/andrewyng/context-hub) for curated documentation verification:

```bash
npm install -g @aisuite/chub
```

> **Requirements:** Node.js 18+. No other dependencies.
>
> **Works with:** [Pi](https://github.com/mariozechner/pi-coding-agent), Claude Code, Codex, Cursor, Windsurf, Kiro, Copilot, or any AI agent that can read files and run shell commands.

---

## Usage

```bash
/bug-hunter                          # scan entire project, auto-fix confirmed bugs
/bug-hunter src/                     # scan a specific directory
/bug-hunter lib/auth.ts              # scan a single file
/bug-hunter --scan-only src/         # report only, no code changes
/bug-hunter --fix --approve src/     # ask before each fix
/bug-hunter -b feature-xyz           # scan only files changed in branch (vs main)
/bug-hunter --staged                 # scan staged files (pre-commit hook)
/bug-hunter --deps --threat-model    # full audit: CVEs + STRIDE threat model
```

---

## What Is Bug Hunter

Three AI agents argue about your code. One hunts for bugs. One tries to disprove every finding. One delivers the final verdict. Only bugs that survive all three stages make the report.

This eliminates the two biggest problems with AI code review: **false positive overload** (the Skeptic catches them) and **fixes that break things** (canary rollout with automatic rollback catches those).

---

## Table of Contents

- [How the Adversarial Pipeline Works](#how-the-adversarial-pipeline-works)
- [Features](#features)
- [Security Classification: STRIDE, CWE, CVSS](#security-classification-stride-cwe-cvss)
- [Threat Modeling with STRIDE](#threat-modeling-with-stride)
- [Dependency CVE Scanning](#dependency-cve-scanning)
- [Strategic Fix Planning and Safe Auto-Fix](#strategic-fix-planning-and-safe-auto-fix)
- [Structured JSON Output for CI/CD Integration](#structured-json-output-for-cicd-integration)
- [Output Files Reference](#output-files-reference)
- [Supported Languages and Frameworks](#supported-languages-and-frameworks)
- [CLI Flags Reference](#cli-flags-reference)
- [Self-Test and Validation](#self-test-and-validation)
- [Project Structure](#project-structure)
- [License](#license)

---

## The Problem with AI Code Review

Traditional AI code review tools suffer from two persistent failure modes:

1. **False positive overload.** Developers waste hours triaging "bugs" that aren't real: the code is fine, or the framework already handles the edge case. This erodes trust and leads teams to ignore automated findings entirely.

2. **Fixes that introduce regressions.** Automated fixers often break working code because they lack full context: they don't understand the test suite, the framework's implicit behaviors, or the upstream dependencies.

Bug Hunter eliminates both problems:

- **False positives** are filtered through adversarial debate. The Hunter finds bugs, the Skeptic tries to disprove them with counter-evidence, and the Referee delivers an independent verdict. This replicates the dynamics of a real multi-reviewer code review, but automated and reproducible.

- **Regressions from fixes** are prevented by a strategic fix pipeline that captures test baselines, applies canary rollouts, checkpoints every commit, auto-reverts failures, and re-scans fixed code for newly introduced bugs.

---

## How the Adversarial Pipeline Works

### Pipeline Architecture Overview

<p align="center">
<img src="docs/images/pipeline-overview.png" alt="Bug Hunter 8-stage adversarial pipeline architecture: Triage, Recon, Hunter deep scan, Documentation verification, Skeptic adversarial challenge, Referee independent verdict, Strategic fix plan, and Verification" width="100%">
</p>

The pipeline processes your code through eight sequential stages. Each stage feeds structured output to the next, creating a chain of evidence that eliminates noise and surfaces only confirmed, real bugs.

```
Your Code
   ↓
🔍 Triage: classifies files by risk in <2s, zero AI cost
   ↓
🗺️ Recon: maps tech stack, identifies high-risk attack surfaces
   ↓
🎯 Hunter: deep behavioral scan for logic errors, security holes, race conditions
   ↓   ↕ verifies claims against official library documentation
🛡️ Skeptic: adversarial challenge that attempts to DISPROVE every finding
   ↓   ↕ verifies dismissals against official documentation
⚖️ Referee: independent final judge; re-reads code, delivers verdict
   ↓
📋 Report: confirmed bugs only, with severity, STRIDE/CWE tags, CVSS scores
   ↓
📝 Fix Plan: strategic plan with priority ordering, canary rollout, safety gates
   ↓
🔧 Fixer: executes fixes sequentially on a dedicated git branch
   ↓   ↕ checks documentation for correct API usage in patches
✅ Verify: tests every fix, reverts failures, re-scans for fixer-introduced bugs
```

### Adversarial Debate: Hunter vs Skeptic vs Referee

<p align="center">
<img src="docs/images/adversarial-debate.png" alt="Adversarial debate workflow diagram: Hunter agent reports bug findings, Skeptic agent attempts disproof with counter-evidence, Referee agent delivers independent verdict with confidence scoring" width="100%">
</p>

The core innovation is **structured adversarial debate** between agents with opposing incentives. This mirrors how real security teams operate: a penetration tester finds vulnerabilities, a defender challenges the findings, and a security architect makes the final call.

Each agent independently reads the source code. No agent trusts another's analysis; each verifies claims by re-reading the actual code and checking official documentation.

### Agent Incentive Scoring System

| Agent | Earns Points For | Loses Points For |
|-------|------------------|------------------|
| 🎯 **Hunter** | Reporting real, confirmed bugs | Reporting false positives |
| 🛡️ **Skeptic** | Successfully disproving false positives | Dismissing real bugs (2× penalty) |
| ⚖️ **Referee** | Accurate, well-reasoned final verdicts | Blind trust in either Hunter or Skeptic |

This scoring creates a **self-correcting equilibrium**. The Hunter doesn't flood the report with low-quality findings because false positives reduce its score. The Skeptic doesn't dismiss everything because missing a real bug incurs a double penalty. The Referee can't rubber-stamp; it must independently verify.
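
The table doesn't give point values, but the 2× penalty alone determines when dismissing a finding is rational. Below is a minimal sketch that assumes a hypothetical +1 for a correct disproof and -2 for dismissing a real bug. Under those values, the Skeptic should dismiss only when it is more than two-thirds sure a finding is a false positive. The names are illustrative and do not come from the package.

```typescript
// Hypothetical point values: the README states only that dismissing a real bug
// costs the Skeptic twice what a correct disproof earns.
const DISPROVE_REWARD = 1;
const WRONG_DISMISSAL_PENALTY = 2 * DISPROVE_REWARD;

// Expected score change if the Skeptic dismisses a finding that is a false
// positive with probability pFalsePositive.
function expectedDismissalScore(pFalsePositive: number): number {
  return pFalsePositive * DISPROVE_REWARD - (1 - pFalsePositive) * WRONG_DISMISSAL_PENALTY;
}

// Dismissing is only worth it when the expected score change is positive.
function shouldDismiss(pFalsePositive: number): boolean {
  return expectedDismissalScore(pFalsePositive) > 0;
}

// Break-even: p * reward = (1 - p) * penalty  =>  p = penalty / (reward + penalty) = 2/3.
const BREAK_EVEN = WRONG_DISMISSAL_PENALTY / (DISPROVE_REWARD + WRONG_DISMISSAL_PENALTY);
```

With a 1× penalty the break-even would be 1/2. Doubling the penalty pushes the Skeptic toward keeping findings it isn't sure about, which is the behavior the table aims for.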

---

## Features

### Zero-Token Triage: Instant File Classification

Before any AI agent runs, a lightweight Node.js script (`scripts/triage.cjs`) scans your entire codebase in **under 2 seconds**. It classifies every file by risk level (CRITICAL, HIGH, MEDIUM, LOW, or CONTEXT-ONLY), computes a token budget, and selects the optimal scanning strategy.

This means **zero wasted AI tokens** on file discovery and classification. A 2,000-file monorepo is triaged in the same time as a 10-file project.

The triage output drives every downstream decision: which files the Hunter reads first, how many parallel workers to spawn, and whether loop mode is needed for complete coverage.
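
The README doesn't list the triage rules. Below is a hypothetical path-pattern classifier that produces the same five tiers and a risk-first scan order. Every regex in it is an illustrative assumption, not the logic in `scripts/triage.cjs`. Because it inspects only paths and never calls a model, it costs no AI tokens, which is the property this section describes.

```typescript
// Hypothetical path-based triage; tiers mirror the README, patterns are assumptions.
type Risk = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW" | "CONTEXT-ONLY";

// Checked in order; the first matching pattern wins.
const RISK_RULES: Array<[Risk, RegExp]> = [
  // Docs, config, lockfiles and tests give context but are not scanned for bugs.
  ["CONTEXT-ONLY", /\.(md|json|ya?ml|lock)$|(^|\/)(tests?|__tests__)\/|\.(test|spec)\.[jt]sx?$/],
  ["CRITICAL", /(auth|login|session|token|crypto|payment|admin)/i],
  ["HIGH", /(api|routes?|controllers?|handlers?|db|queries|upload)/i],
  ["MEDIUM", /(services?|middleware|utils?|lib)\//i],
];

function classify(path: string): Risk {
  for (const [risk, pattern] of RISK_RULES) {
    if (pattern.test(path)) return risk;
  }
  return "LOW";
}

const RISK_RANK: Record<Risk, number> = { CRITICAL: 0, HIGH: 1, MEDIUM: 2, LOW: 3, "CONTEXT-ONLY": 4 };

// Highest-risk files first; ties keep their input order (Array.prototype.sort is stable).
function scanOrder(paths: string[]): string[] {
  return [...paths].sort((a, b) => RISK_RANK[classify(a)] - RISK_RANK[classify(b)]);
}
```

Path-only rules are cheap but coarse. For example, a security-sensitive file with an innocuous name would land in LOW.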

### Deep Bug Hunting: Runtime Behavioral Analysis

The Hunter agent reads your code file by file, prioritized by risk level, and searches for bugs that cause **real problems at runtime**:

- **Logic errors**: wrong comparisons, off-by-one, inverted conditions, unreachable branches
- **Security vulnerabilities**: SQL injection, XSS, path traversal, IDOR, authentication bypass, SSRF
- **Race conditions**: concurrent access without synchronization, deadlock patterns, TOCTOU
- **Error handling gaps**: swallowed exceptions, unhandled promise rejections, missing edge cases
- **Data integrity issues**: silent truncation, encoding corruption, timezone mishandling, integer overflow
- **API contract violations**: type mismatches between callers and callees, incorrect callback signatures
- **Resource leaks**: unclosed database connections, file handles, event listeners, WebSocket connections

The Hunter does **not** report code style preferences, naming conventions, unused variables, TODO comments, or subjective improvement suggestions. It reports only behavioral bugs that affect runtime correctness or security.

### Official Documentation Verification via Context Hub + Context7

<p align="center">
<img src="docs/images/doc-verify-fix-plan.png" alt="Documentation verification workflow: agents check official library docs via Context Hub and Context7 API before making claims, plus strategic fix planning timeline with canary rollout and checkpoint verification" width="100%">
</p>

AI models frequently make incorrect assumptions about library behavior, such as "Express sanitizes input by default" (it doesn't) or "Prisma parameterizes `$queryRaw` automatically" (it depends on usage). These wrong assumptions produce both false positives and missed real bugs.

Bug Hunter solves this by **verifying claims against official documentation** before any agent makes an assertion about framework behavior. It queries [Context Hub](https://github.com/andrewyng/context-hub) (curated, versioned docs) first, with [Context7](https://context7.com) as a fallback.

#### Which Agents Verify Documentation and When

| Agent | Verification Trigger | Example Query |
|-------|----------------------|---------------|
| 🎯 **Hunter** | Claiming a framework lacks a protection | "Does Express.js escape HTML in responses?" → Express docs confirm it doesn't → XSS reported |
| 🛡️ **Skeptic** | Disproving a finding based on framework behavior | "Does Prisma parameterize `$queryRaw`?" → Prisma docs show tagged template parameterization → false positive dismissed |
| 🔧 **Fixer** | Implementing a fix using a library API | "Correct `helmet()` middleware pattern in Express?" → docs → fix uses documented API |

#### Documentation Verification in Practice

When the Hunter reports a potential SQL injection:

```
1. Hunter reads code: db.query(`SELECT * FROM users WHERE id = ${userId}`)
2. Hunter queries: "Does node-postgres parameterize template literals?"
   → Runs: node doc-lookup.cjs get "/node-postgres/node-pg" "template literal queries"
   → pg docs: template literals are interpolated directly, NOT parameterized
3. Hunter reports: "SQL injection: per pg docs, template literals are string-interpolated"
```

When the Skeptic reviews the same finding:

```
1. Skeptic independently re-reads the source code
2. Skeptic queries the same documentation to verify the Hunter's claim
3. Skeptic confirms: "pg documentation agrees; this is a real injection vector"
4. Finding survives to Referee stage
```

#### Abuse Prevention Rules

- Agents query docs only for a **specific claim to verify**, not speculatively for every import
- **One lookup per claim**: no chained exploratory searches
- If the API returns nothing useful, the agent states: "Could not verify from docs, proceeding based on code analysis only"
- Agents **cite their evidence** ("Per Express docs: [exact quote]") so reasoning is auditable

#### Supported Ecosystem Coverage

Documentation verification works for any library available in [Context Hub](https://github.com/andrewyng/context-hub) (curated docs) or indexed by Context7, covering the majority of popular packages across **npm, PyPI, Go modules, Rust crates, Ruby gems**, and more.

### Adversarial Skeptic with 15 Hard Exclusion Rules

The Skeptic doesn't rubber-stamp findings. It re-reads the actual source code for every reported bug and attempts to disprove it. Before deep adversarial analysis, it applies **15 hard exclusion rules**, settled false-positive categories that are dismissed immediately:

| # | Exclusion Rule | Rationale |
|---|----------------|-----------|
| 1 | DoS claims without demonstrated amplification | Theoretical only |
| 2 | Rate limiting concerns | Informational, not behavioral bugs |
| 3 | Memory safety in memory-safe languages | Rust safe code, Go, Java GC |
| 4 | Findings in test files | Test code, not production |
| 5 | Log injection concerns | Low-impact in most contexts |
| 6 | SSRF with attacker controlling only the path | Insufficient control for exploitation |
| 7 | LLM prompt injection | Out of scope for code review |
| 8 | ReDoS without a demonstrated >1s payload | Unproven impact |
| 9 | Documentation/config-only findings | Not runtime behavior |
| 10 | Missing audit logging | Informational, not a bug |
| 11 | Environment variables treated as untrusted | Server-side env is trusted |
| 12 | UUIDs treated as guessable | Random (v4) UUIDs carry 122 random bits |
| 13 | Client-side-only auth checks with server enforcement | Server enforces correctly |
| 14 | Secrets on disk with proper file permissions | OS-level protection is sufficient |
| 15 | Memory/CPU exhaustion without external attack path | No exploitable entry point |

Findings that survive the exclusion filter receive full adversarial analysis: independent code re-reading, framework documentation verification, and confidence-gated verdicts.
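
To make the immediate-dismissal step concrete, here is a hypothetical sketch. It encodes four of the fifteen rules (3, 4, 5, 7) as predicates over a simplified finding record. The `Finding` shape, the `kind` labels, and the file-extension heuristic are assumptions, and the package may apply these rules in its Skeptic prompt rather than in code.

```typescript
// Hypothetical finding shape; field names and `kind` labels are assumptions.
interface Finding {
  id: string;
  file: string;
  kind: string; // e.g. "memory-safety", "sql-injection", "log-injection"
}

interface ExclusionRule {
  rule: number; // row number in the exclusion table
  applies: (f: Finding) => boolean;
}

// Rust is left out: safe vs. unsafe code can't be told apart from the extension alone.
const MEMORY_SAFE_EXT = /\.(go|java|kt|py|rb|ts|js)$/;

const HARD_EXCLUSIONS: ExclusionRule[] = [
  { rule: 3, applies: (f) => f.kind === "memory-safety" && MEMORY_SAFE_EXT.test(f.file) },
  { rule: 4, applies: (f) => /(^|\/)(tests?|__tests__)\/|\.(test|spec)\.\w+$/.test(f.file) },
  { rule: 5, applies: (f) => f.kind === "log-injection" },
  { rule: 7, applies: (f) => f.kind === "prompt-injection" },
];

// Splits findings into survivors (full adversarial review) and [id, rule] dismissals.
function applyExclusions(findings: Finding[]): { survivors: Finding[]; excluded: Array<[string, number]> } {
  const survivors: Finding[] = [];
  const excluded: Array<[string, number]> = [];
  for (const f of findings) {
    const hit = HARD_EXCLUSIONS.find((r) => r.applies(f));
    if (hit) excluded.push([f.id, hit.rule]);
    else survivors.push(f);
  }
  return { survivors, excluded };
}
```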

### Few-Shot Calibration for Precision Tuning

Hunter and Skeptic agents receive **worked calibration examples** before scanning: real findings with complete analysis chains that show the expected reasoning quality.

- **Hunter examples**: 3 confirmed bugs (SQL injection, IDOR, command injection) + 2 correctly identified false positives
- **Skeptic examples**: 2 accepted real bugs + 2 correctly disproved false positives + 1 manual-review edge case

These examples calibrate agent judgment and establish the expected evidence standard for every finding.

### Automatic Codebase Scaling Strategies

Bug Hunter automatically selects the optimal scanning strategy based on your codebase size:

| Codebase Size | Strategy | Behavior |
|---------------|----------|----------|
| **1 file** | Single-file | Direct deep scan, zero overhead |
| **2–10 files** | Small | Quick recon + single deep pass |
| **11–60 files** | Parallel | Hybrid scanning with optional dual-lens verification |
| **60–120 files** | Extended | Sequential chunked scanning with progress checkpoints |
| **120–180 files** | Scaled | State-driven chunks with resume capability |
| **180+ files** | Large-codebase | Domain-scoped pipelines + boundary audits (loop mode, on by default) |

Loop mode is **on by default**: the pipeline runs iteratively until every critical and high-risk file has been audited, with persistent state enabling stop-and-resume workflows. Use `--no-loop` for a single-pass scan.
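
The table reads as a threshold lookup. Below is a minimal sketch whose strategy names match the files under `modes/`. Note that the table's ranges share their endpoints (60, 120, 180). This sketch assumes each endpoint belongs to the lower tier, and the real cutoffs may differ.

```typescript
// Sketch of the size-to-strategy table. Assumes shared endpoints (60, 120, 180)
// belong to the lower tier; the real cutoffs may differ.
type Strategy = "single-file" | "small" | "parallel" | "extended" | "scaled" | "large-codebase";

// [largest file count for this tier, strategy]
const TIERS: Array<[number, Strategy]> = [
  [1, "single-file"],
  [10, "small"],
  [60, "parallel"],
  [120, "extended"],
  [180, "scaled"],
];

function selectStrategy(fileCount: number): Strategy {
  if (fileCount < 1 || Math.floor(fileCount) !== fileCount) {
    throw new RangeError(`expected a positive whole file count, got ${fileCount}`);
  }
  for (const [maxFiles, strategy] of TIERS) {
    if (fileCount <= maxFiles) return strategy;
  }
  return "large-codebase"; // loop mode is on by default at this size
}
```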

---

## Security Classification: STRIDE, CWE, CVSS

Every security finding is tagged with industry-standard identifiers, making Bug Hunter output compatible with professional security tooling, compliance frameworks, and vulnerability management platforms.

### STRIDE Threat Categorization

Each security bug is classified under one of the six STRIDE threat categories:

| Category | Threat Type | Example |
|----------|-------------|---------|
| **S**: Spoofing | Identity falsification | Authentication bypass, JWT forgery |
| **T**: Tampering | Data modification | SQL injection, parameter manipulation |
| **R**: Repudiation | Action deniability | Missing audit logs for sensitive operations |
| **I**: Information Disclosure | Data leakage | Exposed API keys, verbose error messages |
| **D**: Denial of Service | Availability disruption | Unbounded queries, resource exhaustion |
| **E**: Elevation of Privilege | Unauthorized access escalation | IDOR, broken access control |

### CWE Weakness Identification

Findings include the specific [CWE (Common Weakness Enumeration)](https://cwe.mitre.org/) identifier, the industry standard for classifying software weaknesses:

- **CWE-89**: SQL Injection
- **CWE-79**: Cross-Site Scripting (XSS)
- **CWE-22**: Path Traversal
- **CWE-639**: Insecure Direct Object Reference (IDOR)
- **CWE-78**: OS Command Injection
- **CWE-862**: Missing Authorization

CWE tags enable direct mapping to **OWASP Top 10**, **NIST NVD**, and compliance frameworks like **SOC 2** and **ISO 27001**.

### CVSS 3.1 Severity Scoring

Critical and high-severity security bugs receive a **CVSS 3.1 vector and numeric score** (0.0–10.0):

```
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:N → 9.1 (Critical)
```

CVSS scores enable **risk-based prioritization**: teams can set CI/CD gates that block merges on findings above a threshold score.
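
The sketch below shows where the 9.1 comes from. It implements the CVSS v3.1 base-score equations and the Appendix A `Roundup` function from the FIRST specification; temporal and environmental metrics are omitted. It is independent of Bug Hunter, since the README doesn't say how the Referee computes its scores.

```typescript
// CVSS v3.1 base score (FIRST spec, section 7.1 and Appendix A). Temporal and
// environmental metrics are out of scope for this sketch.
const AV: Record<string, number> = { N: 0.85, A: 0.62, L: 0.55, P: 0.2 };
const AC: Record<string, number> = { L: 0.77, H: 0.44 };
const UI: Record<string, number> = { N: 0.85, R: 0.62 };
const CIA: Record<string, number> = { H: 0.56, L: 0.22, N: 0 };
// Privileges Required is weighted higher when Scope is Changed.
const PR_UNCHANGED: Record<string, number> = { N: 0.85, L: 0.62, H: 0.27 };
const PR_CHANGED: Record<string, number> = { N: 0.85, L: 0.68, H: 0.5 };

// Spec's Roundup: smallest one-decimal number >= x, using integer math to avoid float error.
function roundup(x: number): number {
  const i = Math.round(x * 100000);
  return i % 10000 === 0 ? i / 100000 : (Math.floor(i / 10000) + 1) / 10;
}

function baseScore(vector: string): number {
  // "CVSS:3.1/AV:N/..." -> { AV: "N", ... }; the version prefix is skipped.
  const m: Record<string, string> = {};
  for (const part of vector.split("/").slice(1)) {
    const [key, value] = part.split(":");
    m[key] = value;
  }
  const w = (table: Record<string, number>, key: string): number => {
    const weight = table[m[key]];
    if (weight === undefined) throw new Error(`missing or invalid metric ${key}`);
    return weight;
  };
  if (m["S"] !== "U" && m["S"] !== "C") throw new Error("missing or invalid metric S");
  const changed = m["S"] === "C";

  const iss = 1 - (1 - w(CIA, "C")) * (1 - w(CIA, "I")) * (1 - w(CIA, "A"));
  const impact = changed ? 7.52 * (iss - 0.029) - 3.25 * Math.pow(iss - 0.02, 15) : 6.42 * iss;
  const exploitability =
    8.22 * w(AV, "AV") * w(AC, "AC") * w(changed ? PR_CHANGED : PR_UNCHANGED, "PR") * w(UI, "UI");

  if (impact <= 0) return 0;
  return roundup(Math.min((changed ? 1.08 : 1) * (impact + exploitability), 10));
}

// Qualitative rating bands from the spec.
function severityRating(score: number): string {
  if (score === 0) return "None";
  if (score < 4) return "Low";
  if (score < 7) return "Medium";
  if (score < 9) return "High";
  return "Critical";
}
```

For the example vector, ISS = 1 - 0.44 × 0.44 = 0.8064 and impact = 6.42 × 0.8064 ≈ 5.177. Exploitability is 8.22 × 0.85 × 0.77 × 0.85 × 0.85 ≈ 3.887. Their sum, about 9.064, rounds up to 9.1.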

### Enriched Referee Verdicts with Proof of Concept

<p align="center">
<img src="docs/images/security-finding-card.png" alt="Enriched security finding card showing bug ID, severity badge, STRIDE and CWE classification, CVSS 3.1 score ring, reachability and exploitability ratings, and proof of concept payload" width="100%">
</p>

For confirmed security bugs, the Referee enriches the verdict with professional-grade detail:

| Field | Description |
|-------|-------------|
| **Reachability** | Can an external attacker reach this code path? (EXTERNAL / AUTHENTICATED / INTERNAL / UNREACHABLE) |
| **Exploitability** | How difficult is exploitation? (EASY / MEDIUM / HARD) |
| **CVSS 3.1 Score** | Numeric severity on the 0.0–10.0 scale with full vector string |
| **Proof of Concept** | Minimal benign PoC: payload, request, expected behavior, actual behavior |

#### Example Enriched Verdict

```
VERDICT: REAL BUG | Confidence: High
- Reachability: EXTERNAL
- Exploitability: EASY
- CVSS: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:N (9.1)
- Proof of Concept:
  - Payload: ' OR '1'='1
  - Request: GET /api/users?search=test' OR '1'='1
  - Expected: Returns matching users only
  - Actual: Returns ALL users; SQL injection bypasses the WHERE clause
```

---

## Threat Modeling with STRIDE

Run with `--threat-model` and Bug Hunter generates a comprehensive **STRIDE threat model** for your codebase:

- **Trust boundary mapping**: public → authenticated → internal service boundaries
- **Entry point identification**: HTTP routes, WebSocket handlers, queue consumers, cron jobs
- **Data flow analysis**: traces sensitive data from ingestion to storage to response
- **Tech-stack-specific patterns**: vulnerable vs. safe code examples for your exact framework versions

The threat model is saved to `.bug-hunter/threat-model.md` and automatically feeds into Hunter and Recon for more targeted analysis. Threat models are reused across runs and regenerated if older than 90 days.

---

## Dependency CVE Scanning

Run with `--deps` for **third-party vulnerability auditing**:

- **Package manager support**: npm, pnpm, yarn, bun (Node.js), pip (Python), go (Go), cargo (Rust)
- **Severity filtering**: only HIGH and CRITICAL CVEs, with no noise from low-severity advisories
- **Lockfile-aware detection**: automatically identifies your package manager and runs the correct audit command
- **Reachability analysis**: scans your source code to determine whether you actually *use* the vulnerable API; a vulnerable transitive dependency you never import is flagged as `NOT_REACHABLE`

Dependency findings are saved to `.bug-hunter/dep-findings.json` and cross-referenced by the Hunter when scanning your application code.
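
The README doesn't describe how reachability is decided. One simple, hypothetical approach is an import check: if no source file imports or requires the vulnerable package, mark it `NOT_REACHABLE`. The sketch below does only that. Deciding whether the specific vulnerable API is called is beyond it, and it is not the algorithm in `dep-scan.cjs`.

```typescript
// Hypothetical import-level reachability check.
type Reachability = "REACHABLE" | "NOT_REACHABLE";

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// sources: file path -> file contents
function reachability(pkg: string, sources: Record<string, string>): Reachability {
  const name = escapeRegExp(pkg);
  // Matches `from "pkg"`, `import "pkg"`, `require("pkg/sub")` and `import("pkg")`,
  // but not a different package that merely shares a prefix, such as "pkg.merge".
  const importRe = new RegExp(
    `(?:from\\s+|import\\s+|require\\(\\s*|import\\(\\s*)["']${name}(?:/[^"']*)?["']`
  );
  const used = Object.keys(sources).some((file) => importRe.test(sources[file]));
  return used ? "REACHABLE" : "NOT_REACHABLE";
}
```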

---

## Strategic Fix Planning and Safe Auto-Fix

Bug Hunter doesn't throw uncoordinated patches at your codebase. After the Referee confirms real bugs, the system builds a **strategic fix plan** with safety gates at every step. That is the difference between "an AI that edits files" and "an AI that engineers patches."

### Phase 1: Safety Setup and Git Branching

- Verifies you're in a git repository (warns if not; there is no rollback without version control)
- Captures the current branch and commit hash as a **restore point**
- Stashes uncommitted changes safely, so nothing is lost
- Creates a dedicated fix branch: `bug-hunter-fix-YYYYMMDD-HHmmss`
- Acquires a **single-writer lock** that prevents concurrent fixers from conflicting
- **Worktree isolation** (subagent/teams backends): the Fixer runs in an isolated git worktree, so edits never touch your working tree until verified. This is managed by `worktree-harvest.cjs` with automatic crash recovery.
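
The fix branch name encodes a timestamp. Below is a small hypothetical helper that produces the `bug-hunter-fix-YYYYMMDD-HHmmss` format. It assumes UTC, since the README doesn't say whether local time is used.

```typescript
// Hypothetical helper for the fix-branch name; assumes UTC.
function pad2(n: number): string {
  return (n < 10 ? "0" : "") + n;
}

function fixBranchName(d: Date): string {
  const ymd = `${d.getUTCFullYear()}${pad2(d.getUTCMonth() + 1)}${pad2(d.getUTCDate())}`;
  const hms = `${pad2(d.getUTCHours())}${pad2(d.getUTCMinutes())}${pad2(d.getUTCSeconds())}`;
  return `bug-hunter-fix-${ymd}-${hms}`;
}
```

For example, a run started at 08:30:00 UTC on 2026-03-10 would get `bug-hunter-fix-20260310-083000`.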

### Phase 2: Test Baseline Capture

- Auto-detects your project's test, typecheck, and build commands
- Runs the test suite once to record the **passing baseline**
- This baseline is critical: if a fix causes a previously passing test to fail, the fix is auto-reverted

### Phase 3: Confidence-Gated Fix Queue

- **75% confidence gate**: only bugs the Referee confirmed with ≥75% confidence are auto-fixed
- Bugs below the threshold are marked `MANUAL_REVIEW`: reported but never auto-edited
- **Conflict resolution**: same-file bugs are grouped and ordered to prevent overlapping edits
- **Severity ordering**: Critical → High → Medium → Low
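
Phase 3 can be sketched as a filter plus a sort. The bug record shape below is hypothetical. The 75% gate and the severity order come from the list above. Within each severity level, the sketch keeps same-file bugs adjacent and orders them highest line first, so earlier edits don't shift the line numbers of later ones; that is one assumed way to "prevent overlapping edits".

```typescript
// Hypothetical fix-queue builder; the bug record shape is an assumption.
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

interface ConfirmedBug {
  id: string;
  severity: Severity;
  confidence: number; // Referee confidence, 0-100
  file: string;
  line: number;
}

const SEVERITY_RANK: Record<Severity, number> = { CRITICAL: 0, HIGH: 1, MEDIUM: 2, LOW: 3 };
const CONFIDENCE_GATE = 75;

function buildFixQueue(bugs: ConfirmedBug[]): { queue: ConfirmedBug[]; manualReview: string[] } {
  // Below the gate: reported as MANUAL_REVIEW, never auto-edited.
  const manualReview = bugs.filter((b) => b.confidence < CONFIDENCE_GATE).map((b) => b.id);
  const queue = bugs
    .filter((b) => b.confidence >= CONFIDENCE_GATE)
    .sort(
      (a, b) =>
        SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity] || // Critical first
        (a.file < b.file ? -1 : a.file > b.file ? 1 : 0) || // same-file bugs of equal severity stay adjacent
        b.line - a.line // bottom-up within a file
    );
  return { queue, manualReview };
}
```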

### Phase 4: Canary Rollout Strategy

```
Fix Plan: 7 eligible bugs | canary: 2 | rollout: 5 | manual-review: 3

Canary Phase:
  BUG-1 (CRITICAL) → fix SQL injection in users.ts:45 → commit → test → ✅ pass
  BUG-2 (CRITICAL) → fix auth bypass in auth.ts:23 → commit → test → ✅ pass
  Canary passed, continuing rollout

Rollout Phase:
  BUG-3 (HIGH) → fix XSS in template.ts:89 → commit → test → ✅ pass
  BUG-4 (MEDIUM) → fix race condition in queue.ts:112 → commit → test → ❌ FAIL
    → Auto-reverting BUG-4 fix → re-test → ✅ pass (failure cleared)
    → BUG-4 status: FIX_REVERTED
  BUG-5 (MEDIUM) → fix error swallow in api.ts:67 → commit → test → ✅ pass
```

The 1–3 highest-severity bugs are fixed first as a **canary group**. If a canary fix breaks tests, the entire fix pipeline halts and no further changes are applied. If the canaries pass, the remaining fixes roll out sequentially with per-fix checkpoints. A rollout fix that fails is auto-reverted on its own (as with BUG-4 above), and the rollout continues.
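
The canary-then-rollout loop can be sketched with the git and test plumbing injected as callbacks. `applyAndCommit`, `runTests`, and `revertLast` are hypothetical names. Halting on a failed canary and auto-reverting a failed rollout fix follow the text above. Two behaviors are assumptions: halting when a revert doesn't clear the failure, and marking untouched bugs `SKIPPED`.

```typescript
// Hypothetical canary-then-rollout loop; the callbacks stand in for real git/test plumbing.
type FixStatus = "FIXED" | "FIX_REVERTED" | "FIX_FAILED" | "SKIPPED";

interface FixHooks {
  applyAndCommit: (bugId: string) => void;
  runTests: () => boolean; // true when the suite matches the baseline
  revertLast: () => void;
}

function rollOut(queue: string[], canaryCount: number, hooks: FixHooks): Map<string, FixStatus> {
  const statuses = new Map<string, FixStatus>();
  const canaries = Math.min(Math.max(canaryCount, 1), 3); // README: 1-3 canaries
  for (let i = 0; i < queue.length; i++) {
    const id = queue[i];
    hooks.applyAndCommit(id);
    if (hooks.runTests()) {
      statuses.set(id, "FIXED");
      continue;
    }
    hooks.revertLast();
    // If the suite still fails after reverting, a human has to step in.
    const outcome: FixStatus = hooks.runTests() ? "FIX_REVERTED" : "FIX_FAILED";
    statuses.set(id, outcome);
    // Halt on a failed canary (per the README) or an unrecoverable revert (an assumption).
    if (i < canaries || outcome === "FIX_FAILED") {
      for (const rest of queue.slice(i + 1)) statuses.set(rest, "SKIPPED");
      break;
    }
  }
  return statuses;
}
```

These statuses are provisional: the Phase 5 re-scan could still reclassify a `FIXED` bug as `FIXER_BUG`.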

### Phase 5: Post-Fix Verification and Re-Scan

After all fixes are applied, three verification steps run:

1. **Full test suite**: compared against the baseline to surface any new failures
2. **Typecheck and build**: catches compile-time errors the Fixer may have introduced
3. **Post-fix re-scan**: a lightweight Hunter re-scans ONLY the lines the Fixer changed, specifically looking for bugs the Fixer itself introduced (e.g., an off-by-one in new validation logic)

### Fix Status Reference

Every bug receives a final status after the fix pipeline completes:

| Status | Meaning |
|--------|---------|
| **FIXED** | Patch applied, all tests pass, no fixer-introduced regressions |
| **FIX_REVERTED** | Patch applied but caused a test failure; cleanly auto-reverted |
| **FIX_FAILED** | Patch caused failures and could not be cleanly reverted; needs manual intervention |
| **PARTIAL** | Minimal patch applied, but a larger refactor is needed for complete resolution |
| **SKIPPED** | Bug confirmed but fix not attempted (too risky, architectural scope, etc.) |
| **FIXER_BUG** | Post-fix re-scan detected that the Fixer introduced a new bug |
| **MANUAL_REVIEW** | Referee confidence below 75%; reported but not auto-fixed |

### Documentation-Verified Fixes

The Fixer verifies correct API usage by querying official documentation before implementing patches:

```
Example: Fixing SQL injection (BUG-1)

1. Fixer reads Referee verdict: "SQL injection via string concatenation in pg query"
2. Fixer queries: "Correct parameterized query pattern in node-postgres?"
   → Runs: node doc-lookup.cjs get "/node-postgres/node-pg" "parameterized queries"
   → pg docs: Use db.query('SELECT * FROM users WHERE id = $1', [userId])
3. Fixer implements the documented pattern, not a guess from training data
4. Checkpoint commit → tests run → pass ✅
```

This prevents a common failure: the Fixer "fixing" a bug using an API pattern that doesn't exist or behaves differently than expected.
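
The before/after shape of this fix can be shown without a database. In the sketch below, `QueryConfig` is declared locally to mirror the `{ text, values }` object that node-postgres accepts in `client.query()`. The function names are hypothetical.

```typescript
// Mirrors node-postgres's { text, values } query config, declared locally so this runs without `pg`.
interface QueryConfig {
  text: string;
  values: unknown[];
}

// Before: the template literal splices user input into the SQL text itself.
function findUserUnsafe(userId: string): QueryConfig {
  return { text: `SELECT * FROM users WHERE id = ${userId}`, values: [] };
}

// After: the SQL text is constant and the input travels separately as $1,
// so the database never parses it as SQL.
function findUserSafe(userId: string): QueryConfig {
  return { text: "SELECT * FROM users WHERE id = $1", values: [userId] };
}

const payload = "1 OR 1=1";
const unsafeText = findUserUnsafe(payload).text; // "SELECT * FROM users WHERE id = 1 OR 1=1"
const safeText = findUserSafe(payload).text; // "SELECT * FROM users WHERE id = $1"
```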

---

## Structured JSON Output for CI/CD Integration

Every run produces machine-readable output at `.bug-hunter/findings.json` for pipeline automation:

```json
{
  "version": "3.0.0",
  "scan_id": "scan-2026-03-10-083000",
  "scan_date": "2026-03-10T08:30:00Z",
  "mode": "parallel",
  "target": "src/",
  "files_scanned": 47,
  "confirmed": [
    {
      "id": "BUG-1",
      "severity": "CRITICAL",
      "category": "security",
      "stride": "Tampering",
      "cwe": "CWE-89",
      "file": "src/api/users.ts",
      "lines": "45-49",
      "reachability": "EXTERNAL",
      "exploitability": "EASY",
      "cvss_score": 9.1,
      "cvss_vector": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:N",
      "poc": {
        "payload": "' OR '1'='1",
        "request": "GET /api/users?search=test' OR '1'='1",
        "expected": "Returns matching users only",
        "actual": "Returns ALL users (SQL injection)"
      }
    }
  ],
  "summary": {
    "total_reported": 12,
    "confirmed": 5,
    "dismissed": 7,
    "by_severity": { "CRITICAL": 2, "HIGH": 1, "MEDIUM": 1, "LOW": 1 },
    "by_stride": { "Tampering": 2, "InfoDisclosure": 1, "ElevationOfPrivilege": 2 }
  }
}
```

Use this output for **CI/CD pipeline gating** (block merges on CRITICAL findings), **security dashboards** (Grafana, Datadog), or **automated ticket creation** (Jira, Linear, GitHub Issues).
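
Below is a hypothetical example of CI gating. The function reads only the `confirmed[].id`, `severity`, and `cvss_score` fields from the structure above and returns the findings that should block a merge. The 7.0 default threshold (the lower bound of the CVSS "High" band) is an assumption; pick your own.

```typescript
// Only the fields this gate uses are typed; findings.json carries more.
interface ConfirmedFinding {
  id: string;
  severity: string;
  cvss_score?: number;
}

interface FindingsReport {
  confirmed: ConfirmedFinding[];
}

// Blocks on any CRITICAL finding or any CVSS score at or above the threshold.
function gateViolations(report: FindingsReport, cvssThreshold = 7.0): string[] {
  return report.confirmed
    .filter((f) => f.severity === "CRITICAL" || (f.cvss_score ?? 0) >= cvssThreshold)
    .map((f) => `${f.id} (${f.severity}${f.cvss_score !== undefined ? `, CVSS ${f.cvss_score}` : ""})`);
}
```

In a CI step, parse `.bug-hunter/findings.json` with `JSON.parse`, call `gateViolations`, and fail the job when the returned list is non-empty.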

---

518
+ ## Output Files Reference
519
+
520
+ Every run creates a `.bug-hunter/` directory (add to `.gitignore`) containing:
521
+
522
+ | File | Generated | Contents |
523
+ |------|-----------|----------|
524
+ | `report.md` | Always | Human-readable report: confirmed bugs, dismissed findings, coverage stats |
525
+ | `findings.json` | Always | Machine-readable JSON for CI/CD and dashboards |
526
+ | `triage.json` | Always | File classification, risk map, strategy selection, token estimates |
527
+ | `recon.md` | Always | Tech stack analysis, attack surface mapping, scan order |
528
+ | `findings.md` | Always | Raw Hunter findings before Skeptic review |
529
+ | `skeptic.md` | Always | Skeptic challenge decisions with evidence |
530
+ | `referee.md` | Always | Referee final verdicts with enrichment |
531
+ | `fix-report.md` | Fix mode | Per-bug fix status, verification results, git diff summary |
532
+ | `fix-report.json` | Fix mode | Machine-readable fix results for CI/CD gating and dashboards |
533
+ | `worktree-*/` | Worktree fix mode | Temporary isolated worktrees for Fixer subagents (auto-cleaned) |
534
+ | `threat-model.md` | `--threat-model` | STRIDE threat model with trust boundaries and data flows |
535
+ | `dep-findings.json` | `--deps` | Dependency CVE results with reachability analysis |
536
+ | `state.json` | Large scans | Progress checkpoint for resume after interruption |
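
A CI job can also check that a run left the expected artifacts behind. This sketch derives the expected file list from the table above. The option names (`scanOnly`, `threatModel`, `deps`, `largeScan`) and both helper functions are hypothetical, and it assumes both fix reports appear whenever `--scan-only` is not set (the `--dry-run` case is not modeled). `worktree-*/` is left out because those directories are auto-cleaned.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Files the table marks as "Always" generated.
const ALWAYS = ["report.md", "findings.json", "triage.json", "recon.md", "findings.md", "skeptic.md", "referee.md"];

// Build the expected file list for a run (hypothetical option names).
function expectedOutputs({ scanOnly = false, threatModel = false, deps = false, largeScan = false } = {}) {
  const files = [...ALWAYS];
  if (!scanOnly) files.push("fix-report.md", "fix-report.json"); // fix mode is the default
  if (threatModel) files.push("threat-model.md");
  if (deps) files.push("dep-findings.json");
  if (largeScan) files.push("state.json");
  return files;
}

// List the expected files that are missing from the output directory.
function missingOutputs(dir, options) {
  return expectedOutputs(options).filter((f) => !fs.existsSync(path.join(dir, f)));
}
```

Under these assumptions, `missingOutputs(".bug-hunter", { deps: true })` returns an empty array when every expected file is present.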

---

## Supported Languages and Frameworks

**Languages:** TypeScript, JavaScript, Python, Go, Rust, Java, Kotlin, Ruby, PHP

**Frameworks:** Express, Fastify, Next.js, Django, Flask, FastAPI, Gin, Echo, Actix, Spring Boot, Rails, Laravel, plus any framework indexed by Context7 for documentation verification.

The pipeline adapts to whatever it finds. Triage classifies files by extension and risk patterns; Hunter and Skeptic agents adjust their security checklists based on the detected tech stack.
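
To make "classifies files by extension and risk patterns" concrete, here is a minimal sketch of the idea. It is not the logic in `scripts/triage.cjs`: the extension map covers only the languages listed above, and the keyword list used as a risk pattern is an illustrative assumption.

```javascript
const path = require("node:path");

// Extension -> language, limited to the languages listed above.
const LANGUAGE_BY_EXT = {
  ".ts": "TypeScript", ".tsx": "TypeScript",
  ".js": "JavaScript", ".jsx": "JavaScript", ".mjs": "JavaScript", ".cjs": "JavaScript",
  ".py": "Python", ".go": "Go", ".rs": "Rust", ".java": "Java",
  ".kt": "Kotlin", ".kts": "Kotlin", ".rb": "Ruby", ".php": "PHP",
};

// Illustrative risk pattern: a sensitive keyword that forms a whole path segment
// or a separator-delimited part of a file name (e.g. "api/", "auth.js", "user_db.py").
const HIGH_RISK = /(^|[\/_.-])(auth|login|session|token|crypto|payment|admin|api|db|sql)(?=[\/_.-]|$)/i;

function classifyFile(filePath) {
  const language = LANGUAGE_BY_EXT[path.extname(filePath).toLowerCase()] ?? null;
  if (language === null) return { filePath, language, risk: "skip" };
  return { filePath, language, risk: HIGH_RISK.test(filePath) ? "high" : "normal" };
}
```

Under this sketch, `src/api/users.ts` is a high-risk TypeScript file, `lib/feedback.rb` stays normal (the `db` inside "feedback" is not separator-delimited), and `README.md` is skipped.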

---

## CLI Flags Reference

| Flag | Behavior |
|------|----------|
| *(no flags)* | Scan the current directory and auto-fix confirmed bugs |
| `src/` or `file.ts` | Scan a specific path |
| `-b branch-name` | Scan files changed on the branch (compared with `main`) |
| `-b branch --base dev` | Scan the branch diff against a specific base |
| `--staged` | Scan git-staged files (pre-commit hook integration) |
| `--scan-only` | Report only; no code changes |
| `--fix` | Find and auto-fix bugs (default behavior) |
| `--approve` | Interactive mode: ask before each fix |
| `--autonomous` | Full auto-fix with zero intervention |
| `--loop` | Iterative mode: runs until 100% critical file coverage **(on by default)** |
| `--no-loop` | Disable loop mode; single-pass scan only |
| `--deps` | Include dependency CVE scanning with reachability analysis |
| `--threat-model` | Generate or use STRIDE threat model for targeted security analysis |
| `--dry-run` | Preview planned fixes without editing files; outputs diff previews and `fix-report.json` |

Flags can be combined, apart from opposing pairs such as `--scan-only` vs. `--fix` and `--loop` vs. `--no-loop`. For example: `/bug-hunter --deps --threat-model --fix src/`
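
The sketch below shows one way the flags in this table could resolve into a single options object, with fix mode and loop mode on by default. It is a hypothetical illustration rather than the skill's actual argument handling; `resolveFlags` is an invented name, and the conflict list (`--scan-only`/`--fix`, `--loop`/`--no-loop`, `--approve`/`--autonomous`) is an assumption drawn from the flag descriptions.

```javascript
// Hypothetical resolver mirroring the CLI Flags Reference table.
function resolveFlags(argv) {
  const opts = {
    target: ".", branch: null, base: "main", staged: false,
    fix: true, approve: false, autonomous: false, loop: true, // defaults per the table
    deps: false, threatModel: false, dryRun: false,
  };
  const seen = new Set();
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    seen.add(arg);
    switch (arg) {
      case "-b":
      case "--base": {
        const value = argv[++i]; // consume the flag's value
        if (value === undefined || value.startsWith("-")) throw new Error(`${arg} requires a value`);
        if (arg === "-b") opts.branch = value; else opts.base = value;
        break;
      }
      case "--staged": opts.staged = true; break;
      case "--scan-only": opts.fix = false; break;
      case "--fix": opts.fix = true; break;
      case "--approve": opts.approve = true; break;
      case "--autonomous": opts.autonomous = true; break;
      case "--loop": opts.loop = true; break;
      case "--no-loop": opts.loop = false; break;
      case "--deps": opts.deps = true; break;
      case "--threat-model": opts.threatModel = true; break;
      case "--dry-run": opts.dryRun = true; break;
      default:
        if (arg.startsWith("-")) throw new Error(`Unknown flag: ${arg}`);
        opts.target = arg; // a bare argument is the scan path
    }
  }
  // Assumed mutually exclusive pairs.
  const CONFLICTS = [["--scan-only", "--fix"], ["--loop", "--no-loop"], ["--approve", "--autonomous"]];
  for (const [a, b] of CONFLICTS) {
    if (seen.has(a) && seen.has(b)) throw new Error(`${a} and ${b} cannot be combined`);
  }
  return opts;
}
```

With this sketch, `resolveFlags(["--deps", "--threat-model", "--fix", "src/"])` sets `deps`, `threatModel`, and `fix` to `true` and `target` to `"src/"`, while `["--scan-only", "--fix"]` is rejected.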

---

## Self-Test and Validation

Bug Hunter ships with a test fixture: an Express app with **6 intentionally planted bugs** (2 Critical, 3 Medium, 1 Low):

```bash
/bug-hunter test-fixture/
```

**Expected benchmark results:**
- ✅ All 6 bugs found by Hunter
- ✅ At least 1 false positive challenged and dismissed by Skeptic
- ✅ All 6 planted bugs confirmed by Referee

**Calibration thresholds:** If fewer than 5 of the 6 planted bugs are found, the prompts need tuning. If more than 3 false positives survive to the Referee, the Skeptic prompt needs tightening.
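
The two calibration thresholds are simple enough to check automatically after a self-test run. A minimal sketch, assuming you have already counted how many planted bugs were found and how many false positives reached the Referee; `evaluateSelfTest` and its input field names are hypothetical.

```javascript
const PLANTED_BUGS = 6;
const MIN_FOUND = 5;        // fewer than 5 of 6 found -> prompts need tuning
const MAX_SURVIVING_FP = 3; // more than 3 false positives at Referee -> tighten Skeptic

function evaluateSelfTest({ plantedFound, falsePositivesAtReferee }) {
  if (!Number.isInteger(plantedFound) || plantedFound < 0 || plantedFound > PLANTED_BUGS) {
    throw new RangeError(`plantedFound must be an integer from 0 to ${PLANTED_BUGS}`);
  }
  if (!Number.isInteger(falsePositivesAtReferee) || falsePositivesAtReferee < 0) {
    throw new RangeError("falsePositivesAtReferee must be a non-negative integer");
  }
  const issues = [];
  if (plantedFound < MIN_FOUND) {
    issues.push(`Found ${plantedFound}/${PLANTED_BUGS} planted bugs: prompts need tuning`);
  }
  if (falsePositivesAtReferee > MAX_SURVIVING_FP) {
    issues.push(`${falsePositivesAtReferee} false positives reached the Referee: tighten the Skeptic prompt`);
  }
  return { pass: issues.length === 0, issues };
}
```

A boundary run (5 of 6 found, 3 surviving false positives) still passes, matching the "fewer than" and "more than" wording of the thresholds.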

---

## Project Structure

```
bug-hunter/
├── SKILL.md                          # Pipeline orchestration logic
├── README.md                         # This documentation
├── CHANGELOG.md                      # Version history
├── package.json                      # npm package config (@codexstar/bug-hunter)
│
├── bin/
│   └── bug-hunter                    # CLI entry point (install, doctor, info)
│
├── docs/
│   └── images/                       # Documentation visuals
│       ├── hero.png                  # Hero banner
│       ├── pipeline-overview.png     # 8-stage pipeline diagram
│       ├── adversarial-debate.png    # Hunter vs Skeptic vs Referee flow
│       ├── doc-verify-fix-plan.png   # Documentation verification + fix planning
│       └── security-finding-card.png # Enriched finding card with CVSS
│
├── modes/                            # Execution strategies by codebase size
│   ├── single-file.md                # 1 file
│   ├── small.md                      # 2–10 files
│   ├── parallel.md                   # 11–FILE_BUDGET files
│   ├── extended.md                   # Chunked scanning
│   ├── scaled.md                     # State-driven chunks with resume
│   ├── large-codebase.md             # Domain-scoped pipelines
│   ├── local-sequential.md           # Single-agent execution
│   ├── loop.md                       # Iterative coverage loop
│   ├── fix-pipeline.md               # Auto-fix orchestration (with worktree isolation)
│   ├── fix-loop.md                   # Fix + re-scan loop
│   └── _dispatch.md                  # Shared dispatch patterns + worktree lifecycle
│
├── prompts/                          # Agent system prompts
│   ├── recon.md                      # Reconnaissance agent
│   ├── hunter.md                     # Bug hunting agent
│   ├── skeptic.md                    # Adversarial reviewer
│   ├── referee.md                    # Final verdict judge
│   ├── fixer.md                      # Auto-fix agent
│   ├── doc-lookup.md                 # Documentation verification
│   ├── threat-model.md               # STRIDE threat model generator
│   └── examples/                     # Calibration few-shot examples
│       ├── hunter-examples.md        # 3 real + 2 false positives
│       └── skeptic-examples.md       # 2 accepted + 2 disproved + 1 review
│
├── scripts/                          # Node.js helpers (zero AI tokens)
│   ├── triage.cjs                    # File classification (<2s)
│   ├── dep-scan.cjs                  # Dependency CVE scanner
│   ├── doc-lookup.cjs                # Documentation lookup (chub + Context7 fallback)
│   ├── context7-api.cjs              # Context7 API fallback
│   ├── run-bug-hunter.cjs            # Chunk orchestrator
│   ├── bug-hunter-state.cjs          # Persistent state for resume
│   ├── delta-mode.cjs                # Changed-file scope reduction
│   ├── payload-guard.cjs             # Subagent payload validation
│   ├── fix-lock.cjs                  # Concurrent fixer prevention
│   ├── worktree-harvest.cjs          # Worktree isolation for Fixer subagents
│   ├── code-index.cjs                # Cross-domain analysis (optional)
│   └── tests/                        # Test suite (node --test)
│       ├── run-bug-hunter.test.cjs   # Orchestrator tests
│       └── worktree-harvest.test.cjs # Worktree lifecycle tests
│
├── templates/
│   └── subagent-wrapper.md           # Subagent launch template (with worktree rules)
│
└── test-fixture/                     # 6 planted bugs for validation
    ├── server.js
    ├── auth.js
    ├── db.js
    └── users.js
```

---

## License

MIT: use it however you want.