npm - @codexstar/bug-hunter - Versions diffs - 3.0.0 → 3.0.6 - Mend

@codexstar/bug-hunter 3.0.0 → 3.0.6


Files changed (78)

package/CHANGELOG.md +149 -83
package/README.md +150 -15
package/SKILL.md +94 -27
package/agents/openai.yaml +4 -0
package/bin/bug-hunter +9 -3
package/docs/images/2026-03-12-fix-plan-rollout.png +0 -0
package/docs/images/2026-03-12-hero-bug-hunter-overview.png +0 -0
package/docs/images/2026-03-12-machine-readable-artifacts.png +0 -0
package/docs/images/2026-03-12-pr-review-flow.png +0 -0
package/docs/images/2026-03-12-security-pack.png +0 -0
package/docs/images/adversarial-debate.png +0 -0
package/docs/images/doc-verify-fix-plan.png +0 -0
package/docs/images/hero.png +0 -0
package/docs/images/pipeline-overview.png +0 -0
package/docs/images/security-finding-card.png +0 -0
package/docs/plans/2026-03-11-structured-output-migration-plan.md +288 -0
package/docs/plans/2026-03-12-audit-bug-fixes-surgical-plan.md +193 -0
package/docs/plans/2026-03-12-enterprise-security-pack-e2e-plan.md +59 -0
package/docs/plans/2026-03-12-local-security-skills-integration-plan.md +39 -0
package/docs/plans/2026-03-12-pr-review-strategic-fix-flow.md +78 -0
package/evals/evals.json +366 -102
package/modes/extended.md +2 -2
package/modes/fix-loop.md +30 -30
package/modes/fix-pipeline.md +32 -6
package/modes/large-codebase.md +14 -15
package/modes/local-sequential.md +44 -20
package/modes/loop.md +56 -56
package/modes/parallel.md +3 -3
package/modes/scaled.md +2 -2
package/modes/single-file.md +3 -3
package/modes/small.md +11 -11
package/package.json +11 -1
package/prompts/fixer.md +37 -23
package/prompts/hunter.md +39 -20
package/prompts/referee.md +34 -20
package/prompts/skeptic.md +25 -22
package/schemas/coverage.schema.json +67 -0
package/schemas/examples/findings.invalid.json +13 -0
package/schemas/examples/findings.valid.json +17 -0
package/schemas/findings.schema.json +76 -0
package/schemas/fix-plan.schema.json +94 -0
package/schemas/fix-report.schema.json +105 -0
package/schemas/fix-strategy.schema.json +99 -0
package/schemas/recon.schema.json +31 -0
package/schemas/referee.schema.json +46 -0
package/schemas/shared.schema.json +51 -0
package/schemas/skeptic.schema.json +21 -0
package/scripts/bug-hunter-state.cjs +35 -12
package/scripts/code-index.cjs +11 -4
package/scripts/fix-lock.cjs +95 -25
package/scripts/payload-guard.cjs +24 -10
package/scripts/pr-scope.cjs +181 -0
package/scripts/prepublish-guard.cjs +82 -0
package/scripts/render-report.cjs +346 -0
package/scripts/run-bug-hunter.cjs +669 -33
package/scripts/schema-runtime.cjs +273 -0
package/scripts/schema-validate.cjs +40 -0
package/scripts/tests/bug-hunter-state.test.cjs +68 -3
package/scripts/tests/code-index.test.cjs +15 -0
package/scripts/tests/fix-lock.test.cjs +60 -2
package/scripts/tests/fixtures/flaky-worker.cjs +6 -1
package/scripts/tests/fixtures/low-confidence-worker.cjs +8 -2
package/scripts/tests/fixtures/success-worker.cjs +6 -1
package/scripts/tests/payload-guard.test.cjs +154 -2
package/scripts/tests/pr-scope.test.cjs +212 -0
package/scripts/tests/render-report.test.cjs +180 -0
package/scripts/tests/run-bug-hunter.test.cjs +686 -2
package/scripts/tests/security-skills-integration.test.cjs +29 -0
package/scripts/tests/skills-packaging.test.cjs +30 -0
package/scripts/tests/worktree-harvest.test.cjs +67 -1
package/scripts/worktree-harvest.cjs +62 -9
package/skills/README.md +19 -0
package/skills/commit-security-scan/SKILL.md +63 -0
package/skills/security-review/SKILL.md +57 -0
package/skills/threat-model-generation/SKILL.md +47 -0
package/skills/vulnerability-validation/SKILL.md +59 -0
package/templates/subagent-wrapper.md +12 -3
package/modes/_dispatch.md +0 -121

package/modes/extended.md CHANGED

@@ -35,7 +35,7 @@ After Recon completes, read `.bug-hunter/recon.md` to extract the risk map and t
 Partition files from `triage.scanOrder` (or the Recon risk map if no triage) into chunks:
 - **Service-aware partitioning (preferred):** If triage detected multiple domains, partition by domain.
-- **Risk-tier partitioning (fallback):** Process CRITICAL files first, then HIGH, then MEDIUM.
+- **Risk-tier partitioning (fallback):** Process CRITICAL files first, then HIGH, then MEDIUM, then LOW.
 - Chunk size: FILE_BUDGET ÷ 2 files per chunk (keep chunks small to avoid compaction).
 - Keep same-directory files together when possible.
@@ -67,7 +67,7 @@ For each chunk:
 ### 5d. Merge all findings
-After all chunks complete, merge findings from state into `.bug-hunter/findings.md`.
+After all chunks complete, merge findings from state into `.bug-hunter/findings.json`.
 If TOTAL FINDINGS: 0, skip Skeptic and Referee. Go to Step 7 (Final Report) in SKILL.md.
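
The risk-tier fallback in this hunk can be illustrated with a minimal sketch. It is not part of the package: `partitionByRiskTier` and the `{ path, tier }` file shape are hypothetical, and it assumes every file carries one of the four tier names. It orders files CRITICAL → HIGH → MEDIUM → LOW and cuts chunks of FILE_BUDGET ÷ 2 files.

```javascript
// Hypothetical helper (not in the package): risk-tier fallback partitioning.
const TIER_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW"];

function partitionByRiskTier(files, fileBudget) {
  // Chunk size is FILE_BUDGET / 2, but never smaller than one file.
  const chunkSize = Math.max(1, Math.floor(fileBudget / 2));

  // Array.prototype.sort is stable, so files keep their relative order
  // within a tier (useful for keeping same-directory files adjacent).
  const ordered = [...files].sort(
    (a, b) => TIER_ORDER.indexOf(a.tier) - TIER_ORDER.indexOf(b.tier)
  );

  const chunks = [];
  for (let i = 0; i < ordered.length; i += chunkSize) {
    chunks.push(ordered.slice(i, i + chunkSize));
  }
  return chunks;
}
```

With a FILE_BUDGET of 4, three files tiered LOW, CRITICAL, and HIGH would yield two chunks: the CRITICAL and HIGH files first, then the LOW file on its own.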

package/modes/fix-loop.md CHANGED

@@ -17,58 +17,56 @@ When `LOOP_MODE=true` AND `FIX_MODE=true`, before running the first pipeline ite
 2. Call the `ralph_start` tool:
 ```
+MAX_FIX_LOOP_ITERATIONS = max(
+  15,
+  min(250, ceil(SCANNABLE_FILES / max(FILE_BUDGET, 1)) + ELIGIBLE_BUG_COUNT + 8)
+)
 ralph_start({
   name: "bug-hunter-fix-audit",
   taskContent: <the TODO.md content below>,
-  maxIterations: 15
+  maxIterations: MAX_FIX_LOOP_ITERATIONS
 })
 ```
 3. The ralph-loop system will then drive iteration. Each iteration:
    - You receive the task prompt with the current checklist state.
    - You execute one iteration of find + fix.
-   - You update `.bug-hunter/coverage.md` with results.
-   - If all bugs are FIXED and all CRITICAL/HIGH files are DONE → output `<promise>COMPLETE</promise>`.
+   - You update `.bug-hunter/coverage.json` with results and render `.bug-hunter/coverage.md`.
+   - If all bugs are FIXED and all queued scannable source files are DONE → output `<promise>COMPLETE</promise>`.
    - Otherwise → call `ralph_done` to proceed to the next iteration.
 **Do NOT manually loop or re-invoke yourself.** The ralph-loop system handles iteration automatically.
 ## Coverage file extension for fix mode
-The `.bug-hunter/coverage.md` file gains additional sections:
+The `.bug-hunter/coverage.json` file carries the same loop state, plus fix
+entries:
-```markdown
-## Fixes
-<!-- One line per bug. LATEST entry per BUG-ID is current status. -->
-<!-- Format: BUG-ID|STATUS|ITERATION_FIXED|FILES_MODIFIED -->
-<!-- STATUS: FIXED | FIX_REVERTED | FIX_FAILED | PARTIAL | FIX_CONFLICT | SKIPPED | FIXER_BUG -->
-BUG-3|FIXED|1|src/auth/login.ts
-BUG-7|FIXED|1|src/auth/login.ts
-BUG-12|FIXED|2|src/api/users.ts
-## Test Results
-<!-- One line per iteration. Format: ITERATION|PASSED|FAILED|NEW_FAILURES|RESOLVED -->
-1|45|3|2|0
-2|47|1|0|1
+```json
+{
+  "fixes": [
+    { "bugId": "BUG-3", "status": "FIXED" },
+    { "bugId": "BUG-12", "status": "FIX_FAILED" }
+  ]
+}
 ```
-**Parsing rule:** For each BUG-ID, use the LAST entry in the Fixes section. Earlier entries for the same BUG-ID are history — only the latest matters.
 ## Loop iteration logic
 ```
 For each iteration:
-  1. Read coverage file
-  2. Collect (using LAST entry per BUG-ID):
-     - Unfixed bugs: latest STATUS in {FIX_REVERTED, FIX_FAILED, FIX_CONFLICT, SKIPPED, FIXER_BUG}
-     - Unscanned files: STATUS != DONE in Files section (CRITICAL/HIGH only)
+  1. Read coverage.json
+  2. Collect:
+     - Unfixed bugs: latest fix status in {FIX_REVERTED, FIX_FAILED, FIX_CONFLICT, SKIPPED, FIXER_BUG, MANUAL_REVIEW}
+     - Unscanned files: file status != done
   3. If unfixed bugs exist OR unscanned files exist:
      a. If unscanned files -> run Phase 1 (find pipeline) on them -> get new confirmed bugs
      b. Combine: unfixed bugs + newly confirmed bugs
      c. Run Phase 2 (fix + verify) on combined list
-     d. Update coverage file (append new entries to Fixes section)
+     d. Update coverage.json and re-render coverage.md
      e. Call ralph_done to proceed to next iteration
-  4. If all bugs FIXED and all CRITICAL/HIGH files DONE:
+  4. If all bugs FIXED and all queued scannable source files are DONE:
      -> Run final test suite one more time
      -> If no new failures:
         Output <promise>COMPLETE</promise>
@@ -87,6 +85,8 @@ Use this as the `taskContent` parameter when calling `ralph_start`:
 ## Discovery Tasks
 - [ ] All CRITICAL files scanned
 - [ ] All HIGH files scanned
+- [ ] All MEDIUM files scanned
+- [ ] All LOW files scanned
 - [ ] Findings verified through Skeptic+Referee pipeline
 ## Fix Tasks
@@ -100,13 +100,13 @@ Use this as the `taskContent` parameter when calling `ralph_start`:
 - [ ] ALL_TASKS_COMPLETE
 ## Instructions
-1. Read .bug-hunter/coverage.md for previous iteration state
-2. Parse Files table — collect unscanned CRITICAL/HIGH files
-3. Parse Fixes table — collect unfixed bugs (latest entry not FIXED)
+1. Read .bug-hunter/coverage.json for previous iteration state
+2. Parse the `files` array — collect unscanned CRITICAL/HIGH/MEDIUM/LOW files
+3. Parse the `fixes` array — collect unfixed bugs (latest entry not FIXED)
 4. If unscanned files exist: run Phase 1 (find pipeline) on them
 5. If unfixed bugs exist: run Phase 2 (fix pipeline) on them
-6. Update coverage file with results
-7. Output <promise>COMPLETE</promise> when all bugs are FIXED and no new test failures
+6. Update coverage.json with results and render coverage.md
+7. Output <promise>COMPLETE</promise> only when all queued files are DONE, all discovered bugs are FIXED, and no new test failures remain
 8. Otherwise call ralph_done to continue to the next iteration
 ```
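
The new iteration cap and the "unfixed bugs" rule from this file can be sketched as follows. This is an illustration only, not code from the package: the function names are hypothetical, and treating the last `fixes` entry per `bugId` as the current status is an assumption based on the phrase "latest fix status" (the explicit parsing rule was removed in 3.0.6).

```javascript
// Statuses that the 3.0.6 loop logic lists as "unfixed".
const UNFIXED_STATUSES = new Set([
  "FIX_REVERTED",
  "FIX_FAILED",
  "FIX_CONFLICT",
  "SKIPPED",
  "FIXER_BUG",
  "MANUAL_REVIEW",
]);

// MAX_FIX_LOOP_ITERATIONS = max(15, min(250, ceil(files / budget) + bugs + 8))
function maxFixLoopIterations(scannableFiles, fileBudget, eligibleBugCount) {
  const chunks = Math.ceil(scannableFiles / Math.max(fileBudget, 1));
  return Math.max(15, Math.min(250, chunks + eligibleBugCount + 8));
}

// Assumption: the last entry per bugId wins, so a later FIXED entry
// overrides an earlier failure for the same bug.
function collectUnfixedBugs(coverage) {
  const latest = new Map();
  for (const fix of coverage.fixes ?? []) {
    latest.set(fix.bugId, fix.status);
  }
  return [...latest]
    .filter(([, status]) => UNFIXED_STATUSES.has(status))
    .map(([bugId]) => bugId);
}
```

For 1,000 scannable files, a FILE_BUDGET of 40, and 12 eligible bugs, the cap works out to 25 + 12 + 8 = 45 iterations, compared with the fixed 15 in 3.0.0.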

package/modes/fix-pipeline.md CHANGED

@@ -50,11 +50,14 @@ DYNAMIC_TTL = max(1800, ELIGIBLE_COUNT * 600)   # 10 min per bug, minimum 30 min
 ```
 node "$SKILL_DIR/scripts/fix-lock.cjs" acquire ".bug-hunter/fix.lock" $DYNAMIC_TTL
 ```
+Record `LOCK_OWNER_TOKEN` from the returned JSON (`lock.ownerToken`).
 If lock cannot be acquired, stop Phase 2 to avoid concurrent mutation.
+**Owner token:** `acquire` returns `lock.ownerToken`; renew/release now require that token. Persist it for the entire Phase 2 run as `LOCK_OWNER_TOKEN`.
 **Lock renewal:** During Step 9 execution, renew the lock after each bug fix to prevent TTL expiry on long runs:
 ```
-node "$SKILL_DIR/scripts/fix-lock.cjs" renew ".bug-hunter/fix.lock"
+node "$SKILL_DIR/scripts/fix-lock.cjs" renew ".bug-hunter/fix.lock" "$LOCK_OWNER_TOKEN"
 ```
 **8b. Detect verification commands**
@@ -81,7 +84,16 @@ If `TEST_COMMAND` is not null:
 If baseline cannot run, set `BASELINE=null` and `FLAKY_TESTS={}` and continue with manual-verification warning.
-**8d. Build sequential fix plan**
+**8d. Build fix strategy + sequential fix plan**
+Before deciding what to patch, write `.bug-hunter/fix-strategy.json` and `.bug-hunter/fix-strategy.md`.
+The strategy artifact must classify each confirmed bug into one of:
+- `safe-autofix`
+- `manual-review`
+- `larger-refactor`
+- `architectural-remediation`
+If `PLAN_ONLY_MODE=true`, stop after the strategy artifact and fix-plan preview are written.
 Prepare bug queue:
 1. Apply confidence gate:
@@ -185,7 +197,7 @@ For each batch in order:
 8a. Renew lock after each batch:
    ```
-   node "$SKILL_DIR/scripts/fix-lock.cjs" renew ".bug-hunter/fix.lock"
+   node "$SKILL_DIR/scripts/fix-lock.cjs" renew ".bug-hunter/fix.lock" "$LOCK_OWNER_TOKEN"
    ```
 **Path B — Direct mode (`WORKTREE_MODE=false`):**
@@ -201,7 +213,7 @@ For each batch in order:
 7b. Record commit hash per BUG-ID in a fix ledger.
 8b. **Renew lock** after each bug fix:
    ```
-   node "$SKILL_DIR/scripts/fix-lock.cjs" renew ".bug-hunter/fix.lock"
+   node "$SKILL_DIR/scripts/fix-lock.cjs" renew ".bug-hunter/fix.lock" "$LOCK_OWNER_TOKEN"
    ```
 If a bug cannot be fixed, mark `SKIPPED` and continue.
@@ -260,7 +272,9 @@ Use exact fixed scope from the real base commit:
 2. Build changed hunks list.
 3. Run one lightweight Hunter on changed hunks only with a **severity floor of MEDIUM**:
    - Only report fixer-introduced bugs at MEDIUM severity or above.
-   - LOW-severity issues from the fixer are logged to `.bug-hunter/fix-report.md` as informational notes but do NOT trigger `FIXER_BUG` status.
+   - LOW-severity issues from the fixer are logged in `.bug-hunter/fix-report.json`
+     (and optional derived `.bug-hunter/fix-report.md`) as informational notes
+     but do NOT trigger `FIXER_BUG` status.
 This removes ambiguity from `<base-branch>` and works for path scans, staged scans, and branch scans.
@@ -301,7 +315,7 @@ If stash was created (not applicable in dry-run mode):
 Always release single-writer lock at the end (success or failure path):
 ```
-node "$SKILL_DIR/scripts/fix-lock.cjs" release ".bug-hunter/fix.lock"
+node "$SKILL_DIR/scripts/fix-lock.cjs" release ".bug-hunter/fix.lock" "$LOCK_OWNER_TOKEN"
 ```
 If an earlier step aborts Phase 2, run the same release command AND worktree cleanup-all in best-effort cleanup before returning.
@@ -375,6 +389,18 @@ Write `.bug-hunter/fix-report.json` alongside the markdown report:
 }
 ```
+Validate it immediately:
+```bash
+node "$SKILL_DIR/scripts/schema-validate.cjs" fix-report ".bug-hunter/fix-report.json"
+```
+Render the Markdown companion when humans need it:
+```bash
+node "$SKILL_DIR/scripts/render-report.cjs" fix-report ".bug-hunter/fix-report.json" > ".bug-hunter/fix-report.md"
+```
 Rules:
 - `dry_run: true` when `DRY_RUN_MODE=true` — the `fixes` array contains planned diffs instead of commit hashes.
 - `circuit_breaker_tripped: true` when the circuit breaker halted the pipeline.
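
The owner-token change in this file means a caller has to carry the token from `acquire` through every `renew` and the final `release`. A minimal sketch of that bookkeeping follows. It is not part of the package: the helper names are hypothetical, and it assumes `acquire` prints its JSON on stdout. Only the `lock.ownerToken` field is taken from the diff; the rest of the output shape is unknown.

```javascript
// DYNAMIC_TTL = max(1800, ELIGIBLE_COUNT * 600): 10 min per bug, 30 min floor.
function dynamicTtl(eligibleCount) {
  return Math.max(1800, eligibleCount * 600);
}

// Extract LOCK_OWNER_TOKEN from the JSON returned by `fix-lock.cjs acquire`.
// Assumption: the JSON arrives as a single string (for example, stdout).
function parseOwnerToken(acquireOutput) {
  const token = JSON.parse(acquireOutput)?.lock?.ownerToken;
  if (typeof token !== "string" || token.length === 0) {
    throw new Error("acquire output did not contain lock.ownerToken");
  }
  return token;
}

// Build the argument list for renew/release, which in 3.0.6 take the
// owner token as a trailing argument.
function lockArgs(action, skillDir, lockPath, ownerToken) {
  return [`${skillDir}/scripts/fix-lock.cjs`, action, lockPath, ownerToken];
}
```

The resulting array could be passed to `child_process.execFileSync("node", lockArgs(...))`. Passing arguments as an array rather than a shell string avoids quoting problems if the token or path contains unusual characters.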

package/modes/large-codebase.md CHANGED

@@ -67,7 +67,7 @@ This is fast — no file reading, just directory listing and heuristic classific
 Process ONE domain at a time, running the **full pipeline** (Recon → Hunter → Skeptic → Referee) within each domain:
 ```
-For each domain (CRITICAL first, then HIGH, then MEDIUM):
+For each domain (CRITICAL first, then HIGH, then MEDIUM, then LOW):
   1. Get this domain's file list:
      - If triage exists: use triage.domainFileLists[domainPath]
      - If no triage: use fd/find to list files in this domain's directory
@@ -78,9 +78,9 @@ For each domain (CRITICAL first, then HIGH, then MEDIUM):
   Write domain results to:
     .bug-hunter/domains/<domain-name>/recon.md
-    .bug-hunter/domains/<domain-name>/findings.md
-    .bug-hunter/domains/<domain-name>/skeptic.md
-    .bug-hunter/domains/<domain-name>/referee.md
+    .bug-hunter/domains/<domain-name>/findings.json
+    .bug-hunter/domains/<domain-name>/skeptic.json
+    .bug-hunter/domains/<domain-name>/referee.json
   Record in state:
     node "$SKILL_DIR/scripts/bug-hunter-state.cjs" record-findings ...
@@ -123,7 +123,7 @@ Write boundary results to `.bug-hunter/domains/_boundaries/`.
 After all domains + boundaries are audited:
-1. Read all domain `referee.md` files and boundary results.
+1. Read all domain `referee.json` files and boundary results.
 2. Merge findings, deduplicate by file + line + claim.
 3. Renumber BUG-IDs globally.
 4. Build the final report per Step 7 in SKILL.md.
@@ -163,17 +163,16 @@ Use `.bug-hunter/state.json` with domain-aware structure:
 - Iteration N-1: Tier 3 merge and report
 - Iteration N: Coverage check → DONE or continue with missed domains
-The ralph-loop's coverage check reads the state file and only marks DONE when all CRITICAL and HIGH domains show status `done`.
+The ralph-loop's coverage check reads the state file and only marks DONE when all queued domains show status `done`.
-## Optimization: Skip LOW domains
+## Default autonomous behavior
-For truly huge codebases (1,000+ files), skip LOW-tier domains entirely unless `--exhaustive` is specified. UI components, test utilities, and formatting helpers rarely contain runtime bugs worth the context cost.
-Report skipped domains in the final report:
-```
-ℹ️ Skipped [N] LOW-tier domains ([M] files) for efficiency.
-Use `--exhaustive` to include all domains.
-```
+Autonomous mode is exhaustive by default:
+- Finish all CRITICAL domains first.
+- Then continue through HIGH domains.
+- Then continue through MEDIUM domains.
+- Then continue through LOW domains.
+- Only stop when the domain queue is exhausted, the user interrupts, or a hard blocker prevents safe progress.
 ## Optimization: Delta-first for repeat scans
@@ -209,4 +208,4 @@ When executing large-codebase mode:
 - [ ] Tier 3: Merge all domain + boundary findings
 - [ ] Tier 3: Deduplicate and renumber
 - [ ] Tier 3: Build final report with per-domain breakdown
-- [ ] Coverage: All CRITICAL/HIGH domains done? If not, continue.
+- [ ] Coverage: All queued domains done? If not, continue.
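
The Tier 3 merge described in this file (merge findings, deduplicate by file + line + claim, renumber BUG-IDs globally) might look like the sketch below. It is not taken from the package: `mergeDomainFindings` is a hypothetical name, and the `line` field is an assumption, since only `file` and `claim` appear in the JSON examples shown in this diff.

```javascript
// Hypothetical Tier 3 merge: one array of findings per domain goes in,
// one deduplicated, globally renumbered array comes out.
function mergeDomainFindings(domainFindings) {
  const seen = new Set();
  const merged = [];

  for (const finding of domainFindings.flat()) {
    // JSON.stringify of a tuple gives an unambiguous composite key.
    const key = JSON.stringify([finding.file, finding.line, finding.claim]);
    if (seen.has(key)) continue;
    seen.add(key);

    // Renumber globally in merge order, replacing any per-domain bugId.
    merged.push({ ...finding, bugId: `BUG-${merged.length + 1}` });
  }
  return merged;
}
```

Because the first occurrence wins, findings from domains processed earlier (CRITICAL first) keep the lower BUG numbers.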

package/modes/local-sequential.md CHANGED

@@ -6,7 +6,10 @@ This is NOT a degraded mode. The skill is designed to work fully here.
 ## How It Works
-You (the orchestrating agent) play each role yourself, sequentially. Between phases you **write outputs to files** so later phases can reference them without holding everything in working memory.
+You (the orchestrating agent) play each role yourself, sequentially. Between
+phases you write canonical JSON artifacts so later phases can reference them
+without holding everything in working memory. Markdown reports are derived from
+those JSON files when humans need them.
 All state files go in `.bug-hunter/` relative to the working directory.
@@ -26,7 +29,9 @@ All state files go in `.bug-hunter/` relative to the working directory.
    - Use `triage.scanOrder` as the file order for Phase B.
    - Recon's remaining job: read 3-5 key files from CRITICAL domains to identify **tech stack** (framework, auth mechanism, database, key dependencies) and **trust boundary patterns** (how routes are defined, how auth middleware is applied, etc.).
    - If git is available, check recently changed files with `git log`.
-   - Write your Recon output to `.bug-hunter/recon.md` — include the tech stack, patterns, and the triage-provided risk map.
+   - Write your Recon output to `.bug-hunter/recon.json` if structured output is
+     requested; otherwise keep `.bug-hunter/recon.md` as a temporary fallback
+     until the Recon prompt is migrated.
 3. **If `.bug-hunter/triage.json` does NOT exist** (fallback — Recon called directly):
    - Execute the full Recon instructions: discover files, classify, compute FILE_BUDGET.
@@ -44,12 +49,16 @@ All state files go in `.bug-hunter/` relative to the working directory.
 2. Read `SKILL_DIR/prompts/doc-lookup.md` with the Read tool.
 3. **Switch mindset**: you are now a Bug Hunter. Your ONLY job is to find behavioral bugs.
 4. Execute the Hunter instructions yourself:
-   - Read files in risk-map order: CRITICAL → HIGH → MEDIUM.
+   - Read files in risk-map order: CRITICAL → HIGH → MEDIUM → LOW.
    - For each file, use the Read tool. Do NOT rely on memory from earlier phases.
    - Apply the mandatory security checklist sweep (Phase 3 in hunter.md) on every CRITICAL and HIGH file.
    - Track which files you actually read — be honest about coverage.
    - For each bug found, record it in the exact BUG-N format specified in hunter.md.
-5. Write your complete findings to `.bug-hunter/findings.md`.
+5. Write your complete findings to `.bug-hunter/findings.json`.
+6. Validate the artifact immediately:
+   ```bash
+   node "$SKILL_DIR/scripts/schema-validate.cjs" findings ".bug-hunter/findings.json"
+   ```
 **Context management:** If you notice earlier files becoming hazy in your memory:
 - STOP expanding to new files.
@@ -84,9 +93,10 @@ If the Recon risk map contains more files than FILE_BUDGET, do NOT try to read t
       ```bash
       node "$SKILL_DIR/scripts/bug-hunter-state.cjs" mark-chunk ".bug-hunter/state.json" "<chunk-id>" done
       ```
-3. After all chunks: merge findings from `.bug-hunter/state.json` into `.bug-hunter/findings.md`.
+3. After all chunks: merge findings from `.bug-hunter/state.json` into
+   `.bug-hunter/findings.json`.
-**Gap-fill:** After scanning, compare FILES SCANNED against the risk map. If any CRITICAL or HIGH files are in FILES SKIPPED, read them now and append any new findings. If you truly cannot read them (context exhaustion), leave them in FILES SKIPPED.
+**Gap-fill:** After scanning, compare FILES SCANNED against the risk map. If any queued scannable files are in FILES SKIPPED, read them now in priority order (CRITICAL → HIGH → MEDIUM → LOW) and append any new findings. If you truly cannot read them (context exhaustion), leave them in FILES SKIPPED so loop mode can resume them next.
 If TOTAL FINDINGS: 0, skip Phases C and D. Go directly to Step 7 (Final Report) in SKILL.md.
@@ -95,7 +105,7 @@ If TOTAL FINDINGS: 0, skip Phases C and D. Go directly to Step 7 (Final Report)
 1. Read `SKILL_DIR/prompts/skeptic.md` with the Read tool.
 2. Read `SKILL_DIR/prompts/doc-lookup.md` with the Read tool.
 3. **Switch mindset completely**: you are now the Skeptic. Your job is to DISPROVE false positives. Forget the pride of finding them — you want to kill weak claims.
-4. Read `.bug-hunter/findings.md` to get the findings list.
+4. Read `.bug-hunter/findings.json` to get the findings list.
 5. For EACH finding:
    - Re-read the actual code at the reported file and line with the Read tool. This is MANDATORY — do not evaluate from memory.
    - Read all cross-referenced files.
@@ -103,7 +113,12 @@ If TOTAL FINDINGS: 0, skip Phases C and D. Go directly to Step 7 (Final Report)
    - Check framework/middleware protections the Hunter may have missed.
    - Apply the risk calculation: `EV = (confidence% × points) - ((100 - confidence%) × 2 × points)`. Only DISPROVE when EV is positive (confidence > 67%).
    - For Critical bugs: need >67% confidence AND all cross-references read.
-6. Write your complete Skeptic output to `.bug-hunter/skeptic.md` in the format from skeptic.md.
+6. Write your complete Skeptic output to `.bug-hunter/skeptic.json` in the
+   format from skeptic.md.
+7. Validate it immediately:
+   ```bash
+   node "$SKILL_DIR/scripts/schema-validate.cjs" skeptic ".bug-hunter/skeptic.json"
+   ```
 **Important:** When switching from Hunter to Skeptic, genuinely try to disprove your own findings. The point of this phase is adversarial review. If you cannot genuinely argue against a finding, ACCEPT it and move on — do not waste time rubber-stamping.
@@ -111,13 +126,21 @@ If TOTAL FINDINGS: 0, skip Phases C and D. Go directly to Step 7 (Final Report)
 1. Read `SKILL_DIR/prompts/referee.md` with the Read tool.
 2. **Switch mindset**: you are the impartial Referee. You trust neither the Hunter nor the Skeptic.
-3. Read both `.bug-hunter/findings.md` and `.bug-hunter/skeptic.md`.
+3. Read both `.bug-hunter/findings.json` and `.bug-hunter/skeptic.json`.
 4. For each finding:
    - **Tier 1 (all Critical + top 15 by severity):** Re-read the actual code yourself a THIRD time using the Read tool. Construct the runtime trigger independently. Make your own judgment.
    - **Tier 2 (remaining):** Evaluate evidence quality. Whose code quotes are more specific? Whose runtime trigger is more concrete?
 5. Make final REAL BUG / NOT A BUG verdicts with severity calibration.
-6. Write the final Referee report to `.bug-hunter/referee.md`.
-7. Proceed to Step 7 (Final Report) in SKILL.md.
+6. Write the final Referee verdicts to `.bug-hunter/referee.json`.
+7. Validate them immediately:
+   ```bash
+   node "$SKILL_DIR/scripts/schema-validate.cjs" referee ".bug-hunter/referee.json"
+   ```
+8. Render `.bug-hunter/report.md` from the JSON artifacts:
+   ```bash
+   node "$SKILL_DIR/scripts/render-report.cjs" report ".bug-hunter/findings.json" ".bug-hunter/referee.json" > ".bug-hunter/report.md"
+   ```
+9. Proceed to Step 7 (Final Report) in SKILL.md.
 ## State Files Summary
@@ -125,10 +148,11 @@ After a complete local-sequential run, these files should exist:
 | File | Phase | Content |
 |------|-------|---------|
-| `.bug-hunter/recon.md` | A | Risk map, file metrics, tech stack |
-| `.bug-hunter/findings.md` | B | All Hunter findings in BUG-N format |
-| `.bug-hunter/skeptic.md` | C | Skeptic challenges and decisions |
-| `.bug-hunter/referee.md` | D | Final verdicts and confirmed bugs |
+| `.bug-hunter/recon.json` | A | Recon artifact when structured output is used |
+| `.bug-hunter/findings.json` | B | All Hunter findings in canonical JSON |
+| `.bug-hunter/skeptic.json` | C | Skeptic challenges in canonical JSON |
+| `.bug-hunter/referee.json` | D | Final verdicts in canonical JSON |
+| `.bug-hunter/report.md` | D | Human-readable report rendered from JSON |
 | `.bug-hunter/state.json` | B (chunked) | Chunk progress, findings ledger |
 | `.bug-hunter/source-files.json` | A | Source file list (for state init) |
@@ -136,8 +160,8 @@ After a complete local-sequential run, these files should exist:
 After Phase D, check coverage:
-- If all CRITICAL and HIGH files were scanned: proceed to Final Report.
-- If any CRITICAL/HIGH files were skipped:
-  - If `--loop` mode: the ralph-loop will iterate and cover them next.
-  - If not `--loop`: include a coverage WARNING in the Final Report and recommend `--loop`.
-- Do NOT claim "full coverage" or "audit complete" unless every CRITICAL and HIGH file was actually read with the Read tool and has status DONE.
+- If all queued scannable source files were scanned: proceed to Final Report.
+- If any queued scannable files were skipped:
+  - If `--loop` mode: the ralph-loop must iterate and cover the remaining queue next.
+  - If not `--loop`: include a coverage WARNING in the Final Report and recommend loop mode.
+- Do NOT claim "full coverage" or "audit complete" unless every queued scannable source file was actually read with the Read tool and has status DONE.
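
The Skeptic risk calculation quoted in Phase C, `EV = (confidence% × points) - ((100 - confidence%) × 2 × points)`, can be checked with a short sketch. The function names are hypothetical and `points` is assumed to be positive. Setting EV to zero gives 3 × confidence = 200, so the break-even point is about 66.7%, which matches the 67% threshold in the text.

```javascript
// Expected value of disproving a finding, per the formula in the Skeptic phase.
// confidencePct is 0..100; points is the finding's point value (assumed > 0).
function skepticExpectedValue(confidencePct, points) {
  const gain = confidencePct * points;
  const penalty = (100 - confidencePct) * 2 * points;
  return gain - penalty;
}

// DISPROVE only when the expected value is positive.
function shouldDisprove(confidencePct, points) {
  return skepticExpectedValue(confidencePct, points) > 0;
}
```

At 67% confidence on a 10-point finding the EV is 670 - 660 = 10, so disproving is justified. At 66% it is 660 - 680 = -20, so the finding should be accepted.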

package/modes/loop.md CHANGED

@@ -1,6 +1,6 @@
 # Ralph-Loop Mode (`--loop`)
-When `--loop` is present, the bug-hunter wraps itself in a ralph-loop that keeps iterating until the audit achieves full coverage. This is for thorough, autonomous audits where you want every file examined.
+When `--loop` is present, the bug-hunter wraps itself in a ralph-loop that keeps iterating until the audit achieves full queued coverage. This is for thorough, autonomous audits where you want every queued scannable source file examined unless the user interrupts.
 ## CRITICAL: Starting the ralph-loop
@@ -12,65 +12,63 @@ When `LOOP_MODE=true` is set (from `--loop` flag), before running the first pipe
 2. Call the `ralph_start` tool:
 ```
+MAX_LOOP_ITERATIONS = max(12, min(200, ceil(SCANNABLE_FILES / max(FILE_BUDGET, 1)) + 8))
 ralph_start({
   name: "bug-hunter-audit",
   taskContent: <the TODO.md content below>,
-  maxIterations: 10
+  maxIterations: MAX_LOOP_ITERATIONS
 })
 ```
 3. The ralph-loop system will then drive iteration. Each iteration:
    - You receive the task prompt with the current checklist state.
    - You execute one iteration of the bug-hunt pipeline (steps below).
-   - You update `.bug-hunter/coverage.md` with results.
-   - If ALL CRITICAL/HIGH files are DONE → output `<promise>COMPLETE</promise>` to end the loop.
+   - You update `.bug-hunter/coverage.json` with results and render `.bug-hunter/coverage.md` from it.
+   - If ALL queued scannable source files are DONE → output `<promise>COMPLETE</promise>` to end the loop.
    - Otherwise → call `ralph_done` to proceed to the next iteration.
 **Do NOT manually loop or re-invoke yourself.** The ralph-loop system handles iteration automatically after you call `ralph_start`.
 ## How it works
-1. **First iteration**: Run the normal pipeline (Recon → Hunters → Skeptics → Referee). At the end, write a coverage report to `.bug-hunter/coverage.md` using the machine-parseable format below.
+1. **First iteration**: Run the normal pipeline (Recon → Hunters → Skeptics →
+   Referee). At the end, write canonical coverage state to
+   `.bug-hunter/coverage.json` and render `.bug-hunter/coverage.md` from it.
 2. **Coverage check**: After each iteration, evaluate:
-   - If ALL CRITICAL and HIGH files show status DONE → output `<promise>COMPLETE</promise>` → loop ends
-   - If any CRITICAL/HIGH files are SKIPPED or PARTIAL → call `ralph_done` → loop continues
-   - If only MEDIUM files remain uncovered → output `<promise>COMPLETE</promise>` (MEDIUM gaps are acceptable)
-3. **Subsequent iterations**: Each new iteration reads `.bug-hunter/coverage.md` to see what's already been done, then runs the pipeline ONLY on uncovered files. New findings are appended to the cumulative bug list.
-## Coverage file format (machine-parseable)
-**`.bug-hunter/coverage.md`:**
-```markdown
-# Bug Hunt Coverage
-SCHEMA_VERSION: 2
-## Meta
-ITERATION: [N]
-STATUS: [IN_PROGRESS | COMPLETE]
-TOTAL_BUGS_FOUND: [N]
-TIMESTAMP: [ISO 8601]
-CHECKSUM: [line_count of Files section]|[line_count of Bugs section]
-## Files
-<!-- One line per file. Format: TIER|PATH|STATUS|ITERATION_SCANNED|BUGS_FOUND -->
-<!-- STATUS: DONE | PARTIAL | SKIPPED -->
-<!-- BUGS_FOUND: comma-separated BUG-IDs, or NONE -->
-CRITICAL|src/auth/login.ts|DONE|1|BUG-3,BUG-7
-CRITICAL|src/auth/middleware.ts|DONE|1|NONE
-HIGH|src/api/users.ts|DONE|1|BUG-12
-HIGH|src/api/payments.ts|SKIPPED|0|
-MEDIUM|src/utils/format.ts|SKIPPED|0|
-TEST|src/auth/login.test.ts|CONTEXT|1|
-## Bugs
-<!-- One line per confirmed bug. Format: BUG-ID|SEVERITY|FILE|LINES|ONE_LINE_DESCRIPTION -->
-BUG-3|Critical|src/auth/login.ts|45-52|JWT token not validated before use
-BUG-7|Medium|src/auth/login.ts|89|Password comparison uses timing-unsafe equality
-BUG-12|Low|src/api/users.ts|120-125|Missing null check on optional profile field
+   - If ALL queued scannable source files show status DONE → output `<promise>COMPLETE</promise>` → loop ends
+   - If any queued scannable source files are SKIPPED or PARTIAL → call `ralph_done` → loop continues
+   - Do NOT stop just because the current prioritized tier is clean; continue descending through MEDIUM and LOW files automatically
+3. **Subsequent iterations**: Each new iteration reads
+   `.bug-hunter/coverage.json` to see what's already been done, then runs the
+   pipeline ONLY on uncovered files. New findings are appended to the
+   cumulative bug list.
+## Coverage file format (canonical)
+**`.bug-hunter/coverage.json`:**
+```json
+{
+  "schemaVersion": 1,
+  "iteration": 1,
+  "status": "IN_PROGRESS",
+  "files": [
+    { "path": "src/auth/login.ts", "status": "done" },
+    { "path": "src/api/payments.ts", "status": "pending" }
+  ],
+  "bugs": [
+    { "bugId": "BUG-3", "severity": "Critical", "file": "src/auth/login.ts", "claim": "JWT token not validated before use" }
+  ],
+  "fixes": [
+    { "bugId": "BUG-3", "status": "MANUAL_REVIEW" }
+  ]
+}
 ```
+**`.bug-hunter/coverage.md`** is derived from the JSON artifact for humans.
 ## TODO.md task content for ralph_start
 Use this as the `taskContent` parameter when calling `ralph_start`:
@@ -82,44 +80,46 @@ Use this as the `taskContent` parameter when calling `ralph_start`:
 ## Coverage Tasks
 - [ ] All CRITICAL files scanned
 - [ ] All HIGH files scanned
+- [ ] All MEDIUM files scanned
+- [ ] All LOW files scanned
 - [ ] Findings verified through Skeptic+Referee pipeline
 ## Completion
 - [ ] ALL_TASKS_COMPLETE
 ## Instructions
-1. Read .bug-hunter/coverage.md for previous iteration state
-2. Parse the Files table — collect all lines where STATUS is not DONE and TIER is CRITICAL or HIGH
+1. Read .bug-hunter/coverage.json for previous iteration state
+2. Parse the `files` array — collect all entries where `status` is not `done`
 3. Run bug-hunter pipeline on those files only
-4. Update coverage file: change STATUS to DONE, add BUG-IDs
-5. Output <promise>COMPLETE</promise> when all CRITICAL/HIGH files are DONE
+4. Update coverage JSON: change file status to `done`, append bug summaries, and render coverage.md
+5. Output <promise>COMPLETE</promise> only when all queued source files are DONE
 6. Otherwise call ralph_done to continue to the next iteration
 ```
 ## Coverage file validation
 At the start of each iteration, validate the coverage file:
-1. Check `SCHEMA_VERSION: 2` exists on line 2 — if missing, this is a v1 file; migrate by adding the header
-2. Parse the CHECKSUM field: `[file_lines]|[bug_lines]` — count actual lines in Files and Bugs sections
-3. If counts don't match the checksum, the file may be corrupted. Warn: "Coverage file checksum mismatch (expected X|Y, got A|B). Re-scanning affected files." Then set any files with mismatched data to STATUS=PARTIAL for re-scan.
-4. If the file fails to parse entirely (malformed lines, missing sections), rename it to `.bug-hunter/coverage.md.bak` and start fresh. Warn user.
-Update the CHECKSUM every time you write to the coverage file.
+1. Validate `.bug-hunter/coverage.json` against the local coverage schema.
+2. If validation fails, rename the bad file to `.bug-hunter/coverage.json.bak`
+   and start fresh. Warn the user.
+3. Always regenerate `.bug-hunter/coverage.md` from the JSON artifact after a
+   successful write.
 ## Iteration behavior
 Each iteration after the first:
-1. Read `.bug-hunter/coverage.md` — parse the Files table
-2. Collect all lines where STATUS != DONE and TIER is CRITICAL or HIGH
+1. Read `.bug-hunter/coverage.json`
+2. Collect all file entries where `status != "done"`
 3. If none remain → output `<promise>COMPLETE</promise>` (this ends the ralph-loop)
 4. Otherwise, run the pipeline on remaining files only (use small/parallel mode based on count)
-5. Update the coverage file: set STATUS to DONE for scanned files, append new bugs to the Bugs section
+5. Update `coverage.json`, then render `coverage.md`
 6. Increment ITERATION counter
 7. Call `ralph_done` to proceed to the next iteration
 ## Safety
-- Max 10 iterations by default (set via `ralph_start({ maxIterations: 10 })`)
+- Max iterations should scale with the queue size so autonomous runs do not stop early
 - Each iteration only scans NEW files — no re-scanning already-DONE files
 - User can stop anytime with ESC or `/ralph-stop`
-- All state is in `.bug-hunter/coverage.md` — fully resumable, machine-parseable
+- Canonical state is in `.bug-hunter/coverage.json`; `coverage.md` is derived
+  and fully resumable from that JSON
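
The validation and iteration steps added to loop.md above could be sketched roughly as follows in Python. This is a hedged illustration, not the package's implementation: `load_coverage` and `pending_files` are hypothetical names, the structural check is only a stand-in for the "local coverage schema" (which this diff does not show), and the shape of the fresh state is assumed from the example JSON.

```python
import json
import os
from pathlib import Path


def load_coverage(path: Path) -> dict:
    """Load coverage.json; if it is invalid, back it up and start fresh."""
    # Assumed fresh state, modelled on the example JSON in the diff.
    fresh = {"schemaVersion": 1, "iteration": 1, "status": "IN_PROGRESS",
             "files": [], "bugs": [], "fixes": []}
    if not path.exists():
        return fresh
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
        # Stand-in check only: the real schema is not part of this diff.
        if not isinstance(data, dict) or data.get("schemaVersion") != 1:
            raise ValueError("unexpected schemaVersion")
        files = data.get("files")
        if not isinstance(files, list) or not all(
            isinstance(f, dict) and "path" in f and "status" in f for f in files
        ):
            raise ValueError("malformed files array")
    except (ValueError, OSError) as exc:
        # Mirrors step 2: rename the bad file to coverage.json.bak and warn.
        backup = path.with_name(path.name + ".bak")
        os.replace(path, backup)
        print(f"warning: invalid coverage file ({exc}); moved to {backup}")
        return fresh
    return data


def pending_files(coverage: dict) -> list[str]:
    """Collect every file entry whose status is not "done"."""
    return [f["path"] for f in coverage["files"] if f["status"] != "done"]
```

With the example coverage file from the diff, `pending_files` would return only `src/api/payments.ts`. An empty result corresponds to the point where the loop emits `<promise>COMPLETE</promise>`.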

package/modes/parallel.md CHANGED

@@ -70,7 +70,7 @@ Pass to the Hunter:
 - If scout hints exist (from Step 5), use them to prioritize certain code sections, but scan all files regardless.
 - `doc-lookup.md` contents as phase-specific context.
-After completion, read `.bug-hunter/findings.md`.
+After completion, read `.bug-hunter/findings.json`.
 **Merge scout + deep findings:** If scout pass ran, compare scout findings with deep Hunter findings. Promote any scout-only findings (bugs the deep Hunter missed) into the findings list for Skeptic review.
@@ -80,7 +80,7 @@ If TOTAL FINDINGS: 0, skip Skeptic and Referee. Go to Step 7 (Final Report) in S
 ## Step 5-verify: Gap-fill check
-Same as small mode: compare FILES SCANNED vs risk map, re-scan any missed CRITICAL/HIGH files.
+Same as small mode: compare FILES SCANNED vs risk map, then re-scan any missed queued scannable files in priority order.
 ---
@@ -104,7 +104,7 @@ Dispatch Referee using the standard dispatch pattern (see `_dispatch.md`, role=`
 Pass the merged Hunter findings + Skeptic challenges.
-After completion, read `.bug-hunter/referee.md`.
+After completion, read `.bug-hunter/referee.json`, then render `.bug-hunter/report.md` from the JSON artifacts.
 ---
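
The revised gap-fill check in parallel.md (re-scan any missed queued files in priority order) might look something like the Python sketch below. The risk map format is not shown in this diff, so a plain path-to-tier mapping is assumed here, and `missed_files` is a hypothetical helper name. The tier names come from the coverage task list in loop.md.

```python
# Tier names as used in the coverage tasks; lower rank means scan sooner.
TIER_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


def missed_files(risk_map: dict[str, str], files_scanned: set[str]) -> list[str]:
    """Return queued files that were not scanned, highest-priority tier first."""
    missed = [path for path in risk_map if path not in files_scanned]
    # Unknown tiers sort last; ties are broken by path for stable output.
    return sorted(
        missed,
        key=lambda path: (TIER_ORDER.get(risk_map[path], len(TIER_ORDER)), path),
    )
```

Under these assumptions the helper no longer filters by tier at all, which matches the change from "missed CRITICAL/HIGH files" to "missed queued scannable files".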

package/modes/scaled.md CHANGED

@@ -45,7 +45,7 @@ For each chunk: dispatch Hunter, record findings, mark done — same pattern as
 ### 5c. Cross-chunk consistency
 After all chunks complete:
-1. Merge findings from state into `.bug-hunter/findings.md`.
+1. Merge findings from state into `.bug-hunter/findings.json`.
 2. Run consistency check: look for duplicate BUG-IDs across chunks and conflicting claims on the same file/line.
 3. Resolve conflicts: keep the finding with the stronger evidence.
@@ -73,4 +73,4 @@ Pass merged Hunter findings + Skeptic challenges.
 Proceed to **Step 7** (Final Report) in SKILL.md.
-If `--loop` was specified and coverage is incomplete, the ralph-loop will iterate to cover remaining files.
+If `--loop` was specified and coverage is incomplete, the ralph-loop will iterate to cover the remaining queued files until the queue is exhausted or the user interrupts.
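
For the cross-chunk consistency step (5c) in scaled.md, a minimal Python sketch is given below. The `bugId`, `file`, and `claim` fields appear in the coverage JSON example earlier in this diff. The `chunk`, `line`, and `evidenceScore` fields, and the function name `consistency_check`, are assumptions made purely for illustration, since the findings schema is not shown here.

```python
from collections import defaultdict


def consistency_check(findings: list[dict]) -> tuple[list[str], list[dict]]:
    """Flag BUG-IDs reused across chunks and resolve conflicting claims."""
    by_id = defaultdict(list)
    by_location = defaultdict(list)
    for finding in findings:
        by_id[finding["bugId"]].append(finding)
        by_location[(finding["file"], finding["line"])].append(finding)

    # A BUG-ID is a duplicate when more than one chunk produced it.
    duplicate_ids = sorted(
        bug_id for bug_id, group in by_id.items()
        if len({f["chunk"] for f in group}) > 1
    )

    resolved = []
    for group in by_location.values():
        if len({f["claim"] for f in group}) > 1:
            # Conflicting claims on the same file/line: keep the stronger evidence.
            resolved.append(max(group, key=lambda f: f["evidenceScore"]))
        else:
            resolved.extend(group)
    return duplicate_ids, resolved
```

How "stronger evidence" is actually measured is not specified in the diff; a numeric score is just one possible stand-in.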