@codexstar/bug-hunter 3.0.0 → 3.0.5

Files changed (77)
  1. package/CHANGELOG.md +149 -83
  2. package/README.md +150 -15
  3. package/SKILL.md +94 -27
  4. package/agents/openai.yaml +4 -0
  5. package/bin/bug-hunter +9 -3
  6. package/docs/images/2026-03-12-fix-plan-rollout.png +0 -0
  7. package/docs/images/2026-03-12-hero-bug-hunter-overview.png +0 -0
  8. package/docs/images/2026-03-12-machine-readable-artifacts.png +0 -0
  9. package/docs/images/2026-03-12-pr-review-flow.png +0 -0
  10. package/docs/images/2026-03-12-security-pack.png +0 -0
  11. package/docs/images/adversarial-debate.png +0 -0
  12. package/docs/images/doc-verify-fix-plan.png +0 -0
  13. package/docs/images/hero.png +0 -0
  14. package/docs/images/pipeline-overview.png +0 -0
  15. package/docs/images/security-finding-card.png +0 -0
  16. package/docs/plans/2026-03-11-structured-output-migration-plan.md +288 -0
  17. package/docs/plans/2026-03-12-audit-bug-fixes-surgical-plan.md +193 -0
  18. package/docs/plans/2026-03-12-enterprise-security-pack-e2e-plan.md +59 -0
  19. package/docs/plans/2026-03-12-local-security-skills-integration-plan.md +39 -0
  20. package/docs/plans/2026-03-12-pr-review-strategic-fix-flow.md +78 -0
  21. package/evals/evals.json +366 -102
  22. package/modes/extended.md +2 -2
  23. package/modes/fix-loop.md +30 -30
  24. package/modes/fix-pipeline.md +32 -6
  25. package/modes/large-codebase.md +14 -15
  26. package/modes/local-sequential.md +44 -20
  27. package/modes/loop.md +56 -56
  28. package/modes/parallel.md +3 -3
  29. package/modes/scaled.md +2 -2
  30. package/modes/single-file.md +3 -3
  31. package/modes/small.md +11 -11
  32. package/package.json +10 -1
  33. package/prompts/fixer.md +37 -23
  34. package/prompts/hunter.md +39 -20
  35. package/prompts/referee.md +34 -20
  36. package/prompts/skeptic.md +25 -22
  37. package/schemas/coverage.schema.json +67 -0
  38. package/schemas/examples/findings.invalid.json +13 -0
  39. package/schemas/examples/findings.valid.json +17 -0
  40. package/schemas/findings.schema.json +76 -0
  41. package/schemas/fix-plan.schema.json +94 -0
  42. package/schemas/fix-report.schema.json +105 -0
  43. package/schemas/fix-strategy.schema.json +99 -0
  44. package/schemas/recon.schema.json +31 -0
  45. package/schemas/referee.schema.json +46 -0
  46. package/schemas/shared.schema.json +51 -0
  47. package/schemas/skeptic.schema.json +21 -0
  48. package/scripts/bug-hunter-state.cjs +35 -12
  49. package/scripts/code-index.cjs +11 -4
  50. package/scripts/fix-lock.cjs +95 -25
  51. package/scripts/payload-guard.cjs +24 -10
  52. package/scripts/pr-scope.cjs +181 -0
  53. package/scripts/render-report.cjs +346 -0
  54. package/scripts/run-bug-hunter.cjs +667 -32
  55. package/scripts/schema-runtime.cjs +273 -0
  56. package/scripts/schema-validate.cjs +40 -0
  57. package/scripts/tests/bug-hunter-state.test.cjs +68 -3
  58. package/scripts/tests/code-index.test.cjs +15 -0
  59. package/scripts/tests/fix-lock.test.cjs +60 -2
  60. package/scripts/tests/fixtures/flaky-worker.cjs +6 -1
  61. package/scripts/tests/fixtures/low-confidence-worker.cjs +8 -2
  62. package/scripts/tests/fixtures/success-worker.cjs +6 -1
  63. package/scripts/tests/payload-guard.test.cjs +154 -2
  64. package/scripts/tests/pr-scope.test.cjs +212 -0
  65. package/scripts/tests/render-report.test.cjs +180 -0
  66. package/scripts/tests/run-bug-hunter.test.cjs +686 -2
  67. package/scripts/tests/security-skills-integration.test.cjs +29 -0
  68. package/scripts/tests/skills-packaging.test.cjs +30 -0
  69. package/scripts/tests/worktree-harvest.test.cjs +66 -0
  70. package/scripts/worktree-harvest.cjs +62 -9
  71. package/skills/README.md +19 -0
  72. package/skills/commit-security-scan/SKILL.md +63 -0
  73. package/skills/security-review/SKILL.md +57 -0
  74. package/skills/threat-model-generation/SKILL.md +47 -0
  75. package/skills/vulnerability-validation/SKILL.md +59 -0
  76. package/templates/subagent-wrapper.md +12 -3
  77. package/modes/_dispatch.md +0 -121
@@ -0,0 +1,288 @@
1
+ # Canonical Structured Outputs For Bug Hunter
2
+
3
+ This ExecPlan is a living document. The sections `Progress`, `Surprises & Discoveries`, `Decision Log`, and `Outcomes & Retrospective` must be kept up to date as work proceeds.
4
+
5
+ This repository does not contain a checked-in `PLANS.md`, but this document is written to the same standard as the machine-local ExecPlan reference at `/Users/codex/Downloads/Code Files/PLANS.md`. Keep this plan self-contained as implementation proceeds.
6
+
7
+ ## Purpose / Big Picture
8
+
9
+ After this change, Bug Hunter will use one canonical structured contract from end to end. Each phase will emit validated JSON as the source of truth, while Markdown becomes a rendered report for humans. This matters because the current system mixes Markdown prompts, ad hoc parsing, and JSON side channels, which makes the pipeline slower, harder to validate, and more likely to drift into false positives, silent false negatives, or broken fix eligibility.
10
+
11
+ The user-visible result is simple to verify. A bug-hunter run should create phase artifacts such as `.bug-hunter/recon.json`, `.bug-hunter/findings.json`, `.bug-hunter/skeptic.json`, `.bug-hunter/referee.json`, `.bug-hunter/coverage.json`, and `.bug-hunter/fix-report.json`. The same run should still produce readable Markdown reports, but those Markdown files must be generated from the JSON artifacts rather than being the only source of truth. A failed or malformed phase output should be rejected immediately with a precise validation error and a retry path instead of slipping through as an empty or partially parsed report.
12
+
13
+ ## Progress
14
+
15
+ - [x] (2026-03-11 18:40Z) Create versioned JSON schemas for `recon`, `findings`, `skeptic`, `referee`, `coverage`, `fix-report`, plus shared definitions under `schemas/`.
16
+ - [x] (2026-03-11 18:40Z) Add `scripts/schema-runtime.cjs` and `scripts/schema-validate.cjs`, ship `schemas/` in the npm package, and add example valid/invalid `findings.json` fixtures.
17
+ - [x] (2026-03-11 18:40Z) Wire strict findings validation into `payload-guard.cjs`, `bug-hunter-state.cjs`, and `run-bug-hunter.cjs`, including retry-on-invalid-findings inside the chunk worker loop.
18
+ - [x] (2026-03-11 20:05Z) Replace Markdown-only phase prompting with JSON-first prompting plus rendered Markdown output guidance, including `scripts/render-report.cjs`.
19
+ - [x] (2026-03-11 20:05Z) Normalize confidence to numeric values in canonical findings/referee contracts and fix-plan eligibility.
20
+ - [x] (2026-03-11 20:05Z) Replace `coverage.md` as canonical loop state with `coverage.json` and keep `coverage.md` as a derived summary.
21
+ - [x] (2026-03-11 21:06Z) Add strict inbound and outbound validation, retry logic, and eval coverage for malformed outputs and stale contracts.
22
+ - [x] (2026-03-11 20:05Z) Update core documentation, mode docs, wrapper templates, and eval text so they match the full-queue loop semantics and the new structured contracts.
23
+
24
+ ## Surprises & Discoveries
25
+
26
+ - Observation: the orchestrator already has a JSON worker path, but the main prompts still tell agents to write Markdown reports.
27
+ Evidence: `scripts/run-bug-hunter.cjs` writes and reads `chunk-<id>-findings.json`, while `prompts/hunter.md` still directs output to `.bug-hunter/findings.md`.
28
+
29
+ - Observation: fix planning expects numeric confidence, but the Referee prompt still emits `High/Medium/Low`.
30
+ Evidence: `scripts/run-bug-hunter.cjs` filters fix eligibility with `confidence >= confidenceThreshold`, while `prompts/referee.md` asks for `Confidence: High/Medium/Low`.
31
+
32
+ - Observation: loop state is still a machine-parseable Markdown document, which is more brittle than the rest of the JSON-capable pipeline.
33
+ Evidence: `modes/loop.md` defines `.bug-hunter/coverage.md` with line-based sections and a checksum format instead of a JSON state file.
34
+
35
+ - Observation: evaluation fixtures still encode the earlier `CRITICAL/HIGH` stopping rule.
36
+ Evidence: `evals/evals.json` case `id: 6` still expects completion once all CRITICAL and HIGH files are done.
37
+
38
+ - Observation: once schema refs become real runtime assets, isolated skill copies must include `schemas/` as well as `scripts/`.
39
+ Evidence: the preflight isolation test needed `schemas/findings.schema.json` and the new schema helper scripts copied into the sandbox to stay representative.
40
+
41
+ - Observation: deduplicated findings now inherit the strongest numeric confidence for the shared `file|lines|claim` key, which changes low-confidence metrics compared with the previous loose merge.
42
+ Evidence: `scripts/bug-hunter-state.cjs` now validates findings before merge and keeps the maximum `confidenceScore` for duplicate keys, which required updating the state test expectation.
43
+
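The deduplication behavior described in the observation above can be sketched as follows. This is a minimal illustration, not the code in `scripts/bug-hunter-state.cjs`: the function name `mergeFindings` is hypothetical, and only the `file|lines|claim` key and the "keep the maximum `confidenceScore`" rule come from the plan.

```javascript
// Hypothetical helper: merge duplicate findings on the shared file|lines|claim key
// and keep the strongest numeric confidenceScore for each key.
function mergeFindings(findings) {
  const byKey = new Map();
  for (const finding of findings) {
    const key = `${finding.file}|${finding.lines}|${finding.claim}`;
    const existing = byKey.get(key);
    if (!existing) {
      byKey.set(key, { ...finding });
    } else {
      // Keep the first entry's fields, but raise its score to the maximum seen.
      existing.confidenceScore = Math.max(existing.confidenceScore, finding.confidenceScore);
    }
  }
  return [...byKey.values()];
}
```

Because the merged entry carries the highest score, a finding reported once at low confidence and once at high confidence no longer counts toward low-confidence metrics.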
44
+ - Observation: the remaining validation gap closed cleanly once the runner exposed a generic schema-validated phase command instead of baking phase-specific logic into docs.
45
+ Evidence: `scripts/run-bug-hunter.cjs` now exposes `phase`, validates any named artifact after each attempt, and retries malformed Skeptic/Referee/Fix outputs before the phase succeeds.
46
+
47
+ ## Decision Log
48
+
49
+ - Decision: use provider-agnostic local JSON schemas as the source of truth, and treat provider-native structured outputs as an optimization layer.
50
+ Rationale: Bug Hunter runs across multiple agent backends and CLIs. Native structured outputs from Claude, OpenAI, and Gemini can improve reliability where available, but the skill must remain correct on backends that only support plain prompting and local validation.
51
+ Date/Author: 2026-03-11 / Codex
52
+
53
+ - Decision: keep Markdown reports, but generate them from validated JSON artifacts.
54
+ Rationale: humans still need readable reports, but machine-state should not depend on brittle line parsing or prompt formatting quirks.
55
+ Date/Author: 2026-03-11 / Codex
56
+
57
+ - Decision: normalize confidence to both `confidenceScore` and `confidenceLabel`.
58
+ Rationale: numeric confidence is required for fix eligibility and consistency checks, while a short label remains useful for readable reports.
59
+ Date/Author: 2026-03-11 / Codex
60
+
61
+ - Decision: migrate loop state from `coverage.md` to `coverage.json` and keep a rendered `coverage.md` for visibility.
62
+ Rationale: the loop is the long-lived state carrier. It benefits the most from strict schema validation, resumability, and safe retries.
63
+ Date/Author: 2026-03-11 / Codex
64
+
65
+ - Decision: ship the schema files as package assets and treat missing schema files as a preflight failure.
66
+ Rationale: payload guards and worker validation now depend on the checked-in schema files at runtime, so an install missing `schemas/` is broken even if the scripts themselves exist.
67
+ Date/Author: 2026-03-11 / Codex
68
+
69
+ ## Outcomes & Retrospective
70
+
71
+ This migration milestone is now complete. Bug Hunter rejects malformed `findings.json` artifacts before they reach state, retries the worker when those artifacts are invalid, ships explicit schemas plus a validator CLI, renders Markdown from canonical JSON, writes canonical `coverage.json` loop state with a derived `coverage.md` companion, and now enforces Skeptic/Referee/Fixer artifact validation through the orchestrated `run-bug-hunter.cjs phase` path as well as the manual/local path.
72
+
73
+ ## Context and Orientation
74
+
75
+ Bug Hunter is a skill package rooted at `/Users/codex/.agents/skills/bug-hunter`. The important files for this work are spread across prompts, mode documents, helper scripts, and tests.
76
+
77
+ `prompts/hunter.md`, `prompts/skeptic.md`, `prompts/referee.md`, and `prompts/fixer.md` define what each analysis phase writes today. They currently emphasize Markdown output with free-form sections and line-oriented formats. This is the main place where drift enters the system.
78
+
79
+ `scripts/run-bug-hunter.cjs` is the orchestration helper that manages chunk execution, retries, delta expansion, consistency reports, and fix-plan generation. It already understands JSON findings files written by workers. This file is the best anchor for the migration because it already behaves like a JSON pipeline in the tests.
80
+
81
+ `scripts/bug-hunter-state.cjs` stores durable scan state such as chunk progress, a bug ledger, fact cards, consistency information, and fix plans. It currently records findings from JSON files, but it does not validate rich schemas and it accepts incomplete objects as long as basic fields exist.
82
+
83
+ `scripts/payload-guard.cjs` validates worker payloads before launch. Right now it only checks that required top-level fields exist and that `outputSchema` is “an object”. It does not enforce real schemas for either inbound or outbound data.
84
+
85
+ `modes/loop.md` and `modes/fix-loop.md` define the iterative audit loop. They currently store machine state in `.bug-hunter/coverage.md`, which is a Markdown file with line-based sections. That format is readable but brittle and expensive to maintain compared with JSON.
86
+
87
+ `evals/evals.json` and `scripts/tests/*.test.cjs` are the safety net. They currently prove parts of the JSON worker path, but they do not yet enforce full end-to-end structured outputs or the newly required full-queue loop semantics.
88
+
89
+ In this plan, “structured output” means a phase result that conforms to a versioned JSON schema that can be validated locally with no guesswork. “Canonical artifact” means the file every later phase trusts as the source of truth. “Rendered report” means a human-readable Markdown file generated from a validated JSON artifact.
90
+
91
+ ## Plan of Work
92
+
93
+ The work starts by defining stable versioned schemas in a new directory, `schemas/`, under the skill root. Create one schema module per artifact: `recon`, `findings`, `skeptic`, `referee`, `coverage`, `fix-report`, and any shared types such as file coverage entries, cross-reference items, STRIDE/CWE metadata, and confidence values. Use plain JSON Schema stored in `.json` files or JavaScript schema builders that output JSON Schema, but keep the final schemas serializable and versioned. Each schema must include a `schemaVersion` field. Confidence must be represented as `confidenceScore` on a numeric 0–100 scale, and optionally `confidenceLabel` derived from it for rendered reports.
94
+
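One way to derive `confidenceLabel` from `confidenceScore` is sketched below. The plan only states that the score is on a numeric 0–100 scale and that the label is derived from it; the thresholds (80 and 50) and the function name are assumptions made for illustration.

```javascript
// Hypothetical mapping from the numeric 0-100 score to a short report label.
// The 80/50 cut-offs are illustrative, not taken from the package.
function confidenceLabel(confidenceScore) {
  if (
    typeof confidenceScore !== "number" ||
    Number.isNaN(confidenceScore) ||
    confidenceScore < 0 ||
    confidenceScore > 100
  ) {
    throw new RangeError("confidenceScore must be a number between 0 and 100");
  }
  if (confidenceScore >= 80) return "High";
  if (confidenceScore >= 50) return "Medium";
  return "Low";
}
```

Deriving the label in one place keeps rendered reports consistent with the numeric value that fix eligibility actually uses.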
95
+ Next, add a schema runtime helper under `scripts/`, for example `scripts/schema-validate.cjs`, that can validate any named artifact file and print a short machine-readable result. This helper must be used in three places: when generating payloads, when reading worker outputs, and when reading persisted loop state. Expand `scripts/payload-guard.cjs` so the role templates point to real output schemas rather than placeholder `format/version` objects. The guard should reject missing or mismatched schema names before work starts.
96
+
97
+ Then migrate the prompts. `prompts/hunter.md`, `prompts/skeptic.md`, `prompts/referee.md`, and `prompts/fixer.md` should stop treating Markdown as the primary output. Instead they should instruct the agent to write a JSON array or object to the assigned canonical path, and optionally write a rendered Markdown companion file if the assignment requests it. The JSON contract must be concrete. For example, Hunter findings must include `bugId`, `severity`, `category`, `file`, `lines`, `claim`, `evidence`, `runtimeTrigger`, `crossReferences`, and `confidenceScore`. Referee verdicts must include `verdict`, `trueSeverity`, `confidenceScore`, `confidenceLabel`, `verificationMode`, and enriched security fields where applicable. Keep the prose reasoning, but move it into explicitly typed fields such as `analysisSummary` instead of free-form blocks.
98
+
99
+ Once the prompts are changed, update the orchestrator and state layer to consume the new contracts only. In `scripts/run-bug-hunter.cjs`, treat missing worker JSON output as a hard phase failure unless the phase explicitly allows zero results via a valid empty array. Validate every worker output before recording it in state. If validation fails, journal the schema error, mark the chunk or phase as failed, and let the retry logic rerun the worker. In `scripts/bug-hunter-state.cjs`, reject findings entries that omit required fields, and enrich ledger entries with normalized keys such as `confidenceScore`, `severity`, `category`, and `verificationMode`. Do not silently continue when a result is malformed.
100
+
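The validate-before-record rule with retry can be sketched roughly like this. It is a simplified, synchronous illustration; `runWithValidation` and its callback parameters are hypothetical names, not the interface of `scripts/run-bug-hunter.cjs`.

```javascript
// Hypothetical sketch: state is only written after validation succeeds,
// and each schema failure is journaled before the worker is retried.
function runWithValidation({ runWorker, validate, record, maxAttempts = 3 }) {
  const journal = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const output = runWorker(attempt);
    const result = validate(output);
    if (result.ok) {
      record(output); // the only place state is updated
      return { ok: true, attempts: attempt, journal };
    }
    journal.push({ attempt, errors: result.errors });
  }
  // Every attempt produced a malformed artifact: fail hard, record nothing.
  return { ok: false, attempts: maxAttempts, journal };
}
```

Keeping `record` behind the validation check is what prevents a malformed artifact from ever reaching the state file.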
101
+ After the phase artifacts are stable, migrate loop state. Add a new canonical file, `.bug-hunter/coverage.json`, and make it the state the loop reads and writes. It should contain top-level metadata, file coverage entries, cumulative bugs, fix ledger entries, and the current loop status. Keep `.bug-hunter/coverage.md`, but generate it from `coverage.json` after each iteration so humans can still inspect progress. Update `modes/loop.md` and `modes/fix-loop.md` to describe the JSON state as canonical and Markdown as derived.
102
+
103
+ The provider-specific structured-output layer comes next. Add a small capability adapter under `scripts/` or `templates/` that can describe three modes: native structured output supported, native unsupported but JSON prompting available, and plain-text fallback with local validation. Do not make provider-native structured outputs mandatory for correctness. When the backend supports them, use the local schema definitions to generate provider-specific requests. For Claude this means schema-constrained output or strict tool result patterns where available. For OpenAI this means strict structured outputs using JSON Schema and handling refusals or first-schema latency explicitly. For Gemini this means `responseMimeType: application/json` with `responseSchema`. If a backend does not support native structured output, keep the prompt JSON-first and validate locally after the response.
104
+
105
+ Finally, update every test and eval path. Add tests for schema validation failures, malformed worker outputs, missing `confidenceScore`, invalid coverage state, and rendered Markdown generation from JSON. Update `evals/evals.json` to require full queued coverage semantics and the presence of canonical JSON artifacts. Keep the existing worker fixture tests, but add one fully integrated smoke path that simulates a Hunter JSON output, a Skeptic JSON output, a Referee JSON output, and the resulting fix-plan eligibility.
106
+
107
+ ## Milestones
108
+
109
+ ### Milestone 1: Define the canonical data contracts
110
+
111
+ At the end of this milestone, the repository has explicit versioned schemas for every phase artifact, and a local validator can reject malformed files deterministically. Nothing user-visible changes yet, but the implementation gains a stable foundation. This milestone is complete when a novice can run schema validation against a sample `findings.json` and see success, then remove a required field and see a validation failure with a helpful error.
112
+
113
+ ### Milestone 2: Convert prompts and orchestrator to JSON-first phase outputs
114
+
115
+ At the end of this milestone, Hunter, Skeptic, Referee, and Fixer all emit canonical JSON artifacts, and the orchestrator only accepts validated JSON for state updates. Markdown reports still exist, but they are generated from JSON. This milestone is complete when a simulated worker run produces `findings.json`, the orchestrator records it, and a malformed output fails fast with retry instead of silently succeeding.
116
+
117
+ ### Milestone 3: Migrate loop state to JSON and align semantics
118
+
119
+ At the end of this milestone, `.bug-hunter/coverage.json` is the canonical loop state, the loop uses full queued coverage semantics, and `.bug-hunter/coverage.md` is a rendered summary. This milestone is complete when a loop simulation can resume from `coverage.json`, continue through queued files, and render a readable Markdown view from the same state.
120
+
121
+ ### Milestone 4: Add provider-native structured output adapters and end-to-end safety tests
122
+
123
+ At the end of this milestone, the skill can optionally use native structured outputs for Claude, OpenAI, or Gemini capable backends, but still behaves correctly without them. The tests and evals enforce the new contracts. This milestone is complete when the provider adapter selects the correct mode, malformed outputs are rejected across all supported execution paths, and evals no longer encode the obsolete `CRITICAL/HIGH` stopping rule.
124
+
125
+ ## Concrete Steps
126
+
127
+ Work from `/Users/codex/.agents/skills/bug-hunter`.
128
+
129
+ 1. Create the schema directory and files.
130
+
131
+ mkdir -p docs/plans schemas
132
+
133
+ Add files such as:
134
+ schemas/findings.schema.json
135
+ schemas/skeptic.schema.json
136
+ schemas/referee.schema.json
137
+ schemas/coverage.schema.json
138
+ schemas/fix-report.schema.json
139
+ schemas/recon.schema.json
140
+ schemas/shared.schema.json
141
+
142
+ Expected result: the `schemas/` directory exists and each schema file includes `schemaVersion`.
143
+
144
+ 2. Add a validation helper.
145
+
146
+ Create `scripts/schema-validate.cjs` and teach it:
147
+ - how to load a schema by name
148
+ - how to validate a file path
149
+ - how to print JSON success or JSON error output
150
+
151
+ Expected result:
152
+
153
+ node scripts/schema-validate.cjs findings schemas/examples/findings.valid.json
154
+ {"ok":true,"artifact":"findings"}
155
+
156
+ node scripts/schema-validate.cjs findings schemas/examples/findings.invalid.json
157
+ {"ok":false,"artifact":"findings","errors":["missing required property: claim"]}
158
+
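The result shape shown above could be produced by a required-field check along these lines. This is only a sketch: the shipped helper presumably validates against `schemas/findings.schema.json`, whereas this hypothetical `validateFindings` function hard-codes the required finding fields listed in this plan and prefixes each error with the array index, which the sample output above does not do.

```javascript
// Required fields as listed in this plan for a finding object.
const REQUIRED_FINDING_FIELDS = [
  "bugId", "severity", "category", "file", "lines",
  "claim", "evidence", "runtimeTrigger", "crossReferences", "confidenceScore",
];

// Hypothetical validator: returns the same {ok, artifact, errors} shape as the CLI output.
function validateFindings(findings) {
  if (!Array.isArray(findings)) {
    return { ok: false, artifact: "findings", errors: ["findings must be an array"] };
  }
  const errors = [];
  findings.forEach((finding, index) => {
    const isObject = finding !== null && typeof finding === "object";
    for (const field of REQUIRED_FINDING_FIELDS) {
      if (!isObject || !(field in finding)) {
        errors.push(`[${index}] missing required property: ${field}`);
      }
    }
  });
  return errors.length === 0
    ? { ok: true, artifact: "findings" }
    : { ok: false, artifact: "findings", errors };
}
```

An empty array validates successfully here, which matches the plan's rule that zero results must be expressed as a valid empty array rather than a missing file.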
159
+ 3. Update `scripts/payload-guard.cjs` and `scripts/run-bug-hunter.cjs`.
160
+
161
+ Replace placeholder `outputSchema` objects with real schema references. Validate worker outputs before calling `record-findings` or any equivalent state write.
162
+
163
+ Expected result: a malformed findings file causes the chunk to fail with a schema error instead of being recorded as partial success.
164
+
165
+ 4. Update the prompts and rendered-report flow.
166
+
167
+ Change prompt files so JSON is the primary output. Add a renderer script such as `scripts/render-report.cjs` if needed.
168
+
169
+ Expected result: a run produces both JSON and Markdown, with Markdown fully derivable from JSON.
170
+
171
+ 5. Migrate loop state.
172
+
173
+ Add `coverage.json`, update `modes/loop.md` and `modes/fix-loop.md`, and render `coverage.md` from JSON.
174
+
175
+ Expected result: the loop resumes from JSON state and no longer depends on parsing Markdown line structure.
176
+
177
+ 6. Update tests and evals.
178
+
179
+ Run:
180
+
181
+ node --test scripts/tests/*.test.cjs
182
+
183
+ Add tests for malformed artifacts, missing confidence scores, bad coverage state, and rendered Markdown output. Update `evals/evals.json` so loop completion requires full queued coverage, not just CRITICAL and HIGH completion.
184
+
185
+ ## Validation and Acceptance
186
+
187
+ Acceptance is behavior-based.
188
+
189
+ First, run the script tests from `/Users/codex/.agents/skills/bug-hunter`:
190
+
191
+ node --test scripts/tests/*.test.cjs
192
+
193
+ Expect all tests to pass, including new tests that fail before the migration because the old code accepted malformed outputs or textual confidence.
194
+
195
+ Second, run a local orchestrator smoke path with a valid worker fixture. It must produce canonical JSON output files and a rendered Markdown report. Observe:
196
+
197
+ .bug-hunter/findings.json
198
+ .bug-hunter/referee.json
199
+ .bug-hunter/fix-report.json
200
+ .bug-hunter/coverage.json
201
+ .bug-hunter/report.md
202
+
203
+ Third, deliberately break one phase artifact by removing a required field such as `claim` or `confidenceScore`. Re-run the same smoke path and expect:
204
+
205
+ - the phase fails
206
+ - the journal records a schema validation error
207
+ - state is not updated from the malformed artifact
208
+ - retry logic is allowed to rerun the worker
209
+
210
+ Fourth, run a loop simulation and verify that completion only occurs when every queued scannable file is marked done in `coverage.json`, not merely when CRITICAL and HIGH files are done.
211
+
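The full-queue completion rule can be expressed as a small predicate. The entry fields used here (`status: "done"`, `priority`) are assumptions for illustration; the actual shape is whatever `schemas/coverage.schema.json` defines.

```javascript
// Hypothetical check: the loop is complete only when every queued file is done,
// regardless of its priority tier.
function isLoopComplete(coverage) {
  const files = Array.isArray(coverage.files) ? coverage.files : [];
  return files.every((entry) => entry.status === "done");
}

// The obsolete rule, shown for contrast: it ignores lower-priority files.
function isLoopCompleteLegacy(coverage) {
  return coverage.files
    .filter((entry) => entry.priority === "CRITICAL" || entry.priority === "HIGH")
    .every((entry) => entry.status === "done");
}
```

A queue with a pending low-priority file is complete under the legacy rule but not under the new one, which is the behavior change the evals need to capture.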
212
+ ## Idempotence and Recovery
213
+
214
+ The migration should be safe to run incrementally. Schema files and validators are additive. During implementation, keep Markdown outputs in parallel with JSON outputs until all consumers are switched over. Do not remove Markdown files until JSON-based rendering and validation are proven.
215
+
216
+ If a phase fails because of schema validation, the safe recovery path is to fix the producer prompt or fixture and rerun the same command. Because the state update happens after validation, malformed outputs should not poison the state file.
217
+
218
+ When migrating loop state, keep a one-time importer from `coverage.md` to `coverage.json` or, if that is too brittle, explicitly start fresh and document that old Markdown loop state is not resumable across the migration. Choose one path and document it in the implementation notes.
219
+
220
+ ## Artifacts and Notes
221
+
222
+ The most important implementation artifacts should be:
223
+
224
+ schemas/*.schema.json
225
+ scripts/schema-validate.cjs
226
+ scripts/render-report.cjs
227
+ .bug-hunter/*.json
228
+ .bug-hunter/report.md
229
+ .bug-hunter/coverage.md
230
+
231
+ Expected evidence after completion:
232
+
233
+ $ node scripts/schema-validate.cjs findings .bug-hunter/findings.json
234
+ {"ok":true,"artifact":"findings"}
235
+
236
+ $ node --test scripts/tests/*.test.cjs
237
+ ℹ pass <updated-count>
238
+ ℹ fail 0
239
+
240
+ ## Interfaces and Dependencies
241
+
242
+ Define these stable interfaces by the end of the work:
243
+
244
+ In `schemas/findings.schema.json`, define a findings artifact that is an array of finding objects. Each finding object must include:
245
+
246
+ bugId: string
247
+ severity: "Critical" | "Medium" | "Low"
248
+ category: string
249
+ file: string
250
+ lines: string
251
+ claim: string
252
+ evidence: string
253
+ runtimeTrigger: string
254
+ crossReferences: array
255
+ confidenceScore: number
256
+
257
+ In `schemas/referee.schema.json`, define a verdict artifact with:
258
+
259
+ bugId: string
260
+ verdict: "REAL_BUG" | "NOT_A_BUG" | "MANUAL_REVIEW"
261
+ trueSeverity: "Critical" | "Medium" | "Low"
262
+ confidenceScore: number
263
+ confidenceLabel: string
264
+ verificationMode: "INDEPENDENTLY_VERIFIED" | "EVIDENCE_BASED"
265
+ analysisSummary: string
266
+
267
+ In `schemas/coverage.schema.json`, define loop state with:
268
+
269
+ schemaVersion: number
270
+ iteration: number
271
+ status: "IN_PROGRESS" | "COMPLETE"
272
+ files: array of file coverage entries
273
+ bugs: array of confirmed bug summaries
274
+ fixes: array of fix ledger entries
275
+
276
+ In `scripts/schema-validate.cjs`, implement a CLI with:
277
+
278
+ node scripts/schema-validate.cjs <artifact-name> <file-path>
279
+
280
+ In `scripts/render-report.cjs`, implement a CLI that renders Markdown from JSON artifacts:
281
+
282
+ node scripts/render-report.cjs report .bug-hunter/findings.json .bug-hunter/referee.json > .bug-hunter/report.md
283
+
284
+ Provider-native structured output adapters, if added, must consume these local schemas rather than inventing provider-specific contracts.
285
+
286
+ ## Change Log For This Plan
287
+
288
+ 2026-03-11: Initial ExecPlan created after the structured-output audit. The plan chooses provider-agnostic local schemas as the foundation and treats Claude/OpenAI/Gemini native structured outputs as optional accelerators rather than the source of truth.
@@ -0,0 +1,193 @@
1
+ # Surgical Fix Plan for Confirmed Audit Bugs
2
+
3
+ ## Objective
4
+
5
+ Fix the four confirmed runtime bugs without changing the surrounding product behavior, public UX, or broader pipeline design beyond what is necessary for correctness and safety.
6
+
7
+ Confirmed bugs:
8
+ - `BUG-1` — `scripts/run-bug-hunter.cjs`
9
+ - `BUG-2` — `scripts/pr-scope.cjs`
10
+ - `BUG-3` — `scripts/fix-lock.cjs`
11
+ - `BUG-4` — `scripts/code-index.cjs`
12
+
13
+ ## Fix order
14
+
15
+ 1. `BUG-3` `scripts/fix-lock.cjs`
16
+ 2. `BUG-4` `scripts/code-index.cjs`
17
+ 3. `BUG-2` `scripts/pr-scope.cjs`
18
+ 4. `BUG-1` `scripts/run-bug-hunter.cjs`
19
+
20
+ Rationale:
21
+ - `BUG-3` and `BUG-4` are isolated utility-level correctness fixes with low blast radius.
22
+ - `BUG-2` changes PR scope resolution behavior and needs targeted tests around fallback semantics.
23
+ - `BUG-1` touches orchestration behavior and should land last after the supporting utilities are stable.
24
+
25
+ ---
26
+
27
+ ## BUG-3 — fix-lock can steal a live lock
28
+
29
+ ### Problem
30
+ `acquire()` treats TTL expiry as sufficient proof of staleness and does not check whether the recorded PID is still alive.
31
+
32
+ ### Surgical fix
33
+ - Keep the existing lock file format.
34
+ - Change stale recovery logic so a lock is auto-recovered only when:
35
+ - TTL expired **and**
36
+ - owner PID is absent or not alive.
37
+ - If TTL expired but owner is still alive, return a failure payload such as:
38
+ - `reason: "lock-held-by-live-owner"`
39
+ - include `stale: true` and `ownerAlive: true` for observability.
40
+
41
+ ### Files
42
+ - `scripts/fix-lock.cjs`
43
+ - tests in `scripts/tests/fix-lock.test.cjs`
44
+
45
+ ### Test additions
46
+ - acquiring a fresh lock from another process still fails
47
+ - acquiring an expired lock whose PID is dead succeeds
48
+ - acquiring an expired lock whose PID is alive fails
49
+ - `status` remains consistent with acquire behavior
50
+
51
+ ### Risk
52
+ Low. Pure locking behavior change.
53
+
54
+ ---
55
+
56
+ ## BUG-4 — code-index query-bugs temp file collision
57
+
58
+ ### Problem
59
+ `queryBugs()` always writes `.seed-files.tmp.json` in the same directory and only deletes it on success.
60
+
61
+ ### Surgical fix
62
+ - Replace fixed temp filename with a unique invocation-scoped filename, e.g. based on:
63
+ - `process.pid`
64
+ - timestamp
65
+ - random suffix
66
+ - Wrap temp-file lifecycle in `try/finally` so cleanup runs even if `query()` throws.
67
+ - Preserve current command contract and output shape.
68
+
69
+ ### Files
70
+ - `scripts/code-index.cjs`
71
+ - tests in `scripts/tests/code-index.test.cjs`
72
+
73
+ ### Test additions
74
+ - `query-bugs` cleans up temp file after success
75
+ - `query-bugs` cleans up temp file after a thrown query path
76
+ - parallel invocations do not reuse the same temp file name
77
+
78
+ ### Risk
79
+ Low. Local helper behavior only.
80
+
81
+ ---
82
+
83
+ ## BUG-2 — pr-scope silent wrong-base fallback
84
+
85
+ ### Problem
86
+ For `selector === "current"`, any `gh` failure falls back to `git diff <base or main>...HEAD` and reports success. This can silently produce the wrong review scope.
87
+
88
+ ### Surgical fix
89
+ Preferred minimal behavior:
90
+ - Keep git fallback only for `current`.
91
+ - Before fallback, determine base branch more safely:
92
+ 1. explicit `--base` if supplied
93
+ 2. repo default branch if discoverable
94
+ 3. otherwise fail explicitly instead of assuming `main`
95
+ - If `gh` fails and no trustworthy base is available, return an error rather than a successful but potentially wrong scope.
96
+
97
+ ### Implementation notes
98
+ - Add a small helper to resolve default branch via git when possible, e.g. from:
99
+ - `refs/remotes/origin/HEAD`
100
+ - or another safe git source
101
+ - Do **not** broaden fallback for numbered/recent PRs.
102
+ - Preserve existing JSON output contract, but add metadata when fallback is used.
103
+
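The base-resolution order in this fix could be sketched as below. This is an illustration under stated assumptions: the function names, the returned `source` field, and the error message are hypothetical, and the caller is assumed to pass in the raw output of `git symbolic-ref refs/remotes/origin/HEAD` (or nothing, if that command failed).

```javascript
// Parse the output of `git symbolic-ref refs/remotes/origin/HEAD`,
// e.g. "refs/remotes/origin/main" -> "main". Returns null if unrecognized.
function parseOriginHead(symbolicRef) {
  const prefix = "refs/remotes/origin/";
  const ref = (symbolicRef || "").trim();
  return ref.startsWith(prefix) && ref.length > prefix.length
    ? ref.slice(prefix.length)
    : null;
}

// Hypothetical resolver: explicit --base first, then the discovered default
// branch, otherwise fail instead of assuming `main`.
function resolveFallbackBase({ explicitBase, originHead }) {
  if (explicitBase) return { base: explicitBase, source: "explicit" };
  const detected = parseOriginHead(originHead);
  if (detected) return { base: detected, source: "origin-head" };
  throw new Error("No trustworthy base branch; pass --base explicitly.");
}
```

Returning the `source` alongside the branch name is one way to supply the fallback metadata mentioned in the implementation notes without changing the rest of the JSON output.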
104
+ ### Files
105
+ - `scripts/pr-scope.cjs`
106
+ - tests in `scripts/tests/pr-scope.test.cjs`
107
+
108
+ ### Test additions
109
+ - `current` with explicit `--base` still falls back correctly
110
+ - `current` with discoverable default branch falls back correctly
111
+ - `current` with no trustworthy base fails explicitly
112
+ - `recent` and numbered PRs still require GitHub metadata
113
+
114
+ ### Risk
115
+ Medium. Scope-selection behavior changes and could affect user workflows, but the change is correctness-oriented and bounded.
116
+
117
+ ---
118
+
119
+ ## BUG-1 — fix strategy ignored by executable fix queue
120
+
121
+ ### Problem
122
+ `fix-strategy.json` is generated, but `buildFixPlan()` still computes eligibility directly from confidence alone. Strategy classes such as `manual-review`, `larger-refactor`, and `architectural-remediation` do not actually gate execution.
123
+
124
+ ### Surgical fix
125
+ - Keep `fix-strategy.json` as the source of truth for execution eligibility.
126
+ - Update the executable queue builder so that a finding or cluster enters the queue only when both conditions hold:
127
+ - its strategy class is `safe-autofix`
128
+ - `autofixEligible === true`
129
+ - Ensure `manual-review`, `larger-refactor`, and `architectural-remediation` never flow into canary/rollout.
130
+ - Preserve current `fix-plan.json` shape as much as possible to minimize downstream breakage.
131
+
132
+ ### Recommended implementation shape
133
+ Option A, lowest risk:
134
+ - Refactor `buildFixPlan()` to accept preclassified entries from `buildFixStrategy()`.
135
+ - Derive eligible/canary/rollout only from strategy entries where `autofixEligible === true`.
136
+
137
+ Also fix cluster-stage ambiguity:
138
+ - Either include `executionStage` in the cluster grouping key, or
139
+ - compute cluster stage conservatively from all entries instead of taking `entries[0]`.
140
+
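A minimal sketch of Option A combined with the first cluster-stage fix (adding `executionStage` to the grouping key). The entry fields `id`, `strategy`, and `directory`, the stage values `"canary"` and `"rollout"`, and the function names are assumptions for illustration; the real shapes in `scripts/run-bug-hunter.cjs` may differ.

```javascript
// Hypothetical strategy entry shape:
// { id, strategy, autofixEligible, directory, executionStage }

// Only entries explicitly classified as safe autofixes may be executed.
function isExecutable(entry) {
  return entry.strategy === "safe-autofix" && entry.autofixEligible === true;
}

// Group executable entries by directory AND stage so mixed-stage entries
// in one directory never collapse into a single cluster.
function buildExecutableQueue(strategyEntries) {
  const clusters = new Map();
  for (const entry of strategyEntries) {
    if (!isExecutable(entry)) continue; // gated out: never reaches canary/rollout
    const key = `${entry.directory}::${entry.executionStage}`;
    if (!clusters.has(key)) {
      clusters.set(key, {
        directory: entry.directory,
        executionStage: entry.executionStage,
        ids: [],
      });
    }
    clusters.get(key).ids.push(entry.id);
  }
  const all = [...clusters.values()];
  return {
    canary: all.filter((c) => c.executionStage === "canary"),
    rollout: all.filter((c) => c.executionStage === "rollout"),
  };
}
```

Confidence does not appear in the gate at all here, which is the point of the fix: a high-confidence `architectural-remediation` finding is skipped regardless of its score.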
141
+ ### Files
142
+ - `scripts/run-bug-hunter.cjs`
143
+ - tests in `scripts/tests/run-bug-hunter.test.cjs`
144
+ - possibly `schemas/fix-strategy.schema.json` only if contract refinement is needed
145
+
146
+ ### Test additions
147
+ - high-confidence `architectural-remediation` finding does not enter `fixPlan.canary/rollout`
148
+ - high-confidence `larger-refactor` finding does not enter executable queue
149
+ - `safe-autofix` findings still enter canary/rollout
150
+ - mixed-stage safe-autofix entries in same directory do not collapse incorrectly
151
+
152
+ ### Risk
153
+ Medium-high. This changes executable orchestration, but still within the intended design and existing artifact model.
154
+
155
+ ---
156
+
157
+ ## Verification plan
158
+
159
+ Run the full suite after each bug fix where practical, and again at the end:
160
+
161
+ ```bash
162
+ node --test scripts/tests/*.test.cjs
163
+ ```
164
+
165
+ Recommended focused sequence during implementation:
166
+
167
+ ```bash
168
+ node --test scripts/tests/fix-lock.test.cjs
169
+ node --test scripts/tests/code-index.test.cjs
170
+ node --test scripts/tests/pr-scope.test.cjs
171
+ node --test scripts/tests/run-bug-hunter.test.cjs
172
+ node --test scripts/tests/*.test.cjs
173
+ ```
174
+
175
+ ## Definition of done
176
+
177
+ - [x] All 4 confirmed bugs have targeted code fixes.
178
+ - [x] Regression tests exist for each bug.
179
+ - [x] Full script test suite passes.
180
+ - [x] No public CLI contract is changed except where necessary to avoid silent wrong behavior.
181
+ - [x] `fix-strategy` becomes behaviorally authoritative for execution gating, not just informational.
182
+
183
+ ## Outcome
184
+
185
+ Implemented and verified on 2026-03-12.
186
+
187
+ Fresh verification evidence:
188
+
189
+ ```bash
190
+ node --test scripts/tests/*.test.cjs
191
+ ```
192
+
193
+ Result: 44/44 tests passing.
@@ -0,0 +1,59 @@
1
+ # Enterprise Security Pack End-to-End Integration Plan
2
+
3
+ ## Objective
4
+
5
+ Make Bug Hunter's bundled local security skills fully end-to-end connected, portable, and enterprise-grade.
6
+
7
+ The bundled local skills already exist under `skills/`, but the main Bug Hunter orchestration flow does not yet actively route into them. This plan closes that gap by wiring together the main `SKILL.md`, documentation, tests, and evals, so that the companion skills are invoked by the product's normal workflow rather than only shipped alongside it.
8
+
9
+ ## Target outcomes
10
+
11
+ 1. Main Bug Hunter flow explicitly routes into bundled local security skills when relevant.
12
+ 2. Security entrypoints are easy to invoke and enterprise-friendly.
13
+ 3. Docs, tests, and evals all reflect the integrated flow.
14
+ 4. The repository remains fully portable with no external marketplace dependency.
15
+ 5. After integration, run a focused Bug Hunter audit on the repository, fix any real bugs found, and summarize the net result.
16
+
17
+ ## Integration model
18
+
19
+ Bug Hunter remains the top-level orchestrator.
20
+
21
+ Bundled local skills become capability modules:
22
+ - `skills/commit-security-scan/` → diff-scoped PR/commit/staged security review
23
+ - `skills/security-review/` → full security workflow (threat model + code + deps + validation)
24
+ - `skills/threat-model-generation/` → authoritative threat model bootstrap/refresh
25
+ - `skills/vulnerability-validation/` → exploitability/reachability/CVSS/PoC validation for security findings
26
+
27
+ The main skill should load these on demand from local paths and keep all artifacts under `.bug-hunter/`.
28
+
29
+ ## Work plan
30
+
31
+ ### Milestone 1 — Main skill routing
32
+ - Add security-oriented flags and aliases to `SKILL.md` / `README.md`
33
+ - Add explicit routing rules for when to read bundled local security skills
34
+ - Make threat model generation explicitly delegate to bundled `threat-model-generation`
35
+ - Make PR security review explicitly delegate to bundled `commit-security-scan`
36
+ - Make severe security validation explicitly delegate to bundled `vulnerability-validation`
37
+ - Make full security audit explicitly delegate to bundled `security-review`
38
+
39
+ ### Milestone 2 — Enterprise UX surface
40
+ - Add enterprise-grade usage examples and a security-pack section in docs
41
+ - Keep behavior portable and artifact-native (`.bug-hunter/*` only)
42
+
43
+ ### Milestone 3 — Guardrails
44
+ - Add regression tests proving the main skill references and exposes the bundled skills
45
+ - Add evals for the new end-to-end security flows
46
+
47
+ ### Milestone 4 — Cross verification and self-audit
48
+ - Run the full script test suite
49
+ - Run a focused Bug Hunter audit on the repository
50
+ - Fix any real bugs uncovered by that audit
51
+ - Summarize all shipped changes briefly
52
+
53
+ ## Definition of done
54
+
55
+ - Main `SKILL.md` actively routes to the bundled local security skills
56
+ - `README.md` documents the integrated security pack as a real workflow, not just a packaged extra
57
+ - Tests and evals cover the integrated paths
58
+ - Full test suite passes
59
+ - Self-audit completes and any confirmed bugs are fixed
@@ -0,0 +1,39 @@
1
+ # Local Security Skills Integration Plan
2
+
3
+ ## Objective
4
+
5
+ Vendor the security-engineer marketplace capabilities into Bug Hunter as local, portable companion skills so the repository is self-contained and does not depend on external machine-specific skill paths.
6
+
7
+ Target local skills:
8
+ - `skills/commit-security-scan/`
9
+ - `skills/security-review/`
10
+ - `skills/threat-model-generation/`
11
+ - `skills/vulnerability-validation/`
12
+
13
+ ## Design
14
+
15
+ Use Bug Hunter as the orchestrator and package the imported capabilities as local skills with Bug Hunter-native artifact paths and schemas.
16
+
17
+ Principles:
18
+ - No references to `.factory/` or external marketplace paths
19
+ - Reuse Bug Hunter-native artifacts under `.bug-hunter/`
20
+ - Keep skill bodies focused on capability/workflow; keep runtime logic in existing prompts/scripts
21
+ - Make the new skills portable by including them in the package `files` list and documenting them in the repo
22
+
23
+ ## Work items
24
+
25
+ 1. Create local skill directories with adapted `SKILL.md` files
26
+ 2. Point all skill outputs/inputs to `.bug-hunter/*` artifacts and existing Bug Hunter concepts
27
+ 3. Add a packaging/regression test to verify the local skills are present and packaged
28
+ 4. Add `skills/` to `package.json` publish files
29
+ 5. Document the bundled companion skills in `README.md`
30
+ 6. Update `CHANGELOG.md`
31
+ 7. Run tests
32
+
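Work items 3 and 4 could be checked by a helper along these lines. It is a sketch under assumptions: the function name `findPackagingProblems` and its injected `exists` callback are hypothetical, and the check only accepts a literal `skills` or `skills/` entry in `files`, not glob patterns.

```javascript
// Skill names are taken from the plan's target list.
const REQUIRED_SKILLS = [
  "commit-security-scan",
  "security-review",
  "threat-model-generation",
  "vulnerability-validation",
];

// `pkg` is the parsed package.json; `exists(relPath)` reports whether a
// path relative to the package root exists. Returns a list of problems.
function findPackagingProblems(pkg, exists) {
  const problems = [];
  const files = Array.isArray(pkg.files) ? pkg.files : [];
  if (!files.some((f) => f === "skills" || f === "skills/")) {
    problems.push("package.json `files` does not include skills/");
  }
  for (const name of REQUIRED_SKILLS) {
    const skillFile = `skills/${name}/SKILL.md`;
    if (!exists(skillFile)) problems.push(`missing ${skillFile}`);
  }
  return problems;
}
```

In a real regression test, `exists` could be `(p) => fs.existsSync(path.join(root, p))` and the test would assert that the returned list is empty.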
33
+ ## Definition of done
34
+
35
+ - `skills/` exists with the four local security skills
36
+ - no vendored skill references point to `.factory/` paths
37
+ - package metadata includes `skills/`
38
+ - tests verify the packaged skills exist
39
+ - docs explain the bundled local security pack