auditor-lambda 0.2.5 → 0.2.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (71)
  1. package/README.md +35 -7
  2. package/audit-code-wrapper-lib.mjs +1612 -331
  3. package/dist/cli.js +397 -38
  4. package/dist/coverage.d.ts +2 -2
  5. package/dist/coverage.js +5 -5
  6. package/dist/extractors/disposition.js +10 -1
  7. package/dist/extractors/flows.js +7 -1
  8. package/dist/extractors/pathPatterns.d.ts +3 -0
  9. package/dist/extractors/pathPatterns.js +15 -0
  10. package/dist/extractors/risk.js +7 -1
  11. package/dist/io/artifacts.d.ts +6 -6
  12. package/dist/io/artifacts.js +14 -17
  13. package/dist/io/json.d.ts +2 -0
  14. package/dist/io/json.js +15 -0
  15. package/dist/io/runArtifacts.d.ts +3 -1
  16. package/dist/io/runArtifacts.js +20 -5
  17. package/dist/mcp/server.d.ts +1 -0
  18. package/dist/mcp/server.js +579 -0
  19. package/dist/orchestrator/advance.js +9 -2
  20. package/dist/orchestrator/dependencyMap.js +9 -13
  21. package/dist/orchestrator/executors.js +7 -2
  22. package/dist/orchestrator/flowRequeue.d.ts +2 -2
  23. package/dist/orchestrator/flowRequeue.js +16 -3
  24. package/dist/orchestrator/internalExecutors.d.ts +2 -1
  25. package/dist/orchestrator/internalExecutors.js +129 -48
  26. package/dist/orchestrator/requeue.js +10 -4
  27. package/dist/orchestrator/requeueCommand.js +15 -2
  28. package/dist/orchestrator/resultIngestion.d.ts +2 -1
  29. package/dist/orchestrator/resultIngestion.js +26 -6
  30. package/dist/orchestrator/runtimeValidation.d.ts +7 -2
  31. package/dist/orchestrator/runtimeValidation.js +61 -49
  32. package/dist/orchestrator/runtimeValidationUpdate.js +2 -4
  33. package/dist/orchestrator/state.js +28 -14
  34. package/dist/orchestrator/taskBuilder.js +4 -2
  35. package/dist/orchestrator/trivialAudit.d.ts +4 -0
  36. package/dist/orchestrator/trivialAudit.js +49 -0
  37. package/dist/prompts/renderWorkerPrompt.js +6 -2
  38. package/dist/providers/spawnLoggedCommand.js +17 -0
  39. package/dist/reporting/mergeFindings.js +3 -11
  40. package/dist/reporting/rootCause.js +92 -9
  41. package/dist/reporting/synthesis.d.ts +25 -22
  42. package/dist/reporting/synthesis.js +92 -59
  43. package/dist/reporting/workBlocks.d.ts +12 -3
  44. package/dist/reporting/workBlocks.js +124 -70
  45. package/dist/supervisor/sessionConfig.js +4 -2
  46. package/dist/types/flows.d.ts +2 -0
  47. package/dist/types/runtimeValidation.d.ts +2 -1
  48. package/dist/types.d.ts +8 -6
  49. package/dist/validation/auditResults.d.ts +5 -2
  50. package/dist/validation/auditResults.js +335 -43
  51. package/docs/agent-integrations.md +38 -29
  52. package/docs/artifacts.md +18 -51
  53. package/docs/bootstrap-install.md +60 -30
  54. package/docs/contract.md +25 -117
  55. package/docs/field-trial-bug-report.md +237 -0
  56. package/docs/next-steps.md +59 -44
  57. package/docs/packaging.md +13 -3
  58. package/docs/production-launch-bar.md +2 -2
  59. package/docs/production-readiness.md +9 -5
  60. package/docs/releasing.md +81 -0
  61. package/docs/session-config.md +20 -1
  62. package/docs/usage.md +22 -0
  63. package/package.json +4 -1
  64. package/schemas/audit_result.schema.json +4 -5
  65. package/schemas/audit_task.schema.json +10 -0
  66. package/schemas/runtime_validation_report.schema.json +1 -1
  67. package/skills/audit-code/SKILL.md +11 -2
  68. package/skills/audit-code/audit-code.prompt.md +11 -10
  69. package/schemas/merged_findings.schema.json +0 -19
  70. package/schemas/root_cause_clusters.schema.json +0 -28
  71. package/schemas/synthesis_report.schema.json +0 -61
@@ -8,65 +8,86 @@ audit-code install
 
  That command installs the repo-local `/audit-code` surfaces we can automate today.
 
- ## What it writes
-
- Installed command surfaces:
+ After bootstrap, run:
 
- - `.github/prompts/audit-code.prompt.md` for VS Code chat prompt files, with `agent: agent`
- - `.opencode/commands/audit-code.md` for OpenCode custom commands, with `agent: build`
- - `.claude/commands/audit-code.md` for Claude Code custom slash commands
+ ```bash
+ audit-code verify-install
+ ```
 
- Installed always-on compatibility surfaces:
+ That smoke-tests the generated host assets plus the shared repo-local MCP launcher without waiting for a full editor walkthrough.
 
- - `.github/copilot-instructions.md`
- - `AGENTS.md`
- - `CLAUDE.md`
+ ## What it writes
 
- Installed repo-local canonical assets:
+ Installed shared surfaces:
 
  - `.audit-code/install/audit-code.import.md`
  - `.audit-code/install/SKILL.md`
  - `.audit-code/install/GETTING-STARTED.md`
+ - `.audit-code/install/manifest.json`
+ - `.audit-code/install/run-mcp-server.mjs`
+
+ Installed host-specific surfaces:
+
+ - Codex:
+   - `.codex/skills/audit-code/*`
+   - `AGENTS.md` managed block when needed
+   - `.audit-code/install/codex/MCP-SETUP.md`
+   - `.audit-code/install/codex/RE-AUDIT-AUTOMATION.md`
+ - Claude Desktop:
+   - `.audit-code/install/claude-desktop/PROJECT-TEMPLATE.md`
+   - `.audit-code/install/claude-desktop/remote-mcp-connector.json`
+   - `.audit-code/install/claude-desktop/auditor-lambda.dxt`
+   - `.audit-code/install/claude-desktop/auditor-lambda.mcpb`
+ - OpenCode:
+   - `.opencode/commands/audit-code.md`
+   - `.opencode/skills/audit-code/*`
+   - `opencode.json`
+   - `AGENTS.md` managed block when needed
+ - VS Code:
+   - `.github/prompts/audit-code.prompt.md`
+   - `.github/copilot-instructions.md`
+   - `.github/agents/auditor.agent.md`
+   - `.vscode/mcp.json`
+ - Antigravity:
+   - `.audit-code/install/antigravity/PLANNING-MODE.md`
+   - `AGENTS.md` managed block when needed
 
  The generated `GETTING-STARTED.md` now includes dedicated quick-start sections for:
 
- - VS Code
- - OpenCode
- - Claude Code
+ - Codex
  - Claude Desktop
+ - OpenCode
+ - VS Code
  - Antigravity
 
- Installed compatibility skill bundles:
-
- - `.opencode/skills/audit-code/*`
- - `.claude/skills/audit-code/*`
- - `.agents/skills/audit-code/*`
-
  ## Goal
 
- After bootstrap, the user should be able to open a supported conversation surface in the repository and invoke:
+ After bootstrap, the user should be able to open a supported host surface in the repository and invoke:
 
  ```text
  /audit-code
  ```
 
- without supplying extra root paths, provider flags, or model-selection arguments.
+ without supplying extra root paths, provider flags, or model-selection arguments, or connect through the shared MCP server when the host prefers tool-driven integration.
 
  ## What is fully automated today
 
- - VS Code and GitHub Copilot repo-local prompt surfaces
- - OpenCode project command surfaces
- - Claude Code project command surfaces
- - tool-agnostic compatibility instruction files for hosts that honor `AGENTS.md` or `CLAUDE.md`
+ - shared installer output, manifest generation, and repo-local MCP launcher generation
+ - Codex skill-bundle and AGENTS-oriented install output
+ - OpenCode command, skill, prompt, and config generation
+ - VS Code prompt, custom-agent, instruction, and MCP config generation
+ - Claude Desktop project-template, remote-connector, and local bundle generation
+ - Antigravity planning-mode guidance generation
 
  ## What is not fully automated today
 
- - Claude Desktop does not currently have a verified project-local slash-command install surface in this repository
- - Antigravity does not currently have a verified repo-local slash-command install surface in this repository
+ - product-level smoke validation for the generated Codex, Claude Desktop, OpenCode, and VS Code assets
+ - one-click proof that the generated Claude Desktop bundle installs cleanly in a real Desktop environment
+ - documented Antigravity artifact round-tripping back through `import_results` and `import_runtime_updates`
 
- For those hosts, the bootstrap command still installs compatibility assets, but the final `/audit-code` discovery behavior remains host-dependent.
+ For those gaps, the bootstrap command now writes the repo-local assets and guidance, but the final operator experience still needs end-to-end host verification.
 
- Use `.audit-code/install/GETTING-STARTED.md` as the low-guess repo-local handoff for those manual prompt-import paths and for the exact VS Code, OpenCode, and Claude Code bootstrap surfaces that were generated.
+ Use `.audit-code/install/GETTING-STARTED.md` as the low-guess repo-local handoff, and treat `.audit-code/install/manifest.json` as the machine-readable source of truth for what was generated.
 
  ## Narrow compatibility alias
 
@@ -77,3 +98,12 @@ audit-code install-host --host copilot
  ```
 
  Use it only when you intentionally want the smaller Copilot-only install path instead of the default bootstrap.
+
+ ## Remaining steps
+
+ The installer foundation is now in place. The remaining work is:
+
+ 1. smoke-test each claimed host in the real product, not only via file-generation tests
+ 2. tighten `GETTING-STARTED.md` and host-specific setup docs where those smoke tests show friction
+ 3. prove the Claude Desktop local bundle install path operationally
+ 4. document Antigravity artifact-import workflows more concretely
package/docs/contract.md CHANGED
@@ -1,71 +1,12 @@
- # audit-code response contract
+ # audit-code Response Contract
 
- This document describes the backend fallback JSON response contract for the `audit-code` wrapper.
+ This document follows [audit-goals.md](C:/Code/auditor-lambda/spec/audit-goals.md).
 
- The canonical product remains `/audit-code` in conversation.
+ ## Canonical output
 
- ## Backend fallback commands
+ The authoritative completed-audit output is repo-root `audit-report.md`.
 
- Repo-local fallback command from the target repository root:
-
- ```bash
- audit-code
- ```
-
- Installed helper for locating the packaged conversation prompt asset:
-
- ```bash
- audit-code prompt-path
- ```
-
- Installed helper for validating the current backend artifact bundle:
-
- ```bash
- audit-code validate
- ```
-
- Repository-local wrapper equivalent:
-
- ```bash
- node audit-code.mjs
- ```
-
- ## Contract version
-
- Every canonical wrapper response includes:
-
- ```text
- contract_version: audit-code/v1alpha1
- ```
-
- Consumers should verify this value before assuming the response shape.
-
- ## Source of truth
-
- The versioned JSON schema is:
-
- ```text
- schemas/audit-code-v1alpha1.schema.json
- ```
-
- Product tests validate live wrapper output against that schema.
-
- ## Reproducible installed-command smoke check
-
- From the repository root:
-
- ```bash
- npm install
- npm run build
- npm link
- npm run smoke:linked-audit-code
- ```
-
- This exercises the installed backend fallback command end-to-end and validates the emitted JSON against the versioned schema.
-
- ## Top-level fields
-
- The current v1alpha1 contract includes these top-level fields:
+ Until completion, the wrapper response remains a JSON envelope with:
 
  - `contract_version`
  - `audit_state`
@@ -77,64 +18,31 @@ The current v1alpha1 contract includes these top-level fields:
  - `next_likely_step`
  - `handoff`
 
- `handoff` is a companion operator-context object. It includes:
-
- - current top-level status
- - repo and artifacts paths
- - pending obligations
- - suggested evidence-import paths and commands when manual continuation is required
- - stable paths to companion handoff files under `.audit-artifacts`
-
- ## Terminal states
-
- Consumers should continue invoking the same wrapper until:
-
- 1. `next_likely_step == null`
-
- Terminal interpretation:
-
- - `audit_state.status == "complete"` means the audit finished end to end.
- - `audit_state.status == "blocked"` means no further automatic progress is available and the remaining work needs imported results or an interactive provider.
- - `progress_made` tells you whether the current invocation wrote additional artifacts before it reached that terminal state.
-
- When the wrapper emits a response, it also refreshes:
-
- - `.audit-artifacts/operator-handoff.json`
- - `.audit-artifacts/operator-handoff.md`
-
- Those files mirror the structured `handoff` guidance in machine-readable and human-readable forms.
-
- ## Audit state shape
-
- `audit_state` includes:
-
- - `status`
- - `obligations`
- - optional `last_executor`
- - optional `last_obligation`
- - optional `blockers`
+ ## AuditResult contract
 
- `status` is one of:
+ Workers submit `AuditResult[]` shaped by `schemas/audit_result.schema.json`.
 
- - `not_started`
- - `active`
- - `blocked`
- - `complete`
+ Important rules:
 
- Each obligation includes:
+ - `file_coverage` is required and must include every assigned file.
+ - `file_coverage[].total_lines` must match the current file line count.
+ - `findings[].affected_files` must be objects, not strings.
+ - `findings[].evidence` must be an array of plain strings.
 
- - `id`
- - `state`
- - optional `reason`
+ Use `audit-code validate-results --results <file>` before ingestion to validate
+ results against the active task manifest.
 
- Obligation `state` is one of:
+ ## Internal artifacts during incomplete runs
 
- - `missing`
- - `present`
- - `stale`
- - `blocked`
- - `satisfied`
+ The engine may keep resumable artifacts under `.audit-artifacts/`, including:
 
- ## Compatibility note
+ - intake/structure/planning artifacts
+ - `audit_tasks.json`
+ - `audit_results.jsonl`
+ - `requeue_tasks.json`
+ - `runtime_validation_tasks.json`
+ - `runtime_validation_report.json` when runtime validation is planned
+ - dispatch files for the active worker task
 
- This contract is versioned as `v1alpha1` deliberately. It is stable enough for current product use, but it should still be treated as pre-v1.
+ These artifacts are internal and transient. On successful completion, they are
+ cleared out and only `audit-report.md` remains.
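
Reviewer note (not part of the package diff): the `file_coverage` and `affected_files` rules added to `contract.md` above could be checked before submission roughly as follows. This is only a sketch. The function name `checkAuditResult`, the `path` field on coverage entries, and the `assignedFiles` / `lineCounts` inputs are assumptions for illustration; the authoritative shape is whatever `schemas/audit_result.schema.json` defines.

```javascript
// Hypothetical pre-submission check for a single AuditResult.
// Assumptions (not confirmed by the diff): coverage entries carry a `path`
// field; `assignedFiles` and `lineCounts` are derived from the task manifest.
function checkAuditResult(result, assignedFiles, lineCounts) {
  const errors = [];

  if (!Array.isArray(result.file_coverage)) {
    errors.push("file_coverage is required and must be an array");
  } else {
    const covered = new Map(result.file_coverage.map((entry) => [entry.path, entry]));
    for (const file of assignedFiles) {
      const entry = covered.get(file);
      if (!entry) {
        // Rule: file_coverage must include every assigned file.
        errors.push(`file_coverage is missing assigned file ${file}`);
      } else if (entry.total_lines !== lineCounts[file]) {
        // Rule: total_lines must match the current file line count.
        errors.push(
          `file_coverage total_lines for ${file} is ${entry.total_lines}, expected ${lineCounts[file]}`
        );
      }
    }
  }

  const findings = Array.isArray(result.findings) ? result.findings : [];
  findings.forEach((finding, i) => {
    const affected = Array.isArray(finding.affected_files) ? finding.affected_files : [];
    affected.forEach((item, j) => {
      // Rule: affected_files entries must be objects, not strings.
      if (typeof item !== "object" || item === null) {
        errors.push(`findings[${i}].affected_files[${j}] must be an object`);
      }
    });
  });

  return errors;
}
```

An empty return value means the result passed these particular checks; it does not replace `audit-code validate-results`.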
@@ -0,0 +1,237 @@
+ # audit-code Field Trial Bug Report
+
+ **Observed by:** LLM workers (Claude, April 2026)
+ **Environments tested:** Claude Desktop (claude-code provider), OpenCode (opencode provider)
+ **Repositories audited:** `Polar-CV-KAN` (~30 source files, ~126 tasks, Claude Desktop); same codebase, OpenCode run
+ **Report date:** 2026-04-21
+
+ Issues marked **[Both]** were independently observed in both environments. Issues marked **[CD]** or **[OC]** were specific to one environment.
+
+ ---
+
+ ## Critical
+
+ ### F-01 — Orchestrator never transitions to `status: "complete"` [Both]
+
+ **CD observation:** `audit_tasks.json` showed `status: undefined` for all 126 tasks after successful ingestion. The completion gate checks this field, finds it undefined on every task, and permanently blocks. `synthesis_report.json` populated correctly (95 findings, 46 clusters) because synthesis runs on ingestion independently, but `status: "complete"` never fires because the gate only looks at the broken task-status field.
+
+ **OC observation:** Even after all obligations reached `satisfied`, `audit_state.json` remained `status: "active"` and re-triggered planning artifacts. Required direct JSON edit to force `status: "complete"`.
+
+ **Impact:** The documented stop condition ("stop the loop when terminal output shows `status: complete`") never fires in either environment. Workers must either loop indefinitely or make a judgment call to stop. This is the most fundamental failure in the framework.
+
+ **Fix needed:** Audit task status must be written at ingestion time. The completion gate should also fall back on obligation-state truth: if all obligations are `satisfied`, the run is complete regardless of the task-status field.
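
Reviewer note (not part of the package diff): the fallback described in the F-01 fix might look like the sketch below. `resolveRunStatus` is a hypothetical name, and the task status value `"complete"` is assumed; only the obligation state `satisfied` and the run statuses `active` / `complete` appear in the report itself.

```javascript
// Hypothetical completion gate with an obligation-state fallback.
// Assumption: a finished task carries status "complete"; the report only
// says the field was undefined, not what its valid values are.
function resolveRunStatus(tasks, obligations) {
  const allTasksDone =
    tasks.length > 0 && tasks.every((task) => task.status === "complete");

  // Fallback: trust obligation state even when task statuses were never written.
  const allObligationsSatisfied =
    obligations.length > 0 &&
    obligations.every((obligation) => obligation.state === "satisfied");

  return allTasksDone || allObligationsSatisfied ? "complete" : "active";
}
```

With this shape, the run described in the CD observation (every task status undefined, every obligation satisfied) would resolve to `complete` instead of blocking.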
+
+ ---
+
+ ### F-02 — Worker launch failures / silent executor failures [Both]
+
+ **CD observation:** `structure_executor` failed to launch during initial structuring. The failure was reported in JSON output but the orchestrator continued as if structuring had succeeded. Task quality degraded silently from the start — unclear whether the resulting task plan was the best possible decomposition or a fallback.
+
+ **OC observation:** Every executor failed (`agent`, `result_ingestion_executor`, `planning_executor`). The entire audit had to be performed manually by reading source files, writing findings in the correct format, and directly manipulating artifact files. The provider (`opencode`) was supposed to enable interactive dispatch but never actually dispatched work.
+
+ **Impact (OC):** The framework served as a state tracker only — no automation at all. **Impact (CD):** Silent quality degradation at the task-planning phase with no way to detect it.
+
+ **Fix needed:** Worker launch failures must be surfaced as blocking handoffs, not silently swallowed. The orchestrator must not advance past a failed executor as if it succeeded. Executor failure messages must include enough detail to diagnose the root cause.
+
+ ---
+
+ ## High
+
+ ### F-03 — `--results` ingestion is unreliable [Both]
+
+ **CD observation:** `audit-code --results <file>` threw `TypeError: e.trim is not a function` when evidence fields contained objects rather than plain strings. The CLI exited 0 in some cases, making it ambiguous whether results were partially or fully ingested.
+
+ **OC observation:** Two separate failure modes: (1) the generated task file contained `audit_results_path: "--root"` (the CLI flag was written as the path value) causing an immediate crash; (2) after manually fixing the path, ingestion crashed with `Cannot read properties of undefined (reading 'map')` — the ingestion executor cannot parse the incoming results format.
+
+ **Impact:** The primary submission mechanism is unreliable. Workers resort to custom scripts that bypass the framework and will break if the artifact schema changes.
+
+ **Fix needed:**
+ - Fix the `audit_results_path: "--root"` generation bug in the task file writer.
+ - Add schema validation on ingestion that emits field-level errors: `"evidence[2] must be a string, got object"` rather than a bare JS runtime crash.
+ - The ingestion executor must not exit 0 on partial failure.
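
Reviewer note (not part of the package diff): a field-level evidence check in the spirit of the F-03 fix could be sketched as below. `validateEvidence` and `describeType` are illustrative names, not functions known to exist in the package; the error wording follows the example message quoted in the report.

```javascript
// Report a friendlier type name than `typeof` gives for null and arrays.
function describeType(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value;
}

// Hypothetical validator: returns one message per offending evidence entry
// instead of letting a later `.trim()` call crash on a non-string.
function validateEvidence(evidence) {
  if (!Array.isArray(evidence)) {
    return [`evidence must be an array, got ${describeType(evidence)}`];
  }
  const errors = [];
  evidence.forEach((item, index) => {
    if (typeof item !== "string") {
      errors.push(`evidence[${index}] must be a string, got ${describeType(item)}`);
    }
  });
  return errors;
}
```

A caller that treats a non-empty result as a failure and exits non-zero would also address the "must not exit 0 on partial failure" point.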
+
+ ---
+
+ ### F-04 — CLI hangs without output [CD]
+
+ On multiple occasions, `audit-code` or `audit-code --results <file>` produced no output and hung indefinitely. No timeout, no error message, no way to distinguish a genuine hang from a slow operation.
+
+ **Observed pattern:** Hangs were most frequent immediately after large ingestions and at session start during manifest structuring. Suspected cause: Node.js blocking on large JSON serialization or file locking between the orchestrator writing state and the CLI reading it.
+
+ **Fix needed:** Add a timeout (`--timeout <ms>`) and ensure the CLI emits a progress indicator or heartbeat on long operations.
+
+ ---
+
+ ### F-05 — Requeue tasks explosion — 141 tasks from 10 findings [OC]
+
+ After ingesting the first batch of 10 `data_integrity` findings, the orchestrator generated a `requeue_tasks.json` with 141 tasks — more than the original 64 audit tasks. The requeue logic appears to fan out across all `(lens, file_group)` combinations, producing a combinatorial explosion. No guidance is provided on which requeue tasks are actually needed vs. redundant.
+
+ **Fix needed:** Requeue logic must de-duplicate against already-completed coverage. Requeue tasks should only be generated for `(file_group, lens)` pairs not already marked complete in `coverage_matrix.json`.
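
Reviewer note (not part of the package diff): the de-duplication asked for in F-05 amounts to a set difference over `(file_group, lens)` pairs. In the sketch below, `filterRequeueTasks` is a hypothetical name, and the flat `{ file_group, lens }` objects are an assumed simplification; the real layout of `coverage_matrix.json` is not shown in the report.

```javascript
// Hypothetical requeue filter: keep only (file_group, lens) pairs that are
// neither already complete nor already queued earlier in the same batch.
function filterRequeueTasks(candidates, completedPairs) {
  const keyOf = (pair) => JSON.stringify([pair.file_group, pair.lens]);
  const done = new Set(completedPairs.map(keyOf));
  const seen = new Set();

  return candidates.filter((task) => {
    const key = keyOf(task);
    if (done.has(key) || seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```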
+
+ ---
+
+ ## Medium
+
+ ### F-06 — Evidence schema undocumented; validation only at ingestion [Both]
+
+ **CD observation:** `evidence[]` must be an array of plain strings. This constraint is documented in `current-prompt.md` but not validated until `--results` ingestion, where failure produces a cryptic JS runtime error with no field-level attribution.
+
+ **OC observation:** `evidence` must be an array of objects (`[{excerpt, line_reference}]`), not a single object. Discovered only through the error `"(finding.evidence ?? []) is not iterable"`.
+
+ **Note — schema discrepancy:** The two environments reported conflicting evidence types (strings vs. objects). This is itself a documentation or versioning problem — the expected format is not the same across runs, or the prompt changed between environments.
+
+ **Fix needed:**
+ - Publish a JSON Schema file (or inline schema comment in the prompt) that workers can validate against before submission.
+ - The string format should be explicit: `"src/foo.py:42 — variable overwritten before use"`.
+ - Reconcile the expected type (string vs object) and pick one; document it prominently.
+
+ ---
+
+ ### F-07 — `synthesis_current` obligation permanently shows "missing" [Both]
+
+ **CD observation:** Even after `synthesis_report.json` was fully populated (95 findings, 46 clusters, 22 work blocks), the obligation tracker showed `synthesis_current: missing` because it was blocked by `audit_tasks_completed` (itself blocked by the undefined-status bug, F-01). The worker cannot distinguish "synthesis truly hasn't run" from "synthesis ran but the gate is broken."
+
+ **OC observation:** The obligation was never going to be satisfied by the framework — the synthesis agent never ran. Required manually creating `synthesis_report.json` and forcing the obligation to `satisfied`.
+
+ **Fix needed:** Decouple `synthesis_current` satisfaction from `audit_tasks_completed`. If `synthesis_report.json` exists and is non-empty, `synthesis_current` should be satisfied regardless of upstream gate state.
+
+ ---
+
+ ### F-08 — "All remaining N tasks low priority" — no guidance on what to do [CD]
+
+ At a certain point the orchestrator indicated all remaining 110 tasks were low priority. The directive does not define what the worker should do: submit empty findings (what was done), review at reduced depth, or skip entirely. Submitting `findings: []` for 60+ tasks in rapid succession was the only way to unblock the orchestrator, but legitimate low-severity findings in those files were never written.
+
+ **Fix needed:** When the orchestrator decides remaining tasks are low priority, emit an explicit directive — e.g., `"You may submit empty findings for these tasks"` or `"Review at reduced depth"` or `"Skip — the audit has sufficient coverage"`.
+
+ ---
+
+ ### F-09 — Trivially empty files dispatched as full audit tasks [CD]
+
+ The task manifest dispatched audit tasks for empty `__init__.py` files (some containing only a docstring), dotfiles (`.gitignore`, `.gitattributes`), and one-line stub files. Each required a full read→write→ingest round-trip to produce an empty `findings: []` result. For 30 files this added ~30–40 pointless round-trips; at scale this is severe.
+
+ **Fix needed:** Filter files below a minimum token threshold (or with no parseable code constructs) before dispatching tasks. Batch all trivially-empty files into a single no-op task, or skip them entirely.
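
Reviewer note (not part of the package diff): the token-threshold filter suggested in F-09 could be approximated as below. The names `isTrivialFile` and `partitionFiles`, the whitespace-based token count, and the default threshold of 5 are all assumptions chosen for illustration; a real implementation would likely use a language-aware tokenizer.

```javascript
// Hypothetical triviality check: count whitespace-separated tokens.
// The threshold of 5 is an arbitrary illustrative default, not a tuned value.
function isTrivialFile(source, minTokens = 5) {
  const tokens = source.split(/\s+/).filter(Boolean);
  return tokens.length < minTokens;
}

// Split files into those worth a full audit task and those that can be
// batched into a single no-op task (or skipped).
function partitionFiles(files, minTokens = 5) {
  const audit = [];
  const trivial = [];
  for (const file of files) {
    if (isTrivialFile(file.source, minTokens)) {
      trivial.push(file.path);
    } else {
      audit.push(file.path);
    }
  }
  return { audit, trivial };
}
```

Under these assumptions an empty `__init__.py`, or one holding only a short docstring, lands in `trivial`, while any file with a few statements lands in `audit`.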
+
+ ---
+
+ ### F-10 — No batch task dispatch; one task per CLI invocation [Both]
+
+ **CD observation:** 126 tasks required 252+ CLI invocations plus 126 file reads and writes. The design assumes one task = one LLM call = one CLI round-trip.
+
+ **OC observation:** No bulk ingestion mechanism; wrote a custom `scripts/ingest-results.mjs` that directly mutated `audit_results.jsonl`, `coverage_matrix.json`, and `audit_state.json`.
+
+ **Fix needed:** Support batched dispatch (a `current-tasks.json` with N tasks per run) and a native `audit-code --batch-results <dir>` that processes all result files in a directory. Alternatively, make `agentBatchSize` settable to a meaningful value that workers actually see in their prompt.
+
+ ---
+
+ ### F-11 — Coverage matrix ↔ task_id mapping is opaque [OC]
+
+ The relationship between `audit_tasks.json` task IDs (e.g., `src-lib:correctness`) and `coverage_matrix.json` file entries (e.g., `src/lib/file-utils.ts` with `required_lenses: [correctness, ...]`) is implicit. A task's `file_group` maps to multiple files in the matrix, but there is no explicit mapping table. The mapping must be reverse-engineered by reading both files.
+
+ **Fix needed:** Either include the resolved file list in each task, or provide an `audit-code explain-task <task_id>` subcommand that shows which files and lenses a task covers.
+
+ ---
+
+ ### F-12 — Bash variable substitution breaks `node -e` shell loops [CD]
+
+ When batching remaining tasks with a shell loop that invoked `node -e '...'`, bash expanded `${}` syntax inside the JS string before Node.js received it, producing `bad substitution` errors. The workaround (write a standalone `.mjs` file) is not documented anywhere.
+
+ **Fix needed:** Document the correct pattern for batch submission loops, or provide a native `audit-code --batch-results <dir>` to eliminate the need for shell-level scripting entirely.
+
+ ---
+
+ ### F-13 — Session config discovery and provider switching are error-prone [Both]
+
+ **CD observation:** Every invocation printed `[session-config] no session-config.json found` — 252+ times across the run. The warning appeared even after completing ingestion steps that should have established the session.
+
+ **OC observation:** Required manually creating `session-config.json` with `{"provider": "opencode"}`. Even after doing so, the provider change only altered the error message; actual dispatch still did not work.
+
+ **Fix needed:** Create a default `session-config.json` on first `audit-code` run. Suppress the warning when a session config is genuinely optional. Document the `provider` field and its valid values prominently in the session-config guide.
+
+ ---
+
+ ### F-14 — No documentation on artifact schema or contract [OC]
+
+ The `contract_version: "audit-code/v1alpha1"` header implies a versioned protocol, but there is no schema documentation. The expected formats for JSONL structure, finding shape, task_id conventions, and coverage matrix layout had to be learned entirely by reading existing artifacts and reverse-engineering error messages.
+
+ **Fix needed:** Publish a contract reference document alongside `docs/contract.md` (which exists but may be incomplete) covering: all artifact file schemas, field-level types and constraints, task_id naming convention, and the expected `AuditResult` JSON shape with a worked example.
+
+ ---
+
+ ## Low
+
+ ### F-15 — `worker_results_pending.json` not cleared after ingestion [CD]
+
+ After `audit-code --results <file>`, the staging file is not cleared. Stale results from the previous task are resubmitted if the worker forgets to overwrite the file.
+
+ **Fix needed:** After successful ingestion, delete or rename `worker_results_pending.json` (e.g., to `worker_results_submitted_<timestamp>.json`).
+
+ ---
+
+ ### F-16 — `related_findings` contains only circular self-references [CD]
+
+ In the synthesized `synthesis_report.json`, nearly every finding's `related_findings` array contains only the finding's own ID. This is useless and misleads reviewers into thinking cross-finding relationships were analyzed.
+
+ **Fix needed:** Omit `related_findings` when the synthesis engine cannot identify cross-finding relationships, rather than populating it with a self-reference.
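
Reviewer note (not part of the package diff): dropping self-references as F-16 suggests is a small post-processing step. In this sketch `pruneRelatedFindings` is a hypothetical name, and the finding's identifier is assumed to live in an `id` field.

```javascript
// Hypothetical cleanup: remove the finding's own ID from related_findings
// and omit the field entirely when nothing else remains.
// Assumption: each finding exposes its identifier as `id`.
function pruneRelatedFindings(finding) {
  const { related_findings: related = [], ...rest } = finding;
  const others = related.filter((relatedId) => relatedId !== finding.id);
  return others.length > 0 ? { ...rest, related_findings: others } : rest;
}
```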
+
+ ---
+
+ ### F-17 — Runtime validation evidence shows "pending" for every finding [CD]
+
+ All 95 findings contain entries like `"runtime:unit:src-modules: pending — No runtime evidence recorded yet"`. These appear verbatim across every finding, bloating each one with 2–3 lines of noise that convey nothing.
+
+ **Fix needed:** Omit pending evidence entries from the output entirely. A finding with no runtime evidence should have no runtime evidence in its array — not a placeholder repeated 95 times.
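
Reviewer note (not part of the package diff): filtering the placeholder entries described in F-17 could be done with a pattern match on the evidence strings. The function name `dropPendingRuntimeEvidence` and the regular expression are assumptions inferred from the single example string quoted in the report; other placeholder formats, if any exist, would not be caught.

```javascript
// Matches strings shaped like "runtime:<kind>:<group>: pending ...".
// Inferred from the one example in the report; treat as an assumption.
const PENDING_RUNTIME_ENTRY = /^runtime:.*:\s*pending\b/;

// Hypothetical cleanup: keep every evidence entry except pending placeholders.
function dropPendingRuntimeEvidence(evidence) {
  return evidence.filter(
    (entry) => !(typeof entry === "string" && PENDING_RUNTIME_ENTRY.test(entry))
  );
}
```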
+
+ ---
+
+ ### F-18 — `root_cause_clusters` are file co-location groups, not semantic clusters [CD]
+
+ The 46 clusters are named `"correctness/correctness in src/modules"` — file-path groups with a lens label, not semantic root causes. One cluster for "correctness in src/modules" contains 5 findings with 3 entirely different root causes (division-by-zero, cudagraph violation, dead code).
+
+ **Fix needed:** Root-cause clustering should be semantic (e.g., "All NaN paths from missing eps guards"). Either improve the clustering algorithm or rename the section to `file_clusters` to accurately describe what it contains.
+
+ ---
+
+ ### F-19 — `work_blocks` section omitted from final summary presentation [CD]
+
+ The audit directive says findings should be organized into "non-overlapping blocks of tasks." `synthesis_report.json` correctly generates 22 `work_blocks`. The final summary presentation omitted this section entirely, showing only a flat findings table. Work blocks are arguably the most actionable output and should lead the summary.
+
+ **Fix needed:** Make the `work_blocks` presentation requirement explicit and prominent in the final-output section of the prompt.
+
+ ---
+
+ ### F-20 — `reviewed_ranges` field is unenforceable and creates false confidence [CD]
+
+ Workers can declare `reviewed_ranges: [{start: 1, end: 10}]` while writing findings about line 800, or declare the full file range without actually reading it. For a 966-line file this makes the field meaningless.
+
+ **Fix needed:** Either remove `reviewed_ranges` (it creates false confidence) or enforce it mechanically by requiring a content hash or line-count to be included alongside the range declaration.
+
+ ---
+
+ ## Summary Table
+
+ | ID | Issue | Sev | Env | Type |
+ |----|-------|-----|-----|------|
+ | F-01 | Orchestrator never reaches `status: "complete"` — task statuses undefined | Critical | Both | Bug |
+ | F-02 | Worker launch failures / silent executor failures | Critical | Both | Bug |
+ | F-03 | `--results` ingestion unreliable (type errors, wrong path, `.map()` crash) | High | Both | Bug |
+ | F-04 | CLI hangs without output on some invocations | High | CD | Bug |
+ | F-05 | Requeue tasks explosion — 141 tasks from 10 findings | High | OC | Bug |
+ | F-06 | Evidence schema undocumented; validated only at ingestion (conflicting types across envs) | Medium | Both | DX |
+ | F-07 | `synthesis_current` permanently "missing" even when report is populated | Medium | Both | Bug |
+ | F-08 | "All remaining N tasks low priority" — no guidance on worker action | Medium | CD | UX |
+ | F-09 | Trivially empty files dispatched as full audit tasks | Medium | CD | Design |
+ | F-10 | No batch task dispatch; one task = one CLI invocation = one round-trip | Medium | Both | Design |
+ | F-11 | Coverage matrix ↔ task_id mapping is opaque; no lookup table | Medium | OC | DX |
+ | F-12 | Bash variable substitution breaks `node -e` shell loops | Medium | CD | DX |
+ | F-13 | Session config discovery / provider switching error-prone; noisy warnings | Medium | Both | UX |
+ | F-14 | No documentation on artifact schema or contract | Medium | OC | DX |
+ | F-15 | `worker_results_pending.json` not cleared after ingestion | Low | CD | Bug |
+ | F-16 | `related_findings` circular self-references | Low | CD | Data |
+ | F-17 | Runtime validation "pending" entries in all 95 findings | Low | CD | Data |
+ | F-18 | `root_cause_clusters` are file co-location groups, not semantic | Low | CD | Design |
+ | F-19 | `work_blocks` omitted from final summary presentation | Low | CD | UX |
+ | F-20 | `reviewed_ranges` unenforceable; creates false confidence | Low | CD | Design |
+
+ **Env key:** CD = Claude Desktop (claude-code provider), OC = OpenCode (opencode provider), Both = independently observed in both.
+
+ **Type key:** Bug = incorrect behavior, DX = developer/worker experience, UX = output/presentation, Design = intentional design that needs reconsideration, Data = output data quality.