auditor-lambda 0.3.12 → 0.3.14

Files changed (61)
  1. package/README.md +20 -24
  2. package/audit-code-wrapper-lib.mjs +52 -53
  3. package/dist/cli.js +43 -6
  4. package/dist/coverage.js +3 -1
  5. package/dist/extractors/disposition.js +8 -1
  6. package/dist/extractors/graph.d.ts +3 -1
  7. package/dist/extractors/graph.js +1147 -67
  8. package/dist/extractors/graphManifestEdges.d.ts +14 -0
  9. package/dist/extractors/graphManifestEdges.js +1158 -0
  10. package/dist/extractors/graphPathUtils.d.ts +5 -0
  11. package/dist/extractors/graphPathUtils.js +75 -0
  12. package/dist/extractors/pathPatterns.d.ts +1 -0
  13. package/dist/extractors/pathPatterns.js +3 -0
  14. package/dist/io/artifacts.d.ts +10 -1
  15. package/dist/io/artifacts.js +23 -3
  16. package/dist/orchestrator/internalExecutors.d.ts +4 -0
  17. package/dist/orchestrator/internalExecutors.js +35 -6
  18. package/dist/orchestrator/reviewPackets.js +1003 -31
  19. package/dist/orchestrator/syntaxResolutionExecutor.js +34 -0
  20. package/dist/types/externalAnalyzer.d.ts +9 -0
  21. package/dist/types/graph.d.ts +3 -0
  22. package/dist/types/reviewPlanning.d.ts +39 -0
  23. package/docs/contracts.md +215 -0
  24. package/docs/development.md +210 -0
  25. package/docs/handoff.md +204 -0
  26. package/docs/history.md +40 -0
  27. package/docs/operator-guide.md +189 -0
  28. package/docs/product.md +185 -0
  29. package/docs/release.md +131 -0
  30. package/package.json +1 -1
  31. package/schemas/audit_plan_metrics.schema.json +347 -0
  32. package/schemas/external_analyzer_results.schema.json +35 -0
  33. package/schemas/graph_bundle.schema.json +47 -2
  34. package/schemas/review_packets.schema.json +160 -0
  35. package/skills/audit-code/SKILL.md +7 -3
  36. package/skills/audit-code/audit-code.prompt.md +4 -1
  37. package/docs/agent-integrations.md +0 -317
  38. package/docs/agent-roles.md +0 -69
  39. package/docs/architecture.md +0 -90
  40. package/docs/artifacts.md +0 -36
  41. package/docs/bootstrap-install.md +0 -139
  42. package/docs/contract.md +0 -54
  43. package/docs/dispatch-implementation-plan.md +0 -302
  44. package/docs/field-trial-bug-report.md +0 -237
  45. package/docs/github-copilot.md +0 -66
  46. package/docs/model-selection.md +0 -97
  47. package/docs/next-steps.md +0 -202
  48. package/docs/packaging.md +0 -120
  49. package/docs/pipeline.md +0 -152
  50. package/docs/product-direction.md +0 -154
  51. package/docs/production-launch-bar.md +0 -92
  52. package/docs/production-readiness.md +0 -58
  53. package/docs/releasing.md +0 -145
  54. package/docs/remediation-baseline.md +0 -75
  55. package/docs/repo-layout.md +0 -30
  56. package/docs/run-flow.md +0 -56
  57. package/docs/session-config.md +0 -319
  58. package/docs/supervisor.md +0 -100
  59. package/docs/usage.md +0 -215
  60. package/docs/windows-setup.md +0 -146
  61. package/docs/workflow-refactor-brief.md +0 -124
package/docs/field-trial-bug-report.md
@@ -1,237 +0,0 @@
1
- # audit-code Field Trial Bug Report
2
-
3
- **Observed by:** LLM workers (Claude, April 2026)
4
- **Environments tested:** Claude Desktop (claude-code provider), OpenCode (opencode provider)
5
- **Repositories audited:** `Polar-CV-KAN` (~30 source files, ~126 tasks, Claude Desktop); same codebase, OpenCode run
6
- **Report date:** 2026-04-21
7
-
8
- Issues marked **[Both]** were independently observed in both environments. Issues marked **[CD]** or **[OC]** were specific to one environment.
9
-
10
- ---
11
-
12
- ## Critical
13
-
14
- ### F-01 — Orchestrator never transitions to `status: "complete"` [Both]
15
-
16
- **CD observation:** `audit_tasks.json` showed `status: undefined` for all 126 tasks after successful ingestion. The completion gate checks this field, finds it undefined on every task, and permanently blocks. `synthesis_report.json` populated correctly (95 findings, 46 clusters) because synthesis runs on ingestion independently, but `status: "complete"` never fires because the gate only looks at the broken task-status field.
17
-
18
- **OC observation:** Even after all obligations reached `satisfied`, `audit_state.json` remained `status: "active"` and re-triggered planning artifacts. Required direct JSON edit to force `status: "complete"`.
19
-
20
- **Impact:** The documented stop condition ("stop the loop when terminal output shows `status: complete`") never fires in either environment. Workers must either loop indefinitely or make a judgment call to stop. This is the most fundamental failure in the framework.
21
-
22
- **Fix needed:** Audit task status must be written at ingestion time. The completion gate should also fall back on obligation-state truth: if all obligations are `satisfied`, the run is complete regardless of the task-status field.
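
The fallback described in this fix can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: the `tasks` and `obligations` array shapes and the `state` field name are assumptions made for the example.

```javascript
// Hypothetical shapes (assumptions, not the package's schema):
//   tasks       = [{ id, status }]
//   obligations = [{ id, state }]
function isRunComplete(tasks, obligations) {
  // Primary signal: every task carries an explicit "complete" status.
  const allTasksComplete =
    tasks.length > 0 && tasks.every((task) => task.status === "complete");

  // Fallback signal: every obligation is satisfied, even when task status
  // was never written (the undefined-status case reported in F-01).
  const allObligationsSatisfied =
    obligations.length > 0 &&
    obligations.every((obligation) => obligation.state === "satisfied");

  return allTasksComplete || allObligationsSatisfied;
}
```

With this shape, a run whose tasks all have `status: undefined` still completes once every obligation reports `satisfied`, while an empty run is never treated as complete.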
23
-
24
- ---
25
-
26
- ### F-02 — Worker launch failures / silent executor failures [Both]
27
-
28
- **CD observation:** `structure_executor` failed to launch during initial structuring. The failure was reported in JSON output, but the orchestrator continued as if structuring had succeeded. Task quality degraded silently from the start, and it was unclear whether the resulting task plan was the best possible decomposition or a fallback.
29
-
30
- **OC observation:** Every executor failed (`agent`, `result_ingestion_executor`, `planning_executor`). The entire audit had to be performed manually by reading source files, writing findings in the correct format, and directly manipulating artifact files. The provider (`opencode`) was supposed to enable interactive dispatch but never actually dispatched work.
31
-
32
- **Impact (OC):** The framework served as a state tracker only — no automation at all. **Impact (CD):** Silent quality degradation at the task-planning phase with no way to detect it.
33
-
34
- **Fix needed:** Worker launch failures must be surfaced as blocking handoffs, not silently swallowed. The orchestrator must not advance past a failed executor as if it succeeded. Executor failure messages must include enough detail to diagnose the root cause.
35
-
36
- ---
37
-
38
- ## High
39
-
40
- ### F-03 — `--results` ingestion is unreliable [Both]
41
-
42
- **CD observation:** `audit-code --results <file>` threw `TypeError: e.trim is not a function` when evidence fields contained objects rather than plain strings. The CLI exited 0 in some cases, making it ambiguous whether results were partially or fully ingested.
43
-
44
- **OC observation:** Two separate failure modes: (1) the generated task file contained `audit_results_path: "--root"` (the CLI flag was written as the path value) causing an immediate crash; (2) after manually fixing the path, ingestion crashed with `Cannot read properties of undefined (reading 'map')` — the ingestion executor cannot parse the incoming results format.
45
-
46
- **Impact:** The primary submission mechanism is unreliable. Workers resort to custom scripts that bypass the framework and will break if the artifact schema changes.
47
-
48
- **Fix needed:**
49
- - Fix the `audit_results_path: "--root"` generation bug in the task file writer.
50
- - Add schema validation on ingestion that emits field-level errors: `"evidence[2] must be a string, got object"` rather than a bare JS runtime crash.
51
- - The ingestion executor must not exit 0 on partial failure.
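
As an illustration of the field-level errors requested above, a validator might look like the sketch below. It assumes the string form of `evidence[]` (F-06 notes that the expected type is still unreconciled), and the function names are hypothetical rather than part of the CLI.

```javascript
// Describe a value's type in a way that reads well in error messages.
function describeType(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value;
}

// Return a list of field-level error strings for one finding.
// An empty list means the evidence field passed validation.
// Assumption: evidence must be an array of plain strings.
function validateEvidence(finding) {
  const errors = [];
  const evidence = finding.evidence;

  if (!Array.isArray(evidence)) {
    errors.push(`evidence must be an array, got ${describeType(evidence)}`);
    return errors;
  }

  evidence.forEach((entry, index) => {
    if (typeof entry !== "string") {
      errors.push(
        `evidence[${index}] must be a string, got ${describeType(entry)}`
      );
    }
  });

  return errors;
}
```

A caller could refuse to ingest and exit non-zero whenever the returned list is non-empty, which addresses the partial-failure ambiguity as well.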
52
-
53
- ---
54
-
55
- ### F-04 — CLI hangs without output [CD]
56
-
57
- On multiple occasions, `audit-code` or `audit-code --results <file>` produced no output and hung indefinitely. There was no timeout, no error message, and no way to distinguish a genuine hang from a slow operation.
58
-
59
- **Observed pattern:** Hangs were most frequent immediately after large ingestions and at session start during manifest structuring. Suspected cause: Node.js blocking on large JSON serialization or file locking between the orchestrator writing state and the CLI reading it.
60
-
61
- **Fix needed:** Add a timeout (`--timeout <ms>`) and ensure the CLI emits a progress indicator or heartbeat on long operations.
62
-
63
- ---
64
-
65
- ### F-05 — Requeue tasks explosion — 141 tasks from 10 findings [OC]
66
-
67
- After ingesting the first batch of 10 `data_integrity` findings, the orchestrator generated a `requeue_tasks.json` with 141 tasks, more than the original 64 audit tasks. The requeue logic appears to fan out across all `(file_group, lens)` combinations, producing a combinatorial explosion. No guidance is provided on which requeue tasks are actually needed and which are redundant.
68
-
69
- **Fix needed:** Requeue logic must de-duplicate against already-completed coverage. Requeue tasks should only be generated for `(file_group, lens)` pairs not already marked complete in `coverage_matrix.json`.
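
One way to picture the de-duplication this fix asks for is the sketch below. It assumes the completed coverage has already been flattened into a list of `{ file_group, lens }` pairs; how that list is derived from `coverage_matrix.json` is not shown in the report and is left out here. The function names are hypothetical.

```javascript
// Build a stable key for a (file_group, lens) pair.
function pairKey(fileGroup, lens) {
  return JSON.stringify([fileGroup, lens]);
}

// Keep only requeue candidates whose (file_group, lens) pair is neither
// already complete nor already present earlier in the candidate list.
// Assumption: both inputs are arrays of objects with file_group and lens.
function filterRequeueTasks(candidates, completedPairs) {
  const done = new Set(
    completedPairs.map((pair) => pairKey(pair.file_group, pair.lens))
  );
  const seen = new Set();
  const kept = [];

  for (const task of candidates) {
    const key = pairKey(task.file_group, task.lens);
    if (done.has(key) || seen.has(key)) continue;
    seen.add(key);
    kept.push(task);
  }

  return kept;
}
```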
70
-
71
- ---
72
-
73
- ## Medium
74
-
75
- ### F-06 — Evidence schema undocumented; validation only at ingestion [Both]
76
-
77
- **CD observation:** `evidence[]` must be an array of plain strings. This constraint is documented in `current-prompt.md` but not validated until `--results` ingestion, where failure produces a cryptic JS runtime error with no field-level attribution.
78
-
79
- **OC observation:** `evidence` must be an array of objects (`[{excerpt, line_reference}]`), not a single object. Discovered only through the error `"(finding.evidence ?? []) is not iterable"`.
80
-
81
- **Note — schema discrepancy:** The two environments reported conflicting evidence types (strings vs. objects). This is itself a documentation or versioning problem — the expected format is not the same across runs, or the prompt changed between environments.
82
-
83
- **Fix needed:**
84
- - Publish a JSON Schema file (or inline schema comment in the prompt) that workers can validate against before submission.
85
- - The string format should be explicit: `"src/foo.py:42 — variable overwritten before use"`.
86
- - Reconcile the expected type (string vs object) and pick one; document it prominently.
87
-
88
- ---
89
-
90
- ### F-07 — `synthesis_current` obligation permanently shows "missing" [Both]
91
-
92
- **CD observation:** Even after `synthesis_report.json` was fully populated (95 findings, 46 clusters, 22 work blocks), the obligation tracker showed `synthesis_current: missing` because it was blocked by `audit_tasks_completed` (itself blocked by the undefined-status bug, F-01). The worker cannot distinguish "synthesis truly hasn't run" from "synthesis ran but the gate is broken."
93
-
94
- **OC observation:** The obligation was never going to be satisfied by the framework — the synthesis agent never ran. Required manually creating `synthesis_report.json` and forcing the obligation to `satisfied`.
95
-
96
- **Fix needed:** Decouple `synthesis_current` satisfaction from `audit_tasks_completed`. If `synthesis_report.json` exists and is non-empty, `synthesis_current` should be satisfied regardless of upstream gate state.
97
-
98
- ---
99
-
100
- ### F-08 — "All remaining N tasks low priority" — no guidance on what to do [CD]
101
-
102
- At one point the orchestrator indicated that all remaining 110 tasks were low priority. The directive does not define what the worker should do: submit empty findings (which is what was done), review at reduced depth, or skip entirely. Submitting `findings: []` for 60+ tasks in rapid succession was the only way to unblock the orchestrator, but legitimate low-severity findings in those files were never written.
103
-
104
- **Fix needed:** When the orchestrator decides remaining tasks are low priority, emit an explicit directive — e.g., `"You may submit empty findings for these tasks"` or `"Review at reduced depth"` or `"Skip — the audit has sufficient coverage"`.
105
-
106
- ---
107
-
108
- ### F-09 — Trivially empty files dispatched as full audit tasks [CD]
109
-
110
- The task manifest dispatched audit tasks for empty `__init__.py` files (some containing only a docstring), dotfiles (`.gitignore`, `.gitattributes`), and one-line stub files. Each required a full read→write→ingest round-trip to produce an empty `findings: []` result. For 30 files this added ~30–40 pointless round-trips; at scale this is severe.
111
-
112
- **Fix needed:** Filter files below a minimum token threshold (or with no parseable code constructs) before dispatching tasks. Batch all trivially-empty files into a single no-op task, or skip them entirely.
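
A rough sketch of the pre-dispatch filter suggested here, using a whitespace token count as a crude stand-in for a real tokenizer. The threshold of 20 and the `{ path, content }` file shape are assumptions chosen for illustration only.

```javascript
// Treat a file as trivially empty when it has fewer than minTokens
// whitespace-separated tokens. This is a crude proxy, not a parser.
function isTriviallyEmpty(source, minTokens = 20) {
  const tokens = source.split(/\s+/).filter((token) => token.length > 0);
  return tokens.length < minTokens;
}

// Split files into those worth dispatching and those to skip or batch.
// Assumption: each file is { path, content }.
function partitionFiles(files, minTokens = 20) {
  const dispatch = [];
  const skipped = [];
  for (const file of files) {
    if (isTriviallyEmpty(file.content, minTokens)) {
      skipped.push(file.path);
    } else {
      dispatch.push(file.path);
    }
  }
  return { dispatch, skipped };
}
```

The `skipped` list could then be folded into a single no-op task instead of one round-trip per file.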
113
-
114
- ---
115
-
116
- ### F-10 — No batch task dispatch; one task per CLI invocation [Both]
117
-
118
- **CD observation:** 126 tasks required 252+ CLI invocations plus 126 file reads and writes. The design assumes one task = one LLM call = one CLI round-trip.
119
-
120
- **OC observation:** No bulk ingestion mechanism; wrote a custom `scripts/ingest-results.mjs` that directly mutated `audit_results.jsonl`, `coverage_matrix.json`, and `audit_state.json`.
121
-
122
- **Fix needed:** Support batched dispatch (a `current-tasks.json` with N tasks per run) and a native `audit-code --batch-results <dir>` that processes all result files in a directory. Alternatively, make `agentBatchSize` settable to a meaningful value that workers actually see in their prompt.
123
-
124
- ---
125
-
126
- ### F-11 — Coverage matrix ↔ task_id mapping is opaque [OC]
127
-
128
- The relationship between `audit_tasks.json` task IDs (e.g., `src-lib:correctness`) and `coverage_matrix.json` file entries (e.g., `src/lib/file-utils.ts` with `required_lenses: [correctness, ...]`) is implicit. A task's `file_group` maps to multiple files in the matrix, but there is no explicit mapping table. The mapping must be reverse-engineered by reading both files.
129
-
130
- **Fix needed:** Either include the resolved file list in each task, or provide an `audit-code explain-task <task_id>` subcommand that shows which files and lenses a task covers.
131
-
132
- ---
133
-
134
- ### F-12 — Bash variable substitution breaks `node -e` shell loops [CD]
135
-
136
- When batching remaining tasks with a shell loop that invoked `node -e '...'`, bash expanded `${}` syntax inside the JS string before Node.js received it, producing `bad substitution` errors. The workaround (write a standalone `.mjs` file) is not documented anywhere.
137
-
138
- **Fix needed:** Document the correct pattern for batch submission loops, or provide a native `audit-code --batch-results <dir>` to eliminate the need for shell-level scripting entirely.
139
-
140
- ---
141
-
142
- ### F-13 — Session config discovery and provider switching are error-prone [Both]
143
-
144
- **CD observation:** Every invocation printed `[session-config] no session-config.json found` — 252+ times across the run. The warning appeared even after completing ingestion steps that should have established the session.
145
-
146
- **OC observation:** Required manually creating `session-config.json` with `{"provider": "opencode"}`. Even after doing so, the provider change only altered the error message; actual dispatch still did not work.
147
-
148
- **Fix needed:** Create a default `session-config.json` on first `audit-code` run. Suppress the warning when a session config is genuinely optional. Document the `provider` field and its valid values prominently in the session-config guide.
149
-
150
- ---
151
-
152
- ### F-14 — No documentation on artifact schema or contract [OC]
153
-
154
- The `contract_version: "audit-code/v1alpha1"` header implies a versioned protocol, but there is no schema documentation. The expected formats for JSONL structure, finding shape, task_id conventions, and coverage matrix layout had to be learned entirely by reading existing artifacts and reverse-engineering error messages.
155
-
156
- **Fix needed:** Publish a contract reference document alongside `docs/contract.md` (which exists but may be incomplete) covering: all artifact file schemas, field-level types and constraints, task_id naming convention, and the expected `AuditResult` JSON shape with a worked example.
157
-
158
- ---
159
-
160
- ## Low
161
-
162
- ### F-15 — `worker_results_pending.json` not cleared after ingestion [CD]
163
-
164
- After `audit-code --results <file>`, the staging file is not cleared. Stale results from the previous task are resubmitted if the worker forgets to overwrite the file.
165
-
166
- **Fix needed:** After successful ingestion, delete or rename `worker_results_pending.json` (e.g., to `worker_results_submitted_<timestamp>.json`).
167
-
168
- ---
169
-
170
- ### F-16 — `related_findings` contains only circular self-references [CD]
171
-
172
- In the synthesized `synthesis_report.json`, nearly every finding's `related_findings` array contains only the finding's own ID. This is useless and misleads reviewers into thinking cross-finding relationships were analyzed.
173
-
174
- **Fix needed:** Omit `related_findings` when the synthesis engine cannot identify cross-finding relationships, rather than populating it with a self-reference.
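
The behavior requested in this fix could look something like the following sketch. The `id` field name on a finding is an assumption; only `related_findings` appears in the report.

```javascript
// Remove self-references from related_findings and drop the field
// entirely when nothing else remains.
// Assumption: each finding has an `id` and an optional related_findings array.
function pruneRelatedFindings(finding) {
  const related = (finding.related_findings ?? []).filter(
    (relatedId) => relatedId !== finding.id
  );
  // Copy every field except related_findings.
  const { related_findings: _omitted, ...rest } = finding;
  return related.length > 0 ? { ...rest, related_findings: related } : rest;
}
```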
175
-
176
- ---
177
-
178
- ### F-17 — Runtime validation evidence shows "pending" for every finding [CD]
179
-
180
- All 95 findings contain entries like `"runtime:unit:src-modules: pending — No runtime evidence recorded yet"`. These appear verbatim across every finding, bloating each one with 2–3 lines of noise that convey nothing.
181
-
182
- **Fix needed:** Omit pending evidence entries from the output entirely. A finding with no runtime evidence should have no runtime evidence in its array — not a placeholder repeated 95 times.
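
A small sketch of how such placeholders might be filtered out before output. The pattern is inferred from the single example string quoted above, so it is an assumption about the entry format rather than a documented contract.

```javascript
// Matches placeholder entries shaped like
// "runtime:<scope>: pending ..." (format inferred from one example).
const PENDING_RUNTIME_ENTRY = /^runtime:\S+: pending\b/;

// Return the evidence list without pending runtime placeholders.
function dropPendingRuntimeEvidence(evidence) {
  return evidence.filter((entry) => !PENDING_RUNTIME_ENTRY.test(entry));
}
```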
183
-
184
- ---
185
-
186
- ### F-18 — `root_cause_clusters` are file co-location groups, not semantic clusters [CD]
187
-
188
- The 46 clusters are named `"correctness/correctness in src/modules"` — file-path groups with a lens label, not semantic root causes. One cluster for "correctness in src/modules" contains 5 findings with 3 entirely different root causes (division-by-zero, cudagraph violation, dead code).
189
-
190
- **Fix needed:** Root-cause clustering should be semantic (e.g., "All NaN paths from missing eps guards"). Either improve the clustering algorithm or rename the section to `file_clusters` to accurately describe what it contains.
191
-
192
- ---
193
-
194
- ### F-19 — `work_blocks` section omitted from final summary presentation [CD]
195
-
196
- The audit directive says findings should be organized into "non-overlapping blocks of tasks." `synthesis_report.json` correctly generates 22 `work_blocks`. The final summary presentation omitted this section entirely, showing only a flat findings table. Work blocks are arguably the most actionable output and should lead the summary.
197
-
198
- **Fix needed:** Make the `work_blocks` presentation requirement explicit and prominent in the final-output section of the prompt.
199
-
200
- ---
201
-
202
- ### F-20 — `reviewed_ranges` field is unenforceable and creates false confidence [CD]
203
-
204
- Workers can declare `reviewed_ranges: [{start: 1, end: 10}]` while writing findings about line 800, or declare the full file range without actually reading it. For a 966-line file this makes the field meaningless.
205
-
206
- **Fix needed:** Either remove `reviewed_ranges` (it creates false confidence) or enforce it mechanically by requiring a content hash or line-count to be included alongside the range declaration.
207
-
208
- ---
209
-
210
- ## Summary Table
211
-
212
- | ID | Issue | Sev | Env | Type |
213
- |----|-------|-----|-----|------|
214
- | F-01 | Orchestrator never reaches `status: "complete"` — task statuses undefined | Critical | Both | Bug |
215
- | F-02 | Worker launch failures / silent executor failures | Critical | Both | Bug |
216
- | F-03 | `--results` ingestion unreliable (type errors, wrong path, `.map()` crash) | High | Both | Bug |
217
- | F-04 | CLI hangs without output on some invocations | High | CD | Bug |
218
- | F-05 | Requeue tasks explosion — 141 tasks from 10 findings | High | OC | Bug |
219
- | F-06 | Evidence schema undocumented; validated only at ingestion (conflicting types across envs) | Medium | Both | DX |
220
- | F-07 | `synthesis_current` permanently "missing" even when report is populated | Medium | Both | Bug |
221
- | F-08 | "All remaining N tasks low priority" — no guidance on worker action | Medium | CD | UX |
222
- | F-09 | Trivially empty files dispatched as full audit tasks | Medium | CD | Design |
223
- | F-10 | No batch task dispatch; one task = one CLI invocation = one round-trip | Medium | Both | Design |
224
- | F-11 | Coverage matrix ↔ task_id mapping is opaque; no lookup table | Medium | OC | DX |
225
- | F-12 | Bash variable substitution breaks `node -e` shell loops | Medium | CD | DX |
226
- | F-13 | Session config discovery / provider switching error-prone; noisy warnings | Medium | Both | UX |
227
- | F-14 | No documentation on artifact schema or contract | Medium | OC | DX |
228
- | F-15 | `worker_results_pending.json` not cleared after ingestion | Low | CD | Bug |
229
- | F-16 | `related_findings` circular self-references | Low | CD | Data |
230
- | F-17 | Runtime validation "pending" entries in all 95 findings | Low | CD | Data |
231
- | F-18 | `root_cause_clusters` are file co-location groups, not semantic | Low | CD | Design |
232
- | F-19 | `work_blocks` omitted from final summary presentation | Low | CD | UX |
233
- | F-20 | `reviewed_ranges` unenforceable; creates false confidence | Low | CD | Design |
234
-
235
- **Env key:** CD = Claude Desktop (claude-code provider), OC = OpenCode (opencode provider), Both = independently observed in both.
236
-
237
- **Type key:** Bug = incorrect behavior, DX = developer/worker experience, UX = output/presentation, Design = intentional design that needs reconsideration, Data = output data quality.
package/docs/github-copilot.md
@@ -1,66 +0,0 @@
1
- # GitHub Copilot setup
2
-
3
- This is one repo-local host integration for the `/audit-code` conversation surface.
4
-
5
- It is now a narrower host-specific path. The preferred user flow is a global
6
- package install plus the prompt's self-bootstrap step; `audit-code install`
7
- remains the explicit repair or force-refresh path.
8
-
9
- ## Recommended path
10
-
11
- Install once:
12
-
13
- ```bash
14
- npm install -g auditor-lambda
15
- ```
16
-
17
- Then invoke `/audit-code` in a supported chat surface. The prompt runs:
18
-
19
- ```bash
20
- audit-code ensure --quiet
21
- ```
22
-
23
- If Copilot has not discovered the workspace prompt/MCP files yet, run this from
24
- the target repository root:
25
-
26
- ```bash
27
- audit-code install
28
- ```
29
-
30
- That writes the canonical `/audit-code` prompt and compatibility instructions into:
31
-
32
- ```text
33
- .github/prompts/audit-code.prompt.md
34
- .github/copilot-instructions.md
35
- ```
36
-
37
- After that, open GitHub Copilot Chat in the same repository and invoke `/audit-code`.
38
-
39
- If you only want the narrower VS Code / Copilot install surface, use:
40
-
41
- ```bash
42
- audit-code install --host vscode
43
- ```
44
-
45
- ## Behavior
46
-
47
- - `audit-code ensure` creates or refreshes the same files only when missing or stale
48
- - `audit-code install` force-refreshes the canonical prompt payload from `skills/audit-code/audit-code.prompt.md`
49
- - the generated prompt file explicitly sets `agent: auditor` so Copilot Chat uses the generated auditor custom agent
50
- - the installer upserts its managed compatibility block into `.github/copilot-instructions.md` instead of clobbering unrelated instructions
51
- - it prints machine-readable JSON describing the installed targets
52
-
53
- ## Explicit root override
54
-
55
- If you are running from outside the target repository:
56
-
57
- ```bash
58
- audit-code install --root /path/to/repo
59
- ```
60
-
61
- ## Why this exists
62
-
63
- The goal is to reduce conversation setup friction without repositioning the backend CLI as the primary product surface.
64
-
65
- GitHub Copilot now shares the same repo-native self-bootstrap path as the other currently automated hosts.
66
- The older `audit-code install-host --host copilot` alias still exists for compatibility, but it is no longer the recommended setup flow.
package/docs/model-selection.md
@@ -1,97 +0,0 @@
1
- # Model selection
2
-
3
- This repository has two distinct model-selection layers.
4
-
5
- ## 1. Skill-first product rule
6
-
7
- The canonical product surface is `/audit-code` in conversation.
8
-
9
- For that surface, the default model rule is:
10
-
11
- - use the active conversation model by default
12
- - avoid forcing the user to supply a model in normal usage
13
-
14
- That is the intended product contract.
15
-
16
- When packet dispatch is prepared, `dispatch-plan.json` includes
17
- provider-neutral complexity metadata and a `model_hint.tier` value:
18
-
19
- - `small` for tiny, low-priority packets without sensitive lenses or risk tags
20
- - `standard` for ordinary bounded review packets
21
- - `deep` for high-priority, large, critical-flow, or external-signal packets
22
-
23
- Hosts that support per-subagent model choice may map those tiers to their own
24
- available models. Hosts that do not support model choice can ignore the fields.
25
- The backend still does not prescribe concrete model names.
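
On the host side, mapping a tier to a concrete model might be sketched as below. The model names are placeholders and the `packet` object shape is an assumption; only the `model_hint.tier` field and its three values come from the text above.

```javascript
// Placeholder model names: a real host would substitute its own.
const TIER_TO_MODEL = {
  small: "host-small-model",
  standard: "host-standard-model",
  deep: "host-deep-model",
};

// Resolve a model for one packet, or return the fallback (null by default)
// so the host keeps using its default model when the hint is absent or
// unrecognized.
function resolveModel(packet, tierToModel = TIER_TO_MODEL, fallback = null) {
  const tier = packet?.model_hint?.tier;
  const known =
    typeof tier === "string" &&
    Object.prototype.hasOwnProperty.call(tierToModel, tier);
  return known ? tierToModel[tier] : fallback;
}
```

Returning a fallback instead of throwing keeps the hint optional, in line with the statement that hosts without model choice can ignore these fields.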
26
-
27
- ## 2. Backend provider rule
28
-
29
- When the local backend delegates bounded worker runs into an external provider, model selection becomes provider-specific.
30
-
31
- That means the supervisor should not invent a global model abstraction unless it can be implemented consistently across adapters.
32
-
33
- Today the practical rule is:
34
-
35
- - no provider-specific model override by default
36
- - let each provider use its own default model unless explicitly configured otherwise
37
- - when a model is required, pass it through the provider's own native CLI arguments via `extra_args`
38
-
39
- ## Current provider behavior
40
-
41
- ### `local-subprocess`
42
-
43
- No external LLM provider is selected here.
44
-
45
- The supervisor simply launches the bounded local worker command.
46
-
47
- ### `claude-code`
48
-
49
- The built-in Claude Code adapter supports model overrides through `claude_code.extra_args`.
50
-
51
- Example:
52
-
53
- ```json
54
- {
55
- "provider": "claude-code",
56
- "ui_mode": "visible",
57
- "claude_code": {
58
- "command": "claude",
59
- "extra_args": ["--model", "sonnet"]
60
- }
61
- }
62
- ```
63
-
64
- Use this only when you want to force a specific Claude Code model instead of relying on the user's normal Claude Code default.
65
-
66
- ### `opencode`
67
-
68
- The built-in OpenCode adapter supports model overrides through `opencode.extra_args`.
69
-
70
- Example:
71
-
72
- ```json
73
- {
74
- "provider": "opencode",
75
- "ui_mode": "visible",
76
- "opencode": {
77
- "command": "opencode",
78
- "extra_args": ["--model", "anthropic/claude-sonnet-4.5"]
79
- }
80
- }
81
- ```
82
-
83
- Use this only when you want to force a specific OpenCode provider/model pair instead of relying on the user's normal OpenCode default.
84
-
85
- ### `subprocess-template` and `vscode-task`
86
-
87
- These adapters do not define model semantics themselves.
88
-
89
- If you need model selection here, include it in the external launcher command template or in the external tool being invoked.
90
-
91
- ## Recommendation
92
-
93
- For the cleanest product behavior:
94
-
95
- 1. in conversation, let `/audit-code` inherit the active conversation model
96
- 2. for repo-local backend usage, do not force model selection unless the operator has a concrete reason
97
- 3. when needed, set model selection in the provider-native config rather than inventing another repo-level abstraction
package/docs/next-steps.md
@@ -1,202 +0,0 @@
1
- # Next Implementation Steps
2
-
3
- This document tracks the next meaningful implementation work after the packet
4
- review-dispatch refactor and the current skill-first productionization pass.
5
-
6
- As of April 30, 2026, the shared MCP substrate and the host-native installer pass have landed, but this repository is not yet ready for a public production launch.
7
-
8
- See:
9
-
10
- - `docs/production-readiness.md`
11
-
12
- ## Current state
13
-
14
- The repository now supports:
15
-
16
- - `/audit-code` as the documented canonical product route
17
- - packaged and repository-local access to `skills/audit-code/audit-code.prompt.md`
18
- - `audit-code prompt-path` as a stable prompt lookup helper
19
- - `npm install -g auditor-lambda` as the intended one-time user install
20
- - `audit-code ensure` as an idempotent self-bootstrap helper for current repository host surfaces
21
- - `audit-code install --host vscode|opencode|codex|claude-desktop|antigravity|all` as an explicit repair or force-refresh installer for the current host surfaces
22
- - `audit-code mcp` as a first-class stdio MCP server entrypoint
23
- - a stable MCP contract with the `start_audit`, `get_status`, `continue_audit`, `explain_task`, `validate_artifacts`, `import_results`, and `import_runtime_updates` tools
24
- - repo-local MCP resources for current artifacts, operator handoff, install guidance, and the current audit report
25
- - repo-local MCP prompts for `audit-code`, `review-task`, and `synthesize-report`
26
- - generated `.audit-code/install/manifest.json` plus a shared repo-local MCP launcher script
27
- - Codex install assets including a repo skill bundle, `AGENTS.md` support, MCP setup guidance, and an automation recipe
28
- - Claude Desktop install assets including a project template, remote connector template, and generated local bundle artifacts
29
- - OpenCode install assets including command, skill, prompt, and `opencode.json` support
30
- - VS Code install assets including prompt file, Copilot instructions, custom agent, and `.vscode/mcp.json`
31
- - Antigravity install assets including planning-mode guidance and MCP-oriented setup guidance
32
- - explicit failures for malformed backend config and corrupted artifact JSON
33
- - `audit-code validate` as a machine-readable backend operator check
34
- - an explicit in-repo release gate via `npm run verify:release`
35
- - structured operator handoff output plus `.audit-artifacts/operator-handoff.{json,md}` for blocked fallback runs
36
- - configured provider bridges that can continue audit-task review by writing structured results and handing control back to the bounded worker command
37
- - graph-informed review packets, `review_packets.json`, and `audit_plan_metrics.json`
38
- - compact packet `prepare-dispatch` and `merge-and-ingest` envelopes
39
-
40
- That means the current release is suitable for a controlled alpha or beta skill-first workflow with MCP-aware host bootstrapping, but it is not yet the final public production end-state.
41
-
42
- ## Near-term priorities
43
-
44
- ### 1. Prove packet review dispatch on real repositories
45
-
46
- The highest-priority product follow-through is to validate the packet workflow
47
- outside this repository and compare it to the legacy fan-out baseline.
48
-
49
- Near-term work should focus on:
50
-
51
- - running `/audit-code` against at least one nontrivial external repository
52
- - recording packet count, task count, warning count, and largest-packet estimate
53
- - comparing observed worker count and token/quota behavior against the old
54
- one-task-per-worker model
55
- - tightening packet budgets or warning thresholds if real repositories expose
56
- rough edges
57
-
58
- The current handoff for this work is:
59
-
60
- - `docs/workflow-refactor-brief.md`
61
- - `docs/remediation-baseline.md`
62
-
- ### 2. Verify the shipped host integrations end to end
-
- The biggest remaining gap is not raw feature presence anymore. It is host-by-host proof that the generated assets work in the actual products they target.
-
- Near-term work should focus on:
-
- - verifying the Codex skill bundle, `AGENTS.md`, MCP setup guidance, and automation recipe against the real Codex app flow
- - installing and smoke-testing the generated Claude Desktop `DXT` or bundle output in a real Desktop environment
- - validating that the OpenCode `opencode.json` shape, command file, and MCP config match current OpenCode behavior
- - validating the VS Code prompt, agent, and `.vscode/mcp.json` flow inside a real workspace
- - validating that the Antigravity planning-mode guidance is accurate and does not over-promise a native saved-workflow surface
-
- ### 3. Close the remaining host-native UX gaps
-
- The product goal is still conversational first, not fallback-CLI first, and some shipped surfaces are still guidance-heavy rather than truly native.
-
- Near-term work should focus on:
-
- - turning the Codex automation recipe into a proven, documented automation flow after real operator validation
- - polishing Claude Desktop one-click install so the generated bundle is a clearly supported path instead of a mostly technical artifact
- - deciding whether OpenCode and VS Code need any smaller UX refinements after smoke-testing, rather than assuming the first generated surfaces are final
- - keeping Antigravity framed as a workflow-and-artifacts host until Google documents a stable project-local config surface
-
- ### 4. Polish continuation through assisted review
-
- The repo-local backend fallback still intentionally stops in blocked state under `local-subprocess`, but configured provider bridges can now continue the audit-task review phase automatically.
-
- Near-term work should focus on:
-
- - clearer evidence handoff for imported results and runtime updates
- - less operator guesswork when a configured provider fails to return usable results
- - stronger host-specific guidance for provider-assisted bridges
-
- ### 5. Harden publish and release operations
-
- The packaged install story is in place, but release operations still need finishing work.
-
- Near-term work should focus on:
-
- - npm package-name availability and ownership
- - one-time npm Trusted Publisher setup for `.github/workflows/publish-package.yml`
- - the first GitHub Actions dry run and live publish through that workflow
- - keeping prerelease publication off the `latest` dist-tag unless intentionally requested
- - keeping linked-install and packaged-install smoke checks as release gates
-
- ## Frictionless-ready checklist
-
- The repository should not be described as frictionless and production-ready until this checklist is substantially true:
-
- ### Codex, Claude Desktop, OpenCode, and VS Code
-
- - `audit-code ensure` remains the default self-bootstrap path and `audit-code install` remains the explicit repair path
- - the generated repo-local surfaces are obvious in installer output and in `.audit-code/install/GETTING-STARTED.md`
- - a new user can invoke `/audit-code` without guessing where prompts or commands were written
- - the generated MCP setup path works in the real host, not only in unit tests
- - smoke coverage continues to verify the exact repo-local files these hosts consume
-
- ### Antigravity
-
- - the planning-mode guidance is explicit, repo-local, and easy to discover
- - `.audit-code/install/GETTING-STARTED.md` gives Antigravity-specific steps instead of generic prompt-import advice
- - docs avoid implying native repo-local saved-workflow support that is not actually shipped
- - the backend fallback remains a clearly secondary path instead of the default recommendation
-
- ### Assisted review continuation
-
- - configured interactive providers can continue blocked audits through audit-task review in the same wrapper invocation
- - operator handoff artifacts remain explicit and inspectable even when continuation is smoother
-
- ### Release operations
-
- - `npm run verify:release` remains green and authoritative
- - the real publish path is proven with npm ownership, npm Trusted Publishing, and a real GitHub Actions dry run or prerelease publish
-
- ## Probable next steps
-
- These are the most likely next implementation steps based on the current codebase state, but they should still be treated as provisional rather than guaranteed:
-
- ### 1. Prove the host installers in real products
-
- Status:
-
- - partially completed in code, not yet fully validated operationally
-
- Most likely shape:
-
- - run fresh-repo smoke checks inside Codex, Claude Desktop, OpenCode, and VS Code, with Antigravity validated against its planning-mode path
- - confirm that the generated files are both syntactically valid and actually discovered by each host
- - tighten generated docs wherever operator confusion appears during those checks
- - keep Antigravity as a documented planning-mode path unless a stable project config contract is published
-
- Practical success bar:
-
- - a new operator can run one install command and reach a working `/audit-code` or MCP-backed flow in each claimed host without guesswork
-
- ### 2. Harden configured interactive-provider continuation
-
- Most likely shape:
-
- - use the existing provider configuration surface in `.audit-artifacts/session-config.json`
- - keep the provider-assisted review handoff less manual when `claude-code`, `opencode`, or another bridge is intentionally configured
- - preserve explicit artifact imports and operator visibility instead of hiding state transitions
- - improve diagnostics and recovery when the provider fails to emit structured results
-
- Practical success bar:
-
- - a configured provider can continue through audit-task review with good diagnostics and low operator guesswork when something goes wrong
-
- ### 3. Finish the Claude Desktop and Antigravity follow-through
-
- Most likely shape:
-
- - prove the generated Claude Desktop local bundle in a real Desktop install flow
- - decide whether to check in or generate the final desktop-extension packaging metadata more explicitly
- - add remote connector deployment guidance that is specific enough for Team or Enterprise rollout
- - document exactly how Antigravity-produced artifacts should flow back through `import_results` and `import_runtime_updates`
-
- Practical success bar:
-
- - Claude Desktop and Antigravity guidance is operational, specific, and consistent with what the products really support
-
- ### 4. Prove the release path outside the repository
-
- Most likely shape:
-
- - confirm npm package-name ownership and npm Trusted Publisher configuration
- - run a real GitHub Actions pre-release or dry-run publish
- - keep `npm run verify:release` as the minimum in-repo gate before publish
-
- Practical success bar:
-
- - the release workflow is demonstrated end to end instead of only being inferred from configuration
-
- ## Non-goals for the next phase
-
- These should not become the primary focus of the next implementation pass:
-
- - repositioning the CLI as a peer product surface
- - expanding low-level backend helpers into a CLI-first user experience
- - making backend implementation details outrank the conversation contract in docs or product decisions