auditor-lambda 0.3.12 → 0.3.14
- package/README.md +20 -24
- package/audit-code-wrapper-lib.mjs +52 -53
- package/dist/cli.js +43 -6
- package/dist/coverage.js +3 -1
- package/dist/extractors/disposition.js +8 -1
- package/dist/extractors/graph.d.ts +3 -1
- package/dist/extractors/graph.js +1147 -67
- package/dist/extractors/graphManifestEdges.d.ts +14 -0
- package/dist/extractors/graphManifestEdges.js +1158 -0
- package/dist/extractors/graphPathUtils.d.ts +5 -0
- package/dist/extractors/graphPathUtils.js +75 -0
- package/dist/extractors/pathPatterns.d.ts +1 -0
- package/dist/extractors/pathPatterns.js +3 -0
- package/dist/io/artifacts.d.ts +10 -1
- package/dist/io/artifacts.js +23 -3
- package/dist/orchestrator/internalExecutors.d.ts +4 -0
- package/dist/orchestrator/internalExecutors.js +35 -6
- package/dist/orchestrator/reviewPackets.js +1003 -31
- package/dist/orchestrator/syntaxResolutionExecutor.js +34 -0
- package/dist/types/externalAnalyzer.d.ts +9 -0
- package/dist/types/graph.d.ts +3 -0
- package/dist/types/reviewPlanning.d.ts +39 -0
- package/docs/contracts.md +215 -0
- package/docs/development.md +210 -0
- package/docs/handoff.md +204 -0
- package/docs/history.md +40 -0
- package/docs/operator-guide.md +189 -0
- package/docs/product.md +185 -0
- package/docs/release.md +131 -0
- package/package.json +1 -1
- package/schemas/audit_plan_metrics.schema.json +347 -0
- package/schemas/external_analyzer_results.schema.json +35 -0
- package/schemas/graph_bundle.schema.json +47 -2
- package/schemas/review_packets.schema.json +160 -0
- package/skills/audit-code/SKILL.md +7 -3
- package/skills/audit-code/audit-code.prompt.md +4 -1
- package/docs/agent-integrations.md +0 -317
- package/docs/agent-roles.md +0 -69
- package/docs/architecture.md +0 -90
- package/docs/artifacts.md +0 -36
- package/docs/bootstrap-install.md +0 -139
- package/docs/contract.md +0 -54
- package/docs/dispatch-implementation-plan.md +0 -302
- package/docs/field-trial-bug-report.md +0 -237
- package/docs/github-copilot.md +0 -66
- package/docs/model-selection.md +0 -97
- package/docs/next-steps.md +0 -202
- package/docs/packaging.md +0 -120
- package/docs/pipeline.md +0 -152
- package/docs/product-direction.md +0 -154
- package/docs/production-launch-bar.md +0 -92
- package/docs/production-readiness.md +0 -58
- package/docs/releasing.md +0 -145
- package/docs/remediation-baseline.md +0 -75
- package/docs/repo-layout.md +0 -30
- package/docs/run-flow.md +0 -56
- package/docs/session-config.md +0 -319
- package/docs/supervisor.md +0 -100
- package/docs/usage.md +0 -215
- package/docs/windows-setup.md +0 -146
- package/docs/workflow-refactor-brief.md +0 -124
|
@@ -1,237 +0,0 @@
|
|
|
1
|
-
# audit-code Field Trial Bug Report
|
|
2
|
-
|
|
3
|
-
**Observed by:** LLM workers (Claude, April 2026)
|
|
4
|
-
**Environments tested:** Claude Desktop (claude-code provider), OpenCode (opencode provider)
|
|
5
|
-
**Repositories audited:** `Polar-CV-KAN` (~30 source files, ~126 tasks, Claude Desktop); same codebase, OpenCode run
|
|
6
|
-
**Report date:** 2026-04-21
|
|
7
|
-
|
|
8
|
-
Issues marked **[Both]** were independently observed in both environments. Issues marked **[CD]** or **[OC]** were specific to one environment.
|
|
9
|
-
|
|
10
|
-
---
|
|
11
|
-
|
|
12
|
-
## Critical
|
|
13
|
-
|
|
14
|
-
### F-01 — Orchestrator never transitions to `status: "complete"` [Both]
|
|
15
|
-
|
|
16
|
-
**CD observation:** `audit_tasks.json` showed `status: undefined` for all 126 tasks after successful ingestion. The completion gate checks this field, finds it undefined on every task, and permanently blocks. `synthesis_report.json` populated correctly (95 findings, 46 clusters) because synthesis runs on ingestion independently, but `status: "complete"` never fires because the gate only looks at the broken task-status field.
|
|
17
|
-
|
|
18
|
-
**OC observation:** Even after all obligations reached `satisfied`, `audit_state.json` remained `status: "active"` and re-triggered planning artifacts. Required direct JSON edit to force `status: "complete"`.
|
|
19
|
-
|
|
20
|
-
**Impact:** The documented stop condition ("stop the loop when terminal output shows `status: complete`") never fires in either environment. Workers must either loop indefinitely or make a judgment call to stop. This is the most fundamental failure in the framework.
|
|
21
|
-
|
|
22
|
-
**Fix needed:** Audit task status must be written at ingestion time. The completion gate should also fall back on obligation-state truth: if all obligations are `satisfied`, the run is complete regardless of the task-status field.
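The fallback gate described in this fix could look roughly like the sketch below. It assumes hypothetical shapes for the two inputs (`tasks` as `[{ id, status }]`, `obligations` as `[{ id, state }]`); the report does not show the real artifact layout, so the field names are illustrative only.

```javascript
// Hypothetical shapes, not confirmed by the report:
//   tasks       = [{ id, status }]
//   obligations = [{ id, state }]
function isRunComplete(tasks, obligations) {
  // Primary signal: every task carries an explicit "complete" status.
  const allTasksDone =
    tasks.length > 0 && tasks.every((t) => t.status === "complete");
  // Fallback: obligation state is treated as the source of truth, so an
  // undefined task status can no longer block completion on its own.
  const allObligationsSatisfied =
    obligations.length > 0 &&
    obligations.every((o) => o.state === "satisfied");
  return allTasksDone || allObligationsSatisfied;
}
```

With this shape, the F-01 situation (all task statuses undefined, all obligations satisfied) would evaluate to complete instead of blocking forever.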
|
|
23
|
-
|
|
24
|
-
---
|
|
25
|
-
|
|
26
|
-
### F-02 — Worker launch failures / silent executor failures [Both]
|
|
27
|
-
|
|
28
|
-
**CD observation:** `structure_executor` failed to launch during initial structuring. The failure was reported in JSON output, but the orchestrator continued as if structuring had succeeded. Task quality may have degraded silently from the start: it was unclear whether the resulting task plan was the best possible decomposition or a fallback.
|
|
29
|
-
|
|
30
|
-
**OC observation:** Every executor failed (`agent`, `result_ingestion_executor`, `planning_executor`). The entire audit had to be performed manually by reading source files, writing findings in the correct format, and directly manipulating artifact files. The provider (`opencode`) was supposed to enable interactive dispatch but never actually dispatched work.
|
|
31
|
-
|
|
32
|
-
**Impact (OC):** The framework served as a state tracker only — no automation at all. **Impact (CD):** Silent quality degradation at the task-planning phase with no way to detect it.
|
|
33
|
-
|
|
34
|
-
**Fix needed:** Worker launch failures must be surfaced as blocking handoffs, not silently swallowed. The orchestrator must not advance past a failed executor as if it succeeded. Executor failure messages must include enough detail to diagnose the root cause.
|
|
35
|
-
|
|
36
|
-
---
|
|
37
|
-
|
|
38
|
-
## High
|
|
39
|
-
|
|
40
|
-
### F-03 — `--results` ingestion is unreliable [Both]
|
|
41
|
-
|
|
42
|
-
**CD observation:** `audit-code --results <file>` threw `TypeError: e.trim is not a function` when evidence fields contained objects rather than plain strings. The CLI exited 0 in some cases, making it ambiguous whether results were partially or fully ingested.
|
|
43
|
-
|
|
44
|
-
**OC observation:** Two separate failure modes: (1) the generated task file contained `audit_results_path: "--root"` (the CLI flag was written as the path value) causing an immediate crash; (2) after manually fixing the path, ingestion crashed with `Cannot read properties of undefined (reading 'map')` — the ingestion executor cannot parse the incoming results format.
|
|
45
|
-
|
|
46
|
-
**Impact:** The primary submission mechanism is unreliable. Workers resort to custom scripts that bypass the framework and will break if the artifact schema changes.
|
|
47
|
-
|
|
48
|
-
**Fix needed:**
|
|
49
|
-
- Fix the `audit_results_path: "--root"` generation bug in the task file writer.
|
|
50
|
-
- Add schema validation on ingestion that emits field-level errors: `"evidence[2] must be a string, got object"` rather than a bare JS runtime crash.
|
|
51
|
-
- The ingestion executor must not exit 0 on partial failure.
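As a rough illustration of the field-level validation requested above, the sketch below checks the `evidence` array before ingestion and returns readable errors instead of crashing. It assumes the string-array form of `evidence`; `validateEvidence` is a hypothetical helper name, not part of the package.

```javascript
// Returns a list of field-level error strings; an empty list means valid.
// Assumption: `evidence` is expected to be an array of plain strings.
function validateEvidence(finding) {
  const errors = [];
  const evidence = finding.evidence;
  if (!Array.isArray(evidence)) {
    const got = evidence === null ? "null" : typeof evidence;
    errors.push(`evidence must be an array, got ${got}`);
    return errors;
  }
  evidence.forEach((item, i) => {
    if (typeof item !== "string") {
      // typeof reports arrays and null as "object", so name them explicitly.
      const got =
        item === null ? "null" : Array.isArray(item) ? "array" : typeof item;
      errors.push(`evidence[${i}] must be a string, got ${got}`);
    }
  });
  return errors;
}
```

A caller could set a non-zero exit code whenever the returned list is non-empty, which would also address the "exits 0 on partial failure" problem.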
|
|
52
|
-
|
|
53
|
-
---
|
|
54
|
-
|
|
55
|
-
### F-04 — CLI hangs without output [CD]
|
|
56
|
-
|
|
57
|
-
On multiple occasions, `audit-code` or `audit-code --results <file>` produced no output and hung indefinitely. No timeout, no error message, no way to distinguish a genuine hang from a slow operation.
|
|
58
|
-
|
|
59
|
-
**Observed pattern:** Hangs were most frequent immediately after large ingestions and at session start during manifest structuring. Suspected cause: Node.js blocking on large JSON serialization or file locking between the orchestrator writing state and the CLI reading it.
|
|
60
|
-
|
|
61
|
-
**Fix needed:** Add a timeout (`--timeout <ms>`) and ensure the CLI emits a progress indicator or heartbeat on long operations.
|
|
62
|
-
|
|
63
|
-
---
|
|
64
|
-
|
|
65
|
-
### F-05 — Requeue tasks explosion — 141 tasks from 10 findings [OC]
|
|
66
|
-
|
|
67
|
-
After ingesting the first batch of 10 `data_integrity` findings, the orchestrator generated a `requeue_tasks.json` with 141 tasks — more than the original 64 audit tasks. The requeue logic appears to fan out across all `(lens, file_group)` combinations, producing a combinatorial explosion. No guidance is provided on which requeue tasks are actually needed vs. redundant.
|
|
68
|
-
|
|
69
|
-
**Fix needed:** Requeue logic must de-duplicate against already-completed coverage. Requeue tasks should only be generated for `(file_group, lens)` pairs not already marked complete in `coverage_matrix.json`.
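A minimal sketch of that de-duplication, assuming both the requeue candidates and the completed coverage can be reduced to `{ file_group, lens }` pairs. How those pairs are derived from `coverage_matrix.json` is not shown in this report, so the input shapes are hypothetical.

```javascript
// Hypothetical shapes, not confirmed by the report:
//   requeueTasks   = [{ file_group, lens, ... }]
//   completedPairs = [{ file_group, lens }] derived from coverage_matrix.json
function dedupeRequeue(requeueTasks, completedPairs) {
  // A NUL separator keeps "a" + "b:c" distinct from "a:b" + "c".
  const key = (t) => `${t.file_group}\u0000${t.lens}`;
  const done = new Set(completedPairs.map(key));
  const seen = new Set();
  return requeueTasks.filter((t) => {
    const k = key(t);
    // Drop pairs that are already covered or already queued.
    if (done.has(k) || seen.has(k)) return false;
    seen.add(k);
    return true;
  });
}
```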
|
|
70
|
-
|
|
71
|
-
---
|
|
72
|
-
|
|
73
|
-
## Medium
|
|
74
|
-
|
|
75
|
-
### F-06 — Evidence schema undocumented; validation only at ingestion [Both]
|
|
76
|
-
|
|
77
|
-
**CD observation:** `evidence[]` must be an array of plain strings. This constraint is documented in `current-prompt.md` but not validated until `--results` ingestion, where failure produces a cryptic JS runtime error with no field-level attribution.
|
|
78
|
-
|
|
79
|
-
**OC observation:** `evidence` must be an array of objects (`[{excerpt, line_reference}]`), not a single object. Discovered only through the error `"(finding.evidence ?? []) is not iterable"`.
|
|
80
|
-
|
|
81
|
-
**Note — schema discrepancy:** The two environments reported conflicting evidence types (strings vs. objects). This is itself a documentation or versioning problem — the expected format is not the same across runs, or the prompt changed between environments.
|
|
82
|
-
|
|
83
|
-
**Fix needed:**
|
|
84
|
-
- Publish a JSON Schema file (or inline schema comment in the prompt) that workers can validate against before submission.
|
|
85
|
-
- The string format should be explicit: `"src/foo.py:42 — variable overwritten before use"`.
|
|
86
|
-
- Reconcile the expected type (string vs object) and pick one; document it prominently.
|
|
87
|
-
|
|
88
|
-
---
|
|
89
|
-
|
|
90
|
-
### F-07 — `synthesis_current` obligation permanently shows "missing" [Both]
|
|
91
|
-
|
|
92
|
-
**CD observation:** Even after `synthesis_report.json` was fully populated (95 findings, 46 clusters, 22 work blocks), the obligation tracker showed `synthesis_current: missing` because it was blocked by `audit_tasks_completed` (itself blocked by the undefined-status bug, F-01). The worker cannot distinguish "synthesis truly hasn't run" from "synthesis ran but the gate is broken."
|
|
93
|
-
|
|
94
|
-
**OC observation:** The obligation was never going to be satisfied by the framework — the synthesis agent never ran. Required manually creating `synthesis_report.json` and forcing the obligation to `satisfied`.
|
|
95
|
-
|
|
96
|
-
**Fix needed:** Decouple `synthesis_current` satisfaction from `audit_tasks_completed`. If `synthesis_report.json` exists and is non-empty, `synthesis_current` should be satisfied regardless of upstream gate state.
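One way to read this fix is that the obligation state is computed from the report alone. The sketch below assumes the parsed `synthesis_report.json` has a `findings` array; that field name is an assumption, and "non-empty" is interpreted here as "has at least one finding".

```javascript
// `report` is the parsed content of synthesis_report.json, or null when the
// file is absent. The `findings` field name is an assumption.
function synthesisObligationState(report) {
  const hasContent =
    report !== null &&
    typeof report === "object" &&
    Array.isArray(report.findings) &&
    report.findings.length > 0;
  // Deliberately ignores audit_tasks_completed and any other upstream gate.
  return hasContent ? "satisfied" : "missing";
}
```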
|
|
97
|
-
|
|
98
|
-
---
|
|
99
|
-
|
|
100
|
-
### F-08 — "All remaining N tasks low priority" — no guidance on what to do [CD]
|
|
101
|
-
|
|
102
|
-
At a certain point the orchestrator indicated all remaining 110 tasks were low priority. The directive does not define what the worker should do: submit empty findings (what was done), review at reduced depth, or skip entirely. Submitting `findings: []` for 60+ tasks in rapid succession was the only way to unblock the orchestrator, but legitimate low-severity findings in those files were never written.
|
|
103
|
-
|
|
104
|
-
**Fix needed:** When the orchestrator decides remaining tasks are low priority, emit an explicit directive — e.g., `"You may submit empty findings for these tasks"` or `"Review at reduced depth"` or `"Skip — the audit has sufficient coverage"`.
|
|
105
|
-
|
|
106
|
-
---
|
|
107
|
-
|
|
108
|
-
### F-09 — Trivially empty files dispatched as full audit tasks [CD]
|
|
109
|
-
|
|
110
|
-
The task manifest dispatched audit tasks for empty `__init__.py` files (some containing only a docstring), dotfiles (`.gitignore`, `.gitattributes`), and one-line stub files. Each required a full read→write→ingest round-trip to produce an empty `findings: []` result. For 30 files this added ~30–40 pointless round-trips; at scale this is severe.
|
|
111
|
-
|
|
112
|
-
**Fix needed:** Filter files below a minimum token threshold (or with no parseable code constructs) before dispatching tasks. Batch all trivially-empty files into a single no-op task, or skip them entirely.
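The pre-dispatch filter could be approximated with a heuristic like the one below. It is only a sketch: the token threshold, the docstring and comment stripping rules, and the function name `isTriviallyEmpty` are all assumptions, and a real implementation would probably want language-aware parsing.

```javascript
// Heuristic sketch: treat dotfiles and files with almost no code as trivial.
// `minTokens` is an arbitrary illustrative threshold.
function isTriviallyEmpty(filePath, source, minTokens = 5) {
  const base = filePath.split("/").pop();
  if (base.startsWith(".")) return true; // dotfiles such as .gitignore
  const stripped = source
    .replace(/"""[\s\S]*?"""|'''[\s\S]*?'''/g, "") // Python docstrings
    .replace(/^\s*#.*$/gm, ""); // full-line # comments
  const tokens = stripped.split(/\s+/).filter(Boolean);
  return tokens.length < minTokens;
}
```

Files flagged this way could be batched into a single no-op task or skipped, as the fix suggests.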
|
|
113
|
-
|
|
114
|
-
---
|
|
115
|
-
|
|
116
|
-
### F-10 — No batch task dispatch; one task per CLI invocation [Both]
|
|
117
|
-
|
|
118
|
-
**CD observation:** 126 tasks required 252+ CLI invocations plus 126 file reads and writes. The design assumes one task = one LLM call = one CLI round-trip.
|
|
119
|
-
|
|
120
|
-
**OC observation:** No bulk ingestion mechanism; wrote a custom `scripts/ingest-results.mjs` that directly mutated `audit_results.jsonl`, `coverage_matrix.json`, and `audit_state.json`.
|
|
121
|
-
|
|
122
|
-
**Fix needed:** Support batched dispatch (a `current-tasks.json` with N tasks per run) and a native `audit-code --batch-results <dir>` that processes all result files in a directory. Alternatively, make `agentBatchSize` settable to a meaningful value that workers actually see in their prompt.
|
|
123
|
-
|
|
124
|
-
---
|
|
125
|
-
|
|
126
|
-
### F-11 — Coverage matrix ↔ task_id mapping is opaque [OC]
|
|
127
|
-
|
|
128
|
-
The relationship between `audit_tasks.json` task IDs (e.g., `src-lib:correctness`) and `coverage_matrix.json` file entries (e.g., `src/lib/file-utils.ts` with `required_lenses: [correctness, ...]`) is implicit. A task's `file_group` maps to multiple files in the matrix, but there is no explicit mapping table. The mapping must be reverse-engineered by reading both files.
|
|
129
|
-
|
|
130
|
-
**Fix needed:** Either include the resolved file list in each task, or provide an `audit-code explain-task <task_id>` subcommand that shows which files and lenses a task covers.
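To make the idea of a resolved file list concrete, the sketch below derives one from a task ID and a list of coverage entries. It rests on a guess: that a file group such as `src-lib` corresponds to the directory `src/lib` with `/` replaced by `-`. The report only gives one example pair, so this mapping rule, the entry shape `{ path, required_lenses }`, and the `explainTask` name are all hypothetical.

```javascript
// Assumed mapping (unconfirmed): file group "src-lib" <-> directory "src/lib".
// coverageEntries = [{ path, required_lenses: [...] }]
function explainTask(taskId, coverageEntries) {
  const sep = taskId.lastIndexOf(":");
  const fileGroup = taskId.slice(0, sep);
  const lens = taskId.slice(sep + 1);
  const files = coverageEntries
    .filter((entry) => {
      // Directory part of the path, joined with "-" to mimic the group id.
      const dir = entry.path.split("/").slice(0, -1).join("-");
      return dir === fileGroup && entry.required_lenses.includes(lens);
    })
    .map((entry) => entry.path);
  return { task_id: taskId, file_group: fileGroup, lens, files };
}
```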
|
|
131
|
-
|
|
132
|
-
---
|
|
133
|
-
|
|
134
|
-
### F-12 — Bash variable substitution breaks `node -e` shell loops [CD]
|
|
135
|
-
|
|
136
|
-
When batching remaining tasks with a shell loop that invoked `node -e '...'`, bash expanded `${}` syntax inside the JS string before Node.js received it, producing `bad substitution` errors. The workaround (write a standalone `.mjs` file) is not documented anywhere.
|
|
137
|
-
|
|
138
|
-
**Fix needed:** Document the correct pattern for batch submission loops, or provide a native `audit-code --batch-results <dir>` to eliminate the need for shell-level scripting entirely.
|
|
139
|
-
|
|
140
|
-
---
|
|
141
|
-
|
|
142
|
-
### F-13 — Session config discovery and provider switching are error-prone [Both]
|
|
143
|
-
|
|
144
|
-
**CD observation:** Every invocation printed `[session-config] no session-config.json found` — 252+ times across the run. The warning appeared even after completing ingestion steps that should have established the session.
|
|
145
|
-
|
|
146
|
-
**OC observation:** Required manually creating `session-config.json` with `{"provider": "opencode"}`. Even after doing so, the provider change only altered the error message; actual dispatch still did not work.
|
|
147
|
-
|
|
148
|
-
**Fix needed:** Create a default `session-config.json` on first `audit-code` run. Suppress the warning when a session config is genuinely optional. Document the `provider` field and its valid values prominently in the session-config guide.
|
|
149
|
-
|
|
150
|
-
---
|
|
151
|
-
|
|
152
|
-
### F-14 — No documentation on artifact schema or contract [OC]
|
|
153
|
-
|
|
154
|
-
The `contract_version: "audit-code/v1alpha1"` header implies a versioned protocol, but there is no schema documentation. The expected formats for JSONL structure, finding shape, task_id conventions, and coverage matrix layout had to be learned entirely by reading existing artifacts and reverse-engineering error messages.
|
|
155
|
-
|
|
156
|
-
**Fix needed:** Publish a contract reference document alongside `docs/contract.md` (which exists but may be incomplete) covering: all artifact file schemas, field-level types and constraints, task_id naming convention, and the expected `AuditResult` JSON shape with a worked example.
|
|
157
|
-
|
|
158
|
-
---
|
|
159
|
-
|
|
160
|
-
## Low
|
|
161
|
-
|
|
162
|
-
### F-15 — `worker_results_pending.json` not cleared after ingestion [CD]
|
|
163
|
-
|
|
164
|
-
After `audit-code --results <file>`, the staging file is not cleared. Stale results from the previous task are resubmitted if the worker forgets to overwrite the file.
|
|
165
|
-
|
|
166
|
-
**Fix needed:** After successful ingestion, delete or rename `worker_results_pending.json` (e.g., to `worker_results_submitted_<timestamp>.json`).
|
|
167
|
-
|
|
168
|
-
---
|
|
169
|
-
|
|
170
|
-
### F-16 — `related_findings` contains only circular self-references [CD]
|
|
171
|
-
|
|
172
|
-
In the synthesized `synthesis_report.json`, nearly every finding's `related_findings` array contains only the finding's own ID. This is useless and misleads reviewers into thinking cross-finding relationships were analyzed.
|
|
173
|
-
|
|
174
|
-
**Fix needed:** Omit `related_findings` when the synthesis engine cannot identify cross-finding relationships, rather than populating it with a self-reference.
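A small sketch of that cleanup step, assuming each finding has an `id` field next to `related_findings` (the `id` name is an assumption). Self-references are removed, and the field is dropped entirely when nothing else remains.

```javascript
// Removes self-references; omits `related_findings` when it would be empty.
// Assumes a finding shape of { id, related_findings?, ... }.
function cleanRelatedFindings(finding) {
  const related = (finding.related_findings ?? []).filter(
    (id) => id !== finding.id
  );
  const { related_findings: _dropped, ...rest } = finding;
  return related.length > 0 ? { ...rest, related_findings: related } : rest;
}
```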
|
|
175
|
-
|
|
176
|
-
---
|
|
177
|
-
|
|
178
|
-
### F-17 — Runtime validation evidence shows "pending" for every finding [CD]
|
|
179
|
-
|
|
180
|
-
All 95 findings contain entries like `"runtime:unit:src-modules: pending — No runtime evidence recorded yet"`. These appear verbatim across every finding, bloating each one with 2–3 lines of noise that convey nothing.
|
|
181
|
-
|
|
182
|
-
**Fix needed:** Omit pending evidence entries from the output entirely. A finding with no runtime evidence should have no runtime evidence in its array — not a placeholder repeated 95 times.
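Filtering the placeholders could be as simple as the sketch below. It assumes pending entries are strings that start with a `runtime:` key followed by `pending`, which matches the example quoted above but may not cover every variant the tool emits.

```javascript
// Drops placeholder entries of the form "runtime:<key>: pending ...".
// The pattern is inferred from the single example in this report.
function dropPendingRuntimeEvidence(evidence) {
  const pending = /^runtime:\S+: pending\b/;
  return evidence.filter(
    (entry) => !(typeof entry === "string" && pending.test(entry))
  );
}
```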
|
|
183
|
-
|
|
184
|
-
---
|
|
185
|
-
|
|
186
|
-
### F-18 — `root_cause_clusters` are file co-location groups, not semantic clusters [CD]
|
|
187
|
-
|
|
188
|
-
The 46 clusters are named `"correctness/correctness in src/modules"` — file-path groups with a lens label, not semantic root causes. One cluster for "correctness in src/modules" contains 5 findings with 3 entirely different root causes (division-by-zero, cudagraph violation, dead code).
|
|
189
|
-
|
|
190
|
-
**Fix needed:** Root-cause clustering should be semantic (e.g., "All NaN paths from missing eps guards"). Either improve the clustering algorithm or rename the section to `file_clusters` to accurately describe what it contains.
|
|
191
|
-
|
|
192
|
-
---
|
|
193
|
-
|
|
194
|
-
### F-19 — `work_blocks` section omitted from final summary presentation [CD]
|
|
195
|
-
|
|
196
|
-
The audit directive says findings should be organized into "non-overlapping blocks of tasks." `synthesis_report.json` correctly generates 22 `work_blocks`. The final summary presentation omitted this section entirely, showing only a flat findings table. Work blocks are arguably the most actionable output and should lead the summary.
|
|
197
|
-
|
|
198
|
-
**Fix needed:** Make the `work_blocks` presentation requirement explicit and prominent in the final-output section of the prompt.
|
|
199
|
-
|
|
200
|
-
---
|
|
201
|
-
|
|
202
|
-
### F-20 — `reviewed_ranges` field is unenforceable and creates false confidence [CD]
|
|
203
|
-
|
|
204
|
-
Workers can declare `reviewed_ranges: [{start: 1, end: 10}]` while writing findings about line 800, or declare the full file range without actually reading it. For a 966-line file this makes the field meaningless.
|
|
205
|
-
|
|
206
|
-
**Fix needed:** Either remove `reviewed_ranges` (it creates false confidence) or enforce it mechanically by requiring a content hash or line-count to be included alongside the range declaration.
|
|
207
|
-
|
|
208
|
-
---
|
|
209
|
-
|
|
210
|
-
## Summary Table
|
|
211
|
-
|
|
212
|
-
| ID | Issue | Sev | Env | Type |
|----|-------|-----|-----|------|
| F-01 | Orchestrator never reaches `status: "complete"` — task statuses undefined | Critical | Both | Bug |
| F-02 | Worker launch failures / silent executor failures | Critical | Both | Bug |
| F-03 | `--results` ingestion unreliable (type errors, wrong path, `.map()` crash) | High | Both | Bug |
| F-04 | CLI hangs without output on some invocations | High | CD | Bug |
| F-05 | Requeue tasks explosion — 141 tasks from 10 findings | High | OC | Bug |
| F-06 | Evidence schema undocumented; validated only at ingestion (conflicting types across envs) | Medium | Both | DX |
| F-07 | `synthesis_current` permanently "missing" even when report is populated | Medium | Both | Bug |
| F-08 | "All remaining N tasks low priority" — no guidance on worker action | Medium | CD | UX |
| F-09 | Trivially empty files dispatched as full audit tasks | Medium | CD | Design |
| F-10 | No batch task dispatch; one task = one CLI invocation = one round-trip | Medium | Both | Design |
| F-11 | Coverage matrix ↔ task_id mapping is opaque; no lookup table | Medium | OC | DX |
| F-12 | Bash variable substitution breaks `node -e` shell loops | Medium | CD | DX |
| F-13 | Session config discovery / provider switching error-prone; noisy warnings | Medium | Both | UX |
| F-14 | No documentation on artifact schema or contract | Medium | OC | DX |
| F-15 | `worker_results_pending.json` not cleared after ingestion | Low | CD | Bug |
| F-16 | `related_findings` circular self-references | Low | CD | Data |
| F-17 | Runtime validation "pending" entries in all 95 findings | Low | CD | Data |
| F-18 | `root_cause_clusters` are file co-location groups, not semantic | Low | CD | Design |
| F-19 | `work_blocks` omitted from final summary presentation | Low | CD | UX |
| F-20 | `reviewed_ranges` unenforceable; creates false confidence | Low | CD | Design |
|
|
234
|
-
|
|
235
|
-
**Env key:** CD = Claude Desktop (claude-code provider), OC = OpenCode (opencode provider), Both = independently observed in both.
|
|
236
|
-
|
|
237
|
-
**Type key:** Bug = incorrect behavior, DX = developer/worker experience, UX = output/presentation, Design = intentional design that needs reconsideration, Data = output data quality.
|
package/docs/github-copilot.md
DELETED
|
@@ -1,66 +0,0 @@
|
|
|
1
|
-
# GitHub Copilot setup
|
|
2
|
-
|
|
3
|
-
This is one repo-local host integration for the `/audit-code` conversation surface.
|
|
4
|
-
|
|
5
|
-
It is now a narrower host-specific path. The preferred user flow is a global package install plus the prompt's self-bootstrap step; `audit-code install` remains the explicit repair or force-refresh path.
|
|
8
|
-
|
|
9
|
-
## Recommended path
|
|
10
|
-
|
|
11
|
-
Install once:
|
|
12
|
-
|
|
13
|
-
```bash
npm install -g auditor-lambda
```
|
|
16
|
-
|
|
17
|
-
Then invoke `/audit-code` in a supported chat surface. The prompt runs:
|
|
18
|
-
|
|
19
|
-
```bash
audit-code ensure --quiet
```
|
|
22
|
-
|
|
23
|
-
If Copilot has not discovered the workspace prompt/MCP files yet, run this from the target repository root:
|
|
25
|
-
|
|
26
|
-
```bash
audit-code install
```
|
|
29
|
-
|
|
30
|
-
That writes the canonical `/audit-code` prompt and compatibility instructions into:
|
|
31
|
-
|
|
32
|
-
```text
.github/prompts/audit-code.prompt.md
.github/copilot-instructions.md
```
|
|
36
|
-
|
|
37
|
-
After that, open GitHub Copilot Chat in the same repository and invoke `/audit-code`.
|
|
38
|
-
|
|
39
|
-
If you only want the narrower VS Code / Copilot install surface, use:
|
|
40
|
-
|
|
41
|
-
```bash
audit-code install --host vscode
```
|
|
44
|
-
|
|
45
|
-
## Behavior
|
|
46
|
-
|
|
47
|
-
- `audit-code ensure` creates or refreshes the same files only when missing or stale
|
|
48
|
-
- `audit-code install` force-refreshes the canonical prompt payload from `skills/audit-code/audit-code.prompt.md`
|
|
49
|
-
- the generated prompt file explicitly sets `agent: auditor` so Copilot Chat uses the generated auditor custom agent
|
|
50
|
-
- the installer upserts its managed compatibility block into `.github/copilot-instructions.md` instead of clobbering unrelated instructions
|
|
51
|
-
- it prints machine-readable JSON describing the installed targets
|
|
52
|
-
|
|
53
|
-
## Explicit root override
|
|
54
|
-
|
|
55
|
-
If you are running from outside the target repository:
|
|
56
|
-
|
|
57
|
-
```bash
audit-code install --root /path/to/repo
```
|
|
60
|
-
|
|
61
|
-
## Why this exists
|
|
62
|
-
|
|
63
|
-
The goal is to reduce conversation setup friction without repositioning the backend CLI as the primary product surface.
|
|
64
|
-
|
|
65
|
-
GitHub Copilot now shares the same repo-native self-bootstrap path as the other currently automated hosts.
|
|
66
|
-
The older `audit-code install-host --host copilot` alias still exists for compatibility, but it is no longer the recommended setup flow.
|
package/docs/model-selection.md
DELETED
|
@@ -1,97 +0,0 @@
|
|
|
1
|
-
# Model selection
|
|
2
|
-
|
|
3
|
-
This repository has two distinct model-selection layers.
|
|
4
|
-
|
|
5
|
-
## 1. Skill-first product rule
|
|
6
|
-
|
|
7
|
-
The canonical product surface is `/audit-code` in conversation.
|
|
8
|
-
|
|
9
|
-
For that surface, the default model rule is:
|
|
10
|
-
|
|
11
|
-
- use the active conversation model by default
|
|
12
|
-
- avoid forcing the user to supply a model in normal usage
|
|
13
|
-
|
|
14
|
-
That is the intended product contract.
|
|
15
|
-
|
|
16
|
-
When packet dispatch is prepared, `dispatch-plan.json` includes provider-neutral complexity metadata and a `model_hint.tier` value:
|
|
18
|
-
|
|
19
|
-
- `small` for tiny, low-priority packets without sensitive lenses or risk tags
|
|
20
|
-
- `standard` for ordinary bounded review packets
|
|
21
|
-
- `deep` for high-priority, large, critical-flow, or external-signal packets
|
|
22
|
-
|
|
23
|
-
Hosts that support per-subagent model choice may map those tiers to their own available models. Hosts that do not support model choice can ignore the fields.
|
|
25
|
-
The backend still does not prescribe concrete model names.
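The tier rules above can be sketched as a small classifier. Everything concrete here is an assumption made for illustration: the packet field names (`priority`, `estimatedTokens`, `sensitiveLenses`, `riskTags`, `criticalFlow`, `externalSignal`) and the numeric size thresholds are invented, since this document does not show the layout of `dispatch-plan.json` or the real cutoffs.

```javascript
// Illustrative only: field names and thresholds are hypothetical.
function modelTier(packet) {
  const {
    priority = "normal",
    estimatedTokens = 0,
    sensitiveLenses = [],
    riskTags = [],
    criticalFlow = false,
    externalSignal = false,
  } = packet;
  // deep: high-priority, large, critical-flow, or external-signal packets
  if (
    priority === "high" ||
    criticalFlow ||
    externalSignal ||
    estimatedTokens > 20000
  ) {
    return "deep";
  }
  // small: tiny, low-priority packets without sensitive lenses or risk tags
  if (
    priority === "low" &&
    estimatedTokens < 2000 &&
    sensitiveLenses.length === 0 &&
    riskTags.length === 0
  ) {
    return "small";
  }
  // standard: ordinary bounded review packets
  return "standard";
}
```

A host that supports per-subagent model choice could then look up its own model for the returned tier, while other hosts simply ignore the value.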
|
|
26
|
-
|
|
27
|
-
## 2. Backend provider rule
|
|
28
|
-
|
|
29
|
-
When the local backend delegates bounded worker runs into an external provider, model selection becomes provider-specific.
|
|
30
|
-
|
|
31
|
-
That means the supervisor should not invent a global model abstraction unless it can be implemented consistently across adapters.
|
|
32
|
-
|
|
33
|
-
Today the practical rule is:
|
|
34
|
-
|
|
35
|
-
- no provider-specific model override by default
|
|
36
|
-
- let each provider use its own default model unless explicitly configured otherwise
|
|
37
|
-
- when a model is required, pass it through the provider's own native CLI arguments via `extra_args`
|
|
38
|
-
|
|
39
|
-
## Current provider behavior
|
|
40
|
-
|
|
41
|
-
### `local-subprocess`
|
|
42
|
-
|
|
43
|
-
No external LLM provider is selected here.
|
|
44
|
-
|
|
45
|
-
The supervisor simply launches the bounded local worker command.
|
|
46
|
-
|
|
47
|
-
### `claude-code`
|
|
48
|
-
|
|
49
|
-
The built-in Claude Code adapter supports model overrides through `claude_code.extra_args`.
|
|
50
|
-
|
|
51
|
-
Example:
|
|
52
|
-
|
|
53
|
-
```json
{
  "provider": "claude-code",
  "ui_mode": "visible",
  "claude_code": {
    "command": "claude",
    "extra_args": ["--model", "sonnet"]
  }
}
```
|
|
63
|
-
|
|
64
|
-
Use this only when you want to force a specific Claude Code model instead of relying on the user's normal Claude Code default.
|
|
65
|
-
|
|
66
|
-
### `opencode`
|
|
67
|
-
|
|
68
|
-
The built-in OpenCode adapter supports model overrides through `opencode.extra_args`.
|
|
69
|
-
|
|
70
|
-
Example:
|
|
71
|
-
|
|
72
|
-
```json
{
  "provider": "opencode",
  "ui_mode": "visible",
  "opencode": {
    "command": "opencode",
    "extra_args": ["--model", "anthropic/claude-sonnet-4.5"]
  }
}
```
|
|
82
|
-
|
|
83
|
-
Use this only when you want to force a specific OpenCode provider/model pair instead of relying on the user's normal OpenCode default.
|
|
84
|
-
|
|
85
|
-
### `subprocess-template` and `vscode-task`
|
|
86
|
-
|
|
87
|
-
These adapters do not define model semantics themselves.
|
|
88
|
-
|
|
89
|
-
If you need model selection here, include it in the external launcher command template or in the external tool being invoked.
|
|
90
|
-
|
|
91
|
-
## Recommendation
|
|
92
|
-
|
|
93
|
-
For the cleanest product behavior:
|
|
94
|
-
|
|
95
|
-
1. in conversation, let `/audit-code` inherit the active conversation model
|
|
96
|
-
2. for repo-local backend usage, do not force model selection unless the operator has a concrete reason
|
|
97
|
-
3. when needed, set model selection in the provider-native config rather than inventing another repo-level abstraction
|
package/docs/next-steps.md
DELETED
|
@@ -1,202 +0,0 @@
|
|
|
1
|
-
# Next Implementation Steps
|
|
2
|
-
|
|
3
|
-
This document tracks the next meaningful implementation work after the packet review-dispatch refactor and the current skill-first productionization pass.
|
|
5
|
-
|
|
6
|
-
As of April 30, 2026, the shared MCP substrate and the host-native installer pass have landed, but this repository is not yet ready for a public production launch.
|
|
7
|
-
|
|
8
|
-
See:
|
|
9
|
-
|
|
10
|
-
- `docs/production-readiness.md`

## Current state

The repository now supports:

- `/audit-code` as the documented canonical product route
- packaged and repository-local access to `skills/audit-code/audit-code.prompt.md`
- `audit-code prompt-path` as a stable prompt lookup helper
- `npm install -g auditor-lambda` as the intended one-time user install
- `audit-code ensure` as an idempotent self-bootstrap helper for current repository host surfaces
- `audit-code install --host vscode|opencode|codex|claude-desktop|antigravity|all` as an explicit repair or force-refresh installer for the current host surfaces
- `audit-code mcp` as a first-class stdio MCP server entrypoint
- a stable MCP contract with the `start_audit`, `get_status`, `continue_audit`, `explain_task`, `validate_artifacts`, `import_results`, and `import_runtime_updates` tools
- repo-local MCP resources for current artifacts, operator handoff, install guidance, and the current audit report
- repo-local MCP prompts for `audit-code`, `review-task`, and `synthesize-report`
- generated `.audit-code/install/manifest.json` plus a shared repo-local MCP launcher script
- Codex install assets including a repo skill bundle, `AGENTS.md` support, MCP setup guidance, and an automation recipe
- Claude Desktop install assets including a project template, remote connector template, and generated local bundle artifacts
- OpenCode install assets including command, skill, prompt, and `opencode.json` support
- VS Code install assets including prompt file, Copilot instructions, custom agent, and `.vscode/mcp.json`
- Antigravity install assets including planning-mode guidance and MCP-oriented setup guidance
- explicit failures for malformed backend config and corrupted artifact JSON
- `audit-code validate` as a machine-readable backend operator check
- an explicit in-repo release gate via `npm run verify:release`
- structured operator handoff output plus `.audit-artifacts/operator-handoff.{json,md}` for blocked fallback runs
- configured provider bridges that can continue audit-task review by writing structured results and handing control back to the bounded worker command
- graph-informed review packets, `review_packets.json`, and `audit_plan_metrics.json`
- compact packet `prepare-dispatch` and `merge-and-ingest` envelopes

That means the current release is suitable for a controlled alpha or beta skill-first workflow with MCP-aware host bootstrapping, but it is not yet the final public production end-state.
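
As a rough illustration of how a host might invoke one of the MCP tools named in the list above, the sketch below builds a JSON-RPC 2.0 `tools/call` request, which is the general shape MCP uses for tool invocation. Only the tool names come from this document. The `repoRoot` argument is hypothetical, since the input schema of `start_audit` is not described here, and a real client would also need to complete the MCP `initialize` handshake first.

```javascript
// Tool names taken from the supported-features list; everything else is illustrative.
const AUDIT_TOOLS = [
  "start_audit",
  "get_status",
  "continue_audit",
  "explain_task",
  "validate_artifacts",
  "import_results",
  "import_runtime_updates",
];

// Build a JSON-RPC 2.0 request for the MCP `tools/call` method.
function buildToolCall(id, name, args = {}) {
  if (!AUDIT_TOOLS.includes(name)) {
    throw new Error(`unknown audit tool: ${name}`);
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// `repoRoot` is a hypothetical argument name, not a documented one.
const request = buildToolCall(1, "start_audit", { repoRoot: "." });
console.log(JSON.stringify(request));
```

Rejecting unknown tool names before sending keeps a typo from turning into an opaque server-side error.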

## Near-term priorities

### 1. Prove packet review dispatch on real repositories

The highest-priority product follow-through is to validate the packet workflow
outside this repository and compare it to the legacy fan-out baseline.

Near-term work should focus on:

- running `/audit-code` against at least one nontrivial external repository
- recording packet count, task count, warning count, and largest-packet estimate
- comparing observed worker count and token/quota behavior against the old
  one-task-per-worker model
- tightening packet budgets or warning thresholds if real repositories expose
  rough edges
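
One way to record the figures listed above is to reduce the packet list to a small summary object. The sketch below assumes a hypothetical packet shape (`taskIds`, `warnings`, `estimatedTokens`) and assumes one worker per packet; the actual fields in `review_packets.json` and `audit_plan_metrics.json` may differ.

```javascript
// Hypothetical packet shape:
//   { taskIds: string[], warnings: string[], estimatedTokens: number }
function summarizePackets(packets) {
  const summary = {
    packetCount: packets.length,
    taskCount: 0,
    warningCount: 0,
    largestPacketEstimate: 0,
  };
  for (const packet of packets) {
    summary.taskCount += packet.taskIds.length;
    summary.warningCount += packet.warnings.length;
    summary.largestPacketEstimate = Math.max(
      summary.largestPacketEstimate,
      packet.estimatedTokens
    );
  }
  // Under the old one-task-per-worker model, worker count equals task count.
  // Assuming one worker per packet, this is the fraction of workers saved.
  summary.workerReduction =
    summary.taskCount === 0 ? 0 : 1 - summary.packetCount / summary.taskCount;
  return summary;
}

const example = summarizePackets([
  { taskIds: ["t1", "t2", "t3"], warnings: [], estimatedTokens: 12000 },
  { taskIds: ["t4"], warnings: ["oversized file"], estimatedTokens: 30000 },
]);
console.log(example);
```

Capturing the same summary for each external repository would make the comparison against the fan-out baseline repeatable rather than anecdotal.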

The current handoff for this work is:

- `docs/workflow-refactor-brief.md`
- `docs/remediation-baseline.md`

### 2. Verify the shipped host integrations end to end

The biggest remaining gap is no longer raw feature presence. It is host-by-host proof that the generated assets work in the actual products they target.

Near-term work should focus on:

- verifying the Codex skill bundle, `AGENTS.md`, MCP setup guidance, and automation recipe against the real Codex app flow
- installing and smoke-testing the generated Claude Desktop `DXT` or bundle output in a real Desktop environment
- validating that the OpenCode `opencode.json` shape, command file, and MCP config match current OpenCode behavior
- validating the VS Code prompt, agent, and `.vscode/mcp.json` flow inside a real workspace
- validating that the Antigravity planning-mode guidance is accurate and does not over-promise a native saved-workflow surface

### 3. Close the remaining host-native UX gaps

The product goal is still conversational first, not fallback-CLI first, and some shipped surfaces are still guidance-heavy rather than truly native.

Near-term work should focus on:

- turning the Codex automation recipe into a proven, documented automation flow after real operator validation
- polishing Claude Desktop one-click install so the generated bundle is a clearly supported path instead of a mostly technical artifact
- deciding whether OpenCode and VS Code need any smaller UX refinements after smoke-testing, rather than assuming the first generated surfaces are final
- keeping Antigravity framed as a workflow-and-artifacts host until Google documents a stable project-local config surface

### 4. Polish continuation through assisted review

The repo-local backend fallback still intentionally stops in a blocked state under `local-subprocess`, but configured provider bridges can now continue the audit-task review phase automatically.

Near-term work should focus on:

- clearer evidence handoff for imported results and runtime updates
- less operator guesswork when a configured provider fails to return usable results
- stronger host-specific guidance for provider-assisted bridges

### 5. Harden publish and release operations

The packaged install story is in place, but release operations still need finishing work.

Near-term work should focus on:

- npm package-name availability and ownership
- one-time npm Trusted Publisher setup for `.github/workflows/publish-package.yml`
- the first GitHub Actions dry run and live publish through that workflow
- keeping prerelease publication off the `latest` dist-tag unless intentionally requested
- keeping linked-install and packaged-install smoke checks as release gates

## Frictionless-ready checklist

The repository should not be described as frictionless and production-ready until this checklist is substantially true:

### Codex, Claude Desktop, OpenCode, and VS Code

- `audit-code ensure` remains the default self-bootstrap path and `audit-code install` remains the explicit repair path
- the generated repo-local surfaces are obvious in installer output and in `.audit-code/install/GETTING-STARTED.md`
- a new user can invoke `/audit-code` without guessing where prompts or commands were written
- the generated MCP setup path works in the real host, not only in unit tests
- smoke coverage continues to verify the exact repo-local files these hosts consume

### Antigravity

- the planning-mode guidance is explicit, repo-local, and easy to discover
- `.audit-code/install/GETTING-STARTED.md` gives Antigravity-specific steps instead of generic prompt-import advice
- docs avoid implying native repo-local saved-workflow support that is not actually shipped
- the backend fallback remains a clearly secondary path instead of the default recommendation

### Assisted review continuation

- configured interactive providers can continue blocked audits through audit-task review in the same wrapper invocation
- operator handoff artifacts remain explicit and inspectable even when continuation is smoother

### Release operations

- `npm run verify:release` remains green and authoritative
- the real publish path is proven with npm ownership, npm Trusted Publishing, and a real GitHub Actions dry run or prerelease publish

## Probable next steps

These are the most likely next implementation steps based on the current codebase state, but they should still be treated as provisional rather than guaranteed:

### 1. Prove the host installers in real products

Status:

- partially completed in code, not yet fully validated operationally

Most likely shape:

- run fresh-repo smoke checks inside Codex, Claude Desktop, OpenCode, and VS Code, with Antigravity validated against its planning-mode path
- confirm that the generated files are both syntactically valid and actually discovered by each host
- tighten generated docs wherever operator confusion appears during those checks
- keep Antigravity as a documented planning-mode path unless a stable project config contract is published
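
The "syntactically valid" half of the second check above can be automated for the JSON surfaces; whether a host actually discovers a file still has to be confirmed inside that host. A minimal sketch follows. It takes file contents as strings, so it makes no assumption about where the installer writes them, and the sample contents are illustrative only.

```javascript
// Report which generated JSON files parse cleanly.
// `files` maps a display path to that file's text content.
function checkJsonFiles(files) {
  const results = [];
  for (const [path, text] of Object.entries(files)) {
    try {
      JSON.parse(text);
      results.push({ path, ok: true });
    } catch (error) {
      results.push({ path, ok: false, message: error.message });
    }
  }
  return results;
}

const report = checkJsonFiles({
  ".vscode/mcp.json": '{"servers": {}}', // illustrative content only
  "opencode.json": '{"mcp": ', // deliberately truncated to show a failure
});
console.log(report);
```

Returning a per-file result instead of throwing on the first error lets a smoke check list every broken surface in one run.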

Practical success bar:

- a new operator can run one install command and reach a working `/audit-code` or MCP-backed flow in each claimed host without guesswork

### 2. Harden configured interactive-provider continuation

Most likely shape:

- use the existing provider configuration surface in `.audit-artifacts/session-config.json`
- make the provider-assisted review handoff less manual when `claude-code`, `opencode`, or another bridge is intentionally configured
- preserve explicit artifact imports and operator visibility instead of hiding state transitions
- improve diagnostics and recovery when the provider fails to emit structured results

Practical success bar:

- a configured provider can continue through audit-task review with good diagnostics and low operator guesswork when something goes wrong

### 3. Finish the Claude Desktop and Antigravity follow-through

Most likely shape:

- prove the generated Claude Desktop local bundle in a real Desktop install flow
- decide whether to check in or generate the final desktop-extension packaging metadata more explicitly
- add remote connector deployment guidance that is specific enough for Team or Enterprise rollout
- document exactly how Antigravity-produced artifacts should flow back through `import_results` and `import_runtime_updates`

Practical success bar:

- Claude Desktop and Antigravity guidance is operational, specific, and consistent with what the products really support

### 4. Prove the release path outside the repository

Most likely shape:

- confirm npm package-name ownership and npm Trusted Publisher configuration
- run a real GitHub Actions prerelease or dry-run publish
- keep `npm run verify:release` as the minimum in-repo gate before publish

Practical success bar:

- the release workflow is demonstrated end to end instead of only being inferred from configuration

## Non-goals for the next phase

These should not become the primary focus of the next implementation pass:

- repositioning the CLI as a peer product surface
- expanding low-level backend helpers into a CLI-first user experience
- making backend implementation details outrank the conversation contract in docs or product decisions