npm - auditor-lambda - Versions diffs - 0.2.5 → 0.2.6 - Mend

auditor-lambda 0.2.5 → 0.2.6


Files changed (33)

package/README.md +12 -0
package/audit-code-wrapper-lib.mjs +7 -1
package/dist/cli.js +324 -27
package/dist/io/runArtifacts.d.ts +2 -1
package/dist/io/runArtifacts.js +2 -1
package/dist/orchestrator/flowRequeue.d.ts +2 -2
package/dist/orchestrator/flowRequeue.js +15 -2
package/dist/orchestrator/internalExecutors.js +34 -10
package/dist/orchestrator/requeue.js +1 -0
package/dist/orchestrator/requeueCommand.js +15 -2
package/dist/orchestrator/resultIngestion.d.ts +2 -1
package/dist/orchestrator/resultIngestion.js +21 -0
package/dist/orchestrator/state.js +10 -1
package/dist/orchestrator/taskBuilder.js +4 -2
package/dist/orchestrator/trivialAudit.d.ts +4 -0
package/dist/orchestrator/trivialAudit.js +46 -0
package/dist/prompts/renderWorkerPrompt.js +5 -2
package/dist/providers/spawnLoggedCommand.js +17 -0
package/dist/reporting/mergeFindings.js +14 -11
package/dist/reporting/rootCause.js +92 -9
package/dist/supervisor/sessionConfig.js +4 -2
package/dist/types.d.ts +5 -0
package/dist/validation/auditResults.d.ts +5 -2
package/dist/validation/auditResults.js +369 -42
package/docs/artifacts.md +8 -1
package/docs/contract.md +118 -27
package/docs/field-trial-bug-report.md +237 -0
package/docs/session-config.md +20 -1
package/docs/usage.md +22 -0
package/package.json +1 -1
package/schemas/audit_result.schema.json +3 -2
package/schemas/audit_task.schema.json +10 -0
package/skills/audit-code/audit-code.prompt.md +12 -8

package/docs/field-trial-bug-report.md ADDED

@@ -0,0 +1,237 @@
+# audit-code Field Trial Bug Report
+**Observed by:** LLM workers (Claude, April 2026)
+**Environments tested:** Claude Desktop (claude-code provider), OpenCode (opencode provider)
+**Repositories audited:** `Polar-CV-KAN` (~30 source files, ~126 tasks, Claude Desktop); same codebase, OpenCode run
+**Report date:** 2026-04-21
+Issues marked **[Both]** were independently observed in both environments. Issues marked **[CD]** or **[OC]** were specific to one environment.
+---
+## Critical
+### F-01 — Orchestrator never transitions to `status: "complete"` [Both]
+**CD observation:** `audit_tasks.json` showed `status: undefined` for all 126 tasks after successful ingestion. The completion gate checks this field, finds it undefined on every task, and permanently blocks. `synthesis_report.json` populated correctly (95 findings, 46 clusters) because synthesis runs on ingestion independently, but `status: "complete"` never fires because the gate only looks at the broken task-status field.
+**OC observation:** Even after all obligations reached `satisfied`, `audit_state.json` remained `status: "active"` and re-triggered planning artifacts. Required direct JSON edit to force `status: "complete"`.
+**Impact:** The documented stop condition ("stop the loop when terminal output shows `status: complete`") never fires in either environment. Workers must either loop indefinitely or make a judgment call to stop. This is the most fundamental failure in the framework.
+**Fix needed:** Audit task status must be written at ingestion time. The completion gate should also fall back on obligation-state truth: if all obligations are `satisfied`, the run is complete regardless of the task-status field.
+---
+### F-02 — Worker launch failures / silent executor failures [Both]
+**CD observation:** `structure_executor` failed to launch during initial structuring. The failure was reported in JSON output but the orchestrator continued as if structuring had succeeded. Task quality degraded silently from the start — unclear whether the resulting task plan was the best possible decomposition or a fallback.
+**OC observation:** Every executor failed (`agent`, `result_ingestion_executor`, `planning_executor`). The entire audit had to be performed manually by reading source files, writing findings in the correct format, and directly manipulating artifact files. The provider (`opencode`) was supposed to enable interactive dispatch but never actually dispatched work.
+**Impact (OC):** The framework served as a state tracker only — no automation at all. **Impact (CD):** Silent quality degradation at the task-planning phase with no way to detect it.
+**Fix needed:** Worker launch failures must be surfaced as blocking handoffs, not silently swallowed. The orchestrator must not advance past a failed executor as if it succeeded. Executor failure messages must include enough detail to diagnose the root cause.
+---
+## High
+### F-03 — `--results` ingestion is unreliable [Both]
+**CD observation:** `audit-code --results <file>` threw `TypeError: e.trim is not a function` when evidence fields contained objects rather than plain strings. The CLI exited 0 in some cases, making it ambiguous whether results were partially or fully ingested.
+**OC observation:** Two separate failure modes: (1) the generated task file contained `audit_results_path: "--root"` (the CLI flag was written as the path value) causing an immediate crash; (2) after manually fixing the path, ingestion crashed with `Cannot read properties of undefined (reading 'map')` — the ingestion executor cannot parse the incoming results format.
+**Impact:** The primary submission mechanism is unreliable. Workers resort to custom scripts that bypass the framework and will break if the artifact schema changes.
+**Fix needed:**
+- Fix the `audit_results_path: "--root"` generation bug in the task file writer.
+- Add schema validation on ingestion that emits field-level errors: `"evidence[2] must be a string, got object"` rather than a bare JS runtime crash.
+- The ingestion executor must not exit 0 on partial failure.
+---
+### F-04 — CLI hangs without output [CD]
+On multiple occasions, `audit-code` or `audit-code --results <file>` produced no output and hung indefinitely. No timeout, no error message, no way to distinguish a genuine hang from a slow operation.
+**Observed pattern:** Hangs were most frequent immediately after large ingestions and at session start during manifest structuring. Suspected cause: Node.js blocking on large JSON serialization or file locking between the orchestrator writing state and the CLI reading it.
+**Fix needed:** Add a timeout (`--timeout <ms>`) and ensure the CLI emits a progress indicator or heartbeat on long operations.
+---
+### F-05 — Requeue tasks explosion — 141 tasks from 10 findings [OC]
+After ingesting the first batch of 10 `data_integrity` findings, the orchestrator generated a `requeue_tasks.json` with 141 tasks — more than the original 64 audit tasks. The requeue logic appears to fan out across all `(lens, file_group)` combinations, producing a combinatorial explosion. No guidance is provided on which requeue tasks are actually needed vs. redundant.
+**Fix needed:** Requeue logic must de-duplicate against already-completed coverage. Requeue tasks should only be generated for `(file_group, lens)` pairs not already marked complete in `coverage_matrix.json`.
+---
+## Medium
+### F-06 — Evidence schema undocumented; validation only at ingestion [Both]
+**CD observation:** `evidence[]` must be an array of plain strings. This constraint is documented in `current-prompt.md` but not validated until `--results` ingestion, where failure produces a cryptic JS runtime error with no field-level attribution.
+**OC observation:** `evidence` must be an array of objects (`[{excerpt, line_reference}]`), not a single object. Discovered only through the error `"(finding.evidence ?? []) is not iterable"`.
+**Note — schema discrepancy:** The two environments reported conflicting evidence types (strings vs. objects). This is itself a documentation or versioning problem — the expected format is not the same across runs, or the prompt changed between environments.
+**Fix needed:**
+- Publish a JSON Schema file (or inline schema comment in the prompt) that workers can validate against before submission.
+- The string format should be explicit: `"src/foo.py:42 — variable overwritten before use"`.
+- Reconcile the expected type (string vs object) and pick one; document it prominently.
+---
+### F-07 — `synthesis_current` obligation permanently shows "missing" [Both]
+**CD observation:** Even after `synthesis_report.json` was fully populated (95 findings, 46 clusters, 22 work blocks), the obligation tracker showed `synthesis_current: missing` because it was blocked by `audit_tasks_completed` (itself blocked by the undefined-status bug, F-01). The worker cannot distinguish "synthesis truly hasn't run" from "synthesis ran but the gate is broken."
+**OC observation:** The obligation was never going to be satisfied by the framework — the synthesis agent never ran. Required manually creating `synthesis_report.json` and forcing the obligation to `satisfied`.
+**Fix needed:** Decouple `synthesis_current` satisfaction from `audit_tasks_completed`. If `synthesis_report.json` exists and is non-empty, `synthesis_current` should be satisfied regardless of upstream gate state.
+---
+### F-08 — "All remaining N tasks low priority" — no guidance on what to do [CD]
+At a certain point the orchestrator indicated all remaining 110 tasks were low priority. The directive does not define what the worker should do: submit empty findings (what was done), review at reduced depth, or skip entirely. Submitting `findings: []` for 60+ tasks in rapid succession was the only way to unblock the orchestrator, but legitimate low-severity findings in those files were never written.
+**Fix needed:** When the orchestrator decides remaining tasks are low priority, emit an explicit directive — e.g., `"You may submit empty findings for these tasks"` or `"Review at reduced depth"` or `"Skip — the audit has sufficient coverage"`.
+---
+### F-09 — Trivially empty files dispatched as full audit tasks [CD]
+The task manifest dispatched audit tasks for empty `__init__.py` files (some containing only a docstring), dotfiles (`.gitignore`, `.gitattributes`), and one-line stub files. Each required a full read→write→ingest round-trip to produce an empty `findings: []` result. For 30 files this added ~30–40 pointless round-trips; at scale this is severe.
+**Fix needed:** Filter files below a minimum token threshold (or with no parseable code constructs) before dispatching tasks. Batch all trivially-empty files into a single no-op task, or skip them entirely.
+---
+### F-10 — No batch task dispatch; one task per CLI invocation [Both]
+**CD observation:** 126 tasks required 252+ CLI invocations plus 126 file reads and writes. The design assumes one task = one LLM call = one CLI round-trip.
+**OC observation:** No bulk ingestion mechanism; wrote a custom `scripts/ingest-results.mjs` that directly mutated `audit_results.jsonl`, `coverage_matrix.json`, and `audit_state.json`.
+**Fix needed:** Support batched dispatch (a `current-tasks.json` with N tasks per run) and a native `audit-code --batch-results <dir>` that processes all result files in a directory. Alternatively, make `agentBatchSize` settable to a meaningful value that workers actually see in their prompt.
+---
+### F-11 — Coverage matrix ↔ task_id mapping is opaque [OC]
+The relationship between `audit_tasks.json` task IDs (e.g., `src-lib:correctness`) and `coverage_matrix.json` file entries (e.g., `src/lib/file-utils.ts` with `required_lenses: [correctness, ...]`) is implicit. A task's `file_group` maps to multiple files in the matrix, but there is no explicit mapping table. The mapping must be reverse-engineered by reading both files.
+**Fix needed:** Either include the resolved file list in each task, or provide an `audit-code explain-task <task_id>` subcommand that shows which files and lenses a task covers.
+---
+### F-12 — Bash variable substitution breaks `node -e` shell loops [CD]
+When batching remaining tasks with a shell loop that invoked `node -e '...'`, bash expanded `${}` syntax inside the JS string before Node.js received it, producing `bad substitution` errors. The workaround (write a standalone `.mjs` file) is not documented anywhere.
+**Fix needed:** Document the correct pattern for batch submission loops, or provide a native `audit-code --batch-results <dir>` to eliminate the need for shell-level scripting entirely.
+---
+### F-13 — Session config discovery and provider switching are error-prone [Both]
+**CD observation:** Every invocation printed `[session-config] no session-config.json found` — 252+ times across the run. The warning appeared even after completing ingestion steps that should have established the session.
+**OC observation:** Required manually creating `session-config.json` with `{"provider": "opencode"}`. Even after doing so, the provider change only altered the error message; actual dispatch still did not work.
+**Fix needed:** Create a default `session-config.json` on first `audit-code` run. Suppress the warning when a session config is genuinely optional. Document the `provider` field and its valid values prominently in the session-config guide.
+---
+### F-14 — No documentation on artifact schema or contract [OC]
+The `contract_version: "audit-code/v1alpha1"` header implies a versioned protocol, but there is no schema documentation. The expected formats for JSONL structure, finding shape, task_id conventions, and coverage matrix layout had to be learned entirely by reading existing artifacts and reverse-engineering error messages.
+**Fix needed:** Publish a contract reference document alongside `docs/contract.md` (which exists but may be incomplete) covering: all artifact file schemas, field-level types and constraints, task_id naming convention, and the expected `AuditResult` JSON shape with a worked example.
+---
+## Low
+### F-15 — `worker_results_pending.json` not cleared after ingestion [CD]
+After `audit-code --results <file>`, the staging file is not cleared. Stale results from the previous task are resubmitted if the worker forgets to overwrite the file.
+**Fix needed:** After successful ingestion, delete or rename `worker_results_pending.json` (e.g., to `worker_results_submitted_<timestamp>.json`).
+---
+### F-16 — `related_findings` contains only circular self-references [CD]
+In the synthesized `synthesis_report.json`, nearly every finding's `related_findings` array contains only the finding's own ID. This is useless and misleads reviewers into thinking cross-finding relationships were analyzed.
+**Fix needed:** Omit `related_findings` when the synthesis engine cannot identify cross-finding relationships, rather than populating it with a self-reference.
+---
+### F-17 — Runtime validation evidence shows "pending" for every finding [CD]
+All 95 findings contain entries like `"runtime:unit:src-modules: pending — No runtime evidence recorded yet"`. These appear verbatim across every finding, bloating each one with 2–3 lines of noise that convey nothing.
+**Fix needed:** Omit pending evidence entries from the output entirely. A finding with no runtime evidence should have no runtime evidence in its array — not a placeholder repeated 95 times.
+---
+### F-18 — `root_cause_clusters` are file co-location groups, not semantic clusters [CD]
+The 46 clusters are named `"correctness/correctness in src/modules"` — file-path groups with a lens label, not semantic root causes. One cluster for "correctness in src/modules" contains 5 findings with 3 entirely different root causes (division-by-zero, cudagraph violation, dead code).
+**Fix needed:** Root-cause clustering should be semantic (e.g., "All NaN paths from missing eps guards"). Either improve the clustering algorithm or rename the section to `file_clusters` to accurately describe what it contains.
+---
+### F-19 — `work_blocks` section omitted from final summary presentation [CD]
+The audit directive says findings should be organized into "non-overlapping blocks of tasks." `synthesis_report.json` correctly generates 22 `work_blocks`. The final summary presentation omitted this section entirely, showing only a flat findings table. Work blocks are arguably the most actionable output and should lead the summary.
+**Fix needed:** Make the `work_blocks` presentation requirement explicit and prominent in the final-output section of the prompt.
+---
+### F-20 — `reviewed_ranges` field is unenforceable and creates false confidence [CD]
+Workers can declare `reviewed_ranges: [{start: 1, end: 10}]` while writing findings about line 800, or declare the full file range without actually reading it. For a 966-line file this makes the field meaningless.
+**Fix needed:** Either remove `reviewed_ranges` (it creates false confidence) or enforce it mechanically by requiring a content hash or line-count to be included alongside the range declaration.
+---
+## Summary Table
+| ID | Issue | Sev | Env | Type |
+|----|-------|-----|-----|------|
+| F-01 | Orchestrator never reaches `status: "complete"` — task statuses undefined | Critical | Both | Bug |
+| F-02 | Worker launch failures / silent executor failures | Critical | Both | Bug |
+| F-03 | `--results` ingestion unreliable (type errors, wrong path, `.map()` crash) | High | Both | Bug |
+| F-04 | CLI hangs without output on some invocations | High | CD | Bug |
+| F-05 | Requeue tasks explosion — 141 tasks from 10 findings | High | OC | Bug |
+| F-06 | Evidence schema undocumented; validated only at ingestion (conflicting types across envs) | Medium | Both | DX |
+| F-07 | `synthesis_current` permanently "missing" even when report is populated | Medium | Both | Bug |
+| F-08 | "All remaining N tasks low priority" — no guidance on worker action | Medium | CD | UX |
+| F-09 | Trivially empty files dispatched as full audit tasks | Medium | CD | Design |
+| F-10 | No batch task dispatch; one task = one CLI invocation = one round-trip | Medium | Both | Design |
+| F-11 | Coverage matrix ↔ task_id mapping is opaque; no lookup table | Medium | OC | DX |
+| F-12 | Bash variable substitution breaks `node -e` shell loops | Medium | CD | DX |
+| F-13 | Session config discovery / provider switching error-prone; noisy warnings | Medium | Both | UX |
+| F-14 | No documentation on artifact schema or contract | Medium | OC | DX |
+| F-15 | `worker_results_pending.json` not cleared after ingestion | Low | CD | Bug |
+| F-16 | `related_findings` circular self-references | Low | CD | Data |
+| F-17 | Runtime validation "pending" entries in all 95 findings | Low | CD | Data |
+| F-18 | `root_cause_clusters` are file co-location groups, not semantic | Low | CD | Design |
+| F-19 | `work_blocks` omitted from final summary presentation | Low | CD | UX |
+| F-20 | `reviewed_ranges` unenforceable; creates false confidence | Low | CD | Design |
+**Env key:** CD = Claude Desktop (claude-code provider), OC = OpenCode (opencode provider), Both = independently observed in both.
+**Type key:** Bug = incorrect behavior, DX = developer/worker experience, UX = output/presentation, Design = intentional design that needs reconsideration, Data = output data quality.
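
The field-level error format requested under F-03 and F-06 can be sketched as below. This is only an illustration, not the validator shipped in `dist/validation/auditResults.js`: the helper name `validateEvidence` is hypothetical, and the sketch assumes the plain-string form of `evidence[]` (the report itself notes that the two environments disagreed on the expected type).

```javascript
// Hypothetical helper, not part of auditor-lambda's published API.
// Assumption: evidence[] is an array of plain strings (one of the two
// conflicting shapes described in F-06).
function describeType(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value;
}

function validateEvidence(evidence) {
  const errors = [];
  if (!Array.isArray(evidence)) {
    // Report the container problem once instead of crashing on iteration.
    errors.push(`evidence must be an array, got ${describeType(evidence)}`);
    return errors;
  }
  evidence.forEach((item, index) => {
    if (typeof item !== "string") {
      // Field-level attribution: name the exact index that failed.
      errors.push(`evidence[${index}] must be a string, got ${describeType(item)}`);
    }
  });
  return errors;
}

// Example: the third entry is an object, so exactly one error is reported.
console.log(
  validateEvidence([
    "src/foo.py:42 - variable overwritten before use",
    "src/foo.py:57 - result discarded",
    { excerpt: "x = 1", line_reference: 42 },
  ])
);
```

Returning a list of messages rather than throwing on the first problem lets a caller report every bad field in one pass and then decide on a non-zero exit code.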

package/docs/session-config.md CHANGED

@@ -13,6 +13,13 @@ Backend provider configuration lives at:
 This file is optional.
 If it does not exist, the backend defaults to its built-in behavior.
+On first backend run, `audit-code` now writes a repo-local default file automatically:
+```json
+{
+  "provider": "local-subprocess"
+}
+```
 If it exists but contains invalid values, the backend now fails loudly before starting worker runs.
 You can also check it explicitly with:
@@ -27,7 +34,9 @@ audit-code validate
 {
   "provider": "local-subprocess",
   "timeout_ms": 1800000,
-  "ui_mode": "headless"
+  "ui_mode": "headless",
+  "agent_task_batch_size": 4,
+  "parallel_workers": 2
 }
 ```
@@ -55,6 +64,16 @@ Supported values:
 Use `visible` when you want stdout and stderr mirrored into the current terminal while the provider runs.
+### `agent_task_batch_size`
+How many audit tasks to include in one provider-assisted review batch.
+When this is greater than `1`, the generated worker prompt points at `current-tasks.json` / `pending-audit-tasks.json` and expects one `AuditResult` per assigned task.
+### `parallel_workers`
+How many provider-assisted review batches to launch in parallel when the selected provider supports it.
 ## Auto provider mode
 `auto` is an explicit opt-in mode.
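
As a rough illustration of the "fails loudly" behavior described in this file, the sketch below checks the two new numeric fields before any worker run would start. The function name `validateSessionConfig` and the lower bound of `1` are assumptions made for the example; the diff does not show the package's actual validation rules.

```javascript
// Hypothetical validator; the real checks in auditor-lambda may differ.
// Assumption: these fields, when present, must be integers >= 1.
const POSITIVE_INTEGER_FIELDS = ["timeout_ms", "agent_task_batch_size", "parallel_workers"];

function validateSessionConfig(config) {
  if (config === null || typeof config !== "object" || Array.isArray(config)) {
    return ["session config must be a JSON object"];
  }
  const errors = [];
  // The file is optional, so provider is only checked when it is present.
  if ("provider" in config && (typeof config.provider !== "string" || config.provider === "")) {
    errors.push("provider must be a non-empty string");
  }
  for (const field of POSITIVE_INTEGER_FIELDS) {
    if (field in config && !(Number.isInteger(config[field]) && config[field] >= 1)) {
      errors.push(`${field} must be an integer >= 1`);
    }
  }
  return errors;
}

// The example config from the documentation above passes this sketch.
const errors = validateSessionConfig({
  provider: "local-subprocess",
  timeout_ms: 1800000,
  ui_mode: "headless",
  agent_task_batch_size: 4,
  parallel_workers: 2,
});
console.log(errors.length === 0 ? "config ok" : errors.join("\n"));
```

A caller could throw or exit non-zero when the returned list is non-empty, which would match the documented intent of refusing to start worker runs on invalid values.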

package/docs/usage.md CHANGED

@@ -36,10 +36,17 @@ Pass additional evidence back into the same fallback wrapper when needed:
 ```bash
 audit-code --results /path/to/audit_results.json
+audit-code --batch-results /path/to/audit-results-dir
 audit-code --updates /path/to/runtime_validation_update.json
 audit-code --external-analyzer-results /path/to/external_analyzer_results.json
 ```
+Inspect the resolved scope of a task id without reverse-engineering `coverage_matrix.json` manually:
+```bash
+audit-code explain-task src-api-auth:security
+```
 Each wrapper run also refreshes:
 - `.audit-artifacts/operator-handoff.json`
@@ -70,6 +77,7 @@ Optional inputs for the same bounded step:
 ```bash
 node dist/index.js advance-audit --root /path/to/repo --artifacts-dir .artifacts --results /path/to/audit_results.json
+node dist/index.js advance-audit --root /path/to/repo --artifacts-dir .artifacts --batch-results /path/to/audit-results-dir
 node dist/index.js advance-audit --root /path/to/repo --artifacts-dir .artifacts --updates /path/to/runtime_validation_update.json
 node dist/index.js advance-audit --root /path/to/repo --artifacts-dir .artifacts --external-analyzer-results /path/to/external_analyzer_results.json
 ```
@@ -110,6 +118,7 @@ It invokes the current bounded advance logic and reports the next backend step.
 ```bash
 node dist/index.js ingest-results --artifacts-dir .artifacts --results /path/to/audit_results.json
+node dist/index.js ingest-results --artifacts-dir .artifacts --batch-results /path/to/audit-results-dir
 ```
 This updates:
@@ -119,9 +128,22 @@ This updates:
 - `runtime_validation_tasks.json`
 - `runtime_validation_report.json`
 - `audit_results.jsonl`
+- `audit_tasks.json`
 - `requeue_tasks.json`
+- `merged_findings.json`
+- `root_cause_clusters.json`
 - `synthesis_report.json`
+The batch form processes every `*.json` file in the target directory in lexical order.
+## Explain a task id
+```bash
+node dist/index.js explain-task src-api-auth:security --artifacts-dir .artifacts
+```
+This prints the resolved task payload, matching `coverage_matrix.json` entries, pending coverage by file, and any already-ingested matching findings.
 ## Update runtime validation report with real evidence
 Prepare a JSON file shaped like `examples/runtime_validation_update.example.json`, then run:
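
The "every `*.json` file ... in lexical order" rule for `--batch-results` can be pictured with the small sketch below. It is not the package's implementation: `selectBatchFiles` is a hypothetical name, and the sketch assumes "lexical" means JavaScript's default code-unit string ordering, which the documentation does not spell out.

```javascript
// Hypothetical helper illustrating the documented selection rule.
// Assumption: "lexical order" is the default Array.prototype.sort()
// ordering (UTF-16 code units), so uppercase names sort before lowercase.
function selectBatchFiles(fileNames) {
  return fileNames
    .filter((name) => name.endsWith(".json")) // keep only *.json entries
    .sort(); // filter() returned a new array, so sorting it is safe
}

// Example directory listing with one non-JSON file mixed in.
console.log(selectBatchFiles(["task-10.json", "task-2.json", "notes.txt", "task-1.json"]));
```

Under this assumption `task-10.json` is processed before `task-2.json`, so zero-padded file names (for example `task-02.json`) are the simplest way to get numeric ordering. In practice the input list would come from something like `fs.readdirSync(dir)`.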

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "auditor-lambda",
-  "version": "0.2.5",
+  "version": "0.2.6",
   "private": false,
   "description": "Portable hybrid code-auditing framework for arbitrary repositories.",
   "type": "module",

package/schemas/audit_result.schema.json CHANGED

@@ -27,11 +27,12 @@
       "minItems": 1,
       "items": {
         "type": "object",
-        "required": ["path", "start", "end"],
+        "required": ["path", "start", "end", "line_count"],
         "properties": {
           "path": { "type": "string" },
           "start": { "type": "integer" },
-          "end": { "type": "integer" }
+          "end": { "type": "integer" },
+          "line_count": { "type": "integer", "minimum": 1 }
         },
         "additionalProperties": false
       }
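
To make the effect of the new `line_count` requirement concrete, the sketch below applies the constraints visible in this hunk to a single range entry by hand. It is an illustration only: `checkRangeEntry` is a hypothetical name, and a real consumer would more likely run the published schema through a JSON Schema validator.

```javascript
// Hand-written check mirroring the item schema in the hunk above:
// required path/start/end/line_count, integer types, line_count >= 1,
// and no additional properties. Hypothetical helper, for illustration.
const RANGE_PROPERTIES = ["path", "start", "end", "line_count"];

function checkRangeEntry(entry) {
  const errors = [];
  for (const key of RANGE_PROPERTIES) {
    if (!(key in entry)) errors.push(`missing required property: ${key}`);
  }
  for (const key of Object.keys(entry)) {
    if (!RANGE_PROPERTIES.includes(key)) errors.push(`unexpected property: ${key}`);
  }
  if ("path" in entry && typeof entry.path !== "string") {
    errors.push("path must be a string");
  }
  for (const key of ["start", "end", "line_count"]) {
    if (key in entry && !Number.isInteger(entry[key])) {
      errors.push(`${key} must be an integer`);
    }
  }
  if (Number.isInteger(entry.line_count) && entry.line_count < 1) {
    errors.push("line_count must be >= 1");
  }
  return errors;
}

// An entry written against the 0.2.5 schema no longer validates.
console.log(checkRangeEntry({ path: "src/foo.py", start: 1, end: 10 }));
// The same entry with line_count added is accepted.
console.log(checkRangeEntry({ path: "src/foo.py", start: 1, end: 10, line_count: 966 }));
```

Result files produced for 0.2.5 would therefore need a `line_count` added to each range entry before they pass the 0.2.6 schema.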

package/schemas/audit_task.schema.json CHANGED

@@ -54,6 +54,16 @@
     "tags": {
       "type": "array",
       "items": { "type": "string" }
+    },
+    "status": {
+      "type": "string",
+      "enum": ["pending", "complete"]
+    },
+    "completed_at": {
+      "type": "string"
+    },
+    "completion_reason": {
+      "type": "string"
     }
   },
   "additionalProperties": false
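
The new `status` field ties back to F-01 in the field-trial report, where every task carried `status: undefined` and the completion gate never opened. A minimal sketch of such a gate is shown below. The function name `allTasksComplete`, the flat array of task objects, and the choice to treat an empty task list as not complete are all assumptions; the actual gate lives in the orchestrator code, which this diff view does not display.

```javascript
// Hypothetical completion gate over task objects shaped like the schema above.
// Assumption: a run with no tasks at all is not considered complete.
function allTasksComplete(tasks) {
  return tasks.length > 0 && tasks.every((task) => task.status === "complete");
}

// Tasks without a status (the F-01 symptom) keep the gate closed.
const withoutStatus = [{ task_id: "src-lib:correctness" }, { task_id: "src-lib:security" }];
console.log(allTasksComplete(withoutStatus));

// Once ingestion writes a status on every task, the gate opens.
const afterIngestion = withoutStatus.map((task) => ({ ...task, status: "complete" }));
console.log(allTasksComplete(afterIngestion));
```

Because the schema restricts `status` to `"pending"` or `"complete"`, a strict equality check against `"complete"` is enough; a missing field simply counts as not complete.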

package/skills/audit-code/audit-code.prompt.md CHANGED

@@ -28,6 +28,7 @@ If the top-level status is `"blocked"`, it means the orchestrator needs your LLM
 To determine what task you have been assigned, use your file-reading tool to inspect:
 - `.audit-artifacts/dispatch/current-task.json`
+- `.audit-artifacts/dispatch/current-tasks.json` when present
 - `.audit-artifacts/dispatch/current-prompt.md`
 ## Step 3: Audit the Code natively
@@ -39,16 +40,13 @@ To determine what task you have been assigned, use your file-reading tool to ins
 ## Step 4: Write the Findings
 Produce your findings array matching exactly the `AuditResult` JSON schema described in the prompt.
-Do not use `echo` or generic terminal shell strings for large JSON structures to avoid breaking JSON escaping. Instead, use your raw **File Edit Tool** to reliably save your results entirely to:
-`.audit-artifacts/worker_results_pending.json`
+Do not use `echo` or generic terminal shell strings for large JSON structures to avoid breaking JSON escaping.
+Instead, use your raw **File Edit Tool** to reliably save your results to the exact `audit_results_path` named in `.audit-artifacts/dispatch/current-task.json`.
+If `current-tasks.json` exists, emit one `AuditResult` per assigned task in that batch.
 ## Step 5: Feed the Loop
-Return your results to the state machine by running the ingestion command in the terminal:
-```bash
-audit-code --results .audit-artifacts/worker_results_pending.json
-```
+Return your results to the state machine by running the exact `worker_command` from `.audit-artifacts/dispatch/current-task.json`.
 ## Step 6: Loop or Terminate
@@ -63,4 +61,10 @@ Instead, use your file reading tool to consume:
 - `.audit-artifacts/synthesis_report.json`
-Finally, read these synthesis findings and present them back to the user in a polished, highly readable **Markdown Summary Table** directly in the chat panel. Wait for the user to ask you to begin resolving or patching the root_cause_clusters you discovered.
+Finally, present the completed audit back to the user in this order:
+1. A **Work Blocks** section summarizing `synthesis_report.work_blocks` first, because those are the primary actionable remediation groups.
+2. A polished **Markdown Summary Table** for the highest-signal merged findings.
+3. A concise semantic **Root Cause Clusters** summary based on `synthesis_report.root_cause_clusters`.
+Wait for the user to ask you to begin resolving or patching the work blocks or clusters you discovered.