@nathapp/nax 0.19.0 → 0.21.0

Files changed (58)
  1. package/.claude/settings.json +15 -0
  2. package/docs/20260304-review-nax.md +492 -0
  3. package/docs/ROADMAP.md +52 -18
  4. package/docs/specs/bug-039-orphan-processes.md +131 -0
  5. package/docs/specs/bug-040-review-rectification.md +82 -0
  6. package/docs/specs/bug-041-cross-story-test-isolation.md +88 -0
  7. package/docs/specs/bug-042-verifier-failure-capture.md +117 -0
  8. package/docs/specs/feat-010-smart-runner-git-history.md +96 -0
  9. package/docs/specs/feat-011-file-context-strategy.md +73 -0
  10. package/docs/specs/feat-012-tdd-writer-tier.md +79 -0
  11. package/docs/specs/feat-013-test-after-review.md +89 -0
  12. package/docs/specs/feat-014-heartbeat-observability.md +127 -0
  13. package/memory/topic/feat-010-baseref.md +28 -0
  14. package/memory/topic/feat-013-test-after-deprecation.md +22 -0
  15. package/nax/config.json +7 -4
  16. package/nax/features/bug-039-medium/prd.json +45 -0
  17. package/nax/features/verify-v2/prd.json +79 -0
  18. package/nax/features/verify-v2/progress.txt +3 -0
  19. package/package.json +2 -2
  20. package/src/agents/claude.ts +66 -7
  21. package/src/config/defaults.ts +2 -1
  22. package/src/config/schemas.ts +2 -0
  23. package/src/config/types.ts +4 -0
  24. package/src/context/builder.ts +9 -1
  25. package/src/execution/lifecycle/index.ts +1 -0
  26. package/src/execution/lifecycle/run-completion.ts +29 -0
  27. package/src/execution/lifecycle/run-regression.ts +301 -0
  28. package/src/execution/pipeline-result-handler.ts +0 -1
  29. package/src/execution/post-verify.ts +31 -194
  30. package/src/execution/runner.ts +1 -0
  31. package/src/execution/sequential-executor.ts +1 -0
  32. package/src/pipeline/stages/verify.ts +27 -23
  33. package/src/pipeline/types.ts +2 -0
  34. package/src/review/runner.ts +39 -4
  35. package/src/routing/router.ts +3 -3
  36. package/src/routing/strategies/keyword.ts +5 -2
  37. package/src/routing/strategies/llm.ts +27 -1
  38. package/src/utils/git.ts +49 -25
  39. package/src/verification/executor.ts +8 -2
  40. package/src/verification/smart-runner.ts +58 -10
  41. package/test/integration/plugin-routing.test.ts +1 -1
  42. package/test/integration/rectification-flow.test.ts +3 -3
  43. package/test/integration/review-config-commands.test.ts +1 -1
  44. package/test/integration/verify-stage.test.ts +9 -0
  45. package/test/unit/agents/claude.test.ts +106 -0
  46. package/test/unit/config/defaults.test.ts +69 -0
  47. package/test/unit/config/regression-gate-schema.test.ts +159 -0
  48. package/test/unit/context.test.ts +6 -3
  49. package/test/unit/execution/lifecycle/run-completion.test.ts +239 -0
  50. package/test/unit/execution/lifecycle/run-regression.test.ts +418 -0
  51. package/test/unit/execution/post-verify-regression.test.ts +31 -84
  52. package/test/unit/execution/post-verify.test.ts +28 -48
  53. package/test/unit/pipeline/stages/verify.test.ts +266 -0
  54. package/test/unit/pipeline/verify-smart-runner.test.ts +2 -1
  55. package/test/unit/prd-auto-default.test.ts +2 -2
  56. package/test/unit/routing/routing-stability.test.ts +1 -1
  57. package/test/unit/routing/strategies/llm.test.ts +250 -0
  58. package/test/unit/routing.test.ts +7 -7
@@ -0,0 +1,15 @@
+ {
+   "hooks": {
+     "PostToolUse": [
+       {
+         "matcher": "Edit|Write",
+         "hooks": [
+           {
+             "type": "command",
+             "command": "bun x biome lint --write src/ bin/"
+           }
+         ]
+       }
+     ]
+   }
+ }
@@ -0,0 +1,492 @@
1
+ # Deep Code Review: @nathapp/nax
2
+
3
+ **Date:** 2026-03-04
4
+ **Reviewer:** Subrina (AI)
5
+ **Version:** 0.18.5
6
+ **Files:** 219 source (lib: ~32K LOC), 135 test (~50K LOC)
7
+ **Baseline:** 2069 tests pass, 12 skip, 0 fail
8
+
9
+ ---
10
+
11
+ ## Overall Grade: C+ (72/100)
12
+
13
+ nax is an ambitious orchestrator with a capable plugin system, well-structured pipeline architecture, and solid test suite (2069 tests, 0 failures). However, the codebase has accumulated significant technical debt: pervasive use of forbidden Node.js APIs (readFileSync, setTimeout, console.log), multiple mutation-of-caller-data bugs, race conditions in the parallel executor, and two unvalidated dynamic import paths that allow arbitrary code execution. The project's own conventions (.claude/rules/) are clear and well-documented, but compliance is inconsistent -- roughly 40% of source files violate at least one convention. The architecture (Runner -> Pipeline -> Stages) is sound, but execution modules have grown complex with duplicated logic and files exceeding the 400-line limit.
14
+
15
+ | Dimension | Score | Notes |
16
+ |:---|:---|:---|
17
+ | **Security** | 12/20 | Two unvalidated dynamic imports (SEC-1, SEC-2), shell injection vectors (SEC-3, SEC-4), --dangerously-skip-permissions hardcoded (SEC-5) |
18
+ | **Reliability** | 13/20 | Race condition in parallel executor (BUG-1), infinite PTY respawn (BUG-2), story duration miscalculation (BUG-3), unguarded array access (BUG-4), pipe buffer deadlock (MEM-1) |
19
+ | **API Design** | 16/20 | Clean plugin interfaces, well-typed pipeline stages. Duplicate RoutingDecision interface (TYPE-1), missing strategy in validator (BUG-12) |
20
+ | **Code Quality** | 15/20 | Good test coverage, clear architecture docs. 3 files over 400 lines, dead RunLifecycle class (312 LOC), tryLlmBatchRoute duplicated 3x |
21
+ | **Best Practices** | 16/20 | Strong .claude/rules, good test structure. ~30 forbidden-pattern violations (Node.js APIs, console.log, setTimeout, emojis) |
22
+
23
+ ---
24
+
25
+ ## Findings
26
+
27
+ ### CRITICAL
28
+
29
+ #### SEC-1: Unvalidated dynamic import in plugin loader
30
+ **Severity:** CRITICAL | **Category:** Security
31
+ **File:** `src/plugins/loader.ts:237`
32
+ ```typescript
33
+ const imported = await import(modulePath);
34
+ ```
35
+ `modulePath` comes from user config (`config.plugins[].module`) and is passed to `import()` with no path validation, allowlisting, or sandboxing. An attacker who can modify the project config can execute arbitrary code.
36
+ **Risk:** Remote code execution via malicious plugin config entry.
37
+ **Fix:** Validate `modulePath` against an allowlist of directories (global plugins dir, project plugins dir). Reject absolute paths outside these roots and any path containing `..`.
38
+
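The allowlist check that the SEC-1 fix describes might look roughly like the sketch below. It is an illustration under stated assumptions, not the loader's actual code: `isAllowedModulePath` and the example root directories are hypothetical, paths are assumed to be absolute and POSIX-style, and symlinks are not resolved.

```typescript
// Hypothetical helper, not nax's actual loader code.
// Assumes absolute POSIX-style paths; symlinks are not resolved here.
function isAllowedModulePath(modulePath: string, allowedRoots: string[]): boolean {
  // Reject relative paths and bare specifiers outright.
  if (!modulePath.startsWith("/")) return false;
  // Reject any traversal segment instead of trying to normalize it away.
  if (modulePath.split("/").indexOf("..") !== -1) return false;
  // Accept only paths strictly below one of the allowed roots.
  return allowedRoots.some((root) => {
    const base = root.endsWith("/") ? root : `${root}/`;
    return modulePath.startsWith(base);
  });
}

// Assumed locations for the global and project plugin directories.
const pluginRoots = ["/home/user/.nax/plugins", "/work/project/nax/plugins"];
const accepted = isAllowedModulePath("/work/project/nax/plugins/lint.ts", pluginRoots);
const rejected = isAllowedModulePath("/work/project/nax/plugins/../../../etc/passwd", pluginRoots);
```

A check of this kind would run before the `import(modulePath)` call, so that a rejected path never reaches the dynamic import.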
39
+ #### SEC-2: Unvalidated dynamic import in routing loader
40
+ **Severity:** CRITICAL | **Category:** Security
41
+ **File:** `src/routing/loader.ts:27-51`
42
+ ```typescript
43
+ const module = await import(absolutePath);
44
+ ```
45
+ `loadCustomStrategy` imports an arbitrary user-provided path with no validation. Same class of vulnerability as SEC-1.
46
+ **Risk:** Remote code execution via malicious routing strategy config.
47
+ **Fix:** Restrict to project-local paths, validate no path traversal.
48
+
49
+ #### BUG-1: Race condition in parallel executor concurrency control
50
+ **Severity:** CRITICAL | **Category:** Bug
51
+ **File:** `src/execution/parallel.ts:183-223`
52
+ The `executeParallelBatch` function uses a mutable array with `Promise.race` to enforce concurrency limits. When a promise resolves, the slot is freed and refilled -- but between the resolution check and the slot assignment, another resolution can also claim a slot, allowing more concurrent executions than the configured limit.
53
+ **Risk:** Exceeds configured concurrency, spawning more agent processes than intended. Can overload the system and cause OOM or rate-limit failures.
54
+ **Fix:** Replace with a proper semaphore/mutex pattern or use a tested concurrency limiter (e.g., p-limit).
55
+
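For reference, the semaphore pattern named in the BUG-1 fix can be sketched as follows. This is a minimal illustration, not the code in `parallel.ts`: `createLimiter` and `Task` are hypothetical names, and a maintained library such as p-limit (which the fix itself suggests) would normally be preferable.

```typescript
type Task<T> = () => Promise<T>;

// Hypothetical limiter: at most `limit` tasks are in flight at any time.
function createLimiter(limit: number): <T>(task: Task<T>) => Promise<T> {
  let active = 0;
  const waiting: Array<() => void> = [];

  // Called whenever a task settles: free the slot, then start the next queued task.
  const release = (): void => {
    active--;
    const next = waiting.shift();
    if (next) next();
  };

  return function run<T>(task: Task<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const start = (): void => {
        active++; // the slot is claimed synchronously, before the task runs
        // Wrapping in a promise turns a synchronous throw from `task` into a rejection.
        new Promise<T>((res) => res(task())).then(
          (value) => {
            release();
            resolve(value);
          },
          (err) => {
            release();
            reject(err);
          },
        );
      };
      if (active < limit) start();
      else waiting.push(start);
    });
  };
}
```

Because a slot is claimed and released in a single place, the number of running tasks cannot exceed `limit`, regardless of the order in which tasks settle.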
56
+ #### BUG-2: Infinite PTY respawn from object reference dependency
57
+ **Severity:** CRITICAL | **Category:** Bug
58
+ **File:** `src/tui/hooks/usePty.ts:155`
59
+ ```typescript
60
+ }, [options]); // options is an object — new identity every render
61
+ ```
62
+ The `useEffect` dependency is the `options` object, which gets a new identity on every React render. This causes the effect to re-run every render cycle, killing and respawning the PTY process in an infinite loop.
63
+ **Risk:** Infinite process spawn/kill cycle consuming all system resources.
64
+ **Fix:** Memoize `options` with `useMemo` in the parent, or destructure individual primitive deps (`options.command`, `options.cwd`, etc.).
65
+
66
+ ---
67
+
68
+ ### HIGH
69
+
70
+ #### SEC-3: Incomplete shell operator regex in hooks runner
71
+ **Severity:** HIGH | **Category:** Security
72
+ **File:** `src/hooks/runner.ts:111`
73
+ The regex checking for dangerous shell operators omits the backtick character, allowing command substitution via `` `cmd` `` syntax to bypass the safety check.
74
+ **Risk:** Command injection through backtick substitution in hook commands.
75
+ **Fix:** Add backtick to the shell operator regex pattern.
76
+
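As an illustration of the SEC-3 fix, a check that also covers backticks could look like the sketch below. The operator set is an assumption made for the example; the actual pattern in `src/hooks/runner.ts` is not shown in this review and may differ. `hasShellOperator` is a hypothetical name.

```typescript
// Assumed operator set; the real pattern in src/hooks/runner.ts may differ.
// The backtick is included so that `cmd` substitution is caught alongside $(cmd).
const SHELL_OPERATORS = /[|&;<>`]|\$\(/;

// Hypothetical helper: true when the command contains a shell operator.
function hasShellOperator(command: string): boolean {
  return SHELL_OPERATORS.test(command);
}
```

The regex is deliberately not global, so repeated `test()` calls carry no state between commands.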
77
+ #### SEC-4: Command injection via story content in auto plugin
78
+ **Severity:** HIGH | **Category:** Security
79
+ **File:** `src/interaction/plugins/auto.ts`
80
+ User story content is interpolated into CLI arguments without escaping. A story containing shell metacharacters can inject commands.
81
+ **Risk:** Arbitrary command execution if story content is attacker-controlled.
82
+ **Fix:** Use array-form `Bun.spawn()` instead of shell string interpolation. Escape or validate story content before passing as arguments.
83
+
84
+ #### SEC-5: --dangerously-skip-permissions hardcoded
85
+ **Severity:** HIGH | **Category:** Security
86
+ **Files:** `src/acceptance/generator.ts`, `src/acceptance/fix-generator.ts`
87
+ The `--dangerously-skip-permissions` flag is hardcoded in acceptance prompt generators, bypassing the config gate that should control this setting.
88
+ **Risk:** Agent processes always run with elevated permissions regardless of user config.
89
+ **Fix:** Read from config (`quality.dangerouslySkipPermissions`) and only include the flag when explicitly enabled.
90
+
91
+ #### BUG-3: Story duration uses run startTime, not story startTime
92
+ **Severity:** HIGH | **Category:** Bug
93
+ **File:** `src/execution/pipeline-result-handler.ts:101,108,176,194,258`
94
+ All story durations are calculated as `Date.now() - startTime` where `startTime` is the run start time, not the individual story's start. Stories after the first report inflated durations that include all prior stories' execution time.
95
+ **Risk:** Incorrect metrics reporting. Makes performance analysis unreliable.
96
+ **Fix:** Track per-story start times and use those for duration calculation.
97
+
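Per-story timing, as proposed for BUG-3, might be tracked along these lines. The sketch is hypothetical: `createStoryTimer` does not exist in nax, and the clock is injected only so the logic can be exercised without real time passing.

```typescript
// Hypothetical tracker; `now` is injectable so the logic can be tested deterministically.
function createStoryTimer(now: () => number = () => Date.now()) {
  const startedAt = new Map<string, number>();
  return {
    // Record the moment this particular story begins.
    start(storyId: string): void {
      startedAt.set(storyId, now());
    },
    // Duration since this story's own start, not since the start of the run.
    durationMs(storyId: string): number {
      const started = startedAt.get(storyId);
      if (started === undefined) throw new Error(`story ${storyId} was never started`);
      return now() - started;
    },
  };
}
```

With a timer like this, a second story that starts five seconds into the run and finishes two seconds later reports two seconds, rather than the seven seconds a run-relative calculation would give.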
98
+ #### BUG-4: Unguarded prd.userStories[0] access
99
+ **Severity:** HIGH | **Category:** Bug
100
+ **File:** `src/execution/lifecycle/acceptance-loop.ts:172`
101
+ Accesses `prd.userStories[0]` without checking if the array is empty. If no stories remain, this throws a runtime error crashing the acceptance loop.
102
+ **Risk:** Runtime crash on empty story array.
103
+ **Fix:** Guard with `if (prd.userStories.length === 0) return`.
104
+
105
+ #### BUG-5: revertStoriesOnFailure mutates caller's data
106
+ **Severity:** HIGH | **Category:** Bug
107
+ **File:** `src/execution/post-verify-rectification.ts:154-181`
108
+ Uses `splice()` on `opts.allStoryMetrics` and directly mutates `opts.prd` — violating the project's immutability principle. Callers that retain references see unexpectedly modified data.
109
+ **Risk:** Corrupted state propagating to subsequent pipeline stages.
110
+ **Fix:** Return new arrays/objects instead of mutating. Use spread/filter to create copies.
111
+
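A non-mutating variant of the BUG-5 fix could follow the shape below. The `Story`, `Prd`, and `StoryMetric` types and the `revertStories` function are simplified stand-ins invented for this sketch; the real option and PRD shapes in nax are richer.

```typescript
// Simplified, hypothetical shapes for illustration only.
interface Story {
  id: string;
  status: "pending" | "passed" | "failed";
}
interface Prd {
  userStories: Story[];
}
interface StoryMetric {
  storyId: string;
  durationMs: number;
}

// Returns fresh copies; the caller's `prd` and `metrics` are left untouched.
function revertStories(
  prd: Prd,
  metrics: StoryMetric[],
  revertIds: string[],
): { prd: Prd; metrics: StoryMetric[] } {
  const reverted = new Set(revertIds);
  return {
    prd: {
      ...prd,
      userStories: prd.userStories.map((story): Story =>
        reverted.has(story.id) ? { ...story, status: "pending" } : story,
      ),
    },
    // filter() builds a new array, unlike splice(), which edits in place.
    metrics: metrics.filter((metric) => !reverted.has(metric.storyId)),
  };
}
```

Callers would then assign the returned values explicitly, which makes the state change visible at the call site.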
112
+ #### BUG-6: Forbidden Node.js APIs in claude-plan.ts
113
+ **Severity:** HIGH | **Category:** Bug
114
+ **File:** `src/agents/claude-plan.ts`
115
+ Uses `require()`, `readFileSync`, `mkdtempSync`, `rmSync` — all explicitly forbidden by project conventions. These are blocking synchronous calls in an async orchestrator.
116
+ **Risk:** Blocks the event loop during plan execution. Inconsistent with Bun-native runtime.
117
+ **Fix:** Replace with `Bun.file().text()`, `Bun.write()`, `Bun.spawn()`, and async equivalents.
118
+
119
+ #### MEM-1: Unread stderr pipe blocks child process
120
+ **Severity:** HIGH | **Category:** Memory
121
+ **File:** `src/tui/hooks/usePty.ts:99`
122
+ ```typescript
123
+ stderr: "pipe",
124
+ ```
125
+ stderr is set to `"pipe"` but never consumed. When the pipe buffer fills (~64KB), the child process blocks on stderr writes, hanging indefinitely.
126
+ **Risk:** Agent processes hang silently when producing stderr output.
127
+ **Fix:** Either consume stderr (stream it alongside stdout) or use `stderr: "inherit"` to forward to parent.
128
+
129
+ #### MEM-2: Unbounded pendingResponses Map in webhook plugin
130
+ **Severity:** HIGH | **Category:** Memory
131
+ **File:** `src/interaction/plugins/webhook.ts`
132
+ The `pendingResponses` Map grows without bound. If webhooks fail to respond, entries accumulate forever.
133
+ **Risk:** Memory leak proportional to number of unanswered interactions.
134
+ **Fix:** Add TTL-based eviction or max-size cap with LRU eviction.
135
+
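TTL-based eviction for MEM-2 can be approximated with a small wrapper around `Map`, sketched below. `createTtlMap` is a hypothetical helper, not part of the webhook plugin, and it evicts lazily on access rather than on a timer.

```typescript
// Hypothetical wrapper around a Map that drops entries older than `ttlMs`.
function createTtlMap<V>(ttlMs: number, now: () => number = () => Date.now()) {
  const entries = new Map<string, { value: V; expiresAt: number }>();

  // Remove every entry whose deadline has passed.
  const evictExpired = (): void => {
    const current = now();
    entries.forEach((entry, key) => {
      if (entry.expiresAt <= current) entries.delete(key);
    });
  };

  return {
    set(key: string, value: V): void {
      evictExpired();
      entries.set(key, { value, expiresAt: now() + ttlMs });
    },
    get(key: string): V | undefined {
      evictExpired();
      return entries.get(key)?.value;
    },
    size(): number {
      evictExpired();
      return entries.size;
    },
  };
}
```

Lazy eviction keeps the sketch free of timers; a production version might also cap the maximum size, as the fix suggests.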
136
+ ---
137
+
138
+ ### MEDIUM
139
+
140
+ #### BUG-7: readFileSync throughout context/injector.ts
141
+ **Severity:** MEDIUM | **Category:** Bug
142
+ **File:** `src/context/injector.ts:76,103,130,142,177,206`
143
+ Six uses of `readFileSync` — forbidden Node.js API that blocks the event loop.
144
+ **Fix:** Replace with `await Bun.file(path).text()`.
145
+
146
+ #### BUG-8: setTimeout throughout codebase
147
+ **Severity:** MEDIUM | **Category:** Bug
148
+ **Files:** `src/verification/executor.ts:21,92,118`, `src/agents/claude.ts:184`, `src/routing/strategies/llm.ts:78-84`, `src/hooks/runner.ts:215`
149
+ Multiple uses of `setTimeout` instead of `Bun.sleep()` — forbidden pattern.
150
+ **Fix:** Replace with `await Bun.sleep(ms)` where applicable.
151
+
152
+ #### BUG-9: appendFileSync in logger and crash-recovery
153
+ **Severity:** MEDIUM | **Category:** Bug
154
+ **Files:** `src/logger/logger.ts:154`, `src/execution/crash-recovery.ts:67-68,110-111,282-283,346-347`
155
+ Blocking synchronous file writes in hot paths. Logger calls `appendFileSync` on every log line.
156
+ **Fix:** Use `Bun.write()` with append mode, or buffer writes.
157
+
158
+ #### BUG-10: Unsafe raceResult cast in executor
159
+ **Severity:** MEDIUM | **Category:** Bug
160
+ **File:** `src/verification/executor.ts:150`
161
+ ```typescript
162
+ const exitCode = raceResult as number;
163
+ ```
164
+ If `processPromise` resolves with `undefined` (Bun behavior on signal kill), this cast silently produces `NaN` comparisons.
165
+ **Fix:** Add explicit null/undefined check: `const exitCode = typeof raceResult === "number" ? raceResult : 1`.
166
+
167
+ #### BUG-11: test -d shell spawning in test-scanner
168
+ **Severity:** MEDIUM | **Category:** Bug
169
+ **File:** `src/context/test-scanner.ts:171-181`
170
+ Spawns a shell process (`test -d`) to check if a directory exists. Unnecessary and slow.
171
+ **Fix:** Use an async `fs.stat()` call and check `isDirectory()`. `Bun.file(path).exists()` is not a reliable check for directory paths (see BUG-20).
172
+
173
+ #### BUG-12: validateRoutingDecision excludes three-session-tdd-lite
174
+ **Severity:** MEDIUM | **Category:** Bug
175
+ **File:** `src/routing/strategies/llm-prompts.ts:110-116`
176
+ The routing decision validator does not include `"three-session-tdd-lite"` in the valid strategy set, causing this valid strategy to be silently rejected.
177
+ **Fix:** Add `"three-session-tdd-lite"` to the valid strategies array.
178
+
179
+ #### BUG-13: Byte-offset/character-index mismatch in followLogs
180
+ **Severity:** MEDIUM | **Category:** Bug
181
+ **File:** `src/commands/logs.ts:320-323`
182
+ Uses byte offset from `Bun.file().size` but character index for string slicing. Multi-byte UTF-8 content (e.g., emojis, CJK) causes garbled output or missed lines.
183
+ **Fix:** Track byte offsets consistently, or read raw buffers and convert.
184
+
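The mismatch in BUG-13 is easy to reproduce: a string's `length` counts UTF-16 code units, while a file size counts bytes. The sketch below keeps the offset in bytes throughout. `readAppended` is a hypothetical helper that assumes the whole file is already in memory as a `Uint8Array`, and it uses the standard `TextEncoder` and `TextDecoder` globals.

```typescript
// Hypothetical helper: return the text appended after `byteOffset`, plus the new offset.
function readAppended(
  fileBytes: Uint8Array,
  byteOffset: number,
): { text: string; nextOffset: number } {
  const fresh = fileBytes.subarray(byteOffset); // slice in bytes, never in characters
  return { text: new TextDecoder().decode(fresh), nextOffset: fileBytes.length };
}

// "h\u00e9llo \u2713\n" is 8 UTF-16 code units but 11 UTF-8 bytes.
const loggedLine = "h\u00e9llo \u2713\n";
const loggedBytes = new TextEncoder().encode(loggedLine).length;
const wholeFile = new TextEncoder().encode(loggedLine + "next line\n");
const appended = readAppended(wholeFile, loggedBytes);
```

Slicing the decoded string with the byte offset instead (`(loggedLine + "next line\n").slice(loggedBytes)`) would skip the first three characters of the new line, which is the kind of missed output the finding describes.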
185
+ #### BUG-14: auto interaction plugin receive() always throws
186
+ **Severity:** MEDIUM | **Category:** Bug
187
+ **File:** `src/interaction/plugins/auto.ts:66-71`
188
+ The `receive()` method unconditionally throws, making the auto plugin non-functional through the interaction chain for any inbound messages.
189
+ **Fix:** Implement proper receive handling or document the throw as intentional with `@design`.
190
+
191
+ #### BUG-15: savePRD mutates caller's object
192
+ **Severity:** MEDIUM | **Category:** Bug
193
+ **File:** `src/prd/index.ts:62`
194
+ `savePRD` sets `prd.updatedAt` directly, mutating the caller's reference.
195
+ **Fix:** Create a copy before modification: `const updated = { ...prd, updatedAt: new Date().toISOString() }`.
196
+
197
+ #### BUG-16: checkPRDValid mutates story objects
198
+ **Severity:** MEDIUM | **Category:** Bug
199
+ **File:** `src/precheck/checks-blockers.ts:136-140`
200
+ Mutates story objects in-place while a comment says "don't modify the PRD". Also has a triple null check copy-paste bug (`|| testCommand === null || testCommand === null`).
201
+ **Fix:** Use non-mutating approach. Fix duplicate null check.
202
+
203
+ #### BUG-17: usePipelineEvents startTime causes event listener churn
204
+ **Severity:** MEDIUM | **Category:** Bug
205
+ **File:** `src/tui/hooks/usePipelineEvents.ts:71,180`
206
+ `startTime = Date.now()` is computed every render and included in `useEffect` deps, causing the effect to re-run and re-register event listeners on every render.
207
+ **Fix:** Use `useRef` for startTime or compute it once with `useState(() => Date.now())`.
208
+
209
+ #### BUG-18: dispatcher results in completion order, not input order
210
+ **Severity:** MEDIUM | **Category:** Bug
211
+ **File:** `src/worktree/dispatcher.ts`
212
+ `pLimit` returns results in completion order. If callers expect input-order results, story assignments may be mismatched.
213
+ **Fix:** Map results back to input indices, or document completion-order semantics.
214
+
215
+ #### BUG-19: getTotalStories getter captures prd before declaration
216
+ **Severity:** MEDIUM | **Category:** Bug
217
+ **File:** `src/execution/runner.ts:141,152`
218
+ The `getTotalStories` getter closure captures `prd` at line 141, but `prd` is declared at line 152. Due to `let` temporal dead zone rules, accessing the getter before line 152 throws.
219
+ **Fix:** Move the getter definition after `prd` declaration, or restructure.
220
+
221
+ #### BUG-20: Bun.file() on directory path unreliable
222
+ **Severity:** MEDIUM | **Category:** Bug
223
+ **File:** `src/interaction/state.ts:148`
224
+ `Bun.file()` is called on a directory path. Behavior is undefined/unreliable for directories.
225
+ **Fix:** Use `readdir` or check `isDirectory()` first.
226
+
227
+ #### TYPE-1: Duplicate RoutingDecision interface
228
+ **Severity:** MEDIUM | **Category:** Type Safety
229
+ **File:** `src/routing/` (multiple files)
230
+ `RoutingDecision` is defined in two places with subtly different shapes, causing type confusion.
231
+ **Fix:** Consolidate into `src/routing/types.ts` and import from the barrel.
232
+
233
+ #### PERF-1: existsSync blocking calls
234
+ **Severity:** MEDIUM | **Category:** Performance
235
+ **Files:** `src/execution/pid-registry.ts:82,163`, `src/pipeline/stages/gate.ts`, `src/routing/path-security.ts`
236
+ Synchronous file existence checks block the event loop.
237
+ **Fix:** Use async `Bun.file(path).exists()`.
238
+
239
+ ---
240
+
241
+ ### LOW
242
+
243
+ #### STYLE-1: Files exceeding 400-line limit
244
+ **Severity:** LOW | **Category:** Style
245
+ **Files:**
246
+ - `src/execution/parallel.ts` (401 lines)
247
+ - `src/config/types.ts` (443 lines)
248
+ - `src/cli/config.ts` (562 lines)
249
+ **Fix:** Split by logical concern. `config.ts` is 162 lines over limit.
250
+
251
+ #### STYLE-2: Dead code — RunLifecycle class (312 LOC)
252
+ **Severity:** LOW | **Category:** Style
253
+ **File:** `src/execution/lifecycle/run-lifecycle.ts`
254
+ The `RunLifecycle` class duplicates logic now handled by other modules. Not referenced by `runner.ts`.
255
+ **Fix:** Delete the file after confirming no other references.
256
+
257
+ #### STYLE-3: tryLlmBatchRoute duplicated in 3 locations
258
+ **Severity:** LOW | **Category:** Style
259
+ **Files:** `src/execution/runner.ts`, `src/execution/parallel.ts`, `src/execution/sequential-executor.ts`
260
+ Same function copy-pasted across three files.
261
+ **Fix:** Extract to `src/routing/batch-route.ts` and import from barrel.
262
+
263
+ #### STYLE-4: console.log/console.error in source files
264
+ **Severity:** LOW | **Category:** Style
265
+ **Files:** `src/logger/logger.ts:70,116,156`, `src/execution/crash-recovery.ts:70,114,134`, `src/plugins/loader.ts:22`, `src/cli/config.ts` (throughout), `src/precheck/index.ts` (throughout), `src/review/runner.ts`, `src/optimizer/index.ts`, `src/tui/features/status-features.ts`
266
+ Forbidden pattern per `.claude/rules/04-forbidden-patterns.md`.
267
+ **Fix:** Replace with project logger (`src/logger`).
268
+
269
+ #### STYLE-5: Emojis in source code
270
+ **Severity:** LOW | **Category:** Style
271
+ **Files:** `src/logging/types.ts:15-36` (EMOJI constant), `src/tui/components/StoriesPanel.tsx`, `src/interaction/plugins/telegram.ts`, `src/acceptance/generator.ts`
272
+ Violates no-emoji convention.
273
+ **Fix:** Replace with text markers `[OK]`, `[WARN]`, `[FAIL]`, `->`.
274
+
275
+ #### STYLE-6: Internal path imports instead of barrels
276
+ **Severity:** LOW | **Category:** Style
277
+ **Files:** `src/pipeline/stages/verify.ts`, `src/pipeline/stages/routing.ts`, `src/context/test-scanner.ts`, several others
278
+ Importing from internal paths (`src/routing/router`) instead of barrels (`src/routing`). Risks singleton fragmentation in Bun's module registry (BUG-035).
279
+ **Fix:** Import from barrel `index.ts` files.
280
+
281
+ #### STYLE-7: .js extension imports
282
+ **Severity:** LOW | **Category:** Style
283
+ **File:** `src/verification/optimizer.ts`
284
+ Uses `.js` file extension in imports, inconsistent with the rest of the codebase which uses extensionless imports.
285
+ **Fix:** Remove `.js` extensions.
286
+
287
+ #### STYLE-8: Mixed Node.js fs APIs in execution/
288
+ **Severity:** LOW | **Category:** Style
289
+ **Files:** `src/execution/lock.ts` (openSync/writeSync/closeSync), `src/execution/progress.ts` (mkdirSync), `src/execution/queue-handler.ts` (mv/rm subprocess)
290
+ Various forbidden Node.js APIs scattered through execution modules.
291
+ **Fix:** Replace with Bun-native equivalents.
292
+
293
+ #### ENH-1: Unbounded testOutput in acceptance prompts
294
+ **Severity:** LOW | **Category:** Enhancement
295
+ **Files:** `src/acceptance/generator.ts`, `src/acceptance/fix-generator.ts`
296
+ Full test output is injected into prompts with no truncation. Large test suites produce prompts that exceed model context limits.
297
+ **Fix:** Truncate test output to a configurable max (e.g., last 200 lines).
298
+
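Tail truncation as suggested for ENH-1 could be as small as the sketch below. `truncateTestOutput` and the wording of the marker line are hypothetical; the default of 200 lines comes from the example in the fix.

```typescript
// Hypothetical helper: keep only the last `maxLines` lines of test output.
function truncateTestOutput(output: string, maxLines = 200): string {
  const lines = output.split("\n");
  if (lines.length <= maxLines) return output;
  const omitted = lines.length - maxLines;
  // A marker line tells the model that earlier output was dropped.
  const marker = `[... ${omitted} earlier lines omitted ...]`;
  return [marker, ...lines.slice(lines.length - maxLines)].join("\n");
}
```

Keeping the tail rather than the head reflects the assumption that test runners usually print failures and the summary last.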
299
+ #### ENH-2: costPerMinute returns undefined for non-standard tiers
300
+ **Severity:** LOW | **Category:** Enhancement
301
+ **File:** `src/routing/strategies/`
302
+ The cost calculation function returns `undefined` for tiers not in its lookup table, causing NaN in downstream math.
303
+ **Fix:** Default to 0 or throw for unknown tiers.
304
+
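Both options in the ENH-2 fix (default or throw) can be expressed in one function, as in the sketch below. The tier names and rates are placeholders invented for the example, not nax's real pricing table, and the optional `fallback` parameter is an assumption.

```typescript
// Placeholder tiers and rates for illustration only; not nax's real pricing table.
const RATES = new Map<string, number>([
  ["fast", 0.01],
  ["balanced", 0.05],
  ["powerful", 0.25],
]);

// Returns the rate for a known tier; otherwise the explicit fallback, or throws if none is given.
function costPerMinute(tier: string, fallback?: number): number {
  const rate = RATES.get(tier);
  if (rate !== undefined) return rate;
  if (fallback !== undefined) return fallback;
  throw new Error(`unknown model tier: ${tier}`);
}
```

Either branch guarantees a `number` return, so downstream arithmetic can no longer produce `NaN` from an `undefined` rate.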
305
+ #### ENH-3: decompose() has no timeout
306
+ **Severity:** LOW | **Category:** Enhancement
307
+ **File:** `src/agents/claude.ts:241-283`
308
+ The `decompose()` method calls the LLM with no timeout. A hung API call blocks the orchestrator indefinitely.
309
+ **Fix:** Add configurable timeout wrapping the API call.
310
+
311
+ ---
312
+
313
+ ## Priority Fix Order
314
+
315
+ | Priority | ID | Effort | Description |
316
+ |:---|:---|:---|:---|
317
+ | P0 | SEC-1 | S | Validate plugin loader import paths against allowlist |
318
+ | P0 | SEC-2 | S | Validate routing loader import paths against allowlist |
319
+ | P0 | BUG-1 | M | Replace parallel executor concurrency with proper semaphore |
320
+ | P0 | BUG-2 | S | Fix usePty useEffect deps — memoize or destructure primitives |
321
+ | P1 | SEC-3 | S | Add backtick to shell operator regex |
322
+ | P1 | SEC-4 | S | Use array-form Bun.spawn for story content |
323
+ | P1 | SEC-5 | S | Read --dangerously-skip-permissions from config, not hardcode |
324
+ | P1 | BUG-3 | M | Track per-story start times for duration calculation |
325
+ | P1 | BUG-5 | M | Make revertStoriesOnFailure return copies instead of mutating |
326
+ | P1 | MEM-1 | S | Consume or inherit stderr in usePty |
327
+ | P1 | MEM-2 | S | Add TTL eviction to webhook pendingResponses |
328
+ | P2 | BUG-6 | M | Replace Node.js APIs in claude-plan.ts with Bun-native |
329
+ | P2 | BUG-7 | M | Replace readFileSync in context/injector.ts |
330
+ | P2 | BUG-8 | M | Replace setTimeout with Bun.sleep across codebase |
331
+ | P2 | BUG-9 | M | Replace appendFileSync with async Bun.write |
332
+ | P2 | BUG-10 | S | Add null check for raceResult cast |
333
+ | P2 | BUG-12 | S | Add three-session-tdd-lite to valid strategies |
334
+ | P2 | BUG-13 | S | Fix byte-offset/character-index mismatch |
335
+ | P2 | BUG-17 | S | Use useRef for startTime in usePipelineEvents |
336
+ | P2 | BUG-19 | S | Move getTotalStories after prd declaration |
337
+ | P3 | BUG-4 | S | Guard prd.userStories[0] access |
338
+ | P3 | BUG-11 | S | Replace test -d spawn with an async fs.stat() check |
339
+ | P3 | BUG-14 | S | Fix or document auto plugin receive() throw |
340
+ | P3 | BUG-15 | S | Copy PRD before mutating in savePRD |
341
+ | P3 | BUG-16 | S | Fix mutation and duplicate null check in checkPRDValid |
342
+ | P3 | BUG-18 | S | Document or fix dispatcher result ordering |
343
+ | P3 | BUG-20 | S | Fix Bun.file() on directory path |
344
+ | P3 | TYPE-1 | S | Consolidate duplicate RoutingDecision interface |
345
+ | P3 | PERF-1 | S | Replace existsSync with async equivalents |
346
+ | P4 | STYLE-1 | M | Split 3 files exceeding 400-line limit |
347
+ | P4 | STYLE-2 | S | Delete dead RunLifecycle class |
348
+ | P4 | STYLE-3 | M | Extract tryLlmBatchRoute to shared module |
349
+ | P4 | STYLE-4 | M | Replace console.log/error with project logger |
350
+ | P4 | STYLE-5 | S | Replace emojis with text markers |
351
+ | P4 | STYLE-6 | M | Fix internal path imports to use barrels |
352
+ | P4 | STYLE-7 | S | Remove .js extension imports |
353
+ | P4 | STYLE-8 | M | Replace Node.js fs APIs in execution/ |
354
+ | P5 | ENH-1 | S | Truncate testOutput in acceptance prompts |
355
+ | P5 | ENH-2 | S | Default costPerMinute for unknown tiers |
356
+ | P5 | ENH-3 | S | Add timeout to decompose() |
357
+
358
+ ---
359
+
360
+ ## Summary Statistics
361
+
362
+ | Category | Count | CRITICAL | HIGH | MEDIUM | LOW |
363
+ |:---|:---|:---|:---|:---|:---|
364
+ | Security (SEC) | 5 | 2 | 3 | 0 | 0 |
365
+ | Bug (BUG) | 20 | 2 | 4 | 14 | 0 |
366
+ | Memory (MEM) | 2 | 0 | 2 | 0 | 0 |
367
+ | Type Safety (TYPE) | 1 | 0 | 0 | 1 | 0 |
368
+ | Performance (PERF) | 1 | 0 | 0 | 1 | 0 |
369
+ | Style (STYLE) | 8 | 0 | 0 | 0 | 8 |
370
+ | Enhancement (ENH) | 3 | 0 | 0 | 0 | 3 |
371
+ | **Total** | **40** | **4** | **9** | **16** | **11** |
372
+
373
+ ---
374
+
375
+ ## Methodology
376
+
377
+ - **Review type:** Deep (all source files read systematically)
378
+ - **Checklists applied:** universal.md, node-general.md
379
+ - **Agents used:** 6 parallel Explore agents covering all `src/` directories
380
+ - **Manual passes:** Security (secrets, injection, forbidden patterns), convention compliance
381
+ - **Test verification:** Unit (938 pass, 6 skip), Integration (1035 pass, 4 skip), UI (96 pass, 2 skip)
382
+
383
+ ---
384
+
385
+ ## v0.18.5 Addendum — BUN-001 Migration Gaps & _deps Pattern Assessment
386
+
387
+ *Added: 2026-03-04. Supplements the review above with issues specific to v0.18.5 (node-pty -> Bun.spawn migration) and v0.18.4 (_deps pattern adoption).*
388
+
389
+ ---
390
+
391
+ ### New Findings
392
+
393
+ #### MEM-3: Unread stderr pipe in runInteractive()
394
+ **Severity:** MEDIUM | **Category:** Memory
395
+ **File:** `src/agents/claude.ts:298`
396
+
397
+ Same class as MEM-1 (usePty stderr pipe) but in the `runInteractive()` method. `stderr: "pipe"` is set but never consumed. When the pipe buffer fills (~64 KB), the subprocess blocks on stderr writes, hanging indefinitely.
398
+
399
+ The code comment notes this method is "TUI-only and currently dormant in headless nax runs", which mitigates immediate risk — but the bug activates when TUI mode is re-enabled.
400
+
401
+ **Fix:** Consume stderr in a parallel IIFE (mirror the stdout loop) or change to `stderr: "inherit"`.
402
+
403
+ ---
404
+
405
+ #### BUG-21: Fire-and-forget IIFE in runInteractive() has no error handling
406
+ **Severity:** MEDIUM | **Category:** Bug
407
+ **File:** `src/agents/claude.ts:305-309`
408
+
409
+ ```typescript
410
+ (async () => {
411
+ for await (const chunk of proc.stdout) {
412
+ options.onOutput(Buffer.from(chunk));
413
+ }
414
+ })();
415
+ ```
416
+
417
+ The detached async IIFE has no error handler. A stream error or a throw from `options.onOutput()` becomes an unhandled promise rejection: the caller receives no signal and output silently stops.
418
+
419
+ **Fix:** Append `.catch((err) => getLogger()?.error("agent", "runInteractive stdout error", { err }))` to the IIFE.
420
+
421
+ ---
422
+
423
+ #### BUG-22: proc.exited.then() missing .catch() in two locations
424
+ **Severity:** LOW | **Category:** Bug
425
+ **Files:** `src/agents/claude.ts:312`, `src/tui/hooks/usePty.ts:131`
426
+
427
+ Both call `.then()` on `proc.exited` with no `.catch()`. If `options.onExit()` throws synchronously, or the setState callback in usePty throws (e.g., during unmount), the rejection is unhandled and surfaces as an uncaught promise rejection.
428
+
429
+ **Fix:** Add `.catch((err) => { /* log */ })` to both `.then()` chains.
430
+
431
+ ---
432
+
433
+ #### STYLE-9: canSpawnPty = false permanently disables all PTY integration tests
434
+ **Severity:** LOW | **Category:** Style
435
+ **File:** `test/ui/tui-pty-integration.test.tsx`
436
+
437
+ ```typescript
438
+ const canSpawnPty = false; // BUN-001: no PTY — preserved for future re-enablement
439
+ ```
440
+
441
+ All PTY lifecycle tests (spawn, write, resize, kill, exit) are permanently skipped via a hardcoded `false`. There is no issue reference, no `test.todo()`, and no environment gate. PTY integration test coverage is zero with no path to re-enabling it.
442
+
443
+ **Fix:** Replace with `const canSpawnPty = process.env.RUN_PTY_TESTS === "1"` and file a tracking issue for the test gap.
+
+ ---
+
+ #### ENH-5: _deps pattern covers 2 of 50+ modules — adoption plan needed
+ **Severity:** MEDIUM | **Category:** Enhancement
+ **Files:** `src/verification/smart-runner.ts`, `src/pipeline/stages/verify.ts`
+
+ **Background:** v0.18.4 commit `8d80158` ("refactor: eliminate mock.module() and fix test architecture debt") introduced the `_deps` dependency injection pattern as the correct replacement for `mock.module()` (globally banned in Bun 1.x). The implementation in `smart-runner.ts` and `verify.ts` is clean and correct — originals are captured before each test and restored in `afterEach`.
+
+ **Gap:** ~50+ source modules with non-trivial internal dependencies have no `_deps` export. Tests for those modules must choose between: (a) leaving complex code paths untested, (b) relying on slow integration tests that spawn real processes, or (c) using `mock.module()` in violation of the convention (global ESM registry leak).
+
+ **Impact:** The convention exists in `.claude/rules/04-forbidden-patterns.md` but cannot be enforced without the infrastructure to comply. Any contributor writing a new unit test for an un-`_deps`-ified module faces this conflict.
+
+ **High-priority candidates for _deps rollout:**
+ - `src/routing/strategies/llm.ts` — LLM retry and fallback logic; currently hard to unit test
+ - `src/execution/crash-recovery.ts` — crash detection and recovery; complex branching
+ - `src/plugins/loader.ts` — dynamic import (also SEC-1); needs path validation testable in isolation
+ - `src/agents/claude.ts` — runOnce/runInteractive; timeout and retry logic
+
+ **Fix:** Document adoption steps in `.claude/rules/03-test-writing.md` with a tracking section. Prioritize modules with known test gaps (crash-recovery, routing/llm) in the next minor release.
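For readers unfamiliar with the `_deps` pattern, the sketch below shows its general shape under stated assumptions. The dependency `readChangedFiles`, the function `countChangedFiles`, and the helper `withStubbedDeps` are all hypothetical; the actual exports in `smart-runner.ts` and `verify.ts` may look different.

```typescript
// Hypothetical module under test: internal dependencies are routed through
// a mutable `_deps` object (exported in a real module) instead of being
// called directly, so tests can swap them without mock.module().
const _deps = {
  // Stand-in for a real dependency such as a process spawn or file read.
  readChangedFiles: (): string[] => ["src/real.ts"],
};

function countChangedFiles(): number {
  // Always call through _deps so a test can substitute the dependency.
  return _deps.readChangedFiles().length;
}

// Test-side usage: capture the originals, override, then restore.
function withStubbedDeps<T>(stub: Partial<typeof _deps>, run: () => T): T {
  const original = { ..._deps };
  Object.assign(_deps, stub);
  try {
    return run();
  } finally {
    // Plays the same role as restoring originals in afterEach.
    Object.assign(_deps, original);
  }
}
```

The override only touches a plain object owned by the module, so nothing leaks into the global ESM registry and each test restores exactly what it changed.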
+
+ ---
+
+ ### Updated Priority Matrix (addendum items only)
+
+ | Priority | ID | Effort | Description |
+ |:---|:---|:---|:---|
+ | P1 | MEM-3 | S | Consume or inherit stderr in runInteractive() |
+ | P2 | BUG-21 | S | Add .catch() to stdout IIFE in runInteractive() |
+ | P2 | ENH-5 | L | _deps rollout plan — prioritize crash-recovery, llm strategy, plugin loader |
+ | P4 | BUG-22 | S | Add .catch() to proc.exited.then() in claude.ts and usePty |
+ | P4 | STYLE-9 | S | Gate canSpawnPty on env var, file tracking issue |
+
+ ---
+
+ ### Updated Summary Statistics
+
+ | Category | Count | CRITICAL | HIGH | MEDIUM | LOW |
+ |:---|:---|:---|:---|:---|:---|
+ | Security (SEC) | 5 | 2 | 3 | 0 | 0 |
+ | Bug (BUG) | 22 | 2 | 2 | 13 | 5 |
+ | Memory (MEM) | 3 | 0 | 2 | 1 | 0 |
+ | Type Safety (TYPE) | 1 | 0 | 0 | 1 | 0 |
+ | Performance (PERF) | 1 | 0 | 0 | 1 | 0 |
+ | Style (STYLE) | 9 | 0 | 0 | 0 | 9 |
+ | Enhancement (ENH) | 5 | 0 | 0 | 2 | 3 |
+ | **Total** | **46** | **4** | **7** | **18** | **17** |
+
+ **Overall grade unchanged: C+ (72/100).** The BUN-001 migration in v0.18.5 eliminated the node-pty native build dependency (a concrete improvement) but introduced three new error-handling gaps and permanently skipped all PTY integration tests. The _deps pattern from v0.18.4 is a strong foundation — clean where implemented — but requires a structured rollout to the remaining ~50 modules before it meaningfully reduces test architecture risk.
package/docs/ROADMAP.md CHANGED
@@ -134,27 +134,58 @@
 
  ---
 
- ## v0.19.0 — Hardening & Compliance
+ ## v0.21.0 — Process Reliability & Observability ✅
+
+ **Theme:** Kill orphan processes cleanly, smart-runner precision, test strategy quality
+ **Status:** ✅ Shipped (2026-03-06)
+
+ ### Shipped
+ - [x] **BUG-039 (simple):** Timeouts for review/runner.ts lint/typecheck, git.ts, executor.ts timer leak
+ - [x] **BUG-039 (medium):** runOnce() SIGKILL follow-up + pidRegistry.unregister() in finally; LLM stream drain (stdout/stderr cancel) before proc.kill() on timeout
+ - [x] **FEAT-010:** baseRef tracking — capture HEAD per attempt, `git diff <baseRef>..HEAD` in smart-runner (precise, no cross-story pollution)
+ - [x] **FEAT-011:** Path-only context for oversized files (>10KB) — was silently dropped, now agent gets a path hint
+ - [x] **FEAT-013:** Deprecated `test-after` from auto routing — simple/medium stories now default to `three-session-tdd-lite`
+ - [x] ~~**BUG-041:**~~ Won't fix — superseded by FEAT-010
+ - [x] ~~**FEAT-012:**~~ Won't fix — balanced tier sufficient for test-writer
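The FEAT-010 item above can be illustrated with a rough sketch: build the arguments for a diff scoped to the captured `baseRef`, then parse the file list. `buildDiffArgs` and `parseChangedFiles` are hypothetical helpers, and the `--name-only` flag is an assumption; the roadmap only names the `<baseRef>..HEAD` range.

```typescript
// Hypothetical helpers: build the git arguments for a baseRef-scoped diff
// and parse the resulting file list. Running git is left to the caller.
function buildDiffArgs(baseRef: string): string[] {
  // --name-only is an assumption here; the roadmap only names the range.
  return ["diff", "--name-only", `${baseRef}..HEAD`];
}

function parseChangedFiles(stdout: string): string[] {
  // One path per line; skip blank lines such as the trailing newline.
  return stdout
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}
```

Scoping the range to the HEAD captured at the start of an attempt means files changed by earlier stories fall outside the diff, which is the "no cross-story pollution" property the item describes.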
+
+ ### KIV → v0.22.0 Pipeline Observability
+ - [ ] **BUG-040:** Lint/typecheck auto-repair (re-architecture with BUG-042 + FEAT-014)
+ - [ ] **BUG-042:** Verifier failure capture (re-architecture with BUG-040 + FEAT-014)
+ - [ ] **FEAT-014:** Heartbeat observability (unified pipeline event bus re-architecture)
+
+ ---
+
+ ## v0.20.0 — Verification Architecture v2 ✅
 
  **Theme:** Eliminate duplicate test runs, deferred regression gate, structured escalation context
- **Status:** 🔲 Planned
- **Spec:** [docs/specs/verification-architecture-v2.md](specs/verification-architecture-v2.md) (Phase 2)
+ **Status:** Shipped (2026-03-06)
+ **Spec:** [docs/specs/verification-architecture-v2.md](specs/verification-architecture-v2.md)
+
+ ### Shipped
+ - [x] Pipeline verify stage is single test execution point (Smart Test Runner)
+ - [x] Removed scoped re-test in `post-verify.ts` (duplicate eliminated)
+ - [x] Review stage: typecheck + lint only — `checks: ["typecheck", "lint"]`
+ - [x] Deferred regression gate — `src/execution/lifecycle/run-regression.ts`
+ - [x] Reverse Smart Test Runner mapping: test → source → responsible story
+ - [x] Targeted rectification per story with full failure context
+ - [x] `regressionGate.mode: "deferred" | "per-story" | "disabled"` config
+ - [x] `maxRectificationAttempts` config (default: 2)
+ - [x] BUG-037: verify output shows last 20 lines (failures, not prechecks)
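The two config options listed above could be modelled roughly as follows. This is a sketch under assumptions: grouping both fields in one `RegressionSettings` object and the helper names are hypothetical, while the mode values, the `"deferred"` default (from the earlier v0.19.0 plan line), and the default of 2 attempts come from the roadmap.

```typescript
// Hypothetical shape for the options named in the roadmap; placing both
// fields on a single object is an assumption.
type RegressionGateMode = "deferred" | "per-story" | "disabled";

interface RegressionSettings {
  mode: RegressionGateMode;
  maxRectificationAttempts: number;
}

const REGRESSION_DEFAULTS: RegressionSettings = {
  mode: "deferred", // default stated in the earlier v0.19.0 plan
  maxRectificationAttempts: 2, // default stated in the roadmap
};

function resolveRegressionSettings(
  overrides: Partial<RegressionSettings> = {},
): RegressionSettings {
  // User-supplied values win over defaults.
  return { ...REGRESSION_DEFAULTS, ...overrides };
}

function shouldRunAtRunEnd(settings: RegressionSettings): boolean {
  // Only the deferred mode postpones the full suite to the end of the run.
  return settings.mode === "deferred";
}
```

A string-literal union for `mode` lets the compiler reject unknown values at the call site, which suits a three-state switch like this one.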
+
+ ---
 
- ### Remove Duplicate Test Execution
- - [ ] Pipeline verify stage is the single test execution point (Smart Test Runner)
- - [ ] Remove scoped re-test in `post-verify.ts` (duplicate of pipeline verify)
- - [ ] Review stage runs typecheck + lint only — remove `review.commands.test` execution
+ ## v0.19.0 Hardening & Compliance ✅
 
- ### Deferred Regression Gate
- - [ ] New `src/execution/lifecycle/run-regression.ts` — run full suite once at run-end (not per-story)
- - [ ] Reverse Smart Test Runner mapping: failing test → source file → responsible story
- - [ ] Targeted rectification per responsible story with full failure context
- - [ ] Config: `execution.regressionGate.mode: "deferred" | "per-story" | "disabled"` (default `"deferred"`)
- - [ ] Call deferred regression in `run-completion.ts` before final metrics
+ **Theme:** Security hardening, _deps injection pattern, Node.js API removal
+ **Status:** Shipped (2026-03-04)
+ **Spec:** [docs/specs/verification-architecture-v2.md](specs/verification-architecture-v2.md) (Phase 2)
 
- ### Full Structured Failure Context
- - [ ] `priorFailures` injected into escalated agent prompts via `context/builder.ts`
- - [ ] Reverse file mapping for regression attribution
+ ### Shipped
+ - [x] Pipeline verify stage is the single test execution point (Smart Test Runner)
+ - [x] Remove scoped re-test in `post-verify.ts` (duplicate of pipeline verify)
+ - [x] Review stage runs typecheck + lint only — remove `review.commands.test` execution
+ - [x] `priorFailures` injected into escalated agent prompts via `context/builder.ts`
+ - [x] Reverse file mapping for regression attribution
 
  ### Central Run Registry (carried forward)
  - [ ] `~/.nax/runs/<project>-<feature>-<runId>/` with status.json + events.jsonl symlink
@@ -166,7 +197,8 @@
  | Version | Theme | Date | Details |
  |:---|:---|:---|:---|
  | v0.18.1 | Type Safety + CI Pipeline | 2026-03-03 | 60 TS errors + 12 lint errors fixed, GitLab CI green (1952/56/0) |
- | v0.19.0 | Hardening & Compliance | TBD | SEC-1 to SEC-5, BUG-1, Node.js API removal, _deps rollout |
+ | v0.20.0 | Verification Architecture v2 | 2026-03-06 | Deferred regression gate, remove duplicate tests, BUG-037 |
+ | v0.19.0 | Hardening & Compliance | 2026-03-04 | SEC-1 to SEC-5, BUG-1, Node.js API removal, _deps rollout |
  | v0.18.5 | Bun PTY Migration | 2026-03-04 | BUN-001: node-pty → Bun.spawn, CI cleanup, flaky test fix |
  | v0.18.4 | Routing Stability | 2026-03-04 | BUG-031 keyword drift, BUG-033 LLM retry, pre-commit hook |
  | v0.18.3 | Execution Reliability + Smart Runner | 2026-03-04 | BUG-026/028/029/030/032 + SFC-001/002 + STR-007, all items complete |
@@ -224,6 +256,8 @@
  - [x] **BUG-032:** Routing stage overrides escalated `modelTier` with complexity-derived tier. `src/pipeline/stages/routing.ts:43` always runs `complexityToModelTier(routing.complexity, config)` even when `story.routing.modelTier` was explicitly set by `handleTierEscalation()`. BUG-026 was escalated to `balanced` (logged in iteration header), but `Task classified` shows `modelTier=fast` because `complexityToModelTier("simple", config)` → `"fast"`. Related to BUG-013 (escalation routing not applied) which was marked fixed, but the fix in `applyCachedRouting()` in `pipeline-result-handler.ts:295-310` runs **after** the routing stage — too late. **Location:** `src/pipeline/stages/routing.ts:43`. **Fix:** When `story.routing.modelTier` is explicitly set (by escalation), skip `complexityToModelTier()` and use the cached tier directly. Only derive from complexity when `story.routing.modelTier` is absent.
  - [x] **BUG-033:** LLM routing has no retry on timeout — single attempt with hardcoded 15s default. All 5 LLM routing attempts in the v0.18.3 run timed out at 15s, forcing keyword fallback every time. `src/routing/strategies/llm.ts:63` reads `llmConfig?.timeoutMs ?? 15000` but there's no retry logic — one timeout = immediate fallback. **Location:** `src/routing/strategies/llm.ts:callLlm()`. **Fix:** Add `routing.llm.retries` config (default: 1) with backoff. Also surface `routing.llm.timeoutMs` in `nax config --explain` and consider raising default to 30s for batch routing which processes multiple stories.
 
+ - [ ] **BUG-037:** Test output summary (verify stage) captures precheck boilerplate instead of actual `bun test` failure. **Symptom:** Logs show successful prechecks (Head) instead of failed tests (Tail). **Fix:** Change `Test output preview` log to tail the last 20 lines of output instead of heading the first 10.
+ - [ ] **BUG-038:** `smart-runner` over-matching when global defaults change. **Symptom:** Changing `DEFAULT_CONFIG` matches broad integration tests that fail due to environment/precheck side effects, obscuring targeted results. **Fix:** Refine path mapping to prioritize direct unit tests and exclude known heavy integration tests from default smart-runner matches unless explicitly relevant.
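The BUG-037 fix (tail the output instead of heading it) amounts to a few lines. The helper name `tailLines` is hypothetical; the default of 20 lines follows the fix description.

```typescript
// Hypothetical helper: keep the last `count` lines of test output, where
// failures usually appear, instead of the first lines.
function tailLines(output: string, count = 20): string {
  if (count <= 0) return "";
  const lines = output.split("\n");
  // Drop the single empty entry produced by a final newline.
  if (lines.length > 0 && lines[lines.length - 1] === "") lines.pop();
  return lines.slice(-count).join("\n");
}
```

The explicit `count <= 0` guard matters because `slice(-0)` is the same as `slice(0)` and would return the whole output rather than nothing.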
  ### Features
  - [x] ~~`nax unlock` command~~
  - [x] ~~Constitution file support~~
@@ -247,4 +281,4 @@ Sequential canary → stable: `v0.12.0-canary.0` → `canary.N` → `v0.12.0`
  Canary: `npm publish --tag canary`
  Stable: `npm publish` (latest)
 
- *Last updated: 2026-03-04 (v0.18.3 shipped; v0.18.4: BUG-031/033; v0.19.0: Verification Architecture v2)*
+ *Last updated: 2026-03-06 (v0.21.0 shipped; v0.22.0: Pipeline Observability planned — BUG-040, BUG-042, FEAT-014)*