@nathapp/nax 0.18.2 → 0.18.4

Files changed (58)
  1. package/.claude/rules/01-project-conventions.md +34 -0
  2. package/.claude/rules/02-test-architecture.md +39 -0
  3. package/.claude/rules/03-test-writing.md +58 -0
  4. package/.claude/rules/04-forbidden-patterns.md +29 -0
  5. package/.githooks/pre-commit +13 -0
  6. package/.gitlab-ci.yml +11 -5
  7. package/CHANGELOG.md +9 -0
  8. package/CLAUDE.md +45 -122
  9. package/bun.lock +1 -1
  10. package/bunfig.toml +2 -1
  11. package/docker-compose.test.yml +15 -0
  12. package/docs/ROADMAP.md +83 -14
  13. package/docs/specs/verification-architecture-v2.md +343 -0
  14. package/nax/config.json +7 -7
  15. package/nax/features/v0.18.3-execution-reliability/prd.json +80 -0
  16. package/nax/features/v0.18.3-execution-reliability/progress.txt +3 -0
  17. package/package.json +2 -2
  18. package/src/config/defaults.ts +1 -0
  19. package/src/config/schema.ts +1 -0
  20. package/src/config/schemas.ts +26 -1
  21. package/src/config/types.ts +21 -4
  22. package/src/context/builder.ts +11 -0
  23. package/src/context/elements.ts +38 -1
  24. package/src/execution/escalation/tier-escalation.ts +28 -3
  25. package/src/execution/post-verify-rectification.ts +4 -2
  26. package/src/execution/post-verify.ts +102 -20
  27. package/src/execution/progress.ts +2 -0
  28. package/src/pipeline/stages/execution.ts +10 -2
  29. package/src/pipeline/stages/review.ts +5 -3
  30. package/src/pipeline/stages/routing.ts +28 -9
  31. package/src/pipeline/stages/verify.ts +49 -8
  32. package/src/prd/index.ts +16 -1
  33. package/src/prd/types.ts +33 -0
  34. package/src/routing/strategies/keyword.ts +7 -4
  35. package/src/routing/strategies/llm.ts +45 -4
  36. package/src/verification/gate.ts +2 -1
  37. package/src/verification/smart-runner.ts +68 -0
  38. package/src/verification/types.ts +2 -0
  39. package/test/context/prior-failures.test.ts +462 -0
  40. package/test/execution/structured-failure.test.ts +414 -0
  41. package/test/integration/logger.test.ts +1 -1
  42. package/test/{US-002-orchestrator.test.ts → integration/precheck-orchestrator.test.ts} +3 -3
  43. package/test/integration/review-plugin-integration.test.ts +2 -1
  44. package/test/integration/story-id-in-events.test.ts +1 -1
  45. package/test/unit/config/smart-runner-flag.test.ts +36 -12
  46. package/test/unit/execution/post-verify-regression.test.ts +415 -0
  47. package/test/{execution → unit/execution}/post-verify.test.ts +33 -1
  48. package/test/unit/pipeline/routing-partial-override.test.ts +15 -36
  49. package/test/unit/pipeline/verify-smart-runner.test.ts +8 -6
  50. package/test/unit/prd-get-next-story.test.ts +28 -0
  51. package/test/unit/routing/routing-stability.test.ts +207 -0
  52. package/test/unit/routing.test.ts +102 -0
  53. package/test/unit/storyid-events.test.ts +20 -32
  54. package/test/unit/verification/smart-runner-config.test.ts +162 -0
  55. package/test/unit/verification/smart-runner-discovery.test.ts +353 -0
  56. package/test/TEST_COVERAGE_US001.md +0 -217
  57. package/test/TEST_COVERAGE_US003.md +0 -84
  58. package/test/TEST_COVERAGE_US005.md +0 -86
@@ -0,0 +1,34 @@
1
+ # Project Conventions
2
+
3
+ ## Language & Runtime
4
+
5
+ - **Bun-native only.** Use `Bun.file()`, `Bun.write()`, `Bun.spawn()`, `Bun.sleep()`. Never use Node.js equivalents (`fs.readFile`, `child_process.spawn`, `setTimeout` for delays).
6
+ - TypeScript strict mode. No `any` unless unavoidable (document why).
7
+ - Target: Bun 1.3.7+.
8
+
9
+ ## File Size
10
+
11
+ - **400-line hard limit** for all source and test files.
12
+ - If a file approaches 400 lines, split it before adding more code.
13
+ - Split by logical concern (one function/class per file when possible).
14
+
15
+ ## Module Structure
16
+
17
+ - Every directory with 2+ exports gets a barrel `index.ts`.
18
+ - Types go in `types.ts` per module directory.
19
+ - Import from barrels (`src/routing`), **never from internal paths** (`src/routing/router`). This prevents singleton fragmentation in Bun's module registry.
20
+
21
+ ## Logging
22
+
23
+ - Use the project logger (`src/logger`). Never use `console.log` / `console.error` in source code.
24
+ - Log format: no emojis. Use `[OK]`, `[WARN]`, `[FAIL]`, `->`. Machine-parseable.
25
+
26
+ ## Commits
27
+
28
+ - Conventional commits: `feat:`, `fix:`, `refactor:`, `test:`, `docs:`, `chore:`.
29
+ - Atomic — one logical change per commit.
30
+ - Never include `[run-release]` unless explicitly told to.
31
+
32
+ ## Formatting
33
+
34
+ - Biome handles formatting and linting. Run `bun run lint` before committing.
@@ -0,0 +1,39 @@
1
+ # Test Architecture
2
+
3
+ ## Directory Structure
4
+
5
+ Tests **must** mirror the `src/` directory structure:
6
+
7
+ ```
8
+ src/routing/strategies/foo.ts → test/unit/routing/strategies/foo.test.ts
9
+ src/execution/runner.ts → test/unit/execution/runner.test.ts
10
+ src/pipeline/stages/verify.ts → test/unit/pipeline/stages/verify.test.ts
11
+ src/verification/smart-runner.ts → test/unit/verification/smart-runner.test.ts
12
+ ```
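The mirroring rule above can be expressed as a small path transformation. The following is a minimal sketch only; `toUnitTestPath` is a hypothetical helper name and is not part of the package, and it assumes every source file lives under `src/` with a `.ts` extension.

```typescript
// Hypothetical helper (not part of nax): maps a source path to the
// unit-test path that mirrors it under test/unit/.
function toUnitTestPath(srcPath: string): string {
  if (!srcPath.startsWith("src/") || !srcPath.endsWith(".ts")) {
    throw new Error(`Not a TypeScript source path under src/: ${srcPath}`);
  }
  // Drop the leading "src/" and the trailing ".ts", keep the directory layout.
  const relative = srcPath.slice("src/".length, -".ts".length);
  return `test/unit/${relative}.test.ts`;
}

const mirrored = toUnitTestPath("src/routing/strategies/foo.ts");
```

A check like this could be used in a lint step to flag test files that sit outside the mirrored location, although the diff does not say that such a step exists.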
13
+
14
+ ## Test Categories
15
+
16
+ | Category | Location | Purpose |
17
+ |:---|:---|:---|
18
+ | Unit | `test/unit/<mirror-of-src>/` | Test individual functions/classes in isolation |
19
+ | Integration | `test/integration/<feature>.test.ts` | Test multiple modules working together |
20
+ | UI | `test/ui/` | TUI component tests |
21
+
22
+ ## Placement Rules
23
+
24
+ 1. **Never create test files in `test/` root.** Always place in the appropriate subdirectory.
25
+ 2. **Never create standalone bug-fix test files** like `test/execution/post-verify-bug026.test.ts`. Add tests to the existing relevant test file instead. If the relevant file would exceed 400 lines, split the file by describe block — not by bug number.
26
+ 3. **Never create `TEST_COVERAGE_*.md` or documentation files in `test/`.** Put docs in `docs/`.
27
+ 4. **Unit test directories must exist under `test/unit/`**, mirroring `src/`. Do not create top-level test directories like `test/execution/` or `test/context/` — use `test/unit/execution/` and `test/unit/context/`.
28
+
29
+ ## File Naming
30
+
31
+ - Test files: `<source-file-name>.test.ts` — must match the source file name exactly.
32
+ - One test file per source file (for unit tests).
33
+ - If a test file needs splitting, split by describe block into `<module>-<concern>.test.ts`.
34
+
35
+ ## Temp Files & Fixtures
36
+
37
+ - Use `mkdtempSync(join(tmpdir(), "nax-test-"))` for temporary directories.
38
+ - Clean up in `afterAll()` — never leave files in `test/tmp/`.
39
+ - Integration tests needing git: always `git init` + `git add .` + `git commit` in the temp fixture before testing.
@@ -0,0 +1,58 @@
1
+ # Test Writing Rules
2
+
3
+ ## Mocking
4
+
5
+ ### Never use `mock.module()`
6
+
7
+ `mock.module()` in Bun 1.x is **globally scoped and leaks between test files**. It poisons the ESM module registry for the entire test run. `mock.restore()` does NOT undo `mock.module()` overrides.
8
+
9
+ **Instead, use dependency injection:**
10
+
11
+ ```typescript
12
+ // In source file: export a swappable deps object
13
+ export const _deps = {
14
+ readConfig: () => loadConfig(),
15
+ runCommand: (cmd: string) => Bun.spawn(cmd.split(" ")),
16
+ };
17
+
18
+ // In test file: override _deps directly
19
+ import { _deps } from "src/mymodule";
20
+
21
+ beforeEach(() => {
22
+ _deps.readConfig = mock(() => fakeConfig);
23
+ });
24
+
25
+ afterEach(() => {
26
+ mock.restore(); // restores mock() spies (NOT mock.module)
27
+ _deps.readConfig = originalReadConfig;
28
+ });
29
+ ```
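Note that the snippet above restores `originalReadConfig` without showing where it is captured. A minimal, framework-free sketch of the full save, override, restore cycle might look like the following. The `Config` type, `describeRetries`, and `withFakeConfig` are hypothetical names used for illustration; in a real test the override and restore would sit in `beforeEach` and `afterEach`.

```typescript
// Source-module side: a swappable dependency object (the `_deps` pattern).
type Config = { maxRetries: number };

const _deps = {
  // Stand-in for the real config loader (assumption for this sketch).
  readConfig: (): Config => ({ maxRetries: 3 }),
};

function describeRetries(): string {
  return `retries=${_deps.readConfig().maxRetries}`;
}

// Test side: capture the original reference BEFORE overriding it.
const originalReadConfig = _deps.readConfig;

function withFakeConfig<T>(fake: Config, body: () => T): T {
  _deps.readConfig = () => fake;
  try {
    return body();
  } finally {
    // Always restore, even if the body throws.
    _deps.readConfig = originalReadConfig;
  }
}

const during = withFakeConfig({ maxRetries: 0 }, () => describeRetries());
const after = describeRetries();
```

Because the override is a plain property assignment on an exported object, it stays local to whoever holds the reference and is undone by assigning the saved function back.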
30
+
31
+ ### General Mocking Rules
32
+
33
+ - Always call `mock.restore()` in `afterEach()`.
34
+ - Use `mock()` (function-level) freely — it's properly scoped.
35
+ - Never rely on test file execution order. Each file must be independently runnable.
36
+ - Store original function references before overriding `_deps` and restore in `afterEach`.
37
+
38
+ ## CI Compatibility
39
+
40
+ - Tests requiring the `claude` binary: guard with `const skipInCI = process.env.CI ? test.skip : test;`
41
+ - Tests requiring specific OS features: guard with platform checks.
42
+ - Never send real signals (`process.kill`) — mock `process.on()` instead.
43
+
44
+ ## Spawning & Subprocesses
45
+
46
+ - Never spawn full `nax` processes in tests — prechecks fail in temp dirs.
47
+ - Wrap `Bun.spawn()` in try/catch — throws `ENOENT` for missing binaries (not a failed exit code).
48
+
49
+ ## Test Structure
50
+
51
+ - One `describe()` block per source function or class being tested.
52
+ - Keep test files under 400 lines. Split by `describe()` block if needed.
53
+ - Use `test/helpers/` for shared mock factories and fixtures. Don't copy-paste mocking setup between files.
54
+
55
+ ## Imports
56
+
57
+ - **Import from barrels** (`src/routing`), not internal paths (`src/routing/router`).
58
+ - This matches the project convention and prevents Bun singleton fragmentation where the same module loaded via two different paths creates two separate instances.
@@ -0,0 +1,29 @@
1
+ # Forbidden Patterns
2
+
3
+ These patterns are **banned** from the nax codebase. Violations must be caught during implementation, not after.
4
+
5
+ ## Source Code
6
+
7
+ | ❌ Forbidden | ✅ Use Instead | Why |
8
+ |:---|:---|:---|
9
+ | `mock.module()` | Dependency injection (`_deps` pattern) | Leaks globally in Bun 1.x, poisons other test files |
10
+ | `console.log` / `console.error` in src/ | Project logger (`src/logger`) | Unstructured output breaks test capture and log parsing |
11
+ | `fs.readFileSync` / `fs.writeFileSync` | `Bun.file()` / `Bun.write()` | Bun-native project — no Node.js file APIs |
12
+ | `child_process.spawn` / `child_process.exec` | `Bun.spawn()` / `Bun.spawnSync()` | Bun-native project — no Node.js process APIs |
13
+ | `setTimeout` / `setInterval` for delays | `Bun.sleep()` | Bun-native equivalent |
14
+ | Hardcoded timeouts in logic | Config values from schema | Hardcoded values can't be tuned per-environment |
15
+ | `import from "src/module/internal-file"` | `import from "src/module"` (barrel) | Prevents singleton fragmentation (BUG-035) |
16
+ | Files > 400 lines | Split by concern | Unmaintainable; violates project convention |
17
+
18
+ ## Test Files
19
+
20
+ | ❌ Forbidden | ✅ Use Instead | Why |
21
+ |:---|:---|:---|
22
+ | Test files in `test/` root | `test/unit/`, `test/integration/`, etc. | Orphaned files with no clear ownership |
23
+ | Standalone bug-fix test files (`*-bug026.test.ts`) | Add to existing relevant test file | Fragments test coverage, creates ownership confusion |
24
+ | `TEST_COVERAGE_*.md` in test/ | `docs/` directory | Test dir is for test code only |
25
+ | `rm -rf` in test cleanup | `mkdtempSync` + OS temp dir | Accidental deletion risk |
26
+ | Tests depending on alphabetical file execution order | Independent, self-contained test files | Cross-file coupling causes phantom failures |
27
+ | Copy-pasted mock setup across files | `test/helpers/` shared factories | DRY; single place to update when interfaces change |
28
+ | Spawning full `nax` process in tests | Mock the relevant module | Prechecks fail in temp dirs; slow; flaky |
29
+ | Real signal sending (`process.kill`) | Mock `process.on()` | Can kill the test runner |
@@ -0,0 +1,13 @@
1
+ #!/usr/bin/env bash
2
+ # nax pre-commit hook — runs typecheck + lint
3
+ # Install: git config core.hooksPath .githooks
4
+
5
+ set -e
6
+
7
+ echo "[pre-commit] Running typecheck..."
8
+ bun run typecheck
9
+
10
+ echo "[pre-commit] Running lint..."
11
+ bun run lint
12
+
13
+ echo "[pre-commit] OK"
package/.gitlab-ci.yml CHANGED
@@ -15,9 +15,11 @@ stages:
15
15
  # --- Stage: Test ---
16
16
  test:
17
17
  stage: test
18
- image: nathapp/node-bun:22.21.0-1.3.9-alpine
18
+ image:
19
+ name: nathapp/node-bun:22.21.0-1.3.9-alpine
20
+ pull_policy: if-not-present
19
21
  before_script:
20
- - apk add --no-cache git python3 make g++
22
+ - apk add --no-cache git python3 make g++
21
23
  - git config --global safe.directory '*'
22
24
  - git config --global user.name "CI Runner"
23
25
  - git config --global user.email "ci@nathapp.io"
@@ -32,7 +34,7 @@ test:
32
34
  - bun install --frozen-lockfile --ignore-scripts
33
35
  - bun run typecheck
34
36
  - bun run lint
35
- - NAX_SKIP_PRECHECK=1 bun test test/ --timeout=60000
37
+ - bun run test:unit
36
38
  rules:
37
39
  - if: '$CI_COMMIT_MESSAGE =~ /release-by-bot/ || $CI_COMMIT_TAG'
38
40
  when: never
@@ -43,7 +45,9 @@ test:
43
45
  # --- Stage: Release ---
44
46
  release:
45
47
  stage: release
46
- image: nathapp/node-bun:22.21.0-1.3.9-alpine
48
+ image:
49
+ name: nathapp/node-bun:22.21.0-1.3.9-alpine
50
+ pull_policy: if-not-present
47
51
  cache:
48
52
  key:
49
53
  files:
@@ -80,7 +84,9 @@ release:
80
84
  # --- Stage: Notify ---
81
85
  notify:
82
86
  stage: notify
83
- image: registry-intl.cn-hongkong.aliyuncs.com/gkci/node:22.14.0-alpine-ci
87
+ image:
88
+ name: registry-intl.cn-hongkong.aliyuncs.com/gkci/node:22.14.0-alpine-ci
89
+ pull_policy: if-not-present
84
90
  needs: [release]
85
91
  script:
86
92
  - VERSION=$(node -e "console.log(require('./package.json').version)")
package/CHANGELOG.md CHANGED
@@ -5,6 +5,15 @@ All notable changes to this project will be documented in this file.
5
5
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
6
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
7
 
8
+ ## [0.18.4] - 2026-03-04
9
+
10
+ ### Fixed
11
+ - **BUG-031:** Keyword classifier no longer drifts across retries — `description` excluded from complexity/strategy classification (only `title`, `acceptanceCriteria`, `tags` used). Prevents prior error context from upgrading story complexity mid-run.
12
+ - **BUG-033:** LLM routing now retries on timeout/transient failure. New config: `routing.llm.retries` (default: 1), `routing.llm.retryDelayMs` (default: 1000ms). Default timeout raised from 15s to 30s.
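To illustrate the BUG-031 fix, here is a minimal sketch of a keyword classifier whose input is built only from `title`, `acceptanceCriteria`, and `tags`. The `Story` shape, the keyword list, and the function names are assumptions made for this example and are not taken from `keyword.ts`.

```typescript
interface Story {
  title: string;
  description: string;
  acceptanceCriteria: string[];
  tags: string[];
}

// Hypothetical keyword list, for illustration only.
const COMPLEX_KEYWORDS = ["security", "migration", "concurrency"];

// Only stable, author-written fields feed the classifier;
// `description` is deliberately left out.
function classificationText(story: Story): string {
  return [story.title, ...story.acceptanceCriteria, ...story.tags]
    .join(" ")
    .toLowerCase();
}

function classify(story: Story): "simple" | "complex" {
  const text = classificationText(story);
  return COMPLEX_KEYWORDS.some((k) => text.includes(k)) ? "complex" : "simple";
}

const story: Story = {
  title: "Add a --verbose flag",
  description: "Print extra output",
  acceptanceCriteria: ["Flag appears in --help"],
  tags: ["cli"],
};

// On retry, error context is appended to the description only.
const retried: Story = {
  ...story,
  description: story.description + "\nPrior error: migration failed due to concurrency issue",
};

const firstPass = classify(story);
const secondPass = classify(retried);
```

Since the appended error text never reaches `classificationText`, both passes yield the same classification, which is the stability property the changelog entry describes.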
13
+
14
+ ### Added
15
+ - Pre-commit hook (`.githooks/pre-commit`) — runs `typecheck` + `lint` before every commit. Install with: `git config core.hooksPath .githooks`
16
+
8
17
  ## [0.10.0] - 2026-02-23
9
18
 
10
19
  ### Added
package/CLAUDE.md CHANGED
@@ -1,10 +1,9 @@
1
1
  # nax — AI Coding Agent Orchestrator
2
2
 
3
- Bun + TypeScript CLI that orchestrates AI coding agents with model routing, three-session TDD, and lifecycle hooks.
3
+ Bun + TypeScript CLI that orchestrates AI coding agents with model routing, TDD strategies, and lifecycle hooks.
4
4
 
5
5
  ## Git Identity
6
6
 
7
- Always set before committing:
8
7
  ```bash
9
8
  git config user.name "subrina.tai"
10
9
  git config user.email "subrina8080@outlook.com"
@@ -12,148 +11,72 @@ git config user.email "subrina8080@outlook.com"
12
11
 
13
12
  ## Commands
14
13
 
15
- - Test: `bun test`
16
- - Typecheck: `bun run typecheck`
17
- - Lint: `bun run lint`
18
- - Dev: `bun run dev`
19
- - Build: `bun run build`
20
- - Run before commit: `bun test && bun run typecheck`
21
-
22
- ## Code Style
23
-
24
- - Bun-native APIs only (Bun.file, Bun.write, Bun.spawn, Bun.sleep) — no Node.js equivalents
25
- - Functional style for pure logic; classes only for stateful adapters (e.g., ClaudeCodeAdapter)
26
- - Types in `types.ts` per module, barrel exports via `index.ts`
27
- - Max ~400 lines per file — split if larger
28
- - Biome for formatting/linting
29
-
30
- ## Testing
31
-
32
- - Framework: `bun:test` (describe/test/expect)
33
- - Unit tests: `test/unit/<module>.test.ts`
34
- - Integration tests: `test/integration/<feature>.test.ts`
35
- - Routing tests: `test/routing/<router>.test.ts`
36
- - UI tests: `test/ui/` (TUI testing, rarely needed)
37
- - All routing, classification, and isolation logic must have unit tests
14
+ ```bash
15
+ bun test # Full test suite
16
+ bun test test/unit/foo.test.ts # Specific file
17
+ bun run typecheck # tsc --noEmit
18
+ bun run lint # Biome
19
+ bun run build # Production build
20
+ bun test && bun run typecheck # Pre-commit check
21
+ ```
38
22
 
39
23
  ## Architecture
40
24
 
41
- ### Execution Flow
42
-
43
25
  ```
44
- Runner.run() [src/execution/runner.ts]
45
- -> loadPlugins() [src/plugins/loader.ts]
46
- -> for each story:
47
- -> Pipeline.execute() [src/pipeline/pipeline.ts]
48
- -> stages: queueCheck -> routing -> constitution -> context -> prompt -> execution -> verify -> review -> completion
49
- -> context stage injects plugin context providers [src/pipeline/stages/context.ts]
50
- -> routing stage checks plugin routers first [src/routing/chain.ts]
51
- -> Reporter.emit() [src/plugins/registry.ts]
52
- -> registry.teardownAll()
26
+ Runner.run() [src/execution/runner.ts — thin orchestrator only]
27
+ loadPlugins()
28
+ for each story:
29
+ Pipeline.execute() [src/pipeline/pipeline.ts]
30
+ stages: queueCheck → routing → constitution → context → prompt
31
+ → execution → verify → review → completion
32
+ Reporter.emit()
33
+ registry.teardownAll()
53
34
  ```
54
35
 
55
36
  ### Key Directories
56
37
 
57
38
  | Directory | Purpose |
58
39
  |:---|:---|
59
- | `src/execution/` | Runner loop, agent adapters (Claude Code), TDD strategies |
60
- | `src/execution/lifecycle/` | (v0.15.0) Lifecycle hooks, startup/teardown orchestration |
61
- | `src/execution/escalation/` | (v0.15.0) Acceptance-loop escalation logic (when agent fails repeatedly) |
62
- | `src/execution/acceptance/` | (v0.15.0) Acceptance-loop iteration logic |
63
- | `src/pipeline/stages/` | Pipeline stages (routing, context, prompt, execution, review, etc.) |
64
- | `src/routing/` | Model routing — tier classification, router chain, plugin routers |
65
- | `src/plugins/` | Plugin system — loader, registry, validator, types |
40
+ | `src/execution/` | Runner loop, agent adapters, TDD strategies |
41
+ | `src/execution/lifecycle/` | Lifecycle hooks, startup/teardown |
42
+ | `src/execution/escalation/` | Escalation logic on repeated failures |
43
+ | `src/execution/acceptance/` | Acceptance-loop iteration |
44
+ | `src/pipeline/stages/` | Pipeline stages |
45
+ | `src/routing/` | Model routing — tier classification, router chain |
46
+ | `src/plugins/` | Plugin system — loader, registry, validator |
66
47
  | `src/config/` | Config schema, loader (layered global + project) |
67
- | `src/verification/` | (planned) Unified test execution, typecheck, lint, acceptance checks |
68
- | `src/agents/adapters/` | Agent integrations (Claude Code, future: Devin, Aider, etc.) |
69
- | `src/cli/` | CLI commands |
70
- | `examples/plugins/` | Sample plugins (console-reporter) |
48
+ | `src/agents/adapters/` | Agent integrations (Claude Code) |
49
+ | `src/cli/` + `src/commands/` | CLI commands (check both locations) |
50
+ | `src/verification/` | Test execution, smart test runner |
51
+ | `src/review/` | Post-verify review (typecheck, lint, plugin reviewers) |
71
52
 
72
- ### Plugin System
73
-
74
- Plugins extend nax via 4 extension points:
53
+ ### Plugin System (4 extension points)
75
54
 
76
55
  | Extension | Interface | Integration Point |
77
56
  |:---|:---|:---|
78
- | **Context Provider** | `IContextProvider` | `src/pipeline/stages/context.ts` — injects context into agent prompts before execution |
79
- | **Reviewer** | `IReviewer` | Pipeline review stage — runs after built-in checks (typecheck/lint/test) |
80
- | **Reporter** | `IReporter` | `src/execution/runner.ts`: receives onRunStart/onStoryComplete/onRunEnd events |
81
- | **Router** | `IRoutingStrategy` | `src/routing/chain.ts` — overrides model routing for specific stories |
82
-
83
- Plugin loading order: global (`~/.nax/plugins/`) -> project (`<workdir>/nax/plugins/`) -> config (`plugins[]` in config.json).
57
+ | Context Provider | `IContextProvider` | `context.ts` stage — injects into prompts |
58
+ | Reviewer | `IReviewer` | Review stage — after built-in checks |
59
+ | Reporter | `IReporter` | Runner — onRunStart/onStoryComplete/onRunEnd |
60
+ | Router | `IRoutingStrategy` | Router chain — overrides model routing |
84
61
 
85
62
  ### Config
86
63
 
87
- - Global: `~/.nax/config.json`
88
- - Project: `<workdir>/nax/config.json`
89
- - Key settings: `execution.contextProviderTokenBudget` (default: 2000), `plugins[]` array
90
-
91
- ## Target Architecture (v0.15.0+)
92
-
93
- ### File Size Hard Limit
94
-
95
- **400 lines maximum per file.** If you are about to exceed it, STOP and split first.
96
-
97
- ### execution/ Module Re-architecture Goal
98
-
99
- Keep `runner.ts` as a **thin orchestrator only**. Extract:
64
+ - Global: `~/.nax/config.json` → Project: `<workdir>/nax/config.json`
65
+ - Schema: `src/config/schema.ts` — no hardcoded flags or credentials
100
66
 
101
- - `sequential-executor.ts` — single-story execution loop
102
- - `parallel-runner.ts` — parallel story execution (future)
103
- - `acceptance-loop.ts` — retry/escalation logic for failed stories
104
- - `reporter-notifier.ts` — plugin event emission (onRunStart, onStoryComplete, onRunEnd)
105
- - `lifecycle/` subdir — startup, teardown, cleanup handlers
106
- - `escalation/` subdir — escalation strategies when acceptance loop fails
67
+ ## Design Principles
107
68
 
108
- **Never add new concerns to `runner.ts`.** New logic goes into a focused sub-module.
109
-
110
- ### verification/ Unified Layer (Planned)
111
-
112
- Do not duplicate test execution logic across pipeline stages. When building new verification features (typecheck, lint, test, acceptance checks), put the logic in `src/verification/` and call from pipeline stages. This prevents scattered test invocations and ensures consistent test result parsing.
113
-
114
- ### Plugin Extension Points
115
-
116
- When adding new agent integrations (e.g., Devin, Aider, Cursor):
117
-
118
- 1. Add adapter class to `src/agents/adapters/<name>.ts`
119
- 2. Register in `src/agents/adapters/index.ts`
120
- 3. Do NOT inline agent logic in `runner.ts` or `claude.ts`
121
-
122
- ### Logging Style
123
-
124
- - No emojis in log messages
125
- - Use `[OK]`, `[WARN]`, `[FAIL]`, `->` instead
126
- - Keep logs machine-parseable
127
-
128
- ### Configuration
129
-
130
- - No hardcoded flags or credentials
131
- - Always read from config schema (`src/config/schema.ts`)
132
- - Validate config at startup
133
-
134
- ### Closure Passing for Long-Lived Handlers
135
-
136
- Pass **closures, not values** to long-lived handlers (crash handlers, heartbeat timers). This ensures handlers always reference the latest state, not stale snapshots.
137
-
138
- ```typescript
139
- // WRONG: Captures stale value
140
- const handler = () => cleanup(currentStory)
141
-
142
- // CORRECT: Closure references latest state
143
- const handler = () => cleanup(() => getCurrentStory())
144
- ```
69
+ - **`runner.ts` is a thin orchestrator.** Never add new concerns — extract into focused sub-modules.
70
+ - **`src/verification/` is the single test execution layer.** Don't duplicate test invocation in pipeline stages.
71
+ - **Closures over values** for long-lived handlers (crash handlers, timers) — prevents stale state capture.
72
+ - **New agent adapters** go in `src/agents/adapters/<name>.ts` — never inline in runner or existing adapters.
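The "closures over values" principle can be shown with two handler factories: one that receives a snapshot and one that receives a getter. This is a minimal sketch; `makeHandlerFromValue`, `makeHandlerFromGetter`, and the `Story` shape are hypothetical and only illustrate the stale-state problem.

```typescript
type Story = { id: string };

let currentStory: Story = { id: "US-001" };

// Receives a snapshot: the handler keeps reporting the story it was created with.
function makeHandlerFromValue(story: Story): () => string {
  return () => `cleanup ${story.id}`;
}

// Receives a getter: the handler reads the latest state each time it runs.
function makeHandlerFromGetter(getStory: () => Story): () => string {
  return () => `cleanup ${getStory().id}`;
}

const staleHandler = makeHandlerFromValue(currentStory);
const freshHandler = makeHandlerFromGetter(() => currentStory);

// The run moves on to the next story after the handlers were registered.
currentStory = { id: "US-002" };

const staleResult = staleHandler();
const freshResult = freshHandler();
```

A crash handler registered once at startup behaves like `staleHandler` if it is given a value, which is why a getter is the safer thing to pass to anything that outlives a single story.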
145
73
 
146
- ## Testing Constraints (CRITICAL)
74
+ ## Rules
147
75
 
148
- - **Never spawn full `nax` processes in tests.** nax has prechecks (git-repo-exists, dependencies-installed) that fail in temp directories. Write unit tests with mocks instead.
149
- - **Integration tests that need git:** Always `git init` + `git add` + `git commit` in the test fixture before running any code that triggers nax precheck validation.
150
- - **Test files for crash/signal handling:** Use process-level mocks (e.g., mock `process.on('SIGTERM', ...)`) — do not send real signals in tests.
151
- - **Context files:** If a test needs specific context files, create them in the test fixture directory — don't rely on auto-detection from the real workspace.
76
+ Detailed coding standards, test architecture, and forbidden patterns are in `.claude/rules/`. Claude Code loads these automatically.
152
77
 
153
78
  ## IMPORTANT
154
79
 
155
- - Never hardcode API keys; agents use their own auth from env
156
- - Agent adapters spawn external processes; always handle timeouts and cleanup
157
- - Conventional commits: `feat:`, `fix:`, `refactor:`, `test:`, `docs:`, `chore:`
158
- - Keep commits atomic — one logical change per commit
159
- - Do NOT push to remote — let the human review and push
80
+ - Do NOT push to remote; let the human review and push.
81
+ - Never hardcode API keys; agents use their own auth from env.
82
+ - Agent adapters spawn external processes; always handle timeouts and cleanup.
package/bun.lock CHANGED
@@ -18,7 +18,7 @@
18
18
  },
19
19
  "devDependencies": {
20
20
  "@biomejs/biome": "^1.9.4",
21
- "@types/bun": "^1.2.4",
21
+ "@types/bun": "^1.3.8",
22
22
  "react-devtools-core": "^7.0.1",
23
23
  "typescript": "^5.7.3",
24
24
  },
package/bunfig.toml CHANGED
@@ -8,4 +8,5 @@ timeout = 30000
8
8
  # Exclude nax dogfood feature directories (contain acceptance tests from nax runs, not source tests)
9
9
  exclude = ["nax/**"]
10
10
 
11
- # Note: E2E tests may override this with longer timeouts using test options
11
+ [test]
12
+ concurrent = 1
@@ -0,0 +1,15 @@
1
+ version: "3.9"
2
+ services:
3
+ app:
4
+ image: nathapp/bun:1.3.8-ci
5
+ working_dir: /app
6
+ volumes:
7
+ - .:/app
8
+ command: >
9
+ sh -c "
10
+ bun install &&
11
+ bun run test:unit
12
+ "
13
+ environment:
14
+ - NAX_SKIP_PRECHECK=1
15
+ - CI=true
package/docs/ROADMAP.md CHANGED
@@ -50,18 +50,18 @@
50
50
 
51
51
  ---
52
52
 
53
- ## v0.18.2 — Smart Test Runner + Bun PTY Migration
53
+ ## v0.18.2 — Smart Test Runner + Routing Fix
54
54
 
55
- **Theme:** Scope verify to changed files only + remove node-pty native addon
56
- **Status:** 🔲 Planned
55
+ **Theme:** Scope verify to changed files only + fix routing override
56
+ **Status:** Shipped (2026-03-03)
57
57
 
58
58
  ### Smart Test Runner
59
- - [ ] After agent implementation, run `git diff --name-only` to get changed source files
60
- - [ ] Map source → test files by naming convention (`src/foo/bar.ts` → `test/unit/foo/bar.test.ts`)
61
- - [ ] Run only related tests for verify (instead of full suite)
62
- - [ ] Fallback to full suite when mapping yields no test files
63
- - [ ] Config flag `execution.smartTestRunner: true` (default: true) to opt out
64
- - [ ] Result: verify drops from ~125s to ~10-20s for typical single-file fixes
59
+ - [x] ~~After agent implementation, run `git diff --name-only` to get changed source files~~
60
+ - [x] ~~Map source → test files by naming convention (`src/foo/bar.ts` → `test/unit/foo/bar.test.ts`)~~
61
+ - [x] ~~Run only related tests for verify (instead of full suite)~~
62
+ - [x] ~~Fallback to full suite when mapping yields no test files~~
63
+ - [x] ~~Config flag `execution.smartTestRunner: true` (default: true) to opt out~~
64
+ - [x] ~~Result: verify drops from ~125s to ~10-20s for typical single-file fixes~~
65
65
 
66
66
  ### Bun PTY Migration (BUN-001)
67
67
  - [ ] Replace `node-pty` (native addon, requires python/make/g++ to build) with `Bun.Terminal` API (v1.3.5+)
@@ -80,12 +80,68 @@
80
80
 
81
81
  ---
82
82
 
83
- ## v0.19.0: Central Run Registry
83
+ ## v0.18.3: Execution Reliability
84
+
85
+ **Theme:** Fix execution pipeline bugs (escalation, routing, review), structured failure context, and Smart Runner enhancement
86
+ **Status:** ✅ Shipped (2026-03-04)
87
+ **Spec:** [docs/specs/verification-architecture-v2.md](specs/verification-architecture-v2.md) (Phase 1)
88
+
89
+ ### Bugfixes — Completed
90
+ - [x] **BUG-026:** Regression gate timeout → accept scoped pass + warn (not escalate). Config: `regressionGate.acceptOnTimeout: true`.
91
+ - [x] **BUG-028:** Routing cache ignores escalation tier — `clearCacheForStory(storyId)` in `llm.ts`, called on tier escalation in both `preIterationTierCheck()` and `handleTierEscalation()`.
92
+
93
+ ### Structured Failure Context — Completed
94
+ - [x] **SFC-001:** `StructuredFailure` type with `TestFailureContext[]` + `priorFailures?: StructuredFailure[]` on `UserStory`. Populated on verify, regression, rectification, and escalation failures.
95
+ - [x] **SFC-002:** Format `priorFailures` into agent prompt at priority 95 via `createPriorFailuresContext()` in `context/builder.ts`.
96
+
97
+ ### Bugfixes — Completed (Round 2)
98
+ - [x] **BUG-029:** Escalation resets story to `pending` → bypasses BUG-022 retry priority. After escalation, `getNextStory()` picks the next pending story instead of retrying the escalated one. **Location:** `src/prd/index.ts:getNextStory()`. **Fix:** Recognize escalated-pending stories in Priority 1 (e.g. check `story.routing.modelTier` changed, or use `"retry-pending"` status).
99
+ - [x] **BUG-030:** Review lint/typecheck failure → hard `"fail"`, no rectification or retry. `review.ts:92` returns `{ action: "fail" }` → `markStoryFailed()` permanently. Lint errors are auto-fixable but story is killed with zero retry. **Fix:** Return `"escalate"` for lint/typecheck failures (or add review-rectification loop). Reserve `"fail"` for plugin reviewer rejection only.
100
+ - [x] **BUG-032:** Routing stage overrides escalated `modelTier` with complexity-derived tier. `routing.ts:43` always runs `complexityToModelTier()` even when `story.routing.modelTier` was set by escalation → escalated tier silently ignored. BUG-013 fix (`applyCachedRouting`) runs too late. **Fix:** Skip `complexityToModelTier()` when `story.routing.modelTier` is explicitly set.
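The BUG-032 fix described above amounts to treating an explicitly set `modelTier` as authoritative and deriving a tier from complexity only when none is set. A minimal sketch follows; the tier names, the `StoryRouting` shape, the mapping inside `complexityToModelTier`, and the `resolveModelTier` wrapper are all assumptions for illustration and do not reproduce `routing.ts`.

```typescript
type Complexity = "simple" | "medium" | "complex";
type ModelTier = "fast" | "balanced" | "powerful"; // hypothetical tier names

interface StoryRouting {
  complexity: Complexity;
  modelTier?: ModelTier; // set explicitly by escalation, otherwise absent
}

// Illustrative mapping only; the real table lives in the package.
function complexityToModelTier(complexity: Complexity): ModelTier {
  switch (complexity) {
    case "simple":
      return "fast";
    case "medium":
      return "balanced";
    case "complex":
      return "powerful";
  }
}

// An explicitly set tier wins; complexity is only a fallback.
function resolveModelTier(routing: StoryRouting): ModelTier {
  return routing.modelTier ?? complexityToModelTier(routing.complexity);
}

const derived = resolveModelTier({ complexity: "simple" });
const escalated = resolveModelTier({ complexity: "simple", modelTier: "powerful" });
```

With this ordering an escalated story keeps its higher tier on the next iteration instead of being silently reset to the tier implied by its complexity.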
101
+
102
+ ### STR-007: Smart Test Runner Enhancement — Completed
103
+ - [x] Configurable `testFilePatterns` in config (default: `test/**/*.test.ts`)
104
+ - [x] `testFileFallback` config option: `"import-grep"` | `"full-suite"` (default: `"import-grep"`)
105
+ - [x] 3-pass test discovery: path-convention → import-grep (grep test files for changed module name) → full-suite
106
+ - [x] Config schema update: `execution.smartTestRunner` becomes object `{ enabled, testFilePatterns, fallback }` (backward compat: boolean coerced)
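The backward-compatible coercion in the last item can be sketched as a small normalizer that accepts either the legacy boolean or the new object form. The defaults below are taken from the list above (`test/**/*.test.ts`, `"import-grep"`, enabled by default); the function name `normalizeSmartTestRunner` and the exact types are hypothetical and may differ from the schema in `src/config/schemas.ts`.

```typescript
type Fallback = "import-grep" | "full-suite";

interface SmartTestRunnerConfig {
  enabled: boolean;
  testFilePatterns: string[];
  fallback: Fallback;
}

const DEFAULTS: SmartTestRunnerConfig = {
  enabled: true,
  testFilePatterns: ["test/**/*.test.ts"],
  fallback: "import-grep",
};

// Accepts the legacy boolean flag or a partial object and returns the full object form.
function normalizeSmartTestRunner(
  input: boolean | Partial<SmartTestRunnerConfig> | undefined,
): SmartTestRunnerConfig {
  const base: SmartTestRunnerConfig = {
    ...DEFAULTS,
    testFilePatterns: [...DEFAULTS.testFilePatterns], // avoid sharing the default array
  };
  if (input === undefined) return base;
  if (typeof input === "boolean") return { ...base, enabled: input };
  return { ...base, ...input };
}

const fromLegacyFlag = normalizeSmartTestRunner(false);
const fromObject = normalizeSmartTestRunner({ fallback: "full-suite" });
```

Normalizing once at load time means the rest of the code can rely on the object form and never needs to branch on `typeof config.execution.smartTestRunner`.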
107
+
108
+ ---
109
+
110
+ ## v0.18.4 — Routing Stability ✅
84
111
 
85
- **Theme:** Unified run tracking across worktrees + dashboard integration
112
+ **Theme:** Fix routing classifier consistency and LLM routing reliability
113
+ **Status:** ✅ Shipped (2026-03-04)
114
+
115
+ ### Bugfixes
116
+ - [x] **BUG-031:** Keyword fallback classifier gives inconsistent strategy across retries for same story. `priorErrors` text shifts keyword classification. **Fix:** Keyword classifier should only use original story fields; or lock `story.routing.testStrategy` once set.
117
+ - [x] **BUG-033:** LLM routing has no retry on timeout — single 15s attempt, then keyword fallback. **Fix:** Add `routing.llm.retries` config (default: 1) with backoff. Raise default timeout to 30s for batch routing.
118
+
119
+ ---
120
+
121
+ ## v0.19.0 — Verification Architecture v2
122
+
123
+ **Theme:** Eliminate duplicate test runs, deferred regression gate, structured escalation context
86
124
  **Status:** 🔲 Planned
125
+ **Spec:** [docs/specs/verification-architecture-v2.md](specs/verification-architecture-v2.md) (Phase 2)
87
126
 
88
- - [ ] **Central Run Registry** — `~/.nax/runs/<project>-<feature>-<runId>/` with status.json + events.jsonl symlink. Dashboard reads from registry.
127
+ ### Remove Duplicate Test Execution
128
+ - [ ] Pipeline verify stage is the single test execution point (Smart Test Runner)
129
+ - [ ] Remove scoped re-test in `post-verify.ts` (duplicate of pipeline verify)
130
+ - [ ] Review stage runs typecheck + lint only — remove `review.commands.test` execution
131
+
132
+ ### Deferred Regression Gate
133
+ - [ ] New `src/execution/lifecycle/run-regression.ts` — run full suite once at run-end (not per-story)
134
+ - [ ] Reverse Smart Test Runner mapping: failing test → source file → responsible story
135
+ - [ ] Targeted rectification per responsible story with full failure context
136
+ - [ ] Config: `execution.regressionGate.mode: "deferred" | "per-story" | "disabled"` (default `"deferred"`)
137
+ - [ ] Call deferred regression in `run-completion.ts` before final metrics
138
+
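The reverse mapping for the deferred regression gate (failing test → source file → responsible story) might be sketched as follows. The `execution.regressionGate.mode` values and default come from the roadmap; the `StoryRecord` shape, the `testToSources` map, and the function names are assumptions for illustration, since `run-regression.ts` is still planned.

```typescript
// Hypothetical sketch of regression attribution; shapes are illustrative.
type RegressionGateMode = "deferred" | "per-story" | "disabled";

interface StoryRecord {
  id: string;
  changedFiles: string[]; // source files the story touched
}

// The full suite runs once at run-end only in "deferred" mode.
function runsAtRunEnd(mode: RegressionGateMode = "deferred"): boolean {
  return mode === "deferred";
}

// Group failing tests by the stories whose changed files they cover.
// testToSources is the Smart Test Runner mapping read in reverse:
// test file -> source files it exercises.
function attributeFailures(
  failingTests: string[],
  testToSources: Map<string, string[]>,
  stories: StoryRecord[],
): Map<string, string[]> {
  const byStory = new Map<string, string[]>();
  for (const test of failingTests) {
    const sources = testToSources.get(test) ?? [];
    for (const story of stories) {
      if (story.changedFiles.some((file) => sources.includes(file))) {
        const tests = byStory.get(story.id) ?? [];
        tests.push(test);
        byStory.set(story.id, tests);
      }
    }
  }
  return byStory;
}
```

Failing tests that map to no story are left out of the result in this sketch; a real gate would need a policy for those, for example reporting them as unattributed.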
139
+ ### Full Structured Failure Context
140
+ - [ ] `priorFailures` injected into escalated agent prompts via `context/builder.ts`
141
+ - [ ] Reverse file mapping for regression attribution
142
+
143
+ ### Central Run Registry (carried forward)
144
+ - [ ] `~/.nax/runs/<project>-<feature>-<runId>/` with status.json + events.jsonl symlink
89
145
 
90
146
  ---
91
147
 
@@ -94,6 +150,9 @@
94
150
  | Version | Theme | Date | Details |
95
151
  |:---|:---|:---|:---|
96
152
  | v0.18.1 | Type Safety + CI Pipeline | 2026-03-03 | 60 TS errors + 12 lint errors fixed, GitLab CI green (1952/56/0) |
153
+ | v0.18.4 | Routing Stability | 2026-03-04 | BUG-031 keyword drift, BUG-033 LLM retry, pre-commit hook |
154
+ | v0.18.3 | Execution Reliability + Smart Runner | 2026-03-04 | BUG-026/028/029/030/032 + SFC-001/002 + STR-007, all items complete |
155
+ | v0.18.2 | Smart Test Runner + Routing Fix | 2026-03-03 | FIX-001 + STR-001–006, 2038 pass/11 skip/0 fail |
97
156
  | v0.18.0 | Orchestration Quality | 2026-03-03 | BUG-016/017/018/019/020/021/022/023/025 all fixed |
98
157
  | v0.17.0 | Config Management | 2026-03-02 | CM-001 --explain, CM-002 --diff, CM-003 default view |
99
158
  | v0.16.4 | Bugfixes: Routing + Env Allowlist | 2026-03-02 | BUG-012/013/014 |
@@ -128,7 +187,9 @@
128
187
  - [x] ~~BUG-012: Greenfield detection ignores pre-existing test files~~
129
188
  - [x] ~~BUG-013: Escalation routing not applied in iterations~~
130
189
  - [x] ~~BUG-014: buildAllowedEnv() strips USER/LOGNAME~~
131
- - [ ] **BUG-015:** `loadConstitution()` leaks global `~/.nax/constitution.md` into unit tests
190
+ - [x] ~~**BUG-015:** `loadConstitution()` leaks global `~/.nax/constitution.md` into unit tests — fixed via `skipGlobal: true` in all unit tests~~
191
+ - [x] ~~**BUG-027:** `runPrecheck()` always prints to stdout — pollutes test output when called programmatically. Shipped in v0.18.2.~~
192
+ - [x] ~~**BUG-028:** Routing cache ignores escalation tier — escalated stories re-run at original tier. Shipped in v0.18.3.~~
132
193
  - [x] ~~**BUG-016:** Hardcoded 120s timeout in pipeline verify stage → fixed in v0.18.0~~
133
194
  - [x] ~~**BUG-017:** run.complete not emitted on SIGTERM → fixed in v0.18.0~~
134
195
  - [x] ~~**BUG-018:** Test-writer wastes ~3min/retry when tests already exist → fixed in v0.18.0~~
@@ -139,12 +200,20 @@
139
200
  - [x] ~~**BUG-023:** Agent failure silent — no exitCode/stderr in JSONL → fixed in v0.18.0~~
140
201
  - [x] ~~**BUG-025:** `needsHumanReview` not triggering interactive plugin → fixed in v0.18.0~~
141
202
 
203
+ - [x] **BUG-029:** Escalation resets story to `pending` → bypasses BUG-022 retry priority. `handleTierEscalation()` sets `status: "pending"` after escalation, but `getNextStory()` Priority 1 only checks `status === "failed"`. Result: after BUG-026 escalated (iter 1), nax moved to BUG-028 (iter 2) instead of retrying BUG-026 immediately. **Location:** `src/prd/index.ts:getNextStory()` + `src/execution/escalation/tier-escalation.ts`. **Fix:** `getNextStory()` should also prioritize stories with `story.routing.modelTier` that changed since last attempt (escalation marker), or `handleTierEscalation` should use a distinct status like `"retry-pending"` that Priority 1 recognizes.
204
+ - [x] **BUG-030:** Review lint failure → hard `"fail"`, no rectification or retry. `src/pipeline/stages/review.ts:92` returns `{ action: "fail" }` for all review failures, including lint. In `pipeline-result-handler.ts`, `"fail"` calls `markStoryFailed()`, which fails the story permanently with no retry. But lint errors are auto-fixable (the agent can run `biome check --fix`). Contrast this with the verify stage, which returns `"escalate"` on test failure and so allows a retry. SFC-001 and SFC-002 both hit this: tests passed, but 5 Biome lint errors killed the stories permanently. **Fix:** Review stage should return `"escalate"` (not `"fail"`) for lint/typecheck failures, or add a review-rectification loop (like verify has) that gives the agent one retry with the lint output as context. Reserve `"fail"` for unfixable review issues (e.g. plugin reviewer rejection).
205
+ - [x] **BUG-031:** Keyword fallback classifier gives inconsistent strategy across retries for same story. BUG-026 was classified as `test-after` on iter 1 (keyword fallback), but `three-session-tdd-lite` on iter 5 (same keyword fallback). The keyword classifier in `src/routing/strategies/keyword.ts:classifyComplexity()` may be influenced by `priorErrors` text added between attempts, shifting the keyword match result. **Location:** `src/routing/strategies/keyword.ts`. **Fix:** Keyword classifier should only consider the story's original title + description + acceptance criteria, not accumulated `priorErrors` or `priorFailures`. Alternatively, once a strategy is set in `story.routing.testStrategy`, the routing stage should preserve it across retries (already partially done in `routing.ts:40-41` but may not apply when LLM falls back to keyword).
206
+ - [x] **BUG-032:** Routing stage overrides escalated `modelTier` with complexity-derived tier. `src/pipeline/stages/routing.ts:43` always runs `complexityToModelTier(routing.complexity, config)` even when `story.routing.modelTier` was explicitly set by `handleTierEscalation()`. BUG-026 was escalated to `balanced` (logged in iteration header), but `Task classified` shows `modelTier=fast` because `complexityToModelTier("simple", config)` → `"fast"`. Related to BUG-013 (escalation routing not applied) which was marked fixed, but the fix in `applyCachedRouting()` in `pipeline-result-handler.ts:295-310` runs **after** the routing stage — too late. **Location:** `src/pipeline/stages/routing.ts:43`. **Fix:** When `story.routing.modelTier` is explicitly set (by escalation), skip `complexityToModelTier()` and use the cached tier directly. Only derive from complexity when `story.routing.modelTier` is absent.
207
+ - [x] **BUG-033:** LLM routing has no retry on timeout: it makes a single attempt with a 15s default timeout. All 5 LLM routing attempts in the v0.18.3 run timed out at 15s, forcing keyword fallback every time. `src/routing/strategies/llm.ts:63` reads `llmConfig?.timeoutMs ?? 15000`, but there is no retry logic, so one timeout means immediate fallback. **Location:** `src/routing/strategies/llm.ts:callLlm()`. **Fix:** Add `routing.llm.retries` config (default: 1) with backoff. Also surface `routing.llm.timeoutMs` in `nax config --explain` and consider raising the default to 30s for batch routing, which processes multiple stories.
208
+
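The fix directions for BUG-031 and BUG-032 above can be illustrated with a small sketch: classify from the story's original fields only, and let an explicitly set tier win over the complexity-derived one. The `Story` shape, the tier and complexity unions beyond the `fast`, `balanced`, and `simple` values named above, and the stand-in for `complexityToModelTier()` are assumptions, not the project's real types.

```typescript
// Hypothetical sketch; types and the tier table are illustrative only.
type ModelTier = "fast" | "balanced";
type Complexity = "simple" | "complex";

interface Story {
  title: string;
  description: string;
  acceptanceCriteria: string[];
  priorErrors?: string[];
  routing?: { modelTier?: ModelTier; testStrategy?: string };
}

// Stand-in for complexityToModelTier(complexity, config).
function tierFromComplexity(complexity: Complexity): ModelTier {
  return complexity === "simple" ? "fast" : "balanced";
}

// BUG-032 direction: an escalated tier on the story takes precedence;
// derive from complexity only when no tier has been set.
function resolveModelTier(story: Story, complexity: Complexity): ModelTier {
  return story.routing?.modelTier ?? tierFromComplexity(complexity);
}

// BUG-031 direction: build the keyword classifier's input from the original
// fields only, so priorErrors added between attempts cannot shift the result.
function keywordClassifierInput(story: Story): string {
  return [story.title, story.description, ...story.acceptanceCriteria]
    .join("\n")
    .toLowerCase();
}
```

Because `keywordClassifierInput()` never reads `priorErrors`, the same story yields the same classifier input on every retry, which is the property the inconsistent `test-after` versus `three-session-tdd-lite` results were missing.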
142
209
  ### Features
143
210
  - [x] ~~`nax unlock` command~~
144
211
  - [x] ~~Constitution file support~~
145
212
  - [x] ~~Per-story testStrategy override — v0.18.1~~
146
213
  - [x] ~~Smart Test Runner — v0.18.2~~
147
214
  - [x] ~~Central Run Registry — v0.19.0~~
215
+ - [ ] **BUN-001:** Bun PTY Migration — replace `node-pty` with `Bun.Terminal` API
216
+ - [ ] **CI-001:** CI Memory Optimization — parallel test sharding for 1GB runners
148
217
  - [ ] Cost tracking dashboard
149
218
  - [ ] npm publish setup
150
219
  - [ ] `nax diagnose --ai` flag (LLM-assisted, future TBD)
@@ -160,4 +229,4 @@ Sequential canary → stable: `v0.12.0-canary.0` → `canary.N` → `v0.12.0`
160
229
  Canary: `npm publish --tag canary`
161
230
  Stable: `npm publish` (latest)
162
231
 
163
- *Last updated: 2026-03-03 (v0.18.0 shipped all 9 bugs fixed)*
232
+ *Last updated: 2026-03-04 (v0.18.3 shipped; v0.18.4: BUG-031/033; v0.19.0: Verification Architecture v2)*