@nathapp/nax 0.28.0 → 0.30.0

Files changed (385)
  1. package/CHANGELOG.md +23 -2
  2. package/bin/nax.ts +2 -3
  3. package/dist/nax.js +72753 -0
  4. package/package.json +11 -3
  5. package/src/cli/analyze.ts +2 -7
  6. package/src/cli/config.ts +3 -1
  7. package/src/config/defaults.ts +1 -0
  8. package/src/config/schemas.ts +1 -0
  9. package/src/config/types.ts +1 -0
  10. package/src/context/builder.ts +10 -1
  11. package/src/execution/lifecycle/headless-formatter.ts +2 -4
  12. package/src/prompts/builder.ts +12 -69
  13. package/src/prompts/sections/isolation.ts +38 -8
  14. package/src/prompts/sections/role-task.ts +79 -17
  15. package/src/review/runner.ts +6 -1
  16. package/src/version.ts +2 -1
  17. package/.claude/rules/01-project-conventions.md +0 -34
  18. package/.claude/rules/02-test-architecture.md +0 -39
  19. package/.claude/rules/03-test-writing.md +0 -58
  20. package/.claude/rules/04-forbidden-patterns.md +0 -29
  21. package/.claude/settings.json +0 -15
  22. package/.githooks/pre-commit +0 -16
  23. package/.gitlab-ci.yml +0 -103
  24. package/.mcp.json +0 -8
  25. package/BRIEF.md +0 -140
  26. package/CLAUDE.md +0 -143
  27. package/US-007-IMPLEMENTATION.md +0 -139
  28. package/biome.json +0 -14
  29. package/bun.lock +0 -163
  30. package/bunfig.toml +0 -12
  31. package/docker-compose.test.yml +0 -15
  32. package/docs/20260216-fix-plan-context-review.md +0 -56
  33. package/docs/20260216-relentless-vs-ngent-comparison.md +0 -208
  34. package/docs/20260216-v02-plan.md +0 -136
  35. package/docs/20260216-v02-review.md +0 -685
  36. package/docs/20260217-dogfood-findings.md +0 -56
  37. package/docs/20260217-p2-plus-plan.md +0 -117
  38. package/docs/20260217-partial-fixes-plan.md +0 -62
  39. package/docs/20260217-plan-analyze-spec.md +0 -117
  40. package/docs/20260217-post-impl-review.md +0 -1137
  41. package/docs/20260217-quick-wins-plan.md +0 -66
  42. package/docs/20260217-split-runner-plan.md +0 -75
  43. package/docs/20260217-v03-impl-plan.md +0 -80
  44. package/docs/20260217-v03-post-impl-review.md +0 -589
  45. package/docs/20260217-v04-impl-plan.md +0 -86
  46. package/docs/20260217-v05-post-impl-review.md +0 -850
  47. package/docs/20260217-v06-post-impl-review.md +0 -817
  48. package/docs/20260218-adr003-port-plan.md +0 -151
  49. package/docs/20260218-review-adr003-verification.md +0 -175
  50. package/docs/20260219-fix-plan-bug16-19.md +0 -79
  51. package/docs/20260219-fix-plan-bug20-22.md +0 -114
  52. package/docs/20260219-plan-llm-routing.md +0 -116
  53. package/docs/20260219-review-bug20-22-fixes.md +0 -135
  54. package/docs/20260219-routing-baseline-keyword.md +0 -63
  55. package/docs/20260220-plan-structured-logging-p1.md +0 -80
  56. package/docs/20260220-plan-structured-logging-p2.md +0 -37
  57. package/docs/20260220-review-llm-routing.md +0 -180
  58. package/docs/20260220-review-post-fix-llm-routing.md +0 -70
  59. package/docs/20260221-fix-plan-relevantfiles-split.md +0 -101
  60. package/docs/20260221-fix-plan-routing-mode.md +0 -125
  61. package/docs/20260221-review-v0.9-implementation.md +0 -379
  62. package/docs/20260222-fix-plan-v091-routing-isolation.md +0 -197
  63. package/docs/20260223-fix-plan-prompt-audit.md +0 -62
  64. package/docs/20260224-nax-roadmap-phases.md +0 -189
  65. package/docs/20260225-phase2-llm-service-layer.md +0 -401
  66. package/docs/20260225-review-v0.10.1.md +0 -187
  67. package/docs/20260303-v010-implementation-plan.md +0 -165
  68. package/docs/20260304-review-nax.md +0 -492
  69. package/docs/CLAUDE.md.bak +0 -191
  70. package/docs/ROADMAP.md +0 -390
  71. package/docs/SPEC-rectification.md +0 -0
  72. package/docs/SPEC.md +0 -324
  73. package/docs/US-001-plugin-loading-verification.md +0 -152
  74. package/docs/adr/ADR-005-implementation-plan.md +0 -655
  75. package/docs/adr/ADR-005-pipeline-re-architecture.md +0 -464
  76. package/docs/architecture-analysis.md +0 -1076
  77. package/docs/bugs/BUG-21-escalation-null-attempts.md +0 -48
  78. package/docs/bugs-from-dogfood-run-c.md +0 -243
  79. package/docs/code-review-20260228.md +0 -612
  80. package/docs/code-review-v0.15.0.md +0 -629
  81. package/docs/hook-lifecycle-test-plan.md +0 -149
  82. package/docs/releases/v0.11.0-and-earlier.md +0 -20
  83. package/docs/releases/v0.12.0.md +0 -15
  84. package/docs/releases/v0.13.0.md +0 -14
  85. package/docs/releases/v0.14.0.md +0 -20
  86. package/docs/releases/v0.14.1.md +0 -36
  87. package/docs/releases/v0.14.2.md +0 -51
  88. package/docs/releases/v0.14.3.md +0 -174
  89. package/docs/releases/v0.14.4.md +0 -94
  90. package/docs/releases/v0.15.0.md +0 -502
  91. package/docs/releases/v0.15.1.md +0 -170
  92. package/docs/releases/v0.15.3.md +0 -193
  93. package/docs/specs/bug-039-orphan-processes.md +0 -131
  94. package/docs/specs/bug-040-review-rectification.md +0 -82
  95. package/docs/specs/bug-041-cross-story-test-isolation.md +0 -88
  96. package/docs/specs/bug-042-verifier-failure-capture.md +0 -117
  97. package/docs/specs/bun-pty-migration.md +0 -171
  98. package/docs/specs/central-run-registry.md +0 -116
  99. package/docs/specs/feat-010-smart-runner-git-history.md +0 -96
  100. package/docs/specs/feat-011-file-context-strategy.md +0 -73
  101. package/docs/specs/feat-012-tdd-writer-tier.md +0 -79
  102. package/docs/specs/feat-013-test-after-review.md +0 -89
  103. package/docs/specs/feat-014-heartbeat-observability.md +0 -127
  104. package/docs/specs/status-file-consolidation.md +0 -93
  105. package/docs/specs/status-file-v0.10.1.md +0 -812
  106. package/docs/specs/trigger-completion.md +0 -145
  107. package/docs/specs/verification-architecture-v2.md +0 -343
  108. package/docs/tdd/strategies.md +0 -97
  109. package/docs/v0.10-global-config.md +0 -206
  110. package/docs/v0.10-plugin-system.md +0 -415
  111. package/docs/v0.10-prompt-optimizer.md +0 -234
  112. package/docs/v0.3-spec.md +0 -244
  113. package/docs/v0.4-spec.md +0 -140
  114. package/docs/v0.5-spec.md +0 -237
  115. package/docs/v0.6-spec.md +0 -371
  116. package/docs/v0.7-spec.md +0 -177
  117. package/docs/v0.8-llm-routing.md +0 -206
  118. package/docs/v0.8-structured-logging.md +0 -132
  119. package/docs/v0.9.3-prompt-audit.md +0 -112
  120. package/examples/plugins/console-reporter/index.test.ts +0 -207
  121. package/examples/plugins/console-reporter/index.ts +0 -110
  122. package/memory/topic/feat-010-baseref.md +0 -28
  123. package/memory/topic/feat-013-test-after-deprecation.md +0 -22
  124. package/nax/config.json +0 -154
  125. package/nax/features/bug-039-medium/prd.json +0 -45
  126. package/nax/features/bugfix-v0171/prd.json +0 -52
  127. package/nax/features/central-run-registry/prd.json +0 -105
  128. package/nax/features/config-management/prd.json +0 -108
  129. package/nax/features/config-management/progress.txt +0 -5
  130. package/nax/features/diagnose/acceptance.test.ts +0 -414
  131. package/nax/features/diagnose/prd.json +0 -41
  132. package/nax/features/nax-compliance/prd.json +0 -52
  133. package/nax/features/nax-compliance/progress.txt +0 -1
  134. package/nax/features/orchestration-fixes/prd.json +0 -89
  135. package/nax/features/orchestration-fixes/progress.txt +0 -1
  136. package/nax/features/plugin-integration/US-007-VERIFICATION.md +0 -259
  137. package/nax/features/plugin-integration/prd.json +0 -208
  138. package/nax/features/plugin-integration/progress.txt +0 -5
  139. package/nax/features/post-rearch-bugfix/prd.json +0 -137
  140. package/nax/features/precheck/prd.json +0 -205
  141. package/nax/features/precheck/progress.txt +0 -15
  142. package/nax/features/prompt-builder/prd.json +0 -152
  143. package/nax/features/prompt-builder/progress.txt +0 -3
  144. package/nax/features/review-quality/prd.json +0 -55
  145. package/nax/features/routing-persistence/prd.json +0 -104
  146. package/nax/features/routing-persistence/progress.txt +0 -1
  147. package/nax/features/smart-test-runner/plan.md +0 -7
  148. package/nax/features/smart-test-runner/prd.json +0 -203
  149. package/nax/features/smart-test-runner/progress.txt +0 -13
  150. package/nax/features/smart-test-runner/spec.md +0 -7
  151. package/nax/features/smart-test-runner/tasks.md +0 -8
  152. package/nax/features/status-file-consolidation/prd.json +0 -106
  153. package/nax/features/structured-logging/prd.json +0 -199
  154. package/nax/features/trigger-completion/prd.json +0 -150
  155. package/nax/features/trigger-completion/progress.txt +0 -7
  156. package/nax/features/unlock/prd.json +0 -36
  157. package/nax/features/v0.18.3-execution-reliability/prd.json +0 -80
  158. package/nax/features/v0.18.3-execution-reliability/progress.txt +0 -3
  159. package/nax/features/v0.19.0-hardening/plan.md +0 -7
  160. package/nax/features/v0.19.0-hardening/prd.json +0 -84
  161. package/nax/features/v0.19.0-hardening/progress.txt +0 -7
  162. package/nax/features/v0.19.0-hardening/spec.md +0 -18
  163. package/nax/features/v0.19.0-hardening/tasks.md +0 -8
  164. package/nax/features/verify-v2/prd.json +0 -79
  165. package/nax/features/verify-v2/progress.txt +0 -3
  166. package/nax/status.json +0 -36
  167. package/src/prompts/templates/implementer.ts +0 -6
  168. package/src/prompts/templates/single-session.ts +0 -6
  169. package/src/prompts/templates/test-writer.ts +0 -6
  170. package/src/prompts/templates/verifier.ts +0 -6
  171. package/test/COVERAGE-GAPS.md +0 -333
  172. package/test/e2e/cm-003-default-view.test.ts +0 -195
  173. package/test/e2e/plan-analyze-run.test.ts +0 -902
  174. package/test/helpers/helpers.test.ts +0 -295
  175. package/test/helpers/timeout.ts +0 -42
  176. package/test/integration/US-002-TEST-SUMMARY.md +0 -107
  177. package/test/integration/US-003-TEST-SUMMARY.md +0 -149
  178. package/test/integration/US-004-TEST-SUMMARY.md +0 -106
  179. package/test/integration/US-005-TEST-SUMMARY.md +0 -138
  180. package/test/integration/US-007-TEST-SUMMARY.md +0 -100
  181. package/test/integration/cli/agent-validation.test.ts +0 -439
  182. package/test/integration/cli/cli-config-default-edge-cases.test.ts +0 -223
  183. package/test/integration/cli/cli-config-default-view.test.ts +0 -230
  184. package/test/integration/cli/cli-config-diff.test.ts +0 -461
  185. package/test/integration/cli/cli-config-prompts-explain.test.ts +0 -74
  186. package/test/integration/cli/cli-config.test.ts +0 -737
  187. package/test/integration/cli/cli-diagnose.test.ts +0 -595
  188. package/test/integration/cli/cli-logs.test.ts +0 -346
  189. package/test/integration/cli/cli-plugins.test.ts +0 -679
  190. package/test/integration/cli/cli-precheck.test.ts +0 -372
  191. package/test/integration/cli/cli-run-headless.test.ts +0 -174
  192. package/test/integration/cli/cli.test.ts +0 -76
  193. package/test/integration/cli/precheck-integration.test.ts +0 -476
  194. package/test/integration/cli/precheck-orchestrator.test.ts +0 -247
  195. package/test/integration/cli/precheck.test.ts +0 -806
  196. package/test/integration/config/config-loader.test.ts +0 -266
  197. package/test/integration/config/config.test.ts +0 -444
  198. package/test/integration/config/merger.test.ts +0 -466
  199. package/test/integration/config/paths.test.ts +0 -52
  200. package/test/integration/config/security-loader.test.ts +0 -83
  201. package/test/integration/context/context-integration.test.ts +0 -703
  202. package/test/integration/context/context-path-security.test.ts +0 -173
  203. package/test/integration/context/context-provider-injection.test.ts +0 -507
  204. package/test/integration/context/context-verification-integration.test.ts +0 -296
  205. package/test/integration/context/s5-greenfield-fallback.test.ts +0 -298
  206. package/test/integration/execution/execution-isolation.test.ts +0 -143
  207. package/test/integration/execution/execution.test.ts +0 -634
  208. package/test/integration/execution/feature-status-write.test.ts +0 -302
  209. package/test/integration/execution/parallel.test.ts +0 -251
  210. package/test/integration/execution/prd-pause.test.ts +0 -205
  211. package/test/integration/execution/prd-resolvers.test.ts +0 -186
  212. package/test/integration/execution/progress.test.ts +0 -34
  213. package/test/integration/execution/runner-batching.test.ts +0 -682
  214. package/test/integration/execution/runner-config-plugins.test.ts +0 -462
  215. package/test/integration/execution/runner-escalation.test.ts +0 -561
  216. package/test/integration/execution/runner-fixes.test.ts +0 -400
  217. package/test/integration/execution/runner-plugin-integration.test.ts +0 -544
  218. package/test/integration/execution/runner-queue-and-attempts.test.ts +0 -476
  219. package/test/integration/execution/status-file-integration.test.ts +0 -289
  220. package/test/integration/execution/status-file.test.ts +0 -380
  221. package/test/integration/execution/status-writer.test.ts +0 -447
  222. package/test/integration/execution/story-id-in-events.test.ts +0 -274
  223. package/test/integration/interaction/interaction-chain-pipeline.test.ts +0 -476
  224. package/test/integration/pipeline/hooks.test.ts +0 -363
  225. package/test/integration/pipeline/pipeline-acceptance.test.ts +0 -303
  226. package/test/integration/pipeline/pipeline-events.test.ts +0 -476
  227. package/test/integration/pipeline/pipeline.test.ts +0 -660
  228. package/test/integration/pipeline/reporter-lifecycle.test.ts +0 -862
  229. package/test/integration/pipeline/verify-stage.test.ts +0 -286
  230. package/test/integration/plan/analyze-integration.test.ts +0 -262
  231. package/test/integration/plan/analyze-scanner.test.ts +0 -132
  232. package/test/integration/plan/logger.test.ts +0 -461
  233. package/test/integration/plan/plan.test.ts +0 -157
  234. package/test/integration/plugins/config-integration.test.ts +0 -173
  235. package/test/integration/plugins/config-resolution.test.ts +0 -523
  236. package/test/integration/plugins/loader.test.ts +0 -644
  237. package/test/integration/plugins/plugins-registry.test.ts +0 -747
  238. package/test/integration/plugins/validator.test.ts +0 -564
  239. package/test/integration/prompts/pb-004-migration.test.ts +0 -523
  240. package/test/integration/review/review-config-commands.test.ts +0 -320
  241. package/test/integration/review/review-config-schema.test.ts +0 -117
  242. package/test/integration/review/review-plugin-integration.test.ts +0 -729
  243. package/test/integration/review/review.test.ts +0 -150
  244. package/test/integration/routing/plugin-routing-advanced.test.ts +0 -461
  245. package/test/integration/routing/plugin-routing-core.test.ts +0 -527
  246. package/test/integration/routing/routing-stage-bug-021.test.ts +0 -275
  247. package/test/integration/routing/routing-stage-greenfield.test.ts +0 -287
  248. package/test/integration/tdd/tdd-cleanup.test.ts +0 -246
  249. package/test/integration/tdd/tdd-orchestrator-core.test.ts +0 -565
  250. package/test/integration/tdd/tdd-orchestrator-failureCategory.test.ts +0 -355
  251. package/test/integration/tdd/tdd-orchestrator-fallback.test.ts +0 -311
  252. package/test/integration/tdd/tdd-orchestrator-lite.test.ts +0 -289
  253. package/test/integration/tdd/tdd-orchestrator-prompts.test.ts +0 -260
  254. package/test/integration/tdd/tdd-orchestrator-verdict.test.ts +0 -536
  255. package/test/integration/tmp/headless-test/test.jsonl +0 -30
  256. package/test/integration/verification/test-scanner.test.ts +0 -403
  257. package/test/integration/verification/verification-asset-check.test.ts +0 -143
  258. package/test/integration/worktree/manager.test.ts +0 -218
  259. package/test/integration/worktree/worktree-merge.test.ts +0 -341
  260. package/test/manual/logging-formatter-demo.ts +0 -158
  261. package/test/ui/tui-agent-panel.test.tsx +0 -99
  262. package/test/ui/tui-pty-integration.test.tsx +0 -146
  263. package/test/unit/acceptance.test.ts +0 -187
  264. package/test/unit/agent-stderr-capture.test.ts +0 -147
  265. package/test/unit/agents/claude.test.ts +0 -107
  266. package/test/unit/analyze-classifier.test.ts +0 -216
  267. package/test/unit/analyze.test.ts +0 -224
  268. package/test/unit/auto-detect.test.ts +0 -250
  269. package/test/unit/cli-status-project-level.test.ts +0 -283
  270. package/test/unit/cli-status.test.ts +0 -418
  271. package/test/unit/commands/common.test.ts +0 -321
  272. package/test/unit/commands/logs.test.ts +0 -458
  273. package/test/unit/commands/runs.test.ts +0 -303
  274. package/test/unit/commands/unlock.test.ts +0 -320
  275. package/test/unit/config/defaults.test.ts +0 -70
  276. package/test/unit/config/quality-commands-schema.test.ts +0 -72
  277. package/test/unit/config/regression-gate-schema.test.ts +0 -160
  278. package/test/unit/config/smart-runner-flag.test.ts +0 -250
  279. package/test/unit/constitution-generators.test.ts +0 -161
  280. package/test/unit/constitution.test.ts +0 -210
  281. package/test/unit/context/context-autodetect.test.ts +0 -297
  282. package/test/unit/context/context-build.test.ts +0 -575
  283. package/test/unit/context/context-coverage.test.ts +0 -236
  284. package/test/unit/context/context-error.test.ts +0 -93
  285. package/test/unit/context/context-estimate-tokens.test.ts +0 -201
  286. package/test/unit/context/context-format.test.ts +0 -302
  287. package/test/unit/context/context-isolation.test.ts +0 -267
  288. package/test/unit/context/context-sort.test.ts +0 -93
  289. package/test/unit/context/context-story.test.ts +0 -108
  290. package/test/unit/context/prior-failures.test.ts +0 -463
  291. package/test/unit/context.test.ts +0 -1726
  292. package/test/unit/cost.test.ts +0 -231
  293. package/test/unit/crash-recovery.test.ts +0 -309
  294. package/test/unit/escalation.test.ts +0 -127
  295. package/test/unit/execution/lifecycle/run-completion.test.ts +0 -240
  296. package/test/unit/execution/lifecycle/run-regression.test.ts +0 -420
  297. package/test/unit/execution/pid-registry.test.ts +0 -241
  298. package/test/unit/execution/sequential-executor.test.ts +0 -235
  299. package/test/unit/execution/sfc-004-dead-code-cleanup.test.ts +0 -89
  300. package/test/unit/execution/structured-failure.test.ts +0 -415
  301. package/test/unit/execution-logging-stderr.test.ts +0 -157
  302. package/test/unit/execution-stage.test.ts +0 -123
  303. package/test/unit/fix-generator.test.ts +0 -276
  304. package/test/unit/formatters.test.ts +0 -468
  305. package/test/unit/greenfield.test.ts +0 -180
  306. package/test/unit/hooks/shell-security.test.ts +0 -40
  307. package/test/unit/interaction/auto-plugin.test.ts +0 -162
  308. package/test/unit/interaction/human-review-trigger.test.ts +0 -165
  309. package/test/unit/interaction-network-failures.test.ts +0 -390
  310. package/test/unit/interaction-plugins.test.ts +0 -472
  311. package/test/unit/logging/formatter.test.ts +0 -456
  312. package/test/unit/merge.test.ts +0 -269
  313. package/test/unit/metrics/aggregator.test.ts +0 -164
  314. package/test/unit/metrics/tracker.test.ts +0 -186
  315. package/test/unit/metrics.test.ts +0 -276
  316. package/test/unit/optimizer/noop.optimizer.test.ts +0 -125
  317. package/test/unit/optimizer/rule-based.optimizer.test.ts +0 -358
  318. package/test/unit/pipeline/event-bus.test.ts +0 -105
  319. package/test/unit/pipeline/routing-partial-override.test.ts +0 -121
  320. package/test/unit/pipeline/runner-retry.test.ts +0 -89
  321. package/test/unit/pipeline/stages/autofix.test.ts +0 -97
  322. package/test/unit/pipeline/stages/completion-review-gate.test.ts +0 -218
  323. package/test/unit/pipeline/stages/execution-ambiguity.test.ts +0 -311
  324. package/test/unit/pipeline/stages/execution-merge-conflict.test.ts +0 -218
  325. package/test/unit/pipeline/stages/rectify.test.ts +0 -101
  326. package/test/unit/pipeline/stages/regression-stage.test.ts +0 -69
  327. package/test/unit/pipeline/stages/review.test.ts +0 -201
  328. package/test/unit/pipeline/stages/routing-idempotence.test.ts +0 -139
  329. package/test/unit/pipeline/stages/routing-initial-complexity.test.ts +0 -321
  330. package/test/unit/pipeline/stages/routing-persistence.test.ts +0 -380
  331. package/test/unit/pipeline/stages/verify.test.ts +0 -267
  332. package/test/unit/pipeline/subscribers/events-writer.test.ts +0 -227
  333. package/test/unit/pipeline/subscribers/hooks.test.ts +0 -84
  334. package/test/unit/pipeline/subscribers/interaction.test.ts +0 -313
  335. package/test/unit/pipeline/subscribers/registry.test.ts +0 -149
  336. package/test/unit/pipeline/subscribers/reporters.test.ts +0 -90
  337. package/test/unit/pipeline/verify-smart-runner.test.ts +0 -345
  338. package/test/unit/prd-auto-default.test.ts +0 -291
  339. package/test/unit/prd-failure-category.test.ts +0 -177
  340. package/test/unit/prd-get-next-story.test.ts +0 -215
  341. package/test/unit/precheck/checks-warnings.test.ts +0 -114
  342. package/test/unit/precheck-checks.test.ts +0 -841
  343. package/test/unit/precheck-story-size-gate.test.ts +0 -288
  344. package/test/unit/precheck-types.test.ts +0 -143
  345. package/test/unit/prompts/builder.test.ts +0 -258
  346. package/test/unit/prompts/loader.test.ts +0 -355
  347. package/test/unit/prompts/sections/conventions.test.ts +0 -30
  348. package/test/unit/prompts/sections/isolation.test.ts +0 -35
  349. package/test/unit/prompts/sections/role-task.test.ts +0 -40
  350. package/test/unit/prompts/sections/sections.test.ts +0 -238
  351. package/test/unit/prompts/sections/story.test.ts +0 -45
  352. package/test/unit/prompts/sections/verdict.test.ts +0 -58
  353. package/test/unit/prompts.test.ts +0 -476
  354. package/test/unit/queue.test.ts +0 -237
  355. package/test/unit/rectification.test.ts +0 -285
  356. package/test/unit/registry.test.ts +0 -288
  357. package/test/unit/review/runner.test.ts +0 -117
  358. package/test/unit/routing/content-hash.test.ts +0 -99
  359. package/test/unit/routing/routing-stability.test.ts +0 -208
  360. package/test/unit/routing/strategies/llm.test.ts +0 -306
  361. package/test/unit/routing-advanced.test.ts +0 -313
  362. package/test/unit/routing-core.test.ts +0 -341
  363. package/test/unit/routing-strategies.test.ts +0 -440
  364. package/test/unit/storyid-events.test.ts +0 -213
  365. package/test/unit/tdd-verdict.test.ts +0 -492
  366. package/test/unit/test-output-parser.test.ts +0 -377
  367. package/test/unit/ui/tui-controls.test.ts +0 -335
  368. package/test/unit/ui/tui-cost-and-pty.test.ts +0 -190
  369. package/test/unit/ui/tui-layout.test.ts +0 -379
  370. package/test/unit/ui/tui-stories.test.ts +0 -333
  371. package/test/unit/unit-isolation.test.ts +0 -135
  372. package/test/unit/utils/git.test.ts +0 -50
  373. package/test/unit/utils/path-security.test.ts +0 -47
  374. package/test/unit/utils-helpers.test.ts +0 -318
  375. package/test/unit/verdict.test.ts +0 -325
  376. package/test/unit/verification/orchestrator-types.test.ts +0 -54
  377. package/test/unit/verification/orchestrator.test.ts +0 -66
  378. package/test/unit/verification/smart-runner-config.test.ts +0 -163
  379. package/test/unit/verification/smart-runner-discovery.test.ts +0 -354
  380. package/test/unit/verification/smart-runner.test.ts +0 -262
  381. package/test/unit/verification/strategies/acceptance.test.ts +0 -33
  382. package/test/unit/verification/strategies/regression.test.ts +0 -87
  383. package/test/unit/verification/strategies/scoped.test.ts +0 -100
  384. package/test/unit/worktree-manager.test.ts +0 -159
  385. package/tsconfig.json +0 -27
package/docs/specs/status-file-v0.10.1.md
@@ -1,812 +0,0 @@
- # Spec: v0.10.1 — Status File + TDD Escalation Retry
-
- **Version:** v0.10.1
- **Author:** Subrina
- **Date:** 2026-02-25
- **Status:** Draft
-
- ---
-
- ## Summary
-
- Add a `--status-file <path>` flag to `nax run` that writes a machine-readable JSON status file, updated after each story completes. Enables external tools (CI/CD, orchestrators, dashboards) to monitor nax runs without parsing logs or aggregating hooks.
-
- ## Motivation
-
- - **Log parsing is fragile** — format changes break consumers
- - **Hook aggregation has gaps** — if a hook fails, events are lost; no single source of truth
- - **nax already tracks this state** — `RunResult`, story counts, cost, PRD status are all in memory
- - **General-purpose** — useful for any integration, not just our orchestrator skill
-
- ## Interface
-
- ### CLI Flag
-
- ```bash
- nax run -f <feature> --headless --status-file ./nax-status.json
- ```
-
- | Flag | Type | Default | Description |
- |:-----|:-----|:--------|:------------|
- | `--status-file` | `string` | `undefined` | Path to write JSON status file. If not set, no file is written. |
-
- Relative paths resolved from `cwd` (same as `--headless` log behavior).
-
- ### Status File Schema
-
- ```typescript
- interface NaxStatusFile {
-   /** Schema version for forward compatibility */
-   version: 1;
-
-   /** Run metadata */
-   run: {
-     id: string; // Run ID (e.g. "run-2026-02-25T10-00-00-000Z")
-     feature: string; // Feature name
-     startedAt: string; // ISO 8601
-     status: "running" | "completed" | "failed" | "stalled";
-     dryRun: boolean;
-   };
-
-   /** Aggregate progress */
-   progress: {
-     total: number; // Total stories in PRD
-     passed: number;
-     failed: number;
-     paused: number;
-     blocked: number;
-     pending: number; // total - passed - failed - paused - blocked
-   };
-
-   /** Cost tracking */
-   cost: {
-     spent: number; // USD accumulated
-     limit: number | null; // From config.execution.costLimit
-   };
-
-   /** Current story being processed (null if between stories) */
-   current: {
-     storyId: string;
-     title: string;
-     complexity: string; // simple | medium | complex
-     tddStrategy: string; // test-after | tdd-lite | three-session-tdd
-     model: string; // Resolved model name
-     attempt: number; // Current attempt (1-based)
-     phase: string; // routing | test-write | implement | verify | review
-   } | null;
-
-   /** Iteration count */
-   iterations: number;
-
-   /** Last updated timestamp */
-   updatedAt: string; // ISO 8601
-
-   /** Duration so far in ms */
-   durationMs: number;
- }
- ```
-
- ### Example Output
-
- ```json
- {
-   "version": 1,
-   "run": {
-     "id": "run-2026-02-25T10-00-00-000Z",
-     "feature": "auth-refactor",
-     "startedAt": "2026-02-25T10:00:00Z",
-     "status": "running",
-     "dryRun": false
-   },
-   "progress": {
-     "total": 12,
-     "passed": 7,
-     "failed": 1,
-     "paused": 0,
-     "blocked": 1,
-     "pending": 3
-   },
-   "cost": {
-     "spent": 1.23,
-     "limit": 5.00
-   },
-   "current": {
-     "storyId": "US-008",
-     "title": "Add retry logic to queue handler",
-     "complexity": "medium",
-     "tddStrategy": "tdd-lite",
-     "model": "claude-sonnet-4-5-20250514",
-     "attempt": 1,
-     "phase": "implement"
-   },
-   "iterations": 8,
-   "updatedAt": "2026-02-25T10:15:32Z",
-   "durationMs": 932000
- }
- ```
-
- ## Implementation
-
- ### Files to Change
-
- | File | Change |
- |:-----|:-------|
- | `src/execution/runner.ts` | Add `statusFile?: string` to `RunOptions`. Call `writeStatusFile()` at key points. |
- | `src/execution/status-file.ts` | **New file.** `writeStatusFile()` function — builds `NaxStatusFile` from run state, writes atomically. |
- | `src/main.ts` (or wherever CLI args are parsed) | Add `--status-file` option, pass to `RunOptions`. |
-
- ### Write Points
-
- Status file is updated at these moments:
-
- 1. **Run start** — initial state (all stories pending)
- 2. **Story start** — update `current` with story info
- 3. **Story complete/fail/pause** — update `progress` counts, clear `current`
- 4. **Run end** — final state (`status: "completed"` or `"failed"`)
-
- ### Atomic Writes
-
- Write to `<path>.tmp` then rename to `<path>` to prevent readers from seeing partial JSON:
-
- ```typescript
- import { rename } from "node:fs/promises";
-
- async function writeStatusFile(path: string, status: NaxStatusFile): Promise<void> {
-   const tmpPath = `${path}.tmp`;
-   await Bun.write(tmpPath, JSON.stringify(status, null, 2));
-   await rename(tmpPath, path);
- }
- ```
-
- ### Integration with RunOptions
-
- ```typescript
- // src/execution/runner.ts
- export interface RunOptions {
-   // ... existing fields
-   /** Path to write JSON status file (optional) */
-   statusFile?: string;
- }
- ```
-
- ### Progress Counting
-
- Derive from PRD state (already loaded):
-
- ```typescript
- function countProgress(prd: PRD): NaxStatusFile["progress"] {
-   const stories = prd.stories;
-   const passed = stories.filter(s => s.status === "passed").length;
-   const failed = stories.filter(s => s.status === "failed").length;
-   const paused = stories.filter(s => s.status === "paused").length;
-   const blocked = stories.filter(s => s.status === "blocked").length;
-   const total = stories.length;
-   return { total, passed, failed, paused, blocked, pending: total - passed - failed - paused - blocked };
- }
- ```
-
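
As a worked check of the counting rule in the deleted spec, the sketch below runs the same arithmetic on the figures from the example status file (12 stories: 7 passed, 1 failed, 1 blocked, 3 pending). The `Story` and `Progress` shapes and the `makeStories` helper are hypothetical stand-ins for illustration; the real `PRD` type in nax may carry more fields or other status values.

```typescript
// Hypothetical minimal shapes for illustration; not the actual nax types.
type StoryStatus = "pending" | "passed" | "failed" | "paused" | "blocked";

interface Story {
  id: string;
  status: StoryStatus;
}

interface Progress {
  total: number;
  passed: number;
  failed: number;
  paused: number;
  blocked: number;
  pending: number;
}

// Single pass over the stories; pending is derived by subtraction,
// mirroring the spec's countProgress().
function countProgress(stories: Story[]): Progress {
  const counts = { passed: 0, failed: 0, paused: 0, blocked: 0 };
  for (const story of stories) {
    if (story.status !== "pending") {
      counts[story.status] += 1;
    }
  }
  const total = stories.length;
  const pending = total - counts.passed - counts.failed - counts.paused - counts.blocked;
  return { total, ...counts, pending };
}

// Hypothetical helper: builds n stories with the same status.
function makeStories(status: StoryStatus, n: number, offset: number): Story[] {
  return Array.from({ length: n }, (_, i) => ({ id: `US-${offset + i}`, status }));
}

// The 12-story mix from the example status file.
const exampleStories: Story[] = [
  ...makeStories("passed", 7, 1),
  ...makeStories("failed", 1, 8),
  ...makeStories("blocked", 1, 9),
  ...makeStories("pending", 3, 10),
];

const exampleProgress = countProgress(exampleStories);
console.log(exampleProgress);
```

Deriving `pending` by subtraction keeps the five counters consistent with `total` even if a story carries a status outside the four counted ones.
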
- ### Cleanup
-
- The status file is **not** deleted on run end — it persists as a record of the last run. Consumers can check `run.status` to determine if the run is still active.
-
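
On the consumer side, the `run.status` check described in the deleted spec might look like the following sketch. It takes the file contents as a string so it stays independent of any particular file API. `parseStatus`, `isRunActive`, and `StatusSnapshot` are hypothetical names, and treating only `"running"` as active (so `"stalled"` counts as inactive) is an assumption, since the spec does not say how consumers should interpret `"stalled"`.

```typescript
// Subset of the NaxStatusFile schema that this consumer relies on.
type RunStatus = "running" | "completed" | "failed" | "stalled";

interface StatusSnapshot {
  version: 1;
  run: { status: RunStatus };
}

const RUN_STATUSES: readonly RunStatus[] = ["running", "completed", "failed", "stalled"];

// Returns null for malformed JSON, an unknown schema version,
// or an unknown run status.
function parseStatus(raw: string): StatusSnapshot | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const candidate = data as { version?: unknown; run?: unknown };
  if (candidate.version !== 1) return null;
  if (typeof candidate.run !== "object" || candidate.run === null) return null;
  const status = (candidate.run as { status?: unknown }).status;
  const known = RUN_STATUSES.find((s) => s === status);
  if (known === undefined) return null;
  return { version: 1, run: { status: known } };
}

// Assumption: only "running" means the run is still in progress.
function isRunActive(snapshot: StatusSnapshot): boolean {
  return snapshot.run.status === "running";
}

const sample = parseStatus('{"version":1,"run":{"status":"running"}}');
console.log(sample !== null && isRunActive(sample));
```

Checking `version` before anything else lets a consumer refuse a future schema it does not understand instead of misreading it.
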
- ## Testing
-
- | Test | Description |
- |:-----|:------------|
- | `status-file.test.ts` | Unit: `writeStatusFile()` produces valid JSON, atomic write works |
- | `status-file.test.ts` | Unit: `countProgress()` correctly counts all states |
- | `runner.test.ts` | Integration: `--status-file` option flows through to `RunOptions` |
- | `runner.test.ts` | Integration: status file updates at each write point |
- | Manual | `--status-file` + `--dry-run` produces correct output |
-
- ## Non-Goals
-
- - **Real-time streaming** — this is a polled file, not a websocket/SSE stream
- - **Historical run data** — status file represents current/last run only (hooks + events.jsonl cover history)
- - **`nax status --json` command** — future work, can read this file
-
- ## Migration
-
- None. New optional flag, no breaking changes. If `--status-file` is not passed, behavior is identical to v0.10.0.
-
- ---
-
214
- # Feature 2: TDD Escalation Retry
215
-
216
- ## Summary
217
-
218
- Three-session TDD currently hard-codes `pause` for all failures — isolation violations, session crashes, and test failures all result in the story being paused with no retry. This means TDD stories never benefit from the escalation system that test-after stories use.
219
-
220
- Change: TDD failures should follow the same escalation retry pattern as test-after. Only pause when all retry paths are exhausted.
221
-
222
- ## Problem
223
-
224
- Current flow (all TDD failures):
225
- ```
226
- TDD failure → needsHumanReview=true → execution stage returns "pause" → story paused → NO RETRY
227
- ```
228
-
229
- test-after flow (for comparison):
230
- ```
231
- Agent failure → execution stage returns "escalate" → runner bumps tier → retries → only fails after max attempts
232
- ```
233
-
234
- ## Proposed Retry Strategy
235
-
236
- TDD failures are classified into three categories with different retry paths:
237
-
238
- ### Category 1: Isolation Violation (test-writer touches source)
239
-
240
- **Current:** Pause immediately.
241
- **Proposed:** Auto-downgrade to tdd-lite, then escalate.
242
-
243
- ```
244
- three-session-tdd fails (isolation violation)
-   → Retry 1: three-session-tdd-lite (same tier, skip isolation for writer/implementer)
-     → Success? Done ✅
-     → Fail? Escalate to next tier
-       → Retry 2: tdd-lite + stronger model
-         → Success? Done ✅
-         → Fail? Continue escalation through tier chain
-           → All tiers exhausted → pause (needs human review) ⏸
252
- ```
253
-
254
- **Note:** The zero-file fallback already does this for one specific case (test-writer creates no test files → auto-retry as lite). This generalizes that pattern to all isolation violations.
255
-
256
- ### Category 2: Session Failure (agent crash, timeout, non-zero exit)
257
-
258
- **Current:** Pause immediately.
259
- **Proposed:** Escalate model tier (same as test-after).
260
-
261
- ```
262
- TDD session fails (crash/timeout)
-   → Escalate to next model tier
-     → Retry with stronger model (same TDD strategy)
-       → Success? Done ✅
-       → Fail? Continue escalation
-         → All tiers exhausted → mark failed ❌
268
- ```
269
-
270
- ### Category 3: Tests Still Failing After All Sessions
271
-
272
- **Current:** Post-TDD verification runs. If tests fail → pause.
273
- **Proposed:** Escalate model tier.
274
-
275
- ```
276
- All 3 sessions complete but tests still fail
-   → Escalate to next model tier
-     → Retry full TDD with stronger model
-       → Success? Done ✅
-       → Fail? Continue escalation
-         → All tiers exhausted → mark failed ❌
282
- ```
283
-
284
- ### Summary Table
285
-
286
- | Failure Type | Current Action | New Action | Final Fallback |
287
- |:-------------|:--------------|:-----------|:--------------|
288
- | Isolation violation | pause | Downgrade to lite → escalate | pause (human review) |
289
- | Zero test files created | lite retry (exists) | Keep existing + escalate | pause (human review) |
290
- | Session crash/timeout | pause | Escalate tier | fail |
291
- | Tests fail post-TDD | pause | Escalate tier | fail |
292
- | Verifier flags bad code | pause | Escalate tier | pause (human review) |
293
-
294
- **Why "pause" for isolation/verifier but "fail" for crashes?**
295
- - Isolation violations and verifier concerns suggest the code needs *human judgment* — the AI may be fundamentally misunderstanding the task.
296
- - Crashes and test failures are mechanical — a stronger model usually fixes them.
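The final-fallback column of the summary table can be read as a small lookup. The sketch below is illustrative only: the `FailureCategory` alias, the `FinalFallback` type, and the `finalFallback` helper are hypothetical names, not part of the existing codebase, and the "zero test files" row is left out because the proposal does not give it its own category.

```typescript
// Hypothetical alias for the union proposed for `failureCategory`.
type FailureCategory =
  | "isolation-violation"
  | "session-failure"
  | "tests-failing"
  | "verifier-rejected";

// What happens to a story once every tier in the escalation chain has been tried.
type FinalFallback = "pause" | "fail";

// Mirrors the summary table: failures that call for human judgment pause,
// mechanical failures are marked failed.
const FINAL_FALLBACK: Record<FailureCategory, FinalFallback> = {
  "isolation-violation": "pause",
  "verifier-rejected": "pause",
  "session-failure": "fail",
  "tests-failing": "fail",
};

// Hypothetical helper: look up the final fallback for a failure category.
function finalFallback(category: FailureCategory): FinalFallback {
  return FINAL_FALLBACK[category];
}
```

Keeping the mapping in a `Record` keyed by the union means the compiler would flag a missing entry if a new category were added later.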
297
-
298
- ## Implementation
299
-
300
- ### Changes to `ThreeSessionTddResult`
301
-
302
- Add a `failureCategory` field so the execution stage can differentiate:
303
-
304
- ```typescript
305
- export interface ThreeSessionTddResult {
-   success: boolean;
-   sessions: TddSessionResult[];
-   needsHumanReview: boolean;
-   reviewReason?: string;
-   totalCost: number;
-   lite: boolean;
-
-   /** NEW: Categorize failure for retry routing */
-   failureCategory?: "isolation-violation" | "session-failure" | "tests-failing" | "verifier-rejected";
- }
316
- ```
317
-
318
- ### Changes to `execution.ts` (pipeline stage)
319
-
320
- Replace the blanket `pause` with category-based routing:
321
-
322
- ```typescript
323
- // Current:
- if (tddResult.needsHumanReview) {
-   return { action: "pause", reason: tddResult.reviewReason };
- }
-
- // Proposed:
- if (!tddResult.success) {
-   switch (tddResult.failureCategory) {
-     case "isolation-violation":
-       // If already lite → escalate. If strict → retry as lite (same tier).
-       if (tddResult.lite) {
-         return { action: "escalate", reason: tddResult.reviewReason };
-       }
-       // Store flag in context so runner knows to downgrade strategy
-       ctx.retryAsLite = true;
-       return { action: "escalate", reason: "Isolation violation: downgrading to lite" };
-
-     case "session-failure":
-     case "tests-failing":
-       return { action: "escalate", reason: tddResult.reviewReason };
-
-     case "verifier-rejected":
-       // Escalate first, pause only after all tiers exhausted
-       return { action: "escalate", reason: tddResult.reviewReason };
-
-     default:
-       // No category recorded: keep the previous behavior and pause
-       return { action: "pause", reason: tddResult.reviewReason };
-   }
- }
352
- ```
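The same routing can be written as a pure function, which makes the unit tests listed later easier to write. This is a minimal sketch under stated assumptions: `TddOutcome`, `RetryContext`, `StageResult`, the `"continue"` action, and `routeTddResult` are hypothetical stand-ins for the real pipeline types, which are not shown in this document.

```typescript
type FailureCategory =
  | "isolation-violation"
  | "session-failure"
  | "tests-failing"
  | "verifier-rejected";

// Hypothetical, trimmed-down view of ThreeSessionTddResult.
interface TddOutcome {
  success: boolean;
  lite: boolean;
  failureCategory?: FailureCategory;
  reviewReason?: string;
}

// Hypothetical slice of the pipeline context.
interface RetryContext {
  retryAsLite?: boolean;
}

// Hypothetical stage result; "continue" is an assumed name for the success action.
interface StageResult {
  action: "continue" | "escalate" | "pause";
  reason?: string;
}

function routeTddResult(result: TddOutcome, ctx: RetryContext): StageResult {
  if (result.success) {
    return { action: "continue" };
  }
  switch (result.failureCategory) {
    case "isolation-violation":
      // Strict TDD gets one downgrade to lite; lite just escalates the tier.
      if (!result.lite) {
        ctx.retryAsLite = true;
      }
      return { action: "escalate", reason: result.reviewReason };
    case "session-failure":
    case "tests-failing":
    case "verifier-rejected":
      return { action: "escalate", reason: result.reviewReason };
    default:
      // Uncategorized failure: fall back to the old behavior.
      return { action: "pause", reason: result.reviewReason };
  }
}
```

Mutating `ctx` follows the proposal above; returning the flag as part of the result would be an alternative if the context is meant to stay read-only.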
353
-
354
- ### Changes to `runner.ts` (escalation handler)
355
-
356
- When escalating a TDD story with `retryAsLite`, update the story's routing to use `three-session-tdd-lite`:
357
-
358
- ```typescript
359
- case "escalate": {
-   // ... existing escalation logic ...
-
-   // NEW: If retryAsLite flag set, downgrade TDD strategy
-   if (pipelineResult.context?.retryAsLite && story.routing) {
-     story.routing.testStrategy = "three-session-tdd-lite";
-   }
-
-   // ... rest of escalation ...
- }
369
- ```
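Pulled out of the `switch`, the downgrade step might look like the helper below. The `Story` and `StoryRouting` shapes, the `"test-after"` strategy value, and the `applyLiteDowngrade` name are assumptions made for illustration. The extra guard that leaves non-strict strategies alone is also an addition that the snippet above does not include.

```typescript
// Assumed strategy names; only the two TDD values appear in this document.
type TestStrategy = "test-after" | "three-session-tdd" | "three-session-tdd-lite";

// Hypothetical, minimal story shape.
interface StoryRouting {
  testStrategy: TestStrategy;
}

interface Story {
  id: string;
  routing?: StoryRouting;
}

// Hypothetical helper: downgrade strict TDD to lite when the flag is set.
// Returns true only when the strategy was actually changed.
function applyLiteDowngrade(story: Story, retryAsLite: boolean | undefined): boolean {
  if (!retryAsLite || !story.routing) {
    return false;
  }
  if (story.routing.testStrategy !== "three-session-tdd") {
    return false; // already lite, or not a TDD story at all
  }
  story.routing.testStrategy = "three-session-tdd-lite";
  return true;
}
```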
370
-
371
- ### Changes to `tdd/orchestrator.ts`
372
-
373
- Set `failureCategory` based on what went wrong:
374
-
375
- ```typescript
376
- // After session 1 (test-writer) isolation failure:
- return {
-   success: false,
-   ...
-   failureCategory: "isolation-violation",
- };
-
- // After session crash/timeout:
- return {
-   success: false,
-   ...
-   failureCategory: "session-failure",
- };
-
- // After post-TDD verification fails:
- return {
-   success: false,
-   ...
-   failureCategory: "tests-failing",
- };
396
- ```
397
-
398
- ### Files to Change
399
-
400
- | File | Change |
401
- |:-----|:-------|
402
- | `src/tdd/types.ts` | Add `failureCategory` to `ThreeSessionTddResult` |
403
- | `src/tdd/orchestrator.ts` | Set `failureCategory` at each failure point |
404
- | `src/pipeline/stages/execution.ts` | Route by `failureCategory` instead of blanket `pause` |
405
- | `src/pipeline/types.ts` | Add `retryAsLite?: boolean` to `PipelineContext` |
406
- | `src/execution/runner.ts` | Handle `retryAsLite` flag in escalation case |
407
-
408
- ### Testing
409
-
410
- | Test | Description |
411
- |:-----|:------------|
412
- | `tdd/orchestrator.test.ts` | Unit: each failure path sets correct `failureCategory` |
413
- | `pipeline/execution.test.ts` | Unit: isolation violation returns `escalate` (not `pause`) |
414
- | `pipeline/execution.test.ts` | Unit: lite isolation violation returns `escalate` |
415
- | `pipeline/execution.test.ts` | Unit: session failure returns `escalate` |
416
- | `execution/runner.test.ts` | Integration: TDD story escalates through tiers before failing |
417
- | `execution/runner.test.ts` | Integration: `retryAsLite` downgrades strategy on next attempt |
418
- | Manual | Run with intentionally strict project, verify lite downgrade + tier escalation |
419
-
420
- ## Retry Budget
421
-
422
- Retries draw on the existing escalation config (`autoMode.escalation.tierOrder`). For example:
423
-
424
- ```json
425
- {
-   "autoMode": {
-     "escalation": {
-       "enabled": true,
-       "tierOrder": [
-         { "tier": "fast", "attempts": 2 },
-         { "tier": "balanced", "attempts": 2 },
-         { "tier": "powerful", "attempts": 1 }
-       ]
-     }
-   }
- }
437
- ```
438
-
439
- For a strict TDD story with isolation violation:
440
- ```
441
- Attempt 1: three-session-tdd @ fast → isolation violation
442
- Attempt 2: three-session-tdd-lite @ fast → tests fail
443
- Attempt 3: tdd-lite @ balanced → tests fail
444
- Attempt 4: tdd-lite @ balanced → tests fail
445
- Attempt 5: tdd-lite @ powerful → success ✅ (or fail → pause)
446
- ```
447
-
448
- Max cost is bounded by the existing tier budget. No new config needed.
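The bound follows from the config: the attempt counts in `tierOrder` add up to the maximum number of runs for one story. The sketch below expands the example config into a per-attempt tier list. `TierBudget` and `expandTierOrder` are hypothetical names, and it assumes the runner consumes each tier's attempts in order, as the walkthrough above does.

```typescript
// Shape of one entry in `autoMode.escalation.tierOrder`, as in the example config.
interface TierBudget {
  tier: string;
  attempts: number;
}

// Hypothetical helper: one array entry per attempt, in the order attempts would run.
function expandTierOrder(tierOrder: TierBudget[]): string[] {
  const plan: string[] = [];
  for (const { tier, attempts } of tierOrder) {
    for (let i = 0; i < attempts; i++) {
      plan.push(tier);
    }
  }
  return plan;
}

const tierOrder: TierBudget[] = [
  { tier: "fast", attempts: 2 },
  { tier: "balanced", attempts: 2 },
  { tier: "powerful", attempts: 1 },
];

const plan = expandTierOrder(tierOrder);
// plan is ["fast", "fast", "balanced", "balanced", "powerful"],
// so a story gets at most 2 + 2 + 1 = 5 attempts.
```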
449
-
450
- ---
451
-
452
- # Feature 3: Structured Verifier Verdicts
453
-
454
- ## Summary
455
-
456
- The verifier (session 3) is designed to judge whether the implementer's changes are legitimate — especially when the implementer modified test files. Currently, this judgment is implicit: the verifier runs as a regular agent, and the only signal is "did tests pass after verifier ran?" There's no structured verdict flowing back to the pipeline.
457
-
458
- Add structured output parsing to the verifier session so its judgment feeds into `failureCategory` and the escalation system.
459
-
460
- ## Problem
461
-
462
- Current verifier prompt asks it to:
463
- 1. Run tests and verify they pass
464
- 2. Review implementation quality
465
- 3. Check acceptance criteria
466
- 4. **Check if implementer modified test files and judge legitimacy**
467
- 5. Fix issues minimally
468
-
469
- But the result is just `{ success: boolean, estimatedCost: number }` — same as any agent session. The verifier's judgment about test modifications, code quality, and acceptance criteria is lost.
470
-
471
- **Consequences:**
472
- - If verifier finds illegitimate test modifications, it tries to fix them but we don't know *what* it found
473
- - If verifier can't fix the issue, it exits non-zero → treated same as a crash
474
- - No signal to differentiate "tests pass but code is bad" from "tests fail"
475
- - The `VerifierDecision` type exists in `types.ts` but is **never populated**
476
-
477
- ## Proposed Solution
478
-
479
- ### Structured Verdict File
480
-
481
- Instead of parsing agent stdout (fragile), the verifier writes a structured verdict file that the orchestrator reads after the session:
482
-
483
- ```
484
- <workdir>/.nax-verifier-verdict.json
485
- ```
486
-
487
- **Why a file?** Claude Code (the agent) can easily write files. Parsing structured output from stdout is unreliable with Claude Code since it mixes tool calls, thinking, and output.
488
-
489
- ### Verdict Schema
490
-
491
- ```typescript
492
- interface VerifierVerdict {
-   /** Schema version */
-   version: 1;
-
-   /** Overall approval */
-   approved: boolean;
-
-   /** Test results */
-   tests: {
-     /** Did all tests pass? */
-     allPassing: boolean;
-     /** Number of tests passing */
-     passCount: number;
-     /** Number of tests failing */
-     failCount: number;
-   };
-
-   /** Implementer test modification review */
-   testModifications: {
-     /** Were test files modified by implementer? */
-     detected: boolean;
-     /** List of modified test files */
-     files: string[];
-     /** Are the modifications legitimate? */
-     legitimate: boolean;
-     /** Reasoning for legitimacy judgment */
-     reasoning: string;
-   };
-
-   /** Acceptance criteria check */
-   acceptanceCriteria: {
-     /** All criteria met? */
-     allMet: boolean;
-     /** Per-criterion status */
-     criteria: Array<{
-       criterion: string;
-       met: boolean;
-       note?: string;
-     }>;
-   };
-
-   /** Code quality assessment */
-   quality: {
-     /** Overall quality: good | acceptable | poor */
-     rating: "good" | "acceptable" | "poor";
-     /** Issues found */
-     issues: string[];
-   };
-
-   /** Fixes applied by verifier */
-   fixes: string[];
-
-   /** Overall reasoning */
-   reasoning: string;
- }
547
- ```
548
-
549
- ### Updated Verifier Prompt
550
-
551
- ```typescript
552
- export function buildVerifierPrompt(story: UserStory): string {
553
- return `# Test-Driven Development — Session 3: Verify
554
-
555
- You are in the third session of a three-session TDD workflow. Tests and implementation are complete.
556
-
557
- **Story:** ${story.title}
558
-
559
- **Your tasks:**
560
- 1. Run all tests and verify they pass
561
- 2. Review the implementation for quality and correctness
562
- 3. Check that the implementation meets all acceptance criteria
563
- 4. Check if test files were modified by the implementer. If yes, verify the changes are legitimate fixes (e.g. fixing incorrect expectations) and NOT just loosening assertions to mask bugs.
564
- 5. If any issues exist, fix them minimally
565
-
566
- **Acceptance Criteria:**
567
- ${story.acceptanceCriteria.map((ac, i) => `${i + 1}. ${ac}`).join("\n")}
568
-
569
- **IMPORTANT — Write Verdict File:**
570
- After completing your review, write a JSON verdict file to \`.nax-verifier-verdict.json\` in the project root.
571
-
572
- \`\`\`json
573
- {
-   "version": 1,
-   "approved": true,
-   "tests": {
-     "allPassing": true,
-     "passCount": 15,
-     "failCount": 0
-   },
-   "testModifications": {
-     "detected": false,
-     "files": [],
-     "legitimate": true,
-     "reasoning": "No test files were modified by implementer"
-   },
-   "acceptanceCriteria": {
-     "allMet": true,
-     "criteria": [
-       { "criterion": "Criterion text", "met": true }
-     ]
-   },
-   "quality": {
-     "rating": "good",
-     "issues": []
-   },
-   "fixes": [],
-   "reasoning": "All tests pass, implementation is clean, all criteria met."
- }
600
- \`\`\`
601
-
602
- Set \`approved: false\` if:
603
- - Tests are failing and you cannot fix them
604
- - Implementer loosened test assertions to mask bugs (testModifications.legitimate = false)
605
- - Critical acceptance criteria are not met
606
- - Code quality is poor with security or correctness issues
607
-
608
- Set \`approved: true\` if:
609
- - All tests pass (or pass after your minimal fixes)
610
- - Implementation is clean and follows conventions
611
- - All acceptance criteria met
612
- - Any test modifications by implementer are legitimate fixes
613
-
614
- When done, commit any fixes with message: "fix: verify and adjust ${story.title}"`;
615
- }
616
- ```
617
-
618
- ### Orchestrator Changes
619
-
620
- After verifier session completes, read and parse the verdict file:
621
-
622
- ```typescript
623
- // At the top of tdd/orchestrator.ts (needed by the snippet below):
- import path from "node:path";
- import { unlink } from "node:fs/promises";
-
- // After session 3 completes:
-
- // Read verdict file
- const verdictPath = path.join(workdir, ".nax-verifier-verdict.json");
- let verdict: VerifierVerdict | null = null;
-
- try {
-   const file = Bun.file(verdictPath);
-   if (await file.exists()) {
-     // Unchecked cast: validate required fields before trusting it (see Robustness)
-     verdict = await file.json() as VerifierVerdict;
-     logger.info("tdd", "Verifier verdict loaded", {
-       storyId: story.id,
-       approved: verdict.approved,
-       testsAllPassing: verdict.tests.allPassing,
-       testModsDetected: verdict.testModifications.detected,
-       testModsLegitimate: verdict.testModifications.legitimate,
-       qualityRating: verdict.quality.rating,
-       allCriteriaMet: verdict.acceptanceCriteria.allMet,
-     });
-   } else {
-     logger.warn("tdd", "No verifier verdict file found; falling back to test-only check", {
-       storyId: story.id,
-     });
-   }
- } catch (err) {
-   // Malformed JSON or an incomplete verdict: leave `verdict` unset so the fallback applies
-   verdict = null;
-   logger.warn("tdd", "Failed to parse verifier verdict", {
-     storyId: story.id,
-     error: String(err),
-   });
- }
-
- // Clean up verdict file (don't leave it in the repo)
- try {
-   await unlink(verdictPath);
- } catch { /* ignore */ }
658
- ```
659
-
660
- ### Verdict → failureCategory Mapping
661
-
662
- ```typescript
663
- // `FailureCategory` is the union used for `failureCategory` in ThreeSessionTddResult.
- function categorizeVerdict(
-   verdict: VerifierVerdict | null,
-   session3Success: boolean, // not consulted below; kept for callers that want to log it
-   testsPass: boolean,
- ): { success: boolean; failureCategory?: FailureCategory; reviewReason?: string } {
-
-   // No verdict file → fall back to existing behavior (test-only check)
-   if (!verdict) {
-     if (testsPass) return { success: true };
-     return {
-       success: false,
-       failureCategory: "tests-failing",
-       reviewReason: "Tests failing after all sessions (no verdict file)",
-     };
-   }
-
-   // Verdict: approved
-   if (verdict.approved) {
-     // The verdict is advisory; the independent test run is the ground truth (see Robustness)
-     if (!testsPass) {
-       return {
-         success: false,
-         failureCategory: "tests-failing",
-         reviewReason: "Verifier approved, but the independent test run failed",
-       };
-     }
-     return { success: true };
-   }
-
-   // Verdict: not approved; classify why
-
-   // Illegitimate test modifications (implementer cheated)
-   if (verdict.testModifications.detected && !verdict.testModifications.legitimate) {
-     return {
-       success: false,
-       failureCategory: "verifier-rejected",
-       reviewReason: `Verifier rejected: illegitimate test modifications in ${verdict.testModifications.files.join(", ")}. ${verdict.testModifications.reasoning}`,
-     };
-   }
-
-   // Tests failing
-   if (!verdict.tests.allPassing) {
-     return {
-       success: false,
-       failureCategory: "tests-failing",
-       reviewReason: `Tests failing: ${verdict.tests.failCount} failures. ${verdict.reasoning}`,
-     };
-   }
-
-   // Acceptance criteria not met
-   if (!verdict.acceptanceCriteria.allMet) {
-     const unmet = verdict.acceptanceCriteria.criteria
-       .filter(c => !c.met)
-       .map(c => c.criterion);
-     return {
-       success: false,
-       failureCategory: "verifier-rejected",
-       reviewReason: `Acceptance criteria not met: ${unmet.join("; ")}`,
-     };
-   }
-
-   // Poor quality
-   if (verdict.quality.rating === "poor") {
-     return {
-       success: false,
-       failureCategory: "verifier-rejected",
-       reviewReason: `Poor code quality: ${verdict.quality.issues.join("; ")}`,
-     };
-   }
-
-   // Catch-all: verdict says not approved but no clear reason
-   return {
-     success: false,
-     failureCategory: "verifier-rejected",
-     reviewReason: verdict.reasoning || "Verifier rejected without specific reason",
-   };
- }
732
- ```
733
-
734
- ### Escalation Behavior per Verdict
735
-
736
- | Verdict Reason | failureCategory | Escalation Path |
737
- |:---------------|:---------------|:---------------|
738
- | Illegitimate test mods | `verifier-rejected` | Escalate tier → pause after all tiers |
739
- | Tests failing | `tests-failing` | Escalate tier → fail after all tiers |
740
- | Criteria not met | `verifier-rejected` | Escalate tier → pause after all tiers |
741
- | Poor quality | `verifier-rejected` | Escalate tier → pause after all tiers |
742
- | Approved | — | Success ✅ |
743
- | No verdict file | Falls back to test check | Same as before |
744
-
745
- ### Verdict File Lifecycle
746
-
747
- 1. **Created by:** Verifier agent (session 3) writes `.nax-verifier-verdict.json`
748
- 2. **Read by:** TDD orchestrator after session 3 completes
749
- 3. **Deleted by:** TDD orchestrator after reading (not committed to git)
750
- 4. **Fallback:** If file missing or unparseable, fall back to existing behavior (post-TDD test verification)
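The four lifecycle steps can be exercised end to end with a small script. This sketch uses the synchronous `node:fs` API rather than `Bun.file` so that it runs as a plain script. `consumeVerdictFile` is a hypothetical name, and the temp directory stands in for the story's workdir.

```typescript
import { mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const VERDICT_FILE = ".nax-verifier-verdict.json";

// Hypothetical helper: read the verdict once, then always delete the file.
// Returns null when the file is missing or is not valid JSON.
function consumeVerdictFile(workdir: string): unknown {
  const verdictPath = join(workdir, VERDICT_FILE);
  let parsed: unknown = null;
  try {
    parsed = JSON.parse(readFileSync(verdictPath, "utf8"));
  } catch {
    parsed = null; // missing or malformed: caller falls back to the test-only check
  } finally {
    rmSync(verdictPath, { force: true }); // force: no error if the file is absent
  }
  return parsed;
}

// Usage: simulate the verifier writing a verdict, then consume it twice.
const workdir = mkdtempSync(join(tmpdir(), "nax-verdict-"));
writeFileSync(join(workdir, VERDICT_FILE), JSON.stringify({ version: 1, approved: true }));

const first = consumeVerdictFile(workdir);  // parsed verdict object
const second = consumeVerdictFile(workdir); // null: the file was deleted after the first read
```

Deleting in `finally` means a malformed file is removed too, so a stale verdict cannot leak into the next attempt.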
751
-
752
- ### `.gitignore`
753
-
754
- Add to the project's `.gitignore` (or to the `nax init` template):
755
- ```
756
- .nax-verifier-verdict.json
757
- ```
758
-
759
- ### Files to Change
760
-
761
- | File | Change |
762
- |:-----|:-------|
763
- | `src/tdd/types.ts` | Add `VerifierVerdict` interface |
764
- | `src/tdd/prompts.ts` | Update `buildVerifierPrompt()` with verdict file instructions |
765
- | `src/tdd/orchestrator.ts` | Read verdict file after session 3, map to `failureCategory` |
766
- | `src/tdd/verdict.ts` | **New file.** `readVerdict()`, `categorizeVerdict()`, `cleanupVerdict()` |
767
-
768
- ### Testing
769
-
770
- | Test | Description |
771
- |:-----|:------------|
772
- | `tdd/verdict.test.ts` | Unit: `categorizeVerdict()` for all verdict combinations |
773
- | `tdd/verdict.test.ts` | Unit: missing verdict file falls back gracefully |
774
- | `tdd/verdict.test.ts` | Unit: malformed JSON falls back gracefully |
775
- | `tdd/orchestrator.test.ts` | Integration: verdict file read + cleanup after session 3 |
776
- | `tdd/orchestrator.test.ts` | Integration: illegitimate test mods → `verifier-rejected` |
777
- | Manual | Run TDD on a story, verify verdict file is written and consumed |
778
-
779
- ### Robustness
780
-
781
- **What if the agent doesn't write the verdict file?**
782
- Fall back to existing behavior: run tests independently, check pass/fail. This is the same as v0.10.0. The verdict file is an enhancement, not a requirement.
783
-
784
- **What if the JSON is malformed?**
785
- Log warning, fall back to test-only check. Never crash.
786
-
787
- **What if the agent writes wrong data?**
788
- Validate required fields (`version`, `approved`, `tests`). Missing fields → fall back. The verdict is advisory — the independent test run is the ground truth for "tests pass."
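The required-field check could be a type guard that runs before the parsed JSON is treated as a verdict. The sketch below covers only the three fields named above. `VerdictCore`, `isRecord`, and `hasRequiredVerdictFields` are hypothetical names, and a real implementation might validate the remaining sections the same way.

```typescript
// Hypothetical subset of VerifierVerdict: just the fields treated as required.
interface VerdictCore {
  version: 1;
  approved: boolean;
  tests: {
    allPassing: boolean;
    passCount: number;
    failCount: number;
  };
}

// True for plain (non-null, non-array) objects.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical guard: returns false instead of throwing, so the caller can
// fall back to the test-only check.
function hasRequiredVerdictFields(value: unknown): value is VerdictCore {
  if (!isRecord(value)) return false;
  if (value["version"] !== 1) return false;
  if (typeof value["approved"] !== "boolean") return false;

  const tests = value["tests"];
  if (!isRecord(tests)) return false;
  return (
    typeof tests["allPassing"] === "boolean" &&
    typeof tests["passCount"] === "number" &&
    typeof tests["failCount"] === "number"
  );
}
```

A verdict that fails the guard would be handled exactly like a missing file.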
789
-
790
- ---
791
-
792
- # v0.10.1 Summary
793
-
794
- Three features, cohesive release:
795
-
796
- | Feature | Files Changed | Effort | Dependency |
797
- |:--------|:-------------|:-------|:-----------|
798
- | 1. `--status-file` | 3 (new `status-file.ts`, modify `runner.ts`, CLI) | Medium | None |
799
- | 2. TDD Escalation Retry | 5 (types, orchestrator, execution stage, pipeline types, runner) | Medium | None |
800
- | 3. Structured Verifier Verdicts | 4 (types, prompts, orchestrator, new `verdict.ts`) | Medium | Feature 2 (feeds `failureCategory`) |
801
-
802
- **Total files:** 10 changed/new (some overlap — `types.ts` and `orchestrator.ts` touched by features 2+3).
803
-
804
- **Breaking changes:** None. All features are additive/optional.
805
-
806
- **Config changes:** None. Uses existing escalation config.
807
-
808
- ### Implementation Order
809
-
810
- 1. Feature 1 (`--status-file`) — independent, can ship alone
811
- 2. Feature 2 (TDD escalation) — core retry logic
812
- 3. Feature 3 (verifier verdicts) — builds on feature 2's `failureCategory`