@nathapp/nax 0.28.0 → 0.29.0

Files changed (376)
  1. package/CHANGELOG.md +13 -2
  2. package/dist/nax.js +72691 -0
  3. package/package.json +12 -4
  4. package/src/cli/config.ts +3 -1
  5. package/src/config/defaults.ts +1 -0
  6. package/src/config/schemas.ts +1 -0
  7. package/src/config/types.ts +1 -0
  8. package/src/context/builder.ts +10 -1
  9. package/src/prompts/sections/role-task.ts +4 -2
  10. package/src/review/runner.ts +6 -1
  11. package/src/version.ts +2 -1
  12. package/.claude/rules/01-project-conventions.md +0 -34
  13. package/.claude/rules/02-test-architecture.md +0 -39
  14. package/.claude/rules/03-test-writing.md +0 -58
  15. package/.claude/rules/04-forbidden-patterns.md +0 -29
  16. package/.claude/settings.json +0 -15
  17. package/.githooks/pre-commit +0 -16
  18. package/.gitlab-ci.yml +0 -103
  19. package/.mcp.json +0 -8
  20. package/BRIEF.md +0 -140
  21. package/CLAUDE.md +0 -143
  22. package/US-007-IMPLEMENTATION.md +0 -139
  23. package/biome.json +0 -14
  24. package/bun.lock +0 -163
  25. package/bunfig.toml +0 -12
  26. package/docker-compose.test.yml +0 -15
  27. package/docs/20260216-fix-plan-context-review.md +0 -56
  28. package/docs/20260216-relentless-vs-ngent-comparison.md +0 -208
  29. package/docs/20260216-v02-plan.md +0 -136
  30. package/docs/20260216-v02-review.md +0 -685
  31. package/docs/20260217-dogfood-findings.md +0 -56
  32. package/docs/20260217-p2-plus-plan.md +0 -117
  33. package/docs/20260217-partial-fixes-plan.md +0 -62
  34. package/docs/20260217-plan-analyze-spec.md +0 -117
  35. package/docs/20260217-post-impl-review.md +0 -1137
  36. package/docs/20260217-quick-wins-plan.md +0 -66
  37. package/docs/20260217-split-runner-plan.md +0 -75
  38. package/docs/20260217-v03-impl-plan.md +0 -80
  39. package/docs/20260217-v03-post-impl-review.md +0 -589
  40. package/docs/20260217-v04-impl-plan.md +0 -86
  41. package/docs/20260217-v05-post-impl-review.md +0 -850
  42. package/docs/20260217-v06-post-impl-review.md +0 -817
  43. package/docs/20260218-adr003-port-plan.md +0 -151
  44. package/docs/20260218-review-adr003-verification.md +0 -175
  45. package/docs/20260219-fix-plan-bug16-19.md +0 -79
  46. package/docs/20260219-fix-plan-bug20-22.md +0 -114
  47. package/docs/20260219-plan-llm-routing.md +0 -116
  48. package/docs/20260219-review-bug20-22-fixes.md +0 -135
  49. package/docs/20260219-routing-baseline-keyword.md +0 -63
  50. package/docs/20260220-plan-structured-logging-p1.md +0 -80
  51. package/docs/20260220-plan-structured-logging-p2.md +0 -37
  52. package/docs/20260220-review-llm-routing.md +0 -180
  53. package/docs/20260220-review-post-fix-llm-routing.md +0 -70
  54. package/docs/20260221-fix-plan-relevantfiles-split.md +0 -101
  55. package/docs/20260221-fix-plan-routing-mode.md +0 -125
  56. package/docs/20260221-review-v0.9-implementation.md +0 -379
  57. package/docs/20260222-fix-plan-v091-routing-isolation.md +0 -197
  58. package/docs/20260223-fix-plan-prompt-audit.md +0 -62
  59. package/docs/20260224-nax-roadmap-phases.md +0 -189
  60. package/docs/20260225-phase2-llm-service-layer.md +0 -401
  61. package/docs/20260225-review-v0.10.1.md +0 -187
  62. package/docs/20260303-v010-implementation-plan.md +0 -165
  63. package/docs/20260304-review-nax.md +0 -492
  64. package/docs/CLAUDE.md.bak +0 -191
  65. package/docs/ROADMAP.md +0 -390
  66. package/docs/SPEC-rectification.md +0 -0
  67. package/docs/SPEC.md +0 -324
  68. package/docs/US-001-plugin-loading-verification.md +0 -152
  69. package/docs/adr/ADR-005-implementation-plan.md +0 -655
  70. package/docs/adr/ADR-005-pipeline-re-architecture.md +0 -464
  71. package/docs/architecture-analysis.md +0 -1076
  72. package/docs/bugs/BUG-21-escalation-null-attempts.md +0 -48
  73. package/docs/bugs-from-dogfood-run-c.md +0 -243
  74. package/docs/code-review-20260228.md +0 -612
  75. package/docs/code-review-v0.15.0.md +0 -629
  76. package/docs/hook-lifecycle-test-plan.md +0 -149
  77. package/docs/releases/v0.11.0-and-earlier.md +0 -20
  78. package/docs/releases/v0.12.0.md +0 -15
  79. package/docs/releases/v0.13.0.md +0 -14
  80. package/docs/releases/v0.14.0.md +0 -20
  81. package/docs/releases/v0.14.1.md +0 -36
  82. package/docs/releases/v0.14.2.md +0 -51
  83. package/docs/releases/v0.14.3.md +0 -174
  84. package/docs/releases/v0.14.4.md +0 -94
  85. package/docs/releases/v0.15.0.md +0 -502
  86. package/docs/releases/v0.15.1.md +0 -170
  87. package/docs/releases/v0.15.3.md +0 -193
  88. package/docs/specs/bug-039-orphan-processes.md +0 -131
  89. package/docs/specs/bug-040-review-rectification.md +0 -82
  90. package/docs/specs/bug-041-cross-story-test-isolation.md +0 -88
  91. package/docs/specs/bug-042-verifier-failure-capture.md +0 -117
  92. package/docs/specs/bun-pty-migration.md +0 -171
  93. package/docs/specs/central-run-registry.md +0 -116
  94. package/docs/specs/feat-010-smart-runner-git-history.md +0 -96
  95. package/docs/specs/feat-011-file-context-strategy.md +0 -73
  96. package/docs/specs/feat-012-tdd-writer-tier.md +0 -79
  97. package/docs/specs/feat-013-test-after-review.md +0 -89
  98. package/docs/specs/feat-014-heartbeat-observability.md +0 -127
  99. package/docs/specs/status-file-consolidation.md +0 -93
  100. package/docs/specs/status-file-v0.10.1.md +0 -812
  101. package/docs/specs/trigger-completion.md +0 -145
  102. package/docs/specs/verification-architecture-v2.md +0 -343
  103. package/docs/tdd/strategies.md +0 -97
  104. package/docs/v0.10-global-config.md +0 -206
  105. package/docs/v0.10-plugin-system.md +0 -415
  106. package/docs/v0.10-prompt-optimizer.md +0 -234
  107. package/docs/v0.3-spec.md +0 -244
  108. package/docs/v0.4-spec.md +0 -140
  109. package/docs/v0.5-spec.md +0 -237
  110. package/docs/v0.6-spec.md +0 -371
  111. package/docs/v0.7-spec.md +0 -177
  112. package/docs/v0.8-llm-routing.md +0 -206
  113. package/docs/v0.8-structured-logging.md +0 -132
  114. package/docs/v0.9.3-prompt-audit.md +0 -112
  115. package/examples/plugins/console-reporter/index.test.ts +0 -207
  116. package/examples/plugins/console-reporter/index.ts +0 -110
  117. package/memory/topic/feat-010-baseref.md +0 -28
  118. package/memory/topic/feat-013-test-after-deprecation.md +0 -22
  119. package/nax/config.json +0 -154
  120. package/nax/features/bug-039-medium/prd.json +0 -45
  121. package/nax/features/bugfix-v0171/prd.json +0 -52
  122. package/nax/features/central-run-registry/prd.json +0 -105
  123. package/nax/features/config-management/prd.json +0 -108
  124. package/nax/features/config-management/progress.txt +0 -5
  125. package/nax/features/diagnose/acceptance.test.ts +0 -414
  126. package/nax/features/diagnose/prd.json +0 -41
  127. package/nax/features/nax-compliance/prd.json +0 -52
  128. package/nax/features/nax-compliance/progress.txt +0 -1
  129. package/nax/features/orchestration-fixes/prd.json +0 -89
  130. package/nax/features/orchestration-fixes/progress.txt +0 -1
  131. package/nax/features/plugin-integration/US-007-VERIFICATION.md +0 -259
  132. package/nax/features/plugin-integration/prd.json +0 -208
  133. package/nax/features/plugin-integration/progress.txt +0 -5
  134. package/nax/features/post-rearch-bugfix/prd.json +0 -137
  135. package/nax/features/precheck/prd.json +0 -205
  136. package/nax/features/precheck/progress.txt +0 -15
  137. package/nax/features/prompt-builder/prd.json +0 -152
  138. package/nax/features/prompt-builder/progress.txt +0 -3
  139. package/nax/features/review-quality/prd.json +0 -55
  140. package/nax/features/routing-persistence/prd.json +0 -104
  141. package/nax/features/routing-persistence/progress.txt +0 -1
  142. package/nax/features/smart-test-runner/plan.md +0 -7
  143. package/nax/features/smart-test-runner/prd.json +0 -203
  144. package/nax/features/smart-test-runner/progress.txt +0 -13
  145. package/nax/features/smart-test-runner/spec.md +0 -7
  146. package/nax/features/smart-test-runner/tasks.md +0 -8
  147. package/nax/features/status-file-consolidation/prd.json +0 -106
  148. package/nax/features/structured-logging/prd.json +0 -199
  149. package/nax/features/trigger-completion/prd.json +0 -150
  150. package/nax/features/trigger-completion/progress.txt +0 -7
  151. package/nax/features/unlock/prd.json +0 -36
  152. package/nax/features/v0.18.3-execution-reliability/prd.json +0 -80
  153. package/nax/features/v0.18.3-execution-reliability/progress.txt +0 -3
  154. package/nax/features/v0.19.0-hardening/plan.md +0 -7
  155. package/nax/features/v0.19.0-hardening/prd.json +0 -84
  156. package/nax/features/v0.19.0-hardening/progress.txt +0 -7
  157. package/nax/features/v0.19.0-hardening/spec.md +0 -18
  158. package/nax/features/v0.19.0-hardening/tasks.md +0 -8
  159. package/nax/features/verify-v2/prd.json +0 -79
  160. package/nax/features/verify-v2/progress.txt +0 -3
  161. package/nax/status.json +0 -36
  162. package/test/COVERAGE-GAPS.md +0 -333
  163. package/test/e2e/cm-003-default-view.test.ts +0 -195
  164. package/test/e2e/plan-analyze-run.test.ts +0 -902
  165. package/test/helpers/helpers.test.ts +0 -295
  166. package/test/helpers/timeout.ts +0 -42
  167. package/test/integration/US-002-TEST-SUMMARY.md +0 -107
  168. package/test/integration/US-003-TEST-SUMMARY.md +0 -149
  169. package/test/integration/US-004-TEST-SUMMARY.md +0 -106
  170. package/test/integration/US-005-TEST-SUMMARY.md +0 -138
  171. package/test/integration/US-007-TEST-SUMMARY.md +0 -100
  172. package/test/integration/cli/agent-validation.test.ts +0 -439
  173. package/test/integration/cli/cli-config-default-edge-cases.test.ts +0 -223
  174. package/test/integration/cli/cli-config-default-view.test.ts +0 -230
  175. package/test/integration/cli/cli-config-diff.test.ts +0 -461
  176. package/test/integration/cli/cli-config-prompts-explain.test.ts +0 -74
  177. package/test/integration/cli/cli-config.test.ts +0 -737
  178. package/test/integration/cli/cli-diagnose.test.ts +0 -595
  179. package/test/integration/cli/cli-logs.test.ts +0 -346
  180. package/test/integration/cli/cli-plugins.test.ts +0 -679
  181. package/test/integration/cli/cli-precheck.test.ts +0 -372
  182. package/test/integration/cli/cli-run-headless.test.ts +0 -174
  183. package/test/integration/cli/cli.test.ts +0 -76
  184. package/test/integration/cli/precheck-integration.test.ts +0 -476
  185. package/test/integration/cli/precheck-orchestrator.test.ts +0 -247
  186. package/test/integration/cli/precheck.test.ts +0 -806
  187. package/test/integration/config/config-loader.test.ts +0 -266
  188. package/test/integration/config/config.test.ts +0 -444
  189. package/test/integration/config/merger.test.ts +0 -466
  190. package/test/integration/config/paths.test.ts +0 -52
  191. package/test/integration/config/security-loader.test.ts +0 -83
  192. package/test/integration/context/context-integration.test.ts +0 -703
  193. package/test/integration/context/context-path-security.test.ts +0 -173
  194. package/test/integration/context/context-provider-injection.test.ts +0 -507
  195. package/test/integration/context/context-verification-integration.test.ts +0 -296
  196. package/test/integration/context/s5-greenfield-fallback.test.ts +0 -298
  197. package/test/integration/execution/execution-isolation.test.ts +0 -143
  198. package/test/integration/execution/execution.test.ts +0 -634
  199. package/test/integration/execution/feature-status-write.test.ts +0 -302
  200. package/test/integration/execution/parallel.test.ts +0 -251
  201. package/test/integration/execution/prd-pause.test.ts +0 -205
  202. package/test/integration/execution/prd-resolvers.test.ts +0 -186
  203. package/test/integration/execution/progress.test.ts +0 -34
  204. package/test/integration/execution/runner-batching.test.ts +0 -682
  205. package/test/integration/execution/runner-config-plugins.test.ts +0 -462
  206. package/test/integration/execution/runner-escalation.test.ts +0 -561
  207. package/test/integration/execution/runner-fixes.test.ts +0 -400
  208. package/test/integration/execution/runner-plugin-integration.test.ts +0 -544
  209. package/test/integration/execution/runner-queue-and-attempts.test.ts +0 -476
  210. package/test/integration/execution/status-file-integration.test.ts +0 -289
  211. package/test/integration/execution/status-file.test.ts +0 -380
  212. package/test/integration/execution/status-writer.test.ts +0 -447
  213. package/test/integration/execution/story-id-in-events.test.ts +0 -274
  214. package/test/integration/interaction/interaction-chain-pipeline.test.ts +0 -476
  215. package/test/integration/pipeline/hooks.test.ts +0 -363
  216. package/test/integration/pipeline/pipeline-acceptance.test.ts +0 -303
  217. package/test/integration/pipeline/pipeline-events.test.ts +0 -476
  218. package/test/integration/pipeline/pipeline.test.ts +0 -660
  219. package/test/integration/pipeline/reporter-lifecycle.test.ts +0 -862
  220. package/test/integration/pipeline/verify-stage.test.ts +0 -286
  221. package/test/integration/plan/analyze-integration.test.ts +0 -262
  222. package/test/integration/plan/analyze-scanner.test.ts +0 -132
  223. package/test/integration/plan/logger.test.ts +0 -461
  224. package/test/integration/plan/plan.test.ts +0 -157
  225. package/test/integration/plugins/config-integration.test.ts +0 -173
  226. package/test/integration/plugins/config-resolution.test.ts +0 -523
  227. package/test/integration/plugins/loader.test.ts +0 -644
  228. package/test/integration/plugins/plugins-registry.test.ts +0 -747
  229. package/test/integration/plugins/validator.test.ts +0 -564
  230. package/test/integration/prompts/pb-004-migration.test.ts +0 -523
  231. package/test/integration/review/review-config-commands.test.ts +0 -320
  232. package/test/integration/review/review-config-schema.test.ts +0 -117
  233. package/test/integration/review/review-plugin-integration.test.ts +0 -729
  234. package/test/integration/review/review.test.ts +0 -150
  235. package/test/integration/routing/plugin-routing-advanced.test.ts +0 -461
  236. package/test/integration/routing/plugin-routing-core.test.ts +0 -527
  237. package/test/integration/routing/routing-stage-bug-021.test.ts +0 -275
  238. package/test/integration/routing/routing-stage-greenfield.test.ts +0 -287
  239. package/test/integration/tdd/tdd-cleanup.test.ts +0 -246
  240. package/test/integration/tdd/tdd-orchestrator-core.test.ts +0 -565
  241. package/test/integration/tdd/tdd-orchestrator-failureCategory.test.ts +0 -355
  242. package/test/integration/tdd/tdd-orchestrator-fallback.test.ts +0 -311
  243. package/test/integration/tdd/tdd-orchestrator-lite.test.ts +0 -289
  244. package/test/integration/tdd/tdd-orchestrator-prompts.test.ts +0 -260
  245. package/test/integration/tdd/tdd-orchestrator-verdict.test.ts +0 -536
  246. package/test/integration/tmp/headless-test/test.jsonl +0 -30
  247. package/test/integration/verification/test-scanner.test.ts +0 -403
  248. package/test/integration/verification/verification-asset-check.test.ts +0 -143
  249. package/test/integration/worktree/manager.test.ts +0 -218
  250. package/test/integration/worktree/worktree-merge.test.ts +0 -341
  251. package/test/manual/logging-formatter-demo.ts +0 -158
  252. package/test/ui/tui-agent-panel.test.tsx +0 -99
  253. package/test/ui/tui-pty-integration.test.tsx +0 -146
  254. package/test/unit/acceptance.test.ts +0 -187
  255. package/test/unit/agent-stderr-capture.test.ts +0 -147
  256. package/test/unit/agents/claude.test.ts +0 -107
  257. package/test/unit/analyze-classifier.test.ts +0 -216
  258. package/test/unit/analyze.test.ts +0 -224
  259. package/test/unit/auto-detect.test.ts +0 -250
  260. package/test/unit/cli-status-project-level.test.ts +0 -283
  261. package/test/unit/cli-status.test.ts +0 -418
  262. package/test/unit/commands/common.test.ts +0 -321
  263. package/test/unit/commands/logs.test.ts +0 -458
  264. package/test/unit/commands/runs.test.ts +0 -303
  265. package/test/unit/commands/unlock.test.ts +0 -320
  266. package/test/unit/config/defaults.test.ts +0 -70
  267. package/test/unit/config/quality-commands-schema.test.ts +0 -72
  268. package/test/unit/config/regression-gate-schema.test.ts +0 -160
  269. package/test/unit/config/smart-runner-flag.test.ts +0 -250
  270. package/test/unit/constitution-generators.test.ts +0 -161
  271. package/test/unit/constitution.test.ts +0 -210
  272. package/test/unit/context/context-autodetect.test.ts +0 -297
  273. package/test/unit/context/context-build.test.ts +0 -575
  274. package/test/unit/context/context-coverage.test.ts +0 -236
  275. package/test/unit/context/context-error.test.ts +0 -93
  276. package/test/unit/context/context-estimate-tokens.test.ts +0 -201
  277. package/test/unit/context/context-format.test.ts +0 -302
  278. package/test/unit/context/context-isolation.test.ts +0 -267
  279. package/test/unit/context/context-sort.test.ts +0 -93
  280. package/test/unit/context/context-story.test.ts +0 -108
  281. package/test/unit/context/prior-failures.test.ts +0 -463
  282. package/test/unit/context.test.ts +0 -1726
  283. package/test/unit/cost.test.ts +0 -231
  284. package/test/unit/crash-recovery.test.ts +0 -309
  285. package/test/unit/escalation.test.ts +0 -127
  286. package/test/unit/execution/lifecycle/run-completion.test.ts +0 -240
  287. package/test/unit/execution/lifecycle/run-regression.test.ts +0 -420
  288. package/test/unit/execution/pid-registry.test.ts +0 -241
  289. package/test/unit/execution/sequential-executor.test.ts +0 -235
  290. package/test/unit/execution/sfc-004-dead-code-cleanup.test.ts +0 -89
  291. package/test/unit/execution/structured-failure.test.ts +0 -415
  292. package/test/unit/execution-logging-stderr.test.ts +0 -157
  293. package/test/unit/execution-stage.test.ts +0 -123
  294. package/test/unit/fix-generator.test.ts +0 -276
  295. package/test/unit/formatters.test.ts +0 -468
  296. package/test/unit/greenfield.test.ts +0 -180
  297. package/test/unit/hooks/shell-security.test.ts +0 -40
  298. package/test/unit/interaction/auto-plugin.test.ts +0 -162
  299. package/test/unit/interaction/human-review-trigger.test.ts +0 -165
  300. package/test/unit/interaction-network-failures.test.ts +0 -390
  301. package/test/unit/interaction-plugins.test.ts +0 -472
  302. package/test/unit/logging/formatter.test.ts +0 -456
  303. package/test/unit/merge.test.ts +0 -269
  304. package/test/unit/metrics/aggregator.test.ts +0 -164
  305. package/test/unit/metrics/tracker.test.ts +0 -186
  306. package/test/unit/metrics.test.ts +0 -276
  307. package/test/unit/optimizer/noop.optimizer.test.ts +0 -125
  308. package/test/unit/optimizer/rule-based.optimizer.test.ts +0 -358
  309. package/test/unit/pipeline/event-bus.test.ts +0 -105
  310. package/test/unit/pipeline/routing-partial-override.test.ts +0 -121
  311. package/test/unit/pipeline/runner-retry.test.ts +0 -89
  312. package/test/unit/pipeline/stages/autofix.test.ts +0 -97
  313. package/test/unit/pipeline/stages/completion-review-gate.test.ts +0 -218
  314. package/test/unit/pipeline/stages/execution-ambiguity.test.ts +0 -311
  315. package/test/unit/pipeline/stages/execution-merge-conflict.test.ts +0 -218
  316. package/test/unit/pipeline/stages/rectify.test.ts +0 -101
  317. package/test/unit/pipeline/stages/regression-stage.test.ts +0 -69
  318. package/test/unit/pipeline/stages/review.test.ts +0 -201
  319. package/test/unit/pipeline/stages/routing-idempotence.test.ts +0 -139
  320. package/test/unit/pipeline/stages/routing-initial-complexity.test.ts +0 -321
  321. package/test/unit/pipeline/stages/routing-persistence.test.ts +0 -380
  322. package/test/unit/pipeline/stages/verify.test.ts +0 -267
  323. package/test/unit/pipeline/subscribers/events-writer.test.ts +0 -227
  324. package/test/unit/pipeline/subscribers/hooks.test.ts +0 -84
  325. package/test/unit/pipeline/subscribers/interaction.test.ts +0 -313
  326. package/test/unit/pipeline/subscribers/registry.test.ts +0 -149
  327. package/test/unit/pipeline/subscribers/reporters.test.ts +0 -90
  328. package/test/unit/pipeline/verify-smart-runner.test.ts +0 -345
  329. package/test/unit/prd-auto-default.test.ts +0 -291
  330. package/test/unit/prd-failure-category.test.ts +0 -177
  331. package/test/unit/prd-get-next-story.test.ts +0 -215
  332. package/test/unit/precheck/checks-warnings.test.ts +0 -114
  333. package/test/unit/precheck-checks.test.ts +0 -841
  334. package/test/unit/precheck-story-size-gate.test.ts +0 -288
  335. package/test/unit/precheck-types.test.ts +0 -143
  336. package/test/unit/prompts/builder.test.ts +0 -258
  337. package/test/unit/prompts/loader.test.ts +0 -355
  338. package/test/unit/prompts/sections/conventions.test.ts +0 -30
  339. package/test/unit/prompts/sections/isolation.test.ts +0 -35
  340. package/test/unit/prompts/sections/role-task.test.ts +0 -40
  341. package/test/unit/prompts/sections/sections.test.ts +0 -238
  342. package/test/unit/prompts/sections/story.test.ts +0 -45
  343. package/test/unit/prompts/sections/verdict.test.ts +0 -58
  344. package/test/unit/prompts.test.ts +0 -476
  345. package/test/unit/queue.test.ts +0 -237
  346. package/test/unit/rectification.test.ts +0 -285
  347. package/test/unit/registry.test.ts +0 -288
  348. package/test/unit/review/runner.test.ts +0 -117
  349. package/test/unit/routing/content-hash.test.ts +0 -99
  350. package/test/unit/routing/routing-stability.test.ts +0 -208
  351. package/test/unit/routing/strategies/llm.test.ts +0 -306
  352. package/test/unit/routing-advanced.test.ts +0 -313
  353. package/test/unit/routing-core.test.ts +0 -341
  354. package/test/unit/routing-strategies.test.ts +0 -440
  355. package/test/unit/storyid-events.test.ts +0 -213
  356. package/test/unit/tdd-verdict.test.ts +0 -492
  357. package/test/unit/test-output-parser.test.ts +0 -377
  358. package/test/unit/ui/tui-controls.test.ts +0 -335
  359. package/test/unit/ui/tui-cost-and-pty.test.ts +0 -190
  360. package/test/unit/ui/tui-layout.test.ts +0 -379
  361. package/test/unit/ui/tui-stories.test.ts +0 -333
  362. package/test/unit/unit-isolation.test.ts +0 -135
  363. package/test/unit/utils/git.test.ts +0 -50
  364. package/test/unit/utils/path-security.test.ts +0 -47
  365. package/test/unit/utils-helpers.test.ts +0 -318
  366. package/test/unit/verdict.test.ts +0 -325
  367. package/test/unit/verification/orchestrator-types.test.ts +0 -54
  368. package/test/unit/verification/orchestrator.test.ts +0 -66
  369. package/test/unit/verification/smart-runner-config.test.ts +0 -163
  370. package/test/unit/verification/smart-runner-discovery.test.ts +0 -354
  371. package/test/unit/verification/smart-runner.test.ts +0 -262
  372. package/test/unit/verification/strategies/acceptance.test.ts +0 -33
  373. package/test/unit/verification/strategies/regression.test.ts +0 -87
  374. package/test/unit/verification/strategies/scoped.test.ts +0 -100
  375. package/test/unit/worktree-manager.test.ts +0 -159
  376. package/tsconfig.json +0 -27
@@ -1,812 +0,0 @@
1
- # Spec: v0.10.1 — Status File + TDD Escalation Retry
2
-
3
- **Version:** v0.10.1
4
- **Author:** Subrina
5
- **Date:** 2026-02-25
6
- **Status:** Draft
7
-
8
- ---
9
-
10
- ## Summary
11
-
12
- Add a `--status-file <path>` flag to `nax run` that writes a machine-readable JSON status file, updated after each story completes. Enables external tools (CI/CD, orchestrators, dashboards) to monitor nax runs without parsing logs or aggregating hooks.
13
-
14
- ## Motivation
15
-
16
- - **Log parsing is fragile** — format changes break consumers
17
- - **Hook aggregation has gaps** — if a hook fails, events are lost; no single source of truth
18
- - **nax already tracks this state** — `RunResult`, story counts, cost, PRD status are all in memory
19
- - **General-purpose** — useful for any integration, not just our orchestrator skill
20
-
21
- ## Interface
22
-
23
- ### CLI Flag
24
-
25
- ```bash
26
- nax run -f <feature> --headless --status-file ./nax-status.json
27
- ```
28
-
29
- | Flag | Type | Default | Description |
30
- |:-----|:-----|:--------|:------------|
31
- | `--status-file` | `string` | `undefined` | Path to write JSON status file. If not set, no file is written. |
32
-
33
- Relative paths are resolved from `cwd` (same as `--headless` log behavior).
34
-
35
- ### Status File Schema
36
-
37
- ```typescript
38
- interface NaxStatusFile {
39
- /** Schema version for forward compatibility */
40
- version: 1;
41
-
42
- /** Run metadata */
43
- run: {
44
- id: string; // Run ID (e.g. "run-2026-02-25T10-00-00-000Z")
45
- feature: string; // Feature name
46
- startedAt: string; // ISO 8601
47
- status: "running" | "completed" | "failed" | "stalled";
48
- dryRun: boolean;
49
- };
50
-
51
- /** Aggregate progress */
52
- progress: {
53
- total: number; // Total stories in PRD
54
- passed: number;
55
- failed: number;
56
- paused: number;
57
- blocked: number;
58
- pending: number; // total - passed - failed - paused - blocked
59
- };
60
-
61
- /** Cost tracking */
62
- cost: {
63
- spent: number; // USD accumulated
64
- limit: number | null; // From config.execution.costLimit
65
- };
66
-
67
- /** Current story being processed (null if between stories) */
68
- current: {
69
- storyId: string;
70
- title: string;
71
- complexity: string; // simple | medium | complex
72
- tddStrategy: string; // test-after | tdd-lite | three-session-tdd
73
- model: string; // Resolved model name
74
- attempt: number; // Current attempt (1-based)
75
- phase: string; // routing | test-write | implement | verify | review
76
- } | null;
77
-
78
- /** Iteration count */
79
- iterations: number;
80
-
81
- /** Last updated timestamp */
82
- updatedAt: string; // ISO 8601
83
-
84
- /** Duration so far in ms */
85
- durationMs: number;
86
- }
87
- ```
88
-
89
- ### Example Output
90
-
91
- ```json
92
- {
93
- "version": 1,
94
- "run": {
95
- "id": "run-2026-02-25T10-00-00-000Z",
96
- "feature": "auth-refactor",
97
- "startedAt": "2026-02-25T10:00:00Z",
98
- "status": "running",
99
- "dryRun": false
100
- },
101
- "progress": {
102
- "total": 12,
103
- "passed": 7,
104
- "failed": 1,
105
- "paused": 0,
106
- "blocked": 1,
107
- "pending": 3
108
- },
109
- "cost": {
110
- "spent": 1.23,
111
- "limit": 5.00
112
- },
113
- "current": {
114
- "storyId": "US-008",
115
- "title": "Add retry logic to queue handler",
116
- "complexity": "medium",
117
- "tddStrategy": "tdd-lite",
118
- "model": "claude-sonnet-4-5-20250514",
119
- "attempt": 1,
120
- "phase": "implement"
121
- },
122
- "iterations": 8,
123
- "updatedAt": "2026-02-25T10:15:32Z",
124
- "durationMs": 932000
125
- }
126
- ```
127
-
128
- ## Implementation
129
-
130
- ### Files to Change
131
-
132
- | File | Change |
133
- |:-----|:-------|
134
- | `src/execution/runner.ts` | Add `statusFile?: string` to `RunOptions`. Call `writeStatusFile()` at key points. |
135
- | `src/execution/status-file.ts` | **New file.** `writeStatusFile()` function — builds `NaxStatusFile` from run state, writes atomically. |
136
- | `src/main.ts` (or wherever CLI args are parsed) | Add `--status-file` option, pass to `RunOptions`. |
137
-
138
- ### Write Points
139
-
140
- The status file is updated at these moments:
141
-
142
- 1. **Run start** — initial state (all stories pending)
143
- 2. **Story start** — update `current` with story info
144
- 3. **Story complete/fail/pause** — update `progress` counts, clear `current`
145
- 4. **Run end** — final state (`status: "completed"` or `"failed"`)
146
-
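The four write points above can be read as a small state machine. The sketch below is illustrative only: `MiniStatus`, `WriteEvent`, and `applyWriteEvent` are hypothetical names that do not appear in the spec, and the counters are incremented by hand here for brevity, whereas the spec derives them from PRD state.

```typescript
// Reduced, hypothetical view of the status file: only the fields the write points touch.
interface MiniStatus {
  status: "running" | "completed" | "failed";
  current: { storyId: string } | null;
  passed: number;
  failed: number;
  paused: number;
}

// Hypothetical event names, one per write point in the list above.
type WriteEvent =
  | { type: "run-start" }
  | { type: "story-start"; storyId: string }
  | { type: "story-end"; outcome: "passed" | "failed" | "paused" }
  | { type: "run-end"; ok: boolean };

// Returns the next snapshot; the caller would serialize it after every event.
function applyWriteEvent(prev: MiniStatus, ev: WriteEvent): MiniStatus {
  switch (ev.type) {
    case "run-start":
      // Write point 1: initial state, nothing in flight.
      return { status: "running", current: null, passed: 0, failed: 0, paused: 0 };
    case "story-start":
      // Write point 2: record which story is being processed.
      return { ...prev, current: { storyId: ev.storyId } };
    case "story-end": {
      // Write point 3: bump the matching counter and clear `current`.
      const next = { ...prev, current: null };
      next[ev.outcome] += 1;
      return next;
    }
    case "run-end":
      // Write point 4: final state.
      return { ...prev, current: null, status: ev.ok ? "completed" : "failed" };
  }
}
```

Keeping the transition pure makes each write point easy to unit-test without touching the file system.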
147
- ### Atomic Writes
148
-
149
- Write to `<path>.tmp` then rename to `<path>` to prevent readers from seeing partial JSON:
150
-
151
- ```typescript
152
- import { rename } from "node:fs/promises";
153
-
154
- async function writeStatusFile(path: string, status: NaxStatusFile): Promise<void> {
155
- const tmpPath = `${path}.tmp`;
156
- await Bun.write(tmpPath, JSON.stringify(status, null, 2));
157
- await rename(tmpPath, path);
158
- }
159
- ```
160
-
161
- ### Integration with RunOptions
162
-
163
- ```typescript
164
- // src/execution/runner.ts
165
- export interface RunOptions {
166
- // ... existing fields
167
- /** Path to write JSON status file (optional) */
168
- statusFile?: string;
169
- }
170
- ```
171
-
172
- ### Progress Counting
173
-
174
- Derive from PRD state (already loaded):
175
-
176
- ```typescript
177
- function countProgress(prd: PRD): NaxStatusFile["progress"] {
178
- const stories = prd.stories;
179
- const passed = stories.filter(s => s.status === "passed").length;
180
- const failed = stories.filter(s => s.status === "failed").length;
181
- const paused = stories.filter(s => s.status === "paused").length;
182
- const blocked = stories.filter(s => s.status === "blocked").length;
183
- const total = stories.length;
184
- return { total, passed, failed, paused, blocked, pending: total - passed - failed - paused - blocked };
185
- }
186
- ```
187
-
188
- ### Cleanup
189
-
190
- The status file is **not** deleted on run end — it persists as a record of the last run. Consumers can check `run.status` to determine if the run is still active.
191
-
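On the consumer side, a polling tool might parse the file defensively and then check `run.status`. This is a minimal sketch under stated assumptions: `StatusSnapshot`, `parseStatus`, and `isRunActive` are hypothetical names, and only a subset of the schema is declared.

```typescript
// Hypothetical consumer-side types: only the fields this helper reads.
type RunStatus = "running" | "completed" | "failed" | "stalled";

interface StatusSnapshot {
  version: 1;
  run: { status: RunStatus };
}

const KNOWN_STATUSES: readonly string[] = ["running", "completed", "failed", "stalled"];

// Parses the raw file contents; returns null for anything that is not a
// version-1 status document (unknown version, malformed JSON, missing fields).
function parseStatus(json: string): StatusSnapshot | null {
  try {
    const parsed = JSON.parse(json);
    if (parsed?.version !== 1) return null;
    if (!KNOWN_STATUSES.includes(parsed?.run?.status)) return null;
    return parsed as StatusSnapshot;
  } catch {
    return null;
  }
}

// The file persists after the run ends, so "active" is decided by run.status.
function isRunActive(snapshot: StatusSnapshot): boolean {
  return snapshot.run.status === "running";
}
```

Checking `version` first lets a consumer refuse a schema it does not understand instead of misreading it.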
192
- ## Testing
193
-
194
- | Test | Description |
195
- |:-----|:------------|
196
- | `status-file.test.ts` | Unit: `writeStatusFile()` produces valid JSON, atomic write works |
197
- | `status-file.test.ts` | Unit: `countProgress()` correctly counts all states |
198
- | `runner.test.ts` | Integration: `--status-file` option flows through to `RunOptions` |
199
- | `runner.test.ts` | Integration: status file updates at each write point |
200
- | Manual | `--status-file` + `--dry-run` produces correct output |
201
-
202
- ## Non-Goals
203
-
204
- - **Real-time streaming** — this is a polled file, not a websocket/SSE stream
205
- - **Historical run data** — status file represents current/last run only (hooks + events.jsonl cover history)
206
- - **`nax status --json` command** — future work, can read this file
207
-
208
- ## Migration
209
-
210
- None. New optional flag, no breaking changes. If `--status-file` is not passed, behavior is identical to v0.10.0.
211
-
212
- ---
213
-
214
- # Feature 2: TDD Escalation Retry
215
-
216
- ## Summary
217
-
218
- Three-session TDD currently hard-codes `pause` for all failures — isolation violations, session crashes, and test failures all result in the story being paused with no retry. This means TDD stories never benefit from the escalation system that test-after stories use.
219
-
220
- Change: TDD failures should follow the same escalation retry pattern as test-after. Only pause when all retry paths are exhausted.
221
-
222
- ## Problem
223
-
224
- Current flow (all TDD failures):
225
- ```
226
- TDD failure → needsHumanReview=true → execution stage returns "pause" → story paused → NO RETRY
227
- ```
228
-
229
- test-after flow (for comparison):
230
- ```
231
- Agent failure → execution stage returns "escalate" → runner bumps tier → retries → only fails after max attempts
232
- ```
233
-
234
- ## Proposed Retry Strategy
235
-
236
- TDD failures are classified into three categories with different retry paths:
237
-
238
- ### Category 1: Isolation Violation (test-writer touches source)
239
-
240
- **Current:** Pause immediately.
241
- **Proposed:** Auto-downgrade to tdd-lite, then escalate.
242
-
243
- ```
244
- three-session-tdd fails (isolation violation)
245
- → Retry 1: three-session-tdd-lite (same tier, skip isolation for writer/implementer)
246
- → Success? Done ✅
247
- → Fail? Escalate to next tier
248
- → Retry 2: tdd-lite + stronger model
249
- → Success? Done ✅
250
- → Fail? Continue escalation through tier chain
251
- → All tiers exhausted → pause (needs human review) ⏸
252
- ```
253
-
254
- **Note:** The zero-file fallback already does this for one specific case (test-writer creates no test files → auto-retry as lite). This generalizes that pattern to all isolation violations.
255
-
256
- ### Category 2: Session Failure (agent crash, timeout, non-zero exit)
257
-
258
- **Current:** Pause immediately.
259
- **Proposed:** Escalate model tier (same as test-after).
260
-
261
- ```
262
- TDD session fails (crash/timeout)
263
- → Escalate to next model tier
264
- → Retry with stronger model (same TDD strategy)
265
- → Success? Done ✅
266
- → Fail? Continue escalation
267
- → All tiers exhausted → mark failed ❌
268
- ```
269
-
270
- ### Category 3: Tests Still Failing After All Sessions
271
-
272
- **Current:** Post-TDD verification runs. If tests fail → pause.
273
- **Proposed:** Escalate model tier.
274
-
275
- ```
276
- All 3 sessions complete but tests still fail
277
- → Escalate to next model tier
278
- → Retry full TDD with stronger model
279
- → Success? Done ✅
280
- → Fail? Continue escalation
281
- → All tiers exhausted → mark failed ❌
282
- ```
283
-
284
- ### Summary Table
285
-
286
- | Failure Type | Current Action | New Action | Final Fallback |
287
- |:-------------|:--------------|:-----------|:--------------|
288
- | Isolation violation | pause | Downgrade to lite → escalate | pause (human review) |
289
- | Zero test files created | lite retry (exists) | Keep existing + escalate | pause (human review) |
290
- | Session crash/timeout | pause | Escalate tier | fail |
291
- | Tests fail post-TDD | pause | Escalate tier | fail |
292
- | Verifier flags bad code | pause | Escalate tier | pause (human review) |
293
-
294
- **Why "pause" for isolation/verifier but "fail" for crashes?**
295
- - Isolation violations and verifier concerns suggest the code needs *human judgment* — the AI may be fundamentally misunderstanding the task.
296
- - Crashes and test failures are mechanical — a stronger model usually fixes them.
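The "Final Fallback" column of the summary table can be read as a small lookup that the runner consults once every tier is exhausted. The sketch below is illustrative only: the `FailureCategory` alias, the `FinalAction` type, and the `finalActionWhenExhausted` helper are hypothetical names that do not appear in the spec, and the `pause` default for an unknown category is an assumption that mirrors the `default` branch of the proposed `execution.ts` routing.

```typescript
// Hypothetical alias for the union used by `failureCategory` in this spec.
type FailureCategory =
  | "isolation-violation"
  | "session-failure"
  | "tests-failing"
  | "verifier-rejected";

// Hypothetical type: what the runner does once all tiers are exhausted.
type FinalAction = "pause" | "fail";

/**
 * Maps a failure category to its final fallback, following the
 * "Final Fallback" column of the summary table.
 */
function finalActionWhenExhausted(category?: FailureCategory): FinalAction {
  switch (category) {
    // Mechanical failures: no human judgment needed, so mark the story failed.
    case "session-failure":
    case "tests-failing":
      return "fail";
    // Judgment calls: hand the story to a human reviewer.
    case "isolation-violation":
    case "verifier-rejected":
      return "pause";
    // Assumption: an uncategorized failure is treated conservatively as a pause.
    default:
      return "pause";
  }
}

console.log(finalActionWhenExhausted("tests-failing"));
console.log(finalActionWhenExhausted("verifier-rejected"));
```

Keeping this mapping in one place would let the escalation handler stay unaware of why a story failed until the tier chain runs out.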
297
-
298
- ## Implementation
299
-
300
- ### Changes to `ThreeSessionTddResult`
301
-
302
- Add a `failureCategory` field so the execution stage can differentiate:
303
-
304
- ```typescript
305
- export interface ThreeSessionTddResult {
306
- success: boolean;
307
- sessions: TddSessionResult[];
308
- needsHumanReview: boolean;
309
- reviewReason?: string;
310
- totalCost: number;
311
- lite: boolean;
312
-
313
- /** NEW: Categorize failure for retry routing */
314
- failureCategory?: "isolation-violation" | "session-failure" | "tests-failing" | "verifier-rejected";
315
- }
316
- ```
317
-
318
- ### Changes to `execution.ts` (pipeline stage)
319
-
320
- Replace the blanket `pause` with category-based routing:
321
-
322
- ```typescript
323
- // Current:
324
- if (tddResult.needsHumanReview) {
325
- return { action: "pause", reason: tddResult.reviewReason };
326
- }
327
-
328
- // Proposed:
329
- if (!tddResult.success) {
330
- switch (tddResult.failureCategory) {
331
- case "isolation-violation":
332
- // If already lite → escalate. If strict → retry as lite (same tier).
333
- if (tddResult.lite) {
334
- return { action: "escalate", reason: tddResult.reviewReason };
335
- }
336
- // Store flag in context so runner knows to downgrade strategy
337
- ctx.retryAsLite = true;
338
- return { action: "escalate", reason: "Isolation violation: downgrading to lite" };
339
-
340
- case "session-failure":
341
- case "tests-failing":
342
- return { action: "escalate", reason: tddResult.reviewReason };
343
-
344
- case "verifier-rejected":
345
- // Escalate first, pause only after all tiers exhausted
346
- return { action: "escalate", reason: tddResult.reviewReason };
347
-
348
- default:
349
- return { action: "pause", reason: tddResult.reviewReason };
350
- }
351
- }
352
- ```
353
-
354
- ### Changes to `runner.ts` (escalation handler)
355
-
356
- When escalating a TDD story with `retryAsLite`, update the story's routing to use `three-session-tdd-lite`:
357
-
358
- ```typescript
359
- case "escalate": {
360
- // ... existing escalation logic ...
361
-
362
- // NEW: If retryAsLite flag set, downgrade TDD strategy
363
- if (pipelineResult.context?.retryAsLite && story.routing) {
364
- story.routing.testStrategy = "three-session-tdd-lite";
365
- }
366
-
367
- // ... rest of escalation ...
368
- }
369
- ```
370
-
371
- ### Changes to `tdd/orchestrator.ts`
372
-
373
- Set `failureCategory` based on what went wrong:
374
-
375
- ```typescript
376
- // After session 1 (test-writer) isolation failure:
377
- return {
378
- success: false,
379
- ...
380
- failureCategory: "isolation-violation",
381
- };
382
-
383
- // After session crash/timeout:
384
- return {
385
- success: false,
386
- ...
387
- failureCategory: "session-failure",
388
- };
389
-
390
- // After post-TDD verification fails:
391
- return {
392
- success: false,
393
- ...
394
- failureCategory: "tests-failing",
395
- };
396
- ```
397
-
398
- ### Files to Change
399
-
400
- | File | Change |
401
- |:-----|:-------|
402
- | `src/tdd/types.ts` | Add `failureCategory` to `ThreeSessionTddResult` |
403
- | `src/tdd/orchestrator.ts` | Set `failureCategory` at each failure point |
404
- | `src/pipeline/stages/execution.ts` | Route by `failureCategory` instead of blanket `pause` |
405
- | `src/pipeline/types.ts` | Add `retryAsLite?: boolean` to `PipelineContext` |
406
- | `src/execution/runner.ts` | Handle `retryAsLite` flag in escalation case |
407
-
408
- ### Testing
409
-
410
- | Test | Description |
411
- |:-----|:------------|
412
- | `tdd/orchestrator.test.ts` | Unit: each failure path sets correct `failureCategory` |
413
- | `pipeline/execution.test.ts` | Unit: isolation violation returns `escalate` (not `pause`) |
414
- | `pipeline/execution.test.ts` | Unit: lite isolation violation returns `escalate` |
415
- | `pipeline/execution.test.ts` | Unit: session failure returns `escalate` |
416
- | `execution/runner.test.ts` | Integration: TDD story escalates through tiers before failing |
417
- | `execution/runner.test.ts` | Integration: `retryAsLite` downgrades strategy on next attempt |
418
- | Manual | Run with intentionally strict project, verify lite downgrade + tier escalation |
419
-
420
- ## Retry Budget
421
-
422
- Uses the existing escalation config (`autoMode.escalation.tierOrder`). Example:
423
-
424
- ```json
425
- {
426
- "autoMode": {
427
- "escalation": {
428
- "enabled": true,
429
- "tierOrder": [
430
- { "tier": "fast", "attempts": 2 },
431
- { "tier": "balanced", "attempts": 2 },
432
- { "tier": "powerful", "attempts": 1 }
433
- ]
434
- }
435
- }
436
- }
437
- ```
438
-
439
- For a strict TDD story with isolation violation:
440
- ```
441
- Attempt 1: three-session-tdd @ fast → isolation violation
442
- Attempt 2: three-session-tdd-lite @ fast → tests fail
443
- Attempt 3: tdd-lite @ balanced → tests fail
444
- Attempt 4: tdd-lite @ balanced → tests fail
445
- Attempt 5: tdd-lite @ powerful → success ✅ (or fail → pause)
446
- ```
447
-
448
- Max cost is bounded by the existing tier budget. No new config needed.
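The attempt sequence above follows mechanically from `tierOrder`. The sketch below expands a tier budget into the worst-case list of attempts, assuming the first attempt hits an isolation violation and every later attempt runs as lite. `TierBudget`, `PlannedAttempt`, and `planAttempts` are hypothetical names used for illustration, not part of the nax codebase.

```typescript
// Hypothetical shape of one entry in `autoMode.escalation.tierOrder`.
interface TierBudget {
  tier: string;
  attempts: number;
}

// Hypothetical record describing one planned attempt.
interface PlannedAttempt {
  attempt: number;
  tier: string;
  strategy: "three-session-tdd" | "three-session-tdd-lite";
}

/**
 * Expands the tier budget into the worst-case attempt list for a strict TDD
 * story whose first attempt ends in an isolation violation.
 */
function planAttempts(tierOrder: TierBudget[]): PlannedAttempt[] {
  const plan: PlannedAttempt[] = [];
  for (const { tier, attempts } of tierOrder) {
    for (let i = 0; i < attempts; i++) {
      // Only the very first attempt is strict; every retry is downgraded to lite.
      const strategy = plan.length === 0 ? "three-session-tdd" : "three-session-tdd-lite";
      plan.push({ attempt: plan.length + 1, tier, strategy });
    }
  }
  return plan;
}

const plan = planAttempts([
  { tier: "fast", attempts: 2 },
  { tier: "balanced", attempts: 2 },
  { tier: "powerful", attempts: 1 },
]);
for (const p of plan) {
  console.log(`Attempt ${p.attempt}: ${p.strategy} @ ${p.tier}`);
}
```

The total number of attempts is the sum of `attempts` across tiers, which is why the lite downgrade adds no cost beyond the existing budget.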
449
-
450
- ---
451
-
452
- # Feature 3: Structured Verifier Verdicts
453
-
454
- ## Summary
455
-
456
- The verifier (session 3) is designed to judge whether the implementer's changes are legitimate — especially when the implementer modified test files. Currently, this judgment is implicit: the verifier runs as a regular agent, and the only signal is "did tests pass after verifier ran?" There's no structured verdict flowing back to the pipeline.
457
-
458
- Add structured output parsing to the verifier session so its judgment feeds into `failureCategory` and the escalation system.
459
-
460
- ## Problem
461
-
462
- Current verifier prompt asks it to:
463
- 1. Run tests and verify they pass
464
- 2. Review implementation quality
465
- 3. Check acceptance criteria
466
- 4. **Check if implementer modified test files and judge legitimacy**
467
- 5. Fix issues minimally
468
-
469
- But the result is just `{ success: boolean, estimatedCost: number }` — same as any agent session. The verifier's judgment about test modifications, code quality, and acceptance criteria is lost.
470
-
471
- **Consequences:**
472
- - If verifier finds illegitimate test modifications, it tries to fix them but we don't know *what* it found
473
- - If verifier can't fix the issue, it exits non-zero → treated same as a crash
474
- - No signal to differentiate "tests pass but code is bad" from "tests fail"
475
- - The `VerifierDecision` type exists in `types.ts` but is **never populated**
476
-
477
- ## Proposed Solution
478
-
479
- ### Structured Verdict File
480
-
481
- Instead of parsing agent stdout (fragile), the verifier writes a structured verdict file that the orchestrator reads after the session:
482
-
483
- ```
484
- <workdir>/.nax-verifier-verdict.json
485
- ```
486
-
487
- **Why a file?** Claude Code (the agent) can easily write files. Parsing structured output from stdout is unreliable with Claude Code since it mixes tool calls, thinking, and output.
488
-
489
- ### Verdict Schema
490
-
491
- ```typescript
492
- interface VerifierVerdict {
493
- /** Schema version */
494
- version: 1;
495
-
496
- /** Overall approval */
497
- approved: boolean;
498
-
499
- /** Test results */
500
- tests: {
501
- /** Did all tests pass? */
502
- allPassing: boolean;
503
- /** Number of tests passing */
504
- passCount: number;
505
- /** Number of tests failing */
506
- failCount: number;
507
- };
508
-
509
- /** Implementer test modification review */
510
- testModifications: {
511
- /** Were test files modified by implementer? */
512
- detected: boolean;
513
- /** List of modified test files */
514
- files: string[];
515
- /** Are the modifications legitimate? */
516
- legitimate: boolean;
517
- /** Reasoning for legitimacy judgment */
518
- reasoning: string;
519
- };
520
-
521
- /** Acceptance criteria check */
522
- acceptanceCriteria: {
523
- /** All criteria met? */
524
- allMet: boolean;
525
- /** Per-criterion status */
526
- criteria: Array<{
527
- criterion: string;
528
- met: boolean;
529
- note?: string;
530
- }>;
531
- };
532
-
533
- /** Code quality assessment */
534
- quality: {
535
- /** Overall quality: good | acceptable | poor */
536
- rating: "good" | "acceptable" | "poor";
537
- /** Issues found */
538
- issues: string[];
539
- };
540
-
541
- /** Fixes applied by verifier */
542
- fixes: string[];
543
-
544
- /** Overall reasoning */
545
- reasoning: string;
546
- }
547
- ```
548
-
549
- ### Updated Verifier Prompt
550
-
551
- ```typescript
552
- export function buildVerifierPrompt(story: UserStory): string {
553
- return `# Test-Driven Development — Session 3: Verify
554
-
555
- You are in the third session of a three-session TDD workflow. Tests and implementation are complete.
556
-
557
- **Story:** ${story.title}
558
-
559
- **Your tasks:**
560
- 1. Run all tests and verify they pass
561
- 2. Review the implementation for quality and correctness
562
- 3. Check that the implementation meets all acceptance criteria
563
- 4. Check if test files were modified by the implementer. If yes, verify the changes are legitimate fixes (e.g. fixing incorrect expectations) and NOT just loosening assertions to mask bugs.
564
- 5. If any issues exist, fix them minimally
565
-
566
- **Acceptance Criteria:**
567
- ${story.acceptanceCriteria.map((ac, i) => `${i + 1}. ${ac}`).join("\n")}
568
-
569
- **IMPORTANT — Write Verdict File:**
570
- After completing your review, write a JSON verdict file to \`.nax-verifier-verdict.json\` in the project root.
571
-
572
- \`\`\`json
573
- {
574
- "version": 1,
575
- "approved": true,
576
- "tests": {
577
- "allPassing": true,
578
- "passCount": 15,
579
- "failCount": 0
580
- },
581
- "testModifications": {
582
- "detected": false,
583
- "files": [],
584
- "legitimate": true,
585
- "reasoning": "No test files were modified by implementer"
586
- },
587
- "acceptanceCriteria": {
588
- "allMet": true,
589
- "criteria": [
590
- { "criterion": "Criterion text", "met": true }
591
- ]
592
- },
593
- "quality": {
594
- "rating": "good",
595
- "issues": []
596
- },
597
- "fixes": [],
598
- "reasoning": "All tests pass, implementation is clean, all criteria met."
599
- }
600
- \`\`\`
601
-
602
- Set \`approved: false\` if:
603
- - Tests are failing and you cannot fix them
604
- - Implementer loosened test assertions to mask bugs (testModifications.legitimate = false)
605
- - Critical acceptance criteria are not met
606
- - Code quality is poor with security or correctness issues
607
-
608
- Set \`approved: true\` if:
609
- - All tests pass (or pass after your minimal fixes)
610
- - Implementation is clean and follows conventions
611
- - All acceptance criteria met
612
- - Any test modifications by implementer are legitimate fixes
613
-
614
- When done, commit any fixes with message: "fix: verify and adjust ${story.title}"`;
615
- }
616
- ```
617
-
618
- ### Orchestrator Changes
619
-
620
- After verifier session completes, read and parse the verdict file:
621
-
622
- ```typescript
623
- // In tdd/orchestrator.ts, after session 3 completes:
624
-
625
- // Read verdict file
626
- const verdictPath = path.join(workdir, ".nax-verifier-verdict.json");
627
- let verdict: VerifierVerdict | null = null;
628
-
629
- try {
630
- const file = Bun.file(verdictPath);
631
- if (await file.exists()) {
632
- verdict = await file.json() as VerifierVerdict;
633
- logger.info("tdd", "Verifier verdict loaded", {
634
- storyId: story.id,
635
- approved: verdict.approved,
636
- testsAllPassing: verdict.tests.allPassing,
637
- testModsDetected: verdict.testModifications.detected,
638
- testModsLegitimate: verdict.testModifications.legitimate,
639
- qualityRating: verdict.quality.rating,
640
- allCriteriaMet: verdict.acceptanceCriteria.allMet,
641
- });
642
- } else {
643
- logger.warn("tdd", "No verifier verdict file found — falling back to test-only check", {
644
- storyId: story.id,
645
- });
646
- }
647
- } catch (err) {
648
- logger.warn("tdd", "Failed to parse verifier verdict", {
649
- storyId: story.id,
650
- error: String(err),
651
- });
652
- }
653
-
654
- // Clean up verdict file (don't leave it in the repo)
655
- try {
656
- await unlink(verdictPath);
657
- } catch { /* ignore */ }
658
- ```
659
-
660
- ### Verdict → failureCategory Mapping
661
-
662
- ```typescript
663
- function categorizeVerdict(
664
- verdict: VerifierVerdict | null,
665
- session3Success: boolean,
666
- testsPass: boolean,
667
- ): { success: boolean; failureCategory?: ThreeSessionTddResult["failureCategory"]; reviewReason?: string } {
668
-
669
- // No verdict file → fall back to existing behavior (test-only check)
670
- if (!verdict) {
671
- if (testsPass) return { success: true };
672
- return {
673
- success: false,
674
- failureCategory: "tests-failing",
675
- reviewReason: "Tests failing after all sessions (no verdict file)",
676
- };
677
- }
678
-
679
- // Verdict: approved
680
- if (verdict.approved) {
681
- return { success: true };
682
- }
683
-
684
- // Verdict: not approved — classify why
685
-
686
- // Illegitimate test modifications (implementer cheated)
687
- if (verdict.testModifications.detected && !verdict.testModifications.legitimate) {
688
- return {
689
- success: false,
690
- failureCategory: "verifier-rejected",
691
- reviewReason: `Verifier rejected: illegitimate test modifications in ${verdict.testModifications.files.join(", ")}. ${verdict.testModifications.reasoning}`,
692
- };
693
- }
694
-
695
- // Tests failing
696
- if (!verdict.tests.allPassing) {
697
- return {
698
- success: false,
699
- failureCategory: "tests-failing",
700
- reviewReason: `Tests failing: ${verdict.tests.failCount} failures. ${verdict.reasoning}`,
701
- };
702
- }
703
-
704
- // Acceptance criteria not met
705
- if (!verdict.acceptanceCriteria.allMet) {
706
- const unmet = verdict.acceptanceCriteria.criteria
707
- .filter(c => !c.met)
708
- .map(c => c.criterion);
709
- return {
710
- success: false,
711
- failureCategory: "verifier-rejected",
712
- reviewReason: `Acceptance criteria not met: ${unmet.join("; ")}`,
713
- };
714
- }
715
-
716
- // Poor quality
717
- if (verdict.quality.rating === "poor") {
718
- return {
719
- success: false,
720
- failureCategory: "verifier-rejected",
721
- reviewReason: `Poor code quality: ${verdict.quality.issues.join("; ")}`,
722
- };
723
- }
724
-
725
- // Catch-all: verdict says not approved but no clear reason
726
- return {
727
- success: false,
728
- failureCategory: "verifier-rejected",
729
- reviewReason: verdict.reasoning || "Verifier rejected without specific reason",
730
- };
731
- }
732
- ```
733
-
734
- ### Escalation Behavior per Verdict
735
-
736
- | Verdict Reason | failureCategory | Escalation Path |
737
- |:---------------|:---------------|:---------------|
738
- | Illegitimate test mods | `verifier-rejected` | Escalate tier → pause after all tiers |
739
- | Tests failing | `tests-failing` | Escalate tier → fail after all tiers |
740
- | Criteria not met | `verifier-rejected` | Escalate tier → pause after all tiers |
741
- | Poor quality | `verifier-rejected` | Escalate tier → pause after all tiers |
742
- | Approved | — | Success ✅ |
743
- | No verdict file | `tests-failing` if the independent test run fails; otherwise success | Escalate tier → fail after all tiers |
744
-
745
- ### Verdict File Lifecycle
746
-
747
- 1. **Created by:** Verifier agent (session 3) writes `.nax-verifier-verdict.json`
748
- 2. **Read by:** TDD orchestrator after session 3 completes
749
- 3. **Deleted by:** TDD orchestrator after reading (not committed to git)
750
- 4. **Fallback:** If file missing or unparseable, fall back to existing behavior (post-TDD test verification)
751
-
752
- ### `.gitignore`
753
-
754
- Add to project `.gitignore` (or nax init template):
755
- ```
756
- .nax-verifier-verdict.json
757
- ```
758
-
759
- ### Files to Change
760
-
761
- | File | Change |
762
- |:-----|:-------|
763
- | `src/tdd/types.ts` | Add `VerifierVerdict` interface |
764
- | `src/tdd/prompts.ts` | Update `buildVerifierPrompt()` with verdict file instructions |
765
- | `src/tdd/orchestrator.ts` | Read verdict file after session 3, map to `failureCategory` |
766
- | `src/tdd/verdict.ts` | **New file.** `readVerdict()`, `categorizeVerdict()`, `cleanupVerdict()` |
767
-
768
- ### Testing
769
-
770
- | Test | Description |
771
- |:-----|:------------|
772
- | `tdd/verdict.test.ts` | Unit: `categorizeVerdict()` for all verdict combinations |
773
- | `tdd/verdict.test.ts` | Unit: missing verdict file falls back gracefully |
774
- | `tdd/verdict.test.ts` | Unit: malformed JSON falls back gracefully |
775
- | `tdd/orchestrator.test.ts` | Integration: verdict file read + cleanup after session 3 |
776
- | `tdd/orchestrator.test.ts` | Integration: illegitimate test mods → `verifier-rejected` |
777
- | Manual | Run TDD on a story, verify verdict file is written and consumed |
778
-
779
- ### Robustness
780
-
781
- **What if the agent doesn't write the verdict file?**
782
- Fall back to existing behavior: run tests independently, check pass/fail. This is the same as v0.10.0. The verdict file is an enhancement, not a requirement.
783
-
784
- **What if the JSON is malformed?**
785
- Log warning, fall back to test-only check. Never crash.
786
-
787
- **What if the agent writes wrong data?**
788
- Validate required fields (`version`, `approved`, `tests`). Missing fields → fall back. The verdict is advisory — the independent test run is the ground truth for "tests pass."
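The required-field check described above could take the form of a type guard that runs on the parsed JSON before the orchestrator trusts it. This is a minimal sketch under stated assumptions: `MinimalVerdict` and `hasRequiredVerdictFields` are hypothetical names, and the guard covers only the three fields named here (`version`, `approved`, `tests`), not the full `VerifierVerdict` schema.

```typescript
// Hypothetical subset of VerifierVerdict covering only the required fields.
interface MinimalVerdict {
  version: 1;
  approved: boolean;
  tests: { allPassing: boolean; passCount: number; failCount: number };
}

/**
 * Returns true only when the parsed value has the required fields with the
 * expected primitive types. Never throws, so callers can fall back to the
 * test-only check on a false result.
 */
function hasRequiredVerdictFields(value: unknown): value is MinimalVerdict {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (v.version !== 1 || typeof v.approved !== "boolean") return false;

  const tests = v.tests;
  if (typeof tests !== "object" || tests === null) return false;
  const t = tests as Record<string, unknown>;
  return (
    typeof t.allPassing === "boolean" &&
    typeof t.passCount === "number" &&
    typeof t.failCount === "number"
  );
}

// Usage: treat an invalid verdict exactly like a missing file.
const parsed: unknown = JSON.parse('{"version":1,"approved":"yes"}');
const verdict = hasRequiredVerdictFields(parsed) ? parsed : null;
console.log(verdict === null ? "falling back to test-only check" : "verdict accepted");
```

Returning `null` for an invalid verdict keeps a single fallback path: missing, malformed, and incomplete files are all handled by the same branch.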
789
-
790
- ---
791
-
792
- # v0.10.1 Summary
793
-
794
- Three features, cohesive release:
795
-
796
- | Feature | Files Changed | Effort | Dependency |
797
- |:--------|:-------------|:-------|:-----------|
798
- | 1. `--status-file` | 3 (new `status-file.ts`, modify `runner.ts`, CLI) | Medium | None |
799
- | 2. TDD Escalation Retry | 5 (types, orchestrator, execution stage, pipeline types, runner) | Medium | None |
800
- | 3. Structured Verifier Verdicts | 4 (types, prompts, orchestrator, new `verdict.ts`) | Medium | Feature 2 (feeds `failureCategory`) |
801
-
802
- **Total files:** 10 changed/new (some overlap — `types.ts` and `orchestrator.ts` touched by features 2+3).
803
-
804
- **Breaking changes:** None. All features are additive/optional.
805
-
806
- **Config changes:** None. Uses existing escalation config.
807
-
808
- ### Implementation Order
809
-
810
- 1. Feature 1 (`--status-file`) — independent, can ship alone
811
- 2. Feature 2 (TDD escalation) — core retry logic
812
- 3. Feature 3 (verifier verdicts) — builds on feature 2's `failureCategory`