@nathapp/nax 0.18.1
- package/.gitlab-ci.yml +96 -0
- package/BRIEF.md +140 -0
- package/CHANGELOG.md +60 -0
- package/CLAUDE.md +159 -0
- package/README.md +373 -0
- package/US-007-IMPLEMENTATION.md +139 -0
- package/bin/nax.ts +930 -0
- package/biome.json +14 -0
- package/bun.lock +168 -0
- package/bunfig.toml +11 -0
- package/docs/20260216-fix-plan-context-review.md +56 -0
- package/docs/20260216-relentless-vs-ngent-comparison.md +208 -0
- package/docs/20260216-v02-plan.md +136 -0
- package/docs/20260216-v02-review.md +685 -0
- package/docs/20260217-dogfood-findings.md +56 -0
- package/docs/20260217-p2-plus-plan.md +117 -0
- package/docs/20260217-partial-fixes-plan.md +62 -0
- package/docs/20260217-plan-analyze-spec.md +117 -0
- package/docs/20260217-post-impl-review.md +1137 -0
- package/docs/20260217-quick-wins-plan.md +66 -0
- package/docs/20260217-split-runner-plan.md +75 -0
- package/docs/20260217-v03-impl-plan.md +80 -0
- package/docs/20260217-v03-post-impl-review.md +589 -0
- package/docs/20260217-v04-impl-plan.md +86 -0
- package/docs/20260217-v05-post-impl-review.md +850 -0
- package/docs/20260217-v06-post-impl-review.md +817 -0
- package/docs/20260218-adr003-port-plan.md +151 -0
- package/docs/20260218-review-adr003-verification.md +175 -0
- package/docs/20260219-fix-plan-bug16-19.md +79 -0
- package/docs/20260219-fix-plan-bug20-22.md +114 -0
- package/docs/20260219-plan-llm-routing.md +116 -0
- package/docs/20260219-review-bug20-22-fixes.md +135 -0
- package/docs/20260219-routing-baseline-keyword.md +63 -0
- package/docs/20260220-plan-structured-logging-p1.md +80 -0
- package/docs/20260220-plan-structured-logging-p2.md +37 -0
- package/docs/20260220-review-llm-routing.md +180 -0
- package/docs/20260220-review-post-fix-llm-routing.md +70 -0
- package/docs/20260221-fix-plan-relevantfiles-split.md +101 -0
- package/docs/20260221-fix-plan-routing-mode.md +125 -0
- package/docs/20260221-review-v0.9-implementation.md +379 -0
- package/docs/20260222-fix-plan-v091-routing-isolation.md +197 -0
- package/docs/20260223-fix-plan-prompt-audit.md +62 -0
- package/docs/20260224-nax-roadmap-phases.md +189 -0
- package/docs/20260225-phase2-llm-service-layer.md +401 -0
- package/docs/20260225-review-v0.10.1.md +187 -0
- package/docs/20260303-v010-implementation-plan.md +165 -0
- package/docs/CLAUDE.md.bak +191 -0
- package/docs/ROADMAP.md +165 -0
- package/docs/SPEC-rectification.md +0 -0
- package/docs/SPEC.md +324 -0
- package/docs/US-001-plugin-loading-verification.md +152 -0
- package/docs/architecture-analysis.md +1076 -0
- package/docs/bugs/BUG-21-escalation-null-attempts.md +48 -0
- package/docs/bugs-from-dogfood-run-c.md +243 -0
- package/docs/code-review-20260228.md +612 -0
- package/docs/code-review-v0.15.0.md +629 -0
- package/docs/hook-lifecycle-test-plan.md +149 -0
- package/docs/releases/v0.11.0-and-earlier.md +20 -0
- package/docs/releases/v0.12.0.md +15 -0
- package/docs/releases/v0.13.0.md +14 -0
- package/docs/releases/v0.14.0.md +20 -0
- package/docs/releases/v0.14.1.md +36 -0
- package/docs/releases/v0.14.2.md +51 -0
- package/docs/releases/v0.14.3.md +174 -0
- package/docs/releases/v0.14.4.md +94 -0
- package/docs/releases/v0.15.0.md +502 -0
- package/docs/releases/v0.15.1.md +170 -0
- package/docs/releases/v0.15.3.md +193 -0
- package/docs/specs/status-file-v0.10.1.md +812 -0
- package/docs/v0.10-global-config.md +206 -0
- package/docs/v0.10-plugin-system.md +415 -0
- package/docs/v0.10-prompt-optimizer.md +234 -0
- package/docs/v0.3-spec.md +244 -0
- package/docs/v0.4-spec.md +140 -0
- package/docs/v0.5-spec.md +237 -0
- package/docs/v0.6-spec.md +371 -0
- package/docs/v0.7-spec.md +177 -0
- package/docs/v0.8-llm-routing.md +206 -0
- package/docs/v0.8-structured-logging.md +132 -0
- package/docs/v0.9.3-prompt-audit.md +112 -0
- package/examples/plugins/console-reporter/index.test.ts +207 -0
- package/examples/plugins/console-reporter/index.ts +110 -0
- package/nax/config.json +147 -0
- package/nax/features/bugfix-v0171/prd.json +52 -0
- package/nax/features/config-management/prd.json +108 -0
- package/nax/features/config-management/progress.txt +5 -0
- package/nax/features/diagnose/acceptance.test.ts +412 -0
- package/nax/features/diagnose/prd.json +41 -0
- package/nax/features/orchestration-fixes/prd.json +89 -0
- package/nax/features/orchestration-fixes/progress.txt +1 -0
- package/nax/features/plugin-integration/US-007-VERIFICATION.md +259 -0
- package/nax/features/plugin-integration/prd.json +208 -0
- package/nax/features/plugin-integration/progress.txt +5 -0
- package/nax/features/precheck/prd.json +205 -0
- package/nax/features/precheck/progress.txt +15 -0
- package/nax/features/structured-logging/prd.json +199 -0
- package/nax/features/unlock/prd.json +36 -0
- package/package.json +47 -0
- package/src/acceptance/fix-generator.ts +348 -0
- package/src/acceptance/generator.ts +282 -0
- package/src/acceptance/index.ts +30 -0
- package/src/acceptance/types.ts +79 -0
- package/src/agents/claude-decompose.ts +169 -0
- package/src/agents/claude-plan.ts +139 -0
- package/src/agents/claude.ts +324 -0
- package/src/agents/cost.ts +268 -0
- package/src/agents/index.ts +13 -0
- package/src/agents/registry.ts +48 -0
- package/src/agents/types-extended.ts +133 -0
- package/src/agents/types.ts +113 -0
- package/src/agents/validation.ts +69 -0
- package/src/analyze/classifier.ts +305 -0
- package/src/analyze/index.ts +16 -0
- package/src/analyze/scanner.ts +175 -0
- package/src/analyze/types.ts +51 -0
- package/src/cli/accept.ts +108 -0
- package/src/cli/analyze-parser.ts +284 -0
- package/src/cli/analyze.ts +207 -0
- package/src/cli/config.ts +561 -0
- package/src/cli/constitution.ts +109 -0
- package/src/cli/diagnose-analysis.ts +159 -0
- package/src/cli/diagnose-formatter.ts +87 -0
- package/src/cli/diagnose.ts +203 -0
- package/src/cli/generate.ts +127 -0
- package/src/cli/index.ts +37 -0
- package/src/cli/init.ts +188 -0
- package/src/cli/interact.ts +295 -0
- package/src/cli/plan.ts +198 -0
- package/src/cli/plugins.ts +111 -0
- package/src/cli/prompts.ts +295 -0
- package/src/cli/runs.ts +174 -0
- package/src/cli/status-cost.ts +151 -0
- package/src/cli/status-features.ts +338 -0
- package/src/cli/status.ts +13 -0
- package/src/commands/common.ts +171 -0
- package/src/commands/diagnose.ts +17 -0
- package/src/commands/index.ts +8 -0
- package/src/commands/logs.ts +384 -0
- package/src/commands/precheck.ts +86 -0
- package/src/commands/unlock.ts +96 -0
- package/src/config/defaults.ts +160 -0
- package/src/config/index.ts +22 -0
- package/src/config/loader.ts +121 -0
- package/src/config/merger.ts +147 -0
- package/src/config/path-security.ts +121 -0
- package/src/config/paths.ts +27 -0
- package/src/config/schema.ts +56 -0
- package/src/config/schemas.ts +286 -0
- package/src/config/types.ts +423 -0
- package/src/config/validate.ts +103 -0
- package/src/constitution/generator.ts +191 -0
- package/src/constitution/generators/aider.ts +41 -0
- package/src/constitution/generators/claude.ts +35 -0
- package/src/constitution/generators/cursor.ts +36 -0
- package/src/constitution/generators/opencode.ts +38 -0
- package/src/constitution/generators/types.ts +33 -0
- package/src/constitution/generators/windsurf.ts +36 -0
- package/src/constitution/index.ts +10 -0
- package/src/constitution/loader.ts +133 -0
- package/src/constitution/types.ts +31 -0
- package/src/context/auto-detect.ts +227 -0
- package/src/context/builder.ts +246 -0
- package/src/context/elements.ts +83 -0
- package/src/context/formatter.ts +107 -0
- package/src/context/generator.ts +129 -0
- package/src/context/generators/aider.ts +34 -0
- package/src/context/generators/claude.ts +28 -0
- package/src/context/generators/cursor.ts +28 -0
- package/src/context/generators/opencode.ts +30 -0
- package/src/context/generators/windsurf.ts +28 -0
- package/src/context/greenfield.ts +114 -0
- package/src/context/index.ts +33 -0
- package/src/context/injector.ts +279 -0
- package/src/context/test-scanner.ts +370 -0
- package/src/context/types.ts +98 -0
- package/src/errors.ts +67 -0
- package/src/execution/batching.ts +157 -0
- package/src/execution/crash-recovery.ts +373 -0
- package/src/execution/escalation/escalation.ts +44 -0
- package/src/execution/escalation/index.ts +13 -0
- package/src/execution/escalation/tier-escalation.ts +295 -0
- package/src/execution/escalation/tier-outcome.ts +158 -0
- package/src/execution/helpers.ts +38 -0
- package/src/execution/index.ts +45 -0
- package/src/execution/lifecycle/acceptance-loop.ts +272 -0
- package/src/execution/lifecycle/headless-formatter.ts +85 -0
- package/src/execution/lifecycle/index.ts +12 -0
- package/src/execution/lifecycle/parallel-lifecycle.ts +101 -0
- package/src/execution/lifecycle/precheck-runner.ts +140 -0
- package/src/execution/lifecycle/run-cleanup.ts +81 -0
- package/src/execution/lifecycle/run-completion.ts +129 -0
- package/src/execution/lifecycle/run-initialization.ts +141 -0
- package/src/execution/lifecycle/run-lifecycle.ts +312 -0
- package/src/execution/lifecycle/run-setup.ts +204 -0
- package/src/execution/lifecycle/story-hooks.ts +38 -0
- package/src/execution/lifecycle/story-size-prompts.ts +123 -0
- package/src/execution/lock.ts +115 -0
- package/src/execution/parallel-executor.ts +216 -0
- package/src/execution/parallel.ts +400 -0
- package/src/execution/pid-registry.ts +280 -0
- package/src/execution/pipeline-result-handler.ts +388 -0
- package/src/execution/post-verify-rectification.ts +188 -0
- package/src/execution/post-verify.ts +274 -0
- package/src/execution/progress.ts +25 -0
- package/src/execution/prompts.ts +127 -0
- package/src/execution/queue-handler.ts +109 -0
- package/src/execution/rectification.ts +13 -0
- package/src/execution/runner.ts +377 -0
- package/src/execution/sequential-executor.ts +388 -0
- package/src/execution/status-file.ts +264 -0
- package/src/execution/status-writer.ts +139 -0
- package/src/execution/story-context.ts +229 -0
- package/src/execution/test-output-parser.ts +14 -0
- package/src/execution/verification.ts +72 -0
- package/src/hooks/index.ts +2 -0
- package/src/hooks/runner.ts +286 -0
- package/src/hooks/types.ts +67 -0
- package/src/interaction/chain.ts +154 -0
- package/src/interaction/index.ts +60 -0
- package/src/interaction/init.ts +83 -0
- package/src/interaction/plugins/auto.ts +217 -0
- package/src/interaction/plugins/cli.ts +300 -0
- package/src/interaction/plugins/telegram.ts +384 -0
- package/src/interaction/plugins/webhook.ts +258 -0
- package/src/interaction/state.ts +171 -0
- package/src/interaction/triggers.ts +229 -0
- package/src/interaction/types.ts +163 -0
- package/src/logger/formatters.ts +84 -0
- package/src/logger/index.ts +16 -0
- package/src/logger/logger.ts +298 -0
- package/src/logger/types.ts +48 -0
- package/src/logging/formatter.ts +355 -0
- package/src/logging/index.ts +22 -0
- package/src/logging/types.ts +93 -0
- package/src/metrics/aggregator.ts +190 -0
- package/src/metrics/index.ts +14 -0
- package/src/metrics/tracker.ts +200 -0
- package/src/metrics/types.ts +109 -0
- package/src/optimizer/index.ts +62 -0
- package/src/optimizer/noop.optimizer.ts +24 -0
- package/src/optimizer/rule-based.optimizer.ts +248 -0
- package/src/optimizer/types.ts +53 -0
- package/src/pipeline/events.ts +130 -0
- package/src/pipeline/index.ts +19 -0
- package/src/pipeline/runner.ts +161 -0
- package/src/pipeline/stages/acceptance.ts +197 -0
- package/src/pipeline/stages/completion.ts +99 -0
- package/src/pipeline/stages/constitution.ts +63 -0
- package/src/pipeline/stages/context.ts +117 -0
- package/src/pipeline/stages/execution.ts +194 -0
- package/src/pipeline/stages/index.ts +62 -0
- package/src/pipeline/stages/optimizer.ts +74 -0
- package/src/pipeline/stages/prompt.ts +57 -0
- package/src/pipeline/stages/queue-check.ts +103 -0
- package/src/pipeline/stages/review.ts +181 -0
- package/src/pipeline/stages/routing.ts +81 -0
- package/src/pipeline/stages/verify.ts +100 -0
- package/src/pipeline/types.ts +167 -0
- package/src/plugins/index.ts +31 -0
- package/src/plugins/loader.ts +287 -0
- package/src/plugins/registry.ts +168 -0
- package/src/plugins/types.ts +327 -0
- package/src/plugins/validator.ts +352 -0
- package/src/prd/index.ts +172 -0
- package/src/prd/types.ts +202 -0
- package/src/precheck/checks-blockers.ts +391 -0
- package/src/precheck/checks-warnings.ts +142 -0
- package/src/precheck/checks.ts +30 -0
- package/src/precheck/index.ts +247 -0
- package/src/precheck/story-size-gate.ts +144 -0
- package/src/precheck/types.ts +31 -0
- package/src/queue/index.ts +2 -0
- package/src/queue/manager.ts +254 -0
- package/src/queue/types.ts +54 -0
- package/src/review/index.ts +8 -0
- package/src/review/runner.ts +172 -0
- package/src/review/types.ts +66 -0
- package/src/routing/builder.ts +81 -0
- package/src/routing/chain.ts +74 -0
- package/src/routing/index.ts +16 -0
- package/src/routing/loader.ts +58 -0
- package/src/routing/router.ts +303 -0
- package/src/routing/strategies/adaptive.ts +215 -0
- package/src/routing/strategies/index.ts +8 -0
- package/src/routing/strategies/keyword.ts +163 -0
- package/src/routing/strategies/llm-prompts.ts +209 -0
- package/src/routing/strategies/llm.ts +235 -0
- package/src/routing/strategies/manual.ts +50 -0
- package/src/routing/strategy.ts +99 -0
- package/src/tdd/cleanup.ts +111 -0
- package/src/tdd/index.ts +23 -0
- package/src/tdd/isolation.ts +123 -0
- package/src/tdd/orchestrator.ts +383 -0
- package/src/tdd/prompts.ts +270 -0
- package/src/tdd/rectification-gate.ts +183 -0
- package/src/tdd/session-runner.ts +179 -0
- package/src/tdd/types.ts +81 -0
- package/src/tdd/verdict.ts +271 -0
- package/src/tui/App.tsx +265 -0
- package/src/tui/components/AgentPanel.tsx +75 -0
- package/src/tui/components/CostOverlay.tsx +118 -0
- package/src/tui/components/HelpOverlay.tsx +107 -0
- package/src/tui/components/StatusBar.tsx +63 -0
- package/src/tui/components/StoriesPanel.tsx +177 -0
- package/src/tui/hooks/useKeyboard.ts +142 -0
- package/src/tui/hooks/useLayout.ts +137 -0
- package/src/tui/hooks/usePipelineEvents.ts +183 -0
- package/src/tui/hooks/usePty.ts +194 -0
- package/src/tui/index.tsx +38 -0
- package/src/tui/types.ts +76 -0
- package/src/utils/git.ts +83 -0
- package/src/utils/queue-writer.ts +54 -0
- package/src/verification/executor.ts +235 -0
- package/src/verification/gate.ts +207 -0
- package/src/verification/index.ts +12 -0
- package/src/verification/parser.ts +230 -0
- package/src/verification/rectification.ts +108 -0
- package/src/verification/types.ts +113 -0
- package/src/worktree/dispatcher.ts +65 -0
- package/src/worktree/index.ts +2 -0
- package/src/worktree/manager.ts +187 -0
- package/src/worktree/merge.ts +301 -0
- package/src/worktree/types.ts +4 -0
- package/test/TEST_COVERAGE_US001.md +217 -0
- package/test/TEST_COVERAGE_US003.md +84 -0
- package/test/TEST_COVERAGE_US005.md +86 -0
- package/test/US-002-orchestrator.test.ts +246 -0
- package/test/acceptance/cm-003-default-view.test.ts +194 -0
- package/test/execution/pid-registry.test.ts +240 -0
- package/test/execution/post-verify.test.ts +224 -0
- package/test/helpers/timeout.ts +42 -0
- package/test/integration/US-002-TEST-SUMMARY.md +107 -0
- package/test/integration/US-003-TEST-SUMMARY.md +149 -0
- package/test/integration/US-004-TEST-SUMMARY.md +106 -0
- package/test/integration/US-005-TEST-SUMMARY.md +138 -0
- package/test/integration/US-007-TEST-SUMMARY.md +100 -0
- package/test/integration/agent-validation.test.ts +439 -0
- package/test/integration/analyze-integration.test.ts +261 -0
- package/test/integration/analyze-scanner.test.ts +131 -0
- package/test/integration/cli-config-default-edge-cases.test.ts +222 -0
- package/test/integration/cli-config-default-view.test.ts +229 -0
- package/test/integration/cli-config-diff.test.ts +460 -0
- package/test/integration/cli-config.test.ts +736 -0
- package/test/integration/cli-diagnose.test.ts +592 -0
- package/test/integration/cli-logs.test.ts +314 -0
- package/test/integration/cli-plugins.test.ts +678 -0
- package/test/integration/cli-precheck.test.ts +371 -0
- package/test/integration/cli-run-headless.test.ts +173 -0
- package/test/integration/cli.test.ts +75 -0
- package/test/integration/config/merger.test.ts +465 -0
- package/test/integration/config/paths.test.ts +51 -0
- package/test/integration/config-loader.test.ts +265 -0
- package/test/integration/config.test.ts +444 -0
- package/test/integration/context-integration.test.ts +702 -0
- package/test/integration/context-provider-injection.test.ts +506 -0
- package/test/integration/context-verification-integration.test.ts +295 -0
- package/test/integration/e2e.test.ts +896 -0
- package/test/integration/execution.test.ts +625 -0
- package/test/integration/helpers.test.ts +295 -0
- package/test/integration/hooks.test.ts +361 -0
- package/test/integration/interaction-chain-pipeline.test.ts +464 -0
- package/test/integration/isolation.test.ts +143 -0
- package/test/integration/logger.test.ts +461 -0
- package/test/integration/parallel.test.ts +250 -0
- package/test/integration/path-security.test.ts +173 -0
- package/test/integration/pipeline-acceptance.test.ts +302 -0
- package/test/integration/pipeline-events.test.ts +475 -0
- package/test/integration/pipeline.test.ts +658 -0
- package/test/integration/plan.test.ts +157 -0
- package/test/integration/plugin-routing.test.ts +921 -0
- package/test/integration/plugins/config-integration.test.ts +172 -0
- package/test/integration/plugins/config-resolution.test.ts +522 -0
- package/test/integration/plugins/loader.test.ts +641 -0
- package/test/integration/plugins/registry.test.ts +746 -0
- package/test/integration/plugins/validator.test.ts +563 -0
- package/test/integration/prd-pause.test.ts +205 -0
- package/test/integration/prd-resolvers.test.ts +185 -0
- package/test/integration/precheck-integration.test.ts +468 -0
- package/test/integration/precheck.test.ts +805 -0
- package/test/integration/progress.test.ts +34 -0
- package/test/integration/rectification-flow.test.ts +512 -0
- package/test/integration/reporter-lifecycle.test.ts +860 -0
- package/test/integration/review-config-commands.test.ts +319 -0
- package/test/integration/review-config-schema.test.ts +116 -0
- package/test/integration/review-plugin-integration.test.ts +722 -0
- package/test/integration/review.test.ts +149 -0
- package/test/integration/routing-stage-bug-021.test.ts +274 -0
- package/test/integration/routing-stage-greenfield.test.ts +286 -0
- package/test/integration/runner-config-plugins.test.ts +461 -0
- package/test/integration/runner-fixes.test.ts +399 -0
- package/test/integration/runner-plugin-integration.test.ts +543 -0
- package/test/integration/runner.test.ts +1679 -0
- package/test/integration/s5-greenfield-fallback.test.ts +297 -0
- package/test/integration/status-file-integration.test.ts +325 -0
- package/test/integration/status-file.test.ts +379 -0
- package/test/integration/status-writer.test.ts +345 -0
- package/test/integration/story-id-in-events.test.ts +273 -0
- package/test/integration/tdd-cleanup.test.ts +246 -0
- package/test/integration/tdd-orchestrator.test.ts +1762 -0
- package/test/integration/test-scanner.test.ts +403 -0
- package/test/integration/verification-asset-check.test.ts +142 -0
- package/test/integration/verify-stage.test.ts +275 -0
- package/test/integration/worktree/manager.test.ts +218 -0
- package/test/integration/worktree/merge.test.ts +341 -0
- package/test/manual/logging-formatter-demo.ts +158 -0
- package/test/ui/tui-agent-panel.test.tsx +99 -0
- package/test/ui/tui-controls.test.ts +334 -0
- package/test/ui/tui-cost-and-pty.test.ts +189 -0
- package/test/ui/tui-layout.test.ts +378 -0
- package/test/ui/tui-pty-integration.test.tsx +159 -0
- package/test/ui/tui-stories.test.ts +332 -0
- package/test/unit/acceptance.test.ts +186 -0
- package/test/unit/agent-stderr-capture.test.ts +146 -0
- package/test/unit/analyze-classifier.test.ts +215 -0
- package/test/unit/analyze.test.ts +224 -0
- package/test/unit/auto-detect.test.ts +249 -0
- package/test/unit/cli-status.test.ts +417 -0
- package/test/unit/commands/common.test.ts +320 -0
- package/test/unit/commands/logs.test.ts +416 -0
- package/test/unit/commands/unlock.test.ts +319 -0
- package/test/unit/constitution-generators.test.ts +160 -0
- package/test/unit/constitution.test.ts +209 -0
- package/test/unit/context.test.ts +1722 -0
- package/test/unit/cost.test.ts +231 -0
- package/test/unit/crash-recovery.test.ts +308 -0
- package/test/unit/escalation.test.ts +126 -0
- package/test/unit/execution-logging-stderr.test.ts +156 -0
- package/test/unit/execution-stage.test.ts +122 -0
- package/test/unit/fix-generator.test.ts +275 -0
- package/test/unit/formatters.test.ts +469 -0
- package/test/unit/greenfield.test.ts +179 -0
- package/test/unit/helpers.test.ts +317 -0
- package/test/unit/interaction/human-review-trigger.test.ts +164 -0
- package/test/unit/interaction-network-failures.test.ts +389 -0
- package/test/unit/interaction-plugins.test.ts +164 -0
- package/test/unit/isolation.test.ts +134 -0
- package/test/unit/logging/formatter.test.ts +455 -0
- package/test/unit/merge.test.ts +268 -0
- package/test/unit/metrics.test.ts +276 -0
- package/test/unit/optimizer/noop.optimizer.test.ts +125 -0
- package/test/unit/optimizer/rule-based.optimizer.test.ts +358 -0
- package/test/unit/prd-auto-default.test.ts +290 -0
- package/test/unit/prd-failure-category.test.ts +176 -0
- package/test/unit/prd-get-next-story.test.ts +186 -0
- package/test/unit/precheck-checks.test.ts +840 -0
- package/test/unit/precheck-story-size-gate.test.ts +287 -0
- package/test/unit/precheck-types.test.ts +142 -0
- package/test/unit/prompts.test.ts +475 -0
- package/test/unit/queue.test.ts +237 -0
- package/test/unit/rectification.test.ts +284 -0
- package/test/unit/registry.test.ts +287 -0
- package/test/unit/routing.test.ts +937 -0
- package/test/unit/run-lifecycle.test.ts +140 -0
- package/test/unit/storyid-events.test.ts +224 -0
- package/test/unit/tdd-verdict.test.ts +492 -0
- package/test/unit/test-output-parser.test.ts +377 -0
- package/test/unit/verdict.test.ts +324 -0
- package/test/unit/worktree-manager.test.ts +158 -0
- package/tsconfig.json +27 -0
@@ -0,0 +1,812 @@

# Spec: v0.10.1 — Status File + TDD Escalation Retry

**Version:** v0.10.1
**Author:** Subrina
**Date:** 2026-02-25
**Status:** Draft

---

## Summary

Add a `--status-file <path>` flag to `nax run` that writes a machine-readable JSON status file, updated after each story completes. Enables external tools (CI/CD, orchestrators, dashboards) to monitor nax runs without parsing logs or aggregating hooks.

## Motivation

- **Log parsing is fragile** — format changes break consumers
- **Hook aggregation has gaps** — if a hook fails, events are lost; no single source of truth
- **nax already tracks this state** — `RunResult`, story counts, cost, PRD status are all in memory
- **General-purpose** — useful for any integration, not just our orchestrator skill

## Interface

### CLI Flag

```bash
nax run -f <feature> --headless --status-file ./nax-status.json
```

| Flag | Type | Default | Description |
|:-----|:-----|:--------|:------------|
| `--status-file` | `string` | `undefined` | Path to write JSON status file. If not set, no file is written. |

Relative paths resolved from `cwd` (same as `--headless` log behavior).

### Status File Schema

```typescript
interface NaxStatusFile {
  /** Schema version for forward compatibility */
  version: 1;

  /** Run metadata */
  run: {
    id: string;              // Run ID (e.g. "run-2026-02-25T10-00-00-000Z")
    feature: string;         // Feature name
    startedAt: string;       // ISO 8601
    status: "running" | "completed" | "failed" | "stalled";
    dryRun: boolean;
  };

  /** Aggregate progress */
  progress: {
    total: number;           // Total stories in PRD
    passed: number;
    failed: number;
    paused: number;
    blocked: number;
    pending: number;         // total - passed - failed - paused - blocked
  };

  /** Cost tracking */
  cost: {
    spent: number;           // USD accumulated
    limit: number | null;    // From config.execution.costLimit
  };

  /** Current story being processed (null if between stories) */
  current: {
    storyId: string;
    title: string;
    complexity: string;      // simple | medium | complex
    tddStrategy: string;     // test-after | tdd-lite | three-session-tdd
    model: string;           // Resolved model name
    attempt: number;         // Current attempt (1-based)
    phase: string;           // routing | test-write | implement | verify | review
  } | null;

  /** Iteration count */
  iterations: number;

  /** Last updated timestamp */
  updatedAt: string;         // ISO 8601

  /** Duration so far in ms */
  durationMs: number;
}
```

### Example Output

```json
{
  "version": 1,
  "run": {
    "id": "run-2026-02-25T10-00-00-000Z",
    "feature": "auth-refactor",
    "startedAt": "2026-02-25T10:00:00Z",
    "status": "running",
    "dryRun": false
  },
  "progress": {
    "total": 12,
    "passed": 7,
    "failed": 1,
    "paused": 0,
    "blocked": 1,
    "pending": 3
  },
  "cost": {
    "spent": 1.23,
    "limit": 5.00
  },
  "current": {
    "storyId": "US-008",
    "title": "Add retry logic to queue handler",
    "complexity": "medium",
    "tddStrategy": "tdd-lite",
    "model": "claude-sonnet-4-5-20250514",
    "attempt": 1,
    "phase": "implement"
  },
  "iterations": 8,
  "updatedAt": "2026-02-25T10:15:32Z",
  "durationMs": 932000
}
```
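
A consumer of this file might look like the following sketch. It assumes the file contents have already been read into a string. `summarizeStatus`, `StatusSnapshot`, and `StatusSummary` are hypothetical names used for illustration and are not part of nax; only the fields the helper needs are declared.

```typescript
// Hypothetical consumer-side helper; not part of nax.
// Declares only the subset of the status file schema that it reads.
interface StatusSnapshot {
  version: number;
  run: { feature: string; status: "running" | "completed" | "failed" | "stalled" };
  progress: { total: number; passed: number; failed: number; paused: number; blocked: number; pending: number };
  cost: { spent: number; limit: number | null };
}

interface StatusSummary {
  line: string;         // one-line human-readable summary
  percentDone: number;  // share of stories that are no longer pending, 0 to 100
  overBudget: boolean;  // true only when a limit is set and has been reached
}

function summarizeStatus(json: string): StatusSummary {
  const s = JSON.parse(json) as StatusSnapshot;
  // Refuse unknown schema versions instead of guessing at their layout.
  if (s.version !== 1) {
    throw new Error(`Unsupported status file version: ${s.version}`);
  }
  const { total, pending, passed, failed } = s.progress;
  const percentDone = total === 0 ? 0 : Math.round(((total - pending) / total) * 100);
  const overBudget = s.cost.limit !== null && s.cost.spent >= s.cost.limit;
  const line =
    `${s.run.feature}: ${s.run.status}, ${passed}/${total} passed, ` +
    `${failed} failed, $${s.cost.spent.toFixed(2)} spent`;
  return { line, percentDone, overBudget };
}
```

Checking `version` first is what lets a consumer fail loudly if a later schema revision changes field meanings.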

## Implementation

### Files to Change

| File | Change |
|:-----|:-------|
| `src/execution/runner.ts` | Add `statusFile?: string` to `RunOptions`. Call `writeStatusFile()` at key points. |
| `src/execution/status-file.ts` | **New file.** `writeStatusFile()` function — builds `NaxStatusFile` from run state, writes atomically. |
| `src/main.ts` (or wherever CLI args are parsed) | Add `--status-file` option, pass to `RunOptions`. |

### Write Points

Status file is updated at these moments:

1. **Run start** — initial state (all stories pending)
2. **Story start** — update `current` with story info
3. **Story complete/fail/pause** — update `progress` counts, clear `current`
4. **Run end** — final state (`status: "completed"` or `"failed"`)
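
The four write points can be read as transitions on an in-memory snapshot that is serialized after each step. The sketch below is a simplification under stated assumptions: it uses a reduced snapshot shape and increments counters directly, whereas the spec derives counts from the PRD. `RunSnapshot`, `onRunStart`, `onStoryStart`, `onStoryEnd`, and `onRunEnd` are hypothetical names, not nax functions.

```typescript
// Hypothetical, reduced model of the status snapshot; not nax's actual types.
type StoryOutcome = "passed" | "failed" | "paused";

interface RunSnapshot {
  status: "running" | "completed" | "failed";
  progress: { total: number; passed: number; failed: number; paused: number; pending: number };
  current: { storyId: string; attempt: number } | null;
}

// 1. Run start: every story is pending and nothing is in flight.
function onRunStart(total: number): RunSnapshot {
  return {
    status: "running",
    progress: { total, passed: 0, failed: 0, paused: 0, pending: total },
    current: null,
  };
}

// 2. Story start: record which story is being processed.
function onStoryStart(s: RunSnapshot, storyId: string, attempt = 1): RunSnapshot {
  return { ...s, current: { storyId, attempt } };
}

// 3. Story complete/fail/pause: move one story out of pending and clear `current`.
function onStoryEnd(s: RunSnapshot, outcome: StoryOutcome): RunSnapshot {
  const progress = { ...s.progress, pending: s.progress.pending - 1 };
  progress[outcome] += 1;
  return { ...s, progress, current: null };
}

// 4. Run end: the caller decides the final status; this sketch does not infer it.
function onRunEnd(s: RunSnapshot, status: "completed" | "failed"): RunSnapshot {
  return { ...s, status, current: null };
}
```

Each function returns a new snapshot rather than mutating its input, so the value handed to the writer is never changed mid-serialization.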

### Atomic Writes

Write to `<path>.tmp` then rename to `<path>` to prevent readers from seeing partial JSON:

```typescript
import { rename } from "node:fs/promises";

async function writeStatusFile(path: string, status: NaxStatusFile): Promise<void> {
  const tmpPath = `${path}.tmp`;
  await Bun.write(tmpPath, JSON.stringify(status, null, 2));
  await rename(tmpPath, path);
}
```

### Integration with RunOptions

```typescript
// src/execution/runner.ts
export interface RunOptions {
  // ... existing fields
  /** Path to write JSON status file (optional) */
  statusFile?: string;
}
```

### Progress Counting

Derive from PRD state (already loaded):

```typescript
function countProgress(prd: PRD): NaxStatusFile["progress"] {
  const stories = prd.stories;
  const passed = stories.filter(s => s.status === "passed").length;
  const failed = stories.filter(s => s.status === "failed").length;
  const paused = stories.filter(s => s.status === "paused").length;
  const blocked = stories.filter(s => s.status === "blocked").length;
  const total = stories.length;
  return { total, passed, failed, paused, blocked, pending: total - passed - failed - paused - blocked };
}
```

### Cleanup

The status file is **not** deleted on run end — it persists as a record of the last run. Consumers can check `run.status` to determine if the run is still active.
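
Because the file persists, a consumer may also want to guard against a leftover file whose `run.status` still reads `"running"`, for example if the process died before the final write. One possible approach, sketched below, treats a `"running"` status with an old `updatedAt` as inactive. The `isRunActive` helper and the `staleAfterMs` threshold are assumptions for illustration; the spec does not define a staleness rule.

```typescript
// Hypothetical consumer-side check; not part of nax.
interface RunActivityFields {
  run: { status: "running" | "completed" | "failed" | "stalled" };
  updatedAt: string; // ISO 8601, as written in the status file
}

function isRunActive(status: RunActivityFields, nowMs: number, staleAfterMs: number): boolean {
  // Any terminal or stalled status means the run is not active.
  if (status.run.status !== "running") return false;
  const updatedMs = Date.parse(status.updatedAt);
  // An unparseable timestamp is treated as inactive rather than trusted.
  if (Number.isNaN(updatedMs)) return false;
  // "running" counts as active only if the file was refreshed recently enough.
  return nowMs - updatedMs <= staleAfterMs;
}
```

Since the file is only rewritten at story boundaries, `staleAfterMs` would need to exceed the longest expected story duration, otherwise a slow story would be misread as a dead run.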
|
|
191
|
+
|
|
192
|
+
## Testing
|
|
193
|
+
|
|
194
|
+
| Test | Description |
|
|
195
|
+
|:-----|:------------|
|
|
196
|
+
| `status-file.test.ts` | Unit: `writeStatusFile()` produces valid JSON, atomic write works |
|
|
197
|
+
| `status-file.test.ts` | Unit: `countProgress()` correctly counts all states |
|
|
198
|
+
| `runner.test.ts` | Integration: `--status-file` option flows through to `RunOptions` |
|
|
199
|
+
| `runner.test.ts` | Integration: status file updates at each write point |
|
|
200
|
+
| Manual | `--status-file` + `--dry-run` produces correct output |
|
|
201
|
+
|
|
202
|
+
## Non-Goals
|
|
203
|
+
|
|
204
|
+
- **Real-time streaming** — this is a polled file, not a websocket/SSE stream
|
|
205
|
+
- **Historical run data** — status file represents current/last run only (hooks + events.jsonl cover history)
|
|
206
|
+
- **`nax status --json` command** — future work, can read this file
|
|
207
|
+
|
|
208
|
+
## Migration
|
|
209
|
+
|
|
210
|
+
None. New optional flag, no breaking changes. If `--status-file` is not passed, behavior is identical to v0.10.0.
|
|
211
|
+
|
|
212
|
+
---
|
|
213
|
+
|
|
214
|
+
# Feature 2: TDD Escalation Retry
|
|
215
|
+
|
|
216
|
+
## Summary
|
|
217
|
+
|
|
218
|
+
Three-session TDD currently hard-codes `pause` for all failures — isolation violations, session crashes, and test failures all result in the story being paused with no retry. This means TDD stories never benefit from the escalation system that test-after stories use.
|
|
219
|
+
|
|
220
|
+
Change: TDD failures should follow the same escalation retry pattern as test-after. Only pause when all retry paths are exhausted.
|
|
221
|
+
|
|
222
|
+
## Problem
|
|
223
|
+
|
|
224
|
+
Current flow (all TDD failures):
|
|
225
|
+
```
|
|
226
|
+
TDD failure → needsHumanReview=true → execution stage returns "pause" → story paused → NO RETRY
|
|
227
|
+
```
|
|
228
|
+
|
|
229
|
+
test-after flow (for comparison):
|
|
230
|
+
```
|
|
231
|
+
Agent failure → execution stage returns "escalate" → runner bumps tier → retries → only fails after max attempts
|
|
232
|
+
```
|
|
233
|
+
|
|
234
|
+
## Proposed Retry Strategy
|
|
235
|
+
|
|
236
|
+
TDD failures are classified into three categories with different retry paths:
|
|
237
|
+
|
|
238
|
+
### Category 1: Isolation Violation (test-writer touches source)
|
|
239
|
+
|
|
240
|
+
**Current:** Pause immediately.
|
|
241
|
+
**Proposed:** Auto-downgrade to tdd-lite, then escalate.
|
|
242
|
+
|
|
243
|
+
```
three-session-tdd fails (isolation violation)
  → Retry 1: three-session-tdd-lite (same tier, skip isolation for writer/implementer)
    → Success? Done ✅
    → Fail? Escalate to next tier
      → Retry 2: tdd-lite + stronger model
        → Success? Done ✅
        → Fail? Continue escalation through tier chain
          → All tiers exhausted → pause (needs human review) ⏸
```
|
|
253
|
+
|
|
254
|
+
**Note:** The zero-file fallback already does this for one specific case (test-writer creates no test files → auto-retry as lite). This generalizes that pattern to all isolation violations.
|
|
255
|
+
|
|
256
|
+
### Category 2: Session Failure (agent crash, timeout, non-zero exit)
|
|
257
|
+
|
|
258
|
+
**Current:** Pause immediately.
|
|
259
|
+
**Proposed:** Escalate model tier (same as test-after).
|
|
260
|
+
|
|
261
|
+
```
TDD session fails (crash/timeout)
  → Escalate to next model tier
    → Retry with stronger model (same TDD strategy)
      → Success? Done ✅
      → Fail? Continue escalation
        → All tiers exhausted → mark failed ❌
```
|
|
269
|
+
|
|
270
|
+
### Category 3: Tests Still Failing After All Sessions
|
|
271
|
+
|
|
272
|
+
**Current:** Post-TDD verification runs. If tests fail → pause.
|
|
273
|
+
**Proposed:** Escalate model tier.
|
|
274
|
+
|
|
275
|
+
```
All 3 sessions complete but tests still fail
  → Escalate to next model tier
    → Retry full TDD with stronger model
      → Success? Done ✅
      → Fail? Continue escalation
        → All tiers exhausted → mark failed ❌
```
|
|
283
|
+
|
|
284
|
+
### Summary Table
|
|
285
|
+
|
|
286
|
+
| Failure Type | Current Action | New Action | Final Fallback |
|:-------------|:--------------|:-----------|:--------------|
| Isolation violation | pause | Downgrade to lite → escalate | pause (human review) |
| Zero test files created | lite retry (exists) | Keep existing + escalate | pause (human review) |
| Session crash/timeout | pause | Escalate tier | fail |
| Tests fail post-TDD | pause | Escalate tier | fail |
| Verifier flags bad code | pause | Escalate tier | pause (human review) |
|
|
293
|
+
|
|
294
|
+
**Why "pause" for isolation/verifier but "fail" for crashes?**
|
|
295
|
+
- Isolation violations and verifier concerns suggest the code needs *human judgment* — the AI may be fundamentally misunderstanding the task.
|
|
296
|
+
- Crashes and test failures are mechanical — a stronger model usually fixes them.
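
The pause-versus-fail split described above can be expressed as a small lookup. The following is a minimal sketch, not the plan's actual code: the category strings come from the summary table, while `FinalFallback` and `finalFallback()` are hypothetical names introduced only for illustration.

```typescript
// Category strings mirror the summary table; everything else is illustrative.
type FailureCategory =
  | "isolation-violation"
  | "session-failure"
  | "tests-failing"
  | "verifier-rejected";

type FinalFallback = "pause" | "fail";

/** Outcome once every tier in the escalation chain has been tried. */
function finalFallback(category: FailureCategory): FinalFallback {
  switch (category) {
    // Likely needs human judgment: hand the story back for review.
    case "isolation-violation":
    case "verifier-rejected":
      return "pause";
    // Mechanical failures: nothing left to try, so mark the story failed.
    case "session-failure":
    case "tests-failing":
      return "fail";
  }
}
```

Keeping this mapping in one place would let the runner decide the terminal state without re-inspecting the failure details.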
|
|
297
|
+
|
|
298
|
+
## Implementation
|
|
299
|
+
|
|
300
|
+
### Changes to `ThreeSessionTddResult`
|
|
301
|
+
|
|
302
|
+
Add a `failureCategory` field so the execution stage can differentiate:
|
|
303
|
+
|
|
304
|
+
```typescript
export interface ThreeSessionTddResult {
  success: boolean;
  sessions: TddSessionResult[];
  needsHumanReview: boolean;
  reviewReason?: string;
  totalCost: number;
  lite: boolean;

  /** NEW: Categorize failure for retry routing */
  failureCategory?: "isolation-violation" | "session-failure" | "tests-failing" | "verifier-rejected";
}
```
|
|
317
|
+
|
|
318
|
+
### Changes to `execution.ts` (pipeline stage)
|
|
319
|
+
|
|
320
|
+
Replace the blanket `pause` with category-based routing:
|
|
321
|
+
|
|
322
|
+
```typescript
// Current:
if (tddResult.needsHumanReview) {
  return { action: "pause", reason: tddResult.reviewReason };
}

// Proposed:
if (!tddResult.success) {
  switch (tddResult.failureCategory) {
    case "isolation-violation":
      // If already lite → escalate. If strict → retry as lite (same tier).
      if (tddResult.lite) {
        return { action: "escalate", reason: tddResult.reviewReason };
      }
      // Store flag in context so runner knows to downgrade strategy
      ctx.retryAsLite = true;
      return { action: "escalate", reason: `Isolation violation — downgrading to lite` };

    case "session-failure":
    case "tests-failing":
      return { action: "escalate", reason: tddResult.reviewReason };

    case "verifier-rejected":
      // Escalate first, pause only after all tiers exhausted
      return { action: "escalate", reason: tddResult.reviewReason };

    default:
      return { action: "pause", reason: tddResult.reviewReason };
  }
}
```
|
|
353
|
+
|
|
354
|
+
### Changes to `runner.ts` (escalation handler)
|
|
355
|
+
|
|
356
|
+
When escalating a TDD story with `retryAsLite`, update the story's routing to use `three-session-tdd-lite`:
|
|
357
|
+
|
|
358
|
+
```typescript
case "escalate": {
  // ... existing escalation logic ...

  // NEW: If retryAsLite flag set, downgrade TDD strategy
  if (pipelineResult.context?.retryAsLite && story.routing) {
    story.routing.testStrategy = "three-session-tdd-lite";
  }

  // ... rest of escalation ...
}
```
|
|
370
|
+
|
|
371
|
+
### Changes to `tdd/orchestrator.ts`
|
|
372
|
+
|
|
373
|
+
Set `failureCategory` based on what went wrong:
|
|
374
|
+
|
|
375
|
+
```typescript
// After session 1 (test-writer) isolation failure:
return {
  success: false,
  // ...remaining result fields...
  failureCategory: "isolation-violation",
};

// After session crash/timeout:
return {
  success: false,
  // ...remaining result fields...
  failureCategory: "session-failure",
};

// After post-TDD verification fails:
return {
  success: false,
  // ...remaining result fields...
  failureCategory: "tests-failing",
};
```
|
|
397
|
+
|
|
398
|
+
### Files to Change
|
|
399
|
+
|
|
400
|
+
| File | Change |
|:-----|:-------|
| `src/tdd/types.ts` | Add `failureCategory` to `ThreeSessionTddResult` |
| `src/tdd/orchestrator.ts` | Set `failureCategory` at each failure point |
| `src/pipeline/stages/execution.ts` | Route by `failureCategory` instead of blanket `pause` |
| `src/pipeline/types.ts` | Add `retryAsLite?: boolean` to `PipelineContext` |
| `src/execution/runner.ts` | Handle `retryAsLite` flag in escalation case |
|
|
407
|
+
|
|
408
|
+
### Testing
|
|
409
|
+
|
|
410
|
+
| Test | Description |
|:-----|:------------|
| `tdd/orchestrator.test.ts` | Unit: each failure path sets correct `failureCategory` |
| `pipeline/execution.test.ts` | Unit: isolation violation returns `escalate` (not `pause`) |
| `pipeline/execution.test.ts` | Unit: lite isolation violation returns `escalate` |
| `pipeline/execution.test.ts` | Unit: session failure returns `escalate` |
| `execution/runner.test.ts` | Integration: TDD story escalates through tiers before failing |
| `execution/runner.test.ts` | Integration: `retryAsLite` downgrades strategy on next attempt |
| Manual | Run with intentionally strict project, verify lite downgrade + tier escalation |
|
|
419
|
+
|
|
420
|
+
## Retry Budget
|
|
421
|
+
|
|
422
|
+
Uses the existing escalation config (`autoMode.escalation.tierOrder`). Example:
|
|
423
|
+
|
|
424
|
+
```json
{
  "autoMode": {
    "escalation": {
      "enabled": true,
      "tierOrder": [
        { "tier": "fast", "attempts": 2 },
        { "tier": "balanced", "attempts": 2 },
        { "tier": "powerful", "attempts": 1 }
      ]
    }
  }
}
```
|
|
438
|
+
|
|
439
|
+
For a strict TDD story with isolation violation:
|
|
440
|
+
```
Attempt 1: three-session-tdd @ fast       → isolation violation
Attempt 2: three-session-tdd-lite @ fast  → tests fail
Attempt 3: tdd-lite @ balanced            → tests fail
Attempt 4: tdd-lite @ balanced            → tests fail
Attempt 5: tdd-lite @ powerful            → success ✅ (or fail → pause)
```
|
|
447
|
+
|
|
448
|
+
Max cost is bounded by the existing tier budget. No new config needed.
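
To make the bound concrete, the worst-case attempt sequence can be derived directly from `tierOrder`. This is a rough sketch under stated assumptions: `TierConfig`, `PlannedAttempt`, and `planAttempts()` are hypothetical names, and the sketch assumes the first attempt hits an isolation violation so that every later attempt runs as lite.

```typescript
// Field names mirror the tierOrder config example; the helper itself is illustrative.
interface TierConfig {
  tier: string;
  attempts: number;
}

interface PlannedAttempt {
  attempt: number;
  tier: string;
  strategy: "three-session-tdd" | "three-session-tdd-lite";
}

/**
 * Expand tierOrder into the worst-case attempt sequence for a strict TDD
 * story whose first attempt ends in an isolation violation.
 */
function planAttempts(tierOrder: TierConfig[]): PlannedAttempt[] {
  const plan: PlannedAttempt[] = [];
  for (const { tier, attempts } of tierOrder) {
    for (let i = 0; i < attempts; i++) {
      plan.push({
        attempt: plan.length + 1,
        tier,
        // Only the very first attempt runs strict; every retry runs as lite.
        strategy: plan.length === 0 ? "three-session-tdd" : "three-session-tdd-lite",
      });
    }
  }
  return plan;
}

const plan = planAttempts([
  { tier: "fast", attempts: 2 },
  { tier: "balanced", attempts: 2 },
  { tier: "powerful", attempts: 1 },
]);
console.log(plan.map((p) => `${p.attempt}: ${p.strategy} @ ${p.tier}`).join("\n"));
```

The number of planned attempts equals the sum of `attempts` across all tiers, which is why the downgrade to lite adds no cost beyond the existing budget.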
|
|
449
|
+
|
|
450
|
+
---
|
|
451
|
+
|
|
452
|
+
# Feature 3: Structured Verifier Verdicts
|
|
453
|
+
|
|
454
|
+
## Summary
|
|
455
|
+
|
|
456
|
+
The verifier (session 3) is designed to judge whether the implementer's changes are legitimate — especially when the implementer modified test files. Currently, this judgment is implicit: the verifier runs as a regular agent, and the only signal is "did tests pass after verifier ran?" There's no structured verdict flowing back to the pipeline.
|
|
457
|
+
|
|
458
|
+
Add structured output parsing to the verifier session so its judgment feeds into `failureCategory` and the escalation system.
|
|
459
|
+
|
|
460
|
+
## Problem
|
|
461
|
+
|
|
462
|
+
The current verifier prompt asks the agent to:
|
|
463
|
+
1. Run tests and verify they pass
|
|
464
|
+
2. Review implementation quality
|
|
465
|
+
3. Check acceptance criteria
|
|
466
|
+
4. **Check if implementer modified test files and judge legitimacy**
|
|
467
|
+
5. Fix issues minimally
|
|
468
|
+
|
|
469
|
+
But the result is just `{ success: boolean, estimatedCost: number }` — same as any agent session. The verifier's judgment about test modifications, code quality, and acceptance criteria is lost.
|
|
470
|
+
|
|
471
|
+
**Consequences:**
|
|
472
|
+
- If verifier finds illegitimate test modifications, it tries to fix them but we don't know *what* it found
|
|
473
|
+
- If verifier can't fix the issue, it exits non-zero → treated same as a crash
|
|
474
|
+
- No signal to differentiate "tests pass but code is bad" from "tests fail"
|
|
475
|
+
- The `VerifierDecision` type exists in `types.ts` but is **never populated**
|
|
476
|
+
|
|
477
|
+
## Proposed Solution
|
|
478
|
+
|
|
479
|
+
### Structured Verdict File
|
|
480
|
+
|
|
481
|
+
Instead of parsing agent stdout (fragile), the verifier writes a structured verdict file that the orchestrator reads after the session:
|
|
482
|
+
|
|
483
|
+
```
|
|
484
|
+
<workdir>/.nax-verifier-verdict.json
|
|
485
|
+
```
|
|
486
|
+
|
|
487
|
+
**Why a file?** Claude Code (the agent) can easily write files. Parsing structured output from stdout is unreliable with Claude Code since it mixes tool calls, thinking, and output.
|
|
488
|
+
|
|
489
|
+
### Verdict Schema
|
|
490
|
+
|
|
491
|
+
```typescript
interface VerifierVerdict {
  /** Schema version */
  version: 1;

  /** Overall approval */
  approved: boolean;

  /** Test results */
  tests: {
    /** Did all tests pass? */
    allPassing: boolean;
    /** Number of tests passing */
    passCount: number;
    /** Number of tests failing */
    failCount: number;
  };

  /** Implementer test modification review */
  testModifications: {
    /** Were test files modified by implementer? */
    detected: boolean;
    /** List of modified test files */
    files: string[];
    /** Are the modifications legitimate? */
    legitimate: boolean;
    /** Reasoning for legitimacy judgment */
    reasoning: string;
  };

  /** Acceptance criteria check */
  acceptanceCriteria: {
    /** All criteria met? */
    allMet: boolean;
    /** Per-criterion status */
    criteria: Array<{
      criterion: string;
      met: boolean;
      note?: string;
    }>;
  };

  /** Code quality assessment */
  quality: {
    /** Overall quality: good | acceptable | poor */
    rating: "good" | "acceptable" | "poor";
    /** Issues found */
    issues: string[];
  };

  /** Fixes applied by verifier */
  fixes: string[];

  /** Overall reasoning */
  reasoning: string;
}
```
|
|
548
|
+
|
|
549
|
+
### Updated Verifier Prompt
|
|
550
|
+
|
|
551
|
+
```typescript
export function buildVerifierPrompt(story: UserStory): string {
  return `# Test-Driven Development — Session 3: Verify

You are in the third session of a three-session TDD workflow. Tests and implementation are complete.

**Story:** ${story.title}

**Your tasks:**
1. Run all tests and verify they pass
2. Review the implementation for quality and correctness
3. Check that the implementation meets all acceptance criteria
4. Check if test files were modified by the implementer. If yes, verify the changes are legitimate fixes (e.g. fixing incorrect expectations) and NOT just loosening assertions to mask bugs.
5. If any issues exist, fix them minimally

**Acceptance Criteria:**
${story.acceptanceCriteria.map((ac, i) => `${i + 1}. ${ac}`).join("\n")}

**IMPORTANT — Write Verdict File:**
After completing your review, write a JSON verdict file to \`.nax-verifier-verdict.json\` in the project root.

\`\`\`json
{
  "version": 1,
  "approved": true,
  "tests": {
    "allPassing": true,
    "passCount": 15,
    "failCount": 0
  },
  "testModifications": {
    "detected": false,
    "files": [],
    "legitimate": true,
    "reasoning": "No test files were modified by implementer"
  },
  "acceptanceCriteria": {
    "allMet": true,
    "criteria": [
      { "criterion": "Criterion text", "met": true }
    ]
  },
  "quality": {
    "rating": "good",
    "issues": []
  },
  "fixes": [],
  "reasoning": "All tests pass, implementation is clean, all criteria met."
}
\`\`\`

Set \`approved: false\` if:
- Tests are failing and you cannot fix them
- Implementer loosened test assertions to mask bugs (testModifications.legitimate = false)
- Critical acceptance criteria are not met
- Code quality is poor with security or correctness issues

Set \`approved: true\` if:
- All tests pass (or pass after your minimal fixes)
- Implementation is clean and follows conventions
- All acceptance criteria met
- Any test modifications by implementer are legitimate fixes

When done, commit any fixes with message: "fix: verify and adjust ${story.title}"`;
}
```
|
|
617
|
+
|
|
618
|
+
### Orchestrator Changes
|
|
619
|
+
|
|
620
|
+
After verifier session completes, read and parse the verdict file:
|
|
621
|
+
|
|
622
|
+
```typescript
// In tdd/orchestrator.ts, after session 3 completes.
// Imports this snippet relies on (at the top of the file):
import path from "node:path";
import { unlink } from "node:fs/promises";

// Read verdict file
const verdictPath = path.join(workdir, ".nax-verifier-verdict.json");
let verdict: VerifierVerdict | null = null;

try {
  const file = Bun.file(verdictPath);
  if (await file.exists()) {
    verdict = await file.json() as VerifierVerdict;
    logger.info("tdd", "Verifier verdict loaded", {
      storyId: story.id,
      approved: verdict.approved,
      testsAllPassing: verdict.tests.allPassing,
      testModsDetected: verdict.testModifications.detected,
      testModsLegitimate: verdict.testModifications.legitimate,
      qualityRating: verdict.quality.rating,
      allCriteriaMet: verdict.acceptanceCriteria.allMet,
    });
  } else {
    logger.warn("tdd", "No verifier verdict file found — falling back to test-only check", {
      storyId: story.id,
    });
  }
} catch (err) {
  // Covers unreadable files, malformed JSON, and missing nested fields
  logger.warn("tdd", "Failed to parse verifier verdict", {
    storyId: story.id,
    error: String(err),
  });
}

// Clean up verdict file (don't leave it in the repo)
try {
  await unlink(verdictPath);
} catch { /* ignore */ }
```
|
|
659
|
+
|
|
660
|
+
### Verdict → failureCategory Mapping
|
|
661
|
+
|
|
662
|
+
```typescript
// Assumed alias: the plan does not define FailureCategory explicitly.
type FailureCategory = NonNullable<ThreeSessionTddResult["failureCategory"]>;

function categorizeVerdict(
  verdict: VerifierVerdict | null,
  session3Success: boolean,
  testsPass: boolean,
): { success: boolean; failureCategory?: FailureCategory; reviewReason?: string } {

  // No verdict file → fall back to existing behavior (test-only check)
  if (!verdict) {
    if (testsPass) return { success: true };
    return {
      success: false,
      failureCategory: "tests-failing",
      reviewReason: "Tests failing after all sessions (no verdict file)",
    };
  }

  // Verdict: approved. The verdict is advisory, so the independent test run
  // (testsPass) remains the ground truth for whether tests pass.
  if (verdict.approved) {
    if (testsPass) return { success: true };
    return {
      success: false,
      failureCategory: "tests-failing",
      reviewReason: "Verifier approved, but the independent test run failed",
    };
  }

  // Verdict: not approved — classify why

  // Illegitimate test modifications (implementer cheated)
  if (verdict.testModifications.detected && !verdict.testModifications.legitimate) {
    return {
      success: false,
      failureCategory: "verifier-rejected",
      reviewReason: `Verifier rejected: illegitimate test modifications in ${verdict.testModifications.files.join(", ")}. ${verdict.testModifications.reasoning}`,
    };
  }

  // Tests failing
  if (!verdict.tests.allPassing) {
    return {
      success: false,
      failureCategory: "tests-failing",
      reviewReason: `Tests failing: ${verdict.tests.failCount} failures. ${verdict.reasoning}`,
    };
  }

  // Acceptance criteria not met
  if (!verdict.acceptanceCriteria.allMet) {
    const unmet = verdict.acceptanceCriteria.criteria
      .filter(c => !c.met)
      .map(c => c.criterion);
    return {
      success: false,
      failureCategory: "verifier-rejected",
      reviewReason: `Acceptance criteria not met: ${unmet.join("; ")}`,
    };
  }

  // Poor quality
  if (verdict.quality.rating === "poor") {
    return {
      success: false,
      failureCategory: "verifier-rejected",
      reviewReason: `Poor code quality: ${verdict.quality.issues.join("; ")}`,
    };
  }

  // Catch-all: verdict says not approved but no clear reason
  return {
    success: false,
    failureCategory: "verifier-rejected",
    reviewReason: verdict.reasoning || "Verifier rejected without specific reason",
  };
}
```
|
|
733
|
+
|
|
734
|
+
### Escalation Behavior per Verdict
|
|
735
|
+
|
|
736
|
+
| Verdict Reason | failureCategory | Escalation Path |
|:---------------|:---------------|:---------------|
| Illegitimate test mods | `verifier-rejected` | Escalate tier → pause after all tiers |
| Tests failing | `tests-failing` | Escalate tier → fail after all tiers |
| Criteria not met | `verifier-rejected` | Escalate tier → pause after all tiers |
| Poor quality | `verifier-rejected` | Escalate tier → pause after all tiers |
| Approved | — | Success ✅ |
| No verdict file | Falls back to test check | Same as before |
|
|
744
|
+
|
|
745
|
+
### Verdict File Lifecycle
|
|
746
|
+
|
|
747
|
+
1. **Created by:** Verifier agent (session 3) writes `.nax-verifier-verdict.json`
|
|
748
|
+
2. **Read by:** TDD orchestrator after session 3 completes
|
|
749
|
+
3. **Deleted by:** TDD orchestrator after reading (not committed to git)
|
|
750
|
+
4. **Fallback:** If file missing or unparseable, fall back to existing behavior (post-TDD test verification)
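
The read, delete, and fallback steps in this lifecycle can be sketched in a few lines. This is an illustrative sketch only: it swaps in synchronous `node:fs` calls where the plan uses `Bun.file`, so that it stays runnable in plain Node as well, and `consumeVerdictFile()` is a hypothetical name rather than part of the proposed `verdict.ts` API.

```typescript
import { existsSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const VERDICT_FILE = ".nax-verifier-verdict.json";

/** Read the verdict once, then remove it so it never lands in a commit. */
function consumeVerdictFile(workdir: string): unknown {
  const verdictPath = join(workdir, VERDICT_FILE);
  if (!existsSync(verdictPath)) return null; // step 4: missing file, caller falls back
  try {
    return JSON.parse(readFileSync(verdictPath, "utf8")); // step 2: read and parse
  } catch {
    return null; // step 4: unparseable file, caller falls back
  } finally {
    rmSync(verdictPath, { force: true }); // step 3: always delete after reading
  }
}

// Usage: simulate the verifier writing a verdict into a scratch workdir.
const workdir = mkdtempSync(join(tmpdir(), "nax-verdict-"));
writeFileSync(join(workdir, VERDICT_FILE), JSON.stringify({ version: 1, approved: true }));
const first = consumeVerdictFile(workdir);
const second = consumeVerdictFile(workdir); // the file is gone, so this falls back
console.log(first, second);
```

Deleting in a `finally` block means a malformed file is removed as well, so a stale verdict cannot leak into the next run.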
|
|
751
|
+
|
|
752
|
+
### `.gitignore`
|
|
753
|
+
|
|
754
|
+
Add to the project's `.gitignore` (or the `nax init` template):
```
.nax-verifier-verdict.json
```
|
|
758
|
+
|
|
759
|
+
### Files to Change
|
|
760
|
+
|
|
761
|
+
| File | Change |
|:-----|:-------|
| `src/tdd/types.ts` | Add `VerifierVerdict` interface |
| `src/tdd/prompts.ts` | Update `buildVerifierPrompt()` with verdict file instructions |
| `src/tdd/orchestrator.ts` | Read verdict file after session 3, map to `failureCategory` |
| `src/tdd/verdict.ts` | **New file.** `readVerdict()`, `categorizeVerdict()`, `cleanupVerdict()` |
|
|
767
|
+
|
|
768
|
+
### Testing
|
|
769
|
+
|
|
770
|
+
| Test | Description |
|:-----|:------------|
| `tdd/verdict.test.ts` | Unit: `categorizeVerdict()` for all verdict combinations |
| `tdd/verdict.test.ts` | Unit: missing verdict file falls back gracefully |
| `tdd/verdict.test.ts` | Unit: malformed JSON falls back gracefully |
| `tdd/orchestrator.test.ts` | Integration: verdict file read + cleanup after session 3 |
| `tdd/orchestrator.test.ts` | Integration: illegitimate test mods → `verifier-rejected` |
| Manual | Run TDD on a story, verify verdict file is written and consumed |
|
|
778
|
+
|
|
779
|
+
### Robustness
|
|
780
|
+
|
|
781
|
+
**What if the agent doesn't write the verdict file?**
|
|
782
|
+
Fall back to existing behavior: run tests independently, check pass/fail. This is the same as v0.10.0. The verdict file is an enhancement, not a requirement.
|
|
783
|
+
|
|
784
|
+
**What if the JSON is malformed?**
|
|
785
|
+
Log a warning and fall back to the test-only check. A bad verdict file must never crash the run.
|
|
786
|
+
|
|
787
|
+
**What if the agent writes wrong data?**
|
|
788
|
+
Validate required fields (`version`, `approved`, `tests`). Missing fields → fall back. The verdict is advisory — the independent test run is the ground truth for "tests pass."
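
One way to perform that required-field check is a type guard over the parsed JSON. The sketch below is an assumption about how it could look, not the plan's implementation: `MinimalVerdict` and `isUsableVerdict()` are hypothetical names, and only the three fields named above (`version`, `approved`, `tests`) are checked.

```typescript
// Only the required fields named in the plan; the names here are illustrative.
interface MinimalVerdict {
  version: 1;
  approved: boolean;
  tests: { allPassing: boolean; passCount: number; failCount: number };
}

/** Returns true only when the required verdict fields exist with the right types. */
function isUsableVerdict(data: unknown): data is MinimalVerdict {
  if (typeof data !== "object" || data === null) return false;
  const v = data as Record<string, unknown>;
  if (v.version !== 1 || typeof v.approved !== "boolean") return false;

  const tests = v.tests;
  if (typeof tests !== "object" || tests === null) return false;
  const t = tests as Record<string, unknown>;
  return (
    typeof t.allPassing === "boolean" &&
    typeof t.passCount === "number" &&
    typeof t.failCount === "number"
  );
}
```

A caller would treat a `false` result exactly like a missing file and fall back to the independent test run.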
|
|
789
|
+
|
|
790
|
+
---
|
|
791
|
+
|
|
792
|
+
# v0.10.1 Summary
|
|
793
|
+
|
|
794
|
+
Three features, cohesive release:
|
|
795
|
+
|
|
796
|
+
| Feature | Files Changed | Effort | Dependency |
|:--------|:-------------|:-------|:-----------|
| 1. `--status-file` | 3 (new `status-file.ts`, modify `runner.ts`, CLI) | Medium | None |
| 2. TDD Escalation Retry | 5 (types, orchestrator, execution stage, pipeline types, runner) | Medium | None |
| 3. Structured Verifier Verdicts | 4 (types, prompts, orchestrator, new `verdict.ts`) | Medium | Feature 2 (feeds `failureCategory`) |
|
|
801
|
+
|
|
802
|
+
**Total files:** 10 changed/new (some overlap — `types.ts` and `orchestrator.ts` touched by features 2+3).
|
|
803
|
+
|
|
804
|
+
**Breaking changes:** None. All features are additive/optional.
|
|
805
|
+
|
|
806
|
+
**Config changes:** None. Uses existing escalation config.
|
|
807
|
+
|
|
808
|
+
### Implementation Order
|
|
809
|
+
|
|
810
|
+
1. Feature 1 (`--status-file`) — independent, can ship alone
|
|
811
|
+
2. Feature 2 (TDD escalation) — core retry logic
|
|
812
|
+
3. Feature 3 (verifier verdicts) — builds on feature 2's `failureCategory`
|