
@nathapp/nax 0.20.0 → 0.22.0


Files changed (233)

package/docs/specs/bug-041-cross-story-test-isolation.md ADDED

@@ -0,0 +1,88 @@
+# BUG-041 — Cross-Story Test Isolation
+**Status:** Won't Fix — superseded by FEAT-010 (baseRef tracking eliminates root cause)
+**Target:** N/A
+**Author:** Nax Dev
+**Date:** 2026-03-06
+---
+## 1. Problem
+**Scenario:**
+1. Story A touches `src/parser.ts`. Verify runs `test/unit/parser.test.ts` → 2 tests fail. Story A escalates.
+2. Story B touches `src/formatter.ts`. Smart runner also picks up `test/unit/parser.test.ts` (both changed since common base). Formatter tests pass, parser tests still fail (inherited from Story A).
+3. Story B is marked failed — its implementation was correct. It escalates needlessly.
+**Root cause:** Verify has no memory of which test failures pre-existed before a story's session. All failures are attributed to the current story.
+---
+## 2. Root Cause
+The verify stage runs tests and reports pass/fail with no concept of:
+- Which tests were already failing before this story ran
+- Whether a failure is "inherited" vs "introduced by this story"
+---
+## 3. Proposed Solution
+### 3.1 Baseline snapshot at story start
+Before the agent session starts (same time as FEAT-010's `baseRef` capture), record which test files the smart runner would pick up for this story and which are already failing. Store as `story.inheritedFailures: string[]`.
+### 3.2 Verify: filter inherited failures
+After running tests and parsing `TestFailure[]`:
+- If ALL failures are in `inheritedFailures` files → return `{ action: "continue" }` with warning: *"Failures are pre-existing — not attributed to this story"*
+- If ANY failure is in a new file → escalate normally
+### 3.3 Re-verify when source story resolves
+When Story A eventually passes verify, clear its test files from downstream stories' `inheritedFailures` so they get re-evaluated on the next run.
+---
+## 4. Data Model Changes
+```typescript
+// src/prd/types.ts
+interface UserStory {
+  baseRef?: string;               // from FEAT-010
+  inheritedFailures?: string[];   // NEW — test files already failing before this story
+}
+```
+---
+## 5. Files Affected
+| File | Change |
+|---|---|
+| `src/prd/types.ts` | Add `inheritedFailures?: string[]` to `UserStory` |
+| `src/execution/sequential-executor.ts` | Capture `inheritedFailures` baseline before agent runs |
+| `src/verification/smart-runner.ts` | Export `runBaselineCheck(testFiles, workdir)` helper |
+| `src/pipeline/stages/verify.ts` | Filter inherited failures from escalation decision |
+| `src/execution/lifecycle/run-regression.ts` | Clear inherited failures when source story passes |
+---
+## 6. Edge Cases
+| Scenario | Handling |
+|---|---|
+| Baseline check times out | `inheritedFailures: []` — conservative, may incorrectly blame story but no false passes |
+| Flaky inherited failure disappears | Story B's verify finds no inherited failures → correct attribution |
+| ALL test files in `inheritedFailures` | Return `continue` with warning |
+| First story in a run | No prior failures → `inheritedFailures: []` → normal behavior |
+| Deferred regression gate | Runs after all stories pass — inherited failures expected to be resolved |
+---
+## 7. Test Plan
+- Story B inherits Story A's failing test file → verify returns `continue` (not escalated)
+- Story B introduces new failing test → escalated normally
+- Story A passes → Story B's `inheritedFailures` cleared for next run
+- Baseline check timeout → `inheritedFailures: []` → conservative
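
The filter described in section 3.2 of BUG-041 can be sketched as a pure function. This is a minimal illustration, not the nax implementation: `decideVerifyAction`, `VerifyDecision`, and the reduced `TestFailure` shape are hypothetical names introduced here, and failures are matched by test file only, as the spec proposes.

```typescript
// Hypothetical, reduced shape; the real TestFailure in nax carries more fields.
interface TestFailure {
  file: string;
  testName: string;
}

// Hypothetical result type for the escalation decision.
type VerifyDecision =
  | { action: "continue"; warning?: string }
  | { action: "escalate"; newFailures: TestFailure[] };

// Split failures into inherited vs. new by test file, then decide.
function decideVerifyAction(
  failures: TestFailure[],
  inheritedFailures: string[],
): VerifyDecision {
  if (failures.length === 0) return { action: "continue" };
  const inherited = new Set(inheritedFailures);
  const newFailures = failures.filter((f) => !inherited.has(f.file));
  if (newFailures.length === 0) {
    return {
      action: "continue",
      warning: "Failures are pre-existing; not attributed to this story",
    };
  }
  return { action: "escalate", newFailures };
}

// Story B inherits only Story A's failing parser tests.
const inheritedOnly = decideVerifyAction(
  [{ file: "test/unit/parser.test.ts", testName: "parses empty input" }],
  ["test/unit/parser.test.ts"],
);

// Story B also breaks a test file that was passing at baseline.
const mixed = decideVerifyAction(
  [
    { file: "test/unit/parser.test.ts", testName: "parses empty input" },
    { file: "test/unit/formatter.test.ts", testName: "formats dates" },
  ],
  ["test/unit/parser.test.ts"],
);

console.log(inheritedOnly.action, mixed.action); // prints "continue escalate"
```

Matching by file keeps the baseline cheap to store, at the cost of hiding a newly introduced failure inside a file that was already failing.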

package/docs/specs/bug-042-verifier-failure-capture.md ADDED

@@ -0,0 +1,117 @@
+# BUG-042 — Verifier Test Failure Capture
+**Status:** Proposal
+**Target:** v0.21.0
+**Author:** Nax Dev
+**Date:** 2026-03-06
+---
+## 1. Problem
+The deferred regression gate (`run-regression.ts`) calls `parseBunTestOutput()` → gets structured `TestFailure[]` (file, testName, error, stackTrace) → targeted rectification works well.
+The per-story verify stage (`verify.ts`) does NOT call `parseBunTestOutput()` on failure → passes raw output string to rectification → agent receives a wall of text and must parse it mentally.
+**Same failure, two different agent experiences:**
+| Path | Agent gets | Quality |
+|---|---|---|
+| Deferred regression | Structured `TestFailure[]` | ✅ Precise context |
+| Per-story verify | Raw output (last 20 lines) | ⚠️ Noisy, may miss root cause |
+---
+## 2. Current vs Proposed Data Flow
+**Current:**
+```
+verify.ts → runVerification() → { success: false, output: "...raw..." }
+  → rectification: testOutput = raw string
+  → priorFailures[].testFailures = undefined
+  → agent prompt: wall of text
+```
+**Proposed:**
+```
+verify.ts → runVerification() → { success: false, output: "...raw..." }
+  → parseBunTestOutput(output) → TestFailure[]
+  → VerificationResult.failures = TestFailure[]
+  → rectification: testOutput + structured failures
+  → priorFailures[].testFailures = TestFailure[]
+  → agent prompt: structured failure table
+```
+---
+## 3. Code Changes
+**`src/verification/types.ts`** — add failures field:
+```typescript
+interface VerificationResult {
+  success: boolean;
+  output?: string;
+  status: "SUCCESS" | "TEST_FAILURE" | "TIMEOUT" | "ERROR";
+  passCount?: number;
+  failCount?: number;
+  failures?: TestFailure[];   // NEW
+}
+```
+**`src/pipeline/stages/verify.ts`** — parse on failure:
+```typescript
+// Add to _verifyDeps:
+export const _verifyDeps = {
+  regression,
+  parseBunTestOutput,   // NEW — injectable for tests
+};
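+// After runVerification() reports a test failure (not a timeout or other error):
+if (!result.success && result.status === "TEST_FAILURE") {
+  // Empty or missing output yields an empty list rather than undefined.
+  result.failures = result.output
+    ? _verifyDeps.parseBunTestOutput(result.output).failures
+    : [];
+}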
+```
+**Structured log** — replace last-20-lines with failure summary:
+```typescript
+// Current: logger.warn("verify", "Test failures", { output: last20lines });
+// Proposed:
+for (const f of (result.failures ?? []).slice(0, 5)) {
+  logger.warn("verify", `FAIL: ${f.testName}`, { file: f.file, error: f.error });
+}
+```
+**`src/execution/post-verify-rectification.ts`** — populate `testFailures` in `StructuredFailure`:
+```typescript
+const structuredFailure: StructuredFailure = {
+  // ...existing fields
+  testFailures: result.failures?.map(f => ({
+    file: f.file ?? "",
+    testName: f.testName,
+    error: f.error,
+    stackTrace: f.stackTrace ?? [],
+  })),
+};
+```
+---
+## 4. Files Affected
+| File | Change |
+|---|---|
+| `src/verification/types.ts` | Add `failures?: TestFailure[]` to `VerificationResult` |
+| `src/pipeline/stages/verify.ts` | Call `parseBunTestOutput()` on failure; add to `_verifyDeps` |
+| `src/execution/post-verify-rectification.ts` | Populate `testFailures` from `result.failures` |
+| `src/execution/verification.ts` | Pass `failures` through if available |
+---
+## 5. Test Plan
+- `verify.ts` test failure → `result.failures` populated with `TestFailure[]`
+- `result.failures` forwarded to rectification loop and `priorFailures`
+- Agent prompt includes structured failure table (via existing priorFailures formatter)
+- `parseBunTestOutput` in `_verifyDeps` is mockable
+- Empty/no output → `result.failures = []` (no crash)
+- Timeout → `result.failures` not set (timeout ≠ test failure)
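
The BUG-042 test plan above (mockable parser, empty output, timeout) can be exercised with a small sketch. The names `attachFailures`, `ParseOutput`, and `stubParse` are assumptions made for illustration, and the `FAIL: ` line format is invented for the stub; it is not the output format of Bun or of `parseBunTestOutput`.

```typescript
// Shapes follow the spec; optional fields are assumptions.
interface TestFailure {
  file?: string;
  testName: string;
  error: string;
  stackTrace?: string[];
}

interface VerificationResult {
  success: boolean;
  output?: string;
  status: "SUCCESS" | "TEST_FAILURE" | "TIMEOUT" | "ERROR";
  failures?: TestFailure[];
}

// Hypothetical signature for an injectable parser.
type ParseOutput = (output: string) => { failures: TestFailure[] };

// Attach structured failures only for genuine test failures.
function attachFailures(
  result: VerificationResult,
  parse: ParseOutput,
): VerificationResult {
  if (result.success || result.status !== "TEST_FAILURE") return result;
  const failures = result.output ? parse(result.output).failures : [];
  return { ...result, failures };
}

// Stub parser standing in for the real one in a unit test.
// It treats every line starting with "FAIL: " as one failure.
const stubParse: ParseOutput = (output) => ({
  failures: output
    .split("\n")
    .filter((line) => line.startsWith("FAIL: "))
    .map((line) => ({
      testName: line.slice("FAIL: ".length),
      error: "assertion failed",
    })),
});

const failedRun = attachFailures(
  { success: false, status: "TEST_FAILURE", output: "FAIL: parses empty input\nok" },
  stubParse,
);
const timedOut = attachFailures(
  { success: false, status: "TIMEOUT", output: "FAIL: never reported" },
  stubParse,
);
const noOutput = attachFailures({ success: false, status: "TEST_FAILURE" }, stubParse);

console.log(failedRun.failures?.length, timedOut.failures, noOutput.failures?.length);
```

Passing the parser as an argument mirrors the `_verifyDeps` injection idea: tests swap in a stub without touching module state.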

package/docs/specs/feat-010-smart-runner-git-history.md ADDED

@@ -0,0 +1,96 @@
+# FEAT-010 — Smart Test Runner: Git-History Mode
+**Status:** Proposal
+**Target:** v0.21.0
+**Author:** Nax Dev
+**Date:** 2026-03-06
+---
+## 1. Problem with Current Approach
+Smart Test Runner uses `git diff --name-only HEAD` (or `HEAD~1`) to find changed files. This breaks in several scenarios:
+| Scenario | Problem |
+|---|---|
+| Agent makes 3 commits | `HEAD~1` only sees last commit; earlier changes missed |
+| Agent uses `git commit --amend` | HEAD stays same; diff shows nothing |
+| Uncommitted staged changes | Picks up unrelated staged changes |
+| Story retried after partial commit | Baseline resets to wrong point |
+Result: empty `[]` → full suite fallback (150s+) → deferred mode skips → no per-story tests.
+---
+## 2. Proposed Solution
+Track a **base commit hash** per story at session start, stored as `story.baseRef`. On verify, diff `HEAD` against `story.baseRef` to get the exact files the agent touched, regardless of commit count.
+```
+Story starts → capture git HEAD → store as story.baseRef
+Agent runs   → makes N commits (any pattern)
+Verify runs  → git diff --name-only story.baseRef HEAD → precise file list
+```
+---
+## 3. Implementation Details
+**Capture baseRef** in `sequential-executor.ts` before agent launch:
+```typescript
+story.baseRef = await captureGitRef(workdir);  // already exists in utils/git.ts
+await savePrd(prd, prdPath);
+```
+**New mode branch** in `smart-runner.ts`:
+```typescript
+if (mode === "git-history" && story?.baseRef) {
+  return gitWithTimeout(["diff", "--name-only", story.baseRef, "HEAD"], workdir);
+}
+// fallback: existing git-diff logic
+```
+---
+## 4. Files Affected
+| File | Change |
+|---|---|
+| `src/prd/types.ts` | Add `baseRef?: string` to `UserStory` |
+| `src/execution/sequential-executor.ts` | Capture `baseRef` before agent, persist to PRD |
+| `src/verification/smart-runner.ts` | Add `"git-history"` mode |
+| `src/config/schemas.ts` | Add `smartTestRunner.mode: "git-diff" \| "git-history"` |
+| `src/config/types.ts` | Add `mode` to `SmartTestRunnerConfig` |
+---
+## 5. Config Changes
+```jsonc
+{
+  "execution": {
+    "smartTestRunnerConfig": {
+      "mode": "git-history",   // "git-diff" (default) | "git-history"
+      "enabled": true
+    }
+  }
+}
+```
+---
+## 6. Migration / Compatibility
+- Default: `"git-diff"` — no behavior change
+- `"git-history"` opt-in
+- Missing `story.baseRef` → falls back to `"git-diff"` (no crash)
+- nax self-dev config should switch to `"git-history"` immediately
+---
+## 7. Test Plan
+- `baseRef` captured and persisted before agent runs
+- Multi-commit session: all files detected (not just last commit's)
+- Missing `baseRef` → graceful fallback to `"git-diff"`
+- `captureGitRef()` failure → `baseRef` undefined, fallback used
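
The mode selection and fallback from sections 3 and 6 of FEAT-010 can be isolated as a pure function that only builds the `git` argument list. This is a sketch under assumptions: `buildDiffArgs`, `SmartRunnerMode`, and `StoryRef` are hypothetical names, and the fallback uses the `git diff --name-only HEAD` form quoted in section 1.

```typescript
type SmartRunnerMode = "git-diff" | "git-history";

// Only the field this sketch needs; the real UserStory has more.
interface StoryRef {
  baseRef?: string;
}

// Build the arguments for `git diff --name-only` based on mode and baseRef.
function buildDiffArgs(mode: SmartRunnerMode, story?: StoryRef): string[] {
  if (mode === "git-history" && story?.baseRef) {
    // Everything committed since the story's base commit.
    return ["diff", "--name-only", story.baseRef, "HEAD"];
  }
  // Fallback: the existing "git-diff" behaviour (working tree vs. HEAD).
  return ["diff", "--name-only", "HEAD"];
}

const withBase = buildDiffArgs("git-history", { baseRef: "abc1234" });
const missingBase = buildDiffArgs("git-history", {});
const legacy = buildDiffArgs("git-diff", { baseRef: "abc1234" });

console.log(withBase.join(" "));    // prints "diff --name-only abc1234 HEAD"
console.log(missingBase.join(" ")); // prints "diff --name-only HEAD"
console.log(legacy.join(" "));      // prints "diff --name-only HEAD"
```

Keeping argument construction separate from process spawning makes the fallback rules testable without a git repository.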

package/docs/specs/feat-011-file-context-strategy.md ADDED

@@ -0,0 +1,73 @@
+# FEAT-011 — File Context Strategy
+**Status:** Proposal
+**Target:** v0.21.0
+**Author:** Nax Dev
+**Date:** 2026-03-06
+---
+## 1. Problem
+nax injects full file content into agent prompts for all relevant source files. For large files (500+ lines), this bloats the context window — increasing cost and reducing focus. The agent has tool access to read files directly, making full content injection for large files redundant.
+---
+## 2. Proposed Config
+```jsonc
+{
+  "context": {
+    "fileContext": {
+      "strategy": "auto",      // "auto" | "full" | "path-only"
+      "maxInlineLines": 500,   // threshold for "auto" mode
+      "previewLines": 20       // lines shown in path-only / large-file preview
+    }
+  }
+}
+```
+---
+## 3. Injection Logic
+| Strategy | Condition | Agent receives |
+|---|---|---|
+| `"full"` | always | Complete file content |
+| `"path-only"` | always | Relative path + line count only |
+| `"auto"` | file ≤ `maxInlineLines` | Complete file content |
+| `"auto"` | file > `maxInlineLines` | Path + line count + first `previewLines` lines |
+**Large file preview format:**
+```
+// src/execution/sequential-executor.ts (847 lines — use Read tool for full content)
+import { ... } from "...";
+// first 20 lines...
+```
+---
+## 4. Files Affected
+| File | Change |
+|---|---|
+| `src/config/schemas.ts` | Add `context.fileContext` schema |
+| `src/config/types.ts` | Add `FileContextConfig` interface |
+| `src/context/builder.ts` | Apply strategy when injecting file content |
+| `src/context/providers/` | Update providers that inject raw file content |
+---
+## 5. Cost Impact
+Primary benefit is **quality** (more focused context), not raw cost savings. Rough estimate for a typical 5-story run: ~3000 tokens saved if avg file is 800 lines. At sonnet pricing: <$0.01 per run — marginal, but compounds.
+---
+## 6. Test Plan
+- `strategy: "full"` → always full content regardless of line count
+- `strategy: "path-only"` → always path + count only
+- `strategy: "auto"`, 300-line file → full content
+- `strategy: "auto"`, 600-line file → path + 20-line preview
+- Default: `"auto"` with `maxInlineLines: 500`
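
The injection table in section 3 of FEAT-011 maps directly onto a small function. The sketch below assumes a hypothetical `renderFileContext` helper and a header format modelled loosely on the preview example; the field names come from the proposed config, everything else is illustrative.

```typescript
type FileContextStrategy = "auto" | "full" | "path-only";

interface FileContextConfig {
  strategy: FileContextStrategy;
  maxInlineLines: number;
  previewLines: number;
}

// Defaults as stated in the spec's test plan and config example.
const defaultFileContext: FileContextConfig = {
  strategy: "auto",
  maxInlineLines: 500,
  previewLines: 20,
};

// Decide what the agent receives for one file.
function renderFileContext(
  path: string,
  content: string,
  cfg: FileContextConfig = defaultFileContext,
): string {
  // Ignore a single trailing newline so it does not count as an extra line.
  const lines = content.replace(/\n$/, "").split("\n");
  const header = `// ${path} (${lines.length} lines; use Read tool for full content)`;
  if (cfg.strategy === "full") return content;
  if (cfg.strategy === "path-only") return header;
  // "auto": inline small files, preview large ones.
  if (lines.length <= cfg.maxInlineLines) return content;
  return [header, ...lines.slice(0, cfg.previewLines)].join("\n");
}

// Helper for the usage example: a file with n numbered lines.
const makeFile = (n: number): string =>
  Array.from({ length: n }, (_, i) => `line ${i + 1}`).join("\n");

const smallFile = makeFile(300);
const largeFile = makeFile(600);

const inlined = renderFileContext("src/small.ts", smallFile);
const preview = renderFileContext("src/big.ts", largeFile);
const pathOnly = renderFileContext("src/big.ts", largeFile, {
  ...defaultFileContext,
  strategy: "path-only",
});

console.log(preview.split("\n").length); // prints 21 (header + 20 preview lines)
```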

package/docs/specs/feat-012-tdd-writer-tier.md ADDED

@@ -0,0 +1,79 @@
+# FEAT-012 — TDD Test Writer Tier Validation
+**Status:** Won't Fix — balanced tier is sufficient for test-writer; not worth the added complexity
+**Target:** v0.21.0
+**Author:** Nax Dev
+**Date:** 2026-03-06
+---
+## 1. Problem
+nax TDD runs two sessions: **testWriter** then **implementer**. The testWriter tier is configured separately (`tdd.sessionTiers.testWriter`, default `"balanced"`). The implementer uses the story's routed `modelTier`.
+**Risk:** If testWriter runs `"fast"` and the implementer runs `"powerful"`, the tests written may be too shallow — they test happy paths but miss edge cases a powerful model's implementation handles. Result: powerful implementer writes sophisticated code, all tests pass (trivially), then the deferred regression gate catches real failures.
+---
+## 2. Tier Ordering
+```
+fast (1) < balanced (2) < powerful (3)
+```
+**Invariant:** `testWriterTier >= implementerTier`
+---
+## 3. Validation Logic
+In `src/tdd/session-runner.ts` before launching testWriter:
+```typescript
+const tierOrder = { fast: 1, balanced: 2, powerful: 3 };
+const writerTier = config.tdd.sessionTiers?.testWriter ?? "balanced";
+const implementerTier = story.routing.modelTier ?? "balanced";
+let effectiveWriterTier = writerTier; // tier actually used for the testWriter session
+if (tierOrder[writerTier] < tierOrder[implementerTier]) {
+  if (config.tdd.enforceWriterTierParity) {
+    effectiveWriterTier = implementerTier; // auto-elevate
+    logger.warn("tdd", `Auto-elevated testWriter tier ${writerTier} → ${implementerTier}`);
+  } else {
+    logger.warn("tdd", `testWriter tier (${writerTier}) < implementer tier (${implementerTier}) — tests may be shallow`);
+  }
+}
+```
+---
+## 4. Config Changes
+```jsonc
+{
+  "tdd": {
+    "sessionTiers": { "testWriter": "balanced", "verifier": "fast" },
+    "enforceWriterTierParity": false   // NEW — auto-elevates testWriter when true
+  }
+}
+```
+`nax config --explain`: *"testWriter tier should be ≥ implementer tier. Enable enforceWriterTierParity to auto-elevate."*
+---
+## 5. Files Affected
+| File | Change |
+|---|---|
+| `src/tdd/session-runner.ts` | Tier comparison + warn/elevate logic |
+| `src/config/schemas.ts` | Add `tdd.enforceWriterTierParity` (boolean, default false) |
+| `src/config/types.ts` | Add `enforceWriterTierParity` to `TddConfig` |
+| `src/config/defaults.ts` | Default: `false` |
+---
+## 6. Test Plan
+- `writerTier < implementerTier`, `enforceWriterTierParity: false` → warning logged, tier unchanged
+- `writerTier < implementerTier`, `enforceWriterTierParity: true` → tier elevated, warning logged
+- `writerTier >= implementerTier` → no warning, no change
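
The three cases in the FEAT-012 test plan can be checked against a pure version of the validation logic. This is a sketch: `resolveWriterTier` and `WriterTierDecision` are hypothetical names, and the warning is returned to the caller instead of being logged.

```typescript
type ModelTier = "fast" | "balanced" | "powerful";

const tierOrder: Record<ModelTier, number> = { fast: 1, balanced: 2, powerful: 3 };

interface WriterTierDecision {
  effectiveWriterTier: ModelTier;
  warning?: string;
}

// Apply the invariant testWriterTier >= implementerTier.
function resolveWriterTier(
  writerTier: ModelTier,
  implementerTier: ModelTier,
  enforceParity: boolean,
): WriterTierDecision {
  if (tierOrder[writerTier] >= tierOrder[implementerTier]) {
    return { effectiveWriterTier: writerTier };
  }
  if (enforceParity) {
    return {
      effectiveWriterTier: implementerTier,
      warning: `Auto-elevated testWriter tier ${writerTier} -> ${implementerTier}`,
    };
  }
  return {
    effectiveWriterTier: writerTier,
    warning: `testWriter tier (${writerTier}) < implementer tier (${implementerTier}); tests may be shallow`,
  };
}

const warned = resolveWriterTier("fast", "powerful", false);
const elevated = resolveWriterTier("fast", "powerful", true);
const untouched = resolveWriterTier("balanced", "fast", true);

console.log(warned.effectiveWriterTier, elevated.effectiveWriterTier, untouched.effectiveWriterTier);
// prints "fast powerful balanced"
```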

package/docs/specs/feat-013-test-after-review.md ADDED

@@ -0,0 +1,89 @@
+# FEAT-013 — Test-After Strategy Review & Deprecation Path
+**Status:** Proposal
+**Target:** v0.21.0
+**Author:** Nax Dev
+**Date:** 2026-03-06
+---
+## 1. Problem with `test-after`
+`test-after` runs the agent in a single session: implement first, then write tests. Structural problem: **the agent writes tests to match its own (possibly broken) implementation.** Tests confirm buggy behavior rather than guarding against it.
+---
+## 2. Strategy Comparison
+| Strategy | Order | Sessions | Quality | Risk |
+|---|---|---|---|---|
+| `tdd-lite` | Tests → Impl | 2 | ✅ High | Low |
+| `three-session-tdd` | Tests → Impl → Verify | 3 | ✅✅ Highest | Very low |
+| `test-after` | Impl → Tests | 1 | ⚠️ Variable | High — tests may confirm bugs |
+---
+## 3. Proposed Changes
+### 3.1 Post-write isolation verify (opt-in fix)
+After the agent's session completes, stash the implementation changes and run the new test files without them. The tests should **fail** without the implementation, which proves they actually test something:
+```
+1. Agent writes impl + tests
+2. git stash (hide impl changes)
+3. Run new test files → should FAIL (no impl)
+4. git stash pop
+5. If tests PASSED in step 3 → escalate ("trivially passing tests")
+6. Normal verify (impl + tests together)
+```
+Config: `tdd.testAfterIsolationVerify: true` (default: false)
+### 3.2 Remove from auto-routing
+LLM router and keyword router no longer auto-assign `test-after`. It only runs when:
+- Explicitly set in PRD (`testStrategy: "test-after"`)
+- OR `execution.allowTestAfter: true` and router returns it
+### 3.3 Warning in `nax config --explain`
+### 3.4 Config gate
+```jsonc
+{
+  "execution": { "allowTestAfter": true },        // NEW — false blocks test-after
+  "tdd": { "testAfterIsolationVerify": false }    // NEW — opt-in isolation check
+}
+```
+---
+## 4. Migration Path
+| Version | Change |
+|---|---|
+| v0.21.0 | Warning in --explain. Remove from auto-routing. Add `allowTestAfter` config. |
+| v0.22.0 | `allowTestAfter` default → `false`. Explicit opt-in required. |
+| v0.23.0+ | Evaluate full removal. |
+---
+## 5. Files Affected
+| File | Change |
+|---|---|
+| `src/routing/strategies/llm.ts` | Remove `test-after` from auto-assignable set |
+| `src/routing/strategies/keyword.ts` | Remove `test-after` from auto-assignable set |
+| `src/tdd/session-runner.ts` | Add isolation verify step for `test-after` |
+| `src/config/schemas.ts` | Add `execution.allowTestAfter`, `tdd.testAfterIsolationVerify` |
+| `src/cli/config.ts` | Add warning in `--explain` for `test-after` |
+---
+## 6. Test Plan
+- `allowTestAfter: false` + router selects `test-after` → fallback to `tdd-lite` + warning
+- `testAfterIsolationVerify: true` + tests pass on clean stash → escalate
+- `testAfterIsolationVerify: true` + tests fail on clean stash → normal (tests are genuine)
+- LLM router no longer returns `test-after` in auto-routing
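
The routing gate from section 3.2 and the isolation check from section 3.1 of FEAT-013 both reduce to small decisions. The sketch below uses hypothetical names (`resolveTestStrategy`, `isolationVerdict`, `StrategyInput`) and assumes `tdd-lite` is the fallback, as the test plan states.

```typescript
type TestStrategy = "tdd-lite" | "three-session-tdd" | "test-after";

interface StrategyInput {
  explicit?: TestStrategy; // testStrategy set explicitly in the PRD
  routed: TestStrategy;    // what the router returned
  allowTestAfter: boolean; // execution.allowTestAfter
}

interface StrategyDecision {
  strategy: TestStrategy;
  warning?: string;
}

// An explicit PRD setting always wins; a routed test-after needs the config gate.
function resolveTestStrategy(input: StrategyInput): StrategyDecision {
  if (input.explicit) return { strategy: input.explicit };
  if (input.routed === "test-after" && !input.allowTestAfter) {
    return {
      strategy: "tdd-lite",
      warning: "test-after is disabled (execution.allowTestAfter: false); falling back to tdd-lite",
    };
  }
  return { strategy: input.routed };
}

// Tests that pass with the implementation stashed are trivially passing.
function isolationVerdict(passedWithoutImpl: boolean): "escalate" | "continue" {
  return passedWithoutImpl ? "escalate" : "continue";
}

const blocked = resolveTestStrategy({ routed: "test-after", allowTestAfter: false });
const explicitOptIn = resolveTestStrategy({
  explicit: "test-after",
  routed: "tdd-lite",
  allowTestAfter: false,
});
const allowed = resolveTestStrategy({ routed: "test-after", allowTestAfter: true });

console.log(blocked.strategy, explicitOptIn.strategy, allowed.strategy);
// prints "tdd-lite test-after test-after"
```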

package/docs/specs/feat-014-heartbeat-observability.md ADDED

@@ -0,0 +1,127 @@
+# FEAT-014 — Structured Log & Heartbeat
+**Status:** Proposal
+**Target:** v0.21.0
+**Author:** Nax Dev
+**Date:** 2026-03-06
+---
+## 1. Problem
+nax runs take 30–120 minutes for multi-story features with no "where are we?" view:
+- `nax status` shows last known state (stale, no stage detail)
+- `nax logs --follow` is a raw JSONL event stream (too noisy)
+Users have no visibility into current story, current stage, elapsed time, cost, or pass/fail counts during a run.
+---
+## 2. Heartbeat Data Model
+```typescript
+// src/events/types.ts
+interface RunHeartbeat {
+  type: "run.heartbeat";
+  timestamp: string;
+  runId: string;
+  elapsedSeconds: number;
+  currentStory: {
+    id: string;
+    title: string;
+    status: string;
+    currentStage: string;       // "routing" | "execution" | "verify" | "review" | "completion"
+    stageElapsedSeconds: number;
+    attempts: number;
+    modelTier: string;
+  } | null;                     // null between stories (e.g. deferred regression)
+  storyCounts: {
+    total: number;
+    passed: number;
+    failed: number;
+    pending: number;
+    running: number;
+  };
+  estimatedCostUsd: number;
+  lastActivityAt: string;
+}
+```
+---
+## 3. Implementation Plan
+**Heartbeat emitter** (`runner.ts`):
+```typescript
+const intervalSec = config.logging?.heartbeatIntervalSeconds ?? 30;
+if (intervalSec > 0) {
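+  const id = setInterval(async () => {
+    try {
+      const hb = buildHeartbeat(runState);
+      emitEvent("run.heartbeat", hb);
+      await statusWriter.writeHeartbeat(hb);
+    } catch (err) {
+      // A failed heartbeat must not reject unhandled or abort the run.
+      logger.warn("heartbeat", "Failed to write heartbeat", { error: String(err) });
+    }
+  }, intervalSec * 1000);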
+  runCleanup(() => clearInterval(id));
+}
+```
+**Stage transition events** — each pipeline stage emits:
+```typescript
+emitEvent("stage.enter", { storyId, stage: "verify", timestamp });
+// ... logic ...
+emitEvent("stage.exit", { storyId, stage: "verify", result: action, durationMs });
+```
+---
+## 4. CLI Changes
+**`nax status`** (extended output):
+```
+┌─ Run Status ──────────────────────────────────────────────────┐
+│ Feature: verify-v2    Elapsed: 12m 34s    Cost: $0.42         │
+│ Stories: ✅ 2 passed  ❌ 0 failed  ⏳ 3 pending               │
+├─ Current Story ───────────────────────────────────────────────┤
+│ US-003: Smart test runner baseline tracking                    │
+│ Stage: execution (fast tier, attempt 1)  — 2m 18s in stage    │
+└───────────────────────────────────────────────────────────────┘
+```
+**`nax logs --follow --heartbeat`** — filter to heartbeat-only lines (progress bar style, replaces previous line).
+---
+## 5. Files Affected
+| File | Change |
+|---|---|
+| `src/execution/runner.ts` | Add heartbeat `setInterval`, clear on cleanup |
+| `src/events/types.ts` | Add `RunHeartbeat` interface |
+| `src/execution/status-writer.ts` | Add `writeHeartbeat()` method |
+| `src/pipeline/stages/*.ts` | Emit `stage.enter` / `stage.exit` |
+| `src/cli/status.ts` | Render heartbeat table from `status.json` |
+| `src/cli/logs.ts` | Add `--heartbeat` filter flag |
+| `src/config/schemas.ts` | Add `logging.heartbeatIntervalSeconds` |
+| `src/config/types.ts` | Add `LoggingConfig` interface |
+---
+## 6. Config Changes
+```jsonc
+{
+  "logging": {
+    "heartbeatIntervalSeconds": 30   // 0 = disabled
+  }
+}
+```
+---
+## 7. Test Plan
+- Heartbeat emitted every N seconds (mock `setInterval`)
+- Heartbeat written to `status.json`
+- `stage.enter` / `stage.exit` emitted by each pipeline stage
+- `heartbeatIntervalSeconds: 0` → no interval, no events
+- Interval cleared on run completion (no leak)
+- `nax status` renders table when `status.json` has `heartbeat` field
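
Two pieces of the FEAT-014 heartbeat are easy to sketch in isolation: deriving `storyCounts` and rendering elapsed time in the `12m 34s` style shown in the `nax status` mock-up. The helpers `countStories` and `formatElapsed` are hypothetical, and the four status values are an assumption, since the spec types `status` as a plain `string`.

```typescript
// Assumed status values; the spec only says `status: string`.
type StoryStatus = "passed" | "failed" | "pending" | "running";

interface StoryCounts {
  total: number;
  passed: number;
  failed: number;
  pending: number;
  running: number;
}

// Tally story statuses into the storyCounts shape from the data model.
function countStories(statuses: StoryStatus[]): StoryCounts {
  const counts: StoryCounts = {
    total: statuses.length,
    passed: 0,
    failed: 0,
    pending: 0,
    running: 0,
  };
  for (const s of statuses) counts[s] += 1;
  return counts;
}

// Render seconds as "12m 34s", or "45s" when under a minute.
function formatElapsed(totalSeconds: number): string {
  const whole = Math.max(0, Math.floor(totalSeconds));
  const minutes = Math.floor(whole / 60);
  const seconds = whole % 60;
  return minutes > 0 ? `${minutes}m ${seconds}s` : `${seconds}s`;
}

const counts = countStories(["passed", "passed", "running", "pending", "pending"]);

console.log(counts.passed, counts.pending, counts.running); // prints "2 2 1"
console.log(formatElapsed(754)); // prints "12m 34s"
console.log(formatElapsed(138)); // prints "2m 18s"
```

Runs longer than an hour render as minutes only in this sketch (for example `90m 0s`); an hours component would be a straightforward extension.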

package/memory/topic/feat-010-baseref.md ADDED

@@ -0,0 +1,28 @@
+# FEAT-010 — baseRef Tracking Design Decision
+## Decision
+Capture `baseRef = current HEAD` **in-memory at each attempt start** (not stored in PRD).
+Use `git diff <baseRef>..HEAD` in smart-runner instead of `HEAD~1`.
+## Why per-attempt, not per-story
+- Story may retry after other stories have committed
+- Storing in PRD: retry would use stale baseRef from first attempt → includes other stories' files ❌
+- Capturing fresh per attempt: retry anchors to HEAD at that moment → only sees its own commits ✅
+## Why no cross-story pollution
+- Story 1 retry baseRef = HEAD after stories 2+3 committed
+- diff <baseRef>..HEAD = only story 1 retry's own commits
+- Other stories' commits are BEFORE baseRef → excluded automatically
+## Flow
+```
+attempt start → captureGitRef() → baseRef (in-memory)
+agent runs → makes N commits
+verify → getChangedSourceFiles(workdir, baseRef)
+         → git diff <baseRef>..HEAD
+         → only this attempt's changed files ✅
+```
+## Edge Cases
+- Agent makes 0 commits → diff = empty → fallback to full suite (existing behavior)
+- Partial commits on failure → next attempt captures new baseRef → clean isolation
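
The per-attempt argument above can be illustrated with a toy model of a linear commit history. This is not git: `changedSince` is a hypothetical helper that approximates `git diff <baseRef>..HEAD --name-only` as the union of files touched by commits after `baseRef`, which ignores cases such as a change that is later reverted.

```typescript
interface Commit {
  hash: string;
  story: string;
  files: string[];
}

// Files touched by commits strictly after baseRef on a linear history.
function changedSince(commits: Commit[], baseRef: string): string[] {
  const baseIndex = commits.findIndex((c) => c.hash === baseRef);
  if (baseIndex === -1) return []; // unknown ref: caller falls back to the full suite
  const files = new Set<string>();
  for (const c of commits.slice(baseIndex + 1)) {
    for (const f of c.files) files.add(f);
  }
  return Array.from(files).sort();
}

// Story 1 fails its first attempt, stories 2 and 3 commit, then story 1 retries.
const commitLog: Commit[] = [
  { hash: "c0", story: "setup", files: ["package.json"] },
  { hash: "c1", story: "story-1 attempt 1", files: ["src/parser.ts"] },
  { hash: "c2", story: "story-2", files: ["src/formatter.ts"] },
  { hash: "c3", story: "story-3", files: ["src/cli.ts"] },
  { hash: "c4", story: "story-1 attempt 2", files: ["src/parser.ts"] },
];

// Stale per-story baseRef (captured before attempt 1) pulls in other stories' files.
const staleDiff = changedSince(commitLog, "c0");

// Fresh per-attempt baseRef (HEAD when the retry started) sees only the retry's commits.
const freshDiff = changedSince(commitLog, "c3");

console.log(staleDiff.join(", ")); // prints "src/cli.ts, src/formatter.ts, src/parser.ts"
console.log(freshDiff.join(", ")); // prints "src/parser.ts"
```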