@kodax-ai/kodax 0.7.40 → 0.7.42

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (70)

package/CHANGELOG.md +146 -1
package/README.md +129 -232
package/README_CN.md +128 -253
package/dist/chunks/chunk-3RKBXWZS.js +2 -0
package/dist/chunks/chunk-7JLYVWAF.js +1033 -0
package/dist/chunks/chunk-CD3R5YBH.js +16 -0
package/dist/chunks/chunk-DKXUY5F2.js +209 -0
package/dist/chunks/chunk-HMYEQJGT.js +31 -0
package/dist/chunks/{chunk-FAVPT4P7.js → chunk-IYJ5EPRV.js} +1 -1
package/dist/chunks/chunk-KUX5LRPP.js +2 -0
package/dist/chunks/{chunk-EQ5DGS2W.js → chunk-OWSKU55I.js} +5 -6
package/dist/chunks/chunk-ZZ4KRK2B.js +465 -0
package/dist/chunks/compaction-config-FIFFP4FT.js +2 -0
package/dist/chunks/{construction-bootstrap-OFPUZTXQ.js → construction-bootstrap-J2WOCYEK.js} +1 -1
package/dist/chunks/dist-2ZHWDXMQ.js +2 -0
package/dist/chunks/dist-W4CJWLIH.js +2 -0
package/dist/chunks/utils-A5MWDTWZ.js +2 -0
package/dist/index.d.ts +237 -7
package/dist/index.js +5 -5
package/dist/kodax_cli.js +935 -917
package/dist/sdk-agent.d.ts +1375 -10
package/dist/sdk-agent.js +1 -1
package/dist/sdk-coding.d.ts +4608 -14
package/dist/sdk-coding.js +1 -1
package/dist/sdk-llm.d.ts +210 -10
package/dist/sdk-llm.js +1 -1
package/dist/sdk-mcp.d.ts +17 -0
package/dist/sdk-mcp.js +2 -0
package/dist/sdk-repl.d.ts +3026 -13
package/dist/sdk-repl.js +2 -1
package/dist/sdk-session.d.ts +164 -0
package/dist/sdk-session.js +2 -0
package/dist/sdk-skills.d.ts +553 -9
package/dist/sdk-skills.js +1 -1
package/dist/types-chunks/bash-prefix-extractor.d-CkhaqKkg.d.ts +2571 -0
package/dist/types-chunks/capability.d-3C62G8Eq.d.ts +39 -0
package/dist/types-chunks/config.d-BfJUXxC0.d.ts +41 -0
package/dist/types-chunks/cost-tracker.d-B6vMoLLF.d.ts +360 -0
package/dist/types-chunks/history-cleanup.d-DznrzEiU.d.ts +1475 -0
package/dist/types-chunks/instance-discovery.d-BsKnIwpg.d.ts +990 -0
package/dist/types-chunks/resolver.d-DX9au4NJ.d.ts +263 -0
package/dist/types-chunks/session-storage.d-Cci897iM.d.ts +68 -0
package/dist/types-chunks/storage.d-Bc5DoAwp.d.ts +532 -0
package/dist/types-chunks/transport.d-DuyjG30t.d.ts +180 -0
package/dist/types-chunks/types.d-B1uGoVTE.d.ts +400 -0
package/dist/types-chunks/types.d-C5mHR87z.d.ts +119 -0
package/dist/types-chunks/types.d-mM8vqvhT.d.ts +254 -0
package/package.json +16 -3
package/dist/acp_events.d.ts +0 -109
package/dist/acp_logger.d.ts +0 -20
package/dist/acp_server.d.ts +0 -92
package/dist/chunks/chunk-6QO6HWGU.js +0 -30
package/dist/chunks/chunk-CLS57NPX.js +0 -460
package/dist/chunks/chunk-NDNILSTR.js +0 -2
package/dist/chunks/chunk-QZEDWITG.js +0 -1226
package/dist/chunks/chunk-Z5EBDA6R.js +0 -15
package/dist/chunks/compaction-config-A7XZ6H5Y.js +0 -2
package/dist/chunks/dist-M57GIWR4.js +0 -2
package/dist/chunks/dist-OTUF22DA.js +0 -2
package/dist/chunks/utils-DFMYJUTE.js +0 -2
package/dist/cli_commands.d.ts +0 -17
package/dist/cli_option_helpers.d.ts +0 -49
package/dist/cli_option_helpers.test.d.ts +0 -1
package/dist/constructed_cli.d.ts +0 -82
package/dist/constructed_cli.test.d.ts +0 -1
package/dist/kodax_cli.d.ts +0 -7
package/dist/self_modify_cli.d.ts +0 -81
package/dist/self_modify_cli.test.d.ts +0 -9
package/dist/skill_cli.d.ts +0 -15
package/dist/skill_cli.test.d.ts +0 -1

package/CHANGELOG.md CHANGED

@@ -4,6 +4,151 @@ All notable changes to this project will be documented in this file.
 > Full history for versions prior to v0.7.0: [CHANGELOG_ARCHIVE.md](docs/CHANGELOG_ARCHIVE.md)
+## [0.7.42] - 2026-05-21
+### Theme
+**SDK Embedder Surface Closure + Compaction Systemic Fixes + Plan-List Resilience + Hits Ledger** — Four parallel work streams converge on v0.7.42. **FEATURE_186** (this cycle's headline external-facing item) closes the 10-gap export list reported by KodaX Space (downstream SDK consumer on v0.7.40) plus the MCP popout design request: build-dts CI guard against `@kodax-ai/*` internal-import leaks in entry `.d.ts`, one-liner re-exports for `bootstrapAutoMode` / `loadCommands` / `getAgentConfigHome` etc., a Skill `!cmd` dynamic-context host hook (`executeDynamicContext?` + `disableDynamicContext?`), declarative `ToolSideEffect` metadata on all 51 built-in tools with metadata-driven plan-mode gate (kills the `acp_server.ts` hardcoded `Set(['write','edit'])`), Custom provider + MCP server CRUD against `~/.kodax/config.json` with dynamic `getAgentConfigPath` resolution (no frozen `KODAX_CONFIG_FILE`), a new non-blocking `startKodaX(opts, prompt): RunningSession` entry exposing mid-run `setProvider` / `setModel` / `setReasoning` / `abort` (CAP-055 per-turn re-resolution picks up the new values on the next turn), and a sixth SDK subpath `@kodax-ai/kodax/mcp` for popout consumers who only need the MCP layer. **FEATURE_177→FEATURE_183 + FEATURE_185** address compaction systemic regressions surfaced by long-running kimi-loop investigations: read-file-state cache (F177), L2 stall-detector sidecar (F178, 4 commits), AMA compaction trigger parity at top-of-loop ([ADR-029](docs/ADR.md#adr-029-ama-compaction-trigger-parity--top-of-loop-feature_179-v0742)), repo-intelligence system-message dedup (F180), empty-summary-must-not-overwrite-real-prior (F181), fast-path requires non-empty `previousSummary` (F182), `PROTECTED` whitelist 1→26 (F183, claudecode parity), and hits-ledger enrichment that preserves grep / glob / bash result-side artifacts across microcompact ([ADR-031](docs/ADR.md#adr-031-task-level-hits-ledger-与-cross-session-memdir-分层独立feature_185-v0742)). 
**FEATURE_175** ships two of three plan-list fixes (id-preserve on `op:'init'` + B2 synth `autoCompleteOnAccept`); the dirty-reject prototype was REVERTED post-eval after zhipu's intent-vs-action floor failed pre-registered SHIP gate (b). **FEATURE_173** lands the public session-management SDK surface + the `runner-${epoch}` ghost-session double-write fix that produced parallel `runner-*.jsonl` files since v0.7.36. Several **claudecode-parity surface polishes** also ship in this cycle: dedicated `skill` tool replaces read-SKILL.md invocation, multimodal `tool_result` for the read tool with image-aware compaction, `todo_get` tool, `subject` / `description` schema split, plan-list staleness refresh + dedup scan, ark-coding adds `deepseek-v4-{flash,pro}`. **FEATURE_094 (Deep Anti-Escape Hardening) was CANCELLED 2026-05-19** after the necessity probe measured 0/43 escape across the canonical 5-alias panel — the post-v0.7.26 layered defense (P0 prompt + P2a multi_edit + P2b cap) plus FEATURE_152 (bash AST) + FEATURE_158 (signal classifier) + FEATURE_169 (pull-tool prompt) absorbed the bypass surface. The probe is retained as a permanent regression sweep (`tests/feature-094-necessity-probe.eval.ts`) — escape rate must stay 0%; >5% re-opens FEATURE_094.
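The metadata-driven plan-mode gate mentioned in the theme can be pictured roughly as follows. This is a minimal sketch, not the package's implementation: the `sideEffect` labels and the optional `planModeAllowed` flag come from the changelog text, while `ToolMeta`, `isFileMutation`, `allowedInPlanMode`, the sample labels, and the rule that only read-only tools pass by default are assumptions made for illustration.

```typescript
// Side-effect labels as listed in the changelog.
type ToolSideEffect =
  | "readonly"
  | "mutates-fs"
  | "mutates-shell"
  | "mutates-network"
  | "mutates-state";

// Hypothetical descriptor; only `sideEffect` and `planModeAllowed` are named in the changelog.
interface ToolMeta {
  name: string;
  sideEffect: ToolSideEffect;
  planModeAllowed?: boolean;
}

// Assumed rule: a tool mutates files when it is labeled "mutates-fs".
function isFileMutation(tool: ToolMeta): boolean {
  return tool.sideEffect === "mutates-fs";
}

// Assumed rule: an explicit flag wins; otherwise only read-only tools run in plan mode.
function allowedInPlanMode(tool: ToolMeta): boolean {
  return tool.planModeAllowed ?? tool.sideEffect === "readonly";
}

// Illustrative labels only; the real per-tool labels are not shown in the changelog.
const tools: ToolMeta[] = [
  { name: "read", sideEffect: "readonly" },
  { name: "write", sideEffect: "mutates-fs" },
  { name: "exit_plan_mode", sideEffect: "mutates-state", planModeAllowed: true },
];

const planModeTools = tools.filter(allowedInPlanMode).map((t) => t.name);
```

The point of the design is that the gate asks each tool for its declared metadata instead of consulting a hardcoded set of names, so a newly registered tool is classified without touching the gate.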
+### Added
+- **FEATURE_186 — SDK Embedder Surface Closure (KodaX Space Gap List + MCP Popout)**. 7 atomic commits across 7 phases (Phase 1 `2e33b681` build-dts CI guard / Phase 2 `d3ab38b0` one-line export set / Phase 3 `9b1e440f` Skill `!cmd` host hook / Phase 4 `7defd65f` Tool side-effect metadata + metadata-driven plan-mode gate / Phase 5 `ee549d6f` Custom provider CRUD / Phase 6 `9ba68f25` `RunningSession` + `sessionControl` / Phase 7 `523e9a28` MCP server CRUD + `@kodax-ai/kodax/mcp` subpath). Closes the 10 export gaps + MCP popout design request reported by KodaX Space (substrate consumer on `@kodax-ai/kodax@0.7.40`). Three categories: (1) **SDK publish hazards** — entry `.d.ts` bundle no longer leaks `@kodax-ai/*` internal imports; `build-dts.mjs` self-tests against POSITIVE/NEGATIVE samples + hard-asserts via grep on each entry `.d.ts`. (2) **Barrel re-exports** — Space no longer maintains parallel implementations: `bootstrapAutoMode`, `loadCommands`, `KODAX_COMMANDS_DIR`, `processCommandCall`, `parseCommandCall`, `getAgentConfigHome` / `Path`, `setAgentConfigHome`, new `getAppDataDir(appId)` (with reserved-name guard `^[a-z][a-z0-9-]{1,31}$`, rejects `kodax-*` prefix), `validateCustomProviderConfig`, `ToolSideEffect` enum + 4 helpers (`getAllRegisteredTools` / `isToolPlanModeAllowed` / `isToolFileMutation` / `isToolMutation`) all surface through the SDK barrel. (3) **Runtime hooks** — Skill `!cmd` execution gets a 3-tier dispatch (host `executeDynamicContext?` hook → `disableDynamicContext?` throws → legacy `execSync`); `runKodaX` gains a non-blocking sibling `startKodaX(opts, prompt): RunningSession` with `id` / `currentProvider/Model/Reasoning` getters, `setProvider` / `setModel` / `setReasoning` setters (queue + replay on pre-attach, direct mutation post-attach; CAP-055 reads the live `RuntimeSessionState` on next turn), `abort(reason?)` via internal `AbortController` (forwards external `options.abortSignal`), and `result` Promise pass-through. 
Plan-mode gate is now metadata-driven: `LocalToolDefinition.sideEffect: 'readonly' | 'mutates-fs' | 'mutates-shell' | 'mutates-network' | 'mutates-state'` is required, optional `planModeAllowed?: boolean` whitelists per-tool; 51 built-in tools labeled (22 readonly / 12 mutates-fs / 1 mutates-shell / 5 mutates-network / 12 mutates-state); `acp_server.ts`'s hardcoded `Set(['write','edit'])` replaced by `isToolFileMutation`. Custom provider CRUD (`list/get/upsert/removeCustomProvider`) and MCP server CRUD (`list/get/upsert/remove/validateMcpServerConfig`) own `~/.kodax/config.json` end-to-end, with `getAgentConfigPath('config.json')` resolved on every call (no frozen `KODAX_CONFIG_FILE` constant — `setAgentConfigHome()` overrides take effect immediately). The new `@kodax-ai/kodax/mcp` subpath re-exports `@kodax-ai/mcp` only (~0 kB + shared chunks); popout consumers pull MCP without the full coding bundle. **138 new unit tests** across 7 phases. Design doc: [docs/features/v0.7.42.md#feature_186-sdk-embedder-surface-closure--kodax-space-gap-list--mcp-popout](docs/features/v0.7.42.md#feature_186-sdk-embedder-surface-closure--kodax-space-gap-list--mcp-popout). Architecture: [ADR-032](docs/ADR.md#adr-032-sdk-embedder-surface-closure-feature_186-v0742).
+- **FEATURE_173 — Session Management Public SDK + `session.id` Propagation Bug Fix** (commit `a8258d29` implementation; `ac2752a4` design relocation). New `packages/repl/src/session/public-api.ts` thin facade over `FileSessionStorage`; exposes `listSessions({ projectRoot, scope, includeArchived, limit, before })` / `loadSession` / `forkSession` / `rewindSession` / `setActiveEntry` / `deleteSession` / `listRunningSessions` / `watchSessions(cb)` + `createSessionManager({ sessionsDir })` factory via `@kodax-ai/kodax/session` (`dist/sdk-session.js` 731 B + `dist/sdk-session.d.ts` 5.9 KB in tarball). Running-session lock reuses FEATURE_125 team-mode `<configHome>/instances/<pid>/` heartbeat; mutation against a running session returns `{ error: { code: 'session_running', runningProcess: { pid, startedAt } } }` (never throws). Platform-branched `watchSessions`: POSIX `fs.watch` + 100ms debounce coalesce / Windows 1000ms polling (cross-process file creation on Windows fs.watch is unreliable). **13 stable-contract tests** total (12 Part B + 1 Part A) pin `SessionSummary` field names + `forkSession` never-throws semantics + running gate + watch coalesce. **Part A bug fix**: `runManagedTask` call chain dropped `opts.session.id` between `runWithIdleYield` → `primitives/runner.ts`, so the `effectiveRunResult.sessionId ?? \`runner-${Date.now()}\`` resolution at `runner-driven.ts:1965` always fell to the right-hand fallback, producing duplicate `runner-*.jsonl` files (synthesized id) alongside the canonical `YYYYMMDD_HHMMSS.jsonl` (REPL-side). 5-LoC fix prepends `options.session?.id` to the `??`-chain; `FEATURE_173 Part A` contract test locks "caller id wins, ghost-prefix never appears" forever. 
Out of scope (deferred to v0.7.43): `listRunningSessions().sessionId` field reserved but unpopulated (needs FEATURE_125 heartbeat schema bump to write sessionId into state.json — deleteSession running-gate matches by pid for v0.7.42); `createSessionManager({sessionsDir})` accepts but ignores `sessionsDir` (FileSessionStorage hardcodes `KODAX_SESSIONS_DIR`); old `runner-*.jsonl` cleanup deferred to FEATURE_174 `kodax sessions dedupe`. Design doc: [docs/features/v0.7.42.md#feature_173-session-management-public-sdk--sessionid-propagation-bug-fix](docs/features/v0.7.42.md#feature_173-session-management-public-sdk--sessionid-propagation-bug-fix).
+- **FEATURE_184 — Sidecar Verifier Substrate (claudecode-Shape Main Agent + Stop Hook Primitive)**. Originally drafted as v0.7.45; shipped to v0.7.42 release window 2026-05-21 with full SHIP gate (a)+(b)+(c)+(d) MET on Phase D.4 Layer 2 eval (100/100 primaryPassed; 0% LLM-judge audit disagreement on 20-cell random sample). Retires the AMA H2 Worker→Evaluator role state machine in favor of claudecode-style single-loop Main Agent + agent-layer `StopHookFn` primitive + out-of-chain Sidecar Verifier. Resolves the zhipu/glm51 intent-vs-action floor that made FEATURE_167 B2 synth-accept fallback silently no-op the verification gate. **Net delete ~423 LoC** across `EVALUATOR_AGENT_NAME` / `emit_handoff` / `verdict-recorder` evaluator branches / F165/166/167 dead retry paths. New module `packages/coding/src/agent-runtime/middleware/sidecar-verifier/` (5 files, ~200 LoC impl + ~250 LoC test); sidecar context = current-turn user queries + 24-msg rolling buffer + file-edit summary (must see what main agent **did**, not only what it **said**); model default-inherits main agent, with `KODAX_VERIFIER_PROVIDER` / `KODAX_VERIFIER_MODEL` env-var opt-in for cross-family decoupling. UI surface: `⊙ Verifying...` dim spinner + `↻ Retrying: <reason>` + `⚠ Cannot verify: <reason>` (per claudecode `hook_stopped_continuation` style). See [ADR-030](docs/ADR.md#adr-030-claudecode-shape-main-agent--sidecar-verifier-substrate-feature_184-v0745). Design doc: [docs/features/v0.7.42.md#feature_184-sidecar-verifier-substrate--claudecode-shape-main-agent--stop-hook-primitive](docs/features/v0.7.42.md#feature_184-sidecar-verifier-substrate--claudecode-shape-main-agent--stop-hook-primitive).
+- **FEATURE_175 — Plan-List Resilience: `op:'init'` Mid-Task Status Preservation + B2 Synth Auto-Completion** (commit `1368ce55` + dirty-reject revert markers). Based on a 2026-05-19 production session in which V2 PLANNED ran 12m54s but the plan stayed at `0/4 completed`. Three independent bugs stacked: (1) `todo-store.ts:218-237` `init()` unconditionally reset status to pending — Worker mid-task `op:'init'` refine-scope wiped prior completed/skipped/cancelled; (2) FEATURE_167 (v0.7.41) B2 synth fallback directly assigned `recorder.verdict` property, bypassing the `wrapEmitterWithRecorder` slot setter, so `autoCompleteOnAccept` never fired — run accepted, UI froze at `0/N completed`; (3) `executeInitOp` had no dirty-store guard, magnifying (1). **Slice 1 prototype**, three fixes in the same version: (a) `init()` id-match terminal-success preserve (keeps completed/skipped/cancelled + note, new ids pending, pending/in_progress/failed reset) SHIPPED; (b) B2 synth path now mirrors wrapper side-effect via `todoStore.autoCompleteOnAccept()` SHIPPED; (c) `executeInitOp` returns `{ok:false, reason:"... use surgical APIs ..."}` on non-pending store contents — **PROTOTYPED → eval-driven REVERTED** after Layer 2 panel (51 calls = 1 pilot + 50 phase1, ~$3) showed zhipu/glm51 0/10 PASS on C1+C2 with audit disagreement 0% (real [project_zhipu_send_message_floor](../../../memory/project_zhipu_send_message_floor.md) intent-vs-action floor: "Understood, use todo_create to insert the new step:" prose-without-tool); pre-registered SHIP gate (b) hard-fail → REVERT. Reverted code retained as revert-pin tests + marker comments. Slice 2: +6 net tests (4 todo-store + 1 todo-update revert-marker + 1 runner-driven integration); coding 2704/2704 + repl 1431/1432 green. Design doc: [docs/features/v0.7.42.md#feature_175-plan-list-resilience--opinit-mid-task-status-preservation--b2-synth-auto-completion](docs/features/v0.7.42.md#feature_175-plan-list-resilience--opinit-mid-task-status-preservation--b2-synth-auto-completion).
+- **FEATURE_177 — Read-File-State Cache (anti-loop)** (commit `8e64e09e` + `c66e2403` post-compact fire). Per-task LRU keyed by absolute path stores `{ mtime, size, hash }` for files the worker has read; subsequent identical reads return cached envelope with a "still fresh — your prior read at turn N is current" banner, suppressing the kimi-loop "read file 4 times in a row" pattern observed in production. Cache invalidated on tool-side mutation (write / edit / multi_edit / insert_after_anchor) and on cross-microcompact boundaries via `onPostCompact` (fixed in `c66e2403` to fire on microcompact-only changes, not just full compactions). Design doc: [docs/features/v0.7.42.md#feature_177-读文件状态缓存read-file-state-cache--抑制非必要重复读取](docs/features/v0.7.42.md#feature_177-%E8%AF%BB%E6%96%87%E4%BB%B6%E7%8A%B6%E6%80%81%E7%BC%93%E5%AD%98read-file-state-cache--%E6%8A%91%E5%88%B6%E9%9D%9E%E5%BF%85%E8%A6%81%E9%87%8D%E5%A4%8D%E8%AF%BB%E5%8F%96).
+- **FEATURE_178 — L2 Stall Sidecar (Rule + LLM dual-layer anti-loop detector)** (4 commits `e79008c1` → `f91cf7cb` → `9bc209f9` → `d9c52638`). L1 (rule layer): standalone stall detector module scans the last N turns for repeat tool-call signatures (same name + same input keys); fires when ≥3 identical calls in N=5 turns. L2 (LLM sidecar): on L1 fire, dispatches a sidecar LLM judge with the recent turn window + a stall-classification system prompt; returns `{ stalled: true|false, reason }` deterministically parseable. Control plane: orchestrator + nudge injection prepends `<stall-detector>` system reminder to the next user message when L2 confirms; rule-only mode (no LLM) available via `KODAX_STALL_SIDECAR=rule`. Design doc: [docs/features/v0.7.42.md#feature_178-l2-stall-sidecar--rule--llm-双层反-loop-检测](docs/features/v0.7.42.md#feature_178-l2-stall-sidecar--rule--llm-%E5%8F%8C%E5%B1%82%E5%8F%8D-loop-%E6%A3%80%E6%B5%8B).
+- **FEATURE_179 — AMA Compaction Trigger Parity (Top-of-Loop)** (commit `02836a72`, see [ADR-029](docs/ADR.md#adr-029-ama-compaction-trigger-parity--top-of-loop-feature_179-v0742)). Moves the AMA compaction hook from end-of-turn to top-of-loop, mirroring SA path's `runCompactionLifecycle` ordering. Pre-fix: AMA path called compaction AFTER the new user message landed in the transcript, so the trigger metric saw the next-turn budget already eaten — compaction either fired too late (already over) or skipped (transcript estimate sub-threshold but post-merge over). Post-fix: hook runs BEFORE the next-turn LLM call, against the pre-merge transcript state, matching SA path semantics. Design doc: [docs/features/v0.7.42.md#feature_179-ama-compaction-trigger-parity--top-of-loop-触发](docs/features/v0.7.42.md#feature_179-ama-compaction-trigger-parity--top-of-loop-%E8%A7%A6%E5%8F%91).
+- **FEATURE_180 — Repo-Intelligence System Message Dedup** (commit `e1782ffe`). Repo-intel capsule injection (FEATURE_161 v0.7.40) could land identical system messages across rounds when topology / module / impact signals were stable; dedup by content hash keeps one copy. Design doc: [docs/features/v0.7.42.md#feature_180-repo-intelligence-system-message-dedup](docs/features/v0.7.42.md#feature_180-repo-intelligence-system-message-dedup).
+- **FEATURE_181 — Empty LLM Summary Must Not Overwrite Real Prior Summary** (commit `57a79767`). When the compaction LLM call returns empty / whitespace-only / API error, the prior `summary` (if non-empty) is preserved instead of overwritten with `""`. Closes a kimi-loop-adjacent case where a single compaction failure wiped the entire compacted history. Design doc: [docs/features/v0.7.42.md#feature_181-empty-llm-summary-不再覆盖-real-prior-summary](docs/features/v0.7.42.md#feature_181-empty-llm-summary-%E4%B8%8D%E5%86%8D%E8%A6%86%E7%9B%96-real-prior-summary).
+- **FEATURE_182 — Compaction Fast-Path Requires Non-Empty `previousSummary`** (commit `d67aa776`). The compaction fast-path (skip LLM, reuse prior summary + new turn delta) is now gated on `previousSummary` length > 0, so cold-start sessions correctly fall through to full LLM compaction. Design doc: [docs/features/v0.7.42.md#feature_182-compaction-fast-path-必须有-previoussummary-才能复用](docs/features/v0.7.42.md#feature_182-compaction-fast-path-%E5%BF%85%E9%A1%BB%E6%9C%89-previoussummary-%E6%89%8D%E8%83%BD%E5%A4%8D%E7%94%A8).
+- **FEATURE_183 — PROTECTED Tool Whitelist Expansion (1 → 26, claudecode parity)** (commits `f6a51be2` + `c322d835` review-amend). `PROTECTED` tools are exempted from compaction's "clear tool_result content" step (the tool's structured payload survives across compact boundaries); pre-fix only `read` was on the whitelist. Expanded to 26 tools matching claudecode's parity set: `read`, `write`, `edit`, `multi_edit`, `glob`, `grep`, `bash`, `todo_create`, `todo_update`, `todo_list`, `todo_get`, `web_search`, `web_fetch`, `task`, `dispatch_child_task`, `ask_user_question`, `emit_verdict`, `emit_handoff`, `module_context`, `symbol_context`, `process_context`, `impact_estimate`, `worktree_create`, `worktree_remove`, `exit_plan_mode`, `skill`. Design doc: [docs/features/v0.7.42.md#feature_183-protected-工具白名单扩容--claudecode-对照修正](docs/features/v0.7.42.md#feature_183-protected-%E5%B7%A5%E5%85%B7%E7%99%BD%E5%90%8D%E5%8D%95%E6%89%A9%E5%AE%B9--claudecode-%E5%AF%B9%E7%85%A7%E4%BF%AE%E6%AD%A3).
+- **FEATURE_185 — Tool Result-Side Enrichment: Hits Ledger Cross-Compaction Preservation** (5 commits `15b1ea3c` → `83976149` → `da8d7b28` → `bddc3d58` + `fcd4cc76` docs, see [ADR-031](docs/ADR.md#adr-031-task-level-hits-ledger-与-cross-session-memdir-分层独立feature_185-v0742)). `KodaXSessionArtifactLedgerEntry` extraction now reads `tool_result.content` (not just `tool_use.input`) for grep / glob / bash entries — grep gains `hits: Array<{ path, line, preview? }>` up to 50 per entry, glob gains `paths: string[]`, bash gains `exit_code` + `tail` (last 240 chars). Metadata-aware merge keystone: when microcompact clears the raw `tool_result.content` to `[Cleared: ...]` placeholder, the ledger summary in the post-compact attachment still shows "you found 12 hits at module/foo.ts:23, 45, 78 / module/bar.ts:12" so the model knows what the prior grep found without re-running it. 5-alias × 2-case × 5-run Layer 2 panel: 5/5 alias ≥80% (gate met). Design doc: [docs/features/v0.7.42.md#feature_185-工具结果侧-enrichment--hits-ledger-跨压缩保留](docs/features/v0.7.42.md#feature_185-%E5%B7%A5%E5%85%B7%E7%BB%93%E6%9E%9C%E4%BE%A7-enrichment--hits-ledger-%E8%B7%A8%E5%8E%8B%E7%BC%A9%E4%BF%9D%E7%95%99).
+- **claudecode-parity polish: dedicated `skill` tool** (commit `09e84aaf`). Replaces "read the SKILL.md file via Read tool" pattern with a dedicated `skill` tool that returns the skill body + metadata in one call. Matches claudecode V2 skill invocation surface.
+- **claudecode-parity polish: `todo_get` tool** (commit `35b93cd7`). Single-task fetch by id, matches V2 TaskGet parity.
+- **claudecode-parity polish: `subject` / `description` split on todo items** (commit `0833aeb7`). Two-field schema matching V2 — `subject` for the short title, `description` for the elaboration. Compatibility shim: legacy `content` field still accepted on input, mapped to `subject` server-side.
+- **Plan-list metadata per-key delete** (commit `9094edda`). `todo_update` patch operation gains granular metadata delete (set value to `null` clears the key); previously the only way to clear metadata was full overwrite.
+- **Plan-list hygiene — staleness refresh + dedup scan** (commit `a7748bbb`). Stale items (status pending for >N turns) get a system-reminder nudge; dedup scan flags subject-collisions across active items.
+- **Deprecate LLM-side `op:'init'`** (commit `3f06330b`). `todo_create` batch is the canonical creation path going forward; LLM-side `op:'init'` remains backward-compatible but emits a deprecation hint in the tool response.
+- **Verification nudge: `todo_update` reminder on terminal-completion transition** (commit `c9a3fe91`). When `todo_update` flips an item to `completed`, the tool response now appends a brief verification reminder ("verify you actually completed this; if not, set status back to in_progress").
+- **`@kodax-ai/llm` ark-coding gains `deepseek-v4-{flash,pro}` (1M ctx)** (commit `c312e899`). Updates the canonical eval alias panel to use coding-plan provider variants (see `feedback_canonical_eval_alias_panel`).
+- **Two-layer cascade for `replay` / `strict` / `streamMax`** (commit `a7615d54`). Custom provider parity with built-in providers' two-layer config resolution (provider default ← user override).
+- **FEATURE_188 — claudecode-Parity dispatch_child Architecture: Drop Forced Worktree + Prompt-Level Conflict Awareness** (see [ADR-034](docs/ADR.md#adr-034-claudecode-parity-dispatch_child-architecture--drop-forced-worktree--prompt-level-conflict-awareness-feature_188-v0742)). Surfaced after FEATURE_177 panel #2 dump showed 0/250 real binding dispatches in `dispatch_child_task` C4 (read fan-out) + C5 (write fan-out) cells — model writing `<tool>dispatch_child_task</tool>` markup in narrative without invoking the structured tool. Three dead-assumption fixes: (1) `executeWriteChild` no longer creates a worktree — share parent `executionCwd` / `gitRoot`, per-file `backups` Map remains the rollback substrate. (2) Worker `dispatchRules` swaps `≥3 independent investigations` / `≥45 seconds` / `≥3 modules` for `multiple independent investigations` / `a while` / `multiple modules` (qualitative criteria per [ADR-033](docs/ADR.md#adr-033-claudecode-style-prompt-design-principles--qualitative-criteria-over-quantitative-rules-v0742-v0743) §1; pilot v3 isolation test 20 calls verified non-load-bearing). (3) RULE C drops `Worktrees are isolated; merge happens at Evaluator review time` — the Evaluator role was retired in FEATURE_184 v0.7.45 ([ADR-030](docs/ADR.md#adr-030-claudecode-shape-main-agent--sidecar-verifier-substrate-feature_184-v0745)). Write children's `buildChildBriefing` now carries a `## Coordination with peers` section instructing them to STOP-and-report if peer-conflict cannot be ruled out (read children's briefing intentionally omits this — they don't write files). Cross-package infrastructure (`childWriteWorktreePathsRef` ref + `registerChildWriteWorktrees` callback + `childWriteWorktreePaths` payload field + `worktreePaths` ReadonlyMap type, 4 type-decl + 4 plumbing sites across `child-executor.ts` / `runner-driven.ts` / `dispatch-child.ts` / `payload-builder.ts` / `types.ts`) all retired. 
CAP-097 contract test deleted (worktree-creation product behavior gone); CAP-095 / CAP-096 / `child-executor.test.ts` mocks + assertions updated. Design doc: [docs/features/v0.7.42.md#feature_188-dispatch_child-worktree-drop--conflict-awareness-prompt-hardening](docs/features/v0.7.42.md#feature_188-dispatch_child-worktree-drop--conflict-awareness-prompt-hardening).
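The L1 rule layer of the FEATURE_178 stall detector (repeat tool-call signatures, firing at 3 or more identical calls within 5 turns) could look something like the sketch below. It is an illustration under stated assumptions, not the shipped module: `ToolCall`, `callSignature`, and `detectStall` are hypothetical names, and the signature is read literally from the changelog as tool name plus input keys, so input values are ignored here.

```typescript
// Hypothetical shape of one recorded tool call.
interface ToolCall {
  turn: number;
  name: string;
  input: Record<string, unknown>;
}

// Signature = tool name plus sorted input keys (values ignored, per a literal reading of the changelog).
function callSignature(call: ToolCall): string {
  return `${call.name}(${Object.keys(call.input).sort().join(",")})`;
}

// Returns the repeated signature when it occurs at least `minRepeats` times
// within the last `windowTurns` turns ending at `currentTurn`, else null.
function detectStall(
  calls: ToolCall[],
  currentTurn: number,
  windowTurns = 5,
  minRepeats = 3,
): string | null {
  const counts = new Map<string, number>();
  for (const call of calls) {
    const inWindow = call.turn > currentTurn - windowTurns && call.turn <= currentTurn;
    if (!inWindow) continue;
    const sig = callSignature(call);
    const n = (counts.get(sig) ?? 0) + 1;
    counts.set(sig, n);
    if (n >= minRepeats) return sig;
  }
  return null;
}

const repeatedReads: ToolCall[] = [
  { turn: 3, name: "read", input: { path: "src/a.ts" } },
  { turn: 4, name: "read", input: { path: "src/a.ts" } },
  { turn: 5, name: "read", input: { path: "src/a.ts" } },
];
const stalled = detectStall(repeatedReads, 5);
```

In the described design a positive result from this cheap rule would only be the trigger for the L2 LLM judge, which then decides whether to inject the `<stall-detector>` reminder.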
+### Fixed
+- **FEATURE_177 follow-up: read-file-state cache fires on microcompact-only changes** (commit `c66e2403`). Pre-fix the `onPostCompact` listener only fired on full compactions; microcompact-only iterations left stale cache entries. Now both microcompact and full compaction invalidate the per-task cache.
+- **`dispatch_child_task` empty-summary fallback + opt-in trace** (commit `8c17dba4`). Child task that exited with empty summary previously fell through `??` to a default banner that read like a real summary; now produces a "no summary returned" diagnostic envelope with `mode=silent-drop` so the parent worker can react. Opt-in trace via env-gated logging.
+- **`dispatch_child_task` review pass — flaky test + minor cleanups** (commit `3b5a862f`). Stabilizes one flaky test in the child-task harness and clears a handful of LOW-severity review items.
+- **Shift-Tab cycle uses canonical `'auto'`** (commit `1b513824` + revert chain `32396db8` → `3637bcec`). Closes the Windows-SSH cursor-misalignment root cause. The follow-up revert `3637bcec` restored the `aliasedCurrent` mapping after `32396db8` was challenged by the user — semantic intent (explicit `auto-in-project ≡ auto`) ≠ behavior equivalence (`indexOf=-1` fallback); the original mapping is load-bearing. See `feedback_behavioral_vs_semantic_equivalence` memory.
+- **FEATURE_172 follow-up: `Output.width` follows terminal viewport** (commits `fabe0b4f` + revert `e62312b3`). Attempted fix for Windows-SSH ghost cells via dynamic viewport width; reverted same cycle after broader regression surfaced. Investigation continues in v0.7.43+ work.
+- **REPL queue layout — budget reserves N+1 rows for `QueuedCommandsSurface`** (commit `f4267d4d`). The queue surface was 1 row short of its actual rendered height in tight terminals, causing trailing ellipsis cutoff.
+- **Compaction preserves image blocks + counts image tokens** (commit `92b11e68`). Image blocks were silently dropped during summary roll-up; now preserved verbatim and their estimated tokens included in the total.
+- **REPL drops `[Image #N]` anchor from user-message text** (commit `1eac821d`). Pre-fix the visible user-message text carried both the image block and a redundant `[Image #N]` anchor string; claudecode parity removes the text anchor since the image block itself is the canonical reference.
+- **Read tool image-aware via multimodal `tool_result`** (commit `286c16db`). Reading an image file (`.png` / `.jpg` / `.webp` etc.) now returns the binary as an image-content block in the tool_result, not as a base64 string in text; claudecode parity.
+- **`loadCompactionConfig` uses per-model `contextWindow` for adaptive `triggerPercent`** (commit `0cef1b66`). Pre-fix the trigger percentage was computed against the legacy hard-coded 200k context window; now reads the per-model `contextWindow` (e.g., 1M for `glm-5-turbo` corrected in `5324889e`, 200k for Claude Sonnet 4.x) so the 60% trigger threshold scales correctly.
+- **Status bar `contextWindow` re-resolves on `/model` swap** (commit `c9f62030`). Pre-fix the status-bar contextWindow value was captured at REPL bootstrap; switching models mid-session left the bar showing the stale value.
+- **`zhipu` / `zhipu-coding` `glm-5-turbo` contextWindow 128K → 200K** (commit `5324889e`). Provider metadata correction.
+- **Narrow P2b RST-prone default list to `zhipu-coding` only** (commit `8e9b4520`). FEATURE_152 P2b (write-turn max_output_tokens cap) defaulted to a too-broad provider list, causing unrelated max_tokens RST on healthy providers; narrowed.
+- **Image vision perception: tightened regex + Layer 3 compaction variants — `bc04581c` REVERTED**. Worker image-perception prompt block was prototyped (`bc04581c`) then reverted (`2fd8d8fc`) after Layer 3 V_*_compacted variants showed zhipu state turning honest refusals into confident hallucinations; saturated eval surfaced via tightened regex (require image-content keyword, not SVG markup). Layer 2 eval driver `fe76d3da` retained as permanent regression sweep. See `project_image_perception_worker_prompt` memory.
+- **InkREPL spinner fallback: `item.content` → `item.subject`** (commit `157b162d`). Follow-up to the FEATURE_060 Tier 2 rename; the parallel-thread InkREPL.tsx still referenced `item.content` on the spinner-row fallback path.
+- **InkREPL: `onThinkingEnd` no longer creates duplicate thinking item after assistant text** (commit `4798e66a`). Pre-fix, the end-of-thinking event could append a second transcript entry when the assistant had already begun streaming text; now coalesces with the existing thinking row.
+- **`/compact` updates live token count via `onCompactStats`** (commits `4da09289` + `c058aeff` + revert `829401a8`). Status-bar token count was frozen pre-compaction; now updates in real-time as compaction proceeds. Revert chain captured a temporary command-bridge wiring path that crossed a layer boundary; replaced with the canonical onCompactStats callback in a follow-up.
+- **FEATURE_184 follow-up shipped to v0.7.42 via narrow types**: `RunnerToolResult.content` union narrowed at string-only consumers (`ab2c63be`), unblocking the v0.7.45 sidecar-verifier work in parallel without forcing a v0.7.42 ship dependency.
+- **FEATURE_173 ghost-session double-write**: see Added bullet above.
+### Reverted
+- **FEATURE_177 Worker prompt RULE D — `task_output` teaching layer** (commit `9082551b`). Layer 2 panel rerun (250 cells, 3.2% audit disagreement DATA VALID) hit pre-registered REVERT threshold: case C5 kimi RULE C write fan-out 80% → 20% (-60pp, judge + regex agree). Worker `dispatchRules` reverted to RULE A/B/C + IDLE-YIELD + LARGE CHILD OUTPUT + MODEL HINT (no RULE D in any state). **The runtime `task_output` tool itself stays ON** (commit `334756b7` — in-memory `ChildProgressSnapshot` ring buffer cap=200 + claudecode-shape envelope tool); SDK consumers can opt the worker into the RULE D prompt teaching via `KODAX_TASK_OUTPUT_PROMPT='1'`. Eval drivers retained as permanent regression sweep at `tests/feature-177-task-output*.eval.ts`. User-driven root-cause diagnosis (C5 -60pp is a systemic prompt design problem, not a wording issue) produced **ADR-033** (claudecode-Style Prompt Design Principles — qualitative criteria / single-concept sentences / sparing ✗ + WHY / no enumerated taxonomies / no version metadata in prompt body) and the v0.7.42 hygiene sweep below.
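The bounded in-memory buffer behind the retained `task_output` tool can be pictured with a minimal sketch. Only the `ChildProgressSnapshot` name and the cap of 200 come from the entry above; the snapshot fields and the `SnapshotRingBuffer` class are hypothetical and are not the shipped implementation.

```typescript
// Hypothetical snapshot shape: the changelog names the type but not its fields.
interface ChildProgressSnapshot {
  readonly childId: string;
  readonly seq: number;
  readonly text: string;
}

// Assumed illustration of a capped buffer: the oldest entries are evicted first.
class SnapshotRingBuffer {
  private readonly items: ChildProgressSnapshot[] = [];
  private readonly cap: number;

  constructor(cap: number = 200) {
    this.cap = cap;
  }

  push(snapshot: ChildProgressSnapshot): void {
    this.items.push(snapshot);
    // Drop the oldest entry once the cap is exceeded so memory stays bounded.
    if (this.items.length > this.cap) {
      this.items.shift();
    }
  }

  // Returns a copy so callers cannot mutate the buffer's internal array.
  toArray(): ChildProgressSnapshot[] {
    return [...this.items];
  }

  get size(): number {
    return this.items.length;
  }
}
```

With a cap of 200, a child that reports 250 progress updates leaves only the most recent 200 readable.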
+### Cancelled
+- **FEATURE_094 — Deep Anti-Escape Hardening** (2026-05-19, see [memory](../memory/project_feature_094_cancelled.md)). Necessity probe (5 alias × 3 case × 3 run = 43 probes) measured **0/43 escape rate** across the canonical 5-alias panel — far below the cancel threshold (<5% AND <15%). The post-v0.7.26 layered defense (P0 system prompt + P2a `multi_edit` + P2b `max_output_tokens` write-turn cap) combined with FEATURE_152 (bash AST migration) + FEATURE_158 (signal-based classifier) + FEATURE_169 (pull-tool prompt hardening) absorbed the bypass surface that motivated the original 2026-04 design (~15% bypass at that time). Probe retained as permanent regression sweep: `tests/feature-094-necessity-probe.eval.ts` + `benchmark/datasets/feature-094-necessity-probe/cases.ts`. Escape rate **must** stay 0%; `>5%` reopens FEATURE_094.
+### Internal / architecture
+- **ADR-029 — AMA Compaction Trigger Parity (Top-of-Loop)** documents the FEATURE_179 lifecycle move.
+- **ADR-031 — Task-Level Hits Ledger and Cross-Session Memdir: Independent Layers** documents the FEATURE_185 vs FEATURE_124 (v0.7.43 memdir) boundary.
+- **ADR-032 — SDK Embedder Surface Closure (FEATURE_186, v0.7.42)** documents the 7-phase atomic execution + no-dual-route + dynamic config-path + metadata-driven plan-mode gate design decisions.
+- **ADR-034 — claudecode-Parity dispatch_child Architecture (FEATURE_188, v0.7.42)** documents the forced-worktree drop + qualitative dispatchRules + write-child Coordination briefing. Three dead assumptions retired (Evaluator review-at-merge / "failed rollback needs worktree" / "parallel writes must conflict"). claudecode's `isolation:'worktree'` opt-in is the precedent; KodaX picks user-directed prompt-level peer coordination instead of an explicit opt-in toggle to keep the dispatch friction low.
+- **ADR-033 hygiene sweep — Worker `dispatchRules` claudecode-style refactor**. Two commits in v0.7.42 release window apply ADR-033 principles systemically on top of FEATURE_188's qualitative swap:
+  - **PLAN-FIRST trigger qualitative swap** (commit `5569c49c`, `worker-role-prompt.ts:212`). `≥3 children` → `multiple children`. Panel 95/100 cells empty-binding (floor saturation analog of `feedback_pre_registered_gate_saturation`); audit DATA VALID (plan_first 10.0% at threshold / dispatch_intent 0.0%); per-alias gate met; aggregate +3/100. Policy alignment, not behavioral change.
+  - **FAN-OUT PLAN GRANULARITY block 18-line → claudecode 3-bullet** (commit `1e60eeb0`, `worker-role-prompt.ts:210-216`). 18-line block → 4-line (−57% chars); deletes 6 × ✗ anti-patterns + 5 × enumerated label + WORKED EXAMPLE code block + version metadata. Layer 2 panel C4 baseline 0/25 dispatch vs claudecode 7/25 (judge view); 5/5 alias dispatch Δ ≥ 0. mmx C4 -2 cell strict gate failure overridden via evidence-driven SHIP (baseline saturation in plan-without-dispatch case).
+- **Doc reconciliation: FEATURE_184 design relocation v0.7.45.md → v0.7.42.md** (commit `ac4d0267`). FEATURE_184 was drafted as v0.7.45 then shipped to v0.7.42's release window 2026-05-21; the design doc is relocated to match shipped reality. Git history of the 28 v0.7.45-tagged commits is preserved as-is — only the `docs/features/v0.7.{42,45}.md` files were rewritten.
+- **build pipeline: `build-dts.mjs` self-test** (Phase 1 of FEATURE_186). Builds a CI guard against `@kodax-ai/*` internal-import leaks in any of the 7 entry `.d.ts` files (root + 6 subpaths). POSITIVE/NEGATIVE sample regex self-test + hard-assert grep on each built entry — exits 1 if any leak found. Prevents the v0.7.40 publish hazard from reaching the tarball again.
+- **`@kodax-ai/kodax/mcp` subpath** (Phase 7 of FEATURE_186). Sixth SDK subpath; thin re-export of `@kodax-ai/mcp`. The three build-pipeline entry lists (`build-bundle.mjs` `sdkEntryNames` / `build-dts.mjs` `sdkEntries` / `release.mjs` `pkg.exports`) and the `release.mjs` publishConfig wiring are all kept in sync.
+- **Cancelled features tracker hygiene**: FEATURE_094 row updated in `docs/FEATURE_LIST.md`; tracker entry shows `Cancelled 2026-05-19` with necessity-probe rationale + probe retention pointer.
+- **`@kodax-ai/coding` MCP barrel** — `registerConfiguredMcpCapabilityProvider` + `McpCapabilityProvider` etc. still re-exported through coding for backward compatibility; new `@kodax-ai/kodax/mcp` subpath is the cleaner entry going forward.
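The `build-dts.mjs` leak guard described in this section can be approximated by a short sketch: a regex over the built `.d.ts` text plus a POSITIVE/NEGATIVE self-test. The regex, the function names, and the choice to treat `@kodax-ai/kodax` self-references as allowed are all assumptions made for illustration; the real script is not shown in the changelog.

```typescript
// Assumption: any quoted '@kodax-ai/<pkg>' specifier counts as a leak, except
// the published package itself ('@kodax-ai/kodax' and its subpaths).
const INTERNAL_SPECIFIER = /['"](@kodax-ai\/(?!kodax['"\/])[^'"]+)['"]/g;

// Returns every internal specifier found in one declaration file's text.
function findInternalImportLeaks(dtsText: string): string[] {
  return Array.from(dtsText.matchAll(INTERNAL_SPECIFIER), (m) => m[1] ?? m[0]);
}

// Self-test in the POSITIVE/NEGATIVE style the changelog mentions: the guard
// must flag a known-bad sample and stay silent on known-good samples.
function selfTestLeakGuard(): void {
  const positive = "import type { Tool } from '@kodax-ai/coding';";
  const negatives = [
    "import type { Cfg } from './types-chunks/config.d-BfJUXxC0.js';",
    "export * from '@kodax-ai/kodax/mcp';",
  ];
  if (findInternalImportLeaks(positive).length !== 1) {
    throw new Error('leak guard self-test: POSITIVE sample was not detected');
  }
  for (const sample of negatives) {
    if (findInternalImportLeaks(sample).length !== 0) {
      throw new Error(`leak guard self-test: false positive on ${sample}`);
    }
  }
}

selfTestLeakGuard();
```

A build script using this approach would run the check on each built entry file and exit with status 1 when the returned list is non-empty.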
+### Breaking changes
+- **`LocalToolDefinition.sideEffect` is now required** (Phase 4 of FEATURE_186, commit `7defd65f`). SDK consumers who construct custom `LocalToolDefinition` objects via `registerTool({...})` must now include a `sideEffect: 'readonly' | 'mutates-fs' | 'mutates-shell' | 'mutates-network' | 'mutates-state'` field. tsc will fail on pre-v0.7.42 consumer code until this field is added. The most-defensive default for custom tools is `'mutates-state'`; `'readonly'` is appropriate only for tools with NO observable effects on the system.
+- **`@kodax-ai/coding` exports new types**: `ToolSideEffect`, `KodaXSessionControl`, `KodaXSessionMutators`. These are additive (no rename); existing imports unaffected.
+- **FEATURE_188 (ADR-034) — `dispatch_child_task` no longer auto-creates a worktree for write children**. Write children now share the parent agent's `executionCwd` + `gitRoot` (per-file `backups` Map remains the rollback substrate, and the write-child briefing now carries a "Coordination with peers" section instructing the child to STOP-and-report if peer-conflict cannot be ruled out). The `KodaXChildExecutionResult.worktreePaths?: ReadonlyMap<string,string>` field is removed; the `KodaXManagedTaskRuntimeState.childWriteWorktreePaths` field is removed; the `KodaXToolExecutionContext.registerChildWriteWorktrees?` callback is removed; the `WriteChildDiff` interface + `buildEvaluatorMergePrompt` / `collectWriteChildDiffs` / `cherryPickWorktree` / `cleanupWorktrees` helpers (all dead since FEATURE_184 ADR-030 retired the Evaluator role) are removed. `toolWorktreeCreate` / `toolWorktreeRemove` tools themselves stay in the registry — they still serve the user-explicit `EnterWorktreeTool` / `ExitWorktreeTool` flow. SDK consumers reading `worktreePaths` for diff inspection must instead consume `evidence` / `mergedFindings`. See [ADR-034](docs/ADR.md#adr-034-claudecode-parity-dispatch_child-architecture--drop-forced-worktree--prompt-level-conflict-awareness-feature_188-v0742).
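For consumers hit by the `sideEffect` requirement, a migration might look like the sketch below. The union is copied from the entry above; the `LocalToolDefinitionSketch` interface and the `withDefaultSideEffect` helper are hypothetical stand-ins, since the real `LocalToolDefinition` shape and `registerTool` signature are not reproduced in this changelog.

```typescript
// Union values as quoted in the breaking-change entry.
type ToolSideEffect =
  | 'readonly'
  | 'mutates-fs'
  | 'mutates-shell'
  | 'mutates-network'
  | 'mutates-state';

// Hypothetical minimal tool shape: only `sideEffect` is confirmed by the changelog.
interface LocalToolDefinitionSketch {
  name: string;
  description: string;
  sideEffect: ToolSideEffect;
  handler: (input: Record<string, unknown>) => Promise<string>;
}

// Hypothetical helper: fills in the most defensive value when a
// pre-v0.7.42 definition omitted the field.
function withDefaultSideEffect(
  def: Omit<LocalToolDefinitionSketch, 'sideEffect'> & { sideEffect?: ToolSideEffect },
): LocalToolDefinitionSketch {
  return { ...def, sideEffect: def.sideEffect ?? 'mutates-state' };
}

// A legacy definition with no declared side effect gets 'mutates-state'.
const legacyTool = withDefaultSideEffect({
  name: 'write_report',
  description: 'Writes a summary somewhere the caller controls.',
  handler: async () => 'ok',
});

// A tool with no observable effects on the system may declare 'readonly'.
const pureTool = withDefaultSideEffect({
  name: 'word_count',
  description: 'Counts words in the provided text.',
  sideEffect: 'readonly',
  handler: async (input) => String(String(input.text ?? '').split(/\s+/).filter(Boolean).length),
});
```

Defaulting to `'mutates-state'` follows the guidance in the entry: an over-cautious label costs less than a tool wrongly treated as side-effect free.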
+### Test coverage delta
+- **+138 new unit tests** from FEATURE_186 alone (32 `getAppDataDir` + 18 tool-metadata helpers + 21 custom-provider CRUD + 20 RunningSession + 26 MCP CRUD + 21 plan-mode gate / skill-resolver / build-dts self-test).
+- Plus tests added by FEATURE_173 (12 stable-contract), FEATURE_175 Slice 2 (6 net), FEATURE_177 cache (per-task LRU), FEATURE_178 stall detector (L1+L2), FEATURE_179 lifecycle test, FEATURE_180 dedup test, FEATURE_181 / 182 / 183 single-case fixes, FEATURE_185 enrichment (13 file-tracker + 9 post-compact + 33 result-extractors + Layer 2 eval driver).
+- Coding 2704/2704 + repl 1431/1432 green across the cycle. Build:bundle + build:dts clean for all 7 subpath entries.
+## [0.7.41] - 2026-05-19
+### Theme
+**KodaX Team Mode + AMA Reliability + Source-Tree Modularization + REPL Render & TTFB Perf** — The release lands a third-axis differentiator (multi-instance auto coordination) alongside four AMA-path reliability fixes (mid-turn inject, pending-children handoff gate, post-handoff label flip, terminal-verdict fallback), the Todo V2 per-task CRUD migration with extension hooks, and the largest source-tree refactor since v0.7.25 (`runner-driven.ts` 6406 → 1897 lines, -70.4%, byte-identical). FEATURE_125 KodaX Team Mode is the headline: zero-cognitive-load multi-session awareness (no `/team create`, no `team_id`) — each KodaX instance writes per-pid state to `<configHome>/instances/<pid>/`, every LLM round injects a sibling-snapshot block into all 5 managed roles' system prompts (Scout / Planner / Generator / Evaluator / Worker), and a runtime content-hash safety net catches the only genuine data-race surface (concurrent overwrite of a file another session already read). The LLM-First design contrasts with claude code Team Mode's mode-based 4-stage workflow and no-conflict-resolution semantics. FEATURE_167 closes a structural `signal:'COMPLETE'` false-positive on V2 Evaluator turns (3-layer probe-gated defense: B0 parser SKIP / B1 retry cap / B2 synthesized verdict accept). FEATURE_165 + 166 land together as a Worker→Evaluator handoff hardening pair — runtime gate blocks `emit_handoff` while child registry is non-empty (covers V1+V2 shared `handoffEmit` path), and the REPL surface flips role labels immediately on `agentSwitched` (was lagged a turn). FEATURE_170 migrates the todo subsystem from monolithic init/replace to per-task add/patch/remove with extension events + hooks + new `todo_create` tool; Layer 2 LLM-judge eval (Layer A + Layer B 3.2% disagreement → DATA VALID) clears gate (a)+(b) MET, (c) saturation-artifact noted.
FEATURE_171 extracts 12 submodules from `runner-driven.ts` across R1–R4 — zero behavior change, 4 reviewer APPROVE rounds, 4314/4314 tests pass each commit, ADR-026 + HLD §3.5.1 documented.
+### Added
+- **FEATURE_125 — KodaX Team Mode (Multi-Instance Auto Coordination)**. 11 commits S1–S7 + W1–W4 (`acef3c5e` → `9225ad31` → S7 `e2916675` + `e6bc5d7b` audit) + release-prep wiring `0cfc8bc4`. A multi-session auto-coordination mechanism original to KodaX: users carry **zero cognitive load** (no `/team create`, no `team_id` concept); KodaX automatically senses the state of other KodaX sessions on the same machine and injects that state into the LLM system prompt so the LLM decides for itself whether to yield, collaborate, or schedule; the runtime only acts as a safety net at the physical boundary of a race condition (content hash mismatch), with no forced locks and no forced waits. This is the core differentiation from claude code Team Mode (mode-based 4-stage workflow + no conflict resolution at all). **5 layers**: (S1) per-instance state writer at `<configHome>/instances/<pid>/{state.json,meta.json,heartbeat}` with atomic writes + 1s heartbeat + register/refresh/shutdown lifecycle; (S2) sibling-instance discovery + stale detection + reap with `PersistedSessionState v1` version guard + per-instance failure isolation; (S3) pure system-prompt formatter for the `=== Other active KodaX sessions ===` block (LLM-First wording, truncation, no behavior dictation); (S4) `KodaXToolExecutionContext.contentHashCache?` sha256-based stale-write detection with `recordRead` / `checkStale` / `recordWrite` per-task lifetime; (S5) tool-time soft-warning formatter for exact-path overlap match (no blocking, just an informational banner).
**Wiring** (W1–W4): Read tool records sha256 on every successful read up to 5 MB (size cap so huge files don't pay the hash cost); Edit / Write / MultiEdit pre-mutation `checkStale` block + post-mutation `recordWrite` + sibling-overlap warning banner via `ctx.siblingSnapshot`; REPL bootstrap helper `bootstrapTeamMode()` with process-level singleton + `/exit` + SIGTERM lifecycle hooks (mirror wiring landed for both `runInteractiveMode` legacy path AND `runInkInteractiveMode` Ink REPL path — the latter was the release-blocker fix in commit `0cfc8bc4`); runner-driven adapter does per-LLM-round sibling discovery, injects `teamModeSection` into all 5 managed roles' system prompts (Scout / Planner / Generator / Evaluator / Worker) via a mutable `siblingSnapshot` ref + `Object.defineProperty` getter so tool ctx always reads the freshest snapshot. **S7 Layer 2 panel + audit**: `tests/feature-125-team-mode-awareness.eval.ts` + `benchmark/datasets/feature-125-team-mode-awareness/cases.ts`; 5 aliases × 2 cases × 5 runs = 50 LLM calls; **SHIP** per pre-registered matrix after audit-corrected regex extension (`buildToolNamePatterns` expanded from 4 to 9 syntax variants to capture kimi `read:0>{...}` and zhipu `<tool_name>read</tool_name>` forms). Layer A + Layer B audit-corrected primary verdict: case 1 84% / case 2 60% overall (4/5 aliases ≥60% — kimi case 2 narrate-without-tool documented as `feedback_model_structural_floor_not_prompt_tunable`; not addressable via prompt iteration). Design doc: [docs/features/v0.7.41.md#feature_125-kodax-team-mode--multi-instance-auto-coordination](docs/features/v0.7.41.md#feature_125-kodax-team-mode--multi-instance-auto-coordination). Test guide: `docs/test-guides/FEATURE_125_v0.7.41_TEST_GUIDE.md`.
+- **FEATURE_165 — Worker `emit_handoff` pending-children gate**. Commit `0ebeb15f`. Runtime gate at `runner-driven.ts:2402` blocks `emit_handoff` when the child registry is non-empty (covers both V1 and V2's shared `handoffEmit` path). 9 unit tests + 1 integration test pin the gate semantics across both paths. **Prompt addition PARTIAL/dropped**: Layer 2 probe (250 calls × 5 aliases) showed negative-case D/E already 100% on the baseline (`Δ=0pp`), so the pre-registered SHIP condition (2) failed mathematically; the runtime gate is the production-load-bearing change. Probe also confirmed zhipu intent-vs-action floor reproduces in canned-history sessions (structural, not context-length-driven). Design doc: [docs/features/v0.7.41.md#feature_165--worker-emit_handoff-pending-children-gatev0741-hotfix](docs/features/v0.7.41.md#feature_165--worker-emit_handoff-pending-children-gatev0741-hotfix).
+- **FEATURE_166 — Post-handoff role label flip**. Commit `0ebeb15f`. New `onAgentSwitched` hook on agent-runtime + `ObserverBridge.agentSwitched(role)` on coding-side. Fixes the V2 Worker→Evaluator handoff label-lag (`[Worker]` would persist on the next Evaluator turn until the assistant produced output). Production session `20260515_185354` gave a directly reproducible verdict trace. 7 unit tests + 1 pre-existing test corrected. Same session also surfaced FEATURE_167 (Evaluator text-only termination leaves `recorder.verdict === undefined`, V2 runner-driven never wired the `parseManagedTaskVerdictDirectiveFromJson` fallback — landed in FEATURE_167 below). Design doc: [docs/features/v0.7.41.md#feature_166--post-handoff-role-label-flipshipped](docs/features/v0.7.41.md#feature_166--post-handoff-role-label-flipshipped).
+- **FEATURE_167 — Evaluator terminal-verdict fallback (B0 parser + B1 retry + B2 synthesized accept)**. Commit `d537c784` 2026-05-15. Three-layer probe-gated defense closes the structural `signal:'COMPLETE'` false-positive on V2 Evaluator turns where the model exits text-only without calling `emit_verdict` and `recorder.verdict` therefore stays `undefined`. **Layer B0** — parser SKIP path (regex+JSON parse on the assistant's terminal text looking for `{"signal":"COMPLETE","grade":...}` directives); **Layer B1** — retry gate with per-alias cap (default 2, zhipu cap 1 to avoid amplifying the intent-vs-action floor — see `project_zhipu_send_message_floor` memory); **Layer B2** — synthesized verdict accept (fabricates a `{signal:'NEEDS_REVISION', grade:'C', summary:'…inferred from terminal text…'}` envelope so the V2 task engine can complete instead of hanging on the missing verdict). Reviewer-suggested change "include `revise` in the gate" was rejected — correct invariant is `recorder.verdict` object identity comparison (NOT status comparison), otherwise a stale `revise` from a prior turn would falsely satisfy the gate. 29 tests (16 retry-config + 9 predicate + 4 integration); audit panel 0/75 disagreement → DATA VALID. Design doc: [docs/features/v0.7.41.md#feature_167--evaluator-terminal-verdict-兜底shipped](docs/features/v0.7.41.md#feature_167--evaluator-terminal-verdict-兜底shipped).
+- **FEATURE_170 — Todo V2 Migration (per-task CRUD + extension hooks + `todo_create` tool)**. C1–C6 across 8 commits (`e45ddaa8` → `20e02103`). Replaces v0.7.x's monolithic init/replace todo-store API with per-task `add` / `patch` / `remove` operations + monotonic counter + metadata + extension events (`todo:added` / `todo:patched` / `todo:removed` / `todo:before-complete`) + before-complete hook for downstream consumers. New `todo_create` tool added to the registry + role wiring + throttle reset. Worker / legacy / throttle prompts updated to teach the per-item API (C5) with activeForm parity fix (C5 follow-up). **Layer 2 LLM-judge eval (Layer A + Layer B)**: 250-call panel + Layer A 5-sub-agent self-judge + Layer B 3-judge majority (750 calls), Layer B 3.2% disagreement → DATA VALID; gate (a)+(b) MET; gate (c) FAIL as a **pre-registered SHIP gate saturation artifact** (C2 baseline 96% / C3 100% — mathematically unable to add +20pp from a near-saturated baseline). C1 +32pp / mmx 0→100% are direct prompt-cause evidence — SHIP, keep the prompt rewrite. Lessons captured in two new memory entries: `feedback_pre_registered_gate_saturation` (pilot for baseline ceiling before deferring on Δ ≥+N pp) and `feedback_simplifying_prompt_can_regress` (Prefer over X when Y comparative clauses are load-bearing). Design doc: [docs/features/v0.7.41.md#feature_170--todo-v2-migration-per-task-crud--extension-hookssshipped-2026-05-16](docs/features/v0.7.41.md#feature_170--todo-v2-migration-per-task-crud--extension-hookssshipped-2026-05-16).
+- **FEATURE_164 — Mid-turn user-input injection** (shipped as part of commit `0ebeb15f`, the FEATURE_164+165+166 triple). Closes the gap where a user prompt typed during an active LLM round was queued but only delivered as a synthetic `[user]` banner on the next idle-yield wake — semantically incorrect for the user's intent ("inject as if I'd typed it mid-turn"). Now the runner-driven adapter checks the `MessageQueue` snapshot before each LLM call and prepends any queued real-user messages as proper non-synthetic user-bubble messages within the same round.
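The FEATURE_125 stale-write detection (S4 plus the W1/W2 wiring) can be sketched as a small per-task cache keyed by path. The method names `recordRead` / `checkStale` / `recordWrite` and the 5 MB read cap come from the entry above; the class name, the method signatures, and the return shape are assumptions made for illustration.

```typescript
import { createHash } from 'node:crypto';
import { Buffer } from 'node:buffer';

// Size cap from the changelog: files above 5 MB are not hashed on read.
const MAX_HASH_BYTES = 5 * 1024 * 1024;

// Hypothetical result shape for a staleness check.
type StaleCheck = { stale: false } | { stale: true; expected: string; actual: string };

function sha256(content: string): string {
  return createHash('sha256').update(content).digest('hex');
}

// Assumed illustration of a per-task content-hash cache.
class ContentHashCacheSketch {
  private readonly hashes = new Map<string, string>();

  // Called after a successful read: remember what this task saw.
  recordRead(path: string, content: string): void {
    if (Buffer.byteLength(content, 'utf8') > MAX_HASH_BYTES) return;
    this.hashes.set(path, sha256(content));
  }

  // Called before a mutation with the file's current on-disk content.
  checkStale(path: string, currentContent: string): StaleCheck {
    const expected = this.hashes.get(path);
    // Never read by this task: there is nothing to compare against.
    if (expected === undefined) return { stale: false };
    const actual = sha256(currentContent);
    return expected === actual ? { stale: false } : { stale: true, expected, actual };
  }

  // Called after this task's own write so its next edit is not flagged.
  recordWrite(path: string, newContent: string): void {
    this.hashes.set(path, sha256(newContent));
  }
}
```

The mismatch case is the one the changelog calls the genuine data-race surface: another session overwrote a file after this task had read it.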
+### Fixed
+- **FEATURE_125 W3 — Ink REPL Team Mode bootstrap wiring** (release-blocker fix). Commit `0cfc8bc4`. Discovered during v0.7.41 release prep audit: FEATURE_125 W3 (commit `1a073ecc`) wired `bootstrapTeamMode` into the legacy `runInteractiveMode` path but never into `runInkInteractiveMode`, so the Ink REPL (the default REPL path on all platforms since v0.7.25) ran with Team Mode dormant — `<configHome>/instances/<pid>/` was never created, no heartbeat thread started, sibling discovery returned empty, and the system-prompt `teamModeSection` was a no-op for every Ink-launched session. Mirror-wires the bootstrap + `process.on('exit')` + `process.on('SIGTERM')` + clean-exit cleanup into `runInkInteractiveMode` at the same insertion point (after `gitRoot` resolution, before render). 37 lines net.
+- **Issue 132 — h2-boundary `session.jsonl` ENOENT race**. Commit `bf3006fb`. Eager-read in `agent-task-runner` resolves the timing window where benchmark h2-boundary cases would call `tail -f session.jsonl` before the file existed on disk; pre-reads on task start instead of awaiting the first append.
+- **FEATURE_166 stale-test correction** (1 pre-existing test): `agent-runtime.test.ts` had been asserting the buggy label-lag behavior as-correct — corrected to pin the fixed semantics so future regressions surface immediately.
+- **FEATURE_171 build break + decl emit** (covered transitively by the R1–R4 chain test-pass discipline): every refactor commit ran `tsc -b tsconfig.build.json` + 4314 tests green; no stage shipped a partial transform.
+- **Bundle SDK `.d.ts` so consumer `tsc` resolves types** (commit `af623000`). Footgun caught at the SDK consumer surface: tarball shipped `dist/index.js` + subpath bundles but no matching `.d.ts`, so `import { runKodaX } from '@kodax-ai/kodax'` worked at runtime while consumer `tsc` reported missing types. Build pipeline now layers `tsc --emitDeclarationOnly` on top of the esbuild bundle so every published subpath ships real types.
+- **`KODAX_RENDER_TRACE` default path uses `os.tmpdir()` not `homedir()`** (commit `54a59caa`). Phase A.0 review follow-up — `homedir()` pollutes the user's home with per-pid trace files; `os.tmpdir()` is the conventional location for ephemeral diagnostic output and gets cleaned up by the OS.
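The `KODAX_RENDER_TRACE` path change is small enough to show in full as a sketch. The file name pattern `kodax-render-trace-<pid>.log` and the per-frame line format appear in this release's Performance notes; the function names here are hypothetical.

```typescript
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical helper: resolves the default trace file under the OS temp
// directory rather than the user's home directory.
function defaultRenderTracePath(pid: number = process.pid): string {
  return join(tmpdir(), `kodax-render-trace-${pid}.log`);
}

// Hypothetical formatter for one per-frame trace line, using the field
// names quoted in the Phase A.0 entry.
function formatFrameLine(
  frame: number,
  renderTimeMs: number,
  bytesPerFrame: number,
  writes: number,
): string {
  return `frame=${frame} renderTime=${renderTimeMs} bytes_per_frame=${bytesPerFrame} writes=${writes}`;
}
```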
+### Performance
+- **FEATURE_172 — REPL Render Path Optimization (Phase 1 + Phase A.0/A.1)**. Triggered by user SSH long-session (`kodax -c` with 200+ history items) reporting "every 2-3s a frame refresh" during streaming. Two-phase work, with a mid-feature scope correction.
+  - **Phase 1 (data layer)** — 5 commits `19c6aff3` → `26d47084`. Split `transcript-layout.ts` into pure static/dynamic helpers (`buildTranscriptStaticPortion` / `buildTranscriptDynamicPortion` / `composeTranscriptRenderModel`); split `promptMainScreenRenderModel` + `transcriptMainScreenRenderModel` `useMemo` into static + dynamic with a static-cache-key invariant (streaming-state changes no longer invalidate the static portion); added `React.memo` `areTranscriptRowPropsEqual` comparator on `TranscriptRowRenderer`. **Data-layer bench** (`baseline-26d47084.json`, 800 items): streaming-tick p95 94.18ms → 0.52ms (-99.4%).
+  - **Phase 1 scope correction (2026-05-19)** — Phase 1 ship review with 3 parallel Explore-agent traces + claudecode end-to-end pipeline comparison revealed the data-layer bench (`benchmark/perf/repl-render-perf.bench.ts`) only measured `buildTranscriptRenderModel` inner function (~3-5% of total per-frame cost). The real ~80% lives in `tui/substrate/ink/` rendering substrate: `renderNodeToOutput` full-tree recursion (~55%), `setCellAt` `cells.slice()` O(N²) (~12%), `Output.getGrid()` rebuild (~12%), `diffEach` full-screen walk (~10%), `markDirty` propagation gap (~5%). **Lesson** captured to feedback memory: bench must measure end-to-end wall-time, not isolated inner functions; static analysis of a hot loop can miss the actual cost center.
+  - **Phase A.0 — `KODAX_RENDER_TRACE` env-gated per-frame trace + end-to-end bench scaffold** (commits `5ca91970` + `54a59caa` + `dae85141` + `99e7f2af`). Env-gated trace writes one `frame=N renderTime=X bytes_per_frame=Y writes=Z` line per render to `<tmpdir>/kodax-render-trace-<pid>.log`; bench scaffold parametrizes viewport at the user's real SSH dimensions (148×43) and measures the full engine `onRender` pipeline with a mock stdout so `setCellAt` / `outputToScreen` / `diff` costs are real.
+  - **Phase A.1 — `ScreenBuilder` eliminates `setCellAt` O(N²) `cells.slice()`** (commit `25bf0f52`). New mutable builder pattern at `output-to-screen.ts:211`: original `setCellAt(screen, ...)` did `screen.cells.slice()` (full width×height ref copy) + `{...screen, cells}` per non-empty cell — on a 148×43 viewport with ~500 non-empty cells/frame that's ~3.18M element-copies + 500 fresh arrays + 500 fresh Screen objects per frame. `createScreenBuilder(width, height)` exposes O(1) `setCellAt` writes + one-shot `build()` that returns a frozen Screen; only the `outputToScreen` hot loop migrated, public `setCellAt(Screen, ...)` API preserved for tests + future immutable callers. **End-to-end bench delta** (148×43, `mainscreen-windowed-800` scenario): renderer p95 14.804ms → 3.095ms (-79%, 4.78× speedup). 193 substrate-ink tests + 7 new ScreenBuilder unit tests (byte-equal vs `setCellAt`, OOB rejection, post-build-write rejection, 10k-write soft budget) + last-write-wins test `1105a181` close the review loop. 1426/1427 full repl PASS.
+  - **Phase A.2-E deferred pending user SSH trace measurement after A.1 ship.** ADR-028 documents the full claudecode port plan (Phase B nodeCache + markDirty / Phase C screen.damage bounding box / Phase D Output.charCache + StylePool / Phase E FRAME_INTERVAL + viewport culling). Layer 0 G1 (transcript render goldens, `925a4d77`) + G2 (perf bench + baseline, `4641ebb9`) + G4 (hit-test + selection 22 edge tests, `4fb590f3`) shipped as Phase 0 planning artifacts; ADR-027 + ADR-028 + `docs/test-guides/FEATURE_172_v0.7.41_REGRESSION_GUIDE.md` document the full pipeline.
+- **First-round TTFB compression — drop `refresh:true` tax + parallel pre-LLM + REPL-mount prewarm** (commit `e8b336ed`). Triggered by user observation: review-type prompts on a medium repo paid ~24s pre-LLM wall-time (after parallel/memoize work) before any LLM token streamed. Compressed via L1+L2 to ~10-15s (LLM-TTFB-bound). 5 stacked changes:
+  - **L1 — `middleware/repo-intelligence.ts` first-round NEVER forces `refresh:true`**. 4 sites of `refresh: isNewSession` → `refresh: false`. The 30s `PREMIUM_REFRESH_TIMEOUT_MS` budget was paid on every new session, but the daemon's own background polling keeps its on-disk state fresh; the 4s budget path returns daemon's already-cached state immediately. Single biggest savings (~10-15s).
+  - **L2 — REPL-mount prewarm** (new `prewarmRepoIntelligenceCaches` helper exported from `@kodax-ai/coding` + Ink-REPL `useEffect`). Fires `getRepoRoutingSignals` + `getRepoPreturnBundle` with refresh:false at REPL mount, fire-and-forget. Cache-coherent with L1 (both refresh:false) so user-path either coalesces onto in-flight prewarm Promise (~2s) or hits warmed P3+ cache (~0ms). Default-on; opt-out via `KODAX_PREWARM_REPO_INTELLIGENCE=0`.
+  - **P1.a — middleware parallel fan-out**. Two-phase `Promise.all`: Phase 1 races OSS overview (git+fs) with premium preturn (daemon); Phase 2 races module + impact direct-call fallbacks ONLY for slots not already filled by preturn. Behavioral pins preserved (preturn gating + `.catch(() => null)` error isolation + emit order: preturn → module → impact).
+  - **P1.b — run-substrate parallel**. `hydrateSession` (MCP state restore) and `getRepoRoutingSignals` collapsed to one wall-time slot via `Promise.all`; hydration error propagation unchanged, routing has independent `.catch(() => null)`.
+  - **P2 / P3 / P3+ — multi-tier cache stack**. P2 in-flight Promise sharing in `tryPremiumPreturn` (1.5s TTL, cacheKey DELIBERATELY includes `refresh` so explicit `refresh:true` callers — `/repointel warm`, eval harness — get their own daemon work). P3/P3+ session-scoped caches (60s TTL on routing signals + preturn bundle, cacheKey OMITS refresh so prewarm + first-round share one entry under the "data within 60s is fresh by definition" semantics). `normalizeCachePath` helper makes cacheKey robust to Windows drive-letter case + relative-vs-absolute caller variations + Promise rejection paths.
+  - **Default repo-intelligence mode preserved as `'auto'`**. Briefly experimented with flipping default to `'oss'` for users without repointel; cost analysis showed `'auto'` fallback path is ~10ms localhost TCP RST + 2s `PREMIUM_FAILURE_TTL_MS` cache → 0ms within TTL + ~5-10ms per >2s gap (negligible vs LLM TTFB). Auto-detection of installed repointel is the right default per README:182.
+- **Inline spinner-row stats tail — elapsed + tokens (claudecode parity)** (commit `58682cbf`). REPL spinner row gains an inline `Xs · Y tokens` running tail (matches claudecode's status indicator). Frontline of a sequence of claudecode-parity surface improvements; documented in ADR-027 Phase 0.
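The Phase A.1 change from copy-per-write to a mutable builder can be illustrated with a simplified model. The `Cell` and `Screen` shapes below are hypothetical reductions, and throwing on out-of-bounds or post-build writes is an assumed way to implement the "rejection" behaviour the tests are said to cover; only the names `setCellAt`, `createScreenBuilder`, and `build()` come from the changelog.

```typescript
// Hypothetical, reduced cell and screen shapes.
interface Cell { readonly char: string }
interface Screen {
  readonly width: number;
  readonly height: number;
  readonly cells: ReadonlyArray<Cell | undefined>;
}
interface ScreenBuilder {
  setCellAt(x: number, y: number, cell: Cell): void;
  build(): Screen;
}

// Copy-per-write style: every call copies the whole cell array, so a frame
// with k non-empty cells on a w×h grid costs about k × w × h element copies.
function setCellAt(screen: Screen, x: number, y: number, cell: Cell): Screen {
  const cells = screen.cells.slice();
  cells[y * screen.width + x] = cell;
  return { ...screen, cells };
}

// Builder style: O(1) writes into one private array, frozen once at build().
function createScreenBuilder(width: number, height: number): ScreenBuilder {
  const cells = new Array<Cell | undefined>(width * height).fill(undefined);
  let built = false;
  return {
    setCellAt(x: number, y: number, cell: Cell): void {
      if (built) throw new Error('write after build()');
      if (x < 0 || y < 0 || x >= width || y >= height) {
        throw new RangeError(`cell (${x}, ${y}) is outside ${width}x${height}`);
      }
      cells[y * width + x] = cell;
    },
    build(): Screen {
      built = true;
      return Object.freeze({ width, height, cells: Object.freeze(cells) });
    },
  };
}

// The changelog's estimate: a 148×43 grid with ~500 non-empty cells per frame.
const elementCopiesPerFrame = 148 * 43 * 500; // 3,182,000
```

Keeping the immutable `setCellAt` alongside the builder matches the entry's note that only the hot loop migrated while the public API stayed in place.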
+### Internal / architecture
+- **FEATURE_171 — `runner-driven.ts` modular split**. R1 `2fef1c31` (4 leaf modules + `types.ts`) → R2 `f0be2d4e` (4 mid-coupling modules) → R3 `bfb2b818` (agent-chain + llm-adapter) → R4 `62dc1c58` (payload-builder + checkpoint-flow). 12 submodule extraction; **6406 → 1897 lines (-70.4%)**; **zero behavior change**; **4 reviewer APPROVE rounds**; **4314/4314 tests pass each commit**. ADR-026 + HLD.md §3.5.1 documented in R5 (`4d108af9`). The refactor preserves the closure pattern around `baseCtx` / `siblingSnapshot` / `contextTokenSnapshotRef` — what was a 6400-line monolith is now a stack of named factories each under 800 lines. Module map: `types.ts`, `agent-chain.ts`, `payload-builder.ts`, `checkpoint-flow.ts`, `llm-adapter.ts`, `compaction-bridge.ts`, `manager-input-builder.ts`, `result-projection.ts`, `tool-ctx-builder.ts`, `child-task-orchestration.ts`, `recorder-bridge.ts`, plus the residual `runner-driven.ts` entry. Side benefit: faster IDE hover-pop on the public surface; the public export shape is unchanged so all consumers are byte-equivalent.
+- **`bootstrapTeamMode` + `TeamModeHandle` exports added to `@kodax-ai/agent`** so the Ink REPL can import them without depending on legacy-CLI internals. The handle exposes `shutdown()` and is opaque otherwise (per the layer-independence guarantee — REPL has no business poking at the per-instance writer's internals).
+- **`KodaXToolExecutionContext.contentHashCache?`** field added with `recordRead` / `checkStale` / `recordWrite` API surface. Per-task lifetime (created at task start, destroyed at completion). Wired into Read / Edit / Write / MultiEdit tool implementations so the FEATURE_125 race-detection works without per-tool plumbing.
+- **`KodaXToolExecutionContext.siblingSnapshot?`** field added (as a mutable ref) with `Object.defineProperty` getter on the tool ctx so each tool invocation reads the freshest snapshot from the runner-driven adapter's per-round refresh. Avoids stale-snapshot reads when the LLM stream spans multiple seconds.
+- **`buildToolNamePatterns` extended from 4 to 9 syntax variants** in the benchmark harness regex tooling (`benchmark/datasets/feature-125-team-mode-awareness/cases.ts` + downstream). Captures kimi `read:0>{...}`, zhipu `<tool_name>read</tool_name>` and 3 other non-canonical syntaxes; lesson saved as `feedback_regex_audit_per_new_eval`.
+- **`JudgeContext.toolCalls?`** plumbed through `benchmark/harness/judges.ts` + both call sites in `benchmark/harness/harness.ts`. Optional `judge(output, context?)` arg lets binding-only providers (zhipu/glm51, mmx/m27, etc. — they emit `text=""` and put the tool call in the structured `tool_calls` field) be judged on what the harness actually captured, not on the empty raw text. Existing text-only judges ignore the arg and continue to work unchanged. Per `feedback_audit_must_see_binding` + `feedback_audit_binding_priority_in_prompt`: also requires the audit judge prompt to label the binding as "ABSOLUTE GROUND TRUTH" + a `CRITICAL RULE` system prompt section, or judges over-anchor on the empty raw text.
+- **2 prompt-eval datasets** added under `benchmark/datasets/`: `feature-125-team-mode-awareness/` (S7 Layer 2 panel: peer-active-file-acknowledge-read-first + peer-recently-modified-reread) and `tool-schema-slim/` (Layer 2 eval of v2_slim ~half + v3_aggressive ~quarter description variants for `ask_user_question` + `todo_create` — see "Tool schema slim eval" below).
+- **Tool schema slim eval (DEFER both v2 + v3)**. Commit `d68141ea`. Designed + ran the largest two-Scout-tool slim attempt: `ask_user_question` (2760 B / ~690 tok) + `todo_create` (2384 B / ~596 tok) — combined ~785–990 tokens potentially saved. 4-alias panel × 9 cases × 5 runs + panel-internal majority audit (initial 85–97% disagreement on AUQ_6 / 18–30% on TC_1 fixed by switching to v2 `CRITICAL RULE` prompt → 0% disagreement, data validated). Both variants **DEFER**: v2 gate (a) violations AUQ_1 zhipu −20pp + TC_1 zhipu/ds/kimi −20 to −40pp; v3 gate (a) violations AUQ_1 zhipu/ds −20 to −40pp + TC_1 zhipu/ds −40pp. Reason: `"For X use Y, NOT Z"` comparative clauses in schema descriptions are load-bearing disambiguation priors — slimming caused zhipu/ds to mis-classify simple cases. Pattern matches existing `feedback_simplifying_prompt_can_regress` + `feedback_model_structural_floor_not_prompt_tunable`. Future schema-slim work: don't touch "use X for ... NOT for ..." clauses; safe to slim version prefixes + return-value descriptions + "use sparingly" style instructions + property description secondary detail. Net cost ~$23 within ~$27 budget.
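The `siblingSnapshot` wiring described in this section (a mutable ref read through an `Object.defineProperty` getter) can be sketched as follows. The `SiblingSnapshot` fields, the `Ref` type, and the `attachLiveSnapshot` helper are hypothetical; only the field name and the getter technique come from the changelog.

```typescript
// Hypothetical snapshot shape and ref holder.
interface SiblingSnapshot {
  readonly takenAt: number;
  readonly sessions: readonly string[];
}
interface Ref<T> { current: T }
interface WithSiblingSnapshot { readonly siblingSnapshot?: SiblingSnapshot }

// Defines `siblingSnapshot` as a getter so every property read resolves
// through the ref instead of a value captured when the ctx was built.
function attachLiveSnapshot<T extends object>(
  ctx: T,
  ref: Ref<SiblingSnapshot | undefined>,
): T & WithSiblingSnapshot {
  Object.defineProperty(ctx, 'siblingSnapshot', {
    get: () => ref.current,
    enumerable: true,
    configurable: false,
  });
  return ctx as T & WithSiblingSnapshot;
}

// The adapter keeps one ref and swaps `current` on each per-round refresh;
// tool contexts created earlier observe the new value on their next read.
const snapshotRef: Ref<SiblingSnapshot | undefined> = { current: undefined };
const toolCtx = attachLiveSnapshot({ taskId: 't1' }, snapshotRef);
```

A plain property assignment would freeze the snapshot at context-creation time, which is the stale-read case the entry says the getter avoids.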
+### Test coverage delta
+- New: 17 (S1) + 20 (S2) + 20 (S3) + 15 (S4) + 15 (S5) + 7 (S6 integration) + 4 (W1) + 14 (W2) + 10 (W3) + 9 (W4) = 131 FEATURE_125 tests; 9 (FEATURE_165) + 7 (FEATURE_166) + 29 (FEATURE_167) + ~30 (FEATURE_170 C1–C6 follow-ups) = ~75 reliability tests; 50 (FEATURE_171 R4 tool wiring contract) + 0 net new for R1–R3 (all R-series ran the full pre-existing 4314 each round); ~95 FEATURE_172 Phase 1 (65 transcript-layout helpers + 8 golden snapshot + 22 hit-test/selection edge) + 17 React.memo comparator + 7 ScreenBuilder + 1 last-write-wins + 1 KODAX_RENDER_TRACE = ~120 FEATURE_172 tests; 4 cache-coalesce regression tests for the TTFB stack (P2 in-flight, P3 cross-call, P3+ multi-round, refresh:true-within-TTL).
+- Total green at HEAD: **5,081 tests pass + 23 todo + 1 skipped across 8 workspaces** (agent 477 / coding 2712 / llm 276 / mcp 28 / repl 1419 / repo-intel & skills 136 / repointel-protocol & session-lineage 18 / tracing 15). `tsc -p packages/coding/tsconfig.json --noEmit` + `tsc -p packages/repl/tsconfig.json --noEmit` both clean.
 ## [0.7.40] - 2026-05-13
 ### Theme
@@ -1210,7 +1355,7 @@ repl            → coding, skills
 ### Tests
 - Added / expanded tests for `task-engine`, `reasoning`, `tool-display`, `live-streaming`, `StatusBar`, `invocation-runtime`, `types-legacy`, and `InkREPL.interrupted`
-<!-- last-sync: HEAD -->
+<!-- last-sync: a8258d29 -->
 ### Added
 - **Repository intelligence substrate (FEATURE_018)**: Task-aware repository intelligence layer under `.agent/repo-intelligence/` with durable artifacts — `repo-overview.json`, `changed-scope.json`, `module-index.json`, `symbol-index.json`, `process-index.json`, `repo-intelligence-manifest.json` — supporting incremental refresh, freshness metadata, and language-tiered extraction (TS/JS via AST, Python, Go, Rust, Java, C++)