@kodax-ai/kodax 0.7.41 → 0.7.43

Files changed (54)
  1. package/CHANGELOG.md +119 -3
  2. package/README.md +214 -286
  3. package/README_CN.md +173 -277
  4. package/dist/chunks/chunk-7G5PSL6C.js +830 -0
  5. package/dist/chunks/{chunk-6OB4AJOM.js → chunk-IYSK7LUK.js} +1 -1
  6. package/dist/chunks/chunk-K75O2CAE.js +31 -0
  7. package/dist/chunks/chunk-UG4262JI.js +502 -0
  8. package/dist/chunks/chunk-VHKAJDQD.js +425 -0
  9. package/dist/chunks/chunk-YMRZBS4G.js +2 -0
  10. package/dist/chunks/{compaction-config-LT5PEXPT.js → compaction-config-3E57ABCT.js} +1 -1
  11. package/dist/chunks/{construction-bootstrap-HBCWJFHC.js → construction-bootstrap-JR63KI5N.js} +4 -4
  12. package/dist/chunks/dist-KWHUKXEL.js +2 -0
  13. package/dist/chunks/dist-XANXEVTU.js +2 -0
  14. package/dist/chunks/utils-HQ2QCKJA.js +2 -0
  15. package/dist/index.d.ts +15 -10
  16. package/dist/index.js +5 -5
  17. package/dist/kodax_cli.js +1084 -1054
  18. package/dist/sdk-agent.d.ts +853 -135
  19. package/dist/sdk-agent.js +1 -1
  20. package/dist/sdk-coding.d.ts +932 -981
  21. package/dist/sdk-coding.js +1 -1
  22. package/dist/sdk-llm.d.ts +8 -5
  23. package/dist/sdk-llm.js +1 -1
  24. package/dist/sdk-mcp.d.ts +17 -0
  25. package/dist/sdk-mcp.js +2 -0
  26. package/dist/sdk-repl.d.ts +343 -10
  27. package/dist/sdk-repl.js +2 -1
  28. package/dist/sdk-session.d.ts +176 -0
  29. package/dist/sdk-session.js +2 -0
  30. package/dist/sdk-skills.d.ts +72 -4
  31. package/dist/sdk-skills.js +1 -1
  32. package/dist/types-chunks/{cost-tracker.d-C4dMlQuV.d.ts → base.d-FUJahC0i.d.ts} +22 -112
  33. package/dist/types-chunks/{bash-prefix-extractor.d-B2iliwdi.d.ts → bash-prefix-extractor.d-DMrGImMl.d.ts} +266 -228
  34. package/dist/types-chunks/capability.d-3C62G8Eq.d.ts +39 -0
  35. package/dist/types-chunks/config.d-BfJUXxC0.d.ts +41 -0
  36. package/dist/types-chunks/cost-tracker.d-wRtyEW9d.d.ts +110 -0
  37. package/dist/types-chunks/{history-cleanup.d-q1vAvCss.d.ts → file-tracker.d-zaLZeNBK.d.ts} +532 -15
  38. package/dist/types-chunks/manager.d-87belpiS.d.ts +370 -0
  39. package/dist/types-chunks/{resolver.d-BwD6TKz7.d.ts → resolver.d-CA68_NeH.d.ts} +150 -5
  40. package/dist/types-chunks/storage.d-DPAEX7zS.d.ts +115 -0
  41. package/dist/types-chunks/{capability.d-BxNgd1-c.d.ts → types.d-B1uGoVTE.d.ts} +72 -40
  42. package/dist/types-chunks/{instance-discovery.d-DZhp77vb.d.ts → types.d-CKJtjo-6.d.ts} +168 -258
  43. package/dist/types-chunks/types.d-mM8vqvhT.d.ts +254 -0
  44. package/dist/types-chunks/{storage.d-Bv9T99Qu.d.ts → utils.d-DkLZD_wa.d.ts} +38 -112
  45. package/package.json +15 -6
  46. package/dist/chunks/chunk-5TFLMGER.js +0 -2
  47. package/dist/chunks/chunk-6QO6HWGU.js +0 -30
  48. package/dist/chunks/chunk-EQ5DGS2W.js +0 -14
  49. package/dist/chunks/chunk-HYWVRTFA.js +0 -1233
  50. package/dist/chunks/chunk-SX2IS5JP.js +0 -16
  51. package/dist/chunks/chunk-ZPJPNLBK.js +0 -462
  52. package/dist/chunks/dist-M57GIWR4.js +0 -2
  53. package/dist/chunks/dist-V3BS2NKB.js +0 -2
  54. package/dist/chunks/utils-FAFUQJ2A.js +0 -2
package/CHANGELOG.md CHANGED
@@ -4,6 +4,122 @@ All notable changes to this project will be documented in this file.
 
  > Full history for versions prior to v0.7.0: [CHANGELOG_ARCHIVE.md](docs/CHANGELOG_ARCHIVE.md)
 
+ ## [0.7.43] - 2026-05-25
+
+ ### Breaking Changes
+
+ - **FEATURE_190 follow-up: `KodaXOrchestrationVerdict.continuationSuggested?: boolean` + `KodaXManagedTaskVerdict.continuationSuggested?: boolean` SDK-visible fields deleted** (commit `3cbe3f68`, Risk 2 cleanup of the FEATURE_190 audit). After FEATURE_190 Phase 3 deleted the `emit_handoff` tool surface and FEATURE_193 retired the V1 Generator role entirely, the `continuationSuggested` derivation in `payload-builder.ts` (`recorder.handoff?.payload.handoff?.status === 'ready' && verdictStatus !== 'accept'`) became unreachable — `recorder.handoff` was never populated post-Phase-3 because no tool existed to populate it, making the field permanently `false` in production. Rather than ship a permanently-false field in the public type surface, the field is deleted in v0.7.43. **Migration**: SDK consumers reading `result.managedTask?.verdict.continuationSuggested` (`KodaXOrchestrationVerdict`) or `KodaXManagedTaskVerdict.continuationSuggested` should switch to reading `result.managedTask?.verdict.disposition === 'needs_continuation'` (the `disposition` field lives on `KodaXOrchestrationVerdict` — note: `result.managedProtocolPayload?.verdict` is the unrelated `KodaXManagedVerdictPayload` type which has `status` / `source` but no `disposition`; if you were reading `continuationSuggested` off that surface you had a type bug — the field never existed there). The Sidecar Verifier owns the continuation decision via `disposition` + `signal` per FEATURE_184 / ADR-030 §F184; the in-tree REPL UI at [`InkREPL.tsx:buildManagedTaskTranscriptItems`](packages/repl/src/ui/InkREPL.tsx) L612-618 already reads the canonical `disposition` field — no UI behavior change. 
The `continuation.json` write-artifact at [`artifacts.ts:writeManagedTaskArtifacts`](packages/coding/src/task-engine/_internal/managed-task/artifacts.ts) preserves its `continuationSuggested: <boolean>` JSON output shape — only the source of the value changes (was `verdict.continuationSuggested || disposition==='needs_continuation' || blocked`; now `disposition==='needs_continuation' || blocked`). External readers of `continuation.json` see no schema change.
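The migration in the Breaking Changes entry can be sketched roughly as follows. This is a minimal illustration, not the SDK's own code: `VerdictLike`, `ResultLike`, and `needsContinuation` are hypothetical local stand-ins, and only the `disposition` field and its `'needs_continuation'` value are taken from the entry.

```typescript
// Hypothetical, simplified stand-ins for the SDK result types. Only the
// `disposition` field and the 'needs_continuation' value come from the entry.
type VerdictLike = { disposition?: string };
type ResultLike = { managedTask?: { verdict: VerdictLike } };

// Before v0.7.43 (the field is now deleted from the public types):
//   const more = result.managedTask?.verdict.continuationSuggested === true;

// After: derive the same intent from the canonical `disposition` field.
function needsContinuation(result: ResultLike): boolean {
  return result.managedTask?.verdict.disposition === 'needs_continuation';
}
```

A result with no `managedTask` simply yields `false`, which mirrors how an absent optional boolean would have read before.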
+
+ ### Fixed
+
+ - **Sidecar Verifier silent disablement on V2 single-Worker chain — latent regression from FEATURE_193 Commit 2** (commit `cc8ce393`, fixed 2026-05-24, regression window: 2026-05-22 `c5d4b829` → 2026-05-23). F193 Commit 2 (`c5d4b829`) flipped entry agent to `chain.worker` for the V2 single-loop but left `currentAgentRoleRef.current` initialised to the V1 `'scout'` sentinel ([`runner-driven.ts:1394`](packages/coding/src/task-engine/runner-driven.ts)). Combined with `Runner.run` NOT firing `onAgentSwitched` for the entry agent on a single-agent chain (proven by [`packages/agent/src/primitives/runner-handoff.test.ts:360-365`](packages/agent/src/primitives/runner-handoff.test.ts)), the ref was stuck at `'scout'` for every V2 run. The verifier gate (`isExecutionRole = currentAgentRoleRef.current === 'worker'`) therefore never opened, silently disabling the entire FEATURE_184 Sidecar Verifier StopHook on every V2 production run despite the `verifier-provider-resolver.ts` invariant "always returns a defined value — the verifier hook is always installed in production". 
**Six downstream features were transitively dead** for the ~24h regression window: (1) Sidecar Verifier accept/revise/blocked three-state verdict; (2) `KodaXResult.success` could never go `false` on blocked outcomes (`verdictStatus` was permanently `undefined` so `success = signal !== 'BLOCKED' && verdictStatus !== 'blocked'` was permanently `true`); (3) FEATURE_076 round-boundary `reshapeToUserConversation` was permanently skipped because `verdict.status='running'` is classified as unconverged at [`round-boundary.ts:48-49`](packages/coding/src/task-engine/_internal/round-boundary.ts) — cross-round transcript reshape + V1-legacy cleanup + synthetic final-assistant-message append all silently no-op'd; (4) REPL `[AMA Verifying]` spinner (FEATURE_184 D.3) never fired; (5) `KODAX_VERIFIER_LOG=1` opt-in observability silently produced zero log lines; (6) `session.jsonl` `verdict.status` field stuck at `'running'` for every V2 session, polluting downstream scorecard aggregates + REPL transcript dumps + resume payloads. **Why no test caught it**: verifier unit tests in `sidecar-verifier/*.test.ts` cover provider resolution + recorder bridge + verifier internals in isolation, never the `runner-driven.ts:composedStopHook` integration; the F184 D.4 Layer 2 eval (100/100 cells PASS) bypassed the runner gate by calling the verifier directly via a driver; and the runner-driven integration assertions at [`runner-driven.test.ts:702`](packages/coding/src/task-engine/runner-driven.test.ts) (`expect(verdict).toBeUndefined()`) + `:773` (`expect(verdict.status).toBe('running')`) had been updated by F193 Commit 13a to hard-code the broken state, pinning the regression in place. **Fix**: 1-line production change at [`runner-driven.ts:1394-1396`](packages/coding/src/task-engine/runner-driven.ts) — initialise `currentAgentRoleRef.current` directly to `'worker'` and narrow the type from `KodaXTaskRole | 'scout' | 'planner'` to `KodaXTaskRole` (V2 chain has no Scout/Planner agents). 
The two assertions at `runner-driven.test.ts:702 + :773` updated to lock in the restored V2 contract (`verdict.source='sidecar'` + `status='accept'` via verifier fail-open `provider_error` trace in the no-API-key test environment; `verdict.status='completed'`). **User-visible behavior change**: every V2 query now triggers a real verifier LLM call on the inherit-main provider after Worker text-only termination (~3-10s tail latency, FEATURE_184 design intent) — users routing around a model floor (e.g. zhipu/glm-5.1 intent-vs-action) can set `KODAX_VERIFIER_PROVIDER` + `KODAX_VERIFIER_MODEL` to redirect verification to a different family. The `[AMA Verifying]` spinner appears for the first time on V2 runs; it does not corrupt the `activeWorkerTitle` state. 8-point completeness audit (Sidecar Verifier composedStopHook / Stall Sidecar F178/F187 / `observer.agentSwitched` / `observer.idleWaiting` / `onScoutSuspiciousCompletion` / `extensionTurnCompleteHook` / type narrowing scope / REPL spinner) all clean — no CRITICAL/HIGH issues introduced. All 55 `runner-driven.test.ts` cases pass (was 53 + 2 fail); full `vitest run packages/coding/src` status unchanged from pre-fix (2857 pass; only pre-existing FEATURE_168 schema parity 6 failures from FEATURE_189 Batch 3 B.1 description drift remain, unrelated).
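To make the failure mode in the Fixed entry concrete, here is a small model of it. It is a sketch under assumptions: `verifierGateOpens` and `computeSuccess` are hypothetical names, the role ref is reduced to a plain object, and the only logic taken from the entry is the gate comparison against `'worker'` and the `success` expression.

```typescript
type Role = 'scout' | 'planner' | 'worker';

// Models the role ref: it starts at `initialRole` and changes only when a
// switch event fires. On a single-agent chain no event fires for the entry
// agent, so `switches` is empty and the initial value is what the gate sees.
function verifierGateOpens(initialRole: Role, switches: Role[]): boolean {
  const ref = { current: initialRole };
  for (const next of switches) ref.current = next; // onAgentSwitched analogue
  return ref.current === 'worker';
}

// The success derivation quoted in the entry: with the gate closed,
// verdictStatus stays undefined and this can never return false on 'blocked'.
function computeSuccess(signal?: string, verdictStatus?: string): boolean {
  return signal !== 'BLOCKED' && verdictStatus !== 'blocked';
}
```

With the old `'scout'` initial value and no switch events the gate stays closed; initialising to `'worker'` opens it without needing any event.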
+
+ ### Behavior Changes
+
+ - **FEATURE_194 follow-up: SDK subpath `/skills` and `/mcp` narrowed from agent-full to capability-subset** (1 commit, shipped 2026-05-24). Post-FEATURE_194 the v0.7.43 ship had a residual issue: `src/sdk-skills.ts` and `src/sdk-mcp.ts` were `export * from '@kodax-ai/agent'`, which made `@kodax-ai/kodax/skills` and `@kodax-ai/kodax/mcp` expose the **entire** agent surface (202 symbols each — including `Runner`, `Agent`, `runFanOut`, `createSessionLineage`, etc.) despite the subpath names implying a narrow capability slice. This commit corrects the leak so each narrow-subset subpath only exposes its named capability's API: `@kodax-ai/kodax/skills` → 26 symbols (= the complete pre-FEATURE_194 `@kodax-ai/skills` standalone package public API: `SkillRegistry` / `SkillExecutor` / `VariableResolver` / `loadFullSkill` / `parseSkillMarkdown` / `expandSkillForLLM` / `discoverSkills` / 19 more); `@kodax-ai/kodax/mcp` → 11 symbols (= complete pre-FEATURE_194 `@kodax-ai/mcp` standalone package public API: `McpCapabilityProvider` / `McpManager` / `McpServerRuntime` / `createMcpTransport` / `searchMcpCatalog` / 6 more). **Total SDK surface unchanged**: 715 unique symbols across all 7 subpaths before and after (deduped union); the removed symbols still exist in `@kodax-ai/kodax/agent`. **What breaks**: code that imported agent framework symbols via the `/skills` or `/mcp` subpath — e.g. `import { Runner } from '@kodax-ai/kodax/skills'`. That was always semantically incorrect (the subpath name promises only skills/mcp APIs) and worked due to the residual leak. **Migration** (one import path swap, no API rename): `import { Runner } from '@kodax-ai/kodax/skills'` → `import { Runner } from '@kodax-ai/kodax/agent'`; similarly for `Agent`, `runFanOut`, `createSessionLineage`, `Tracer`, `McpManager` (when imported via `/skills`), etc. 
Anyone migrating from v0.7.42's `@kodax-ai/skills` or `@kodax-ai/mcp` standalone packages to the v0.7.43 bundled `@kodax-ai/kodax/{skills,mcp}` subpaths sees no symbol-coverage difference — the narrow subset is byte-equivalent to the pre-FEATURE_194 standalone packages' public API. **Bundle impact**: `dist/sdk-skills.js` 7 kB → 1 kB; `dist/sdk-mcp.js` 7 kB → 1 kB; `dist/sdk-skills.d.ts` 130 kB → 19 kB; `dist/sdk-mcp.d.ts` 130 kB → 1.2 kB. **Implementation note**: both files use explicit named re-exports from the `@kodax-ai/agent` top-level barrel (not `export * from '@kodax-ai/agent/capabilities/{skills,mcp}'`) because rollup-plugin-dts does not resolve `package.json#exports` subpaths in this monorepo build — same workaround used by `src/sdk-session.ts` since FEATURE_173. Runtime path uses subpath resolution normally (esbuild handles it). 6-gate verification PASS: G1 tsc clean / G2 vitest 16 baseline (net regression 0) / G3 build:packages + build:bundle + build:dts PASS / G4 surface counts `/skills`=26, `/mcp`=11, `/agent`=202 unchanged / G5 `npm publish --dry-run` PASS / G6 reverse assertion (`Runner` NOT in `/skills` or `/mcp`; `loadFullSkill` IN `/skills`; `McpManager` IN `/mcp`; full set retained in `/agent`). Doc: README + README_CN add "Source-side vs npm-published surface" mapping table explaining the full-package vs narrow-subset subpath roles. See ADR-036 (narrow-subset subpath convention).
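The import-path swap from the Behavior Changes entry could be automated along these lines. This is only a sketch: `targetSubpath` is a hypothetical helper, and `AGENT_ONLY` lists just the symbols the entry names, so it is deliberately partial rather than the full set of moved symbols.

```typescript
// Agent-framework symbols the entry names as no longer reachable through the
// /skills or /mcp subpaths. Partial list, taken only from the entry's examples.
const AGENT_ONLY = new Set(['Runner', 'Agent', 'runFanOut', 'createSessionLineage', 'Tracer']);

// Returns the subpath a symbol should be imported from after v0.7.43.
function targetSubpath(symbol: string, importedFrom: string): string {
  const narrow =
    importedFrom === '@kodax-ai/kodax/skills' || importedFrom === '@kodax-ai/kodax/mcp';
  if (narrow && AGENT_ONLY.has(symbol)) return '@kodax-ai/kodax/agent';
  // McpManager was reachable via /skills only because of the leak.
  if (importedFrom === '@kodax-ai/kodax/skills' && symbol === 'McpManager') {
    return '@kodax-ai/kodax/agent';
  }
  return importedFrom; // capability symbols stay where they are
}
```

For example, `Runner` imported from `/skills` maps to `/agent`, while `loadFullSkill` keeps its `/skills` path.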
+
+ ### Refactored
+
+ - **FEATURE_194 — Package Consolidation: 9 → 4 Workspace Packages (Inline mcp / skills / tracing / session-lineage / repointel-protocol)** (9 commits `b7235f0e` → `ced8a30d` → `801eeae5` → `c1301898` → `1fb0433a` → `7523a5c0` → `324779b4` → `3bb70d1e` → this commit, shipped 2026-05-24). Closes the v0.7.35.1 FEATURE_142 Batch B split + v0.7.36 mcp/skills isolation rationale: grep-verified zero external npm consumers for the 5 single-consumer subpackages (`@kodax-ai/{mcp, skills, tracing, session-lineage, repointel-protocol}`) violated CLAUDE.md "3+ use cases" rule. Real measurement (post Windows-path correction): KodaX coding=66k LoC, total ~132k LoC across 9 packages — larger than pi (96k / 4 packages), not smaller. Consolidation goal is not "less code" but reducing **carrying cost**: 10→4 npm publish cycles, 9→4 build-graph nodes, ~84 cross-pkg import sites collapsed to internal relative paths, IDE jump-to-source friction eliminated, and the latent `@kodax-ai/session-lineage` dep-not-declared bug (agent imported it without declaring in package.json — worked via tsconfig path in monorepo, would break on npm publish) auto-fixed. 
**9 dependency-ordered commits** (hybrid soft-delete for the only MED-risk subpackage): (0) project-kickoff docs (FEATURE_LIST + v0.7.43.md plan + ADR-036); (1) mcp inline to `agent/src/capabilities/mcp/` + `@kodax-ai/agent/capabilities/mcp` subpath export (9→8 packages); (2) skills inline to `agent/src/capabilities/skills/` (including 50 files + `builtin/` builtin skills + `shared/yaml.ts` shared parser) + subpath exports + `copy:builtin` post-build script (8→7 packages); (3) tracing inline to `agent/src/tracing/` (agent self-merge, 8 internal agent imports rewritten + `fflate` + `yaml` deps absorbed) (7→6 packages); (4a) session-lineage inline (MED RISK, 32 files cross compaction critical path; 4 circular-import value-imports through agent barrel rewritten to direct source paths: `countTokens` + `estimateTokens` → `tokenizer.js`, `getAgentConfigPath` → `runtime/agent-home.js`; stub package shell remains); (4b) session-lineage stub delete after reverse-grep verified 0 active imports; (5) repointel-protocol inline to `coding/src/repo-intelligence/protocol.ts` (69 LoC, no stub because risk floor); (6) workspace cleanup (README + README_CN ASCII tree + dependency graph + Package Overview table + SDK subpath JSDoc + agent/coding README + 1 example import path); (7) this commit: docs finalize + CHANGELOG entry + ADR-036 status + FEATURE_LIST status + memory update. **Target structure achieved**: `@kodax-ai/{llm 7.3k, agent 20.8k (absorbed mcp + skills + tracing + session-lineage), coding 66.4k (absorbed repointel-protocol), repl 37.7k}`, i.e. 4 packages aligned with pi count. **Public API**: subpath exports `@kodax-ai/agent/session-lineage`, `@kodax-ai/agent/capabilities/mcp`, `@kodax-ai/agent/capabilities/skills`, `@kodax-ai/agent/capabilities/skills/shared/yaml`, `@kodax-ai/agent/tracing` preserve all consumer-visible symbols; top-level `export * from './capabilities/...js'` etc. on agent's `index.ts` provide barrel-compat. 
The `REPOINTEL_DEFAULT_ENDPOINT` re-export from `@kodax-ai/coding` is preserved (was direct re-export from `@kodax-ai/repointel-protocol` pre-F194). **Breaking on direct npm consumers** (none known, but in case external dep): replace `@kodax-ai/mcp` → `@kodax-ai/agent/capabilities/mcp`; `@kodax-ai/skills` → `@kodax-ai/agent/capabilities/skills`; `@kodax-ai/tracing` → `@kodax-ai/agent/tracing`; `@kodax-ai/session-lineage` → `@kodax-ai/agent/session-lineage`; `@kodax-ai/repointel-protocol` → `@kodax-ai/coding` (REPOINTEL_DEFAULT_ENDPOINT top-level re-export) or `@kodax-ai/coding` internal (`./repo-intelligence/protocol.js`). 5-gate verification per commit: G1 `npx tsc --noEmit` clean, G2 vitest full-suite (16 baseline failures stable across all 7 implementation commits — 11 pre-existing FEATURE_114 scout-drift + FEATURE_168 schema parity + kodax_cli + tracker-consistency + extension-runtime; 5 concurrent-thread FEATURE_195 InkREPL WIP unrelated to this feature; net regression from F194 = 0), G3 `npm run build:packages` PASS, G4 API surface diff against baseline-exports snapshots (agent 155→202 +47 session-lineage symbols; coding 342 stable + REPOINTEL_DEFAULT_ENDPOINT preserved), G5 smoke imports of all subpaths + top-level barrels. **Zero prompt eval cost** ($0) — pure structural refactor, no LLM-facing behavior change. **Concurrent-thread safety** per [[feedback_concurrent_thread_git_race]]: every commit used explicit `git add` file lists (never `-A`) to avoid staging concurrent FEATURE_195 InkREPL WIP in the same repo. Design doc: [docs/features/v0.7.43.md#feature_194-package-consolidation](docs/features/v0.7.43.md#feature_194-package-consolidation--inline-mcp--skills--tracing--session-lineage--repointel-protocol-subpackages--9--4-workspace-packages). ADR: [ADR-036 Package Consolidation](docs/ADR.md#adr-036-package-consolidation--inline-single-consumer-subpackages-into-agent-feature_194-v0743).
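For direct consumers of the five retired packages, the specifier replacements listed in the FEATURE_194 entry amount to a lookup table. The sketch below assumes exact-match specifiers only; `migrateSpecifier` is a hypothetical helper, and deep imports into the old packages are left untouched because the entry does not describe them.

```typescript
// Old package specifier -> replacement, as listed in the FEATURE_194 entry.
const REPLACEMENTS = new Map<string, string>([
  ['@kodax-ai/mcp', '@kodax-ai/agent/capabilities/mcp'],
  ['@kodax-ai/skills', '@kodax-ai/agent/capabilities/skills'],
  ['@kodax-ai/tracing', '@kodax-ai/agent/tracing'],
  ['@kodax-ai/session-lineage', '@kodax-ai/agent/session-lineage'],
  ['@kodax-ai/repointel-protocol', '@kodax-ai/coding'],
]);

// Exact-match rewrite; anything not in the table is returned unchanged.
function migrateSpecifier(spec: string): string {
  return REPLACEMENTS.get(spec) ?? spec;
}
```

A codemod would call this on each import specifier string it finds; specifiers that already point at the consolidated packages pass through as-is.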
+
+ ### Added
+
+ - **FEATURE-SDK-MODEL-CAPS — Expose Per-Model Capabilities Without API Key** (2 commits `7f627d0c` feat + `c37b0a13` fix, shipped 2026-05-25). SDK consumers (KodaX Space etc.) need to list providers + their models with context-window / reasoning info in popout UIs — but the pre-v0.7.43 path forced instantiation of each `KodaXProvider` class, which throws on missing API key. Static metadata was hidden behind runtime credentials, an architectural mismatch — that data is KodaX-maintained, not negotiated with the upstream. **Fix** (2-part): (1) Promote capability metadata (`contextWindow` / `maxOutputTokens` / `thinkingBudgetCap` / `supportsThinking` / full `KodaXModelDescriptor[]`) from per-Provider `class.config` field initializers UP to the existing `KODAX_PROVIDER_SNAPSHOTS` constant — Provider classes now derive their runtime `config` from the snapshot via `buildProviderConfig` (single source of truth, no drift risk; net −160 lines duplication / +30 lines metadata / byte-equivalent runtime behavior). (2) Add 9 new SDK exports reading directly from the snapshot (zero API keys touched): built-in `getProviderModelDescriptors` / `getModelCapabilities` / `listBuiltinModelCapabilities`; custom (from `~/.kodax/config.json#customProviders`) `getCustomProviderModelDescriptors` / `getCustomModelCapabilities` / `listCustomProviderModelCapabilities`; unified dispatchers `resolveProviderModelDescriptors` / `resolveModelCapabilities` / `listAllModelCapabilities`. New public type `KodaXModelCapabilities` exposed from `@kodax-ai/kodax/llm`. **`maxOutputTokens` rationale** (fix commit `c37b0a13`): the field IS reliable — it's the KodaX-side per-turn `max_tokens` request decision (bench-validated against kill-windows / decode-rate / cost-per-turn), NOT the upstream "theoretical maximum" (which is often inflated or absent — zhipu-coding / kimi-code / minimax-coding / ark-coding / deepseek `/v1/models` returns `{id, object, owned_by, created}` only). 
Embedders showing "expected output size" should use this value; theoretical ceilings should be looked up from the upstream provider's own docs. **Maintainer-probe scripts shipped**: `scripts/probe-upstream-model-metadata.mjs` (re-run periodically to detect upstream API improvements) + `scripts/probe-ark-tokens.mjs` (Ark-specific drill-down). **Tests**: `packages/llm/src/providers/model-capabilities.test.ts` 20/20 ✓ (no-API-key verification clears 6 env vars during assertion; snapshot drift guard asserts every `supportsThinking` provider declares `contextWindow` + every `models[]` entry is a descriptor object); full llm suite 304/304 ✓. **Bundle impact**: `dist/sdk-llm.d.ts` +1.2 kB (new types + symbols); `dist/sdk-llm.js` +400 bytes. **Architectural debt followup**: `KODAX_PROVIDER_SNAPSHOTS` still TS const compiled into bundle; capability data update path still needs `npm publish` + consumer `npm update`. FEATURE_198 (filed for v0.7.44) splits the snapshot to JSON + runtime loader for dist-patch-time updates (hot-update over network deferred to v0.7.46+). Docs: [`SDK_EMBEDDER_GUIDE.md §9`](docs/SDK_EMBEDDER_GUIDE.md#9-querying-per-model-capabilities-without-api-keys).
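The idea behind FEATURE-SDK-MODEL-CAPS (static capability metadata read from a snapshot, with no provider instance and no credentials) can be modelled like this. Everything here is illustrative: the real signatures of `getModelCapabilities` and `KODAX_PROVIDER_SNAPSHOTS` are not shown in this changelog, so `CapabilitiesSketch`, `SNAPSHOT_SKETCH`, `lookupCapabilities`, and the provider, model, and numeric values are all made-up placeholders.

```typescript
// Hypothetical shape using only the field names mentioned in the entry.
interface CapabilitiesSketch {
  contextWindow: number;
  maxOutputTokens: number;
  supportsThinking: boolean;
  thinkingBudgetCap?: number;
}

// Stand-in for a static snapshot. Names and numbers are placeholders, not
// values from any real provider.
const SNAPSHOT_SKETCH = new Map<string, Map<string, CapabilitiesSketch>>([
  ['example-provider', new Map([
    ['example-model', { contextWindow: 1000, maxOutputTokens: 100, supportsThinking: false }],
  ])],
]);

// Pure lookup: no provider class is constructed, so no API key is required.
function lookupCapabilities(provider: string, model: string): CapabilitiesSketch | undefined {
  return SNAPSHOT_SKETCH.get(provider)?.get(model);
}
```

The design point is that the lookup never touches runtime credentials; an unknown provider or model just returns `undefined`.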
+ - **FEATURE_197 - Read-Only Markdown Agent Discovery: `discoverMarkdownAgents` SDK API (FEATURE_191 follow-up)** (1 commit, shipped 2026-05-24). KodaX Space (an SDK consumer) reported on 2026-05-24 that the F191 `loadAgentsFromMarkdown` triggers admission plus a global-registry registration side effect. They want to build an "agent picker" UI (users preview the existing markdown agents, then selectively activate them), and the existing loader's shape does not fit. `listConstructedAgentsWithSource()` can technically list agents but is marked `@internal` ([`agent-resolver.ts:159-164`](packages/coding/src/construction/agent-resolver.ts#L159-L164) explicitly says "NOT yet a stable SDK surface; embedders SHOULD continue using `listConstructedAgents()`"), so it cannot be offered to SDK consumers. F035 `discoverSkills(root?, opts?)` has a pure read-only shape; F191 lacking a read-only counterpart is an SDK surface design gap. **Fix**: extract a shared `parseMarkdownAgentFile(filePath)` helper (the loader and discover share one parser; loader behavior is byte-identical) and add the public API `discoverMarkdownAgents(opts): Promise<{agents: DiscoveredMarkdownAgent[], failed: MarkdownLoadFailure[]}>`. It scans the same two-tier path (user → project) and returns metadata `{name, description, source: 'markdown:user' | 'markdown:project', path, tools?, model?}` with **zero admission / zero registration / zero global registry mutation**. Last-write-wins matches the loader (a project agent shadows a same-named user agent). The `tools` field returns raw names without the `builtin:` prefix (discovery exposes the form the user wrote; the ref-prefix logic moves into the loader as an inline `.map(ref:)`). **Validation boundary**: discover does not validate admission (unknown tool refs and handoff cycles are both surfaced); admission is still enforced as the backstop in `loadAgentsFromMarkdown`, matching how F035 `discoverSkills` does not validate skill admission. **Tests**: all 13 existing F191 loader tests pass (the parser extraction changes no behavior) + 15 new F197 unit tests cover empty/missing-frontmatter/missing-name/missing-description/empty-body/project-shadows-user/tools-array/tools-csv/model-passthrough/admission-not-validated/loader-roundtrip-parity; **read-only hard-contract** assertions (`listConstructedAgents().length` unchanged before and after discover + `resolveConstructedAgent(name)` still `undefined` after discover) lock in the "discover must not accidentally register" boundary. **Round-trip parity** asserts `discover.agents.length === loader.loaded` + equal failed-path sets + equal name sets; the shared parser guarantees that the set an SDK consumer decides on from the discover preview matches the set the loader finally activates. **Public surface**: `discoverMarkdownAgents` + `DiscoveredMarkdownAgent` + `DiscoverMarkdownAgentsResult` are re-exported from `@kodax-ai/coding` all the way up to `@kodax-ai/kodax` + the `@kodax-ai/kodax/coding` subpath. **Eval $0**: pure file-system + YAML parse, no LLM-facing change. 28/28 tests pass, tsc clean. See [v0.7.43.md §FEATURE_197](docs/features/v0.7.43.md#feature_197-read-only-markdown-agent-discovery--discovermarkdownagents-sdk-apif191-follow-up).
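The two-tier, last-write-wins precedence that FEATURE_197 shares with the loader can be illustrated with a small merge function. This is a sketch under assumptions: `DiscoveredSketch` and `mergeTiers` are hypothetical, the file paths are invented for the example, and the real API performs file scanning and parsing that is omitted here.

```typescript
type Source = 'markdown:user' | 'markdown:project';

// Hypothetical subset of the discovered-agent metadata named in the entry.
interface DiscoveredSketch {
  name: string;
  description: string;
  source: Source;
  path: string;
}

// User-tier agents are applied first, project-tier second, so a project agent
// with the same name shadows the user one (last write wins). Nothing is
// registered anywhere; the function only returns data.
function mergeTiers(user: DiscoveredSketch[], project: DiscoveredSketch[]): DiscoveredSketch[] {
  const byName = new Map<string, DiscoveredSketch>();
  for (const agent of user.concat(project)) byName.set(agent.name, agent);
  return Array.from(byName.values());
}
```

An agent picker UI could render the merged list directly and call the loader only for the entries the user chooses to activate.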
+ - **FEATURE_195 - Sidecar Verifier UI Silent Accept: Default-Hide Accept Verdict Evidence Entry + Transcript-Mode Opt-In** (1 commit `1b53150e`, shipped 2026-05-24). A screenshot from a user's real session on 2026-05-24 (a "你好 → 你好!" exchange, i.e. "hello → hello!") showed the sidecar verifier accept verdict's `reason` text rendered into the transcript as a `> [Evaluator] ...` event-item, departing from the FEATURE_184 (v0.7.42, ADR-030) "silent accept" design intent (an accept verdict should go only to session.jsonl + artifact, with the UI showing only the `[AMA Verifying]` spinner). The 3-step pipeline leaked the silent verdict to the UI layer: (a) [`verifier-recorder-bridge.ts:89-104`](packages/coding/src/agent-runtime/middleware/sidecar-verifier/verifier-recorder-bridge.ts#L89) writes `role:'evaluator'` into the recorder for historical backward-compat; (b) [`payload-builder.ts:249-298`](packages/coding/src/task-engine/_internal/managed-task/payload-builder.ts#L249) carries the recorder into evidence.entries; (c) [`InkREPL.tsx:574-624`](packages/repl/src/ui/InkREPL.tsx#L574) `buildManagedTaskTranscriptItems` renders every evidence.entries item as an event-item indiscriminately. **Fix**: a single-commit REPL render filter, consisting of a `shouldFilterSidecarAcceptEntry(entry, verifierLog)` helper + an extended `buildManagedTaskTranscriptItems(result, options?: { verifierLog?: boolean })`; filter rule `role==='evaluator' AND signal==='COMPLETE' AND !verifierLog ⇒ filter`; revise/blocked verdicts fall through naturally because their signal is not `'COMPLETE'`. The default reads `process.env.KODAX_VERIFIER_LOG === '1'` (reusing the existing F184 Phase D.3 env var); the config entry point also supports `verifierLog: true` in `~/.kodax/config.json`. **Zero data-layer changes**: `recorder.verdict` is still written to session.jsonl + artifact, so replay / debug / scorecard / `kodax sessions` resume all remain complete. **Tests**: 8 new unit tests cover 4 verdict states (accept-no-userAnswer / accept-with-userAnswer / revise / blocked) × 2 modes (default / verifierLog=true). **Root cause refinement during impl**: the kickoff doc assumed H0_DIRECT trivial-chat has `decidedByAssignmentId='evaluator'`, but in production the `payload-builder.ts:218-219` ternary `harness === 'H0_DIRECT' ? 'direct' : verdictStatus ? 'evaluator' : 'worker'` makes H0_DIRECT resolve to `direct` (highest priority); all fixtures now use `direct` to match the production path. **Eval $0**: no LLM-facing prompt change; the UI render filter is deterministic, so unit-test coverage is sufficient. **Concurrent-thread safety**: 0 file overlap with F194 (which changes `packages/{mcp,skills,tracing,session-lineage}`); atomic stage + commit + push in the same Bash call per `feedback_concurrent_thread_git_race`. See [v0.7.43.md §FEATURE_195](docs/features/v0.7.43.md#feature_195-sidecar-verifier-ui-silent-accept--default-hide-accept-verdict-evidence-entry--transcript-mode-opt-in) + ADR-030 §F195/F196 cross-reference.
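The FEATURE_195 filter rule is a three-way conjunction, which a few lines can capture. This is a sketch, not the shipped helper: `EvidenceEntrySketch`, `shouldFilterSketch`, and `visibleEntries` are hypothetical names, and entries are reduced to the two fields the rule reads.

```typescript
// Hypothetical, reduced evidence entry: only the fields the rule inspects.
interface EvidenceEntrySketch {
  role: string;
  signal?: string;
}

// Rule from the entry: role==='evaluator' AND signal==='COMPLETE' AND
// !verifierLog => filter. Anything else falls through and is rendered.
function shouldFilterSketch(entry: EvidenceEntrySketch, verifierLog: boolean): boolean {
  return entry.role === 'evaluator' && entry.signal === 'COMPLETE' && !verifierLog;
}

// Applying the rule to a list keeps everything except silent accepts.
function visibleEntries(entries: EvidenceEntrySketch[], verifierLog: boolean): EvidenceEntrySketch[] {
  return entries.filter((entry) => !shouldFilterSketch(entry, verifierLog));
}
```

Because the filter sits purely at render time, the underlying entries are untouched, which matches the entry's point that the data layer does not change.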
+ - **FEATURE_196 - Sidecar Verifier Content-Aware Fire Gate: Action-Surface Detector + Conversational User-Intent Skip** (4 commits `10b8b290` → `c25ff99c` → `af7bc588` → this commit, shipped 2026-05-24). FEATURE_184 (v0.7.42, ADR-030) fires the sidecar verifier indiscriminately on Worker text-only termination, so even a trivial chat such as "你好" ("hello") with zero action surface costs 3-10s plus an LLM call. F184 was designed to catch the zhipu intent-vs-action floor (the Worker says "understood, I'll use todo_create..." but never actually calls the tool), not to review trivial-chat content; trivial chat has no "claimed completion" surface to verify. F196 adds a deterministic pre-gate `composeGateDecision(ctx, process.env)` in the [`runner-driven.ts`](packages/coding/src/task-engine/runner-driven.ts) `composedStopHook` `!isIdleYieldTurn` branch, before `observer.sidecarStarted()`; when `fire===false` it returns `extensionTurnCompleteHook(ctx)` directly without entering the sidecar; the F184 fire path stays byte-identical. **Gate logic** (new module [`packages/coding/src/agent-runtime/middleware/sidecar-verifier/gate.ts`](packages/coding/src/agent-runtime/middleware/sidecar-verifier/gate.ts) ~213 LoC): (1) Layer 1 `detectActionSurface`: checks whether the last assistant message has a `tool_use` content block and fires if so (action-surface); (2) Layer 2 `detectConversationalIntent`: greeting-prefix regex (Chinese and English + common punctuation 👋 🙏) AND length ≤ 20 codepoints AND no imperative verb (single-character Chinese verbs 查/写/修/改/删/搜... + multi-character Chinese + English imperatives); when all three conjuncts hold, skip (conversational); (3) escape hatch `KODAX_VERIFIER_ALWAYS=1` forces fire; (4) default fire (fail conservative: the cost of one F184 run < the cost of missing the zhipu floor). `KODAX_VERIFIER_LOG=1` emits `[sidecar-gate] {fire|skip}: <reason>` on stderr, reusing the F195 env var. **Tests**: 23 unit (`gate.test.ts`: 6 actionSurface + 11 conversationalIntent + 6 composeGateDecision) + 3 integration (`runner-driven.test.ts` FEATURE_196 describe block: trivial-greeting skip / mutation-tool fire / imperative+zero-action fire) all pass. **Layer 2 eval: SHIP gate ALL EXCEEDED** (4 case × 5 canonical alias × 1 run = 60 panel cells + pilot 12 cells): (a) C1 greeting skip 5/5 alias **100%** (≥95% kickoff threshold) / (b) C2 imperative fire 5/5 alias **100%** (≥95%) / (c) C3 long-message fire 5/5 alias **100%** (=100%) / (d) C4 no-greeting fire 5/5 alias **100%** (=100%) / (e) 5/5 alias meet (a)+(b) → **SHIP**. Eval cost **~$2 actual vs $10-15 budget** (under-spend ~8×): the gate logic is deterministic (`composeGateDecision` is pure function), so the Layer 1 unit tests are authoritative; Layer 2 scope is narrowed to tuple realism only ("do real Worker LLM outputs across 5 provider families produce `KodaXContentBlock[]` shapes that `lastAssistantHasToolUse` detector handles?" + "do real model families respond to canonical user-message inputs with response patterns case categories assume?"). **3-judge audit skipped** per EVAL_GUIDELINES.md §Layer 1 justification: the per-cell gate decision is the strict equality `actualDecision === c.expectedDecision`, with no room for LLM ambiguity; a 3-judge majority suits LLM-judge scenarios, not a deterministic gate eval (a spot-check of 6 raw-text rows is recorded in the commit-3 message). **Eval drivers retained as permanent regression sweep**: `tests/feature-196-sidecar-content-gate.eval.ts` + `benchmark/datasets/feature-196-sidecar-content-gate/cases.ts` are checked into the repo; raw dumps stay in `<tmpdir>/kodax-eval-dumps/feature-196-sidecar-content-gate/` per `feedback_eval_dumps_stay_in_temp` and are not checked in; mkdirSync per flush survives the Windows tmpdir race per `feedback_audit_dump_dir_vanishes`. **Behavior change for users**: trivial chat (greeting + zero tool calls + ≤20 codepoints) incurs no sidecar latency (saves the 3-10s tail + LLM cost); imperative + zero-action (zhipu intent-vs-action floor) still fires, preserving the F184 contract; mutation + worker tool_use still fires; the `KODAX_VERIFIER_ALWAYS=1` env opt-back-in forces fire (debug / audit). See [v0.7.43.md §FEATURE_196](docs/features/v0.7.43.md#feature_196-sidecar-verifier-content-aware-gate--action-surface-detector--conversational-user-intent-skip) + ADR-030 §F195/F196 cross-reference.
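The FEATURE_196 gate layers can be sketched as a pure function. Treat this as an approximation under stated assumptions: `gateSketch`, `GateInput`, and both regexes are hypothetical and far smaller than whatever the shipped `gate.ts` uses, and checking the escape hatch first is an ordering assumption, since the entry lists the four rules without stating their evaluation order.

```typescript
interface GateInput {
  lastAssistantHasToolUse: boolean; // Layer 1 input
  userMessage: string;              // Layer 2 input
}
interface GateDecision { fire: boolean; reason: string }

// Illustrative patterns only; the shipped greeting/imperative lists are larger.
const GREETING = /^(hi|hello|hey|你好|您好)/i;
const IMPERATIVE = /(查|写|修|改|删|搜)|\b(fix|write|search|delete)\b/i;

function gateSketch(input: GateInput, env: Record<string, string | undefined>): GateDecision {
  // Escape hatch: force the verifier to run (ordering is an assumption).
  if (env['KODAX_VERIFIER_ALWAYS'] === '1') return { fire: true, reason: 'forced' };
  // Layer 1: a tool_use block means there is an action surface to verify.
  if (input.lastAssistantHasToolUse) return { fire: true, reason: 'action-surface' };
  // Layer 2: greeting prefix AND <= 20 codepoints AND no imperative verb.
  const text = input.userMessage.trim();
  const codepoints = Array.from(text).length; // counts codepoints, not UTF-16 units
  const conversational = GREETING.test(text) && codepoints <= 20 && !IMPERATIVE.test(text);
  if (conversational) return { fire: false, reason: 'conversational' };
  // Default: fire, failing on the conservative side.
  return { fire: true, reason: 'default' };
}
```

Passing `env` as a parameter instead of reading a global keeps the function pure, which is what makes the deterministic unit tests described in the entry sufficient.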
+ - **FEATURE_191 — User-Authored Custom Agents (Markdown Loader + Extension `registerAgent` + `dispatch_child_task` Bridge)** (10 commits 2026-05-23, supersedes v0.7.50 FEATURE_128 placeholder). Closes a 3-gap stack: (a) Worker had no way to dispatch a registered specialist; (b) users couldn't author agents in markdown; (c) extension API lacked `registerAgent`. Same-version closure because the three depend on a shared `(name, AgentContent)` → `buildAdmissionManifest` → `Runner.admit` → `registerConstructedAgent` pipeline. **Phase A — dispatch bridge**: `dispatch_child_task.subagent_type?: string` schema field; `KodaXChildContextBundle.specialistName?` carrier; `AgentContent.description?` + `Agent.description?` glue field; `dispatch-child-tasks.ts` unknown-name guard + write-role gate (rejects specialist-write dispatched from non-Worker/Generator role); `child-executor.ts:resolveSpecialistOverride` computes systemPromptOverride (= specialist instructions verbatim) + complementary excludeTools (`allTools - specialistTools`); `prompts/capability-sections.ts:buildSpecialistAgentsBlock` injects `=== Available specialist agents ===` SP section when registry non-empty; `worker-role-prompt.ts:dispatchRules` appends ADR-033-compliant SPECIALIST ROUTING bullet (qualitative, no enumeration, no ✗, no FEATURE_xxx). **Phase B — markdown loader**: new `construction/markdown-loader.ts` scans `~/.kodax/agents/*.md` then `<cwd>/.kodax/agents/*.md` with last-write-wins precedence (project shadows user); uses `parseYamlFrontmatter` from `@kodax-ai/skills/shared/yaml` (repo canonical, NOT gray-matter); tolerant `tools` field accepts YAML array or comma-separated string; ignores `mcpServers`/`hooks`/`memory`/`isolation`/`permissionMode`/`maxTurns`/`skills` for forward-compat. 
`ConstructedAgentRegistration.source?` field tracks 6-value provenance enum (`'built-in' | 'extension' | 'markdown:user' | 'markdown:project' | 'constructed:cli' | 'constructed:llm'`); REPL boot calls `loadAgentsFromMarkdown(cwd)` after `rehydrateActiveArtifacts` so resolver is populated for cross-agent handoff validation. **Phase C — extension API**: `KodaXExtensionAPI.registerAgent(name, content): Promise<() => void>` adapts caller-friendly `(name, AgentContent)` to manifest; throws on admission rejection; auto-unregisters via `LoadedExtensionRecord.disposables` reverse-iterate. **Tests**: 18 agent-resolver + 13 markdown-loader + 4 bootstrap + 19 extension-runtime + 4 cap-095 contract (including new CAP-CHILD-EXEC-004 specialist branch) + 6 specialist tests in child-executor.test.ts + 6 dispatch-child-tasks specialist tests, all green; `tsc --noEmit` clean across coding + repl. **Eval (actual run, 2026-05-23)**: 5-alias canonical panel × 4 case × 5 runs = 100 cells (~$3) + 3-judge majority audit (zhipu/glm51 + ark/v4pro + kimi, panel-internal — NEVER anthropic/openai per EVAL_GUIDELINES; ~$2 / 300 calls). Audit disagreement 5.0% → **DATA VALID** per anti-pattern 7 §3. Pre-registered SHIP gate strict result: (a) C1 dispatch ≥60% per alias **3/5 met** (kimi 80% ✓, ark/v4{flash,pro} 60% borderline ✓, zhipu 20% ✗, mmx 20% ✗) / (b) C3 false-name ≤10% **0% across all 5 alias** ✓ PERFECT / (c) C4 multi-candidate ≥50% **1/5 met** (kimi 60% ✓ only) / (d) audit disagreement ≤10% **5.0%** ✓ / (e) 4-of-5 strict gate **FAIL** on (a)+(c). 
**SHIP with evidence-driven override** per [`feedback_pre_registered_gate_saturation`](memory/feedback_pre_registered_gate_saturation.md): (1) baseline = 0% by construction (pre-F191 SP has no specialist block + schema field missing); every C1/C4 PASS is new behavior (net +21 PASS, no regression); (2) C2+C3 negative cases each 25/25 — SP does NOT introduce false-positive dispatches nor name fabrication (safety property load-bearing + satisfied); (3) C1/C4 under-trigger is single-turn-probe ceiling + zhipu intent-vs-action floor + kimi narration-loop, structurally model-side not prompt-tunable per [`feedback_model_structural_floor_not_prompt_tunable`](memory/feedback_model_structural_floor_not_prompt_tunable.md); production is multi-turn (narrate→tool naturally splits across rounds). Pilot pre-scale uncovered regex false-negative (`subagent_type: name` no-quote YAML form) → fixed to 5-syntax matrix per anti-pattern 7 §4 before panel run. Eval drivers retained as permanent regression sweep; v0.7.44 follow-up to investigate multi-turn-friendly Layer 3 eval design. Test guide: [docs/test-guides/FEATURE_191_v0.7.43_TEST_GUIDE.md](docs/test-guides/FEATURE_191_v0.7.43_TEST_GUIDE.md). Design doc: [docs/features/v0.7.43.md#feature_191-user-authored-custom-agents--markdown-loader--extension-registeragent--dispatch_child_task-bridge](docs/features/v0.7.43.md#feature_191-user-authored-custom-agents--markdown-loader--extension-registeragent--dispatch_child_task-bridge).
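The loader rules in Phase A and Phase B (tolerant `tools` field, user-then-project precedence, complementary `excludeTools`) can be illustrated as follows. This is a sketch only: `AgentDef`, `normalizeTools`, `mergeAgents`, and `excludeToolsFor` are hypothetical helpers, not the functions in `construction/markdown-loader.ts` or `child-executor.ts`.

```typescript
// Hypothetical shape of a parsed agent file; only the `tools` tolerance and the
// user-then-project precedence are taken from the changelog entry.
interface AgentDef {
  name: string;
  source: "markdown:user" | "markdown:project";
  tools: string[];
}

// Accept either a YAML array or a comma-separated string for `tools`.
function normalizeTools(raw: unknown): string[] {
  if (Array.isArray(raw)) return raw.map((t) => String(t).trim()).filter(Boolean);
  if (typeof raw === "string") return raw.split(",").map((t) => t.trim()).filter(Boolean);
  return [];
}

// Later entries overwrite earlier ones, so project definitions shadow user ones.
function mergeAgents(userAgents: AgentDef[], projectAgents: AgentDef[]): Map<string, AgentDef> {
  const registry = new Map<string, AgentDef>();
  for (const agent of [...userAgents, ...projectAgents]) registry.set(agent.name, agent);
  return registry;
}

// Complementary exclusion: every tool the specialist did not list is excluded.
function excludeToolsFor(allTools: string[], specialistTools: string[]): string[] {
  const allowed = new Set(specialistTools);
  return allTools.filter((tool) => !allowed.has(tool));
}
```

Expressing the specialist's tool set as an exclusion list means tools registered later are visible to a specialist only if it names them explicitly.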
32
+ - **FEATURE_193 — V1 Chain Full Retirement (Scout/Planner/Generator Chain Agents + Entry Routing + V1 Emit Tools)** (6 commits `9fb07d67` → `c5d4b829` → `dcac55ea` → `ef82e99c` → `c556d46d` → this commit, shipped 2026-05-23). Closes the V1 harness deprecation tail: FEATURE_114 (v0.7.36) introduced the V2 Worker single-loop as a `KODAX_HARNESS_V2=true` opt-in path, v0.7.38 Slice 7 flipped V2 to the default, FEATURE_184 (v0.7.42) retired the in-chain Evaluator, FEATURE_190 (v0.7.43) deleted `emit_handoff`. F193 finishes the cleanup by deleting the V1 Scout/Planner/Generator chain agents themselves + their role prompts + their emit tools + the V1 entry-routing branch in the runner + the `KODAX_HARNESS_V2` flag. **6 dependency-ordered commits**: (1) `9fb07d67` V1 test surface deletion (10 files, ~50 tests deleted + 19 cross-cutting tests migrated to Worker handler, −2577 LoC, test-only no production-behavior risk); (2) `c5d4b829` runner-driven.ts entry routing simplification (`entryAgent = chain.worker` unconditional; L776 `initialHarness` always `'PLANNED'`) + `isHarnessV2Enabled()` deleted + V1 branches in `verdict-recorder.ts` (L332/L482) + `observer-bridge.ts` (L353) collapsed; (3) `dcac55ea` V1 chain agent declarations deleted from `agent-chain.ts` (chain.scout/.planner/.generator + their handoff arrays + helpers) + `coding-agents.ts` slimmed to `CODING_AGENT_MARKER` only + `task-engine-agents.ts` retains name constants for verdict-recorder routing/session-id compat (workerAgent only declarative Agent) + `buildRunnerScoutAgent` deleted; (4) `ef82e99c` V1 role prompts deleted from `role-prompt.ts` (createRolePrompt switch loses scout/planner/generator cases, ~548 LoC) + `role-prompts.ts` (SCOUT/PLANNER/GENERATOR_INSTRUCTIONS_FALLBACK) + `protocol-emitters.ts` (`emitScoutVerdict` / `emitContract` / `EMIT_SCOUT_VERDICT_TOOL_NAME` / `EMIT_CONTRACT_TOOL_NAME` deleted; PROTOCOL_EMITTER_TOOLS shrinks 3→1) + `parse-helpers.ts` (scout/planner cases in 
`getEmitToolNameForRole`) + `tool-permission.ts` (V1 emit→subagent cases) + `tool-policy.ts` + `role-exclude.ts` + entire `scope-aware-harness-guardrail.ts` module deleted (was V1-specific Scout H0/H1/H2 miscalibration detection); (5) `c556d46d` SDK barrel re-exports trimmed (V1 emit names + emitter functions removed from `coding/src/index.ts` + `coding/src/agents/index.ts`) + 8 V1 eval files archived to `tests/_archive/` (`ama-harness-selection*` 3 files + `eval-scout-*` 2 files + `feature-097-*` 2 files + `feature-114-scout-trivial-exemption.eval.ts`) + ADR-030 V1 retirement cross-reference; (6) this commit — post-review dead-code residual cleanup: `agent-chain.ts` `scoutDispatch` + `generatorDispatch` deletion (declared but never consumed after V1 chain agent removal), `verdict-recorder.ts` `wrapEmitterWithRecorder` slot type narrowed from union to `'verdict'` literal + dead `slot === 'scout'` / `'contract'` / `'handoff'` branches removed (scout todoStore seeding, contract replan-seed, scout pre-handoff write warning, scout budget-cap upgrade, `applyScoutDecisionToPlanRunner` propagation, multi-slot summary fallback) + unused imports pruned (`applyScoutDecisionToPlanRunner`, `BUDGET_CAP_BY_HARNESS`, `emitResilienceDebug`, `ManagedMutationTracker`) + `child-executor.ts` `validateWriteBundles` allow-list comment updated to clarify legacy `generator` + `H2_PLAN_EXECUTE_EVAL` parity branches survive only for test-surface continuity (production V2 Worker uses `tool-dispatch`). Test scope: 30 child-executor + 128 runner-driven/todo-store + 168 task-engine/_internal/managed-task tests all green; tsc clean. ~−139 net LoC across 3 files. **Aggregate impact**: ~−4500 LoC net deletion across ~30 files. **Zero runtime behavior change on V2 paths**: `KODAX_HARNESS_V2=true` route is byte-identical to pre-F193 (V2 is the only active route). 
The `KODAX_HARNESS_V2=false` env opt-out is silently ignored — won't break user shell configs but no longer routes through V1 (V1 deleted). V1 type union members (`harnessProfile: 'H0_DIRECT' | 'H1_EXECUTE_EVAL' | 'H2_PLAN_EXECUTE_EVAL' | 'PLANNED'`, `roleAssignments[].role: 'scout' | 'planner' | 'generator' | 'direct' | 'worker' | 'evaluator'`, `harnessTransitions`) survive as pre-1.0 SDK-surface vestigial fields — they're harmless (the runner no longer populates them on V1 values, so they become unreachable from runtime, but external SDK type consumers that destructure them aren't broken). Removal deferred to a future major-bump scope review. **Zero eval cost**: V1 is dead code; deletion needs no LLM judge. Pre-existing FEATURE_168 schema parity failures (6 tests) are unrelated F189 B.1 drift and remain — fix scheduled separately. Design doc: [docs/features/v0.7.43.md#feature_193-v1-chain-full-retirement](docs/features/v0.7.43.md#feature_193-v1-chain-full-retirement--scoutplannergenerator-chain-agents--entry-routing--v1-emit-tools).
33
+ - **FEATURE_190 — FEATURE_184 Cleanup Tail: Text-Only Termination + `emit_handoff` Tool Surface Removal + Evaluator Prompt Sweep** (9 commits `078d2e99` → `aefa12d1`, shipped 2026-05-23). FEATURE_184 (v0.7.45) retired the in-chain Evaluator role + made Worker/Generator terminal but left the `emit_handoff` tool + Worker `EVALUATOR HANDOFF` prompt block as load-bearing dead code (the tool was the V2 chain's *only* terminal signal — `detectIdleYield.hasEmittedHandoff = Boolean(recorder.handoff)` gated idle-yield exit). F190 is the cleanup tail. 5-phase plan: (0) `078d2e99` NIL-conflict plumbing (stall-sidecar suggest list / `tool-permission` case / `detectMissingTerminalVerdict` dead code); (1) `8b08d5c1` text-only termination canonical-path ratification (12 new tests + docstring updates); (2a) `5fa1c362` Worker/Generator prompt rewrite (TERMINATION block replaces EVALUATOR HANDOFF; `protocol-emitters.ts` description swaps "Evaluator" → "Sidecar Verifier"); (2b) `901a4c26` Layer 2 eval pilot driver; (2c) `0675d611` + `9ca593ef` 200-cell panel (5 alias × 4 case × 2 variant × 5 runs) + 3-judge LLM majority audit (zhipu/glm51 + ark/v4pro + kimi, panel-internal); (3) `4c296ad4` tool surface deletion (`handoffEmit` wrapper + FEATURE_165 pending-children gate logic + `emitHandoff` + `EMIT_HANDOFF_TOOL_NAME` + `PROTOCOL_EMITTER_TOOLS` 4→3 + barrel re-exports across `agents/index.ts` and `coding/src/index.ts` + `ROLE_EMIT_TOOL_NAMES` narrowed to scout+planner + `getEmitToolNameForRole` returns undefined for generator/worker + Generator-prompt `generatorReasoningDiscipline` reworded to drop the tool reference; 7 source files / +52 −178 LoC); (4) `d6ea1366` test rewrites (9 test files: protocol-emitters / coding-agents / runner-driven-tool-wiring / runner-driven / role-prompt / text-only-termination / parse-helpers / idle-yield + the Generator-prompt source change; +279 −717 LoC; 108 passed / 3 todo); (5) `aefa12d1` ship-status doc block in `docs/features/v0.7.43.md` + memory 
record. **Layer 2 SHIP gate evidence** (evidence-driven per `feedback_pre_registered_gate_saturation`): C1 (all-todos-completed) V_new 25/25 (100%) text-only + summary across all 5 alias; C2 (blocked-state) V_new 24/25 (96%); C3 (mid-task negative) + C4 (trivial completed positive) classified case-design-saturated (V_baseline also fails equally on those cases, V_new ≥ V_baseline on every alias so not a V_new regression); 3-judge audit reached **4.4% disagreement on drop-C4 set** (DATA VALID per EVAL_GUIDELINES anti-pattern 7 §3). C3+C4 case redesign scheduled v0.7.44. Cost ~$12 (pilot $0.30 + panel $10 + audit $1.50, under design-doc $13-15 budget). **Architectural payoff**: the FEATURE_165 pending-children gate's invariant survives via idle-yield — when Worker text-terminates with pending children, `detectIdleYield` returns true → runner waits + resumes Worker, same end-user observable as the rejected-tool-call gate, without an LLM ever calling a tool that needs to be rejected. The `recorder.handoff` slot / `IdleYieldSnapshot.hasEmittedHandoff` field / `Boolean(recorder.handoff)` reads in `runner-driven.ts` remain in the public type surface as vestigial (always-false post-Phase-3); removing them widens scope beyond F190 and is deferred. See [ADR-030](docs/ADR.md#adr-030-claudecode-shape-main-agent--sidecar-verifier-substrate-feature_184-v0745) §F190 cross-reference + [v0.7.43.md §FEATURE_190](docs/features/v0.7.43.md#feature_190-feature_184-cleanup-tail--text-only-termination--emit_handoff-tool-surface-removal--evaluator-prompt-sweep).
34
+
35
+ ## [0.7.42] - 2026-05-21
36
+
37
+ ### Theme
38
+
39
+ **SDK Embedder Surface Closure + Compaction Systemic Fixes + Plan-List Resilience + Hits Ledger** — Four parallel work streams converge on v0.7.42. **FEATURE_186** (this cycle's headline external-facing item) closes the 10-gap export list reported by KodaX Space (downstream SDK consumer on v0.7.40) plus the MCP popout design request: build-dts CI guard against `@kodax-ai/*` internal-import leaks in entry `.d.ts`, one-liner re-exports for `bootstrapAutoMode` / `loadCommands` / `getAgentConfigHome` etc., a Skill `!cmd` dynamic-context host hook (`executeDynamicContext?` + `disableDynamicContext?`), declarative `ToolSideEffect` metadata on all 51 built-in tools with metadata-driven plan-mode gate (kills the `acp_server.ts` hardcoded `Set(['write','edit'])`), Custom provider + MCP server CRUD against `~/.kodax/config.json` with dynamic `getAgentConfigPath` resolution (no frozen `KODAX_CONFIG_FILE`), a new non-blocking `startKodaX(opts, prompt): RunningSession` entry exposing mid-run `setProvider` / `setModel` / `setReasoning` / `abort` (CAP-055 per-turn re-resolution picks up the new values on the next turn), and a sixth SDK subpath `@kodax-ai/kodax/mcp` for popout consumers who only need the MCP layer. **FEATURE_177→FEATURE_183 + FEATURE_185** address compaction systemic regressions surfaced by long-running kimi-loop investigations: read-file-state cache (F177), L2 stall-detector sidecar (F178, 4 commits), AMA compaction trigger parity at top-of-loop ([ADR-029](docs/ADR.md#adr-029-ama-compaction-trigger-parity--top-of-loop-feature_179-v0742)), repo-intelligence system-message dedup (F180), empty-summary-must-not-overwrite-real-prior (F181), fast-path requires non-empty `previousSummary` (F182), `PROTECTED` whitelist 1→26 (F183, claudecode parity), and hits-ledger enrichment that preserves grep / glob / bash result-side artifacts across microcompact ([ADR-031](docs/ADR.md#adr-031-task-level-hits-ledger-与-cross-session-memdir-分层独立feature_185-v0742)). 
**FEATURE_175** ships two of three plan-list fixes (id-preserve on `op:'init'` + B2 synth `autoCompleteOnAccept`); the dirty-reject prototype was REVERTED post-eval after zhipu's intent-vs-action floor failed pre-registered SHIP gate (b). **FEATURE_173** lands the public session-management SDK surface + the `runner-${epoch}` ghost-session double-write fix that produced parallel `runner-*.jsonl` files since v0.7.36. Several **claudecode-parity surface polishes** also ship in this cycle: dedicated `skill` tool replaces read-SKILL.md invocation, multimodal `tool_result` for the read tool with image-aware compaction, `todo_get` tool, `subject` / `description` schema split, plan-list staleness refresh + dedup scan, ark-coding adds `deepseek-v4-{flash,pro}`. **FEATURE_094 (Deep Anti-Escape Hardening) was CANCELLED 2026-05-19** after the necessity probe measured 0/43 escape across the canonical 5-alias panel — the post-v0.7.26 layered defense (P0 prompt + P2a multi_edit + P2b cap) plus FEATURE_152 (bash AST) + FEATURE_158 (signal classifier) + FEATURE_169 (pull-tool prompt) absorbed the bypass surface. The probe is retained as a permanent regression sweep (`tests/feature-094-necessity-probe.eval.ts`) — escape rate must stay 0%; >5% re-opens FEATURE_094.
40
+
41
+ ### Added
42
+
43
+ - **FEATURE_186 — SDK Embedder Surface Closure (KodaX Space Gap List + MCP Popout)**. **8 atomic commits across 8 phases** (Phase 1 `2e33b681` build-dts CI guard / Phase 2 `d3ab38b0` one-line export set / Phase 3 `9b1e440f` Skill `!cmd` host hook / Phase 4 `7defd65f` Tool side-effect metadata + metadata-driven plan-mode gate / Phase 5 `ee549d6f` Custom provider CRUD / Phase 6 `9ba68f25` `RunningSession` + `sessionControl` / Phase 7 `523e9a28` MCP server CRUD + `@kodax-ai/kodax/mcp` subpath / **Phase 8 `McpManager` popout-shape API**). Closes the 10 export gaps + MCP popout design request reported by KodaX Space (substrate consumer on `@kodax-ai/kodax@0.7.40`). Three categories: (1) **SDK publish hazards** — entry `.d.ts` bundle no longer leaks `@kodax-ai/*` internal imports; `build-dts.mjs` self-tests against POSITIVE/NEGATIVE samples + hard-asserts via grep on each entry `.d.ts`. (2) **Barrel re-exports** — Space no longer maintains parallel implementations: `bootstrapAutoMode`, `loadCommands`, `KODAX_COMMANDS_DIR`, `processCommandCall`, `parseCommandCall`, `getAgentConfigHome` / `Path`, `setAgentConfigHome`, new `getAppDataDir(appId)` (with reserved-name guard `^[a-z][a-z0-9-]{1,31}$`, rejects `kodax-*` prefix), `validateCustomProviderConfig`, `ToolSideEffect` enum + 4 helpers (`getAllRegisteredTools` / `isToolPlanModeAllowed` / `isToolFileMutation` / `isToolMutation`) all surface through the SDK barrel. 
(3) **Runtime hooks** — Skill `!cmd` execution gets a 3-tier dispatch (host `executeDynamicContext?` hook → `disableDynamicContext?` throws → legacy `execSync`); `runKodaX` gains a non-blocking sibling `startKodaX(opts, prompt): RunningSession` with `id` / `currentProvider/Model/Reasoning` getters, `setProvider` / `setModel` / `setReasoning` setters (queue + replay on pre-attach, direct mutation post-attach; CAP-055 reads the live `RuntimeSessionState` on next turn), `abort(reason?)` via internal `AbortController` (forwards external `options.abortSignal`), and `result` Promise pass-through. Plan-mode gate is now metadata-driven: `LocalToolDefinition.sideEffect: 'readonly' | 'mutates-fs' | 'mutates-shell' | 'mutates-network' | 'mutates-state'` is required, optional `planModeAllowed?: boolean` whitelists per-tool; 51 built-in tools labeled (22 readonly / 12 mutates-fs / 1 mutates-shell / 5 mutates-network / 12 mutates-state); `acp_server.ts`'s hardcoded `Set(['write','edit'])` replaced by `isToolFileMutation`. Custom provider CRUD (`list/get/upsert/removeCustomProvider`) and MCP server CRUD (`list/get/upsert/remove/validateMcpServerConfig`) own `~/.kodax/config.json` end-to-end, with `getAgentConfigPath('config.json')` resolved on every call (no frozen `KODAX_CONFIG_FILE` constant — `setAgentConfigHome()` overrides take effect immediately). The new `@kodax-ai/kodax/mcp` subpath re-exports `@kodax-ai/mcp` only (~0 kB + shared chunks); popout consumers pull MCP without the full coding bundle. **Phase 8 added after KodaX Space reported Phase 7's `/mcp` only exposed "types + helpers, no manager-shape API"**: new `McpManager` class + `createMcpManager(servers, options?)` factory expose `listServers / startServer / stopServer / getServerLogs / listTools` popout operations plus `provider() / execute / describe / search / read / dispose` escape hatch. 
Internally wraps one `McpCapabilityProvider`; `McpCapabilityProvider` gains two readonly accessors (`getServerIds()` + `getRuntime(id)`) so manager can read the active runtimes Map without re-constructing them — capability-provider API (the substrate-facing shape) stays fully backwards-compatible. **158 new unit tests** across 8 phases (Phase 8 = 20 manager tests against a real MCP stdio JSON-RPC fixture). Design doc: [docs/features/v0.7.42.md#feature_186-sdk-embedder-surface-closure--kodax-space-gap-list--mcp-popout](docs/features/v0.7.42.md#feature_186-sdk-embedder-surface-closure--kodax-space-gap-list--mcp-popout). Architecture: [ADR-032](docs/ADR.md#adr-032-sdk-embedder-surface-closure-feature_186-v0742).
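The reserved-name guard quoted for `getAppDataDir(appId)` can be read as two checks: the pattern match, then the reserved prefix. A minimal sketch follows; `isValidAppId` is a hypothetical helper, and how the real function reports a rejected id (throwing or returning an error) is not stated in the entry.

```typescript
// Pattern quoted in the changelog: a lowercase letter, then 1-31 characters of [a-z0-9-].
const APP_ID_PATTERN = /^[a-z][a-z0-9-]{1,31}$/;

// Hypothetical validator; the real getAppDataDir may signal rejection differently.
function isValidAppId(appId: string): boolean {
  if (!APP_ID_PATTERN.test(appId)) return false;
  // Ids beginning with `kodax-` are reserved.
  return !appId.startsWith("kodax-");
}
```

Note that the pattern implies a total length of 2 to 32 characters, so a single-letter id is rejected.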
44
+ - **FEATURE_173 — Session Management Public SDK + `session.id` Propagation Bug Fix** (commit `a8258d29` implementation; `ac2752a4` design relocation). New `packages/repl/src/session/public-api.ts` thin facade over `FileSessionStorage`; exposes `listSessions({ projectRoot, scope, includeArchived, limit, before })` / `loadSession` / `forkSession` / `rewindSession` / `setActiveEntry` / `deleteSession` / `listRunningSessions` / `watchSessions(cb)` + `createSessionManager({ sessionsDir })` factory via `@kodax-ai/kodax/session` (`dist/sdk-session.js` 731 B + `dist/sdk-session.d.ts` 5.9 KB in tarball). Running-session lock reuses FEATURE_125 team-mode `<configHome>/instances/<pid>/` heartbeat; mutation against a running session returns `{ error: { code: 'session_running', runningProcess: { pid, startedAt } } }` (never throws). Platform-branched `watchSessions`: POSIX `fs.watch` + 100ms debounce coalesce / Windows 1000ms polling (cross-process file creation on Windows fs.watch is unreliable). **13 stable-contract tests** total (12 Part B + 1 Part A) pin `SessionSummary` field names + `forkSession` never-throws semantics + running gate + watch coalesce. **Part A bug fix**: `runManagedTask` call chain dropped `opts.session.id` between `runWithIdleYield` → `primitives/runner.ts`, so the `effectiveRunResult.sessionId ?? \`runner-${Date.now()}\`` resolution at `runner-driven.ts:1965` always fell to the right-hand fallback, producing duplicate `runner-*.jsonl` files (synthesized id) alongside the canonical `YYYYMMDD_HHMMSS.jsonl` (REPL-side). 5-LoC fix prepends `options.session?.id` to the `??`-chain; `FEATURE_173 Part A` contract test locks "caller id wins, ghost-prefix never appears" forever. 
Out of scope (deferred to v0.7.43): `listRunningSessions().sessionId` field reserved but unpopulated (needs FEATURE_125 heartbeat schema bump to write sessionId into state.json — deleteSession running-gate matches by pid for v0.7.42); `createSessionManager({sessionsDir})` accepts but ignores `sessionsDir` (FileSessionStorage hardcodes `KODAX_SESSIONS_DIR`); old `runner-*.jsonl` cleanup deferred to FEATURE_174 `kodax sessions dedupe`. Design doc: [docs/features/v0.7.42.md#feature_173-session-management-public-sdk--sessionid-propagation-bug-fix](docs/features/v0.7.42.md#feature_173-session-management-public-sdk--sessionid-propagation-bug-fix).
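The Part A fix amounts to putting the caller-supplied id at the front of the `??` chain. The sketch below reproduces that ordering in isolation; `RunOptions`, `RunResult`, and `resolveSessionId` are simplified stand-ins for illustration, not the actual code in `runner-driven.ts`.

```typescript
// Simplified stand-ins for the real option and result types.
interface RunOptions {
  session?: { id?: string };
}
interface RunResult {
  sessionId?: string;
}

// `now` is injected so the synthesized fallback is deterministic in tests.
function resolveSessionId(
  options: RunOptions,
  result: RunResult,
  now: () => number = Date.now,
): string {
  // Caller id wins; the runner-reported id is second; the synthesized
  // `runner-<epoch>` id is only a last resort.
  return options.session?.id ?? result.sessionId ?? `runner-${now()}`;
}
```

With the caller id first, the `runner-` prefix can only appear when neither the caller nor the runner supplies an id.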
45
+ - **FEATURE_184 — Sidecar Verifier Substrate (claudecode-Shape Main Agent + Stop Hook Primitive)**. Originally drafted as v0.7.45; shipped to v0.7.42 release window 2026-05-21 with full SHIP gate (a)+(b)+(c)+(d) MET on Phase D.4 Layer 2 eval (100/100 primaryPassed; 0% LLM-judge audit disagreement on 20-cell random sample). Retires the AMA H2 Worker→Evaluator role state machine in favor of claudecode-style single-loop Main Agent + agent-layer `StopHookFn` primitive + out-of-chain Sidecar Verifier. Resolves the zhipu/glm51 intent-vs-action floor that made FEATURE_167 B2 synth-accept fallback silently no-op the verification gate. **Net delete ~423 LoC** across `EVALUATOR_AGENT_NAME` / `emit_handoff` / `verdict-recorder` evaluator branches / F165/166/167 dead retry paths. New module `packages/coding/src/agent-runtime/middleware/sidecar-verifier/` (5 files, ~200 LoC impl + ~250 LoC test); sidecar context = current-turn user queries + 24-msg rolling buffer + file-edit summary (must see what main agent **did**, not only what it **said**); model default-inherits main agent, with `KODAX_VERIFIER_PROVIDER` / `KODAX_VERIFIER_MODEL` env-var opt-in for cross-family decoupling. UI surface: `⊙ Verifying...` dim spinner + `↻ Retrying: <reason>` + `⚠ Cannot verify: <reason>` (per claudecode `hook_stopped_continuation` style). See [ADR-030](docs/ADR.md#adr-030-claudecode-shape-main-agent--sidecar-verifier-substrate-feature_184-v0745). Design doc: [docs/features/v0.7.42.md#feature_184-sidecar-verifier-substrate--claudecode-shape-main-agent--stop-hook-primitive](docs/features/v0.7.42.md#feature_184-sidecar-verifier-substrate--claudecode-shape-main-agent--stop-hook-primitive).
46
+ - **FEATURE_175 — Plan-List Resilience: `op:'init'` Mid-Task Status Preservation + B2 Synth Auto-Completion** (commit `1368ce55` + dirty-reject revert markers). Based on 2026-05-19 production session where V2 PLANNED ran 12m54s but plan stayed at `0/4 completed`. Three independent bugs stacked: (1) `todo-store.ts:218-237` `init()` unconditionally reset status to pending — Worker mid-task `op:'init'` refine-scope wiped prior completed/skipped/cancelled; (2) FEATURE_167 (v0.7.41) B2 synth fallback directly assigned `recorder.verdict` property, bypassing the `wrapEmitterWithRecorder` slot setter, so `autoCompleteOnAccept` never fired — run accepted, UI froze at `0/N completed`; (3) `executeInitOp` had no dirty-store guard, magnifying (1). **Slice 1 prototype** three fixes same version: (a) `init()` id-match terminal-success preserve (keeps completed/skipped/cancelled + note, new ids pending, pending/in_progress/failed reset) SHIPPED; (b) B2 synth path now mirrors wrapper side-effect via `todoStore.autoCompleteOnAccept()` SHIPPED; (c) `executeInitOp` returns `{ok:false, reason:"... use surgical APIs ..."}` on non-pending store contents — **PROTOTYPED → eval-driven REVERTED** after Layer 2 panel (51 calls = 1 pilot + 50 phase1, ~$3) showed zhipu/glm51 0/10 PASS on C1+C2 with audit disagreement 0% (real [project_zhipu_send_message_floor](../../../memory/project_zhipu_send_message_floor.md) intent-vs-action floor: "Got it, use todo_create to insert the new step:" prose-without-tool); pre-registered SHIP gate (b) hard-fail → REVERT. Reverted code retained as revert-pin tests + marker comments. Slice 2: +6 net tests (4 todo-store + 1 todo-update revert-marker + 1 runner-driven integration); coding 2704/2704 + repl 1431/1432 green. Design doc: [docs/features/v0.7.42.md#feature_175-plan-list-resilience--opinit-mid-task-status-preservation--b2-synth-auto-completion](docs/features/v0.7.42.md#feature_175-plan-list-resilience--opinit-mid-task-status-preservation--b2-synth-auto-completion).
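Fix (a), the id-match preservation rule, can be sketched as a pure function over the previous and incoming item lists. The types and the `reinit` name are assumptions made for illustration; the real `init()` in `todo-store.ts` may carry additional fields.

```typescript
type TodoStatus = "pending" | "in_progress" | "completed" | "skipped" | "cancelled" | "failed";

interface TodoItem {
  id: string;
  subject: string;
  status: TodoStatus;
  note?: string;
}

// Statuses the entry says must survive a mid-task re-init.
const PRESERVED: ReadonlySet<TodoStatus> = new Set<TodoStatus>(["completed", "skipped", "cancelled"]);

function reinit(previous: TodoItem[], incoming: Array<{ id: string; subject: string }>): TodoItem[] {
  const byId = new Map<string, TodoItem>();
  for (const item of previous) byId.set(item.id, item);

  return incoming.map(({ id, subject }): TodoItem => {
    const prior = byId.get(id);
    // Matching id in a preserved state: keep status and note.
    if (prior && PRESERVED.has(prior.status)) {
      return { id, subject, status: prior.status, note: prior.note };
    }
    // New ids, and pending / in_progress / failed items, start over as pending.
    return { id, subject, status: "pending" };
  });
}
```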
47
+ - **FEATURE_177 — Read-File-State Cache (anti-loop)** (commit `8e64e09e` + `c66e2403` post-compact fire). Per-task LRU keyed by absolute path stores `{ mtime, size, hash }` for files the worker has read; subsequent identical reads return cached envelope with a "still fresh — your prior read at turn N is current" banner, suppressing the kimi-loop "read file 4 times in a row" pattern observed in production. Cache invalidated on tool-side mutation (write / edit / multi_edit / insert_after_anchor) and on cross-microcompact boundaries via `onPostCompact` (fixed in `c66e2403` to fire on microcompact-only changes, not just full compactions). Design doc: [docs/features/v0.7.42.md#feature_177-读文件状态缓存read-file-state-cache--抑制非必要重复读取](docs/features/v0.7.42.md#feature_177-%E8%AF%BB%E6%96%87%E4%BB%B6%E7%8A%B6%E6%80%81%E7%BC%93%E5%AD%98read-file-state-cache--%E6%8A%91%E5%88%B6%E9%9D%9E%E5%BF%85%E8%A6%81%E9%87%8D%E5%A4%8D%E8%AF%BB%E5%8F%96).
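A read-file-state cache of this kind can be sketched with a `Map`, which preserves insertion order and therefore doubles as a simple LRU. This is an illustrative sketch, not the shipped module: the class name, the default capacity, and the method names are assumptions, and file stats are passed in rather than read from disk.

```typescript
interface FileState {
  mtime: number;
  size: number;
  hash: string;
  turn: number; // turn at which the file was last read
}

class ReadFileStateCache {
  private readonly entries = new Map<string, FileState>();
  private readonly capacity: number;

  constructor(capacity: number = 64) {
    this.capacity = capacity;
  }

  // Returns the turn of the prior read if the file is unchanged, else undefined.
  freshSince(path: string, current: { mtime: number; size: number; hash: string }): number | undefined {
    const hit = this.entries.get(path);
    if (!hit) return undefined;
    const same = hit.mtime === current.mtime && hit.size === current.size && hit.hash === current.hash;
    if (!same) return undefined;
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(path);
    this.entries.set(path, hit);
    return hit.turn;
  }

  record(path: string, state: FileState): void {
    this.entries.delete(path);
    this.entries.set(path, state);
    if (this.entries.size > this.capacity) {
      // Evict the least recently used entry (first key in insertion order).
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  // Call after write / edit / multi_edit / insert_after_anchor on `path`.
  invalidate(path: string): void {
    this.entries.delete(path);
  }

  // Call from a post-compaction hook so stale banners never cross a compact boundary.
  clear(): void {
    this.entries.clear();
  }
}
```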
48
+ - **FEATURE_178 — L2 Stall Sidecar (Rule + LLM dual-layer anti-loop detector)** (4 commits `e79008c1` → `f91cf7cb` → `9bc209f9` → `d9c52638`). L1 (rule layer): standalone stall detector module scans the last N turns for repeat tool-call signatures (same name + same input keys); fires when ≥3 identical calls in N=5 turns. L2 (LLM sidecar): on L1 fire, dispatches a sidecar LLM judge with the recent turn window + a stall-classification system prompt; returns `{ stalled: true|false, reason }` deterministically parseable. Control plane: orchestrator + nudge injection prepends `<stall-detector>` system reminder to the next user message when L2 confirms; rule-only mode (no LLM) available via `KODAX_STALL_SIDECAR=rule`. Design doc: [docs/features/v0.7.42.md#feature_178-l2-stall-sidecar--rule--llm-双层反-loop-检测](docs/features/v0.7.42.md#feature_178-l2-stall-sidecar--rule--llm-%E5%8F%8C%E5%B1%82%E5%8F%8D-loop-%E6%A3%80%E6%B5%8B).
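The L1 rule described above (same tool name plus same input keys, at least 3 identical calls within the last 5 turns) is small enough to sketch directly. `ToolCall`, `signature`, and `ruleLayerStalled` are hypothetical names; the real detector may normalize signatures differently.

```typescript
interface ToolCall {
  name: string;
  input: Record<string, unknown>;
}

// Signature = tool name plus sorted input keys, following the rule-layer description.
function signature(call: ToolCall): string {
  return `${call.name}(${Object.keys(call.input).sort().join(",")})`;
}

// `turns` holds the tool calls of each turn, oldest first.
function ruleLayerStalled(turns: ToolCall[][], windowSize: number = 5, threshold: number = 3): boolean {
  const counts = new Map<string, number>();
  for (const turn of turns.slice(-windowSize)) {
    for (const call of turn) {
      const key = signature(call);
      const next = (counts.get(key) ?? 0) + 1;
      if (next >= threshold) return true;
      counts.set(key, next);
    }
  }
  return false;
}
```

Because the signature ignores input values, this layer is deliberately coarse; in the design above the LLM sidecar is what confirms or rejects an L1 fire.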
49
+ - **FEATURE_179 — AMA Compaction Trigger Parity (Top-of-Loop)** (commit `02836a72`, see [ADR-029](docs/ADR.md#adr-029-ama-compaction-trigger-parity--top-of-loop-feature_179-v0742)). Moves the AMA compaction hook from end-of-turn to top-of-loop, mirroring SA path's `runCompactionLifecycle` ordering. Pre-fix: AMA path called compaction AFTER the new user message landed in the transcript, so the trigger metric saw the next-turn budget already eaten — compaction either fired too late (already over) or skipped (transcript estimate sub-threshold but post-merge over). Post-fix: hook runs BEFORE the next-turn LLM call, against the pre-merge transcript state, matching SA path semantics. Design doc: [docs/features/v0.7.42.md#feature_179-ama-compaction-trigger-parity--top-of-loop-触发](docs/features/v0.7.42.md#feature_179-ama-compaction-trigger-parity--top-of-loop-%E8%A7%A6%E5%8F%91).
50
+ - **FEATURE_180 — Repo-Intelligence System Message Dedup** (commit `e1782ffe`). Repo-intel capsule injection (FEATURE_161 v0.7.40) could land identical system messages across rounds when topology / module / impact signals were stable; dedup by content hash keeps one copy. Design doc: [docs/features/v0.7.42.md#feature_180-repo-intelligence-system-message-dedup](docs/features/v0.7.42.md#feature_180-repo-intelligence-system-message-dedup).
51
+ - **FEATURE_181 — Empty LLM Summary Must Not Overwrite Real Prior Summary** (commit `57a79767`). When the compaction LLM call returns empty / whitespace-only / API error, the prior `summary` (if non-empty) is preserved instead of overwritten with `""`. Closes a kimi-loop-adjacent case where a single compaction failure wiped the entire compacted history. Design doc: [docs/features/v0.7.42.md#feature_181-empty-llm-summary-不再覆盖-real-prior-summary](docs/features/v0.7.42.md#feature_181-empty-llm-summary-%E4%B8%8D%E5%86%8D%E8%A6%86%E7%9B%96-real-prior-summary).
52
+ - **FEATURE_182 — Compaction Fast-Path Requires Non-Empty `previousSummary`** (commit `d67aa776`). The compaction fast-path (skip LLM, reuse prior summary + new turn delta) is now gated on `previousSummary` length > 0, so cold-start sessions correctly fall through to full LLM compaction. Design doc: [docs/features/v0.7.42.md#feature_182-compaction-fast-path-必须有-previoussummary-才能复用](docs/features/v0.7.42.md#feature_182-compaction-fast-path-%E5%BF%85%E9%A1%BB%E6%9C%89-previoussummary-%E6%89%8D%E8%83%BD%E5%A4%8D%E7%94%A8).
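The two compaction guards from FEATURE_181 and FEATURE_182 reduce to a pair of small predicates. The function names below are hypothetical and the sketch treats an API error as an `undefined` summary, which is an assumption about how failures surface.

```typescript
// FEATURE_181 idea: an empty, whitespace-only, or failed summary must not
// overwrite a real prior summary. `llmSummary` is undefined when the call failed.
function nextSummary(previous: string, llmSummary: string | undefined): string {
  const candidate = (llmSummary ?? "").trim();
  if (candidate.length === 0 && previous.trim().length > 0) return previous;
  return candidate;
}

// FEATURE_182 idea: the fast path may only reuse a summary that actually exists.
function canUseFastPath(previousSummary: string | undefined): boolean {
  return (previousSummary ?? "").length > 0;
}
```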
53
+ - **FEATURE_183 — PROTECTED Tool Whitelist Expansion (1 → 26, claudecode parity)** (commits `f6a51be2` + `c322d835` review-amend). `PROTECTED` tools are exempted from compaction's "clear tool_result content" step (the tool's structured payload survives across compact boundaries); pre-fix only `read` was on the whitelist. Expanded to 26 tools matching claudecode's parity set: `read`, `write`, `edit`, `multi_edit`, `glob`, `grep`, `bash`, `todo_create`, `todo_update`, `todo_list`, `todo_get`, `web_search`, `web_fetch`, `task`, `dispatch_child_task`, `ask_user_question`, `emit_verdict`, `emit_handoff`, `module_context`, `symbol_context`, `process_context`, `impact_estimate`, `worktree_create`, `worktree_remove`, `exit_plan_mode`, `skill`. Design doc: [docs/features/v0.7.42.md#feature_183-protected-工具白名单扩容--claudecode-对照修正](docs/features/v0.7.42.md#feature_183-protected-%E5%B7%A5%E5%85%B7%E7%99%BD%E5%90%8D%E5%8D%95%E6%89%A9%E5%AE%B9--claudecode-%E5%AF%B9%E7%85%A7%E4%BF%AE%E6%AD%A3).
54
+ - **FEATURE_185 — Tool Result-Side Enrichment: Hits Ledger Cross-Compaction Preservation** (5 commits `15b1ea3c` → `83976149` → `da8d7b28` → `bddc3d58` + `fcd4cc76` docs, see [ADR-031](docs/ADR.md#adr-031-task-level-hits-ledger-与-cross-session-memdir-分层独立feature_185-v0742)). `KodaXSessionArtifactLedgerEntry` extraction now reads `tool_result.content` (not just `tool_use.input`) for grep / glob / bash entries — grep gains `hits: Array<{ path, line, preview? }>` up to 50 per entry, glob gains `paths: string[]`, bash gains `exit_code` + `tail` (last 240 chars). Metadata-aware merge keystone: when microcompact clears the raw `tool_result.content` to `[Cleared: ...]` placeholder, the ledger summary in the post-compact attachment still shows "you found 12 hits at module/foo.ts:23, 45, 78 / module/bar.ts:12" so the model knows what the prior grep found without re-running it. 5-alias × 2-case × 5-run Layer 2 panel: 5/5 alias ≥80% (gate met). Design doc: [docs/features/v0.7.42.md#feature_185-工具结果侧-enrichment--hits-ledger-跨压缩保留](docs/features/v0.7.42.md#feature_185-%E5%B7%A5%E5%85%B7%E7%BB%93%E6%9E%9C%E4%BE%A7-enrichment--hits-ledger-%E8%B7%A8%E5%8E%8B%E7%BC%A9%E4%BF%9D%E7%95%99).
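The result-side extraction can be pictured with two helpers: one that turns grep output into capped `hits`, and one that keeps the bash `tail`. This is a sketch under assumptions: it presumes grep rows of the form `path:line:text`, and `extractGrepHits` / `bashTail` are illustrative names rather than the ledger's actual functions. The caps of 50 hits and 240 characters are the ones stated in the entry.

```typescript
interface GrepHit {
  path: string;
  line: number;
  preview?: string;
}

const MAX_HITS = 50;    // per-entry cap stated in the changelog
const TAIL_CHARS = 240; // bash tail length stated in the changelog

// Assumes rows shaped like "path:line:text"; rows in any other shape are skipped.
function extractGrepHits(content: string): GrepHit[] {
  const hits: GrepHit[] = [];
  for (const row of content.split("\n")) {
    const match = /^(.+?):(\d+):(.*)$/.exec(row);
    if (!match) continue;
    hits.push({
      path: match[1] ?? "",
      line: Number(match[2]),
      preview: (match[3] ?? "").trim(),
    });
    if (hits.length >= MAX_HITS) break;
  }
  return hits;
}

// Keep only the last TAIL_CHARS characters of the command output.
function bashTail(output: string): string {
  return output.slice(-TAIL_CHARS);
}
```

Storing this compact form in the ledger is what lets a post-compact summary mention paths and line numbers after the raw `tool_result.content` has been cleared.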
55
+ - **claudecode-parity polish: dedicated `skill` tool** (commit `09e84aaf`). Replaces "read the SKILL.md file via Read tool" pattern with a dedicated `skill` tool that returns the skill body + metadata in one call. Matches claudecode V2 skill invocation surface.
56
+ - **claudecode-parity polish: `todo_get` tool** (commit `35b93cd7`). Single-task fetch by id, matches V2 TaskGet parity.
57
+ - **claudecode-parity polish: `subject` / `description` split on todo items** (commit `0833aeb7`). Two-field schema matching V2 — `subject` for the short title, `description` for the elaboration. Compatibility shim: legacy `content` field still accepted on input, mapped to `subject` server-side.
58
+ - **Plan-list metadata per-key delete** (commit `9094edda`). The `todo_update` patch operation gains granular metadata delete (setting a value to `null` clears that key); previously the only way to clear metadata was a full overwrite.
59
+ - **Plan-list hygiene — staleness refresh + dedup scan** (commit `a7748bbb`). Stale items (status pending for >N turns) get a system-reminder nudge; dedup scan flags subject-collisions across active items.
60
+ - **Deprecate LLM-side `op:'init'`** (commit `3f06330b`). `todo_create` batch is the canonical creation path going forward; LLM-side `op:'init'` remains backward-compatible but emits a deprecation hint in the tool response.
61
+ - **Verification nudge: `todo_update` reminder on terminal-completion transition** (commit `c9a3fe91`). When `todo_update` flips an item to `completed`, the tool response now appends a brief verification reminder ("verify you actually completed this; if not, set status back to in_progress").
62
+ - **`@kodax-ai/llm` ark-coding gains `deepseek-v4-{flash,pro}` (1M ctx)** (commit `c312e899`). Updates the canonical eval alias panel to use coding-plan provider variants (see `feedback_canonical_eval_alias_panel`).
63
+ - **Two-layer cascade for `replay` / `strict` / `streamMax`** (commit `a7615d54`). Custom provider parity with built-in providers' two-layer config resolution (provider default ← user override).
64
+ - **FEATURE_188 — claudecode-Parity dispatch_child Architecture: Drop Forced Worktree + Prompt-Level Conflict Awareness** (see [ADR-034](docs/ADR.md#adr-034-claudecode-parity-dispatch_child-architecture--drop-forced-worktree--prompt-level-conflict-awareness-feature_188-v0742)). Surfaced after FEATURE_177 panel #2 dump showed 0/250 real binding dispatches in `dispatch_child_task` C4 (read fan-out) + C5 (write fan-out) cells — model writing `<tool>dispatch_child_task</tool>` markup in narrative without invoking the structured tool. Three dead-assumption fixes: (1) `executeWriteChild` no longer creates a worktree — share parent `executionCwd` / `gitRoot`, per-file `backups` Map remains the rollback substrate. (2) Worker `dispatchRules` swaps `≥3 independent investigations` / `≥45 seconds` / `≥3 modules` for `multiple independent investigations` / `a while` / `multiple modules` (qualitative criteria per [ADR-033](docs/ADR.md#adr-033-claudecode-style-prompt-design-principles--qualitative-criteria-over-quantitative-rules-v0742-v0743) §1; pilot v3 isolation test 20 calls verified non-load-bearing). (3) RULE C drops `Worktrees are isolated; merge happens at Evaluator review time` — the Evaluator role was retired in FEATURE_184 v0.7.45 ([ADR-030](docs/ADR.md#adr-030-claudecode-shape-main-agent--sidecar-verifier-substrate-feature_184-v0745)). Write children's `buildChildBriefing` now carries a `## Coordination with peers` section instructing them to STOP-and-report if peer-conflict cannot be ruled out (read children's briefing intentionally omits this — they don't write files). Cross-package infrastructure (`childWriteWorktreePathsRef` ref + `registerChildWriteWorktrees` callback + `childWriteWorktreePaths` payload field + `worktreePaths` ReadonlyMap type, 4 type-decl + 4 plumbing sites across `child-executor.ts` / `runner-driven.ts` / `dispatch-child.ts` / `payload-builder.ts` / `types.ts`) all retired. CAP-097 contract test deleted (worktree-creation product behavior gone); CAP-095 / CAP-096 / `child-executor.test.ts` mocks + assertions updated. Design doc: [docs/features/v0.7.42.md#feature_188-dispatch_child-worktree-drop--conflict-awareness-prompt-hardening](docs/features/v0.7.42.md#feature_188-dispatch_child-worktree-drop--conflict-awareness-prompt-hardening).
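
The FEATURE_185 result-side enrichment listed in this section can be pictured with a small sketch. It is illustrative only, not the KodaX extractor: the names `GrepHit`, `parseGrepHits`, and `bashTail` are hypothetical, and the sketch assumes grep output in `path:line:preview` form. Only the field names (`hits`, `path`, `line`, `preview`, `tail`) and the limits (50 hits per entry, last 240 chars) come from the entry itself.

```typescript
// Hypothetical sketch of result-side ledger enrichment; not the KodaX code.
interface GrepHit {
  path: string;
  line: number;
  preview?: string;
}

const MAX_HITS = 50;    // per-entry cap stated in the changelog
const TAIL_CHARS = 240; // bash tail length stated in the changelog

// Assumes grep-style "path:line:preview" output, one hit per line.
function parseGrepHits(content: string): GrepHit[] {
  const hits: GrepHit[] = [];
  for (const raw of content.split("\n")) {
    const match = /^(.+?):(\d+):(.*)$/.exec(raw);
    if (!match) continue; // skip lines that are not hits
    const preview = match[3].trim();
    hits.push({
      path: match[1],
      line: Number(match[2]),
      // Only attach a preview when there is something to show.
      ...(preview ? { preview } : {}),
    });
    if (hits.length >= MAX_HITS) break;
  }
  return hits;
}

// Keeps only the last TAIL_CHARS characters of a bash output.
function bashTail(output: string): string {
  return output.slice(-TAIL_CHARS);
}
```

Because the ledger stores this structured summary separately from the raw `tool_result.content`, clearing the raw content during microcompact would not lose the hit locations.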
65
+
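
The two todo-schema changes in this section (the legacy `content` → `subject` shim and the `null`-clears-key metadata patch) might look roughly like the following. This is a minimal sketch under stated assumptions: `TodoInput`, `NormalizedTodo`, `normalizeTodoInput`, and `applyMetadataPatch` are hypothetical names, and the real tool schema is likely richer.

```typescript
// Hypothetical shapes; only `subject`, `description`, legacy `content`, and
// the "null clears the key" rule come from the changelog.
interface TodoInput {
  subject?: string;
  description?: string;
  content?: string; // legacy single-field form
}

interface NormalizedTodo {
  subject: string;
  description?: string;
}

// Maps the legacy `content` field onto `subject` when `subject` is absent.
function normalizeTodoInput(input: TodoInput): NormalizedTodo {
  const subject = input.subject ?? input.content;
  if (subject === undefined) {
    throw new Error("todo item needs `subject` (or legacy `content`)");
  }
  return input.description === undefined
    ? { subject }
    : { subject, description: input.description };
}

type Metadata = Record<string, unknown>;

// Per-key patch: a null value deletes the key, any other value overwrites it,
// and keys not mentioned in the patch are left untouched.
function applyMetadataPatch(current: Metadata, patch: Metadata): Metadata {
  const next: Metadata = { ...current };
  for (const [key, value] of Object.entries(patch)) {
    if (value === null) {
      delete next[key];
    } else {
      next[key] = value;
    }
  }
  return next;
}
```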
66
+ ### Fixed
67
+
68
+ - **FEATURE_177 follow-up: read-file-state cache fires on microcompact-only changes** (commit `c66e2403`). Pre-fix the `onPostCompact` listener only fired on full compactions; microcompact-only iterations left stale cache entries. Now both microcompact and full compaction invalidate the per-task cache.
69
+ - **`dispatch_child_task` empty-summary fallback + opt-in trace** (commit `8c17dba4`). A child task that exited with an empty summary previously fell through `??` to a default banner that read like a real summary; it now produces a "no summary returned" diagnostic envelope with `mode=silent-drop` so the parent worker can react. Tracing is opt-in via env-gated logging.
70
+ - **`dispatch_child_task` review pass — flaky test + minor cleanups** (commit `3b5a862f`). Stabilizes one flaky test in the child-task harness and clears a handful of LOW-severity review items.
71
+ - **Shift-Tab cycle uses canonical `'auto'`** (commit `1b513824` + revert chain `32396db8` → `3637bcec`). Closes the Windows-SSH cursor-misalignment root cause. The follow-up revert `3637bcec` restored the `aliasedCurrent` mapping after `32396db8` was challenged by the user — semantic intent (explicit `auto-in-project ≡ auto`) ≠ behavior equivalence (`indexOf=-1` fallback); the original mapping is load-bearing. See `feedback_behavioral_vs_semantic_equivalence` memory.
72
+ - **FEATURE_172 `Output.width` viewport-sync attempt** (commits `fabe0b4f` + revert `e62312b3`). Same-cycle revert, retrospectively classified as a **misjudged hypothesis** — no real ghost-cell bug to fix. FEATURE_172 main scope (Phase 1 data layer + Phase A.1 ScreenBuilder) remains CLOSED with no v0.7.43+ follow-up.
73
+ - **REPL queue layout — budget reserves N+1 rows for `QueuedCommandsSurface`** (commit `f4267d4d`). The queue surface was 1 row short of its actual rendered height in tight terminals, causing trailing ellipsis cutoff.
74
+ - **Compaction preserves image blocks + counts image tokens** (commit `92b11e68`). Image blocks were silently dropped during summary roll-up; now preserved verbatim and their estimated tokens included in the total.
75
+ - **REPL drops `[Image #N]` anchor from user-message text** (commit `1eac821d`). Pre-fix the visible user-message text carried both the image block and a redundant `[Image #N]` anchor string; claudecode parity removes the text anchor since the image block itself is the canonical reference.
76
+ - **Read tool image-aware via multimodal `tool_result`** (commit `286c16db`). Reading an image file (`.png` / `.jpg` / `.webp` etc.) now returns the binary as an image-content block in the tool_result, not as a base64 string in text; claudecode parity.
77
+ - **`loadCompactionConfig` uses per-model `contextWindow` for adaptive `triggerPercent`** (commit `0cef1b66`). Pre-fix the trigger percentage was computed against the legacy hard-coded 200k context window; now reads the per-model `contextWindow` (e.g., 1M for `glm-5-turbo` corrected in `5324889e`, 200k for Claude Sonnet 4.x) so the 60% trigger threshold scales correctly.
78
+ - **Status bar `contextWindow` re-resolves on `/model` swap** (commit `c9f62030`). Pre-fix the status-bar contextWindow value was captured at REPL bootstrap; switching models mid-session left the bar showing the stale value.
79
+ - **`zhipu` / `zhipu-coding` `glm-5-turbo` contextWindow 128K → 200K** (commit `5324889e`). Provider metadata correction.
80
+ - **Narrow P2b RST-prone default list to `zhipu-coding` only** (commit `8e9b4520`). FEATURE_152 P2b (write-turn max_output_tokens cap) defaulted to a too-broad provider list, causing unrelated max_tokens RST on healthy providers; narrowed.
81
+ - **Image vision perception: tightened regex + Layer 3 compaction variants — `bc04581c` REVERTED**. Worker image-perception prompt block was prototyped (`bc04581c`) then reverted (`2fd8d8fc`) after Layer 3 V_*_compacted variants showed zhipu state turning honest refuses into confident hallucinations; saturated eval surfaced via tightened regex (require image-content keyword, not SVG markup). Layer 2 eval driver `fe76d3da` retained as permanent regression sweep. See `project_image_perception_worker_prompt` memory.
82
+ - **InkREPL spinner fallback: `item.content` → `item.subject`** (commit `157b162d`). Follow-up to the FEATURE_060 Tier 2 rename; the parallel-thread InkREPL.tsx still referenced `item.content` on the spinner-row fallback path.
83
+ - **InkREPL: `onThinkingEnd` no longer creates duplicate thinking item after assistant text** (commit `4798e66a`). Pre-fix, the end-of-thinking event could append a second transcript entry when the assistant had already begun streaming text; now coalesces with the existing thinking row.
84
+ - **`/compact` updates live token count via `onCompactStats`** (commits `4da09289` + `c058aeff` + revert `829401a8`). The status-bar token count used to stay frozen at its pre-compaction value; it now updates in real time as compaction proceeds. The revert chain captured a temporary command-bridge wiring path that crossed a layer boundary; a follow-up replaced it with the canonical `onCompactStats` callback.
85
+ - **FEATURE_184 follow-up shipped to v0.7.42 via narrow types**: `RunnerToolResult.content` union narrowed at string-only consumers (`ab2c63be`), unblocking the v0.7.45 sidecar-verifier work in parallel without forcing a v0.7.42 ship dependency.
86
+ - **FEATURE_173 ghost-session double-write**: see Added bullet above.
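
The `dispatch_child_task` empty-summary fix in this list can be illustrated with a short sketch. The envelope type, the `"ok"` mode value, and the function name are assumptions made for illustration; only the "no summary returned" text and `mode=silent-drop` come from the entry.

```typescript
// Hypothetical envelope; the "ok" mode is an assumption for this sketch.
interface ChildSummaryEnvelope {
  mode: "ok" | "silent-drop";
  summary: string;
}

// Treats missing, empty, and whitespace-only summaries the same way, so the
// parent never mistakes a default banner for a real summary.
function wrapChildSummary(summary: string | null | undefined): ChildSummaryEnvelope {
  if (summary == null || summary.trim() === "") {
    return { mode: "silent-drop", summary: "no summary returned" };
  }
  return { mode: "ok", summary };
}
```

An explicit check like this differs from a bare `summary ?? banner` in two ways: `??` only substitutes on `null` / `undefined` (an empty string passes through), and the substituted banner carries no marker the parent can branch on.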
87
+
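
For the `loadCompactionConfig` fix, the effect of resolving the trigger against a per-model `contextWindow` rather than a fixed 200k can be shown with simple arithmetic. The function name and signature below are hypothetical, and the sketch only models the threshold computation, not how KodaX adapts `triggerPercent` itself.

```typescript
// Legacy hard-coded window described in the changelog entry.
const LEGACY_CONTEXT_WINDOW = 200_000;

// Hypothetical helper: token count at which compaction would trigger.
// Multiplies before dividing to stay in exact integer arithmetic.
function triggerThresholdTokens(
  triggerPercent: number,
  contextWindow: number = LEGACY_CONTEXT_WINDOW,
): number {
  return Math.floor((contextWindow * triggerPercent) / 100);
}

// Pre-fix behaviour: every model is measured against 200k.
const legacyThreshold = triggerThresholdTokens(60);
// Post-fix behaviour: a 1M-context model gets a proportionally larger budget.
const largeModelThreshold = triggerThresholdTokens(60, 1_000_000);
```

With a 60% trigger, the legacy computation yields 120,000 tokens regardless of model, while a 1M-token window yields 600,000, so a large-context model is no longer compacted at a fraction of its usable context.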
88
+ ### Reverted
89
+
90
+ - **FEATURE_177 Worker prompt RULE D — `task_output` teaching layer** (commit `9082551b`). Layer 2 panel rerun (250 cells, 3.2% audit disagreement DATA VALID) hit pre-registered REVERT threshold: case C5 kimi RULE C write fan-out 80% → 20% (-60pp, judge + regex agree). Worker `dispatchRules` reverted to RULE A/B/C + IDLE-YIELD + LARGE CHILD OUTPUT + MODEL HINT (no RULE D in any state). **The runtime `task_output` tool itself stays ON** (commit `334756b7` — in-memory `ChildProgressSnapshot` ring buffer cap=200 + claudecode-shape envelope tool); SDK consumers can opt the worker into the RULE D prompt teaching via `KODAX_TASK_OUTPUT_PROMPT='1'`. Eval drivers retained as permanent regression sweep at `tests/feature-177-task-output*.eval.ts`. User-driven root-cause diagnosis (C5 -60pp is a systemic prompt design problem, not a wording issue) produced **ADR-033** (claudecode-Style Prompt Design Principles — qualitative criteria / single-concept sentences / sparing ✗ + WHY / no enumerated taxonomies / no version metadata in prompt body) and the v0.7.42 hygiene sweep below.
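
The two runtime pieces that stay on after the revert, a capped in-memory buffer (cap=200) and the `KODAX_TASK_OUTPUT_PROMPT='1'` opt-in, can be sketched as follows. The class and function names are hypothetical, and the real `ChildProgressSnapshot` ring buffer may be implemented differently.

```typescript
// Hypothetical bounded buffer: keeps only the most recent `cap` items.
class BoundedRing<T> {
  private readonly cap: number;
  private items: T[] = [];

  constructor(cap: number) {
    if (!Number.isInteger(cap) || cap < 1) {
      throw new RangeError("cap must be a positive integer");
    }
    this.cap = cap;
  }

  push(item: T): void {
    this.items.push(item);
    // Drop the oldest entry once the cap is exceeded.
    if (this.items.length > this.cap) this.items.shift();
  }

  snapshot(): readonly T[] {
    return [...this.items];
  }
}

// Env-gated opt-in: only the exact string "1" enables the RULE D prompt text.
// The env map is passed in so the check is easy to test.
function ruleDPromptEnabled(env: Record<string, string | undefined>): boolean {
  return env["KODAX_TASK_OUTPUT_PROMPT"] === "1";
}
```

An SDK consumer would call the check with the process environment, for example `ruleDPromptEnabled(process.env)` in Node.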
91
+
92
+ ### Cancelled
93
+
94
+ - **FEATURE_094 — Deep Anti-Escape Hardening** (2026-05-19, see [memory](../memory/project_feature_094_cancelled.md)). Necessity probe (5 alias × 3 case × 3 run = 43 probes) measured **0/43 escape rate** across the canonical 5-alias panel — far below the cancel threshold (<5% AND <15%). The post-v0.7.26 layered defense (P0 system prompt + P2a `multi_edit` + P2b `max_output_tokens` write-turn cap) combined with FEATURE_152 (bash AST migration) + FEATURE_158 (signal-based classifier) + FEATURE_169 (pull-tool prompt hardening) absorbed the bypass surface that motivated the original 2026-04 design (~15% bypass at that time). Probe retained as permanent regression sweep: `tests/feature-094-necessity-probe.eval.ts` + `benchmark/datasets/feature-094-necessity-probe/cases.ts`. Escape rate **must** stay 0%; `>5%` reopens FEATURE_094.
95
+
96
+ ### Internal / architecture
97
+
98
+ - **ADR-029 — AMA Compaction Trigger Parity (Top-of-Loop)** documents the FEATURE_179 lifecycle move.
99
+ - **ADR-031 — Task-Level Hits Ledger and Cross-Session Memdir as Independent Layers** documents the FEATURE_185 vs FEATURE_124 (v0.7.43 memdir) boundary.
100
+ - **ADR-032 — SDK Embedder Surface Closure (FEATURE_186, v0.7.42)** documents the 8-phase atomic execution + no-dual-route + dynamic config-path + metadata-driven plan-mode gate + Phase 7 vs Phase 8 (capability-provider-shape vs manager-shape) design decisions.
101
+ - **ADR-034 — claudecode-Parity dispatch_child Architecture (FEATURE_188, v0.7.42)** documents the forced-worktree drop + qualitative dispatchRules + write-child Coordination briefing. Three dead assumptions retired (Evaluator review-at-merge / "failed rollback needs worktree" / "parallel writes must conflict"). claudecode's `isolation:'worktree'` opt-in is the precedent; KodaX picks user-directed prompt-level peer coordination instead of an explicit opt-in toggle to keep the dispatch friction low.
102
+ - **ADR-033 hygiene sweep — Worker `dispatchRules` claudecode-style refactor**. Two commits in v0.7.42 release window apply ADR-033 principles systemically on top of FEATURE_188's qualitative swap:
103
+ - **PLAN-FIRST trigger qualitative swap** (commit `5569c49c`, `worker-role-prompt.ts:212`). `≥3 children` → `multiple children`. Panel 95/100 cells empty-binding (floor saturation analog of `feedback_pre_registered_gate_saturation`); audit DATA VALID (plan_first 10.0% at threshold / dispatch_intent 0.0%); per-alias gate met; aggregate +3/100. Policy alignment, not behavioral change.
104
+ - **FAN-OUT PLAN GRANULARITY block 18-line → claudecode 3-bullet** (commit `1e60eeb0`, `worker-role-prompt.ts:210-216`). 18-line block → 4-line (−57% chars); deletes 6 × ✗ anti-patterns + 5 × enumerated label + WORKED EXAMPLE code block + version metadata. Layer 2 panel C4 baseline 0/25 dispatch vs claudecode 7/25 (judge view); 5/5 alias dispatch Δ ≥ 0. mmx C4 -2 cell strict gate failure overridden via evidence-driven SHIP (baseline saturation in plan-without-dispatch case).
105
+ - **Doc reconciliation: FEATURE_184 design relocation v0.7.45.md → v0.7.42.md** (commit `ac4d0267`). FEATURE_184 was drafted as v0.7.45 then shipped to v0.7.42's release window 2026-05-21; the design doc is relocated to match shipped reality. Git history of the 28 v0.7.45-tagged commits is preserved as-is — only the `docs/features/v0.7.{42,45}.md` files were rewritten.
106
+ - **build pipeline: `build-dts.mjs` self-test** (Phase 1 of FEATURE_186). Builds a CI guard against `@kodax-ai/*` internal-import leaks in any of the 7 entry `.d.ts` files (root + 6 subpaths). POSITIVE/NEGATIVE sample regex self-test + hard-assert grep on each built entry — exits 1 if any leak found. Prevents the v0.7.40 publish hazard from reaching the tarball again.
107
+ - **`@kodax-ai/kodax/mcp` subpath** (Phase 7 of FEATURE_186). Sixth SDK subpath; thin re-export of `@kodax-ai/mcp`. The three build-pipeline sites (`build-bundle.mjs` `sdkEntryNames` / `build-dts.mjs` `sdkEntries` / `release.mjs` `pkg.exports`) and the `release.mjs` publishConfig wiring are all kept in sync.
108
+ - **Cancelled features tracker hygiene**: FEATURE_094 row updated in `docs/FEATURE_LIST.md`; tracker entry shows `Cancelled 2026-05-19` with necessity-probe rationale + probe retention pointer.
109
+ - **`@kodax-ai/coding` MCP barrel** — `registerConfiguredMcpCapabilityProvider` + `McpCapabilityProvider` etc. still re-exported through coding for backward compatibility; new `@kodax-ai/kodax/mcp` subpath is the cleaner entry going forward.
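
The `build-dts.mjs` self-test described in this section (a regex checked against positive and negative samples, then applied to each built entry) could be sketched along these lines. The regex, the sample strings, and the function names are all assumptions; in particular, the sketch assumes that specifiers under the published `@kodax-ai/kodax` package are allowed and every other `@kodax-ai/*` specifier counts as a leak.

```typescript
// Hypothetical leak pattern: matches `from "..."`, `import("...")`, and bare
// `import "..."` specifiers under @kodax-ai/, except @kodax-ai/kodax itself.
const INTERNAL_IMPORT =
  /(?:from\s+|import\s*\(\s*|import\s+)["']@kodax-ai\/(?!kodax["'/])[^"']+["']/g;

// Returns every leaking specifier found in one built .d.ts source.
function findInternalImportLeaks(dtsSource: string): string[] {
  return dtsSource.match(INTERNAL_IMPORT) ?? [];
}

// Self-test: the pattern must flag every positive sample and no negative one.
function selfTestLeakPattern(): boolean {
  const positive = [
    'import { X } from "@kodax-ai/coding";',
    "type T = import('@kodax-ai/agent').Foo;",
  ];
  const negative = [
    'import { Y } from "@kodax-ai/kodax/mcp";',
    'import fs from "node:fs";',
  ];
  return (
    positive.every((s) => findInternalImportLeaks(s).length > 0) &&
    negative.every((s) => findInternalImportLeaks(s).length === 0)
  );
}
```

A build script using this shape would run the self-test first, then scan each entry file and exit non-zero if any scan returns a non-empty list.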
110
+
111
+ ### Breaking changes
112
+
113
+ - **`LocalToolDefinition.sideEffect` is now required** (Phase 4 of FEATURE_186, commit `7defd65f`). SDK consumers who construct custom `LocalToolDefinition` objects via `registerTool({...})` must now include a `sideEffect: 'readonly' | 'mutates-fs' | 'mutates-shell' | 'mutates-network' | 'mutates-state'` field. tsc will fail on pre-v0.7.42 consumer code until this field is added. The most-defensive default for custom tools is `'mutates-state'`; `'readonly'` is appropriate only for tools with NO observable effects on the system.
114
+ - **`@kodax-ai/coding` exports new types**: `ToolSideEffect`, `KodaXSessionControl`, `KodaXSessionMutators`. These are additive (no rename); existing imports unaffected.
115
+ - **FEATURE_188 (ADR-034) — `dispatch_child_task` no longer auto-creates a worktree for write children**. Write children now share the parent agent's `executionCwd` + `gitRoot` (per-file `backups` Map remains the rollback substrate, and the write-child briefing now carries a "Coordination with peers" section instructing the child to STOP-and-report if peer-conflict cannot be ruled out). The `KodaXChildExecutionResult.worktreePaths?: ReadonlyMap<string,string>` field is removed; the `KodaXManagedTaskRuntimeState.childWriteWorktreePaths` field is removed; the `KodaXToolExecutionContext.registerChildWriteWorktrees?` callback is removed; the `WriteChildDiff` interface + `buildEvaluatorMergePrompt` / `collectWriteChildDiffs` / `cherryPickWorktree` / `cleanupWorktrees` helpers (all dead since FEATURE_184 ADR-030 retired the Evaluator role) are removed. `toolWorktreeCreate` / `toolWorktreeRemove` tools themselves stay in the registry — they still serve the user-explicit `EnterWorktreeTool` / `ExitWorktreeTool` flow. SDK consumers reading `worktreePaths` for diff inspection must instead consume `evidence` / `mergedFindings`. See [ADR-034](docs/ADR.md#adr-034-claudecode-parity-dispatch_child-architecture--drop-forced-worktree--prompt-level-conflict-awareness-feature_188-v0742).
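
For the `sideEffect` breaking change, a migration might look like the sketch below. `LocalToolDefinitionSketch` is a deliberately minimal stand-in (the real `LocalToolDefinition` has other fields that are not shown in this changelog), and `withDefaultSideEffect` is a hypothetical helper; only the five `sideEffect` values and the advice about defaults come from the entry.

```typescript
// The union of values listed in the changelog entry.
type ToolSideEffect =
  | "readonly"
  | "mutates-fs"
  | "mutates-shell"
  | "mutates-network"
  | "mutates-state";

// Hypothetical stand-in for LocalToolDefinition; real fields may differ.
interface LocalToolDefinitionSketch {
  name: string;
  description: string;
  sideEffect: ToolSideEffect; // now required
}

// Pre-v0.7.42 shape, where the field could be missing.
type LegacyToolDefinition = Omit<LocalToolDefinitionSketch, "sideEffect"> & {
  sideEffect?: ToolSideEffect;
};

// Migration helper: falls back to the most defensive value when unset.
function withDefaultSideEffect(tool: LegacyToolDefinition): LocalToolDefinitionSketch {
  return { ...tool, sideEffect: tool.sideEffect ?? "mutates-state" };
}

// A tool with no observable effects may declare itself readonly explicitly.
const wordCountTool: LocalToolDefinitionSketch = {
  name: "word_count",
  description: "Counts words in a string passed as input.",
  sideEffect: "readonly",
};

const migratedTool = withDefaultSideEffect({
  name: "cache_warm",
  description: "Warms an in-process cache.",
});
```

Defaulting to `"mutates-state"` follows the entry's guidance that it is the most defensive choice; each tool should then be reviewed and narrowed to `"readonly"` only when it truly has no observable effects.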
116
+
117
+ ### Test coverage delta
118
+
119
+ - **+158 new unit tests** from FEATURE_186 alone (32 `getAppDataDir` + 18 tool-metadata helpers + 21 custom-provider CRUD + 20 RunningSession + 26 MCP CRUD + 20 McpManager + 21 plan-mode gate / skill-resolver / build-dts self-test).
120
+ - Plus tests added by FEATURE_173 (12 stable-contract), FEATURE_175 Slice 2 (6 net), FEATURE_177 cache (per-task LRU), FEATURE_178 stall detector (L1+L2), FEATURE_179 lifecycle test, FEATURE_180 dedup test, FEATURE_181 / 182 / 183 single-case fixes, FEATURE_185 enrichment (13 file-tracker + 9 post-compact + 33 result-extractors + Layer 2 eval driver).
121
+ - Coding 2704/2704 + repl 1431/1432 green across the cycle. Build:bundle + build:dts clean for all 7 subpath entries.
122
+
7
123
  ## [0.7.41] - 2026-05-19
8
124
 
9
125
  ### Theme
@@ -267,8 +383,8 @@ Production trace showed Evaluator emitting `emit_verdict` accept before children
267
383
 
268
384
  ### Known not-in-scope
269
385
 
270
- - **Mid-tool-call prompt injection** (streaming a new user message to the LLM while a tool is still executing) — conflicts with cancel-then-reissue boundaries; deferred to v0.7.43+.
271
- - **Soft-pause state machine** — FEATURE_111 v0.7.43 scope.
386
+ - **Mid-tool-call prompt injection** (streaming a new user message to the LLM while a tool is still executing) — conflicts with cancel-then-reissue boundaries. **NOT in v0.7.43 scope** (FEATURE_124 + FEATURE_189 fill the release window); blocked on FEATURE_115 stabilization before re-entry. Earliest realistic window: v0.7.46+.
387
+ - **Soft-pause state machine** — FEATURE_111 cancelled, absorbed into FEATURE_115 (per FEATURE_LIST.md row 130). v0.7.43 slot reallocated to FEATURE_124.
272
388
  - **Council / multi-advisor consult** — FEATURE_105 v0.7.46 scope.
273
389
  - **Read-child cost-stripping** — v1 of FEATURE_117 was abandoned; read children already minimal.
274
390
 
@@ -1267,7 +1383,7 @@ repl → coding, skills
1267
1383
  ### Tests
1268
1384
  - Added / expanded tests for `task-engine`, `reasoning`, `tool-display`, `live-streaming`, `StatusBar`, `invocation-runtime`, `types-legacy`, and `InkREPL.interrupted`
1269
1385
 
1270
- <!-- last-sync: HEAD -->
1386
+ <!-- last-sync: a8258d29 -->
1271
1387
 
1272
1388
  ### Added
1273
1389
  - **Repository intelligence substrate (FEATURE_018)**: Task-aware repository intelligence layer under `.agent/repo-intelligence/` with durable artifacts — `repo-overview.json`, `changed-scope.json`, `module-index.json`, `symbol-index.json`, `process-index.json`, `repo-intelligence-manifest.json` — supporting incremental refresh, freshness metadata, and language-tiered extraction (TS/JS via AST, Python, Go, Rust, Java, C++)