jeo-code 0.6.26 → 0.6.28

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (21)

package/CHANGELOG.md +25 -0
package/README.ja.md +2 -6
package/README.ko.md +2 -6
package/README.md +2 -6
package/README.zh.md +2 -6
package/package.json +1 -1
package/src/agent/compaction.ts +10 -1
package/src/agent/engine.ts +62 -16
package/src/agent/loop.ts +3 -0
package/src/ai/model-manager.ts +6 -8
package/src/ai/providers/anthropic.ts +114 -21
package/src/ai/providers/antigravity.ts +6 -0
package/src/ai/providers/errors.ts +18 -0
package/src/ai/providers/gemini.ts +84 -28
package/src/ai/providers/openai-compatible-catalog.ts +10 -4
package/src/ai/providers/openai-responses.ts +76 -19
package/src/ai/types.ts +55 -2
package/src/commands/launch/flags.ts +5 -2
package/src/commands/launch.ts +119 -25
package/src/tui/app.ts +38 -6
package/src/tui/components/ascii-art.ts +38 -45

package/CHANGELOG.md CHANGED

@@ -6,6 +6,31 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 The README mirrors the latest 5 entries — regenerate with `bun run changelog:sync`.
+## [0.6.28] - 2026-06-19
+_Signed thinking-block replay: native reasoning is now sent BACK to providers across steps/turns, restoring multi-step reasoning continuity (gajae parity)._
+### Added
+- **Provider-native reasoning replay across all three first-party providers.** jeo now captures each provider's opaque/signed reasoning artifact during streaming and replays it on later turns to the SAME provider+model, so the model keeps its chain of thought across tool steps instead of re-deriving it. New `Message.reasoningArtifacts` plus structured `Message.toolUse` / `toolResults` (stable ids) let capable adapters reconstruct **native** tool blocks (the key to continuity — plain-text tool feedback makes Claude strip prior thinking):
+  - **Anthropic**: captures `signature_delta` + `redacted_thinking`; replays `thinking`(+signature) → `tool_use` → `tool_result` blocks (gated on same-model + thinking-enabled).
+  - **OpenAI Responses**: requests `include: ["reasoning.encrypted_content"]` (store stays false), captures reasoning item id+encrypted_content, replays native `reasoning` + `function_call` + `function_call_output` items.
+  - **Gemini**: captures per-part `thoughtSignature`, replays native `functionCall`(+thoughtSignature) / `functionResponse` parts (coalescing-safe). This was previously deferred — structured `toolUse` unblocks the functionCall binding.
+- **Fail-safe strip-and-retry.** A 400 naming a thinking/signature/encrypted/reasoning field retries the step ONCE with artifacts stripped (plain history), so an expired signature or edited history can never wedge a turn. Per provider (Anthropic/OpenAI/Gemini).
+### Changed
+- **Reasoning artifacts ride the session record + token accounting.** `reasoningArtifacts` round-trips through session save/load (so `/resume` preserves replay continuity) and counts toward `estimateMessageTokens` (OpenAI encrypted blobs are KB-scale) so compaction/overflow stay honest. Markdown export is unchanged (artifacts are opaque). The engine's ~11 assistant-push sites are unified behind `pushAssistantTurn`, so every step (not just the final reply) carries its reasoning + artifacts. Antigravity is explicitly out of scope (no capture/replay; the provider-keyed match guard prevents any cross-adapter leakage).
+## [0.6.27] - 2026-06-19
+_Ponytail pass on the reasoning-tier mapper, plus a real-tmux verification of `jeo --tmux`._
+### Changed
+- **`thinkingToReasoningEffort` collapsed to its essential mapping (ponytail/YAGNI pass).** The four redundant pass-through branches (`minimal`/`low`/`medium`/`high` each returning themselves) are now a single `level === "xhigh" ? "high" : level` — behavior-identical (every level still maps to a genuine reasoning effort; only an unset level stays off), 8 fewer lines, fully covered by the existing `model-manager`/`round-b` contract tests. Reasoning continues to activate at EVERY thinking level (gajae parity).
+### Fixed
+- **`/agents <role> provider <name>` now accepts every registered provider and always shows a model list (jeo team role config).** Three compounding bugs surfaced via a real `jeo --tmux` session pinning a role to `groq`: (1) `isProviderName` was an unsound type guard hardcoding only 5 names (`anthropic|openai|gemini|antigravity|ollama`), so `/agents <role> provider groq` (and every other OpenAI-compat provider — deepseek, openrouter, mistral, …) was rejected as invalid usage; it now validates against the canonical `PROVIDER_NAMES` registry. (2) Live discovery only returns ids for a logged-in, reachable provider, and the catalog backfill applied only to OAuth-source providers — so an unconfigured API-key provider showed an EMPTY model list and silently pinned a bare default. (3) The 24 OpenAI-compat providers carry no capability-catalog rows, so even the catalog fallback was empty for them. The new `providerPickEntries` helper now climbs live ids → static catalog → the provider's known default model, so the list is never empty, and the source is labeled (`Live`/`Catalog … log in to list live models`). Verified end-to-end in a real tmux session (`#1 groq/llama-3.3-70b-versatile` listed and pinned). Covered by `test/provider-pick-entries.test.ts` and a new `isProviderName` regression test in `test/launch-flags.test.ts`.
+### Verified
+- **`jeo --tmux` session profile confirmed against the real `tmux` binary.** The gjc-parity profile (`mouse on`, `@jeo-profile`/`@jeo-branch`/`@jeo-project` markers, `set-clipboard on`, copy-mode `mode-style`) was exercised on an isolated `-L` socket using the exact `=name:` target syntax the launch code emits — every option set and read back correctly. `test/tmux.test.ts` passes 12/0 alongside the full 1645/0 suite.
 ## [0.6.26] - 2026-06-19
 _The forge emblem is redrawn again as the mascot crayfish, foregrounding its signature pincer claws (집게)._
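
The "fail-safe strip-and-retry" entry in the 0.6.28 changelog above can be illustrated with a minimal sketch. This is not the package's implementation (the `errors.ts` hunk is not shown here); `HttpErrorLike`, `isArtifactRejection`, and `callWithStripRetry` are hypothetical names, and matching on the error message text is an assumption about how a provider's 400 might name the offending field.

```typescript
// Hypothetical error shape for illustration; real provider errors may differ.
interface HttpErrorLike {
  status: number;
  message: string;
}

// The changelog names these four field families as retry triggers.
const ARTIFACT_FIELD = /thinking|signature|encrypted|reasoning/i;

// True only for a 400 whose message mentions a reasoning-artifact field.
function isArtifactRejection(err: HttpErrorLike): boolean {
  return err.status === 400 && ARTIFACT_FIELD.test(err.message);
}

// Narrow an unknown thrown value to the hypothetical error shape.
function isHttpErrorLike(err: unknown): err is HttpErrorLike {
  if (typeof err !== "object" || err === null) return false;
  const e = err as Record<string, unknown>;
  return typeof e.status === "number" && typeof e.message === "string";
}

// Send once with artifacts; on an artifact rejection, retry exactly once
// with artifacts stripped (plain history). Any other failure propagates.
async function callWithStripRetry<T>(
  send: (stripArtifacts: boolean) => Promise<T>,
): Promise<T> {
  try {
    return await send(false);
  } catch (err) {
    if (isHttpErrorLike(err) && isArtifactRejection(err)) {
      return send(true);
    }
    throw err;
  }
}
```

Because the retry path calls `send(true)` outside the `try`, a second failure is not retried again, which matches the "ONCE" wording in the changelog.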

package/README.ja.md CHANGED

@@ -2,10 +2,6 @@
   <img src="assets/hero.png" alt="jeo-code 自律コーディングエージェントのヒーローイラスト" width="100%" />
 </p>
-<p align="center">
-  <img src="assets/icon.png" alt="jeo-code icon" width="96" />
-</p>
 <h1 align="center">jeo-code (jeo)</h1>
 <p align="center">
@@ -204,11 +200,11 @@ CI は `.github/workflows/npm-publish.yml` で公開します — GitHub リリ
 ## 変更履歴 (Changelog)
 <!-- CHANGELOG:START (auto-generated from CHANGELOG.md — run `bun run changelog:sync`) -->
+- **[0.6.28]** (2026-06-19) — Signed thinking-block replay: native reasoning is now sent BACK to providers across steps/turns, restoring multi-step reasoning continuity (gajae parity).
+- **[0.6.27]** (2026-06-19) — Ponytail pass on the reasoning-tier mapper, plus a real-tmux verification of `jeo --tmux`.
 - **[0.6.26]** (2026-06-19) — The forge emblem is redrawn again as the mascot crayfish, foregrounding its signature pincer claws (집게).
 - **[0.6.25]** (2026-06-19) — Reasoning works at every thinking level (gajae parity), and the forge emblem is redrawn as the neon-lens coding wizard.
 - **[0.6.24]** (2026-06-19) — `/provider` opens an interactive onboarding selector (OAuth vs API-compatible), and OpenAI-compatible backends gain per-vendor native-reasoning formats.
-- **[0.6.23]** (2026-06-19) — Live reasoning/thinking streams in the TUI across every provider, three new OpenAI-compatible backends (LM Studio, xAI, Kimi) join the auth/discovery/catalog surface, and Gemini gains native function-calling.
-- **[0.6.22]** (2026-06-18) — Extended-thinking activation is now consistent across providers: a `low` session thinking level enables reasoning everywhere.
 See [CHANGELOG.md](CHANGELOG.md) for the full history.
 <!-- CHANGELOG:END -->

package/README.ko.md CHANGED

@@ -2,10 +2,6 @@
   <img src="assets/hero.png" alt="jeo-code 자율 코딩 에이전트 히어로 일러스트" width="100%" />
 </p>
-<p align="center">
-  <img src="assets/icon.png" alt="jeo-code icon" width="96" />
-</p>
 <h1 align="center">jeo-code (jeo)</h1>
 <p align="center">
@@ -204,11 +200,11 @@ CI는 `.github/workflows/npm-publish.yml`로 배포합니다 — GitHub 릴리
 ## 변경 이력 (Changelog)
 <!-- CHANGELOG:START (auto-generated from CHANGELOG.md — run `bun run changelog:sync`) -->
+- **[0.6.28]** (2026-06-19) — Signed thinking-block replay: native reasoning is now sent BACK to providers across steps/turns, restoring multi-step reasoning continuity (gajae parity).
+- **[0.6.27]** (2026-06-19) — Ponytail pass on the reasoning-tier mapper, plus a real-tmux verification of `jeo --tmux`.
 - **[0.6.26]** (2026-06-19) — The forge emblem is redrawn again as the mascot crayfish, foregrounding its signature pincer claws (집게).
 - **[0.6.25]** (2026-06-19) — Reasoning works at every thinking level (gajae parity), and the forge emblem is redrawn as the neon-lens coding wizard.
 - **[0.6.24]** (2026-06-19) — `/provider` opens an interactive onboarding selector (OAuth vs API-compatible), and OpenAI-compatible backends gain per-vendor native-reasoning formats.
-- **[0.6.23]** (2026-06-19) — Live reasoning/thinking streams in the TUI across every provider, three new OpenAI-compatible backends (LM Studio, xAI, Kimi) join the auth/discovery/catalog surface, and Gemini gains native function-calling.
-- **[0.6.22]** (2026-06-18) — Extended-thinking activation is now consistent across providers: a `low` session thinking level enables reasoning everywhere.
 See [CHANGELOG.md](CHANGELOG.md) for the full history.
 <!-- CHANGELOG:END -->

package/README.md CHANGED

@@ -2,10 +2,6 @@
   <img src="assets/hero.png" alt="jeo-code autonomous coding-agent hero illustration" width="100%" />
 </p>
-<p align="center">
-  <img src="assets/icon.png" alt="jeo-code icon" width="96" />
-</p>
 <h1 align="center">jeo-code (jeo)</h1>
 <p align="center">
@@ -204,11 +200,11 @@ Required npm token permissions (repository secret `NPM_TOKEN`):
 ## Changelog
 <!-- CHANGELOG:START (auto-generated from CHANGELOG.md — run `bun run changelog:sync`) -->
+- **[0.6.28]** (2026-06-19) — Signed thinking-block replay: native reasoning is now sent BACK to providers across steps/turns, restoring multi-step reasoning continuity (gajae parity).
+- **[0.6.27]** (2026-06-19) — Ponytail pass on the reasoning-tier mapper, plus a real-tmux verification of `jeo --tmux`.
 - **[0.6.26]** (2026-06-19) — The forge emblem is redrawn again as the mascot crayfish, foregrounding its signature pincer claws (집게).
 - **[0.6.25]** (2026-06-19) — Reasoning works at every thinking level (gajae parity), and the forge emblem is redrawn as the neon-lens coding wizard.
 - **[0.6.24]** (2026-06-19) — `/provider` opens an interactive onboarding selector (OAuth vs API-compatible), and OpenAI-compatible backends gain per-vendor native-reasoning formats.
-- **[0.6.23]** (2026-06-19) — Live reasoning/thinking streams in the TUI across every provider, three new OpenAI-compatible backends (LM Studio, xAI, Kimi) join the auth/discovery/catalog surface, and Gemini gains native function-calling.
-- **[0.6.22]** (2026-06-18) — Extended-thinking activation is now consistent across providers: a `low` session thinking level enables reasoning everywhere.
 See [CHANGELOG.md](CHANGELOG.md) for the full history.
 <!-- CHANGELOG:END -->

package/README.zh.md CHANGED

@@ -2,10 +2,6 @@
   <img src="assets/hero.png" alt="jeo-code 自主编码代理主视觉插图" width="100%" />
 </p>
-<p align="center">
-  <img src="assets/icon.png" alt="jeo-code icon" width="96" />
-</p>
 <h1 align="center">jeo-code (jeo)</h1>
 <p align="center">
@@ -204,11 +200,11 @@ CI 通过 `.github/workflows/npm-publish.yml` 发布 — GitHub 发布 release
 ## 更新日志 (Changelog)
 <!-- CHANGELOG:START (auto-generated from CHANGELOG.md — run `bun run changelog:sync`) -->
+- **[0.6.28]** (2026-06-19) — Signed thinking-block replay: native reasoning is now sent BACK to providers across steps/turns, restoring multi-step reasoning continuity (gajae parity).
+- **[0.6.27]** (2026-06-19) — Ponytail pass on the reasoning-tier mapper, plus a real-tmux verification of `jeo --tmux`.
 - **[0.6.26]** (2026-06-19) — The forge emblem is redrawn again as the mascot crayfish, foregrounding its signature pincer claws (집게).
 - **[0.6.25]** (2026-06-19) — Reasoning works at every thinking level (gajae parity), and the forge emblem is redrawn as the neon-lens coding wizard.
 - **[0.6.24]** (2026-06-19) — `/provider` opens an interactive onboarding selector (OAuth vs API-compatible), and OpenAI-compatible backends gain per-vendor native-reasoning formats.
-- **[0.6.23]** (2026-06-19) — Live reasoning/thinking streams in the TUI across every provider, three new OpenAI-compatible backends (LM Studio, xAI, Kimi) join the auth/discovery/catalog surface, and Gemini gains native function-calling.
-- **[0.6.22]** (2026-06-18) — Extended-thinking activation is now consistent across providers: a `low` session thinking level enables reasoning everywhere.
 See [CHANGELOG.md](CHANGELOG.md) for the full history.
 <!-- CHANGELOG:END -->

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "jeo-code",
-  "version": "0.6.26",
+  "version": "0.6.28",
   "description": "Clean, highly optimized AI coding agent using spec-first loop",
   "type": "module",
   "main": "src/cli.ts",

package/src/agent/compaction.ts CHANGED

@@ -78,7 +78,16 @@ const messageTokenCache = new WeakMap<Message, number>();
 export function estimateMessageTokens(msg: Message): number {
   const hit = messageTokenCache.get(msg);
   if (hit !== undefined) return hit;
-  const n = estimateTokens(msg.role) + estimateTokens(msg.content) + (msg.images?.length ?? 0) * IMAGE_TOKEN_ESTIMATE + 1;
+  let n = estimateTokens(msg.role) + estimateTokens(msg.content) + (msg.images?.length ?? 0) * IMAGE_TOKEN_ESTIMATE + 1;
+  // Native reasoning artifacts (signature / encrypted_content / thought text) are NOT in
+  // `content` but become REAL input tokens once an adapter replays them — count them so
+  // the context meter and compaction trigger stay honest (OpenAI encrypted blobs are KB-scale).
+  // toolUse/toolResults/toolResultExtra are already reflected in `content`, so they are not re-added.
+  for (const a of msg.reasoningArtifacts ?? []) {
+    n += estimateTokens(a.text ?? "") + estimateTokens(a.signature ?? "")
+      + estimateTokens(a.redacted ?? "") + estimateTokens(a.thoughtSignature ?? "")
+      + estimateTokens(a.encrypted ?? "");
+  }
   messageTokenCache.set(msg, n);
   return n;
 }
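
To see why the compaction.ts hunk above matters, here is a small self-contained sketch of the same accounting idea. The types and the `estimateTokensSketch` heuristic (roughly four characters per token) are assumptions for illustration only; the package's real `estimateTokens`, image handling, and cache are not reproduced.

```typescript
// Hypothetical, trimmed-down shapes; field names follow the hunk above.
interface ReasoningArtifactSketch {
  text?: string;
  signature?: string;
  redacted?: string;
  thoughtSignature?: string;
  encrypted?: string;
}

interface MessageSketch {
  role: string;
  content: string;
  reasoningArtifacts?: ReasoningArtifactSketch[];
}

// Assumption: about 4 characters per token. The real estimator may differ.
const estimateTokensSketch = (s: string): number => Math.ceil(s.length / 4);

// Count role + content + 1, then add every opaque artifact field, since a
// replayed artifact is sent as input even though it never appears in `content`.
function estimateWithArtifacts(msg: MessageSketch): number {
  let n = estimateTokensSketch(msg.role) + estimateTokensSketch(msg.content) + 1;
  for (const a of msg.reasoningArtifacts ?? []) {
    for (const field of [a.text, a.signature, a.redacted, a.thoughtSignature, a.encrypted]) {
      n += estimateTokensSketch(field ?? "");
    }
  }
  return n;
}

const plainTurn: MessageSketch = { role: "assistant", content: "ok" };
const replayTurn: MessageSketch = {
  ...plainTurn,
  reasoningArtifacts: [{ encrypted: "x".repeat(4096) }],
};
const plainEstimate = estimateWithArtifacts(plainTurn);
const replayEstimate = estimateWithArtifacts(replayTurn);
```

Under this heuristic a 4 KB encrypted blob adds 1024 tokens to a turn whose visible content is two characters, which is the kind of gap that would let the context meter drift if artifacts were left out.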

package/src/agent/engine.ts CHANGED

@@ -34,11 +34,30 @@ async function invokeCallLlm(history: Message[], options: {
   onRetry?: (attempt: number, err: unknown, delayMs: number) => void;
   onToken?: (delta: string) => void;
   onReasoning?: (delta: string) => void;
+  onReasoningArtifact?: (artifact: import("../ai/types").ReasoningArtifact) => void;
   tools?: import("../ai/types").NativeToolSchema[];
 }): Promise<string> {
   const mod = await import("./loop");
   return mod.callLlm(history, options);
 }
+/** Push an assistant turn, attaching the step's reasoning + native replay records when
+ *  present. Centralizes the assistant-push sites so reasoning/artifacts attach uniformly
+ *  (not just the final reply). Omits empty fields so back-compat serialization and the
+ *  identity-keyed token cache are unaffected. */
+function pushAssistantTurn(
+  history: Message[],
+  content: string,
+  reasoning: string,
+  artifacts: import("../ai/types").ReasoningArtifact[],
+  toolUse?: import("../ai/types").ToolUseRecord[],
+): void {
+  const msg: Message = { role: "assistant", content };
+  if (reasoning.trim()) msg.reasoning = reasoning;
+  if (artifacts.length) msg.reasoningArtifacts = artifacts;
+  if (toolUse && toolUse.length) msg.toolUse = toolUse;
+  history.push(msg);
+}
 export interface ToolInvocation {
   tool: string;
   arguments?: Record<string, any>;
@@ -176,6 +195,9 @@ export interface AgentLoopEvents {
   /** Accumulated native reasoning/thinking text so far — drives a transient dimmed
    *  "thinking" view. Only requested when a consumer (TUI) attaches. */
   onReasoningStream?(textSoFar: string): void;
+  /** Each provider-native reasoning ARTIFACT as it is captured (signature / thoughtSignature /
+   *  reasoning item). Lets the final-reply path (launch.ts) persist artifacts for replay. */
+  onReasoningArtifactStream?(artifact: import("../ai/types").ReasoningArtifact): void;
   /** Step-budget change (gjc-style retry flow): the limit was extended because the
    *  turn is making progress. `limit` is the new max; `reason` is display-ready. */
   onBudget?(limit: number, reason: string): void;
@@ -345,7 +367,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
         );
         const consolidated = wrapUp.trim();
         if (consolidated) {
-          history.push({ role: "assistant", content: consolidated });
+          pushAssistantTurn(history, consolidated, "", []);
           return finish({
             done: false,
             steps: step,
@@ -493,6 +515,14 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
     const onReasoning = ev.onReasoningStream
       ? (delta: string) => { reasonBuf += delta; ev.onReasoningStream!(reasonBuf); }
       : undefined;
+    // Capture provider-native reasoning ARTIFACTS for replay (always — independent of any
+    // TUI display sink). Stays scoped to THIS step so a later consolidation push can't
+    // inherit a prior step's signatures.
+    const artifactBuf: import("../ai/types").ReasoningArtifact[] = [];
+    const onReasoningArtifact = (a: import("../ai/types").ReasoningArtifact) => {
+      artifactBuf.push(a);
+      ev.onReasoningArtifactStream?.(a);
+    };
     let responseText: string;
     try {
       responseText = await invokeCallLlm(history, {
@@ -510,6 +540,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
               onUsage: u => { acc.inputTokens += u.inputTokens ?? 0; acc.outputTokens += u.outputTokens ?? 0; sawUsage = true; },
               onToken,
               onReasoning,
+              onReasoningArtifact,
               // Make provider auto-retry visible: previously a rate-limited call sat in a
               // silent backoff wait, then surfaced "auto-retry was exhausted" with no trace
               // of the retries that DID happen.
@@ -604,10 +635,10 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
       const trimmed = responseText.trim();
       parseFailures++;
       if (trimmed && (!trimmed.includes("{") || parseFailures > MAX_PARSE_BOUNCES)) {
-        history.push({ role: "assistant", content: responseText });
+        pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
         return finish({ done: true, steps: step, doneReason: trimmed });
       }
-      history.push({ role: "assistant", content: responseText });
+      pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
       history.push({
         role: "user",
         content:
@@ -654,7 +685,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
           doneReason: `Stopped: the model returned no valid tool call ${MAX_INVALID_CALLS}× (a JSON reply with no valid "tool" or "tools" field). The selected model may be too small to follow the JSON tool protocol — switch to a stronger model with /model.`,
         });
       }
-      history.push({ role: "assistant", content: responseText });
+      pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
       history.push({
         role: "user",
         content: `Your last reply had no "tool" or "tools" field. Reply with exactly one JSON object, e.g. {"tool":"find","arguments":{"globPattern":"src/**"}} or {"tools":[{"tool":"read","arguments":{"filePath":"src/main.ts"}}, ...]}.`,
@@ -674,7 +705,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
     if (toolCalls.length === 1 && toolCalls[0].tool === "done") {
       if (sawMutation && (!sawVerification || pendingHookFailure !== null) && !donePushbackUsed) {
         donePushbackUsed = true; // second done always passes — escape hatch
-        history.push({ role: "assistant", content: responseText });
+        pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
         history.push({
           role: "user",
           content: pendingHookFailure !== null
@@ -696,7 +727,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
         const nudge = await ev.onBeforeDone((toolCalls[0].arguments?.reason as string) ?? "");
         if (nudge) {
           beforeDoneNudgeUsed = true;
-          history.push({ role: "assistant", content: responseText });
+          pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
           history.push({ role: "user", content: nudge });
           ev.onNotice?.("done deferred once — final plan reconciliation requested");
           step++;
@@ -709,7 +740,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
       if (opts.steer) {
         const pending = opts.steer().map(s => (s ?? "").trim()).filter(Boolean);
         if (pending.length) {
-          history.push({ role: "assistant", content: responseText });
+          pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
           for (const text of pending) {
             history.push({
               role: "user",
@@ -754,7 +785,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
       const lastChance = repeatCount === MAX_REPEAT - 1
         ? "This is your LAST attempt: if you emit the same call again the turn will end. "
         : "";
-      history.push({ role: "assistant", content: responseText });
+      pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
       history.push({
         role: "user",
         content:
@@ -784,7 +815,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
       if (!cycleBounceUsed) {
         cycleBounceUsed = true;
         recentStepSigs.length = 0; // fresh window: the correction earns a real retry
-        history.push({ role: "assistant", content: responseText });
+        pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
         history.push({
           role: "user",
           content:
@@ -944,6 +975,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
       );
       // Append the batch's hook diagnostics once so the model can self-correct. Two
       // DISTINCT hooks with identical output collapse to one full block + a cross-ref.
+      let hookExtra = "";
       if (hookDiags.length > 0) {
         const seenHookFeedback = new Set<string>();
         const diagLines: string[] = [];
@@ -956,14 +988,28 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
             diagLines.push(`[post-turn hook "${d.run}" — exit ${d.exitCode}]:\n${truncateToolOutput(d.output)}`);
           }
         }
-        resultBlocks.push(diagLines.join("\n"));
+        hookExtra = diagLines.join("\n");
+        resultBlocks.push(hookExtra);
       }
-      history.push({ role: "assistant", content: responseText });
-      history.push({
-        role: "user",
-        content: resultBlocks.join("\n\n"),
-      });
+      // Structured native replay records: stable ids correlate the assistant tool_use
+      // turn with its tool_result user turn (the string `content` stays the source of
+      // truth for display / compaction / fallback adapters).
+      const idFor = (idx: number) => `call_${step}_${idx}`;
+      const toolUse: import("../ai/types").ToolUseRecord[] = indices.map(idx => ({
+        id: idFor(idx),
+        tool: toolCalls[idx].tool,
+        arguments: toolCalls[idx].arguments ?? {},
+      }));
+      const toolResults: import("../ai/types").ToolResultRecord[] = indices.map((idx, i) => ({
+        id: idFor(idx),
+        output: bodies[i],
+        isError: !results[idx].success,
+      }));
+      pushAssistantTurn(history, responseText, reasonBuf, artifactBuf, toolUse);
+      const resultMsg: Message = { role: "user", content: resultBlocks.join("\n\n"), toolResults };
+      if (hookExtra) resultMsg.toolResultExtra = hookExtra;
+      history.push(resultMsg);
     };
     if (aborted) {
@@ -1053,7 +1099,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
       );
       const consolidated = wrapUp.trim();
       if (consolidated) {
-        history.push({ role: "assistant", content: consolidated });
+        pushAssistantTurn(history, consolidated, "", []);
         return finish({
           done: false,
           steps: budget.limit(),
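
The stable-id pairing introduced in the engine.ts hunks above (`call_${step}_${idx}` shared by a tool call and its result) can be sketched in isolation. The record shapes mirror the field names in the diff, but `pairToolRecords` and its input types are hypothetical, and the sketch simplifies by assuming every call in the step has exactly one outcome at the same index.

```typescript
// Shapes modeled on the diff's ToolUseRecord / ToolResultRecord field names.
interface ToolUseSketch {
  id: string;
  tool: string;
  arguments: Record<string, unknown>;
}

interface ToolResultSketch {
  id: string;
  output: string;
  isError: boolean;
}

// Hypothetical inputs: one outcome per call, aligned by index.
interface CallSketch {
  tool: string;
  arguments?: Record<string, unknown>;
}

interface OutcomeSketch {
  output: string;
  success: boolean;
}

// Derive both records from the same id function so an adapter can later
// match each native tool call to its result without parsing text content.
function pairToolRecords(
  step: number,
  calls: CallSketch[],
  outcomes: OutcomeSketch[],
): { toolUse: ToolUseSketch[]; toolResults: ToolResultSketch[] } {
  const idFor = (idx: number): string => `call_${step}_${idx}`;
  const toolUse = calls.map((c, idx) => ({
    id: idFor(idx),
    tool: c.tool,
    arguments: c.arguments ?? {},
  }));
  const toolResults = outcomes.map((o, idx) => ({
    id: idFor(idx),
    output: o.output,
    isError: !o.success,
  }));
  return { toolUse, toolResults };
}

const paired = pairToolRecords(
  3,
  [{ tool: "read" }, { tool: "find", arguments: { globPattern: "src/**" } }],
  [{ output: "file body", success: true }, { output: "no matches", success: false }],
);
```

Deriving ids from the step and index keeps them deterministic, so the same history should produce the same ids when a session is saved and reloaded.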

package/src/agent/loop.ts CHANGED

@@ -26,6 +26,9 @@ export interface ChatOptions {
   onToken?: (delta: string) => void;
   /** Streaming sink for native reasoning/thinking deltas (drives the dimmed live view). */
   onReasoning?: (delta: string) => void;
+  /** Streaming sink for provider-native reasoning ARTIFACTS (signature / thoughtSignature /
+   *  reasoning item id+encrypted) — the replay channel, separate from onReasoning. */
+  onReasoningArtifact?: (artifact: import("../ai/types").ReasoningArtifact) => void;
   /** NATIVE tool-calling function declarations (forwarded to capable adapters). */
   tools?: import("../ai/types").NativeToolSchema[];
 }

package/src/ai/model-manager.ts CHANGED

@@ -100,18 +100,15 @@ export function thinkingMaxTokens(level?: "minimal" | "low" | "medium" | "high"
   return 16000;
 }
-/** Map the thinking level to an OpenAI reasoning-effort tier. `minimal` is preserved as a
- *  genuine (lightest) reasoning effort — NOT collapsed to `low` — so reasoning works at EVERY
- *  thinking level (gajae parity: Minimal is a real effort). Only an unset level returns undefined
- *  (reasoning off). `xhigh` maps to `high`, the deepest tier the provider APIs accept. */
+/** Map the thinking level to an OpenAI reasoning-effort tier. minimal/low/medium/high pass
+ *  through unchanged and xhigh folds to high (the deepest tier the provider APIs accept), so
+ *  reasoning works at EVERY thinking level (gajae parity: minimal is a real effort). Only an
+ *  unset level returns undefined (reasoning off — the explicit /fast path). */
 export function thinkingToReasoningEffort(
   level?: "minimal" | "low" | "medium" | "high" | "xhigh",
 ): "minimal" | "low" | "medium" | "high" | undefined {
   if (!level) return undefined;
-  if (level === "minimal") return "minimal";
-  if (level === "low") return "low";
-  if (level === "high" || level === "xhigh") return "high";
-  return "medium";
+  return level === "xhigh" ? "high" : level;
 }
 /** Describe a model id: alias expansion + the provider it routes to. For `/model` + diagnostics.
@@ -335,6 +332,7 @@ async function resolveCall(options: Partial<CallOptions>, kind: "request" | "str
     signal: options.signal,
     reasoningEffort: options.reasoningEffort ?? thinkingToReasoningEffort(config.thinkingLevel),
     onReasoning: options.onReasoning,
+    onReasoningArtifact: options.onReasoningArtifact,
     tools: options.tools,
   };
   // Caller-supplied retry sink rides on the config-derived retry budget so the
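
The changelog describes the `thinkingToReasoningEffort` rewrite in the model-manager.ts hunk above as behavior-identical. Since the input domain is six values, that claim can be checked exhaustively. The sketch below copies both bodies from the hunk under the illustrative names `effortBefore` and `effortAfter`; those names and the comparison harness are not part of the package.

```typescript
type Level = "minimal" | "low" | "medium" | "high" | "xhigh";
type Effort = "minimal" | "low" | "medium" | "high" | undefined;

// Removed implementation, as shown on the "-" lines of the hunk.
function effortBefore(level?: Level): Effort {
  if (!level) return undefined;
  if (level === "minimal") return "minimal";
  if (level === "low") return "low";
  if (level === "high" || level === "xhigh") return "high";
  return "medium";
}

// New implementation, as shown on the "+" line of the hunk.
function effortAfter(level?: Level): Effort {
  if (!level) return undefined;
  return level === "xhigh" ? "high" : level;
}

// Every possible input: the five levels plus "unset".
const allInputs: (Level | undefined)[] = [undefined, "minimal", "low", "medium", "high", "xhigh"];

// Inputs on which the two versions disagree (expected to be none).
const mismatches = allInputs.filter(l => effortBefore(l) !== effortAfter(l));
```

The only non-identity mapping in either version is `xhigh` to `high`, so the one-line form covers the same cases as the removed branches.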

package/src/ai/providers/anthropic.ts CHANGED

@@ -88,28 +88,76 @@ function anthropicThinkingBudget(effort: CallOptions["reasoningEffort"], maxToke
   return Math.min(budget, Math.max(1024, maxTokens - 1024));
 }
+type AnthropicContentBlock = Record<string, unknown>;
+type AnthropicMessage = { role: string; content: string | AnthropicContentBlock[] };
+/** True when an assistant turn can be replayed as native tool_use + thinking blocks: it has
+ *  structured toolUse AND a same-model Anthropic reasoning artifact that yields at least one
+ *  valid thinking/redacted block, AND thinking is enabled this call. Native tool_use →
+ *  tool_result is what makes Claude KEEP the prior thinking blocks (plain-text tool feedback
+ *  gets them stripped on most models), so this is the core of cross-step reasoning continuity. */
+export function anthropicNativizable(m: Message, model: string, thinkingEnabled: boolean): boolean {
+  return thinkingEnabled
+    && !!m.toolUse?.length
+    && !!m.reasoningArtifacts?.some(a => a.provider === "anthropic" && a.model === model && ((!!a.signature && !!a.text) || !!a.redacted));
+}
+/** Build Anthropic wire messages, reconstructing native tool_use / tool_result / thinking
+ *  blocks for matching turns. `thinkingEnabled` is false (or stripped on a fail-safe retry)
+ *  ⇒ everything falls back to the plain string/image content (current, always-valid shape). */
+export function buildAnthropicMessages(messages: Message[], model: string, thinkingEnabled: boolean): AnthropicMessage[] {
+  const nonSystem = messages.filter(m => m.role !== "system");
+  const plain = (m: Message): AnthropicMessage => ({
+    role: m.role,
+    content: m.images?.length
+      ? [
+          ...m.images.map((img): AnthropicContentBlock => ({ type: "image", source: { type: "base64", media_type: img.mediaType, data: img.data } })),
+          ...(m.content ? [{ type: "text", text: m.content } as AnthropicContentBlock] : []),
+        ]
+      : m.content,
+  });
+  return nonSystem.map((m, i) => {
+    if (m.role === "assistant" && anthropicNativizable(m, model, thinkingEnabled)) {
+      const blocks: AnthropicContentBlock[] = [];
+      for (const a of m.reasoningArtifacts!) {
+        if (a.provider !== "anthropic" || a.model !== model) continue;
+        if (a.signature && a.text) blocks.push({ type: "thinking", thinking: a.text, signature: a.signature });
+        else if (a.redacted) blocks.push({ type: "redacted_thinking", data: a.redacted });
+      }
+      for (const tu of m.toolUse!) blocks.push({ type: "tool_use", id: tu.id, name: tu.tool, input: tu.arguments });
+      return { role: "assistant", content: blocks };
+    }
+    // A tool-result user turn is nativized iff its preceding assistant was — so a native
+    // tool_use always has its matching native tool_result (Anthropic errors on a mismatch).
+    if (m.role === "user" && m.toolResults?.length && i > 0
+        && nonSystem[i - 1].role === "assistant"
+        && anthropicNativizable(nonSystem[i - 1], model, thinkingEnabled)) {
+      const blocks: AnthropicContentBlock[] = m.toolResults.map(tr => ({
+        type: "tool_result", tool_use_id: tr.id, content: tr.output, is_error: tr.isError,
+      }));
+      if (m.toolResultExtra) blocks.push({ type: "text", text: m.toolResultExtra });
+      return { role: "user", content: blocks };
+    }
+    return plain(m);
+  });
+}
 export function anthropicPayload(
   messages: Message[],
   options: CallOptions,
   stream: boolean,
   includeTemperature: boolean,
   credential: Credential = { kind: "none", provider: "anthropic" },
+  stripArtifacts = false,
 ): string {
   const model = stripAnthropicPrefix(options.model);
   const systemPrompt = options.systemPrompt ?? messages.find(m => m.role === "system")?.content;
-  // Image attachments (clipboard paste) become Anthropic content blocks; plain
-  // string content is kept for text-only messages (the overwhelmingly common case).
-  type ContentBlock = Record<string, unknown>;
-  const anthropicMessages: { role: string; content: string | ContentBlock[] }[] =
-    messages.filter(m => m.role !== "system").map(m => ({
-      role: m.role,
-      content: m.images?.length
-        ? [
-            ...m.images.map((img): ContentBlock => ({ type: "image", source: { type: "base64", media_type: img.mediaType, data: img.data } })),
-            ...(m.content ? [{ type: "text", text: m.content } as ContentBlock] : []),
-          ]
-        : m.content,
-    }));
+  // Image attachments + native tool/thinking-block reconstruction live in buildAnthropicMessages.
+  const maxTokens = options.maxTokens ?? 4000;
+  const thinkingBudget = anthropicThinkingBudget(options.reasoningEffort, maxTokens);
+  // Reconstruct native tool_use / tool_result / thinking blocks for same-model turns when
+  // thinking is enabled (and not stripped by a fail-safe retry); else plain string/image.
+  const anthropicMessages = buildAnthropicMessages(messages, options.model, thinkingBudget !== undefined && !stripArtifacts);
   // Conversation prompt caching (gjc parity — the main same-model latency gap):
   // one breakpoint on the LAST message caches the entire conversation prefix, so
   // each agent-loop step only pays input processing for the new tail instead of
@@ -125,8 +173,7 @@ export function anthropicPayload(
       last.content[last.content.length - 1] = { ...tail, cache_control: { type: "ephemeral" } };
     }
   }
-  const maxTokens = options.maxTokens ?? 4000;
-  const thinkingBudget = anthropicThinkingBudget(options.reasoningEffort, maxTokens);
   const payload: Record<string, unknown> = {
     model,
     messages: anthropicMessages,
@@ -162,13 +209,14 @@ export function anthropicRequest(
   credential: Credential,
   stream: boolean,
   includeTemperature: boolean,
+  stripArtifacts = false,
 ): { url: string; headers: Record<string, string>; body: string } {
   return {
     // Anthropic-compatible providers (z.ai, MiniMax, …) accept the Messages wire
     // format at their own host; an explicit baseUrl pins `${base}/v1/messages`.
     url: options.baseUrl ? `${options.baseUrl.replace(/\/$/, "")}/v1/messages` : ANTHROPIC_URL,
     headers: headersFor(credential, stream),
-    body: anthropicPayload(messages, options, stream, includeTemperature, credential),
+    body: anthropicPayload(messages, options, stream, includeTemperature, credential, stripArtifacts),
   };
 }
@@ -176,14 +224,21 @@ function isDeprecatedTemperatureError(status: number, detail: string): boolean {
   return status === 400 && detail.includes(DEPRECATED_TEMPERATURE);
 }
+/** A 400 that names thinking/signature/redacted means a replayed reasoning artifact was
+ *  rejected (expired signature, edited history, thinking toggled). The fail-safe retries
+ *  once with artifacts stripped (plain string history) so the turn survives. */
+function isReasoningArtifactError(status: number, detail: string): boolean {
+  return status === 400 && /thinking|signature|redacted_thinking/i.test(detail);
+}
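The fail-safe described in the doc comment above amounts to "classify the 400, then retry once without artifacts". Here is a minimal sketch of that control flow, assuming a hypothetical `SendFn` and `FakeResponse` in place of the real `fetch`-based `send` closure; only the predicate is copied from the diff.

```typescript
// Hypothetical stand-ins for the HTTP layer; not part of the package.
interface FakeResponse { ok: boolean; status: number; detail: string }
type SendFn = (stripArtifacts: boolean) => Promise<FakeResponse>;

// Same predicate as in the diff: a 400 whose detail names a reasoning artifact.
function isReasoningArtifactError(status: number, detail: string): boolean {
  return status === 400 && /thinking|signature|redacted_thinking/i.test(detail);
}

// Send with artifacts first; on a rejected artifact, retry exactly once with
// artifacts stripped. Any other failure is returned to the caller unchanged.
async function sendWithFailSafe(send: SendFn): Promise<FakeResponse> {
  const first = await send(false);
  if (first.ok || !isReasoningArtifactError(first.status, first.detail)) return first;
  return send(true);
}
```

Because the pattern matches case-insensitively on substrings, a 400 whose detail mentions `thinking` for an unrelated reason would also trigger the stripped retry. That costs one extra request, which seems an acceptable trade for keeping the turn alive.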
 async function postAnthropic(
   messages: Message[],
   options: CallOptions,
   credential: Credential,
   stream: boolean,
 ): Promise<Response> {
-  const send = (includeTemperature: boolean) => {
-    const { url, headers, body } = anthropicRequest(messages, options, credential, stream, includeTemperature);
+  const send = (includeTemperature: boolean, stripArtifacts = false) => {
+    const { url, headers, body } = anthropicRequest(messages, options, credential, stream, includeTemperature, stripArtifacts);
     return fetch(url, { method: "POST", headers, body, signal: options.signal });
   };
@@ -196,6 +251,12 @@ async function postAnthropic(
     if (response.ok) return response;
     throw await providerHttpError("Anthropic", response, stream ? "(stream)" : undefined);
   }
+  // Fail-safe: a rejected replay artifact → retry once with artifacts stripped (plain history).
+  if (isReasoningArtifactError(response.status, detail)) {
+    response = await send(true, true);
+    if (response.ok) return response;
+    throw await providerHttpError("Anthropic", response, stream ? "(stream)" : undefined);
+  }
   throw new ProviderHttpError(
     "Anthropic",
@@ -233,8 +294,16 @@ export const anthropicAdapter: ProviderAdapter = {
   supportsNativeTools: true,
   async call(messages, options, credential) {
     const response = await postAnthropic(messages, options, credential, false);
-    const result = (await response.json()) as { content: { type: string; text?: string; name?: string; input?: unknown }[]; stop_reason?: string; usage?: AnthropicUsage };
+    const result = (await response.json()) as { content: { type: string; text?: string; name?: string; input?: unknown; thinking?: string; signature?: string; data?: string }[]; stop_reason?: string; usage?: AnthropicUsage };
     if (result.usage) options.onUsage?.({ inputTokens: totalInputTokens(result.usage), outputTokens: result.usage.output_tokens });
+    // Capture thinking/redacted blocks as replay artifacts (parity with the stream path).
+    for (const c of result.content) {
+      if (c.type === "thinking" && (c.thinking || c.signature)) {
+        options.onReasoningArtifact?.({ provider: "anthropic", model: options.model, text: c.thinking || undefined, signature: c.signature });
+      } else if (c.type === "redacted_thinking" && c.data) {
+        options.onReasoningArtifact?.({ provider: "anthropic", model: options.model, redacted: c.data });
+      }
+    }
     // Prefer a native tool call (re-serialized to canonical JSON) over any stray text.
     const toolCall = serializeToolCalls(
       result.content
@@ -256,12 +325,16 @@ export const anthropicAdapter: ProviderAdapter = {
     // never as text_delta — accumulate per block index, then re-serialize to canonical
     // JSON and yield it once at the end (concatenation still equals call()).
     const toolBlocks = new Map<number, { name: string; args: string }>();
+    // Thinking blocks stream as content_block_start(type:thinking) + thinking_delta(text)
+    // + signature_delta(signature). Accumulate per index and emit one ReasoningArtifact per
+    // block on stream end so the signed thought can be replayed (gajae continuity).
+    const thinkBlocks = new Map<number, { text: string; signature?: string }>();
     for await (const data of readSse(response.body)) {
       let evt: {
         type?: string;
         index?: number;
-        content_block?: { type?: string; name?: string };
-        delta?: { type?: string; text?: string; partial_json?: string; thinking?: string; stop_reason?: string };
+        content_block?: { type?: string; name?: string; data?: string };
+        delta?: { type?: string; text?: string; partial_json?: string; thinking?: string; signature?: string; stop_reason?: string };
         message?: { usage?: AnthropicUsage };
         usage?: { output_tokens?: number };
       };
@@ -272,6 +345,11 @@ export const anthropicAdapter: ProviderAdapter = {
       }
       if (evt.type === "content_block_start" && evt.content_block?.type === "tool_use" && typeof evt.index === "number") {
         toolBlocks.set(evt.index, { name: evt.content_block.name ?? "", args: "" });
+      } else if (evt.type === "content_block_start" && evt.content_block?.type === "thinking" && typeof evt.index === "number") {
+        thinkBlocks.set(evt.index, { text: "" });
+      } else if (evt.type === "content_block_start" && evt.content_block?.type === "redacted_thinking" && evt.content_block.data) {
+        // Redacted thinking carries opaque `data` directly (no deltas) — emit immediately.
+        options.onReasoningArtifact?.({ provider: "anthropic", model: options.model, redacted: evt.content_block.data });
       } else if (evt.type === "content_block_delta" && evt.delta?.type === "input_json_delta" && typeof evt.index === "number") {
         const b = toolBlocks.get(evt.index);
         if (b) b.args += evt.delta.partial_json ?? "";
@@ -280,6 +358,15 @@ export const anthropicAdapter: ProviderAdapter = {
         yield evt.delta.text;
       } else if (evt.type === "content_block_delta" && evt.delta?.type === "thinking_delta" && evt.delta.thinking) {
         options.onReasoning?.(evt.delta.thinking);
+        if (typeof evt.index === "number") {
+          const tb = thinkBlocks.get(evt.index) ?? { text: "" };
+          tb.text += evt.delta.thinking;
+          thinkBlocks.set(evt.index, tb);
+        }
+      } else if (evt.type === "content_block_delta" && evt.delta?.type === "signature_delta" && evt.delta.signature && typeof evt.index === "number") {
+        const tb = thinkBlocks.get(evt.index) ?? { text: "" };
+        tb.signature = (tb.signature ?? "") + evt.delta.signature;
+        thinkBlocks.set(evt.index, tb);
       } else if (evt.type === "message_start" && evt.message?.usage) {
         // Cache only — usage is reported ONCE at message_delta so an accumulating
         // sink can't double-count input (and a pre-first-chunk retry that replays
@@ -290,6 +377,12 @@ export const anthropicAdapter: ProviderAdapter = {
         if (evt.usage) options.onUsage?.({ inputTokens: cachedInput, outputTokens: evt.usage.output_tokens });
       }
     }
+    // Emit captured thinking blocks as replay artifacts (signed thought + signature).
+    for (const tb of thinkBlocks.values()) {
+      if (tb.text || tb.signature) {
+        options.onReasoningArtifact?.({ provider: "anthropic", model: options.model, text: tb.text || undefined, signature: tb.signature });
+      }
+    }
     const envelope = serializeAccumulatedToolCalls(toolBlocks);
     if (envelope) { yieldedAny = true; yield envelope; }
     if (!yieldedAny) throw emptyCompletionError(stopReason);