jeo-code 0.6.27 → 0.6.28

package/CHANGELOG.md CHANGED
@@ -6,6 +6,19 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
  The README mirrors the latest 5 entries — regenerate with `bun run changelog:sync`.
 
+ ## [0.6.28] - 2026-06-19
+ _Signed thinking-block replay: native reasoning is now sent BACK to providers across steps/turns, restoring multi-step reasoning continuity (gajae parity)._
+
+ ### Added
+ - **Provider-native reasoning replay across all three first-party providers.** jeo now captures each provider's opaque/signed reasoning artifact during streaming and replays it on later turns to the SAME provider+model, so the model keeps its chain of thought across tool steps instead of re-deriving it. New `Message.reasoningArtifacts` plus structured `Message.toolUse` / `toolResults` (stable ids) let capable adapters reconstruct **native** tool blocks (the key to continuity — plain-text tool feedback makes Claude strip prior thinking):
+ - **Anthropic**: captures `signature_delta` + `redacted_thinking`; replays `thinking`(+signature) → `tool_use` → `tool_result` blocks (gated on same-model + thinking-enabled).
+ - **OpenAI Responses**: requests `include: ["reasoning.encrypted_content"]` (store stays false), captures reasoning item id+encrypted_content, replays native `reasoning` + `function_call` + `function_call_output` items.
+ - **Gemini**: captures per-part `thoughtSignature`, replays native `functionCall`(+thoughtSignature) / `functionResponse` parts (coalescing-safe). This was previously deferred — structured `toolUse` unblocks the functionCall binding.
+ - **Fail-safe strip-and-retry.** A 400 naming a thinking/signature/encrypted/reasoning field retries the step ONCE with artifacts stripped (plain history), so an expired signature or edited history can never wedge a turn. Per provider (Anthropic/OpenAI/Gemini).
+
+ ### Changed
+ - **Reasoning artifacts ride the session record + token accounting.** `reasoningArtifacts` round-trips through session save/load (so `/resume` preserves replay continuity) and counts toward `estimateMessageTokens` (OpenAI encrypted blobs are KB-scale) so compaction/overflow stay honest. Markdown export is unchanged (artifacts are opaque). The engine's ~11 assistant-push sites are unified behind `pushAssistantTurn`, so every step (not just the final reply) carries its reasoning + artifacts. Antigravity is explicitly out of scope (no capture/replay; the provider-keyed match guard prevents any cross-adapter leakage).
+
  ## [0.6.27] - 2026-06-19
  _Ponytail pass on the reasoning-tier mapper, plus a real-tmux verification of `jeo --tmux`._
 
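The 0.6.28 entry above says artifacts are replayed only to the SAME provider+model. A minimal sketch of that match guard follows. The field names are inferred from the code hunks later in this diff (the real definitions live in `../ai/types` and may differ), and `replayableArtifacts` plus the model names are hypothetical, introduced only for illustration.

```typescript
// Shapes inferred from this diff; the package's own types may carry more fields.
interface ReasoningArtifact {
  provider: string;
  model: string;
  text?: string;
  signature?: string;        // Anthropic signed thinking
  redacted?: string;         // Anthropic redacted_thinking data
  thoughtSignature?: string; // Gemini per-part signature
  encrypted?: string;        // OpenAI encrypted reasoning content
}

interface Message {
  role: "system" | "user" | "assistant";
  content: string;
  reasoningArtifacts?: ReasoningArtifact[];
}

// Hypothetical helper: keep only artifacts captured from the same provider and model,
// so an artifact never leaks into a request for a different adapter.
function replayableArtifacts(msg: Message, provider: string, model: string): ReasoningArtifact[] {
  return (msg.reasoningArtifacts ?? []).filter(a => a.provider === provider && a.model === model);
}

const turn: Message = {
  role: "assistant",
  content: "Reading the file next.",
  reasoningArtifacts: [
    { provider: "anthropic", model: "model-a", text: "plan", signature: "sig-1" },
    { provider: "anthropic", model: "model-b", text: "older plan", signature: "sig-0" },
  ],
};

const kept = replayableArtifacts(turn, "anthropic", "model-a");
console.log(kept.map(a => a.signature));
```

Filtering per request, rather than deleting mismatched artifacts from history, would let a later switch back to the original model pick the artifacts up again. Whether the package does exactly this is not shown in this chunk.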
package/README.ja.md CHANGED
@@ -2,10 +2,6 @@
  <img src="assets/hero.png" alt="jeo-code 自律コーディングエージェントのヒーローイラスト" width="100%" />
  </p>
 
- <p align="center">
- <img src="assets/icon.png" alt="jeo-code icon" width="96" />
- </p>
-
  <h1 align="center">jeo-code (jeo)</h1>
 
  <p align="center">
@@ -204,11 +200,11 @@ CI は `.github/workflows/npm-publish.yml` で公開します — GitHub リリ
  ## 変更履歴 (Changelog)
 
  <!-- CHANGELOG:START (auto-generated from CHANGELOG.md — run `bun run changelog:sync`) -->
+ - **[0.6.28]** (2026-06-19) — Signed thinking-block replay: native reasoning is now sent BACK to providers across steps/turns, restoring multi-step reasoning continuity (gajae parity).
  - **[0.6.27]** (2026-06-19) — Ponytail pass on the reasoning-tier mapper, plus a real-tmux verification of `jeo --tmux`.
  - **[0.6.26]** (2026-06-19) — The forge emblem is redrawn again as the mascot crayfish, foregrounding its signature pincer claws (집게).
  - **[0.6.25]** (2026-06-19) — Reasoning works at every thinking level (gajae parity), and the forge emblem is redrawn as the neon-lens coding wizard.
  - **[0.6.24]** (2026-06-19) — `/provider` opens an interactive onboarding selector (OAuth vs API-compatible), and OpenAI-compatible backends gain per-vendor native-reasoning formats.
- - **[0.6.23]** (2026-06-19) — Live reasoning/thinking streams in the TUI across every provider, three new OpenAI-compatible backends (LM Studio, xAI, Kimi) join the auth/discovery/catalog surface, and Gemini gains native function-calling.
 
  See [CHANGELOG.md](CHANGELOG.md) for the full history.
  <!-- CHANGELOG:END -->
package/README.ko.md CHANGED
@@ -2,10 +2,6 @@
  <img src="assets/hero.png" alt="jeo-code 자율 코딩 에이전트 히어로 일러스트" width="100%" />
  </p>
 
- <p align="center">
- <img src="assets/icon.png" alt="jeo-code icon" width="96" />
- </p>
-
  <h1 align="center">jeo-code (jeo)</h1>
 
  <p align="center">
@@ -204,11 +200,11 @@ CI는 `.github/workflows/npm-publish.yml`로 배포합니다 — GitHub 릴리
  ## 변경 이력 (Changelog)
 
  <!-- CHANGELOG:START (auto-generated from CHANGELOG.md — run `bun run changelog:sync`) -->
+ - **[0.6.28]** (2026-06-19) — Signed thinking-block replay: native reasoning is now sent BACK to providers across steps/turns, restoring multi-step reasoning continuity (gajae parity).
  - **[0.6.27]** (2026-06-19) — Ponytail pass on the reasoning-tier mapper, plus a real-tmux verification of `jeo --tmux`.
  - **[0.6.26]** (2026-06-19) — The forge emblem is redrawn again as the mascot crayfish, foregrounding its signature pincer claws (집게).
  - **[0.6.25]** (2026-06-19) — Reasoning works at every thinking level (gajae parity), and the forge emblem is redrawn as the neon-lens coding wizard.
  - **[0.6.24]** (2026-06-19) — `/provider` opens an interactive onboarding selector (OAuth vs API-compatible), and OpenAI-compatible backends gain per-vendor native-reasoning formats.
- - **[0.6.23]** (2026-06-19) — Live reasoning/thinking streams in the TUI across every provider, three new OpenAI-compatible backends (LM Studio, xAI, Kimi) join the auth/discovery/catalog surface, and Gemini gains native function-calling.
 
  See [CHANGELOG.md](CHANGELOG.md) for the full history.
  <!-- CHANGELOG:END -->
package/README.md CHANGED
@@ -2,10 +2,6 @@
  <img src="assets/hero.png" alt="jeo-code autonomous coding-agent hero illustration" width="100%" />
  </p>
 
- <p align="center">
- <img src="assets/icon.png" alt="jeo-code icon" width="96" />
- </p>
-
  <h1 align="center">jeo-code (jeo)</h1>
 
  <p align="center">
@@ -204,11 +200,11 @@ Required npm token permissions (repository secret `NPM_TOKEN`):
  ## Changelog
 
  <!-- CHANGELOG:START (auto-generated from CHANGELOG.md — run `bun run changelog:sync`) -->
+ - **[0.6.28]** (2026-06-19) — Signed thinking-block replay: native reasoning is now sent BACK to providers across steps/turns, restoring multi-step reasoning continuity (gajae parity).
  - **[0.6.27]** (2026-06-19) — Ponytail pass on the reasoning-tier mapper, plus a real-tmux verification of `jeo --tmux`.
  - **[0.6.26]** (2026-06-19) — The forge emblem is redrawn again as the mascot crayfish, foregrounding its signature pincer claws (집게).
  - **[0.6.25]** (2026-06-19) — Reasoning works at every thinking level (gajae parity), and the forge emblem is redrawn as the neon-lens coding wizard.
  - **[0.6.24]** (2026-06-19) — `/provider` opens an interactive onboarding selector (OAuth vs API-compatible), and OpenAI-compatible backends gain per-vendor native-reasoning formats.
- - **[0.6.23]** (2026-06-19) — Live reasoning/thinking streams in the TUI across every provider, three new OpenAI-compatible backends (LM Studio, xAI, Kimi) join the auth/discovery/catalog surface, and Gemini gains native function-calling.
 
  See [CHANGELOG.md](CHANGELOG.md) for the full history.
  <!-- CHANGELOG:END -->
package/README.zh.md CHANGED
@@ -2,10 +2,6 @@
  <img src="assets/hero.png" alt="jeo-code 自主编码代理主视觉插图" width="100%" />
  </p>
 
- <p align="center">
- <img src="assets/icon.png" alt="jeo-code icon" width="96" />
- </p>
-
  <h1 align="center">jeo-code (jeo)</h1>
 
  <p align="center">
@@ -204,11 +200,11 @@ CI 通过 `.github/workflows/npm-publish.yml` 发布 — GitHub 发布 release
  ## 更新日志 (Changelog)
 
  <!-- CHANGELOG:START (auto-generated from CHANGELOG.md — run `bun run changelog:sync`) -->
+ - **[0.6.28]** (2026-06-19) — Signed thinking-block replay: native reasoning is now sent BACK to providers across steps/turns, restoring multi-step reasoning continuity (gajae parity).
  - **[0.6.27]** (2026-06-19) — Ponytail pass on the reasoning-tier mapper, plus a real-tmux verification of `jeo --tmux`.
  - **[0.6.26]** (2026-06-19) — The forge emblem is redrawn again as the mascot crayfish, foregrounding its signature pincer claws (집게).
  - **[0.6.25]** (2026-06-19) — Reasoning works at every thinking level (gajae parity), and the forge emblem is redrawn as the neon-lens coding wizard.
  - **[0.6.24]** (2026-06-19) — `/provider` opens an interactive onboarding selector (OAuth vs API-compatible), and OpenAI-compatible backends gain per-vendor native-reasoning formats.
- - **[0.6.23]** (2026-06-19) — Live reasoning/thinking streams in the TUI across every provider, three new OpenAI-compatible backends (LM Studio, xAI, Kimi) join the auth/discovery/catalog surface, and Gemini gains native function-calling.
 
  See [CHANGELOG.md](CHANGELOG.md) for the full history.
  <!-- CHANGELOG:END -->
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "jeo-code",
- "version": "0.6.27",
+ "version": "0.6.28",
  "description": "Clean, highly optimized AI coding agent using spec-first loop",
  "type": "module",
  "main": "src/cli.ts",
@@ -78,7 +78,16 @@ const messageTokenCache = new WeakMap<Message, number>();
  export function estimateMessageTokens(msg: Message): number {
  const hit = messageTokenCache.get(msg);
  if (hit !== undefined) return hit;
- const n = estimateTokens(msg.role) + estimateTokens(msg.content) + (msg.images?.length ?? 0) * IMAGE_TOKEN_ESTIMATE + 1;
+ let n = estimateTokens(msg.role) + estimateTokens(msg.content) + (msg.images?.length ?? 0) * IMAGE_TOKEN_ESTIMATE + 1;
+ // Native reasoning artifacts (signature / encrypted_content / thought text) are NOT in
+ // `content` but become REAL input tokens once an adapter replays them — count them so
+ // the context meter and compaction trigger stay honest (OpenAI encrypted blobs are KB-scale).
+ // toolUse/toolResults/toolResultExtra are already reflected in `content`, so they are not re-added.
+ for (const a of msg.reasoningArtifacts ?? []) {
+ n += estimateTokens(a.text ?? "") + estimateTokens(a.signature ?? "")
+ + estimateTokens(a.redacted ?? "") + estimateTokens(a.thoughtSignature ?? "")
+ + estimateTokens(a.encrypted ?? "");
+ }
  messageTokenCache.set(msg, n);
  return n;
  }
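To make the effect of the `estimateMessageTokens` change concrete, here is a minimal sketch of how a KB-scale encrypted blob shifts the estimate. The package's `estimateTokens` is not shown in this diff, so a rough 4-characters-per-token heuristic stands in for it; that heuristic, `estimateWithArtifacts`, and the omission of the image term and cache are assumptions made for illustration.

```typescript
interface ReasoningArtifact {
  text?: string;
  signature?: string;
  redacted?: string;
  thoughtSignature?: string;
  encrypted?: string;
}
interface Message { role: string; content: string; reasoningArtifacts?: ReasoningArtifact[] }

// Assumption: stand-in heuristic, not the package's real estimator.
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

// Mirrors the accounting in the hunk above, minus images and the WeakMap cache.
function estimateWithArtifacts(msg: Message): number {
  let n = estimateTokens(msg.role) + estimateTokens(msg.content) + 1;
  for (const a of msg.reasoningArtifacts ?? []) {
    n += estimateTokens(a.text ?? "") + estimateTokens(a.signature ?? "")
      + estimateTokens(a.redacted ?? "") + estimateTokens(a.thoughtSignature ?? "")
      + estimateTokens(a.encrypted ?? "");
  }
  return n;
}

const plainTurn: Message = { role: "assistant", content: "ok" };
// A 4 KB opaque blob, roughly the scale the changelog mentions for OpenAI.
const blobTurn: Message = { ...plainTurn, reasoningArtifacts: [{ encrypted: "x".repeat(4096) }] };

console.log(estimateWithArtifacts(plainTurn), estimateWithArtifacts(blobTurn));
```

Because the real function caches per message object in a `WeakMap`, the artifacts presumably need to be attached before the first estimate is taken; a later mutation of the same object would keep returning the cached count.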
@@ -34,11 +34,30 @@ async function invokeCallLlm(history: Message[], options: {
  onRetry?: (attempt: number, err: unknown, delayMs: number) => void;
  onToken?: (delta: string) => void;
  onReasoning?: (delta: string) => void;
+ onReasoningArtifact?: (artifact: import("../ai/types").ReasoningArtifact) => void;
  tools?: import("../ai/types").NativeToolSchema[];
  }): Promise<string> {
  const mod = await import("./loop");
  return mod.callLlm(history, options);
  }
+
+ /** Push an assistant turn, attaching the step's reasoning + native replay records when
+ * present. Centralizes the assistant-push sites so reasoning/artifacts attach uniformly
+ * (not just the final reply). Omits empty fields so back-compat serialization and the
+ * identity-keyed token cache are unaffected. */
+ function pushAssistantTurn(
+ history: Message[],
+ content: string,
+ reasoning: string,
+ artifacts: import("../ai/types").ReasoningArtifact[],
+ toolUse?: import("../ai/types").ToolUseRecord[],
+ ): void {
+ const msg: Message = { role: "assistant", content };
+ if (reasoning.trim()) msg.reasoning = reasoning;
+ if (artifacts.length) msg.reasoningArtifacts = artifacts;
+ if (toolUse && toolUse.length) msg.toolUse = toolUse;
+ history.push(msg);
+ }
  export interface ToolInvocation {
  tool: string;
  arguments?: Record<string, any>;
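The doc comment on `pushAssistantTurn` claims that omitting empty fields keeps back-compat serialization unaffected. The sketch below checks that claim on a reduced copy of the helper; `pushTurn`, the trimmed-down `Message` shape, and the sample values are stand-ins for illustration, not the package's definitions.

```typescript
interface ReasoningArtifact { provider: string; model: string; text?: string; signature?: string }
interface Message {
  role: string;
  content: string;
  reasoning?: string;
  reasoningArtifacts?: ReasoningArtifact[];
}

// Reduced copy of the helper in the hunk above (no toolUse parameter).
function pushTurn(history: Message[], content: string, reasoning: string, artifacts: ReasoningArtifact[]): void {
  const msg: Message = { role: "assistant", content };
  if (reasoning.trim()) msg.reasoning = reasoning;           // whitespace-only reasoning is dropped
  if (artifacts.length) msg.reasoningArtifacts = artifacts;  // empty arrays are never stored
  history.push(msg);
}

const history: Message[] = [];
pushTurn(history, "done", "   ", []);
pushTurn(history, "next", "weighing options", [{ provider: "anthropic", model: "model-a", signature: "sig" }]);

// The first turn serializes exactly like a pre-0.6.28 assistant message.
const serialized = history.map(m => JSON.stringify(m));
console.log(serialized[0]);
```

A turn with no reasoning therefore produces the same JSON as the old `history.push({ role: "assistant", content })` call sites, so saved sessions from before this release should load and re-save in the same shape.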
@@ -176,6 +195,9 @@ export interface AgentLoopEvents {
  /** Accumulated native reasoning/thinking text so far — drives a transient dimmed
  * "thinking" view. Only requested when a consumer (TUI) attaches. */
  onReasoningStream?(textSoFar: string): void;
+ /** Each provider-native reasoning ARTIFACT as it is captured (signature / thoughtSignature /
+ * reasoning item). Lets the final-reply path (launch.ts) persist artifacts for replay. */
+ onReasoningArtifactStream?(artifact: import("../ai/types").ReasoningArtifact): void;
  /** Step-budget change (gjc-style retry flow): the limit was extended because the
  * turn is making progress. `limit` is the new max; `reason` is display-ready. */
  onBudget?(limit: number, reason: string): void;
@@ -345,7 +367,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
  );
  const consolidated = wrapUp.trim();
  if (consolidated) {
- history.push({ role: "assistant", content: consolidated });
+ pushAssistantTurn(history, consolidated, "", []);
  return finish({
  done: false,
  steps: step,
@@ -493,6 +515,14 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
  const onReasoning = ev.onReasoningStream
  ? (delta: string) => { reasonBuf += delta; ev.onReasoningStream!(reasonBuf); }
  : undefined;
+ // Capture provider-native reasoning ARTIFACTS for replay (always — independent of any
+ // TUI display sink). Stays scoped to THIS step so a later consolidation push can't
+ // inherit a prior step's signatures.
+ const artifactBuf: import("../ai/types").ReasoningArtifact[] = [];
+ const onReasoningArtifact = (a: import("../ai/types").ReasoningArtifact) => {
+ artifactBuf.push(a);
+ ev.onReasoningArtifactStream?.(a);
+ };
  let responseText: string;
  try {
  responseText = await invokeCallLlm(history, {
@@ -510,6 +540,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
  onUsage: u => { acc.inputTokens += u.inputTokens ?? 0; acc.outputTokens += u.outputTokens ?? 0; sawUsage = true; },
  onToken,
  onReasoning,
+ onReasoningArtifact,
  // Make provider auto-retry visible: previously a rate-limited call sat in a
  // silent backoff wait, then surfaced "auto-retry was exhausted" with no trace
  // of the retries that DID happen.
@@ -604,10 +635,10 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
  const trimmed = responseText.trim();
  parseFailures++;
  if (trimmed && (!trimmed.includes("{") || parseFailures > MAX_PARSE_BOUNCES)) {
- history.push({ role: "assistant", content: responseText });
+ pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
  return finish({ done: true, steps: step, doneReason: trimmed });
  }
- history.push({ role: "assistant", content: responseText });
+ pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
  history.push({
  role: "user",
  content:
@@ -654,7 +685,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
  doneReason: `Stopped: the model returned no valid tool call ${MAX_INVALID_CALLS}× (a JSON reply with no valid "tool" or "tools" field). The selected model may be too small to follow the JSON tool protocol — switch to a stronger model with /model.`,
  });
  }
- history.push({ role: "assistant", content: responseText });
+ pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
  history.push({
  role: "user",
  content: `Your last reply had no "tool" or "tools" field. Reply with exactly one JSON object, e.g. {"tool":"find","arguments":{"globPattern":"src/**"}} or {"tools":[{"tool":"read","arguments":{"filePath":"src/main.ts"}}, ...]}.`,
@@ -674,7 +705,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
  if (toolCalls.length === 1 && toolCalls[0].tool === "done") {
  if (sawMutation && (!sawVerification || pendingHookFailure !== null) && !donePushbackUsed) {
  donePushbackUsed = true; // second done always passes — escape hatch
- history.push({ role: "assistant", content: responseText });
+ pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
  history.push({
  role: "user",
  content: pendingHookFailure !== null
@@ -696,7 +727,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
  const nudge = await ev.onBeforeDone((toolCalls[0].arguments?.reason as string) ?? "");
  if (nudge) {
  beforeDoneNudgeUsed = true;
- history.push({ role: "assistant", content: responseText });
+ pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
  history.push({ role: "user", content: nudge });
  ev.onNotice?.("done deferred once — final plan reconciliation requested");
  step++;
@@ -709,7 +740,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
  if (opts.steer) {
  const pending = opts.steer().map(s => (s ?? "").trim()).filter(Boolean);
  if (pending.length) {
- history.push({ role: "assistant", content: responseText });
+ pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
  for (const text of pending) {
  history.push({
  role: "user",
@@ -754,7 +785,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
  const lastChance = repeatCount === MAX_REPEAT - 1
  ? "This is your LAST attempt: if you emit the same call again the turn will end. "
  : "";
- history.push({ role: "assistant", content: responseText });
+ pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
  history.push({
  role: "user",
  content:
@@ -784,7 +815,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
  if (!cycleBounceUsed) {
  cycleBounceUsed = true;
  recentStepSigs.length = 0; // fresh window: the correction earns a real retry
- history.push({ role: "assistant", content: responseText });
+ pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
  history.push({
  role: "user",
  content:
@@ -944,6 +975,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
  );
  // Append the batch's hook diagnostics once so the model can self-correct. Two
  // DISTINCT hooks with identical output collapse to one full block + a cross-ref.
+ let hookExtra = "";
  if (hookDiags.length > 0) {
  const seenHookFeedback = new Set<string>();
  const diagLines: string[] = [];
@@ -956,14 +988,28 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
  diagLines.push(`[post-turn hook "${d.run}" — exit ${d.exitCode}]:\n${truncateToolOutput(d.output)}`);
  }
  }
- resultBlocks.push(diagLines.join("\n"));
+ hookExtra = diagLines.join("\n");
+ resultBlocks.push(hookExtra);
  }
 
- history.push({ role: "assistant", content: responseText });
- history.push({
- role: "user",
- content: resultBlocks.join("\n\n"),
- });
+ // Structured native replay records: stable ids correlate the assistant tool_use
+ // turn with its tool_result user turn (the string `content` stays the source of
+ // truth for display / compaction / fallback adapters).
+ const idFor = (idx: number) => `call_${step}_${idx}`;
+ const toolUse: import("../ai/types").ToolUseRecord[] = indices.map(idx => ({
+ id: idFor(idx),
+ tool: toolCalls[idx].tool,
+ arguments: toolCalls[idx].arguments ?? {},
+ }));
+ const toolResults: import("../ai/types").ToolResultRecord[] = indices.map((idx, i) => ({
+ id: idFor(idx),
+ output: bodies[i],
+ isError: !results[idx].success,
+ }));
+ pushAssistantTurn(history, responseText, reasonBuf, artifactBuf, toolUse);
+ const resultMsg: Message = { role: "user", content: resultBlocks.join("\n\n"), toolResults };
+ if (hookExtra) resultMsg.toolResultExtra = hookExtra;
+ history.push(resultMsg);
  };
 
  if (aborted) {
@@ -1053,7 +1099,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
  );
  const consolidated = wrapUp.trim();
  if (consolidated) {
- history.push({ role: "assistant", content: consolidated });
+ pushAssistantTurn(history, consolidated, "", []);
  return finish({
  done: false,
  steps: budget.limit(),
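The structured replay records in the engine hunks above rely on a stable id of the form `call_${step}_${idx}` appearing on both the assistant `toolUse` entry and the matching `toolResults` entry. A minimal sketch of that correlation follows; `buildRecords`, the `Call` and `Outcome` shapes, and the assumption that every call in the batch was executed are simplifications, not the engine's actual code.

```typescript
interface ToolUseRecord { id: string; tool: string; arguments: Record<string, unknown> }
interface ToolResultRecord { id: string; output: string; isError: boolean }

// Hypothetical inputs: one parsed call and one outcome per index.
interface Call { tool: string; arguments?: Record<string, unknown> }
interface Outcome { output: string; success: boolean }

function buildRecords(step: number, calls: Call[], outcomes: Outcome[]): {
  toolUse: ToolUseRecord[];
  toolResults: ToolResultRecord[];
} {
  // Same id scheme as the diff: step number plus the call's index within the batch.
  const idFor = (idx: number): string => `call_${step}_${idx}`;
  const toolUse = calls.map((c, idx) => ({ id: idFor(idx), tool: c.tool, arguments: c.arguments ?? {} }));
  const toolResults = outcomes.map((o, idx) => ({ id: idFor(idx), output: o.output, isError: !o.success }));
  return { toolUse, toolResults };
}

const records = buildRecords(
  3,
  [{ tool: "read", arguments: { filePath: "src/main.ts" } }, { tool: "find" }],
  [{ output: "file contents", success: true }, { output: "no matches", success: false }],
);

console.log(records.toolUse.map(u => u.id), records.toolResults.map(r => r.id));
```

Deriving the id from the step and index, rather than from a random value, means a saved and reloaded session regenerates the same pairing without storing anything extra.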
package/src/agent/loop.ts CHANGED
@@ -26,6 +26,9 @@ export interface ChatOptions {
  onToken?: (delta: string) => void;
  /** Streaming sink for native reasoning/thinking deltas (drives the dimmed live view). */
  onReasoning?: (delta: string) => void;
+ /** Streaming sink for provider-native reasoning ARTIFACTS (signature / thoughtSignature /
+ * reasoning item id+encrypted) — the replay channel, separate from onReasoning. */
+ onReasoningArtifact?: (artifact: import("../ai/types").ReasoningArtifact) => void;
  /** NATIVE tool-calling function declarations (forwarded to capable adapters). */
  tools?: import("../ai/types").NativeToolSchema[];
  }
@@ -332,6 +332,7 @@ async function resolveCall(options: Partial<CallOptions>, kind: "request" | "str
  signal: options.signal,
  reasoningEffort: options.reasoningEffort ?? thinkingToReasoningEffort(config.thinkingLevel),
  onReasoning: options.onReasoning,
+ onReasoningArtifact: options.onReasoningArtifact,
  tools: options.tools,
  };
  // Caller-supplied retry sink rides on the config-derived retry budget so the
@@ -88,28 +88,76 @@ function anthropicThinkingBudget(effort: CallOptions["reasoningEffort"], maxToke
  return Math.min(budget, Math.max(1024, maxTokens - 1024));
  }
 
+ type AnthropicContentBlock = Record<string, unknown>;
+ type AnthropicMessage = { role: string; content: string | AnthropicContentBlock[] };
+
+ /** True when an assistant turn can be replayed as native tool_use + thinking blocks: it has
+ * structured toolUse AND a same-model Anthropic reasoning artifact that yields at least one
+ * valid thinking/redacted block, AND thinking is enabled this call. Native tool_use →
+ * tool_result is what makes Claude KEEP the prior thinking blocks (plain-text tool feedback
+ * gets them stripped on most models), so this is the core of cross-step reasoning continuity. */
+ export function anthropicNativizable(m: Message, model: string, thinkingEnabled: boolean): boolean {
+ return thinkingEnabled
+ && !!m.toolUse?.length
+ && !!m.reasoningArtifacts?.some(a => a.provider === "anthropic" && a.model === model && ((!!a.signature && !!a.text) || !!a.redacted));
+ }
+
+ /** Build Anthropic wire messages, reconstructing native tool_use / tool_result / thinking
+ * blocks for matching turns. `thinkingEnabled` is false (or stripped on a fail-safe retry)
+ * ⇒ everything falls back to the plain string/image content (current, always-valid shape). */
+ export function buildAnthropicMessages(messages: Message[], model: string, thinkingEnabled: boolean): AnthropicMessage[] {
+ const nonSystem = messages.filter(m => m.role !== "system");
+ const plain = (m: Message): AnthropicMessage => ({
+ role: m.role,
+ content: m.images?.length
+ ? [
+ ...m.images.map((img): AnthropicContentBlock => ({ type: "image", source: { type: "base64", media_type: img.mediaType, data: img.data } })),
+ ...(m.content ? [{ type: "text", text: m.content } as AnthropicContentBlock] : []),
+ ]
+ : m.content,
+ });
+ return nonSystem.map((m, i) => {
+ if (m.role === "assistant" && anthropicNativizable(m, model, thinkingEnabled)) {
+ const blocks: AnthropicContentBlock[] = [];
+ for (const a of m.reasoningArtifacts!) {
+ if (a.provider !== "anthropic" || a.model !== model) continue;
+ if (a.signature && a.text) blocks.push({ type: "thinking", thinking: a.text, signature: a.signature });
+ else if (a.redacted) blocks.push({ type: "redacted_thinking", data: a.redacted });
+ }
+ for (const tu of m.toolUse!) blocks.push({ type: "tool_use", id: tu.id, name: tu.tool, input: tu.arguments });
+ return { role: "assistant", content: blocks };
+ }
+ // A tool-result user turn is nativized iff its preceding assistant was — so a native
+ // tool_use always has its matching native tool_result (Anthropic errors on a mismatch).
+ if (m.role === "user" && m.toolResults?.length && i > 0
+ && nonSystem[i - 1].role === "assistant"
+ && anthropicNativizable(nonSystem[i - 1], model, thinkingEnabled)) {
+ const blocks: AnthropicContentBlock[] = m.toolResults.map(tr => ({
+ type: "tool_result", tool_use_id: tr.id, content: tr.output, is_error: tr.isError,
+ }));
+ if (m.toolResultExtra) blocks.push({ type: "text", text: m.toolResultExtra });
+ return { role: "user", content: blocks };
+ }
+ return plain(m);
+ });
+ }
+
  export function anthropicPayload(
  messages: Message[],
  options: CallOptions,
  stream: boolean,
  includeTemperature: boolean,
  credential: Credential = { kind: "none", provider: "anthropic" },
+ stripArtifacts = false,
  ): string {
  const model = stripAnthropicPrefix(options.model);
  const systemPrompt = options.systemPrompt ?? messages.find(m => m.role === "system")?.content;
- // Image attachments (clipboard paste) become Anthropic content blocks; plain
- // string content is kept for text-only messages (the overwhelmingly common case).
- type ContentBlock = Record<string, unknown>;
- const anthropicMessages: { role: string; content: string | ContentBlock[] }[] =
- messages.filter(m => m.role !== "system").map(m => ({
- role: m.role,
- content: m.images?.length
- ? [
- ...m.images.map((img): ContentBlock => ({ type: "image", source: { type: "base64", media_type: img.mediaType, data: img.data } })),
- ...(m.content ? [{ type: "text", text: m.content } as ContentBlock] : []),
- ]
- : m.content,
- }));
+ // Image attachments + native tool/thinking-block reconstruction live in buildAnthropicMessages.
+ const maxTokens = options.maxTokens ?? 4000;
+ const thinkingBudget = anthropicThinkingBudget(options.reasoningEffort, maxTokens);
+ // Reconstruct native tool_use / tool_result / thinking blocks for same-model turns when
+ // thinking is enabled (and not stripped by a fail-safe retry); else plain string/image.
+ const anthropicMessages = buildAnthropicMessages(messages, options.model, thinkingBudget !== undefined && !stripArtifacts);
  // Conversation prompt caching (gjc parity — the main same-model latency gap):
  // one breakpoint on the LAST message caches the entire conversation prefix, so
  // each agent-loop step only pays input processing for the new tail instead of
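The comment in `buildAnthropicMessages` notes that a native `tool_use` must always have its matching native `tool_result`. The sketch below is a small checker for that invariant on already-built wire messages. `unmatchedToolUseIds` and the reduced `Block` type are hypothetical, and the check covers only the pairing rule stated in the diff, not the full Messages API validation.

```typescript
type Block = { type: string; id?: string; tool_use_id?: string; [key: string]: unknown };
type WireMessage = { role: "user" | "assistant"; content: string | Block[] };

// Hypothetical check: every tool_use id in an assistant turn must be answered by a
// tool_result block in the immediately following user turn.
function unmatchedToolUseIds(messages: WireMessage[]): string[] {
  const missing: string[] = [];
  messages.forEach((m, i) => {
    if (m.role !== "assistant" || typeof m.content === "string") return;
    const next = messages[i + 1];
    const answered = new Set<string>();
    if (next && next.role === "user" && typeof next.content !== "string") {
      for (const b of next.content) {
        if (b.type === "tool_result" && b.tool_use_id) answered.add(b.tool_use_id);
      }
    }
    for (const b of m.content) {
      if (b.type === "tool_use" && b.id && !answered.has(b.id)) missing.push(b.id);
    }
  });
  return missing;
}

const paired: WireMessage[] = [
  { role: "user", content: "list the files" },
  { role: "assistant", content: [
    { type: "thinking", thinking: "need a listing", signature: "sig-1" },
    { type: "tool_use", id: "call_1_0", name: "find", input: {} },
  ] },
  { role: "user", content: [{ type: "tool_result", tool_use_id: "call_1_0", content: "a.ts", is_error: false }] },
];

// Same assistant turn, but the result went back as plain text: the pairing is broken.
const unpaired: WireMessage[] = [paired[0], paired[1], { role: "user", content: "a.ts" }];

console.log(unmatchedToolUseIds(paired), unmatchedToolUseIds(unpaired));
```

This is why the diff nativizes a tool-result user turn if and only if the preceding assistant turn was nativized: deciding the two independently could produce the `unpaired` shape.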
@@ -125,8 +173,7 @@ export function anthropicPayload(
  last.content[last.content.length - 1] = { ...tail, cache_control: { type: "ephemeral" } };
  }
  }
- const maxTokens = options.maxTokens ?? 4000;
- const thinkingBudget = anthropicThinkingBudget(options.reasoningEffort, maxTokens);
+
  const payload: Record<string, unknown> = {
  model,
  messages: anthropicMessages,
@@ -162,13 +209,14 @@ export function anthropicRequest(
  credential: Credential,
  stream: boolean,
  includeTemperature: boolean,
+ stripArtifacts = false,
  ): { url: string; headers: Record<string, string>; body: string } {
  return {
  // Anthropic-compatible providers (z.ai, MiniMax, …) accept the Messages wire
  // format at their own host; an explicit baseUrl pins `${base}/v1/messages`.
  url: options.baseUrl ? `${options.baseUrl.replace(/\/$/, "")}/v1/messages` : ANTHROPIC_URL,
  headers: headersFor(credential, stream),
- body: anthropicPayload(messages, options, stream, includeTemperature, credential),
+ body: anthropicPayload(messages, options, stream, includeTemperature, credential, stripArtifacts),
  };
  }
 
@@ -176,14 +224,21 @@ function isDeprecatedTemperatureError(status: number, detail: string): boolean {
  return status === 400 && detail.includes(DEPRECATED_TEMPERATURE);
  }
 
+ /** A 400 that names thinking/signature/redacted means a replayed reasoning artifact was
+ * rejected (expired signature, edited history, thinking toggled). The fail-safe retries
+ * once with artifacts stripped (plain string history) so the turn survives. */
+ function isReasoningArtifactError(status: number, detail: string): boolean {
+ return status === 400 && /thinking|signature|redacted_thinking/i.test(detail);
+ }
+
  async function postAnthropic(
  messages: Message[],
  options: CallOptions,
  credential: Credential,
  stream: boolean,
  ): Promise<Response> {
- const send = (includeTemperature: boolean) => {
- const { url, headers, body } = anthropicRequest(messages, options, credential, stream, includeTemperature);
+ const send = (includeTemperature: boolean, stripArtifacts = false) => {
+ const { url, headers, body } = anthropicRequest(messages, options, credential, stream, includeTemperature, stripArtifacts);
  return fetch(url, { method: "POST", headers, body, signal: options.signal });
  };
 
@@ -196,6 +251,12 @@ async function postAnthropic(
  if (response.ok) return response;
  throw await providerHttpError("Anthropic", response, stream ? "(stream)" : undefined);
  }
+ // Fail-safe: a rejected replay artifact → retry once with artifacts stripped (plain history).
+ if (isReasoningArtifactError(response.status, detail)) {
+ response = await send(true, true);
+ if (response.ok) return response;
+ throw await providerHttpError("Anthropic", response, stream ? "(stream)" : undefined);
+ }
 
  throw new ProviderHttpError(
  "Anthropic",
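The fail-safe in `postAnthropic` retries exactly once with artifacts stripped when a 400 names a reasoning field. A minimal sketch of that control flow follows. The classifier regex is copied from the hunk above; `sendWithFailSafe`, the `Attempt` shape, and the fake providers are hypothetical, and the sender is synchronous here only to keep the sketch short (the real path awaits `fetch`).

```typescript
type Attempt = { status: number; detail: string };

// Same test as isReasoningArtifactError in the diff.
const isReasoningArtifactError = (status: number, detail: string): boolean =>
  status === 400 && /thinking|signature|redacted_thinking/i.test(detail);

// Hypothetical synchronous stand-in for the async sender in postAnthropic.
function sendWithFailSafe(send: (stripArtifacts: boolean) => Attempt): { result: Attempt; attempts: number } {
  const first = send(false);
  if (!isReasoningArtifactError(first.status, first.detail)) {
    return { result: first, attempts: 1 };
  }
  // Retry exactly once with plain history; whatever comes back is final, so a
  // provider that keeps rejecting cannot cause a retry loop.
  return { result: send(true), attempts: 2 };
}

// Rejects replayed artifacts, accepts plain history.
const expiredSignature = (strip: boolean): Attempt =>
  strip ? { status: 200, detail: "" } : { status: 400, detail: "Invalid signature in thinking block" };

// Fails for an unrelated reason, so the fail-safe must not trigger.
const badRequest = (_strip: boolean): Attempt => ({ status: 400, detail: "max_tokens too large" });

// Rejects even the stripped request.
const alwaysRejects = (_strip: boolean): Attempt => ({ status: 400, detail: "signature mismatch" });

const recovered = sendWithFailSafe(expiredSignature);
const untouched = sendWithFailSafe(badRequest);
const gaveUp = sendWithFailSafe(alwaysRejects);
console.log(recovered, untouched, gaveUp);
```

The trade-off in this design is that the stripped retry loses reasoning continuity for that step, which the changelog accepts in exchange for the turn never being wedged by an expired signature or edited history.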
@@ -233,8 +294,16 @@ export const anthropicAdapter: ProviderAdapter = {
  supportsNativeTools: true,
  async call(messages, options, credential) {
  const response = await postAnthropic(messages, options, credential, false);
- const result = (await response.json()) as { content: { type: string; text?: string; name?: string; input?: unknown }[]; stop_reason?: string; usage?: AnthropicUsage };
297
+ const result = (await response.json()) as { content: { type: string; text?: string; name?: string; input?: unknown; thinking?: string; signature?: string; data?: string }[]; stop_reason?: string; usage?: AnthropicUsage };
237
298
  if (result.usage) options.onUsage?.({ inputTokens: totalInputTokens(result.usage), outputTokens: result.usage.output_tokens });
299
+ // Capture thinking/redacted blocks as replay artifacts (parity with the stream path).
300
+ for (const c of result.content) {
301
+ if (c.type === "thinking" && (c.thinking || c.signature)) {
302
+ options.onReasoningArtifact?.({ provider: "anthropic", model: options.model, text: c.thinking || undefined, signature: c.signature });
303
+ } else if (c.type === "redacted_thinking" && c.data) {
304
+ options.onReasoningArtifact?.({ provider: "anthropic", model: options.model, redacted: c.data });
305
+ }
306
+ }
238
307
  // Prefer a native tool call (re-serialized to canonical JSON) over any stray text.
239
308
  const toolCall = serializeToolCalls(
240
309
  result.content
@@ -256,12 +325,16 @@ export const anthropicAdapter: ProviderAdapter = {
256
325
  // never as text_delta — accumulate per block index, then re-serialize to canonical
257
326
  // JSON and yield it once at the end (concatenation still equals call()).
258
327
  const toolBlocks = new Map<number, { name: string; args: string }>();
328
+ // Thinking blocks stream as content_block_start(type:thinking) + thinking_delta(text)
329
+ // + signature_delta(signature). Accumulate per index and emit one ReasoningArtifact per
330
+ // block on stream end so the signed thought can be replayed (gajae continuity).
331
+ const thinkBlocks = new Map<number, { text: string; signature?: string }>();
259
332
  for await (const data of readSse(response.body)) {
260
333
  let evt: {
261
334
  type?: string;
262
335
  index?: number;
263
- content_block?: { type?: string; name?: string };
264
- delta?: { type?: string; text?: string; partial_json?: string; thinking?: string; stop_reason?: string };
336
+ content_block?: { type?: string; name?: string; data?: string };
337
+ delta?: { type?: string; text?: string; partial_json?: string; thinking?: string; signature?: string; stop_reason?: string };
265
338
  message?: { usage?: AnthropicUsage };
266
339
  usage?: { output_tokens?: number };
267
340
  };
@@ -272,6 +345,11 @@ export const anthropicAdapter: ProviderAdapter = {
272
345
  }
273
346
  if (evt.type === "content_block_start" && evt.content_block?.type === "tool_use" && typeof evt.index === "number") {
274
347
  toolBlocks.set(evt.index, { name: evt.content_block.name ?? "", args: "" });
348
+ } else if (evt.type === "content_block_start" && evt.content_block?.type === "thinking" && typeof evt.index === "number") {
349
+ thinkBlocks.set(evt.index, { text: "" });
350
+ } else if (evt.type === "content_block_start" && evt.content_block?.type === "redacted_thinking" && evt.content_block.data) {
351
+ // Redacted thinking carries opaque `data` directly (no deltas) — emit immediately.
352
+ options.onReasoningArtifact?.({ provider: "anthropic", model: options.model, redacted: evt.content_block.data });
275
353
  } else if (evt.type === "content_block_delta" && evt.delta?.type === "input_json_delta" && typeof evt.index === "number") {
276
354
  const b = toolBlocks.get(evt.index);
277
355
  if (b) b.args += evt.delta.partial_json ?? "";
@@ -280,6 +358,15 @@ export const anthropicAdapter: ProviderAdapter = {
280
358
  yield evt.delta.text;
281
359
  } else if (evt.type === "content_block_delta" && evt.delta?.type === "thinking_delta" && evt.delta.thinking) {
282
360
  options.onReasoning?.(evt.delta.thinking);
361
+ if (typeof evt.index === "number") {
362
+ const tb = thinkBlocks.get(evt.index) ?? { text: "" };
363
+ tb.text += evt.delta.thinking;
364
+ thinkBlocks.set(evt.index, tb);
365
+ }
366
+ } else if (evt.type === "content_block_delta" && evt.delta?.type === "signature_delta" && evt.delta.signature && typeof evt.index === "number") {
367
+ const tb = thinkBlocks.get(evt.index) ?? { text: "" };
368
+ tb.signature = (tb.signature ?? "") + evt.delta.signature;
369
+ thinkBlocks.set(evt.index, tb);
283
370
  } else if (evt.type === "message_start" && evt.message?.usage) {
284
371
  // Cache only — usage is reported ONCE at message_delta so an accumulating
285
372
  // sink can't double-count input (and a pre-first-chunk retry that replays
@@ -290,6 +377,12 @@ export const anthropicAdapter: ProviderAdapter = {
290
377
  if (evt.usage) options.onUsage?.({ inputTokens: cachedInput, outputTokens: evt.usage.output_tokens });
291
378
  }
292
379
  }
380
+ // Emit captured thinking blocks as replay artifacts (signed thought + signature).
381
+ for (const tb of thinkBlocks.values()) {
382
+ if (tb.text || tb.signature) {
383
+ options.onReasoningArtifact?.({ provider: "anthropic", model: options.model, text: tb.text || undefined, signature: tb.signature });
384
+ }
385
+ }
293
386
  const envelope = serializeAccumulatedToolCalls(toolBlocks);
294
387
  if (envelope) { yieldedAny = true; yield envelope; }
295
388
  if (!yieldedAny) throw emptyCompletionError(stopReason);
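The per-index accumulation added to the stream path above can be sketched in isolation. This is a simplified model, not the adapter itself: `SseEvent`, `Artifact`, and `collectArtifacts` are hypothetical names, the events are hand-written, and the artifacts are returned in an array instead of being passed to `options.onReasoningArtifact`.

```typescript
// Minimal event shape: only the fields the accumulation logic reads.
type SseEvent = {
  type: string;
  index?: number;
  content_block?: { type?: string; data?: string };
  delta?: { type?: string; thinking?: string; signature?: string };
};

// Mirrors the fields the diff passes to onReasoningArtifact (provider/model omitted).
type Artifact = { text?: string; signature?: string; redacted?: string };

function collectArtifacts(events: SseEvent[]): Artifact[] {
  const out: Artifact[] = [];
  const blocks = new Map<number, { text: string; signature?: string }>();
  for (const evt of events) {
    if (evt.type === "content_block_start" && evt.content_block?.type === "redacted_thinking" && evt.content_block.data) {
      // Redacted thinking has no deltas, so it is emitted as soon as it starts.
      out.push({ redacted: evt.content_block.data });
    } else if (evt.type === "content_block_start" && evt.content_block?.type === "thinking" && typeof evt.index === "number") {
      blocks.set(evt.index, { text: "" });
    } else if (evt.type === "content_block_delta" && typeof evt.index === "number") {
      const tb = blocks.get(evt.index) ?? { text: "" };
      if (evt.delta?.type === "thinking_delta" && evt.delta.thinking) tb.text += evt.delta.thinking;
      else if (evt.delta?.type === "signature_delta" && evt.delta.signature) tb.signature = (tb.signature ?? "") + evt.delta.signature;
      else continue; // Not a thinking-related delta: leave the map untouched.
      blocks.set(evt.index, tb);
    }
  }
  // Signed thinking blocks are emitted once, after the stream ends.
  for (const tb of blocks.values()) {
    if (tb.text || tb.signature) out.push({ text: tb.text || undefined, signature: tb.signature });
  }
  return out;
}

const artifacts = collectArtifacts([
  { type: "content_block_start", index: 0, content_block: { type: "thinking" } },
  { type: "content_block_delta", index: 0, delta: { type: "thinking_delta", thinking: "Check " } },
  { type: "content_block_delta", index: 0, delta: { type: "thinking_delta", thinking: "the cache." } },
  { type: "content_block_delta", index: 0, delta: { type: "signature_delta", signature: "sig-abc" } },
  { type: "content_block_start", index: 1, content_block: { type: "redacted_thinking", data: "opaque-blob" } },
]);
console.log(artifacts);
```

In this model the redacted artifact comes out first even though its block has the higher index, because redacted blocks are emitted immediately while signed thinking blocks wait for the end of the stream. The diff's stream path has the same shape. Whether that ordering matters on replay depends on how the consumer reassembles the blocks, which this hunk does not show.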
@@ -108,6 +108,12 @@ export async function resolveAntigravityProjectId(
108
108
 
109
109
  type CcaPart = { text: string } | { inlineData: { mimeType: string; data: string } };
110
110
 
111
+ // Reasoning-artifact replay (signed thinking / thoughtSignature / encrypted reasoning) is
112
+ // deliberately OUT OF SCOPE for antigravity: it serves Gemini- and Claude-shaped models over
113
+ // the CCA wire (neither the native Anthropic messages nor the public Gemini shape), so it
114
+ // captures no artifacts and replays none — Message.toolUse/toolResults/reasoningArtifacts are
115
+ // ignored here. The provider-keyed match guard (D3) keeps "anthropic"/"gemini" artifacts from
116
+ // ever being re-injected by this adapter, so there is no cross-adapter leakage.
111
117
  function antigravityContents(messages: Message[]): { role: "user" | "model"; parts: CcaPart[] }[] {
112
118
  const contents: { role: "user" | "model"; parts: CcaPart[] }[] = [];
113
119
  for (const m of messages) {
@@ -54,6 +54,24 @@ export function parseRetryFromBody(detail: string | null | undefined): number |
54
54
  * and any `Retry-After`. Use at every adapter's `!response.ok` site so the retry
55
55
  * layer sees a uniform, status-carrying, backoff-aware error.
56
56
  */
57
+ /**
58
+ * One-shot reasoning-artifact fail-safe: send the request; if it 400s because a replayed
59
+ * reasoning artifact (signature / thoughtSignature / encrypted reasoning item) was rejected
60
+ * — expired signature, edited history, toggled thinking — retry ONCE with artifacts stripped
61
+ * (plain history). `send(strip)` rebuilds + fetches; `isArtifactError` matches the 400 body.
62
+ * ponytail: heuristic error-body string match — tighten to structured error codes if/when
63
+ * the providers expose them.
64
+ */
65
+ export async function fetchWithArtifactFailSafe(
66
+ send: (stripArtifacts: boolean) => Promise<Response>,
67
+ isArtifactError: (status: number, body: string) => boolean,
68
+ ): Promise<Response> {
69
+ const res = await send(false);
70
+ if (res.ok) return res;
71
+ const body = await res.clone().text().catch(() => "");
72
+ return isArtifactError(res.status, body) ? send(true) : res;
73
+ }
74
+
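A minimal sketch of how the one-shot fail-safe above behaves, under stated assumptions: `MiniResponse` is a hypothetical stand-in for the fetch `Response` (so `clone()` is dropped), `failSafe` repeats the control flow of `fetchWithArtifactFailSafe`, and `fakeSend` is an invented provider stub that rejects replayed artifacts and accepts plain history.

```typescript
// Stand-in for fetch's Response, limited to what the helper reads (assumption).
type MiniResponse = { ok: boolean; status: number; text(): Promise<string> };

// Same control flow as fetchWithArtifactFailSafe in the diff, minus clone().
async function failSafe(
  send: (stripArtifacts: boolean) => Promise<MiniResponse>,
  isArtifactError: (status: number, body: string) => boolean,
): Promise<MiniResponse> {
  const res = await send(false);
  if (res.ok) return res;
  const body = await res.text().catch(() => "");
  return isArtifactError(res.status, body) ? send(true) : res;
}

// Hypothetical stub: 400 when artifacts are replayed, 200 once they are stripped.
const calls: boolean[] = [];
const fakeSend = async (strip: boolean): Promise<MiniResponse> => {
  calls.push(strip);
  return strip
    ? { ok: true, status: 200, text: async () => "{}" }
    : { ok: false, status: 400, text: async () => "invalid thoughtSignature" };
};

const done = failSafe(fakeSend, (status, body) => status === 400 && /signature/i.test(body))
  .then((res) => {
    console.log(res.status, calls); // 200 [ false, true ]
    return res;
  });
```

The helper returns whatever the second attempt produces without checking `ok`, so a caller still has to handle a failed retry itself. This differs from `postAnthropic` earlier in the diff, which throws `providerHttpError` when the stripped retry also fails.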
57
75
  export async function providerHttpError(provider: string, response: Response, context?: string): Promise<ProviderHttpError> {
58
76
  const detail = await response.text().catch(() => "");
59
77
  const retryAfterMs = parseRetryAfter(response.headers.get("retry-after")) ?? parseRetryFromBody(detail);