jeo-code 0.6.27 → 0.6.29

package/CHANGELOG.md CHANGED
@@ -6,6 +6,32 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
6
6
 
7
7
  The README mirrors the latest 5 entries — regenerate with `bun run changelog:sync`.
8
8
 
9
+ ## [0.6.29] - 2026-06-19
10
+ _Signature-only thinking-block replay (Anthropic opus-4-7/4-8), plus a tmux mouse-flood memory guard confirming `jeo --tmux` does not leak._
11
+
12
+ ### Fixed
13
+ - **Anthropic thinking-block replay now covers signature-only artifacts.** Newer Opus models (opus-4-7/opus-4-8) think internally — tokens billed, a valid `signature` present — but return empty thinking text. The cross-turn replay required both `signature` AND `text`, so those models' reasoning was dropped between steps. Replay now sends a signed `thinking` block whenever a `signature` (or `redacted`) is present (text defaults to `""`), restoring multi-step reasoning continuity for signature-only models. API-key requests also send the `interleaved-thinking` + `prompt-caching-scope` betas so thinking+tools and scoped caching work outside OAuth.
14
+
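The signature-only fix described in the entry above can be sketched roughly as follows. This is a minimal illustration, not jeo's adapter code: the `ReasoningArtifact` shape, the `toReplayBlock` helper, and the mapping of `redacted` onto a `redacted_thinking` block are assumptions made for the example.

```typescript
// Hypothetical artifact shape for illustration; jeo's real type lives in src/ai/types.
interface ReasoningArtifact {
  text?: string;      // visible thinking text (absent or empty on signature-only models)
  signature?: string; // provider signature over the thinking block
  redacted?: string;  // opaque payload of a redacted thinking block
}

type ReplayBlock =
  | { type: "thinking"; thinking: string; signature: string }
  | { type: "redacted_thinking"; data: string };

// Returns the block to replay, or null when nothing replayable was captured.
function toReplayBlock(a: ReasoningArtifact): ReplayBlock | null {
  if (a.redacted) return { type: "redacted_thinking", data: a.redacted };
  // Per the changelog, the old gate required BOTH signature and text;
  // here a signature alone is enough and the text defaults to "".
  if (a.signature) return { type: "thinking", thinking: a.text ?? "", signature: a.signature };
  return null;
}

const signatureOnly = toReplayBlock({ signature: "sig-abc" });
const unsigned = toReplayBlock({ text: "loose thought" });
console.log(signatureOnly, unsigned);
```

An unsigned artifact still yields `null` in this sketch, so only provider-signed reasoning is ever sent back.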
15
+ ### Added
16
+ - **`claude-opus-4-7` catalogued** (FULL thinking, 200k ctx) and a dynamic context-window fallback for uncatalogued ids (claude 200k / gpt-5 400k / gemini-3 1M).
17
+ - **tmux mouse-report-flood memory guard** (`test/mouse-report-filter.test.ts`): 100k SGR mouse-move reports through `queuePromptInputChunk` leave the prompt queue at zero accumulation — the regression guard for the "`jeo --tmux` slows down over time" concern.
18
+
19
+ ### Verified
20
+ - **`jeo --tmux` has no bun memory leak.** The in-process lifecycle probe (`scripts/mem-probe.ts`, 3000 turns) reports a per-turn heap slope of ≈0 (returns to baseline, exit-listeners flat); a real `jeo --tmux` process plateaus in RSS under sustained mouse/resize/keystroke churn instead of climbing; and mouse reports are filtered (not buffered) with `activityLog` bounded to a 200-entry per-turn ring.
21
+
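A rough sketch of the two mechanisms named in the Verified entry above: filtering SGR mouse reports instead of buffering them, and bounding a log to a fixed-size ring. `stripMouseReports`, `BoundedLog`, and `pendingInput` are hypothetical names; the real `queuePromptInputChunk` and `activityLog` are not shown in this diff.

```typescript
// SGR mouse report: ESC [ < button ; column ; row, ending in M (press/move) or m (release).
const SGR_MOUSE_REPORT = /\x1b\[<\d+;\d+;\d+[Mm]/g;

// Drop mouse reports from an input chunk, keeping everything else.
function stripMouseReports(chunk: string): string {
  return chunk.replace(SGR_MOUSE_REPORT, "");
}

// Fixed-capacity log: once full, each push evicts the oldest entry.
class BoundedLog<T> {
  private items: T[] = [];
  private readonly capacity: number;
  constructor(capacity: number) {
    this.capacity = capacity;
  }
  push(item: T): void {
    this.items.push(item);
    if (this.items.length > this.capacity) this.items.shift();
  }
  get size(): number {
    return this.items.length;
  }
}

const pendingInput: string[] = [];
const activity = new BoundedLog<string>(200);
for (let i = 0; i < 100_000; i++) {
  const kept = stripMouseReports(`\x1b[<35;${i % 200};${i % 50}M`);
  if (kept) pendingInput.push(kept); // only non-mouse input is queued
  activity.push(`mouse-move ${i}`);
}
console.log(pendingInput.length, activity.size); // prints: 0 200
```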
22
+ ## [0.6.28] - 2026-06-19
23
+ _Signed thinking-block replay: native reasoning is now sent BACK to providers across steps/turns, restoring multi-step reasoning continuity (gajae parity)._
24
+
25
+ ### Added
26
+ - **Provider-native reasoning replay across all three first-party providers.** jeo now captures each provider's opaque/signed reasoning artifact during streaming and replays it on later turns to the SAME provider+model, so the model keeps its chain of thought across tool steps instead of re-deriving it. New `Message.reasoningArtifacts` plus structured `Message.toolUse` / `toolResults` (stable ids) let capable adapters reconstruct **native** tool blocks (the key to continuity — plain-text tool feedback makes Claude strip prior thinking):
27
+ - **Anthropic**: captures `signature_delta` + `redacted_thinking`; replays `thinking`(+signature) → `tool_use` → `tool_result` blocks (gated on same-model + thinking-enabled).
28
+ - **OpenAI Responses**: requests `include: ["reasoning.encrypted_content"]` (store stays false), captures reasoning item id+encrypted_content, replays native `reasoning` + `function_call` + `function_call_output` items.
29
+ - **Gemini**: captures per-part `thoughtSignature`, replays native `functionCall`(+thoughtSignature) / `functionResponse` parts (coalescing-safe). This was previously deferred — structured `toolUse` unblocks the functionCall binding.
30
+ - **Fail-safe strip-and-retry.** A 400 naming a thinking/signature/encrypted/reasoning field retries the step ONCE with artifacts stripped (plain history), so an expired signature or edited history can never wedge a turn. Per provider (Anthropic/OpenAI/Gemini).
31
+
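The fail-safe strip-and-retry behaviour above might look like the following sketch. The error shape (`status`, `message`), the field-name regex, and the helper names are illustrative assumptions; the actual per-provider checks are not part of this diff.

```typescript
// Hypothetical shapes for illustration only.
interface ChatMessage {
  role: string;
  content: string;
  reasoningArtifacts?: unknown[];
}
interface ProviderError {
  status: number;
  message: string;
}

const REPLAY_FIELD = /thinking|signature|encrypted|reasoning/i;

// True when a 400 response names one of the replay-related fields.
function isReplayRejection(err: unknown): boolean {
  const e = err as Partial<ProviderError> | null;
  return !!e && e.status === 400 && typeof e.message === "string" && REPLAY_FIELD.test(e.message);
}

// Plain history: keep role and content, drop the opaque artifacts.
function stripArtifacts(messages: ChatMessage[]): ChatMessage[] {
  return messages.map(m => ({ role: m.role, content: m.content }));
}

async function sendWithStripRetry<T>(
  messages: ChatMessage[],
  send: (messages: ChatMessage[]) => Promise<T>,
): Promise<T> {
  try {
    return await send(messages);
  } catch (err) {
    if (!isReplayRejection(err)) throw err;
    // Exactly one retry with artifacts stripped; a second failure propagates.
    return send(stripArtifacts(messages));
  }
}
```

Because the retry is not wrapped in another `try`, an expired signature costs at most one extra request and unrelated errors surface unchanged.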
32
+ ### Changed
33
+ - **Reasoning artifacts ride the session record + token accounting.** `reasoningArtifacts` round-trips through session save/load (so `/resume` preserves replay continuity) and counts toward `estimateMessageTokens` (OpenAI encrypted blobs are KB-scale) so compaction/overflow stay honest. Markdown export is unchanged (artifacts are opaque). The engine's ~11 assistant-push sites are unified behind `pushAssistantTurn`, so every step (not just the final reply) carries its reasoning + artifacts. Antigravity is explicitly out of scope (no capture/replay; the provider-keyed match guard prevents any cross-adapter leakage).
34
+
9
35
  ## [0.6.27] - 2026-06-19
10
36
  _Ponytail pass on the reasoning-tier mapper, plus a real-tmux verification of `jeo --tmux`._
11
37
 
package/README.ja.md CHANGED
@@ -2,10 +2,6 @@
2
2
  <img src="assets/hero.png" alt="jeo-code 自律コーディングエージェントのヒーローイラスト" width="100%" />
3
3
  </p>
4
4
 
5
- <p align="center">
6
- <img src="assets/icon.png" alt="jeo-code icon" width="96" />
7
- </p>
8
-
9
5
  <h1 align="center">jeo-code (jeo)</h1>
10
6
 
11
7
  <p align="center">
@@ -204,11 +200,11 @@ CI は `.github/workflows/npm-publish.yml` で公開します — GitHub リリ
204
200
  ## 変更履歴 (Changelog)
205
201
 
206
202
  <!-- CHANGELOG:START (auto-generated from CHANGELOG.md — run `bun run changelog:sync`) -->
203
+ - **[0.6.29]** (2026-06-19) — Signature-only thinking-block replay (Anthropic opus-4-7/4-8), plus a tmux mouse-flood memory guard confirming `jeo --tmux` does not leak.
204
+ - **[0.6.28]** (2026-06-19) — Signed thinking-block replay: native reasoning is now sent BACK to providers across steps/turns, restoring multi-step reasoning continuity (gajae parity).
207
205
  - **[0.6.27]** (2026-06-19) — Ponytail pass on the reasoning-tier mapper, plus a real-tmux verification of `jeo --tmux`.
208
206
  - **[0.6.26]** (2026-06-19) — The forge emblem is redrawn again as the mascot crayfish, foregrounding its signature pincer claws (집게).
209
207
  - **[0.6.25]** (2026-06-19) — Reasoning works at every thinking level (gajae parity), and the forge emblem is redrawn as the neon-lens coding wizard.
210
- - **[0.6.24]** (2026-06-19) — `/provider` opens an interactive onboarding selector (OAuth vs API-compatible), and OpenAI-compatible backends gain per-vendor native-reasoning formats.
211
- - **[0.6.23]** (2026-06-19) — Live reasoning/thinking streams in the TUI across every provider, three new OpenAI-compatible backends (LM Studio, xAI, Kimi) join the auth/discovery/catalog surface, and Gemini gains native function-calling.
212
208
 
213
209
  See [CHANGELOG.md](CHANGELOG.md) for the full history.
214
210
  <!-- CHANGELOG:END -->
package/README.ko.md CHANGED
@@ -2,10 +2,6 @@
2
2
  <img src="assets/hero.png" alt="jeo-code 자율 코딩 에이전트 히어로 일러스트" width="100%" />
3
3
  </p>
4
4
 
5
- <p align="center">
6
- <img src="assets/icon.png" alt="jeo-code icon" width="96" />
7
- </p>
8
-
9
5
  <h1 align="center">jeo-code (jeo)</h1>
10
6
 
11
7
  <p align="center">
@@ -204,11 +200,11 @@ CI는 `.github/workflows/npm-publish.yml`로 배포합니다 — GitHub 릴리
204
200
  ## 변경 이력 (Changelog)
205
201
 
206
202
  <!-- CHANGELOG:START (auto-generated from CHANGELOG.md — run `bun run changelog:sync`) -->
203
+ - **[0.6.29]** (2026-06-19) — Signature-only thinking-block replay (Anthropic opus-4-7/4-8), plus a tmux mouse-flood memory guard confirming `jeo --tmux` does not leak.
204
+ - **[0.6.28]** (2026-06-19) — Signed thinking-block replay: native reasoning is now sent BACK to providers across steps/turns, restoring multi-step reasoning continuity (gajae parity).
207
205
  - **[0.6.27]** (2026-06-19) — Ponytail pass on the reasoning-tier mapper, plus a real-tmux verification of `jeo --tmux`.
208
206
  - **[0.6.26]** (2026-06-19) — The forge emblem is redrawn again as the mascot crayfish, foregrounding its signature pincer claws (집게).
209
207
  - **[0.6.25]** (2026-06-19) — Reasoning works at every thinking level (gajae parity), and the forge emblem is redrawn as the neon-lens coding wizard.
210
- - **[0.6.24]** (2026-06-19) — `/provider` opens an interactive onboarding selector (OAuth vs API-compatible), and OpenAI-compatible backends gain per-vendor native-reasoning formats.
211
- - **[0.6.23]** (2026-06-19) — Live reasoning/thinking streams in the TUI across every provider, three new OpenAI-compatible backends (LM Studio, xAI, Kimi) join the auth/discovery/catalog surface, and Gemini gains native function-calling.
212
208
 
213
209
  See [CHANGELOG.md](CHANGELOG.md) for the full history.
214
210
  <!-- CHANGELOG:END -->
package/README.md CHANGED
@@ -2,10 +2,6 @@
2
2
  <img src="assets/hero.png" alt="jeo-code autonomous coding-agent hero illustration" width="100%" />
3
3
  </p>
4
4
 
5
- <p align="center">
6
- <img src="assets/icon.png" alt="jeo-code icon" width="96" />
7
- </p>
8
-
9
5
  <h1 align="center">jeo-code (jeo)</h1>
10
6
 
11
7
  <p align="center">
@@ -204,11 +200,11 @@ Required npm token permissions (repository secret `NPM_TOKEN`):
204
200
  ## Changelog
205
201
 
206
202
  <!-- CHANGELOG:START (auto-generated from CHANGELOG.md — run `bun run changelog:sync`) -->
203
+ - **[0.6.29]** (2026-06-19) — Signature-only thinking-block replay (Anthropic opus-4-7/4-8), plus a tmux mouse-flood memory guard confirming `jeo --tmux` does not leak.
204
+ - **[0.6.28]** (2026-06-19) — Signed thinking-block replay: native reasoning is now sent BACK to providers across steps/turns, restoring multi-step reasoning continuity (gajae parity).
207
205
  - **[0.6.27]** (2026-06-19) — Ponytail pass on the reasoning-tier mapper, plus a real-tmux verification of `jeo --tmux`.
208
206
  - **[0.6.26]** (2026-06-19) — The forge emblem is redrawn again as the mascot crayfish, foregrounding its signature pincer claws (집게).
209
207
  - **[0.6.25]** (2026-06-19) — Reasoning works at every thinking level (gajae parity), and the forge emblem is redrawn as the neon-lens coding wizard.
210
- - **[0.6.24]** (2026-06-19) — `/provider` opens an interactive onboarding selector (OAuth vs API-compatible), and OpenAI-compatible backends gain per-vendor native-reasoning formats.
211
- - **[0.6.23]** (2026-06-19) — Live reasoning/thinking streams in the TUI across every provider, three new OpenAI-compatible backends (LM Studio, xAI, Kimi) join the auth/discovery/catalog surface, and Gemini gains native function-calling.
212
208
 
213
209
  See [CHANGELOG.md](CHANGELOG.md) for the full history.
214
210
  <!-- CHANGELOG:END -->
package/README.zh.md CHANGED
@@ -2,10 +2,6 @@
2
2
  <img src="assets/hero.png" alt="jeo-code 自主编码代理主视觉插图" width="100%" />
3
3
  </p>
4
4
 
5
- <p align="center">
6
- <img src="assets/icon.png" alt="jeo-code icon" width="96" />
7
- </p>
8
-
9
5
  <h1 align="center">jeo-code (jeo)</h1>
10
6
 
11
7
  <p align="center">
@@ -204,11 +200,11 @@ CI 通过 `.github/workflows/npm-publish.yml` 发布 — GitHub 发布 release
204
200
  ## 更新日志 (Changelog)
205
201
 
206
202
  <!-- CHANGELOG:START (auto-generated from CHANGELOG.md — run `bun run changelog:sync`) -->
203
+ - **[0.6.29]** (2026-06-19) — Signature-only thinking-block replay (Anthropic opus-4-7/4-8), plus a tmux mouse-flood memory guard confirming `jeo --tmux` does not leak.
204
+ - **[0.6.28]** (2026-06-19) — Signed thinking-block replay: native reasoning is now sent BACK to providers across steps/turns, restoring multi-step reasoning continuity (gajae parity).
207
205
  - **[0.6.27]** (2026-06-19) — Ponytail pass on the reasoning-tier mapper, plus a real-tmux verification of `jeo --tmux`.
208
206
  - **[0.6.26]** (2026-06-19) — The forge emblem is redrawn again as the mascot crayfish, foregrounding its signature pincer claws (집게).
209
207
  - **[0.6.25]** (2026-06-19) — Reasoning works at every thinking level (gajae parity), and the forge emblem is redrawn as the neon-lens coding wizard.
210
- - **[0.6.24]** (2026-06-19) — `/provider` opens an interactive onboarding selector (OAuth vs API-compatible), and OpenAI-compatible backends gain per-vendor native-reasoning formats.
211
- - **[0.6.23]** (2026-06-19) — Live reasoning/thinking streams in the TUI across every provider, three new OpenAI-compatible backends (LM Studio, xAI, Kimi) join the auth/discovery/catalog surface, and Gemini gains native function-calling.
212
208
 
213
209
  See [CHANGELOG.md](CHANGELOG.md) for the full history.
214
210
  <!-- CHANGELOG:END -->
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "jeo-code",
3
- "version": "0.6.27",
3
+ "version": "0.6.29",
4
4
  "description": "Clean, highly optimized AI coding agent using spec-first loop",
5
5
  "type": "module",
6
6
  "main": "src/cli.ts",
@@ -78,7 +78,16 @@ const messageTokenCache = new WeakMap<Message, number>();
78
78
  export function estimateMessageTokens(msg: Message): number {
79
79
  const hit = messageTokenCache.get(msg);
80
80
  if (hit !== undefined) return hit;
81
- const n = estimateTokens(msg.role) + estimateTokens(msg.content) + (msg.images?.length ?? 0) * IMAGE_TOKEN_ESTIMATE + 1;
81
+ let n = estimateTokens(msg.role) + estimateTokens(msg.content) + (msg.images?.length ?? 0) * IMAGE_TOKEN_ESTIMATE + 1;
82
+ // Native reasoning artifacts (signature / encrypted_content / thought text) are NOT in
83
+ // `content` but become REAL input tokens once an adapter replays them — count them so
84
+ // the context meter and compaction trigger stay honest (OpenAI encrypted blobs are KB-scale).
85
+ // toolUse/toolResults/toolResultExtra are already reflected in `content`, so they are not re-added.
86
+ for (const a of msg.reasoningArtifacts ?? []) {
87
+ n += estimateTokens(a.text ?? "") + estimateTokens(a.signature ?? "")
88
+ + estimateTokens(a.redacted ?? "") + estimateTokens(a.thoughtSignature ?? "")
89
+ + estimateTokens(a.encrypted ?? "");
90
+ }
82
91
  messageTokenCache.set(msg, n);
83
92
  return n;
84
93
  }
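To see why the hunk above matters, here is a small sketch of the artifact accounting on its own. `roughTokens` is an assumed heuristic of about four characters per token, since jeo's real `estimateTokens` is not shown in this diff; the artifact fields mirror the ones the hunk reads.

```typescript
// Fields mirror those read in the hunk above; the interface itself is a stand-in.
interface ReasoningArtifact {
  text?: string;
  signature?: string;
  redacted?: string;
  thoughtSignature?: string;
  encrypted?: string;
}

// Assumed heuristic: roughly four characters per token.
function roughTokens(s: string): number {
  return Math.ceil(s.length / 4);
}

// Sum the estimated tokens of every replayable field on every artifact.
function artifactTokens(artifacts: ReasoningArtifact[] | undefined): number {
  let n = 0;
  for (const a of artifacts ?? []) {
    n += roughTokens(a.text ?? "") + roughTokens(a.signature ?? "")
      + roughTokens(a.redacted ?? "") + roughTokens(a.thoughtSignature ?? "")
      + roughTokens(a.encrypted ?? "");
  }
  return n;
}

const encryptedBlob = "A".repeat(8_192); // stands in for a KB-scale encrypted reasoning item
console.log(artifactTokens([{ encrypted: encryptedBlob }])); // prints: 2048
```

Under this heuristic a single 8 KB blob adds about two thousand tokens that the old `content`-only estimate would have missed.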
@@ -34,11 +34,30 @@ async function invokeCallLlm(history: Message[], options: {
34
34
  onRetry?: (attempt: number, err: unknown, delayMs: number) => void;
35
35
  onToken?: (delta: string) => void;
36
36
  onReasoning?: (delta: string) => void;
37
+ onReasoningArtifact?: (artifact: import("../ai/types").ReasoningArtifact) => void;
37
38
  tools?: import("../ai/types").NativeToolSchema[];
38
39
  }): Promise<string> {
39
40
  const mod = await import("./loop");
40
41
  return mod.callLlm(history, options);
41
42
  }
43
+
44
+ /** Push an assistant turn, attaching the step's reasoning + native replay records when
45
+ * present. Centralizes the assistant-push sites so reasoning/artifacts attach uniformly
46
+ * (not just the final reply). Omits empty fields so back-compat serialization and the
47
+ * identity-keyed token cache are unaffected. */
48
+ function pushAssistantTurn(
49
+ history: Message[],
50
+ content: string,
51
+ reasoning: string,
52
+ artifacts: import("../ai/types").ReasoningArtifact[],
53
+ toolUse?: import("../ai/types").ToolUseRecord[],
54
+ ): void {
55
+ const msg: Message = { role: "assistant", content };
56
+ if (reasoning.trim()) msg.reasoning = reasoning;
57
+ if (artifacts.length) msg.reasoningArtifacts = artifacts;
58
+ if (toolUse && toolUse.length) msg.toolUse = toolUse;
59
+ history.push(msg);
60
+ }
42
61
  export interface ToolInvocation {
43
62
  tool: string;
44
63
  arguments?: Record<string, any>;
@@ -176,6 +195,9 @@ export interface AgentLoopEvents {
176
195
  /** Accumulated native reasoning/thinking text so far — drives a transient dimmed
177
196
  * "thinking" view. Only requested when a consumer (TUI) attaches. */
178
197
  onReasoningStream?(textSoFar: string): void;
198
+ /** Each provider-native reasoning ARTIFACT as it is captured (signature / thoughtSignature /
199
+ * reasoning item). Lets the final-reply path (launch.ts) persist artifacts for replay. */
200
+ onReasoningArtifactStream?(artifact: import("../ai/types").ReasoningArtifact): void;
179
201
  /** Step-budget change (gjc-style retry flow): the limit was extended because the
180
202
  * turn is making progress. `limit` is the new max; `reason` is display-ready. */
181
203
  onBudget?(limit: number, reason: string): void;
@@ -345,7 +367,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
345
367
  );
346
368
  const consolidated = wrapUp.trim();
347
369
  if (consolidated) {
348
- history.push({ role: "assistant", content: consolidated });
370
+ pushAssistantTurn(history, consolidated, "", []);
349
371
  return finish({
350
372
  done: false,
351
373
  steps: step,
@@ -493,6 +515,14 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
493
515
  const onReasoning = ev.onReasoningStream
494
516
  ? (delta: string) => { reasonBuf += delta; ev.onReasoningStream!(reasonBuf); }
495
517
  : undefined;
518
+ // Capture provider-native reasoning ARTIFACTS for replay (always — independent of any
519
+ // TUI display sink). Stays scoped to THIS step so a later consolidation push can't
520
+ // inherit a prior step's signatures.
521
+ const artifactBuf: import("../ai/types").ReasoningArtifact[] = [];
522
+ const onReasoningArtifact = (a: import("../ai/types").ReasoningArtifact) => {
523
+ artifactBuf.push(a);
524
+ ev.onReasoningArtifactStream?.(a);
525
+ };
496
526
  let responseText: string;
497
527
  try {
498
528
  responseText = await invokeCallLlm(history, {
@@ -510,6 +540,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
510
540
  onUsage: u => { acc.inputTokens += u.inputTokens ?? 0; acc.outputTokens += u.outputTokens ?? 0; sawUsage = true; },
511
541
  onToken,
512
542
  onReasoning,
543
+ onReasoningArtifact,
513
544
  // Make provider auto-retry visible: previously a rate-limited call sat in a
514
545
  // silent backoff wait, then surfaced "auto-retry was exhausted" with no trace
515
546
  // of the retries that DID happen.
@@ -604,10 +635,10 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
604
635
  const trimmed = responseText.trim();
605
636
  parseFailures++;
606
637
  if (trimmed && (!trimmed.includes("{") || parseFailures > MAX_PARSE_BOUNCES)) {
607
- history.push({ role: "assistant", content: responseText });
638
+ pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
608
639
  return finish({ done: true, steps: step, doneReason: trimmed });
609
640
  }
610
- history.push({ role: "assistant", content: responseText });
641
+ pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
611
642
  history.push({
612
643
  role: "user",
613
644
  content:
@@ -654,7 +685,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
654
685
  doneReason: `Stopped: the model returned no valid tool call ${MAX_INVALID_CALLS}× (a JSON reply with no valid "tool" or "tools" field). The selected model may be too small to follow the JSON tool protocol — switch to a stronger model with /model.`,
655
686
  });
656
687
  }
657
- history.push({ role: "assistant", content: responseText });
688
+ pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
658
689
  history.push({
659
690
  role: "user",
660
691
  content: `Your last reply had no "tool" or "tools" field. Reply with exactly one JSON object, e.g. {"tool":"find","arguments":{"globPattern":"src/**"}} or {"tools":[{"tool":"read","arguments":{"filePath":"src/main.ts"}}, ...]}.`,
@@ -674,7 +705,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
674
705
  if (toolCalls.length === 1 && toolCalls[0].tool === "done") {
675
706
  if (sawMutation && (!sawVerification || pendingHookFailure !== null) && !donePushbackUsed) {
676
707
  donePushbackUsed = true; // second done always passes — escape hatch
677
- history.push({ role: "assistant", content: responseText });
708
+ pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
678
709
  history.push({
679
710
  role: "user",
680
711
  content: pendingHookFailure !== null
@@ -696,7 +727,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
696
727
  const nudge = await ev.onBeforeDone((toolCalls[0].arguments?.reason as string) ?? "");
697
728
  if (nudge) {
698
729
  beforeDoneNudgeUsed = true;
699
- history.push({ role: "assistant", content: responseText });
730
+ pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
700
731
  history.push({ role: "user", content: nudge });
701
732
  ev.onNotice?.("done deferred once — final plan reconciliation requested");
702
733
  step++;
@@ -709,7 +740,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
709
740
  if (opts.steer) {
710
741
  const pending = opts.steer().map(s => (s ?? "").trim()).filter(Boolean);
711
742
  if (pending.length) {
712
- history.push({ role: "assistant", content: responseText });
743
+ pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
713
744
  for (const text of pending) {
714
745
  history.push({
715
746
  role: "user",
@@ -754,7 +785,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
754
785
  const lastChance = repeatCount === MAX_REPEAT - 1
755
786
  ? "This is your LAST attempt: if you emit the same call again the turn will end. "
756
787
  : "";
757
- history.push({ role: "assistant", content: responseText });
788
+ pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
758
789
  history.push({
759
790
  role: "user",
760
791
  content:
@@ -784,7 +815,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
784
815
  if (!cycleBounceUsed) {
785
816
  cycleBounceUsed = true;
786
817
  recentStepSigs.length = 0; // fresh window: the correction earns a real retry
787
- history.push({ role: "assistant", content: responseText });
818
+ pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
788
819
  history.push({
789
820
  role: "user",
790
821
  content:
@@ -944,6 +975,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
944
975
  );
945
976
  // Append the batch's hook diagnostics once so the model can self-correct. Two
946
977
  // DISTINCT hooks with identical output collapse to one full block + a cross-ref.
978
+ let hookExtra = "";
947
979
  if (hookDiags.length > 0) {
948
980
  const seenHookFeedback = new Set<string>();
949
981
  const diagLines: string[] = [];
@@ -956,14 +988,28 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
956
988
  diagLines.push(`[post-turn hook "${d.run}" — exit ${d.exitCode}]:\n${truncateToolOutput(d.output)}`);
957
989
  }
958
990
  }
959
- resultBlocks.push(diagLines.join("\n"));
991
+ hookExtra = diagLines.join("\n");
992
+ resultBlocks.push(hookExtra);
960
993
  }
961
994
 
962
- history.push({ role: "assistant", content: responseText });
963
- history.push({
964
- role: "user",
965
- content: resultBlocks.join("\n\n"),
966
- });
995
+ // Structured native replay records: stable ids correlate the assistant tool_use
996
+ // turn with its tool_result user turn (the string `content` stays the source of
997
+ // truth for display / compaction / fallback adapters).
998
+ const idFor = (idx: number) => `call_${step}_${idx}`;
999
+ const toolUse: import("../ai/types").ToolUseRecord[] = indices.map(idx => ({
1000
+ id: idFor(idx),
1001
+ tool: toolCalls[idx].tool,
1002
+ arguments: toolCalls[idx].arguments ?? {},
1003
+ }));
1004
+ const toolResults: import("../ai/types").ToolResultRecord[] = indices.map((idx, i) => ({
1005
+ id: idFor(idx),
1006
+ output: bodies[i],
1007
+ isError: !results[idx].success,
1008
+ }));
1009
+ pushAssistantTurn(history, responseText, reasonBuf, artifactBuf, toolUse);
1010
+ const resultMsg: Message = { role: "user", content: resultBlocks.join("\n\n"), toolResults };
1011
+ if (hookExtra) resultMsg.toolResultExtra = hookExtra;
1012
+ history.push(resultMsg);
967
1013
  };
968
1014
 
969
1015
  if (aborted) {
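The id correlation introduced in the hunk above can be illustrated in isolation. This sketch simplifies the source: it pairs every call in the batch rather than the filtered `indices`, and the record interfaces and `buildReplayRecords` are stand-ins for the types imported from `../ai/types`.

```typescript
// Stand-in shapes; the real records are imported from ../ai/types in the source.
interface ToolCall {
  tool: string;
  arguments?: Record<string, unknown>;
}
interface ToolUseRecord {
  id: string;
  tool: string;
  arguments: Record<string, unknown>;
}
interface ToolResultRecord {
  id: string;
  output: string;
  isError: boolean;
}

// Build matching tool_use / tool_result records for one step of the loop.
function buildReplayRecords(step: number, calls: ToolCall[], outputs: string[], succeeded: boolean[]) {
  // Same id scheme as the hunk: derived from step and call index, so it is stable.
  const idFor = (idx: number) => `call_${step}_${idx}`;
  const toolUse: ToolUseRecord[] = calls.map((c, idx) => ({
    id: idFor(idx),
    tool: c.tool,
    arguments: c.arguments ?? {},
  }));
  const toolResults: ToolResultRecord[] = calls.map((_, idx) => ({
    id: idFor(idx),
    output: outputs[idx] ?? "",
    isError: !succeeded[idx],
  }));
  return { toolUse, toolResults };
}

const records = buildReplayRecords(
  3,
  [{ tool: "read", arguments: { filePath: "src/main.ts" } }, { tool: "find" }],
  ["file contents", "no matches"],
  [true, false],
);
console.log(records.toolUse.map(u => u.id), records.toolResults.map(r => r.id));
```

Deriving both ids from the same `(step, idx)` pair lets an adapter match each result to its call without storing a separate lookup table.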
@@ -1053,7 +1099,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
1053
1099
  );
1054
1100
  const consolidated = wrapUp.trim();
1055
1101
  if (consolidated) {
1056
- history.push({ role: "assistant", content: consolidated });
1102
+ pushAssistantTurn(history, consolidated, "", []);
1057
1103
  return finish({
1058
1104
  done: false,
1059
1105
  steps: budget.limit(),
package/src/agent/loop.ts CHANGED
@@ -26,6 +26,9 @@ export interface ChatOptions {
26
26
  onToken?: (delta: string) => void;
27
27
  /** Streaming sink for native reasoning/thinking deltas (drives the dimmed live view). */
28
28
  onReasoning?: (delta: string) => void;
29
+ /** Streaming sink for provider-native reasoning ARTIFACTS (signature / thoughtSignature /
30
+ * reasoning item id+encrypted) — the replay channel, separate from onReasoning. */
31
+ onReasoningArtifact?: (artifact: import("../ai/types").ReasoningArtifact) => void;
29
32
  /** NATIVE tool-calling function declarations (forwarded to capable adapters). */
30
33
  tools?: import("../ai/types").NativeToolSchema[];
31
34
  }
@@ -37,6 +37,8 @@ const STD: ThinkLevel[] = ["minimal", "low", "medium", "high"];
37
37
  export const ANTIGRAVITY_MODELS = [
38
38
  "claude-opus-4-5-thinking",
39
39
  "claude-opus-4-6-thinking",
40
+ "claude-opus-4-7",
41
+ "claude-opus-4-7-thinking",
40
42
  "claude-opus-4-8",
41
43
  "claude-opus-4-8-thinking",
42
44
  "claude-sonnet-4-5",
@@ -52,6 +54,7 @@ export const ANTIGRAVITY_MODELS = [
52
54
  "gemini-3.1-pro-high",
53
55
  "gemini-3.1-pro-low",
54
56
  "gpt-oss-120b-medium",
57
+ "gpt-5.5",
55
58
  ] as const;
56
59
 
57
60
  /** A curated set of common public models with their documented capabilities. */
@@ -62,9 +65,13 @@ export const MODEL_CATALOG: readonly CatalogModel[] = [
62
65
  { canonical: "claude-sonnet-4-5", provider: "anthropic", providerModel: "claude-sonnet-4-5-20250929", contextTokens: 200_000, maxOutputTokens: 64_000, thinking: FULL, images: true },
63
66
  { canonical: "claude-opus-4-1", provider: "anthropic", providerModel: "claude-opus-4-1-20250805", contextTokens: 200_000, maxOutputTokens: 32_000, thinking: FULL, images: true },
64
67
  { canonical: "claude-opus-4-5", provider: "anthropic", providerModel: "claude-opus-4-5-20251101", contextTokens: 200_000, maxOutputTokens: 64_000, thinking: FULL, images: true },
65
- // NOTE: confirm exact dated provider ids when these ship publicly; the family
66
- // heuristic in `catalogMetadata` keeps reasoning working even before that.
68
+ // NOTE: opus-4-7 accepts extended thinking but currently returns 0 thinking tokens
69
+ // (model-internal, no visible thought). opus-4-8 thinks internally (tokens billed,
70
+ // signature present) but returns empty thinking text. Both are FULL-capable in the
71
+ // catalog so the budget is always sent — the nativizable path handles signature-only
72
+ // artifacts for cross-turn continuity.
67
73
  { canonical: "claude-opus-4-6", provider: "anthropic", providerModel: "claude-opus-4-6", contextTokens: 200_000, maxOutputTokens: 64_000, thinking: FULL, images: true },
74
+ { canonical: "claude-opus-4-7", provider: "anthropic", providerModel: "claude-opus-4-7", contextTokens: 200_000, maxOutputTokens: 64_000, thinking: FULL, images: true },
68
75
  { canonical: "claude-opus-4-8", provider: "anthropic", providerModel: "claude-opus-4-8", contextTokens: 200_000, maxOutputTokens: 64_000, thinking: FULL, images: true },
69
76
  // OpenAI
70
77
  { canonical: "gpt-4o", provider: "openai", providerModel: "gpt-4o", contextTokens: 128_000, maxOutputTokens: 16_384, thinking: [], images: true },
@@ -96,9 +103,9 @@ export const MODEL_CATALOG: readonly CatalogModel[] = [
96
103
  canonical: `antigravity/${id}`,
97
104
  provider: "antigravity",
98
105
  providerModel: id,
99
- contextTokens: id.includes("claude") ? 200_000 : id.includes("gemini-3") ? 1_000_000 : 1_000_000,
100
- maxOutputTokens: id.includes("claude") ? 64_000 : 65_536,
101
- thinking: id.includes("thinking") || id.includes("-high") || id.includes("-low") || id.includes("gemini-3") ? FULL : STD,
106
+ contextTokens: id.includes("claude") ? 200_000 : id.startsWith("gpt-5") ? 400_000 : id.includes("gemini-3") ? 1_000_000 : 1_000_000,
107
+ maxOutputTokens: id.includes("claude") ? 64_000 : id.startsWith("gpt-5") ? 128_000 : 65_536,
108
+ thinking: id.includes("thinking") || id.includes("-high") || id.includes("-low") || id.includes("gemini-3") || id.startsWith("gpt-5") ? FULL : STD,
102
109
  images: !id.includes("gpt-oss"),
103
110
  company: id.includes("claude") ? "Anthropic via Antigravity" : id.includes("gpt") ? "OpenAI via Antigravity" : "Google Antigravity",
104
111
  })),
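The family-based context-window mapping in the hunk above can be read as a small standalone function. `fallbackContextTokens` is a hypothetical name, and returning `undefined` for unrecognized families is an assumption made for this sketch; the hunk itself falls through to 1,000,000 for Antigravity ids.

```typescript
// Family heuristics taken from the hunk: claude 200k, gpt-5 400k, gemini-3 1M.
function fallbackContextTokens(id: string): number | undefined {
  if (id.includes("claude")) return 200_000;
  if (id.startsWith("gpt-5")) return 400_000;
  if (id.includes("gemini-3")) return 1_000_000;
  return undefined; // assumption: unknown families are left to the caller's default
}

for (const id of ["claude-opus-4-7", "gpt-5.5", "gemini-3.1-pro-high", "gpt-oss-120b-medium"]) {
  console.log(id, fallbackContextTokens(id));
}
```

Note that `gpt-oss-120b-medium` does not start with `gpt-5`, so it falls through the family checks just as it does in the source ternary.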
@@ -332,6 +332,7 @@ async function resolveCall(options: Partial<CallOptions>, kind: "request" | "str
332
332
  signal: options.signal,
333
333
  reasoningEffort: options.reasoningEffort ?? thinkingToReasoningEffort(config.thinkingLevel),
334
334
  onReasoning: options.onReasoning,
335
+ onReasoningArtifact: options.onReasoningArtifact,
335
336
  tools: options.tools,
336
337
  };
337
338
  // Caller-supplied retry sink rides on the config-derived retry budget so the