npm - jeo-code - Versions diffs - 0.6.27 → 0.6.29 - Mend

jeo-code 0.6.27 → 0.6.29

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (21)

package/CHANGELOG.md +26 -0
package/README.ja.md +2 -6
package/README.ko.md +2 -6
package/README.md +2 -6
package/README.zh.md +2 -6
package/package.json +1 -1
package/src/agent/compaction.ts +10 -1
package/src/agent/engine.ts +62 -16
package/src/agent/loop.ts +3 -0
package/src/ai/model-catalog.ts +12 -5
package/src/ai/model-manager.ts +1 -0
package/src/ai/providers/anthropic.ts +121 -21
package/src/ai/providers/antigravity.ts +6 -0
package/src/ai/providers/errors.ts +18 -0
package/src/ai/providers/gemini.ts +84 -28
package/src/ai/providers/openai-compatible-catalog.ts +10 -4
package/src/ai/providers/openai-responses.ts +76 -19
package/src/ai/types.ts +55 -2
package/src/commands/launch.ts +90 -22
package/src/tui/app.ts +38 -6
package/src/tui/components/ascii-art.ts +27 -31

package/CHANGELOG.md CHANGED

@@ -6,6 +6,32 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 The README mirrors the latest 5 entries — regenerate with `bun run changelog:sync`.
+## [0.6.29] - 2026-06-19
+_Signature-only thinking-block replay (Anthropic opus-4-7/4-8), plus a tmux mouse-flood memory guard confirming `jeo --tmux` does not leak._
+### Fixed
+- **Anthropic thinking-block replay now covers signature-only artifacts.** Newer Opus models (opus-4-7/opus-4-8) think internally — tokens billed, a valid `signature` present — but return empty thinking text. The cross-turn replay required both `signature` AND `text`, so those models' reasoning was dropped between steps. Replay now sends a signed `thinking` block whenever a `signature` (or `redacted`) is present (text defaults to `""`), restoring multi-step reasoning continuity for signature-only models. API-key requests also send the `interleaved-thinking` + `prompt-caching-scope` betas so thinking+tools and scoped caching work outside OAuth.
+### Added
+- **`claude-opus-4-7` catalogued** (FULL thinking, 200k ctx) and a dynamic context-window fallback for uncatalogued ids (claude 200k / gpt-5 400k / gemini-3 1M).
+- **tmux mouse-report-flood memory guard** (`test/mouse-report-filter.test.ts`): 100k SGR mouse-move reports through `queuePromptInputChunk` leave the prompt queue at zero accumulation — the regression guard for the "`jeo --tmux` slows down over time" concern.
+### Verified
+- **`jeo --tmux` has no bun memory leak.** The in-process lifecycle probe (`scripts/mem-probe.ts`, 3000 turns) reports a per-turn heap slope of ≈0 (returns to baseline, exit-listeners flat); a real `jeo --tmux` process plateaus in RSS under sustained mouse/resize/keystroke churn instead of climbing; and mouse reports are filtered (not buffered) with `activityLog` bounded to a 200-entry per-turn ring.
+## [0.6.28] - 2026-06-19
+_Signed thinking-block replay: native reasoning is now sent BACK to providers across steps/turns, restoring multi-step reasoning continuity (gajae parity)._
+### Added
+- **Provider-native reasoning replay across all three first-party providers.** jeo now captures each provider's opaque/signed reasoning artifact during streaming and replays it on later turns to the SAME provider+model, so the model keeps its chain of thought across tool steps instead of re-deriving it. New `Message.reasoningArtifacts` plus structured `Message.toolUse` / `toolResults` (stable ids) let capable adapters reconstruct **native** tool blocks (the key to continuity — plain-text tool feedback makes Claude strip prior thinking):
+  - **Anthropic**: captures `signature_delta` + `redacted_thinking`; replays `thinking`(+signature) → `tool_use` → `tool_result` blocks (gated on same-model + thinking-enabled).
+  - **OpenAI Responses**: requests `include: ["reasoning.encrypted_content"]` (store stays false), captures reasoning item id+encrypted_content, replays native `reasoning` + `function_call` + `function_call_output` items.
+  - **Gemini**: captures per-part `thoughtSignature`, replays native `functionCall`(+thoughtSignature) / `functionResponse` parts (coalescing-safe). This was previously deferred — structured `toolUse` unblocks the functionCall binding.
+- **Fail-safe strip-and-retry.** A 400 naming a thinking/signature/encrypted/reasoning field retries the step ONCE with artifacts stripped (plain history), so an expired signature or edited history can never wedge a turn. Per provider (Anthropic/OpenAI/Gemini).
+### Changed
+- **Reasoning artifacts ride the session record + token accounting.** `reasoningArtifacts` round-trips through session save/load (so `/resume` preserves replay continuity) and counts toward `estimateMessageTokens` (OpenAI encrypted blobs are KB-scale) so compaction/overflow stay honest. Markdown export is unchanged (artifacts are opaque). The engine's ~11 assistant-push sites are unified behind `pushAssistantTurn`, so every step (not just the final reply) carries its reasoning + artifacts. Antigravity is explicitly out of scope (no capture/replay; the provider-keyed match guard prevents any cross-adapter leakage).
 ## [0.6.27] - 2026-06-19
 _Ponytail pass on the reasoning-tier mapper, plus a real-tmux verification of `jeo --tmux`._
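
Note on the 0.6.29 "Fixed" entry: the rule it describes (replay a signed `thinking` block whenever a `signature` or `redacted` payload exists, with text defaulting to `""`) can be sketched as below. This is a minimal illustration, not the code from `package/src/ai/providers/anthropic.ts`, which this excerpt does not show. The `ReasoningArtifact` shape and the `toReplayBlock` helper are hypothetical; the block field names (`thinking`, `signature`, `data`) follow the Anthropic Messages API, and mapping `redacted` to a `redacted_thinking` block is an assumption.

```typescript
// Hypothetical artifact shape for illustration; the real type lives in
// package/src/ai/types.ts and may carry more fields.
interface ReasoningArtifact {
  text?: string;
  signature?: string;
  redacted?: string;
}

// Anthropic Messages API content blocks used for thinking replay.
type ThinkingBlock =
  | { type: "thinking"; thinking: string; signature: string }
  | { type: "redacted_thinking"; data: string };

/** Build the replay block for one artifact, or null when nothing signed exists. */
function toReplayBlock(a: ReasoningArtifact): ThinkingBlock | null {
  // Assumption: a redacted payload is replayed as a redacted_thinking block.
  if (a.redacted) return { type: "redacted_thinking", data: a.redacted };
  // Per the changelog, the earlier rule needed BOTH signature and text.
  // The 0.6.29 rule needs only the signature; text defaults to "".
  if (a.signature) {
    return { type: "thinking", thinking: a.text ?? "", signature: a.signature };
  }
  return null; // unsigned text alone is never replayed
}
```

A signature-only artifact such as `{ signature: "sig" }` yields a `thinking` block with an empty `thinking` string, which is the case the changelog says was previously dropped.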

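Note on the 0.6.28 "Fail-safe strip-and-retry" entry: the behaviour it describes (a 400 that names a thinking/signature/encrypted/reasoning field triggers exactly one retry with plain history) could look roughly like the sketch below. All names here (`Msg`, `isArtifactRejection`, `stripArtifacts`, `callWithStripRetry`) are hypothetical, and the error shape is an assumption; the per-provider logic in `package/src/ai/providers/errors.ts` is not shown in this excerpt.

```typescript
// Hypothetical message shape; only the fields needed for the sketch.
interface Msg {
  role: string;
  content: string;
  reasoningArtifacts?: unknown[];
}

// Field names the changelog lists as triggers for the retry.
const ARTIFACT_FIELD = /thinking|signature|encrypted|reasoning/i;

/** True when a provider error looks like a rejected reasoning artifact. */
function isArtifactRejection(status: number | undefined, message: string): boolean {
  return status === 400 && ARTIFACT_FIELD.test(message);
}

/** Return a plain copy of the history with replay artifacts removed. */
function stripArtifacts(history: Msg[]): Msg[] {
  return history.map(m => ({ role: m.role, content: m.content }));
}

/** Assumption: provider errors expose `status` and `message` properties. */
function errorInfo(err: unknown): { status: number | undefined; message: string } {
  const e = err as { status?: number; message?: string } | null;
  return { status: e?.status, message: e?.message ?? String(err) };
}

/** Call once with artifacts; on an artifact-related 400, retry ONCE without them. */
async function callWithStripRetry<T>(
  history: Msg[],
  call: (h: Msg[]) => Promise<T>,
): Promise<T> {
  try {
    return await call(history);
  } catch (err) {
    const { status, message } = errorInfo(err);
    if (!isArtifactRejection(status, message)) throw err;
    // Single retry: a second failure propagates to the caller unchanged.
    return call(stripArtifacts(history));
  }
}
```

Because the retry runs outside the `try`, a failure on the stripped history is not caught again, which is what keeps the retry count at one.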
package/README.ja.md CHANGED

@@ -2,10 +2,6 @@
   <img src="assets/hero.png" alt="jeo-code 自律コーディングエージェントのヒーローイラスト" width="100%" />
 </p>
-<p align="center">
-  <img src="assets/icon.png" alt="jeo-code icon" width="96" />
-</p>
 <h1 align="center">jeo-code (jeo)</h1>
 <p align="center">
@@ -204,11 +200,11 @@ CI は `.github/workflows/npm-publish.yml` で公開します — GitHub リリ
 ## 変更履歴 (Changelog)
 <!-- CHANGELOG:START (auto-generated from CHANGELOG.md — run `bun run changelog:sync`) -->
+- **[0.6.29]** (2026-06-19) — Signature-only thinking-block replay (Anthropic opus-4-7/4-8), plus a tmux mouse-flood memory guard confirming `jeo --tmux` does not leak.
+- **[0.6.28]** (2026-06-19) — Signed thinking-block replay: native reasoning is now sent BACK to providers across steps/turns, restoring multi-step reasoning continuity (gajae parity).
 - **[0.6.27]** (2026-06-19) — Ponytail pass on the reasoning-tier mapper, plus a real-tmux verification of `jeo --tmux`.
 - **[0.6.26]** (2026-06-19) — The forge emblem is redrawn again as the mascot crayfish, foregrounding its signature pincer claws (집게).
 - **[0.6.25]** (2026-06-19) — Reasoning works at every thinking level (gajae parity), and the forge emblem is redrawn as the neon-lens coding wizard.
-- **[0.6.24]** (2026-06-19) — `/provider` opens an interactive onboarding selector (OAuth vs API-compatible), and OpenAI-compatible backends gain per-vendor native-reasoning formats.
-- **[0.6.23]** (2026-06-19) — Live reasoning/thinking streams in the TUI across every provider, three new OpenAI-compatible backends (LM Studio, xAI, Kimi) join the auth/discovery/catalog surface, and Gemini gains native function-calling.
 See [CHANGELOG.md](CHANGELOG.md) for the full history.
 <!-- CHANGELOG:END -->

package/README.ko.md CHANGED

@@ -2,10 +2,6 @@
   <img src="assets/hero.png" alt="jeo-code 자율 코딩 에이전트 히어로 일러스트" width="100%" />
 </p>
-<p align="center">
-  <img src="assets/icon.png" alt="jeo-code icon" width="96" />
-</p>
 <h1 align="center">jeo-code (jeo)</h1>
 <p align="center">
@@ -204,11 +200,11 @@ CI는 `.github/workflows/npm-publish.yml`로 배포합니다 — GitHub 릴리
 ## 변경 이력 (Changelog)
 <!-- CHANGELOG:START (auto-generated from CHANGELOG.md — run `bun run changelog:sync`) -->
+- **[0.6.29]** (2026-06-19) — Signature-only thinking-block replay (Anthropic opus-4-7/4-8), plus a tmux mouse-flood memory guard confirming `jeo --tmux` does not leak.
+- **[0.6.28]** (2026-06-19) — Signed thinking-block replay: native reasoning is now sent BACK to providers across steps/turns, restoring multi-step reasoning continuity (gajae parity).
 - **[0.6.27]** (2026-06-19) — Ponytail pass on the reasoning-tier mapper, plus a real-tmux verification of `jeo --tmux`.
 - **[0.6.26]** (2026-06-19) — The forge emblem is redrawn again as the mascot crayfish, foregrounding its signature pincer claws (집게).
 - **[0.6.25]** (2026-06-19) — Reasoning works at every thinking level (gajae parity), and the forge emblem is redrawn as the neon-lens coding wizard.
-- **[0.6.24]** (2026-06-19) — `/provider` opens an interactive onboarding selector (OAuth vs API-compatible), and OpenAI-compatible backends gain per-vendor native-reasoning formats.
-- **[0.6.23]** (2026-06-19) — Live reasoning/thinking streams in the TUI across every provider, three new OpenAI-compatible backends (LM Studio, xAI, Kimi) join the auth/discovery/catalog surface, and Gemini gains native function-calling.
 See [CHANGELOG.md](CHANGELOG.md) for the full history.
 <!-- CHANGELOG:END -->

package/README.md CHANGED

@@ -2,10 +2,6 @@
   <img src="assets/hero.png" alt="jeo-code autonomous coding-agent hero illustration" width="100%" />
 </p>
-<p align="center">
-  <img src="assets/icon.png" alt="jeo-code icon" width="96" />
-</p>
 <h1 align="center">jeo-code (jeo)</h1>
 <p align="center">
@@ -204,11 +200,11 @@ Required npm token permissions (repository secret `NPM_TOKEN`):
 ## Changelog
 <!-- CHANGELOG:START (auto-generated from CHANGELOG.md — run `bun run changelog:sync`) -->
+- **[0.6.29]** (2026-06-19) — Signature-only thinking-block replay (Anthropic opus-4-7/4-8), plus a tmux mouse-flood memory guard confirming `jeo --tmux` does not leak.
+- **[0.6.28]** (2026-06-19) — Signed thinking-block replay: native reasoning is now sent BACK to providers across steps/turns, restoring multi-step reasoning continuity (gajae parity).
 - **[0.6.27]** (2026-06-19) — Ponytail pass on the reasoning-tier mapper, plus a real-tmux verification of `jeo --tmux`.
 - **[0.6.26]** (2026-06-19) — The forge emblem is redrawn again as the mascot crayfish, foregrounding its signature pincer claws (집게).
 - **[0.6.25]** (2026-06-19) — Reasoning works at every thinking level (gajae parity), and the forge emblem is redrawn as the neon-lens coding wizard.
-- **[0.6.24]** (2026-06-19) — `/provider` opens an interactive onboarding selector (OAuth vs API-compatible), and OpenAI-compatible backends gain per-vendor native-reasoning formats.
-- **[0.6.23]** (2026-06-19) — Live reasoning/thinking streams in the TUI across every provider, three new OpenAI-compatible backends (LM Studio, xAI, Kimi) join the auth/discovery/catalog surface, and Gemini gains native function-calling.
 See [CHANGELOG.md](CHANGELOG.md) for the full history.
 <!-- CHANGELOG:END -->

package/README.zh.md CHANGED

@@ -2,10 +2,6 @@
   <img src="assets/hero.png" alt="jeo-code 自主编码代理主视觉插图" width="100%" />
 </p>
-<p align="center">
-  <img src="assets/icon.png" alt="jeo-code icon" width="96" />
-</p>
 <h1 align="center">jeo-code (jeo)</h1>
 <p align="center">
@@ -204,11 +200,11 @@ CI 通过 `.github/workflows/npm-publish.yml` 发布 — GitHub 发布 release
 ## 更新日志 (Changelog)
 <!-- CHANGELOG:START (auto-generated from CHANGELOG.md — run `bun run changelog:sync`) -->
+- **[0.6.29]** (2026-06-19) — Signature-only thinking-block replay (Anthropic opus-4-7/4-8), plus a tmux mouse-flood memory guard confirming `jeo --tmux` does not leak.
+- **[0.6.28]** (2026-06-19) — Signed thinking-block replay: native reasoning is now sent BACK to providers across steps/turns, restoring multi-step reasoning continuity (gajae parity).
 - **[0.6.27]** (2026-06-19) — Ponytail pass on the reasoning-tier mapper, plus a real-tmux verification of `jeo --tmux`.
 - **[0.6.26]** (2026-06-19) — The forge emblem is redrawn again as the mascot crayfish, foregrounding its signature pincer claws (집게).
 - **[0.6.25]** (2026-06-19) — Reasoning works at every thinking level (gajae parity), and the forge emblem is redrawn as the neon-lens coding wizard.
-- **[0.6.24]** (2026-06-19) — `/provider` opens an interactive onboarding selector (OAuth vs API-compatible), and OpenAI-compatible backends gain per-vendor native-reasoning formats.
-- **[0.6.23]** (2026-06-19) — Live reasoning/thinking streams in the TUI across every provider, three new OpenAI-compatible backends (LM Studio, xAI, Kimi) join the auth/discovery/catalog surface, and Gemini gains native function-calling.
 See [CHANGELOG.md](CHANGELOG.md) for the full history.
 <!-- CHANGELOG:END -->

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "jeo-code",
-  "version": "0.6.27",
+  "version": "0.6.29",
   "description": "Clean, highly optimized AI coding agent using spec-first loop",
   "type": "module",
   "main": "src/cli.ts",

package/src/agent/compaction.ts CHANGED

@@ -78,7 +78,16 @@ const messageTokenCache = new WeakMap<Message, number>();
 export function estimateMessageTokens(msg: Message): number {
   const hit = messageTokenCache.get(msg);
   if (hit !== undefined) return hit;
-  const n = estimateTokens(msg.role) + estimateTokens(msg.content) + (msg.images?.length ?? 0) * IMAGE_TOKEN_ESTIMATE + 1;
+  let n = estimateTokens(msg.role) + estimateTokens(msg.content) + (msg.images?.length ?? 0) * IMAGE_TOKEN_ESTIMATE + 1;
+  // Native reasoning artifacts (signature / encrypted_content / thought text) are NOT in
+  // `content` but become REAL input tokens once an adapter replays them — count them so
+  // the context meter and compaction trigger stay honest (OpenAI encrypted blobs are KB-scale).
+  // toolUse/toolResults/toolResultExtra are already reflected in `content`, so they are not re-added.
+  for (const a of msg.reasoningArtifacts ?? []) {
+    n += estimateTokens(a.text ?? "") + estimateTokens(a.signature ?? "")
+      + estimateTokens(a.redacted ?? "") + estimateTokens(a.thoughtSignature ?? "")
+      + estimateTokens(a.encrypted ?? "");
+  }
   messageTokenCache.set(msg, n);
   return n;
 }
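
Note on the hunk above: the effect of counting artifact fields can be seen in a standalone sketch. The `estimateTokens` heuristic below (about four characters per token) is an assumption for illustration only, since the package's own estimator is not part of this diff, and `artifactTokens` is a hypothetical helper that isolates the added loop.

```typescript
// Field names mirror the ones read in the hunk above.
interface ReasoningArtifact {
  text?: string;
  signature?: string;
  redacted?: string;
  thoughtSignature?: string;
  encrypted?: string;
}

// Assumption: ~4 characters per token. The real estimateTokens may differ.
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

/** Sum the estimated tokens of every opaque field on every artifact. */
function artifactTokens(artifacts: ReasoningArtifact[] = []): number {
  let n = 0;
  for (const a of artifacts) {
    for (const field of [a.text, a.signature, a.redacted, a.thoughtSignature, a.encrypted]) {
      n += estimateTokens(field ?? "");
    }
  }
  return n;
}

// A 4 KiB encrypted blob adds about 1024 tokens under this heuristic,
// none of which would be visible from `content` alone.
const blobCost = artifactTokens([{ encrypted: "x".repeat(4096) }]);
```

Under this heuristic a single KB-scale blob shifts the estimate by roughly a thousand tokens per message, which is why leaving it out would let the compaction trigger fire late.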

package/src/agent/engine.ts CHANGED

@@ -34,11 +34,30 @@ async function invokeCallLlm(history: Message[], options: {
   onRetry?: (attempt: number, err: unknown, delayMs: number) => void;
   onToken?: (delta: string) => void;
   onReasoning?: (delta: string) => void;
+  onReasoningArtifact?: (artifact: import("../ai/types").ReasoningArtifact) => void;
   tools?: import("../ai/types").NativeToolSchema[];
 }): Promise<string> {
   const mod = await import("./loop");
   return mod.callLlm(history, options);
 }
+/** Push an assistant turn, attaching the step's reasoning + native replay records when
+ *  present. Centralizes the assistant-push sites so reasoning/artifacts attach uniformly
+ *  (not just the final reply). Omits empty fields so back-compat serialization and the
+ *  identity-keyed token cache are unaffected. */
+function pushAssistantTurn(
+  history: Message[],
+  content: string,
+  reasoning: string,
+  artifacts: import("../ai/types").ReasoningArtifact[],
+  toolUse?: import("../ai/types").ToolUseRecord[],
+): void {
+  const msg: Message = { role: "assistant", content };
+  if (reasoning.trim()) msg.reasoning = reasoning;
+  if (artifacts.length) msg.reasoningArtifacts = artifacts;
+  if (toolUse && toolUse.length) msg.toolUse = toolUse;
+  history.push(msg);
+}
 export interface ToolInvocation {
   tool: string;
   arguments?: Record<string, any>;
@@ -176,6 +195,9 @@ export interface AgentLoopEvents {
   /** Accumulated native reasoning/thinking text so far — drives a transient dimmed
    *  "thinking" view. Only requested when a consumer (TUI) attaches. */
   onReasoningStream?(textSoFar: string): void;
+  /** Each provider-native reasoning ARTIFACT as it is captured (signature / thoughtSignature /
+   *  reasoning item). Lets the final-reply path (launch.ts) persist artifacts for replay. */
+  onReasoningArtifactStream?(artifact: import("../ai/types").ReasoningArtifact): void;
   /** Step-budget change (gjc-style retry flow): the limit was extended because the
    *  turn is making progress. `limit` is the new max; `reason` is display-ready. */
   onBudget?(limit: number, reason: string): void;
@@ -345,7 +367,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
         );
         const consolidated = wrapUp.trim();
         if (consolidated) {
-          history.push({ role: "assistant", content: consolidated });
+          pushAssistantTurn(history, consolidated, "", []);
           return finish({
             done: false,
             steps: step,
@@ -493,6 +515,14 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
     const onReasoning = ev.onReasoningStream
       ? (delta: string) => { reasonBuf += delta; ev.onReasoningStream!(reasonBuf); }
       : undefined;
+    // Capture provider-native reasoning ARTIFACTS for replay (always — independent of any
+    // TUI display sink). Stays scoped to THIS step so a later consolidation push can't
+    // inherit a prior step's signatures.
+    const artifactBuf: import("../ai/types").ReasoningArtifact[] = [];
+    const onReasoningArtifact = (a: import("../ai/types").ReasoningArtifact) => {
+      artifactBuf.push(a);
+      ev.onReasoningArtifactStream?.(a);
+    };
     let responseText: string;
     try {
       responseText = await invokeCallLlm(history, {
@@ -510,6 +540,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
               onUsage: u => { acc.inputTokens += u.inputTokens ?? 0; acc.outputTokens += u.outputTokens ?? 0; sawUsage = true; },
               onToken,
               onReasoning,
+              onReasoningArtifact,
               // Make provider auto-retry visible: previously a rate-limited call sat in a
               // silent backoff wait, then surfaced "auto-retry was exhausted" with no trace
               // of the retries that DID happen.
@@ -604,10 +635,10 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
       const trimmed = responseText.trim();
       parseFailures++;
       if (trimmed && (!trimmed.includes("{") || parseFailures > MAX_PARSE_BOUNCES)) {
-        history.push({ role: "assistant", content: responseText });
+        pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
         return finish({ done: true, steps: step, doneReason: trimmed });
       }
-      history.push({ role: "assistant", content: responseText });
+      pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
       history.push({
         role: "user",
         content:
@@ -654,7 +685,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
           doneReason: `Stopped: the model returned no valid tool call ${MAX_INVALID_CALLS}× (a JSON reply with no valid "tool" or "tools" field). The selected model may be too small to follow the JSON tool protocol — switch to a stronger model with /model.`,
         });
       }
-      history.push({ role: "assistant", content: responseText });
+      pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
       history.push({
         role: "user",
         content: `Your last reply had no "tool" or "tools" field. Reply with exactly one JSON object, e.g. {"tool":"find","arguments":{"globPattern":"src/**"}} or {"tools":[{"tool":"read","arguments":{"filePath":"src/main.ts"}}, ...]}.`,
@@ -674,7 +705,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
     if (toolCalls.length === 1 && toolCalls[0].tool === "done") {
       if (sawMutation && (!sawVerification || pendingHookFailure !== null) && !donePushbackUsed) {
         donePushbackUsed = true; // second done always passes — escape hatch
-        history.push({ role: "assistant", content: responseText });
+        pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
         history.push({
           role: "user",
           content: pendingHookFailure !== null
@@ -696,7 +727,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
         const nudge = await ev.onBeforeDone((toolCalls[0].arguments?.reason as string) ?? "");
         if (nudge) {
           beforeDoneNudgeUsed = true;
-          history.push({ role: "assistant", content: responseText });
+          pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
           history.push({ role: "user", content: nudge });
           ev.onNotice?.("done deferred once — final plan reconciliation requested");
           step++;
@@ -709,7 +740,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
       if (opts.steer) {
         const pending = opts.steer().map(s => (s ?? "").trim()).filter(Boolean);
         if (pending.length) {
-          history.push({ role: "assistant", content: responseText });
+          pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
           for (const text of pending) {
             history.push({
               role: "user",
@@ -754,7 +785,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
       const lastChance = repeatCount === MAX_REPEAT - 1
         ? "This is your LAST attempt: if you emit the same call again the turn will end. "
         : "";
-      history.push({ role: "assistant", content: responseText });
+      pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
       history.push({
         role: "user",
         content:
@@ -784,7 +815,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
       if (!cycleBounceUsed) {
         cycleBounceUsed = true;
         recentStepSigs.length = 0; // fresh window: the correction earns a real retry
-        history.push({ role: "assistant", content: responseText });
+        pushAssistantTurn(history, responseText, reasonBuf, artifactBuf);
         history.push({
           role: "user",
           content:
@@ -944,6 +975,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
       );
       // Append the batch's hook diagnostics once so the model can self-correct. Two
       // DISTINCT hooks with identical output collapse to one full block + a cross-ref.
+      let hookExtra = "";
       if (hookDiags.length > 0) {
         const seenHookFeedback = new Set<string>();
         const diagLines: string[] = [];
@@ -956,14 +988,28 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
             diagLines.push(`[post-turn hook "${d.run}" — exit ${d.exitCode}]:\n${truncateToolOutput(d.output)}`);
           }
         }
-        resultBlocks.push(diagLines.join("\n"));
+        hookExtra = diagLines.join("\n");
+        resultBlocks.push(hookExtra);
       }
-      history.push({ role: "assistant", content: responseText });
-      history.push({
-        role: "user",
-        content: resultBlocks.join("\n\n"),
-      });
+      // Structured native replay records: stable ids correlate the assistant tool_use
+      // turn with its tool_result user turn (the string `content` stays the source of
+      // truth for display / compaction / fallback adapters).
+      const idFor = (idx: number) => `call_${step}_${idx}`;
+      const toolUse: import("../ai/types").ToolUseRecord[] = indices.map(idx => ({
+        id: idFor(idx),
+        tool: toolCalls[idx].tool,
+        arguments: toolCalls[idx].arguments ?? {},
+      }));
+      const toolResults: import("../ai/types").ToolResultRecord[] = indices.map((idx, i) => ({
+        id: idFor(idx),
+        output: bodies[i],
+        isError: !results[idx].success,
+      }));
+      pushAssistantTurn(history, responseText, reasonBuf, artifactBuf, toolUse);
+      const resultMsg: Message = { role: "user", content: resultBlocks.join("\n\n"), toolResults };
+      if (hookExtra) resultMsg.toolResultExtra = hookExtra;
+      history.push(resultMsg);
     };
     if (aborted) {
@@ -1053,7 +1099,7 @@ export async function runAgentLoop(history: Message[], opts: AgentLoopOptions):
       );
       const consolidated = wrapUp.trim();
       if (consolidated) {
-        history.push({ role: "assistant", content: consolidated });
+        pushAssistantTurn(history, consolidated, "", []);
         return finish({
           done: false,
           steps: budget.limit(),
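
Note on the `toolUse` / `toolResults` records introduced in this file: the ids are derived from the step number and the call index, so the assistant turn and the following user turn can be matched without extra state. The sketch below isolates that pairing. It is a simplification (it assumes every call in the batch was executed, whereas the hunk maps over a filtered `indices` list), and `pairRecords` plus its parameter shapes are hypothetical.

```typescript
// Record shapes inferred from the fields used in the hunk above.
interface ToolUseRecord {
  id: string;
  tool: string;
  arguments: Record<string, unknown>;
}
interface ToolResultRecord {
  id: string;
  output: string;
  isError: boolean;
}

/** Build matching use/result records for one step of the agent loop. */
function pairRecords(
  step: number,
  calls: { tool: string; arguments?: Record<string, unknown> }[],
  outcomes: { output: string; success: boolean }[],
): { toolUse: ToolUseRecord[]; toolResults: ToolResultRecord[] } {
  // Same id scheme as the diff: step number plus call index.
  const idFor = (idx: number) => `call_${step}_${idx}`;
  const toolUse = calls.map((c, idx) => ({
    id: idFor(idx),
    tool: c.tool,
    arguments: c.arguments ?? {},
  }));
  const toolResults = outcomes.map((o, idx) => ({
    id: idFor(idx),
    output: o.output,
    isError: !o.success,
  }));
  return { toolUse, toolResults };
}

const paired = pairRecords(
  3,
  [{ tool: "read", arguments: { filePath: "src/main.ts" } }, { tool: "find" }],
  [{ output: "ok", success: true }, { output: "no match", success: false }],
);
```

Deterministic ids mean a session that is saved and reloaded reproduces the same correlation, which fits the changelog's claim that `/resume` preserves replay continuity.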

package/src/agent/loop.ts CHANGED

@@ -26,6 +26,9 @@ export interface ChatOptions {
   onToken?: (delta: string) => void;
   /** Streaming sink for native reasoning/thinking deltas (drives the dimmed live view). */
   onReasoning?: (delta: string) => void;
+  /** Streaming sink for provider-native reasoning ARTIFACTS (signature / thoughtSignature /
+   *  reasoning item id+encrypted) — the replay channel, separate from onReasoning. */
+  onReasoningArtifact?: (artifact: import("../ai/types").ReasoningArtifact) => void;
   /** NATIVE tool-calling function declarations (forwarded to capable adapters). */
   tools?: import("../ai/types").NativeToolSchema[];
 }

package/src/ai/model-catalog.ts CHANGED

@@ -37,6 +37,8 @@ const STD: ThinkLevel[] = ["minimal", "low", "medium", "high"];
 export const ANTIGRAVITY_MODELS = [
   "claude-opus-4-5-thinking",
   "claude-opus-4-6-thinking",
+  "claude-opus-4-7",
+  "claude-opus-4-7-thinking",
   "claude-opus-4-8",
   "claude-opus-4-8-thinking",
   "claude-sonnet-4-5",
@@ -52,6 +54,7 @@ export const ANTIGRAVITY_MODELS = [
   "gemini-3.1-pro-high",
   "gemini-3.1-pro-low",
   "gpt-oss-120b-medium",
+  "gpt-5.5",
 ] as const;
 /** A curated set of common public models with their documented capabilities. */
@@ -62,9 +65,13 @@ export const MODEL_CATALOG: readonly CatalogModel[] = [
   { canonical: "claude-sonnet-4-5", provider: "anthropic", providerModel: "claude-sonnet-4-5-20250929", contextTokens: 200_000, maxOutputTokens: 64_000, thinking: FULL, images: true },
   { canonical: "claude-opus-4-1", provider: "anthropic", providerModel: "claude-opus-4-1-20250805", contextTokens: 200_000, maxOutputTokens: 32_000, thinking: FULL, images: true },
   { canonical: "claude-opus-4-5", provider: "anthropic", providerModel: "claude-opus-4-5-20251101", contextTokens: 200_000, maxOutputTokens: 64_000, thinking: FULL, images: true },
-  // NOTE: confirm exact dated provider ids when these ship publicly; the family
-  // heuristic in `catalogMetadata` keeps reasoning working even before that.
+  // NOTE: opus-4-7 accepts extended thinking but currently returns 0 thinking tokens
+  // (model-internal, no visible thought). opus-4-8 thinks internally (tokens billed,
+  // signature present) but returns empty thinking text. Both are FULL-capable in the
+  // catalog so the budget is always sent — the nativizable path handles signature-only
+  // artifacts for cross-turn continuity.
   { canonical: "claude-opus-4-6", provider: "anthropic", providerModel: "claude-opus-4-6", contextTokens: 200_000, maxOutputTokens: 64_000, thinking: FULL, images: true },
+  { canonical: "claude-opus-4-7", provider: "anthropic", providerModel: "claude-opus-4-7", contextTokens: 200_000, maxOutputTokens: 64_000, thinking: FULL, images: true },
   { canonical: "claude-opus-4-8", provider: "anthropic", providerModel: "claude-opus-4-8", contextTokens: 200_000, maxOutputTokens: 64_000, thinking: FULL, images: true },
   // OpenAI
   { canonical: "gpt-4o", provider: "openai", providerModel: "gpt-4o", contextTokens: 128_000, maxOutputTokens: 16_384, thinking: [], images: true },
@@ -96,9 +103,9 @@ export const MODEL_CATALOG: readonly CatalogModel[] = [
     canonical: `antigravity/${id}`,
     provider: "antigravity",
     providerModel: id,
-    contextTokens: id.includes("claude") ? 200_000 : id.includes("gemini-3") ? 1_000_000 : 1_000_000,
-    maxOutputTokens: id.includes("claude") ? 64_000 : 65_536,
-    thinking: id.includes("thinking") || id.includes("-high") || id.includes("-low") || id.includes("gemini-3") ? FULL : STD,
+    contextTokens: id.includes("claude") ? 200_000 : id.startsWith("gpt-5") ? 400_000 : id.includes("gemini-3") ? 1_000_000 : 1_000_000,
+    maxOutputTokens: id.includes("claude") ? 64_000 : id.startsWith("gpt-5") ? 128_000 : 65_536,
+    thinking: id.includes("thinking") || id.includes("-high") || id.includes("-low") || id.includes("gemini-3") || id.startsWith("gpt-5") ? FULL : STD,
     images: !id.includes("gpt-oss"),
     company: id.includes("claude") ? "Anthropic via Antigravity" : id.includes("gpt") ? "OpenAI via Antigravity" : "Google Antigravity",
   })),
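
Note on the Antigravity mapping above: the nested ternaries reduce to three cases, which the sketch below spells out. `antigravityLimits` is a hypothetical helper written for illustration, not a function in the package. The `gemini-3` branch of the original `contextTokens` ternary returns the same value as its fallback, so it is folded into the default case here.

```typescript
/** Limits implied by the id-based ternaries in the hunk above. */
function antigravityLimits(id: string): { contextTokens: number; maxOutputTokens: number } {
  if (id.includes("claude")) return { contextTokens: 200_000, maxOutputTokens: 64_000 };
  if (id.startsWith("gpt-5")) return { contextTokens: 400_000, maxOutputTokens: 128_000 };
  // Everything else, including gemini-3 ids and gpt-oss, keeps the old default.
  return { contextTokens: 1_000_000, maxOutputTokens: 65_536 };
}

const gpt55 = antigravityLimits("gpt-5.5");
const opus47 = antigravityLimits("claude-opus-4-7");
const gptOss = antigravityLimits("gpt-oss-120b-medium");
```

Because the new check uses `startsWith("gpt-5")` rather than `includes("gpt")`, `gpt-oss-120b-medium` still falls through to the default limits.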

package/src/ai/model-manager.ts CHANGED

@@ -332,6 +332,7 @@ async function resolveCall(options: Partial<CallOptions>, kind: "request" | "str
     signal: options.signal,
     reasoningEffort: options.reasoningEffort ?? thinkingToReasoningEffort(config.thinkingLevel),
     onReasoning: options.onReasoning,
+    onReasoningArtifact: options.onReasoningArtifact,
     tools: options.tools,
   };
   // Caller-supplied retry sink rides on the config-derived retry budget so the