@offbynan/pi-cursor-provider (npm): version diff 0.3.0 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/README.md CHANGED

@@ -1,13 +1,67 @@
 # pi-cursor-provider
-> **This fork improves on the upstream in ten areas:** image support, correct `pi -p` exit behaviour, removal of dead eviction code, accurate per-model context window inference, post-compaction session sync, context window scaling when Cursor enforces a tighter cap, per-model cost estimation, model deduplication with reasoning-effort mapping, thinking-tag filtering, and structured debug logging. See the sections below for details.
+**This fork improves on the upstream across six areas:**
-[![npm version](https://img.shields.io/npm/v/@offbynan/pi-cursor-provider.svg)](https://www.npmjs.com/package/@offbynan/pi-cursor-provider)
+- **Image support** — base64 `image_url` content parts forwarded to Cursor end-to-end; the upstream silently drops them
+- **Compaction support** — old turns archived as inline text to cut `getBlobArgs` round-trips from O(history) to O(tail); bridge termination errors surface as real failures instead of silent empty responses; checkpoint cleared after compaction to keep both sides in sync
+- **Reliability** — bridge timeouts hardened and configurable; SSE keepalive prevents pi from timing out during blob-fetching; conversation state and checkpoints survive transient failures and client disconnects
+- **Model support** — per-model context window inference (vs. hardcoded 200 k); runtime cap scaling when Cursor enforces a tighter window; detailed cost table for all current families; effort-suffix variants deduplicated so pi's reasoning-level setting drives the suffix automatically
+- **Thinking-tag filtering** — inline `<think>` / `<reasoning>` tags stripped from the response and routed to `reasoning_content`
+- **Fixes & observability** — `pi -p` exit hang fixed; dead TTL eviction code removed; opt-in JSONL debug logging with a bundled timeline viewer
+[Pi](https://github.com/badlogic/pi-mono) extension that provides access to [Cursor](https://cursor.com) models (Claude, GPT, Gemini, Grok, Kimi, Composer) via OAuth and a local OpenAI-compatible proxy.
-[Pi](https://github.com/badlogic/pi-mono) extension that provides access to [Cursor](https://cursor.com) models via OAuth authentication and a local OpenAI-compatible proxy.
+[![npm version](https://img.shields.io/npm/v/@offbynan/pi-cursor-provider.svg)](https://www.npmjs.com/package/@offbynan/pi-cursor-provider)
 Forked from [ndraiman/pi-cursor-provider](https://github.com/ndraiman/pi-cursor-provider).
+## Install
+```bash
+# Via pi
+pi install npm:@offbynan/pi-cursor-provider
+# Or manually
+git clone https://github.com/offbynan/pi-cursor-provider ~/.pi/agent/extensions/cursor-provider
+cd ~/.pi/agent/extensions/cursor-provider
+npm install
+```
+## Usage
+```
+/login cursor     # authenticate via browser
+/model            # select a Cursor model
+```
+## How it works
+```
+pi  →  openai-completions  →  localhost:PORT/v1/chat/completions
+                                      ↓
+                              proxy.ts (HTTP server)
+                                      ↓
+                              h2-bridge.mjs (Node HTTP/2)
+                                      ↓
+                              api2.cursor.sh gRPC
+```
+1. **PKCE OAuth** — browser-based login to Cursor, no client secret needed
+2. **Model discovery** — queries Cursor's `GetUsableModels` gRPC endpoint
+3. **Local proxy** — translates OpenAI `/v1/chat/completions` to Cursor's protobuf/HTTP2 Connect protocol
+4. **Tool routing** — rejects Cursor's native tools, exposes pi's tools via MCP
+## Configuration
+| Env var | Default | Description |
+| ------- | ------- | ----------- |
+| `PI_CURSOR_PROVIDER_DEBUG` | off | Set to any truthy value to enable JSONL debug logging |
+| `PI_CURSOR_PROVIDER_DEBUG_FILE` | auto in tmpdir | Override the debug log file path |
+| `PI_CURSOR_BRIDGE_INITIAL_TIMEOUT_MS` | `120000` | Kill bridge if no HTTP/2 activity within this many ms of spawn |
+| `PI_CURSOR_BRIDGE_ACTIVITY_TIMEOUT_MS` | `300000` | Kill bridge if no HTTP/2 activity for this many ms after the first frame |
+| `PI_CURSOR_TURN_ARCHIVE_THRESHOLD` | `20` | Keep this many recent turns as raw blobs; older turns are archived as inline text |
+| `PI_CURSOR_RAW_MODELS` | off | Set to disable model deduplication and see all raw Cursor model IDs |
 ## Changes vs upstream
 ### Image support
@@ -96,41 +150,55 @@ The upstream has no observability. This fork adds opt-in JSONL event logging (se
 npm run debug:timeline -- --latest
 ```
-## How it works
+### Bridge timeout hardening
-```
-pi  →  openai-completions  →  localhost:PORT/v1/chat/completions
-                                      ↓
-                              proxy.ts (HTTP server)
-                                      ↓
-                              h2-bridge.mjs (Node HTTP/2)
-                                      ↓
-                              api2.cursor.sh gRPC
-```
+The upstream `h2-bridge.mjs` used a 30-second initial connection timeout and a 120-second activity timeout. Large conversations require Cursor to deserialise a big checkpoint and complete many `getBlobArgs` round-trips before it starts streaming tokens, which regularly exceeded these limits and caused compaction to fail with a `terminated` error.
-1. **PKCE OAuth** — browser-based login to Cursor, no client secret needed
-2. **Model discovery** — queries Cursor's `GetUsableModels` gRPC endpoint
-3. **Local proxy** — translates OpenAI `/v1/chat/completions` to Cursor's protobuf/HTTP2 Connect protocol
-4. **Tool routing** — rejects Cursor's native tools, exposes pi's tools via MCP
+This fork raises the defaults (120 s initial, 300 s activity) and makes them configurable via `PI_CURSOR_BRIDGE_INITIAL_TIMEOUT_MS` and `PI_CURSOR_BRIDGE_ACTIVITY_TIMEOUT_MS` (see [Configuration](#configuration)).
-## Install
+### Bridge termination error propagation
-```bash
-# Via pi install
-pi install npm:@offbynan/pi-cursor-provider
+In the upstream, if the `h2-bridge` child process exits before producing any response (e.g. due to a timeout), the proxy sends a `finish_reason: "stop"` with empty content on the streaming path, and a silent 200 OK on the non-streaming path. Pi receives what looks like a successful but empty response, then fails compaction with an opaque `terminated` error.
-# Or manually
-git clone https://github.com/offbynan/pi-cursor-provider ~/.pi/agent/extensions/cursor-provider
-cd ~/.pi/agent/extensions/cursor-provider
-npm install
-```
+This fork checks the bridge exit code in both paths:
+- **Streaming path** — if the bridge exits with code ≠ 0 before any response, an SSE error chunk is sent so pi surfaces a real failure.
+- **Non-streaming path** — same condition returns a 502 JSON error.
+- **Both paths** — the conversation state is preserved so the next retry can resume from the last good checkpoint rather than rebuilding from scratch.
-## Usage
+### Conversation history archiving
-```
-/login cursor     # authenticate via browser
-/model            # select a Cursor model
-```
+Cursor's `AgentService/Run` RPC is stateless per request: each turn sends the full conversation state as a checkpoint blob, and the server fetches individual turn blobs via `getBlobArgs` as needed. For a long conversation every request incurs O(history) round-trips; the compaction turn is the worst case because Cursor must read the entire history to generate a summary.
+This fork folds turns older than a configurable tail into a single `ConversationSummaryArchive` protobuf blob that stores the transcript as **inline text**. The server reads one blob instead of hundreds, cutting round-trips from O(N) to O(tail):
+| Scenario | `getBlobArgs` before | `getBlobArgs` after |
+| ---------------------- | --------------------- | ------------------- |
+| 100-turn compaction | ~300 | ~61 |
+| 20-turn normal turn | ~60 | ~60 (unchanged) |
+The tail size is configurable via `PI_CURSOR_TURN_ARCHIVE_THRESHOLD` (default 20, see [Configuration](#configuration)).
+Archiving is conservative: old turns are only replaced if every required blob is already in the local store. If any blob is missing the turns are left as-is, so no context is silently dropped.
+### SSE keepalive during blob-fetching
+Before the first token arrives, the proxy is silent: it sends HTTP 200 headers immediately but emits no SSE events while Cursor fetches conversation blobs. If pi's HTTP client has a request timeout (or a "time since last data" idle timeout), it fires during this window and the request is aborted with `Error: Request timed out.`
+This fork starts a 15-second keepalive timer alongside the SSE stream. While the response is open and no data has been sent yet, the timer periodically writes an SSE comment (`: ping`) which is invisible to pi's message parser but resets any inactivity timer in the HTTP layer.
+### Conversation state preserved on transient errors
+Previously, a bridge timeout (`exit code ≠ 0`) or a Connect-level error from Cursor caused the proxy to call `conversationStates.delete(convKey)`, wiping the stored checkpoint. On the next request pi would rebuild the Cursor conversation from scratch — losing any context accumulated since the last compaction.
+Neither failure mode actually invalidates the checkpoint. A bridge timeout means Cursor stopped responding to the current request, not that its conversation state is corrupt. A Connect error (e.g. rate limit, transient upstream failure) also leaves the prior checkpoint intact.
+This fork removes both deletes. The last good checkpoint survives errors, so the next request resumes from where the conversation was rather than starting over.
+### Checkpoint saved on client disconnect
+When pi closes the SSE connection (e.g. its own request timeout fires), the proxy previously guarded checkpoint persistence behind `if (!cancelled)`, discarding any checkpoint that Cursor had already sent for that turn. On the next request the proxy used a stale checkpoint, losing the partial turn's context.
+This fork removes the `!cancelled` guard. If Cursor sent a checkpoint before the disconnect, it is saved and the retry picks it up.
 ## Model Mapping
@@ -167,19 +235,17 @@ Models sharing the same `(base, variant)` with **≥2 effort levels** and a sens
 The proxy inserts the effort before `-fast`/`-thinking`:
 ```
-pi selects: gpt-5.4-fast  +  effort: high  →  Cursor receives: gpt-5.4-high-fast
+pi selects: gpt-5.4-fast  +  effort: high    →  Cursor receives: gpt-5.4-high-fast
 pi selects: gpt-5.4       +  effort: medium  →  Cursor receives: gpt-5.4-medium
-pi selects: composer-2     +  (no effort)     →  Cursor receives: composer-2
+pi selects: composer-2    +  (no effort)     →  Cursor receives: composer-2
 ```
-When a group is **collapsed**, the proxy registers one model with `supportsReasoningEffort: true` and an internal effort map (see table above).
 **Collapsed** when Cursor returns either:
 - **Multiple** effort suffixes for the same `(base, -fast, -thinking)` group, or
-- **A single** variant whose parsed effort suffix is **non-empty** (for example only `claude-4.5-opus-high` is listed). The suffix is removed from the displayed ID so Pi's reasoning-effort setting supplies it.
+- **A single** variant whose parsed effort suffix is **non-empty** (for example only `claude-4.5-opus-high` is listed). The suffix is removed from the displayed ID so pi's reasoning-effort setting supplies it.
-**Left as-is** (raw Cursor ID on that row, `supportsReasoningEffort: false`) when the group has **one** variant and the parsed effort suffix is **empty**—typically IDs with no effort segment, such as `composer-2`, `gemini-3.1-pro`, or `kimi-k2.5`.
+**Left as-is** when the group has **one** variant and the parsed effort suffix is **empty** — typically IDs with no effort segment, such as `composer-2`, `gemini-3.1-pro`, or `kimi-k2.5`.
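Editor's note: the effort-insertion rule shown in the three `pi selects: ...` examples of this hunk can be sketched as below. This is a minimal illustration, not the package's code: `applyEffort` is a hypothetical name, it handles only a single trailing `-fast` or `-thinking` suffix, and it ignores the collapsed versus left-as-is distinction (it assumes the caller passes an effort only for collapsed models).

```typescript
// Hypothetical helper (not part of the package): places the effort level
// before a trailing -fast / -thinking suffix, or appends it when the ID
// has no such suffix. Assumes at most one of the two suffixes is present.
function applyEffort(modelId: string, effort?: string): string {
  if (!effort) return modelId; // no effort selected: ID passes through unchanged
  const match = /^(.*?)(-fast|-thinking)$/.exec(modelId);
  if (!match) return `${modelId}-${effort}`;
  return `${match[1]}-${effort}${match[2]}`;
}
```

Under these assumptions, `applyEffort("gpt-5.4-fast", "high")` yields `gpt-5.4-high-fast`, matching the first example in the hunk.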
 ### Disabling the mapping
@@ -191,42 +257,26 @@ PI_CURSOR_RAW_MODELS=1 pi
 ## Session Management
-The proxy maintains conversation state per pi session, enabling multi-turn conversations with Cursor models while preserving forks, tool continuations, and interruptions correctly.
+The proxy maintains per-session conversation state to enable multi-turn conversations with tool call continuations and clean lifecycle handling.
-### How it works
+### State storage
-- **Session tracking** — pi's session ID is injected into requests via a `before_provider_request` hook. The proxy keys bridge state and stored conversation state from that real session ID.
-- **Checkpoints** — Cursor returns a conversation checkpoint after completed turns. The proxy stores that checkpoint, plus the completed-turn count and a fingerprint of the completed structured history, and reuses it only when the incoming history still matches.
-- **Session-scoped state** — real pi session state is kept in memory until explicit cleanup or process restart. Anonymous fallback state can still be TTL-evicted.
-- **Lifecycle cleanup** — session state is cleaned up on pi lifecycle events such as session switch, fork, `/tree`, and shutdown.
+- **Keyed by session ID** — pi injects its session ID into every request via a `before_provider_request` hook; the proxy uses it to key both bridge state and the stored conversation checkpoint.
+- **Checkpoint** — Cursor sends a `conversationCheckpointUpdate` message after each completed turn. The proxy stores the latest checkpoint and reuses it on the next request, so Cursor picks up exactly where it left off without rebuilding the full conversation from scratch.
+- **Blob store** — protobuf blobs referenced by the checkpoint are cached locally and served back to Cursor on demand via `getBlobArgs` / `setBlobArgs`.
+- **In-memory only** — all state lives in process memory. A proxy restart loses checkpoints; the next request rebuilds from pi's message history.
 ### Tool continuations
-When Cursor pauses for a tool call, the proxy keeps the live upstream bridge open and waits for pi to send the tool result on the next request. That tool result is sent back into the same in-flight Cursor run, so the tool continuation stays part of the original user turn instead of inflating completed history.
-### Interruptions
-If the client disconnects or interrupts a turn mid-stream, the proxy cancels the upstream Cursor run and does **not** commit the pending checkpoint. Checkpoints are only committed after a turn finishes successfully.
-### Session fork
-When you navigate back in pi's session tree and branch from an earlier point, the proxy discards the stored checkpoint whenever the completed history no longer matches the stored checkpoint metadata. That includes both:
-- completed turn count mismatches, and
-- same-depth branch changes detected via completed-history fingerprint mismatch.
-After discarding a stale checkpoint, the proxy reconstructs proper protobuf conversation turns from the message history pi sends, so Cursor sees the actual conversation structure at the fork point.
+When Cursor requests a tool call, the proxy pauses the SSE stream, stores the live bridge in memory, and returns the tool call to pi. When pi sends the result on the next request, the proxy forwards it into the same in-flight Cursor run so the continuation stays part of the original turn.
-### Session resume
+### Lifecycle cleanup
-Conversation state is stored in memory. If the proxy restarts, checkpoints are lost. On the next request, pi sends the full conversation history, and the proxy reconstructs structured protobuf turns from that history instead of relying on an inline plaintext fallback.
+Session state is cleared on pi lifecycle events — session switch, fork, `/tree`, shutdown, and post-compaction — so stale checkpoints never carry over into a new context.
-That reconstruction preserves:
+### Error resilience
-- assistant messages
-- tool calls
-- tool results
-- final assistant text after tool results
+A bridge timeout or Connect-level error from Cursor does not wipe the stored checkpoint. The last good checkpoint survives transient failures and is used on the next retry. If Cursor sends a checkpoint before a client disconnect, that checkpoint is also preserved.
 ## Requirements
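
Editor's note: the `getBlobArgs` figures in the "Conversation history archiving" table above follow from simple arithmetic. The sketch below reproduces them under two assumptions: roughly 3 fetches per raw turn (the figure quoted in the `proxy.ts` comment later in this diff) and a single archive blob covering all older turns. `estimateBlobFetches` is a hypothetical name used only for illustration.

```typescript
// Assumption: about 3 getBlobArgs round-trips per raw turn, as stated in
// the proxy.ts comment; real counts vary with the steps in each turn.
const FETCHES_PER_RAW_TURN = 3;

// Hypothetical helper (not part of the package): estimates how many blob
// fetches the server needs for one request.
function estimateBlobFetches(turnCount: number, archiveThreshold: number): number {
  if (turnCount <= archiveThreshold) {
    // Nothing is archived; every turn is fetched as raw blobs.
    return turnCount * FETCHES_PER_RAW_TURN;
  }
  // Only the tail stays raw; all older turns collapse into one archive blob.
  return archiveThreshold * FETCHES_PER_RAW_TURN + 1;
}
```

With the default threshold of 20, a 100-turn conversation gives 20 × 3 + 1 = 61 fetches instead of 100 × 3 = 300, while a 20-turn conversation stays at 60 either way.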

package/cursor-models-raw.json CHANGED

@@ -118,6 +118,20 @@
     "contextWindow": 200000,
     "maxTokens": 64000
   },
+  {
+    "id": "composer-2.5",
+    "name": "Composer 2.5",
+    "reasoning": false,
+    "contextWindow": 200000,
+    "maxTokens": 64000
+  },
+  {
+    "id": "composer-2.5-fast",
+    "name": "Composer 2.5 Fast",
+    "reasoning": false,
+    "contextWindow": 200000,
+    "maxTokens": 64000
+  },
   {
     "id": "default",
     "name": "Auto",

package/h2-bridge.mjs CHANGED

@@ -91,11 +91,16 @@ const client = http2.connect(url || "https://api2.cursor.sh");
 // Guard against initial connection failure. Reset on any h2 activity
 // so long-running agent conversations (with tool call round-trips) survive.
-let timeout = setTimeout(killBridge, 30_000);
+// Initial timeout is generous because large conversations require Cursor to
+// deserialize a big checkpoint + run many getBlobArgs round-trips before it
+// starts streaming tokens — 30 s was too short and caused compaction failures.
+const INITIAL_TIMEOUT_MS = parseInt(process.env.PI_CURSOR_BRIDGE_INITIAL_TIMEOUT_MS ?? "") || 120_000;
+const ACTIVITY_TIMEOUT_MS = parseInt(process.env.PI_CURSOR_BRIDGE_ACTIVITY_TIMEOUT_MS ?? "") || 300_000;
+let timeout = setTimeout(killBridge, INITIAL_TIMEOUT_MS);
 function resetTimeout() {
   clearTimeout(timeout);
-  timeout = setTimeout(killBridge, 120_000);
+  timeout = setTimeout(killBridge, ACTIVITY_TIMEOUT_MS);
 }
 function killBridge() {
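
Editor's note: the `parseInt(value ?? "") || default` pattern in this hunk has a behaviour worth spelling out: an unset, non-numeric, or zero value silently falls back to the default. The sketch below isolates that pattern. `readTimeoutMs` and `bridgeTimeouts` are hypothetical names, the environment is passed in as a plain object so the example does not depend on `process.env`, and the explicit radix of 10 is an addition that the original code does not have.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper mirroring the pattern in the diff.
// parseInt("") and parseInt("abc") are NaN; NaN and 0 are both falsy,
// so either one selects the fallback.
function readTimeoutMs(raw: string | undefined, fallbackMs: number): number {
  return parseInt(raw ?? "", 10) || fallbackMs;
}

// Resolves both bridge timeouts using the defaults shown in the diff.
function bridgeTimeouts(env: Env): { initialMs: number; activityMs: number } {
  return {
    initialMs: readTimeoutMs(env["PI_CURSOR_BRIDGE_INITIAL_TIMEOUT_MS"], 120_000),
    activityMs: readTimeoutMs(env["PI_CURSOR_BRIDGE_ACTIVITY_TIMEOUT_MS"], 300_000),
  };
}
```

One consequence of this design is that a timeout cannot be disabled by setting the variable to `0`; that value is treated the same as unset.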

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@offbynan/pi-cursor-provider",
-  "version": "0.3.0",
+  "version": "0.5.0",
   "description": "Pi extension providing access to Cursor models via OAuth and a local OpenAI-compatible gRPC proxy",
   "type": "module",
   "license": "MIT",

package/proxy.ts CHANGED

@@ -78,6 +78,7 @@ import {
   WriteResultSchema,
   WriteShellStdinErrorSchema,
   WriteShellStdinResultSchema,
+  ConversationSummaryArchiveSchema,
   GetUsableModelsRequestSchema,
   GetUsableModelsResponseSchema,
   type AgentServerMessage,
@@ -598,8 +599,9 @@ function decodeConnectUnaryBody(payload: Uint8Array): Uint8Array | null {
  * derive them from known model families.  Update when new major versions ship.
  *
  * Sources:
- *  - Claude: platform.claude.ai/docs — claude-4.6-sonnet / claude-4.6-opus: 1M (GA Mar 2026);
- *    all other Claude incl. 4.5, 4, Haiku: 200k.
+ *  - Claude: platform.claude.ai/docs — claude-4.6-sonnet / claude-4.6-opus: native 1M context
+ *    (GA Mar 2026), but Cursor enforces a 200k cap via ConversationTokenDetails.maxTokens.
+ *    Registered at 200k to match Cursor's actual limit; all other Claude incl. 4.5, 4, Haiku: 200k.
  *  - Gemini: ai.google.dev/gemini-api/docs — all 2.5 / 3.x models: 1M.
  *  - GPT: chatai.guide — GPT-5.x: 400k; GPT-5.5+: 1M; nano/mini variants: 128k.
  *  - Grok 4: docs.x.ai — 256k.
@@ -612,9 +614,9 @@ export function inferContextWindow(id: string): number {
   if (lower.includes("-1m")) return 1_048_576;
   // ── Claude ────────────────────────────────────────────────────────────────
-  // Sonnet 4.6 and Opus 4.6 gained 1M context (GA March 2026).
-  // All earlier versions (4.5, 4, …) and Haiku remain at 200k.
-  if (lower.startsWith("claude-4.6-sonnet") || lower.startsWith("claude-4.6-opus")) return 1_048_576;
+  // Sonnet 4.6 / Opus 4.6 natively support 1M but Cursor enforces 200k server-side.
+  // Registering at 200k avoids spurious 5× scaling in computeUsage.
+  // All other Claude (4.5, 4, Haiku, …) are also 200k.
   if (lower.startsWith("claude-")) return 200_000;
   // ── Gemini ────────────────────────────────────────────────────────────────
@@ -1331,6 +1333,103 @@ function buildTurnStepBytes(step: ParsedTurnStep): Uint8Array {
   );
 }
+// Number of most-recent turns to keep as raw blobs; older turns are folded
+// into a ConversationSummaryArchive blob with inline text.  Keeping only the
+// tail as raw blobs caps the number of blob fetches the server needs per
+// request to O(THRESHOLD) instead of O(conversation_length), which is the
+// primary driver of compaction slowness for long sessions.
+const TURN_ARCHIVE_THRESHOLD =
+  parseInt(process.env.PI_CURSOR_TURN_ARCHIVE_THRESHOLD ?? "") || 20;
+/**
+ * Renders parsed turns (already-decoded OpenAI messages) as plain text for
+ * use as the `summary` field of a ConversationSummaryArchive.  Tool results
+ * are truncated so the archive blob stays small.
+ */
+function buildTurnsTranscript(turns: ParsedTurn[]): string {
+  const parts: string[] = [
+    `[Earlier conversation — ${turns.length} turn(s)]\n`,
+  ];
+  for (const [i, turn] of turns.entries()) {
+    parts.push(`Turn ${i + 1}:`);
+    if (turn.userText) parts.push(`User: ${turn.userText.slice(0, 1000)}`);
+    for (const step of turn.steps) {
+      if (step.kind === "assistantText") {
+        if (step.text) parts.push(`Assistant: ${step.text.slice(0, 800)}`);
+      } else if (step.kind === "toolCall") {
+        const argsStr = JSON.stringify(step.arguments).slice(0, 300);
+        parts.push(`Tool: ${step.toolName}(${argsStr})`);
+        if (step.result?.content) {
+          parts.push(`Result: ${step.result.content.slice(0, 400)}`);
+        }
+      }
+    }
+    parts.push("");
+  }
+  return parts.join("\n");
+}
+/**
+ * Extracts a human-readable transcript of a single turn from the blob store.
+ * Returns null if required blobs are missing (turn is left as a raw blob).
+ */
+function extractTextFromTurnBlob(
+  turnBlobId: Uint8Array,
+  blobStore: Map<string, Uint8Array>,
+): string | null {
+  try {
+    const turnData = blobStore.get(Buffer.from(turnBlobId).toString("hex"));
+    if (!turnData) return null;
+    const turnStructure = fromBinary(ConversationTurnStructureSchema, turnData);
+    if (turnStructure.turn.case !== "agentConversationTurn") return null;
+    const agentTurn = turnStructure.turn.value;
+    const lines: string[] = [];
+    const userMsgData = blobStore.get(
+      Buffer.from(agentTurn.userMessage).toString("hex"),
+    );
+    if (userMsgData) {
+      const userMsg = fromBinary(UserMessageSchema, userMsgData);
+      if (userMsg.text) lines.push(`User: ${userMsg.text.slice(0, 1000)}`);
+    } else {
+      return null; // can't represent this turn without its user message
+    }
+    for (const stepBlobId of agentTurn.steps) {
+      const stepData = blobStore.get(
+        Buffer.from(stepBlobId).toString("hex"),
+      );
+      if (!stepData) continue;
+      const step = fromBinary(ConversationStepSchema, stepData);
+      if (step.message.case === "assistantMessage") {
+        const text = step.message.value.text;
+        if (text) lines.push(`Assistant: ${text.slice(0, 800)}`);
+      } else if (step.message.case === "toolCall") {
+        const tc = step.message.value;
+        if (tc.tool.case === "mcpToolCall") {
+          const mcp = tc.tool.value;
+          const name = mcp.args?.name ?? "tool";
+          lines.push(`Tool: ${name}`);
+          if (mcp.result?.result.case === "success") {
+            const content = mcp.result.result.value.content
+              .map((c) =>
+                c.content.case === "text" ? c.content.value.text : "",
+              )
+              .join("")
+              .slice(0, 400);
+            if (content) lines.push(`Result: ${content}`);
+          }
+        }
+      }
+    }
+    return lines.join("\n") || null;
+  } catch {
+    return null;
+  }
+}
 export function buildCursorRequest(
   modelId: string,
   systemPrompt: string,
@@ -1367,9 +1466,90 @@ export function buildCursorRequest(
       ConversationStateStructureSchema,
       checkpoint,
     );
+    // Archive old turns from the checkpoint when the tail is too long.
+    // Each raw turn blob requires ~3 getBlobArgs round-trips from the server;
+    // replacing old turns with a single ConversationSummaryArchive blob (inline
+    // text) cuts that to 1 fetch for all archived history.
+    if (conversationState.turns.length > TURN_ARCHIVE_THRESHOLD) {
+      const oldTurnIds = conversationState.turns.slice(
+        0,
+        conversationState.turns.length - TURN_ARCHIVE_THRESHOLD,
+      );
+      const recentTurnIds = conversationState.turns.slice(
+        -TURN_ARCHIVE_THRESHOLD,
+      );
+      const archiveLines: string[] = [
+        `[Earlier conversation \u2014 ${oldTurnIds.length} turn(s)]\n`,
+      ];
+      let archivedCount = 0;
+      for (const [i, oldTurnId] of oldTurnIds.entries()) {
+        const text = extractTextFromTurnBlob(oldTurnId, blobStore);
+        if (text === null) continue; // blob missing — leave turn as-is
+        archiveLines.push(`Turn ${i + 1}:\n${text}`);
+        archiveLines.push("");
+        archivedCount++;
+      }
+      // Only replace turns with archive if we could represent all of them;
+      // a partial archive would silently drop context the server needs.
+      if (archivedCount === oldTurnIds.length) {
+        const archive = create(ConversationSummaryArchiveSchema, {
+          summarizedMessages: oldTurnIds,
+          summary: archiveLines.join("\n"),
+          windowTail: oldTurnIds.length,
+          summaryMessage: new Uint8Array(0),
+        });
+        const archiveBlobId = storeAsBlob(
+          toBinary(ConversationSummaryArchiveSchema, archive),
+          blobStore,
+        );
+        conversationState.turns = recentTurnIds;
+        conversationState.summaryArchives = [
+          ...conversationState.summaryArchives,
+          archiveBlobId,
+        ];
+        debugLog("cursor_request.turns_archived", {
+          archivedCount,
+          remaining: recentTurnIds.length,
+          totalArchives: conversationState.summaryArchives.length,
+        });
+      }
+    }
   } else {
+    // When rebuilding from scratch (no checkpoint), archive old parsed turns
+    // directly — we have their text, no blob parsing needed.
+    const olderTurns =
+      turns.length > TURN_ARCHIVE_THRESHOLD
+        ? turns.slice(0, turns.length - TURN_ARCHIVE_THRESHOLD)
+        : [];
+    const recentTurns =
+      turns.length > TURN_ARCHIVE_THRESHOLD
+        ? turns.slice(-TURN_ARCHIVE_THRESHOLD)
+        : turns;
+    const summaryArchives: Uint8Array[] = [];
+    if (olderTurns.length > 0) {
+      const archive = create(ConversationSummaryArchiveSchema, {
+        summarizedMessages: [], // no blob IDs yet — turns haven't been stored
+        summary: buildTurnsTranscript(olderTurns),
+        windowTail: olderTurns.length,
+        summaryMessage: new Uint8Array(0),
+      });
+      summaryArchives.push(
+        storeAsBlob(
+          toBinary(ConversationSummaryArchiveSchema, archive),
+          blobStore,
+        ),
+      );
+      debugLog("cursor_request.turns_archived_from_scratch", {
+        archivedCount: olderTurns.length,
+        remaining: recentTurns.length,
+      });
+    }
     const turnBlobIds: Uint8Array[] = [];
-    for (const turn of turns) {
+    for (const turn of recentTurns) {
       const userMsg = createUserMessage(
         turn.userText,
         selectedCtxBlob,
@@ -1408,7 +1588,7 @@ export function buildCursorRequest(
       mode: 1,
       fileStates: {},
       fileStatesV2: {},
-      summaryArchives: [],
+      summaryArchives,
       turnTimings: [],
       subagentStates: {},
       selfSummaryCount: 0,
@@ -2213,6 +2393,7 @@ function writeSSEStream(
   });
   let closed = false;
+  let keepAliveTimer: ReturnType<typeof setInterval> | undefined;
   const sendSSE = (data: object) => {
     if (closed) return;
     res.write(`data: ${JSON.stringify(data)}\n\n`);
@@ -2224,6 +2405,7 @@ function writeSSEStream(
   const closeResponse = () => {
     if (closed) return;
     closed = true;
+    clearInterval(keepAliveTimer);
     res.end();
   };
@@ -2265,6 +2447,13 @@ function writeSSEStream(
   let cancelled = false;
   let latestCheckpoint: Uint8Array | null = null;
+  // Keep the SSE connection alive during the silent blob-fetching phase so
+  // pi's request timeout does not fire before the first token arrives.
+  keepAliveTimer = setInterval(() => {
+    if (!closed) res.write(": ping\n\n");
+  }, 15_000);
+  keepAliveTimer.unref();
   // Detect client disconnect (e.g. user pressed Escape in pi)
   const onClientClose = () => {
     if (cancelled || closed) return;
@@ -2394,7 +2583,6 @@ function writeSSEStream(
           `[cursor-provider] Cursor stream error (${modelId}):`,
           endError.message,
         );
-        conversationStates.delete(convKey);
         sendSSE(makeChunk({ content: endError.message }, "error"));
         sendSSE(makeUsageChunk());
         sendDone();
@@ -2437,7 +2625,7 @@ function writeSSEStream(
     const stored = conversationStates.get(convKey);
     if (stored) {
       for (const [k, v] of blobStore) stored.blobStore.set(k, v);
-      if (!cancelled && latestCheckpoint) {
+      if (latestCheckpoint) {
         stored.checkpoint = latestCheckpoint;
         debugLog("stream.checkpoint_committed", { requestId, convKey, stored });
       }
@@ -2447,17 +2635,31 @@ function writeSSEStream(
     }
     if (cancelled) return;
     if (!mcpExecReceived) {
-      const flushed = tagFilter.flush();
-      if (flushed.reasoning)
-        sendSSE(makeChunk({ reasoning_content: flushed.reasoning }));
-      if (flushed.content) {
-        appendAssistantTextToTurn(currentTurn, flushed.content);
-        sendSSE(makeChunk({ content: flushed.content }));
+      if (code !== 0) {
+        // Bridge was killed before receiving any response (e.g. timeout waiting
+        // for Cursor to process a large checkpoint during compaction). Treat as
+        // an error so callers (like pi compaction) see a real failure instead of
+        // an empty successful-looking response.
+        console.error(
+          `[cursor-provider] Bridge exited (code ${code}) before receiving response (${modelId})`,
+        );
+        sendSSE(makeChunk({ content: `Cursor bridge terminated (exit ${code}) before response — try again or shorten the conversation` }, "error"));
+        sendSSE(makeUsageChunk());
+        sendDone();
+        closeResponse();
+      } else {
+        const flushed = tagFilter.flush();
+        if (flushed.reasoning)
+          sendSSE(makeChunk({ reasoning_content: flushed.reasoning }));
+        if (flushed.content) {
+          appendAssistantTextToTurn(currentTurn, flushed.content);
+          sendSSE(makeChunk({ content: flushed.content }));
+        }
+        sendSSE(makeChunk({}, "stop"));
+        sendSSE(makeUsageChunk());
+        sendDone();
+        closeResponse();
       }
-      sendSSE(makeChunk({}, "stop"));
-      sendSSE(makeUsageChunk());
-      sendDone();
-      closeResponse();
     } else if (code !== 0) {
       sendSSE(makeChunk({ content: "Bridge connection lost" }, "error"));
       sendSSE(makeUsageChunk());
@@ -2762,17 +2964,17 @@ async function handleNonStreamingResponse(
               `[cursor-provider] Cursor non-stream error (${modelId}):`,
               endError.message,
             );
-            conversationStates.delete(convKey);
             nonStreamError = endError;
           }
         },
       ),
     );
-    bridge.onClose(() => {
+    bridge.onClose((code) => {
       debugLog("nonstream.bridge_close", {
         requestId,
         convKey,
+        code,
         cancelled,
         nonStreamError: nonStreamError?.message,
         currentTurn,
@@ -2784,7 +2986,7 @@ async function handleNonStreamingResponse(
       const stored = conversationStates.get(convKey);
       if (stored) {
         for (const [k, v] of payload.blobStore) stored.blobStore.set(k, v);
-        if (!cancelled && !nonStreamError && latestCheckpoint) {
+        if (latestCheckpoint) {
           stored.checkpoint = latestCheckpoint;
           debugLog("nonstream.checkpoint_committed", {
             requestId,
@@ -2829,6 +3031,24 @@ async function handleNonStreamingResponse(
         return;
       }
+      if (code !== 0) {
+        console.error(
+          `[cursor-provider] Bridge exited (code ${code}) before non-stream response (${modelId})`,
+        );
+        res.writeHead(502, { "Content-Type": "application/json" });
+        res.end(
+          JSON.stringify({
+            error: {
+              message: `Cursor bridge terminated (exit ${code}) before response — try again or shorten the conversation`,
+              type: "upstream_error",
+              code: "bridge_terminated",
+            },
+          }),
+        );
+        resolve();
+        return;
+      }
       const flushed = tagFilter.flush();
       fullText += flushed.content;
       appendAssistantTextToTurn(currentTurn, flushed.content);
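
Editor's note: the checkpoint branch of `buildCursorRequest` in this diff archives old turns on an all-or-nothing basis: if any old turn cannot be rendered, every turn stays a raw blob. The sketch below isolates that decision using plain strings in place of blob IDs. `planArchive` and `ArchivePlan` are hypothetical names, and `render` stands in for `extractTextFromTurnBlob`. The early return is a simplification; the original counts successes and compares the count afterwards, which gives the same outcome.

```typescript
interface ArchivePlan {
  archived: string[];   // rendered text of folded turns (empty if nothing was archived)
  rawTurnIds: string[]; // turn IDs still sent as raw blobs
}

// Hypothetical helper (not part of the package).
// `render` returns null when a required blob is missing from the local store.
function planArchive(
  turnIds: string[],
  threshold: number,
  render: (id: string) => string | null,
): ArchivePlan {
  if (turnIds.length <= threshold) return { archived: [], rawTurnIds: turnIds };

  const splitAt = turnIds.length - threshold;
  const oldIds = turnIds.slice(0, splitAt);
  const recentIds = turnIds.slice(splitAt);

  const texts: string[] = [];
  for (const id of oldIds) {
    const text = render(id);
    // A partial archive would drop context, so abandon archiving entirely.
    if (text === null) return { archived: [], rawTurnIds: turnIds };
    texts.push(text);
  }
  return { archived: texts, rawTurnIds: recentIds };
}
```

With five turns and a threshold of 2, the first three turns are folded and the last two stay raw, provided all three render successfully.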