npm - @offbynan/pi-cursor-provider - Versions diffs - 0.2.0 → 0.3.0 - Mend

@offbynan/pi-cursor-provider 0.2.0 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6) hide show

package/README.md CHANGED Viewed

@@ -1,6 +1,6 @@
 # pi-cursor-provider
-> **This fork improves on the upstream in three areas:** image support, correct `pi -p` exit behaviour, and removal of dead eviction code. See the sections below for details.
+> **This fork improves on the upstream in ten areas:** image support, correct `pi -p` exit behaviour, removal of dead eviction code, accurate per-model context window inference, post-compaction session sync, context window scaling when Cursor enforces a tighter cap, per-model cost estimation, model deduplication with reasoning-effort mapping, thinking-tag filtering, and structured debug logging. See the sections below for details.
 [![npm version](https://img.shields.io/npm/v/@offbynan/pi-cursor-provider.svg)](https://www.npmjs.com/package/@offbynan/pi-cursor-provider)
@@ -33,6 +33,69 @@ This fork fixes both: empty and non-JSON end-stream bodies are treated as succes
 The upstream proxy included a 30-minute TTL eviction mechanism (`evictStaleConversations`, `CONVERSATION_TTL_MS`, `sessionScoped`, `lastAccessMs`). All conversations created by pi include a session ID, permanently exempting them from TTL eviction, so this code was never reachable. This fork removes it.
+### Accurate per-model context window inference
+Cursor's `GetUsableModels` RPC does not return context window sizes, so the upstream proxy hardcodes 200 k for every model. This fork exports an `inferContextWindow(id)` function that derives the correct window from known model families:
+| Family | Window |
+| ------ | ------ |
+| Claude 4.6 Sonnet / Opus | 1 M |
+| All other Claude | 200 k |
+| Gemini 2.5 / 3.x | 1 M |
+| GPT nano / mini variants | 128 k |
+| GPT-5.5+ | 1 M |
+| GPT-5.x (other) | 400 k |
+| Grok 4 | 256 k |
+| Kimi K2.x | 262 k |
+| Anything with `-1m` suffix | 1 M |
+| Unknown / Composer | 200 k |
+This ensures pi uses the right compaction thresholds and token budget for each model.
+### Post-compaction session sync
+When pi compacts its message list (the `session_compact` lifecycle event), the proxy's cached conversation checkpoint still reflects the full pre-compaction conversation. Continuing without clearing that cache would cause a history mismatch, forcing an expensive full reconstruction on the next request.
+This fork listens for `session_compact` and eagerly clears the stored checkpoint for the affected session, so both sides stay in sync at zero extra cost.
+### Context window scaling when Cursor enforces a tighter cap
+Cursor sometimes enforces a tighter context window at runtime than what the model ID implies (for example, capping Gemini at 200 k even though we registered 1 M). In that case the raw `usedTokens` from Cursor's `ConversationTokenDetails` would appear far below pi's compaction threshold, so pi would never compact — then Cursor would eventually error with a context-overflow.
+This fork reads `maxTokens` from `ConversationTokenDetails` and, when Cursor's cap is tighter than the inferred window, scales `total_tokens` proportionally:
+```
+total_tokens = round(usedTokens × piWindow / cursorWindow)
+```
+That makes pi's compaction threshold fire at the right time relative to the window Cursor is actually enforcing.
+### Per-model cost estimation
+The upstream repo provides no cost data, so pi cannot show per-turn cost estimates for Cursor models.
+This fork ships a detailed cost table (input / output / cache-read / cache-write prices in $/M tokens) covering every current model family — Claude 4.x, GPT-5.x, Gemini 2.5/3.x, Grok 4, Kimi K2, and Composer — plus a pattern-based fallback for variants not yet in the table. Pi uses this data to display cost estimates after each turn.
+### Model deduplication with reasoning-effort mapping
+Cursor's `GetUsableModels` RPC can return dozens of near-duplicate IDs that differ only by effort suffix (e.g. `gpt-5.4-low`, `gpt-5.4-medium`, `gpt-5.4-high`, `gpt-5.4-xhigh`). The upstream passes all of these through verbatim, producing a cluttered model list where the user must manually pick the right suffix and pi's reasoning-effort setting is ignored.
+This fork deduplicates them: model variants that share the same base ID and differ only by effort suffix are collapsed into a single entry with `supportsReasoningEffort: true` and an effort map keyed by pi's reasoning levels (`minimal` / `low` / `medium` / `high` / `xhigh`). Pi's thinking-level setting then drives the effort suffix automatically, and the model list stays manageable. See the [Model Mapping](#model-mapping) section for the full deduplication rules.
+### Thinking-tag filtering
+Some models (notably certain Gemini variants) emit reasoning content inline with the response, wrapped in tags like `<think>`, `<thinking>`, `<reasoning>`, or `<thought>`. The upstream passes this through as raw text, polluting the main response with unrendered XML tags.
+This fork detects and strips these tags in the proxy's stream processor, routing the extracted content to the `reasoning_content` SSE field so pi renders it as structured reasoning rather than as part of the assistant's reply.
+### Structured debug logging
+The upstream has no observability. This fork adds opt-in JSONL event logging (set `PI_CURSOR_PROVIDER_DEBUG=1`) covering every stage of a request: HTTP ingress, message parsing, checkpoint reads/writes, bridge lifecycle, tool call pauses, tool result resumes, and stream completion. A bundled `debug:timeline` script converts a raw log file into a compact human-readable timeline for diagnosing proxy behaviour.
+```bash
+npm run debug:timeline -- --latest
+```
 ## How it works
 ```

package/auth.ts CHANGED Viewed

@@ -111,6 +111,7 @@ export interface CursorCredentials {
   access: string;
   refresh: string;
   expires: number;
+  [key: string]: unknown;
 }
 export async function refreshCursorToken(

package/index.ts CHANGED Viewed

@@ -12,7 +12,7 @@
  * Based on https://github.com/ephraimduncan/opencode-cursor by Ephraim Duncan.
  */
-import rawFallbackModels from "./cursor-models-raw.json";
+import rawFallbackModels from "./cursor-models-raw.json" with { type: "json" };
 import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";
 import type {
   OAuthCredentials,
@@ -30,6 +30,7 @@ import {
 import {
   cleanupSessionState,
   getCursorModels,
+  inferContextWindow,
   startProxy,
   type CursorModel,
 } from "./proxy.js";
@@ -126,7 +127,7 @@ function summarizeBranchTail(
   ctx: {
     sessionManager?: {
       getBranch?: () => unknown[];
-      getLeafId?: () => string;
+      getLeafId?: () => string | null;
       getSessionId?: () => string;
     };
   },
@@ -474,7 +475,7 @@ function modelConfig(m: ProcessedModel) {
     reasoning: supportsReasoningModelId(m.id),
     input: ["text", "image"] as ("text" | "image")[],
     cost: estimateModelCost(m.id),
-    contextWindow: m.contextWindow,
+    contextWindow: inferContextWindow(m.id),
     maxTokens: m.maxTokens,
     compat: {
       supportsDeveloperRole: false,
@@ -497,11 +498,11 @@ export const FALLBACK_MODELS: CursorModel[] = (
 // ── Extension ──
-export function registerSessionLifecycleCleanup(pi: ExtensionAPI) {
+export function registerSessionLifecycleCleanup(pi: ExtensionAPI): void {
   const cleanupCurrentSession = (
     _event: unknown,
     ctx: {
-      sessionManager: { getSessionId(): string; getLeafId?: () => string };
+      sessionManager: { getSessionId(): string; getLeafId?: () => string | null };
     },
   ) => {
     debugExtensionLog("session.cleanup_hook", {
@@ -515,6 +516,16 @@ export function registerSessionLifecycleCleanup(pi: ExtensionAPI) {
   pi.on("session_before_fork", cleanupCurrentSession);
   pi.on("session_before_tree", cleanupCurrentSession);
   pi.on("session_shutdown", cleanupCurrentSession);
+  // After pi compacts its message list the cursor proxy's cached checkpoint
+  // still reflects the full pre-compaction conversation.  Clearing the state
+  // here forces the proxy to rebuild the cursor conversation from pi's now-
+  // compacted messages on the next request, so both sides stay in sync.
+  pi.on("session_compact", (_event, ctx) => {
+    const sessionId = ctx.sessionManager.getSessionId();
+    debugExtensionLog("session.post_compact_cleanup", { sessionId });
+    cleanupSessionState(sessionId);
+  });
 }
 function registerExtensionDebugHooks(pi: ExtensionAPI) {
@@ -613,7 +624,7 @@ function registerExtensionDebugHooks(pi: ExtensionAPI) {
   });
 }
-export default async function (pi: ExtensionAPI) {
+export default async function (pi: ExtensionAPI): Promise<void> {
   // Current access token, updated by login/refresh/getApiKey
   let currentToken = "";

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@offbynan/pi-cursor-provider",
-  "version": "0.2.0",
+  "version": "0.3.0",
   "description": "Pi extension providing access to Cursor models via OAuth and a local OpenAI-compatible gRPC proxy",
   "type": "module",
   "license": "MIT",
@@ -53,6 +53,7 @@
     "debug:timeline": "node scripts/debug-log-timeline.mjs"
   },
   "devDependencies": {
+    "typescript": "^6.0.3",
     "vitest": "^4.1.3"
   }
 }

package/proto/agent_pb.ts CHANGED Viewed

@@ -1,6 +1,7 @@
 // @generated by protoc-gen-es v2.10.2 with parameter "target=ts"
 // @generated from file agent.proto (package agent.v1, syntax proto3)
 /* eslint-disable */
+// @ts-nocheck
 import type { Message } from "@bufbuild/protobuf";
 import type { GenEnum, GenFile, GenMessage, GenService } from "@bufbuild/protobuf/codegenv2";

package/proxy.ts CHANGED Viewed

@@ -56,6 +56,8 @@ import {
   McpTextContentSchema,
   McpToolCallSchema,
   McpToolDefinitionSchema,
+  McpToolErrorSchema,
+  McpToolResultSchema,
   McpToolResultContentItemSchema,
   ModelDetailsSchema,
   ReadRejectedSchema,
@@ -176,13 +178,24 @@ export interface StoredConversation {
   conversationId: string;
   checkpoint: Uint8Array | null;
   blobStore: Map<string, Uint8Array>;
+  /**
+   * Cursor's actual context window for this conversation, populated from
+   * ConversationTokenDetails.maxTokens in checkpoint updates.  Used to correct
+   * our static inferContextWindow() estimate when Cursor enforces a tighter cap.
+   */
+  effectiveContextWindow?: number;
 }
 interface StreamState {
   toolCallIndex: number;
   pendingExecs: PendingExec[];
   outputTokens: number;
+  /** usedTokens from Cursor's ConversationTokenDetails. */
   totalTokens: number;
+  /** maxTokens from Cursor's ConversationTokenDetails; 0 = not yet received. */
+  cursorContextWindow: number;
+  /** inferContextWindow(modelId) — our static estimate for this model. */
+  inferredContextWindow: number;
 }
 interface ToolResultInfo {
@@ -265,6 +278,8 @@ function truncateDebugString(value: string, max = 4000): string {
     : value;
 }
+function sanitizeForDebug(value: Record<string, unknown>): Record<string, unknown>;
+function sanitizeForDebug(value: unknown): unknown;
 function sanitizeForDebug(value: unknown): unknown {
   if (value == null) return value;
   if (typeof value === "string") return truncateDebugString(value);
@@ -443,7 +458,7 @@ function spawnBridge(options: SpawnBridgeOptions): BridgeHandle {
     unref() {
       try {
         proc.unref();
-        (proc.stdout as any)?.unref?.();
+        (proc.stdout as { unref?: () => void } | null)?.unref?.();
       } catch {}
     },
     onClose(cb: (code: number) => void) {
@@ -525,7 +540,7 @@ export async function getCursorModels(apiKey: string): Promise<CursorModel[]> {
       response.exitCode === 0 &&
       response.body.length > 0
     ) {
-      let decoded: any = null;
+      let decoded: ReturnType<typeof fromBinary<typeof GetUsableModelsResponseSchema>> | null = null;
       try {
         decoded = fromBinary(GetUsableModelsResponseSchema, response.body);
       } catch {
@@ -576,18 +591,68 @@ function decodeConnectUnaryBody(payload: Uint8Array): Uint8Array | null {
   return null;
 }
+/**
+ * Infer context window size from the model ID.
+ *
+ * Cursor's GetUsableModels RPC does not expose context window sizes, so we
+ * derive them from known model families.  Update when new major versions ship.
+ *
+ * Sources:
+ *  - Claude: platform.claude.ai/docs — claude-4.6-sonnet / claude-4.6-opus: 1M (GA Mar 2026);
+ *    all other Claude incl. 4.5, 4, Haiku: 200k.
+ *  - Gemini: ai.google.dev/gemini-api/docs — all 2.5 / 3.x models: 1M.
+ *  - GPT: chatai.guide — GPT-5.x: 400k; GPT-5.5+: 1M; nano/mini variants: 128k.
+ *  - Grok 4: docs.x.ai — 256k.
+ *  - Kimi K2.x: platform.kimi.ai — 262,144 tokens (256k).
+ */
+export function inferContextWindow(id: string): number {
+  const lower = id.toLowerCase();
+  // Any model with an explicit -1m suffix (e.g. claude-4-sonnet-1m)
+  if (lower.includes("-1m")) return 1_048_576;
+  // ── Claude ────────────────────────────────────────────────────────────────
+  // Sonnet 4.6 and Opus 4.6 gained 1M context (GA March 2026).
+  // All earlier versions (4.5, 4, …) and Haiku remain at 200k.
+  if (lower.startsWith("claude-4.6-sonnet") || lower.startsWith("claude-4.6-opus")) return 1_048_576;
+  if (lower.startsWith("claude-")) return 200_000;
+  // ── Gemini ────────────────────────────────────────────────────────────────
+  // Gemini 2.5 / 3.x family: 1M context.
+  if (lower.startsWith("gemini-")) return 1_048_576;
+  // ── GPT ───────────────────────────────────────────────────────────────────
+  // nano / mini variants: 128k.  GPT-5.5+: 1M.  Everything else (5.x): 400k.
+  if (/^gpt-[0-9.]*-(nano|mini)/.test(lower)) return 128_000;
+  if (lower.startsWith("gpt-5.5")) return 1_048_576;
+  if (lower.startsWith("gpt-")) return 400_000;
+  // ── Grok ──────────────────────────────────────────────────────────────────
+  // Grok 4 series: 256k.
+  if (lower.startsWith("grok-")) return 256_000;
+  // ── Kimi ──────────────────────────────────────────────────────────────────
+  // Kimi K2.x: 262,144 tokens (256k).
+  if (lower.startsWith("kimi-")) return 262_144;
+  // Composer, default, unknown: 200k.
+  return 200_000;
+}
 function normalizeCursorModels(models: readonly unknown[]): CursorModel[] {
   const byId = new Map<string, CursorModel>();
   for (const model of models) {
-    const m = model as any;
-    const id = m?.modelId?.trim?.();
+    if (!model || typeof model !== "object") continue;
+    const m = model as Record<string, unknown>;
+    const rawId = m["modelId"];
+    const id = typeof rawId === "string" ? rawId.trim() : "";
     if (!id) continue;
-    const name = m.displayName || m.displayNameShort || m.displayModelId || id;
+    const name = String(m["displayName"] || m["displayNameShort"] || m["displayModelId"] || id);
     byId.set(id, {
       id,
       name,
-      reasoning: Boolean(m.thinkingDetails),
-      contextWindow: 200_000,
+      reasoning: Boolean(m["thinkingDetails"]),
+      contextWindow: inferContextWindow(id),
       maxTokens: 64_000,
     });
   }
@@ -1224,11 +1289,11 @@ function buildTurnStepBytes(step: ParsedTurnStep): Uint8Array {
       toolName,
     }),
     ...(step.result && {
-      result: create(McpResultSchema, {
+      result: create(McpToolResultSchema, {
         result: step.result.isError
           ? {
               case: "error",
-              value: create(McpErrorSchema, { error: step.result.content }),
+              value: create(McpToolErrorSchema, { error: step.result.content }),
             }
           : {
               case: "success",
@@ -1422,7 +1487,9 @@ function processServerMessage(
   } else if (msgCase === "conversationCheckpointUpdate") {
     const stateStructure = msg.message.value as ConversationStateStructure;
     if ((stateStructure as any).tokenDetails) {
-      state.totalTokens = (stateStructure as any).tokenDetails.usedTokens;
+      const td = (stateStructure as any).tokenDetails as { usedTokens?: number; maxTokens?: number };
+      if (td.usedTokens) state.totalTokens = td.usedTokens;
+      if (td.maxTokens) state.cursorContextWindow = td.maxTokens;
     }
     if (onCheckpoint) {
       onCheckpoint(toBinary(ConversationStateStructureSchema, stateStructure));
@@ -1831,7 +1898,7 @@ export function deterministicConversationId(convKey: string): string {
     hex.slice(0, 8),
     hex.slice(8, 12),
     `4${hex.slice(13, 16)}`,
-    `${(0x8 | (parseInt(hex[16], 16) & 0x3)).toString(16)}${hex.slice(17, 20)}`,
+    `${(0x8 | (parseInt(hex[16]!, 16) & 0x3)).toString(16)}${hex.slice(17, 20)}`,
     hex.slice(20, 32),
   ].join("-");
 }
@@ -1945,7 +2012,22 @@ function makeHeartbeatBytes(): Uint8Array {
 function computeUsage(state: StreamState) {
   const completion_tokens = state.outputTokens;
-  const total_tokens = state.totalTokens || completion_tokens;
+  const usedTokens = state.totalTokens || completion_tokens;
+  // If Cursor enforces a tighter context window than we inferred, scale
+  // total_tokens proportionally so pi's compaction threshold fires before
+  // Cursor errors — rather than after.
+  //
+  // Example: Cursor caps Gemini at 200 k but we registered 1 M.
+  //   usedTokens=197k → total_tokens = round(197k × 1M/200k) = 985k
+  //   985k > 1M − 16k (reserveTokens) → pi triggers compaction ✓
+  let total_tokens = usedTokens;
+  const cursorWindow = state.cursorContextWindow;
+  const piWindow = state.inferredContextWindow;
+  if (cursorWindow > 0 && piWindow > cursorWindow) {
+    total_tokens = Math.round(usedTokens * piWindow / cursorWindow);
+  }
   const prompt_tokens = Math.max(0, total_tokens - completion_tokens);
   return { prompt_tokens, completion_tokens, total_tokens };
 }
@@ -2169,11 +2251,14 @@ function writeSSEStream(
     };
   };
+  const storedForState = conversationStates.get(convKey);
   const state: StreamState = {
     toolCallIndex: 0,
     pendingExecs: [],
     outputTokens: 0,
     totalTokens: 0,
+    cursorContextWindow: storedForState?.effectiveContextWindow ?? 0,
+    inferredContextWindow: inferContextWindow(modelId),
   };
   const tagFilter = createThinkingTagFilter();
   let mcpExecReceived = false;
@@ -2279,7 +2364,9 @@ function writeSSEStream(
             if (stored) {
               stored.checkpoint = checkpointBytes;
               for (const [k, v] of blobStore) stored.blobStore.set(k, v);
+              if (state.cursorContextWindow > 0) {
+                stored.effectiveContextWindow = state.cursorContextWindow;
+              }
             }
             debugLog("stream.checkpoint_buffered", {
               requestId,
@@ -2354,6 +2441,9 @@ function writeSSEStream(
         stored.checkpoint = latestCheckpoint;
         debugLog("stream.checkpoint_committed", { requestId, convKey, stored });
       }
+      if (state.cursorContextWindow > 0) {
+        stored.effectiveContextWindow = state.cursorContextWindow;
+      }
     }
     if (cancelled) return;
     if (!mcpExecReceived) {
@@ -2446,7 +2536,7 @@ function handleToolResultResume(
   for (const result of toolResults) {
     const turnToolStep = currentTurn.steps.find(
-      (step) =>
+      (step): step is ParsedToolCallStep =>
         step.kind === "toolCall" && step.toolCallId === result.toolCallId,
     );
     if (turnToolStep) {
@@ -2571,11 +2661,14 @@ async function handleNonStreamingResponse(
   };
   req.on("close", onClientClose);
   res.on("close", onClientClose);
+  const storedForNonStream = conversationStates.get(convKey);
   const state: StreamState = {
     toolCallIndex: 0,
     pendingExecs: [],
     outputTokens: 0,
     totalTokens: 0,
+    cursorContextWindow: storedForNonStream?.effectiveContextWindow ?? 0,
+    inferredContextWindow: inferContextWindow(modelId),
   };
   const tagFilter = createThinkingTagFilter();
   let fullText = "";
@@ -2699,6 +2792,9 @@ async function handleNonStreamingResponse(
             stored,
           });
         }
+        if (state.cursorContextWindow > 0) {
+          stored.effectiveContextWindow = state.cursorContextWindow;
+        }
       }
       if (cancelled) {