@offbynan/pi-cursor-provider 0.2.0 → 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +64 -1
- package/auth.ts +1 -0
- package/index.ts +17 -6
- package/package.json +2 -1
- package/proto/agent_pb.ts +1 -0
- package/proxy.ts +110 -14
package/README.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# pi-cursor-provider
|
|
2
2
|
|
|
3
|
-
> **This fork improves on the upstream in
|
|
3
|
+
> **This fork improves on the upstream in ten areas:** image support, correct `pi -p` exit behaviour, removal of dead eviction code, accurate per-model context window inference, post-compaction session sync, context window scaling when Cursor enforces a tighter cap, per-model cost estimation, model deduplication with reasoning-effort mapping, thinking-tag filtering, and structured debug logging. See the sections below for details.
|
|
4
4
|
|
|
5
5
|
[](https://www.npmjs.com/package/@offbynan/pi-cursor-provider)
|
|
6
6
|
|
|
@@ -33,6 +33,69 @@ This fork fixes both: empty and non-JSON end-stream bodies are treated as succes
|
|
|
33
33
|
|
|
34
34
|
The upstream proxy included a 30-minute TTL eviction mechanism (`evictStaleConversations`, `CONVERSATION_TTL_MS`, `sessionScoped`, `lastAccessMs`). All conversations created by pi include a session ID, permanently exempting them from TTL eviction, so this code was never reachable. This fork removes it.
|
|
35
35
|
|
|
36
|
+
### Accurate per-model context window inference
|
|
37
|
+
|
|
38
|
+
Cursor's `GetUsableModels` RPC does not return context window sizes, so the upstream proxy hardcodes 200 k for every model. This fork exports an `inferContextWindow(id)` function that derives the correct window from known model families:
|
|
39
|
+
|
|
40
|
+
| Family | Window |
|
|
41
|
+
| ------ | ------ |
|
|
42
|
+
| Claude 4.6 Sonnet / Opus | 1 M |
|
|
43
|
+
| All other Claude | 200 k |
|
|
44
|
+
| Gemini 2.5 / 3.x | 1 M |
|
|
45
|
+
| GPT nano / mini variants | 128 k |
|
|
46
|
+
| GPT-5.5+ | 1 M |
|
|
47
|
+
| GPT-5.x (other) | 400 k |
|
|
48
|
+
| Grok 4 | 256 k |
|
|
49
|
+
| Kimi K2.x | 262 k |
|
|
50
|
+
| Anything with `-1m` suffix | 1 M |
|
|
51
|
+
| Unknown / Composer | 200 k |
|
|
52
|
+
|
|
53
|
+
This ensures pi uses the right compaction thresholds and token budget for each model.
|
|
54
|
+
|
|
55
|
+
### Post-compaction session sync
|
|
56
|
+
|
|
57
|
+
When pi compacts its message list (the `session_compact` lifecycle event), the proxy's cached conversation checkpoint still reflects the full pre-compaction conversation. Continuing without clearing that cache would cause a history mismatch, forcing an expensive full reconstruction on the next request.
|
|
58
|
+
|
|
59
|
+
This fork listens for `session_compact` and eagerly clears the stored checkpoint for the affected session, so both sides stay in sync at zero extra cost.
|
|
60
|
+
|
|
61
|
+
### Context window scaling when Cursor enforces a tighter cap
|
|
62
|
+
|
|
63
|
+
Cursor sometimes enforces a tighter context window at runtime than what the model ID implies (for example, capping Gemini at 200 k even though we registered 1 M). In that case the raw `usedTokens` from Cursor's `ConversationTokenDetails` would appear far below pi's compaction threshold, so pi would never compact — then Cursor would eventually error with a context-overflow.
|
|
64
|
+
|
|
65
|
+
This fork reads `maxTokens` from `ConversationTokenDetails` and, when Cursor's cap is tighter than the inferred window, scales `total_tokens` proportionally:
|
|
66
|
+
|
|
67
|
+
```
|
|
68
|
+
total_tokens = round(usedTokens × piWindow / cursorWindow)
|
|
69
|
+
```
|
|
70
|
+
|
|
71
|
+
That makes pi's compaction threshold fire at the right time relative to the window Cursor is actually enforcing.
|
|
72
|
+
|
|
73
|
+
### Per-model cost estimation
|
|
74
|
+
|
|
75
|
+
The upstream repo provides no cost data, so pi cannot show per-turn cost estimates for Cursor models.
|
|
76
|
+
|
|
77
|
+
This fork ships a detailed cost table (input / output / cache-read / cache-write prices in $/M tokens) covering every current model family — Claude 4.x, GPT-5.x, Gemini 2.5/3.x, Grok 4, Kimi K2, and Composer — plus a pattern-based fallback for variants not yet in the table. Pi uses this data to display cost estimates after each turn.
|
|
78
|
+
|
|
79
|
+
### Model deduplication with reasoning-effort mapping
|
|
80
|
+
|
|
81
|
+
Cursor's `GetUsableModels` RPC can return dozens of near-duplicate IDs that differ only by effort suffix (e.g. `gpt-5.4-low`, `gpt-5.4-medium`, `gpt-5.4-high`, `gpt-5.4-xhigh`). The upstream passes all of these through verbatim, producing a cluttered model list where the user must manually pick the right suffix and pi's reasoning-effort setting is ignored.
|
|
82
|
+
|
|
83
|
+
This fork deduplicates them: model variants that share the same base ID and differ only by effort suffix are collapsed into a single entry with `supportsReasoningEffort: true` and an effort map keyed by pi's reasoning levels (`minimal` / `low` / `medium` / `high` / `xhigh`). Pi's thinking-level setting then drives the effort suffix automatically, and the model list stays manageable. See the [Model Mapping](#model-mapping) section for the full deduplication rules.
|
|
84
|
+
|
|
85
|
+
### Thinking-tag filtering
|
|
86
|
+
|
|
87
|
+
Some models (notably certain Gemini variants) emit reasoning content inline with the response, wrapped in tags like `<think>`, `<thinking>`, `<reasoning>`, or `<thought>`. The upstream passes this through as raw text, polluting the main response with unrendered XML tags.
|
|
88
|
+
|
|
89
|
+
This fork detects and strips these tags in the proxy's stream processor, routing the extracted content to the `reasoning_content` SSE field so pi renders it as structured reasoning rather than as part of the assistant's reply.
|
|
90
|
+
|
|
91
|
+
### Structured debug logging
|
|
92
|
+
|
|
93
|
+
The upstream has no observability. This fork adds opt-in JSONL event logging (set `PI_CURSOR_PROVIDER_DEBUG=1`) covering every stage of a request: HTTP ingress, message parsing, checkpoint reads/writes, bridge lifecycle, tool call pauses, tool result resumes, and stream completion. A bundled `debug:timeline` script converts a raw log file into a compact human-readable timeline for diagnosing proxy behaviour.
|
|
94
|
+
|
|
95
|
+
```bash
|
|
96
|
+
npm run debug:timeline -- --latest
|
|
97
|
+
```
|
|
98
|
+
|
|
36
99
|
## How it works
|
|
37
100
|
|
|
38
101
|
```
|
package/auth.ts
CHANGED
package/index.ts
CHANGED
|
@@ -12,7 +12,7 @@
|
|
|
12
12
|
* Based on https://github.com/ephraimduncan/opencode-cursor by Ephraim Duncan.
|
|
13
13
|
*/
|
|
14
14
|
|
|
15
|
-
import rawFallbackModels from "./cursor-models-raw.json";
|
|
15
|
+
import rawFallbackModels from "./cursor-models-raw.json" with { type: "json" };
|
|
16
16
|
import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";
|
|
17
17
|
import type {
|
|
18
18
|
OAuthCredentials,
|
|
@@ -30,6 +30,7 @@ import {
|
|
|
30
30
|
import {
|
|
31
31
|
cleanupSessionState,
|
|
32
32
|
getCursorModels,
|
|
33
|
+
inferContextWindow,
|
|
33
34
|
startProxy,
|
|
34
35
|
type CursorModel,
|
|
35
36
|
} from "./proxy.js";
|
|
@@ -126,7 +127,7 @@ function summarizeBranchTail(
|
|
|
126
127
|
ctx: {
|
|
127
128
|
sessionManager?: {
|
|
128
129
|
getBranch?: () => unknown[];
|
|
129
|
-
getLeafId?: () => string;
|
|
130
|
+
getLeafId?: () => string | null;
|
|
130
131
|
getSessionId?: () => string;
|
|
131
132
|
};
|
|
132
133
|
},
|
|
@@ -474,7 +475,7 @@ function modelConfig(m: ProcessedModel) {
|
|
|
474
475
|
reasoning: supportsReasoningModelId(m.id),
|
|
475
476
|
input: ["text", "image"] as ("text" | "image")[],
|
|
476
477
|
cost: estimateModelCost(m.id),
|
|
477
|
-
contextWindow: m.
|
|
478
|
+
contextWindow: inferContextWindow(m.id),
|
|
478
479
|
maxTokens: m.maxTokens,
|
|
479
480
|
compat: {
|
|
480
481
|
supportsDeveloperRole: false,
|
|
@@ -497,11 +498,11 @@ export const FALLBACK_MODELS: CursorModel[] = (
|
|
|
497
498
|
|
|
498
499
|
// ── Extension ──
|
|
499
500
|
|
|
500
|
-
export function registerSessionLifecycleCleanup(pi: ExtensionAPI) {
|
|
501
|
+
export function registerSessionLifecycleCleanup(pi: ExtensionAPI): void {
|
|
501
502
|
const cleanupCurrentSession = (
|
|
502
503
|
_event: unknown,
|
|
503
504
|
ctx: {
|
|
504
|
-
sessionManager: { getSessionId(): string; getLeafId?: () => string };
|
|
505
|
+
sessionManager: { getSessionId(): string; getLeafId?: () => string | null };
|
|
505
506
|
},
|
|
506
507
|
) => {
|
|
507
508
|
debugExtensionLog("session.cleanup_hook", {
|
|
@@ -515,6 +516,16 @@ export function registerSessionLifecycleCleanup(pi: ExtensionAPI) {
|
|
|
515
516
|
pi.on("session_before_fork", cleanupCurrentSession);
|
|
516
517
|
pi.on("session_before_tree", cleanupCurrentSession);
|
|
517
518
|
pi.on("session_shutdown", cleanupCurrentSession);
|
|
519
|
+
|
|
520
|
+
// After pi compacts its message list the cursor proxy's cached checkpoint
|
|
521
|
+
// still reflects the full pre-compaction conversation. Clearing the state
|
|
522
|
+
// here forces the proxy to rebuild the cursor conversation from pi's now-
|
|
523
|
+
// compacted messages on the next request, so both sides stay in sync.
|
|
524
|
+
pi.on("session_compact", (_event, ctx) => {
|
|
525
|
+
const sessionId = ctx.sessionManager.getSessionId();
|
|
526
|
+
debugExtensionLog("session.post_compact_cleanup", { sessionId });
|
|
527
|
+
cleanupSessionState(sessionId);
|
|
528
|
+
});
|
|
518
529
|
}
|
|
519
530
|
|
|
520
531
|
function registerExtensionDebugHooks(pi: ExtensionAPI) {
|
|
@@ -613,7 +624,7 @@ function registerExtensionDebugHooks(pi: ExtensionAPI) {
|
|
|
613
624
|
});
|
|
614
625
|
}
|
|
615
626
|
|
|
616
|
-
export default async function (pi: ExtensionAPI) {
|
|
627
|
+
export default async function (pi: ExtensionAPI): Promise<void> {
|
|
617
628
|
// Current access token, updated by login/refresh/getApiKey
|
|
618
629
|
let currentToken = "";
|
|
619
630
|
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@offbynan/pi-cursor-provider",
|
|
3
|
-
"version": "0.
|
|
3
|
+
"version": "0.3.0",
|
|
4
4
|
"description": "Pi extension providing access to Cursor models via OAuth and a local OpenAI-compatible gRPC proxy",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"license": "MIT",
|
|
@@ -53,6 +53,7 @@
|
|
|
53
53
|
"debug:timeline": "node scripts/debug-log-timeline.mjs"
|
|
54
54
|
},
|
|
55
55
|
"devDependencies": {
|
|
56
|
+
"typescript": "^6.0.3",
|
|
56
57
|
"vitest": "^4.1.3"
|
|
57
58
|
}
|
|
58
59
|
}
|
package/proto/agent_pb.ts
CHANGED
|
@@ -1,6 +1,7 @@
|
|
|
1
1
|
// @generated by protoc-gen-es v2.10.2 with parameter "target=ts"
|
|
2
2
|
// @generated from file agent.proto (package agent.v1, syntax proto3)
|
|
3
3
|
/* eslint-disable */
|
|
4
|
+
// @ts-nocheck
|
|
4
5
|
|
|
5
6
|
import type { Message } from "@bufbuild/protobuf";
|
|
6
7
|
import type { GenEnum, GenFile, GenMessage, GenService } from "@bufbuild/protobuf/codegenv2";
|
package/proxy.ts
CHANGED
|
@@ -56,6 +56,8 @@ import {
|
|
|
56
56
|
McpTextContentSchema,
|
|
57
57
|
McpToolCallSchema,
|
|
58
58
|
McpToolDefinitionSchema,
|
|
59
|
+
McpToolErrorSchema,
|
|
60
|
+
McpToolResultSchema,
|
|
59
61
|
McpToolResultContentItemSchema,
|
|
60
62
|
ModelDetailsSchema,
|
|
61
63
|
ReadRejectedSchema,
|
|
@@ -176,13 +178,24 @@ export interface StoredConversation {
|
|
|
176
178
|
conversationId: string;
|
|
177
179
|
checkpoint: Uint8Array | null;
|
|
178
180
|
blobStore: Map<string, Uint8Array>;
|
|
181
|
+
/**
|
|
182
|
+
* Cursor's actual context window for this conversation, populated from
|
|
183
|
+
* ConversationTokenDetails.maxTokens in checkpoint updates. Used to correct
|
|
184
|
+
* our static inferContextWindow() estimate when Cursor enforces a tighter cap.
|
|
185
|
+
*/
|
|
186
|
+
effectiveContextWindow?: number;
|
|
179
187
|
}
|
|
180
188
|
|
|
181
189
|
interface StreamState {
|
|
182
190
|
toolCallIndex: number;
|
|
183
191
|
pendingExecs: PendingExec[];
|
|
184
192
|
outputTokens: number;
|
|
193
|
+
/** usedTokens from Cursor's ConversationTokenDetails. */
|
|
185
194
|
totalTokens: number;
|
|
195
|
+
/** maxTokens from Cursor's ConversationTokenDetails; 0 = not yet received. */
|
|
196
|
+
cursorContextWindow: number;
|
|
197
|
+
/** inferContextWindow(modelId) — our static estimate for this model. */
|
|
198
|
+
inferredContextWindow: number;
|
|
186
199
|
}
|
|
187
200
|
|
|
188
201
|
interface ToolResultInfo {
|
|
@@ -265,6 +278,8 @@ function truncateDebugString(value: string, max = 4000): string {
|
|
|
265
278
|
: value;
|
|
266
279
|
}
|
|
267
280
|
|
|
281
|
+
function sanitizeForDebug(value: Record<string, unknown>): Record<string, unknown>;
|
|
282
|
+
function sanitizeForDebug(value: unknown): unknown;
|
|
268
283
|
function sanitizeForDebug(value: unknown): unknown {
|
|
269
284
|
if (value == null) return value;
|
|
270
285
|
if (typeof value === "string") return truncateDebugString(value);
|
|
@@ -443,7 +458,7 @@ function spawnBridge(options: SpawnBridgeOptions): BridgeHandle {
|
|
|
443
458
|
unref() {
|
|
444
459
|
try {
|
|
445
460
|
proc.unref();
|
|
446
|
-
(proc.stdout as
|
|
461
|
+
(proc.stdout as { unref?: () => void } | null)?.unref?.();
|
|
447
462
|
} catch {}
|
|
448
463
|
},
|
|
449
464
|
onClose(cb: (code: number) => void) {
|
|
@@ -525,7 +540,7 @@ export async function getCursorModels(apiKey: string): Promise<CursorModel[]> {
|
|
|
525
540
|
response.exitCode === 0 &&
|
|
526
541
|
response.body.length > 0
|
|
527
542
|
) {
|
|
528
|
-
let decoded:
|
|
543
|
+
let decoded: ReturnType<typeof fromBinary<typeof GetUsableModelsResponseSchema>> | null = null;
|
|
529
544
|
try {
|
|
530
545
|
decoded = fromBinary(GetUsableModelsResponseSchema, response.body);
|
|
531
546
|
} catch {
|
|
@@ -576,18 +591,68 @@ function decodeConnectUnaryBody(payload: Uint8Array): Uint8Array | null {
|
|
|
576
591
|
return null;
|
|
577
592
|
}
|
|
578
593
|
|
|
594
|
+
/**
|
|
595
|
+
* Infer context window size from the model ID.
|
|
596
|
+
*
|
|
597
|
+
* Cursor's GetUsableModels RPC does not expose context window sizes, so we
|
|
598
|
+
* derive them from known model families. Update when new major versions ship.
|
|
599
|
+
*
|
|
600
|
+
* Sources:
|
|
601
|
+
* - Claude: platform.claude.ai/docs — claude-4.6-sonnet / claude-4.6-opus: 1M (GA Mar 2026);
|
|
602
|
+
* all other Claude incl. 4.5, 4, Haiku: 200k.
|
|
603
|
+
* - Gemini: ai.google.dev/gemini-api/docs — all 2.5 / 3.x models: 1M.
|
|
604
|
+
* - GPT: chatai.guide — GPT-5.x: 400k; GPT-5.5+: 1M; nano/mini variants: 128k.
|
|
605
|
+
* - Grok 4: docs.x.ai — 256k.
|
|
606
|
+
* - Kimi K2.x: platform.kimi.ai — 262,144 tokens (256k).
|
|
607
|
+
*/
|
|
608
|
+
export function inferContextWindow(id: string): number {
|
|
609
|
+
const lower = id.toLowerCase();
|
|
610
|
+
|
|
611
|
+
// Any model with an explicit -1m suffix (e.g. claude-4-sonnet-1m)
|
|
612
|
+
if (lower.includes("-1m")) return 1_048_576;
|
|
613
|
+
|
|
614
|
+
// ── Claude ────────────────────────────────────────────────────────────────
|
|
615
|
+
// Sonnet 4.6 and Opus 4.6 gained 1M context (GA March 2026).
|
|
616
|
+
// All earlier versions (4.5, 4, …) and Haiku remain at 200k.
|
|
617
|
+
if (lower.startsWith("claude-4.6-sonnet") || lower.startsWith("claude-4.6-opus")) return 1_048_576;
|
|
618
|
+
if (lower.startsWith("claude-")) return 200_000;
|
|
619
|
+
|
|
620
|
+
// ── Gemini ────────────────────────────────────────────────────────────────
|
|
621
|
+
// Gemini 2.5 / 3.x family: 1M context.
|
|
622
|
+
if (lower.startsWith("gemini-")) return 1_048_576;
|
|
623
|
+
|
|
624
|
+
// ── GPT ───────────────────────────────────────────────────────────────────
|
|
625
|
+
// nano / mini variants: 128k. GPT-5.5+: 1M. Everything else (5.x): 400k.
|
|
626
|
+
if (/^gpt-[0-9.]*-(nano|mini)/.test(lower)) return 128_000;
|
|
627
|
+
if (lower.startsWith("gpt-5.5")) return 1_048_576;
|
|
628
|
+
if (lower.startsWith("gpt-")) return 400_000;
|
|
629
|
+
|
|
630
|
+
// ── Grok ──────────────────────────────────────────────────────────────────
|
|
631
|
+
// Grok 4 series: 256k.
|
|
632
|
+
if (lower.startsWith("grok-")) return 256_000;
|
|
633
|
+
|
|
634
|
+
// ── Kimi ──────────────────────────────────────────────────────────────────
|
|
635
|
+
// Kimi K2.x: 262,144 tokens (256k).
|
|
636
|
+
if (lower.startsWith("kimi-")) return 262_144;
|
|
637
|
+
|
|
638
|
+
// Composer, default, unknown: 200k.
|
|
639
|
+
return 200_000;
|
|
640
|
+
}
|
|
641
|
+
|
|
579
642
|
function normalizeCursorModels(models: readonly unknown[]): CursorModel[] {
|
|
580
643
|
const byId = new Map<string, CursorModel>();
|
|
581
644
|
for (const model of models) {
|
|
582
|
-
|
|
583
|
-
const
|
|
645
|
+
if (!model || typeof model !== "object") continue;
|
|
646
|
+
const m = model as Record<string, unknown>;
|
|
647
|
+
const rawId = m["modelId"];
|
|
648
|
+
const id = typeof rawId === "string" ? rawId.trim() : "";
|
|
584
649
|
if (!id) continue;
|
|
585
|
-
const name = m
|
|
650
|
+
const name = String(m["displayName"] || m["displayNameShort"] || m["displayModelId"] || id);
|
|
586
651
|
byId.set(id, {
|
|
587
652
|
id,
|
|
588
653
|
name,
|
|
589
|
-
reasoning: Boolean(m
|
|
590
|
-
contextWindow:
|
|
654
|
+
reasoning: Boolean(m["thinkingDetails"]),
|
|
655
|
+
contextWindow: inferContextWindow(id),
|
|
591
656
|
maxTokens: 64_000,
|
|
592
657
|
});
|
|
593
658
|
}
|
|
@@ -1224,11 +1289,11 @@ function buildTurnStepBytes(step: ParsedTurnStep): Uint8Array {
|
|
|
1224
1289
|
toolName,
|
|
1225
1290
|
}),
|
|
1226
1291
|
...(step.result && {
|
|
1227
|
-
result: create(
|
|
1292
|
+
result: create(McpToolResultSchema, {
|
|
1228
1293
|
result: step.result.isError
|
|
1229
1294
|
? {
|
|
1230
1295
|
case: "error",
|
|
1231
|
-
value: create(
|
|
1296
|
+
value: create(McpToolErrorSchema, { error: step.result.content }),
|
|
1232
1297
|
}
|
|
1233
1298
|
: {
|
|
1234
1299
|
case: "success",
|
|
@@ -1422,7 +1487,9 @@ function processServerMessage(
|
|
|
1422
1487
|
} else if (msgCase === "conversationCheckpointUpdate") {
|
|
1423
1488
|
const stateStructure = msg.message.value as ConversationStateStructure;
|
|
1424
1489
|
if ((stateStructure as any).tokenDetails) {
|
|
1425
|
-
|
|
1490
|
+
const td = (stateStructure as any).tokenDetails as { usedTokens?: number; maxTokens?: number };
|
|
1491
|
+
if (td.usedTokens) state.totalTokens = td.usedTokens;
|
|
1492
|
+
if (td.maxTokens) state.cursorContextWindow = td.maxTokens;
|
|
1426
1493
|
}
|
|
1427
1494
|
if (onCheckpoint) {
|
|
1428
1495
|
onCheckpoint(toBinary(ConversationStateStructureSchema, stateStructure));
|
|
@@ -1831,7 +1898,7 @@ export function deterministicConversationId(convKey: string): string {
|
|
|
1831
1898
|
hex.slice(0, 8),
|
|
1832
1899
|
hex.slice(8, 12),
|
|
1833
1900
|
`4${hex.slice(13, 16)}`,
|
|
1834
|
-
`${(0x8 | (parseInt(hex[16]
|
|
1901
|
+
`${(0x8 | (parseInt(hex[16]!, 16) & 0x3)).toString(16)}${hex.slice(17, 20)}`,
|
|
1835
1902
|
hex.slice(20, 32),
|
|
1836
1903
|
].join("-");
|
|
1837
1904
|
}
|
|
@@ -1945,7 +2012,22 @@ function makeHeartbeatBytes(): Uint8Array {
|
|
|
1945
2012
|
|
|
1946
2013
|
function computeUsage(state: StreamState) {
|
|
1947
2014
|
const completion_tokens = state.outputTokens;
|
|
1948
|
-
const
|
|
2015
|
+
const usedTokens = state.totalTokens || completion_tokens;
|
|
2016
|
+
|
|
2017
|
+
// If Cursor enforces a tighter context window than we inferred, scale
|
|
2018
|
+
// total_tokens proportionally so pi's compaction threshold fires before
|
|
2019
|
+
// Cursor errors — rather than after.
|
|
2020
|
+
//
|
|
2021
|
+
// Example: Cursor caps Gemini at 200 k but we registered 1 M.
|
|
2022
|
+
// usedTokens=197k → total_tokens = round(197k × 1M/200k) = 985k
|
|
2023
|
+
// 985k > 1M − 16k (reserveTokens) → pi triggers compaction ✓
|
|
2024
|
+
let total_tokens = usedTokens;
|
|
2025
|
+
const cursorWindow = state.cursorContextWindow;
|
|
2026
|
+
const piWindow = state.inferredContextWindow;
|
|
2027
|
+
if (cursorWindow > 0 && piWindow > cursorWindow) {
|
|
2028
|
+
total_tokens = Math.round(usedTokens * piWindow / cursorWindow);
|
|
2029
|
+
}
|
|
2030
|
+
|
|
1949
2031
|
const prompt_tokens = Math.max(0, total_tokens - completion_tokens);
|
|
1950
2032
|
return { prompt_tokens, completion_tokens, total_tokens };
|
|
1951
2033
|
}
|
|
@@ -2169,11 +2251,14 @@ function writeSSEStream(
|
|
|
2169
2251
|
};
|
|
2170
2252
|
};
|
|
2171
2253
|
|
|
2254
|
+
const storedForState = conversationStates.get(convKey);
|
|
2172
2255
|
const state: StreamState = {
|
|
2173
2256
|
toolCallIndex: 0,
|
|
2174
2257
|
pendingExecs: [],
|
|
2175
2258
|
outputTokens: 0,
|
|
2176
2259
|
totalTokens: 0,
|
|
2260
|
+
cursorContextWindow: storedForState?.effectiveContextWindow ?? 0,
|
|
2261
|
+
inferredContextWindow: inferContextWindow(modelId),
|
|
2177
2262
|
};
|
|
2178
2263
|
const tagFilter = createThinkingTagFilter();
|
|
2179
2264
|
let mcpExecReceived = false;
|
|
@@ -2279,7 +2364,9 @@ function writeSSEStream(
|
|
|
2279
2364
|
if (stored) {
|
|
2280
2365
|
stored.checkpoint = checkpointBytes;
|
|
2281
2366
|
for (const [k, v] of blobStore) stored.blobStore.set(k, v);
|
|
2282
|
-
|
|
2367
|
+
if (state.cursorContextWindow > 0) {
|
|
2368
|
+
stored.effectiveContextWindow = state.cursorContextWindow;
|
|
2369
|
+
}
|
|
2283
2370
|
}
|
|
2284
2371
|
debugLog("stream.checkpoint_buffered", {
|
|
2285
2372
|
requestId,
|
|
@@ -2354,6 +2441,9 @@ function writeSSEStream(
|
|
|
2354
2441
|
stored.checkpoint = latestCheckpoint;
|
|
2355
2442
|
debugLog("stream.checkpoint_committed", { requestId, convKey, stored });
|
|
2356
2443
|
}
|
|
2444
|
+
if (state.cursorContextWindow > 0) {
|
|
2445
|
+
stored.effectiveContextWindow = state.cursorContextWindow;
|
|
2446
|
+
}
|
|
2357
2447
|
}
|
|
2358
2448
|
if (cancelled) return;
|
|
2359
2449
|
if (!mcpExecReceived) {
|
|
@@ -2446,7 +2536,7 @@ function handleToolResultResume(
|
|
|
2446
2536
|
|
|
2447
2537
|
for (const result of toolResults) {
|
|
2448
2538
|
const turnToolStep = currentTurn.steps.find(
|
|
2449
|
-
(step) =>
|
|
2539
|
+
(step): step is ParsedToolCallStep =>
|
|
2450
2540
|
step.kind === "toolCall" && step.toolCallId === result.toolCallId,
|
|
2451
2541
|
);
|
|
2452
2542
|
if (turnToolStep) {
|
|
@@ -2571,11 +2661,14 @@ async function handleNonStreamingResponse(
|
|
|
2571
2661
|
};
|
|
2572
2662
|
req.on("close", onClientClose);
|
|
2573
2663
|
res.on("close", onClientClose);
|
|
2664
|
+
const storedForNonStream = conversationStates.get(convKey);
|
|
2574
2665
|
const state: StreamState = {
|
|
2575
2666
|
toolCallIndex: 0,
|
|
2576
2667
|
pendingExecs: [],
|
|
2577
2668
|
outputTokens: 0,
|
|
2578
2669
|
totalTokens: 0,
|
|
2670
|
+
cursorContextWindow: storedForNonStream?.effectiveContextWindow ?? 0,
|
|
2671
|
+
inferredContextWindow: inferContextWindow(modelId),
|
|
2579
2672
|
};
|
|
2580
2673
|
const tagFilter = createThinkingTagFilter();
|
|
2581
2674
|
let fullText = "";
|
|
@@ -2699,6 +2792,9 @@ async function handleNonStreamingResponse(
|
|
|
2699
2792
|
stored,
|
|
2700
2793
|
});
|
|
2701
2794
|
}
|
|
2795
|
+
if (state.cursorContextWindow > 0) {
|
|
2796
|
+
stored.effectiveContextWindow = state.cursorContextWindow;
|
|
2797
|
+
}
|
|
2702
2798
|
}
|
|
2703
2799
|
|
|
2704
2800
|
if (cancelled) {
|