npm - pi-cursor-sdk - Versions diffs - 0.1.13 → 0.1.15 - Mend

pi-cursor-sdk 0.1.13 → 0.1.15

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18)

package/CHANGELOG.md +36 -0
package/README.md +71 -32
package/docs/cursor-model-ux-spec.md +23 -9
package/docs/cursor-native-tool-replay.md +88 -0
package/docs/cursor-native-tool-visual-audit.md +183 -0
package/package.json +5 -2
package/src/bundled-context-windows.ts +5 -2
package/src/context.ts +34 -11
package/src/cursor-fallback-models.generated.ts +4068 -71
package/src/cursor-mcp-timeout-override.ts +111 -0
package/src/cursor-native-tool-display.ts +397 -46
package/src/cursor-pi-tool-bridge.ts +637 -0
package/src/cursor-provider.ts +477 -81
package/src/cursor-question-tool.ts +247 -0
package/src/cursor-session-cwd.ts +33 -0
package/src/cursor-tool-names.ts +67 -0
package/src/cursor-tool-transcript.ts +730 -61
package/src/index.ts +7 -0

package/docs/cursor-native-tool-visual-audit.md ADDED

@@ -0,0 +1,183 @@
+# Cursor Native Tool Visual Audit Workflow
+This workflow verifies Cursor SDK tool replay the way a human sees it in pi's interactive TUI, without stealing macOS focus.
+Use it before accepting replay-card commits or PRs. Text logs and JSONL are necessary, but they are not enough when the claim is visual parity: always keep before/after PNGs for the exact prompt.
+## When to use this
+Use this workflow when changing or reviewing:
+- Cursor native tool replay cards.
+- Tool-call turn ordering.
+- Tool-result error styling.
+- Truncation, continuation hints, timeout labels, or path display.
+- Any PR claiming native TUI parity.
+Do not use this for ordinary unit-only logic changes.
+## Why this workflow exists
+Earlier manual verification used a visible Terminal window plus `screencapture`. That worked, but it stole system focus and made it easy for the user to type into the audit window by accident.
+The preferred workflow is now offscreen:
+1. Spawn `pi` in a pseudo-terminal at a fixed size.
+2. Feed the prompt programmatically.
+3. Save raw ANSI output and plain text output.
+4. Render the terminal buffer through xterm.js in headless Playwright.
+5. Save a PNG screenshot.
+6. Inspect the session JSONL for exact persisted `toolCall` / `toolResult` data.
+This gives human-like visual evidence without activating Terminal, iTerm, or a browser window.
+## Tool stack
+Install the harness outside this repo so generated assets and temporary dependencies do not pollute commits:
+```bash
+HARNESS=/tmp/pi-visual-harness
+rm -rf "$HARNESS"
+mkdir -p "$HARNESS"
+cd "$HARNESS"
+npm init -y
+npm install node-pty @xterm/xterm playwright
+npm rebuild node-pty
+```
+`npm rebuild node-pty` is useful after Node upgrades; without it, `node-pty` may fail with `posix_spawnp failed`.
+## Runner contract
+A runner script should:
+- Spawn `pi -e <extension-dir> --model cursor/composer-2.5` with:
+  - `PI_CURSOR_NATIVE_TOOL_DISPLAY=1`
+  - `TERM=xterm-256color`
+  - fixed PTY size, for example `150x45`
+  - cwd set to the target audit repo.
+- Wait for startup.
+- Write the exact prompt and carriage return to the PTY.
+- Wait a bounded amount of time.
+- Save:
+  - `<label>.ansi` raw terminal bytes.
+  - `<label>.txt` stripped text for quick search.
+  - `<label>.png` rendered xterm screenshot.
+  - `<label>.jsonl.path` pointing to the latest pi session JSONL.
+- Kill the PTY child after capture.
+- Check for leftover commands when prompts can background work, especially shell timeout tests.
+Example invocation shape:
+```bash
+node /tmp/pi-visual-harness/run-pi-visual.mjs \
+  --label after-shell-nonzero \
+  --ext /path/to/pi-cursor-sdk \
+  --cwd /path/to/test-workspace \
+  --prompt "Run \`printf 'cursor-shell-stderr\\n' >&2; exit 7\` using only the shell/terminal tool. Do not use read, grep, glob, find, ls, edit, or write. Print the command result exactly, then stop." \
+  --wait-ms 30000 \
+  --out-dir /tmp/pi-visual-harness/review-current
+```
+Keep the runner in `/tmp` unless the project explicitly decides to check in a maintained audit harness.
+## Before/after comparison
+Use a clean worktree for the baseline and the active worktree for the candidate change:
+```bash
+BASE=/tmp/pi-cursor-visual-review
+BEFORE_WT=$BASE/before-main
+AFTER_WT=/path/to/pi-cursor-sdk
+TARGET=/path/to/test-workspace
+rm -rf "$BASE"
+git fetch origin main
+BASE_COMMIT=$(git merge-base origin/main HEAD)
+git worktree add --detach "$BEFORE_WT" "$BASE_COMMIT"
+# Optional speedup when the before worktree has no install of its own.
+ln -s "$AFTER_WT/node_modules" "$BEFORE_WT/node_modules"
+```
+Then run the same prompt against both extension dirs:
+```bash
+node /tmp/pi-visual-harness/run-pi-visual.mjs \
+  --label before-glob-single \
+  --ext "$BEFORE_WT" \
+  --cwd "$TARGET" \
+  --prompt "Find files matching \`src/tools/reindex.ts\` using only the glob/file-search tool. Do not use shell, bash, grep, read, or ls. Print the matched files exactly as found, then stop." \
+  --wait-ms 16000 \
+  --out-dir /tmp/pi-visual-harness/review-current
+node /tmp/pi-visual-harness/run-pi-visual.mjs \
+  --label after-glob-single \
+  --ext "$AFTER_WT" \
+  --cwd "$TARGET" \
+  --prompt "Find files matching \`src/tools/reindex.ts\` using only the glob/file-search tool. Do not use shell, bash, grep, read, or ls. Print the matched files exactly as found, then stop." \
+  --wait-ms 16000 \
+  --out-dir /tmp/pi-visual-harness/review-current
+```
+For review, create a simple HTML/PNG gallery that places `before-*.png` and `after-*.png` side by side. Keep the generated gallery in `/tmp` unless explicitly asked to commit visual artifacts.
+## JSONL inspection
+For each visual claim, inspect the JSONL path written by the runner. Confirm at least:
+- `toolCall.name` is the expected pi-facing replay tool name.
+- `toolCall.arguments` show the expected user-facing args.
+- `toolResult.toolName` matches the call.
+- `toolResult.content[0].text` contains the recorded body expected in the card.
+- `toolResult.isError` matches the visual card state.
+For local pi MCP bridge claims, also confirm:
+- Bridged calls appear as the real pi tool name (for example `sem_reindex`), not the MCP bridge name (for example `pi__sem_reindex`; or `read`/`pi__read` when overlapping built-ins are explicitly exposed).
+- The JSONL has no second Cursor MCP replay card for the same bridged call.
+- Non-bridge Cursor MCP activity, if present, still renders as neutral Cursor activity instead of being suppressed.
+Small helper pattern:
+```bash
+python3 - <<'PY'
+import json, pathlib
+path = pathlib.Path('/tmp/pi-visual-harness/review-current/after-shell-nonzero.jsonl.path').read_text().strip()
+for line in pathlib.Path(path).read_text().splitlines():
+    obj = json.loads(line)
+    msg = obj.get('message', {})
+    if msg.get('role') == 'assistant':
+        for part in msg.get('content', []):
+            if part.get('type') == 'toolCall':
+                print('CALL', part.get('name'), part.get('arguments'))
+    if msg.get('role') == 'toolResult':
+        text = msg.get('content', [{}])[0].get('text', '')
+        print('RESULT', msg.get('toolName'), 'isError=', msg.get('isError'), repr(text[:160]))
+PY
+```
+## Safety rules
+- Prefer the offscreen PTY renderer. Do not use `osascript`, visible Terminal windows, or `screencapture` unless a user explicitly asks for a real desktop screenshot.
+- Keep generated screenshots, HTML galleries, ANSI logs, and temporary harness dependencies out of the repo by default.
+- Use short, deterministic prompts with bounded wait times.
+- For timeout/background prompts, always check for leftovers:
+```bash
+ps -axo pid,etime,command | rg "sleep 2|should-not-print|<audit-session-label>" || true
+```
+- If the model uses a different tool than requested, record it as model/provider behavior unless JSONL shows replay lost or misrendered a completed Cursor tool event.
+- Visual output can differ slightly from macOS Terminal fonts because xterm.js renders offscreen. Treat this workflow as evidence for card class, color state, labels, ordering, truncation, and content. Use a real terminal screenshot only for pixel-level terminal-specific bugs.
+## Required evidence before commit or merge
+Before accepting a replay-card change, provide:
+- Before and after PNG paths.
+- The prompt used for each pair.
+- JSONL paths for each run.
+- A short statement of what changed visually.
+- The relevant JSONL `toolCall` / `toolResult` facts.
+- `npm test` and `npm run typecheck` results, unless the change is documentation-only.
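
Reviewer note: the JSONL checks listed in the added audit doc (call and result names agree, bridged calls show the real pi tool name rather than the `pi__` bridge name) could be automated roughly as follows. This is a minimal sketch, not part of the package. It assumes each JSONL line wraps a `message` object as in the doc's own helper, and that tool calls carry an `id` that results reference through `toolCallId`, as the field names in `context.ts` suggest. `audit_session_lines` and `BRIDGE_PREFIX` are hypothetical names introduced here.

```python
import json

# Assumption: bridge names use the "pi__" prefix, as in the doc's `pi__sem_reindex` example.
BRIDGE_PREFIX = "pi__"


def audit_session_lines(lines):
    """Return a list of human-readable problems found in session JSONL lines."""
    calls = {}  # toolCall id -> toolCall name
    problems = []
    for raw in lines:
        if not raw.strip():
            continue
        msg = json.loads(raw).get("message", {})
        role = msg.get("role")
        if role == "assistant":
            for part in msg.get("content", []):
                # Skip anything that is not a structured toolCall part.
                if not isinstance(part, dict) or part.get("type") != "toolCall":
                    continue
                name = part.get("name", "")
                calls[part.get("id")] = name
                if name.startswith(BRIDGE_PREFIX):
                    problems.append(f"bridge name leaked into toolCall: {name}")
        elif role == "toolResult":
            call_id = msg.get("toolCallId")
            expected = calls.get(call_id)
            if expected is None:
                problems.append(f"toolResult without matching toolCall: {call_id}")
            elif msg.get("toolName") != expected:
                problems.append(
                    f"toolResult.toolName {msg.get('toolName')!r} "
                    f"does not match toolCall.name {expected!r}"
                )
    return problems
```

An empty return value means the checked invariants held for that session. The visual comparison of the before/after PNGs is still needed, since this only covers the persisted data.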

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
 	"name": "pi-cursor-sdk",
-	"version": "0.1.13",
+	"version": "0.1.15",
 	"description": "pi provider extension backed by @cursor/sdk local agents",
 	"author": "Mitch Fultz (https://github.com/fitchmultz)",
 	"license": "MIT",
@@ -26,6 +26,8 @@
 		"scripts/refresh-cursor-model-snapshots.mjs",
 		"README.md",
 		"docs/cursor-model-ux-spec.md",
+		"docs/cursor-native-tool-replay.md",
+		"docs/cursor-native-tool-visual-audit.md",
 		"LICENSE",
 		"CHANGELOG.md"
 	],
@@ -40,7 +42,8 @@
 		"refresh:cursor-snapshots": "node scripts/refresh-cursor-model-snapshots.mjs"
 	},
 	"dependencies": {
-		"@cursor/sdk": "^1.0.13"
+		"@cursor/sdk": "^1.0.13",
+		"@modelcontextprotocol/sdk": "^1.29.0"
 	},
 	"peerDependencies": {
 		"@earendil-works/pi-ai": "*",

package/src/bundled-context-windows.ts CHANGED

@@ -1,14 +1,16 @@
-// Generated from Cursor SDK checkpoint tokenDetails.maxTokens on 2026-05-04.
+// Generated from Cursor SDK checkpoint tokenDetails.maxTokens on 2026-05-18.
+// Refresh with: npm run refresh:cursor-snapshots -- --write --context-windows ~/.pi/agent/cursor-sdk-context-windows.json
 // These are default/non-Max-mode SDK context windows for Cursor models that do not
 // expose a catalog `context` parameter. Do not replace them with Max Mode values
 // unless the Cursor SDK exposes an exact Max Mode model selection and the extension
 // uses that selection for matching pi model IDs.
 export const BUNDLED_CONTEXT_WINDOWS = {
+	"default": 200000,
 	"claude-haiku-4-5": 200000,
 	"claude-opus-4-5": 200000,
 	"composer-1.5": 200000,
 	"composer-2": 200000,
-	default: 200000,
+	"composer-2.5": 200000,
 	"gemini-2.5-flash": 200000,
 	"gemini-3-flash": 200000,
 	"gemini-3.1-pro": 200000,
@@ -22,6 +24,7 @@ export const BUNDLED_CONTEXT_WINDOWS = {
 	"gpt-5.3-codex-spark": 128000,
 	"gpt-5.4-mini": 272000,
 	"gpt-5.4-nano": 272000,
+	"gpt-5.5@272k": 272000,
 	"grok-4-20": 200000,
 	"kimi-k2.5": 262000,
 } as const satisfies Record<string, number>;

package/src/context.ts CHANGED

@@ -1,5 +1,6 @@
 import type { Context, Message, ToolCall } from "@earendil-works/pi-ai";
 import type { SDKImage } from "@cursor/sdk";
+import { getCursorReplayPromptLabel } from "./cursor-tool-names.js";
 export interface CursorPrompt {
 	text: string;
@@ -58,8 +59,26 @@ function formatContentBlocks(content: string | { type: string; text?: string; da
 }
 function formatToolCall(toolCall: ToolCall): string {
-	const args = JSON.stringify(toolCall.arguments);
-	return `Tool call (${toolCall.name}, call ${toolCall.id}): ${args}`;
+	const args = JSON.stringify(toolCall.arguments) ?? "";
+	return `Tool call (${getCursorReplayPromptLabel(toolCall.name)}, call ${toolCall.id}): ${args}`;
+}
+function sanitizeSystemPromptForCursor(systemPrompt: string): string {
+	let sanitized = systemPrompt;
+	sanitized = sanitized.replace(
+		/Available tools:\n[\s\S]*?\n\nIn addition to the tools above, you may have access to other custom tools depending on the project\.\n\n/g,
+		"Pi tool catalog omitted: Cursor can call only Cursor SDK tools exposed in this run.\n\n",
+	);
+	sanitized = sanitized.replace(
+		/Guidelines:\n[\s\S]*?\n\nPi documentation /g,
+		"Guidelines:\n- Be concise in your responses.\n- Show file paths clearly when working with files.\n\nPi documentation ",
+	);
+	sanitized = sanitized.replace(
+		/\n\nThe following skills provide specialized instructions for specific tasks\.[\s\S]*?<\/available_skills>/g,
+		"",
+	);
+	sanitized = sanitized.replace(/\n+Semantic code intelligence priority:[\s\S]*$/g, "");
+	return sanitized.trim();
 }
 function formatMessage(msg: Message): string | undefined {
@@ -84,7 +103,7 @@ function formatMessage(msg: Message): string | undefined {
 		case "toolResult": {
 			const text = formatContentBlocks(msg.content);
 			const label = msg.isError ? "Tool error" : "Tool result";
-			return `${label} (${msg.toolName}, call ${msg.toolCallId}): ${text}`;
+			return `${label} (${getCursorReplayPromptLabel(msg.toolName)}, call ${msg.toolCallId}): ${text}`;
 		}
 	}
 }
@@ -152,15 +171,17 @@ export function buildCursorPrompt(context: Context, options: CursorPromptOptions
 	const sectionsBeforeMessages: string[] = [
 		[
 			"Cursor SDK tool boundary:",
-			"Only tools exposed by the Cursor SDK in this run are callable. The pi system prompt and transcript are context only; they do not grant access to pi tools or tool names mentioned there.",
-			"If the user asks you to search, fetch, browse, or research the web, use an actual Cursor SDK web/search/browser/MCP tool call. If no such Cursor SDK tool is available, say that web search is not configured for this Cursor SDK run.",
-			"Do not plan to use or claim to have used pi-only tools such as WebSearch or WebFetch unless the Cursor SDK actually exposes and executes that tool in this run.",
-			"Image payload boundary: only images attached to the latest user message are available as image bytes. Earlier images appear only as [image omitted from transcript] placeholders; ask the user to reattach or describe a prior image if the latest request depends on it.",
+			"You can call only tools actually exposed by Cursor SDK in this run. Pi tool names, replay tool names, and transcript tool names are context only, not callable capabilities.",
+			"If asked to list or exercise available tools, list and exercise Cursor SDK tools only; do not claim access to pi-side tools from the system prompt unless Cursor exposes an equivalent tool that runs.",
+			"Use pi__cursor_ask_question for material choices if exposed.",
+			"Web: use Cursor web/search/browser/MCP or say web search is not configured; do not claim WebSearch/WebFetch unless Cursor executes them.",
+			"Replay: pi may display recorded Cursor tool activity as pi-style cards, but replay is display-only and not a capability to invoke.",
+			"Images: only latest user images are sent; ask to reattach or describe prior images.",
 		].join("\n"),
 	];
 	if (context.systemPrompt) {
-		sectionsBeforeMessages.push(`System instructions from pi:\n${context.systemPrompt}`);
+		sectionsBeforeMessages.push(`System instructions from pi:\n${sanitizeSystemPromptForCursor(context.systemPrompt)}`);
 	}
 	const messageSections = context.messages
@@ -171,8 +192,8 @@ export function buildCursorPrompt(context: Context, options: CursorPromptOptions
 		.filter((section): section is { index: number; text: string } => section !== undefined);
 	const sectionsAfterMessages = [
 		[
-			"Answer the latest user request above using your capabilities. Do not assume access to pi tools.",
-			"If the user asks for web research, do not claim to have searched the web unless a Cursor SDK web/search/browser/MCP tool was actually used.",
+			"Answer the latest user request above using Cursor SDK capabilities only. Do not list, promise, or call pi-only tools from the system prompt as if they were available.",
+			"If web research is requested, do not claim it unless a Cursor web/search/browser/MCP tool ran.",
 		].join("\n"),
 	];
 	const images = extractLatestImages(context.messages);
@@ -188,6 +209,8 @@ export function buildCursorPrompt(context: Context, options: CursorPromptOptions
 		getLatestUserMessageIndex(context.messages),
 		budgetOptions,
 	);
+	const text = parts.join(SECTION_SEPARATOR);
-	return { text: parts.join(SECTION_SEPARATOR), images };
+	return { text, images };
 }
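
Reviewer note: to see what the new `sanitizeSystemPromptForCursor` does to a prompt, the four replacements can be mirrored in Python. This is a sketch under stated assumptions, not the package's code. The patterns and replacement strings are copied from the hunk above, the function name `sanitize_system_prompt_for_cursor` is a hypothetical port, and Python's `str.strip()` stands in for JavaScript's `trim()`, which may differ for unusual whitespace characters.

```python
import re


def sanitize_system_prompt_for_cursor(system_prompt: str) -> str:
    """Python mirror of the four regex replacements shown in the context.ts hunk."""
    sanitized = system_prompt
    # 1. Replace the pi tool catalog with a one-line notice.
    sanitized = re.sub(
        r"Available tools:\n[\s\S]*?\n\nIn addition to the tools above, you may have "
        r"access to other custom tools depending on the project\.\n\n",
        "Pi tool catalog omitted: Cursor can call only Cursor SDK tools exposed in this run.\n\n",
        sanitized,
    )
    # 2. Collapse the pi guidelines to two generic bullets.
    sanitized = re.sub(
        r"Guidelines:\n[\s\S]*?\n\nPi documentation ",
        "Guidelines:\n- Be concise in your responses.\n"
        "- Show file paths clearly when working with files.\n\nPi documentation ",
        sanitized,
    )
    # 3. Drop the skills listing through its closing tag.
    sanitized = re.sub(
        r"\n\nThe following skills provide specialized instructions for specific tasks\."
        r"[\s\S]*?</available_skills>",
        "",
        sanitized,
    )
    # 4. Drop everything from the semantic-priority section to the end of the prompt.
    sanitized = re.sub(r"\n+Semantic code intelligence priority:[\s\S]*\Z", "", sanitized)
    return sanitized.strip()
```

The first three patterns use lazy matches, so each stops at the first occurrence of its closing marker. The last one is greedy and anchored at the end, so any text that follows the "Semantic code intelligence priority:" heading is removed as well.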