pi-cursor-sdk 0.1.14 → 0.1.16

Files changed (23)

package/CHANGELOG.md +57 -0
package/README.md +68 -14
package/docs/cursor-live-smoke-checklist.md +271 -0
package/docs/cursor-model-ux-spec.md +27 -4
package/docs/cursor-native-tool-replay.md +99 -0
package/docs/cursor-native-tool-visual-audit.md +183 -0
package/package.json +6 -2
package/src/context.ts +214 -16
package/src/cursor-bridge-contract.ts +27 -0
package/src/cursor-live-run-accounting.ts +65 -0
package/src/cursor-mcp-timeout-override.ts +111 -0
package/src/cursor-native-tool-display.ts +409 -49
package/src/cursor-pi-tool-bridge.ts +1174 -0
package/src/cursor-provider.ts +614 -146
package/src/cursor-question-tool.ts +252 -0
package/src/cursor-session-agent.ts +372 -0
package/src/cursor-session-cwd.ts +28 -0
package/src/cursor-session-scope.ts +65 -0
package/src/cursor-state.ts +38 -10
package/src/cursor-tool-names.ts +67 -0
package/src/cursor-tool-transcript.ts +730 -61
package/src/cursor-usage-accounting.ts +71 -0
package/src/index.ts +27 -3

package/CHANGELOG.md CHANGED

@@ -1,5 +1,62 @@
 # Changelog
+## Unreleased
+## 0.1.16 - 2026-05-22
+### Added
+- Reuse Cursor SDK agents within the same pi session when model, API key, cwd, bridge surface, and pi context remain compatible, sending incremental follow-up prompts instead of re-bootstrapping full history on every turn.
+- Add context fingerprinting to choose bootstrap vs incremental `Agent.send()` prompts, including branch and compaction summary detection after `/tree` navigation and session compaction.
+- Add a manual [Cursor live smoke checklist](docs/cursor-live-smoke-checklist.md) for release validation with real `pi -e . --cursor-no-fast --model cursor/composer-2.5` runs, diagnostics safety scans, TUI observation, bridge/replay checks, abort/cancel coverage, and an assume-everything-is-in-scope no-optional/no-deferred release rule.
+- Share the Cursor pi bridge contract through provider prompts and bridged MCP tool descriptions via `src/cursor-bridge-contract.ts`.
+- Isolate Cursor usage and live-run accounting in `src/cursor-usage-accounting.ts` and `src/cursor-live-run-accounting.ts`.
+### Changed
+- Clarify the Cursor provider tool contract in README and replay docs: separate Cursor-native surface, pi bridge surface, and display-only replay.
+- Document bridge debug diagnostics (`PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1`) and the scrubbed JSONL allowlist behavior.
+- Refresh Cursor fast footer status on `turn_start` and treat models with the `cursor-sdk` API as Cursor models for status updates.
+### Fixed
+- Harden Cursor pi tool bridge diagnostics so debug JSONL uses run-safe IDs separate from tokenized loopback routes and an allowlisted serializer that omits endpoint path material, raw args/results, and secrets.
+- Improve Cursor SDK token accounting for `/session` and compaction by keeping raw Cursor internal usage diagnostic-only, counting split-run tool-call activity/tool-result consumption in approximate pi session usage, using `usage.totalTokens` for the replayable Cursor prompt/context estimate, and sharing the same matched tool-result boundary between provider usage and bridge result resolution.
+- Fix duplicated final assistant text when Cursor streams partial post-tool text that prefixes the eventual final answer.
+- Preserve the latest user request in budgeted incremental Cursor session-agent prompts.
+- Invalidate and recreate session agents on compaction, API key changes, send errors, session shutdown, and `/tree` navigation so reused agents stay aligned with the active branch.
+- Treat `/reload` session shutdown as non-terminal for the session-agent pool so the same session can acquire a fresh Cursor SDK agent after reload.
+- Bootstrap prompts now include branch summaries after `/tree` navigation.
+- Harden Cursor pi tool bridge validation and contract boundaries.
+## 0.1.15 - 2026-05-21
+### Added
+- Add the default-on local pi MCP tool bridge, which exposes bridgeable active pi tools to local Cursor agents while executing calls through pi's normal tool path.
+- Add `cursor_ask_question` through the bridge so Cursor can ask users through pi UI as `pi__cursor_ask_question`.
+- Add `PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1` for opting in to overlapping built-in pi tools that are hidden from the Cursor bridge by default.
+- Add Cursor SDK MCP tool-call timeout overrides via `PI_CURSOR_MCP_TOOL_TIMEOUT_SECONDS` and `PI_CURSOR_MCP_TOOL_TIMEOUT_MS` for long-running local MCP tools, including bridged pi tools.
+- Replay Cursor SDK `grep` activity through native pi `grep` cards and `glob` activity through native pi `find` cards, so search activity matches built-in tool UX in interactive TTY sessions.
+### Changed
+- Load Cursor setting sources with `PI_CURSOR_SETTING_SOURCES=all` by default while filtering direct Cursor SDK startup logs so settings, rules, plugins, and configured Cursor MCP servers are available without corrupting pi's TUI.
+### Fixed
+- Replay recorded Cursor tool errors, including nonzero shell exits and timeout-backgrounded shell commands, as native pi tool errors instead of successful green cards.
+- Format zero-match Cursor grep results as `(no matches)` instead of raw `{ "totalMatches": 0 }` JSON in native replay and transcript output.
+- Strip trailing colons from Cursor grep file-list replay output.
+- Make native Cursor read replay closer to pi's built-in read cards by displaying session-relative paths and 20-line continuation hints.
+- Convert Cursor SDK shell timeouts from milliseconds to seconds in native bash replay cards instead of rendering `30000ms` as `30000s`.
+- Use the pi session cwd for Cursor `Agent.create`, not only native tool replay display. Completes the 0.1.10 cwd work that previously updated replay registration but left the Cursor agent runtime on `process.cwd()`.
+- Replay path-only Cursor `write` activity through neutral recorded Cursor activity instead of invalid native pi `write` calls.
+- Preserve literal `cursor_edit`, `cursor_write`, and `cursor_mcp` text in user messages, assistant text, tool args, and tool results while still relabeling structured replay tool names.
+- Avoid hiding unrelated MCP activity whose result payload merely contains a bridge tool name, while still suppressing real bridge-owned Cursor MCP replay by invocation identity and call ID.
+- Clean up pending native replay waits when abort signals are already aborted or abort before listener registration.
+- Suppress direct Cursor SDK settings/skills startup noise, including late `managed_skills.removed` lines, without swallowing unrelated non-startup stdout/stderr output.
 ## 0.1.14 - 2026-05-18
 ### Changed
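
The 0.1.16 entry about duplicated final assistant text describes a case where streamed post-tool text is a prefix of the eventual final answer. A minimal sketch of one way to avoid emitting that prefix twice follows; the function name `remainingFinalText` and the exact-prefix comparison are assumptions for illustration, not the package's actual implementation.

```javascript
// Hypothetical helper: given text already streamed after a tool call and the
// final answer, return only the part of the final answer not yet shown.
function remainingFinalText(streamedPartial, finalText) {
  // Assumption: an exact string prefix is what counts as "already streamed".
  if (streamedPartial.length > 0 && finalText.startsWith(streamedPartial)) {
    return finalText.slice(streamedPartial.length);
  }
  // No overlap detected, so the whole final answer still needs to be emitted.
  return finalText;
}
```

With this shape, a partial of `The answer` and a final of `The answer is 42.` would leave only the suffix to emit, while unrelated partial text leaves the final answer untouched.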

package/README.md CHANGED

@@ -165,7 +165,7 @@ For Claude models with both `thinking` and `effort`, pi thinking `off` sends `th
 In `pi --list-models`, `thinking=no` means pi cannot control the model's thinking level with `--thinking`, a final `:medium` model suffix, or shift+tab. It does not mean the Cursor model cannot think.
-Some Cursor SDK models do not expose a `reasoning`, `effort`, or `thinking` parameter for the extension to set. Cursor thinking is still enabled/supported by the model, and Cursor may still emit thinking deltas. The extension does not disable Cursor's default reasoning behavior.
+Some Cursor SDK models do not expose a `reasoning`, `effort`, or `thinking` parameter for the extension to set. Cursor thinking is still enabled/supported by the model, and Cursor may still emit thinking deltas. The extension surfaces those deltas through pi's native thinking rendering when the SDK emits them.
 ## Fast mode
@@ -197,6 +197,43 @@ If you do not see `cursor fast`, fast mode is off.
 Images from the latest user message are forwarded to Cursor. Historical images are kept out of the transcript and appear only as `[image omitted from transcript]` placeholders, so follow-up questions about an earlier image should reattach the image or include a textual description. The extension advertises `text` and `image` input for Cursor models because Cursor's SDK accepts image messages and Cursor models are expected to support them.
+## Cursor provider tool contract
+Cursor runs use local Cursor SDK agents with two separate tool surfaces:
+- **Cursor-native surface:** Cursor local-agent tools, Cursor settings, plugins, and configured Cursor MCP servers. These remain owned by the Cursor SDK local agent path.
+- **pi bridge surface:** pi-cursor-sdk exposes bridgeable active pi tools through a per-run local loopback MCP bridge when the bridge is enabled and the current pi tool registry has exposed tools.
+Bridge capabilities are snapshotted from `pi.getActiveTools()` and `pi.getAllTools()` for each Cursor run. Cursor sees active bridgeable pi tools as collision-safe MCP names such as `pi__sem_reindex` only when they are exposed in that current run. Pi session output, tool cards, confirmations, hooks, renderers, history, and abort behavior use the real pi tool name, such as `sem_reindex`. The bridge queues Cursor's MCP call, emits a normal pi `toolCall`, waits for the matching pi `toolResult`, and resolves that result back into the same live Cursor SDK run without creating a new `Agent`, unless the run was disposed, aborted, or cancelled. The bridge does not call pi tool `execute()` handlers directly.
+Overlapping built-in pi tools (`read`, `bash`, `write`, `edit`, `grep`, `find`, `ls`) are hidden by default because Cursor local agents already have native equivalents. Extension/custom tools and non-overlapping active tools present in pi's active tool registry normally remain exposed. The bridge also exposes `cursor_ask_question` as `pi__cursor_ask_question` when enabled, allowing Cursor to ask the user through pi UI instead of silently choosing a default.
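
The naming and hiding rules in the two paragraphs above can be sketched roughly as follows. The `pi__` prefix is inferred from the README's examples (`pi__sem_reindex`, `pi__cursor_ask_question`); the function names and the options object are hypothetical and are not taken from `src/cursor-tool-names.ts`.

```javascript
// Built-in pi tools that overlap with Cursor-native tools, per the README.
const OVERLAPPING_BUILTINS = new Set(['read', 'bash', 'write', 'edit', 'grep', 'find', 'ls']);
// Assumption: collision-safe MCP names are the pi name with this prefix.
const MCP_PREFIX = 'pi__';

// Hypothetical: compute the { piName, mcpName } pairs exposed for one run.
function exposedToolPairs(activeToolNames, { exposeBuiltins = false } = {}) {
  return activeToolNames
    .filter((name) => exposeBuiltins || !OVERLAPPING_BUILTINS.has(name))
    .map((name) => ({ piName: name, mcpName: MCP_PREFIX + name }));
}

// Hypothetical: map an incoming MCP name back to the real pi tool name,
// or null when the name does not belong to the bridge.
function toPiToolName(mcpName) {
  return mcpName.startsWith(MCP_PREFIX) ? mcpName.slice(MCP_PREFIX.length) : null;
}
```

Keeping the mapping in one place is what lets pi tool cards, hooks, and history use the real name (`sem_reindex`) while Cursor only ever sees the prefixed one.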
+Cursor-native tool replay is separate from the bridge. Replay cards are display-only recorded Cursor SDK activity. They never re-run Cursor-side commands, reapply Cursor edits, call MCP servers, or mutate pi state. See [Cursor native tool replay](docs/cursor-native-tool-replay.md).
+Bridge controls:
+```bash
+# Roll back to Cursor SDK tools/settings/MCP only; do not expose active pi tools through the bridge.
+PI_CURSOR_PI_TOOL_BRIDGE=0 pi --model cursor/composer-2.5
+# Opt in to also expose overlapping pi tool names through the bridge.
+PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1 pi --model cursor/composer-2.5
+# Override Cursor SDK MCP tool-call timeout, including bridged pi tools and configured Cursor MCP servers.
+PI_CURSOR_MCP_TOOL_TIMEOUT_SECONDS=7200 pi --model cursor/composer-2.5
+PI_CURSOR_MCP_TOOL_TIMEOUT_MS=7200000 pi --model cursor/composer-2.5
+# Emit scrubbed bridge diagnostics as JSONL to stderr with prefix [pi-cursor-sdk:bridge].
+PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1 pi --model cursor/composer-2.5
+```
+`PI_CURSOR_PI_TOOL_BRIDGE=0` is the supported rollback flag and disables the bridge entirely. The bridge also treats `false`, `off`, `none`, `no`, and `disabled` as off; `1`, `true`, `on`, `yes`, and `enabled` as on. `PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1` opts in to exposing overlapping pi tool names that Cursor already has native equivalents for. The Cursor MCP timeout override defaults to 3600 seconds because the installed Cursor SDK has a 60-second MCP request default that is too short for some local MCP tools, including bridged pi tools and configured Cursor MCP servers. `PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1` is off by default and emits typed, allowlisted, scrubbed single-line JSONL records to `process.stderr`. These records are operational diagnostics, not anonymous telemetry: they intentionally include tool names, safe correlation IDs, bridge run state, exposed pi↔MCP name pairs, queued requests, result resolution, rejection, cancellation, and pending counts. They must not include endpoint URLs, endpoint path components, endpoint tokens, raw args/results, stdout/stderr payloads, file contents, Cursor settings output, API keys, bearer tokens, cookies, session credentials, or secrets. Do not enable or share bridge debug logs where tool names themselves are sensitive.
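
The on/off values listed above for `PI_CURSOR_PI_TOOL_BRIDGE` suggest a small flag parser along these lines. This is a sketch only: the name `parseBridgeFlag`, the case-insensitive matching, the whitespace trimming, and the fallback to the default for unrecognized values are all assumptions rather than documented behavior.

```javascript
// Values the README lists as "off" and "on" for the bridge flag.
const OFF_VALUES = new Set(['0', 'false', 'off', 'none', 'no', 'disabled']);
const ON_VALUES = new Set(['1', 'true', 'on', 'yes', 'enabled']);

// Hypothetical parser. The bridge is described as default-on, so an unset
// variable resolves to true here.
function parseBridgeFlag(raw, defaultValue = true) {
  if (raw === undefined) return defaultValue;
  // Assumption: matching ignores case and surrounding whitespace.
  const value = String(raw).trim().toLowerCase();
  if (OFF_VALUES.has(value)) return false;
  if (ON_VALUES.has(value)) return true;
  // Assumption: unrecognized values keep the default instead of failing.
  return defaultValue;
}
```

For example, `parseBridgeFlag(process.env.PI_CURSOR_PI_TOOL_BRIDGE)` would return `false` for `0` or `off` and `true` when the variable is unset.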
+### Maintainer live smoke release gate
+For Cursor provider/runtime changes, follow the manual [Cursor live smoke checklist](docs/cursor-live-smoke-checklist.md) before release. Assume every runtime surface is in scope. The checklist uses real `pi -e . --cursor-no-fast --model cursor/composer-2.5` runs with temporary session dirs and requires the visible TUI/output, scrubbed diagnostics, and persisted JSONL to agree. Do not mark a release ready with optional, deferred, mostly-passing, or unobserved smoke checks outstanding.
 ## Fallback models
 If no key is available from `/login`, `CURSOR_API_KEY`, or `--api-key`, model discovery fails, or discovery returns no models, the extension registers a bundled fallback snapshot of the latest reviewed Cursor SDK model catalog and notifies interactive users when possible.
@@ -207,14 +244,14 @@ Actual Cursor runs still need a key from `/login`, `CURSOR_API_KEY`, or `--api-k
 ## Limits
-- **Local Cursor SDK agents only.** This extension does not use Cursor cloud agents.
-- **Cursor-side tool use is not re-executed by pi.** Cursor still uses its own internal SDK tools. The extension records completed Cursor tool activity and, in interactive TTY sessions, replays supported `read`, `bash`, `ls`, `edit`, and `write` activity through pi's native tool-call path with recorded results (for example green `read`, `$ ...`, and Cursor edit/write cards) without forcing Cursor to call pi tools or rerun commands. Cursor edit/write activity uses replay-only `cursor_edit` and `cursor_write` tool cards because Cursor's file-editing schema is not the same as pi's built-in `edit`/`write` schemas; those replay tools only display recorded Cursor results and never mutate files directly. If a Cursor read completion reports no content, the extension may include a bounded local file preview for safe in-workspace paths; that preview is explicitly labeled as a local preview captured at transcript time, not guaranteed Cursor-observed content. Native replay wrappers are registered only for tool names not already owned by another extension; skipped tools fall back to the scrubbed Cursor activity transcript. As Cursor SDK tool completions arrive, the extension mirrors native Codex ordering by ending a tool-use turn, letting pi render the recorded tool results immediately, then continuing with live post-tool Cursor thinking/text, any later Cursor tool batches, or Cursor's final answer as the next assistant turn. Non-interactive/session consumers still get bounded scrubbed transcript data so `pi -p` keeps printing normal assistant text.
-- **Pi tool schemas are not passed through to Cursor.** This extension is a Cursor provider, not a bridge that forwards pi's tool system into Cursor.
-- **One fresh Cursor agent is created per provider call.** Cursor agent state is not reused between pi provider calls.
-- **Cursor setting sources are opt-in.** The extension does not pass `local.settingSources` by default because the Cursor SDK can print settings/skills loading output directly to the terminal during startup. To load configured Cursor MCP servers, plugin tools, project/user settings, and related Cursor-native capabilities, start pi with `PI_CURSOR_SETTING_SOURCES=all`. To narrow loading, set a comma-separated list such as `PI_CURSOR_SETTING_SOURCES=project,user,plugins`.
+- **Local Cursor SDK agents only.** This extension does not use Cursor cloud agents. Cloud pi tool bridging is out of scope because it needs a separate auth, transport, lifetime, and remote trust design.
+- **The pi tool bridge is local and MCP-backed.** Bridgeable active pi tools are exposed to local Cursor agents through a tokenized `127.0.0.1` MCP endpoint; internal Cursor replay activity names are excluded, and overlapping built-in pi tools are hidden by default. Set `PI_CURSOR_PI_TOOL_BRIDGE=0` to disable it or `PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1` to expose overlapping built-ins too.
+- **Cursor native tool replay is display-only.** Replay renders recorded Cursor SDK activity and never re-runs Cursor-side commands, reapplies Cursor edits, calls MCP servers, or mutates pi state. Workflow tools such as Cursor `SwitchMode` and Cursor todo state are not pi workflow controls. See [Cursor native tool replay](docs/cursor-native-tool-replay.md) for supported replay cards, ordering, conflict handling, and opt-out flags.
+- **Cursor run state can span tool-use turns.** Within a pi session, the extension reuses one Cursor SDK agent across compatible follow-up turns and sends incremental prompts when context still matches. It recreates the agent when context diverges, after compaction or `/tree` navigation, on API key changes, after send errors, or on session shutdown. For bridged pi tools, the matching pi `toolResult` resolves into the same live Cursor SDK run without creating a new `Agent`, unless the run was disposed, aborted, or cancelled. Replay can also split one live Cursor SDK run across pi `toolUse` turns for display.
+- **Cursor setting sources default to all.** The extension passes `local.settingSources: ["all"]` by default so configured Cursor MCP servers, plugin tools, project/user settings, and related Cursor-native capabilities are available like they are in Cursor. To narrow loading, set a comma-separated list such as `PI_CURSOR_SETTING_SOURCES=project,user,plugins`. To disable ambient setting sources, set `PI_CURSOR_SETTING_SOURCES=none`. Direct Cursor SDK startup logs are suppressed so setting/skill loading messages do not pollute the TUI.
 - **Max Mode is not a manual pi variant.** Cursor's SDK may enable Max Mode automatically for models that require it. This extension only advertises exact context-window variants that the SDK catalog exposes and otherwise uses conservative SDK-derived default/non-Max context windows.
 - **Output token limits are conservative.** Cursor SDK model metadata does not currently expose output token limits directly.
-- **Token usage is approximate in pi.** Cursor SDK usage events include internal agent/tool/cache work, so the extension reports an approximate replayable pi prompt/output size for context display and compaction decisions.
+- **Token usage is approximate in pi.** Cursor SDK usage events include cumulative internal agent/tool/cache work, so raw Cursor SDK counters are not copied into pi usage. The extension reports approximate pi session activity in `input`/`output`, including split-run tool calls and consumed tool results, while `totalTokens` tracks the replayable Cursor prompt/context estimate used for context display and compaction.
 ## Troubleshooting
@@ -251,7 +288,7 @@ pi install npm:pi-cursor-sdk
 ### `pi --list-models` shows `thinking=no`
-That does not mean the model cannot think. It means the Cursor SDK does not expose a pi-controllable thinking parameter for that model. The model may still think internally and may still emit thinking deltas.
+That does not mean the model cannot think. It means the Cursor SDK does not expose a pi-controllable thinking parameter for that model. The model may still think internally and may still emit thinking deltas that pi renders natively.
 ### I do not see `cursor fast` in the footer
@@ -259,21 +296,38 @@ Fast mode is currently off. The footer only shows `cursor fast` when fast mode i
 ### My Cursor app settings or rules do not seem to apply
-Cursor setting sources are not loaded by default because the Cursor SDK can print settings/skills loading output directly to the terminal. Start pi with `PI_CURSOR_SETTING_SOURCES=all`, or choose a narrower list such as `PI_CURSOR_SETTING_SOURCES=project,user,plugins`.
+Cursor setting sources are loaded with `PI_CURSOR_SETTING_SOURCES=all` by default. To narrow loading, set `PI_CURSOR_SETTING_SOURCES=project,user,plugins` or another comma-separated list. If you explicitly disabled sources with `PI_CURSOR_SETTING_SOURCES=none`, remove that override.
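
The three documented forms of `PI_CURSOR_SETTING_SOURCES` (unset, a comma-separated list, and `none`) could be normalized as sketched below. The function name and the choice to represent `none` as `null` (meaning "pass no setting sources to the SDK") are assumptions; the README does not say how the extension represents the disabled case internally.

```javascript
// Hypothetical normalizer for PI_CURSOR_SETTING_SOURCES.
function parseSettingSources(raw) {
  // Unset or blank: the README says all sources load by default.
  if (raw === undefined || raw.trim() === '') return ['all'];
  const parts = raw.split(',').map((part) => part.trim()).filter(Boolean);
  // Assumption: "none" means no setting sources are passed at all.
  if (parts.length === 1 && parts[0] === 'none') return null;
  return parts;
}
```

Under these assumptions, `project,user,plugins` yields a three-element list and an unset variable yields `['all']`.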
 ### Cursor does not call my web search MCP/tool
-Cursor SDK local agents load MCP servers from Cursor setting sources and inline SDK config. This extension leaves Cursor setting sources off by default to avoid startup log noise, so a web search tool needs to be configured in Cursor and settings sources need to be enabled with `PI_CURSOR_SETTING_SOURCES=all` or a narrower list.
+Cursor SDK local agents load MCP servers from Cursor setting sources and inline SDK config. This extension enables all Cursor setting sources by default, so a missing web search tool usually means it is not configured in Cursor or the run was started with a narrowing/disable override such as `PI_CURSOR_SETTING_SOURCES=none`.
-### Cursor native tool cards conflict with another extension
+### Cursor does not call my pi extension tool
+The local pi bridge only exposes tools that are active in the current pi session and present in pi's tool registry at Cursor run start. By default, it does not expose overlapping pi tool names that Cursor already has native equivalents for (`read`, `bash`, `write`, `edit`, `grep`, `find`, and `ls`). Opt in if you intentionally want Cursor to see both the Cursor-native tool and an overlapping built-in pi tool:
+```bash
+PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1 pi --model cursor/composer-2.5
+```
-Cursor native replay is a UI enhancement for interactive TTY sessions. If another extension already owns `read`, `bash`, `ls`, `cursor_edit`, or `cursor_write`, this extension skips only the conflicting native replay wrapper and uses the scrubbed Cursor activity transcript for that tool instead. To disable Cursor native replay registration entirely, start pi with:
+To disable the bridge for rollback or isolation, start pi with:
 ```bash
-PI_CURSOR_NATIVE_TOOL_DISPLAY=0 pi --model cursor/composer-2.5
+PI_CURSOR_PI_TOOL_BRIDGE=0 pi --model cursor/composer-2.5
 ```
-`PI_CURSOR_REGISTER_NATIVE_TOOLS=0` is also accepted as a registration-only opt-out.
+### A Cursor MCP tool times out
+The extension raises Cursor SDK's MCP tool-call timeout from 60 seconds to 3600 seconds by default for Cursor SDK MCP `callTool` requests, including the local pi bridge and configured Cursor MCP servers. For longer local MCP tools, set one override:
+```bash
+PI_CURSOR_MCP_TOOL_TIMEOUT_SECONDS=7200 pi --model cursor/composer-2.5
+PI_CURSOR_MCP_TOOL_TIMEOUT_MS=7200000 pi --model cursor/composer-2.5
+```
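
As a rough illustration of how the two timeout variables and the 3600-second default might combine, consider the sketch below. The README only says to set one override, so the precedence shown here (milliseconds first, then seconds) and the fallback to the default on invalid input are assumptions, as are the function names.

```javascript
// Default documented in the README: 3600 seconds.
const DEFAULT_TIMEOUT_SECONDS = 3600;

// Parse a positive finite number from an env string, or return null.
function positiveNumber(raw) {
  if (raw === undefined || raw === '') return null;
  const n = Number(raw);
  return Number.isFinite(n) && n > 0 ? n : null;
}

// Hypothetical resolver returning the effective timeout in milliseconds.
function resolveMcpTimeoutMs(env) {
  // Assumption: the millisecond variable wins if both are set.
  const ms = positiveNumber(env.PI_CURSOR_MCP_TOOL_TIMEOUT_MS);
  if (ms !== null) return Math.round(ms);
  const seconds = positiveNumber(env.PI_CURSOR_MCP_TOOL_TIMEOUT_SECONDS);
  if (seconds !== null) return Math.round(seconds * 1000);
  // Assumption: invalid or missing values fall back to the default.
  return DEFAULT_TIMEOUT_SECONDS * 1000;
}
```

Called as `resolveMcpTimeoutMs(process.env)`, both example overrides above would resolve to the same two-hour timeout.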
+### Cursor native tool cards conflict with another extension
+Cursor native replay is a UI enhancement for interactive TTY sessions. See [Cursor native tool replay](docs/cursor-native-tool-replay.md) for conflict behavior and opt-out flags.
 ## Development

package/docs/cursor-live-smoke-checklist.md ADDED

@@ -0,0 +1,271 @@
+# Cursor Live Smoke Checklist
+## Purpose
+Use this manual checklist before releasing Cursor provider/runtime changes. Unit tests and mocks are necessary, but they are not enough for this extension. Always assume every runtime surface is in scope. A release is not ready until every live check below has been observed with `cursor/composer-2.5` through the local working tree.
+## Release rule
+- Run from a clean working tree except for the intended branch diff.
+- Use the local extension under test: `pi -e . --cursor-no-fast --model cursor/composer-2.5`.
+- Use a temporary `--session-dir` for every run.
+- Do not paste or commit Cursor API keys, raw session contents with secrets, endpoint URLs, or local private paths.
+- If a check fails, stop and fix or explicitly mark the release blocked. Do not ship with "optional," "deferred," "mostly," or "probably" checks outstanding.
+- Do not narrow the smoke scope to the apparent code diff. Treat provider reality, TUI behavior, bridge behavior, replay behavior, diagnostics safety, abort/cancel cleanup, usage accounting, packaging, and cleanup as in scope for every Cursor provider/runtime release.
+- A check is passed only when the visible TUI/output, stderr diagnostics, and persisted JSONL agree with the expected behavior.
+## Prerequisites
+```bash
+export SMOKE_DIR="/tmp/pi-cursor-sdk-live-smoke-$(date +%Y%m%dT%H%M%S)"
+mkdir -p "$SMOKE_DIR"
+pi -e . --list-models cursor
+```
+Pass criteria:
+- `cursor/composer-2.5` appears in the model list.
+- No Cursor key or auth token is printed.
+- If `CURSOR_API_KEY` is unavailable and `/login` is not configured, stop and report the live smoke as blocked.
+## 1. Basic provider reality check
+```bash
+PI_CURSOR_SETTING_SOURCES=none \
+pi -e . --cursor-no-fast --model cursor/composer-2.5 \
+  --session-dir "$SMOKE_DIR/basic" \
+  --no-tools \
+  -p 'Live smoke. Reply exactly: PI_CURSOR_SMOKE_OK' \
+  > "$SMOKE_DIR/basic.stdout.txt" \
+  2> "$SMOKE_DIR/basic.stderr.txt"
+```
+Pass criteria:
+- Exit code is `0`.
+- stdout contains `PI_CURSOR_SMOKE_OK`.
+- stderr is empty or contains only expected non-secret diagnostics for the specific test.
+- The persisted JSONL has exactly one assistant message with non-negative usage fields and `cacheRead/cacheWrite` equal to `0`.
+## 2. Default setting-source startup noise check
+```bash
+pi -e . --cursor-no-fast --model cursor/composer-2.5 \
+  --session-dir "$SMOKE_DIR/default-settings" \
+  --no-tools \
+  -p 'Default settings smoke. Include PRODUCT=42 in the final answer.' \
+  > "$SMOKE_DIR/default-settings.stdout.txt" \
+  2> "$SMOKE_DIR/default-settings.stderr.txt"
+```
+Pass criteria:
+- Exit code is `0`.
+- stdout includes `PRODUCT=42`.
+- stderr is empty.
+- No Cursor SDK settings/skills startup logs corrupt stdout or the TUI.
+## 3. TUI observation check
+Run a real interactive session under tmux:
+```bash
+SESSION="pi-cursor-sdk-smoke-$(date +%s)"
+tmux new-session -d -s "$SESSION" -x 120 -y 40 -- zsh -lc \
+  "cd '$PWD' && PI_CURSOR_SETTING_SOURCES=none pi -e . --cursor-no-fast --model cursor/composer-2.5 --session-dir '$SMOKE_DIR/tui' --no-tools 'TUI smoke. Compute 19 + 23. Reply only with SUM=<number>.'"
+```
+Observe with `tmux capture-pane -pt "$SESSION"` or attach manually.
+Pass criteria:
+- Footer shows `(cursor) composer-2.5`. With `--cursor-no-fast`, Cursor fast mode is off and the Cursor extension status should not show `cursor fast`; ignore unrelated status text from other extensions.
+- Assistant answer appears correctly.
+- `/session` shows one user and one assistant message for the simple run.
+- Persisted JSONL has one assistant message. If the screen appears duplicated, inspect JSONL before deciding whether it is a rendering bug.
+- Kill the tmux session after the check and verify no smoke tmux sessions remain.
+## 4. Bridge multi-tool success and failure
+```bash
+PI_CURSOR_SETTING_SOURCES=none \
+PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1 \
+PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1 \
+pi -e . --cursor-no-fast --model cursor/composer-2.5 \
+  --session-dir "$SMOKE_DIR/bridge" \
+  -p 'Bridge smoke. Do exactly two tool calls before answering: first call pi__read on ./package.json; second call pi__read on ./definitely-missing-pi-cursor-sdk-smoke-file.txt. Then answer: OK_NAME=<package name>; MISSING_RESULT=<error or success>. Do not use shell.' \
+  > "$SMOKE_DIR/bridge.stdout.txt" \
+  2> "$SMOKE_DIR/bridge.stderr.txt"
+```
+Pass criteria:
+- stdout includes `OK_NAME=pi-cursor-sdk`.
+- Diagnostics include `run_created`, `tools_exposed`, two `request_queued`, two `request_resolved`, and `run_disposed`.
+- The missing-file request has `isError: true`.
+- Persisted JSONL contains real pi tool calls named `read`, matching `toolResult` messages, and final assistant output.
+- Later assistant usage counts consumed tool-result input; no assistant usage has negative values or nonzero cache fields.
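
The diagnostics pass criteria for this check could be verified mechanically instead of by eye. The sketch below assumes each diagnostics line in `bridge.stderr.txt` contains the `[pi-cursor-sdk:bridge]` prefix followed by a JSON object with an `event` string field; that record shape and both function names are hypothetical, so adjust them to the real output before relying on this.

```javascript
const PREFIX = '[pi-cursor-sdk:bridge]';

// Count diagnostics events by name from captured stderr text.
function countBridgeEvents(stderrText) {
  const counts = {};
  for (const line of stderrText.split('\n')) {
    const at = line.indexOf(PREFIX);
    if (at === -1) continue; // not a bridge diagnostics line
    let record;
    try {
      record = JSON.parse(line.slice(at + PREFIX.length));
    } catch {
      continue; // prefix present but payload is not valid JSON
    }
    // Assumption: the event name lives in a string field called "event".
    if (record && typeof record.event === 'string') {
      counts[record.event] = (counts[record.event] || 0) + 1;
    }
  }
  return counts;
}

// Check the counts against the pass criteria listed for this smoke check.
function bridgeSmokePassed(counts) {
  return counts.run_created >= 1 &&
    counts.tools_exposed >= 1 &&
    counts.request_queued === 2 &&
    counts.request_resolved === 2 &&
    counts.run_disposed >= 1;
}
```

A run would read the file with `fs.readFileSync`, pass the text to `countBridgeEvents`, and fail the check when `bridgeSmokePassed` returns `false`.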
+## 5. Native replay cards without the pi bridge
+```bash
+PI_CURSOR_SETTING_SOURCES=none \
+PI_CURSOR_PI_TOOL_BRIDGE=0 \
+PI_CURSOR_NATIVE_TOOL_DISPLAY=1 \
+pi -e . --cursor-no-fast --model cursor/composer-2.5 \
+  --session-dir "$SMOKE_DIR/native-replay" \
+  -p 'Native replay smoke. Use your Cursor file-reading capability to read ./README.md, then answer README_SEEN=yes if it contains pi-cursor-sdk.' \
+  > "$SMOKE_DIR/native-replay.stdout.txt" \
+  2> "$SMOKE_DIR/native-replay.stderr.txt"
+```
+Pass criteria:
+- stdout includes `README_SEEN=yes`.
+- Persisted JSONL shows an assistant `toolUse` turn with a replayed `read` tool call, a pi `read` `toolResult`, and a final assistant turn.
+- Native replay is display-only: it must not re-run Cursor-side mutations or create duplicate pi mutations.
+## 6. Diagnostics safety contract
+Bridge diagnostics are scrubbed operational logs, not anonymous telemetry.
+Allowed fields:
+- event name
+- run-safe correlation IDs that are not endpoint path components
+- bridge/pi tool call IDs derived from the run-safe ID
+- hashed Cursor MCP call correlation IDs of the form `cursor-mcp-call-<8 hex chars>`
+- exposed pi/MCP tool name pairs
+- pending/queued/cancelled counts
+- success/error booleans
+- rejection kind
+Forbidden fields:
+- Cursor API keys or auth headers
+- bearer tokens, cookies, sessions, or raw credential material
+- endpoint URLs, endpoint path components, endpoint tokens, or loopback URLs
+- raw tool args
+- raw tool results
+- stdout/stderr payloads
+- file contents
+- Cursor settings/skills startup output
+- local private session paths in tracked docs
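
The allowed and forbidden lists above amount to an allowlisted serializer: only named fields are copied out, so anything not listed is dropped by construction. The sketch below illustrates that idea under stated assumptions; the field names in `ALLOWED_FIELDS` and the function name are hypothetical and do not reflect the actual serializer in `src/cursor-pi-tool-bridge.ts`.

```javascript
// Hypothetical allowlist loosely mirroring the "Allowed fields" list.
const ALLOWED_FIELDS = [
  'event', 'runId', 'toolCallId', 'piToolName', 'mcpToolName',
  'pendingCount', 'isError', 'rejectionKind',
];

// Serialize one diagnostics record as a single JSON line, copying only
// allowlisted primitive fields. Raw args, results, URLs, and any other
// unlisted keys are never read, so they cannot leak into the output.
function serializeDiagnostic(record) {
  const safe = {};
  for (const key of ALLOWED_FIELDS) {
    const value = record[key];
    // Only primitives pass through; nested objects are dropped entirely.
    if (['string', 'number', 'boolean'].includes(typeof value)) {
      safe[key] = value;
    }
  }
  return JSON.stringify(safe);
}
```

An allowlist of keys does not by itself scrub the contents of an allowed string value, which is one reason the forbidden-material scan is still worth running over real output.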
+Run a forbidden-material scan over smoke stderr/captures:
+```bash
+find "$SMOKE_DIR" -type f \( -name '*stderr.txt' -o -name 'capture*.txt' \) -print0 |
+  xargs -0 grep -E 'CURSOR_API_KEY|Bearer [A-Za-z0-9._-]+|/cursor-pi-tool-bridge/[^ ]+/mcp|127\.0\.0\.1:[0-9]+/cursor-pi-tool-bridge|apiKey|cookie|session-cookie|secret-token'
+```
+Pass criteria:
+- The grep returns no matches except deliberately planted test strings that are asserted not to appear in serialized diagnostics.
+- If tool names themselves are considered sensitive for a release target, do not enable `PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1` for shared logs. The diagnostics contract intentionally allows tool names.
+## 7. Long-running bridge and abort/cancel
+This check is release-blocking for every Cursor provider/runtime release.
+Use a harmless long-running command and interrupt it after the bridge request is queued:
+```bash
+PI_CURSOR_SETTING_SOURCES=none \
+PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1 \
+PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1 \
+pi -e . --cursor-no-fast --model cursor/composer-2.5 \
+  --session-dir "$SMOKE_DIR/abort" \
+  -p 'Abort smoke. Call pi__bash with command: sleep 30 && echo SHOULD_NOT_PRINT. Do not answer until the tool completes.'
+```
+Pass criteria:
+- Interrupting the run does not leave `sleep 30`, `SHOULD_NOT_PRINT`, `pi`, or bridge-related child processes running.
+- Diagnostics either show clean cancellation/disposal or the process exits cleanly without orphaning children.
+- Persisted JSONL does not contain a false successful final answer.
+## 8. Final structural session scan
+After all live runs, scan JSONL structurally instead of reading raw content into a report:
+```bash
+node <<'NODE'
+const fs = require('fs');
+const path = require('path');
+const root = process.env.SMOKE_DIR;
+const files = [];
+function walk(dir) {
+  for (const name of fs.readdirSync(dir)) {
+    const p = path.join(dir, name);
+    const st = fs.statSync(p);
+    if (st.isDirectory()) walk(p);
+    else if (p.endsWith('.jsonl')) files.push(p);
+  }
+}
+walk(root);
+let failures = 0;
+for (const file of files.sort()) {
+  const records = fs.readFileSync(file, 'utf8').trim().split(/\n+/).filter(Boolean).map(JSON.parse);
+  const messages = records.filter((record) => record.type === 'message').map((record) => record.message);
+  const assistants = messages.filter((message) => message.role === 'assistant');
+  const usage = assistants.map((message) => message.usage).filter(Boolean);
+  const badUsage = usage.filter((u) =>
+    typeof u.input !== 'number' || u.input < 0 ||
+    typeof u.output !== 'number' || u.output < 0 ||
+    typeof u.totalTokens !== 'number' || u.totalTokens < 0 ||
+    u.cacheRead !== 0 || u.cacheWrite !== 0
+  );
+  if (usage.length !== assistants.length || badUsage.length > 0) failures += 1;
+  console.log(JSON.stringify({ file: path.relative(root, file), assistantCount: assistants.length, usageCount: usage.length, badUsageCount: badUsage.length }));
+}
+process.exit(failures === 0 ? 0 : 1);
+NODE
+```
+Pass criteria:
+- Every assistant message has valid usage.
+- Cache fields remain `0`.
+- Tool-heavy runs show nonzero output for visible assistant/tool-call activity.
+- Split runs count consumed tool-result input once on the following assistant turn.
+## 9. Standard local gates
+```bash
+git diff --check
+npm test
+npm run typecheck
+npm pack --dry-run
+```
+Pass criteria:
+- All commands exit `0`.
+- `npm pack --dry-run` includes all new runtime source files and excludes local smoke artifacts, sessions, package tarballs, `.env*`, `.pi/`, `dist/`, and `coverage/`.
+## 10. Cleanup
+```bash
+tmux list-sessions | grep 'pi-cursor-sdk-smoke' || true
+rm -rf "$SMOKE_DIR"
+```
+Pass criteria:
+- No smoke tmux sessions remain.
+- No smoke child processes remain.
+- No smoke artifacts are committed.
+## Coverage gaps this checklist makes explicit
+Everything in this section is in scope for Cursor provider/runtime releases. These are not accepted as "done" unless the matching live check passes:
+- Long-running bridged tool abort/cancel cleanup.
+- Native replay cards beyond read, especially shell/edit/write cards, when those renderers change.
+- Bridge question UI when `cursor_ask_question` changes.
+- MCP timeout override behavior when timeout code changes.
+- Ambient Cursor setting-source behavior when startup filtering or local Cursor settings handling changes.
+- Model discovery aliases/context variants when model-discovery code or Cursor SDK versions change.
+If any surface has no adequate live check, add that check before release instead of assuming mocks cover reality.
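
Reviewer note on the section 8 scan: the per-record usage check in the script above can be restated as a typed predicate. This is a minimal sketch, not code from the package. The `CursorUsage` interface and the `isValidUsage` name are hypothetical, and only the field names (`input`, `output`, `totalTokens`, `cacheRead`, `cacheWrite`) come from the checklist script. For JSON-parsed values it should accept and reject the same records as the `badUsage` filter.

```typescript
// Hypothetical record shape; the field names are taken from the checklist script.
interface CursorUsage {
  input?: unknown;
  output?: unknown;
  totalTokens?: unknown;
  cacheRead?: unknown;
  cacheWrite?: unknown;
}

// True only for real numbers that are zero or greater.
function isNonNegativeNumber(value: unknown): boolean {
  return typeof value === "number" && value >= 0;
}

// Token counts must be non-negative numbers; cache fields must be exactly 0.
function isValidUsage(u: CursorUsage): boolean {
  return (
    isNonNegativeNumber(u.input) &&
    isNonNegativeNumber(u.output) &&
    isNonNegativeNumber(u.totalTokens) &&
    u.cacheRead === 0 &&
    u.cacheWrite === 0
  );
}
```

A record with a missing cache field fails the check, which matches the script's strict `!== 0` comparison.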

package/docs/cursor-model-ux-spec.md CHANGED

@@ -16,13 +16,27 @@ Current implementation notes:
 - Image payload forwarding sends images only from the latest user message. If the latest user turn is plain text after an earlier image turn, the transcript keeps an `[image omitted from transcript]` placeholder but no image bytes are sent to Cursor. The prompt explicitly tells Cursor that prior image bytes are unavailable and to ask the user to reattach or describe a prior image when needed. Carrying images forward across turns remains a future product decision because it affects token cost, privacy, stale visual context, and expected multimodal follow-up behavior.
 - `@cursor/sdk` is a package dependency of this extension; users should not need a global SDK install.
 - Cursor auth uses pi-native API-key resolution for provider `cursor`: CLI `--api-key`, stored `~/.pi/agent/auth.json` API key from `/login`, then `CURSOR_API_KEY`. The extension config file stores only non-secret Cursor-only state such as fast defaults.
-- Local agents do not pass `settingSources` by default because the Cursor SDK can print settings/skills loading output directly to the terminal during startup. Users can opt in with `PI_CURSOR_SETTING_SOURCES=all` or narrow loading with a comma-separated list such as `PI_CURSOR_SETTING_SOURCES=project,user,plugins`.
+- Local agents pass `settingSources: ["all"]` by default so Cursor MCP servers, plugin tools, project/user settings, and related Cursor-native capabilities are available. Users can narrow loading with a comma-separated list such as `PI_CURSOR_SETTING_SOURCES=project,user,plugins`, or disable ambient setting sources with `PI_CURSOR_SETTING_SOURCES=none`. The provider suppresses direct Cursor SDK startup writes around agent creation so setting/skill loading logs do not pollute pi's TUI.
 - Cursor SDK models are treated as thinking-capable even when pi reports `thinking=no`; that pi column only means the SDK did not expose a pi-controllable thinking parameter for that model.
-- Cursor-side thinking remains visible. Cursor internal tool activity is recorded from SDK events and scrubbed. In interactive TTY sessions, supported completed `read`, `bash`, `ls`, `edit`, and `write` activity is replayed through pi's native tool-call rendering path with recorded Cursor results, so the TUI can show native green cards without forcing Cursor to call pi tools or rerunning Cursor's reads/shell commands/file edits. Cursor edit/write activity is replayed through `cursor_edit` and `cursor_write` cards rather than pi's built-in `edit`/`write` names because Cursor's edit/write schemas differ from pi's schemas; these replay-only tools display recorded Cursor results and fail closed if called without a recorded result. Native replay wrappers are registered only for tool names not already owned by another extension; conflicting tools use the bounded scrubbed transcript fallback. `PI_CURSOR_NATIVE_TOOL_DISPLAY=0` disables native replay, and `PI_CURSOR_REGISTER_NATIVE_TOOLS=0` is a registration-only opt-out that keeps the transcript fallback without shadowing pi tool names. When these native cards are emitted, the provider mirrors Codex's turn shape as Cursor SDK completions arrive: assistant `toolUse`, pi `toolResult`s, live post-tool Cursor thinking/text, any later Cursor tool batches as further `toolUse` turns, then Cursor's final assistant answer. Non-interactive runs keep bounded scrubbed transcript output instead, preserving `pi -p` assistant text output. Cursor text deltas stream live when native tool replay is not active.
-- Cursor SDK usage events report cumulative internal agent/tool/cache work, not the replayable pi prompt context. The extension reports approximate prompt/output usage for pi context display and compaction decisions instead of copying raw Cursor SDK usage. When native replay splits one Cursor SDK run into multiple pi turns, prompt input is counted once for the run; later synthetic replay turns report `input: 0` and only their own output estimate.
+- Cursor-side thinking remains visible through pi's native thinking rendering when the Cursor SDK emits thinking or summary deltas.
+- Local Cursor agents get two tool surfaces. First, Cursor keeps the Cursor SDK local-agent tool surface plus configured Cursor settings, plugins, and Cursor MCP servers. Second, pi-cursor-sdk exposes active pi tools through a default-on, tokenized loopback MCP bridge when bridgeable tools exist.
+- `buildCursorPiToolBridgeSnapshot()` is the runtime capability source for pi bridge tools. It snapshots `pi.getActiveTools()` and `pi.getAllTools()`, filters internal replay names, hides overlapping built-in pi tools (`read`, `bash`, `write`, `edit`, `grep`, `find`, `ls`) unless `PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1`, and creates collision-safe MCP names such as `pi__sem_reindex`. Cursor discovers the current run's exposed bridge tools through MCP `listTools`; there is no default run-start manifest, per-turn visible tool list, status manifest, or footer manifest.
+- Prompt text is the primary provider/bridge contract. MCP tool descriptions repeat the same contract to reinforce discovery, but do not replace the prompt boundary. Cursor must call the exposed `pi__*` MCP name, not the real pi tool name shown in pi history or transcripts. Pi emits and executes the real pi tool name.
+- The provider also registers `cursor_ask_question` for Cursor models when the bridge is enabled. Cursor sees it as `pi__cursor_ask_question`, and pi executes it through the normal tool path so interactive users can choose options from pi UI. In non-UI modes it reports that UI is unavailable so Cursor can state a default assumption instead. `PI_CURSOR_PI_TOOL_BRIDGE=0` disables the local bridge, including question bridging. Cloud Cursor agents remain out of scope for the bridge.
+- The bridge queues MCP calls, emits provider `toolcall_*` events, waits for matching pi `toolResult` messages by `toolCallId`, resolves the result back into the same live Cursor SDK run without creating a new `Agent`, and never calls tool `execute()` handlers directly. The same-run resume invariant holds unless the run was disposed, aborted, or cancelled.
+- Cursor SDK MCP tool calls use a guarded timeout override because installed `@cursor/sdk` 1.0.13 has a 60-second MCP request default with no public per-server timeout option. The extension extends that Cursor SDK MCP `callTool` timeout path to 3600 seconds by default. Users can override it with `PI_CURSOR_MCP_TOOL_TIMEOUT_MS` or `PI_CURSOR_MCP_TOOL_TIMEOUT_SECONDS`.
+- Bridge diagnostics are opt-in only: `PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1` writes typed, allowlisted, scrubbed single-line JSONL records to `process.stderr` with prefix `[pi-cursor-sdk:bridge]`. Diagnostics are scrubbed operational logs, not anonymous telemetry. They intentionally include tool names, safe correlation IDs, run lifecycle, exposed pi↔MCP name pairs, queued requests, result resolution, rejection, cancellation, and pending counts. Correlation IDs are generated independently from the tokenized endpoint path, and Cursor MCP call IDs are hashed before serialization. Diagnostics must not include endpoint paths/URLs/path components/tokens, API keys, bearer tokens, cookies, session credentials, raw args/results, stdout/stderr payloads, file contents, Cursor settings output, or local private session paths in tracked docs, and they must not call pi UI status, notification, or footer APIs. If tool names themselves are unacceptable for a release target, bridge debug diagnostics are not safe for shared logs under the current contract.
+- This repo does not provide a generic desktop-automation, browser-driver, or CDP recipe. Provider docs should describe pi-cursor-sdk's Cursor provider/bridge contract only.
+- Cursor internal tool activity is recorded from SDK events and scrubbed. In interactive TTY sessions, supported completed `read`, `bash`, `grep`, `find`, `ls`, `edit`, `write`, diagnostics, delete, todo/plan, task, image generation, and MCP activity is replayed through pi's native tool-call rendering path with recorded Cursor results, so the TUI can show native-looking cards without rerunning Cursor's reads/shell commands/file edits. Cursor `glob` activity is replayed through native `find` cards. Cursor write activity is replayed through native-looking `write` cards, and Cursor StrReplace/edit activity uses native-looking `edit` only when recorded arguments truthfully satisfy pi's `edit` schema; path-only Cursor edit and notebook edit replay falls back to neutral Cursor activity before pi validation. Diagnostics, delete, todos/plans, task, image, and MCP activity use neutral Cursor activity cards with pi's default success/error shell. Neutral Cursor activity calls include `activityTitle` and, when available, `activitySummary` so partial/collapsed cards preserve identity such as `Cursor plan`, `Cursor todos`, `Cursor MCP`, or `Cursor edit`. Replay-only tools display recorded Cursor results, normalize workspace-local paths/diff headers for display, use pi diff colors for edit previews and path-inferred syntax highlighting for write previews, and fail closed if called without a recorded result. Native replay wrappers are registered only for tool names not already owned by another extension; conflicting tools use the bounded scrubbed transcript fallback. Cursor workflow tools such as `SwitchMode` and Cursor todo state are not pi workflow controls; reported todo/plan events are displayed as Cursor activity only. Plan/todo replay cards can be followed by Cursor's final plan text, selected from `run.wait().result` when Cursor provides one and trimmed against already-emitted text. Started Cursor SDK tool calls that never receive a completion event are discarded without synthetic replay errors; explicit failures remain visible when Cursor reports them through completed tool calls or step results. `PI_CURSOR_NATIVE_TOOL_DISPLAY=0` disables native replay, and `PI_CURSOR_REGISTER_NATIVE_TOOLS=0` is a registration-only opt-out that keeps the transcript fallback without shadowing pi tool names. When bridge or native replay cards are emitted, the provider mirrors Codex's turn shape as Cursor SDK activity arrives: assistant `toolUse`, pi `toolResult`s, live post-tool Cursor thinking/text, any later tool batches as further `toolUse` turns, then Cursor's final assistant answer. For shell replay, completed `stdout` / `stderr` are primary; unambiguous `shell-output-delta` data is used only as display-only fallback for empty successful shell completions, and overlapping shell calls drop ambiguous deltas instead of guessing. Non-interactive runs keep bounded scrubbed transcript output instead, preserving `pi -p` assistant text output. Cursor text deltas stream live when no live-run turn split is active.
+- Synthetic replay names are internal compatibility details. New model-facing prompt text and user-visible cards use native tool names when renderer-compatible, or neutral Cursor activity labels when not. Legacy sessions containing old internal replay names are sanitized before prompt/display. Bridge MCP names such as `pi__sem_reindex` are MCP-only; pi session output uses real pi tool names.
+- Cursor SDK usage events report cumulative internal agent/tool/cache work, not the replayable pi prompt context. The extension does not copy raw Cursor SDK usage into pi usage or compaction. For Cursor assistant messages, `usage.input`/`usage.output` are approximate pi session activity components: initial Cursor prompt input is counted once, consumed split-run tool results are counted as deduped input on the following assistant turn, and assistant output includes visible text/thinking/tool-call content. `usage.totalTokens` is the replayable Cursor prompt/context estimate derived from the same `buildCursorPrompt()` path used for `Agent.send`; it may differ from `input + output` and is the context-safe value for display/compaction. `src/cursor-usage-accounting.ts` owns this usage policy, and `src/cursor-live-run-accounting.ts` owns prompt-once and consumed-tool-result accounting so provider usage and bridge result resolution share the same matched tool-result boundary.
+- Audit observation, 2026-05-19, superseded by the 2026-05-21 replay pass: a missing-file read with Composer 2.5 emitted `tool-call-started` for Cursor `read`, then streamed final text `Error: File not found`, but did not emit `tool-call-completed` or an `onStep` `toolCall` error result. Leftover started calls are now discarded at run completion instead of becoming synthetic replay errors. Cursor-reported completed/step errors remain visible.
+- Maintainer visual verification for replay-card changes should follow [Cursor Native Tool Visual Audit Workflow](./cursor-native-tool-visual-audit.md): offscreen PTY-driven pi run, xterm.js/Playwright screenshot rendering, and JSONL inspection before accepting commits or PRs.
+- Cursor provider/runtime releases should follow [Cursor Live Smoke Checklist](./cursor-live-smoke-checklist.md) with real `pi -e . --cursor-no-fast --model cursor/composer-2.5` invocations, manual observation, temporary session dirs, diagnostics scans, and persisted JSONL inspection. Assume every runtime surface is in scope. A release is not ready when any live check is optional, deferred, mostly passing, or unobserved.
 - For models without a catalog `context` parameter, context windows are not hardcoded. The extension ships a bundled SDK-derived default/non-Max cache generated from `createAgentPlatform().checkpointStore.loadLatest(agentId).tokenDetails.maxTokens`. Successful runs can update a local override cache, but model discovery does not probe models at startup.
 - Max Mode context windows are distinct from default/non-Max context windows. `@cursor/sdk` 1.0.13 documentation says the SDK may enable Max Mode automatically when a selected model requires it, but the public local-agent `ModelSelection` path still does not expose a manual Max Mode selector. Do not advertise Max Mode context windows unless the SDK catalog exposes an exact parameter/variant or the SDK public API adds a Max Mode selector that the extension actually sends.
 - `@cursor/sdk` 1.0.13 adds latest-style `ModelListItem.aliases`. The extension registers only unambiguous aliases as pi model IDs (with the same context suffixes when applicable) and sends the alias back in `ModelSelection.id`, while sharing Cursor-only state such as fast defaults with the underlying catalog `id`. Aliases shared by multiple base models, such as generic family aliases, are skipped because the pi row metadata would otherwise imply one base model while Cursor may resolve the alias to another.
+- Session-scoped Cursor SDK agent pooling reuses one live `@cursor/sdk` agent across compatible follow-up turns within the same pi session scope. `computeCursorContextFingerprint()` and `shouldBootstrapCursorSend()` decide whether the next turn sends a full bootstrap prompt or an incremental follow-up. The pool recreates the agent when context diverges, when branch or compaction summaries appear after `/tree` navigation or compaction, when the API key identity changes, after send errors, on `session_shutdown`, and when `session_before_tree` / `session_tree` invalidate the active branch. Incremental sends omit the full Cursor SDK tool boundary block because the session agent retains prior bootstrap context.
 ## Goal
@@ -236,7 +250,7 @@ Important distinction:
 - **Cursor thinking support** applies to all Cursor SDK models. The extension should assume Cursor models can think and may emit thinking deltas.
 - **Pi-controllable thinking** means Cursor exposes a `reasoning`, `effort`, or `thinking` parameter that the extension can set from pi's native thinking level. These models register `reasoning: true` and show `thinking=yes` in `pi --list-models`.
-- **Cursor SDK thinking-control gap** means the model can still think, but the SDK does not expose a user-controllable thinking parameter for that model. These models register `reasoning: false` and show `thinking=no` in `pi --list-models` because pi cannot control a level for them. The extension still parses Cursor `thinking-delta` events if they are emitted.
+- **Cursor SDK thinking-control gap** means the model can still think, but the SDK does not expose a user-controllable thinking parameter for that model. These models register `reasoning: false` and show `thinking=no` in `pi --list-models` because pi cannot control a level for them. The extension still surfaces Cursor `thinking-delta` and summary events through pi's native thinking rendering when they are emitted.
 Do not mark a model `reasoning: true` only because it can think. That would make pi show controls such as `--thinking`, `:medium`, and shift+tab even though the extension cannot translate them into Cursor SDK params.
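
Reviewer note: the rule in this hunk (set `reasoning: true` only when Cursor exposes a `reasoning`, `effort`, or `thinking` parameter) might be sketched as below. This is an illustration under stated assumptions, not the extension's registration code. `CursorModelInfo`, `parameterNames`, and both function names are hypothetical; only the three parameter names and the `reasoning` flag come from the spec text.

```typescript
// Hypothetical descriptor for a Cursor catalog model; not the @cursor/sdk type.
interface CursorModelInfo {
  id: string;
  parameterNames: string[];
}

// Parameter names the spec treats as pi-controllable thinking controls.
const THINKING_CONTROL_PARAMS = ["reasoning", "effort", "thinking"];

// True only when the model exposes at least one controllable thinking parameter.
function hasPiControllableThinking(model: CursorModelInfo): boolean {
  return model.parameterNames.some(
    (name) => THINKING_CONTROL_PARAMS.indexOf(name) !== -1
  );
}

// The `reasoning` flag reflects controllability, not whether the model can think.
function toPiRegistration(model: CursorModelInfo): { id: string; reasoning: boolean } {
  return { id: model.id, reasoning: hasPiControllableThinking(model) };
}
```

Under this sketch, a model with no matching parameter would register `reasoning: false`, which the spec says appears as `thinking=no` in `pi --list-models`, while any emitted thinking deltas would still be rendered.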
@@ -655,3 +669,12 @@ Before calling done:
    - `pi --model cursor/gpt-5.5@272k --thinking xhigh -p "Say ok only"`
    - `pi --model cursor/gpt-5.5@1m --cursor-fast -p "Say ok only"`
    - confirm requests use selected context, pi thinking, and fast flag state
+4. Tool bridge and replay:
+   - `npm test -- test/cursor-pi-tool-bridge.test.ts test/cursor-provider.test.ts test/cursor-mcp-timeout-override.test.ts`
+   - confirm `Agent.create()` gets `mcpServers.pi_tools` when active pi tools exist and omits it when `PI_CURSOR_PI_TOOL_BRIDGE=0` or the active snapshot is empty
+   - confirm bridged MCP requests emit real pi tool calls and resolve matching pi tool results back to the same live Cursor SDK run without creating a new `Agent`, unless the run was disposed, aborted, or cancelled
+   - confirm bridge MCP activity is suppressed from Cursor replay while non-bridge Cursor MCP activity remains visible
+   - confirm `PI_CURSOR_MCP_TOOL_TIMEOUT_MS` and `PI_CURSOR_MCP_TOOL_TIMEOUT_SECONDS` override the Cursor SDK MCP callTool timeout seam
+   - confirm `PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1` emits typed, allowlisted, scrubbed JSONL to `process.stderr` with prefix `[pi-cursor-sdk:bridge]`, omits endpoint URLs/path components/tokens, and unset/false leaves output unchanged
+   - run the visual audit workflow when replay card visuals or bridge card visuals change; JSONL should show real pi tool names for bridged calls and no duplicate MCP replay for bridge calls
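
Reviewer note: the timeout override check above could be exercised against a resolver shaped like the following sketch. It is not the code in `src/cursor-mcp-timeout-override.ts`. The function names are hypothetical, and two behaviours are assumptions the spec text does not state: that `PI_CURSOR_MCP_TOOL_TIMEOUT_MS` takes precedence over `PI_CURSOR_MCP_TOOL_TIMEOUT_SECONDS`, and that invalid values fall back to the default. Only the variable names and the 3600-second default come from the spec.

```typescript
// Default from the spec: 3600 seconds, expressed in milliseconds.
const DEFAULT_MCP_TOOL_TIMEOUT_MS = 3600 * 1000;

// Parses a positive finite number; returns undefined for unset, blank, or invalid input.
function parsePositive(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const value = Number(raw);
  return Number.isFinite(value) && value > 0 ? value : undefined;
}

// ASSUMPTION: the millisecond variable wins when both are set; the spec does not say.
function resolveMcpToolTimeoutMs(env: Record<string, string | undefined>): number {
  const ms = parsePositive(env["PI_CURSOR_MCP_TOOL_TIMEOUT_MS"]);
  if (ms !== undefined) return ms;
  const seconds = parsePositive(env["PI_CURSOR_MCP_TOOL_TIMEOUT_SECONDS"]);
  if (seconds !== undefined) return seconds * 1000;
  return DEFAULT_MCP_TOOL_TIMEOUT_MS;
}
```

Taking the environment as a plain record instead of reading `process.env` directly keeps the resolver testable without mutating global state.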