npm - pi-cursor-sdk - Versions diffs - 0.1.19 → 0.1.20 - Mend

pi-cursor-sdk 0.1.19 → 0.1.20

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19)

package/CHANGELOG.md +20 -0
package/README.md +23 -2
package/docs/cursor-live-smoke-checklist.md +1 -1
package/docs/cursor-model-ux-spec.md +5 -4
package/docs/cursor-native-tool-replay.md +6 -4
package/docs/cursor-testing-lessons.md +2 -2
package/package.json +4 -2
package/scripts/probe-mcp-coldstart.mjs +244 -0
package/src/cursor-incomplete-tool-visibility.ts +51 -45
package/src/cursor-mcp-timeout-override.ts +66 -11
package/src/cursor-native-tool-display-replay.ts +2 -1
package/src/cursor-provider-turn-coordinator.ts +29 -8
package/src/cursor-provider.ts +55 -33
package/src/cursor-sdk-event-debug.ts +6 -1
package/src/cursor-session-agent.ts +262 -87
package/src/cursor-tool-lifecycle.ts +9 -35
package/src/cursor-tool-names.ts +27 -0
package/src/cursor-tool-visibility.ts +63 -0
package/src/cursor-transcript-tool-specs.ts +26 -14

package/CHANGELOG.md CHANGED

@@ -2,6 +2,26 @@
 ## Unreleased
+## 0.1.20 - 2026-05-26
+### Added
+- Shorten known Cursor SDK MCP initialize/listTools timeouts to 10 seconds by default so unavailable configured MCP servers fail fast on first send instead of blocking for the SDK's 60-second protocol default; unknown MCP protocol timeout stacks keep the SDK default. Override with `PI_CURSOR_MCP_CONNECT_TIMEOUT_MS` or `PI_CURSOR_MCP_CONNECT_TIMEOUT_SECONDS`.
+- Add maintainer cold-start timing probe `scripts/probe-mcp-coldstart.mjs` and `npm run debug:mcp-coldstart`.
+### Changed
+- Document first-send MCP cold-start behavior and initialize/listTools timeout defaults in README troubleshooting.
+- Centralize Cursor started-tool visibility classification across incomplete-tool cards, lifecycle progress, fast local discovery suppression, and completed replay titles.
+- Rework the cold-start probe to run each scenario in a fresh child process before the first Cursor SDK import.
+### Fixed
+- Make pooled Cursor session agents idle before send planning/reuse by awaiting fire-and-forget live-run `run.wait()` cleanup in `acquireSessionCursorAgent()`, scoped to the pooled agent instance id, so pi auto-compaction summarization does not hit Cursor SDK `AgentBusyError` (`already has active run`) or plan against stale send state while manual `/compact` after idle still works.
+- Fix stale busy pooled-agent waits so reset, terminal disposal, and pool-key replacement wake blocked acquires even when an old SDK `run.wait()` never settles.
+- Remove test-only live-run coordinator detachment hooks and keep race invariants inside the session-agent lease/pool contract.
+- Keep non-60-second timer scheduling on the cheap path by capturing timeout stack traces only for Cursor SDK's 60-second MCP protocol default.
 ## 0.1.19 - 2026-05-25
 ### Added
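
The Fixed entries above describe a wait-then-lease contract: acquiring a pooled agent awaits fire-and-forget `run.wait()` cleanup for the current instance only, reset/disposal/replacement wakes blocked acquires even if an old `run.wait()` never settles, and (per the model UX spec below) late send commits from disposed instances are ignored. The following is a minimal sketch of that contract, not the package's implementation; `AgentPool`, `trackRunCleanup`, `acquire`, `dispose`, and `commitSend` are hypothetical names, and a real pool would also hold the SDK agent and its pool key.

```typescript
// Hypothetical sketch: AgentPool and its methods are illustrative names,
// not pi-cursor-sdk's actual API.

interface PooledAgent {
  instanceId: number;
  committedSends: number;
}

/** A promise plus a function that resolves it. */
function signal(): { promise: Promise<void>; fire: () => void } {
  let fire: () => void = () => {};
  const promise = new Promise<void>((resolve) => {
    fire = () => resolve();
  });
  return { promise, fire };
}

class AgentPool {
  private nextInstanceId = 1;
  private agent: PooledAgent | undefined;
  // Fire-and-forget run.wait()-style cleanups, scoped per pooled instance id.
  private pending = new Map<number, Set<Promise<unknown>>>();
  // Fired on disposal so blocked acquires wake even if a cleanup never settles.
  private disposed = new Map<number, { promise: Promise<void>; fire: () => void }>();

  private current(): PooledAgent {
    if (!this.agent) {
      const instanceId = this.nextInstanceId++;
      this.agent = { instanceId, committedSends: 0 };
      this.disposed.set(instanceId, signal());
    }
    return this.agent;
  }

  /** Track a background cleanup promise for one pooled instance. */
  trackRunCleanup(instanceId: number, wait: Promise<unknown>): void {
    let set = this.pending.get(instanceId);
    if (!set) {
      set = new Set<Promise<unknown>>();
      this.pending.set(instanceId, set);
    }
    const bucket = set;
    const tracked: Promise<unknown> = wait
      .catch(() => undefined) // cleanup errors must not poison later acquires
      .finally(() => bucket.delete(tracked));
    bucket.add(tracked);
  }

  /** Resolve once the current instance has no pending cleanup, or after it is disposed. */
  async acquire(): Promise<PooledAgent> {
    for (;;) {
      const agent = this.current();
      const busy = this.pending.get(agent.instanceId);
      if (!busy || busy.size === 0) return agent;
      const woken = this.disposed.get(agent.instanceId)!.promise;
      // Whichever happens first: all cleanups settle, or the instance is disposed.
      await Promise.race([Promise.all(busy), woken]);
    }
  }

  /** Reset, terminal disposal, or pool-key replacement: drop tracking and wake waiters. */
  dispose(): void {
    const agent = this.agent;
    if (!agent) return;
    this.agent = undefined;
    this.pending.delete(agent.instanceId);
    this.disposed.get(agent.instanceId)?.fire();
    this.disposed.delete(agent.instanceId);
  }

  /** Apply a send commit only if it comes from the live instance. */
  commitSend(instanceId: number): boolean {
    const agent = this.agent;
    if (!agent || agent.instanceId !== instanceId) return false;
    agent.committedSends++;
    return true;
  }
}
```

Racing against a disposal signal, rather than awaiting `run.wait()` alone, is what keeps a never-settling SDK promise from wedging every later acquire.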

package/README.md CHANGED

@@ -224,11 +224,15 @@ PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1 pi --model cursor/composer-2.5
 PI_CURSOR_MCP_TOOL_TIMEOUT_SECONDS=7200 pi --model cursor/composer-2.5
 PI_CURSOR_MCP_TOOL_TIMEOUT_MS=7200000 pi --model cursor/composer-2.5
+# Override known MCP initialize/listTools timeouts on first send (default 10s).
+PI_CURSOR_MCP_CONNECT_TIMEOUT_SECONDS=5 pi --model cursor/composer-2.5
+PI_CURSOR_MCP_CONNECT_TIMEOUT_MS=5000 pi --model cursor/composer-2.5
 # Emit scrubbed bridge diagnostics as JSONL to stderr with prefix [pi-cursor-sdk:bridge].
 PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1 pi --model cursor/composer-2.5
 ```
-`PI_CURSOR_PI_TOOL_BRIDGE=0` is the supported rollback flag and disables the bridge entirely. The bridge also treats `false`, `off`, `none`, `no`, and `disabled` as off; `1`, `true`, `on`, `yes`, and `enabled` as on. `PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1` opts in to exposing overlapping pi tool names that Cursor already has native equivalents for. The Cursor MCP timeout override defaults to 3600 seconds because the installed Cursor SDK has a 60-second MCP request default that is too short for some local MCP tools, including bridged pi tools and configured Cursor MCP servers. `PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1` is off by default and emits typed, allowlisted, scrubbed single-line JSONL records to `process.stderr`. These records are operational diagnostics, not anonymous telemetry: they intentionally include tool names, safe correlation IDs, bridge run state, exposed pi↔MCP name pairs, queued requests, result resolution, rejection, cancellation, and pending counts. They must not include endpoint URLs, endpoint path components, endpoint tokens, raw args/results, stdout/stderr payloads, file contents, Cursor settings output, API keys, bearer tokens, cookies, session credentials, or secrets. Do not enable or share bridge debug logs where tool names themselves are sensitive.
+`PI_CURSOR_PI_TOOL_BRIDGE=0` is the supported rollback flag and disables the bridge entirely. The bridge also treats `false`, `off`, `none`, `no`, and `disabled` as off; `1`, `true`, `on`, `yes`, and `enabled` as on. `PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1` opts in to exposing overlapping pi tool names that Cursor already has native equivalents for. The installed Cursor SDK uses a 60-second MCP protocol default with no public per-server timeout option. pi-cursor-sdk overrides that seam in two directions by default: MCP `callTool` requests are extended to 3600 seconds for long-running local MCP tools (including the pi bridge and configured Cursor MCP servers), and known MCP initialize/listTools requests on first send are shortened to 10 seconds so unavailable configured MCP servers fail fast instead of blocking for a full minute. Unknown Cursor SDK MCP protocol timeout stacks keep the SDK default instead of being shortened. Override tool-call timeouts with `PI_CURSOR_MCP_TOOL_TIMEOUT_MS` or `PI_CURSOR_MCP_TOOL_TIMEOUT_SECONDS`, and first-send initialize/listTools timeouts with `PI_CURSOR_MCP_CONNECT_TIMEOUT_MS` or `PI_CURSOR_MCP_CONNECT_TIMEOUT_SECONDS`. `PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1` is off by default and emits typed, allowlisted, scrubbed single-line JSONL records to `process.stderr`. These records are operational diagnostics, not anonymous telemetry: they intentionally include tool names, safe correlation IDs, bridge run state, exposed pi↔MCP name pairs, queued requests, result resolution, rejection, cancellation, and pending counts. They must not include endpoint URLs, endpoint path components, endpoint tokens, raw args/results, stdout/stderr payloads, file contents, Cursor settings output, API keys, bearer tokens, cookies, session credentials, or secrets. Do not enable or share bridge debug logs where tool names themselves are sensitive.
 ### Maintainer live smoke release gate
@@ -341,6 +345,23 @@ To disable the bridge for rollback or isolation, start pi with:
 PI_CURSOR_PI_TOOL_BRIDGE=0 pi --model cursor/composer-2.5
 ```
+### First Cursor message is slow (10+ seconds)
+The extension loads Cursor setting sources with `PI_CURSOR_SETTING_SOURCES=all` by default, which includes user MCP servers from `~/.cursor/mcp.json`. On the first send of a session, the Cursor SDK connects to each configured MCP server before streaming a reply. pi-cursor-sdk shortens the known MCP initialize/listTools timeout path to **10 seconds by default** (the raw Cursor SDK default is 60 seconds), so a dead server should fail fast instead of blocking for a full minute. Unknown MCP protocol timeout stacks keep the SDK default instead of being shortened. A slow or unavailable server can still add roughly that connect timeout before the first reply. Tighten further with:
+```bash
+PI_CURSOR_MCP_CONNECT_TIMEOUT_SECONDS=5 pi --model cursor/composer-2.5
+PI_CURSOR_MCP_CONNECT_TIMEOUT_MS=5000 pi --model cursor/composer-2.5
+```
+Workarounds if you do not need user-level MCP in pi:
+```bash
+PI_CURSOR_SETTING_SOURCES=project,plugins,team pi --model cursor/composer-2.5
+```
+Or fix/disable the slow MCP server in Cursor settings. Maintainer timing probe: `npm run debug:mcp-coldstart`.
 ### A Cursor MCP tool times out
 The extension raises Cursor SDK's MCP tool-call timeout from 60 seconds to 3600 seconds by default for Cursor SDK MCP `callTool` requests, including the local pi bridge and configured Cursor MCP servers. For longer local MCP tools, set one override:
@@ -357,7 +378,7 @@ This usually needs session JSONL to classify. Common cases:
 - **Model text echo:** Assistant `text` blocks contain lines like `Tool call`, `Cursor activity`, or `call cursor-replay-…` without matching `toolCall` blocks — the Cursor model narrated pi prompt transcript format instead of invoking SDK tools. See [Tool calls listed as plain text (#40 triage)](docs/cursor-testing-lessons.md#tool-calls-listed-as-plain-text-40-triage).
 - **Stale replay routing / plan-strip:** Error `toolResult` or error assistant messages contain `Tool grep/cursor/find/ls not found`, or provider debug shows `inactive_trace` after plan-mode execute stripped active tools — tracked in **#52** (distinct from model text echo and #55).
 - **Replay vs execution:** `cursor-replay-*` IDs and neutral **Cursor MCP** activity cards are display-only recorded Cursor results; they do not re-run browser/MCP work. See [Cursor native tool replay](docs/cursor-native-tool-replay.md).
-- **Run failure / discarded tools:** A red toast with scrubbed detail may indicate an SDK failure (#55). Started-but-never-completed Cursor tools now surface neutral **Cursor … did not complete** activity cards with a bounded reason; maintainer debug for the same gap remains in **#52** (`PI_CURSOR_SDK_EVENT_DEBUG=1`).
+- **Run failure / discarded tools:** A red toast with scrubbed detail may indicate an SDK failure (#55). Started-but-never-completed Cursor tools surface neutral **Cursor … did not complete** activity cards with a bounded reason when the run failed/aborted, produced no assistant text, or involved external/side-effectful tools. Incomplete fast local discovery starts (`read`, `grep`, `glob`, `ls`) are debug-only after a successful text-producing run so stale SDK start events do not create red post-answer cards; maintainer debug for the same gap remains in **#52** (`PI_CURSOR_SDK_EVENT_DEBUG=1`).
 - **Hard network crash:** pi exited with uncaught `ConnectError` / `ETIMEDOUT` — **#43**, not #40 text echo.
 Capture `pi --version`, extension version, model, flags, the exact prompt, and a redacted session dir before filing bugs.
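
The README's timeout rules reduce to one decision per timer: leave any delay other than the SDK's 60-second MCP default untouched, and for 60-second timers use 3600 s for `callTool`, 10 s for initialize/listTools, or the unchanged SDK default when the stack is not recognized. The sketch below assumes the decision is driven by a captured stack trace. The stack markers, the helper names, and the rule that an `_MS` variable wins over its `_SECONDS` counterpart are assumptions, not pi-cursor-sdk's implementation.

```typescript
// Hypothetical sketch of the documented timeout decision. Stack markers,
// helper names, and _MS-over-_SECONDS precedence are assumptions.

const SDK_MCP_DEFAULT_MS = 60_000;
const DEFAULT_TOOL_TIMEOUT_MS = 3_600_000; // 3600 s for callTool
const DEFAULT_CONNECT_TIMEOUT_MS = 10_000; // 10 s for initialize/listTools

type Env = Record<string, string | undefined>;
type McpTimeoutKind = "callTool" | "connect" | "unknown";

function positiveNumber(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const n = Number(raw);
  return Number.isFinite(n) && n > 0 ? n : undefined;
}

/** Resolve `${prefix}_MS` or `${prefix}_SECONDS`, else the default. */
function resolveTimeoutMs(env: Env, prefix: string, fallbackMs: number): number {
  const ms = positiveNumber(env[`${prefix}_MS`]);
  if (ms !== undefined) return ms;
  const seconds = positiveNumber(env[`${prefix}_SECONDS`]);
  if (seconds !== undefined) return seconds * 1000;
  return fallbackMs;
}

/** Classify a captured stack trace; anything unrecognized stays "unknown". */
function classifyStack(stack: string): McpTimeoutKind {
  if (/\bcallTool\b/.test(stack)) return "callTool";
  if (/\b(initialize|listTools)\b/.test(stack)) return "connect";
  return "unknown";
}

/** Decide the delay a timer should actually use. */
function effectiveDelayMs(requestedMs: number, env: Env, captureStack: () => string): number {
  // Cheap path: only the SDK's 60-second MCP default is worth a stack capture.
  if (requestedMs !== SDK_MCP_DEFAULT_MS) return requestedMs;
  switch (classifyStack(captureStack())) {
    case "callTool":
      return resolveTimeoutMs(env, "PI_CURSOR_MCP_TOOL_TIMEOUT", DEFAULT_TOOL_TIMEOUT_MS);
    case "connect":
      return resolveTimeoutMs(env, "PI_CURSOR_MCP_CONNECT_TIMEOUT", DEFAULT_CONNECT_TIMEOUT_MS);
    default:
      return requestedMs; // unknown MCP protocol stack: keep the SDK default
  }
}
```

In a timer wrapper, `captureStack` would typically be `() => new Error().stack ?? ""`; because it is only invoked on the 60-second path, every other timer skips the stack capture entirely.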

package/docs/cursor-live-smoke-checklist.md CHANGED

@@ -279,7 +279,7 @@ Everything in this section is in scope for Cursor provider/runtime releases. The
 - Long-running bridged tool abort/cancel cleanup.
 - Native replay cards beyond read, especially shell/edit/write cards, when those renderers change.
 - Bridge question UI when `cursor_ask_question` changes.
-- MCP timeout override behavior when timeout code changes.
+- MCP timeout override behavior (3600s `callTool` default, 10s initialize/listTools default, and SDK-default unknown protocol stacks) when timeout code changes.
 - SDK `semSearch` / `recordScreen` activity replay when those formatters change. There is no reliable local prompt that forces Cursor to call these built-in SDK tools on demand; regression is covered by `test/cursor-tool-transcript.test.ts`. Opportunistically confirm neutral `Cursor semantic search` / `Cursor screen recording` cards if a live run surfaces them.
 - Ambient Cursor setting-source behavior when startup filtering or local Cursor settings handling changes.
 - Model discovery aliases/context variants when model-discovery code or Cursor SDK versions change.

package/docs/cursor-model-ux-spec.md CHANGED

@@ -25,20 +25,20 @@ Current implementation notes:
 - Prompt text is the primary provider/bridge contract. MCP tool descriptions repeat the same contract to reinforce discovery, but do not replace the prompt boundary. Cursor must call the exposed `pi__*` MCP name, not the real pi tool name shown in pi history or transcripts. Pi emits and executes the real pi tool name.
 - The provider also registers `cursor_ask_question` for Cursor models when the bridge is enabled. Cursor sees it as `pi__cursor_ask_question`, and pi executes it through the normal tool path so interactive users can choose options from pi UI. In non-UI modes it reports that UI is unavailable so Cursor can state a default assumption instead. `PI_CURSOR_PI_TOOL_BRIDGE=0` disables the local bridge, including question bridging. Cloud Cursor agents remain out of scope for the bridge.
 - The bridge queues MCP calls, emits provider `toolcall_*` events, waits for matching pi `toolResult` messages by `toolCallId`, resolves the result back into the same live Cursor SDK run without creating a new `Agent`, and never calls tool `execute()` handlers directly. The same-run resume invariant holds unless the run was disposed, aborted, or cancelled.
-- Cursor SDK MCP tool calls use a guarded timeout override because installed `@cursor/sdk` 1.0.13 has a 60-second MCP request default with no public per-server timeout option. The extension extends that Cursor SDK MCP `callTool` timeout path to 3600 seconds by default. Users can override it with `PI_CURSOR_MCP_TOOL_TIMEOUT_MS` or `PI_CURSOR_MCP_TOOL_TIMEOUT_SECONDS`.
+- Cursor SDK MCP tool calls use a guarded timeout override because installed `@cursor/sdk` 1.0.13 has a 60-second MCP request default with no public per-server timeout option. The extension extends the verified Cursor SDK MCP `callTool` timeout path to 3600 seconds by default and shortens the verified first-send MCP initialize/listTools timeout paths to 10 seconds by default so unavailable configured MCP servers do not block the first reply for a full minute; unknown MCP protocol timeout stacks keep the SDK default. Users can override tool-call timeouts with `PI_CURSOR_MCP_TOOL_TIMEOUT_MS` or `PI_CURSOR_MCP_TOOL_TIMEOUT_SECONDS`, and initialize/listTools timeouts with `PI_CURSOR_MCP_CONNECT_TIMEOUT_MS` or `PI_CURSOR_MCP_CONNECT_TIMEOUT_SECONDS`.
 - Bridge diagnostics are opt-in only: `PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1` writes typed, allowlisted, scrubbed single-line JSONL records to `process.stderr` with prefix `[pi-cursor-sdk:bridge]`. Diagnostics are scrubbed operational logs, not anonymous telemetry. They intentionally include tool names, safe correlation IDs, run lifecycle, exposed pi↔MCP name pairs, queued requests, result resolution, rejection, cancellation, and pending counts. Correlation IDs are generated independently from the tokenized endpoint path, and Cursor MCP call IDs are hashed before serialization. Diagnostics must not include endpoint paths/URLs/path components/tokens, API keys, bearer tokens, cookies, session credentials, raw args/results, stdout/stderr payloads, file contents, Cursor settings output, or local private session paths in tracked docs, and they must not call pi UI status, notification, or footer APIs. If tool names themselves are unacceptable for a release target, bridge debug diagnostics are not safe for shared logs under the current contract.
 - This repo does not provide a generic desktop-automation, browser-driver, or CDP recipe. Provider docs should describe pi-cursor-sdk's Cursor provider/bridge contract only.
-- Cursor internal tool activity is recorded from SDK events and scrubbed. Maintainer reference for all 16 `@cursor/sdk@^1.0.13` `ToolType` values, runtime alias normalization, and intentional mapping/fallback rules: [Cursor native tool replay — SDK ToolType replay matrix](./cursor-native-tool-replay.md#sdk-tooltype-replay-matrix) (official SDK docs: https://cursor.com/docs/sdk/typescript). In interactive TTY sessions, supported completed `read`, `bash`, `grep`, `find`, `ls`, `edit`, `write`, diagnostics, delete, todo/plan, task, image generation, MCP, semantic search, and screen recording activity is replayed through pi's native tool-call rendering path with recorded Cursor results, so the TUI can show native-looking cards without rerunning Cursor's reads/shell commands/file edits. Cursor `glob` activity is replayed through native `find` cards. Cursor write activity is replayed through native-looking `write` cards, and Cursor StrReplace/edit activity uses native-looking `edit` only when recorded arguments truthfully satisfy pi's `edit` schema; path-only Cursor edit and notebook edit replay falls back to neutral Cursor activity before pi validation. Diagnostics, delete, todos/plans, task, image, and MCP activity use neutral Cursor activity cards with pi's default success/error shell. Neutral Cursor activity calls include `activityTitle` and, when available, `activitySummary` so partial/collapsed cards preserve identity such as `Cursor plan`, `Cursor todos`, `Cursor MCP`, or `Cursor edit`. 
For long-running or externally meaningful Cursor tools (`task`, `shell`, `mcp`, `generateImage`, `recordScreen`, `semSearch`, web search/fetch, plan/todo), the provider may surface one low-noise deferred in-progress thinking line such as `Cursor MCP: external_search` from bounded, scrubbed SDK args; fast local tools (`read`, `grep`, `glob`, and similar) skip lifecycle lines when completion follows immediately, and pi bridge MCP calls are excluded because pi already shows real pi tool execution ([lifecycle visibility](./cursor-native-tool-replay.md#low-noise-tool-lifecycle-visibility)). Replay-only tools display recorded Cursor results, normalize workspace-local paths/diff headers for display, use pi diff colors for edit previews and path-inferred syntax highlighting for write previews, and fail closed if called without a recorded result. Native replay wrappers are registered only for tool names not already owned by another extension; conflicting tools use the bounded scrubbed transcript fallback. Cursor workflow tools such as `SwitchMode` and Cursor todo state are not pi workflow controls; reported todo/plan events are displayed as Cursor activity only. Plan/todo replay cards can be followed by Cursor's final plan text, selected from `run.wait().result` when Cursor provides one and trimmed against already-emitted text. Started Cursor SDK tool calls that never receive a completion event are surfaced with bounded user-visible labels/traces (neutral activity cards when native replay routing allows, otherwise the same inactive or transcript trace fallbacks used for completed replay) instead of being silently discarded; explicit failures remain visible when Cursor reports them through completed tool calls or step results. Pi bridge MCP starts remain excluded from duplicate incomplete Cursor cards because pi already shows real pi tool execution. 
`PI_CURSOR_NATIVE_TOOL_DISPLAY=0` disables native replay, and `PI_CURSOR_REGISTER_NATIVE_TOOLS=0` is a registration-only opt-out that keeps the transcript fallback without shadowing pi tool names. When bridge or native replay cards are emitted, the provider mirrors Codex's turn shape as Cursor SDK activity arrives: assistant `toolUse`, pi `toolResult`s, live post-tool Cursor thinking/text, any later tool batches as further `toolUse` turns, then Cursor's final assistant answer. For shell replay, completed `stdout` / `stderr` are primary; unambiguous `shell-output-delta` data is used only as display-only fallback for empty successful shell completions, and overlapping shell calls drop ambiguous deltas instead of guessing. Non-interactive runs keep bounded scrubbed transcript output instead, preserving `pi -p` assistant text output. Cursor text deltas stream live when no live-run turn split is active.
+- Cursor internal tool activity is recorded from SDK events and scrubbed. Maintainer reference for all 16 `@cursor/sdk@^1.0.13` `ToolType` values, runtime alias normalization, and intentional mapping/fallback rules: [Cursor native tool replay — SDK ToolType replay matrix](./cursor-native-tool-replay.md#sdk-tooltype-replay-matrix) (official SDK docs: https://cursor.com/docs/sdk/typescript). In interactive TTY sessions, supported completed `read`, `bash`, `grep`, `find`, `ls`, `edit`, `write`, diagnostics, delete, todo/plan, task, image generation, MCP, semantic search, and screen recording activity is replayed through pi's native tool-call rendering path with recorded Cursor results, so the TUI can show native-looking cards without rerunning Cursor's reads/shell commands/file edits. Cursor `glob` activity is replayed through native `find` cards. Cursor write activity is replayed through native-looking `write` cards, and Cursor StrReplace/edit activity uses native-looking `edit` only when recorded arguments truthfully satisfy pi's `edit` schema; path-only Cursor edit and notebook edit replay falls back to neutral Cursor activity before pi validation. Diagnostics, delete, todos/plans, task, image, and MCP activity use neutral Cursor activity cards with pi's default success/error shell. Neutral Cursor activity calls include `activityTitle` and, when available, `activitySummary` so partial/collapsed cards preserve identity such as `Cursor plan`, `Cursor todos`, `Cursor MCP`, or `Cursor edit`. 
For long-running or externally meaningful Cursor tools (`task`, `shell`, `mcp`, `generateImage`, `recordScreen`, `semSearch`, web search/fetch, plan/todo), the provider may surface one low-noise deferred in-progress thinking line such as `Cursor MCP: external_search` from bounded, scrubbed SDK args; fast local tools (`read`, `grep`, `glob`, and similar) skip lifecycle lines when completion follows immediately, and pi bridge MCP calls are excluded because pi already shows real pi tool execution ([lifecycle visibility](./cursor-native-tool-replay.md#low-noise-tool-lifecycle-visibility)). Replay-only tools display recorded Cursor results, normalize workspace-local paths/diff headers for display, use pi diff colors for edit previews and path-inferred syntax highlighting for write previews, and fail closed if called without a recorded result. Native replay wrappers are registered only for tool names not already owned by another extension; conflicting tools use the bounded scrubbed transcript fallback. Cursor workflow tools such as `SwitchMode` and Cursor todo state are not pi workflow controls; reported todo/plan events are displayed as Cursor activity only. Plan/todo replay cards can be followed by Cursor's final plan text, selected from `run.wait().result` when Cursor provides one and trimmed against already-emitted text. Started Cursor SDK tool calls that never receive a completion event are surfaced with bounded user-visible labels/traces (neutral activity cards when native replay routing allows, otherwise the same inactive or transcript trace fallbacks used for completed replay) instead of being silently discarded when the run failed/aborted, produced no assistant text, or involved external/side-effectful tools; incomplete fast local discovery starts (`read`, `grep`, `glob`, `ls`) remain maintainer-debug-only after successful text-producing runs so stale SDK start events do not create red post-answer cards. 
Explicit failures remain visible when Cursor reports them through completed tool calls or step results. Pi bridge MCP starts remain excluded from duplicate incomplete Cursor cards because pi already shows real pi tool execution. `PI_CURSOR_NATIVE_TOOL_DISPLAY=0` disables native replay, and `PI_CURSOR_REGISTER_NATIVE_TOOLS=0` is a registration-only opt-out that keeps the transcript fallback without shadowing pi tool names. When bridge or native replay cards are emitted, the provider mirrors Codex's turn shape as Cursor SDK activity arrives: assistant `toolUse`, pi `toolResult`s, live post-tool Cursor thinking/text, any later tool batches as further `toolUse` turns, then Cursor's final assistant answer. For shell replay, completed `stdout` / `stderr` are primary; unambiguous `shell-output-delta` data is used only as display-only fallback for empty successful shell completions, and overlapping shell calls drop ambiguous deltas instead of guessing. Non-interactive runs keep bounded scrubbed transcript output instead, preserving `pi -p` assistant text output. Cursor text deltas stream live when no live-run turn split is active.
 - Synthetic replay names are internal compatibility details. New model-facing prompt text and user-visible cards use native tool names when renderer-compatible, or neutral Cursor activity labels when not. Legacy sessions containing old internal replay names are sanitized before prompt/display. Bridge MCP names such as `pi__sem_reindex` are MCP-only; pi session output uses real pi tool names.
 - Cursor SDK usage events report cumulative internal agent/tool/cache work, not the replayable pi prompt context. The extension does not copy raw Cursor SDK usage into pi usage or compaction. For Cursor assistant messages, `usage.input`/`usage.output` are approximate pi session activity components: initial Cursor prompt input is counted once, consumed split-run tool results are counted as deduped input on the following assistant turn, and assistant output includes visible text/thinking/tool-call content. `usage.totalTokens` is the replayable Cursor prompt/context estimate derived from the same `buildCursorPrompt()` path used for `Agent.send`; it may differ from `input + output` and is the context-safe value for display/compaction. `src/cursor-usage-accounting.ts` owns this usage policy, and `src/cursor-live-run-accounting.ts` owns prompt-once and consumed-tool-result accounting so provider usage and bridge result resolution share the same matched tool-result boundary.
-- Audit observation, 2026-05-19, superseded by the 2026-05-21 replay pass and #68 incomplete visibility: a missing-file read with Composer 2.5 emitted `tool-call-started` for Cursor `read`, then streamed final text `Error: File not found`, but did not emit `tool-call-completed` or an `onStep` `toolCall` error result. Leftover started calls are now surfaced at run completion through the same native replay routing as completed tools (activity cards when allowed, otherwise inactive/transcript traces) instead of becoming synthetic replay errors or bogus `cursor` toolUse events. Cursor-reported completed/step errors remain visible.
+- Audit observation, 2026-05-19, superseded by the 2026-05-21 replay pass and #68 incomplete visibility, then narrowed by the 2026-05-26 fast-local suppression: a missing-file read with Composer 2.5 emitted `tool-call-started` for Cursor `read`, then streamed final text `Error: File not found`, but did not emit `tool-call-completed` or an `onStep` `toolCall` error result. Leftover external/side-effectful started calls are surfaced at run completion through the same native replay routing as completed tools (activity cards when allowed, otherwise inactive/transcript traces), while fast local discovery starts are debug-only after a successful text-producing run. Cursor-reported completed/step errors remain visible.
 - Maintainer visual verification for replay-card changes should follow [Cursor Native Tool Visual Audit Workflow](./cursor-native-tool-visual-audit.md): offscreen PTY-driven pi run, xterm.js/Playwright screenshot rendering, and JSONL inspection before accepting commits or PRs.
 - Cursor provider/runtime releases should follow [Cursor Live Smoke Checklist](./cursor-live-smoke-checklist.md) with real `pi -e . --cursor-no-fast --model cursor/composer-2.5` invocations, manual observation, temporary session dirs, diagnostics scans, and persisted JSONL inspection. See [Cursor testing lessons](./cursor-testing-lessons.md) for auth.json seeding, isolated smoke harnesses, and replay JSONL scans. Assume every runtime surface is in scope. A release is not ready when any live check is optional, deferred, mostly passing, or unobserved.
 - For models without a catalog `context` parameter, context windows are not hardcoded. The extension ships a bundled SDK-derived default/non-Max cache generated from `createAgentPlatform().checkpointStore.loadLatest(agentId).tokenDetails.maxTokens`. Successful runs can update a local override cache, but model discovery does not probe models at startup.
 - Max Mode context windows are distinct from default/non-Max context windows. `@cursor/sdk` 1.0.13 documentation says the SDK may enable Max Mode automatically when a selected model requires it, but the public local-agent `ModelSelection` path still does not expose a manual Max Mode selector. Do not advertise Max Mode context windows unless the SDK catalog exposes an exact parameter/variant or the SDK public API adds a Max Mode selector that the extension actually sends.
 - `@cursor/sdk` 1.0.13 adds latest-style `ModelListItem.aliases`. The extension registers only unambiguous aliases as pi model IDs (with the same context suffixes when applicable) and sends the alias back in `ModelSelection.id`, while sharing Cursor-only state such as fast defaults with the underlying catalog `id`. Aliases shared by multiple base models, such as generic family aliases, are skipped because the pi row metadata would otherwise imply one base model while Cursor may resolve the alias to another.
 - Session-scoped Cursor SDK agent pooling reuses one live `@cursor/sdk` agent across compatible follow-up turns within the same pi session scope. `planCursorSessionSend()` in `src/cursor-session-send-policy.ts` decides whether the next turn sends a full bootstrap prompt or an incremental follow-up, whether the SDK agent must be recreated, and why. `computeCursorContextFingerprint()` and `shouldBootstrapCursorContext()` remain the context-only bootstrap signal. The pool recreates the agent when context diverges, when branch or compaction summaries appear after `/tree` navigation or compaction, after 20 completed incremental sends, when the API key identity changes, after send errors, on `session_shutdown`, and when `session_before_tree` / `session_tree` invalidate the active branch. Incremental sends omit the full Cursor SDK tool boundary block because the session agent retains prior bootstrap context, but every send ends with a short tool tail guard placed after the latest user request.
-- Pi steering/follow-up delivery can arrive while a split live Cursor SDK run is still active. The provider resolves pending live runs by scanning trailing `toolResult` messages while skipping trailing `user` messages, tracks the active live run per session scope, and resumes the in-flight run instead of calling `Agent.send()` again. When the context ends with steering user text after tool results, the provider releases the prior live run and chains an incremental `Agent.send()` for the latest user message in the same provider turn; if the prior run emits more text or tool requests after steering arrives, that stale activity is cancelled instead of surfacing another old-run tool turn and losing the new user input. A pre-send guard waits for or resumes any still-active scoped live run before starting a fresh send so `@cursor/sdk` `AgentBusyError` (`already has active run`) does not surface to pi users.
+- Pi steering/follow-up delivery can arrive while a split live Cursor SDK run is still active. The provider resolves pending live runs by scanning trailing `toolResult` messages while skipping trailing `user` messages, tracks the active live run per session scope, and resumes the in-flight run instead of calling `Agent.send()` again. When the context ends with steering user text after tool results, the provider releases the prior live run and chains an incremental `Agent.send()` for the latest user message in the same provider turn; if the prior run emits more text or tool requests after steering arrives, that stale activity is cancelled instead of surfacing another old-run tool turn and losing the new user input. A pre-send guard waits for or resumes any still-active scoped live run before starting a fresh send so `@cursor/sdk` `AgentBusyError` (`already has active run`) does not surface to pi users. `acquireSessionCursorAgent()` also awaits fire-and-forget background `run.wait()` cleanup for the current pooled agent instance before returning a lease, so send planning, transcript offsets, and later `Agent.send()` do not race the prior turn's SDK run completion (for example pi auto-compaction summarization). Tracked completions and send commits are scoped to the pooled agent `instanceId` so disposal/replacement drops stale tracking and ignores late commits from disposed agents.
 ## Goal
@@ -678,5 +678,6 @@ Before calling done:
    - confirm bridged MCP requests emit real pi tool calls and resolve matching pi tool results back to the same live Cursor SDK run without creating a new `Agent`, unless the run was disposed, aborted, or cancelled
    - confirm bridge MCP activity is suppressed from Cursor replay while non-bridge Cursor MCP activity remains visible
    - confirm `PI_CURSOR_MCP_TOOL_TIMEOUT_MS` and `PI_CURSOR_MCP_TOOL_TIMEOUT_SECONDS` override the Cursor SDK MCP callTool timeout seam
+   - confirm `PI_CURSOR_MCP_CONNECT_TIMEOUT_MS` and `PI_CURSOR_MCP_CONNECT_TIMEOUT_SECONDS` override the Cursor SDK MCP initialize/listTools timeout seam while unknown protocol timeout stacks keep the SDK default
    - confirm `PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1` emits typed, allowlisted, scrubbed JSONL to `process.stderr` with prefix `[pi-cursor-sdk:bridge]`, omits endpoint URLs/path components/tokens, and unset/false leaves output unchanged
    - run the visual audit workflow when replay card visuals or bridge card visuals change; JSONL should show real pi tool names for bridged calls and no duplicate MCP replay for bridge calls
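The checklist item on the initialize/listTools timeout seam concerns failing fast when a configured MCP server never answers. The sketch below is conceptual. The documented override adjusts the Cursor SDK's own MCP protocol timeout; it does not wrap calls. Here a plain `Promise.race` shows the observable effect instead. `withTimeout` and `unresponsiveInitialize` are hypothetical stand-ins.

```javascript
// Conceptual sketch only: the real override changes the Cursor SDK's MCP
// timeout seam. withTimeout and unresponsiveInitialize are hypothetical.
function withTimeout(promise, timeoutMs, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${timeoutMs}ms`)), timeoutMs);
  });
  // Clear the timer whichever side settles first so the process can exit.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Stand-in for an unavailable MCP server whose initialize never answers.
function unresponsiveInitialize() {
  return new Promise(() => {});
}
```

With a 10-second bound, an unreachable server costs at most about 10 seconds on first send rather than the SDK's 60-second default. A server that answers promptly passes through unchanged.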

package/docs/cursor-native-tool-replay.md CHANGED

@@ -28,10 +28,12 @@ PI_CURSOR_PI_TOOL_BRIDGE=0 pi --model cursor/composer-2.5
 PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1 pi --model cursor/composer-2.5
 PI_CURSOR_MCP_TOOL_TIMEOUT_SECONDS=7200 pi --model cursor/composer-2.5
 PI_CURSOR_MCP_TOOL_TIMEOUT_MS=7200000 pi --model cursor/composer-2.5
+PI_CURSOR_MCP_CONNECT_TIMEOUT_SECONDS=5 pi --model cursor/composer-2.5
+PI_CURSOR_MCP_CONNECT_TIMEOUT_MS=5000 pi --model cursor/composer-2.5
 PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1 pi --model cursor/composer-2.5
 ```
-`PI_CURSOR_PI_TOOL_BRIDGE=0` disables the bridge, including `pi__cursor_ask_question`. `PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1` opts in to exposing overlapping pi tool names that Cursor already has native equivalents for (`read`, `bash`, `write`, `edit`, `grep`, `find`, and `ls`). By default those names are hidden even when pi's Cursor replay wrapper has registered them as extension tools; non-overlapping active built-ins remain bridgeable by default. `PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1` emits typed, allowlisted, scrubbed single-line JSONL bridge diagnostics to `process.stderr` with prefix `[pi-cursor-sdk:bridge]`; it is off by default, uses run-safe IDs that are not reused in endpoint paths, and does not print endpoint URLs/path components/tokens, raw args/results, file contents, or secrets. Cursor-native tools, Cursor settings, plugins, and configured Cursor MCP servers still come from the Cursor SDK local agent path. Cloud Cursor agents are out of scope for this bridge.
+`PI_CURSOR_PI_TOOL_BRIDGE=0` disables the bridge, including `pi__cursor_ask_question`. `PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1` opts in to exposing overlapping pi tool names that Cursor already has native equivalents for (`read`, `bash`, `write`, `edit`, `grep`, `find`, and `ls`). By default those names are hidden even when pi's Cursor replay wrapper has registered them as extension tools; non-overlapping active built-ins remain bridgeable by default. The installed Cursor SDK uses a 60-second MCP protocol default; pi-cursor-sdk overrides that seam by default with 3600 seconds for MCP `callTool` requests and 10 seconds for verified initialize/listTools requests on first send. Unknown MCP protocol timeout stacks keep the SDK default. `PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1` emits typed, allowlisted, scrubbed single-line JSONL bridge diagnostics to `process.stderr` with prefix `[pi-cursor-sdk:bridge]`; it is off by default, uses run-safe IDs that are not reused in endpoint paths, and does not print endpoint URLs/path components/tokens, raw args/results, file contents, or secrets. Cursor-native tools, Cursor settings, plugins, and configured Cursor MCP servers still come from the Cursor SDK local agent path. Cloud Cursor agents are out of scope for this bridge.
 ## What gets replayed
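The connect-timeout variables shown above can be resolved with a small helper. This is a sketch under assumptions the docs do not confirm. `resolveConnectTimeoutMs` is a hypothetical name. `PI_CURSOR_MCP_CONNECT_TIMEOUT_MS` is assumed to win when both variables are set. Empty, non-numeric, or non-positive values are assumed to fall back to the documented 10-second default.

```javascript
// Hypothetical sketch: resolveConnectTimeoutMs is not pi-cursor-sdk's real helper.
// Assumptions: *_MS takes precedence over *_SECONDS, and invalid or non-positive
// values fall back to the documented 10-second initialize/listTools default.
const DEFAULT_CONNECT_TIMEOUT_MS = 10_000;

function parsePositive(raw) {
  if (raw === undefined || raw.trim() === "") return undefined;
  const value = Number(raw);
  return Number.isFinite(value) && value > 0 ? value : undefined;
}

function resolveConnectTimeoutMs(env) {
  const ms = parsePositive(env.PI_CURSOR_MCP_CONNECT_TIMEOUT_MS);
  if (ms !== undefined) return Math.round(ms);
  const seconds = parsePositive(env.PI_CURSOR_MCP_CONNECT_TIMEOUT_SECONDS);
  if (seconds !== undefined) return Math.round(seconds * 1000);
  return DEFAULT_CONNECT_TIMEOUT_MS;
}
```

For example, `resolveConnectTimeoutMs(process.env)` with `PI_CURSOR_MCP_CONNECT_TIMEOUT_SECONDS=5` yields `5000`, matching the `PI_CURSOR_MCP_CONNECT_TIMEOUT_MS=5000` example.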
@@ -58,7 +60,7 @@ Cursor `glob` activity is displayed through native `find` cards.
 For the full `@cursor/sdk@^1.0.13` `ToolType` set, disposition matrix, and runtime alias normalization, see [SDK ToolType replay matrix](#sdk-tooltype-replay-matrix) below. Official SDK reference: https://cursor.com/docs/sdk/typescript
-Edit and write activity replays through pi-facing `edit` and `write` cards only when replay arguments truthfully satisfy the matching pi schema, but still uses recorded Cursor results only. The adapter passes through truthful Cursor paths, content when Cursor reported it, and recorded diff/details; it does not pretend Cursor's editing schema is pi's schema and it fails closed if a recorded replay result is missing. Cursor `StrReplace` with recorded replacement text displays as native-looking `edit`; path-only Cursor `edit` and notebook edit activity fall back to neutral Cursor activity so pi does not reject the replay before recorded-result handling. Cursor `write` displays as native-looking `write`. Diagnostics, delete, todos/plans, task, image, MCP, semantic search, screen recording, and web search/fetch activity use neutral Cursor activity cards with pi's default success/error tool shell. MCP completions whose `toolName` is `WebSearch` / `web_search` / `WebFetch` / similar are labeled **Cursor web search** or **Cursor web fetch** instead of generic **Cursor MCP**. Neutral Cursor activity cards carry display metadata such as `activityTitle` and `activitySummary`, so partial/collapsed cards can say `Cursor plan`, `Cursor todos`, `Cursor MCP`, `Cursor semantic search`, `Cursor screen recording`, `Cursor web search`, `Cursor web fetch`, or `Cursor edit` instead of only `Cursor activity`. These replay tools only display recorded Cursor results; they never mutate files or execute tool work directly. Replay paths are normalized to workspace-relative paths when possible. Collapsed replay cards include bounded previews for diffs and text details so small edits, todos, task output, and MCP results are visible without expanding; edit previews omit raw unified diff headers and show compact numbered changed/context lines using pi's native diff added/removed/context colors, and write previews use syntax highlighting when pi can infer a language from the path. 
Image generation replay cards show the saved image path in the collapsed summary and render the image inline when pi terminal image display is enabled and the generated file is still readable.
+Edit and write activity replays through pi-facing `edit` and `write` cards only when replay arguments truthfully satisfy the matching pi schema, but still uses recorded Cursor results only. The adapter passes through truthful Cursor paths, content when Cursor reported it, and recorded diff/details; it does not pretend Cursor's editing schema is pi's schema and it fails closed if a recorded replay result is missing. Cursor `StrReplace` with recorded replacement text displays as native-looking `edit`; path-only Cursor `edit` and notebook edit activity fall back to neutral Cursor activity so pi does not reject the replay before recorded-result handling. Cursor `write` displays as native-looking `write`. Diagnostics, delete, todos/plans, task, image, MCP, semantic search, screen recording, and web search/fetch activity use neutral Cursor activity cards with pi's default success/error tool shell. MCP completions whose `toolName` is `WebSearch` / `web_search` / `WebFetch` / similar are labeled **Cursor web search** or **Cursor web fetch** instead of generic **Cursor MCP**. Neutral Cursor activity cards carry display metadata such as `activityTitle` and `activitySummary`, so partial/collapsed cards can say `Cursor plan`, `Cursor todos`, `Cursor MCP`, `Cursor semantic search`, `Cursor screen recording`, `Cursor web search`, `Cursor web fetch`, or `Cursor edit` instead of only `Cursor activity`. These replay tools only display recorded Cursor results; they never mutate files or execute tool work directly. Replay paths are normalized to workspace-relative paths when possible. Most collapsed replay cards include bounded previews for diffs and text details so small edits, todos, task output, and MCP results are visible without expanding; web search/fetch activity stays summary-only while collapsed because those cards often arrive after final text and can otherwise bury the answer. Ctrl+O expansion shows the recorded details. 
Edit previews omit raw unified diff headers and show compact numbered changed/context lines using pi's native diff added/removed/context colors, and write previews use syntax highlighting when pi can infer a language from the path. Image generation replay cards show the saved image path in the collapsed summary and render the image inline when pi terminal image display is enabled and the generated file is still readable.
 ## SDK ToolType replay matrix
@@ -139,7 +141,7 @@ Native replay is display-only:
 If a Cursor read completion reports no content, the extension may include a bounded local file preview for safe in-workspace paths. That preview is labeled as a local preview captured at transcript time, not guaranteed Cursor-observed content.
-Other unsupported Cursor SDK tools may still be described through a bounded scrubbed activity transcript when the SDK reports completed tool-call data. Started Cursor SDK tool calls that never receive a completion event are surfaced as neutral **Cursor … did not complete** activity cards or equivalent low-noise thinking traces with a bounded reason such as `missing completion`, `aborted`, or `SDK run failed`. They are not replayed as successful results and raw args/results/errors are not dumped. Explicit failures remain visible when Cursor reports an error through a completed tool call or step result. Some Cursor-internal workflow actions (including web search/fetch that never surfaces as replayable SDK tool completions or local transcript web tool records) may only appear in Cursor's own thinking stream, assistant text, or not be reported as replayable SDK tool data at all.
+Other unsupported Cursor SDK tools may still be described through a bounded scrubbed activity transcript when the SDK reports completed tool-call data. Started Cursor SDK tool calls that never receive a completion event are surfaced as neutral **Cursor … did not complete** activity cards or equivalent low-noise thinking traces with a bounded reason such as `missing completion`, `aborted`, or `SDK run failed` when the run failed/aborted, produced no assistant text, or involved external/side-effectful tools. Incomplete fast local discovery starts (`read`, `grep`, `glob`, `ls`) are recorded for maintainer debug but suppressed from user-visible output after a successful text-producing run, because those are often stale SDK start events that would otherwise create confusing red post-answer cards such as **Cursor find did not complete**. They are not replayed as successful results and raw args/results/errors are not dumped. Explicit failures remain visible when Cursor reports an error through a completed tool call or step result. Some Cursor-internal workflow actions (including web search/fetch that never surfaces as replayable SDK tool completions or local transcript web tool records) may only appear in Cursor's own thinking stream, assistant text, or not be reported as replayable SDK tool data at all.
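The visibility rule for incomplete starts can be written as one predicate. This is a minimal sketch, and `classifyIncompleteStart` with its input shape is hypothetical, not the API of `cursor-tool-visibility.ts`. The `runStatus` values `"finished"`, `"error"`, and `"aborted"` are also assumed. The sketch encodes three documented rules. pi bridge (`pi__*`) calls are excluded. Fast local discovery starts become debug-only after a successful text-producing run. Everything else gets a user-visible card.

```javascript
// Hypothetical sketch of the documented rule; names and the runStatus values
// ("finished" | "error" | "aborted") are illustrative assumptions.
const FAST_LOCAL_DISCOVERY = new Set(["read", "grep", "glob", "ls"]);

function classifyIncompleteStart({ toolName, runStatus, producedText }) {
  // pi bridge calls already show the real pi tool execution path.
  if (toolName.startsWith("pi__")) return "excluded";
  const runSucceeded = runStatus !== "error" && runStatus !== "aborted";
  // Stale fast-local starts after a successful, text-producing run stay debug-only.
  if (runSucceeded && producedText && FAST_LOCAL_DISCOVERY.has(toolName)) {
    return "debug-only";
  }
  // Failed/aborted runs, runs without text, and external or side-effectful tools.
  return "card";
}
```

A stale `glob` start after a normal answer is classified `"debug-only"`. The same start in an aborted run is a `"card"`, because the gap is then actionable.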
 ## SDK reporting limits
@@ -150,7 +152,7 @@ These are integration boundaries, not pi replay bugs:
 - **Future SDK tools:** Cursor's official SDK docs say tool names, args, and result payloads can change. Unknown completed tools therefore fall back to neutral Cursor activity cards with bounded, scrubbed text. The extension cannot render tools that the SDK never emits.
 - **Abort exceptions:** user aborts are guarded for the observed Cursor SDK ConnectRPC cancellation shape. A materially different future SDK process-level abort error must be added to the guard after it is observed; broad suppression would hide real crashes.
-Maintainer debug (`PI_CURSOR_SDK_EVENT_DEBUG=1`) still records the same discarded started-call events in `coordinator-events.jsonl` under phase `discarded-incomplete-started-tool-call` for investigation (**#52**). User-visible incomplete cards and debug artifacts are complementary: cards explain the gap in the TUI; debug files retain normalized tool names and scrubbed call-id hashes without changing default stderr behavior.
+Maintainer debug (`PI_CURSOR_SDK_EVENT_DEBUG=1`) still records the same discarded started-call events in `coordinator-events.jsonl` under phase `discarded-incomplete-started-tool-call` for investigation (**#52**), including fast local starts suppressed from successful text-producing runs. User-visible incomplete cards and debug artifacts are complementary: cards explain actionable gaps in the TUI; debug files retain normalized tool names and scrubbed call-id hashes without changing default stderr behavior.
 ## Low-noise tool lifecycle visibility

package/docs/cursor-testing-lessons.md CHANGED

@@ -313,7 +313,7 @@ Capture is file-only by default: no stderr markers, and bridge diagnostics durin
 ### Discarded incomplete SDK tool calls
-When Cursor emits `tool-call-started` without a matching completion/step result, the provider surfaces a bounded neutral **Cursor … did not complete** activity card or thinking trace at run end. pi bridge MCP calls (`pi__*`) are excluded because pi already shows the real pi tool execution path.
+When Cursor emits `tool-call-started` without a matching completion/step result, the provider surfaces a bounded neutral **Cursor … did not complete** activity card or thinking trace at run end for failed/aborted runs, runs with no assistant text, and external/side-effectful tools. Incomplete fast local discovery starts (`read`, `grep`, `glob`, `ls`) are debug-only after a successful text-producing run so stale SDK start events do not create red post-answer cards. pi bridge MCP calls (`pi__*`) are excluded because pi already shows the real pi tool execution path.
 With `PI_CURSOR_SDK_EVENT_DEBUG=1`, each discarded started call is also recorded in `coordinator-events.jsonl` under phase `discarded-incomplete-started-tool-call` with:
@@ -321,7 +321,7 @@ With `PI_CURSOR_SDK_EVENT_DEBUG=1`, each discarded started call is also recorded
 - scrubbed call-id hash (raw call IDs are not written)
 - reason such as `no-completion-at-run-end`, `abort`, or `sdk-failure`
-Stderr output for these records requires `PI_CURSOR_SDK_EVENT_DEBUG_STDERR=1`. This complements the standalone `npm run debug:sdk-events` probe by interpreting a specific provider discard path during normal pi runs. User-visible incomplete cards explain the gap in the TUI; debug artifacts remain maintainer-only (**#52**).
+Stderr output for these records requires `PI_CURSOR_SDK_EVENT_DEBUG_STDERR=1`. This complements the standalone `npm run debug:sdk-events` probe by interpreting a specific provider discard path during normal pi runs. User-visible incomplete cards explain actionable gaps in the TUI; debug artifacts remain maintainer-only (**#52**) and are the source of truth for suppressed fast-local stale starts.
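One `discarded-incomplete-started-tool-call` record might be assembled as follows. This is a sketch under assumptions. The field names are illustrative, lower-casing stands in for "normalized tool names", and a 32-bit FNV-1a hash stands in for the provider's call-id scrubbing, whose algorithm is not shown here. The documented part is that the raw call ID never reaches the record.

```javascript
// Hypothetical sketch: field names, lower-case normalization, and FNV-1a are
// stand-ins; the provider's real scrubbing/hashing is not shown in these docs.
function fnv1a32(text) {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep uint32
  }
  return hash.toString(16).padStart(8, "0");
}

function buildDiscardRecord({ toolName, callId, reason }) {
  return {
    phase: "discarded-incomplete-started-tool-call",
    toolName: toolName.toLowerCase(),
    callIdHash: fnv1a32(callId), // the raw call ID is never written
    reason, // e.g. "no-completion-at-run-end", "abort", or "sdk-failure"
  };
}
```

A stable hash still lets a maintainer correlate the started event with other records for the same call without exposing the ID itself.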
 ## Tool calls listed as plain text (#40 triage)

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
 	"name": "pi-cursor-sdk",
-	"version": "0.1.19",
+	"version": "0.1.20",
 	"description": "pi provider extension backed by @cursor/sdk local agents",
 	"author": "Mitch Fultz (https://github.com/fitchmultz)",
 	"license": "MIT",
@@ -28,6 +28,7 @@
 		"scripts/tmux-live-smoke.sh",
 		"scripts/isolated-cursor-smoke.sh",
 		"scripts/validate-smoke-jsonl.mjs",
+		"scripts/probe-mcp-coldstart.mjs",
 		"scripts/debug-sdk-events.mjs",
 		"scripts/debug-provider-events.mjs",
 		"scripts/lib/cursor-probe-utils.mjs",
@@ -55,7 +56,8 @@
 		"smoke:steering": "node scripts/steering-rpc-smoke.mjs",
 		"smoke:jsonl": "node scripts/validate-smoke-jsonl.mjs",
 		"debug:sdk-events": "node scripts/debug-sdk-events.mjs",
-		"debug:provider-events": "node scripts/debug-provider-events.mjs"
+		"debug:provider-events": "node scripts/debug-provider-events.mjs",
+		"debug:mcp-coldstart": "node scripts/probe-mcp-coldstart.mjs"
 	},
 	"dependencies": {
 		"@cursor/sdk": "^1.0.13",

package/scripts/probe-mcp-coldstart.mjs ADDED

@@ -0,0 +1,244 @@
+#!/usr/bin/env node
+/**
+ * Maintainer probe: measure Cursor SDK cold-start timing with/without ambient MCP settings
+ * and with the pi-cursor-sdk MCP connect timeout override installed.
+ */
+import { spawn } from "node:child_process";
+import { performance } from "node:perf_hooks";
+import { fileURLToPath } from "node:url";
+import {
+	installCursorMcpToolTimeoutOverride,
+	restoreCursorMcpToolTimeoutOverride,
+} from "../src/cursor-mcp-timeout-override.ts";
+import { scrubSensitiveText } from "./lib/cursor-probe-utils.mjs";
+import { installCursorSdkOutputFilter, suppressCursorSdkOutput } from "./lib/cursor-sdk-output-filter.mjs";
+const SCRIPT_PATH = fileURLToPath(import.meta.url);
+const SCENARIOS = [
+	{ label: "with-all-settings", settingSources: ["all"] },
+	{ label: "with-all-settings+connect-override", settingSources: ["all"], installConnectOverride: true },
+	{ label: "no-setting-sources", settingSources: undefined },
+];
+function printHelp() {
+	console.log(`Measure Cursor SDK first-send MCP cold-start timing.
+Usage:
+  CURSOR_API_KEY=... npm run debug:mcp-coldstart
+  node scripts/probe-mcp-coldstart.mjs [options]
+Options:
+  --api-key <key>     Cursor API key. Prefer CURSOR_API_KEY to avoid shell history.
+  --scenario <label>  Run one scenario in this process. Used by the orchestrator.
+  -h, --help          Show this help without importing or calling the Cursor SDK.
+Stdout:
+  Emits one JSON object per scenario. Human status lines go to stderr.
+Scenarios:
+  with-all-settings                   Cursor settingSources=["all"]
+  with-all-settings+connect-override  Same, with pi-cursor-sdk timeout override installed
+  no-setting-sources                  No explicit settingSources
+Safety:
+  - --help never performs live Cursor calls.
+  - Each default scenario runs in a fresh child process before its first Cursor SDK import.
+  - SDK startup noise is suppressed.
+  - Error messages are scrubbed for API keys, bearer tokens, cookies, and bridge endpoints.`);
+}
+function fail(message, apiKey) {
+	console.error(`probe-mcp-coldstart: ${scrubSensitiveText(message, apiKey)}`);
+	process.exit(1);
+}
+function findScenario(label) {
+	return SCENARIOS.find((scenario) => scenario.label === label);
+}
+function parseArgs(argv, env = process.env) {
+	const args = {
+		apiKey: env.CURSOR_API_KEY?.trim() || undefined,
+		help: false,
+		scenario: undefined,
+	};
+	for (let index = 0; index < argv.length; index++) {
+		const arg = argv[index];
+		if (arg === "-h" || arg === "--help") {
+			args.help = true;
+			continue;
+		}
+		if (arg === "--api-key") {
+			const value = argv[++index];
+			if (!value || value.startsWith("--")) fail("--api-key requires a value", args.apiKey);
+			args.apiKey = value.trim();
+			continue;
+		}
+		if (arg.startsWith("--api-key=")) {
+			args.apiKey = arg.slice("--api-key=".length).trim();
+			continue;
+		}
+		if (arg === "--scenario") {
+			const value = argv[++index];
+			if (!value || value.startsWith("--")) fail("--scenario requires a value", args.apiKey);
+			args.scenario = value.trim();
+			continue;
+		}
+		if (arg.startsWith("--scenario=")) {
+			args.scenario = arg.slice("--scenario=".length).trim();
+			continue;
+		}
+		fail(`unknown argument: ${arg}`, args.apiKey);
+	}
+	if (args.scenario && !findScenario(args.scenario)) {
+		fail(`unknown scenario: ${args.scenario}`, args.apiKey);
+	}
+	return args;
+}
+async function probe(Agent, apiKey, label, { settingSources, installConnectOverride = false } = {}) {
+	let agent;
+	try {
+		const marks = [];
+		const t0 = performance.now();
+		const mark = (name) => marks.push({ name, ms: Math.round(performance.now() - t0) });
+		mark("start");
+		agent = await suppressCursorSdkOutput(() =>
+			Agent.create({
+				apiKey,
+				model: { id: "composer-2.5" },
+				local: settingSources
+					? { cwd: process.cwd(), settingSources }
+					: { cwd: process.cwd() },
+			}),
+		);
+		mark("agent.create");
+		let firstDeltaMs;
+		const run = await suppressCursorSdkOutput(() =>
+			agent.send("Reply with exactly: pong", {
+				onDelta: ({ update }) => {
+					if (firstDeltaMs === undefined && update.type === "text-delta") {
+						firstDeltaMs = Math.round(performance.now() - t0);
+						mark("first-delta");
+					}
+				},
+			}),
+		);
+		mark("agent.send-returned");
+		const result = await suppressCursorSdkOutput(() => run.wait());
+		mark("run.wait");
+		await suppressCursorSdkOutput(() => agent[Symbol.asyncDispose]());
+		agent = undefined;
+		mark("dispose");
+		const sendReturnedMs = marks.find((entry) => entry.name === "agent.send-returned")?.ms;
+		const mcpBlockingMs =
+			firstDeltaMs !== undefined && sendReturnedMs !== undefined ? firstDeltaMs - sendReturnedMs : undefined;
+		return {
+			label,
+			settingSources: settingSources ?? null,
+			installConnectOverride,
+			marks,
+			firstDeltaMs,
+			mcpBlockingMs,
+			status: result.status,
+			text: typeof result.result === "string" ? result.result.slice(0, 120) : null,
+		};
+	} finally {
+		if (agent) {
+			await suppressCursorSdkOutput(() => agent[Symbol.asyncDispose]()).catch(() => undefined);
+		}
+	}
+}
+async function runScenarioInThisProcess(args, scenario) {
+	const restoreOutputFilter = installCursorSdkOutputFilter();
+	try {
+		if (scenario.installConnectOverride) {
+			const state = installCursorMcpToolTimeoutOverride();
+			console.error(
+				`probe-mcp-coldstart: installed connect override (${state.connectTimeoutMs}ms initialize/listTools, ${state.timeoutMs}ms callTool)`,
+			);
+		}
+		const { Agent } = await suppressCursorSdkOutput(() => import("@cursor/sdk"));
+		console.log(JSON.stringify(await probe(Agent, args.apiKey, scenario.label, scenario)));
+	} catch (error) {
+		const message = error instanceof Error ? error.message : String(error);
+		console.log(
+			JSON.stringify({
+				label: scenario.label,
+				error: scrubSensitiveText(message, args.apiKey),
+			}),
+		);
+	} finally {
+		restoreCursorMcpToolTimeoutOverride();
+		restoreOutputFilter();
+	}
+}
+function runScenarioChild(args, scenario) {
+	return new Promise((resolve) => {
+		const child = spawn(process.execPath, [SCRIPT_PATH, "--scenario", scenario.label], {
+			cwd: process.cwd(),
+			env: { ...process.env, CURSOR_API_KEY: args.apiKey },
+			stdio: ["ignore", "pipe", "pipe"],
+		});
+		let stdout = "";
+		let stderr = "";
+		child.stdout.on("data", (chunk) => {
+			stdout += chunk;
+		});
+		child.stderr.on("data", (chunk) => {
+			stderr += chunk;
+		});
+		child.on("error", (error) => {
+			stderr += error instanceof Error ? error.message : String(error);
+		});
+		child.on("close", (code) => {
+			const scrubbedStderr = scrubSensitiveText(stderr, args.apiKey);
+			if (scrubbedStderr) process.stderr.write(scrubbedStderr.endsWith("\n") ? scrubbedStderr : `${scrubbedStderr}\n`);
+			if (code === 0 && stdout.trim()) {
+				process.stdout.write(stdout.endsWith("\n") ? stdout : `${stdout}\n`);
+				resolve();
+				return;
+			}
+			const error = scrubbedStderr.trim() || `child process exited with code ${code ?? "unknown"}`;
+			console.log(JSON.stringify({ label: scenario.label, error }));
+			resolve();
+		});
+	});
+}
+async function main(argv = process.argv.slice(2), env = process.env) {
+	const args = parseArgs(argv, env);
+	if (args.help) {
+		printHelp();
+		return;
+	}
+	if (!args.apiKey) {
+		fail("CURSOR_API_KEY is required. Set CURSOR_API_KEY or pass --api-key.");
+	}
+	const scenario = args.scenario ? findScenario(args.scenario) : undefined;
+	if (scenario) {
+		await runScenarioInThisProcess(args, scenario);
+		return;
+	}
+	for (const scenarioToRun of SCENARIOS) {
+		await runScenarioChild(args, scenarioToRun);
+	}
+}
+if (import.meta.url === new URL(process.argv[1], "file:").href) {
+	main().catch((error) => {
+		const message = error instanceof Error ? error.message : String(error);
+		fail(message, process.env.CURSOR_API_KEY);
+	});
+}
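The probe writes one JSON object per scenario to stdout, with fields such as `label`, `firstDeltaMs`, and `mcpBlockingMs`, or `label` plus `error`. That makes it straightforward to compare runs. The helper below is a hypothetical convenience and does not ship with the package. It reports how much earlier the first text delta arrived with the connect override than without it, and which scenarios errored.

```javascript
// Hypothetical helper for reading probe-mcp-coldstart stdout; summarizeColdStart
// is illustrative and not part of pi-cursor-sdk.
function summarizeColdStart(stdoutText) {
  const byLabel = new Map();
  for (const line of stdoutText.split("\n")) {
    if (!line.trim()) continue; // skip blank lines
    const row = JSON.parse(line);
    byLabel.set(row.label, row);
  }
  const baseline = byLabel.get("with-all-settings");
  const override = byLabel.get("with-all-settings+connect-override");
  // Positive savedMs means the override reached the first text delta sooner.
  const savedMs =
    typeof baseline?.firstDeltaMs === "number" && typeof override?.firstDeltaMs === "number"
      ? baseline.firstDeltaMs - override.firstDeltaMs
      : undefined;
  const errors = [...byLabel.values()].filter((row) => row.error).map((row) => row.label);
  return { savedMs, errors };
}
```

`savedMs` is only meaningful when both scenarios completed. Timings from a single cold-start run can be noisy, so repeated runs give a fairer comparison.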