pi-cursor-sdk 0.1.27 → 0.1.29

Files changed (47)
  1. package/CHANGELOG.md +29 -0
  2. package/README.md +40 -37
  3. package/docs/crabbox-platform-testing-lessons.md +508 -0
  4. package/docs/cursor-dogfood-checklist.md +4 -3
  5. package/docs/cursor-live-smoke-checklist.md +24 -22
  6. package/docs/cursor-model-ux-spec.md +12 -12
  7. package/docs/cursor-native-tool-replay.md +10 -10
  8. package/docs/cursor-native-tool-visual-audit.md +9 -7
  9. package/docs/cursor-testing-lessons.md +22 -17
  10. package/docs/cursor-tool-surfaces.md +3 -3
  11. package/docs/platform-smoke.md +994 -0
  12. package/package.json +35 -6
  13. package/platform-smoke.config.mjs +21 -0
  14. package/scripts/debug-provider-events.mjs +10 -3
  15. package/scripts/debug-sdk-events.mjs +10 -2
  16. package/scripts/isolated-cursor-smoke.sh +4 -4
  17. package/scripts/lib/cursor-visual-render.mjs +1 -0
  18. package/scripts/platform-smoke/artifacts.mjs +124 -0
  19. package/scripts/platform-smoke/assertions.mjs +101 -0
  20. package/scripts/platform-smoke/card-detect.mjs +96 -0
  21. package/scripts/platform-smoke/crabbox-runner.mjs +215 -0
  22. package/scripts/platform-smoke/doctor.mjs +446 -0
  23. package/scripts/platform-smoke/jsonl-text.mjs +31 -0
  24. package/scripts/platform-smoke/live-suite-runner.mjs +677 -0
  25. package/scripts/platform-smoke/platform-build-windows.ps1 +187 -0
  26. package/scripts/platform-smoke/pty-capture.mjs +131 -0
  27. package/scripts/platform-smoke/render-ansi.mjs +65 -0
  28. package/scripts/platform-smoke/scenarios.mjs +186 -0
  29. package/scripts/platform-smoke/targets.mjs +900 -0
  30. package/scripts/platform-smoke/visual-evidence.mjs +139 -0
  31. package/scripts/platform-smoke.mjs +193 -0
  32. package/scripts/probe-mcp-coldstart.mjs +8 -1
  33. package/scripts/steering-rpc-smoke.mjs +1 -1
  34. package/scripts/tmux-live-smoke.sh +3 -3
  35. package/scripts/visual-tui-smoke.mjs +1 -1
  36. package/src/cursor-pi-tool-bridge-abort.ts +1 -0
  37. package/src/cursor-pi-tool-bridge-diagnostics.ts +12 -1
  38. package/src/cursor-pi-tool-bridge.ts +46 -1
  39. package/src/cursor-provider-errors.ts +18 -2
  40. package/src/cursor-provider-turn-lifecycle-emitter.ts +65 -8
  41. package/src/cursor-provider-turn-tool-ledger.ts +2 -3
  42. package/src/cursor-run-final-text.ts +11 -1
  43. package/src/cursor-sdk-process-error-guard.ts +1 -1
  44. package/src/cursor-state.ts +38 -19
  45. package/src/cursor-tool-lifecycle.ts +1 -1
  46. package/src/cursor-tool-manifest.ts +1 -1
  47. package/src/cursor-transcript-utils.ts +7 -3
@@ -1,16 +1,18 @@
1
1
  # Cursor Live Smoke Checklist
2
2
 
3
+ > **Platform Smoke (new):** The required cross-platform release gate is `npm run smoke:platform:doctor && npm run smoke:platform:all`. See [docs/platform-smoke.md](./platform-smoke.md) for the full contract. The manual checks below remain useful inner-loop/debug tools but are not the required release gate.
4
+
3
5
  ## Purpose
4
6
 
5
- Use this manual checklist before releasing Cursor provider/runtime changes. Unit tests and mocks are necessary, but they are not enough for this extension. See [Cursor testing lessons](./cursor-testing-lessons.md) for auth/isolated-harness pitfalls and the plan-mode replay regression that motivated recent hardening. Always assume every runtime surface is in scope. A release is not ready until every live check below has been observed with `cursor/composer-2.5` through the local working tree.
7
+ Use this manual checklist during development and debugging of Cursor provider/runtime changes. Unit tests and mocks are necessary, but they are not enough for this extension. See [Cursor testing lessons](./cursor-testing-lessons.md) for auth/isolated-harness pitfalls and the plan-mode replay regression that motivated recent hardening. Always assume every runtime surface is in scope. For release readiness, run the platform gate in [docs/platform-smoke.md](./platform-smoke.md); this checklist is inner-loop evidence only.
6
8
 
7
- ## Release rule
9
+ ## Inner-loop rule
8
10
 
9
11
  - Run from a clean working tree except for the intended branch diff.
10
- - Use the local extension under test: `pi -e . --cursor-no-fast --model cursor/composer-2.5`.
12
+ - Use the local extension under test: `pi -e . --cursor-no-fast --model cursor/composer-2-5`.
11
13
  - Use a temporary `--session-dir` for every run.
12
14
  - Do not paste or commit Cursor API keys, raw session contents with secrets, endpoint URLs, or local private paths.
13
- - If a check fails, stop and fix or explicitly mark the release blocked. Do not ship with "optional," "deferred," "mostly," or "probably" checks outstanding.
15
+ - If an inner-loop check fails, stop and fix or use [docs/platform-smoke.md](./platform-smoke.md) as the release-blocking source of truth. Do not treat this checklist as a narrower replacement for the platform gate.
14
16
  - Do not narrow the smoke scope to the apparent code diff. Treat provider reality, TUI behavior, bridge behavior, replay behavior, diagnostics safety, abort/cancel cleanup, usage accounting, packaging, and cleanup as in scope for every Cursor provider/runtime release.
15
17
  - A check is passed only when the visible TUI/output, stderr diagnostics, and persisted JSONL agree with the expected behavior.
16
18
 
@@ -61,13 +63,13 @@ node scripts/validate-smoke-jsonl.mjs --replay-errors-only "$SMOKE_DIR/session-s
61
63
 
62
64
  The replay scan flags only error `toolResult` / error assistant messages with `Tool grep/cursor/find/ls not found`, not successful reads of docs that mention those strings. See [Cursor testing lessons](./cursor-testing-lessons.md#what-counts-as-a-replay-failure).
63
65
 
64
- `npm run smoke:live` is a helper only; it polls the section 3 TUI for answer/footer evidence and then cleans up the tmux session, but it does not replace the canonical rendered-PNG visual review in section 4. Run the relevant helper `--self-test` (`smoke:live`, `smoke:visual`, `smoke:steering`, or `smoke:isolated`) when changing sealed PATH or env wrappers. Release readiness still requires the manual checks below for detailed visual TUI behavior, bridge, standalone native replay, abort/cancel, packaging, cleanup, and any touched runtime surface not covered by the helper.
66
+ `npm run smoke:live` is a helper only; it polls the section 3 TUI for answer/footer evidence and then cleans up the tmux session, but it does not replace the canonical rendered-PNG visual review in section 4. Run the relevant helper `--self-test` (`smoke:live`, `smoke:visual`, `smoke:steering`, or `smoke:isolated`) when changing sealed PATH or env wrappers. Release readiness requires the platform smoke gate. Run focused manual checks below when debugging detailed visual TUI behavior, bridge, standalone native replay, abort/cancel, packaging, cleanup, or any touched runtime surface before rerunning the platform gate.
65
67
 
66
68
  Pass criteria:
67
69
 
68
- - `pi --version` reports pi 0.77.0 for this cutover baseline.
69
- - `npm ls` shows `@cursor/sdk@1.0.16` and local `@earendil-works/*@0.77.0` packages.
70
- - `cursor/composer-2.5` appears in the model list.
70
+ - `pi --version` reports pi 0.78.0 for this cutover baseline.
71
+ - `npm ls` shows `@cursor/sdk@1.0.17` and local `@earendil-works/*@0.78.0` packages.
72
+ - `cursor/composer-2-5` appears in the model list.
71
73
  - No Cursor key or auth token is printed.
72
74
  - If neither `~/.pi/agent/auth.json` cursor auth nor `CURSOR_API_KEY` is available, stop and report the live smoke as blocked.
73
75
 
@@ -75,7 +77,7 @@ Pass criteria:
75
77
 
76
78
  ```bash
77
79
  PI_CURSOR_SETTING_SOURCES=none \
78
- pi -e . --cursor-no-fast --model cursor/composer-2.5 \
80
+ pi -e . --cursor-no-fast --model cursor/composer-2-5 \
79
81
  --session-dir "$SMOKE_DIR/basic" \
80
82
  --no-tools \
81
83
  -p 'Live smoke. Reply exactly: PI_CURSOR_SMOKE_OK' \
@@ -93,7 +95,7 @@ Pass criteria:
93
95
  ## 2. Default setting-source startup noise check
94
96
 
95
97
  ```bash
96
- pi -e . --cursor-no-fast --model cursor/composer-2.5 \
98
+ pi -e . --cursor-no-fast --model cursor/composer-2-5 \
97
99
  --session-dir "$SMOKE_DIR/default-settings" \
98
100
  --no-tools \
99
101
  -p 'Default settings smoke. Include PRODUCT=42 in the final answer.' \
@@ -115,23 +117,23 @@ Run a real interactive session under tmux:
115
117
  ```bash
116
118
  SESSION="pi-cursor-sdk-smoke-$(date +%s)"
117
119
  tmux new-session -d -s "$SESSION" -x 120 -y 40 -- zsh -lc \
118
- "cd '$PWD' && PI_CURSOR_SETTING_SOURCES=none pi -e . --cursor-no-fast --model cursor/composer-2.5 --session-dir '$SMOKE_DIR/tui' --session-id cursor-sdk-1016-tui --no-tools 'TUI smoke. Compute 19 + 23. Reply only with SUM=<number>.'"
120
+ "cd '$PWD' && PI_CURSOR_SETTING_SOURCES=none pi -e . --cursor-no-fast --model cursor/composer-2-5 --session-dir '$SMOKE_DIR/tui' --session-id cursor-sdk-1016-tui --no-tools 'TUI smoke. Compute 19 + 23. Reply only with SUM=<number>.'"
119
121
  ```
120
122
 
121
123
  Observe with `tmux capture-pane -pt "$SESSION"` or attach manually.
122
124
 
123
125
  Pass criteria:
124
126
 
125
- - Footer shows `(cursor) composer-2.5`. With `--cursor-no-fast`, Cursor fast mode is off and the Cursor extension status should not show `cursor fast`; ignore unrelated status text from other extensions.
126
- - The run uses pi 0.77.0 `--session-id` successfully.
127
+ - Footer shows `(cursor) composer-2-5`. With `--cursor-no-fast`, Cursor fast mode is off and the Cursor extension status should not show `cursor fast`; ignore unrelated status text from other extensions.
128
+ - The run uses pi 0.78.0 `--session-id` successfully.
127
129
  - Assistant answer appears correctly.
128
130
  - `/session` shows one user and one assistant message for the simple run.
129
131
  - Persisted JSONL has one assistant message. If the screen appears duplicated, inspect JSONL before deciding whether it is a rendering bug.
130
132
  - Kill the tmux session after the check and verify no smoke tmux sessions remain.
131
133
 
132
- ## 4. Mandatory visual card/color rendering check
134
+ ## 4. Focused visual card/color rendering check
133
135
 
134
- This is the canonical visual release path for Cursor provider/runtime changes. It requires offscreen TUI visual inspection, not only JSONL or code review. Use pi 0.77.0, `@cursor/sdk@1.0.16`, a fresh temporary session dir, Cursor SDK `plan` mode, native replay enabled, and the checked-in visual runner. The runner resolves `pi` by directly walking the parent `PATH`, uses `process.execPath` for Node, and prepends that Node directory for both prereq checks and tmux launches so `#!/usr/bin/env node` shims use the validated Node. The default matrix is native replay only: native replay registration is forced on, settings sources are `none`, the pi bridge is off, overlapping built-in pi tools are not exposed, and inherited Cursor SDK event-debug artifact env is cleared. With `--event-debug`, debug capture writes to a deterministic directory under `VISUAL_DIR`.
136
+ This is the canonical inner-loop visual debug path for Cursor provider/runtime changes. It requires offscreen TUI visual inspection, not only JSONL or code review. Use pi 0.78.0, `@cursor/sdk@1.0.17`, a fresh temporary session dir, Cursor SDK `plan` mode, native replay enabled, and the checked-in visual runner. The runner resolves `pi` by directly walking the parent `PATH`, uses `process.execPath` for Node, and prepends that Node directory for both prereq checks and tmux launches so `#!/usr/bin/env node` shims use the validated Node. The default matrix is native replay only: native replay registration is forced on, settings sources are `none`, the pi bridge is off, overlapping built-in pi tools are not exposed, and inherited Cursor SDK event-debug artifact env is cleared. With `--event-debug`, debug capture writes to a deterministic directory under `VISUAL_DIR`.
135
137
 
136
138
  ```bash
137
139
  VISUAL_DIR="$(mktemp -d /tmp/pi-cursor-sdk-1016-visual.XXXXXX)"
@@ -202,7 +204,7 @@ Pass criteria:
202
204
 
203
205
  ```bash
204
206
  PI_CURSOR_SETTING_SOURCES=none \
205
- pi -e . --cursor-no-fast --cursor-mode plan --model cursor/composer-2.5 \
207
+ pi -e . --cursor-no-fast --cursor-mode plan --model cursor/composer-2-5 \
206
208
  --session-dir "$SMOKE_DIR/cursor-mode-plan" \
207
209
  --session-id cursor-sdk-1016-plan \
208
210
  --no-tools \
@@ -224,7 +226,7 @@ Pass criteria:
224
226
  PI_CURSOR_SETTING_SOURCES=none \
225
227
  PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1 \
226
228
  PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1 \
227
- pi -e . --cursor-no-fast --model cursor/composer-2.5 \
229
+ pi -e . --cursor-no-fast --model cursor/composer-2-5 \
228
230
  --session-dir "$SMOKE_DIR/bridge" \
229
231
  -p 'Bridge smoke. Do exactly two tool calls before answering: first call pi__read on ./package.json; second call pi__read on ./definitely-missing-pi-cursor-sdk-smoke-file.txt. Then answer: OK_NAME=<package name>; MISSING_RESULT=<error or success>. Do not use shell.' \
230
232
  > "$SMOKE_DIR/bridge.stdout.txt" \
@@ -245,7 +247,7 @@ Pass criteria:
245
247
  PI_CURSOR_SETTING_SOURCES=none \
246
248
  PI_CURSOR_PI_TOOL_BRIDGE=0 \
247
249
  PI_CURSOR_NATIVE_TOOL_DISPLAY=1 \
248
- pi -e . --cursor-no-fast --model cursor/composer-2.5 \
250
+ pi -e . --cursor-no-fast --model cursor/composer-2-5 \
249
251
  --session-dir "$SMOKE_DIR/native-replay" \
250
252
  -p 'Native replay smoke. Use your Cursor file-reading capability to read ./README.md, then answer README_SEEN=yes if it contains pi-cursor-sdk.' \
251
253
  > "$SMOKE_DIR/native-replay.stdout.txt" \
@@ -311,7 +313,7 @@ Pass criteria:
311
313
 
312
314
  ## 9. Long-running bridge and abort/cancel
313
315
 
314
- This check is release-blocking for every Cursor provider/runtime release.
316
+ Use this focused check when debugging abort cleanup. The platform smoke gate is the release-blocking source of truth for every Cursor provider/runtime release.
315
317
 
316
318
  Use a harmless long-running command and interrupt it after the bridge request is queued:
317
319
 
@@ -319,7 +321,7 @@ Use a harmless long-running command and interrupt it after the bridge request is
319
321
  PI_CURSOR_SETTING_SOURCES=none \
320
322
  PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1 \
321
323
  PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1 \
322
- pi -e . --cursor-no-fast --model cursor/composer-2.5 \
324
+ pi -e . --cursor-no-fast --model cursor/composer-2-5 \
323
325
  --session-dir "$SMOKE_DIR/abort" \
324
326
  -p 'Abort smoke. Call pi__bash with command: sleep 30 && echo SHOULD_NOT_PRINT. Do not answer until the tool completes.'
325
327
  ```
@@ -380,7 +382,7 @@ Pass criteria:
380
382
 
381
383
  ## Coverage gaps this checklist makes explicit
382
384
 
383
- Everything in this section is in scope for Cursor provider/runtime releases. These are not accepted as "done" unless the matching live check passes:
385
+ Everything in this section is in scope when using this checklist for Cursor provider/runtime debugging. Release readiness still comes from the platform smoke gate:
384
386
 
385
387
  - Long-running bridged tool abort/cancel cleanup.
386
388
  - Native replay cards beyond read, especially shell/edit/write cards, when those renderers change.
@@ -390,4 +392,4 @@ Everything in this section is in scope for Cursor provider/runtime releases. The
390
392
  - Ambient Cursor setting-source behavior when startup filtering or local Cursor settings handling changes.
391
393
  - Model discovery aliases/context variants when model-discovery code or Cursor SDK versions change.
392
394
 
393
- If any surface has no adequate live check, add that check before release instead of assuming mocks cover reality.
395
+ If any surface has no adequate platform or focused live check, add that coverage before release instead of assuming mocks cover reality.
@@ -15,7 +15,7 @@ Current implementation notes:
15
15
  - Cursor status uses one coordinated `ctx.ui.setStatus("cursor", ...)` value for fast and non-default plan mode; the default pi footer remains intact.
16
16
  - Installed `@cursor/sdk` user messages accept images, and Cursor models are treated as image-capable; registered input metadata is `text` plus `image`.
17
17
  - Image payload forwarding sends images only from the latest user message. If the latest user turn is plain text after an earlier image turn, the transcript keeps an `[image omitted from transcript]` placeholder but no image bytes are sent to Cursor. The prompt explicitly tells Cursor that prior image bytes are unavailable and to ask the user to reattach or describe a prior image when needed. Carrying images forward across turns remains a future product decision because it affects token cost, privacy, stale visual context, and expected multimodal follow-up behavior.
18
- - Exact `@cursor/sdk@1.0.16` is a package dependency of this extension; users should not need a global SDK install. pi 0.77.0 is the current validation baseline, while published pi peer dependencies are minimum-only `>=0.76.0` ranges with no upper bound. Newer pi versions are allowed to attempt loading this extension before a matching extension release exists; compatibility is best-effort until validated.
18
+ - Exact `@cursor/sdk@1.0.17` is a package dependency of this extension; users should not need a global SDK install. pi 0.78.0 is the current validation baseline, while published pi peer dependencies are minimum-only `>=0.76.0` ranges with no upper bound. Newer pi versions are allowed to attempt loading this extension before a matching extension release exists; compatibility is best-effort until validated.
19
19
  - Cursor auth uses pi-native API-key resolution for provider `cursor`: CLI `--api-key`, stored `~/.pi/agent/auth.json` API key from `/login`, then `CURSOR_API_KEY`. The extension config file stores only non-secret Cursor-only state such as fast defaults.
20
20
  - Local agents pass `settingSources: ["all"]` by default so Cursor MCP servers, plugin tools, project/user settings, and related Cursor-native capabilities are available. Users can narrow loading with a comma-separated list such as `PI_CURSOR_SETTING_SOURCES=project,user,plugins`, or disable ambient setting sources with `PI_CURSOR_SETTING_SOURCES=none`. The provider suppresses direct Cursor SDK bootstrap stdout/stderr/console noise (including late first-send workspace loading such as hook compatibility warnings) so it does not pollute pi's TUI.
21
21
  - On `cursor/*` models, pi-cursor-sdk removes only pi-generated `<project_instructions>` blocks that overlap the effective Cursor `settingSources`: `user` for `~/.pi/agent/AGENTS.md`; `project` for discovered repo/parent `AGENTS.md` and `CLAUDE.md` (verified Cursor behavior: local agents load project `AGENTS.md` and `CLAUDE.md`). `~/.pi/agent/CLAUDE.md` is not removed (Cursor user layer uses `~/.claude/CLAUDE.md`). Blocks are removed by exact pi serialization match from structured `contextFiles` via the `before_agent_start` hook, not in `buildCursorPrompt` sanitization. Suppression is skipped with `-nc`, `PI_CURSOR_SETTING_SOURCES=none`, narrowed sources such as `plugins` that omit the matching layer, or `PI_CURSOR_PRESERVE_PI_AGENTS_MD=1`. Switching away from a Cursor model restores pi's full context block on the next user message.
@@ -26,18 +26,18 @@ Current implementation notes:
26
26
  - Prompt text is the primary provider/bridge contract. Bootstrap prompts carry a short boundary block plus the callable-surface manifest by default (`PI_CURSOR_TOOL_MANIFEST=1`). MCP `listTools` descriptions use a one-line pointer to the bootstrap prompt instead of repeating the full contract (`buildCursorPiBridgeMcpToolDescription()`). Cursor must call the exposed `pi__*` MCP name, not the real pi tool name shown in pi history or transcripts. Pi emits and executes the real pi tool name. Maintainer debug: `/cursor-tools` prints bridge/manifest enablement, effective `PI_CURSOR_SETTING_SOURCES`, and the current callable-surface snapshot.
27
27
  - The provider also registers `cursor_ask_question` for Cursor models when the bridge is enabled. Cursor sees it as `pi__cursor_ask_question`, and pi executes it through the normal tool path so interactive users can choose options from pi UI. In non-UI modes it reports that UI is unavailable so Cursor can state a default assumption instead. `PI_CURSOR_PI_TOOL_BRIDGE=0` disables the local bridge, including question bridging. Cloud Cursor agents remain out of scope for the bridge.
28
28
  - The bridge queues MCP calls, emits provider `toolcall_*` events, waits for matching pi `toolResult` messages by `toolCallId`, resolves the result back into the same live Cursor SDK run without creating a new `Agent`, and never calls tool `execute()` handlers directly. The same-run resume invariant holds unless the run was disposed, aborted, or cancelled.
29
- - Cursor SDK MCP tool calls use a guarded timeout override because installed `@cursor/sdk` 1.0.16 has a 60-second MCP request default with no public per-server timeout option. The extension extends the verified Cursor SDK MCP `callTool` timeout path to 3600 seconds by default and shortens the verified first-send MCP initialize/listTools timeout paths to 10 seconds by default so unavailable configured MCP servers do not block the first reply for a full minute; unknown MCP protocol timeout stacks keep the SDK default. Users can override tool-call timeouts with `PI_CURSOR_MCP_TOOL_TIMEOUT_MS` or `PI_CURSOR_MCP_TOOL_TIMEOUT_SECONDS`, and initialize/listTools timeouts with `PI_CURSOR_MCP_CONNECT_TIMEOUT_MS` or `PI_CURSOR_MCP_CONNECT_TIMEOUT_SECONDS`.
29
+ - Cursor SDK MCP tool calls use a guarded timeout override because installed `@cursor/sdk` 1.0.17 has a 60-second MCP request default with no public per-server timeout option. The extension extends the verified Cursor SDK MCP `callTool` timeout path to 3600 seconds by default and shortens the verified first-send MCP initialize/listTools timeout paths to 10 seconds by default so unavailable configured MCP servers do not block the first reply for a full minute; unknown MCP protocol timeout stacks keep the SDK default. Users can override tool-call timeouts with `PI_CURSOR_MCP_TOOL_TIMEOUT_MS` or `PI_CURSOR_MCP_TOOL_TIMEOUT_SECONDS`, and initialize/listTools timeouts with `PI_CURSOR_MCP_CONNECT_TIMEOUT_MS` or `PI_CURSOR_MCP_CONNECT_TIMEOUT_SECONDS`.
30
30
  - Bridge diagnostics are opt-in only: `PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1` writes typed, allowlisted, scrubbed single-line JSONL records to `process.stderr` with prefix `[pi-cursor-sdk:bridge]`. Diagnostics are scrubbed operational logs, not anonymous telemetry. They intentionally include tool names, safe correlation IDs, run lifecycle, exposed pi↔MCP name pairs, queued requests, result resolution, rejection, cancellation, and pending counts. Correlation IDs are generated independently from the tokenized endpoint path, and Cursor MCP call IDs are hashed before serialization. Diagnostics must not include endpoint paths/URLs/path components/tokens, API keys, bearer tokens, cookies, session credentials, raw args/results, stdout/stderr payloads, file contents, Cursor settings output, or local private session paths in tracked docs, and they must not call pi UI status, notification, or footer APIs. If tool names themselves are unacceptable for a release target, bridge debug diagnostics are not safe for shared logs under the current contract.
31
31
  - This repo does not provide a generic desktop-automation, browser-driver, or CDP recipe. Provider docs should describe pi-cursor-sdk's Cursor provider/bridge contract only.
32
- - Cursor internal tool activity is recorded from SDK events and scrubbed. Maintainer reference for all 16 `@cursor/sdk@1.0.16` `ToolType` values, runtime alias normalization, and intentional mapping/fallback rules: [Cursor native tool replay — SDK ToolType replay matrix](./cursor-native-tool-replay.md#sdk-tooltype-replay-matrix) (official SDK docs: https://cursor.com/docs/sdk/typescript). In interactive TTY sessions, supported completed `read`, `bash`, `grep`, `find`, `ls`, `edit`, `write`, diagnostics, delete, todo/plan, task, image generation, MCP, semantic search, and screen recording activity is replayed through pi's native tool-call rendering path with recorded Cursor results, so the TUI can show native-looking cards without rerunning Cursor's reads/shell commands/file edits. Cursor `glob` activity is replayed through native `find` cards. Cursor write activity is replayed through native-looking `write` cards, and Cursor StrReplace/edit activity uses native-looking `edit` only when recorded arguments truthfully satisfy pi's `edit` schema; path-only Cursor edit and notebook edit replay falls back to neutral Cursor activity before pi validation. Diagnostics, delete, todos/plans, task, image, and MCP activity use neutral Cursor activity cards with pi's default success/error shell. Neutral Cursor activity calls include `activityTitle` and, when available, `activitySummary` so partial/collapsed cards preserve identity such as `Cursor plan`, `Cursor todos`, `Cursor MCP`, or `Cursor edit`. 
For long-running or externally meaningful Cursor tools (`task`, `shell`, `mcp`, `generateImage`, `recordScreen`, `semSearch`, web search/fetch, plan/todo), the provider may surface one low-noise deferred in-progress thinking line such as `Cursor MCP: external_search` from bounded, scrubbed SDK args; fast local tools (`read`, `grep`, `glob`, and similar) skip lifecycle lines when completion follows immediately, and pi bridge MCP calls are excluded because pi already shows real pi tool execution ([lifecycle visibility](./cursor-native-tool-replay.md#low-noise-tool-lifecycle-visibility)). Replay-only tools display recorded Cursor results, normalize workspace-local paths/diff headers for display, use pi diff colors for edit previews and path-inferred syntax highlighting for write previews, and fail closed if called without a recorded result. Native replay wrappers are registered only for tool names not already owned by another extension; conflicting tools use the bounded scrubbed transcript fallback. Cursor workflow tools such as mode/task/todo/plan activity are not pi workflow controls; reported todo/plan events are displayed as Cursor activity only. Plan/todo replay cards can be followed by Cursor's final plan text, selected from `run.wait().result` when Cursor provides one and trimmed against already-emitted text. Started Cursor SDK tool calls that never receive a completion event are surfaced with bounded user-visible labels/traces (neutral activity cards when native replay routing allows, otherwise the same inactive or transcript trace fallbacks used for completed replay) instead of being silently discarded when the run failed/aborted, produced no assistant text, or involved external/side-effectful tools; incomplete fast local discovery starts (`read`, `grep`, `glob`, `ls`) remain maintainer-debug-only after successful text-producing runs so stale SDK start events do not create red post-answer cards. 
Explicit failures remain visible when Cursor reports them through completed tool calls or step results. Pi bridge MCP starts remain excluded from duplicate incomplete Cursor cards because pi already shows real pi tool execution. `PI_CURSOR_NATIVE_TOOL_DISPLAY=0` disables native replay, and `PI_CURSOR_REGISTER_NATIVE_TOOLS=0` is a registration-only opt-out that keeps the transcript fallback without shadowing pi tool names. When bridge or native replay cards are emitted, the provider mirrors Codex's turn shape as Cursor SDK activity arrives: assistant `toolUse`, pi `toolResult`s, live post-tool Cursor thinking/text, any later tool batches as further `toolUse` turns, then Cursor's final assistant answer. For shell replay, completed `stdout` / `stderr` are primary; unambiguous `shell-output-delta` data is used only as display-only fallback for empty successful shell completions, and overlapping shell calls drop ambiguous deltas instead of guessing. Non-interactive runs keep bounded scrubbed transcript output instead, preserving `pi -p` assistant text output. Cursor text deltas stream live when no live-run turn split is active.
32
+ - Cursor internal tool activity is recorded from SDK events and scrubbed. Maintainer reference for all 16 `@cursor/sdk@1.0.17` `ToolType` values, runtime alias normalization, and intentional mapping/fallback rules: [Cursor native tool replay — SDK ToolType replay matrix](./cursor-native-tool-replay.md#sdk-tooltype-replay-matrix) (official SDK docs: https://cursor.com/docs/sdk/typescript). In interactive TTY sessions, supported completed `read`, `bash`, `grep`, `find`, `ls`, `edit`, `write`, diagnostics, delete, todo/plan, task, image generation, MCP, semantic search, and screen recording activity is replayed through pi's native tool-call rendering path with recorded Cursor results, so the TUI can show native-looking cards without rerunning Cursor's reads/shell commands/file edits. Cursor `glob` activity is replayed through native `find` cards. Cursor write activity is replayed through native-looking `write` cards, and Cursor StrReplace/edit activity uses native-looking `edit` only when recorded arguments truthfully satisfy pi's `edit` schema; path-only Cursor edit and notebook edit replay falls back to neutral Cursor activity before pi validation. Diagnostics, delete, todos/plans, task, image, and MCP activity use neutral Cursor activity cards with pi's default success/error shell. Neutral Cursor activity calls include `activityTitle` and, when available, `activitySummary` so partial/collapsed cards preserve identity such as `Cursor plan`, `Cursor todos`, `Cursor MCP`, or `Cursor edit`. 
For long-running or externally meaningful Cursor tools (`task`, `shell`, `mcp`, `generateImage`, `recordScreen`, `semSearch`, web search/fetch, plan/todo), the provider may surface one low-noise deferred in-progress thinking line such as `Cursor MCP: external_search` from bounded, scrubbed SDK args; fast local tools (`read`, `grep`, `glob`, and similar) skip lifecycle lines when completion follows immediately, and pi bridge MCP calls are excluded because pi already shows real pi tool execution ([lifecycle visibility](./cursor-native-tool-replay.md#low-noise-tool-lifecycle-visibility)). Replay-only tools display recorded Cursor results, normalize workspace-local paths/diff headers for display, use pi diff colors for edit previews and path-inferred syntax highlighting for write previews, and fail closed if called without a recorded result. Native replay wrappers are registered only for tool names not already owned by another extension; conflicting tools use the bounded scrubbed transcript fallback. Cursor workflow tools such as mode/task/todo/plan activity are not pi workflow controls; reported todo/plan events are displayed as Cursor activity only. Plan/todo replay cards can be followed by Cursor's final plan text, selected from `run.wait().result` when Cursor provides one and trimmed against already-emitted text. Started Cursor SDK tool calls that never receive a completion event are surfaced with bounded user-visible labels/traces (neutral activity cards when native replay routing allows, otherwise the same inactive or transcript trace fallbacks used for completed replay) instead of being silently discarded when the run failed/aborted, produced no assistant text, or involved external/side-effectful tools; incomplete fast local discovery starts (`read`, `grep`, `glob`, `ls`) remain maintainer-debug-only after successful text-producing runs so stale SDK start events do not create red post-answer cards. 
Explicit failures remain visible when Cursor reports them through completed tool calls or step results. Pi bridge MCP starts remain excluded from duplicate incomplete Cursor cards because pi already shows real pi tool execution. `PI_CURSOR_NATIVE_TOOL_DISPLAY=0` disables native replay, and `PI_CURSOR_REGISTER_NATIVE_TOOLS=0` is a registration-only opt-out that keeps the transcript fallback without shadowing pi tool names. When bridge or native replay cards are emitted, the provider mirrors Codex's turn shape as Cursor SDK activity arrives: assistant `toolUse`, pi `toolResult`s, live post-tool Cursor thinking/text, any later tool batches as further `toolUse` turns, then Cursor's final assistant answer. For shell replay, completed `stdout` / `stderr` are primary; unambiguous `shell-output-delta` data is used only as display-only fallback for empty successful shell completions, and overlapping shell calls drop ambiguous deltas instead of guessing. Non-interactive runs keep bounded scrubbed transcript output instead, preserving `pi -p` assistant text output. Cursor text deltas stream live when no live-run turn split is active.
33
33
  - Synthetic replay names are internal compatibility details. New model-facing prompt text and user-visible cards use native tool names when renderer-compatible, or neutral Cursor activity labels when not. Legacy sessions containing old internal replay names are sanitized before prompt/display. Bridge MCP names such as `pi__sem_reindex` are MCP-only; pi session output uses real pi tool names.
34
34
  - Cursor SDK usage events report cumulative internal agent/tool/cache work, not the replayable pi prompt context. The extension does not copy raw Cursor SDK usage into pi usage or compaction. For Cursor assistant messages, `usage.input`/`usage.output` are approximate pi session activity components: initial Cursor prompt input is counted once, consumed split-run tool results are counted as deduped input on the following assistant turn, and assistant output includes visible text/thinking/tool-call content. `usage.totalTokens` is the replayable Cursor prompt/context estimate derived from the same `buildCursorPrompt()` path used for `Agent.send`; it may differ from `input + output` and is the context-safe value for display/compaction. `src/cursor-usage-accounting.ts` owns this usage policy, and `src/cursor-live-run-accounting.ts` owns prompt-once and consumed-tool-result accounting so provider usage and bridge result resolution share the same matched tool-result boundary.
35
35
  - Audit observation, 2026-05-19, superseded by the 2026-05-21 replay pass and #68 incomplete visibility, then narrowed by the 2026-05-26 fast-local suppression: a missing-file read with Composer 2.5 emitted `tool-call-started` for Cursor `read`, then streamed final text `Error: File not found`, but did not emit `tool-call-completed` or an `onStep` `toolCall` error result. Leftover external/side-effectful started calls are surfaced at run completion through the same native replay routing as completed tools (activity cards when allowed, otherwise inactive/transcript traces), while fast local discovery starts are debug-only after a successful text-producing run. Cursor-reported completed/step errors remain visible.
36
36
  - Maintainer visual verification for replay-card changes should follow [Cursor Native Tool Visual Audit Workflow](./cursor-native-tool-visual-audit.md): offscreen PTY-driven pi run, xterm.js/Playwright screenshot rendering, and JSONL inspection before accepting commits or PRs.
37
- - Cursor provider/runtime releases should follow [Cursor Live Smoke Checklist](./cursor-live-smoke-checklist.md) with real `pi -e . --cursor-no-fast --model cursor/composer-2.5` invocations, manual observation, temporary session dirs, diagnostics scans, and persisted JSONL inspection. See [Cursor testing lessons](./cursor-testing-lessons.md) for auth.json seeding, isolated smoke harnesses, and replay JSONL scans. Assume every runtime surface is in scope. A release is not ready when any live check is optional, deferred, mostly passing, or unobserved.
37
+ - Cursor provider/runtime releases must pass the [Platform Smoke Gate](./platform-smoke.md): `npm run smoke:platform:doctor && npm run smoke:platform:all`. Use [Cursor Live Smoke Checklist](./cursor-live-smoke-checklist.md) only for focused inner-loop/debug runs with real `pi -e . --cursor-no-fast --model cursor/composer-2-5` invocations, manual observation, temporary session dirs, diagnostics scans, and persisted JSONL inspection. See [Cursor testing lessons](./cursor-testing-lessons.md) for auth.json seeding, isolated smoke harnesses, and replay JSONL scans. Assume every runtime surface is in scope.
38
38
  - For models without a catalog `context` parameter, context windows are not hardcoded. The extension ships a bundled SDK-derived default/non-Max cache generated from `createAgentPlatform().checkpointStore.loadLatest(agentId).tokenDetails.maxTokens`. Successful runs can update a local override cache, but model discovery does not probe models at startup.
39
- - Max Mode context windows are distinct from default/non-Max context windows. `@cursor/sdk` 1.0.16 documentation says the SDK may enable Max Mode automatically when a selected model requires it, but the public local-agent `ModelSelection` path still does not expose a manual Max Mode selector. Do not advertise Max Mode context windows unless the SDK catalog exposes an exact parameter/variant or the SDK public API adds a Max Mode selector that the extension actually sends.
40
- - The installed `@cursor/sdk` exposes latest-style `ModelListItem.aliases`. The extension registers only unambiguous aliases as pi model IDs (with the same context suffixes when applicable) and sends the alias back in `ModelSelection.id`, while sharing Cursor-only state such as fast defaults with the underlying catalog `id`. Aliases shared by multiple base models, such as generic family aliases, are skipped because the pi row metadata would otherwise imply one base model while Cursor may resolve the alias to another.
39
+ - Max Mode context windows are distinct from default/non-Max context windows. `@cursor/sdk` 1.0.17 documentation says the SDK may enable Max Mode automatically when a selected model requires it, but the public local-agent `ModelSelection` path still does not expose a manual Max Mode selector. Do not advertise Max Mode context windows unless the SDK catalog exposes an exact parameter/variant or the SDK public API adds a Max Mode selector that the extension actually sends.
40
+ - The installed `@cursor/sdk` exposes latest-style `ModelListItem.aliases`. The extension registers only unambiguous aliases as pi model IDs (with the same context suffixes when applicable) and sends the alias back in `ModelSelection.id`. Cursor-only fast preferences are keyed by the selected SDK model ID/alias, with read fallback for older preferences keyed by the underlying catalog `id`. Aliases shared by multiple base models, such as generic family aliases, are skipped because the pi row metadata would otherwise imply one base model while Cursor may resolve the alias to another.
41
41
  - Session-scoped Cursor SDK agent pooling reuses one live `@cursor/sdk` agent across compatible follow-up turns within the same pi session scope. `planCursorSessionSend()` in `src/cursor-session-send-policy.ts` decides whether the next turn sends a full bootstrap prompt or an incremental follow-up, whether the SDK agent must be recreated, and why. `computeCursorContextFingerprint()` and `shouldBootstrapCursorContext()` remain the context-only bootstrap signal. The pool recreates the agent when context diverges, when branch or compaction summaries appear after `/tree` navigation or compaction, after 20 completed incremental sends, when the API key identity changes, after send errors, on `session_shutdown`, and when `session_before_tree` / `session_tree` invalidate the active branch. Incremental sends omit the full Cursor SDK tool boundary block because the session agent retains prior bootstrap context, but every send ends with a short tool tail guard placed after the latest user request (including an explicit shell `cd` hint).
42
42
  - Pi steering/follow-up delivery can arrive while a split live Cursor SDK run is still active. The provider resolves pending live runs by scanning trailing `toolResult` messages while skipping trailing `user` messages, tracks the active live run per session scope, and resumes the in-flight run instead of calling `Agent.send()` again. When the context ends with steering user text after tool results, the provider releases the prior live run and chains an incremental `Agent.send()` for the latest user message in the same provider turn; if the prior run emits more text or tool requests after steering arrives, that stale activity is cancelled instead of surfacing another old-run tool turn and losing the new user input. A pre-send guard waits for or resumes any still-active scoped live run before starting a fresh send so `@cursor/sdk` `AgentBusyError` (`already has active run`) does not surface to pi users. Pooled session agents mark busy as soon as live/direct `run.wait()` tracking starts (`trackRunCompletion` on the session lease), and `acquireSessionCursorAgent()` awaits that busy state before returning a lease so send planning, transcript offsets, and later `Agent.send()` do not race the prior turn's SDK run completion (for example pi auto-compaction summarization). `session_before_compact` calls `prepareCursorSessionForCompaction()` to release scoped live-run drain state and reset the pooled agent before summarization streams. Tracked completions and send commits are scoped to the pooled agent `instanceId` so disposal/replacement drops stale tracking and ignores late commits from disposed agents.
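The alias rule in the list above (register only aliases that point at exactly one base model, skip shared family aliases) can be sketched as follows. This is a minimal illustration, not the extension's code: `CatalogModel` and `selectUnambiguousAliases` are hypothetical names, and only the `id` and `aliases` fields come from the text.

```ts
// Hypothetical shape; only `id` and `aliases` are named in the notes above.
interface CatalogModel {
  id: string;
  aliases?: string[];
}

// Returns alias -> base model id for aliases claimed by exactly one base model.
function selectUnambiguousAliases(models: CatalogModel[]): Map<string, string> {
  // First pass: record every base model id that claims each alias.
  const owners = new Map<string, Set<string>>();
  for (const model of models) {
    for (const alias of model.aliases ?? []) {
      const ids = owners.get(alias) ?? new Set<string>();
      ids.add(model.id);
      owners.set(alias, ids);
    }
  }

  // Second pass: keep only aliases with a single owner.
  const unambiguous = new Map<string, string>();
  owners.forEach((ids, alias) => {
    if (ids.size === 1) {
      unambiguous.set(alias, Array.from(ids)[0]);
    }
  });
  return unambiguous;
}
```

Skipping shared aliases keeps the pi row metadata from implying one base model while Cursor resolves the alias to another.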
43
43
 
@@ -167,7 +167,7 @@ cursor/gpt-5.5@1m
167
167
  cursor/gpt-5.5@272k
168
168
  cursor/claude-opus-4-8@1m
169
169
  cursor/claude-opus-4-8@300k
170
- cursor/composer-2.5
170
+ cursor/composer-2-5
171
171
  ```
172
172
 
173
173
  Avoid colon-based context IDs in the first implementation unless this spec is intentionally changed:
@@ -382,7 +382,7 @@ cursor fast
382
382
 
383
383
  ## Cursor SDK Mode Behavior
384
384
 
385
- Cursor SDK 1.0.16 exposes SDK-native conversation mode:
385
+ Cursor SDK 1.0.17 exposes SDK-native conversation mode:
386
386
 
387
387
  ```ts
388
388
  type AgentModeOption = "agent" | "plan";
@@ -462,7 +462,7 @@ Let pi persist:
462
462
  The extension persists only Cursor-only state:
463
463
 
464
464
  - `fast` per session,
465
- - `fast` global default per Cursor base model,
465
+ - `fast` global default per selected Cursor SDK model ID or alias,
466
466
  - Cursor SDK `mode` per session,
467
467
  - any future Cursor-only parameter that does not map to pi model metadata.
468
468
 
@@ -478,7 +478,7 @@ Use Cursor default variants:
478
478
 
479
479
  ```text
480
480
  gpt-5.5 -> cursor/gpt-5.5@1m, thinking medium, fast=false
481
- composer-2.5 -> cursor/composer-2.5, fast=true
481
+ composer-2.5 -> cursor/composer-2-5, fast=true
482
482
  ```
483
483
 
484
484
  ### Resume Session
@@ -494,7 +494,7 @@ Restore:
494
494
  Use:
495
495
 
496
496
  1. pi's selected/default model and thinking level,
497
- 2. global saved Cursor-only defaults for the selected base model,
497
+ 2. global saved Cursor-only defaults for the selected SDK model ID or alias, falling back to older base-model keys,
498
498
  3. else Cursor default variant params.
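A minimal sketch of the lookup order for the Cursor-only `fast` default, assuming a simple key/value store. The `FastLookup` shape and `resolveFastDefault` are hypothetical; the ordering (selected SDK model ID or alias first, then the older base-model key, then the Cursor default variant) is the part taken from the spec text.

```ts
// Hypothetical inputs for illustration; the real persistence format is not shown in this spec.
interface FastLookup {
  selectedId: string; // SDK model ID or alias the user selected
  baseCatalogId: string; // underlying catalog id used by older preference keys
  savedFast: Partial<Record<string, boolean>>; // saved global defaults
  variantDefaultFast: boolean; // Cursor default variant value
}

function resolveFastDefault(lookup: FastLookup): boolean {
  // 1. Preference keyed by the selected SDK model ID or alias.
  const bySelected = lookup.savedFast[lookup.selectedId];
  if (bySelected !== undefined) return bySelected;

  // 2. Read fallback for older preferences keyed by the base catalog id.
  const byBase = lookup.savedFast[lookup.baseCatalogId];
  if (byBase !== undefined) return byBase;

  // 3. Otherwise use the Cursor default variant parameter.
  return lookup.variantDefaultFast;
}
```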
499
499
 
500
500
  ## CLI / Print Mode
@@ -562,7 +562,7 @@ If Cursor later adds `fast`, `context`, `reasoning`, `effort`, or aliases to a m
562
562
  Initial Cursor default for Composer 2.5:
563
563
 
564
564
  ```text
565
- pi model: cursor/composer-2.5
565
+ pi model: cursor/composer-2-5
566
566
  Cursor params: fast=true
567
567
  pi thinking: off
568
568
  Cursor status: cursor fast
@@ -28,13 +28,13 @@ The bridge is enabled by default when bridgeable active pi tools exist. Cursor s
28
28
  Rollback, timeout, and diagnostics controls:
29
29
 
30
30
  ```bash
31
- PI_CURSOR_PI_TOOL_BRIDGE=0 pi --model cursor/composer-2.5
32
- PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1 pi --model cursor/composer-2.5
33
- PI_CURSOR_MCP_TOOL_TIMEOUT_SECONDS=7200 pi --model cursor/composer-2.5
34
- PI_CURSOR_MCP_TOOL_TIMEOUT_MS=7200000 pi --model cursor/composer-2.5
35
- PI_CURSOR_MCP_CONNECT_TIMEOUT_SECONDS=5 pi --model cursor/composer-2.5
36
- PI_CURSOR_MCP_CONNECT_TIMEOUT_MS=5000 pi --model cursor/composer-2.5
37
- PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1 pi --model cursor/composer-2.5
31
+ PI_CURSOR_PI_TOOL_BRIDGE=0 pi --model cursor/composer-2-5
32
+ PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1 pi --model cursor/composer-2-5
33
+ PI_CURSOR_MCP_TOOL_TIMEOUT_SECONDS=7200 pi --model cursor/composer-2-5
34
+ PI_CURSOR_MCP_TOOL_TIMEOUT_MS=7200000 pi --model cursor/composer-2-5
35
+ PI_CURSOR_MCP_CONNECT_TIMEOUT_SECONDS=5 pi --model cursor/composer-2-5
36
+ PI_CURSOR_MCP_CONNECT_TIMEOUT_MS=5000 pi --model cursor/composer-2-5
37
+ PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1 pi --model cursor/composer-2-5
38
38
  ```
39
39
 
40
40
  `PI_CURSOR_PI_TOOL_BRIDGE=0` disables the bridge, including `pi__cursor_ask_question`. `PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1` opts in to exposing overlapping pi tool names that Cursor already has native equivalents for (`read`, `bash`, `write`, `edit`, `grep`, `find`, and `ls`). By default those names are hidden even when pi's Cursor replay wrapper has registered them as extension tools; non-overlapping active built-ins remain bridgeable by default. The installed Cursor SDK uses a 60-second MCP protocol default; pi-cursor-sdk overrides that seam by default with 3600 seconds for MCP `callTool` requests and 10 seconds for verified initialize/listTools requests on first send. Unknown MCP protocol timeout stacks keep the SDK default. `PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1` emits typed, allowlisted, scrubbed single-line JSONL bridge diagnostics to `process.stderr` with prefix `[pi-cursor-sdk:bridge]`; it is off by default, uses run-safe IDs that are not reused in endpoint paths, and does not print endpoint URLs/path components/tokens, raw args/results, file contents, or secrets. Cursor-native tools, Cursor settings, plugins, and configured Cursor MCP servers still come from the Cursor SDK local agent path. Cloud Cursor agents are out of scope for this bridge.
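The timeout controls above could be resolved roughly as in the sketch below. The environment variable names and the 3600-second / 10-second defaults come from the text; the helper names, the rule that a `_MS` value wins over `_SECONDS`, and the handling of invalid values are assumptions made for illustration only.

```ts
type Env = Record<string, string | undefined>;

// Accept only finite, positive numbers; anything else counts as "not set" (assumption).
function parsePositive(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const value = Number(raw);
  return Number.isFinite(value) && value > 0 ? value : undefined;
}

function resolveTimeoutMs(env: Env, msVar: string, secondsVar: string, defaultSeconds: number): number {
  const ms = parsePositive(env[msVar]);
  if (ms !== undefined) return ms; // assumption: the _MS variable takes precedence
  const seconds = parsePositive(env[secondsVar]);
  if (seconds !== undefined) return seconds * 1000;
  return defaultSeconds * 1000;
}

function resolveBridgeTimeouts(env: Env): { callToolMs: number; connectMs: number } {
  return {
    // Default 3600 seconds for MCP callTool requests.
    callToolMs: resolveTimeoutMs(env, "PI_CURSOR_MCP_TOOL_TIMEOUT_MS", "PI_CURSOR_MCP_TOOL_TIMEOUT_SECONDS", 3600),
    // Default 10 seconds for initialize/listTools requests.
    connectMs: resolveTimeoutMs(env, "PI_CURSOR_MCP_CONNECT_TIMEOUT_MS", "PI_CURSOR_MCP_CONNECT_TIMEOUT_SECONDS", 10),
  };
}
```

Passing the environment as a parameter instead of reading `process.env` directly keeps the sketch testable without touching global state.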
@@ -62,13 +62,13 @@ When Cursor reports completed tool activity, the extension can display recorded
62
62
 
63
63
  Cursor `glob` activity is displayed through native `find` cards.
64
64
 
65
- For the full `@cursor/sdk@1.0.16` `ToolType` set, disposition matrix, and runtime alias normalization, see [SDK ToolType replay matrix](#sdk-tooltype-replay-matrix) below. Official SDK reference: https://cursor.com/docs/sdk/typescript
65
+ For the full `@cursor/sdk@1.0.17` `ToolType` set, disposition matrix, and runtime alias normalization, see [SDK ToolType replay matrix](#sdk-tooltype-replay-matrix) below. Official SDK reference: https://cursor.com/docs/sdk/typescript
66
66
 
67
67
  Edit and write activity replays through pi-facing `edit` and `write` cards only when replay arguments truthfully satisfy the matching pi schema, but still uses recorded Cursor results only. The adapter passes through truthful Cursor paths, content when Cursor reported it, and recorded diff/details; it does not pretend Cursor's editing schema is pi's schema and it fails closed if a recorded replay result is missing. Cursor `StrReplace` with recorded replacement text displays as native-looking `edit`; path-only Cursor `edit` and notebook edit activity fall back to neutral Cursor activity so pi does not reject the replay before recorded-result handling. Cursor `write` displays as native-looking `write`. Diagnostics, delete, todos/plans, task, image, MCP, semantic search, screen recording, and web search/fetch activity use neutral Cursor activity cards with pi's default success/error tool shell. MCP completions whose `toolName` is `WebSearch` / `web_search` / `WebFetch` / similar are labeled **Cursor web search** or **Cursor web fetch** instead of generic **Cursor MCP**. Neutral Cursor activity cards carry display metadata such as `activityTitle` and `activitySummary`, so partial/collapsed cards can say `Cursor plan`, `Cursor todos`, `Cursor MCP`, `Cursor semantic search`, `Cursor screen recording`, `Cursor web search`, `Cursor web fetch`, or `Cursor edit` instead of only `Cursor activity`. These replay tools only display recorded Cursor results; they never mutate files or execute tool work directly. Replay paths are normalized to workspace-relative paths when possible. Most collapsed replay cards include bounded previews for diffs and text details so small edits, todos, task output, and MCP results are visible without expanding; web search/fetch activity stays summary-only while collapsed because those cards often arrive after final text and can otherwise bury the answer. Ctrl+O expansion shows the recorded details. 
Edit previews omit raw unified diff headers and show compact numbered changed/context lines using pi's native diff added/removed/context colors, and write previews use syntax highlighting when pi can infer a language from the path. Image generation replay cards show the saved image path in the collapsed summary and render the image inline when pi terminal image display is enabled and the generated file is still readable.
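The routing described in this paragraph can be summarized as a small decision function. This is a simplified sketch under stated assumptions: `RecordedCall`, `ReplayCard`, `routeReplay`, and the `hasReplacementText` flag are hypothetical, and the real rules live in `src/cursor-tool-presentation-registry.ts`, which this example does not reproduce.

```ts
// Hypothetical types for illustration only.
interface RecordedCall {
  tool: string; // Cursor tool name, e.g. "read", "glob", "shell", "edit"
  hasReplacementText?: boolean; // true when recorded args satisfy pi's edit schema (assumed flag)
}

type ReplayCard =
  | { kind: "native"; tool: string }
  | { kind: "neutral"; activityTitle: string };

function routeReplay(call: RecordedCall): ReplayCard {
  switch (call.tool) {
    case "read":
    case "grep":
    case "ls":
    case "write":
      return { kind: "native", tool: call.tool };
    case "shell":
      return { kind: "native", tool: "bash" };
    case "glob":
      // Cursor glob activity is displayed through native find cards.
      return { kind: "native", tool: "find" };
    case "edit":
      // Path-only edits fall back to a neutral card so pi does not reject the replay.
      return call.hasReplacementText
        ? { kind: "native", tool: "edit" }
        : { kind: "neutral", activityTitle: "Cursor edit" };
    case "mcp":
      return { kind: "neutral", activityTitle: "Cursor MCP" };
    default:
      return { kind: "neutral", activityTitle: "Cursor activity" };
  }
}
```

In every branch the card only displays recorded Cursor results; nothing in this routing reruns a read, shell command, or file edit.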
68
68
 
69
69
  ## SDK ToolType replay matrix
70
70
 
71
- Source of truth for SDK tool names: `@cursor/sdk@1.0.16` conversation `ToolType` values and https://cursor.com/docs/sdk/typescript
71
+ Source of truth for SDK tool names: `@cursor/sdk@1.0.17` conversation `ToolType` values and https://cursor.com/docs/sdk/typescript
72
72
 
73
73
  Implementation owners: `src/cursor-tool-presentation-registry.ts` (canonical names, labels, visibility, replay policy, bridge exclusions for internal replay wrappers, and display-spec key completeness), `src/cursor-transcript-tool-specs.ts` (registry-keyed `TOOL_DISPLAY_SPECS` formatters/builders), `src/cursor-native-tool-display-replay.ts` (replay card rendering derived from registry replay metadata), and `src/cursor-transcript-utils.ts` (`normalizeToolName()` delegating to the registry).
74
74
 
@@ -197,7 +197,7 @@ Native replay wrappers are registered only for tool names not already owned by a
197
197
  Disable native replay registration entirely:
198
198
 
199
199
  ```bash
200
- PI_CURSOR_NATIVE_TOOL_DISPLAY=0 pi --model cursor/composer-2.5
200
+ PI_CURSOR_NATIVE_TOOL_DISPLAY=0 pi --model cursor/composer-2-5
201
201
  ```
202
202
 
203
203
  `PI_CURSOR_REGISTER_NATIVE_TOOLS=0` is also accepted as a registration-only opt-out.
@@ -1,20 +1,22 @@
1
1
  # Cursor Native Tool Visual Audit Workflow
2
2
 
3
+ > **Platform Smoke (new):** The required cross-platform release gate includes a deterministic visual card matrix across all targets. See [docs/platform-smoke.md](./platform-smoke.md) for the required cards, assertion contract, and platform-matrix budget.
4
+
3
5
  This workflow is the canonical repo path for verifying Cursor SDK tool replay the way a human sees it in pi's interactive TUI, without stealing macOS focus.
4
6
 
5
7
  Use it before accepting replay-card commits or PRs, and for every Cursor provider/runtime release where TUI card/color behavior could regress. Text logs and JSONL are necessary, but they are not enough when the claim is visual parity: always keep PNGs for the exact prompt, and keep before/after PNGs when reviewing a rendering change.
6
8
 
7
- Current validation baseline: pi 0.77.0, exact `@cursor/sdk@1.0.16`, local validation packages `@earendil-works/pi-ai`, `@earendil-works/pi-coding-agent`, and `@earendil-works/pi-tui` at 0.77.0. Published peer dependencies remain minimum-only at pi 0.76.0+ with no upper bound, so newer pi installs can try the extension before a matching validation release exists.
9
+ Current validation baseline: pi 0.78.0, exact `@cursor/sdk@1.0.17`, local validation packages `@earendil-works/pi-ai`, `@earendil-works/pi-coding-agent`, and `@earendil-works/pi-tui` at 0.78.0. Published peer dependencies remain minimum-only at pi 0.76.0+ with no upper bound, so newer pi installs can try the extension before a matching validation release exists.
8
10
 
9
- ## Cursor SDK 1.0.16 / pi 0.77.0 cutover visual record
11
+ ## Cursor SDK 1.0.17 / pi 0.78.0 cutover visual record
10
12
 
11
13
  Record the required cutover validation here or in the final release handoff. The default matrix is native replay only: the runner forces native replay registration on, forces Cursor setting sources off, disables the pi bridge, disables overlapping built-in pi tool exposure, and clears inherited Cursor SDK event-debug artifact env. With `--event-debug`, debug capture writes to a deterministic directory under the visual output directory. Do not commit raw ANSI logs, screenshots, terminal recordings, debug artifacts, or `.debug/visual-smoke` scratch files.
12
14
 
13
15
  | Field | Required value / evidence |
14
16
  | --- | --- |
15
17
  | Command/session used | `npm run smoke:visual -- --ext "$PWD" --cwd "$PWD" --mode plan --out-dir <fresh /tmp dir> --label <matrix label> --prompt <matrix prompt>` with default native-replay isolation |
16
- | Baseline versions | `pi --version` = 0.77.0; `npm ls` = `@cursor/sdk@1.0.16` and local `@earendil-works/*@0.77.0` |
17
- | Card categories checked | Claim only categories proven by both PNG and JSONL. Required cutover categories are read, grep/search, find/glob, list, shell success, write, edit/diff, and true read failure. Neutral Cursor plan/todo/task/mode activity is optional/opportunistic and only counts when JSONL contains a completed Cursor workflow event. |
18
+ | Baseline versions | `pi --version` = 0.78.0; `npm ls` = `@cursor/sdk@1.0.17` and local `@earendil-works/*@0.78.0` |
19
+ | Card categories checked | Claim only categories proven by both PNG and JSONL. Required cutover categories are read, grep/search, find/glob, shell success, write, edit/diff, and true read failure. Direct `ls`/list is tracked as excluded from the current one-prompt platform matrix because composer-2-5 does not route it through native `ls` reliably; source-enumeration coverage is gated through find/glob. Neutral Cursor plan/todo/task/mode activity is optional/opportunistic and only counts when JSONL contains a completed Cursor workflow event. |
18
20
  | Observed status/card colors | Confirm native-looking cards use native pi styling; neutral Cursor activity is not red; true errors are distinct; diff previews show red/green; plan status is readable |
19
21
  | Screenshot/ANSI evidence location | External path only, for example `/tmp/pi-cursor-sdk-1016-visual.*/read-package.{ansi,txt,html,png,jsonl.path}` |
20
22
  | Debug artifact location | External `.debug/cursor-sdk-events/...` or temp artifact directory path only; do not commit raw artifacts |
@@ -27,7 +29,7 @@ Required prompt matrix for this cutover:
27
29
  | `read-package` | `Use only your file read tool. Read ./package.json and answer with only the package name. Do not use shell, grep, glob, find, or list tools.` | `toolCall.name=read`, `toolResult.toolName=read`, `isError=false` | Native-looking read card; collapsed label/path readable |
28
30
  | `grep-readme` | `Use only your grep/search tool to search ./README.md for the literal string "pi-cursor-sdk". Do not use shell, read, glob, find, ls, or list tools. Report only the first matching file path.` | `toolCall.name=grep`, `toolResult.toolName=grep`, `isError=false` | Native-looking grep/search card; match preview readable |
29
31
  | `find-readme` | `Use only your glob/file-search/find tool to find README.md from the repository root. Do not use shell, read, grep, ls, or list tools. Report matched paths exactly.` | `toolCall.name=find`, `toolResult.toolName=find`, `isError=false` | Native-looking find/glob card; matched path readable |
30
- | `list-src` | `Use only your directory listing tool to list ./src. Do not use shell, read, grep, glob, or find tools. Report whether cursor-provider.ts is present.` | `toolCall.name=ls`, `toolResult.toolName=ls`, `isError=false` | Native-looking list card; directory/path readable |
32
+ | `list-src` | Excluded from current required platform matrix. Track manually when Cursor reliably routes this prompt through native `ls`. | `toolCall.name=ls`, `toolResult.toolName=ls`, `isError=false` when exercised | Native-looking list card; directory/path readable |
31
33
  | `shell-success` | `Use only your shell/terminal tool to run printf 'cursor visual smoke\\n'. Do not use read, grep, glob, find, ls, edit, or write. Report the output.` | `toolCall.name=bash`, `toolResult.toolName=bash`, `isError=false` | Shell success card is not red/error-styled; stdout readable |
32
34
  | `write-file` | `Use your normal file write tool to create .debug/visual-smoke/cursor-mode.txt with exactly two lines: alpha and beta. Do not use shell.` | `toolCall.name=write`, `toolResult.toolName=write`, `isError=false` | Native-looking write card; path/content preview readable |
33
35
  | `edit-file` | `Use your normal file edit/str-replace tool to change beta to gamma in .debug/visual-smoke/cursor-mode.txt. Do not use shell.` | `toolCall.name=edit`, `toolResult.toolName=edit`, `isError=false` | Native-looking edit card; diff preview shows red/green added/removed lines |
@@ -61,7 +63,7 @@ The canonical workflow is now offscreen and browser-rendered:
61
63
  5. Save PNG screenshots with `agent_browser` when the harness is available, or Playwright directly when running outside that harness.
62
64
  6. Inspect the session JSONL for exact persisted `toolCall` / `toolResult` data.
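For step 6, a rough sketch of checking the JSONL evidence that the prompt matrix asks for (`toolCall.name`, `toolResult.toolName`, `isError=false`). The entry shapes are assumptions: the sketch treats any object with `type: "toolCall"` and a string `name` as a call, and any object with a string `toolName` and boolean `isError` as a result. It is not a substitute for the repo's own validator script.

```ts
interface ToolEvidence {
  calls: string[];
  results: { toolName: string; isError: boolean }[];
}

// Walks every JSON value in the session file and collects tool call/result evidence.
function collectToolEvidence(jsonl: string): ToolEvidence {
  const evidence: ToolEvidence = { calls: [], results: [] };

  const visit = (value: unknown): void => {
    if (Array.isArray(value)) {
      value.forEach(visit);
      return;
    }
    if (value === null || typeof value !== "object") return;
    const node = value as Record<string, unknown>;
    if (node.type === "toolCall" && typeof node.name === "string") {
      evidence.calls.push(node.name);
    }
    if (typeof node.toolName === "string" && typeof node.isError === "boolean") {
      evidence.results.push({ toolName: node.toolName, isError: node.isError });
    }
    Object.keys(node).forEach((key) => visit(node[key]));
  };

  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // skip blank lines between records
    visit(JSON.parse(trimmed));
  }
  return evidence;
}

// A card category counts only when both the call and a non-error result are persisted.
function provesCard(evidence: ToolEvidence, tool: string): boolean {
  const hasCall = evidence.calls.indexOf(tool) !== -1;
  const hasCleanResult = evidence.results.some((r) => r.toolName === tool && !r.isError);
  return hasCall && hasCleanResult;
}
```

JSONL evidence is only half of the claim: a category still needs the matching PNG before it is recorded as checked.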
63
65
 
64
- This is the best default release path because it exercises the real pi TUI, captures card class/color/label/order/truncation issues before users see them, avoids desktop focus stealing, and leaves reviewable artifacts. Use visible Terminal/Ghostty screenshots only for terminal-specific or pixel-level bugs that cannot be judged through browser-rendered ANSI.
66
+ This is the best default focused visual-debug path because it exercises the real pi TUI, captures card class/color/label/order/truncation issues before users see them, avoids desktop focus stealing, and leaves reviewable artifacts. Use visible Terminal/Ghostty screenshots only for terminal-specific or pixel-level bugs that cannot be judged through browser-rendered ANSI. The cross-platform release gate remains [Platform Smoke](./platform-smoke.md).
65
67
 
66
68
  ## Tool stack
67
69
 
@@ -80,7 +82,7 @@ npx playwright install chromium
80
82
 
81
83
  `scripts/visual-tui-smoke.mjs` is the durable source of truth for this workflow. It must keep supporting:
82
84
 
83
- - fixed-size tmux PTY execution of the parent-resolved `pi -e <extension-dir> --model cursor/composer-2.5`
85
+ - fixed-size tmux PTY execution of the parent-resolved `pi -e <extension-dir> --model cursor/composer-2-5`
84
86
  - parent-resolved `pi` and `tmux` command paths reused in tmux-launched runs, with `process.execPath`'s directory prepended for prereq checks and tmux launches so Node shims use the validated Node
85
87
  - `PI_CURSOR_NATIVE_TOOL_DISPLAY=1`
86
88
  - `PI_CURSOR_REGISTER_NATIVE_TOOLS=1` by default
@@ -1,10 +1,12 @@
1
1
  # Cursor Testing Lessons
2
2
 
3
+ > **Platform Smoke (new):** The required cross-platform release gate is `npm run smoke:platform:doctor && npm run smoke:platform:all`. See [docs/platform-smoke.md](./platform-smoke.md). For portable lessons other pi extension projects can adapt without sharing repo-specific state, see [Crabbox Platform Testing Lessons](./crabbox-platform-testing-lessons.md). The live smoke checklist remains useful for inner-loop development but is not the release gate.
4
+
3
5
  ## Purpose
4
6
 
5
7
  This document records maintainer testing lessons for `pi-cursor-sdk`. It complements unit tests and the [Cursor live smoke checklist](./cursor-live-smoke-checklist.md). Use it when adding regression coverage, debugging false-green releases, or building isolated smoke harnesses.
6
8
 
7
- For a **minimal one-session dogfood pass** (baseline env, one native + one bridge call, JSONL ID patterns, bootstrap manifest, edit diff card), use the [Cursor dogfood checklist](./cursor-dogfood-checklist.md) before running the full live smoke matrix.
9
+ For a **minimal one-session dogfood pass** (baseline env, one native + one bridge call, JSONL ID patterns, bootstrap manifest, edit diff card), use the [Cursor dogfood checklist](./cursor-dogfood-checklist.md) as inner-loop evidence before running the platform smoke gate.
8
10
 
9
11
  ## Core lesson: integration-shaped bugs beat unit mocks
10
12
 
@@ -176,7 +178,7 @@ Simulate plan-mode execute stripping with the repo fixture:
176
178
  It sets active tools to `read`, `bash`, `edit`, `write` on each `turn_start`. Run pi with:
177
179
 
178
180
  ```bash
179
- pi -e scripts/fixtures/plan-strip-shim --cursor-no-fast --model cursor/composer-2.5 \
181
+ pi -e scripts/fixtures/plan-strip-shim --cursor-no-fast --model cursor/composer-2-5 \
180
182
  --session-dir "$SMOKE_DIR/plan-strip" \
181
183
  -p 'After reset, read README.md and answer PLAN_STRIP_OK=yes.'
182
184
  ```
@@ -189,15 +191,17 @@ Pass criteria:
189
191
 
190
192
  ## Local validation ladder
191
193
 
192
- Run in order before claiming release-ready for provider/runtime changes:
194
+ Run local checks first, then the platform smoke gate before claiming release-ready for provider/runtime changes:
193
195
 
194
196
  ```bash
195
197
  npm test
196
198
  npm run typecheck
197
199
  npm pack --dry-run
198
200
  SKIP_LIVE=1 npm run smoke:isolated
199
- npm run smoke:isolated # requires auth.json or CURSOR_API_KEY
200
- npm run smoke:live # partial tmux checklist subset
201
+ npm run smoke:isolated # inner-loop helper; requires auth.json or CURSOR_API_KEY
202
+ npm run smoke:live # inner-loop partial tmux checklist subset
203
+ npm run smoke:platform:doctor
204
+ npm run smoke:platform:all
201
205
  ```
202
206
 
203
207
  After changing `scripts/validate-smoke-jsonl.mjs` or replay scan expectations, also run:
@@ -206,14 +210,15 @@ After changing `scripts/validate-smoke-jsonl.mjs` or replay scan expectations, a
206
210
  npm test -- test/validate-smoke-jsonl.test.ts
207
211
  ```
208
212
 
209
- Then follow the full manual [Cursor live smoke checklist](./cursor-live-smoke-checklist.md) for surfaces the scripts do not cover (bridge MCP, abort/cancel, full TUI observation, packaging review, cleanup).
213
+ Then use the [Cursor live smoke checklist](./cursor-live-smoke-checklist.md) only for focused inner-loop surfaces the scripts do not cover (bridge MCP, abort/cancel, full TUI observation, packaging review, cleanup) before rerunning the platform smoke gate.
210
214
 
211
- ## What belongs in CI vs manual smoke
215
+ ## What belongs in CI vs platform/manual smoke
212
216
 
213
217
  - **CI / default `npm test`:** mocked provider tests, extension lifecycle tests, JSONL validator tests, script syntax/help checks. No live Cursor calls.
214
- - **Manual / pre-release:** `npm run smoke:isolated`, `npm run smoke:live`, and the full checklist. Requires real Cursor auth and observes TUI/runtime behavior mocks cannot reproduce.
218
+ - **Platform release gate:** `npm run smoke:platform:doctor && npm run smoke:platform:all`. Requires real Cursor auth and cross-platform Crabbox setup.
219
+ - **Focused manual smoke:** `npm run smoke:isolated`, `npm run smoke:live`, and selected live-checklist sections for inner-loop debugging of behavior that mocks cannot reproduce.
215
220
 
216
- If live smoke auth is unavailable, report the release as **blocked**, not skipped-ready.
221
+ If platform smoke auth or target setup is unavailable, report the release as **blocked**, not skipped-ready.
 
  ## Cursor SDK event capture probe
 
@@ -238,7 +243,7 @@ The script writes timestamped artifacts under `--out` (default `/tmp/pi-cursor-s
 
  Stdout prints artifact paths and summary counts only. Raw payloads stay on disk and may contain local paths, project text, tool args/results, or secrets — do not commit or share them.
 
- Hard repo rule: Cursor SDK behavior claims must come from the installed `@cursor/sdk` package and/or https://cursor.com/docs/sdk/typescript, not from memory or ad-hoc probes alone. Current cutover validation targets exact `@cursor/sdk@1.0.16` and pi 0.77.0 local packages.
+ Hard repo rule: Cursor SDK behavior claims must come from the installed `@cursor/sdk` package and/or https://cursor.com/docs/sdk/typescript, not from memory or ad-hoc probes alone. Current cutover validation targets exact `@cursor/sdk@1.0.17` and pi 0.78.0 local packages.
 
  ## Pi provider SDK event capture
 
@@ -249,7 +254,7 @@ One-shot maintainer script (RPC pi run, gitignored artifacts by default):
  ```bash
  CURSOR_API_KEY=... npm run debug:provider-events -- \
  --cwd . \
- --model cursor/composer-2.5 \
+ --model cursor/composer-2-5 \
  --prompt 'Repro prompt here' \
  --out .debug/cursor-sdk-events/manual-repro
  ```
@@ -289,7 +294,7 @@ Artifacts under `--out` (default `.debug/cursor-sdk-events/<timestamp>/` under `
  During any normal pi session you can also opt in with:
 
  ```bash
- PI_CURSOR_SDK_EVENT_DEBUG=1 pi -e . --model cursor/composer-2.5
+ PI_CURSOR_SDK_EVENT_DEBUG=1 pi -e . --model cursor/composer-2-5
  ```
 
  Multi-turn sessions group automatically by pi session file:
@@ -340,7 +345,7 @@ Ask the reporter (or capture yourself) for:
  | Field | Why |
  | --- | --- |
  | `pi --version` and installed `pi-cursor-sdk` version | Confirms extension/runtime in use |
- | Model ID (for example `cursor/composer-2.5`) | Routing/replay behavior is model-scoped |
+ | Model ID (for example `cursor/composer-2-5`) | Routing/replay behavior is model-scoped |
  | Exact repro prompt and prior turns | Multi-turn replay history affects prompt text |
  | Flags: `--cursor-no-fast`, `PI_CURSOR_PI_TOOL_BRIDGE`, `PI_CURSOR_EXPOSE_BUILTIN_TOOLS`, `PI_CURSOR_SETTING_SOURCES`, `PI_CURSOR_TOOL_MANIFEST` | Bridge vs native-only vs narrowed settings; bootstrap callable-surface manifest |
  | Whether the listed names are `pi__*` bridge MCP, Cursor-native (`browser_navigate`, `WebSearch`), or `cursor-replay-*` replay IDs | Three different surfaces (see [Cursor native tool replay](./cursor-native-tool-replay.md#live-bridge-vs-replay)) |
@@ -361,7 +366,7 @@ chmod 600 "$SMOKE_DIR/home/.pi/agent/auth.json"
  env -i HOME="$SMOKE_DIR/home" PATH="/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin" \
  MISE_DISABLE=1 \
  PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1 \
- pi -e . --cursor-no-fast --model cursor/composer-2.5 \
+ pi -e . --cursor-no-fast --model cursor/composer-2-5 \
  --session-dir "$SMOKE_DIR/session" \
  -p '<exact reporter prompt>'
  ```
@@ -373,7 +378,7 @@ For pi parsing, replay routing, or bridge timing, prefer:
  ```bash
  npm run debug:provider-events -- \
  --cwd "$PWD" \
- --model cursor/composer-2.5 \
+ --model cursor/composer-2-5 \
  --prompt '<exact reporter prompt>' \
  --out "$SMOKE_DIR/provider-events"
  ```
@@ -394,7 +399,7 @@ npm run debug:sdk-events -- \
 
  Start with whether pi stayed alive:
 
- 0. **pi process exited / shell returned with uncaught `ConnectError` (`ETIMEDOUT`, code 14, `read ETIMEDOUT`)** — hard network crash bypassing provider error surfacing. Route to **#43** (coordinate with #55 for caught-failure messaging). If tools were mid-flight, note whether session JSONL ends abruptly; do not classify as #40 model text echo.
+ 0. **pi process exited / shell returned with uncaught `ConnectError` (for example `ETIMEDOUT`, `ECONNRESET`, `read ETIMEDOUT`, or `[aborted] read ECONNRESET`)** — hard network crash bypassing provider error surfacing. Current code guards observed Cursor SDK/network-reset shapes during active Cursor turns and should show scrubbed retry guidance instead; treat a fresh process exit as a process-guard regression, capture the stack/session tail, and route to **#43/#107** rather than #40 model text echo. If tools were mid-flight, note whether session JSONL ends abruptly.
 
  Then inspect the failing assistant turn in `$SMOKE_DIR/session/*.jsonl`:
 
@@ -414,7 +419,7 @@ rg '"type": "toolCall"|Tool call \(Cursor|cursor-replay-' "$SMOKE_DIR/session"/*
 
  ### When to file follow-ups
 
- - **#43** — pi exited from uncaught `ConnectError` / `ETIMEDOUT` during Cursor SDK HTTP traffic (hard crash, not a scrubbed #55 toast).
+ - **#43/#107** — pi exited from uncaught Cursor SDK `ConnectError` / network reset during HTTP traffic (hard crash, not a scrubbed #55 toast). Observed `ETIMEDOUT` and `ECONNRESET` shapes should be guarded during active Cursor turns; new exits need stack/session evidence.
  - **#55** — caught SDK run failure or abort with missing/opaque detail (already addressed on main for surfacing).
  - **#52** — stale/inactive native replay routing after plan-strip or stale `context.tools` snapshot (`Tool * not found` in JSONL, `inactive_trace` in `display-decisions.jsonl`); or maintainer needs an explicit "started X, never completed" debug line when JSONL shows no completion and no model text echo.
  - **New issue** — bridge dispatch failure with `[pi-cursor-sdk:bridge]` evidence, or proven provider bug with JSONL showing missing `toolCall` despite SDK `tool-call-completed` in `on-delta.jsonl` from `debug:provider-events` or `debug:sdk-events` artifacts.
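
The first follow-up bucket above keys on the error shape in the captured stack or terminal output. A minimal sketch of that first-pass sorting follows. `classify_exit_line` is a hypothetical triage helper, not the package's process guard, and it is only meaningful when the pi process actually exited; caught failures belong to #55.

```shell
#!/usr/bin/env bash
# Hypothetical triage helper (not part of pi-cursor-sdk and not the
# package's process guard): buckets one captured error line by shape.

# classify_exit_line LINE
# Prints "network-reset", "connect-error", or "other".
classify_exit_line() {
  case "$1" in
    # Shapes the updated docs name explicitly: candidates for #43/#107.
    *ETIMEDOUT*|*ECONNRESET*) echo "network-reset" ;;
    # A ConnectError without a named shape: inspect the stack before routing.
    *ConnectError*)           echo "connect-error" ;;
    # Anything else: continue with the session JSONL inspection.
    *)                        echo "other" ;;
  esac
}
```

For example, `classify_exit_line 'ConnectError: [aborted] read ECONNRESET'` prints `network-reset`, which per the list above still needs stack and session-tail evidence before filing.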
@@ -31,13 +31,13 @@ Default behavior:
 
  ```bash
  # Disable pi bridge entirely
- PI_CURSOR_PI_TOOL_BRIDGE=0 pi --model cursor/composer-2.5
+ PI_CURSOR_PI_TOOL_BRIDGE=0 pi --model cursor/composer-2-5
 
  # Expose overlapping pi builtins through the bridge
- PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1 pi --model cursor/composer-2.5
+ PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1 pi --model cursor/composer-2-5
 
  # Disable bootstrap tool manifest
- PI_CURSOR_TOOL_MANIFEST=0 pi --model cursor/composer-2.5
+ PI_CURSOR_TOOL_MANIFEST=0 pi --model cursor/composer-2-5
  ```
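
Every example in this diff switches the model ID spelling from `cursor/composer-2.5` to `cursor/composer-2-5`. A minimal sketch for applying the same rewrite to local notes or scripts follows. `migrate_model_id` is a hypothetical name, and whether the dotted spelling is still accepted at runtime is not stated in this diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pi-cursor-sdk): rewrites the dotted
# model ID spelling to the hyphenated one used in the updated docs.
# Reads text on stdin and writes the rewritten text to stdout.
migrate_model_id() {
  # The dot is escaped so only the literal "2.5" spelling is matched.
  sed 's#cursor/composer-2\.5#cursor/composer-2-5#g'
}

# Usage: preview the rewrite for one file before changing anything.
#   migrate_model_id < scripts/my-smoke.sh | diff scripts/my-smoke.sh -
```

The path `scripts/my-smoke.sh` in the usage comment is a placeholder for whatever local file still carries the old ID.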
 
  ## Cursor settings vs pi toggles