pi-cursor-sdk 0.1.27 → 0.1.29
- package/CHANGELOG.md +29 -0
- package/README.md +40 -37
- package/docs/crabbox-platform-testing-lessons.md +508 -0
- package/docs/cursor-dogfood-checklist.md +4 -3
- package/docs/cursor-live-smoke-checklist.md +24 -22
- package/docs/cursor-model-ux-spec.md +12 -12
- package/docs/cursor-native-tool-replay.md +10 -10
- package/docs/cursor-native-tool-visual-audit.md +9 -7
- package/docs/cursor-testing-lessons.md +22 -17
- package/docs/cursor-tool-surfaces.md +3 -3
- package/docs/platform-smoke.md +994 -0
- package/package.json +35 -6
- package/platform-smoke.config.mjs +21 -0
- package/scripts/debug-provider-events.mjs +10 -3
- package/scripts/debug-sdk-events.mjs +10 -2
- package/scripts/isolated-cursor-smoke.sh +4 -4
- package/scripts/lib/cursor-visual-render.mjs +1 -0
- package/scripts/platform-smoke/artifacts.mjs +124 -0
- package/scripts/platform-smoke/assertions.mjs +101 -0
- package/scripts/platform-smoke/card-detect.mjs +96 -0
- package/scripts/platform-smoke/crabbox-runner.mjs +215 -0
- package/scripts/platform-smoke/doctor.mjs +446 -0
- package/scripts/platform-smoke/jsonl-text.mjs +31 -0
- package/scripts/platform-smoke/live-suite-runner.mjs +677 -0
- package/scripts/platform-smoke/platform-build-windows.ps1 +187 -0
- package/scripts/platform-smoke/pty-capture.mjs +131 -0
- package/scripts/platform-smoke/render-ansi.mjs +65 -0
- package/scripts/platform-smoke/scenarios.mjs +186 -0
- package/scripts/platform-smoke/targets.mjs +900 -0
- package/scripts/platform-smoke/visual-evidence.mjs +139 -0
- package/scripts/platform-smoke.mjs +193 -0
- package/scripts/probe-mcp-coldstart.mjs +8 -1
- package/scripts/steering-rpc-smoke.mjs +1 -1
- package/scripts/tmux-live-smoke.sh +3 -3
- package/scripts/visual-tui-smoke.mjs +1 -1
- package/src/cursor-pi-tool-bridge-abort.ts +1 -0
- package/src/cursor-pi-tool-bridge-diagnostics.ts +12 -1
- package/src/cursor-pi-tool-bridge.ts +46 -1
- package/src/cursor-provider-errors.ts +18 -2
- package/src/cursor-provider-turn-lifecycle-emitter.ts +65 -8
- package/src/cursor-provider-turn-tool-ledger.ts +2 -3
- package/src/cursor-run-final-text.ts +11 -1
- package/src/cursor-sdk-process-error-guard.ts +1 -1
- package/src/cursor-state.ts +38 -19
- package/src/cursor-tool-lifecycle.ts +1 -1
- package/src/cursor-tool-manifest.ts +1 -1
- package/src/cursor-transcript-utils.ts +7 -3
|
@@ -1,16 +1,18 @@
|
|
|
1
1
|
# Cursor Live Smoke Checklist
|
|
2
2
|
|
|
3
|
+
> **Platform Smoke (new):** The required cross-platform release gate is `npm run smoke:platform:doctor && npm run smoke:platform:all`. See [docs/platform-smoke.md](./platform-smoke.md) for the full contract. The manual checks below remain useful inner-loop/debug tools but are not the required release gate.
|
|
4
|
+
|
|
3
5
|
## Purpose
|
|
4
6
|
|
|
5
|
-
Use this manual checklist
|
|
7
|
+
Use this manual checklist during development and debugging of Cursor provider/runtime changes. Unit tests and mocks are necessary, but they are not enough for this extension. See [Cursor testing lessons](./cursor-testing-lessons.md) for auth/isolated-harness pitfalls and the plan-mode replay regression that motivated recent hardening. Always assume every runtime surface is in scope. For release readiness, run the platform gate in [docs/platform-smoke.md](./platform-smoke.md); this checklist is inner-loop evidence only.
|
|
6
8
|
|
|
7
|
-
##
|
|
9
|
+
## Inner-loop rule
|
|
8
10
|
|
|
9
11
|
- Run from a clean working tree except for the intended branch diff.
|
|
10
|
-
- Use the local extension under test: `pi -e . --cursor-no-fast --model cursor/composer-2
|
|
12
|
+
- Use the local extension under test: `pi -e . --cursor-no-fast --model cursor/composer-2-5`.
|
|
11
13
|
- Use a temporary `--session-dir` for every run.
|
|
12
14
|
- Do not paste or commit Cursor API keys, raw session contents with secrets, endpoint URLs, or local private paths.
|
|
13
|
-
- If
|
|
15
|
+
- If an inner-loop check fails, stop and fix or use [docs/platform-smoke.md](./platform-smoke.md) as the release-blocking source of truth. Do not treat this checklist as a narrower replacement for the platform gate.
|
|
14
16
|
- Do not narrow the smoke scope to the apparent code diff. Treat provider reality, TUI behavior, bridge behavior, replay behavior, diagnostics safety, abort/cancel cleanup, usage accounting, packaging, and cleanup as in scope for every Cursor provider/runtime release.
|
|
15
17
|
- A check is passed only when the visible TUI/output, stderr diagnostics, and persisted JSONL agree with the expected behavior.
|
|
16
18
|
|
|
@@ -61,13 +63,13 @@ node scripts/validate-smoke-jsonl.mjs --replay-errors-only "$SMOKE_DIR/session-s
|
|
|
61
63
|
|
|
62
64
|
The replay scan flags only error `toolResult` / error assistant messages with `Tool grep/cursor/find/ls not found`, not successful reads of docs that mention those strings. See [Cursor testing lessons](./cursor-testing-lessons.md#what-counts-as-a-replay-failure).
|
|
63
65
|
|
|
64
|
-
`npm run smoke:live` is a helper only; it polls the section 3 TUI for answer/footer evidence and then cleans up the tmux session, but it does not replace the canonical rendered-PNG visual review in section 4. Run the relevant helper `--self-test` (`smoke:live`, `smoke:visual`, `smoke:steering`, or `smoke:isolated`) when changing sealed PATH or env wrappers. Release readiness
|
|
66
|
+
`npm run smoke:live` is a helper only; it polls the section 3 TUI for answer/footer evidence and then cleans up the tmux session, but it does not replace the canonical rendered-PNG visual review in section 4. Run the relevant helper `--self-test` (`smoke:live`, `smoke:visual`, `smoke:steering`, or `smoke:isolated`) when changing sealed PATH or env wrappers. Release readiness requires the platform smoke gate. Run focused manual checks below when debugging detailed visual TUI behavior, bridge, standalone native replay, abort/cancel, packaging, cleanup, or any touched runtime surface before rerunning the platform gate.
|
|
65
67
|
|
|
66
68
|
Pass criteria:
|
|
67
69
|
|
|
68
|
-
- `pi --version` reports pi 0.
|
|
69
|
-
- `npm ls` shows `@cursor/sdk@1.0.
|
|
70
|
-
- `cursor/composer-2
|
|
70
|
+
- `pi --version` reports pi 0.78.0 for this cutover baseline.
|
|
71
|
+
- `npm ls` shows `@cursor/sdk@1.0.17` and local `@earendil-works/*@0.78.0` packages.
|
|
72
|
+
- `cursor/composer-2-5` appears in the model list.
|
|
71
73
|
- No Cursor key or auth token is printed.
|
|
72
74
|
- If neither `~/.pi/agent/auth.json` cursor auth nor `CURSOR_API_KEY` is available, stop and report the live smoke as blocked.
|
|
73
75
|
|
|
@@ -75,7 +77,7 @@ Pass criteria:
|
|
|
75
77
|
|
|
76
78
|
```bash
|
|
77
79
|
PI_CURSOR_SETTING_SOURCES=none \
|
|
78
|
-
pi -e . --cursor-no-fast --model cursor/composer-2
|
|
80
|
+
pi -e . --cursor-no-fast --model cursor/composer-2-5 \
|
|
79
81
|
--session-dir "$SMOKE_DIR/basic" \
|
|
80
82
|
--no-tools \
|
|
81
83
|
-p 'Live smoke. Reply exactly: PI_CURSOR_SMOKE_OK' \
|
|
@@ -93,7 +95,7 @@ Pass criteria:
|
|
|
93
95
|
## 2. Default setting-source startup noise check
|
|
94
96
|
|
|
95
97
|
```bash
|
|
96
|
-
pi -e . --cursor-no-fast --model cursor/composer-2
|
|
98
|
+
pi -e . --cursor-no-fast --model cursor/composer-2-5 \
|
|
97
99
|
--session-dir "$SMOKE_DIR/default-settings" \
|
|
98
100
|
--no-tools \
|
|
99
101
|
-p 'Default settings smoke. Include PRODUCT=42 in the final answer.' \
|
|
@@ -115,23 +117,23 @@ Run a real interactive session under tmux:
|
|
|
115
117
|
```bash
|
|
116
118
|
SESSION="pi-cursor-sdk-smoke-$(date +%s)"
|
|
117
119
|
tmux new-session -d -s "$SESSION" -x 120 -y 40 -- zsh -lc \
|
|
118
|
-
"cd '$PWD' && PI_CURSOR_SETTING_SOURCES=none pi -e . --cursor-no-fast --model cursor/composer-2
|
|
120
|
+
"cd '$PWD' && PI_CURSOR_SETTING_SOURCES=none pi -e . --cursor-no-fast --model cursor/composer-2-5 --session-dir '$SMOKE_DIR/tui' --session-id cursor-sdk-1016-tui --no-tools 'TUI smoke. Compute 19 + 23. Reply only with SUM=<number>.'"
|
|
119
121
|
```
|
|
120
122
|
|
|
121
123
|
Observe with `tmux capture-pane -pt "$SESSION"` or attach manually.
|
|
122
124
|
|
|
123
125
|
Pass criteria:
|
|
124
126
|
|
|
125
|
-
- Footer shows `(cursor) composer-2
|
|
126
|
-
- The run uses pi 0.
|
|
127
|
+
- Footer shows `(cursor) composer-2-5`. With `--cursor-no-fast`, Cursor fast mode is off and the Cursor extension status should not show `cursor fast`; ignore unrelated status text from other extensions.
|
|
128
|
+
- The run uses pi 0.78.0 `--session-id` successfully.
|
|
127
129
|
- Assistant answer appears correctly.
|
|
128
130
|
- `/session` shows one user and one assistant message for the simple run.
|
|
129
131
|
- Persisted JSONL has one assistant message. If the screen appears duplicated, inspect JSONL before deciding whether it is a rendering bug.
|
|
130
132
|
- Kill the tmux session after the check and verify no smoke tmux sessions remain.
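The persisted-JSONL criterion above can be checked mechanically. The following is a minimal sketch, not the project's validator: it assumes each JSONL line is a JSON object with a `message.role` field, which is a hypothetical shape because the real pi session schema is not shown in this checklist. `countAssistantMessages` is likewise an illustrative name.

```typescript
// Hypothetical record shape: the real pi session JSONL schema is not shown
// in this checklist, so `message.role` is an assumption for illustration.
interface SessionRecord {
  message?: { role?: string };
}

// Count records in raw JSONL text whose message role is "assistant".
function countAssistantMessages(jsonl: string): number {
  let count = 0;
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // tolerate blank lines and a trailing newline
    let record: SessionRecord | null;
    try {
      record = JSON.parse(trimmed) as SessionRecord | null;
    } catch {
      continue; // skip malformed lines instead of failing the whole scan
    }
    if (record?.message?.role === "assistant") count += 1;
  }
  return count;
}
```

For the simple TUI run, a result other than 1 would suggest inspecting the session file before treating a duplicated screen as a rendering bug.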
|
|
131
133
|
|
|
132
|
-
## 4.
|
|
134
|
+
## 4. Focused visual card/color rendering check
|
|
133
135
|
|
|
134
|
-
This is the canonical visual
|
|
136
|
+
This is the canonical inner-loop visual debug path for Cursor provider/runtime changes. It requires offscreen TUI visual inspection, not only JSONL or code review. Use pi 0.78.0, `@cursor/sdk@1.0.17`, a fresh temporary session dir, Cursor SDK `plan` mode, native replay enabled, and the checked-in visual runner. The runner resolves `pi` by directly walking the parent `PATH`, uses `process.execPath` for Node, and prepends that Node directory for both prereq checks and tmux launches so `#!/usr/bin/env node` shims use the validated Node. The default matrix is native replay only: native replay registration is forced on, settings sources are `none`, the pi bridge is off, overlapping built-in pi tools are not exposed, and inherited Cursor SDK event-debug artifact env is cleared. With `--event-debug`, debug capture writes to a deterministic directory under `VISUAL_DIR`.
|
|
135
137
|
|
|
136
138
|
```bash
|
|
137
139
|
VISUAL_DIR="$(mktemp -d /tmp/pi-cursor-sdk-1016-visual.XXXXXX)"
|
|
@@ -202,7 +204,7 @@ Pass criteria:
|
|
|
202
204
|
|
|
203
205
|
```bash
|
|
204
206
|
PI_CURSOR_SETTING_SOURCES=none \
|
|
205
|
-
pi -e . --cursor-no-fast --cursor-mode plan --model cursor/composer-2
|
|
207
|
+
pi -e . --cursor-no-fast --cursor-mode plan --model cursor/composer-2-5 \
|
|
206
208
|
--session-dir "$SMOKE_DIR/cursor-mode-plan" \
|
|
207
209
|
--session-id cursor-sdk-1016-plan \
|
|
208
210
|
--no-tools \
|
|
@@ -224,7 +226,7 @@ Pass criteria:
|
|
|
224
226
|
PI_CURSOR_SETTING_SOURCES=none \
|
|
225
227
|
PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1 \
|
|
226
228
|
PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1 \
|
|
227
|
-
pi -e . --cursor-no-fast --model cursor/composer-2
|
|
229
|
+
pi -e . --cursor-no-fast --model cursor/composer-2-5 \
|
|
228
230
|
--session-dir "$SMOKE_DIR/bridge" \
|
|
229
231
|
-p 'Bridge smoke. Do exactly two tool calls before answering: first call pi__read on ./package.json; second call pi__read on ./definitely-missing-pi-cursor-sdk-smoke-file.txt. Then answer: OK_NAME=<package name>; MISSING_RESULT=<error or success>. Do not use shell.' \
|
|
230
232
|
> "$SMOKE_DIR/bridge.stdout.txt" \
|
|
@@ -245,7 +247,7 @@ Pass criteria:
|
|
|
245
247
|
PI_CURSOR_SETTING_SOURCES=none \
|
|
246
248
|
PI_CURSOR_PI_TOOL_BRIDGE=0 \
|
|
247
249
|
PI_CURSOR_NATIVE_TOOL_DISPLAY=1 \
|
|
248
|
-
pi -e . --cursor-no-fast --model cursor/composer-2
|
|
250
|
+
pi -e . --cursor-no-fast --model cursor/composer-2-5 \
|
|
249
251
|
--session-dir "$SMOKE_DIR/native-replay" \
|
|
250
252
|
-p 'Native replay smoke. Use your Cursor file-reading capability to read ./README.md, then answer README_SEEN=yes if it contains pi-cursor-sdk.' \
|
|
251
253
|
> "$SMOKE_DIR/native-replay.stdout.txt" \
|
|
@@ -311,7 +313,7 @@ Pass criteria:
|
|
|
311
313
|
|
|
312
314
|
## 9. Long-running bridge and abort/cancel
|
|
313
315
|
|
|
314
|
-
|
|
316
|
+
Use this focused check when debugging abort cleanup. The platform smoke gate is the release-blocking source of truth for every Cursor provider/runtime release.
|
|
315
317
|
|
|
316
318
|
Use a harmless long-running command and interrupt it after the bridge request is queued:
|
|
317
319
|
|
|
@@ -319,7 +321,7 @@ Use a harmless long-running command and interrupt it after the bridge request is
|
|
|
319
321
|
PI_CURSOR_SETTING_SOURCES=none \
|
|
320
322
|
PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1 \
|
|
321
323
|
PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1 \
|
|
322
|
-
pi -e . --cursor-no-fast --model cursor/composer-2
|
|
324
|
+
pi -e . --cursor-no-fast --model cursor/composer-2-5 \
|
|
323
325
|
--session-dir "$SMOKE_DIR/abort" \
|
|
324
326
|
-p 'Abort smoke. Call pi__bash with command: sleep 30 && echo SHOULD_NOT_PRINT. Do not answer until the tool completes.'
|
|
325
327
|
```
|
|
@@ -380,7 +382,7 @@ Pass criteria:
|
|
|
380
382
|
|
|
381
383
|
## Coverage gaps this checklist makes explicit
|
|
382
384
|
|
|
383
|
-
Everything in this section is in scope for Cursor provider/runtime
|
|
385
|
+
Everything in this section is in scope when using this checklist for Cursor provider/runtime debugging. Release readiness still comes from the platform smoke gate:
|
|
384
386
|
|
|
385
387
|
- Long-running bridged tool abort/cancel cleanup.
|
|
386
388
|
- Native replay cards beyond read, especially shell/edit/write cards, when those renderers change.
|
|
@@ -390,4 +392,4 @@ Everything in this section is in scope for Cursor provider/runtime releases. The
|
|
|
390
392
|
- Ambient Cursor setting-source behavior when startup filtering or local Cursor settings handling changes.
|
|
391
393
|
- Model discovery aliases/context variants when model-discovery code or Cursor SDK versions change.
|
|
392
394
|
|
|
393
|
-
If any surface has no adequate live check, add that
|
|
395
|
+
If any surface has no adequate platform or focused live check, add that coverage before release instead of assuming mocks cover reality.
|
|
@@ -15,7 +15,7 @@ Current implementation notes:
|
|
|
15
15
|
- Cursor status uses one coordinated `ctx.ui.setStatus("cursor", ...)` value for fast and non-default plan mode; the default pi footer remains intact.
|
|
16
16
|
- Installed `@cursor/sdk` user messages accept images, and Cursor models are treated as image-capable; registered input metadata is `text` plus `image`.
|
|
17
17
|
- Image payload forwarding sends images only from the latest user message. If the latest user turn is plain text after an earlier image turn, the transcript keeps an `[image omitted from transcript]` placeholder but no image bytes are sent to Cursor. The prompt explicitly tells Cursor that prior image bytes are unavailable and to ask the user to reattach or describe a prior image when needed. Carrying images forward across turns remains a future product decision because it affects token cost, privacy, stale visual context, and expected multimodal follow-up behavior.
|
|
18
|
-
- Exact `@cursor/sdk@1.0.
|
|
18
|
+
- Exact `@cursor/sdk@1.0.17` is a package dependency of this extension; users should not need a global SDK install. pi 0.78.0 is the current validation baseline, while published pi peer dependencies are minimum-only `>=0.76.0` ranges with no upper bound. Newer pi versions are allowed to attempt loading this extension before a matching extension release exists; compatibility is best-effort until validated.
|
|
19
19
|
- Cursor auth uses pi-native API-key resolution for provider `cursor`: CLI `--api-key`, stored `~/.pi/agent/auth.json` API key from `/login`, then `CURSOR_API_KEY`. The extension config file stores only non-secret Cursor-only state such as fast defaults.
|
|
20
20
|
- Local agents pass `settingSources: ["all"]` by default so Cursor MCP servers, plugin tools, project/user settings, and related Cursor-native capabilities are available. Users can narrow loading with a comma-separated list such as `PI_CURSOR_SETTING_SOURCES=project,user,plugins`, or disable ambient setting sources with `PI_CURSOR_SETTING_SOURCES=none`. The provider suppresses direct Cursor SDK bootstrap stdout/stderr/console noise (including late first-send workspace loading such as hook compatibility warnings) so it does not pollute pi's TUI.
|
|
21
21
|
- On `cursor/*` models, pi-cursor-sdk removes only pi-generated `<project_instructions>` blocks that overlap the effective Cursor `settingSources`: `user` for `~/.pi/agent/AGENTS.md`; `project` for discovered repo/parent `AGENTS.md` and `CLAUDE.md` (verified Cursor behavior: local agents load project `AGENTS.md` and `CLAUDE.md`). `~/.pi/agent/CLAUDE.md` is not removed (Cursor user layer uses `~/.claude/CLAUDE.md`). Blocks are removed by exact pi serialization match from structured `contextFiles` via the `before_agent_start` hook, not in `buildCursorPrompt` sanitization. Suppression is skipped with `-nc`, `PI_CURSOR_SETTING_SOURCES=none`, narrowed sources such as `plugins` that omit the matching layer, or `PI_CURSOR_PRESERVE_PI_AGENTS_MD=1`. Switching away from a Cursor model restores pi's full context block on the next user message.
|
|
@@ -26,18 +26,18 @@ Current implementation notes:
|
|
|
26
26
|
- Prompt text is the primary provider/bridge contract. Bootstrap prompts carry a short boundary block plus the callable-surface manifest by default (`PI_CURSOR_TOOL_MANIFEST=1`). MCP `listTools` descriptions use a one-line pointer to the bootstrap prompt instead of repeating the full contract (`buildCursorPiBridgeMcpToolDescription()`). Cursor must call the exposed `pi__*` MCP name, not the real pi tool name shown in pi history or transcripts. Pi emits and executes the real pi tool name. Maintainer debug: `/cursor-tools` prints bridge/manifest enablement, effective `PI_CURSOR_SETTING_SOURCES`, and the current callable-surface snapshot.
|
|
27
27
|
- The provider also registers `cursor_ask_question` for Cursor models when the bridge is enabled. Cursor sees it as `pi__cursor_ask_question`, and pi executes it through the normal tool path so interactive users can choose options from pi UI. In non-UI modes it reports that UI is unavailable so Cursor can state a default assumption instead. `PI_CURSOR_PI_TOOL_BRIDGE=0` disables the local bridge, including question bridging. Cloud Cursor agents remain out of scope for the bridge.
|
|
28
28
|
- The bridge queues MCP calls, emits provider `toolcall_*` events, waits for matching pi `toolResult` messages by `toolCallId`, resolves the result back into the same live Cursor SDK run without creating a new `Agent`, and never calls tool `execute()` handlers directly. The same-run resume invariant holds unless the run was disposed, aborted, or cancelled.
|
|
29
|
-
- Cursor SDK MCP tool calls use a guarded timeout override because installed `@cursor/sdk` 1.0.
|
|
29
|
+
- Cursor SDK MCP tool calls use a guarded timeout override because installed `@cursor/sdk` 1.0.17 has a 60-second MCP request default with no public per-server timeout option. The extension extends the verified Cursor SDK MCP `callTool` timeout path to 3600 seconds by default and shortens the verified first-send MCP initialize/listTools timeout paths to 10 seconds by default so unavailable configured MCP servers do not block the first reply for a full minute; unknown MCP protocol timeout stacks keep the SDK default. Users can override tool-call timeouts with `PI_CURSOR_MCP_TOOL_TIMEOUT_MS` or `PI_CURSOR_MCP_TOOL_TIMEOUT_SECONDS`, and initialize/listTools timeouts with `PI_CURSOR_MCP_CONNECT_TIMEOUT_MS` or `PI_CURSOR_MCP_CONNECT_TIMEOUT_SECONDS`.
|
|
30
30
|
- Bridge diagnostics are opt-in only: `PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1` writes typed, allowlisted, scrubbed single-line JSONL records to `process.stderr` with prefix `[pi-cursor-sdk:bridge]`. Diagnostics are scrubbed operational logs, not anonymous telemetry. They intentionally include tool names, safe correlation IDs, run lifecycle, exposed pi↔MCP name pairs, queued requests, result resolution, rejection, cancellation, and pending counts. Correlation IDs are generated independently from the tokenized endpoint path, and Cursor MCP call IDs are hashed before serialization. Diagnostics must not include endpoint paths/URLs/path components/tokens, API keys, bearer tokens, cookies, session credentials, raw args/results, stdout/stderr payloads, file contents, Cursor settings output, or local private session paths in tracked docs, and they must not call pi UI status, notification, or footer APIs. If tool names themselves are unacceptable for a release target, bridge debug diagnostics are not safe for shared logs under the current contract.
|
|
31
31
|
- This repo does not provide a generic desktop-automation, browser-driver, or CDP recipe. Provider docs should describe pi-cursor-sdk's Cursor provider/bridge contract only.
|
|
32
|
-
- Cursor internal tool activity is recorded from SDK events and scrubbed. Maintainer reference for all 16 `@cursor/sdk@1.0.
|
|
32
|
+
- Cursor internal tool activity is recorded from SDK events and scrubbed. Maintainer reference for all 16 `@cursor/sdk@1.0.17` `ToolType` values, runtime alias normalization, and intentional mapping/fallback rules: [Cursor native tool replay — SDK ToolType replay matrix](./cursor-native-tool-replay.md#sdk-tooltype-replay-matrix) (official SDK docs: https://cursor.com/docs/sdk/typescript). In interactive TTY sessions, supported completed `read`, `bash`, `grep`, `find`, `ls`, `edit`, `write`, diagnostics, delete, todo/plan, task, image generation, MCP, semantic search, and screen recording activity is replayed through pi's native tool-call rendering path with recorded Cursor results, so the TUI can show native-looking cards without rerunning Cursor's reads/shell commands/file edits. Cursor `glob` activity is replayed through native `find` cards. Cursor write activity is replayed through native-looking `write` cards, and Cursor StrReplace/edit activity uses native-looking `edit` only when recorded arguments truthfully satisfy pi's `edit` schema; path-only Cursor edit and notebook edit replay falls back to neutral Cursor activity before pi validation. Diagnostics, delete, todos/plans, task, image, and MCP activity use neutral Cursor activity cards with pi's default success/error shell. Neutral Cursor activity calls include `activityTitle` and, when available, `activitySummary` so partial/collapsed cards preserve identity such as `Cursor plan`, `Cursor todos`, `Cursor MCP`, or `Cursor edit`. 
For long-running or externally meaningful Cursor tools (`task`, `shell`, `mcp`, `generateImage`, `recordScreen`, `semSearch`, web search/fetch, plan/todo), the provider may surface one low-noise deferred in-progress thinking line such as `Cursor MCP: external_search` from bounded, scrubbed SDK args; fast local tools (`read`, `grep`, `glob`, and similar) skip lifecycle lines when completion follows immediately, and pi bridge MCP calls are excluded because pi already shows real pi tool execution ([lifecycle visibility](./cursor-native-tool-replay.md#low-noise-tool-lifecycle-visibility)). Replay-only tools display recorded Cursor results, normalize workspace-local paths/diff headers for display, use pi diff colors for edit previews and path-inferred syntax highlighting for write previews, and fail closed if called without a recorded result. Native replay wrappers are registered only for tool names not already owned by another extension; conflicting tools use the bounded scrubbed transcript fallback. Cursor workflow tools such as mode/task/todo/plan activity are not pi workflow controls; reported todo/plan events are displayed as Cursor activity only. Plan/todo replay cards can be followed by Cursor's final plan text, selected from `run.wait().result` when Cursor provides one and trimmed against already-emitted text. Started Cursor SDK tool calls that never receive a completion event are surfaced with bounded user-visible labels/traces (neutral activity cards when native replay routing allows, otherwise the same inactive or transcript trace fallbacks used for completed replay) instead of being silently discarded when the run failed/aborted, produced no assistant text, or involved external/side-effectful tools; incomplete fast local discovery starts (`read`, `grep`, `glob`, `ls`) remain maintainer-debug-only after successful text-producing runs so stale SDK start events do not create red post-answer cards. 
Explicit failures remain visible when Cursor reports them through completed tool calls or step results. Pi bridge MCP starts remain excluded from duplicate incomplete Cursor cards because pi already shows real pi tool execution. `PI_CURSOR_NATIVE_TOOL_DISPLAY=0` disables native replay, and `PI_CURSOR_REGISTER_NATIVE_TOOLS=0` is a registration-only opt-out that keeps the transcript fallback without shadowing pi tool names. When bridge or native replay cards are emitted, the provider mirrors Codex's turn shape as Cursor SDK activity arrives: assistant `toolUse`, pi `toolResult`s, live post-tool Cursor thinking/text, any later tool batches as further `toolUse` turns, then Cursor's final assistant answer. For shell replay, completed `stdout` / `stderr` are primary; unambiguous `shell-output-delta` data is used only as display-only fallback for empty successful shell completions, and overlapping shell calls drop ambiguous deltas instead of guessing. Non-interactive runs keep bounded scrubbed transcript output instead, preserving `pi -p` assistant text output. Cursor text deltas stream live when no live-run turn split is active.
|
|
33
33
|
- Synthetic replay names are internal compatibility details. New model-facing prompt text and user-visible cards use native tool names when renderer-compatible, or neutral Cursor activity labels when not. Legacy sessions containing old internal replay names are sanitized before prompt/display. Bridge MCP names such as `pi__sem_reindex` are MCP-only; pi session output uses real pi tool names.
|
|
34
34
|
- Cursor SDK usage events report cumulative internal agent/tool/cache work, not the replayable pi prompt context. The extension does not copy raw Cursor SDK usage into pi usage or compaction. For Cursor assistant messages, `usage.input`/`usage.output` are approximate pi session activity components: initial Cursor prompt input is counted once, consumed split-run tool results are counted as deduped input on the following assistant turn, and assistant output includes visible text/thinking/tool-call content. `usage.totalTokens` is the replayable Cursor prompt/context estimate derived from the same `buildCursorPrompt()` path used for `Agent.send`; it may differ from `input + output` and is the context-safe value for display/compaction. `src/cursor-usage-accounting.ts` owns this usage policy, and `src/cursor-live-run-accounting.ts` owns prompt-once and consumed-tool-result accounting so provider usage and bridge result resolution share the same matched tool-result boundary.
|
|
35
35
|
- Audit observation, 2026-05-19, superseded by the 2026-05-21 replay pass and #68 incomplete visibility, then narrowed by the 2026-05-26 fast-local suppression: a missing-file read with Composer 2.5 emitted `tool-call-started` for Cursor `read`, then streamed final text `Error: File not found`, but did not emit `tool-call-completed` or an `onStep` `toolCall` error result. Leftover external/side-effectful started calls are surfaced at run completion through the same native replay routing as completed tools (activity cards when allowed, otherwise inactive/transcript traces), while fast local discovery starts are debug-only after a successful text-producing run. Cursor-reported completed/step errors remain visible.
|
|
36
36
|
- Maintainer visual verification for replay-card changes should follow [Cursor Native Tool Visual Audit Workflow](./cursor-native-tool-visual-audit.md): offscreen PTY-driven pi run, xterm.js/Playwright screenshot rendering, and JSONL inspection before accepting commits or PRs.
|
|
37
|
-
- Cursor provider/runtime releases
|
|
37
|
+
- Cursor provider/runtime releases must pass the [Platform Smoke Gate](./platform-smoke.md): `npm run smoke:platform:doctor && npm run smoke:platform:all`. Use [Cursor Live Smoke Checklist](./cursor-live-smoke-checklist.md) only for focused inner-loop/debug runs with real `pi -e . --cursor-no-fast --model cursor/composer-2-5` invocations, manual observation, temporary session dirs, diagnostics scans, and persisted JSONL inspection. See [Cursor testing lessons](./cursor-testing-lessons.md) for auth.json seeding, isolated smoke harnesses, and replay JSONL scans. Assume every runtime surface is in scope.
|
|
38
38
|
- For models without a catalog `context` parameter, context windows are not hardcoded. The extension ships a bundled SDK-derived default/non-Max cache generated from `createAgentPlatform().checkpointStore.loadLatest(agentId).tokenDetails.maxTokens`. Successful runs can update a local override cache, but model discovery does not probe models at startup.
|
|
39
|
-
- Max Mode context windows are distinct from default/non-Max context windows. `@cursor/sdk` 1.0.
|
|
40
|
-
- The installed `@cursor/sdk` exposes latest-style `ModelListItem.aliases`. The extension registers only unambiguous aliases as pi model IDs (with the same context suffixes when applicable) and sends the alias back in `ModelSelection.id
|
|
39
|
+
- Max Mode context windows are distinct from default/non-Max context windows. `@cursor/sdk` 1.0.17 documentation says the SDK may enable Max Mode automatically when a selected model requires it, but the public local-agent `ModelSelection` path still does not expose a manual Max Mode selector. Do not advertise Max Mode context windows unless the SDK catalog exposes an exact parameter/variant or the SDK public API adds a Max Mode selector that the extension actually sends.
|
|
40
|
+
- The installed `@cursor/sdk` exposes latest-style `ModelListItem.aliases`. The extension registers only unambiguous aliases as pi model IDs (with the same context suffixes when applicable) and sends the alias back in `ModelSelection.id`. Cursor-only fast preferences are keyed by the selected SDK model ID/alias, with read fallback for older preferences keyed by the underlying catalog `id`. Aliases shared by multiple base models, such as generic family aliases, are skipped because the pi row metadata would otherwise imply one base model while Cursor may resolve the alias to another.
|
|
41
41
|
- Session-scoped Cursor SDK agent pooling reuses one live `@cursor/sdk` agent across compatible follow-up turns within the same pi session scope. `planCursorSessionSend()` in `src/cursor-session-send-policy.ts` decides whether the next turn sends a full bootstrap prompt or an incremental follow-up, whether the SDK agent must be recreated, and why. `computeCursorContextFingerprint()` and `shouldBootstrapCursorContext()` remain the context-only bootstrap signal. The pool recreates the agent when context diverges, when branch or compaction summaries appear after `/tree` navigation or compaction, after 20 completed incremental sends, when the API key identity changes, after send errors, on `session_shutdown`, and when `session_before_tree` / `session_tree` invalidate the active branch. Incremental sends omit the full Cursor SDK tool boundary block because the session agent retains prior bootstrap context, but every send ends with a short tool tail guard placed after the latest user request (including an explicit shell `cd` hint).
|
|
42
42
|
- Pi steering/follow-up delivery can arrive while a split live Cursor SDK run is still active. The provider resolves pending live runs by scanning trailing `toolResult` messages while skipping trailing `user` messages, tracks the active live run per session scope, and resumes the in-flight run instead of calling `Agent.send()` again. When the context ends with steering user text after tool results, the provider releases the prior live run and chains an incremental `Agent.send()` for the latest user message in the same provider turn; if the prior run emits more text or tool requests after steering arrives, that stale activity is cancelled instead of surfacing another old-run tool turn and losing the new user input. A pre-send guard waits for or resumes any still-active scoped live run before starting a fresh send so `@cursor/sdk` `AgentBusyError` (`already has active run`) does not surface to pi users. Pooled session agents mark busy as soon as live/direct `run.wait()` tracking starts (`trackRunCompletion` on the session lease), and `acquireSessionCursorAgent()` awaits that busy state before returning a lease so send planning, transcript offsets, and later `Agent.send()` do not race the prior turn's SDK run completion (for example pi auto-compaction summarization). `session_before_compact` calls `prepareCursorSessionForCompaction()` to release scoped live-run drain state and reset the pooled agent before summarization streams. Tracked completions and send commits are scoped to the pooled agent `instanceId` so disposal/replacement drops stale tracking and ignores late commits from disposed agents.
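The alias rule in the notes above (register only aliases that a single base model claims, and skip aliases shared by several) can be sketched as follows. This is an illustration under stated assumptions, not the extension's code: `CatalogModel` is a hypothetical reduced shape of the SDK's `ModelListItem`, and `unambiguousAliases` is an invented helper name.

```typescript
// Hypothetical reduced shape; the SDK's ModelListItem carries more fields.
interface CatalogModel {
  id: string;
  aliases?: string[];
}

// Map each alias to its base model id, keeping only aliases that exactly
// one base model claims. Aliases shared by several base models are dropped.
function unambiguousAliases(models: CatalogModel[]): Map<string, string> {
  const owners = new Map<string, Set<string>>();
  for (const model of models) {
    for (const alias of model.aliases ?? []) {
      let ids = owners.get(alias);
      if (ids === undefined) {
        ids = new Set<string>();
        owners.set(alias, ids);
      }
      ids.add(model.id);
    }
  }
  const registered = new Map<string, string>();
  owners.forEach((ids, alias) => {
    if (ids.size !== 1) return; // shared alias: skip it entirely
    ids.forEach((id) => registered.set(alias, id));
  });
  return registered;
}
```

Dropping a shared alias, rather than picking one owner, matches the stated concern that the pi row metadata would otherwise imply one base model while Cursor may resolve the alias to another.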
|
|
43
43
|
|
|
@@ -167,7 +167,7 @@ cursor/gpt-5.5@1m
|
|
|
167
167
|
cursor/gpt-5.5@272k
|
|
168
168
|
cursor/claude-opus-4-8@1m
|
|
169
169
|
cursor/claude-opus-4-8@300k
|
|
170
|
-
cursor/composer-2
|
|
170
|
+
cursor/composer-2-5
|
|
171
171
|
```
|
|
172
172
|
|
|
173
173
|
Avoid colon-based context IDs in the first implementation unless this spec is intentionally changed:
|
|
@@ -382,7 +382,7 @@ cursor fast
|
|
|
382
382
|
|
|
383
383
|
## Cursor SDK Mode Behavior
|
|
384
384
|
|
|
385
|
-
Cursor SDK 1.0.
|
|
385
|
+
Cursor SDK 1.0.17 exposes SDK-native conversation mode:
|
|
386
386
|
|
|
387
387
|
```ts
|
|
388
388
|
type AgentModeOption = "agent" | "plan";
|
|
@@ -462,7 +462,7 @@ Let pi persist:
|
|
|
462
462
|
The extension persists only Cursor-only state:
|
|
463
463
|
|
|
464
464
|
- `fast` per session,
|
|
465
|
-
- `fast` global default per Cursor
|
|
465
|
+
- `fast` global default per selected Cursor SDK model ID or alias,
|
|
466
466
|
- Cursor SDK `mode` per session,
|
|
467
467
|
- any future Cursor-only parameter that does not map to pi model metadata.
|
|
468
468
|
|
|
@@ -478,7 +478,7 @@ Use Cursor default variants:
|
|
|
478
478
|
|
|
479
479
|
```text
|
|
480
480
|
gpt-5.5 -> cursor/gpt-5.5@1m, thinking medium, fast=false
|
|
481
|
-
composer-2.5 -> cursor/composer-2
|
|
481
|
+
composer-2.5 -> cursor/composer-2-5, fast=true
|
|
482
482
|
```
|
|
483
483
|
|
|
484
484
|
### Resume Session
|
|
@@ -494,7 +494,7 @@ Restore:
|
|
|
494
494
|
Use:
|
|
495
495
|
|
|
496
496
|
1. pi's selected/default model and thinking level,
|
|
497
|
-
2. global saved Cursor-only defaults for the selected
|
|
497
|
+
2. global saved Cursor-only defaults for the selected SDK model ID or alias, falling back to older base-model keys,
|
|
498
498
|
3. else Cursor default variant params.
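The fallback order for Cursor-only defaults can be sketched as a chained lookup. This is an illustration only: the `CursorOnlyParams` and `DefaultLookup` shapes, the storage as a plain record, and the `resolveCursorOnlyDefaults` helper are hypothetical, and how an older base-model key is derived from a model ID is not specified here, so the sketch takes it as an input.

```typescript
// Hypothetical shape for Cursor-only parameters that pi does not persist itself.
interface CursorOnlyParams {
  fast: boolean;
}

// Both keys are supplied by the caller; deriving the legacy key is out of scope.
interface DefaultLookup {
  sdkModelIdOrAlias: string;
  legacyBaseModelKey: string;
}

function resolveCursorOnlyDefaults(
  lookup: DefaultLookup,
  savedGlobalDefaults: Record<string, CursorOnlyParams | undefined>,
  variantDefault: CursorOnlyParams,
): CursorOnlyParams {
  return (
    savedGlobalDefaults[lookup.sdkModelIdOrAlias] ?? // saved default for the selected ID or alias
    savedGlobalDefaults[lookup.legacyBaseModelKey] ?? // older base-model key
    variantDefault // Cursor default variant params
  );
}
```

Using `??` rather than `||` keeps an explicitly saved `{ fast: false }` from being skipped, since only a missing entry falls through to the next source.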
|
|
499
499
|
|
|
500
500
|
## CLI / Print Mode
|
|
@@ -562,7 +562,7 @@ If Cursor later adds `fast`, `context`, `reasoning`, `effort`, or aliases to a m
|
|
|
562
562
|
Initial Cursor default for Composer 2.5:
|
|
563
563
|
|
|
564
564
|
```text
|
|
565
|
-
pi model: cursor/composer-2
|
|
565
|
+
pi model: cursor/composer-2-5
|
|
566
566
|
Cursor params: fast=true
|
|
567
567
|
pi thinking: off
|
|
568
568
|
Cursor status: cursor fast
|
|
@@ -28,13 +28,13 @@ The bridge is enabled by default when bridgeable active pi tools exist. Cursor s
|
|
|
28
28
|
Rollback, timeout, and diagnostics controls:
|
|
29
29
|
|
|
30
30
|
```bash
|
|
31
|
-
PI_CURSOR_PI_TOOL_BRIDGE=0 pi --model cursor/composer-2
|
|
32
|
-
PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1 pi --model cursor/composer-2
|
|
33
|
-
PI_CURSOR_MCP_TOOL_TIMEOUT_SECONDS=7200 pi --model cursor/composer-2
|
|
34
|
-
PI_CURSOR_MCP_TOOL_TIMEOUT_MS=7200000 pi --model cursor/composer-2
|
|
35
|
-
PI_CURSOR_MCP_CONNECT_TIMEOUT_SECONDS=5 pi --model cursor/composer-2
|
|
36
|
-
PI_CURSOR_MCP_CONNECT_TIMEOUT_MS=5000 pi --model cursor/composer-2
|
|
37
|
-
PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1 pi --model cursor/composer-2
|
|
31
|
+
PI_CURSOR_PI_TOOL_BRIDGE=0 pi --model cursor/composer-2-5
|
|
32
|
+
PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1 pi --model cursor/composer-2-5
|
|
33
|
+
PI_CURSOR_MCP_TOOL_TIMEOUT_SECONDS=7200 pi --model cursor/composer-2-5
|
|
34
|
+
PI_CURSOR_MCP_TOOL_TIMEOUT_MS=7200000 pi --model cursor/composer-2-5
|
|
35
|
+
PI_CURSOR_MCP_CONNECT_TIMEOUT_SECONDS=5 pi --model cursor/composer-2-5
|
|
36
|
+
PI_CURSOR_MCP_CONNECT_TIMEOUT_MS=5000 pi --model cursor/composer-2-5
|
|
37
|
+
PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1 pi --model cursor/composer-2-5
|
|
38
38
|
```
|
|
39
39
|
|
|
40
40
|
`PI_CURSOR_PI_TOOL_BRIDGE=0` disables the bridge, including `pi__cursor_ask_question`. `PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1` opts in to exposing overlapping pi tool names for which Cursor already has native equivalents (`read`, `bash`, `write`, `edit`, `grep`, `find`, and `ls`). By default those names are hidden even when pi's Cursor replay wrapper has registered them as extension tools; non-overlapping active built-ins remain bridgeable by default. The installed Cursor SDK uses a 60-second MCP protocol timeout by default; pi-cursor-sdk overrides that seam by default with 3600 seconds for MCP `callTool` requests and 10 seconds for verified initialize/listTools requests on first send. Unknown MCP protocol timeout stacks keep the SDK default. `PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1` emits typed, allowlisted, scrubbed single-line JSONL bridge diagnostics to `process.stderr` with prefix `[pi-cursor-sdk:bridge]`; it is off by default, uses run-safe IDs that are not reused in endpoint paths, and does not print endpoint URLs/path components/tokens, raw args/results, file contents, or secrets. Cursor-native tools, Cursor settings, plugins, and configured Cursor MCP servers still come from the Cursor SDK local agent path. Cloud Cursor agents are out of scope for this bridge.
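How the timeout variables might combine with the stated defaults (3600 seconds for `callTool`, 10 seconds for initialize/listTools) can be sketched as follows. The `resolveTimeoutMs` helper is hypothetical, and two behaviors are assumptions rather than documented facts: that the `_MS` variable takes precedence over the `_SECONDS` variable, and that empty or invalid values fall back to the default.

```typescript
type Env = Record<string, string | undefined>;

// Parses a positive finite number; anything else is treated as unset (assumption).
function parsePositive(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const value = Number(raw);
  return Number.isFinite(value) && value > 0 ? value : undefined;
}

// Assumption: the millisecond variable wins when both are set.
function resolveTimeoutMs(env: Env, msVar: string, secondsVar: string, defaultSeconds: number): number {
  const ms = parsePositive(env[msVar]);
  if (ms !== undefined) return ms;
  const seconds = parsePositive(env[secondsVar]);
  return (seconds ?? defaultSeconds) * 1000;
}

// Defaults taken from the text: 3600 s for callTool, 10 s for connect-phase requests.
function resolveBridgeTimeouts(env: Env): { toolMs: number; connectMs: number } {
  return {
    toolMs: resolveTimeoutMs(env, "PI_CURSOR_MCP_TOOL_TIMEOUT_MS", "PI_CURSOR_MCP_TOOL_TIMEOUT_SECONDS", 3600),
    connectMs: resolveTimeoutMs(env, "PI_CURSOR_MCP_CONNECT_TIMEOUT_MS", "PI_CURSOR_MCP_CONNECT_TIMEOUT_SECONDS", 10),
  };
}
```

Passing the environment as a plain record instead of reading `process.env` directly keeps the sketch testable without touching the real process environment.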
|
|
@@ -62,13 +62,13 @@ When Cursor reports completed tool activity, the extension can display recorded
|
|
|
62
62
|
|
|
63
63
|
Cursor `glob` activity is displayed through native `find` cards.
|
|
64
64
|
|
|
65
|
-
For the full `@cursor/sdk@1.0.
|
|
65
|
+
For the full `@cursor/sdk@1.0.17` `ToolType` set, disposition matrix, and runtime alias normalization, see [SDK ToolType replay matrix](#sdk-tooltype-replay-matrix) below. Official SDK reference: https://cursor.com/docs/sdk/typescript
|
|
66
66
|
|
|
67
67
|
Edit and write activity replays through pi-facing `edit` and `write` cards only when replay arguments truthfully satisfy the matching pi schema, but still uses recorded Cursor results only. The adapter passes through truthful Cursor paths, content when Cursor reported it, and recorded diff/details; it does not pretend Cursor's editing schema is pi's schema and it fails closed if a recorded replay result is missing. Cursor `StrReplace` with recorded replacement text displays as native-looking `edit`; path-only Cursor `edit` and notebook edit activity fall back to neutral Cursor activity so pi does not reject the replay before recorded-result handling. Cursor `write` displays as native-looking `write`. Diagnostics, delete, todos/plans, task, image, MCP, semantic search, screen recording, and web search/fetch activity use neutral Cursor activity cards with pi's default success/error tool shell. MCP completions whose `toolName` is `WebSearch` / `web_search` / `WebFetch` / similar are labeled **Cursor web search** or **Cursor web fetch** instead of generic **Cursor MCP**. Neutral Cursor activity cards carry display metadata such as `activityTitle` and `activitySummary`, so partial/collapsed cards can say `Cursor plan`, `Cursor todos`, `Cursor MCP`, `Cursor semantic search`, `Cursor screen recording`, `Cursor web search`, `Cursor web fetch`, or `Cursor edit` instead of only `Cursor activity`. These replay tools only display recorded Cursor results; they never mutate files or execute tool work directly. Replay paths are normalized to workspace-relative paths when possible. Most collapsed replay cards include bounded previews for diffs and text details so small edits, todos, task output, and MCP results are visible without expanding; web search/fetch activity stays summary-only while collapsed because those cards often arrive after final text and can otherwise bury the answer. Ctrl+O expansion shows the recorded details. 
Edit previews omit raw unified diff headers and show compact numbered changed/context lines using pi's native diff added/removed/context colors, and write previews use syntax highlighting when pi can infer a language from the path. Image generation replay cards show the saved image path in the collapsed summary and render the image inline when pi terminal image display is enabled and the generated file is still readable.
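The relabeling of MCP completions described above can be sketched as a small title lookup. This is a hypothetical helper, not the registry code in `src/cursor-tool-presentation-registry.ts`: the `mcpActivityTitle` name and the normalization rule (dropping underscores, hyphens, and spaces, then lowercasing) are assumptions chosen to cover the `WebSearch` / `web_search` / `WebFetch` spellings the text mentions.

```typescript
// Collapse spelling variants such as "WebSearch", "web_search", and "web-search".
function normalizeMcpToolName(toolName: string): string {
  return toolName.replace(/[_\-\s]/g, "").toLowerCase();
}

// Pick the activity title shown on a neutral Cursor activity card.
function mcpActivityTitle(toolName: string): string {
  const normalized = normalizeMcpToolName(toolName);
  if (normalized === "websearch") return "Cursor web search";
  if (normalized === "webfetch") return "Cursor web fetch";
  return "Cursor MCP"; // generic label for every other MCP tool
}
```

For example, `mcpActivityTitle("web_search")` and `mcpActivityTitle("WebSearch")` both yield `Cursor web search`, while an unrelated MCP tool name keeps the generic `Cursor MCP` title.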
|
|
68
68
|
|
|
69
69
|
## SDK ToolType replay matrix
|
|
70
70
|
|
|
71
|
-
Source of truth for SDK tool names: `@cursor/sdk@1.0.
|
|
71
|
+
Source of truth for SDK tool names: `@cursor/sdk@1.0.17` conversation `ToolType` values and https://cursor.com/docs/sdk/typescript
|
|
72
72
|
|
|
73
73
|
Implementation owners: `src/cursor-tool-presentation-registry.ts` (canonical names, labels, visibility, replay policy, bridge exclusions for internal replay wrappers, and display-spec key completeness), `src/cursor-transcript-tool-specs.ts` (registry-keyed `TOOL_DISPLAY_SPECS` formatters/builders), `src/cursor-native-tool-display-replay.ts` (replay card rendering derived from registry replay metadata), and `src/cursor-transcript-utils.ts` (`normalizeToolName()` delegating to the registry).
|
|
74
74
|
|
|
@@ -197,7 +197,7 @@ Native replay wrappers are registered only for tool names not already owned by a
|
|
|
197
197
|
Disable native replay registration entirely:
|
|
198
198
|
|
|
199
199
|
```bash
|
|
200
|
-
PI_CURSOR_NATIVE_TOOL_DISPLAY=0 pi --model cursor/composer-2
|
|
200
|
+
PI_CURSOR_NATIVE_TOOL_DISPLAY=0 pi --model cursor/composer-2-5
|
|
201
201
|
```
|
|
202
202
|
|
|
203
203
|
`PI_CURSOR_REGISTER_NATIVE_TOOLS=0` is also accepted as a registration-only opt-out.
|
|
@@ -1,20 +1,22 @@
|
|
|
1
1
|
# Cursor Native Tool Visual Audit Workflow
|
|
2
2
|
|
|
3
|
+
> **Platform Smoke (new):** The required cross-platform release gate includes a deterministic visual card matrix across all targets. See [docs/platform-smoke.md](./platform-smoke.md) for the required cards, assertion contract, and platform-matrix budget.
|
|
4
|
+
|
|
3
5
|
This workflow is the canonical repo path for verifying Cursor SDK tool replay the way a human sees it in pi's interactive TUI, without stealing macOS focus.
|
|
4
6
|
|
|
5
7
|
Use it before accepting replay-card commits or PRs, and for every Cursor provider/runtime release where TUI card/color behavior could regress. Text logs and JSONL are necessary, but they are not enough when the claim is visual parity: always keep PNGs for the exact prompt, and keep before/after PNGs when reviewing a rendering change.
|
|
6
8
|
|
|
7
|
-
Current validation baseline: pi 0.
|
|
9
|
+
Current validation baseline: pi 0.78.0, exact `@cursor/sdk@1.0.17`, local validation packages `@earendil-works/pi-ai`, `@earendil-works/pi-coding-agent`, and `@earendil-works/pi-tui` at 0.78.0. Published peer dependencies remain minimum-only at pi 0.76.0+ with no upper bound, so newer pi installs can try the extension before a matching validation release exists.
|
|
8
10
|
|
|
9
|
-
## Cursor SDK 1.0.
|
|
11
|
+
## Cursor SDK 1.0.17 / pi 0.78.0 cutover visual record
|
|
10
12
|
|
|
11
13
|
Record the required cutover validation here or in the final release handoff. The default matrix is native replay only: the runner forces native replay registration on, forces Cursor setting sources off, disables the pi bridge, disables overlapping built-in pi tool exposure, and clears inherited Cursor SDK event-debug artifact env. With `--event-debug`, debug capture writes to a deterministic directory under the visual output directory. Do not commit raw ANSI logs, screenshots, terminal recordings, debug artifacts, or `.debug/visual-smoke` scratch files.
|
|
12
14
|
|
|
13
15
|
| Field | Required value / evidence |
|
|
14
16
|
| --- | --- |
|
|
15
17
|
| Command/session used | `npm run smoke:visual -- --ext "$PWD" --cwd "$PWD" --mode plan --out-dir <fresh /tmp dir> --label <matrix label> --prompt <matrix prompt>` with default native-replay isolation |
|
|
16
|
-
| Baseline versions | `pi --version` = 0.
|
|
17
|
-
| Card categories checked | Claim only categories proven by both PNG and JSONL. Required cutover categories are read, grep/search, find/glob,
|
|
18
|
+
| Baseline versions | `pi --version` = 0.78.0; `npm ls` = `@cursor/sdk@1.0.17` and local `@earendil-works/*@0.78.0` |
|
|
19
|
+
| Card categories checked | Claim only categories proven by both PNG and JSONL. Required cutover categories are read, grep/search, find/glob, shell success, write, edit/diff, and true read failure. Direct `ls`/list is tracked as excluded from the current one-prompt platform matrix because composer-2-5 does not route it through native `ls` reliably; source-enumeration coverage is gated through find/glob. Neutral Cursor plan/todo/task/mode activity is optional/opportunistic and only counts when JSONL contains a completed Cursor workflow event. |
|
|
18
20
|
| Observed status/card colors | Confirm native-looking cards use native pi styling; neutral Cursor activity is not red; true errors are distinct; diff previews show red/green; plan status is readable |
|
|
19
21
|
| Screenshot/ANSI evidence location | External path only, for example `/tmp/pi-cursor-sdk-1016-visual.*/read-package.{ansi,txt,html,png,jsonl.path}` |
|
|
20
22
|
| Debug artifact location | External `.debug/cursor-sdk-events/...` or temp artifact directory path only; do not commit raw artifacts |
|
|
@@ -27,7 +29,7 @@ Required prompt matrix for this cutover:
|
|
|
27
29
|
| `read-package` | `Use only your file read tool. Read ./package.json and answer with only the package name. Do not use shell, grep, glob, find, or list tools.` | `toolCall.name=read`, `toolResult.toolName=read`, `isError=false` | Native-looking read card; collapsed label/path readable |
|
|
28
30
|
| `grep-readme` | `Use only your grep/search tool to search ./README.md for the literal string "pi-cursor-sdk". Do not use shell, read, glob, find, ls, or list tools. Report only the first matching file path.` | `toolCall.name=grep`, `toolResult.toolName=grep`, `isError=false` | Native-looking grep/search card; match preview readable |
|
|
29
31
|
| `find-readme` | `Use only your glob/file-search/find tool to find README.md from the repository root. Do not use shell, read, grep, ls, or list tools. Report matched paths exactly.` | `toolCall.name=find`, `toolResult.toolName=find`, `isError=false` | Native-looking find/glob card; matched path readable |
|
|
30
|
-
| `list-src` |
|
|
32
|
+
| `list-src` | Excluded from current required platform matrix. Track manually when Cursor reliably routes this prompt through native `ls`. | `toolCall.name=ls`, `toolResult.toolName=ls`, `isError=false` when exercised | Native-looking list card; directory/path readable |
|
|
31
33
|
| `shell-success` | `Use only your shell/terminal tool to run printf 'cursor visual smoke\\n'. Do not use read, grep, glob, find, ls, edit, or write. Report the output.` | `toolCall.name=bash`, `toolResult.toolName=bash`, `isError=false` | Shell success card is not red/error-styled; stdout readable |
|
|
32
34
|
| `write-file` | `Use your normal file write tool to create .debug/visual-smoke/cursor-mode.txt with exactly two lines: alpha and beta. Do not use shell.` | `toolCall.name=write`, `toolResult.toolName=write`, `isError=false` | Native-looking write card; path/content preview readable |
|
|
33
35
|
| `edit-file` | `Use your normal file edit/str-replace tool to change beta to gamma in .debug/visual-smoke/cursor-mode.txt. Do not use shell.` | `toolCall.name=edit`, `toolResult.toolName=edit`, `isError=false` | Native-looking edit card; diff preview shows red/green added/removed lines |
|
|
@@ -61,7 +63,7 @@ The canonical workflow is now offscreen and browser-rendered:
|
|
|
61
63
|
5. Save PNG screenshots with `agent_browser` when the harness is available, or Playwright directly when running outside that harness.
|
|
62
64
|
6. Inspect the session JSONL for exact persisted `toolCall` / `toolResult` data.
|
|
63
65
|
|
|
64
|
-
This is the best default
|
|
66
|
+
This is the best default focused visual-debug path because it exercises the real pi TUI, captures card class/color/label/order/truncation issues before users see them, avoids desktop focus stealing, and leaves reviewable artifacts. Use visible Terminal/Ghostty screenshots only for terminal-specific or pixel-level bugs that cannot be judged through browser-rendered ANSI. The cross-platform release gate remains [Platform Smoke](./platform-smoke.md).
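The JSONL inspection step of this workflow, together with the prompt-matrix assertions (`toolCall.name`, `toolResult.toolName`, `isError=false`), can be sketched as a small checker. The record shapes below are an assumption made for illustration: the real session JSONL layout may nest these fields differently, and `hasCleanToolRoundTrip` is a hypothetical helper rather than part of `scripts/validate-smoke-jsonl.mjs`.

```typescript
// Assumed flat record shapes; adapt to the actual persisted session layout.
interface ToolCallRecord {
  type: "toolCall";
  name: string;
}
interface ToolResultRecord {
  type: "toolResult";
  toolName: string;
  isError: boolean;
}
type ReplayRecord = ToolCallRecord | ToolResultRecord;

// Parse JSONL text into one value per non-empty line.
function parseJsonl(text: string): unknown[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// True when the tool was called and at least one matching result is not an error.
function hasCleanToolRoundTrip(records: ReplayRecord[], tool: string): boolean {
  const called = records.some((r) => r.type === "toolCall" && r.name === tool);
  const succeeded = records.some(
    (r) => r.type === "toolResult" && r.toolName === tool && !r.isError,
  );
  return called && succeeded;
}
```

A check like this only covers the JSONL half of the evidence; the visual claim still needs the PNG for the same prompt.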
|
|
65
67
|
|
|
66
68
|
## Tool stack
|
|
67
69
|
|
|
@@ -80,7 +82,7 @@ npx playwright install chromium
|
|
|
80
82
|
|
|
81
83
|
`scripts/visual-tui-smoke.mjs` is the durable source of truth for this workflow. It must keep supporting:
|
|
82
84
|
|
|
83
|
-
- fixed-size tmux PTY execution of the parent-resolved `pi -e <extension-dir> --model cursor/composer-2
|
|
85
|
+
- fixed-size tmux PTY execution of the parent-resolved `pi -e <extension-dir> --model cursor/composer-2-5`
|
|
84
86
|
- parent-resolved `pi` and `tmux` command paths reused in tmux-launched runs, with `process.execPath`'s directory prepended for prereq checks and tmux launches so Node shims use the validated Node
|
|
85
87
|
- `PI_CURSOR_NATIVE_TOOL_DISPLAY=1`
|
|
86
88
|
- `PI_CURSOR_REGISTER_NATIVE_TOOLS=1` by default
|
|
@@ -1,10 +1,12 @@
|
|
|
1
1
|
# Cursor Testing Lessons
|
|
2
2
|
|
|
3
|
+
> **Platform Smoke (new):** The required cross-platform release gate is `npm run smoke:platform:doctor && npm run smoke:platform:all`. See [docs/platform-smoke.md](./platform-smoke.md). For portable lessons other pi extension projects can adapt without sharing repo-specific state, see [Crabbox Platform Testing Lessons](./crabbox-platform-testing-lessons.md). The live smoke checklist remains useful for inner-loop development but is not the release gate.
|
|
4
|
+
|
|
3
5
|
## Purpose
|
|
4
6
|
|
|
5
7
|
This document records maintainer testing lessons for `pi-cursor-sdk`. It complements unit tests and the [Cursor live smoke checklist](./cursor-live-smoke-checklist.md). Use it when adding regression coverage, debugging false-green releases, or building isolated smoke harnesses.
|
|
6
8
|
|
|
7
|
-
For a **minimal one-session dogfood pass** (baseline env, one native + one bridge call, JSONL ID patterns, bootstrap manifest, edit diff card), use the [Cursor dogfood checklist](./cursor-dogfood-checklist.md) before running the
|
|
9
|
+
For a **minimal one-session dogfood pass** (baseline env, one native + one bridge call, JSONL ID patterns, bootstrap manifest, edit diff card), use the [Cursor dogfood checklist](./cursor-dogfood-checklist.md) as inner-loop evidence before running the platform smoke gate.
|
|
8
10
|
|
|
9
11
|
## Core lesson: integration-shaped bugs beat unit mocks
|
|
10
12
|
|
|
@@ -176,7 +178,7 @@ Simulate plan-mode execute stripping with the repo fixture:
|
|
|
176
178
|
It sets active tools to `read`, `bash`, `edit`, `write` on each `turn_start`. Run pi with:
|
|
177
179
|
|
|
178
180
|
```bash
|
|
179
|
-
pi -e scripts/fixtures/plan-strip-shim --cursor-no-fast --model cursor/composer-2
|
|
181
|
+
pi -e scripts/fixtures/plan-strip-shim --cursor-no-fast --model cursor/composer-2-5 \
|
|
180
182
|
--session-dir "$SMOKE_DIR/plan-strip" \
|
|
181
183
|
-p 'After reset, read README.md and answer PLAN_STRIP_OK=yes.'
|
|
182
184
|
```
|
|
@@ -189,15 +191,17 @@ Pass criteria:
|
|
|
189
191
|
|
|
190
192
|
## Local validation ladder
|
|
191
193
|
|
|
192
|
-
Run
|
|
194
|
+
Run local checks first, then the platform smoke gate before claiming release-ready for provider/runtime changes:
|
|
193
195
|
|
|
194
196
|
```bash
|
|
195
197
|
npm test
|
|
196
198
|
npm run typecheck
|
|
197
199
|
npm pack --dry-run
|
|
198
200
|
SKIP_LIVE=1 npm run smoke:isolated
|
|
199
|
-
npm run smoke:isolated # requires auth.json or CURSOR_API_KEY
|
|
200
|
-
npm run smoke:live # partial tmux checklist subset
|
|
201
|
+
npm run smoke:isolated # inner-loop helper; requires auth.json or CURSOR_API_KEY
|
|
202
|
+
npm run smoke:live # inner-loop partial tmux checklist subset
|
|
203
|
+
npm run smoke:platform:doctor
|
|
204
|
+
npm run smoke:platform:all
|
|
201
205
|
```
|
|
202
206
|
|
|
203
207
|
After changing `scripts/validate-smoke-jsonl.mjs` or replay scan expectations, also run:
|
|
@@ -206,14 +210,15 @@ After changing `scripts/validate-smoke-jsonl.mjs` or replay scan expectations, a
|
|
|
206
210
|
npm test -- test/validate-smoke-jsonl.test.ts
|
|
207
211
|
```
|
|
208
212
|
|
|
209
|
-
Then
|
|
213
|
+
Then use the [Cursor live smoke checklist](./cursor-live-smoke-checklist.md) only for focused inner-loop surfaces the scripts do not cover (bridge MCP, abort/cancel, full TUI observation, packaging review, cleanup) before rerunning the platform smoke gate.
|
|
210
214
|
|
|
211
|
-
## What belongs in CI vs manual smoke
|
|
215
|
+
## What belongs in CI vs platform/manual smoke
|
|
212
216
|
|
|
213
217
|
- **CI / default `npm test`:** mocked provider tests, extension lifecycle tests, JSONL validator tests, script syntax/help checks. No live Cursor calls.
|
|
214
|
-
- **
|
|
218
|
+
- **Platform release gate:** `npm run smoke:platform:doctor && npm run smoke:platform:all`. Requires real Cursor auth and cross-platform Crabbox setup.
|
|
219
|
+
- **Focused manual smoke:** `npm run smoke:isolated`, `npm run smoke:live`, and selected live-checklist sections for inner-loop debugging of behavior mocks cannot reproduce.
|
|
215
220
|
|
|
216
|
-
If
|
|
221
|
+
If platform smoke auth or target setup is unavailable, report the release as **blocked**; do not treat a skipped gate as release-ready.
|
|
217
222
|
|
|
218
223
|
## Cursor SDK event capture probe
|
|
219
224
|
|
|
@@ -238,7 +243,7 @@ The script writes timestamped artifacts under `--out` (default `/tmp/pi-cursor-s
|
|
|
238
243
|
|
|
239
244
|
Stdout prints artifact paths and summary counts only. Raw payloads stay on disk and may contain local paths, project text, tool args/results, or secrets — do not commit or share them.
|
|
240
245
|
|
|
241
|
-
Hard repo rule: Cursor SDK behavior claims must come from the installed `@cursor/sdk` package and/or https://cursor.com/docs/sdk/typescript, not from memory or ad-hoc probes alone. Current cutover validation targets exact `@cursor/sdk@1.0.
|
|
246
|
+
Hard repo rule: Cursor SDK behavior claims must come from the installed `@cursor/sdk` package and/or https://cursor.com/docs/sdk/typescript, not from memory or ad-hoc probes alone. Current cutover validation targets exact `@cursor/sdk@1.0.17` and pi 0.78.0 local packages.
|
|
242
247
|
|
|
243
248
|
## Pi provider SDK event capture
|
|
244
249
|
|
|
@@ -249,7 +254,7 @@ One-shot maintainer script (RPC pi run, gitignored artifacts by default):
|
|
|
249
254
|
```bash
|
|
250
255
|
CURSOR_API_KEY=... npm run debug:provider-events -- \
|
|
251
256
|
--cwd . \
|
|
252
|
-
--model cursor/composer-2
|
|
257
|
+
--model cursor/composer-2-5 \
|
|
253
258
|
--prompt 'Repro prompt here' \
|
|
254
259
|
--out .debug/cursor-sdk-events/manual-repro
|
|
255
260
|
```
|
|
@@ -289,7 +294,7 @@ Artifacts under `--out` (default `.debug/cursor-sdk-events/<timestamp>/` under `
|
|
|
289
294
|
During any normal pi session you can also opt in with:
|
|
290
295
|
|
|
291
296
|
```bash
|
|
292
|
-
PI_CURSOR_SDK_EVENT_DEBUG=1 pi -e . --model cursor/composer-2
|
|
297
|
+
PI_CURSOR_SDK_EVENT_DEBUG=1 pi -e . --model cursor/composer-2-5
|
|
293
298
|
```
|
|
294
299
|
|
|
295
300
|
Multi-turn sessions group automatically by pi session file:
|
|
@@ -340,7 +345,7 @@ Ask the reporter (or capture yourself) for:
|
|
|
340
345
|
| Field | Why |
|
|
341
346
|
| --- | --- |
|
|
342
347
|
| `pi --version` and installed `pi-cursor-sdk` version | Confirms extension/runtime in use |
|
|
343
|
-
| Model ID (for example `cursor/composer-2
|
|
348
|
+
| Model ID (for example `cursor/composer-2-5`) | Routing/replay behavior is model-scoped |
|
|
344
349
|
| Exact repro prompt and prior turns | Multi-turn replay history affects prompt text |
|
|
345
350
|
| Flags: `--cursor-no-fast`, `PI_CURSOR_PI_TOOL_BRIDGE`, `PI_CURSOR_EXPOSE_BUILTIN_TOOLS`, `PI_CURSOR_SETTING_SOURCES`, `PI_CURSOR_TOOL_MANIFEST` | Bridge vs native-only vs narrowed settings; bootstrap callable-surface manifest |
|
|
346
351
|
| Whether the listed names are `pi__*` bridge MCP, Cursor-native (`browser_navigate`, `WebSearch`), or `cursor-replay-*` replay IDs | Three different surfaces (see [Cursor native tool replay](./cursor-native-tool-replay.md#live-bridge-vs-replay)) |
|
|
@@ -361,7 +366,7 @@ chmod 600 "$SMOKE_DIR/home/.pi/agent/auth.json"
|
|
|
361
366
|
env -i HOME="$SMOKE_DIR/home" PATH="/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin" \
|
|
362
367
|
MISE_DISABLE=1 \
|
|
363
368
|
PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1 \
|
|
364
|
-
pi -e . --cursor-no-fast --model cursor/composer-2
|
|
369
|
+
pi -e . --cursor-no-fast --model cursor/composer-2-5 \
|
|
365
370
|
--session-dir "$SMOKE_DIR/session" \
|
|
366
371
|
-p '<exact reporter prompt>'
|
|
367
372
|
```
|
|
@@ -373,7 +378,7 @@ For pi parsing, replay routing, or bridge timing, prefer:
|
|
|
373
378
|
```bash
|
|
374
379
|
npm run debug:provider-events -- \
|
|
375
380
|
--cwd "$PWD" \
|
|
376
|
-
--model cursor/composer-2
|
|
381
|
+
--model cursor/composer-2-5 \
|
|
377
382
|
--prompt '<exact reporter prompt>' \
|
|
378
383
|
--out "$SMOKE_DIR/provider-events"
|
|
379
384
|
```
|
|
@@ -394,7 +399,7 @@ npm run debug:sdk-events -- \
|
|
|
394
399
|
|
|
395
400
|
Start with whether pi stayed alive:
|
|
396
401
|
|
|
397
|
-
0. **pi process exited / shell returned with uncaught `ConnectError` (`ETIMEDOUT`,
|
|
402
|
+
0. **pi process exited / shell returned with uncaught `ConnectError` (for example `ETIMEDOUT`, `ECONNRESET`, `read ETIMEDOUT`, or `[aborted] read ECONNRESET`)** — hard network crash bypassing provider error surfacing. Current code guards observed Cursor SDK/network-reset shapes during active Cursor turns and should show scrubbed retry guidance instead; treat a fresh process exit as a process-guard regression, capture the stack/session tail, and route to **#43/#107** rather than #40 model text echo. If tools were mid-flight, note whether session JSONL ends abruptly.
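When triaging a crash report, the error shapes named in step 0 can be matched with a simple message check. This is a hypothetical triage helper, not the package's process guard: the real guard may inspect error names, codes, or causes instead of message text, and `isObservedNetworkReset` is an invented name.

```typescript
// Matches the error codes named above anywhere in the message text.
const NETWORK_RESET_PATTERN = /\b(ETIMEDOUT|ECONNRESET)\b/;

// Accepts anything thrown; non-Error values are stringified first.
function isObservedNetworkReset(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return NETWORK_RESET_PATTERN.test(message);
}
```

A match suggests the report belongs with the network-reset issues; a non-match means the later JSONL inspection steps are the better starting point.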
|
|
398
403
|
|
|
399
404
|
Then inspect the failing assistant turn in `$SMOKE_DIR/session/*.jsonl`:
|
|
400
405
|
|
|
@@ -414,7 +419,7 @@ rg '"type": "toolCall"|Tool call \(Cursor|cursor-replay-' "$SMOKE_DIR/session"/*
|
|
|
414
419
|
|
|
415
420
|
### When to file follow-ups
|
|
416
421
|
|
|
417
|
-
- **#43** — pi exited from uncaught `ConnectError` /
|
|
422
|
+
- **#43/#107** — pi exited from uncaught Cursor SDK `ConnectError` / network reset during HTTP traffic (hard crash, not a scrubbed #55 toast). Observed `ETIMEDOUT` and `ECONNRESET` shapes should be guarded during active Cursor turns; new exits need stack/session evidence.
|
|
418
423
|
- **#55** — caught SDK run failure or abort with missing/opaque detail (already addressed on main for surfacing).
|
|
419
424
|
- **#52** — stale/inactive native replay routing after plan-strip or stale `context.tools` snapshot (`Tool * not found` in JSONL, `inactive_trace` in `display-decisions.jsonl`); or maintainer needs an explicit "started X, never completed" debug line when JSONL shows no completion and no model text echo.
|
|
420
425
|
- **New issue** — bridge dispatch failure with `[pi-cursor-sdk:bridge]` evidence, or proven provider bug with JSONL showing missing `toolCall` despite SDK `tool-call-completed` in `on-delta.jsonl` from `debug:provider-events` or `debug:sdk-events` artifacts.
|
|
@@ -31,13 +31,13 @@ Default behavior:
|
|
|
31
31
|
|
|
32
32
|
```bash
|
|
33
33
|
# Disable pi bridge entirely
|
|
34
|
-
PI_CURSOR_PI_TOOL_BRIDGE=0 pi --model cursor/composer-2
|
|
34
|
+
PI_CURSOR_PI_TOOL_BRIDGE=0 pi --model cursor/composer-2-5
|
|
35
35
|
|
|
36
36
|
# Expose overlapping pi builtins through the bridge
|
|
37
|
-
PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1 pi --model cursor/composer-2
|
|
37
|
+
PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1 pi --model cursor/composer-2-5
|
|
38
38
|
|
|
39
39
|
# Disable bootstrap tool manifest
|
|
40
|
-
PI_CURSOR_TOOL_MANIFEST=0 pi --model cursor/composer-2
|
|
40
|
+
PI_CURSOR_TOOL_MANIFEST=0 pi --model cursor/composer-2-5
|
|
41
41
|
```
|
|
42
42
|
|
|
43
43
|
## Cursor settings vs pi toggles
|