pi-cursor-sdk 0.1.17 → 0.1.19

Files changed (51)
  1. package/CHANGELOG.md +62 -0
  2. package/README.md +38 -1
  3. package/docs/cursor-live-smoke-checklist.md +22 -2
  4. package/docs/cursor-model-ux-spec.md +5 -4
  5. package/docs/cursor-native-tool-replay.md +96 -2
  6. package/docs/cursor-testing-lessons.md +428 -0
  7. package/package.json +11 -2
  8. package/scripts/debug-provider-events.mjs +403 -0
  9. package/scripts/debug-sdk-events.mjs +413 -0
  10. package/scripts/isolated-cursor-smoke.sh +226 -0
  11. package/scripts/lib/cursor-probe-utils.mjs +52 -0
  12. package/scripts/lib/cursor-sdk-output-filter.mjs +86 -0
  13. package/scripts/validate-smoke-jsonl.mjs +86 -7
  14. package/src/context.ts +45 -32
  15. package/src/cursor-agent-message-web-tools.ts +172 -0
  16. package/src/cursor-agents-context.ts +176 -0
  17. package/src/cursor-context-tools.ts +6 -0
  18. package/src/cursor-display-text.ts +10 -0
  19. package/src/cursor-incomplete-tool-visibility.ts +118 -0
  20. package/src/cursor-live-run-coordinator.ts +18 -7
  21. package/src/cursor-model.ts +12 -0
  22. package/src/cursor-native-replay-routing.ts +48 -0
  23. package/src/cursor-native-replay-trace.ts +29 -0
  24. package/src/cursor-native-tool-display-registration.ts +14 -7
  25. package/src/cursor-native-tool-display-replay.ts +63 -5
  26. package/src/cursor-native-tool-display-tools.ts +20 -0
  27. package/src/cursor-pi-tool-bridge-diagnostics.ts +11 -1
  28. package/src/cursor-pi-tool-bridge-run.ts +16 -1
  29. package/src/cursor-pi-tool-bridge-types.ts +3 -0
  30. package/src/cursor-provider-errors.ts +96 -0
  31. package/src/cursor-provider-live-run-drain.ts +208 -63
  32. package/src/cursor-provider-turn-coordinator.ts +217 -47
  33. package/src/cursor-provider.ts +275 -83
  34. package/src/cursor-question-tool.ts +10 -5
  35. package/src/cursor-sdk-abort-error-guard.ts +109 -0
  36. package/src/cursor-sdk-event-debug-constants.ts +40 -0
  37. package/src/cursor-sdk-event-debug-session.ts +163 -0
  38. package/src/cursor-sdk-event-debug.ts +597 -0
  39. package/src/cursor-sensitive-text.ts +27 -7
  40. package/src/cursor-session-agent.ts +25 -3
  41. package/src/cursor-session-send-policy.ts +43 -0
  42. package/src/cursor-setting-sources.ts +29 -0
  43. package/src/cursor-state.ts +1 -5
  44. package/src/cursor-tool-lifecycle.ts +111 -0
  45. package/src/cursor-tool-names.ts +12 -0
  46. package/src/cursor-tool-transcript.ts +4 -2
  47. package/src/cursor-transcript-tool-formatters.ts +228 -5
  48. package/src/cursor-transcript-tool-specs.ts +113 -14
  49. package/src/cursor-transcript-utils.ts +12 -0
  50. package/src/cursor-web-tool-activity.ts +84 -0
  51. package/src/index.ts +4 -1
@@ -0,0 +1,428 @@
1
+ # Cursor Testing Lessons
2
+
3
+ ## Purpose
4
+
5
+ This document records maintainer testing lessons for `pi-cursor-sdk`. It complements unit tests and the [Cursor live smoke checklist](./cursor-live-smoke-checklist.md). Use it when adding regression coverage, debugging false-green releases, or building isolated smoke harnesses.
6
+
7
+ ## Core lesson: integration-shaped bugs beat unit mocks
8
+
9
+ The native replay `Tool grep not found` failure was integration-shaped, not unit-shaped:
10
+
11
+ 1. **Plan mode** calls `setActiveTools(["read", "bash", "edit", "write"])` when execution starts.
12
+ 2. **pi-cursor-sdk** only re-synced native replay wrappers on `session_start` / `model_select`, not every turn.
13
+ 3. **The provider** still emitted native replay `toolUse` for `grep` / `cursor`.
14
+ 4. **pi's agent loop** looked up tools in `context.tools` and failed with `Tool grep not found`.
15
+
16
+ Passing hundreds of unit tests did not prove that chain was safe. Regression coverage now includes:
17
+
18
+ - `test/index.test.ts` — `before_agent_start` and `turn_start` resync after plan-style strip
19
+ - `test/cursor-native-replay-stress.test.ts` — plan strip → resync → grep replay; inactive-tool trace fallbacks
20
+ - `test/cursor-provider-replay-live-run.test.ts` — inactive replay tools emit trace instead of broken `toolUse`
21
+ - `test/cursor-native-replay-trace.test.ts` — shared inactive replay trace formatting
22
+ - `test/cursor-native-replay-routing.test.ts` — `resolveNativeReplayDisposition` and `partitionNativeToolsByActiveContext`
23
+ - `test/validate-smoke-jsonl.test.ts` — replay scan semantics (real errors vs doc mentions in successful reads)
24
+
25
+ When changing provider/runtime behavior, ask whether the bug spans **pi extension lifecycle**, **active tool state**, **provider streaming**, and **persisted JSONL**. If yes, add an integration-style unit test or live smoke coverage for that chain.
26
+
27
+ ## Dual-check invariant: `context.tools` vs pi active tools
28
+
29
+ Native replay routing intentionally uses two layers:
30
+
31
+ 1. **Extension resync** (`before_agent_start`, `turn_start`) updates pi's active tool set via `syncRegisteredNativeCursorToolsForModel`. This fixes the common case where plan-mode execute strips `grep`/`find`/`cursor` before the next turn.
32
+ 2. **Provider routing** uses the **`context.tools` snapshot** captured when `streamCursor()` starts (`getActiveContextToolNames` in `src/cursor-context-tools.ts`). It does not read live `pi.getActiveTools()` mid-stream.
33
+
34
+ `src/cursor-native-replay-routing.ts` centralizes provider-side routing against the same `context.tools` snapshot:
35
+
36
+ - **Turn coordinator** calls `resolveNativeReplayDisposition()` per completed SDK tool → `queue_replay` (queue native `toolUse`), `inactive_trace` (`formatInactiveCursorReplayTrace()`), or `transcript_trace`.
37
+ - **Live-run drain** calls `partitionNativeToolsByActiveContext()` on already-queued native tool batches → active tools become `toolUse`; inactive tools get trace only and the batch returns `"handled"` without `toolUse`.
38
+
39
+ Disposition outcomes:
40
+
41
+ - `queue_replay` — tool is in `context.tools` and a live run exists
42
+ - `inactive_trace` — native replay tool missing from `context.tools`
43
+ - `transcript_trace` — native replay off or non-native tool
44
+
45
+ If resync runs but `context.tools` is still stale (e.g. only `read` listed), the provider must **not** emit `toolUse` for inactive tools. `test/cursor-native-replay-stress.test.ts` covers that stale-snapshot path.
46
+
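The routing rules above can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not the code in `src/cursor-native-replay-routing.ts`: only the three disposition names come from this document, while the input shape, its field names, the `sketch*` function names, and the handling of an active tool with no live run are hypothetical.

```typescript
// Hypothetical sketch of provider-side native replay routing.
// Only the disposition names are taken from the doc; all types, fields,
// and function names here are assumptions made for illustration.
type Disposition = "queue_replay" | "inactive_trace" | "transcript_trace";

interface ReplayRoutingInput {
  toolName: string;
  isNativeReplayTool: boolean; // assumed flag: the tool has a native replay wrapper
  nativeReplayEnabled: boolean; // assumed flag: native replay is switched on
  hasLiveRun: boolean; // assumed flag: a live run exists for this turn
  activeContextTools: ReadonlySet<string>; // snapshot captured when the stream starts
}

function sketchResolveDisposition(input: ReplayRoutingInput): Disposition {
  // Native replay off, or a non-native tool: use the transcript trace.
  if (!input.nativeReplayEnabled || !input.isNativeReplayTool) {
    return "transcript_trace";
  }
  // Native replay tool missing from the snapshot: never emit toolUse for it.
  if (!input.activeContextTools.has(input.toolName)) {
    return "inactive_trace";
  }
  // Active tool with a live run can be queued. The no-live-run fallback
  // below is an assumption; the doc does not spell that case out.
  return input.hasLiveRun ? "queue_replay" : "transcript_trace";
}

// Split an already-queued batch by the same snapshot: active names may
// become toolUse, inactive names get a trace only.
function sketchPartitionByActiveContext(
  toolNames: readonly string[],
  activeContextTools: ReadonlySet<string>,
): { active: string[]; inactive: string[] } {
  const active: string[] = [];
  const inactive: string[] = [];
  for (const name of toolNames) {
    (activeContextTools.has(name) ? active : inactive).push(name);
  }
  return { active, inactive };
}
```

Both helpers read only the snapshot passed in, which mirrors the stale-snapshot rule: if the snapshot lists only `read`, a `grep` replay resolves to `inactive_trace` even when the extension has already resynced.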
47
+ ## Auth: use `auth.json`, not only env
48
+
49
+ pi resolves Cursor auth in this order:
50
+
51
+ 1. pi `--api-key`
52
+ 2. stored `cursor` key in `~/.pi/agent/auth.json` from `/login`
53
+ 3. `CURSOR_API_KEY`
54
+
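The precedence above can be expressed as a small lookup. The sketch below is illustrative only: the `AuthInputs` shape, the `sketchResolveCursorAuth` name, and the choice to treat blank strings as absent are assumptions, and it deliberately takes the stored key as an already-extracted value rather than guessing at the `auth.json` schema.

```typescript
// Hypothetical sketch of the documented auth precedence.
// Names and the blank-string handling are assumptions, not pi's code.
type AuthSource = "cli-flag" | "auth.json" | "env" | "none";

interface AuthInputs {
  cliApiKey?: string; // value passed via `pi --api-key`
  storedCursorKey?: string; // `cursor` entry already read from <HOME>/.pi/agent/auth.json
  envApiKey?: string; // value of CURSOR_API_KEY
}

function sketchResolveCursorAuth(inputs: AuthInputs): { source: AuthSource; key?: string } {
  // Ordered from highest to lowest precedence, as listed in the doc.
  const candidates: Array<[AuthSource, string | undefined]> = [
    ["cli-flag", inputs.cliApiKey],
    ["auth.json", inputs.storedCursorKey],
    ["env", inputs.envApiKey],
  ];
  for (const [source, key] of candidates) {
    if (key !== undefined && key.trim() !== "") {
      return { source, key };
    }
  }
  return { source: "none" };
}
```

Under this ordering, a stored `cursor` key wins over `CURSOR_API_KEY`, which is why an isolated `HOME` without a seeded `auth.json` can behave differently from the maintainer's normal shell.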
55
+ For live smoke and isolated harnesses:
56
+
57
+ - **Do not assume** `CURSOR_API_KEY` or `~/.secrets` alone is enough.
58
+ - **Do assume** pi reads auth from the active `HOME`, usually `~/.pi/agent/auth.json`.
59
+ - Isolated runs with `env -i HOME=/tmp/...` must **copy** `auth.json` into that temporary home before calling `pi`.
60
+
61
+ Example seed step used by `scripts/isolated-cursor-smoke.sh`:
62
+
63
+ ```bash
64
+ mkdir -p "$HOME/.pi/agent"
65
+ cp "$REAL_HOME/.pi/agent/auth.json" "$HOME/.pi/agent/auth.json"
66
+ chmod 600 "$HOME/.pi/agent/auth.json"
67
+ ```
68
+
69
+ Fallback when `auth.json` lacks a `cursor` provider entry:
70
+
71
+ ```bash
72
+ export CURSOR_API_KEY="your-key"
73
+ ```
74
+
75
+ Never commit, log, or paste `auth.json` contents, API keys, or session JSONL with secrets.
76
+
77
+ ## Isolated directories: why and how
78
+
79
+ Use isolated `/tmp` trees when validating:
80
+
81
+ - packed tarball install (`npm pack` → extract → `pi install -l`)
82
+ - clean `HOME` with no inherited shell profile state
83
+ - plan-mode-style tool stripping via a shim extension
84
+ - JSONL replay-error scans independent of stdout
85
+
86
+ Recommended layout:
87
+
88
+ ```text
89
+ /tmp/pi-cursor-sdk-isolated-<timestamp>/
90
+ home/ # seeded ~/.pi/agent/auth.json
91
+ pack/ # npm pack output (*.tgz)
92
+ extract/package/ # unpacked extension
93
+ project/ # empty pi project for install -l
94
+ sessions/
95
+ basic/
96
+ native-replay/
97
+ plan-strip/
98
+ ```
99
+
100
+ Commands:
101
+
102
+ ```bash
103
+ # full isolated smoke (unit preflight + pack + live pi)
104
+ npm run smoke:isolated
105
+
106
+ # pack/unit only, no live Cursor calls
107
+ SKIP_LIVE=1 npm run smoke:isolated
108
+
109
+ # custom artifact root
110
+ ISOLATED=/tmp/pi-cursor-sdk-isolated-manual npm run smoke:isolated
111
+ ```
112
+
113
+ Every live check should use its own `--session-dir` under the isolated tree. Do not reuse session dirs across scenarios.
114
+
115
+ ## Harness traps we hit repeatedly
116
+
117
+ | Trap | What went wrong | Fix |
118
+ | --- | --- | --- |
119
+ | Clean `HOME` without auth | `pi` could not authenticate Cursor in isolated runs | Copy `~/.pi/agent/auth.json` into isolated `HOME` |
120
+ | `npm pack \| tail -1` | Captured npm notice text, not tarball path | Use `ls -t "$PACK_DIR"/*.tgz \| head -1` |
121
+ | Packed extension, no install | Provider never loaded in isolated project | Run `npm install --omit=dev` inside extracted package |
122
+ | Inherited shell env | mise/profile hooks hung or polluted runs | Use `env -i ... MISE_DISABLE=1` for isolated pi calls |
123
+ | No per-check timeout | One stuck prompt blocked entire harness | Wrap each live check with timeout/watchdog |
124
+ | stdout-only assertions | Missed replay failures persisted only in JSONL | Scan JSONL for `Tool grep/cursor/find/ls not found` |
125
+ | Naive JSONL substring scan | Successful `read` of docs mentioning replay errors looked like failures | `validate-smoke-jsonl.mjs` only flags error `toolResult` / error assistant messages |
126
+ | Plan strip only on first turn | Under-tested multi-turn resync | Shim strips on every `turn_start`; stress multi-turn separately |
127
+ | Assuming env auth equals pi auth | False "blocked" or false "pass" in CI-like shells | Check `auth.json` provider keys explicitly when needed |
128
+
129
+ ## JSONL is the source of truth for replay regressions
130
+
131
+ Stdout can look fine while persisted tool results contain errors. Prefer structural JSONL scans over grepping terminal output.
132
+
133
+ Replay failure scan:
134
+
135
+ ```bash
136
+ node scripts/validate-smoke-jsonl.mjs --replay-errors-only "$SESSION_DIR"
137
+ ```
138
+
139
+ Combined usage + replay scan after broader smoke:
140
+
141
+ ```bash
142
+ node scripts/validate-smoke-jsonl.mjs --replay-errors "$SMOKE_DIR"
143
+ ```
144
+
145
+ ### What counts as a replay failure
146
+
147
+ The scan fails only on **persisted error messages**, not arbitrary substring matches in session JSONL:
148
+
149
+ - error `toolResult` records (`isError: true`) whose text contains:
150
+ - `Tool grep not found`
151
+ - `Tool cursor not found`
152
+ - `Tool find not found`
153
+ - `Tool ls not found`
154
+ - error assistant messages (`stopReason: "error"` or `errorMessage`) containing those strings
155
+
156
+ Successful tool results are ignored even when file contents mention those strings (for example a `read` of `docs/cursor-testing-lessons.md` during plan-strip smoke).
157
+
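A structural scan along these lines might look like the sketch below. It is not the logic of `scripts/validate-smoke-jsonl.mjs`: the record shape is hypothetical (only `isError`, `stopReason`, and `errorMessage` are named in this document; `role` and `text` are assumed fields), and real session records may need flattening before a check like this applies.

```typescript
// Hypothetical record shape. `role` and `text` are assumed field names;
// `isError`, `stopReason`, and `errorMessage` are mentioned in the doc.
interface SessionRecord {
  role?: string; // assumed: "toolResult", "assistant", ...
  isError?: boolean;
  stopReason?: string;
  errorMessage?: string;
  text?: string; // assumed: flattened text content of the record
}

// No `g` flag, so repeated .test() calls stay stateless.
const REPLAY_FAILURE_PATTERN = /Tool (grep|cursor|find|ls) not found/;

function isReplayFailure(record: SessionRecord): boolean {
  const haystack = `${record.text ?? ""}\n${record.errorMessage ?? ""}`;
  if (!REPLAY_FAILURE_PATTERN.test(haystack)) {
    return false;
  }
  // A tool result counts only when it is flagged as an error.
  if (record.role === "toolResult") {
    return record.isError === true;
  }
  // An assistant message counts only when it is itself an error message.
  if (record.role === "assistant") {
    return record.stopReason === "error" || Boolean(record.errorMessage);
  }
  return false;
}

// Parse JSONL text and return only the records that count as failures.
function scanReplayFailures(jsonl: string): SessionRecord[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as SessionRecord)
    .filter(isReplayFailure);
}
```

The error-state check is what separates this from a substring scan: a successful `read` whose content quotes `Tool grep not found` matches the pattern but is not reported.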
158
+ ### False-positive edge case (2026-05-23)
159
+
160
+ Plan-strip live smoke can lead Cursor to `read` testing docs that *document* the replay failure strings. A naive whole-record JSON scan then reported four failures from a single successful `read` toolResult (`isError: false`).
161
+
162
+ When changing replay scan logic:
163
+
164
+ 1. Update `scripts/validate-smoke-jsonl.mjs`
165
+ 2. Add/adjust cases in `test/validate-smoke-jsonl.test.ts` (error toolResult must still fail; successful read of doc text must pass)
166
+ 3. Re-run `npm run smoke:isolated` on a packed temp install before release
167
+
168
+ ## Plan-mode regression scenario
169
+
170
+ Simulate plan-mode execute stripping with the repo fixture:
171
+
172
+ - `scripts/fixtures/plan-strip-shim/index.ts`
173
+
174
+ It sets active tools to `read`, `bash`, `edit`, `write` on each `turn_start`. Run pi with:
175
+
176
+ ```bash
177
+ pi -e scripts/fixtures/plan-strip-shim --cursor-no-fast --model cursor/composer-2.5 \
178
+ --session-dir "$SMOKE_DIR/plan-strip" \
179
+ -p 'After reset, read README.md and answer PLAN_STRIP_OK=yes.'
180
+ ```
181
+
182
+ Pass criteria:
183
+
184
+ - No replay `Tool * not found` entries in JSONL
185
+ - Native replay tools (`grep`, `find`, `read`, etc.) succeed after `turn_start` resync
186
+ - On non-Cursor model switch, native replay wrappers are removed except core pi tools
187
+
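For orientation, a shim with the behavior described above could be as small as the sketch below. It is not the contents of `scripts/fixtures/plan-strip-shim/index.ts`: the `PiLike` interface and the `on("turn_start", ...)` registration are assumptions about the extension surface, and only `setActiveTools` and the four tool names are taken from this document.

```typescript
// Hypothetical minimal view of the pi extension API. The `on` signature
// is an assumption; `setActiveTools` is the call named in the doc.
interface PiLike {
  on(event: "turn_start", handler: () => void): void;
  setActiveTools(names: string[]): void;
}

// The tool set plan-mode execute leaves active, per the doc.
const PLAN_MODE_TOOLS: readonly string[] = ["read", "bash", "edit", "write"];

// Re-strip on every turn so the multi-turn resync path is exercised too.
function installPlanStripShim(pi: PiLike): void {
  pi.on("turn_start", () => {
    pi.setActiveTools([...PLAN_MODE_TOOLS]);
  });
}
```

Stripping on every `turn_start`, rather than once, is the design choice that makes the scenario useful: each turn starts from the stripped state, so a pass depends on resync running every turn.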
188
+ ## Local validation ladder
189
+
190
+ Run in order before claiming release-ready for provider/runtime changes:
191
+
192
+ ```bash
193
+ npm test
194
+ npm run typecheck
195
+ npm pack --dry-run
196
+ SKIP_LIVE=1 npm run smoke:isolated
197
+ npm run smoke:isolated # requires auth.json or CURSOR_API_KEY
198
+ npm run smoke:live # partial tmux checklist subset
199
+ ```
200
+
201
+ After changing `scripts/validate-smoke-jsonl.mjs` or replay scan expectations, also run:
202
+
203
+ ```bash
204
+ npm test -- test/validate-smoke-jsonl.test.ts
205
+ ```
206
+
207
+ Then follow the full manual [Cursor live smoke checklist](./cursor-live-smoke-checklist.md) for surfaces the scripts do not cover (bridge MCP, abort/cancel, full TUI observation, packaging review, cleanup).
208
+
209
+ ## What belongs in CI vs manual smoke
210
+
211
+ - **CI / default `npm test`:** mocked provider tests, extension lifecycle tests, JSONL validator tests, script syntax/help checks. No live Cursor calls.
212
+ - **Manual / pre-release:** `npm run smoke:isolated`, `npm run smoke:live`, and the full checklist. Requires real Cursor auth and observes TUI/runtime behavior that mocks cannot reproduce.
213
+
214
+ If live smoke auth is unavailable, report the release as **blocked**, not skipped-ready.
215
+
216
+ ## Cursor SDK event capture probe
217
+
218
+ When debugging TUI/progress/replay timing gaps, capture raw Cursor SDK surfaces side-by-side instead of writing a throwaway probe:
219
+
220
+ ```bash
221
+ CURSOR_API_KEY=... npm run debug:sdk-events -- \
222
+ --cwd ~/Projects \
223
+ --model composer-2.5 \
224
+ --prompt 'Scan all of my projects and give me ideas that would be great to add the Cursor SDK to' \
225
+ --out /tmp/pi-cursor-sdk-sdk-events-manual
226
+ ```
227
+
228
+ The script writes timestamped artifacts under `--out` (default `/tmp/pi-cursor-sdk-sdk-events-<timestamp>`):
229
+
230
+ - `stream-events.jsonl` — `run.stream()` messages
231
+ - `on-delta.jsonl` — `agent.send(..., { onDelta })` updates
232
+ - `on-step.jsonl` — `agent.send(..., { onStep })` steps
233
+ - `wait-result.json` — final `run.wait()` metadata
234
+ - optional `conversation.json` with `--include-conversation`
235
+ - `summary.json` — event counts and timing gaps
236
+
237
+ Stdout prints artifact paths and summary counts only. Raw payloads stay on disk and may contain local paths, project text, tool args/results, or secrets — do not commit or share them.
238
+
239
+ Hard repo rule: Cursor SDK behavior claims must come from the installed `@cursor/sdk` package and/or https://cursor.com/docs/sdk/typescript, not from memory or ad-hoc probes alone.
240
+
241
+ ## Pi provider SDK event capture
242
+
243
+ When debugging pi parsing, replay routing, bridge timing, or send-plan behavior, capture the raw `onDelta`/`onStep` payloads **as the Cursor provider receives them** instead of using the direct SDK probe above.
244
+
245
+ One-shot maintainer script (RPC pi run, gitignored artifacts by default):
246
+
247
+ ```bash
248
+ CURSOR_API_KEY=... npm run debug:provider-events -- \
249
+ --cwd . \
250
+ --model cursor/composer-2.5 \
251
+ --prompt 'Repro prompt here' \
252
+ --out .debug/cursor-sdk-events/manual-repro
253
+ ```
254
+
255
+ Or read a prompt from disk:
256
+
257
+ ```bash
258
+ CURSOR_API_KEY=... npm run debug:provider-events -- \
259
+ --prompt-file .debug/repro-prompt.txt \
260
+ --out .debug/cursor-sdk-events/manual-repro
261
+ ```
262
+
263
+ Artifacts under `--out` (default `.debug/cursor-sdk-events/<timestamp>/` under `--cwd`):
264
+
265
+ - `metadata.json` — model, cwd, send-plan/provider metadata
266
+ - `context-snapshot.json` — full pi `Context` passed into the provider turn
267
+ - `send-payload.json` — exact `agent.send()` input (text + images)
268
+ - `on-delta.jsonl` — raw `InteractionUpdate` objects passed to `turnCoordinator.handleDelta`
269
+ - `on-step.jsonl` — raw `onStep` payloads passed to `turnCoordinator.handleStep`
270
+ - `stream-events.jsonl` — raw `run.stream()` events when supported
271
+ - `pi-stream-events.jsonl` — exact pi stream events emitted to the TUI (`text_delta`, `thinking_delta`, replay cards, `done`, etc.)
272
+ - `provider-events.jsonl` — provider lifecycle markers (`agent_send_start`, `agent_send_returned`, …)
273
+ - `live-run-events.jsonl` — queued native replay / bridge live-run events
274
+ - `bridge-events.jsonl` — bridge lifecycle/request diagnostics (file-only; no stderr unless bridge debug is also enabled)
275
+ - `bridge-raw.jsonl` — raw bridged MCP args/results
276
+ - `display-decisions.jsonl` — per-tool native replay routing (`queue_replay`, `emit_trace`, `inactive_trace`, dedupe skips, bridge ignores) with transcript/trace text
277
+ - `coordinator-events.jsonl` — turn-coordinator side effects (task progress labels, discarded incomplete started tool calls, etc.)
278
+ - `drain-events.jsonl` — live-run pre-send drain and per-turn drain lifecycle (`turn_start`, `turn_end`, inactive replay traces, native display registration)
279
+ - `timeline.jsonl` — merged cross-layer timeline (one grep-friendly stream for the whole turn)
280
+ - `pi-session-snapshot.jsonl` — copy of pi session JSONL at turn finalize (session dir also gets latest `pi-session.jsonl`)
281
+ - `final-partial.json` — assistant partial emitted to pi at end of the provider turn
282
+ - `errors.jsonl` — provider/stream/conversation failures
283
+ - `wait-result.json` — `run.wait()` result
284
+ - `conversation.json` — `run.conversation()` when supported
285
+ - `summary.json` — counts and artifact paths
286
+
287
+ During any normal pi session you can also opt in with:
288
+
289
+ ```bash
290
+ PI_CURSOR_SDK_EVENT_DEBUG=1 pi -e . --model cursor/composer-2.5
291
+ ```
292
+
293
+ Multi-turn sessions group automatically by pi session file:
294
+
295
+ ```text
296
+ .debug/cursor-sdk-events/sessions/<session-slug>/
297
+ session.json # index of all turns in this pi session
298
+ turn-001-<timestamp>/ # first provider turn
299
+ turn-002-<timestamp>/ # second provider turn
300
+ ...
301
+ ```
302
+
303
+ Each turn still gets the full per-turn artifact bundle above. Use `session.json` to jump between turns while debugging incremental send, bridge resolution, or native replay continuation across pi messages. For tool-heavy turns, trace/thinking replay often drains on the **next** pi message — check turn N+1 `drain-events.jsonl` and `pi-stream-events.jsonl` alongside turn N `display-decisions.jsonl`.
304
+
305
+ Optional env:
306
+
307
+ - `PI_CURSOR_SDK_EVENT_DEBUG_DIR` — base directory (default `.debug/cursor-sdk-events`)
308
+ - `PI_CURSOR_SDK_EVENT_DEBUG_SESSION_DIR` — exact session root for all turns in the current pi session
309
+ - `PI_CURSOR_SDK_EVENT_DEBUG_RUN_DIR` — exact artifact directory for one isolated turn (the maintainer script sets this via `--out`; bypasses session grouping)
310
+ - `PI_CURSOR_SDK_EVENT_DEBUG_STDERR=1` — also print the summary line to stderr (off by default so the pi TUI stays normal)
311
+
312
+ Capture is file-only by default: no stderr markers, and bridge diagnostics during SDK event debug go to `bridge-events.jsonl` instead of `[pi-cursor-sdk:bridge]` unless you separately set `PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1`. Raw payloads stay on disk and may contain secrets — do not commit or share them.
313
+
314
+ ### Discarded incomplete SDK tool calls
315
+
316
+ When Cursor emits `tool-call-started` without a matching completion/step result, the provider surfaces a bounded neutral **Cursor … did not complete** activity card or thinking trace at run end. pi bridge MCP calls (`pi__*`) are excluded because pi already shows the real pi tool execution path.
317
+
318
+ With `PI_CURSOR_SDK_EVENT_DEBUG=1`, each discarded started call is also recorded in `coordinator-events.jsonl` under phase `discarded-incomplete-started-tool-call` with:
319
+
320
+ - normalized SDK tool name
321
+ - scrubbed call-id hash (raw call IDs are not written)
322
+ - reason such as `no-completion-at-run-end`, `abort`, or `sdk-failure`
323
+
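The bookkeeping behind such a record can be sketched as follows. This is an assumption-heavy illustration, not the provider's implementation: the `StartedCall` and `DiscardRecord` shapes and the function names are hypothetical, and the FNV-1a hash is a self-contained stand-in. The actual scrubbing method is not described here, and a real implementation would more plausibly use a cryptographic hash.

```typescript
// Hypothetical shapes for illustration only.
interface StartedCall {
  callId: string;
  toolName: string; // normalized SDK tool name
}

interface DiscardRecord {
  toolName: string;
  callIdHash: string; // the raw call ID is never copied into the record
  reason: string; // e.g. "no-completion-at-run-end", "abort", "sdk-failure"
}

// 32-bit FNV-1a as a dependency-free stand-in for the real scrubbing hash.
// It is NOT cryptographic and is used here only to keep the sketch runnable.
function fnv1aHex(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

function collectDiscardedStarts(
  started: readonly StartedCall[],
  completedCallIds: ReadonlySet<string>,
  reason: string,
): DiscardRecord[] {
  return started
    // Keep only calls that started but never completed.
    .filter((call) => !completedCallIds.has(call.callId))
    // pi bridge MCP calls are excluded; pi shows those itself.
    .filter((call) => !call.toolName.startsWith("pi__"))
    .map((call) => ({
      toolName: call.toolName,
      callIdHash: fnv1aHex(call.callId),
      reason,
    }));
}
```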
324
+ Stderr output for these records requires `PI_CURSOR_SDK_EVENT_DEBUG_STDERR=1`. This complements the standalone `npm run debug:sdk-events` probe by interpreting a specific provider discard path during normal pi runs. User-visible incomplete cards explain the gap in the TUI; debug artifacts remain maintainer-only (**#52**).
325
+
326
+ ## Tool calls listed as plain text (#40 triage)
327
+
328
+ **Symptom:** Assistant output lists tool invocations (for example `Tool call`, `Cursor activity`, `call cursor-replay-…`, `toolName`, `browser_navigate`) instead of pi tool execution cards/results.
329
+
330
+ **What the screenshot in [#40](https://github.com/fitchmultz/pi-cursor-sdk/issues/40) shows:** Plain assistant text that mirrors pi's **prompt transcript format** for replay tool calls (`Tool call (Cursor activity, call cursor-replay-…): …` from `src/context.ts`) rather than a rendered pi `toolCall` card. That pattern usually means the Cursor model **narrated** a tool call as text; it is not proof that pi failed to emit `toolcall_start` / `toolUse`.
331
+
332
+ **Do not close #40 as duplicate of #55 without session JSONL.** #55 surfaces scrubbed SDK run failures and abort causes in the TUI. #40 can occur with no error toast when the model prints tool metadata as assistant text, when replay is display-only but the user expected real execution, when stale native replay routing or plan-strip resync gaps produce `Tool * not found` errors (see **#52**), or when started SDK tools were discarded at run end (see **#52** maintainer debug and [Discarded incomplete SDK tool calls](#discarded-incomplete-sdk-tool-calls) above). A hard **process exit** from uncaught `ConnectError` / `ETIMEDOUT` is **#43**, not #40 text echo.
333
+
334
+ ### Reporter checklist (required before claiming a provider bug)
335
+
336
+ Ask the reporter (or capture yourself) for:
337
+
338
+ | Field | Why |
339
+ | --- | --- |
340
+ | `pi --version` and installed `pi-cursor-sdk` version | Confirms extension/runtime in use |
341
+ | Model ID (for example `cursor/composer-2.5`) | Routing/replay behavior is model-scoped |
342
+ | Exact repro prompt and prior turns | Multi-turn replay history affects prompt text |
343
+ | Flags: `--cursor-no-fast`, `PI_CURSOR_PI_TOOL_BRIDGE`, `PI_CURSOR_EXPOSE_BUILTIN_TOOLS`, `PI_CURSOR_SETTING_SOURCES` | Bridge vs native-only vs narrowed settings |
344
+ | Whether the listed names are `pi__*` bridge MCP, Cursor-native (`browser_navigate`, `WebSearch`), or `cursor-replay-*` replay IDs | Three different surfaces (see [Cursor native tool replay](./cursor-native-tool-replay.md#live-bridge-vs-replay)) |
345
+ | Red toast / `errorMessage` text, if any | Distinguishes #55 failure surfacing from silent text echo |
346
+ | Process exit / uncaught `ConnectError` / `ETIMEDOUT` stack trace, if any | Hard network crash (**#43**), not #40 model text echo |
347
+ | Session JSONL path (redact secrets before sharing) | Source of truth for `toolCall` vs plain `text` blocks; scan for replay `Tool * not found` (**#52**) |
348
+
349
+ ### Capture steps (maintainers)
350
+
351
+ Use an isolated session dir and do not paste auth, tokens, or raw debug payloads into issues.
352
+
353
+ ```bash
354
+ SMOKE_DIR="/tmp/pi-cursor-sdk-issue40-$(date +%s)"
355
+ mkdir -p "$SMOKE_DIR/home/.pi/agent"
356
+ cp "$HOME/.pi/agent/auth.json" "$SMOKE_DIR/home/.pi/agent/auth.json"
357
+ chmod 600 "$SMOKE_DIR/home/.pi/agent/auth.json"
358
+
359
+ env -i HOME="$SMOKE_DIR/home" PATH="/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin" \
360
+ MISE_DISABLE=1 \
361
+ PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1 \
362
+ pi -e . --cursor-no-fast --model cursor/composer-2.5 \
363
+ --session-dir "$SMOKE_DIR/session" \
364
+ -p '<exact reporter prompt>'
365
+ ```
366
+
367
+ Optional provider/SDK timelines (separate from pi session JSONL; see [Pi provider SDK event capture](#pi-provider-sdk-event-capture) and [Cursor SDK event capture probe](#cursor-sdk-event-capture-probe)):
368
+
369
+ For pi parsing, replay routing, or bridge timing, prefer:
370
+
371
+ ```bash
372
+ npm run debug:provider-events -- \
373
+ --cwd "$PWD" \
374
+ --model cursor/composer-2.5 \
375
+ --prompt '<exact reporter prompt>' \
376
+ --out "$SMOKE_DIR/provider-events"
377
+ ```
378
+
379
+ Or add `PI_CURSOR_SDK_EVENT_DEBUG=1` to the pi run above (writes under `.debug/cursor-sdk-events/`).
380
+
381
+ For raw Cursor SDK surfaces only:
382
+
383
+ ```bash
384
+ npm run debug:sdk-events -- \
385
+ --cwd "$PWD" \
386
+ --model composer-2.5 \
387
+ --prompt '<exact reporter prompt>' \
388
+ --out "$SMOKE_DIR/sdk-events"
389
+ ```
390
+
391
+ ### JSONL classification (decision tree)
392
+
393
+ Start with whether pi stayed alive:
394
+
395
+ 0. **pi process exited / shell returned with uncaught `ConnectError` (`ETIMEDOUT`, code 14, `read ETIMEDOUT`)** — hard network crash bypassing provider error surfacing. Route to **#43** (coordinate with #55 for caught-failure messaging). If tools were mid-flight, note whether session JSONL ends abruptly; do not classify as #40 model text echo.
396
+
397
+ Then inspect the failing assistant turn in `$SMOKE_DIR/session/*.jsonl`:
398
+
399
+ 1. **Error `toolResult` (`isError: true`) or error assistant message contains `Tool grep/cursor/find/ls not found`** — stale `context.tools` snapshot or plan-strip resync gap after plan-mode execute stripped active tools. Run `node scripts/validate-smoke-jsonl.mjs --replay-errors-only "$SMOKE_DIR/session"`. Optional: `display-decisions.jsonl` from `PI_CURSOR_SDK_EVENT_DEBUG=1` shows `inactive_trace` routing. Route to **#52** — not model text echo (those strings appear in persisted error records, not narrated `Tool call (` lines). See [Dual-check invariant](#dual-check-invariant-contexttools-vs-pi-active-tools).
400
+ 2. **`content` has `type: "toolCall"` blocks and matching `toolResult` rows** — pi executed or replayed tools; if the TUI still looked like plain text, capture a screenshot and pi version (possible pi TUI/display issue, not provider dispatch).
401
+ 3. **`content` is only `type: "text"` and text contains `Tool call (` / `cursor-replay-` / serialized arg keys** — model text echo of prompt transcript format; not #55, not #52 stale routing. Compare with `buildCursorPrompt()` output in the prior turn.
402
+ 4. **No `toolCall` blocks, no error toast, user expected real execution** — check whether names are replay-only (`cursor-replay-*`) or Cursor-native MCP; replay never re-runs work ([replay doc](./cursor-native-tool-replay.md)).
403
+ 5. **`stopReason: "error"` or scrubbed `errorMessage`** — classify under **#55**; check whether incomplete started tools were discarded (`discardIncompleteStartedToolCalls()`). Discarded starts with no completion and no model text echo: see `coordinator-events.jsonl` phase `discarded-incomplete-started-tool-call` ([Discarded incomplete SDK tool calls](#discarded-incomplete-sdk-tool-calls) above); route broader stale/inactive replay gaps to **#52**.
404
+ 6. **Bridge expected (`pi__*` in Cursor MCP)** — inspect stderr `[pi-cursor-sdk:bridge]` JSONL with `PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1` for pending/unresolved bridge requests.
405
+
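Steps 1 to 5 of the tree can be approximated in code once a turn has been pulled out of the session JSONL. The sketch below is a triage aid under assumed shapes, not a tool shipped with the package: `AssistantTurn`, `errorToolResultTexts`, and the class labels are hypothetical. Step 0 (process exit) and step 6 (bridge stderr) cannot be decided from JSONL alone, so they are left to manual review.

```typescript
// Hypothetical labels mapping to the numbered steps above.
type TurnClass =
  | "replay-not-found" // step 1, route to #52
  | "tool-calls-present" // step 2
  | "model-text-echo" // step 3, #40 text echo
  | "provider-error" // step 5, route to #55
  | "needs-manual-review"; // steps 4 and 6, or anything unmatched

interface ContentBlock {
  type: string; // "text", "toolCall", ...
  text?: string;
}

// Assumed, pre-extracted view of one failing assistant turn.
interface AssistantTurn {
  content: ContentBlock[];
  stopReason?: string;
  errorMessage?: string;
  errorToolResultTexts: string[]; // text of toolResult rows with isError: true
}

const NOT_FOUND = /Tool (grep|cursor|find|ls) not found/;
const ECHO_MARKERS = ["Tool call (", "cursor-replay-"];

function classifyAssistantTurn(turn: AssistantTurn): TurnClass {
  // Step 1: replay failure strings inside persisted error records.
  const errorTexts = [...turn.errorToolResultTexts, turn.errorMessage ?? ""];
  if (errorTexts.some((t) => NOT_FOUND.test(t))) {
    return "replay-not-found";
  }
  // Step 2: real toolCall blocks exist, so pi executed or replayed tools.
  if (turn.content.some((block) => block.type === "toolCall")) {
    return "tool-calls-present";
  }
  // Step 3: text-only content that mirrors the prompt transcript format.
  const textOnly = turn.content.length > 0 && turn.content.every((b) => b.type === "text");
  const echoes = turn.content.some((b) =>
    ECHO_MARKERS.some((marker) => (b.text ?? "").includes(marker)),
  );
  if (textOnly && echoes) {
    return "model-text-echo";
  }
  // Step 5: caught provider failure without replay strings.
  if (turn.stopReason === "error" || Boolean(turn.errorMessage)) {
    return "provider-error";
  }
  return "needs-manual-review";
}
```

The checks run in the same order as the numbered steps, so a turn with both an error `toolResult` and narrated `Tool call (` text is routed to #52 first.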
406
+ Quick structural scan (no secrets):
407
+
408
+ ```bash
409
+ node scripts/validate-smoke-jsonl.mjs --replay-errors-only "$SMOKE_DIR/session"
410
+ rg '"type": "toolCall"|Tool call \(Cursor|cursor-replay-' "$SMOKE_DIR/session"/*.jsonl
411
+ ```
412
+
413
+ ### When to file follow-ups
414
+
415
+ - **#43** — pi exited from uncaught `ConnectError` / `ETIMEDOUT` during Cursor SDK HTTP traffic (hard crash, not a scrubbed #55 toast).
416
+ - **#55** — caught SDK run failure or abort with missing/opaque detail (surfacing is already addressed on main).
417
+ - **#52** — stale/inactive native replay routing after plan-strip or stale `context.tools` snapshot (`Tool * not found` in JSONL, `inactive_trace` in `display-decisions.jsonl`); or maintainer needs an explicit "started X, never completed" debug line when JSONL shows no completion and no model text echo.
418
+ - **New issue** — bridge dispatch failure with `[pi-cursor-sdk:bridge]` evidence, or proven provider bug with JSONL showing missing `toolCall` despite SDK `tool-call-completed` in `on-delta.jsonl` from `debug:provider-events` or `debug:sdk-events` artifacts.
419
+ ## Related docs and scripts
420
+
421
+ - [Cursor live smoke checklist](./cursor-live-smoke-checklist.md)
422
+ - [Cursor native tool replay](./cursor-native-tool-replay.md)
423
+ - `scripts/isolated-cursor-smoke.sh`
424
+ - `scripts/tmux-live-smoke.sh`
425
+ - `scripts/validate-smoke-jsonl.mjs`
426
+ - `scripts/debug-sdk-events.mjs`
427
+ - `scripts/debug-provider-events.mjs`
428
+ - `test/helpers/cursor-provider-harness.ts` — controllable native replay pi mock (`createNativeToolDisplayPiForTest`)
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "pi-cursor-sdk",
3
- "version": "0.1.17",
3
+ "version": "0.1.19",
4
4
  "description": "pi provider extension backed by @cursor/sdk local agents",
5
5
  "author": "Mitch Fultz (https://github.com/fitchmultz)",
6
6
  "license": "MIT",
@@ -26,10 +26,16 @@
26
26
  "scripts/refresh-cursor-model-snapshots.mjs",
27
27
  "scripts/steering-rpc-smoke.mjs",
28
28
  "scripts/tmux-live-smoke.sh",
29
+ "scripts/isolated-cursor-smoke.sh",
29
30
  "scripts/validate-smoke-jsonl.mjs",
31
+ "scripts/debug-sdk-events.mjs",
32
+ "scripts/debug-provider-events.mjs",
33
+ "scripts/lib/cursor-probe-utils.mjs",
34
+ "scripts/lib/cursor-sdk-output-filter.mjs",
30
35
  "README.md",
31
36
  "docs/cursor-model-ux-spec.md",
32
37
  "docs/cursor-live-smoke-checklist.md",
38
+ "docs/cursor-testing-lessons.md",
33
39
  "docs/cursor-native-tool-replay.md",
34
40
  "docs/cursor-native-tool-visual-audit.md",
35
41
  "LICENSE",
@@ -45,8 +51,11 @@
45
51
  "test:watch": "vitest",
46
52
  "refresh:cursor-snapshots": "node scripts/refresh-cursor-model-snapshots.mjs",
47
53
  "smoke:live": "scripts/tmux-live-smoke.sh",
54
+ "smoke:isolated": "scripts/isolated-cursor-smoke.sh",
48
55
  "smoke:steering": "node scripts/steering-rpc-smoke.mjs",
49
- "smoke:jsonl": "node scripts/validate-smoke-jsonl.mjs"
56
+ "smoke:jsonl": "node scripts/validate-smoke-jsonl.mjs",
57
+ "debug:sdk-events": "node scripts/debug-sdk-events.mjs",
58
+ "debug:provider-events": "node scripts/debug-provider-events.mjs"
50
59
  },
51
60
  "dependencies": {
52
61
  "@cursor/sdk": "^1.0.13",