pi-cursor-sdk 0.1.17 → 0.1.19
- package/CHANGELOG.md +62 -0
- package/README.md +38 -1
- package/docs/cursor-live-smoke-checklist.md +22 -2
- package/docs/cursor-model-ux-spec.md +5 -4
- package/docs/cursor-native-tool-replay.md +96 -2
- package/docs/cursor-testing-lessons.md +428 -0
- package/package.json +11 -2
- package/scripts/debug-provider-events.mjs +403 -0
- package/scripts/debug-sdk-events.mjs +413 -0
- package/scripts/isolated-cursor-smoke.sh +226 -0
- package/scripts/lib/cursor-probe-utils.mjs +52 -0
- package/scripts/lib/cursor-sdk-output-filter.mjs +86 -0
- package/scripts/validate-smoke-jsonl.mjs +86 -7
- package/src/context.ts +45 -32
- package/src/cursor-agent-message-web-tools.ts +172 -0
- package/src/cursor-agents-context.ts +176 -0
- package/src/cursor-context-tools.ts +6 -0
- package/src/cursor-display-text.ts +10 -0
- package/src/cursor-incomplete-tool-visibility.ts +118 -0
- package/src/cursor-live-run-coordinator.ts +18 -7
- package/src/cursor-model.ts +12 -0
- package/src/cursor-native-replay-routing.ts +48 -0
- package/src/cursor-native-replay-trace.ts +29 -0
- package/src/cursor-native-tool-display-registration.ts +14 -7
- package/src/cursor-native-tool-display-replay.ts +63 -5
- package/src/cursor-native-tool-display-tools.ts +20 -0
- package/src/cursor-pi-tool-bridge-diagnostics.ts +11 -1
- package/src/cursor-pi-tool-bridge-run.ts +16 -1
- package/src/cursor-pi-tool-bridge-types.ts +3 -0
- package/src/cursor-provider-errors.ts +96 -0
- package/src/cursor-provider-live-run-drain.ts +208 -63
- package/src/cursor-provider-turn-coordinator.ts +217 -47
- package/src/cursor-provider.ts +275 -83
- package/src/cursor-question-tool.ts +10 -5
- package/src/cursor-sdk-abort-error-guard.ts +109 -0
- package/src/cursor-sdk-event-debug-constants.ts +40 -0
- package/src/cursor-sdk-event-debug-session.ts +163 -0
- package/src/cursor-sdk-event-debug.ts +597 -0
- package/src/cursor-sensitive-text.ts +27 -7
- package/src/cursor-session-agent.ts +25 -3
- package/src/cursor-session-send-policy.ts +43 -0
- package/src/cursor-setting-sources.ts +29 -0
- package/src/cursor-state.ts +1 -5
- package/src/cursor-tool-lifecycle.ts +111 -0
- package/src/cursor-tool-names.ts +12 -0
- package/src/cursor-tool-transcript.ts +4 -2
- package/src/cursor-transcript-tool-formatters.ts +228 -5
- package/src/cursor-transcript-tool-specs.ts +113 -14
- package/src/cursor-transcript-utils.ts +12 -0
- package/src/cursor-web-tool-activity.ts +84 -0
- package/src/index.ts +4 -1
|
@@ -0,0 +1,428 @@
|
|
|
1
|
+
# Cursor Testing Lessons
|
|
2
|
+
|
|
3
|
+
## Purpose
|
|
4
|
+
|
|
5
|
+
This document records maintainer testing lessons for `pi-cursor-sdk`. It complements unit tests and the [Cursor live smoke checklist](./cursor-live-smoke-checklist.md). Use it when adding regression coverage, debugging false-green releases, or building isolated smoke harnesses.
|
|
6
|
+
|
|
7
|
+
## Core lesson: integration-shaped bugs beat unit mocks
|
|
8
|
+
|
|
9
|
+
The native replay `Tool grep not found` failure was integration-shaped, not unit-shaped:
|
|
10
|
+
|
|
11
|
+
1. **Plan mode** calls `setActiveTools(["read", "bash", "edit", "write"])` when execution starts.
|
|
12
|
+
2. **pi-cursor-sdk** only re-synced native replay wrappers on `session_start` / `model_select`, not every turn.
|
|
13
|
+
3. **The provider** still emitted native replay `toolUse` for `grep` / `cursor`.
|
|
14
|
+
4. **pi's agent loop** looked up tools in `context.tools` and failed with `Tool grep not found`.
|
|
15
|
+
|
|
16
|
+
Passing hundreds of unit tests did not prove that chain was safe. Regression coverage now includes:
|
|
17
|
+
|
|
18
|
+
- `test/index.test.ts` — `before_agent_start` and `turn_start` resync after plan-style strip
|
|
19
|
+
- `test/cursor-native-replay-stress.test.ts` — plan strip → resync → grep replay; inactive-tool trace fallbacks
|
|
20
|
+
- `test/cursor-provider-replay-live-run.test.ts` — inactive replay tools emit trace instead of broken `toolUse`
|
|
21
|
+
- `test/cursor-native-replay-trace.test.ts` — shared inactive replay trace formatting
|
|
22
|
+
- `test/cursor-native-replay-routing.test.ts` — `resolveNativeReplayDisposition` and `partitionNativeToolsByActiveContext`
|
|
23
|
+
- `test/validate-smoke-jsonl.test.ts` — replay scan semantics (real errors vs doc mentions in successful reads)
|
|
24
|
+
|
|
25
|
+
When changing provider/runtime behavior, ask whether the bug spans **pi extension lifecycle**, **active tool state**, **provider streaming**, and **persisted JSONL**. If yes, add an integration-style unit test or live smoke coverage for that chain.
|
|
26
|
+
|
|
27
|
+
## Dual-check invariant: `context.tools` vs pi active tools
|
|
28
|
+
|
|
29
|
+
Native replay routing intentionally uses two layers:
|
|
30
|
+
|
|
31
|
+
1. **Extension resync** (`before_agent_start`, `turn_start`) updates pi's active tool set via `syncRegisteredNativeCursorToolsForModel`. This fixes the common case where plan-mode execute strips `grep`/`find`/`cursor` before the next turn.
|
|
32
|
+
2. **Provider routing** uses the **`context.tools` snapshot** captured when `streamCursor()` starts (`getActiveContextToolNames` in `src/cursor-context-tools.ts`). It does not read live `pi.getActiveTools()` mid-stream.
|
|
33
|
+
|
|
34
|
+
`src/cursor-native-replay-routing.ts` centralizes provider-side routing against the same `context.tools` snapshot:
|
|
35
|
+
|
|
36
|
+
- **Turn coordinator** calls `resolveNativeReplayDisposition()` per completed SDK tool → `queue_replay` (queue native `toolUse`), `inactive_trace` (`formatInactiveCursorReplayTrace()`), or `transcript_trace`.
|
|
37
|
+
- **Live-run drain** calls `partitionNativeToolsByActiveContext()` on already-queued native tool batches → active tools become `toolUse`; inactive tools get trace only and the batch returns `"handled"` without `toolUse`.
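
The partition step described above can be sketched as follows. This is a minimal illustration, not the code in `src/cursor-native-replay-routing.ts`: the `QueuedNativeTool` shape and the `partitionByActiveContextSketch` name are hypothetical, and the only behavior taken from this document is that tools present in the `context.tools` snapshot become `toolUse` while the rest get trace only.

```typescript
// Hypothetical shape: the real queued-tool type in pi-cursor-sdk may differ.
interface QueuedNativeTool {
  name: string;
  callId: string;
}

interface PartitionedNativeTools {
  active: QueuedNativeTool[]; // would be emitted as toolUse
  inactive: QueuedNativeTool[]; // would get trace only
}

// Split a queued batch against the context.tools snapshot captured at stream start.
function partitionByActiveContextSketch(
  batch: readonly QueuedNativeTool[],
  activeToolNames: ReadonlySet<string>,
): PartitionedNativeTools {
  const active: QueuedNativeTool[] = [];
  const inactive: QueuedNativeTool[] = [];
  for (const tool of batch) {
    (activeToolNames.has(tool.name) ? active : inactive).push(tool);
  }
  return { active, inactive };
}

// Plan-style strip: only the four core tools remain in the snapshot.
const strippedSnapshot = new Set(["read", "bash", "edit", "write"]);
const partitioned = partitionByActiveContextSketch(
  [
    { name: "read", callId: "a" },
    { name: "grep", callId: "b" },
  ],
  strippedSnapshot,
);
console.log(
  partitioned.active.map((t) => t.name),
  partitioned.inactive.map((t) => t.name),
);
```

Taking the snapshot as a parameter, rather than reading live state inside the function, mirrors the rule above that provider routing does not consult `pi.getActiveTools()` mid-stream.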
|
|
38
|
+
|
|
39
|
+
Disposition outcomes:
|
|
40
|
+
|
|
41
|
+
- `queue_replay` — tool is in `context.tools` and a live run exists
|
|
42
|
+
- `inactive_trace` — native replay tool missing from `context.tools`
|
|
43
|
+
- `transcript_trace` — native replay off or non-native tool
|
|
44
|
+
|
|
45
|
+
If resync runs but `context.tools` is still stale (e.g. only `read` listed), the provider must **not** emit `toolUse` for inactive tools. `test/cursor-native-replay-stress.test.ts` covers that stale-snapshot path.
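
As a rough sketch of the three outcomes listed above, the decision can be written as a pure function over the snapshot. The input fields and the `sketchReplayDisposition` name are assumptions for illustration; the real `resolveNativeReplayDisposition()` signature is not shown in this document. The case "tool is active but no live run exists" is not specified above, so the fallback chosen here is also an assumption.

```typescript
type ReplayDisposition = "queue_replay" | "inactive_trace" | "transcript_trace";

// Hypothetical input shape; field names are illustrative only.
interface DispositionInput {
  toolName: string;
  isNativeReplayTool: boolean;
  nativeReplayEnabled: boolean;
  hasLiveRun: boolean;
  activeToolNames: ReadonlySet<string>; // context.tools snapshot
}

function sketchReplayDisposition(input: DispositionInput): ReplayDisposition {
  // Native replay off, or a non-native tool: transcript trace only.
  if (!input.nativeReplayEnabled || !input.isNativeReplayTool) {
    return "transcript_trace";
  }
  // Native replay tool missing from the snapshot: never emit toolUse.
  if (!input.activeToolNames.has(input.toolName)) {
    return "inactive_trace";
  }
  // Assumption: without a live run there is nothing to queue into,
  // so this sketch falls back to a transcript trace.
  return input.hasLiveRun ? "queue_replay" : "transcript_trace";
}

// Stale snapshot after a plan-style strip: only `read` is listed.
const staleSnapshot = new Set(["read"]);
const grepDisposition = sketchReplayDisposition({
  toolName: "grep",
  isNativeReplayTool: true,
  nativeReplayEnabled: true,
  hasLiveRun: true,
  activeToolNames: staleSnapshot,
});
console.log(grepDisposition);
```

The stale-snapshot example corresponds to the path where resync ran but `context.tools` still lists only `read`: the result is a trace, not a `toolUse` that pi's agent loop would reject.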
|
|
46
|
+
|
|
47
|
+
## Auth: use `auth.json`, not only env
|
|
48
|
+
|
|
49
|
+
pi resolves Cursor auth in this order:
|
|
50
|
+
|
|
51
|
+
1. pi `--api-key`
|
|
52
|
+
2. stored `cursor` key in `~/.pi/agent/auth.json` from `/login`
|
|
53
|
+
3. `CURSOR_API_KEY`
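
The precedence above can be modeled in a harness to report which source a run would use. This is a sketch only: `CursorAuthSources` and `resolveCursorAuthSketch` are hypothetical names, it does not read `auth.json` or the environment itself, and treating an empty string as "absent" is an assumption rather than documented pi behavior.

```typescript
// Hypothetical inputs a harness would collect before calling pi.
interface CursorAuthSources {
  cliApiKey?: string; // value passed to pi --api-key
  storedCursorKey?: string; // `cursor` key from ~/.pi/agent/auth.json
  envCursorApiKey?: string; // CURSOR_API_KEY
}

type CursorAuthSource = "--api-key" | "auth.json" | "CURSOR_API_KEY";

interface ResolvedCursorAuth {
  key: string;
  source: CursorAuthSource;
}

// Return the first available key in the documented order, or undefined.
function resolveCursorAuthSketch(
  sources: CursorAuthSources,
): ResolvedCursorAuth | undefined {
  const candidates: Array<[CursorAuthSource, string | undefined]> = [
    ["--api-key", sources.cliApiKey],
    ["auth.json", sources.storedCursorKey],
    ["CURSOR_API_KEY", sources.envCursorApiKey],
  ];
  for (const [source, key] of candidates) {
    // Assumption: empty strings count as missing.
    if (key !== undefined && key !== "") {
      return { key, source };
    }
  }
  return undefined;
}

const resolvedAuth = resolveCursorAuthSketch({
  storedCursorKey: "stored-placeholder",
  envCursorApiKey: "env-placeholder",
});
// Log only the source, never the key itself.
console.log(resolvedAuth?.source);
```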
|
|
54
|
+
|
|
55
|
+
For live smoke and isolated harnesses:
|
|
56
|
+
|
|
57
|
+
- **Do not assume** `CURSOR_API_KEY` or `~/.secrets` alone is enough.
|
|
58
|
+
- **Do assume** pi reads auth from the active `HOME`, usually `~/.pi/agent/auth.json`.
|
|
59
|
+
- Isolated runs with `env -i HOME=/tmp/...` must **copy** `auth.json` into that temporary home before calling `pi`.
|
|
60
|
+
|
|
61
|
+
Example seed step used by `scripts/isolated-cursor-smoke.sh`:
|
|
62
|
+
|
|
63
|
+
```bash
|
|
64
|
+
mkdir -p "$HOME/.pi/agent"
|
|
65
|
+
cp "$REAL_HOME/.pi/agent/auth.json" "$HOME/.pi/agent/auth.json"
|
|
66
|
+
chmod 600 "$HOME/.pi/agent/auth.json"
|
|
67
|
+
```
|
|
68
|
+
|
|
69
|
+
Fallback when `auth.json` lacks a `cursor` provider entry:
|
|
70
|
+
|
|
71
|
+
```bash
|
|
72
|
+
export CURSOR_API_KEY="your-key"
|
|
73
|
+
```
|
|
74
|
+
|
|
75
|
+
Never commit, log, or paste `auth.json` contents, API keys, or session JSONL with secrets.
|
|
76
|
+
|
|
77
|
+
## Isolated directories: why and how
|
|
78
|
+
|
|
79
|
+
Use isolated `/tmp` trees when validating:
|
|
80
|
+
|
|
81
|
+
- packed tarball install (`npm pack` → extract → `pi install -l`)
|
|
82
|
+
- clean `HOME` with no inherited shell profile state
|
|
83
|
+
- plan-mode-style tool stripping via a shim extension
|
|
84
|
+
- JSONL replay-error scans independent of stdout
|
|
85
|
+
|
|
86
|
+
Recommended layout:
|
|
87
|
+
|
|
88
|
+
```text
|
|
89
|
+
/tmp/pi-cursor-sdk-isolated-<timestamp>/
|
|
90
|
+
home/ # seeded ~/.pi/agent/auth.json
|
|
91
|
+
pack/ # npm pack output (*.tgz)
|
|
92
|
+
extract/package/ # unpacked extension
|
|
93
|
+
project/ # empty pi project for install -l
|
|
94
|
+
sessions/
|
|
95
|
+
basic/
|
|
96
|
+
native-replay/
|
|
97
|
+
plan-strip/
|
|
98
|
+
```
|
|
99
|
+
|
|
100
|
+
Commands:
|
|
101
|
+
|
|
102
|
+
```bash
|
|
103
|
+
# full isolated smoke (unit preflight + pack + live pi)
|
|
104
|
+
npm run smoke:isolated
|
|
105
|
+
|
|
106
|
+
# pack/unit only, no live Cursor calls
|
|
107
|
+
SKIP_LIVE=1 npm run smoke:isolated
|
|
108
|
+
|
|
109
|
+
# custom artifact root
|
|
110
|
+
ISOLATED=/tmp/pi-cursor-sdk-isolated-manual npm run smoke:isolated
|
|
111
|
+
```
|
|
112
|
+
|
|
113
|
+
Every live check should use its own `--session-dir` under the isolated tree. Do not reuse session dirs across scenarios.
|
|
114
|
+
|
|
115
|
+
## Harness traps we hit repeatedly
|
|
116
|
+
|
|
117
|
+
| Trap | What went wrong | Fix |
|
|
118
|
+
| --- | --- | --- |
|
|
119
|
+
| Clean `HOME` without auth | `pi` could not authenticate Cursor in isolated runs | Copy `~/.pi/agent/auth.json` into isolated `HOME` |
|
|
120
|
+
| `npm pack \| tail -1` | Captured npm notice text, not tarball path | Use `ls -t "$PACK_DIR"/*.tgz \| head -1` |
|
|
121
|
+
| Packed extension, no install | Provider never loaded in isolated project | Run `npm install --omit=dev` inside extracted package |
|
|
122
|
+
| Inherited shell env | mise/profile hooks hung or polluted runs | Use `env -i ... MISE_DISABLE=1` for isolated pi calls |
|
|
123
|
+
| No per-check timeout | One stuck prompt blocked entire harness | Wrap each live check with timeout/watchdog |
|
|
124
|
+
| stdout-only assertions | Missed replay failures persisted only in JSONL | Scan JSONL for `Tool grep/cursor/find/ls not found` |
|
|
125
|
+
| Naive JSONL substring scan | Successful `read` of docs mentioning replay errors looked like failures | `validate-smoke-jsonl.mjs` only flags error `toolResult` / error assistant messages |
|
|
126
|
+
| Plan strip only on first turn | Under-tested multi-turn resync | Shim strips on every `turn_start`; stress multi-turn separately |
|
|
127
|
+
| Assuming env auth equals pi auth | False "blocked" or false "pass" in CI-like shells | Check `auth.json` provider keys explicitly when needed |
|
|
128
|
+
|
|
129
|
+
## JSONL is the source of truth for replay regressions
|
|
130
|
+
|
|
131
|
+
Stdout can look fine while persisted tool results contain errors. Prefer structural JSONL scans over grepping terminal output.
|
|
132
|
+
|
|
133
|
+
Replay failure scan:
|
|
134
|
+
|
|
135
|
+
```bash
|
|
136
|
+
node scripts/validate-smoke-jsonl.mjs --replay-errors-only "$SESSION_DIR"
|
|
137
|
+
```
|
|
138
|
+
|
|
139
|
+
Combined usage + replay scan after broader smoke:
|
|
140
|
+
|
|
141
|
+
```bash
|
|
142
|
+
node scripts/validate-smoke-jsonl.mjs --replay-errors "$SMOKE_DIR"
|
|
143
|
+
```
|
|
144
|
+
|
|
145
|
+
### What counts as a replay failure
|
|
146
|
+
|
|
147
|
+
The scan fails only on **persisted error messages**, not arbitrary substring matches in session JSONL:
|
|
148
|
+
|
|
149
|
+
- error `toolResult` records (`isError: true`) whose text contains:
|
|
150
|
+
- `Tool grep not found`
|
|
151
|
+
- `Tool cursor not found`
|
|
152
|
+
- `Tool find not found`
|
|
153
|
+
- `Tool ls not found`
|
|
154
|
+
- error assistant messages (`stopReason: "error"` or `errorMessage`) containing those strings
|
|
155
|
+
|
|
156
|
+
Successful tool results are ignored even when file contents mention those strings (for example a `read` of `docs/cursor-testing-lessons.md` during plan-strip smoke).
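
The rule above (flag only persisted error records, ignore successful results that merely mention the strings) can be sketched like this. The flattened `FlatSessionRecord` shape is an assumption made for illustration; real pi session JSONL is likely nested differently, and this is not the logic in `scripts/validate-smoke-jsonl.mjs`.

```typescript
const REPLAY_ERROR_STRINGS = [
  "Tool grep not found",
  "Tool cursor not found",
  "Tool find not found",
  "Tool ls not found",
];

// Hypothetical, flattened record shape (one JSON object per line).
interface FlatSessionRecord {
  role: "toolResult" | "assistant" | "user";
  text: string;
  isError?: boolean;
  stopReason?: string;
  errorMessage?: string;
}

function isReplayFailureSketch(record: FlatSessionRecord): boolean {
  const isErrorRecord =
    (record.role === "toolResult" && record.isError === true) ||
    (record.role === "assistant" &&
      (record.stopReason === "error" || record.errorMessage !== undefined));
  if (!isErrorRecord) {
    return false; // successful records never count, whatever their text says
  }
  const haystack = `${record.text}\n${record.errorMessage ?? ""}`;
  return REPLAY_ERROR_STRINGS.some((needle) => haystack.includes(needle));
}

// Parse JSONL text and keep only records that count as replay failures.
function scanReplayFailuresSketch(jsonl: string): FlatSessionRecord[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as FlatSessionRecord)
    .filter(isReplayFailureSketch);
}

const sampleJsonl = [
  // Successful read of a doc that mentions the error string: ignored.
  JSON.stringify({ role: "toolResult", isError: false, text: "docs say: Tool grep not found" }),
  // Persisted error toolResult: flagged.
  JSON.stringify({ role: "toolResult", isError: true, text: "Tool grep not found" }),
].join("\n");

const replayFailures = scanReplayFailuresSketch(sampleJsonl);
console.log(replayFailures.length);
```

Checking the record kind before looking at its text is what separates this from a naive substring scan over whole records.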
|
|
157
|
+
|
|
158
|
+
### False-positive edge case (2026-05-23)
|
|
159
|
+
|
|
160
|
+
Plan-strip live smoke can lead Cursor to `read` testing docs that *document* replay failure strings. A naive whole-record JSON scan reported four failures from one successful `read` toolResult (`isError: false`).
|
|
161
|
+
|
|
162
|
+
When changing replay scan logic:
|
|
163
|
+
|
|
164
|
+
1. Update `scripts/validate-smoke-jsonl.mjs`
|
|
165
|
+
2. Add/adjust cases in `test/validate-smoke-jsonl.test.ts` (error toolResult must still fail; successful read of doc text must pass)
|
|
166
|
+
3. Re-run `npm run smoke:isolated` on a packed temp install before release
|
|
167
|
+
|
|
168
|
+
## Plan-mode regression scenario
|
|
169
|
+
|
|
170
|
+
Simulate plan-mode execute stripping with the repo fixture:
|
|
171
|
+
|
|
172
|
+
- `scripts/fixtures/plan-strip-shim/index.ts`
|
|
173
|
+
|
|
174
|
+
It sets active tools to `read`, `bash`, `edit`, `write` on each `turn_start`. Run pi with:
|
|
175
|
+
|
|
176
|
+
```bash
|
|
177
|
+
pi -e scripts/fixtures/plan-strip-shim --cursor-no-fast --model cursor/composer-2.5 \
|
|
178
|
+
--session-dir "$SMOKE_DIR/plan-strip" \
|
|
179
|
+
-p 'After reset, read README.md and answer PLAN_STRIP_OK=yes.'
|
|
180
|
+
```
|
|
181
|
+
|
|
182
|
+
Pass criteria:
|
|
183
|
+
|
|
184
|
+
- No replay `Tool * not found` entries in JSONL
|
|
185
|
+
- Native replay tools (`grep`, `find`, `read`, etc.) succeed after `turn_start` resync
|
|
186
|
+
- On non-Cursor model switch, native replay wrappers are removed except core pi tools
|
|
187
|
+
|
|
188
|
+
## Local validation ladder
|
|
189
|
+
|
|
190
|
+
Run in order before claiming release-ready for provider/runtime changes:
|
|
191
|
+
|
|
192
|
+
```bash
|
|
193
|
+
npm test
|
|
194
|
+
npm run typecheck
|
|
195
|
+
npm pack --dry-run
|
|
196
|
+
SKIP_LIVE=1 npm run smoke:isolated
|
|
197
|
+
npm run smoke:isolated # requires auth.json or CURSOR_API_KEY
|
|
198
|
+
npm run smoke:live # partial tmux checklist subset
|
|
199
|
+
```
|
|
200
|
+
|
|
201
|
+
After changing `scripts/validate-smoke-jsonl.mjs` or replay scan expectations, also run:
|
|
202
|
+
|
|
203
|
+
```bash
|
|
204
|
+
npm test -- test/validate-smoke-jsonl.test.ts
|
|
205
|
+
```
|
|
206
|
+
|
|
207
|
+
Then follow the full manual [Cursor live smoke checklist](./cursor-live-smoke-checklist.md) for surfaces the scripts do not cover (bridge MCP, abort/cancel, full TUI observation, packaging review, cleanup).
|
|
208
|
+
|
|
209
|
+
## What belongs in CI vs manual smoke
|
|
210
|
+
|
|
211
|
+
- **CI / default `npm test`:** mocked provider tests, extension lifecycle tests, JSONL validator tests, script syntax/help checks. No live Cursor calls.
|
|
212
|
+
- **Manual / pre-release:** `npm run smoke:isolated`, `npm run smoke:live`, and the full checklist. Requires real Cursor auth and observes TUI/runtime behavior that mocks cannot reproduce.
|
|
213
|
+
|
|
214
|
+
If live smoke auth is unavailable, report the release as **blocked**, not skipped-ready.
|
|
215
|
+
|
|
216
|
+
## Cursor SDK event capture probe
|
|
217
|
+
|
|
218
|
+
When debugging TUI/progress/replay timing gaps, capture raw Cursor SDK surfaces side-by-side instead of writing a throwaway probe:
|
|
219
|
+
|
|
220
|
+
```bash
|
|
221
|
+
CURSOR_API_KEY=... npm run debug:sdk-events -- \
|
|
222
|
+
--cwd ~/Projects \
|
|
223
|
+
--model composer-2.5 \
|
|
224
|
+
--prompt 'Scan all of my projects and give me ideas that would be great to add the Cursor SDK to' \
|
|
225
|
+
--out /tmp/pi-cursor-sdk-sdk-events-manual
|
|
226
|
+
```
|
|
227
|
+
|
|
228
|
+
The script writes timestamped artifacts under `--out` (default `/tmp/pi-cursor-sdk-sdk-events-<timestamp>`):
|
|
229
|
+
|
|
230
|
+
- `stream-events.jsonl` — `run.stream()` messages
|
|
231
|
+
- `on-delta.jsonl` — `agent.send(..., { onDelta })` updates
|
|
232
|
+
- `on-step.jsonl` — `agent.send(..., { onStep })` steps
|
|
233
|
+
- `wait-result.json` — final `run.wait()` metadata
|
|
234
|
+
- optional `conversation.json` with `--include-conversation`
|
|
235
|
+
- `summary.json` — event counts and timing gaps
|
|
236
|
+
|
|
237
|
+
Stdout prints artifact paths and summary counts only. Raw payloads stay on disk and may contain local paths, project text, tool args/results, or secrets — do not commit or share them.
|
|
238
|
+
|
|
239
|
+
Hard repo rule: Cursor SDK behavior claims must come from the installed `@cursor/sdk` package and/or https://cursor.com/docs/sdk/typescript, not from memory or ad-hoc probes alone.
|
|
240
|
+
|
|
241
|
+
## Pi provider SDK event capture
|
|
242
|
+
|
|
243
|
+
When debugging pi parsing, replay routing, bridge timing, or send-plan behavior, capture the raw `onDelta`/`onStep` payloads **as the Cursor provider receives them** instead of using the direct SDK probe above.
|
|
244
|
+
|
|
245
|
+
One-shot maintainer script (RPC pi run, gitignored artifacts by default):
|
|
246
|
+
|
|
247
|
+
```bash
|
|
248
|
+
CURSOR_API_KEY=... npm run debug:provider-events -- \
|
|
249
|
+
--cwd . \
|
|
250
|
+
--model cursor/composer-2.5 \
|
|
251
|
+
--prompt 'Repro prompt here' \
|
|
252
|
+
--out .debug/cursor-sdk-events/manual-repro
|
|
253
|
+
```
|
|
254
|
+
|
|
255
|
+
Or read a prompt from disk:
|
|
256
|
+
|
|
257
|
+
```bash
|
|
258
|
+
CURSOR_API_KEY=... npm run debug:provider-events -- \
|
|
259
|
+
--prompt-file .debug/repro-prompt.txt \
|
|
260
|
+
--out .debug/cursor-sdk-events/manual-repro
|
|
261
|
+
```
|
|
262
|
+
|
|
263
|
+
Artifacts under `--out` (default `.debug/cursor-sdk-events/<timestamp>/` under `--cwd`):
|
|
264
|
+
|
|
265
|
+
- `metadata.json` — model, cwd, send-plan/provider metadata
|
|
266
|
+
- `context-snapshot.json` — full pi `Context` passed into the provider turn
|
|
267
|
+
- `send-payload.json` — exact `agent.send()` input (text + images)
|
|
268
|
+
- `on-delta.jsonl` — raw `InteractionUpdate` objects passed to `turnCoordinator.handleDelta`
|
|
269
|
+
- `on-step.jsonl` — raw `onStep` payloads passed to `turnCoordinator.handleStep`
|
|
270
|
+
- `stream-events.jsonl` — raw `run.stream()` events when supported
|
|
271
|
+
- `pi-stream-events.jsonl` — exact pi stream events emitted to the TUI (`text_delta`, `thinking_delta`, replay cards, `done`, etc.)
|
|
272
|
+
- `provider-events.jsonl` — provider lifecycle markers (`agent_send_start`, `agent_send_returned`, …)
|
|
273
|
+
- `live-run-events.jsonl` — queued native replay / bridge live-run events
|
|
274
|
+
- `bridge-events.jsonl` — bridge lifecycle/request diagnostics (file-only; no stderr unless bridge debug is also enabled)
|
|
275
|
+
- `bridge-raw.jsonl` — raw bridged MCP args/results
|
|
276
|
+
- `display-decisions.jsonl` — per-tool native replay routing (`queue_replay`, `emit_trace`, `inactive_trace`, dedupe skips, bridge ignores) with transcript/trace text
|
|
277
|
+
- `coordinator-events.jsonl` — turn-coordinator side effects (task progress labels, discarded incomplete started tool calls, etc.)
|
|
278
|
+
- `drain-events.jsonl` — live-run pre-send drain and per-turn drain lifecycle (`turn_start`, `turn_end`, inactive replay traces, native display registration)
|
|
279
|
+
- `timeline.jsonl` — merged cross-layer timeline (one grep-friendly stream for the whole turn)
|
|
280
|
+
- `pi-session-snapshot.jsonl` — copy of pi session JSONL at turn finalize (session dir also gets latest `pi-session.jsonl`)
|
|
281
|
+
- `final-partial.json` — assistant partial emitted to pi at end of the provider turn
|
|
282
|
+
- `errors.jsonl` — provider/stream/conversation failures
|
|
283
|
+
- `wait-result.json` — `run.wait()` result
|
|
284
|
+
- `conversation.json` — `run.conversation()` when supported
|
|
285
|
+
- `summary.json` — counts and artifact paths
|
|
286
|
+
|
|
287
|
+
During any normal pi session you can also opt in with:
|
|
288
|
+
|
|
289
|
+
```bash
|
|
290
|
+
PI_CURSOR_SDK_EVENT_DEBUG=1 pi -e . --model cursor/composer-2.5
|
|
291
|
+
```
|
|
292
|
+
|
|
293
|
+
Multi-turn sessions group automatically by pi session file:
|
|
294
|
+
|
|
295
|
+
```text
|
|
296
|
+
.debug/cursor-sdk-events/sessions/<session-slug>/
|
|
297
|
+
session.json # index of all turns in this pi session
|
|
298
|
+
turn-001-<timestamp>/ # first provider turn
|
|
299
|
+
turn-002-<timestamp>/ # second provider turn
|
|
300
|
+
...
|
|
301
|
+
```
|
|
302
|
+
|
|
303
|
+
Each turn still gets the full per-turn artifact bundle above. Use `session.json` to jump between turns while debugging incremental send, bridge resolution, or native replay continuation across pi messages. For tool-heavy turns, trace/thinking replay often drains on the **next** pi message — check turn N+1 `drain-events.jsonl` and `pi-stream-events.jsonl` alongside turn N `display-decisions.jsonl`.
|
|
304
|
+
|
|
305
|
+
Optional env:
|
|
306
|
+
|
|
307
|
+
- `PI_CURSOR_SDK_EVENT_DEBUG_DIR` — base directory (default `.debug/cursor-sdk-events`)
|
|
308
|
+
- `PI_CURSOR_SDK_EVENT_DEBUG_SESSION_DIR` — exact session root for all turns in the current pi session
|
|
309
|
+
- `PI_CURSOR_SDK_EVENT_DEBUG_RUN_DIR` — exact artifact directory for one isolated turn (the maintainer script sets this via `--out`; bypasses session grouping)
|
|
310
|
+
- `PI_CURSOR_SDK_EVENT_DEBUG_STDERR=1` — also print the summary line to stderr (off by default so the pi TUI stays normal)
|
|
311
|
+
|
|
312
|
+
Capture is file-only by default: no stderr markers, and bridge diagnostics during SDK event debug go to `bridge-events.jsonl` instead of `[pi-cursor-sdk:bridge]` unless you separately set `PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1`. Raw payloads stay on disk and may contain secrets — do not commit or share them.
|
|
313
|
+
|
|
314
|
+
### Discarded incomplete SDK tool calls
|
|
315
|
+
|
|
316
|
+
When Cursor emits `tool-call-started` without a matching completion/step result, the provider surfaces a bounded neutral **Cursor … did not complete** activity card or thinking trace at run end. pi bridge MCP calls (`pi__*`) are excluded because pi already shows the real pi tool execution path.
|
|
317
|
+
|
|
318
|
+
With `PI_CURSOR_SDK_EVENT_DEBUG=1`, each discarded started call is also recorded in `coordinator-events.jsonl` under phase `discarded-incomplete-started-tool-call` with:
|
|
319
|
+
|
|
320
|
+
- normalized SDK tool name
|
|
321
|
+
- scrubbed call-id hash (raw call IDs are not written)
|
|
322
|
+
- reason such as `no-completion-at-run-end`, `abort`, or `sdk-failure`
|
|
323
|
+
|
|
324
|
+
Stderr output for these records requires `PI_CURSOR_SDK_EVENT_DEBUG_STDERR=1`. This complements the standalone `npm run debug:sdk-events` probe by interpreting a specific provider discard path during normal pi runs. User-visible incomplete cards explain the gap in the TUI; debug artifacts remain maintainer-only (**#52**).
|
|
325
|
+
|
|
326
|
+
## Tool calls listed as plain text (#40 triage)
|
|
327
|
+
|
|
328
|
+
**Symptom:** Assistant output lists tool invocations (for example `Tool call`, `Cursor activity`, `call cursor-replay-…`, `toolName`, `browser_navigate`) instead of pi tool execution cards/results.
|
|
329
|
+
|
|
330
|
+
**What the screenshot in [#40](https://github.com/fitchmultz/pi-cursor-sdk/issues/40) shows:** Plain assistant text that mirrors pi's **prompt transcript format** for replay tool calls (`Tool call (Cursor activity, call cursor-replay-…): …` from `src/context.ts`) rather than a rendered pi `toolCall` card. That pattern usually means the Cursor model **narrated** a tool call as text; it is not proof that pi failed to emit `toolcall_start` / `toolUse`.
|
|
331
|
+
|
|
332
|
+
**Do not close #40 as duplicate of #55 without session JSONL.** #55 surfaces scrubbed SDK run failures and abort causes in the TUI. #40 can occur with no error toast when the model prints tool metadata as assistant text, when replay is display-only but the user expected real execution, when stale native replay routing or plan-strip resync gaps produce `Tool * not found` errors (see **#52**), or when started SDK tools were discarded at run end (see **#52** maintainer debug and [Discarded incomplete SDK tool calls](#discarded-incomplete-sdk-tool-calls) above). A hard **process exit** from uncaught `ConnectError` / `ETIMEDOUT` is **#43**, not #40 text echo.
|
|
333
|
+
|
|
334
|
+
### Reporter checklist (required before claiming a provider bug)
|
|
335
|
+
|
|
336
|
+
Ask the reporter (or capture yourself) for:
|
|
337
|
+
|
|
338
|
+
| Field | Why |
|
|
339
|
+
| --- | --- |
|
|
340
|
+
| `pi --version` and installed `pi-cursor-sdk` version | Confirms extension/runtime in use |
|
|
341
|
+
| Model ID (for example `cursor/composer-2.5`) | Routing/replay behavior is model-scoped |
|
|
342
|
+
| Exact repro prompt and prior turns | Multi-turn replay history affects prompt text |
|
|
343
|
+
| Flags: `--cursor-no-fast`, `PI_CURSOR_PI_TOOL_BRIDGE`, `PI_CURSOR_EXPOSE_BUILTIN_TOOLS`, `PI_CURSOR_SETTING_SOURCES` | Bridge vs native-only vs narrowed settings |
|
|
344
|
+
| Whether the listed names are `pi__*` bridge MCP, Cursor-native (`browser_navigate`, `WebSearch`), or `cursor-replay-*` replay IDs | Three different surfaces (see [Cursor native tool replay](./cursor-native-tool-replay.md#live-bridge-vs-replay)) |
|
|
345
|
+
| Red toast / `errorMessage` text, if any | Distinguishes #55 failure surfacing from silent text echo |
|
|
346
|
+
| Process exit / uncaught `ConnectError` / `ETIMEDOUT` stack trace, if any | Hard network crash (**#43**), not #40 model text echo |
|
|
347
|
+
| Session JSONL path (redact secrets before sharing) | Source of truth for `toolCall` vs plain `text` blocks; scan for replay `Tool * not found` (**#52**) |
|
|
348
|
+
|
|
349
|
+
### Capture steps (maintainers)
|
|
350
|
+
|
|
351
|
+
Use an isolated session dir and do not paste auth, tokens, or raw debug payloads into issues.
|
|
352
|
+
|
|
353
|
+
```bash
|
|
354
|
+
SMOKE_DIR="/tmp/pi-cursor-sdk-issue40-$(date +%s)"
|
|
355
|
+
mkdir -p "$SMOKE_DIR/home/.pi/agent"
|
|
356
|
+
cp "$HOME/.pi/agent/auth.json" "$SMOKE_DIR/home/.pi/agent/auth.json"
|
|
357
|
+
chmod 600 "$SMOKE_DIR/home/.pi/agent/auth.json"
|
|
358
|
+
|
|
359
|
+
env -i HOME="$SMOKE_DIR/home" PATH="/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin" \
|
|
360
|
+
MISE_DISABLE=1 \
|
|
361
|
+
PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1 \
|
|
362
|
+
pi -e . --cursor-no-fast --model cursor/composer-2.5 \
|
|
363
|
+
--session-dir "$SMOKE_DIR/session" \
|
|
364
|
+
-p '<exact reporter prompt>'
|
|
365
|
+
```
|
|
366
|
+
|
|
367
|
+
Optional provider/SDK timelines (separate from pi session JSONL; see [Pi provider SDK event capture](#pi-provider-sdk-event-capture) and [Cursor SDK event capture probe](#cursor-sdk-event-capture-probe)):
|
|
368
|
+
|
|
369
|
+
For pi parsing, replay routing, or bridge timing, prefer:
|
|
370
|
+
|
|
371
|
+
```bash
|
|
372
|
+
npm run debug:provider-events -- \
|
|
373
|
+
--cwd "$PWD" \
|
|
374
|
+
--model cursor/composer-2.5 \
|
|
375
|
+
--prompt '<exact reporter prompt>' \
|
|
376
|
+
--out "$SMOKE_DIR/provider-events"
|
|
377
|
+
```
|
|
378
|
+
|
|
379
|
+
Or add `PI_CURSOR_SDK_EVENT_DEBUG=1` to the pi run above (writes under `.debug/cursor-sdk-events/`).
|
|
380
|
+
|
|
381
|
+
For raw Cursor SDK surfaces only:
|
|
382
|
+
|
|
383
|
+
```bash
|
|
384
|
+
npm run debug:sdk-events -- \
|
|
385
|
+
--cwd "$PWD" \
|
|
386
|
+
--model composer-2.5 \
|
|
387
|
+
--prompt '<exact reporter prompt>' \
|
|
388
|
+
--out "$SMOKE_DIR/sdk-events"
|
|
389
|
+
```
|
|
390
|
+
|
|
391
|
+
### JSONL classification (decision tree)
|
|
392
|
+
|
|
393
|
+
Start with whether pi stayed alive:
|
|
394
|
+
|
|
395
|
+
0. **pi process exited / shell returned with uncaught `ConnectError` (`ETIMEDOUT`, code 14, `read ETIMEDOUT`)** — hard network crash bypassing provider error surfacing. Route to **#43** (coordinate with #55 for caught-failure messaging). If tools were mid-flight, note whether session JSONL ends abruptly; do not classify as #40 model text echo.
|
|
396
|
+
|
|
397
|
+
Then inspect the failing assistant turn in `$SMOKE_DIR/session/*.jsonl`:
|
|
398
|
+
|
|
399
|
+
1. **Error `toolResult` (`isError: true`) or error assistant message contains `Tool grep/cursor/find/ls not found`** — stale `context.tools` snapshot or plan-strip resync gap after plan-mode execute stripped active tools. Run `node scripts/validate-smoke-jsonl.mjs --replay-errors-only "$SMOKE_DIR/session"`. Optional: `display-decisions.jsonl` from `PI_CURSOR_SDK_EVENT_DEBUG=1` shows `inactive_trace` routing. Route to **#52** — not model text echo (those strings appear in persisted error records, not narrated `Tool call (` lines). See [Dual-check invariant](#dual-check-invariant-contexttools-vs-pi-active-tools).
|
|
400
|
+
2. **`content` has `type: "toolCall"` blocks and matching `toolResult` rows** — pi executed or replayed tools; if the TUI still looked like plain text, capture a screenshot and pi version (possible pi TUI/display issue, not provider dispatch).
|
|
401
|
+
3. **`content` is only `type: "text"` and text contains `Tool call (` / `cursor-replay-` / serialized arg keys** — model text echo of prompt transcript format; not #55, not #52 stale routing. Compare with `buildCursorPrompt()` output in the prior turn.
|
|
402
|
+
4. **No `toolCall` blocks, no error toast, user expected real execution** — check whether names are replay-only (`cursor-replay-*`) or Cursor-native MCP; replay never re-runs work ([replay doc](./cursor-native-tool-replay.md)).
|
|
403
|
+
5. **`stopReason: "error"` or scrubbed `errorMessage`** — classify under **#55**; check whether incomplete started tools were discarded (`discardIncompleteStartedToolCalls()`). Discarded starts with no completion and no model text echo: see `coordinator-events.jsonl` phase `discarded-incomplete-started-tool-call` ([Discarded incomplete SDK tool calls](#discarded-incomplete-sdk-tool-calls) above); route broader stale/inactive replay gaps to **#52**.
|
|
404
|
+
6. **Bridge expected (`pi__*` in Cursor MCP)** — inspect stderr `[pi-cursor-sdk:bridge]` JSONL with `PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1` for pending/unresolved bridge requests.
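
For a first pass over a captured turn, the JSONL-based branches of the decision tree above (steps 1, 2, 3, and 5) can be approximated in code. This is a triage aid under stated assumptions, not a replacement for reading the session: the `TurnSummary` shape and all names below are hypothetical, and steps 0, 4, and 6 still need manual inspection.

```typescript
// Hypothetical content block shapes for one assistant turn.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "toolCall"; name: string };

// Hypothetical summary a maintainer extracts from session JSONL.
interface TurnSummary {
  contentBlocks: ContentBlock[];
  errorToolResultTexts: string[]; // text of toolResult rows with isError: true
  stopReason?: string;
  errorMessage?: string;
}

type Triage =
  | "#52 stale replay routing"
  | "tools executed or replayed (check TUI)"
  | "model text echo"
  | "#55 run failure"
  | "manual triage";

const NOT_FOUND = /Tool (grep|cursor|find|ls) not found/;

function classifyTurnSketch(turn: TurnSummary): Triage {
  const isErrorTurn = turn.stopReason === "error" || turn.errorMessage !== undefined;
  // Step 1: persisted error records naming a missing replay tool.
  const errorTexts = [...turn.errorToolResultTexts, turn.errorMessage ?? ""];
  if (errorTexts.some((text) => NOT_FOUND.test(text))) {
    return "#52 stale replay routing";
  }
  // Step 2: real toolCall blocks exist.
  if (turn.contentBlocks.some((block) => block.type === "toolCall")) {
    return "tools executed or replayed (check TUI)";
  }
  // Step 3: text-only content that mirrors the prompt transcript format.
  const echoed = turn.contentBlocks.some(
    (block) =>
      block.type === "text" &&
      (block.text.includes("Tool call (") || block.text.includes("cursor-replay-")),
  );
  if (echoed && !isErrorTurn) {
    return "model text echo";
  }
  // Step 5: error stop reason or scrubbed error message.
  if (isErrorTurn) {
    return "#55 run failure";
  }
  return "manual triage";
}

const echoTurn: TurnSummary = {
  contentBlocks: [{ type: "text", text: "Tool call (Cursor activity, call cursor-replay-1): grep" }],
  errorToolResultTexts: [],
};
console.log(classifyTurnSketch(echoTurn));
```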
|
|
405
|
+
|
|
406
|
+
Quick structural scan (no secrets):
|
|
407
|
+
|
|
408
|
+
```bash
|
|
409
|
+
node scripts/validate-smoke-jsonl.mjs --replay-errors-only "$SMOKE_DIR/session"
|
|
410
|
+
rg '"type": "toolCall"|Tool call \(Cursor|cursor-replay-' "$SMOKE_DIR/session"/*.jsonl
|
|
411
|
+
```
|
|
412
|
+
|
|
413
|
+
### When to file follow-ups
|
|
414
|
+
|
|
415
|
+
- **#43** — pi exited from uncaught `ConnectError` / `ETIMEDOUT` during Cursor SDK HTTP traffic (hard crash, not a scrubbed #55 toast).
|
|
416
|
+
- **#55** — caught SDK run failure or abort with missing/opaque detail (already addressed on main for surfacing).
|
|
417
|
+
- **#52** — stale/inactive native replay routing after plan-strip or stale `context.tools` snapshot (`Tool * not found` in JSONL, `inactive_trace` in `display-decisions.jsonl`); or maintainer needs an explicit "started X, never completed" debug line when JSONL shows no completion and no model text echo.
|
|
418
|
+
- **New issue** — bridge dispatch failure with `[pi-cursor-sdk:bridge]` evidence, or proven provider bug with JSONL showing missing `toolCall` despite SDK `tool-call-completed` in `on-delta.jsonl` from `debug:provider-events` or `debug:sdk-events` artifacts.
|
|
419
|
+
## Related docs and scripts
|
|
420
|
+
|
|
421
|
+
- [Cursor live smoke checklist](./cursor-live-smoke-checklist.md)
|
|
422
|
+
- [Cursor native tool replay](./cursor-native-tool-replay.md)
|
|
423
|
+
- `scripts/isolated-cursor-smoke.sh`
|
|
424
|
+
- `scripts/tmux-live-smoke.sh`
|
|
425
|
+
- `scripts/validate-smoke-jsonl.mjs`
|
|
426
|
+
- `scripts/debug-sdk-events.mjs`
|
|
427
|
+
- `scripts/debug-provider-events.mjs`
|
|
428
|
+
- `test/helpers/cursor-provider-harness.ts` — controllable native replay pi mock (`createNativeToolDisplayPiForTest`)
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "pi-cursor-sdk",
|
|
3
|
-
"version": "0.1.
|
|
3
|
+
"version": "0.1.19",
|
|
4
4
|
"description": "pi provider extension backed by @cursor/sdk local agents",
|
|
5
5
|
"author": "Mitch Fultz (https://github.com/fitchmultz)",
|
|
6
6
|
"license": "MIT",
|
|
@@ -26,10 +26,16 @@
|
|
|
26
26
|
"scripts/refresh-cursor-model-snapshots.mjs",
|
|
27
27
|
"scripts/steering-rpc-smoke.mjs",
|
|
28
28
|
"scripts/tmux-live-smoke.sh",
|
|
29
|
+
"scripts/isolated-cursor-smoke.sh",
|
|
29
30
|
"scripts/validate-smoke-jsonl.mjs",
|
|
31
|
+
"scripts/debug-sdk-events.mjs",
|
|
32
|
+
"scripts/debug-provider-events.mjs",
|
|
33
|
+
"scripts/lib/cursor-probe-utils.mjs",
|
|
34
|
+
"scripts/lib/cursor-sdk-output-filter.mjs",
|
|
30
35
|
"README.md",
|
|
31
36
|
"docs/cursor-model-ux-spec.md",
|
|
32
37
|
"docs/cursor-live-smoke-checklist.md",
|
|
38
|
+
"docs/cursor-testing-lessons.md",
|
|
33
39
|
"docs/cursor-native-tool-replay.md",
|
|
34
40
|
"docs/cursor-native-tool-visual-audit.md",
|
|
35
41
|
"LICENSE",
|
|
@@ -45,8 +51,11 @@
|
|
|
45
51
|
"test:watch": "vitest",
|
|
46
52
|
"refresh:cursor-snapshots": "node scripts/refresh-cursor-model-snapshots.mjs",
|
|
47
53
|
"smoke:live": "scripts/tmux-live-smoke.sh",
|
|
54
|
+
"smoke:isolated": "scripts/isolated-cursor-smoke.sh",
|
|
48
55
|
"smoke:steering": "node scripts/steering-rpc-smoke.mjs",
|
|
49
|
-
"smoke:jsonl": "node scripts/validate-smoke-jsonl.mjs"
|
|
56
|
+
"smoke:jsonl": "node scripts/validate-smoke-jsonl.mjs",
|
|
57
|
+
"debug:sdk-events": "node scripts/debug-sdk-events.mjs",
|
|
58
|
+
"debug:provider-events": "node scripts/debug-provider-events.mjs"
|
|
50
59
|
},
|
|
51
60
|
"dependencies": {
|
|
52
61
|
"@cursor/sdk": "^1.0.13",
|