pi-cursor-sdk 0.1.27 → 0.1.29
- package/CHANGELOG.md +29 -0
- package/README.md +40 -37
- package/docs/crabbox-platform-testing-lessons.md +508 -0
- package/docs/cursor-dogfood-checklist.md +4 -3
- package/docs/cursor-live-smoke-checklist.md +24 -22
- package/docs/cursor-model-ux-spec.md +12 -12
- package/docs/cursor-native-tool-replay.md +10 -10
- package/docs/cursor-native-tool-visual-audit.md +9 -7
- package/docs/cursor-testing-lessons.md +22 -17
- package/docs/cursor-tool-surfaces.md +3 -3
- package/docs/platform-smoke.md +994 -0
- package/package.json +35 -6
- package/platform-smoke.config.mjs +21 -0
- package/scripts/debug-provider-events.mjs +10 -3
- package/scripts/debug-sdk-events.mjs +10 -2
- package/scripts/isolated-cursor-smoke.sh +4 -4
- package/scripts/lib/cursor-visual-render.mjs +1 -0
- package/scripts/platform-smoke/artifacts.mjs +124 -0
- package/scripts/platform-smoke/assertions.mjs +101 -0
- package/scripts/platform-smoke/card-detect.mjs +96 -0
- package/scripts/platform-smoke/crabbox-runner.mjs +215 -0
- package/scripts/platform-smoke/doctor.mjs +446 -0
- package/scripts/platform-smoke/jsonl-text.mjs +31 -0
- package/scripts/platform-smoke/live-suite-runner.mjs +677 -0
- package/scripts/platform-smoke/platform-build-windows.ps1 +187 -0
- package/scripts/platform-smoke/pty-capture.mjs +131 -0
- package/scripts/platform-smoke/render-ansi.mjs +65 -0
- package/scripts/platform-smoke/scenarios.mjs +186 -0
- package/scripts/platform-smoke/targets.mjs +900 -0
- package/scripts/platform-smoke/visual-evidence.mjs +139 -0
- package/scripts/platform-smoke.mjs +193 -0
- package/scripts/probe-mcp-coldstart.mjs +8 -1
- package/scripts/steering-rpc-smoke.mjs +1 -1
- package/scripts/tmux-live-smoke.sh +3 -3
- package/scripts/visual-tui-smoke.mjs +1 -1
- package/src/cursor-pi-tool-bridge-abort.ts +1 -0
- package/src/cursor-pi-tool-bridge-diagnostics.ts +12 -1
- package/src/cursor-pi-tool-bridge.ts +46 -1
- package/src/cursor-provider-errors.ts +18 -2
- package/src/cursor-provider-turn-lifecycle-emitter.ts +65 -8
- package/src/cursor-provider-turn-tool-ledger.ts +2 -3
- package/src/cursor-run-final-text.ts +11 -1
- package/src/cursor-sdk-process-error-guard.ts +1 -1
- package/src/cursor-state.ts +38 -19
- package/src/cursor-tool-lifecycle.ts +1 -1
- package/src/cursor-tool-manifest.ts +1 -1
- package/src/cursor-transcript-utils.ts +7 -3
@@ -0,0 +1,994 @@
# Platform Smoke Gate

Status: current release gate for Cursor provider/runtime changes. The Crabbox runner, packed-install platform-build suite, and real live PTY/ConPTY suite runner are implemented for macOS, Ubuntu, and Windows native targets with one-lease-per-target orchestration.

Branch introduced by: `feat/crabbox-platform-smoke`

Oracle review incorporated: this gate resolves the packed-install workspace conflict, Cursor budget contradiction, Windows shell drift, artifact-on-failure gap, render-location ambiguity, provider-debug ambiguity, and registry-classification gap called out during review.

## Decision

Crabbox is the required platform smoke runner for `pi-cursor-sdk` releases that touch Cursor provider/runtime behavior.

Inner-loop checks remain useful, but they are not release gates:

```bash
npm run verify
npm pack --dry-run
```

The required release gate is exactly:

```bash
npm run smoke:platform:all
```

`smoke:platform:all` runs `smoke:platform:doctor` first and only starts the target matrix after doctor passes. Maintainers may still run `npm run smoke:platform:doctor` by itself for setup diagnosis.

Per-target commands exist for diagnosis and iteration. They are not additional release-gate commands because requiring each per-target command plus `all` doubles Cursor token use.

No partial adoption exists. The release evidence must include macOS, Ubuntu, and Windows native passing through `smoke:platform:all`.

## Non-negotiable constraints

- No GitHub Actions dependency.
- No cloud provider dependency.
- No Crabbox broker/coordinator dependency.
- No release gate that runs on only one operating system.
- No release gate that proves command behavior but not TUI visual behavior.
- No platform release gate based on `pi -e .`.
- No skipped target because setup is missing; missing setup is a doctor failure.
- No one-prompt-per-card visual matrix.
- No `tmux` as the canonical visual test contract.
- No target passes from stdout alone when JSONL or visual proof is required.
- No target loses artifacts on failure.
- No hidden optional evidence. Every required artifact is produced or the suite fails.

## Required Crabbox baseline

The runner uses one supported Crabbox build.

Current baseline:

```text
install: brew install crabbox
version: 0.24.0
binary: /opt/homebrew/bin/crabbox on Apple Silicon Homebrew installs
```

Keep this exact version or replace it with another exact released Crabbox version when updating the gate. `smoke:platform:doctor` verifies the configured Crabbox binary and fails on mismatch.

Required Crabbox providers:

- `local-container` for Ubuntu.
- `ssh` static localhost for macOS. Static localhost leases use Crabbox's shared `static_localhost` lease id, so the runner passes `--reclaim` during macOS warmup to claim that lease for this repository before running suites.
- `parallels` for Windows native.

## Architecture

The source of truth is:

```text
scenario + target capability + artifact contract
```

not a one-off shell script.

High-level flow:

```text
platform-smoke.config.mjs
  -> target definition
  -> target session manager
  -> scenario suite runner
  -> PTY/ConPTY capture on the target
  -> artifact package/download
  -> host-side xterm/Playwright render
  -> visual evidence screenshot/assertion engine
  -> JSONL/assertion engine
  -> artifact manifest
```

Rendering is host-side. Targets capture the real ANSI stream; the macOS host renders it and captures per-evidence screenshots from the rendered xterm DOM. This keeps the renderer identical across macOS, Ubuntu, and Windows native and avoids browser dependency drift inside test targets.

## Target session model

Each target opens one Crabbox target session, syncs once, runs all suites for that target under one coherent target run id, collects artifacts, and stops/releases the target. The release-gate entrypoint runs required targets concurrently; each target still runs its own suites in order and fails fast within that target. Platform smoke disables Crabbox git-seed sync (`CRABBOX_SYNC_GIT_SEED=false`) so every run tests the current local checkout and uncommitted smoke-runner changes rather than a remote Git seed.

```text
start target session
verify target prerequisites
acquire or warm target
create unique remote run root
sync checkout once into extensionSourceRoot
run platform-build
run cursor-native-visual-matrix
run cursor-bridge-visual-matrix
run cursor-abort-cleanup
download artifacts after every suite
stop target
write lease-cleanup stop evidence
end target session
```

The target session fails fast. The release-gate path warms one Crabbox lease per target, performs one fresh sync, runs suites in order on that target, and stops that target after the first failure. Different targets run concurrently to keep wall time bounded by the slowest platform instead of the sum of all platforms. Per-suite commands remain available for diagnosis, but they are intentionally not the normal release path because repeated warmup/sync/install cycles make releases too slow.

Runtime budget is part of the contract:

- `smoke:platform:doctor` never calls Cursor.
- `platform-build` runs once per target and is the only suite that performs the full local CI/build/typecheck/package gate.
- Live suites reuse the target checkout and prepared `node_modules` when run after `platform-build`; they do not repeat `npm ci` in a target-session release run.
- Live suites share one target-local packed-install prep directory per target-session release run. The first live suite runs `npm pack` and `npm install --no-save <tarball>` once, then each suite still performs its own `pi install -l <packed package path>`, `pi list`, fresh `--session-dir`, suite `PI_CODING_AGENT_DIR`, workspace fixture, JSONL, visual, bridge, and abort assertions.
- Visual coverage is batched into one native prompt, one bridge prompt, and one abort/cleanup prompt per target. Do not split these into one prompt per card.
- The gate is fail-fast by target to avoid burning Cursor calls after a platform has already failed.
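
The fail-fast ordering within one target can be sketched as below. This is a minimal illustration under stated assumptions, not the runner's actual code: `runTargetSuites` and the `runSuite` callback are hypothetical names, and only the suite names come from this document.

```js
// Suite order taken from the target session model above.
const REQUIRED_SUITES = [
  "platform-build",
  "cursor-native-visual-matrix",
  "cursor-bridge-visual-matrix",
  "cursor-abort-cleanup",
];

// Hypothetical helper: runSuite(target, suite) is an assumed callback
// that returns { ok: boolean } for one suite on one target.
function runTargetSuites(target, runSuite, suites = REQUIRED_SUITES) {
  const results = [];
  for (const suite of suites) {
    const { ok } = runSuite(target, suite);
    results.push({ suite, ok });
    if (!ok) break; // fail fast: later suites never spend Cursor calls
  }
  // A target passes only if every suite ran and every suite passed.
  const passed = results.length === suites.length && results.every((r) => r.ok);
  return { target, passed, results };
}
```

A real runner would presumably be asynchronous, awaiting each suite in this loop and running the per-target loops concurrently; the synchronous form here only shows the ordering rule.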

## Required targets

| Target | Crabbox provider | Execution contract | TUI visual contract |
| --- | --- | --- | --- |
| `macos` | `ssh` static localhost | native macOS shell | PTY ANSI capture and host-side render |
| `ubuntu` | `local-container` | Docker Ubuntu container | PTY ANSI capture and host-side render |
| `windows-native` | `parallels` | Windows 11 clone, native PowerShell/Node | ConPTY ANSI capture and host-side render |

Ubuntu is covered as its own local-container target, and Windows native remains a full visual TUI target.
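
As a rough sketch of how the table and the "no partial adoption" rule could be encoded, the snippet below maps each target to its provider and capture mode and reports targets whose evidence is missing or failing. The object shape, the `capture` values, and `incompleteTargets` are assumptions for illustration only.

```js
// Assumed shape; provider names mirror the table above.
const TARGETS = {
  macos: { provider: "ssh", capture: "pty" },
  ubuntu: { provider: "local-container", capture: "pty" },
  "windows-native": { provider: "parallels", capture: "conpty" },
};

// Hypothetical helper: returns required targets that are absent from the
// evidence or did not pass. An empty array means the gate is complete.
function incompleteTargets(evidence, required = Object.keys(TARGETS)) {
  return required.filter((name) => evidence[name]?.passed !== true);
}
```

With this shape, a target that was skipped and a target that failed are treated the same way: both block the release.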

## Files and scripts

Files:

```text
platform-smoke.config.mjs
scripts/platform-smoke.mjs
scripts/platform-smoke/assertions.mjs
scripts/platform-smoke/artifacts.mjs
scripts/platform-smoke/card-detect.mjs
scripts/platform-smoke/crabbox-runner.mjs
scripts/platform-smoke/doctor.mjs
scripts/platform-smoke/live-suite-runner.mjs
scripts/platform-smoke/platform-build-windows.ps1
scripts/platform-smoke/pty-capture.mjs
scripts/platform-smoke/render-ansi.mjs
scripts/platform-smoke/scenarios.mjs
scripts/platform-smoke/targets.mjs
```

Package scripts:

```json
{
  "check:platform-smoke": "node --check <platform smoke scripts> && vitest run test/smoke-tooling.test.ts",
  "smoke:platform": "node scripts/platform-smoke.mjs",
  "smoke:platform:doctor": "node scripts/platform-smoke.mjs doctor",
  "smoke:platform:macos": "node scripts/platform-smoke.mjs run --target macos",
  "smoke:platform:ubuntu": "node scripts/platform-smoke.mjs run --target ubuntu",
  "smoke:platform:windows-native": "node scripts/platform-smoke.mjs run --target windows-native",
  "smoke:platform:all": "npm run smoke:platform:doctor && node scripts/platform-smoke.mjs run --target macos,ubuntu,windows-native"
}
```

Add `.artifacts/`, `.crabbox/`, and `.platform-smoke-runs/` to `.gitignore`.

## Configuration source

All repo-specific behavior lives in `platform-smoke.config.mjs` so the framework can be reused by other pi extensions.

Required config fields:

```js
export default {
  packageName: "pi-cursor-sdk",
  cursorModel: "cursor/composer-2-5",
  artifactRoot: ".artifacts/platform-smoke",
  requiredTargets: ["macos", "ubuntu", "windows-native"],
  requiredSuites: [
    "platform-build",
    "cursor-native-visual-matrix",
    "cursor-bridge-visual-matrix",
    "cursor-abort-cleanup",
  ],
  requiredCrabbox: {
    install: "brew install crabbox",
    version: "0.24.0",
  },
  ubuntuContainerImage: "cimg/node:24.16",
  nodeValidationMajor: 24,
};
```

`ubuntuContainerImage` defaults the local-container Ubuntu target to an Ubuntu 24.04 Node 24 image with a current glibc baseline for native test dependencies; Crabbox still bootstraps SSH/Git/rsync/curl as needed. `nodeValidationMajor: 24` is the release-smoke validation baseline. It does not change the package engine by itself. A separate compatibility lane can test Node 22.19 later; this required gate validates Node 24 on every target.
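
One way to catch a malformed config before any target starts is a small validator such as the sketch below. `validateSmokeConfig` is a hypothetical helper, not part of the package; the field names come from the config above, and the exact-version rule reflects the requirement to pin one released Crabbox version.

```js
// Hypothetical validator: returns a list of human-readable errors.
function validateSmokeConfig(config) {
  const errors = [];
  for (const key of ["packageName", "cursorModel", "artifactRoot", "ubuntuContainerImage"]) {
    if (typeof config[key] !== "string" || config[key] === "") {
      errors.push(`${key} must be a non-empty string`);
    }
  }
  for (const key of ["requiredTargets", "requiredSuites"]) {
    if (!Array.isArray(config[key]) || config[key].length === 0) {
      errors.push(`${key} must be a non-empty array`);
    }
  }
  // The gate pins an exact version, so ranges such as "^0.24.0" are rejected.
  const version = config.requiredCrabbox?.version ?? "";
  if (!/^\d+\.\d+\.\d+$/.test(version)) {
    errors.push("requiredCrabbox.version must be an exact x.y.z version");
  }
  if (!Number.isInteger(config.nodeValidationMajor)) {
    errors.push("nodeValidationMajor must be an integer");
  }
  return errors;
}
```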

## Required local environment

The doctor fails if any required value is missing.

```bash
PLATFORM_SMOKE_CRABBOX=/opt/homebrew/bin/crabbox

PLATFORM_SMOKE_MAC_HOST=localhost
PLATFORM_SMOKE_MAC_USER="$USER"
PLATFORM_SMOKE_MAC_WORK_ROOT="/Users/$USER/crabbox/pi-cursor-sdk"
PLATFORM_SMOKE_UBUNTU_IMAGE="cimg/node:24.16"

PLATFORM_SMOKE_WINDOWS_VM="Windows 11"
PLATFORM_SMOKE_WINDOWS_SNAPSHOT="crabbox-ready"
PLATFORM_SMOKE_WINDOWS_USER="<windows-ssh-user>"
PLATFORM_SMOKE_WINDOWS_NATIVE_WORK_ROOT="C:\\crabbox\\pi-cursor-sdk"

CURSOR_API_KEY="..."
```

Cursor auth is passed as a target process environment value. The key must not appear in repo files, artifacts, logs, or rendered output.
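
The two rules in this section (every value is required, and the key never reaches an artifact) might look like the following sketch. `missingEnv` and `redactSecret` are hypothetical helpers; the variable names are copied from the block above, and redaction is only one possible way to keep the key out of written output.

```js
// Variable names taken from the required local environment above.
const REQUIRED_ENV = [
  "PLATFORM_SMOKE_CRABBOX",
  "PLATFORM_SMOKE_MAC_HOST",
  "PLATFORM_SMOKE_MAC_USER",
  "PLATFORM_SMOKE_MAC_WORK_ROOT",
  "PLATFORM_SMOKE_UBUNTU_IMAGE",
  "PLATFORM_SMOKE_WINDOWS_VM",
  "PLATFORM_SMOKE_WINDOWS_SNAPSHOT",
  "PLATFORM_SMOKE_WINDOWS_USER",
  "PLATFORM_SMOKE_WINDOWS_NATIVE_WORK_ROOT",
  "CURSOR_API_KEY",
];

// Hypothetical helper: names that are unset or empty in the given env object.
function missingEnv(env, required = REQUIRED_ENV) {
  return required.filter((name) => !env[name]);
}

// Hypothetical helper: replace every occurrence of the secret before
// text is written to a log or artifact.
function redactSecret(text, secret) {
  return secret ? text.split(secret).join("[REDACTED]") : text;
}
```

In a real script, `missingEnv(process.env)` would feed the doctor's first check.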

## Workspace model

Every target session uses a unique run root.

```text
<targetWorkRoot>/runs/<run-id>/
  extension-source/      # synced repository under test
  test-workspace/        # live pi cwd and deterministic fixture repo
  pi-project/            # target-local pi settings for packed install
  artifacts/             # target-side suite artifacts
  pack/                  # packed tarball and install material

<targetWorkRoot>/runs/live-prep-<target-session>/
  packed-workspace/      # shared target-local npm install of the packed tarball
  pack/                  # shared live-suite tarball
  ready.json             # package path reused by later live suites
```

Definitions:

- `extensionSourceRoot`: synced repo used for `npm ci`, `npm test`, `npm run typecheck`, and `npm pack`.
- `testWorkspaceRoot`: cwd used by live Cursor suites. It contains deterministic fixture files the prompts operate on: `package.json`, `README.md`, `src/`, and suite scratch directories.
- `piProjectRoot`: target-local pi project where platform-build proves packed install.
- `livePrepRoot`: target-local shared live-suite prep where the first live suite installs the packed tarball once for reuse by later live suites in the same target session.

Live suites run in a suite-local `testWorkspaceRoot`. The extension loaded by pi is the packed tarball package path from `livePrepRoot`, installed into that suite-local workspace with `pi install -l`; no live suite uses `pi -e .`.

The runner must prove this by recording:

- packed tarball path;
- `pi list` output from the suite-local project after `pi install -l <packed package path>`;
- command line showing no `-e .`;
- live suite cwd as `testWorkspaceRoot`.
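
Two of the recorded proofs above lend themselves to a mechanical check. The sketch below assumes the runner records the live command as an argv array plus a cwd string; `checkLiveInvocation` and that input shape are hypothetical.

```js
// Hypothetical check over a recorded live-suite invocation.
// argv: array of command-line arguments; cwd: working directory used.
function checkLiveInvocation({ argv, cwd, testWorkspaceRoot }) {
  const failures = [];
  // "-e ." would load the extension from source instead of the packed install.
  const usesSourcePath = argv.some((arg, i) => arg === "-e" && argv[i + 1] === ".");
  if (usesSourcePath) failures.push("command line uses -e .");
  if (cwd !== testWorkspaceRoot) failures.push("cwd is not testWorkspaceRoot");
  return failures;
}
```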

## Target setup requirements

### macOS

Required:

- OpenSSH enabled on localhost.
- Configured SSH user logs in without interactive prompts.
- `git`, `rsync`, `tar`, `curl`, Node 24+, and npm are available.
- Work root is writable.
- `node-pty` self-test passes.

### Ubuntu

Required:

- Docker-compatible runtime is active.
- `crabbox doctor --provider local-container --json` passes.
- Required local image exists with Node 24+, npm, OpenSSH prerequisites, `git`, `rsync`, `curl`, `sudo`, `python3`, `tar`, and `ripgrep`.
- `node-pty` self-test passes in the container.

### Windows template VM

The user's daily Windows VM is not the long-term test target. A dedicated template VM and snapshot are required.

Template requirements:

- Windows 11.
- Parallels Tools installed.
- OpenSSH Server enabled.
- Stable SSH user configured.
- Node 24+ and npm installed for native Windows.
- Git for Windows installed.
- PowerShell available.
- `tar` available in native Windows PATH.
- `node-pty` self-test passes in native Windows.
- Source VM is powered off.
- Snapshot named `crabbox-ready` exists.

Crabbox Parallels creates linked clones from the powered-off snapshot. The source template VM is never used directly for smoke runs.

### Windows native

Required native probe:

```powershell
node --version
npm --version
git --version
tar --version
```

## Doctor command

`npm run smoke:platform:doctor` runs before any token-spending suite. The canonical `npm run smoke:platform:all` script runs doctor first and starts the macOS, Ubuntu, and Windows suites only after it passes.

Doctor checks:

1. Required env vars exist.
2. `PLATFORM_SMOKE_CRABBOX` exists and is executable.
3. Crabbox build matches the configured baseline.
4. Crabbox provider registry includes `local-container`, `ssh`, and `parallels`.
5. `crabbox doctor --provider local-container --json` passes.
6. Docker runtime is active.
7. macOS SSH localhost probe passes and sees Node, npm, Git, rsync, and tar.
8. `prlctl` exists.
9. Windows source VM exists.
10. Windows source snapshot exists.
11. Windows source VM is stopped and the configured snapshot is power-off/forkable for linked clones.
12. Disposable Windows native clone probe passes and sees Node, npm, Git, tar, and the configured SSH user.
13. Node 24+ is available on every target.
14. npm is available on every target.
15. `git` is available on every target.
16. `rsync` is available on macOS and Ubuntu.
17. `tar` is available on macOS and native Windows.
18. `node-pty` self-test passes on every target.
19. Target pi tool probe proves the shell tool accepts platform-rendered commands on every target.
20. Host-side xterm/Playwright render self-test passes.
21. `CURSOR_API_KEY` is present.
22. Artifact root is writable.
23. `git status --short` is recorded.
24. Forbidden tracked artifacts, package tarballs, `.env*`, auth files, and secrets are absent.

Doctor does not fail merely because the branch has uncommitted source or doc changes under test. It fails on forbidden artifacts and missing platform readiness.
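
A doctor that treats missing setup as a failure rather than a skip could aggregate its checks along these lines. This is a sketch under assumptions: `runDoctor` and the `{ name, run }` check shape are hypothetical, and running every check instead of stopping at the first failure is one possible design, not something this document specifies.

```js
// Hypothetical aggregator: each check is { name, run }, where run()
// returns true on success. Anything else counts as a failure.
function runDoctor(checks) {
  const failures = [];
  for (const { name, run } of checks) {
    let ok = false;
    try {
      ok = run() === true;
    } catch {
      ok = false; // a throwing probe is a failed check, never a skip
    }
    if (!ok) failures.push(name);
  }
  return { passed: failures.length === 0, failures };
}
```

Collecting all failures in one pass lets a maintainer fix several missing prerequisites before re-running, at no Cursor cost since doctor never calls Cursor.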

## Dependency spike before implementation

Before adding `node-pty` as a dev dependency, run a phase-zero spike on all three targets:

```text
node -e "require('node-pty'); console.log('node-pty ok')"
```

Windows native must use either verified prebuilt `node-pty` binaries for Node 24 or a documented build toolchain. If Node 24 + Windows native + `node-pty` cannot be made reliable, reject Crabbox as the required platform runner.

## Packed-install rule

Platform smoke tests the installed package, not the source extension path.

Per target, `platform-build` must:

1. Record `node --version` and assert the target Node major is at least `nodeValidationMajor`.
2. Run `npm ci` in `extensionSourceRoot`.
3. Run `npm run check:platform-smoke` on the target so smoke harness syntax and invariant tests fail before live Cursor calls, with only the target-local release-tag guard bypassed because Crabbox worktrees may not have release tags.
4. Run `npm test` on the target with the same target-local release-tag guard bypass.
5. Run `npm run typecheck`.
6. Run `npm pack`.
7. Create `testWorkspaceRoot` with deterministic fixture files copied from the repo.
8. Create `piProjectRoot`.
9. Install the packed tarball into `piProjectRoot` with `pi install -l <tarball>`.
10. Run `pi list` and assert the installed package points at the packed tarball/install, not `-e .`.
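
Step 1 of the list above reduces to parsing the `node --version` output and comparing the major version. A minimal sketch, with `nodeMajor` and `meetsNodeBaseline` as hypothetical helper names:

```js
// Parse the major version from `node --version` output such as "v24.16.0".
function nodeMajor(versionText) {
  const match = /^v?(\d+)\./.exec(versionText.trim());
  if (!match) throw new Error(`unrecognized Node version: ${versionText}`);
  return Number(match[1]);
}

// True when the recorded version satisfies the validation baseline.
function meetsNodeBaseline(versionText, nodeValidationMajor) {
  return nodeMajor(versionText) >= nodeValidationMajor;
}
```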

## Required suites

### `platform-build`

Cursor calls: `0`.

Purpose:

- prove build and package readiness on the target OS;
- fail before spending Cursor tokens;
- produce the packed extension used by later suites.

The host `smoke:platform:all` entrypoint enforces doctor first and then the release-version reuse guard before running targets, using local git tags and `package.json`. Required artifacts include `node-version.txt`, `npm-version.txt`, stdout/stderr for `npm ci`, `npm run check:platform-smoke`, `npm test`, `npm run typecheck`, `npm pack`, packed npm install, `pi install`, and `pi list`, plus `packed-tarball.txt`, `summary.json`, `artifact-manifest.json`, `assertions.json`, and `failures.md` on failed assertions.

### `cursor-native-visual-matrix`

Cursor calls: `1`.

Required environment:

```text
PI_CURSOR_SETTING_SOURCES=none
PI_CURSOR_NATIVE_TOOL_DISPLAY=1
PI_CURSOR_REGISTER_NATIVE_TOOLS=1
PI_CURSOR_PI_TOOL_BRIDGE=0
PI_CURSOR_EXPOSE_BUILTIN_TOOLS=0
PI_CURSOR_SDK_EVENT_DEBUG=1
```

Purpose:

- prove provider reality;
- prove native Cursor tool replay;
- prove deterministic TUI card rendering;
- prove JSONL toolCall/toolResult correctness;
- prove footer/status readability.

The prompt is rendered per target. Shell command steps are platform-specific:

```text
success POSIX: printf 'cursor visual smoke\n'
success PowerShell: Write-Output 'cursor visual smoke'
failure POSIX: sh -c 'echo native shell failure >&2; exit 7'
failure PowerShell: Write-Error 'native shell failure'; exit 7
```
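
Per-target rendering of these commands can be sketched as a lookup plus placeholder substitution. The command strings and placeholder names come from this document; `shellStepsFor`, `renderPrompt`, and the rule that only `windows-native` uses PowerShell are assumptions for illustration.

```js
// Command strings copied from the platform-specific list above.
const SHELL_STEPS = {
  posix: {
    success: "printf 'cursor visual smoke\\n'",
    failure: "sh -c 'echo native shell failure >&2; exit 7'",
  },
  powershell: {
    success: "Write-Output 'cursor visual smoke'",
    failure: "Write-Error 'native shell failure'; exit 7",
  },
};

// Assumption: windows-native is the only PowerShell target.
function shellStepsFor(target) {
  return target === "windows-native" ? SHELL_STEPS.powershell : SHELL_STEPS.posix;
}

// Substitute the placeholders used by the prompt template. Replacement
// functions avoid special handling of "$" sequences in String.replace.
function renderPrompt(template, target) {
  const steps = shellStepsFor(target);
  return template
    .replace("<platform-rendered-success-command>", () => steps.success)
    .replace("<platform-rendered-failure-command>", () => steps.failure);
}
```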

Required prompt template:

```text
Native visual matrix.

Use Cursor-native tools only. Do not use pi__ tools.

Steps:
1. read ./package.json and remember the package name.
2. grep ./README.md for "pi-cursor-sdk".
3. find README.md from repo root.
4. find src/cursor-provider.ts from repo root.
5. run shell: <platform-rendered-success-command>
6. write .debug/platform-smoke/<run-id>/native.txt with alpha and beta.
7. edit beta to gamma in that file.
8. run shell and preserve the failure: <platform-rendered-failure-command>
9. answer exactly:
   NATIVE_MATRIX_OK package=<name> grep=<yes/no> find=<yes/no> list=<yes/no> shell=<yes/no> shell_fail=<yes/no> write=<yes/no> edit=<yes/no>
```

Required final marker: `NATIVE_MATRIX_OK`.

Required visual card evidence:

- `read`
- `grep`
- `find`
- `shell-success`
- `write`
- `edit-diff`
- `shell-failure`
- `footer-status`

Required JSONL evidence:

- successful `read`, `grep`, `find`/`glob`, `shell`, `write`, and `edit` results;
- successful native `find` result proving `src/cursor-provider.ts` was enumerated;
- failed shell result with `isError=true` and `native shell failure` output;
- final assistant message's last non-empty `text` part contains `NATIVE_MATRIX_OK`;
- assistant usage fields are non-negative;
- `cacheRead=0` and `cacheWrite=0`.
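
The last three JSONL requirements can be expressed as one assertion over the session log. The sketch below assumes a simplified entry shape, `{ role, content: [{ type, text }], usage }`, which may not match pi's real session JSONL; `checkFinalAssistant` is a hypothetical helper and would need adapting to the actual schema.

```js
// Hypothetical check; the entry shape is an assumption, see the note above.
function checkFinalAssistant(jsonlText, marker) {
  const entries = jsonlText
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
  const assistant = entries.filter((entry) => entry.role === "assistant").pop();
  if (!assistant) return ["no assistant message"];

  const failures = [];
  // Only the last non-empty text part counts, not any earlier part.
  const texts = (assistant.content ?? [])
    .filter((part) => part.type === "text" && typeof part.text === "string" && part.text.trim() !== "")
    .map((part) => part.text);
  const lastText = texts[texts.length - 1] ?? "";
  if (!lastText.includes(marker)) failures.push(`last text part lacks ${marker}`);

  const usage = assistant.usage ?? {};
  for (const [field, value] of Object.entries(usage)) {
    if (typeof value === "number" && value < 0) failures.push(`usage.${field} is negative`);
  }
  if (usage.cacheRead !== 0 || usage.cacheWrite !== 0) {
    failures.push("cacheRead and cacheWrite must be 0");
  }
  return failures;
}
```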

### `cursor-bridge-visual-matrix`

Cursor calls: `1`.

Required environment:

```text
PI_CURSOR_SETTING_SOURCES=none
PI_CURSOR_NATIVE_TOOL_DISPLAY=1
PI_CURSOR_REGISTER_NATIVE_TOOLS=1
PI_CURSOR_PI_TOOL_BRIDGE=1
PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1
PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1
PI_CURSOR_SDK_EVENT_DEBUG=1
```

Purpose:

- prove pi bridge request routing;
- prove successful bridge tool card;
- prove failed bridge tool card;
- prove bridge shell card;
- prove bridge diagnostics and JSONL use real pi tool names.

The bridge shell call uses pi's `bash` tool on every target, including Windows native. The command is shell-neutral and relies only on Node, which every target already validates:

```text
node -e "console.log('bridge visual smoke')"
```

Required prompt template:

```text
Bridge visual matrix.

Use pi bridge tools only. Use exact pi__ names.

You must make exactly three pi bridge tool calls before the final answer: pi__bash, pi__read, then pi__read. Do not answer until all three calls complete.

Steps:
1. call pi__bash with command: <platform-rendered-shell-command>
2. call pi__read on ./package.json.
3. call pi__read on ./definitely-missing-platform-smoke-file.txt.
4. answer exactly:
   BRIDGE_MATRIX_OK bash_ok=<yes/no> read_ok=<yes/no> read_missing_error=<yes/no>
```

Required final marker: `BRIDGE_MATRIX_OK`.

Required visual card evidence:

- `bridge-read-success`
- `bridge-read-failure`
- `bridge-shell-success`
- `footer-status`

Required diagnostics evidence:

- `run_created`
- `tools_exposed`
- at least one rendered `request_resolved` bridge diagnostic event
- no bridge endpoint URL in collected artifacts
- no bearer token
- no auth/token JSON field payload
- no `CURSOR_API_KEY`
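
The negative requirements in the diagnostics list amount to a leak scan over collected artifact text. The sketch below is illustrative only: `findLeaks` is a hypothetical helper, the bearer-token pattern is a generic guess, and treating the bridge endpoint as a loopback HTTP URL is an assumption this document does not confirm.

```js
// Hypothetical leak scan; patterns are assumptions to tune against
// the real diagnostics format.
function findLeaks(text, apiKey) {
  const leaks = [];
  if (apiKey && text.includes(apiKey)) leaks.push("CURSOR_API_KEY value");
  // Generic "Bearer <token>" shape as used in Authorization headers.
  if (/\bBearer\s+[A-Za-z0-9._~+\/-]+=*/i.test(text)) leaks.push("bearer token");
  // Assumption: the bridge endpoint is a loopback URL.
  if (/https?:\/\/(127\.0\.0\.1|localhost)(:\d+)?\//i.test(text)) leaks.push("bridge endpoint URL");
  return leaks;
}
```

A suite would run such a scan over every collected text artifact and fail if any file returns a non-empty list.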

Required JSONL evidence:

- real pi tool call named `read`, success;
- real pi tool call named `read`, failure;
- real pi tool call named `bash`, success;
- final assistant message's last non-empty `text` part contains `BRIDGE_MATRIX_OK`;
- assistant usage fields are non-negative;
- `cacheRead=0` and `cacheWrite=0`.

### `cursor-abort-cleanup`

Cursor calls: `1`, intentionally interrupted.

Required environment:

```text
PI_CURSOR_SETTING_SOURCES=none
PI_CURSOR_NATIVE_TOOL_DISPLAY=1
PI_CURSOR_REGISTER_NATIVE_TOOLS=1
PI_CURSOR_PI_TOOL_BRIDGE=1
PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1
PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1
PI_CURSOR_SDK_EVENT_DEBUG=1
```

Purpose:

- prove long-running bridge cancellation;
- prove no orphan processes;
- prove no false successful answer.

The long-running bridge command uses pi's `bash` tool on every target and relies only on Node, which every target already validates:

```text
node -e "const fs=require('fs');fs.mkdirSync('.debug/platform-smoke',{recursive:true});fs.writeFileSync('.debug/platform-smoke/abort-started.txt',String(process.pid));setTimeout(() => console.log(process.env.PLATFORM_ABORT_MARKER), 30000)"
```

Required prompt template:

```text
Abort cleanup matrix.

Call pi__bash with command:
<platform-rendered-long-running-command>

Do not answer until the tool completes.
```

The harness interrupts after the bridge request is queued.

Required evidence:

- process snapshot before run;
- process snapshot after interrupt;
- `.debug/platform-smoke/abort-started.txt` was written by the long-running process before interrupt;
- no `PLATFORM_ABORT_MARKER` long-running command remains;
- no `SHOULD_NOT_PRINT` process remains;
- marker-scoped bridge/bash/node process cleanup is recorded in `leftover-process-check`;
- no final successful assistant answer claiming completion;
- bridge diagnostics in `artifacts/bridge-diagnostics.jsonl` include `request_queued` for `pi__bash`, `run_cancelled`, and cancelled `request_rejected`;
- cancellation or abort state is visible;
- no successful output contains `SHOULD_NOT_PRINT`.
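
The orphan-process evidence can be reduced to filtering the post-interrupt snapshot for marker-bearing command lines. In this sketch the snapshot is assumed to be an array of command-line strings; `leftoverProcesses` is a hypothetical helper, and how the snapshot is captured on each platform is out of scope.

```js
// Markers named in the required evidence above.
const ABORT_MARKERS = ["PLATFORM_ABORT_MARKER", "SHOULD_NOT_PRINT"];

// Hypothetical helper: command lines from the post-interrupt snapshot
// that still mention one of the markers. An empty result means clean.
function leftoverProcesses(snapshot, markers = ABORT_MARKERS) {
  return snapshot.filter((commandLine) =>
    markers.some((marker) => commandLine.includes(marker)),
  );
}
```

A real check would also need to exclude the checking process itself if its own command line carries a marker.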

## Cursor usage budget

Per target maximum live Cursor invocations:

```text
cursor-native-visual-matrix: 1
cursor-bridge-visual-matrix: 1
cursor-abort-cleanup: 1
```

Maximum per target: `3` Cursor invocations.
|
|
597
|
+
|
|
598
|
+
Maximum full gate: `12` Cursor invocations.
|
|
599
|
+
|
|
600
|
+
The merge gate is `npm run smoke:platform:all`; that script runs doctor first and then the matrix to preserve this budget. No suite adds a new Cursor invocation without updating this plan and `platform-smoke.config.mjs`.
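
A budget guard for these limits might look like the sketch below. The limits are copied from this section; the `observed` input shape (target name mapped to per-suite invocation counts) and the function name are assumptions, not something `platform-smoke.config.mjs` is known to expose.

```javascript
// Limits copied from the plan above.
const BUDGET = {
  perSuite: {
    "cursor-native-visual-matrix": 1,
    "cursor-bridge-visual-matrix": 1,
    "cursor-abort-cleanup": 1,
  },
  perTarget: 3,
  fullGate: 12,
};

// observed (hypothetical shape): { [target]: { [suite]: invocationCount } }
function checkCursorBudget(observed, budget = BUDGET) {
  const violations = [];
  let total = 0;
  for (const [target, suites] of Object.entries(observed)) {
    let targetTotal = 0;
    for (const [suite, count] of Object.entries(suites)) {
      // Suites without a budget entry are allowed zero live invocations.
      const limit = budget.perSuite[suite] ?? 0;
      if (count > limit) violations.push(`${target}/${suite}: ${count} > ${limit}`);
      targetTotal += count;
    }
    if (targetTotal > budget.perTarget) {
      violations.push(`${target}: ${targetTotal} > ${budget.perTarget}`);
    }
    total += targetTotal;
  }
  if (total > budget.fullGate) violations.push(`full gate: ${total} > ${budget.fullGate}`);
  return violations;
}
```

An empty array means the run stayed within budget; any entry names the suite, target, or gate total that exceeded its limit.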

## Artifact contract

Every target session writes under:

```text
.artifacts/platform-smoke/<run-id>/<target>/
```

Every suite writes under:

```text
.artifacts/platform-smoke/<run-id>/<target>/<suite>/
```

Common required artifacts:

```text
summary.json
artifact-manifest.json
target.json
suite.json
command.txt
exit-code.txt
crabbox.stdout.txt
crabbox.stderr.txt
crabbox.timing.json
assertions.json
failures.md # only when assertions fail
```

Required `platform-build` artifacts:

```text
node-version.txt
npm-version.txt
npm-ci.stdout.txt
npm-ci.stderr.txt
check-platform-smoke.stdout.txt
check-platform-smoke.stderr.txt
npm-test.stdout.txt
npm-test.stderr.txt
typecheck.stdout.txt
typecheck.stderr.txt
npm-pack.stdout.txt
npm-pack.stderr.txt
packed-tarball.txt
packed-node-install.stdout.txt
packed-node-install.stderr.txt
pi-install.stdout.txt
pi-install.stderr.txt
pi-list.stdout.txt
pi-list.stderr.txt
```

Every target-session release run also writes a `lease-cleanup/` suite directory under the same target run id:

```text
lease-cleanup/summary.json
lease-cleanup/assertions.json
lease-cleanup/crabbox.stop.stdout.txt
lease-cleanup/crabbox.stop.stderr.txt
lease-cleanup/crabbox.stop.exit-code.txt
```

A stop failure is a failed target result, even when all functional suites passed.

Required PTY artifacts for live suites:

```text
pty.events.jsonl
terminal.ansi
terminal.txt
terminal.html
terminal.full.png
terminal.final-viewport.png
```

Required card artifacts:

```text
cards/
  index.html
  cards.json
  *.png
```

Required live session and provider-debug artifacts:

```text
artifacts/session.jsonl
cursor-sdk-events/
sessions/**/session.json
sessions/**/<turn-artifact>.json or .jsonl
```

Required abort artifacts:

```text
artifacts/abort-started.txt
logs/process-before.stdout.txt
logs/process-after.stdout.txt
logs/leftover-process-check.stdout.txt
```

Provider debug artifacts are required for every live suite through `PI_CURSOR_SDK_EVENT_DEBUG=1` and suite-scoped debug dirs.
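
The contract above lends itself to a simple presence check. The sketch below covers only the common and PTY lists and takes a plain array of file names relative to the suite directory; the function name and the `live` option are illustrative assumptions, and `failures.md` is left out because it is conditional.

```javascript
// File names copied from the "Common required artifacts" list.
const COMMON_REQUIRED = [
  "summary.json",
  "artifact-manifest.json",
  "target.json",
  "suite.json",
  "command.txt",
  "exit-code.txt",
  "crabbox.stdout.txt",
  "crabbox.stderr.txt",
  "crabbox.timing.json",
  "assertions.json",
];

// File names copied from the "Required PTY artifacts for live suites" list.
const PTY_REQUIRED = [
  "pty.events.jsonl",
  "terminal.ansi",
  "terminal.txt",
  "terminal.html",
  "terminal.full.png",
  "terminal.final-viewport.png",
];

// presentFiles: file names relative to the suite directory (assumed input).
function missingArtifacts(presentFiles, { live = false } = {}) {
  const present = new Set(presentFiles);
  const required = live ? [...COMMON_REQUIRED, ...PTY_REQUIRED] : COMMON_REQUIRED;
  return required.filter((name) => !present.has(name));
}
```

For example, `missingArtifacts(files, { live: true })` returns the names a live suite still owes; an empty array means both lists are satisfied.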

## Artifact collection on failure

Crabbox success-path download is not sufficient. The target-side suite wrapper must always package artifacts before returning to the host.

Required target wrapper behavior:

1. Run the scenario.
2. Capture real scenario exit/assertion state in `exit-code.txt` and `assertions.json`.
3. Write `failures.md` when assertions fail.
4. Package the suite artifact directory.
5. Exit `0` for Crabbox transport so the archive downloads.
6. Let the host runner fail after unpacking and reading `assertions.json`.

Crabbox command exit means transport status. Suite pass/fail comes from `assertions.json`.
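
The split between transport status and suite verdict can be sketched as a host-side decision function. The function name and `reason` strings are hypothetical, and treating a top-level `ok: true` with a failing check as a failure is an added assumption rather than something this plan states.

```javascript
// transportExitCode: exit code of the Crabbox command (transport only).
// assertions: parsed assertions.json from the unpacked archive, or null.
function resolveSuiteResult({ transportExitCode, assertions }) {
  if (transportExitCode !== 0) {
    // The archive may not have downloaded, so nothing else can be trusted.
    return { ok: false, reason: "transport-failed", failed: [] };
  }
  if (!assertions || !Array.isArray(assertions.checks)) {
    return { ok: false, reason: "assertions-missing", failed: [] };
  }
  const failed = assertions.checks.filter((c) => c.ok !== true).map((c) => c.id);
  if (failed.length > 0 || assertions.ok !== true) {
    return { ok: false, reason: "assertions-failed", failed };
  }
  return { ok: true, reason: "passed", failed: [] };
}
```

Because the wrapper exits `0` even when the scenario fails, the verdict only ever comes from the last two branches once the archive has arrived.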

Archive names:

```text
<target>-<suite>-artifacts.tar.gz # macOS, Ubuntu
<target>-<suite>-artifacts.zip # Windows native
```

The host unpacks into the canonical artifact directory and verifies `artifact-manifest.json`.
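
As a rough illustration of the naming rule and the manifest check, consider the sketch below. The plan does not define the layout of `artifact-manifest.json`, so the `{ files: [{ path, bytes }] }` shape is purely hypothetical, as are the function names; the `shell` values reuse the `posix` and `powershell` labels from the command-rendering section.

```javascript
// Archive name per the table above: zip on Windows native, tar.gz elsewhere.
function archiveName(target, suite, shell) {
  const ext = shell === "powershell" ? "zip" : "tar.gz";
  return `${target}-${suite}-artifacts.${ext}`;
}

// manifest (hypothetical shape): { files: [{ path: string, bytes: number }] }
// unpacked: Map of relative path -> size in bytes, built by the host after unpacking.
function verifyManifest(manifest, unpacked) {
  const problems = [];
  for (const entry of manifest.files) {
    if (!unpacked.has(entry.path)) {
      problems.push(`missing: ${entry.path}`);
    } else if (unpacked.get(entry.path) !== entry.bytes) {
      problems.push(`size mismatch: ${entry.path}`);
    }
  }
  return problems;
}
```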

## Assertion contract

Each suite produces `assertions.json`:

```json
{
  "ok": true,
  "target": "ubuntu",
  "suite": "cursor-native-visual-matrix",
  "checks": [
    { "id": "final-marker", "ok": true },
    { "id": "card-read", "ok": true },
    { "id": "jsonl-read", "ok": true }
  ]
}
```

Failures produce `failures.md` with:

- target;
- suite;
- failed assertion IDs;
- artifact paths;
- command summary;
- next diagnostic command.
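
One way to produce `failures.md` from the fields listed above is sketched here. The Markdown layout, the function name, and the `artifactDir`, `command`, and `nextCommand` parameters are assumptions for illustration; only the six content items come from this section.

```javascript
// assertions: parsed assertions.json. The other fields are supplied by the
// caller (hypothetical parameters, not names used by the package).
function renderFailuresMarkdown({ assertions, artifactDir, command, nextCommand }) {
  const failed = assertions.checks.filter((c) => !c.ok).map((c) => c.id);
  // failures.md is written only when assertions fail.
  if (failed.length === 0) return null;
  return [
    `# Failures: ${assertions.target} / ${assertions.suite}`,
    "",
    `- target: ${assertions.target}`,
    `- suite: ${assertions.suite}`,
    `- failed assertion IDs: ${failed.join(", ")}`,
    `- artifact paths: ${artifactDir}`,
    `- command summary: ${command}`,
    `- next diagnostic command: ${nextCommand}`,
    "",
  ].join("\n");
}
```

Returning `null` for a passing suite lets the caller skip the write, which matches the "only when assertions fail" note in the artifact contract.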

## Visual evidence detector

The detector operates on host-rendered terminal HTML and PNG evidence. It must not pass from prompt text alone.

Required behavior:

- render ANSI with xterm/Playwright and assert the terminal DOM/theme is present, styled, non-empty, and screenshotted;
- search the rendered xterm buffer for suite-owned evidence patterns that correspond to actual tool output/results, not instructions in the prompt;
- scroll to each evidence line and write `cards/<evidence-id>.png` screenshots plus `visual-evidence.json`;
- write `cards.json` for the legacy rendered-evidence inventory;
- fail when required visual evidence is missing;
- fail when a card/evidence item has the wrong success/error state;
- fail when footer/status is missing or unreadable.

Meaningful gap closed: earlier card assertions could pass when the prompt mentioned `pi__read` or a missing-file path even if the actual tool card/result never rendered. The gate now requires JSONL result evidence and per-evidence rendered screenshots for native read, native shell success/failure, native edit diffs, bridge read success/failure, and bridge shell success.
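
The "must not pass from prompt text alone" rule can be illustrated with a small buffer search. This is only a sketch of the idea under loud assumptions: the real detector works on a rendered xterm buffer, whereas this version takes an array of already-extracted row strings, and the prompt-echo heuristic (skip any row containing a full prompt line) and the evidence IDs are invented for the example.

```javascript
// bufferLines: rendered terminal rows as plain strings (assumed input).
// promptText: the prompt that was sent, used to ignore echoed instructions.
// patterns: [{ id, pattern }] where pattern is a RegExp without the "g" flag,
//           so that repeated test() calls stay stateless.
function findVisualEvidence(bufferLines, promptText, patterns) {
  const promptLines = promptText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "");

  return patterns.map(({ id, pattern }) => {
    const row = bufferLines.findIndex((line) => {
      const text = line.trim();
      // Rows that merely echo the prompt never count as evidence.
      if (promptLines.some((promptLine) => text.includes(promptLine))) return false;
      return pattern.test(text);
    });
    return { id, ok: row !== -1, row };
  });
}
```

The returned `row` is the index a screenshot step could scroll to before writing `cards/<evidence-id>.png`.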

## Registry visual classification

The implementation must classify every `CURSOR_TOOL_PRESENTATION_SPECS` entry from `src/cursor-tool-presentation-registry.ts` as required or excluded for the release visual gate. A validation check fails when a registry entry lacks classification.

Required deterministic cards:

- `read`
- `grep`
- `glob` / find
- `shell`
- `write`
- `edit`
- failed `read`

Excluded from release visual matrix with required rationale:

- `delete`: destructive and redundant with file mutation card coverage.
- `readLints`: dependent on target diagnostics state.
- `updateTodos`: model workflow dependent.
- `createPlan`: model workflow dependent.
- `task`: model/task orchestration dependent.
- `generateImage`: external image generation surface.
- `mcp`: separate MCP integration surface beyond built-in bridge smoke.
- `semSearch`: semantic index state dependent.
- `recordScreen`: desktop capture dependency outside terminal smoke.
- `webSearch`: network/search dependent.
- `webFetch`: network dependent.

Adding a registry entry requires adding it to the required or excluded list with rationale. `ls` is currently excluded from the required one-prompt matrix because composer-2-5 does not route the deterministic source-enumeration step through the native `ls` surface reliably; the suite instead gates that behavior through a successful native `find` result for `src/cursor-provider.ts`.
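
The validation check described in this section could be as small as the sketch below. It assumes the registry can be reduced to a list of entry names and that those names match the identifiers used in the lists above; neither is confirmed by this plan. "failed `read`" is treated as a state of the `read` card rather than a separate registry entry, which is also an assumption.

```javascript
// Names copied from the lists above; their match to registry keys is assumed.
const REQUIRED_CARDS = ["read", "grep", "glob", "shell", "write", "edit"];
const EXCLUDED_CARDS = {
  delete: "destructive and redundant with file mutation card coverage",
  readLints: "dependent on target diagnostics state",
  updateTodos: "model workflow dependent",
  createPlan: "model workflow dependent",
  task: "model/task orchestration dependent",
  generateImage: "external image generation surface",
  mcp: "separate MCP integration surface beyond built-in bridge smoke",
  semSearch: "semantic index state dependent",
  recordScreen: "desktop capture dependency outside terminal smoke",
  webSearch: "network/search dependent",
  webFetch: "network dependent",
  ls: "not routed through the native ls surface reliably; gated via native find",
};

// registryNames: entry names taken from the presentation registry (assumed input).
function validateClassification(registryNames, required = REQUIRED_CARDS, excluded = EXCLUDED_CARDS) {
  const problems = [];
  for (const name of registryNames) {
    const isRequired = required.includes(name);
    const isExcluded = Object.prototype.hasOwnProperty.call(excluded, name);
    if (!isRequired && !isExcluded) problems.push(`${name}: unclassified`);
    if (isRequired && isExcluded) problems.push(`${name}: classified twice`);
    if (isExcluded && String(excluded[name]).trim() === "") {
      problems.push(`${name}: excluded without rationale`);
    }
  }
  return problems;
}
```

A non-empty result would fail the gate, so a newly added registry entry cannot ship without a deliberate required-or-excluded decision.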

## Platform command rendering

Scenario commands are not raw shell strings. The runner renders commands per target:

- `posix` for macOS and Ubuntu.
- `powershell` for Windows native.

Scenario shape:

```js
{
  id: "cursor-native-visual-matrix",
  requires: ["cursor-auth", "pty", "packed-install"],
  promptTemplate: "... <platform-command:shellSmoke> ...",
  commands: {
    shellSmoke: {
      posix: "printf 'cursor visual smoke\\n'",
      powershell: "Write-Output 'cursor visual smoke'",
    },
  },
  assertions: ["final-marker", "required-cards", "jsonl-tools"],
}
```

The renderer owns quoting, path normalization, environment assignment, and archive packaging.
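
Placeholder substitution for the scenario shape above might be sketched as follows. The `<platform-command:NAME>` syntax and the `posix`/`powershell` keys come from this section; the function name and the decision to throw on a missing rendering are assumptions, and quoting, path normalization, and environment assignment are deliberately left out.

```javascript
// scenario: an object in the shape shown above.
// shell: "posix" or "powershell".
function renderPrompt(scenario, shell) {
  return scenario.promptTemplate.replace(
    /<platform-command:([A-Za-z0-9_-]+)>/g,
    // A replacer function avoids "$" in commands being read as a
    // special replacement pattern.
    (_match, name) => {
      const variants = scenario.commands ? scenario.commands[name] : undefined;
      if (!variants || typeof variants[shell] !== "string") {
        throw new Error(`scenario ${scenario.id}: no ${shell} rendering for command "${name}"`);
      }
      return variants[shell];
    },
  );
}
```

Failing loudly on a missing variant means a scenario that forgets a target is caught at render time, before any Cursor invocation is spent.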

## Security and redaction

The runner must scan every artifact and fail on:

- the literal `CURSOR_API_KEY` value;
- bearer tokens;
- auth headers;
- cookies;
- bridge endpoint URLs;
- raw Cursor SDK auth payloads;
- contents of `~/.pi/agent/auth.json`.

Bridge diagnostics may include safe tool names and correlation IDs only.
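
A text scanner for part of this list could look like the sketch below. The regular expressions are illustrative assumptions rather than the runner's actual rules, and the sketch covers only the API key value, bearer tokens, auth headers, and cookies; bridge endpoint URLs, SDK auth payloads, and `auth.json` contents would need patterns that this plan does not specify.

```javascript
// text: the contents of one artifact, decoded as a string.
// apiKey: the literal CURSOR_API_KEY value, passed in by the caller.
function scanForSecrets(text, { apiKey } = {}) {
  const findings = [];
  if (apiKey && text.includes(apiKey)) findings.push("cursor-api-key-value");
  // "Bearer" followed by a token-like run of at least 16 characters.
  if (/\bBearer\s+[A-Za-z0-9._~+\/-]{16,}=*/.test(text)) findings.push("bearer-token");
  // Header-style keys, quoted or not, followed by ":" or "=".
  if (/["']?\b(?:proxy-)?authorization\b["']?\s*[:=]/i.test(text)) findings.push("auth-header");
  if (/["']?\b(?:set-)?cookie\b["']?\s*[:=]/i.test(text)) findings.push("cookie");
  return findings;
}
```

Any non-empty result would fail the artifact; the finding names are labels only and intentionally never echo the matched secret.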

## Implementation phases

### Phase 0: plan-only branch state

Create this plan on `feat/crabbox-platform-smoke`. Do not implement code in this phase.

### Phase 1: dependency spike

Verify `node-pty` and ConPTY on every target before committing the dependency.

Exit criteria:

- node-pty self-test passes on macOS;
- node-pty self-test passes on Ubuntu local-container;
- node-pty self-test passes on Windows native Node 24.

### Phase 2: config and doctor

Add config, CLI skeleton, doctor, and npm scripts.

Exit criteria:

```bash
npm run smoke:platform:doctor
```

passes only when all required local setup exists.

### Phase 3: target session manager

Implement Crabbox target lifecycle for all three targets.

Exit criteria:

- each target can acquire/warm;
- each target can sync;
- each target can run `node --version`;
- each target can package/download a trivial artifact;
- each target can stop/cleanup;
- one lease per target session.

### Phase 4: `platform-build`

Implement build/package/install suite.

Exit criteria: `platform-build` passes on all targets through `smoke:platform:all -- --suite platform-build` without live Cursor calls.

### Phase 5: PTY capture and host render

Implement PTY/ConPTY capture and host-side xterm/Playwright render.

Exit criteria:

- ANSI capture works on all targets;
- host render writes HTML, full PNG, and final viewport PNG;
- visual evidence detector can capture fixture evidence screenshots.

### Phase 6: native visual matrix

Implement one-call native matrix.

Exit criteria:

- all required native visual evidence screenshots are captured on every target;
- JSONL assertions pass on every target;
- Cursor call budget remains one call per target.

### Phase 7: bridge visual matrix

Implement one-call bridge matrix.

Exit criteria:

- all required bridge visual evidence screenshots are captured on every target;
- bridge diagnostics assertions pass on every target;
- JSONL assertions pass on every target.

### Phase 8: abort cleanup

Implement interrupted bridge run.

Exit criteria:

- no leftovers on any target;
- no false success in JSONL;
- target session stops cleanly.

### Phase 9: docs and legacy cleanup

Update:

- `README.md`
- `docs/cursor-live-smoke-checklist.md`
- `docs/cursor-testing-lessons.md`
- `docs/cursor-native-tool-visual-audit.md`

They must state:

- required release gate is `npm run smoke:platform:all`;
- legacy smoke scripts are inner-loop/debug helpers;
- `tmux` visual smoke is not the canonical cross-platform gate.

## Release bar

A provider/runtime release is ready only after this exact command passes on the maintainer machine:

```bash
npm run smoke:platform:all
```

The command runs doctor first and then all required targets and suites in one full gate execution.

## Gate replacement criteria

Replace or redesign this platform runner if any of these become true:

- Parallels Windows linked clones are unreliable.
- Windows native cannot run the required ConPTY visual matrix.
- macOS static SSH localhost cannot run the required PTY visual matrix.
- Ubuntu local-container cannot run the required PTY visual matrix.
- Packed install cannot be tested uniformly across all targets.
- Artifact transfer cannot be made uniform across success and failure.
- The visual card detector cannot reliably identify required deterministic cards.
- The full gate exceeds the fixed Cursor invocation budget.
- Node 24 + `node-pty` cannot be made reliable on Windows native.

If the gate is replaced, document the new cross-platform release process before removing this one. Existing local smoke scripts remain inner-loop/debug helpers, not release gates.

## Portability to other pi extensions

Repo-specific pieces:

- `platform-smoke.config.mjs`
- expected package name
- model IDs
- scenario prompts
- required visual card matrix
- final markers

Reusable pieces:

- Crabbox target session manager
- PTY/ConPTY capture
- host-side ANSI render
- artifact manifest writer
- JSONL parser
- visual evidence detector
- process cleanup checker
- target doctor

The framework is successful when another pi extension can copy the runner and change only its config plus scenarios.
|