pi-cursor-sdk 0.1.20 → 0.1.21

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (88)
  1. package/CHANGELOG.md +32 -0
  2. package/README.md +49 -9
  3. package/docs/cursor-dogfood-checklist.md +57 -0
  4. package/docs/cursor-live-smoke-checklist.md +115 -9
  5. package/docs/cursor-model-ux-spec.md +57 -17
  6. package/docs/cursor-native-tool-replay.md +15 -7
  7. package/docs/cursor-native-tool-visual-audit.md +104 -59
  8. package/docs/cursor-testing-lessons.md +8 -3
  9. package/docs/cursor-tool-surfaces.md +69 -0
  10. package/package.json +34 -10
  11. package/scripts/debug-provider-events.d.mts +59 -0
  12. package/scripts/debug-provider-events.mjs +70 -175
  13. package/scripts/debug-sdk-events.d.mts +90 -0
  14. package/scripts/debug-sdk-events.mjs +36 -98
  15. package/scripts/fixtures/plan-strip-shim/index.ts +12 -0
  16. package/scripts/isolated-cursor-smoke.sh +264 -102
  17. package/scripts/lib/cursor-child-process.d.mts +10 -0
  18. package/scripts/lib/cursor-child-process.mjs +50 -0
  19. package/scripts/lib/cursor-cli-args.d.mts +63 -0
  20. package/scripts/lib/cursor-cli-args.mjs +129 -0
  21. package/scripts/lib/cursor-script-fail.d.mts +1 -0
  22. package/scripts/lib/cursor-script-fail.mjs +13 -0
  23. package/scripts/lib/cursor-sdk-output-filter.d.mts +5 -0
  24. package/scripts/lib/cursor-smoke-env.d.mts +38 -0
  25. package/scripts/lib/cursor-smoke-env.mjs +81 -0
  26. package/scripts/lib/cursor-smoke-shell.sh +174 -0
  27. package/scripts/lib/cursor-visual-render.d.mts +15 -0
  28. package/scripts/lib/cursor-visual-render.mjs +131 -0
  29. package/scripts/probe-mcp-coldstart.mjs +20 -38
  30. package/scripts/refresh-cursor-model-snapshots.mjs +29 -65
  31. package/scripts/steering-rpc-smoke.mjs +170 -65
  32. package/scripts/tmux-live-smoke.sh +152 -98
  33. package/scripts/visual-tui-smoke.mjs +659 -0
  34. package/shared/cursor-sdk-event-debug-env.d.mts +12 -0
  35. package/shared/cursor-sdk-event-debug-env.mjs +13 -0
  36. package/shared/cursor-sensitive-text.d.mts +1 -0
  37. package/{scripts/lib/cursor-probe-utils.mjs → shared/cursor-sensitive-text.mjs} +1 -13
  38. package/shared/cursor-setting-sources.d.mts +5 -0
  39. package/shared/cursor-setting-sources.mjs +22 -0
  40. package/src/context.ts +21 -12
  41. package/src/cursor-bridge-contract.ts +1 -3
  42. package/src/cursor-incomplete-tool-visibility.ts +22 -5
  43. package/src/cursor-native-tool-display-registration.ts +63 -27
  44. package/src/cursor-native-tool-display-replay.ts +246 -144
  45. package/src/cursor-native-tool-display-state.ts +2 -0
  46. package/src/cursor-native-tool-display-tools.ts +149 -41
  47. package/src/cursor-provider-live-run-drain.ts +1 -52
  48. package/src/cursor-provider-run-finalizer.ts +235 -0
  49. package/src/cursor-provider-run-outcome.ts +149 -0
  50. package/src/cursor-provider-turn-api-key.ts +8 -0
  51. package/src/cursor-provider-turn-coordinator.ts +98 -446
  52. package/src/cursor-provider-turn-display-router.ts +216 -0
  53. package/src/cursor-provider-turn-emit.ts +59 -0
  54. package/src/cursor-provider-turn-finalize.ts +119 -0
  55. package/src/cursor-provider-turn-lifecycle-emitter.ts +97 -0
  56. package/src/cursor-provider-turn-message-offset.ts +15 -0
  57. package/src/cursor-provider-turn-prepare.ts +216 -0
  58. package/src/cursor-provider-turn-runner.ts +138 -0
  59. package/src/cursor-provider-turn-sdk-normalizer.ts +88 -0
  60. package/src/cursor-provider-turn-send.ts +103 -0
  61. package/src/cursor-provider-turn-shell-output.ts +107 -0
  62. package/src/cursor-provider-turn-tool-ledger.ts +126 -0
  63. package/src/cursor-provider-turn-types.ts +87 -0
  64. package/src/cursor-provider.ts +16 -504
  65. package/src/cursor-replay-activity-builders.ts +276 -0
  66. package/src/cursor-replay-source-names.ts +33 -0
  67. package/src/cursor-replay-summary-args.ts +191 -0
  68. package/src/cursor-replay-tool-details.ts +464 -0
  69. package/src/cursor-run-final-text.ts +56 -0
  70. package/src/cursor-sdk-abort-error-guard.ts +4 -0
  71. package/src/cursor-sdk-event-debug-constants.ts +14 -5
  72. package/src/cursor-sdk-event-debug.ts +2 -1
  73. package/src/cursor-sensitive-text.ts +3 -36
  74. package/src/cursor-session-agent.ts +3 -1
  75. package/src/cursor-setting-sources.ts +7 -10
  76. package/src/cursor-state.ts +232 -28
  77. package/src/cursor-tool-lifecycle.ts +9 -8
  78. package/src/cursor-tool-manifest.ts +41 -0
  79. package/src/cursor-tool-names.ts +18 -106
  80. package/src/cursor-tool-presentation-registry.ts +556 -0
  81. package/src/cursor-tool-transcript.ts +1 -1
  82. package/src/cursor-tool-visibility.ts +3 -27
  83. package/src/cursor-transcript-tool-formatters.ts +0 -59
  84. package/src/cursor-transcript-tool-specs.ts +158 -233
  85. package/src/cursor-transcript-utils.ts +0 -44
  86. package/src/cursor-web-tool-activity.ts +10 -60
  87. package/src/cursor-web-tool-args.ts +39 -0
  88. package/src/index.ts +4 -10
@@ -11,33 +11,34 @@ Current implementation notes:
11
11
  - Cursor context variants use `base@context` pi model IDs.
12
12
  - Cursor `reasoning`, `effort`, and boolean `thinking` parameters are driven by pi native thinking when the Cursor SDK exposes those controls.
13
13
  - Cursor `fast` is extension state, not model identity.
14
- - Cursor fast status uses `ctx.ui.setStatus()`; the default pi footer remains intact.
14
+ - Cursor SDK `mode` (`agent` or `plan`) is extension session state, not model identity, pi thinking, Cursor `fast`, or pi's separate plan-mode extension.
15
+ - Cursor status uses one coordinated `ctx.ui.setStatus("cursor", ...)` value for fast and non-default plan mode; the default pi footer remains intact.
15
16
  - Installed `@cursor/sdk` user messages accept images, and Cursor models are treated as image-capable; registered input metadata is `text` plus `image`.
16
17
  - Image payload forwarding sends images only from the latest user message. If the latest user turn is plain text after an earlier image turn, the transcript keeps an `[image omitted from transcript]` placeholder but no image bytes are sent to Cursor. The prompt explicitly tells Cursor that prior image bytes are unavailable and to ask the user to reattach or describe a prior image when needed. Carrying images forward across turns remains a future product decision because it affects token cost, privacy, stale visual context, and expected multimodal follow-up behavior.
17
- - `@cursor/sdk` is a package dependency of this extension; users should not need a global SDK install.
18
+ - Exact `@cursor/sdk@1.0.14` is a package dependency of this extension; users should not need a global SDK install. pi 0.76.0 is the supported validation baseline, with peer dependencies expressed as minimum-only `>=0.76.0` ranges and no upper bound.
18
19
  - Cursor auth uses pi-native API-key resolution for provider `cursor`: CLI `--api-key`, stored `~/.pi/agent/auth.json` API key from `/login`, then `CURSOR_API_KEY`. The extension config file stores only non-secret Cursor-only state such as fast defaults.
19
20
  - Local agents pass `settingSources: ["all"]` by default so Cursor MCP servers, plugin tools, project/user settings, and related Cursor-native capabilities are available. Users can narrow loading with a comma-separated list such as `PI_CURSOR_SETTING_SOURCES=project,user,plugins`, or disable ambient setting sources with `PI_CURSOR_SETTING_SOURCES=none`. The provider suppresses direct Cursor SDK bootstrap stdout/stderr/console noise (including late first-send workspace loading such as hook compatibility warnings) so it does not pollute pi's TUI.
20
21
  - On `cursor/*` models, pi-cursor-sdk removes only pi-generated `<project_instructions>` blocks that overlap the effective Cursor `settingSources`: `user` for `~/.pi/agent/AGENTS.md`; `project` for discovered repo/parent `AGENTS.md` and `CLAUDE.md` (verified Cursor behavior: local agents load project `AGENTS.md` and `CLAUDE.md`). `~/.pi/agent/CLAUDE.md` is not removed (Cursor user layer uses `~/.claude/CLAUDE.md`). Blocks are removed by exact pi serialization match from structured `contextFiles` via the `before_agent_start` hook, not in `buildCursorPrompt` sanitization. Suppression is skipped with `-nc`, `PI_CURSOR_SETTING_SOURCES=none`, narrowed sources such as `plugins` that omit the matching layer, or `PI_CURSOR_PRESERVE_PI_AGENTS_MD=1`. Switching away from a Cursor model restores pi's full context block on the next user message.
21
22
  - Cursor SDK models are treated as thinking-capable even when pi reports `thinking=no`; that pi column only means the SDK did not expose a pi-controllable thinking parameter for that model.
22
23
  - Cursor-side thinking remains visible through pi's native thinking rendering when the Cursor SDK emits thinking or summary deltas.
23
24
  - Local Cursor agents get two tool surfaces. First, Cursor keeps the Cursor SDK local-agent tool surface plus configured Cursor settings, plugins, and Cursor MCP servers. Second, pi-cursor-sdk exposes active pi tools through a default-on, tokenized loopback MCP bridge when bridgeable tools exist.
24
- - `buildCursorPiToolBridgeSnapshot()` is the runtime capability source for pi bridge tools. It snapshots `pi.getActiveTools()` and `pi.getAllTools()`, filters internal replay names, hides overlapping built-in pi tools (`read`, `bash`, `write`, `edit`, `grep`, `find`, `ls`) unless `PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1`, and creates collision-safe MCP names such as `pi__sem_reindex`. Cursor discovers the current run's exposed bridge tools through MCP `listTools`; there is no default run-start manifest, per-turn visible tool list, status manifest, or footer manifest.
25
- - Prompt text is the primary provider/bridge contract. MCP tool descriptions repeat the same contract to reinforce discovery, but do not replace the prompt boundary. Cursor must call the exposed `pi__*` MCP name, not the real pi tool name shown in pi history or transcripts. Pi emits and executes the real pi tool name.
25
+ - `buildCursorPiToolBridgeSnapshot()` is the runtime capability source for pi bridge tools. It snapshots `pi.getActiveTools()` and `pi.getAllTools()`, filters internal replay names, hides overlapping built-in pi tools (`read`, `bash`, `write`, `edit`, `grep`, `find`, `ls`) unless `PI_CURSOR_EXPOSE_BUILTIN_TOOLS=1`, and creates collision-safe MCP names such as `pi__sem_reindex`. Cursor discovers the current run's exposed bridge tools through MCP `listTools`. Bootstrap prompts include a compact callable-surface manifest from `buildCursorToolManifestText()` by default (`PI_CURSOR_TOOL_MANIFEST=1`); disable with `PI_CURSOR_TOOL_MANIFEST=0`. There is no per-turn visible tool list, status manifest, or footer manifest. User-facing summary: [Cursor tool surfaces in pi](./cursor-tool-surfaces.md).
26
+ - Prompt text is the primary provider/bridge contract. Bootstrap prompts carry a short boundary block plus the callable-surface manifest by default (`PI_CURSOR_TOOL_MANIFEST=1`). MCP `listTools` descriptions use a one-line pointer to the bootstrap prompt instead of repeating the full contract (`buildCursorPiBridgeMcpToolDescription()`). Cursor must call the exposed `pi__*` MCP name, not the real pi tool name shown in pi history or transcripts. Pi emits and executes the real pi tool name. Maintainer debug: `/cursor-tools` prints bridge/manifest enablement, effective `PI_CURSOR_SETTING_SOURCES`, and the current callable-surface snapshot.
26
27
  - The provider also registers `cursor_ask_question` for Cursor models when the bridge is enabled. Cursor sees it as `pi__cursor_ask_question`, and pi executes it through the normal tool path so interactive users can choose options from pi UI. In non-UI modes it reports that UI is unavailable so Cursor can state a default assumption instead. `PI_CURSOR_PI_TOOL_BRIDGE=0` disables the local bridge, including question bridging. Cloud Cursor agents remain out of scope for the bridge.
27
28
  - The bridge queues MCP calls, emits provider `toolcall_*` events, waits for matching pi `toolResult` messages by `toolCallId`, resolves the result back into the same live Cursor SDK run without creating a new `Agent`, and never calls tool `execute()` handlers directly. The same-run resume invariant holds unless the run was disposed, aborted, or cancelled.
28
- - Cursor SDK MCP tool calls use a guarded timeout override because installed `@cursor/sdk` 1.0.13 has a 60-second MCP request default with no public per-server timeout option. The extension extends the verified Cursor SDK MCP `callTool` timeout path to 3600 seconds by default and shortens the verified first-send MCP initialize/listTools timeout paths to 10 seconds by default so unavailable configured MCP servers do not block the first reply for a full minute; unknown MCP protocol timeout stacks keep the SDK default. Users can override tool-call timeouts with `PI_CURSOR_MCP_TOOL_TIMEOUT_MS` or `PI_CURSOR_MCP_TOOL_TIMEOUT_SECONDS`, and initialize/listTools timeouts with `PI_CURSOR_MCP_CONNECT_TIMEOUT_MS` or `PI_CURSOR_MCP_CONNECT_TIMEOUT_SECONDS`.
29
+ - Cursor SDK MCP tool calls use a guarded timeout override because installed `@cursor/sdk` 1.0.14 has a 60-second MCP request default with no public per-server timeout option. The extension extends the verified Cursor SDK MCP `callTool` timeout path to 3600 seconds by default and shortens the verified first-send MCP initialize/listTools timeout paths to 10 seconds by default so unavailable configured MCP servers do not block the first reply for a full minute; unknown MCP protocol timeout stacks keep the SDK default. Users can override tool-call timeouts with `PI_CURSOR_MCP_TOOL_TIMEOUT_MS` or `PI_CURSOR_MCP_TOOL_TIMEOUT_SECONDS`, and initialize/listTools timeouts with `PI_CURSOR_MCP_CONNECT_TIMEOUT_MS` or `PI_CURSOR_MCP_CONNECT_TIMEOUT_SECONDS`.
29
30
  - Bridge diagnostics are opt-in only: `PI_CURSOR_PI_TOOL_BRIDGE_DEBUG=1` writes typed, allowlisted, scrubbed single-line JSONL records to `process.stderr` with prefix `[pi-cursor-sdk:bridge]`. Diagnostics are scrubbed operational logs, not anonymous telemetry. They intentionally include tool names, safe correlation IDs, run lifecycle, exposed pi↔MCP name pairs, queued requests, result resolution, rejection, cancellation, and pending counts. Correlation IDs are generated independently from the tokenized endpoint path, and Cursor MCP call IDs are hashed before serialization. Diagnostics must not include endpoint paths/URLs/path components/tokens, API keys, bearer tokens, cookies, session credentials, raw args/results, stdout/stderr payloads, file contents, Cursor settings output, or local private session paths in tracked docs, and they must not call pi UI status, notification, or footer APIs. If tool names themselves are unacceptable for a release target, bridge debug diagnostics are not safe for shared logs under the current contract.
30
31
  - This repo does not provide a generic desktop-automation, browser-driver, or CDP recipe. Provider docs should describe pi-cursor-sdk's Cursor provider/bridge contract only.
31
- - Cursor internal tool activity is recorded from SDK events and scrubbed. Maintainer reference for all 16 `@cursor/sdk@^1.0.13` `ToolType` values, runtime alias normalization, and intentional mapping/fallback rules: [Cursor native tool replay — SDK ToolType replay matrix](./cursor-native-tool-replay.md#sdk-tooltype-replay-matrix) (official SDK docs: https://cursor.com/docs/sdk/typescript). In interactive TTY sessions, supported completed `read`, `bash`, `grep`, `find`, `ls`, `edit`, `write`, diagnostics, delete, todo/plan, task, image generation, MCP, semantic search, and screen recording activity is replayed through pi's native tool-call rendering path with recorded Cursor results, so the TUI can show native-looking cards without rerunning Cursor's reads/shell commands/file edits. Cursor `glob` activity is replayed through native `find` cards. Cursor write activity is replayed through native-looking `write` cards, and Cursor StrReplace/edit activity uses native-looking `edit` only when recorded arguments truthfully satisfy pi's `edit` schema; path-only Cursor edit and notebook edit replay falls back to neutral Cursor activity before pi validation. Diagnostics, delete, todos/plans, task, image, and MCP activity use neutral Cursor activity cards with pi's default success/error shell. Neutral Cursor activity calls include `activityTitle` and, when available, `activitySummary` so partial/collapsed cards preserve identity such as `Cursor plan`, `Cursor todos`, `Cursor MCP`, or `Cursor edit`. 
For long-running or externally meaningful Cursor tools (`task`, `shell`, `mcp`, `generateImage`, `recordScreen`, `semSearch`, web search/fetch, plan/todo), the provider may surface one low-noise deferred in-progress thinking line such as `Cursor MCP: external_search` from bounded, scrubbed SDK args; fast local tools (`read`, `grep`, `glob`, and similar) skip lifecycle lines when completion follows immediately, and pi bridge MCP calls are excluded because pi already shows real pi tool execution ([lifecycle visibility](./cursor-native-tool-replay.md#low-noise-tool-lifecycle-visibility)). Replay-only tools display recorded Cursor results, normalize workspace-local paths/diff headers for display, use pi diff colors for edit previews and path-inferred syntax highlighting for write previews, and fail closed if called without a recorded result. Native replay wrappers are registered only for tool names not already owned by another extension; conflicting tools use the bounded scrubbed transcript fallback. Cursor workflow tools such as `SwitchMode` and Cursor todo state are not pi workflow controls; reported todo/plan events are displayed as Cursor activity only. Plan/todo replay cards can be followed by Cursor's final plan text, selected from `run.wait().result` when Cursor provides one and trimmed against already-emitted text. Started Cursor SDK tool calls that never receive a completion event are surfaced with bounded user-visible labels/traces (neutral activity cards when native replay routing allows, otherwise the same inactive or transcript trace fallbacks used for completed replay) instead of being silently discarded when the run failed/aborted, produced no assistant text, or involved external/side-effectful tools; incomplete fast local discovery starts (`read`, `grep`, `glob`, `ls`) remain maintainer-debug-only after successful text-producing runs so stale SDK start events do not create red post-answer cards. 
Explicit failures remain visible when Cursor reports them through completed tool calls or step results. Pi bridge MCP starts remain excluded from duplicate incomplete Cursor cards because pi already shows real pi tool execution. `PI_CURSOR_NATIVE_TOOL_DISPLAY=0` disables native replay, and `PI_CURSOR_REGISTER_NATIVE_TOOLS=0` is a registration-only opt-out that keeps the transcript fallback without shadowing pi tool names. When bridge or native replay cards are emitted, the provider mirrors Codex's turn shape as Cursor SDK activity arrives: assistant `toolUse`, pi `toolResult`s, live post-tool Cursor thinking/text, any later tool batches as further `toolUse` turns, then Cursor's final assistant answer. For shell replay, completed `stdout` / `stderr` are primary; unambiguous `shell-output-delta` data is used only as display-only fallback for empty successful shell completions, and overlapping shell calls drop ambiguous deltas instead of guessing. Non-interactive runs keep bounded scrubbed transcript output instead, preserving `pi -p` assistant text output. Cursor text deltas stream live when no live-run turn split is active.
32
+ - Cursor internal tool activity is recorded from SDK events and scrubbed. Maintainer reference for all 16 `@cursor/sdk@1.0.14` `ToolType` values, runtime alias normalization, and intentional mapping/fallback rules: [Cursor native tool replay — SDK ToolType replay matrix](./cursor-native-tool-replay.md#sdk-tooltype-replay-matrix) (official SDK docs: https://cursor.com/docs/sdk/typescript). In interactive TTY sessions, supported completed `read`, `bash`, `grep`, `find`, `ls`, `edit`, `write`, diagnostics, delete, todo/plan, task, image generation, MCP, semantic search, and screen recording activity is replayed through pi's native tool-call rendering path with recorded Cursor results, so the TUI can show native-looking cards without rerunning Cursor's reads/shell commands/file edits. Cursor `glob` activity is replayed through native `find` cards. Cursor write activity is replayed through native-looking `write` cards, and Cursor StrReplace/edit activity uses native-looking `edit` only when recorded arguments truthfully satisfy pi's `edit` schema; path-only Cursor edit and notebook edit replay falls back to neutral Cursor activity before pi validation. Diagnostics, delete, todos/plans, task, image, and MCP activity use neutral Cursor activity cards with pi's default success/error shell. Neutral Cursor activity calls include `activityTitle` and, when available, `activitySummary` so partial/collapsed cards preserve identity such as `Cursor plan`, `Cursor todos`, `Cursor MCP`, or `Cursor edit`. 
For long-running or externally meaningful Cursor tools (`task`, `shell`, `mcp`, `generateImage`, `recordScreen`, `semSearch`, web search/fetch, plan/todo), the provider may surface one low-noise deferred in-progress thinking line such as `Cursor MCP: external_search` from bounded, scrubbed SDK args; fast local tools (`read`, `grep`, `glob`, and similar) skip lifecycle lines when completion follows immediately, and pi bridge MCP calls are excluded because pi already shows real pi tool execution ([lifecycle visibility](./cursor-native-tool-replay.md#low-noise-tool-lifecycle-visibility)). Replay-only tools display recorded Cursor results, normalize workspace-local paths/diff headers for display, use pi diff colors for edit previews and path-inferred syntax highlighting for write previews, and fail closed if called without a recorded result. Native replay wrappers are registered only for tool names not already owned by another extension; conflicting tools use the bounded scrubbed transcript fallback. Cursor workflow tools such as mode/task/todo/plan activity are not pi workflow controls; reported todo/plan events are displayed as Cursor activity only. Plan/todo replay cards can be followed by Cursor's final plan text, selected from `run.wait().result` when Cursor provides one and trimmed against already-emitted text. Started Cursor SDK tool calls that never receive a completion event are surfaced with bounded user-visible labels/traces (neutral activity cards when native replay routing allows, otherwise the same inactive or transcript trace fallbacks used for completed replay) instead of being silently discarded when the run failed/aborted, produced no assistant text, or involved external/side-effectful tools; incomplete fast local discovery starts (`read`, `grep`, `glob`, `ls`) remain maintainer-debug-only after successful text-producing runs so stale SDK start events do not create red post-answer cards. 
Explicit failures remain visible when Cursor reports them through completed tool calls or step results. Pi bridge MCP starts remain excluded from duplicate incomplete Cursor cards because pi already shows real pi tool execution. `PI_CURSOR_NATIVE_TOOL_DISPLAY=0` disables native replay, and `PI_CURSOR_REGISTER_NATIVE_TOOLS=0` is a registration-only opt-out that keeps the transcript fallback without shadowing pi tool names. When bridge or native replay cards are emitted, the provider mirrors Codex's turn shape as Cursor SDK activity arrives: assistant `toolUse`, pi `toolResult`s, live post-tool Cursor thinking/text, any later tool batches as further `toolUse` turns, then Cursor's final assistant answer. For shell replay, completed `stdout` / `stderr` are primary; unambiguous `shell-output-delta` data is used only as display-only fallback for empty successful shell completions, and overlapping shell calls drop ambiguous deltas instead of guessing. Non-interactive runs keep bounded scrubbed transcript output instead, preserving `pi -p` assistant text output. Cursor text deltas stream live when no live-run turn split is active.
32
33
  - Synthetic replay names are internal compatibility details. New model-facing prompt text and user-visible cards use native tool names when renderer-compatible, or neutral Cursor activity labels when not. Legacy sessions containing old internal replay names are sanitized before prompt/display. Bridge MCP names such as `pi__sem_reindex` are MCP-only; pi session output uses real pi tool names.
33
34
  - Cursor SDK usage events report cumulative internal agent/tool/cache work, not the replayable pi prompt context. The extension does not copy raw Cursor SDK usage into pi usage or compaction. For Cursor assistant messages, `usage.input`/`usage.output` are approximate pi session activity components: initial Cursor prompt input is counted once, consumed split-run tool results are counted as deduped input on the following assistant turn, and assistant output includes visible text/thinking/tool-call content. `usage.totalTokens` is the replayable Cursor prompt/context estimate derived from the same `buildCursorPrompt()` path used for `Agent.send`; it may differ from `input + output` and is the context-safe value for display/compaction. `src/cursor-usage-accounting.ts` owns this usage policy, and `src/cursor-live-run-accounting.ts` owns prompt-once and consumed-tool-result accounting so provider usage and bridge result resolution share the same matched tool-result boundary.
34
35
  - Audit observation, 2026-05-19, superseded by the 2026-05-21 replay pass and #68 incomplete visibility, then narrowed by the 2026-05-26 fast-local suppression: a missing-file read with Composer 2.5 emitted `tool-call-started` for Cursor `read`, then streamed final text `Error: File not found`, but did not emit `tool-call-completed` or an `onStep` `toolCall` error result. Leftover external/side-effectful started calls are surfaced at run completion through the same native replay routing as completed tools (activity cards when allowed, otherwise inactive/transcript traces), while fast local discovery starts are debug-only after a successful text-producing run. Cursor-reported completed/step errors remain visible.
35
36
  - Maintainer visual verification for replay-card changes should follow [Cursor Native Tool Visual Audit Workflow](./cursor-native-tool-visual-audit.md): offscreen PTY-driven pi run, xterm.js/Playwright screenshot rendering, and JSONL inspection before accepting commits or PRs.
36
37
  - Cursor provider/runtime releases should follow [Cursor Live Smoke Checklist](./cursor-live-smoke-checklist.md) with real `pi -e . --cursor-no-fast --model cursor/composer-2.5` invocations, manual observation, temporary session dirs, diagnostics scans, and persisted JSONL inspection. See [Cursor testing lessons](./cursor-testing-lessons.md) for auth.json seeding, isolated smoke harnesses, and replay JSONL scans. Assume every runtime surface is in scope. A release is not ready when any live check is optional, deferred, mostly passing, or unobserved.
37
38
  - For models without a catalog `context` parameter, context windows are not hardcoded. The extension ships a bundled SDK-derived default/non-Max cache generated from `createAgentPlatform().checkpointStore.loadLatest(agentId).tokenDetails.maxTokens`. Successful runs can update a local override cache, but model discovery does not probe models at startup.
38
- - Max Mode context windows are distinct from default/non-Max context windows. `@cursor/sdk` 1.0.13 documentation says the SDK may enable Max Mode automatically when a selected model requires it, but the public local-agent `ModelSelection` path still does not expose a manual Max Mode selector. Do not advertise Max Mode context windows unless the SDK catalog exposes an exact parameter/variant or the SDK public API adds a Max Mode selector that the extension actually sends.
39
- - `@cursor/sdk` 1.0.13 adds latest-style `ModelListItem.aliases`. The extension registers only unambiguous aliases as pi model IDs (with the same context suffixes when applicable) and sends the alias back in `ModelSelection.id`, while sharing Cursor-only state such as fast defaults with the underlying catalog `id`. Aliases shared by multiple base models, such as generic family aliases, are skipped because the pi row metadata would otherwise imply one base model while Cursor may resolve the alias to another.
40
- - Session-scoped Cursor SDK agent pooling reuses one live `@cursor/sdk` agent across compatible follow-up turns within the same pi session scope. `planCursorSessionSend()` in `src/cursor-session-send-policy.ts` decides whether the next turn sends a full bootstrap prompt or an incremental follow-up, whether the SDK agent must be recreated, and why. `computeCursorContextFingerprint()` and `shouldBootstrapCursorContext()` remain the context-only bootstrap signal. The pool recreates the agent when context diverges, when branch or compaction summaries appear after `/tree` navigation or compaction, after 20 completed incremental sends, when the API key identity changes, after send errors, on `session_shutdown`, and when `session_before_tree` / `session_tree` invalidate the active branch. Incremental sends omit the full Cursor SDK tool boundary block because the session agent retains prior bootstrap context, but every send ends with a short tool tail guard placed after the latest user request.
39
+ - Max Mode context windows are distinct from default/non-Max context windows. `@cursor/sdk` 1.0.14 documentation says the SDK may enable Max Mode automatically when a selected model requires it, but the public local-agent `ModelSelection` path still does not expose a manual Max Mode selector. Do not advertise Max Mode context windows unless the SDK catalog exposes an exact parameter/variant or the SDK public API adds a Max Mode selector that the extension actually sends.
40
+ - `@cursor/sdk` 1.0.14 adds latest-style `ModelListItem.aliases`. The extension registers only unambiguous aliases as pi model IDs (with the same context suffixes when applicable) and sends the alias back in `ModelSelection.id`, while sharing Cursor-only state such as fast defaults with the underlying catalog `id`. Aliases shared by multiple base models, such as generic family aliases, are skipped because the pi row metadata would otherwise imply one base model while Cursor may resolve the alias to another.
41
+ - Session-scoped Cursor SDK agent pooling reuses one live `@cursor/sdk` agent across compatible follow-up turns within the same pi session scope. `planCursorSessionSend()` in `src/cursor-session-send-policy.ts` decides whether the next turn sends a full bootstrap prompt or an incremental follow-up, whether the SDK agent must be recreated, and why. `computeCursorContextFingerprint()` and `shouldBootstrapCursorContext()` remain the context-only bootstrap signal. The pool recreates the agent when context diverges, when branch or compaction summaries appear after `/tree` navigation or compaction, after 20 completed incremental sends, when the API key identity changes, after send errors, on `session_shutdown`, and when `session_before_tree` / `session_tree` invalidate the active branch. Incremental sends omit the full Cursor SDK tool boundary block because the session agent retains prior bootstrap context, but every send ends with a short tool tail guard placed after the latest user request (including an explicit shell `cd` hint).
41
42
  - Pi steering/follow-up delivery can arrive while a split live Cursor SDK run is still active. The provider resolves pending live runs by scanning trailing `toolResult` messages while skipping trailing `user` messages, tracks the active live run per session scope, and resumes the in-flight run instead of calling `Agent.send()` again. When the context ends with steering user text after tool results, the provider releases the prior live run and chains an incremental `Agent.send()` for the latest user message in the same provider turn; if the prior run emits more text or tool requests after steering arrives, that stale activity is cancelled instead of surfacing another old-run tool turn and losing the new user input. A pre-send guard waits for or resumes any still-active scoped live run before starting a fresh send so `@cursor/sdk` `AgentBusyError` (`already has active run`) does not surface to pi users. `acquireSessionCursorAgent()` also awaits fire-and-forget background `run.wait()` cleanup for the current pooled agent instance before returning a lease, so send planning, transcript offsets, and later `Agent.send()` do not race the prior turn's SDK run completion (for example pi auto-compaction summarization). Tracked completions and send commits are scoped to the pooled agent `instanceId` so disposal/replacement drops stale tracking and ignores late commits from disposed agents.
42
43
 
43
44
  ## Goal
@@ -49,7 +50,7 @@ Main outcomes:
49
50
  - `pi --list-models` shows pi-native Cursor models with accurate `contextWindow`, pi-controllable thinking metadata, and conservative defaults where the Cursor SDK does not expose limits or capabilities.
50
51
  - `shift+tab` is pi's native thinking control and drives Cursor `reasoning` or `effort`.
51
52
  - Cursor context options are represented as pi-visible model variants when they change native model metadata.
52
- - Cursor-only state, currently `fast`, is controlled by extension commands and shown through native status text.
53
+ - Cursor-only state (`fast` and Cursor SDK `mode`) is controlled by extension flags/commands and shown through native status text only when non-default.
53
54
  - The default pi footer remains intact.
54
55
  - Model capabilities are discovered from the Cursor SDK, not hardcoded per model.
55
56
 
@@ -136,6 +137,7 @@ Use native pi abstractions wherever possible:
136
137
  | Cursor `effort` | pi native thinking via `thinkingLevelMap` |
137
138
  | Cursor `thinking=false` | pi native `off` |
138
139
  | Cursor `fast` | extension state, not model identity |
140
+ | Cursor SDK `mode` | extension session state; `agent` by default, `plan` via SDK-native mode |
139
141
  | Footer | default pi footer plus optional extension status |
140
142
 
141
143
  Reason:
@@ -154,7 +156,7 @@ Rules:
154
156
  - Register one pi model for each Cursor base model and each unambiguous SDK alias when there is no Cursor `context` parameter.
155
157
  - Register one pi model per Cursor `context` value for each Cursor base model and each unambiguous SDK alias when the model exposes a `context` parameter.
156
158
  - Skip SDK aliases that collide with another base model ID or are shared by multiple base models; those aliases can resolve differently from the pi row metadata.
157
- - Do not encode `reasoning`, `effort`, `thinking`, or `fast` into pi model IDs.
159
+ - Do not encode `reasoning`, `effort`, `thinking`, `fast`, or Cursor SDK `mode` into pi model IDs.
158
160
  - Prefer stable, readable `@<context>` suffixes that do not conflict with pi's final `:<thinking>` suffix parser.
159
161
  - Sort Cursor models by base ID, then context value in Cursor SDK order before calling `pi.registerProvider()`. Registration order matters for `/model` display and model cycling; `--list-models` sorts output separately.
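
To illustrate why the `@<context>` suffix coexists with pi's final `:<thinking>` suffix, here is a small sketch that splits a `--model` value into its parts. `parseModelRef` and `ParsedModelRef` are hypothetical names for this example; pi's real parser is not shown in this document and may behave differently.

```typescript
// Hypothetical parser, assuming the shape <provider>/<base>[@<context>][:<thinking>].
interface ParsedModelRef {
  provider: string;
  baseId: string;
  context: string | undefined;
  thinking: string | undefined;
}

function parseModelRef(ref: string): ParsedModelRef {
  const slash = ref.indexOf("/");
  if (slash < 0) throw new Error(`expected <provider>/<model>, got: ${ref}`);
  const provider = ref.slice(0, slash);
  let rest = ref.slice(slash + 1);

  // Strip the final :<thinking> suffix first, since it is parsed last in the ID.
  let thinking: string | undefined;
  const colon = rest.lastIndexOf(":");
  if (colon >= 0) {
    thinking = rest.slice(colon + 1);
    rest = rest.slice(0, colon);
  }

  // Whatever follows the last "@" is the Cursor context variant.
  let context: string | undefined;
  const at = rest.lastIndexOf("@");
  if (at >= 0) {
    context = rest.slice(at + 1);
    rest = rest.slice(0, at);
  }
  return { provider, baseId: rest, context, thinking };
}
```

Because `@` and `:` are distinct delimiters, `cursor/gpt-5.5@1m:medium` and `cursor/gpt-5.5@272k` can both be split without ambiguity in this sketch.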
160
162
 
@@ -241,6 +243,8 @@ Cursor extension controls:
241
243
  | Action | Preferred control | Applies when |
242
244
  |---|---:|---|
243
245
  | Toggle fast | `/cursor-fast` | model has `fast` |
246
+ | Set SDK mode | `/cursor-mode agent\|plan` | Cursor model selected |
247
+ | Show tool surfaces (maintainer) | `/cursor-tools` | Cursor model selected |
244
248
 
245
249
  Do not register a shortcut for `shift+tab`. Pi reserves the native thinking keybinding, and the extension should only influence it through model metadata.
246
250
 
@@ -376,13 +380,42 @@ Status example:
376
380
  cursor fast
377
381
  ```
378
382
 
383
+ ## Cursor SDK Mode Behavior
384
+
385
+ Cursor SDK 1.0.14 exposes SDK-native conversation mode:
386
+
387
+ ```ts
388
+ type AgentModeOption = "agent" | "plan";
389
+ ```
390
+
391
+ Rules:
392
+
393
+ - Default mode is `agent`.
394
+ - Supported modes are exactly `agent` and `plan`.
395
+ - Mode is extension session state, not a model variant, not pi thinking/reasoning, not Cursor `fast`, and not pi's separate plan-mode extension.
396
+ - `--cursor-mode agent|plan` sets a one-run CLI override and does not append session state.
397
+ - `/cursor-mode agent` and `/cursor-mode plan` persist session mode with `pi.appendEntry()`.
398
+ - `/cursor-mode` with no args reports current mode and usage.
399
+ - Invalid CLI values fail non-UI runs and notify interactive users before the provider rejects the run.
400
+ - New SDK agents are seeded with `Agent.create({ mode })`.
401
+ - Every SDK send passes the effective mode through `agent.send(..., { mode })` so `/cursor-mode` and `--cursor-mode` remain the source of truth.
402
+ - Mode is not part of the session-agent pool key because Cursor SDK supports SDK-native per-send mode switches.
403
+ - Cursor plan/todo/task/mode activity remains display-only Cursor activity unless pi itself exposes a native state path. Replay cards do not mutate pi plan/todo state or active tools.
404
+
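
The rules above can be read as a small resolution step. The sketch below assumes the one-run CLI override takes precedence over persisted session state, which is what "override" suggests but is not spelled out; `parseCursorMode` and `resolveEffectiveMode` are hypothetical helper names, not functions from the extension.

```typescript
type AgentModeOption = "agent" | "plan";

// Validate a raw flag or command argument; anything else is rejected.
function parseCursorMode(raw: string): AgentModeOption {
  if (raw === "agent" || raw === "plan") return raw;
  throw new Error(`invalid Cursor mode "${raw}"; expected agent|plan`);
}

// Assumed precedence: CLI override, then session state, then the default.
function resolveEffectiveMode(
  cliOverride: string | undefined,
  sessionMode: AgentModeOption | undefined,
): AgentModeOption {
  if (cliOverride !== undefined) return parseCursorMode(cliOverride);
  return sessionMode ?? "agent";
}
```

The resolved value is what would then be passed on every send, so the pooled agent never has to be recreated just because the mode changed.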
405
+ Status examples:
406
+
407
+ ```text
408
+ cursor plan
409
+ cursor fast · plan
410
+ ```
411
+
379
412
  ## Footer Behavior
380
413
 
381
414
  Hard requirement:
382
415
 
383
416
  - Leave pi's default footer intact.
384
417
  - Do not use `ctx.ui.setFooter()` for the first pass.
385
- - Use `ctx.ui.setStatus()` only for Cursor-only state that pi cannot show natively, such as `fast`.
418
+ - Use `ctx.ui.setStatus()` only for Cursor-only state that pi cannot show natively, such as `fast` and non-default Cursor SDK `plan` mode.
386
419
  - Non-cursor models must have no Cursor status.
387
420
 
388
421
  Reason:
@@ -396,13 +429,13 @@ Expected native footer behavior:
396
429
  - provider/model is shown by pi from the selected `cursor` model,
397
430
  - thinking level is shown by pi when `reasoning` is true,
398
431
  - context usage is computed from `contextWindow`,
399
- - extension status adds only Cursor-only text such as `cursor fast`.
432
+ - extension status adds only Cursor-only text such as `cursor fast`, `cursor plan`, or `cursor fast · plan`.
400
433
 
401
434
  `ctx.ui.setStatus()` adds an extension status line in the default footer. It does not patch the built-in model segment. The native shape is closer to:
402
435
 
403
436
  ```text
404
437
  ... (cursor) gpt-5.5@1m • medium
405
- cursor fast
438
+ cursor fast · plan
406
439
  ```
407
440
 
408
441
  not:
@@ -430,6 +463,7 @@ The extension persists only Cursor-only state:
430
463
 
431
464
  - `fast` per session,
432
465
  - `fast` global default per Cursor base model,
466
+ - Cursor SDK `mode` per session,
433
467
  - any future Cursor-only parameter that does not map to pi model metadata.
434
468
 
435
469
  Use:
@@ -453,7 +487,7 @@ Restore:
453
487
 
454
488
  - pi model, including context variant,
455
489
  - pi thinking level,
456
- - session Cursor-only state such as `fast`.
490
+ - session Cursor-only state such as `fast` and Cursor SDK `mode`.
457
491
 
458
492
  ### New Session
459
493
 
@@ -469,6 +503,7 @@ Guaranteed first-pass support:
469
503
 
470
504
  ```bash
471
505
  pi --model cursor/gpt-5.5@1m --thinking medium
506
+ pi --model cursor/gpt-5.5@1m --cursor-mode plan
472
507
  pi --model cursor/gpt-5.5@1m:medium
473
508
  pi --model cursor/gpt-5.5@272k:xhigh
474
509
  ```
@@ -487,13 +522,15 @@ Reason:
487
522
  - Cursor-only parameters are not generic pi CLI parameters.
488
523
  - Context is already represented by the registered pi model ID.
489
524
  - `fast` is controlled by saved extension defaults or the first-pass `--cursor-fast` extension flag.
525
+ - Cursor SDK `mode` is controlled by `/cursor-mode` session state or the first-pass `--cursor-mode` extension flag; it is never encoded in `--model`.
490
526
 
491
527
  For print mode:
492
528
 
493
529
  - no keybindings,
494
530
  - use selected context model variant,
495
531
  - use `--thinking` or `:medium` for reasoning/effort,
496
- - use saved global `fast` defaults unless `--cursor-fast` is present.
532
+ - use saved global `fast` defaults unless `--cursor-fast` is present,
533
+ - use Cursor SDK `agent` mode unless `/cursor-mode` session state or `--cursor-mode` overrides it.
497
534
 
498
535
  Fast flag example:
499
536
 
@@ -647,6 +684,7 @@ Before calling done:
647
684
  - dynamic capability discovery
648
685
  - context variant registration and decoding
649
686
  - fast extension state and status behavior
687
+ - Cursor SDK mode session/CLI state and status behavior
650
688
  - `reasoning` mapping
651
689
  - `effort` mapping
652
690
  - boolean `thinking` maps to pi `off` / enabled levels
@@ -661,6 +699,7 @@ Before calling done:
661
699
  - launch interactive with Cursor
662
700
  - verify default pi footer remains unchanged
663
701
  - verify Cursor `fast` status appears only when enabled
702
+ - verify Cursor `plan` status appears only in non-default mode and combines with fast as `cursor fast · plan`
664
703
  - verify non-cursor footer/status unchanged
665
704
  - verify `shift+tab` uses pi native thinking
666
705
  - verify context changes through native model selection
@@ -670,7 +709,8 @@ Before calling done:
670
709
  - `pi --model cursor/gpt-5.5@1m:medium -p "Say ok only"`
671
710
  - `pi --model cursor/gpt-5.5@272k --thinking xhigh -p "Say ok only"`
672
711
  - `pi --model cursor/gpt-5.5@1m --cursor-fast -p "Say ok only"`
673
- - confirm requests use selected context, pi thinking, and fast flag state
712
+ - `pi --model cursor/gpt-5.5@1m --cursor-mode plan -p "Say ok only"`
713
+ - confirm requests use selected context, pi thinking, fast flag state, and SDK-native mode
674
714
 
675
715
  4. Tool bridge and replay:
676
716
  - `npm test -- test/cursor-pi-tool-bridge.test.ts test/cursor-provider.test.ts test/cursor-mcp-timeout-override.test.ts`
@@ -1,5 +1,7 @@
1
1
  # Cursor native tool replay
2
2
 
3
+ User-facing overview of callable vs display-only tools: [Cursor tool surfaces in pi](./cursor-tool-surfaces.md).
4
+
3
5
  pi-cursor-sdk has two separate pi-facing paths plus Cursor's own local-agent tool surface:
4
6
 
5
7
  1. **Local pi MCP bridge:** default-on for local Cursor agents. It exposes the current pi session's bridgeable active tools to Cursor through a tokenized `127.0.0.1` MCP endpoint, excluding internal Cursor replay activity names and, by default, overlapping built-in pi tools (`read`, `bash`, `write`, `edit`, `grep`, `find`, `ls`). When Cursor calls one of those MCP tools, pi executes the real pi tool through the normal pi tool path.
@@ -17,6 +19,8 @@ This document is about replay. Replay is not execution and is not the local pi b
17
19
 
18
20
  Replay labels, replay cards, and transcript tool names are display-only/context-only. Bridge MCP names are also not pi tool names: Cursor must call the exposed `pi__*` MCP name, while pi history and cards use the real pi tool name.
19
21
 
22
+ Cursor SDK `plan` mode (`--cursor-mode plan` or `/cursor-mode plan`) can make Cursor produce plan-oriented text and plan/todo activity. Replay still treats Cursor `createPlan`, `updateTodos`, task/mode, and related workflow activity as display-only Cursor activity. It does not switch pi into plan mode, mutate pi todos, or change pi active tools.
23
+
20
24
  ## Local pi bridge summary
21
25
 
22
26
  The bridge is enabled by default when bridgeable active pi tools exist. Cursor sees bridge-owned MCP names such as `pi__sem_reindex`, while pi history and tool cards use the real pi tool name such as `sem_reindex`. The bridge hides overlapping built-in pi tools by default because Cursor already has native equivalents; extension/custom tools and non-overlapping active tools present in pi's active tool registry normally remain exposed. pi-cursor-sdk also registers `cursor_ask_question` for Cursor models when the bridge is enabled, exposed to Cursor as `pi__cursor_ask_question`, so Cursor can ask the user to choose instead of silently defaulting when the pi UI is available. The bridge does not call pi tool `execute()` handlers directly; it queues the request, emits a real pi `toolCall`, waits for the matching pi `toolResult`, and resolves the Cursor MCP call back into the same live Cursor SDK run without creating a new `Agent`, unless the run was disposed, aborted, or cancelled.
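
The naming split between bridge MCP names and real pi tool names can be illustrated with a short sketch. The helper names (`toBridgeName`, `toPiToolName`, `bridgeableNames`) are hypothetical; only the `pi__` prefix and the default list of hidden overlapping built-ins come from this document.

```typescript
// Built-in pi tools hidden from the bridge by default, per the text above.
const OVERLAPPING_BUILTINS = new Set(["read", "bash", "write", "edit", "grep", "find", "ls"]);
const BRIDGE_PREFIX = "pi__";

// Name Cursor would call through the MCP endpoint.
function toBridgeName(piToolName: string): string {
  return `${BRIDGE_PREFIX}${piToolName}`;
}

// Real pi tool name used in pi history and cards; undefined for non-bridge names.
function toPiToolName(bridgeName: string): string | undefined {
  return bridgeName.startsWith(BRIDGE_PREFIX)
    ? bridgeName.slice(BRIDGE_PREFIX.length)
    : undefined;
}

// Assumed default filtering: drop overlapping built-ins, expose the rest.
function bridgeableNames(activeTools: string[]): string[] {
  return activeTools.filter((name) => !OVERLAPPING_BUILTINS.has(name)).map(toBridgeName);
}
```

For example, `bridgeableNames(["read", "sem_reindex"])` would expose only `pi__sem_reindex` under these assumptions.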
@@ -58,15 +62,17 @@ When Cursor reports completed tool activity, the extension can display recorded
58
62
 
59
63
  Cursor `glob` activity is displayed through native `find` cards.
60
64
 
61
- For the full `@cursor/sdk@^1.0.13` `ToolType` set, disposition matrix, and runtime alias normalization, see [SDK ToolType replay matrix](#sdk-tooltype-replay-matrix) below. Official SDK reference: https://cursor.com/docs/sdk/typescript
65
+ For the full `@cursor/sdk@1.0.14` `ToolType` set, disposition matrix, and runtime alias normalization, see [SDK ToolType replay matrix](#sdk-tooltype-replay-matrix) below. Official SDK reference: https://cursor.com/docs/sdk/typescript
62
66
 
63
67
  Edit and write activity replays through pi-facing `edit` and `write` cards only when replay arguments truthfully satisfy the matching pi schema, but still uses recorded Cursor results only. The adapter passes through truthful Cursor paths, content when Cursor reported it, and recorded diff/details; it does not pretend Cursor's editing schema is pi's schema and it fails closed if a recorded replay result is missing. Cursor `StrReplace` with recorded replacement text displays as native-looking `edit`; path-only Cursor `edit` and notebook edit activity fall back to neutral Cursor activity so pi does not reject the replay before recorded-result handling. Cursor `write` displays as native-looking `write`. Diagnostics, delete, todos/plans, task, image, MCP, semantic search, screen recording, and web search/fetch activity use neutral Cursor activity cards with pi's default success/error tool shell. MCP completions whose `toolName` is `WebSearch` / `web_search` / `WebFetch` / similar are labeled **Cursor web search** or **Cursor web fetch** instead of generic **Cursor MCP**. Neutral Cursor activity cards carry display metadata such as `activityTitle` and `activitySummary`, so partial/collapsed cards can say `Cursor plan`, `Cursor todos`, `Cursor MCP`, `Cursor semantic search`, `Cursor screen recording`, `Cursor web search`, `Cursor web fetch`, or `Cursor edit` instead of only `Cursor activity`. These replay tools only display recorded Cursor results; they never mutate files or execute tool work directly. Replay paths are normalized to workspace-relative paths when possible. Most collapsed replay cards include bounded previews for diffs and text details so small edits, todos, task output, and MCP results are visible without expanding; web search/fetch activity stays summary-only while collapsed because those cards often arrive after final text and can otherwise bury the answer. Ctrl+O expansion shows the recorded details. Edit previews omit raw unified diff headers and show compact numbered changed/context lines using pi's native diff added/removed/context colors, and write previews use syntax highlighting when pi can infer a language from the path. Image generation replay cards show the saved image path in the collapsed summary and render the image inline when pi terminal image display is enabled and the generated file is still readable.
64
68
 
65
69
  ## SDK ToolType replay matrix
66
70
 
67
- Source of truth for SDK tool names: `@cursor/sdk@^1.0.13` conversation `ToolType` values and https://cursor.com/docs/sdk/typescript
71
+ Source of truth for SDK tool names: `@cursor/sdk@1.0.14` conversation `ToolType` values and https://cursor.com/docs/sdk/typescript
68
72
 
69
- Implementation owners: `src/cursor-transcript-tool-specs.ts` (`TOOL_DISPLAY_SPECS`), `src/cursor-native-tool-display-replay.ts`, and `src/cursor-transcript-utils.ts` (`normalizeToolName()`).
73
+ Implementation owners: `src/cursor-tool-presentation-registry.ts` (canonical names, labels, visibility, replay policy, bridge exclusions for internal replay wrappers, and display-spec key completeness), `src/cursor-transcript-tool-specs.ts` (registry-keyed `TOOL_DISPLAY_SPECS` formatters/builders), `src/cursor-native-tool-display-replay.ts` (replay card rendering derived from registry replay metadata), and `src/cursor-transcript-utils.ts` (`normalizeToolName()` delegating to the registry).
74
+
75
+ **Maintainer invariants — edit/write replay previews:** All colored diff rendering (native `edit` cards and `Cursor edit` activity fallbacks) flows through the single `formatCursorReplayDiff()` in `src/cursor-native-tool-display-replay.ts`. Activity write fallbacks with structured `fileContentAfterWrite` use the same `formatCursorReplayFilePreview()` path as native `write` cards. Structured `diffString` (and `diff`/`lines*`) or `fileContentAfterWrite` on `CursorReplay*Details` (including activity variants) is the source of truth for TUI preview coloring/highlighting. `expandedText` on activity details is for summary/expansion and legacy JSONL compatibility only; it is never the primary preview source when structured fields are present. Legacy paths retain `extractUnifiedDiffSection` + delegation solely for old session JSONL that predates structured population; no parallel +/- coloring loops exist for new paths.
70
76
 
71
77
  This matrix covers **Cursor native tool replay only**. It does not describe the [live pi MCP bridge](#live-bridge-vs-replay) or Cursor-native host tools, settings, plugins, and configured MCP servers from the Cursor SDK local-agent path.
72
78
 
@@ -81,8 +87,8 @@ This matrix covers **Cursor native tool replay only**. It does not describe the
81
87
  | `write` | native replay or neutral activity | `write` or `cursor` | Native `write` only when recorded content/path args satisfy pi's `write` schema; otherwise neutral **Cursor write** activity |
82
88
  | `delete` | neutral activity | `cursor` | Collapsed label **Cursor delete** |
83
89
  | `readLints` | neutral activity | `cursor` | Collapsed label **Cursor diagnostics** |
84
- | `updateTodos` | neutral activity | `cursor` | Collapsed label **Cursor todos**; display-only, does not drive pi todos |
85
- | `createPlan` | neutral activity | `cursor` | Collapsed label **Cursor plan**; display-only, does not drive pi plan mode |
90
+ | `updateTodos` | neutral activity | `cursor` | Collapsed label **Cursor todos**; display-only, does not drive pi todos, including in Cursor SDK `plan` mode |
91
+ | `createPlan` | neutral activity | `cursor` | Collapsed label **Cursor plan**; display-only, does not drive pi plan mode, including in Cursor SDK `plan` mode |
86
92
  | `task` | neutral activity | `cursor` | Collapsed label **Cursor task** |
87
93
  | `generateImage` | neutral activity | `cursor` | Collapsed label **Cursor image generation** |
88
94
  | `mcp` | neutral activity | `cursor` | Collapsed label **Cursor MCP** for non-web MCP completions; web search/fetch MCP `toolName` values reclassify to the rows below |
@@ -92,7 +98,9 @@ This matrix covers **Cursor native tool replay only**. It does not describe the
92
98
  | *(host/MCP alias)* `WebFetch` / `web_fetch` / similar | neutral activity | `cursor` | Collapsed label **Cursor web fetch**; display-only Cursor web access reported by the SDK, not an executable pi web tool |
93
99
  | _(no spec; future/unknown SDK name)_ | neutral activity | `cursor` | Collapsed label **Cursor** plus SDK tool name via `buildGenericPiToolDisplay()`; bounded fallback transcript only |
94
100
 
95
- **Unknown/future fallback path:** SDK tool names with no `TOOL_DISPLAY_SPECS` entry (future or unknown types) use `buildGenericPiToolDisplay()` in `src/cursor-transcript-tool-specs.ts` with bounded `formatFallback()` content from `src/cursor-transcript-tool-formatters.ts`. When native replay is enabled, those completions queue through neutral pi tool name `cursor` (not native pi `read`/`bash`/… cards). Collapsed labels read like **Cursor futureSemSearchWidget** (title `Cursor` plus the SDK tool name) with optional bounded `activitySummary` from scrubbed args/result lines. Errors keep `details.summary` undefined so unbounded raw errors do not leak into replay cards (#52). Known explicit specs still win over this path; pi and bridge tool names are never shadowed.
101
+ **Unknown/future fallback path:** SDK tool names with no registry-backed `TOOL_DISPLAY_SPECS` entry (future or unknown types) use `buildGenericPiToolDisplay()` in `src/cursor-transcript-tool-specs.ts` with bounded `formatFallback()` content from `src/cursor-transcript-tool-formatters.ts`. Lookup uses `Object.hasOwn(TOOL_DISPLAY_SPECS, name)` so inherited object keys such as `constructor` or `toString` cannot accidentally match a registry spec. When native replay is enabled, those completions queue through neutral pi tool name `cursor` (not native pi `read`/`bash`/… cards). Collapsed labels read like **Cursor futureSemSearchWidget** (title `Cursor` plus the SDK tool name) with optional bounded `activitySummary` from scrubbed args/result lines. Errors keep `details.summary` undefined so unbounded raw errors do not leak into replay cards (#52). Known explicit specs still win over this path; real pi bridge tool names such as `edit` and `write` are not suppressed by internal replay-wrapper exclusions.
102
+
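
The own-key lookup mentioned above matters because plain objects inherit keys from `Object.prototype`. The sketch below uses a stand-in `TOOL_DISPLAY_SPECS` with a made-up `label` field and a hypothetical `lookupSpec` helper; it only demonstrates the `Object.hasOwn` guard (ES2022), not the real registry contents.

```typescript
// Stand-in registry for illustration; the real specs hold formatters/builders.
interface DisplaySpec {
  label: string;
}

const TOOL_DISPLAY_SPECS: Record<string, DisplaySpec> = {
  read: { label: "read" },
  createPlan: { label: "Cursor plan" },
};

// Only own keys count as registered specs; inherited keys fall through
// to the generic fallback path (represented here by undefined).
function lookupSpec(name: string): DisplaySpec | undefined {
  return Object.hasOwn(TOOL_DISPLAY_SPECS, name) ? TOOL_DISPLAY_SPECS[name] : undefined;
}
```

A bare `TOOL_DISPLAY_SPECS[name]` or `name in TOOL_DISPLAY_SPECS` check would treat an SDK tool literally named `constructor` or `toString` as a match, which is the failure mode the guard avoids.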
103
+ **Replay detail disposition model:** `src/cursor-replay-tool-details.ts` stores replay card disposition separately from SDK source tool identity. Variants are `nativeEdit`, `nativeWrite`, `activity` (`sourceToolName` + display `title`), `generateImage`, and `genericFallback`. Path-only or notebook edit/write fallbacks produce `activity` details (neutral `cursor` cards) instead of structured edit/write variants with optional `title` escape hatches. Native edit/write cards use `nativeEdit` / `nativeWrite` only when pi-facing replay args satisfy the matching schema. The renderer dispatches on `variant` only; legacy payloads with `cursorToolName`/`title` are parsed into the matching disposition at the boundary.
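
A discriminated union is one natural way to express the disposition model described above. In this sketch the variant names, `sourceToolName`, and `title` come from the text; every other field (`path`, `imagePath`) and the `cardTitle` function are hypothetical and exist only to make the example self-contained.

```typescript
// Variant names follow the text; payload fields other than
// sourceToolName/title are assumptions for illustration.
type ReplayDetails =
  | { variant: "nativeEdit"; path: string }
  | { variant: "nativeWrite"; path: string }
  | { variant: "activity"; sourceToolName: string; title: string }
  | { variant: "generateImage"; imagePath: string }
  | { variant: "genericFallback"; sourceToolName: string };

// The renderer dispatches on `variant` only, never on the SDK source tool name.
function cardTitle(details: ReplayDetails): string {
  switch (details.variant) {
    case "nativeEdit":
      return `edit ${details.path}`;
    case "nativeWrite":
      return `write ${details.path}`;
    case "activity":
      return details.title;
    case "generateImage":
      return `Cursor image generation ${details.imagePath}`;
    case "genericFallback":
      return `Cursor ${details.sourceToolName}`;
    default: {
      // Compile-time exhaustiveness check: adding a variant breaks this line.
      const unreachable: never = details;
      return unreachable;
    }
  }
}
```

Keeping `sourceToolName` as data on the `activity` variant means a path-only Cursor `edit` can render as a neutral card without pretending to be a structured edit.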
96
104
 
97
105
  Neutral activity rows use pi tool name `cursor` with `activityTitle` / `activitySummary` metadata. Legacy internal replay label keys such as `cursor_sem_search` are compatibility details; user-visible collapsed cards use labels like **Cursor semantic search**.
98
106
 
@@ -122,7 +130,7 @@ These behaviors are by design. They are not pi replay execution bugs:
122
130
  - **`shell` → `bash`:** Cursor shell completions render as native pi `bash` cards, including aliases normalized to `shell`.
123
131
  - **`edit` / `StrReplace` / notebook edits:** native pi `edit` cards only when recorded replay args truthfully satisfy pi's `edit` schema; otherwise neutral **Cursor edit** activity so pi validation does not reject the replay before recorded-result handling.
124
132
  - **`write`:** native pi `write` cards only when recorded content/path args satisfy pi's schema; otherwise neutral **Cursor write** activity.
125
- - **Plan/todo tools:** `createPlan` and `updateTodos` replay is display-only and does not drive pi plan mode or pi todo state (see [What replay does not do](#what-replay-does-not-do)).
133
+ - **Plan/todo tools:** `createPlan` and `updateTodos` replay is display-only and does not drive pi plan mode or pi todo state, even when Cursor SDK mode is `plan` (see [What replay does not do](#what-replay-does-not-do)).
126
134
  - **`semSearch`:** semantic codebase search activity, not web search.
127
135
  - **Web search/fetch:** visible **Cursor web search** / **Cursor web fetch** activity when the SDK reports completed replayable tool data (SDK `mcp` with web `toolName`, host aliases above, or local transcript `webSearchToolCall` / `webFetchToolCall` records). These cards are display-only; pi does not expose executable web search/fetch tools through replay.
128
136
  - **Unknown/future SDK tools:** neutral Cursor activity cards titled with the SDK tool name (for example **Cursor futureSemSearchWidget**) and bounded scrubbed args/result/error text until an explicit spec is added.
@@ -1,8 +1,40 @@
1
1
  # Cursor Native Tool Visual Audit Workflow
2
2
 
3
- This workflow verifies Cursor SDK tool replay the way a human sees it in pi's interactive TUI, without stealing macOS focus.
3
+ This workflow is the canonical repo path for verifying Cursor SDK tool replay the way a human sees it in pi's interactive TUI, without stealing macOS focus.
4
4
 
5
- Use it before accepting replay-card commits or PRs. Text logs and JSONL are necessary, but they are not enough when the claim is visual parity: always keep before/after PNGs for the exact prompt.
5
+ Use it before accepting replay-card commits or PRs, and for every Cursor provider/runtime release where TUI card/color behavior could regress. Text logs and JSONL are necessary, but they are not enough when the claim is visual parity: always keep PNGs for the exact prompt, and keep before/after PNGs when reviewing a rendering change.
6
+
7
+ Current cutover baseline: pi 0.76.0+, exact `@cursor/sdk@1.0.14`, local validation packages `@earendil-works/pi-ai`, `@earendil-works/pi-coding-agent`, and `@earendil-works/pi-tui` at 0.76.0.
8
+
9
+ ## Cursor SDK 1.0.14 / pi 0.76.0 cutover visual record
10
+
11
+ Record the required cutover validation here or in the final release handoff. The default matrix is native replay only: the runner forces native replay registration on, forces Cursor setting sources off, disables the pi bridge, disables overlapping built-in pi tool exposure, and clears inherited Cursor SDK event-debug artifact env. With `--event-debug`, debug capture writes to a deterministic directory under the visual output directory. Do not commit raw ANSI logs, screenshots, terminal recordings, debug artifacts, or `.debug/visual-smoke` scratch files.
12
+
13
+ | Field | Required value / evidence |
14
+ | --- | --- |
15
+ | Command/session used | `npm run smoke:visual -- --ext "$PWD" --cwd "$PWD" --mode plan --out-dir <fresh /tmp dir> --label <matrix label> --prompt <matrix prompt>` with default native-replay isolation |
16
+ | Baseline versions | `pi --version` = 0.76.0; `npm ls` = `@cursor/sdk@1.0.14` and local `@earendil-works/*@0.76.0` |
17
+ | Card categories checked | Claim only categories proven by both PNG and JSONL. Required cutover categories are read, grep/search, find/glob, list, shell success, write, edit/diff, and true read failure. Neutral Cursor plan/todo/task/mode activity is optional/opportunistic and only counts when JSONL contains a completed Cursor workflow event. |
18
+ | Observed status/card colors | Confirm native-looking cards use native pi styling; neutral Cursor activity is not red; true errors are distinct; diff previews show red/green; plan status is readable |
19
+ | Screenshot/ANSI evidence location | External path only, for example `/tmp/pi-cursor-sdk-1014-visual.*/read-package.{ansi,txt,html,png,jsonl.path}` |
20
+ | Debug artifact location | External `.debug/cursor-sdk-events/...` or temp artifact directory path only; do not commit raw artifacts |
21
+ | Pass/fail notes | Summarize any mismatch, blocker, or auth/environment limitation |
22
+
23
+ Required prompt matrix for this cutover:
24
+
25
+ | Label | Prompt | Required JSONL proof | Required visual proof |
26
+ | --- | --- | --- | --- |
27
+ | `read-package` | `Use only your file read tool. Read ./package.json and answer with only the package name. Do not use shell, grep, glob, find, or list tools.` | `toolCall.name=read`, `toolResult.toolName=read`, `isError=false` | Native-looking read card; collapsed label/path readable |
28
+ | `grep-readme` | `Use only your grep/search tool to search ./README.md for the literal string "pi-cursor-sdk". Do not use shell, read, glob, find, ls, or list tools. Report only the first matching file path.` | `toolCall.name=grep`, `toolResult.toolName=grep`, `isError=false` | Native-looking grep/search card; match preview readable |
29
+ | `find-readme` | `Use only your glob/file-search/find tool to find README.md from the repository root. Do not use shell, read, grep, ls, or list tools. Report matched paths exactly.` | `toolCall.name=find`, `toolResult.toolName=find`, `isError=false` | Native-looking find/glob card; matched path readable |
30
+ | `list-src` | `Use only your directory listing tool to list ./src. Do not use shell, read, grep, glob, or find tools. Report whether cursor-provider.ts is present.` | `toolCall.name=ls`, `toolResult.toolName=ls`, `isError=false` | Native-looking list card; directory/path readable |
31
+ | `shell-success` | `Use only your shell/terminal tool to run printf 'cursor visual smoke\\n'. Do not use read, grep, glob, find, ls, edit, or write. Report the output.` | `toolCall.name=bash`, `toolResult.toolName=bash`, `isError=false` | Shell success card is not red/error-styled; stdout readable |
32
+ | `write-file` | `Use your normal file write tool to create .debug/visual-smoke/cursor-mode.txt with exactly two lines: alpha and beta. Do not use shell.` | `toolCall.name=write`, `toolResult.toolName=write`, `isError=false` | Native-looking write card; path/content preview readable |
33
+ | `edit-file` | `Use your normal file edit/str-replace tool to change beta to gamma in .debug/visual-smoke/cursor-mode.txt. Do not use shell.` | `toolCall.name=edit`, `toolResult.toolName=edit`, `isError=false` | Native-looking edit card; diff preview shows red/green added/removed lines |
34
+ | `read-missing` | `Use only your file read tool to read .debug/visual-smoke/does-not-exist.txt. Then explain the result. Do not use shell, grep, glob, find, ls, edit, or write.` | `toolCall.name=read`, `toolResult.toolName=read`, `isError=true` | True failure is visible, bounded, and distinct from neutral Cursor activity |
35
+ | `workflow-activity` | `Stay in Cursor plan mode. If Cursor exposes plan, todo, task, or mode activity for this request, use that capability to outline a tiny unit test without editing files. Otherwise answer with a concise numbered plan. Do not use shell or file mutation tools.` | Optional: completed `cursor` activity whose details/source identify `createPlan`, `updateTodos`, `task`, or mode activity. If absent, record this category as not exercised. | Optional: neutral Cursor workflow activity is neutral, not red, and does not mutate pi plan/todo state. If absent, do not claim this visual category passed. |
36
+
37
+ Do not mark a category passed because the prompt was sent. A category passes only when the PNG shows the expected card and the JSONL shows the expected completed `toolCall` / `toolResult` pair. If Cursor chooses a different tool, rerun with a tighter prompt or record that the category was not exercised.
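
The JSONL half of that pass rule could be checked mechanically along these lines. This is a sketch under a loud assumption: it treats each JSONL line as a flat record with `type`, `name` / `toolName`, and `isError` fields, which is a simplification of whatever pi actually persists. `jsonlHasToolProof` and `ToolProof` are hypothetical names.

```typescript
interface ToolProof {
  toolName: string;
  isError: boolean;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Assumed record shape (NOT confirmed by the source):
//   {"type":"toolCall","name":"read"}
//   {"type":"toolResult","toolName":"read","isError":false}
function jsonlHasToolProof(jsonl: string, proof: ToolProof): boolean {
  let sawCall = false;
  let sawResult = false;
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let record: unknown;
    try {
      record = JSON.parse(line);
    } catch {
      continue; // skip malformed lines instead of failing the whole check
    }
    if (!isRecord(record)) continue;
    if (record["type"] === "toolCall" && record["name"] === proof.toolName) {
      sawCall = true;
    }
    if (
      record["type"] === "toolResult" &&
      record["toolName"] === proof.toolName &&
      record["isError"] === proof.isError
    ) {
      sawResult = true;
    }
  }
  // Both halves of the pair are required; a sent prompt alone proves nothing.
  return sawCall && sawResult;
}
```

A check like this covers only the JSONL column of the matrix; the PNG still has to be inspected separately before a category counts as passed.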
6
38
 
7
39
  ## When to use this
8
40
 
@@ -16,70 +48,70 @@ Use this workflow when changing or reviewing:
16
48
 
17
49
  Do not use this for ordinary unit-only logic changes.
18
50
 
19
- ## Why this workflow exists
51
+ ## Canonical visual inspection path
20
52
 
21
53
  Earlier manual verification used a visible Terminal window plus `screencapture`. That worked, but it stole system focus and made it easy for the user to type into the audit window by accident.
22
54
 
23
- The preferred workflow is now offscreen:
55
+ The canonical workflow is now offscreen and browser-rendered:
24
56
 
25
57
  1. Spawn `pi` in a pseudo-terminal at a fixed size.
26
58
  2. Feed the prompt programmatically.
27
- 3. Save raw ANSI output and plain text output.
28
- 4. Render the terminal buffer through xterm.js in headless Playwright.
29
- 5. Save a PNG screenshot.
59
+ 3. Save raw ANSI output and stripped plain text output.
60
+ 4. Render the terminal buffer through a browser-backed terminal renderer, preferably xterm.js.
61
+ 5. Save PNG screenshots with `agent_browser` when the harness is available, or Playwright directly when running outside that harness.
30
62
  6. Inspect the session JSONL for exact persisted `toolCall` / `toolResult` data.
31
63
 
32
- This gives human-like visual evidence without activating Terminal, iTerm, or a browser window.
64
+ This is the best default release path because it exercises the real pi TUI, captures card class/color/label/order/truncation issues before users see them, avoids desktop focus stealing, and leaves reviewable artifacts. Use visible Terminal/Ghostty screenshots only for terminal-specific or pixel-level bugs that cannot be judged through browser-rendered ANSI.
33
65
 
34
66
  ## Tool stack
35
67
 
36
- Install the harness outside this repo so generated assets and temporary dependencies do not pollute commits:
68
+ The canonical runner is checked in at `scripts/visual-tui-smoke.mjs` and exposed as `npm run smoke:visual`. It uses tmux for the fixed-size PTY, `@xterm/xterm` for browser rendering, and Playwright for automatic PNG capture. It resolves `pi` by directly walking the parent `PATH`, uses `process.execPath` for Node, and prepends that Node directory for prereq checks and tmux launches so `#!/usr/bin/env node` shims use the validated Node and a login shell or stale tmux server `PATH` cannot silently select a different executable.
69
+
70
+ One-time setup from a clean checkout:
37
71
 
38
72
  ```bash
39
- HARNESS=/tmp/pi-visual-harness
40
- rm -rf "$HARNESS"
41
- mkdir -p "$HARNESS"
42
- cd "$HARNESS"
43
- npm init -y
44
- npm install node-pty @xterm/xterm playwright
45
- npm rebuild node-pty
73
+ npm install
74
+ npx playwright install chromium
46
75
  ```
47
76
 
48
- `npm rebuild node-pty` is useful after Node upgrades; without it, `node-pty` may fail with `posix_spawnp failed`.
77
+ `npx playwright install chromium` is only needed for automatic PNG capture. When running inside the pi agent harness, `agent_browser` is the preferred screenshot tool for generated HTML/ANSI output because it can open local files, verify saved artifacts, and capture exact evidence paths. In that case, run `npm run smoke:visual -- --no-screenshot ...` and screenshot the generated `.html` with `agent_browser`. Outside the harness, use Playwright through the checked-in runner.
49
78
 
50
79
  ## Runner contract
51
80
 
52
- A runner script should:
53
-
54
- - Spawn `pi -e <extension-dir> --model cursor/composer-2.5` with:
55
- - `PI_CURSOR_NATIVE_TOOL_DISPLAY=1`
56
- - `TERM=xterm-256color`
57
- - fixed PTY size, for example `150x45`
58
- - cwd set to the target audit repo.
59
- - Wait for startup.
60
- - Write the exact prompt and carriage return to the PTY.
61
- - Wait a bounded amount of time.
62
- - Save:
63
- - `<label>.ansi` raw terminal bytes.
64
- - `<label>.txt` stripped text for quick search.
65
- - `<label>.png` rendered xterm screenshot.
66
- - `<label>.jsonl.path` pointing to the latest pi session JSONL.
67
- - Kill the PTY child after capture.
68
- - Check for leftover commands when prompts can background work, especially shell timeout tests.
69
-
70
- Example invocation shape:
81
+ `scripts/visual-tui-smoke.mjs` is the durable source of truth for this workflow. It must keep supporting:
82
+
83
+ - fixed-size tmux PTY execution of the parent-resolved `pi -e <extension-dir> --model cursor/composer-2.5`
84
+ - parent-resolved `pi` and `tmux` command paths reused in tmux-launched runs, with `process.execPath`'s directory prepended for prereq checks and tmux launches so Node shims use the validated Node
85
+ - `PI_CURSOR_NATIVE_TOOL_DISPLAY=1`
86
+ - `PI_CURSOR_REGISTER_NATIVE_TOOLS=1` by default
87
+ - `PI_CURSOR_SETTING_SOURCES=none` by default
88
+ - `PI_CURSOR_PI_TOOL_BRIDGE=0` by default
89
+ - `PI_CURSOR_EXPOSE_BUILTIN_TOOLS=0` by default
90
+ - Cursor SDK event-debug artifact env cleared before each run; `--event-debug` sets a deterministic debug directory under `--out-dir`
91
+ - `TERM=xterm-256color`
92
+ - cwd set to the target audit repo
93
+ - prompt paste plus carriage return into the interactive TUI
94
+ - bounded post-prompt wait via `--wait-ms`
95
+ - artifacts outside the repo by default
96
+ - `<label>.ansi`, `<label>.txt`, `<label>.html`, `<label>.png`, and `<label>.jsonl.path`
97
+ - `--label`, `--ext`, `--cwd`, `--prompt`, `--prompt-file`, `--wait-ms`, and `--out-dir`
98
+ - `--setting-sources` and `--bridge` opt-ins for non-default visual audits; `--expose-builtin-tools` is accepted only with `--bridge`
99
+ - repeatable `--leftover-pattern` checks for prompts that can background work
100
+ - `-h` / `--help` with examples and exit codes
101
+
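
The environment defaults and artifact names in the contract above can be sketched as plain data. This is a hedged illustration rather than the runner's code: the variable names and default values come from the list, but the function names, and the assumption that `--bridge` and `--expose-builtin-tools` flip their variables to `1`, are hypothetical.

```python
import os

# Default values as listed in the runner contract.
NATIVE_REPLAY_DEFAULTS = {
    "PI_CURSOR_NATIVE_TOOL_DISPLAY": "1",
    "PI_CURSOR_REGISTER_NATIVE_TOOLS": "1",
    "PI_CURSOR_SETTING_SOURCES": "none",
    "PI_CURSOR_PI_TOOL_BRIDGE": "0",
    "PI_CURSOR_EXPOSE_BUILTIN_TOOLS": "0",
    "TERM": "xterm-256color",
}

ARTIFACT_SUFFIXES = ("ansi", "txt", "html", "png", "jsonl.path")


def build_smoke_env(base_env, bridge=False, expose_builtin_tools=False,
                    setting_sources=None):
    """Overlay the native-replay defaults, then apply explicit opt-ins."""
    if expose_builtin_tools and not bridge:
        raise ValueError("--expose-builtin-tools is accepted only with --bridge")
    env = dict(base_env)
    env.update(NATIVE_REPLAY_DEFAULTS)
    if bridge:
        env["PI_CURSOR_PI_TOOL_BRIDGE"] = "1"  # assumed mapping of the flag
    if expose_builtin_tools:
        env["PI_CURSOR_EXPOSE_BUILTIN_TOOLS"] = "1"  # assumed mapping of the flag
    if setting_sources is not None:
        env["PI_CURSOR_SETTING_SOURCES"] = setting_sources
    return env


def artifact_paths(out_dir, label):
    """Map each artifact suffix to <out_dir>/<label>.<suffix>."""
    return {s: os.path.join(out_dir, f"{label}.{s}") for s in ARTIFACT_SUFFIXES}
```

Keeping the defaults in one table makes it easy to see at review time which surface a given run exercised.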
102
+ Example invocation:
71
103
 
72
104
  ```bash
73
- node /tmp/pi-visual-harness/run-pi-visual.mjs \
74
- --label after-shell-nonzero \
75
- --ext /path/to/pi-cursor-sdk \
76
- --cwd /path/to/test-workspace \
77
- --prompt "Run \`printf 'cursor-shell-stderr\\n' >&2; exit 7\` using only the shell/terminal tool. Do not use read, grep, glob, find, ls, edit, or write. Print the command result exactly, then stop." \
78
- --wait-ms 30000 \
79
- --out-dir /tmp/pi-visual-harness/review-current
105
+ npm run smoke:visual -- \
106
+ --label shell-success \
107
+ --ext "$PWD" \
108
+ --cwd "$PWD" \
109
+ --prompt "Use only your shell/terminal tool to run printf 'cursor visual smoke\\n'. Do not use read, grep, glob, find, ls, edit, or write. Report the output." \
110
+ --wait-ms 60000 \
111
+ --out-dir /tmp/pi-cursor-sdk-visual-review
80
112
  ```
81
113
 
82
- Keep the runner in `/tmp` unless the project explicitly decides to check in a maintained audit harness.
114
+ The runner writes the `.png` through Playwright by default. In the pi agent harness, pass `--no-screenshot`, open the generated `.html` with `agent_browser`, save a PNG screenshot, and record that path beside the runner artifacts. A default run produces native replay evidence only. For bridge/default-settings visual audits, pass `--bridge`, `--bridge --expose-builtin-tools`, or `--setting-sources <value>` explicitly and label that evidence separately.
83
115
 
84
116
  ## Before/after comparison
85
117
 
@@ -103,34 +135,35 @@ ln -s "$AFTER_WT/node_modules" "$BEFORE_WT/node_modules"
103
135
  Then run the same prompt against both extension dirs:
104
136
 
105
137
  ```bash
106
- node /tmp/pi-visual-harness/run-pi-visual.mjs \
138
+ npm run smoke:visual -- \
107
139
  --label before-glob-single \
108
140
  --ext "$BEFORE_WT" \
109
141
  --cwd "$TARGET" \
110
- --prompt "Find files matching \`src/tools/reindex.ts\` using only the glob/file-search tool. Do not use shell, bash, grep, read, or ls. Print the matched files exactly as found, then stop." \
142
+ --prompt "Use only your glob/file-search/find tool to find src/tools/reindex.ts. Do not use shell, bash, grep, read, ls, or list. Print the matched files exactly as found, then stop." \
111
143
  --wait-ms 16000 \
112
- --out-dir /tmp/pi-visual-harness/review-current
144
+ --out-dir /tmp/pi-cursor-sdk-visual-review-current
113
145
 
114
- node /tmp/pi-visual-harness/run-pi-visual.mjs \
146
+ npm run smoke:visual -- \
115
147
  --label after-glob-single \
116
148
  --ext "$AFTER_WT" \
117
149
  --cwd "$TARGET" \
118
- --prompt "Find files matching \`src/tools/reindex.ts\` using only the glob/file-search tool. Do not use shell, bash, grep, read, or ls. Print the matched files exactly as found, then stop." \
150
+ --prompt "Use only your glob/file-search/find tool to find src/tools/reindex.ts. Do not use shell, bash, grep, read, ls, or list. Print the matched files exactly as found, then stop." \
119
151
  --wait-ms 16000 \
120
- --out-dir /tmp/pi-visual-harness/review-current
152
+ --out-dir /tmp/pi-cursor-sdk-visual-review-current
121
153
  ```
122
154
 
123
- For review, create a simple HTML/PNG gallery that places `before-*.png` and `after-*.png` side by side. Keep the generated gallery in `/tmp` unless explicitly asked to commit visual artifacts.
155
+ For review, create a simple HTML/PNG gallery that places `before-*.png` and `after-*.png` side by side. Keep the generated gallery in `/tmp` unless explicitly asked to commit visual artifacts. In agent-harness runs, use `agent_browser` to open that gallery or the generated single-run HTML and save verified screenshots.
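
A side-by-side gallery of that kind can be generated with a few lines of Python. This is a minimal sketch, assuming both runs wrote their PNGs into the same output directory; the `build_gallery` name and the `gallery.html` filename are hypothetical.

```python
import html
from pathlib import Path


def build_gallery(out_dir):
    """Pair before-*.png with after-*.png by shared suffix and write gallery.html."""
    out = Path(out_dir)
    rows = []
    for before in sorted(out.glob("before-*.png")):
        suffix = before.name[len("before-"):]
        after = out / f"after-{suffix}"
        if not after.exists():
            continue  # skip runs that have no matching after screenshot
        title = html.escape(suffix[: -len(".png")])
        rows.append(
            f"<h2>{title}</h2>\n"
            '<div style="display:flex;gap:8px">'
            f'<img src="{html.escape(before.name)}" width="49%">'
            f'<img src="{html.escape(after.name)}" width="49%">'
            "</div>"
        )
    page = "<!doctype html>\n<title>before/after</title>\n" + "\n".join(rows) + "\n"
    target = out / "gallery.html"
    target.write_text(page, encoding="utf-8")
    return target
```

Because the image paths are relative, the gallery stays valid as long as it lives next to the PNGs, for example under `/tmp`.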
124
156
 
125
157
  ## JSONL inspection
126
158
 
127
159
  For each visual claim, inspect the JSONL path written by the runner. Confirm at least:
128
160
 
129
- - `toolCall.name` is the expected pi-facing replay tool name.
161
+ - `toolCall.name` matches the prompt matrix for the category being claimed.
130
162
  - `toolCall.arguments` show the expected user-facing args.
131
163
  - `toolResult.toolName` matches the call.
132
164
  - `toolResult.content[0].text` contains the recorded body expected in the card.
133
165
  - `toolResult.isError` matches the visual card state.
166
+ - The screenshot label and JSONL path are recorded together, so a card category cannot be claimed from a screenshot or JSONL alone.
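
The checks in this list can be expressed as a small function over one extracted call/result pair. The sketch below assumes the pair has already been pulled out of the session JSONL as two dictionaries whose keys mirror the field names above (`name`, `toolName`, `content[0].text`, `isError`). How those objects are nested inside each JSONL line is not shown here, and `check_pair` is a hypothetical name.

```python
def check_pair(tool_call, tool_result, expected_name, expected_is_error,
               expected_text):
    """Return a list of human-readable mismatches; an empty list means the pair passes."""
    problems = []
    if tool_call.get("name") != expected_name:
        problems.append(
            f"toolCall.name is {tool_call.get('name')!r}, expected {expected_name!r}"
        )
    if tool_result.get("toolName") != tool_call.get("name"):
        problems.append("toolResult.toolName does not match toolCall.name")
    content = tool_result.get("content") or []
    text = content[0].get("text", "") if content else ""
    if expected_text not in text:
        problems.append("toolResult.content[0].text lacks the expected body")
    if bool(tool_result.get("isError")) != expected_is_error:
        problems.append("toolResult.isError does not match the visual card state")
    return problems
```

Returning every mismatch at once, rather than stopping at the first, keeps the review note for a failed run complete.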
134
167
 
135
168
  For local pi MCP bridge claims, also confirm:
136
169
 
@@ -143,7 +176,7 @@ Small helper pattern:
143
176
  ```bash
144
177
  python3 - <<'PY'
145
178
  import json, pathlib
146
- path = pathlib.Path('/tmp/pi-visual-harness/review-current/after-shell-nonzero.jsonl.path').read_text().strip()
179
+ path = pathlib.Path('/tmp/pi-cursor-sdk-visual-review-current/shell-success.jsonl.path').read_text().strip()
147
180
  for line in pathlib.Path(path).read_text().splitlines():
148
181
      obj = json.loads(line)
149
182
      msg = obj.get('message', {})
@@ -159,25 +192,37 @@ PY
159
192
 
160
193
  ## Safety rules
161
194
 
162
- - Prefer the offscreen PTY renderer. Do not use `osascript`, visible Terminal windows, or `screencapture` unless a user explicitly asks for a real desktop screenshot.
195
+ - Prefer the canonical offscreen PTY plus browser-rendered screenshot path. Do not use `osascript`, visible Terminal windows, or `screencapture` unless a user explicitly asks for a real desktop screenshot or the bug is terminal-specific.
163
196
  - Keep generated screenshots, HTML galleries, ANSI logs, and temporary harness dependencies out of the repo by default.
164
197
  - Use short, deterministic prompts with bounded wait times.
165
- - For timeout/background prompts, always check for leftovers:
198
+ - For timeout/background prompts, always check for leftovers, preferably with the runner's repeatable `--leftover-pattern` option:
199
+
200
+ ```bash
201
+ npm run smoke:visual -- \
202
+ --label shell-timeout \
203
+ --prompt 'Run sleep 30 && echo should-not-print using only the shell tool.' \
204
+ --leftover-pattern 'sleep 30|should-not-print'
205
+ ```
206
+
207
+ Manual fallback:
166
208
 
167
209
  ```bash
168
- ps -axo pid,etime,command | rg "sleep 2|should-not-print|<audit-session-label>" || true
210
+ ps -axo pid,etime,command | rg "sleep 30|should-not-print|<audit-session-label>" || true
169
211
  ```
170
212
 
171
213
  - If the model uses a different tool than requested, record it as model/provider behavior unless JSONL shows replay lost or misrendered a completed Cursor tool event.
172
- - Visual output can differ slightly from macOS Terminal fonts because xterm.js renders offscreen. Treat this workflow as evidence for card class, color state, labels, ordering, truncation, and content. Use a real terminal screenshot only for pixel-level terminal-specific bugs.
214
+ - Do not use `--bridge`, `--bridge --expose-builtin-tools`, or non-`none` `--setting-sources` for the default native replay matrix. Those opt-ins validate different surfaces and must be labeled separately.
215
+ - Visual output can differ slightly from macOS Terminal fonts because browser/xterm renderers run offscreen. Treat this workflow as authoritative release evidence for card class, color state, labels, ordering, truncation, footer/status readability, and content. Use a real terminal screenshot only for pixel-level terminal-specific bugs.
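
The manual leftover check in the rules above amounts to filtering a process listing by pattern. As a rough sketch, assuming the output of `ps -axo pid,etime,command` has been captured as text, the same filter in Python looks like this. `find_leftovers` is a hypothetical helper and is not the runner's `--leftover-pattern` implementation.

```python
import re


def find_leftovers(ps_output, patterns):
    """Return the process lines that match any of the given regex patterns."""
    compiled = [re.compile(p) for p in patterns]
    hits = []
    for line in ps_output.splitlines():
        if any(rx.search(line) for rx in compiled):
            hits.append(line.strip())
    return hits


if __name__ == "__main__":
    # Canned listing for illustration; a real check would capture it with
    # subprocess.run(["ps", "-axo", "pid,etime,command"], capture_output=True, text=True).
    listing = (
        "  PID ELAPSED COMMAND\n"
        "  101   00:42 sleep 30\n"
        "  102   10:00 /bin/zsh\n"
    )
    for hit in find_leftovers(listing, [r"sleep 30", r"should-not-print"]):
        print(hit)
```

An empty result is the passing case; any hit should be killed and recorded before the run is accepted as evidence.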
173
216
 
174
217
  ## Required evidence before commit or merge
175
218
 
176
219
  Before accepting a replay-card change, provide:
177
220
 
178
- - Before and after PNG paths.
221
+ - Browser-rendered PNG paths captured from offscreen ANSI output.
222
+ - Before and after PNG paths when comparing a rendering change.
179
223
  - The prompt used for each pair.
224
+ - ANSI/text/HTML paths when helpful for review.
180
225
  - JSONL paths for each run.
181
226
  - A short statement of what changed visually.
182
- - The relevant JSONL `toolCall` / `toolResult` facts.
227
+ - The relevant JSONL `toolCall` / `toolResult` facts, including expected tool name and `isError` state from the prompt matrix.
183
228
  - `npm test` and `npm run typecheck` results, unless the change is documentation-only.