ptywright 0.3.0 → 0.5.0
- package/README.md +76 -31
- package/dist/agent.mjs +2 -2
- package/dist/bin/ptywright.mjs +1 -1
- package/dist/{cli-CfvlbRoZ.mjs → cli-PnG6UR43.mjs} +2390 -2309
- package/dist/cli.mjs +1 -1
- package/dist/config-bGg636EW.mjs +52 -0
- package/dist/config.mjs +2 -0
- package/dist/env-DPYHo-zH.mjs +36 -0
- package/dist/index.mjs +1 -1
- package/dist/manifest_files-DW80c1H7.mjs +77 -0
- package/dist/mcp.mjs +1 -1
- package/dist/pty-cassette.mjs +1 -1
- package/dist/{runner-zi0nItvB.mjs → runner-C1gPRyCM.mjs} +2002 -1038
- package/dist/{runner-zApMYWZx.mjs → runner-wW_DCBX7.mjs} +1576 -1422
- package/dist/script.mjs +1 -1
- package/dist/{server-BC3yo-dq.mjs → server-DMnnXjWv.mjs} +2643 -2527
- package/dist/session.mjs +1 -1
- package/dist/{terminal_session-DopC7Xg6.mjs → terminal_session-DJKr-O3X.mjs} +349 -328
- package/package.json +3 -1
- package/skills/ptywright-testing/SKILL.md +113 -79
- package/skills/ptywright-testing/agents/openai.yaml +4 -0
- package/skills/ptywright-testing/references/agent-regression.md +132 -0
- package/skills/ptywright-testing/references/ci-and-debugging.md +95 -0
- package/skills/ptywright-testing/references/mcp-tools.md +91 -0
- package/skills/ptywright-testing/references/raw-pty-cassettes.md +82 -0
- package/skills/ptywright-testing/references/script-runner.md +80 -0
- package/dist/{pty_like-Cpkh_O9B.mjs → pty_like-BjeBibSL.mjs} +2 -2
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "ptywright",
-  "version": "0.
+  "version": "0.5.0",
   "description": "Terminal/TUI automation driver over PTY + xterm, exposed as MCP tools",
   "keywords": [
     "agent",
@@ -37,6 +37,7 @@
   "exports": {
     ".": "./dist/cli.mjs",
     "./agent": "./dist/agent.mjs",
+    "./config": "./dist/config.mjs",
     "./mcp": "./dist/mcp.mjs",
     "./pty-cassette": "./dist/pty-cassette.mjs",
     "./session": "./dist/session.mjs",
@@ -62,6 +63,7 @@
     "prepublishOnly": "bun run build"
   },
   "dependencies": {
+    "@aitty/snapshot": "^0.5.1",
     "@modelcontextprotocol/sdk": "^1.25.2",
     "@xterm/headless": "^6.0.0",
     "asciinema-player": "3.9.0",
package/skills/ptywright-testing/SKILL.md
CHANGED
@@ -1,122 +1,156 @@
 ---
 name: ptywright-testing
-description:
+description: Build, run, record, replay, debug, and maintain deterministic terminal, TUI, PTY cassette, and browser-terminal agent regression tests with ptywright. Use when an agent needs to drive CLI/TUI apps, create ptywright scripts, configure ptywright.config.*, record or replay PTY output, solidify browser terminal agent flows into non-AI snapshot tests, inspect generated artifacts, or diagnose ptywright CI failures.
 ---
 
 # Ptywright Testing
 
-Use ptywright
+Use ptywright when the task involves terminal or browser-terminal behavior that should be repeatable without manual inspection. Prefer stable text, DOM, and terminal snapshots over screenshots unless the user explicitly needs visual media.
 
-##
+## First Decision
 
-
-
-
+Choose one workflow before editing:
+
+- **Browser terminal agent regression**: Use when a web app renders a terminal and exposes `[data-terminal-root]`, or when testing integrations such as Codex/Claude/Droid wrappers. Read `references/agent-regression.md`.
+- **Raw PTY recording and replay**: Use when the user wants to capture terminal bytes from `node-pty`, Bun Terminal, `bun-pty`, or an arbitrary command, then replay them into another renderer. Read `references/raw-pty-cassettes.md`.
+- **Scripted TUI tests**: Use when testing a CLI/TUI directly through ptywright scripts, golden snapshots, and HTML reports. Read `references/script-runner.md`.
+- **MCP interactive driving or recording**: Use when an agent should interact through ptywright MCP tools or record an MCP-driven session into a script. Read `references/mcp-tools.md`.
+- **CI/debugging/artifact triage**: Use when a ptywright run failed, snapshots mismatch, a manifest is stale, or reusable commands need to be executed. Read `references/ci-and-debugging.md`.
+
+If more than one workflow applies, start with the highest-level workflow that preserves determinism. For example, for an evolving browser terminal renderer, record a raw PTY cassette first, then create a browser agent regression that replays the cassette into the renderer.
 
-
-bun add -g ptywright
-ptywright <command>
+## Installation And Entry Points
 
-
+Prefer the local project command when working inside a ptywright checkout:
+
+```bash
 bun run bin/ptywright <command>
 ```
 
-
+Prefer published package commands in downstream projects:
 
-
-
+```bash
+bunx ptywright@latest <command>
+# or
+npx ptywright@latest <command>
+```
 
-
+Common commands:
 
 ```bash
-
-
-
-
-
+ptywright mcp
+ptywright mcp --caps core
+ptywright run <file.json|file.ts>
+ptywright run-all --dir scripts
+ptywright agent run <flow.json> --update-snapshots
+ptywright agent check
+ptywright pty record --out tests/cassettes/session.pty.json -- <command> [args...]
+```
 
-
-
+## Project Config
+
+Use `ptywright.config.ts` for project defaults, not as a second test DSL. The flow file remains the test case.
+
+```ts
+import { defineConfig } from "ptywright/config";
+
+export default defineConfig({
+  agent: {
+    artifactsRoot: ".tmp/agent",
+    cassetteDir: "tests/agent-cassettes",
+    snapshotDir: "tests/agent-snapshots",
+    defaults: {
+      headless: true,
+      timeoutMs: 45_000,
+      screenshot: false,
+      viewports: [{ name: "desktop", width: 1280, height: 820 }],
+      mask: [{ regex: "session_[a-z0-9]+", replacement: "<session>" }],
+    },
+  },
+});
 ```
 
-
+Priority rule: explicit CLI args override flow fields, and flow fields override config defaults. Config-relative paths resolve from the config file directory.
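The priority rule can be sketched as a three-layer lookup. This is only an illustration under stated assumptions: `AgentOptions` and `resolveOptions` are hypothetical names, the fields are borrowed from the config example, and ptywright's actual resolution code is not part of this diff.

```typescript
// Hypothetical option bag; field names are borrowed from the config example.
type AgentOptions = {
  headless?: boolean;
  timeoutMs?: number;
  screenshot?: boolean;
};

// For each field, take the first defined value in priority order:
// explicit CLI args, then flow fields, then config defaults.
function resolveOptions(
  configDefaults: AgentOptions,
  flowFields: AgentOptions,
  cliArgs: AgentOptions,
): AgentOptions {
  return {
    headless: cliArgs.headless ?? flowFields.headless ?? configDefaults.headless,
    timeoutMs: cliArgs.timeoutMs ?? flowFields.timeoutMs ?? configDefaults.timeoutMs,
    screenshot: cliArgs.screenshot ?? flowFields.screenshot ?? configDefaults.screenshot,
  };
}

const resolved = resolveOptions(
  { headless: true, timeoutMs: 45_000, screenshot: false }, // config defaults
  { timeoutMs: 20_000 },                                    // flow file
  { headless: false },                                      // CLI flags
);
console.log(resolved);
```

Using `??` rather than `||` keeps an explicit `false` or `0` from a higher-priority layer from being treated as unset.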
+
+## Core Invariants
 
-
+- Keep tests deterministic: fixed terminal size, explicit waits, stable snapshots, masks for random text.
+- Prefer structured APIs and generated reusable commands over shell string reconstruction.
+- Treat `--update-snapshots` as the only intentional baseline update path.
+- Use generated manifests and summaries as durable reproduction bundles.
+- Do not hand-edit cassette, run-record, summary, or manifest command metadata unless a test explicitly asks for malformed fixture data.
+- Avoid app-specific assumptions. ptywright should integrate with any renderer through commands, URLs, DOM roots, and cassette data.
 
-
+## Minimal Examples
+
+Browser agent flow:
 
 ```json
 {
-  "
-
-
-
-
-
+  "name": "browser_terminal_smoke",
+  "launch": {
+    "mode": "command",
+    "agentFlavor": "generic",
+    "command": "node",
+    "args": ["scripts/start-browser-terminal.js", "--print-url"],
+    "waitForUrlMs": 15000
+  },
+  "steps": [
+    { "type": "waitForStableDom" },
+    { "type": "snapshot", "name": "ready", "targets": ["terminal", "dom"] }
+  ]
 }
 ```
 
-
-
-### Run the whole suite (preferred)
+Raw PTY cassette:
 
 ```bash
-
+ptywright pty record --out tests/cassettes/codex.pty.json -- codex --yolo
+ptywright pty replay tests/cassettes/codex.pty.json --speed 0
+ptywright pty validate tests/cassettes/codex.pty.json
 ```
 
-
-- `reportPath` (open in a browser)
-- `summaryPath` (`run.summary.json` for agents/CI)
-
-MCP equivalent:
-- `run_all_scripts` (defaults: `dir="scripts"`, suite report in `.tmp/run-all/`)
-- Keep MCP output small: `run_all_scripts(includeEntries="failures", maxEntries=20)`
+Script runner:
 
-
-
-
-
+```json
+{
+  "name": "tui_smoke",
+  "command": ["bun", "tests/fixtures/tui_demo.ts"],
+  "cols": 80,
+  "rows": 24,
+  "steps": [
+    { "type": "waitForText", "text": "Ready" },
+    { "type": "snapshot", "kind": "text", "saveAs": "ready" }
+  ]
+}
 ```
 
-
-
-## Debug a failure
-
-Script runner artifacts to check (paths are returned by CLI/MCP):
-
-- `*.report.html` (timeline + snapshots)
-- `*.cast` (full playback)
-- `failure.last.view.txt` / `failure.last.txt` (last screen)
-- `failure.error.txt` (stack trace)
+## Verification Commands
 
-
-
-## Record an interactive flow (MCP)
-
-1) `start_script_recording(name=...)`
-2) Drive the app with normal tools:
-- `launch_session` → `send_text` / `press_key` / `wait_for_text` / `snapshot_*`
-3) Add golden checkpoints: `mark(label=...)`
-4) Export: `stop_script_recording(recordingId=..., writeFiles=true)`
-
-## All-tools smoke (recommended)
-
-To verify ptywright MCP tool coverage without relying on external apps/network, run:
+Use the narrowest useful verification first, then broaden when editing shared behavior:
 
 ```bash
-bun
+bun run format:check
+bun run lint
+bun test tests/agent_config.test.ts
+bun test tests/agent_rerun.test.ts
+bun run build
+bun run check
 ```
 
-
+For downstream projects:
 
-
-
-
-
-
-
+```bash
+ptywright agent validate <artifact-or-dir>
+ptywright agent inspect <artifact-or-dir>
+ptywright agent commands <artifact-or-dir> --json
+ptywright agent exec <artifact-or-dir> --command rerun
+```
 
-##
+## Resource Map
 
-- `
-
+- `references/agent-regression.md`: Browser terminal agent flows, cassettes, snapshots, promote/check/rerun, and renderer integration.
+- `references/raw-pty-cassettes.md`: Raw PTY cassette recording, replay, wrapper integration, and renderer handoff.
+- `references/script-runner.md`: JSON/TS script runner, MCP script recording, goldens, masks, and reports.
+- `references/mcp-tools.md`: MCP setup and tool selection.
+- `references/ci-and-debugging.md`: Failure triage, manifests, reusable commands, snapshot updates, and CI gates.
package/skills/ptywright-testing/references/agent-regression.md
ADDED
@@ -0,0 +1,132 @@
+# Browser Agent Regression
+
+Use this workflow when ptywright drives a browser-hosted terminal renderer. The renderer must expose a terminal root as `[data-terminal-root]`.
+
+## Contract
+
+`launch.mode=command` is the preferred integration:
+
+- `command` and `args` start a wrapper or app process.
+- The process prints a browser URL to stdout or stderr.
+- ptywright opens that URL with Playwright.
+- The page renders the terminal under `[data-terminal-root]`.
+- Steps drive browser input and compare terminal/DOM snapshots.
+
+Use `launch.mode=url` only when the page is already running.
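The "process prints a browser URL" part of the contract can be pictured as a regex scan over the process output. The sketch below is an assumption about how such matching could work, not ptywright's implementation: `extractUrl` is a hypothetical helper, and the default pattern is the `urlRegex` value that appears in this file's recommended flow shape.

```typescript
// Pattern copied from the `urlRegex` field of the recommended flow shape.
const DEFAULT_URL_REGEX = "(https?://\\S+)";

// Scan output line by line and return the first capture group
// (or the whole match if the pattern has no group).
function extractUrl(output: string, urlRegex: string = DEFAULT_URL_REGEX): string | undefined {
  const pattern = new RegExp(urlRegex);
  for (const line of output.split(/\r?\n/)) {
    const match = pattern.exec(line);
    if (match) return match[1] ?? match[0];
  }
  return undefined;
}

// Made-up harness output for illustration.
const sampleOutput = [
  "starting harness...",
  "listening on http://127.0.0.1:4173/terminal?session=demo",
].join("\n");

const launchUrl = extractUrl(sampleOutput);
console.log(launchUrl);
```

A harness that prints the URL on its own line, with nothing trailing it, keeps a greedy pattern like `\S+` from capturing stray punctuation.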
+
+## Flow Lifecycle
+
+1. Create a flow JSON or TS file.
+2. Run live once and write baselines:
+
+```bash
+ptywright agent run tests/agents/name.flow.json --update-snapshots
+```
+
+3. Compare later without updating:
+
+```bash
+ptywright agent run tests/agents/name.flow.json
+```
+
+4. Replay a run record or cassette without the live agent:
+
+```bash
+ptywright agent replay .tmp/agent/name/name.agent-run.json
+ptywright agent replay .tmp/agent/name/name.cassette.json
+```
+
+5. Promote a good live run into committed non-AI regression:
+
+```bash
+ptywright agent promote .tmp/agent/name/name.cassette.json --update-snapshots
+```
+
+6. Run the committed suite:
+
+```bash
+ptywright agent check
+ptywright agent replay-all tests/agent-cassettes --update-snapshots
+```
+
+## Recommended Flow Shape
+
+```json
+{
+  "name": "agent_renderer_smoke",
+  "launch": {
+    "mode": "command",
+    "agentFlavor": "codex",
+    "command": "node",
+    "args": [
+      "tests/harness/browser-terminal.js",
+      "--",
+      "codex",
+      "--yolo",
+      "--print-url"
+    ],
+    "waitForUrlMs": 20000,
+    "urlRegex": "(https?://\\S+)"
+  },
+  "defaults": {
+    "timeoutMs": 45000,
+    "screenshot": false,
+    "mask": [{ "regex": "req_[a-zA-Z0-9]+", "replacement": "<request-id>" }]
+  },
+  "viewports": [{ "name": "desktop", "width": 1280, "height": 820 }],
+  "steps": [
+    { "type": "waitForStableDom", "quietMs": 600 },
+    { "type": "snapshot", "name": "launch", "targets": ["terminal", "dom"] }
+  ]
+}
+```
+
+Keep the flow generic. ptywright should not import app internals. The downstream app should provide a command or test harness that prints a browser URL and can consume replay data if needed.
+
+## Recording Browser Interactions
+
+Use `agent record` when manually exploring a browser-terminal flow:
+
+```bash
+ptywright agent record tests/agents/base.flow.json \
+  --out tests/agents/recorded.flow.json \
+  --duration-ms 60000 \
+  --headed
+```
+
+End recording by waiting for `duration-ms` to elapse or by stopping the process. The output is a normal flow JSON containing keyboard/click steps plus a final checkpoint.
+
+## Non-AI Regression Strategy
+
+For evolving agent UIs:
+
+1. Capture or create a stable PTY or browser-agent cassette.
+2. Replay that cassette into the renderer.
+3. Snapshot terminal text and DOM.
+4. Commit cassette and snapshots.
+5. Use `agent check` in CI.
+
+This lets renderer changes be verified without asking the live AI to reproduce the same answer.
+
+## Artifact Meanings
+
+- `.agent-run.json`: Per-run record with `commands.replay.argv` and `commands.updateSnapshots.argv`.
+- `.cassette.json`: Normalized flow spec plus captured terminal/DOM frames and hashes.
+- `agent-replay.summary.json`: Replay-all suite summary.
+- `agent-check.summary.json`: Committed cassette check summary.
+- `agent-promote.summary.json`: Promote operation summary.
+- `ptywright-agent.manifest.json`: Hash-indexed portable artifact bundle.
+- `index.html`: Human-readable report with snapshots and reusable commands.
+
+## Common Commands
+
+```bash
+ptywright agent inspect .tmp/agent-check
+ptywright agent validate .tmp/agent-check
+ptywright agent commands .tmp/agent-check --json
+ptywright agent exec .tmp/agent-check --command rerun
+ptywright agent exec .tmp/agent-check --command updateSnapshots
+ptywright agent rerun .tmp/agent-check/agent-check.summary.json
+```
+
+Prefer `agent exec` when an artifact already contains a reusable command. It avoids shell parsing and relocates copied manifest bundles safely.
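Why a stored argv array is safer than a rebuilt shell string can be shown with a path that contains a space. In this sketch only the `commands.<name>.argv` shape comes from the artifact description; the `RunRecord` type, the helper names, and the sample path are hypothetical.

```typescript
// Hypothetical slice of a run record; only `commands.<name>.argv` is named in the docs.
type RunRecord = { commands: Record<string, { argv: string[] }> };

const record: RunRecord = {
  commands: {
    replay: {
      argv: ["ptywright", "agent", "replay", "artifacts/my run/name.cassette.json"],
    },
  },
};

// Look up a stored command by name and return its argv untouched.
function storedArgv(rec: RunRecord, name: string): string[] {
  const entry = rec.commands[name];
  if (!entry) throw new Error(`no stored command named ${name}`);
  return entry.argv;
}

// Naive shell-string round trip: join with spaces, then split on whitespace.
function naiveRoundTrip(argv: string[]): string[] {
  return argv.join(" ").split(/\s+/);
}

const argv = storedArgv(record, "replay");
console.log(argv.length);                 // 4: the path stays one argument
console.log(naiveRoundTrip(argv).length); // 5: the space splits the path in two
```

Passing an array like this to Node's `child_process.spawn(argv[0], argv.slice(1))` without `shell: true` hands each element to the process as-is, so no quoting is needed.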
package/skills/ptywright-testing/references/ci-and-debugging.md
ADDED
@@ -0,0 +1,95 @@
+# CI And Debugging
+
+Use this guide when a ptywright command fails, CI times out, snapshots mismatch, or generated artifact commands need to be reused.
+
+## First Triage
+
+1. Read the failing command and exact artifact paths from the log.
+2. Open the HTML report if available.
+3. Inspect the generated summary JSON.
+4. Run validation on the artifact or directory.
+5. Use generated commands instead of reconstructing shell strings manually.
+
+Commands:
+
+```bash
+ptywright agent inspect <artifact-or-dir>
+ptywright agent validate <artifact-or-dir>
+ptywright agent commands <artifact-or-dir> --json
+ptywright agent commands <artifact-or-dir> --command rerun
+ptywright agent exec <artifact-or-dir> --command rerun
+ptywright agent exec <artifact-or-dir> --command updateSnapshots
+```
+
+## Snapshot Mismatches
+
+Default replay/check mode compares snapshots. Only update baselines intentionally:
+
+```bash
+ptywright agent replay-all tests/agent-cassettes --update-snapshots
+ptywright agent exec <artifact-or-dir> --command updateSnapshots
+```
+
+For script runner:
+
+```bash
+ptywright run-all --dir scripts --update-goldens
+ptywright script exec <summary-or-dir> --command updateGoldens
+```
+
+Always inspect diffs before committing updated baselines.
+
+## Portable Bundles
+
+Agent run/check/promote/replay-all outputs include `ptywright-agent.manifest.json`. A manifest bundle can be copied and still supports:
+
+```bash
+ptywright agent inspect <copied-dir>
+ptywright agent commands <copied-dir> --json
+ptywright agent exec <copied-dir> --command rerun
+ptywright agent validate <copied-dir>
+```
+
+If a directory has artifacts but no top-level manifest, use `agent validate <dir>` for recursive validation. `agent commands` and `agent exec` expect a manifest-backed command bundle for directory arguments.
+
+## Common Failure Causes
+
+- Missing `[data-terminal-root]` in browser terminal pages.
+- Flow waits on unstable AI prose instead of stable markers.
+- Snapshot baseline was not updated after an intentional UI change.
+- Random text was not masked.
+- Relative cassette or snapshot paths were moved without a manifest bundle.
+- Stored command metadata in summaries was hand-edited and no longer matches schema expectations.
+- CI is too slow for tests that run multiple full browser replays in one case.
+
+## Timeout Reduction
+
+When a test times out:
+
+- Avoid running setup and rerun paths that both do full browser replay in the same test.
+- Use summary fixtures to test command metadata or override behavior.
+- Keep one full end-to-end test per workflow and make surrounding tests narrower.
+- Use committed deterministic cassettes instead of live agents.
+- Keep test timeouts realistic but do not hide structural slowness by only increasing timeouts.
+
+## Repository Gates
+
+For ptywright itself:
+
+```bash
+bun run format:check
+bun run lint
+bun test tests/agent_rerun.test.ts
+bun test tests/agent_promote.test.ts tests/agent_commands.test.ts
+bun run build
+bun run check
+```
+
+For downstream projects:
+
+```bash
+ptywright agent check
+ptywright agent validate .tmp/agent-check
+```
+
+Use the narrowest failing test while iterating, then broaden before finalizing shared behavior.
package/skills/ptywright-testing/references/mcp-tools.md
ADDED
@@ -0,0 +1,91 @@
+# MCP Tools
+
+Use MCP when an agent should interact with a live terminal session, inspect terminal state, or record an exploratory flow into a script.
+
+## Start Server
+
+```bash
+ptywright mcp
+ptywright mcp --caps core
+ptywright mcp --caps core,script,recording
+ptywright mcp-http --port 3000
+```
+
+Capabilities:
+
+- `core`: Launch sessions, send input, wait, snapshot.
+- `debug`: Extra inspection and traces.
+- `script`: Run script files and suites.
+- `recording`: Record MCP tool calls into scripts.
+- `all`: Everything.
+
+Use smaller capability sets to reduce agent context pressure.
+
+## Client Config
+
+Example for clients that use a JSON MCP server config:
+
+```json
+{
+  "mcpServers": {
+    "ptywright": {
+      "command": "bunx",
+      "args": ["ptywright@latest", "mcp", "--caps", "core,script,recording"]
+    }
+  }
+}
+```
+
+Inside this repository, use:
+
+```json
+{
+  "mcpServers": {
+    "ptywright": {
+      "command": "bun",
+      "args": ["run", "src/cli.ts", "mcp"]
+    }
+  }
+}
+```
+
+## Tool Selection
+
+Typical interactive sequence:
+
+1. `launch_session` with fixed `cols`, `rows`, and `env.TERM`.
+2. `wait_for_text` for stable startup markers.
+3. `send_text`, `press_key`, or mouse tools.
+4. `wait_for_stable_screen` before snapshots.
+5. `snapshot_text`, `snapshot_view`, or `snapshot_grid`.
+6. `close_session` when done.
+
+Prefer semantic terminal snapshots over screenshots. Use screenshots only if the task explicitly needs visual proof.
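The idea behind waiting for a stable screen before snapshotting can be sketched as a quiet-window check over timestamped samples. This is an assumed model of what a tool like `wait_for_stable_screen` might do, not its implementation. `ScreenSample` and `settledAt` are hypothetical names, and the 600 ms window simply mirrors the `quietMs` value used in the browser-agent flow example.

```typescript
// Hypothetical timestamped snapshot of the screen text.
type ScreenSample = { atMs: number; text: string };

// Return the time of the first sample at which the text has been unchanged
// for at least `quietMs`, or undefined if the screen never settles.
function settledAt(samples: ScreenSample[], quietMs: number): number | undefined {
  let previous: string | undefined;
  let lastChangeMs = 0;
  for (const sample of samples) {
    if (sample.text !== previous) {
      // Screen changed: restart the quiet window from this sample.
      previous = sample.text;
      lastChangeMs = sample.atMs;
    } else if (sample.atMs - lastChangeMs >= quietMs) {
      return sample.atMs;
    }
  }
  return undefined;
}

const samples: ScreenSample[] = [
  { atMs: 0, text: "$ " },
  { atMs: 200, text: "$ ls" },
  { atMs: 500, text: "$ ls" },
  { atMs: 900, text: "$ ls" },
];
console.log(settledAt(samples, 600)); // 900
```

Snapshotting only after such a window has elapsed avoids capturing a frame that is still being drawn.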
+
+## Recording
+
+Use recording when an exploratory interaction should become a repeatable test:
+
+```text
+start_script_recording
+launch_session
+send_text / press_key / wait_for_text / snapshot_text
+mark
+stop_script_recording(writeFiles=true)
+```
+
+After export, run the generated script from the CLI to ensure it is deterministic:
+
+```bash
+ptywright run <exported-script.json>
+ptywright run <exported-script.json> --update-goldens
+```
+
+## Context Control
+
+When using MCP from an LLM agent:
+
+- Avoid returning huge terminal text unless needed.
+- Prefer `includeText=false` or failure-only entries for suite tools when available.
+- Use report and summary paths for detailed inspection.
+- Use masks early if non-deterministic output appears.
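Masking can be pictured as a series of regex replacements applied before comparison. The sketch below reuses the `{ regex, replacement }` mask shape and the two patterns from the config and flow examples in this diff. The `applyMasks` helper and the global-match behavior are assumptions, not ptywright's documented semantics.

```typescript
// Mask shape as it appears in the config and flow examples.
type Mask = { regex: string; replacement: string };

// Apply each mask in order, replacing every match (global flag assumed).
function applyMasks(text: string, masks: Mask[]): string {
  return masks.reduce(
    (masked, mask) => masked.replace(new RegExp(mask.regex, "g"), mask.replacement),
    text,
  );
}

const masks: Mask[] = [
  { regex: "session_[a-z0-9]+", replacement: "<session>" },
  { regex: "req_[a-zA-Z0-9]+", replacement: "<request-id>" },
];

const masked = applyMasks("session_ab12 sent req_Xy9 then req_Zz0", masks);
console.log(masked); // <session> sent <request-id> then <request-id>
```

Two runs that differ only in session or request ids then produce identical snapshot text.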
package/skills/ptywright-testing/references/raw-pty-cassettes.md
ADDED
@@ -0,0 +1,82 @@
+# Raw PTY Cassettes
+
+Use raw PTY cassettes when the goal is to capture terminal output once and replay it later without relaunching the original CLI, AI agent, or TUI.
+
+## CLI Recording
+
+```bash
+ptywright pty record --out tests/cassettes/session.pty.json -- <command> [args...]
+ptywright pty validate tests/cassettes/session.pty.json
+ptywright pty inspect tests/cassettes/session.pty.json
+ptywright pty replay tests/cassettes/session.pty.json --speed 0
+```
+
+Examples:
+
+```bash
+ptywright pty record --out tests/cassettes/codex-yolo.pty.json -- codex --yolo
+ptywright pty record --out tests/cassettes/browser-terminal-codex.pty.json -- \
+  node tests/harness/browser-terminal.js -- codex --yolo
+```
+
+Use `--cols`, `--rows`, `--term`, and `--backend` to stabilize output:
+
+```bash
+ptywright pty record \
+  --out tests/cassettes/session.pty.json \
+  --cols 120 \
+  --rows 32 \
+  --term xterm-256color \
+  --backend auto \
+  -- <command>
+```
+
+## Programmatic Integration
+
+Use `ptywright/pty-cassette` in projects that already control a PTY-like object.
+
+```ts
+import { wrapPtyLike } from "ptywright/pty-cassette";
+
+const recorder = wrapPtyLike(ptyProcess, {
+  path: "tests/cassettes/session.pty.json",
+  command: ["codex", "--yolo"],
+  cols: 120,
+  rows: 32,
+  term: "xterm-256color",
+});
+
+// Use recorder.process like the original ptyProcess.
+// Close/finalize according to the package API.
+```
+
+Prefer wrapper integration when a downstream project wants to keep using native `node-pty`, Bun Terminal, or `bun-pty` while still producing ptywright-compatible data.
+
+## Renderer Handoff Pattern
+
+For browser terminal renderers:
+
+1. Record raw PTY output as `*.pty.json`.
+2. Add a small local harness in the renderer project that loads this cassette and renders it into the browser terminal.
+3. Print the browser URL from that harness.
+4. Use a ptywright agent flow to open the URL and snapshot `[data-terminal-root]`.
+
+This separates byte-level reproduction from renderer-level DOM regression.
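The page side of the handoff can be sketched as a function that emits markup carrying the `[data-terminal-root]` attribute. This is a deliberately simplified stand-in. `renderHarnessPage` and `escapeHtml` are hypothetical helpers, the text is assumed to be already decoded from the cassette, and a real harness would feed the recorded bytes into its own terminal renderer instead of a `<pre>` element.

```typescript
// Escape characters that would otherwise be parsed as markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build a minimal page whose terminal content lives under [data-terminal-root].
function renderHarnessPage(terminalText: string): string {
  return [
    "<!doctype html>",
    "<html><body>",
    `<pre data-terminal-root>${escapeHtml(terminalText)}</pre>`,
    "</body></html>",
  ].join("\n");
}

const page = renderHarnessPage("$ echo <ready> && exit");
console.log(page);
```

Serving a page like this from a local harness and printing its URL would give the agent flow a stable element to snapshot.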
+
+## Updating Scenarios Without Duplicating Huge Sessions
+
+Avoid repeatedly recording long sessions just to test one rendering edge.
+
+Recommended patterns:
+
+- Keep small, named cassettes for specific UI states: `code-block.pty.json`, `spinner.pty.json`, `long-line.pty.json`.
+- Prefer fixture commands that emit deterministic terminal sequences for a targeted state.
+- Trim at the source by recording a shorter command or a purpose-built harness.
+- Use masks to normalize timestamps, ids, spinner ticks, and model names.
+- Store cassettes under `tests/cassettes/` and keep renderer snapshots under `tests/agent-snapshots/`.
+
+If an existing long cassette is useful but contains irrelevant frames, create a derived fixture in the app's harness rather than hand-editing hashes unless the project has a supported cassette transform.
+
+## When To Use Browser Agent Cassettes Instead
+
+Use browser agent cassettes when you need DOM snapshots, viewport coverage, or Playwright interactions. Use raw PTY cassettes when you only need terminal bytes and want broad compatibility with any PTY provider.