playwriter 0.0.63 → 0.0.80

Files changed (216)
  1. package/dist/aria-snapshot.d.ts +41 -3
  2. package/dist/aria-snapshot.d.ts.map +1 -1
  3. package/dist/aria-snapshot.js +131 -54
  4. package/dist/aria-snapshot.js.map +1 -1
  5. package/dist/aria-snapshot.test.js +5 -2
  6. package/dist/aria-snapshot.test.js.map +1 -1
  7. package/dist/aria-snapshot.unit.test.js +83 -41
  8. package/dist/aria-snapshot.unit.test.js.map +1 -1
  9. package/dist/assets/cursors/screen-studio/pointer-macos-tahoe-data-url.d.ts +5 -0
  10. package/dist/assets/cursors/screen-studio/pointer-macos-tahoe-data-url.d.ts.map +1 -0
  11. package/dist/assets/cursors/screen-studio/pointer-macos-tahoe-data-url.js +5 -0
  12. package/dist/assets/cursors/screen-studio/pointer-macos-tahoe-data-url.js.map +1 -0
  13. package/dist/bippy.js +1 -1
  14. package/dist/cdp-log.d.ts +1 -1
  15. package/dist/cdp-log.d.ts.map +1 -1
  16. package/dist/cdp-log.js +1 -1
  17. package/dist/cdp-log.js.map +1 -1
  18. package/dist/cdp-relay.d.ts.map +1 -1
  19. package/dist/cdp-relay.js +408 -298
  20. package/dist/cdp-relay.js.map +1 -1
  21. package/dist/cdp-session.d.ts.map +1 -1
  22. package/dist/cdp-session.js.map +1 -1
  23. package/dist/cdp-types.d.ts.map +1 -1
  24. package/dist/cdp-types.js +7 -7
  25. package/dist/cdp-types.js.map +1 -1
  26. package/dist/clean-html.d.ts.map +1 -1
  27. package/dist/clean-html.js +4 -5
  28. package/dist/clean-html.js.map +1 -1
  29. package/dist/cli.js +45 -27
  30. package/dist/cli.js.map +1 -1
  31. package/dist/create-logger.d.ts.map +1 -1
  32. package/dist/create-logger.js +3 -1
  33. package/dist/create-logger.js.map +1 -1
  34. package/dist/debugger-examples-types.d.ts.map +1 -1
  35. package/dist/debugger.d.ts.map +1 -1
  36. package/dist/debugger.js +1 -3
  37. package/dist/debugger.js.map +1 -1
  38. package/dist/diff-utils.d.ts.map +1 -1
  39. package/dist/diff-utils.js +1 -4
  40. package/dist/diff-utils.js.map +1 -1
  41. package/dist/editor-api.md +12 -2
  42. package/dist/editor-examples.d.ts +1 -1
  43. package/dist/editor-examples.d.ts.map +1 -1
  44. package/dist/editor-examples.js +1 -1
  45. package/dist/editor-examples.js.map +1 -1
  46. package/dist/editor.d.ts +1 -1
  47. package/dist/editor.d.ts.map +1 -1
  48. package/dist/editor.js +1 -1
  49. package/dist/editor.js.map +1 -1
  50. package/dist/executor.d.ts +26 -3
  51. package/dist/executor.d.ts.map +1 -1
  52. package/dist/executor.js +295 -64
  53. package/dist/executor.js.map +1 -1
  54. package/dist/executor.unit.test.js +38 -1
  55. package/dist/executor.unit.test.js.map +1 -1
  56. package/dist/extension-connection.test.js +139 -36
  57. package/dist/extension-connection.test.js.map +1 -1
  58. package/dist/ffmpeg.d.ts +148 -0
  59. package/dist/ffmpeg.d.ts.map +1 -0
  60. package/dist/ffmpeg.js +523 -0
  61. package/dist/ffmpeg.js.map +1 -0
  62. package/dist/ghost-browser.d.ts.map +1 -1
  63. package/dist/ghost-browser.js.map +1 -1
  64. package/dist/ghost-cursor-client.js +281 -0
  65. package/dist/ghost-cursor.d.ts +27 -0
  66. package/dist/ghost-cursor.d.ts.map +1 -0
  67. package/dist/ghost-cursor.js +63 -0
  68. package/dist/ghost-cursor.js.map +1 -0
  69. package/dist/htmlrewrite.d.ts.map +1 -1
  70. package/dist/htmlrewrite.js +17 -55
  71. package/dist/htmlrewrite.js.map +1 -1
  72. package/dist/htmlrewrite.test.js.map +1 -1
  73. package/dist/kill-port.d.ts.map +1 -1
  74. package/dist/kill-port.js +1 -3
  75. package/dist/kill-port.js.map +1 -1
  76. package/dist/locator-selector.test.d.ts +2 -0
  77. package/dist/locator-selector.test.d.ts.map +1 -0
  78. package/dist/locator-selector.test.js +96 -0
  79. package/dist/locator-selector.test.js.map +1 -0
  80. package/dist/mcp-client.js.map +1 -1
  81. package/dist/mcp.d.ts.map +1 -1
  82. package/dist/mcp.js +8 -3
  83. package/dist/mcp.js.map +1 -1
  84. package/dist/on-mouse-action.test.d.ts +2 -0
  85. package/dist/on-mouse-action.test.d.ts.map +1 -0
  86. package/dist/on-mouse-action.test.js +155 -0
  87. package/dist/on-mouse-action.test.js.map +1 -0
  88. package/dist/page-markdown.js +4 -4
  89. package/dist/page-markdown.js.map +1 -1
  90. package/dist/prompt.md +594 -255
  91. package/dist/protocol.d.ts +4 -0
  92. package/dist/protocol.d.ts.map +1 -1
  93. package/dist/readability.js +1 -1
  94. package/dist/recording-ghost-cursor.d.ts +41 -0
  95. package/dist/recording-ghost-cursor.d.ts.map +1 -0
  96. package/dist/recording-ghost-cursor.js +79 -0
  97. package/dist/recording-ghost-cursor.js.map +1 -0
  98. package/dist/recording-relay.d.ts.map +1 -1
  99. package/dist/recording-relay.js +8 -8
  100. package/dist/recording-relay.js.map +1 -1
  101. package/dist/relay-client.d.ts +17 -4
  102. package/dist/relay-client.d.ts.map +1 -1
  103. package/dist/relay-client.js +44 -10
  104. package/dist/relay-client.js.map +1 -1
  105. package/dist/relay-core.test.d.ts.map +1 -1
  106. package/dist/relay-core.test.js +187 -26
  107. package/dist/relay-core.test.js.map +1 -1
  108. package/dist/relay-navigation.test.d.ts.map +1 -1
  109. package/dist/relay-navigation.test.js +54 -31
  110. package/dist/relay-navigation.test.js.map +1 -1
  111. package/dist/relay-session.test.d.ts.map +1 -1
  112. package/dist/relay-session.test.js +113 -65
  113. package/dist/relay-session.test.js.map +1 -1
  114. package/dist/relay-state.d.ts +158 -0
  115. package/dist/relay-state.d.ts.map +1 -0
  116. package/dist/relay-state.js +306 -0
  117. package/dist/relay-state.js.map +1 -0
  118. package/dist/relay-state.test.d.ts +2 -0
  119. package/dist/relay-state.test.d.ts.map +1 -0
  120. package/dist/relay-state.test.js +472 -0
  121. package/dist/relay-state.test.js.map +1 -0
  122. package/dist/scoped-fs.d.ts.map +1 -1
  123. package/dist/scoped-fs.js.map +1 -1
  124. package/dist/screen-recording.d.ts +42 -4
  125. package/dist/screen-recording.d.ts.map +1 -1
  126. package/dist/screen-recording.js +88 -13
  127. package/dist/screen-recording.js.map +1 -1
  128. package/dist/selector-generator.js +1 -1
  129. package/dist/snapshot-tools.test.js +71 -28
  130. package/dist/snapshot-tools.test.js.map +1 -1
  131. package/dist/start-relay-server.d.ts +1 -1
  132. package/dist/start-relay-server.d.ts.map +1 -1
  133. package/dist/start-relay-server.js +1 -1
  134. package/dist/start-relay-server.js.map +1 -1
  135. package/dist/styles-api.md +8 -1
  136. package/dist/styles-examples.d.ts +1 -1
  137. package/dist/styles-examples.d.ts.map +1 -1
  138. package/dist/styles-examples.js +1 -1
  139. package/dist/styles-examples.js.map +1 -1
  140. package/dist/styles.d.ts.map +1 -1
  141. package/dist/styles.js +1 -3
  142. package/dist/styles.js.map +1 -1
  143. package/dist/test-declarations.d.ts.map +1 -1
  144. package/dist/test-utils.d.ts +1 -1
  145. package/dist/test-utils.d.ts.map +1 -1
  146. package/dist/test-utils.js +7 -5
  147. package/dist/test-utils.js.map +1 -1
  148. package/dist/utils.d.ts.map +1 -1
  149. package/dist/utils.js.map +1 -1
  150. package/dist/wait-for-page-load.d.ts.map +1 -1
  151. package/dist/wait-for-page-load.js +1 -1
  152. package/dist/wait-for-page-load.js.map +1 -1
  153. package/package.json +4 -3
  154. package/src/a11y-client.ts +5 -4
  155. package/src/aria-snapshot.test.ts +5 -2
  156. package/src/aria-snapshot.ts +303 -116
  157. package/src/aria-snapshot.unit.test.ts +199 -141
  158. package/src/aria-snapshots/github-raw.txt +1 -1
  159. package/src/aria-snapshots/hackernews-interactive.txt +240 -240
  160. package/src/aria-snapshots/hackernews-raw.txt +270 -270
  161. package/src/assets/aria-labels-example.png +0 -0
  162. package/src/assets/aria-labels-github.png +0 -0
  163. package/src/assets/aria-labels-hacker-news.png +0 -0
  164. package/src/assets/aria-labels-old-reddit.png +0 -0
  165. package/src/assets/cursors/screen-studio/pointer-macos-tahoe-data-url.ts +5 -0
  166. package/src/assets/cursors/screen-studio/pointer-macos-tahoe.svg +18 -0
  167. package/src/cdp-log.ts +4 -1
  168. package/src/cdp-relay.ts +949 -737
  169. package/src/cdp-session.ts +12 -3
  170. package/src/cdp-types.ts +51 -51
  171. package/src/clean-html.ts +4 -5
  172. package/src/cli.ts +82 -55
  173. package/src/create-logger.ts +5 -3
  174. package/src/debugger-examples-types.ts +4 -1
  175. package/src/debugger.ts +1 -5
  176. package/src/diff-utils.ts +2 -5
  177. package/src/editor-examples.ts +11 -1
  178. package/src/editor.ts +10 -2
  179. package/src/executor.ts +372 -73
  180. package/src/executor.unit.test.ts +48 -1
  181. package/src/extension-connection.test.ts +612 -488
  182. package/src/ffmpeg.ts +769 -0
  183. package/src/ghost-browser.ts +4 -6
  184. package/src/ghost-cursor-client.ts +368 -0
  185. package/src/ghost-cursor.ts +110 -0
  186. package/src/htmlrewrite.test.ts +6 -2
  187. package/src/htmlrewrite.ts +348 -386
  188. package/src/kill-port.ts +1 -3
  189. package/src/locator-selector.test.ts +115 -0
  190. package/src/mcp-client.ts +1 -1
  191. package/src/mcp.ts +21 -15
  192. package/src/on-mouse-action.test.ts +196 -0
  193. package/src/page-markdown.ts +7 -7
  194. package/src/protocol.ts +73 -57
  195. package/src/recording-ghost-cursor.ts +107 -0
  196. package/src/recording-relay.ts +20 -12
  197. package/src/relay-client.ts +84 -17
  198. package/src/relay-core.test.ts +761 -583
  199. package/src/relay-navigation.test.ts +517 -484
  200. package/src/relay-session.test.ts +984 -929
  201. package/src/relay-state.test.ts +570 -0
  202. package/src/relay-state.ts +497 -0
  203. package/src/resource.md +21 -49
  204. package/src/scoped-fs.ts +9 -3
  205. package/src/screen-recording.ts +175 -31
  206. package/src/skill.md +619 -271
  207. package/src/snapshot-tools.test.ts +580 -528
  208. package/src/snapshots/shadcn-ui-accessibility-full.md +181 -183
  209. package/src/snapshots/shadcn-ui-accessibility-interactive.md +119 -121
  210. package/src/start-relay-server.ts +14 -11
  211. package/src/styles-examples.ts +8 -1
  212. package/src/styles.ts +20 -21
  213. package/src/test-declarations.ts +6 -6
  214. package/src/test-utils.ts +104 -91
  215. package/src/utils.ts +2 -1
  216. package/src/wait-for-page-load.ts +6 -1
package/src/skill.md CHANGED
@@ -14,6 +14,7 @@ If using npx or bunx always use @latest for the first session command. so we are
14
14
  ### Session management
15
15
 
16
16
  Each session runs in an **isolated sandbox** with its own `state` object. Use sessions to:
17
+
17
18
  - Keep state separate between different tasks or agents
18
19
  - Persist data (pages, variables) across multiple execute calls
19
20
  - Avoid interference when multiple agents use playwriter simultaneously
@@ -57,43 +58,51 @@ Default timeout is 10 seconds. you can increase the timeout with `--timeout <ms>
57
58
 
58
59
  ```bash
59
60
  # Navigate to a page
60
- playwriter -s 1 -e "state.page = await context.newPage(); await state.page.goto('https://example.com')"
61
+ playwriter -s 1 -e 'state.page = await context.newPage(); await state.page.goto("https://example.com")'
61
62
 
62
63
  # Click a button
63
- playwriter -s 1 -e "await page.click('button')"
64
+ playwriter -s 1 -e 'await state.page.click("button")'
64
65
 
65
66
  # Get page title
66
- playwriter -s 1 -e "await page.title()"
67
+ playwriter -s 1 -e 'await state.page.title()'
67
68
 
68
69
  # Take a screenshot
69
- playwriter -s 1 -e "await page.screenshot({ path: 'screenshot.png', scale: 'css' })"
70
+ playwriter -s 1 -e 'await state.page.screenshot({ path: "screenshot.png", scale: "css" })'
70
71
 
71
72
  # Get accessibility snapshot
72
- playwriter -s 1 -e "await accessibilitySnapshot({ page })"
73
+ playwriter -s 1 -e 'await snapshot({ page: state.page })'
73
74
 
74
75
  # Get accessibility snapshot for a specific iframe
75
- const frame = await page.locator('iframe').contentFrame()
76
- await accessibilitySnapshot({ frame })
76
+ playwriter -s 1 -e 'const frame = await state.page.locator("iframe").contentFrame(); await snapshot({ frame })'
77
77
  ```
78
78
 
79
+ **Why single quotes?** Always wrap `-e` code in single quotes (`'...'`) to prevent bash from interpreting `$`, backticks, and other special characters inside your JS code. Use double quotes or backtick template literals for strings inside the JS code.
80
+
79
81
  **Multiline code:**
80
82
 
81
83
  ```bash
82
- # Using $'...' syntax for multiline code
83
- playwriter -s 1 -e $'
84
- const title = await page.title();
85
- const url = page.url();
86
- console.log({ title, url });
87
- '
88
-
89
- # Or use heredoc
84
+ # Preferred: use heredoc with quoted delimiter (disables all bash expansion)
90
85
  playwriter -s 1 -e "$(cat <<'EOF'
91
- const links = await page.$$eval('a', els => els.map(e => e.href));
86
+ const links = await state.page.$$eval('a', els => els.map(e => e.href));
92
87
  console.log('Found', links.length, 'links');
88
+ const price = text.match(/\$[\d.]+/);
93
89
  EOF
94
90
  )"
91
+
92
+ # Alternative: $'...' syntax (but beware: \n and \t become special, and
93
+ # single quotes inside must be escaped as \')
94
+ playwriter -s 1 -e $'
95
+ const title = await state.page.title();
96
+ const url = state.page.url();
97
+ console.log({ title, url });
98
+ '
95
99
  ```
96
100
 
101
+ **Quoting rules summary:**
102
+ - **Single quotes** (`'...'`): best for one-liners. No bash expansion at all. But you cannot include a literal single quote inside — use double quotes for JS strings instead.
103
+ - **Heredoc** (`<<'EOF'`): best for multiline code. The quoted `'EOF'` delimiter disables all bash expansion. Any character works inside, including `$`, backticks, and single quotes.
104
+ - **`$'...'`**: allows `\'` escaping but `\n`, `\t`, `\\` become special — conflicts with JS regex patterns.
105
+
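
The difference between the quoting styles above can be checked without playwriter. The following is a minimal sketch, assuming Node.js and a POSIX `sh` on the PATH (whose double-quote rules match bash for this case); `receivedByProgram` is a hypothetical helper, not part of playwriter. It asks the shell to print an argument exactly as a program would receive it.

```js
const { execFileSync } = require('node:child_process')

// Hypothetical helper: ask a POSIX shell to print one argument exactly as a
// program such as playwriter would receive it.
// `quotedArg` must already include its shell quotes.
function receivedByProgram(quotedArg) {
  return execFileSync('sh', ['-c', `printf '%s' ${quotedArg}`], { encoding: 'utf8' })
}

// The JS we want to pass through untouched: text.match(/\$[\d.]+/)
const code = String.raw`text.match(/\$[\d.]+/)`

const viaSingleQuotes = receivedByProgram(`'${code}'`)
const viaDoubleQuotes = receivedByProgram(`"${code}"`)

console.log(viaSingleQuotes) // text.match(/\$[\d.]+/)  (unchanged)
console.log(viaDoubleQuotes) // text.match(/$[\d.]+/)   (the backslash before $ is gone)
```

With double quotes the shell consumes the backslash in `\$`, so the regex no longer matches a literal dollar sign. This is the silent corruption the quoting rules are meant to prevent.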
97
106
  ### Debugging playwriter issues
98
107
 
99
108
  If some internal critical error happens you can read the relay server logs to understand the issue. The log file is located in the user home directory:
@@ -119,58 +128,62 @@ If you find a bug, you can create a gh issue using `gh issue create -R remorses/
119
128
 
120
129
  Control user's Chrome browser via playwright code snippets. Prefer single-line code with semicolons between statements. Use playwriter immediately without waiting for user actions; only if you get "extension is not connected" or "no browser tabs have Playwriter enabled" should you ask the user to click the playwriter extension icon on the target tab.
121
130
 
131
+ **When to use playwriter instead of webfetch/curl:** If a website is JS-heavy (SPAs like Instagram, Twitter, Facebook, etc.), has cookie consent modals, login walls, lazy-loaded content, carousels, or infinite scroll — **always use playwriter**. Simple fetch/webfetch will return an empty HTML shell with no content. Do NOT waste time trying curl, webfetch, or parsing raw HTML from JS-rendered sites. Go straight to playwriter: navigate with a real browser, dismiss modals, then extract what you need via `page.evaluate()` or network interception.
132
+
122
133
  **If Chrome is not running**, the extension can't connect. Start Chrome from the command line before retrying:
123
134
 
124
135
  ```bash
125
136
  # macOS
126
- open -a "Google Chrome"
137
+ open -a "Google Chrome" --args --profile-directory=Default
127
138
 
128
139
  # Linux
129
- google-chrome &
140
+ google-chrome --profile-directory=Default &
130
141
 
131
142
  # Windows (cmd)
132
- start chrome.exe
143
+ start chrome.exe --profile-directory=Default
133
144
 
134
145
  # Windows (PowerShell)
135
- Start-Process chrome.exe
146
+ Start-Process chrome.exe -ArgumentList '--profile-directory=Default'
136
147
  ```
137
148
 
138
149
  To also enable automatic tab capture for screen recording (no manual extension click needed), add the `--allowlisted-extension-id` and `--auto-accept-this-tab-capture` flags:
139
150
 
140
151
  ```bash
141
152
  # macOS
142
- open -a "Google Chrome" --args --allowlisted-extension-id=jfeammnjpkecdekppnclgkkffahnhfhe --auto-accept-this-tab-capture
153
+ open -a "Google Chrome" --args --profile-directory=Default --allowlisted-extension-id=jfeammnjpkecdekppnclgkkffahnhfhe --auto-accept-this-tab-capture
143
154
 
144
155
  # Linux
145
- google-chrome --allowlisted-extension-id=jfeammnjpkecdekppnclgkkffahnhfhe --auto-accept-this-tab-capture &
156
+ google-chrome --profile-directory=Default --allowlisted-extension-id=jfeammnjpkecdekppnclgkkffahnhfhe --auto-accept-this-tab-capture &
146
157
 
147
158
  # Windows
148
- start chrome.exe --allowlisted-extension-id=jfeammnjpkecdekppnclgkkffahnhfhe --auto-accept-this-tab-capture
159
+ start chrome.exe --profile-directory=Default --allowlisted-extension-id=jfeammnjpkecdekppnclgkkffahnhfhe --auto-accept-this-tab-capture
149
160
  ```
150
161
 
151
162
  You can collaborate with the user - they can help with captchas, difficult elements, or reproducing bugs.
152
163
 
153
164
  ## context variables
154
165
 
155
- - `state` - object persisted between calls **within your session**. Each session has its own isolated state. Use to store pages, data, listeners (e.g., `state.myPage = await context.newPage()`)
166
+ - `state` - object persisted between calls **within your session**. Each session has its own isolated state. Use to store pages, data, listeners (e.g., `state.page = await context.newPage()`)
156
167
  - `page` - a default page (may be shared with other agents). Prefer creating your own page and storing it in `state` (see "working with pages")
157
168
  - `context` - browser context, access all pages via `context.pages()`
158
- - `require` - load Node.js modules like fs
169
+ - `require` - load Node.js modules (e.g., `const fs = require('node:fs')`). ESM `import` is not available in the sandbox
159
170
  - Node.js globals: `setTimeout`, `setInterval`, `fetch`, `URL`, `Buffer`, `crypto`, etc.
160
171
 
161
172
  **Important:** `state` is **session-isolated** but pages are **shared** across all sessions. See "working with pages" for how to avoid interference.
162
173
 
163
174
  ## rules
164
175
 
165
- - **Create your own page**: see "working with pages" — always create and store your own page in `state`, never use the default `page` for automation
176
+ - **Initialize state.page first**: see "working with pages" — at the start of a task, assign `state.page` (reuse `about:blank` or create one) and use `state.page` for all automation steps.
166
177
  - **Multiple calls**: use multiple execute calls for complex logic - helps understand intermediate state and isolate which action failed
167
178
  - **Never close**: never call `browser.close()` or `context.close()`. Only close pages you created or if user asks
168
179
  - **No bringToFront**: never call unless user asks - it's disruptive and unnecessary, you can interact with background pages
169
180
  - **Check state after actions**: always verify page state after clicking/submitting (see next section)
170
- - **Clean up listeners**: call `page.removeAllListeners()` at end of message to prevent leaks
171
- - **CDP sessions**: use `getCDPSession({ page })` not `page.context().newCDPSession()` - NEVER use `newCDPSession()` method, it doesn't work through playwriter relay
172
- - **Wait for load**: use `page.waitForLoadState('domcontentloaded')` not `page.waitForEvent('load')` - waitForEvent times out if already loaded
173
- - **Avoid timeouts**: prefer proper waits over `page.waitForTimeout()` - there are better ways to wait for elements
181
+ - **Clean up listeners**: call `state.page.removeAllListeners()` at end of message to prevent leaks
182
+ - **CDP sessions**: use `getCDPSession({ page: state.page })` not `state.page.context().newCDPSession()` - NEVER use `newCDPSession()` method, it doesn't work through playwriter relay
183
+ - **Wait for load**: use `state.page.waitForLoadState('domcontentloaded')` not `state.page.waitForEvent('load')` - waitForEvent times out if already loaded
184
+ - **Minimize timeouts**: prefer proper waits (`waitForSelector`, `waitForPageLoad`) over `state.page.waitForTimeout()`. Short timeouts (1-2s) are acceptable for non-deterministic events like popups, animations, or tab opens where no specific selector is available
185
+ - **Snapshot before screenshot**: always use `snapshot()` first to understand page state (text-based, fast, cheap). Only use `screenshot` when you specifically need visual/spatial information. Never take a screenshot just to check if a page loaded or to read text content — snapshot gives you that instantly without burning image tokens
186
+ - **Snapshot replaces page.evaluate() for inspection**: do NOT write `page.evaluate()` calls to manually query class names, bounding boxes, child counts, or visibility flags. `snapshot()` already shows every interactive element with its text, role, and a ready-to-use locator. If you catch yourself writing `document.querySelector` or `getBoundingClientRect` inside evaluate — stop and use `snapshot()` instead. Reserve `page.evaluate()` for actions that modify page state (e.g., `localStorage.clear()`, scroll manipulation) or extract non-DOM data (e.g., `window.__CONFIG__`)
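
One way to follow the "Clean up listeners" rule is to attach a listener only for the duration of a single action. The sketch below is illustrative only: `withListener` is a hypothetical helper, and a Node.js `EventEmitter` stands in for the page so the example runs outside the playwriter sandbox. It assumes the page exposes `on` and `off`, as Playwright pages do.

```js
const { EventEmitter } = require('node:events')

// Hypothetical helper: listen for one event only while `action` runs,
// then detach the handler even if the action throws.
async function withListener(page, eventName, handler, action) {
  page.on(eventName, handler)
  try {
    return await action()
  } finally {
    page.off(eventName, handler)
  }
}

// Stand-in for a page: a plain EventEmitter that emits a fake response.
const standInPage = new EventEmitter()
const seen = []
const run = withListener(
  standInPage,
  'response',
  (res) => seen.push(res.url()),
  async () => {
    standInPage.emit('response', { url: () => '/api/items' })
  },
)

run.then(() => {
  console.log(seen, standInPage.listenerCount('response'))
})
```

Removing the specific handler, rather than calling `removeAllListeners()`, leaves listeners registered by other code untouched. Which approach fits depends on whether other agents share the page.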
174
187
 
175
188
  ## interaction feedback loop
176
189
 
@@ -179,10 +192,10 @@ Every browser interaction should follow a **observe → act → observe** loop.
179
192
  **Core loop:**
180
193
 
181
194
  1. **Open page** — get or create your page and navigate to the target URL
182
- 2. **Observe** — take an accessibility snapshot to understand the current state
183
- 3. **Update priors** — read the snapshot, identify the element to interact with
195
+ 2. **Observe** — print `state.page.url()` and take an accessibility snapshot. Always print the URL so you know where you are — pages can redirect, and actions can trigger unexpected navigation.
196
+ 3. **Check** — read the snapshot and URL. If the page isn't ready (still loading, expected content missing, wrong URL), **wait and observe again** — don't act on stale or incomplete state. Only proceed when you can identify the element to interact with.
184
197
  4. **Act** — perform one action (click, type, submit)
185
- 5. **Observe again** — take another snapshot to verify the action's effect
198
+ 5. **Observe again** — print URL + snapshot to verify the action's effect. If the action didn't take effect (nothing changed, page still loading), wait and observe again before proceeding.
186
199
  6. **Repeat** — continue from step 3 until the task is complete
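
The observe and check steps above can be sketched as a small retry helper. This is a sketch under assumptions, not playwriter's implementation: `observeUntilReady`, `takeSnapshot`, and `isReady` are hypothetical names, and the snapshot function is injected so the example runs with a fake page. Inside the sandbox you would presumably pass something like `() => snapshot({ page: state.page })`.

```js
// Hypothetical helper: repeat "observe" until the "check" passes or attempts run out.
async function observeUntilReady({ page, takeSnapshot, isReady, attempts = 5, delayMs = 500 }) {
  let observation
  for (let i = 0; i < attempts; i++) {
    // Observe: always record the URL together with the snapshot.
    observation = { url: page.url(), snapshot: await takeSnapshot() }
    console.log('URL:', observation.url)
    // Check: only report ready when the caller's predicate accepts the state.
    if (isReady(observation)) {
      return { ready: true, ...observation }
    }
    // Not ready: wait briefly before observing again (skip the wait after the last try).
    if (i < attempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs))
    }
  }
  return { ready: false, ...observation }
}

// Fake page and snapshot: the "Submit" button appears on the third observation.
const fakePage = { url: () => 'https://example.com/' }
let snapshotCalls = 0
const fakeSnapshot = async () => (++snapshotCalls < 3 ? '- text: Loading' : '- button "Submit"')

const demo = observeUntilReady({
  page: fakePage,
  takeSnapshot: fakeSnapshot,
  isReady: ({ snapshot }) => snapshot.includes('Submit'),
  delayMs: 10,
})

demo.then((result) => console.log(result.ready, snapshotCalls))
```

The helper never acts on the page. It only decides whether the state is ready, so the action stays a separate, single step as the loop describes.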
187
200
 
188
201
  ```
@@ -191,19 +204,20 @@ Every browser interaction should follow a **observe → act → observe** loop.
191
204
  └──────────────────┬──────────────────────────┘
192
205
 
193
206
  ┌────────────────┐
194
- observe │◄─────────────────┐
195
- (snapshot) │ │
196
- └───────┬────────┘ │
197
- ▼ │
198
- ┌────────────────┐ │
199
- update priors │ │
200
- │ (read result) │ │
201
- └───────┬────────┘
202
-
203
- ┌────────────────┐
204
- act │ │
205
- (click/type) │──────────────────┘
206
- └────────────────┘
207
+ ┌───►│ observe │◄─────────────────┐
208
+ (url + snapshot) │ │
209
+ └───────┬────────┘ │
210
+ ▼ │
211
+ ┌────────────────┐ │
212
+ check │
213
+ (read result) │ │
214
+ │ └───┬────────┬───┘
215
+ not │ │ ready │
216
+ ready │ ▼ │
217
+ └────────┘ ┌────────────────┐
218
+ act │ │
219
+ │ (click/type) │─────────────┘
220
+ └────────────────┘
207
221
  ```
208
222
 
209
223
  **Example: opening a Framer plugin via the command palette**
@@ -211,30 +225,36 @@ Every browser interaction should follow a **observe → act → observe** loop.
211
225
  Each step is a separate execute call. Notice how every action is followed by a snapshot to verify what happened:
212
226
 
213
227
  ```js
214
- // 1. Open page and observe
215
- state.myPage = context.pages().find(p => p.url() === 'about:blank') ?? await context.newPage();
216
- await state.myPage.goto('https://framer.com/projects/my-project', { waitUntil: 'domcontentloaded' });
217
- await accessibilitySnapshot({ page: state.myPage }).then(console.log)
228
+ // 1. Open page and observe — always print URL first
229
+ state.page = context.pages().find((p) => p.url() === 'about:blank') ?? (await context.newPage())
230
+ await state.page.goto('https://framer.com/projects/my-project', { waitUntil: 'domcontentloaded' })
231
+ console.log('URL:', state.page.url())
232
+ await snapshot({ page: state.page }).then(console.log)
218
233
  ```
219
234
 
220
235
  ```js
221
236
  // 2. Act: open command palette → observe result
222
- await state.myPage.keyboard.press('Meta+k');
223
- await accessibilitySnapshot({ page: state.myPage, search: /dialog|Search/ }).then(console.log)
237
+ await state.page.keyboard.press('Meta+k')
238
+ console.log('URL:', state.page.url())
239
+ await snapshot({ page: state.page, search: /dialog|Search/ }).then(console.log)
240
+ // If dialog didn't appear, observe again before retrying
224
241
  ```
225
242
 
226
243
  ```js
227
244
  // 3. Act: type search query → observe result
228
- await state.myPage.keyboard.type('MCP');
229
- await accessibilitySnapshot({ page: state.myPage, search: /MCP/ }).then(console.log)
245
+ await state.page.keyboard.type('MCP')
246
+ console.log('URL:', state.page.url())
247
+ await snapshot({ page: state.page, search: /MCP/ }).then(console.log)
230
248
  ```
231
249
 
232
250
  ```js
233
251
  // 4. Act: press Enter → observe plugin loaded
234
- await state.myPage.keyboard.press('Enter');
235
- await state.myPage.waitForTimeout(1000);
236
- const frame = state.myPage.frames().find(f => f.url().includes('plugins.framercdn.com'));
237
- await accessibilitySnapshot({ page: state.myPage, frame: frame || undefined }).then(console.log)
252
+ await state.page.keyboard.press('Enter')
253
+ await state.page.waitForTimeout(1000)
254
+ console.log('URL:', state.page.url())
255
+ const frame = state.page.frames().find((f) => f.url().includes('plugins.framercdn.com'))
256
+ await snapshot({ page: state.page, frame: frame || undefined }).then(console.log)
257
+ // If frame not found, wait and observe again — plugin may still be loading
238
258
  ```
239
259
 
240
260
  **Other ways to observe action results:**
@@ -243,226 +263,321 @@ Snapshots are the primary feedback mechanism, but some actions have side effects
243
263
 
244
264
  - **Console logs** — check for errors or app state after an action:
245
265
  ```js
246
- await getLatestLogs({ page, search: /error|fail/i, count: 20 })
266
+ await getLatestLogs({ page: state.page, search: /error|fail/i, count: 20 })
247
267
  ```
248
268
  - **Network requests** — verify API calls were made after a form submit or button click:
249
269
  ```js
250
- page.on('response', async res => { if (res.url().includes('/api/')) { console.log(res.status(), res.url()); } });
270
+ state.page.on('response', async (res) => {
271
+ if (res.url().includes('/api/')) {
272
+ console.log(res.status(), res.url())
273
+ }
274
+ })
251
275
  ```
252
276
  - **URL changes** — confirm navigation happened:
253
277
  ```js
254
- console.log(page.url())
278
+ console.log(state.page.url())
255
279
  ```
256
- - **Screenshots** — only when you need to verify visual layout (CSS, spatial positioning, colors). Snapshots are always preferred for content verification.
280
+ - **Screenshots** — only for visual layout issues (see "choosing between snapshot methods" below).
257
281
 
258
282
  ## common mistakes to avoid
259
283
 
260
284
  **1. Not verifying actions succeeded**
261
285
  Always check page state after important actions (form submissions, uploads, typing). Your mental model can diverge from actual browser state:
286
+
262
287
  ```js
263
- await page.keyboard.type('my text');
264
- await accessibilitySnapshot({ page, search: /my text/ })
288
+ await state.page.keyboard.type('my text')
289
+ await snapshot({ page: state.page, search: /my text/ })
265
290
  // If verifying visual layout specifically, use screenshotWithAccessibilityLabels instead
266
291
  ```
267
292
 
268
293
  **2. Assuming paste/upload worked**
269
294
  Clipboard paste (`Meta+v`) can silently fail. For file uploads, prefer file input:
295
+
270
296
  ```js
271
297
  // Reliable: use file input
272
- const fileInput = page.locator('input[type="file"]').first();
273
- await fileInput.setInputFiles('/path/to/image.png');
298
+ const fileInput = state.page.locator('input[type="file"]').first()
299
+ await fileInput.setInputFiles('/path/to/image.png')
274
300
 
275
301
  // Unreliable: clipboard paste may silently fail, need to focus textarea first for example
276
- await page.keyboard.press('Meta+v'); // always verify with screenshot!
302
+ await state.page.keyboard.press('Meta+v') // always verify with screenshot!
277
303
  ```
278
304
 
279
305
  **3. Using stale locators from old snapshots**
280
306
  Locators (especially ones with `>> nth=`) can change when the page updates. Always get a fresh snapshot before clicking:
307
+
281
308
  ```js
282
309
  // BAD: using ref from minutes ago
283
- await page.locator('[id="old-id"]').click(); // element may have changed
310
+ await state.page.locator('[id="old-id"]').click() // element may have changed
284
311
 
285
312
  // GOOD: get fresh snapshot, then immediately use locators from it
286
- await accessibilitySnapshot({ page, showDiffSinceLastCall: true })
313
+ await snapshot({ page: state.page, showDiffSinceLastCall: true })
287
314
  // Now use the NEW locators from this output
288
315
  ```
289
316
 
290
317
  **4. Wrong assumptions about current page/element**
291
318
  Before destructive actions (delete, submit), verify you're targeting the right thing:
319
+
292
320
  ```js
293
321
  // Before deleting, verify it's the right item
294
- await page.screenshotWithAccessibilityLabels({ page });
322
+ await screenshotWithAccessibilityLabels({ page: state.page })
295
323
  // READ the screenshot to confirm, THEN proceed with delete
296
324
  ```
297
325
 
298
326
  **5. Text concatenation without line breaks**
299
327
  `keyboard.type()` doesn't insert newlines from `\n` in strings. Use `keyboard.press('Enter')`:
328
+
300
329
  ```js
301
330
  // BAD: newlines in string don't create line breaks
302
- await page.keyboard.type('Line 1\nLine 2'); // becomes "Line 1Line 2"
331
+ await state.page.keyboard.type('Line 1\nLine 2') // becomes "Line 1Line 2"
303
332
 
304
333
  // GOOD: use Enter key for line breaks
305
- await page.keyboard.type('Line 1');
306
- await page.keyboard.press('Enter');
307
- await page.keyboard.type('Line 2');
334
+ await state.page.keyboard.type('Line 1')
335
+ await state.page.keyboard.press('Enter')
336
+ await state.page.keyboard.type('Line 2')
308
337
  ```
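
The advice above can be wrapped in a small helper that splits text on newlines and presses Enter between the pieces. This is a minimal sketch: `typeLines` is a hypothetical name, and the recording object below only imitates the `keyboard.type()` and `keyboard.press()` calls so the example runs without a browser.

```js
// Hypothetical helper: type multi-line text by pressing Enter between lines.
async function typeLines(page, text) {
  const lines = text.split('\n')
  for (const [index, line] of lines.entries()) {
    // Every line after the first is preceded by an explicit Enter key press.
    if (index > 0) {
      await page.keyboard.press('Enter')
    }
    // Skip empty lines so consecutive newlines produce only Enter presses.
    if (line.length > 0) {
      await page.keyboard.type(line)
    }
  }
}

// Stand-in page that records keyboard calls instead of driving a browser.
const recorded = []
const recordingPage = {
  keyboard: {
    type: async (text) => {
      recorded.push(['type', text])
    },
    press: async (key) => {
      recorded.push(['press', key])
    },
  },
}

const typed = typeLines(recordingPage, 'Line 1\nLine 2')
typed.then(() => console.log(recorded))
```

In a session the call would presumably be `await typeLines(state.page, 'Line 1\nLine 2')`, followed by a snapshot to confirm that both lines appear.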
309
338
 
310
- **6. Quote escaping in $'...' syntax**
311
- When using `$'...'` for multiline code, nested quotes break parsing. Use different quote styles or escape them:
339
+ **6. Quote escaping in bash**
340
+ Bash parses `$`, backticks, and `\` inside double-quoted strings. This silently corrupts JS code containing dollar signs (regex like `/\$[\d.]+/`), template literals, or backslash patterns.
341
+
312
342
  ```bash
313
- # BAD: nested double quotes break $'...'
314
- playwriter -s 1 -e $'await page.locator("[id=\"_r_a_\"]").click()'
343
+ # BAD: inside double quotes, bash interprets $ and backticks in your JS
344
+ playwriter -s 1 -e "const price = text.match(/\$[\d.]+/)"
315
345
 
316
- # GOOD: use single quotes inside, or template strings
317
- playwriter -s 1 -e $'await page.locator(\'[id="_r_a_"]\').click()'
346
+ # GOOD: inside single quotes, bash passes everything through literally
347
+ playwriter -s 1 -e 'await state.page.locator(`[id="_r_a_"]`).click()'
318
348
 
319
- # GOOD: use heredoc for complex quoting
349
+ # GOOD: heredoc for complex code with mixed quotes
320
350
  playwriter -s 1 -e "$(cat <<'EOF'
321
- await page.locator('[id="_r_a_"]').click()
351
+ await state.page.locator('[id="_r_a_"]').click()
352
+ const match = html.match(/\$[\d.]+/g)
322
353
  EOF
323
354
  )"
324
355
  ```
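When the command line is assembled by a script rather than typed by hand, the same single-quote rule can be applied in code. The helper below is a minimal sketch and not part of playwriter; `shellSingleQuote` is a hypothetical name. It wraps a snippet in single quotes and rewrites each embedded single quote as `'\''`, the usual POSIX shell idiom.

```js
// Hypothetical helper (not a playwriter API): quote a string for a POSIX shell.
// Inside single quotes the shell treats every character literally, so the only
// character that needs special handling is the single quote itself.
function shellSingleQuote(code) {
  // Close the quoted string, emit an escaped quote, then reopen: ' becomes '\''
  return "'" + code.replace(/'/g, "'\\''") + "'"
}

// The JS source contains $ and backslashes that double quotes would corrupt.
const js = 'const price = text.match(/\\$[\\d.]+/)'
console.log(`playwriter -s 1 -e ${shellSingleQuote(js)}`)
```

For long snippets with mixed quotes, the heredoc form shown above is usually still easier to read than an escaped one-liner.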
325
356
 
326
357
  **7. Using screenshots when snapshots suffice**
327
358
  Screenshots + image analysis is expensive and slow. Only use screenshots for visual/CSS issues:
359
+
328
360
  ```js
329
361
  // BAD: screenshot to check if text appeared (wastes tokens on image analysis)
330
- await page.screenshot({ path: 'check.png', scale: 'css' });
362
+ await state.page.screenshot({ path: 'check.png', scale: 'css' })
331
363
 
332
364
  // GOOD: snapshot is text — fast, cheap, searchable
333
- await accessibilitySnapshot({ page, search: /expected text/i })
365
+ await snapshot({ page: state.page, search: /expected text/i })
334
366
 
335
367
  // GOOD: evaluate DOM directly for content checks
336
- const text = await page.evaluate(() => document.querySelector('.message')?.textContent);
368
+ const text = await state.page.evaluate(() => document.querySelector('.message')?.textContent)
337
369
  ```
338
370
 
339
371
  **8. Assuming page content loaded**
340
372
  Even after `goto()`, dynamic content may not be ready:
373
+
341
374
  ```js
342
- await page.goto('https://example.com');
375
+ await state.page.goto('https://example.com')
343
376
  // Content may still be loading via JavaScript!
344
- await page.waitForSelector('article', { timeout: 10000 });
377
+ await state.page.waitForSelector('article', { timeout: 10000 })
345
378
  // Or use waitForPageLoad utility
346
- await waitForPageLoad({ page, timeout: 5000 });
379
+ await waitForPageLoad({ page: state.page, timeout: 5000 })
347
380
  ```
348
381
 
349
- **9. Login buttons that open popups**
382
+ **9. Not using playwriter for JS-rendered sites**
383
+ Do NOT waste context trying webfetch, curl, or Playwright CLI screenshots on SPAs (Instagram, Twitter, etc.). These sites return empty HTML shells — the real content is rendered by JavaScript. Use playwriter with a real browser session instead:
384
+
385
+ ```js
386
+ // BAD: webfetch/curl on Instagram returns empty HTML, grep finds nothing, huge context wasted
387
+ // BAD: Playwright CLI screenshot needs browser install, produces blank/modal-blocked images
388
+
389
+ // GOOD: use playwriter — real browser, full JS rendering, interactive
390
+ state.page = context.pages().find((p) => p.url() === 'about:blank') ?? (await context.newPage())
391
+ await state.page.goto('https://www.instagram.com/p/ABC123/', { waitUntil: 'domcontentloaded' })
392
+ await waitForPageLoad({ page: state.page, timeout: 8000 })
393
+ await snapshot({ page: state.page, search: /cookie|consent|accept/i }).then(console.log)
394
+ // Now you can see modals, dismiss them, navigate carousels, extract content
395
+ ```
396
+
397
+ **10. Login buttons that open popups**
350
398
  Playwriter extension cannot control popup windows. If a login button opens a popup (common with OAuth/SSO), use cmd+click to open in a new tab instead:
399
+
351
400
  ```js
352
401
  // BAD: popup window is not controllable by playwriter
353
- await page.click('button:has-text("Login with Google")');
402
+ await state.page.click('button:has-text("Login with Google")')
354
403
 
355
404
  // GOOD: cmd+click opens in new tab that playwriter can control
356
- await page.locator('button:has-text("Login with Google")').click({ modifiers: ['Meta'] });
357
- await page.waitForTimeout(2000);
405
+ await state.page.locator('button:has-text("Login with Google")').click({ modifiers: ['Meta'] })
406
+ await state.page.waitForTimeout(2000)
358
407
 
359
408
  // Verify new tab opened - last page should be the login page
360
- const pages = context.pages();
361
- const loginPage = pages[pages.length - 1];
362
- if (loginPage.url() === page.url()) {
363
- throw new Error('Cmd+click did not open new tab - login may have opened as popup');
409
+ const pages = context.pages()
410
+ const loginPage = pages[pages.length - 1]
411
+ if (loginPage.url() === state.page.url()) {
412
+ throw new Error('Cmd+click did not open new tab - login may have opened as popup')
364
413
  }
365
414
 
366
415
  // Complete login flow in loginPage, cookies are shared with original page
367
- await loginPage.locator('[data-email]').first().click();
368
- await loginPage.waitForURL('**/callback**');
416
+ await loginPage.locator('[data-email]').first().click()
417
+ await loginPage.waitForURL('**/callback**')
369
418
  // Original page should now be authenticated
370
419
  ```
371
420
 
421
+ **11. Click times out or does nothing — snapshot to find the blocker**
422
+ When a click times out, a **modal or overlay** is likely intercepting pointer events. Do not retry with different selectors or `{ force: true }` — snapshot to find the blocker:
423
+
424
+ ```js
425
+ // click timed out → don't retry blindly, find what's blocking
426
+ await snapshot({ page: state.page, search: /dialog|modal/i })
427
+ // Found modal → interact with it properly (don't just close via X, it may reappear)
428
+ await state.page.getByRole('radio', { name: 'Nope, Vanilla' }).click()
429
+ ```
430
+
431
+ **12. Never use `dispatchEvent` or `{ force: true }` to bypass blockers**
432
+ `dispatchEvent(new MouseEvent(...))` and `{ force: true }` bypass Playwright checks but **do not trigger React/Vue/Svelte handlers** — state won't update. The same applies to `element.click()` inside `page.evaluate()`. If a click "succeeds" but nothing changes, you're either clicking the wrong node or using the wrong interaction pattern:
433
+
434
+ ```js
435
+ // BAD: heading click bypasses overlay but React ignores it
436
+ await state.page.locator('h3:has-text("Node.js")').click({ force: true })
437
+ // BAD: evaluate click bypasses all Playwright input simulation
438
+ await state.page.evaluate(() => document.querySelector('button').click())
439
+ // GOOD: snapshot shows the real interactive element is a radio, not the heading
440
+ await state.page.getByRole('radio', { name: 'Node.js' }).click()
441
+ ```
442
+
443
+ **13. Over-investigating instead of just interacting**
444
+ When something doesn't respond to a click, do NOT start inspecting CDP event listeners, React fibers, canvas pixel data, or writing `page.evaluate()` to read class names and bounding boxes. This wastes massive context. Instead:
445
+
446
+ 1. Take a `snapshot()` — it shows every interactive element and what to click
447
+ 2. Try a different interaction pattern if `click()` didn't work:
448
+ - **Drawing/annotation tools, canvas paint** → `mouse.down`, move with steps, `mouse.up` (see drag section)
449
+ - **Keyboard-activated modes** → press the shortcut key (snapshot shows tooltip text like "Draw mode D")
450
+ - **Sliders, timeline scrubbers** → drag pattern
451
+ - **Collapsed/toggled toolbars** → click the toggle first, wait, then interact
452
+ 3. Take another `snapshot()` to see what changed
453
+ 4. Only investigate DOM internals if correct interaction patterns produce zero response after 2–3 attempts
454
+
372
455
  ## checking page state
373
456
 
374
- After any action (click, submit, navigate), verify what happened. **Always prefer accessibility snapshots over screenshots** — snapshots are text (cheap, fast, searchable), screenshots require image analysis (expensive, slow).
457
+ After any action (click, submit, navigate), verify what happened. Always print the URL first, then take a snapshot:
375
458
 
376
459
  ```js
377
- // Default: use snapshot with optional filtering
378
- page.url() + '\n' + await accessibilitySnapshot({ page })
460
+ // Always print URL first, then snapshot
461
+ console.log('URL:', state.page.url())
462
+ await snapshot({ page: state.page }).then(console.log)
379
463
 
380
464
  // Filter for specific content when snapshot is large
381
- await accessibilitySnapshot({ page, search: /dialog|button|error/i })
465
+ console.log('URL:', state.page.url())
466
+ await snapshot({ page: state.page, search: /dialog|button|error/i }).then(console.log)
382
467
  ```
383
468
 
384
- Only use `screenshotWithAccessibilityLabels({ page })` for **visual layout issues** (CSS bugs, spatial positioning, colors). For verifying text content, button states, or form values, snapshots are always sufficient.
385
-
386
- If nothing changed, try `await waitForPageLoad({ page, timeout: 3000 })` or you may have clicked the wrong element.
469
+ If nothing changed, try `await waitForPageLoad({ page: state.page, timeout: 3000 })`; if the page still looks the same, you may have clicked the wrong element.
387
470
 
388
471
  ## accessibility snapshots
389
472
 
390
473
  ```js
391
- await accessibilitySnapshot({ page, search?, showDiffSinceLastCall? })
474
+ await snapshot({ page: state.page, search?, showDiffSinceLastCall? })
392
475
  ```
393
476
 
477
+ `accessibilitySnapshot` is still available as an alias for backward compatibility.
478
+
394
479
  - `search` - string/regex to filter results (returns first 10 matching lines)
395
- - `showDiffSinceLastCall` - returns diff since last snapshot (default: `true`). Pass `false` to get full snapshot.
480
+ - `showDiffSinceLastCall` - returns diff since last snapshot (default: `true`, but `false` when `search` is provided). Pass `false` to get full snapshot.
396
481
 
397
- Snapshots return full content on first call, then diffs on subsequent calls. If nothing changed, returns "No changes since last snapshot" message. Use `showDiffSinceLastCall: false` to always get full content.
482
+ Snapshots return the full content on the first call, then diffs on subsequent calls. A diff is returned only when it is shorter than the full content. If nothing changed, a "No changes since last snapshot" message is returned. Use `showDiffSinceLastCall: false` to always get the full content. When `search` is provided, diffing is disabled by default so the search filters the full content; pass `showDiffSinceLastCall: true` explicitly to combine both. This diffing behavior also applies to `getCleanHTML` and `getPageMarkdown`.
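The defaults described above can be summarized as a small decision function. This is a sketch of the documented behavior only, not playwriter's actual implementation; `resolveShowDiff` and `chooseOutput` are hypothetical names, and the diff text is assumed to be computed elsewhere.

```js
// Hypothetical sketch of the documented defaults; not playwriter's real code.
function resolveShowDiff({ search, showDiffSinceLastCall } = {}) {
  // An explicit true/false from the caller always wins.
  if (typeof showDiffSinceLastCall === 'boolean') return showDiffSinceLastCall
  // Otherwise diffing is on by default, except when a search is supplied.
  return search === undefined
}

function chooseOutput({ full, diff, previous, showDiff }) {
  // First call (no previous snapshot) or diffing disabled: return everything.
  if (!showDiff || previous === undefined) return full
  if (previous === full) return 'No changes since last snapshot'
  // A diff is only returned when it is shorter than the full content.
  return diff.length < full.length ? diff : full
}

console.log(resolveShowDiff({}))
console.log(resolveShowDiff({ search: /error/i }))
console.log(resolveShowDiff({ search: /error/i, showDiffSinceLastCall: true }))
```

Reading it this way makes the one surprising case easy to spot: adding `search` silently switches the default from diff to full content.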
398
483
 
399
484
  Example output:
400
485
 
401
486
  ```md
402
487
  - banner:
403
- - link "Home" [id="nav-home"]
404
- - navigation:
405
- - link "Docs" [data-testid="docs-link"]
406
- - link "Blog" role=link[name="Blog"]
488
+ - link "Home" [id="nav-home"]
489
+ - navigation:
490
+ - link "Docs" [data-testid="docs-link"]
491
+ - link "Blog" role=link[name="Blog"]
407
492
  ```
408
493
 
409
- Each interactive line ends with a Playwright locator you can pass to `page.locator()`.
494
+ Each interactive line ends with a Playwright locator you can pass to `state.page.locator()`.
410
495
  If multiple elements share the same locator, a `>> nth=N` suffix is added (0-based)
411
496
  to make it unique.
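To illustrate the `>> nth=N` suffix, here is one plausible way such disambiguation could work. It is a sketch under the assumption that locators are listed in DOM order and that every duplicate receives a suffix; `disambiguate` is a hypothetical name, not a playwriter function.

```js
// Hypothetical sketch: append ">> nth=N" to locators that occur more than once.
function disambiguate(locators) {
  // Count how many times each locator string appears.
  const totals = new Map()
  for (const l of locators) totals.set(l, (totals.get(l) ?? 0) + 1)

  const seen = new Map()
  return locators.map((l) => {
    if (totals.get(l) === 1) return l // already unique, leave untouched
    const n = seen.get(l) ?? 0
    seen.set(l, n + 1)
    return `${l} >> nth=${n}` // 0-based index among identical locators
  })
}

console.log(
  disambiguate(['role=button[name="Save"]', 'role=link[name="Home"]', 'role=button[name="Save"]']),
)
```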
412
497
 
498
+ **Use snapshot locators directly — never invent selectors.** The snapshot output IS the selector. Do not guess CSS selectors or `getByText` when the snapshot already gives you the exact match:
499
+
500
+ ```js
501
+ // Snapshot shows: role=radio[name="Nope, Vanilla"] → use it directly
502
+ await state.page.getByRole('radio', { name: 'Nope, Vanilla' }).click()
503
+ // Snapshot shows: role=link[name="SIGN IN"] → or pass raw string to locator()
504
+ await state.page.locator('role=link[name="SIGN IN"]').click()
505
+ ```
506
+
507
+ **Beware CSS text-transform**: snapshots show visual text (`heading "NODE.JS"`) but DOM may be `"Node.js"`. Use case-insensitive regex: `getByRole('heading', { name: /node\.js/i })`.
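If the name comes from a variable rather than a hand-written literal, the case-insensitive pattern can be built programmatically. The helper below is a minimal sketch; `toNameRegex` is a hypothetical name and not part of playwriter or Playwright. It escapes regex metacharacters so that a dot in `NODE.JS` matches only a literal dot.

```js
// Hypothetical helper: turn text copied from a snapshot into a case-insensitive matcher.
function toNameRegex(visibleText) {
  // Escape every regex metacharacter so the text is matched literally.
  const escaped = visibleText.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
  return new RegExp(escaped, 'i')
}

const name = toNameRegex('NODE.JS')
console.log(name.test('Node.js')) // true: case differs, text is the same
console.log(name.test('NodeXjs')) // false: the dot is matched literally
```

The result can be passed wherever Playwright accepts a regex name, for example `getByRole('heading', { name: toNameRegex('NODE.JS') })`.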
508
+
413
509
  If a screenshot shows ref labels like `e3`, resolve them using the last snapshot:
414
510
 
415
511
  ```js
416
- const snapshot = await accessibilitySnapshot({ page })
512
+ const snap = await snapshot({ page: state.page })
417
513
  const locator = refToLocator({ ref: 'e3' })
418
- await page.locator(locator!).click()
514
+ await state.page.locator(locator!).click()
419
515
  ```
420
516
 
421
517
  ```js
422
- await page.locator('[id="nav-home"]').click()
423
- await page.locator('[data-testid="docs-link"]').click()
424
- await page.locator('role=link[name="Blog"]').click()
518
+ await state.page.locator('[id="nav-home"]').click()
519
+ await state.page.locator('[data-testid="docs-link"]').click()
520
+ await state.page.locator('role=link[name="Blog"]').click()
425
521
  ```
426
522
 
427
523
  Search for specific elements:
428
524
 
429
525
  ```js
430
- const snapshot = await accessibilitySnapshot({ page, search: /button|submit/i })
526
+ const snap = await snapshot({ page: state.page, search: /button|submit/i })
527
+ ```
528
+
529
+ **Scoping snapshots to a specific element** — pass a `locator` instead of `page` to snapshot only a subtree. This dramatically reduces output size when you only care about one section of the page (e.g., the main content area, ignoring the sidebar/header/footer):
530
+
531
+ ```js
532
+ // Full page snapshot: ~150 lines (sidebar, nav, header, footer, everything)
533
+ await snapshot({ page: state.page })
534
+
535
+ // Scoped to main: ~20 lines (just the content you care about)
536
+ await snapshot({ locator: state.page.locator('main') })
537
+
538
+ // Scope to a specific form, dialog, or section
539
+ await snapshot({ locator: state.page.locator('[role="dialog"]') })
540
+ await snapshot({ locator: state.page.locator('form#checkout') })
431
541
  ```
432
542
 
543
+ Use this whenever the full page snapshot is dominated by navigation or layout elements you don't need. It saves significant tokens and makes the output much easier to parse.
544
+
433
545
  **Filtering large snapshots in JS** — when the built-in `search` isn't enough (e.g., you need multiple patterns or custom logic), filter the snapshot string directly:
434
546
 
435
547
  ```js
436
- const snap = await accessibilitySnapshot({ page, showDiffSinceLastCall: false });
437
- const relevant = snap.split('\n').filter(l =>
438
- l.includes('dialog') || l.includes('error') || l.includes('button')
439
- ).join('\n');
440
- console.log(relevant);
548
+ const snap = await snapshot({ page: state.page, showDiffSinceLastCall: false })
549
+ const relevant = snap
550
+ .split('\n')
551
+ .filter((l) => l.includes('dialog') || l.includes('error') || l.includes('button'))
552
+ .join('\n')
553
+ console.log(relevant)
441
554
  ```
442
555
 
443
556
  This is much cheaper than taking a screenshot — use it as your primary debugging tool for verifying text content, checking if elements exist, or confirming state changes.
444
557
 
445
558
  ## choosing between snapshot methods
446
559
 
447
- Both `accessibilitySnapshot` and `screenshotWithAccessibilityLabels` use the same ref system, so you can combine them effectively.
560
+ Both `snapshot` and `screenshotWithAccessibilityLabels` use the same ref system, so you can combine them effectively.
561
+
562
+ **Use `snapshot` when:**
448
563
 
449
- **Use `accessibilitySnapshot` when:**
450
564
  - Page has simple, semantic structure (articles, forms, lists)
451
565
  - You need to search for specific text or patterns
452
566
  - Token usage matters (text is smaller than images)
453
567
  - You need to process the output programmatically
454
568
 
455
569
  **Use `screenshotWithAccessibilityLabels` when:**
570
+
456
571
  - Page has complex visual layout (grids, galleries, dashboards, maps)
457
572
  - Spatial position matters (e.g., "first image", "top-left button")
458
573
  - DOM order doesn't match visual order
459
574
  - You need to understand the visual hierarchy
460
575
 
461
- **Combining both:** Use screenshot first to understand layout and identify target elements visually, then use `accessibilitySnapshot({ search: /pattern/ })` for efficient searching in subsequent calls.
576
+ **Combining both:** Use screenshot first to understand layout and identify target elements visually, then use `snapshot({ search: /pattern/ })` for efficient searching in subsequent calls.
462
577
 
463
578
  ## selector best practices
464
579
 
465
- **For unknown websites**: use `accessibilitySnapshot()` - it shows what's actually interactive with stable locators.
580
+ **For unknown websites**: use `snapshot()` - it shows what's actually interactive with stable locators.
466
581
 
467
582
  **For development** (when you have source code access), prefer stable selectors in this order:
468
583
 
@@ -476,16 +591,16 @@ Both `accessibilitySnapshot` and `screenshotWithAccessibilityLabels` use the sam
476
591
  Combine locators for precision:
477
592
 
478
593
  ```js
479
- page.locator('tr').filter({ hasText: 'John' }).locator('button').click()
480
- page.locator('button').nth(2).click()
594
+ state.page.locator('tr').filter({ hasText: 'John' }).locator('button').click()
595
+ state.page.locator('button').nth(2).click()
481
596
  ```
482
597
 
483
598
  If a locator matches multiple elements, Playwright throws "strict mode violation". Use `.first()`, `.last()`, or `.nth(n)`:
484
599
 
485
600
  ```js
486
- await page.locator('button').first().click() // first match
487
- await page.locator('.item').last().click() // last match
488
- await page.locator('li').nth(3).click() // 4th item (0-indexed)
601
+ await state.page.locator('button').first().click() // first match
602
+ await state.page.locator('.item').last().click() // last match
603
+ await state.page.locator('li').nth(3).click() // 4th item (0-indexed)
489
604
  ```
490
605
 
491
606
  ## working with pages
@@ -494,15 +609,15 @@ await page.locator('li').nth(3).click() // 4th item (0-indexed)
494
609
 
495
610
  **Get or create your page (first call):**
496
611
 
497
- On your very first execute call, reuse an existing empty tab or create a new one, and navigate it **in the same execute call**. Store it in `state` and use `state.myPage` for all subsequent operations instead of the default `page` variable:
612
+ On your very first execute call, reuse an existing empty tab or create a new one, and navigate it **in the same execute call**. Store it in `state` and use `state.page` for all subsequent operations instead of the default `page` variable:
498
613
 
499
614
  ```js
500
615
  // Reuse an empty about:blank tab if available, otherwise create a new one.
501
616
  // IMPORTANT: always navigate immediately in the same call to avoid another
502
617
  // agent grabbing the same about:blank tab between execute calls.
503
- state.myPage = context.pages().find(p => p.url() === 'about:blank') ?? await context.newPage();
504
- await state.myPage.goto('https://example.com');
505
- // Use state.myPage for ALL subsequent operations
618
+ state.page = context.pages().find((p) => p.url() === 'about:blank') ?? (await context.newPage())
619
+ await state.page.goto('https://example.com')
620
+ // Use state.page for ALL subsequent operations
506
621
  ```
507
622
 
508
623
  **Handle page closures gracefully:**
@@ -510,10 +625,10 @@ await state.myPage.goto('https://example.com');
510
625
  The user may close your page by accident (e.g., closing a tab in Chrome). Always check before using it and recreate if needed:
511
626
 
512
627
  ```js
513
- if (!state.myPage || state.myPage.isClosed()) {
514
- state.myPage = context.pages().find(p => p.url() === 'about:blank') ?? await context.newPage();
628
+ if (!state.page || state.page.isClosed()) {
629
+ state.page = context.pages().find((p) => p.url() === 'about:blank') ?? (await context.newPage())
515
630
  }
516
- await state.myPage.goto('https://example.com');
631
+ await state.page.goto('https://example.com')
517
632
  ```
518
633
 
519
634
  **Use an existing page only when the user asks:**
@@ -521,16 +636,16 @@ await state.myPage.goto('https://example.com');
521
636
  Only use a page from `context.pages()` if the user explicitly asks you to control a specific tab they already opened (e.g., they're logged into an app). Find it by URL pattern and store it in state:
522
637
 
523
638
  ```js
524
- const pages = context.pages().filter(x => x.url().includes('myapp.com'));
525
- if (pages.length === 0) throw new Error('No myapp.com page found. Ask user to enable playwriter on it.');
526
- if (pages.length > 1) throw new Error(`Found ${pages.length} matching pages, expected 1`);
527
- state.targetPage = pages[0];
639
+ const pages = context.pages().filter((x) => x.url().includes('myapp.com'))
640
+ if (pages.length === 0) throw new Error('No myapp.com page found. Ask user to enable playwriter on it.')
641
+ if (pages.length > 1) throw new Error(`Found ${pages.length} matching pages, expected 1`)
642
+ state.targetPage = pages[0]
528
643
  ```
529
644
 
530
645
  **List all available pages:**
531
646
 
532
647
  ```js
533
- context.pages().map(p => p.url())
648
+ context.pages().map((p) => p.url())
534
649
  ```
535
650
 
536
651
  ## navigation
@@ -538,8 +653,8 @@ context.pages().map(p => p.url())
538
653
  **Use `domcontentloaded`** for `page.goto()`:
539
654
 
540
655
  ```js
541
- await page.goto('https://example.com', { waitUntil: 'domcontentloaded' });
542
- await waitForPageLoad({ page, timeout: 5000 });
656
+ await state.page.goto('https://example.com', { waitUntil: 'domcontentloaded' })
657
+ await waitForPageLoad({ page: state.page, timeout: 5000 })
543
658
  ```
544
659
 
545
660
  ## common patterns
@@ -550,30 +665,31 @@ await waitForPageLoad({ page, timeout: 5000 });
550
665
  // BAD: curl/external requests don't have session cookies
551
666
  // curl -H "Cookie: ..." often fails due to missing cookies or CSRF
552
667
 
553
- // GOOD: fetch inside page.evaluate uses browser's full session
554
- const data = await page.evaluate(async (url) => {
555
- const resp = await fetch(url);
556
- return await resp.text();
557
- }, 'https://example.com/protected/resource');
668
+ // GOOD: fetch inside state.page.evaluate uses browser's full session
669
+ const data = await state.page.evaluate(async (url) => {
670
+ const resp = await fetch(url)
671
+ return await resp.text()
672
+ }, 'https://example.com/protected/resource')
558
673
  ```
559
674
 
560
675
  **Downloading large data** - console output truncates large strings. Trigger a browser download instead:
561
676
 
562
677
  ```js
563
678
  // Fetch protected data and trigger download to user's Downloads folder
564
- await page.evaluate(async (url) => {
565
- const resp = await fetch(url);
566
- const data = await resp.text();
567
- const blob = new Blob([data], { type: 'application/octet-stream' });
568
- const a = document.createElement('a');
569
- a.href = URL.createObjectURL(blob);
570
- a.download = 'data.json';
571
- a.click();
572
- }, 'https://example.com/protected/large-file');
679
+ await state.page.evaluate(async (url) => {
680
+ const resp = await fetch(url)
681
+ const data = await resp.text()
682
+ const blob = new Blob([data], { type: 'application/octet-stream' })
683
+ const a = document.createElement('a')
684
+ a.href = URL.createObjectURL(blob)
685
+ a.download = 'data.json'
686
+ a.click()
687
+ }, 'https://example.com/protected/large-file')
573
688
  // File saves to ~/Downloads - read it from there
574
689
  ```
575
690
 
576
691
  **Avoid permission-gated browser APIs** - some APIs require user permission prompts or special browser flags. These often fail silently or hang. Examples to avoid:
692
+
577
693
  - `navigator.clipboard.writeText()` - requires permission
578
694
  - Multiple concurrent downloads - browser may block
579
695
  - `window.showSaveFilePicker()` - requires user gesture
@@ -581,42 +697,86 @@ await page.evaluate(async (url) => {
581
697
 
582
698
  Instead, use simpler alternatives (single download via `a.click()`, store data in `state`, etc).
583
699
 
584
- **Links that open new tabs** - use cmd+click to open in a controllable new tab:
700
+ **Links that open new tabs** - playwriter cannot control popup windows opened via `window.open`. Use cmd+click to open in a controllable new tab instead (see mistake #10 above for a full example):
585
701
 
586
702
  ```js
587
- // For links with target=_blank or buttons that open popups
588
- await page.locator('a[target=_blank]').click({ modifiers: ['Meta'] });
589
- await page.waitForTimeout(1000);
590
-
591
- // New tab is last in context.pages()
592
- const pages = context.pages();
593
- const newTab = pages[pages.length - 1];
594
- console.log('New tab URL:', newTab.url());
703
+ await state.page.locator('a[target=_blank]').click({ modifiers: ['Meta'] })
704
+ await state.page.waitForTimeout(1000)
705
+ const pages = context.pages()
706
+ const newTab = pages[pages.length - 1]
707
+ console.log('New tab URL:', newTab.url())
595
708
  ```
596
709
 
597
- Note: `page.waitForEvent('popup')` is unreliable - playwriter cannot control popup windows opened via `window.open`. Use cmd+click instead.
598
-
599
710
  **Downloads** - capture and save:
600
711
 
601
712
  ```js
602
- const [download] = await Promise.all([page.waitForEvent('download'), page.click('button.download')]);
603
- await download.saveAs(`/tmp/${download.suggestedFilename()}`);
713
+ const [download] = await Promise.all([state.page.waitForEvent('download'), state.page.click('button.download')])
714
+ await download.saveAs(`/tmp/${download.suggestedFilename()}`)
604
715
  ```
605
716
 
606
- **iFrames** - use frameLocator:
717
+ **iFrames** - two approaches depending on what you need:
607
718
 
608
719
  ```js
609
- const frame = page.frameLocator('#my-iframe');
610
- await frame.locator('button').click();
720
+ // frameLocator: for chaining locator operations (click, fill, etc.)
721
+ const frame = state.page.frameLocator('#my-iframe')
722
+ await frame.locator('button').click()
723
+
724
+ // contentFrame: returns a Frame object, needed for snapshot({ frame })
725
+ const frame2 = await state.page.locator('iframe').contentFrame()
726
+ await snapshot({ frame: frame2 })
611
727
  ```
612
728
 
613
729
  **Dialogs** - handle alerts/confirms/prompts:
614
730
 
615
731
  ```js
616
- page.on('dialog', async dialog => { console.log(dialog.message()); await dialog.accept(); });
617
- await page.click('button.trigger-alert');
732
+ state.page.on('dialog', async (dialog) => {
733
+ console.log(dialog.message())
734
+ await dialog.accept()
735
+ })
736
+ await state.page.click('button.trigger-alert')
618
737
  ```
619
738
 
739
+ **Handling page obstacles (cookie modals, login walls, age gates)** - most major websites show blocking overlays. Always check for these with `snapshot()` right after navigation and dismiss them before doing anything else:
740
+
741
+ ```js
742
+ // After navigating, check for common obstacles
743
+ await waitForPageLoad({ page: state.page, timeout: 5000 })
744
+ const snap = await snapshot({
745
+ page: state.page,
746
+ search: /cookie|consent|accept|reject|decline|allow|age|verify|login|sign.in/i,
747
+ })
748
+ console.log(snap)
749
+ // Look for dismiss/accept/decline buttons in the snapshot, then click them:
750
+ // await state.page.locator('button:has-text("Accept")').click();
751
+ // await state.page.locator('button:has-text("Decline optional")').click();
752
+ // Then re-snapshot to confirm the modal is gone before proceeding
753
+ ```
754
+
755
+ If the page requires login and the user is already logged into Chrome, their session cookies are available — just navigate and the page should load authenticated. If not, ask the user for help or use their existing logged-in tab via `context.pages()`.
756
+
757
+ **Extracting and downloading media (images, videos)** - use `page.evaluate()` to extract URLs from the rendered DOM, then download via Node.js in the sandbox. This is far more reliable than parsing raw HTML:
758
+
759
+ ```js
760
+ // Extract all image URLs from rendered DOM
761
+ const images = await state.page.evaluate(() =>
762
+ Array.from(document.querySelectorAll('img[src]')).map((img) => ({
763
+ src: img.src,
764
+ alt: img.alt,
765
+ width: img.naturalWidth,
766
+ })),
767
+ )
768
+ console.log(JSON.stringify(images, null, 2))
769
+
770
+ // Download a specific image to disk
771
+ const fs = require('node:fs')
772
+ const resp = await fetch(images[0].src)
773
+ const buf = Buffer.from(await resp.arrayBuffer())
774
+ fs.writeFileSync('./downloaded-image.jpg', buf)
775
+ console.log('Saved', buf.length, 'bytes')
776
+ ```
777
+
778
+ For carousels or lazy-loaded galleries, you may need to click navigation arrows or scroll first, then re-extract. Use network interception (see "network interception" section) to capture high-resolution CDN URLs that may differ from the `img.src` thumbnails.
779
+
620
780
  ## utility functions
621
781
 
622
782
  **getLatestLogs** - retrieve captured browser console logs (up to 5000 per page, cleared on navigation):
@@ -625,51 +785,53 @@ await page.click('button.trigger-alert');
625
785
  await getLatestLogs({ page?, count?, search? })
626
786
  // Examples:
627
787
  const errors = await getLatestLogs({ search: /error/i, count: 50 })
628
- const pageLogs = await getLatestLogs({ page })
788
+ const pageLogs = await getLatestLogs({ page: state.page })
629
789
  ```
630
790
 
631
- For custom log collection across runs, store in state: `state.logs = []; page.on('console', m => state.logs.push(m.text()))`
791
+ For custom log collection across runs, store in state: `state.logs = []; state.page.on('console', m => state.logs.push(m.text()))`
632
792
 
633
793
  **getCleanHTML** - get cleaned HTML from a locator or page, with search and diffing:
634
794
 
635
795
  ```js
636
796
  await getCleanHTML({ locator, search?, showDiffSinceLastCall?, includeStyles? })
637
797
  // Examples:
638
- const html = await getCleanHTML({ locator: page.locator('body') })
639
- const html = await getCleanHTML({ locator: page, search: /button/i })
640
- const fullHtml = await getCleanHTML({ locator: page, showDiffSinceLastCall: false }) // disable diff
798
+ const html = await getCleanHTML({ locator: state.page.locator('body') })
799
+ const matches = await getCleanHTML({ locator: state.page, search: /button/i })
800
+ const fullHtml = await getCleanHTML({ locator: state.page, showDiffSinceLastCall: false }) // disable diff
641
801
  ```
642
802
 
643
803
  **Parameters:**
804
+
644
805
  - `locator` - Playwright Locator or Page to get HTML from
645
806
  - `search` - string/regex to filter results (returns first 10 matching lines with 5 lines context)
646
- - `showDiffSinceLastCall` - returns diff since last call (default: `true`). Pass `false` to get full HTML.
807
+ - `showDiffSinceLastCall` - returns diff since last call (default: `true`, but `false` when `search` is provided). Pass `false` to get full HTML.
647
808
  - `includeStyles` - keep style and class attributes (default: false)
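The `search` behavior ("first 10 matching lines with 5 lines context") can be pictured with a small standalone function. This is a sketch of the described behavior, not the library's implementation; `searchWithContext` and its option names are hypothetical, and merging overlapping context windows is an assumption.

```js
// Hypothetical sketch of "first N matching lines with M lines of context".
function searchWithContext(text, pattern, { maxMatches = 10, context = 5 } = {}) {
  const lines = text.split('\n')
  // Strip the g/y flags so RegExp.test() does not carry lastIndex between lines.
  const regex =
    typeof pattern === 'string' ? null : new RegExp(pattern.source, pattern.flags.replace(/[gy]/g, ''))
  const matches = (line) => (regex ? regex.test(line) : line.includes(pattern))

  const keep = new Set()
  let found = 0
  for (let i = 0; i < lines.length && found < maxMatches; i++) {
    if (!matches(lines[i])) continue
    found++
    // Keep the matching line plus `context` lines on each side.
    const start = Math.max(0, i - context)
    const end = Math.min(lines.length - 1, i + context)
    for (let j = start; j <= end; j++) keep.add(j)
  }
  // Emit kept lines in document order; overlapping windows are merged by the Set.
  return [...keep]
    .sort((a, b) => a - b)
    .map((j) => lines[j])
    .join('\n')
}

const sample = ['<main>', '<p>intro</p>', '<button>Save</button>', '<p>outro</p>', '</main>'].join('\n')
console.log(searchWithContext(sample, /button/i, { context: 1 }))
```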
648
809
 
649
810
  **HTML processing:**
650
811
  The function cleans HTML for compact, readable output:
812
+
651
813
  - **Removes tags**: script, style, link, meta, noscript, svg, head
652
814
  - **Unwraps nested wrappers**: Empty divs/spans with no attributes that only wrap a single child are collapsed (e.g., `<div><div><div><p>text</p></div></div></div>` → `<div><p>text</p></div>`)
653
815
  - **Removes empty elements**: Elements with no attributes and no content are removed
654
816
  - **Truncates long values**: Attribute values >200 chars and text content >500 chars are truncated
655
817
 
656
818
  **Attributes kept (summary):**
819
+
657
820
  - Common semantic and ARIA attributes (e.g., `href`, `name`, `type`, `aria-*`)
658
821
  - All `data-*` test attributes
659
822
  - Frequently used test IDs and special attributes (e.g., `testid`, `qa`, `e2e`, `vimium-label`)
660
823
 
661
- Snapshots return full content on first call, then diffs on subsequent calls. Diff is only returned when shorter than full content.
662
-
663
824
  **getPageMarkdown** - extract main page content as plain text using Mozilla Readability (same algorithm as Firefox Reader View). Strips navigation, ads, sidebars, and other clutter. Returns formatted text with title, author, and content:
664
825
 
665
826
  ```js
666
- await getPageMarkdown({ page, search?, showDiffSinceLastCall? })
827
+ await getPageMarkdown({ page: state.page, search?, showDiffSinceLastCall? })
667
828
  // Examples:
668
- const content = await getPageMarkdown({ page, showDiffSinceLastCall: false }) // full article
669
- const matches = await getPageMarkdown({ page, search: /API/i }) // search within content
829
+ const content = await getPageMarkdown({ page: state.page, showDiffSinceLastCall: false }) // full article
830
+ const matches = await getPageMarkdown({ page: state.page, search: /API/i }) // search within content
670
831
  ```
671
832
 
672
833
  **Output format:**
834
+
673
835
  ```
674
836
  # Article Title
675
837
 
@@ -681,13 +843,13 @@ The main article content as plain text, with paragraphs preserved...
681
843
  ```
682
844
 
683
845
  **Parameters:**
846
+
684
847
  - `page` - Playwright Page to extract content from
685
848
  - `search` - string/regex to filter content (returns first 10 matching lines with 5 lines context)
686
- - `showDiffSinceLastCall` - returns diff since last call (default: `true`). Pass `false` to get full content.
687
-
688
- Snapshots return full content on first call, then diffs on subsequent calls. Diff is only returned when shorter than full content.
849
+ - `showDiffSinceLastCall` - returns diff since last call (default: `true`, but `false` when `search` is provided). Pass `false` to get full content.
689
850
 
690
851
  **Use cases:**
852
+
691
853
  - Extract article text for LLM processing without HTML noise
692
854
  - Get readable content from news sites, blogs, documentation
693
855
  - Compare content changes after interactions
@@ -695,116 +857,180 @@ Snapshots return full content on first call, then diffs on subsequent calls. Dif
695
857
  **waitForPageLoad** - smart load detection that ignores analytics/ads:
696
858
 
697
859
  ```js
698
- await waitForPageLoad({ page, timeout?, pollInterval?, minWait? })
860
+ await waitForPageLoad({ page: state.page, timeout?, pollInterval?, minWait? })
699
861
  // Returns: { success, readyState, pendingRequests, waitTimeMs, timedOut }
700
862
  ```
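
As a rough mental model of "smart load detection", the sketch below polls a status callback and ignores pending requests that look like analytics or ads. Everything here is an assumption made for illustration: the ignore patterns, the default values, the `getStatus` callback, and the idea that `pendingRequests` is a list of URLs. Only the option names and the shape of the returned object are taken from the signature above.

```js
// Hypothetical ignore list; the real list used by waitForPageLoad is not shown in this document.
const IGNORED_PATTERNS = [/google-analytics\.com/, /googletagmanager\.com/, /doubleclick\.net/]

// Drop pending requests that match an ignored pattern
function relevantPending(pendingUrls, ignored = IGNORED_PATTERNS) {
  return pendingUrls.filter((url) => !ignored.some((pattern) => pattern.test(url)))
}

// getStatus is a caller-supplied async function returning { readyState, pendingUrls }
async function pollUntilLoaded(getStatus, { timeout = 30000, pollInterval = 100, minWait = 500 } = {}) {
  const startedAt = Date.now()
  while (true) {
    const { readyState, pendingUrls } = await getStatus()
    const pendingRequests = relevantPending(pendingUrls)
    const waitTimeMs = Date.now() - startedAt
    const ready = readyState === 'complete' && pendingRequests.length === 0
    if (ready && waitTimeMs >= minWait) {
      return { success: true, readyState, pendingRequests, waitTimeMs, timedOut: false }
    }
    if (waitTimeMs >= timeout) {
      return { success: false, readyState, pendingRequests, waitTimeMs, timedOut: true }
    }
    await new Promise((resolve) => setTimeout(resolve, pollInterval))
  }
}
```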
701
863
 
702
864
  **getCDPSession** - send raw CDP commands:
703
865
 
704
866
  ```js
705
- const cdp = await getCDPSession({ page });
706
- const metrics = await cdp.send('Page.getLayoutMetrics');
867
+ const cdp = await getCDPSession({ page: state.page })
868
+ const metrics = await cdp.send('Page.getLayoutMetrics')
707
869
  ```
708
870
 
709
871
  **getLocatorStringForElement** - get stable Playwright selector from an element:
710
872
 
711
873
  ```js
712
- const selector = await getLocatorStringForElement(page.locator('[id="submit-btn"]'));
874
+ const selector = await getLocatorStringForElement(state.page.locator('[id="submit-btn"]'))
713
875
  // => "getByRole('button', { name: 'Save' })"
714
876
  ```
715
877
 
716
878
  **getReactSource** - get React component source location (dev mode only):
717
879
 
718
880
  ```js
719
- const source = await getReactSource({ locator: page.locator('[data-testid="submit-btn"]') });
881
+ const source = await getReactSource({ locator: state.page.locator('[data-testid="submit-btn"]') })
720
882
  // => { fileName, lineNumber, columnNumber, componentName }
721
883
  ```
722
884
 
723
885
  **getStylesForLocator** - inspect CSS styles applied to an element, like browser DevTools "Styles" panel. Useful for debugging styling issues, finding where a CSS property is defined (file:line), and checking inherited styles. Returns selector, source location, and declarations for each matching rule. ALWAYS fetch `https://playwriter.dev/resources/styles-api.md` first with curl or webfetch tool.
724
886
 
725
887
  ```js
726
- const styles = await getStylesForLocator({ locator: page.locator('.btn'), cdp: await getCDPSession({ page }) });
727
- console.log(formatStylesAsText(styles));
888
+ const styles = await getStylesForLocator({
889
+ locator: state.page.locator('.btn'),
890
+ cdp: await getCDPSession({ page: state.page }),
891
+ })
892
+ console.log(formatStylesAsText(styles))
728
893
  ```
729
894
 
730
895
  **createDebugger** - set breakpoints, step through code, inspect variables at runtime. Useful for debugging issues that only reproduce in browser, understanding code flow, and inspecting state at specific points. Can pause on exceptions, evaluate expressions in scope, and blackbox framework code. ALWAYS fetch `https://playwriter.dev/resources/debugger-api.md` first.
731
896
 
732
897
  ```js
733
- const cdp = await getCDPSession({ page }); const dbg = createDebugger({ cdp }); await dbg.enable();
734
- const scripts = await dbg.listScripts({ search: 'app' });
735
- await dbg.setBreakpoint({ file: scripts[0].url, line: 42 });
898
+ const cdp = await getCDPSession({ page: state.page })
899
+ const dbg = createDebugger({ cdp })
900
+ await dbg.enable()
901
+ const scripts = await dbg.listScripts({ search: 'app' })
902
+ await dbg.setBreakpoint({ file: scripts[0].url, line: 42 })
736
903
  // when paused: dbg.inspectLocalVariables(), dbg.stepOver(), dbg.resume()
737
904
  ```
738
905
 
739
906
  **createEditor** - view and live-edit page scripts and CSS at runtime. Edits are in-memory (persist until reload). Useful for testing quick fixes, searching page scripts with grep, and toggling debug flags. ALWAYS read `https://playwriter.dev/resources/editor-api.md` first.
740
907
 
741
908
  ```js
742
- const cdp = await getCDPSession({ page }); const editor = createEditor({ cdp }); await editor.enable();
743
- const matches = await editor.grep({ regex: /console\.log/ });
744
- await editor.edit({ url: matches[0].url, oldString: 'DEBUG = false', newString: 'DEBUG = true' });
909
+ const cdp = await getCDPSession({ page: state.page })
910
+ const editor = createEditor({ cdp })
911
+ await editor.enable()
912
+ const matches = await editor.grep({ regex: /console\.log/ })
913
+ await editor.edit({ url: matches[0].url, oldString: 'DEBUG = false', newString: 'DEBUG = true' })
745
914
  ```
746
915
 
747
916
  **screenshotWithAccessibilityLabels** - take a screenshot with Vimium-style visual labels overlaid on interactive elements. Shows labels, captures screenshot, then removes labels. The image and accessibility snapshot are automatically included in the response. Can be called multiple times to capture multiple screenshots. Use a timeout of **20 seconds** for complex pages.
748
917
 
749
- Prefer this for pages with grids, image galleries, maps, or complex visual layouts where spatial position matters. For simple text-heavy pages, `accessibilitySnapshot` with search is faster and uses fewer tokens.
918
+ Prefer this for pages with grids, image galleries, maps, or complex visual layouts where spatial position matters. For simple text-heavy pages, `snapshot` with search is faster and uses fewer tokens.
750
919
 
751
920
  ```js
752
- await screenshotWithAccessibilityLabels({ page });
921
+ await screenshotWithAccessibilityLabels({ page: state.page })
753
922
  // Image and accessibility snapshot are automatically included in response
754
923
  // Use refs from snapshot to interact with elements
755
- await page.locator('[id="submit-btn"]').click();
924
+ await state.page.locator('[id="submit-btn"]').click()
756
925
 
757
926
  // Can take multiple screenshots in one execution
758
- await screenshotWithAccessibilityLabels({ page });
759
- await page.click('button');
760
- await screenshotWithAccessibilityLabels({ page });
927
+ await screenshotWithAccessibilityLabels({ page: state.page })
928
+ await state.page.click('button')
929
+ await screenshotWithAccessibilityLabels({ page: state.page })
761
930
  // Both images are included in the response
762
931
  ```
763
932
 
764
933
  Labels are color-coded: yellow=links, orange=buttons, coral=inputs, pink=checkboxes, peach=sliders, salmon=menus, amber=tabs.
765
934
 
766
- **startRecording / stopRecording** - record the page as a video at native FPS (30-60fps). Uses `chrome.tabCapture` in the extension context, so **recording survives page navigation**. Video is saved as mp4.
935
+ **resizeImage** - shrink an image in-place so it consumes fewer tokens when read back into context. `await resizeImage({ input: './screenshot.png' })`. Also accepts `width`, `height`, `maxDimension`, `quality`, `output`.
936
+
937
+ **recording.start / recording.stop** - record the page as a video at native FPS (30-60fps). Uses `chrome.tabCapture` in the extension context, so **recording survives page navigation**. Video is saved as mp4.
938
+
939
+ While recording is active, Playwriter automatically overlays a smooth ghost cursor that follows automated mouse actions (`page.mouse.*`, `locator.click()`, hover flows) using `page.onMouseAction` from the Playwright fork.
940
+
941
+ For demos where cursor movement should be visible and human-like, drive the page with interaction methods (`locator.click()`, `page.click()`, `page.mouse.move()`, `press`, typing). Avoid skipping interactions with direct state jumps (for example, `goto(itemUrl)` instead of clicking the link) when your goal is to show realistic pointer motion in the recording.
767
942
 
768
943
  **Note**: Recording requires the user to have clicked the Playwriter extension icon on the tab. This grants `activeTab` permission needed for `chrome.tabCapture`. Recording works on tabs where the icon was clicked - if you need to record a new tab, ask the user to click the icon on it first.
769
944
 
770
945
  ```js
771
946
  // Start recording - outputPath must be specified upfront
772
- await startRecording({
773
- page,
947
+ await recording.start({
948
+ page: state.page,
774
949
  outputPath: './recording.mp4',
775
- frameRate: 30, // default: 30
776
- audio: false, // default: false (tab audio)
777
- videoBitsPerSecond: 2500000 // 2.5 Mbps
778
- });
950
+ frameRate: 30, // default: 30
951
+ audio: false, // default: false (tab audio)
952
+ videoBitsPerSecond: 2500000, // 2.5 Mbps
953
+ })
779
954
 
780
955
  // Navigate around - recording continues!
781
- await page.click('a');
782
- await page.waitForLoadState('domcontentloaded');
783
- await page.goBack();
956
+ await state.page.click('a')
957
+ await state.page.waitForLoadState('domcontentloaded')
958
+ await state.page.goBack()
784
959
 
785
960
  // Stop and get result
786
- const { path, duration, size } = await stopRecording({ page });
787
- console.log(`Saved ${size} bytes, duration: ${duration}ms`);
961
+ const { path, duration, size } = await recording.stop({ page: state.page })
962
+ console.log(`Saved ${size} bytes, duration: ${duration}ms`)
788
963
  ```
789
964
 
790
965
  Additional recording utilities:
966
+
791
967
  ```js
792
968
  // Check if recording is active
793
- const { isRecording, startedAt } = await isRecording({ page });
969
+ const { isRecording, startedAt } = await recording.isRecording({ page: state.page })
794
970
 
795
971
  // Cancel recording without saving
796
- await cancelRecording({ page });
972
+ await recording.cancel({ page: state.page })
797
973
  ```
798
974
 
975
+ **ghostCursor.show / ghostCursor.hide** - manually show or hide the in-page cursor overlay. Useful for screenshots and demos even when recording is not running.
976
+
977
+ ```js
978
+ // Show cursor in the center (or keep current position if already visible)
979
+ await ghostCursor.show({ page: state.page })
980
+
981
+ // Optional styles: 'minimal' (default triangular pointer), 'dot', 'screenstudio'
982
+ await ghostCursor.show({ page: state.page, style: 'minimal' })
983
+
984
+ // Hide cursor overlay
985
+ await ghostCursor.hide({ page: state.page })
986
+ ```
987
+
988
+ `startRecording`, `stopRecording`, `isRecording`, and `cancelRecording` remain available as backward-compatible aliases.
989
+
799
990
  **Key difference from getDisplayMedia**: This approach uses `chrome.tabCapture` which runs in the extension context, not the page. The recording persists across navigations because the extension holds the `MediaRecorder`, not the page's JavaScript context.
800
991
 
992
+ **createDemoVideo** - create a polished demo video from a recording by automatically speeding up idle sections (time between execute() calls) while keeping interactions at normal speed. Useful for creating demo videos of agent workflows without long pauses.
993
+
994
+ While recording is active, playwriter tracks when each `execute()` call starts and ends. `recording.stop()` returns these timestamps alongside the video file. `createDemoVideo` uses this data to identify idle gaps and speed them up with ffmpeg in a single pass.
995
+
996
+ A 1-second buffer is preserved around each interaction so viewers see context before and after each action.
997
+
998
+ Requires `ffmpeg` and `ffprobe` installed on the system.
999
+
1000
+ **Timeout**: `createDemoVideo` runs ffmpeg on the full recording and can take 60–120+ seconds. Always pass `--timeout 120000` (or higher) to the playwriter execute call that contains it, otherwise it will silently time out before the file is written.
1001
+
1002
+ ```js
1003
+ // Start recording
1004
+ await recording.start({ page: state.page, outputPath: './recording.mp4' })
1005
+ ```
1006
+
1007
+ ```js
1008
+ // ... multiple execute() calls with browser interactions ...
1009
+ // Each call's timing is tracked automatically while recording is active
1010
+ ```
1011
+
1012
+ ```js
1013
+ // Stop recording — executionTimestamps is included in the result
1014
+ const recordingResult = await recording.stop({ page: state.page })
1015
+
1016
+ // Create demo video — idle gaps are sped up 5x (default)
1017
+ const demoPath = await createDemoVideo({
1018
+ recordingPath: recordingResult.path,
1019
+ durationMs: recordingResult.duration,
1020
+ executionTimestamps: recordingResult.executionTimestamps,
1021
+ speed: 5, // optional, default 5x for idle sections
1022
+ // outputFile: './demo.mp4', // optional, defaults to recording-demo.mp4
1023
+ })
1024
+ console.log('Demo video:', demoPath)
1025
+ ```
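
To illustrate how idle gaps could be derived from the execution timestamps, here is a minimal sketch. It assumes each timestamp is an object `{ start, end }` in milliseconds relative to the start of the recording; that shape, the function names, and the segment format are hypothetical and are not taken from Playwriter's source. The 1-second buffer matches the description above.

```js
// Hypothetical planner: splits the recording into normal-speed and sped-up segments.
function planSegments({ durationMs, executionTimestamps, speed, bufferMs = 1000 }) {
  // Pad each interaction with the buffer and clamp to the recording bounds
  const active = executionTimestamps
    .map(({ start, end }) => ({
      start: Math.max(0, start - bufferMs),
      end: Math.min(durationMs, end + bufferMs),
    }))
    .sort((a, b) => a.start - b.start)

  // Merge padded interactions that overlap
  const merged = []
  for (const range of active) {
    const last = merged[merged.length - 1]
    if (last && range.start <= last.end) last.end = Math.max(last.end, range.end)
    else merged.push({ ...range })
  }

  // Everything between merged interactions is idle and gets the speed-up factor
  const segments = []
  let cursor = 0
  for (const { start, end } of merged) {
    if (start > cursor) segments.push({ start: cursor, end: start, speed })
    segments.push({ start, end, speed: 1 })
    cursor = end
  }
  if (cursor < durationMs) segments.push({ start: cursor, end: durationMs, speed })
  return segments
}

// Estimated length of the resulting video
function outputDurationMs(segments) {
  return segments.reduce((total, s) => total + (s.end - s.start) / s.speed, 0)
}
```

For a 20-second recording with interactions at 2-4s and 14-15s and `speed: 5`, this plan keeps 7 seconds at normal speed and compresses the remaining 13 seconds of idle time to 2.6 seconds.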
1026
+
801
1027
  ## pinned elements
802
1028
 
803
1029
  Users can right-click → "Copy Playwriter Element Reference" to store elements in `globalThis.playwriterPinnedElem1` (increments for each pin). The reference is copied to clipboard:
804
1030
 
805
1031
  ```js
806
- const el = await page.evaluateHandle(() => globalThis.playwriterPinnedElem1);
807
- await el.click();
1032
+ const el = await state.page.evaluateHandle(() => globalThis.playwriterPinnedElem1)
1033
+ await el.click()
808
1034
  ```
809
1035
 
810
1036
  ## taking screenshots
@@ -812,24 +1038,28 @@ await el.click();
812
1038
  Always use `scale: 'css'` to avoid 2-4x larger images on high-DPI displays:
813
1039
 
814
1040
  ```js
815
- await page.screenshot({ path: 'shot.png', scale: 'css' });
1041
+ await state.page.screenshot({ path: 'shot.png', scale: 'css' })
816
1042
  ```
817
1043
 
818
- If you want to read back the image file into context make sure to resize it first, scaling down the image to make sure max size is 1500px. for example with `sips --resampleHeightWidthMax 1500 input.png --out output.png` on macOS.
1044
+ If you want to read back the image file into context, resize it first so it consumes fewer tokens:
1045
+
1046
+ ```js
1047
+ await resizeImage({ input: './shot.png' })
1048
+ ```
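
For intuition about what a `maxDimension` limit implies, the sketch below computes target dimensions that preserve aspect ratio. This is an assumption about how such an option typically behaves, not a description of `resizeImage` internals; `fitWithin` is a hypothetical name and 1500 is only an example value.

```js
// Hypothetical calculation: scale so the longest side is at most maxDimension.
function fitWithin(width, height, maxDimension) {
  const longest = Math.max(width, height)
  if (longest <= maxDimension) return { width, height } // already small enough
  const scale = maxDimension / longest
  return { width: Math.round(width * scale), height: Math.round(height * scale) }
}

console.log(fitWithin(3000, 2000, 1500)) // { width: 1500, height: 1000 }
```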
819
1049
 
820
1050
  ## page.evaluate
821
1051
 
822
1052
  Code inside `page.evaluate()` runs in the browser - use plain JavaScript only, no TypeScript syntax. Return values and log outside (console.log inside evaluate runs in browser, not visible):
823
1053
 
824
1054
  ```js
825
- const title = await page.evaluate(() => document.title);
826
- console.log('Title:', title);
1055
+ const title = await state.page.evaluate(() => document.title)
1056
+ console.log('Title:', title)
827
1057
 
828
- const info = await page.evaluate(() => ({
829
- url: location.href,
830
- buttons: document.querySelectorAll('button').length,
831
- }));
832
- console.log(info);
1058
+ const info = await state.page.evaluate(() => ({
1059
+ url: location.href,
1060
+ buttons: document.querySelectorAll('button').length,
1061
+ }))
1062
+ console.log(info)
833
1063
  ```
834
1064
 
835
1065
  ## loading files
@@ -837,7 +1067,9 @@ console.log(info);
837
1067
  Fill inputs with file content:
838
1068
 
839
1069
  ```js
840
- const fs = require('node:fs'); const content = fs.readFileSync('./data.txt', 'utf-8'); await page.locator('textarea').fill(content);
1070
+ const fs = require('node:fs')
1071
+ const content = fs.readFileSync('./data.txt', 'utf-8')
1072
+ await state.page.locator('textarea').fill(content)
841
1073
  ```
842
1074
 
843
1075
  ## network interception
@@ -845,34 +1077,49 @@ const fs = require('node:fs'); const content = fs.readFileSync('./data.txt', 'ut
845
1077
  For scraping or reverse-engineering APIs, intercept network requests instead of scrolling DOM. Store in `state` to analyze across calls:
846
1078
 
847
1079
  ```js
848
- state.requests = []; state.responses = [];
849
- page.on('request', req => { if (req.url().includes('/api/')) state.requests.push({ url: req.url(), method: req.method(), headers: req.headers() }); });
850
- page.on('response', async res => { if (res.url().includes('/api/')) { try { state.responses.push({ url: res.url(), status: res.status(), body: await res.json() }); } catch {} } });
1080
+ state.requests = []
1081
+ state.responses = []
1082
+ state.page.on('request', (req) => {
1083
+ if (req.url().includes('/api/')) state.requests.push({ url: req.url(), method: req.method(), headers: req.headers() })
1084
+ })
1085
+ state.page.on('response', async (res) => {
1086
+ if (res.url().includes('/api/')) {
1087
+ try {
1088
+ state.responses.push({ url: res.url(), status: res.status(), body: await res.json() })
1089
+ } catch {}
1090
+ }
1091
+ })
851
1092
  ```
852
1093
 
853
1094
  Then trigger actions (scroll, click, navigate) and analyze captured data:
854
1095
 
855
1096
  ```js
856
- console.log('Captured', state.responses.length, 'API calls');
857
- state.responses.forEach(r => console.log(r.status, r.url.slice(0, 80)));
1097
+ console.log('Captured', state.responses.length, 'API calls')
1098
+ state.responses.forEach((r) => console.log(r.status, r.url.slice(0, 80)))
858
1099
  ```
859
1100
 
860
1101
  Inspect a specific response to understand schema:
861
1102
 
862
1103
  ```js
863
- const resp = state.responses.find(r => r.url.includes('users'));
864
- console.log(JSON.stringify(resp.body, null, 2).slice(0, 2000));
1104
+ const resp = state.responses.find((r) => r.url.includes('users'))
1105
+ console.log(JSON.stringify(resp.body, null, 2).slice(0, 2000))
865
1106
  ```
866
1107
 
867
1108
  Replay API directly (useful for pagination):
868
1109
 
869
1110
  ```js
870
- const { url, headers } = state.requests.find(r => r.url.includes('feed'));
871
- const data = await page.evaluate(async ({ url, headers }) => { const res = await fetch(url, { headers }); return res.json(); }, { url, headers });
872
- console.log(data);
1111
+ const { url, headers } = state.requests.find((r) => r.url.includes('feed'))
1112
+ const data = await state.page.evaluate(
1113
+ async ({ url, headers }) => {
1114
+ const res = await fetch(url, { headers })
1115
+ return res.json()
1116
+ },
1117
+ { url, headers },
1118
+ )
1119
+ console.log(data)
873
1120
  ```
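
When the captured endpoint paginates through a query parameter, the URLs to replay can be generated up front. This is a sketch under assumptions: the parameter name `page` and the helper `pageUrls` are hypothetical, and many APIs use cursors or offsets instead, so check the captured requests first.

```js
// Hypothetical helper: builds one URL per page number from a captured request URL.
function pageUrls(capturedUrl, { param = 'page', from = 1, to = 3 } = {}) {
  const urls = []
  for (let n = from; n <= to; n++) {
    const url = new URL(capturedUrl)
    url.searchParams.set(param, String(n)) // replaces the existing value in place
    urls.push(url.toString())
  }
  return urls
}
```

Each generated URL can then be passed to the `state.page.evaluate` fetch shown in the replay example, reusing the captured `headers`.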
874
1121
 
875
- Clean up listeners when done: `page.removeAllListeners('request'); page.removeAllListeners('response');`
1122
+ Clean up listeners when done: `state.page.removeAllListeners('request'); state.page.removeAllListeners('response');`
876
1123
 
877
1124
  ## debugging web apps
878
1125
 
@@ -881,38 +1128,39 @@ When debugging why a web app isn't working (e.g., content not rendering, API err
881
1128
  **1. Console logs** — use `getLatestLogs` to check for errors:
882
1129
 
883
1130
  ```js
884
- const errors = await getLatestLogs({ page, search: /error|fail/i, count: 20 });
885
- const appLogs = await getLatestLogs({ page, search: /myComponent|state/i });
1131
+ const errors = await getLatestLogs({ page: state.page, search: /error|fail/i, count: 20 })
1132
+ const appLogs = await getLatestLogs({ page: state.page, search: /myComponent|state/i })
886
1133
  ```
887
1134
 
888
1135
  **2. DOM inspection via evaluate** — check content directly without screenshots:
889
1136
 
890
1137
  ```js
891
- const info = await page.evaluate(() => {
892
- const msgs = document.querySelectorAll('.message');
893
- return Array.from(msgs).map(m => ({
1138
+ const info = await state.page.evaluate(() => {
1139
+ const msgs = document.querySelectorAll('.message')
1140
+ return Array.from(msgs).map((m) => ({
894
1141
  text: m.textContent?.slice(0, 200),
895
1142
  visible: m.offsetHeight > 0,
896
- }));
897
- });
898
- console.log(JSON.stringify(info, null, 2));
1143
+ }))
1144
+ })
1145
+ console.log(JSON.stringify(info, null, 2))
899
1146
  ```
900
1147
 
901
1148
  **3. Combine snapshot + logs for full picture:**
902
1149
 
903
1150
  ```js
904
- await page.keyboard.press('Enter');
905
- await page.waitForTimeout(2000);
1151
+ await state.page.keyboard.press('Enter')
1152
+ await state.page.waitForTimeout(2000)
906
1153
 
907
- const snap = await accessibilitySnapshot({ page, search: /dialog|error|message/ });
908
- const logs = await getLatestLogs({ page, search: /error/i, count: 10 });
909
- console.log('UI:', snap);
910
- console.log('Logs:', logs);
1154
+ const snap = await snapshot({ page: state.page, search: /dialog|error|message/ })
1155
+ const logs = await getLatestLogs({ page: state.page, search: /error/i, count: 10 })
1156
+ console.log('UI:', snap)
1157
+ console.log('Logs:', logs)
911
1158
  ```
912
1159
 
913
1160
  ## capabilities
914
1161
 
915
1162
  Examples of what playwriter can do:
1163
+
916
1164
  - Monitor console logs while user reproduces a bug
917
1165
  - Intercept network requests to reverse-engineer APIs and build SDKs
918
1166
  - Scrape data by replaying paginated API calls instead of scrolling DOM
@@ -922,6 +1170,110 @@ Examples of what playwriter can do:
922
1170
  - Handle popups, downloads, iframes, and dialog boxes
923
1171
  - Record videos of browser sessions that survive page navigation
924
1172
 
1173
+ ## computer use
1174
+
1175
+ Playwriter provides the same browser control as Anthropic's `computer_20250124` tool and the Claude Chrome extension, using Playwright APIs instead of screenshot-based coordinate clicking. No computer use beta needed.
1176
+
1177
+ This section covers low-level mouse/keyboard APIs not documented elsewhere. For locator-based clicking, screenshots, navigation, forms, evaluate, snapshots, and network interception see their dedicated sections above.
1178
+
1179
+ ### clicking
1180
+
1181
+ ```js
1182
+ // Preferred: by locator (stable, auto-waits, no coordinates needed)
1183
+ await state.page.locator('button[name="Submit"]').click()
1184
+ await state.page.locator('text=Login').click({ button: 'right' })
1185
+ await state.page.locator('text=Login').dblclick()
1186
+ await state.page
1187
+ .locator('a')
1188
+ .first()
1189
+ .click({ modifiers: ['Meta'] }) // cmd+click opens new tab
1190
+
1191
+ // By coordinates (when locators aren't available, e.g. canvas, maps, custom widgets)
1192
+ await state.page.mouse.click(450, 320) // left click
1193
+ await state.page.mouse.click(450, 320, { button: 'right' }) // right click
1194
+ await state.page.mouse.dblclick(450, 320) // double click
1195
+ await state.page.mouse.click(450, 320, { clickCount: 3 }) // triple click
1196
+ await state.page.mouse.click(450, 320, { modifiers: ['Shift'] }) // shift+click
1197
+ ```
1198
+
1199
+ ### hover
1200
+
1201
+ ```js
1202
+ await state.page.locator('.tooltip-trigger').hover() // by locator (preferred)
1203
+ await state.page.mouse.move(450, 320) // by coordinates
1204
+ ```
1205
+
1206
+ ### scroll
1207
+
1208
+ ```js
1209
+ // By locator (preferred)
1210
+ await state.page.locator('#footer').scrollIntoViewIfNeeded()
1211
+
1212
+ // By pixel (for canvas, maps, infinite scroll)
1213
+ await state.page.mouse.wheel(0, 300) // scroll down 300px
1214
+ await state.page.mouse.wheel(0, -300) // scroll up
1215
+ await state.page.mouse.wheel(300, 0) // scroll right
1216
+ await state.page.mouse.wheel(-300, 0) // scroll left
1217
+
1218
+ // Scroll at a specific position
1219
+ await state.page.mouse.move(450, 320)
1220
+ await state.page.mouse.wheel(0, 500)
1221
+
1222
+ // Scroll inside a container
1223
+ await state.page.locator('.scrollable-list').evaluate((el) => {
1224
+ el.scrollTop += 500
1225
+ })
1226
+ ```
1227
+
1228
+ ### drag
1229
+
1230
+ ```js
1231
+ // By locator (preferred)
1232
+ await state.page.locator('#item').dragTo(state.page.locator('#target'))
1233
+
1234
+ // By coordinates (for canvas, sliders, custom drag targets)
1235
+ await state.page.mouse.move(100, 200)
1236
+ await state.page.mouse.down()
1237
+ await state.page.mouse.move(400, 500, { steps: 10 }) // steps for smooth drag
1238
+ await state.page.mouse.up()
1239
+ ```
1240
+
1241
+ **Freehand drawing, annotation widgets, and canvas tools** use this same `mouse.down → move → up` pattern. If a widget expects a drawn stroke (paint tools, annotation overlays, range sliders, timeline scrubbers), always use held-mouse motion — not `mouse.click()`:
1242
+
1243
+ ```js
1244
+ // Draw a stroke across a canvas or annotation layer
1245
+ await state.page.mouse.move(startX, startY)
1246
+ await state.page.mouse.down()
1247
+ await state.page.mouse.move(endX, endY, { steps: 15 }) // steps = smoother stroke
1248
+ await state.page.mouse.up()
1249
+ await state.page.waitForTimeout(500) // let the widget process the stroke
1250
+ ```
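
The effect of `steps` can be pictured as linear interpolation between the current pointer position and the target, with one intermediate `mousemove` per step. The function below is an illustrative sketch of that idea, not code from Playwright or Playwriter; `interpolate` is a hypothetical name.

```js
// Hypothetical illustration: the points a pointer would pass through for a given step count.
function interpolate(from, to, steps) {
  const points = []
  for (let i = 1; i <= steps; i++) {
    const t = i / steps // fraction of the path covered after step i
    points.push({
      x: from.x + (to.x - from.x) * t,
      y: from.y + (to.y - from.y) * t,
    })
  }
  return points
}
```

More steps means more intermediate events, which is why a drawn stroke looks smoother and why drag handlers that listen for `mousemove` are more likely to register the gesture.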
1251
+
1252
+ ### key hold / release / repeat
1253
+
1254
+ ```js
1255
+ // Hold modifier while pressing another key
1256
+ await state.page.keyboard.down('Shift')
1257
+ await state.page.keyboard.press('ArrowDown')
1258
+ await state.page.keyboard.up('Shift')
1259
+
1260
+ // Repeat a key
1261
+ for (let i = 0; i < 5; i++) await state.page.keyboard.press('ArrowDown')
1262
+ ```
1263
+
1264
+ ### resize viewport
1265
+
1266
+ ```js
1267
+ await state.page.setViewportSize({ width: 1280, height: 720 })
1268
+ ```
1269
+
1270
+ ### region screenshot (zoom equivalent)
1271
+
1272
+ ```js
1273
+ await state.page.screenshot({ path: 'region.png', scale: 'css', clip: { x: 100, y: 200, width: 400, height: 300 } })
1274
+ ```
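
A `clip` rectangle can also be derived from an element's bounding box instead of hard-coded numbers. The sketch below assumes a box of the shape `{ x, y, width, height }` (the shape returned by Playwright's `locator.boundingBox()`) and a known viewport size; `clipAround` and the default padding are hypothetical.

```js
// Hypothetical helper: pads a bounding box and clamps it to the viewport.
function clipAround(box, viewport, padding = 20) {
  const x = Math.max(0, box.x - padding)
  const y = Math.max(0, box.y - padding)
  const right = Math.min(viewport.width, box.x + box.width + padding)
  const bottom = Math.min(viewport.height, box.y + box.height + padding)
  return { x, y, width: right - x, height: bottom - y }
}
```

The result can be passed as `clip` to `state.page.screenshot(...)`; clamping to the viewport avoids requesting a region that extends past the visible page.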
1275
+
1276
+ Prefer locator-based actions over coordinates — locators are stable across scroll/resize, auto-wait for elements, and don't require screenshot round-trips that burn ~800 image tokens per cycle.
925
1277
 
926
1278
  ## Ghost Browser integration
927
1279
 
@@ -929,19 +1281,15 @@ Playwriter supports [Ghost Browser](https://ghostbrowser.com/) for multi-identit
929
1281
 
930
1282
  ```js
931
1283
  // List identities and open tabs in different ones
932
- const identities = await chrome.projects.getIdentitiesList();
933
- await chrome.ghostPublicAPI.openTab({ url: 'https://reddit.com', identity: identities[0].id });
1284
+ const identities = await chrome.projects.getIdentitiesList()
1285
+ await chrome.ghostPublicAPI.openTab({ url: 'https://reddit.com', identity: identities[0].id })
934
1286
 
935
1287
  // Assign proxies per tab or identity
936
- const proxies = await chrome.ghostProxies.getList();
937
- await chrome.ghostProxies.setTabProxy(tabId, proxies[0].id);
1288
+ const proxies = await chrome.ghostProxies.getList()
1289
+ await chrome.ghostProxies.setTabProxy(tabId, proxies[0].id)
938
1290
  ```
939
1291
 
940
1292
  For complete API reference with all methods, types, and examples, read:
941
1293
  `extension/src/ghost-browser-api.d.ts`
942
1294
 
943
1295
  Note: Only works in Ghost Browser. In regular Chrome, calls fail with "not available".
944
-
945
- ## debugging playwriter issues
946
-
947
- if some internal critical error happens you can read your own relay ws logs to understand the issue, it will show logs from extension, mcp and ws server together. then you can create a gh issue using `gh issue create -R remorses/playwriter --title title --body body`. ask for user confirmation before doing this.