playwriter 0.0.63 → 0.0.80

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (216)
  1. package/dist/aria-snapshot.d.ts +41 -3
  2. package/dist/aria-snapshot.d.ts.map +1 -1
  3. package/dist/aria-snapshot.js +131 -54
  4. package/dist/aria-snapshot.js.map +1 -1
  5. package/dist/aria-snapshot.test.js +5 -2
  6. package/dist/aria-snapshot.test.js.map +1 -1
  7. package/dist/aria-snapshot.unit.test.js +83 -41
  8. package/dist/aria-snapshot.unit.test.js.map +1 -1
  9. package/dist/assets/cursors/screen-studio/pointer-macos-tahoe-data-url.d.ts +5 -0
  10. package/dist/assets/cursors/screen-studio/pointer-macos-tahoe-data-url.d.ts.map +1 -0
  11. package/dist/assets/cursors/screen-studio/pointer-macos-tahoe-data-url.js +5 -0
  12. package/dist/assets/cursors/screen-studio/pointer-macos-tahoe-data-url.js.map +1 -0
  13. package/dist/bippy.js +1 -1
  14. package/dist/cdp-log.d.ts +1 -1
  15. package/dist/cdp-log.d.ts.map +1 -1
  16. package/dist/cdp-log.js +1 -1
  17. package/dist/cdp-log.js.map +1 -1
  18. package/dist/cdp-relay.d.ts.map +1 -1
  19. package/dist/cdp-relay.js +408 -298
  20. package/dist/cdp-relay.js.map +1 -1
  21. package/dist/cdp-session.d.ts.map +1 -1
  22. package/dist/cdp-session.js.map +1 -1
  23. package/dist/cdp-types.d.ts.map +1 -1
  24. package/dist/cdp-types.js +7 -7
  25. package/dist/cdp-types.js.map +1 -1
  26. package/dist/clean-html.d.ts.map +1 -1
  27. package/dist/clean-html.js +4 -5
  28. package/dist/clean-html.js.map +1 -1
  29. package/dist/cli.js +45 -27
  30. package/dist/cli.js.map +1 -1
  31. package/dist/create-logger.d.ts.map +1 -1
  32. package/dist/create-logger.js +3 -1
  33. package/dist/create-logger.js.map +1 -1
  34. package/dist/debugger-examples-types.d.ts.map +1 -1
  35. package/dist/debugger.d.ts.map +1 -1
  36. package/dist/debugger.js +1 -3
  37. package/dist/debugger.js.map +1 -1
  38. package/dist/diff-utils.d.ts.map +1 -1
  39. package/dist/diff-utils.js +1 -4
  40. package/dist/diff-utils.js.map +1 -1
  41. package/dist/editor-api.md +12 -2
  42. package/dist/editor-examples.d.ts +1 -1
  43. package/dist/editor-examples.d.ts.map +1 -1
  44. package/dist/editor-examples.js +1 -1
  45. package/dist/editor-examples.js.map +1 -1
  46. package/dist/editor.d.ts +1 -1
  47. package/dist/editor.d.ts.map +1 -1
  48. package/dist/editor.js +1 -1
  49. package/dist/editor.js.map +1 -1
  50. package/dist/executor.d.ts +26 -3
  51. package/dist/executor.d.ts.map +1 -1
  52. package/dist/executor.js +295 -64
  53. package/dist/executor.js.map +1 -1
  54. package/dist/executor.unit.test.js +38 -1
  55. package/dist/executor.unit.test.js.map +1 -1
  56. package/dist/extension-connection.test.js +139 -36
  57. package/dist/extension-connection.test.js.map +1 -1
  58. package/dist/ffmpeg.d.ts +148 -0
  59. package/dist/ffmpeg.d.ts.map +1 -0
  60. package/dist/ffmpeg.js +523 -0
  61. package/dist/ffmpeg.js.map +1 -0
  62. package/dist/ghost-browser.d.ts.map +1 -1
  63. package/dist/ghost-browser.js.map +1 -1
  64. package/dist/ghost-cursor-client.js +281 -0
  65. package/dist/ghost-cursor.d.ts +27 -0
  66. package/dist/ghost-cursor.d.ts.map +1 -0
  67. package/dist/ghost-cursor.js +63 -0
  68. package/dist/ghost-cursor.js.map +1 -0
  69. package/dist/htmlrewrite.d.ts.map +1 -1
  70. package/dist/htmlrewrite.js +17 -55
  71. package/dist/htmlrewrite.js.map +1 -1
  72. package/dist/htmlrewrite.test.js.map +1 -1
  73. package/dist/kill-port.d.ts.map +1 -1
  74. package/dist/kill-port.js +1 -3
  75. package/dist/kill-port.js.map +1 -1
  76. package/dist/locator-selector.test.d.ts +2 -0
  77. package/dist/locator-selector.test.d.ts.map +1 -0
  78. package/dist/locator-selector.test.js +96 -0
  79. package/dist/locator-selector.test.js.map +1 -0
  80. package/dist/mcp-client.js.map +1 -1
  81. package/dist/mcp.d.ts.map +1 -1
  82. package/dist/mcp.js +8 -3
  83. package/dist/mcp.js.map +1 -1
  84. package/dist/on-mouse-action.test.d.ts +2 -0
  85. package/dist/on-mouse-action.test.d.ts.map +1 -0
  86. package/dist/on-mouse-action.test.js +155 -0
  87. package/dist/on-mouse-action.test.js.map +1 -0
  88. package/dist/page-markdown.js +4 -4
  89. package/dist/page-markdown.js.map +1 -1
  90. package/dist/prompt.md +594 -255
  91. package/dist/protocol.d.ts +4 -0
  92. package/dist/protocol.d.ts.map +1 -1
  93. package/dist/readability.js +1 -1
  94. package/dist/recording-ghost-cursor.d.ts +41 -0
  95. package/dist/recording-ghost-cursor.d.ts.map +1 -0
  96. package/dist/recording-ghost-cursor.js +79 -0
  97. package/dist/recording-ghost-cursor.js.map +1 -0
  98. package/dist/recording-relay.d.ts.map +1 -1
  99. package/dist/recording-relay.js +8 -8
  100. package/dist/recording-relay.js.map +1 -1
  101. package/dist/relay-client.d.ts +17 -4
  102. package/dist/relay-client.d.ts.map +1 -1
  103. package/dist/relay-client.js +44 -10
  104. package/dist/relay-client.js.map +1 -1
  105. package/dist/relay-core.test.d.ts.map +1 -1
  106. package/dist/relay-core.test.js +187 -26
  107. package/dist/relay-core.test.js.map +1 -1
  108. package/dist/relay-navigation.test.d.ts.map +1 -1
  109. package/dist/relay-navigation.test.js +54 -31
  110. package/dist/relay-navigation.test.js.map +1 -1
  111. package/dist/relay-session.test.d.ts.map +1 -1
  112. package/dist/relay-session.test.js +113 -65
  113. package/dist/relay-session.test.js.map +1 -1
  114. package/dist/relay-state.d.ts +158 -0
  115. package/dist/relay-state.d.ts.map +1 -0
  116. package/dist/relay-state.js +306 -0
  117. package/dist/relay-state.js.map +1 -0
  118. package/dist/relay-state.test.d.ts +2 -0
  119. package/dist/relay-state.test.d.ts.map +1 -0
  120. package/dist/relay-state.test.js +472 -0
  121. package/dist/relay-state.test.js.map +1 -0
  122. package/dist/scoped-fs.d.ts.map +1 -1
  123. package/dist/scoped-fs.js.map +1 -1
  124. package/dist/screen-recording.d.ts +42 -4
  125. package/dist/screen-recording.d.ts.map +1 -1
  126. package/dist/screen-recording.js +88 -13
  127. package/dist/screen-recording.js.map +1 -1
  128. package/dist/selector-generator.js +1 -1
  129. package/dist/snapshot-tools.test.js +71 -28
  130. package/dist/snapshot-tools.test.js.map +1 -1
  131. package/dist/start-relay-server.d.ts +1 -1
  132. package/dist/start-relay-server.d.ts.map +1 -1
  133. package/dist/start-relay-server.js +1 -1
  134. package/dist/start-relay-server.js.map +1 -1
  135. package/dist/styles-api.md +8 -1
  136. package/dist/styles-examples.d.ts +1 -1
  137. package/dist/styles-examples.d.ts.map +1 -1
  138. package/dist/styles-examples.js +1 -1
  139. package/dist/styles-examples.js.map +1 -1
  140. package/dist/styles.d.ts.map +1 -1
  141. package/dist/styles.js +1 -3
  142. package/dist/styles.js.map +1 -1
  143. package/dist/test-declarations.d.ts.map +1 -1
  144. package/dist/test-utils.d.ts +1 -1
  145. package/dist/test-utils.d.ts.map +1 -1
  146. package/dist/test-utils.js +7 -5
  147. package/dist/test-utils.js.map +1 -1
  148. package/dist/utils.d.ts.map +1 -1
  149. package/dist/utils.js.map +1 -1
  150. package/dist/wait-for-page-load.d.ts.map +1 -1
  151. package/dist/wait-for-page-load.js +1 -1
  152. package/dist/wait-for-page-load.js.map +1 -1
  153. package/package.json +4 -3
  154. package/src/a11y-client.ts +5 -4
  155. package/src/aria-snapshot.test.ts +5 -2
  156. package/src/aria-snapshot.ts +303 -116
  157. package/src/aria-snapshot.unit.test.ts +199 -141
  158. package/src/aria-snapshots/github-raw.txt +1 -1
  159. package/src/aria-snapshots/hackernews-interactive.txt +240 -240
  160. package/src/aria-snapshots/hackernews-raw.txt +270 -270
  161. package/src/assets/aria-labels-example.png +0 -0
  162. package/src/assets/aria-labels-github.png +0 -0
  163. package/src/assets/aria-labels-hacker-news.png +0 -0
  164. package/src/assets/aria-labels-old-reddit.png +0 -0
  165. package/src/assets/cursors/screen-studio/pointer-macos-tahoe-data-url.ts +5 -0
  166. package/src/assets/cursors/screen-studio/pointer-macos-tahoe.svg +18 -0
  167. package/src/cdp-log.ts +4 -1
  168. package/src/cdp-relay.ts +949 -737
  169. package/src/cdp-session.ts +12 -3
  170. package/src/cdp-types.ts +51 -51
  171. package/src/clean-html.ts +4 -5
  172. package/src/cli.ts +82 -55
  173. package/src/create-logger.ts +5 -3
  174. package/src/debugger-examples-types.ts +4 -1
  175. package/src/debugger.ts +1 -5
  176. package/src/diff-utils.ts +2 -5
  177. package/src/editor-examples.ts +11 -1
  178. package/src/editor.ts +10 -2
  179. package/src/executor.ts +372 -73
  180. package/src/executor.unit.test.ts +48 -1
  181. package/src/extension-connection.test.ts +612 -488
  182. package/src/ffmpeg.ts +769 -0
  183. package/src/ghost-browser.ts +4 -6
  184. package/src/ghost-cursor-client.ts +368 -0
  185. package/src/ghost-cursor.ts +110 -0
  186. package/src/htmlrewrite.test.ts +6 -2
  187. package/src/htmlrewrite.ts +348 -386
  188. package/src/kill-port.ts +1 -3
  189. package/src/locator-selector.test.ts +115 -0
  190. package/src/mcp-client.ts +1 -1
  191. package/src/mcp.ts +21 -15
  192. package/src/on-mouse-action.test.ts +196 -0
  193. package/src/page-markdown.ts +7 -7
  194. package/src/protocol.ts +73 -57
  195. package/src/recording-ghost-cursor.ts +107 -0
  196. package/src/recording-relay.ts +20 -12
  197. package/src/relay-client.ts +84 -17
  198. package/src/relay-core.test.ts +761 -583
  199. package/src/relay-navigation.test.ts +517 -484
  200. package/src/relay-session.test.ts +984 -929
  201. package/src/relay-state.test.ts +570 -0
  202. package/src/relay-state.ts +497 -0
  203. package/src/resource.md +21 -49
  204. package/src/scoped-fs.ts +9 -3
  205. package/src/screen-recording.ts +175 -31
  206. package/src/skill.md +619 -271
  207. package/src/snapshot-tools.test.ts +580 -528
  208. package/src/snapshots/shadcn-ui-accessibility-full.md +181 -183
  209. package/src/snapshots/shadcn-ui-accessibility-interactive.md +119 -121
  210. package/src/start-relay-server.ts +14 -11
  211. package/src/styles-examples.ts +8 -1
  212. package/src/styles.ts +20 -21
  213. package/src/test-declarations.ts +6 -6
  214. package/src/test-utils.ts +104 -91
  215. package/src/utils.ts +2 -1
  216. package/src/wait-for-page-load.ts +6 -1
package/dist/prompt.md CHANGED
@@ -2,58 +2,62 @@
2
2
 
3
3
  Control user's Chrome browser via playwright code snippets. Prefer single-line code with semicolons between statements. Use playwriter immediately without waiting for user actions; only if you get "extension is not connected" or "no browser tabs have Playwriter enabled" should you ask the user to click the playwriter extension icon on the target tab.
4
4
 
5
+ **When to use playwriter instead of webfetch/curl:** If a website is JS-heavy (SPAs like Instagram, Twitter, Facebook, etc.), has cookie consent modals, login walls, lazy-loaded content, carousels, or infinite scroll — **always use playwriter**. Simple fetch/webfetch will return an empty HTML shell with no content. Do NOT waste time trying curl, webfetch, or parsing raw HTML from JS-rendered sites. Go straight to playwriter: navigate with a real browser, dismiss modals, then extract what you need via `page.evaluate()` or network interception.
6
+
5
7
  **If Chrome is not running**, the extension can't connect. Start Chrome from the command line before retrying:
6
8
 
7
9
  ```bash
8
10
  # macOS
9
- open -a "Google Chrome"
11
+ open -a "Google Chrome" --args --profile-directory=Default
10
12
 
11
13
  # Linux
12
- google-chrome &
14
+ google-chrome --profile-directory=Default &
13
15
 
14
16
  # Windows (cmd)
15
- start chrome.exe
17
+ start chrome.exe --profile-directory=Default
16
18
 
17
19
  # Windows (PowerShell)
18
- Start-Process chrome.exe
20
+ Start-Process chrome.exe -ArgumentList '--profile-directory=Default'
19
21
  ```
20
22
 
21
23
  To also enable automatic tab capture for screen recording (no manual extension click needed), add the `--allowlisted-extension-id` and `--auto-accept-this-tab-capture` flags:
22
24
 
23
25
  ```bash
24
26
  # macOS
25
- open -a "Google Chrome" --args --allowlisted-extension-id=jfeammnjpkecdekppnclgkkffahnhfhe --auto-accept-this-tab-capture
27
+ open -a "Google Chrome" --args --profile-directory=Default --allowlisted-extension-id=jfeammnjpkecdekppnclgkkffahnhfhe --auto-accept-this-tab-capture
26
28
 
27
29
  # Linux
28
- google-chrome --allowlisted-extension-id=jfeammnjpkecdekppnclgkkffahnhfhe --auto-accept-this-tab-capture &
30
+ google-chrome --profile-directory=Default --allowlisted-extension-id=jfeammnjpkecdekppnclgkkffahnhfhe --auto-accept-this-tab-capture &
29
31
 
30
32
  # Windows
31
- start chrome.exe --allowlisted-extension-id=jfeammnjpkecdekppnclgkkffahnhfhe --auto-accept-this-tab-capture
33
+ start chrome.exe --profile-directory=Default --allowlisted-extension-id=jfeammnjpkecdekppnclgkkffahnhfhe --auto-accept-this-tab-capture
32
34
  ```
33
35
 
34
36
  You can collaborate with the user - they can help with captchas, difficult elements, or reproducing bugs.
35
37
 
36
38
  ## context variables
37
39
 
38
- - `state` - object persisted between calls **within your session**. Each session has its own isolated state. Use to store pages, data, listeners (e.g., `state.myPage = await context.newPage()`)
40
+ - `state` - object persisted between calls **within your session**. Each session has its own isolated state. Use to store pages, data, listeners (e.g., `state.page = await context.newPage()`)
39
41
  - `page` - a default page (may be shared with other agents). Prefer creating your own page and storing it in `state` (see "working with pages")
40
42
  - `context` - browser context, access all pages via `context.pages()`
41
- - `require` - load Node.js modules like fs
43
+ - `require` - load Node.js modules (e.g., `const fs = require('node:fs')`). ESM `import` is not available in the sandbox
42
44
  - Node.js globals: `setTimeout`, `setInterval`, `fetch`, `URL`, `Buffer`, `crypto`, etc.
43
45
 
44
46
  **Important:** `state` is **session-isolated** but pages are **shared** across all sessions. See "working with pages" for how to avoid interference.
45
47
 
46
48
  ## rules
47
49
 
48
- - **Create your own page**: see "working with pages" — always create and store your own page in `state`, never use the default `page` for automation
50
+ - **Initialize state.page first**: see "working with pages" — at the start of a task, assign `state.page` (reuse `about:blank` or create one) and use `state.page` for all automation steps.
49
51
  - **Multiple calls**: use multiple execute calls for complex logic - helps understand intermediate state and isolate which action failed
50
52
  - **Never close**: never call `browser.close()` or `context.close()`. Only close pages you created or if user asks
51
53
  - **No bringToFront**: never call unless user asks - it's disruptive and unnecessary, you can interact with background pages
52
54
  - **Check state after actions**: always verify page state after clicking/submitting (see next section)
53
- - **Clean up listeners**: call `page.removeAllListeners()` at end of message to prevent leaks
54
- - **CDP sessions**: use `getCDPSession({ page })` not `page.context().newCDPSession()` - NEVER use `newCDPSession()` method, it doesn't work through playwriter relay
55
- - **Wait for load**: use `page.waitForLoadState('domcontentloaded')` not `page.waitForEvent('load')` - waitForEvent times out if already loaded
56
- - **Avoid timeouts**: prefer proper waits over `page.waitForTimeout()` - there are better ways to wait for elements
55
+ - **Clean up listeners**: call `state.page.removeAllListeners()` at end of message to prevent leaks
56
+ - **CDP sessions**: use `getCDPSession({ page: state.page })` not `state.page.context().newCDPSession()` - NEVER use `newCDPSession()` method, it doesn't work through playwriter relay
57
+ - **Wait for load**: use `state.page.waitForLoadState('domcontentloaded')` not `state.page.waitForEvent('load')` - waitForEvent times out if already loaded
58
+ - **Minimize timeouts**: prefer proper waits (`waitForSelector`, `waitForPageLoad`) over `state.page.waitForTimeout()`. Short timeouts (1-2s) are acceptable for non-deterministic events like popups, animations, or tab opens where no specific selector is available
59
+ - **Snapshot before screenshot**: always use `snapshot()` first to understand page state (text-based, fast, cheap). Only use `screenshot` when you specifically need visual/spatial information. Never take a screenshot just to check if a page loaded or to read text content — snapshot gives you that instantly without burning image tokens
60
+ - **Snapshot replaces page.evaluate() for inspection**: do NOT write `page.evaluate()` calls to manually query class names, bounding boxes, child counts, or visibility flags. `snapshot()` already shows every interactive element with its text, role, and a ready-to-use locator. If you catch yourself writing `document.querySelector` or `getBoundingClientRect` inside evaluate — stop and use `snapshot()` instead. Reserve `page.evaluate()` for actions that modify page state (e.g., `localStorage.clear()`, scroll manipulation) or extract non-DOM data (e.g., `window.__CONFIG__`)
57
61
 
58
62
  ## interaction feedback loop
59
63
 
@@ -62,10 +66,10 @@ Every browser interaction should follow a **observe → act → observe** loop.
62
66
  **Core loop:**
63
67
 
64
68
  1. **Open page** — get or create your page and navigate to the target URL
65
- 2. **Observe** — take an accessibility snapshot to understand the current state
66
- 3. **Update priors** — read the snapshot, identify the element to interact with
69
+ 2. **Observe** — print `state.page.url()` and take an accessibility snapshot. Always print the URL so you know where you are — pages can redirect, and actions can trigger unexpected navigation.
70
+ 3. **Check** — read the snapshot and URL. If the page isn't ready (still loading, expected content missing, wrong URL), **wait and observe again** — don't act on stale or incomplete state. Only proceed when you can identify the element to interact with.
67
71
  4. **Act** — perform one action (click, type, submit)
68
- 5. **Observe again** — take another snapshot to verify the action's effect
72
+ 5. **Observe again** — print URL + snapshot to verify the action's effect. If the action didn't take effect (nothing changed, page still loading), wait and observe again before proceeding.
69
73
  6. **Repeat** — continue from step 3 until the task is complete
70
74
 
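The core loop above can be sketched as a small retry helper. This is a minimal sketch under stated assumptions, not part of playwriter: `observeUntilReady`, `observe`, and `isReady` are hypothetical names introduced for illustration. In a real session, `observe` would presumably wrap `state.page.url()` and `snapshot({ page: state.page })`.

```js
// Hypothetical helper (not a playwriter API): observe repeatedly until the
// state looks ready, instead of acting on stale or incomplete state.
// `observe` returns the current state; `isReady` decides whether to proceed.
async function observeUntilReady(observe, isReady, { attempts = 5, delayMs = 500 } = {}) {
  let last
  for (let i = 0; i < attempts; i++) {
    last = await observe()
    if (isReady(last)) {
      return { ready: true, state: last, attempts: i + 1 }
    }
    // Not ready yet: wait before observing again (skip the wait after the final try)
    if (i < attempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs))
    }
  }
  return { ready: false, state: last, attempts }
}
```

Returning `ready: false` rather than throwing leaves the decision to the caller, who can print the last observed state and decide whether to wait longer or change approach.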
71
75
  ```
@@ -74,19 +78,20 @@ Every browser interaction should follow a **observe → act → observe** loop.
74
78
  └──────────────────┬──────────────────────────┘
75
79
 
76
80
  ┌────────────────┐
77
- observe │◄─────────────────┐
78
- (snapshot) │ │
79
- └───────┬────────┘ │
80
- ▼ │
81
- ┌────────────────┐ │
82
- update priors │ │
83
- │ (read result) │ │
84
- └───────┬────────┘
85
-
86
- ┌────────────────┐
87
- act │ │
88
- (click/type) │──────────────────┘
89
- └────────────────┘
81
+ ┌───►│ observe │◄─────────────────┐
82
+ (url + snapshot) │ │
83
+ └───────┬────────┘ │
84
+ ▼ │
85
+ ┌────────────────┐ │
86
+ check │
87
+ (read result) │ │
88
+ │ └───┬────────┬───┘
89
+ not │ │ ready │
90
+ ready │ ▼ │
91
+ └────────┘ ┌────────────────┐
92
+ act │ │
93
+ │ (click/type) │─────────────┘
94
+ └────────────────┘
90
95
  ```
91
96
 
92
97
  **Example: opening a Framer plugin via the command palette**
@@ -94,30 +99,36 @@ Every browser interaction should follow a **observe → act → observe** loop.
94
99
  Each step is a separate execute call. Notice how every action is followed by a snapshot to verify what happened:
95
100
 
96
101
  ```js
97
- // 1. Open page and observe
98
- state.myPage = context.pages().find(p => p.url() === 'about:blank') ?? await context.newPage();
99
- await state.myPage.goto('https://framer.com/projects/my-project', { waitUntil: 'domcontentloaded' });
100
- await accessibilitySnapshot({ page: state.myPage }).then(console.log)
102
+ // 1. Open page and observe — always print URL first
103
+ state.page = context.pages().find((p) => p.url() === 'about:blank') ?? (await context.newPage())
104
+ await state.page.goto('https://framer.com/projects/my-project', { waitUntil: 'domcontentloaded' })
105
+ console.log('URL:', state.page.url())
106
+ await snapshot({ page: state.page }).then(console.log)
101
107
  ```
102
108
 
103
109
  ```js
104
110
  // 2. Act: open command palette → observe result
105
- await state.myPage.keyboard.press('Meta+k');
106
- await accessibilitySnapshot({ page: state.myPage, search: /dialog|Search/ }).then(console.log)
111
+ await state.page.keyboard.press('Meta+k')
112
+ console.log('URL:', state.page.url())
113
+ await snapshot({ page: state.page, search: /dialog|Search/ }).then(console.log)
114
+ // If dialog didn't appear, observe again before retrying
107
115
  ```
108
116
 
109
117
  ```js
110
118
  // 3. Act: type search query → observe result
111
- await state.myPage.keyboard.type('MCP');
112
- await accessibilitySnapshot({ page: state.myPage, search: /MCP/ }).then(console.log)
119
+ await state.page.keyboard.type('MCP')
120
+ console.log('URL:', state.page.url())
121
+ await snapshot({ page: state.page, search: /MCP/ }).then(console.log)
113
122
  ```
114
123
 
115
124
  ```js
116
125
  // 4. Act: press Enter → observe plugin loaded
117
- await state.myPage.keyboard.press('Enter');
118
- await state.myPage.waitForTimeout(1000);
119
- const frame = state.myPage.frames().find(f => f.url().includes('plugins.framercdn.com'));
120
- await accessibilitySnapshot({ page: state.myPage, frame: frame || undefined }).then(console.log)
126
+ await state.page.keyboard.press('Enter')
127
+ await state.page.waitForTimeout(1000)
128
+ console.log('URL:', state.page.url())
129
+ const frame = state.page.frames().find((f) => f.url().includes('plugins.framercdn.com'))
130
+ await snapshot({ page: state.page, frame: frame || undefined }).then(console.log)
131
+ // If frame not found, wait and observe again — plugin may still be loading
121
132
  ```
122
133
 
123
134
  **Other ways to observe action results:**
@@ -126,226 +137,321 @@ Snapshots are the primary feedback mechanism, but some actions have side effects
126
137
 
127
138
  - **Console logs** — check for errors or app state after an action:
128
139
  ```js
129
- await getLatestLogs({ page, search: /error|fail/i, count: 20 })
140
+ await getLatestLogs({ page: state.page, search: /error|fail/i, count: 20 })
130
141
  ```
131
142
  - **Network requests** — verify API calls were made after a form submit or button click:
132
143
  ```js
133
- page.on('response', async res => { if (res.url().includes('/api/')) { console.log(res.status(), res.url()); } });
144
+ state.page.on('response', async (res) => {
145
+ if (res.url().includes('/api/')) {
146
+ console.log(res.status(), res.url())
147
+ }
148
+ })
134
149
  ```
135
150
  - **URL changes** — confirm navigation happened:
136
151
  ```js
137
- console.log(page.url())
152
+ console.log(state.page.url())
138
153
  ```
139
- - **Screenshots** — only when you need to verify visual layout (CSS, spatial positioning, colors). Snapshots are always preferred for content verification.
154
+ - **Screenshots** — only for visual layout issues (see "choosing between snapshot methods" below).
140
155
 
141
156
  ## common mistakes to avoid
142
157
 
143
158
  **1. Not verifying actions succeeded**
144
159
  Always check page state after important actions (form submissions, uploads, typing). Your mental model can diverge from actual browser state:
160
+
145
161
  ```js
146
- await page.keyboard.type('my text');
147
- await accessibilitySnapshot({ page, search: /my text/ })
162
+ await state.page.keyboard.type('my text')
163
+ await snapshot({ page: state.page, search: /my text/ })
148
164
  // If verifying visual layout specifically, use screenshotWithAccessibilityLabels instead
149
165
  ```
150
166
 
151
167
  **2. Assuming paste/upload worked**
152
168
  Clipboard paste (`Meta+v`) can silently fail. For file uploads, prefer file input:
169
+
153
170
  ```js
154
171
  // Reliable: use file input
155
- const fileInput = page.locator('input[type="file"]').first();
156
- await fileInput.setInputFiles('/path/to/image.png');
172
+ const fileInput = state.page.locator('input[type="file"]').first()
173
+ await fileInput.setInputFiles('/path/to/image.png')
157
174
 
158
175
  // Unreliable: clipboard paste may silently fail, need to focus textarea first for example
159
- await page.keyboard.press('Meta+v'); // always verify with screenshot!
176
+ await state.page.keyboard.press('Meta+v') // always verify with screenshot!
160
177
  ```
161
178
 
162
179
  **3. Using stale locators from old snapshots**
163
180
  Locators (especially ones with `>> nth=`) can change when the page updates. Always get a fresh snapshot before clicking:
181
+
164
182
  ```js
165
183
  // BAD: using ref from minutes ago
166
- await page.locator('[id="old-id"]').click(); // element may have changed
184
+ await state.page.locator('[id="old-id"]').click() // element may have changed
167
185
 
168
186
  // GOOD: get fresh snapshot, then immediately use locators from it
169
- await accessibilitySnapshot({ page, showDiffSinceLastCall: true })
187
+ await snapshot({ page: state.page, showDiffSinceLastCall: true })
170
188
  // Now use the NEW locators from this output
171
189
  ```
172
190
 
173
191
  **4. Wrong assumptions about current page/element**
174
192
  Before destructive actions (delete, submit), verify you're targeting the right thing:
193
+
175
194
  ```js
176
195
  // Before deleting, verify it's the right item
177
- await page.screenshotWithAccessibilityLabels({ page });
196
+ await screenshotWithAccessibilityLabels({ page: state.page })
178
197
  // READ the screenshot to confirm, THEN proceed with delete
179
198
  ```
180
199
 
181
200
  **5. Text concatenation without line breaks**
182
201
  `keyboard.type()` doesn't insert newlines from `\n` in strings. Use `keyboard.press('Enter')`:
202
+
183
203
  ```js
184
204
  // BAD: newlines in string don't create line breaks
185
- await page.keyboard.type('Line 1\nLine 2'); // becomes "Line 1Line 2"
205
+ await state.page.keyboard.type('Line 1\nLine 2') // becomes "Line 1Line 2"
186
206
 
187
207
  // GOOD: use Enter key for line breaks
188
- await page.keyboard.type('Line 1');
189
- await page.keyboard.press('Enter');
190
- await page.keyboard.type('Line 2');
208
+ await state.page.keyboard.type('Line 1')
209
+ await state.page.keyboard.press('Enter')
210
+ await state.page.keyboard.type('Line 2')
191
211
  ```
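Following the advice above, multi-line input can be wrapped in a small helper. This is a sketch only: `typeMultiline` is a hypothetical name, not a playwriter or Playwright function, and it assumes `page` exposes Playwright's `keyboard.type()` and `keyboard.press()`.

```js
// Hypothetical helper (not a playwriter API): type text line by line,
// pressing Enter between lines instead of relying on "\n" in the string.
async function typeMultiline(page, text) {
  const lines = text.split('\n')
  for (let i = 0; i < lines.length; i++) {
    // Skip empty segments so blank lines only produce an Enter press
    if (lines[i].length > 0) {
      await page.keyboard.type(lines[i])
    }
    // Press Enter after every line except the last one
    if (i < lines.length - 1) {
      await page.keyboard.press('Enter')
    }
  }
}
```

In a session this would be called as `await typeMultiline(state.page, 'Line 1\nLine 2')`, followed by a snapshot to confirm the text landed as expected.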
192
212
 
193
- **6. Quote escaping in $'...' syntax**
194
- When using `$'...'` for multiline code, nested quotes break parsing. Use different quote styles or escape them:
213
+ **6. Quote escaping in bash**
214
+ Bash parses `$`, backticks, and `\` inside double-quoted strings. This silently corrupts JS code containing dollar signs (regex like `/\$[\d.]+/`), template literals, or backslash patterns.
215
+
195
216
  ```bash
196
- # BAD: nested double quotes break $'...'
197
- playwriter -s 1 -e $'await page.locator("[id=\"_r_a_\"]").click()'
217
+ # BAD: double quotes, so bash interprets $ and backticks in your JS
218
+ playwriter -s 1 -e "const price = text.match(/\$[\d.]+/)"
198
219
 
199
- # GOOD: use single quotes inside, or template strings
200
- playwriter -s 1 -e $'await page.locator(\'[id="_r_a_"]\').click()'
220
+ # GOOD: single quotes, so bash passes everything through literally
221
+ playwriter -s 1 -e 'await state.page.locator(`[id="_r_a_"]`).click()'
201
222
 
202
- # GOOD: use heredoc for complex quoting
223
+ # GOOD: heredoc for complex code with mixed quotes
203
224
  playwriter -s 1 -e "$(cat <<'EOF'
204
- await page.locator('[id="_r_a_"]').click()
225
+ await state.page.locator('[id="_r_a_"]').click()
226
+ const match = html.match(/\$[\d.]+/g)
205
227
  EOF
206
228
  )"
207
229
  ```
208
230
 
209
231
  **7. Using screenshots when snapshots suffice**
210
232
  Screenshots + image analysis is expensive and slow. Only use screenshots for visual/CSS issues:
233
+
211
234
  ```js
212
235
  // BAD: screenshot to check if text appeared (wastes tokens on image analysis)
213
- await page.screenshot({ path: 'check.png', scale: 'css' });
236
+ await state.page.screenshot({ path: 'check.png', scale: 'css' })
214
237
 
215
238
  // GOOD: snapshot is text — fast, cheap, searchable
216
- await accessibilitySnapshot({ page, search: /expected text/i })
239
+ await snapshot({ page: state.page, search: /expected text/i })
217
240
 
218
241
  // GOOD: evaluate DOM directly for content checks
219
- const text = await page.evaluate(() => document.querySelector('.message')?.textContent);
242
+ const text = await state.page.evaluate(() => document.querySelector('.message')?.textContent)
220
243
  ```
221
244
 
222
245
  **8. Assuming page content loaded**
223
246
  Even after `goto()`, dynamic content may not be ready:
247
+
224
248
  ```js
225
- await page.goto('https://example.com');
249
+ await state.page.goto('https://example.com')
226
250
  // Content may still be loading via JavaScript!
227
- await page.waitForSelector('article', { timeout: 10000 });
251
+ await state.page.waitForSelector('article', { timeout: 10000 })
228
252
  // Or use waitForPageLoad utility
229
- await waitForPageLoad({ page, timeout: 5000 });
253
+ await waitForPageLoad({ page: state.page, timeout: 5000 })
230
254
  ```
231
255
 
232
- **9. Login buttons that open popups**
256
+ **9. Not using playwriter for JS-rendered sites**
257
+ Do NOT waste context trying webfetch, curl, or Playwright CLI screenshots on SPAs (Instagram, Twitter, etc.). These sites return empty HTML shells — the real content is rendered by JavaScript. Use playwriter with a real browser session instead:
258
+
259
+ ```js
260
+ // BAD: webfetch/curl on Instagram returns empty HTML, grep finds nothing, huge context wasted
261
+ // BAD: Playwright CLI screenshot needs browser install, produces blank/modal-blocked images
262
+
263
+ // GOOD: use playwriter — real browser, full JS rendering, interactive
264
+ state.page = context.pages().find((p) => p.url() === 'about:blank') ?? (await context.newPage())
265
+ await state.page.goto('https://www.instagram.com/p/ABC123/', { waitUntil: 'domcontentloaded' })
266
+ await waitForPageLoad({ page: state.page, timeout: 8000 })
267
+ await snapshot({ page: state.page, search: /cookie|consent|accept/i }).then(console.log)
268
+ // Now you can see modals, dismiss them, navigate carousels, extract content
269
+ ```
270
+
271
+ **10. Login buttons that open popups**
233
272
  Playwriter extension cannot control popup windows. If a login button opens a popup (common with OAuth/SSO), use cmd+click to open in a new tab instead:
273
+
234
274
  ```js
235
275
  // BAD: popup window is not controllable by playwriter
236
- await page.click('button:has-text("Login with Google")');
276
+ await state.page.click('button:has-text("Login with Google")')
237
277
 
238
278
  // GOOD: cmd+click opens in new tab that playwriter can control
239
- await page.locator('button:has-text("Login with Google")').click({ modifiers: ['Meta'] });
240
- await page.waitForTimeout(2000);
279
+ await state.page.locator('button:has-text("Login with Google")').click({ modifiers: ['Meta'] })
280
+ await state.page.waitForTimeout(2000)
241
281
 
242
282
  // Verify new tab opened - last page should be the login page
243
- const pages = context.pages();
244
- const loginPage = pages[pages.length - 1];
245
- if (loginPage.url() === page.url()) {
246
- throw new Error('Cmd+click did not open new tab - login may have opened as popup');
283
+ const pages = context.pages()
284
+ const loginPage = pages[pages.length - 1]
285
+ if (loginPage.url() === state.page.url()) {
286
+ throw new Error('Cmd+click did not open new tab - login may have opened as popup')
247
287
  }
248
288
 
249
289
  // Complete login flow in loginPage, cookies are shared with original page
250
- await loginPage.locator('[data-email]').first().click();
251
- await loginPage.waitForURL('**/callback**');
290
+ await loginPage.locator('[data-email]').first().click()
291
+ await loginPage.waitForURL('**/callback**')
252
292
  // Original page should now be authenticated
253
293
  ```
254
294
 
295
+ **11. Click times out or does nothing — snapshot to find the blocker**
296
+ When a click times out, a **modal or overlay** is likely intercepting pointer events. Do not retry with different selectors or `{ force: true }` — snapshot to find the blocker:
297
+
298
+ ```js
299
+ // click timed out → don't retry blindly, find what's blocking
300
+ await snapshot({ page: state.page, search: /dialog|modal/i })
301
+ // Found modal → interact with it properly (don't just close via X, it may reappear)
302
+ await state.page.getByRole('radio', { name: 'Nope, Vanilla' }).click()
303
+ ```
304
+
305
+ **12. Never use `dispatchEvent` or `{ force: true }` to bypass blockers**
306
+ `dispatchEvent(new MouseEvent(...))` and `{ force: true }` bypass Playwright checks but **do not trigger React/Vue/Svelte handlers** — state won't update. The same applies to `element.click()` inside `page.evaluate()`. If a click "succeeds" but nothing changes, you're either clicking the wrong node or using the wrong interaction pattern:
307
+
308
+ ```js
309
+ // BAD: heading click bypasses overlay but React ignores it
310
+ await state.page.locator('h3:has-text("Node.js")').click({ force: true })
311
+ // BAD: evaluate click bypasses all Playwright input simulation
312
+ await state.page.evaluate(() => document.querySelector('button').click())
313
+ // GOOD: snapshot shows the real interactive element is a radio, not the heading
314
+ await state.page.getByRole('radio', { name: 'Node.js' }).click()
315
+ ```
316
+
317
+ **13. Over-investigating instead of just interacting**
318
+ When something doesn't respond to a click, do NOT start inspecting CDP event listeners, React fibers, canvas pixel data, or writing `page.evaluate()` to read class names and bounding boxes. This wastes massive context. Instead:
319
+
320
+ 1. Take a `snapshot()` — it shows every interactive element and what to click
321
+ 2. Try a different interaction pattern if `click()` didn't work:
322
+ - **Drawing/annotation tools, canvas paint** → `mouse.down`, move with steps, `mouse.up` (see drag section)
323
+ - **Keyboard-activated modes** → press the shortcut key (snapshot shows tooltip text like "Draw mode D")
324
+ - **Sliders, timeline scrubbers** → drag pattern
325
+ - **Collapsed/toggled toolbars** → click the toggle first, wait, then interact
326
+ 3. Take another `snapshot()` to see what changed
327
+ 4. Only investigate DOM internals if correct interaction patterns produce zero response after 2–3 attempts
328
+
255
329
  ## checking page state
256
330
 
257
- After any action (click, submit, navigate), verify what happened. **Always prefer accessibility snapshots over screenshots** — snapshots are text (cheap, fast, searchable), screenshots require image analysis (expensive, slow).
331
+ After any action (click, submit, navigate), verify what happened. Always print URL first, then snapshot:
258
332
 
259
333
  ```js
260
- // Default: use snapshot with optional filtering
261
- page.url() + '\n' + await accessibilitySnapshot({ page })
334
+ // Always print URL first, then snapshot
335
+ console.log('URL:', state.page.url())
336
+ await snapshot({ page: state.page }).then(console.log)
262
337
 
263
338
  // Filter for specific content when snapshot is large
264
- await accessibilitySnapshot({ page, search: /dialog|button|error/i })
339
+ console.log('URL:', state.page.url())
340
+ await snapshot({ page: state.page, search: /dialog|button|error/i }).then(console.log)
265
341
  ```
266
342
 
267
- Only use `screenshotWithAccessibilityLabels({ page })` for **visual layout issues** (CSS bugs, spatial positioning, colors). For verifying text content, button states, or form values, snapshots are always sufficient.
268
-
269
- If nothing changed, try `await waitForPageLoad({ page, timeout: 3000 })` or you may have clicked the wrong element.
343
+ If nothing changed, try `await waitForPageLoad({ page: state.page, timeout: 3000 })`; if the page still looks the same, you may have clicked the wrong element.
270
344
 
271
345
  ## accessibility snapshots
272
346
 
273
347
  ```js
274
- await accessibilitySnapshot({ page, search?, showDiffSinceLastCall? })
348
+ await snapshot({ page: state.page, search?, showDiffSinceLastCall? })
275
349
  ```
276
350
 
351
+ `accessibilitySnapshot` is still available as an alias for backward compatibility.
352
+
277
353
  - `search` - string/regex to filter results (returns first 10 matching lines)
278
- - `showDiffSinceLastCall` - returns diff since last snapshot (default: `true`). Pass `false` to get full snapshot.
354
+ - `showDiffSinceLastCall` - returns diff since last snapshot (default: `true`, but `false` when `search` is provided). Pass `false` to get full snapshot.
279
355
 
280
- Snapshots return full content on first call, then diffs on subsequent calls. If nothing changed, returns "No changes since last snapshot" message. Use `showDiffSinceLastCall: false` to always get full content.
356
+ Snapshots return full content on first call, then diffs on subsequent calls. Diff is only returned when shorter than full content. If nothing changed, returns "No changes since last snapshot" message. Use `showDiffSinceLastCall: false` to always get full content. When `search` is provided, diffing is disabled by default so the search filters the full content — pass `showDiffSinceLastCall: true` explicitly to combine both. This diffing behavior also applies to `getCleanHTML` and `getPageMarkdown`.
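The defaulting rules described above can be modelled in a few lines. This is only a sketch of the documented behavior, not playwriter's actual implementation: `resolveShowDiff` and `chooseOutput` are hypothetical names, and comparing string lengths is an assumption about what "shorter" means.

```js
// Hypothetical model of the documented defaults (not playwriter source code).
// An explicit showDiffSinceLastCall always wins; otherwise diffing is on
// unless a search is provided.
function resolveShowDiff({ search, showDiffSinceLastCall } = {}) {
  if (showDiffSinceLastCall !== undefined) return showDiffSinceLastCall
  return search === undefined
}

// Assumption: "shorter" is measured by string length.
// The first call has nothing to diff against, so it returns full content.
function chooseOutput({ full, diff, showDiff, isFirstCall }) {
  if (!showDiff || isFirstCall) return full
  return diff.length < full.length ? diff : full
}

console.log(resolveShowDiff({}))
console.log(resolveShowDiff({ search: /button/i }))
console.log(resolveShowDiff({ search: /button/i, showDiffSinceLastCall: true }))
```

In practice this means a call with `search` and no explicit flag filters the full content, while adding `showDiffSinceLastCall: true` filters only what changed.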
281
357
 
282
358
  Example output:
283
359
 
284
360
  ```md
285
361
  - banner:
286
- - link "Home" [id="nav-home"]
287
- - navigation:
288
- - link "Docs" [data-testid="docs-link"]
289
- - link "Blog" role=link[name="Blog"]
362
+ - link "Home" [id="nav-home"]
363
+ - navigation:
364
+ - link "Docs" [data-testid="docs-link"]
365
+ - link "Blog" role=link[name="Blog"]
290
366
  ```
291
367
 
292
- Each interactive line ends with a Playwright locator you can pass to `page.locator()`.
368
+ Each interactive line ends with a Playwright locator you can pass to `state.page.locator()`.
293
369
  If multiple elements share the same locator, a `>> nth=N` suffix is added (0-based)
294
370
  to make it unique.
295
371
 
372
+ **Use snapshot locators directly — never invent selectors.** The snapshot output IS the selector. Do not guess CSS selectors or `getByText` when the snapshot already gives you the exact match:
373
+
374
+ ```js
375
+ // Snapshot shows: role=radio[name="Nope, Vanilla"] → use it directly
376
+ await state.page.getByRole('radio', { name: 'Nope, Vanilla' }).click()
377
+ // Snapshot shows: role=link[name="SIGN IN"] → or pass raw string to locator()
378
+ await state.page.locator('role=link[name="SIGN IN"]').click()
379
+ ```
380
+
381
+ **Beware CSS text-transform**: snapshots show visual text (`heading "NODE.JS"`) but DOM may be `"Node.js"`. Use case-insensitive regex: `getByRole('heading', { name: /node\.js/i })`.
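One way to avoid hand-writing these patterns is a small helper that escapes the visible text and makes it case-insensitive. `nameToRegex` below is a hypothetical helper for illustration, not something playwriter provides; it produces an unanchored pattern like the `/node\.js/i` example above.

```js
// Hypothetical helper (not part of playwriter): build a case-insensitive
// regex from the text shown in a snapshot, escaping regex metacharacters
// so that "." in "NODE.JS" only matches a literal dot.
function nameToRegex(visibleName) {
  const escaped = visibleName.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
  return new RegExp(escaped, 'i')
}

const pattern = nameToRegex('NODE.JS')
console.log(pattern.test('Node.js'))
console.log(pattern.test('NodeXjs'))

// Usage inside an execute call (assumes state.page is already set):
// await state.page.getByRole('heading', { name: nameToRegex('NODE.JS') }).click()
```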
382
+
296
383
  If a screenshot shows ref labels like `e3`, resolve them using the last snapshot:
297
384
 
298
385
  ```js
299
- const snapshot = await accessibilitySnapshot({ page })
386
+ const snap = await snapshot({ page: state.page })
300
387
  const locator = refToLocator({ ref: 'e3' })
301
- await page.locator(locator!).click()
388
+ await state.page.locator(locator!).click()
302
389
  ```
303
390
 
304
391
  ```js
305
- await page.locator('[id="nav-home"]').click()
306
- await page.locator('[data-testid="docs-link"]').click()
307
- await page.locator('role=link[name="Blog"]').click()
392
+ await state.page.locator('[id="nav-home"]').click()
393
+ await state.page.locator('[data-testid="docs-link"]').click()
394
+ await state.page.locator('role=link[name="Blog"]').click()
308
395
  ```
309
396
 
310
397
  Search for specific elements:
311
398
 
312
399
  ```js
313
- const snapshot = await accessibilitySnapshot({ page, search: /button|submit/i })
400
+ const snap = await snapshot({ page: state.page, search: /button|submit/i })
401
+ ```
402
+
403
+ **Scoping snapshots to a specific element** — pass a `locator` instead of `page` to snapshot only a subtree. This dramatically reduces output size when you only care about one section of the page (e.g., the main content area, ignoring the sidebar/header/footer):
404
+
405
+ ```js
406
+ // Full page snapshot: ~150 lines (sidebar, nav, header, footer, everything)
407
+ await snapshot({ page: state.page })
408
+
409
+ // Scoped to main: ~20 lines (just the content you care about)
410
+ await snapshot({ locator: state.page.locator('main') })
411
+
412
+ // Scope to a specific form, dialog, or section
413
+ await snapshot({ locator: state.page.locator('[role="dialog"]') })
414
+ await snapshot({ locator: state.page.locator('form#checkout') })
314
415
  ```
315
416
 
417
+ Use this whenever the full page snapshot is dominated by navigation or layout elements you don't need. It saves significant tokens and makes the output much easier to parse.
418
+
316
419
  **Filtering large snapshots in JS** — when the built-in `search` isn't enough (e.g., you need multiple patterns or custom logic), filter the snapshot string directly:
317
420
 
318
421
  ```js
319
- const snap = await accessibilitySnapshot({ page, showDiffSinceLastCall: false });
320
- const relevant = snap.split('\n').filter(l =>
321
- l.includes('dialog') || l.includes('error') || l.includes('button')
322
- ).join('\n');
323
- console.log(relevant);
422
+ const snap = await snapshot({ page: state.page, showDiffSinceLastCall: false })
423
+ const relevant = snap
424
+ .split('\n')
425
+ .filter((l) => l.includes('dialog') || l.includes('error') || l.includes('button'))
426
+ .join('\n')
427
+ console.log(relevant)
324
428
  ```
325
429
 
326
430
  This is much cheaper than taking a screenshot — use it as your primary debugging tool for verifying text content, checking if elements exist, or confirming state changes.
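When the same filtering is needed repeatedly, it may be convenient to wrap it in a function that accepts both strings and regexes. The sketch below is an illustration only: `filterSnapshotLines` is a hypothetical helper, and the sample snapshot text is made up to resemble the example output format shown earlier.

```js
// Hypothetical helper (not part of playwriter): keep snapshot lines that
// match ANY of the given patterns. Strings use includes(), regexes use test().
// Avoid regexes with the /g flag here, since test() is stateful for those.
function filterSnapshotLines(snap, patterns, { limit = Infinity } = {}) {
  const matched = []
  for (const line of snap.split('\n')) {
    const hit = patterns.some((p) => (typeof p === 'string' ? line.includes(p) : p.test(line)))
    if (hit) {
      matched.push(line)
      if (matched.length >= limit) break
    }
  }
  return matched.join('\n')
}

// Made-up sample text standing in for a real snapshot string.
const sample = [
  '- banner:',
  '- link "Home" [id="nav-home"]',
  '- dialog "Cookie consent":',
  '- button "Accept all" role=button[name="Accept all"]',
].join('\n')

console.log(filterSnapshotLines(sample, [/dialog/i, 'button']))
```

With a real snapshot, pass the string returned by `snapshot({ page: state.page, showDiffSinceLastCall: false })` instead of `sample`.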
327
431
 
328
432
  ## choosing between snapshot methods
329
433
 
330
- Both `accessibilitySnapshot` and `screenshotWithAccessibilityLabels` use the same ref system, so you can combine them effectively.
434
+ Both `snapshot` and `screenshotWithAccessibilityLabels` use the same ref system, so you can combine them effectively.
435
+
436
+ **Use `snapshot` when:**
331
437
 
332
- **Use `accessibilitySnapshot` when:**
333
438
  - Page has simple, semantic structure (articles, forms, lists)
334
439
  - You need to search for specific text or patterns
335
440
  - Token usage matters (text is smaller than images)
336
441
  - You need to process the output programmatically
337
442
 
338
443
  **Use `screenshotWithAccessibilityLabels` when:**
444
+
339
445
  - Page has complex visual layout (grids, galleries, dashboards, maps)
340
446
  - Spatial position matters (e.g., "first image", "top-left button")
341
447
  - DOM order doesn't match visual order
342
448
  - You need to understand the visual hierarchy
343
449
 
344
- **Combining both:** Use screenshot first to understand layout and identify target elements visually, then use `accessibilitySnapshot({ search: /pattern/ })` for efficient searching in subsequent calls.
450
+ **Combining both:** Use screenshot first to understand layout and identify target elements visually, then use `snapshot({ search: /pattern/ })` for efficient searching in subsequent calls.
345
451
 
346
452
  ## selector best practices
347
453
 
348
- **For unknown websites**: use `accessibilitySnapshot()` - it shows what's actually interactive with stable locators.
454
+ **For unknown websites**: use `snapshot()` - it shows what's actually interactive with stable locators.
349
455
 
350
456
  **For development** (when you have source code access), prefer stable selectors in this order:
351
457
 
@@ -359,16 +465,16 @@ Both `accessibilitySnapshot` and `screenshotWithAccessibilityLabels` use the sam
359
465
  Combine locators for precision:
360
466
 
361
467
  ```js
362
- page.locator('tr').filter({ hasText: 'John' }).locator('button').click()
363
- page.locator('button').nth(2).click()
468
+ state.page.locator('tr').filter({ hasText: 'John' }).locator('button').click()
469
+ state.page.locator('button').nth(2).click()
364
470
  ```
365
471
 
366
472
  If a locator matches multiple elements, Playwright throws "strict mode violation". Use `.first()`, `.last()`, or `.nth(n)`:
367
473
 
368
474
  ```js
369
- await page.locator('button').first().click() // first match
370
- await page.locator('.item').last().click() // last match
371
- await page.locator('li').nth(3).click() // 4th item (0-indexed)
475
+ await state.page.locator('button').first().click() // first match
476
+ await state.page.locator('.item').last().click() // last match
477
+ await state.page.locator('li').nth(3).click() // 4th item (0-indexed)
372
478
  ```
373
479
 
374
480
  ## working with pages
@@ -377,15 +483,15 @@ await page.locator('li').nth(3).click() // 4th item (0-indexed)
377
483
 
378
484
  **Get or create your page (first call):**
379
485
 
380
- On your very first execute call, reuse an existing empty tab or create a new one, and navigate it **in the same execute call**. Store it in `state` and use `state.myPage` for all subsequent operations instead of the default `page` variable:
486
+ On your very first execute call, reuse an existing empty tab or create a new one, and navigate it **in the same execute call**. Store it in `state` and use `state.page` for all subsequent operations instead of the default `page` variable:
381
487
 
382
488
  ```js
383
489
  // Reuse an empty about:blank tab if available, otherwise create a new one.
384
490
  // IMPORTANT: always navigate immediately in the same call to avoid another
385
491
  // agent grabbing the same about:blank tab between execute calls.
386
- state.myPage = context.pages().find(p => p.url() === 'about:blank') ?? await context.newPage();
387
- await state.myPage.goto('https://example.com');
388
- // Use state.myPage for ALL subsequent operations
492
+ state.page = context.pages().find((p) => p.url() === 'about:blank') ?? (await context.newPage())
493
+ await state.page.goto('https://example.com')
494
+ // Use state.page for ALL subsequent operations
389
495
  ```
390
496
 
391
497
  **Handle page closures gracefully:**
@@ -393,10 +499,10 @@ await state.myPage.goto('https://example.com');
393
499
  The user may close your page by accident (e.g., closing a tab in Chrome). Always check before using it and recreate if needed:
394
500
 
395
501
  ```js
396
- if (!state.myPage || state.myPage.isClosed()) {
397
- state.myPage = context.pages().find(p => p.url() === 'about:blank') ?? await context.newPage();
502
+ if (!state.page || state.page.isClosed()) {
503
+ state.page = context.pages().find((p) => p.url() === 'about:blank') ?? (await context.newPage())
398
504
  }
399
- await state.myPage.goto('https://example.com');
505
+ await state.page.goto('https://example.com')
400
506
  ```
401
507
 
402
508
  **Use an existing page only when the user asks:**
@@ -404,16 +510,16 @@ await state.myPage.goto('https://example.com');
404
510
  Only use a page from `context.pages()` if the user explicitly asks you to control a specific tab they already opened (e.g., they're logged into an app). Find it by URL pattern and store it in state:
405
511
 
406
512
  ```js
407
- const pages = context.pages().filter(x => x.url().includes('myapp.com'));
408
- if (pages.length === 0) throw new Error('No myapp.com page found. Ask user to enable playwriter on it.');
409
- if (pages.length > 1) throw new Error(`Found ${pages.length} matching pages, expected 1`);
410
- state.targetPage = pages[0];
513
+ const pages = context.pages().filter((x) => x.url().includes('myapp.com'))
514
+ if (pages.length === 0) throw new Error('No myapp.com page found. Ask user to enable playwriter on it.')
515
+ if (pages.length > 1) throw new Error(`Found ${pages.length} matching pages, expected 1`)
516
+ state.targetPage = pages[0]
411
517
  ```
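The same lookup can be factored into a reusable function. This is a sketch under stated assumptions: `findSinglePage` is a hypothetical name, and the `fakePages` objects only mimic `url()` so the example runs without a browser; in an execute call you would pass `context.pages()` instead.

```js
// Hypothetical helper (not part of playwriter): return the one page whose
// URL contains urlPart, and fail loudly when there are zero or several.
function findSinglePage(pages, urlPart) {
  const matches = pages.filter((p) => p.url().includes(urlPart))
  if (matches.length === 0) {
    throw new Error(`No ${urlPart} page found. Ask user to enable playwriter on it.`)
  }
  if (matches.length > 1) {
    throw new Error(`Found ${matches.length} matching pages, expected 1`)
  }
  return matches[0]
}

// Stand-in objects that only implement url(), for illustration.
const fakePages = ['about:blank', 'https://myapp.com/dashboard'].map((u) => ({ url: () => u }))
console.log(findSinglePage(fakePages, 'myapp.com').url())

// Usage inside an execute call (assumes context is available):
// state.targetPage = findSinglePage(context.pages(), 'myapp.com')
```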
412
518
 
413
519
  **List all available pages:**
414
520
 
415
521
  ```js
416
- context.pages().map(p => p.url())
522
+ context.pages().map((p) => p.url())
417
523
  ```
418
524
 
419
525
  ## navigation
@@ -421,8 +527,8 @@ context.pages().map(p => p.url())
421
527
  **Use `domcontentloaded`** for `page.goto()`:
422
528
 
423
529
  ```js
424
- await page.goto('https://example.com', { waitUntil: 'domcontentloaded' });
425
- await waitForPageLoad({ page, timeout: 5000 });
530
+ await state.page.goto('https://example.com', { waitUntil: 'domcontentloaded' })
531
+ await waitForPageLoad({ page: state.page, timeout: 5000 })
426
532
  ```
427
533
 
428
534
  ## common patterns
@@ -433,30 +539,31 @@ await waitForPageLoad({ page, timeout: 5000 });
433
539
  // BAD: curl/external requests don't have session cookies
434
540
  // curl -H "Cookie: ..." often fails due to missing cookies or CSRF
435
541
 
436
- // GOOD: fetch inside page.evaluate uses browser's full session
437
- const data = await page.evaluate(async (url) => {
438
- const resp = await fetch(url);
439
- return await resp.text();
440
- }, 'https://example.com/protected/resource');
542
+ // GOOD: fetch inside state.page.evaluate uses browser's full session
543
+ const data = await state.page.evaluate(async (url) => {
544
+ const resp = await fetch(url)
545
+ return await resp.text()
546
+ }, 'https://example.com/protected/resource')
441
547
  ```
442
548
 
443
549
  **Downloading large data** - console output truncates large strings. Trigger a browser download instead:
444
550
 
445
551
  ```js
446
552
  // Fetch protected data and trigger download to user's Downloads folder
447
- await page.evaluate(async (url) => {
448
- const resp = await fetch(url);
449
- const data = await resp.text();
450
- const blob = new Blob([data], { type: 'application/octet-stream' });
451
- const a = document.createElement('a');
452
- a.href = URL.createObjectURL(blob);
453
- a.download = 'data.json';
454
- a.click();
455
- }, 'https://example.com/protected/large-file');
553
+ await state.page.evaluate(async (url) => {
554
+ const resp = await fetch(url)
555
+ const data = await resp.text()
556
+ const blob = new Blob([data], { type: 'application/octet-stream' })
557
+ const a = document.createElement('a')
558
+ a.href = URL.createObjectURL(blob)
559
+ a.download = 'data.json'
560
+ a.click()
561
+ }, 'https://example.com/protected/large-file')
456
562
  // File saves to ~/Downloads - read it from there
457
563
  ```
458
564
 
459
565
  **Avoid permission-gated browser APIs** - some APIs require user permission prompts or special browser flags. These often fail silently or hang. Examples to avoid:
566
+
460
567
  - `navigator.clipboard.writeText()` - requires permission
461
568
  - Multiple concurrent downloads - browser may block
462
569
  - `window.showSaveFilePicker()` - requires user gesture
@@ -464,42 +571,86 @@ await page.evaluate(async (url) => {
464
571
 
465
572
  Instead, use simpler alternatives (single download via `a.click()`, store data in `state`, etc).
466
573
 
467
- **Links that open new tabs** - use cmd+click to open in a controllable new tab:
574
+ **Links that open new tabs** - playwriter cannot control popup windows opened via `window.open`. Use cmd+click to open in a controllable new tab instead (see mistake #10 above for a full example):
468
575
 
469
576
  ```js
470
- // For links with target=_blank or buttons that open popups
471
- await page.locator('a[target=_blank]').click({ modifiers: ['Meta'] });
472
- await page.waitForTimeout(1000);
473
-
474
- // New tab is last in context.pages()
475
- const pages = context.pages();
476
- const newTab = pages[pages.length - 1];
477
- console.log('New tab URL:', newTab.url());
577
+ await state.page.locator('a[target=_blank]').click({ modifiers: ['Meta'] })
578
+ await state.page.waitForTimeout(1000)
579
+ const pages = context.pages()
580
+ const newTab = pages[pages.length - 1]
581
+ console.log('New tab URL:', newTab.url())
478
582
  ```
479
583
 
480
- Note: `page.waitForEvent('popup')` is unreliable - playwriter cannot control popup windows opened via `window.open`. Use cmd+click instead.
481
-
482
584
  **Downloads** - capture and save:
483
585
 
484
586
  ```js
485
- const [download] = await Promise.all([page.waitForEvent('download'), page.click('button.download')]);
486
- await download.saveAs(`/tmp/${download.suggestedFilename()}`);
587
+ const [download] = await Promise.all([state.page.waitForEvent('download'), state.page.click('button.download')])
588
+ await download.saveAs(`/tmp/${download.suggestedFilename()}`)
487
589
  ```
488
590
 
489
- **iFrames** - use frameLocator:
591
+ **iFrames** - two approaches depending on what you need:
490
592
 
491
593
  ```js
492
- const frame = page.frameLocator('#my-iframe');
493
- await frame.locator('button').click();
594
+ // frameLocator: for chaining locator operations (click, fill, etc.)
595
+ const frame = state.page.frameLocator('#my-iframe')
596
+ await frame.locator('button').click()
597
+
598
+ // contentFrame: returns a Frame object, needed for snapshot({ frame })
599
+ const frame2 = await state.page.locator('iframe').contentFrame()
600
+ await snapshot({ frame: frame2 })
494
601
  ```
495
602
 
496
603
  **Dialogs** - handle alerts/confirms/prompts:
497
604
 
498
605
  ```js
499
- page.on('dialog', async dialog => { console.log(dialog.message()); await dialog.accept(); });
500
- await page.click('button.trigger-alert');
606
+ state.page.on('dialog', async (dialog) => {
607
+ console.log(dialog.message())
608
+ await dialog.accept()
609
+ })
610
+ await state.page.click('button.trigger-alert')
501
611
  ```
502
612
 
613
+ **Handling page obstacles (cookie modals, login walls, age gates)** - most major websites show blocking overlays. Always check for these with `snapshot()` right after navigation and dismiss them before doing anything else:
614
+
615
+ ```js
616
+ // After navigating, check for common obstacles
617
+ await waitForPageLoad({ page: state.page, timeout: 5000 })
618
+ const snap = await snapshot({
619
+ page: state.page,
620
+ search: /cookie|consent|accept|reject|decline|allow|age|verify|login|sign.in/i,
621
+ })
622
+ console.log(snap)
623
+ // Look for dismiss/accept/decline buttons in the snapshot, then click them:
624
+ // await state.page.locator('button:has-text("Accept")').click();
625
+ // await state.page.locator('button:has-text("Decline optional")').click();
626
+ // Then re-snapshot to confirm the modal is gone before proceeding
627
+ ```
628
+
629
+ If the page requires login and the user is already logged into Chrome, their session cookies are available — just navigate and the page should load authenticated. If not, ask the user for help or use their existing logged-in tab via `context.pages()`.
630
+
631
+ **Extracting and downloading media (images, videos)** - use `page.evaluate()` to extract URLs from the rendered DOM, then download via Node.js in the sandbox. This is far more reliable than parsing raw HTML:
632
+
633
+ ```js
634
+ // Extract all image URLs from rendered DOM
635
+ const images = await state.page.evaluate(() =>
636
+ Array.from(document.querySelectorAll('img[src]')).map((img) => ({
637
+ src: img.src,
638
+ alt: img.alt,
639
+ width: img.naturalWidth,
640
+ })),
641
+ )
642
+ console.log(JSON.stringify(images, null, 2))
643
+
644
+ // Download a specific image to disk
645
+ const fs = require('node:fs')
646
+ const resp = await fetch(images[0].src)
647
+ const buf = Buffer.from(await resp.arrayBuffer())
648
+ fs.writeFileSync('./downloaded-image.jpg', buf)
649
+ console.log('Saved', buf.length, 'bytes')
650
+ ```
651
+
652
+ For carousels or lazy-loaded galleries, you may need to click navigation arrows or scroll first, then re-extract. Use network interception (see "network interception" section) to capture high-resolution CDN URLs that may differ from the `img.src` thumbnails.
653
+
503
654
  ## utility functions
504
655
 
505
656
  **getLatestLogs** - retrieve captured browser console logs (up to 5000 per page, cleared on navigation):
@@ -508,51 +659,53 @@ await page.click('button.trigger-alert');
508
659
  await getLatestLogs({ page?, count?, search? })
509
660
  // Examples:
510
661
  const errors = await getLatestLogs({ search: /error/i, count: 50 })
511
- const pageLogs = await getLatestLogs({ page })
662
+ const pageLogs = await getLatestLogs({ page: state.page })
512
663
  ```
513
664
 
514
- For custom log collection across runs, store in state: `state.logs = []; page.on('console', m => state.logs.push(m.text()))`
665
+ For custom log collection across runs, store in state: `state.logs = []; state.page.on('console', m => state.logs.push(m.text()))`
515
666
 
516
667
  **getCleanHTML** - get cleaned HTML from a locator or page, with search and diffing:
517
668
 
518
669
  ```js
519
670
  await getCleanHTML({ locator, search?, showDiffSinceLastCall?, includeStyles? })
520
671
  // Examples:
521
- const html = await getCleanHTML({ locator: page.locator('body') })
522
- const html = await getCleanHTML({ locator: page, search: /button/i })
523
- const fullHtml = await getCleanHTML({ locator: page, showDiffSinceLastCall: false }) // disable diff
672
+ const html = await getCleanHTML({ locator: state.page.locator('body') })
673
+ const matches = await getCleanHTML({ locator: state.page, search: /button/i })
674
+ const fullHtml = await getCleanHTML({ locator: state.page, showDiffSinceLastCall: false }) // disable diff
524
675
  ```
525
676
 
526
677
  **Parameters:**
678
+
527
679
  - `locator` - Playwright Locator or Page to get HTML from
528
680
  - `search` - string/regex to filter results (returns first 10 matching lines with 5 lines context)
529
- - `showDiffSinceLastCall` - returns diff since last call (default: `true`). Pass `false` to get full HTML.
681
+ - `showDiffSinceLastCall` - returns diff since last call (default: `true`, but `false` when `search` is provided). Pass `false` to get full HTML.
530
682
  - `includeStyles` - keep style and class attributes (default: false)
531
683
 
532
684
  **HTML processing:**
533
685
  The function cleans HTML for compact, readable output:
686
+
534
687
  - **Removes tags**: script, style, link, meta, noscript, svg, head
535
688
  - **Unwraps nested wrappers**: Empty divs/spans with no attributes that only wrap a single child are collapsed (e.g., `<div><div><div><p>text</p></div></div></div>` → `<div><p>text</p></div>`)
536
689
  - **Removes empty elements**: Elements with no attributes and no content are removed
537
690
  - **Truncates long values**: Attribute values >200 chars and text content >500 chars are truncated
538
691
 
539
692
  **Attributes kept (summary):**
693
+
540
694
  - Common semantic and ARIA attributes (e.g., `href`, `name`, `type`, `aria-*`)
541
695
  - All `data-*` test attributes
542
696
  - Frequently used test IDs and special attributes (e.g., `testid`, `qa`, `e2e`, `vimium-label`)
543
697
 
544
- Snapshots return full content on first call, then diffs on subsequent calls. Diff is only returned when shorter than full content.
545
-
546
698
  **getPageMarkdown** - extract main page content as plain text using Mozilla Readability (same algorithm as Firefox Reader View). Strips navigation, ads, sidebars, and other clutter. Returns formatted text with title, author, and content:
547
699
 
548
700
  ```js
549
- await getPageMarkdown({ page, search?, showDiffSinceLastCall? })
701
+ await getPageMarkdown({ page: state.page, search?, showDiffSinceLastCall? })
550
702
  // Examples:
551
- const content = await getPageMarkdown({ page, showDiffSinceLastCall: false }) // full article
552
- const matches = await getPageMarkdown({ page, search: /API/i }) // search within content
703
+ const content = await getPageMarkdown({ page: state.page, showDiffSinceLastCall: false }) // full article
704
+ const matches = await getPageMarkdown({ page: state.page, search: /API/i }) // search within content
553
705
  ```
554
706
 
555
707
  **Output format:**
708
+
556
709
  ```
557
710
  # Article Title
558
711
 
@@ -564,13 +717,13 @@ The main article content as plain text, with paragraphs preserved...
564
717
  ```
565
718
 
566
719
  **Parameters:**
720
+
567
721
  - `page` - Playwright Page to extract content from
568
722
  - `search` - string/regex to filter content (returns first 10 matching lines with 5 lines context)
569
- - `showDiffSinceLastCall` - returns diff since last call (default: `true`). Pass `false` to get full content.
570
-
571
- Snapshots return full content on first call, then diffs on subsequent calls. Diff is only returned when shorter than full content.
723
+ - `showDiffSinceLastCall` - returns diff since last call (default: `true`, but `false` when `search` is provided). Pass `false` to get full content.
572
724
 
573
725
  **Use cases:**
726
+
574
727
  - Extract article text for LLM processing without HTML noise
575
728
  - Get readable content from news sites, blogs, documentation
576
729
  - Compare content changes after interactions
@@ -578,116 +731,180 @@ Snapshots return full content on first call, then diffs on subsequent calls. Dif
578
731
  **waitForPageLoad** - smart load detection that ignores analytics/ads:
579
732
 
580
733
  ```js
581
- await waitForPageLoad({ page, timeout?, pollInterval?, minWait? })
734
+ await waitForPageLoad({ page: state.page, timeout?, pollInterval?, minWait? })
582
735
  // Returns: { success, readyState, pendingRequests, waitTimeMs, timedOut }
583
736
  ```
584
737
 
585
738
  **getCDPSession** - send raw CDP commands:
586
739
 
587
740
  ```js
588
- const cdp = await getCDPSession({ page });
589
- const metrics = await cdp.send('Page.getLayoutMetrics');
741
+ const cdp = await getCDPSession({ page: state.page })
742
+ const metrics = await cdp.send('Page.getLayoutMetrics')
590
743
  ```
591
744
 
592
745
  **getLocatorStringForElement** - get stable Playwright selector from an element:
593
746
 
594
747
  ```js
595
- const selector = await getLocatorStringForElement(page.locator('[id="submit-btn"]'));
748
+ const selector = await getLocatorStringForElement(state.page.locator('[id="submit-btn"]'))
596
749
  // => "getByRole('button', { name: 'Save' })"
597
750
  ```
598
751
 
599
752
  **getReactSource** - get React component source location (dev mode only):
600
753
 
601
754
  ```js
602
- const source = await getReactSource({ locator: page.locator('[data-testid="submit-btn"]') });
755
+ const source = await getReactSource({ locator: state.page.locator('[data-testid="submit-btn"]') })
603
756
  // => { fileName, lineNumber, columnNumber, componentName }
604
757
  ```
605
758
 
606
759
  **getStylesForLocator** - inspect CSS styles applied to an element, like browser DevTools "Styles" panel. Useful for debugging styling issues, finding where a CSS property is defined (file:line), and checking inherited styles. Returns selector, source location, and declarations for each matching rule. ALWAYS fetch `https://playwriter.dev/resources/styles-api.md` first with curl or webfetch tool.
607
760
 
608
761
  ```js
609
- const styles = await getStylesForLocator({ locator: page.locator('.btn'), cdp: await getCDPSession({ page }) });
610
- console.log(formatStylesAsText(styles));
762
+ const styles = await getStylesForLocator({
763
+ locator: state.page.locator('.btn'),
764
+ cdp: await getCDPSession({ page: state.page }),
765
+ })
766
+ console.log(formatStylesAsText(styles))
611
767
  ```
612
768
 
613
769
  **createDebugger** - set breakpoints, step through code, inspect variables at runtime. Useful for debugging issues that only reproduce in browser, understanding code flow, and inspecting state at specific points. Can pause on exceptions, evaluate expressions in scope, and blackbox framework code. ALWAYS fetch `https://playwriter.dev/resources/debugger-api.md` first.
614
770
 
615
771
  ```js
616
- const cdp = await getCDPSession({ page }); const dbg = createDebugger({ cdp }); await dbg.enable();
617
- const scripts = await dbg.listScripts({ search: 'app' });
618
- await dbg.setBreakpoint({ file: scripts[0].url, line: 42 });
772
+ const cdp = await getCDPSession({ page: state.page })
773
+ const dbg = createDebugger({ cdp })
774
+ await dbg.enable()
775
+ const scripts = await dbg.listScripts({ search: 'app' })
776
+ await dbg.setBreakpoint({ file: scripts[0].url, line: 42 })
619
777
  // when paused: dbg.inspectLocalVariables(), dbg.stepOver(), dbg.resume()
620
778
  ```
621
779
 
622
780
  **createEditor** - view and live-edit page scripts and CSS at runtime. Edits are in-memory (persist until reload). Useful for testing quick fixes, searching page scripts with grep, and toggling debug flags. ALWAYS read `https://playwriter.dev/resources/editor-api.md` first.
623
781
 
624
782
  ```js
625
- const cdp = await getCDPSession({ page }); const editor = createEditor({ cdp }); await editor.enable();
626
- const matches = await editor.grep({ regex: /console\.log/ });
627
- await editor.edit({ url: matches[0].url, oldString: 'DEBUG = false', newString: 'DEBUG = true' });
783
+ const cdp = await getCDPSession({ page: state.page })
784
+ const editor = createEditor({ cdp })
785
+ await editor.enable()
786
+ const matches = await editor.grep({ regex: /console\.log/ })
787
+ await editor.edit({ url: matches[0].url, oldString: 'DEBUG = false', newString: 'DEBUG = true' })
628
788
  ```
629
789
 
630
790
  **screenshotWithAccessibilityLabels** - take a screenshot with Vimium-style visual labels overlaid on interactive elements. Shows labels, captures screenshot, then removes labels. The image and accessibility snapshot are automatically included in the response. Can be called multiple times to capture multiple screenshots. Use a timeout of **20 seconds** for complex pages.
631
791
 
632
- Prefer this for pages with grids, image galleries, maps, or complex visual layouts where spatial position matters. For simple text-heavy pages, `accessibilitySnapshot` with search is faster and uses fewer tokens.
792
+ Prefer this for pages with grids, image galleries, maps, or complex visual layouts where spatial position matters. For simple text-heavy pages, `snapshot` with search is faster and uses fewer tokens.
633
793
 
634
794
  ```js
635
- await screenshotWithAccessibilityLabels({ page });
795
+ await screenshotWithAccessibilityLabels({ page: state.page })
636
796
  // Image and accessibility snapshot are automatically included in response
637
797
  // Use refs from snapshot to interact with elements
638
- await page.locator('[id="submit-btn"]').click();
798
+ await state.page.locator('[id="submit-btn"]').click()
639
799
 
640
800
  // Can take multiple screenshots in one execution
641
- await screenshotWithAccessibilityLabels({ page });
642
- await page.click('button');
643
- await screenshotWithAccessibilityLabels({ page });
801
+ await screenshotWithAccessibilityLabels({ page: state.page })
802
+ await state.page.click('button')
803
+ await screenshotWithAccessibilityLabels({ page: state.page })
644
804
  // Both images are included in the response
645
805
  ```
646
806
 
647
807
  Labels are color-coded: yellow=links, orange=buttons, coral=inputs, pink=checkboxes, peach=sliders, salmon=menus, amber=tabs.
648
808
 
649
- **startRecording / stopRecording** - record the page as a video at native FPS (30-60fps). Uses `chrome.tabCapture` in the extension context, so **recording survives page navigation**. Video is saved as mp4.
809
+ **resizeImage** - shrink an image in-place so it consumes fewer tokens when read back into context. `await resizeImage({ input: './screenshot.png' })`. Also accepts `width`, `height`, `maxDimension`, `quality`, `output`.
810
+
811
+ **recording.start / recording.stop** - record the page as a video at native FPS (30-60fps). Uses `chrome.tabCapture` in the extension context, so **recording survives page navigation**. Video is saved as mp4.
812
+
813
+ While recording is active, Playwriter automatically overlays a smooth ghost cursor that follows automated mouse actions (`page.mouse.*`, `locator.click()`, hover flows) using `page.onMouseAction` from the Playwright fork.
814
+
815
+ For demos where cursor movement should be visible and human-like, drive the page with interaction methods (`locator.click()`, `page.click()`, `page.mouse.move()`, `press`, typing). Avoid skipping interactions with direct state jumps (for example, `goto(itemUrl)` instead of clicking the link) when your goal is to show realistic pointer motion in the recording.
650
816
 
651
817
  **Note**: Recording requires the user to have clicked the Playwriter extension icon on the tab. This grants `activeTab` permission needed for `chrome.tabCapture`. Recording works on tabs where the icon was clicked - if you need to record a new tab, ask the user to click the icon on it first.
652
818
 
653
819
  ```js
654
820
  // Start recording - outputPath must be specified upfront
655
- await startRecording({
656
- page,
821
+ await recording.start({
822
+ page: state.page,
657
823
  outputPath: './recording.mp4',
658
- frameRate: 30, // default: 30
659
- audio: false, // default: false (tab audio)
660
- videoBitsPerSecond: 2500000 // 2.5 Mbps
661
- });
824
+ frameRate: 30, // default: 30
825
+ audio: false, // default: false (tab audio)
826
+ videoBitsPerSecond: 2500000, // 2.5 Mbps
827
+ })
662
828
 
663
829
  // Navigate around - recording continues!
664
- await page.click('a');
665
- await page.waitForLoadState('domcontentloaded');
666
- await page.goBack();
830
+ await state.page.click('a')
831
+ await state.page.waitForLoadState('domcontentloaded')
832
+ await state.page.goBack()
667
833
 
668
834
  // Stop and get result
669
- const { path, duration, size } = await stopRecording({ page });
670
- console.log(`Saved ${size} bytes, duration: ${duration}ms`);
835
+ const { path, duration, size } = await recording.stop({ page: state.page })
836
+ console.log(`Saved ${size} bytes, duration: ${duration}ms`)
671
837
  ```
672
838
 
673
839
  Additional recording utilities:
840
+
674
841
  ```js
675
842
  // Check if recording is active
676
- const { isRecording, startedAt } = await isRecording({ page });
843
+ const { isRecording, startedAt } = await recording.isRecording({ page: state.page })
677
844
 
678
845
  // Cancel recording without saving
679
- await cancelRecording({ page });
846
+ await recording.cancel({ page: state.page })
680
847
  ```
681
848
 
849
+ **ghostCursor.show / ghostCursor.hide** - manually show or hide the in-page cursor overlay. Useful for screenshots and demos even when recording is not running.
850
+
851
+ ```js
852
+ // Show cursor in the center (or keep current position if already visible)
853
+ await ghostCursor.show({ page: state.page })
854
+
855
+ // Optional styles: 'minimal' (default triangular pointer), 'dot', 'screenstudio'
856
+ await ghostCursor.show({ page: state.page, style: 'minimal' })
857
+
858
+ // Hide cursor overlay
859
+ await ghostCursor.hide({ page: state.page })
860
+ ```
861
+
862
+ `startRecording`, `stopRecording`, `isRecording`, and `cancelRecording` remain available as backward-compatible aliases.
863
+
682
864
  **Key difference from getDisplayMedia**: This approach uses `chrome.tabCapture` which runs in the extension context, not the page. The recording persists across navigations because the extension holds the `MediaRecorder`, not the page's JavaScript context.
683
865
 
866
+ **createDemoVideo** - create a polished demo video from a recording by automatically speeding up idle sections (time between execute() calls) while keeping interactions at normal speed. Useful for creating demo videos of agent workflows without long pauses.
867
+
868
+ While recording is active, playwriter tracks when each `execute()` call starts and ends. `recording.stop()` returns these timestamps alongside the video file. `createDemoVideo` uses this data to identify idle gaps and speed them up with ffmpeg in a single pass.
869
+
870
+ A 1-second buffer is preserved around each interaction so viewers see context before and after each action.
871
+
872
+ Requires `ffmpeg` and `ffprobe` installed on the system.
873
+
874
+ **Timeout**: `createDemoVideo` runs ffmpeg on the full recording and can take 60–120+ seconds. Always pass `--timeout 120000` (or higher) to the playwriter execute call that contains it, otherwise it will silently time out before the file is written.
875
+
876
+ ```js
877
+ // Start recording
878
+ await recording.start({ page: state.page, outputPath: './recording.mp4' })
879
+ ```
880
+
881
+ ```js
882
+ // ... multiple execute() calls with browser interactions ...
883
+ // Each call's timing is tracked automatically while recording is active
884
+ ```
885
+
886
+ ```js
887
+ // Stop recording — executionTimestamps is included in the result
888
+ const recordingResult = await recording.stop({ page: state.page })
889
+
890
+ // Create demo video: idle gaps are sped up by the `speed` factor passed below
891
+ const demoPath = await createDemoVideo({
892
+ recordingPath: recordingResult.path,
893
+ durationMs: recordingResult.duration,
894
+ executionTimestamps: recordingResult.executionTimestamps,
895
+ speed: 5, // optional, default 5x for idle sections
896
+ // outputFile: './demo.mp4', // optional, defaults to recording-demo.mp4
897
+ })
898
+ console.log('Demo video:', demoPath)
899
+ ```
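
The idle-gap logic described above can be sketched as a pure function. This is an illustration only, not playwriter's actual implementation: the `{ start, end }` shape of each timestamp (milliseconds from the start of the recording) and the helper names `findIdleGaps` and `estimateDemoDurationMs` are assumptions.

```js
// Hypothetical helpers: NOT part of playwriter. Assumes each timestamp is
// { start, end } in milliseconds measured from the start of the recording.
function findIdleGaps({ durationMs, executionTimestamps, bufferMs = 1000 }) {
  // Pad each interaction with the buffer, clamp to the recording, sort by start.
  const active = executionTimestamps
    .map(({ start, end }) => ({
      start: Math.max(0, start - bufferMs),
      end: Math.min(durationMs, end + bufferMs),
    }))
    .sort((a, b) => a.start - b.start)

  const gaps = []
  let cursor = 0 // end of the normal-speed region covered so far
  for (const { start, end } of active) {
    if (start > cursor) gaps.push({ start: cursor, end: start })
    cursor = Math.max(cursor, end)
  }
  if (cursor < durationMs) gaps.push({ start: cursor, end: durationMs })
  return gaps
}

// Estimated output length when every idle gap plays `speed` times faster.
function estimateDemoDurationMs({ durationMs, gaps, speed }) {
  const idleMs = gaps.reduce((sum, gap) => sum + (gap.end - gap.start), 0)
  return durationMs - idleMs + idleMs / speed
}
```

Overlapping or back-to-back interactions merge into one normal-speed region, so only stretches with no nearby `execute()` activity are reported as gaps.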
900
+
684
901
  ## pinned elements
685
902
 
686
903
  Users can right-click → "Copy Playwriter Element Reference" to store elements in `globalThis.playwriterPinnedElem1` (increments for each pin). The reference is copied to clipboard:
687
904
 
688
905
  ```js
689
- const el = await page.evaluateHandle(() => globalThis.playwriterPinnedElem1);
690
- await el.click();
906
+ const el = await state.page.evaluateHandle(() => globalThis.playwriterPinnedElem1)
907
+ await el.click()
691
908
  ```
692
909
 
693
910
  ## taking screenshots
@@ -695,24 +912,28 @@ await el.click();
695
912
  Always use `scale: 'css'` to avoid 2-4x larger images on high-DPI displays:
696
913
 
697
914
  ```js
698
- await page.screenshot({ path: 'shot.png', scale: 'css' });
915
+ await state.page.screenshot({ path: 'shot.png', scale: 'css' })
699
916
  ```
700
917
 
701
- If you want to read back the image file into context make sure to resize it first, scaling down the image to make sure max size is 1500px. for example with `sips --resampleHeightWidthMax 1500 input.png --out output.png` on macOS.
918
+ If you want to read back the image file into context, resize it first so it consumes fewer tokens:
919
+
920
+ ```js
921
+ await resizeImage({ input: './shot.png' })
922
+ ```
702
923
 
703
924
  ## page.evaluate
704
925
 
705
926
  Code inside `page.evaluate()` runs in the browser, so use plain JavaScript only (no TypeScript syntax). Return values from the callback and log them outside it, because `console.log` inside `evaluate` prints to the browser console and is not visible in the output:
706
927
 
707
928
  ```js
708
- const title = await page.evaluate(() => document.title);
709
- console.log('Title:', title);
929
+ const title = await state.page.evaluate(() => document.title)
930
+ console.log('Title:', title)
710
931
 
711
- const info = await page.evaluate(() => ({
712
- url: location.href,
713
- buttons: document.querySelectorAll('button').length,
714
- }));
715
- console.log(info);
932
+ const info = await state.page.evaluate(() => ({
933
+ url: location.href,
934
+ buttons: document.querySelectorAll('button').length,
935
+ }))
936
+ console.log(info)
716
937
  ```
717
938
 
718
939
  ## loading files
@@ -720,7 +941,9 @@ console.log(info);
720
941
  Fill inputs with file content:
721
942
 
722
943
  ```js
723
- const fs = require('node:fs'); const content = fs.readFileSync('./data.txt', 'utf-8'); await page.locator('textarea').fill(content);
944
+ const fs = require('node:fs')
945
+ const content = fs.readFileSync('./data.txt', 'utf-8')
946
+ await state.page.locator('textarea').fill(content)
724
947
  ```
725
948
 
726
949
  ## network interception
@@ -728,34 +951,49 @@ const fs = require('node:fs'); const content = fs.readFileSync('./data.txt', 'ut
728
951
  For scraping or reverse-engineering APIs, intercept network requests instead of scrolling DOM. Store in `state` to analyze across calls:
729
952
 
730
953
  ```js
731
- state.requests = []; state.responses = [];
732
- page.on('request', req => { if (req.url().includes('/api/')) state.requests.push({ url: req.url(), method: req.method(), headers: req.headers() }); });
733
- page.on('response', async res => { if (res.url().includes('/api/')) { try { state.responses.push({ url: res.url(), status: res.status(), body: await res.json() }); } catch {} } });
954
+ state.requests = []
955
+ state.responses = []
956
+ state.page.on('request', (req) => {
957
+ if (req.url().includes('/api/')) state.requests.push({ url: req.url(), method: req.method(), headers: req.headers() })
958
+ })
959
+ state.page.on('response', async (res) => {
960
+ if (res.url().includes('/api/')) {
961
+ try {
962
+ state.responses.push({ url: res.url(), status: res.status(), body: await res.json() })
963
+ } catch {}
964
+ }
965
+ })
734
966
  ```
735
967
 
736
968
  Then trigger actions (scroll, click, navigate) and analyze captured data:
737
969
 
738
970
  ```js
739
- console.log('Captured', state.responses.length, 'API calls');
740
- state.responses.forEach(r => console.log(r.status, r.url.slice(0, 80)));
971
+ console.log('Captured', state.responses.length, 'API calls')
972
+ state.responses.forEach((r) => console.log(r.status, r.url.slice(0, 80)))
741
973
  ```
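
When many calls are captured, grouping them by endpoint gives a quicker overview than printing every URL. A minimal sketch, assuming the `{ url, status }` entries pushed into `state.responses` above; `summarizeByEndpoint` is a hypothetical helper, not a playwriter API.

```js
// Hypothetical helper: groups captured responses by origin + pathname
// (query strings dropped) and counts calls and status codes per endpoint.
function summarizeByEndpoint(responses) {
  const summary = new Map()
  for (const { url, status } of responses) {
    const { origin, pathname } = new URL(url)
    const key = origin + pathname
    const entry = summary.get(key) ?? { count: 0, statuses: {} }
    entry.count += 1
    entry.statuses[status] = (entry.statuses[status] ?? 0) + 1
    summary.set(key, entry)
  }
  // Most frequently called endpoints first
  return [...summary.entries()]
    .map(([endpoint, info]) => ({ endpoint, ...info }))
    .sort((a, b) => b.count - a.count)
}
```

Calling `console.log(summarizeByEndpoint(state.responses))` then shows which endpoints are worth inspecting in detail.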
742
974
 
743
975
  Inspect a specific response to understand schema:
744
976
 
745
977
  ```js
746
- const resp = state.responses.find(r => r.url.includes('users'));
747
- console.log(JSON.stringify(resp.body, null, 2).slice(0, 2000));
978
+ const resp = state.responses.find((r) => r.url.includes('users'))
979
+ console.log(JSON.stringify(resp.body, null, 2).slice(0, 2000))
748
980
  ```
749
981
 
750
982
  Replay API directly (useful for pagination):
751
983
 
752
984
  ```js
753
- const { url, headers } = state.requests.find(r => r.url.includes('feed'));
754
- const data = await page.evaluate(async ({ url, headers }) => { const res = await fetch(url, { headers }); return res.json(); }, { url, headers });
755
- console.log(data);
985
+ const { url, headers } = state.requests.find((r) => r.url.includes('feed'))
986
+ const data = await state.page.evaluate(
987
+ async ({ url, headers }) => {
988
+ const res = await fetch(url, { headers })
989
+ return res.json()
990
+ },
991
+ { url, headers },
992
+ )
993
+ console.log(data)
756
994
  ```
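
For pagination, the replay above is typically repeated with a changed query parameter. A sketch under assumptions: the parameter name `page` and the helpers `withQueryParam` and `fetchPages` are hypothetical, and the real API may paginate with cursors or offsets instead.

```js
// Hypothetical helper: returns a copy of `url` with one query parameter set.
function withQueryParam(url, name, value) {
  const next = new URL(url)
  next.searchParams.set(name, String(value))
  return next.toString()
}

// Replays a captured request for several pages inside the browser context.
// `page` is a Playwright Page (e.g. state.page); `request` is one entry
// from state.requests. The 'page' parameter name is an assumption.
async function fetchPages(page, request, pageNumbers) {
  const results = []
  for (const n of pageNumbers) {
    const url = withQueryParam(request.url, 'page', n)
    const data = await page.evaluate(
      async ({ url, headers }) => {
        const res = await fetch(url, { headers })
        return res.json()
      },
      { url, headers: request.headers },
    )
    results.push(data)
  }
  return results
}
```

Requests run sequentially here, which is slower than firing them in parallel but less likely to trip rate limits.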
757
995
 
758
- Clean up listeners when done: `page.removeAllListeners('request'); page.removeAllListeners('response');`
996
+ Clean up listeners when done: `state.page.removeAllListeners('request'); state.page.removeAllListeners('response');`
759
997
 
760
998
  ## debugging web apps
761
999
 
@@ -764,38 +1002,39 @@ When debugging why a web app isn't working (e.g., content not rendering, API err
764
1002
  **1. Console logs** — use `getLatestLogs` to check for errors:
765
1003
 
766
1004
  ```js
767
- const errors = await getLatestLogs({ page, search: /error|fail/i, count: 20 });
768
- const appLogs = await getLatestLogs({ page, search: /myComponent|state/i });
1005
+ const errors = await getLatestLogs({ page: state.page, search: /error|fail/i, count: 20 })
1006
+ const appLogs = await getLatestLogs({ page: state.page, search: /myComponent|state/i })
769
1007
  ```
770
1008
 
771
1009
  **2. DOM inspection via evaluate** — check content directly without screenshots:
772
1010
 
773
1011
  ```js
774
- const info = await page.evaluate(() => {
775
- const msgs = document.querySelectorAll('.message');
776
- return Array.from(msgs).map(m => ({
1012
+ const info = await state.page.evaluate(() => {
1013
+ const msgs = document.querySelectorAll('.message')
1014
+ return Array.from(msgs).map((m) => ({
777
1015
  text: m.textContent?.slice(0, 200),
778
1016
  visible: m.offsetHeight > 0,
779
- }));
780
- });
781
- console.log(JSON.stringify(info, null, 2));
1017
+ }))
1018
+ })
1019
+ console.log(JSON.stringify(info, null, 2))
782
1020
  ```
783
1021
 
784
1022
  **3. Combine snapshot + logs for full picture:**
785
1023
 
786
1024
  ```js
787
- await page.keyboard.press('Enter');
788
- await page.waitForTimeout(2000);
1025
+ await state.page.keyboard.press('Enter')
1026
+ await state.page.waitForTimeout(2000)
789
1027
 
790
- const snap = await accessibilitySnapshot({ page, search: /dialog|error|message/ });
791
- const logs = await getLatestLogs({ page, search: /error/i, count: 10 });
792
- console.log('UI:', snap);
793
- console.log('Logs:', logs);
1028
+ const snap = await snapshot({ page: state.page, search: /dialog|error|message/ })
1029
+ const logs = await getLatestLogs({ page: state.page, search: /error/i, count: 10 })
1030
+ console.log('UI:', snap)
1031
+ console.log('Logs:', logs)
794
1032
  ```
795
1033
 
796
1034
  ## capabilities
797
1035
 
798
1036
  Examples of what playwriter can do:
1037
+
799
1038
  - Monitor console logs while user reproduces a bug
800
1039
  - Intercept network requests to reverse-engineer APIs and build SDKs
801
1040
  - Scrape data by replaying paginated API calls instead of scrolling DOM
@@ -805,6 +1044,110 @@ Examples of what playwriter can do:
805
1044
  - Handle popups, downloads, iframes, and dialog boxes
806
1045
  - Record videos of browser sessions that survive page navigation
807
1046
 
1047
+ ## computer use
1048
+
1049
+ Playwriter provides the same browser control as Anthropic's `computer_20250124` tool and the Claude Chrome extension, using Playwright APIs instead of screenshot-based coordinate clicking. No computer use beta needed.
1050
+
1051
+ This section covers low-level mouse/keyboard APIs not documented elsewhere. For locator-based clicking, screenshots, navigation, forms, evaluate, snapshots, and network interception see their dedicated sections above.
1052
+
1053
+ ### clicking
1054
+
1055
+ ```js
1056
+ // Preferred: by locator (stable, auto-waits, no coordinates needed)
1057
+ await state.page.locator('button[name="Submit"]').click()
1058
+ await state.page.locator('text=Login').click({ button: 'right' })
1059
+ await state.page.locator('text=Login').dblclick()
1060
+ await state.page
1061
+ .locator('a')
1062
+ .first()
1063
+ .click({ modifiers: ['Meta'] }) // cmd+click opens new tab
1064
+
1065
+ // By coordinates (when locators aren't available, e.g. canvas, maps, custom widgets)
1066
+ await state.page.mouse.click(450, 320) // left click
1067
+ await state.page.mouse.click(450, 320, { button: 'right' }) // right click
1068
+ await state.page.mouse.dblclick(450, 320) // double click
1069
+ await state.page.mouse.click(450, 320, { clickCount: 3 }) // triple click
1070
+ await state.page.mouse.click(450, 320, { modifiers: ['Shift'] }) // shift+click
1071
+ ```
1072
+
1073
+ ### hover
1074
+
1075
+ ```js
1076
+ await state.page.locator('.tooltip-trigger').hover() // by locator (preferred)
1077
+ await state.page.mouse.move(450, 320) // by coordinates
1078
+ ```
1079
+
1080
+ ### scroll
1081
+
1082
+ ```js
1083
+ // By locator (preferred)
1084
+ await state.page.locator('#footer').scrollIntoViewIfNeeded()
1085
+
1086
+ // By pixel (for canvas, maps, infinite scroll)
1087
+ await state.page.mouse.wheel(0, 300) // scroll down 300px
1088
+ await state.page.mouse.wheel(0, -300) // scroll up
1089
+ await state.page.mouse.wheel(300, 0) // scroll right
1090
+ await state.page.mouse.wheel(-300, 0) // scroll left
1091
+
1092
+ // Scroll at a specific position
1093
+ await state.page.mouse.move(450, 320)
1094
+ await state.page.mouse.wheel(0, 500)
1095
+
1096
+ // Scroll inside a container
1097
+ await state.page.locator('.scrollable-list').evaluate((el) => {
1098
+ el.scrollTop += 500
1099
+ })
1100
+ ```
1101
+
1102
+ ### drag
1103
+
1104
+ ```js
1105
+ // By locator (preferred)
1106
+ await state.page.locator('#item').dragTo(state.page.locator('#target'))
1107
+
1108
+ // By coordinates (for canvas, sliders, custom drag targets)
1109
+ await state.page.mouse.move(100, 200)
1110
+ await state.page.mouse.down()
1111
+ await state.page.mouse.move(400, 500, { steps: 10 }) // steps for smooth drag
1112
+ await state.page.mouse.up()
1113
+ ```
1114
+
1115
+ **Freehand drawing, annotation widgets, and canvas tools** use this same `mouse.down → move → up` pattern. If a widget expects a drawn stroke (paint tools, annotation overlays, range sliders, timeline scrubbers), always use held-mouse motion — not `mouse.click()`:
1116
+
1117
+ ```js
1118
+ // Draw a stroke across a canvas or annotation layer
1119
+ await state.page.mouse.move(startX, startY)
1120
+ await state.page.mouse.down()
1121
+ await state.page.mouse.move(endX, endY, { steps: 15 }) // steps = smoother stroke
1122
+ await state.page.mouse.up()
1123
+ await state.page.waitForTimeout(500) // let the widget process the stroke
1124
+ ```
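
The `steps` option interpolates along a straight line, so a curved stroke needs explicit intermediate points. A minimal sketch, assuming a quadratic Bézier path; `curvePoints` and `drawCurve` are hypothetical helpers, and only `mouse.move`, `mouse.down`, and `mouse.up` are Playwright APIs.

```js
// Hypothetical helper: samples `segments + 1` points on a quadratic Bézier
// curve from `start` to `end`, bent toward `control`.
function curvePoints(start, control, end, segments = 20) {
  const points = []
  for (let i = 0; i <= segments; i++) {
    const t = i / segments
    const u = 1 - t
    points.push({
      x: u * u * start.x + 2 * u * t * control.x + t * t * end.x,
      y: u * u * start.y + 2 * u * t * control.y + t * t * end.y,
    })
  }
  return points
}

// Draws the curve with a held mouse button. `page` is a Playwright Page
// (e.g. state.page); coordinates are in CSS pixels relative to the viewport.
async function drawCurve(page, start, control, end) {
  const [first, ...rest] = curvePoints(start, control, end)
  await page.mouse.move(first.x, first.y)
  await page.mouse.down()
  for (const point of rest) await page.mouse.move(point.x, point.y)
  await page.mouse.up()
}
```

For example, `await drawCurve(state.page, { x: 100, y: 300 }, { x: 250, y: 100 }, { x: 400, y: 300 })` would draw an arch across the canvas.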
1125
+
1126
+ ### key hold / release / repeat
1127
+
1128
+ ```js
1129
+ // Hold modifier while pressing another key
1130
+ await state.page.keyboard.down('Shift')
1131
+ await state.page.keyboard.press('ArrowDown')
1132
+ await state.page.keyboard.up('Shift')
1133
+
1134
+ // Repeat a key
1135
+ for (let i = 0; i < 5; i++) await state.page.keyboard.press('ArrowDown')
1136
+ ```
1137
+
1138
+ ### resize viewport
1139
+
1140
+ ```js
1141
+ await state.page.setViewportSize({ width: 1280, height: 720 })
1142
+ ```
1143
+
1144
+ ### region screenshot (zoom equivalent)
1145
+
1146
+ ```js
1147
+ await state.page.screenshot({ path: 'region.png', scale: 'css', clip: { x: 100, y: 200, width: 400, height: 300 } })
1148
+ ```
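
To zoom in on a specific element rather than fixed coordinates, the clip can be derived from the element's bounding box. A sketch, assuming Playwright's `locator.boundingBox()` (which returns `{ x, y, width, height }` or `null`) and `page.viewportSize()` (which may be `null` when no viewport is set); `paddedClip` and `screenshotAround` are hypothetical helpers.

```js
// Hypothetical helper: expands a bounding box by `padding` pixels on each
// side and clamps the result to the viewport.
function paddedClip(box, viewport, padding = 20) {
  const x = Math.max(0, box.x - padding)
  const y = Math.max(0, box.y - padding)
  const right = Math.min(viewport.width, box.x + box.width + padding)
  const bottom = Math.min(viewport.height, box.y + box.height + padding)
  return { x, y, width: right - x, height: bottom - y }
}

// Screenshots the region around an element. `page` is a Playwright Page
// (e.g. state.page) and `locator` is a Locator on that page.
async function screenshotAround(page, locator, path) {
  await locator.scrollIntoViewIfNeeded() // bounding box is viewport-relative
  const box = await locator.boundingBox()
  const viewport = page.viewportSize()
  if (!box || !viewport) throw new Error('element not visible or viewport size unknown')
  return page.screenshot({ path, scale: 'css', clip: paddedClip(box, viewport) })
}
```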
1149
+
1150
+ Prefer locator-based actions over coordinates — locators are stable across scroll/resize, auto-wait for elements, and don't require screenshot round-trips that burn ~800 image tokens per cycle.
808
1151
 
809
1152
  ## Ghost Browser integration
810
1153
 
@@ -812,19 +1155,15 @@ Playwriter supports [Ghost Browser](https://ghostbrowser.com/) for multi-identit
812
1155
 
813
1156
  ```js
814
1157
  // List identities and open tabs in different ones
815
- const identities = await chrome.projects.getIdentitiesList();
816
- await chrome.ghostPublicAPI.openTab({ url: 'https://reddit.com', identity: identities[0].id });
1158
+ const identities = await chrome.projects.getIdentitiesList()
1159
+ await chrome.ghostPublicAPI.openTab({ url: 'https://reddit.com', identity: identities[0].id })
817
1160
 
818
1161
  // Assign proxies per tab or identity
819
- const proxies = await chrome.ghostProxies.getList();
820
- await chrome.ghostProxies.setTabProxy(tabId, proxies[0].id);
1162
+ const proxies = await chrome.ghostProxies.getList()
1163
+ await chrome.ghostProxies.setTabProxy(tabId, proxies[0].id)
821
1164
  ```
822
1165
 
823
1166
  For complete API reference with all methods, types, and examples, read:
824
1167
  `extension/src/ghost-browser-api.d.ts`
825
1168
 
826
1169
  Note: Only works in Ghost Browser. In regular Chrome, calls fail with "not available".
827
-
828
- ## debugging playwriter issues
829
-
830
- if some internal critical error happens you can read your own relay ws logs to understand the issue, it will show logs from extension, mcp and ws server together. then you can create a gh issue using `gh issue create -R remorses/playwriter --title title --body body`. ask for user confirmation before doing this.