cdp-skill 1.0.7 → 1.0.14

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (47)
  1. package/README.md +80 -35
  2. package/SKILL.md +198 -1344
  3. package/install.js +1 -0
  4. package/package.json +1 -1
  5. package/src/aria/index.js +8 -0
  6. package/src/aria/output-processor.js +173 -0
  7. package/src/aria/role-query.js +1229 -0
  8. package/src/aria/snapshot.js +459 -0
  9. package/src/aria.js +237 -43
  10. package/src/cdp/browser.js +22 -4
  11. package/src/cdp-skill.js +268 -68
  12. package/src/dom/click-executor.js +240 -76
  13. package/src/dom/element-locator.js +34 -25
  14. package/src/dom/fill-executor.js +55 -27
  15. package/src/page/dialog-handler.js +119 -0
  16. package/src/page/page-controller.js +190 -3
  17. package/src/runner/context-helpers.js +33 -55
  18. package/src/runner/execute-dynamic.js +34 -143
  19. package/src/runner/execute-form.js +11 -11
  20. package/src/runner/execute-input.js +2 -2
  21. package/src/runner/execute-interaction.js +99 -120
  22. package/src/runner/execute-navigation.js +11 -26
  23. package/src/runner/execute-query.js +8 -5
  24. package/src/runner/step-executors.js +256 -95
  25. package/src/runner/step-registry.js +1064 -0
  26. package/src/runner/step-validator.js +16 -740
  27. package/src/tests/Aria.test.js +1025 -0
  28. package/src/tests/ContextHelpers.test.js +39 -28
  29. package/src/tests/ExecuteBrowser.test.js +572 -0
  30. package/src/tests/ExecuteDynamic.test.js +34 -736
  31. package/src/tests/ExecuteForm.test.js +700 -0
  32. package/src/tests/ExecuteInput.test.js +540 -0
  33. package/src/tests/ExecuteInteraction.test.js +319 -0
  34. package/src/tests/ExecuteQuery.test.js +820 -0
  35. package/src/tests/FillExecutor.test.js +2 -2
  36. package/src/tests/StepValidator.test.js +222 -76
  37. package/src/tests/TestRunner.test.js +36 -25
  38. package/src/tests/integration.test.js +2 -1
  39. package/src/types.js +9 -9
  40. package/src/utils/backoff.js +118 -0
  41. package/src/utils/cdp-helpers.js +130 -0
  42. package/src/utils/devices.js +140 -0
  43. package/src/utils/errors.js +242 -0
  44. package/src/utils/index.js +65 -0
  45. package/src/utils/temp.js +75 -0
  46. package/src/utils/validators.js +433 -0
  47. package/src/utils.js +14 -1142
package/SKILL.md CHANGED
@@ -7,1418 +7,272 @@ compatibility: Requires Chrome/Chromium (auto-launched if not running) and Node.
7
7
 
8
8
  # CDP Browser Automation Skill
9
9
 
10
- Automate Chrome browser interactions via JSON passed to a Node.js CLI. Produce JSON step definitions, not JavaScript code.
10
+ Automate Chrome browser interactions via JSON step definitions passed to a Node.js CLI.
11
11
 
12
- ## Purpose
12
+ > **See EXAMPLES.md** for full JSON examples, response shapes, and worked patterns for every step type.
13
13
 
14
- This skill enables **AI-powered browser automation**. The intended workflow:
14
+ ## Site Profiles
15
15
 
16
- 1. **Test definitions** are written as markdown files describing what to test
17
- 2. **An agent** reads the definition, discovers page elements dynamically, and executes using this skill
18
- 3. The agent interprets intent and adapts to page changes - making automation resilient without brittle hardcoded selectors
16
+ Site profiles are per-domain cheatsheets stored at `~/.cdp-skill/sites/{domain}.md`. They record what the agent learned about a site: framework, quirks, stable selectors, and recipes for common flows.
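As a rough illustration, the profile location for a page could be derived as in the sketch below. The helper name `profilePathFor` is hypothetical, and treating `{domain}` as the URL's hostname exactly as written is an assumption rather than something confirmed by the package source.

```javascript
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: map a page URL to the profile path described above.
// Assumption: {domain} is the URL's hostname as written (no "www." stripping).
function profilePathFor(pageUrl, homeDir = os.homedir()) {
  const domain = new URL(pageUrl).hostname;
  return path.join(homeDir, ".cdp-skill", "sites", `${domain}.md`);
}
```

Passing `homeDir` explicitly keeps the helper easy to exercise without touching the real home directory.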
19
17
 
20
- ## Quick Start
18
+ > **IMPORTANT — `actionRequired` responses are MANDATORY.** When any response contains `actionRequired`, you **MUST** complete it before doing anything else. Do not skip it. Do not proceed to your main task first. Handle `actionRequired` immediately.
21
19
 
22
- **Step 1: Check Chrome status (auto-launches if needed)**
23
- ```bash
24
- node src/cdp-skill.js '{"steps":[{"chromeStatus":true}]}'
25
- ```
20
+ ### How navigation uses profiles
26
21
 
27
- Returns:
28
- ```json
29
- {
30
- "status": "ok",
31
- "chrome": {
32
- "running": true,
33
- "launched": true,
34
- "version": "Chrome/120.0.6099.109",
35
- "port": 9222,
36
- "tabs": [{"targetId": "ABC123", "url": "about:blank", "title": ""}]
37
- }
38
- }
39
- ```
40
-
41
- The skill auto-detects Chrome location on macOS, Linux, and Windows. Set `CHROME_PATH` environment variable for custom installations.
22
+ Every `goto`, `newTab` (with URL), and `switchTab` checks for a profile. The result appears as a **top-level field** in the response:
42
23
 
43
- **Step 2: Open a tab and execute steps**
44
- ```bash
45
- # Use openTab to create a new tab - REQUIRED for first call without targetId
46
- node src/cdp-skill.js '{"steps":[{"openTab":"https://google.com"}]}'
24
+ - **`siteProfile`** present → read it before doing anything else. It contains strategies, quirks, and recipes. Apply its `settledWhen`/`readyWhen` hooks, use its selectors, and respect its quirks.
25
+ - **`actionRequired`** present → **STOP. Create the profile NOW** before continuing your task:
26
+ 1. `snapshot`: map page structure and landmarks
27
+ 2. `pageFunction` — detect framework (e.g. `() => { return { react: !!window.__REACT, next: !!window.__NEXT_DATA__, vue: !!window.__VUE__ } }`)
28
+ 3. `writeSiteProfile` — save domain and markdown content
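The branching above can be sketched as a small decision function. This is only an illustration: `planAfterNavigation` and the `kind` labels are hypothetical names, and the response fields used (`siteProfile`, `actionRequired.domain`) are the ones listed in this document, not verified against the implementation.

```javascript
// Hypothetical helper: decide what to do with a goto/newTab/switchTab response.
// actionRequired is checked first because the text treats it as mandatory.
function planAfterNavigation(response) {
  if (response.actionRequired) {
    return {
      kind: "create-profile",
      domain: response.actionRequired.domain,
      // Order follows the numbered list above; step payloads are left to the caller.
      steps: ["snapshot", "pageFunction", "writeSiteProfile"],
    };
  }
  if (typeof response.siteProfile === "string") {
    return { kind: "read-profile", content: response.siteProfile };
  }
  return { kind: "continue" };
}
```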
47
29
 
48
- # Or separate the open and navigate steps:
49
- node src/cdp-skill.js '{"steps":[{"openTab":true},{"goto":"https://google.com"}]}'
50
- ```
30
+ The profile only needs to capture what's useful: environment, quirks, stable selectors, strategies for fill/click/wait, and recipes for common flows. A minimal profile is fine — even just the environment and one quirk is valuable.
51
31
 
52
- Stdin pipe also works:
53
- ```bash
54
- echo '{"steps":[{"openTab":"https://google.com"}]}' | node src/cdp-skill.js
55
- ```
32
+ ### Updating profiles after your task
56
33
 
57
- ### Tab Management (Critical)
34
+ After completing your goal, update the site profile with anything you learned. Discovered a quirk? Found a reliable selector? Worked out a multi-step flow? Call `writeSiteProfile` again with the improved content before closing your tab. If you didn't learn anything new, skip the update.
58
35
 
59
- **To create a new tab:** Use `{"openTab": "URL"}` or `{"openTab": true}` as your first step. This is REQUIRED when no tab id is provided.
36
+ ### Profile format
60
37
 
61
- **Tab IDs:** Each tab gets a short alias like `t1`, `t2`, etc. Use this in subsequent calls:
38
+ ```
39
+ # domain.com
40
+ Updated: YYYY-MM-DD | Fingerprint: <tech-stack>
62
41
 
63
- ```bash
64
- # First call creates tab "t1"
65
- node src/cdp-skill.js '{"steps":[{"openTab":"https://google.com"}]}'
66
- # Response: {"tab": {"id": "t1", ...}, "steps": [{"output": {"tab": "t1", ...}}]}
42
+ ## Environment / Quirks / Strategies / Regions / Recipes
43
+ ```
67
44
 
68
- # Use tab id in subsequent calls
69
- node src/cdp-skill.js '{"config":{"tab":"t1"},"steps":[{"click":"#btn"}]}'
70
- node src/cdp-skill.js '{"config":{"tab":"t1"},"steps":[{"snapshot":true}]}'
45
+ - **Environment**: tech stack, SPA behavior, main element selectors
46
+ - **Quirks**: pitfalls that cause failures without foreknowledge
47
+ - **Strategies**: how to fill, click, or wait on this specific site (include `settledWhen`/`readyWhen` hooks)
48
+ - **Regions**: stable landmark selectors
49
+ - **Recipes**: pre-built step sequences for common flows
71
50
 
72
- # Close by id when done
73
- node src/cdp-skill.js '{"steps":[{"closeTab":"t1"}]}'
74
- ```
51
+ Sections are optional; include what's useful. See EXAMPLES.md for a full profile template.
75
52
 
76
- **Important:**
77
- - Calls **without** `tab` or `openTab` will **fail** with a helpful error message
78
- - Tab IDs persist across CLI invocations (stored in temp file)
79
- - Use `openTab` to explicitly create new tabs - prevents accidental tab accumulation
53
+ ### readSiteProfile
80
54
 
55
+ `"domain"` | `{domain}` — returns `{found, domain, content}` or `{found: false, domain}`
81
56
 
82
- ## Input Schema
57
+ ### writeSiteProfile
83
58
 
59
+ `{domain, content}` — returns `{written, path, domain}`
84
60
  ```json
85
- {
86
- "config": {
87
- "host": "localhost",
88
- "port": 9222,
89
- "tab": "t1",
90
- "timeout": 10000,
91
- "headless": false
92
- },
93
- "steps": [...]
94
- }
61
+ {"writeSiteProfile": {"domain": "example.com", "content": "# example.com\nUpdated: 2025-01-15\n\n## Environment\n- React SPA\n..."}}
95
62
  ```
96
63
 
97
- Config options:
98
- - `host`, `port` - CDP connection (default: localhost:9222)
99
- - `tab` - Tab ID to use (required on subsequent calls)
100
- - `timeout` - Command timeout in ms (default: 30000)
101
- - `headless` - Run Chrome in headless mode (default: false). Prevents Chrome from stealing focus. Chrome auto-launches if not running.
64
+ ## Quick Start
102
65
 
103
- ## Output Schema
66
+ ```bash
67
+ echo '{"steps":[{"openTab":"https://google.com"}]}' | node src/cdp-skill.js
68
+ echo '{"tab":"t1","steps":[{"click":"#btn"}]}' | node src/cdp-skill.js
69
+ echo '{"tab":"t1","steps":[{"snapshot":true}]}' | node src/cdp-skill.js
70
+ ```
104
71
 
105
- **Streamlined response format** - minimal payload with only actionable information:
72
+ Tab IDs (t1, t2, ...) persist across CLI invocations. Chrome auto-launches if not running.
106
73
 
107
- ```json
108
- {
109
- "status": "ok",
110
- "tab": "t1",
111
- "navigated": true,
112
- "context": {
113
- "url": "https://example.com/page",
114
- "title": "Page Title",
115
- "scroll": {"y": 0, "percent": 0},
116
- "viewport": {"width": 1189, "height": 739}
117
- },
118
- "screenshot": "/tmp/cdp-skill/t1.after.png",
119
- "fullSnapshot": "/tmp/cdp-skill/t1.after.yaml",
120
- "viewportSnapshot": "- heading \"Title\" [level=1]\n- button \"Submit\" [ref=s1e1]\n...",
121
- "steps": [{"action": "goto", "status": "ok"}]
122
- }
123
- ```
74
+ ## Reliability (v1.0.10-1.0.11)
124
75
 
125
- **Key fields:**
126
- - `tab` - Short tab ID (e.g., "t1") for subsequent calls
127
- - `context.scroll.y/percent` - Current scroll position (horizontal scroll omitted)
128
- - `context.activeElement` - Detailed info about focused element (only when present)
129
- - `context.modal` - Only present when a dialog is open
130
- - `screenshot` - Path to current page screenshot
131
- - `steps[]` - Minimal: `{action, status}` for success, adds `{params, error}` on failure
132
- - `errors` - Only present when steps failed
76
+ Recent improvements to stability and correctness:
133
77
 
134
- **Console messages** - Errors and warnings captured at command-level (not per-step):
135
- ```json
136
- {
137
- "console": {
138
- "errors": 1,
139
- "warnings": 2,
140
- "messages": [{"level": "error", "text": "TypeError: x is undefined", "source": "app.js:142"}]
141
- }
142
- }
143
- ```
78
+ - **Validation robustness**: Fixed null pointer crashes in step validation for edge cases with missing or malformed parameters
79
+ - **Race condition fixes** — Resolved timing issues in browser connection initialization, file lock contention, and scroll-wait coordination
80
+ - **Resource cleanup** — Fixed HTTP connection leaks, event listener cleanup, and stderr stream handling
81
+ - **Frame context** — Corrected iframe element location and interaction to respect frame boundaries
82
+ - **Step simplification** — Consolidated 47 steps into 41 unified operations (fill, frame, elementsAt, pageFunction, sleep) for clearer API
144
83
 
145
- Exit code: `0` = ok, `1` = error.
84
+ The skill now passes 1261/1263 unit tests (99.8%) and maintains SHS 99/100 on the cdp-bench evaluation suite.
146
85
 
147
- Error types: `PARSE`, `VALIDATION`, `CONNECTION`, `EXECUTION`
86
+ ## Input / Output Schema
148
87
 
149
- **Failure context** - When a step fails, the step result includes enhanced diagnostics to aid debugging:
150
- ```json
151
- {
152
- "steps": [{
153
- "action": "click",
154
- "status": "error",
155
- "params": {"text": "Nonexistent"},
156
- "error": "Element not found",
157
- "context": {
158
- "url": "https://example.com/page",
159
- "title": "Example Page",
160
- "scrollPosition": {"x": 0, "y": 1200, "maxY": 5000, "percentY": 24},
161
- "visibleButtons": [
162
- {"text": "Submit", "selector": "#submit-btn", "ref": "s1e4"},
163
- {"text": "Cancel", "selector": "button.cancel", "ref": "s1e5"}
164
- ],
165
- "visibleLinks": [{"text": "Home", "href": "..."}],
166
- "visibleErrors": ["Please fill in all required fields"],
167
- "nearMatches": [
168
- {"text": "Submit Form", "selector": "button.submit-form", "ref": "s1e12", "score": 70},
169
- {"text": "Submit Feedback", "selector": "#feedback-submit", "ref": "s1e15", "score": 50}
170
- ]
171
- }
172
- }],
173
- "errors": [{"step": 1, "action": "click", "error": "Element not found"}]
174
- }
175
- ```
88
+ **Input fields:**
89
+ - `tab`: tab alias (e.g. "t1") — required after first call
90
+ - `timeout`: step timeout in ms (default 30000)
91
+ - `steps`: array of step objects (one action per step)
176
92
 
177
- ### Auto-Snapshot with Diff
93
+ **Output fields:**
94
+ - `status`: "ok" or "error"
95
+ - `tab`: short tab ID (e.g. "t1")
96
+ - `siteProfile`: full markdown content of existing profile (after goto/openTab to known site)
97
+ - `actionRequired`: `{action, domain, message}` — **MUST be handled immediately** before continuing (see Site Profiles)
98
+ - `context`: `{url, title, scroll: {y, percent}, viewport: {width, height}, activeElement?, modal?}`
99
+ - `screenshot`: path to after-screenshot (auto-captured on every visual action)
100
+ - `fullSnapshot`: path to full-page accessibility snapshot file
101
+ - `viewportSnapshot`: inline viewport-only snapshot YAML
102
+ - `changes`: `{summary, added[], removed[], changed[]}` — viewport diff on same-page interactions
103
+ - `navigated`: true when URL pathname changed
104
+ - `console`: `{errors, warnings, messages[]}` — captured errors/warnings
105
+ - `steps[]`: `{action, status}` on success; adds `{params, error, context}` on failure
106
+ - `errors[]`: only present when steps failed
178
107
 
179
- Commands automatically capture page context and accessibility snapshot at the end of execution. This helps agents understand what changed:
108
+ **Failure diagnostics**: failed steps include `context` with `visibleButtons`, `visibleLinks`, `visibleErrors`, `nearMatches` (fuzzy matches with scores), and scroll position.
180
109
 
181
- **Navigation (URL changed):**
182
- ```json
183
- {
184
- "status": "ok",
185
- "tab": "t1",
186
- "navigated": true,
187
- "context": {
188
- "url": "https://example.com/new-page",
189
- "title": "New Page Title",
190
- "scroll": {"y": 0, "percent": 0},
191
- "viewport": {"width": 1189, "height": 739}
192
- },
193
- "screenshot": "/tmp/cdp-skill/t1.after.png",
194
- "fullSnapshot": "/tmp/cdp-skill/t1.after.yaml",
195
- "viewportSnapshot": "- heading \"New Page\" [level=1]\n- button \"Submit\" [ref=s1e1]\n...",
196
- "steps": [{"action": "click", "status": "ok"}]
197
- }
198
- ```
110
+ **Error types**: PARSE, VALIDATION, CONNECTION, EXECUTION. Exit code: 0 = ok, 1 = error.
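A caller might reduce a parsed response to the documented exit code and a list of failed steps, roughly as follows. The function name `summarize` and the shape of its return value are hypothetical. The sketch assumes failed steps carry `status: "error"`, as in the failure example from the previous version of this file.

```javascript
// Hypothetical helper: reduce a response object to the documented exit code
// (0 = ok, 1 = error) and a flat list of failed steps.
function summarize(response) {
  const failed = (response.steps || []).filter((s) => s.status === "error");
  return {
    exitCode: response.status === "ok" ? 0 : 1,
    // errors[] is only present when steps failed, so default to an empty list.
    errorCount: (response.errors || []).length,
    failed: failed.map((s) => ({ action: s.action, error: s.error })),
  };
}
```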
199
111
 
200
- **Same-page interaction (scroll, expand, toggle):**
201
- ```json
202
- {
203
- "status": "ok",
204
- "tab": "t1",
205
- "navigated": false,
206
- "context": {
207
- "url": "https://example.com/page",
208
- "scroll": {"y": 2400, "percent": 65},
209
- "activeElement": {"tag": "INPUT", "type": "text", "selector": "#search", "value": "", "editable": true, "box": {"x": 100, "y": 50, "width": 200, "height": 32}}
210
- },
211
- "screenshot": "/tmp/cdp-skill/t1.after.png",
212
- "fullSnapshot": "/tmp/cdp-skill/t1.after.yaml",
213
- "changes": {
214
- "summary": "Clicked. 3 added (s1e120, s1e121, s1e122), 1 removed (s1e1).",
215
- "added": ["- link \"New Link\" [ref=s1e120]"],
216
- "removed": ["- link \"Old Link\" [ref=s1e1]"],
217
- "changed": [{"ref": "s1e5", "field": "expanded", "from": false, "to": true}]
218
- },
219
- "steps": [{"action": "click", "status": "ok"}]
220
- }
221
- ```
112
+ ## Element References
222
113
 
223
- - `navigated: true` = URL pathname changed
224
- - `navigated: false` = Same page, viewport diff shows what changed
225
- - `viewportSnapshot` = Inline viewport-only snapshot (always included)
226
- - `fullSnapshot` = Path to full page snapshot file (for detailed inspection)
227
- - `changes.summary` = Human-readable one-liner with action context
228
- - `changes.added` = Elements now visible in viewport that weren't before
229
- - `changes.removed` = Elements that scrolled out of viewport
230
- - `changes.changed` = Elements whose state changed (e.g., `[checked]`, `[expanded]`)
231
- - `context.scroll.percent` = Current scroll position as percentage (0-100)
232
- - `context.activeElement` = Focused element details (only when present):
233
- ```json
234
- {
235
- "tag": "INPUT",
236
- "type": "text",
237
- "selector": "#search-input",
238
- "value": "query",
239
- "placeholder": "Search...",
240
- "editable": true,
241
- "box": {"x": 100, "y": 50, "width": 200, "height": 32}
242
- }
243
- ```
244
- - `context.modal` = Open dialog/modal title (only when present)
114
+ Snapshots return versioned refs like `[ref=s1e4]`. The format is `s{snapshotId}e{elementNumber}`.
115
+ Use refs with `click`, `fill`, `hover`. Each snapshot increments the ID. Refs from earlier snapshots remain valid while the element is in DOM.
245
116
 
117
+ **Auto re-resolution**: when a ref's element leaves the DOM (React re-render, lazy-load), the system tries to re-find it by stored selector + role + name. Response includes `reResolved: true` on success.
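For illustration, the `s{snapshotId}e{elementNumber}` format can be split with a single regular expression. `parseRef` is a hypothetical helper written for this note, not a function exported by the package.

```javascript
// Hypothetical helper: parse "s1e4" into its snapshot and element numbers.
// Returns null when the string does not match the documented ref format.
function parseRef(ref) {
  const match = /^s(\d+)e(\d+)$/.exec(ref);
  if (!match) return null;
  return { snapshotId: Number(match[1]), elementNumber: Number(match[2]) };
}
```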
246
118
 
247
119
  ## Auto-Waiting
248
120
 
249
- All interaction actions (`click`, `fill`, `hover`, `type`) automatically wait for elements to be actionable before proceeding. Retries use exponential backoff with jitter (1.9-2.1x random factor) to avoid thundering herd issues.
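The retry schedule described above could look roughly like the sketch below. It assumes the "1.9-2.1x random factor" means each delay is the previous delay multiplied by a random value in that range. The function name `backoffDelays` and the `initialMs` and `maxMs` defaults are illustrative assumptions, not values taken from the package.

```javascript
// Hypothetical sketch: exponential backoff where each delay is the previous one
// multiplied by a random factor in [1.9, 2.1). initialMs and maxMs are assumed values.
function backoffDelays(retries, { initialMs = 100, maxMs = 5000, random = Math.random } = {}) {
  const delays = [];
  let delay = initialMs;
  for (let i = 0; i < retries; i++) {
    delays.push(Math.min(Math.round(delay), maxMs)); // cap each delay at maxMs
    delay *= 1.9 + random() * 0.2; // jittered growth factor
  }
  return delays;
}
```

Injecting `random` makes the schedule reproducible in tests. With the default `Math.random`, concurrent callers drift apart instead of retrying in lockstep.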
250
-
251
121
  | Action | Waits For |
252
122
  |--------|-----------|
253
123
  | `click` | visible, enabled, stable, not covered, pointer-events |
254
- | `fill`, `type` | visible, enabled, editable |
124
+ | `fill` | visible, enabled, editable |
255
125
  | `hover` | visible, stable |
256
126
 
257
- **State definitions:**
258
- - **visible**: In DOM, not `display:none`, not `visibility:hidden`, has dimensions
259
- - **enabled**: Not disabled, not `aria-disabled="true"`
260
- - **editable**: Enabled + not readonly + is input/textarea/select/contenteditable
261
- - **stable**: Position unchanged for 3 consecutive animation frames
262
- - **not covered**: Element at click coordinates matches target (detects overlays/modals)
263
- - **pointer-events**: CSS `pointer-events` is not `none`
264
-
265
- **Force options:**
266
- - Use `force: true` to bypass all checks immediately
267
- - **Auto-force**: When actionability times out but element exists, automatically retries with `force: true`. This helps with overlays, cookie banners, and loading spinners that may obscure elements. Outputs include `autoForced: true` when this occurs.
268
-
269
- **Performance optimizations:**
270
- - Browser-side polling using MutationObserver (reduces network round-trips)
271
- - Content quads for accurate click positioning with CSS transforms
272
- - InsertText API for fast form fills (like paste)
273
- - IntersectionObserver for efficient viewport detection
274
-
275
-
276
- ## Element References
277
-
278
- The `snapshot` step returns an accessibility tree with versioned refs like `[ref=s1e4]`. The format is `s{snapshotId}e{elementNumber}`:
279
-
280
- ```json
281
- {"steps":[{"snapshot": true}]}
282
- // Response includes:
283
- // - snapshotId: "s1"
284
- // - yaml: "- button \"Submit\" [ref=s1e4]"
285
-
286
- {"config":{"tab":"t1"},"steps":[{"click":"s1e4"}]}
287
- ```
288
-
289
- Refs work with: `click`, `fill`, `hover`.
290
-
291
- **Versioned Refs**: Each snapshot increments the snapshot ID. This allows refs to remain valid even across page changes:
292
- - `s1e4` = element 4 from snapshot 1
293
- - `s2e10` = element 10 from snapshot 2
294
-
295
- Refs from earlier snapshots remain valid as long as the element is still in the DOM.
296
-
297
- **Ref Resilience (Auto Re-Resolution)**: When a ref's original element is no longer in the DOM (e.g., after a React re-render or lazy-load replacement), the system automatically attempts to re-find the element using stored metadata:
298
-
299
- 1. Looks up the CSS selector, ARIA role, and accessible name saved when the ref was created
300
- 2. Queries the DOM for an element matching that selector
301
- 3. Verifies the candidate has the same role and name (prevents wrong-element matches)
302
- 4. If a match is found, the ref is transparently updated and the action proceeds
303
-
304
- This means refs survive common SPA scenarios like component re-renders, virtual DOM reconciliation, and lazy-loaded content replacements. The response includes `reResolved: true` when a ref was re-found via fallback. If re-resolution fails, the ref is reported as stale with a suggestion to re-snapshot.
305
-
306
-
307
- ## Step Reference
308
-
309
- ### Chrome Management
310
-
311
- > **IMPORTANT**: Never launch Chrome manually via shell commands (`open`, `start`, `google-chrome`, etc.). Always use `chromeStatus` to manage Chrome. The skill handles launching Chrome with the correct CDP debugging flags, detecting existing instances, and managing tabs. Manual Chrome launches will not have CDP enabled and will cause connection errors.
312
-
313
- **chromeStatus** - Check if Chrome is running, auto-launch if not
314
- ```json
315
- {"chromeStatus": true}
316
- {"chromeStatus": {"autoLaunch": false}}
317
- {"chromeStatus": {"headless": true}}
318
- ```
319
- Options: `autoLaunch` (default: true), `headless` (default: false)
320
-
321
- Returns:
322
- ```json
323
- {
324
- "running": true,
325
- "launched": false,
326
- "version": "Chrome/120.0.6099.109",
327
- "port": 9222,
328
- "tabs": [
329
- {"targetId": "ABC123...", "url": "https://google.com", "title": "Google"}
330
- ]
331
- }
332
- ```
333
-
334
- If Chrome cannot be found: `{running: false, launched: false, error: "Chrome not found..."}`
335
-
336
- **Note:** This step is lightweight - it doesn't create a session. Use it as your first call to ensure Chrome is ready, then use `openTab` to create a new tab.
337
-
338
- ### Navigation
339
-
340
- **goto** - Navigate to URL
341
- ```json
342
- {"goto": "https://google.com"}
343
- {"goto": {"url": "https://google.com", "waitUntil": "networkidle"}}
344
- ```
345
- Options (object format): `url`, `waitUntil` (commit|domcontentloaded|load|networkidle)
346
-
347
- **reload** - Reload current page
348
- ```json
349
- {"reload": true}
350
- {"reload": {"waitUntil": "networkidle"}}
351
- ```
352
- Options: `waitUntil` (commit|domcontentloaded|load|networkidle)
353
-
354
- **back** / **forward** - History navigation
355
- ```json
356
- {"back": true}
357
- {"forward": true}
358
- ```
359
- Returns: `{url, title}` or `{noHistory: true}` if no history entry exists.
360
-
361
- **waitForNavigation** - Wait for navigation to complete
362
- ```json
363
- {"waitForNavigation": true}
364
- {"waitForNavigation": {"timeout": 5000, "waitUntil": "networkidle"}}
365
- ```
366
- Options: `timeout`, `waitUntil` (commit|domcontentloaded|load|networkidle)
367
-
368
- **Note:** For click-then-wait patterns, the system uses a two-step event pattern to prevent race conditions - it subscribes to navigation events BEFORE clicking to ensure fast navigations aren't missed.
369
-
370
-
371
- ### Frame/iFrame Navigation
372
-
373
- **listFrames** - List all frames in the page
374
- ```json
375
- {"listFrames": true}
376
- ```
377
- Returns: `{mainFrameId, currentFrameId, frames: [{frameId, url, name, parentId, depth}]}`
378
-
379
- **switchToFrame** - Switch to an iframe
380
- ```json
381
- {"switchToFrame": "iframe#content"}
382
- {"switchToFrame": 0}
383
- {"switchToFrame": {"selector": "iframe.editor"}}
384
- {"switchToFrame": {"index": 1}}
385
- {"switchToFrame": {"name": "myFrame"}}
386
- ```
387
- Options: CSS selector (string), index (number), or object with `selector`, `index`, `name`, or `frameId`
388
-
389
- Returns: `{frameId, url, name}`
390
-
391
- **switchToMainFrame** - Switch back to main frame
392
- ```json
393
- {"switchToMainFrame": true}
394
- ```
395
- Returns: `{frameId, url, name}`
396
-
397
- **Note:** After switching to a frame, all subsequent actions execute in that frame context until you switch to another frame or back to main.
398
-
399
-
400
- ### Waiting
401
-
402
- **wait** - Wait for element
403
- ```json
404
- {"wait": "#content"}
405
- {"wait": {"selector": "#loading", "hidden": true}}
406
- {"wait": {"selector": ".item", "minCount": 10}}
407
- ```
408
-
409
- **wait** - Wait for text
410
- ```json
411
- {"wait": {"text": "Welcome"}}
412
- {"wait": {"textRegex": "Order #[A-Z0-9]+"}}
413
- ```
414
-
415
- **wait** - Wait for URL
416
- ```json
417
- {"wait": {"urlContains": "/success"}}
418
- ```
419
-
420
- **wait** - Fixed time (ms)
421
- ```json
422
- {"wait": 2000}
423
- ```
424
-
425
- **Network idle detection:** The `networkidle` wait condition uses a precise counter-based tracker that monitors all network requests. It considers the network "idle" when no requests have been pending for 500ms.
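A counter-based idle tracker of the kind described above might be sketched as follows. The class and method names are hypothetical, the clock is injected so the logic can be checked without real timers, and nothing here is taken from the package source.

```javascript
// Hypothetical sketch: count in-flight requests and report "idle" once the
// count has been zero for idleMs (500 ms in the description above).
class NetworkIdleTracker {
  constructor({ idleMs = 500, now = Date.now } = {}) {
    this.idleMs = idleMs;
    this.now = now;
    this.pending = 0;
    this.lastActivity = now();
  }

  requestStarted() {
    this.pending += 1;
    this.lastActivity = this.now();
  }

  requestFinished() {
    // Guard against going negative if a finish arrives without a matching start.
    this.pending = Math.max(0, this.pending - 1);
    this.lastActivity = this.now();
  }

  isIdle() {
    return this.pending === 0 && this.now() - this.lastActivity >= this.idleMs;
  }
}
```

In a CDP client, `requestStarted` would plausibly be driven by `Network.requestWillBeSent` and `requestFinished` by `Network.loadingFinished` or `Network.loadingFailed`; that wiring is an assumption and is not shown here.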
426
-
427
-
428
- ### Interaction
429
-
430
- **click** - Click element
431
- ```json
432
- {"click": "#submit"}
433
- {"click": {"selector": "#btn", "verify": true}}
434
- {"click": {"ref": "s1e4"}}
435
- {"click": {"x": 450, "y": 200}}
436
- ```
437
- Options: `selector`, `ref`, `x`/`y`, `force`, `debug`, `timeout`, `jsClick`, `nativeOnly`
438
-
439
- Returns: `{clicked: true, method: "cdp"|"jsClick"|"jsClick-auto"}`. With navigation: adds `{navigated: true, newUrl: "..."}`.
440
-
441
- **Automatic Click Verification**
442
- Clicks are automatically verified - if CDP mouse events don't reach the target element (common on React, Vue, Next.js sites), the system automatically falls back to JavaScript click. The `method` field shows what was used:
443
- - `"cdp"` - CDP mouse events worked
444
- - `"jsClick"` - User requested `jsClick: true`
445
- - `"jsClick-auto"` - CDP failed, automatic fallback to JavaScript click
446
-
447
- **click** - Force JavaScript click
448
- ```json
449
- {"click": {"selector": "#submit", "jsClick": true}}
450
- {"click": {"ref": "s1e4", "jsClick": true}}
451
- ```
452
- Use `jsClick: true` to skip CDP and use JavaScript `element.click()` directly.
453
-
454
- **click** - Disable auto-fallback
455
- ```json
456
- {"click": {"selector": "#btn", "nativeOnly": true}}
457
- ```
458
- Use `nativeOnly: true` to disable the automatic jsClick fallback. The click will use CDP only and report `targetReceived: false` if the click didn't reach the element.
459
-
460
- **click** - Multi-selector fallback
461
- ```json
462
- {"click": {"selectors": ["[ref=s1e4]", "#submit", {"role": "button", "name": "Submit"}]}}
463
- ```
464
- Tries each selector in order until one succeeds. Accepts CSS selectors, refs, or role-based objects.
465
-
466
- Returns: `{clicked: true, matchedSelector: "#submit"}` indicating which selector succeeded.
127
+ Use `force: true` to bypass all checks. **Auto-force**: when actionability times out but element exists, automatically retries with force (outputs `autoForced: true`).
467
128
 
468
- **click** - Click by visible text
469
- ```json
470
- {"click": {"text": "Submit"}}
471
- {"click": {"text": "Learn more", "exact": true}}
472
- ```
473
- Finds and clicks an element containing the specified visible text. Use `exact: true` for exact match.
474
-
475
- **click** - Frame auto-detection
476
- ```json
477
- {"click": {"selector": "#editor", "searchFrames": true}}
478
- ```
479
- When `searchFrames: true`, searches for the element in all frames (main and iframes) and automatically switches to the correct frame before clicking.
129
+ ## Action Hooks
480
130
 
481
- **click** - Scroll until visible
482
- ```json
483
- {"click": {"selector": "#btn", "scrollUntilVisible": true}}
484
- ```
485
- Automatically scrolls the page to bring the element into view before clicking. Useful for elements that are off-screen.
131
+ Optional parameters on action steps to customize the step lifecycle:
486
132
 
487
- **click** - Auto-wait after click
488
- ```json
489
- {"click": "#submit", "waitAfter": true}
490
- {"click": {"selector": "#nav-link", "waitAfter": {"networkidle": true}}}
491
- {"click": {"selector": "#tab", "waitAfter": {"delay": 500}}}
492
- ```
493
- Waits for the page to settle after clicking. With `true`, waits for network idle. Can also specify `{delay: ms}` for fixed wait.
133
+ - **readyWhen**: `"() => condition"` — polled until truthy **before** the action executes
134
+ - **settledWhen**: `"() => condition"` — polled until truthy **after** the action completes
135
+ - **observe**: `"() => data"` — runs after settlement, return value appears in `result.observation`
494
136
 
495
- **click diagnostics** - When a click fails due to element interception (covered by another element), the error response includes diagnostic information:
496
- ```json
497
- {
498
- "error": "Element is covered by another element",
499
- "interceptedBy": {
500
- "tagName": "div",
501
- "id": "modal-overlay",
502
- "className": "overlay active",
503
- "textContent": "Loading..."
504
- }
505
- }
506
- ```
507
- This helps identify overlays, modals, or loading spinners blocking the target element.
137
+ Hooks can be combined on any action step. Applies to: click, fill, press, hover, drag, selectOption, scroll.
508
138
 
509
- **fill** - Fill input (clears first)
510
- ```json
511
- {"fill": {"selector": "#email", "value": "user@example.com"}}
512
- {"fill": {"ref": "s1e3", "value": "text"}}
513
- ```
514
- Options: `selector`, `ref`, `value`, `clear` (default: true), `react`, `force`, `timeout`
139
+ ## Core Steps
515
140
 
516
- Returns: `{filled: true}`. If the page navigates during fill (e.g., SPA auto-complete): `{filled: true, navigated: true, newUrl: "..."}`
141
+ ### chromeStatus (optional diagnostic)
142
+ `true` | `{host, port, headless, autoLaunch}` — returns `{running, launched, version, port, tabs[]}`
517
143
 
518
- **fill** - Fill by label
519
- ```json
520
- {"fill": {"label": "Email address", "value": "test@example.com"}}
521
- {"fill": {"label": "Password", "value": "secret123", "exact": true}}
522
- ```
523
- Finds an input by its associated label text and fills it. Uses `<label for="...">` associations or labels wrapping inputs. Use `exact: true` for exact label match.
144
+ > **Note**: You rarely need this — `newTab` auto-launches Chrome. Use `chromeStatus` only for diagnostics or non-default ports.
524
145
 
525
- **fillForm** - Fill multiple fields
526
- ```json
527
- {"fillForm": {"#firstName": "John", "#lastName": "Doe"}}
528
- ```
529
- Returns: `{total, filled, failed, results: [{selector, status, value}]}`
146
+ ### newTab
147
+ `true` | `"url"` | `{url, host, port, headless}` — returns `{opened, tab, url, navigated, viewportSnapshot, fullSnapshot, context}`
148
+ Response includes top-level `siteProfile` or `actionRequired` when URL provided (see Site Profiles).
149
+ **REQUIRED as first step** when no tab specified. Chrome auto-launches if not running.
150
+ Non-default Chrome: `{"steps":[{"newTab":{"url":"https://example.com","port":9333,"headless":true}}]}`
530
151
 
531
- **fillActive** - Fill the currently focused element (no selector needed)
532
- ```json
533
- {"fillActive": "search query"}
534
- {"fillActive": {"value": "text", "clear": false}}
535
- ```
536
- Options: `value`, `clear` (default: true)
152
+ ### goto
153
+ `"url"` | `{url, waitUntil}` — waitUntil: commit | domcontentloaded | load | networkidle
154
+ Response includes top-level `siteProfile` or `actionRequired` (see Site Profiles).
537
155
 
538
- Returns: `{filled: true, tag: "INPUT", type: "text", selector: "#search", valueBefore: "", valueAfter: "search query"}`
539
-
540
- Useful when refs go stale or when you just clicked an element and want to type into it.
541
-
542
- **type** - Type text (no clear)
543
- ```json
544
- {"type": {"selector": "#search", "text": "query", "delay": 50}}
545
- ```
546
- Returns: `{selector, typed, length}`
547
-
548
- **press** - Keyboard key/combo
549
- ```json
550
- {"press": "Enter"}
551
- {"press": "Control+a"}
552
- {"press": "Meta+Shift+Enter"}
553
- ```
554
-
555
- **select** - Select text in input
556
- ```json
557
- {"select": "#input"}
558
- {"select": {"selector": "#input", "start": 0, "end": 5}}
559
- ```
560
- Returns: `{selector, start, end, selectedText, totalLength}`
561
-
562
- **hover** - Mouse over element
563
- ```json
564
- {"hover": "#menu"}
565
- {"hover": {"selector": "#tooltip", "duration": 500}}
566
- ```
567
-
568
- **hover** - With result capture
569
- ```json
570
- {"hover": {"selector": "#menu", "captureResult": true}}
571
- ```
572
- When `captureResult: true`, captures information about elements that appear after hovering (tooltips, dropdowns, etc.).
573
-
574
- Returns:
575
- ```json
576
- {
577
- "hovered": true,
578
- "capturedResult": {
579
- "visibleElements": [
580
- {"selector": ".tooltip", "text": "Click to edit", "visible": true},
581
- {"selector": ".dropdown-menu", "itemCount": 5}
582
- ]
583
- }
584
- }
585
- ```
586
-
587
- **drag** - Drag element from source to target
588
- ```json
589
- {"drag": {"source": "#draggable", "target": "#dropzone"}}
590
- {"drag": {"source": {"ref": "s1e1"}, "target": {"ref": "s1e5"}}}
591
- {"drag": {"source": {"ref": "s1e1", "offsetX": 20}, "target": {"ref": "s1e5", "offsetY": -10}}}
592
- {"drag": {"source": {"x": 100, "y": 100}, "target": {"x": 300, "y": 200}}}
593
- {"drag": {"source": "#item", "target": "#container", "steps": 20, "delay": 10}}
594
- ```
595
- Options:
596
- - `source`/`target`: selector string, ref string (`"s1e1"`), ref object with offsets (`{"ref": "s1e1", "offsetX": 10, "offsetY": -5}`), or coordinates (`{x, y}`)
597
- - `offsetX`/`offsetY`: offset from element center (default: 0)
598
- - `steps` (default: 10), `delay` (ms, default: 0)
599
-
600
- Returns: `{dragged: true, method: "html5-dnd"|"range-input"|"mouse-events", source: {x, y}, target: {x, y}, steps}`
601
-
602
- The `method` field indicates which drag strategy was used:
603
- - `"html5-dnd"` - HTML5 Drag and Drop API (for draggable elements)
604
- - `"range-input"` - Direct value manipulation (for `<input type="range">` sliders)
605
- - `"mouse-events"` - JavaScript mouse event simulation (for custom drag implementations)
606
-
607
- **selectOption** - Select option(s) in a native `<select>` dropdown
608
- ```json
609
- {"selectOption": {"selector": "#country", "value": "US"}}
610
- {"selectOption": {"selector": "#country", "label": "United States"}}
611
- {"selectOption": {"selector": "#country", "index": 2}}
612
- {"selectOption": {"selector": "#colors", "values": ["red", "blue"]}}
613
- ```
614
- Options: `selector`, `value` (option value), `label` (option text), `index` (0-based), `values` (array for multi-select)
156
+ ### click
157
+ `"selector"` | `"ref"` | `{ref, selector, text, x/y, selectors[]}`
158
+ Options: force, jsClick, nativeOnly, doubleClick, scrollUntilVisible, searchFrames, exact, timeout, waitAfter
159
+ Hooks: readyWhen, settledWhen, observe
160
+ Returns: `{clicked, method: "cdp"|"jsClick"|"jsClick-auto", navigated?, newUrl?, newTabs?: [{targetId, url, title}]}`
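As a rough illustration of reading the optional fields in that return shape, the sketch below summarizes a click result. `describeClick` is a hypothetical helper, and the sample result objects are made up for the example rather than captured from a real run.

```javascript
// Hypothetical helper: turns a click result into a one-line summary.
// Field names follow the Returns line above; optional fields may be absent.
function describeClick(result) {
  if (!result.clicked) return "click did not happen";
  const parts = [`clicked via ${result.method}`];
  if (result.navigated) parts.push(`navigated to ${result.newUrl}`);
  if (result.newTabs && result.newTabs.length > 0) {
    parts.push(`opened ${result.newTabs.length} new tab(s)`);
  }
  return parts.join("; ");
}

// Made-up sample results for illustration.
const plainClick = { clicked: true, method: "cdp" };
const navClick = {
  clicked: true,
  method: "jsClick-auto",
  navigated: true,
  newUrl: "https://example.com/next",
};
console.log(describeClick(plainClick));
console.log(describeClick(navClick));
```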
615
161
 
616
- Returns: `{selected: ["US"], multiple: false}`
162
+ ### fill
163
+ `{selector|ref|label, value}` — the primary way to input text.
164
+ Options: clear(true), react, force, exact, timeout
165
+ Hooks: readyWhen, settledWhen, observe
166
+ Returns: `{filled, navigated?, newUrl?}`
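One way to build a single-field fill step is sketched below. `fillStep` is a hypothetical helper; requiring exactly one of `selector`, `ref`, or `label` is an assumption of this sketch, read from the `selector|ref|label` notation, and the real validator may behave differently.

```javascript
// Hypothetical helper: builds a fill step with exactly one target.
// The "exactly one target" rule is this sketch's assumption.
function fillStep(target, value, options = {}) {
  const kinds = ["selector", "ref", "label"].filter((k) => k in target);
  if (kinds.length !== 1) {
    throw new Error("fill needs exactly one of selector, ref, or label");
  }
  return { fill: { ...target, value, ...options } };
}

// Fill by label, using the documented `exact` option.
const emailStep = fillStep({ label: "Email address" }, "test@example.com", { exact: true });
console.log(JSON.stringify(emailStep));
```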
617
167
 
618
- Note: This uses JavaScript to set `option.selected` and dispatch change events (same approach as Puppeteer/Playwright). Native dropdowns cannot be clicked via CDP.
168
+ ### press
169
+ `"Enter"` | `"Control+a"` | `"Meta+Shift+Enter"` — keyboard shortcuts and key presses.
619
170
 
620
-
621
- ### Scrolling
622
-
623
- ```json
624
- {"scroll": "top"}
625
- {"scroll": "bottom"}
626
- {"scroll": "#element"}
627
- {"scroll": {"deltaY": 500}}
628
- {"scroll": {"x": 0, "y": 1000}}
629
- ```
171
+ ### scroll
172
+ `"top"` | `"bottom"` | `"selector"` | `{deltaY}` | `{x, y}`
630
173
  Returns: `{scrollX, scrollY}`
631
174
 
632
-
633
- ### Data Extraction
634
-
635
- **extract** - Extract structured data from page
636
- ```json
637
- {"extract": "table.results"}
638
- {"extract": {"selector": "table.results"}}
639
- {"extract": {"selector": "ul.items", "type": "list"}}
640
- {"extract": {"selector": "#data-grid", "type": "table", "includeHeaders": true}}
641
- ```
642
- Options: `selector`, `type` (auto|table|list|text), `includeHeaders`
643
-
644
- Automatically detects data structure (tables, lists, etc.) and returns structured output:
645
-
646
- Table extraction:
647
- ```json
648
- {
649
- "type": "table",
650
- "headers": ["Name", "Email", "Status"],
651
- "rows": [
652
- ["John Doe", "john@example.com", "Active"],
653
- ["Jane Smith", "jane@example.com", "Pending"]
654
- ],
655
- "rowCount": 2,
656
- "columnCount": 3
657
- }
658
- ```
659
-
660
- List extraction:
661
- ```json
662
- {
663
- "type": "list",
664
- "items": ["Item 1", "Item 2", "Item 3"],
665
- "itemCount": 3
666
- }
667
- ```
668
-
669
- **getDom** - Get raw HTML of page or element
670
- ```json
671
- {"getDom": true}
672
- {"getDom": "#content"}
673
- {"getDom": {"selector": "#content", "outer": false}}
674
- ```
675
- Options: `selector` (CSS selector, omit for full page), `outer` (default: true, include element's own tag)
676
-
677
- Returns: `{html, tagName, selector, length}`
678
-
679
- **getBox** - Get bounding box and position of refs
680
- ```json
681
- {"getBox": "s1e1"}
682
- {"getBox": ["s1e1", "s1e2", "s2e3"]}
683
- {"getBox": {"refs": ["s1e1", "s1e5"]}}
684
- ```
685
-
686
- Single ref returns:
687
- ```json
688
- {"x": 100, "y": 200, "width": 150, "height": 40, "center": {"x": 175, "y": 220}}
689
- ```
690
-
691
- Multiple refs return object keyed by ref:
692
- ```json
693
- {
694
- "s1e1": {"x": 100, "y": 200, "width": 150, "height": 40, "center": {"x": 175, "y": 220}},
695
- "s1e2": {"error": "stale", "message": "Element no longer in DOM"},
696
- "s2e3": {"error": "hidden", "box": {"x": 0, "y": 0, "width": 100, "height": 50}}
697
- }
698
- ```
699
-
700
- **refAt** - Get or create ref for element at coordinates
701
- ```json
702
- {"refAt": {"x": 600, "y": 200}}
703
- ```
704
-
705
- Finds the element at the given viewport coordinates and returns/creates a ref for it. Useful when you need to interact with an element found visually (e.g., from a screenshot) rather than by selector.
706
-
707
- Returns:
708
- ```json
709
- {
710
- "ref": "s1e5",
711
- "existing": false,
712
- "tag": "BUTTON",
713
- "selector": "#submit-btn",
714
- "clickable": true,
715
- "role": "button",
716
- "name": "Submit",
717
- "box": {"x": 580, "y": 190, "width": 100, "height": 40}
718
- }
719
- ```
720
-
721
- **elementsAt** - Get refs for elements at multiple coordinates
722
- ```json
723
- {"elementsAt": [{"x": 100, "y": 200}, {"x": 300, "y": 400}, {"x": 500, "y": 150}]}
724
- ```
725
-
726
- Batch version of `refAt` for checking multiple points at once.
727
-
728
- Returns:
729
- ```json
730
- {
731
- "count": 3,
732
- "elements": [
733
- {"x": 100, "y": 200, "ref": "s1e1", "tag": "BUTTON", "selector": "#btn1", "clickable": true, ...},
734
- {"x": 300, "y": 400, "ref": "s1e2", "tag": "DIV", "selector": "div.card", "clickable": false, ...},
735
- {"x": 500, "y": 150, "error": "No element at this coordinate"}
736
- ]
737
- }
738
- ```
739
-
740
- **elementsNear** - Get refs for elements near a coordinate
741
- ```json
742
- {"elementsNear": {"x": 400, "y": 300}}
743
- {"elementsNear": {"x": 400, "y": 300, "radius": 100}}
744
- {"elementsNear": {"x": 400, "y": 300, "radius": 75, "limit": 10}}
745
- ```
746
-
747
- Finds all visible elements within a radius (default 50px) of the given point, sorted by distance.
748
-
749
- Options: `x`, `y`, `radius` (default: 50), `limit` (default: 20)
750
-
751
- Returns:
752
- ```json
753
- {
754
- "center": {"x": 400, "y": 300},
755
- "radius": 50,
756
- "count": 5,
757
- "elements": [
758
- {"ref": "s1e1", "tag": "BUTTON", "selector": "#nearby-btn", "clickable": true, "distance": 12, ...},
759
- {"ref": "s1e2", "tag": "SPAN", "selector": "span.label", "clickable": false, "distance": 28, ...}
760
- ]
761
- }
762
- ```
763
-
764
- Each element includes: `ref`, `tag`, `selector`, `clickable`, `role`, `name`, `distance`, `box`
765
-
766
- **formState** - Dump current form state
767
- ```json
768
- {"formState": "#checkout-form"}
769
- {"formState": {"selector": "form.registration", "includeHidden": true}}
770
- ```
771
- Options: `selector`, `includeHidden` (default: false)
772
-
773
- Returns complete form state including all field values, validation states, and field types:
774
- ```json
775
- {
776
- "selector": "#checkout-form",
777
- "action": "/api/checkout",
778
- "method": "POST",
779
- "fields": [
780
- {
781
- "name": "email",
782
- "type": "email",
783
- "value": "user@example.com",
784
- "label": "Email Address",
785
- "required": true,
786
- "valid": true
787
- },
788
- {
789
- "name": "quantity",
790
- "type": "number",
791
- "value": "2",
792
- "label": "Quantity",
793
- "required": true,
794
- "valid": true,
795
- "min": 1,
796
- "max": 100
797
- },
798
- {
799
- "name": "country",
800
- "type": "select",
801
- "value": "US",
802
- "label": "Country",
803
- "options": [
804
- {"value": "US", "text": "United States", "selected": true},
805
- {"value": "CA", "text": "Canada", "selected": false}
806
- ]
807
- }
808
- ],
809
- "valid": true,
810
- "fieldCount": 3
811
- }
812
- ```
813
-
814
- **query** - Find elements by CSS
815
- ```json
816
- {"query": "h1"}
817
- {"query": {"selector": "a", "limit": 5, "output": "href"}}
818
- {"query": {"selector": "div", "output": ["text", "href"]}}
819
- {"query": {"selector": "button", "output": {"attribute": "data-id"}}}
820
- ```
821
- Options: `selector`, `limit` (default: 10), `output` (text|html|href|value|tag|array|attribute object), `clean`, `metadata`
822
-
823
- Returns: `{selector, total, showing, results: [{index, value}]}`
824
-
825
- **query** - Find by ARIA role
826
- ```json
827
- {"query": {"role": "button"}}
828
- {"query": {"role": "button", "name": "Submit"}}
829
- {"query": {"role": "heading", "level": 2}}
830
- {"query": {"role": ["button", "link"], "refs": true}}
831
- ```
832
- Options: `role`, `name`, `nameExact`, `nameRegex`, `checked`, `disabled`, `level`, `countOnly`, `refs`
833
-
834
- Supported roles: `button`, `textbox`, `checkbox`, `link`, `heading`, `listitem`, `option`, `combobox`, `radio`, `img`, `tab`, `tabpanel`, `menu`, `menuitem`, `dialog`, `alert`, `navigation`, `main`, `search`, `form`
835
-
836
- **queryAll** - Multiple queries at once
837
- ```json
838
- {"queryAll": {"title": "h1", "links": "a", "buttons": {"role": "button"}}}
839
- ```
840
-
841
- **inspect** - Page overview
842
- ```json
843
- {"inspect": true}
844
- {"inspect": {"selectors": [".item"], "limit": 3}}
845
- ```
846
- Returns: `{title, url, counts: {links, buttons, inputs, images, headings}, custom: {...}}`
847
-
848
- **console** - Browser console logs
849
- ```json
850
- {"console": true}
851
- {"console": {"level": "error", "limit": 20, "stackTrace": true}}
852
- ```
853
- Options: `level`, `type`, `since`, `limit`, `clear`, `stackTrace`
854
-
855
- Returns: `{total, showing, messages: [{level, text, type, url, line, timestamp, stackTrace?}]}`
856
-
857
- Note: Console logs don't persist across CLI invocations.
858
-
859
-
860
- ### Screenshots & PDF
861
-
862
- **Automatic Screenshots** - Every visual action captures before/after screenshots automatically.
863
-
864
- Visual actions: `goto`, `openTab`, `click`, `fill`, `type`, `hover`, `press`, `scroll`, `wait`, `snapshot`, `query`, `queryAll`, `inspect`, `eval`, `extract`, `formState`, `drag`, `select`, `validate`, `submit`, `assert`
865
-
866
- Screenshots are saved to: `/tmp/cdp-skill/<tab>.before.png` and `/tmp/cdp-skill/<tab>.after.png`
867
-
868
- ```json
869
- {
870
- "summary": "OK | 1 step | after: /tmp/cdp-skill/t1.after.png | ...",
871
- "screenshotBefore": "/tmp/cdp-skill/t1.before.png",
872
- "screenshotAfter": "/tmp/cdp-skill/t1.after.png",
873
- "hint": "Use Read tool to view screenshotAfter (current state) and screenshotBefore (previous state)"
874
- }
875
- ```
876
-
877
- **pdf** - Generate PDF of page
878
- ```json
879
- {"pdf": "report.pdf"}
880
- {"pdf": {"path": "/absolute/path/report.pdf", "landscape": true, "printBackground": true}}
881
- ```
882
- Options: `path`, `selector`, `landscape`, `printBackground`, `scale`, `paperWidth`, `paperHeight`, margins, `pageRanges`, `validate`
883
-
884
- Returns: `{path, fileSize, fileSizeFormatted, pageCount, dimensions, validation?}`
885
-
886
- **Note:** Relative paths are saved to the platform temp directory (`$TMPDIR/cdp-skill/` on macOS/Linux, `%TEMP%\cdp-skill\` on Windows).
887
-
888
-
889
- ### Dynamic Browser Execution
890
-
891
- **pageFunction** - Run agent-generated JavaScript in the browser
892
- ```json
893
- {"pageFunction": "() => document.title"}
894
- {"pageFunction": "(document) => [...document.querySelectorAll('.item')].map(i => ({text: i.textContent, href: i.href}))"}
895
- ```
896
-
897
- Object form with options:
898
- ```json
899
- {"pageFunction": {"fn": "(refs) => refs.size", "refs": true}}
900
- {"pageFunction": {"fn": "() => document.querySelectorAll('button').length", "timeout": 5000}}
901
- ```
902
- Options: `fn` (function string), `refs` (pass `window.__ariaRefs` as first argument), `timeout` (ms)
903
-
904
- Differences from `eval`:
905
- - Function is auto-wrapped as IIFE — no need for `(function(){...})()`
906
- - `refs: true` passes the aria ref map as the first argument
907
- - Return value is auto-serialized (handles DOMRect, NodeList, Map, etc.)
908
- - Runs in current frame context (respects `switchToFrame`)
909
- - Errors include the function source for debugging
910
-
911
- Returns: Serialized value (same type system as `eval`)
912
-
913
- **poll** - Wait for a condition by polling a predicate
914
- ```json
915
- {"poll": "() => document.querySelector('.loaded') !== null"}
916
- {"poll": "() => document.readyState === 'complete'"}
917
- ```
918
-
919
- Object form with options:
920
- ```json
921
- {"poll": {"fn": "() => !document.querySelector('.spinner') && document.querySelector('.results')?.children.length > 0", "interval": 100, "timeout": 10000}}
922
- ```
923
- Options: `fn` (predicate string), `interval` (ms, default: 100), `timeout` (ms, default: 30000)
924
-
925
- Returns:
926
- - On success: `{resolved: true, value: <result>, elapsed: <ms>}`
927
- - On timeout: `{resolved: false, elapsed: <ms>, lastValue: <result>}`
928
-
929
- Use `poll` when you need to wait for a custom condition that the built-in `wait` step doesn't cover.
930
-
931
- **pipeline** - Multi-step browser-side transaction (zero roundtrips)
932
- ```json
933
- {"pipeline": [
934
- {"find": "#username", "fill": "admin"},
935
- {"find": "#password", "fill": "secret_sauce"},
936
- {"find": "#login-button", "click": true},
937
- {"waitFor": "() => location.pathname.includes('/inventory')"},
938
- {"sleep": 500},
939
- {"return": "() => document.querySelector('.title')?.textContent"}
940
- ]}
941
- ```
942
-
943
- Object form with timeout:
944
- ```json
945
- {"pipeline": {"steps": [...], "timeout": 15000}}
946
- ```
947
-
948
- Micro-operations:
949
- - `find` + `fill` — querySelector + set value + dispatch events (uses native setter for React compatibility)
950
- - `find` + `click` — querySelector + el.click()
951
- - `find` + `type` — querySelector + focus + keydown/keypress/input/keyup per char
952
- - `find` + `check` — querySelector + set checked + dispatch events
953
- - `find` + `select` — querySelector on `<select>` + set value
954
- - `waitFor` — poll predicate until truthy (default timeout: 10s)
955
- - `sleep` — setTimeout delay (ms)
956
- - `return` — evaluate function and collect return value
957
-
958
- Returns:
959
- - On success: `{completed: true, steps: <count>, results: [...]}`
960
- - On error: `{completed: false, failedAt: <index>, error: <message>, results: [...]}`
961
-
962
- The entire pipeline compiles into a single async JS function and executes via one CDP call — no network roundtrips between micro-ops. Use this for multi-step flows like login forms where latency matters.
963
-
964
-
965
- ### Action Hooks
966
-
967
- Optional parameters on action steps (`click`, `fill`, `press`, `hover`, `drag`, `selectOption`, `scroll`) to customize the step lifecycle:
968
-
969
- **readyWhen** - Custom pre-action readiness check
970
- ```json
971
- {"click": {"ref": "s1e5", "readyWhen": "() => !document.querySelector('.loading')"}}
972
- {"fill": {"selector": "#email", "value": "test@test.com", "readyWhen": "() => document.querySelector('#email').offsetHeight > 0"}}
973
- ```
974
- Polled until truthy before the action executes. Replaces/augments default actionability checks.
975
-
976
- **settledWhen** - Custom post-action settlement check
977
- ```json
978
- {"click": {"ref": "s1e5", "settledWhen": "() => document.querySelector('.results')?.children.length > 0"}}
979
- {"click": {"selector": "#nav-link", "settledWhen": "() => location.href.includes('/results')"}}
980
- ```
981
- Polled until truthy after the action completes. Use this instead of separate `poll`/`wait` steps when you know what the action should produce.
982
-
983
- **observe** - Capture data after settlement
984
- ```json
985
- {"click": {"ref": "s1e5", "observe": "() => ({url: location.href, count: document.querySelectorAll('.item').length})"}}
986
- ```
987
- Runs after the action (and after `settledWhen` if present). Return value appears in `result.observation`. Use this to capture post-action state without a separate step.
988
-
989
- Hooks can be combined:
990
- ```json
991
- {"click": {
992
- "ref": "s1e5",
993
- "readyWhen": "() => !document.querySelector('.loading')",
994
- "settledWhen": "() => document.querySelectorAll('.result').length > 0",
995
- "observe": "() => document.querySelectorAll('.result').length"
996
- }}
997
- ```
998
-
999
-
1000
- ### Site Manifests
1001
-
1002
- Per-domain knowledge files at `~/.cdp-skill/sites/{domain}.md` that capture how a site works.
1003
-
1004
- **Light auto-fit** runs automatically on first `goto` to an unknown domain. It detects frameworks (React, Next.js, Vue, Angular, Svelte, jQuery, Turbo, htmx, Nuxt, Remix), content regions, and SPA signals. The `goto` response includes:
1005
- - `fitted: {domain, level: "light", frameworks: [...]}` — when a new manifest was created
1006
- - `siteManifest: "..."` — markdown content of existing manifest (on subsequent visits)
1007
-
1008
- **writeSiteManifest** - Write or update a site manifest
1009
- ```json
1010
- {"writeSiteManifest": {"domain": "github.com", "content": "# github.com\nFitted: 2024-02-03 (full)\n\n## Environment\n..."}}
1011
- ```
1012
-
1013
- Returns: `{written: true, path: "...", domain: "..."}`
1014
-
1015
- **Full fitting process** (agent-driven):
1016
- 1. Navigate to site — read light-fit manifest from `goto` response
1017
- 2. Use `pageFunction` to explore JS internals, test strategies
1018
- 3. Use `snapshot` + `snapshotSearch` to map page structure
1019
- 4. Use `poll` to test readiness detection approaches
1020
- 5. Write comprehensive manifest via `writeSiteManifest`
1021
-
1022
- **Manifest template**:
1023
- ```markdown
1024
- # example.com
1025
- Fitted: 2024-02-03 (full) | Fingerprint: <hash>
1026
-
1027
- ## Environment
1028
- - React 18.x, Next.js (SSR)
1029
- - SPA with pushState navigation
1030
- - Has <main> element: #__next > main
1031
-
1032
- ## Quirks
1033
- - Turbo intercepts link clicks — use settledWhen with URL check
1034
- - File tree uses virtualization — only visible rows in DOM
1035
-
1036
- ## Strategies
1037
- ### fill (React controlled inputs)
1038
- \`\`\`js
1039
- (el, value) => {
1040
- const setter = Object.getOwnPropertyDescriptor(HTMLInputElement.prototype, 'value').set;
1041
- setter.call(el, value);
1042
- el.dispatchEvent(new Event('input', {bubbles: true}));
1043
- }
1044
- \`\`\`
1045
-
1046
- ## Regions
1047
- - mainContent: `main, [role="main"]`
1048
- - navigation: `.nav-bar`
1049
-
1050
- ## Recipes
1051
- ### Login
1052
- \`\`\`json
1053
- {"pipeline": [
1054
- {"find": "#username", "fill": "{{user}}"},
1055
- {"find": "#password", "fill": "{{pass}}"},
1056
- {"find": "#login", "click": true},
1057
- {"waitFor": "() => location.pathname !== '/login'"}
1058
- ]}
1059
- \`\`\`
1060
- ```
1061
-
1062
-
1063
- ### JavaScript Execution
1064
-
1065
- **eval** - Execute JS in page context
1066
- ```json
1067
- {"eval": "document.title"}
1068
- {"eval": {"expression": "fetch('/api').then(r=>r.json())", "await": true}}
1069
- ```
1070
- Options: `expression`, `await`, `timeout`, `serialize`
1071
-
1072
- **Shell escaping tip:** For complex expressions with quotes or special characters, use a heredoc or JSON file:
1073
- ```bash
1074
- # Heredoc approach (Unix)
1075
- node src/cdp-skill.js <<'EOF'
1076
- {"steps":[{"eval":"document.querySelectorAll('button').length"}]}
1077
- EOF
1078
-
1079
- # Or save to file and pipe
1080
- cat steps.json | node src/cdp-skill.js
1081
- ```
1082
-
1083
- Returns typed results:
1084
- - Numbers: `{type: "number", repr: "Infinity|NaN|-Infinity"}`
1085
- - Date: `{type: "Date", value: "ISO string", timestamp: N}`
1086
- - Map: `{type: "Map", size: N, entries: [...]}`
1087
- - Set: `{type: "Set", size: N, values: [...]}`
1088
- - Element: `{type: "Element", tagName, id, className, textContent, isConnected}`
1089
- - NodeList: `{type: "NodeList", length: N, items: [...]}`
1090
-
1091
-
1092
- ### Accessibility Snapshot
1093
-
1094
- **snapshot** - Get accessibility tree
1095
- ```json
1096
- {"snapshot": true}
1097
- {"snapshot": {"root": "#container", "maxElements": 500}}
1098
- {"snapshot": {"root": "role=main", "includeText": true}}
1099
- {"snapshot": {"includeFrames": true}}
1100
- {"snapshot": {"pierceShadow": true}}
1101
- {"snapshot": {"detail": "interactive"}}
1102
- {"snapshot": {"inlineLimit": 28000}}
1103
- {"snapshot": {"since": "s1"}}
1104
- ```
1105
- Options: `mode` (ai|full), `detail` (summary|interactive|full), `root` (CSS selector or "role=X"), `maxDepth`, `maxElements`, `includeText`, `includeFrames`, `pierceShadow`, `viewportOnly`, `inlineLimit`, `since`
1106
-
1107
- Returns YAML with: role, "name", states (`[checked]`, `[disabled]`, `[expanded]`, `[required]`, `[invalid]`, `[level=N]`), `[name=fieldName]` for form inputs, `[ref=s{N}e{M}]` for clicking, and `snapshotId`.
1108
-
1109
- ```yaml
1110
- - navigation:
1111
- - link "Home" [ref=s1e1]
1112
- - main:
1113
- - heading "Welcome" [level=1]
1114
- - textbox "Email" [required] [invalid] [name=email] [ref=s1e3]
1115
- - button "Submit" [ref=s1e4]
1116
- ```
1117
-
1118
- **Snapshot Caching (HTTP 304-like)**: Use the `since` parameter to check if the page has changed:
1119
- ```json
1120
- {"snapshot": {"since": "s1"}}
1121
- ```
1122
-
1123
- If the page hasn't changed (same URL, scroll position, DOM size, and interactive element count):
1124
- ```json
1125
- {
1126
- "unchanged": true,
1127
- "snapshotId": "s1",
1128
- "message": "Page unchanged since s1"
1129
- }
1130
- ```
1131
-
1132
- If the page has changed, a new snapshot is returned with an incremented `snapshotId`:
1133
- ```json
1134
- {
1135
- "snapshotId": "s2",
1136
- "yaml": "- button \"Login\" [ref=s2e1]\n...",
1137
- "refs": {...}
1138
- }
1139
- ```
1140
-
1141
- This minimizes context sent to the agent - only request new snapshots when needed.
1142
-
1143
- **Detail levels** control how much information is returned:
1144
- - `"full"` (default): Complete accessibility tree
1145
- - `"interactive"`: Only actionable elements (buttons, links, inputs) with their paths
1146
- - `"summary"`: Landmark overview with interactive element counts
1147
-
1148
- ```json
1149
- {"snapshot": {"detail": "summary"}}
1150
- ```
1151
- Returns:
1152
- ```yaml
1153
- # Snapshot Summary
1154
- # Total elements: 1847
1155
- # Interactive elements: 67
1156
- # Viewport elements: 23
1157
-
1158
- landmarks:
1159
- - role: main
1160
- interactiveCount: 47
1161
- children: [form, navigation, article]
1162
- ```
1163
-
1164
- **Large snapshot handling**: Snapshots over 9KB (configurable via `inlineLimit`) are automatically saved to a file:
1165
- ```json
1166
- {
1167
- "yaml": null,
1168
- "artifacts": {"snapshot": "/tmp/cdp-skill/t1.snapshot.yaml"},
1169
- "snapshotSize": 125000,
1170
- "truncatedInline": true
1171
- }
1172
- ```
1173
-
1174
- Use `includeText: true` to capture static text (error messages, etc.). Elements with `role="alert"` or `role="status"` always include text.
1175
-
1176
- Use `includeFrames: true` to include same-origin iframe content in the snapshot. Cross-origin iframes are marked with `crossOrigin: true`.
1177
-
1178
- Use `pierceShadow: true` to traverse into open Shadow DOM trees. This is useful for web components that use Shadow DOM to encapsulate their internal structure.
1179
-
1180
- **snapshotSearch** - Search within accessibility tree
1181
- ```json
1182
- {"snapshotSearch": {"text": "Submit"}}
1183
- {"snapshotSearch": {"text": "Submit", "role": "button"}}
1184
- {"snapshotSearch": {"pattern": "^Save.*draft$", "role": "button"}}
1185
- {"snapshotSearch": {"role": "button", "limit": 20}}
1186
- {"snapshotSearch": {"text": "Edit", "near": {"x": 500, "y": 300, "radius": 100}}}
1187
- ```
1188
- Options: `text` (fuzzy match), `pattern` (regex), `role`, `exact` (boolean), `limit` (default: 10), `context` (parent levels), `near` ({x, y, radius})
1189
-
1190
- Returns only matching elements without the full tree:
1191
- ```json
1192
- {
1193
- "matches": [
1194
- {"path": "main > form > button", "ref": "s1e47", "name": "Submit Form", "role": "button"},
1195
- {"path": "dialog > button", "ref": "s1e89", "name": "Submit Feedback", "role": "button"}
1196
- ],
1197
- "matchCount": 2,
1198
- "searchedElements": 1847
1199
- }
1200
- ```
1201
-
1202
- Use `snapshotSearch` to find specific elements without loading the entire accessibility tree - especially useful for large SPAs.
1203
-
1204
- **Ref Persistence**: Refs generated by `snapshotSearch` are preserved across subsequent commands. The auto-snapshots taken at command boundaries merge into the existing ref map instead of overwriting it. This means you can:
1205
- 1. Run `snapshotSearch` to find elements on a very long page (e.g., "s1e1170")
1206
- 2. Click that ref in the next command and it will still work
1207
-
1208
-
1209
- ### Viewport & Device Emulation
1210
-
1211
- **viewport** - Set viewport size
1212
- ```json
1213
- {"viewport": "iphone-14"}
1214
- {"viewport": {"width": 1280, "height": 720}}
1215
- {"viewport": {"width": 375, "height": 667, "mobile": true, "hasTouch": true, "isLandscape": true}}
1216
- ```
1217
- Options: `width`, `height`, `deviceScaleFactor`, `mobile`, `hasTouch`, `isLandscape`
1218
-
1219
- Returns: `{width, height, deviceScaleFactor, mobile, hasTouch}`
1220
-
1221
- Presets: `iphone-se`, `iphone-14`, `iphone-15-pro`, `ipad`, `ipad-pro-11`, `pixel-7`, `samsung-galaxy-s23`, `desktop`, `desktop-hd`, `macbook-pro-14`, etc.
1222
-
1223
-
1224
- ### Cookie Management
1225
-
1226
- **cookies** - Get/set/clear cookies (defaults to current tab's domain)
1227
- ```json
1228
- {"cookies": {"get": true}}
1229
- {"cookies": {"get": ["https://other-domain.com"], "name": "session_id"}}
1230
- {"cookies": {"set": [{"name": "token", "value": "abc", "domain": "example.com", "expires": "7d"}]}}
1231
- {"cookies": {"delete": "session_id"}}
1232
- {"cookies": {"delete": "session_id", "domain": "example.com"}}
1233
- {"cookies": {"clear": true}}
1234
- {"cookies": {"clear": true, "domain": "example.com"}}
1235
- ```
1236
-
1237
- **Note:** `get` without URLs returns only cookies for the current tab's domain. Specify explicit URLs to get cookies from other domains.
1238
-
1239
- Set options: `name`, `value`, `url` or `domain`, `path`, `secure`, `httpOnly`, `sameSite`, `expires`
1240
-
1241
- Clear/delete options: `domain` to limit to a specific domain (e.g., "example.com" matches ".example.com" and subdomains)
1242
-
1243
- Expiration formats: `30m`, `1h`, `7d`, `1w`, `1y`, or Unix timestamp.
1244
-
1245
- Returns: get → `{cookies: [...]}`, set → `{action: "set", count: N}`, delete/clear → `{action: "delete|clear", count: N}`
1246
-
1247
-
1248
- ### Form Validation
1249
-
1250
- **validate** - Check field validation state
1251
- ```json
1252
- {"validate": "#email"}
1253
- ```
1254
- Returns: `{valid, message, validity: {valueMissing, typeMismatch, ...}}`
1255
-
1256
- **submit** - Submit form with validation
1257
- ```json
1258
- {"submit": "form"}
1259
- {"submit": {"selector": "#login-form", "reportValidity": true}}
1260
- ```
1261
- Returns: `{submitted, valid, errors: [{name, type, message, value}]}`
1262
-
1263
-
1264
- ### Assertions
1265
-
1266
- **assert** - Validate conditions
1267
- ```json
1268
- {"assert": {"url": {"contains": "/success"}}}
1269
- {"assert": {"url": {"matches": "^https://.*\\.example\\.com"}}}
1270
- {"assert": {"text": "Welcome"}}
1271
- {"assert": {"selector": "h1", "text": "Title", "caseSensitive": false}}
1272
- ```
1273
-
1274
- URL options: `contains`, `equals`, `startsWith`, `endsWith`, `matches`
1275
-
1276
-
1277
- ### Tab Management
1278
-
1279
- **listTabs** - List open tabs
1280
- ```json
1281
- {"listTabs": true}
1282
- ```
1283
- Returns: `{count, tabs: [{targetId, url, title}]}`
1284
-
1285
- **closeTab** - Close a tab
1286
- ```json
1287
- {"closeTab": "ABC123..."}
1288
- ```
1289
- Returns: `{closed: "<targetId>"}`
1290
-
1291
- **openTab** - Create a new browser tab (REQUIRED as first step when no tab specified)
1292
- ```json
1293
- {"openTab": true} // Open blank tab
1294
- {"openTab": "https://example.com"} // Open and navigate to URL
1295
- {"openTab": {"url": "https://example.com"}} // Object format (allows future options)
1296
- ```
1297
- Returns: `{opened: true, tab: "t1", url: "...", navigated: true, viewportSnapshot: "...", fullSnapshot: "...", context: {...}}`
1298
-
1299
- Like other navigation actions, `openTab` returns an inline accessibility snapshot when a URL is provided. This establishes the baseline for diff comparison on subsequent actions.
1300
-
1301
- Use `openTab` when starting a new automation session. Use the returned `tab` id (e.g., "t1") in subsequent calls via `config.tab`.
1302
-
175
+ ### snapshot
176
+ `true` | `{root, detail, mode, maxDepth, maxElements, includeText, includeFrames, pierceShadow, viewportOnly, inlineLimit, since}`
177
+ Detail: summary | interactive | full(default)
178
+ Since: `"s1"` returns `{unchanged: true}` if page hasn't changed
179
+ Returns: YAML with role, "name", states, `[ref=s{N}e{M}]`, snapshotId
180
+ Notes: snapshots over 9KB saved to file (configurable via inlineLimit)
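The `s{N}e{M}` ref format can be taken apart with a small regular expression. This is a sketch only: `parseRef` is a hypothetical helper, and reading `N` as the snapshot number and `M` as the element number is an assumption based on examples such as `s1e47` and snapshot ids such as `s1`.

```javascript
// Hypothetical helper: splits a ref such as "s1e47" into its two parts.
// Returns null for anything that is not in s{N}e{M} form (e.g. a CSS selector).
function parseRef(ref) {
  const match = /^s(\d+)e(\d+)$/.exec(ref);
  if (!match) return null;
  return { snapshotId: `s${match[1]}`, element: Number(match[2]) };
}

console.log(parseRef("s1e47"));
console.log(parseRef("#submit"));
```

Under the same assumption, the recovered `snapshotId` is the kind of value the `since` option expects, so a caller could ask whether the page has changed since the snapshot that produced a given ref.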
181
+
182
+ ### snapshotSearch
183
+ `{text, pattern, role, exact, limit(10), context, near: {x, y, radius}}`
184
+ Returns: `{matches[], matchCount, searchedElements}`
185
+ Notes: refs from snapshotSearch persist across subsequent commands.
186
+
187
+ ### wait
188
+ `"selector"` | `{selector, hidden, minCount}` | `{text}` | `{textRegex}` | `{urlContains}`
189
+
190
+ ### get
191
+ `"selector"` (text) | `{selector, mode: "text"|"html"|"value"|"box"|"attributes"}` — unified content extraction.
192
+ Returns: text → `{text}`, html → `{html, tagName, length}`, value → `{fields[], valid, fieldCount}`, box → `{x, y, width, height}`, attributes → `{attributes}`
193
+
194
+ ### getUrl
195
+ `true` → `{url}`
196
+
197
+ ### getTitle
198
+ `true` — returns `{title}`
199
+
200
+ ### pageFunction
201
+ `"() => expr"` | `{fn, refs(bool), timeout}` — run custom JavaScript in the browser.
202
+ Notes: auto-wrapped as IIFE, refs passes `window.__ariaRefs`, return value auto-serialized, runs in current frame context.
203
+
204
+ ### closeTab
205
+ `"tabId"` — returns `{closed}`
206
+
207
+ ### listTabs
208
+ `true` — returns `{count, tabs[]}`
209
+
210
+ ## Also Available
211
+
212
+ These steps are fully functional; see EXAMPLES.md for usage details.
213
+
214
+ | Step | Description |
215
+ |------|-------------|
216
+ | `fill` | Unified fill (replaces `type`): `"text"` (focused), `{selector, value}` (single), `{"#a":"x","#b":"y"}` (batch), `{fields:{...}, react}` (batch+options), `{value, clear}` (focused+options) |
217
+ | `sleep` | Time delay: `2000` (ms, 0–60000) |
218
+ | `pageFunction` | JS execution: `"() => expr"` (function) or `"document.title"` (bare expression) or `{fn, expression, refs, timeout}` |
219
+ | `poll` | Poll predicate until truthy: `"() => expr"` or `{fn, interval, timeout}` |
220
+ | `query` | CSS/ARIA query: `"selector"` or `{role, name}` → `{total, results[]}` |
221
+ | `queryAll` | Batch queries: `{"label": "selector", ...}` |
222
+ | `inspect` | Page overview: `true` → `{title, url, counts}` |
223
+ | `get` | Unified content extraction (replaces `extract`, `getDom`, `getBox`, `formState`): `"selector"` (text), `{selector, mode: "text"\|"html"\|"value"\|"box"\|"attributes"}` |
224
+ | `getUrl` | Get current URL: `true` → `{url}` |
225
+ | `getTitle` | Get page title: `true` → `{title}` |
226
+ | `hover` | Hover element: `"selector"` or `{selector, ref, text, x/y, duration}` |
227
+ | `drag` | Drag and drop: `{source, target, steps, delay, method}` — method: `auto`(default)\|`mouse`\|`html5` |
228
+ | `selectText` | Text selection (renamed from `select`): `"selector"` or `{selector, start, end}` |
229
+ | `selectOption` | Dropdown: `{selector, value\|label\|index\|values}` |
230
+ | `submit` | Submit form: `"selector"` → `{submitted, valid, errors[]}` |
231
+ | `assert` | Assert conditions: `{url: {contains\|equals\|matches}}` or `{text}` or `{selector, text}` |
232
+ | `elementsAt` | Coordinate lookup: `{x,y}` (point), `[{x,y},...]` (batch), `{x,y,radius}` (nearby) |
233
+ | `frame` | Frame ops: `"selector"` (switch), `0` (by index), `"top"` (main frame), `{name}`, `{list:true}` |
234
+ | `viewport` | Set viewport: `"iphone-14"` or `{width, height, mobile}` |
235
+ | `cookies` | Get/set/delete: `{get: true}`, `{set: [...]}`, `{delete: "name"}`, `{clear: true}` |
236
+ | `console` | Browser console: `true` or `{level, limit, clear}` → `{messages[]}` |
237
+ | `pdf` | Generate PDF: `"filename"` or `{path, landscape, scale, pageRanges}` |
238
+ | `back` / `forward` | History navigation: `true` → `{url, title}` or `{noHistory: true}` |
239
+ | `reload` | Reload page: `true` or `{waitUntil}` |
240
+ | `newTab` | Create new tab (renamed from `openTab`): `"url"` or `{url, wait}` → `{tab, url}` |
241
+ | `switchTab` | Switch to existing tab (renamed from `connectTab`): `"t1"` or `{targetId}` or `{url: "regex"}` |
242
+ | `closeTab` | Close tab: `"t1"` or `true` (current) |
243
+ | `listTabs` | List all tabs: `true` → `{tabs[]}` |
244
+ | `waitForNavigation` | Wait for nav: `true` or `{timeout, waitUntil}` |
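As a rough illustration of two rows from the table, the sketch below builds a `sleep` step and an `assert` step. The builder functions are hypothetical (not part of cdp-skill), and the `{steps: [...]}` wrapper is assumed from the CLI example in the earlier version of this file.

```javascript
// Hypothetical builders (not part of cdp-skill) for two steps from the table.

// `sleep` takes a delay in milliseconds; the table gives the range 0-60000.
function sleepStep(ms) {
  if (!Number.isFinite(ms) || ms < 0 || ms > 60000) {
    throw new RangeError(`sleep expects 0 to 60000 ms, got ${ms}`);
  }
  return { sleep: ms };
}

// `assert` using the {url: {contains}} form from the table.
function assertUrlContains(fragment) {
  return { assert: { url: { contains: fragment } } };
}

// Assumed request shape: a top-level `steps` array.
const request = { steps: [sleepStep(2000), assertUrlContains("/dashboard")] };
console.log(JSON.stringify(request));
```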
1303
245
 
1304
246
  ### Optional Steps
1305
247
 
1306
- Add `"optional": true` to continue on failure:
1307
- ```json
1308
- {"click": "#maybe-exists", "optional": true}
1309
- ```
1310
-
1311
-
1312
- ## Debug Mode
1313
-
1314
- ### CLI Debug Logging
1315
-
1316
- Use the `--debug` flag to log all requests and responses to a `log/` directory:
1317
- ```bash
1318
- node src/cdp-skill.js --debug '{"steps":[{"goto":"https://google.com"}]}'
1319
- ```
1320
-
1321
- Creates files like:
1322
- - `log/001-chromeStatus.ok.json` - No tab (chromeStatus doesn't use tabs)
1323
- - `log/002-t1-openTab.ok.json` - Tab ID included when available
1324
- - `log/003-t1-click-fill.ok.json` - Multiple actions shown
1325
- - `log/004-t1-scroll.error.json` - Error cases include `.error` suffix
1326
-
1327
- Files are numbered sequentially and include tab ID + action names for easy identification.
1328
-
1329
- ### Config-Based Debug
1330
-
1331
- Capture screenshots/DOM before and after each action:
1332
- ```json
1333
- {
1334
- "config": {
1335
- "debug": true,
1336
- "debugOptions": {"captureScreenshots": true, "captureDom": true}
1337
- },
1338
- "steps": [...]
1339
- }
1340
- ```
1341
-
1342
- Debug output goes to the platform temp directory by default. Set `"outputDir": "/path/to/dir"` to override.
1343
-
1344
-
1345
- ## Not Supported
1346
-
1347
- Handle via multiple invocations:
1348
- - Conditional logic / loops
1349
- - Variables / templating
1350
- - File uploads
1351
- - Dialog handling (alert, confirm)
1352
-
248
+ Add `"optional": true` to any step to continue on failure (status becomes "skipped").
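For instance, a step can be marked optional by adding the flag alongside its action key. `asOptional` below is a hypothetical helper, not part of cdp-skill; the `#maybe-exists` selector comes from the earlier version of this section.

```javascript
// Hypothetical helper (not part of cdp-skill): return a copy of a step
// with "optional": true added, leaving the original step object untouched.
function asOptional(step) {
  return { ...step, optional: true };
}

const step = asOptional({ click: "#maybe-exists" });
console.log(JSON.stringify(step)); // {"click":"#maybe-exists","optional":true}
```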
1353
249
 
1354
250
  ## Troubleshooting
1355
251
 
1356
252
  | Issue | Solution |
1357
253
  |-------|----------|
1358
- | Tabs accumulating | Include `tab` in config |
1359
- | CONNECTION error | Use `chromeStatus` first - it auto-launches Chrome |
1360
- | Chrome not found | Set `CHROME_PATH` environment variable |
254
+ | Tabs accumulating | Include `tab` at top level |
255
+ | CONNECTION error | Check Chrome is reachable; use `chromeStatus` to diagnose |
256
+ | Chrome not found | Set `CHROME_PATH` env var |
1361
257
  | Element not found | Add `wait` step first |
1362
- | Clicks not working | Scroll element into view first, or use `force: true` |
1363
- | Click/hover timeout on animations | Use `force: true` - auto-force triggers after 10s timeout |
1364
- | `back` returns `noHistory: true` | New tabs start at `about:blank` with no history. Navigate first, then use `back` |
1365
- | Select dropdown not working | Use `click` + `click` (open then select), or `press` arrow keys |
1366
- | Type not appearing | Ensure input is focused with `click` first, then use `type` |
1367
- | "Chrome has no tabs" | Use `chromeStatus` - it auto-creates a tab |
1368
- | macOS: Chrome running but no CDP | `chromeStatus` launches a new instance with CDP enabled |
1369
-
1370
- ### macOS Chrome Behavior
1371
-
1372
- On macOS, Chrome continues running as a background process even after closing all windows. This can cause issues:
1373
-
1374
- 1. **Chrome running without CDP port**: If Chrome was started normally (not via this skill), it won't have the CDP debugging port enabled. The skill detects this and launches a **new** Chrome instance with CDP enabled (it never closes your existing Chrome).
1375
-
1376
- 2. **Chrome running with CDP but no tabs**: After closing all Chrome windows, the process may still be listening on the CDP port but have no tabs. The skill automatically creates a new tab in this case.
1377
-
1378
- The `chromeStatus` step handles both scenarios automatically:
1379
- ```json
1380
- {"chromeStatus": true}
1381
- ```
1382
-
1383
- Response when Chrome needed intervention:
1384
- ```json
1385
- {
1386
- "running": true,
1387
- "launched": true,
1388
- "createdTab": true,
1389
- "note": "Chrome was running without CDP port. Launched new instance with debugging enabled. Created new tab.",
1390
- "tabs": [{"targetId": "ABC123", "url": "about:blank", "title": ""}]
1391
- }
1392
- ```
1393
-
1394
- **Important**: The skill never closes Chrome windows or processes without explicit user action. It only launches new instances or creates new tabs as needed.
258
+ | Clicks not working | Scroll into view first, or `force: true` |
259
+ | `back` returns noHistory | New tabs start at about:blank; navigate first |
260
+ | Select dropdown not working | Use click+click or press arrow keys |
261
+ | Type not appearing | Click input first to focus, then type |
262
+ | Elements missing from snapshot | Custom widgets may lack ARIA roles; use `pageFunction` or `get` with `mode: "html"` as fallback |
263
+ | macOS: Chrome running but no CDP | `chromeStatus` launches new instance with CDP enabled |
1395
264
 
1396
265
  ## Best Practices
1397
266
 
1398
- 1. **NEVER launch Chrome directly** - Always use `chromeStatus` to manage Chrome. Do NOT run shell commands like `open -a "Google Chrome"` or spawn Chrome processes yourself. The skill handles all Chrome lifecycle management including:
1399
- - Launching Chrome with the correct debugging flags
1400
- - Detecting existing Chrome instances
1401
- - Creating tabs when needed
1402
- - Handling macOS background process issues
1403
-
1404
- 2. **Use openTab to create your tab** - Your first call must include `{"openTab":"url"}` as the first step to create a new tab. Use the returned `tab` id (e.g., "t1") for subsequent calls.
1405
-
1406
- 3. **Reuse only your own tabs** - Always pass `tab` from your previous response; other agents may be using the same browser
1407
-
1408
- 4. **Clean up your tab when done** - After completing your test (pass or fail), close your tab unless instructed otherwise:
1409
- ```json
1410
- {"closeTab": "YOUR_TARGET_ID"}
1411
- ```
1412
-
1413
- 5. **Discover before interacting** - Use `inspect` and `snapshot` to understand page structure
1414
-
1415
- 6. **Use website navigation** - Click links and submit forms; don't guess URLs
1416
-
1417
- 7. **Be persistent** - Try alternative selectors, add waits, scroll first
1418
-
1419
- 8. **Prefer refs** - Use `snapshot` + refs over brittle CSS selectors
1420
-
1421
- ## Feedback
1422
-
1423
- If you encounter limitations, bugs, or feature requests that would significantly improve automation capabilities, please report them to the skill maintainer.
1424
- If you spot opportunities for speeding things up raise this in your results as well.
267
+ - **Handle `actionRequired` immediately**: when a response contains this field, complete it before doing anything else
268
+ - **Never launch Chrome directly**: `newTab` handles it automatically
269
+ - **Use newTab** as your first step to create a tab; use the returned tab ID for all subsequent calls
270
+ - **Reuse only your own tabs**: other agents may share the browser
271
+ - **Update the site profile before closing** — add any quirks, selectors, or recipes you discovered
272
+ - **Close your tab when done** — `closeTab` with your tab ID
273
+ - **Discover before interacting**: use `snapshot` to understand the page structure
274
+ - **Use website navigation** — click links and submit forms; don't guess URLs
275
+ - **Prefer refs** over CSS selectors: use `snapshot` + refs for resilient targeting
276
+ - **Check `newTabs` after click** — clicks on `target="_blank"` links report new tabs; use `switchTab` to switch
277
+ - **Use `switchTab` for popups**: connect by alias (`"t2"`), targetId, or URL regex (`{url: "pattern"}`)
278
+ - **Be persistent** — try alternative selectors, add waits, scroll first
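The tab lifecycle described in these practices (create with `newTab`, reuse the returned tab ID, close when done) might be sketched as three request builders. The function names are hypothetical and not part of cdp-skill; the top-level `tab` and `steps` keys are taken from this document, and the tab ID is assumed to come from the `newTab` response.

```javascript
// Hypothetical request builders (not part of cdp-skill).

// First call: create a tab with newTab as the first step.
function openRequest(url) {
  return { steps: [{ newTab: url }] };
}

// Later calls: pass the tab ID returned by newTab at the top level.
function followUpRequest(tabId, steps) {
  return { tab: tabId, steps };
}

// Final call: close your own tab by ID.
function closeRequest(tabId) {
  return { tab: tabId, steps: [{ closeTab: tabId }] };
}

// "t1" stands in for the tab ID a real newTab response would return.
console.log(JSON.stringify(openRequest("https://example.com")));
console.log(JSON.stringify(followUpRequest("t1", [{ snapshot: true }])));
console.log(JSON.stringify(closeRequest("t1")));
```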