cdp-skill 1.0.8 → 1.0.15

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (51)
  1. package/README.md +80 -35
  2. package/SKILL.md +157 -241
  3. package/install.js +1 -0
  4. package/package.json +1 -1
  5. package/src/aria/index.js +8 -0
  6. package/src/aria/output-processor.js +173 -0
  7. package/src/aria/role-query.js +1229 -0
  8. package/src/aria/snapshot.js +459 -0
  9. package/src/aria.js +251 -50
  10. package/src/cdp/browser.js +22 -4
  11. package/src/cdp-skill.js +246 -69
  12. package/src/dom/LazyResolver.js +634 -0
  13. package/src/dom/click-executor.js +366 -94
  14. package/src/dom/element-locator.js +34 -25
  15. package/src/dom/fill-executor.js +83 -50
  16. package/src/dom/index.js +3 -0
  17. package/src/page/dialog-handler.js +119 -0
  18. package/src/page/page-controller.js +236 -3
  19. package/src/runner/context-helpers.js +33 -55
  20. package/src/runner/execute-dynamic.js +8 -7
  21. package/src/runner/execute-form.js +11 -11
  22. package/src/runner/execute-input.js +2 -2
  23. package/src/runner/execute-interaction.js +105 -126
  24. package/src/runner/execute-navigation.js +14 -29
  25. package/src/runner/execute-query.js +17 -11
  26. package/src/runner/step-executors.js +225 -84
  27. package/src/runner/step-registry.js +1064 -0
  28. package/src/runner/step-validator.js +16 -754
  29. package/src/tests/Aria.test.js +1025 -0
  30. package/src/tests/ClickExecutor.test.js +170 -50
  31. package/src/tests/ContextHelpers.test.js +41 -30
  32. package/src/tests/ExecuteBrowser.test.js +572 -0
  33. package/src/tests/ExecuteDynamic.test.js +2 -457
  34. package/src/tests/ExecuteForm.test.js +700 -0
  35. package/src/tests/ExecuteInput.test.js +540 -0
  36. package/src/tests/ExecuteInteraction.test.js +319 -0
  37. package/src/tests/ExecuteQuery.test.js +820 -0
  38. package/src/tests/FillExecutor.test.js +89 -37
  39. package/src/tests/LazyResolver.test.js +383 -0
  40. package/src/tests/StepValidator.test.js +224 -78
  41. package/src/tests/TestRunner.test.js +38 -27
  42. package/src/tests/integration.test.js +2 -1
  43. package/src/types.js +9 -9
  44. package/src/utils/backoff.js +118 -0
  45. package/src/utils/cdp-helpers.js +130 -0
  46. package/src/utils/devices.js +140 -0
  47. package/src/utils/errors.js +242 -0
  48. package/src/utils/index.js +65 -0
  49. package/src/utils/temp.js +75 -0
  50. package/src/utils/validators.js +433 -0
  51. package/src/utils.js +14 -1142
package/README.md CHANGED
@@ -8,33 +8,43 @@ A lightweight, zero-dependency browser automation library using Chrome DevTools
  - **AI-agent optimized** - JSON in, JSON out; designed for LLM tool use
  - **Auto-launch Chrome** - Detects and starts Chrome automatically on macOS, Linux, Windows
  - **Accessibility-first** - ARIA snapshots with element refs for resilient automation
- - **Battle-tested** - 600+ unit tests
+ - **Site profiles** - Per-domain knowledge files that agents build and share across sessions
+ - **Battle-tested** - 1,150+ unit tests
 
  ## Quick Start
 
  ```bash
- # Check Chrome status (auto-launches if needed)
- node src/cdp-skill.js '{"steps":[{"chromeStatus":true}]}'
+ # Open a tab (Chrome auto-launches if needed)
+ node src/cdp-skill.js '{"steps":[{"newTab":"https://google.com"}]}'
 
- # Navigate to a page
- node src/cdp-skill.js '{"steps":[{"goto":"https://google.com"}]}'
+ # Use the returned tab ID for subsequent calls
+ node src/cdp-skill.js '{"tab":"t1","steps":[{"click":"#btn"}]}'
+
+ # Non-default Chrome (rare)
+ node src/cdp-skill.js '{"steps":[{"newTab":{"url":"https://google.com","port":9333,"headless":true}}]}'
  ```
 
  ## Features
 
+ ### Site Profiles
+ - **Per-domain knowledge** - Agents record quirks, selectors, and strategies at `~/.cdp-skill/sites/{domain}.md`
+ - **Automatic prompting** - `goto`/`newTab` returns `actionRequired` for unknown sites, `siteProfile` for known ones
+ - **Read/write** - `readSiteProfile` and `writeSiteProfile` steps for ad-hoc profile access
+ - **Collaborative** - Multiple agents share and improve profiles across sessions
+
  ### Chrome Management
  - **Auto-launch** - Detects Chrome path on macOS/Linux/Windows, launches with remote debugging
- - **Status check** - `chromeStatus` step reports running state, version, and open tabs
+ - **Status check** - `chromeStatus` step for diagnostics (optional `newTab` handles launch automatically)
  - **Multi-agent safe** - Multiple agents can share Chrome; each manages their own tabs
- - **Headless support** - Run Chrome without UI via `{"chromeStatus":{"headless":true}}`
+ - **Headless support** - Run Chrome without UI via `{"newTab":{"url":"...","headless":true}}`
 
  ### Navigation
- - **URL navigation** - `goto`, `back`, `forward`
+ - **URL navigation** - `goto`, `back`, `forward`, `reload`
  - **Wait conditions** - Network idle, DOM ready, element visible, text appears, URL changes
  - **Navigation detection** - Automatic navigation tracking after clicks
 
  ### Element Interaction
- - **Click** - CSS selectors, ARIA refs, or x/y coordinates
+ - **Click** - CSS selectors, ARIA refs, text content, or x/y coordinates
  - **Fill & Type** - Input filling with React/controlled component support
  - **Keyboard** - Key presses, combos (`Control+a`, `Meta+Shift+Enter`)
  - **Hover** - Mouse over with configurable duration
  - **Hover** - Mouse over with configurable duration
@@ -50,32 +60,42 @@ node src/cdp-skill.js '{"steps":[{"goto":"https://google.com"}]}'
  - **Pointer events** - CSS pointer-events not disabled
  - **Auto-force** - Retries with force when actionability times out
 
+ ### Action Hooks
+ - **readyWhen** - Poll a condition before executing the action
+ - **settledWhen** - Poll a condition after the action completes
+ - **observe** - Run a function after settlement, return value in response
+
  ### Accessibility & Queries
  - **ARIA snapshots** - Get accessibility tree as YAML with clickable refs
+ - **Snapshot search** - Find elements by text, pattern, or role within snapshots
  - **Role queries** - Find elements by ARIA role (`button`, `textbox`, `link`, etc.)
  - **CSS queries** - Traditional selector-based queries
  - **Multi-query** - Batch multiple queries in one step
  - **Page inspection** - Quick overview of page structure
- - **Coordinate discovery** - `refAt`, `elementsAt`, `elementsNear` for visual-based targeting
+ - **Coordinate discovery** - `elementsAt` for point, batch, and nearby visual-based targeting
+
+ ### Dynamic Browser Execution
+ - **pageFunction** - Run agent-generated JavaScript in the browser with serialized return values
+ - **poll** - Poll a predicate function until truthy or timeout
 
  ### Frame Support
- - **List frames** - Enumerate all iframes
- - **Switch context** - Execute in iframe by selector, index, or name
+ - **frame** - Unified step: `"selector"` (switch), `0` (by index), `"top"` (main), `{list: true}` (enumerate)
  - **Cross-origin detection** - Identifies cross-origin frames in snapshots
  - **Shadow DOM** - Pierce shadow roots with `pierceShadow` option in snapshots
 
  ### Screenshots & PDF
- - **Viewport capture** - Current view
+ - **Auto-capture** - Screenshots taken on every visual action
  - **Full page** - Entire scrollable area
  - **Element capture** - Specific element by selector
  - **PDF generation** - With metadata (page count, dimensions)
- - **Temp directory** - Auto-saves to platform temp dir for relative paths
 
  ### Data Extraction
- - **Text/HTML/attributes** - Extract content from elements
+ - **get** - Unified content extraction with modes: text, html, value, box, attributes
+ - **getUrl / getTitle** - Convenience shortcuts for common page metadata
+ - **Structured extraction** - Tables and lists with auto-detection via `get`
  - **Console logs** - Capture browser console output
  - **Cookies** - Get, set, delete with expiration support
- - **JavaScript eval** - Execute code in page context with serialization
+ - **pageFunction** - Execute JS functions or bare expressions in page context with serialization
 
  ### Form Handling
  - **Fill form** - Multiple fields in one step
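
The new Frame Support entry in the hunk above folds frame listing and switching into a single `frame` step with four argument forms. The sketch below writes each form out as a step payload, assuming the same `{"tab": ..., "steps": [...]}` envelope shown in the Quick Start. The selector `iframe#checkout`, the helper `payloadFor`, and the key names in `frameSteps` are illustrative only; the exact accepted shapes should be checked against SKILL.md.

```javascript
// Sketch only: the four `frame` forms named in the README diff, as step objects.
// payloadFor and the frameSteps key names are hypothetical, chosen for this note.

const frameSteps = {
  switchBySelector: { frame: "iframe#checkout" }, // hypothetical selector
  switchByIndex: { frame: 0 },                    // frame by index
  backToMain: { frame: "top" },                   // return to the main frame
  listFrames: { frame: { list: true } },          // enumerate frames
};

function payloadFor(tab, step) {
  // Envelope shape assumed from the Quick Start: a tab ID plus a steps array.
  return JSON.stringify({ tab, steps: [step] });
}

for (const [name, step] of Object.entries(frameSteps)) {
  console.log(name, payloadFor("t1", step));
}
```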
@@ -93,34 +113,59 @@ node src/cdp-skill.js '{"steps":[{"goto":"https://google.com"}]}'
  - **Mobile mode** - Touch events, mobile user agent
 
  ### Tab Management
- - **List tabs** - See all open tabs with targetId
- - **Close tabs** - Clean up when done
- - **Tab reuse** - Pass targetId to reuse existing tab
-
- ### Debug Mode
- - **Before/after screenshots** - Capture state around each action
- - **DOM snapshots** - HTML at each step
- - **Output to temp dir** - Automatic cleanup-friendly location
+ - **Open/close tabs** - Create and clean up tabs
+ - **List tabs** - See all open tabs
+ - **Tab reuse** - Pass tab ID to reuse existing tab across CLI invocations
 
  ## Documentation
 
  - **[SKILL.md](./SKILL.md)** - Complete step reference and API documentation
- - **[src/](./src/)** - Source code with JSDoc comments
+ - **[EXAMPLES.md](./EXAMPLES.md)** - JSON examples, response shapes, and worked patterns
 
  ## Architecture
 
  ```
  src/
- ├── cdp-skill.js # CLI entry point
- ├── cdp.js # CDP connection, discovery, Chrome launcher
- ├── page.js # Page controller, navigation, cookies
- ├── dom.js # Element location, input emulation, clicks
- ├── aria.js # Accessibility snapshots, role queries
- ├── capture.js # Screenshots, PDF, console, network
- ├── diff.js # Snapshot diffing, context capture
- ├── runner.js # Step validation and execution
- ├── utils.js # Errors, key validation, device presets
- └── index.js # Public API exports
+ ├── cdp-skill.js # CLI entry point, JSON parsing, response assembly
+ ├── aria.js # Accessibility snapshots, role queries
+ ├── diff.js # Snapshot diffing, viewport change detection
+ ├── utils.js # Errors, key validation, device presets
+ ├── constants.js # Shared constants
+ ├── index.js # Public API exports
+ ├── cdp/ # CDP connection layer
+ │ ├── browser.js # Chrome launcher, path detection
+ │ ├── connection.js # WebSocket CDP connection
+ │ ├── discovery.js # Tab discovery, target filtering
+ │ └── target-and-session.js # Target attachment, session management
+ ├── page/ # Page-level operations
+ │ ├── page-controller.js # Navigation, frame switching, eval
+ │ ├── cookie-manager.js # Cookie get/set/delete
+ │ ├── dom-stability.js # DOM mutation and stability detection
+ │ └── wait-utilities.js # Wait conditions (selector, text, URL)
+ ├── dom/ # Element interaction
+ │ ├── element-locator.js # CSS/ref/text element finding
+ │ ├── actionability.js # Visibility, stability, pointer-events checks
+ │ ├── click-executor.js # Click dispatch (CDP, JS, coordinate)
+ │ ├── fill-executor.js # Input filling, React support
+ │ ├── input-emulator.js # Keyboard/mouse CDP commands
+ │ └── element-handle.js # Element box model, scrolling
+ ├── capture/ # Output capture
+ │ ├── screenshot-capture.js # Viewport and full-page screenshots
+ │ ├── pdf-capture.js # PDF generation
+ │ ├── console-capture.js # Console log capture
+ │ ├── eval-serializer.js # JS value serialization
+ │ └── error-aggregator.js # Error collection and formatting
+ └── runner/ # Step orchestration
+   ├── step-executors.js # Main step dispatch and execution
+   ├── step-validator.js # Step definition validation
+   ├── context-helpers.js # Step types, action context
+   ├── execute-dynamic.js # pageFunction, poll, site profiles
+   ├── execute-interaction.js # click, hover, drag
+   ├── execute-input.js # fill (focused), selectOption, selectText
+   ├── execute-navigation.js # wait, scroll, waitForNavigation
+   ├── execute-query.js # snapshot, query, inspect, get, etc.
+   ├── execute-form.js # submit, assert
+   └── execute-browser.js # pdf, cookies, console, tabs
  ```
 
  ## Requirements