crawlio-browser 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,245 @@
+ ---
+ name: browser-automation
+ description: Use this skill when the user asks to interact with a browser, take screenshots, inspect a page, capture network traffic, detect frameworks, click elements, fill forms, or automate any browser task. Orchestrates crawlio-agent's 92 browser tools via the search + execute + connect_tab interface.
+ allowed-tools: mcp__crawlio-browser__search, mcp__crawlio-browser__execute, mcp__crawlio-browser__connect_tab
+ ---
+
+ # Browser Automation with Crawlio Agent
+
+ ## When to Use
+
+ Use this skill when the user wants to:
+ - Inspect, test, or interact with a live web page
+ - Take screenshots or capture accessibility snapshots
+ - Monitor network traffic, console logs, or errors
+ - Detect frameworks (React, Vue, Angular, Next.js, etc.)
+ - Click buttons, fill forms, type text, or navigate
+ - Read cookies, localStorage, sessionStorage, or IndexedDB
+ - Capture performance metrics, security state, or service workers
+ - Automate multi-step browser workflows
+
+ ## Connection (Always First)
+
+ Before any browser operation, connect to a tab:
+
+ ```
+ connect_tab({ url: "https://example.com" })
+ ```
+
+ - Opens a new tab if no matching tab is found
+ - Attaches the CDP debugger automatically
+ - Omit `url` to connect to the currently active tab
+
+ Check connection status at any time:
+ ```js
+ return await bridge.send({ type: "get_connection_status" })
+ ```
+
+ ## Core Patterns via `execute`
+
+ All browser commands run inside the `execute` tool. The code has access to:
+ - `bridge`: WebSocket bridge to the Chrome extension
+ - `crawlio`: HTTP client for the Crawlio desktop app
+ - `smart`: auto-waiting wrappers with framework-aware data accessors
+ - `sleep(ms)`: async wait (max 30s)
+ - `TIMEOUTS`: per-command timeout constants
+
+ ### Navigation
+
+ ```js
+ return await smart.navigate("https://example.com")
+ ```
+
+ ### Screenshots
+
+ ```js
+ return await smart.screenshot()
+ ```
+
+ Returns a base64-encoded PNG. Use `{ fullPage: true }` for full-page capture.
+
+ ### Click an Element
+
+ ```js
+ return await smart.click("button.submit")
+ ```
+
+ Auto-waits for the element to be visible and actionable before clicking.
+
+ ### Type Text
+
+ ```js
+ await smart.type("input[name='email']", "user@example.com")
+ return await smart.snapshot()
+ ```
+
+ ### Fill a Form
+
+ ```js
+ await bridge.send({
+ type: "browser_fill_form",
+ selector: "form#login",
+ values: { username: "admin", password: "secret" }
+ })
+ return await smart.snapshot()
+ ```
+
+ ### Accessibility Snapshot
+
+ ```js
+ return await smart.snapshot()
+ ```
+
+ Returns the a11y tree. Use this to discover available elements when selectors fail.
+
+ ### Evaluate JavaScript
+
+ ```js
+ return await smart.evaluate("document.title")
+ ```
+
+ ### Framework Detection
+
+ ```js
+ return await bridge.send({ type: "detect_framework" })
+ ```
+
+ Returns the detected framework name, version, and metadata.
+
+ ### Framework-Specific Data
+
+ After connecting to a page, `smart` auto-detects the framework and exposes typed accessors:
+
+ ```js
+ // React
+ const reactVersion = await smart.react?.getVersion()
+
+ // Next.js
+ const nextData = await smart.nextjs?.getData()
+
+ // Vue
+ const vueConfig = await smart.vue?.getConfig()
+
+ // Redux store
+ const reduxState = await smart.redux?.getStoreState()
+
+ // Optional chaining yields undefined for any namespace that is absent
+ return { reactVersion, page: nextData?.page, buildId: nextData?.buildId, vueConfig, reduxState }
+ ```
+
+ Available framework namespaces: `react`, `vue`, `angular`, `svelte`, `nextjs`, `nuxt`, `remix`, `gatsby`, `shopify`, `wordpress`, `laravel`, `django`, `drupal`, `alpine`, `redux`, `jquery`.
+
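The optional-chaining pattern above can be wrapped in a small collector. This is a minimal sketch, not part of crawlio-browser: `collectFrameworkInfo` is a hypothetical helper, `smartLike` stands in for the `smart` object, and the only accessor names used are `getVersion` and `getData`, taken from this skill's examples and reference tables.

```javascript
// Hypothetical helper: gather data from whichever framework namespaces exist.
// `smartLike` is assumed to look like the `smart` object described above.
async function collectFrameworkInfo(smartLike) {
  const info = {};
  if (smartLike.react) info.react = await smartLike.react.getVersion();
  if (smartLike.vue) info.vue = await smartLike.vue.getVersion();
  if (smartLike.nextjs) {
    const data = await smartLike.nextjs.getData();
    // Keep only the two fields the example above reads.
    info.nextjs = { page: data?.page, buildId: data?.buildId };
  }
  return info;
}
```

Inside `execute`, this would presumably be called as `return await collectFrameworkInfo(smart)`; keys for frameworks that were not detected are simply left out of the result.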
+ ### Cookies
+
+ ```js
+ return await bridge.send({ type: "get_cookies" })
+ ```
+
+ ### Storage (localStorage / sessionStorage)
+
+ ```js
+ return await bridge.send({ type: "get_storage", storageType: "local" })
+ ```
+
+ ## Network Capture Pattern
+
+ Capture network traffic during interactions:
+
+ ```js
+ await bridge.send({ type: "start_network_capture" })
+
+ // ... perform interactions ...
+ await smart.click("button.load-data")
+ await sleep(2000)
+
+ return await bridge.send({ type: "stop_network_capture" })
+ ```
+
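Captured traffic is usually easier to read once it is condensed. The sketch below is a hypothetical post-processing step: `summarizeRequests` is not a crawlio-browser function, and the entry shape (`{ url, status }`) is an assumption, since the payload returned by `stop_network_capture` is not documented here.

```javascript
// Hypothetical helper. Assumed entry shape: { url: string, status: number }.
// The real payload of stop_network_capture may differ; adapt the field names.
function summarizeRequests(entries) {
  const byHost = {};
  const failed = [];
  for (const entry of entries) {
    const host = new URL(entry.url).host;
    byHost[host] = (byHost[host] || 0) + 1;
    // Treat any 4xx or 5xx status as a failed request.
    if (entry.status >= 400) failed.push({ url: entry.url, status: entry.status });
  }
  return { total: entries.length, byHost, failed };
}
```

Returning the summary instead of the raw capture keeps the `execute` result small when a page issues many requests.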
+ ## Console Logs
+
+ ```js
+ return await bridge.send({ type: "get_console_logs" })
+ ```
+
+ ## Page Capture (All-in-One)
+
+ Capture framework, network, console, DOM, and cookies in one call:
+
+ ```js
+ return await bridge.send({ type: "capture_page" })
+ ```
+
+ ## Tab Management
+
+ ```js
+ // Each call below is a standalone example; run one per execute call
+
+ // List all tabs
+ return await bridge.send({ type: "list_tabs" })
+
+ // Create a new tab
+ return await bridge.send({ type: "create_tab", url: "https://example.com" })
+
+ // Switch to a tab
+ return await bridge.send({ type: "switch_tab", tabId: 123 })
+
+ // Close a tab
+ return await bridge.send({ type: "close_tab", tabId: 123 })
+ ```
+
+ ## Discovery via Search
+
+ When you don't know the exact command, search first:
+
+ ```
+ search({ query: "cookies" })
+ ```
+
+ This returns matching command names, descriptions, and parameter schemas from the full catalog of 125 commands (92 browser + 33 desktop).
+
+ ## Desktop Integration (Crawlio App)
+
+ If the Crawlio desktop app is running, use the HTTP client:
+
+ ```js
+ // Each call below is a standalone example; run one per execute call
+
+ // Check status
+ return await crawlio.api("GET", "/status")
+
+ // Start a crawl
+ return await crawlio.api("POST", "/start", { url: "https://example.com" })
+
+ // Get settings
+ return await crawlio.api("GET", "/settings")
+
+ // Export site
+ return await crawlio.api("POST", "/export", { format: "zip", destinationPath: "/tmp/site.zip" })
+ ```
+
+ ## Error Handling
+
+ | Error | Solution |
+ |-------|----------|
+ | "No tab connected" | Call `connect_tab` first |
+ | "Element not found" | Use `smart.snapshot()` to see available elements, then adjust selector |
+ | "Extension disconnected" | Check that the Chrome extension is installed and the popup shows "Connected" |
+ | Timeout | Increase timeout: `await bridge.send({ type: "..." }, 60000)` |
+ | "Permission required" | Click the Crawlio extension icon and grant permissions |
+
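The table above can be folded into a lookup so that automation code reports the suggested fix next to the raw error. This is a minimal sketch: `suggestFix` is a hypothetical helper, not part of crawlio-browser, and it assumes the error text contains the quoted phrases from the table, which should be checked against real errors.

```javascript
// Hypothetical helper: map an error message to the remedy listed in the table.
// Assumes error text contains the quoted phrases; the timeout pattern is a guess.
const REMEDIES = [
  [/no tab connected/i, "Call connect_tab first"],
  [/element not found/i, "Run smart.snapshot() and adjust the selector"],
  [/extension disconnected/i, "Check that the Chrome extension is installed and connected"],
  [/timeout|timed out/i, "Pass a larger timeout as the second argument to bridge.send"],
  [/permission required/i, "Click the Crawlio extension icon and grant permissions"],
];

function suggestFix(error) {
  const message = error instanceof Error ? error.message : String(error);
  const match = REMEDIES.find(([pattern]) => pattern.test(message));
  // null means the message matched none of the known cases.
  return match ? match[1] : null;
}
```

A caller could wrap a command in `try`/`catch` and return `{ error: e.message, hint: suggestFix(e) }` rather than letting the failure surface without context.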
+ ## Multi-Step Workflow Example
+
+ ```js
+ // 1. Navigate to login page
+ await smart.navigate("https://app.example.com/login")
+
+ // 2. Fill credentials
+ await smart.type("#email", "user@example.com")
+ await smart.type("#password", "secret123")
+
+ // 3. Click login
+ await smart.click("button[type='submit']")
+ await sleep(2000)
+
+ // 4. Verify navigation
+ const title = await smart.evaluate("document.title")
+ return { title, url: await smart.evaluate("location.href") }
+ ```
+
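The fixed `sleep(2000)` in step 3 can be too short on a slow page and wasteful on a fast one. One alternative is to poll for a condition with a deadline. The sketch below is a generic pattern, not a crawlio-browser API: `waitUntil` and its option names are hypothetical, and it uses `setTimeout` so that it runs on its own (inside `execute`, the provided `sleep` could presumably be used instead).

```javascript
// Hypothetical helper: poll an async check until it returns a truthy value
// or the time budget is spent.
async function waitUntil(check, { timeoutMs = 10000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const value = await check();
    if (value) return value;
    // Stop if another full interval would overrun the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In the workflow above, step 3 might then end with `await waitUntil(async () => (await smart.evaluate("location.pathname")) !== "/login")`, assuming `smart.evaluate` resolves to the expression's value as the example in step 4 suggests.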
+ ## Reference
+
+ See [reference.md](./reference.md) for the full list of all 92 browser commands and 33 desktop commands with parameters.
@@ -0,0 +1,259 @@
+ # Crawlio Agent Command Reference
+
+ Full catalog of all commands available via `search` and `execute`.
+
+ ## Browser Commands (92)
+
+ Commands sent via `bridge.send({ type: "<command>", ...params })`.
+
+ ### Connection & Tab Management
+
+ | Command | Description | Key Parameters |
+ |---------|-------------|----------------|
+ | `connect_tab` | Connect to a browser tab (opens new if needed) | `url?`, `tabId?` |
+ | `disconnect_tab` | Disconnect from the current tab | - |
+ | `list_tabs` | List all open browser tabs | - |
+ | `get_connection_status` | Check current connection state | - |
+ | `reconnect_tab` | Reconnect to the last connected tab | - |
+ | `get_capabilities` | Get extension capabilities and version | - |
+ | `create_tab` | Create a new browser tab | `url` |
+ | `close_tab` | Close a specific tab | `tabId` |
+ | `switch_tab` | Switch to a specific tab | `tabId` |
+
+ ### Navigation & Interaction
+
+ | Command | Description | Key Parameters |
+ |---------|-------------|----------------|
+ | `browser_navigate` | Navigate to a URL | `url` |
+ | `browser_click` | Click an element | `selector`, `button?`, `modifiers?` |
+ | `browser_type` | Type text into an element | `selector`, `text`, `slowly?`, `submit?` |
+ | `browser_press_key` | Press a keyboard key | `key`, `modifiers?` |
+ | `browser_hover` | Hover over an element | `selector` |
+ | `browser_select_option` | Select dropdown option | `selector`, `value?`, `label?`, `index?` |
+ | `browser_wait` | Wait for a specified duration | `ms` |
+ | `browser_fill_form` | Fill multiple form fields at once | `selector`, `values` |
+ | `browser_scroll` | Scroll the page or element | `x?`, `y?`, `selector?`, `direction?` |
+ | `browser_double_click` | Double-click an element | `selector` |
+ | `browser_drag` | Drag from one element to another | `sourceSelector`, `targetSelector` |
+ | `browser_file_upload` | Upload a file to an input | `selector`, `filePath` |
+ | `browser_evaluate` | Execute JavaScript in the page | `expression`, `returnByValue?` |
+ | `browser_snapshot` | Capture accessibility tree snapshot | - |
+ | `browser_wait_for` | Wait for element to appear | `selector`, `timeout?` |
+ | `browser_intercept` | Intercept and modify network requests | `urlPattern`, `action`, `responseBody?`, `statusCode?` |
+
+ ### Data Capture
+
+ | Command | Description | Key Parameters |
+ |---------|-------------|----------------|
+ | `capture_page` | Full page capture (framework, network, console, DOM, cookies) | - |
+ | `detect_framework` | Detect JavaScript frameworks | - |
+ | `start_network_capture` | Start capturing network requests | - |
+ | `stop_network_capture` | Stop and return captured network requests | - |
+ | `get_console_logs` | Get console log entries | - |
+ | `get_cookies` | Get all cookies for the current page | `url?` |
+ | `get_dom_snapshot` | Get DOM snapshot | `depth?` |
+ | `take_screenshot` | Take a screenshot | `fullPage?`, `selector?`, `format?`, `quality?` |
+ | `get_response_body` | Get response body for a network request | `requestId` |
+ | `get_websocket_connections` | List active WebSocket connections | - |
+ | `get_websocket_messages` | Get messages for a WebSocket connection | `requestId`, `limit?` |
+
+ ### Cookies & Storage
+
+ | Command | Description | Key Parameters |
+ |---------|-------------|----------------|
+ | `set_cookie` | Set a cookie | `name`, `value`, `domain?`, `path?`, `secure?`, `httpOnly?`, `sameSite?`, `expires?` |
+ | `delete_cookies` | Delete cookies | `name?`, `domain?`, `url?` |
+ | `get_storage` | Get localStorage or sessionStorage | `storageType` |
+ | `set_storage` | Set a storage item | `storageType`, `key`, `value` |
+ | `clear_storage` | Clear storage | `storageType` |
+
+ ### Frames
+
+ | Command | Description | Key Parameters |
+ |---------|-------------|----------------|
+ | `get_frame_tree` | Get the frame tree hierarchy | - |
+ | `switch_to_frame` | Switch execution context to a frame | `frameId` |
+ | `switch_to_main_frame` | Switch back to the main frame | - |
+
+ ### Dialogs
+
+ | Command | Description | Key Parameters |
+ |---------|-------------|----------------|
+ | `get_dialog` | Get current dialog info (alert, confirm, prompt) | - |
+ | `handle_dialog` | Accept or dismiss a dialog | `accept`, `promptText?` |
+
+ ### Device Emulation
+
+ | Command | Description | Key Parameters |
+ |---------|-------------|----------------|
+ | `set_viewport` | Set viewport dimensions | `width`, `height`, `deviceScaleFactor?`, `isMobile?` |
+ | `set_user_agent` | Override the user agent string | `userAgent` |
+ | `emulate_device` | Emulate a device preset | `device` |
+ | `set_geolocation` | Set geolocation coordinates | `latitude`, `longitude`, `accuracy?` |
+
+ ### Network Control
+
+ | Command | Description | Key Parameters |
+ |---------|-------------|----------------|
+ | `emulate_network` | Emulate network conditions | `offline?`, `latency?`, `downloadThroughput?`, `uploadThroughput?` |
+ | `set_cache_disabled` | Enable or disable cache | `cacheDisabled` |
+ | `set_extra_headers` | Set extra HTTP headers | `headers` |
+ | `set_stealth_mode` | Enable stealth mode to avoid detection | `enabled` |
+
+ ### Security
+
+ | Command | Description | Key Parameters |
+ |---------|-------------|----------------|
+ | `get_security_state` | Get page security/TLS state | - |
+ | `ignore_certificate_errors` | Ignore certificate errors | `ignore` |
+
+ ### Service Workers
+
+ | Command | Description | Key Parameters |
+ |---------|-------------|----------------|
+ | `list_service_workers` | List registered service workers | - |
+ | `stop_service_worker` | Stop a service worker | `versionId` |
+ | `bypass_service_worker` | Bypass service worker for network | `bypass` |
+
+ ### DOM Manipulation
+
+ | Command | Description | Key Parameters |
+ |---------|-------------|----------------|
+ | `set_outer_html` | Set outerHTML of an element | `selector`, `html` |
+ | `set_attribute` | Set an attribute on an element | `selector`, `name`, `value` |
+ | `remove_attribute` | Remove an attribute from an element | `selector`, `name` |
+ | `remove_node` | Remove an element from the DOM | `selector` |
+ | `highlight_element` | Visually highlight an element | `selector`, `color?`, `duration?` |
+
+ ### Performance & Coverage
+
+ | Command | Description | Key Parameters |
+ |---------|-------------|----------------|
+ | `get_performance_metrics` | Get runtime performance metrics | - |
+ | `start_css_coverage` | Start CSS coverage collection | - |
+ | `stop_css_coverage` | Stop and return CSS coverage data | - |
+ | `start_js_coverage` | Start JS coverage collection | - |
+ | `stop_js_coverage` | Stop and return JS coverage data | - |
+ | `get_computed_style` | Get computed styles for an element | `selector`, `properties?` |
+ | `detect_fonts` | Detect fonts used on the page | `selector?` |
+ | `force_pseudo_state` | Force CSS pseudo state (hover, focus, etc.) | `selector`, `pseudoClasses` |
+ | `show_layout_shifts` | Visualize cumulative layout shifts | - |
+ | `show_paint_rects` | Show paint rectangles | `enabled` |
+
+ ### Memory & Debugging
+
+ | Command | Description | Key Parameters |
+ |---------|-------------|----------------|
+ | `get_dom_counters` | Get DOM node/event/document counters | - |
+ | `force_gc` | Force garbage collection | - |
+ | `take_heap_snapshot` | Capture a heap snapshot | - |
+ | `get_targets` | List all debugger targets | - |
+ | `attach_to_target` | Attach to a specific target | `targetId` |
+ | `create_browser_context` | Create an isolated browser context | - |
+
+ ### IndexedDB
+
+ | Command | Description | Key Parameters |
+ |---------|-------------|----------------|
+ | `get_databases` | List IndexedDB databases | `securityOrigin?` |
+ | `query_object_store` | Query an IndexedDB object store | `databaseName`, `objectStoreName`, `securityOrigin?`, `limit?` |
+ | `clear_database` | Clear an IndexedDB database | `databaseName`, `objectStoreName`, `securityOrigin?` |
+
+ ### Export & PDF
+
+ | Command | Description | Key Parameters |
+ |---------|-------------|----------------|
+ | `print_to_pdf` | Print page to PDF | `landscape?`, `displayHeaderFooter?`, `scale?`, `paperWidth?`, `paperHeight?` |
+ | `extract_site` | Full site extraction via Crawlio | `url`, `format?` |
+
+ ### Crawlio Desktop Bridge
+
+ | Command | Description | Key Parameters |
+ |---------|-------------|----------------|
+ | `get_crawl_status` | Get Crawlio crawl status | - |
+ | `get_enrichment` | Get enrichment data for a URL | `url?` |
+ | `get_crawled_urls` | Get list of crawled URLs | `status?`, `type?`, `limit?`, `offset?` |
+ | `enrich_url` | Submit enrichment data to Crawlio | `url`, `framework?`, `networkRequests?`, `consoleLogs?`, `domSnapshotJSON?` |
+
+ ## Desktop Commands (33)
+
+ Commands sent via `crawlio.api(method, path, body?)`. Requires the Crawlio desktop app to be running.
+
+ | Command | HTTP | Description |
+ |---------|------|-------------|
+ | `get_crawl_status` | `GET /status` | Engine state, progress counters |
+ | `get_crawl_logs` | `GET /logs` | Recent log entries (filterable by category, level) |
+ | `get_errors` | `GET /logs?level=error` | Error and fault-level logs |
+ | `get_downloads` | `GET /downloads` | All download items with status |
+ | `get_failed_urls` | `GET /failed-urls` | Failed downloads with error messages |
+ | `get_site_tree` | `GET /site-tree` | Downloaded files as directory tree |
+ | `start_crawl` | `POST /start` | Start a new crawl |
+ | `stop_crawl` | `POST /stop` | Stop the current crawl |
+ | `pause_crawl` | `POST /pause` | Pause the current crawl |
+ | `resume_crawl` | `POST /resume` | Resume a paused crawl |
+ | `get_settings` | `GET /settings` | Current crawl settings and policy |
+ | `update_settings` | `PATCH /settings` | Partial merge of settings/policy |
+ | `list_projects` | `GET /projects` | All saved crawl projects |
+ | `save_project` | `POST /projects` | Save current project |
+ | `load_project` | `POST /projects/{id}/load` | Load a saved project |
+ | `delete_project` | `DELETE /projects/{id}` | Delete a saved project |
+ | `get_project` | `GET /projects/{id}` | Full project details |
+ | `export_site` | `POST /export` | Export downloaded site (folder, zip, singleHTML, warc) |
+ | `get_export_status` | `GET /export/status` | Export state and progress |
+ | `extract_site_pipeline` | `POST /extract` | Run extraction pipeline |
+ | `get_extraction_status` | `GET /extract/status` | Extraction state and progress |
+ | `recrawl_urls` | `POST /recrawl` | Re-crawl specific URLs |
+ | `get_enrichment_data` | `GET /enrichment` | Browser enrichment data |
+ | `get_observations` | `GET /observations` | Observation timeline |
+ | `create_finding` | `POST /finding` | Create curated finding |
+ | `get_findings` | `GET /findings` | List curated findings |
+ | `get_crawled_urls_list` | `GET /crawled-urls` | Downloaded URLs with pagination |
+ | `trigger_capture` | `POST /capture` | WebKit runtime capture |
+ | `submit_enrichment_bundle` | `POST /enrichment/bundle` | Submit complete enrichment bundle |
+ | `submit_enrichment_framework` | `POST /enrichment/framework` | Submit framework detection |
+ | `submit_enrichment_network` | `POST /enrichment/network` | Submit network requests |
+ | `submit_enrichment_console` | `POST /enrichment/console` | Submit console logs |
+ | `submit_enrichment_dom` | `POST /enrichment/dom` | Submit DOM snapshot |
+
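Several rows in the table above carry an `{id}` placeholder in the path. The sketch below shows one way to turn a command name into the `method` and `path` arguments for `crawlio.api`. It is an illustration only: `DESKTOP_ROUTES` copies a handful of rows from the table, and `resolveRoute` is a hypothetical helper, not part of crawlio-browser.

```javascript
// A few rows copied from the table above, keyed by command name.
const DESKTOP_ROUTES = {
  get_crawl_status: ["GET", "/status"],
  start_crawl: ["POST", "/start"],
  get_project: ["GET", "/projects/{id}"],
  load_project: ["POST", "/projects/{id}/load"],
  delete_project: ["DELETE", "/projects/{id}"],
};

// Hypothetical helper: resolve a command to [method, path], filling {placeholders}.
function resolveRoute(command, params = {}) {
  const route = DESKTOP_ROUTES[command];
  if (!route) throw new Error(`Unknown desktop command: ${command}`);
  const [method, template] = route;
  const path = template.replace(/\{(\w+)\}/g, (_, key) => {
    if (params[key] === undefined) throw new Error(`Missing parameter: ${key}`);
    // Encode the value so it cannot break out of its path segment.
    return encodeURIComponent(String(params[key]));
  });
  return [method, path];
}
```

Inside `execute`, the result could be passed along as `const [method, path] = resolveRoute("load_project", { id }); return await crawlio.api(method, path)`.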
+ ## Smart Object Reference
+
+ The `smart` object provides auto-waiting wrappers and framework-specific data:
+
+ ### Core Methods
+
+ | Method | Description |
+ |--------|-------------|
+ | `smart.evaluate(expr)` | Raw JS evaluation via CDP |
+ | `smart.click(selector, opts?)` | Poll + click + 500ms settle |
+ | `smart.type(selector, text, opts?)` | Poll + type + 300ms settle |
+ | `smart.navigate(url, opts?)` | Navigate + 1000ms settle |
+ | `smart.waitFor(selector, timeout?)` | Poll until element is actionable |
+ | `smart.snapshot()` | Capture accessibility snapshot |
+ | `smart.screenshot()` | Take screenshot (returns base64 PNG) |
+ | `smart.rebuild()` | Force rebuild of framework namespaces |
+
+ ### Framework Namespaces
+
+ | Namespace | Methods |
+ |-----------|---------|
+ | `smart.react` | `getVersion`, `getRootCount`, `hasProfiler`, `isHookInstalled` |
+ | `smart.vue` | `getVersion`, `getAppCount`, `getConfig`, `isDevMode` |
+ | `smart.angular` | `getVersion`, `isDebugMode`, `isIvy`, `getRootCount`, `getState` |
+ | `smart.svelte` | `getVersion`, `getMeta`, `isDetected` |
+ | `smart.nextjs` | `getData`, `getRouter`, `getSSRMode`, `getRouteManifest` |
+ | `smart.nuxt` | `getData`, `getConfig`, `isSSR` |
+ | `smart.remix` | `getContext`, `getRouteData` |
+ | `smart.gatsby` | `getData`, `getPageData` |
+ | `smart.redux` | `isInstalled`, `getStoreState` |
+ | `smart.alpine` | `getVersion`, `getStoreKeys`, `getComponentCount` |
+ | `smart.shopify` | `getShop`, `getCart` |
+ | `smart.wordpress` | `isWP`, `getRestUrl`, `getPlugins` |
+ | `smart.laravel` | `getCSRF` |
+ | `smart.django` | `getCSRF` |
+ | `smart.drupal` | `getSettings` |
+ | `smart.jquery` | `getVersion` |
+
+ ## Links
+
+ - Extension install: https://crawlio.app/agent
+ - GitHub: https://github.com/Crawlio-app/crawlio-agent