@vpxa/aikit 0.1.144 → 0.1.145
- package/package.json +1 -1
- package/packages/browser/dist/index.js +11 -4
- package/packages/cli/dist/index.js +3 -3
- package/packages/cli/dist/{init-CeRqVSQt.js → init-BBmNDjFy.js} +1 -1
- package/packages/cli/dist/{templates-Uuiq1jc_.js → templates-BXyPFub1.js} +9 -4
- package/packages/cli/dist/{user-B6_6Sk9I.js → user-ZDsx66gQ.js} +1 -1
- package/scaffold/dist/adapters/claude-code.mjs +10 -6
- package/scaffold/dist/definitions/skills/aikit.mjs +1 -1
- package/scaffold/dist/definitions/skills/browser-use.mjs +661 -16
- package/scaffold/dist/definitions/tools.mjs +1 -1
@@ -1,6 +1,6 @@
 var e=[{file:`SKILL.md`,content:`---
 name: browser-use
-description: "Browser automation for AI agents using AI Kit's owned \`browser\` MCP tool. Triggered when: (1) repo-access exhausts its Strategy Ladder and auth requires browser interaction, (2) \`web_fetch\` returns login page HTML, SAML redirect, or CAPTCHA instead of content, (3) user needs to interact with web applications (fill forms, click buttons, extract data), (4) a site requires JavaScript rendering that \`web_fetch\` cannot handle, (5) user asks to browse, scrape, test, or automate a website. Uses AI Kit's owned Chromium runtime — no external MCP server dependency."
+description: "Browser automation for AI agents using AI Kit's owned \`browser\` MCP tool. Triggered when: (1) repo-access exhausts its Strategy Ladder and auth requires browser interaction, (2) \`web_fetch\` returns login page HTML, SAML redirect, or CAPTCHA instead of content, (3) user needs to interact with web applications (fill forms, click buttons, extract data), (4) a site requires JavaScript rendering that \`web_fetch\` cannot handle, (5) user asks to browse, scrape, test, or automate a website, or (6) another skill needs a standard recipe format for browser-driven workflows. Uses AI Kit's owned Chromium runtime and recipe patterns for domain-specific automation skills — no external MCP server dependency."
 metadata:
 category: cross-cutting
 domain: general
@@ -52,16 +52,16 @@ Use AI Kit's owned \`browser\` MCP tool to solve authentication barriers, extrac
 
 ## Browser Action Reference
 
-| Action | Purpose | Key
+| Action | Purpose | Key Params |
 |--------|---------|------------|
-| \`open\` |
-| \`read\` |
-| \`act\` |
-| \`navigate\` |
-| \`eval\` |
-| \`screenshot\` | Capture
-| \`dialog\` |
-| \`session\` |
+| \`open\` | Launch browser page | \`url\`, \`mode\` (\`ui\`/\`headless\`/\`panel\`), \`waitUntil\` |
+| \`read\` | Extract page content | \`pageId\`, \`readMode\` (\`snapshot\`/\`dom\`/\`markdown\`/\`text\`), \`selector\` |
+| \`act\` | DOM interactions | \`pageId\`, \`kind\` (\`click\`/\`type\`/\`press\`/\`hover\`/\`drag\`/\`select\`/\`scroll\`/\`upload\`) |
+| \`navigate\` | Page navigation | \`pageId\`, \`url\`/\`type\`/\`selector\` |
+| \`eval\` | Execute JavaScript | \`pageId\`, \`code\` |
+| \`screenshot\` | Capture screenshots | \`pageId\`, \`selector\`, \`fullPage\`, \`clip\`, \`format\`, \`quality\` |
+| \`dialog\` | Handle dialogs | \`pageId\`, \`accept\`, \`promptText\` |
+| \`session\` | Session management | \`sessionAction\` (\`list\`/\`close\`/\`cookies\`/\`set-cookie\`/\`delete-cookie\`/\`clear-cookies\`/\`get-storage\`/\`set-storage\`/\`clear-storage\`) |
 
 ## Core Workflow
 
@@ -129,6 +129,640 @@ await browser({ action: 'session', sessionAction: 'close', pageId })
 
 Use cookie export only when the user explicitly needs session transfer back into CLI tools.
 
+## Read Modes
+
+### Get ARIA snapshot (default)
+
+\`\`\`
+browser({ action: 'read', pageId })
+browser({ action: 'read', pageId, readMode: 'snapshot' })
+\`\`\`
+
+### Get page as clean markdown
+
+\`\`\`
+browser({ action: 'read', pageId, readMode: 'markdown' })
+\`\`\`
+
+### Get HTML content (full page or scoped)
+
+\`\`\`
+browser({ action: 'read', pageId, readMode: 'dom' })
+browser({ action: 'read', pageId, readMode: 'dom', selector: 'main' })
+\`\`\`
+
+### Get plain text
+
+\`\`\`
+browser({ action: 'read', pageId, readMode: 'text', selector: '.article-content' })
+\`\`\`
+
+## Advanced Screenshots
+
+### Capture specific region
+
+\`\`\`
+browser({ action: 'screenshot', pageId, clip: { x: 0, y: 0, width: 800, height: 600 } })
+\`\`\`
+
+### JPEG format with quality
+
+\`\`\`
+browser({ action: 'screenshot', pageId, format: 'jpeg', quality: 80 })
+\`\`\`
+
+### Element screenshot with format
+
+\`\`\`
+browser({ action: 'screenshot', pageId, selector: '.chart', format: 'png' })
+\`\`\`
+
+## Cookie Management
+
+### Set cookies
+
+\`\`\`
+browser({ action: 'session', sessionAction: 'set-cookie', confirm: true, cookies: [{ name: 'token', value: 'abc', domain: '.example.com', path: '/' }] })
+\`\`\`
+
+### Delete specific cookie
+
+\`\`\`
+browser({ action: 'session', sessionAction: 'delete-cookie', confirm: true, name: 'tracking' })
+\`\`\`
+
+### Clear all cookies
+
+\`\`\`
+browser({ action: 'session', sessionAction: 'clear-cookies', confirm: true })
+\`\`\`
+
+## Storage Access
+
+### Read all localStorage
+
+\`\`\`
+browser({ action: 'session', sessionAction: 'get-storage', pageId, storageType: 'localStorage' })
+\`\`\`
+
+### Read specific key
+
+\`\`\`
+browser({ action: 'session', sessionAction: 'get-storage', pageId, storageType: 'localStorage', storageKey: 'user-preferences' })
+\`\`\`
+
+### Set storage value
+
+\`\`\`
+browser({ action: 'session', sessionAction: 'set-storage', pageId, storageType: 'localStorage', storageKey: 'theme', storageValue: 'dark' })
+\`\`\`
+
+### Clear sessionStorage
+
+\`\`\`
+browser({ action: 'session', sessionAction: 'clear-storage', pageId, storageType: 'sessionStorage' })
+\`\`\`
+
+## Scroll and Upload
+
+### Scroll down
+
+\`\`\`
+browser({ action: 'act', pageId, kind: 'scroll', value: 'down 500' })
+\`\`\`
+
+### Scroll to top/bottom
+
+\`\`\`
+browser({ action: 'act', pageId, kind: 'scroll', value: 'top' })
+browser({ action: 'act', pageId, kind: 'scroll', value: 'bottom' })
+\`\`\`
+
+### Scroll element into view
+
+\`\`\`
+browser({ action: 'act', pageId, kind: 'scroll', selector: '#target-element' })
+\`\`\`
+
+### Upload file
+
+\`\`\`
+browser({ action: 'act', pageId, kind: 'upload', selector: 'input[type="file"]', value: '/path/to/file.pdf' })
+\`\`\`
+
+### Upload multiple files
+
+\`\`\`
+browser({ action: 'act', pageId, kind: 'upload', selector: 'input[type="file"]', value: '["/path/file1.pdf", "/path/file2.pdf"]' })
+\`\`\`
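In the multi-file call above, `value` is a string holding a JSON-encoded array, not a JavaScript array. A minimal sketch for building that string without hand-escaping quotes, assuming the tool accepts exactly this encoding. `uploadArgs` is a hypothetical helper and is not part of the browser tool.

```javascript
// Hypothetical helper: builds the argument object for an upload action.
// Assumption (taken from the examples above): one path is passed as a plain
// string, several paths as a JSON-encoded array string.
function uploadArgs(pageId, selector, paths) {
  const list = Array.isArray(paths) ? paths : [paths];
  if (list.length === 0) {
    throw new Error('at least one file path is required');
  }
  const value = list.length === 1 ? list[0] : JSON.stringify(list);
  return { action: 'act', pageId, kind: 'upload', selector, value };
}
```

The returned object could then be passed straight to `browser(...)`.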
+
+## Browser Automation Recipes
+
+The browser tool is the foundation for multi-step web automation. Use this section to standardize recipes that domain-specific skills can consume, extend, and execute without inventing their own browser workflow format.
+
+### Recipe Format
+
+A browser recipe is a markdown workflow with explicit metadata, variables, steps, and cleanup.
+
+#### Metadata
+
+- **Name** — Human-readable recipe name
+- **Trigger** — When the recipe should be used
+- **Target** — Domain, URL family, or app surface it operates on
+- **Mode** — \`headless\`, \`ui\`, or \`panel\`
+- **Requires Auth** — \`yes\` or \`no\`
+- **Destructive** — \`yes\` or \`no\`; destructive recipes require explicit user confirmation before execution
+
+#### Variables
+
+Define placeholders the agent must resolve before starting.
+
+- \`{{url}}\` — target URL
+- \`{{username}}\` — login or account identifier
+- \`{{file_path}}\` — file path for uploads
+
+For each variable, document what it means, whether the agent can infer it or must ask the user, whether it is sensitive, and an example value when that removes ambiguity.
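Placeholder resolution can be sketched as a plain string substitution that refuses to proceed while any `{{variable}}` is still unresolved. This is an illustration only: `resolveVariables` is a hypothetical helper, and the source does not say how an agent actually substitutes values.

```javascript
// Hypothetical helper: substitutes {{name}} placeholders in a recipe string.
// Throws when a placeholder has no value, so the caller can ask the user
// before any browser action runs.
function resolveVariables(template, values) {
  const missing = new Set();
  const resolved = template.replace(/\{\{(\w+)\}\}/g, (match, name) => {
    if (Object.prototype.hasOwnProperty.call(values, name)) {
      return String(values[name]);
    }
    missing.add(name);
    return match; // leave the placeholder visible in the error path
  });
  if (missing.size > 0) {
    throw new Error(`Unresolved variables: ${[...missing].join(', ')}`);
  }
  return resolved;
}
```

Failing loudly on a missing value keeps a literal `{{url}}` from ever reaching an `open` call.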
+
+#### Steps
+
+Each numbered step should include:
+
+1. **Action** — exact \`browser(...)\` call
+2. **Verify** — how to confirm the action succeeded
+3. **On Failure** — recovery path if verification fails
+4. **Extract** — data to capture for later steps or the final result
+
+#### Cleanup
+
+Cleanup always runs, even when earlier steps fail. Close pages, export only user-approved session state, and leave the browser runtime in a known state.
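The "cleanup always runs" rule maps naturally onto `try`/`finally`. The sketch below is a hedged illustration: `withPage` is a hypothetical wrapper, `browser` is injected as a parameter, and the `{ pageId }` response shape is an assumption based on the recipes saying the browser "returns a `pageId`".

```javascript
// Hypothetical wrapper: opens a page, runs the recipe body, and always
// closes the page afterwards, even when the body throws.
async function withPage(browser, openArgs, body) {
  const opened = await browser({ action: 'open', ...openArgs });
  const pageId = opened.pageId; // assumed response shape
  try {
    return await body(pageId);
  } finally {
    // Runs whether body() resolved or threw.
    await browser({ action: 'session', sessionAction: 'close', pageId });
  }
}
```

The original error still propagates to the caller after the page is closed, so a failed step can be reported as usual.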
+
+#### Recipe Skeleton
+
+\`\`\`markdown
+# Recipe: <Name>
+
+## Metadata
+- Name: <Human-readable name>
+- Trigger: <When to use it>
+- Target: <Domain or URL family>
+- Mode: headless
+- Requires Auth: no
+- Destructive: no
+
+## Variables
+- \`{{url}}\` — Target URL
+- \`{{selector}}\` — Primary element selector
+
+## Steps
+1. Open target
+- Action: \`browser({ action: 'open', url: '{{url}}', mode: 'headless' })\`
+- Verify: Browser returns a \`pageId\`
+- On Failure: Retry once with \`waitUntil: 'load'\` or \`mode: 'ui'\`
+- Extract: Save \`pageId\`
+
+2. Inspect page
+- Action: \`browser({ action: 'read', pageId, readMode: 'snapshot' })\`
+- Verify: Expected controls or content appear in output
+- On Failure: Reload and re-read
+- Extract: Save refs, selectors, visible labels
+
+## Cleanup
+- \`browser({ action: 'session', sessionAction: 'close', pageId })\`
+\`\`\`
+
+### Recipe Templates
+
+#### Recipe: Submit Web Form
+
+**Variables**
+
+- \`{{url}}\` — form page URL
+- \`{{fields}}\` — field values keyed by selector or control ref
+
+**Steps**
+
+1. Open page
+- Action: \`browser({ action: 'open', url: '{{url}}', mode: 'headless', waitUntil: 'domcontentloaded' })\`
+- Verify: Browser returns a \`pageId\`
+- On Failure: Retry once with \`waitUntil: 'load'\` or \`mode: 'ui'\`
+- Extract: Save \`pageId\`
+
+2. Read form structure
+- Action: \`browser({ action: 'read', pageId, readMode: 'snapshot' })\`
+- Verify: Form fields and submit button appear in output
+- On Failure: Re-read after reload or scope the read with a form selector
+- Extract: Required fields, labels, visible validation hints, selectors or refs
+
+3. Fill fields
+- Action: For text inputs, use \`browser({ action: 'act', pageId, kind: 'type', selector: fieldSelector, text: value })\`
+- Action: For dropdowns, use \`browser({ action: 'act', pageId, kind: 'select', selector: fieldSelector, value: optionValue })\`
+- Action: For checkboxes or radio buttons, use \`browser({ action: 'act', pageId, kind: 'click', selector: fieldSelector })\`
+- Verify: Re-read affected fields or take a screenshot after the batch
+- On Failure: Re-read page, correct the selector, retry the failed field once
+- Extract: Inline validation messages and any server-provided field defaults
+
+4. Verify form state
+- Action: \`browser({ action: 'screenshot', pageId, fullPage: true })\`
+- Verify: Screenshot shows required fields populated as expected
+- On Failure: Read visible validation messages with \`browser({ action: 'read', pageId, readMode: 'text' })\`
+- Extract: Evidence screenshot for the final report
+
+5. Submit
+- Action: \`browser({ action: 'act', pageId, kind: 'click', selector: 'button[type="submit"]' })\`
+- Verify: \`browser({ action: 'read', pageId, readMode: 'text' })\` shows a success message or the page navigates to a confirmation state
+- On Failure: Inspect validation errors, fix fields, retry once
+- Extract: Success text, destination URL, confirmation number if present
+
+6. Capture result
+- Action: \`browser({ action: 'read', pageId, readMode: 'markdown' })\`
+- Verify: Output contains the expected success state
+- On Failure: Fall back to \`readMode: 'text'\`
+- Extract: Confirmation content for downstream skills
+
+**Cleanup**
+
+- \`browser({ action: 'session', sessionAction: 'close', pageId })\`
+
+#### Recipe: Extract Data from Web Page
+
+**Variables**
+
+- \`{{url}}\` — target page URL
+- \`{{data_selector}}\` — selector for the content to extract
+- \`{{pagination_selector}}\` — selector for the next-page control, when pagination exists
+
+**Steps**
+
+1. Open page
+- Action: \`browser({ action: 'open', url: '{{url}}', mode: 'headless' })\`
+- Verify: Browser returns a \`pageId\`
+- On Failure: Retry with \`waitUntil: 'networkidle'\`
+- Extract: Save \`pageId\`
+
+2. Extract content
+- Action: \`browser({ action: 'read', pageId, readMode: 'markdown', selector: '{{data_selector}}' })\`
+- Verify: Output is non-empty and scoped to the requested selector
+- On Failure: Re-run with \`readMode: 'text'\` or confirm the selector with a snapshot read
+- Extract: Store extracted content for the current page
+
+3. Check for pagination
+- Action: \`browser({ action: 'read', pageId, readMode: 'snapshot' })\`
+- Verify: Snapshot shows either a next-page control or a clear end state
+- On Failure: Reload once, then re-read
+- Extract: Whether \`{{pagination_selector}}\` exists and appears enabled
+
+4. Advance when another page exists
+- Action: If \`{{pagination_selector}}\` is present and enabled, run \`browser({ action: 'act', pageId, kind: 'click', selector: '{{pagination_selector}}' })\`
+- Verify: \`browser({ action: 'navigate', pageId, type: 'waitFor', selector: '{{data_selector}}', timeoutMs: 30000 })\`
+- On Failure: Reload page and retry pagination once
+- Extract: Updated page content, page count, or cursor state
+
+5. Repeat until no more pages
+- Action: Return to step 2 after successful pagination
+- Verify: Loop exits only when the next-page control is missing or disabled
+- On Failure: Stop and report partial results
+- Extract: Aggregate page-by-page results
+
+**Cleanup**
+
+- \`browser({ action: 'session', sessionAction: 'close', pageId })\`
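Steps 2 through 5 of the extraction recipe form a loop, which can be sketched as follows. Several things here are assumptions rather than documented behavior: `browser` is injected, `hasNextPage` is a hypothetical predicate that inspects the snapshot output, and `maxPages` is an added safety cap the recipe itself does not mention.

```javascript
// Sketch of the pagination loop from the recipe above.
// hasNextPage(snapshot) is a caller-supplied (hypothetical) predicate that
// decides whether the next-page control is present and enabled.
async function extractAllPages(browser, pageId, opts) {
  const { dataSelector, paginationSelector, hasNextPage, maxPages = 50 } = opts;
  const pages = [];
  for (let i = 0; i < maxPages; i++) {
    // Step 2: extract content for the current page.
    pages.push(await browser({ action: 'read', pageId, readMode: 'markdown', selector: dataSelector }));
    // Step 3: look for a next-page control.
    const snapshot = await browser({ action: 'read', pageId, readMode: 'snapshot' });
    if (!hasNextPage(snapshot)) break;
    // Step 4: advance, then wait for the data region to reappear.
    await browser({ action: 'act', pageId, kind: 'click', selector: paginationSelector });
    await browser({ action: 'navigate', pageId, type: 'waitFor', selector: dataSelector, timeoutMs: 30000 });
  }
  return pages; // Step 5: page-by-page results, in order
}
```

The cap keeps a mis-detected "next" control from looping forever; per-step recovery is left out to keep the sketch short.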
+
+#### Recipe: Upload File to Web Service
+
+**Variables**
+
+- \`{{url}}\` — upload page URL
+- \`{{file_path}}\` — local file path to upload
+- \`{{file_input_selector}}\` — file input selector, usually \`input[type="file"]\`
+
+**Steps**
+
+1. Open upload page
+- Action: \`browser({ action: 'open', url: '{{url}}', mode: 'headless' })\`
+- Verify: Upload page loads successfully
+- On Failure: Retry with \`mode: 'ui'\`
+- Extract: Save \`pageId\`
+
+2. Inspect upload controls
+- Action: \`browser({ action: 'read', pageId, readMode: 'snapshot' })\`
+- Verify: File input and submit controls are present
+- On Failure: Re-read after reload or refine the selector
+- Extract: Confirm the file input selector and submit control
+
+3. Upload file
+- Action: \`browser({ action: 'act', pageId, kind: 'upload', selector: '{{file_input_selector}}', value: '{{file_path}}' })\`
+- Verify: Selected filename appears in the page or read output
+- On Failure: Verify the file exists, confirm the selector targets a real \`<input type="file">\`, retry once
+- Extract: Selected filename and any client-side validation message
+
+4. Submit upload
+- Action: \`browser({ action: 'act', pageId, kind: 'click', selector: '.upload-submit' })\`
+- Verify: \`browser({ action: 'navigate', pageId, type: 'waitFor', selector: '.upload-success', timeoutMs: 30000 })\`
+- On Failure: Read the page for upload errors, then retry once if recoverable
+- Extract: Completion state and resulting URL if visible
+
+5. Verify upload
+- Action: \`browser({ action: 'read', pageId, readMode: 'text' })\`
+- Verify: Output includes upload confirmation
+- On Failure: Take a screenshot and report an ambiguous completion state
+- Extract: Confirmation text, file URL, or server response summary
+
+**Cleanup**
+
+- \`browser({ action: 'session', sessionAction: 'close', pageId })\`
+
+#### Recipe: Authenticated Web Task
+
+**Variables**
+
+- \`{{login_url}}\` — login page URL
+- \`{{target_url}}\` — target page after login
+- \`{{username}}\` — account identifier, ask the user if not already known
+- \`{{password}}\` — sensitive; do not store or echo it, and prefer having the user type it directly in the browser UI
+
+**Steps**
+
+1. Open login page
+- Action: \`browser({ action: 'open', url: '{{login_url}}', mode: 'ui', waitUntil: 'domcontentloaded' })\`
+- Verify: Login page is visible
+- On Failure: Retry with \`waitUntil: 'load'\`
+- Extract: Save \`pageId\`
+
+2. Read login form
+- Action: \`browser({ action: 'read', pageId, readMode: 'snapshot' })\`
+- Verify: Username, password, and submit controls are visible
+- On Failure: Reload and re-read, or ask the user to describe the current page state
+- Extract: Login selectors, SSO options, and challenge indicators
+
+3. Enter credentials
+- Action: Ask the user for \`{{username}}\` if needed, then run \`browser({ action: 'act', pageId, kind: 'type', selector: usernameSelector, text: '{{username}}' })\`
+- Action: Have the user type the password directly in the visible browser when possible
+- Action: After the user confirms password entry, run \`browser({ action: 'act', pageId, kind: 'click', selector: submitSelector })\`
+- Verify: Page advances to a post-login state
+- On Failure: Re-read and classify the blocker as invalid credentials, 2FA, CAPTCHA, or selector mismatch
+- Extract: Login result state
+
+4. Handle post-login challenges
+- Action: \`browser({ action: 'read', pageId, readMode: 'snapshot' })\`
+- Verify: Output shows whether 2FA, CAPTCHA, consent, or success is present
+- On Failure: Take a screenshot and ask the user what they see
+- Extract: Challenge type and controls needed to continue
+- If 2FA appears: ask the user for the code or have them enter it directly in the UI, then continue
+- If CAPTCHA appears: ask the user to solve it manually, then continue
+
+5. Navigate to target
+- Action: \`browser({ action: 'navigate', pageId, url: '{{target_url}}' })\`
+- Verify: Target page loads and expected content appears
+- On Failure: Retry once after a fresh read or follow the app's redirect path manually
+- Extract: Final URL and target page state
+
+6. Perform task-specific work
+- Action: Insert task-specific browser steps using the same Action / Verify / On Failure / Extract pattern
+- Verify: Task-specific completion criteria hold
+- On Failure: Stop after two failed recoveries and report the current state to the user
+- Extract: Requested result data
+
+**Cleanup**
+
+- \`browser({ action: 'session', sessionAction: 'close', pageId })\`
+
+#### Recipe: Monitor Web Page for Changes
+
+**Variables**
+
+- \`{{url}}\` — page to monitor
+- \`{{watch_selector}}\` — selector for the watched element
+- \`{{interval_ms}}\` — time between checks in milliseconds
+
+**Steps**
+
+1. Open page
+- Action: \`browser({ action: 'open', url: '{{url}}', mode: 'headless' })\`
+- Verify: Browser returns a \`pageId\`
+- On Failure: Retry with \`mode: 'ui'\`
+- Extract: Save \`pageId\`
+
+2. Capture baseline
+- Action: \`browser({ action: 'read', pageId, readMode: 'text', selector: '{{watch_selector}}' })\`
+- Verify: Baseline content is non-empty
+- On Failure: Confirm the selector with a snapshot read
+- Extract: Baseline content for later comparison
+
+3. Wait and re-check
+- Action: \`browser({ action: 'eval', pageId, code: 'await new Promise((resolve) => setTimeout(resolve, {{interval_ms}}))' })\`
+- Action: \`browser({ action: 'navigate', pageId, type: 'reload' })\`
+- Action: \`browser({ action: 'read', pageId, readMode: 'text', selector: '{{watch_selector}}' })\`
+- Verify: New content is captured successfully
+- On Failure: Reload again and retry once
+- Extract: Current content for diffing
+
+4. Compare against baseline
+- Action: Compare the current content with the stored baseline outside the browser call
+- Verify: Comparison is deterministic
+- On Failure: Re-run the text read once to rule out a partial load
+- Extract: Changed or unchanged state
+- If changed: report it to the user and capture \`browser({ action: 'screenshot', pageId, selector: '{{watch_selector}}' })\`
+- If unchanged: return to step 3
+
+**Cleanup**
+
+- \`browser({ action: 'session', sessionAction: 'close', pageId })\`
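The monitor recipe's baseline-and-compare loop can be sketched as below. This is not the recipe verbatim. It waits with an injected `sleep` on the caller side instead of the in-page `eval` timer, it assumes a text read resolves to a string that can be compared with `!==`, and `maxChecks` is an added bound so the loop terminates.

```javascript
// Sketch of steps 2-4 of the monitor recipe: capture a baseline, then
// reload and re-read until the watched text differs or maxChecks is hit.
async function watchForChange(browser, pageId, opts) {
  const {
    watchSelector,
    intervalMs,
    maxChecks = 10,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = opts;
  const readArgs = { action: 'read', pageId, readMode: 'text', selector: watchSelector };
  const baseline = await browser(readArgs); // assumed to resolve to a string
  for (let check = 1; check <= maxChecks; check++) {
    await sleep(intervalMs);
    await browser({ action: 'navigate', pageId, type: 'reload' });
    const current = await browser(readArgs);
    if (current !== baseline) {
      return { changed: true, check, baseline, current };
    }
  }
  return { changed: false, check: maxChecks, baseline };
}
```

On a `changed: true` result the caller would take the selector-scoped screenshot and report to the user, as step 4 describes.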
+
+### Execution Protocol
+
+When an agent receives a browser recipe to execute:
+
+1. **Resolve variables** — ask the user for all unresolved \`{{variables}}\`, and explicitly flag which ones are sensitive.
+2. **Pre-flight the environment** — if the recipe requires auth, destructive actions, uploads, or cookie export, warn the user before starting.
+3. **Run sequentially on each page** — within one page, execute Action → Verify → On Failure → Extract in order. Shared DOM state is not parallel-safe.
+4. **Stop after two failed recoveries on the same step** — report the current state, what failed, and what the user can do next.
+5. **Run cleanup even on failure** — always close pages unless the user asked to keep the session open.
+6. **Summarize results** — report what was completed, what data was extracted, and which steps were skipped or blocked.
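Points 3 and 4 of the protocol above (Action → Verify → On Failure → Extract, stopping after two failed recoveries) can be sketched as a small step runner. The step object shape with `action`, `verify`, `onFailure`, and `extract` callbacks is hypothetical and chosen only to mirror the recipe fields.

```javascript
// Sketch: run one recipe step with at most `maxRecoveries` recovery attempts.
// A step is a plain object with (hypothetical) async callbacks:
//   action() -> result, verify(result) -> boolean,
//   onFailure() -> void, extract(result) -> data
async function runStep(step, maxRecoveries = 2) {
  let recoveries = 0;
  for (;;) {
    const result = await step.action();
    if (await step.verify(result)) {
      return { ok: true, recoveries, data: await step.extract(result) };
    }
    if (recoveries === maxRecoveries) {
      // Two recoveries already failed: stop and let the caller report state.
      return { ok: false, recoveries, data: null };
    }
    recoveries++;
    await step.onFailure();
  }
}
```

Steps on the same page would be awaited one after another, never in parallel, since they share DOM state.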
+
+### Error Recovery Strategies
+
+| Error | Recovery |
+|-------|----------|
+| Element not found | Re-read with \`readMode: 'snapshot'\`, adjust selector or ref, then retry once |
+| Page timeout | Reload page, wait for a more specific selector, then retry |
+| Navigation failed | Verify target URL, try \`waitUntil: 'load'\` on open or \`type: 'waitFor'\` on navigate |
+| Auth required | Switch to \`mode: 'ui'\`, follow the auth pattern, and let the user handle secrets directly |
+| CAPTCHA or human check | Stop and ask the user to solve it manually, then continue from the next read |
+| File upload failed | Verify local file path, confirm the selector targets a real file input, retry once |
+| Storage access denied | Fall back to a narrow \`eval\` call only when browser session storage APIs are blocked |
+| Network error | Wait briefly, reload, retry once, then report partial progress |
+
+### Building Skills on Browser Primitives
+
+\`browser-use\` is the foundation skill. Domain-specific skills such as deployment planners, release-note generators, internal admin workflows, or authenticated data collectors should depend on it instead of redefining browser semantics.
+
+When another skill ships browser automation, it should treat this section as the shared contract.
+
+#### Domain Skill Architecture
+
+\`browser-use\` provides the primitives. Domain skills provide the workflow.
+
+A domain skill built on top of browser automation should follow this architecture:
+
+- The domain skill has its own \`SKILL.md\` that explains what business task it automates, such as creating deployment release notes from GitHub PRs.
+- Browser recipes live inside that skill, either embedded directly in \`SKILL.md\` or stored as reusable docs under the skill's \`references/\` directory.
+- The domain skill references \`browser-use\` for browser action semantics, security rules, auth escalation, and recovery patterns instead of redefining them.
+- The domain skill guides the LLM through the end-to-end workflow, including when to gather inputs, when to run browser recipes, when to switch tools, and how to format the final output.
+- Browser recipes handle web interaction details. The domain skill handles business intent, sequencing, domain-specific validation, and final deliverables.
+
+This separation matters because a teammate usually does not want "a browser script." They want a skill for a business outcome such as deployment planning, release-note generation, status monitoring, or internal-tool automation. The domain skill explains the outcome and uses browser recipes as implementation building blocks.
+
|
|
615
|
+
#### Example: Deployment Release Notes Skill
|
|
616
|
+
|
|
617
|
+
A teammate wants a skill that automates creating release notes by scraping GitHub PRs, commit history, and linked tickets. Here is how the skill's \`SKILL.md\` would be structured:
|
|
618
|
+
|
|
619
|
+
\`\`\`markdown
|
|
620
|
+
# Deployment Release Notes - Automated Release Documentation
|
|
621
|
+
|
|
622
|
+
Generate deployment release notes by collecting PR descriptions, commit messages, and linked tickets from GitHub, then formatting them into a structured release document.
|
|
623
|
+
|
|
624
|
+
**When to use:** Before a deployment, when the team needs a summary of changes, or when creating a changelog for stakeholders.
|
|
625
|
+
|
|
626
|
+
**Prerequisites:**
|
|
627
|
+
- \`browser-use\` skill loaded (provides browser automation primitives and recipe format)
|
|
628
|
+
- Access to the GitHub repository (may require auth)
|
|
629
|
+
|
|
630
|
+
## Workflow
|
|
631
|
+
|
|
632
|
+
### Step 1: Gather Context
|
|
633
|
+
- Ask the user for: repository URL, release branch/tag, previous release tag
|
|
634
|
+
- Determine if GitHub auth is needed (private repo -> use Authenticated Web Task recipe from browser-use)
+
+### Step 2: Collect PR Data
+Follow this browser recipe:
+
+# Recipe: Extract GitHub PRs Between Tags
+
+## Metadata
+- Name: GitHub PR Extraction
+- Trigger: Need to list merged PRs between two git tags
+- Target: github.com or GitHub Enterprise
+- Mode: headless (public repos) or ui (private repos needing auth)
+- Requires Auth: depends on repo visibility
+- Destructive: no
+
+## Variables
+- \`{{repo_url}}\` - GitHub repository URL (e.g., https://github.com/org/repo)
+- \`{{base_tag}}\` - Previous release tag
+- \`{{head_tag}}\` - New release tag
+
+## Steps
+1. Open compare page
+   - Action: \`browser({ action: 'open', url: '{{repo_url}}/compare/{{base_tag}}...{{head_tag}}', mode: 'headless' })\`
+   - Verify: Page loads with comparison content
+   - On Failure: Switch to \`mode: 'ui'\` for auth, follow browser-use auth pattern
+   - Extract: Save \`pageId\`
+
+2. Extract PR list
+   - Action: \`browser({ action: 'read', pageId, readMode: 'markdown', selector: '.js-commits-list-item, .pr-list' })\`
+   - Verify: Output contains commit or PR references
+   - On Failure: Try \`readMode: 'dom'\` and parse HTML, or use the commits tab instead
+   - Extract: List of PRs with titles, numbers, authors
+
+3. For each PR, extract details
+   - Action: \`browser({ action: 'navigate', pageId, url: '{{repo_url}}/pull/{{pr_number}}' })\`
+   - Action: \`browser({ action: 'read', pageId, readMode: 'markdown', selector: '.comment-body' })\`
+   - Verify: PR description is captured
+   - On Failure: Fall back to \`readMode: 'text'\`
+   - Extract: PR title, description, labels, linked issues
+
+## Cleanup
+- \`browser({ action: 'session', sessionAction: 'close', pageId })\`
+
+### Step 3: Collect Linked Tickets (Optional)
+If PRs reference JIRA or ticket URLs, follow the Data Extraction recipe from browser-use to scrape ticket titles and statuses.
+
+### Step 4: Format Release Notes
+Using the collected data, generate a structured release document:
+- Group changes by category (features, fixes, chores) based on PR labels or commit prefixes
+- Include PR links, authors, and ticket references
+- Add deployment metadata (date, branch, tag range)
+
+### Step 5: Output
+Present the release notes to the user. Offer to:
+- Copy to clipboard
+- Save as markdown file
+- Create a GitHub Release draft (requires additional browser recipe)
+
+## Error Handling
+- If GitHub auth is needed, follow browser-use auth patterns (switch to ui mode, let user handle SSO or 2FA)
+- If PR extraction returns empty results, verify the tag names exist and the compare URL is correct
+- If the ticket system is unreachable, skip ticket enrichment and note it in the output
+\`\`\`
+
+This example shows the pattern: the domain skill orchestrates the workflow, deciding what to collect and how to format it, while \`browser-use\` provides the primitives for opening pages, reading content, handling auth, and recovering from common browser failures.
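Step 4 of the example skill (grouping changes by PR label or commit prefix) is the non-browser half of the workflow. A minimal sketch of that formatting step might look like the following. The PR object shape (`number`, `title`, `author`, `labels`), the label names, and the commit-prefix rules are all assumptions made for illustration; they are not part of `browser-use` or of the package shown in this diff.

```javascript
// Hypothetical category rules: adjust labels and prefixes to the team's conventions.
const CATEGORY_RULES = [
  { category: 'Features', labels: ['feature', 'enhancement'], prefixes: ['feat'] },
  { category: 'Fixes', labels: ['bug', 'fix'], prefixes: ['fix'] },
  { category: 'Chores', labels: ['chore'], prefixes: ['chore', 'docs', 'ci'] },
];

// Pick a category from PR labels first, then from a "type(scope): ..." title prefix.
function categorize(pr) {
  const labels = (pr.labels || []).map((label) => label.toLowerCase());
  const match = /^(\w+)(\([^)]*\))?!?:/.exec(pr.title);
  const prefix = match ? match[1].toLowerCase() : '';
  for (const rule of CATEGORY_RULES) {
    const byLabel = rule.labels.some((label) => labels.includes(label));
    if (byLabel || rule.prefixes.includes(prefix)) {
      return rule.category;
    }
  }
  return 'Other';
}

// Build a markdown document with one section per category, in first-seen order.
function formatReleaseNotes(repoUrl, baseTag, headTag, prs) {
  const groups = new Map();
  for (const pr of prs) {
    const category = categorize(pr);
    if (!groups.has(category)) groups.set(category, []);
    groups.get(category).push(pr);
  }
  const lines = ['# Release notes: ' + baseTag + '...' + headTag, ''];
  for (const [category, items] of groups) {
    lines.push('## ' + category);
    for (const pr of items) {
      const link = repoUrl + '/pull/' + pr.number;
      lines.push('- ' + pr.title + ' ([#' + pr.number + '](' + link + ')) by @' + pr.author);
    }
    lines.push('');
  }
  return lines.join('\n');
}
```

The input would come from the Extract fields of the recipe steps; keeping this part as plain data transformation means it can be checked without opening a browser.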
+
+#### How to Help Users Create Domain Skills
+
+When a user asks you to create a skill that automates a web-based workflow, follow this process:
+
+1. **Identify the workflow** - Ask what manual steps the user currently performs, which websites or web apps are involved, and what data they need to extract or actions they need to perform.
+2. **Map to recipes** - Break the workflow into discrete browser recipes. Each recipe should handle one website or one logical browser task. For example, extracting PRs from GitHub is one recipe, while formatting release notes is a separate non-browser step.
+3. **Check for reusable recipes** - Reuse and adapt the templates in this skill first: Form Submission, Data Extraction, File Upload, Authenticated Web Task, and Monitor Web Page. Do not write everything from scratch when an existing pattern already fits.
+4. **Structure the skill** - Give the user a skill layout that separates the main workflow from reusable references:
+
+\`\`\`text
+my-automation-skill/
+  SKILL.md
+  references/
+    recipe-github-extract-prs.md
+    recipe-jira-get-tickets.md
+\`\`\`
+
+5. **Write the SKILL.md** - Include these sections:
+   - **Header** - What the skill does, when to use it, and prerequisites. Always list \`browser-use\` when browser recipes are part of the workflow.
+   - **Workflow** - Numbered high-level steps that mix browser recipes with non-browser reasoning, formatting, summarization, or file generation.
+   - **Embedded recipes** - Put workflow-specific browser tasks inline when they are tightly coupled to that skill.
+   - **Referenced recipes** - Link to reusable docs under \`references/\` when the same recipe may be reused across multiple skills.
+   - **Error handling** - Describe domain-specific recovery, such as what to do when auth fails, data is missing, or target pages change.
+   - **Output** - State what artifact the skill produces and how it should be delivered to the user.
+6. **Test the skill** - Run each recipe in \`mode: 'ui'\` first to validate selectors, flow, and auth handling. Only switch to \`headless\` after the browser interactions are proven stable.
+7. **Register the skill** - Add it to \`scaffold/definitions/plugins.mjs\` so it deploys with \`aikit init\`.
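As a rough illustration of step 5, a starter SKILL.md could be generated from the section list above. This is only a sketch: `skillSkeleton` is a hypothetical helper, the exact heading wording is an assumption, and the frontmatter fields are modeled on the `name` and `description` fields visible in the `browser-use` SKILL.md in this diff.

```javascript
// Section headings mirror step 5 above; their exact wording is an assumption.
const SKILL_SECTIONS = [
  'Workflow',
  'Embedded Recipes',
  'Referenced Recipes',
  'Error Handling',
  'Output',
];

// Hypothetical helper: returns starter SKILL.md content as a string.
function skillSkeleton(name, description, recipeFiles) {
  const lines = [
    '---',
    'name: ' + name,
    'description: ' + JSON.stringify(description),
    '---',
    '',
    '# ' + name,
    '',
    // Browser recipes are assumed, so browser-use is listed as a prerequisite.
    'Prerequisites: browser-use',
    '',
  ];
  for (const section of SKILL_SECTIONS) {
    lines.push('## ' + section, '');
    if (section === 'Referenced Recipes') {
      // Link each reusable recipe stored under references/.
      for (const file of recipeFiles) {
        lines.push('- [' + file + '](references/' + file + ')');
      }
      lines.push('');
    }
  }
  return lines.join('\n');
}
```

The skeleton only fixes the section order; the workflow steps, recipes, and error handling still have to be written for the specific domain.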
+
+#### Domain Skill Ideas
+
+These are examples of skills teams could build on top of \`browser-use\`:
+
+| Skill | What it automates | Key recipes used |
+|-------|-------------------|------------------|
+| Deployment Release Notes | Scrape PRs, commits, and tickets into a formatted changelog | Authenticated Web Task, Data Extraction |
+| Deployment Plan Creator | Gather service dependencies, change scope, and risk inputs from internal tools | Data Extraction, Form Submission |
+| Status Page Monitor | Watch status pages and summarize changes | Monitor Web Page |
+| Form Auto-filler | Pre-fill repetitive internal forms such as expense reports or time sheets | Form Submission, Authenticated Web Task |
+| Screenshot Documentation | Capture annotated screenshots of UI flows for docs | Multi-page navigation, screenshots |
+| Competitive Analysis | Extract pricing, feature lists, and positioning details from public sites | Data Extraction, pagination |
+| Internal Tool Automation | Automate admin workflows in internal web apps that have no API | Authenticated Web Task, Form Submission |
+
+Use this guidance when you are helping a user create a new skill: describe the business workflow first, identify which browser recipes it needs, and keep browser-specific details aligned with the primitives and safety model documented here.
+
+#### Naming Convention
+
+- Use \`recipe-{domain}-{action}.md\` for reusable standalone recipe docs.
+- Store reusable examples in the consuming skill's \`references/\` directory.
+- Match the recipe title to the user-facing capability, not the implementation detail.
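The naming rule can be applied mechanically. The sketch below uses two hypothetical helpers, `recipeFileName` and `recipePath`, which do not exist in AI Kit; the slug rules (lowercase, hyphen-separated) are an assumption inferred from the example file names in the skill layout above.

```javascript
// Lowercase the text and collapse anything that is not a letter or digit into hyphens.
function slugify(text) {
  return text
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

// Build a file name in the recipe-{domain}-{action}.md form.
function recipeFileName(domain, action) {
  const domainSlug = slugify(domain);
  const actionSlug = slugify(action);
  if (!domainSlug || !actionSlug) {
    throw new Error('domain and action must contain letters or digits');
  }
  return 'recipe-' + domainSlug + '-' + actionSlug + '.md';
}

// Place the recipe in the consuming skill's references/ directory.
function recipePath(skillDir, domain, action) {
  return skillDir + '/references/' + recipeFileName(domain, action);
}
```

For example, `recipePath('my-automation-skill', 'GitHub', 'Extract PRs')` yields `my-automation-skill/references/recipe-github-extract-prs.md`.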
+
+#### Quality Checklist
+
+- [ ] Variables documented, including sensitivity and whether the agent may infer them
+- [ ] Every step includes Action, Verify, On Failure, and Extract
+- [ ] Auth and destructive behavior are declared in metadata
+- [ ] Cleanup is present and closes browser pages unless a kept-open session is intentional
+- [ ] Recovery paths stop after bounded retries instead of looping indefinitely
+- [ ] Recipe was exercised in \`mode: 'ui'\` at least once
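Some of these checks are structural and could be automated. The following is a minimal sketch of a recipe linter, assuming recipes use the `## Metadata`, `## Steps`, and `## Cleanup` headings and the `- Field: value` bullets shown in the example recipe. `lintRecipe` is a hypothetical helper, and it covers only three checklist items (step fields, metadata declarations, presence of a Cleanup heading); the remaining items still need human review.

```javascript
const REQUIRED_STEP_FIELDS = ['Action', 'Verify', 'On Failure', 'Extract'];
const REQUIRED_METADATA = ['Requires Auth', 'Destructive'];

// Returns a list of human-readable problems; an empty list means the checks passed.
function lintRecipe(markdown) {
  const problems = [];
  const metadataKeys = new Set();
  const steps = [];
  let section = '';
  let currentStep = null;
  let hasCleanup = false;

  for (const rawLine of markdown.split('\n')) {
    const line = rawLine.trim();
    const heading = /^##\s+(.+)$/.exec(line); // matches "## X" only, not "#" or "###"
    if (heading) {
      section = heading[1];
      if (section === 'Cleanup') hasCleanup = true;
      continue;
    }
    if (section === 'Metadata') {
      const meta = /^-\s*([^:]+):/.exec(line);
      if (meta) metadataKeys.add(meta[1].trim());
    } else if (section === 'Steps') {
      const step = /^(\d+)\.\s+(.+)$/.exec(line);
      const field = /^-\s*(Action|Verify|On Failure|Extract):/.exec(line);
      if (step) {
        currentStep = { title: step[2], fields: new Set() };
        steps.push(currentStep);
      } else if (field && currentStep) {
        currentStep.fields.add(field[1]);
      }
    }
  }

  for (const key of REQUIRED_METADATA) {
    if (!metadataKeys.has(key)) problems.push('Metadata is missing "' + key + '"');
  }
  if (steps.length === 0) problems.push('No steps found under "## Steps"');
  for (const step of steps) {
    const missing = REQUIRED_STEP_FIELDS.filter((name) => !step.fields.has(name));
    if (missing.length > 0) {
      problems.push('Step "' + step.title + '" is missing: ' + missing.join(', '));
    }
  }
  if (!hasCleanup) problems.push('Cleanup section is missing');
  return problems;
}
```

Running a check like this before the first \`mode: 'ui'\` trial catches incomplete steps early, but it says nothing about whether selectors or retry limits are sensible.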
+
+#### Composition Notes
+
+- Keep steps small enough that one read, screenshot, or selector wait can verify them.
+- Prefer \`readMode: 'snapshot'\` to discover controls, \`readMode: 'text'\` to verify outcomes, and \`readMode: 'markdown'\` to capture extracted content.
+- Use \`navigate({ type: 'waitFor', selector, timeoutMs })\` instead of timing guesses when a page transition has a concrete ready signal.
+- Use \`eval\` only for narrow gaps the built-in actions cannot cover.
+- Follow [references/auth-patterns.md](references/auth-patterns.md) for SSO, OAuth, CAPTCHA, or 2FA flows.
+
 ## Security Model (HARD GATE)
 
 - AI Kit enforces URL allowlisting before page navigation; respect denials instead of trying alternate bypasses.
@@ -173,17 +807,28 @@ This keeps the viewing workflow inside the same owned runtime.
 | CAPTCHA appears | Ask the user to solve it manually, then continue from \`read\` |
 | Need to inspect cookies | Use \`browser({ action: 'session', sessionAction: 'cookies', pageId })\` and warn the user |
 | Need complex DOM extraction | Use \`browser({ action: 'eval', ... })\` with a small, targeted script |
+| Scroll not loading more content | Add a wait after scroll: eval with setTimeout, then re-read |
+| File upload not working | Ensure selector targets an actual \`<input type="file">\` element |
+| Storage access denied | Some sites block storage access in certain contexts; try eval instead |
+| Cookie set failed | Verify domain/path match the target site; set-cookie requires confirm:true |
+| Markdown output too messy | Use \`readMode: 'text'\` for simpler output, or scope with selector |
 
 ## Decision Flow
 
 \`\`\`
 Need browser help?
-├─ Public page, no JS or auth needed?
-├─
-├─
-├─ Need
-├─
-
+├─ Public page, no JS or auth needed? → web_fetch (simpler, faster)
+├─ Need JS rendering or interaction? → browser open → read
+├─ Need clean markdown of a page? → browser read (readMode: 'markdown')
+├─ Need structured HTML/DOM? → browser read (readMode: 'dom')
+├─ Login wall or SSO flow? → repo-access → browser-use auth patterns
+├─ Need to fill forms / submit data? → browser act (type/click/select)
+├─ Need to upload files? → browser act (upload)
+├─ Need to scroll / load lazy content? → browser act (scroll)
+├─ Need screenshot of specific region? → browser screenshot (clip)
+├─ Need session/cookie management? → browser session (cookies/storage)
+├─ Need local dashboard viewing? → present(browser) → browser open
+└─ Complex multi-step automation? → Compose patterns from this skill
 \`\`\`
 `},{file:`references/auth-patterns.md`,content:`# Browser Auth Patterns
 