@vpxa/aikit 0.1.144 → 0.1.145

@@ -1,6 +1,6 @@
  var e=[{file:`SKILL.md`,content:`---
  name: browser-use
- description: "Browser automation for AI agents using AI Kit's owned \`browser\` MCP tool. Triggered when: (1) repo-access exhausts its Strategy Ladder and auth requires browser interaction, (2) \`web_fetch\` returns login page HTML, SAML redirect, or CAPTCHA instead of content, (3) user needs to interact with web applications (fill forms, click buttons, extract data), (4) a site requires JavaScript rendering that \`web_fetch\` cannot handle, (5) user asks to browse, scrape, test, or automate a website. Uses AI Kit's owned Chromium runtime — no external MCP server dependency."
+ description: "Browser automation for AI agents using AI Kit's owned \`browser\` MCP tool. Triggered when: (1) repo-access exhausts its Strategy Ladder and auth requires browser interaction, (2) \`web_fetch\` returns login page HTML, SAML redirect, or CAPTCHA instead of content, (3) user needs to interact with web applications (fill forms, click buttons, extract data), (4) a site requires JavaScript rendering that \`web_fetch\` cannot handle, (5) user asks to browse, scrape, test, or automate a website, or (6) another skill needs a standard recipe format for browser-driven workflows. Uses AI Kit's owned Chromium runtime and recipe patterns for domain-specific automation skills — no external MCP server dependency."
  metadata:
  category: cross-cutting
  domain: general
@@ -52,16 +52,16 @@ Use AI Kit's owned \`browser\` MCP tool to solve authentication barriers, extrac
 
  ## Browser Action Reference
 
- | Action | Purpose | Key Fields |
+ | Action | Purpose | Key Params |
  |--------|---------|------------|
- | \`open\` | Open a page in AI Kit's owned browser runtime | \`url\`, \`mode?\`, \`waitUntil?\` |
- | \`read\` | Return accessibility snapshot with refs and visible structure | \`pageId\` |
- | \`act\` | Interact with the page: click, type, press, hover, drag, select | \`pageId\`, \`kind\`, selector/ref/text/key fields |
- | \`navigate\` | Go to URL, back, forward, reload, or wait for navigation | \`pageId\`, \`url?\`, \`type?\`, \`waitFor?\` |
- | \`eval\` | Run sandboxed JavaScript in the page context | \`pageId\`, \`code\` |
- | \`screenshot\` | Capture page or element screenshot | \`pageId\`, selector/ref fields |
- | \`dialog\` | Accept or dismiss modal dialogs and related prompts | \`pageId\`, \`accept\`, \`promptText?\` |
- | \`session\` | List open pages, close a page, or export cookies | \`sessionAction\`, \`pageId?\` |
+ | \`open\` | Launch browser page | \`url\`, \`mode\` (\`ui\`/\`headless\`/\`panel\`), \`waitUntil\` |
+ | \`read\` | Extract page content | \`pageId\`, \`readMode\` (\`snapshot\`/\`dom\`/\`markdown\`/\`text\`), \`selector\` |
+ | \`act\` | DOM interactions | \`pageId\`, \`kind\` (\`click\`/\`type\`/\`press\`/\`hover\`/\`drag\`/\`select\`/\`scroll\`/\`upload\`) |
+ | \`navigate\` | Page navigation | \`pageId\`, \`url\`/\`type\`/\`selector\` |
+ | \`eval\` | Execute JavaScript | \`pageId\`, \`code\` |
+ | \`screenshot\` | Capture screenshots | \`pageId\`, \`selector\`, \`fullPage\`, \`clip\`, \`format\`, \`quality\` |
+ | \`dialog\` | Handle dialogs | \`pageId\`, \`accept\`, \`promptText\` |
+ | \`session\` | Session management | \`sessionAction\` (\`list\`/\`close\`/\`cookies\`/\`set-cookie\`/\`delete-cookie\`/\`clear-cookies\`/\`get-storage\`/\`set-storage\`/\`clear-storage\`) |
 
  ## Core Workflow
 
@@ -129,6 +129,640 @@ await browser({ action: 'session', sessionAction: 'close', pageId })
 
  Use cookie export only when the user explicitly needs session transfer back into CLI tools.
 
132
+ ## Read Modes
133
+
134
+ ### Get ARIA snapshot (default)
135
+
136
+ \`\`\`
137
+ browser({ action: 'read', pageId })
138
+ browser({ action: 'read', pageId, readMode: 'snapshot' })
139
+ \`\`\`
140
+
141
+ ### Get page as clean markdown
142
+
143
+ \`\`\`
144
+ browser({ action: 'read', pageId, readMode: 'markdown' })
145
+ \`\`\`
146
+
147
+ ### Get HTML content (full page or scoped)
148
+
149
+ \`\`\`
150
+ browser({ action: 'read', pageId, readMode: 'dom' })
151
+ browser({ action: 'read', pageId, readMode: 'dom', selector: 'main' })
152
+ \`\`\`
153
+
154
+ ### Get plain text
155
+
156
+ \`\`\`
157
+ browser({ action: 'read', pageId, readMode: 'text', selector: '.article-content' })
158
+ \`\`\`
159
+
160
+ ## Advanced Screenshots
161
+
162
+ ### Capture specific region
163
+
164
+ \`\`\`
165
+ browser({ action: 'screenshot', pageId, clip: { x: 0, y: 0, width: 800, height: 600 } })
166
+ \`\`\`
167
+
168
+ ### JPEG format with quality
169
+
170
+ \`\`\`
171
+ browser({ action: 'screenshot', pageId, format: 'jpeg', quality: 80 })
172
+ \`\`\`
173
+
174
+ ### Element screenshot with format
175
+
176
+ \`\`\`
177
+ browser({ action: 'screenshot', pageId, selector: '.chart', format: 'png' })
178
+ \`\`\`
179
+
180
+ ## Cookie Management
181
+
182
+ ### Set cookies
183
+
184
+ \`\`\`
185
+ browser({ action: 'session', sessionAction: 'set-cookie', confirm: true, cookies: [{ name: 'token', value: 'abc', domain: '.example.com', path: '/' }] })
186
+ \`\`\`
187
+
188
+ ### Delete specific cookie
189
+
190
+ \`\`\`
191
+ browser({ action: 'session', sessionAction: 'delete-cookie', confirm: true, name: 'tracking' })
192
+ \`\`\`
193
+
194
+ ### Clear all cookies
195
+
196
+ \`\`\`
197
+ browser({ action: 'session', sessionAction: 'clear-cookies', confirm: true })
198
+ \`\`\`
199
+
200
+ ## Storage Access
201
+
202
+ ### Read all localStorage
203
+
204
+ \`\`\`
205
+ browser({ action: 'session', sessionAction: 'get-storage', pageId, storageType: 'localStorage' })
206
+ \`\`\`
207
+
208
+ ### Read specific key
209
+
210
+ \`\`\`
211
+ browser({ action: 'session', sessionAction: 'get-storage', pageId, storageType: 'localStorage', storageKey: 'user-preferences' })
212
+ \`\`\`
213
+
214
+ ### Set storage value
215
+
216
+ \`\`\`
217
+ browser({ action: 'session', sessionAction: 'set-storage', pageId, storageType: 'localStorage', storageKey: 'theme', storageValue: 'dark' })
218
+ \`\`\`
219
+
220
+ ### Clear sessionStorage
221
+
222
+ \`\`\`
223
+ browser({ action: 'session', sessionAction: 'clear-storage', pageId, storageType: 'sessionStorage' })
224
+ \`\`\`
225
+
226
+ ## Scroll and Upload
227
+
228
+ ### Scroll down
229
+
230
+ \`\`\`
231
+ browser({ action: 'act', pageId, kind: 'scroll', value: 'down 500' })
232
+ \`\`\`
233
+
234
+ ### Scroll to top/bottom
235
+
236
+ \`\`\`
237
+ browser({ action: 'act', pageId, kind: 'scroll', value: 'top' })
238
+ browser({ action: 'act', pageId, kind: 'scroll', value: 'bottom' })
239
+ \`\`\`
240
+
241
+ ### Scroll element into view
242
+
243
+ \`\`\`
244
+ browser({ action: 'act', pageId, kind: 'scroll', selector: '#target-element' })
245
+ \`\`\`
246
+
247
+ ### Upload file
248
+
249
+ \`\`\`
250
+ browser({ action: 'act', pageId, kind: 'upload', selector: 'input[type="file"]', value: '/path/to/file.pdf' })
251
+ \`\`\`
252
+
253
+ ### Upload multiple files
254
+
255
+ \`\`\`
256
+ browser({ action: 'act', pageId, kind: 'upload', selector: 'input[type="file"]', value: '["/path/file1.pdf", "/path/file2.pdf"]' })
257
+ \`\`\`
258
+
259
+ ## Browser Automation Recipes
260
+
261
+ The browser tool is the foundation for multi-step web automation. Use this section to standardize recipes that domain-specific skills can consume, extend, and execute without inventing their own browser workflow format.
262
+
263
+ ### Recipe Format
264
+
265
+ A browser recipe is a markdown workflow with explicit metadata, variables, steps, and cleanup.
266
+
267
+ #### Metadata
268
+
269
+ - **Name** — Human-readable recipe name
270
+ - **Trigger** — When the recipe should be used
271
+ - **Target** — Domain, URL family, or app surface it operates on
272
+ - **Mode** — \`headless\`, \`ui\`, or \`panel\`
273
+ - **Requires Auth** — \`yes\` or \`no\`
274
+ - **Destructive** — \`yes\` or \`no\`; destructive recipes require explicit user confirmation before execution
275
+
276
+ #### Variables
277
+
278
+ Define placeholders the agent must resolve before starting.
279
+
280
+ - \`{{url}}\` — target URL
281
+ - \`{{username}}\` — login or account identifier
282
+ - \`{{file_path}}\` — file path for uploads
283
+
284
+ For each variable, document what it means, whether the agent can infer it or must ask the user, whether it is sensitive, and an example value when that removes ambiguity.
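Reviewer sketch (not part of the package diff): one way to apply the variable rules above is to substitute `{{name}}` placeholders in a step string and collect any names that still lack a value, so the agent knows what to ask for. `resolveVariables` is a hypothetical helper introduced here for illustration. Only the `{{...}}` placeholder syntax comes from the recipe format.

```javascript
// Hypothetical helper (not part of AI Kit): fill {{name}} placeholders in a
// recipe string and collect the names that have no value yet.
function resolveVariables(template, values) {
  const unresolved = [];
  const text = template.replace(/\{\{\s*([\w-]+)\s*\}\}/g, (match, name) => {
    if (Object.prototype.hasOwnProperty.call(values, name)) {
      return String(values[name]);
    }
    if (!unresolved.includes(name)) unresolved.push(name);
    return match; // leave the placeholder in place so the gap stays visible
  });
  return { text, unresolved };
}

// A step taken from the recipe skeleton, with the URL supplied.
const step = "browser({ action: 'open', url: '{{url}}', mode: 'headless' })";
const resolved = resolveVariables(step, { url: 'https://example.com/form' });

// A string where one variable is still missing and must be requested.
const pending = resolveVariables('upload {{file_path}} as {{username}}', { username: 'alice' });
```

Values are inserted verbatim, so a value containing a quote character would need escaping before the resulting call is executed.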
285
+
286
+ #### Steps
287
+
288
+ Each numbered step should include:
289
+
290
+ 1. **Action** — exact \`browser(...)\` call
291
+ 2. **Verify** — how to confirm the action succeeded
292
+ 3. **On Failure** — recovery path if verification fails
293
+ 4. **Extract** — data to capture for later steps or the final result
294
+
295
+ #### Cleanup
296
+
297
+ Cleanup always runs, even when earlier steps fail. Close pages, export only user-approved session state, and leave the browser runtime in a known state.
298
+
299
+ #### Recipe Skeleton
300
+
301
+ \`\`\`markdown
302
+ # Recipe: <Name>
303
+
304
+ ## Metadata
305
+ - Name: <Human-readable name>
306
+ - Trigger: <When to use it>
307
+ - Target: <Domain or URL family>
308
+ - Mode: headless
309
+ - Requires Auth: no
310
+ - Destructive: no
311
+
312
+ ## Variables
313
+ - \`{{url}}\` — Target URL
314
+ - \`{{selector}}\` — Primary element selector
315
+
316
+ ## Steps
317
+ 1. Open target
318
+ - Action: \`browser({ action: 'open', url: '{{url}}', mode: 'headless' })\`
319
+ - Verify: Browser returns a \`pageId\`
320
+ - On Failure: Retry once with \`waitUntil: 'load'\` or \`mode: 'ui'\`
321
+ - Extract: Save \`pageId\`
322
+
323
+ 2. Inspect page
324
+ - Action: \`browser({ action: 'read', pageId, readMode: 'snapshot' })\`
325
+ - Verify: Expected controls or content appear in output
326
+ - On Failure: Reload and re-read
327
+ - Extract: Save refs, selectors, visible labels
328
+
329
+ ## Cleanup
330
+ - \`browser({ action: 'session', sessionAction: 'close', pageId })\`
331
+ \`\`\`
332
+
333
+ ### Recipe Templates
334
+
335
+ #### Recipe: Submit Web Form
336
+
337
+ **Variables**
338
+
339
+ - \`{{url}}\` — form page URL
340
+ - \`{{fields}}\` — field values keyed by selector or control ref
341
+
342
+ **Steps**
343
+
344
+ 1. Open page
345
+ - Action: \`browser({ action: 'open', url: '{{url}}', mode: 'headless', waitUntil: 'domcontentloaded' })\`
346
+ - Verify: Browser returns a \`pageId\`
347
+ - On Failure: Retry once with \`waitUntil: 'load'\` or \`mode: 'ui'\`
348
+ - Extract: Save \`pageId\`
349
+
350
+ 2. Read form structure
351
+ - Action: \`browser({ action: 'read', pageId, readMode: 'snapshot' })\`
352
+ - Verify: Form fields and submit button appear in output
353
+ - On Failure: Re-read after reload or scope the read with a form selector
354
+ - Extract: Required fields, labels, visible validation hints, selectors or refs
355
+
356
+ 3. Fill fields
357
+ - Action: For text inputs, use \`browser({ action: 'act', pageId, kind: 'type', selector: fieldSelector, text: value })\`
358
+ - Action: For dropdowns, use \`browser({ action: 'act', pageId, kind: 'select', selector: fieldSelector, value: optionValue })\`
359
+ - Action: For checkboxes or radio buttons, use \`browser({ action: 'act', pageId, kind: 'click', selector: fieldSelector })\`
360
+ - Verify: Re-read affected fields or take a screenshot after the batch
361
+ - On Failure: Re-read page, correct the selector, retry the failed field once
362
+ - Extract: Inline validation messages and any server-provided field defaults
363
+
364
+ 4. Verify form state
365
+ - Action: \`browser({ action: 'screenshot', pageId, fullPage: true })\`
366
+ - Verify: Screenshot shows required fields populated as expected
367
+ - On Failure: Read visible validation messages with \`browser({ action: 'read', pageId, readMode: 'text' })\`
368
+ - Extract: Evidence screenshot for the final report
369
+
370
+ 5. Submit
371
+ - Action: \`browser({ action: 'act', pageId, kind: 'click', selector: 'button[type="submit"]' })\`
372
+ - Verify: \`browser({ action: 'read', pageId, readMode: 'text' })\` shows a success message or the page navigates to a confirmation state
373
+ - On Failure: Inspect validation errors, fix fields, retry once
374
+ - Extract: Success text, destination URL, confirmation number if present
375
+
376
+ 6. Capture result
377
+ - Action: \`browser({ action: 'read', pageId, readMode: 'markdown' })\`
378
+ - Verify: Output contains the expected success state
379
+ - On Failure: Fall back to \`readMode: 'text'\`
380
+ - Extract: Confirmation content for downstream skills
381
+
382
+ **Cleanup**
383
+
384
+ - \`browser({ action: 'session', sessionAction: 'close', pageId })\`
385
+
386
+ #### Recipe: Extract Data from Web Page
387
+
388
+ **Variables**
389
+
390
+ - \`{{url}}\` — target page URL
391
+ - \`{{data_selector}}\` — selector for the content to extract
392
+ - \`{{pagination_selector}}\` — selector for the next-page control, when pagination exists
393
+
394
+ **Steps**
395
+
396
+ 1. Open page
397
+ - Action: \`browser({ action: 'open', url: '{{url}}', mode: 'headless' })\`
398
+ - Verify: Browser returns a \`pageId\`
399
+ - On Failure: Retry with \`waitUntil: 'networkidle'\`
400
+ - Extract: Save \`pageId\`
401
+
402
+ 2. Extract content
403
+ - Action: \`browser({ action: 'read', pageId, readMode: 'markdown', selector: '{{data_selector}}' })\`
404
+ - Verify: Output is non-empty and scoped to the requested selector
405
+ - On Failure: Re-run with \`readMode: 'text'\` or confirm the selector with a snapshot read
406
+ - Extract: Store extracted content for the current page
407
+
408
+ 3. Check for pagination
409
+ - Action: \`browser({ action: 'read', pageId, readMode: 'snapshot' })\`
410
+ - Verify: Snapshot shows either a next-page control or a clear end state
411
+ - On Failure: Reload once, then re-read
412
+ - Extract: Whether \`{{pagination_selector}}\` exists and appears enabled
413
+
414
+ 4. Advance when another page exists
415
+ - Action: If \`{{pagination_selector}}\` is present and enabled, run \`browser({ action: 'act', pageId, kind: 'click', selector: '{{pagination_selector}}' })\`
416
+ - Verify: \`browser({ action: 'navigate', pageId, type: 'waitFor', selector: '{{data_selector}}', timeoutMs: 30000 })\`
417
+ - On Failure: Reload page and retry pagination once
418
+ - Extract: Updated page content, page count, or cursor state
419
+
420
+ 5. Repeat until no more pages
421
+ - Action: Return to step 2 after successful pagination
422
+ - Verify: Loop exits only when the next-page control is missing or disabled
423
+ - On Failure: Stop and report partial results
424
+ - Extract: Aggregate page-by-page results
425
+
426
+ **Cleanup**
427
+
428
+ - \`browser({ action: 'session', sessionAction: 'close', pageId })\`
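Reviewer sketch (not part of the package diff): the extraction loop in steps 2 to 5 of this recipe could look roughly like the following. The `browser` function is injected so the example runs against a stub. The return values of the stub, the `hasNext` predicate, and the `maxPages` guard are assumptions made for illustration, since the diff does not specify the shape of the tool's output.

```javascript
// Sketch of the read -> check pagination -> click -> wait loop.
// `hasNext` is a caller-supplied predicate over the snapshot output.
async function extractAllPages(browser, { pageId, dataSelector, paginationSelector, hasNext, maxPages = 50 }) {
  const pages = [];
  while (pages.length < maxPages) {
    const content = await browser({ action: 'read', pageId, readMode: 'markdown', selector: dataSelector });
    pages.push(content);
    const snapshot = await browser({ action: 'read', pageId, readMode: 'snapshot' });
    if (!hasNext(snapshot)) break; // next-page control missing or disabled
    await browser({ action: 'act', pageId, kind: 'click', selector: paginationSelector });
    await browser({ action: 'navigate', pageId, type: 'waitFor', selector: dataSelector, timeoutMs: 30000 });
  }
  return pages;
}

// Stub standing in for the real tool; the string outputs are invented.
function makeStubBrowser(totalPages) {
  let current = 1;
  return async (request) => {
    if (request.action === 'act' && request.kind === 'click') current += 1;
    if (request.action === 'read' && request.readMode === 'markdown') return `page ${current}`;
    if (request.action === 'read' && request.readMode === 'snapshot') {
      return current < totalPages ? 'button "Next"' : 'button "Next" [disabled]';
    }
    return null;
  };
}

const pagesPromise = extractAllPages(makeStubBrowser(3), {
  pageId: 'page-1',
  dataSelector: '.results',
  paginationSelector: 'a.next',
  hasNext: (snapshot) => !snapshot.includes('[disabled]'),
});
```

The `maxPages` cap is a defensive addition so that a misdetected next-page control cannot loop forever.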
429
+
430
+ #### Recipe: Upload File to Web Service
431
+
432
+ **Variables**
433
+
434
+ - \`{{url}}\` — upload page URL
435
+ - \`{{file_path}}\` — local file path to upload
436
+ - \`{{file_input_selector}}\` — file input selector, usually \`input[type="file"]\`
437
+
438
+ **Steps**
439
+
440
+ 1. Open upload page
441
+ - Action: \`browser({ action: 'open', url: '{{url}}', mode: 'headless' })\`
442
+ - Verify: Upload page loads successfully
443
+ - On Failure: Retry with \`mode: 'ui'\`
444
+ - Extract: Save \`pageId\`
445
+
446
+ 2. Inspect upload controls
447
+ - Action: \`browser({ action: 'read', pageId, readMode: 'snapshot' })\`
448
+ - Verify: File input and submit controls are present
449
+ - On Failure: Re-read after reload or refine the selector
450
+ - Extract: Confirm the file input selector and submit control
451
+
452
+ 3. Upload file
453
+ - Action: \`browser({ action: 'act', pageId, kind: 'upload', selector: '{{file_input_selector}}', value: '{{file_path}}' })\`
454
+ - Verify: Selected filename appears in the page or read output
455
+ - On Failure: Verify the file exists, confirm the selector targets a real \`<input type="file">\`, retry once
456
+ - Extract: Selected filename and any client-side validation message
457
+
458
+ 4. Submit upload
459
+ - Action: \`browser({ action: 'act', pageId, kind: 'click', selector: '.upload-submit' })\`
460
+ - Verify: \`browser({ action: 'navigate', pageId, type: 'waitFor', selector: '.upload-success', timeoutMs: 30000 })\`
461
+ - On Failure: Read the page for upload errors, then retry once if recoverable
462
+ - Extract: Completion state and resulting URL if visible
463
+
464
+ 5. Verify upload
465
+ - Action: \`browser({ action: 'read', pageId, readMode: 'text' })\`
466
+ - Verify: Output includes upload confirmation
467
+ - On Failure: Take a screenshot and report an ambiguous completion state
468
+ - Extract: Confirmation text, file URL, or server response summary
469
+
470
+ **Cleanup**
471
+
472
+ - \`browser({ action: 'session', sessionAction: 'close', pageId })\`
473
+
474
+ #### Recipe: Authenticated Web Task
475
+
476
+ **Variables**
477
+
478
+ - \`{{login_url}}\` — login page URL
479
+ - \`{{target_url}}\` — target page after login
480
+ - \`{{username}}\` — account identifier, ask the user if not already known
481
+ - \`{{password}}\` — sensitive; do not store or echo it, and prefer having the user type it directly in the browser UI
482
+
483
+ **Steps**
484
+
485
+ 1. Open login page
486
+ - Action: \`browser({ action: 'open', url: '{{login_url}}', mode: 'ui', waitUntil: 'domcontentloaded' })\`
487
+ - Verify: Login page is visible
488
+ - On Failure: Retry with \`waitUntil: 'load'\`
489
+ - Extract: Save \`pageId\`
490
+
491
+ 2. Read login form
492
+ - Action: \`browser({ action: 'read', pageId, readMode: 'snapshot' })\`
493
+ - Verify: Username, password, and submit controls are visible
494
+ - On Failure: Reload and re-read, or ask the user to describe the current page state
495
+ - Extract: Login selectors, SSO options, and challenge indicators
496
+
497
+ 3. Enter credentials
498
+ - Action: Ask the user for \`{{username}}\` if needed, then run \`browser({ action: 'act', pageId, kind: 'type', selector: usernameSelector, text: '{{username}}' })\`
499
+ - Action: Have the user type the password directly in the visible browser when possible
500
+ - Action: After the user confirms password entry, run \`browser({ action: 'act', pageId, kind: 'click', selector: submitSelector })\`
501
+ - Verify: Page advances to a post-login state
502
+ - On Failure: Re-read and classify the blocker as invalid credentials, 2FA, CAPTCHA, or selector mismatch
503
+ - Extract: Login result state
504
+
505
+ 4. Handle post-login challenges
506
+ - Action: \`browser({ action: 'read', pageId, readMode: 'snapshot' })\`
507
+ - Verify: Output shows whether 2FA, CAPTCHA, consent, or success is present
508
+ - On Failure: Take a screenshot and ask the user what they see
509
+ - Extract: Challenge type and controls needed to continue
510
+ - If 2FA appears: ask the user for the code or have them enter it directly in the UI, then continue
511
+ - If CAPTCHA appears: ask the user to solve it manually, then continue
512
+
513
+ 5. Navigate to target
514
+ - Action: \`browser({ action: 'navigate', pageId, url: '{{target_url}}' })\`
515
+ - Verify: Target page loads and expected content appears
516
+ - On Failure: Retry once after a fresh read or follow the app's redirect path manually
517
+ - Extract: Final URL and target page state
518
+
519
+ 6. Perform task-specific work
520
+ - Action: Insert task-specific browser steps using the same Action / Verify / On Failure / Extract pattern
521
+ - Verify: Task-specific completion criteria hold
522
+ - On Failure: Stop after two failed recoveries and report the current state to the user
523
+ - Extract: Requested result data
524
+
525
+ **Cleanup**
526
+
527
+ - \`browser({ action: 'session', sessionAction: 'close', pageId })\`
528
+
529
+ #### Recipe: Monitor Web Page for Changes
530
+
531
+ **Variables**
532
+
533
+ - \`{{url}}\` — page to monitor
534
+ - \`{{watch_selector}}\` — selector for the watched element
535
+ - \`{{interval_ms}}\` — time between checks in milliseconds
536
+
537
+ **Steps**
538
+
539
+ 1. Open page
540
+ - Action: \`browser({ action: 'open', url: '{{url}}', mode: 'headless' })\`
541
+ - Verify: Browser returns a \`pageId\`
542
+ - On Failure: Retry with \`mode: 'ui'\`
543
+ - Extract: Save \`pageId\`
544
+
545
+ 2. Capture baseline
546
+ - Action: \`browser({ action: 'read', pageId, readMode: 'text', selector: '{{watch_selector}}' })\`
547
+ - Verify: Baseline content is non-empty
548
+ - On Failure: Confirm the selector with a snapshot read
549
+ - Extract: Baseline content for later comparison
550
+
551
+ 3. Wait and re-check
552
+ - Action: \`browser({ action: 'eval', pageId, code: 'await new Promise((resolve) => setTimeout(resolve, {{interval_ms}}))' })\`
553
+ - Action: \`browser({ action: 'navigate', pageId, type: 'reload' })\`
554
+ - Action: \`browser({ action: 'read', pageId, readMode: 'text', selector: '{{watch_selector}}' })\`
555
+ - Verify: New content is captured successfully
556
+ - On Failure: Reload again and retry once
557
+ - Extract: Current content for diffing
558
+
559
+ 4. Compare against baseline
560
+ - Action: Compare the current content with the stored baseline outside the browser call
561
+ - Verify: Comparison is deterministic
562
+ - On Failure: Re-run the text read once to rule out a partial load
563
+ - Extract: Changed or unchanged state
564
+ - If changed: report it to the user and capture \`browser({ action: 'screenshot', pageId, selector: '{{watch_selector}}' })\`
565
+ - If unchanged: return to step 3
566
+
567
+ **Cleanup**
568
+
569
+ - \`browser({ action: 'session', sessionAction: 'close', pageId })\`
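Reviewer sketch (not part of the package diff): step 4 asks for a deterministic comparison outside the browser call. A minimal version, assuming whitespace-only differences should not count as a change, is shown below. `normalizeText` and `hasChanged` are hypothetical names.

```javascript
// Collapse runs of whitespace so layout-only differences between two text
// reads of the watched element are ignored.
function normalizeText(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// True when the watched content differs from the stored baseline.
function hasChanged(baseline, current) {
  return normalizeText(baseline) !== normalizeText(current);
}

const baseline = 'All systems operational\n';
const sameAfterReload = '  All systems   operational';
const degraded = 'Partial outage: API';

const unchangedResult = hasChanged(baseline, sameAfterReload); // whitespace differs only
const changedResult = hasChanged(baseline, degraded);          // wording differs
```

If whitespace is meaningful for the watched element, compare the raw strings instead of the normalized ones.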
570
+
571
+ ### Execution Protocol
572
+
573
+ When an agent receives a browser recipe to execute:
574
+
575
+ 1. **Resolve variables** — ask the user for all unresolved \`{{variables}}\`, and explicitly flag which ones are sensitive.
576
+ 2. **Pre-flight the environment** — if the recipe requires auth, destructive actions, uploads, or cookie export, warn the user before starting.
577
+ 3. **Run sequentially on each page** — within one page, execute Action → Verify → On Failure → Extract in order. Shared DOM state is not parallel-safe.
578
+ 4. **Stop after two failed recoveries on the same step** — report the current state, what failed, and what the user can do next.
579
+ 5. **Run cleanup even on failure** — always close pages unless the user asked to keep the session open.
580
+ 6. **Summarize results** — report what was completed, what data was extracted, and which steps were skipped or blocked.
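Reviewer sketch (not part of the package diff): the protocol above can be approximated by a small runner that executes each step in order, allows two recoveries per step, and always runs cleanup. The step shape (`name`, `action`, `verify`, `recover`, `extract`) and the stubbed `fakeBrowser` with its return values are assumptions for illustration, not AI Kit's API.

```javascript
// Run steps sequentially: Action -> Verify -> On Failure -> Extract.
async function runRecipe(steps, cleanup, maxRecoveries = 2) {
  const results = { completed: [], extracted: {}, blockedAt: null };
  try {
    for (const step of steps) {
      let output = await step.action(results.extracted);
      let ok = await step.verify(output);
      let recoveries = 0;
      while (!ok && recoveries < maxRecoveries) {
        recoveries += 1;
        if (step.recover) await step.recover(output);
        output = await step.action(results.extracted);
        ok = await step.verify(output);
      }
      if (!ok) {
        results.blockedAt = step.name; // stop and report instead of continuing
        break;
      }
      if (step.extract) Object.assign(results.extracted, step.extract(output));
      results.completed.push(step.name);
    }
  } finally {
    await cleanup(results.extracted); // runs on success, block, or thrown error
  }
  return results;
}

// Stub standing in for the real browser tool so the sketch runs on its own.
const calls = [];
const fakeBrowser = async (request) => {
  calls.push(request.action === 'session' ? request.sessionAction : request.action);
  if (request.action === 'open') return { pageId: 'page-1' };
  if (request.action === 'read') return { text: '' }; // always empty, so verify fails
  return {};
};

const demo = runRecipe(
  [
    {
      name: 'open',
      action: () => fakeBrowser({ action: 'open', url: 'https://example.com', mode: 'headless' }),
      verify: (out) => Boolean(out.pageId),
      extract: (out) => ({ pageId: out.pageId }),
    },
    {
      name: 'read',
      action: ({ pageId }) => fakeBrowser({ action: 'read', pageId, readMode: 'text' }),
      verify: (out) => out.text.length > 0,
      recover: () => fakeBrowser({ action: 'navigate', pageId: 'page-1', type: 'reload' }),
    },
  ],
  ({ pageId }) => fakeBrowser({ action: 'session', sessionAction: 'close', pageId })
);
```

In this run the `read` step never verifies, so the runner stops after two recoveries, records the step in `blockedAt`, and still closes the page.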
581
+
582
+ ### Error Recovery Strategies
583
+
584
+ | Error | Recovery |
+ |-------|----------|
+ | Element not found | Re-read with \`readMode: 'snapshot'\`, adjust selector or ref, then retry once |
+ | Page timeout | Reload page, wait for a more specific selector, then retry |
+ | Navigation failed | Verify target URL, try \`waitUntil: 'load'\` on open or \`type: 'waitFor'\` on navigate |
+ | Auth required | Switch to \`mode: 'ui'\`, follow the auth pattern, and let the user handle secrets directly |
+ | CAPTCHA or human check | Stop and ask the user to solve it manually, then continue from the next read |
+ | File upload failed | Verify local file path, confirm the selector targets a real file input, retry once |
+ | Storage access denied | Fall back to a narrow \`eval\` call only when browser session storage APIs are blocked |
+ | Network error | Wait briefly, reload, retry once, then report partial progress |
594
+
595
+ ### Building Skills on Browser Primitives
596
+
597
+ \`browser-use\` is the foundation skill. Domain-specific skills such as deployment planners, release-note generators, internal admin workflows, or authenticated data collectors should depend on it instead of redefining browser semantics.
598
+
599
+ When another skill ships browser automation, it should treat this section as the shared contract.
600
+
601
+ #### Domain Skill Architecture
602
+
603
+ \`browser-use\` provides the primitives. Domain skills provide the workflow.
604
+
605
+ A domain skill built on top of browser automation should follow this architecture:
606
+
607
+ - The domain skill has its own \`SKILL.md\` that explains what business task it automates, such as creating deployment release notes from GitHub PRs.
608
+ - Browser recipes live inside that skill, either embedded directly in \`SKILL.md\` or stored as reusable docs under the skill's \`references/\` directory.
609
+ - The domain skill references \`browser-use\` for browser action semantics, security rules, auth escalation, and recovery patterns instead of redefining them.
610
+ - The domain skill guides the LLM through the end-to-end workflow, including when to gather inputs, when to run browser recipes, when to switch tools, and how to format the final output.
611
+ - Browser recipes handle web interaction details. The domain skill handles business intent, sequencing, domain-specific validation, and final deliverables.
612
+
613
+ This separation matters because a teammate usually does not want "a browser script." They want a skill for a business outcome such as deployment planning, release-note generation, status monitoring, or internal-tool automation. The domain skill explains the outcome and uses browser recipes as implementation building blocks.
614
+
615
+ #### Example: Deployment Release Notes Skill
616
+
617
+ A teammate wants a skill that automates creating release notes by scraping GitHub PRs, commit history, and linked tickets. Here is how the skill's \`SKILL.md\` would be structured:
618
+
619
+ \`\`\`markdown
620
+ # Deployment Release Notes - Automated Release Documentation
621
+
622
+ Generate deployment release notes by collecting PR descriptions, commit messages, and linked tickets from GitHub, then formatting them into a structured release document.
623
+
624
+ **When to use:** Before a deployment, when the team needs a summary of changes, or when creating a changelog for stakeholders.
625
+
626
+ **Prerequisites:**
627
+ - \`browser-use\` skill loaded (provides browser automation primitives and recipe format)
628
+ - Access to the GitHub repository (may require auth)
629
+
630
+ ## Workflow
631
+
632
+ ### Step 1: Gather Context
633
+ - Ask the user for: repository URL, release branch/tag, previous release tag
634
+ - Determine if GitHub auth is needed (private repo -> use Authenticated Web Task recipe from browser-use)
635
+
636
+ ### Step 2: Collect PR Data
637
+ Follow this browser recipe:
638
+
639
+ # Recipe: Extract GitHub PRs Between Tags
640
+
641
+ ## Metadata
642
+ - Name: GitHub PR Extraction
643
+ - Trigger: Need to list merged PRs between two git tags
644
+ - Target: github.com or GitHub Enterprise
645
+ - Mode: headless (public repos) or ui (private repos needing auth)
646
+ - Requires Auth: depends on repo visibility
647
+ - Destructive: no
648
+
649
+ ## Variables
650
+ - \`{{repo_url}}\` - GitHub repository URL (e.g., https://github.com/org/repo)
651
+ - \`{{base_tag}}\` - Previous release tag
652
+ - \`{{head_tag}}\` - New release tag
653
+
654
+ ## Steps
655
+ 1. Open compare page
656
+ - Action: \`browser({ action: 'open', url: '{{repo_url}}/compare/{{base_tag}}...{{head_tag}}', mode: 'headless' })\`
657
+ - Verify: Page loads with comparison content
658
+ - On Failure: Switch to \`mode: 'ui'\` for auth, follow browser-use auth pattern
659
+ - Extract: Save \`pageId\`
660
+
661
+ 2. Extract PR list
662
+ - Action: \`browser({ action: 'read', pageId, readMode: 'markdown', selector: '.js-commits-list-item, .pr-list' })\`
663
+ - Verify: Output contains commit or PR references
664
+ - On Failure: Try \`readMode: 'dom'\` and parse HTML, or use the commits tab instead
665
+ - Extract: List of PRs with titles, numbers, authors
666
+
667
+ 3. For each PR, extract details
668
+ - Action: \`browser({ action: 'navigate', pageId, url: '{{repo_url}}/pull/{{pr_number}}' })\`
669
+ - Action: \`browser({ action: 'read', pageId, readMode: 'markdown', selector: '.comment-body' })\`
670
+ - Verify: PR description is captured
671
+ - On Failure: Fall back to \`readMode: 'text'\`
672
+ - Extract: PR title, description, labels, linked issues
673
+
674
+ ## Cleanup
675
+ - \`browser({ action: 'session', sessionAction: 'close', pageId })\`
676
+
677
+ ### Step 3: Collect Linked Tickets (Optional)
678
+ If PRs reference JIRA or ticket URLs, follow the Data Extraction recipe from browser-use to scrape ticket titles and statuses.
679
+
680
+ ### Step 4: Format Release Notes
681
+ Using the collected data, generate a structured release document:
682
+ - Group changes by category (features, fixes, chores) based on PR labels or commit prefixes
683
+ - Include PR links, authors, and ticket references
684
+ - Add deployment metadata (date, branch, tag range)
685
+
686
+ ### Step 5: Output
687
+ Present the release notes to the user. Offer to:
688
+ - Copy to clipboard
689
+ - Save as markdown file
690
+ - Create a GitHub Release draft (requires additional browser recipe)
691
+
692
+ ## Error Handling
693
+ - If GitHub auth is needed, follow browser-use auth patterns (switch to ui mode, let user handle SSO or 2FA)
694
+ - If PR extraction returns empty results, verify the tag names exist and the compare URL is correct
695
+ - If the ticket system is unreachable, skip ticket enrichment and note it in the output
696
+ \`\`\`
697
+
698
+ This example shows the pattern: the domain skill orchestrates the workflow, deciding what to collect and how to format it, while \`browser-use\` provides the primitives for opening pages, reading content, handling auth, and recovering from common browser failures.
699
+
700
+ #### How to Help Users Create Domain Skills
701
+
702
+ When a user asks you to create a skill that automates a web-based workflow, follow this process:
703
+
704
+ 1. **Identify the workflow** - Ask what manual steps the user currently performs, which websites or web apps are involved, and what data they need to extract or actions they need to perform.
705
+ 2. **Map to recipes** - Break the workflow into discrete browser recipes. Each recipe should handle one website or one logical browser task. For example, extracting PRs from GitHub is one recipe, while formatting release notes is a separate non-browser step.
706
+ 3. **Check for reusable recipes** - Reuse and adapt the templates in this skill first: Form Submission, Data Extraction, File Upload, Authenticated Web Task, and Monitor Web Page. Do not write everything from scratch when an existing pattern already fits.
707
+ 4. **Structure the skill** - Give the user a skill layout that separates the main workflow from reusable references:
708
+
709
+ \`\`\`text
710
+ my-automation-skill/
711
+ SKILL.md
712
+ references/
713
+ recipe-github-extract-prs.md
714
+ recipe-jira-get-tickets.md
715
+ \`\`\`
716
+
717
+ 5. **Write the SKILL.md** - Include these sections:
718
+ - **Header** - What the skill does, when to use it, and prerequisites. Always list \`browser-use\` when browser recipes are part of the workflow.
719
+ - **Workflow** - Numbered high-level steps that mix browser recipes with non-browser reasoning, formatting, summarization, or file generation.
720
+ - **Embedded recipes** - Put workflow-specific browser tasks inline when they are tightly coupled to that skill.
721
+ - **Referenced recipes** - Link to reusable docs under \`references/\` when the same recipe may be reused across multiple skills.
722
+ - **Error handling** - Describe domain-specific recovery, such as what to do when auth fails, data is missing, or target pages change.
723
+ - **Output** - State what artifact the skill produces and how it should be delivered to the user.
724
+ 6. **Test the skill** - Run each recipe in \`mode: 'ui'\` first to validate selectors, flow, and auth handling. Switch to \`headless\` only after the browser interactions have proven stable.
725
+ 7. **Register the skill** - Add it to \`scaffold/definitions/plugins.mjs\` so it deploys with \`aikit init\`.
726
+
727
+ #### Domain Skill Ideas
728
+
729
+ These are examples of skills teams could build on top of \`browser-use\`:
730
+
731
+ | Skill | What it automates | Key recipes used |
732
+ |-------|-------------------|------------------|
733
+ | Deployment Release Notes | Scrape PRs, commits, and tickets into a formatted changelog | Authenticated Web Task, Data Extraction |
734
+ | Deployment Plan Creator | Gather service dependencies, change scope, and risk inputs from internal tools | Data Extraction, Form Submission |
735
+ | Status Page Monitor | Watch status pages and summarize changes | Monitor Web Page |
736
+ | Form Auto-filler | Pre-fill repetitive internal forms such as expense reports or time sheets | Form Submission, Authenticated Web Task |
737
+ | Screenshot Documentation | Capture annotated screenshots of UI flows for docs | Multi-page navigation, screenshots |
738
+ | Competitive Analysis | Extract pricing, feature lists, and positioning details from public sites | Data Extraction, pagination |
739
+ | Internal Tool Automation | Automate admin workflows in internal web apps that have no API | Authenticated Web Task, Form Submission |
740
+
741
+ Use this guidance when you are helping a user create a new skill: describe the business workflow first, identify which browser recipes it needs, and keep browser-specific details aligned with the primitives and safety model documented here.
742
+
743
+ #### Naming Convention
744
+
745
+ - Use \`recipe-{domain}-{action}.md\` for reusable standalone recipe docs.
746
+ - Store reusable examples in the consuming skill's \`references/\` directory.
747
+ - Match the recipe title to the user-facing capability, not the implementation detail.
748
+
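The naming convention above can be sketched as a small helper. This is a minimal illustration, not part of AI Kit: \`slugify\` and \`recipeFileName\` are hypothetical names, and the slug rules (lowercase, hyphen-separated) are an assumption inferred from the example file names in this skill.

```javascript
// Hypothetical helper: builds a file name of the form recipe-{domain}-{action}.md.
// The slug rules (lowercase, non-alphanumeric runs become one hyphen) are an assumption.
function slugify(value) {
  return String(value)
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse any run of other characters into one hyphen
    .replace(/^-+|-+$/g, ''); // drop leading and trailing hyphens
}

function recipeFileName(domain, action) {
  const domainSlug = slugify(domain);
  const actionSlug = slugify(action);
  if (!domainSlug || !actionSlug) {
    throw new Error('domain and action must each contain at least one letter or digit');
  }
  return 'recipe-' + domainSlug + '-' + actionSlug + '.md';
}
```

Calling \`recipeFileName('GitHub', 'extract PRs')\` yields the same name used in the layout example earlier in this skill, \`recipe-github-extract-prs.md\`.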
749
+ #### Quality Checklist
750
+
751
+ - [ ] Variables documented, including sensitivity and whether the agent may infer them
752
+ - [ ] Every step includes Action, Verify, On Failure, and Extract
753
+ - [ ] Auth and destructive behavior are declared in metadata
754
+ - [ ] Cleanup is present and closes browser pages unless a kept-open session is intentional
755
+ - [ ] Recovery paths stop after bounded retries instead of looping indefinitely
756
+ - [ ] Recipe was exercised in \`mode: 'ui'\` at least once
757
+
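If recipes are also kept as structured data, the "every step includes Action, Verify, On Failure, and Extract" check could be automated roughly as follows. This is a sketch under assumptions: the step object shape (\`name\`, \`action\`, \`verify\`, \`onFailure\`, \`extract\`) and the function name \`findIncompleteSteps\` are hypothetical and are not defined by this skill.

```javascript
// Hypothetical step shape: { name, action, verify, onFailure, extract }, all strings.
const REQUIRED_STEP_FIELDS = ['action', 'verify', 'onFailure', 'extract'];

// Returns one entry per step that is missing a required field or leaves it blank.
function findIncompleteSteps(steps) {
  const problems = [];
  steps.forEach((step, index) => {
    const missing = REQUIRED_STEP_FIELDS.filter(
      (field) => typeof step[field] !== 'string' || step[field].trim() === ''
    );
    if (missing.length > 0) {
      problems.push({ step: step.name || 'step ' + (index + 1), missing });
    }
  });
  return problems;
}
```

An empty result means every step declares all four parts. A step that extracts nothing can still say so explicitly, for example with \`extract: 'none'\`.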
758
+ #### Composition Notes
759
+
760
+ - Keep steps small enough that one read, screenshot, or selector wait can verify them.
761
+ - Prefer \`readMode: 'snapshot'\` to discover controls, \`readMode: 'text'\` to verify outcomes, and \`readMode: 'markdown'\` to capture extracted content.
762
+ - Use \`navigate({ type: 'waitFor', selector, timeoutMs })\` instead of timing guesses when a page transition has a concrete ready signal.
763
+ - Use \`eval\` only for narrow gaps the built-in actions cannot cover.
764
+ - Follow [references/auth-patterns.md](references/auth-patterns.md) for SSO, OAuth, CAPTCHA, or 2FA flows.
765
+
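One way to pair a concrete ready signal with the bounded-retry rule from the checklist is a small wrapper around a caller-supplied step. This is a sketch only: \`withBoundedRetries\` is a hypothetical helper, and \`attempt\` stands in for whatever async function issues the wait and the follow-up read. No \`browser\` tool signature is assumed here.

```javascript
// Sketch only: `attempt` is a caller-supplied async function (hypothetical),
// for example one that waits for a selector and then reads the page.
async function withBoundedRetries(attempt, maxAttempts = 3) {
  if (!Number.isInteger(maxAttempts) || maxAttempts < 1) {
    throw new RangeError('maxAttempts must be a positive integer');
  }
  let lastError;
  for (let n = 1; n <= maxAttempts; n += 1) {
    try {
      // Return on the first successful attempt.
      return await attempt(n);
    } catch (error) {
      lastError = error;
    }
  }
  // Stop after the bound instead of looping indefinitely.
  const failure = new Error('Gave up after ' + maxAttempts + ' attempts');
  failure.cause = lastError;
  throw failure;
}
```

The wrapper surfaces the last underlying error on \`cause\`, so the recipe's On Failure branch can report why the step never became ready.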
132
766
  ## Security Model (HARD GATE)
133
767
 
134
768
  - AI Kit enforces URL allowlisting before page navigation; respect denials instead of trying alternate bypasses.
@@ -173,17 +807,28 @@ This keeps the viewing workflow inside the same owned runtime.
173
807
  | CAPTCHA appears | Ask the user to solve it manually, then continue from \`read\` |
174
808
  | Need to inspect cookies | Use \`browser({ action: 'session', sessionAction: 'cookies', pageId })\` and warn the user |
175
809
  | Need complex DOM extraction | Use \`browser({ action: 'eval', ... })\` with a small, targeted script |
810
+ | Scroll not loading more content | Wait after scrolling (for example, \`eval\` with a short \`setTimeout\`), then \`read\` the page again |
811
+ | File upload not working | Ensure selector targets an actual \`<input type="file">\` element |
812
+ | Storage access denied | Some sites block storage access in certain contexts; try \`eval\` instead |
813
+ | Cookie set failed | Verify that the cookie's domain and path match the target site; set-cookie requires \`confirm: true\` |
814
+ | Markdown output too messy | Use \`readMode: 'text'\` for simpler output, or scope with selector |
176
815
 
177
816
  ## Decision Flow
178
817
 
179
818
  \`\`\`
180
819
  Need browser help?
181
- ├─ Public page, no JS or auth needed? → web_fetch
182
- ├─ Needs JS rendering or interaction? → browser open/read
183
- ├─ Login wall or SSO flow? → repo-access → browser-use
184
- ├─ Need local dashboard viewing? → present(browser) → browser open
185
- ├─ Need screenshot or accessibility? → browser screenshot/read
186
- └─ Need cookie/session transfer? → browser session (with user approval)
820
+ ├─ Public page, no JS or auth needed? → web_fetch (simpler, faster)
821
+ ├─ Need JS rendering or interaction? → browser open → read
822
+ ├─ Need clean markdown of a page? → browser read (readMode: 'markdown')
823
+ ├─ Need structured HTML/DOM? → browser read (readMode: 'dom')
824
+ ├─ Login wall or SSO flow? → repo-access → browser-use auth patterns
825
+ ├─ Need to fill forms / submit data? → browser act (type/click/select)
826
+ ├─ Need to upload files? → browser act (upload)
827
+ ├─ Need to scroll / load lazy content? → browser act (scroll)
828
+ ├─ Need screenshot of specific region? → browser screenshot (clip)
829
+ ├─ Need session/cookie management? → browser session (cookies/storage)
830
+ ├─ Need local dashboard viewing? → present(browser) → browser open
831
+ └─ Complex multi-step automation? → Compose patterns from this skill
187
832
  \`\`\`
188
833
  `},{file:`references/auth-patterns.md`,content:`# Browser Auth Patterns
189
834