@vpxa/aikit 0.1.146 → 0.1.148

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,986 +1,319 @@
1
- var e=[{file:`SKILL.md`,content:`---
2
- name: browser-use
3
- description: "Browser automation for AI agents using AI Kit's owned \`browser\` MCP tool. Triggered when: (1) repo-access exhausts its Strategy Ladder and auth requires browser interaction, (2) \`web_fetch\` returns login page HTML, SAML redirect, or CAPTCHA instead of content, (3) user needs to interact with web applications (fill forms, click buttons, extract data), (4) a site requires JavaScript rendering that \`web_fetch\` cannot handle, (5) user asks to browse, scrape, test, or automate a website, or (6) another skill needs a standard recipe format for browser-driven workflows. Uses AI Kit's owned Chromium runtime and recipe patterns for domain-specific automation skills — no external MCP server dependency."
4
- metadata:
5
- category: cross-cutting
6
- domain: general
7
- applicability: on-demand
8
- inputs: [url, auth-error, browser-task, login-wall]
9
- outputs: [page-content, screenshots, extracted-data, authenticated-session]
10
- requires: []
11
- relatedSkills: [repo-access, present, aikit]
12
- argument-hint: "URL or browser task description"
13
- ---
14
-
15
- # Browser Automation for AI Agents
16
-
17
- Use AI Kit's owned \`browser\` MCP tool to solve authentication barriers, extract data, fill forms, and interact with web applications. This skill bridges CLI-based access failures (login walls, SAML SSO, OAuth, CAPTCHAs) and real browser interaction without any external browser MCP dependency.
18
-
19
- ## Runtime Model
20
-
21
- - Single MCP tool: \`browser({ action: ... })\`
22
- - Action-based dispatch across eight actions: \`open\`, \`read\`, \`act\`, \`navigate\`, \`eval\`, \`screenshot\`, \`dialog\`, \`session\`
23
- - Owned Chromium runtime managed by AI Kit itself
24
- - Install browser binaries once with \`aikit browser install\`
25
- - Runtime modes: \`headless\` for CI, \`ui\` for desktop browser windows, \`panel\` for VS Code-hosted browsing
26
- - Auto-idle shutdown closes inactive browser sessions after the configured timeout
27
- - No external MCP server, no separate browser tool registration, no extra setup after install
28
-
29
- ## When to Activate
30
-
31
- ### Reactive Triggers
32
-
33
- - \`repo-access\` exhausted its Strategy Ladder and SAML SSO, OAuth, or a login wall blocks CLI access.
34
- - \`web_fetch\` returns login HTML, redirect markup, or a CAPTCHA challenge instead of target content.
35
- - \`http\` returns \`401\` or \`403\` and the user confirms they can access the site in a browser.
36
- - Tool output mentions "CAPTCHA", "bot detection", "Cloudflare", "verify you are human", or similar anti-bot language.
37
- - User asks to interact with a web application, fill forms, click buttons, navigate flows, or extract rendered content.
38
- - User asks to take screenshots, inspect accessibility output, or debug a page that requires JavaScript.
1
+ var e=[{file:`SKILL.md`,content:"---\nname: browser-use\ndescription: \"Browser automation for AI agents using AI Kit's owned `browser` MCP tool. Triggered when: (1) repo-access exhausts its Strategy Ladder and auth requires browser interaction, (2) `web_fetch` returns login page HTML, SAML redirect, or CAPTCHA instead of content, (3) user needs to interact with web applications (fill forms, click buttons, extract data), (4) a site requires JavaScript rendering that `web_fetch` cannot handle, (5) user asks to browse, scrape, test, or automate a website, or (6) another skill needs a standard recipe format for browser-driven workflows. Uses AI Kit's owned Chromium runtime and recipe patterns for domain-specific automation skills — no external MCP server dependency.\"\nmetadata:\n category: cross-cutting\n domain: general\n applicability: on-demand\n inputs: [url, auth-error, browser-task, login-wall]\n outputs: [page-content, screenshots, extracted-data, authenticated-session, network-captures]\n requires: []\n relatedSkills: [repo-access, present, aikit]\nargument-hint: \"URL or browser task description\"\n---\n\n# Browser Automation for AI Agents\n\nUse AI Kit's `browser` MCP tool for authentication barriers, data extraction, form interactions, network capture, and web automation. Single tool, action-based dispatch, owned Chromium runtime.\n\n## Runtime\n\n- Tool: `browser({ action: ... 
})`\n- 11 actions: `open`, `read`, `act`, `navigate`, `network`, `console`, `fetch`, `eval`, `screenshot`, `dialog`, `session`\n- Modes: `headless` (CI), `ui` (desktop), `panel` (VS Code)\n- Install: `aikit browser install`\n- Auto-idle shutdown after timeout\n\n## When to Activate\n\n- `web_fetch` returns login HTML, SAML redirect, or CAPTCHA\n- `http` returns 401/403 and user confirms browser access works\n- `repo-access` Strategy Ladder exhausted — SSO/OAuth blocks CLI\n- Anti-bot detection (Cloudflare, \"verify you are human\")\n- User asks to browse, scrape, automate, test, or interact with a web app\n- Need screenshots, accessibility snapshots, or JS-rendered content\n- Need to capture network traffic or make authenticated API calls using page session\n\n## When NOT to Activate\n\n- Public pages `web_fetch` handles correctly\n- API endpoints reachable via `http` with auth headers\n- Static downloads via `http`\n- Tasks only needing raw HTML/links/outline\n\n## Two Automation Modes\n\n### Script Mode (Default — Imperative)\n\nDirect sequential `browser()` calls. Best for one-off tasks, testing, API capture.\n\n~~~text\n// Open → Read → Act → Read loop\nbrowser({ action: 'open', url: 'https://app.example.com', mode: 'ui' })\nbrowser({ action: 'read', pageId })\nbrowser({ action: 'act', pageId, kind: 'click', ref: '@login-button' })\nbrowser({ action: 'read', pageId }) // verify state changed\n~~~\n\n**Network Intelligence pattern:**\n\n~~~text\nbrowser({ action: 'network', pageId, subAction: 'enable', filter: { resourceTypes: ['xhr', 'fetch'] } })\n// ... 
navigate/interact to trigger API calls ...\nbrowser({ action: 'network', pageId, subAction: 'get' })\nbrowser({ action: 'network', pageId, subAction: 'export-har' })\n~~~\n\n**Authenticated API calls (using page cookies/session):**\n\n~~~text\nbrowser({ action: 'fetch', pageId, fetchUrl: 'https://app.example.com/api/data', fetchMethod: 'GET' })\n~~~\n\nExecutes `fetch()` in the page, so cookies, session state, and CSRF tokens are reused automatically.\n\n**Console capture:**\n\n~~~text\nbrowser({ action: 'console', pageId, consoleSubAction: 'enable' })\n// ... trigger page actions ...\nbrowser({ action: 'console', pageId, consoleSubAction: 'get', level: 'error' })\n~~~\n\n### Recipe Mode (Declarative)\n\nStructured step-by-step format for reusable workflows and domain skills. Each step declares Action, Verify, On Failure, and Extract fields.\n\nLoad [references/recipes.md](references/recipes.md) for full recipe templates and the recipe format specification.\n\nBrief recipe format:\n\n~~~text\nStep N: <description>\n Action: browser({ ... 
})\n Verify: <condition to check after action>\n On Failure: <recovery strategy>\n Extract: <data to capture for next steps>\n~~~\n\n## Action Reference\n\n| Action | Purpose | Key Params |\n|--------|---------|------------|\n| `open` | Launch page | `url`, `mode` (ui/headless/panel), `waitUntil` |\n| `read` | Extract content | `pageId`, `readMode` (snapshot/dom/markdown/text), `selector` |\n| `act` | DOM interaction | `pageId`, `kind`, `ref`/`selector`, `text`/`key`/`value` |\n| `navigate` | Page navigation | `pageId`, `url` or `type` (back/forward/reload/waitFor) |\n| `network` | Capture traffic | `pageId`, `subAction` (enable/get/clear/export-har), `filter` |\n| `console` | Capture console | `pageId`, `consoleSubAction` (enable/get/clear), `level` |\n| `fetch` | Page-context HTTP | `pageId`, `fetchUrl`, `fetchMethod`, `fetchHeaders`, `fetchBody` |\n| `eval` | Execute JS | `pageId`, `code` |\n| `screenshot` | Capture image | `pageId`, `selector`, `fullPage`, `clip`, `format` |\n| `dialog` | Pre-register handler for NEXT dialog | `pageId`, `accept`, `promptText` |\n| `session` | Manage sessions | `sessionAction` (list/close/cookies/set-cookie/get-storage/...) |\n\n## Read Modes\n\n| Mode | Output | Use Case |\n|------|--------|----------|\n| `snapshot` | ARIA accessibility tree with refs | Element targeting, form interaction |\n| `dom` | Raw HTML | HTML structure, debugging |\n| `markdown` | Clean readable text | Content extraction, summarization |\n| `text` | Plain text | Simple text extraction |\n\n## Interaction Kinds\n\n| Kind | Required Params | Notes |\n|------|-----------------|-------|\n| `click` | `ref` or `selector` | Left-click element |\n| `type` | `ref`/`selector` + `text` | Type into input/textarea |\n| `press` | `ref`/`selector` + `key` | Send key to element. Requires a target — use `ref` from snapshot or `selector`. 
|\n| `hover` | `ref`/`selector` | Trigger hover states |\n| `drag` | `fromRef`/`fromSelector` + `toRef`/`toSelector` | Drag and drop |\n| `select` | `ref`/`selector` + `value` | Select dropdown option |\n| `scroll` | optional `ref`/`selector` | Scroll page or element |\n| `upload` | `ref`/`selector` + `value` (path) | File upload |\n\n### Element Targeting Priority\n\n1. **`ref`** (e.g., `@F12`) — From `read(snapshot)` ARIA tree. Most reliable.\n2. **`selector`** (e.g., `input[name='q']`) — Playwright CSS/attribute selector. Precise.\n3. **`element`** (e.g., `'Submit'`) — Text matching via `text=` locator. **Picks first DOM match regardless of visibility.** Fragile for complex widgets (comboboxes, ARIA roles). Last resort.\n\n**Always `read(snapshot)` first** to get refs before interacting.\n\n> **Visibility Warning**: Playwright `act` waits up to 30s for the target to be visible. If a selector or `element` matches a hidden element first, the action times out. The browser tool does NOT expose a `force` or custom `timeout` parameter.\n>\n> **Workarounds:**\n> - Append `:visible` to selectors: `selector: 'button:has-text(\"Submit\"):visible'`\n> - Use specific selectors instead of `element` when labels are ambiguous (e.g., \"Search\" may match 30+ elements)\n> - Use `read(snapshot)` refs (`@F12`) which always target the specific rendered element\n\n## Network Intelligence\n\nThree new actions for API reverse-engineering and authenticated requests:\n\n**`network`** — Passive traffic capture with circular buffer (200 entries default):\n- `enable`: Start capturing with optional filter (resourceTypes, urlPattern, excludeUrls)\n- `get`: Retrieve captured requests + responses with timing\n- `clear`: Reset buffer\n- `export-har`: Export as HAR 1.2 format\n\nHeaders are redacted by default (Authorization, Cookie, etc.). 
Pass `showSensitive: true` to see full headers.\n\n**`console`** — Browser console message capture (1000 entries default):\n- `enable`: Start capturing all console output\n- `get`: Retrieve messages, optionally filtered by `level`\n- `clear`: Reset buffer\n\n**`fetch`** — Execute HTTP from page context:\n- Uses the page's live cookies, session, CSRF tokens\n- Supports GET/POST/PUT/PATCH/DELETE/HEAD/OPTIONS\n- Body auto-truncated at 256KB\n- Alternative to extracting cookies then calling `http` tool\n\n**Workflow — Reverse-engineer API:**\n\n~~~text\n1. open target page\n2. network enable (filter: xhr, fetch)\n3. interact with the page (click buttons, submit forms)\n4. network get → see API endpoints, methods, headers\n5. fetch → replay API calls using page session\n~~~\n\n## Session Management\n\n| Action | Purpose | Note |\n|--------|---------|------|\n| `cookies` | Export page cookies | `confirm: true` required |\n| `set-cookie` | Inject cookies | `confirm: true` required |\n| `delete-cookie` / `clear-cookies` | Remove cookies | `confirm: true` required |\n| `get-storage` / `set-storage` / `clear-storage` | localStorage/sessionStorage | |\n| `list` | List open pages | |\n| `close` | Close a page | |\n\n## Security Model\n\n**Hard gates — NEVER bypass:**\n- Credentials go via terminal input (NEVER through tool params or chat)\n- CAPTCHA/MFA: pause and ask user\n- Never store tokens in conversation\n- Close pages containing sensitive data when done\n- Verify page URL before entering credentials (phishing prevention)\n- Use `headless` mode for automated non-interactive tasks; `ui` for user-supervised auth\n\n**Cookie safety gate:** All cookie read/write session actions (`cookies`, `set-cookie`, `delete-cookie`, `clear-cookies`) require `confirm: true` as an explicit acknowledgment. 
Without it, the tool returns an error.\n\n## Integration\n\n| Skill | Handoff Pattern |\n|-------|------------------|\n| `repo-access` | Strategy Ladder step 6 → browser-use for SSO/OAuth login |\n| `present` | `present({ format: 'browser' })` returns URL → open with browser tool |\n| `aikit` | `web_fetch` fails → browser-use activates |\n\n## Dialog Handling\n\n`dialog()` registers a **one-shot handler** for the NEXT dialog. It must be called **BEFORE** the action that triggers alert, confirm, or prompt.\n\n**Pattern:**\n~~~text\nbrowser({ action: 'dialog', pageId, accept: true })\nbrowser({ action: 'eval', pageId, code: 'confirm(\"Sure?\")' }) // or browser({ action: 'act', ... }) if interaction triggers it\n~~~\n\nFor `prompt` dialogs, pass `promptText` for the response.\n\n## Troubleshooting\n\n| Issue | Fix |\n|-------|-----|\n| \"Browser not installed\" | Run `aikit browser install` |\n| Element not found | `read` with `snapshot` mode first, use ref from ARIA tree |\n| Timeout on navigation | Add `waitUntil: 'networkidle'` to open/navigate |\n| SSO redirect loop | Check cookies with `session({ sessionAction: 'cookies' })` |\n| Anti-bot block | Try `mode: 'ui'`, add delays between actions |\n| Network capture empty | Ensure `enable` called BEFORE navigating |\n\n## Decision Flow\n\n~~~text\nNeed browser?\n├─ Can web_fetch/http handle it? → NO browser needed\n├─ Login wall / SSO / CAPTCHA? → browser-use (Script mode for one-off, Recipe for reusable)\n├─ Need to capture API traffic? → network enable → interact → network get\n├─ Need authenticated API calls? → fetch action (uses page session)\n├─ JS-rendered content? → open + read(markdown)\n├─ Form interaction? → Script mode: open → read(snapshot) → act → verify\n└─ Reusable workflow? → Recipe mode (see references/recipes.md)\n~~~\n"},{file:`references/recipes.md`,content:`# Browser Recipes & Domain Skills
2
+
3
+ Reference file for reusable browser automation patterns. Load this when building domain-specific browser workflows.
4
+
5
+ ## Recipe Format
6
+
7
+ Each recipe step declares:
8
+ - **Action**: The browser() call to execute
9
+ - **Verify**: Condition to check (read page, check element, verify URL)
10
+ - **On Failure**: Recovery strategy (retry, alternative selector, escalate)
11
+ - **Extract**: Data to capture for subsequent steps
12
+
13
+ ## Recipe Templates
14
+
15
+ ### 1. Form Login
16
+
17
+ ~~~text
18
+ Step 1: Open login page
19
+ Action: browser({ action: 'open', url: '<login-url>', mode: 'ui' })
20
+ Verify: Page contains login form (read snapshot, check for username/password fields)
21
+ On Failure: URL may have changed check redirects
22
+ Extract: pageId
23
+
24
+ Step 2: Read form structure
25
+ Action: browser({ action: 'read', pageId, readMode: 'snapshot' })
26
+ Verify: Found username field, password field, submit button
27
+ On Failure: Try readMode 'dom' for hidden fields
28
+ Extract: field refs (@username, @password, @submit)
29
+
30
+ Step 3: Enter credentials
31
+ Action: browser({ action: 'act', pageId, kind: 'type', ref: '@username', text: '<from-user>' })
32
+ Action: browser({ action: 'act', pageId, kind: 'type', ref: '@password', text: '<from-user>' })
33
+ Verify: Fields populated (read snapshot)
34
+ On Failure: Field may need click-to-focus first
35
+
36
+ Step 4: Submit
37
+ Action: browser({ action: 'act', pageId, kind: 'click', ref: '@submit' })
38
+ Verify: URL changed OR dashboard content appears (read markdown)
39
+ On Failure: Check for error messages, CAPTCHA, MFA prompt
40
+ Extract: authenticated session state
41
+
42
+ Step 5: Verify authentication
43
+ Action: browser({ action: 'read', pageId, readMode: 'markdown' })
44
+ Verify: Page shows authenticated content (username, dashboard, logout link)
45
+ On Failure: Retry login or escalate to user
46
+ ~~~
47
+
48
+ ### 2. OAuth/SSO Flow
49
+
50
+ ~~~text
51
+ Step 1: Open target application
52
+ Action: browser({ action: 'open', url: '<app-url>', mode: 'ui' })
53
+ Verify: Redirected to SSO provider (check URL domain change)
54
+ Extract: pageId, SSO provider URL
55
+
56
+ Step 2: Read SSO login page
57
+ Action: browser({ action: 'read', pageId, readMode: 'snapshot' })
58
+ Verify: SSO form present (Okta, Azure AD, etc.)
59
+ On Failure: May need to click "Sign in with SSO" button first
60
+ Extract: form field refs
61
+
62
+ Step 3: Enter SSO credentials (pause for user)
63
+ Action: PAUSE — ask user to enter credentials in browser window
64
+ Verify: User confirms login complete OR URL returns to app domain
65
+ On Failure: Check for MFA prompt, ask user to complete
66
+
67
+ Step 4: Verify return to application
68
+ Action: browser({ action: 'read', pageId, readMode: 'markdown' })
69
+ Verify: Back on application domain with authenticated content
70
+ On Failure: Cookie may not have been set — check session cookies
71
+ Extract: session state
72
+
73
+ Step 5: Export session (optional)
74
+ Action: browser({ action: 'session', pageId, sessionAction: 'cookies', confirm: true })
75
+ Verify: Got authentication cookies
76
+ Extract: cookies for http tool usage
77
+ ~~~
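Steps 1 and 4 both depend on which host the page is currently on. Here is a small sketch of that check. It assumes you already have the page's current URL. The `ssoStage` name and the identity-provider host list are hypothetical inputs that you supply. Returning `'unknown'` for any other host also supports the phishing check in the Security Model: do not enter credentials on an unexpected host.

```javascript
// Hypothetical helper for the SSO recipe's Verify steps.
// idpHosts is a caller-supplied list of identity-provider domains (an assumption).
function ssoStage(currentUrl, appUrl, idpHosts) {
  const host = new URL(currentUrl).hostname;
  if (host === new URL(appUrl).hostname) return 'app';
  // Match the IdP domain itself or any subdomain of it, never a mere prefix.
  const onIdp = idpHosts.some((h) => host === h || host.endsWith('.' + h));
  return onIdp ? 'idp' : 'unknown';
}

const idps = ['okta.com', 'login.microsoftonline.com'];
console.log(ssoStage('https://acme.okta.com/app/sso/saml', 'https://app.example.com', idps)); // idp
console.log(ssoStage('https://app.example.com/home', 'https://app.example.com', idps)); // app
console.log(ssoStage('https://okta.com.evil.test/login', 'https://app.example.com', idps)); // unknown
```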
78
+
79
+ ### 3. Data Extraction
80
+
81
+ ~~~text
82
+ Step 1: Navigate to data page
83
+ Action: browser({ action: 'open', url: '<data-url>', mode: 'headless' })
84
+ Verify: Page loaded successfully
85
+ Extract: pageId
86
+
87
+ Step 2: Wait for dynamic content
88
+ Action: browser({ action: 'navigate', pageId, type: 'waitFor', selector: '<data-container>' })
89
+ Verify: Target element present in DOM
90
+ On Failure: Increase timeout, check if JS is needed
91
+
92
+ Step 3: Extract structured data
93
+ Action: browser({ action: 'eval', pageId, code: 'Array.from(document.querySelectorAll("<selector>")).map(...)' })
94
+ Verify: Got expected data structure
95
+ On Failure: Try different selector, use read(dom) to inspect structure
96
+ Extract: structured data
97
+
98
+ Step 4: Paginate (if needed)
99
+ Action: browser({ action: 'act', pageId, kind: 'click', ref: '@next-page' })
100
+ Verify: New content loaded (different from previous page)
101
+ On Failure: End of pagination reached
102
+ Extract: additional data, loop back to Step 3
103
+ ~~~
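Steps 3 and 4 form a loop whose stop condition is easy to get wrong. This minimal sketch assumes hypothetical `readPage` and `goNext` callbacks that wrap the `eval` and `act` calls above. The loop stops in two cases: the next-page control is gone, or a click leaves the content unchanged. It also caps the page count, so a stuck paginator cannot loop forever.

```javascript
// Hypothetical pagination loop for the Data Extraction recipe.
// readPage(): returns the rows currently rendered (Step 3).
// goNext(): clicks the next-page control; returns false when there is none (Step 4).
function paginate(readPage, goNext, maxPages = 50) {
  const rows = [];
  let previous = null;
  for (let i = 0; i < maxPages; i++) {
    const pageRows = readPage();
    const signature = JSON.stringify(pageRows);
    // Verify: a page identical to the previous one means the click did nothing.
    if (signature === previous) break;
    previous = signature;
    rows.push(...pageRows);
    // On Failure: no next-page control means the end of pagination was reached.
    if (!goNext()) break;
  }
  return rows;
}

// Usage with simulated pages; the last click re-renders the same content.
const pages = [['a', 'b'], ['c'], ['c']];
let current = 0;
const collected = paginate(
  () => pages[current],
  () => {
    if (current >= pages.length - 1) return false;
    current += 1;
    return true;
  },
);
console.log(collected); // [ 'a', 'b', 'c' ]
```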
104
+
105
+ ### 4. File Upload
106
+
107
+ ~~~text
108
+ Step 1: Navigate to upload page
109
+ Action: browser({ action: 'open', url: '<upload-url>', mode: 'ui' })
110
+ Verify: Upload form present
111
+ Extract: pageId
112
+
113
+ Step 2: Locate upload input
114
+ Action: browser({ action: 'read', pageId, readMode: 'snapshot' })
115
+ Verify: Found file input element
116
+ Extract: file input ref
117
+
118
+ Step 3: Upload file
119
+ Action: browser({ action: 'act', pageId, kind: 'upload', ref: '@file-input', value: '<local-file-path>' })
120
+ Verify: File name appears in UI
121
+ On Failure: Input may be hidden — try selector instead of ref
122
+
123
+ Step 4: Submit upload
124
+ Action: browser({ action: 'act', pageId, kind: 'click', ref: '@upload-button' })
125
+ Verify: Success message or progress indicator
126
+ On Failure: Check file size limits, format restrictions
127
+ Extract: upload result (URL, ID, etc.)
128
+ ~~~
129
+
130
+ ### 5. API Reverse-Engineering (Network Intelligence)
131
+
132
+ ~~~text
133
+ Step 1: Open application and enable capture
134
+ Action: browser({ action: 'open', url: '<app-url>', mode: 'ui' })
135
+ Action: browser({ action: 'network', pageId, subAction: 'enable', filter: { resourceTypes: ['xhr', 'fetch'] } })
136
+ Verify: Network capture enabled
137
+ Extract: pageId
138
+
139
+ Step 2: Trigger target API calls
140
+ Action: browser({ action: 'act', pageId, kind: 'click', ref: '@action-button' })
141
+ Verify: Page state changed (new data loaded)
142
+ On Failure: Try different interaction to trigger API
143
+
144
+ Step 3: Review captured traffic
145
+ Action: browser({ action: 'network', pageId, subAction: 'get' })
146
+ Verify: Found target API endpoints in captured requests
147
+ Extract: API URL, method, headers, response structure
148
+
149
+ Step 4: Replay with fetch (using page session)
150
+ Action: browser({ action: 'fetch', pageId, fetchUrl: '<captured-api-url>', fetchMethod: 'GET' })
151
+ Verify: Got same response as captured
152
+ Extract: API response data
153
+
154
+ Step 5: Export HAR (optional)
155
+ Action: browser({ action: 'network', pageId, subAction: 'export-har' })
156
+ Extract: HAR file for documentation or replay
157
+ ~~~
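SKILL.md describes `network` capture as a circular buffer (200 entries by default) with an optional filter on resource types, URL pattern, and excluded URLs. The sketch below models that behavior to show why Step 1 filters to `xhr` and `fetch`. Static assets never enter the buffer, so they cannot push API calls out of it.

Several parts of the sketch are assumptions, not the tool's implementation:

- the `CaptureBuffer` class itself;
- the `{ url, resourceType }` entry shape;
- substring matching for `excludeUrls`.

`urlPattern` is left out because its matching rules are not documented here.

```javascript
// Hypothetical model of a filtered, bounded capture buffer (not the tool's real code).
class CaptureBuffer {
  constructor({ capacity = 200, resourceTypes = null, excludeUrls = [] } = {}) {
    this.capacity = capacity;
    this.resourceTypes = resourceTypes ? new Set(resourceTypes) : null;
    this.excludeUrls = excludeUrls;
    this.entries = [];
  }

  matches(entry) {
    if (this.resourceTypes && !this.resourceTypes.has(entry.resourceType)) return false;
    return !this.excludeUrls.some((part) => entry.url.includes(part));
  }

  record(entry) {
    if (!this.matches(entry)) return;
    this.entries.push(entry);
    // Circular behavior: drop the oldest entry once capacity is exceeded.
    if (this.entries.length > this.capacity) this.entries.shift();
  }

  get() {
    return this.entries.slice();
  }

  clear() {
    this.entries = [];
  }
}

const buf = new CaptureBuffer({ capacity: 2, resourceTypes: ['xhr', 'fetch'], excludeUrls: ['/analytics'] });
buf.record({ url: 'https://app.example.com/app.js', resourceType: 'script' }); // filtered out by type
buf.record({ url: 'https://app.example.com/api/a', resourceType: 'xhr' });
buf.record({ url: 'https://app.example.com/analytics', resourceType: 'fetch' }); // excluded by URL
buf.record({ url: 'https://app.example.com/api/b', resourceType: 'fetch' });
buf.record({ url: 'https://app.example.com/api/c', resourceType: 'xhr' }); // evicts /api/a
console.log(buf.get().map((e) => e.url)); // the /api/b and /api/c entries
```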
158
+
159
+ ### 6. Monitoring / Health Check
39
160
 
40
- ### Proactive Triggers
161
+ ~~~text
162
+ Step 1: Open target page
163
+ Action: browser({ action: 'open', url: '<url>', mode: 'headless', waitUntil: 'networkidle' })
164
+ Verify: Page loaded without errors
165
+ Extract: pageId
41
166
 
42
- - Task involves an internal or enterprise web application with SSO.
43
- - User asks to browse, scrape, test, or automate a website.
44
- - A workflow already uses \`present({ format: 'browser' })\` and you need to open the returned local dashboard URL.
167
+ Step 2: Check for error indicators
168
+ Action: browser({ action: 'console', pageId, consoleSubAction: 'enable' })
169
+ Action: browser({ action: 'read', pageId, readMode: 'markdown' })
170
+ Verify: No error banners, expected content present
171
+ Extract: page health status
45
172
 
46
- ## When NOT to Activate
173
+ Step 3: Check console for JS errors
174
+ Action: browser({ action: 'console', pageId, consoleSubAction: 'get', level: 'error' })
175
+ Verify: No critical JS errors
176
+ Extract: error list (if any)
47
177
 
48
- - Public pages that \`web_fetch\` handles correctly and do not require interaction.
49
- - API endpoints that are reachable via \`http\` with proper auth headers.
50
- - Static downloads that work through \`http\` or repo-local tooling.
51
- - Tasks that only need raw HTML, links, or outline extraction.
178
+ Step 4: Screenshot for visual regression
179
+ Action: browser({ action: 'screenshot', pageId, fullPage: true })
180
+ Extract: screenshot for comparison
181
+ ~~~
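Here is a sketch of how Steps 2 and 3 might be reduced to a single pass/fail result. The `summarizeHealth` name and the `{ level, text }` shape for console entries are assumptions. The page text is whatever `read` with `readMode: 'markdown'` returned.

```javascript
// Hypothetical health summary for the Monitoring recipe.
// consoleMessages: entries assumed to look like { level, text }.
// pageText: output of read(markdown); expectedText: content that must be present.
function summarizeHealth({ consoleMessages, pageText, expectedText }) {
  const errors = consoleMessages.filter((m) => m.level === 'error').map((m) => m.text);
  const contentOk = pageText.includes(expectedText);
  return { healthy: contentOk && errors.length === 0, contentOk, errors };
}

const report = summarizeHealth({
  consoleMessages: [
    { level: 'log', text: 'app booted' },
    { level: 'error', text: 'TypeError: x is undefined' },
  ],
  pageText: '# Status\nAll systems operational',
  expectedText: 'All systems operational',
});
console.log(report.healthy, report.errors); // false, with one error collected
```

This recipe enables console capture only after `open` has loaded the page. Errors thrown during that initial load are therefore likely not captured. One way to include them is to enable capture first and then reload with `navigate` (`type: 'reload'`).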
52
182
 
53
- ## Browser Action Reference
183
+ ## Creating Domain Skills
54
184
 
55
- | Action | Purpose | Key Params |
56
- |--------|---------|------------|
57
- | \`open\` | Launch browser page | \`url\`, \`mode\` (\`ui\`/\`headless\`/\`panel\`), \`waitUntil\` |
58
- | \`read\` | Extract page content | \`pageId\`, \`readMode\` (\`snapshot\`/\`dom\`/\`markdown\`/\`text\`), \`selector\` |
59
- | \`act\` | DOM interactions | \`pageId\`, \`kind\` (\`click\`/\`type\`/\`press\`/\`hover\`/\`drag\`/\`select\`/\`scroll\`/\`upload\`) |
60
- | \`navigate\` | Page navigation | \`pageId\`, \`url\`/\`type\`/\`selector\` |
61
- | \`eval\` | Execute JavaScript | \`pageId\`, \`code\` |
62
- | \`screenshot\` | Capture screenshots | \`pageId\`, \`selector\`, \`fullPage\`, \`clip\`, \`format\`, \`quality\` |
63
- | \`dialog\` | Handle dialogs | \`pageId\`, \`accept\`, \`promptText\` |
64
- | \`session\` | Session management | \`sessionAction\` (\`list\`/\`close\`/\`cookies\`/\`set-cookie\`/\`delete-cookie\`/\`clear-cookies\`/\`get-storage\`/\`set-storage\`/\`clear-storage\`) |
185
+ Domain skills use browser-use recipes as building blocks for specific automation tasks (e.g., "jira-automation", "salesforce-extract").
65
186
 
66
- ## Core Workflow
187
+ ### Structure
67
188
 
68
- Every browser task follows the same loop:
189
+ A domain skill should:
190
+ 1. Import the browser-use skill as a dependency
191
+ 2. Define domain-specific recipes using the recipe format above
192
+ 3. Include domain knowledge (selectors, URLs, auth patterns for that service)
193
+ 4. Provide error recovery specific to that domain
69
194
 
70
- \`\`\`
71
- 1. OPEN → browser({ action: 'open', url: '<target>', mode: 'ui' })
72
- 2. READ → browser({ action: 'read', pageId })
73
- 3. ACT → browser({ action: 'act', pageId, kind: 'click' | 'type' | 'press' | 'hover' | 'drag' | 'select', ... })
74
- 4. READ → browser({ action: 'read', pageId })
75
- 5. LOOP → Repeat steps 3-4 until the task is complete
76
- \`\`\`
195
+ ### Quality Checklist
77
196
 
78
- ## Usage Examples
197
+ - [ ] Every recipe step has Action + Verify + On Failure
198
+ - [ ] Credentials handled via user input (never hardcoded)
199
+ - [ ] Selectors are resilient (prefer aria labels, data-testid, role over CSS classes)
200
+ - [ ] Timeouts configured for slow pages
201
+ - [ ] Screenshots taken at key verification points
202
+ - [ ] Session cleanup (close page when done)
203
+ - [ ] Works in both headless and ui modes
79
204
 
80
- ### Open and Inspect a Page
205
+ ### Composition
81
206
 
82
- \`\`\`
83
- const { pageId } = await browser({ action: 'open', url: 'https://example.com', mode: 'ui' })
84
- await browser({ action: 'read', pageId })
85
- \`\`\`
207
+ Recipes compose by chaining — output of one recipe feeds into another:
86
208
 
87
- ### Login to a Web Application
209
+ ~~~text
210
+ Login Recipe → (authenticated pageId) → Data Extraction Recipe → (data) → Present Results
211
+ ~~~
88
212
 
89
- \`\`\`
90
- const { pageId } = await browser({ action: 'open', url: 'https://example.com/login', mode: 'ui' })
213
+ For complex flows:
91
214
 
92
- await browser({ action: 'read', pageId })
93
- await browser({ action: 'act', pageId, kind: 'type', ref: '@username-input', text: 'user@example.com' })
94
- await browser({ action: 'act', pageId, kind: 'type', ref: '@password-input', text: '<user-provided>' })
95
- await browser({ action: 'act', pageId, kind: 'click', ref: '@login-button' })
96
- await browser({ action: 'read', pageId })
97
- \`\`\`
215
+ ~~~text
216
+ Login Recipe → Network Enable → Interaction Recipe → Network Get → Fetch API → Process Data
217
+ ~~~
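The chain from Login Recipe to Present Results is ordinary function composition: each recipe receives the previous recipe's output. Here is a minimal sketch. The helper and the stand-in recipe functions are hypothetical names, used in place of the real recipes.

```javascript
// Hypothetical composition helper: each recipe maps the previous output to its own.
function chainRecipes(...recipes) {
  return (input) => recipes.reduce((value, recipe) => recipe(value), input);
}

// Stand-ins for real recipes (illustrative only).
const loginRecipe = () => ({ pageId: 'p1' }); // Form Login -> authenticated pageId
const extractRecipe = ({ pageId }) => ({ pageId, rows: [{ id: 1 }, { id: 2 }] }); // Data Extraction
const presentResults = ({ rows }) => 'Found ' + rows.length + ' rows';

const pipeline = chainRecipes(loginRecipe, extractRecipe, presentResults);
console.log(pipeline()); // Found 2 rows
```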
218
+ `},{file:`references/auth-patterns.md`,content:`# Authentication Patterns
98
219
 
99
- **Rule:** ask the user for credentials and 2FA codes. Never guess, reuse, or log them.
220
+ Browser-based authentication strategies for different auth mechanisms.
100
221
 
101
- ### Extract Content from an Authenticated Page
222
+ ## Pattern 1: Basic Form Login
102
223
 
103
- \`\`\`
104
- const { pageId } = await browser({ action: 'open', url: 'https://internal.company.com/docs', mode: 'ui' })
105
- await browser({ action: 'read', pageId })
224
+ **When:** Simple username/password form without SSO.
106
225
 
107
- await browser({
108
- action: 'eval',
109
- pageId,
110
- code: "return page.evaluate(() => document.querySelector('main')?.innerText ?? '')",
111
- })
112
- \`\`\`
226
+ **Steps:**
227
+ 1. \`open\` login page
228
+ 2. \`read\` snapshot to find form fields
229
+ 3. \`act\` type credentials (ask user for input via terminal)
230
+ 4. \`act\` click submit
231
+ 5. \`read\` to verify authenticated state
113
232
 
114
- ### Navigate, Hover, and Capture a Screenshot
233
+ **Verification:** URL changed to dashboard, or page contains user-specific content.
115
234
 
116
- \`\`\`
117
- await browser({ action: 'navigate', pageId, url: 'https://example.com/dashboard' })
118
- await browser({ action: 'act', pageId, kind: 'hover', selector: '[data-help]' })
119
- await browser({ action: 'screenshot', pageId })
120
- \`\`\`
235
+ **Fallback:** Check for error messages, CAPTCHA, or MFA prompts.
121
236
 
122
- ### Session Management
237
+ ## Pattern 2: OAuth2 / SSO (Okta, Azure AD, Google)
123
238
 
124
- \`\`\`
125
- await browser({ action: 'session', sessionAction: 'list' })
126
- await browser({ action: 'session', sessionAction: 'cookies', pageId })
127
- await browser({ action: 'session', sessionAction: 'close', pageId })
128
- \`\`\`
239
+ **When:** Application redirects to external identity provider.
129
240
 
130
- Use cookie export only when the user explicitly needs session transfer back into CLI tools.
241
+ **Steps:**
242
+ 1. \`open\` application URL — observe redirect to IdP
243
+ 2. \`read\` SSO page snapshot
244
+ 3. PAUSE — ask user to complete SSO login in the browser window
245
+ 4. Detect redirect back to application (URL change)
246
+ 5. \`read\` to verify authenticated
131
247
 
132
- ## Read Modes
248
+ **Verification:** URL domain returned to application, authenticated content visible.
133
249
 
134
- ### Get ARIA snapshot (default)
250
+ **Fallback:** User may need to complete MFA or consent screens manually.
135
251
 
136
- \`\`\`
137
- browser({ action: 'read', pageId })
138
- browser({ action: 'read', pageId, readMode: 'snapshot' })
139
- \`\`\`
252
+ ## Pattern 3: Cookie/Session Transfer
140
253
 
141
- ### Get page as clean markdown
254
+ **When:** Need to use captured session in \`http\` tool for API calls.
142
255
 
143
- \`\`\`
144
- browser({ action: 'read', pageId, readMode: 'markdown' })
145
- \`\`\`
256
+ **Steps:**
257
+ 1. Complete authentication via Pattern 1 or 2
258
+ 2. \`session({ sessionAction: 'cookies', confirm: true })\` — export all cookies
259
+ 3. Use cookies in \`http\` tool: \`http({ url, headers: { Cookie: '<exported>' } })\`
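Step 3 needs the exported cookies flattened into a single `Cookie` header. This minimal sketch makes two simplifications:

- It assumes exported cookies have the `{ name, value, domain, path }` shape. That shape is an assumption about the export format.
- It applies simplified RFC 6265 domain and path matching, and ignores expiry and the `Secure` flag.

The page-context `fetch` alternative described in this pattern avoids the step entirely.

```javascript
// Hypothetical helper: build a Cookie header for http() from exported cookies.
// Assumes each cookie looks like { name, value, domain, path }.
function cookieHeaderFor(url, cookies) {
  const { hostname, pathname } = new URL(url);
  // Domain match: exact host, or a subdomain of the cookie domain (leading dot ignored).
  const domainMatches = (domain) => {
    const d = domain.replace(/^\./, '').toLowerCase();
    return hostname === d || hostname.endsWith('.' + d);
  };
  // Path match: exact path, or a prefix that ends on a "/" boundary.
  const pathMatches = (cookiePath = '/') =>
    pathname === cookiePath ||
    (pathname.startsWith(cookiePath) &&
      (cookiePath.endsWith('/') || pathname[cookiePath.length] === '/'));
  return cookies
    .filter((c) => domainMatches(c.domain) && pathMatches(c.path))
    .map((c) => c.name + '=' + c.value)
    .join('; ');
}

const exported = [
  { name: 'session', value: 'abc', domain: '.example.com', path: '/' },
  { name: 'csrf', value: 'xyz', domain: 'app.example.com', path: '/api' },
  { name: 'other', value: '1', domain: 'other.test', path: '/' },
];
console.log(cookieHeaderFor('https://app.example.com/api/data', exported)); // session=abc; csrf=xyz
```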
146
260
 
147
- ### Get HTML content (full page or scoped)
261
+ **Alternative (Recommended):** Use \`browser({ action: 'fetch', pageId, fetchUrl: '<api-endpoint>' })\` instead of extracting cookies. The \`fetch\` action executes HTTP requests directly in the page context, automatically using the page's cookies, session, and CSRF tokens. No cookie extraction or manual header management needed.
148
262
 
149
- \`\`\`
150
- browser({ action: 'read', pageId, readMode: 'dom' })
151
- browser({ action: 'read', pageId, readMode: 'dom', selector: 'main' })
152
- \`\`\`
263
+ **Verification:** API response returns authenticated data.
153
264
 
154
- ### Get plain text
265
+ ## Pattern 4: Client Certificate / mTLS
155
266
 
156
- \`\`\`
157
- browser({ action: 'read', pageId, readMode: 'text', selector: '.article-content' })
158
- \`\`\`
267
+ **When:** Site requires client certificate authentication.
159
268
 
160
- ## Advanced Screenshots
269
+ **Steps:**
270
+ 1. Certificate must be installed in the system cert store
271
+ 2. \`open\` with \`mode: 'ui'\` — browser will prompt for cert selection
272
+ 3. PAUSE — user selects certificate
273
+ 4. Verify page loads authenticated content
161
274
 
162
- ### Capture specific region
275
+ **Note:** Headless mode may not support cert picker — use \`ui\` mode.
163
276
 
164
- \`\`\`
165
- browser({ action: 'screenshot', pageId, clip: { x: 0, y: 0, width: 800, height: 600 } })
166
- \`\`\`
277
+ ## Pattern 5: Multi-Factor Authentication (MFA)
167
278
 
168
- ### JPEG format with quality
279
+ **When:** Login requires second factor (TOTP, SMS, push notification).
169
280
 
170
- \`\`\`
171
- browser({ action: 'screenshot', pageId, format: 'jpeg', quality: 80 })
172
- \`\`\`
281
+ **Steps:**
282
+ 1. Complete username/password entry
283
+ 2. \`read\` snapshot — detect MFA prompt type
284
+ 3. PAUSE — ask user to complete MFA (enter code, approve push, etc.)
285
+ 4. \`read\` to verify MFA completed and authenticated
173
286
 
174
- ### Element screenshot with format
287
+ **Hard Rule:** NEVER attempt to automate MFA — always defer to user.
175
288
 
176
- \`\`\`
177
- browser({ action: 'screenshot', pageId, selector: '.chart', format: 'png' })
178
- \`\`\`
+ ## Pattern 6: API Key Discovery
 
- ## Cookie Management
+ **When:** Need to find API keys or tokens from an authenticated web session.
 
- ### Set cookies
+ **Steps:**
+ 1. Authenticate using Pattern 1 or 2
+ 2. \`network({ subAction: 'enable', filter: { resourceTypes: ['xhr', 'fetch'] } })\`
+ 3. Navigate to pages that trigger API calls
+ 4. \`network({ subAction: 'get' })\` — inspect Authorization headers
+ 5. Extract bearer tokens, API keys from captured request headers
 
- \`\`\`
- browser({ action: 'session', sessionAction: 'set-cookie', confirm: true, cookies: [{ name: 'token', value: 'abc', domain: '.example.com', path: '/' }] })
- \`\`\`
+ **Important:** Pass \`showSensitive: true\` to see full Authorization headers (redacted by default).
 
- ### Delete specific cookie
+ **Alternative:** Use \`eval\` to read tokens from localStorage/sessionStorage:
 
- \`\`\`
- browser({ action: 'session', sessionAction: 'delete-cookie', confirm: true, name: 'tracking' })
- \`\`\`
+ ~~~text
+ browser({ action: 'eval', pageId, code: 'localStorage.getItem("authToken")' })
+ ~~~
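Step 5 of Pattern 6 amounts to scanning captured request headers for `Authorization: Bearer …` values. The sketch below assumes each captured request is a `{ url, headers }` object with header names in arbitrary case; that record shape is an assumption, not the documented output of `network({ subAction: 'get' })`. With default redaction the values will not be usable, so this only makes sense after `showSensitive: true`.

```javascript
// Sketch: collect unique bearer tokens from captured request records.
// ASSUMPTION: each record is { url: string, headers: { [name]: string } };
// the real network capture format may differ.
function extractBearerTokens(requests) {
  const found = new Map(); // token -> first URL where it was seen
  for (const req of requests) {
    for (const [name, value] of Object.entries(req.headers || {})) {
      // HTTP header names are case-insensitive.
      if (name.toLowerCase() !== "authorization" || typeof value !== "string") continue;
      const match = /^Bearer\s+(\S+)$/i.exec(value.trim());
      if (match && !found.has(match[1])) found.set(match[1], req.url);
    }
  }
  return [...found].map(([token, url]) => ({ token, url }));
}

const captured = [
  { url: "https://api.example.com/me", headers: { Authorization: "Bearer abc123" } },
  { url: "https://api.example.com/items", headers: { authorization: "Bearer abc123" } },
  { url: "https://cdn.example.com/app.js", headers: {} },
];
const tokens = extractBearerTokens(captured); // never print real tokens
```

Deduplicating by token keeps the result small when the same session token is attached to every API call; treat the returned values as secrets under the security model.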
 
- ### Clear all cookies
+ ## Pattern 7: Session Persistence Across Tasks
 
- \`\`\`
- browser({ action: 'session', sessionAction: 'clear-cookies', confirm: true })
- \`\`\`
+ **When:** Need to maintain authenticated state for multiple operations.
 
- ## Storage Access
+ **Steps:**
+ 1. Authenticate once (any pattern above)
+ 2. Keep the page open (don't close)
+ 3. Use \`fetch\` action for subsequent API calls — inherits page session
+ 4. Or use \`navigate\` to move between pages — cookies persist
 
- ### Read all localStorage
-
- \`\`\`
- browser({ action: 'session', sessionAction: 'get-storage', pageId, storageType: 'localStorage' })
- \`\`\`
-
- ### Read specific key
-
- \`\`\`
- browser({ action: 'session', sessionAction: 'get-storage', pageId, storageType: 'localStorage', storageKey: 'user-preferences' })
- \`\`\`
-
- ### Set storage value
-
- \`\`\`
- browser({ action: 'session', sessionAction: 'set-storage', pageId, storageType: 'localStorage', storageKey: 'theme', storageValue: 'dark' })
- \`\`\`
-
- ### Clear sessionStorage
-
- \`\`\`
- browser({ action: 'session', sessionAction: 'clear-storage', pageId, storageType: 'sessionStorage' })
- \`\`\`
-
- ## Scroll and Upload
-
- ### Scroll down
-
- \`\`\`
- browser({ action: 'act', pageId, kind: 'scroll', value: 'down 500' })
- \`\`\`
-
- ### Scroll to top/bottom
-
- \`\`\`
- browser({ action: 'act', pageId, kind: 'scroll', value: 'top' })
- browser({ action: 'act', pageId, kind: 'scroll', value: 'bottom' })
- \`\`\`
-
- ### Scroll element into view
-
- \`\`\`
- browser({ action: 'act', pageId, kind: 'scroll', selector: '#target-element' })
- \`\`\`
-
- ### Upload file
-
- \`\`\`
- browser({ action: 'act', pageId, kind: 'upload', selector: 'input[type="file"]', value: '/path/to/file.pdf' })
- \`\`\`
-
- ### Upload multiple files
-
- \`\`\`
- browser({ action: 'act', pageId, kind: 'upload', selector: 'input[type="file"]', value: '["/path/file1.pdf", "/path/file2.pdf"]' })
- \`\`\`
-
- ## Browser Automation Recipes
-
- The browser tool is the foundation for multi-step web automation. Use this section to standardize recipes that domain-specific skills can consume, extend, and execute without inventing their own browser workflow format.
-
- ### Recipe Format
-
- A browser recipe is a markdown workflow with explicit metadata, variables, steps, and cleanup.
-
- #### Metadata
-
- - **Name** — Human-readable recipe name
- - **Trigger** — When the recipe should be used
- - **Target** — Domain, URL family, or app surface it operates on
- - **Mode** — \`headless\`, \`ui\`, or \`panel\`
- - **Requires Auth** — \`yes\` or \`no\`
- - **Destructive** — \`yes\` or \`no\`; destructive recipes require explicit user confirmation before execution
-
- #### Variables
-
- Define placeholders the agent must resolve before starting.
-
- - \`{{url}}\` — target URL
- - \`{{username}}\` — login or account identifier
- - \`{{file_path}}\` — file path for uploads
-
- For each variable, document what it means, whether the agent can infer it or must ask the user, whether it is sensitive, and an example value when that removes ambiguity.
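Resolving `{{variables}}` before a run is simple string work: list the placeholders, report any the agent still has to ask for, and substitute the rest. This is a minimal sketch; the assumption that names are word characters (`\w+`) is ours, not part of the recipe format.

```javascript
// Sketch: find and fill {{variable}} placeholders in a recipe step.
// ASSUMPTION: placeholder names use word characters only (letters, digits, _).
const PLACEHOLDER = /\{\{(\w+)\}\}/g;

function listVariables(text) {
  // Unique names, in order of first appearance.
  return [...new Set([...text.matchAll(PLACEHOLDER)].map((m) => m[1]))];
}

function resolveVariables(text, values) {
  // Own properties only, so names like "constructor" are not treated as resolved.
  const has = (name) => Object.prototype.hasOwnProperty.call(values, name);
  const missing = listVariables(text).filter((name) => !has(name));
  if (missing.length > 0) {
    // Unresolved: the agent must ask the user before running the step.
    return { ok: false, missing };
  }
  return { ok: true, text: text.replace(PLACEHOLDER, (_, name) => String(values[name])) };
}
```

For sensitive variables such as `{{password}}`, the safer choice is to leave them unresolved and have the user type the value in the browser UI rather than substituting it into text that may be logged.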
-
- #### Steps
-
- Each numbered step should include:
-
- 1. **Action** — exact \`browser(...)\` call
- 2. **Verify** — how to confirm the action succeeded
- 3. **On Failure** — recovery path if verification fails
- 4. **Extract** — data to capture for later steps or the final result
-
- #### Cleanup
-
- Cleanup always runs, even when earlier steps fail. Close pages, export only user-approved session state, and leave the browser runtime in a known state.
-
- #### Recipe Skeleton
-
- \`\`\`markdown
- # Recipe: <Name>
-
- ## Metadata
- - Name: <Human-readable name>
- - Trigger: <When to use it>
- - Target: <Domain or URL family>
- - Mode: headless
- - Requires Auth: no
- - Destructive: no
-
- ## Variables
- - \`{{url}}\` — Target URL
- - \`{{selector}}\` — Primary element selector
-
- ## Steps
- 1. Open target
- - Action: \`browser({ action: 'open', url: '{{url}}', mode: 'headless' })\`
- - Verify: Browser returns a \`pageId\`
- - On Failure: Retry once with \`waitUntil: 'load'\` or \`mode: 'ui'\`
- - Extract: Save \`pageId\`
-
- 2. Inspect page
- - Action: \`browser({ action: 'read', pageId, readMode: 'snapshot' })\`
- - Verify: Expected controls or content appear in output
- - On Failure: Reload and re-read
- - Extract: Save refs, selectors, visible labels
-
- ## Cleanup
- - \`browser({ action: 'session', sessionAction: 'close', pageId })\`
- \`\`\`
-
- ### Recipe Templates
-
- #### Recipe: Submit Web Form
-
- **Variables**
-
- - \`{{url}}\` — form page URL
- - \`{{fields}}\` — field values keyed by selector or control ref
-
- **Steps**
-
- 1. Open page
- - Action: \`browser({ action: 'open', url: '{{url}}', mode: 'headless', waitUntil: 'domcontentloaded' })\`
- - Verify: Browser returns a \`pageId\`
- - On Failure: Retry once with \`waitUntil: 'load'\` or \`mode: 'ui'\`
- - Extract: Save \`pageId\`
-
- 2. Read form structure
- - Action: \`browser({ action: 'read', pageId, readMode: 'snapshot' })\`
- - Verify: Form fields and submit button appear in output
- - On Failure: Re-read after reload or scope the read with a form selector
- - Extract: Required fields, labels, visible validation hints, selectors or refs
-
- 3. Fill fields
- - Action: For text inputs, use \`browser({ action: 'act', pageId, kind: 'type', selector: fieldSelector, text: value })\`
- - Action: For dropdowns, use \`browser({ action: 'act', pageId, kind: 'select', selector: fieldSelector, value: optionValue })\`
- - Action: For checkboxes or radio buttons, use \`browser({ action: 'act', pageId, kind: 'click', selector: fieldSelector })\`
- - Verify: Re-read affected fields or take a screenshot after the batch
- - On Failure: Re-read page, correct the selector, retry the failed field once
- - Extract: Inline validation messages and any server-provided field defaults
-
- 4. Verify form state
- - Action: \`browser({ action: 'screenshot', pageId, fullPage: true })\`
- - Verify: Screenshot shows required fields populated as expected
- - On Failure: Read visible validation messages with \`browser({ action: 'read', pageId, readMode: 'text' })\`
- - Extract: Evidence screenshot for the final report
-
- 5. Submit
- - Action: \`browser({ action: 'act', pageId, kind: 'click', selector: 'button[type="submit"]' })\`
- - Verify: \`browser({ action: 'read', pageId, readMode: 'text' })\` shows a success message or the page navigates to a confirmation state
- - On Failure: Inspect validation errors, fix fields, retry once
- - Extract: Success text, destination URL, confirmation number if present
-
- 6. Capture result
- - Action: \`browser({ action: 'read', pageId, readMode: 'markdown' })\`
- - Verify: Output contains the expected success state
- - On Failure: Fall back to \`readMode: 'text'\`
- - Extract: Confirmation content for downstream skills
-
- **Cleanup**
-
- - \`browser({ action: 'session', sessionAction: 'close', pageId })\`
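Step 3 of the form recipe maps each field type to one `act` kind. The sketch below turns a hypothetical `{{fields}}` spec into the argument objects for those calls; the `{ selector, kind, value }` field shape is an assumption (the kind would come from the snapshot read), and the sketch only builds arguments, it does not call the tool.

```javascript
// Sketch: turn a hypothetical {{fields}} spec into browser 'act' arguments,
// following the recipe's mapping: text -> 'type', dropdown -> 'select',
// checkbox/radio -> 'click'.
// ASSUMPTION: each field is { selector, kind, value }, with kind taken from
// an earlier snapshot read.
function planFieldActions(pageId, fields) {
  return fields.flatMap((field) => {
    switch (field.kind) {
      case "text":
        return [{ action: "act", pageId, kind: "type", selector: field.selector, text: String(field.value) }];
      case "select":
        return [{ action: "act", pageId, kind: "select", selector: field.selector, value: String(field.value) }];
      case "checkbox":
      case "radio":
        // Clicking toggles state, so click only when the field should end up checked.
        // ASSUMPTION: the field starts unchecked; re-read to confirm before clicking.
        return field.value ? [{ action: "act", pageId, kind: "click", selector: field.selector }] : [];
      default:
        throw new Error(`Unknown field kind: ${field.kind}`);
    }
  });
}
```

Planning the whole batch first makes it easy to show the user what will be entered before anything is typed, and to re-read only the affected fields afterward.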
-
- #### Recipe: Extract Data from Web Page
-
- **Variables**
-
- - \`{{url}}\` — target page URL
- - \`{{data_selector}}\` — selector for the content to extract
- - \`{{pagination_selector}}\` — selector for the next-page control, when pagination exists
-
- **Steps**
-
- 1. Open page
- - Action: \`browser({ action: 'open', url: '{{url}}', mode: 'headless' })\`
- - Verify: Browser returns a \`pageId\`
- - On Failure: Retry with \`waitUntil: 'networkidle'\`
- - Extract: Save \`pageId\`
-
- 2. Extract content
- - Action: \`browser({ action: 'read', pageId, readMode: 'markdown', selector: '{{data_selector}}' })\`
- - Verify: Output is non-empty and scoped to the requested selector
- - On Failure: Re-run with \`readMode: 'text'\` or confirm the selector with a snapshot read
- - Extract: Store extracted content for the current page
-
- 3. Check for pagination
- - Action: \`browser({ action: 'read', pageId, readMode: 'snapshot' })\`
- - Verify: Snapshot shows either a next-page control or a clear end state
- - On Failure: Reload once, then re-read
- - Extract: Whether \`{{pagination_selector}}\` exists and appears enabled
-
- 4. Advance when another page exists
- - Action: If \`{{pagination_selector}}\` is present and enabled, run \`browser({ action: 'act', pageId, kind: 'click', selector: '{{pagination_selector}}' })\`
- - Verify: \`browser({ action: 'navigate', pageId, type: 'waitFor', selector: '{{data_selector}}', timeoutMs: 30000 })\`
- - On Failure: Reload page and retry pagination once
- - Extract: Updated page content, page count, or cursor state
-
- 5. Repeat until no more pages
- - Action: Return to step 2 after successful pagination
- - Verify: Loop exits only when the next-page control is missing or disabled
- - On Failure: Stop and report partial results
- - Extract: Aggregate page-by-page results
-
- **Cleanup**
-
- - \`browser({ action: 'session', sessionAction: 'close', pageId })\`
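Steps 2 to 5 of the extraction recipe form a loop: read, check for a next page, advance, repeat. The sketch below shows that control flow with the browser calls abstracted into three hypothetical async callbacks; it adds a page cap (our assumption, not part of the recipe) so a next-page control that never disables cannot loop forever, and it retries a failed advance once before returning partial results.

```javascript
// Sketch of the Extract Data pagination loop (steps 2-5).
// The callbacks are hypothetical stand-ins for the browser calls:
//   readPage() -> read with readMode 'markdown' scoped to {{data_selector}}
//   hasNext()  -> snapshot read: is {{pagination_selector}} present and enabled?
//   goNext()   -> click {{pagination_selector}}, then navigate type 'waitFor'
async function extractAllPages({ readPage, hasNext, goNext, maxPages = 50 }) {
  const pages = [];
  while (pages.length < maxPages) {
    pages.push(await readPage());
    if (!(await hasNext())) return { pages, complete: true };
    // Step 4 "On Failure": retry pagination once, then stop with partial results.
    let advanced = false;
    let lastError = null;
    for (let attempt = 0; attempt < 2 && !advanced; attempt++) {
      try {
        await goNext();
        advanced = true;
      } catch (err) {
        lastError = err;
      }
    }
    if (!advanced) return { pages, complete: false, error: String(lastError) };
  }
  return { pages, complete: false, error: `stopped after ${maxPages} pages` };
}
```

Returning `complete: false` together with the pages collected so far matches the recipe's "stop and report partial results" rule instead of discarding work.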
-
- #### Recipe: Upload File to Web Service
-
- **Variables**
-
- - \`{{url}}\` — upload page URL
- - \`{{file_path}}\` — local file path to upload
- - \`{{file_input_selector}}\` — file input selector, usually \`input[type="file"]\`
-
- **Steps**
-
- 1. Open upload page
- - Action: \`browser({ action: 'open', url: '{{url}}', mode: 'headless' })\`
- - Verify: Upload page loads successfully
- - On Failure: Retry with \`mode: 'ui'\`
- - Extract: Save \`pageId\`
-
- 2. Inspect upload controls
- - Action: \`browser({ action: 'read', pageId, readMode: 'snapshot' })\`
- - Verify: File input and submit controls are present
- - On Failure: Re-read after reload or refine the selector
- - Extract: Confirm the file input selector and submit control
-
- 3. Upload file
- - Action: \`browser({ action: 'act', pageId, kind: 'upload', selector: '{{file_input_selector}}', value: '{{file_path}}' })\`
- - Verify: Selected filename appears in the page or read output
- - On Failure: Verify the file exists, confirm the selector targets a real \`<input type="file">\`, retry once
- - Extract: Selected filename and any client-side validation message
-
- 4. Submit upload
- - Action: \`browser({ action: 'act', pageId, kind: 'click', selector: '.upload-submit' })\`
- - Verify: \`browser({ action: 'navigate', pageId, type: 'waitFor', selector: '.upload-success', timeoutMs: 30000 })\`
- - On Failure: Read the page for upload errors, then retry once if recoverable
- - Extract: Completion state and resulting URL if visible
-
- 5. Verify upload
- - Action: \`browser({ action: 'read', pageId, readMode: 'text' })\`
- - Verify: Output includes upload confirmation
- - On Failure: Take a screenshot and report an ambiguous completion state
- - Extract: Confirmation text, file URL, or server response summary
-
- **Cleanup**
-
- - \`browser({ action: 'session', sessionAction: 'close', pageId })\`
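The upload recipe's recovery path starts with "verify the file exists", and the earlier upload examples pass one path as a plain string and several as a JSON array string. This Node.js sketch does that pre-flight check and builds the `value` argument in those two shapes; converting to absolute paths is an assumption on our part, since the tool's handling of relative paths is not documented here.

```javascript
// Sketch: pre-flight checks and value formatting for kind: 'upload'.
// Formats mirror the examples above: a plain path for one file,
// a JSON array string for several files.
// ASSUMPTION: absolute paths are safest, so relative paths are resolved
// against the current working directory.
const fs = require("node:fs");
const path = require("node:path");

function buildUploadValue(filePaths) {
  if (filePaths.length === 0) throw new Error("No files to upload");
  const absolute = filePaths.map((p) => path.resolve(p));
  // Fail before touching the browser if any path is missing or not a regular file.
  const missing = absolute.filter((p) => !fs.existsSync(p) || !fs.statSync(p).isFile());
  if (missing.length > 0) throw new Error(`Missing files: ${missing.join(", ")}`);
  return absolute.length === 1 ? absolute[0] : JSON.stringify(absolute);
}
```

Checking locally first turns a vague "upload failed" in the page into a precise error the agent can report before any browser call is made.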
-
- #### Recipe: Authenticated Web Task
-
- **Variables**
-
- - \`{{login_url}}\` — login page URL
- - \`{{target_url}}\` — target page after login
- - \`{{username}}\` — account identifier, ask the user if not already known
- - \`{{password}}\` — sensitive; do not store or echo it, and prefer having the user type it directly in the browser UI
-
- **Steps**
-
- 1. Open login page
- - Action: \`browser({ action: 'open', url: '{{login_url}}', mode: 'ui', waitUntil: 'domcontentloaded' })\`
- - Verify: Login page is visible
- - On Failure: Retry with \`waitUntil: 'load'\`
- - Extract: Save \`pageId\`
-
- 2. Read login form
- - Action: \`browser({ action: 'read', pageId, readMode: 'snapshot' })\`
- - Verify: Username, password, and submit controls are visible
- - On Failure: Reload and re-read, or ask the user to describe the current page state
- - Extract: Login selectors, SSO options, and challenge indicators
-
- 3. Enter credentials
- - Action: Ask the user for \`{{username}}\` if needed, then run \`browser({ action: 'act', pageId, kind: 'type', selector: usernameSelector, text: '{{username}}' })\`
- - Action: Have the user type the password directly in the visible browser when possible
- - Action: After the user confirms password entry, run \`browser({ action: 'act', pageId, kind: 'click', selector: submitSelector })\`
- - Verify: Page advances to a post-login state
- - On Failure: Re-read and classify the blocker as invalid credentials, 2FA, CAPTCHA, or selector mismatch
- - Extract: Login result state
-
- 4. Handle post-login challenges
- - Action: \`browser({ action: 'read', pageId, readMode: 'snapshot' })\`
- - Verify: Output shows whether 2FA, CAPTCHA, consent, or success is present
- - On Failure: Take a screenshot and ask the user what they see
- - Extract: Challenge type and controls needed to continue
- - If 2FA appears: ask the user for the code or have them enter it directly in the UI, then continue
- - If CAPTCHA appears: ask the user to solve it manually, then continue
-
- 5. Navigate to target
- - Action: \`browser({ action: 'navigate', pageId, url: '{{target_url}}' })\`
- - Verify: Target page loads and expected content appears
- - On Failure: Retry once after a fresh read or follow the app's redirect path manually
- - Extract: Final URL and target page state
-
- 6. Perform task-specific work
- - Action: Insert task-specific browser steps using the same Action / Verify / On Failure / Extract pattern
- - Verify: Task-specific completion criteria hold
- - On Failure: Stop after two failed recoveries and report the current state to the user
- - Extract: Requested result data
-
- **Cleanup**
-
- - \`browser({ action: 'session', sessionAction: 'close', pageId })\`
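Step 3's failure path asks the agent to classify the blocker as invalid credentials, 2FA, CAPTCHA, or selector mismatch. A keyword heuristic over the read output is one rough way to make a first guess; the patterns below are illustrative assumptions, real login pages vary widely, and anything unmatched should go back to the user rather than be guessed.

```javascript
// Sketch: first-guess classification of the post-login state from read output.
// ASSUMPTION: keyword heuristics only; the patterns are illustrative.
const BLOCKER_PATTERNS = [
  ["captcha", /captcha|i'?m not a robot|verify you are human/i],
  ["2fa", /verification code|two-factor|2fa|authenticator app|one-time (pass)?code/i],
  ["invalid-credentials", /incorrect (username|password)|invalid (credentials|password)|login failed/i],
];

function classifyLoginState(pageText, { stillHasPasswordField = false } = {}) {
  for (const [kind, pattern] of BLOCKER_PATTERNS) {
    if (pattern.test(pageText)) return kind;
  }
  // A password field with no error text often means the submit never landed,
  // for example because the click targeted the wrong element.
  return stillHasPasswordField ? "selector-mismatch" : "no-known-blocker";
}
```

CAPTCHA is checked first because those pages often also mention verification; in every case the recipe's rule still applies: the user, not the agent, completes 2FA and CAPTCHA.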
-
- #### Recipe: Monitor Web Page for Changes
-
- **Variables**
-
- - \`{{url}}\` — page to monitor
- - \`{{watch_selector}}\` — selector for the watched element
- - \`{{interval_ms}}\` — time between checks in milliseconds
-
- **Steps**
-
- 1. Open page
- - Action: \`browser({ action: 'open', url: '{{url}}', mode: 'headless' })\`
- - Verify: Browser returns a \`pageId\`
- - On Failure: Retry with \`mode: 'ui'\`
- - Extract: Save \`pageId\`
-
- 2. Capture baseline
- - Action: \`browser({ action: 'read', pageId, readMode: 'text', selector: '{{watch_selector}}' })\`
- - Verify: Baseline content is non-empty
- - On Failure: Confirm the selector with a snapshot read
- - Extract: Baseline content for later comparison
-
- 3. Wait and re-check
- - Action: \`browser({ action: 'eval', pageId, code: 'await new Promise((resolve) => setTimeout(resolve, {{interval_ms}}))' })\`
- - Action: \`browser({ action: 'navigate', pageId, type: 'reload' })\`
- - Action: \`browser({ action: 'read', pageId, readMode: 'text', selector: '{{watch_selector}}' })\`
- - Verify: New content is captured successfully
- - On Failure: Reload again and retry once
- - Extract: Current content for diffing
-
- 4. Compare against baseline
- - Action: Compare the current content with the stored baseline outside the browser call
- - Verify: Comparison is deterministic
- - On Failure: Re-run the text read once to rule out a partial load
- - Extract: Changed or unchanged state
- - If changed: report it to the user and capture \`browser({ action: 'screenshot', pageId, selector: '{{watch_selector}}' })\`
- - If unchanged: return to step 3
-
- **Cleanup**
-
- - \`browser({ action: 'session', sessionAction: 'close', pageId })\`
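Step 4 requires a deterministic comparison done outside the browser. One simple approach, sketched below with Node's built-in `crypto`, is to normalize whitespace and compare SHA-256 fingerprints. Collapsing whitespace is a design assumption: it ignores reflowed markup but could also hide a change that is only whitespace.

```javascript
// Sketch: deterministic change detection for the Monitor recipe (step 4).
// ASSUMPTION: runs of whitespace are collapsed so reflow alone is not a change.
const crypto = require("node:crypto");

function fingerprint(text) {
  const normalized = text.replace(/\s+/g, " ").trim();
  return crypto.createHash("sha256").update(normalized).digest("hex");
}

function compareToBaseline(baselineText, currentText) {
  const baseline = fingerprint(baselineText);
  const current = fingerprint(currentText);
  return { changed: baseline !== current, baseline, current };
}
```

Storing only the fingerprint between checks keeps the state small; keep the raw text too if the report should show what changed.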
-
- ### Execution Protocol
-
- When an agent receives a browser recipe to execute:
-
- 1. **Resolve variables** — ask the user for all unresolved \`{{variables}}\`, and explicitly flag which ones are sensitive.
- 2. **Pre-flight the environment** — if the recipe requires auth, destructive actions, uploads, or cookie export, warn the user before starting.
- 3. **Run sequentially on each page** — within one page, execute Action → Verify → On Failure → Extract in order. Shared DOM state is not parallel-safe.
- 4. **Stop after two failed recoveries on the same step** — report the current state, what failed, and what the user can do next.
- 5. **Run cleanup even on failure** — always close pages unless the user asked to keep the session open.
- 6. **Summarize results** — report what was completed, what data was extracted, and which steps were skipped or blocked.
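The protocol's control flow (sequential steps, at most two recoveries per step, cleanup in every case, a final summary) maps onto a small async runner. In this sketch each step is a hypothetical `{ name, action, verify, recover, extract }` object whose functions would wrap `browser(...)` calls; it is an illustration of the protocol, not an AI Kit API.

```javascript
// Sketch of the execution protocol: sequential steps, at most two recoveries
// per step, cleanup always runs, and a summary is returned.
// ASSUMPTION: each step is { name, action, verify, recover, extract? } with
// hypothetical async functions wrapping browser(...) calls.
async function runRecipe(steps, cleanup, { maxRecoveries = 2 } = {}) {
  const extracted = {};
  const report = { completed: [], failed: null };
  try {
    for (const step of steps) {
      let ok = await attempt(step);
      let recoveries = 0;
      while (!ok && recoveries < maxRecoveries) {
        recoveries += 1;
        await step.recover();
        ok = await attempt(step);
      }
      if (!ok) {
        report.failed = { step: step.name, recoveries };
        break; // stop and report instead of looping
      }
      if (step.extract) Object.assign(extracted, await step.extract());
      report.completed.push(step.name);
    }
  } finally {
    await cleanup(); // runs on success, on failure, and on thrown errors
  }
  return { ...report, extracted };
}

// One Action + Verify pass; any thrown error counts as a failed verification.
async function attempt(step) {
  try {
    await step.action();
    return Boolean(await step.verify());
  } catch {
    return false;
  }
}
```

The `finally` block is what guarantees the "run cleanup even on failure" rule, including when a recovery function itself throws.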
-
- ### Error Recovery Strategies
-
- | Error | Recovery |
- |-------|----------|
- | Element not found | Re-read with \`readMode: 'snapshot'\`, adjust selector or ref, then retry once |
- | Page timeout | Reload page, wait for a more specific selector, then retry |
- | Navigation failed | Verify target URL, try \`waitUntil: 'load'\` on open or \`type: 'waitFor'\` on navigate |
- | Auth required | Switch to \`mode: 'ui'\`, follow the auth pattern, and let the user handle secrets directly |
- | CAPTCHA or human check | Stop and ask the user to solve it manually, then continue from the next read |
- | File upload failed | Verify local file path, confirm the selector targets a real file input, retry once |
- | Storage access denied | Fall back to a narrow \`eval\` call only when browser session storage APIs are blocked |
- | Network error | Wait briefly, reload, retry once, then report partial progress |
-
- ### Building Skills on Browser Primitives
-
- \`browser-use\` is the foundation skill. Domain-specific skills such as deployment planners, release-note generators, internal admin workflows, or authenticated data collectors should depend on it instead of redefining browser semantics.
-
- When another skill ships browser automation, it should treat this section as the shared contract.
-
- #### Domain Skill Architecture
-
- \`browser-use\` provides the primitives. Domain skills provide the workflow.
-
- A domain skill built on top of browser automation should follow this architecture:
-
- - The domain skill has its own \`SKILL.md\` that explains what business task it automates, such as creating deployment release notes from GitHub PRs.
- - Browser recipes live inside that skill, either embedded directly in \`SKILL.md\` or stored as reusable docs under the skill's \`references/\` directory.
- - The domain skill references \`browser-use\` for browser action semantics, security rules, auth escalation, and recovery patterns instead of redefining them.
- - The domain skill guides the LLM through the end-to-end workflow, including when to gather inputs, when to run browser recipes, when to switch tools, and how to format the final output.
- - Browser recipes handle web interaction details. The domain skill handles business intent, sequencing, domain-specific validation, and final deliverables.
-
- This separation matters because a teammate usually does not want "a browser script." They want a skill for a business outcome such as deployment planning, release-note generation, status monitoring, or internal-tool automation. The domain skill explains the outcome and uses browser recipes as implementation building blocks.
-
- #### Example: Deployment Release Notes Skill
-
- A teammate wants a skill that automates creating release notes by scraping GitHub PRs, commit history, and linked tickets. Here is how the skill's \`SKILL.md\` would be structured:
-
- \`\`\`markdown
- # Deployment Release Notes - Automated Release Documentation
-
- Generate deployment release notes by collecting PR descriptions, commit messages, and linked tickets from GitHub, then formatting them into a structured release document.
-
- **When to use:** Before a deployment, when the team needs a summary of changes, or when creating a changelog for stakeholders.
-
- **Prerequisites:**
- - \`browser-use\` skill loaded (provides browser automation primitives and recipe format)
- - Access to the GitHub repository (may require auth)
-
- ## Workflow
-
- ### Step 1: Gather Context
- - Ask the user for: repository URL, release branch/tag, previous release tag
- - Determine if GitHub auth is needed (private repo -> use Authenticated Web Task recipe from browser-use)
-
- ### Step 2: Collect PR Data
- Follow this browser recipe:
-
- # Recipe: Extract GitHub PRs Between Tags
-
- ## Metadata
- - Name: GitHub PR Extraction
- - Trigger: Need to list merged PRs between two git tags
- - Target: github.com or GitHub Enterprise
- - Mode: headless (public repos) or ui (private repos needing auth)
- - Requires Auth: depends on repo visibility
- - Destructive: no
-
- ## Variables
- - \`{{repo_url}}\` - GitHub repository URL (e.g., https://github.com/org/repo)
- - \`{{base_tag}}\` - Previous release tag
- - \`{{head_tag}}\` - New release tag
-
- ## Steps
- 1. Open compare page
- - Action: \`browser({ action: 'open', url: '{{repo_url}}/compare/{{base_tag}}...{{head_tag}}', mode: 'headless' })\`
- - Verify: Page loads with comparison content
- - On Failure: Switch to \`mode: 'ui'\` for auth, follow browser-use auth pattern
- - Extract: Save \`pageId\`
-
- 2. Extract PR list
- - Action: \`browser({ action: 'read', pageId, readMode: 'markdown', selector: '.js-commits-list-item, .pr-list' })\`
- - Verify: Output contains commit or PR references
- - On Failure: Try \`readMode: 'dom'\` and parse HTML, or use the commits tab instead
- - Extract: List of PRs with titles, numbers, authors
-
- 3. For each PR, extract details
- - Action: \`browser({ action: 'navigate', pageId, url: '{{repo_url}}/pull/{{pr_number}}' })\`
- - Action: \`browser({ action: 'read', pageId, readMode: 'markdown', selector: '.comment-body' })\`
- - Verify: PR description is captured
- - On Failure: Fall back to \`readMode: 'text'\`
- - Extract: PR title, description, labels, linked issues
-
- ## Cleanup
- - \`browser({ action: 'session', sessionAction: 'close', pageId })\`
-
- ### Step 3: Collect Linked Tickets (Optional)
- If PRs reference JIRA or ticket URLs, follow the Data Extraction recipe from browser-use to scrape ticket titles and statuses.
-
- ### Step 4: Format Release Notes
- Using the collected data, generate a structured release document:
- - Group changes by category (features, fixes, chores) based on PR labels or commit prefixes
- - Include PR links, authors, and ticket references
- - Add deployment metadata (date, branch, tag range)
-
- ### Step 5: Output
- Present the release notes to the user. Offer to:
- - Copy to clipboard
- - Save as markdown file
- - Create a GitHub Release draft (requires additional browser recipe)
-
- ## Error Handling
- - If GitHub auth is needed, follow browser-use auth patterns (switch to ui mode, let user handle SSO or 2FA)
- - If PR extraction returns empty results, verify the tag names exist and the compare URL is correct
- - If the ticket system is unreachable, skip ticket enrichment and note it in the output
- \`\`\`
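Step 4 of the example skill groups changes by PR labels or commit prefixes, which is plain data processing once the browser recipe has extracted the PR list. This sketch checks labels first, then a Conventional Commits style title prefix such as `feat:` or `fix(api):`. The PR record shape and both mapping tables are illustrative assumptions, not GitHub conventions.

```javascript
// Sketch for Step 4: group PRs into release-note categories.
// ASSUMPTION: PR records look like { number, title, labels }; the label and
// prefix mappings are illustrative. Maps avoid inherited-key lookups.
const LABEL_CATEGORIES = new Map([
  ["enhancement", "features"], ["feature", "features"], ["bug", "fixes"], ["chore", "chores"],
]);
const PREFIX_CATEGORIES = new Map([
  ["feat", "features"], ["fix", "fixes"], ["chore", "chores"], ["docs", "chores"], ["refactor", "chores"],
]);

function categorize(pr) {
  for (const label of pr.labels || []) {
    const category = LABEL_CATEGORIES.get(label.toLowerCase());
    if (category) return category;
  }
  // "type(scope)!: subject" style prefix; scope and "!" are optional.
  const prefix = /^(\w+)(\([^)]*\))?!?:/.exec(pr.title);
  return (prefix && PREFIX_CATEGORIES.get(prefix[1].toLowerCase())) || "other";
}

function groupPullRequests(prs) {
  const groups = {};
  for (const pr of prs) (groups[categorize(pr)] ??= []).push(pr);
  return groups;
}
```

Labels win over title prefixes because maintainers usually apply them deliberately, while prefixes depend on each author's habits; anything unmatched lands in `other` for a human to sort.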
-
- This example shows the pattern: the domain skill orchestrates the workflow, deciding what to collect and how to format it, while \`browser-use\` provides the primitives for opening pages, reading content, handling auth, and recovering from common browser failures.
-
- #### How to Help Users Create Domain Skills
-
- When a user asks you to create a skill that automates a web-based workflow, follow this process:
-
- 1. **Identify the workflow** - Ask what manual steps the user currently performs, which websites or web apps are involved, and what data they need to extract or actions they need to perform.
- 2. **Map to recipes** - Break the workflow into discrete browser recipes. Each recipe should handle one website or one logical browser task. For example, extracting PRs from GitHub is one recipe, while formatting release notes is a separate non-browser step.
- 3. **Check for reusable recipes** - Reuse and adapt the templates in this skill first: Form Submission, Data Extraction, File Upload, Authenticated Web Task, and Monitor Web Page. Do not write everything from scratch when an existing pattern already fits.
- 4. **Structure the skill** - Give the user a skill layout that separates the main workflow from reusable references:
-
- \`\`\`text
- my-automation-skill/
- SKILL.md
- references/
- recipe-github-extract-prs.md
- recipe-jira-get-tickets.md
- \`\`\`
-
- 5. **Write the SKILL.md** - Include these sections:
- - **Header** - What the skill does, when to use it, and prerequisites. Always list \`browser-use\` when browser recipes are part of the workflow.
- - **Workflow** - Numbered high-level steps that mix browser recipes with non-browser reasoning, formatting, summarization, or file generation.
- - **Embedded recipes** - Put workflow-specific browser tasks inline when they are tightly coupled to that skill.
- - **Referenced recipes** - Link to reusable docs under \`references/\` when the same recipe may be reused across multiple skills.
- - **Error handling** - Describe domain-specific recovery, such as what to do when auth fails, data is missing, or target pages change.
- - **Output** - State what artifact the skill produces and how it should be delivered to the user.
- 6. **Test the skill** - Run each recipe in \`mode: 'ui'\` first to validate selectors, flow, and auth handling. Only switch to \`headless\` after the browser interactions are proven stable.
- 7. **Register the skill** - Add it to \`scaffold/definitions/plugins.mjs\` so it deploys with \`aikit init\`.
-
- #### Domain Skill Ideas
-
- These are examples of skills teams could build on top of \`browser-use\`:
-
- | Skill | What it automates | Key recipes used |
- |-------|-------------------|------------------|
- | Deployment Release Notes | Scrape PRs, commits, and tickets into a formatted changelog | Authenticated Web Task, Data Extraction |
- | Deployment Plan Creator | Gather service dependencies, change scope, and risk inputs from internal tools | Data Extraction, Form Submission |
- | Status Page Monitor | Watch status pages and summarize changes | Monitor Web Page |
- | Form Auto-filler | Pre-fill repetitive internal forms such as expense reports or time sheets | Form Submission, Authenticated Web Task |
- | Screenshot Documentation | Capture annotated screenshots of UI flows for docs | Multi-page navigation, screenshots |
- | Competitive Analysis | Extract pricing, feature lists, and positioning details from public sites | Data Extraction, pagination |
- | Internal Tool Automation | Automate admin workflows in internal web apps that have no API | Authenticated Web Task, Form Submission |
-
- Use this guidance when you are helping a user create a new skill: describe the business workflow first, identify which browser recipes it needs, and keep browser-specific details aligned with the primitives and safety model documented here.
-
- #### Naming Convention
-
- - Use \`recipe-{domain}-{action}.md\` for reusable standalone recipe docs.
- - Store reusable examples in the consuming skill's \`references/\` directory.
- - Match the recipe title to the user-facing capability, not the implementation detail.
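The naming convention fixes only the overall shape `recipe-{domain}-{action}.md`. The slug rules in this small sketch (lowercase ASCII letters and digits, hyphen-separated) are our assumption, chosen so the output matches the example filenames in the skill layout above.

```javascript
// Sketch: build a reusable recipe filename following recipe-{domain}-{action}.md.
// ASSUMPTION: slugs are lowercase ASCII letters/digits joined by hyphens.
function recipeFileName(domain, action) {
  const slug = (s) =>
    s.trim().toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
  const d = slug(domain);
  const a = slug(action);
  if (!d || !a) throw new Error("domain and action must contain letters or digits");
  return `recipe-${d}-${a}.md`;
}
```

For example, `recipeFileName("GitHub", "Extract PRs")` yields `recipe-github-extract-prs.md`.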
748
-
749
- #### Quality Checklist
750
-
751
- - [ ] Variables documented, including sensitivity and whether the agent may infer them
752
- - [ ] Every step includes Action, Verify, On Failure, and Extract
753
- - [ ] Auth and destructive behavior are declared in metadata
754
- - [ ] Cleanup is present and closes browser pages unless a kept-open session is intentional
755
- - [ ] Recovery paths stop after bounded retries instead of looping indefinitely
756
- - [ ] Recipe was exercised in \`mode: 'ui'\` at least once
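Two checklist items are mechanical enough to lint: every numbered step has Action, Verify, On Failure, and Extract, and a Cleanup section exists. This sketch assumes the recipe follows the skeleton format (steps start with `N. `, fields are `- Field:` bullets, Cleanup is a heading or a bold label); recipes written differently would need a different parser.

```javascript
// Sketch: lint a recipe in the skeleton format for two checklist items.
// ASSUMPTION: steps start with "N. " and fields are "- Field:" bullets;
// any markdown heading ends the current step.
const REQUIRED_FIELDS = ["Action", "Verify", "On Failure", "Extract"];

function lintRecipe(markdown) {
  const steps = [];
  let current = null;
  for (const raw of markdown.split("\n")) {
    const line = raw.trim();
    const stepMatch = /^(\d+)\.\s+(.*)$/.exec(line);
    if (stepMatch) {
      current = { title: stepMatch[2], fields: new Set() };
      steps.push(current);
      continue;
    }
    if (/^#+\s/.test(line)) current = null;
    const fieldMatch = /^-\s+(Action|Verify|On Failure|Extract):/.exec(line);
    if (current && fieldMatch) current.fields.add(fieldMatch[1]);
  }
  const problems = [];
  for (const step of steps) {
    const missing = REQUIRED_FIELDS.filter((f) => !step.fields.has(f));
    if (missing.length) problems.push(`Step "${step.title}" missing: ${missing.join(", ")}`);
  }
  if (!/^#+\s*Cleanup\b/m.test(markdown) && !/\*\*Cleanup\*\*/.test(markdown)) {
    problems.push("No Cleanup section");
  }
  return problems;
}
```

The remaining items (sensitivity notes, bounded retries, a `ui` test run) still need human review; a lint like this only catches structural omissions.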
757
-
758
- #### Composition Notes
759
-
760
- - Keep steps small enough that one read, screenshot, or selector wait can verify them.
761
- - Prefer \`readMode: 'snapshot'\` to discover controls, \`readMode: 'text'\` to verify outcomes, and \`readMode: 'markdown'\` to capture extracted content.
762
- - Use \`navigate({ type: 'waitFor', selector, timeoutMs })\` instead of timing guesses when a page transition has a concrete ready signal.
763
- - Use \`eval\` only for narrow gaps the built-in actions cannot cover.
764
- - Follow [references/auth-patterns.md](references/auth-patterns.md) for SSO, OAuth, CAPTCHA, or 2FA flows.
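The reason to prefer a ready signal over a timing guess is that polling a condition proceeds as soon as the page is ready and fails loudly when it never becomes ready, while a fixed sleep does neither. This generic polling helper illustrates the idea only; it is not how AI Kit implements `navigate` with `type: 'waitFor'`.

```javascript
// Sketch: wait for a ready signal instead of sleeping a fixed time.
// Generic illustration, not AI Kit's waitFor implementation.
async function waitFor(check, { timeoutMs = 30000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await check()) return true; // ready: continue immediately
    if (Date.now() >= deadline) throw new Error(`Timed out after ${timeoutMs} ms`);
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The timeout error gives the recipe a concrete failure to route into its On Failure path, which a plain sleep cannot provide.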
765
-
766
- ## Security Model (HARD GATE)
767
-
768
- - AI Kit enforces URL allowlisting before page navigation; respect denials instead of trying alternate bypasses.
769
- - \`eval\` runs inside AI Kit's browser sandbox. Keep scripts minimal, purpose-built, and limited to the user-approved task.
770
- - Password field values are redacted by the runtime. Never ask the tool to expose them and never echo them back to the user.
771
- - Cookie export is gated behind \`action: 'session'\`. Only request cookies when necessary, tell the user they are sensitive, and never store them in code, commits, or logs.
772
- - Never screenshot or copy pages that visibly reveal passwords, tokens, or other secrets.
773
- - Never automate destructive or irreversible actions unless the user explicitly requested them.
774
- - Never bypass 2FA, CAPTCHA, or rate limits. Ask the user to complete the human step, then continue.
-
- ## Integration with Other Skills
-
- ### repo-access
-
- This skill is the final browser escalation path for \`repo-access\`. Use it when CLI auth recovery fails and the target sits behind SSO, OAuth, or a login wall. Typical flow:
-
- 1. \`repo-access\` exhausts Steps 1-5.
- 2. Load \`browser-use\`.
- 3. \`browser({ action: 'open', url: repoUrl, mode: 'ui' })\`
- 4. \`browser({ action: 'read', pageId })\` to inspect the login state.
- 5. Use \`browser({ action: 'act', kind: 'type' | 'click', ... })\` for login fields and buttons.
- 6. Use \`browser({ action: 'eval', ... })\` or \`browser({ action: 'session', sessionAction: 'cookies', ... })\` only when the user explicitly needs extracted content or session transfer.
-
- ### present
-
- When \`present({ format: 'browser' })\` returns a local dashboard URL, open it with AI Kit's browser tool instead of an external browser MCP:
-
- \`\`\`
- browser({ action: 'open', url: 'http://127.0.0.1:{port}', mode: 'ui' })
- \`\`\`
-
- This keeps the viewing workflow inside the same owned runtime.
-
- ## Troubleshooting
-
- | Problem | Response |
- |---------|----------|
- | Browser runtime missing | Run \`aikit browser install\` and retry |
- | No active page or stale \`pageId\` | Re-open with \`action: 'open'\`, or check the \`list\` output of \`action: 'session'\` |
- | Element refs stop matching | Re-run \`browser({ action: 'read', pageId })\` after each re-render |
- | Headless blocked by target site | Retry with \`mode: 'ui'\` or \`mode: 'panel'\` |
- | CAPTCHA appears | Ask the user to solve it manually, then continue from \`read\` |
- | Need to inspect cookies | Use \`browser({ action: 'session', sessionAction: 'cookies', pageId })\` and warn the user |
- | Need complex DOM extraction | Use \`browser({ action: 'eval', ... })\` with a small, targeted script |
- | Scroll not loading more content | After scrolling, wait for the new content (prefer \`navigate({ type: 'waitFor', selector, timeoutMs })\`; fall back to a short \`setTimeout\` delay via \`eval\`), then re-read |
- | File upload not working | Ensure the selector targets an actual \`<input type="file">\` element |
- | Storage access denied | Some sites block storage access in certain contexts; read the value with a targeted \`eval\` instead |
- | Cookie set failed | Verify the cookie's domain and path match the target site; setting a cookie requires \`confirm: true\` |
- | Markdown output too messy | Use \`readMode: 'text'\` for simpler output, or scope the read with a selector |
-
- ## Decision Flow
-
- \`\`\`
- Need browser help?
- ├─ Public page, no JS or auth needed? → web_fetch (simpler, faster)
- ├─ Need JS rendering or interaction? → browser open → read
- ├─ Need clean markdown of a page? → browser read (readMode: 'markdown')
- ├─ Need structured HTML/DOM? → browser read (readMode: 'dom')
- ├─ Login wall or SSO flow? → repo-access → browser-use auth patterns
- ├─ Need to fill forms / submit data? → browser act (type/click/select)
- ├─ Need to upload files? → browser act (upload)
- ├─ Need to scroll / load lazy content? → browser act (scroll)
- ├─ Need screenshot of specific region? → browser screenshot (clip)
- ├─ Need session/cookie management? → browser session (cookies/storage)
- ├─ Need local dashboard viewing? → present(browser) → browser open
- └─ Complex multi-step automation? → Compose patterns from this skill
- \`\`\`
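-
- The same decision flow can be expressed as an ordered rule table. This sketch is illustrative: the \`need\` flag names and the priority order (multi-step first, then auth, then specific actions, then plain rendering) are assumptions, since the tree does not rank its branches.
-
- \`\`\`javascript
- // Ordered rules: the first matching condition wins. Flag names are hypothetical.
- const RULES = [
-   [(need) => need.multiStep, 'Compose patterns from this skill'],
-   [(need) => need.loginWall, 'repo-access → browser-use auth patterns'],
-   [(need) => need.localDashboard, 'present(browser) → browser open'],
-   [(need) => need.upload, 'browser act (upload)'],
-   [(need) => need.forms, 'browser act (type/click/select)'],
-   [(need) => need.lazyContent, 'browser act (scroll)'],
-   [(need) => need.regionScreenshot, 'browser screenshot (clip)'],
-   [(need) => need.sessionState, 'browser session (cookies/storage)'],
-   [(need) => need.dom, "browser read (readMode: 'dom')"],
-   [(need) => need.markdown, "browser read (readMode: 'markdown')"],
-   [(need) => need.jsOrInteraction, 'browser open → read'],
- ];
-
- function chooseBrowserApproach(need) {
-   for (const [matches, approach] of RULES) {
-     if (matches(need)) return approach;
-   }
-   // Public page with no JS, auth, or interaction: the simpler tool is enough.
-   return 'web_fetch';
- }
- \`\`\`
-
- Putting \`web_fetch\` last as the default mirrors the first branch of the tree: the browser is only worth its cost when some other need is present.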
- \`},{file:\`references/auth-patterns.md\`,content:\`# Browser Auth Patterns
-
- Patterns for using AI Kit's owned \`browser\` tool to solve authentication challenges that block CLI-based access.
-
- ## Pattern 1: SAML SSO Recovery
-
- **Problem:** \`web_fetch\` returns SAML redirect HTML instead of content, and \`repo-access\` has exhausted its Strategy Ladder.
-
- **Solution:**
- \`\`\`
- 1. Open the target URL:
- const { pageId } = await browser({ action: 'open', url: targetUrl, mode: 'ui' })
-
- 2. Read the page state:
- await browser({ action: 'read', pageId })
- → If an SSO login form is shown: continue to step 3
- → If the content is already visible: skip to step 5
-
- 3. SSO login interaction:
- - Ask the user for credentials first. Never guess.
- - Username/email field → browser({ action: 'act', pageId, kind: 'type', ref: usernameRef, text: userEmail })
- - Password field → browser({ action: 'act', pageId, kind: 'type', ref: passwordRef, text: userPassword })
- - Submit button → browser({ action: 'act', pageId, kind: 'click', ref: signInButtonRef })
-
- 4. Handle the redirect chain:
- - Re-run \`browser({ action: 'read', pageId })\` after each redirect
- - If a 2FA prompt appears, ask the user for the code and enter it with \`kind: 'type'\` (see Pattern 3)
-
- 5. Extract content:
- - \`browser({ action: 'read', pageId })\` for accessible text
- - \`browser({ action: 'eval', ... })\` for targeted extraction
- - \`browser({ action: 'screenshot', pageId })\` for visual capture
- \`\`\`
-
- ## Pattern 2: OAuth Consent Flow
-
- **Problem:** The service requires OAuth consent that cannot be completed from the CLI.
-
- **Solution:**
- \`\`\`
- 1. const { pageId } = await browser({ action: 'open', url: oauthAuthorizeUrl, mode: 'ui' })
-
- 2. await browser({ action: 'read', pageId })
- → Find the "Authorize" / "Allow" / "Grant access" button
-
- 3. await browser({ action: 'act', pageId, kind: 'click', ref: authorizeButtonRef })
-
- 4. await browser({ action: 'read', pageId })
- → The URL now contains ?code=abc123, or the consent flow is complete
-
- 5. Extract the final URL when needed:
- await browser({ action: 'eval', pageId, code: 'return page.url()' })
-
- 6. Return the authorization code or completed session to the CLI workflow
- \`\`\`
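-
- For steps 5 and 6, the returned URL can be parsed with the standard WHATWG \`URL\` class. This sketch assumes the redirect uses the standard OAuth 2.0 authorization-code query parameters (\`code\`, \`state\`, \`error\`); \`parseOAuthRedirect\`, \`expectedState\`, and the \`no_code\` / \`state_mismatch\` labels are hypothetical names, not AI Kit or RFC values.
-
- \`\`\`javascript
- // Hypothetical helper: read the OAuth 2.0 redirect parameters (RFC 6749, section 4.1.2).
- function parseOAuthRedirect(finalUrl, expectedState) {
-   const params = new URL(finalUrl).searchParams;
-   const error = params.get('error');
-   if (error) {
-     return { ok: false, error }; // e.g. access_denied when the user declines
-   }
-   const code = params.get('code');
-   if (!code) {
-     return { ok: false, error: 'no_code' }; // local label, not an RFC error code
-   }
-   // Reject the code if state does not match the value the client originally sent.
-   if (expectedState !== undefined && params.get('state') !== expectedState) {
-     return { ok: false, error: 'state_mismatch' }; // local label
-   }
-   return { ok: true, code };
- }
- \`\`\`
-
- Checking \`state\` before handing the code back to the CLI guards against a redirect that belongs to a different authorization request.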
-
- ## Pattern 3: 2FA / MFA Challenge
-
- **Problem:** Login requires a 2FA code that only the user can provide.
-
- **CRITICAL:** Never bypass 2FA and never guess codes.
-
- **Solution:**
- \`\`\`
- 1. Complete username/password entry from Pattern 1
-
- 2. await browser({ action: 'read', pageId })
- → Confirm the page shows a 2FA input field
-
- 3. Ask the user for the code via elicitation
-
- 4. await browser({ action: 'act', pageId, kind: 'type', ref: totpInputRef, text: userProvidedCode })
-
- 5. await browser({ action: 'act', pageId, kind: 'press', key: 'Enter' })
-
- 6. await browser({ action: 'read', pageId })
- → Verify the page shows authenticated content, not the login form
- \`\`\`
-
- ## Pattern 4: Cookie or Token Transfer
-
- **Problem:** CLI tools need authenticated session state from the browser.
-
- **Solution:**
- \`\`\`
- 1. Complete the login flow first
-
- 2. Export cookies only if the user explicitly asked for session transfer:
- await browser({ action: 'session', sessionAction: 'cookies', pageId })
-
- 3. Use the returned cookie data with CLI tools or \`http\` as needed
-
- 4. Tell the user the cookies are sensitive and ephemeral.
- Never commit, log, or persist them in source files.
- \`\`\`
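-
- For step 3, CLI clients usually need a single \`Cookie\` request header. The sketch below assumes each exported cookie has \`name\`, \`value\`, \`domain\`, and \`path\` fields (a common export shape; check the actual \`session\` output first) and uses simplified domain and path matching rather than the full RFC 6265 rules. \`cookieHeaderFor\` is a hypothetical helper.
-
- \`\`\`javascript
- // Hypothetical helper: build a Cookie header for one target URL from exported cookies.
- function cookieHeaderFor(cookies, targetUrl) {
-   const { hostname, pathname } = new URL(targetUrl);
-   return cookies
-     .filter((cookie) => {
-       const raw = cookie.domain || '';
-       // A leading dot (legacy syntax) still means "this domain and its subdomains".
-       const domain = raw.startsWith('.') ? raw.slice(1) : raw;
-       const domainMatches = hostname === domain || hostname.endsWith('.' + domain);
-       // Simplified prefix check; RFC 6265 path matching is slightly stricter.
-       const pathMatches = pathname.startsWith(cookie.path || '/');
-       return domainMatches && pathMatches;
-     })
-     .map((cookie) => cookie.name + '=' + cookie.value)
-     .join('; ');
- }
- \`\`\`
-
- The resulting header is as sensitive as the cookies themselves: pass it straight to the CLI request and never print or persist it.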
-
- ## Pattern 5: Content Behind a Login Wall
-
- **Problem:** \`web_fetch\` returns a login page instead of the target content.
-
- **Solution:**
- \`\`\`
- 1. const { pageId } = await browser({ action: 'open', url: targetUrl, mode: 'ui' })
-
- 2. await browser({ action: 'read', pageId })
- → Confirm the login form is visible
-
- 3. Ask the user for credentials
-
- 4. Fill and submit the form:
- - browser({ action: 'act', pageId, kind: 'type', ref: usernameRef, text: userEmail })
- - browser({ action: 'act', pageId, kind: 'type', ref: passwordRef, text: userPassword })
- - browser({ action: 'act', pageId, kind: 'click', ref: loginButtonRef })
-
- 5. Handle post-login challenges:
- - 2FA → Pattern 3
- - Consent screen → Pattern 2
- - Success → continue
-
- 6. Extract content with \`read\`, \`eval\`, or \`screenshot\`
- \`\`\`
-
- ## Pattern 6: CAPTCHA Handling
-
- **Problem:** The target site shows a CAPTCHA or anti-bot challenge.
-
- **Detection signals:**
- - "Checking your browser..."
- - reCAPTCHA, hCaptcha, or Turnstile widgets
- - "Please verify you are human"
-
- **Solution:**
- \`\`\`
- 1. const { pageId } = await browser({ action: 'open', url: targetUrl, mode: 'ui' })
-
- 2. Inspect with:
- - browser({ action: 'read', pageId })
- - browser({ action: 'screenshot', pageId })
-
- 3. Ask the user to solve the CAPTCHA in the browser window or panel
-
- 4. After the user confirms, continue with:
- browser({ action: 'read', pageId })
-
- 5. If the CAPTCHA loops, report that manual access is required
- \`\`\`
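-
- Before opening a browser at all, a fetched body can be routed to Pattern 1, 5, or 6 with a rough classifier. The markers are heuristics and assumptions: the CAPTCHA strings come from the detection signals in Pattern 6, the SAML check relies on the \`SAMLRequest\` / \`SAMLResponse\` parameter names used by SAML 2.0 bindings, and the login check looks for a password input. \`classifyFetchedPage\` is a hypothetical helper; confirm with \`read\` or \`screenshot\` before acting on its result.
-
- \`\`\`javascript
- // Hypothetical heuristic router for web_fetch bodies; markers are assumptions.
- const CAPTCHA_SIGNALS = ['checking your browser', 'recaptcha', 'hcaptcha', 'turnstile', 'verify you are human'];
-
- function classifyFetchedPage(html) {
-   const text = html.toLowerCase();
-   if (CAPTCHA_SIGNALS.some((signal) => text.includes(signal))) {
-     return 'captcha'; // Pattern 6
-   }
-   // SAML 2.0 POST and Redirect bindings carry SAMLRequest or SAMLResponse parameters.
-   if (text.includes('samlrequest') || text.includes('samlresponse')) {
-     return 'saml'; // Pattern 1
-   }
-   if (text.includes('type="password"') || text.includes("type='password'")) {
-     return 'login-wall'; // Pattern 5
-   }
-   return 'content'; // Nothing detected: the web_fetch output is probably usable.
- }
- \`\`\`
-
- CAPTCHA is checked first because anti-bot interstitials can also contain login markup, and no credential entry should happen until the user has cleared the challenge.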
-
- ## Security Reminders
-
- - Always ask the user for credentials and 2FA codes; never guess or reuse hidden values
- - Exported cookies or tokens are secrets; never log, store, or commit them
- - Confirm before submitting forms or performing irreversible actions
- - Close authenticated pages when the task is complete: \`browser({ action: 'session', sessionAction: 'close', pageId })\`
- - Respect allowlisting, sandboxing, and any runtime security denial from the browser tool
+ **Important:** The browser auto-closes after an idle timeout. For long-running tasks, interact with the browser periodically to reset the idle timer.
  \`}];export{e as default};