@vpxa/aikit 0.1.140 → 0.1.142

@@ -1,6 +1,6 @@
1
1
  var e=[{file:`SKILL.md`,content:`---
2
2
  name: browser-use
3
- description: "Browser automation for AI agents using Playwright MCP browser tools. Triggered when: (1) repo-access skill exhausts its Strategy Ladder and auth requires browser interaction, (2) \`web_fetch\` returns login page HTML, SAML redirect, or CAPTCHA instead of content, (3) user needs to interact with web applications (fill forms, click buttons, extract data), (4) a site requires JavaScript rendering that \`web_fetch\` cannot handle, (5) user asks to browse, scrape, test, or automate a website. Zero setup: uses tools available to any MCP client with Playwright MCP server."
3
+ description: "Browser automation for AI agents using AI Kit's owned \`browser\` MCP tool. Triggered when: (1) repo-access exhausts its Strategy Ladder and auth requires browser interaction, (2) \`web_fetch\` returns login page HTML, SAML redirect, or CAPTCHA instead of content, (3) user needs to interact with web applications (fill forms, click buttons, extract data), (4) a site requires JavaScript rendering that \`web_fetch\` cannot handle, (5) user asks to browse, scrape, test, or automate a website. Uses AI Kit's owned Chromium runtime; no external MCP server dependency."
4
4
  metadata:
5
5
  category: cross-cutting
6
6
  domain: general
@@ -8,432 +8,334 @@ metadata:
8
8
  inputs: [url, auth-error, browser-task, login-wall]
9
9
  outputs: [page-content, screenshots, extracted-data, authenticated-session]
10
10
  requires: []
11
- relatedSkills: [repo-access, aikit]
11
+ relatedSkills: [repo-access, present, aikit]
12
12
  argument-hint: "URL or browser task description"
13
13
  ---
14
14
 
15
15
  # Browser Automation for AI Agents
16
16
 
17
- Drive the Playwright MCP browser to solve authentication barriers, extract data, fill forms, and interact with web applications. This skill bridges the gap between CLI-based access (which fails on login walls, SAML SSO, CAPTCHAs) and real browser interaction.
17
+ Use AI Kit's owned \`browser\` MCP tool to solve authentication barriers, extract data, fill forms, and interact with web applications. This skill bridges CLI-based access failures (login walls, SAML SSO, OAuth, CAPTCHAs) and real browser interaction without any external browser MCP dependency.
18
18
 
19
- **Zero setup required** — all tools are provided by the Playwright MCP server. No installs, no API keys, no user configuration.
19
+ ## Runtime Model
20
+
21
+ - Single MCP tool: \`browser({ action: ... })\`
22
+ - Action-based dispatch across eight actions: \`open\`, \`read\`, \`act\`, \`navigate\`, \`eval\`, \`screenshot\`, \`dialog\`, \`session\`
23
+ - Owned Chromium runtime managed by AI Kit itself
24
+ - Install browser binaries once with \`aikit browser install\`
25
+ - Runtime modes: \`headless\` for CI, \`ui\` for desktop browser windows, \`panel\` for VS Code-hosted browsing
26
+ - Auto-idle shutdown closes inactive browser sessions after the configured timeout
27
+ - No external MCP server, no separate browser tool registration, no extra setup after install
20
28
 
21
29
  ## When to Activate
22
30
 
23
31
  ### Reactive Triggers
24
32
 
25
- - \`repo-access\` skill exhausted its Strategy Ladder: SAML SSO, OAuth, or login walls block all CLI paths.
26
- - \`web_fetch\` returns login page HTML, SAML redirect, or CAPTCHA challenge instead of content.
27
- - \`http\` returns \`401\`/\`403\` and the user confirms they can access the resource in their browser.
28
- - Any tool output contains "CAPTCHA", "bot detection", "Cloudflare", "Please verify you are human", or similar anti-bot language.
29
- - User asks to interact with a web application (fill forms, click buttons, navigate, extract data).
30
- - User asks to take screenshots, test UI, or debug a web page.
31
- - A site requires JavaScript rendering that \`web_fetch\` cannot handle.
33
+ - \`repo-access\` exhausted its Strategy Ladder and SAML SSO, OAuth, or a login wall blocks CLI access.
34
+ - \`web_fetch\` returns login HTML, redirect markup, or a CAPTCHA challenge instead of target content.
35
+ - \`http\` returns \`401\` or \`403\` and the user confirms they can access the site in a browser.
36
+ - Tool output mentions "CAPTCHA", "bot detection", "Cloudflare", "verify you are human", or similar anti-bot language.
37
+ - User asks to interact with a web application, fill forms, click buttons, navigate flows, or extract rendered content.
38
+ - User asks to take screenshots, inspect accessibility output, or debug a page that requires JavaScript.
32
39
 
33
40
  ### Proactive Triggers
34
41
 
35
- - Task involves an internal/enterprise web application with SSO.
36
- - User asks to scrape, automate, or interact with a website.
37
- - User mentions a site that requires login.
42
+ - Task involves an internal or enterprise web application with SSO.
43
+ - User asks to browse, scrape, test, or automate a website.
44
+ - A workflow already uses \`present({ format: 'browser' })\` and you need to open the returned local dashboard URL.
38
45
 
39
46
  ## When NOT to Activate
40
47
 
41
- - Public pages that \`web_fetch\` handles fine (no login, no JS rendering needed).
42
- - API endpoints accessible via \`http\` tool with proper auth headers.
43
- - Static file downloads that work with \`http\`.
44
- - Tasks that only need \`read_page\` on an already-open browser tab.
45
-
46
- ## Available Browser Tools
48
+ - Public pages that \`web_fetch\` handles correctly and do not require interaction.
49
+ - API endpoints that are reachable via \`http\` with proper auth headers.
50
+ - Static downloads that work through \`http\` or repo-local tooling.
51
+ - Tasks that only need raw HTML, links, or outline extraction.
47
52
 
48
- All tools are provided by the Playwright MCP server — zero setup, always available:
53
+ ## Browser Action Reference
49
54
 
50
- | Tool | Purpose | Key Parameters |
51
- |------|---------|----------------|
52
- | \`open_browser_page\` | Open URL in the integrated browser | \`url\`, \`forceNew\` |
53
- | \`read_page\` | Get accessibility snapshot with element refs | \`pageId\` |
54
- | \`click_element\` | Click by ref, selector, or description | \`pageId\`, \`ref\`/\`selector\`, \`element\` |
55
- | \`type_in_page\` | Type text or press keys into elements | \`pageId\`, \`text\`/\`key\`, \`ref\`/\`selector\` |
56
- | \`navigate_page\` | Navigate by URL, back/forward, reload | \`pageId\`, \`url\`/\`type\` |
57
- | \`hover_element\` | Hover over elements (tooltips, menus) | \`pageId\`, \`ref\`/\`selector\` |
58
- | \`drag_element\` | Drag and drop between elements | \`pageId\`, \`fromRef\`, \`toRef\` |
59
- | \`handle_dialog\` | Respond to alerts, confirms, file choosers | \`pageId\`, \`acceptModal\` |
60
- | \`screenshot_page\` | Capture visual screenshot | \`pageId\`, \`ref\`/\`selector\` |
61
- | \`run_playwright_code\` | Run custom Playwright scripts for advanced automation | \`pageId\`, \`code\` |
55
+ | Action | Purpose | Key Fields |
56
+ |--------|---------|------------|
57
+ | \`open\` | Open a page in AI Kit's owned browser runtime | \`url\`, \`mode?\`, \`waitUntil?\` |
58
+ | \`read\` | Return accessibility snapshot with refs and visible structure | \`pageId\` |
59
+ | \`act\` | Interact with the page: click, type, press, hover, drag, select | \`pageId\`, \`kind\`, selector/ref/text/key fields |
60
+ | \`navigate\` | Go to URL, back, forward, reload, or wait for navigation | \`pageId\`, \`url?\`, \`type?\`, \`waitFor?\` |
61
+ | \`eval\` | Run sandboxed JavaScript in the page context | \`pageId\`, \`code\` |
62
+ | \`screenshot\` | Capture page or element screenshot | \`pageId\`, selector/ref fields |
63
+ | \`dialog\` | Accept or dismiss modal dialogs and related prompts | \`pageId\`, \`accept\`, \`promptText?\` |
64
+ | \`session\` | List open pages, close a page, or export cookies | \`sessionAction\`, \`pageId?\` |
62
65
 
63
66
  ## Core Workflow
64
67
 
65
- Every browser interaction follows this pattern:
68
+ Every browser task follows the same loop:
66
69
 
67
70
  \`\`\`
68
- 1. OPEN → open_browser_page({ url: "<target>" })
69
- 2. READ → read_page({ pageId }) — get accessibility tree with element refs
70
- 3. ACT → click_element / type_in_page / navigate_page: interact with elements
71
- 4. READ → read_page({ pageId }) — verify the result
71
+ 1. OPEN → browser({ action: 'open', url: '<target>', mode: 'ui' })
72
+ 2. READ → browser({ action: 'read', pageId })
73
+ 3. ACT → browser({ action: 'act', pageId, kind: 'click' | 'type' | 'press' | 'hover' | 'drag' | 'select', ... })
74
+ 4. READ → browser({ action: 'read', pageId })
72
75
  5. LOOP → Repeat steps 3-4 until the task is complete
73
76
  \`\`\`
74
77
 
75
- ### Example: Login to a Web Application
76
-
77
- \`\`\`
78
- open_browser_page({ url: "https://example.com/login" })
79
- → Returns pageId
80
-
81
- read_page({ pageId })
82
- → Shows form with refs: @username-input, @password-input, @login-button
83
-
84
- type_in_page({ pageId, ref: "@username-input", text: "user@example.com" })
85
- → Note: ASK the user for credentials, NEVER guess
78
+ ## Usage Examples
86
79
 
87
- type_in_page({ pageId, ref: "@password-input", text: "<user-provided>" })
80
+ ### Open and Inspect a Page
88
81
 
89
- click_element({ pageId, ref: "@login-button", element: "Login button" })
90
-
91
- read_page({ pageId })
92
- → Verify: page shows dashboard/welcome content, not login form
93
82
  \`\`\`
94
-
95
- ### Example: Extract Content from Authenticated Page
96
-
83
+ const { pageId } = await browser({ action: 'open', url: 'https://example.com', mode: 'ui' })
84
+ await browser({ action: 'read', pageId })
97
85
  \`\`\`
98
- open_browser_page({ url: "https://internal.company.com/docs" })
99
86
 
100
- read_page({ pageId })
101
- → If login wall: follow login flow (see auth patterns)
102
- → If content visible: extract what you need
87
+ ### Login to a Web Application
103
88
 
104
- run_playwright_code({
105
- pageId,
106
- code: \\\`return page.evaluate(() => document.querySelector('main').innerText)\\\`
107
- })
108
- → Returns the page text content
109
89
  \`\`\`
90
+ const { pageId } = await browser({ action: 'open', url: 'https://example.com/login', mode: 'ui' })
110
91
 
111
- ### Example: Fill a Form
112
-
92
+ await browser({ action: 'read', pageId })
93
+ await browser({ action: 'act', pageId, kind: 'type', ref: '@username-input', text: 'user@example.com' })
94
+ await browser({ action: 'act', pageId, kind: 'type', ref: '@password-input', text: '<user-provided>' })
95
+ await browser({ action: 'act', pageId, kind: 'click', ref: '@login-button' })
96
+ await browser({ action: 'read', pageId })
113
97
  \`\`\`
114
- open_browser_page({ url: "https://example.com/form" })
115
- read_page({ pageId }) → identify form fields and their refs
116
98
 
117
- type_in_page({ pageId, ref: "@name-field", text: "John Doe" })
118
- type_in_page({ pageId, ref: "@email-field", text: "john@example.com" })
119
- click_element({ pageId, ref: "@country-select", element: "country dropdown" })
120
- click_element({ pageId, ref: "@us-option", element: "United States option" })
121
- click_element({ pageId, ref: "@submit-button", element: "Submit button" })
99
+ **Rule:** ask the user for credentials and 2FA codes. Never guess, reuse, or log them.
100
+
101
+ ### Extract Content from an Authenticated Page
122
102
 
123
- read_page({ pageId }) → verify submission success
124
103
  \`\`\`
104
+ const { pageId } = await browser({ action: 'open', url: 'https://internal.company.com/docs', mode: 'ui' })
105
+ await browser({ action: 'read', pageId })
125
106
 
126
- ## Advanced: run_playwright_code
107
+ await browser({
108
+ action: 'eval',
109
+ pageId,
110
+ code: "return page.evaluate(() => document.querySelector('main')?.innerText ?? '')",
111
+ })
112
+ \`\`\`
127
113
 
128
- For complex automation that basic tools can't handle, use \`run_playwright_code\`:
114
+ ### Navigate, Hover, and Capture a Screenshot
129
115
 
130
- ### Extract All Links
131
- \`\`\`javascript
132
- return page.evaluate(() =>
133
- Array.from(document.querySelectorAll('a[href]'))
134
- .map(a => ({ text: a.textContent.trim(), href: a.href }))
135
- .filter(l => l.text)
136
- )
137
116
  \`\`\`
138
-
139
- ### Wait for Dynamic Content
140
- \`\`\`javascript
141
- await page.waitForSelector('.results-loaded', { timeout: 10000 })
142
- return page.evaluate(() => document.querySelector('.results').innerText)
117
+ await browser({ action: 'navigate', pageId, url: 'https://example.com/dashboard' })
118
+ await browser({ action: 'act', pageId, kind: 'hover', selector: '[data-help]' })
119
+ await browser({ action: 'screenshot', pageId })
143
120
  \`\`\`
144
121
 
145
- ### Extract Table Data
146
- \`\`\`javascript
147
- return page.evaluate(() => {
148
- const rows = document.querySelectorAll('table tbody tr')
149
- return Array.from(rows).map(row =>
150
- Array.from(row.cells).map(cell => cell.textContent.trim())
151
- )
152
- })
153
- \`\`\`
122
+ ### Session Management
154
123
 
155
- ### Extract Cookies (for session transfer)
156
- \`\`\`javascript
157
- const cookies = await page.context().cookies()
158
- return cookies.filter(c => c.name.includes('session') || c.name.includes('auth'))
159
124
  \`\`\`
160
-
161
- ### Scroll and Load More
162
- \`\`\`javascript
163
- let previousHeight = 0
164
- while (true) {
165
- const height = await page.evaluate(() => document.body.scrollHeight)
166
- if (height === previousHeight) break
167
- previousHeight = height
168
- await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight))
169
- await page.waitForTimeout(1000)
170
- }
171
- return page.evaluate(() => document.body.innerText)
125
+ await browser({ action: 'session', sessionAction: 'list' })
126
+ await browser({ action: 'session', sessionAction: 'cookies', pageId })
127
+ await browser({ action: 'session', sessionAction: 'close', pageId })
172
128
  \`\`\`
173
129
 
174
- ## Integration with repo-access Skill
175
-
176
- This skill is the **browser escalation path** for repo-access. When repo-access cannot solve authentication via CLI strategies (Steps 1-5), browser-use provides the final recovery:
130
+ Use cookie export only when the user explicitly needs session transfer back into CLI tools.
177
131
 
178
- ### Scenario: SAML SSO on GitHub Enterprise
132
+ ## Security Model (HARD GATE)
179
133
 
180
- 1. \`repo-access\` detects SAML SSO redirect in \`web_fetch\` output
181
- 2. \`repo-access\` walks Strategy Ladder; all CLI paths fail
182
- 3. **Escalate to browser-use:**
183
- a. \`open_browser_page({ url: repoUrl })\` → open repo page
184
- b. \`read_page\` → check if SSO redirect appears
185
- c. If SSO login form → interact with it (user may need to provide credentials)
186
- d. If SSO auto-completes (IdP session) → page loads with repo content
187
- e. Extract content via \`read_page\` or \`run_playwright_code\`
188
- f. For git clone access: extract cookies/tokens via \`run_playwright_code\` to use in CLI
134
+ - AI Kit enforces URL allowlisting before page navigation; respect denials instead of trying alternate bypasses.
135
+ - \`eval\` runs inside AI Kit's browser sandbox. Keep scripts minimal, purpose-built, and limited to the user-approved task.
136
+ - Password field values are redacted by the runtime. Never ask the tool to expose them and never echo them back to the user.
137
+ - Cookie export is gated behind \`action: 'session'\`. Only request cookies when necessary, tell the user they are sensitive, and never store them in code, commits, or logs.
138
+ - Never screenshot or copy pages that visibly reveal passwords, tokens, or other secrets.
139
+ - Never automate destructive or irreversible actions unless the user explicitly requested them.
140
+ - Never bypass 2FA, CAPTCHA, or rate limits. Ask the user to complete the human step, then continue.
189
141
 
190
- ### Scenario: OAuth Login Flow
142
+ ## Integration with Other Skills
191
143
 
192
- 1. A service requires OAuth consent screen interaction
193
- 2. \`open_browser_page({ url: oauthUrl })\` — open OAuth page
194
- 3. \`read_page\` → find the "Authorize" / "Allow" button
195
- 4. \`click_element\` → authorize
196
- 5. \`read_page\` → URL now contains \`?code=abc123\`
197
- 6. Extract the authorization code → return to CLI workflow for token exchange
144
+ ### repo-access
198
145
 
199
- ### Scenario: 2FA / MFA Challenge
146
+ This skill is the final browser escalation path for \`repo-access\`. Use it when CLI auth recovery fails and the target requires SSO, OAuth, or a login wall. Typical flow:
200
147
 
201
- 1. Open login page, fill credentials
202
- 2. Page shows 2FA prompt
203
- 3. **Ask the user** for their 2FA code (NEVER guess or bypass)
204
- 4. \`type_in_page\` → enter code
205
- 5. Verify login succeeded via \`read_page\`
148
+ 1. \`repo-access\` exhausts Steps 1-5.
149
+ 2. Load \`browser-use\`.
150
+ 3. \`browser({ action: 'open', url: repoUrl, mode: 'ui' })\`
151
+ 4. \`browser({ action: 'read', pageId })\` to inspect login state.
152
+ 5. Use \`browser({ action: 'act', kind: 'type' | 'click', ... })\` for login fields and buttons.
153
+ 6. Use \`browser({ action: 'eval', ... })\` or \`browser({ action: 'session', sessionAction: 'cookies', ... })\` only when the user explicitly needs extracted content or session transfer.
206
154
 
207
- ### Scenario: Content Behind Login Wall
155
+ ### present
208
156
 
209
- 1. \`web_fetch\` returns login HTML instead of content
210
- 2. \`open_browser_page({ url })\` → open the page
211
- 3. If login form visible → guide through login (ask user for credentials)
212
- 4. Once authenticated → \`read_page\` or \`run_playwright_code\` to extract content
213
- 5. Content is now available without needing \`web_fetch\`
157
+ When \`present({ format: 'browser' })\` returns a local dashboard URL, open it with AI Kit's browser tool instead of an external browser MCP:
214
158
 
215
- ## Security Rules (HARD GATE)
159
+ \`\`\`
160
+ browser({ action: 'open', url: 'http://127.0.0.1:{port}', mode: 'ui' })
161
+ \`\`\`
216
162
 
217
- - **NEVER** extract, log, or display user passwords or secrets from browser sessions.
218
- - **NEVER** screenshot pages containing visible credentials, tokens, or sensitive data.
219
- - **NEVER** automate actions the user hasn't explicitly requested (no purchasing, no sending messages, no deleting content).
220
- - **NEVER** bypass 2FA/MFA — always ask the user for codes.
221
- - **ALWAYS** ask the user for credentials rather than guessing or using stored values.
222
- - **ALWAYS** confirm before submitting forms or performing irreversible actions.
223
- - When extracting cookies via \`run_playwright_code\`, warn the user they contain auth tokens.
224
- - **NEVER** store extracted cookies in code, commits, or logs.
225
- - Close browser sessions when done if they contain authenticated state.
163
+ This keeps the viewing workflow inside the same owned runtime.
226
164
 
227
165
  ## Troubleshooting
228
166
 
229
- | Problem | Solution |
167
+ | Problem | Response |
230
168
  |---------|----------|
231
- | \`open_browser_page\` fails | Check the URL is valid and accessible. Try with \`forceNew: true\` |
232
- | Page shows "No active page" | The pageId is stale — re-open with \`open_browser_page\` |
233
- | Element not found by ref | Refs change on page re-render — call \`read_page\` again to get fresh refs |
234
- | Element not visible | Use \`run_playwright_code\` to scroll: \`await page.evaluate(() => window.scrollBy(0, 500))\` |
235
- | Login redirect loop | The site may need cookies from a different domain; check with \`run_playwright_code\` |
236
- | CAPTCHA appears | Ask the user to solve it manually in the browser panel, then continue |
237
- | Page loads empty/blank | Site may block headless browsers; try \`screenshot_page\` to see what rendered |
238
- | Dynamic content not loaded | Use \`run_playwright_code\` with \`page.waitForSelector\` before reading |
239
- | Form submission fails | Check for hidden fields or CSRF tokens — use \`run_playwright_code\` to inspect |
240
- | Multiple pages needed | Track multiple pageIds — \`open_browser_page\` returns unique IDs per page |
169
+ | Browser runtime missing | Run \`aikit browser install\` and retry |
170
+ | No active page or stale \`pageId\` | Re-open with \`action: 'open'\` or inspect \`action: 'session'\` \`list\` output |
171
+ | Element refs stop matching | Re-run \`browser({ action: 'read', pageId })\` after each re-render |
172
+ | Headless blocked by target site | Retry with \`mode: 'ui'\` or \`mode: 'panel'\` |
173
+ | CAPTCHA appears | Ask the user to solve it manually, then continue from \`read\` |
174
+ | Need to inspect cookies | Use \`browser({ action: 'session', sessionAction: 'cookies', pageId })\` and warn the user |
175
+ | Need complex DOM extraction | Use \`browser({ action: 'eval', ... })\` with a small, targeted script |
241
176
 
242
177
  ## Decision Flow
243
178
 
244
179
  \`\`\`
245
- Need to access a web resource?
246
- ├─ Public, no JS needed? → web_fetch (don't use browser)
247
- ├─ Public, needs JS rendering? → open_browser_page → read_page
248
- ├─ Behind login wall? → open_browser_page → login flow → extract content
249
- ├─ repo-access exhausted? → browser-use is the final escalation
250
- ├─ Need to fill forms/click? → open_browser_page → read_page → interact
251
- ├─ Need screenshots? → open_browser_page → screenshot_page
252
- ├─ CAPTCHA blocking access? → ask user to solve in browser panel
253
- └─ Complex multi-step automation? → run_playwright_code for custom scripts
180
+ Need browser help?
181
+ ├─ Public page, no JS or auth needed? → web_fetch
182
+ ├─ Needs JS rendering or interaction? → browser open/read
183
+ ├─ Login wall or SSO flow? → repo-access → browser-use
184
+ ├─ Need local dashboard viewing? → present(browser) → browser open
185
+ ├─ Need screenshot or accessibility? → browser screenshot/read
186
+ └─ Need cookie/session transfer? → browser session (with user approval)
254
187
  \`\`\`
255
188
  `},{file:`references/auth-patterns.md`,content:`# Browser Auth Patterns
256
189
 
257
- Patterns for using Playwright MCP browser tools to solve authentication challenges that block CLI-based access.
190
+ Patterns for using AI Kit's owned \`browser\` tool to solve authentication challenges that block CLI-based access.
258
191
 
259
192
  ## Pattern 1: SAML SSO Recovery
260
193
 
261
- **Problem:** \`web_fetch\` returns SAML redirect HTML instead of content. \`repo-access\` Strategy Ladder exhausted.
194
+ **Problem:** \`web_fetch\` returns SAML redirect HTML instead of content and \`repo-access\` exhausted its Strategy Ladder.
262
195
 
263
196
  **Solution:**
264
197
  \`\`\`
265
198
  1. Open the target URL:
266
- open_browser_page({ url: targetUrl })
199
+ const { pageId } = await browser({ action: 'open', url: targetUrl, mode: 'ui' })
267
200
 
268
- 2. Read page to check state:
269
- read_page({ pageId })
270
- → If SSO login form: proceed to step 3
271
- → If content already visible: skip to step 5
201
+ 2. Read page state:
202
+ await browser({ action: 'read', pageId })
203
+ → If SSO login form: continue to step 3
204
+ → If content is already visible: skip to step 5
272
205
 
273
206
  3. SSO login interaction:
274
- - Find username/email field → type_in_page({ ref, text: userEmail })
275
- - Find password field → type_in_page({ ref, text: userPassword })
276
- - Click "Sign In" → click_element({ ref, element: "Sign In button" })
277
- - NOTE: ASK the user for credentials first
207
+ - Username/email field → browser({ action: 'act', pageId, kind: 'type', ref: usernameRef, text: userEmail })
208
+ - Password field → browser({ action: 'act', pageId, kind: 'type', ref: passwordRef, text: userPassword })
209
+ - Submit button → browser({ action: 'act', pageId, kind: 'click', ref: signInButtonRef })
210
+ - Ask the user for credentials first. Never guess.
278
211
 
279
- 4. Handle SSO redirect chain:
280
- - The browser will auto-follow redirects through the IdP
281
- - read_page({ pageId }) after each redirect to check state
282
- - If 2FA prompt appears → ask user for code → type_in_page
212
+ 4. Handle redirect chain:
213
+ - Re-run \`browser({ action: 'read', pageId })\` after redirects
214
+ - If 2FA prompt appears, ask the user for the code and enter it with \`kind: 'type'\`
283
215
 
284
216
  5. Extract content:
285
- - read_page({ pageId }) → get accessible text
286
- - Or: run_playwright_code → document.querySelector('main').innerText
287
- - Or: screenshot_page for visual content
217
+ - \`browser({ action: 'read', pageId })\` for accessible text
218
+ - \`browser({ action: 'eval', ... })\` for targeted extraction
219
+ - \`browser({ action: 'screenshot', pageId })\` for visual capture
288
220
  \`\`\`
289
221
 
290
222
  ## Pattern 2: OAuth Consent Flow
291
223
 
292
- **Problem:** Service requires OAuth consent that can't be completed in CLI.
224
+ **Problem:** Service requires OAuth consent that cannot be completed in CLI.
293
225
 
294
226
  **Solution:**
295
227
  \`\`\`
296
- 1. open_browser_page({ url: oauthAuthorizeUrl })
228
+ 1. const { pageId } = await browser({ action: 'open', url: oauthAuthorizeUrl, mode: 'ui' })
297
229
 
298
- 2. read_page({ pageId })
230
+ 2. await browser({ action: 'read', pageId })
299
231
  → Find the "Authorize" / "Allow" / "Grant access" button
300
232
 
301
- 3. click_element({ pageId, ref: authorizeButtonRef, element: "Authorize button" })
233
+ 3. await browser({ action: 'act', pageId, kind: 'click', ref: authorizeButtonRef })
302
234
 
303
- 4. read_page({ pageId })
304
- → URL now contains ?code=abc123 (the authorization code)
235
+ 4. await browser({ action: 'read', pageId })
236
+ → URL now contains ?code=abc123 or the consent flow is complete
305
237
 
306
- 5. Extract the code:
307
- run_playwright_code({
308
- pageId,
309
- code: 'return page.url()'
310
- })
311
- → Parse the authorization code from the URL
238
+ 5. Extract the final URL when needed:
239
+ await browser({ action: 'eval', pageId, code: 'return page.url()' })
312
240
 
313
- 6. Return code to CLI workflow for token exchange
241
+ 6. Return the authorization code or completed session to the CLI workflow
314
242
  \`\`\`
315
243
 
316
244
  ## Pattern 3: 2FA / MFA Challenge
317
245
 
318
- **Problem:** Login requires 2FA code that only the user can provide.
246
+ **Problem:** Login requires a 2FA code that only the user can provide.
319
247
 
320
- **CRITICAL:** NEVER try to bypass 2FA. NEVER guess codes. ALWAYS ask the user.
248
+ **CRITICAL:** Never bypass 2FA and never guess codes.
321
249
 
322
250
  **Solution:**
323
251
  \`\`\`
324
- 1. Complete username/password entry (Pattern 1 steps 1-3)
325
-
326
- 2. read_page({ pageId })
327
- → Page shows 2FA input field
252
+ 1. Complete username/password entry from Pattern 1
328
253
 
329
- 3. Ask the user for their 2FA code via elicitation
254
+ 2. await browser({ action: 'read', pageId })
255
+ → Confirm the page shows a 2FA input field
330
256
 
331
- 4. type_in_page({ pageId, ref: totpInputRef, text: userProvidedCode })
257
+ 3. Ask the user for the code via elicitation
332
258
 
333
- 5. click_element({ pageId, ref: verifyButtonRef, element: "Verify button" })
334
- Or: type_in_page({ pageId, key: "Enter" })
259
+ 4. await browser({ action: 'act', pageId, kind: 'type', ref: totpInputRef, text: userProvidedCode })
260
+ 5. await browser({ action: 'act', pageId, kind: 'press', key: 'Enter' })
335
261
 
336
- 6. read_page({ pageId })
337
- → Verify: page shows authenticated content, not login/2FA form
262
+ 6. await browser({ action: 'read', pageId })
263
+ → Verify the page shows authenticated content, not the login form
338
264
  \`\`\`
339
265
 
340
- ## Pattern 4: Cookie/Token Extraction
266
+ ## Pattern 4: Cookie or Token Transfer
341
267
 
342
- **Problem:** Need to extract auth tokens from an authenticated browser session for use in CLI tools.
268
+ **Problem:** CLI tools need authenticated session state from the browser.
343
269
 
344
270
  **Solution:**
345
271
  \`\`\`
346
- 1. Complete login flow (Patterns 1-3)
347
-
348
- 2. Extract cookies:
349
- run_playwright_code({
350
- pageId,
351
- code: \\\`
352
- const cookies = await page.context().cookies()
353
- return cookies.filter(c =>
354
- c.name.includes('session') ||
355
- c.name.includes('auth') ||
356
- c.name.includes('token')
357
- )
358
- \\\`
359
- })
360
-
361
- 3. Use extracted cookie values in http tool:
362
- http({
363
- url: apiEndpoint,
364
- headers: { "Cookie": "session=<extracted-value>" }
365
- })
366
-
367
- 4. WARNING: Tell the user these tokens are ephemeral and will expire.
368
- NEVER store them in code, commits, or logs.
272
+ 1. Complete login flow first
273
+
274
+ 2. Export cookies only if the user explicitly asked for session transfer:
275
+ await browser({ action: 'session', sessionAction: 'cookies', pageId })
276
+
277
+ 3. Use the returned cookie data with CLI tools or \`http\` as needed
278
+
279
+ 4. Tell the user the cookies are sensitive and ephemeral.
280
+ Never commit, log, or persist them in source files.
369
281
  \`\`\`
370
282
 
371
- ## Pattern 5: Content Behind Login Wall
283
+ ## Pattern 5: Content Behind a Login Wall
372
284
 
373
- **Problem:** \`web_fetch\` returns a login page instead of content.
285
+ **Problem:** \`web_fetch\` returns a login page instead of the target content.
374
286
 
375
287
  **Solution:**
376
288
  \`\`\`
377
- 1. open_browser_page({ url: targetUrl })
289
+ 1. const { pageId } = await browser({ action: 'open', url: targetUrl, mode: 'ui' })
378
290
 
379
- 2. read_page({ pageId })
380
- → Login form visible
291
+ 2. await browser({ action: 'read', pageId })
292
+ → Confirm login form is visible
381
293
 
382
- 3. Ask user for credentials (NEVER guess)
294
+ 3. Ask the user for credentials
383
295
 
384
- 4. Fill and submit login form:
385
- type_in_page({ pageId, ref: usernameRef, text: userEmail })
386
- type_in_page({ pageId, ref: passwordRef, text: userPassword })
387
- click_element({ pageId, ref: loginButtonRef, element: "Login button" })
296
+ 4. Fill and submit the form:
297
+ - browser({ action: 'act', pageId, kind: 'type', ref: usernameRef, text: userEmail })
298
+ - browser({ action: 'act', pageId, kind: 'type', ref: passwordRef, text: userPassword })
299
+ - browser({ action: 'act', pageId, kind: 'click', ref: loginButtonRef })
388
300
 
389
301
  5. Handle post-login challenges:
390
- read_page({ pageId })
391
- → 2FA? → Pattern 3
392
- → Consent screen? → Pattern 2
393
- → Content visible? → Continue
394
-
395
- 6. Extract the content:
396
- read_page({ pageId }) → accessible text
397
- run_playwright_code → targeted extraction
398
- screenshot_page → visual capture
302
+ - 2FA → Pattern 3
303
+ - Consent screen → Pattern 2
304
+ - Success → continue
305
+
306
+ 6. Extract content with \`read\`, \`eval\`, or \`screenshot\`
399
307
  \`\`\`
400
308
 
401
309
  ## Pattern 6: CAPTCHA Handling
402
310
 
403
- **Problem:** Target site shows CAPTCHA challenge.
311
+ **Problem:** Target site shows a CAPTCHA or anti-bot challenge.
404
312
 
405
313
  **Detection signals:**
406
- - "Checking your browser..." (Cloudflare)
407
- - reCAPTCHA / hCaptcha / Turnstile widget visible
314
+ - "Checking your browser..."
315
+ - reCAPTCHA, hCaptcha, or Turnstile widgets
408
316
  - "Please verify you are human"
409
317
 
410
318
  **Solution:**
411
319
  \`\`\`
412
- 1. open_browser_page({ url: targetUrl })
320
+ 1. const { pageId } = await browser({ action: 'open', url: targetUrl, mode: 'ui' })
413
321
 
414
- 2. read_page({ pageId }) or screenshot_page({ pageId })
415
- → CAPTCHA visible
322
+ 2. Inspect with:
323
+ - browser({ action: 'read', pageId })
324
+ - browser({ action: 'screenshot', pageId })
416
325
 
417
- 3. ASK THE USER to solve the CAPTCHA:
418
- "A CAPTCHA challenge appeared on the page. Please solve it
419
- in the browser panel, then let me know when done."
326
+ 3. Ask the user to solve the CAPTCHA in the browser window or panel
420
327
 
421
- 4. After user confirms:
422
- read_page({ pageId })
423
- → Content should now be accessible
328
+ 4. After the user confirms, continue with:
329
+ browser({ action: 'read', pageId })
424
330
 
425
- 5. If CAPTCHA reappears, the site may be aggressively blocking
426
- automation. Report to user and suggest manual access.
331
+ 5. If the CAPTCHA loops, report that manual access is required
427
332
  \`\`\`
428
333
 
429
- **Key rule:** NEVER attempt to solve CAPTCHAs programmatically. Always ask the user.
430
-
431
334
  ## Security Reminders
432
335
 
433
- - Always ask the user for credentials; NEVER guess, infer, or reuse
434
- - Extracted cookies/tokens are SECRETS; never log, store, or commit them
435
- - Tell the user when you extract auth tokens and that they expire
336
+ - Always ask the user for credentials and 2FA codes; never guess or reuse hidden values
337
+ - Exported cookies or tokens are secrets; never log, store, or commit them
436
338
  - Confirm before submitting forms or performing irreversible actions
437
- - Close authenticated sessions when the task is complete
438
- - Never bypass security measures (2FA, CAPTCHA, rate limits)
339
+ - Close authenticated pages when the task is complete: \`browser({ action: 'session', sessionAction: 'close', pageId })\`
340
+ - Respect allowlisting, sandboxing, and any runtime security denial from the browser tool
439
341
  `}];export{e as default};
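
Reviewer note: the diff above replaces ten separate Playwright MCP tool names with one `browser` tool dispatched by `action`. The sketch below is one way to picture that mapping, reading only from the removed and added table rows and examples. Everything in it is an assumption for illustration: `MIGRATION`, `Migration`, and `migrateCall` are hypothetical names, not part of `@vpxa/aikit`, and the real tool schema may rename fields (for example `acceptModal` versus `accept`) in ways this sketch does not cover.

```typescript
// Hypothetical types inferred from the skill text in this diff; the actual
// tool schema shipped in @vpxa/aikit may differ.
type ActKind = 'click' | 'type' | 'press' | 'hover' | 'drag' | 'select';

type BrowserAction =
  | 'open' | 'read' | 'act' | 'navigate'
  | 'eval' | 'screenshot' | 'dialog' | 'session';

interface Migration {
  action: BrowserAction;
  kind?: ActKind; // only meaningful when action is 'act'
}

// Old tool name -> closest new action, as read from the two tables in the diff.
const MIGRATION: Record<string, Migration> = {
  open_browser_page: { action: 'open' },
  read_page: { action: 'read' },
  click_element: { action: 'act', kind: 'click' },
  type_in_page: { action: 'act', kind: 'type' },
  navigate_page: { action: 'navigate' },
  hover_element: { action: 'act', kind: 'hover' },
  drag_element: { action: 'act', kind: 'drag' },
  handle_dialog: { action: 'dialog' },
  screenshot_page: { action: 'screenshot' },
  run_playwright_code: { action: 'eval' },
};

// Build the argument object for the single `browser` tool from an old-style call.
// Field names are passed through unchanged, which is an assumption.
function migrateCall(
  oldTool: string,
  args: Record<string, unknown>,
): Record<string, unknown> {
  const target = MIGRATION[oldTool];
  if (target === undefined) {
    throw new Error(`No known mapping for ${oldTool}`);
  }
  const next: Record<string, unknown> = { ...args, action: target.action };
  if (target.kind !== undefined) {
    // The new examples use kind: 'press' where the old ones passed only `key`.
    const pressOnly =
      oldTool === 'type_in_page' && 'key' in args && !('text' in args);
    next.kind = pressOnly ? 'press' : target.kind;
  }
  // The old free-text `element` description does not appear in the new examples.
  delete next.element;
  return next;
}
```

Under these assumptions, `migrateCall('click_element', { pageId, ref: '@login-button', element: 'Login button' })` yields an object shaped like the `browser({ action: 'act', pageId, kind: 'click', ref: '@login-button' })` call in the added login example.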