npm - @vpxa/aikit - Versions diffs - 0.1.146 → 0.1.148 - Mend

@vpxa/aikit 0.1.146 → 0.1.148

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

package/package.json +1 -1
package/packages/browser/dist/index.d.ts +2 -1
package/packages/browser/dist/index.js +9 -3
package/scaffold/dist/definitions/plugins.mjs +1 -1
package/scaffold/dist/definitions/skills/browser-use.mjs +269 -936
package/scaffold/dist/definitions/skills/c4-architecture.mjs +25 -20
package/scaffold/dist/definitions/skills/docs.mjs +1097 -1136
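The file list and the hunk header for `browser-use.mjs` can be cross-checked with a little arithmetic. In unified-diff notation, `@@ -1,986 +1,319 @@` says the hunk spans 986 lines of the old file and 319 lines of the new one. Subtracting the 936 removed lines and the 269 added lines leaves 50 lines on each side, which would be the lines common to both versions. The sketch below assumes that Mend's "+N -M" counts and the hunk header count lines the same way. The helper names `hunk_sizes` and `unchanged_lines` are hypothetical and are not part of any Mend or npm tooling.

```python
import re

# Unified-diff hunk header: "@@ -old_start[,old_len] +new_start[,new_len] @@"
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def hunk_sizes(header: str) -> tuple[int, int]:
    """Return (old_len, new_len) parsed from a unified-diff hunk header."""
    m = HUNK_RE.match(header)
    if m is None:
        raise ValueError(f"not a unified-diff hunk header: {header!r}")
    # In the unified format, an omitted length means the hunk covers one line.
    old_len = int(m.group(2)) if m.group(2) is not None else 1
    new_len = int(m.group(4)) if m.group(4) is not None else 1
    return old_len, new_len


def unchanged_lines(header: str, added: int, removed: int) -> int:
    """Lines shared by both sides, assuming the +/- counts match the hunk."""
    old_len, new_len = hunk_sizes(header)
    from_old = old_len - removed
    from_new = new_len - added
    if from_old != from_new:
        raise ValueError(f"inconsistent counts: {from_old} vs {from_new}")
    return from_old


print(unchanged_lines("@@ -1,986 +1,319 @@", added=269, removed=936))  # 50
```

If the two subtractions disagreed, the file-list counts and the hunk header would be describing different things, so the helper raises instead of guessing.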

package/scaffold/dist/definitions/skills/browser-use.mjs CHANGED

@@ -1,986 +1,319 @@
-var e=[{file:`SKILL.md`,content:`---
-name: browser-use
-description: "Browser automation for AI agents using AI Kit's owned \`browser\` MCP tool. Triggered when: (1) repo-access exhausts its Strategy Ladder and auth requires browser interaction, (2) \`web_fetch\` returns login page HTML, SAML redirect, or CAPTCHA instead of content, (3) user needs to interact with web applications (fill forms, click buttons, extract data), (4) a site requires JavaScript rendering that \`web_fetch\` cannot handle, (5) user asks to browse, scrape, test, or automate a website, or (6) another skill needs a standard recipe format for browser-driven workflows. Uses AI Kit's owned Chromium runtime and recipe patterns for domain-specific automation skills — no external MCP server dependency."
-metadata:
-  category: cross-cutting
-  domain: general
-  applicability: on-demand
-  inputs: [url, auth-error, browser-task, login-wall]
-  outputs: [page-content, screenshots, extracted-data, authenticated-session]
-  requires: []
-  relatedSkills: [repo-access, present, aikit]
-argument-hint: "URL or browser task description"
----
-# Browser Automation for AI Agents
-Use AI Kit's owned \`browser\` MCP tool to solve authentication barriers, extract data, fill forms, and interact with web applications. This skill bridges CLI-based access failures (login walls, SAML SSO, OAuth, CAPTCHAs) and real browser interaction without any external browser MCP dependency.
-## Runtime Model
-- Single MCP tool: \`browser({ action: ... })\`
-- Action-based dispatch across eight actions: \`open\`, \`read\`, \`act\`, \`navigate\`, \`eval\`, \`screenshot\`, \`dialog\`, \`session\`
-- Owned Chromium runtime managed by AI Kit itself
-- Install browser binaries once with \`aikit browser install\`
-- Runtime modes: \`headless\` for CI, \`ui\` for desktop browser windows, \`panel\` for VS Code-hosted browsing
-- Auto-idle shutdown closes inactive browser sessions after the configured timeout
-- No external MCP server, no separate browser tool registration, no extra setup after install
-## When to Activate
-### Reactive Triggers
-- \`repo-access\` exhausted its Strategy Ladder and SAML SSO, OAuth, or a login wall blocks CLI access.
-- \`web_fetch\` returns login HTML, redirect markup, or a CAPTCHA challenge instead of target content.
-- \`http\` returns \`401\` or \`403\` and the user confirms they can access the site in a browser.
-- Tool output mentions "CAPTCHA", "bot detection", "Cloudflare", "verify you are human", or similar anti-bot language.
-- User asks to interact with a web application, fill forms, click buttons, navigate flows, or extract rendered content.
-- User asks to take screenshots, inspect accessibility output, or debug a page that requires JavaScript.
+var e=[{file:`SKILL.md`,content:"---\nname: browser-use\ndescription: \"Browser automation for AI agents using AI Kit's owned `browser` MCP tool. Triggered when: (1) repo-access exhausts its Strategy Ladder and auth requires browser interaction, (2) `web_fetch` returns login page HTML, SAML redirect, or CAPTCHA instead of content, (3) user needs to interact with web applications (fill forms, click buttons, extract data), (4) a site requires JavaScript rendering that `web_fetch` cannot handle, (5) user asks to browse, scrape, test, or automate a website, or (6) another skill needs a standard recipe format for browser-driven workflows. Uses AI Kit's owned Chromium runtime and recipe patterns for domain-specific automation skills — no external MCP server dependency.\"\nmetadata:\n  category: cross-cutting\n  domain: general\n  applicability: on-demand\n  inputs: [url, auth-error, browser-task, login-wall]\n  outputs: [page-content, screenshots, extracted-data, authenticated-session, network-captures]\n  requires: []\n  relatedSkills: [repo-access, present, aikit]\nargument-hint: \"URL or browser task description\"\n---\n\n# Browser Automation for AI Agents\n\nUse AI Kit's `browser` MCP tool for authentication barriers, data extraction, form interactions, network capture, and web automation. Single tool, action-based dispatch, owned Chromium runtime.\n\n## Runtime\n\n- Tool: `browser({ action: ... 
})`\n- 11 actions: `open`, `read`, `act`, `navigate`, `network`, `console`, `fetch`, `eval`, `screenshot`, `dialog`, `session`\n- Modes: `headless` (CI), `ui` (desktop), `panel` (VS Code)\n- Install: `aikit browser install`\n- Auto-idle shutdown after timeout\n\n## When to Activate\n\n- `web_fetch` returns login HTML, SAML redirect, or CAPTCHA\n- `http` returns 401/403 and user confirms browser access works\n- `repo-access` Strategy Ladder exhausted — SSO/OAuth blocks CLI\n- Anti-bot detection (Cloudflare, \"verify you are human\")\n- User asks to browse, scrape, automate, test, or interact with a web app\n- Need screenshots, accessibility snapshots, or JS-rendered content\n- Need to capture network traffic or make authenticated API calls using page session\n\n## When NOT to Activate\n\n- Public pages `web_fetch` handles correctly\n- API endpoints reachable via `http` with auth headers\n- Static downloads via `http`\n- Tasks only needing raw HTML/links/outline\n\n## Two Automation Modes\n\n### Script Mode (Default — Imperative)\n\nDirect sequential `browser()` calls. Best for one-off tasks, testing, API capture.\n\n~~~text\n// Open → Read → Act → Read loop\nbrowser({ action: 'open', url: 'https://app.example.com', mode: 'ui' })\nbrowser({ action: 'read', pageId })\nbrowser({ action: 'act', pageId, kind: 'click', ref: '@login-button' })\nbrowser({ action: 'read', pageId })  // verify state changed\n~~~\n\n**Network Intelligence pattern:**\n\n~~~text\nbrowser({ action: 'network', pageId, subAction: 'enable', filter: { resourceTypes: ['xhr', 'fetch'] } })\n// ... 
navigate/interact to trigger API calls ...\nbrowser({ action: 'network', pageId, subAction: 'get' })\nbrowser({ action: 'network', pageId, subAction: 'export-har' })\n~~~\n\n**Authenticated API calls (using page cookies/session):**\n\n~~~text\nbrowser({ action: 'fetch', pageId, fetchUrl: 'https://app.example.com/api/data', fetchMethod: 'GET' })\n~~~\n\nExecutes `fetch()` in the page, so cookies, session state, and CSRF tokens are reused automatically.\n\n**Console capture:**\n\n~~~text\nbrowser({ action: 'console', pageId, consoleSubAction: 'enable' })\n// ... trigger page actions ...\nbrowser({ action: 'console', pageId, consoleSubAction: 'get', level: 'error' })\n~~~\n\n### Recipe Mode (Declarative)\n\nStructured step-by-step format for reusable workflows and domain skills. Each step declares Action, Verify, On Failure, and Extract fields.\n\nLoad [references/recipes.md](references/recipes.md) for full recipe templates and the recipe format specification.\n\nBrief recipe format:\n\n~~~text\nStep N: <description>\n  Action: browser({ ... 
})\n  Verify: <condition to check after action>\n  On Failure: <recovery strategy>\n  Extract: <data to capture for next steps>\n~~~\n\n## Action Reference\n\n| Action | Purpose | Key Params |\n|--------|---------|------------|\n| `open` | Launch page | `url`, `mode` (ui/headless/panel), `waitUntil` |\n| `read` | Extract content | `pageId`, `readMode` (snapshot/dom/markdown/text), `selector` |\n| `act` | DOM interaction | `pageId`, `kind`, `ref`/`selector`, `text`/`key`/`value` |\n| `navigate` | Page navigation | `pageId`, `url` or `type` (back/forward/reload/waitFor) |\n| `network` | Capture traffic | `pageId`, `subAction` (enable/get/clear/export-har), `filter` |\n| `console` | Capture console | `pageId`, `consoleSubAction` (enable/get/clear), `level` |\n| `fetch` | Page-context HTTP | `pageId`, `fetchUrl`, `fetchMethod`, `fetchHeaders`, `fetchBody` |\n| `eval` | Execute JS | `pageId`, `code` |\n| `screenshot` | Capture image | `pageId`, `selector`, `fullPage`, `clip`, `format` |\n| `dialog` | Pre-register handler for NEXT dialog | `pageId`, `accept`, `promptText` |\n| `session` | Manage sessions | `sessionAction` (list/close/cookies/set-cookie/get-storage/...) |\n\n## Read Modes\n\n| Mode | Output | Use Case |\n|------|--------|----------|\n| `snapshot` | ARIA accessibility tree with refs | Element targeting, form interaction |\n| `dom` | Raw HTML | HTML structure, debugging |\n| `markdown` | Clean readable text | Content extraction, summarization |\n| `text` | Plain text | Simple text extraction |\n\n## Interaction Kinds\n\n| Kind | Required Params | Notes |\n|------|-----------------|-------|\n| `click` | `ref` or `selector` | Left-click element |\n| `type` | `ref`/`selector` + `text` | Type into input/textarea |\n| `press` | `ref`/`selector` + `key` | Send key to element. Requires a target — use `ref` from snapshot or `selector`. 
|\n| `hover` | `ref`/`selector` | Trigger hover states |\n| `drag` | `fromRef`/`fromSelector` + `toRef`/`toSelector` | Drag and drop |\n| `select` | `ref`/`selector` + `value` | Select dropdown option |\n| `scroll` | optional `ref`/`selector` | Scroll page or element |\n| `upload` | `ref`/`selector` + `value` (path) | File upload |\n\n### Element Targeting Priority\n\n1. **`ref`** (e.g., `@F12`) — From `read(snapshot)` ARIA tree. Most reliable.\n2. **`selector`** (e.g., `input[name='q']`) — Playwright CSS/attribute selector. Precise.\n3. **`element`** (e.g., `'Submit'`) — Text matching via `text=` locator. **Picks first DOM match regardless of visibility.** Fragile for complex widgets (comboboxes, ARIA roles). Last resort.\n\n**Always `read(snapshot)` first** to get refs before interacting.\n\n> **Visibility Warning**: Playwright `act` waits up to 30s for the target to be visible. If a selector or `element` matches a hidden element first, the action times out. The browser tool does NOT expose a `force` or custom `timeout` parameter.\n>\n> **Workarounds:**\n> - Append `:visible` to selectors: `selector: 'button:has-text(\"Submit\"):visible'`\n> - Use specific selectors instead of `element` when labels are ambiguous (e.g., \"Search\" may match 30+ elements)\n> - Use `read(snapshot)` refs (`@F12`) which always target the specific rendered element\n\n## Network Intelligence\n\nThree new actions for API reverse-engineering and authenticated requests:\n\n**`network`** — Passive traffic capture with circular buffer (200 entries default):\n- `enable`: Start capturing with optional filter (resourceTypes, urlPattern, excludeUrls)\n- `get`: Retrieve captured requests + responses with timing\n- `clear`: Reset buffer\n- `export-har`: Export as HAR 1.2 format\n\nHeaders are redacted by default (Authorization, Cookie, etc.). 
Pass `showSensitive: true` to see full headers.\n\n**`console`** — Browser console message capture (1000 entries default):\n- `enable`: Start capturing all console output\n- `get`: Retrieve messages, optionally filtered by `level`\n- `clear`: Reset buffer\n\n**`fetch`** — Execute HTTP from page context:\n- Uses the page's live cookies, session, CSRF tokens\n- Supports GET/POST/PUT/PATCH/DELETE/HEAD/OPTIONS\n- Body auto-truncated at 256KB\n- Alternative to extracting cookies then calling `http` tool\n\n**Workflow — Reverse-engineer API:**\n\n~~~text\n1. open target page\n2. network enable (filter: xhr, fetch)\n3. interact with the page (click buttons, submit forms)\n4. network get → see API endpoints, methods, headers\n5. fetch → replay API calls using page session\n~~~\n\n## Session Management\n\n| Action | Purpose | Note |\n|--------|---------|------|\n| `cookies` | Export page cookies | `confirm: true` required |\n| `set-cookie` | Inject cookies | `confirm: true` required |\n| `delete-cookie` / `clear-cookies` | Remove cookies | `confirm: true` required |\n| `get-storage` / `set-storage` / `clear-storage` | localStorage/sessionStorage | |\n| `list` | List open pages | |\n| `close` | Close a page | |\n\n## Security Model\n\n**Hard gates — NEVER bypass:**\n- Credentials go via terminal input (NEVER through tool params or chat)\n- CAPTCHA/MFA: pause and ask user\n- Never store tokens in conversation\n- Close pages containing sensitive data when done\n- Verify page URL before entering credentials (phishing prevention)\n- Use `headless` mode for automated non-interactive tasks; `ui` for user-supervised auth\n\n**Cookie safety gate:** All cookie read/write session actions (`cookies`, `set-cookie`, `delete-cookie`, `clear-cookies`) require `confirm: true` as an explicit acknowledgment. 
Without it, the tool returns an error.\n\n## Integration\n\n| Skill | Handoff Pattern |\n|-------|------------------|\n| `repo-access` | Strategy Ladder step 6 → browser-use for SSO/OAuth login |\n| `present` | `present({ format: 'browser' })` returns URL → open with browser tool |\n| `aikit` | `web_fetch` fails → browser-use activates |\n\n## Dialog Handling\n\n`dialog()` registers a **one-shot handler** for the NEXT dialog. It must be called **BEFORE** the action that triggers alert, confirm, or prompt.\n\n**Pattern:**\n~~~text\nbrowser({ action: 'dialog', pageId, accept: true })\nbrowser({ action: 'eval', pageId, code: 'confirm(\"Sure?\")' }) // or browser({ action: 'act', ... }) if interaction triggers it\n~~~\n\nFor `prompt` dialogs, pass `promptText` for the response.\n\n## Troubleshooting\n\n| Issue | Fix |\n|-------|-----|\n| \"Browser not installed\" | Run `aikit browser install` |\n| Element not found | `read` with `snapshot` mode first, use ref from ARIA tree |\n| Timeout on navigation | Add `waitUntil: 'networkidle'` to open/navigate |\n| SSO redirect loop | Check cookies with `session({ sessionAction: 'cookies' })` |\n| Anti-bot block | Try `mode: 'ui'`, add delays between actions |\n| Network capture empty | Ensure `enable` called BEFORE navigating |\n\n## Decision Flow\n\n~~~text\nNeed browser?\n├─ Can web_fetch/http handle it? → NO browser needed\n├─ Login wall / SSO / CAPTCHA? → browser-use (Script mode for one-off, Recipe for reusable)\n├─ Need to capture API traffic? → network enable → interact → network get\n├─ Need authenticated API calls? → fetch action (uses page session)\n├─ JS-rendered content? → open + read(markdown)\n├─ Form interaction? → Script mode: open → read(snapshot) → act → verify\n└─ Reusable workflow? → Recipe mode (see references/recipes.md)\n~~~\n"},{file:`references/recipes.md`,content:`# Browser Recipes & Domain Skills
+Reference file for reusable browser automation patterns. Load this when building domain-specific browser workflows.
+## Recipe Format
+Each recipe step declares:
+- **Action**: The browser() call to execute
+- **Verify**: Condition to check (read page, check element, verify URL)
+- **On Failure**: Recovery strategy (retry, alternative selector, escalate)
+- **Extract**: Data to capture for subsequent steps
+## Recipe Templates
+### 1. Form Login
+~~~text
+Step 1: Open login page
+  Action: browser({ action: 'open', url: '<login-url>', mode: 'ui' })
+  Verify: Page contains login form (read snapshot, check for username/password fields)
+  On Failure: URL may have changed — check redirects
+  Extract: pageId
+Step 2: Read form structure
+  Action: browser({ action: 'read', pageId, readMode: 'snapshot' })
+  Verify: Found username field, password field, submit button
+  On Failure: Try readMode 'dom' for hidden fields
+  Extract: field refs (@username, @password, @submit)
+Step 3: Enter credentials
+  Action: browser({ action: 'act', pageId, kind: 'type', ref: '@username', text: '<from-user>' })
+  Action: browser({ action: 'act', pageId, kind: 'type', ref: '@password', text: '<from-user>' })
+  Verify: Fields populated (read snapshot)
+  On Failure: Field may need click-to-focus first
+Step 4: Submit
+  Action: browser({ action: 'act', pageId, kind: 'click', ref: '@submit' })
+  Verify: URL changed OR dashboard content appears (read markdown)
+  On Failure: Check for error messages, CAPTCHA, MFA prompt
+  Extract: authenticated session state
+Step 5: Verify authentication
+  Action: browser({ action: 'read', pageId, readMode: 'markdown' })
+  Verify: Page shows authenticated content (username, dashboard, logout link)
+  On Failure: Retry login or escalate to user
+~~~
+### 2. OAuth/SSO Flow
+~~~text
+Step 1: Open target application
+  Action: browser({ action: 'open', url: '<app-url>', mode: 'ui' })
+  Verify: Redirected to SSO provider (check URL domain change)
+  Extract: pageId, SSO provider URL
+Step 2: Read SSO login page
+  Action: browser({ action: 'read', pageId, readMode: 'snapshot' })
+  Verify: SSO form present (Okta, Azure AD, etc.)
+  On Failure: May need to click "Sign in with SSO" button first
+  Extract: form field refs
+Step 3: Enter SSO credentials (pause for user)
+  Action: PAUSE — ask user to enter credentials in browser window
+  Verify: User confirms login complete OR URL returns to app domain
+  On Failure: Check for MFA prompt, ask user to complete
+Step 4: Verify return to application
+  Action: browser({ action: 'read', pageId, readMode: 'markdown' })
+  Verify: Back on application domain with authenticated content
+  On Failure: Cookie may not have been set — check session cookies
+  Extract: session state
+Step 5: Export session (optional)
+  Action: browser({ action: 'session', pageId, sessionAction: 'cookies', confirm: true })
+  Verify: Got authentication cookies
+  Extract: cookies for http tool usage
+~~~
+### 3. Data Extraction
+~~~text
+Step 1: Navigate to data page
+  Action: browser({ action: 'open', url: '<data-url>', mode: 'headless' })
+  Verify: Page loaded successfully
+  Extract: pageId
+Step 2: Wait for dynamic content
+  Action: browser({ action: 'navigate', pageId, type: 'waitFor', selector: '<data-container>' })
+  Verify: Target element present in DOM
+  On Failure: Increase timeout, check if JS is needed
+Step 3: Extract structured data
+  Action: browser({ action: 'eval', pageId, code: 'document.querySelectorAll("<selector>").map(...)' })
+  Verify: Got expected data structure
+  On Failure: Try different selector, use read(dom) to inspect structure
+  Extract: structured data
+Step 4: Paginate (if needed)
+  Action: browser({ action: 'act', pageId, kind: 'click', ref: '@next-page' })
+  Verify: New content loaded (different from previous page)
+  On Failure: End of pagination reached
+  Extract: additional data, loop back to Step 3
+~~~
+### 4. File Upload
+~~~text
+Step 1: Navigate to upload page
+  Action: browser({ action: 'open', url: '<upload-url>', mode: 'ui' })
+  Verify: Upload form present
+  Extract: pageId
+Step 2: Locate upload input
+  Action: browser({ action: 'read', pageId, readMode: 'snapshot' })
+  Verify: Found file input element
+  Extract: file input ref
+Step 3: Upload file
+  Action: browser({ action: 'act', pageId, kind: 'upload', ref: '@file-input', value: '<local-file-path>' })
+  Verify: File name appears in UI
+  On Failure: Input may be hidden — try selector instead of ref
+Step 4: Submit upload
+  Action: browser({ action: 'act', pageId, kind: 'click', ref: '@upload-button' })
+  Verify: Success message or progress indicator
+  On Failure: Check file size limits, format restrictions
+  Extract: upload result (URL, ID, etc.)
+~~~
+### 5. API Reverse-Engineering (Network Intelligence)
+~~~text
+Step 1: Open application and enable capture
+  Action: browser({ action: 'open', url: '<app-url>', mode: 'ui' })
+  Action: browser({ action: 'network', pageId, subAction: 'enable', filter: { resourceTypes: ['xhr', 'fetch'] } })
+  Verify: Network capture enabled
+  Extract: pageId
+Step 2: Trigger target API calls
+  Action: browser({ action: 'act', pageId, kind: 'click', ref: '@action-button' })
+  Verify: Page state changed (new data loaded)
+  On Failure: Try different interaction to trigger API
+Step 3: Review captured traffic
+  Action: browser({ action: 'network', pageId, subAction: 'get' })
+  Verify: Found target API endpoints in captured requests
+  Extract: API URL, method, headers, response structure
+Step 4: Replay with fetch (using page session)
+  Action: browser({ action: 'fetch', pageId, fetchUrl: '<captured-api-url>', fetchMethod: 'GET' })
+  Verify: Got same response as captured
+  Extract: API response data
+Step 5: Export HAR (optional)
+  Action: browser({ action: 'network', pageId, subAction: 'export-har' })
+  Extract: HAR file for documentation or replay
+~~~
+### 6. Monitoring / Health Check
-### Proactive Triggers
+~~~text
+Step 1: Open target page
+  Action: browser({ action: 'open', url: '<url>', mode: 'headless', waitUntil: 'networkidle' })
+  Verify: Page loaded without errors
+  Extract: pageId
-- Task involves an internal or enterprise web application with SSO.
-- User asks to browse, scrape, test, or automate a website.
-- A workflow already uses \`present({ format: 'browser' })\` and you need to open the returned local dashboard URL.
+Step 2: Check for error indicators
+  Action: browser({ action: 'console', pageId, consoleSubAction: 'enable' })
+  Action: browser({ action: 'read', pageId, readMode: 'markdown' })
+  Verify: No error banners, expected content present
+  Extract: page health status
-## When NOT to Activate
+Step 3: Check console for JS errors
+  Action: browser({ action: 'console', pageId, consoleSubAction: 'get', level: 'error' })
+  Verify: No critical JS errors
+  Extract: error list (if any)
-- Public pages that \`web_fetch\` handles correctly and do not require interaction.
-- API endpoints that are reachable via \`http\` with proper auth headers.
-- Static downloads that work through \`http\` or repo-local tooling.
-- Tasks that only need raw HTML, links, or outline extraction.
+Step 4: Screenshot for visual regression
+  Action: browser({ action: 'screenshot', pageId, fullPage: true })
+  Extract: screenshot for comparison
+~~~
-## Browser Action Reference
+## Creating Domain Skills
-| Action | Purpose | Key Params |
-|--------|---------|------------|
-| \`open\` | Launch browser page | \`url\`, \`mode\` (\`ui\`/\`headless\`/\`panel\`), \`waitUntil\` |
-| \`read\` | Extract page content | \`pageId\`, \`readMode\` (\`snapshot\`/\`dom\`/\`markdown\`/\`text\`), \`selector\` |
-| \`act\` | DOM interactions | \`pageId\`, \`kind\` (\`click\`/\`type\`/\`press\`/\`hover\`/\`drag\`/\`select\`/\`scroll\`/\`upload\`) |
-| \`navigate\` | Page navigation | \`pageId\`, \`url\`/\`type\`/\`selector\` |
-| \`eval\` | Execute JavaScript | \`pageId\`, \`code\` |
-| \`screenshot\` | Capture screenshots | \`pageId\`, \`selector\`, \`fullPage\`, \`clip\`, \`format\`, \`quality\` |
-| \`dialog\` | Handle dialogs | \`pageId\`, \`accept\`, \`promptText\` |
-| \`session\` | Session management | \`sessionAction\` (\`list\`/\`close\`/\`cookies\`/\`set-cookie\`/\`delete-cookie\`/\`clear-cookies\`/\`get-storage\`/\`set-storage\`/\`clear-storage\`) |
+Domain skills use browser-use recipes as building blocks for specific automation tasks (e.g., "jira-automation", "salesforce-extract").
-## Core Workflow
+### Structure
-Every browser task follows the same loop:
+A domain skill should:
+1. Import the browser-use skill as a dependency
+2. Define domain-specific recipes using the recipe format above
+3. Include domain knowledge (selectors, URLs, auth patterns for that service)
+4. Provide error recovery specific to that domain
-\`\`\`
-1. OPEN  → browser({ action: 'open', url: '<target>', mode: 'ui' })
-2. READ  → browser({ action: 'read', pageId })
-3. ACT   → browser({ action: 'act', pageId, kind: 'click' | 'type' | 'press' | 'hover' | 'drag' | 'select', ... })
-4. READ  → browser({ action: 'read', pageId })
-5. LOOP  → Repeat steps 3-4 until the task is complete
-\`\`\`
+### Quality Checklist
-## Usage Examples
+- [ ] Every recipe step has Action + Verify + On Failure
+- [ ] Credentials handled via user input (never hardcoded)
+- [ ] Selectors are resilient (prefer aria labels, data-testid, role over CSS classes)
+- [ ] Timeouts configured for slow pages
+- [ ] Screenshots taken at key verification points
+- [ ] Session cleanup (close page when done)
+- [ ] Works in both headless and ui modes
-### Open and Inspect a Page
+### Composition
-\`\`\`
-const { pageId } = await browser({ action: 'open', url: 'https://example.com', mode: 'ui' })
-await browser({ action: 'read', pageId })
-\`\`\`
+Recipes compose by chaining — output of one recipe feeds into another:
-### Login to a Web Application
+~~~text
+Login Recipe → (authenticated pageId) → Data Extraction Recipe → (data) → Present Results
+~~~
-\`\`\`
-const { pageId } = await browser({ action: 'open', url: 'https://example.com/login', mode: 'ui' })
+For complex flows:
-await browser({ action: 'read', pageId })
-await browser({ action: 'act', pageId, kind: 'type', ref: '@username-input', text: 'user@example.com' })
-await browser({ action: 'act', pageId, kind: 'type', ref: '@password-input', text: '<user-provided>' })
-await browser({ action: 'act', pageId, kind: 'click', ref: '@login-button' })
-await browser({ action: 'read', pageId })
-\`\`\`
+~~~text
+Login Recipe → Network Enable → Interaction Recipe → Network Get → Fetch API → Process Data
+~~~
+`},{file:`references/auth-patterns.md`,content:`# Authentication Patterns
-**Rule:** ask the user for credentials and 2FA codes. Never guess, reuse, or log them.
+Browser-based authentication strategies for different auth mechanisms.
-### Extract Content from an Authenticated Page
+## Pattern 1: Basic Form Login
-\`\`\`
-const { pageId } = await browser({ action: 'open', url: 'https://internal.company.com/docs', mode: 'ui' })
-await browser({ action: 'read', pageId })
+**When:** Simple username/password form without SSO.
-await browser({
-  action: 'eval',
-  pageId,
-  code: "return page.evaluate(() => document.querySelector('main')?.innerText ?? '')",
-})
-\`\`\`
+**Steps:**
+1. \`open\` login page
+2. \`read\` snapshot to find form fields
+3. \`act\` type credentials (ask user for input via terminal)
+4. \`act\` click submit
+5. \`read\` to verify authenticated state
-### Navigate, Hover, and Capture a Screenshot
+**Verification:** URL changed to dashboard, or page contains user-specific content.
-\`\`\`
-await browser({ action: 'navigate', pageId, url: 'https://example.com/dashboard' })
-await browser({ action: 'act', pageId, kind: 'hover', selector: '[data-help]' })
-await browser({ action: 'screenshot', pageId })
-\`\`\`
+**Fallback:** Check for error messages, CAPTCHA, or MFA prompts.
-### Session Management
+## Pattern 2: OAuth2 / SSO (Okta, Azure AD, Google)
-\`\`\`
-await browser({ action: 'session', sessionAction: 'list' })
-await browser({ action: 'session', sessionAction: 'cookies', pageId })
-await browser({ action: 'session', sessionAction: 'close', pageId })
-\`\`\`
+**When:** Application redirects to external identity provider.
-Use cookie export only when the user explicitly needs session transfer back into CLI tools.
+**Steps:**
+1. \`open\` application URL — observe redirect to IdP
+2. \`read\` SSO page snapshot
+3. PAUSE — ask user to complete SSO login in the browser window
+4. Detect redirect back to application (URL change)
+5. \`read\` to verify authenticated
-## Read Modes
+**Verification:** URL domain returned to application, authenticated content visible.
-### Get ARIA snapshot (default)
+**Fallback:** User may need to complete MFA or consent screens manually.
-\`\`\`
-browser({ action: 'read', pageId })
-browser({ action: 'read', pageId, readMode: 'snapshot' })
-\`\`\`
+## Pattern 3: Cookie/Session Transfer
-### Get page as clean markdown
+**When:** Need to use captured session in \`http\` tool for API calls.
-\`\`\`
-browser({ action: 'read', pageId, readMode: 'markdown' })
-\`\`\`
+**Steps:**
+1. Complete authentication via Pattern 1 or 2
+2. \`session({ sessionAction: 'cookies', confirm: true })\` — export all cookies
+3. Use cookies in \`http\` tool: \`http({ url, headers: { Cookie: '<exported>' } })\`
-### Get HTML content (full page or scoped)
+**Alternative (Recommended):** Use \`browser({ action: 'fetch', pageId, fetchUrl: '<api-endpoint>' })\` instead of extracting cookies. The \`fetch\` action executes HTTP requests directly in the page context, automatically using the page's cookies, session, and CSRF tokens. No cookie extraction or manual header management needed.
-\`\`\`
-browser({ action: 'read', pageId, readMode: 'dom' })
-browser({ action: 'read', pageId, readMode: 'dom', selector: 'main' })
-\`\`\`
+**Verification:** API response returns authenticated data.
-### Get plain text
+## Pattern 4: Client Certificate / mTLS
-\`\`\`
-browser({ action: 'read', pageId, readMode: 'text', selector: '.article-content' })
-\`\`\`
+**When:** Site requires client certificate authentication.
-## Advanced Screenshots
+**Steps:**
+1. Certificate must be installed in the system cert store
+2. \`open\` with \`mode: 'ui'\` — browser will prompt for cert selection
+3. PAUSE — user selects certificate
+4. Verify page loads authenticated content
-### Capture specific region
+**Note:** Headless mode may not support cert picker — use \`ui\` mode.
-\`\`\`
-browser({ action: 'screenshot', pageId, clip: { x: 0, y: 0, width: 800, height: 600 } })
-\`\`\`
+## Pattern 5: Multi-Factor Authentication (MFA)
-### JPEG format with quality
+**When:** Login requires second factor (TOTP, SMS, push notification).
-\`\`\`
-browser({ action: 'screenshot', pageId, format: 'jpeg', quality: 80 })
-\`\`\`
+**Steps:**
+1. Complete username/password entry
+2. \`read\` snapshot — detect MFA prompt type
+3. PAUSE — ask user to complete MFA (enter code, approve push, etc.)
+4. \`read\` to verify MFA completed and authenticated
-### Element screenshot with format
+**Hard Rule:** NEVER attempt to automate MFA — always defer to user.
-\`\`\`
-browser({ action: 'screenshot', pageId, selector: '.chart', format: 'png' })
-\`\`\`
+## Pattern 6: API Key Discovery
-## Cookie Management
+**When:** Need to find API keys or tokens from an authenticated web session.
-### Set cookies
+**Steps:**
+1. Authenticate using Pattern 1 or 2
+2. \`network({ subAction: 'enable', filter: { resourceTypes: ['xhr', 'fetch'] } })\`
+3. Navigate to pages that trigger API calls
+4. \`network({ subAction: 'get' })\` — inspect Authorization headers
+5. Extract bearer tokens, API keys from captured request headers
-\`\`\`
-browser({ action: 'session', sessionAction: 'set-cookie', confirm: true, cookies: [{ name: 'token', value: 'abc', domain: '.example.com', path: '/' }] })
-\`\`\`
+**Important:** Pass \`showSensitive: true\` to see full Authorization headers (redacted by default).
-### Delete specific cookie
+**Alternative:** Use \`eval\` to read tokens from localStorage/sessionStorage:
-\`\`\`
-browser({ action: 'session', sessionAction: 'delete-cookie', confirm: true, name: 'tracking' })
-\`\`\`
+~~~text
+browser({ action: 'eval', pageId, code: 'localStorage.getItem("authToken")' })
+~~~
-### Clear all cookies
+## Pattern 7: Session Persistence Across Tasks
-\`\`\`
-browser({ action: 'session', sessionAction: 'clear-cookies', confirm: true })
-\`\`\`
+**When:** Need to maintain authenticated state for multiple operations.
-## Storage Access
+**Steps:**
+1. Authenticate once (any pattern above)
+2. Keep the page open (don't close)
+3. Use \`fetch\` action for subsequent API calls — inherits page session
+4. Or use \`navigate\` to move between pages — cookies persist
-### Read all localStorage
-\`\`\`
-browser({ action: 'session', sessionAction: 'get-storage', pageId, storageType: 'localStorage' })
-\`\`\`
-### Read specific key
-\`\`\`
-browser({ action: 'session', sessionAction: 'get-storage', pageId, storageType: 'localStorage', storageKey: 'user-preferences' })
-\`\`\`
-### Set storage value
-\`\`\`
-browser({ action: 'session', sessionAction: 'set-storage', pageId, storageType: 'localStorage', storageKey: 'theme', storageValue: 'dark' })
-\`\`\`
-### Clear sessionStorage
-\`\`\`
-browser({ action: 'session', sessionAction: 'clear-storage', pageId, storageType: 'sessionStorage' })
-\`\`\`
-## Scroll and Upload
-### Scroll down
-\`\`\`
-browser({ action: 'act', pageId, kind: 'scroll', value: 'down 500' })
-\`\`\`
-### Scroll to top/bottom
-\`\`\`
-browser({ action: 'act', pageId, kind: 'scroll', value: 'top' })
-browser({ action: 'act', pageId, kind: 'scroll', value: 'bottom' })
-\`\`\`
-### Scroll element into view
-\`\`\`
-browser({ action: 'act', pageId, kind: 'scroll', selector: '#target-element' })
-\`\`\`
-### Upload file
-\`\`\`
-browser({ action: 'act', pageId, kind: 'upload', selector: 'input[type="file"]', value: '/path/to/file.pdf' })
-\`\`\`
-### Upload multiple files
-\`\`\`
-browser({ action: 'act', pageId, kind: 'upload', selector: 'input[type="file"]', value: '["/path/file1.pdf", "/path/file2.pdf"]' })
-\`\`\`
-## Browser Automation Recipes
-The browser tool is the foundation for multi-step web automation. Use this section to standardize recipes that domain-specific skills can consume, extend, and execute without inventing their own browser workflow format.
-### Recipe Format
-A browser recipe is a markdown workflow with explicit metadata, variables, steps, and cleanup.
-#### Metadata
-- **Name** — Human-readable recipe name
-- **Trigger** — When the recipe should be used
-- **Target** — Domain, URL family, or app surface it operates on
-- **Mode** — \`headless\`, \`ui\`, or \`panel\`
-- **Requires Auth** — \`yes\` or \`no\`
-- **Destructive** — \`yes\` or \`no\`; destructive recipes require explicit user confirmation before execution
-#### Variables
-Define placeholders the agent must resolve before starting.
-- \`{{url}}\` — target URL
-- \`{{username}}\` — login or account identifier
-- \`{{file_path}}\` — file path for uploads
-For each variable, document what it means, whether the agent can infer it or must ask the user, whether it is sensitive, and an example value when that removes ambiguity.
-#### Steps
-Each numbered step should include:
-1. **Action** — exact \`browser(...)\` call
-2. **Verify** — how to confirm the action succeeded
-3. **On Failure** — recovery path if verification fails
-4. **Extract** — data to capture for later steps or the final result
-#### Cleanup
-Cleanup always runs, even when earlier steps fail. Close pages, export only user-approved session state, and leave the browser runtime in a known state.
-#### Recipe Skeleton
-\`\`\`markdown
-# Recipe: <Name>
-## Metadata
-- Name: <Human-readable name>
-- Trigger: <When to use it>
-- Target: <Domain or URL family>
-- Mode: headless
-- Requires Auth: no
-- Destructive: no
-## Variables
-- \`{{url}}\` — Target URL
-- \`{{selector}}\` — Primary element selector
-## Steps
-1. Open target
-   - Action: \`browser({ action: 'open', url: '{{url}}', mode: 'headless' })\`
-   - Verify: Browser returns a \`pageId\`
-   - On Failure: Retry once with \`waitUntil: 'load'\` or \`mode: 'ui'\`
-   - Extract: Save \`pageId\`
-2. Inspect page
-   - Action: \`browser({ action: 'read', pageId, readMode: 'snapshot' })\`
-   - Verify: Expected controls or content appear in output
-   - On Failure: Reload and re-read
-   - Extract: Save refs, selectors, visible labels
-## Cleanup
-- \`browser({ action: 'session', sessionAction: 'close', pageId })\`
-\`\`\`
-### Recipe Templates
-#### Recipe: Submit Web Form
-**Variables**
-- \`{{url}}\` — form page URL
-- \`{{fields}}\` — field values keyed by selector or control ref
-**Steps**
-1. Open page
-   - Action: \`browser({ action: 'open', url: '{{url}}', mode: 'headless', waitUntil: 'domcontentloaded' })\`
-   - Verify: Browser returns a \`pageId\`
-   - On Failure: Retry once with \`waitUntil: 'load'\` or \`mode: 'ui'\`
-   - Extract: Save \`pageId\`
-2. Read form structure
-   - Action: \`browser({ action: 'read', pageId, readMode: 'snapshot' })\`
-   - Verify: Form fields and submit button appear in output
-   - On Failure: Re-read after reload or scope the read with a form selector
-   - Extract: Required fields, labels, visible validation hints, selectors or refs
-3. Fill fields
-   - Action: For text inputs, use \`browser({ action: 'act', pageId, kind: 'type', selector: fieldSelector, text: value })\`
-   - Action: For dropdowns, use \`browser({ action: 'act', pageId, kind: 'select', selector: fieldSelector, value: optionValue })\`
-   - Action: For checkboxes or radio buttons, use \`browser({ action: 'act', pageId, kind: 'click', selector: fieldSelector })\`
-   - Verify: Re-read affected fields or take a screenshot after the batch
-   - On Failure: Re-read page, correct the selector, retry the failed field once
-   - Extract: Inline validation messages and any server-provided field defaults
-4. Verify form state
-   - Action: \`browser({ action: 'screenshot', pageId, fullPage: true })\`
-   - Verify: Screenshot shows required fields populated as expected
-   - On Failure: Read visible validation messages with \`browser({ action: 'read', pageId, readMode: 'text' })\`
-   - Extract: Evidence screenshot for the final report
-5. Submit
-   - Action: \`browser({ action: 'act', pageId, kind: 'click', selector: 'button[type="submit"]' })\`
-   - Verify: \`browser({ action: 'read', pageId, readMode: 'text' })\` shows a success message or the page navigates to a confirmation state
-   - On Failure: Inspect validation errors, fix fields, retry once
-   - Extract: Success text, destination URL, confirmation number if present
-6. Capture result
-   - Action: \`browser({ action: 'read', pageId, readMode: 'markdown' })\`
-   - Verify: Output contains the expected success state
-   - On Failure: Fall back to \`readMode: 'text'\`
-   - Extract: Confirmation content for downstream skills
-**Cleanup**
-- \`browser({ action: 'session', sessionAction: 'close', pageId })\`
-#### Recipe: Extract Data from Web Page
-**Variables**
-- \`{{url}}\` — target page URL
-- \`{{data_selector}}\` — selector for the content to extract
-- \`{{pagination_selector}}\` — selector for the next-page control, when pagination exists
-**Steps**
-1. Open page
-   - Action: \`browser({ action: 'open', url: '{{url}}', mode: 'headless' })\`
-   - Verify: Browser returns a \`pageId\`
-   - On Failure: Retry with \`waitUntil: 'networkidle'\`
-   - Extract: Save \`pageId\`
-2. Extract content
-   - Action: \`browser({ action: 'read', pageId, readMode: 'markdown', selector: '{{data_selector}}' })\`
-   - Verify: Output is non-empty and scoped to the requested selector
-   - On Failure: Re-run with \`readMode: 'text'\` or confirm the selector with a snapshot read
-   - Extract: Store extracted content for the current page
-3. Check for pagination
-   - Action: \`browser({ action: 'read', pageId, readMode: 'snapshot' })\`
-   - Verify: Snapshot shows either a next-page control or a clear end state
-   - On Failure: Reload once, then re-read
-   - Extract: Whether \`{{pagination_selector}}\` exists and appears enabled
-4. Advance when another page exists
-   - Action: If \`{{pagination_selector}}\` is present and enabled, run \`browser({ action: 'act', pageId, kind: 'click', selector: '{{pagination_selector}}' })\`
-   - Verify: \`browser({ action: 'navigate', pageId, type: 'waitFor', selector: '{{data_selector}}', timeoutMs: 30000 })\`
-   - On Failure: Reload page and retry pagination once
-   - Extract: Updated page content, page count, or cursor state
-5. Repeat until no more pages
-   - Action: Return to step 2 after successful pagination
-   - Verify: Loop exits only when the next-page control is missing or disabled
-   - On Failure: Stop and report partial results
-   - Extract: Aggregate page-by-page results
-**Cleanup**
-- \`browser({ action: 'session', sessionAction: 'close', pageId })\`
-#### Recipe: Upload File to Web Service
-**Variables**
-- \`{{url}}\` — upload page URL
-- \`{{file_path}}\` — local file path to upload
-- \`{{file_input_selector}}\` — file input selector, usually \`input[type="file"]\`
-**Steps**
-1. Open upload page
-   - Action: \`browser({ action: 'open', url: '{{url}}', mode: 'headless' })\`
-   - Verify: Upload page loads successfully
-   - On Failure: Retry with \`mode: 'ui'\`
-   - Extract: Save \`pageId\`
-2. Inspect upload controls
-   - Action: \`browser({ action: 'read', pageId, readMode: 'snapshot' })\`
-   - Verify: File input and submit controls are present
-   - On Failure: Re-read after reload or refine the selector
-   - Extract: Confirm the file input selector and submit control
-3. Upload file
-   - Action: \`browser({ action: 'act', pageId, kind: 'upload', selector: '{{file_input_selector}}', value: '{{file_path}}' })\`
-   - Verify: Selected filename appears in the page or read output
-   - On Failure: Verify the file exists, confirm the selector targets a real \`<input type="file">\`, retry once
-   - Extract: Selected filename and any client-side validation message
-4. Submit upload
-   - Action: \`browser({ action: 'act', pageId, kind: 'click', selector: '.upload-submit' })\`
-   - Verify: \`browser({ action: 'navigate', pageId, type: 'waitFor', selector: '.upload-success', timeoutMs: 30000 })\`
-   - On Failure: Read the page for upload errors, then retry once if recoverable
-   - Extract: Completion state and resulting URL if visible
-5. Verify upload
-   - Action: \`browser({ action: 'read', pageId, readMode: 'text' })\`
-   - Verify: Output includes upload confirmation
-   - On Failure: Take a screenshot and report an ambiguous completion state
-   - Extract: Confirmation text, file URL, or server response summary
-**Cleanup**
-- \`browser({ action: 'session', sessionAction: 'close', pageId })\`
-#### Recipe: Authenticated Web Task
-**Variables**
-- \`{{login_url}}\` — login page URL
-- \`{{target_url}}\` — target page after login
-- \`{{username}}\` — account identifier, ask the user if not already known
-- \`{{password}}\` — sensitive; do not store or echo it, and prefer having the user type it directly in the browser UI
-**Steps**
-1. Open login page
-   - Action: \`browser({ action: 'open', url: '{{login_url}}', mode: 'ui', waitUntil: 'domcontentloaded' })\`
-   - Verify: Login page is visible
-   - On Failure: Retry with \`waitUntil: 'load'\`
-   - Extract: Save \`pageId\`
-2. Read login form
-   - Action: \`browser({ action: 'read', pageId, readMode: 'snapshot' })\`
-   - Verify: Username, password, and submit controls are visible
-   - On Failure: Reload and re-read, or ask the user to describe the current page state
-   - Extract: Login selectors, SSO options, and challenge indicators
-3. Enter credentials
-   - Action: Ask the user for \`{{username}}\` if needed, then run \`browser({ action: 'act', pageId, kind: 'type', selector: usernameSelector, text: '{{username}}' })\`
-   - Action: Have the user type the password directly in the visible browser when possible
-   - Action: After the user confirms password entry, run \`browser({ action: 'act', pageId, kind: 'click', selector: submitSelector })\`
-   - Verify: Page advances to a post-login state
-   - On Failure: Re-read and classify the blocker as invalid credentials, 2FA, CAPTCHA, or selector mismatch
-   - Extract: Login result state
-4. Handle post-login challenges
-   - Action: \`browser({ action: 'read', pageId, readMode: 'snapshot' })\`
-   - Verify: Output shows whether 2FA, CAPTCHA, consent, or success is present
-   - On Failure: Take a screenshot and ask the user what they see
-   - Extract: Challenge type and controls needed to continue
-   - If 2FA appears: ask the user for the code or have them enter it directly in the UI, then continue
-   - If CAPTCHA appears: ask the user to solve it manually, then continue
-5. Navigate to target
-   - Action: \`browser({ action: 'navigate', pageId, url: '{{target_url}}' })\`
-   - Verify: Target page loads and expected content appears
-   - On Failure: Retry once after a fresh read or follow the app's redirect path manually
-   - Extract: Final URL and target page state
-6. Perform task-specific work
-   - Action: Insert task-specific browser steps using the same Action / Verify / On Failure / Extract pattern
-   - Verify: Task-specific completion criteria hold
-   - On Failure: Stop after two failed recoveries and report the current state to the user
-   - Extract: Requested result data
-**Cleanup**
-- \`browser({ action: 'session', sessionAction: 'close', pageId })\`
-#### Recipe: Monitor Web Page for Changes
-**Variables**
-- \`{{url}}\` — page to monitor
-- \`{{watch_selector}}\` — selector for the watched element
-- \`{{interval_ms}}\` — time between checks in milliseconds
-**Steps**
-1. Open page
-   - Action: \`browser({ action: 'open', url: '{{url}}', mode: 'headless' })\`
-   - Verify: Browser returns a \`pageId\`
-   - On Failure: Retry with \`mode: 'ui'\`
-   - Extract: Save \`pageId\`
-2. Capture baseline
-   - Action: \`browser({ action: 'read', pageId, readMode: 'text', selector: '{{watch_selector}}' })\`
-   - Verify: Baseline content is non-empty
-   - On Failure: Confirm the selector with a snapshot read
-   - Extract: Baseline content for later comparison
-3. Wait and re-check
-   - Action: \`browser({ action: 'eval', pageId, code: 'await new Promise((resolve) => setTimeout(resolve, {{interval_ms}}))' })\`
-   - Action: \`browser({ action: 'navigate', pageId, type: 'reload' })\`
-   - Action: \`browser({ action: 'read', pageId, readMode: 'text', selector: '{{watch_selector}}' })\`
-   - Verify: New content is captured successfully
-   - On Failure: Reload again and retry once
-   - Extract: Current content for diffing
-4. Compare against baseline
-   - Action: Compare the current content with the stored baseline outside the browser call
-   - Verify: Comparison is deterministic
-   - On Failure: Re-run the text read once to rule out a partial load
-   - Extract: Changed or unchanged state
-   - If changed: report it to the user and capture \`browser({ action: 'screenshot', pageId, selector: '{{watch_selector}}' })\`
-   - If unchanged: return to step 3
-**Cleanup**
-- \`browser({ action: 'session', sessionAction: 'close', pageId })\`
-### Execution Protocol
-When an agent receives a browser recipe to execute:
-1. **Resolve variables** — ask the user for all unresolved \`{{variables}}\`, and explicitly flag which ones are sensitive.
-2. **Pre-flight the environment** — if the recipe requires auth, destructive actions, uploads, or cookie export, warn the user before starting.
-3. **Run sequentially on each page** — within one page, execute Action → Verify → On Failure → Extract in order. Shared DOM state is not parallel-safe.
-4. **Stop after two failed recoveries on the same step** — report the current state, what failed, and what the user can do next.
-5. **Run cleanup even on failure** — always close pages unless the user asked to keep the session open.
-6. **Summarize results** — report what was completed, what data was extracted, and which steps were skipped or blocked.
-### Error Recovery Strategies
-| Error | Recovery |
-|-------|----------|
-| Element not found | Re-read with \`readMode: 'snapshot'\`, adjust selector or ref, then retry once |
-| Page timeout | Reload page, wait for a more specific selector, then retry |
-| Navigation failed | Verify target URL, try \`waitUntil: 'load'\` on open or \`type: 'waitFor'\` on navigate |
-| Auth required | Switch to \`mode: 'ui'\`, follow the auth pattern, and let the user handle secrets directly |
-| CAPTCHA or human check | Stop and ask the user to solve it manually, then continue from the next read |
-| File upload failed | Verify local file path, confirm the selector targets a real file input, retry once |
-| Storage access denied | Fall back to a narrow \`eval\` call only when browser session storage APIs are blocked |
-| Network error | Wait briefly, reload, retry once, then report partial progress |
-### Building Skills on Browser Primitives
-\`browser-use\` is the foundation skill. Domain-specific skills such as deployment planners, release-note generators, internal admin workflows, or authenticated data collectors should depend on it instead of redefining browser semantics.
-When another skill ships browser automation, it should treat this section as the shared contract.
-#### Domain Skill Architecture
-\`browser-use\` provides the primitives. Domain skills provide the workflow.
-A domain skill built on top of browser automation should follow this architecture:
-- The domain skill has its own \`SKILL.md\` that explains what business task it automates, such as creating deployment release notes from GitHub PRs.
-- Browser recipes live inside that skill, either embedded directly in \`SKILL.md\` or stored as reusable docs under the skill's \`references/\` directory.
-- The domain skill references \`browser-use\` for browser action semantics, security rules, auth escalation, and recovery patterns instead of redefining them.
-- The domain skill guides the LLM through the end-to-end workflow, including when to gather inputs, when to run browser recipes, when to switch tools, and how to format the final output.
-- Browser recipes handle web interaction details. The domain skill handles business intent, sequencing, domain-specific validation, and final deliverables.
-This separation matters because a teammate usually does not want "a browser script." They want a skill for a business outcome such as deployment planning, release-note generation, status monitoring, or internal-tool automation. The domain skill explains the outcome and uses browser recipes as implementation building blocks.
-#### Example: Deployment Release Notes Skill
-A teammate wants a skill that automates creating release notes by scraping GitHub PRs, commit history, and linked tickets. Here is how the skill's \`SKILL.md\` would be structured:
-\`\`\`markdown
-# Deployment Release Notes - Automated Release Documentation
-Generate deployment release notes by collecting PR descriptions, commit messages, and linked tickets from GitHub, then formatting them into a structured release document.
-**When to use:** Before a deployment, when the team needs a summary of changes, or when creating a changelog for stakeholders.
-**Prerequisites:**
-- \`browser-use\` skill loaded (provides browser automation primitives and recipe format)
-- Access to the GitHub repository (may require auth)
-## Workflow
-### Step 1: Gather Context
-- Ask the user for: repository URL, release branch/tag, previous release tag
-- Determine if GitHub auth is needed (private repo -> use Authenticated Web Task recipe from browser-use)
-### Step 2: Collect PR Data
-Follow this browser recipe:
-# Recipe: Extract GitHub PRs Between Tags
-## Metadata
-- Name: GitHub PR Extraction
-- Trigger: Need to list merged PRs between two git tags
-- Target: github.com or GitHub Enterprise
-- Mode: headless (public repos) or ui (private repos needing auth)
-- Requires Auth: depends on repo visibility
-- Destructive: no
-## Variables
-- \`{{repo_url}}\` - GitHub repository URL (e.g., https://github.com/org/repo)
-- \`{{base_tag}}\` - Previous release tag
-- \`{{head_tag}}\` - New release tag
-## Steps
-1. Open compare page
-   - Action: \`browser({ action: 'open', url: '{{repo_url}}/compare/{{base_tag}}...{{head_tag}}', mode: 'headless' })\`
-   - Verify: Page loads with comparison content
-   - On Failure: Switch to \`mode: 'ui'\` for auth, follow browser-use auth pattern
-   - Extract: Save \`pageId\`
-2. Extract PR list
-   - Action: \`browser({ action: 'read', pageId, readMode: 'markdown', selector: '.js-commits-list-item, .pr-list' })\`
-   - Verify: Output contains commit or PR references
-   - On Failure: Try \`readMode: 'dom'\` and parse HTML, or use the commits tab instead
-   - Extract: List of PRs with titles, numbers, authors
-3. For each PR, extract details
-   - Action: \`browser({ action: 'navigate', pageId, url: '{{repo_url}}/pull/{{pr_number}}' })\`
-   - Action: \`browser({ action: 'read', pageId, readMode: 'markdown', selector: '.comment-body' })\`
-   - Verify: PR description is captured
-   - On Failure: Fall back to \`readMode: 'text'\`
-   - Extract: PR title, description, labels, linked issues
-## Cleanup
-- \`browser({ action: 'session', sessionAction: 'close', pageId })\`
-### Step 3: Collect Linked Tickets (Optional)
-If PRs reference JIRA or ticket URLs, follow the Data Extraction recipe from browser-use to scrape ticket titles and statuses.
-### Step 4: Format Release Notes
-Using the collected data, generate a structured release document:
-- Group changes by category (features, fixes, chores) based on PR labels or commit prefixes
-- Include PR links, authors, and ticket references
-- Add deployment metadata (date, branch, tag range)
-### Step 5: Output
-Present the release notes to the user. Offer to:
-- Copy to clipboard
-- Save as markdown file
-- Create a GitHub Release draft (requires additional browser recipe)
-## Error Handling
-- If GitHub auth is needed, follow browser-use auth patterns (switch to ui mode, let user handle SSO or 2FA)
-- If PR extraction returns empty results, verify the tag names exist and the compare URL is correct
-- If the ticket system is unreachable, skip ticket enrichment and note it in the output
-\`\`\`
-This example shows the pattern: the domain skill orchestrates the workflow, deciding what to collect and how to format it, while \`browser-use\` provides the primitives for opening pages, reading content, handling auth, and recovering from common browser failures.
-#### How to Help Users Create Domain Skills
-When a user asks you to create a skill that automates a web-based workflow, follow this process:
-1. **Identify the workflow** - Ask what manual steps the user currently performs, which websites or web apps are involved, and what data they need to extract or actions they need to perform.
-2. **Map to recipes** - Break the workflow into discrete browser recipes. Each recipe should handle one website or one logical browser task. For example, extracting PRs from GitHub is one recipe, while formatting release notes is a separate non-browser step.
-3. **Check for reusable recipes** - Reuse and adapt the templates in this skill first: Form Submission, Data Extraction, File Upload, Authenticated Web Task, and Monitor Web Page. Do not write everything from scratch when an existing pattern already fits.
-4. **Structure the skill** - Give the user a skill layout that separates the main workflow from reusable references:
-\`\`\`text
-my-automation-skill/
-   SKILL.md
-   references/
-      recipe-github-extract-prs.md
-      recipe-jira-get-tickets.md
-\`\`\`
-5. **Write the SKILL.md** - Include these sections:
-   - **Header** - What the skill does, when to use it, and prerequisites. Always list \`browser-use\` when browser recipes are part of the workflow.
-   - **Workflow** - Numbered high-level steps that mix browser recipes with non-browser reasoning, formatting, summarization, or file generation.
-   - **Embedded recipes** - Put workflow-specific browser tasks inline when they are tightly coupled to that skill.
-   - **Referenced recipes** - Link to reusable docs under \`references/\` when the same recipe may be reused across multiple skills.
-   - **Error handling** - Describe domain-specific recovery, such as what to do when auth fails, data is missing, or target pages change.
-   - **Output** - State what artifact the skill produces and how it should be delivered to the user.
-6. **Test the skill** - Run each recipe in \`mode: 'ui'\` first to validate selectors, flow, and auth handling. Only switch to \`headless\` after the browser interactions are proven stable.
-7. **Register the skill** - Add it to \`scaffold/definitions/plugins.mjs\` so it deploys with \`aikit init\`.
-#### Domain Skill Ideas
-These are examples of skills teams could build on top of \`browser-use\`:
-| Skill | What it automates | Key recipes used |
-|-------|-------------------|------------------|
-| Deployment Release Notes | Scrape PRs, commits, and tickets into a formatted changelog | Authenticated Web Task, Data Extraction |
-| Deployment Plan Creator | Gather service dependencies, change scope, and risk inputs from internal tools | Data Extraction, Form Submission |
-| Status Page Monitor | Watch status pages and summarize changes | Monitor Web Page |
-| Form Auto-filler | Pre-fill repetitive internal forms such as expense reports or time sheets | Form Submission, Authenticated Web Task |
-| Screenshot Documentation | Capture annotated screenshots of UI flows for docs | Multi-page navigation, screenshots |
-| Competitive Analysis | Extract pricing, feature lists, and positioning details from public sites | Data Extraction, pagination |
-| Internal Tool Automation | Automate admin workflows in internal web apps that have no API | Authenticated Web Task, Form Submission |
-Use this guidance when you are helping a user create a new skill: describe the business workflow first, identify which browser recipes it needs, and keep browser-specific details aligned with the primitives and safety model documented here.
-#### Naming Convention
-- Use \`recipe-{domain}-{action}.md\` for reusable standalone recipe docs.
-- Store reusable examples in the consuming skill's \`references/\` directory.
-- Match the recipe title to the user-facing capability, not the implementation detail.
-#### Quality Checklist
-- [ ] Variables documented, including sensitivity and whether the agent may infer them
-- [ ] Every step includes Action, Verify, On Failure, and Extract
-- [ ] Auth and destructive behavior are declared in metadata
-- [ ] Cleanup is present and closes browser pages unless a kept-open session is intentional
-- [ ] Recovery paths stop after bounded retries instead of looping indefinitely
-- [ ] Recipe was exercised in \`mode: 'ui'\` at least once
-#### Composition Notes
-- Keep steps small enough that one read, screenshot, or selector wait can verify them.
-- Prefer \`readMode: 'snapshot'\` to discover controls, \`readMode: 'text'\` to verify outcomes, and \`readMode: 'markdown'\` to capture extracted content.
-- Use \`navigate({ type: 'waitFor', selector, timeoutMs })\` instead of timing guesses when a page transition has a concrete ready signal.
-- Use \`eval\` only for narrow gaps the built-in actions cannot cover.
-- Follow [references/auth-patterns.md](references/auth-patterns.md) for SSO, OAuth, CAPTCHA, or 2FA flows.
-## Security Model (HARD GATE)
-- AI Kit enforces URL allowlisting before page navigation; respect denials instead of trying alternate bypasses.
-- \`eval\` runs inside AI Kit's browser sandbox. Keep scripts minimal, purpose-built, and limited to the user-approved task.
-- Password field values are redacted by the runtime. Never ask the tool to expose them and never echo them back to the user.
-- Cookie export is gated behind \`action: 'session'\`. Only request cookies when necessary, tell the user they are sensitive, and never store them in code, commits, or logs.
-- Never screenshot or copy pages that visibly reveal passwords, tokens, or other secrets.
-- Never automate destructive or irreversible actions unless the user explicitly requested them.
-- Never bypass 2FA, CAPTCHA, or rate limits. Ask the user to complete the human step, then continue.
-## Integration with Other Skills
-### repo-access
-This skill is the final browser escalation path for \`repo-access\`. Use it when CLI auth recovery fails and the target requires SSO, OAuth, or a login wall. Typical flow:
-1. \`repo-access\` exhausts Steps 1-5.
-2. Load \`browser-use\`.
-3. \`browser({ action: 'open', url: repoUrl, mode: 'ui' })\`
-4. \`browser({ action: 'read', pageId })\` to inspect login state.
-5. Use \`browser({ action: 'act', kind: 'type' | 'click', ... })\` for login fields and buttons.
-6. Use \`browser({ action: 'eval', ... })\` or \`browser({ action: 'session', sessionAction: 'cookies', ... })\` only when the user explicitly needs extracted content or session transfer.
-### present
-When \`present({ format: 'browser' })\` returns a local dashboard URL, open it with AI Kit's browser tool instead of an external browser MCP:
-\`\`\`
-browser({ action: 'open', url: 'http://127.0.0.1:{port}', mode: 'ui' })
-\`\`\`
-This keeps the viewing workflow inside the same owned runtime.
-## Troubleshooting
-| Problem | Response |
-|---------|----------|
-| Browser runtime missing | Run \`aikit browser install\` and retry |
-| No active page or stale \`pageId\` | Re-open with \`action: 'open'\` or inspect \`action: 'session'\` \`list\` output |
-| Element refs stop matching | Re-run \`browser({ action: 'read', pageId })\` after each re-render |
-| Headless blocked by target site | Retry with \`mode: 'ui'\` or \`mode: 'panel'\` |
-| CAPTCHA appears | Ask the user to solve it manually, then continue from \`read\` |
-| Need to inspect cookies | Use \`browser({ action: 'session', sessionAction: 'cookies', pageId })\` and warn the user |
-| Need complex DOM extraction | Use \`browser({ action: 'eval', ... })\` with a small, targeted script |
-| Scroll not loading more content | Add a wait after scroll: eval with setTimeout, then re-read |
-| File upload not working | Ensure selector targets an actual \`<input type="file">\` element |
-| Storage access denied | Some sites block storage access in certain contexts; try eval instead |
-| Cookie set failed | Verify domain/path match the target site; set-cookie requires confirm:true |
-| Markdown output too messy | Use \`readMode: 'text'\` for simpler output, or scope with selector |
-## Decision Flow
-\`\`\`
-Need browser help?
-├─ Public page, no JS or auth needed?       → web_fetch (simpler, faster)
-├─ Need JS rendering or interaction?        → browser open → read
-├─ Need clean markdown of a page?           → browser read (readMode: 'markdown')
-├─ Need structured HTML/DOM?                → browser read (readMode: 'dom')
-├─ Login wall or SSO flow?                  → repo-access → browser-use auth patterns
-├─ Need to fill forms / submit data?        → browser act (type/click/select)
-├─ Need to upload files?                    → browser act (upload)
-├─ Need to scroll / load lazy content?      → browser act (scroll)
-├─ Need screenshot of specific region?      → browser screenshot (clip)
-├─ Need session/cookie management?          → browser session (cookies/storage)
-├─ Need local dashboard viewing?            → present(browser) → browser open
-└─ Complex multi-step automation?           → Compose patterns from this skill
-\`\`\`
-`},{file:`references/auth-patterns.md`,content:`# Browser Auth Patterns
-Patterns for using AI Kit's owned \`browser\` tool to solve authentication challenges that block CLI-based access.
-## Pattern 1: SAML SSO Recovery
-**Problem:** \`web_fetch\` returns SAML redirect HTML instead of content and \`repo-access\` exhausted its Strategy Ladder.
-**Solution:**
-\`\`\`
-1. Open the target URL:
-   const { pageId } = await browser({ action: 'open', url: targetUrl, mode: 'ui' })
-2. Read page state:
-   await browser({ action: 'read', pageId })
-   → If SSO login form: continue to step 3
-   → If content is already visible: skip to step 5
-3. SSO login interaction:
-   - Username/email field → browser({ action: 'act', pageId, kind: 'type', ref: usernameRef, text: userEmail })
-   - Password field → browser({ action: 'act', pageId, kind: 'type', ref: passwordRef, text: userPassword })
-   - Submit button → browser({ action: 'act', pageId, kind: 'click', ref: signInButtonRef })
-   - Ask the user for credentials first. Never guess.
-4. Handle redirect chain:
-   - Re-run \`browser({ action: 'read', pageId })\` after redirects
-   - If 2FA prompt appears, ask the user for the code and enter it with \`kind: 'type'\`
-5. Extract content:
-   - \`browser({ action: 'read', pageId })\` for accessible text
-   - \`browser({ action: 'eval', ... })\` for targeted extraction
-   - \`browser({ action: 'screenshot', pageId })\` for visual capture
-\`\`\`
-## Pattern 2: OAuth Consent Flow
-**Problem:** Service requires OAuth consent that cannot be completed in CLI.
-**Solution:**
-\`\`\`
-1. const { pageId } = await browser({ action: 'open', url: oauthAuthorizeUrl, mode: 'ui' })
-2. await browser({ action: 'read', pageId })
-   → Find the "Authorize" / "Allow" / "Grant access" button
-3. await browser({ action: 'act', pageId, kind: 'click', ref: authorizeButtonRef })
-4. await browser({ action: 'read', pageId })
-   → URL now contains ?code=abc123 or the consent flow is complete
-5. Extract the final URL when needed:
-   await browser({ action: 'eval', pageId, code: 'return page.url()' })
-6. Return the authorization code or completed session to the CLI workflow
-\`\`\`
-## Pattern 3: 2FA / MFA Challenge
-**Problem:** Login requires a 2FA code that only the user can provide.
-**CRITICAL:** Never bypass 2FA and never guess codes.
-**Solution:**
-\`\`\`
-1. Complete username/password entry from Pattern 1
-2. await browser({ action: 'read', pageId })
-   → Confirm the page shows a 2FA input field
-3. Ask the user for the code via elicitation
-4. await browser({ action: 'act', pageId, kind: 'type', ref: totpInputRef, text: userProvidedCode })
-5. await browser({ action: 'act', pageId, kind: 'press', key: 'Enter' })
-6. await browser({ action: 'read', pageId })
-   → Verify the page shows authenticated content, not the login form
-\`\`\`
-## Pattern 4: Cookie or Token Transfer
-**Problem:** CLI tools need authenticated session state from the browser.
-**Solution:**
-\`\`\`
-1. Complete login flow first
-2. Export cookies only if the user explicitly asked for session transfer:
-   await browser({ action: 'session', sessionAction: 'cookies', pageId })
-3. Use the returned cookie data with CLI tools or \`http\` as needed
-4. Tell the user the cookies are sensitive and ephemeral.
-   Never commit, log, or persist them in source files.
-\`\`\`
-## Pattern 5: Content Behind a Login Wall
-**Problem:** \`web_fetch\` returns a login page instead of the target content.
-**Solution:**
-\`\`\`
-1. const { pageId } = await browser({ action: 'open', url: targetUrl, mode: 'ui' })
-2. await browser({ action: 'read', pageId })
-   → Confirm login form is visible
-3. Ask the user for credentials
-4. Fill and submit the form:
-   - browser({ action: 'act', pageId, kind: 'type', ref: usernameRef, text: userEmail })
-   - browser({ action: 'act', pageId, kind: 'type', ref: passwordRef, text: userPassword })
-   - browser({ action: 'act', pageId, kind: 'click', ref: loginButtonRef })
-5. Handle post-login challenges:
-   - 2FA → Pattern 3
-   - Consent screen → Pattern 2
-   - Success → continue
-6. Extract content with \`read\`, \`eval\`, or \`screenshot\`
-\`\`\`
-## Pattern 6: CAPTCHA Handling
-**Problem:** Target site shows a CAPTCHA or anti-bot challenge.
-**Detection signals:**
-- "Checking your browser..."
-- reCAPTCHA, hCaptcha, or Turnstile widgets
-- "Please verify you are human"
-**Solution:**
-\`\`\`
-1. const { pageId } = await browser({ action: 'open', url: targetUrl, mode: 'ui' })
-2. Inspect with:
-   - browser({ action: 'read', pageId })
-   - browser({ action: 'screenshot', pageId })
-3. Ask the user to solve the CAPTCHA in the browser window or panel
-4. After the user confirms, continue with:
-   browser({ action: 'read', pageId })
-5. If the CAPTCHA loops, report that manual access is required
-\`\`\`
-## Security Reminders
-- Always ask the user for credentials and 2FA codes; never guess or reuse hidden values
-- Exported cookies or tokens are secrets; never log, store, or commit them
-- Confirm before submitting forms or performing irreversible actions
-- Close authenticated pages when the task is complete: \`browser({ action: 'session', sessionAction: 'close', pageId })\`
-- Respect allowlisting, sandboxing, and any runtime security denial from the browser tool
+**Important:** The browser auto-closes after idle timeout. For long-running tasks, interact periodically to reset the idle timer.
 `}];export{e as default};
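The Decision Flow at the end of the removed SKILL.md content above can be read as a checklist. Each branch that applies adds one route, and more than one route means the task is a multi-step automation that composes patterns. Below is a minimal JavaScript sketch of that reading. The flag names (`needsJsOrInteraction`, `loginOrSso`, and so on) and the `routeBrowserNeed` helper are hypothetical and are not part of AI Kit's `browser` tool. Treating the tree as "collect every matching branch" rather than "first match wins" is also an interpretation; the skill does not state it.

```javascript
// Hypothetical task flags mapped to the routes named in the Decision Flow.
// Flag names are illustrative only; they are NOT part of AI Kit's `browser` API.
// Rules are listed in the same top-to-bottom order as the tree.
const RULES = [
  ["needsJsOrInteraction", "browser open → read"],
  ["wantsMarkdown", "browser read (readMode: 'markdown')"],
  ["wantsDom", "browser read (readMode: 'dom')"],
  ["loginOrSso", "repo-access → browser-use auth patterns"],
  ["fillForms", "browser act (type/click/select)"],
  ["uploadFiles", "browser act (upload)"],
  ["lazyContent", "browser act (scroll)"],
  ["regionScreenshot", "browser screenshot (clip)"],
  ["sessionOrCookies", "browser session (cookies/storage)"],
  ["localDashboard", "present(browser) → browser open"],
];

/**
 * Returns every route whose flag is explicitly true, in Decision Flow order.
 * With no flags set, the page is treated as public with no JS or auth, so
 * web_fetch is returned. More than one route signals a multi-step task
 * whose patterns should be composed.
 * @param {Record<string, boolean>} need hypothetical task description
 * @returns {string[]}
 */
function routeBrowserNeed(need) {
  const routes = RULES
    .filter(([flag]) => need[flag] === true)
    .map(([, route]) => route);
  return routes.length > 0 ? routes : ["web_fetch"];
}
```

For example, `routeBrowserNeed({ loginOrSso: true, wantsMarkdown: true })` returns the markdown-read route followed by the auth route, which matches their order in the tree. An empty object falls through to `web_fetch`.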