@vpxa/aikit 0.1.184 → 0.1.186
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/package.json +1 -1
- package/packages/browser/dist/index.js +1 -1
- package/packages/server/dist/index.js +1 -1
- package/packages/server/dist/{server-TZDCpgEd.js → server-BCnhTdUy.js} +115 -115
- package/scaffold/dist/definitions/bodies.mjs +17 -1
- package/scaffold/dist/definitions/skills/browser-use.mjs +1 -1
|
@@ -268,7 +268,23 @@ Before every tool call, verify:
|
|
|
268
268
|
| \`lesson-learned\` | After completing work |
|
|
269
269
|
| \`docs\` | \`_docs-sync\` epilogue |
|
|
270
270
|
| \`repo-access\` | Auth failures (401/403/404/SSO) — ALWAYS walk ladder before declaring inaccessible |
|
|
271
|
-
| \`browser-use\` | After repo-access ladder exhausted,
|
|
271
|
+
| \`browser-use\` | After repo-access ladder exhausted, OR when agent needs to open/inspect/verify any web page (including \`present\` output) |
|
|
272
|
+
|
|
273
|
+
## Agent Browser Use — HARD RULE
|
|
274
|
+
|
|
275
|
+
When the agent needs to **open, inspect, verify, or interact** with any web page:
|
|
276
|
+
- **ALWAYS** use \`browser({ action: 'open', url, mode: 'ui' })\` + \`browser({ action: 'read' })\`
|
|
277
|
+
- **NEVER** use system browser (\`Start-Process\`, \`open\`, \`xdg-open\`) — provides no feedback to the agent
|
|
278
|
+
- Load the \`browser-use\` skill for advanced patterns (recipes, network capture, auth flows)
|
|
279
|
+
|
|
280
|
+
This applies when:
|
|
281
|
+
- Verifying \`present\` tool rendered output (screenshot or read to confirm rendering)
|
|
282
|
+
- Inspecting a URL before dispatching to subagents
|
|
283
|
+
- Checking web content that \`web_fetch\` cannot handle (JS-rendered, auth-walled)
|
|
284
|
+
|
|
285
|
+
Does NOT apply when:
|
|
286
|
+
- \`present\` tool internally opens system browser for user viewing (that’s the tool’s concern, not the agent’s)
|
|
287
|
+
- \`web_fetch\` / \`http\` can retrieve the content directly (no browser needed)
|
|
272
288
|
|
|
273
289
|
## Repo Access + Browser Escalation — HARD RULE
|
|
274
290
|
|
|
@@ -1,4 +1,4 @@
|
|
|
1
|
-
var e=[{file:`SKILL.md`,content:"---\nname: browser-use\ndescription: \"Browser automation for AI agents using AI Kit's owned `browser` MCP tool. Triggered when: (1) repo-access exhausts its Strategy Ladder and auth requires browser interaction, (2) `web_fetch` returns login page HTML, SAML redirect, or CAPTCHA instead of content, (3) user needs to interact with web applications (fill forms, click buttons, extract data), (4) a site requires JavaScript rendering that `web_fetch` cannot handle, (5) user asks to browse, scrape, test, or automate a website, or (6) another skill needs a standard recipe format for browser-driven workflows. Uses AI Kit's owned Chromium runtime and recipe patterns for domain-specific automation skills — no external MCP server dependency.\"\nmetadata:\n category: cross-cutting\n domain: general\n applicability: on-demand\n inputs: [url, auth-error, browser-task, login-wall]\n outputs: [page-content, screenshots, extracted-data, authenticated-session, network-captures]\n requires: []\n relatedSkills: [repo-access, present, aikit]\nargument-hint: \"URL or browser task description\"\n---\n\n# Browser Automation for AI Agents\n\nUse AI Kit's `browser` MCP tool for authentication barriers, data extraction, form interactions, network capture, and web automation. Single tool, action-based dispatch, owned Chromium runtime.\n\n## Quick Reference\n\n**Tool:** `browser({ action: \"...\", ... })` — single tool, 13 actions, owned Chromium.\n\n**Actions:**\n| Action | Purpose | Key params |\n|--------|---------|------------|\n| `open` | Launch page | `url`, `mode` (ui/headless), `label`, `autoDialog` |\n| `read` | Get page content | `pageId`, `readMode` (snapshot/dom/markdown/text) |\n| `act` | Interact with elements | `pageId`, `kind` (click/type/press/hover/drag/select/scroll/upload) |\n| `batch` | Execute multiple actions in one call | `steps` (array of action objects) |\n| `diff` | Compare accessibility snapshot vs baseline | `pageId` |\n| `navigate` | Go to URL, back/forward, wait | `pageId`, `url` or `type` |\n| `network` | Capture network traffic | `pageId`, `subAction` (enable/get/clear) |\n| `console` | Browser console messages | `pageId`, `consoleSubAction` |\n| `fetch` | HTTP with page cookies | `pageId`, `fetchUrl` |\n| `eval` | Run JS in page context | `pageId`, `code` |\n| `screenshot` | Capture page/element | `pageId`, `fullPage?`, `selector?` |\n| `dialog` | Handle alert/confirm/prompt | `pageId`, `accept` |\n| `session` | List pages, cookies, storage | `sessionAction` |\n\n**Two modes:**\n- **Script Mode** (default) — direct sequential `browser()` calls for one-off tasks\n- **Recipe Mode** — reusable labeled step sequences for domain-specific automation\n\n**Activate when:** `web_fetch` returns login/SAML/CAPTCHA, `http` gets 401/403, anti-bot detection, need JS rendering or screenshots.\n**Skip when:** Public pages (`web_fetch` works), API endpoints (`http` works), static downloads.\n\n**⚠️ `file:///` URLs are blocked** — serve locally with `npx serve` then open `http://localhost`.\n\n## Runtime\n\n- Tool: `browser({ action: ... })`\n- 13 actions: `open`, `read`, `act`, `batch`, `diff`, `navigate`, `network`, `console`, `fetch`, `eval`, `screenshot`, `dialog`, `session`\n- Modes: `headless` (CI), `ui` (desktop), `panel` (VS Code)\n- Install: `aikit browser install`\n- Auto-idle shutdown after timeout\n\n## When to Activate\n\n- `web_fetch` returns login HTML, SAML redirect, or CAPTCHA\n- `http` returns 401/403 and user confirms browser access works\n- `repo-access` Strategy Ladder exhausted — SSO/OAuth blocks CLI\n- Anti-bot detection (Cloudflare, \"verify you are human\")\n- User asks to browse, scrape, automate, test, or interact with a web app\n- Need screenshots, accessibility snapshots, or JS-rendered content\n- Preview or inspect local HTML files (serve locally, then open with browser)\n- Need to capture network traffic or make authenticated API calls using page session\n\n## When NOT to Activate\n\n- Public pages `web_fetch` handles correctly\n- API endpoints reachable via `http` with auth headers\n- Static downloads via `http`\n- Tasks only needing raw HTML/links/outline\n\n## Auto-Activation Detection (for agent protocols)\n\nThese exact patterns in tool responses should trigger browser escalation WITHOUT loading this full skill:\n\n| Tool Response Signal | Action |\n|---------------------|--------|\n| `web_fetch` body contains: `<form.*login`, `saml`, `sso`, `captcha`, `cloudflare`, `verify you are human` | Open URL in browser, wait for user auth |\n| `http` status: 401, 403, 407 | Open URL in browser, wait for user auth |\n| `web_fetch` returns `<meta http-equiv=\"refresh\"` to different domain | SSO redirect detected — open final URL in browser |\n| `web_fetch` body is mostly JavaScript with no readable content | JS-rendered page — open in browser, read with `readMode: 'markdown'` |\n| `repo-access` ladder exhausted | Open repo URL in browser for manual auth |\n\n**Key principle:** The browser tool is ALWAYS available (it's an MCP tool, not a skill). Agents should call `browser({ action: 'open', ... })` directly. This skill provides RECIPES and ADVANCED PATTERNS — but basic browser escalation requires NO skill loading.\n\n## Two Automation Modes\n\n### Script Mode (Default — Imperative)\n\nDirect sequential `browser()` calls. Best for one-off tasks, testing, API capture.\n\n~~~text\n// Open → Read → Act → Read loop\nbrowser({ action: 'open', url: 'https://app.example.com', mode: 'ui' })\nbrowser({ action: 'read', pageId })\nbrowser({ action: 'act', pageId, kind: 'click', ref: '@login-button' })\nbrowser({ action: 'read', pageId }) // verify state changed\n~~~\n\n**Network Intelligence pattern:**\n\n~~~text\nbrowser({ action: 'network', pageId, subAction: 'enable', filter: { resourceTypes: ['xhr', 'fetch'] } })\n// ... navigate/interact to trigger API calls ...\nbrowser({ action: 'network', pageId, subAction: 'get' })\nbrowser({ action: 'network', pageId, subAction: 'export-har' })\n~~~\n\n**Authenticated API calls (using page cookies/session):**\n\n~~~text\nbrowser({ action: 'fetch', pageId, fetchUrl: 'https://app.example.com/api/data', fetchMethod: 'GET' })\n~~~\n\nExecutes `fetch()` in the page, so cookies, session state, and CSRF tokens are reused automatically.\n\n**Console capture:**\n\n~~~text\nbrowser({ action: 'console', pageId, consoleSubAction: 'enable' })\n// ... trigger page actions ...\nbrowser({ action: 'console', pageId, consoleSubAction: 'get', level: 'error' })\n~~~\n\n### Recipe Mode (Declarative)\n\nStructured step-by-step format for reusable workflows and domain skills. Each step declares Action, Verify, On Failure, and Extract fields.\n\nLoad [references/recipes.md](references/recipes.md) for full recipe templates and the recipe format specification.\n\nBrief recipe format:\n\n~~~text\nStep N: <description>\n Action: browser({ ... })\n Verify: <condition to check after action>\n On Failure: <recovery strategy>\n Extract: <data to capture for next steps>\n~~~\n\n## Action Reference\n\n| Action | Purpose | Key Params |\n|--------|---------|------------|\n| `open` | Launch page | `url`, `mode` (ui/headless/panel), `waitUntil`, `label`, `autoDialog` |\n| `read` | Extract content | `pageId`, `readMode` (snapshot/dom/markdown/text), `selector` |\n| `act` | DOM interaction | `pageId`, `kind`, `ref`/`selector`, `text`/`key`/`value` |\n| `batch` | Multi-action execution | `steps` (array of {action, ...params}) |\n| `diff` | Snapshot diff | `pageId` |\n| `navigate` | Page navigation | `pageId`, `url` or `type` (back/forward/reload/waitFor) |\n| `network` | Capture traffic | `pageId`, `subAction` (enable/get/clear/export-har), `filter` |\n| `console` | Capture console | `pageId`, `consoleSubAction` (enable/get/clear), `level` |\n| `fetch` | Page-context HTTP | `pageId`, `fetchUrl`, `fetchMethod`, `fetchHeaders`, `fetchBody` |\n| `eval` | Execute JS | `pageId`, `code` |\n| `screenshot` | Capture image | `pageId`, `selector`, `fullPage`, `clip`, `format` |\n| `dialog` | Pre-register handler for NEXT dialog | `pageId`, `accept`, `promptText` |\n| `session` | Manage sessions | `sessionAction` (list/close/cookies/set-cookie/get-storage/...) |\n\n## Read Modes\n\n| Mode | Output | Use Case |\n|------|--------|----------|\n| `snapshot` | ARIA accessibility tree with refs | Element targeting, form interaction |\n| `dom` | Raw HTML | HTML structure, debugging |\n| `markdown` | Clean readable text | Content extraction, summarization |\n| `text` | Plain text | Simple text extraction |\n\n## Interaction Kinds\n\n| Kind | Required Params | Notes |\n|------|-----------------|-------|\n| `click` | `ref` or `selector` | Left-click element |\n| `type` | `ref`/`selector` + `text` | Type into input/textarea |\n| `press` | `ref`/`selector` + `key` | Send key to element. Requires a target — use `ref` from snapshot or `selector`. |\n| `hover` | `ref`/`selector` | Trigger hover states |\n| `drag` | `fromRef`/`fromSelector` + `toRef`/`toSelector` | Drag and drop |\n| `select` | `ref`/`selector` + `value` | Select dropdown option |\n| `scroll` | optional `ref`/`selector` | Scroll page or element |\n| `upload` | `ref`/`selector` + `value` (path) | File upload |\n\n### Element Targeting Priority\n\n1. **`ref`** (e.g., `@F12`) — From `read(snapshot)` ARIA tree. Most reliable.\n2. **`selector`** (e.g., `input[name='q']`) — Playwright CSS/attribute selector. Precise.\n3. **`element`** (e.g., `'Submit'`) — Text matching via `text=` locator. **Picks first DOM match regardless of visibility.** Fragile for complex widgets (comboboxes, ARIA roles). Last resort.\n\n**Always `read(snapshot)` first** to get refs before interacting.\n\n> **Visibility Warning**: Playwright `act` waits up to 30s for the target to be visible. If a selector or `element` matches a hidden element first, the action times out. The browser tool does NOT expose a `force` or custom `timeout` parameter.\n>\n> **Workarounds:**\n> - Append `:visible` to selectors: `selector: 'button:has-text(\"Submit\"):visible'`\n> - Use specific selectors instead of `element` when labels are ambiguous (e.g., \"Search\" may match 30+ elements)\n> - Use `read(snapshot)` refs (`@F12`) which always target the specific rendered element\n\n## Network Intelligence\n\nThree new actions for API reverse-engineering and authenticated requests:\n\n**`network`** — Passive traffic capture with circular buffer (200 entries default):\n- `enable`: Start capturing with optional filter (resourceTypes, urlPattern, excludeUrls)\n- `get`: Retrieve captured requests + responses with timing\n- `clear`: Reset buffer\n- `export-har`: Export as HAR 1.2 format\n\nHeaders are redacted by default (Authorization, Cookie, etc.). Pass `showSensitive: true` to see full headers.\n\n**`console`** — Browser console message capture (1000 entries default):\n- `enable`: Start capturing all console output\n- `get`: Retrieve messages, optionally filtered by `level`\n- `clear`: Reset buffer\n\n**`fetch`** — Execute HTTP from page context:\n- Uses the page's live cookies, session, CSRF tokens\n- Supports GET/POST/PUT/PATCH/DELETE/HEAD/OPTIONS\n- Body auto-truncated at 256KB\n- Alternative to extracting cookies then calling `http` tool\n\n**Workflow — Reverse-engineer API:**\n\n~~~text\n1. open target page\n2. network enable (filter: xhr, fetch)\n3. interact with the page (click buttons, submit forms)\n4. network get → see API endpoints, methods, headers\n5. fetch → replay API calls using page session\n~~~\n\n## Session Management\n\n| Action | Purpose | Note |\n|--------|---------|------|\n| `cookies` | Export page cookies | `confirm: true` required |\n| `set-cookie` | Inject cookies | `confirm: true` required |\n| `delete-cookie` / `clear-cookies` | Remove cookies | `confirm: true` required |\n| `get-storage` / `set-storage` / `clear-storage` | localStorage/sessionStorage | |\n| `list` | List open pages | |\n| `close` | Close a page | |\n\n## Security Model\n\n**Hard gates — NEVER bypass:**\n- Credentials go via terminal input (NEVER through tool params or chat)\n- CAPTCHA/MFA: pause and ask user\n- Never store tokens in conversation\n- Close pages containing sensitive data when done\n- Verify page URL before entering credentials (phishing prevention)\n- Use `headless` mode for automated non-interactive tasks; `ui` for user-supervised auth\n\n**Cookie safety gate:** All cookie read/write session actions (`cookies`, `set-cookie`, `delete-cookie`, `clear-cookies`) require `confirm: true` as an explicit acknowledgment. Without it, the tool returns an error.\n\n## Local File Preview\n\nThe browser tool blocks `file:///` URLs for security. To preview local HTML files, serve them via a local HTTP server first.\n\n**Pattern:**\n\n~~~text\n// 1. Start local server (pick an unused port)\n// Terminal: npx -y serve <directory> -l <port>\n// Example: npx -y serve ./dist -l 3847\n\n// 2. Open in browser\nbrowser({ action: 'open', url: 'http://localhost:3847/my-file.html', mode: 'ui' })\n\n// 3. Read content or take screenshot\nbrowser({ action: 'read', pageId, readMode: 'markdown' })\nbrowser({ action: 'screenshot', pageId, fullPage: true })\n\n// 4. Clean up — kill the server terminal when done\n~~~\n\n**Use cases:**\n- Preview generated HTML (viewers, reports, docs)\n- Visual regression testing of local builds\n- Inspect single-file HTML applications\n- Screenshot local pages for review\n\n**Important:** Always use `mode: 'ui'` for visual preview so the user can also see and interact with the page.\n\n## Integration\n\n| Skill | Handoff Pattern |\n|-------|------------------|\n| `repo-access` | Strategy Ladder step 6 → browser-use for SSO/OAuth login |\n| `present` | `present({ format: 'browser' })` returns URL → open with browser tool |\n| `aikit` | `web_fetch` fails → browser-use activates |\n\n## Dialog Handling\n\n`dialog()` registers a **one-shot handler** for the NEXT dialog. It must be called **BEFORE** the action that triggers alert, confirm, or prompt.\n\n**Pattern:**\n~~~text\nbrowser({ action: 'dialog', pageId, accept: true })\nbrowser({ action: 'eval', pageId, code: 'confirm(\"Sure?\")' }) // or browser({ action: 'act', ... }) if interaction triggers it\n~~~\n\nFor `prompt` dialogs, pass `promptText` for the response.\n\n**Auto-Dialog (New):** By default, `alert` and `beforeunload` dialogs are auto-accepted when a page is opened (via `autoDialog: true` default on `open`). Only `confirm` and `prompt` dialogs require manual handling with the `dialog` action. To disable auto-handling: `browser({ action: 'open', url, autoDialog: false })`.\n\n## Page Labels\n\nName pages on open for human-readable reference instead of UUIDs.\n\n**Pattern:**\n~~~text\nbrowser({ action: 'open', url: 'https://app.example.com', label: 'app' })\n// Later, use label instead of pageId:\nbrowser({ action: 'read', pageId: 'app', readMode: 'markdown' })\nbrowser({ action: 'act', pageId: 'app', kind: 'click', ref: '@button' })\n~~~\n\nLabels are resolved first (before UUID lookup), so they take priority if there's a collision.\n\n## Diff Snapshots\n\nTrack page changes between interactions using accessibility snapshot diffs.\n\n**Pattern:**\n~~~text\n// First call stores baseline\nbrowser({ action: 'diff', pageId }) // → returns full snapshot (no previous to compare)\n\n// ... interact with page ...\n\n// Second call compares to baseline\nbrowser({ action: 'diff', pageId }) // → returns only added/removed lines\n~~~\n\n**Use cases:**\n- Verify a click/submit changed the expected DOM elements\n- Monitor dynamic content updates\n- Reduce token usage by only seeing WHAT CHANGED instead of re-reading full page\n\n## Batch Execution\n\nExecute multiple browser actions in a single call. Reduces N tool round-trips to 1.\n\n**Pattern:**\n~~~text\nbrowser({ action: 'batch', steps: [\n { action: 'open', url: 'https://example.com', label: 'main' },\n { action: 'read', pageId: '<pageId>', readMode: 'snapshot' },\n { action: 'act', pageId: '<pageId>', kind: 'click', ref: '@submit' }\n] })\n~~~\n\n**Rules:**\n- Each step is a full action object (same params as individual calls)\n- Steps execute sequentially — earlier steps complete before later ones start\n- If any step fails, subsequent steps are skipped and error is returned\n- Results array matches step order\n- `pageId` must be known ahead of time (from a prior `open` call) — batch does NOT auto-chain pageIds between steps\n\n## Troubleshooting\n\n| Issue | Fix |\n|-------|-----|\n| \"Browser not installed\" | Run `aikit browser install` |\n| Element not found | `read` with `snapshot` mode first, use ref from ARIA tree |\n| Timeout on navigation | Add `waitUntil: 'networkidle'` to open/navigate |\n| SSO redirect loop | Check cookies with `session({ sessionAction: 'cookies' })` |\n| Anti-bot block | Try `mode: 'ui'`, add delays between actions |\n| Network capture empty | Ensure `enable` called BEFORE navigating |\n\n## Decision Flow\n\n~~~text\nNeed browser?\n├─ Can web_fetch/http handle it? → NO browser needed\n├─ Login wall / SSO / CAPTCHA? → browser-use (Script mode for one-off, Recipe for reusable)\n├─ Need to capture API traffic? → network enable → interact → network get\n├─ Need authenticated API calls? → fetch action (uses page session)\n├─ JS-rendered content? → open + read(markdown)\n├─ Preview local HTML file? → serve dir (npx serve) → open(http://localhost:<port>/file.html, mode: 'ui')\n├─ Form interaction? → Script mode: open → read(snapshot) → act → verify\n├─ Reusable workflow? → Recipe mode (see references/recipes.md)\n├─ Multiple actions needed? → batch (single round-trip)\n└─ Need to track page changes? → diff (snapshot delta)\n~~~\n"},{file:`references/recipes.md`,content:`# Browser Recipes & Domain Skills
|
|
1
|
+
var e=[{file:`SKILL.md`,content:"---\nname: browser-use\ndescription: \"Browser automation for AI agents using AI Kit's owned `browser` MCP tool. Triggered when: (1) repo-access exhausts its Strategy Ladder and auth requires browser interaction, (2) `web_fetch` returns login page HTML, SAML redirect, or CAPTCHA instead of content, (3) user needs to interact with web applications (fill forms, click buttons, extract data), (4) a site requires JavaScript rendering that `web_fetch` cannot handle, (5) user asks to browse, scrape, test, or automate a website, or (6) another skill needs a standard recipe format for browser-driven workflows. Uses AI Kit's owned Chromium runtime and recipe patterns for domain-specific automation skills — no external MCP server dependency.\"\nmetadata:\n category: cross-cutting\n domain: general\n applicability: on-demand\n inputs: [url, auth-error, browser-task, login-wall]\n outputs: [page-content, screenshots, extracted-data, authenticated-session, network-captures]\n requires: []\n relatedSkills: [repo-access, present, aikit]\nargument-hint: \"URL or browser task description\"\n---\n\n# Browser Automation for AI Agents\n\nUse AI Kit's `browser` MCP tool for authentication barriers, data extraction, form interactions, network capture, and web automation. Single tool, action-based dispatch, owned Chromium runtime.\n\n## Runtime Preference — HARD RULE\n\n**ALWAYS use AI Kit’s controlled Chromium** for all agent browser work:\n- Desktop/interactive: `mode: 'ui'` — visible window the agent can read/screenshot/interact with\n- CI/headless: `mode: 'headless'` — no display, same capabilities\n\n**NEVER use system browser commands** (`Start-Process`, `open`, `xdg-open`, `explorer.exe`) when the agent needs feedback from a page. System browser provides ZERO programmatic readback — the agent cannot verify, read, or interact with what it opened.\n\n**Exception:** The `present` tool’s internal system-browser open (for showing content TO the user) is handled by the tool itself — agents don’t control or override that path. This rule applies only when the **agent** needs to inspect, verify, or interact with web content.\n\n## Quick Reference\n\n**Tool:** `browser({ action: \"...\", ... })` — single tool, 13 actions, owned Chromium.\n\n**Actions:**\n| Action | Purpose | Key params |\n|--------|---------|------------|\n| `open` | Launch page | `url`, `mode` (ui/headless), `label`, `autoDialog` |\n| `read` | Get page content | `pageId`, `readMode` (snapshot/dom/markdown/text) |\n| `act` | Interact with elements | `pageId`, `kind` (click/type/press/hover/drag/select/scroll/upload) |\n| `batch` | Execute multiple actions in one call | `steps` (array of action objects) |\n| `diff` | Compare accessibility snapshot vs baseline | `pageId` |\n| `navigate` | Go to URL, back/forward, wait | `pageId`, `url` or `type` |\n| `network` | Capture network traffic | `pageId`, `subAction` (enable/get/clear) |\n| `console` | Browser console messages | `pageId`, `consoleSubAction` |\n| `fetch` | HTTP with page cookies | `pageId`, `fetchUrl` |\n| `eval` | Run JS in page context | `pageId`, `code` |\n| `screenshot` | Capture page/element | `pageId`, `fullPage?`, `selector?` |\n| `dialog` | Handle alert/confirm/prompt | `pageId`, `accept` |\n| `session` | List pages, cookies, storage | `sessionAction` |\n\n**Two modes:**\n- **Script Mode** (default) — direct sequential `browser()` calls for one-off tasks\n- **Recipe Mode** — reusable labeled step sequences for domain-specific automation\n\n**Activate when:** `web_fetch` returns login/SAML/CAPTCHA, `http` gets 401/403, anti-bot detection, need JS rendering or screenshots.\n**Skip when:** Public pages (`web_fetch` works), API endpoints (`http` works), static downloads.\n\n**⚠️ `file:///` URLs are blocked** — serve locally with `npx serve` then open `http://localhost`.\n\n## Runtime\n\n- Tool: `browser({ action: ... })`\n- 13 actions: `open`, `read`, `act`, `batch`, `diff`, `navigate`, `network`, `console`, `fetch`, `eval`, `screenshot`, `dialog`, `session`\n- Modes: `headless` (CI), `ui` (desktop), `panel` (VS Code)\n- Install: `aikit browser install`\n- Auto-idle shutdown after timeout\n\n## When to Activate\n\n- `web_fetch` returns login HTML, SAML redirect, or CAPTCHA\n- `http` returns 401/403 and user confirms browser access works\n- `repo-access` Strategy Ladder exhausted — SSO/OAuth blocks CLI\n- Anti-bot detection (Cloudflare, \"verify you are human\")\n- User asks to browse, scrape, automate, test, or interact with a web app\n- Need screenshots, accessibility snapshots, or JS-rendered content\n- Preview or inspect local HTML files (serve locally, then open with browser)\n- Need to capture network traffic or make authenticated API calls using page session\n\n## When NOT to Activate\n\n- Public pages `web_fetch` handles correctly\n- API endpoints reachable via `http` with auth headers\n- Static downloads via `http`\n- Tasks only needing raw HTML/links/outline\n\n## Auto-Activation Detection (for agent protocols)\n\nThese exact patterns in tool responses should trigger browser escalation WITHOUT loading this full skill:\n\n| Tool Response Signal | Action |\n|---------------------|--------|\n| `web_fetch` body contains: `<form.*login`, `saml`, `sso`, `captcha`, `cloudflare`, `verify you are human` | Open URL in browser, wait for user auth |\n| `http` status: 401, 403, 407 | Open URL in browser, wait for user auth |\n| `web_fetch` returns `<meta http-equiv=\"refresh\"` to different domain | SSO redirect detected — open final URL in browser |\n| `web_fetch` body is mostly JavaScript with no readable content | JS-rendered page — open in browser, read with `readMode: 'markdown'` |\n| `repo-access` ladder exhausted | Open repo URL in browser for manual auth |\n\n**Key principle:** The browser tool is ALWAYS available (it's an MCP tool, not a skill). Agents should call `browser({ action: 'open', ... })` directly. This skill provides RECIPES and ADVANCED PATTERNS — but basic browser escalation requires NO skill loading.\n\n## Two Automation Modes\n\n### Script Mode (Default — Imperative)\n\nDirect sequential `browser()` calls. Best for one-off tasks, testing, API capture.\n\n~~~text\n// Open → Read → Act → Read loop\nbrowser({ action: 'open', url: 'https://app.example.com', mode: 'ui' })\nbrowser({ action: 'read', pageId })\nbrowser({ action: 'act', pageId, kind: 'click', ref: '@login-button' })\nbrowser({ action: 'read', pageId }) // verify state changed\n~~~\n\n**Network Intelligence pattern:**\n\n~~~text\nbrowser({ action: 'network', pageId, subAction: 'enable', filter: { resourceTypes: ['xhr', 'fetch'] } })\n// ... navigate/interact to trigger API calls ...\nbrowser({ action: 'network', pageId, subAction: 'get' })\nbrowser({ action: 'network', pageId, subAction: 'export-har' })\n~~~\n\n**Authenticated API calls (using page cookies/session):**\n\n~~~text\nbrowser({ action: 'fetch', pageId, fetchUrl: 'https://app.example.com/api/data', fetchMethod: 'GET' })\n~~~\n\nExecutes `fetch()` in the page, so cookies, session state, and CSRF tokens are reused automatically.\n\n**Console capture:**\n\n~~~text\nbrowser({ action: 'console', pageId, consoleSubAction: 'enable' })\n// ... trigger page actions ...\nbrowser({ action: 'console', pageId, consoleSubAction: 'get', level: 'error' })\n~~~\n\n### Recipe Mode (Declarative)\n\nStructured step-by-step format for reusable workflows and domain skills. Each step declares Action, Verify, On Failure, and Extract fields.\n\nLoad [references/recipes.md](references/recipes.md) for full recipe templates and the recipe format specification.\n\nBrief recipe format:\n\n~~~text\nStep N: <description>\n Action: browser({ ... })\n Verify: <condition to check after action>\n On Failure: <recovery strategy>\n Extract: <data to capture for next steps>\n~~~\n\n## Action Reference\n\n| Action | Purpose | Key Params |\n|--------|---------|------------|\n| `open` | Launch page | `url`, `mode` (ui/headless/panel), `waitUntil`, `label`, `autoDialog` |\n| `read` | Extract content | `pageId`, `readMode` (snapshot/dom/markdown/text), `selector` |\n| `act` | DOM interaction | `pageId`, `kind`, `ref`/`selector`, `text`/`key`/`value` |\n| `batch` | Multi-action execution | `steps` (array of {action, ...params}) |\n| `diff` | Snapshot diff | `pageId` |\n| `navigate` | Page navigation | `pageId`, `url` or `type` (back/forward/reload/waitFor) |\n| `network` | Capture traffic | `pageId`, `subAction` (enable/get/clear/export-har), `filter` |\n| `console` | Capture console | `pageId`, `consoleSubAction` (enable/get/clear), `level` |\n| `fetch` | Page-context HTTP | `pageId`, `fetchUrl`, `fetchMethod`, `fetchHeaders`, `fetchBody` |\n| `eval` | Execute JS | `pageId`, `code` |\n| `screenshot` | Capture image | `pageId`, `selector`, `fullPage`, `clip`, `format` |\n| `dialog` | Pre-register handler for NEXT dialog | `pageId`, `accept`, `promptText` |\n| `session` | Manage sessions | `sessionAction` (list/close/cookies/set-cookie/get-storage/...) |\n\n## Read Modes\n\n| Mode | Output | Use Case |\n|------|--------|----------|\n| `snapshot` | ARIA accessibility tree with refs | Element targeting, form interaction |\n| `dom` | Raw HTML | HTML structure, debugging |\n| `markdown` | Clean readable text | Content extraction, summarization |\n| `text` | Plain text | Simple text extraction |\n\n## Interaction Kinds\n\n| Kind | Required Params | Notes |\n|------|-----------------|-------|\n| `click` | `ref` or `selector` | Left-click element |\n| `type` | `ref`/`selector` + `text` | Type into input/textarea |\n| `press` | `ref`/`selector` + `key` | Send key to element. Requires a target — use `ref` from snapshot or `selector`. |\n| `hover` | `ref`/`selector` | Trigger hover states |\n| `drag` | `fromRef`/`fromSelector` + `toRef`/`toSelector` | Drag and drop |\n| `select` | `ref`/`selector` + `value` | Select dropdown option |\n| `scroll` | optional `ref`/`selector` | Scroll page or element |\n| `upload` | `ref`/`selector` + `value` (path) | File upload |\n\n### Element Targeting Priority\n\n1. **`ref`** (e.g., `@F12`) — From `read(snapshot)` ARIA tree. Most reliable.\n2. **`selector`** (e.g., `input[name='q']`) — Playwright CSS/attribute selector. Precise.\n3. **`element`** (e.g., `'Submit'`) — Text matching via `text=` locator. **Picks first DOM match regardless of visibility.** Fragile for complex widgets (comboboxes, ARIA roles). Last resort.\n\n**Always `read(snapshot)` first** to get refs before interacting.\n\n> **Visibility Warning**: Playwright `act` waits up to 30s for the target to be visible. If a selector or `element` matches a hidden element first, the action times out. The browser tool does NOT expose a `force` or custom `timeout` parameter.\n>\n> **Workarounds:**\n> - Append `:visible` to selectors: `selector: 'button:has-text(\"Submit\"):visible'`\n> - Use specific selectors instead of `element` when labels are ambiguous (e.g., \"Search\" may match 30+ elements)\n> - Use `read(snapshot)` refs (`@F12`) which always target the specific rendered element\n\n## Network Intelligence\n\nThree new actions for API reverse-engineering and authenticated requests:\n\n**`network`** — Passive traffic capture with circular buffer (200 entries default):\n- `enable`: Start capturing with optional filter (resourceTypes, urlPattern, excludeUrls)\n- `get`: Retrieve captured requests + responses with timing\n- `clear`: Reset buffer\n- `export-har`: Export as HAR 1.2 format\n\nHeaders are redacted by default (Authorization, Cookie, etc.). Pass `showSensitive: true` to see full headers.\n\n**`console`** — Browser console message capture (1000 entries default):\n- `enable`: Start capturing all console output\n- `get`: Retrieve messages, optionally filtered by `level`\n- `clear`: Reset buffer\n\n**`fetch`** — Execute HTTP from page context:\n- Uses the page's live cookies, session, CSRF tokens\n- Supports GET/POST/PUT/PATCH/DELETE/HEAD/OPTIONS\n- Body auto-truncated at 256KB\n- Alternative to extracting cookies then calling `http` tool\n\n**Workflow — Reverse-engineer API:**\n\n~~~text\n1. open target page\n2. network enable (filter: xhr, fetch)\n3. interact with the page (click buttons, submit forms)\n4. network get → see API endpoints, methods, headers\n5. fetch → replay API calls using page session\n~~~\n\n## Session Management\n\n| Action | Purpose | Note |\n|--------|---------|------|\n| `cookies` | Export page cookies | `confirm: true` required |\n| `set-cookie` | Inject cookies | `confirm: true` required |\n| `delete-cookie` / `clear-cookies` | Remove cookies | `confirm: true` required |\n| `get-storage` / `set-storage` / `clear-storage` | localStorage/sessionStorage | |\n| `list` | List open pages | |\n| `close` | Close a page | |\n\n## Security Model\n\n**Hard gates — NEVER bypass:**\n- Credentials go via terminal input (NEVER through tool params or chat)\n- CAPTCHA/MFA: pause and ask user\n- Never store tokens in conversation\n- Close pages containing sensitive data when done\n- Verify page URL before entering credentials (phishing prevention)\n- Use `headless` mode for automated non-interactive tasks; `ui` for user-supervised auth\n\n**Cookie safety gate:** All cookie read/write session actions (`cookies`, `set-cookie`, `delete-cookie`, `clear-cookies`) require `confirm: true` as an explicit acknowledgment. Without it, the tool returns an error.\n\n## Local File Preview\n\nThe browser tool blocks `file:///` URLs for security. To preview local HTML files, serve them via a local HTTP server first.\n\n**Pattern:**\n\n~~~text\n// 1. Start local server (pick an unused port)\n// Terminal: npx -y serve <directory> -l <port>\n// Example: npx -y serve ./dist -l 3847\n\n// 2. Open in browser\nbrowser({ action: 'open', url: 'http://localhost:3847/my-file.html', mode: 'ui' })\n\n// 3. Read content or take screenshot\nbrowser({ action: 'read', pageId, readMode: 'markdown' })\nbrowser({ action: 'screenshot', pageId, fullPage: true })\n\n// 4. Clean up — kill the server terminal when done\n~~~\n\n**Use cases:**\n- Preview generated HTML (viewers, reports, docs)\n- Visual regression testing of local builds\n- Inspect single-file HTML applications\n- Screenshot local pages for review\n\n**Important:** Always use `mode: 'ui'` for visual preview so the user can also see and interact with the page.\n\n## Integration\n\n| Skill | Handoff Pattern |\n|-------|------------------|\n| `repo-access` | Strategy Ladder step 6 → browser-use for SSO/OAuth login |\n| `present` | `present({ format: 'browser' })` returns URL → open with browser tool |\n| `aikit` | `web_fetch` fails → browser-use activates |\n\n## Dialog Handling\n\n`dialog()` registers a **one-shot handler** for the NEXT dialog. It must be called **BEFORE** the action that triggers alert, confirm, or prompt.\n\n**Pattern:**\n~~~text\nbrowser({ action: 'dialog', pageId, accept: true })\nbrowser({ action: 'eval', pageId, code: 'confirm(\"Sure?\")' }) // or browser({ action: 'act', ... }) if interaction triggers it\n~~~\n\nFor `prompt` dialogs, pass `promptText` for the response.\n\n**Auto-Dialog (New):** By default, `alert` and `beforeunload` dialogs are auto-accepted when a page is opened (via `autoDialog: true` default on `open`). Only `confirm` and `prompt` dialogs require manual handling with the `dialog` action. To disable auto-handling: `browser({ action: 'open', url, autoDialog: false })`.\n\n## Page Labels\n\nName pages on open for human-readable reference instead of UUIDs.\n\n**Pattern:**\n~~~text\nbrowser({ action: 'open', url: 'https://app.example.com', label: 'app' })\n// Later, use label instead of pageId:\nbrowser({ action: 'read', pageId: 'app', readMode: 'markdown' })\nbrowser({ action: 'act', pageId: 'app', kind: 'click', ref: '@button' })\n~~~\n\nLabels are resolved first (before UUID lookup), so they take priority if there's a collision.\n\n## Diff Snapshots\n\nTrack page changes between interactions using accessibility snapshot diffs.\n\n**Pattern:**\n~~~text\n// First call stores baseline\nbrowser({ action: 'diff', pageId }) // → returns full snapshot (no previous to compare)\n\n// ... interact with page ...\n\n// Second call compares to baseline\nbrowser({ action: 'diff', pageId }) // → returns only added/removed lines\n~~~\n\n**Use cases:**\n- Verify a click/submit changed the expected DOM elements\n- Monitor dynamic content updates\n- Reduce token usage by only seeing WHAT CHANGED instead of re-reading full page\n\n## Batch Execution\n\nExecute multiple browser actions in a single call. Reduces N tool round-trips to 1.\n\n**Pattern:**\n~~~text\nbrowser({ action: 'batch', steps: [\n { action: 'open', url: 'https://example.com', label: 'main' },\n { action: 'read', pageId: '<pageId>', readMode: 'snapshot' },\n { action: 'act', pageId: '<pageId>', kind: 'click', ref: '@submit' }\n] })\n~~~\n\n**Rules:**\n- Each step is a full action object (same params as individual calls)\n- Steps execute sequentially — earlier steps complete before later ones start\n- If any step fails, subsequent steps are skipped and error is returned\n- Results array matches step order\n- `pageId` must be known ahead of time (from a prior `open` call) — batch does NOT auto-chain pageIds between steps\n\n## Troubleshooting\n\n| Issue | Fix |\n|-------|-----|\n| \"Browser not installed\" | Run `aikit browser install` |\n| Element not found | `read` with `snapshot` mode first, use ref from ARIA tree |\n| Timeout on navigation | Add `waitUntil: 'networkidle'` to open/navigate |\n| SSO redirect loop | Check cookies with `session({ sessionAction: 'cookies' })` |\n| Anti-bot block | Try `mode: 'ui'`, add delays between actions |\n| Network capture empty | Ensure `enable` called BEFORE navigating |\n\n## Decision Flow\n\n~~~text\nNeed browser?\n├─ Can web_fetch/http handle it? → NO browser needed\n├─ Login wall / SSO / CAPTCHA? → browser-use (Script mode for one-off, Recipe for reusable)\n├─ Need to capture API traffic? → network enable → interact → network get\n├─ Need authenticated API calls? → fetch action (uses page session)\n├─ JS-rendered content? → open + read(markdown)\n├─ Preview local HTML file? → serve dir (npx serve) → open(http://localhost:<port>/file.html, mode: 'ui')\n├─ Form interaction? → Script mode: open → read(snapshot) → act → verify\n├─ Reusable workflow? → Recipe mode (see references/recipes.md)\n├─ Multiple actions needed? → batch (single round-trip)\n└─ Need to track page changes? → diff (snapshot delta)\n~~~\n"},{file:`references/recipes.md`,content:`# Browser Recipes & Domain Skills
|
|
2
2
|
|
|
3
3
|
Reference file for reusable browser automation patterns. Load this when building domain-specific browser workflows.
|
|
4
4
|
|