@vpxa/aikit 0.1.185 → 0.1.187

@@ -1,4 +1,216 @@
- var e=[{file:`SKILL.md`,content:"---\nname: browser-use\ndescription: \"Browser automation for AI agents using AI Kit's owned `browser` MCP tool. Triggered when: (1) repo-access exhausts its Strategy Ladder and auth requires browser interaction, (2) `web_fetch` returns login page HTML, SAML redirect, or CAPTCHA instead of content, (3) user needs to interact with web applications (fill forms, click buttons, extract data), (4) a site requires JavaScript rendering that `web_fetch` cannot handle, (5) user asks to browse, scrape, test, or automate a website, or (6) another skill needs a standard recipe format for browser-driven workflows. Uses AI Kit's owned Chromium runtime and recipe patterns for domain-specific automation skills — no external MCP server dependency.\"\nmetadata:\n category: cross-cutting\n domain: general\n applicability: on-demand\n inputs: [url, auth-error, browser-task, login-wall]\n outputs: [page-content, screenshots, extracted-data, authenticated-session, network-captures]\n requires: []\n relatedSkills: [repo-access, present, aikit]\nargument-hint: \"URL or browser task description\"\n---\n\n# Browser Automation for AI Agents\n\nUse AI Kit's `browser` MCP tool for authentication barriers, data extraction, form interactions, network capture, and web automation. Single tool, action-based dispatch, owned Chromium runtime.\n\n## Runtime Preference — HARD RULE\n\n**ALWAYS use AI Kit’s controlled Chromium** for all agent browser work:\n- Desktop/interactive: `mode: 'ui'` — visible window the agent can read/screenshot/interact with\n- CI/headless: `mode: 'headless'` — no display, same capabilities\n\n**NEVER use system browser commands** (`Start-Process`, `open`, `xdg-open`, `explorer.exe`) when the agent needs feedback from a page. 
System browser provides ZERO programmatic readback — the agent cannot verify, read, or interact with what it opened.\n\n**Exception:** The `present` tool’s internal system-browser open (for showing content TO the user) is handled by the tool itself — agents don’t control or override that path. This rule applies only when the **agent** needs to inspect, verify, or interact with web content.\n\n## Quick Reference\n\n**Tool:** `browser({ action: \"...\", ... })` — single tool, 13 actions, owned Chromium.\n\n**Actions:**\n| Action | Purpose | Key params |\n|--------|---------|------------|\n| `open` | Launch page | `url`, `mode` (ui/headless), `label`, `autoDialog` |\n| `read` | Get page content | `pageId`, `readMode` (snapshot/dom/markdown/text) |\n| `act` | Interact with elements | `pageId`, `kind` (click/type/press/hover/drag/select/scroll/upload) |\n| `batch` | Execute multiple actions in one call | `steps` (array of action objects) |\n| `diff` | Compare accessibility snapshot vs baseline | `pageId` |\n| `navigate` | Go to URL, back/forward, wait | `pageId`, `url` or `type` |\n| `network` | Capture network traffic | `pageId`, `subAction` (enable/get/clear) |\n| `console` | Browser console messages | `pageId`, `consoleSubAction` |\n| `fetch` | HTTP with page cookies | `pageId`, `fetchUrl` |\n| `eval` | Run JS in page context | `pageId`, `code` |\n| `screenshot` | Capture page/element | `pageId`, `fullPage?`, `selector?` |\n| `dialog` | Handle alert/confirm/prompt | `pageId`, `accept` |\n| `session` | List pages, cookies, storage | `sessionAction` |\n\n**Two modes:**\n- **Script Mode** (default) — direct sequential `browser()` calls for one-off tasks\n- **Recipe Mode** — reusable labeled step sequences for domain-specific automation\n\n**Activate when:** `web_fetch` returns login/SAML/CAPTCHA, `http` gets 401/403, anti-bot detection, need JS rendering or screenshots.\n**Skip when:** Public pages (`web_fetch` works), API endpoints (`http` works), static 
downloads.\n\n**⚠️ `file:///` URLs are blocked** — serve locally with `npx serve` then open `http://localhost`.\n\n## Runtime\n\n- Tool: `browser({ action: ... })`\n- 13 actions: `open`, `read`, `act`, `batch`, `diff`, `navigate`, `network`, `console`, `fetch`, `eval`, `screenshot`, `dialog`, `session`\n- Modes: `headless` (CI), `ui` (desktop), `panel` (VS Code)\n- Install: `aikit browser install`\n- Auto-idle shutdown after timeout\n\n## When to Activate\n\n- `web_fetch` returns login HTML, SAML redirect, or CAPTCHA\n- `http` returns 401/403 and user confirms browser access works\n- `repo-access` Strategy Ladder exhausted — SSO/OAuth blocks CLI\n- Anti-bot detection (Cloudflare, \"verify you are human\")\n- User asks to browse, scrape, automate, test, or interact with a web app\n- Need screenshots, accessibility snapshots, or JS-rendered content\n- Preview or inspect local HTML files (serve locally, then open with browser)\n- Need to capture network traffic or make authenticated API calls using page session\n\n## When NOT to Activate\n\n- Public pages `web_fetch` handles correctly\n- API endpoints reachable via `http` with auth headers\n- Static downloads via `http`\n- Tasks only needing raw HTML/links/outline\n\n## Auto-Activation Detection (for agent protocols)\n\nThese exact patterns in tool responses should trigger browser escalation WITHOUT loading this full skill:\n\n| Tool Response Signal | Action |\n|---------------------|--------|\n| `web_fetch` body contains: `<form.*login`, `saml`, `sso`, `captcha`, `cloudflare`, `verify you are human` | Open URL in browser, wait for user auth |\n| `http` status: 401, 403, 407 | Open URL in browser, wait for user auth |\n| `web_fetch` returns `<meta http-equiv=\"refresh\"` to different domain | SSO redirect detected — open final URL in browser |\n| `web_fetch` body is mostly JavaScript with no readable content | JS-rendered page — open in browser, read with `readMode: 'markdown'` |\n| `repo-access` ladder exhausted | 
Open repo URL in browser for manual auth |\n\n**Key principle:** The browser tool is ALWAYS available (it's an MCP tool, not a skill). Agents should call `browser({ action: 'open', ... })` directly. This skill provides RECIPES and ADVANCED PATTERNS — but basic browser escalation requires NO skill loading.\n\n## Two Automation Modes\n\n### Script Mode (Default — Imperative)\n\nDirect sequential `browser()` calls. Best for one-off tasks, testing, API capture.\n\n~~~text\n// Open → Read → Act → Read loop\nbrowser({ action: 'open', url: 'https://app.example.com', mode: 'ui' })\nbrowser({ action: 'read', pageId })\nbrowser({ action: 'act', pageId, kind: 'click', ref: '@login-button' })\nbrowser({ action: 'read', pageId }) // verify state changed\n~~~\n\n**Network Intelligence pattern:**\n\n~~~text\nbrowser({ action: 'network', pageId, subAction: 'enable', filter: { resourceTypes: ['xhr', 'fetch'] } })\n// ... navigate/interact to trigger API calls ...\nbrowser({ action: 'network', pageId, subAction: 'get' })\nbrowser({ action: 'network', pageId, subAction: 'export-har' })\n~~~\n\n**Authenticated API calls (using page cookies/session):**\n\n~~~text\nbrowser({ action: 'fetch', pageId, fetchUrl: 'https://app.example.com/api/data', fetchMethod: 'GET' })\n~~~\n\nExecutes `fetch()` in the page, so cookies, session state, and CSRF tokens are reused automatically.\n\n**Console capture:**\n\n~~~text\nbrowser({ action: 'console', pageId, consoleSubAction: 'enable' })\n// ... trigger page actions ...\nbrowser({ action: 'console', pageId, consoleSubAction: 'get', level: 'error' })\n~~~\n\n### Recipe Mode (Declarative)\n\nStructured step-by-step format for reusable workflows and domain skills. Each step declares Action, Verify, On Failure, and Extract fields.\n\nLoad [references/recipes.md](references/recipes.md) for full recipe templates and the recipe format specification.\n\nBrief recipe format:\n\n~~~text\nStep N: <description>\n Action: browser({ ... 
})\n Verify: <condition to check after action>\n On Failure: <recovery strategy>\n Extract: <data to capture for next steps>\n~~~\n\n## Action Reference\n\n| Action | Purpose | Key Params |\n|--------|---------|------------|\n| `open` | Launch page | `url`, `mode` (ui/headless/panel), `waitUntil`, `label`, `autoDialog` |\n| `read` | Extract content | `pageId`, `readMode` (snapshot/dom/markdown/text), `selector` |\n| `act` | DOM interaction | `pageId`, `kind`, `ref`/`selector`, `text`/`key`/`value` |\n| `batch` | Multi-action execution | `steps` (array of {action, ...params}) |\n| `diff` | Snapshot diff | `pageId` |\n| `navigate` | Page navigation | `pageId`, `url` or `type` (back/forward/reload/waitFor) |\n| `network` | Capture traffic | `pageId`, `subAction` (enable/get/clear/export-har), `filter` |\n| `console` | Capture console | `pageId`, `consoleSubAction` (enable/get/clear), `level` |\n| `fetch` | Page-context HTTP | `pageId`, `fetchUrl`, `fetchMethod`, `fetchHeaders`, `fetchBody` |\n| `eval` | Execute JS | `pageId`, `code` |\n| `screenshot` | Capture image | `pageId`, `selector`, `fullPage`, `clip`, `format` |\n| `dialog` | Pre-register handler for NEXT dialog | `pageId`, `accept`, `promptText` |\n| `session` | Manage sessions | `sessionAction` (list/close/cookies/set-cookie/get-storage/...) |\n\n## Read Modes\n\n| Mode | Output | Use Case |\n|------|--------|----------|\n| `snapshot` | ARIA accessibility tree with refs | Element targeting, form interaction |\n| `dom` | Raw HTML | HTML structure, debugging |\n| `markdown` | Clean readable text | Content extraction, summarization |\n| `text` | Plain text | Simple text extraction |\n\n## Interaction Kinds\n\n| Kind | Required Params | Notes |\n|------|-----------------|-------|\n| `click` | `ref` or `selector` | Left-click element |\n| `type` | `ref`/`selector` + `text` | Type into input/textarea |\n| `press` | `ref`/`selector` + `key` | Send key to element. 
Requires a target — use `ref` from snapshot or `selector`. |\n| `hover` | `ref`/`selector` | Trigger hover states |\n| `drag` | `fromRef`/`fromSelector` + `toRef`/`toSelector` | Drag and drop |\n| `select` | `ref`/`selector` + `value` | Select dropdown option |\n| `scroll` | optional `ref`/`selector` | Scroll page or element |\n| `upload` | `ref`/`selector` + `value` (path) | File upload |\n\n### Element Targeting Priority\n\n1. **`ref`** (e.g., `@F12`) — From `read(snapshot)` ARIA tree. Most reliable.\n2. **`selector`** (e.g., `input[name='q']`) — Playwright CSS/attribute selector. Precise.\n3. **`element`** (e.g., `'Submit'`) — Text matching via `text=` locator. **Picks first DOM match regardless of visibility.** Fragile for complex widgets (comboboxes, ARIA roles). Last resort.\n\n**Always `read(snapshot)` first** to get refs before interacting.\n\n> **Visibility Warning**: Playwright `act` waits up to 30s for the target to be visible. If a selector or `element` matches a hidden element first, the action times out. The browser tool does NOT expose a `force` or custom `timeout` parameter.\n>\n> **Workarounds:**\n> - Append `:visible` to selectors: `selector: 'button:has-text(\"Submit\"):visible'`\n> - Use specific selectors instead of `element` when labels are ambiguous (e.g., \"Search\" may match 30+ elements)\n> - Use `read(snapshot)` refs (`@F12`) which always target the specific rendered element\n\n## Network Intelligence\n\nThree new actions for API reverse-engineering and authenticated requests:\n\n**`network`** — Passive traffic capture with circular buffer (200 entries default):\n- `enable`: Start capturing with optional filter (resourceTypes, urlPattern, excludeUrls)\n- `get`: Retrieve captured requests + responses with timing\n- `clear`: Reset buffer\n- `export-har`: Export as HAR 1.2 format\n\nHeaders are redacted by default (Authorization, Cookie, etc.). 
Pass `showSensitive: true` to see full headers.\n\n**`console`** — Browser console message capture (1000 entries default):\n- `enable`: Start capturing all console output\n- `get`: Retrieve messages, optionally filtered by `level`\n- `clear`: Reset buffer\n\n**`fetch`** — Execute HTTP from page context:\n- Uses the page's live cookies, session, CSRF tokens\n- Supports GET/POST/PUT/PATCH/DELETE/HEAD/OPTIONS\n- Body auto-truncated at 256KB\n- Alternative to extracting cookies then calling `http` tool\n\n**Workflow — Reverse-engineer API:**\n\n~~~text\n1. open target page\n2. network enable (filter: xhr, fetch)\n3. interact with the page (click buttons, submit forms)\n4. network get → see API endpoints, methods, headers\n5. fetch → replay API calls using page session\n~~~\n\n## Session Management\n\n| Action | Purpose | Note |\n|--------|---------|------|\n| `cookies` | Export page cookies | `confirm: true` required |\n| `set-cookie` | Inject cookies | `confirm: true` required |\n| `delete-cookie` / `clear-cookies` | Remove cookies | `confirm: true` required |\n| `get-storage` / `set-storage` / `clear-storage` | localStorage/sessionStorage | |\n| `list` | List open pages | |\n| `close` | Close a page | |\n\n## Security Model\n\n**Hard gates — NEVER bypass:**\n- Credentials go via terminal input (NEVER through tool params or chat)\n- CAPTCHA/MFA: pause and ask user\n- Never store tokens in conversation\n- Close pages containing sensitive data when done\n- Verify page URL before entering credentials (phishing prevention)\n- Use `headless` mode for automated non-interactive tasks; `ui` for user-supervised auth\n\n**Cookie safety gate:** All cookie read/write session actions (`cookies`, `set-cookie`, `delete-cookie`, `clear-cookies`) require `confirm: true` as an explicit acknowledgment. Without it, the tool returns an error.\n\n## Local File Preview\n\nThe browser tool blocks `file:///` URLs for security. 
To preview local HTML files, serve them via a local HTTP server first.\n\n**Pattern:**\n\n~~~text\n// 1. Start local server (pick an unused port)\n// Terminal: npx -y serve <directory> -l <port>\n// Example: npx -y serve ./dist -l 3847\n\n// 2. Open in browser\nbrowser({ action: 'open', url: 'http://localhost:3847/my-file.html', mode: 'ui' })\n\n// 3. Read content or take screenshot\nbrowser({ action: 'read', pageId, readMode: 'markdown' })\nbrowser({ action: 'screenshot', pageId, fullPage: true })\n\n// 4. Clean up — kill the server terminal when done\n~~~\n\n**Use cases:**\n- Preview generated HTML (viewers, reports, docs)\n- Visual regression testing of local builds\n- Inspect single-file HTML applications\n- Screenshot local pages for review\n\n**Important:** Always use `mode: 'ui'` for visual preview so the user can also see and interact with the page.\n\n## Integration\n\n| Skill | Handoff Pattern |\n|-------|------------------|\n| `repo-access` | Strategy Ladder step 6 → browser-use for SSO/OAuth login |\n| `present` | `present({ format: 'browser' })` returns URL → open with browser tool |\n| `aikit` | `web_fetch` fails → browser-use activates |\n\n## Dialog Handling\n\n`dialog()` registers a **one-shot handler** for the NEXT dialog. It must be called **BEFORE** the action that triggers alert, confirm, or prompt.\n\n**Pattern:**\n~~~text\nbrowser({ action: 'dialog', pageId, accept: true })\nbrowser({ action: 'eval', pageId, code: 'confirm(\"Sure?\")' }) // or browser({ action: 'act', ... }) if interaction triggers it\n~~~\n\nFor `prompt` dialogs, pass `promptText` for the response.\n\n**Auto-Dialog (New):** By default, `alert` and `beforeunload` dialogs are auto-accepted when a page is opened (via `autoDialog: true` default on `open`). Only `confirm` and `prompt` dialogs require manual handling with the `dialog` action. 
To disable auto-handling: `browser({ action: 'open', url, autoDialog: false })`.\n\n## Page Labels\n\nName pages on open for human-readable reference instead of UUIDs.\n\n**Pattern:**\n~~~text\nbrowser({ action: 'open', url: 'https://app.example.com', label: 'app' })\n// Later, use label instead of pageId:\nbrowser({ action: 'read', pageId: 'app', readMode: 'markdown' })\nbrowser({ action: 'act', pageId: 'app', kind: 'click', ref: '@button' })\n~~~\n\nLabels are resolved first (before UUID lookup), so they take priority if there's a collision.\n\n## Diff Snapshots\n\nTrack page changes between interactions using accessibility snapshot diffs.\n\n**Pattern:**\n~~~text\n// First call stores baseline\nbrowser({ action: 'diff', pageId }) // → returns full snapshot (no previous to compare)\n\n// ... interact with page ...\n\n// Second call compares to baseline\nbrowser({ action: 'diff', pageId }) // → returns only added/removed lines\n~~~\n\n**Use cases:**\n- Verify a click/submit changed the expected DOM elements\n- Monitor dynamic content updates\n- Reduce token usage by only seeing WHAT CHANGED instead of re-reading full page\n\n## Batch Execution\n\nExecute multiple browser actions in a single call. 
Reduces N tool round-trips to 1.\n\n**Pattern:**\n~~~text\nbrowser({ action: 'batch', steps: [\n { action: 'open', url: 'https://example.com', label: 'main' },\n { action: 'read', pageId: '<pageId>', readMode: 'snapshot' },\n { action: 'act', pageId: '<pageId>', kind: 'click', ref: '@submit' }\n] })\n~~~\n\n**Rules:**\n- Each step is a full action object (same params as individual calls)\n- Steps execute sequentially — earlier steps complete before later ones start\n- If any step fails, subsequent steps are skipped and error is returned\n- Results array matches step order\n- `pageId` must be known ahead of time (from a prior `open` call) — batch does NOT auto-chain pageIds between steps\n\n## Troubleshooting\n\n| Issue | Fix |\n|-------|-----|\n| \"Browser not installed\" | Run `aikit browser install` |\n| Element not found | `read` with `snapshot` mode first, use ref from ARIA tree |\n| Timeout on navigation | Add `waitUntil: 'networkidle'` to open/navigate |\n| SSO redirect loop | Check cookies with `session({ sessionAction: 'cookies' })` |\n| Anti-bot block | Try `mode: 'ui'`, add delays between actions |\n| Network capture empty | Ensure `enable` called BEFORE navigating |\n\n## Decision Flow\n\n~~~text\nNeed browser?\n├─ Can web_fetch/http handle it? → NO browser needed\n├─ Login wall / SSO / CAPTCHA? → browser-use (Script mode for one-off, Recipe for reusable)\n├─ Need to capture API traffic? → network enable → interact → network get\n├─ Need authenticated API calls? → fetch action (uses page session)\n├─ JS-rendered content? → open + read(markdown)\n├─ Preview local HTML file? → serve dir (npx serve) → open(http://localhost:<port>/file.html, mode: 'ui')\n├─ Form interaction? → Script mode: open → read(snapshot) → act → verify\n├─ Reusable workflow? → Recipe mode (see references/recipes.md)\n├─ Multiple actions needed? → batch (single round-trip)\n└─ Need to track page changes? 
→ diff (snapshot delta)\n~~~\n"},{file:`references/recipes.md`,content:`# Browser Recipes & Domain Skills
+ var e=[{file:`SKILL.md`,content:`---
+ name: browser-use
+ description: "Browser automation for AI agents using AI Kit's owned \`browser\` MCP tool. Triggered when: (1) repo-access exhausts its Strategy Ladder and auth requires browser interaction, (2) \`web_fetch\` returns login page HTML, SAML redirect, or CAPTCHA instead of content, (3) user needs to interact with web applications (fill forms, click buttons, extract data), (4) a site requires JavaScript rendering that \`web_fetch\` cannot handle, (5) user asks to browse, scrape, test, or automate a website, or (6) another skill needs a standard recipe format for browser-driven workflows. Uses AI Kit's owned Chromium runtime and recipe patterns for domain-specific automation skills — no external MCP server dependency."
+ metadata:
+   category: cross-cutting
+   domain: general
+   applicability: on-demand
+   inputs: [url, auth-error, browser-task, login-wall]
+   outputs: [page-content, screenshots, extracted-data, authenticated-session, network-captures]
+   requires: []
+   relatedSkills: [repo-access, present, aikit]
+ argument-hint: "URL or browser task description"
+ ---
+
+ # Browser Automation for AI Agents
+
+ Use AI Kit's \`browser\` MCP tool for authentication barriers, data extraction, form interactions, network capture, and web automation. Single tool, action-based dispatch, owned Chromium runtime.
+
+ ## Runtime Preference
+
+ - Always use AI Kit's controlled Chromium when the agent needs to inspect, verify, or interact with a page.
+ - Use \`mode: 'headless'\` by default. Switch to \`mode: 'ui'\` only for user-visible auth or visual debugging.
+ - Never use system browser commands for agent work. They provide no programmatic feedback.
+
+ ## Quick Reference
+
+ | I need to... | Action | Key params |
+ |---|---|---|
+ | Open a page | \`open\` | url, mode (ui/headless) |
+ | Read page content | \`read\` | readMode: snapshot/dom/markdown/text |
+ | Click/type/interact | \`act\` | kind: click/type/press/hover/select |
+ | Wait for something | \`navigate\` | type: waitFor, selector |
+ | Check network calls | \`network\` | subAction: enable → get |
+ | Get cookies/storage | \`session\` | sessionAction: cookies/get-storage |
+ | Take screenshot | \`screenshot\` | fullPage, selector |
+ | Compare changes | \`diff\` | (compares to previous snapshot) |
+
+ For full parameter details: \`describe_tool('browser')\`
+
+ ## Principles
+
+ - **Prefer \`read\` over \`screenshot\`** — snapshots are structured (ARIA tree), searchable, and token-efficient. Screenshots are opaque blobs. Use screenshots only for visual verification.
+ - **Prefer \`headless\` over \`ui\`** — faster, no window management. Use \`ui\` only when: user needs to see the browser, auth requires manual interaction, or debugging visual issues.
+ - **Always \`read\` after \`act\`** — actions don't return page state. You need to verify the result.
+ - **Use \`diff\` instead of re-reading** — after an action, \`diff\` shows only what changed. Much more efficient than full \`read\`.
+ - **Network capture BEFORE navigation** — enable network capture THEN navigate. Captures start from enable time, not retroactively.
+ - **One page = one task** — don't reuse pages across unrelated tasks. Fresh pages avoid state contamination.
+ - **Read snapshot before targeting** — ARIA refs are more stable than guessing CSS selectors or text matches.
+ - **Use page-context fetch for authenticated APIs** — if the browser session already has cookies and CSRF state, \`fetch\` is usually simpler than exporting cookies.
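Editor's sketch (not part of the package): the "use `diff` instead of re-reading" principle above can be pictured as a line-level comparison of two accessibility snapshots. The following is a minimal illustration in plain JavaScript, assuming snapshots are newline-separated text. `snapshotDelta` and the sample snapshot lines are hypothetical, and the real tool's algorithm and output format may differ.

```javascript
// Hypothetical sketch: line-level delta between two snapshot strings.
// Not AI Kit's actual diff implementation; names and sample data are assumptions.
// Set-based comparison ignores duplicate lines and line order.
function snapshotDelta(baseline, current) {
  const before = new Set(baseline.split('\n'));
  const after = new Set(current.split('\n'));
  return {
    added: [...after].filter((line) => !before.has(line)),
    removed: [...before].filter((line) => !after.has(line)),
  };
}

// Made-up snapshot lines standing in for an ARIA tree dump.
const baseline = ['button "Submit"', 'textbox "Email"'].join('\n');
const current = ['button "Submit"', 'alert "Saved"'].join('\n');

const delta = snapshotDelta(baseline, current);
console.log(delta);
```

Only the changed lines come back, which is why a delta is usually cheaper to inspect than a full re-read.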
+
+ ## NEVER
+
+ - **NEVER use \`file:///\` URLs** — the browser blocks local file access for security. Serve locally instead: \`npx -y serve <dir>\` then open \`http://localhost:<port>\`.
+ - **NEVER interact without reading first** — you need the ARIA tree to know what elements exist. Blind clicks fail.
+ - **NEVER send passwords via \`act({ kind: 'type' })\`** — tell the user to type credentials manually. Agent should never handle secrets.
+ - **NEVER use \`screenshot\` as primary information source** — screenshots waste tokens and can't be searched. Use \`read\` with appropriate readMode.
+ - **NEVER open system browser (\`Start-Process\`, \`open\`, \`xdg-open\`)** — provides zero feedback to the agent. Always use the owned browser.
+ - **NEVER leave pages open** — close pages when done: \`session({ sessionAction: 'close', pageId })\`. Leaked pages consume resources.
+ - **NEVER scrape without rate limiting** — rapid page loads trigger bot detection. Add reasonable delays between navigations.
+ - **NEVER enable network capture after the event you care about** — you can't recover missed requests.
+
+ ## Activation Signals
+
+ - Activate when \`web_fetch\` returns login HTML, SAML redirects, CAPTCHA pages, or JS-heavy shells with no readable content.
+ - Activate when \`http\` returns 401/403/407 and browser auth is a plausible recovery path.
+ - Activate when the task requires interaction, screenshots, network capture, authenticated browser-session fetches, or previewing a locally served HTML file.
+ - Skip it when \`web_fetch\` or \`http\` already gives the answer.
+ - The \`browser\` tool is always callable directly. This skill exists for recipes and operating discipline, not for basic availability.
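Editor's sketch (not part of the package): the activation signals above amount to a small decision rule over a response's status and body. The helper below is a hypothetical illustration of that rule. `shouldEscalate`, the regular expressions, and the sample inputs are assumptions chosen for this example, not patterns the tool itself is confirmed to use.

```javascript
// Hypothetical helper: decide whether a web_fetch/http result suggests
// escalating to the browser tool. Patterns are illustrative assumptions
// and will produce false positives on pages that merely mention these words.
const AUTH_STATUSES = new Set([401, 403, 407]);
const BODY_SIGNALS = [/<form[^>]*login/i, /saml/i, /captcha/i];

function shouldEscalate({ status, body = '' }) {
  // Auth-related HTTP statuses are treated as a signal on their own.
  if (AUTH_STATUSES.has(status)) return true;
  // Otherwise look for login, SAML, or CAPTCHA markers in the body.
  return BODY_SIGNALS.some((pattern) => pattern.test(body));
}

console.log(shouldEscalate({ status: 403 })); // true
console.log(shouldEscalate({ status: 200, body: '<form id="login">' })); // true
console.log(shouldEscalate({ status: 200, body: '<h1>Docs</h1>' })); // false
```

A real check would still need the judgment the bullets describe, such as whether browser auth is a plausible recovery path for a 401/403.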
+
+ ## Workflows
+
+ **Two modes:**
+ - **Script Mode** — direct sequential \`browser()\` calls for one-off tasks, debugging, and authenticated API capture.
+ - **Recipe Mode** — reusable labeled step sequences for domain-specific automation.
+
+ ### Script Mode (Default — Imperative)
+
+ Direct sequential \`browser()\` calls. Best for one-off tasks, testing, API capture.
+
+ ~~~text
+ // Open → Read → Act → Read loop
+ browser({ action: 'open', url: 'https://app.example.com', mode: 'ui' })
+ browser({ action: 'read', pageId })
+ browser({ action: 'act', pageId, kind: 'click', ref: '@login-button' })
+ browser({ action: 'read', pageId }) // verify state changed
+ ~~~
+
+ **Network Intelligence pattern:**
+
+ ~~~text
+ browser({ action: 'network', pageId, subAction: 'enable', filter: { resourceTypes: ['xhr', 'fetch'] } })
+ // ... navigate/interact to trigger API calls ...
+ browser({ action: 'network', pageId, subAction: 'get' })
+ browser({ action: 'network', pageId, subAction: 'export-har' })
+ ~~~
+
+ **Authenticated API calls (using page cookies/session):**
+
+ ~~~text
+ browser({ action: 'fetch', pageId, fetchUrl: 'https://app.example.com/api/data', fetchMethod: 'GET' })
+ ~~~
+
+ Executes \`fetch()\` in the page, so cookies, session state, and CSRF tokens are reused automatically.
+
+ **Console capture:**
+
+ ~~~text
+ browser({ action: 'console', pageId, consoleSubAction: 'enable' })
+ // ... trigger page actions ...
+ browser({ action: 'console', pageId, consoleSubAction: 'get', level: 'error' })
+ ~~~
+
+ ### Recipe Mode (Declarative)
+
+ Structured step-by-step format for reusable workflows and domain skills. Each step declares Action, Verify, On Failure, and Extract fields.
+
+ Load [references/recipes.md](references/recipes.md) for full recipe templates and the recipe format specification.
+
+ Brief recipe format:
+
+ ~~~text
+ Step N: <description>
+   Action: browser({ ... })
+   Verify: <condition to check after action>
+   On Failure: <recovery strategy>
+   Extract: <data to capture for next steps>
+ ~~~
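Editor's sketch (not part of the package): one way to read the recipe format above is as plain data plus a small loop. The following is a hypothetical illustration, not the format specification in `references/recipes.md`. `runRecipe`, the step field names, the `fakeBrowser` stub, and the assumption that `open` returns an object with a `pageId` are all made up for this example. The stub is synchronous for brevity, whereas real tool calls would be asynchronous.

```javascript
// Hypothetical recipe runner. `callBrowser` stands in for however the agent
// invokes the browser tool; it is injected so the sketch runs without a
// real browser.
function runRecipe(steps, callBrowser) {
  const extracted = {};
  for (const step of steps) {
    const result = callBrowser(step.action);
    if (!step.verify(result)) {
      // Stop at the first failed check and surface the recovery hint.
      return { ok: false, failedStep: step.description, recovery: step.onFailure, extracted };
    }
    Object.assign(extracted, step.extract(result));
  }
  return { ok: true, extracted };
}

const steps = [
  {
    description: 'Open the app',
    action: { action: 'open', url: 'https://app.example.com', mode: 'headless' },
    verify: (result) => typeof result.pageId === 'string',
    onFailure: 'Retry with mode: ui and ask the user to sign in',
    extract: (result) => ({ pageId: result.pageId }),
  },
];

// Stub that pretends every call succeeds and returns a page id (assumed shape).
const fakeBrowser = () => ({ pageId: 'page-1' });

const outcome = runRecipe(steps, fakeBrowser);
console.log(outcome);
```

Keeping Verify and On Failure next to each Action is what makes a recipe reusable: a failed check returns the recovery hint instead of continuing blindly.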
+
+ ### Element Targeting Priority
+
+ 1. **\`ref\`** (e.g., \`@F12\`) — From \`read(snapshot)\` ARIA tree. Most reliable.
+ 2. **\`selector\`** (e.g., \`input[name='q']\`) — Playwright CSS/attribute selector. Precise.
+ 3. **\`element\`** (e.g., \`'Submit'\`) — Text matching via \`text=\` locator. **Picks first DOM match regardless of visibility.** Fragile for complex widgets (comboboxes, ARIA roles). Last resort.
+
+ **Always \`read(snapshot)\` first** to get refs before interacting.
+
+ If a selector times out, assume visibility ambiguity first: narrow the selector, add \`:visible\`, or switch to a snapshot ref.
+
+ ## Network Intelligence
+
+ Use browser-native capture when you need to learn how a web app really talks to its backend:
+
+ - **\`network\`** for passive capture of XHR/fetch traffic, timing, and HAR export.
+ - **\`console\`** for browser-side errors after UI actions.
+ - **\`fetch\`** for replaying authenticated requests from page context without manually exporting cookies.
+
+ Headers are redacted by default. Use sensitive output only when the task explicitly requires it and never echo secrets back to the user.
+
+ **Workflow — Reverse-engineer API:**
+
+ ~~~text
+ 1. open target page
+ 2. network enable (filter: xhr, fetch)
+ 3. interact with the page (click buttons, submit forms)
+ 4. network get → see API endpoints, methods, headers
+ 5. fetch → replay API calls using page session
+ ~~~
+
+ ## Session Management
+
+ - Cookie read/write actions require explicit confirmation.
+ - Prefer \`fetch\` over cookie export when the goal is an authenticated API call.
+ - Use labels for long flows, but close the page when the task ends.
+
+ ## Security Model
+
+ **Hard gates — NEVER bypass:**
+ - Credentials go via terminal input (NEVER through tool params or chat)
+ - CAPTCHA/MFA: pause and ask user
+ - Never store tokens in conversation
+ - Close pages containing sensitive data when done
+ - Verify page URL before entering credentials (phishing prevention)
+ - Use \`headless\` mode for automated non-interactive tasks; \`ui\` for user-supervised auth
+
+ **Cookie safety gate:** All cookie read/write session actions (\`cookies\`, \`set-cookie\`, \`delete-cookie\`, \`clear-cookies\`) require \`confirm: true\` as an explicit acknowledgment. Without it, the tool returns an error.
+
+ ## Local File Preview
+
+ The browser tool blocks \`file:///\` URLs for security. To preview local HTML files, serve them via a local HTTP server first.
+
+ **Pattern:**
+
+ ~~~text
+ // 1. Start local server (pick an unused port)
+ // Terminal: npx -y serve <directory> -l <port>
+ // Example: npx -y serve ./dist -l 3847
+
+ // 2. Open in browser
+ browser({ action: 'open', url: 'http://localhost:3847/my-file.html', mode: 'ui' })
+
+ // 3. Read content or take screenshot
+ browser({ action: 'read', pageId, readMode: 'markdown' })
+ browser({ action: 'screenshot', pageId, fullPage: true })
+
+ // 4. Clean up — kill the server terminal when done
+ ~~~
+
+ **Use cases:**
+ - Preview generated HTML (viewers, reports, docs)
+ - Visual regression testing of local builds
+ - Inspect single-file HTML applications
+ - Screenshot local pages for review
+
+ **Important:** Always use \`mode: 'ui'\` for visual preview so the user can also see and interact with the page.
+
+ ## High-Value Patterns
+
+ - **Dialogs:** register \`dialog\` before the action that triggers it. Prompt dialogs also need \`promptText\`.
+ - **Labels:** assign a \`label\` on open for long-running flows so later calls stay readable.
+ - **Batch:** use \`batch\` to reduce round-trips, but only when the needed \`pageId\` is already known.
+ - **Diff:** first call establishes baseline, second call shows the delta.
+ - **Preview local HTML:** serve the directory first, then open the localhost URL in the owned browser.
+ `},{file:`references/recipes.md`,content:`# Browser Recipes & Domain Skills
 
  Reference file for reusable browser automation patterns. Load this when building domain-specific browser workflows.