agent-browser-loop 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (20)

package/.claude/skills/agent-browser-loop/REFERENCE.md +374 -0
package/.claude/skills/agent-browser-loop/SKILL.md +211 -0
package/LICENSE +9 -0
package/README.md +168 -0
package/package.json +73 -0
package/src/actions.ts +267 -0
package/src/browser.ts +564 -0
package/src/chrome.ts +45 -0
package/src/cli.ts +795 -0
package/src/commands.ts +455 -0
package/src/config.ts +59 -0
package/src/context.ts +20 -0
package/src/daemon-entry.ts +4 -0
package/src/daemon.ts +626 -0
package/src/id.ts +109 -0
package/src/index.ts +58 -0
package/src/log.ts +42 -0
package/src/server.ts +927 -0
package/src/state.ts +602 -0
package/src/types.ts +229 -0

package/.claude/skills/agent-browser-loop/REFERENCE.md ADDED

@@ -0,0 +1,374 @@
+# Agent Browser Loop - CLI Reference
+Complete CLI reference for `agent-browser`.
+## Commands
+### `open <url>`
+Open a URL in the browser. Starts the daemon automatically if it is not already running.
+```bash
+agent-browser open <url> [options]
+```
+**Options:**
+| Flag | Description |
+|------|-------------|
+| `--headed` | Show browser window (default: headless) |
+| `--session <name>` | Named session for isolation |
+| `--viewport <WxH>` | Viewport size (default: 1280x720) |
+| `--json` | Output as JSON |
+**Examples:**
+```bash
+agent-browser open http://localhost:3000
+agent-browser open http://localhost:3000 --headed
+agent-browser open http://localhost:3000 --session test1 --viewport 1920x1080
+```
+---
+### `act <actions...>`
+Execute one or more actions on the page.
+```bash
+agent-browser act <actions...> [options]
+```
+**Options:**
+| Flag | Description |
+|------|-------------|
+| `--session <name>` | Target session |
+| `--no-state` | Skip state in response |
+| `--json` | Output as JSON |
+**Action Syntax:**
+| Action | Syntax | Example |
+|--------|--------|---------|
+| Navigate | `navigate:<url>` | `navigate:http://localhost:3000/login` |
+| Click | `click:<ref>` | `click:button_0` |
+| Type | `type:<ref>:<text>` | `type:input_0:hello` |
+| Press key | `press:<key>` | `press:Enter` |
+| Scroll | `scroll:<direction>[:<amount>]` | `scroll:down:500` |
+| Select | `select:<ref>:<value>` | `select:select_0:option1` |
+| Check | `check:<ref>` | `check:checkbox_0` |
+| Uncheck | `uncheck:<ref>` | `uncheck:checkbox_0` |
+| Focus | `focus:<ref>` | `focus:input_0` |
+| Blur | `blur:<ref>` | `blur:input_0` |
+| Hover | `hover:<ref>` | `hover:button_0` |
+| Clear | `clear:<ref>` | `clear:input_0` |
+| Upload | `upload:<ref>:<path>` | `upload:input_0:/path/to/file.pdf` |
+| Wait | `wait:<ms>` | `wait:1000` |
+| Go back | `back` | `back` |
+| Go forward | `forward` | `forward` |
+| Reload | `reload` | `reload` |
+**Key names for `press`:**
+- Editing and control: `Enter`, `Tab`, `Escape`, `Backspace`, `Delete`
+- Arrows: `ArrowUp`, `ArrowDown`, `ArrowLeft`, `ArrowRight`
+- Modifiers: `Shift`, `Control`, `Alt`, `Meta`
+- Function: `F1`-`F12`
+- Special: `Home`, `End`, `PageUp`, `PageDown`, `Insert`
+**Examples:**
+```bash
+# Single action
+agent-browser act click:button_0
+# Multiple actions (executed in order)
+agent-browser act click:input_0 type:input_0:hello press:Enter
+# Text with spaces (use quotes)
+agent-browser act type:input_0:"hello world"
+# Form fill and submit
+agent-browser act \
+  type:input_0:user@example.com \
+  type:input_1:password123 \
+  click:button_0
+```
+---
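The "Text with spaces" example above works because the shell, not `agent-browser`, decides where one argument ends and the next begins. Below is a minimal sketch of building actions in a script, assuming each action must reach the CLI as a single argument. The helper name `type_action` is hypothetical and not part of the package.

```bash
# Hypothetical helper (not part of agent-browser): build a `type` action
# from a ref and arbitrary text.
type_action() {
  printf 'type:%s:%s' "$1" "$2"
}

# Keep every action as one shell word so text with spaces is not split.
set -- \
  "click:input_0" \
  "$(type_action input_0 "hello world")" \
  "press:Enter"

# Show the argument boundaries that `agent-browser act "$@"` would receive.
printf '[%s]\n' "$@"
```

Each bracketed line printed is one argument. Quoting `"$@"` when calling `agent-browser act "$@"` keeps the same boundaries.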
+### `wait`
+Wait for a condition on the page.
+```bash
+agent-browser wait [options]
+```
+**Options:**
+| Flag | Description |
+|------|-------------|
+| `--text <string>` | Wait for text to appear |
+| `--selector <css>` | Wait for element to exist |
+| `--url <pattern>` | Wait for URL to match (substring) |
+| `--not-text <string>` | Wait for text to disappear |
+| `--not-selector <css>` | Wait for element to disappear |
+| `--timeout <ms>` | Timeout in milliseconds (default: 30000) |
+| `--session <name>` | Target session |
+| `--json` | Output as JSON |
+**Examples:**
+```bash
+# Wait for text
+agent-browser wait --text "Welcome"
+agent-browser wait --text "Login successful"
+# Wait for element
+agent-browser wait --selector "#success-message"
+agent-browser wait --selector ".dashboard"
+# Wait for URL change
+agent-browser wait --url "/dashboard"
+agent-browser wait --url "success=true"
+# Wait for disappearance (loading states)
+agent-browser wait --not-text "Loading..."
+agent-browser wait --not-selector ".spinner"
+# Custom timeout
+agent-browser wait --text "Done" --timeout 60000
+```
+---
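The built-in conditions cover text, selectors, and URLs. For a check they cannot express, a script can poll on its own. Below is a minimal sketch of the same wait-until-timeout idea in plain shell. `wait_until` and its arguments are hypothetical, and the sketch does not show how `agent-browser wait` is implemented.

```bash
# wait_until <timeout_ms> <interval_ms> <command...>
# Hypothetical helper: re-run the command until it succeeds or the time
# budget is spent. The budget counts sleep time only, so it is approximate.
wait_until() {
  local timeout_ms="$1" interval_ms="$2"
  shift 2
  local waited_ms=0
  while :; do
    # Succeed as soon as the condition command succeeds.
    "$@" && return 0
    # Give up once the budget is used.
    [ "$waited_ms" -ge "$timeout_ms" ] && return 1
    sleep "$(LC_ALL=C awk -v ms="$interval_ms" 'BEGIN { printf "%.3f", ms / 1000 }')"
    waited_ms=$((waited_ms + interval_ms))
  done
}

# Possible usage, relying only on the documented `state` command:
#   wait_until 10000 500 sh -c 'agent-browser state | grep -q "ref=button_0"'
```

The function always tries the command at least once, so even a zero timeout performs a single check.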
+### `state`
+Get the current page state.
+```bash
+agent-browser state [options]
+```
+**Options:**
+| Flag | Description |
+|------|-------------|
+| `--session <name>` | Target session |
+| `--json` | Output as JSON |
+**Output includes:**
+- Current URL and page title
+- Tab count
+- Scroll position (pixels above/below viewport)
+- Interactive elements with refs, types, labels, values
+- Console errors
+- Network errors (4xx/5xx responses)
+**Example output:**
+```
+URL: http://localhost:3000/login
+Title: Login - MyApp
+Tabs: 1
+Scroll: 0px above, 250px below
+Interactive Elements:
+  [0] ref=input_0 textbox "Email" (placeholder="Enter email")
+  [1] ref=input_1 textbox "Password" (type="password")
+  [2] ref=checkbox_0 checkbox "Remember me"
+  [3] ref=button_0 button "Sign In"
+  [4] ref=link_0 link "Forgot password?" (href="/forgot")
+Errors:
+Console:
+  - [error] Failed to load resource: 404
+Network:
+  - 404 GET /api/config
+```
+---
+### `screenshot`
+Capture a screenshot of the current page.
+```bash
+agent-browser screenshot [options]
+```
+**Options:**
+| Flag | Description |
+|------|-------------|
+| `--output, -o <path>` | Save to file (PNG) instead of base64 output |
+| `--full-page` | Capture full scrollable page |
+| `--session <name>` | Target session |
+**Examples:**
+```bash
+# Save to file
+agent-browser screenshot -o screenshot.png
+# Full page screenshot
+agent-browser screenshot --full-page -o full.png
+# Output base64 (for piping or programmatic use)
+agent-browser screenshot
+```
+---
+### `close`
+Close browser and stop daemon.
+```bash
+agent-browser close [options]
+```
+**Options:**
+| Flag | Description |
+|------|-------------|
+| `--session <name>` | Close specific session only |
+---
+### `status`
+Check if daemon is running.
+```bash
+agent-browser status
+```
+---
+### `setup`
+Install Playwright browser and AI agent skill files.
+```bash
+agent-browser setup [options]
+```
+**Options:**
+| Flag | Description |
+|------|-------------|
+| `--skip-skill` | Skip installing skill files |
+| `--target <dir>` | Target directory for skill files (default: cwd) |
+Run this once after installing the package. It installs:
+1. Playwright Chromium browser
+2. Skill files to `.claude/skills/agent-browser-loop/`
+---
+### `server`
+Start HTTP server mode for multi-session scenarios.
+```bash
+agent-browser server [options]
+```
+**Options:**
+| Flag | Description |
+|------|-------------|
+| `--port <number>` | Port number (default: 3790) |
+| `--headed` | Show browser windows |
+| `--viewport <WxH>` | Default viewport size |
+The server exposes a REST API at `http://localhost:3790` by default (the port follows `--port`). The OpenAPI spec is served at `GET /openapi.json`.
+---
+## Element References
+Elements are identified by type-prefixed refs that stay stable within a session as long as the DOM does not change:
+| Prefix | Element Type |
+|--------|--------------|
+| `button_N` | Buttons (`<button>`, `[role="button"]`, etc.) |
+| `input_N` | Text inputs, textareas |
+| `link_N` | Links (`<a>` with href) |
+| `checkbox_N` | Checkboxes |
+| `radio_N` | Radio buttons |
+| `select_N` | Select dropdowns |
+| `option_N` | Select options |
+| `img_N` | Images with click handlers |
+| `generic_N` | Other interactive elements |
+**Note:** Refs may change after DOM updates. Always re-fetch state if actions fail with "element not found".
+---
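Because refs can shift after DOM updates, a script may want to confirm that a ref still appears in fresh state output before acting on it. Below is a minimal sketch, assuming the plain-text layout shown in the `state` example output (each element line contains ` ref=<prefix>_<N>`). `list_refs` and `has_ref` are hypothetical helpers, not part of the package. The `--json` output may be sturdier to parse, but its shape is not documented here.

```bash
# list_refs: read state text on stdin and print one ref per line.
# The leading space before "ref=" keeps attributes such as href= from matching.
list_refs() {
  sed -n 's/.* ref=\([a-z]*_[0-9][0-9]*\).*/\1/p'
}

# has_ref <ref>: read state text on stdin; succeed if the exact ref is listed.
has_ref() {
  list_refs | grep -qxF -- "$1"
}

# Possible usage with the documented commands:
#   agent-browser state | has_ref button_0 && agent-browser act click:button_0
```

If the check fails, re-reading the state and picking the ref again follows the note above.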
+## Global Options
+These options work with most commands:
+| Flag | Description |
+|------|-------------|
+| `--session <name>` | Named session for isolation |
+| `--json` | JSON output format |
+| `--help` | Show help |
+---
+## Exit Codes
+| Code | Meaning |
+|------|---------|
+| 0 | Success |
+| 1 | Error (action failed, timeout, daemon not running, etc.) |
+---
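With only two exit codes, a script can stop at the first failing step, but it has to read the command's output to learn why the step failed. Below is a minimal sketch of that pattern. `run_step` is a hypothetical wrapper, not part of the package.

```bash
# run_step <label> <command...>
# Hypothetical wrapper: run one step, report the result, and pass the
# command's exit code back to the caller.
run_step() {
  local label="$1"
  shift
  if "$@"; then
    printf 'ok: %s\n' "$label"
  else
    local code=$?
    printf 'failed (exit %d): %s\n' "$code" "$label" >&2
    return "$code"
  fi
}

# Possible usage with the documented commands:
#   run_step "open login" agent-browser open http://localhost:3000/login || exit 1
#   run_step "submit"     agent-browser act click:button_0 || exit 1
```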
+## Environment Variables
+| Variable | Description |
+|----------|-------------|
+| `AGENT_BROWSER_SOCKET` | Custom socket path for daemon |
+| `DEBUG` | Enable debug logging |
+---
+## Examples
+### Login Flow
+```bash
+agent-browser open http://localhost:3000/login
+agent-browser act type:input_0:user@test.com type:input_1:secret
+agent-browser act click:button_0
+agent-browser wait --text "Dashboard"
+agent-browser close
+```
+### Form Validation Testing
+```bash
+agent-browser open http://localhost:3000/signup --headed
+agent-browser act click:button_0  # Submit empty form
+agent-browser wait --text "Email is required"
+agent-browser state  # Check error states
+agent-browser close
+```
+### Navigation Testing
+```bash
+agent-browser open http://localhost:3000
+agent-browser act click:link_0
+agent-browser wait --url "/about"
+agent-browser act back
+agent-browser wait --url "/"
+agent-browser close
+```
+### Multiple Sessions
+```bash
+# Session A: Admin user
+agent-browser open http://localhost:3000/login --session admin
+agent-browser act type:input_0:admin@test.com --session admin
+# Session B: Regular user
+agent-browser open http://localhost:3000/login --session user
+agent-browser act type:input_0:user@test.com --session user
+# Close both
+agent-browser close --session admin
+agent-browser close --session user
+```

package/.claude/skills/agent-browser-loop/SKILL.md ADDED

@@ -0,0 +1,211 @@
+---
+name: agent-browser-loop
+description: Use when an agent must drive a live browser session in a back-and-forth loop (state -> explicit actions -> state) for UI validation, reproducible QA, or debugging UI behavior. Prefer this over one-shot CLI usage when an agent needs inspectable, stepwise control.
+---
+# Agent Browser Loop
+Control a browser via CLI. Execute actions, read state, and verify UI changes in a stepwise loop.
+## Quick Start
+```bash
+# Open a URL (starts browser daemon automatically)
+agent-browser open http://localhost:3000
+# Interact and verify
+agent-browser act click:button_0
+agent-browser wait --text "Success"
+agent-browser state
+# Close when done
+agent-browser close
+```
+Use `--headed` to see the browser: `agent-browser open http://localhost:3000 --headed`
+## Core Loop
+1. **Open**: `agent-browser open <url>` - starts daemon, navigates to URL
+2. **Act**: `agent-browser act <actions...>` - interact with elements
+3. **Wait**: `agent-browser wait --text/--selector/--url` - wait for conditions
+4. **State**: `agent-browser state` - read current page state
+5. **Repeat** until task complete
+6. **Close**: `agent-browser close` - stop browser daemon
+## Commands
+| Command | Purpose |
+|---------|---------|
+| `open <url>` | Open URL (starts daemon if needed) |
+| `act <actions...>` | Execute actions |
+| `wait` | Wait for conditions |
+| `state` | Get current page state |
+| `screenshot` | Capture screenshot |
+| `close` | Close browser and daemon |
+| `status` | Check if daemon is running |
+## Action Syntax
+Actions use format `action:target` or `action:target:value`:
+```bash
+# Navigation
+agent-browser act navigate:http://localhost:3000
+# Click elements
+agent-browser act click:button_0
+agent-browser act click:link_2
+# Type into inputs
+agent-browser act type:input_0:hello
+agent-browser act type:input_1:"text with spaces"
+# Keyboard
+agent-browser act press:Enter
+agent-browser act press:Tab
+# Scroll
+agent-browser act scroll:down
+agent-browser act scroll:up:500
+# Multiple actions
+agent-browser act click:input_0 type:input_0:hello press:Enter
+```
+## Wait Conditions
+```bash
+# Wait for text
+agent-browser wait --text "Welcome"
+# Wait for element
+agent-browser wait --selector "#success"
+# Wait for URL
+agent-browser wait --url "/dashboard"
+# Wait for disappearance
+agent-browser wait --not-text "Loading..."
+agent-browser wait --not-selector ".spinner"
+# Custom timeout (default 30s)
+agent-browser wait --text "Done" --timeout 60000
+```
+## Element References
+State includes interactive elements with stable refs:
+```
+Interactive Elements:
+  [0] ref=input_0 textbox "Email" (placeholder="Enter email")
+  [1] ref=input_1 textbox "Password" (type="password")
+  [2] ref=button_0 button "Sign In"
+  [3] ref=link_0 link "Forgot password?" (href="/forgot")
+```
+**Use `ref` values in actions**: `click:button_0`, `type:input_0:hello`
+Refs are type-prefixed (for example `button_`, `input_`, `link_`, `checkbox_`, `select_`) and stay stable within a session unless DOM updates change them.
+## Reading State
+State includes:
+- Current URL and title
+- Scroll position
+- Interactive elements with values
+- Console and network errors
+```
+URL: http://localhost:3000/login
+Title: Login
+Tabs: 1
+Scroll: 0px above, 500px below
+Interactive Elements:
+  [0] ref=input_0 textbox "Email" value="user@test.com"
+  [1] ref=input_1 textbox "Password" (type="password")
+  [2] ref=checkbox_0 checkbox "Remember me" (checked="true")
+  [3] ref=button_0 button "Sign In"
+Errors:
+Console:
+  - [error] Failed to load resource: 404
+Network:
+  - 404 GET /api/user
+```
+## Complete Example: Login Flow
+```bash
+# 1. Open login page
+agent-browser open http://localhost:3000/login
+# 2. Fill form and submit
+agent-browser act \
+  type:input_0:user@example.com \
+  type:input_1:password123 \
+  click:button_0
+# 3. Wait for login to complete
+agent-browser wait --text "Welcome" --timeout 5000
+# 4. Verify state
+agent-browser state
+# 5. Close when done
+agent-browser close
+```
+## Options
+```bash
+# Headed mode (visible browser)
+agent-browser open http://localhost:3000 --headed
+# Named session
+agent-browser open http://localhost:3000 --session my-test
+agent-browser act click:button_0 --session my-test
+# JSON output
+agent-browser state --json
+# Skip state in response
+agent-browser act click:button_0 --no-state
+```
+## Screenshots
+```bash
+agent-browser screenshot -o screenshot.png       # Save to file
+agent-browser screenshot --full-page -o full.png # Full scrollable page
+agent-browser screenshot                         # Output base64
+```
+Use screenshots when the text state isn't enough to diagnose visual issues.
+## Debugging Tips
+1. **Action does nothing?** Check errors in state output
+2. **Element not found?** Run `agent-browser state` to see current refs
+3. **Waiting times out?** Check exact text/selector, try simpler condition
+4. **Need visual check?** Use `--headed` or `agent-browser screenshot`
+5. **Refs changed?** DOM updates can change refs - re-fetch state
+## HTTP Server Mode
+For multi-session scenarios or HTTP-based integrations:
+```bash
+# Start HTTP server
+agent-browser server --headed
+# Server at http://localhost:3790
+# Full API spec at GET /openapi.json
+```
+## Full Reference
+See REFERENCE.md for complete CLI documentation.

package/LICENSE ADDED

@@ -0,0 +1,9 @@
+MIT License
+Copyright (c) 2026 Jason Silberman
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.