agent-browser-loop 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (20)

package/.claude/skills/agent-browser-loop/REFERENCE.md +374 -0
package/.claude/skills/agent-browser-loop/SKILL.md +211 -0
package/LICENSE +9 -0
package/README.md +168 -0
package/package.json +73 -0
package/src/actions.ts +267 -0
package/src/browser.ts +564 -0
package/src/chrome.ts +45 -0
package/src/cli.ts +795 -0
package/src/commands.ts +455 -0
package/src/config.ts +59 -0
package/src/context.ts +20 -0
package/src/daemon-entry.ts +4 -0
package/src/daemon.ts +626 -0
package/src/id.ts +109 -0
package/src/index.ts +58 -0
package/src/log.ts +42 -0
package/src/server.ts +927 -0
package/src/state.ts +602 -0
package/src/types.ts +229 -0

package/README.md ADDED

@@ -0,0 +1,168 @@
+<p align="center">
+  <img src="readme-header.png" alt="Agent Browser Loop" width="100%" />
+</p>
+# Agent Browser Loop
+**Let your coding agent verify its own work.**
+AI coding agents can write code, run type checks, even execute unit tests - but they can't click through the app like a user would. They get stuck waiting for humans to manually verify "does the button work? does the form submit? does the error message appear?"
+Agent Browser Loop gives agents a browser they can drive. Write code, navigate to the page, fill the form, click submit, check the browser logs, take a screenshot, see the error, fix it, retry - all without human intervention.
+This doesn't eliminate human review - but it lets agents verify and unblock themselves instead of stopping at every turn. **Engineers can give agents a longer leash, and agents can build features end-to-end while proving they actually work.**
+---
+## Install
+Requires [Bun](https://bun.sh).
+```bash
+bun add -D agent-browser-loop
+agent-browser setup
+```
+This installs Playwright Chromium and copies skill files to `.claude/skills/` so Claude, OpenCode, and other AI agents know how to use the browser.
+## Quick Start
+```bash
+agent-browser open http://localhost:3000 --headed
+agent-browser act click:button_0 type:input_0:"hello"
+agent-browser wait --text "Success"
+agent-browser state
+agent-browser close
+```
+The `--headed` flag shows the browser window. Omit it to run in headless mode.
+## How Agents Use It
+The agent works in a loop: **open -> act -> wait/verify -> repeat**
+```bash
+# 1. Open the app
+agent-browser open http://localhost:3000/login
+# 2. Fill form and submit
+agent-browser act type:input_0:user@example.com type:input_1:password123 click:button_0
+# 3. Wait for navigation
+agent-browser wait --text "Welcome back"
+# 4. Verify state
+agent-browser state
+```
+Every command returns the current page state - interactive elements, form values, scroll position, console errors, network failures. The agent sees exactly what it needs to verify the code works or debug why it doesn't.
+## CLI Reference
+| Command | Description |
+|---------|-------------|
+| `open <url>` | Open URL (starts daemon if needed) |
+| `act <actions...>` | Execute actions |
+| `wait` | Wait for condition |
+| `state` | Get current page state |
+| `screenshot` | Capture screenshot |
+| `close` | Close browser and daemon |
+| `setup` | Install browser + skill files |
+### Actions
+```bash
+agent-browser act click:button_0           # Click element
+agent-browser act type:input_0:hello       # Type text
+agent-browser act press:Enter              # Press key
+agent-browser act scroll:down:500          # Scroll
+agent-browser act navigate:http://...      # Navigate
+```
+Multiple actions: `agent-browser act click:input_0 type:input_0:hello press:Enter`
+### Wait Conditions
+```bash
+agent-browser wait --text "Welcome"        # Text appears
+agent-browser wait --selector "#success"   # Element exists
+agent-browser wait --url "/dashboard"      # URL matches
+agent-browser wait --not-text "Loading"    # Text disappears
+agent-browser wait --timeout 60000         # Custom timeout
+```
+### Options
+```bash
+--headed              # Show browser window
+--session <name>      # Named session
+--json                # JSON output
+--no-state            # Skip state in response
+```
+## State Output
+```
+URL: http://localhost:3000/login
+Title: Login
+Scroll: 0px above, 500px below
+Interactive Elements:
+  [0] ref=input_0 textbox "Email" (placeholder="Enter email")
+  [1] ref=input_1 textbox "Password" (type="password")
+  [2] ref=button_0 button "Sign In"
+Errors:
+Console: [error] Failed to load resource: 404
+Network: 404 GET /api/user
+```
+Use `ref` values in actions: `click:button_0`, `type:input_0:hello`
+## Screenshots
+```bash
+agent-browser screenshot -o screenshot.png       # Save to file
+agent-browser screenshot --full-page -o full.png # Full scrollable page
+agent-browser screenshot                         # Output base64
+```
+Useful for visual debugging when text state isn't enough to diagnose issues.
+## HTTP Server Mode
+For multi-session scenarios or HTTP integrations:
+```bash
+agent-browser server --headed
+# Runs at http://localhost:3790
+# API spec at GET /openapi.json
+```
+## Configuration
+CLI flags or config file (`agent.browser.config.ts`):
+```ts
+import { defineBrowserConfig } from "agent-browser-loop";
+export default defineBrowserConfig({
+  headless: false,
+  viewportWidth: 1440,
+  viewportHeight: 900,
+});
+```
+## What This Is NOT For
+This tool is for agents to test their own code. It is **not** for:
+- Web scraping
+- Automating third-party sites
+- Bypassing authentication
+Use it on your localhost and staging environments.
+## License
+MIT
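
Reviewer note: the README's "State Output" section says `ref` values from the state text are fed back into actions such as `click:button_0`. The sketch below shows one way a caller might extract those refs from the plain-text state. It assumes the text format matches the README sample exactly; `ElementRef`, `ELEMENT_LINE`, and `parseInteractiveElements` are hypothetical names that are not part of the package, and the `--json` output is probably a sturdier input in practice.

```typescript
// Hypothetical shape for one parsed "Interactive Elements" line (not a package type).
type ElementRef = { index: number; ref: string; role: string; label: string };

// Matches lines such as:   [2] ref=button_0 button "Sign In"
// Assumption: the layout is exactly the one shown in the README sample.
const ELEMENT_LINE = /^\s*\[(\d+)\]\s+ref=(\S+)\s+(\S+)\s+"([^"]*)"/;

function parseInteractiveElements(stateText: string): ElementRef[] {
  const elements: ElementRef[] = [];
  for (const line of stateText.split("\n")) {
    const match = ELEMENT_LINE.exec(line);
    if (match) {
      elements.push({
        index: Number(match[1]),
        ref: match[2],
        role: match[3],
        label: match[4],
      });
    }
  }
  return elements;
}

// Sample text copied from the README's State Output section.
const sample = [
  "URL: http://localhost:3000/login",
  "Title: Login",
  "Interactive Elements:",
  '  [0] ref=input_0 textbox "Email" (placeholder="Enter email")',
  '  [1] ref=input_1 textbox "Password" (type="password")',
  '  [2] ref=button_0 button "Sign In"',
].join("\n");

const parsed = parseInteractiveElements(sample);

// Build the action string for the "Sign In" button, if it was found.
const signIn = parsed.find((element) => element.label === "Sign In");
const clickAction = signIn ? `click:${signIn.ref}` : undefined;
```

The resulting `clickAction` string has the same form as the action arguments passed to `agent-browser act` in the README.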

package/package.json ADDED

@@ -0,0 +1,73 @@
+{
+  "name": "agent-browser-loop",
+  "version": "0.1.0",
+  "description": "Let your AI coding agent drive a browser to verify its own work",
+  "license": "MIT",
+  "author": "Jason Silberman",
+  "repository": {
+    "type": "git",
+    "url": "git+https://github.com/jasonsilberman/agent-browser-loop.git"
+  },
+  "homepage": "https://github.com/jasonsilberman/agent-browser-loop#readme",
+  "bugs": {
+    "url": "https://github.com/jasonsilberman/agent-browser-loop/issues"
+  },
+  "engines": {
+    "bun": ">=1.0.0"
+  },
+  "keywords": [
+    "ai",
+    "agent",
+    "browser",
+    "automation",
+    "playwright",
+    "testing",
+    "claude",
+    "opencode",
+    "codex"
+  ],
+  "sideEffects": false,
+  "type": "module",
+  "workspaces": [
+    "examples/*"
+  ],
+  "files": [
+    "src",
+    ".claude/skills",
+    "README.md",
+    "LICENSE"
+  ],
+  "exports": {
+    ".": "./src/index.ts",
+    "./*": "./src/*.ts"
+  },
+  "bin": {
+    "agent-browser": "./src/cli.ts"
+  },
+  "scripts": {
+    "fmt": "biome check --write .",
+    "typecheck": "tsc --noEmit",
+    "dev:basic": "bun --cwd examples/basic-vite run dev",
+    "dev:next": "bun --cwd examples/next-app run dev"
+  },
+  "dependencies": {
+    "@hono/zod-openapi": "^1.2.0",
+    "@loglayer/transport-simple-pretty-terminal": "^2.3.1",
+    "cmd-ts": "^0.14.3",
+    "hono": "^4.6.11",
+    "loglayer": "^8.4.0",
+    "playwright": "^1.57.0",
+    "serialize-error": "^12.0.0",
+    "zod": "^4.3.5"
+  },
+  "devDependencies": {
+    "@biomejs/biome": "2.1.2",
+    "@tsconfig/node22": "^22.0.2",
+    "@types/bun": "latest",
+    "@types/node": "^22.10.2",
+    "typescript": "^5.7.2"
+  },
+  "volta": {
+    "node": "24.12.0"
+  }
+}

package/src/actions.ts ADDED

@@ -0,0 +1,267 @@
+import type { Page, Request } from "playwright";
+import type {
+  ClickOptions,
+  NavigateOptions,
+  NetworkEvent,
+  TypeOptions,
+} from "./types";
+/**
+ * Get a locator for an element by ref or index
+ * After calling getState(), elements have data-ref attributes injected
+ */
+function getLocator(page: Page, options: { ref?: string; index?: number }) {
+  if (options.ref) {
+    return page.locator(`[data-ref="${options.ref}"]`);
+  }
+  if (options.index !== undefined) {
+    // Use data-index (injected by getState). Fallback to legacy e{index} refs.
+    return page.locator(
+      `[data-index="${options.index}"], [data-ref="e${options.index}"]`,
+    );
+  }
+  throw new Error("Must provide either ref or index");
+}
+/**
+ * Click an element
+ */
+export async function click(page: Page, options: ClickOptions): Promise<void> {
+  const locator = getLocator(page, options);
+  const clickOptions: Parameters<typeof locator.click>[0] = {
+    button: options.button,
+    modifiers: options.modifiers,
+  };
+  if (options.double) {
+    await locator.dblclick(clickOptions);
+  } else {
+    await locator.click(clickOptions);
+  }
+}
+/**
+ * Type text into an element
+ */
+export async function type(page: Page, options: TypeOptions): Promise<void> {
+  const locator = getLocator(page, options);
+  // Clear existing text if requested
+  if (options.clear) {
+    await locator.clear();
+  }
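+  // Type key by key when a delay is requested; otherwise set the value in one step.
+  // locator.type() is deprecated in Playwright; pressSequentially() is its replacement.
+  if (options.delay) {
+    await locator.pressSequentially(options.text, { delay: options.delay });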
+  } else {
+    await locator.fill(options.text);
+  }
+  // Press Enter if submit requested
+  if (options.submit) {
+    await locator.press("Enter");
+  }
+}
+/**
+ * Navigate to a URL
+ */
+export async function navigate(
+  page: Page,
+  options: NavigateOptions,
+): Promise<void> {
+  await page.goto(options.url, {
+    waitUntil: options.waitUntil || "load",
+  });
+}
+/**
+ * Press a keyboard key
+ */
+export async function press(page: Page, key: string): Promise<void> {
+  await page.keyboard.press(key);
+}
+/**
+ * Scroll the page
+ */
+export async function scroll(
+  page: Page,
+  direction: "up" | "down",
+  amount: number = 500,
+): Promise<void> {
+  const delta = direction === "down" ? amount : -amount;
+  await page.mouse.wheel(0, delta);
+  // Wait for any lazy-loaded content
+  await page.waitForTimeout(100);
+}
+/**
+ * Wait for the page to reach the "networkidle" load state (used as a proxy for navigation completing)
+ */
+export async function waitForNavigation(
+  page: Page,
+  options?: { timeoutMs?: number },
+): Promise<void> {
+  await page.waitForLoadState("networkidle", {
+    timeout: options?.timeoutMs || 30000,
+  });
+}
+/**
+ * Wait for an element to appear
+ */
+export async function waitForElement(
+  page: Page,
+  selector: string,
+  options?: { timeoutMs?: number; state?: "attached" | "visible" },
+): Promise<void> {
+  await page.locator(selector).waitFor({
+    timeout: options?.timeoutMs || 30000,
+    state: options?.state || "visible",
+  });
+}
+/**
+ * Hover over an element
+ */
+export async function hover(
+  page: Page,
+  options: { ref?: string; index?: number },
+): Promise<void> {
+  const locator = getLocator(page, options);
+  await locator.hover();
+}
+/**
+ * Select an option from a dropdown
+ */
+export async function select(
+  page: Page,
+  options: { ref?: string; index?: number; value: string | string[] },
+): Promise<void> {
+  const locator = getLocator(page, options);
+  await locator.selectOption(options.value);
+}
+/**
+ * Take a screenshot
+ */
+export async function screenshot(
+  page: Page,
+  options?: { fullPage?: boolean; path?: string },
+): Promise<string> {
+  const buffer = await page.screenshot({
+    type: "jpeg",
+    quality: 80,
+    fullPage: options?.fullPage,
+    path: options?.path,
+  });
+  return buffer.toString("base64");
+}
+/**
+ * Start capturing console messages and uncaught page errors; returns the array they are appended to
+ */
+export function setupConsoleCapture(page: Page): string[] {
+  const logs: string[] = [];
+  page.on("console", (msg) => {
+    logs.push(`[${msg.type()}] ${msg.text()}`);
+  });
+  page.on("pageerror", (error) => {
+    logs.push(`[error] ${error.message}`);
+  });
+  return logs;
+}
+function pushNetworkEvent(
+  events: NetworkEvent[],
+  event: NetworkEvent,
+  limit: number,
+) {
+  events.push(event);
+  if (events.length > limit) {
+    events.splice(0, events.length - limit);
+  }
+}
+/**
+ * Capture network activity from the page
+ */
+export function setupNetworkCapture(
+  page: Page,
+  events: NetworkEvent[],
+  limit = 500,
+): void {
+  let counter = 0;
+  const requestMap = new Map<Request, { id: string; startedAt: number }>();
+  page.on("request", (request) => {
+    const id = `req-${counter++}`;
+    const startedAt = Date.now();
+    requestMap.set(request, { id, startedAt });
+    pushNetworkEvent(
+      events,
+      {
+        id,
+        type: "request",
+        url: request.url(),
+        method: request.method(),
+        resourceType: request.resourceType(),
+        timestamp: startedAt,
+      },
+      limit,
+    );
+  });
+  page.on("response", (response) => {
+    const request = response.request();
+    const cached = requestMap.get(request);
+    const id = cached?.id ?? `req-${counter++}`;
+    const startedAt = cached?.startedAt ?? Date.now();
+    const timestamp = Date.now();
+    pushNetworkEvent(
+      events,
+      {
+        id,
+        type: "response",
+        url: request.url(),
+        method: request.method(),
+        resourceType: request.resourceType(),
+        status: response.status(),
+        ok: response.ok(),
+        timestamp,
+        durationMs: timestamp - startedAt,
+      },
+      limit,
+    );
+    requestMap.delete(request);
+  });
+  page.on("requestfailed", (request) => {
+    const cached = requestMap.get(request);
+    const id = cached?.id ?? `req-${counter++}`;
+    const startedAt = cached?.startedAt ?? Date.now();
+    const timestamp = Date.now();
+    pushNetworkEvent(
+      events,
+      {
+        id,
+        type: "failed",
+        url: request.url(),
+        method: request.method(),
+        resourceType: request.resourceType(),
+        failureText: request.failure()?.errorText,
+        timestamp,
+        durationMs: timestamp - startedAt,
+      },
+      limit,
+    );
+    requestMap.delete(request);
+  });
+}
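
Reviewer note: `pushNetworkEvent` above keeps the event list bounded by dropping the oldest entries once `limit` is exceeded. The sketch below reproduces that trimming in isolation and adds a hypothetical `summarizeFailures` helper to show how failed or non-2xx/3xx events might be reduced to lines like the `Network: 404 GET /api/user` entry in the README sample. The `NetworkEvent` type here is a reduced stand-in inferred from the fields used in `actions.ts`, not the definition in `types.ts`; `pushBounded` and `summarizeFailures` are illustrative names only.

```typescript
// Reduced stand-in for NetworkEvent, inferred from the fields used in actions.ts.
type NetworkEvent = {
  id: string;
  type: "request" | "response" | "failed";
  url: string;
  method: string;
  status?: number;
  failureText?: string;
  timestamp: number;
};

// Same trimming logic as pushNetworkEvent: append, then drop the oldest entries
// so that at most `limit` items remain.
function pushBounded<T>(items: T[], item: T, limit: number): void {
  items.push(item);
  if (items.length > limit) {
    items.splice(0, items.length - limit);
  }
}

// Hypothetical helper: keep only events that signal a problem and format them
// as "<status or failure text> <method> <url>".
function summarizeFailures(events: NetworkEvent[]): string[] {
  return events
    .filter(
      (event) =>
        event.type === "failed" ||
        (event.type === "response" && (event.status ?? 0) >= 400),
    )
    .map((event) => {
      const label =
        event.type === "failed"
          ? (event.failureText ?? "failed")
          : String(event.status);
      return `${label} ${event.method} ${event.url}`;
    });
}

// Push five responses into a buffer limited to three entries.
const events: NetworkEvent[] = [];
for (let i = 0; i < 5; i++) {
  pushBounded(
    events,
    {
      id: `req-${i}`,
      type: "response",
      url: `http://localhost:3000/api/item/${i}`,
      method: "GET",
      status: i === 4 ? 404 : 200,
      timestamp: i,
    },
    3,
  );
}

const retainedIds = events.map((event) => event.id);
const failures = summarizeFailures(events);
```

Only the three most recent events survive the trim, so a long-running session keeps a fixed-size window of network activity rather than growing without bound.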