agent-browser-loop 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md ADDED
<p align="center">
<img src="readme-header.png" alt="Agent Browser Loop" width="100%" />
</p>

# Agent Browser Loop

**Let your coding agent verify its own work.**

AI coding agents can write code, run type checks, even execute unit tests - but they can't click through the app like a user would. They get stuck waiting for humans to manually verify "does the button work? does the form submit? does the error message appear?"

Agent Browser Loop gives agents a browser they can drive. Write code, navigate to the page, fill the form, click submit, check the browser logs, take a screenshot, see the error, fix it, retry - all without human intervention.

This doesn't eliminate human review - but it lets agents verify and unblock themselves instead of stopping at every turn. **Engineers can give agents a longer leash, and agents can build features end-to-end while proving they actually work.**

---

## Install

Requires [Bun](https://bun.sh).

```bash
bun add -D agent-browser-loop
agent-browser setup
```

This installs Playwright Chromium and copies skill files to `.claude/skills/` so Claude, OpenCode, and other AI agents know how to use the browser.

## Quick Start

```bash
agent-browser open http://localhost:3000 --headed
agent-browser act click:button_0 type:input_0:"hello"
agent-browser wait --text "Success"
agent-browser state
agent-browser close
```

The `--headed` flag shows the browser window. Omit it to run headless.

## How Agents Use It

The agent works in a loop: **open -> act -> wait/verify -> repeat**

```bash
# 1. Open the app
agent-browser open http://localhost:3000/login

# 2. Fill form and submit
agent-browser act type:input_0:user@example.com type:input_1:password123 click:button_0

# 3. Wait for navigation
agent-browser wait --text "Welcome back"

# 4. Verify state
agent-browser state
```

Every command returns the current page state (unless you pass `--no-state`): interactive elements, form values, scroll position, console errors, and network failures. The agent sees what it needs to confirm that the code works, or to debug why it doesn't.

## CLI Reference

| Command | Description |
|---------|-------------|
| `open <url>` | Open a URL (starts the daemon if needed) |
| `act <actions...>` | Execute one or more actions |
| `wait` | Wait for a condition |
| `state` | Get the current page state |
| `screenshot` | Capture a screenshot |
| `close` | Close the browser and daemon |
| `server` | Run the HTTP server (see HTTP Server Mode) |
| `setup` | Install the browser and skill files |

### Actions

```bash
agent-browser act click:button_0          # Click element
agent-browser act type:input_0:hello      # Type text
agent-browser act press:Enter             # Press key
agent-browser act scroll:down:500         # Scroll
agent-browser act navigate:http://...     # Navigate
```

Pass multiple actions in one call: `agent-browser act click:input_0 type:input_0:hello press:Enter`
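
Each action token packs a verb, a target and an optional value into one colon-separated argument. The sketch below is a hypothetical parser for the token shapes listed above. It is not the package's own CLI code, and the `Action` type and its field names are illustrative. It assumes that everything after the ref (for `type`) or after the verb (for `navigate`) is kept whole, so that text and URLs may contain colons.

```typescript
// Illustrative action model; names are not the package's real types.
type Action =
  | { kind: "click"; ref: string }
  | { kind: "type"; ref: string; text: string }
  | { kind: "press"; key: string }
  | { kind: "scroll"; direction: "up" | "down"; amount: number }
  | { kind: "navigate"; url: string };

// Split off the verb at the first colon; the remainder is verb-specific.
function parseAction(token: string): Action {
  const sep = token.indexOf(":");
  if (sep === -1) throw new Error(`Malformed action: ${token}`);
  const verb = token.slice(0, sep);
  const rest = token.slice(sep + 1);

  switch (verb) {
    case "click":
      return { kind: "click", ref: rest };
    case "type": {
      // The first colon ends the ref; the text itself may contain colons.
      const i = rest.indexOf(":");
      if (i === -1) throw new Error(`type needs a ref and text: ${token}`);
      return { kind: "type", ref: rest.slice(0, i), text: rest.slice(i + 1) };
    }
    case "press":
      return { kind: "press", key: rest };
    case "scroll": {
      const [direction, amountText] = rest.split(":");
      if (direction !== "up" && direction !== "down") {
        throw new Error(`scroll direction must be up or down: ${token}`);
      }
      // 500px mirrors the default `amount` of scroll() in src/actions.ts.
      const amount = amountText === undefined ? 500 : Number(amountText);
      if (!Number.isFinite(amount)) throw new Error(`Bad scroll amount: ${token}`);
      return { kind: "scroll", direction, amount };
    }
    case "navigate":
      // Keep the whole remainder so "http://..." is not cut at its scheme.
      return { kind: "navigate", url: rest };
    default:
      throw new Error(`Unknown action: ${verb}`);
  }
}

// Parse every token in order; the CLI example suggests they run left to right.
function parseActions(tokens: string[]): Action[] {
  return tokens.map(parseAction);
}
```

Keeping the remainder intact matters most for `navigate:http://...`. A naive `split(":")` would cut the URL at its scheme.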

### Wait Conditions

```bash
agent-browser wait --text "Welcome"           # Text appears
agent-browser wait --selector "#success"      # Element exists
agent-browser wait --url "/dashboard"         # URL matches
agent-browser wait --not-text "Loading"       # Text disappears
agent-browser wait --timeout 60000            # Custom timeout (ms)
```
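
Each wait condition amounts to polling a predicate until it holds or the timeout runs out. This is a generic sketch of that idea, not how the CLI implements waits (Playwright ships its own waiting primitives). The names `waitUntil`, `hasText` and `lacksText` are hypothetical. The 30-second default mirrors the `timeoutMs` defaults in `src/actions.ts`.

```typescript
// Hypothetical helper: re-check `condition` until it is true or time runs out.
async function waitUntil(
  condition: () => boolean | Promise<boolean>,
  { timeoutMs = 30000, intervalMs = 100 }: { timeoutMs?: number; intervalMs?: number } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await condition()) return;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs}ms`);
    }
    // Sleep between checks instead of busy-looping.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Predicates shaped like --text and --not-text. `readText` stands in for
// however the page text is obtained (an assumption, not part of the CLI).
const hasText = (readText: () => string, text: string) => () =>
  readText().includes(text);
const lacksText = (readText: () => string, text: string) => () =>
  !readText().includes(text);
```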

### Options

```bash
--headed            # Show browser window
--session <name>    # Named session
--json              # JSON output
--no-state          # Skip state in response
```

## State Output

```
URL: http://localhost:3000/login
Title: Login
Scroll: 0px above, 500px below

Interactive Elements:
[0] ref=input_0 textbox "Email" (placeholder="Enter email")
[1] ref=input_1 textbox "Password" (type="password")
[2] ref=button_0 button "Sign In"

Errors:
Console: [error] Failed to load resource: 404
Network: 404 GET /api/user
```

Use `ref` values in actions: `click:button_0`, `type:input_0:hello`
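
The text state puts one element on each line, so a script can look up refs by accessible name instead of hard-coding `button_0`. This is a minimal sketch that assumes the exact line layout in the sample above (`[index] ref=<ref> <role> "<name>" (details)`). `parseInteractiveElements` and `refFor` are hypothetical helpers. For programmatic use the `--json` output is probably the sturdier input, but its schema isn't documented here.

```typescript
interface ElementLine {
  index: number;
  ref: string;
  role: string;
  name: string;
  details?: string; // text inside the trailing parentheses, if any
}

// [index] ref=<ref> <role> "<name>", optionally followed by (details)
const ELEMENT_LINE = /^\s*\[(\d+)\]\s+ref=(\S+)\s+(\S+)\s+"([^"]*)"(?:\s+\((.*)\))?\s*$/;

function parseInteractiveElements(stateText: string): ElementLine[] {
  const elements: ElementLine[] = [];
  for (const line of stateText.split("\n")) {
    const m = ELEMENT_LINE.exec(line);
    if (!m) continue; // URL, Title, Errors, and other non-element lines
    const element: ElementLine = { index: Number(m[1]), ref: m[2], role: m[3], name: m[4] };
    if (m[5] !== undefined) element.details = m[5];
    elements.push(element);
  }
  return elements;
}

// Find the ref for an accessible name, e.g. to build `click:<ref>`.
function refFor(elements: ElementLine[], name: string): string | undefined {
  return elements.find((el) => el.name === name)?.ref;
}
```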

## Screenshots

```bash
agent-browser screenshot -o screenshot.png            # Save to file
agent-browser screenshot --full-page -o full.png      # Full scrollable page
agent-browser screenshot                              # Output as base64
```

Screenshots help with visual debugging when the text state isn't enough to diagnose an issue.

## HTTP Server Mode

For multi-session scenarios or HTTP integrations:

```bash
agent-browser server --headed
# Runs at http://localhost:3790
# API spec at GET /openapi.json
```
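
The spec endpoint makes the server's routes discoverable. The sketch below fetches `/openapi.json` from the default address and flattens it into `METHOD /path` lines. It assumes only the standard OpenAPI `paths` layout; the server depends on `@hono/zod-openapi`, which generates OpenAPI documents. The names `listOperations` and `fetchOperations` are hypothetical.

```typescript
// Minimal slice of an OpenAPI document: paths -> HTTP method -> operation.
interface OpenApiDoc {
  paths?: Record<string, Record<string, { summary?: string }>>;
}

const HTTP_METHODS = new Set(["get", "put", "post", "delete", "patch", "options", "head", "trace"]);

// Flatten the spec into "METHOD /path" strings, sorted for stable output.
function listOperations(doc: OpenApiDoc): string[] {
  const ops: string[] = [];
  for (const [path, item] of Object.entries(doc.paths ?? {})) {
    for (const method of Object.keys(item)) {
      // Path items may also hold non-operation keys such as "parameters".
      if (HTTP_METHODS.has(method)) ops.push(`${method.toUpperCase()} ${path}`);
    }
  }
  return ops.sort();
}

// Fetch the spec from a running `agent-browser server` (default address above).
async function fetchOperations(baseUrl = "http://localhost:3790"): Promise<string[]> {
  const res = await fetch(`${baseUrl}/openapi.json`);
  if (!res.ok) throw new Error(`GET /openapi.json failed: ${res.status}`);
  return listOperations((await res.json()) as OpenApiDoc);
}
```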

## Configuration

Configure the browser with CLI flags or a config file (`agent.browser.config.ts`):

```ts
import { defineBrowserConfig } from "agent-browser-loop";

export default defineBrowserConfig({
  headless: false,
  viewportWidth: 1440,
  viewportHeight: 900,
});
```
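
`defineBrowserConfig` looks like the familiar `defineConfig` pattern: a typed identity function that gives the config file autocompletion and type checking. The sketch below shows that pattern under some assumptions. `BrowserConfigSketch` covers only the three options shown above. The precedence rule in `resolveConfig` (CLI flags, then the config file, then defaults) is hypothetical, not documented behavior.

```typescript
// Hypothetical option shape: only the fields shown in the README example.
interface BrowserConfigSketch {
  headless?: boolean;
  viewportWidth?: number;
  viewportHeight?: number;
}

// Identity helper: returns the object unchanged, but the parameter type
// lets the compiler flag unknown keys and wrong value types.
function defineBrowserConfigSketch(config: BrowserConfigSketch): BrowserConfigSketch {
  return config;
}

// Assumed precedence: CLI flags, then the config file, then defaults.
function resolveConfig(
  defaults: Required<BrowserConfigSketch>,
  file: BrowserConfigSketch,
  flags: BrowserConfigSketch,
): Required<BrowserConfigSketch> {
  return {
    headless: flags.headless ?? file.headless ?? defaults.headless,
    viewportWidth: flags.viewportWidth ?? file.viewportWidth ?? defaults.viewportWidth,
    viewportHeight: flags.viewportHeight ?? file.viewportHeight ?? defaults.viewportHeight,
  };
}
```

The merge uses `??` rather than `||`, so an explicit `headless: false` is not replaced by a default.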

## What This Is NOT For

This tool is for agents to test their own code. It is **not** for:

- Web scraping
- Automating third-party sites
- Bypassing authentication

Use it against your local and staging environments.

## License

MIT
package/package.json ADDED
{
  "name": "agent-browser-loop",
  "version": "0.1.0",
  "description": "Let your AI coding agent drive a browser to verify its own work",
  "license": "MIT",
  "author": "Jason Silberman",
  "repository": {
    "type": "git",
    "url": "git+https://github.com/jasonsilberman/agent-browser-loop.git"
  },
  "homepage": "https://github.com/jasonsilberman/agent-browser-loop#readme",
  "bugs": {
    "url": "https://github.com/jasonsilberman/agent-browser-loop/issues"
  },
  "engines": {
    "bun": ">=1.0.0"
  },
  "keywords": [
    "ai",
    "agent",
    "browser",
    "automation",
    "playwright",
    "testing",
    "claude",
    "opencode",
    "codex"
  ],
  "sideEffects": false,
  "type": "module",
  "workspaces": [
    "examples/*"
  ],
  "files": [
    "src",
    ".claude/skills",
    "README.md",
    "LICENSE"
  ],
  "exports": {
    ".": "./src/index.ts",
    "./*": "./src/*.ts"
  },
  "bin": {
    "agent-browser": "./src/cli.ts"
  },
  "scripts": {
    "fmt": "biome check --write .",
    "typecheck": "tsc --noEmit",
    "dev:basic": "bun --cwd examples/basic-vite run dev",
    "dev:next": "bun --cwd examples/next-app run dev"
  },
  "dependencies": {
    "@hono/zod-openapi": "^1.2.0",
    "@loglayer/transport-simple-pretty-terminal": "^2.3.1",
    "cmd-ts": "^0.14.3",
    "hono": "^4.6.11",
    "loglayer": "^8.4.0",
    "playwright": "^1.57.0",
    "serialize-error": "^12.0.0",
    "zod": "^4.3.5"
  },
  "devDependencies": {
    "@biomejs/biome": "2.1.2",
    "@tsconfig/node22": "^22.0.2",
    "@types/bun": "latest",
    "@types/node": "^22.10.2",
    "typescript": "^5.7.2"
  },
  "volta": {
    "node": "24.12.0"
  }
}
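
The `exports` map exposes `./src/index.ts` as the package root and uses the subpath pattern `"./*": "./src/*.ts"`. A specifier such as `agent-browser-loop/actions` should therefore resolve to `./src/actions.ts`. These targets are TypeScript source rather than compiled JavaScript, so importing them relies on a runtime that loads `.ts` files directly, such as Bun (matching the `engines` field).

The function below is a simplified model of how such a pattern expands: exact keys first, then the `*` key with the longest prefix. It does not model conditional exports or arrays. `resolveSubpath` is a hypothetical name.

```typescript
type ExportsMap = Record<string, string>;

// Simplified subpath resolution: exact keys win, then the "*" pattern whose
// prefix is longest. Conditions, arrays, and null targets are not modeled.
function resolveSubpath(exportsMap: ExportsMap, subpath: string): string | undefined {
  if (Object.prototype.hasOwnProperty.call(exportsMap, subpath)) {
    return exportsMap[subpath];
  }

  let best: { prefix: string; suffix: string; target: string } | undefined;
  for (const [key, target] of Object.entries(exportsMap)) {
    const star = key.indexOf("*");
    if (star === -1) continue;
    const prefix = key.slice(0, star);
    const suffix = key.slice(star + 1);
    // The "*" must match at least one character.
    const matches =
      subpath.startsWith(prefix) &&
      subpath.endsWith(suffix) &&
      subpath.length > prefix.length + suffix.length;
    if (matches && (best === undefined || prefix.length > best.prefix.length)) {
      best = { prefix, suffix, target };
    }
  }
  if (best === undefined) return undefined;

  // Substitute the captured text for every "*" in the target.
  const captured = subpath.slice(best.prefix.length, subpath.length - best.suffix.length);
  return best.target.split("*").join(captured);
}
```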
package/src/actions.ts ADDED
import type { Page, Request } from "playwright";
import type {
  ClickOptions,
  NavigateOptions,
  NetworkEvent,
  TypeOptions,
} from "./types";

/**
 * Get a locator for an element by ref or index.
 * Elements carry injected data-ref attributes after getState() has run.
 */
function getLocator(page: Page, options: { ref?: string; index?: number }) {
  if (options.ref) {
    return page.locator(`[data-ref="${options.ref}"]`);
  }
  if (options.index !== undefined) {
    // Use data-index (injected by getState). Fall back to legacy e{index} refs.
    return page.locator(
      `[data-index="${options.index}"], [data-ref="e${options.index}"]`,
    );
  }
  throw new Error("Must provide either ref or index");
}

/**
 * Click an element (double-click when options.double is set)
 */
export async function click(page: Page, options: ClickOptions): Promise<void> {
  const locator = getLocator(page, options);

  const clickOptions: Parameters<typeof locator.click>[0] = {
    button: options.button,
    modifiers: options.modifiers,
  };

  if (options.double) {
    await locator.dblclick(clickOptions);
  } else {
    await locator.click(clickOptions);
  }
}

/**
 * Type text into an element
 */
export async function type(page: Page, options: TypeOptions): Promise<void> {
  const locator = getLocator(page, options);

  // Clear existing text if requested
  if (options.clear) {
    await locator.clear();
  }

  // Type key by key when a delay is given; otherwise fill the value at once.
  // locator.type() is deprecated in Playwright; pressSequentially() replaces it.
  if (options.delay) {
    await locator.pressSequentially(options.text, { delay: options.delay });
  } else {
    await locator.fill(options.text);
  }

  // Press Enter if submit requested
  if (options.submit) {
    await locator.press("Enter");
  }
}

/**
 * Navigate to a URL (waits for the "load" event by default)
 */
export async function navigate(
  page: Page,
  options: NavigateOptions,
): Promise<void> {
  await page.goto(options.url, {
    waitUntil: options.waitUntil || "load",
  });
}

/**
 * Press a keyboard key
 */
export async function press(page: Page, key: string): Promise<void> {
  await page.keyboard.press(key);
}

/**
 * Scroll the page with the mouse wheel
 */
export async function scroll(
  page: Page,
  direction: "up" | "down",
  amount: number = 500,
): Promise<void> {
  const delta = direction === "down" ? amount : -amount;
  await page.mouse.wheel(0, delta);
  // Give lazy-loaded content a moment to appear
  await page.waitForTimeout(100);
}

/**
 * Wait for the page to reach the "networkidle" load state
 */
export async function waitForNavigation(
  page: Page,
  options?: { timeoutMs?: number },
): Promise<void> {
  // `??` keeps an explicit 0 (no timeout in Playwright) instead of overriding it
  await page.waitForLoadState("networkidle", {
    timeout: options?.timeoutMs ?? 30000,
  });
}

/**
 * Wait for an element to appear
 */
export async function waitForElement(
  page: Page,
  selector: string,
  options?: { timeoutMs?: number; state?: "attached" | "visible" },
): Promise<void> {
  await page.locator(selector).waitFor({
    timeout: options?.timeoutMs ?? 30000,
    state: options?.state ?? "visible",
  });
}

/**
 * Hover over an element
 */
export async function hover(
  page: Page,
  options: { ref?: string; index?: number },
): Promise<void> {
  const locator = getLocator(page, options);
  await locator.hover();
}

/**
 * Select one or more options in a <select> element
 */
export async function select(
  page: Page,
  options: { ref?: string; index?: number; value: string | string[] },
): Promise<void> {
  const locator = getLocator(page, options);
  await locator.selectOption(options.value);
}

/**
 * Take a JPEG screenshot (quality 80), optionally saving it to `path`,
 * and return the image as a base64 string
 */
export async function screenshot(
  page: Page,
  options?: { fullPage?: boolean; path?: string },
): Promise<string> {
  const buffer = await page.screenshot({
    type: "jpeg",
    quality: 80,
    fullPage: options?.fullPage,
    path: options?.path,
  });
  return buffer.toString("base64");
}

/**
 * Capture console messages and uncaught page errors.
 * Returns a live array that grows as new messages arrive.
 */
export function setupConsoleCapture(page: Page): string[] {
  const logs: string[] = [];

  page.on("console", (msg) => {
    logs.push(`[${msg.type()}] ${msg.text()}`);
  });

  page.on("pageerror", (error) => {
    logs.push(`[error] ${error.message}`);
  });

  return logs;
}

/**
 * Append an event, dropping the oldest entries once there are more than `limit`
 */
function pushNetworkEvent(
  events: NetworkEvent[],
  event: NetworkEvent,
  limit: number,
) {
  events.push(event);
  if (events.length > limit) {
    events.splice(0, events.length - limit);
  }
}

/**
 * Capture network activity (requests, responses, and failures) from the page
 * into `events`, keeping at most `limit` entries
 */
export function setupNetworkCapture(
  page: Page,
  events: NetworkEvent[],
  limit = 500,
): void {
  let counter = 0;
  // In-flight requests, so responses and failures reuse the request's id and start time
  const requestMap = new Map<Request, { id: string; startedAt: number }>();

  page.on("request", (request) => {
    const id = `req-${counter++}`;
    const startedAt = Date.now();
    requestMap.set(request, { id, startedAt });
    pushNetworkEvent(
      events,
      {
        id,
        type: "request",
        url: request.url(),
        method: request.method(),
        resourceType: request.resourceType(),
        timestamp: startedAt,
      },
      limit,
    );
  });

  page.on("response", (response) => {
    const request = response.request();
    const cached = requestMap.get(request);
    const id = cached?.id ?? `req-${counter++}`;
    const startedAt = cached?.startedAt ?? Date.now();
    const timestamp = Date.now();
    pushNetworkEvent(
      events,
      {
        id,
        type: "response",
        url: request.url(),
        method: request.method(),
        resourceType: request.resourceType(),
        status: response.status(),
        ok: response.ok(),
        timestamp,
        durationMs: timestamp - startedAt,
      },
      limit,
    );
    requestMap.delete(request);
  });

  page.on("requestfailed", (request) => {
    const cached = requestMap.get(request);
    const id = cached?.id ?? `req-${counter++}`;
    const startedAt = cached?.startedAt ?? Date.now();
    const timestamp = Date.now();
    pushNetworkEvent(
      events,
      {
        id,
        type: "failed",
        url: request.url(),
        method: request.method(),
        resourceType: request.resourceType(),
        failureText: request.failure()?.errorText,
        timestamp,
        durationMs: timestamp - startedAt,
      },
      limit,
    );
    requestMap.delete(request);
  });
}
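
`setupNetworkCapture` keeps `request`, `response` and `failed` events in a bounded array. Once the array holds more than `limit` entries (500 by default), `pushNetworkEvent` drops the oldest. The README's state output reports failures as lines like `Network: 404 GET /api/user`.

The sketch below shows one way to derive such lines from captured events. `NetworkEventSketch` is a local stand-in for `NetworkEvent` from `./types` (not shown here), limited to the fields this file sets. The format of the `failed` line is an assumption.

```typescript
// Local stand-in for NetworkEvent from ./types, limited to fields set in actions.ts.
interface NetworkEventSketch {
  id: string;
  type: "request" | "response" | "failed";
  url: string;
  method: string;
  resourceType: string;
  timestamp: number;
  status?: number;
  ok?: boolean;
  failureText?: string;
  durationMs?: number;
}

// Short failure lines: responses outside 200-299 (Playwright's response.ok()
// is false) and requests that failed without a usable response.
function summarizeNetworkFailures(events: NetworkEventSketch[]): string[] {
  const lines: string[] = [];
  for (const event of events) {
    const path = toPath(event.url);
    if (event.type === "response" && event.ok === false) {
      lines.push(`${event.status} ${event.method} ${path}`);
    } else if (event.type === "failed") {
      lines.push(`failed ${event.method} ${path} (${event.failureText ?? "unknown error"})`);
    }
  }
  return lines;
}

// Drop the origin so lines read like the README sample ("/api/user");
// keep unparsable URLs as-is.
function toPath(url: string): string {
  try {
    const u = new URL(url);
    return u.pathname + u.search;
  } catch {
    return url;
  }
}
```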