agent-browser-loop 0.1.0

@@ -0,0 +1,374 @@
1
+ # Agent Browser Loop - CLI Reference
2
+
3
+ Complete CLI reference for `agent-browser`.
4
+
5
+ ## Commands
6
+
7
+ ### `open <url>`
8
+
9
+ Open a URL in the browser. Automatically starts the daemon if it is not already running.
10
+
11
+ ```bash
12
+ agent-browser open <url> [options]
13
+ ```
14
+
15
+ **Options:**
16
+ | Flag | Description |
17
+ |------|-------------|
18
+ | `--headed` | Show browser window (default: headless) |
19
+ | `--session <name>` | Named session for isolation |
20
+ | `--viewport <WxH>` | Viewport size (default: 1280x720) |
21
+ | `--json` | Output as JSON |
22
+
23
+ **Examples:**
24
+ ```bash
25
+ agent-browser open http://localhost:3000
26
+ agent-browser open http://localhost:3000 --headed
27
+ agent-browser open http://localhost:3000 --session test1 --viewport 1920x1080
28
+ ```
29
+
30
+ ---
31
+
32
+ ### `act <actions...>`
33
+
34
+ Execute one or more actions on the page.
35
+
36
+ ```bash
37
+ agent-browser act <actions...> [options]
38
+ ```
39
+
40
+ **Options:**
41
+ | Flag | Description |
42
+ |------|-------------|
43
+ | `--session <name>` | Target session |
44
+ | `--no-state` | Skip state in response |
45
+ | `--json` | Output as JSON |
46
+
47
+ **Action Syntax:**
48
+
49
+ | Action | Syntax | Example |
50
+ |--------|--------|---------|
51
+ | Navigate | `navigate:<url>` | `navigate:http://localhost:3000/login` |
52
+ | Click | `click:<ref>` | `click:button_0` |
53
+ | Type | `type:<ref>:<text>` | `type:input_0:hello` |
54
+ | Press key | `press:<key>` | `press:Enter` |
55
+ | Scroll | `scroll:<direction>[:<amount>]` | `scroll:down:500` |
56
+ | Select | `select:<ref>:<value>` | `select:select_0:option1` |
57
+ | Check | `check:<ref>` | `check:checkbox_0` |
58
+ | Uncheck | `uncheck:<ref>` | `uncheck:checkbox_0` |
59
+ | Focus | `focus:<ref>` | `focus:input_0` |
60
+ | Blur | `blur:<ref>` | `blur:input_0` |
61
+ | Hover | `hover:<ref>` | `hover:button_0` |
62
+ | Clear | `clear:<ref>` | `clear:input_0` |
63
+ | Upload | `upload:<ref>:<path>` | `upload:input_0:/path/to/file.pdf` |
64
+ | Wait | `wait:<ms>` | `wait:1000` |
65
+ | Go back | `back` | `back` |
66
+ | Go forward | `forward` | `forward` |
67
+ | Reload | `reload` | `reload` |
68
+
69
+ **Key names for `press`:**
70
+ - Navigation: `Enter`, `Tab`, `Escape`, `Backspace`, `Delete`
71
+ - Arrows: `ArrowUp`, `ArrowDown`, `ArrowLeft`, `ArrowRight`
72
+ - Modifiers: `Shift`, `Control`, `Alt`, `Meta`
73
+ - Function: `F1`-`F12`
74
+ - Special: `Home`, `End`, `PageUp`, `PageDown`, `Insert`
75
+
76
+ **Examples:**
77
+ ```bash
78
+ # Single action
79
+ agent-browser act click:button_0
80
+
81
+ # Multiple actions (executed in order)
82
+ agent-browser act click:input_0 type:input_0:hello press:Enter
83
+
84
+ # Text with spaces (use quotes)
85
+ agent-browser act type:input_0:"hello world"
86
+
87
+ # Form fill and submit
88
+ agent-browser act \
89
+ type:input_0:user@example.com \
90
+ type:input_1:password123 \
91
+ click:button_0
92
+ ```
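Action strings can also be assembled in a script before being passed to `act`. The sketch below is illustrative only: `click_action` and `type_action` are hypothetical helpers, not part of `agent-browser`, and they simply print strings in the documented `action:ref[:value]` form.

```bash
# Hypothetical helpers (not part of agent-browser): build action strings
# in the documented `action:ref[:value]` form.
click_action() {
  # $1 = element ref, e.g. button_0
  printf 'click:%s\n' "$1"
}

type_action() {
  # $1 = element ref, $2 = text to type (may contain spaces)
  printf 'type:%s:%s\n' "$1" "$2"
}

# Print the actions for a simple form fill.
type_action input_0 "hello world"
click_action button_0
```

To use the helpers, quote each substitution so that text with spaces stays a single argument: `agent-browser act "$(type_action input_0 'hello world')" "$(click_action button_0)"`.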
93
+
94
+ ---
95
+
96
+ ### `wait`
97
+
98
+ Wait for a condition on the page.
99
+
100
+ ```bash
101
+ agent-browser wait [options]
102
+ ```
103
+
104
+ **Options:**
105
+ | Flag | Description |
106
+ |------|-------------|
107
+ | `--text <string>` | Wait for text to appear |
108
+ | `--selector <css>` | Wait for element to exist |
109
+ | `--url <pattern>` | Wait for URL to match (substring) |
110
+ | `--not-text <string>` | Wait for text to disappear |
111
+ | `--not-selector <css>` | Wait for element to disappear |
112
+ | `--timeout <ms>` | Timeout in milliseconds (default: 30000) |
113
+ | `--session <name>` | Target session |
114
+ | `--json` | Output as JSON |
115
+
116
+ **Examples:**
117
+ ```bash
118
+ # Wait for text
119
+ agent-browser wait --text "Welcome"
120
+ agent-browser wait --text "Login successful"
121
+
122
+ # Wait for element
123
+ agent-browser wait --selector "#success-message"
124
+ agent-browser wait --selector ".dashboard"
125
+
126
+ # Wait for URL change
127
+ agent-browser wait --url "/dashboard"
128
+ agent-browser wait --url "success=true"
129
+
130
+ # Wait for disappearance (loading states)
131
+ agent-browser wait --not-text "Loading..."
132
+ agent-browser wait --not-selector ".spinner"
133
+
134
+ # Custom timeout
135
+ agent-browser wait --text "Done" --timeout 60000
136
+ ```
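Waits are often chained, for example waiting for a loading indicator to disappear and then for confirmation text. A minimal sketch, assuming the flags behave as listed in the table above; `wait_loaded` and the `AB` variable are hypothetical names introduced here so the CLI can be swapped out for a dry run.

```bash
# AB names the CLI to run; set AB=echo to print the commands instead.
AB="${AB:-agent-browser}"

# wait_loaded is a hypothetical helper, not part of agent-browser.
wait_loaded() {
  # $1 = CSS selector of the loading indicator
  # $2 = text expected once loading has finished
  # $3 = optional timeout in milliseconds (documented default: 30000)
  timeout=${3:-30000}
  "$AB" wait --not-selector "$1" --timeout "$timeout" &&
    "$AB" wait --text "$2" --timeout "$timeout"
}
```

For example, `wait_loaded ".spinner" "Done" 60000` runs the two waits in order and stops at the first one that fails.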
137
+
138
+ ---
139
+
140
+ ### `state`
141
+
142
+ Get the current page state.
143
+
144
+ ```bash
145
+ agent-browser state [options]
146
+ ```
147
+
148
+ **Options:**
149
+ | Flag | Description |
150
+ |------|-------------|
151
+ | `--session <name>` | Target session |
152
+ | `--json` | Output as JSON |
153
+
154
+ **Output includes:**
155
+ - Current URL and page title
156
+ - Tab count
157
+ - Scroll position (pixels above/below viewport)
158
+ - Interactive elements with refs, types, labels, values
159
+ - Console errors
160
+ - Network errors (4xx/5xx responses)
161
+
162
+ **Example output:**
163
+ ```
164
+ URL: http://localhost:3000/login
165
+ Title: Login - MyApp
166
+ Tabs: 1
167
+
168
+ Scroll: 0px above, 250px below
169
+
170
+ Interactive Elements:
171
+ [0] ref=input_0 textbox "Email" (placeholder="Enter email")
172
+ [1] ref=input_1 textbox "Password" (type="password")
173
+ [2] ref=checkbox_0 checkbox "Remember me"
174
+ [3] ref=button_0 button "Sign In"
175
+ [4] ref=link_0 link "Forgot password?" (href="/forgot")
176
+
177
+ Errors:
178
+ Console:
179
+ - [error] Failed to load resource: 404
180
+ Network:
181
+ - 404 GET /api/config
182
+ ```
183
+
184
+ ---
185
+
186
+ ### `screenshot`
187
+
188
+ Capture a screenshot of the current page.
189
+
190
+ ```bash
191
+ agent-browser screenshot [options]
192
+ ```
193
+
194
+ **Options:**
195
+ | Flag | Description |
196
+ |------|-------------|
197
+ | `--output, -o <path>` | Save to file (PNG) instead of base64 output |
198
+ | `--full-page` | Capture full scrollable page |
199
+ | `--session <name>` | Target session |
200
+
201
+ **Examples:**
202
+ ```bash
203
+ # Save to file
204
+ agent-browser screenshot -o screenshot.png
205
+
206
+ # Full page screenshot
207
+ agent-browser screenshot --full-page -o full.png
208
+
209
+ # Output base64 (for piping or programmatic use)
210
+ agent-browser screenshot
211
+ ```
212
+
213
+ ---
214
+
215
+ ### `close`
216
+
217
+ Close browser and stop daemon.
218
+
219
+ ```bash
220
+ agent-browser close [options]
221
+ ```
222
+
223
+ **Options:**
224
+ | Flag | Description |
225
+ |------|-------------|
226
+ | `--session <name>` | Close specific session only |
227
+
228
+ ---
229
+
230
+ ### `status`
231
+
232
+ Check if daemon is running.
233
+
234
+ ```bash
235
+ agent-browser status
236
+ ```
237
+
238
+ ---
239
+
240
+ ### `setup`
241
+
242
+ Install Playwright browser and AI agent skill files.
243
+
244
+ ```bash
245
+ agent-browser setup [options]
246
+ ```
247
+
248
+ **Options:**
249
+ | Flag | Description |
250
+ |------|-------------|
251
+ | `--skip-skill` | Skip installing skill files |
252
+ | `--target <dir>` | Target directory for skill files (default: cwd) |
253
+
254
+ Run this once after installing the package. Installs:
255
+ 1. Playwright Chromium browser
256
+ 2. Skill files to `.claude/skills/agent-browser-loop/`
257
+
258
+ ---
259
+
260
+ ### `server`
261
+
262
+ Start HTTP server mode for multi-session scenarios.
263
+
264
+ ```bash
265
+ agent-browser server [options]
266
+ ```
267
+
268
+ **Options:**
269
+ | Flag | Description |
270
+ |------|-------------|
271
+ | `--port <number>` | Port number (default: 3790) |
272
+ | `--headed` | Show browser windows |
273
+ | `--viewport <WxH>` | Default viewport size |
274
+
275
+ The server provides a REST API at `http://localhost:3790` by default (change the port with `--port`). The OpenAPI spec is available at `GET /openapi.json`.
276
+
277
+ ---
278
+
279
+ ---
280
+
281
+ ## Element References
282
+
283
+ Elements are identified by type-prefixed refs that remain stable within a session unless the DOM changes:
284
+
285
+ | Prefix | Element Type |
286
+ |--------|--------------|
287
+ | `button_N` | Buttons (`<button>`, `[role="button"]`, etc.) |
288
+ | `input_N` | Text inputs, textareas |
289
+ | `link_N` | Links (`<a>` with href) |
290
+ | `checkbox_N` | Checkboxes |
291
+ | `radio_N` | Radio buttons |
292
+ | `select_N` | Select dropdowns |
293
+ | `option_N` | Select options |
294
+ | `img_N` | Images with click handlers |
295
+ | `generic_N` | Other interactive elements |
296
+
297
+ **Note:** Refs may change after DOM updates. Always re-fetch state if actions fail with "element not found".
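When re-fetching state, it can help to pull out just the refs that currently exist. The sketch below assumes the text output looks like the example shown under `state`, with each element carrying a `ref=<prefix>_<N>` token; `list_refs` is a hypothetical helper, not part of `agent-browser`.

```bash
# list_refs (hypothetical): read text state output on stdin and print one
# ref per line. Assumes refs appear as `ref=<prefix>_<N>`.
list_refs() {
  grep -o 'ref=[a-z][a-z]*_[0-9][0-9]*' | cut -d= -f2
}

# Sample lines copied from the example state output.
printf '%s\n' \
  '[0] ref=input_0 textbox "Email" (placeholder="Enter email")' \
  '[3] ref=button_0 button "Sign In"' | list_refs
```

Against a live session the same filter would be used as `agent-browser state | list_refs`.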
298
+
299
+ ---
300
+
301
+ ## Global Options
302
+
303
+ These options work with most commands:
304
+
305
+ | Flag | Description |
306
+ |------|-------------|
307
+ | `--session <name>` | Named session for isolation |
308
+ | `--json` | JSON output format |
309
+ | `--help` | Show help |
310
+
311
+ ---
312
+
313
+ ## Exit Codes
314
+
315
+ | Code | Meaning |
316
+ |------|---------|
317
+ | 0 | Success |
318
+ | 1 | Error (action failed, timeout, daemon not running, etc.) |
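
Because every failure maps to exit code 1, a script can branch on the exit status alone. A minimal sketch: `run_step` is a hypothetical wrapper, not part of `agent-browser`, shown here with shell built-ins so it runs without a browser.

```bash
# run_step (hypothetical): run a command and, on a non-zero exit code,
# log the failure to stderr and return the same code.
run_step() {
  if "$@"; then
    return 0
  else
    status=$?
    echo "step failed (exit $status): $*" >&2
    return "$status"
  fi
}

# Demonstrated with built-ins instead of agent-browser commands.
run_step true && echo "ok"
run_step false || echo "stopped after failure"
```

In practice a step might look like `run_step agent-browser wait --text "Dashboard" || agent-browser state`, which prints the page state when the wait fails.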
319
+
320
+ ---
321
+
322
+ ## Environment Variables
323
+
324
+ | Variable | Description |
325
+ |----------|-------------|
326
+ | `AGENT_BROWSER_SOCKET` | Custom socket path for daemon |
327
+ | `DEBUG` | Enable debug logging |
328
+
329
+ ---
330
+
331
+ ## Examples
332
+
333
+ ### Login Flow
334
+ ```bash
335
+ agent-browser open http://localhost:3000/login
336
+ agent-browser act type:input_0:user@test.com type:input_1:secret
337
+ agent-browser act click:button_0
338
+ agent-browser wait --text "Dashboard"
339
+ agent-browser close
340
+ ```
341
+
342
+ ### Form Validation Testing
343
+ ```bash
344
+ agent-browser open http://localhost:3000/signup --headed
345
+ agent-browser act click:button_0 # Submit empty form
346
+ agent-browser wait --text "Email is required"
347
+ agent-browser state # Check error states
348
+ agent-browser close
349
+ ```
350
+
351
+ ### Navigation Testing
352
+ ```bash
353
+ agent-browser open http://localhost:3000
354
+ agent-browser act click:link_0
355
+ agent-browser wait --url "/about"
356
+ agent-browser act back
357
+ agent-browser wait --url "/" # --url is a substring match, so "/" matches any URL
358
+ agent-browser close
359
+ ```
360
+
361
+ ### Multiple Sessions
362
+ ```bash
363
+ # Session A: Admin user
364
+ agent-browser open http://localhost:3000/login --session admin
365
+ agent-browser act type:input_0:admin@test.com --session admin
366
+
367
+ # Session B: Regular user
368
+ agent-browser open http://localhost:3000/login --session user
369
+ agent-browser act type:input_0:user@test.com --session user
370
+
371
+ # Close both
372
+ agent-browser close --session admin
373
+ agent-browser close --session user
374
+ ```
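
When several sessions need the same command, the repetition can be folded into a loop. This is a sketch under assumptions: `each_session` and the `AB` variable are hypothetical names, and `--session` is placed after the subcommand arguments as in the examples above.

```bash
# AB names the CLI to run; set AB=echo to print the commands instead.
AB="${AB:-agent-browser}"

# each_session (hypothetical): run one subcommand against several named
# sessions, stopping at the first failure.
each_session() {
  # $1 = space-separated session names, remaining args = subcommand
  sessions=$1
  shift
  for name in $sessions; do
    "$AB" "$@" --session "$name" || return $?
  done
}
```

For example, `each_session "admin user" close` closes both sessions from the example above.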
@@ -0,0 +1,211 @@
1
+ ---
2
+ name: agent-browser-loop
3
+ description: Use when an agent must drive a live browser session in a back-and-forth loop (state -> explicit actions -> state) for UI validation, reproducible QA, or debugging UI behavior. Prefer this over one-shot CLI usage when an agent needs inspectable, stepwise control.
4
+ ---
5
+
6
+ # Agent Browser Loop
7
+
8
+ Control a browser via CLI. Execute actions, read state, and verify UI changes in a stepwise loop.
9
+
10
+ ## Quick Start
11
+
12
+ ```bash
13
+ # Open a URL (starts browser daemon automatically)
14
+ agent-browser open http://localhost:3000
15
+
16
+ # Interact and verify
17
+ agent-browser act click:button_0
18
+ agent-browser wait --text "Success"
19
+ agent-browser state
20
+
21
+ # Close when done
22
+ agent-browser close
23
+ ```
24
+
25
+ Use `--headed` to see the browser: `agent-browser open http://localhost:3000 --headed`
26
+
27
+ ## Core Loop
28
+
29
+ 1. **Open**: `agent-browser open <url>` - starts daemon, navigates to URL
30
+ 2. **Act**: `agent-browser act <actions...>` - interact with elements
31
+ 3. **Wait**: `agent-browser wait --text/--selector/--url` - wait for conditions
32
+ 4. **State**: `agent-browser state` - read current page state
33
+ 5. **Repeat** until task complete
34
+ 6. **Close**: `agent-browser close` - stop browser daemon
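
One pass through the steps above can be sketched as a single shell function. `verify_click` and the `AB` variable are hypothetical names introduced for illustration. The function reads state and closes the browser whether or not the action and wait succeed.

```bash
# AB names the CLI to run; set AB=echo to print the commands instead.
AB="${AB:-agent-browser}"

# verify_click (hypothetical): open a URL, click one element, wait for
# text, print the resulting state, then close.
verify_click() {
  # $1 = URL, $2 = element ref, $3 = text expected after the click
  url=$1
  ref=$2
  expected=$3
  "$AB" open "$url" || return 1
  if "$AB" act "click:$ref" && "$AB" wait --text "$expected"; then
    result=0
  else
    result=1
  fi
  "$AB" state   # read state either way to see what happened
  "$AB" close   # always stop the browser daemon
  return "$result"
}
```

A call such as `verify_click http://localhost:3000 button_0 "Success"` mirrors the Quick Start sequence and returns non-zero if the click or the wait fails.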
35
+
36
+ ## Commands
37
+
38
+ | Command | Purpose |
39
+ |---------|---------|
40
+ | `open <url>` | Open URL (starts daemon if needed) |
41
+ | `act <actions...>` | Execute actions |
42
+ | `wait` | Wait for conditions |
43
+ | `state` | Get current page state |
44
+ | `screenshot` | Capture screenshot |
45
+ | `close` | Close browser and daemon |
46
+ | `status` | Check if daemon is running |
47
+
48
+ ## Action Syntax
49
+
50
+ Actions use the format `action:target` or `action:target:value`:
51
+
52
+ ```bash
53
+ # Navigation
54
+ agent-browser act navigate:http://localhost:3000
55
+
56
+ # Click elements
57
+ agent-browser act click:button_0
58
+ agent-browser act click:link_2
59
+
60
+ # Type into inputs
61
+ agent-browser act type:input_0:hello
62
+ agent-browser act type:input_1:"text with spaces"
63
+
64
+ # Keyboard
65
+ agent-browser act press:Enter
66
+ agent-browser act press:Tab
67
+
68
+ # Scroll
69
+ agent-browser act scroll:down
70
+ agent-browser act scroll:up:500
71
+
72
+ # Multiple actions
73
+ agent-browser act click:input_0 type:input_0:hello press:Enter
74
+ ```
75
+
76
+ ## Wait Conditions
77
+
78
+ ```bash
79
+ # Wait for text
80
+ agent-browser wait --text "Welcome"
81
+
82
+ # Wait for element
83
+ agent-browser wait --selector "#success"
84
+
85
+ # Wait for URL
86
+ agent-browser wait --url "/dashboard"
87
+
88
+ # Wait for disappearance
89
+ agent-browser wait --not-text "Loading..."
90
+ agent-browser wait --not-selector ".spinner"
91
+
92
+ # Custom timeout (default 30s)
93
+ agent-browser wait --text "Done" --timeout 60000
94
+ ```
95
+
96
+ ## Element References
97
+
98
+ State includes interactive elements with stable refs:
99
+
100
+ ```
101
+ Interactive Elements:
102
+ [0] ref=input_0 textbox "Email" (placeholder="Enter email")
103
+ [1] ref=input_1 textbox "Password" (type="password")
104
+ [2] ref=button_0 button "Sign In"
105
+ [3] ref=link_0 link "Forgot password?" (href="/forgot")
106
+ ```
107
+
108
+ **Use `ref` values in actions**: `click:button_0`, `type:input_0:hello`
109
+
110
+ Refs are type-prefixed (for example `button_`, `input_`, `link_`, `checkbox_`, `select_`) and stable within a session, although DOM updates can change them.
111
+
112
+ ## Reading State
113
+
114
+ State includes:
115
+ - Current URL and title
116
+ - Scroll position
117
+ - Interactive elements with values
118
+ - Console and network errors
119
+
120
+ ```
121
+ URL: http://localhost:3000/login
122
+ Title: Login
123
+ Tabs: 1
124
+
125
+ Scroll: 0px above, 500px below
126
+
127
+ Interactive Elements:
128
+ [0] ref=input_0 textbox "Email" value="user@test.com"
129
+ [1] ref=input_1 textbox "Password" (type="password")
130
+ [2] ref=checkbox_0 checkbox "Remember me" (checked="true")
131
+ [3] ref=button_0 button "Sign In"
132
+
133
+ Errors:
134
+ Console:
135
+ - [error] Failed to load resource: 404
136
+ Network:
137
+ - 404 GET /api/user
138
+ ```
139
+
140
+ ## Complete Example: Login Flow
141
+
142
+ ```bash
143
+ # 1. Open login page
144
+ agent-browser open http://localhost:3000/login
145
+
146
+ # 2. Fill form and submit
147
+ agent-browser act \
148
+ type:input_0:user@example.com \
149
+ type:input_1:password123 \
150
+ click:button_0
151
+
152
+ # 3. Wait for login to complete
153
+ agent-browser wait --text "Welcome" --timeout 5000
154
+
155
+ # 4. Verify state
156
+ agent-browser state
157
+
158
+ # 5. Close when done
159
+ agent-browser close
160
+ ```
161
+
162
+ ## Options
163
+
164
+ ```bash
165
+ # Headed mode (visible browser)
166
+ agent-browser open http://localhost:3000 --headed
167
+
168
+ # Named session
169
+ agent-browser open http://localhost:3000 --session my-test
170
+ agent-browser act click:button_0 --session my-test
171
+
172
+ # JSON output
173
+ agent-browser state --json
174
+
175
+ # Skip state in response
176
+ agent-browser act click:button_0 --no-state
177
+ ```
178
+
179
+ ## Screenshots
180
+
181
+ ```bash
182
+ agent-browser screenshot -o screenshot.png # Save to file
183
+ agent-browser screenshot --full-page -o full.png # Full scrollable page
184
+ agent-browser screenshot # Output base64
185
+ ```
186
+
187
+ Use screenshots when the text state isn't enough to diagnose visual issues.
188
+
189
+ ## Debugging Tips
190
+
191
+ 1. **Action does nothing?** Check errors in state output
192
+ 2. **Element not found?** Run `agent-browser state` to see current refs
193
+ 3. **Waiting times out?** Check exact text/selector, try simpler condition
194
+ 4. **Need visual check?** Use `--headed` or `agent-browser screenshot`
195
+ 5. **Refs changed?** DOM updates can change refs - re-fetch state
196
+
197
+ ## HTTP Server Mode
198
+
199
+ For multi-session scenarios or HTTP-based integrations:
200
+
201
+ ```bash
202
+ # Start HTTP server
203
+ agent-browser server --headed
204
+
205
+ # Server at http://localhost:3790
206
+ # Full API spec at GET /openapi.json
207
+ ```
208
+
209
+ ## Full Reference
210
+
211
+ See REFERENCE.md for complete CLI documentation.
package/LICENSE ADDED
@@ -0,0 +1,9 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Jason Silberman
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
6
+
7
+ The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
8
+
9
+ THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.