ai-agent-browser 0.1.3__py3-none-any.whl

@@ -0,0 +1,389 @@
1
+ Metadata-Version: 2.4
2
+ Name: ai-agent-browser
3
+ Version: 0.1.3
4
+ Summary: A robust browser automation tool for AI agents - control browsers via CLI or IPC
5
+ Author: Agent Browser Contributors
6
+ License: GPL-3.0-only
7
+ Project-URL: Homepage, https://github.com/abhinav-nigam/agent-browser
8
+ Project-URL: Documentation, https://github.com/abhinav-nigam/agent-browser#readme
9
+ Project-URL: Repository, https://github.com/abhinav-nigam/agent-browser
10
+ Project-URL: Issues, https://github.com/abhinav-nigam/agent-browser/issues
11
+ Keywords: browser,automation,playwright,testing,ai,agent,mcp,claude-code,autonomous-agents,aider,llm-agent,ai-agent,browser-automation-cli,gpt-control,file-ipc,web-scraping-cli,headless-browser,cli-automation,llm-tools,ui-testing,browser-testing,agentic-testing,playwright-cli
12
+ Classifier: Development Status :: 4 - Beta
13
+ Classifier: Environment :: Console
14
+ Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
15
+ Classifier: Intended Audience :: Developers
16
+ Classifier: Operating System :: OS Independent
17
+ Classifier: Programming Language :: Python :: 3
18
+ Classifier: Programming Language :: Python :: 3.9
19
+ Classifier: Programming Language :: Python :: 3.10
20
+ Classifier: Programming Language :: Python :: 3.11
21
+ Classifier: Programming Language :: Python :: 3.12
22
+ Classifier: Topic :: Software Development :: Testing
23
+ Classifier: Topic :: Internet :: WWW/HTTP :: Browsers
24
+ Requires-Python: >=3.9
25
+ Description-Content-Type: text/markdown
26
+ License-File: LICENSE
27
+ Requires-Dist: playwright>=1.40.0
28
+ Requires-Dist: pillow>=10.0.0
29
+ Provides-Extra: dev
30
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
31
+ Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
32
+ Dynamic: license-file
33
+
34
+ # agent-browser
35
+
36
+ A robust browser automation tool designed for AI agents to control browsers via CLI commands.
37
+
38
+ [![PyPI version](https://badge.fury.io/py/ai-agent-browser.svg)](https://badge.fury.io/py/ai-agent-browser)
39
+ [![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
40
+ [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
41
+
42
+ ## 🎬 Feature Showcase
43
+
44
+ | **The Researcher (Claude)** | **The Architect (Gemini)** | **The Data Op (Interpreter)** |
45
+ | :--- | :--- | :--- |
46
+ | ![Claude Demo](demo_claude_research.gif) | ![Gemini Demo](demo_gemini_audit.gif) | ![Interpreter Demo](demo_interpreter_data.gif) |
47
+ | *Autonomous research & data extraction.* | *Cross-page architectural audits.* | *Complex table scraping to structured data.* |
48
+
49
+ ## How to use this with Claude Code / Aider / ChatGPT
50
+
51
+ Copy-paste this prompt to let your AI pair-programmer drive `agent-browser` safely:
52
+
53
+ ```
54
+ You can run shell commands on my machine. Use `agent-browser start <url> --session <name>` to launch a browser, then `agent-browser cmd <action> --session <name>` for steps like `screenshot`, `click`, `fill`, `assert_visible`, and `wait_for`. Keep sessions isolated by always passing `--session <name>` and stop them with `agent-browser stop --session <name>` when done. Screenshots land in ./screenshots. Avoid writing outside the project; use relative paths only. If you need to upload a file, ask me for the path first.
55
+ ```
56
+
57
+ ## Why This Exists
58
+
59
+ AI agents (like Claude Code, Codex, GPT-based tools) need to interact with web applications for testing and automation. However, most browser automation tools require:
60
+ - Programmatic API access within a running process
61
+ - Complex async/await patterns
62
+ - Persistent connections
63
+
64
+ **agent-browser** solves this by providing:
65
+ - **Simple CLI commands** - Any process that can run shell commands can control a browser
66
+ - **File-based IPC** - Stateless CLI commands control a stateful browser session
67
+ - **Multi-session support** - Run multiple browser sessions concurrently
68
+ - **Built for AI** - Screenshots auto-resize for vision models, assertions return clear PASS/FAIL
69
+
70
+ ## Installation
71
+
72
+ ```bash
73
+ pip install ai-agent-browser
74
+ playwright install chromium
75
+ ```
76
+
77
+ ## Quick Start
78
+
79
+ ```bash
80
+ # Terminal 1: Start browser (blocks while running)
81
+ agent-browser start http://localhost:8080
82
+
83
+ # Terminal 2: Send commands
84
+ agent-browser cmd screenshot home
85
+ agent-browser cmd click "button[type='submit']"
86
+ agent-browser cmd fill "#email" "test@example.com"
87
+ agent-browser cmd assert_visible ".success-message"
88
+
89
+ # When done
90
+ agent-browser stop
91
+ ```
92
+
93
+ ## Security Features
94
+
95
+ - Path traversal protection on file paths (screenshots, uploads) to keep writes inside allowed directories.
96
+ - Session isolation via explicit `--session` flags so concurrent agents stay sandboxed from each other.
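The path traversal protection described above can be illustrated with a minimal sketch. This is not the package's actual implementation; the helper name `resolve_inside` and its behavior are assumptions made for illustration, using only `pathlib` from the standard library.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve user_path against base_dir and reject anything that escapes it.

    Hypothetical helper: the real checks in agent-browser may differ.
    """
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base, so the check below
    # also rejects absolute paths that point outside base_dir.
    candidate = (base / user_path).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"path escapes {base}: {user_path!r}") from None
    return candidate
```

Resolving both paths before comparing them means `..` segments and symlinks are normalized first, so a string prefix check is never needed.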
97
+
98
+ ## Architecture
99
+
100
+ ```
+ +-------------------+      +----------------------+      +------------------+
+ |  AI Agent / LLM   | <--> |   CLI + IPC files    | <--> |   Browser (PW)   |
+ |  (Claude, Codex)  |      |  cmd.json / result   |      | Chromium/Playwr. |
+ +-------------------+      +----------------------+      +------------------+
+ ```
106
+
107
+ The browser runs in one long-lived process and listens for commands delivered as JSON files. Each CLI command writes to `cmd.json`; the browser process executes it and writes the result to `result.json`. Because the two sides share only files, any process that can run a shell command can control the browser.
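The file-based request/response loop can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the package's code: the JSON keys (`command`, `output`), the atomic-rename write, and the helper names are all hypothetical, and a thread stands in for the browser process.

```python
import json
import os
import tempfile
import threading
import time
from pathlib import Path


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write to a sibling temp file, then rename, so readers never see a partial file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(payload))
    os.replace(tmp, path)


def send_command(ipc_dir: Path, command: str, timeout: float = 10.0) -> str:
    """Client side: write cmd.json, then poll for result.json until the timeout."""
    result_file = ipc_dir / "result.json"
    result_file.unlink(missing_ok=True)  # drop any stale result first
    write_json_atomic(ipc_dir / "cmd.json", {"command": command})  # assumed schema
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if result_file.exists():
            result = json.loads(result_file.read_text())
            result_file.unlink()
            return result["output"]  # assumed schema
        time.sleep(0.05)
    raise TimeoutError("Timeout waiting for result")


def serve_once(ipc_dir: Path) -> None:
    """Stand-in for the browser process: answer one command, then exit."""
    cmd_file = ipc_dir / "cmd.json"
    while not cmd_file.exists():
        time.sleep(0.01)
    command = json.loads(cmd_file.read_text())["command"]
    cmd_file.unlink()
    write_json_atomic(ipc_dir / "result.json", {"output": f"ok: {command}"})


def demo() -> str:
    """Run one request/response round trip in a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        ipc_dir = Path(tmp)
        worker = threading.Thread(target=serve_once, args=(ipc_dir,))
        worker.start()
        reply = send_command(ipc_dir, "screenshot home", timeout=5.0)
        worker.join()
        return reply
```

The client holds no connection and keeps no state between calls, which is what lets each CLI invocation be a separate short-lived process.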
108
+
109
+ ## Command Reference
110
+
111
+ ### Browser Control
112
+
113
+ | Command | Description | Example |
114
+ |---------|-------------|---------|
115
+ | `start <url>` | Start browser session (blocks) | `agent-browser start http://localhost:8080` |
116
+ | `start <url> --visible` | Start in headed mode | `agent-browser start http://localhost:8080 --visible` |
117
+ | `stop` | Close browser | `agent-browser stop` |
118
+ | `status` | Check if browser is running | `agent-browser status` |
119
+ | `cmd reload` | Reload current page | `agent-browser cmd reload` |
120
+ | `cmd goto <url>` | Navigate to URL | `agent-browser cmd goto http://example.com` |
121
+ | `cmd back` | Navigate back | `agent-browser cmd back` |
122
+ | `cmd forward` | Navigate forward | `agent-browser cmd forward` |
123
+ | `cmd url` | Print current URL | `agent-browser cmd url` |
124
+ | `cmd viewport <w> <h>` | Set viewport size | `agent-browser cmd viewport 1920 1080` |
125
+
126
+ ### Screenshots
127
+
128
+ | Command | Description | Example |
129
+ |---------|-------------|---------|
130
+ | `cmd screenshot [name]` | Full-page screenshot | `agent-browser cmd screenshot checkout_page` |
131
+ | `cmd screenshot viewport [name]` | Viewport only (faster) | `agent-browser cmd screenshot viewport header` |
132
+ | `cmd ss [name]` | Alias for screenshot | `agent-browser cmd ss step1` |
133
+
134
+ Screenshots are automatically resized to a maximum of 1500x1500 pixels for AI vision model compatibility.
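As a rough illustration of the size limit, the sketch below computes the dimensions an image would have after being shrunk to fit a 1500x1500 box. It assumes the aspect ratio is preserved and that smaller images are left untouched; the README does not confirm either detail, and `fit_within` is a hypothetical name.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_side: int = 1500) -> Tuple[int, int]:
    """Return (width, height) scaled down to fit max_side x max_side.

    Assumption: aspect ratio is kept and images are never upscaled.
    """
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    scale = min(max_side / width, max_side / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

A tall full-page capture of 1280x6000 would come out at 320x1500 under these assumptions, which is one reason viewport screenshots can be easier for a vision model to read.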
135
+
136
+ ### Interactions
137
+
138
+ | Command | Description | Example |
139
+ |---------|-------------|---------|
140
+ | `cmd click <selector>` | Click element | `agent-browser cmd click "#submit-btn"` |
141
+ | `cmd click_nth <selector> <n>` | Click nth element (0-indexed) | `agent-browser cmd click_nth ".item" 2` |
142
+ | `cmd fill <selector> <text>` | Fill input field | `agent-browser cmd fill "#email" "test@example.com"` |
143
+ | `cmd type <selector> <text>` | Type with key events | `agent-browser cmd type "#search" "query"` |
144
+ | `cmd select <selector> <value>` | Select dropdown option | `agent-browser cmd select "#country" "US"` |
145
+ | `cmd press <key>` | Press keyboard key | `agent-browser cmd press Enter` |
146
+ | `cmd scroll <direction>` | Scroll page | `agent-browser cmd scroll down` |
147
+ | `cmd hover <selector>` | Hover over element | `agent-browser cmd hover ".tooltip-trigger"` |
148
+ | `cmd focus <selector>` | Focus element | `agent-browser cmd focus "#input"` |
149
+ | `cmd upload <selector> <path>` | Upload file | `agent-browser cmd upload "#file" ./doc.pdf` |
150
+ | `cmd dialog <action> [text]` | Handle dialog | `agent-browser cmd dialog accept` |
151
+ | `cmd clear` | Clear localStorage/sessionStorage | `agent-browser cmd clear` |
152
+
153
+ **Scroll directions:** `up`, `down`, `top`, `bottom`, `left`, `right`
154
+
155
+ **Dialog actions:** `accept`, `dismiss`, `accept <prompt_text>`
156
+
157
+ ### Assertions
158
+
159
+ Every assertion prints its result with a `[PASS]` or `[FAIL]` prefix for easy parsing.
160
+
161
+ | Command | Description | Example |
162
+ |---------|-------------|---------|
163
+ | `cmd assert_visible <selector>` | Element is visible | `agent-browser cmd assert_visible ".modal"` |
164
+ | `cmd assert_hidden <selector>` | Element is hidden | `agent-browser cmd assert_hidden ".loading"` |
165
+ | `cmd assert_text <selector> <text>` | Element contains text | `agent-browser cmd assert_text ".msg" "Success"` |
166
+ | `cmd assert_text_exact <sel> <text>` | Text matches exactly | `agent-browser cmd assert_text_exact ".count" "42"` |
167
+ | `cmd assert_value <selector> <value>` | Input has value | `agent-browser cmd assert_value "#email" "test@example.com"` |
168
+ | `cmd assert_checked <selector>` | Checkbox is checked | `agent-browser cmd assert_checked "#agree"` |
169
+ | `cmd assert_url <pattern>` | URL contains pattern | `agent-browser cmd assert_url "/dashboard"` |
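Because every assertion result starts with `[PASS]` or `[FAIL]`, a caller can turn the CLI output into a boolean with very little code. The sketch below is one way to do that; the `AssertionResult` type and the sample message text are hypothetical, since the README only documents the prefixes.

```python
from typing import NamedTuple


class AssertionResult(NamedTuple):
    passed: bool
    message: str


def parse_assertion(output: str) -> AssertionResult:
    """Split an assertion line into its verdict and the remaining message."""
    text = output.strip()
    for prefix, passed in (("[PASS]", True), ("[FAIL]", False)):
        if text.startswith(prefix):
            return AssertionResult(passed, text[len(prefix):].strip())
    # Anything else (for example a timeout message) is not an assertion verdict.
    raise ValueError(f"not an assertion result: {output!r}")
```

Raising on unrecognized output keeps an IPC error from being silently counted as a failed assertion.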
170
+
171
+ ### Data Extraction
172
+
173
+ | Command | Description | Example |
174
+ |---------|-------------|---------|
175
+ | `cmd text <selector>` | Get text content | `agent-browser cmd text ".title"` |
176
+ | `cmd value <selector>` | Get input value | `agent-browser cmd value "#email"` |
177
+ | `cmd attr <selector> <attr>` | Get attribute | `agent-browser cmd attr "a" "href"` |
178
+ | `cmd count <selector>` | Count matching elements | `agent-browser cmd count ".item"` |
179
+ | `cmd eval <javascript>` | Execute JavaScript | `agent-browser cmd eval "document.title"` |
180
+ | `cmd cookies` | Get all cookies (JSON) | `agent-browser cmd cookies` |
181
+ | `cmd storage` | Get localStorage (JSON) | `agent-browser cmd storage` |
182
+
183
+ ### Debugging
184
+
185
+ | Command | Description | Example |
186
+ |---------|-------------|---------|
187
+ | `cmd console` | View JS console logs | `agent-browser cmd console` |
188
+ | `cmd network` | View network requests | `agent-browser cmd network` |
189
+ | `cmd network_failed` | View failed requests | `agent-browser cmd network_failed` |
190
+ | `cmd clear_logs` | Clear console/network logs | `agent-browser cmd clear_logs` |
191
+ | `cmd wait <ms>` | Wait milliseconds | `agent-browser cmd wait 2000` |
192
+ | `cmd wait_for <selector> [ms]` | Wait for element | `agent-browser cmd wait_for ".loaded" 15000` |
193
+ | `cmd wait_for_text <text>` | Wait for text | `agent-browser cmd wait_for_text "Complete"` |
194
+ | `cmd help` | Show help | `agent-browser cmd help` |
195
+
196
+ ### Flag Tips
197
+
198
+ - `cmd --timeout <seconds>` overrides the IPC wait when sending commands (e.g., `agent-browser cmd --timeout 30 wait_for ".loaded" 20000`).
199
+ - `interact --headless` runs the interactive REPL without opening a visible browser window (e.g., `agent-browser interact http://localhost:8080 --headless`).
200
+
201
+ ## Session Management
202
+
203
+ Run multiple browser sessions concurrently using session IDs:
204
+
205
+ ```bash
206
+ # Start two sessions
207
+ agent-browser start http://localhost:8080 --session app1
208
+ agent-browser start http://localhost:9090 --session app2
209
+
210
+ # Send commands to specific sessions
211
+ agent-browser cmd screenshot home --session app1
212
+ agent-browser cmd click "#login" --session app2
213
+
214
+ # Check status
215
+ agent-browser status --session app1
216
+
217
+ # Stop specific session
218
+ agent-browser stop --session app1
219
+ ```
220
+
221
+ ## Configuration
222
+
223
+ ### Screenshot Output Directory
224
+
225
+ ```bash
226
+ agent-browser start http://localhost:8080 --output-dir ./my-screenshots
227
+ ```
228
+
229
+ ### Timeouts
230
+
231
+ Default timeouts:
232
+ - **Command timeout:** 5 seconds (click, fill, etc.)
233
+ - **wait_for timeout:** 10 seconds (can override: `wait_for .element 15000`)
234
+ - **IPC timeout:** 10 seconds (waiting for browser response). Increase it with `cmd --timeout <seconds>` if your action needs more time.
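Note that the `wait_for` timeout is given in milliseconds while `--timeout` is given in seconds, so a long wait needs both values raised together. The helper below is a small sketch that assembles the argument list in the same order the README examples use (`--timeout` right after `cmd`, `--session` at the end); `build_cmd_argv` is a hypothetical name, and other flag orderings may also be accepted by the CLI.

```python
from typing import List, Optional, Sequence


def build_cmd_argv(
    action: Sequence[str],
    session: Optional[str] = None,
    timeout: Optional[int] = None,
) -> List[str]:
    """Build an `agent-browser cmd ...` argument list for subprocess.run."""
    argv = ["agent-browser", "cmd"]
    if timeout is not None:
        argv += ["--timeout", str(timeout)]  # IPC wait, in seconds
    argv += list(action)  # e.g. ["wait_for", ".loaded", "20000"], wait in ms
    if session is not None:
        argv += ["--session", session]
    return argv
```

For example, `build_cmd_argv(["wait_for", ".loaded", "20000"], timeout=30)` reproduces the command shown under Flag Tips, where the 30 second IPC wait comfortably exceeds the 20000 ms element wait.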
235
+
236
+ ## Selectors
237
+
238
+ Use standard Playwright/CSS selectors:
239
+
240
+ ```bash
241
+ # CSS selectors
242
+ agent-browser cmd click ".btn-primary"
243
+ agent-browser cmd click "#submit"
244
+ agent-browser cmd click "button[type='submit']"
245
+ agent-browser cmd click "[data-testid='login-btn']"
246
+
247
+ # Text selectors
248
+ agent-browser cmd click "text='Sign In'"
249
+ agent-browser cmd click "text=Submit"
250
+
251
+ # Chained selectors
252
+ agent-browser cmd click ".card >> text='Edit'"
253
+ ```
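Selectors often contain spaces, quotes, and `>>`, all of which a shell would otherwise interpret. When a script has to produce a command line as a single string, the standard library's `shlex.join` can do the quoting. This is a sketch for POSIX shells only, and `to_shell_line` is a hypothetical helper name.

```python
import shlex
from typing import List


def to_shell_line(argv: List[str]) -> str:
    """Quote each argument so a POSIX shell passes it through unchanged."""
    return shlex.join(argv)


# A chained selector with spaces, quotes, and a redirect-like token.
argv = ["agent-browser", "cmd", "click", ".card >> text='Edit'"]
line = to_shell_line(argv)
```

When the caller controls process creation, passing the list directly to `subprocess.run` avoids shell quoting altogether.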
254
+
255
+ ## Interactive Mode
256
+
257
+ For manual testing with AI assistance:
258
+
259
+ ```bash
260
+ agent-browser interact http://localhost:8080
261
+ ```
262
+
263
+ Headless REPL run:
264
+
265
+ ```bash
266
+ agent-browser interact http://localhost:8080 --headless
267
+ ```
268
+
269
+ This starts a REPL where you can type commands directly:
270
+
271
+ ```
272
+ > ss initial
273
+ Screenshot saved: ./screenshots/interactive/step_01_initial.png
274
+ > click #login
275
+ Clicked: #login
276
+ > ss after_login
277
+ Screenshot saved: ./screenshots/interactive/step_02_after_login.png
278
+ > quit
279
+ ```
280
+
281
+ ## Integration with AI Agents
282
+
283
+ ### Claude Code Example
284
+
285
+ ```bash
286
+ # In Claude Code conversation:
287
+ # "Test the login flow on localhost:8080"
288
+
289
+ # Claude runs:
290
+ agent-browser start http://localhost:8080 --session test1 &
291
+ sleep 2
292
+ agent-browser cmd screenshot login_page --session test1
293
+ # Claude analyzes screenshot...
294
+ agent-browser cmd fill "#username" "testuser" --session test1
295
+ agent-browser cmd fill "#password" "testpass" --session test1
296
+ agent-browser cmd click "button[type='submit']" --session test1
297
+ agent-browser cmd wait_for ".dashboard" --session test1
298
+ agent-browser cmd assert_url "/dashboard" --session test1
299
+ agent-browser cmd screenshot success --session test1
300
+ agent-browser stop --session test1
301
+ ```
302
+
303
+ ### Generic LLM Integration
304
+
305
+ ```python
+ import shlex
+ import subprocess
+ import time
+
+ def browser_cmd(cmd: str, session: str = "default") -> str:
+     """Send one command to a running session and return its stdout."""
+     # shlex.split honors shell-style quotes, so a quoted selector
+     # containing spaces stays a single argument
+     result = subprocess.run(
+         ["agent-browser", "cmd", *shlex.split(cmd), "--session", session],
+         capture_output=True, text=True
+     )
+     return result.stdout.strip()
+
+ # Start browser (in separate process)
+ subprocess.Popen(["agent-browser", "start", "http://localhost:8080", "--session", "test"])
+ time.sleep(2)  # give the browser time to start, as in the shell example above
+
+ # Send commands
+ browser_cmd("screenshot initial", "test")
+ browser_cmd("click #login", "test")
+ browser_cmd("assert_visible .dashboard", "test")
+ ```
323
+
324
+ ## File Locations
325
+
326
+ | File | Location | Purpose |
327
+ |------|----------|---------|
328
+ | State | `%TEMP%/agent_browser_{session}_state.json` | Browser running state |
329
+ | Commands | `%TEMP%/agent_browser_{session}_cmd.json` | Pending command |
330
+ | Results | `%TEMP%/agent_browser_{session}_result.json` | Command result |
331
+ | Console logs | `%TEMP%/agent_browser_{session}_console.json` | JS console output |
332
+ | Network logs | `%TEMP%/agent_browser_{session}_network.json` | Network requests |
333
+ | Screenshots | `./screenshots/` (configurable) | Captured screenshots |
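The naming pattern in the table can be reproduced in Python when a script needs to inspect these files directly. The sketch assumes that `%TEMP%` corresponds to the directory returned by `tempfile.gettempdir()` on every platform, which the README does not state; `ipc_paths` is a hypothetical helper, and the `"default"` session name is borrowed from the Generic LLM Integration example rather than confirmed as the CLI default.

```python
import tempfile
from pathlib import Path
from typing import Dict


def ipc_paths(session: str = "default") -> Dict[str, Path]:
    """Map each file kind from the table to its expected path for a session."""
    temp_dir = Path(tempfile.gettempdir())  # assumed equivalent of %TEMP%
    kinds = ("state", "cmd", "result", "console", "network")
    return {kind: temp_dir / f"agent_browser_{session}_{kind}.json" for kind in kinds}
```

Checking whether the `state` path exists is only a hint that a session was started; `agent-browser status` remains the documented way to confirm the browser is alive.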
334
+
335
+ ## Troubleshooting
336
+
337
+ | Problem | Solution |
338
+ |---------|----------|
339
+ | `Timeout waiting for result` | Browser may have crashed - run `status` to check |
340
+ | `Element not found` | Use `count` to verify selector matches elements |
341
+ | `Browser not responding` | Run `status` to ping the browser |
342
+ | `Browser process has died` | State was stale - run `start <url>` to restart |
343
+ | `Complex selector failing` | Use `eval` with JavaScript as fallback |
344
+
345
+ ### Debug Workflow
346
+
347
+ ```bash
348
+ # 1. Check browser status
349
+ agent-browser status
350
+
351
+ # 2. Check for JS errors
352
+ agent-browser cmd console
353
+
354
+ # 3. Check for failed requests
355
+ agent-browser cmd network_failed
356
+
357
+ # 4. Take screenshot to see current state
358
+ agent-browser cmd screenshot debug
359
+
360
+ # 5. Count elements to verify selector
361
+ agent-browser cmd count ".my-selector"
362
+ ```
363
+
364
+ ## Python API
365
+
366
+ You can also use agent-browser as a Python library:
367
+
368
+ ```python
369
+ from agent_browser import BrowserDriver
370
+
371
+ driver = BrowserDriver(session_id="test", output_dir="./screenshots")
372
+
373
+ # Start browser (blocking - run in thread/process)
374
+ # driver.start("http://localhost:8080")
375
+
376
+ # Or send commands to running browser
377
+ result = driver.send_command("screenshot home")
378
+ print(result)
379
+
380
+ status = driver.status() # Returns True if running
381
+ ```
382
+
383
+ ## Contributing
384
+
385
+ See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
386
+
387
+ ## License
388
+
389
+ GNU General Public License v3.0 - see [LICENSE](LICENSE) for details.
@@ -0,0 +1,11 @@
1
+ agent_browser/__init__.py,sha256=EKkQy9ufgBd_4Kjmzyt_jIQO-XAZ37zCyqpQbn01Vi0,409
2
+ agent_browser/driver.py,sha256=xKHtodmMR26cG73DAtmOofZ-wi3q-1XA4ZHv2Eb_mHo,34814
3
+ agent_browser/interactive.py,sha256=TT9XWob-TOZlwW4HMNbMkjIpqZ9Mfaf7jBLaaawfxrg,8201
4
+ agent_browser/main.py,sha256=ETvTR17YB6nSGCIvoDgTMMTiEo8900LhdSqzamAmfpQ,5768
5
+ agent_browser/utils.py,sha256=1L_q69n5Fyq6JAqUGBFNc3XyWlUkPmcIsF8WAVyXB-8,11073
6
+ ai_agent_browser-0.1.3.dist-info/licenses/LICENSE,sha256=_RlBqQWRAsdWmbpyw8uOtd2oZNrirKWVxCyn0rZ-tT0,1083
7
+ ai_agent_browser-0.1.3.dist-info/METADATA,sha256=4Ld9epDgjeB7j24NWQxOpWgoE0I6nvwRKp1rYBlsJhk,15464
8
+ ai_agent_browser-0.1.3.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
9
+ ai_agent_browser-0.1.3.dist-info/entry_points.txt,sha256=ecV0nOuu7SYDw0fEYB8NJ-JkZL7VMCEBcO_uz2kIQbo,58
10
+ ai_agent_browser-0.1.3.dist-info/top_level.txt,sha256=dNG4hD8lHlr8VzLft2BjK6iYu7oEpasb7eROGeAWBlQ,14
11
+ ai_agent_browser-0.1.3.dist-info/RECORD,,
@@ -0,0 +1,5 @@
1
+ Wheel-Version: 1.0
2
+ Generator: setuptools (80.9.0)
3
+ Root-Is-Purelib: true
4
+ Tag: py3-none-any
5
+
@@ -0,0 +1,2 @@
1
+ [console_scripts]
2
+ agent-browser = agent_browser.main:main
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 Agent Browser Contributors
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1 @@
1
+ agent_browser