chrome-agent 0.1.0__tar.gz

@@ -0,0 +1,190 @@
+ Metadata-Version: 2.4
+ Name: chrome-agent
+ Version: 0.1.0
+ Summary: CLI tool for AI agents to observe and interact with Chrome via CDP
+ Keywords: chrome,cdp,devtools,browser,automation,agent
+ Author: Corey Gallon
+ Author-email: Corey Gallon <366332+captivus@users.noreply.github.com>
+ License-Expression: MIT
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Environment :: Console
+ Classifier: Intended Audience :: Developers
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
+ Classifier: Topic :: Software Development :: Testing
+ Classifier: Topic :: Utilities
+ Requires-Dist: playwright>=1.50.0
+ Requires-Python: >=3.11
+ Project-URL: Homepage, https://github.com/captivus/chrome-agent
+ Project-URL: Repository, https://github.com/captivus/chrome-agent
+ Project-URL: Issues, https://github.com/captivus/chrome-agent/issues
+ Description-Content-Type: text/markdown
+
+ # chrome-agent
+
+ [![PyPI version](https://img.shields.io/pypi/v/chrome-agent)](https://pypi.org/project/chrome-agent/)
+ [![Python versions](https://img.shields.io/pypi/pyversions/chrome-agent)](https://pypi.org/project/chrome-agent/)
+ [![License](https://img.shields.io/pypi/l/chrome-agent)](https://github.com/captivus/chrome-agent/blob/main/LICENSE)
+
+ A CLI tool that gives AI coding agents the ability to observe and interact with Chrome browsers.
+
+ Built as a replacement for browser MCP tools. Faster, lower token overhead, and supports something MCP tools can't do: multiple agents sharing the same browser instance.
+
+ ## Why this exists
+
+ AI coding agents need to see and interact with browsers -- to test their code, debug automation, inspect page state. The standard approach (browser MCP tools) uses a persistent server with protocol negotiation and verbose response formatting. `chrome-agent` takes a different approach: each command is a standalone CLI call that connects to Chrome via the DevTools Protocol, does one thing, and disconnects. No server, no session state, no bloat.
+
+ This also enables a workflow that MCP tools can't support: one process drives the browser (your automation code) while a separate agent observes the same browser to diagnose issues and improve the code.
+
+ ## Installation
+
+ ```bash
+ uv tool install chrome-agent
+ playwright install chromium
+ ```
+
+ Or add to a project:
+
+ ```bash
+ uv add chrome-agent
+ uv run playwright install chromium
+ ```
+
+ ## Two ways to use it
+
+ ### Drive mode -- you control the browser
+
+ Launch a browser and interact with it directly. This is the MCP replacement use case.
+
+ ```bash
+ chrome-agent launch &
+ chrome-agent navigate "https://example.com"
+ chrome-agent text                         # Read page content
+ chrome-agent element "h1"                 # Inspect an element
+ chrome-agent fill "#search" "query"       # Fill a form field
+ chrome-agent click "#submit"              # Click a button
+ chrome-agent screenshot /tmp/page.png     # Capture the screen
+ ```
+
+ ### Attach mode -- observe a running browser
+
+ Your automation code launches a browser with `--remote-debugging-port=9222`. You connect to observe what the code is doing, diagnose failures, and figure out what to change.
+
+ ```bash
+ chrome-agent status                       # Is the browser running?
+ chrome-agent url                          # Where is it?
+ chrome-agent element "#submit-btn"        # Why can't the code click this?
+ chrome-agent eval "document.querySelectorAll('.error').length"
+ chrome-agent screenshot                   # What does it look like?
+ ```
+
+ The feedback loop: **write code -> run it -> observe the browser -> diagnose -> modify code -> repeat.**
+
+ ## Commands
+
+ ```
+ chrome-agent [--port PORT] <command> [args...]
+ ```
+
+ ### Check browser status
+
+ ```
+ status                  Check if a browser is running on the CDP port
+ launch                  Launch a browser with CDP enabled
+                         [--fingerprint PATH] [--headless] [--no-pin-desktop]
+ help                    Print command reference
+ ```
+
+ ### Observe (read-only, always safe)
+
+ ```
+ url                     Print current URL and page title
+ screenshot [path]       Save a screenshot (default: /tmp/cdp-screenshot.png)
+ snapshot                Print the ARIA accessibility tree
+ text                    Print visible text content
+ html [selector]         Print page HTML or a specific element's HTML
+ element <selector>      Detailed element inspection (visibility, dimensions,
+                         attributes, position, disabled state)
+ find <selector>         Count and list all matching elements
+ value <selector>        Get an input element's current value
+ eval <code>             Execute JavaScript and print the result
+ cookies                 List all cookies
+ tabs                    List all open tabs/pages
+ wait <target>           Wait for a selector, milliseconds, or load state
+ ```
+
+ ### Navigate
+
+ ```
+ navigate <url>          Go to a URL
+ back                    Browser back
+ forward                 Browser forward
+ reload                  Reload the page
+ ```
+
+ ### Interact
+
+ ```
+ click <selector>        Click an element (JS fallback for hidden elements)
+ fill <selector> <val>   Fill a form field (clears first)
+ type <selector> <txt>   Type text character by character
+ press <key>             Press a keyboard key (Enter, Escape, Tab, etc.)
+ select <sel> <value>    Select a dropdown option
+ check <selector>        Check a checkbox
+ uncheck <selector>      Uncheck a checkbox
+ hover <selector>        Hover over an element
+ scroll <target>         Scroll to element, or scroll up/down
+ clickxy <x> <y>         Click at page coordinates
+ close                   Close the current page
+ viewport <w> <h>        Resize the viewport
+ ```
+
+ ## For AI agents
+
+ The primary user of this tool is an AI coding agent, not a human. See [INSTRUCTIONS.md](INSTRUCTIONS.md) for comprehensive agent instructions covering:
+
+ - Drive mode vs attach mode mental model
+ - Safety rules for shared browser access
+ - The development feedback loop
+ - When to observe vs intervene
+ - Command recipes for common tasks
+ - Failure modes and recovery
+
+ Include the contents of `INSTRUCTIONS.md` in your project's `CLAUDE.md` or agent instructions file.
+
+ ## Browser fingerprinting (optional)
+
+ For sites that detect automated browsers, launch with a fingerprint profile:
+
+ ```bash
+ chrome-agent launch --fingerprint path/to/fingerprint.json
+ ```
+
+ The fingerprint JSON overrides the browser's user agent, viewport, locale, timezone, and platform to match a real desktop browser:
+
+ ```json
+ {
+   "userAgent": "Mozilla/5.0 (X11; Linux x86_64) ...",
+   "platform": "Linux x86_64",
+   "vendor": "Google Inc.",
+   "language": "en-US",
+   "timezone": "America/Chicago",
+   "viewport": {"width": 1920, "height": 1080}
+ }
+ ```
+
+ Without `--fingerprint`, the browser launches with default Chromium settings.
+
+ ## Requirements
+
+ - Python >= 3.11
+ - Playwright >= 1.50.0
+ - Chromium (installed via `playwright install chromium`)
+ - Linux with xdotool (optional, for virtual desktop pinning)
+
+ ## License
+
+ MIT
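
The README's Commands section gives the invocation shape `chrome-agent [--port PORT] <command> [args...]`. For a Python harness that shells out to the CLI, a thin wrapper could look like the sketch below. The helper names `build_argv` and `run_command` are hypothetical and not part of the package, and the sketch assumes `chrome-agent` is on `PATH`.

```python
import subprocess
from typing import List, Optional


def build_argv(command: str, *args: str, port: Optional[int] = None) -> List[str]:
    """Build the argument vector for one chrome-agent call (hypothetical helper)."""
    argv = ["chrome-agent"]
    if port is not None:
        # --port goes before the command, following the README usage line.
        argv += ["--port", str(port)]
    return argv + [command, *args]


def run_command(command: str, *args: str, port: Optional[int] = None,
                timeout: float = 30.0) -> str:
    """Run one command and return its stdout.

    subprocess raises CalledProcessError if the process exits non-zero and
    TimeoutExpired if it runs longer than `timeout` seconds.
    """
    result = subprocess.run(
        build_argv(command, *args, port=port),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

Passing arguments as a list rather than a shell string means selectors such as `#search` need no shell quoting.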
@@ -0,0 +1,165 @@
+ # chrome-agent
+
+ [![PyPI version](https://img.shields.io/pypi/v/chrome-agent)](https://pypi.org/project/chrome-agent/)
+ [![Python versions](https://img.shields.io/pypi/pyversions/chrome-agent)](https://pypi.org/project/chrome-agent/)
+ [![License](https://img.shields.io/pypi/l/chrome-agent)](https://github.com/captivus/chrome-agent/blob/main/LICENSE)
+
+ A CLI tool that gives AI coding agents the ability to observe and interact with Chrome browsers.
+
+ Built as a replacement for browser MCP tools. Faster, lower token overhead, and supports something MCP tools can't do: multiple agents sharing the same browser instance.
+
+ ## Why this exists
+
+ AI coding agents need to see and interact with browsers -- to test their code, debug automation, inspect page state. The standard approach (browser MCP tools) uses a persistent server with protocol negotiation and verbose response formatting. `chrome-agent` takes a different approach: each command is a standalone CLI call that connects to Chrome via the DevTools Protocol, does one thing, and disconnects. No server, no session state, no bloat.
+
+ This also enables a workflow that MCP tools can't support: one process drives the browser (your automation code) while a separate agent observes the same browser to diagnose issues and improve the code.
+
+ ## Installation
+
+ ```bash
+ uv tool install chrome-agent
+ playwright install chromium
+ ```
+
+ Or add to a project:
+
+ ```bash
+ uv add chrome-agent
+ uv run playwright install chromium
+ ```
+
+ ## Two ways to use it
+
+ ### Drive mode -- you control the browser
+
+ Launch a browser and interact with it directly. This is the MCP replacement use case.
+
+ ```bash
+ chrome-agent launch &
+ chrome-agent navigate "https://example.com"
+ chrome-agent text                         # Read page content
+ chrome-agent element "h1"                 # Inspect an element
+ chrome-agent fill "#search" "query"       # Fill a form field
+ chrome-agent click "#submit"              # Click a button
+ chrome-agent screenshot /tmp/page.png     # Capture the screen
+ ```
+
+ ### Attach mode -- observe a running browser
+
+ Your automation code launches a browser with `--remote-debugging-port=9222`. You connect to observe what the code is doing, diagnose failures, and figure out what to change.
+
+ ```bash
+ chrome-agent status                       # Is the browser running?
+ chrome-agent url                          # Where is it?
+ chrome-agent element "#submit-btn"        # Why can't the code click this?
+ chrome-agent eval "document.querySelectorAll('.error').length"
+ chrome-agent screenshot                   # What does it look like?
+ ```
+
+ The feedback loop: **write code -> run it -> observe the browser -> diagnose -> modify code -> repeat.**
+
+ ## Commands
+
+ ```
+ chrome-agent [--port PORT] <command> [args...]
+ ```
+
+ ### Check browser status
+
+ ```
+ status                  Check if a browser is running on the CDP port
+ launch                  Launch a browser with CDP enabled
+                         [--fingerprint PATH] [--headless] [--no-pin-desktop]
+ help                    Print command reference
+ ```
+
+ ### Observe (read-only, always safe)
+
+ ```
+ url                     Print current URL and page title
+ screenshot [path]       Save a screenshot (default: /tmp/cdp-screenshot.png)
+ snapshot                Print the ARIA accessibility tree
+ text                    Print visible text content
+ html [selector]         Print page HTML or a specific element's HTML
+ element <selector>      Detailed element inspection (visibility, dimensions,
+                         attributes, position, disabled state)
+ find <selector>         Count and list all matching elements
+ value <selector>        Get an input element's current value
+ eval <code>             Execute JavaScript and print the result
+ cookies                 List all cookies
+ tabs                    List all open tabs/pages
+ wait <target>           Wait for a selector, milliseconds, or load state
+ ```
+
+ ### Navigate
+
+ ```
+ navigate <url>          Go to a URL
+ back                    Browser back
+ forward                 Browser forward
+ reload                  Reload the page
+ ```
+
+ ### Interact
+
+ ```
+ click <selector>        Click an element (JS fallback for hidden elements)
+ fill <selector> <val>   Fill a form field (clears first)
+ type <selector> <txt>   Type text character by character
+ press <key>             Press a keyboard key (Enter, Escape, Tab, etc.)
+ select <sel> <value>    Select a dropdown option
+ check <selector>        Check a checkbox
+ uncheck <selector>      Uncheck a checkbox
+ hover <selector>        Hover over an element
+ scroll <target>         Scroll to element, or scroll up/down
+ clickxy <x> <y>         Click at page coordinates
+ close                   Close the current page
+ viewport <w> <h>        Resize the viewport
+ ```
+
+ ## For AI agents
+
+ The primary user of this tool is an AI coding agent, not a human. See [INSTRUCTIONS.md](INSTRUCTIONS.md) for comprehensive agent instructions covering:
+
+ - Drive mode vs attach mode mental model
+ - Safety rules for shared browser access
+ - The development feedback loop
+ - When to observe vs intervene
+ - Command recipes for common tasks
+ - Failure modes and recovery
+
+ Include the contents of `INSTRUCTIONS.md` in your project's `CLAUDE.md` or agent instructions file.
+
+ ## Browser fingerprinting (optional)
+
+ For sites that detect automated browsers, launch with a fingerprint profile:
+
+ ```bash
+ chrome-agent launch --fingerprint path/to/fingerprint.json
+ ```
+
+ The fingerprint JSON overrides the browser's user agent, viewport, locale, timezone, and platform to match a real desktop browser:
+
+ ```json
+ {
+   "userAgent": "Mozilla/5.0 (X11; Linux x86_64) ...",
+   "platform": "Linux x86_64",
+   "vendor": "Google Inc.",
+   "language": "en-US",
+   "timezone": "America/Chicago",
+   "viewport": {"width": 1920, "height": 1080}
+ }
+ ```
+
+ Without `--fingerprint`, the browser launches with default Chromium settings.
+
+ ## Requirements
+
+ - Python >= 3.11
+ - Playwright >= 1.50.0
+ - Chromium (installed via `playwright install chromium`)
+ - Linux with xdotool (optional, for virtual desktop pinning)
+
+ ## License
+
+ MIT
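
Attach mode depends on a browser already listening on the remote debugging port. How `chrome-agent status` performs its check is not shown in this diff, but Chrome's remote debugging server does expose an HTTP endpoint at `/json/version`, so an equivalent probe can be sketched with the standard library alone. The function name `probe_cdp` and its defaults are assumptions made for illustration.

```python
import http.client
import json
import urllib.error
import urllib.request
from typing import Optional


def probe_cdp(port: int = 9222, host: str = "127.0.0.1",
              timeout: float = 2.0) -> Optional[dict]:
    """Return the /json/version payload if a CDP endpoint answers, else None."""
    url = f"http://{host}:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, malformed HTTP, or a non-JSON body:
        # all of these mean there is no usable CDP endpoint on this port.
        return None
```

A harness could call `probe_cdp()` before issuing observe commands and skip them when it returns `None`.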
@@ -0,0 +1,44 @@
+ [project]
+ name = "chrome-agent"
+ version = "0.1.0"
+ description = "CLI tool for AI agents to observe and interact with Chrome via CDP"
+ readme = "README.md"
+ license = "MIT"
+ authors = [
+     { name = "Corey Gallon", email = "366332+captivus@users.noreply.github.com" }
+ ]
+ keywords = ["chrome", "cdp", "devtools", "browser", "automation", "agent"]
+ classifiers = [
+     "Development Status :: 3 - Alpha",
+     "Environment :: Console",
+     "Intended Audience :: Developers",
+     "License :: OSI Approved :: MIT License",
+     "Programming Language :: Python :: 3",
+     "Programming Language :: Python :: 3.11",
+     "Programming Language :: Python :: 3.12",
+     "Programming Language :: Python :: 3.13",
+     "Topic :: Software Development :: Testing",
+     "Topic :: Utilities",
+ ]
+ requires-python = ">=3.11"
+ dependencies = [
+     "playwright>=1.50.0",
+ ]
+
+ [project.scripts]
+ chrome-agent = "chrome_agent.cli:main"
+
+ [project.urls]
+ Homepage = "https://github.com/captivus/chrome-agent"
+ Repository = "https://github.com/captivus/chrome-agent"
+ Issues = "https://github.com/captivus/chrome-agent/issues"
+
+ [build-system]
+ requires = ["uv_build>=0.9.17,<0.10.0"]
+ build-backend = "uv_build"
+
+ [dependency-groups]
+ dev = [
+     "pytest>=9.0.2",
+     "pytest-asyncio>=1.3.0",
+ ]
@@ -0,0 +1,5 @@
+ """chrome-agent: CLI tool for AI agents to observe and interact with Chrome via CDP."""
+
+ from importlib.metadata import version
+
+ __version__ = version("chrome-agent")
@@ -0,0 +1,5 @@
+ """Allow running as: python -m chrome_agent"""
+
+ from chrome_agent.cli import main
+
+ main()
@@ -0,0 +1,190 @@
+ """Browser launcher with optional anti-detection fingerprinting.
+
+ Launches a Playwright Chromium browser with CDP enabled. Optionally
+ applies a fingerprint profile to make the browser appear as a real
+ desktop browser.
+ """
+
+ import asyncio
+ import json
+ import logging
+ import os
+ import subprocess
+ from dataclasses import dataclass
+
+ from playwright.async_api import Browser, Page, Playwright, async_playwright
+
+ logger = logging.getLogger(__name__)
+
+
+ @dataclass
+ class BrowserFingerprint:
+     """Browser fingerprint profile for anti-detection."""
+
+     user_agent: str
+     viewport: dict[str, int]
+     locale: str
+     timezone: str
+     platform: str
+     vendor: str
+
+
+ @dataclass
+ class BrowserSession:
+     """Handle to a launched browser session."""
+
+     playwright: Playwright
+     browser: Browser
+     page: Page
+
+
+ async def load_fingerprint(*, path: str) -> BrowserFingerprint:
+     """Load a browser fingerprint profile from a JSON file.
+
+     Expected JSON schema:
+     {
+         "userAgent": "...",
+         "platform": "...",
+         "vendor": "...",
+         "language": "en-US",
+         "timezone": "America/Chicago",
+         "viewport": {"width": 1920, "height": 1080}
+     }
+     """
+     with open(path, "r") as f:
+         data = json.load(f)
+
+     return BrowserFingerprint(
+         user_agent=data["userAgent"],
+         viewport=data["viewport"],
+         locale=data["language"],
+         timezone=data["timezone"],
+         platform=data["platform"],
+         vendor=data["vendor"],
+     )
+
+
+ async def launch_browser(
+     *,
+     port: int = 9222,
+     fingerprint: BrowserFingerprint | None = None,
+     headless: bool = False,
+     wm_class: str = "chrome-agent",
+     pin_to_desktop: bool = True,
+ ) -> BrowserSession:
+     """Launch a Chromium browser with CDP enabled.
+
+     Args:
+         port: CDP remote debugging port.
+         fingerprint: Optional fingerprint for anti-detection. If None,
+             launches a clean browser with no spoofing.
+         headless: Run in headless mode.
+         wm_class: X11 window class name (for desktop pinning).
+         pin_to_desktop: Move browser window to the launching terminal's
+             virtual desktop. Default True. Linux/X11 only, requires
+             xdotool -- silently skipped on other platforms.
+     """
+     playwright = await async_playwright().start()
+
+     launch_args = [
+         "--disable-blink-features=AutomationControlled",
+         f"--remote-debugging-port={port}",
+         f"--class={wm_class}",
+     ]
+
+     browser = await playwright.chromium.launch(
+         headless=headless,
+         args=launch_args,
+     )
+
+     # Build context options
+     context_kwargs = {}
+     if fingerprint:
+         context_kwargs.update(
+             user_agent=fingerprint.user_agent,
+             viewport=fingerprint.viewport,
+             locale=fingerprint.locale,
+             timezone_id=fingerprint.timezone,
+         )
+
+     context = await browser.new_context(**context_kwargs)
+
+     # Apply anti-detection init script if fingerprinted (json.dumps escapes values for JS)
+     if fingerprint:
+         await context.add_init_script(f"""
+             Object.defineProperty(navigator, 'webdriver', {{
+                 get: () => false
+             }});
+
+             Object.defineProperty(navigator, 'platform', {{
+                 get: () => {json.dumps(fingerprint.platform)}
+             }});
+
+             Object.defineProperty(navigator, 'vendor', {{
+                 get: () => {json.dumps(fingerprint.vendor)}
+             }});
+
+             window.chrome = {{
+                 runtime: {{}},
+                 app: {{}}
+             }};
+         """)
+
+     page = await context.new_page()
+
+     session = BrowserSession(
+         playwright=playwright,
+         browser=browser,
+         page=page,
+     )
+
+     if pin_to_desktop:
+         await _move_to_launching_desktop(wm_class=wm_class)
+
+     return session
+
+
+ async def _move_to_launching_desktop(*, wm_class: str) -> None:
+     """Move the browser window to the terminal's virtual desktop.
+
+     Linux/X11 only. Requires xdotool. Silently does nothing if
+     xdotool is unavailable or on non-X11 systems.
+     """
+     try:
+         # Determine which desktop our terminal is on
+         window_id = os.environ.get("WINDOWID", "")
+         if window_id:
+             result = subprocess.run(
+                 ["xdotool", "get_desktop_for_window", window_id],
+                 capture_output=True, text=True,
+             )
+             desktop = result.stdout.strip()
+         else:
+             result = subprocess.run(
+                 ["xdotool", "get_desktop"],
+                 capture_output=True, text=True,
+             )
+             desktop = result.stdout.strip()
+
+         if not desktop:
+             return
+
+         # Wait for the browser window to appear
+         await asyncio.sleep(0.5)
+
+         # Find our browser windows by the custom WM_CLASS
+         result = subprocess.run(
+             ["xdotool", "search", "--class", wm_class],
+             capture_output=True, text=True,
+         )
+         for wid in result.stdout.strip().split("\n"):
+             if wid.strip():
+                 subprocess.run(
+                     ["xdotool", "set_desktop_for_window", wid.strip(), desktop],
+                 )
+
+         logger.info("Moved browser window(s) to desktop %s", desktop)
+     except FileNotFoundError:
+         logger.debug("xdotool not available -- skipping desktop move")
+     except Exception as e:
+         logger.debug("Could not move browser to desktop: %s", e)
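
`load_fingerprint` indexes the parsed JSON directly, so a profile with a missing key surfaces as a bare `KeyError`. A caller that wants a clearer message could check the profile first. The sketch below is one way to do that; `REQUIRED_KEYS` and `validate_fingerprint` are hypothetical names that do not exist in the package, and the key list is taken from the schema in the `load_fingerprint` docstring.

```python
from typing import List

# Keys read by load_fingerprint, per its docstring schema.
REQUIRED_KEYS = ("userAgent", "platform", "vendor", "language", "timezone", "viewport")


def validate_fingerprint(data: dict) -> List[str]:
    """Return a list of problems; an empty list means the profile looks usable."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in data]
    viewport = data.get("viewport")
    if viewport is not None:
        if not isinstance(viewport, dict):
            problems.append("viewport must be an object")
        else:
            for dim in ("width", "height"):
                value = viewport.get(dim)
                # bool is a subclass of int, so it is excluded explicitly.
                if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
                    problems.append(f"viewport.{dim} must be a positive integer")
    return problems
```

Returning every problem at once, rather than raising on the first, lets an agent fix a hand-written profile in a single pass.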