PyPI - chrome-agent - Versions diffs - 0.1.0__tar.gz - Mend

chrome-agent 0.1.0__tar.gz


Files changed (10)

chrome_agent-0.1.0/PKG-INFO +190 -0
chrome_agent-0.1.0/README.md +165 -0
chrome_agent-0.1.0/pyproject.toml +44 -0
chrome_agent-0.1.0/src/chrome_agent/__init__.py +5 -0
chrome_agent-0.1.0/src/chrome_agent/__main__.py +5 -0
chrome_agent-0.1.0/src/chrome_agent/browser.py +190 -0
chrome_agent-0.1.0/src/chrome_agent/cli.py +279 -0
chrome_agent-0.1.0/src/chrome_agent/commands.py +370 -0
chrome_agent-0.1.0/src/chrome_agent/connection.py +109 -0
chrome_agent-0.1.0/src/chrome_agent/errors.py +45 -0

chrome_agent-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,190 @@
+Metadata-Version: 2.4
+Name: chrome-agent
+Version: 0.1.0
+Summary: CLI tool for AI agents to observe and interact with Chrome via CDP
+Keywords: chrome,cdp,devtools,browser,automation,agent
+Author: Corey Gallon
+Author-email: Corey Gallon <366332+captivus@users.noreply.github.com>
+License-Expression: MIT
+Classifier: Development Status :: 3 - Alpha
+Classifier: Environment :: Console
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Software Development :: Testing
+Classifier: Topic :: Utilities
+Requires-Dist: playwright>=1.50.0
+Requires-Python: >=3.11
+Project-URL: Homepage, https://github.com/captivus/chrome-agent
+Project-URL: Repository, https://github.com/captivus/chrome-agent
+Project-URL: Issues, https://github.com/captivus/chrome-agent/issues
+Description-Content-Type: text/markdown
+# chrome-agent
+[![PyPI version](https://img.shields.io/pypi/v/chrome-agent)](https://pypi.org/project/chrome-agent/)
+[![Python versions](https://img.shields.io/pypi/pyversions/chrome-agent)](https://pypi.org/project/chrome-agent/)
+[![License](https://img.shields.io/pypi/l/chrome-agent)](https://github.com/captivus/chrome-agent/blob/main/LICENSE)
+A CLI tool that gives AI coding agents the ability to observe and interact with Chrome browsers.
+Built as a replacement for browser MCP tools. Faster, lower token overhead, and supports something MCP tools can't do: multiple agents sharing the same browser instance.
+## Why this exists
+AI coding agents need to see and interact with browsers -- to test their code, debug automation, inspect page state. The standard approach (browser MCP tools) uses a persistent server with protocol negotiation and verbose response formatting. `chrome-agent` takes a different approach: each command is a standalone CLI call that connects to Chrome via the DevTools Protocol, does one thing, and disconnects. No server, no session state, no bloat.
+This also enables a workflow that MCP tools can't support: one process drives the browser (your automation code) while a separate agent observes the same browser to diagnose issues and improve the code.
+## Installation
+```bash
+uv tool install chrome-agent
+playwright install chromium
+```
+Or add to a project:
+```bash
+uv add chrome-agent
+uv run playwright install chromium
+```
+## Two ways to use it
+### Drive mode -- you control the browser
+Launch a browser and interact with it directly. This is the MCP replacement use case.
+```bash
+chrome-agent launch &
+chrome-agent navigate "https://example.com"
+chrome-agent text                        # Read page content
+chrome-agent element "h1"                # Inspect an element
+chrome-agent fill "#search" "query"      # Fill a form field
+chrome-agent click "#submit"             # Click a button
+chrome-agent screenshot /tmp/page.png    # Capture the screen
+```
+### Attach mode -- observe a running browser
+Your automation code launches a browser with `--remote-debugging-port=9222`. You connect to observe what the code is doing, diagnose failures, and figure out what to change.
+```bash
+chrome-agent status                      # Is the browser running?
+chrome-agent url                         # Where is it?
+chrome-agent element "#submit-btn"       # Why can't the code click this?
+chrome-agent eval "document.querySelectorAll('.error').length"
+chrome-agent screenshot                  # What does it look like?
+```
+The feedback loop: **write code -> run it -> observe the browser -> diagnose -> modify code -> repeat.**
+## Commands
+```
+chrome-agent [--port PORT] <command> [args...]
+```
+### Check browser status
+```
+status                Check if a browser is running on the CDP port
+launch                Launch a browser with CDP enabled
+                      [--fingerprint PATH] [--headless] [--no-pin-desktop]
+help                  Print command reference
+```
+### Observe (read-only, always safe)
+```
+url                   Print current URL and page title
+screenshot [path]     Save a screenshot (default: /tmp/cdp-screenshot.png)
+snapshot              Print the ARIA accessibility tree
+text                  Print visible text content
+html [selector]       Print page HTML or a specific element's HTML
+element <selector>    Detailed element inspection (visibility, dimensions,
+                      attributes, position, disabled state)
+find <selector>       Count and list all matching elements
+value <selector>      Get an input element's current value
+eval <code>           Execute JavaScript and print the result
+cookies               List all cookies
+tabs                  List all open tabs/pages
+wait <target>         Wait for a selector, milliseconds, or load state
+```
+### Navigate
+```
+navigate <url>        Go to a URL
+back                  Browser back
+forward               Browser forward
+reload                Reload the page
+```
+### Interact
+```
+click <selector>      Click an element (JS fallback for hidden elements)
+fill <selector> <val> Fill a form field (clears first)
+type <selector> <txt> Type text character by character
+press <key>           Press a keyboard key (Enter, Escape, Tab, etc.)
+select <sel> <value>  Select a dropdown option
+check <selector>      Check a checkbox
+uncheck <selector>    Uncheck a checkbox
+hover <selector>      Hover over an element
+scroll <target>       Scroll to element, or scroll up/down
+clickxy <x> <y>       Click at page coordinates
+close                 Close the current page
+viewport <w> <h>      Resize the viewport
+```
+## For AI agents
+The primary user of this tool is an AI coding agent, not a human. See [INSTRUCTIONS.md](INSTRUCTIONS.md) for comprehensive agent instructions covering:
+- Drive mode vs attach mode mental model
+- Safety rules for shared browser access
+- The development feedback loop
+- When to observe vs intervene
+- Command recipes for common tasks
+- Failure modes and recovery
+Include the contents of `INSTRUCTIONS.md` in your project's `CLAUDE.md` or agent instructions file.
+## Browser fingerprinting (optional)
+For sites that detect automated browsers, launch with a fingerprint profile:
+```bash
+chrome-agent launch --fingerprint path/to/fingerprint.json
+```
+The fingerprint JSON overrides the browser's user agent, viewport, locale, timezone, and platform to match a real desktop browser:
+```json
+{
+    "userAgent": "Mozilla/5.0 (X11; Linux x86_64) ...",
+    "platform": "Linux x86_64",
+    "vendor": "Google Inc.",
+    "language": "en-US",
+    "timezone": "America/Chicago",
+    "viewport": {"width": 1920, "height": 1080}
+}
+```
+Without `--fingerprint`, the browser launches with default Chromium settings.
+## Requirements
+- Python >= 3.11
+- Playwright >= 1.50.0
+- Chromium (installed via `playwright install chromium`)
+- Linux with xdotool (optional, for virtual desktop pinning)
+## License
+MIT
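
Note on the fingerprint JSON described above: the loader in `browser.py` indexes all six keys directly, so a profile missing any of them would fail with a `KeyError`. A minimal pre-flight check might look like the sketch below. The helper name `missing_fingerprint_keys` and the `REQUIRED_KEYS` tuple are hypothetical and not part of the package; only the key names are taken from the diff.

```python
import json

# Key names as read by load_fingerprint() in browser.py; the tuple itself is hypothetical.
REQUIRED_KEYS = ("userAgent", "platform", "vendor", "language", "timezone", "viewport")


def missing_fingerprint_keys(raw_json: str) -> list[str]:
    """Return the required fingerprint keys that are absent from a JSON document."""
    data = json.loads(raw_json)
    return [key for key in REQUIRED_KEYS if key not in data]
```

Running this on a profile before passing it to `chrome-agent launch --fingerprint` would surface an incomplete file without starting a browser.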

chrome_agent-0.1.0/README.md ADDED

@@ -0,0 +1,165 @@
+# chrome-agent
+[![PyPI version](https://img.shields.io/pypi/v/chrome-agent)](https://pypi.org/project/chrome-agent/)
+[![Python versions](https://img.shields.io/pypi/pyversions/chrome-agent)](https://pypi.org/project/chrome-agent/)
+[![License](https://img.shields.io/pypi/l/chrome-agent)](https://github.com/captivus/chrome-agent/blob/main/LICENSE)
+A CLI tool that gives AI coding agents the ability to observe and interact with Chrome browsers.
+Built as a replacement for browser MCP tools. Faster, lower token overhead, and supports something MCP tools can't do: multiple agents sharing the same browser instance.
+## Why this exists
+AI coding agents need to see and interact with browsers -- to test their code, debug automation, inspect page state. The standard approach (browser MCP tools) uses a persistent server with protocol negotiation and verbose response formatting. `chrome-agent` takes a different approach: each command is a standalone CLI call that connects to Chrome via the DevTools Protocol, does one thing, and disconnects. No server, no session state, no bloat.
+This also enables a workflow that MCP tools can't support: one process drives the browser (your automation code) while a separate agent observes the same browser to diagnose issues and improve the code.
+## Installation
+```bash
+uv tool install chrome-agent
+playwright install chromium
+```
+Or add to a project:
+```bash
+uv add chrome-agent
+uv run playwright install chromium
+```
+## Two ways to use it
+### Drive mode -- you control the browser
+Launch a browser and interact with it directly. This is the MCP replacement use case.
+```bash
+chrome-agent launch &
+chrome-agent navigate "https://example.com"
+chrome-agent text                        # Read page content
+chrome-agent element "h1"                # Inspect an element
+chrome-agent fill "#search" "query"      # Fill a form field
+chrome-agent click "#submit"             # Click a button
+chrome-agent screenshot /tmp/page.png    # Capture the screen
+```
+### Attach mode -- observe a running browser
+Your automation code launches a browser with `--remote-debugging-port=9222`. You connect to observe what the code is doing, diagnose failures, and figure out what to change.
+```bash
+chrome-agent status                      # Is the browser running?
+chrome-agent url                         # Where is it?
+chrome-agent element "#submit-btn"       # Why can't the code click this?
+chrome-agent eval "document.querySelectorAll('.error').length"
+chrome-agent screenshot                  # What does it look like?
+```
+The feedback loop: **write code -> run it -> observe the browser -> diagnose -> modify code -> repeat.**
+## Commands
+```
+chrome-agent [--port PORT] <command> [args...]
+```
+### Check browser status
+```
+status                Check if a browser is running on the CDP port
+launch                Launch a browser with CDP enabled
+                      [--fingerprint PATH] [--headless] [--no-pin-desktop]
+help                  Print command reference
+```
+### Observe (read-only, always safe)
+```
+url                   Print current URL and page title
+screenshot [path]     Save a screenshot (default: /tmp/cdp-screenshot.png)
+snapshot              Print the ARIA accessibility tree
+text                  Print visible text content
+html [selector]       Print page HTML or a specific element's HTML
+element <selector>    Detailed element inspection (visibility, dimensions,
+                      attributes, position, disabled state)
+find <selector>       Count and list all matching elements
+value <selector>      Get an input element's current value
+eval <code>           Execute JavaScript and print the result
+cookies               List all cookies
+tabs                  List all open tabs/pages
+wait <target>         Wait for a selector, milliseconds, or load state
+```
+### Navigate
+```
+navigate <url>        Go to a URL
+back                  Browser back
+forward               Browser forward
+reload                Reload the page
+```
+### Interact
+```
+click <selector>      Click an element (JS fallback for hidden elements)
+fill <selector> <val> Fill a form field (clears first)
+type <selector> <txt> Type text character by character
+press <key>           Press a keyboard key (Enter, Escape, Tab, etc.)
+select <sel> <value>  Select a dropdown option
+check <selector>      Check a checkbox
+uncheck <selector>    Uncheck a checkbox
+hover <selector>      Hover over an element
+scroll <target>       Scroll to element, or scroll up/down
+clickxy <x> <y>       Click at page coordinates
+close                 Close the current page
+viewport <w> <h>      Resize the viewport
+```
+## For AI agents
+The primary user of this tool is an AI coding agent, not a human. See [INSTRUCTIONS.md](INSTRUCTIONS.md) for comprehensive agent instructions covering:
+- Drive mode vs attach mode mental model
+- Safety rules for shared browser access
+- The development feedback loop
+- When to observe vs intervene
+- Command recipes for common tasks
+- Failure modes and recovery
+Include the contents of `INSTRUCTIONS.md` in your project's `CLAUDE.md` or agent instructions file.
+## Browser fingerprinting (optional)
+For sites that detect automated browsers, launch with a fingerprint profile:
+```bash
+chrome-agent launch --fingerprint path/to/fingerprint.json
+```
+The fingerprint JSON overrides the browser's user agent, viewport, locale, timezone, and platform to match a real desktop browser:
+```json
+{
+    "userAgent": "Mozilla/5.0 (X11; Linux x86_64) ...",
+    "platform": "Linux x86_64",
+    "vendor": "Google Inc.",
+    "language": "en-US",
+    "timezone": "America/Chicago",
+    "viewport": {"width": 1920, "height": 1080}
+}
+```
+Without `--fingerprint`, the browser launches with default Chromium settings.
+## Requirements
+- Python >= 3.11
+- Playwright >= 1.50.0
+- Chromium (installed via `playwright install chromium`)
+- Linux with xdotool (optional, for virtual desktop pinning)
+## License
+MIT

chrome_agent-0.1.0/pyproject.toml ADDED

@@ -0,0 +1,44 @@
+[project]
+name = "chrome-agent"
+version = "0.1.0"
+description = "CLI tool for AI agents to observe and interact with Chrome via CDP"
+readme = "README.md"
+license = "MIT"
+authors = [
+    { name = "Corey Gallon", email = "366332+captivus@users.noreply.github.com" }
+]
+keywords = ["chrome", "cdp", "devtools", "browser", "automation", "agent"]
+classifiers = [
+    "Development Status :: 3 - Alpha",
+    "Environment :: Console",
+    "Intended Audience :: Developers",
+    "License :: OSI Approved :: MIT License",
+    "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
+    "Topic :: Software Development :: Testing",
+    "Topic :: Utilities",
+]
+requires-python = ">=3.11"
+dependencies = [
+    "playwright>=1.50.0",
+]
+[project.scripts]
+chrome-agent = "chrome_agent.cli:main"
+[project.urls]
+Homepage = "https://github.com/captivus/chrome-agent"
+Repository = "https://github.com/captivus/chrome-agent"
+Issues = "https://github.com/captivus/chrome-agent/issues"
+[build-system]
+requires = ["uv_build>=0.9.17,<0.10.0"]
+build-backend = "uv_build"
+[dependency-groups]
+dev = [
+    "pytest>=9.0.2",
+    "pytest-asyncio>=1.3.0",
+]

chrome_agent-0.1.0/src/chrome_agent/__init__.py ADDED

@@ -0,0 +1,5 @@
+"""chrome-agent: CLI tool for AI agents to observe and interact with Chrome via CDP."""
+from importlib.metadata import version
+__version__ = version("chrome-agent")
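
Note on `__init__.py`: `importlib.metadata.version` raises `PackageNotFoundError` when the distribution is not installed, for example when the source tree is imported directly from a checkout. A defensive variant could fall back to a placeholder, as sketched here. The helper `resolve_version` and the default string are assumptions for illustration, not code from the package.

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version(distribution: str, default: str = "0.0.0+unknown") -> str:
    """Look up an installed distribution's version, or return a placeholder."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # Distribution metadata is absent, e.g. running from an uninstalled checkout.
        return default
```

With this helper, the module-level assignment would become `__version__ = resolve_version("chrome-agent")`.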

chrome_agent-0.1.0/src/chrome_agent/__main__.py ADDED

@@ -0,0 +1,5 @@
+"""Allow running as: python -m chrome_agent"""
+from chrome_agent.cli import main
+main()

chrome_agent-0.1.0/src/chrome_agent/browser.py ADDED

@@ -0,0 +1,190 @@
+"""Browser launcher with optional anti-detection fingerprinting.
+Launches a Playwright Chromium browser with CDP enabled. Optionally
+applies a fingerprint profile to make the browser appear as a real
+desktop browser.
+"""
+import asyncio
+import json
+import logging
+import os
+import subprocess
+from dataclasses import dataclass
+from playwright.async_api import Browser, Page, Playwright, async_playwright
+logger = logging.getLogger(__name__)
+@dataclass
+class BrowserFingerprint:
+    """Browser fingerprint profile for anti-detection."""
+    user_agent: str
+    viewport: dict[str, int]
+    locale: str
+    timezone: str
+    platform: str
+    vendor: str
+@dataclass
+class BrowserSession:
+    """Handle to a launched browser session."""
+    playwright: Playwright
+    browser: Browser
+    page: Page
+async def load_fingerprint(*, path: str) -> BrowserFingerprint:
+    """Load a browser fingerprint profile from a JSON file.
+    Expected JSON schema:
+        {
+            "userAgent": "...",
+            "platform": "...",
+            "vendor": "...",
+            "language": "en-US",
+            "timezone": "America/Chicago",
+            "viewport": {"width": 1920, "height": 1080}
+        }
+    """
+    with open(path, "r") as f:
+        data = json.load(f)
+    return BrowserFingerprint(
+        user_agent=data["userAgent"],
+        viewport=data["viewport"],
+        locale=data["language"],
+        timezone=data["timezone"],
+        platform=data["platform"],
+        vendor=data["vendor"],
+    )
+async def launch_browser(
+    *,
+    port: int = 9222,
+    fingerprint: BrowserFingerprint | None = None,
+    headless: bool = False,
+    wm_class: str = "chrome-agent",
+    pin_to_desktop: bool = True,
+) -> BrowserSession:
+    """Launch a Chromium browser with CDP enabled.
+    Args:
+        port: CDP remote debugging port.
+        fingerprint: Optional fingerprint for anti-detection. If None,
+            launches a clean browser with no spoofing.
+        headless: Run in headless mode.
+        wm_class: X11 window class name (for desktop pinning).
+        pin_to_desktop: Move browser window to the launching terminal's
+            virtual desktop. Default True. Linux/X11 only, requires
+            xdotool -- silently skipped on other platforms.
+    """
+    playwright = await async_playwright().start()
+    launch_args = [
+        "--disable-blink-features=AutomationControlled",
+        f"--remote-debugging-port={port}",
+        f"--class={wm_class}",
+    ]
+    browser = await playwright.chromium.launch(
+        headless=headless,
+        args=launch_args,
+    )
+    # Build context options
+    context_kwargs = {}
+    if fingerprint:
+        context_kwargs.update(
+            user_agent=fingerprint.user_agent,
+            viewport=fingerprint.viewport,
+            locale=fingerprint.locale,
+            timezone_id=fingerprint.timezone,
+        )
+    context = await browser.new_context(**context_kwargs)
+    # Apply anti-detection init script if fingerprinted
+    if fingerprint:
+        await context.add_init_script(f"""
+            Object.defineProperty(navigator, 'webdriver', {{
+                get: () => false
+            }});
+            Object.defineProperty(navigator, 'platform', {{
+                get: () => '{fingerprint.platform}'
+            }});
+            Object.defineProperty(navigator, 'vendor', {{
+                get: () => '{fingerprint.vendor}'
+            }});
+            window.chrome = {{
+                runtime: {{}},
+                app: {{}}
+            }};
+        """)
+    page = await context.new_page()
+    session = BrowserSession(
+        playwright=playwright,
+        browser=browser,
+        page=page,
+    )
+    if pin_to_desktop:
+        await _move_to_launching_desktop(wm_class=wm_class)
+    return session
+async def _move_to_launching_desktop(*, wm_class: str) -> None:
+    """Move the browser window to the terminal's virtual desktop.
+    Linux/X11 only. Requires xdotool. Silently does nothing if
+    xdotool is unavailable or on non-X11 systems.
+    """
+    try:
+        # Determine which desktop our terminal is on
+        window_id = os.environ.get("WINDOWID", "")
+        if window_id:
+            result = subprocess.run(
+                ["xdotool", "get_desktop_for_window", window_id],
+                capture_output=True, text=True,
+            )
+            desktop = result.stdout.strip()
+        else:
+            result = subprocess.run(
+                ["xdotool", "get_desktop"],
+                capture_output=True, text=True,
+            )
+            desktop = result.stdout.strip()
+        if not desktop:
+            return
+        # Wait for the browser window to appear
+        await asyncio.sleep(0.5)
+        # Find our browser windows by the custom WM_CLASS
+        result = subprocess.run(
+            ["xdotool", "search", "--class", wm_class],
+            capture_output=True, text=True,
+        )
+        for wid in result.stdout.strip().split("\n"):
+            if wid.strip():
+                subprocess.run(
+                    ["xdotool", "set_desktop_for_window", wid.strip(), desktop],
+                )
+        logger.info("Moved browser window(s) to desktop %s", desktop)
+    except FileNotFoundError:
+        logger.debug("xdotool not available -- skipping desktop move")
+    except Exception as e:
+        logger.debug("Could not move browser to desktop: %s", e)
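
Note on the init script in `launch_browser`: the `platform` and `vendor` values are interpolated into single-quoted JavaScript literals as-is, so a value containing a quote or backslash would produce a broken script. One possible hardening, shown here as a sketch rather than the package's code, is to emit each value through `json.dumps`, since a JSON string is also a valid JavaScript string literal. The function name `build_init_script` is hypothetical.

```python
import json


def build_init_script(*, platform: str, vendor: str) -> str:
    """Build navigator overrides with JSON-escaped string literals."""
    return f"""
        Object.defineProperty(navigator, 'platform', {{
            get: () => {json.dumps(platform)}
        }});
        Object.defineProperty(navigator, 'vendor', {{
            get: () => {json.dumps(vendor)}
        }});
    """
```

The resulting string could be passed to `context.add_init_script(...)` in the same way as the inline script in the diff.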