npm - slimbrowser-mcp - Versions diffs - 0.1.1 → 0.1.4 - Mend

slimbrowser-mcp 0.1.1 → 0.1.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (2) hide show

package/README.md +568 -55
package/package.json +1 -1

package/README.md CHANGED Viewed

@@ -1,105 +1,618 @@
-# slimbrowser-mcp
+<p align="center">
+  <h1 align="center">SlimBrowser MCP</h1>
+  <p align="center">
+    <strong>Token-efficient browser automation for AI agents</strong>
+  </p>
+  <p align="center">
+    A Model Context Protocol (MCP) server that gives AI agents full browser control —<br/>
+    navigate, click, type, extract data, capture sessions, and debug — all through a<br/>
+    clean, structured tool interface optimized for minimal token overhead.
+  </p>
+  <p align="center">
+    <a href="#quickstart">Quickstart</a> &nbsp;&bull;&nbsp;
+    <a href="#client-integrations">Client Integrations</a> &nbsp;&bull;&nbsp;
+    <a href="#arguments">Arguments</a> &nbsp;&bull;&nbsp;
+    <a href="#remote-chrome">Remote Chrome</a> &nbsp;&bull;&nbsp;
+    <a href="#mcp-tool-catalog">Tool Catalog</a>
+  </p>
+</p>
+---
+## Why SlimBrowser MCP?
+Most browser MCP servers dump full DOM trees and bloated HTML into your context window. That means **more tokens, higher cost, and slower agents**. SlimBrowser takes a fundamentally different approach — structured observations with three resolution modes, compact element IDs for surgical interactions, and delta-based updates that only send what changed.
+### Token Usage Comparison
+> Real-world benchmark: scraping 85 posts from the same website.
-A token-efficient browser MCP server for AI agents.
+```
+┌───────────────────────┬───────────┬─────────────┬────────────┐
+│        Tool           │ Same task │ Tokens used │  Savings   │
+├───────────────────────┼───────────┼─────────────┼────────────┤
+│ Chrome DevTools MCP   │ 85 posts  │  ~290,000   │  baseline  │
+├───────────────────────┼───────────┼─────────────┼────────────┤
+│ Claude in Chrome      │ 85 posts  │  ~100,000   │    65%     │
+├───────────────────────┼───────────┼─────────────┼────────────┤
+│ SlimBrowser MCP       │ 85 posts  │   ~44,000   │    85%     │
+└───────────────────────┴───────────┴─────────────┴────────────┘
+```
+**SlimBrowser uses 85% fewer tokens** than Chrome DevTools MCP and **56% fewer** than Claude in Chrome — for the exact same task.
+### Key Advantages
+- **85% token reduction** — proven in real-world benchmarks against leading alternatives
+- **3 observation modes** — SUMMARY, REGION, and FULL let agents choose the right level of detail
+- **42 specialized tools** — from one-click form fills to paginated network request inspection
+- **Session capture & replay** — every action is recorded; generate visual playback HTML automatically
+- **Remote Chrome support** — connect to Chrome instances running on remote machines, Docker containers, or cloud VMs
+- **Multi-tab orchestration** — open, switch, pin, and close tabs with full state tracking
+- **Built-in diagnostics** — console logs, network requests, performance metrics, and execution traces
+- **AI prompt templates** — pre-built workflows for scraping, testing, security audits, and more
+- **MCP resources** — subscribe to live session state via the `ab://` URI scheme
-- Chrome-only, CDP-first
-- Low-token observation model (`SUMMARY` by default)
-- Auto session capture/playback
-- Screenshot export with file paths returned to the agent
+---
-## Quickstart (End Users)
+## Quickstart
-### 1) Run doctor
+**Prerequisites:** Node.js >= 18, Chrome/Chromium installed locally (or a remote Chrome instance).
+Run the diagnostic check:
 ```bash
-npx -y slimbrowser-mcp doctor
+npx -y slimbrowser-mcp@latest doctor
 ```
-### 2) Start MCP server
+Start the MCP server:
 ```bash
-npx -y slimbrowser-mcp
+npx -y slimbrowser-mcp@latest
+```
+Verify it works — ask your AI agent to run:
 ```
+1. browser_create_session
+2. browser_navigate  →  url: "https://example.com"
+3. browser_observe   →  mode: "SUMMARY"
+4. browser_screenshot
+```
+---
-No `.env` is required for default usage.
+## Client Integrations
-## Connect to Claude Code
+Supported clients: **Codex** / **Claude Code** / **Gemini CLI** / **Claude Desktop** / **Cursor** / **Windsurf**
-Add server:
+<details>
+<summary><strong>Codex</strong></summary>
 ```bash
-claude mcp add --scope local slimbrowser -- npx -y slimbrowser-mcp
+# Add
+codex mcp add slimbrowser -- npx -y slimbrowser-mcp@latest
+# List / Remove
+codex mcp list
+codex mcp remove slimbrowser
 ```
-Check status:
+</details>
+<details>
+<summary><strong>Claude Code</strong></summary>
 ```bash
-/mcp
+# Add (user scope)
+claude mcp add --scope user slimbrowser -- npx -y slimbrowser-mcp@latest
+# List / Remove
+claude mcp list --scope user
+claude mcp remove --scope user slimbrowser
 ```
-Remove server:
+</details>
+<details>
+<summary><strong>Gemini CLI</strong></summary>
 ```bash
-claude mcp remove --scope local slimbrowser
+# Add
+gemini mcp add slimbrowser npx -y slimbrowser-mcp@latest
+# List / Remove
+gemini mcp list
+gemini mcp remove slimbrowser
 ```
-## What the Agent Can Do
+</details>
-Core tools include:
+<details>
+<summary><strong>Claude Desktop</strong></summary>
-- Session: `browser_create_session`, `browser_get_session`, `browser_close_session`
-- Observe/Navigate: `browser_observe`, `browser_navigate`, `browser_back`, `browser_forward`, `browser_reload`
-- Interact: `browser_click`, `browser_type`, `browser_select`, `browser_scroll`, `browser_wait_for`
-- Tabs: `browser_open_tab`, `browser_list_tabs`, `browser_switch_tab`, `browser_close_tab`
-- Data/Artifacts: `browser_extract`, `browser_screenshot`, `browser_list_artifacts`
-- Diagnostics: `browser_list_console_messages`, `browser_list_network_requests`, `browser_get_performance`
+Add to your Claude Desktop MCP configuration:
-Full catalog: [docs/MCP_TOOL_CATALOG.md](docs/MCP_TOOL_CATALOG.md)
+```json
+{
+  "mcpServers": {
+    "slimbrowser": {
+      "command": "npx",
+      "args": ["-y", "slimbrowser-mcp@latest"]
+    }
+  }
+}
+```
-## Screenshots and Playback
+</details>
-- `browser_screenshot` returns artifact metadata and saved file path.
-- Session capture starts with session creation (if enabled).
-- When the task ends, the agent should call `browser_finalize_session_capture`.
-- Final playback file path is returned, so it can be used in reports.
+<details>
+<summary><strong>Cursor</strong></summary>
-Default capture directory:
+Edit `~/.cursor/mcp.json` (Windows: `%USERPROFILE%\.cursor\mcp.json`):
+```json
+{
+  "mcpServers": {
+    "slimbrowser": {
+      "command": "npx",
+      "args": ["-y", "slimbrowser-mcp@latest"]
+    }
+  }
+}
+```
+</details>
+<details>
+<summary><strong>Windsurf</strong></summary>
+Edit `~/.codeium/windsurf/mcp_config.json` (Windows: `%USERPROFILE%\.codeium\windsurf\mcp_config.json`):
+```json
+{
+  "mcpServers": {
+    "slimbrowser": {
+      "command": "npx",
+      "args": ["-y", "slimbrowser-mcp@latest"]
+    }
+  }
+}
+```
+</details>
+---
+## Arguments
+All flags can be passed directly as command arguments — no `.env` file required. Append them after the base command in any client.
+**CLI example:**
 ```bash
-/tmp/slimbrowser-mcp-playbacks
+codex mcp add slimbrowser -- npx -y slimbrowser-mcp@latest --headless=false --enable-session-capture=true
+```
+**JSON example:**
+```json
+{
+  "mcpServers": {
+    "slimbrowser": {
+      "command": "npx",
+      "args": [
+        "-y",
+        "slimbrowser-mcp@latest",
+        "--headless=false",
+        "--enable-session-capture=true",
+        "--session-capture-dir=/tmp/slimbrowser-mcp-playbacks"
+      ]
+    }
+  }
+}
 ```
-## Minimal Configuration (Optional)
+### Transport & Runtime
+| Flag | Values | Default | Description |
+|------|--------|---------|-------------|
+| `--transport` | `stdio`, `http`, `both` | `stdio` | MCP transport mode |
+| `--runtime-mode` | `embedded`, `gateway` | `embedded` | Run browser engine in-process or connect to a gateway |
+| `--config` | `<path>` | — | Path to a TOML config file |
+### Browser Control
+| Flag | Values | Default | Description |
+|------|--------|---------|-------------|
+| `--headless` | `true`, `false` | `true` | Run Chrome in headless or headed mode |
+| `--autostart` | `auto`, `drivers`, `none` | `auto` | Browser autostart behavior |
+| `--chrome-launch-on-start` | `true`, `false` | `true` | Auto-launch Chrome when server starts |
+| `--chrome-binary-path` | `<path>` | — | Path to Chrome/Chromium binary |
+| `--chrome-debugger-address` | `<host:port>` | — | Connect to an existing Chrome DevTools Protocol endpoint |
+| `--chrome-user-data-dir` | `<path>` | — | Chrome user data directory (profile) |
+### Session Capture
-Set only if you want overrides:
+| Flag | Values | Default | Description |
+|------|--------|---------|-------------|
+| `--enable-session-capture` | `true`, `false` | `true` | Record screenshots after each tool call |
+| `--session-capture-dir` | `<path>` | `/tmp/slimbrowser-mcp-playbacks` | Output directory for capture frames |
+---
+## Remote Chrome
+SlimBrowser supports connecting to **remote Chrome/Chromium instances** — run the browser on a remote server, Docker container, CI runner, or cloud VM, and control it from anywhere.
+### Connect to a Remote Chrome Instance
+Use the `--chrome-debugger-address` flag to point SlimBrowser at a Chrome instance exposing its DevTools protocol:
+```bash
+npx -y slimbrowser-mcp@latest \
+  --chrome-debugger-address=192.168.1.100:9222
+```
+### Launch Chrome with Remote Debugging
+On the remote machine, start Chrome with debugging enabled:
+```bash
+# Linux / macOS
+google-chrome \
+  --remote-debugging-port=9222 \
+  --remote-debugging-address=0.0.0.0 \
+  --no-first-run \
+  --no-default-browser-check \
+  --user-data-dir=/tmp/chrome-remote
+# Docker
+docker run -d \
+  -p 9222:9222 \
+  --name chrome-remote \
+  chromedp/headless-shell:latest \
+  --remote-debugging-address=0.0.0.0 \
+  --remote-debugging-port=9222
+```
+### Client Configuration for Remote Chrome
+**CLI agents (Codex, Claude Code, Gemini CLI):**
+```bash
+codex mcp add slimbrowser -- \
+  npx -y slimbrowser-mcp@latest \
+  --chrome-debugger-address=your-server:9222
+```
+**JSON-based clients (Claude Desktop, Cursor, Windsurf):**
+```json
+{
+  "mcpServers": {
+    "slimbrowser": {
+      "command": "npx",
+      "args": [
+        "-y",
+        "slimbrowser-mcp@latest",
+        "--chrome-debugger-address=your-server:9222"
+      ]
+    }
+  }
+}
+```
+### Custom Chrome Binary & Profile
+For environments with non-standard Chrome installations:
 ```bash
-# MCP transport/runtime
-MCP_TRANSPORT=stdio
-MCP_RUNTIME_MODE=embedded
+npx -y slimbrowser-mcp@latest \
+  --chrome-binary-path=/opt/google/chrome/chrome \
+  --chrome-user-data-dir=/home/user/.config/chromium
+```
+### Common Remote Scenarios
+| Scenario | Flags |
+|----------|-------|
+| Chrome on another machine | `--chrome-debugger-address=<host>:9222` |
+| Chrome in Docker | `--chrome-debugger-address=localhost:9222` (with `-p 9222:9222`) |
+| Chrome on a cloud VM | `--chrome-debugger-address=<vm-ip>:9222` (ensure port is open) |
+| Skip auto-launch, use existing Chrome | `--chrome-launch-on-start=false --chrome-debugger-address=localhost:9222` |
+| Custom Chrome binary | `--chrome-binary-path=/path/to/chrome` |
+| Custom Chrome profile | `--chrome-user-data-dir=/path/to/profile` |
+---
+## MCP Tool Catalog
+SlimBrowser exposes **42 tools** organized into 9 categories. Every tool follows consistent conventions:
+- `session_id` is **always optional** — omit it to use the default session
+- `tab_id` is **optional** — omit it to target the active tab
+- `timeout_ms` is **optional** — override the default wait timeout
+- Tools marked **read-only** make no mutations; tools marked **idempotent** are safe to retry
+---
+### Session Management
+Create, inspect, and close browser sessions with optional session capture for full playback.
+| Tool | Description | Key Parameters |
+|------|-------------|----------------|
+| `browser_create_session` | Create a new browser session | `tenant_id?`, `profile?`, `browser?`, `policy?` |
+| `browser_get_session` | Get session details | `session_id?` |
+| `browser_close_session` | Close a session (auto-finalizes capture) | `session_id?`, `final_feedback?` |
+| `browser_set_default_session` | Set the default session for this client | `session_id` |
+| `browser_get_session_capture` | Get capture/playback metadata | `session_id?` |
+| `browser_finalize_session_capture` | Finalize recording and generate playback HTML | `session_id?`, `final_feedback?` |
+<details>
+<summary><strong>Session profiles & policies</strong></summary>
-# Chrome mode
-CHROME_HEADLESS=false
+**Profiles** control default behavior presets:
-# Session capture
-MCP_ENABLE_SESSION_CAPTURE=true
-MCP_SESSION_CAPTURE_DIR=/tmp/slimbrowser-mcp-playbacks
+| Profile | Use Case |
+|---------|----------|
+| `agent` | General-purpose AI agent browsing (default) |
+| `test` | E2E test automation |
+| `security` | Security auditing and passive crawling |
+| `scrape` | Data extraction and scraping |
+**Policies** control security boundaries:
+| Policy | Behavior |
+|--------|----------|
+| `strict` | Restricted navigation, no script eval |
+| `configurable` | Balanced defaults with overrides (default) |
+| `unrestricted` | Full access, no guardrails |
+</details>
+<details>
+<summary><strong>Session capture output</strong></summary>
+When session capture is enabled, each tool call produces a frame:
+```
+/tmp/slimbrowser-mcp-playbacks/
+  {session_id}/
+    playback/
+      001_browser_navigate.png
+      002_browser_click.png
+      003_browser_type.png
+      ...
+    manifest.json
+    playback.html          ← open this to replay visually
 ```
-Config resolution order:
+</details>
+---
+### Observation & Navigation
+Read page state and navigate between URLs. The observation system is the core of SlimBrowser's token efficiency.
+| Tool | Description | Key Parameters | Traits |
+|------|-------------|----------------|--------|
+| `browser_observe` | Read current page state | `mode?`, `region?` | Read-only, Idempotent |
+| `browser_snapshot` | Capture a point-in-time observation | `mode?`, `region?` | Read-only, Idempotent |
+| `browser_navigate` | Navigate to a URL | `url`, `timeout_ms?` | |
+| `browser_back` | Go back in history | `timeout_ms?` | |
+| `browser_forward` | Go forward in history | `timeout_ms?` | |
+| `browser_reload` | Reload current page | `timeout_ms?` | |
+<details>
+<summary><strong>Observation modes — the core of token efficiency</strong></summary>
+The observation system is what makes SlimBrowser dramatically more token-efficient than alternatives. Instead of dumping the entire DOM, you choose exactly how much detail you need:
+| Mode | Token Cost | What You Get | When to Use |
+|------|-----------|--------------|-------------|
+| `SUMMARY` | **~50-200 tokens** | Page title, URL, interactable elements with compact EIDs, state hints | Default — use for 90% of interactions |
+| `REGION` | **~200-500 tokens** | Focused observation of a specific page region | When you need detail on a specific area |
+| `FULL` | **~500-2000 tokens** | Complete page content with all elements | Last resort when SUMMARY isn't enough |
+Compare this to other tools that send **5,000-30,000 tokens per page observation** regardless of what you need.
+The observation payload includes:
+- **Interactables** — clickable/typeable elements with compact `eid`, `role`, `name`, `hint`, `state`, and `bbox`
+- **Delta** — only what changed since the last observation (subsequent calls are even cheaper)
+- **Budget** — remaining token/screenshot/retry budgets to prevent runaway context growth
+**Pro tip:** Start with `SUMMARY` mode. Only escalate to `REGION` or `FULL` if the agent needs more context. This single practice can cut your token usage by 80%+ compared to full-DOM approaches.
+</details>
+---
+### Interaction
+Click, type, select, scroll, and wait. All interaction tools return a normalized action response.
+| Tool | Description | Key Parameters |
+|------|-------------|----------------|
+| `browser_click` | Click an element by EID | `eid`, `timeout_ms?` |
+| `browser_type` | Type text into an element | `eid`, `text`, `timeout_ms?` |
+| `browser_select` | Select a dropdown option | `eid`, `option`, `timeout_ms?` |
+| `browser_scroll` | Scroll the page | `dx?`, `dy?`, `timeout_ms?` |
+| `browser_wait_for` | Wait for a condition to be true | `predicate`, `timeout_ms?` |
+<details>
+<summary><strong>Action response format</strong></summary>
+Every interaction tool returns:
+```json
+{
+  "action_id": "act_abc123",
+  "status": "OK | FAILED | AMBIGUOUS | BLOCKED",
+  "message": "Human-readable result",
+  "evidence": [],
+  "delta": { "...changes since last observation..." },
+  "budget": { "...remaining budgets..." }
+}
+```
+| Status | Meaning |
+|--------|---------|
+| `OK` | Action completed successfully |
+| `FAILED` | Action could not be performed |
+| `AMBIGUOUS` | Multiple matches found — refine your selector |
+| `BLOCKED` | Action blocked by policy or page state |
+</details>
+---
+### Tab Management
+Open, switch, pin, and close tabs. Full multi-tab orchestration for complex workflows.
+| Tool | Description | Key Parameters | Traits |
+|------|-------------|----------------|--------|
+| `browser_open_tab` | Open a new tab | `url?` (defaults to `about:blank`) | |
+| `browser_list_tabs` | List all tabs in session | | Read-only, Idempotent |
+| `browser_switch_tab` | Switch the active tab | `tab_id` | |
+| `browser_close_tab` | Close a tab | `tab_id` | |
+| `browser_pin_tab` | Pin or unpin a tab | `tab_id`, `pinned?` | |
+---
+### Macro & Discovery Tools
+High-level tools that combine multiple primitives for common operations.
-1. `--config <path>`
-2. `MCP_CONFIG=<path>`
-3. `./configs/mcp.toml`
-4. `./mcp.toml`
-5. `$HOME/.config/slimbrowser-mcp/mcp.toml`
-6. Built-in defaults
+| Tool | Description | Key Parameters |
+|------|-------------|----------------|
+| `browser_fill_form` | Fill multiple form fields at once | `fields` (object: `label -> value`) |
+| `browser_click_by_text` | Click the first element matching text | `text`, `timeout_ms?` |
+| `browser_type_by_label` | Type into an input matching a label | `label`, `text`, `timeout_ms?` |
+| `browser_find_interactables` | Search for interactive elements | `query` |
+<details>
+<summary><strong>Example: fill a login form in one call</strong></summary>
+```json
+{
+  "tool": "browser_fill_form",
+  "arguments": {
+    "fields": {
+      "Email": "user@example.com",
+      "Password": "••••••••"
+    }
+  }
+}
+```
+This finds each field by label and types the value — replacing what would be 4+ separate tool calls.
+</details>
+---
+### Data Extraction & Artifacts
+Extract structured data from pages, capture screenshots, and manage session artifacts.
+| Tool | Description | Key Parameters | Traits |
+|------|-------------|----------------|--------|
+| `browser_extract` | Extract structured data using a JSON schema | `schema?` | Read-only, Idempotent |
+| `browser_screenshot` | Capture a screenshot | `output_dir?`, `file_name?` | Idempotent |
+| `browser_list_artifacts` | List all session artifacts | | Read-only, Idempotent |
+<details>
+<summary><strong>Screenshot output</strong></summary>
+```json
+{
+  "artifact": {
+    "id": "art_xyz",
+    "path": "/internal/path/screenshot.png",
+    "size_bytes": 45320,
+    "created_at": "2025-01-15T10:30:00Z"
+  },
+  "artifact_path": "/internal/path/screenshot.png",
+  "saved_path": "/tmp/my-dir/screenshot.png",
+  "exported_path": "/tmp/my-dir/screenshot.png"
+}
+```
+</details>
+---
+### Diagnostics & Performance
+Inspect console logs, network traffic, dialogs, and performance metrics — all through paginated, structured APIs.
+| Tool | Description | Key Parameters | Traits |
+|------|-------------|----------------|--------|
+| `browser_evaluate_script` | Execute JavaScript in the page | `script`, `args?` | |
+| `browser_list_console_messages` | Read console logs (paginated) | `cursor?`, `limit?` | Read-only, Idempotent |
+| `browser_get_console_message` | Get a single console message | `message_id` | Read-only, Idempotent |
+| `browser_list_network_requests` | Read network requests (paginated) | `cursor?`, `limit?` | Read-only, Idempotent |
+| `browser_get_network_request` | Get a single network request | `request_id` | Read-only, Idempotent |
+| `browser_list_dialogs` | Read alert/confirm/prompt dialogs | `cursor?`, `limit?` | Read-only, Idempotent |
+| `browser_handle_dialog` | Accept or dismiss a dialog | `dialog_id`, `action?`, `text?` | |
+| `browser_get_performance` | Get navigation & resource timing | | Read-only, Idempotent |
+---
+### Tracing & Replay
+Record execution traces and replay them for debugging and reproducibility.
+| Tool | Description | Traits |
+|------|-------------|--------|
+| `browser_get_trace` | Get all trace events for a session | Read-only, Idempotent |
+| `browser_replay_trace` | Replay recorded trace events | Idempotent |
+---
+### Capability & Mode
+Query and configure runtime capabilities.
+| Tool | Description | Key Parameters | Traits |
+|------|-------------|----------------|--------|
+| `browser_capabilities` | Get supported browsers, modes, and features | | Read-only, Idempotent |
+| `browser_get_headless_mode` | Check current headless/headed mode | | Read-only, Idempotent |
+| `browser_set_headless_mode` | Switch between headless and headed | `headless` (boolean) | |
+---
 ## Troubleshooting
-- If `/mcp` shows `connecting` or `failed`, run: `npx -y slimbrowser-mcp doctor`
-- If Chrome cannot be controlled, confirm Chrome is installed and retry.
-- If a tool fails, use `browser_get_trace` and `browser_list_artifacts` for debugging evidence.
+**Server won't connect or shows `connecting` / `failed`:**
+```bash
+npx -y slimbrowser-mcp@latest doctor
+```
+**Chrome not found:**
+- Ensure Chrome/Chromium is installed locally, or
+- Point to it with `--chrome-binary-path=/path/to/chrome`
+**Remote Chrome not reachable:**
+- Verify the remote Chrome was started with `--remote-debugging-port=9222 --remote-debugging-address=0.0.0.0`
+- Check firewall rules allow traffic on the debugging port
+- Test connectivity: `curl http://<host>:9222/json/version`
+**Action failures:**
+- Use `browser_get_trace` to inspect the execution trace
+- Use `browser_list_artifacts` to find screenshots and other evidence
+- Use the `triage_failed_action_with_artifacts` prompt for AI-assisted diagnosis
+**Session capture not working:**
+- Ensure `--enable-session-capture=true` is set
+- Check that `--session-capture-dir` points to a writable directory
+---
 ## License

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "slimbrowser-mcp",
-  "version": "0.1.1",
+  "version": "0.1.4",
   "description": "npx bootstrap launcher for slimbrowser-mcp",
   "publishConfig": {
     "access": "public"