openchrome-mcp 1.12.0 → 1.12.2
- package/README.ko.md +244 -0
- package/README.md +122 -681
- package/assets/chart-tokens.svg +10 -10
- package/dist/cdp/client.d.ts +9 -0
- package/dist/cdp/client.d.ts.map +1 -1
- package/dist/cdp/client.js +37 -3
- package/dist/cdp/client.js.map +1 -1
- package/dist/chrome/launcher.js.map +1 -1
- package/dist/core/task-ledger/types.d.ts +19 -0
- package/dist/core/task-ledger/types.d.ts.map +1 -1
- package/dist/index.js +5 -0
- package/dist/index.js.map +1 -1
- package/dist/mcp-server.d.ts.map +1 -1
- package/dist/mcp-server.js +5 -0
- package/dist/mcp-server.js.map +1 -1
- package/dist/session-snapshot-policy.d.ts.map +1 -1
- package/dist/session-snapshot-policy.js +1 -0
- package/dist/session-snapshot-policy.js.map +1 -1
- package/dist/tools/act.d.ts.map +1 -1
- package/dist/tools/act.js.map +1 -1
- package/dist/tools/index.d.ts.map +1 -1
- package/dist/tools/index.js +9 -0
- package/dist/tools/index.js.map +1 -1
- package/dist/tools/interact.d.ts.map +1 -1
- package/dist/tools/interact.js +12 -2
- package/dist/tools/interact.js.map +1 -1
- package/dist/tools/navigate.d.ts.map +1 -1
- package/dist/tools/navigate.js +31 -3
- package/dist/tools/navigate.js.map +1 -1
- package/dist/tools/oc-lane.js +4 -4
- package/dist/tools/oc-lane.js.map +1 -1
- package/dist/tools/oc-skill-export.js +1 -1
- package/dist/tools/oc-task-get.d.ts.map +1 -1
- package/dist/tools/oc-task-get.js +1 -0
- package/dist/tools/oc-task-get.js.map +1 -1
- package/dist/tools/query-dom.d.ts.map +1 -1
- package/dist/tools/query-dom.js +0 -2
- package/dist/tools/query-dom.js.map +1 -1
- package/dist/tools/read-page.d.ts.map +1 -1
- package/dist/tools/read-page.js +12 -2
- package/dist/tools/read-page.js.map +1 -1
- package/dist/types/tool-annotations.d.ts +5 -0
- package/dist/types/tool-annotations.d.ts.map +1 -1
- package/dist/types/tool-annotations.js +5 -0
- package/dist/types/tool-annotations.js.map +1 -1
- package/dist/watchdog/chrome-readiness.d.ts +9 -1
- package/dist/watchdog/chrome-readiness.d.ts.map +1 -1
- package/dist/watchdog/chrome-readiness.js +7 -1
- package/dist/watchdog/chrome-readiness.js.map +1 -1
- package/package.json +12 -1
package/README.md
CHANGED
|
@@ -6,241 +6,66 @@
|
|
|
6
6
|
|
|
7
7
|
<p align="center">
|
|
8
8
|
<b>Harness-Engineered Browser Automation</b><br>
|
|
9
|
-
The MCP server that guides AI agents.
|
|
9
|
+
The MCP server that drives and guides AI agents through a real Chrome.
|
|
10
10
|
</p>
|
|
11
11
|
|
|
12
12
|
<p align="center">
|
|
13
13
|
<a href="https://www.npmjs.com/package/openchrome-mcp"><img src="https://img.shields.io/npm/v/openchrome-mcp" alt="npm"></a>
|
|
14
14
|
<a href="https://github.com/shaun0927/openchrome/releases/latest"><img src="https://img.shields.io/github/v/release/shaun0927/openchrome" alt="Latest Release"></a>
|
|
15
|
-
<a href="https://github.com/shaun0927/openchrome/releases/latest"><img src="https://img.shields.io/github/release-date/shaun0927/openchrome" alt="Release Date"></a>
|
|
16
15
|
<a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="MIT"></a>
|
|
17
|
-
<a href="https://mseep.ai/app/shaun0927-openchrome"><img src="assets/badges/mseep.png" alt="Listed on MseeP.ai" height="20"></a>
|
|
18
16
|
</p>
|
|
19
17
|
|
|
20
18
|
<p align="center">
|
|
21
|
-
<
|
|
19
|
+
<b>English</b> · <a href="README.ko.md">한국어</a>
|
|
22
20
|
</p>
|
|
23
21
|
|
|
24
|
-
<p align="center">
|
|
25
|
-
<img src="assets/chart-tokens.svg" alt="Token Efficiency: OpenChrome vs Playwright" width="100%">
|
|
26
|
-
</p>
|
|
27
|
-
|
|
28
|
-
### How OpenChrome compares
|
|
29
|
-
|
|
30
|
-
| | OpenChrome | Playwright MCP | Chrome DevTools MCP | Vercel agent-browser |
|
|
31
|
-
|---|:---:|:---:|:---:|:---:|
|
|
32
|
-
| **Architecture** | MCP → CDP (direct) | MCP → Playwright → CDP | MCP → Puppeteer → CDP | CLI → Daemon → Playwright → CDP |
|
|
33
|
-
| **RAM (20 parallel)** | **~300 MB** | ~5 GB+ | impractical | impractical |
|
|
34
|
-
| **Bot detection** | **invisible** (real Chrome) | detected (TLS fingerprint) | detected (CDP signals) | detected (local) / cloud only |
|
|
35
|
-
| **Chrome login reuse** | **built-in** | extension mode only | manual | manual state files |
|
|
36
|
-
| **LLM hang prevention** | **hint engine** (30+ rules) | none | none | error rewrite (5 patterns) |
|
|
37
|
-
| **Reliability mechanisms** | **49** (8-layer defense) | ~3 | ~3 | ~5 |
|
|
38
|
-
| **Token compression** | **15x** (DOM serializer) | none | none | none |
|
|
39
|
-
| **Outcome classification** | **yes** (DOM delta) | none | none | none |
|
|
40
|
-
| **Cross-session learning** | **yes** (domain memory) | none | none | none |
|
|
41
|
-
| **Circuit breaker** | **3-level** | none | none | none |
|
|
42
|
-
| **Shadow DOM** | **all types** (open + closed) | open only | invisible | invisible |
|
|
43
|
-
| **MCP native** | **yes** | yes | yes | no (CLI only) |
|
|
44
|
-
| **Parallel sessions** | **1 Chrome, N tabs** | N browsers | manual tabs | N daemons |
|
|
45
|
-
|
|
46
|
-
> **tl;dr** — OpenChrome talks directly to Chrome via CDP with zero middleware, reuses your real login sessions, and is the only browser MCP server with **harness engineering** — 27 intelligent subsystems that guide, protect, and optimize the AI agent at every step.
|
|
47
|
-
|
|
48
22
|
---
|
|
49
23
|
|
|
50
|
-
## What is
|
|
51
|
-
|
|
52
|
-
Imagine **20+ parallel Playwright sessions** — but already logged in to everything, invisible to bot detection, and sharing one Chrome process at 300MB. That's OpenChrome.
|
|
53
|
-
|
|
54
|
-
Search across 20 sites simultaneously. Crawl authenticated dashboards in seconds. Debug production UIs with real user sessions. Connect to [OpenClaw](https://github.com/openclaw/openclaw) and give your AI agent browser superpowers across Telegram, Discord, or any chat platform.
|
|
55
|
-
|
|
56
|
-
```
|
|
57
|
-
You: oc compare "AirPods Pro" prices across Amazon, eBay, Walmart,
|
|
58
|
-
Best Buy, Target, Costco, B&H, Newegg — find the lowest
|
|
59
|
-
|
|
60
|
-
AI: [8 parallel workers, all sites simultaneously]
|
|
61
|
-
Best Buy: $179 ← lowest (sale)
|
|
62
|
-
Amazon: $189
|
|
63
|
-
Costco: $194 (members)
|
|
64
|
-
...
|
|
65
|
-
Time: 2.8s | All prices from live pages, already logged in.
|
|
66
|
-
```
|
|
67
|
-
|
|
68
|
-
| | Traditional | OpenChrome |
|
|
69
|
-
|---|:---:|:---:|
|
|
70
|
-
| **5-site task** | ~250s (login each) | **~3s** (parallel) |
|
|
71
|
-
| **Memory** | ~2.5 GB (5 browsers) | **~300 MB** (1 Chrome) |
|
|
72
|
-
| **Auth** | Every time | **Never** |
|
|
73
|
-
| **Bot detection** | Flagged | **Invisible** |
|
|
74
|
-
|
|
75
|
-
---
|
|
76
|
-
|
|
77
|
-
## Harness-Engineered, Not Just Automated
|
|
78
|
-
|
|
79
|
-
Traditional browser automation exposes raw APIs. When the AI agent fails, it's on its own — burning tokens guessing, retrying, and wandering. **Harness engineering** means the tool itself wraps intelligence around those APIs: preventing mistakes, recovering from errors, and guiding the agent toward efficient behavior.
|
|
24
|
+
## What it is
|
|
80
25
|
|
|
81
|
-
|
|
26
|
+
OpenChrome is an **MCP server** that controls your real, already-logged-in Chrome
|
|
27
|
+
through the CDP — no middleware, no separate browser, no re-authentication.
|
|
28
|
+
One Chrome process, many isolated tabs, ~300 MB for 20 parallel lanes.
|
|
82
29
|
|
|
83
|
-
|
|
84
|
-
|
|
85
|
-
|
|
86
|
-
|
|
87
|
-
navigate 2s
|
|
88
|
-
⚡ bot detection LLM thinks... 12s → retry with UA
|
|
89
|
-
⚡ CAPTCHA LLM thinks... 10s → stuck, skip
|
|
90
|
-
navigate to login 2s
|
|
91
|
-
⚡ no session LLM thinks... 12s → fill credentials
|
|
92
|
-
2FA prompt LLM thinks... 10s → stuck
|
|
93
|
-
...
|
|
94
|
-
finally reaches product after ~20 LLM calls, ~4 minutes
|
|
95
|
-
|
|
96
|
-
× 5 sites, sequential = ~100 LLM calls, ~20 minutes, ~$2.00
|
|
97
|
-
|
|
98
|
-
Actual work: 5 calls. Wasted on wandering: 95 calls.
|
|
99
|
-
```
|
|
100
|
-
|
|
101
|
-
OpenChrome eliminates this entirely — your Chrome is already logged in, and the hint engine corrects mistakes before they cascade:
|
|
30
|
+
It is **harness-engineered**: the server doesn't just expose browser APIs, it wraps
|
|
31
|
+
them with a hint engine, a circuit breaker, an automatic-recovery runtime, and
|
|
32
|
+
token-efficient page serialization — so the agent makes fewer mistakes, recovers
|
|
33
|
+
without "thinking", and burns far fewer tokens.
|
|
102
34
|
|
|
103
35
|
```
|
|
104
|
-
|
|
105
|
-
|
|
106
|
-
All 5 sites in parallel:
|
|
107
|
-
navigate (already authenticated) 1s
|
|
108
|
-
read prices 2s
|
|
109
|
-
⚡ stale ref on one site
|
|
110
|
-
└─ Hint: "Use read_page for fresh refs" ← no guessing
|
|
111
|
-
read_page → done 1s
|
|
112
|
-
|
|
113
|
-
= ~20 LLM calls, ~15 seconds, ~$0.40
|
|
114
|
-
```
|
|
115
|
-
|
|
116
|
-
The hint engine watches every tool call across 9 categories — error recovery, blocking page detection, composite suggestions, repetition detection, sequence detection, pagination detection, learned patterns, success guidance, and setup hints. When it sees the same error→recovery pattern 3+ times, it promotes it to a permanent rule across sessions via the Pattern Learner.
|
|
117
|
-
|
|
118
|
-
| | Playwright | OpenChrome | Savings |
|
|
119
|
-
|---|---|---|---|
|
|
120
|
-
| **LLM calls** | ~100 | ~20 | **80% fewer** |
|
|
121
|
-
| **Wall time** | ~20 min | ~15 sec | **80x faster** |
|
|
122
|
-
| **Token cost** | ~$2.00 | ~$0.40 | **5x cheaper** |
|
|
123
|
-
| **Wasted calls** | ~95% | ~0% | |
|
|
124
|
-
|
|
125
|
-
### 27 Harness Features Across 7 Categories
|
|
126
|
-
|
|
127
|
-
OpenChrome isn't just a browser API — it's an intelligent harness with 27 subsystems that work together:
|
|
128
|
-
|
|
129
|
-
| Category | Key Features | What It Does |
|
|
130
|
-
|----------|-------------|--------------|
|
|
131
|
-
| **Guidance** | Hint Engine (30+ rules, 9 types), Progress Tracker, Usage Guide | Prevents mistakes before they cascade |
|
|
132
|
-
| **Resilience** | Ralph Engine (7-strategy waterfall), Auto-Reconnect, Ref Self-Healing | Recovers from failures automatically |
|
|
133
|
-
| **Protection** | 3-Level Circuit Breaker, Rate Limiter, Domain Guard | Stops runaway token waste |
|
|
134
|
-
| **Feedback** | Outcome Classifier, DOM Delta, Visual Summary, Hit Detection | Reports what *actually* happened |
|
|
135
|
-
| **Learning** | Pattern Learner, Strategy Learner, Domain Memory | Gets smarter across sessions |
|
|
136
|
-
| **Optimization** | DOM Mode (15x compression), Adaptive Screenshot, Snapshot Delta | Minimizes token consumption |
|
|
137
|
-
| **Detection** | Auth Redirect Detection, Blocking Page, Pagination Detector | Identifies situations early |
|
|
138
|
-
|
|
139
|
-
<details>
|
|
140
|
-
<summary>Feature highlights</summary>
|
|
141
|
-
|
|
142
|
-
**Hint Engine** — 30+ rules across 9 categories (error recovery, blocking page detection, repetition loops, pagination, composite suggestions, sequence optimization, learned patterns, success guidance, setup hints). Escalates from `info` → `warning` → `critical` as patterns repeat. The Progress Tracker detects stuck agents within 3-5 tool calls.
|
|
143
|
-
|
|
144
|
-
**Ralph Engine** — When an interaction fails, Ralph automatically tries 7 strategies in sequence: AX tree click → CSS discovery → CDP coordinate dispatch → JS injection → Keyboard navigation → Raw CDP mouse events → Human-in-the-loop escalation. Each attempt is classified by the Outcome Classifier (SUCCESS / SILENT_CLICK / WRONG_ELEMENT).
|
|
145
|
-
|
|
146
|
-
**3-Level Circuit Breaker** — Element level (3 failures → skip, 2min reset), Page level (5 distinct failures → suggest reload), Global level (10 failures in 5min → pause all). Prevents agents from burning tokens on permanently broken elements.
|
|
147
|
-
|
|
148
|
-
**Pattern Learner** — When a hint rule misses, the learner observes the next 3 tool calls. If a different tool succeeds, it records the error→recovery correlation. After 3 occurrences at 60%+ confidence, it promotes the pattern to a permanent rule that fires in future sessions.
|
|
149
|
-
|
|
150
|
-
**DOM Mode** — Serializes the full DOM into a compact text format: strips SCRIPT/STYLE/SVG, keeps only 18 actionable attributes, deduplicates repetitive siblings, collapses nested wrapper chains. **Benchmarked: ~12K tokens vs ~180K tokens** for the same page (15x compression).
|
|
151
|
-
|
|
152
|
-
</details>
|
|
153
|
-
|
|
154
|
-
---
|
|
155
|
-
|
|
156
|
-
## Desktop App (Beta)
|
|
157
|
-
|
|
158
|
-
<p align="center">
|
|
159
|
-
<img src="https://img.shields.io/badge/macOS-Apple%20Silicon%20%7C%20Intel-black?logo=apple" alt="macOS">
|
|
160
|
-
<img src="https://img.shields.io/badge/Windows-x64-0078d4?logo=windows" alt="Windows">
|
|
161
|
-
<img src="https://img.shields.io/badge/Linux-x86__64-FCC624?logo=linux&logoColor=black" alt="Linux">
|
|
162
|
-
</p>
|
|
163
|
-
|
|
164
|
-
OpenChrome is also available as a **desktop app** — a one-click installer that runs the MCP server locally without requiring Node.js, npm, or any command-line setup. Designed for non-developers who want browser automation without the terminal.
|
|
36
|
+
You: compare "AirPods Pro" prices across Amazon, eBay, Walmart, Best Buy
|
|
165
37
|
|
|
166
|
-
|
|
167
|
-
|
|
168
|
-
|
|
169
|
-
|
|
170
|
-
| Platform | Download |
|
|
171
|
-
|----------|----------|
|
|
172
|
-
| macOS (Apple Silicon) | [OpenChrome_0.1.0_aarch64.dmg](https://github.com/shaun0927/openchrome/releases/download/desktop-v0.1.0/OpenChrome_0.1.0_aarch64.dmg) |
|
|
173
|
-
| macOS (Intel) | [OpenChrome_0.1.0_x64.dmg](https://github.com/shaun0927/openchrome/releases/download/desktop-v0.1.0/OpenChrome_0.1.0_x64.dmg) |
|
|
174
|
-
| Windows (EXE) | [OpenChrome_0.1.0_x64-setup.exe](https://github.com/shaun0927/openchrome/releases/download/desktop-v0.1.0/OpenChrome_0.1.0_x64-setup.exe) |
|
|
175
|
-
| Windows (MSI) | [OpenChrome_0.1.0_x64_en-US.msi](https://github.com/shaun0927/openchrome/releases/download/desktop-v0.1.0/OpenChrome_0.1.0_x64_en-US.msi) |
|
|
176
|
-
| Linux | Coming soon (deb/rpm available in [Releases](https://github.com/shaun0927/openchrome/releases/tag/desktop-v0.1.0)) |
|
|
177
|
-
|
|
178
|
-
### Get Started (non-developers)
|
|
179
|
-
|
|
180
|
-
1. **Download** the installer for your platform from the [Releases](https://github.com/shaun0927/openchrome/releases?q=desktop) page.
|
|
181
|
-
2. **Install** — open the `.dmg` / run the `.exe` installer / make the `.AppImage` executable and launch it.
|
|
182
|
-
3. **Connect** — the app starts the MCP server automatically. Point your MCP client (Claude, Cursor, etc.) to the local server address shown in the app.
|
|
183
|
-
|
|
184
|
-
### Installation Notes
|
|
185
|
-
|
|
186
|
-
**macOS:** The app is not notarized. On first launch, macOS will block it. To fix:
|
|
187
|
-
```bash
|
|
188
|
-
xattr -cr /Applications/OpenChrome.app
|
|
38
|
+
AI: [4 parallel lanes, already authenticated everywhere]
|
|
39
|
+
Best Buy $179 · Amazon $189 · Walmart $185 · eBay $172
|
|
40
|
+
2.4s — live pages, past bot detection
|
|
189
41
|
```
|
|
190
|
-
Or right-click the app → Open → Open.
|
|
191
|
-
|
|
192
|
-
**Windows:** SmartScreen will show "Windows protected your PC". Click "More info" → "Run anyway".
|
|
193
42
|
|
|
194
|
-
|
|
195
|
-
|
|
196
|
-
|
|
43
|
+
| | Traditional (Playwright et al.) | OpenChrome |
|
|
44
|
+
|---|:---:|:---:|
|
|
45
|
+
| 5-site task | ~250s (login each) | **~3s** (parallel) |
|
|
46
|
+
| Memory | ~2.5 GB (5 browsers) | **~300 MB** (1 Chrome) |
|
|
47
|
+
| Re-auth | every run | **never** |
|
|
48
|
+
| Bot detection | flagged | **invisible** (real Chrome) |
|
|
197
49
|
|
|
198
50
|
---
|
|
199
51
|
|
|
200
|
-
## Quick
|
|
52
|
+
## Quick start
|
|
201
53
|
|
|
202
|
-
|
|
203
|
-
```bash
|
|
204
|
-
npm install -g openchrome-mcp
|
|
205
|
-
openchrome setup
|
|
206
|
-
```
|
|
54
|
+
Install and point your MCP client at it — one command:
|
|
207
55
|
|
|
208
|
-
**Codex CLI**
|
|
209
56
|
```bash
|
|
210
57
|
npm install -g openchrome-mcp
|
|
211
|
-
openchrome setup --client codex
|
|
212
|
-
```
|
|
213
58
|
|
|
214
|
-
|
|
215
|
-
|
|
216
|
-
npx openchrome-mcp setup --client opencode
|
|
59
|
+
openchrome setup # Claude Code
|
|
60
|
+
openchrome setup --client codex # Codex CLI
|
|
61
|
+
npx openchrome-mcp setup --client opencode # OpenCode
|
|
217
62
|
```
|
|
218
63
|
|
|
219
|
-
|
|
220
|
-
Restart your MCP client after setup completes.
|
|
64
|
+
Restart your MCP client. That's it — Chrome auto-launches on first tool call.
|
|
221
65
|
|
|
222
66
|
<details>
|
|
223
|
-
<summary>Manual config</summary>
|
|
67
|
+
<summary>Manual MCP config (Cursor / VS Code / Windsurf / others)</summary>
|
|
224
68
|
|
|
225
|
-
**Claude Code:**
|
|
226
|
-
```bash
|
|
227
|
-
claude mcp add openchrome -- openchrome serve --auto-launch
|
|
228
|
-
```
|
|
229
|
-
|
|
230
|
-
**VS Code / Copilot** (`.vscode/mcp.json`):
|
|
231
|
-
```json
|
|
232
|
-
{
|
|
233
|
-
"servers": {
|
|
234
|
-
"openchrome": {
|
|
235
|
-
"type": "stdio",
|
|
236
|
-
"command": "openchrome",
|
|
237
|
-
"args": ["serve", "--auto-launch"]
|
|
238
|
-
}
|
|
239
|
-
}
|
|
240
|
-
}
|
|
241
|
-
```
|
|
242
|
-
|
|
243
|
-
**Codex CLI** (`~/.codex/mcp.json`):
|
|
244
69
|
```json
|
|
245
70
|
{
|
|
246
71
|
"mcpServers": {
|
|
@@ -252,537 +77,154 @@ claude mcp add openchrome -- openchrome serve --auto-launch
|
|
|
252
77
|
}
|
|
253
78
|
```
|
|
254
79
|
|
|
255
|
-
|
|
256
|
-
**OpenCode** (`~/.config/opencode/opencode.json`):
|
|
257
|
-
```json
|
|
258
|
-
{
|
|
259
|
-
"$schema": "https://opencode.ai/config.json",
|
|
260
|
-
"mcp": {
|
|
261
|
-
"openchrome": {
|
|
262
|
-
"type": "local",
|
|
263
|
-
"command": ["npx", "--prefer-online", "-y", "openchrome-mcp@latest", "serve", "--auto-launch"]
|
|
264
|
-
}
|
|
265
|
-
}
|
|
266
|
-
}
|
|
267
|
-
```
|
|
268
|
-
|
|
269
|
-
**Cursor / Windsurf / Other stdio MCP clients:**
|
|
270
|
-
```json
|
|
271
|
-
{
|
|
272
|
-
"mcpServers": {
|
|
273
|
-
"openchrome": {
|
|
274
|
-
"command": "openchrome",
|
|
275
|
-
"args": ["serve", "--auto-launch"]
|
|
276
|
-
}
|
|
277
|
-
}
|
|
278
|
-
}
|
|
279
|
-
```
|
|
280
|
-
|
|
281
|
-
To update the CLI later and refresh your MCP client configuration, run:
|
|
282
|
-
```bash
|
|
283
|
-
openchrome update
|
|
284
|
-
```
|
|
285
|
-
|
|
286
|
-
</details>
|
|
287
|
-
|
|
288
|
-
---
|
|
289
|
-
|
|
290
|
-
## What's new in v1.11
|
|
291
|
-
|
|
292
|
-
v1.11 establishes a **core / pilot tier split** inside the same npm package. Existing v1.10 configurations work bit-identically; new harness capabilities are additive.
|
|
293
|
-
|
|
294
|
-
**Core tier** (active by default, no flag):
|
|
295
|
-
|
|
296
|
-
- **Trace recorder** — JSONL session capture with credential redactor
|
|
297
|
-
- **Skill state graph** — JSON-per-domain graph storage + `openchrome://skill-graph/<domain>` MCP resource
|
|
298
|
-
- **Skill memory** — JSON store + audit-log-backed stats; `oc_skill_record`, `oc_skill_recall` MCP tools
|
|
299
|
-
- **Outcome contracts** — DSL + evaluators (DOM, network, screenshot-class, perceptual hash); `oc_assert`, `oc_evidence_bundle` MCP tools
|
|
300
|
-
- **Perception primitives** — perceptual DOM metadata + DOM↔screenshot cross-check (Sobel + color)
|
|
301
|
-
|
|
302
|
-
**Pilot tier** (opt-in via `--pilot`):
|
|
303
|
-
|
|
304
|
-
```bash
|
|
305
|
-
openchrome serve --pilot
|
|
306
|
-
```
|
|
307
|
-
|
|
308
|
-
Adds: contract runtime (retry + idempotency + `beforeIrreversibleAction`), handoff token + AES-256-GCM persistence (ephemeral key default), multi-model voting framework (deterministic voters), skill curator (extractor + recall ranking + structural merge + PID lock + background runner).
|
|
309
|
-
|
|
310
|
-
Pilot families are individually gated by `OPENCHROME_TRACE`, `OPENCHROME_STATE_GRAPH`, `OPENCHROME_CONTRACT_RUNTIME`, `OPENCHROME_HANDOFF_PERSIST`, `OPENCHROME_PERCEPTION_VOTING`, `OPENCHROME_SKILL_CURATOR`. All default *active* inside `--pilot`; set the env to `0` to turn one family off.
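The gating rule in the removed paragraph above (every pilot family active inside `--pilot` unless its variable is set to `0`) can be sketched as below. This is a minimal illustration, not the package's code: `resolvePilotFamilies` is a hypothetical name, and the sketch only models a process that was already started with `--pilot`; what these variables do without that flag is not stated in the README and is not modelled here.

```typescript
// Environment variable names are taken from the README text; everything else
// in this block is a hypothetical illustration.
const PILOT_FAMILY_ENV_VARS = [
  "OPENCHROME_TRACE",
  "OPENCHROME_STATE_GRAPH",
  "OPENCHROME_CONTRACT_RUNTIME",
  "OPENCHROME_HANDOFF_PERSIST",
  "OPENCHROME_PERCEPTION_VOTING",
  "OPENCHROME_SKILL_CURATOR",
] as const;

type PilotFamilyEnvVar = (typeof PILOT_FAMILY_ENV_VARS)[number];

// Assumes the server was started with --pilot. A family is on by default and
// is switched off only when its variable is exactly the string "0".
function resolvePilotFamilies(
  env: Record<string, string | undefined>,
): Record<PilotFamilyEnvVar, boolean> {
  const result = {} as Record<PilotFamilyEnvVar, boolean>;
  for (const name of PILOT_FAMILY_ENV_VARS) {
    result[name] = env[name] !== "0";
  }
  return result;
}

// Example: turn off only the skill curator family.
const families = resolvePilotFamilies({ OPENCHROME_SKILL_CURATOR: "0" });
console.log(families);
```

In a Node.js process the same function could be called with `process.env`; treating any value other than `"0"` as "on" is an assumption of this sketch.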
|
|
311
|
-
|
|
312
|
-
**Design contract**: portability-harness policy at [`docs/roadmap/portability-harness-contract.md`](docs/roadmap/portability-harness-contract.md). Architecture overview at [`docs/architecture.md`](docs/architecture.md). End-to-end walkthrough at [`docs/getting-started.md`](docs/getting-started.md). Full v1.11 release notes at [`docs/releases/v1.11.1.md`](docs/releases/v1.11.1.md).
|
|
313
|
-
|
|
314
|
-
---
|
|
315
|
-
|
|
316
|
-
## Examples
|
|
317
|
-
|
|
318
|
-
**Parallel monitoring:**
|
|
319
|
-
```
|
|
320
|
-
oc screenshot AWS billing, GCP console, Stripe, and Datadog — all at once
|
|
321
|
-
→ 4 workers, 3.1s, already authenticated everywhere
|
|
322
|
-
```
|
|
323
|
-
|
|
324
|
-
**Multi-account:**
|
|
325
|
-
```
|
|
326
|
-
oc check orders on personal and business Amazon accounts simultaneously
|
|
327
|
-
→ 2 workers, isolated sessions, same site different accounts
|
|
328
|
-
```
|
|
329
|
-
|
|
330
|
-
**Competitive intelligence:**
|
|
331
|
-
```
|
|
332
|
-
oc compare prices for "AirPods Pro" across Amazon, eBay, Walmart, Best Buy
|
|
333
|
-
→ 4 workers, 4 sites, 2.4s, works past bot detection
|
|
334
|
-
```
|
|
335
|
-
|
|
336
|
-
---
|
|
337
|
-
|
|
338
|
-
## 50 Tools
|
|
339
|
-
|
|
340
|
-
| Category | Tools |
|
|
341
|
-
|----------|-------|
|
|
342
|
-
| **Navigate & Interact** | `navigate`, `interact`, `fill_form`, `find`, `computer` |
|
|
343
|
-
| **Read & Extract** | `read_page`, `page_content`, `javascript_tool`, `selector_query`, `xpath_query` |
|
|
344
|
-
| **Environment** | `emulate_device`, `geolocation`, `user_agent`, `network` |
|
|
345
|
-
| **Storage & Debug** | `cookies`, `storage`, `console_capture`, `performance_metrics`, `request_intercept` |
|
|
346
|
-
| **Parallel Workflows** | `workflow_init`, `workflow_collect`, `worker_create`, `batch_execute` |
|
|
347
|
-
| **Memory** | `memory_record`, `memory_query`, `memory_validate` |
|
|
348
|
-
| **Contracts & Skills** *(v1.11)* | `oc_assert`, `oc_evidence_bundle`, `oc_skill_record`, `oc_skill_recall` |
|
|
349
|
-
|
|
350
|
-
**MCP resources**: `openchrome://skill-graph/<domain>` (read-only JSON snapshot of the per-domain skill graph; v1.11).
|
|
351
|
-
|
|
352
|
-
<details>
|
|
353
|
-
<summary>Full tool list (50)</summary>
|
|
354
|
-
|
|
355
|
-
`navigate` `interact` `computer` `read_page` `find` `form_input` `fill_form` `javascript_tool` `page_reload` `page_content` `page_pdf` `wait_for` `user_agent` `geolocation` `emulate_device` `network` `selector_query` `xpath_query` `cookies` `storage` `console_capture` `performance_metrics` `request_intercept` `drag_drop` `file_upload` `http_auth` `worker_create` `worker_list` `worker_update` `worker_complete` `worker_delete` `tabs_create` `tabs_context` `tabs_close` `workflow_init` `workflow_status` `workflow_collect` `workflow_collect_partial` `workflow_cleanup` `execute_plan` `batch_execute` `lightweight_scroll` `memory_record` `memory_query` `memory_validate` `oc_stop` `oc_assert` `oc_evidence_bundle` `oc_skill_record` `oc_skill_recall`
|
|
356
|
-
|
|
357
|
-
With `--pilot` flag, additional pilot-tier tools are registered under the `oc_pilot_*` namespace (handoff, voting, curator). See [`docs/architecture.md`](docs/architecture.md).
|
|
358
|
-
|
|
80
|
+
Run `openchrome update` later to refresh the CLI and client config.
|
|
359
81
|
</details>
|
|
360
82
|
|
|
361
|
-
|
|
362
|
-
|
|
363
|
-
## CLI
|
|
364
|
-
|
|
365
|
-
```bash
|
|
366
|
-
openchrome setup # Auto-configure
|
|
367
|
-
openchrome serve --auto-launch # Start server
|
|
368
|
-
openchrome serve --pilot # Opt into pilot tier (v1.11+)
|
|
369
|
-
openchrome serve --headless-shell # Headless mode
|
|
370
|
-
openchrome doctor # Diagnose issues
|
|
371
|
-
openchrome update # Update CLI
|
|
372
|
-
```
|
|
373
|
-
|
|
374
|
-
Window placement (v1.11): `--window-size <w,h>`, `--window-position <x,y>`, `--window-bounds <x,y,w,h>`, `--start-maximized`, or set `OPENCHROME_WINDOW_*` env. Default is `0,0` + `1280×900` (previously `--start-maximized` + `1920×1080`).
|
|
375
|
-
|
|
376
|
-
---
|
|
377
|
-
|
|
378
|
-
## Cross-Platform
|
|
379
|
-
|
|
380
|
-
| Platform | Status |
|
|
381
|
-
|----------|--------|
|
|
382
|
-
| **macOS** | Full support |
|
|
383
|
-
| **Windows** | Full support (taskkill process cleanup) |
|
|
384
|
-
| **Linux** | Full support (Snap paths, `CHROME_PATH` env, `--no-sandbox` for CI) |
|
|
83
|
+
**Prefer no terminal?** A one-click [desktop app](https://github.com/shaun0927/openchrome/releases?q=desktop)
|
|
84
|
+
(macOS / Windows / Linux, beta) runs the server with no Node.js setup.
|
|
385
85
|
|
|
386
86
|
---
|
|
387
87
|
|
|
388
|
-
##
|
|
389
|
-
|
|
390
|
-
`read_page` supports three output modes:
|
|
391
|
-
|
|
392
|
-
| Mode | Output | Tokens | Use Case |
|
|
393
|
-
|------|--------|--------|----------|
|
|
394
|
-
| `ax` (default) | Accessibility tree with `ref_N` IDs | Baseline | Screen readers, semantic analysis |
|
|
395
|
-
| `dom` | Compact DOM with `backendNodeId` | **~5-10x fewer** | Click, fill, extract — most tasks |
|
|
396
|
-
| `css` | CSS diagnostic info (variables, computed styles, framework detection) | Minimal | Debugging styles, Tailwind detection |
|
|
397
|
-
|
|
398
|
-
**DOM mode example:**
|
|
399
|
-
```
|
|
400
|
-
read_page tabId="tab1" mode="dom"
|
|
401
|
-
|
|
402
|
-
[page_stats] url: https://example.com | title: Example | scroll: 0,0 | viewport: 1920x1080
|
|
403
|
-
|
|
404
|
-
# [142]<input type="search" placeholder="Search..." aria-label="Search"/> ★
|
|
405
|
-
$ [156]<button type="submit"/>Search ★
|
|
406
|
-
@ [289]<a href="/home"/>Home ★
|
|
407
|
-
[352]<h1/>Welcome to Example
|
|
408
|
-
```
|
|
409
|
-
|
|
410
|
-
DOM mode outputs `[backendNodeId]` as stable identifiers — they persist for the lifetime of the DOM node, unlike `ref_N` IDs which are cleared on each AX-mode `read_page` call. A compact marker before an identifier describes the action affordance: `#` text input, `@` link, `$` button/control, `%` visual target. The marker is display metadata only; pass the identifier itself (`142`, `node_142`, or `ref_N`) to action tools.
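A client that consumes DOM-mode output might split each line into its affordance marker and its identifier before calling an action tool. The sketch below assumes the line shape shown in the removed DOM mode example (`# [142]<input .../>`); `parseDomLine` and the affordance labels are hypothetical names chosen for illustration, and the trailing `★` is ignored because the README text does not define it.

```typescript
type Affordance = "text-input" | "link" | "button-or-control" | "visual-target";

// Marker meanings as listed in the README paragraph above.
const MARKERS: Record<string, Affordance> = {
  "#": "text-input",
  "@": "link",
  "$": "button-or-control",
  "%": "visual-target",
};

interface DomLine {
  backendNodeId: number;
  affordance: Affordance | null; // null when the line carries no marker
}

// Returns null for lines that do not start with "[<digits>]" (optionally
// preceded by a marker), such as the "[page_stats]" header line.
function parseDomLine(line: string): DomLine | null {
  const match = /^\s*(?:([#@$%])\s+)?\[(\d+)\]/.exec(line);
  if (!match) return null;
  const marker = match[1];
  return {
    backendNodeId: Number(match[2]),
    affordance: marker ? MARKERS[marker] ?? null : null,
  };
}

console.log(parseDomLine('# [142]<input type="search" aria-label="Search"/>'));
console.log(parseDomLine("  [352]<h1/>Welcome to Example"));
```

Only `backendNodeId` would be passed on to an action tool; the marker is display metadata, as the paragraph above notes.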
|
|
411
|
-
|
|
412
|
-
### JavaScript and Shadow DOM
|
|
413
|
-
|
|
414
|
-
`read_page` and `find` use CDP-pierced DOM reads, so they can show content inside open shadow roots that plain page JavaScript will not find with `document.querySelectorAll(...)`.
|
|
415
|
-
|
|
416
|
-
`javascript_tool` runs in the page context and preserves normal browser semantics, but it also injects helper functions for open shadow roots:
|
|
88
|
+
## What you can do with it
|
|
417
89
|
|
|
418
|
-
|
|
419
|
-
// Returns an array of matches from document plus recursively discovered open shadow roots.
|
|
420
|
-
__pierce('.artdeco-button')
|
|
90
|
+
Ask your agent in plain language — these all map to OpenChrome tools:
|
|
421
91
|
|
|
422
|
-
|
|
423
|
-
|
|
424
|
-
|
|
92
|
+
- **Parallel research** — "screenshot AWS billing, GCP, Stripe, Datadog at once" → 4 lanes, one Chrome, already authenticated.
|
|
93
|
+
- **Authenticated scraping** — crawl dashboards and member-only pages using your existing login. No credentials in config.
|
|
94
|
+
- **Form & flow automation** — fill, click, navigate multi-step flows; the agent gets corrective hints when a step drifts.
|
|
95
|
+
- **Production UI debugging** — `oc_performance_insights` / `oc_vitals` for LCP/CLS, `console_capture`, `oc_devtools_url` to attach live DevTools.
|
|
96
|
+
- **Site monitoring & diffing** — `oc_evidence_bundle` snapshots + `oc_diff` for deterministic before/after (DOM, screenshot pHash, network, console).
|
|
97
|
+
- **Crawling** — async `crawl_start` / `crawl_status` / `crawl_cancel` jobs with cursor pagination.
|
|
98
|
+
- **Verifiable runs** — `oc_assert` checks page state against an Outcome Contract (pass / fail / inconclusive) instead of guessing.
|
|
425
99
|
|
|
426
|
-
|
|
100
|
+
The default surface is ~110 tools across navigation, interaction, reading,
|
|
101
|
+
extraction, parallel workflows, contracts, skills, recovery, and diagnostics.
|
|
102
|
+
Full catalogue: [`docs/agent/capability-map.md`](docs/agent/capability-map.md).
|
|
427
103
|
|
|
428
104
|
---
|
|
429
105
|
|
|
430
|
-
##
|
|
431
|
-
|
|
432
|
-
Action tools that accept a `ref` parameter (`form_input`, `computer`, etc.) support three identifier formats:
|
|
106
|
+
## Using it conveniently
|
|
433
107
|
|
|
434
|
-
|
|
435
|
-
|--------|---------|--------|
|
|
436
|
-
| `ref_N` | `ref_5` | From `read_page` AX mode (ephemeral) |
|
|
437
|
-
| Raw integer | `142` | From `read_page` DOM mode (stable) |
|
|
438
|
-
| `node_N` | `node_142` | Explicit prefix form (stable) |
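The three identifier formats in the table above can be normalised into two cases: an ephemeral AX reference or a stable backend node ID. The following is a rough sketch under that reading; `parseRef` and the `ParsedRef` shape are hypothetical and do not describe how the package resolves identifiers internally.

```typescript
type ParsedRef =
  | { kind: "ax-ref"; index: number } // ref_N, from read_page AX mode
  | { kind: "backend-node"; backendNodeId: number }; // 142 or node_142

// Accepts the three documented formats and returns null for anything else.
function parseRef(ref: string | number): ParsedRef | null {
  const text = String(ref).trim();

  const axMatch = /^ref_(\d+)$/.exec(text);
  if (axMatch) return { kind: "ax-ref", index: Number(axMatch[1]) };

  // A raw integer and the explicit node_ prefix name the same node.
  const nodeMatch = /^(?:node_)?(\d+)$/.exec(text);
  if (nodeMatch) return { kind: "backend-node", backendNodeId: Number(nodeMatch[1]) };

  return null;
}

console.log(parseRef("ref_5"));
console.log(parseRef(142));
console.log(parseRef("node_142"));
```

Keeping the two kinds separate makes the lifetime difference explicit: an `ax-ref` may be stale after the next AX-mode `read_page`, while a `backend-node` identifier stays valid for the lifetime of the DOM node.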
|
|
108
|
+
### Drive it from the shell — no MCP host needed
|
|
439
109
|
|
|
440
|
-
|
|
441
|
-
|
|
442
|
-
---
|
|
443
|
-
|
|
444
|
-
## Session Persistence
|
|
445
|
-
|
|
446
|
-
Headless mode (`--headless-shell`) doesn't persist cookies across restarts. Enable storage state persistence to maintain authenticated sessions:
|
|
110
|
+
The CLI can call the MCP surface directly. Great for scripts, CI, and debugging:
|
|
447
111
|
|
|
448
112
|
```bash
|
|
449
|
-
oc
|
|
450
|
-
oc
|
|
113
|
+
oc run navigate --arg url=https://example.com
|
|
114
|
+
oc run read_page --arg mode=dom --json
|
|
115
|
+
oc navigate https://example.com # positional sugar for common tools
|
|
116
|
+
oc click ref_5
|
|
451
117
|
```
|
|
452
118
|
|
|
453
|
-
|
|
454
|
-
|
|
455
|
-
---
|
|
456
|
-
|
|
457
|
-
## Anti-Bot & Turnstile Support
|
|
458
|
-
|
|
459
|
-
OpenChrome includes built-in defenses against Cloudflare Turnstile and similar anti-bot systems. See [Turnstile Guide](docs/turnstile-guide.md) for details.
|
|
460
|
-
|
|
461
|
-
### 3-Tier Auto-Fallback for CDN/WAF Blocks
|
|
462
|
-
|
|
463
|
-
When a navigation is blocked by CDN/WAF systems (Akamai, Cloudflare, etc.), OpenChrome automatically escalates through three tiers:
|
|
464
|
-
|
|
465
|
-
| Tier | Mode | What It Bypasses |
|
|
466
|
-
|------|------|-----------------|
|
|
467
|
-
| 1 | Headless Chrome | Normal navigation — works for most sites |
|
|
468
|
-
| 2 | Stealth + Headless | JS-level anti-bot (PerimeterX, Turnstile, basic fingerprinting) |
|
|
469
|
-
| 3 | **Headed Chrome** | TLS/UA-level blocking (Akamai CDN, network security filters) |
|
|
470
|
-
|
|
471
|
-
Tier 3 launches a real headed Chrome window with a genuine user-agent (`Chrome/...` instead of `HeadlessChrome/...`) and a different TLS fingerprint, bypassing binary-level detection that no JavaScript injection can fix.
|
|
472
|
-
|
|
473
|
-
**Parameters:**
|
|
474
|
-
- `autoFallback: false` — disable automatic CDN/WAF retry. This does not log in for you or bypass normal authentication redirects.
|
|
475
|
-
- `headed: true` — skip directly to headed Chrome for user-visible login, 2FA, CAPTCHA, or sites that require a real browser window.
|
|
476
|
-
- `stealth: true` — use stealth mode (Tier 2) explicitly.
|
|
477
|
-
|
|
478
|
-
**Authentication note:** Auto-fallback is for detected CDN/WAF blocking. If a protected app redirects from the requested URL to a same-site login page, treat that as an authentication handoff: retry with `headed: true` and the same persistent profile, let the user complete login, then verify whether headless can reuse that profile state.
|
|
479
|
-
|
|
480
|
-
**Environment:** Tier 3 requires a display (macOS/Windows desktop, or Linux with `$DISPLAY`). In server/container environments without a display, Tier 3 is gracefully skipped.
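One way to read the tier table and parameters above is as a plan of which tiers a navigation may try, in order. The sketch below is an interpretation, not the package's implementation: `planTiers` is a hypothetical function, and two behaviours are assumptions that the README does not spell out (that `stealth: true` may still escalate to Tier 3, and that `headed: true` without a display yields an empty plan).

```typescript
interface NavigateOptions {
  autoFallback?: boolean; // false disables automatic CDN/WAF retry
  headed?: boolean; // true skips directly to headed Chrome (Tier 3)
  stealth?: boolean; // true starts at stealth mode (Tier 2)
}

type Tier = 1 | 2 | 3;

// Returns the tiers to attempt, lowest first. Tier 3 needs a display, so it
// is dropped when none is available.
function planTiers(opts: NavigateOptions, hasDisplay: boolean): Tier[] {
  const start: Tier = opts.headed ? 3 : opts.stealth ? 2 : 1;
  const allTiers: Tier[] = [1, 2, 3];
  const candidates: Tier[] =
    opts.autoFallback === false ? [start] : allTiers.filter((t) => t >= start);
  return candidates.filter((t) => t !== 3 || hasDisplay);
}

console.log(planTiers({}, true)); // default: escalate through every tier
console.log(planTiers({}, false)); // headless server: Tier 3 is skipped
console.log(planTiers({ autoFallback: false }, true)); // single attempt
```

A caller would try each planned tier only after the previous one was detected as blocked; an authentication redirect is a separate case and, per the note above, is handled by retrying with `headed: true` rather than by this escalation.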
|
|
481
|
-
|
|
482
|
-
### Known Limitations
|
|
483
|
-
|
|
484
|
-
- **CAPTCHA-protected sites (e.g., Reddit):** Auto-fallback correctly detects and escalates through all tiers, but sites that serve CAPTCHA challenges ("Prove your humanity") to all automated clients — regardless of headless/headed mode — require human interaction to solve. This is beyond auto-fallback's scope, which targets CDN/WAF network-level blocking (TLS fingerprint, user-agent detection), not interactive CAPTCHA challenges.
|
|
485
|
-
|
|
486
|
-
---
|
|
487
|
-
|
|
488
|
-
## FAQ: Comparison with Other Browser MCPs
|
|
489
|
-
|
|
490
|
-
Common questions from users evaluating OpenChrome against Chrome DevTools MCP, Firefox DevTools MCP, and similar tools (see [#612](https://github.com/shaun0927/openchrome/issues/612)).
|
|
491
|
-
|
|
492
|
-
### Can multiple MCP clients share tabs safely?
|
|
493
|
-
|
|
494
|
-
**Yes — tabs cannot clobber each other across clients.**
|
|
495
|
-
|
|
496
|
-
OpenChrome identifies every tab by its CDP `targetId` — a stable, browser-assigned string — not by a visible 1/2/3 index. On top of stable IDs, two layers of isolation are specifically designed for multi-client scenarios:
|
|
119
|
+
### Declarative scenarios with `oc playbook`
|
|
497
120
|
|
|
498
|
-
|
|
499
|
-
|
|
500
|
-
|
|
501
|
-
If client A opens five tabs and client B opens five tabs, all ten `tabId`s are distinct and stable; a new tab from A can never displace B's tab #3.
|
|
502
|
-
|
|
503
|
-
### How do I handle sites that require interactive login (password, 2FA, CAPTCHA)?
|
|
504
|
-
|
|
505
|
-
Use two mechanisms, but keep their guarantees separate:
|
|
506
|
-
|
|
507
|
-
**1. Persistent-profile headless — reuse an already-authenticated profile.**
|
|
508
|
-
Point OpenChrome at a persistent `userDataDir` (+ optional `profileDirectory`) so cookies / `localStorage` / IndexedDB can survive across runs. If that persistent profile already contains a valid session, subsequent **headless** runs stay logged in until the site invalidates the session.
|
|
509
|
-
|
|
510
|
-
**2. Headed-by-default / headed fallback — let the user complete an interactive step.**
|
|
511
|
-
Since #657 the launcher runs headed by default, so first-time login, 2FA, CAPTCHA, and WebAuthn can use a real visible window without extra flags. CI / Docker users opt into headless via `--headless` or `OPENCHROME_HEADLESS=1` after their persistent profile is bootstrapped. When a Tier-1/Tier-2 headless attempt is blocked by a CDN/WAF, OpenChrome can also lazy-launch a separate headed Chrome on a different debug port and register the headed page back into the same logical session.
|
|
512
|
-
|
|
513
|
-
**Important:** a headed tab being authenticated does not automatically prove that a new headless tab can reuse the session after the headed tab is closed. Sites differ in how they bind cookies, storage, browser fingerprints, and session freshness. Always verify the handoff by closing/restarting the headed path you plan to stop using and navigating headless to the protected URL with the same persistent profile.
|
|
514
|
-
|
|
515
|
-
**Recommended flow:**
|
|
516
|
-
1. Start with the persistent `userDataDir` / `profileDirectory` you intend to keep using.
|
|
517
|
-
2. Navigate to the protected URL. If it resolves to `/login` or another auth page, do not keep retrying unauthenticated headless navigation.
|
|
518
|
-
3. Use the visible headed window (default) or navigate with `headed: true` and the same profile context, then let the user complete login/2FA/CAPTCHA.
|
|
519
|
-
4. Retry the protected URL with the same profile in the mode you intend to automate.
|
|
520
|
-
5. If headless still lands on the login page, keep the headed tab open for that site or reconfigure persistence; do not assume the headed auth state transferred.
|
|
521
|
-
|
|
522
|
-
### Does OpenChrome steal focus with popup windows?
|
|
523
|
-
|
|
524
|
-
**No — the "recurring popup interruptions" problem does not occur in OpenChrome.**
|
|
525
|
-
|
|
526
|
-
The headed-browser focus-stealing pattern that users encounter with some MCP servers (cross-Space jumps on macOS, un-minimizable popups, per-tool-call window raises) comes from designs where the MCP drives a user-visible browser and creates OS windows as it works. OpenChrome is architected differently:
|
|
527
|
-
|
|
528
|
-
- **`tabs_create` opens a tab, not an OS window.** New tabs are created via CDP inside the already-running Chrome, and OpenChrome never calls `page.bringToFront()` anywhere in the codebase.
|
|
529
|
-
- **No per-call window raises.** Each navigation/click/tool call runs against the existing window without `bringToFront()`, `focus()`, or any other stealing primitive. After the initial Chrome launch you keep working in your other apps without interruption.
|
|
530
|
-
- **One Chrome per server lifetime.** Auto-launch creates Chrome **once** at startup and reuses it for every later tool call — no popup-per-action loop.
|
|
531
|
-
- **Headless opt-in available.** For CI/server use, `--headless` or `OPENCHROME_HEADLESS=1` runs without any window at all. The default is headed since #657 because headless mode is materially more prone to silent hangs and login failures on real-world sites.
|
|
532
|
-
|
|
533
|
-
The only scenario in which a focus grab can happen is the very first Chrome launch — not one per tool call, never one per tab.
|
|
534
|
-
|
|
535
|
-
---
|
|
536
|
-
|
|
537
|
-
## Benchmarks
|
|
538
|
-
|
|
539
|
-
Measure token efficiency and parallel performance:
|
|
121
|
+
Write a YAML scenario where every step is one tool call with an inline
|
|
122
|
+
Outcome Contract — deterministic, no LLM judgement:
|
|
540
123
|
|
|
541
124
|
```bash
|
|
542
|
-
|
|
543
|
-
npm run benchmark:ci # Stub mode: AX vs DOM with JSON + regression detection
|
|
544
|
-
npm run benchmark:perception # Stub mode: perception latency/token/fallback guards
|
|
545
|
-
npm run benchmark:perception -- --ci --json # CI-friendly JSON output for perception regression checks
|
|
546
|
-
npm run benchmark -- --mode real # Real mode: actual MCP server (requires Chrome)
|
|
547
|
-
npx ts-node tests/benchmark/run-parallel.ts # Stub mode: all parallel benchmark categories
|
|
548
|
-
npx ts-node tests/benchmark/run-parallel.ts --mode real --category batch-js --runs 1 # Real mode
|
|
549
|
-
npx ts-node tests/benchmark/run-parallel.ts --mode real --category realworld --runs 1 # Real-world benchmarks
|
|
125
|
+
oc playbook run scenario.yaml --vars url=https://iana.org --out report.md
|
|
550
126
|
```
|
|
551
127
|
|
|
552
|
-
|
|
553
|
-
|
|
554
|
-
**Parallel benchmark categories:**
|
|
555
|
-
|
|
556
|
-
| Category | What It Measures |
|
|
557
|
-
|----------|-----------------|
|
|
558
|
-
| Multi-step interaction | Form fill + click sequences across N parallel pages |
|
|
559
|
-
| Batch JS execution | N × `javascript_tool` vs 1 × `batch_execute` |
|
|
560
|
-
| Compiled plan execution | Sequential agent tool calls vs single `execute_plan` |
|
|
561
|
-
| Streaming collection | Blocking vs `workflow_collect_partial` |
|
|
562
|
-
| Init overhead | Sequential `tabs_create` vs batch `workflow_init` |
|
|
563
|
-
| Fault tolerance | Circuit breaker recovery speed |
|
|
564
|
-
| Scalability curve | Speedup efficiency at 1–50x concurrency |
|
|
565
|
-
| **Real-world** | Multi-site crawl, heavy JS, pipeline, scalability with public websites (`httpbin.org`, `jsonplaceholder`, `example.com`) — NOT included in `all`, requires network |
|
|
566
|
-
|
|
567
|
-
---
|
|
568
|
-
|
|
569
|
-
## Server / Headless Deployment
|
|
128
|
+
See [`docs/cli/playbook.md`](docs/cli/playbook.md).
|
|
570
129
|
|
|
571
|
-
|
|
130
|
+
### Keep one browser warm — HTTP daemon mode
|
|
572
131
|
|
|
573
|
-
OpenChrome
|
|
574
|
-
|
|
575
|
-
|
|
132
|
+
Run OpenChrome as a long-lived daemon so multiple clients (Claude Code + CI +
|
|
133
|
+
a dashboard) share **one** Chrome process, and the server outlives whatever
|
|
134
|
+
launched it (Docker, systemd, CI):
|
|
576
135
|
|
|
577
136
|
```bash
|
|
578
|
-
|
|
579
|
-
openchrome serve --server-mode
|
|
580
|
-
```
|
|
581
|
-
|
|
582
|
-
`--server-mode` automatically does the following:
|
|
583
|
-
- Auto-launches Chrome in headless mode
|
|
584
|
-
- Skips cookie bridge scanning (~5s faster per page creation)
|
|
585
|
-
- Optimal defaults for server environments
|
|
586
|
-
|
|
587
|
-
### What works without login
|
|
588
|
-
|
|
589
|
-
| Category | Tools |
|
|
590
|
-
|----------|-------|
|
|
591
|
-
| **Navigation & scraping** | `navigate`, `read_page`, `page_content`, `javascript_tool` |
|
|
592
|
-
| **Interaction** | `interact`, `fill_form`, `drag_drop`, `file_upload` |
|
|
593
|
-
| **Parallel workflows** | `workflow_init` with multiple workers, `batch_execute` |
|
|
594
|
-
| **Screenshots & PDF** | `computer(screenshot)`, `page_pdf` |
|
|
595
|
-
| **Network & performance** | `request_intercept`, `performance_metrics`, `console_capture` |
|
|
596
|
-
|
|
597
|
-
### HTTP daemon mode (multi-client / remote)
|
|
598
|
-
|
|
599
|
-
OpenChrome ships a first-class HTTP transport that turns the server into a
|
|
600
|
-
long-running daemon. Use it when:
|
|
601
|
-
|
|
602
|
-
- you need **multiple MCP clients** (Claude Code, a CI job, a dashboard) to
|
|
603
|
-
share one browser process, or
|
|
604
|
-
- the server must **outlive its launching process** (Docker, systemd, CI
|
|
605
|
-
orchestrator), or
|
|
606
|
-
- a **sidecar** (monitoring probe, dashboard) needs to poll `/health` or
|
|
607
|
-
`/metrics` independently.
|
|
608
|
-
|
|
609
|
-
Quick start (see the full guide for options, security, and Windows recipes):
|
|
610
|
-
|
|
611
|
-
```bash
|
|
612
|
-
# Start a token-authenticated daemon on loopback port 3100
|
|
613
|
-
npx openchrome serve --http 3100 --auth-token mysecrettoken --idle-timeout 30m
|
|
614
|
-
|
|
615
|
-
# Verify it is up
|
|
137
|
+
openchrome serve --http 3100 --auth-token <token> --idle-timeout 30m
|
|
616
138
|
curl -s http://127.0.0.1:3100/health
|
|
617
|
-
# → {"status":"ok","uptime":1.2}
|
|
618
|
-
|
|
619
|
-
# Send an MCP request
|
|
620
|
-
curl -s -X POST http://127.0.0.1:3100/mcp \
|
|
621
|
-
-H "Content-Type: application/json" \
|
|
622
|
-
-H "Authorization: Bearer mysecrettoken" \
|
|
623
|
-
--data '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}'
|
|
624
139
|
```
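The `tools/list` request from the 1.12.0 quick start can also be issued from code. The sketch below only builds the request; `buildToolsListRequest` is a hypothetical helper name, while the URL, headers, and JSON-RPC body are copied from the `curl` example in the removed lines.

```typescript
// Hypothetical helper: builds the same JSON-RPC request as the curl example.
function buildToolsListRequest(port: number, token: string, id: number = 1) {
  return {
    url: `http://127.0.0.1:${port}/mcp`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${token}`,
      },
      // JSON-RPC 2.0 envelope asking the server to list its tools.
      body: JSON.stringify({ jsonrpc: "2.0", id, method: "tools/list", params: {} }),
    },
  };
}

// Usage (needs a running daemon and a runtime with a global fetch, e.g. Node 18+):
//   const { url, init } = buildToolsListRequest(3100, "<token>");
//   const response = await fetch(url, init);
//   console.log(await response.json());
```

Keeping the request construction separate from the network call makes it easy to check the headers and body without a live daemon.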
|
|
625
140
|
|
|
626
|
-
|
|
141
|
+
One Chrome process, tabs isolated per session. Without `--idle-timeout` it stays
|
|
142
|
+
up until stopped; with it, it self-exits after the idle window. Full guide:
|
|
143
|
+
[`docs/getting-started/http-daemon.md`](docs/getting-started/http-daemon.md).
|
|
627
144
|
|
|
628
|
-
|
|
629
|
-
interaction, multi-client architecture diagram, security model (bearer-token
|
|
630
|
-
auth, loopback-only default, unauthenticated rejection rules), idle-timeout
|
|
631
|
-
behaviour, dashboard endpoints, and troubleshooting.
|
|
145
|
+
### Diagnose your environment
|
|
632
146
|
|
|
633
|
-
|
|
634
|
-
|
|
635
|
-
|
|
636
|
-
|
|
637
|
-
### Important: MCP client required
|
|
638
|
-
|
|
639
|
-
OpenChrome is an MCP server — it responds to tool calls, not standalone scripts. Server-side usage requires an MCP client (e.g., Claude API, Claude Code, or a custom MCP client) to drive it:
|
|
640
|
-
|
|
641
|
-
```
|
|
642
|
-
MCP Client (LLM) → stdio → OpenChrome (--server-mode) → Chrome
|
|
147
|
+
```bash
|
|
148
|
+
openchrome doctor # Node, disk, Chrome binary/port, orphans, perms, locks
|
|
149
|
+
openchrome check # verify CLI + runtime wiring
|
|
643
150
|
```
|
|
644
151
|
|
|
645
|
-
|
|
646
|
-
|
|
647
|
-
### Docker
|
|
152
|
+
### Token-efficient page reads
|
|
648
153
|
|
|
649
|
-
|
|
154
|
+
`read_page mode="dom"` serializes the page into a compact text form — **~5–15x
|
|
155
|
+
fewer tokens** than the raw DOM. Each element carries an affordance marker so the
|
|
156
|
+
agent knows the type at a glance:
|
|
650
157
|
|
|
651
|
-
```bash
|
|
652
|
-
docker build -t openchrome .
|
|
653
|
-
docker run openchrome
|
|
654
158
|
```
|
|
655
|
-
|
|
656
|
-
|
|
657
|
-
|
|
658
|
-
| Variable | Description |
|
|
659
|
-
|----------|-------------|
|
|
660
|
-
| `CHROME_PATH` | Path to Chrome/Chromium binary (used by launcher) |
|
|
661
|
-
| `CHROME_BINARY` | Path to Chrome binary (used by `--chrome-binary` CLI flag) |
|
|
662
|
-
| `CHROME_USER_DATA_DIR` | Custom profile directory |
|
|
663
|
-
| `CI` | Detected automatically; adds `--no-sandbox` |
|
|
664
|
-
| `DOCKER` | Detected automatically; adds `--no-sandbox` |
|
|
665
|
-
| `OPENCHROME_PPID_WATCH` | Set to `0` to disable the parent-process watcher in stdio mode. Default: enabled. The watcher exits the server when its launching MCP-client parent dies, so abrupt parent termination (force-quit, `kill -9`, tmux teardown) does not orphan the openchrome process. HTTP and `both` transport modes ignore this flag — they remain daemon-capable. |
|
|
666
|
-
| `OPENCHROME_PPID_WATCH_INTERVAL_MS` | Polling interval for the parent watcher in milliseconds. Default: `2000`. Clamped to `[500, 60000]`. |
|
|
667
|
-
| `OPENCHROME_HEALTH_ENDPOINT` | Force-enable (`1`/`true`) or force-disable (`0`/`false`) the `/health` and `/metrics` HTTP listener. Default: on for `--transport http` / `both`, off for `--transport stdio`. Stdio-mode instances usually talk to a single MCP client over the pipe and do not need an external monitoring port; disabling it saves ~200-300 KB heap + one file descriptor per instance. Operators who run stdio under a supervisor can opt in with `OPENCHROME_HEALTH_ENDPOINT=1`; daemon-mode operators who run the health check externally can opt out with `OPENCHROME_HEALTH_ENDPOINT=0`. Invalid values (e.g. `yes`, `2`) fall through to the transport-mode default. |
|
|
668
|
-
| `OPENCHROME_HEADLESS` | Opt into headless Chrome from CI / Docker without `--headless`. Accepts `1`/`true`/`yes` (headless) or `0`/`false`/`no` (headed). Lower precedence than the CLI flag, higher than persisted config. Default since #657 is headed. |
|
|
669
|
-
| `OPENCHROME_WINDOW_SIZE` | Headed auto-launch window size as `width,height`, for example `1280,900`. Width and height must be positive integers. |
|
|
670
|
-
| `OPENCHROME_WINDOW_POSITION` | Headed auto-launch window position as `x,y`, for example `0,0` or `1920,0`. Coordinates may be negative. |
|
|
671
|
-
| `OPENCHROME_WINDOW_BOUNDS` | Headed auto-launch bounds as `x,y,width,height`. Overrides `OPENCHROME_WINDOW_SIZE` and `OPENCHROME_WINDOW_POSITION`. |
|
|
672
|
-
| `OPENCHROME_START_MAXIMIZED` | Set to `1`/`true`/`yes` to start headed Chrome maximized when no explicit bounds, size, or position is configured. |
|
|
673
|
-
| `OPENCHROME_FILE_UPLOAD_ROOTS` | Additional directories allowed for `file_upload`, separated with the platform path delimiter (`:` on POSIX, `;` on Windows). Defaults always include the server process current working directory and the temp upload directory. |
|
|
674
|
-
| `OPENCHROME_FILE_UPLOAD_TEMP_DIR` | Temp/upload directory allowed for `file_upload`. Default: `path.join(os.tmpdir(), 'openchrome-uploads')`. |
|
|
675
|
-
| `OPENCHROME_CONSOLE_BUFFER_MAX_LINES` | Hard cap on retained console log entries per capture session. Default: `1000` (matches the legacy `maxLogs` default). Increase for high-frequency logging pages; decrease to reduce memory pressure. |
|
|
676
|
-
| `OPENCHROME_CONSOLE_BUFFER_MAX_BYTES` | Hard cap on aggregate byte size of retained console entries per session (JSON.stringify length). Default: `4194304` (4 MiB). When a single entry exceeds this cap it is stored as a truncated placeholder with the original size recorded in `truncatedFrom`. |
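Two rows of the table describe resolution rules that are easier to see in code: the precedence for `OPENCHROME_HEADLESS` and the clamping of `OPENCHROME_PPID_WATCH_INTERVAL_MS`. The sketch below is illustrative only. The function names are hypothetical, and the handling of whitespace, letter case, and unrecognised values is an assumption; the table specifies only the accepted values, the precedence, the default, and the clamp range.

```typescript
type Env = Record<string, string | undefined>;

// OPENCHROME_HEADLESS accepts 1/true/yes (headless) or 0/false/no (headed).
// Assumption: values are trimmed and compared case-insensitively, and anything
// else is treated as "not set".
function parseHeadlessEnv(value: string | undefined): boolean | undefined {
  const v = value?.trim().toLowerCase();
  if (v === "1" || v === "true" || v === "yes") return true;
  if (v === "0" || v === "false" || v === "no") return false;
  return undefined;
}

// Precedence from the table: CLI flag > env var > persisted config > default (headed).
function resolveHeadless(
  cliFlag: boolean | undefined,
  env: Env,
  persisted: boolean | undefined,
): boolean {
  return cliFlag ?? parseHeadlessEnv(env.OPENCHROME_HEADLESS) ?? persisted ?? false;
}

// OPENCHROME_PPID_WATCH_INTERVAL_MS: default 2000, clamped to [500, 60000].
// Assumption: unset, empty, or non-numeric values fall back to the default.
function resolveWatchInterval(env: Env): number {
  const raw = env.OPENCHROME_PPID_WATCH_INTERVAL_MS;
  if (raw === undefined || raw.trim() === "") return 2000;
  const parsed = Number(raw);
  if (!Number.isFinite(parsed)) return 2000;
  return Math.min(60000, Math.max(500, parsed));
}
```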
|
|
677
|
-
|
|
678
|
-
`file_upload` only accepts files whose resolved real path stays inside the server's current working directory, the temp upload directory, or an explicit upload root. Add the narrowest possible root, for example `OPENCHROME_FILE_UPLOAD_ROOTS=/srv/openchrome/uploads`, rather than broad locations such as `$HOME`, browser profiles, SSH/GPG/cloud credential directories, or application config trees. Symlinks and `..` traversal are resolved with `realpath` before allowlist checks, so a symlink inside an allowed root that points outside the root is rejected. For untrusted clients, prefer staging files into `OPENCHROME_FILE_UPLOAD_TEMP_DIR` and uploading from there.
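The resolve-then-check order described above is the part that defeats symlink and `..` tricks. The following is a minimal sketch of that technique using Node's standard `fs.realpathSync` and `path.relative`; `isUploadAllowed` and `defaultRoots` are hypothetical names, and OpenChrome's actual check may differ in detail.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Default roots as described in the table: the server's working directory
// plus the temp upload directory.
const defaultRoots: string[] = [
  process.cwd(),
  path.join(os.tmpdir(), "openchrome-uploads"),
];

// Hypothetical helper: true when `candidate`, after symlinks and `..` segments
// are resolved, lies strictly inside one of the allowed roots.
function isUploadAllowed(candidate: string, allowedRoots: string[] = defaultRoots): boolean {
  let realFile: string;
  try {
    realFile = fs.realpathSync(candidate); // follows symlinks, collapses `..`
  } catch {
    return false; // missing or unreadable paths are rejected
  }
  return allowedRoots.some((root) => {
    let realRoot: string;
    try {
      realRoot = fs.realpathSync(root);
    } catch {
      return false; // a root that does not exist cannot contain anything
    }
    const rel = path.relative(realRoot, realFile);
    const climbsOut = rel === ".." || rel.startsWith(".." + path.sep);
    return rel !== "" && !climbsOut && !path.isAbsolute(rel);
  });
}
```

Comparing the relative path, rather than using a plain string prefix test, avoids treating `/srv/uploads-evil` as if it were inside `/srv/uploads`.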
|
|
679
|
-
|
|
680
|
-
### Individual flags
|
|
681
|
-
|
|
682
|
-
For fine-grained control, use individual flags instead of `--server-mode`:
|
|
683
|
-
|
|
684
|
-
```bash
|
|
685
|
-
openchrome serve \
|
|
686
|
-
--auto-launch \
|
|
687
|
-
--headless-shell \
|
|
688
|
-
--port 9222
|
|
159
|
+
# [142]<input type="search" .../> ★ ← # text input
|
|
160
|
+
$ [156]<button type="submit"/>Search ★ ← $ button / control
|
|
161
|
+
@ [289]<a href="/home"/>Home ★ ← @ link (% = visual target)
|
|
689
162
|
```
|
|
690
163
|
|
|
691
|
-
|
|
692
|
-
|
|
693
|
-
|
|
694
|
-
|
|
695
|
-
| `--headless-shell` | `false` | Use chrome-headless-shell binary |
|
|
696
|
-
| `--visible` | (deprecated) | No-op alias for headed; the new default is headed since #657. Will be removed in a future release. |
|
|
697
|
-
| `--window-size <width,height>` | `1280,900` headed | Set headed Chrome auto-launch size. CLI overrides env. |
|
|
698
|
-
| `--window-position <x,y>` | `0,0` headed | Set headed Chrome auto-launch position. CLI overrides env. |
|
|
699
|
-
| `--window-bounds <x,y,width,height>` | unset | Set headed Chrome auto-launch position and size together. Overrides size/position. |
|
|
700
|
-
| `--start-maximized` | `false` | Start headed Chrome maximized only when no explicit bounds, size, or position is set. |
|
|
701
|
-
| `--server-mode` | `false` | Compound flag for server deployment (forces `--headless` + auto-launch) |
|
|
702
|
-
|
|
703
|
-
Headed auto-launch defaults to `--window-position=0,0 --window-size=1280,900` so the first Chrome window lands predictably without maximizing. To place Chrome on the right side of a 1920px-wide primary display:
|
|
164
|
+
`[backendNodeId]` identifiers are stable for the node's lifetime — pass `142`,
|
|
165
|
+
`node_142`, or `ref_N` to any action tool. `oc_observe` goes further: it returns
|
|
166
|
+
a ready-to-act numbered list in one call instead of `read_page → query_dom →
|
|
167
|
+
inspect → interact`.
|
|
704
168
|
|
|
705
|
-
|
|
706
|
-
openchrome serve --auto-launch --window-bounds 1920,0,1280,900
|
|
707
|
-
```
|
|
169
|
+
---
|
|
708
170
|
|
|
709
|
-
|
|
171
|
+
## Why agents fail less on OpenChrome
|
|
710
172
|
|
|
711
|
-
|
|
712
|
-
|
|
713
|
-
```
|
|
714
|
-
|
|
715
|
-
---
|
|
173
|
+
The bottleneck in browser automation is the LLM *thinking* between steps — every
|
|
174
|
+
wrong guess costs 10–15s of inference. OpenChrome's harness cuts that loop:
|
|
716
175
|
|
|
717
|
-
|
|
176
|
+
| Subsystem | What it does |
|
|
177
|
+
|---|---|
|
|
178
|
+
| **Hint engine** (30+ rules) | Catches error→recovery patterns and corrects the agent before mistakes cascade. Promotes repeated patterns to permanent rules. |
|
|
179
|
+
| **Recovery runtime** | Deterministic, bounded recovery for a tool call — recover in-server, no LLM round-trip (pilot tier). |
|
|
180
|
+
| **Ralph engine** | 7-strategy interaction waterfall: AX click → CSS → CDP coords → JS → keyboard → raw mouse → human escalation. |
|
|
181
|
+
| **3-level circuit breaker** | Element / page / global — stops the agent burning tokens on permanently broken elements. |
|
|
182
|
+
| **Outcome classifier** | Reports what *actually* happened after a click (SUCCESS / SILENT_CLICK / WRONG_ELEMENT). |
|
|
183
|
+
| **49 reliability mechanisms** | 8 defense layers from process lifecycle to MCP gateway — no single failure hangs the server. See [`docs/architecture.md`](docs/architecture.md). |
|
|
718
184
|
|
|
719
|
-
|
|
185
|
+
Result on a typical 5-site task: ~80% fewer LLM calls, ~80x faster wall time,
|
|
186
|
+
~5x cheaper.
|
|
720
187
|
|
|
721
188
|
---
|
|
722
189
|
|
|
723
|
-
##
|
|
190
|
+
## Other capabilities worth knowing
|
|
724
191
|
|
|
725
|
-
|
|
192
|
+
- **Parallel sessions** — 1 Chrome, N tabs/lanes; `workerId` + `profileDirectory` give per-client isolation. Multiple MCP clients can share tabs safely.
|
|
193
|
+
- **Anti-bot / Turnstile** — 3-tier auto-fallback (headless → stealth → real headed Chrome) bypasses CDN/WAF blocks. [Turnstile guide](docs/turnstile-guide.md).
|
|
194
|
+
- **Interactive login** — headed by default since the launcher runs visible; complete 2FA/CAPTCHA once, reuse the persistent profile after.
|
|
195
|
+
- **Session persistence** — `--persist-storage` saves cookies + localStorage atomically for headless reuse.
|
|
196
|
+
- **Shadow DOM** — open + closed roots via CDP-pierced reads; `__pierce()` / `__openchrome.querySelectorAllDeep()` helpers in `javascript_tool`.
|
|
197
|
+
- **Element intelligence** — find elements by natural language (AX-first, CSS fallback, Korean role keywords built in: `"버튼"` → button).
|
|
198
|
+
- **Core / pilot tiers** — core is on by default and preserves the stable surface; `--pilot` opts into contract runtime, handoff persistence, voting, and the skill curator.
|
|
726
199
|
|
|
727
200
|
---
|
|
728
201
|
|
|
729
|
-
##
|
|
202
|
+
## Server & headless deployment
|
|
730
203
|
|
|
731
|
-
|
|
732
|
-
|
|
733
|
-
```
|
|
734
|
-
┌─────────────────────────────────────────────────────────────┐
|
|
735
|
-
│ Layer 7: MCP Gateway │
|
|
736
|
-
│ Rate limiter · Tool timeout (120s) · Error recovery hints │
|
|
737
|
-
├─────────────────────────────────────────────────────────────┤
|
|
738
|
-
│ Layer 6: Session Management │
|
|
739
|
-
│ TTL cleanup · Memory pressure · Target reconciliation │
|
|
740
|
-
├─────────────────────────────────────────────────────────────┤
|
|
741
|
-
│ Layer 5: Request Queue │
|
|
742
|
-
│ Per-session FIFO · Per-item timeout (120s) │
|
|
743
|
-
├─────────────────────────────────────────────────────────────┤
|
|
744
|
-
│ Layer 4: Circuit Breaker │
|
|
745
|
-
│ Element (3 fails) · Page (5 fails) · Global (10/5min) │
|
|
746
|
-
├─────────────────────────────────────────────────────────────┤
|
|
747
|
-
│ Layer 3: CDP Client │
|
|
748
|
-
│ Adaptive heartbeat · Stale target guard · Page defenses │
|
|
749
|
-
├─────────────────────────────────────────────────────────────┤
|
|
750
|
-
│ Layer 2: Reconnection Engine │
|
|
751
|
-
│ Auto-reconnect (5 retries) · Exponential backoff · Cookie │
|
|
752
|
-
│ restore · Sleep/wake detection │
|
|
753
|
-
├─────────────────────────────────────────────────────────────┤
|
|
754
|
-
│ Layer 1: Self-Healing │
|
|
755
|
-
│ Chrome watchdog · Tab health monitor · Event loop monitor │
|
|
756
|
-
│ Disk monitor · Health endpoint (/health, /metrics) │
|
|
757
|
-
├─────────────────────────────────────────────────────────────┤
|
|
758
|
-
│ Layer 0: Process Lifecycle │
|
|
759
|
-
│ Graceful shutdown · Orphan cleanup · Atomic file writes │
|
|
760
|
-
└─────────────────────────────────────────────────────────────┘
|
|
204
|
+
```bash
|
|
205
|
+
openchrome serve --server-mode # headless + auto-launch + server defaults
|
|
761
206
|
```
|
|
762
207
|
|
|
763
|
-
|
|
208
|
+
Works in CI/CD and containers with no login — navigation, scraping, screenshots,
|
|
209
|
+
forms, and parallel workflows all run in clean sessions. A production
|
|
210
|
+
`Dockerfile` is included (`docker build -t openchrome . && docker run openchrome`).
|
|
764
211
|
|
|
765
|
-
|
|
212
|
+
Authentication (per-tenant API keys, JWT/OAuth, shared token): [`docs/auth.md`](docs/auth.md).
|
|
213
|
+
Transport stability policy: [`docs/transport-lifecycle.md`](docs/transport-lifecycle.md).
|
|
766
214
|
|
|
767
|
-
|
|
215
|
+
---
|
|
768
216
|
|
|
769
|
-
|
|
770
|
-
"Submit button" → normalizeQuery → parseQueryForAX → AX Tree Resolution
|
|
771
|
-
│
|
|
772
|
-
match found?
|
|
773
|
-
/ \
|
|
774
|
-
yes no
|
|
775
|
-
│ │
|
|
776
|
-
[AX result] CSS Fallback
|
|
777
|
-
+ Shadow DOM
|
|
778
|
-
+ Scoring
|
|
779
|
-
```
|
|
217
|
+
## Documentation
|
|
780
218
|
|
|
781
|
-
|
|
782
|
-
|
|
783
|
-
|
|
784
|
-
|
|
785
|
-
|
|
219
|
+
| Topic | Link |
|
|
220
|
+
|---|---|
|
|
221
|
+
| Architecture & reliability layers | [`docs/architecture.md`](docs/architecture.md) |
|
|
222
|
+
| Getting started walkthrough | [`docs/getting-started.md`](docs/getting-started.md) |
|
|
223
|
+
| Full tool catalogue | [`docs/agent/capability-map.md`](docs/agent/capability-map.md) |
|
|
224
|
+
| CLI & playbook | [`docs/cli.md`](docs/cli.md) · [`docs/cli/playbook.md`](docs/cli/playbook.md) |
|
|
225
|
+
| HTTP daemon mode | [`docs/getting-started/http-daemon.md`](docs/getting-started/http-daemon.md) |
|
|
226
|
+
| Research recipes | [`docs/recipes/README.md`](docs/recipes/README.md) |
|
|
227
|
+
| Latest release notes | [`docs/releases/v1.12.2.md`](docs/releases/v1.12.2.md) |
|
|
786
228
|
|
|
787
229
|
---
|
|
788
230
|
|
|
@@ -794,12 +236,11 @@ cd openchrome
|
|
|
794
236
|
npm install && npm run build && npm test
|
|
795
237
|
```
|
|
796
238
|
|
|
797
|
-
|
|
798
|
-
|
|
799
|
-
|
|
800
|
-
smaller PR checks, `npm run lint:changed -- --base origin/develop` lints only
|
|
801
|
-
changed `src/**/*.ts` files and also fails on warnings.
|
|
239
|
+
Lint before submitting source changes: `npm run lint -- --max-warnings=0`
|
|
240
|
+
(or `npm run lint:changed -- --base origin/develop` for changed files only).
|
|
241
|
+
PRs target the `develop` branch.
|
|
802
242
|
|
|
803
243
|
## License
|
|
804
244
|
|
|
805
245
|
MIT
|
|
246
|
+
</content>
|