athena-browser-mcp 2.0.5 → 2.1.0
- package/README.md +80 -231
- package/package.json +1 -1
package/README.md
CHANGED
@@ -1,293 +1,142 @@
 # Athena Browser MCP

-
-[](https://www.npmjs.com/package/athena-browser-mcp)
-[](https://opensource.org/licenses/MIT)
+An MCP server for browser automation that exposes semantic, token-efficient page representations optimized for LLM agents.

-
+---

-##
+## Motivation

-LLM agents
+LLM-based agents operate under strict context window and token constraints.
+However, most browser automation tools expose entire DOMs or full accessibility trees to the model.

-
+This leads to:

--
--
--
-- **Stable references** - Semantic `eid`s survive DOM mutations, eliminating stale element errors
+- Rapid token exhaustion
+- Higher inference costs
+- Reduced reliability as relevant signal is buried in noise

-
+In practice, agents spend more effort _finding_ the right information than reasoning about it.

-
+Athena exists to change the unit of information exposed to the model.

-
+---

-
-| --- | ------------------------------------------------------------------------------------- | -------------- | ---------- | ----------- | ---------- |
-| 1 | Login → Create wishlist "Summer Escapes" → Add beach property (Airbnb) | **Athena** | ✅ Success | 92,870 | 2m 08s |
-| | | **Playwright** | ✅ Success | 137,063 | 5m 23s |
-| 2 | Bangkok Experiences → Food tour → Extract itinerary & pricing (Airbnb) | **Athena** | ✅ Success | 87,194 | 3m 27s |
-| | | **Playwright** | ✅ Success | 94,942 | 3m 38s |
-| 3 | Miami → Beachfront stays under $300 → Top 3 names + prices (Airbnb) | **Athena** | ✅ Success | 124,597 | 5m 38s |
-| | | **Playwright** | ✅ Success | 122,077 | 4m 51s |
-| 4 | Paris → "Play" section → Top 5 titles + descriptions (Airbnb) | **Athena** | ❌ Failed | 146,575 | 4m 15s |
-| | | **Playwright** | ❌ Failed | 189,495 | 7m 37s |
-| 5 | Navigate Apple → find iPhone → configure iPhone 17 → add 256GB Black → confirm in bag | **Athena** | ✅ Success | 65,629 | 3m 30s |
-| | | **Playwright** | ✅ Success | 102,754 | 6m 59s |
+## Core Idea: Semantic Page Snapshots

-**
+Instead of exposing raw DOM structures or full accessibility trees, Athena produces **semantic page snapshots**.

-
-- **Time**: Athena completed tasks **9m 30s faster** (~33.4% faster)
+These snapshots are:

-
+- Compact and structured
+- Focused on user-visible intent
+- Designed for LLM recall and reasoning, not DOM completeness
+- Stable across layout shifts and DOM churn

-
+The goal is not to mirror the browser, but to present the page in a form that aligns with how language models reason about interfaces.

-
-┌─────────────────────────────────────────────────────────────────┐
-│ AI Agent │
-│ ┌────────────────────────────────────────────────────────────┐ │
-│ │ System Prompt: XML state (layers, actionables, atoms) │ │
-│ └────────────────────────────────────────────────────────────┘ │
-└───────────────────────────┬─────────────────────────────────────┘
-│ MCP Protocol (stdio)
-┌───────────────────────────▼─────────────────────────────────────┐
-│ SESSION: launch_browser, connect_browser, close_page, │
-│ close_session │
-│ NAVIGATION: navigate, go_back, go_forward, reload │
-│ OBSERVATION: capture_snapshot, find_elements, get_node_details │
-│ INTERACTION: click, type, press, select, hover, │
-│ scroll_element_into_view, scroll_page │
-└───────────────────────────┬─────────────────────────────────────┘
-│ Playwright + CDP
-┌───────────────────────────▼─────────────────────────────────────┐
-│ Chromium Browser │
-└─────────────────────────────────────────────────────────────────┘
-```
+---

-##
+## How It Works

-
+At a high level:

-
-
-
-
-| `close_page` | Close specific page | `{ page_id }` |
-| `close_session` | Close entire browser | `{}` |
+1. The browser is controlled via Playwright and CDP
+2. The page is reduced into semantic regions and actionable elements
+3. A structured snapshot is generated and sent to the LLM
+4. Actions are resolved against stable semantic identifiers rather than fragile selectors

-
+This separation keeps:

-
-
-
-| `go_back` | Browser back | `{ page_id? }` |
-| `go_forward` | Browser forward | `{ page_id? }` |
-| `reload` | Refresh page | `{ page_id? }` |
+- Browser lifecycle management isolated
+- Snapshots deterministic and low-entropy
+- Agent reasoning predictable and efficient
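
Step 4 of the list above depends on identifiers that do not change when the DOM is re-rendered. The following is a minimal sketch of one way such an identifier could be derived, assuming it is a hash over the element's kind, accessible label, and landmark path (three of the inputs the 2.0.5 README lists for `eid`). The names `SemanticKey`, `computeEid`, and `fnv1a`, and the choice of a 32-bit FNV-1a hash, are assumptions made for illustration and are not taken from Athena's source.

```typescript
// Hypothetical shape: the semantic properties that identify an element.
interface SemanticKey {
  kind: string;           // role, e.g. "button", "link", "input"
  label: string;          // accessible name
  landmarkPath: string[]; // region and group hierarchy, outermost first
}

// 32-bit FNV-1a over UTF-16 code units; a stand-in for whatever hash is really used.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Derive the identifier from semantic properties only, never from DOM position,
// generated class names, or node ids, so that re-renders do not change it.
function computeEid(key: SemanticKey): string {
  const canonical = [
    key.kind.toLowerCase(),
    key.label.trim().replace(/\s+/g, " ").toLowerCase(),
    key.landmarkPath.join("/"),
  ].join("\u0000");
  return fnv1a(canonical);
}

const before = computeEid({ kind: "button", label: "Sign In", landmarkPath: ["header"] });
// Same logical element after a re-render that only changed whitespace in the label.
const after = computeEid({ kind: "button", label: "  Sign  In ", landmarkPath: ["header"] });
console.log(before === after); // true
```

Because the identifier depends only on what the element is and where it sits semantically, an agent can keep using it across snapshots; the trade-off is that two elements with identical kind, label, and landmark path would collide unless a further input (such as a position hint) is added.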

-
+---

-
-| ------------------ | ------------------- | ----------------------------------------------------------------- |
-| `capture_snapshot` | Capture page state | `{ page_id? }` |
-| `find_elements` | Find by criteria | `{ kind?, label?, region?, limit?, include_readable?, page_id? }` |
-| `get_node_details` | Get element details | `{ eid, page_id? }` |
+## Benchmarks

-
+Early benchmarks against Playwright MCP show:

-
-
-
-| `type` | Type text | `{ eid, text, clear?, page_id? }` |
-| `press` | Press keyboard key | `{ key, modifiers?, page_id? }` |
-| `select` | Select option | `{ eid, value, page_id? }` |
-| `hover` | Hover element | `{ eid, page_id? }` |
-| `scroll_element_into_view` | Scroll to element | `{ eid, page_id? }` |
-| `scroll_page` | Scroll viewport | `{ direction, amount?, page_id? }` |
+- **~19% fewer tokens consumed**
+- **~33% faster task completion**
+- Same or better success rates on common navigation tasks

-
+Benchmarks were run using Claude Code on representative real-world tasks.
+Results are task-dependent and should be treated as directional rather than absolute.
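
The summary percentages can be checked against the per-task table that the 2.0.5 README carried (removed in this diff). The sketch below recomputes them from those ten rows, including task 4, which failed for both tools. The README does not say how the token figure was aggregated, so two plausible calculations are shown; treating either as the author's method is an assumption.

```typescript
// Per-task figures copied from the 2.0.5 README table (tasks 1-5, in order).
interface Run { tokens: number; seconds: number }
const toSeconds = (m: number, s: number): number => m * 60 + s;

const athena: Run[] = [
  { tokens: 92870, seconds: toSeconds(2, 8) },
  { tokens: 87194, seconds: toSeconds(3, 27) },
  { tokens: 124597, seconds: toSeconds(5, 38) },
  { tokens: 146575, seconds: toSeconds(4, 15) },
  { tokens: 65629, seconds: toSeconds(3, 30) },
];
const playwright: Run[] = [
  { tokens: 137063, seconds: toSeconds(5, 23) },
  { tokens: 94942, seconds: toSeconds(3, 38) },
  { tokens: 122077, seconds: toSeconds(4, 51) },
  { tokens: 189495, seconds: toSeconds(7, 37) },
  { tokens: 102754, seconds: toSeconds(6, 59) },
];

const sum = (xs: number[]): number => xs.reduce((a, b) => a + b, 0);

// Aggregate saving: 1 - (total for Athena / total for Playwright).
const timeSaving =
  1 - sum(athena.map((r) => r.seconds)) / sum(playwright.map((r) => r.seconds));
const tokenSavingAggregate =
  1 - sum(athena.map((r) => r.tokens)) / sum(playwright.map((r) => r.tokens));
// Mean of per-task savings, which weights every task equally.
const tokenSavingPerTask =
  sum(athena.map((r, i) => 1 - r.tokens / playwright[i].tokens)) / athena.length;

const pct = (x: number): string => (x * 100).toFixed(1) + "%";
console.log(pct(timeSaving));           // 33.4%
console.log(pct(tokenSavingAggregate)); // 20.0%
console.log(pct(tokenSavingPerTask));   // 19.4%
```

The time figure reproduces the "9m 30s faster (~33.4%)" line from the old README (1138 s versus 1708 s). The token saving comes out at roughly 19% or 20% depending on the aggregation, which is consistent with the README's advice to read these numbers as directional.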

-
+---

-
-<match eid="a1b2c3d4e5f6" kind="button" label="Sign In" region="header" />
-```
+## What Athena Is (and Is Not)

-
-
-- Role/kind (button, link, input)
-- Accessible name (label text)
-- Landmark path (region + group hierarchy)
-- Position hint (screen zone, quadrant)
-
-This means the same logical element keeps its `eid` across page updates.
-
-## Response Format
-
-Tools return XML state responses with page understanding:
-
-```xml
-<state page_id="abc123" url="https://example.com" title="Example">
-<layer type="main" active="true">
-<actionables count="12">
-<el eid="a1b2c3" kind="button" label="Sign In" />
-<el eid="d4e5f6" kind="link" label="Forgot password?" />
-<el eid="g7h8i9" kind="input" label="Email" type="email" />
-</actionables>
-</layer>
-<atoms>
-<viewport w="1280" h="720" />
-<scroll x="0" y="0" />
-</atoms>
-</state>
-```
+### Athena is:

-
+- A semantic interface between browsers and LLM agents
+- An MCP server focused on reliability and efficiency
+- Designed for agent workflows, not test automation

-
-| --------- | -------------------------- |
-| `main` | Primary page content |
-| `modal` | Dialog overlays |
-| `drawer` | Slide-in panels |
-| `popover` | Dropdowns, tooltips, menus |
+### Athena is not:

-
+- A general-purpose browser
+- A visual testing or screenshot framework
+- A replacement for Playwright

-
+Playwright remains the execution layer; Athena focuses on representation and reasoning.

-
-1. launch_browser { }
-→ XML state with initial page
+---

-
-→ State shows login form elements
+## Usage

-
-→ <match eid="abc123" kind="input" label="Email" />
+Athena implements the **Model Context Protocol (MCP)** and works with:

-
-
+- Claude Code
+- Claude Desktop
+- Cursor
+- VS Code
+- Any MCP-compatible client

-
-→ Value filled
+Example workflows include:

-
-
+- Navigating complex web apps
+- Handling login and consent flows
+- Performing multi-step UI interactions with lower token usage

-
-→ Password filled
+See the `examples/` directory for concrete agent workflows.

-
-→ Form submitted, navigation to dashboard
-```
-
-### Cookie Consent (Multi-Frame)
-
-```
-1. navigate { url: "https://news-site.com" }
-→ Modal layer detected (cookie consent iframe)
-
-2. find_elements { label: "Accept", kind: "button" }
-→ <match eid="xyz789" kind="button" label="Accept All" />
-
-3. click { eid: "xyz789" }
-→ Modal closed, main layer active
-```
+---

 ## Installation

 ```bash
+git clone https://github.com/lespaceman/athena-browser-mcp
+cd athena-browser-mcp
 npm install
 npm run build
 ```

-
-
-### Claude Desktop
+Configure the MCP server in your client according to its MCP integration instructions.

-
+---

-
-**Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
-**Linux**: `~/.config/Claude/claude_desktop_config.json`
+## Architecture Overview

-
-{
-"mcpServers": {
-"browser": {
-"command": "npx",
-"args": ["athena-browser-mcp@latest"]
-}
-}
-}
-```
+Athena separates concerns into three layers:

-
-
-
-claude mcp add athena-browser-mcp npx athena-browser-mcp@latest
-```
+- **Browser lifecycle** — page creation, navigation, teardown
+- **Semantic snapshot generation** — regions, elements, identifiers
+- **Action resolution** — mapping agent intent to browser actions

-
+This separation allows each layer to evolve independently while keeping agent-visible behavior stable.
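
The three layers can be pictured as three narrow interfaces, each depending only on the one beneath it. This is a rough sketch under stated assumptions: every type, method, and class name below (`BrowserLifecycle`, `SnapshotGenerator`, `ActionResolver`, `FakeBrowser`) is hypothetical, and the in-memory fakes stand in for Playwright so that the example runs without a browser. It does not describe Athena's actual module layout.

```typescript
// All names here are hypothetical; they only illustrate the layering.
interface SnapshotElement { eid: string; kind: string; label: string }
interface Snapshot { url: string; elements: SnapshotElement[] }

// Layer 1: browser lifecycle (page creation, navigation, teardown).
interface BrowserLifecycle {
  navigate(url: string): void;
  currentUrl(): string;
  performClick(label: string): void;
}

// Layer 2: semantic snapshot generation (regions, elements, identifiers).
interface SnapshotGenerator {
  capture(browser: BrowserLifecycle): Snapshot;
}

// Layer 3: action resolution (agent intent, expressed as an eid, to a browser action).
class ActionResolver {
  private browser: BrowserLifecycle;
  private snapshots: SnapshotGenerator;

  constructor(browser: BrowserLifecycle, snapshots: SnapshotGenerator) {
    this.browser = browser;
    this.snapshots = snapshots;
  }

  click(eid: string): Snapshot {
    const current = this.snapshots.capture(this.browser);
    const target = current.elements.find((e) => e.eid === eid);
    if (!target) throw new Error(`unknown eid: ${eid}`);
    this.browser.performClick(target.label);
    return this.snapshots.capture(this.browser); // fresh state for the agent
  }
}

// In-memory fakes so the sketch runs without a real browser.
class FakeBrowser implements BrowserLifecycle {
  private url = "about:blank";
  clicked: string[] = [];
  navigate(url: string): void { this.url = url; }
  currentUrl(): string { return this.url; }
  performClick(label: string): void { this.clicked.push(label); }
}

const fakeSnapshots: SnapshotGenerator = {
  capture: (b) => ({
    url: b.currentUrl(),
    elements: [{ eid: "a1b2c3", kind: "button", label: "Sign In" }],
  }),
};

const fakeBrowser = new FakeBrowser();
fakeBrowser.navigate("https://example.com");
const resolver = new ActionResolver(fakeBrowser, fakeSnapshots);
const nextState = resolver.click("a1b2c3");
console.log(fakeBrowser.clicked, nextState.url);
```

Because the agent only ever sees `Snapshot` values and `eid` strings, the browser layer or the snapshot generator could be swapped out without changing what the agent observes, which is the stability property the paragraph above describes.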

-
-code --add-mcp '{"name":"athena-browser-mcp","command":"npx","args":["athena-browser-mcp@latest"]}'
-```
+---

-
+## Status

-
+Athena is under active development.
+APIs and snapshot formats may evolve as real-world agent usage informs the design.

-
-npx athena-browser-mcp@latest
-```
+Feedback from practitioners building agent systems is especially welcome.

-
-
-```bash
-codex mcp add athena-browser-mcp npx athena-browser-mcp@latest
-```
-
-### Gemini CLI
-
-```bash
-gemini mcp add -s user athena-browser-mcp -- npx athena-browser-mcp@latest
-```
-
-### Connect to Existing Browser
-
-To connect to an existing Chromium browser with CDP enabled:
-
-```bash
-# Start Chrome with remote debugging
-google-chrome --remote-debugging-port=9222
-
-# Or use environment variables
-export CEF_BRIDGE_HOST=127.0.0.1
-export CEF_BRIDGE_PORT=9222
-```
-
-Then use `connect_browser` instead of `launch_browser`.
-
-### Environment Variables
-
-| Variable | Description | Default |
-| ----------------- | -------------------- | ----------- |
-| `CEF_BRIDGE_HOST` | CDP host for connect | `127.0.0.1` |
-| `CEF_BRIDGE_PORT` | CDP port for connect | `9223` |
-
-## Development
-
-```bash
-npm run build # Compile TypeScript
-npm run type-check # TypeScript type checking
-npm run lint # ESLint
-npm run format # Prettier format
-npm run check # Run all checks
-npm test # Run tests
-```
+---

 ## License
