surfagent 1.0.3 → 1.0.5
- package/README.md +84 -111
- package/package.json +24 -2
package/README.md
CHANGED
@@ -1,8 +1,17 @@
 # surfagent
 
-
+**Browser automation API for AI agents.** Give any AI agent the ability to see, navigate, and interact with real web pages through Chrome.
 
-
+`npm install -g surfagent` — two commands to give your agent a browser.
+
+[](https://www.npmjs.com/package/surfagent)
+[](https://opensource.org/licenses/MIT)
+
+---
+
+**surfagent** connects to a local Chrome browser via CDP and exposes a simple HTTP API that returns structured page data — every interactive element, form field, link, and CSS selector — so AI agents can navigate websites fast and precisely without screenshots or trial-and-error.
+
+**Works with any AI agent framework:** LangChain, CrewAI, AutoGPT, Claude Code, OpenAI Agents, custom agents — anything that can make HTTP calls.
 
 ## Quick Start
 
@@ -11,159 +20,123 @@ npm install -g surfagent
 surfagent start
 ```
 
-
+A **new Chrome window** opens with debug mode — your personal Chrome is not affected. The API starts on `http://localhost:3456`.
 
-
+## Why surfagent?
 
-
-
-
-
-
-
-
+| Without surfagent | With surfagent |
+|---|---|
+| Agent takes screenshots, sends to vision model | Agent calls `/recon`, gets structured JSON in 30ms |
+| Guesses CSS selectors, fails, retries | Gets exact selectors from recon response |
+| Can't read forms, dropdowns, or modals | Gets form schemas with labels, types, required flags |
+| Breaks on SPAs, iframes, shadow DOM | Handles all of them out of the box |
+| Slow (2-5s per screenshot round-trip) | Fast (20-60ms per API call on existing tabs) |
 
-##
+## How Agents Use It
 
-
+The workflow is: **recon → act → read**.
 
-```
-
+```
+1. POST /recon → get the page map (selectors, forms, elements)
+2. POST /click → click something using a selector from step 1
+   POST /fill → fill a form using selectors from step 1
+3. POST /read → check what happened (success? error? new content?)
+4. POST /recon → if the page changed, map it again
 ```
 
-
-- Every clickable element with a CSS selector
-- Every form with field labels, types, and required flags
-- Page headings, navigation links, metadata
-- Overlay/modal detection
-
-Your agent uses those selectors to interact — no guessing.
-
-## What Can It Do?
+### Example: search on any website
 
-### Map a page
 ```bash
-#
+# 1. Recon the page — find the search input
 curl -X POST localhost:3456/recon -H 'Content-Type: application/json' \
   -d '{"tab":"0"}'
+# Response includes: { "selector": "input[name='search']", "text": "Search..." }
 
-#
-curl -X POST localhost:3456/
-  -d '{"
-```
+# 2. Type and submit
+curl -X POST localhost:3456/fill -H 'Content-Type: application/json' \
+  -d '{"tab":"0", "fields":[{"selector":"input[name=\"search\"]","value":"AI agents"}], "submit":"enter"}'
 
-
-```bash
-# Structured text — headings, tables, notifications
+# 3. Read the results
 curl -X POST localhost:3456/read -H 'Content-Type: application/json' \
   -d '{"tab":"0"}'
-
-# Read a specific element
-curl -X POST localhost:3456/read -H 'Content-Type: application/json' \
-  -d '{"tab":"0", "selector":".results"}'
 ```
 
-
-```bash
-curl -X POST localhost:3456/fill -H 'Content-Type: application/json' \
-  -d '{"tab":"0", "fields":[
-    {"selector":"#email", "value":"me@example.com"},
-    {"selector":"#password", "value":"secret"}
-  ], "submit":"enter"}'
-```
+## All Endpoints
 
-
-
-
-
-
+| Endpoint | Method | Description |
+|---|---|---|
+| `/recon` | POST | Full page map — every element, form, selector, heading, nav link, metadata, captcha detection |
+| `/read` | POST | Structured page content — headings, tables, code blocks, notifications, result areas |
+| `/fill` | POST | Fill form fields with real CDP keystrokes (works with React, Vue, SPAs) |
+| `/click` | POST | Click by CSS selector or text match (handles `target="_blank"` automatically) |
+| `/scroll` | POST | Scroll page, returns visible content preview and scroll position |
+| `/navigate` | POST | Go to URL, back, or forward in the same tab |
+| `/eval` | POST | Run JavaScript in any tab or cross-origin iframe |
+| `/captcha` | POST | Detect and interact with captchas — Arkose, reCAPTCHA, hCaptcha (experimental) |
+| `/focus` | POST | Bring a tab to the front in Chrome |
+| `/tabs` | GET | List all open Chrome tabs |
+| `/health` | GET | Check if Chrome and API are connected |
 
-
-curl -X POST localhost:3456/click -H 'Content-Type: application/json' \
-  -d '{"tab":"0", "selector":"#submit-btn"}'
-```
+Full API reference with request/response schemas: **[API.md](./API.md)**
 
-
-```bash
-# Go to URL (same tab)
-curl -X POST localhost:3456/navigate -H 'Content-Type: application/json' \
-  -d '{"tab":"0", "url":"https://example.com"}'
+## Key Features
 
-
-curl -X POST localhost:3456/navigate -H 'Content-Type: application/json' \
-  -d '{"tab":"0", "back":true}'
-```
+**Page reconnaissance** — one call returns every interactive element with stable CSS selectors, form schemas with field labels and validation, navigation structure, metadata, and content summary.
 
-
-```bash
-curl -X POST localhost:3456/scroll -H 'Content-Type: application/json' \
-  -d '{"tab":"0", "direction":"down", "amount":1000}'
-```
+**Real keyboard input** — fills forms using CDP `Input.dispatchKeyEvent`, not JavaScript value injection. Works with React, Vue, Angular, and any framework-controlled inputs.
 
-
-```bash
-curl -X POST localhost:3456/eval -H 'Content-Type: application/json' \
-  -d '{"tab":"0", "expression":"document.title"}'
-```
+**Cross-origin iframe support** — target iframes by domain (`"tab": "stripe.com"`). CDP connects to them as separate targets, bypassing same-origin restrictions.
 
-
+**SPA navigation** — handles single-page apps (YouTube, Gmail, Google Flights). Enter key submission, client-side routing, dynamic content — all work.
 
-
-
-
-
-
-| `/click` | POST | Click by selector or text |
-| `/scroll` | POST | Scroll with content preview |
-| `/navigate` | POST | Go to URL, back, or forward (same tab) |
-| `/eval` | POST | Run JavaScript in any tab or iframe |
-| `/captcha` | POST | Detect captchas, basic interaction (experimental) |
-| `/focus` | POST | Bring a tab to the front |
-| `/tabs` | GET | List open tabs |
-| `/health` | GET | Check Chrome connection |
-
-Full API reference with response schemas: [API.md](./API.md)
+**Captcha detection** — `/recon` automatically detects captcha iframes (Arkose, reCAPTCHA, hCaptcha) and flags them. `/captcha` endpoint provides basic interaction.
+
+**Overlay detection** — modals, cookie banners, and blocking overlays are detected and reported so agents can dismiss them before interacting.
+
+**Same-tab navigation** — links with `target="_blank"` are automatically opened in the same tab instead of spawning new ones.
 
 ## Tab Targeting
 
-Every endpoint
+Every endpoint accepts a `tab` field:
 
 ```json
 {"tab": "0"} // by index
-{"tab": "github"} //
-{"tab": "
+{"tab": "github"} // partial match on URL or title
+{"tab": "stripe.com"} // matches cross-origin iframes too
 ```
 
-##
-
-The workflow is: **recon → act → read**.
+## Commands
 
+```bash
+surfagent start # Start Chrome + API (one command)
+surfagent chrome # Start Chrome debug session only
+surfagent api # Start API only (Chrome must be running)
+surfagent health # Check if everything is running
+surfagent help # Show all options
 ```
-1. /recon → get the page map (selectors, forms, elements)
-2. /click → click something using a selector from step 1
-   /fill → fill a form using selectors from step 1
-3. /read → check what happened (success message? error? new content?)
-4. /recon → if the page changed, map it again
-```
-
-Agents never need to guess selectors or parse screenshots. The recon response has everything.
 
 ## Tested On
 
-
-
-
-
-
-
-
-
+Google Flights, YouTube, GitHub, Supabase, Hacker News, Reddit, CodePen, Polymarket, npm — including autocomplete dropdowns, date pickers, complex forms, SPA navigation, cross-origin iframes, and captchas.
+
+## Platform Support
+
+| Platform | Status |
+|---|---|
+| macOS | Fully supported |
+| Linux | Fully supported |
+| Windows | Not yet supported — coming soon |
 
 ## Requirements
 
+- macOS or Linux
 - Chrome (any recent version)
 - Node.js 18+
 
+## Contributing
+
+Issues and PRs welcome at [github.com/AllAboutAI-YT/surfagent](https://github.com/AllAboutAI-YT/surfagent).
+
 ## License
 
 MIT
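The recon → act → read loop documented in the updated README can be sketched as a small Python client. This is a minimal sketch, not surfagent's own client: the endpoint paths, port 3456, and the `tab`, `fields`, and `submit` request fields are taken from the README's curl examples, while `fill_payload`, `post`, and `search` are hypothetical helper names, and the code assumes every endpoint replies with a JSON body.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3456"  # default API address stated in the README


def fill_payload(tab, values, submit=None):
    """Build a /fill request body: one {"selector", "value"} entry per field."""
    body = {
        "tab": tab,
        "fields": [{"selector": sel, "value": val} for sel, val in values.items()],
    }
    if submit is not None:
        body["submit"] = submit  # the README example uses "enter"
    return body


def post(path, body, base_url=BASE_URL, timeout=10.0):
    """POST a JSON body and decode the reply (assumption: the reply is JSON)."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def search(tab, selector, query):
    """Run recon -> fill -> read, mirroring the three curl calls in the README."""
    page_map = post("/recon", {"tab": tab})  # 1. map the page
    post("/fill", fill_payload(tab, {selector: query}, submit="enter"))  # 2. act
    content = post("/read", {"tab": tab})  # 3. read what changed
    return page_map, content
```

With `surfagent start` running, `search("0", 'input[name="search"]', "AI agents")` would issue the same three requests as the README's search example. In practice an agent would take the selector from the `/recon` response rather than hard-coding it.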
package/package.json
CHANGED
@@ -1,7 +1,29 @@
 {
   "name": "surfagent",
-  "version": "1.0.
-  "description": "
+  "version": "1.0.5",
+  "description": "Browser automation API for AI agents — structured page recon, form filling, clicking, and navigation via Chrome CDP",
+  "keywords": [
+    "ai-agent",
+    "browser-automation",
+    "chrome-devtools",
+    "cdp",
+    "web-scraping",
+    "automation",
+    "langchain",
+    "crewai",
+    "openai",
+    "claude",
+    "agent",
+    "browser",
+    "recon",
+    "selenium-alternative",
+    "puppeteer-alternative"
+  ],
+  "repository": {
+    "type": "git",
+    "url": "https://github.com/AllAboutAI-YT/surfagent"
+  },
+  "license": "MIT",
   "main": "dist/api/server.js",
   "type": "module",
   "bin": {