surfagent 1.0.4 → 1.0.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2) hide show
  1. package/README.md +75 -111
  2. package/package.json +24 -2
package/README.md CHANGED
@@ -1,8 +1,17 @@
1
1
  # surfagent
2
2
 
3
- A local API that gives AI agents structured page data from your Chrome browser. Instead of guessing selectors or taking screenshots, your agent gets a complete map of every element, form, and link on the page then acts on it precisely.
3
+ **Browser automation API for AI agents.** Give any AI agent the ability to see, navigate, and interact with real web pages through Chrome.
4
4
 
5
- **One recon call replaces dozens of trial-and-error clicks.**
5
+ `npm install -g surfagent` two commands to give your agent a browser.
6
+
7
+ [![npm version](https://img.shields.io/npm/v/surfagent.svg)](https://www.npmjs.com/package/surfagent)
8
+ [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
9
+
10
+ ---
11
+
12
+ **surfagent** connects to a local Chrome browser via CDP and exposes a simple HTTP API that returns structured page data — every interactive element, form field, link, and CSS selector — so AI agents can navigate websites fast and precisely without screenshots or trial-and-error.
13
+
14
+ **Works with any AI agent framework:** LangChain, CrewAI, AutoGPT, Claude Code, OpenAI Agents, custom agents — anything that can make HTTP calls.
6
15
 
7
16
  ## Quick Start
8
17
 
@@ -11,153 +20,104 @@ npm install -g surfagent
11
20
  surfagent start
12
21
  ```
13
22
 
14
- That's it. A **new Chrome window** opens with debug mode — your personal Chrome is not affected. The API starts on `http://localhost:3456` and your agent can start calling it immediately.
23
+ A **new Chrome window** opens with debug mode — your personal Chrome is not affected. The API starts on `http://localhost:3456`.
15
24
 
16
- ### Other commands
25
+ ## Why surfagent?
17
26
 
18
- ```bash
19
- surfagent start # Start Chrome + API (one command)
20
- surfagent chrome # Start Chrome debug session only
21
- surfagent api # Start API only (Chrome must be running)
22
- surfagent health # Check if everything is running
23
- surfagent help # Show all options
24
- ```
27
+ | Without surfagent | With surfagent |
28
+ |---|---|
29
+ | Agent takes screenshots, sends to vision model | Agent calls `/recon`, gets structured JSON in 30ms |
30
+ | Guesses CSS selectors, fails, retries | Gets exact selectors from recon response |
31
+ | Can't read forms, dropdowns, or modals | Gets form schemas with labels, types, required flags |
32
+ | Breaks on SPAs, iframes, shadow DOM | Handles all of them out of the box |
33
+ | Slow (2-5s per screenshot round-trip) | Fast (20-60ms per API call on existing tabs) |
25
34
 
26
- ## Your First Recon
35
+ ## How Agents Use It
27
36
 
28
- Open any website in Chrome, then:
37
+ The workflow is: **recon act → read**.
29
38
 
30
- ```bash
31
- curl -s -X POST localhost:3456/recon -d '{"tab":"0"}' -H 'Content-Type: application/json'
39
+ ```
40
+ 1. POST /recon get the page map (selectors, forms, elements)
41
+ 2. POST /click → click something using a selector from step 1
42
+ POST /fill → fill a form using selectors from step 1
43
+ 3. POST /read → check what happened (success? error? new content?)
44
+ 4. POST /recon → if the page changed, map it again
32
45
  ```
33
46
 
34
- You get back:
35
- - Every clickable element with a CSS selector
36
- - Every form with field labels, types, and required flags
37
- - Page headings, navigation links, metadata
38
- - Overlay/modal detection
39
-
40
- Your agent uses those selectors to interact — no guessing.
41
-
42
- ## What Can It Do?
47
+ ### Example: search on any website
43
48
 
44
- ### Map a page
45
49
  ```bash
46
- # Get full page structure
50
+ # 1. Recon the page — find the search input
47
51
  curl -X POST localhost:3456/recon -H 'Content-Type: application/json' \
48
52
  -d '{"tab":"0"}'
53
+ # Response includes: { "selector": "input[name='search']", "text": "Search..." }
49
54
 
50
- # Open a URL and map it
51
- curl -X POST localhost:3456/recon -H 'Content-Type: application/json' \
52
- -d '{"url":"https://example.com", "keepTab": true}'
53
- ```
55
+ # 2. Type and submit
56
+ curl -X POST localhost:3456/fill -H 'Content-Type: application/json' \
57
+ -d '{"tab":"0", "fields":[{"selector":"input[name=\"search\"]","value":"AI agents"}], "submit":"enter"}'
54
58
 
55
- ### Read page content
56
- ```bash
57
- # Structured text — headings, tables, notifications
59
+ # 3. Read the results
58
60
  curl -X POST localhost:3456/read -H 'Content-Type: application/json' \
59
61
  -d '{"tab":"0"}'
60
-
61
- # Read a specific element
62
- curl -X POST localhost:3456/read -H 'Content-Type: application/json' \
63
- -d '{"tab":"0", "selector":".results"}'
64
62
  ```
65
63
 
66
- ### Fill forms
67
- ```bash
68
- curl -X POST localhost:3456/fill -H 'Content-Type: application/json' \
69
- -d '{"tab":"0", "fields":[
70
- {"selector":"#email", "value":"me@example.com"},
71
- {"selector":"#password", "value":"secret"}
72
- ], "submit":"enter"}'
73
- ```
64
+ ## All Endpoints
74
65
 
75
- ### Click elements
76
- ```bash
77
- # By text
78
- curl -X POST localhost:3456/click -H 'Content-Type: application/json' \
79
- -d '{"tab":"0", "text":"Sign In"}'
66
+ | Endpoint | Method | Description |
67
+ |---|---|---|
68
+ | `/recon` | POST | Full page map — every element, form, selector, heading, nav link, metadata, captcha detection |
69
+ | `/read` | POST | Structured page content — headings, tables, code blocks, notifications, result areas |
70
+ | `/fill` | POST | Fill form fields with real CDP keystrokes (works with React, Vue, SPAs) |
71
+ | `/click` | POST | Click by CSS selector or text match (handles `target="_blank"` automatically) |
72
+ | `/scroll` | POST | Scroll page, returns visible content preview and scroll position |
73
+ | `/navigate` | POST | Go to URL, back, or forward in the same tab |
74
+ | `/eval` | POST | Run JavaScript in any tab or cross-origin iframe |
75
+ | `/captcha` | POST | Detect and interact with captchas — Arkose, reCAPTCHA, hCaptcha (experimental) |
76
+ | `/focus` | POST | Bring a tab to the front in Chrome |
77
+ | `/tabs` | GET | List all open Chrome tabs |
78
+ | `/health` | GET | Check if Chrome and API are connected |
80
79
 
81
- # By selector (from recon)
82
- curl -X POST localhost:3456/click -H 'Content-Type: application/json' \
83
- -d '{"tab":"0", "selector":"#submit-btn"}'
84
- ```
80
+ Full API reference with request/response schemas: **[API.md](./API.md)**
85
81
 
86
- ### Navigate
87
- ```bash
88
- # Go to URL (same tab)
89
- curl -X POST localhost:3456/navigate -H 'Content-Type: application/json' \
90
- -d '{"tab":"0", "url":"https://example.com"}'
82
+ ## Key Features
91
83
 
92
- # Go back
93
- curl -X POST localhost:3456/navigate -H 'Content-Type: application/json' \
94
- -d '{"tab":"0", "back":true}'
95
- ```
84
+ **Page reconnaissance** — one call returns every interactive element with stable CSS selectors, form schemas with field labels and validation, navigation structure, metadata, and content summary.
96
85
 
97
- ### Scroll
98
- ```bash
99
- curl -X POST localhost:3456/scroll -H 'Content-Type: application/json' \
100
- -d '{"tab":"0", "direction":"down", "amount":1000}'
101
- ```
86
+ **Real keyboard input** — fills forms using CDP `Input.dispatchKeyEvent`, not JavaScript value injection. Works with React, Vue, Angular, and any framework-controlled inputs.
102
87
 
103
- ### Run JavaScript
104
- ```bash
105
- curl -X POST localhost:3456/eval -H 'Content-Type: application/json' \
106
- -d '{"tab":"0", "expression":"document.title"}'
107
- ```
88
+ **Cross-origin iframe support** — target iframes by domain (`"tab": "stripe.com"`). CDP connects to them as separate targets, bypassing same-origin restrictions.
108
89
 
109
- ## All Endpoints
90
+ **SPA navigation** — handles single-page apps (YouTube, Gmail, Google Flights). Enter key submission, client-side routing, dynamic content — all work.
110
91
 
111
- | Endpoint | Method | What it does |
112
- |---|---|---|
113
- | `/recon` | POST | Full page map elements, forms, selectors, metadata |
114
- | `/read` | POST | Structured page content — headings, tables, notifications |
115
- | `/fill` | POST | Fill form fields with real keystrokes |
116
- | `/click` | POST | Click by selector or text |
117
- | `/scroll` | POST | Scroll with content preview |
118
- | `/navigate` | POST | Go to URL, back, or forward (same tab) |
119
- | `/eval` | POST | Run JavaScript in any tab or iframe |
120
- | `/captcha` | POST | Detect captchas, basic interaction (experimental) |
121
- | `/focus` | POST | Bring a tab to the front |
122
- | `/tabs` | GET | List open tabs |
123
- | `/health` | GET | Check Chrome connection |
124
-
125
- Full API reference with response schemas: [API.md](./API.md)
92
+ **Captcha detection** `/recon` automatically detects captcha iframes (Arkose, reCAPTCHA, hCaptcha) and flags them. `/captcha` endpoint provides basic interaction.
93
+
94
+ **Overlay detection** modals, cookie banners, and blocking overlays are detected and reported so agents can dismiss them before interacting.
95
+
96
+ **Same-tab navigation** — links with `target="_blank"` are automatically opened in the same tab instead of spawning new ones.
126
97
 
127
98
  ## Tab Targeting
128
99
 
129
- Every endpoint takes a `tab` field. You can target tabs three ways:
100
+ Every endpoint accepts a `tab` field:
130
101
 
131
102
  ```json
132
103
  {"tab": "0"} // by index
133
- {"tab": "github"} // by URL or title (partial match)
134
- {"tab": "cdpn.io"} // cross-origin iframes work too
104
+ {"tab": "github"} // partial match on URL or title
105
+ {"tab": "stripe.com"} // matches cross-origin iframes too
135
106
  ```
136
107
 
137
- ## How Agents Should Use This
138
-
139
- The workflow is: **recon → act → read**.
108
+ ## Commands
140
109
 
141
- ```
142
- 1. /recon → get the page map (selectors, forms, elements)
143
- 2. /click → click something using a selector from step 1
144
- /fill → fill a form using selectors from step 1
145
- 3. /read check what happened (success message? error? new content?)
146
- 4. /recon → if the page changed, map it again
110
+ ```bash
111
+ surfagent start # Start Chrome + API (one command)
112
+ surfagent chrome # Start Chrome debug session only
113
+ surfagent api # Start API only (Chrome must be running)
114
+ surfagent health # Check if everything is running
115
+ surfagent help # Show all options
147
116
  ```
148
117
 
149
- Agents never need to guess selectors or parse screenshots. The recon response has everything.
150
-
151
118
  ## Tested On
152
119
 
153
- - Google Flights (autocomplete dropdowns, date pickers, complex forms)
154
- - YouTube (SPA navigation, search, video selection)
155
- - GitHub (login forms, repository pages)
156
- - Supabase (dashboard navigation, SQL editor)
157
- - Hacker News (link following, content reading)
158
- - Reddit (thread navigation, comments)
159
- - CodePen (cross-origin iframe interaction)
160
- - Polymarket (market data extraction)
120
+ Google Flights, YouTube, GitHub, Supabase, Hacker News, Reddit, CodePen, Polymarket, npm — including autocomplete dropdowns, date pickers, complex forms, SPA navigation, cross-origin iframes, and captchas.
161
121
 
162
122
  ## Platform Support
163
123
 
@@ -173,6 +133,10 @@ Agents never need to guess selectors or parse screenshots. The recon response ha
173
133
  - Chrome (any recent version)
174
134
  - Node.js 18+
175
135
 
136
+ ## Contributing
137
+
138
+ Issues and PRs welcome at [github.com/AllAboutAI-YT/surfagent](https://github.com/AllAboutAI-YT/surfagent).
139
+
176
140
  ## License
177
141
 
178
142
  MIT
package/package.json CHANGED
@@ -1,7 +1,29 @@
1
1
  {
2
2
  "name": "surfagent",
3
- "version": "1.0.4",
4
- "description": "Local API that gives AI agents structured page data from Chrome for fast, precise web navigation",
3
+ "version": "1.0.5",
4
+ "description": "Browser automation API for AI agents structured page recon, form filling, clicking, and navigation via Chrome CDP",
5
+ "keywords": [
6
+ "ai-agent",
7
+ "browser-automation",
8
+ "chrome-devtools",
9
+ "cdp",
10
+ "web-scraping",
11
+ "automation",
12
+ "langchain",
13
+ "crewai",
14
+ "openai",
15
+ "claude",
16
+ "agent",
17
+ "browser",
18
+ "recon",
19
+ "selenium-alternative",
20
+ "puppeteer-alternative"
21
+ ],
22
+ "repository": {
23
+ "type": "git",
24
+ "url": "https://github.com/AllAboutAI-YT/surfagent"
25
+ },
26
+ "license": "MIT",
5
27
  "main": "dist/api/server.js",
6
28
  "type": "module",
7
29
  "bin": {