npm - surfagent - Versions diffs - 1.1.1 → 1.1.2 - Mend

surfagent 1.1.1 → 1.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13) hide show

package/AGENT.md ADDED Viewed

@@ -0,0 +1,200 @@
+## IMPORTANT: Never kill Chrome
+`surfagent` launches a **separate Chrome window** with its own profile. The user's personal Chrome stays untouched. NEVER run `pkill Chrome`, `killall Chrome`, or any command that kills Chrome processes. If you need to restart the debug session, use `surfagent start` — it will launch a new one without affecting anything.
+---
+## Critical Rule: Always Close Unused Tabs
+After completing any task, close tabs you no longer need:
+```bash
+curl -X POST localhost:3456/click -H 'Content-Type: application/json' -d '{"tab":"0","text":"close"}'
+# Or via CLI:
+node dist/browser.js close all
+```
+---
+## Setup
+### Start everything (one command)
+```bash
+surfagent start
+```
+This will:
+1. Check if Chrome debug session is already running on port 9222
+2. If not, launch a **new Chrome window** with a separate profile (`/tmp/surfagent-chrome`)
+3. Copy cookies from the user's default Chrome profile (preserves logins)
+4. Start the API server on `http://localhost:3456`
+The user's personal Chrome browser is NOT affected. A second Chrome window will appear — this is the debug session that the API controls.
+### Other commands
+```bash
+surfagent start     # Chrome + API (recommended)
+surfagent chrome    # Launch Chrome debug session only
+surfagent api       # Start API only (Chrome must already be running)
+surfagent health    # Check if Chrome and API are running
+```
+---
+## API Endpoints (preferred for agents)
+The API is the primary interface. Always **recon first, then act**.
+See `API.md` for full documentation with examples.
+```bash
+# Recon a page — get full map of elements, forms, selectors
+curl -X POST localhost:3456/recon -H 'Content-Type: application/json' -d '{"url":"https://example.com","keepTab":true}'
+# Read page content — structured text, tables, notifications
+curl -X POST localhost:3456/read -H 'Content-Type: application/json' -d '{"tab":"0"}'
+# Fill form fields — real CDP keystrokes
+curl -X POST localhost:3456/fill -H 'Content-Type: application/json' -d '{"tab":"0","fields":[{"selector":"#email","value":"test@example.com"}],"submit":"enter"}'
+# Click an element
+curl -X POST localhost:3456/click -H 'Content-Type: application/json' -d '{"tab":"0","text":"Submit"}'
+# Scroll
+curl -X POST localhost:3456/scroll -H 'Content-Type: application/json' -d '{"tab":"0","direction":"down","amount":1000}'
+# Navigate (same tab)
+curl -X POST localhost:3456/navigate -H 'Content-Type: application/json' -d '{"tab":"0","url":"https://example.com"}'
+# Go back
+curl -X POST localhost:3456/navigate -H 'Content-Type: application/json' -d '{"tab":"0","back":true}'
+# Run JavaScript in a tab or iframe
+curl -X POST localhost:3456/eval -H 'Content-Type: application/json' -d '{"tab":"0","expression":"document.title"}'
+# Bring tab to front
+curl -X POST localhost:3456/focus -H 'Content-Type: application/json' -d '{"tab":"0"}'
+# Raw key typing — no clear step, for Google Sheets / contenteditable / canvas
+curl -X POST localhost:3456/type -H 'Content-Type: application/json' -d '{"tab":"0","keys":"Hello","submit":"tab"}'
+# Captcha detection and interaction (experimental)
+curl -X POST localhost:3456/captcha -H 'Content-Type: application/json' -d '{"tab":"0","action":"detect"}'
+# List tabs
+curl localhost:3456/tabs
+# Health check
+curl localhost:3456/health
+```
+### Google Sheets
+Google Sheets requires `/type` instead of `/fill` for cell input (because `/fill` does Ctrl+A which selects all cells). Use the name box to navigate, then `/type` to enter data:
+```bash
+# 1. Click the name box
+curl -X POST localhost:3456/click -H 'Content-Type: application/json' -d '{"tab":"sheets","selector":"#t-name-box"}'
+# 2. Navigate to a cell
+curl -X POST localhost:3456/fill -H 'Content-Type: application/json' -d '{"tab":"sheets","fields":[{"selector":"#t-name-box","value":"A1","clear":true}],"submit":"enter"}'
+# 3. Type into the cell (Tab moves right, Enter moves down)
+curl -X POST localhost:3456/type -H 'Content-Type: application/json' -d '{"tab":"sheets","keys":"=SUM(B2:B10)","submit":"tab"}'
+```
+Some Sheets buttons (Add Sheet +, toolbar) only respond to CDP mouse events, not DOM clicks. See `API.md` for the CDP mouse click pattern.
+**Warning:** Navigating away from unsaved Sheets triggers a native Chrome "Leave page?" dialog that blocks ALL CDP commands. See `API.md` > "Native Chrome Dialogs" for detection and dismissal via Swift AX API.
+### Tab Targeting
+All endpoints accept a `tab` field:
+- `"0"` — by index
+- `"github"` — partial URL/title match
+- `"cdpn.io"` — matches cross-origin iframes too
+---
+## CLI Commands
+The CLI is useful for quick manual operations and debugging.
+```bash
+node dist/browser.js list                    # List all open tabs
+node dist/browser.js content <tab>           # Get text content from a tab
+node dist/browser.js content <tab> -s "sel"  # Get text from specific CSS selector
+node dist/browser.js content-all             # Get content from all tabs
+node dist/browser.js elements <tab>          # List interactive elements
+node dist/browser.js click <tab> <text>      # Click element by text
+node dist/browser.js type <tab> <text>       # Type into input field
+node dist/browser.js open <url>              # Navigate to URL
+node dist/browser.js open <url> --new-tab    # Open in new tab
+node dist/browser.js screenshot <tab> -o f   # Take screenshot
+node dist/browser.js search <query>          # Search across all tabs
+node dist/browser.js html <tab>              # Get full HTML of page
+node dist/browser.js html <tab> -s "sel"     # Get HTML of specific element
+node dist/browser.js desc <tab>              # Get item description from JSON-LD/meta
+node dist/browser.js close <tab>             # Close a specific tab
+node dist/browser.js close all               # Close all tabs except first
+```
+---
+## Architecture
+```
+surfagent start
+    │
+    ├── Launches Chrome (separate window, separate profile)
+    │   └── --remote-debugging-port=9222
+    │       --user-data-dir=/tmp/surfagent-chrome
+    │
+    └── Starts API Server (:3456)
+        │
+        ├── src/api/recon.ts    (page reconnaissance)
+        ├── src/api/act.ts      (fill, click, scroll, read, navigate, eval, captcha)
+        └── src/api/server.ts   (HTTP routing)
+             │
+             └── src/chrome/    (CDP connection layer)
+                  │
+                  ▼
+             Chrome (:9222)     ← separate window, user's Chrome untouched
+```
+---
+## Troubleshooting
+**"Cannot connect to Chrome"**
+- Run `surfagent start` — it handles everything
+- If Chrome is already running with debug mode, use `surfagent api` to just start the API
+**A second Chrome window appeared**
+- This is expected. `surfagent` runs its own Chrome with a separate profile. Your personal Chrome is not affected. Close the surfagent Chrome window when you're done.
+**Elements not found**
+- Always `/recon` first to see what's available
+- Try partial text matching with `/click`
+- Some elements are `role="option"` or `li` with `aria-label` — use selector from recon
+**Form fields not filling**
+- Use the API `/fill` endpoint — it uses real CDP keystrokes
+- For SPAs, use `"submit": "enter"` instead of clicking submit buttons
+**Links opening new tabs instead of navigating**
+- The API `/click` handles `target="_blank"` automatically
+- It overrides the target and navigates in the same tab
+**Cross-origin iframes**
+- Target them by their domain: `"tab": "cdpn.io"`
+- CDP connects to iframes as separate targets
+**Tab hidden behind other tabs**
+- Use `/focus` to bring it to front
+- `/navigate` does this automatically
+**Too many tabs open**
+- Use `close all` to close all but first tab
+- Always clean up after tasks

package/SURFAGENT_BUG_REPORT.md ADDED Viewed

@@ -0,0 +1,199 @@
+# Surfagent v1.1.0 — QA Bug Report
+**Date:** 2026-04-13
+**Tester:** Claude (automated QA)
+**Environment:** macOS Darwin 24.5.0, Chrome via CDP port 9222, API port 3500
+**Scope:** All API endpoints, CLI commands, error handling, edge cases (single-tab browsing)
+---
+## BUGS
+### BUG-1: `/click` response `clicked` field returns empty text [Medium]
+**Steps:** Click any element by text or selector on any page.
+**Expected:** `clicked` field should contain the element tag + visible text (e.g. `"A: Show HN"`).
+**Actual:** Returns empty text like `"A: "`, `"TEXTAREA: "` on many elements.
+**Root cause:** In `act.ts:219`, the click handler uses `el.innerText || el.value` but many elements (links with nested spans, textareas before user input) return empty `innerText`. The code also doesn't check `el.textContent` or `el.getAttribute('aria-label')` as fallbacks.
+**File:** `src/api/act.ts`, lines 218-219
+**Suggested fix:** Use the same `getText()` helper from `recon.ts` which checks `innerText`, `textContent`, `value`, `aria-label`, `title`, `alt`, and `name`.
+---
+### BUG-2: HTML entities in tab titles not decoded [Low]
+**Steps:** Open a page with `&` in the title (e.g. BBC). Call `GET /tabs` or CLI `list`.
+**Expected:** Title shows `&` (decoded).
+**Actual:** Title shows `&amp;` (HTML entity).
+**Example:** `"BBC Home - Breaking News, World News, US News, Sports, Business, Innovation, Climate, Culture, Travel, Video &amp; Audio"`
+**Root cause:** Chrome CDP returns HTML-encoded titles for some pages. `tabs.ts` passes them through without decoding.
+**File:** `src/chrome/tabs.ts`
+**Suggested fix:** Decode HTML entities in title after receiving from CDP (e.g. replace `&amp;` -> `&`, `&lt;` -> `<`, etc.).
+---
+### BUG-3: `reconUrl` title discrepancy vs `reconTab` [Low]
+**Steps:** Call `/recon` with `{"url":"https://httpbin.org/html", "keepTab":true}`. Then call `/recon` with `{"tab":"httpbin"}`.
+**Expected:** Both return the same title.
+**Actual:**
+- `reconUrl` returns `title: ""` (uses `document.title` which is empty)
+- `reconTab` returns `title: "httpbin.org/html"` (uses CDP target title, which Chrome auto-generates)
+**Root cause:** `reconUrl` reads title via `document.title` in JS. `reconTab` reads from CDP target metadata. When a page has no `<title>` tag, these diverge.
+**File:** `src/api/recon.ts`, lines 345-348 vs 434
+**Suggested fix:** In `reconUrl`, fall back to CDP target title when `document.title` is empty.
+---
+### BUG-4: Windows platform: `getChromePath()` uses Unix-only `test -f` [Medium]
+**Steps:** Run `surfagent start` on Windows.
+**Expected:** Chrome is found at standard Windows paths.
+**Actual:** `execSync('test -f "..."')` always fails on Windows (no `test` command in cmd.exe).
+**File:** `src/cli.ts`, lines 56-58
+**Suggested fix:** Use `fs.existsSync()` instead of `execSync('test -f ...')` for cross-platform compatibility.
+---
+### BUG-5: `-h` flag conflict in CLI — `--host` shadows `--help` [Medium]
+**Steps:** Run `node dist/browser.js -h`
+**Expected:** Shows help output.
+**Actual:** Error: `option '-h, --host <string>' argument missing`
+**Root cause:** Commander.js reserves `-h` for `--help` by default, but the program overrides it with `-h, --host`.
+**File:** `src/browser.ts`, line 27
+**Suggested fix:** Change host shorthand to `-H` or `--host` only (no shorthand). Keep `-h` for help.
+---
+### BUG-6: CLI version mismatch [Low]
+**Steps:** Run `node dist/browser.js --version`
+**Expected:** Shows `1.1.0` (matching package.json).
+**Actual:** Shows `1.0.0`.
+**File:** `src/browser.ts`, line 20 — `.version('1.0.0')` should be `1.1.0`
+**Suggested fix:** Read version from package.json or update hardcoded string.
+---
+### BUG-7: CLI program name is "browser" not "surfagent" [Low]
+**Steps:** Run `node dist/browser.js --help`
+**Expected:** `Usage: surfagent [options] [command]`
+**Actual:** `Usage: browser [options] [command]`
+**File:** `src/browser.ts`, line 19 — `.name('browser')` should be `.name('surfagent')`
+---
+### BUG-8: Help text references wrong repo URL [Low]
+**Steps:** Run `npx surfagent help`
+**Actual output:** `Full API docs: https://github.com/AllAboutAI-YT/webpilot#readme`
+**Expected:** Should reference `https://github.com/AllAboutAI-YT/surfagent#readme`
+**File:** `src/cli.ts`, line 133
+---
+### BUG-9: 404 error message lists only 3 of 12+ endpoints [Low]
+**Steps:** Hit any unknown endpoint (e.g. `GET /notreal`).
+**Actual:** `"Not found. Endpoints: POST /recon, GET /tabs, GET /health"`
+**Expected:** Should list all endpoints or link to docs.
+**File:** `src/api/server.ts`, line 211
+**Suggested fix:** List all endpoints or say `"Not found. See GET /health for status or docs at <url>"`
+---
+### BUG-10: Error messages echo back full user input (info leak / response bloat) [Low]
+**Steps:** Send a request with an extremely long tab pattern (e.g. 10,000 chars).
+**Actual:** Error response includes the full 10K string: `"Tab not found: AAAA...AAAA"`
+**Expected:** Error should truncate the echoed input.
+**File:** `src/api/act.ts`, line 25
+**Suggested fix:** Truncate `tabPattern` in error messages to ~100 chars.
+---
+## IMPROVEMENTS
+### IMP-1: No `/close` API endpoint [High]
+The CLI has a `close` command but there is no HTTP API equivalent. Agents using the API cannot close tabs without using the CLI. The CLAUDE.md docs suggest using `/click` with text "close" which is unreliable.
+**Suggested:** Add `POST /close` endpoint with `{"tab": "...", "all": false}`.
+---
+### IMP-2: `/fill` returns HTTP 200 even when all fields fail [Medium]
+When every field in a `/fill` request fails (e.g. all selectors not found), the response is HTTP 200 with `success: false` in each field. An agent checking only HTTP status would think the operation succeeded.
+**Suggested:** Return HTTP 207 (Multi-Status) or at minimum add a top-level `"allSucceeded": false` field.
+---
+### IMP-3: No timeout parameter on `/click` with `waitAfter` [Low]
+`/click` supports `waitAfter` but caps it at 10 seconds silently. There's no feedback that the wait was capped.
+**Suggested:** Return `"waitCapped": true` in response when `waitAfter` exceeds 10s.
+---
+### IMP-4: `/scroll` contentPreview unreliable on pages with fixed/sticky sidebars [Low]
+The scroll content preview skips `position: fixed/sticky` elements via `getComputedStyle` check on `el.closest('nav, aside, ...')`, but many sticky sidebars use `position: sticky` on a non-nav/aside element. Some content may be incorrectly filtered.
+---
+### IMP-5: `/dismiss` has limited cookie banner coverage [Low]
+Tested on BBC — no cookie banner dismissed despite BBC having a consent banner. The consent patterns list is good but may miss banners that use custom markup (not standard `<button>` elements, or that appear inside shadow DOM or iframes).
+---
+### IMP-6: No `POST /screenshot` API endpoint [Medium]
+The CLI has a `screenshot` command but there's no HTTP API equivalent. Screenshots are essential for agent visual verification.
+**Suggested:** Add `POST /screenshot` returning base64-encoded image or saving to a path.
+---
+### IMP-7: `startChrome()` copies cookies but not LocalStorage/IndexedDB [Low]
+Cookies are copied from the default Chrome profile, preserving some logins. But many modern apps use LocalStorage or IndexedDB for auth tokens (e.g. SPAs), which are not copied.
+---
+### IMP-8: No rate limiting or request queuing [Low]
+Concurrent CDP connections to the same tab could cause race conditions. While basic parallel reads worked in testing, concurrent writes (fill + click on same tab) could conflict.
+**Suggested:** Add optional request queuing per tab to serialize mutations.
+---
+## SUMMARY
+| Category | Count | Critical | Medium | Low |
+|----------|-------|----------|--------|-----|
+| Bugs     | 10    | 0        | 3      | 7   |
+| Improvements | 8 | 0        | 3      | 5   |
+**Overall assessment:** The core functionality (recon, navigate, fill, click, scroll, read, eval) works reliably. Error handling is solid across all endpoints with proper HTTP status codes and descriptive messages. The main issues are cosmetic/documentation bugs and missing API parity with the CLI. The most impactful improvements would be adding `/close` and `/screenshot` API endpoints.

package/dist/api/act.js CHANGED Viewed

@@ -20,7 +20,7 @@ async function resolveTab(tabPattern, port, host) {
         }
     }
     if (!tab)
-        throw new Error(`Tab not found: ${tabPattern}`);
+        throw new Error(`Tab not found: ${tabPattern.length > 100 ? tabPattern.substring(0, 100) + '...' : tabPattern}`);
     return tab;
 }
 export async function fillFields(request, options) {

package/dist/api/recon.js CHANGED Viewed

@@ -286,7 +286,17 @@ export async function reconUrl(url, options) {
             expression: 'document.title',
             returnByValue: true
         });
-        const title = titleResult.result.value || '';
+        let title = titleResult.result.value || '';
+        // Fall back to CDP target title when document.title is empty (e.g. pages with no <title> tag)
+        if (!title) {
+            try {
+                const targets = await CDP.List({ port, host });
+                const t = targets.find((t) => t.id === target.id);
+                if (t?.title)
+                    title = t.title;
+            }
+            catch { }
+        }
         // Get current URL (may have redirected)
         const urlResult = await client.Runtime.evaluate({
             expression: 'window.location.href',
@@ -307,7 +317,7 @@ export async function reconUrl(url, options) {
         }
         return {
             url: finalUrl,
-            title,
+            title: title || target.title || '',
             tabId: target.id,
             timestamp: new Date().toISOString(),
             meta: data.meta,

package/dist/api/server.js CHANGED Viewed

@@ -180,7 +180,7 @@ const server = http.createServer(async (req, res) => {
                 return json(res, 503, { status: 'error', cdpConnected: false });
             }
         }
-        json(res, 404, { error: 'Not found. Endpoints: POST /recon, GET /tabs, GET /health' });
+        json(res, 404, { error: 'Not found. Endpoints: POST /recon, /read, /fill, /click, /type, /scroll, /navigate, /eval, /dismiss, /captcha, /focus | GET /tabs, /health' });
     }
     catch (error) {
         const message = error instanceof Error ? error.message : String(error);

package/dist/browser.js CHANGED Viewed

@@ -14,13 +14,13 @@ import { descCommand } from './commands/desc.js';
 import { closeCommand } from './commands/close.js';
 const program = new Command();
 program
-    .name('browser')
-    .description('CLI tool to interact with Chrome tabs via CDP')
-    .version('1.0.0');
+    .name('surfagent')
+    .description('Browser automation CLI for AI agents — interact with Chrome tabs via CDP')
+    .version('1.1.1');
 // Global options
 program
     .option('-p, --port <number>', 'Chrome debugging port', '9222')
-    .option('-h, --host <string>', 'Chrome debugging host', 'localhost');
+    .option('-H, --host <string>', 'Chrome debugging host', 'localhost');
 // List command
 program
     .command('list')

package/dist/cli.js CHANGED Viewed

@@ -1,5 +1,6 @@
 #!/usr/bin/env node
 import { execSync, spawn } from 'node:child_process';
+import fs from 'node:fs';
 import http from 'node:http';
 import path from 'node:path';
 import { fileURLToPath } from 'node:url';
@@ -49,8 +50,8 @@ function getChromePath() {
     };
     for (const p of paths[os] || []) {
         try {
-            execSync(`test -f "${p}"`, { stdio: 'ignore' });
-            return p;
+            if (fs.existsSync(p))
+                return p;
         }
         catch {
             continue;
@@ -120,7 +121,7 @@ Environment variables:
   CHROME_USER_DATA_DIR  Chrome profile directory (default: /tmp/surfagent-chrome)
 After starting, your AI agent can call http://localhost:3456
-Full API docs: https://github.com/AllAboutAI-YT/webpilot#readme
+Full API docs: https://github.com/AllAboutAI-YT/surfagent#readme
 `);
         return;
     }

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "surfagent",
-  "version": "1.1.1",
+  "version": "1.1.2",
   "description": "Browser automation API for AI agents — structured page recon, form filling, clicking, and navigation via Chrome CDP",
   "keywords": [
     "ai-agent",

package/src/api/act.ts CHANGED Viewed

@@ -22,7 +22,7 @@ async function resolveTab(tabPattern: string, port: number, host: string): Promi
     }
   }
-  if (!tab) throw new Error(`Tab not found: ${tabPattern}`);
+  if (!tab) throw new Error(`Tab not found: ${tabPattern.length > 100 ? tabPattern.substring(0, 100) + '...' : tabPattern}`);
   return tab;
 }

package/src/api/recon.ts CHANGED Viewed

@@ -343,7 +343,15 @@ export async function reconUrl(
       expression: 'document.title',
       returnByValue: true
     });
-    const title = titleResult.result.value as string || '';
+    let title = titleResult.result.value as string || '';
+    // Fall back to CDP target title when document.title is empty (e.g. pages with no <title> tag)
+    if (!title) {
+      try {
+        const targets = await CDP.List({ port, host });
+        const t = targets.find((t: any) => t.id === target.id);
+        if (t?.title) title = t.title;
+      } catch {}
+    }
     // Get current URL (may have redirected)
     const urlResult = await client.Runtime.evaluate({
@@ -370,7 +378,7 @@ export async function reconUrl(
     return {
       url: finalUrl,
-      title,
+      title: title || target.title || '',
       tabId: target.id,
       timestamp: new Date().toISOString(),
       meta: data.meta,

package/src/api/server.ts CHANGED Viewed

@@ -208,7 +208,7 @@ const server = http.createServer(async (req, res) => {
       }
     }
-    json(res, 404, { error: 'Not found. Endpoints: POST /recon, GET /tabs, GET /health' });
+    json(res, 404, { error: 'Not found. Endpoints: POST /recon, /read, /fill, /click, /type, /scroll, /navigate, /eval, /dismiss, /captcha, /focus | GET /tabs, /health' });
   } catch (error) {
     const message = error instanceof Error ? error.message : String(error);
     console.error(`[${new Date().toISOString()}] Error:`, message);

package/src/browser.ts CHANGED Viewed

@@ -17,14 +17,14 @@ import { closeCommand } from './commands/close.js';
 const program = new Command();
 program
-  .name('browser')
-  .description('CLI tool to interact with Chrome tabs via CDP')
-  .version('1.0.0');
+  .name('surfagent')
+  .description('Browser automation CLI for AI agents — interact with Chrome tabs via CDP')
+  .version('1.1.1');
 // Global options
 program
   .option('-p, --port <number>', 'Chrome debugging port', '9222')
-  .option('-h, --host <string>', 'Chrome debugging host', 'localhost');
+  .option('-H, --host <string>', 'Chrome debugging host', 'localhost');
 // List command
 program

package/src/cli.ts CHANGED Viewed

@@ -1,6 +1,7 @@
 #!/usr/bin/env node
 import { execSync, spawn } from 'node:child_process';
+import fs from 'node:fs';
 import http from 'node:http';
 import path from 'node:path';
 import { fileURLToPath } from 'node:url';
@@ -54,8 +55,7 @@ function getChromePath(): string | null {
   for (const p of paths[os] || []) {
     try {
-      execSync(`test -f "${p}"`, { stdio: 'ignore' });
-      return p;
+      if (fs.existsSync(p)) return p;
     } catch {
       continue;
     }
@@ -130,7 +130,7 @@ Environment variables:
   CHROME_USER_DATA_DIR  Chrome profile directory (default: /tmp/surfagent-chrome)
 After starting, your AI agent can call http://localhost:3456
-Full API docs: https://github.com/AllAboutAI-YT/webpilot#readme
+Full API docs: https://github.com/AllAboutAI-YT/surfagent#readme
 `);
     return;
   }