npm - browse-agent-cli - Versions diffs - 0.0.1 → 0.0.3 - Mend

browse-agent-cli 0.0.1 → 0.0.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5) hide show

package/package.json +4 -3
package/skills/references/api.md +91 -0
package/skills/references/cli.md +91 -0
package/skills/references/examples.md +212 -0
/package/{skill → skills}/SKILL.md +0 -0

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "browse-agent-cli",
-  "version": "0.0.1",
+  "version": "0.0.3",
   "type": "module",
   "description": "TypeScript CLI for browse-agent",
   "main": "./dist/cli.js",
@@ -21,12 +21,13 @@
   "files": [
     "dist/**/*.js",
     "dist/**/*.d.ts",
-    "skill/**"
+    "skills/**/*"
   ],
   "scripts": {
     "build": "tsdown",
     "clean": "rm -rf dist",
-    "prepublishOnly": "npm run clean && npm run build"
+    "sync-skill": "mkdir -p skills/references/ && cp -r skill/references/* skills/references/ && cp -r skill/SKILL.md skills/",
+    "prepublishOnly": "npm run clean && npm run build && npm run sync-skill"
   },
   "dependencies": {
     "browse-agent-sdk": "^0.0.2"

package/skills/references/api.md ADDED Viewed

@@ -0,0 +1,91 @@
+# API Reference
+## Agent Methods
+| Method | Description | Result |
+|--------|-------------|--------------------------|
+| `navigate(url, opts?)` | Open URL in new tab. `opts: { waitForLoad?, timeout? }` | `{ tabId, url, title }` |
+| `getContent(opts?)` | Get page content. `opts: { format: 'html'\|'text', tabId? }` | `{ content, url, title }` |
+| `getDOM(selector, opts?)` | Query DOM. `opts: { property?: 'outerHTML'\|'innerHTML'\|'innerText', all?, tabId? }` | `{ result }` |
+| `evaluate(expression, tabId?)` | Run JS expression, return value. | `{ result }` |
+| `injectScript(code, tabId?)` | Execute JS code block. | `{ success }` |
+| `injectCSS(code, tabId?)` | Inject CSS stylesheet. | `{ success }` |
+| `screenshotVisible(opts?)` | Capture viewport. `opts: { format?, quality?, tabId? }` | `{ data (base64), format, width, height }` |
+| `screenshotFullPage(opts?)` | Capture full page. `opts: { format?, quality?, tabId? }` | `{ data (base64), format, width, height }` |
+| `screenshotArea(clip, opts?)` | Capture region. `clip: { x, y, width, height }` `opts: { format?, quality?, tabId? }` | `{ data (base64), format, width, height }` |
+| `listTabs()` | List all open tabs | `{ tabs: [{ id, url, title, active }] }` |
+| `closeTab(tabId)` | Close a tab | — |
+| `activateTab(tabId)` | Switch to a tab | — |
+All methods return direct result objects (for example `{ result }`, `{ content, url, title }`, `{ tabs }`).
+## Modular Scripts
+All functionality is also available as individual modules for fine-grained control. Import from `browse-agent-cli/script`.
+### Lifecycle Scripts
+| Script | Export | Description |
+|--------|--------|-------------|
+| [launch-browser.mjs](../scripts/launch-browser.mjs) | `launchBrowser(options?)` | Start browser with extension. Returns session object |
+| [connect.mjs](../scripts/connect.mjs) | `connect(options?)` | Connect to a running browser session. Returns agent |
+| [close-browser.mjs](../scripts/close-browser.mjs) | `closeBrowser(agent?)` | Kill browser, stop agent, clean up temp files |
+| [clear.mjs](../scripts/clear.mjs) | `clear(options?)` | Remove installation, dependencies, and session data |
+### Feature Scripts
+| Script | Export | Description |
+|--------|--------|-------------|
+| [navigate.mjs](../scripts/navigate.mjs) | `navigate(agent, url, opts?)` | Open URL → `{ tabId, url, title }` |
+| [get-content.mjs](../scripts/get-content.mjs) | `getContent(agent, opts?)` | Get page HTML/text → `{ content, url, title }` |
+| [get-dom.mjs](../scripts/get-dom.mjs) | `getDOM(agent, selector, opts?)` | Query DOM elements → `{ result }` |
+| [evaluate.mjs](../scripts/evaluate.mjs) | `evaluate(agent, expression, opts?)` | Run JS expression → `{ result }` |
+| [inject-script.mjs](../scripts/inject-script.mjs) | `injectScript(agent, code, opts?)` | Execute JS code → `{ success }` |
+| [inject-css.mjs](../scripts/inject-css.mjs) | `injectCSS(agent, code, opts?)` | Inject CSS → `{ success }` |
+| [screenshot.mjs](../scripts/screenshot.mjs) | `screenshot(agent, mode, opts?)` | Capture screenshot → `{ data, format, width, height }` |
+| [tabs.mjs](../scripts/tabs.mjs) | `listTabs(agent)` / `closeTab(agent, id)` / `activateTab(agent, id)` | Tab management |
+### Step-by-Step Example (Modular)
+```javascript
+import { launchBrowser } from 'browse-agent-cli/script';
+import { navigate } from 'browse-agent-cli/script';
+import { getContent } from 'browse-agent-cli/script';
+import { closeBrowser } from 'browse-agent-cli/script';
+// 1. Launch
+const session = await launchBrowser({ browser: 'chrome' });
+const agent = session._agent;
+await agent.waitForConnection(30000);
+try {
+  // 2. Browse
+  await navigate(agent, 'https://example.com');
+  const page = await getContent(agent, { format: 'text' });
+  console.log(JSON.stringify(page, null, 2));
+} finally {
+  // 3. Clean up
+  await closeBrowser(agent);
+}
+```
+All modules are also re-exported from [browse.mjs](../scripts/browse.mjs) for convenience:
+```javascript
+import { launchBrowser, navigate, getContent, screenshot, closeBrowser } from 'browse-agent-cli/script';
+```
+## Browse Options
+Options passed as second argument to `browse()` or via env vars:
+| Option / Env Var | Values | Default | Description |
+|---|---|---|---|
+| `browser` / `BROWSER` | `chrome`, `chromium`, `edge`, `brave` | `chrome` | Browser to launch |
+| `headless` / `HEADLESS` | `true`, `false` | `false` | Run without visible window |
+| `useUserProfile` / `USE_USER_PROFILE` | `true`, `false` | `false` | Use default browser profile (keeps login sessions, cookies). ⚠ Close the browser first! |
+| `port` / `BROWSE_AGENT_PORT` | number | `9315` | WebSocket port |
+| `timeout` / `CONNECTION_TIMEOUT` | ms | `30000` | Connection timeout |
+| `secret` / `SHARED_SECRET` | string | `''` | Optional shared secret. Empty uses no-secret handshake |
+| `printResult` | `true`, `false` | `true` | Print callback return value to stdout as JSON |
+| — / `CHROME_PATH` | path | — | Custom browser executable path |

package/skills/references/cli.md ADDED Viewed

@@ -0,0 +1,91 @@
+# CLI Reference
+The [CLI](../cli.mjs) provides a command-line interface to all browse-agent functionality:
+```bash
+browse-agent <command> [options]
+```
+## Lifecycle Commands
+```bash
+# Setup (install SDK + extension)
+browse-agent setup
+browse-agent setup --global
+# Launch browser
+browse-agent launch
+browse-agent launch --browser edge --headless
+# Check connection
+browse-agent connect
+# Close browser
+browse-agent close
+# Remove installation
+browse-agent clear
+browse-agent clear --global
+```
+## Feature Commands
+Feature commands connect to an existing browser session (started with `launch`):
+```bash
+# Navigate
+browse-agent navigate https://example.com
+# Get content
+browse-agent get-content --format text
+# Query DOM
+browse-agent get-dom "h1" --property innerText
+browse-agent get-dom ".item" --property innerHTML --all
+# Evaluate JS
+browse-agent evaluate "document.title"
+# Inject script/CSS
+browse-agent inject-script "document.body.style.background = 'red'"
+browse-agent inject-css "body { background: #f0f8ff }"
+# Screenshot
+browse-agent screenshot visible
+browse-agent screenshot fullPage --format png
+# Tab management
+browse-agent tabs list
+browse-agent tabs activate 123
+browse-agent tabs close 123
+```
+## Options
+| Option | Applies to | Description |
+|---|---|---|
+| `--global` | setup, clear | Use global installation (`~/.browse-agent/`) |
+| `--browser <name>` | launch | Browser: `chrome` \| `chromium` \| `edge` \| `brave` |
+| `--headless` | launch | Run in headless mode |
+| `--port <number>` | launch, connect, feature cmds | WebSocket port (default: 9315) |
+| `--tabId <id>` | all feature cmds | Target a specific tab (ID from `navigate` or `tabs list`) |
+| `--format <type>` | get-content, screenshot | Content format (`text`/`html`) or screenshot format (`png`/`jpeg`) |
+| `--property <prop>` | get-dom | DOM property: `outerHTML` \| `innerHTML` \| `innerText` |
+| `--all` | get-dom | Return all DOM matches |
+| `--quality <num>` | screenshot | JPEG quality 1-100 |
+| `--timeout <ms>` | connect, feature cmds | Connection timeout in ms |
+| `--help`, `-h` | all | Show help |
+## Cleanup
+Remove the browse-agent installation with the [clear script](../scripts/clear.mjs):
+```bash
+# Remove local installation (project .browse-agent/ + uninstall SDK)
+browse-agent clear
+```
+The clear script will:
+1. Kill any running browser session
+2. Remove the `.browse-agent/` directory (extension, profile, session data)
+3. Uninstall `browse-agent-sdk` (local mode only — global SDK is inside the removed directory)

package/skills/references/examples.md ADDED Viewed

@@ -0,0 +1,212 @@
+# Examples & Troubleshooting
+## Step-by-Step CLI Examples (Recommended)
+### Extract Text from a Page
+```bash
+# 1. Launch browser
+browse-agent launch 2>/dev/null
+# 2. Navigate to page
+browse-agent navigate "https://example.com" 2>/dev/null
+# 3. Read content — decide next step based on output
+browse-agent get-content --format text 2>/dev/null
+# 4. Done — close browser
+browse-agent close 2>/dev/null
+```
+### Find Specific Content on a Page
+```bash
+browse-agent launch 2>/dev/null
+browse-agent navigate "https://news.ycombinator.com" 2>/dev/null
+# First, read the page to understand structure
+browse-agent get-content --format text 2>/dev/null
+# Then query specific elements based on what you found
+browse-agent get-dom ".titleline > a" --property innerText --all 2>/dev/null
+browse-agent close 2>/dev/null
+```
+### Screenshot and Inspect
+```bash
+browse-agent launch 2>/dev/null
+browse-agent navigate "https://example.com" 2>/dev/null
+# Take a screenshot to see the page visually
+browse-agent screenshot visible 2>/dev/null
+# Run JS to count elements, check state, etc.
+browse-agent evaluate "document.querySelectorAll('a').length" 2>/dev/null
+browse-agent close 2>/dev/null
+```
+### Multi-Page Exploration
+```bash
+browse-agent launch 2>/dev/null
+# Visit first page
+browse-agent navigate "https://example.com" 2>/dev/null
+browse-agent get-content --format text 2>/dev/null
+# Visit second page (based on what you found)
+browse-agent navigate "https://example.org" 2>/dev/null
+browse-agent get-content --format text 2>/dev/null
+# Manage tabs
+browse-agent tabs list 2>/dev/null
+browse-agent tabs close 123 2>/dev/null
+browse-agent close 2>/dev/null
+```
+### Use Logged-in Browser Profile
+```bash
+# Launch with user's default browser profile (preserves cookies/sessions)
+browse-agent launch --browser chrome 2>/dev/null
+# ⚠ Close all Chrome windows first!
+USE_USER_PROFILE=true browse-agent launch 2>/dev/null
+browse-agent navigate "https://github.com/notifications" 2>/dev/null
+browse-agent get-content --format text 2>/dev/null
+browse-agent close 2>/dev/null
+```
+## One-Shot Script Examples
+### Extract Text from a Page
+```javascript
+import { browse } from 'browse-agent-cli/script';
+await browse(async (agent) => {
+  const { tabId } = await agent.navigate('https://example.com');
+  const result = await agent.getContent({ format: 'text', tabId });
+  return { title: result.title, text: result.content };
+});
+```
+### Query DOM Elements
+```javascript
+import { browse } from 'browse-agent-cli/script';
+await browse(async (agent) => {
+  const { tabId } = await agent.navigate('https://news.ycombinator.com');
+  const titles = await agent.getDOM('.titleline > a', {
+    property: 'innerText',
+    all: true,
+    tabId,
+  });
+  return { headlines: titles.result };
+});
+```
+### Run JavaScript on the Page
+```javascript
+import { browse } from 'browse-agent-cli/script';
+await browse(async (agent) => {
+  const { tabId } = await agent.navigate('https://example.com');
+  const count = await agent.evaluate('document.querySelectorAll("a").length', tabId);
+  return { linkCount: count.result };
+});
+```
+### Take a Screenshot
+```javascript
+import { browse } from 'browse-agent-cli/script';
+import { writeFileSync } from 'fs';
+await browse(async (agent) => {
+  const { tabId } = await agent.navigate('https://example.com');
+  const shot = await agent.screenshotVisible({ format: 'png', tabId });
+  writeFileSync('screenshot.png', Buffer.from(shot.data, 'base64'));
+  return { saved: 'screenshot.png', width: shot.width, height: shot.height };
+});
+```
+### Multi-Page Data Collection
+```javascript
+import { browse } from 'browse-agent-cli/script';
+await browse(async (agent) => {
+  const urls = ['https://example.com', 'https://example.org'];
+  const results = [];
+  for (const url of urls) {
+    const { tabId } = await agent.navigate(url);
+    const page = await agent.getContent({ format: 'text', tabId });
+    results.push({ url: page.url, title: page.title, content: page.content });
+    const tabs = await agent.listTabs();
+    const tab = tabs.tabs.find(t => t.url === url);
+    if (tab) await agent.closeTab(tab.id);
+  }
+  return results;
+});
+```
+### Access Logged-in Content (User Profile)
+```javascript
+import { browse } from 'browse-agent-cli/script';
+await browse(async (agent) => {
+  const { tabId } = await agent.navigate('https://github.com/notifications');
+  const content = await agent.getContent({ format: 'text', tabId });
+  return { title: content.title, content: content.content };
+}, { useUserProfile: true });
+```
+### Override Browser Options via Env
+```bash
+# Use user's default Chrome profile (keeps login sessions)
+USE_USER_PROFILE=true node _browse_task.mjs 2>/dev/null
+# Use Edge with user profile
+BROWSER=edge USE_USER_PROFILE=true node _browse_task.mjs 2>/dev/null
+# Custom executable path
+CHROME_PATH=/path/to/browser node _browse_task.mjs 2>/dev/null
+```
+### Options via Code
+```javascript
+// Use Edge with user profile
+await browse(async (agent) => { /* ... */ }, {
+  browser: 'edge',
+  useUserProfile: true,
+});
+// Return result without printing JSON to stdout
+const data = await browse(async (agent) => {
+  await agent.navigate('https://example.com');
+  return { title: 'ok' };
+}, {
+  printResult: false,
+});
+```
+## Troubleshooting
+| Problem | Solution |
+|---------|----------|
+| Browser not found | Set `CHROME_PATH` env var to the executable path, or use `BROWSER=edge\|chromium\|brave` |
+| Profile locked | Close all existing browser windows first when using `USE_USER_PROFILE=true` |
+| Connection timeout (30s) | Ensure port 9315 is free. Kill stale Chrome: `pkill -f "user-data-dir=.*browse-agent"` |
+| Extension not loading | Verify `.browse-agent/extension/manifest.json` exists. Re-run setup script |
+| CSP blocks script injection | Use `evaluate()` instead — it uses CDP to bypass Content Security Policy |
+| Stale Chrome profile | Delete `.browse-agent/chrome-profile/` and retry |

/package/{skill → skills}/SKILL.md RENAMED Viewed

File without changes