browse-agent-cli 0.0.1 → 0.0.3

package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "browse-agent-cli",
3
- "version": "0.0.1",
3
+ "version": "0.0.3",
4
4
  "type": "module",
5
5
  "description": "TypeScript CLI for browse-agent",
6
6
  "main": "./dist/cli.js",
@@ -21,12 +21,13 @@
21
21
  "files": [
22
22
  "dist/**/*.js",
23
23
  "dist/**/*.d.ts",
24
- "skill/**"
24
+ "skills/**/*"
25
25
  ],
26
26
  "scripts": {
27
27
  "build": "tsdown",
28
28
  "clean": "rm -rf dist",
29
- "prepublishOnly": "npm run clean && npm run build"
29
+ "sync-skill": "mkdir -p skills/references/ && cp -r skill/references/* skills/references/ && cp -r skill/SKILL.md skills/",
30
+ "prepublishOnly": "npm run clean && npm run build && npm run sync-skill"
30
31
  },
31
32
  "dependencies": {
32
33
  "browse-agent-sdk": "^0.0.2"
@@ -0,0 +1,91 @@
1
+ # API Reference
2
+
3
+ ## Agent Methods
4
+
5
+ | Method | Description | Result |
6
+ |--------|-------------|--------------------------|
7
+ | `navigate(url, opts?)` | Open URL in new tab. `opts: { waitForLoad?, timeout? }` | `{ tabId, url, title }` |
8
+ | `getContent(opts?)` | Get page content. `opts: { format: 'html'\|'text', tabId? }` | `{ content, url, title }` |
9
+ | `getDOM(selector, opts?)` | Query DOM. `opts: { property?: 'outerHTML'\|'innerHTML'\|'innerText', all?, tabId? }` | `{ result }` |
10
+ | `evaluate(expression, tabId?)` | Run JS expression, return value. | `{ result }` |
11
+ | `injectScript(code, tabId?)` | Execute JS code block. | `{ success }` |
12
+ | `injectCSS(code, tabId?)` | Inject CSS stylesheet. | `{ success }` |
13
+ | `screenshotVisible(opts?)` | Capture viewport. `opts: { format?, quality?, tabId? }` | `{ data (base64), format, width, height }` |
14
+ | `screenshotFullPage(opts?)` | Capture full page. `opts: { format?, quality?, tabId? }` | `{ data (base64), format, width, height }` |
15
+ | `screenshotArea(clip, opts?)` | Capture region. `clip: { x, y, width, height }` `opts: { format?, quality?, tabId? }` | `{ data (base64), format, width, height }` |
16
+ | `listTabs()` | List all open tabs | `{ tabs: [{ id, url, title, active }] }` |
17
+ | `closeTab(tabId)` | Close a tab | — |
18
+ | `activateTab(tabId)` | Switch to a tab | — |
19
+
20
+ All methods return direct result objects (for example `{ result }`, `{ content, url, title }`, `{ tabs }`).
21
+
22
+ ## Modular Scripts
23
+
24
+ All functionality is also available as individual modules for fine-grained control. Import from `browse-agent-cli/script`.
25
+
26
+ ### Lifecycle Scripts
27
+
28
+ | Script | Export | Description |
29
+ |--------|--------|-------------|
30
+ | [launch-browser.mjs](../scripts/launch-browser.mjs) | `launchBrowser(options?)` | Start browser with extension. Returns session object |
31
+ | [connect.mjs](../scripts/connect.mjs) | `connect(options?)` | Connect to a running browser session. Returns agent |
32
+ | [close-browser.mjs](../scripts/close-browser.mjs) | `closeBrowser(agent?)` | Kill browser, stop agent, clean up temp files |
33
+ | [clear.mjs](../scripts/clear.mjs) | `clear(options?)` | Remove installation, dependencies, and session data |
34
+
35
+ ### Feature Scripts
36
+
37
+ | Script | Export | Description |
38
+ |--------|--------|-------------|
39
+ | [navigate.mjs](../scripts/navigate.mjs) | `navigate(agent, url, opts?)` | Open URL → `{ tabId, url, title }` |
40
+ | [get-content.mjs](../scripts/get-content.mjs) | `getContent(agent, opts?)` | Get page HTML/text → `{ content, url, title }` |
41
+ | [get-dom.mjs](../scripts/get-dom.mjs) | `getDOM(agent, selector, opts?)` | Query DOM elements → `{ result }` |
42
+ | [evaluate.mjs](../scripts/evaluate.mjs) | `evaluate(agent, expression, opts?)` | Run JS expression → `{ result }` |
43
+ | [inject-script.mjs](../scripts/inject-script.mjs) | `injectScript(agent, code, opts?)` | Execute JS code → `{ success }` |
44
+ | [inject-css.mjs](../scripts/inject-css.mjs) | `injectCSS(agent, code, opts?)` | Inject CSS → `{ success }` |
45
+ | [screenshot.mjs](../scripts/screenshot.mjs) | `screenshot(agent, mode, opts?)` | Capture screenshot → `{ data, format, width, height }` |
46
+ | [tabs.mjs](../scripts/tabs.mjs) | `listTabs(agent)` / `closeTab(agent, id)` / `activateTab(agent, id)` | Tab management |
47
+
48
+ ### Step-by-Step Example (Modular)
49
+
50
+ ```javascript
51
+ import { launchBrowser } from 'browse-agent-cli/script';
52
+ import { navigate } from 'browse-agent-cli/script';
53
+ import { getContent } from 'browse-agent-cli/script';
54
+ import { closeBrowser } from 'browse-agent-cli/script';
55
+
56
+ // 1. Launch
57
+ const session = await launchBrowser({ browser: 'chrome' });
58
+ const agent = session._agent;
59
+ await agent.waitForConnection(30000);
60
+
61
+ try {
62
+ // 2. Browse
63
+ await navigate(agent, 'https://example.com');
64
+ const page = await getContent(agent, { format: 'text' });
65
+ console.log(JSON.stringify(page, null, 2));
66
+ } finally {
67
+ // 3. Clean up
68
+ await closeBrowser(agent);
69
+ }
70
+ ```
71
+
72
+ All modules are also re-exported from [browse.mjs](../scripts/browse.mjs) for convenience:
73
+
74
+ ```javascript
75
+ import { launchBrowser, navigate, getContent, screenshot, closeBrowser } from 'browse-agent-cli/script';
76
+ ```
77
+
78
+ ## Browse Options
79
+
80
+ Options are passed as the second argument to `browse()` or set via environment variables:
81
+
82
+ | Option / Env Var | Values | Default | Description |
83
+ |---|---|---|---|
84
+ | `browser` / `BROWSER` | `chrome`, `chromium`, `edge`, `brave` | `chrome` | Browser to launch |
85
+ | `headless` / `HEADLESS` | `true`, `false` | `false` | Run without visible window |
86
+ | `useUserProfile` / `USE_USER_PROFILE` | `true`, `false` | `false` | Use default browser profile (keeps login sessions, cookies). ⚠ Close the browser first! |
87
+ | `port` / `BROWSE_AGENT_PORT` | number | `9315` | WebSocket port |
88
+ | `timeout` / `CONNECTION_TIMEOUT` | ms | `30000` | Connection timeout |
89
+ | `secret` / `SHARED_SECRET` | string | `''` | Optional shared secret. Empty uses no-secret handshake |
90
+ | `printResult` | `true`, `false` | `true` | Print callback return value to stdout as JSON |
91
+ | — / `CHROME_PATH` | path | — | Custom browser executable path |
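The table above pairs each option with an environment variable and a default. The following is a minimal sketch of how such a merge could work, not the package's actual implementation: `resolveBrowseOptions`, `DEFAULTS`, and `ENV_NAMES` are hypothetical names, and the precedence (explicit option over env var over default) is an assumption that the source does not state.

```javascript
// Hypothetical helper, not part of browse-agent-cli: merges explicit options,
// environment variables, and the documented defaults into one object.
const DEFAULTS = {
  browser: 'chrome',
  headless: false,
  useUserProfile: false,
  port: 9315,
  timeout: 30000,
  secret: '',
  printResult: true,
};

// Maps each option to the env var named in the table above.
const ENV_NAMES = {
  browser: 'BROWSER',
  headless: 'HEADLESS',
  useUserProfile: 'USE_USER_PROFILE',
  port: 'BROWSE_AGENT_PORT',
  timeout: 'CONNECTION_TIMEOUT',
  secret: 'SHARED_SECRET',
};

// Env vars are always strings, so coerce them to the type of the default.
function parseEnvValue(key, raw) {
  if (typeof DEFAULTS[key] === 'boolean') return raw === 'true';
  if (typeof DEFAULTS[key] === 'number') return Number(raw);
  return raw;
}

function resolveBrowseOptions(options = {}, env = {}) {
  const resolved = { ...DEFAULTS };
  for (const [key, envName] of Object.entries(ENV_NAMES)) {
    if (env[envName] !== undefined) {
      resolved[key] = parseEnvValue(key, env[envName]);
    }
  }
  // Assumption: explicit options take precedence over env vars.
  for (const [key, value] of Object.entries(options)) {
    if (value !== undefined) resolved[key] = value;
  }
  return resolved;
}

const resolved = resolveBrowseOptions(
  { headless: true },
  { BROWSER: 'edge', BROWSE_AGENT_PORT: '9400' },
);
console.log(resolved);
```

Passing the environment in as a plain object (rather than reading `process.env` inside the function) keeps the sketch easy to test; a caller would typically pass `process.env`.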
@@ -0,0 +1,91 @@
1
+ # CLI Reference
2
+
3
+ The [CLI](../cli.mjs) provides a command-line interface to all browse-agent functionality:
4
+
5
+ ```bash
6
+ browse-agent <command> [options]
7
+ ```
8
+
9
+ ## Lifecycle Commands
10
+
11
+ ```bash
12
+ # Setup (install SDK + extension)
13
+ browse-agent setup
14
+ browse-agent setup --global
15
+
16
+ # Launch browser
17
+ browse-agent launch
18
+ browse-agent launch --browser edge --headless
19
+
20
+ # Check connection
21
+ browse-agent connect
22
+
23
+ # Close browser
24
+ browse-agent close
25
+
26
+ # Remove installation
27
+ browse-agent clear
28
+ browse-agent clear --global
29
+ ```
30
+
31
+ ## Feature Commands
32
+
33
+ Feature commands connect to an existing browser session (started with `launch`):
34
+
35
+ ```bash
36
+ # Navigate
37
+ browse-agent navigate https://example.com
38
+
39
+ # Get content
40
+ browse-agent get-content --format text
41
+
42
+ # Query DOM
43
+ browse-agent get-dom "h1" --property innerText
44
+ browse-agent get-dom ".item" --property innerHTML --all
45
+
46
+ # Evaluate JS
47
+ browse-agent evaluate "document.title"
48
+
49
+ # Inject script/CSS
50
+ browse-agent inject-script "document.body.style.background = 'red'"
51
+ browse-agent inject-css "body { background: #f0f8ff }"
52
+
53
+ # Screenshot
54
+ browse-agent screenshot visible
55
+ browse-agent screenshot fullPage --format png
56
+
57
+ # Tab management
58
+ browse-agent tabs list
59
+ browse-agent tabs activate 123
60
+ browse-agent tabs close 123
61
+ ```
62
+
63
+ ## Options
64
+
65
+ | Option | Applies to | Description |
66
+ |---|---|---|
67
+ | `--global` | setup, clear | Use global installation (`~/.browse-agent/`) |
68
+ | `--browser <name>` | launch | Browser: `chrome` \| `chromium` \| `edge` \| `brave` |
69
+ | `--headless` | launch | Run in headless mode |
70
+ | `--port <number>` | launch, connect, feature cmds | WebSocket port (default: 9315) |
71
+ | `--tabId <id>` | all feature cmds | Target a specific tab (ID from `navigate` or `tabs list`) |
72
+ | `--format <type>` | get-content, screenshot | Content format (`text`/`html`) or screenshot format (`png`/`jpeg`) |
73
+ | `--property <prop>` | get-dom | DOM property: `outerHTML` \| `innerHTML` \| `innerText` |
74
+ | `--all` | get-dom | Return all DOM matches |
75
+ | `--quality <num>` | screenshot | JPEG quality 1-100 |
76
+ | `--timeout <ms>` | connect, feature cmds | Connection timeout in ms |
77
+ | `--help`, `-h` | all | Show help |
78
+
79
+ ## Cleanup
80
+
81
+ Remove the browse-agent installation with the [clear script](../scripts/clear.mjs):
82
+
83
+ ```bash
84
+ # Remove local installation (project .browse-agent/ + uninstall SDK)
85
+ browse-agent clear
86
+ ```
87
+
88
+ The clear script will:
89
+ 1. Kill any running browser session
90
+ 2. Remove the `.browse-agent/` directory (extension, profile, session data)
91
+ 3. Uninstall `browse-agent-sdk` (local mode only — global SDK is inside the removed directory)
@@ -0,0 +1,212 @@
1
+ # Examples & Troubleshooting
2
+
3
+ ## Step-by-Step CLI Examples (Recommended)
4
+
5
+ ### Extract Text from a Page
6
+
7
+ ```bash
8
+ # 1. Launch browser
9
+ browse-agent launch 2>/dev/null
10
+
11
+ # 2. Navigate to page
12
+ browse-agent navigate "https://example.com" 2>/dev/null
13
+
14
+ # 3. Read content — decide next step based on output
15
+ browse-agent get-content --format text 2>/dev/null
16
+
17
+ # 4. Done — close browser
18
+ browse-agent close 2>/dev/null
19
+ ```
20
+
21
+ ### Find Specific Content on a Page
22
+
23
+ ```bash
24
+ browse-agent launch 2>/dev/null
25
+ browse-agent navigate "https://news.ycombinator.com" 2>/dev/null
26
+
27
+ # First, read the page to understand structure
28
+ browse-agent get-content --format text 2>/dev/null
29
+
30
+ # Then query specific elements based on what you found
31
+ browse-agent get-dom ".titleline > a" --property innerText --all 2>/dev/null
32
+
33
+ browse-agent close 2>/dev/null
34
+ ```
35
+
36
+ ### Screenshot and Inspect
37
+
38
+ ```bash
39
+ browse-agent launch 2>/dev/null
40
+ browse-agent navigate "https://example.com" 2>/dev/null
41
+
42
+ # Take a screenshot to see the page visually
43
+ browse-agent screenshot visible 2>/dev/null
44
+
45
+ # Run JS to count elements, check state, etc.
46
+ browse-agent evaluate "document.querySelectorAll('a').length" 2>/dev/null
47
+
48
+ browse-agent close 2>/dev/null
49
+ ```
50
+
51
+ ### Multi-Page Exploration
52
+
53
+ ```bash
54
+ browse-agent launch 2>/dev/null
55
+
56
+ # Visit first page
57
+ browse-agent navigate "https://example.com" 2>/dev/null
58
+ browse-agent get-content --format text 2>/dev/null
59
+
60
+ # Visit second page (based on what you found)
61
+ browse-agent navigate "https://example.org" 2>/dev/null
62
+ browse-agent get-content --format text 2>/dev/null
63
+
64
+ # Manage tabs
65
+ browse-agent tabs list 2>/dev/null
66
+ browse-agent tabs close 123 2>/dev/null
67
+
68
+ browse-agent close 2>/dev/null
69
+ ```
70
+
71
+ ### Use Logged-in Browser Profile
72
+
73
+ ```bash
74
+ # Launch with the user's default browser profile (preserves cookies/sessions)
75
+ # ⚠ Close all Chrome windows first!
76
+ USE_USER_PROFILE=true browse-agent launch --browser chrome 2>/dev/null
77
+
78
+ # Read a page that requires login
79
+ browse-agent navigate "https://github.com/notifications" 2>/dev/null
80
+ browse-agent get-content --format text 2>/dev/null
81
+ browse-agent close 2>/dev/null
82
+ ```
83
+
84
+ ## One-Shot Script Examples
85
+
86
+ ### Extract Text from a Page
87
+
88
+ ```javascript
89
+ import { browse } from 'browse-agent-cli/script';
90
+
91
+ await browse(async (agent) => {
92
+ const { tabId } = await agent.navigate('https://example.com');
93
+ const result = await agent.getContent({ format: 'text', tabId });
94
+ return { title: result.title, text: result.content };
95
+ });
96
+ ```
97
+
98
+ ### Query DOM Elements
99
+
100
+ ```javascript
101
+ import { browse } from 'browse-agent-cli/script';
102
+
103
+ await browse(async (agent) => {
104
+ const { tabId } = await agent.navigate('https://news.ycombinator.com');
105
+ const titles = await agent.getDOM('.titleline > a', {
106
+ property: 'innerText',
107
+ all: true,
108
+ tabId,
109
+ });
110
+ return { headlines: titles.result };
111
+ });
112
+ ```
113
+
114
+ ### Run JavaScript on the Page
115
+
116
+ ```javascript
117
+ import { browse } from 'browse-agent-cli/script';
118
+
119
+ await browse(async (agent) => {
120
+ const { tabId } = await agent.navigate('https://example.com');
121
+ const count = await agent.evaluate('document.querySelectorAll("a").length', tabId);
122
+ return { linkCount: count.result };
123
+ });
124
+ ```
125
+
126
+ ### Take a Screenshot
127
+
128
+ ```javascript
129
+ import { browse } from 'browse-agent-cli/script';
130
+ import { writeFileSync } from 'fs';
131
+
132
+ await browse(async (agent) => {
133
+ const { tabId } = await agent.navigate('https://example.com');
134
+ const shot = await agent.screenshotVisible({ format: 'png', tabId });
135
+ writeFileSync('screenshot.png', Buffer.from(shot.data, 'base64'));
136
+ return { saved: 'screenshot.png', width: shot.width, height: shot.height };
137
+ });
138
+ ```
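The screenshot methods return `{ data (base64), format, width, height }`, so saving a capture comes down to decoding `data` and choosing a file extension from `format`. Here is a minimal sketch of that step in isolation; `decodeScreenshot` is a hypothetical helper, and the sample object is a stand-in whose payload is only the 8-byte PNG signature rather than a real capture.

```javascript
// Hypothetical helper: converts a screenshot result into raw bytes plus a
// file name derived from the reported format.
function decodeScreenshot(shot, baseName = 'screenshot') {
  const extension = shot.format === 'jpeg' ? 'jpg' : shot.format;
  return {
    fileName: `${baseName}.${extension}`,
    bytes: Buffer.from(shot.data, 'base64'),
  };
}

// Stand-in result: the payload is just the PNG file signature.
const sample = { data: 'iVBORw0KGgo=', format: 'png', width: 0, height: 0 };
const decoded = decodeScreenshot(sample);
console.log(decoded.fileName, decoded.bytes.length);
```

The returned `bytes` buffer can be passed straight to `writeFileSync(decoded.fileName, decoded.bytes)`, as in the example above.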
139
+
140
+ ### Multi-Page Data Collection
141
+
142
+ ```javascript
143
+ import { browse } from 'browse-agent-cli/script';
144
+
145
+ await browse(async (agent) => {
146
+ const urls = ['https://example.com', 'https://example.org'];
147
+ const results = [];
148
+ for (const url of urls) {
149
+ const { tabId } = await agent.navigate(url);
150
+ const page = await agent.getContent({ format: 'text', tabId });
151
+ results.push({ url: page.url, title: page.title, content: page.content });
152
+ // Close the tab via the tabId from navigate(); matching tabs by URL can
153
+ // miss when the browser redirects or normalizes the address.
154
+ await agent.closeTab(tabId);
155
+ }
156
+ return results;
157
+ });
158
+ ```
159
+
160
+ ### Access Logged-in Content (User Profile)
161
+
162
+ ```javascript
163
+ import { browse } from 'browse-agent-cli/script';
164
+
165
+ await browse(async (agent) => {
166
+ const { tabId } = await agent.navigate('https://github.com/notifications');
167
+ const content = await agent.getContent({ format: 'text', tabId });
168
+ return { title: content.title, content: content.content };
169
+ }, { useUserProfile: true });
170
+ ```
171
+
172
+ ### Override Browser Options via Env
173
+
174
+ ```bash
175
+ # Use user's default Chrome profile (keeps login sessions)
176
+ USE_USER_PROFILE=true node _browse_task.mjs 2>/dev/null
177
+
178
+ # Use Edge with user profile
179
+ BROWSER=edge USE_USER_PROFILE=true node _browse_task.mjs 2>/dev/null
180
+
181
+ # Custom executable path
182
+ CHROME_PATH=/path/to/browser node _browse_task.mjs 2>/dev/null
183
+ ```
184
+
185
+ ### Options via Code
186
+
187
+ ```javascript
188
+ // Use Edge with user profile
189
+ await browse(async (agent) => { /* ... */ }, {
190
+ browser: 'edge',
191
+ useUserProfile: true,
192
+ });
193
+
194
+ // Return result without printing JSON to stdout
195
+ const data = await browse(async (agent) => {
196
+ await agent.navigate('https://example.com');
197
+ return { title: 'ok' };
198
+ }, {
199
+ printResult: false,
200
+ });
201
+ ```
202
+
203
+ ## Troubleshooting
204
+
205
+ | Problem | Solution |
206
+ |---------|----------|
207
+ | Browser not found | Set `CHROME_PATH` env var to the executable path, or use `BROWSER=edge\|chromium\|brave` |
208
+ | Profile locked | Close all existing browser windows first when using `USE_USER_PROFILE=true` |
209
+ | Connection timeout (30s) | Ensure port 9315 is free. Kill stale Chrome: `pkill -f "user-data-dir=.*browse-agent"` |
210
+ | Extension not loading | Verify that `.browse-agent/extension/manifest.json` exists. If it is missing, re-run `browse-agent setup` |
211
+ | CSP blocks script injection | Use `evaluate()` instead — it uses CDP to bypass Content Security Policy |
212
+ | Stale Chrome profile | Delete `.browse-agent/chrome-profile/` and retry |
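The CSP row above suggests `evaluate()` in place of `injectScript()`. Because `evaluate()` is documented as taking an expression, a multi-statement snippet would first need to be wrapped into one. The sketch below shows one common way to do that with an immediately invoked arrow function; `asExpression` is a hypothetical helper, and whether `evaluate()` accepts such a wrapped string is an assumption not confirmed by the source.

```javascript
// Hypothetical helper: wraps statements in an immediately invoked arrow
// function so they form a single expression that yields a value.
function asExpression(code) {
  return `(() => {\n${code}\n})()`;
}

const expression = asExpression(`
  const links = [1, 2, 3]; // stand-in for document.querySelectorAll('a')
  return links.length;
`);
console.log(expression);
```

Under that assumption, the resulting string would be passed as the first argument, for example `agent.evaluate(expression, tabId)`.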