tab-agent 0.2.3 → 0.3.1

package/README.md CHANGED
@@ -1,32 +1,45 @@
1
1
  # Tab Agent
2
2
 
3
- **Secure browser control for Claude Code and Codex** — only the tabs you explicitly activate, not your entire browser.
3
+ [![npm version](https://img.shields.io/npm/v/tab-agent.svg)](https://www.npmjs.com/package/tab-agent)
4
+
5
+ **Give Claude, Codex, or any LLM full control of your browser tabs** — securely, with click-to-activate permission.
4
6
 
5
7
  ```
6
8
  ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
7
- Claude/Codex │────▶│ Relay Server │────▶│ Extension │
8
- │◀────│ :9876 │◀────│ (Chrome) │
9
+ Claude Code │────▶│ Relay Server │────▶│ Extension │
10
+ Codex / LLM │◀────│ (background) │◀────│ (Chrome) │
9
11
  └─────────────────┘ └─────────────────┘ └─────────────────┘
10
-
11
-
12
- ┌───────────────────┐
13
- │ Your Active Tab │
14
- 🟢 ON
15
- └───────────────────┘
12
+
13
+
14
+ ┌───────────────────┐
15
+ │ Your Active Tab │
16
+ 🟢 Click to ON
17
+ └───────────────────┘
16
18
  ```
17
19
 
20
+ ## Features
21
+
22
+ - **Full browser control** — navigate, click, type, scroll, screenshot, run JavaScript
23
+ - **Uses your login sessions** — access authenticated sites (GitHub, Gmail, X) without sharing credentials
24
+ - **Runs in background** — relay server starts automatically, works while you do other things
25
+ - **Click-to-activate security** — only tabs you explicitly enable, your other tabs stay private
26
+ - **AI-optimized snapshots** — pages converted to readable text with element refs `[e1]`, `[e2]`
27
+ - **Works with any LLM** — Claude Code, Codex, or any tool that can run shell commands
28
+
29
+ ---
30
+
18
31
  ## Quick Start
19
32
 
20
33
  ```bash
21
34
  # 1. Clone and load extension
22
35
  git clone https://github.com/DrHB/tab-agent
23
- # → Open chrome://extensions → Enable Developer mode → Load unpacked → select extension/
36
+ # → Chrome: chrome://extensions → Developer mode → Load unpacked → select extension/
24
37
 
25
38
  # 2. Setup (auto-detects everything)
26
39
  npx tab-agent setup
27
40
 
28
- # 3. Use it
29
- # → Click Tab Agent icon on a tab (turns green)
41
+ # 3. Activate a tab & go!
42
+ # → Click Tab Agent icon on any tab (turns green = active)
30
43
  # → Ask Claude: "Use tab-agent to search Google for 'hello world'"
31
44
  ```
32
45
 
@@ -42,9 +55,8 @@ npx tab-agent setup
42
55
  | **Visibility** | Green badge = active | Hidden/background |
43
56
  | **Sessions** | Uses your cookies | Requires re-login |
44
57
  | **Credentials** | Never shared | Often required |
45
- | **Audit** | Full action logging | Varies |
46
58
 
47
- **Click-to-activate model:** Your banking, email, and sensitive tabs stay completely isolated. You always see exactly which tabs AI can control.
59
+ **Click-to-activate model:** Your banking, email, and sensitive tabs stay completely isolated. You always see exactly which tabs the LLM can control.
48
60
 
49
61
  ### 🍪 Works With Your Login Sessions
50
62
 
@@ -55,9 +67,9 @@ Because Tab Agent runs as a Chrome extension:
55
67
  - **Works with SSO and 2FA** — enterprise apps, protected accounts
56
68
  - **No credential sharing** — your passwords stay in your browser
57
69
 
58
- ### 🤖 AI-Optimized
70
+ ### 🤖 LLM-Optimized
59
71
 
60
- - **Semantic snapshots** — pages converted to AI-readable text with refs `[e1]`, `[e2]`
72
+ - **Semantic snapshots** — pages converted to readable text with refs `[e1]`, `[e2]`
61
73
  - **Screenshot fallback** — for complex dynamic pages
62
74
  - **Simple targeting** — click/type using refs instead of fragile CSS selectors
63
75
 
@@ -111,7 +123,7 @@ This automatically:
111
123
 
112
124
  1. Navigate to any webpage
113
125
  2. **Click the Tab Agent icon** — it turns green (🟢 ON)
114
- 3. Ask your AI to interact with the page
126
+ 3. Ask your LLM to interact with the page
115
127
 
116
128
  ---
117
129
 
@@ -122,37 +134,43 @@ This automatically:
122
134
  |---------|-------------|
123
135
  | `tabs` | List all activated tabs |
124
136
  | `navigate` | Go to a URL |
125
- | `snapshot` | Get AI-readable page with element refs |
137
+ | `snapshot` | Get page with element refs |
126
138
  | `screenshot` | Capture viewport image |
127
- | `screenshot fullPage` | Capture entire page |
139
+ | `screenshot --full` | Capture entire page |
128
140
 
129
141
  ### Interaction
130
142
  | Command | Description |
131
143
  |---------|-------------|
132
144
  | `click` | Click element by ref |
133
145
  | `type` | Type text into element |
134
- | `type ... submit` | Type and press Enter |
135
146
  | `fill` | Fill a form field |
136
- | `batchfill` | Fill multiple fields at once |
137
147
  | `press` | Press a key (Enter, Escape, Tab, Arrows) |
138
148
 
139
149
  ### Page Control
140
150
  | Command | Description |
141
151
  |---------|-------------|
142
152
  | `scroll` | Scroll up/down by amount |
143
- | `scrollintoview` | Scroll element into view |
144
153
  | `wait` | Wait for text or element to appear |
145
154
  | `evaluate` | Run JavaScript in page context |
146
- | `dialog` | Handle alert/confirm/prompt |
147
155
 
148
156
  ---
149
157
 
150
- ## CLI Reference
158
+ ## CLI Usage
151
159
 
152
160
  ```bash
153
- npx tab-agent setup # Initial configuration
154
- npx tab-agent status # Check if everything works
155
- npx tab-agent start # Start relay server manually
161
+ # Setup & Status
162
+ npx tab-agent setup # Initial configuration
163
+ npx tab-agent status # Check if everything works
164
+ npx tab-agent start # Start relay server manually
165
+
166
+ # Browser Commands
167
+ npx tab-agent tabs # List active tabs
168
+ npx tab-agent snapshot # Get page content with refs
169
+ npx tab-agent screenshot # Capture viewport
170
+ npx tab-agent screenshot --full # Capture full page
171
+ npx tab-agent click e5 # Click element
172
+ npx tab-agent type e3 "hello" # Type text
173
+ npx tab-agent navigate "https://..." # Go to URL
156
174
  ```
157
175
 
158
176
  ---
@@ -189,17 +207,17 @@ Setup automatically detects your browser.
189
207
 
190
208
  1. **Chrome Extension** — Runs in your browser with access to activated tabs and your session cookies
191
209
 
192
- 2. **Relay Server** — Local WebSocket server (port 9876) that bridges AI ↔ Extension via Chrome's Native Messaging API
210
+ 2. **Relay Server** — Local WebSocket server that bridges LLM ↔ Extension via Chrome's Native Messaging API (runs in background)
193
211
 
194
- 3. **Skill File** — Tells Claude/Codex how to send commands to the relay
212
+ 3. **Skill File** — Tells Claude/Codex how to send commands
195
213
 
196
214
  **Data flow:**
197
215
  ```
198
216
  You: "Search Google for cats"
199
217
 
200
- Claude/Codex → WebSocket command → Relay Server → Native Messaging → Extension → DOM action
218
+ LLM → CLI command → Relay Server → Native Messaging → Extension → Browser action
201
219
 
202
- Results ← WebSocket response ← Relay Server ← Native Messaging ← Page snapshot
220
+ Results ← Response ← Relay Server ← Native Messaging ← Page snapshot
203
221
  ```
204
222
 
205
223
  ---
@@ -210,4 +228,4 @@ MIT
210
228
 
211
229
  ---
212
230
 
213
- **Made for [Claude Code](https://claude.ai/code) and [Codex](https://openai.com/codex)**
231
+ **Works with [Claude Code](https://claude.ai/code), [Codex](https://openai.com/codex), and any LLM that can run shell commands.**
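Reviewer note: the README describes snapshots as readable text carrying element refs such as `[e1]`, `[e2]`, which later commands (`click e5`, `type e3 "hello"`) take as arguments. A minimal sketch of how a calling tool might collect those refs from snapshot output follows. The sample snapshot lines and the `extractRefs` helper are assumptions made for illustration; the diff does not show the real snapshot layout.

```javascript
// Sketch: pull element refs such as [e1], [e2] out of snapshot text.
// The snapshot line format used in `sample` is assumed, not taken from the package.
function extractRefs(snapshotText) {
  const refs = [];
  const pattern = /\[(e\d+)\]/g;
  for (const match of snapshotText.matchAll(pattern)) {
    // Keep first-seen order and skip duplicates.
    if (!refs.includes(match[1])) refs.push(match[1]);
  }
  return refs;
}

const sample = '[e1] textbox "Search"\n[e2] button "Google Search"\n[e1] (repeated)';
console.log(extractRefs(sample)); // [ 'e1', 'e2' ]
```

Because refs reset on each snapshot (see the skill notes later in this diff), a caller would likely re-run this after every `snapshot` rather than cache the list.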
package/bin/tab-agent.js CHANGED
@@ -1,40 +1,64 @@
1
1
  #!/usr/bin/env node
2
2
  const command = process.argv[2];
3
- const pkg = require('../package.json');
3
+
4
+ // Commands that go to the command module
5
+ const BROWSER_COMMANDS = ['tabs', 'snapshot', 'screenshot', 'click', 'type', 'fill', 'press', 'scroll', 'navigate', 'wait', 'evaluate'];
6
+
7
+ if (command === '-v' || command === '--version') {
8
+ console.log(require('../package.json').version);
9
+ process.exit(0);
10
+ }
11
+
12
+ if (BROWSER_COMMANDS.includes(command)) {
13
+ const { runCommand } = require('../cli/command.js');
14
+ runCommand(process.argv.slice(2));
15
+ } else {
16
+ switch (command) {
17
+ case 'setup':
18
+ require('../cli/setup.js');
19
+ break;
20
+ case 'start':
21
+ require('../cli/start.js');
22
+ break;
23
+ case 'status':
24
+ require('../cli/status.js');
25
+ break;
26
+ default:
27
+ showHelp();
28
+ }
29
+ }
4
30
 
5
31
  function showHelp() {
6
32
  console.log(`
7
33
  tab-agent - Browser control for Claude/Codex
8
34
 
9
- Commands:
10
- setup Auto-detect extension, register native host, install skills
11
- start Start the relay server
12
- status Check configuration status
35
+ Setup Commands:
36
+ setup Auto-detect extension, register native host, install skills
37
+ start Start the relay server
38
+ status Check configuration status
13
39
 
14
- Usage:
40
+ Browser Commands:
41
+ tabs List active tabs
42
+ snapshot Get AI-readable page content
43
+ screenshot [--full] Capture screenshot
44
+ click <ref> Click element (e.g., click e5)
45
+ type <ref> <text> Type text into element
46
+ fill <ref> <value> Fill form field
47
+ press <key> Press key (Enter, Escape, etc.)
48
+ scroll <dir> [amount] Scroll up/down
49
+ navigate <url> Go to URL
50
+ wait <text|selector> Wait for text or element
51
+ evaluate <script> Run JavaScript
52
+
53
+ Examples:
15
54
  npx tab-agent setup
16
- npx tab-agent start
17
- `);
18
- }
55
+ npx tab-agent snapshot
56
+ npx tab-agent click e5
57
+ npx tab-agent type e3 "hello world"
19
58
 
20
- switch (command) {
21
- case 'setup':
22
- require('../cli/setup.js');
23
- break;
24
- case 'start':
25
- require('../cli/start.js');
26
- break;
27
- case 'status':
28
- require('../cli/status.js');
29
- break;
30
- case '-v':
31
- case '--version':
32
- console.log(pkg.version);
33
- break;
34
- case undefined:
35
- showHelp();
36
- break;
37
- default:
38
- showHelp();
59
+ Version: ${require('../package.json').version}
60
+ `);
61
+ if (command && command !== 'help' && command !== '--help' && command !== '-h') {
39
62
  process.exit(1);
63
+ }
40
64
  }
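Reviewer note: the rewritten entry point now routes in three stages (version flag, browser commands, setup commands) and falls back to help, exiting with status 1 only for unrecognized commands. The dispatch rules can be sketched as a pure function, which makes the exit-code behavior easier to check. `classify` and its return shape are hypothetical names used here for illustration; the real file calls `require` and `process.exit` directly.

```javascript
// Sketch of the routing in bin/tab-agent.js (0.3.1) as a pure function.
// `classify` and the { target, exitCode } shape are illustrative only.
const BROWSER_COMMANDS = ['tabs', 'snapshot', 'screenshot', 'click', 'type', 'fill',
  'press', 'scroll', 'navigate', 'wait', 'evaluate'];
const SETUP_COMMANDS = ['setup', 'start', 'status'];
const HELP_WORDS = [undefined, 'help', '--help', '-h'];

function classify(command) {
  if (command === '-v' || command === '--version') {
    return { target: 'version', exitCode: 0 };
  }
  if (BROWSER_COMMANDS.includes(command)) {
    // Exit code is decided later by cli/command.js from the relay response.
    return { target: 'cli/command.js', exitCode: null };
  }
  if (SETUP_COMMANDS.includes(command)) {
    return { target: `cli/${command}.js`, exitCode: null };
  }
  // Help is printed for everything else; only unknown commands fail.
  return { target: 'help', exitCode: HELP_WORDS.includes(command) ? 0 : 1 };
}

console.log(classify('click'));   // { target: 'cli/command.js', exitCode: null }
console.log(classify('bogus'));   // { target: 'help', exitCode: 1 }
console.log(classify(undefined)); // { target: 'help', exitCode: 0 }
```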
package/cli/command.js ADDED
@@ -0,0 +1,200 @@
1
+ // cli/command.js
2
+ const WebSocket = require('ws');
3
+
4
+ const COMMANDS = ['tabs', 'snapshot', 'screenshot', 'click', 'type', 'fill', 'press', 'scroll', 'navigate', 'wait', 'evaluate'];
5
+
6
+ async function runCommand(args) {
7
+ const [command, ...params] = args;
8
+
9
+ if (!command || command === 'help') {
10
+ printHelp();
11
+ return;
12
+ }
13
+
14
+ if (!COMMANDS.includes(command)) {
15
+ console.error(`Unknown command: ${command}`);
16
+ console.error(`Available: ${COMMANDS.join(', ')}`);
17
+ process.exit(1);
18
+ }
19
+
20
+ const ws = new WebSocket('ws://localhost:9876');
21
+
22
+ const timeout = setTimeout(() => {
23
+ console.error('Connection timeout - is the relay running? Try: npx tab-agent start');
24
+ ws.close();
25
+ process.exit(1);
26
+ }, 5000);
27
+
28
+ ws.on('error', (err) => {
29
+ clearTimeout(timeout);
30
+ console.error('Connection failed:', err.message);
31
+ console.error('Make sure relay is running: npx tab-agent start');
32
+ process.exit(1);
33
+ });
34
+
35
+ ws.on('open', () => {
36
+ clearTimeout(timeout);
37
+
38
+ // First get tabs to find tabId
39
+ if (command === 'tabs') {
40
+ ws.send(JSON.stringify({ id: 1, action: 'tabs' }));
41
+ } else {
42
+ // Get active tab first, then run command
43
+ ws.send(JSON.stringify({ id: 0, action: 'tabs' }));
44
+ }
45
+ });
46
+
47
+ ws.on('message', (data) => {
48
+ const msg = JSON.parse(data);
49
+
50
+ // Handle tabs response
51
+ if (msg.id === 0) {
52
+ if (!msg.tabs || msg.tabs.length === 0) {
53
+ console.error('No active tabs. Click Tab Agent icon on a tab to activate it.');
54
+ ws.close();
55
+ process.exit(1);
56
+ }
57
+
58
+ const tabId = msg.tabs[0].tabId;
59
+ const payload = buildPayload(command, params, tabId);
60
+ ws.send(JSON.stringify({ id: 1, ...payload }));
61
+ return;
62
+ }
63
+
64
+ // Handle command response
65
+ if (msg.id === 1) {
66
+ if (command === 'tabs') {
67
+ printTabs(msg);
68
+ } else if (command === 'snapshot') {
69
+ printSnapshot(msg);
70
+ } else if (command === 'screenshot') {
71
+ printScreenshot(msg);
72
+ } else {
73
+ printResult(msg);
74
+ }
75
+ ws.close();
76
+ process.exit(msg.ok ? 0 : 1);
77
+ }
78
+ });
79
+ }
80
+
81
+ function buildPayload(command, params, tabId) {
82
+ const payload = { action: command, tabId };
83
+
84
+ switch (command) {
85
+ case 'click':
86
+ payload.ref = params[0];
87
+ break;
88
+ case 'type':
89
+ payload.ref = params[0];
90
+ payload.text = params.slice(1).join(' ');
91
+ if (params.includes('--submit')) {
92
+ payload.submit = true;
93
+ payload.text = payload.text.replace('--submit', '').trim();
94
+ }
95
+ break;
96
+ case 'fill':
97
+ payload.ref = params[0];
98
+ payload.value = params.slice(1).join(' ');
99
+ break;
100
+ case 'press':
101
+ payload.key = params[0];
102
+ break;
103
+ case 'scroll':
104
+ payload.direction = params[0] || 'down';
105
+ payload.amount = parseInt(params[1]) || 500;
106
+ break;
107
+ case 'navigate':
108
+ payload.url = params[0];
109
+ break;
110
+ case 'wait':
111
+ if (params[0]?.startsWith('.') || params[0]?.startsWith('#')) {
112
+ payload.selector = params[0];
113
+ } else {
114
+ payload.text = params.join(' ');
115
+ }
116
+ payload.timeout = parseInt(params.find(p => /^\d+$/.test(p))) || 5000;
117
+ break;
118
+ case 'evaluate':
119
+ payload.script = params.join(' ');
120
+ break;
121
+ case 'screenshot':
122
+ if (params.includes('--full') || params.includes('--fullPage')) {
123
+ payload.fullPage = true;
124
+ }
125
+ break;
126
+ }
127
+
128
+ return payload;
129
+ }
130
+
131
+ function printHelp() {
132
+ console.log(`
133
+ tab-agent - Browser control commands
134
+
135
+ Usage: npx tab-agent <command> [options]
136
+
137
+ Commands:
138
+ tabs List active tabs
139
+ snapshot Get AI-readable page content
140
+ screenshot [--full] Capture screenshot (--full for full page)
141
+ click <ref> Click element (e.g., click e5)
142
+ type <ref> <text> Type text into element
143
+ fill <ref> <value> Fill form field
144
+ press <key> Press key (Enter, Escape, Tab, etc.)
145
+ scroll <dir> [amount] Scroll up/down (default: 500px)
146
+ navigate <url> Go to URL
147
+ wait <text|selector> Wait for text or element
148
+ evaluate <script> Run JavaScript
149
+
150
+ Examples:
151
+ npx tab-agent tabs
152
+ npx tab-agent snapshot
153
+ npx tab-agent click e5
154
+ npx tab-agent type e3 hello world
155
+ npx tab-agent navigate https://google.com
156
+ npx tab-agent screenshot --full
157
+ `);
158
+ }
159
+
160
+ function printTabs(msg) {
161
+ if (!msg.ok) {
162
+ console.error('Error:', msg.error);
163
+ return;
164
+ }
165
+ console.log('Active tabs:\n');
166
+ msg.tabs.forEach((tab, i) => {
167
+ console.log(` ${i + 1}. [${tab.tabId}] ${tab.title}`);
168
+ console.log(` ${tab.url}\n`);
169
+ });
170
+ }
171
+
172
+ function printSnapshot(msg) {
173
+ if (!msg.ok) {
174
+ console.error('Error:', msg.error);
175
+ return;
176
+ }
177
+ console.log(msg.snapshot);
178
+ }
179
+
180
+ function printScreenshot(msg) {
181
+ if (!msg.ok) {
182
+ console.error('Error:', msg.error);
183
+ return;
184
+ }
185
+ // Output base64 directly - no file, no auto-open
186
+ console.log(msg.screenshot);
187
+ }
188
+
189
+ function printResult(msg) {
190
+ if (!msg.ok) {
191
+ console.error('Error:', msg.error);
192
+ return;
193
+ }
194
+ console.log('OK');
195
+ if (msg.result !== undefined) {
196
+ console.log('Result:', msg.result);
197
+ }
198
+ }
199
+
200
+ module.exports = { runCommand };
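Reviewer note: two branches of `buildPayload` have argument handling that is easy to misread. The sketch below copies the `type` and `wait` logic from the diff into standalone functions (`parseType` and `parseWait` are names introduced here, not part of the package) so the resulting payloads can be inspected. As far as the diff shows, a numeric token passed to `wait` is used as the timeout and also remains in the text being waited for, and a selector is only recognized when it starts with `.` or `#`.

```javascript
// Mirrors the `type` branch of buildPayload in cli/command.js.
function parseType(params) {
  const payload = { ref: params[0], text: params.slice(1).join(' ') };
  if (params.includes('--submit')) {
    payload.submit = true;
    payload.text = payload.text.replace('--submit', '').trim();
  }
  return payload;
}

// Mirrors the `wait` branch of buildPayload in cli/command.js.
function parseWait(params) {
  const payload = {};
  if (params[0]?.startsWith('.') || params[0]?.startsWith('#')) {
    payload.selector = params[0];
  } else {
    payload.text = params.join(' ');
  }
  // First all-digit token becomes the timeout; default is 5000 ms.
  payload.timeout = parseInt(params.find((p) => /^\d+$/.test(p)), 10) || 5000;
  return payload;
}

console.log(parseType(['e3', 'hello', 'world', '--submit']));
// { ref: 'e3', text: 'hello world', submit: true }
console.log(parseWait(['Loading', 'complete', '8000']));
// { text: 'Loading complete 8000', timeout: 8000 }
console.log(parseWait(['.results']));
// { selector: '.results', timeout: 5000 }
```

If the numeric token is meant purely as a timeout, a follow-up change could filter it out of `text`; that intent is not stated in the diff, so it is left as an open question.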
@@ -191,32 +191,23 @@ async function routeCommand(tabId, command) {
191
191
  // Full page screenshot using chrome.debugger
192
192
  if (fullPage) {
193
193
  try {
194
- await chrome.debugger.attach({ tabId }, '1.3');
194
+ // Try to detach first in case previous attempt left debugger attached
195
+ try { await chrome.debugger.detach({ tabId }); } catch {}
195
196
 
196
- const { result: layout } = await chrome.debugger.sendCommand(
197
- { tabId },
198
- 'Page.getLayoutMetrics'
199
- );
197
+ await chrome.debugger.attach({ tabId }, '1.3');
200
198
 
201
- const { data } = await chrome.debugger.sendCommand(
199
+ const screenshot = await chrome.debugger.sendCommand(
202
200
  { tabId },
203
201
  'Page.captureScreenshot',
204
202
  {
205
203
  format: 'png',
206
- captureBeyondViewport: true,
207
- clip: {
208
- x: 0,
209
- y: 0,
210
- width: layout.contentSize.width,
211
- height: layout.contentSize.height,
212
- scale: 1
213
- }
204
+ captureBeyondViewport: true
214
205
  }
215
206
  );
216
207
 
217
208
  await chrome.debugger.detach({ tabId });
218
209
  audit('screenshot', { tabId, fullPage: true }, { ok: true });
219
- return { ok: true, screenshot: 'data:image/png;base64,' + data, format: 'png' };
210
+ return { ok: true, screenshot: 'data:image/png;base64,' + screenshot.data, format: 'png' };
220
211
  } catch (error) {
221
212
  try { await chrome.debugger.detach({ tabId }); } catch {}
222
213
  const result = { ok: false, error: error.message };
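Reviewer note: this hunk adds a defensive `detach` before `attach`, so a debugger session left over from an earlier failed attempt does not make the next full-page capture fail, and it drops the explicit `clip` in favor of `captureBeyondViewport: true` alone. The pattern can be sketched against an injected debugger object. `captureFullPage` and `makeFakeDebugger` are hypothetical helpers for illustration, and the sketch uses `finally` for cleanup, which is a variation on the diff (the diff detaches separately in the success and error paths).

```javascript
// Sketch of the detach-then-attach pattern; `dbg` stands in for chrome.debugger.
async function captureFullPage(dbg, tabId) {
  const target = { tabId };
  // A stale session from an earlier failure would make attach() reject.
  try { await dbg.detach(target); } catch {}
  try {
    await dbg.attach(target, '1.3');
    const shot = await dbg.sendCommand(target, 'Page.captureScreenshot', {
      format: 'png',
      captureBeyondViewport: true,
    });
    return { ok: true, screenshot: 'data:image/png;base64,' + shot.data, format: 'png' };
  } catch (error) {
    return { ok: false, error: error.message };
  } finally {
    // Always release the debugger so the next call starts clean.
    try { await dbg.detach(target); } catch {}
  }
}

// Hypothetical in-memory stand-in for chrome.debugger, used only to exercise the flow.
function makeFakeDebugger() {
  let attached = false;
  const calls = [];
  return {
    calls,
    async attach() {
      if (attached) throw new Error('already attached');
      attached = true;
      calls.push('attach');
    },
    async detach() {
      if (!attached) throw new Error('not attached');
      attached = false;
      calls.push('detach');
    },
    async sendCommand(_target, method) {
      calls.push(method);
      return { data: 'AAAA' };
    },
  };
}

const fake = makeFakeDebugger();
captureFullPage(fake, 42).then((result) => {
  console.log(result.ok, fake.calls.join(' > '));
  // true attach > Page.captureScreenshot > detach
});
```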
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "tab-agent",
3
- "version": "0.2.3",
3
+ "version": "0.3.1",
4
4
  "description": "Browser control for Claude Code and Codex via WebSocket",
5
5
  "bin": {
6
6
  "tab-agent": "./bin/tab-agent.js"
@@ -1,11 +1,11 @@
1
1
  ---
2
2
  name: tab-agent
3
- description: Browser control via WebSocket - snapshot, click, type, fill, evaluate, screenshot
3
+ description: Browser control via CLI - snapshot, click, type, fill, screenshot
4
4
  ---
5
5
 
6
6
  # Tab Agent
7
7
 
8
- WebSocket `ws://localhost:9876`. User activates tabs via extension icon (green = active).
8
+ Control browser tabs via CLI. User activates tabs via extension icon (green = active).
9
9
 
10
10
  ## Before First Command
11
11
 
@@ -16,38 +16,44 @@ sleep 2
16
16
 
17
17
  ## Commands
18
18
 
19
- ```json
20
- {"id": 1, "action": "tabs"} // list active tabs
21
- {"id": 2, "action": "snapshot", "tabId": ID} // get page with refs [e1], [e2]...
22
- {"id": 3, "action": "screenshot", "tabId": ID} // viewport screenshot
23
- {"id": 4, "action": "screenshot", "tabId": ID, "fullPage": true} // full page screenshot
24
- {"id": 5, "action": "click", "tabId": ID, "ref": "e1"} // click element
25
- {"id": 6, "action": "fill", "tabId": ID, "ref": "e1", "value": "text"} // fill input
26
- {"id": 7, "action": "type", "tabId": ID, "ref": "e1", "text": "hello"} // type text
27
- {"id": 8, "action": "type", "tabId": ID, "ref": "e1", "text": "query", "submit": true} // type and Enter
28
- {"id": 9, "action": "press", "tabId": ID, "key": "Enter"} // press key
29
- {"id": 10, "action": "scroll", "tabId": ID, "direction": "down", "amount": 500}
30
- {"id": 11, "action": "scrollintoview", "tabId": ID, "ref": "e1"} // scroll element visible
31
- {"id": 12, "action": "navigate", "tabId": ID, "url": "https://..."}
32
- {"id": 13, "action": "wait", "tabId": ID, "text": "Loading complete"} // wait for text
33
- {"id": 14, "action": "wait", "tabId": ID, "selector": ".results", "timeout": 5000} // wait for element
34
- {"id": 15, "action": "evaluate", "tabId": ID, "script": "document.title"} // run JavaScript
35
- {"id": 16, "action": "batchfill", "tabId": ID, "fields": [{"ref": "e1", "value": "a"}, {"ref": "e2", "value": "b"}]}
36
- {"id": 17, "action": "dialog", "tabId": ID, "accept": true} // handle alert/confirm
19
+ ```bash
20
+ npx tab-agent tabs # List active tabs
21
+ npx tab-agent snapshot # Get page with refs [e1], [e2]...
22
+ npx tab-agent screenshot # Capture viewport
23
+ npx tab-agent screenshot --full # Capture full page
24
+ npx tab-agent click <ref> # Click element
25
+ npx tab-agent type <ref> <text> # Type text
26
+ npx tab-agent fill <ref> <value> # Fill form field
27
+ npx tab-agent press <key> # Press key (Enter, Escape, Tab)
28
+ npx tab-agent scroll <dir> [amount] # Scroll up/down
29
+ npx tab-agent navigate <url> # Go to URL
30
+ npx tab-agent wait <text|selector> # Wait for condition
31
+ npx tab-agent evaluate <script> # Run JavaScript
37
32
  ```
38
33
 
39
34
  ## Usage
40
35
 
41
- 1. `tabs` -> get tabId of active tab
36
+ 1. `tabs` -> find active tab
42
37
  2. `snapshot` -> read page, get element refs [e1], [e2]...
43
- 3. `click`/`fill`/`type` using refs
44
- 4. If snapshot incomplete (complex page) -> `screenshot` and analyze visually
38
+ 3. `click`/`type`/`fill` using refs
39
+ 4. If snapshot incomplete -> `screenshot` and analyze visually
40
+
41
+ ## Examples
42
+
43
+ ```bash
44
+ # Search Google
45
+ npx tab-agent navigate "https://google.com"
46
+ npx tab-agent snapshot
47
+ npx tab-agent type e1 "hello world"
48
+ npx tab-agent press Enter
49
+
50
+ # Read page content
51
+ npx tab-agent snapshot
52
+ npx tab-agent screenshot --full
53
+ ```
45
54
 
46
55
  ## Notes
47
56
 
48
- - Screenshot returns `{"screenshot": "data:image/png;base64,..."}` - analyze directly
49
- - Snapshot refs reset on each call - always snapshot before interacting
57
+ - Screenshot saves to /tmp/ and opens automatically
58
+ - Refs reset on each snapshot - always snapshot before interacting
50
59
  - Keys: Enter, Escape, Tab, Backspace, ArrowUp/Down/Left/Right
51
- - `type` with `submit: true` presses Enter after typing (for search boxes)
52
- - `evaluate` runs in page context - can access page variables/functions
53
- - `dialog` handles alert/confirm/prompt - debugger bar appears when attached
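Reviewer note: the new skill note says the screenshot is saved to /tmp/ and opened automatically, while `printScreenshot` in `cli/command.js` from this same diff prints the `data:image/png;base64,...` string to stdout and its comment says "no file, no auto-open". The two appear to disagree. If the stdout behavior is the one that ships, a caller that wants a file could decode the string itself. The sketch below assumes the data URL prefix shown in the extension hunk; `dataUrlToBuffer`, `saveScreenshot`, and the default file name are hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Decode a PNG data URL (as returned by the extension) into raw bytes.
function dataUrlToBuffer(dataUrl) {
  const match = /^data:image\/png;base64,([A-Za-z0-9+/=]+)$/.exec(dataUrl.trim());
  if (!match) throw new Error('Expected a data:image/png;base64,... string');
  return Buffer.from(match[1], 'base64');
}

// Write the decoded image into the OS temp directory and return its path.
function saveScreenshot(dataUrl, fileName = 'tab-agent-screenshot.png') {
  const filePath = path.join(os.tmpdir(), fileName);
  fs.writeFileSync(filePath, dataUrlToBuffer(dataUrl));
  return filePath;
}

console.log(dataUrlToBuffer('data:image/png;base64,aGk=').toString()); // hi
```

In practice the input would come from the stdout of `npx tab-agent screenshot`, for example captured with `child_process.execFileSync`.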
@@ -1,11 +1,11 @@
1
1
  ---
2
2
  name: tab-agent
3
- description: Browser control via WebSocket
3
+ description: Browser control via CLI
4
4
  ---
5
5
 
6
6
  # Tab Agent
7
7
 
8
- `ws://localhost:9876` - User activates tabs via extension (green = active)
8
+ CLI browser control. User activates tabs via extension (green = active).
9
9
 
10
10
  ## Start Relay
11
11
 
@@ -15,26 +15,20 @@ curl -s http://localhost:9876/health || (npx tab-agent start &)
15
15
 
16
16
  ## Commands
17
17
 
18
- ```
19
- tabs -> list active tabs
20
- snapshot tabId -> page with refs [e1], [e2]...
21
- screenshot tabId -> viewport screenshot (base64 PNG)
22
- screenshot tabId fullPage=true -> full page screenshot
23
- click tabId ref -> click element
24
- fill tabId ref value -> fill input
25
- type tabId ref text -> type text
26
- type tabId ref text submit=true -> type and press Enter
27
- press tabId key -> Enter/Escape/Tab/Arrow*
28
- scroll tabId direction amount -> scroll page
29
- scrollintoview tabId ref -> scroll element visible
30
- navigate tabId url -> go to URL
31
- wait tabId text="..." -> wait for text
32
- wait tabId selector="..." timeout=ms -> wait for element
33
- evaluate tabId script="..." -> run JavaScript
34
- batchfill tabId fields=[...] -> fill multiple fields
35
- dialog tabId accept=true -> handle alert/confirm
18
+ ```bash
19
+ tabs # List active tabs
20
+ snapshot # Page with refs [e1], [e2]...
21
+ screenshot [--full] # Capture viewport/full page
22
+ click <ref> # Click element
23
+ type <ref> <text> # Type text
24
+ fill <ref> <value> # Fill form field
25
+ press <key> # Enter/Escape/Tab/Arrow*
26
+ scroll <dir> [amount] # Scroll up/down
27
+ navigate <url> # Go to URL
28
+ wait <text|selector> # Wait for condition
29
+ evaluate <script> # Run JavaScript
36
30
  ```
37
31
 
38
32
  ## Flow
39
33
 
40
- `tabs` -> `snapshot` -> `click`/`fill` -> repeat. Use `screenshot` if snapshot incomplete.
34
+ `snapshot` -> `click`/`type` -> repeat. Use `screenshot` if snapshot incomplete.