tab-agent 0.3.0 → 0.3.2

package/README.md CHANGED
@@ -1,32 +1,45 @@
1
1
  # Tab Agent
2
2
 
3
- **Secure browser control for Claude Code and Codex** — only the tabs you explicitly activate, not your entire browser.
3
+ [![npm version](https://img.shields.io/npm/v/tab-agent.svg)](https://www.npmjs.com/package/tab-agent)
4
+
5
+ **Give Claude, Codex, or any LLM full control of your browser tabs** — securely, with click-to-activate permission.
4
6
 
5
7
  ```
6
8
  ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
7
- Claude/Codex │────▶│ Relay Server │────▶│ Extension │
8
- │◀────│ :9876 │◀────│ (Chrome) │
9
+ Claude Code │────▶│ Relay Server │────▶│ Extension │
10
+ Codex / LLM │◀────│ (background) │◀────│ (Chrome) │
9
11
  └─────────────────┘ └─────────────────┘ └─────────────────┘
10
-
11
-
12
- ┌───────────────────┐
13
- │ Your Active Tab │
14
- 🟢 ON
15
- └───────────────────┘
12
+
13
+
14
+ ┌───────────────────┐
15
+ │ Your Active Tab │
16
+ 🟢 Click to ON
17
+ └───────────────────┘
16
18
  ```
17
19
 
20
+ ## Features
21
+
22
+ - **Full browser control** — navigate, click, type, scroll, screenshot, run JavaScript
23
+ - **Uses your login sessions** — access authenticated sites (GitHub, Gmail, X) without sharing credentials
24
+ - **Runs in background** — relay server starts automatically, works while you do other things
25
+ - **Click-to-activate security** — only tabs you explicitly enable, your other tabs stay private
26
+ - **AI-optimized snapshots** — pages converted to readable text with element refs `[e1]`, `[e2]`
27
+ - **Works with any LLM** — Claude Code, Codex, or any tool that can run shell commands
28
+
29
+ ---
30
+
18
31
  ## Quick Start
19
32
 
20
33
  ```bash
21
34
  # 1. Clone and load extension
22
35
  git clone https://github.com/DrHB/tab-agent
23
- # → Open chrome://extensions → Enable Developer mode → Load unpacked → select extension/
36
+ # → Chrome: chrome://extensions → Developer mode → Load unpacked → select extension/
24
37
 
25
38
  # 2. Setup (auto-detects everything)
26
39
  npx tab-agent setup
27
40
 
28
- # 3. Use it
29
- # → Click Tab Agent icon on a tab (turns green)
41
+ # 3. Activate a tab & go!
42
+ # → Click Tab Agent icon on any tab (turns green = active)
30
43
  # → Ask Claude: "Use tab-agent to search Google for 'hello world'"
31
44
  ```
32
45
 
@@ -42,9 +55,8 @@ npx tab-agent setup
42
55
  | **Visibility** | Green badge = active | Hidden/background |
43
56
  | **Sessions** | Uses your cookies | Requires re-login |
44
57
  | **Credentials** | Never shared | Often required |
45
- | **Audit** | Full action logging | Varies |
46
58
 
47
- **Click-to-activate model:** Your banking, email, and sensitive tabs stay completely isolated. You always see exactly which tabs AI can control.
59
+ **Click-to-activate model:** Your banking, email, and sensitive tabs stay completely isolated. You always see exactly which tabs the LLM can control.
48
60
 
49
61
  ### 🍪 Works With Your Login Sessions
50
62
 
@@ -55,9 +67,9 @@ Because Tab Agent runs as a Chrome extension:
55
67
  - **Works with SSO and 2FA** — enterprise apps, protected accounts
56
68
  - **No credential sharing** — your passwords stay in your browser
57
69
 
58
- ### 🤖 AI-Optimized
70
+ ### 🤖 LLM-Optimized
59
71
 
60
- - **Semantic snapshots** — pages converted to AI-readable text with refs `[e1]`, `[e2]`
72
+ - **Semantic snapshots** — pages converted to readable text with refs `[e1]`, `[e2]`
61
73
  - **Screenshot fallback** — for complex dynamic pages
62
74
  - **Simple targeting** — click/type using refs instead of fragile CSS selectors
63
75
 
@@ -111,7 +123,7 @@ This automatically:
111
123
 
112
124
  1. Navigate to any webpage
113
125
  2. **Click the Tab Agent icon** — it turns green (🟢 ON)
114
- 3. Ask your AI to interact with the page
126
+ 3. Ask your LLM to interact with the page
115
127
 
116
128
  ---
117
129
 
@@ -122,56 +134,43 @@ This automatically:
122
134
  |---------|-------------|
123
135
  | `tabs` | List all activated tabs |
124
136
  | `navigate` | Go to a URL |
125
- | `snapshot` | Get AI-readable page with element refs |
137
+ | `snapshot` | Get page with element refs |
126
138
  | `screenshot` | Capture viewport image |
127
- | `screenshot fullPage` | Capture entire page |
139
+ | `screenshot --full` | Capture entire page |
128
140
 
129
141
  ### Interaction
130
142
  | Command | Description |
131
143
  |---------|-------------|
132
144
  | `click` | Click element by ref |
133
145
  | `type` | Type text into element |
134
- | `type ... submit` | Type and press Enter |
135
146
  | `fill` | Fill a form field |
136
- | `batchfill` | Fill multiple fields at once |
137
147
  | `press` | Press a key (Enter, Escape, Tab, Arrows) |
138
148
 
139
149
  ### Page Control
140
150
  | Command | Description |
141
151
  |---------|-------------|
142
152
  | `scroll` | Scroll up/down by amount |
143
- | `scrollintoview` | Scroll element into view |
144
153
  | `wait` | Wait for text or element to appear |
145
154
  | `evaluate` | Run JavaScript in page context |
146
- | `dialog` | Handle alert/confirm/prompt |
147
155
 
148
- ## CLI Usage
156
+ ---
149
157
 
150
- Run commands directly from terminal:
158
+ ## CLI Usage
151
159
 
152
160
  ```bash
161
+ # Setup & Status
162
+ npx tab-agent setup # Initial configuration
163
+ npx tab-agent status # Check if everything works
164
+ npx tab-agent start # Start relay server manually
165
+
166
+ # Browser Commands
153
167
  npx tab-agent tabs # List active tabs
154
- npx tab-agent snapshot # Get page content
168
+ npx tab-agent snapshot # Get page content with refs
155
169
  npx tab-agent screenshot # Capture viewport
156
170
  npx tab-agent screenshot --full # Capture full page
157
171
  npx tab-agent click e5 # Click element
158
172
  npx tab-agent type e3 "hello" # Type text
159
- npx tab-agent fill e3 "value" # Fill field
160
- npx tab-agent press Enter # Press key
161
- npx tab-agent scroll down 500 # Scroll
162
173
  npx tab-agent navigate "https://..." # Go to URL
163
- npx tab-agent wait "Loading" # Wait for text
164
- npx tab-agent evaluate "document.title" # Run JS
165
- ```
166
-
167
- ---
168
-
169
- ## CLI Reference
170
-
171
- ```bash
172
- npx tab-agent setup # Initial configuration
173
- npx tab-agent status # Check if everything works
174
- npx tab-agent start # Start relay server manually
175
174
  ```
176
175
 
177
176
  ---
@@ -208,17 +207,17 @@ Setup automatically detects your browser.
208
207
 
209
208
  1. **Chrome Extension** — Runs in your browser with access to activated tabs and your session cookies
210
209
 
211
- 2. **Relay Server** — Local WebSocket server (port 9876) that bridges AI ↔ Extension via Chrome's Native Messaging API
210
+ 2. **Relay Server** — Local WebSocket server that bridges LLM ↔ Extension via Chrome's Native Messaging API (runs in background)
212
211
 
213
- 3. **Skill File** — Tells Claude/Codex how to send commands to the relay
212
+ 3. **Skill File** — Tells Claude/Codex how to send commands
214
213
 
215
214
  **Data flow:**
216
215
  ```
217
216
  You: "Search Google for cats"
218
217
 
219
- Claude/Codex → WebSocket command → Relay Server → Native Messaging → Extension → DOM action
218
+ LLM → CLI command → Relay Server → Native Messaging → Extension → Browser action
220
219
 
221
- Results ← WebSocket response ← Relay Server ← Native Messaging ← Page snapshot
220
+ Results ← Response ← Relay Server ← Native Messaging ← Page snapshot
222
221
  ```
223
222
 
224
223
  ---
@@ -229,4 +228,4 @@ MIT
229
228
 
230
229
  ---
231
230
 
232
- **Made for [Claude Code](https://claude.ai/code) and [Codex](https://openai.com/codex)**
231
+ **Works with [Claude Code](https://claude.ai/code), [Codex](https://openai.com/codex), and any LLM that can run shell commands.**
package/cli/command.js CHANGED
@@ -182,16 +182,8 @@ function printScreenshot(msg) {
182
182
  console.error('Error:', msg.error);
183
183
  return;
184
184
  }
185
- // Save to file
186
- const fs = require('fs');
187
- const filename = `/tmp/tab-agent-screenshot-${Date.now()}.png`;
188
- const base64Data = msg.screenshot.replace(/^data:image\/png;base64,/, '');
189
- fs.writeFileSync(filename, base64Data, 'base64');
190
- console.log(`Screenshot saved: ${filename}`);
191
-
192
- // Try to open it
193
- const { exec } = require('child_process');
194
- exec(`open "${filename}"`, () => {});
185
+ // Output base64 directly - no file, no auto-open
186
+ console.log(msg.screenshot);
195
187
  }
196
188
 
197
189
  function printResult(msg) {
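Because `printScreenshot` in 0.3.2 prints `msg.screenshot` to stdout instead of writing a file, a caller that still wants a PNG on disk has to decode the output itself. The following is a minimal sketch, not part of the package: `decodeScreenshot` and `saveScreenshot` are hypothetical helper names, and the sketch assumes the printed string may or may not carry the `data:image/png;base64,` prefix that the removed 0.3.0 code stripped.

```javascript
// Hypothetical helpers; not part of tab-agent.
const fs = require('fs');

// Strip an optional data-URL prefix and decode the remainder as base64.
// Assumption: the CLI output is either a PNG data URL or bare base64.
function decodeScreenshot(output) {
  const base64Data = output.trim().replace(/^data:image\/png;base64,/, '');
  return Buffer.from(base64Data, 'base64');
}

// Write the decoded image to a caller-chosen path and return that path.
function saveScreenshot(output, filename) {
  fs.writeFileSync(filename, decodeScreenshot(output));
  return filename;
}
```

One way to use it would be to capture the stdout of `npx tab-agent screenshot` (for example with Node's `child_process.execFileSync`) and pass the resulting string to `saveScreenshot` together with a path of your choosing.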
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "tab-agent",
3
- "version": "0.3.0",
3
+ "version": "0.3.2",
4
4
  "description": "Browser control for Claude Code and Codex via WebSocket",
5
5
  "bin": {
6
6
  "tab-agent": "./bin/tab-agent.js"
@@ -19,8 +19,6 @@ sleep 2
19
19
  ```bash
20
20
  npx tab-agent tabs # List active tabs
21
21
  npx tab-agent snapshot # Get page with refs [e1], [e2]...
22
- npx tab-agent screenshot # Capture viewport
23
- npx tab-agent screenshot --full # Capture full page
24
22
  npx tab-agent click <ref> # Click element
25
23
  npx tab-agent type <ref> <text> # Type text
26
24
  npx tab-agent fill <ref> <value> # Fill form field
@@ -28,15 +26,19 @@ npx tab-agent press <key> # Press key (Enter, Escape, Tab)
28
26
  npx tab-agent scroll <dir> [amount] # Scroll up/down
29
27
  npx tab-agent navigate <url> # Go to URL
30
28
  npx tab-agent wait <text|selector> # Wait for condition
31
- npx tab-agent evaluate <script> # Run JavaScript
29
+ npx tab-agent screenshot # Capture viewport (fallback only)
30
+ npx tab-agent screenshot --full # Capture full page (fallback only)
32
31
  ```
33
32
 
34
- ## Usage
33
+ ## Workflow
35
34
 
36
- 1. `tabs` -> find active tab
37
- 2. `snapshot` -> read page, get element refs [e1], [e2]...
38
- 3. `click`/`type`/`fill` using refs
39
- 4. If snapshot incomplete -> `screenshot` and analyze visually
35
+ 1. `snapshot` first - always start here to get element refs
36
+ 2. Use refs [e1], [e2]... with `click`/`type`/`fill`
37
+ 3. `snapshot` again after actions to see results
38
+ 4. **Only use `screenshot` if:**
39
+ - Snapshot is missing expected content
40
+ - Page has complex visuals (charts, images, canvas)
41
+ - Debugging why an action didn't work
40
42
 
41
43
  ## Examples
42
44
 
@@ -46,14 +48,15 @@ npx tab-agent navigate "https://google.com"
46
48
  npx tab-agent snapshot
47
49
  npx tab-agent type e1 "hello world"
48
50
  npx tab-agent press Enter
51
+ npx tab-agent snapshot # See results
49
52
 
50
- # Read page content
51
- npx tab-agent snapshot
53
+ # Only screenshot if snapshot doesn't show what you need
52
54
  npx tab-agent screenshot --full
53
55
  ```
54
56
 
55
57
  ## Notes
56
58
 
57
- - Screenshot saves to /tmp/ and opens automatically
58
59
  - Refs reset on each snapshot - always snapshot before interacting
59
60
  - Keys: Enter, Escape, Tab, Backspace, ArrowUp/Down/Left/Right
61
+ - Screenshot outputs base64 to stdout (no file saved)
62
+ - Prefer snapshot over screenshot - it's faster and text-based
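Since the notes say refs reset on each snapshot, a script driving the CLI may want to check which refs the latest snapshot actually contains before it clicks or types. A small sketch follows, assuming only that refs appear in the snapshot text literally as `[e1]`, `[e2]`, and so on, as the command list describes. The rest of the snapshot format is not documented in this diff, and `extractRefs` is a hypothetical name.

```javascript
// Hypothetical helper; not part of tab-agent.
// Collects the distinct element refs (e1, e2, ...) found in snapshot text,
// in order of first appearance.
function extractRefs(snapshotText) {
  const refs = new Set();
  for (const match of snapshotText.matchAll(/\[(e\d+)\]/g)) {
    refs.add(match[1]);
  }
  return [...refs];
}
```

A ref that is missing from the returned list most likely belongs to an older snapshot and should not be reused.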
@@ -16,19 +16,23 @@ curl -s http://localhost:9876/health || (npx tab-agent start &)
16
16
  ## Commands
17
17
 
18
18
  ```bash
19
- tabs # List active tabs
20
- snapshot # Page with refs [e1], [e2]...
21
- screenshot [--full] # Capture viewport/full page
22
- click <ref> # Click element
23
- type <ref> <text> # Type text
24
- fill <ref> <value> # Fill form field
25
- press <key> # Enter/Escape/Tab/Arrow*
26
- scroll <dir> [amount] # Scroll up/down
27
- navigate <url> # Go to URL
28
- wait <text|selector> # Wait for condition
29
- evaluate <script> # Run JavaScript
19
+ npx tab-agent tabs # List active tabs
20
+ npx tab-agent snapshot # Page with refs [e1], [e2]...
21
+ npx tab-agent click <ref> # Click element
22
+ npx tab-agent type <ref> <text> # Type text
23
+ npx tab-agent fill <ref> <val> # Fill form field
24
+ npx tab-agent press <key> # Enter/Escape/Tab/Arrow*
25
+ npx tab-agent scroll <dir> [n] # Scroll up/down
26
+ npx tab-agent navigate <url> # Go to URL
27
+ npx tab-agent wait <text|sel> # Wait for condition
28
+ npx tab-agent screenshot # Fallback only - if snapshot incomplete
30
29
  ```
31
30
 
32
- ## Flow
31
+ ## Workflow
33
32
 
34
- `snapshot` -> `click`/`type` -> repeat. Use `screenshot` if snapshot incomplete.
33
+ 1. Always `snapshot` first - get refs [e1], [e2]...
34
+ 2. `click`/`type`/`fill` using refs
35
+ 3. `snapshot` again to see results
36
+ 4. **Only screenshot if snapshot missing content** (charts, canvas, debugging)
37
+
38
+ Prefer snapshot over screenshot - faster and text-based.
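The workflow above (snapshot, act on a ref, snapshot again) can be sketched as a small wrapper. This is an illustration under stated assumptions rather than anything shipped in the package: `actAndObserve` is a hypothetical name, and `run` is an injected function that takes an argument array such as `['click', 'e5']` and returns the command's text output.

```javascript
// Hypothetical sketch; not part of tab-agent.
// `run(args)` is supplied by the caller and is assumed to return a string.
function actAndObserve(run, action) {
  // Step 1: always snapshot first so the refs are current.
  const before = run(['snapshot']);

  // Step 2: for click/type/fill the ref is the second argument;
  // refuse to act on a ref the current snapshot does not list.
  const ref = action[1];
  if (!before.includes(`[${ref}]`)) {
    throw new Error(`ref ${ref} not found in current snapshot`);
  }
  run(action);

  // Step 3: snapshot again to see the result of the action.
  return run(['snapshot']);
}
```

In real use, `run` might wrap `child_process.execFileSync('npx', ['tab-agent', ...args], { encoding: 'utf8' })`. Injecting it keeps the sketch testable without a browser.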