tab-agent 0.3.2 → 0.3.4

package/README.md CHANGED
@@ -1,13 +1,14 @@
1
1
  # Tab Agent
2
2
 
3
3
  [![npm version](https://img.shields.io/npm/v/tab-agent.svg)](https://www.npmjs.com/package/tab-agent)
4
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
4
5
 
5
- **Give Claude, Codex, or any LLM full control of your browser tabs** — securely, with click-to-activate permission.
6
+ **Browser control for Claude Code and Codex** — click-to-activate security.
6
7
 
7
8
  ```
8
9
  ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
9
- │ Claude Code │────▶│ Relay Server │────▶│ Extension │
10
- │ Codex / LLM │◀────│ (background) │◀────│ (Chrome) │
10
+ │ Claude Code │────▶│ Relay Server │────▶│ Extension │
11
+ │ or Codex │◀────│ (background) │◀────│ (Chrome) │
11
12
  └─────────────────┘ └─────────────────┘ └─────────────────┘
12
13
 
13
14
 
@@ -20,160 +21,103 @@
20
21
  ## Features
21
22
 
22
23
  - **Full browser control** — navigate, click, type, scroll, screenshot, run JavaScript
23
- - **Uses your login sessions** — access authenticated sites (GitHub, Gmail, X) without sharing credentials
24
- - **Runs in background** — relay server starts automatically, works while you do other things
25
- - **Click-to-activate security** — only tabs you explicitly enable, your other tabs stay private
26
- - **AI-optimized snapshots** — pages converted to readable text with element refs `[e1]`, `[e2]`
27
- - **Works with any LLM** — Claude Code, Codex, or any tool that can run shell commands
28
-
29
- ---
24
+ - **Uses your login sessions** — access GitHub, Gmail, Amazon without sharing credentials
25
+ - **Runs in background** — relay starts automatically, works while you do other things
26
+ - **Click-to-activate security** — only tabs you explicitly enable, others stay private
27
+ - **AI-optimized snapshots** — pages converted to text with refs `[e1]`, `[e2]` for easy targeting
28
+ - **Works with Claude Code & Codex** - installs skills automatically
30
29
 
31
30
  ## Quick Start
32
31
 
33
32
  ```bash
34
- # 1. Clone and load extension
33
+ # 1. Install extension
35
34
  git clone https://github.com/DrHB/tab-agent
36
- # Chrome: chrome://extensions → Developer mode → Load unpacked → select extension/
35
+ # Chrome: chrome://extensions → Developer mode → Load unpacked → select extension/
37
36
 
38
- # 2. Setup (auto-detects everything)
37
+ # 2. Setup
39
38
  npx tab-agent setup
40
39
 
41
- # 3. Activate a tab & go!
42
- # Click Tab Agent icon on any tab (turns green = active)
43
- # Ask Claude: "Use tab-agent to search Google for 'hello world'"
40
+ # 3. Activate & go
41
+ # Click extension icon on any tab (turns green)
42
+ # Ask Claude: "Search Amazon for mechanical keyboards and find the best rated"
44
43
  ```
45
44
 
46
- ---
47
-
48
- ## Why Tab Agent?
49
-
50
- ### 🔒 Security First
51
-
52
- | | Tab Agent | Traditional Automation |
53
- |--|-----------|----------------------|
54
- | **Access** | Only tabs you activate | Entire browser |
55
- | **Visibility** | Green badge = active | Hidden/background |
56
- | **Sessions** | Uses your cookies | Requires re-login |
57
- | **Credentials** | Never shared | Often required |
58
-
59
- **Click-to-activate model:** Your banking, email, and sensitive tabs stay completely isolated. You always see exactly which tabs the LLM can control.
60
-
61
- ### 🍪 Works With Your Login Sessions
62
-
63
- Because Tab Agent runs as a Chrome extension:
64
-
65
- - **Uses your existing cookies** — no re-authentication needed
66
- - **Access any site you're logged into** — GitHub, X, Gmail, internal tools
67
- - **Works with SSO and 2FA** — enterprise apps, protected accounts
68
- - **No credential sharing** — your passwords stay in your browser
45
+ ## Example Tasks
69
46
 
70
- ### 🤖 LLM-Optimized
71
-
72
- - **Semantic snapshots** - pages converted to readable text with refs `[e1]`, `[e2]`
73
- - **Screenshot fallback** — for complex dynamic pages
74
- - **Simple targeting** — click/type using refs instead of fragile CSS selectors
75
-
76
- ---
47
+ ```bash
48
+ # Research
49
+ "Go to Hacker News and summarize the top 5 stories"
77
50
 
78
- ## Example Use Cases
51
+ # Shopping (uses your login!)
52
+ "Search Amazon for protein powder, filter by 4+ stars, find the best value"
79
53
 
80
- **Web Research**
81
- > "Go to Hacker News and summarize the top 5 articles"
54
+ # Social Media
55
+ "Check my GitHub notifications and list unread ones"
82
56
 
83
- **Authenticated Actions** (uses your session!)
84
- > "Check my GitHub notifications and list the unread ones"
57
+ # Data Extraction
58
+ "Get the titles and prices of the first 10 products on this page"
85
59
 
86
- **Form Automation**
87
- > "Fill out this contact form with my details"
60
+ # Automation
61
+ "Fill out this form with my details"
62
+ ```
88
63
 
89
- **Data Extraction**
90
- > "Get the last 20 posts from my X timeline with author names"
64
+ ## Commands
91
65
 
92
- **Multi-step Workflows**
93
- > "Search Amazon for 'mechanical keyboard', filter by 4+ stars, and list the top 3"
66
+ ```bash
67
+ # Core workflow
68
+ npx tab-agent snapshot # Get page content with refs [e1], [e2]...
69
+ npx tab-agent click <ref> # Click element (e.g., click e5)
70
+ npx tab-agent type <ref> <text> # Type into element
71
+ npx tab-agent fill <ref> <value> # Fill form field
72
+
73
+ # Navigation
74
+ npx tab-agent navigate <url> # Go to URL
75
+ npx tab-agent scroll <dir> [amount] # Scroll up/down
76
+ npx tab-agent press <key> # Press key (Enter, Escape, Tab)
77
+
78
+ # Utilities
79
+ npx tab-agent tabs # List active tabs
80
+ npx tab-agent wait <text> # Wait for text to appear
81
+ npx tab-agent screenshot # Capture page (fallback for complex UIs)
82
+ ```
94
83
 
95
- ---
84
+ **Workflow:** `snapshot` → use refs → `click`/`type` → `snapshot` again → repeat
96
85
 
97
86
  ## Installation
98
87
 
99
- ### Step 1: Load Extension
88
+ ### 1. Load Extension
100
89
 
101
90
  ```bash
102
91
  git clone https://github.com/DrHB/tab-agent
103
92
  ```
104
93
 
105
- 1. Open `chrome://extensions` in your browser
106
- 2. Enable **Developer mode** (toggle in top right)
94
+ 1. Open `chrome://extensions`
95
+ 2. Enable **Developer mode** (top right)
107
96
  3. Click **Load unpacked**
108
97
  4. Select the `extension/` folder
109
- 5. You'll see the Tab Agent icon in your toolbar
110
98
 
111
- ### Step 2: Run Setup
99
+ ### 2. Run Setup
112
100
 
113
101
  ```bash
114
102
  npx tab-agent setup
115
103
  ```
116
104
 
117
- This automatically:
118
- - Detects your extension ID
119
- - Configures native messaging
120
- - Installs the Claude/Codex skill
105
+ This auto-detects your extension and configures everything.
121
106
 
122
- ### Step 3: Activate & Use
107
+ ### 3. Activate Tabs
123
108
 
124
- 1. Navigate to any webpage
125
- 2. **Click the Tab Agent icon** — it turns green (🟢 ON)
126
- 3. Ask your LLM to interact with the page
109
+ Click the Tab Agent icon on any tab you want to control. Green = active.
127
110
 
128
- ---
129
-
130
- ## Commands Reference
131
-
132
- ### Navigation & Viewing
133
- | Command | Description |
134
- |---------|-------------|
135
- | `tabs` | List all activated tabs |
136
- | `navigate` | Go to a URL |
137
- | `snapshot` | Get page with element refs |
138
- | `screenshot` | Capture viewport image |
139
- | `screenshot --full` | Capture entire page |
140
-
141
- ### Interaction
142
- | Command | Description |
143
- |---------|-------------|
144
- | `click` | Click element by ref |
145
- | `type` | Type text into element |
146
- | `fill` | Fill a form field |
147
- | `press` | Press a key (Enter, Escape, Tab, Arrows) |
148
-
149
- ### Page Control
150
- | Command | Description |
151
- |---------|-------------|
152
- | `scroll` | Scroll up/down by amount |
153
- | `wait` | Wait for text or element to appear |
154
- | `evaluate` | Run JavaScript in page context |
155
-
156
- ---
157
-
158
- ## CLI Usage
111
+ ## Security Model
159
112
 
160
- ```bash
161
- # Setup & Status
162
- npx tab-agent setup # Initial configuration
163
- npx tab-agent status # Check if everything works
164
- npx tab-agent start # Start relay server manually
113
+ | Feature | Tab Agent | Traditional Automation |
114
+ |---------|--------------|----------------------|
115
+ | **Access** | Only tabs you click to activate | Entire browser |
116
+ | **Sessions** | Uses your cookies | Requires credentials |
117
+ | **Visibility** | Green badge shows active tabs | Hidden/background |
118
+ | **Control** | You choose what AI can access | Full access by default |
165
119
 
166
- # Browser Commands
167
- npx tab-agent tabs # List active tabs
168
- npx tab-agent snapshot # Get page content with refs
169
- npx tab-agent screenshot # Capture viewport
170
- npx tab-agent screenshot --full # Capture full page
171
- npx tab-agent click e5 # Click element
172
- npx tab-agent type e3 "hello" # Type text
173
- npx tab-agent navigate "https://..." # Go to URL
174
- ```
175
-
176
- ---
120
+ Your banking, email, and sensitive tabs stay completely isolated unless you explicitly activate them.
177
121
 
178
122
  ## Supported Browsers
179
123
 
@@ -182,50 +126,39 @@ npx tab-agent navigate "https://..." # Go to URL
182
126
  - Microsoft Edge
183
127
  - Chromium
184
128
 
185
- Setup automatically detects your browser.
186
-
187
- ---
188
-
189
129
  ## Troubleshooting
190
130
 
191
131
  **Extension not detected?**
192
- - Ensure `extension/` folder is loaded in chrome://extensions
193
- - Developer mode must be enabled
194
- - Try refreshing the extensions page
132
+ - Make sure Developer mode is enabled in chrome://extensions
133
+ - Reload the extension
195
134
 
196
- **Tab not responding?**
197
- - Click the Tab Agent icon — must show green "ON" badge
198
- - Refresh the page after activating
135
+ **Commands not working?**
136
+ - Click the extension icon — must show green "ON"
137
+ - Run `npx tab-agent status` to check configuration
199
138
 
200
- **Relay connection issues?**
201
- - Run `npx tab-agent status` to check config
202
- - Run `npx tab-agent start` to see error details
203
-
204
- ---
139
+ **No active tabs?**
140
+ - Activate at least one tab by clicking the extension icon
205
141
 
206
142
  ## How It Works
207
143
 
208
- 1. **Chrome Extension** — Runs in your browser with access to activated tabs and your session cookies
144
+ 1. **Chrome Extension** — Injects into activated tabs, captures DOM snapshots
145
+ 2. **Relay Server** — Bridges AI ↔ Extension via Chrome Native Messaging (runs in background)
146
+ 3. **CLI** — Simple commands for Claude Code and Codex
209
147
 
210
- 2. **Relay Server** — Local WebSocket server that bridges LLM ↔ Extension via Chrome's Native Messaging API (runs in background)
211
-
212
- 3. **Skill File** — Tells Claude/Codex how to send commands
213
-
214
- **Data flow:**
215
148
  ```
216
- You: "Search Google for cats"
149
+ You: "Find cheap flights to Tokyo"
217
150
 
218
- LLM → CLI command → Relay Server → Native Messaging → Extension → Browser action
219
- 
220
- Results ← Response ← Relay Server ← Native Messaging ← Page snapshot
151
+ Claude → npx tab-agent navigate "google.com/flights"
152
+ → npx tab-agent snapshot
153
+ → npx tab-agent type e5 "Tokyo"
154
+ → npx tab-agent click e12
155
+ → ...
221
156
  ```
222
157
 
223
- ---
224
-
225
158
  ## License
226
159
 
227
160
  MIT
228
161
 
229
162
  ---
230
163
 
231
- **Works with [Claude Code](https://claude.ai/code), [Codex](https://openai.com/codex), and any LLM that can run shell commands.**
164
+ **Keywords:** browser automation, claude code, codex, AI browser control, web automation, puppeteer alternative, playwright alternative
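
The workflow line added to the README (`snapshot` → use refs → `click`/`type` → `snapshot` again) could be driven from Node roughly as follows. This is a minimal sketch, not code from the package: the helper names `tabAgentArgs`, `tabAgent`, and `typeAndSubmit` are hypothetical, and only the CLI commands themselves (`snapshot`, `type`, `press`) come from the diff.

```javascript
// Sketch only: drives the CLI commands listed in the README from Node.
const { execFileSync } = require('node:child_process');

// Build the argv for one tab-agent call, e.g. ['tab-agent', 'click', 'e5'].
function tabAgentArgs(command, ...params) {
  return ['tab-agent', command, ...params.map(String)];
}

// Run one command through npx and return its stdout as text.
// `run` is injectable so the flow can be exercised without a browser.
function tabAgent(command, params = [], run = execFileSync) {
  return run('npx', tabAgentArgs(command, ...params), { encoding: 'utf8' });
}

// snapshot -> act on a ref -> snapshot again, as the README describes.
// `ref` and `text` are chosen by the caller after reading a snapshot.
function typeAndSubmit(ref, text, run = execFileSync) {
  tabAgent('snapshot', [], run);        // take a snapshot before interacting
  tabAgent('type', [ref, text], run);   // type into the chosen element
  tabAgent('press', ['Enter'], run);    // submit
  return tabAgent('snapshot', [], run); // refs reset on this new snapshot
}
```

Passing arguments as an array avoids shell quoting problems with text such as `"hello world"`, which is the case the updated help examples now quote explicitly.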
package/bin/tab-agent.js CHANGED
@@ -30,31 +30,33 @@ if (BROWSER_COMMANDS.includes(command)) {
30
30
 
31
31
  function showHelp() {
32
32
  console.log(`
33
- tab-agent - Browser control for Claude/Codex
33
+ tabpilot - Give LLMs full control of your browser
34
34
 
35
- Setup Commands:
36
- setup Auto-detect extension, register native host, install skills
35
+ Setup:
36
+ setup Auto-detect extension, configure native messaging
37
37
  start Start the relay server
38
- status Check configuration status
38
+ status Check configuration
39
39
 
40
- Browser Commands:
41
- tabs List active tabs
42
- snapshot Get AI-readable page content
43
- screenshot [--full] Capture screenshot
40
+ Browser Control:
41
+ snapshot Get page content with refs [e1], [e2]...
44
42
  click <ref> Click element (e.g., click e5)
45
- type <ref> <text> Type text into element
43
+ type <ref> <text> Type into element
46
44
  fill <ref> <value> Fill form field
47
- press <key> Press key (Enter, Escape, etc.)
45
+ press <key> Press key (Enter, Escape, Tab)
48
46
  scroll <dir> [amount] Scroll up/down
49
47
  navigate <url> Go to URL
48
+ tabs List active tabs
50
49
  wait <text|selector> Wait for text or element
51
- evaluate <script> Run JavaScript
50
+ screenshot [--full] Capture page (fallback)
51
+
52
+ Workflow: snapshot → click/type → snapshot → repeat
52
53
 
53
54
  Examples:
54
- npx tab-agent setup
55
- npx tab-agent snapshot
56
- npx tab-agent click e5
57
- npx tab-agent type e3 "hello world"
55
+ npx tabpilot setup
56
+ npx tabpilot snapshot
57
+ npx tabpilot click e5
58
+ npx tabpilot type e3 "hello world"
59
+ npx tabpilot navigate "https://google.com"
58
60
 
59
61
  Version: ${require('../package.json').version}
60
62
  `);
package/cli/command.js CHANGED
@@ -130,30 +130,30 @@ function buildPayload(command, params, tabId) {
130
130
 
131
131
  function printHelp() {
132
132
  console.log(`
133
- tab-agent - Browser control commands
133
+ tab-agent - Give LLMs full control of your browser
134
134
 
135
135
  Usage: npx tab-agent <command> [options]
136
136
 
137
137
  Commands:
138
- tabs List active tabs
139
- snapshot Get AI-readable page content
140
- screenshot [--full] Capture screenshot (--full for full page)
138
+ snapshot Get page content with refs [e1], [e2]...
141
139
  click <ref> Click element (e.g., click e5)
142
- type <ref> <text> Type text into element
140
+ type <ref> <text> Type into element
143
141
  fill <ref> <value> Fill form field
144
- press <key> Press key (Enter, Escape, Tab, etc.)
145
- scroll <dir> [amount] Scroll up/down (default: 500px)
142
+ press <key> Press key (Enter, Escape, Tab)
143
+ scroll <dir> [amount] Scroll up/down
146
144
  navigate <url> Go to URL
145
+ tabs List active tabs
147
146
  wait <text|selector> Wait for text or element
147
+ screenshot [--full] Capture page (fallback)
148
148
  evaluate <script> Run JavaScript
149
149
 
150
+ Workflow: snapshot → click/type → snapshot → repeat
151
+
150
152
  Examples:
151
- npx tab-agent tabs
152
153
  npx tab-agent snapshot
153
154
  npx tab-agent click e5
154
- npx tab-agent type e3 hello world
155
- npx tab-agent navigate https://google.com
156
- npx tab-agent screenshot --full
155
+ npx tab-agent type e3 "hello world"
156
+ npx tab-agent navigate "https://google.com"
157
157
  `);
158
158
  }
159
159
 
package/cli/status.js CHANGED
@@ -54,8 +54,8 @@ async function status() {
54
54
  const claudeSkill = path.join(home, '.claude', 'skills', 'tab-agent.md');
55
55
  const codexSkill = path.join(home, '.codex', 'skills', 'tab-agent.md');
56
56
 
57
- console.log(`\nClaude Skill: ${fs.existsSync(claudeSkill) ? 'Installed' : 'Not installed'} ${claudeSkill}`);
58
- console.log(`Codex Skill: ${fs.existsSync(codexSkill) ? 'Installed' : 'Not installed (optional)'} ${codexSkill}`);
57
+ console.log(`\nClaude Skill: ${fs.existsSync(claudeSkill) ? 'Installed' : 'Not installed'}`);
58
+ console.log(`Codex Skill: ${fs.existsSync(codexSkill) ? 'Installed' : 'Not installed (optional)'}`);
59
59
 
60
60
  // Check relay server
61
61
  console.log('\nRelay Server:');
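
For reference, the skill check in this hunk can be read as a small pure function. The sketch below reuses the two paths and the status strings shown in the diff; the function name `skillReport` and the injectable `exists` parameter are assumptions added for illustration, not part of `status.js`.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Report whether the Claude and Codex skill files are present.
// Paths and messages mirror the hunk above; `exists` is injectable for tests.
function skillReport(home = os.homedir(), exists = fs.existsSync) {
  const claudeSkill = path.join(home, '.claude', 'skills', 'tab-agent.md');
  const codexSkill = path.join(home, '.codex', 'skills', 'tab-agent.md');
  return [
    `Claude Skill: ${exists(claudeSkill) ? 'Installed' : 'Not installed'}`,
    `Codex Skill: ${exists(codexSkill) ? 'Installed' : 'Not installed (optional)'}`,
  ];
}
```

As in 0.3.4, the returned lines omit the file paths and carry only the installed state.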
package/package.json CHANGED
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "tab-agent",
3
- "version": "0.3.2",
4
- "description": "Browser control for Claude Code and Codex via WebSocket",
3
+ "version": "0.3.4",
4
+ "description": "Give LLMs full control of your browser - secure, click-to-activate automation for Claude, ChatGPT, Codex, and any AI",
5
5
  "bin": {
6
6
  "tab-agent": "./bin/tab-agent.js"
7
7
  },
@@ -20,13 +20,28 @@
20
20
  "ws": "^8.16.0"
21
21
  },
22
22
  "keywords": [
23
- "chrome",
24
- "extension",
25
- "browser",
26
- "automation",
23
+ "tab-agent",
24
+ "browser-agent",
25
+ "browser-automation",
26
+ "browser-control",
27
+ "ai-browser",
28
+ "llm-browser",
27
29
  "claude",
28
- "codex"
30
+ "chatgpt",
31
+ "codex",
32
+ "openai",
33
+ "anthropic",
34
+ "chrome-extension",
35
+ "web-automation",
36
+ "ai-agent",
37
+ "puppeteer-alternative",
38
+ "playwright-alternative",
39
+ "web-agent"
29
40
  ],
30
- "repository": "https://github.com/DrHB/tab-agent",
41
+ "repository": {
42
+ "type": "git",
43
+ "url": "https://github.com/DrHB/tab-agent"
44
+ },
45
+ "homepage": "https://github.com/DrHB/tab-agent#readme",
31
46
  "license": "MIT"
32
47
  }
@@ -2,7 +2,7 @@
2
2
  set -e
3
3
 
4
4
  SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
5
- HOST_NAME="com.tabagent.relay"
5
+ HOST_NAME="com.tabpilot.relay"
6
6
  HOST_DIR="$HOME/Library/Application Support/TabAgent"
7
7
  WRAPPER_PATH="$HOST_DIR/native-host-wrapper.sh"
8
8
 
@@ -40,7 +40,7 @@ cp -R "$SCRIPT_DIR/node_modules" "$HOST_DIR/node_modules"
40
40
  cat > "$MANIFEST_DIR/$HOST_NAME.json" << EOF
41
41
  {
42
42
  "name": "$HOST_NAME",
43
- "description": "Tab Agent Native Messaging Host",
43
+ "description": "TabPilot Native Messaging Host",
44
44
  "path": "$WRAPPER_PATH",
45
45
  "type": "stdio",
46
46
  "allowed_origins": [
@@ -6,7 +6,7 @@ cd "$SCRIPT_DIR"
6
6
 
7
7
  LOG_FILE="$SCRIPT_DIR/wrapper.log"
8
8
  echo "$(date): Starting native host from $SCRIPT_DIR" >> "$LOG_FILE"
9
- export TAB_AGENT_LOG="/tmp/tab-agent-native-host.log"
9
+ export TAB_AGENT_LOG="/tmp/tabpilot-native-host.log"
10
10
 
11
11
  NODE_BIN="/opt/homebrew/bin/node"
12
12
  if [ ! -x "$NODE_BIN" ]; then
@@ -1,7 +1,7 @@
1
1
  {
2
- "name": "tab-agent-relay",
2
+ "name": "tabpilot-relay",
3
3
  "version": "0.1.0",
4
- "description": "WebSocket relay for Tab Agent Chrome extension",
4
+ "description": "WebSocket relay for TabPilot Chrome extension",
5
5
  "main": "server.js",
6
6
  "scripts": {
7
7
  "start": "node server.js"
package/relay/server.js CHANGED
@@ -102,7 +102,7 @@ wss.on('connection', (ws, req) => {
102
102
  });
103
103
 
104
104
  httpServer.listen(PORT, () => {
105
- console.log(`Tab Agent Relay running on ws://localhost:${PORT}`);
105
+ console.log(`TabPilot Relay running on ws://localhost:${PORT}`);
106
106
  console.log(`Health check: http://localhost:${PORT}/health`);
107
107
  });
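
The `/health` endpoint logged here is what the skill file probes with `curl -s http://localhost:9876/health` before starting the relay. A Node equivalent might look like the sketch below. It assumes that any 2xx response means the relay is up and does not inspect the response body, whose format is not shown in this diff; `healthUrl` and `isRelayUp` are hypothetical names.

```javascript
const http = require('node:http');

// Build the health URL the relay logs on startup.
function healthUrl(port) {
  return `http://localhost:${port}/health`;
}

// Resolve true if the relay answers with a 2xx status, false on any error.
// `get` is injectable so the check can be tested without a running relay.
function isRelayUp(port, get = http.get) {
  return new Promise((resolve) => {
    const req = get(healthUrl(port), (res) => {
      res.resume(); // drain the body so the socket is released
      resolve(res.statusCode >= 200 && res.statusCode < 300);
    });
    req.on('error', () => resolve(false)); // e.g. connection refused
  });
}
```

A caller could run `npx tab-agent start` when `isRelayUp(9876)` resolves to `false`, mirroring the `curl ... || (npx tab-agent start &)` line in the skill file.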
108
108
 
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: tab-agent
3
- description: Browser control via CLI - snapshot, click, type, fill, screenshot
3
+ description: Browser control via CLI - snapshot, click, type, navigate
4
4
  ---
5
5
 
6
6
  # Tab Agent
@@ -17,7 +17,6 @@ sleep 2
17
17
  ## Commands
18
18
 
19
19
  ```bash
20
- npx tab-agent tabs # List active tabs
21
20
  npx tab-agent snapshot # Get page with refs [e1], [e2]...
22
21
  npx tab-agent click <ref> # Click element
23
22
  npx tab-agent type <ref> <text> # Type text
@@ -25,9 +24,9 @@ npx tab-agent fill <ref> <value> # Fill form field
25
24
  npx tab-agent press <key> # Press key (Enter, Escape, Tab)
26
25
  npx tab-agent scroll <dir> [amount] # Scroll up/down
27
26
  npx tab-agent navigate <url> # Go to URL
27
+ npx tab-agent tabs # List active tabs
28
28
  npx tab-agent wait <text|selector> # Wait for condition
29
- npx tab-agent screenshot # Capture viewport (fallback only)
30
- npx tab-agent screenshot --full # Capture full page (fallback only)
29
+ npx tab-agent screenshot # Capture page (fallback only)
31
30
  ```
32
31
 
33
32
  ## Workflow
@@ -49,14 +48,10 @@ npx tab-agent snapshot
49
48
  npx tab-agent type e1 "hello world"
50
49
  npx tab-agent press Enter
51
50
  npx tab-agent snapshot # See results
52
-
53
- # Only screenshot if snapshot doesn't show what you need
54
- npx tab-agent screenshot --full
55
51
  ```
56
52
 
57
53
  ## Notes
58
54
 
59
55
  - Refs reset on each snapshot - always snapshot before interacting
60
56
  - Keys: Enter, Escape, Tab, Backspace, ArrowUp/Down/Left/Right
61
- - Screenshot outputs base64 to stdout (no file saved)
62
- - Prefer snapshot over screenshot - it's faster and text-based
57
+ - Prefer snapshot over screenshot - faster and text-based
@@ -16,7 +16,6 @@ curl -s http://localhost:9876/health || (npx tab-agent start &)
16
16
  ## Commands
17
17
 
18
18
  ```bash
19
- npx tab-agent tabs # List active tabs
20
19
  npx tab-agent snapshot # Page with refs [e1], [e2]...
21
20
  npx tab-agent click <ref> # Click element
22
21
  npx tab-agent type <ref> <text> # Type text
@@ -24,6 +23,7 @@ npx tab-agent fill <ref> <val> # Fill form field
24
23
  npx tab-agent press <key> # Enter/Escape/Tab/Arrow*
25
24
  npx tab-agent scroll <dir> [n] # Scroll up/down
26
25
  npx tab-agent navigate <url> # Go to URL
26
+ npx tab-agent tabs # List active tabs
27
27
  npx tab-agent wait <text|sel> # Wait for condition
28
28
  npx tab-agent screenshot # Fallback only - if snapshot incomplete
29
29
  ```
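
The skill notes state that refs reset on each snapshot, so a caller should only act on refs that appear in the most recent snapshot text. The sketch below shows one way to enforce that. The snapshot excerpt is hypothetical (the real output layout is not shown in this diff); only the `[e1]`, `[e2]` ref notation comes from the source, and `extractRefs` and `assertFreshRef` are illustrative names.

```javascript
// Collect element refs such as "e1", "e12" from snapshot text.
function extractRefs(snapshotText) {
  const refs = new Set();
  for (const match of snapshotText.matchAll(/\[(e\d+)\]/g)) {
    refs.add(match[1]);
  }
  return [...refs];
}

// Guard against stale refs: only act on refs present in the latest snapshot.
function assertFreshRef(ref, latestSnapshotText) {
  if (!extractRefs(latestSnapshotText).includes(ref)) {
    throw new Error(`Ref ${ref} not in latest snapshot; run snapshot again`);
  }
  return ref;
}

// Hypothetical snapshot excerpt; the real output layout may differ.
const sample = 'textbox "Search" [e1]\nbutton "Go" [e2]';
console.log(extractRefs(sample)); // [ 'e1', 'e2' ]
```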