tab-agent 0.3.1 → 0.3.3

package/README.md CHANGED
@@ -1,12 +1,15 @@
1
1
  # Tab Agent
2
2
 
3
3
  [![npm version](https://img.shields.io/npm/v/tab-agent.svg)](https://www.npmjs.com/package/tab-agent)
4
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
4
5
 
5
- **Give Claude, Codex, or any LLM full control of your browser tabs** — securely, with click-to-activate permission.
6
+ **Give LLMs full control of your browser** — securely, with click-to-activate permission.
7
+
8
+ Works with Claude, ChatGPT, Codex, and any AI that can run shell commands.
6
9
 
7
10
  ```
8
11
  ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
9
- │ Claude Code │────▶│ Relay Server │────▶│ Extension │
12
+ │ Claude / GPT │────▶│ Relay Server │────▶│ Extension │
10
13
  │ Codex / LLM │◀────│ (background) │◀────│ (Chrome) │
11
14
  └─────────────────┘ └─────────────────┘ └─────────────────┘
12
15
 
@@ -20,160 +23,103 @@
20
23
  ## Features
21
24
 
22
25
  - **Full browser control** — navigate, click, type, scroll, screenshot, run JavaScript
23
- - **Uses your login sessions** — access authenticated sites (GitHub, Gmail, X) without sharing credentials
24
- - **Runs in background** — relay server starts automatically, works while you do other things
25
- - **Click-to-activate security** — only tabs you explicitly enable, your other tabs stay private
26
- - **AI-optimized snapshots** — pages converted to readable text with element refs `[e1]`, `[e2]`
27
- - **Works with any LLM** — Claude Code, Codex, or any tool that can run shell commands
28
-
29
- ---
26
+ - **Uses your login sessions** — access GitHub, Gmail, Amazon without sharing credentials
27
+ - **Runs in background** — relay starts automatically, works while you do other things
28
+ - **Click-to-activate security** — only tabs you explicitly enable, others stay private
29
+ - **AI-optimized snapshots** — pages converted to text with refs `[e1]`, `[e2]` for easy targeting
30
+ - **Works with any LLM** — Claude, ChatGPT, Codex, or custom AI agents
30
31
 
31
32
  ## Quick Start
32
33
 
33
34
  ```bash
34
- # 1. Clone and load extension
35
+ # 1. Install extension
35
36
  git clone https://github.com/DrHB/tab-agent
36
- # Chrome: chrome://extensions → Developer mode → Load unpacked → select extension/
37
+ # Chrome: chrome://extensions → Developer mode → Load unpacked → select extension/
37
38
 
38
- # 2. Setup (auto-detects everything)
39
+ # 2. Setup
39
40
  npx tab-agent setup
40
41
 
41
- # 3. Activate a tab & go!
42
- # Click Tab Agent icon on any tab (turns green = active)
43
- # Ask Claude: "Use tab-agent to search Google for 'hello world'"
42
+ # 3. Activate & go
43
+ # Click extension icon on any tab (turns green)
44
+ # Ask your AI: "Search Amazon for mechanical keyboards and find the best rated"
44
45
  ```
45
46
 
46
- ---
47
-
48
- ## Why Tab Agent?
49
-
50
- ### 🔒 Security First
51
-
52
- | | Tab Agent | Traditional Automation |
53
- |--|-----------|----------------------|
54
- | **Access** | Only tabs you activate | Entire browser |
55
- | **Visibility** | Green badge = active | Hidden/background |
56
- | **Sessions** | Uses your cookies | Requires re-login |
57
- | **Credentials** | Never shared | Often required |
58
-
59
- **Click-to-activate model:** Your banking, email, and sensitive tabs stay completely isolated. You always see exactly which tabs the LLM can control.
60
-
61
- ### 🍪 Works With Your Login Sessions
62
-
63
- Because Tab Agent runs as a Chrome extension:
47
+ ## Example Tasks
64
48
 
65
- - **Uses your existing cookies** — no re-authentication needed
66
- - **Access any site you're logged into** — GitHub, X, Gmail, internal tools
67
- - **Works with SSO and 2FA** enterprise apps, protected accounts
68
- - **No credential sharing** — your passwords stay in your browser
69
-
70
- ### 🤖 LLM-Optimized
71
-
72
- - **Semantic snapshots** — pages converted to readable text with refs `[e1]`, `[e2]`
73
- - **Screenshot fallback** — for complex dynamic pages
74
- - **Simple targeting** — click/type using refs instead of fragile CSS selectors
75
-
76
- ---
49
+ ```bash
50
+ # Research
51
+ "Go to Hacker News and summarize the top 5 stories"
77
52
 
78
- ## Example Use Cases
53
+ # Shopping (uses your login!)
54
+ "Search Amazon for protein powder, filter by 4+ stars, find the best value"
79
55
 
80
- **Web Research**
81
- > "Go to Hacker News and summarize the top 5 articles"
56
+ # Social Media
57
+ "Check my GitHub notifications and list unread ones"
82
58
 
83
- **Authenticated Actions** (uses your session!)
84
- > "Check my GitHub notifications and list the unread ones"
59
+ # Data Extraction
60
+ "Get the titles and prices of the first 10 products on this page"
85
61
 
86
- **Form Automation**
87
- > "Fill out this contact form with my details"
62
+ # Automation
63
+ "Fill out this form with my details"
64
+ ```
88
65
 
89
- **Data Extraction**
90
- > "Get the last 20 posts from my X timeline with author names"
66
+ ## Commands
91
67
 
92
- **Multi-step Workflows**
93
- > "Search Amazon for 'mechanical keyboard', filter by 4+ stars, and list the top 3"
68
+ ```bash
69
+ # Core workflow
70
+ npx tab-agent snapshot # Get page content with refs [e1], [e2]...
71
+ npx tab-agent click <ref> # Click element (e.g., click e5)
72
+ npx tab-agent type <ref> <text> # Type into element
73
+ npx tab-agent fill <ref> <value> # Fill form field
74
+
75
+ # Navigation
76
+ npx tab-agent navigate <url> # Go to URL
77
+ npx tab-agent scroll <dir> [amount] # Scroll up/down
78
+ npx tab-agent press <key> # Press key (Enter, Escape, Tab)
79
+
80
+ # Utilities
81
+ npx tab-agent tabs # List active tabs
82
+ npx tab-agent wait <text> # Wait for text to appear
83
+ npx tab-agent screenshot # Capture page (fallback for complex UIs)
84
+ ```
94
85
 
95
- ---
86
+ **Workflow:** `snapshot` → use refs → `click`/`type` → `snapshot` again → repeat
96
87
 
97
88
  ## Installation
98
89
 
99
- ### Step 1: Load Extension
90
+ ### 1. Load Extension
100
91
 
101
92
  ```bash
102
93
  git clone https://github.com/DrHB/tab-agent
103
94
  ```
104
95
 
105
- 1. Open `chrome://extensions` in your browser
106
- 2. Enable **Developer mode** (toggle in top right)
96
+ 1. Open `chrome://extensions`
97
+ 2. Enable **Developer mode** (top right)
107
98
  3. Click **Load unpacked**
108
99
  4. Select the `extension/` folder
109
- 5. You'll see the Tab Agent icon in your toolbar
110
100
 
111
- ### Step 2: Run Setup
101
+ ### 2. Run Setup
112
102
 
113
103
  ```bash
114
104
  npx tab-agent setup
115
105
  ```
116
106
 
117
- This automatically:
118
- - Detects your extension ID
119
- - Configures native messaging
120
- - Installs the Claude/Codex skill
107
+ This auto-detects your extension and configures everything.
121
108
 
122
- ### Step 3: Activate & Use
109
+ ### 3. Activate Tabs
123
110
 
124
- 1. Navigate to any webpage
125
- 2. **Click the Tab Agent icon** — it turns green (🟢 ON)
126
- 3. Ask your LLM to interact with the page
111
+ Click the Tab Agent icon on any tab you want to control. Green = active.
127
112
 
128
- ---
129
-
130
- ## Commands Reference
131
-
132
- ### Navigation & Viewing
133
- | Command | Description |
134
- |---------|-------------|
135
- | `tabs` | List all activated tabs |
136
- | `navigate` | Go to a URL |
137
- | `snapshot` | Get page with element refs |
138
- | `screenshot` | Capture viewport image |
139
- | `screenshot --full` | Capture entire page |
140
-
141
- ### Interaction
142
- | Command | Description |
143
- |---------|-------------|
144
- | `click` | Click element by ref |
145
- | `type` | Type text into element |
146
- | `fill` | Fill a form field |
147
- | `press` | Press a key (Enter, Escape, Tab, Arrows) |
148
-
149
- ### Page Control
150
- | Command | Description |
151
- |---------|-------------|
152
- | `scroll` | Scroll up/down by amount |
153
- | `wait` | Wait for text or element to appear |
154
- | `evaluate` | Run JavaScript in page context |
155
-
156
- ---
157
-
158
- ## CLI Usage
113
+ ## Security Model
159
114
 
160
- ```bash
161
- # Setup & Status
162
- npx tab-agent setup # Initial configuration
163
- npx tab-agent status # Check if everything works
164
- npx tab-agent start # Start relay server manually
115
+ | Feature | Tab Agent | Traditional Automation |
116
+ |---------|--------------|----------------------|
117
+ | **Access** | Only tabs you click to activate | Entire browser |
118
+ | **Sessions** | Uses your cookies | Requires credentials |
119
+ | **Visibility** | Green badge shows active tabs | Hidden/background |
120
+ | **Control** | You choose what AI can access | Full access by default |
165
121
 
166
- # Browser Commands
167
- npx tab-agent tabs # List active tabs
168
- npx tab-agent snapshot # Get page content with refs
169
- npx tab-agent screenshot # Capture viewport
170
- npx tab-agent screenshot --full # Capture full page
171
- npx tab-agent click e5 # Click element
172
- npx tab-agent type e3 "hello" # Type text
173
- npx tab-agent navigate "https://..." # Go to URL
174
- ```
175
-
176
- ---
122
+ Your banking, email, and sensitive tabs stay completely isolated unless you explicitly activate them.
177
123
 
178
124
  ## Supported Browsers
179
125
 
@@ -182,50 +128,39 @@ npx tab-agent navigate "https://..." # Go to URL
182
128
  - Microsoft Edge
183
129
  - Chromium
184
130
 
185
- Setup automatically detects your browser.
186
-
187
- ---
188
-
189
131
  ## Troubleshooting
190
132
 
191
133
  **Extension not detected?**
192
- - Ensure `extension/` folder is loaded in chrome://extensions
193
- - Developer mode must be enabled
194
- - Try refreshing the extensions page
134
+ - Make sure Developer mode is enabled in chrome://extensions
135
+ - Reload the extension
195
136
 
196
- **Tab not responding?**
197
- - Click the Tab Agent icon — must show green "ON" badge
198
- - Refresh the page after activating
137
+ **Commands not working?**
138
+ - Click the extension icon — must show green "ON"
139
+ - Run `npx tab-agent status` to check configuration
199
140
 
200
- **Relay connection issues?**
201
- - Run `npx tab-agent status` to check config
202
- - Run `npx tab-agent start` to see error details
203
-
204
- ---
141
+ **No active tabs?**
142
+ - Activate at least one tab by clicking the extension icon
205
143
 
206
144
  ## How It Works
207
145
 
208
- 1. **Chrome Extension** — Runs in your browser with access to activated tabs and your session cookies
146
+ 1. **Chrome Extension** — Injects into activated tabs, captures DOM snapshots
147
+ 2. **Relay Server** — Bridges AI ↔ Extension via Chrome Native Messaging (runs in background)
148
+ 3. **CLI** — Simple commands that any LLM can execute
209
149
 
210
- 2. **Relay Server** — Local WebSocket server that bridges LLM ↔ Extension via Chrome's Native Messaging API (runs in background)
211
-
212
- 3. **Skill File** — Tells Claude/Codex how to send commands
213
-
214
- **Data flow:**
215
150
  ```
216
- You: "Search Google for cats"
151
+ You: "Find cheap flights to Tokyo"
217
152
 
218
- LLM → CLI command Relay Server → Native Messaging → Extension → Browser action
219
-
220
- Results Response Relay Server ← Native Messaging ← Page snapshot
153
+ LLM → npx tab-agent navigate "google.com/flights"
154
+ npx tab-agent snapshot
155
+ npx tab-agent type e5 "Tokyo"
156
+ → npx tab-agent click e12
157
+ → ...
221
158
  ```
222
159
 
223
- ---
224
-
225
160
  ## License
226
161
 
227
162
  MIT
228
163
 
229
164
  ---
230
165
 
231
- **Works with [Claude Code](https://claude.ai/code), [Codex](https://openai.com/codex), and any LLM that can run shell commands.**
166
+ **Keywords:** browser agent, browser automation, AI browser control, Claude browser, ChatGPT browser, LLM web automation, Codex browser, puppeteer alternative, playwright alternative
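
The README describes snapshots as text carrying element refs such as `[e1]` and `[e2]`. The following is a minimal sketch of how a caller might collect those refs before choosing a target. It assumes the snapshot is plain text in which each ref appears literally as `[eN]`; the actual snapshot layout is not shown in this diff, and `extractRefs` is a hypothetical name, not part of tab-agent.

```javascript
// Hypothetical helper, not part of tab-agent.
// Assumption: snapshot output is plain text where each ref looks like [e1], [e2], ...
function extractRefs(snapshotText) {
  const refs = new Set();
  for (const match of snapshotText.matchAll(/\[(e\d+)\]/g)) {
    refs.add(match[1]); // keep "e5", drop the surrounding brackets
  }
  return [...refs]; // unique refs, in order of first appearance
}

// Illustrative input only; real snapshot text may look different.
const sample = 'button "Search" [e1]\ntextbox "Query" [e2]\nlink "Home" [e1]';
console.log(extractRefs(sample));
```

Because refs are only meaningful for the snapshot they came from, a helper like this would be rerun on every new snapshot rather than cached.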
package/bin/tab-agent.js CHANGED
@@ -30,31 +30,33 @@ if (BROWSER_COMMANDS.includes(command)) {
30
30
 
31
31
  function showHelp() {
32
32
  console.log(`
33
- tab-agent - Browser control for Claude/Codex
33
+ tabpilot - Give LLMs full control of your browser
34
34
 
35
- Setup Commands:
36
- setup Auto-detect extension, register native host, install skills
35
+ Setup:
36
+ setup Auto-detect extension, configure native messaging
37
37
  start Start the relay server
38
- status Check configuration status
38
+ status Check configuration
39
39
 
40
- Browser Commands:
41
- tabs List active tabs
42
- snapshot Get AI-readable page content
43
- screenshot [--full] Capture screenshot
40
+ Browser Control:
41
+ snapshot Get page content with refs [e1], [e2]...
44
42
  click <ref> Click element (e.g., click e5)
45
- type <ref> <text> Type text into element
43
+ type <ref> <text> Type into element
46
44
  fill <ref> <value> Fill form field
47
- press <key> Press key (Enter, Escape, etc.)
45
+ press <key> Press key (Enter, Escape, Tab)
48
46
  scroll <dir> [amount] Scroll up/down
49
47
  navigate <url> Go to URL
48
+ tabs List active tabs
50
49
  wait <text|selector> Wait for text or element
51
- evaluate <script> Run JavaScript
50
+ screenshot [--full] Capture page (fallback)
51
+
52
+ Workflow: snapshot → click/type → snapshot → repeat
52
53
 
53
54
  Examples:
54
- npx tab-agent setup
55
- npx tab-agent snapshot
56
- npx tab-agent click e5
57
- npx tab-agent type e3 "hello world"
55
+ npx tabpilot setup
56
+ npx tabpilot snapshot
57
+ npx tabpilot click e5
58
+ npx tabpilot type e3 "hello world"
59
+ npx tabpilot navigate "https://google.com"
58
60
 
59
61
  Version: ${require('../package.json').version}
60
62
  `);
package/cli/command.js CHANGED
@@ -130,30 +130,30 @@ function buildPayload(command, params, tabId) {
130
130
 
131
131
  function printHelp() {
132
132
  console.log(`
133
- tab-agent - Browser control commands
133
+ tab-agent - Give LLMs full control of your browser
134
134
 
135
135
  Usage: npx tab-agent <command> [options]
136
136
 
137
137
  Commands:
138
- tabs List active tabs
139
- snapshot Get AI-readable page content
140
- screenshot [--full] Capture screenshot (--full for full page)
138
+ snapshot Get page content with refs [e1], [e2]...
141
139
  click <ref> Click element (e.g., click e5)
142
- type <ref> <text> Type text into element
140
+ type <ref> <text> Type into element
143
141
  fill <ref> <value> Fill form field
144
- press <key> Press key (Enter, Escape, Tab, etc.)
145
- scroll <dir> [amount] Scroll up/down (default: 500px)
142
+ press <key> Press key (Enter, Escape, Tab)
143
+ scroll <dir> [amount] Scroll up/down
146
144
  navigate <url> Go to URL
145
+ tabs List active tabs
147
146
  wait <text|selector> Wait for text or element
147
+ screenshot [--full] Capture page (fallback)
148
148
  evaluate <script> Run JavaScript
149
149
 
150
+ Workflow: snapshot → click/type → snapshot → repeat
151
+
150
152
  Examples:
151
- npx tab-agent tabs
152
153
  npx tab-agent snapshot
153
154
  npx tab-agent click e5
154
- npx tab-agent type e3 hello world
155
- npx tab-agent navigate https://google.com
156
- npx tab-agent screenshot --full
155
+ npx tab-agent type e3 "hello world"
156
+ npx tab-agent navigate "https://google.com"
157
157
  `);
158
158
  }
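
The help text now quotes `"hello world"` and the URL in its examples. Without quotes, a POSIX shell splits `hello world` into two arguments, and how the CLI treats the extra argument is not shown in this diff. The sketch below shows one way a caller could build a safely quoted command line from an argument array. `shellQuote` and `commandLine` are hypothetical helpers, not part of tab-agent, and the quoting rule assumes a POSIX shell.

```javascript
// Hypothetical helpers, not part of tab-agent. Assumes a POSIX shell.

// Quote a single argument so the shell passes it through as one word.
function shellQuote(arg) {
  if (/^[A-Za-z0-9_\/:.=-]+$/.test(arg)) return arg; // no special characters
  // Wrap in single quotes; an embedded ' becomes '\'' (close, escaped quote, reopen).
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}

// Build a printable `npx tab-agent ...` command line from an argument array.
function commandLine(args) {
  return ['npx', 'tab-agent', ...args].map(shellQuote).join(' ');
}

console.log(commandLine(['type', 'e3', 'hello world']));
console.log(commandLine(['navigate', 'https://example.com/?q=a&b=c']));
```

When spawning the CLI from Node rather than printing a command, passing the array directly to `child_process.execFile` avoids shell quoting altogether.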
159
159
 
package/cli/status.js CHANGED
@@ -54,8 +54,8 @@ async function status() {
54
54
  const claudeSkill = path.join(home, '.claude', 'skills', 'tab-agent.md');
55
55
  const codexSkill = path.join(home, '.codex', 'skills', 'tab-agent.md');
56
56
 
57
- console.log(`\nClaude Skill: ${fs.existsSync(claudeSkill) ? 'Installed' : 'Not installed'} ${claudeSkill}`);
58
- console.log(`Codex Skill: ${fs.existsSync(codexSkill) ? 'Installed' : 'Not installed (optional)'} ${codexSkill}`);
57
+ console.log(`\nClaude Skill: ${fs.existsSync(claudeSkill) ? 'Installed' : 'Not installed'}`);
58
+ console.log(`Codex Skill: ${fs.existsSync(codexSkill) ? 'Installed' : 'Not installed (optional)'}`);
59
59
 
60
60
  // Check relay server
61
61
  console.log('\nRelay Server:');
package/package.json CHANGED
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "tab-agent",
3
- "version": "0.3.1",
4
- "description": "Browser control for Claude Code and Codex via WebSocket",
3
+ "version": "0.3.3",
4
+ "description": "Give LLMs full control of your browser - secure, click-to-activate automation for Claude, ChatGPT, Codex, and any AI",
5
5
  "bin": {
6
6
  "tab-agent": "./bin/tab-agent.js"
7
7
  },
@@ -20,13 +20,28 @@
20
20
  "ws": "^8.16.0"
21
21
  },
22
22
  "keywords": [
23
- "chrome",
24
- "extension",
25
- "browser",
26
- "automation",
23
+ "tab-agent",
24
+ "browser-agent",
25
+ "browser-automation",
26
+ "browser-control",
27
+ "ai-browser",
28
+ "llm-browser",
27
29
  "claude",
28
- "codex"
30
+ "chatgpt",
31
+ "codex",
32
+ "openai",
33
+ "anthropic",
34
+ "chrome-extension",
35
+ "web-automation",
36
+ "ai-agent",
37
+ "puppeteer-alternative",
38
+ "playwright-alternative",
39
+ "web-agent"
29
40
  ],
30
- "repository": "https://github.com/DrHB/tab-agent",
41
+ "repository": {
42
+ "type": "git",
43
+ "url": "https://github.com/DrHB/tab-agent"
44
+ },
45
+ "homepage": "https://github.com/DrHB/tab-agent#readme",
31
46
  "license": "MIT"
32
47
  }
@@ -2,7 +2,7 @@
2
2
  set -e
3
3
 
4
4
  SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
5
- HOST_NAME="com.tabagent.relay"
5
+ HOST_NAME="com.tabpilot.relay"
6
6
  HOST_DIR="$HOME/Library/Application Support/TabAgent"
7
7
  WRAPPER_PATH="$HOST_DIR/native-host-wrapper.sh"
8
8
 
@@ -40,7 +40,7 @@ cp -R "$SCRIPT_DIR/node_modules" "$HOST_DIR/node_modules"
40
40
  cat > "$MANIFEST_DIR/$HOST_NAME.json" << EOF
41
41
  {
42
42
  "name": "$HOST_NAME",
43
- "description": "Tab Agent Native Messaging Host",
43
+ "description": "TabPilot Native Messaging Host",
44
44
  "path": "$WRAPPER_PATH",
45
45
  "type": "stdio",
46
46
  "allowed_origins": [
@@ -6,7 +6,7 @@ cd "$SCRIPT_DIR"
6
6
 
7
7
  LOG_FILE="$SCRIPT_DIR/wrapper.log"
8
8
  echo "$(date): Starting native host from $SCRIPT_DIR" >> "$LOG_FILE"
9
- export TAB_AGENT_LOG="/tmp/tab-agent-native-host.log"
9
+ export TAB_AGENT_LOG="/tmp/tabpilot-native-host.log"
10
10
 
11
11
  NODE_BIN="/opt/homebrew/bin/node"
12
12
  if [ ! -x "$NODE_BIN" ]; then
@@ -1,7 +1,7 @@
1
1
  {
2
- "name": "tab-agent-relay",
2
+ "name": "tabpilot-relay",
3
3
  "version": "0.1.0",
4
- "description": "WebSocket relay for Tab Agent Chrome extension",
4
+ "description": "WebSocket relay for TabPilot Chrome extension",
5
5
  "main": "server.js",
6
6
  "scripts": {
7
7
  "start": "node server.js"
package/relay/server.js CHANGED
@@ -102,7 +102,7 @@ wss.on('connection', (ws, req) => {
102
102
  });
103
103
 
104
104
  httpServer.listen(PORT, () => {
105
- console.log(`Tab Agent Relay running on ws://localhost:${PORT}`);
105
+ console.log(`TabPilot Relay running on ws://localhost:${PORT}`);
106
106
  console.log(`Health check: http://localhost:${PORT}/health`);
107
107
  });
108
108
 
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: tab-agent
3
- description: Browser control via CLI - snapshot, click, type, fill, screenshot
3
+ description: Browser control via CLI - snapshot, click, type, navigate
4
4
  ---
5
5
 
6
6
  # Tab Agent
@@ -17,26 +17,27 @@ sleep 2
17
17
  ## Commands
18
18
 
19
19
  ```bash
20
- npx tab-agent tabs # List active tabs
21
20
  npx tab-agent snapshot # Get page with refs [e1], [e2]...
22
- npx tab-agent screenshot # Capture viewport
23
- npx tab-agent screenshot --full # Capture full page
24
21
  npx tab-agent click <ref> # Click element
25
22
  npx tab-agent type <ref> <text> # Type text
26
23
  npx tab-agent fill <ref> <value> # Fill form field
27
24
  npx tab-agent press <key> # Press key (Enter, Escape, Tab)
28
25
  npx tab-agent scroll <dir> [amount] # Scroll up/down
29
26
  npx tab-agent navigate <url> # Go to URL
27
+ npx tab-agent tabs # List active tabs
30
28
  npx tab-agent wait <text|selector> # Wait for condition
31
- npx tab-agent evaluate <script> # Run JavaScript
29
+ npx tab-agent screenshot # Capture page (fallback only)
32
30
  ```
33
31
 
34
- ## Usage
32
+ ## Workflow
35
33
 
36
- 1. `tabs` -> find active tab
37
- 2. `snapshot` -> read page, get element refs [e1], [e2]...
38
- 3. `click`/`type`/`fill` using refs
39
- 4. If snapshot incomplete -> `screenshot` and analyze visually
34
+ 1. `snapshot` first - always start here to get element refs
35
+ 2. Use refs [e1], [e2]... with `click`/`type`/`fill`
36
+ 3. `snapshot` again after actions to see results
37
+ 4. **Only use `screenshot` if:**
38
+ - Snapshot is missing expected content
39
+ - Page has complex visuals (charts, images, canvas)
40
+ - Debugging why an action didn't work
40
41
 
41
42
  ## Examples
42
43
 
@@ -46,14 +47,11 @@ npx tab-agent navigate "https://google.com"
46
47
  npx tab-agent snapshot
47
48
  npx tab-agent type e1 "hello world"
48
49
  npx tab-agent press Enter
49
-
50
- # Read page content
51
- npx tab-agent snapshot
52
- npx tab-agent screenshot --full
50
+ npx tab-agent snapshot # See results
53
51
  ```
54
52
 
55
53
  ## Notes
56
54
 
57
- - Screenshot saves to /tmp/ and opens automatically
58
55
  - Refs reset on each snapshot - always snapshot before interacting
59
56
  - Keys: Enter, Escape, Tab, Backspace, ArrowUp/Down/Left/Right
57
+ - Prefer snapshot over screenshot - faster and text-based
@@ -16,19 +16,23 @@ curl -s http://localhost:9876/health || (npx tab-agent start &)
16
16
  ## Commands
17
17
 
18
18
  ```bash
19
- tabs # List active tabs
20
- snapshot # Page with refs [e1], [e2]...
21
- screenshot [--full] # Capture viewport/full page
22
- click <ref> # Click element
23
- type <ref> <text> # Type text
24
- fill <ref> <value> # Fill form field
25
- press <key> # Enter/Escape/Tab/Arrow*
26
- scroll <dir> [amount] # Scroll up/down
27
- navigate <url> # Go to URL
28
- wait <text|selector> # Wait for condition
29
- evaluate <script> # Run JavaScript
19
+ npx tab-agent snapshot # Page with refs [e1], [e2]...
20
+ npx tab-agent click <ref> # Click element
21
+ npx tab-agent type <ref> <text> # Type text
22
+ npx tab-agent fill <ref> <val> # Fill form field
23
+ npx tab-agent press <key> # Enter/Escape/Tab/Arrow*
24
+ npx tab-agent scroll <dir> [n] # Scroll up/down
25
+ npx tab-agent navigate <url> # Go to URL
26
+ npx tab-agent tabs # List active tabs
27
+ npx tab-agent wait <text|sel> # Wait for condition
28
+ npx tab-agent screenshot # Fallback only - if snapshot incomplete
30
29
  ```
31
30
 
32
- ## Flow
31
+ ## Workflow
33
32
 
34
- `snapshot` -> `click`/`type` -> repeat. Use `screenshot` if snapshot incomplete.
33
+ 1. Always `snapshot` first - get refs [e1], [e2]...
34
+ 2. `click`/`type`/`fill` using refs
35
+ 3. `snapshot` again to see results
36
+ 4. **Only screenshot if snapshot missing content** (charts, canvas, debugging)
37
+
38
+ Prefer snapshot over screenshot - faster and text-based.