tab-agent 0.2.1 → 0.2.2

Files changed (2)
  1. package/README.md +159 -104
  2. package/package.json +1 -1
package/README.md CHANGED
@@ -1,135 +1,161 @@
1
1
  # Tab Agent
2
2
 
3
- Secure tab-level browser control for Claude Code and Codex — only the tabs you explicitly activate, not your entire browser.
3
+ **Secure browser control for Claude Code and Codex** — only the tabs you explicitly activate, not your entire browser.
4
+
5
+ ```
6
+ ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
7
+ │  Claude/Codex   │────▶│  Relay Server   │────▶│    Extension    │
8
+ │                 │◀────│      :9876      │◀────│    (Chrome)     │
9
+ └─────────────────┘     └─────────────────┘     └─────────────────┘
10
+
11
+
12
+                                                ┌───────────────────┐
13
+                                                │  Your Active Tab  │
14
+                                                │       🟢 ON       │
15
+                                                └───────────────────┘
16
+ ```
17
+
18
+ ## Quick Start
19
+
20
+ ```bash
21
+ # 1. Clone and load extension
22
+ git clone https://github.com/DrHB/tab-agent
23
+ # → Open chrome://extensions → Enable Developer mode → Load unpacked → select extension/
24
+
25
+ # 2. Setup (auto-detects everything)
26
+ npx tab-agent setup
27
+
28
+ # 3. Use it
29
+ # → Click Tab Agent icon on a tab (turns green)
30
+ # → Ask Claude: "Use tab-agent to search Google for 'hello world'"
31
+ ```
32
+
33
+ ---
4
34
 
5
35
  ## Why Tab Agent?
6
36
 
7
- ### Security First
8
- Unlike browser automation tools that control your entire browser, Tab Agent uses a **click-to-activate** model:
9
- - Only tabs you explicitly activate (green badge) can be controlled
10
- - Your banking, email, and other sensitive tabs remain completely isolated
11
- - No background access you see exactly which tabs AI can interact with
12
- - Full audit logging of every action taken
37
+ ### 🔒 Security First
38
+
39
+ | | Tab Agent | Traditional Automation |
40
+ |--|-----------|----------------------|
41
+ | **Access** | Only tabs you activate | Entire browser |
42
+ | **Visibility** | Green badge = active | Hidden/background |
43
+ | **Sessions** | Uses your cookies | Requires re-login |
44
+ | **Credentials** | Never shared | Often required |
45
+ | **Audit** | Full action logging | Varies |
46
+
47
+ **Click-to-activate model:** Your banking, email, and sensitive tabs stay completely isolated. You always see exactly which tabs AI can control.
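
The click-to-activate model can be pictured as an allowlist of tab IDs that every command is checked against. The block below is a minimal sketch of that idea only. `ActivationRegistry`, `toggle`, and `route` are hypothetical names chosen for illustration and are not taken from Tab Agent's source.

```javascript
// Hypothetical sketch of click-to-activate gating.
// None of these names come from Tab Agent itself.
class ActivationRegistry {
  constructor() {
    // Tab IDs the user has explicitly switched on.
    this.active = new Set();
  }

  // Mirrors clicking the toolbar icon: flips the tab's state
  // and returns true if the tab is now active.
  toggle(tabId) {
    if (this.active.has(tabId)) {
      this.active.delete(tabId);
      return false;
    }
    this.active.add(tabId);
    return true;
  }

  // Refuses any command aimed at a tab that is not activated.
  route(tabId, command) {
    if (!this.active.has(tabId)) {
      return { ok: false, error: `tab ${tabId} is not activated` };
    }
    return { ok: true, tabId, command };
  }
}
```

Under this sketch a command for a banking or email tab is rejected before it reaches the page, because that tab's ID was never added to the set.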
48
+
49
+ ### 🍪 Works With Your Login Sessions
13
50
 
14
- ### Works With Your Session
15
- Tab Agent operates through a Chrome extension, which means:
16
- - **Uses your existing cookies and login sessions** — no need to re-authenticate
17
- - Access sites that require login (GitHub, Twitter, internal tools, etc.)
18
- - Works with SSO, 2FA-protected accounts, and enterprise apps
19
- - No credential sharing or token management needed
51
+ Because Tab Agent runs as a Chrome extension:
20
52
 
21
- ### AI-Optimized
22
- - **Semantic snapshots**: pages converted to AI-readable text with element refs `[e1]`, `[e2]`
23
- - **Screenshot fallback** — for complex/dynamic pages, get visual screenshots
24
- - **Smart element targeting** — click, type, fill using simple refs instead of fragile selectors
53
+ - **Uses your existing cookies** — no re-authentication needed
54
+ - **Access any site you're logged into**: GitHub, Twitter, Gmail, internal tools
55
+ - **Works with SSO and 2FA** — enterprise apps, protected accounts
56
+ - **No credential sharing** — your passwords stay in your browser
25
57
 
26
- ## Install
58
+ ### 🤖 AI-Optimized
27
59
 
28
- ### 1. Load Extension
60
+ - **Semantic snapshots** — pages converted to AI-readable text with refs `[e1]`, `[e2]`
61
+ - **Screenshot fallback** — for complex dynamic pages
62
+ - **Simple targeting** — click/type using refs instead of fragile CSS selectors
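
To illustrate why refs are easier to target than CSS selectors, here is a small sketch that pulls refs such as `[e1]` out of snapshot text. The README only states that snapshots contain refs of that form. The sample line layout and the `extractRefs` helper are assumptions made for this example.

```javascript
// Assumed snapshot layout: one element per line, each tagged with a ref like [e1].
// extractRefs is a hypothetical helper, not part of Tab Agent.
function extractRefs(snapshotText) {
  const refs = [];
  const seen = new Set();
  for (const match of snapshotText.matchAll(/\[(e\d+)\]/g)) {
    const ref = match[1];
    if (!seen.has(ref)) {
      seen.add(ref); // keep first occurrence only, in document order
      refs.push(ref);
    }
  }
  return refs;
}

const sample = 'textbox "Search" [e1]\nbutton "Google Search" [e2]\nlink "About" [e3]';
console.log(extractRefs(sample)); // [ 'e1', 'e2', 'e3' ]
```

A ref stays a short opaque token, so the AI can say "click e2" without knowing anything about the page's markup.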
63
+
64
+ ---
65
+
66
+ ## Example Use Cases
67
+
68
+ **Web Research**
69
+ > "Go to Hacker News and summarize the top 5 articles"
70
+
71
+ **Authenticated Actions** (uses your session!)
72
+ > "Check my GitHub notifications and list the unread ones"
73
+
74
+ **Form Automation**
75
+ > "Fill out this contact form with my details"
76
+
77
+ **Data Extraction**
78
+ > "Get the last 20 tweets from my timeline with author names"
79
+
80
+ **Multi-step Workflows**
81
+ > "Search Amazon for 'mechanical keyboard', filter by 4+ stars, and list the top 3"
82
+
83
+ ---
84
+
85
+ ## Installation
86
+
87
+ ### Step 1: Load Extension
29
88
 
30
89
  ```bash
31
90
  git clone https://github.com/DrHB/tab-agent
32
91
  ```
33
92
 
34
- 1. Open `chrome://extensions`
35
- 2. Enable **Developer mode** (top right)
36
- 3. Click **Load unpacked** → select the `extension/` folder
37
- 4. You'll see the Tab Agent icon in your toolbar
93
+ 1. Open `chrome://extensions` in your browser
94
+ 2. Enable **Developer mode** (toggle in top right)
95
+ 3. Click **Load unpacked**
96
+ 4. Select the `extension/` folder
97
+ 5. You'll see the Tab Agent icon in your toolbar
38
98
 
39
- ### 2. Run Setup
99
+ ### Step 2: Run Setup
40
100
 
41
101
  ```bash
42
102
  npx tab-agent setup
43
103
  ```
44
104
 
45
- This auto-detects your extension and configures everything (native messaging + skills).
46
-
47
- ### 3. Use It
105
+ This automatically:
106
+ - Detects your extension ID
107
+ - Configures native messaging
108
+ - Installs the Claude/Codex skill
48
109
 
49
- 1. **Click the Tab Agent icon** on any tab you want to control (turns green = active)
50
- 2. **Ask Claude/Codex:**
51
- - "Use tab-agent to search Google for 'best restaurants nearby'"
52
- - "Go to my GitHub notifications and summarize them"
53
- - "Fill out this form with my details"
54
-
55
- ## Example Use Cases
110
+ ### Step 3: Activate & Use
56
111
 
57
- ### Web Research
58
- ```
59
- "Go to Hacker News and get me the top 5 articles with summaries"
60
- ```
61
-
62
- ### Authenticated Actions
63
- ```
64
- "Check my GitHub notifications and mark the resolved ones as read"
65
- ```
66
- Works because Tab Agent uses your existing GitHub session!
112
+ 1. Navigate to any webpage
113
+ 2. **Click the Tab Agent icon** — it turns green (🟢 ON)
114
+ 3. Ask your AI to interact with the page
67
115
 
68
- ### Form Automation
69
- ```
70
- "Fill out this job application with my resume details"
71
- ```
116
+ ---
72
117
 
73
- ### Data Extraction
74
- ```
75
- "Go to my Twitter timeline and get the last 20 tweets"
76
- ```
118
+ ## Commands Reference
77
119
 
78
- ## Commands
120
+ ### Navigation & Viewing
121
+ | Command | Description |
122
+ |---------|-------------|
123
+ | `tabs` | List all activated tabs |
124
+ | `navigate` | Go to a URL |
125
+ | `snapshot` | Get AI-readable page with element refs |
126
+ | `screenshot` | Capture viewport image |
127
+ | `screenshot fullPage` | Capture entire page |
79
128
 
129
+ ### Interaction
80
130
  | Command | Description |
81
131
  |---------|-------------|
82
- | `tabs` | List activated tabs |
83
- | `snapshot` | Get AI-readable page with refs [e1], [e2]... |
84
- | `screenshot` | Capture viewport (add `fullPage: true` for full page) |
85
132
  | `click` | Click element by ref |
86
- | `fill` | Fill form field |
87
- | `type` | Type text (add `submit: true` to press Enter) |
88
- | `press` | Press key (Enter, Escape, Tab, Arrow keys) |
89
- | `scroll` | Scroll page up/down |
90
- | `scrollintoview` | Scroll element into view |
91
- | `navigate` | Go to URL |
92
- | `wait` | Wait for text or selector to appear |
93
- | `evaluate` | Run JavaScript in page context |
133
+ | `type` | Type text into element |
134
+ | `type ... submit` | Type and press Enter |
135
+ | `fill` | Fill a form field |
94
136
  | `batchfill` | Fill multiple fields at once |
95
- | `dialog` | Handle alert/confirm/prompt dialogs |
137
+ | `press` | Press a key (Enter, Escape, Tab, Arrows) |
96
138
 
97
- ## CLI Commands
139
+ ### Page Control
140
+ | Command | Description |
141
+ |---------|-------------|
142
+ | `scroll` | Scroll up/down by amount |
143
+ | `scrollintoview` | Scroll element into view |
144
+ | `wait` | Wait for text or element to appear |
145
+ | `evaluate` | Run JavaScript in page context |
146
+ | `dialog` | Handle alert/confirm/prompt |
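
As a rough illustration of how the commands in these tables might be serialized, the sketch below validates a command name and produces a JSON string. The command names come from the tables, and the `submit` option appears in the earlier README text. The message shape (`action`, `ref`, `text`) and the `buildCommand` helper are assumptions, not the documented wire format.

```javascript
// Command names as listed in the README tables.
const COMMANDS = new Set([
  'tabs', 'navigate', 'snapshot', 'screenshot',
  'click', 'type', 'fill', 'batchfill', 'press',
  'scroll', 'scrollintoview', 'wait', 'evaluate', 'dialog',
]);

// Hypothetical helper: the { action, ...params } shape is an assumption.
function buildCommand(action, params = {}) {
  if (!COMMANDS.has(action)) {
    throw new Error(`unknown command: ${action}`);
  }
  return JSON.stringify({ action, ...params });
}

console.log(buildCommand('type', { ref: 'e1', text: 'hello world', submit: true }));
// {"action":"type","ref":"e1","text":"hello world","submit":true}
```

Rejecting unknown names up front keeps typos from reaching the relay.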
98
147
 
99
- ```bash
100
- npx tab-agent setup # Configure everything (run once)
101
- npx tab-agent status # Check if everything is working
102
- npx tab-agent start # Manually start the relay server
103
- ```
148
+ ---
104
149
 
105
- ## How It Works
150
+ ## CLI Reference
106
151
 
152
+ ```bash
153
+ npx tab-agent setup # Initial configuration
154
+ npx tab-agent status # Check if everything works
155
+ npx tab-agent start # Start relay server manually
107
156
  ```
108
- ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
109
- │  Claude/Codex   │────▶│  Relay Server   │────▶│    Extension    │
110
- │    (Your AI)    │◀────│   (WebSocket)   │◀────│    (Chrome)     │
111
- └─────────────────┘     └─────────────────┘     └─────────────────┘
112
-                                :9876                     │
113
-
114
-                                                 ┌─────────────────┐
115
-                                                 │  Activated Tab  │
116
-                                                 │  (Green = ON)   │
117
-                                                 └─────────────────┘
118
- ```
119
-
120
- 1. **Extension** runs in Chrome with access to your tabs and sessions
121
- 2. **Relay Server** bridges WebSocket (AI) ↔ Native Messaging (Extension)
122
- 3. **AI** sends commands, receives snapshots/screenshots, takes actions
123
-
124
- ## Security Model
125
157
 
126
- | Feature | Tab Agent | Traditional Automation |
127
- |---------|-----------|----------------------|
128
- | Tab Access | Only activated tabs | All tabs or new browser |
129
- | Sessions | Uses existing cookies | Requires re-login |
130
- | Visibility | Green badge shows active | Hidden/background |
131
- | Audit | Full action logging | Varies |
132
- | Credentials | Never shared | Often required |
158
+ ---
133
159
 
134
160
  ## Supported Browsers
135
161
 
@@ -138,21 +164,50 @@ npx tab-agent start # Manually start the relay server
138
164
  - Microsoft Edge
139
165
  - Chromium
140
166
 
141
- The setup automatically detects which browser you're using.
167
+ Setup automatically detects your browser.
168
+
169
+ ---
142
170
 
143
171
  ## Troubleshooting
144
172
 
145
173
  **Extension not detected?**
146
- - Make sure you loaded the `extension/` folder in chrome://extensions
147
- - Check that Developer mode is enabled
174
+ - Ensure the `extension/` folder is loaded in `chrome://extensions`
175
+ - Developer mode must be enabled
176
+ - Try refreshing the extensions page
177
+
178
+ **Tab not responding?**
179
+ - Click the Tab Agent icon; it must show the green "ON" badge
180
+ - Refresh the page after activating
181
+
182
+ **Relay connection issues?**
183
+ - Run `npx tab-agent status` to check config
184
+ - Run `npx tab-agent start` to see error details
185
+
186
+ ---
148
187
 
149
- **Commands not working?**
150
- - Click the Tab Agent icon to activate the tab (must show green "ON")
151
- - Run `npx tab-agent status` to check configuration
188
+ ## How It Works
189
+
190
+ 1. **Chrome Extension**: Runs in your browser with access to activated tabs and your session cookies
191
+
192
+ 2. **Relay Server** — Local WebSocket server (port 9876) that bridges AI ↔ Extension via Chrome's Native Messaging API
152
193
 
153
- **Relay not connecting?**
154
- - Run `npx tab-agent start` manually to see any errors
194
+ 3. **Skill File** — Tells Claude/Codex how to send commands to the relay
195
+
196
+ **Data flow:**
197
+ ```
198
+ You: "Search Google for cats"
199
+
200
+ Claude/Codex → WebSocket command → Relay Server → Native Messaging → Extension → DOM action
201
+
202
+ Results ← WebSocket response ← Relay Server ← Native Messaging ← Page snapshot
203
+ ```
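
The Native Messaging hop in this flow uses a simple framing rule defined by Chrome: each message is UTF-8 JSON preceded by a 32-bit length in the machine's native byte order. The sketch below shows that framing in Node. The helper names and the sample payloads are illustrative assumptions, and this is not the relay's actual code.

```javascript
// Detect native byte order without extra imports.
const LITTLE_ENDIAN = new Uint8Array(new Uint32Array([1]).buffer)[0] === 1;

// Wrap one JSON-serializable message in a Native Messaging frame:
// 4-byte length header (native byte order) followed by the UTF-8 JSON body.
function encodeFrame(message) {
  const body = Buffer.from(JSON.stringify(message), 'utf8');
  const header = Buffer.alloc(4);
  if (LITTLE_ENDIAN) header.writeUInt32LE(body.length, 0);
  else header.writeUInt32BE(body.length, 0);
  return Buffer.concat([header, body]);
}

// Decode every complete frame in a buffer. Bytes belonging to an
// incomplete trailing frame are returned in `rest` for the next read.
function decodeFrames(buffer) {
  const messages = [];
  let offset = 0;
  while (buffer.length - offset >= 4) {
    const length = LITTLE_ENDIAN
      ? buffer.readUInt32LE(offset)
      : buffer.readUInt32BE(offset);
    if (buffer.length - offset - 4 < length) break; // body not fully received yet
    const json = buffer.toString('utf8', offset + 4, offset + 4 + length);
    messages.push(JSON.parse(json));
    offset += 4 + length;
  }
  return { messages, rest: buffer.subarray(offset) };
}
```

A host would typically append each stdin chunk to `rest` and call `decodeFrames` again, since a single read may contain a partial frame or several frames at once.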
204
+
205
+ ---
155
206
 
156
207
  ## License
157
208
 
158
209
  MIT
210
+
211
+ ---
212
+
213
+ **Made for [Claude Code](https://claude.ai/code) and [Codex](https://openai.com/codex)**
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "tab-agent",
3
- "version": "0.2.1",
3
+ "version": "0.2.2",
4
4
  "description": "Browser control for Claude Code and Codex via WebSocket",
5
5
  "bin": {
6
6
  "tab-agent": "./bin/tab-agent.js"