tab-agent 0.2.0 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2) hide show
  1. package/README.md +114 -18
  2. package/package.json +9 -2
package/README.md CHANGED
@@ -2,28 +2,78 @@
2
2
 
3
3
  Secure tab-level browser control for Claude Code and Codex — only the tabs you explicitly activate, not your entire browser.
4
4
 
5
+ ## Why Tab Agent?
6
+
7
+ ### Security First
8
+ Unlike browser automation tools that control your entire browser, Tab Agent uses a **click-to-activate** model:
9
+ - Only tabs you explicitly activate (green badge) can be controlled
10
+ - Your banking, email, and other sensitive tabs remain completely isolated
11
+ - No background access — you see exactly which tabs AI can interact with
12
+ - Full audit logging of every action taken
13
+
14
+ ### Works With Your Session
15
+ Tab Agent operates through a Chrome extension, which means:
16
+ - **Uses your existing cookies and login sessions** — no need to re-authenticate
17
+ - Access sites that require login (GitHub, Twitter, internal tools, etc.)
18
+ - Works with SSO, 2FA-protected accounts, and enterprise apps
19
+ - No credential sharing or token management needed
20
+
21
+ ### AI-Optimized
22
+ - **Semantic snapshots** — pages converted to AI-readable text with element refs `[e1]`, `[e2]`
23
+ - **Screenshot fallback** — for complex/dynamic pages, get visual screenshots
24
+ - **Smart element targeting** — click, type, fill using simple refs instead of fragile selectors
25
+
5
26
  ## Install
6
27
 
7
28
  ### 1. Load Extension
29
+
8
30
  ```bash
9
31
  git clone https://github.com/DrHB/tab-agent
10
32
  ```
11
33
 
12
34
  1. Open `chrome://extensions`
13
- 2. Enable **Developer mode**
14
- 3. Click **Load unpacked** → select `extension/` folder
35
+ 2. Enable **Developer mode** (top right)
36
+ 3. Click **Load unpacked** → select the `extension/` folder
37
+ 4. You'll see the Tab Agent icon in your toolbar
38
+
39
+ ### 2. Run Setup
15
40
 
16
- ### 2. Setup
17
41
  ```bash
18
42
  npx tab-agent setup
19
43
  ```
20
44
 
21
- That's it! The setup auto-detects your extension and configures everything.
45
+ This auto-detects your extension and configures everything (native messaging + skills).
46
+
47
+ ### 3. Use It
22
48
 
23
- ## Use
49
+ 1. **Click the Tab Agent icon** on any tab you want to control (turns green = active)
50
+ 2. **Ask Claude/Codex:**
51
+ - "Use tab-agent to search Google for 'best restaurants nearby'"
52
+ - "Go to my GitHub notifications and summarize them"
53
+ - "Fill out this form with my details"
54
+
55
+ ## Example Use Cases
56
+
57
+ ### Web Research
58
+ ```
59
+ "Go to Hacker News and get me the top 5 articles with summaries"
60
+ ```
61
+
62
+ ### Authenticated Actions
63
+ ```
64
+ "Check my GitHub notifications and mark the resolved ones as read"
65
+ ```
66
+ Works because Tab Agent uses your existing GitHub session!
67
+
68
+ ### Form Automation
69
+ ```
70
+ "Fill out this job application with my resume details"
71
+ ```
24
72
 
25
- 1. Click Tab Agent icon on any tab (turns green = active)
26
- 2. Ask Claude/Codex: "Use tab-agent to search Google for 'hello world'"
73
+ ### Data Extraction
74
+ ```
75
+ "Go to my Twitter timeline and get the last 20 tweets"
76
+ ```
27
77
 
28
78
  ## Commands
29
79
 
@@ -31,32 +81,78 @@ That's it! The setup auto-detects your extension and configures everything.
31
81
  |---------|-------------|
32
82
  | `tabs` | List activated tabs |
33
83
  | `snapshot` | Get AI-readable page with refs [e1], [e2]... |
34
- | `screenshot` | Capture viewport (or `fullPage: true` for full page) |
84
+ | `screenshot` | Capture viewport (add `fullPage: true` for full page) |
35
85
  | `click` | Click element by ref |
36
86
  | `fill` | Fill form field |
37
- | `type` | Type text (with optional `submit: true`) |
38
- | `press` | Press key (Enter, Escape, Tab, Arrow*) |
39
- | `scroll` | Scroll page |
87
+ | `type` | Type text (add `submit: true` to press Enter) |
88
+ | `press` | Press key (Enter, Escape, Tab, Arrow keys) |
89
+ | `scroll` | Scroll page up/down |
40
90
  | `scrollintoview` | Scroll element into view |
41
91
  | `navigate` | Go to URL |
42
- | `wait` | Wait for text or selector |
92
+ | `wait` | Wait for text or selector to appear |
43
93
  | `evaluate` | Run JavaScript in page context |
44
94
  | `batchfill` | Fill multiple fields at once |
45
- | `dialog` | Handle alert/confirm/prompt |
95
+ | `dialog` | Handle alert/confirm/prompt dialogs |
46
96
 
47
- ## Manual Commands
97
+ ## CLI Commands
48
98
 
49
99
  ```bash
50
- npx tab-agent status # Check configuration
51
- npx tab-agent start # Start relay manually
100
+ npx tab-agent setup # Configure everything (run once)
101
+ npx tab-agent status # Check if everything is working
102
+ npx tab-agent start # Manually start the relay server
52
103
  ```
53
104
 
54
- ## Architecture
105
+ ## How It Works
55
106
 
56
107
  ```
57
- Claude/Codex → WebSocket:9876 → Relay → Native Messaging → Extension → DOM
108
+ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
109
+ │ Claude/Codex │────▶│ Relay Server │────▶│ Extension │
110
+ │ (Your AI) │◀────│ (WebSocket) │◀────│ (Chrome) │
111
+ └─────────────────┘ └─────────────────┘ └─────────────────┘
112
+ :9876 │
113
+
114
+ ┌─────────────────┐
115
+ │ Activated Tab │
116
+ │ (Green = ON) │
117
+ └─────────────────┘
58
118
  ```
59
119
 
120
+ 1. **Extension** runs in Chrome with access to your tabs and sessions
121
+ 2. **Relay Server** bridges WebSocket (AI) ↔ Native Messaging (Extension)
122
+ 3. **AI** sends commands, receives snapshots/screenshots, takes actions
123
+
124
+ ## Security Model
125
+
126
+ | Feature | Tab Agent | Traditional Automation |
127
+ |---------|-----------|----------------------|
128
+ | Tab Access | Only activated tabs | All tabs or new browser |
129
+ | Sessions | Uses existing cookies | Requires re-login |
130
+ | Visibility | Green badge shows active | Hidden/background |
131
+ | Audit | Full action logging | Varies |
132
+ | Credentials | Never shared | Often required |
133
+
134
+ ## Supported Browsers
135
+
136
+ - Google Chrome
137
+ - Brave
138
+ - Microsoft Edge
139
+ - Chromium
140
+
141
+ The setup automatically detects which browser you're using.
142
+
143
+ ## Troubleshooting
144
+
145
+ **Extension not detected?**
146
+ - Make sure you loaded the `extension/` folder in chrome://extensions
147
+ - Check that Developer mode is enabled
148
+
149
+ **Commands not working?**
150
+ - Click the Tab Agent icon to activate the tab (must show green "ON")
151
+ - Run `npx tab-agent status` to check configuration
152
+
153
+ **Relay not connecting?**
154
+ - Run `npx tab-agent start` manually to see any errors
155
+
60
156
  ## License
61
157
 
62
158
  MIT
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "tab-agent",
3
- "version": "0.2.0",
3
+ "version": "0.2.1",
4
4
  "description": "Browser control for Claude Code and Codex via WebSocket",
5
5
  "bin": {
6
6
  "tab-agent": "./bin/tab-agent.js"
@@ -19,7 +19,14 @@
19
19
  "dependencies": {
20
20
  "ws": "^8.16.0"
21
21
  },
22
- "keywords": ["chrome", "extension", "browser", "automation", "claude", "codex"],
22
+ "keywords": [
23
+ "chrome",
24
+ "extension",
25
+ "browser",
26
+ "automation",
27
+ "claude",
28
+ "codex"
29
+ ],
23
30
  "repository": "https://github.com/DrHB/tab-agent",
24
31
  "license": "MIT"
25
32
  }