npm - tab-agent - Versions diffs - 0.3.1 → 0.3.3 - Mend

tab-agent 0.3.1 → 0.3.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11) hide show

package/README.md +80 -145
package/bin/tab-agent.js +17 -15
package/cli/command.js +11 -11
package/cli/status.js +2 -2
package/package.json +23 -8
package/relay/install-native-host.sh +2 -2
package/relay/native-host-wrapper.sh +1 -1
package/relay/package.json +2 -2
package/relay/server.js +1 -1
package/skills/claude-code/tab-agent.md +13 -15
package/skills/codex/tab-agent.md +17 -13

package/README.md CHANGED Viewed

@@ -1,12 +1,15 @@
 # Tab Agent
 [![npm version](https://img.shields.io/npm/v/tab-agent.svg)](https://www.npmjs.com/package/tab-agent)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-**Give Claude, Codex, or any LLM full control of your browser tabs** — securely, with click-to-activate permission.
+**Give LLMs full control of your browser** — securely, with click-to-activate permission.
+Works with Claude, ChatGPT, Codex, and any AI that can run shell commands.
 ```
 ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
-│   Claude Code   │────▶│  Relay Server   │────▶│    Extension    │
+│  Claude / GPT   │────▶│  Relay Server   │────▶│    Extension    │
 │   Codex / LLM   │◀────│   (background)  │◀────│    (Chrome)     │
 └─────────────────┘     └─────────────────┘     └─────────────────┘
                                                        │
@@ -20,160 +23,103 @@
 ## Features
 - **Full browser control** — navigate, click, type, scroll, screenshot, run JavaScript
-- **Uses your login sessions** — access authenticated sites (GitHub, Gmail, X) without sharing credentials
-- **Runs in background** — relay server starts automatically, works while you do other things
-- **Click-to-activate security** — only tabs you explicitly enable, your other tabs stay private
-- **AI-optimized snapshots** — pages converted to readable text with element refs `[e1]`, `[e2]`
-- **Works with any LLM** — Claude Code, Codex, or any tool that can run shell commands
----
+- **Uses your login sessions** — access GitHub, Gmail, Amazon without sharing credentials
+- **Runs in background** — relay starts automatically, works while you do other things
+- **Click-to-activate security** — only tabs you explicitly enable, others stay private
+- **AI-optimized snapshots** — pages converted to text with refs `[e1]`, `[e2]` for easy targeting
+- **Works with any LLM** — Claude, ChatGPT, Codex, or custom AI agents
 ## Quick Start
 ```bash
-# 1. Clone and load extension
+# 1. Install extension
 git clone https://github.com/DrHB/tab-agent
-# → Chrome: chrome://extensions → Developer mode → Load unpacked → select extension/
+# Chrome: chrome://extensions → Developer mode → Load unpacked → select extension/
-# 2. Setup (auto-detects everything)
+# 2. Setup
 npx tab-agent setup
-# 3. Activate a tab & go!
-# → Click Tab Agent icon on any tab (turns green = active)
-# → Ask Claude: "Use tab-agent to search Google for 'hello world'"
+# 3. Activate & go
+# Click extension icon on any tab (turns green)
+# Ask your AI: "Search Amazon for mechanical keyboards and find the best rated"
 ```
----
-## Why Tab Agent?
-### 🔒 Security First
-| | Tab Agent | Traditional Automation |
-|--|-----------|----------------------|
-| **Access** | Only tabs you activate | Entire browser |
-| **Visibility** | Green badge = active | Hidden/background |
-| **Sessions** | Uses your cookies | Requires re-login |
-| **Credentials** | Never shared | Often required |
-**Click-to-activate model:** Your banking, email, and sensitive tabs stay completely isolated. You always see exactly which tabs the LLM can control.
-### 🍪 Works With Your Login Sessions
-Because Tab Agent runs as a Chrome extension:
+## Example Tasks
-- **Uses your existing cookies** — no re-authentication needed
-- **Access any site you're logged into** — GitHub, X, Gmail, internal tools
-- **Works with SSO and 2FA** — enterprise apps, protected accounts
-- **No credential sharing** — your passwords stay in your browser
-### 🤖 LLM-Optimized
-- **Semantic snapshots** — pages converted to readable text with refs `[e1]`, `[e2]`
-- **Screenshot fallback** — for complex dynamic pages
-- **Simple targeting** — click/type using refs instead of fragile CSS selectors
----
+```bash
+# Research
+"Go to Hacker News and summarize the top 5 stories"
-## Example Use Cases
+# Shopping (uses your login!)
+"Search Amazon for protein powder, filter by 4+ stars, find the best value"
-**Web Research**
-> "Go to Hacker News and summarize the top 5 articles"
+# Social Media
+"Check my GitHub notifications and list unread ones"
-**Authenticated Actions** (uses your session!)
-> "Check my GitHub notifications and list the unread ones"
+# Data Extraction
+"Get the titles and prices of the first 10 products on this page"
-**Form Automation**
-> "Fill out this contact form with my details"
+# Automation
+"Fill out this form with my details"
+```
-**Data Extraction**
-> "Get the last 20 posts from my X timeline with author names"
+## Commands
-**Multi-step Workflows**
-> "Search Amazon for 'mechanical keyboard', filter by 4+ stars, and list the top 3"
+```bash
+# Core workflow
+npx tab-agent snapshot                # Get page content with refs [e1], [e2]...
+npx tab-agent click <ref>             # Click element (e.g., click e5)
+npx tab-agent type <ref> <text>       # Type into element
+npx tab-agent fill <ref> <value>      # Fill form field
+# Navigation
+npx tab-agent navigate <url>          # Go to URL
+npx tab-agent scroll <dir> [amount]   # Scroll up/down
+npx tab-agent press <key>             # Press key (Enter, Escape, Tab)
+# Utilities
+npx tab-agent tabs                    # List active tabs
+npx tab-agent wait <text>             # Wait for text to appear
+npx tab-agent screenshot              # Capture page (fallback for complex UIs)
+```
----
+**Workflow:** `snapshot` → use refs → `click`/`type` → `snapshot` again → repeat
 ## Installation
-### Step 1: Load Extension
+### 1. Load Extension
 ```bash
 git clone https://github.com/DrHB/tab-agent
 ```
-1. Open `chrome://extensions` in your browser
-2. Enable **Developer mode** (toggle in top right)
+1. Open `chrome://extensions`
+2. Enable **Developer mode** (top right)
 3. Click **Load unpacked**
 4. Select the `extension/` folder
-5. You'll see the Tab Agent icon in your toolbar
-### Step 2: Run Setup
+### 2. Run Setup
 ```bash
 npx tab-agent setup
 ```
-This automatically:
-- Detects your extension ID
-- Configures native messaging
-- Installs the Claude/Codex skill
+This auto-detects your extension and configures everything.
-### Step 3: Activate & Use
+### 3. Activate Tabs
-1. Navigate to any webpage
-2. **Click the Tab Agent icon** — it turns green (🟢 ON)
-3. Ask your LLM to interact with the page
+Click the Tab Agent icon on any tab you want to control. Green = active.
----
-## Commands Reference
-### Navigation & Viewing
-| Command | Description |
-|---------|-------------|
-| `tabs` | List all activated tabs |
-| `navigate` | Go to a URL |
-| `snapshot` | Get page with element refs |
-| `screenshot` | Capture viewport image |
-| `screenshot --full` | Capture entire page |
-### Interaction
-| Command | Description |
-|---------|-------------|
-| `click` | Click element by ref |
-| `type` | Type text into element |
-| `fill` | Fill a form field |
-| `press` | Press a key (Enter, Escape, Tab, Arrows) |
-### Page Control
-| Command | Description |
-|---------|-------------|
-| `scroll` | Scroll up/down by amount |
-| `wait` | Wait for text or element to appear |
-| `evaluate` | Run JavaScript in page context |
----
-## CLI Usage
+## Security Model
-```bash
-# Setup & Status
-npx tab-agent setup                   # Initial configuration
-npx tab-agent status                  # Check if everything works
-npx tab-agent start                   # Start relay server manually
+| Feature | Tab Agent | Traditional Automation |
+|---------|--------------|----------------------|
+| **Access** | Only tabs you click to activate | Entire browser |
+| **Sessions** | Uses your cookies | Requires credentials |
+| **Visibility** | Green badge shows active tabs | Hidden/background |
+| **Control** | You choose what AI can access | Full access by default |
-# Browser Commands
-npx tab-agent tabs                    # List active tabs
-npx tab-agent snapshot                # Get page content with refs
-npx tab-agent screenshot              # Capture viewport
-npx tab-agent screenshot --full       # Capture full page
-npx tab-agent click e5                # Click element
-npx tab-agent type e3 "hello"         # Type text
-npx tab-agent navigate "https://..."  # Go to URL
-```
----
+Your banking, email, and sensitive tabs stay completely isolated unless you explicitly activate them.
 ## Supported Browsers
@@ -182,50 +128,39 @@ npx tab-agent navigate "https://..."  # Go to URL
 - Microsoft Edge
 - Chromium
-Setup automatically detects your browser.
----
 ## Troubleshooting
 **Extension not detected?**
-- Ensure `extension/` folder is loaded in chrome://extensions
-- Developer mode must be enabled
-- Try refreshing the extensions page
+- Make sure Developer mode is enabled in chrome://extensions
+- Reload the extension
-**Tab not responding?**
-- Click the Tab Agent icon — must show green "ON" badge
-- Refresh the page after activating
+**Commands not working?**
+- Click the extension icon — must show green "ON"
+- Run `npx tab-agent status` to check configuration
-**Relay connection issues?**
-- Run `npx tab-agent status` to check config
-- Run `npx tab-agent start` to see error details
----
+**No active tabs?**
+- Activate at least one tab by clicking the extension icon
 ## How It Works
-1. **Chrome Extension** — Runs in your browser with access to activated tabs and your session cookies
+1. **Chrome Extension** — Injects into activated tabs, captures DOM snapshots
+2. **Relay Server** — Bridges AI ↔ Extension via Chrome Native Messaging (runs in background)
+3. **CLI** — Simple commands that any LLM can execute
-2. **Relay Server** — Local WebSocket server that bridges LLM ↔ Extension via Chrome's Native Messaging API (runs in background)
-3. **Skill File** — Tells Claude/Codex how to send commands
-**Data flow:**
 ```
-You: "Search Google for cats"
+You: "Find cheap flights to Tokyo"
  ↓
-LLM → CLI command → Relay Server → Native Messaging → Extension → Browser action
- ↑
-Results ← Response ← Relay Server ← Native Messaging ← Page snapshot
+LLM → npx tab-agent navigate "google.com/flights"
+    → npx tab-agent snapshot
+    → npx tab-agent type e5 "Tokyo"
+    → npx tab-agent click e12
+    → ...
 ```
----
 ## License
 MIT
 ---
-**Works with [Claude Code](https://claude.ai/code), [Codex](https://openai.com/codex), and any LLM that can run shell commands.**
+**Keywords:** browser agent, browser automation, AI browser control, Claude browser, ChatGPT browser, LLM web automation, Codex browser, puppeteer alternative, playwright alternative

package/bin/tab-agent.js CHANGED Viewed

@@ -30,31 +30,33 @@ if (BROWSER_COMMANDS.includes(command)) {
 function showHelp() {
   console.log(`
-tab-agent - Browser control for Claude/Codex
+tabpilot - Give LLMs full control of your browser
-Setup Commands:
-  setup    Auto-detect extension, register native host, install skills
+Setup:
+  setup    Auto-detect extension, configure native messaging
   start    Start the relay server
-  status   Check configuration status
+  status   Check configuration
-Browser Commands:
-  tabs                      List active tabs
-  snapshot                  Get AI-readable page content
-  screenshot [--full]       Capture screenshot
+Browser Control:
+  snapshot                  Get page content with refs [e1], [e2]...
   click <ref>               Click element (e.g., click e5)
-  type <ref> <text>         Type text into element
+  type <ref> <text>         Type into element
   fill <ref> <value>        Fill form field
-  press <key>               Press key (Enter, Escape, etc.)
+  press <key>               Press key (Enter, Escape, Tab)
   scroll <dir> [amount]     Scroll up/down
   navigate <url>            Go to URL
+  tabs                      List active tabs
   wait <text|selector>      Wait for text or element
-  evaluate <script>         Run JavaScript
+  screenshot [--full]       Capture page (fallback)
+Workflow: snapshot → click/type → snapshot → repeat
 Examples:
-  npx tab-agent setup
-  npx tab-agent snapshot
-  npx tab-agent click e5
-  npx tab-agent type e3 "hello world"
+  npx tabpilot setup
+  npx tabpilot snapshot
+  npx tabpilot click e5
+  npx tabpilot type e3 "hello world"
+  npx tabpilot navigate "https://google.com"
 Version: ${require('../package.json').version}
 `);

package/cli/command.js CHANGED Viewed

@@ -130,30 +130,30 @@ function buildPayload(command, params, tabId) {
 function printHelp() {
   console.log(`
-tab-agent - Browser control commands
+tab-agent - Give LLMs full control of your browser
 Usage: npx tab-agent <command> [options]
 Commands:
-  tabs                      List active tabs
-  snapshot                  Get AI-readable page content
-  screenshot [--full]       Capture screenshot (--full for full page)
+  snapshot                  Get page content with refs [e1], [e2]...
   click <ref>               Click element (e.g., click e5)
-  type <ref> <text>         Type text into element
+  type <ref> <text>         Type into element
   fill <ref> <value>        Fill form field
-  press <key>               Press key (Enter, Escape, Tab, etc.)
-  scroll <dir> [amount]     Scroll up/down (default: 500px)
+  press <key>               Press key (Enter, Escape, Tab)
+  scroll <dir> [amount]     Scroll up/down
   navigate <url>            Go to URL
+  tabs                      List active tabs
   wait <text|selector>      Wait for text or element
+  screenshot [--full]       Capture page (fallback)
   evaluate <script>         Run JavaScript
+Workflow: snapshot → click/type → snapshot → repeat
 Examples:
-  npx tab-agent tabs
   npx tab-agent snapshot
   npx tab-agent click e5
-  npx tab-agent type e3 hello world
-  npx tab-agent navigate https://google.com
-  npx tab-agent screenshot --full
+  npx tab-agent type e3 "hello world"
+  npx tab-agent navigate "https://google.com"
 `);
 }

package/cli/status.js CHANGED Viewed

@@ -54,8 +54,8 @@ async function status() {
   const claudeSkill = path.join(home, '.claude', 'skills', 'tab-agent.md');
   const codexSkill = path.join(home, '.codex', 'skills', 'tab-agent.md');
-  console.log(`\nClaude Skill: ${fs.existsSync(claudeSkill) ? 'Installed' : 'Not installed'} ${claudeSkill}`);
-  console.log(`Codex Skill:  ${fs.existsSync(codexSkill) ? 'Installed' : 'Not installed (optional)'} ${codexSkill}`);
+  console.log(`\nClaude Skill: ${fs.existsSync(claudeSkill) ? 'Installed' : 'Not installed'}`);
+  console.log(`Codex Skill:  ${fs.existsSync(codexSkill) ? 'Installed' : 'Not installed (optional)'}`);
   // Check relay server
   console.log('\nRelay Server:');

package/package.json CHANGED Viewed

@@ -1,7 +1,7 @@
 {
   "name": "tab-agent",
-  "version": "0.3.1",
-  "description": "Browser control for Claude Code and Codex via WebSocket",
+  "version": "0.3.3",
+  "description": "Give LLMs full control of your browser - secure, click-to-activate automation for Claude, ChatGPT, Codex, and any AI",
   "bin": {
     "tab-agent": "./bin/tab-agent.js"
   },
@@ -20,13 +20,28 @@
     "ws": "^8.16.0"
   },
   "keywords": [
-    "chrome",
-    "extension",
-    "browser",
-    "automation",
+    "tab-agent",
+    "browser-agent",
+    "browser-automation",
+    "browser-control",
+    "ai-browser",
+    "llm-browser",
     "claude",
-    "codex"
+    "chatgpt",
+    "codex",
+    "openai",
+    "anthropic",
+    "chrome-extension",
+    "web-automation",
+    "ai-agent",
+    "puppeteer-alternative",
+    "playwright-alternative",
+    "web-agent"
   ],
-  "repository": "https://github.com/DrHB/tab-agent",
+  "repository": {
+    "type": "git",
+    "url": "https://github.com/DrHB/tab-agent"
+  },
+  "homepage": "https://github.com/DrHB/tab-agent#readme",
   "license": "MIT"
 }

package/relay/install-native-host.sh CHANGED Viewed

@@ -2,7 +2,7 @@
 set -e
 SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
-HOST_NAME="com.tabagent.relay"
+HOST_NAME="com.tabpilot.relay"
 HOST_DIR="$HOME/Library/Application Support/TabAgent"
 WRAPPER_PATH="$HOST_DIR/native-host-wrapper.sh"
@@ -40,7 +40,7 @@ cp -R "$SCRIPT_DIR/node_modules" "$HOST_DIR/node_modules"
 cat > "$MANIFEST_DIR/$HOST_NAME.json" << EOF
 {
   "name": "$HOST_NAME",
-  "description": "Tab Agent Native Messaging Host",
+  "description": "TabPilot Native Messaging Host",
   "path": "$WRAPPER_PATH",
   "type": "stdio",
   "allowed_origins": [

package/relay/native-host-wrapper.sh CHANGED Viewed

@@ -6,7 +6,7 @@ cd "$SCRIPT_DIR"
 LOG_FILE="$SCRIPT_DIR/wrapper.log"
 echo "$(date): Starting native host from $SCRIPT_DIR" >> "$LOG_FILE"
-export TAB_AGENT_LOG="/tmp/tab-agent-native-host.log"
+export TAB_AGENT_LOG="/tmp/tabpilot-native-host.log"
 NODE_BIN="/opt/homebrew/bin/node"
 if [ ! -x "$NODE_BIN" ]; then

package/relay/package.json CHANGED Viewed

@@ -1,7 +1,7 @@
 {
-  "name": "tab-agent-relay",
+  "name": "tabpilot-relay",
   "version": "0.1.0",
-  "description": "WebSocket relay for Tab Agent Chrome extension",
+  "description": "WebSocket relay for TabPilot Chrome extension",
   "main": "server.js",
   "scripts": {
     "start": "node server.js"

package/relay/server.js CHANGED Viewed

@@ -102,7 +102,7 @@ wss.on('connection', (ws, req) => {
 });
 httpServer.listen(PORT, () => {
-  console.log(`Tab Agent Relay running on ws://localhost:${PORT}`);
+  console.log(`TabPilot Relay running on ws://localhost:${PORT}`);
   console.log(`Health check: http://localhost:${PORT}/health`);
 });

package/skills/claude-code/tab-agent.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: tab-agent
-description: Browser control via CLI - snapshot, click, type, fill, screenshot
+description: Browser control via CLI - snapshot, click, type, navigate
 ---
 # Tab Agent
@@ -17,26 +17,27 @@ sleep 2
 ## Commands
 ```bash
-npx tab-agent tabs                    # List active tabs
 npx tab-agent snapshot                # Get page with refs [e1], [e2]...
-npx tab-agent screenshot              # Capture viewport
-npx tab-agent screenshot --full       # Capture full page
 npx tab-agent click <ref>             # Click element
 npx tab-agent type <ref> <text>       # Type text
 npx tab-agent fill <ref> <value>      # Fill form field
 npx tab-agent press <key>             # Press key (Enter, Escape, Tab)
 npx tab-agent scroll <dir> [amount]   # Scroll up/down
 npx tab-agent navigate <url>          # Go to URL
+npx tab-agent tabs                    # List active tabs
 npx tab-agent wait <text|selector>    # Wait for condition
-npx tab-agent evaluate <script>       # Run JavaScript
+npx tab-agent screenshot              # Capture page (fallback only)
 ```
-## Usage
+## Workflow
-1. `tabs` -> find active tab
-2. `snapshot` -> read page, get element refs [e1], [e2]...
-3. `click`/`type`/`fill` using refs
-4. If snapshot incomplete -> `screenshot` and analyze visually
+1. `snapshot` first - always start here to get element refs
+2. Use refs [e1], [e2]... with `click`/`type`/`fill`
+3. `snapshot` again after actions to see results
+4. **Only use `screenshot` if:**
+   - Snapshot is missing expected content
+   - Page has complex visuals (charts, images, canvas)
+   - Debugging why an action didn't work
 ## Examples
@@ -46,14 +47,11 @@ npx tab-agent navigate "https://google.com"
 npx tab-agent snapshot
 npx tab-agent type e1 "hello world"
 npx tab-agent press Enter
-# Read page content
-npx tab-agent snapshot
-npx tab-agent screenshot --full
+npx tab-agent snapshot  # See results
 ```
 ## Notes
-- Screenshot saves to /tmp/ and opens automatically
 - Refs reset on each snapshot - always snapshot before interacting
 - Keys: Enter, Escape, Tab, Backspace, ArrowUp/Down/Left/Right
+- Prefer snapshot over screenshot - faster and text-based

package/skills/codex/tab-agent.md CHANGED Viewed

@@ -16,19 +16,23 @@ curl -s http://localhost:9876/health || (npx tab-agent start &)
 ## Commands
 ```bash
-tabs                         # List active tabs
-snapshot                     # Page with refs [e1], [e2]...
-screenshot [--full]          # Capture viewport/full page
-click <ref>                  # Click element
-type <ref> <text>            # Type text
-fill <ref> <value>           # Fill form field
-press <key>                  # Enter/Escape/Tab/Arrow*
-scroll <dir> [amount]        # Scroll up/down
-navigate <url>               # Go to URL
-wait <text|selector>         # Wait for condition
-evaluate <script>            # Run JavaScript
+npx tab-agent snapshot          # Page with refs [e1], [e2]...
+npx tab-agent click <ref>       # Click element
+npx tab-agent type <ref> <text> # Type text
+npx tab-agent fill <ref> <val>  # Fill form field
+npx tab-agent press <key>       # Enter/Escape/Tab/Arrow*
+npx tab-agent scroll <dir> [n]  # Scroll up/down
+npx tab-agent navigate <url>    # Go to URL
+npx tab-agent tabs              # List active tabs
+npx tab-agent wait <text|sel>   # Wait for condition
+npx tab-agent screenshot        # Fallback only - if snapshot incomplete
 ```
-## Flow
+## Workflow
-`snapshot` -> `click`/`type` -> repeat. Use `screenshot` if snapshot incomplete.
+1. Always `snapshot` first - get refs [e1], [e2]...
+2. `click`/`type`/`fill` using refs
+3. `snapshot` again to see results
+4. **Only screenshot if snapshot missing content** (charts, canvas, debugging)
+Prefer snapshot over screenshot - faster and text-based.