npm - tab-agent - Versions diffs - 0.2.3 → 0.3.1 - Mend

tab-agent 0.2.3 → 0.3.1

Files changed (7)

package/README.md +50 -32
package/bin/tab-agent.js +52 -28
package/cli/command.js +200 -0
package/extension/service-worker.js +6 -15
package/package.json +1 -1
package/skills/claude-code/tab-agent.md +34 -28
package/skills/codex/tab-agent.md +15 -21

package/README.md CHANGED

@@ -1,32 +1,45 @@
 # Tab Agent
-**Secure browser control for Claude Code and Codex** — only the tabs you explicitly activate, not your entire browser.
+[![npm version](https://img.shields.io/npm/v/tab-agent.svg)](https://www.npmjs.com/package/tab-agent)
+**Give Claude, Codex, or any LLM full control of your browser tabs** — securely, with click-to-activate permission.
 ```
 ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
-│  Claude/Codex   │────▶│  Relay Server   │────▶│    Extension    │
-│                 │◀────│   :9876         │◀────│    (Chrome)     │
+│   Claude Code   │────▶│  Relay Server   │────▶│    Extension    │
+│   Codex / LLM   │◀────│   (background)  │◀────│    (Chrome)     │
 └─────────────────┘     └─────────────────┘     └─────────────────┘
-                                                        │
-                                                        ▼
-                                              ┌───────────────────┐
-                                              │  Your Active Tab  │
-                                              │    🟢 ON          │
-                                              └───────────────────┘
+                                                       │
+                                                       ▼
+                                             ┌───────────────────┐
+                                             │  Your Active Tab  │
+                                             │   🟢 Click to ON  │
+                                             └───────────────────┘
 ```
+## Features
+- **Full browser control** — navigate, click, type, scroll, screenshot, run JavaScript
+- **Uses your login sessions** — access authenticated sites (GitHub, Gmail, X) without sharing credentials
+- **Runs in background** — relay server starts automatically, works while you do other things
+- **Click-to-activate security** — only tabs you explicitly enable, your other tabs stay private
+- **AI-optimized snapshots** — pages converted to readable text with element refs `[e1]`, `[e2]`
+- **Works with any LLM** — Claude Code, Codex, or any tool that can run shell commands
+---
 ## Quick Start
 ```bash
 # 1. Clone and load extension
 git clone https://github.com/DrHB/tab-agent
-# → Open chrome://extensions → Enable Developer mode → Load unpacked → select extension/
+# → Chrome: chrome://extensions → Developer mode → Load unpacked → select extension/
 # 2. Setup (auto-detects everything)
 npx tab-agent setup
-# 3. Use it
-# → Click Tab Agent icon on a tab (turns green)
+# 3. Activate a tab & go!
+# → Click Tab Agent icon on any tab (turns green = active)
 # → Ask Claude: "Use tab-agent to search Google for 'hello world'"
 ```
@@ -42,9 +55,8 @@ npx tab-agent setup
 | **Visibility** | Green badge = active | Hidden/background |
 | **Sessions** | Uses your cookies | Requires re-login |
 | **Credentials** | Never shared | Often required |
-| **Audit** | Full action logging | Varies |
-**Click-to-activate model:** Your banking, email, and sensitive tabs stay completely isolated. You always see exactly which tabs AI can control.
+**Click-to-activate model:** Your banking, email, and sensitive tabs stay completely isolated. You always see exactly which tabs the LLM can control.
 ### 🍪 Works With Your Login Sessions
@@ -55,9 +67,9 @@ Because Tab Agent runs as a Chrome extension:
 - **Works with SSO and 2FA** — enterprise apps, protected accounts
 - **No credential sharing** — your passwords stay in your browser
-### 🤖 AI-Optimized
+### 🤖 LLM-Optimized
-- **Semantic snapshots** — pages converted to AI-readable text with refs `[e1]`, `[e2]`
+- **Semantic snapshots** — pages converted to readable text with refs `[e1]`, `[e2]`
 - **Screenshot fallback** — for complex dynamic pages
 - **Simple targeting** — click/type using refs instead of fragile CSS selectors
@@ -111,7 +123,7 @@ This automatically:
 1. Navigate to any webpage
 2. **Click the Tab Agent icon** — it turns green (🟢 ON)
-3. Ask your AI to interact with the page
+3. Ask your LLM to interact with the page
 ---
@@ -122,37 +134,43 @@ This automatically:
 |---------|-------------|
 | `tabs` | List all activated tabs |
 | `navigate` | Go to a URL |
-| `snapshot` | Get AI-readable page with element refs |
+| `snapshot` | Get page with element refs |
 | `screenshot` | Capture viewport image |
-| `screenshot fullPage` | Capture entire page |
+| `screenshot --full` | Capture entire page |
 ### Interaction
 | Command | Description |
 |---------|-------------|
 | `click` | Click element by ref |
 | `type` | Type text into element |
-| `type ... submit` | Type and press Enter |
 | `fill` | Fill a form field |
-| `batchfill` | Fill multiple fields at once |
 | `press` | Press a key (Enter, Escape, Tab, Arrows) |
 ### Page Control
 | Command | Description |
 |---------|-------------|
 | `scroll` | Scroll up/down by amount |
-| `scrollintoview` | Scroll element into view |
 | `wait` | Wait for text or element to appear |
 | `evaluate` | Run JavaScript in page context |
-| `dialog` | Handle alert/confirm/prompt |
 ---
-## CLI Reference
+## CLI Usage
 ```bash
-npx tab-agent setup   # Initial configuration
-npx tab-agent status  # Check if everything works
-npx tab-agent start   # Start relay server manually
+# Setup & Status
+npx tab-agent setup                   # Initial configuration
+npx tab-agent status                  # Check if everything works
+npx tab-agent start                   # Start relay server manually
+# Browser Commands
+npx tab-agent tabs                    # List active tabs
+npx tab-agent snapshot                # Get page content with refs
+npx tab-agent screenshot              # Capture viewport
+npx tab-agent screenshot --full       # Capture full page
+npx tab-agent click e5                # Click element
+npx tab-agent type e3 "hello"         # Type text
+npx tab-agent navigate "https://..."  # Go to URL
 ```
 ---
@@ -189,17 +207,17 @@ Setup automatically detects your browser.
 1. **Chrome Extension** — Runs in your browser with access to activated tabs and your session cookies
-2. **Relay Server** — Local WebSocket server (port 9876) that bridges AI ↔ Extension via Chrome's Native Messaging API
+2. **Relay Server** — Local WebSocket server that bridges LLM ↔ Extension via Chrome's Native Messaging API (runs in background)
-3. **Skill File** — Tells Claude/Codex how to send commands to the relay
+3. **Skill File** — Tells Claude/Codex how to send commands
 **Data flow:**
 ```
 You: "Search Google for cats"
  ↓
-Claude/Codex → WebSocket command → Relay Server → Native Messaging → Extension → DOM action
+LLM → CLI command → Relay Server → Native Messaging → Extension → Browser action
  ↑
-Results ← WebSocket response ← Relay Server ← Native Messaging ← Page snapshot
+Results ← Response ← Relay Server ← Native Messaging ← Page snapshot
 ```
 ---
@@ -210,4 +228,4 @@ MIT
 ---
-**Made for [Claude Code](https://claude.ai/code) and [Codex](https://openai.com/codex)**
+**Works with [Claude Code](https://claude.ai/code), [Codex](https://openai.com/codex), and any LLM that can run shell commands.**
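
The updated data flow in this README routes every action through the CLI, so any tool that can spawn a process can drive an activated tab. Below is a minimal Node.js sketch of such a wrapper. The `runTabAgent` name, its return shape, and the injectable `exec` parameter are illustrative assumptions and not part of the package; only the `npx tab-agent <command>` invocations and the non-zero exit code on failure are taken from this diff.

```javascript
const { execFileSync } = require('node:child_process');

// Hypothetical helper: runs one tab-agent CLI command and captures its output.
// `exec` is injectable so the wrapper can be exercised without a running relay.
function runTabAgent(args, exec = execFileSync) {
  try {
    const stdout = exec('npx', ['tab-agent', ...args], { encoding: 'utf8' });
    return { ok: true, output: stdout.trim() };
  } catch (err) {
    // The CLI exits with code 1 on failure, which execFileSync surfaces as a throw.
    return { ok: false, output: String(err.stderr || err.message).trim() };
  }
}

// Usage (needs the relay running and one activated tab):
//   runTabAgent(['navigate', 'https://example.com']);
//   runTabAgent(['snapshot']);
```

Because 0.3.1 prints screenshots as base64 on stdout, a caller that captures `screenshot` output this way would probably need to raise the `maxBuffer` option of `execFileSync`.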

package/bin/tab-agent.js CHANGED

@@ -1,40 +1,64 @@
 #!/usr/bin/env node
 const command = process.argv[2];
-const pkg = require('../package.json');
+// Commands that go to the command module
+const BROWSER_COMMANDS = ['tabs', 'snapshot', 'screenshot', 'click', 'type', 'fill', 'press', 'scroll', 'navigate', 'wait', 'evaluate'];
+if (command === '-v' || command === '--version') {
+  console.log(require('../package.json').version);
+  process.exit(0);
+}
+if (BROWSER_COMMANDS.includes(command)) {
+  const { runCommand } = require('../cli/command.js');
+  runCommand(process.argv.slice(2));
+} else {
+  switch (command) {
+    case 'setup':
+      require('../cli/setup.js');
+      break;
+    case 'start':
+      require('../cli/start.js');
+      break;
+    case 'status':
+      require('../cli/status.js');
+      break;
+    default:
+      showHelp();
+  }
+}
 function showHelp() {
   console.log(`
 tab-agent - Browser control for Claude/Codex
-Commands:
-  setup   Auto-detect extension, register native host, install skills
-  start   Start the relay server
-  status  Check configuration status
+Setup Commands:
+  setup    Auto-detect extension, register native host, install skills
+  start    Start the relay server
+  status   Check configuration status
-Usage:
+Browser Commands:
+  tabs                      List active tabs
+  snapshot                  Get AI-readable page content
+  screenshot [--full]       Capture screenshot
+  click <ref>               Click element (e.g., click e5)
+  type <ref> <text>         Type text into element
+  fill <ref> <value>        Fill form field
+  press <key>               Press key (Enter, Escape, etc.)
+  scroll <dir> [amount]     Scroll up/down
+  navigate <url>            Go to URL
+  wait <text|selector>      Wait for text or element
+  evaluate <script>         Run JavaScript
+Examples:
   npx tab-agent setup
-  npx tab-agent start
-`);
-}
+  npx tab-agent snapshot
+  npx tab-agent click e5
+  npx tab-agent type e3 "hello world"
-switch (command) {
-  case 'setup':
-    require('../cli/setup.js');
-    break;
-  case 'start':
-    require('../cli/start.js');
-    break;
-  case 'status':
-    require('../cli/status.js');
-    break;
-  case '-v':
-  case '--version':
-    console.log(pkg.version);
-    break;
-  case undefined:
-    showHelp();
-    break;
-  default:
-    showHelp();
+Version: ${require('../package.json').version}
+`);
+  if (command && command !== 'help' && command !== '--help' && command !== '-h') {
     process.exit(1);
+  }
 }
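
The rewritten entry point now dispatches in three stages: version flags, browser commands, then setup commands, with help as the fallback. The sketch below restates that routing as a pure function so the exit-code behavior is easier to follow. `route`, `SETUP_COMMANDS`, `HELP_ALIASES`, and the returned object shape are hypothetical names introduced for illustration; the command lists and the "exit 1 only for unrecognized commands" rule are read from the hunk above.

```javascript
// Mirrors the BROWSER_COMMANDS list added in this version.
const BROWSER_COMMANDS = ['tabs', 'snapshot', 'screenshot', 'click', 'type',
  'fill', 'press', 'scroll', 'navigate', 'wait', 'evaluate'];
const SETUP_COMMANDS = ['setup', 'start', 'status'];
const HELP_ALIASES = ['help', '--help', '-h'];

// Hypothetical pure function: describes where a given process.argv[2] ends up.
// exitCode is null when the handling module decides it later.
function route(command) {
  if (command === '-v' || command === '--version') {
    return { handler: 'version', exitCode: 0 };
  }
  if (BROWSER_COMMANDS.includes(command)) {
    return { handler: 'cli/command.js', exitCode: null };
  }
  if (SETUP_COMMANDS.includes(command)) {
    return { handler: `cli/${command}.js`, exitCode: null };
  }
  // Everything else prints help; only unrecognized commands exit with 1.
  const unknown = Boolean(command) && !HELP_ALIASES.includes(command);
  return { handler: 'help', exitCode: unknown ? 1 : 0 };
}

console.log(route('click'));
console.log(route('bogus'));
```

Compared with 0.2.3, the visible difference is that `help`, `--help`, and `-h` now print the help text without a failing exit code, where the old `default` branch exited with 1.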

package/cli/command.js ADDED

@@ -0,0 +1,200 @@
+// cli/command.js
+const WebSocket = require('ws');
+const COMMANDS = ['tabs', 'snapshot', 'screenshot', 'click', 'type', 'fill', 'press', 'scroll', 'navigate', 'wait', 'evaluate'];
+async function runCommand(args) {
+  const [command, ...params] = args;
+  if (!command || command === 'help') {
+    printHelp();
+    return;
+  }
+  if (!COMMANDS.includes(command)) {
+    console.error(`Unknown command: ${command}`);
+    console.error(`Available: ${COMMANDS.join(', ')}`);
+    process.exit(1);
+  }
+  const ws = new WebSocket('ws://localhost:9876');
+  const timeout = setTimeout(() => {
+    console.error('Connection timeout - is the relay running? Try: npx tab-agent start');
+    ws.close();
+    process.exit(1);
+  }, 5000);
+  ws.on('error', (err) => {
+    clearTimeout(timeout);
+    console.error('Connection failed:', err.message);
+    console.error('Make sure relay is running: npx tab-agent start');
+    process.exit(1);
+  });
+  ws.on('open', () => {
+    clearTimeout(timeout);
+    // First get tabs to find tabId
+    if (command === 'tabs') {
+      ws.send(JSON.stringify({ id: 1, action: 'tabs' }));
+    } else {
+      // Get active tab first, then run command
+      ws.send(JSON.stringify({ id: 0, action: 'tabs' }));
+    }
+  });
+  ws.on('message', (data) => {
+    const msg = JSON.parse(data);
+    // Handle tabs response
+    if (msg.id === 0) {
+      if (!msg.tabs || msg.tabs.length === 0) {
+        console.error('No active tabs. Click Tab Agent icon on a tab to activate it.');
+        ws.close();
+        process.exit(1);
+      }
+      const tabId = msg.tabs[0].tabId;
+      const payload = buildPayload(command, params, tabId);
+      ws.send(JSON.stringify({ id: 1, ...payload }));
+      return;
+    }
+    // Handle command response
+    if (msg.id === 1) {
+      if (command === 'tabs') {
+        printTabs(msg);
+      } else if (command === 'snapshot') {
+        printSnapshot(msg);
+      } else if (command === 'screenshot') {
+        printScreenshot(msg);
+      } else {
+        printResult(msg);
+      }
+      ws.close();
+      process.exit(msg.ok ? 0 : 1);
+    }
+  });
+}
+function buildPayload(command, params, tabId) {
+  const payload = { action: command, tabId };
+  switch (command) {
+    case 'click':
+      payload.ref = params[0];
+      break;
+    case 'type':
+      payload.ref = params[0];
+      payload.text = params.slice(1).join(' ');
+      if (params.includes('--submit')) {
+        payload.submit = true;
+        payload.text = payload.text.replace('--submit', '').trim();
+      }
+      break;
+    case 'fill':
+      payload.ref = params[0];
+      payload.value = params.slice(1).join(' ');
+      break;
+    case 'press':
+      payload.key = params[0];
+      break;
+    case 'scroll':
+      payload.direction = params[0] || 'down';
+      payload.amount = parseInt(params[1]) || 500;
+      break;
+    case 'navigate':
+      payload.url = params[0];
+      break;
+    case 'wait':
+      if (params[0]?.startsWith('.') || params[0]?.startsWith('#')) {
+        payload.selector = params[0];
+      } else {
+        payload.text = params.join(' ');
+      }
+      payload.timeout = parseInt(params.find(p => /^\d+$/.test(p))) || 5000;
+      break;
+    case 'evaluate':
+      payload.script = params.join(' ');
+      break;
+    case 'screenshot':
+      if (params.includes('--full') || params.includes('--fullPage')) {
+        payload.fullPage = true;
+      }
+      break;
+  }
+  return payload;
+}
+function printHelp() {
+  console.log(`
+tab-agent - Browser control commands
+Usage: npx tab-agent <command> [options]
+Commands:
+  tabs                      List active tabs
+  snapshot                  Get AI-readable page content
+  screenshot [--full]       Capture screenshot (--full for full page)
+  click <ref>               Click element (e.g., click e5)
+  type <ref> <text>         Type text into element
+  fill <ref> <value>        Fill form field
+  press <key>               Press key (Enter, Escape, Tab, etc.)
+  scroll <dir> [amount]     Scroll up/down (default: 500px)
+  navigate <url>            Go to URL
+  wait <text|selector>      Wait for text or element
+  evaluate <script>         Run JavaScript
+Examples:
+  npx tab-agent tabs
+  npx tab-agent snapshot
+  npx tab-agent click e5
+  npx tab-agent type e3 hello world
+  npx tab-agent navigate https://google.com
+  npx tab-agent screenshot --full
+`);
+}
+function printTabs(msg) {
+  if (!msg.ok) {
+    console.error('Error:', msg.error);
+    return;
+  }
+  console.log('Active tabs:\n');
+  msg.tabs.forEach((tab, i) => {
+    console.log(`  ${i + 1}. [${tab.tabId}] ${tab.title}`);
+    console.log(`     ${tab.url}\n`);
+  });
+}
+function printSnapshot(msg) {
+  if (!msg.ok) {
+    console.error('Error:', msg.error);
+    return;
+  }
+  console.log(msg.snapshot);
+}
+function printScreenshot(msg) {
+  if (!msg.ok) {
+    console.error('Error:', msg.error);
+    return;
+  }
+  // Output base64 directly - no file, no auto-open
+  console.log(msg.screenshot);
+}
+function printResult(msg) {
+  if (!msg.ok) {
+    console.error('Error:', msg.error);
+    return;
+  }
+  console.log('OK');
+  if (msg.result !== undefined) {
+    console.log('Result:', msg.result);
+  }
+}
+module.exports = { runCommand };
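
The `wait` branch of `buildPayload` decides between selector and text from the first argument and picks the timeout from any purely numeric argument. The standalone sketch below copies that branch so its edge cases can be tried in isolation. `waitPayload` is a hypothetical name, `tabId` is omitted, and an explicit radix is added to `parseInt`; otherwise the logic follows the added file as shown in this diff.

```javascript
// Standalone copy of the `wait` branch of buildPayload, for illustration only.
function waitPayload(params) {
  const payload = { action: 'wait' };
  // Only arguments starting with "." or "#" are treated as CSS selectors.
  if (params[0]?.startsWith('.') || params[0]?.startsWith('#')) {
    payload.selector = params[0];
  } else {
    payload.text = params.join(' ');
  }
  // The first all-digit argument becomes the timeout; the default is 5000 ms.
  payload.timeout = parseInt(params.find(p => /^\d+$/.test(p)), 10) || 5000;
  return payload;
}

console.log(waitPayload(['.results', '3000']));
console.log(waitPayload(['Loading', 'complete']));
console.log(waitPayload(['Loading', '3000']));
```

Two consequences seem to follow from this reading. A numeric argument after plain text is used as the timeout but also stays in the text (`Loading 3000`), and a selector such as `div.results` would be sent as text because it starts with neither `.` nor `#`.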

package/extension/service-worker.js CHANGED

@@ -191,32 +191,23 @@ async function routeCommand(tabId, command) {
       // Full page screenshot using chrome.debugger
       if (fullPage) {
         try {
-          await chrome.debugger.attach({ tabId }, '1.3');
+          // Try to detach first in case previous attempt left debugger attached
+          try { await chrome.debugger.detach({ tabId }); } catch {}
-          const { result: layout } = await chrome.debugger.sendCommand(
-            { tabId },
-            'Page.getLayoutMetrics'
-          );
+          await chrome.debugger.attach({ tabId }, '1.3');
-          const { data } = await chrome.debugger.sendCommand(
+          const screenshot = await chrome.debugger.sendCommand(
             { tabId },
             'Page.captureScreenshot',
             {
               format: 'png',
-              captureBeyondViewport: true,
-              clip: {
-                x: 0,
-                y: 0,
-                width: layout.contentSize.width,
-                height: layout.contentSize.height,
-                scale: 1
-              }
+              captureBeyondViewport: true
             }
           );
           await chrome.debugger.detach({ tabId });
           audit('screenshot', { tabId, fullPage: true }, { ok: true });
-          return { ok: true, screenshot: 'data:image/png;base64,' + data, format: 'png' };
+          return { ok: true, screenshot: 'data:image/png;base64,' + screenshot.data, format: 'png' };
         } catch (error) {
           try { await chrome.debugger.detach({ tabId }); } catch {}
           const result = { ok: false, error: error.message };

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "tab-agent",
-  "version": "0.2.3",
+  "version": "0.3.1",
   "description": "Browser control for Claude Code and Codex via WebSocket",
   "bin": {
     "tab-agent": "./bin/tab-agent.js"

package/skills/claude-code/tab-agent.md CHANGED

@@ -1,11 +1,11 @@
 ---
 name: tab-agent
-description: Browser control via WebSocket - snapshot, click, type, fill, evaluate, screenshot
+description: Browser control via CLI - snapshot, click, type, fill, screenshot
 ---
 # Tab Agent
-WebSocket `ws://localhost:9876`. User activates tabs via extension icon (green = active).
+Control browser tabs via CLI. User activates tabs via extension icon (green = active).
 ## Before First Command
@@ -16,38 +16,44 @@ sleep 2
 ## Commands
-```json
-{"id": 1, "action": "tabs"}                                    // list active tabs
-{"id": 2, "action": "snapshot", "tabId": ID}                   // get page with refs [e1], [e2]...
-{"id": 3, "action": "screenshot", "tabId": ID}                 // viewport screenshot
-{"id": 4, "action": "screenshot", "tabId": ID, "fullPage": true}  // full page screenshot
-{"id": 5, "action": "click", "tabId": ID, "ref": "e1"}         // click element
-{"id": 6, "action": "fill", "tabId": ID, "ref": "e1", "value": "text"}  // fill input
-{"id": 7, "action": "type", "tabId": ID, "ref": "e1", "text": "hello"}  // type text
-{"id": 8, "action": "type", "tabId": ID, "ref": "e1", "text": "query", "submit": true}  // type and Enter
-{"id": 9, "action": "press", "tabId": ID, "key": "Enter"}      // press key
-{"id": 10, "action": "scroll", "tabId": ID, "direction": "down", "amount": 500}
-{"id": 11, "action": "scrollintoview", "tabId": ID, "ref": "e1"}  // scroll element visible
-{"id": 12, "action": "navigate", "tabId": ID, "url": "https://..."}
-{"id": 13, "action": "wait", "tabId": ID, "text": "Loading complete"}  // wait for text
-{"id": 14, "action": "wait", "tabId": ID, "selector": ".results", "timeout": 5000}  // wait for element
-{"id": 15, "action": "evaluate", "tabId": ID, "script": "document.title"}  // run JavaScript
-{"id": 16, "action": "batchfill", "tabId": ID, "fields": [{"ref": "e1", "value": "a"}, {"ref": "e2", "value": "b"}]}
-{"id": 17, "action": "dialog", "tabId": ID, "accept": true}    // handle alert/confirm
+```bash
+npx tab-agent tabs                    # List active tabs
+npx tab-agent snapshot                # Get page with refs [e1], [e2]...
+npx tab-agent screenshot              # Capture viewport
+npx tab-agent screenshot --full       # Capture full page
+npx tab-agent click <ref>             # Click element
+npx tab-agent type <ref> <text>       # Type text
+npx tab-agent fill <ref> <value>      # Fill form field
+npx tab-agent press <key>             # Press key (Enter, Escape, Tab)
+npx tab-agent scroll <dir> [amount]   # Scroll up/down
+npx tab-agent navigate <url>          # Go to URL
+npx tab-agent wait <text|selector>    # Wait for condition
+npx tab-agent evaluate <script>       # Run JavaScript
 ```
 ## Usage
-1. `tabs` -> get tabId of active tab
+1. `tabs` -> find active tab
 2. `snapshot` -> read page, get element refs [e1], [e2]...
-3. `click`/`fill`/`type` using refs
-4. If snapshot incomplete (complex page) -> `screenshot` and analyze visually
+3. `click`/`type`/`fill` using refs
+4. If snapshot incomplete -> `screenshot` and analyze visually
+## Examples
+```bash
+# Search Google
+npx tab-agent navigate "https://google.com"
+npx tab-agent snapshot
+npx tab-agent type e1 "hello world"
+npx tab-agent press Enter
+# Read page content
+npx tab-agent snapshot
+npx tab-agent screenshot --full
+```
 ## Notes
-- Screenshot returns `{"screenshot": "data:image/png;base64,..."}` - analyze directly
-- Snapshot refs reset on each call - always snapshot before interacting
+- Screenshot saves to /tmp/ and opens automatically
+- Refs reset on each snapshot - always snapshot before interacting
 - Keys: Enter, Escape, Tab, Backspace, ArrowUp/Down/Left/Right
-- `type` with `submit: true` presses Enter after typing (for search boxes)
-- `evaluate` runs in page context - can access page variables/functions
-- `dialog` handles alert/confirm/prompt - debugger bar appears when attached

package/skills/codex/tab-agent.md CHANGED

@@ -1,11 +1,11 @@
 ---
 name: tab-agent
-description: Browser control via WebSocket
+description: Browser control via CLI
 ---
 # Tab Agent
-`ws://localhost:9876` - User activates tabs via extension (green = active)
+CLI browser control. User activates tabs via extension (green = active).
 ## Start Relay
@@ -15,26 +15,20 @@ curl -s http://localhost:9876/health || (npx tab-agent start &)
 ## Commands
-```
-tabs                                  -> list active tabs
-snapshot tabId                        -> page with refs [e1], [e2]...
-screenshot tabId                      -> viewport screenshot (base64 PNG)
-screenshot tabId fullPage=true        -> full page screenshot
-click tabId ref                       -> click element
-fill tabId ref value                  -> fill input
-type tabId ref text                   -> type text
-type tabId ref text submit=true       -> type and press Enter
-press tabId key                       -> Enter/Escape/Tab/Arrow*
-scroll tabId direction amount         -> scroll page
-scrollintoview tabId ref              -> scroll element visible
-navigate tabId url                    -> go to URL
-wait tabId text="..."                 -> wait for text
-wait tabId selector="..." timeout=ms  -> wait for element
-evaluate tabId script="..."           -> run JavaScript
-batchfill tabId fields=[...]          -> fill multiple fields
-dialog tabId accept=true              -> handle alert/confirm
+```bash
+tabs                         # List active tabs
+snapshot                     # Page with refs [e1], [e2]...
+screenshot [--full]          # Capture viewport/full page
+click <ref>                  # Click element
+type <ref> <text>            # Type text
+fill <ref> <value>           # Fill form field
+press <key>                  # Enter/Escape/Tab/Arrow*
+scroll <dir> [amount]        # Scroll up/down
+navigate <url>               # Go to URL
+wait <text|selector>         # Wait for condition
+evaluate <script>            # Run JavaScript
 ```
 ## Flow
-`tabs` -> `snapshot` -> `click`/`fill` -> repeat. Use `screenshot` if snapshot incomplete.
+`snapshot` -> `click`/`type` -> repeat. Use `screenshot` if snapshot incomplete.