npm - alvin-bot - Versions diffs - 4.20.1 → 4.21.0 - Mend

alvin-bot 4.20.1 → 4.21.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/CHANGELOG.md +34 -0
package/bin/cli.js +47 -0
package/dist/config.js +14 -0
package/dist/platforms/slack.js +41 -0
package/dist/web/server.js +13 -2
package/package.json +1 -1
package/skills/agent-browser/SKILL.md +183 -0
package/skills/browse/SKILL.md +8 -0

package/CHANGELOG.md CHANGED

@@ -2,6 +2,40 @@
 All notable changes to Alvin Bot are documented here.
+## [4.21.0] — 2026-05-04
+### 🌐 New skill: Agent Browser (Tier-1.5)
+Adds a new bundled skill, `skills/agent-browser/SKILL.md`, that teaches the bot to use the `agent-browser` CLI when it's available. Agent Browser is a [Vercel Labs](https://github.com/vercel-labs/agent-browser) tool that exposes pages as accessibility-tree snapshots with `@e1`, `@e2`, … refs — interactions cost ~200–400 tokens per turn instead of parsing rendered HTML, which is roughly 90 % cheaper than a Playwright/Puppeteer-driven flow.
+The skill is **opt-in by install, not by config**: it only activates when `command -v agent-browser` succeeds. No new dependency in `package.json`, no postinstall hook, no extra disk on a fresh install. Existing browser strategies (Tier 1 Stealth, Tier 2 CDP, Tier 3 Extension) keep working untouched and remain the right tool for stealth scraping, logged-in personal accounts, and watch-along flows.
+The bundled `Browser Automation` skill (`skills/browse/SKILL.md`) was updated to route the bot to the Agent Browser skill first when the binary is on the PATH and the task is interactive (click/fill/extract on cooperative pages).
+`alvin-bot doctor` shows a new `Browser tools:` section reporting whether agent-browser is installed, and gives the one-liner install command if not:
+```
+npm i -g agent-browser && agent-browser install
+```
+The first command pulls the Node CLI; the second downloads a private Chrome-for-Testing build into `~/.agent-browser/`. Together about 240 MB — that's why we don't bundle it.
+No code changes in the bot's core pipeline. Existing users notice nothing unless they install the CLI.
+## [4.20.2] — 2026-05-04
+### 🛡️ Security: Web UI loopback by default + Slack caller allowlist
+Two real attack surfaces closed.
+**Web UI binds to 127.0.0.1 by default.** Previous versions called `server.listen(port)` with no host argument, which Node interprets as "listen on all interfaces". Combined with an empty `WEB_PASSWORD` (which the login route silently treats as "anyone can log in"), this meant any device on the same LAN could log into the bot's Web UI and reach every authenticated endpoint — user list, memory contents, model switch, the WebSocket chat, etc. New default: bind to `127.0.0.1`. To restore LAN access, set `WEB_HOST=0.0.0.0` explicitly in `.env`. If both `WEB_HOST=0.0.0.0` and an empty `WEB_PASSWORD` are present, the bot logs a loud warning on startup.
+**Slack caller allowlist.** New `SLACK_ALLOWED_USERS` env var: comma-separated list of Slack user IDs allowed to talk to the bot (DMs, @mentions, slash commands). Empty list keeps the legacy behaviour — any workspace member can interact, which is safe iff the workspace is private to the operator. To find your Slack user ID: open your profile in Slack → "..." → "Copy member ID", or just message the bot once and read the line `[slack] caller discovered: user=U… — to lock the bot to specific users, add to .env: SLACK_ALLOWED_USERS=U…` from the logs (we log each unique caller once when the allowlist is empty).
+**`alvin-bot doctor` now reports both.** New `Web UI:` and `Slack:` sections flag insecure combos and show whether an allowlist is active.
+No schema or behaviour changes for users who already have `WEB_PASSWORD` set or only use the bot via Telegram. Telegram allowlist (`ALLOWED_USERS`) is unchanged.
 ## [4.20.1] — 2026-05-03
 ### 🛡️ Hardening for the v4.20.0 SQLite migration

package/bin/cli.js CHANGED

@@ -1361,6 +1361,53 @@ async function doctor() {
     console.log(`  ❌ ALLOWED_USERS not set (nobody can message the bot)`);
   }
+  // ── Web UI security ──
+  console.log("\n  Web UI:");
+  const webHost = getEnv("WEB_HOST") || "127.0.0.1";
+  const webPw = getEnv("WEB_PASSWORD");
+  if (webHost === "127.0.0.1" || webHost === "::1") {
+    console.log(`  ✅ WEB_HOST=${webHost} — loopback only (LAN unreachable)`);
+  } else if (webHost === "0.0.0.0" || webHost === "*") {
+    if (webPw) {
+      console.log(`  ✅ WEB_HOST=${webHost} (LAN-reachable) + WEB_PASSWORD set`);
+    } else {
+      console.log(`  ❌ WEB_HOST=${webHost} (LAN-reachable) WITHOUT WEB_PASSWORD — anyone on LAN can log in`);
+      console.log(`     Fix: set WEB_PASSWORD in .env, or set WEB_HOST=127.0.0.1`);
+    }
+  } else {
+    console.log(`  ℹ️  WEB_HOST=${webHost}${webPw ? " + WEB_PASSWORD set" : " — WEB_PASSWORD empty"}`);
+  }
+  // ── Slack caller allowlist ──
+  if (getEnv("SLACK_BOT_TOKEN")) {
+    console.log("\n  Slack:");
+    const slackAllow = getEnv("SLACK_ALLOWED_USERS");
+    if (slackAllow) {
+      const ids = slackAllow.split(",").map(s => s.trim()).filter(Boolean);
+      console.log(`  ✅ SLACK_ALLOWED_USERS: ${ids.length} user${ids.length === 1 ? "" : "s"} (caller allowlist active)`);
+    } else {
+      console.log(`  ⚠️  SLACK_ALLOWED_USERS not set — any workspace member can talk to the bot`);
+      console.log(`     Safe iff the Slack workspace is private to you. Otherwise add e.g.:`);
+      console.log(`     SLACK_ALLOWED_USERS=U0ABC123,U0DEF456`);
+    }
+  }
+  // ── Browser tools (optional Tier-1.5 agent-browser) ──
+  console.log("\n  Browser tools:");
+  let agentBrowserVersion = "";
+  try {
+    agentBrowserVersion = execSync("agent-browser --version 2>/dev/null", { encoding: "utf-8", timeout: 3000 }).trim();
+  } catch {}
+  if (agentBrowserVersion) {
+    // `agent-browser --version` prints "agent-browser X.Y.Z" — strip the prefix.
+    const v = agentBrowserVersion.replace(/^agent-browser\s+/i, "");
+    console.log(`  ✅ agent-browser ${v} — Tier-1.5 (token-efficient snapshot+ref) available`);
+  } else {
+    console.log(`  ℹ️  agent-browser not installed (optional Tier-1.5)`);
+    console.log(`     Install for ~90% cheaper interactive automation:`);
+    console.log(`       npm i -g agent-browser && agent-browser install`);
+  }
   // ── Memory (semantic search backend) ──
   console.log("\n  Memory:");
   const embJson = resolve(DATA_DIR, "memory", ".embeddings.json");
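The Web UI branch of `doctor()` above reduces to a three-way classification of `WEB_HOST` and `WEB_PASSWORD`. A minimal sketch of that decision as a pure function follows; `classifyWebUi` and its return shape are hypothetical names used only for illustration and do not exist in the package.

```javascript
// Hypothetical helper (not part of alvin-bot) mirroring the doctor's Web UI checks.
function classifyWebUi(webHost, webPassword) {
  // Same fallback as the diff: an unset WEB_HOST means loopback.
  const host = webHost || "127.0.0.1";

  if (host === "127.0.0.1" || host === "::1") {
    return { level: "ok", reason: "loopback only" };
  }
  if (host === "0.0.0.0" || host === "*") {
    // Reachable from the LAN: only acceptable when a password is set.
    return webPassword
      ? { level: "ok", reason: "LAN-reachable with password" }
      : { level: "error", reason: "LAN-reachable without password" };
  }
  // Any other value (for example a specific interface address) is only reported.
  return { level: "info", reason: webPassword ? "password set" : "password empty" };
}

console.log(classifyWebUi("", ""));
console.log(classifyWebUi("0.0.0.0", ""));
console.log(classifyWebUi("192.168.1.20", "s3cret"));
```

Keeping the classification separate from the `console.log` calls is one possible way to make the insecure combination testable without running the CLI.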

package/dist/config.js CHANGED

@@ -63,6 +63,20 @@ export const config = {
     sessionMode: (process.env.SESSION_MODE || "per-user"),
     webhookEnabled: process.env.WEBHOOK_ENABLED === "true",
     webhookToken: process.env.WEBHOOK_TOKEN || "",
+    // Web UI bind host. Default is 127.0.0.1 (loopback only) — set to "0.0.0.0"
+    // explicitly if you want LAN/external access. Combined with WEB_PASSWORD
+    // this is the safe default since v4.20.2; previous versions defaulted to
+    // listening on all interfaces with no auth required when WEB_PASSWORD was
+    // empty.
+    webHost: process.env.WEB_HOST || "127.0.0.1",
+    // Slack caller allowlist. Comma-separated Slack user IDs (e.g. "U0ABC123,U0DEF456").
+    // When non-empty, only these users can talk to the bot in Slack DMs and via @mention.
+    // When empty, the bot accepts any Slack workspace member (legacy behavior; safe iff
+    // the workspace is private to you).
+    slackAllowedUsers: (process.env.SLACK_ALLOWED_USERS || "")
+        .split(",")
+        .map(s => s.trim())
+        .filter(Boolean),
     // Browser
     cdpUrl: process.env.CDP_URL || "",
     browseServerPort: Number(process.env.BROWSE_SERVER_PORT) || 3800,
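The `slackAllowedUsers` expression above can be read as a small parsing step. The sketch below isolates it; `parseAllowlist` is a hypothetical name, since the package inlines this chain directly in the config object.

```javascript
// Hypothetical helper mirroring the split/trim/filter chain from config.js.
function parseAllowlist(raw) {
  return (raw || "")
    .split(",")           // "U0ABC123, U0DEF456" -> ["U0ABC123", " U0DEF456"]
    .map((s) => s.trim()) // drop whitespace around each ID
    .filter(Boolean);     // drop empty entries such as the one from ""
}

console.log(parseAllowlist(undefined));
console.log(parseAllowlist(" U0ABC123 , ,U0DEF456,"));
```

The `filter(Boolean)` step matters because `"".split(",")` yields `[""]`, an array of length 1. Without the filter, an unset variable would look like a non-empty allowlist and lock every caller out.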

package/dist/platforms/slack.js CHANGED

@@ -18,6 +18,32 @@
  */
 import fs from "fs";
 import { parseSlackSlashCommand } from "./slack-slash-parser.js";
+import { config } from "../config.js";
+/**
+ * v4.20.2 — Slack caller allowlist. When SLACK_ALLOWED_USERS is set in the
+ * environment (comma-separated Slack user IDs), only those users get past
+ * this gate. When the list is empty, fall back to legacy behaviour: any
+ * member of the workspace can talk to the bot. The empty-list case is safe
+ * iff the workspace is private to the operator.
+ *
+ * Slack user IDs are workspace-scoped (e.g. "U0ABC123"); rotate the list if
+ * you migrate workspaces.
+ */
+function isSlackUserAllowed(userId) {
+    if (config.slackAllowedUsers.length === 0) {
+        // No allowlist set — log each unique caller once so the operator can
+        // copy a known ID into SLACK_ALLOWED_USERS and lock the bot down.
+        if (userId && !discoveredCallers.has(userId)) {
+            discoveredCallers.add(userId);
+            console.warn(`[slack] caller discovered: user=${userId} — to lock the bot to specific users, ` +
+                `add to .env: SLACK_ALLOWED_USERS=${userId}` +
+                (discoveredCallers.size > 1 ? ` (or comma-separate multiple)` : ""));
+        }
+        return true;
+    }
+    return config.slackAllowedUsers.includes(userId);
+}
+const discoveredCallers = new Set();
 let _slackState = {
     status: "disconnected",
     botName: null,
@@ -148,6 +174,13 @@ export class SlackAdapter {
         const userId = message.user || "";
         const channelId = message.channel || "";
         const messageId = message.ts || "";
+        // v4.20.2 — caller allowlist. If SLACK_ALLOWED_USERS is set, silently
+        // ignore anyone not on the list. Empty list = legacy behaviour
+        // (any workspace member can talk to the bot — safe iff the workspace
+        // is private to the operator).
+        if (!isSlackUserAllowed(userId)) {
+            return;
+        }
         // Determine channel type
         // DMs (im) have channel_type "im", group DMs are "mpim", channels are "channel"/"group"
         const channelType = message.channel_type || "";
@@ -222,6 +255,10 @@ export class SlackAdapter {
         const channelId = command.channel_id || "";
         const userId = command.user_id || "";
         const userName = command.user_name || userId;
+        // v4.20.2 — caller allowlist for slash commands.
+        if (!isSlackUserAllowed(userId)) {
+            return;
+        }
         const incoming = {
             platform: "slack",
             messageId: `cmd-${Date.now()}-${Math.random().toString(36).slice(2, 8)}`,
@@ -247,6 +284,10 @@ export class SlackAdapter {
         const userId = event.user || "";
         const channelId = event.channel || "";
         const messageId = event.ts || "";
+        // v4.20.2 — same caller allowlist as DMs.
+        if (!isSlackUserAllowed(userId)) {
+            return;
+        }
         // Strip the @mention from text
         text = text.replace(new RegExp(`<@${this.botUserId}>`, "g"), "").trim();
         if (!text)
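The three handlers above all call the same gate. A minimal sketch of that gate, assuming the allowlist and the logger are passed in rather than read from `config` and `console`, might look like this; `createSlackGate` is a hypothetical factory and the log text is shortened.

```javascript
// Hypothetical factory (not part of alvin-bot) reproducing the gate's two modes.
function createSlackGate(allowedUsers, log = console.warn) {
  const discovered = new Set(); // callers already logged in open mode

  return function isAllowed(userId) {
    if (allowedUsers.length === 0) {
      // Open (legacy) mode: accept everyone, log each unique caller once.
      if (userId && !discovered.has(userId)) {
        discovered.add(userId);
        log(`[slack] caller discovered: user=${userId}`);
      }
      return true;
    }
    // Locked mode: only listed IDs pass; an empty userId never matches.
    return allowedUsers.includes(userId);
  };
}

const demoGate = createSlackGate(["U0ABC123"]);
console.log(demoGate("U0ABC123"), demoGate("U0DEF456"));
```

Injecting the list and logger is a design choice made here for testability; the shipped code uses a module-level `Set` and `config.slackAllowedUsers` instead.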

package/dist/web/server.js CHANGED

@@ -1566,7 +1566,11 @@ function scheduleBindAttempt(port, attempt) {
     // invalid backlog, kernel hiccup) can throw synchronously. Catch here
     // so the main routine never crashes during web-UI bind.
     try {
-        server.listen(port, () => {
+        // v4.20.2 — bind to config.webHost (default 127.0.0.1) so the Web UI
+        // is loopback-only unless the operator opts in by setting WEB_HOST=0.0.0.0.
+        // Empty/"*" maps to all interfaces.
+        const bindHost = (config.webHost === "*" || config.webHost === "") ? undefined : config.webHost;
+        server.listen(port, bindHost, () => {
             if (handled)
                 return; // Should be impossible; paranoia.
             handled = true;
@@ -1587,10 +1591,17 @@ function scheduleBindAttempt(port, attempt) {
             server.on("error", (err) => {
                 console.warn(`[web] post-bind server error (ignored): ${err.message}`);
             });
-            console.log(`🌐 Web UI: http://localhost:${actualWebPort}`);
+            const bindLabel = bindHost && bindHost !== "127.0.0.1" && bindHost !== "::1"
+                ? `http://${bindHost}:${actualWebPort}` + (bindHost === "0.0.0.0" ? " (LAN-reachable)" : "")
+                : `http://localhost:${actualWebPort}`;
+            console.log(`🌐 Web UI: ${bindLabel}`);
             if (actualWebPort !== originalPort) {
                 console.log(`   (Port ${originalPort} was busy, using ${actualWebPort} instead)`);
             }
+            if (bindHost === "0.0.0.0" && !process.env.WEB_PASSWORD) {
+                console.warn("⚠️ Web UI is bound to 0.0.0.0 but WEB_PASSWORD is empty — anyone on the LAN can log in. " +
+                    "Set WEB_PASSWORD in ~/.alvin-bot/.env or set WEB_HOST=127.0.0.1.");
+            }
         });
     }
     catch (err) {
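The host resolution and the startup label in this hunk can be sketched as two pure functions. `resolveBindHost` and `bindLabel` are hypothetical names; the logic is copied from the diff, with the port passed in as a parameter.

```javascript
// Hypothetical helpers (not part of alvin-bot) mirroring the hunk's logic.
function resolveBindHost(webHost) {
  // "*" or "" means "no host argument", which Node treats as all interfaces.
  return webHost === "*" || webHost === "" ? undefined : webHost;
}

function bindLabel(bindHost, port) {
  if (bindHost && bindHost !== "127.0.0.1" && bindHost !== "::1") {
    const suffix = bindHost === "0.0.0.0" ? " (LAN-reachable)" : "";
    return `http://${bindHost}:${port}${suffix}`;
  }
  return `http://localhost:${port}`;
}

console.log(bindLabel(resolveBindHost("127.0.0.1"), 3100));
console.log(bindLabel(resolveBindHost("0.0.0.0"), 3100));
console.log(bindLabel(resolveBindHost("*"), 3100));
```

Under this logic `WEB_HOST=*` resolves to `undefined`, so the label falls back to `http://localhost:…` and a check of the form `bindHost === "0.0.0.0"` does not match it. Operators who want the empty-password warning may prefer to set `0.0.0.0` explicitly.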

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "alvin-bot",
-  "version": "4.20.1",
+  "version": "4.21.0",
   "description": "Alvin Bot — Your personal AI agent on Telegram, WhatsApp, Discord, Signal, and Web.",
   "type": "module",
   "main": "dist/index.js",

package/skills/agent-browser/SKILL.md ADDED

@@ -0,0 +1,183 @@
+---
+name: Agent Browser (Snapshot+Ref)
+description: Token-efficient browser automation via the `agent-browser` CLI (Vercel Labs). Uses accessibility-tree snapshots with @eN refs (~200–400 tokens per page) instead of raw HTML parsing — typically 90%+ cheaper than Playwright/Puppeteer. Use for click-fill-extract on public pages, single-page test flows, structured form submission, and screenshots-with-refs. Optional dependency — only active if `agent-browser` is on the PATH; otherwise the regular Browser Automation skill takes over.
+triggers: snapshot the page, get refs, list interactive elements, click @e, fill @e, agent-browser, click button on, click the button, fill in the field, extract from page, find on page, scrape page interactively, visit and click, open page and click, navigate and fill, semantic locator, accessibility tree, snapshot+ref, schau auf der Seite nach, klicke auf den Button, fülle das Feld, formular ausfüllen
+priority: 9
+category: automation
+---
+# Agent Browser — Token-Efficient Snapshot+Ref Workflow
+Use this skill when interactive browser automation is needed (click, fill,
+extract, screenshot) AND `agent-browser` is installed. The accessibility-tree
+snapshot makes per-page interaction roughly an order of magnitude cheaper in
+tokens than parsing rendered HTML with Playwright.
+## Pre-flight: is the CLI installed?
+```bash
+command -v agent-browser >/dev/null 2>&1 \
+  && echo "agent-browser ok" \
+  || echo "fall back to the Browser Automation skill"
+```
+If absent: **stop and use the regular Browser Automation skill** (Tier 1
+Stealth / Tier 2 CDP). Don't suggest installing it unless the user asks —
+it's an opt-in tool, see `alvin-bot doctor` for installation hints.
+## Core loop
+```bash
+agent-browser open <url>
+agent-browser snapshot -i               # interactive elements, with @e1..@eN refs
+agent-browser click @e3                 # act on a ref
+agent-browser snapshot -i               # CRITICAL — re-snapshot after every page change
+agent-browser close
+```
+Refs (`@e1`, `@e2`, …) are **assigned fresh every snapshot**. They go stale
+the moment the page changes (click that navigates, form submit, dynamic
+re-render, modal open). Always re-snapshot before the next ref interaction.
+This single rule is the most common pitfall.
+A snapshot looks like:
+```
+Page: Example - Log in
+URL: https://example.com/login
+@e1 [heading] "Log in"
+@e2 [form]
+  @e3 [input type="email"] placeholder="Email"
+  @e4 [input type="password"] placeholder="Password"
+  @e5 [button type="submit"] "Continue"
+  @e6 [link] "Forgot password?"
+```
+## Common patterns
+### Read a page
+```bash
+agent-browser snapshot -i               # interactive only (preferred)
+agent-browser snapshot -i -u            # include href URLs on links
+agent-browser snapshot -i --json        # machine-readable
+agent-browser get text @e1              # visible text of an element
+agent-browser get attr @e10 href        # any attribute
+agent-browser get url                   # current URL
+```
+### Interact
+```bash
+agent-browser click @e1
+agent-browser fill @e2 "user@example.com"  # clear + type
+agent-browser type @e2 " more text"        # type without clearing
+agent-browser press Enter
+agent-browser select @e4 "option-value"
+agent-browser upload @e5 file.pdf
+agent-browser scroll down 500
+agent-browser screenshot result.png
+```
+### Wait for the right thing (most failures come from bad waits)
+```bash
+agent-browser wait @e1                     # until an element appears
+agent-browser wait --text "Success"        # until specific text on the page
+agent-browser wait --url "**/dashboard"    # until URL matches glob
+agent-browser wait --load networkidle      # post-navigation catch-all
+```
+Avoid bare `wait 2000` except in throwaway debugging. Default timeout: 25 s.
+### Find by semantics when refs aren't ergonomic
+```bash
+agent-browser find role button click --name "Submit"
+agent-browser find text "Sign In" click --exact
+agent-browser find label "Email" fill "user@example.com"
+agent-browser find placeholder "Search" type "query"
+agent-browser find testid "submit-btn" click
+```
+### Multiple isolated browser sessions (parallel users)
+```bash
+agent-browser --session a open https://app.example.com
+agent-browser --session b open https://app.example.com
+agent-browser --session a fill @e1 "alice@test.com"
+agent-browser --session b fill @e1 "bob@test.com"
+```
+### Persist login across runs
+```bash
+# Save once after a successful login:
+agent-browser state save ./auth.json
+# Resume already-logged-in:
+agent-browser --state ./auth.json open https://app.example.com
+```
+### Auth vault (don't put passwords in shell history)
+```bash
+agent-browser auth save my-app --url https://app.example.com/login \
+  --username user@example.com --password-stdin
+# (paste password, Ctrl+D)
+agent-browser auth login my-app
+```
+### Iframes
+Iframes are inlined in the snapshot — refs work transparently. To scope a
+snapshot to one iframe:
+```bash
+agent-browser frame @e3
+agent-browser snapshot -i
+agent-browser frame main
+```
+### Mock network (testing)
+```bash
+agent-browser network route "**/api/users" --body '{"users":[]}'
+agent-browser network route "**/analytics" --abort
+agent-browser network har start /tmp/trace.har
+# ... do stuff ...
+agent-browser network har stop
+```
+## When NOT to use this skill
+| Situation | Skill |
+|---|---|
+| Bot-protected site (Cloudflare, DataDome) | regular **Browser Automation** skill, Tier 1 Stealth |
+| Logged-in personal account on LinkedIn / Gmail | **Browser Automation**, Tier 2 CDP (`alvin-bot browser …`) |
+| User wants to watch a complex flow live | **Browser Automation**, Tier 3 Extension |
+| Static HTML / public JSON / RSS / API | `curl` / WebFetch — no browser engine needed |
+agent-browser is great for **task automation on cooperative pages** (your
+own apps, public data sites, form submissions). It is *not* a stealth tool.
+## Diagnostics
+```bash
+agent-browser doctor                # full env check
+agent-browser doctor --quick        # local-only
+agent-browser dashboard start       # observability UI on :4848
+agent-browser skills get core       # the upstream tool's own usage guide
+```
+## One-liner sanity test
+```bash
+agent-browser open https://example.com \
+  && agent-browser snapshot -i \
+  && agent-browser close
+```
+Expect two `@e` refs (heading + link). If that works, the tool is healthy.
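To illustrate the "re-snapshot before every ref interaction" rule from the skill, the sketch below extracts the `@eN` refs from a snapshot so that a caller can compare two snapshots. It assumes the line format shown in the skill's illustrative login example; real `agent-browser` output may differ, and `extractRefs` is a hypothetical helper, not part of the CLI.

```javascript
// Hypothetical parser for the snapshot format shown in the skill's example.
function extractRefs(snapshot) {
  const refs = [];
  for (const line of snapshot.split("\n")) {
    // Matches e.g.  @e5 [button type="submit"] "Continue"
    const m = line.match(/^\s*(@e\d+)\s+\[([^\]\s]+)[^\]]*\](?:\s+"([^"]*)")?/);
    if (m) refs.push({ ref: m[1], role: m[2], label: m[3] ?? null });
  }
  return refs;
}

const loginSnapshot = [
  "Page: Example - Log in",
  "URL: https://example.com/login",
  '@e1 [heading] "Log in"',
  "@e2 [form]",
  '  @e3 [input type="email"] placeholder="Email"',
  '  @e5 [button type="submit"] "Continue"',
].join("\n");

console.log(extractRefs(loginSnapshot));
```

Because refs are reassigned on every snapshot, a list like this is only valid until the next page change; it should be rebuilt from a fresh `snapshot -i` rather than cached.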

package/skills/browse/SKILL.md CHANGED

@@ -15,12 +15,20 @@ Du hast drei Browser-Strategien plus WebFetch. **Wähle die billigste passende S
 | Task | Tool | Why |
 |------|------|-------|
 | Single public page, text only | `curl` or WebFetch | Fastest, no browser engine |
+| Interactive (click/fill/extract) on a cooperative page | **Tier 1.5 agent-browser** *(if installed)* | Snapshot+Ref workflow is ~90% cheaper in tokens than raw Playwright. See the "Agent Browser" skill. |
 | Public page with JS / Cloudflare | **Tier 1 Stealth** | Headless + fingerprint masking |
 | Login-required page (LinkedIn, Gmail, …) | **Tier 2 CDP** | Real Chromium, persistent cookies |
 | Complex multi-step flow, user should watch | **Tier 3 Extension** | Only in interactive CLI sessions |
 **NEVER** use a bare `node -e "const {chromium}…"` for external sites: it gets blocked immediately.
+**Check up front whether agent-browser is available:**
+```bash
+command -v agent-browser >/dev/null 2>&1 && echo "Tier 1.5 available"
+```
+If yes and the task is "click X, read Y, fill in Z" → use the `agent-browser` skill.
+If not → continue with Tier 1/2/3 as described below. Install at the user's request: `npm i -g agent-browser && agent-browser install`.
 ---
 ## Tier 0 — curl / WebFetch (fastest path)