alvin-bot 4.20.1 → 4.21.0

package/CHANGELOG.md CHANGED
@@ -2,6 +2,40 @@
2
2
 
3
3
  All notable changes to Alvin Bot are documented here.
4
4
 
5
+ ## [4.21.0] — 2026-05-04
6
+
7
+ ### 🌐 New skill: Agent Browser (Tier-1.5)
8
+
9
+ Adds a new bundled skill, `skills/agent-browser/SKILL.md`, that teaches the bot to use the `agent-browser` CLI when it's available. Agent Browser is a [Vercel Labs](https://github.com/vercel-labs/agent-browser) tool that exposes pages as accessibility-tree snapshots with `@e1`, `@e2`, … refs — interactions cost ~200–400 tokens per turn instead of parsing rendered HTML, which is roughly 90 % cheaper than a Playwright/Puppeteer-driven flow.
10
+
11
+ The skill is **opt-in by install, not by config**: it only activates when `command -v agent-browser` succeeds. No new dependency in `package.json`, no postinstall hook, no extra disk on a fresh install. Existing browser strategies (Tier 1 Stealth, Tier 2 CDP, Tier 3 Extension) keep working untouched and remain the right tool for stealth scraping, logged-in personal accounts, and watch-along flows.
12
+
13
+ The bundled `Browser Automation` skill (`skills/browse/SKILL.md`) was updated to route the bot to the Agent Browser skill first when the binary is on the PATH and the task is interactive (click/fill/extract on cooperative pages).
14
+
15
+ `alvin-bot doctor` shows a new `Browser tools:` section reporting whether agent-browser is installed, and gives the one-liner install command if not:
16
+
17
+ ```
18
+ npm i -g agent-browser && agent-browser install
19
+ ```
20
+
21
+ The first command pulls the Node CLI; the second downloads a private Chrome-for-Testing build into `~/.agent-browser/`. Together about 240 MB — that's why we don't bundle it.
22
+
23
+ No code changes in the bot's core pipeline. Existing users notice nothing unless they install the CLI.
24
+
25
+ ## [4.20.2] — 2026-05-04
26
+
27
+ ### 🛡️ Security: Web UI loopback by default + Slack caller allowlist
28
+
29
+ Two real attack surfaces closed.
30
+
31
+ **Web UI binds to 127.0.0.1 by default.** Previous versions called `server.listen(port)` with no host argument, which Node interprets as "listen on all interfaces". Combined with an empty `WEB_PASSWORD` (which the login route silently treats as "anyone can log in"), this meant any device on the same LAN could log into the bot's Web UI and reach every authenticated endpoint — user list, memory contents, model switch, the WebSocket chat, etc. New default: bind to `127.0.0.1`. To restore LAN access, set `WEB_HOST=0.0.0.0` explicitly in `.env`. If both `WEB_HOST=0.0.0.0` and an empty `WEB_PASSWORD` are present, the bot logs a loud warning on startup.
32
+
33
+ **Slack caller allowlist.** New `SLACK_ALLOWED_USERS` env var: comma-separated list of Slack user IDs allowed to talk to the bot (DMs, @mentions, slash commands). Empty list keeps the legacy behaviour — any workspace member can interact, which is safe iff the workspace is private to the operator. To find your Slack user ID: open your profile in Slack → "..." → "Copy member ID", or just message the bot once and read the line `[slack] caller discovered: user=U… — to lock the bot to specific users, add to .env: SLACK_ALLOWED_USERS=U…` from the logs (we log each unique caller once when the allowlist is empty).
34
+
35
+ **`alvin-bot doctor` now reports both.** New `Web UI:` and `Slack:` sections flag insecure combos and show whether an allowlist is active.
36
+
37
+ No schema or behaviour changes for users who already have `WEB_PASSWORD` set or only use the bot via Telegram. Telegram allowlist (`ALLOWED_USERS`) is unchanged.
38
+
5
39
  ## [4.20.1] — 2026-05-03
6
40
 
7
41
  ### 🛡️ Hardening for the v4.20.0 SQLite migration
package/bin/cli.js CHANGED
@@ -1361,6 +1361,53 @@ async function doctor() {
1361
1361
  console.log(` ❌ ALLOWED_USERS not set (nobody can message the bot)`);
1362
1362
  }
1363
1363
 
1364
+ // ── Web UI security ──
1365
+ console.log("\n Web UI:");
1366
+ const webHost = getEnv("WEB_HOST") || "127.0.0.1";
1367
+ const webPw = getEnv("WEB_PASSWORD");
1368
+ if (webHost === "127.0.0.1" || webHost === "::1") {
1369
+ console.log(` ✅ WEB_HOST=${webHost} — loopback only (LAN unreachable)`);
1370
+ } else if (webHost === "0.0.0.0" || webHost === "*") {
1371
+ if (webPw) {
1372
+ console.log(` ✅ WEB_HOST=${webHost} (LAN-reachable) + WEB_PASSWORD set`);
1373
+ } else {
1374
+ console.log(` ❌ WEB_HOST=${webHost} (LAN-reachable) WITHOUT WEB_PASSWORD — anyone on LAN can log in`);
1375
+ console.log(` Fix: set WEB_PASSWORD in .env, or set WEB_HOST=127.0.0.1`);
1376
+ }
1377
+ } else {
1378
+ console.log(` ℹ️ WEB_HOST=${webHost}${webPw ? " + WEB_PASSWORD set" : " — WEB_PASSWORD empty"}`);
1379
+ }
1380
+
1381
+ // ── Slack caller allowlist ──
1382
+ if (getEnv("SLACK_BOT_TOKEN")) {
1383
+ console.log("\n Slack:");
1384
+ const slackAllow = getEnv("SLACK_ALLOWED_USERS");
1385
+ if (slackAllow) {
1386
+ const ids = slackAllow.split(",").map(s => s.trim()).filter(Boolean);
1387
+ console.log(` ✅ SLACK_ALLOWED_USERS: ${ids.length} user${ids.length === 1 ? "" : "s"} (caller allowlist active)`);
1388
+ } else {
1389
+ console.log(` ⚠️ SLACK_ALLOWED_USERS not set — any workspace member can talk to the bot`);
1390
+ console.log(` Safe iff the Slack workspace is private to you. Otherwise add e.g.:`);
1391
+ console.log(` SLACK_ALLOWED_USERS=U0ABC123,U0DEF456`);
1392
+ }
1393
+ }
1394
+
1395
+ // ── Browser tools (optional Tier-1.5 agent-browser) ──
1396
+ console.log("\n Browser tools:");
1397
+ let agentBrowserVersion = "";
1398
+ try {
1399
+ agentBrowserVersion = execSync("agent-browser --version 2>/dev/null", { encoding: "utf-8", timeout: 3000 }).trim();
1400
+ } catch {}
1401
+ if (agentBrowserVersion) {
1402
+ // `agent-browser --version` prints "agent-browser X.Y.Z" — strip the prefix.
1403
+ const v = agentBrowserVersion.replace(/^agent-browser\s+/i, "");
1404
+ console.log(` ✅ agent-browser ${v} — Tier-1.5 (token-efficient snapshot+ref) available`);
1405
+ } else {
1406
+ console.log(` ℹ️ agent-browser not installed (optional Tier-1.5)`);
1407
+ console.log(` Install for ~90% cheaper interactive automation:`);
1408
+ console.log(` npm i -g agent-browser && agent-browser install`);
1409
+ }
1410
+
1364
1411
  // ── Memory (semantic search backend) ──
1365
1412
  console.log("\n Memory:");
1366
1413
  const embJson = resolve(DATA_DIR, "memory", ".embeddings.json");
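The `Browser tools:` probe added to `doctor()` in the hunk above can be looked at in isolation. What follows is a minimal sketch under stated assumptions, not the package's code: `parseAgentBrowserVersion` and `detectAgentBrowser` are hypothetical names, the process call is injected as a `run` function so the logic can be exercised without the CLI installed, and the version string `1.2.3` is a made-up value.

```javascript
// Hypothetical helper: normalise the output of `agent-browser --version`.
// The diff's own comment says the CLI prints "agent-browser X.Y.Z".
function parseAgentBrowserVersion(raw) {
  return String(raw ?? "").trim().replace(/^agent-browser\s+/i, "");
}

// Hypothetical helper: `run` is any function that executes the probe and
// returns its stdout as a string. With the real CLI it could wrap
// child_process.execFileSync("agent-browser", ["--version"], { encoding: "utf-8" }).
// A throw (binary missing, timeout) is treated as "not installed".
function detectAgentBrowser(run) {
  try {
    const version = parseAgentBrowserVersion(run());
    return version
      ? { installed: true, version }
      : { installed: false, version: "" };
  } catch {
    return { installed: false, version: "" };
  }
}

// Demo with stand-in runners instead of a real process call.
const found = detectAgentBrowser(() => "agent-browser 1.2.3\n");
const missing = detectAgentBrowser(() => { throw new Error("ENOENT"); });
console.log(found, missing);
```

Keeping the parsing separate from the process call means the "not installed" path and the prefix stripping can both be checked without touching the PATH.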
package/dist/config.js CHANGED
@@ -63,6 +63,20 @@ export const config = {
63
63
  sessionMode: (process.env.SESSION_MODE || "per-user"),
64
64
  webhookEnabled: process.env.WEBHOOK_ENABLED === "true",
65
65
  webhookToken: process.env.WEBHOOK_TOKEN || "",
66
+ // Web UI bind host. Default is 127.0.0.1 (loopback only) — set to "0.0.0.0"
67
+ // explicitly if you want LAN/external access. Combined with WEB_PASSWORD
68
+ // this is the safe default since v4.20.2; previous versions defaulted to
69
+ // listening on all interfaces with no auth required when WEB_PASSWORD was
70
+ // empty.
71
+ webHost: process.env.WEB_HOST || "127.0.0.1",
72
+ // Slack caller allowlist. Comma-separated Slack user IDs (e.g. "U0ABC123,U0DEF456").
73
+ // When non-empty, only these users can talk to the bot in Slack DMs, via @mention, and via slash commands.
74
+ // When empty, the bot accepts any Slack workspace member (legacy behavior; safe iff
75
+ // the workspace is private to you).
76
+ slackAllowedUsers: (process.env.SLACK_ALLOWED_USERS || "")
77
+ .split(",")
78
+ .map(s => s.trim())
79
+ .filter(Boolean),
66
80
  // Browser
67
81
  cdpUrl: process.env.CDP_URL || "",
68
82
  browseServerPort: Number(process.env.BROWSE_SERVER_PORT) || 3800,
@@ -18,6 +18,32 @@
18
18
  */
19
19
  import fs from "fs";
20
20
  import { parseSlackSlashCommand } from "./slack-slash-parser.js";
21
+ import { config } from "../config.js";
22
+ /**
23
+ * v4.20.2 — Slack caller allowlist. When SLACK_ALLOWED_USERS is set in the
24
+ * environment (comma-separated Slack user IDs), only those users get past
25
+ * this gate. When the list is empty, fall back to legacy behaviour: any
26
+ * member of the workspace can talk to the bot. The empty-list case is safe
27
+ * iff the workspace is private to the operator.
28
+ *
29
+ * Slack user IDs are workspace-scoped (e.g. "U0ABC123"); rotate the list if
30
+ * you migrate workspaces.
31
+ */
32
+ function isSlackUserAllowed(userId) {
33
+ if (config.slackAllowedUsers.length === 0) {
34
+ // No allowlist set — log each unique caller once so the operator can
35
+ // copy a known ID into SLACK_ALLOWED_USERS and lock the bot down.
36
+ if (userId && !discoveredCallers.has(userId)) {
37
+ discoveredCallers.add(userId);
38
+ console.warn(`[slack] caller discovered: user=${userId} — to lock the bot to specific users, ` +
39
+ `add to .env: SLACK_ALLOWED_USERS=${userId}` +
40
+ (discoveredCallers.size > 1 ? ` (or comma-separate multiple)` : ""));
41
+ }
42
+ return true;
43
+ }
44
+ return config.slackAllowedUsers.includes(userId);
45
+ }
46
+ const discoveredCallers = new Set();
21
47
  let _slackState = {
22
48
  status: "disconnected",
23
49
  botName: null,
@@ -148,6 +174,13 @@ export class SlackAdapter {
148
174
  const userId = message.user || "";
149
175
  const channelId = message.channel || "";
150
176
  const messageId = message.ts || "";
177
+ // v4.20.2 — caller allowlist. If SLACK_ALLOWED_USERS is set, silently
178
+ // ignore anyone not on the list. Empty list = legacy behaviour
179
+ // (any workspace member can talk to the bot — safe iff the workspace
180
+ // is private to the operator).
181
+ if (!isSlackUserAllowed(userId)) {
182
+ return;
183
+ }
151
184
  // Determine channel type
152
185
  // DMs (im) have channel_type "im", group DMs are "mpim", channels are "channel"/"group"
153
186
  const channelType = message.channel_type || "";
@@ -222,6 +255,10 @@ export class SlackAdapter {
222
255
  const channelId = command.channel_id || "";
223
256
  const userId = command.user_id || "";
224
257
  const userName = command.user_name || userId;
258
+ // v4.20.2 — caller allowlist for slash commands.
259
+ if (!isSlackUserAllowed(userId)) {
260
+ return;
261
+ }
225
262
  const incoming = {
226
263
  platform: "slack",
227
264
  messageId: `cmd-${Date.now()}-${Math.random().toString(36).slice(2, 8)}`,
@@ -247,6 +284,10 @@ export class SlackAdapter {
247
284
  const userId = event.user || "";
248
285
  const channelId = event.channel || "";
249
286
  const messageId = event.ts || "";
287
+ // v4.20.2 — same caller allowlist as DMs.
288
+ if (!isSlackUserAllowed(userId)) {
289
+ return;
290
+ }
250
291
  // Strip the @mention from text
251
292
  text = text.replace(new RegExp(`<@${this.botUserId}>`, "g"), "").trim();
252
293
  if (!text)
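The allowlist gate from the hunks above (parse `SLACK_ALLOWED_USERS`, accept everyone while the list is empty, log each unique caller once) can be sketched as a small standalone unit. This is a hedged illustration rather than the adapter's code: `parseAllowlist` and `createSlackGate` are hypothetical names, the logger is injected so the behaviour can be observed, and the user IDs are the placeholder values used in the diff's own comments.

```javascript
// Parse a comma-separated env value into trimmed, non-empty IDs, mirroring
// the split/trim/filter chain shown in the config.js hunk.
function parseAllowlist(raw) {
  return (raw || "").split(",").map((s) => s.trim()).filter(Boolean);
}

// Hypothetical factory: returns the gate function. `log` defaults to
// console.warn but can be replaced, which keeps the sketch testable.
function createSlackGate(allowedUsers, log = console.warn) {
  const discovered = new Set();
  return function isAllowed(userId) {
    if (allowedUsers.length === 0) {
      // Legacy mode: accept everyone, but report each unique caller once so
      // the operator can copy the ID into SLACK_ALLOWED_USERS.
      if (userId && !discovered.has(userId)) {
        discovered.add(userId);
        log(`[slack] caller discovered: user=${userId}`);
      }
      return true;
    }
    return allowedUsers.includes(userId);
  };
}

const locked = createSlackGate(parseAllowlist(" U0ABC123, ,U0DEF456 "));
console.log(locked("U0ABC123")); // true
console.log(locked("U0XYZ999")); // false
```

Note that an empty `userId` is rejected once a list is active, because the empty string never appears in the parsed list.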
@@ -1566,7 +1566,11 @@ function scheduleBindAttempt(port, attempt) {
1566
1566
  // invalid backlog, kernel hiccup) can throw synchronously. Catch here
1567
1567
  // so the main routine never crashes during web-UI bind.
1568
1568
  try {
1569
- server.listen(port, () => {
1569
+ // v4.20.2 — bind to config.webHost (default 127.0.0.1) so the Web UI
1570
+ // is loopback-only unless the operator opts in by setting WEB_HOST=0.0.0.0.
1571
+ // "*" maps to all interfaces; an empty WEB_HOST already falls back to 127.0.0.1 in config.js.
1572
+ const bindHost = (config.webHost === "*" || config.webHost === "") ? undefined : config.webHost;
1573
+ server.listen(port, bindHost, () => {
1570
1574
  if (handled)
1571
1575
  return; // Should be impossible; paranoia.
1572
1576
  handled = true;
@@ -1587,10 +1591,17 @@ function scheduleBindAttempt(port, attempt) {
1587
1591
  server.on("error", (err) => {
1588
1592
  console.warn(`[web] post-bind server error (ignored): ${err.message}`);
1589
1593
  });
1590
- console.log(`🌐 Web UI: http://localhost:${actualWebPort}`);
1594
+ const bindLabel = bindHost && bindHost !== "127.0.0.1" && bindHost !== "::1"
1595
+ ? `http://${bindHost}:${actualWebPort}` + (bindHost === "0.0.0.0" ? " (LAN-reachable)" : "")
1596
+ : `http://localhost:${actualWebPort}`;
1597
+ console.log(`🌐 Web UI: ${bindLabel}`);
1591
1598
  if (actualWebPort !== originalPort) {
1592
1599
  console.log(` (Port ${originalPort} was busy, using ${actualWebPort} instead)`);
1593
1600
  }
1601
+ if (bindHost === "0.0.0.0" && !process.env.WEB_PASSWORD) {
1602
+ console.warn("⚠️ Web UI is bound to 0.0.0.0 but WEB_PASSWORD is empty — anyone on the LAN can log in. " +
1603
+ "Set WEB_PASSWORD in ~/.alvin-bot/.env or set WEB_HOST=127.0.0.1.");
1604
+ }
1594
1605
  });
1595
1606
  }
1596
1607
  catch (err) {
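One detail in the two hunks above is worth spelling out: with `WEB_HOST=*`, `bindHost` becomes `undefined`, so the startup label and the `bindHost === "0.0.0.0"` warning check, as written, appear not to cover that case even though the server listens on all interfaces. The sketch below shows one way the three decisions (listen host, label, warning) could be derived together. It is an assumption-laden illustration, not the package's code: `resolveWebBind` is a hypothetical helper, treating `*` like `0.0.0.0` for the warning is a design assumption, and port `3100` is a made-up value.

```javascript
// Hypothetical helper (not in the package): derive the listen host, the
// startup label, and the "no password" warning from WEB_HOST / WEB_PASSWORD.
function resolveWebBind(webHostEnv, webPassword, port) {
  const webHost = webHostEnv || "127.0.0.1"; // same default as the config.js hunk
  const allInterfaces = webHost === "*" || webHost === "0.0.0.0";
  const loopback = webHost === "127.0.0.1" || webHost === "::1";
  return {
    // Passing undefined as the host leaves the choice to Node, which then
    // listens on the unspecified address (all interfaces).
    bindHost: webHost === "*" ? undefined : webHost,
    label: loopback
      ? `http://localhost:${port}`
      : `http://${webHost}:${port}` + (allInterfaces ? " (LAN-reachable)" : ""),
    // Assumption: "*" should warn exactly like "0.0.0.0".
    warnNoPassword: allInterfaces && !webPassword,
  };
}

console.log(resolveWebBind("", "", 3100));
console.log(resolveWebBind("*", "", 3100));
console.log(resolveWebBind("0.0.0.0", "secret", 3100));
```

Computing all three values from the configured host, rather than from the already-translated `bindHost`, keeps the label and the warning in step with what the socket actually does.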
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "alvin-bot",
3
- "version": "4.20.1",
3
+ "version": "4.21.0",
4
4
  "description": "Alvin Bot — Your personal AI agent on Telegram, WhatsApp, Discord, Signal, and Web.",
5
5
  "type": "module",
6
6
  "main": "dist/index.js",
@@ -0,0 +1,183 @@
1
+ ---
2
+ name: Agent Browser (Snapshot+Ref)
3
+ description: Token-efficient browser automation via the `agent-browser` CLI (Vercel Labs). Uses accessibility-tree snapshots with @eN refs (~200–400 tokens per page) instead of raw HTML parsing — typically 90%+ cheaper than Playwright/Puppeteer. Use for click-fill-extract on public pages, single-page test flows, structured form submission, and screenshots-with-refs. Optional dependency — only active if `agent-browser` is on the PATH; otherwise the regular Browser Automation skill takes over.
4
+ triggers: snapshot the page, get refs, list interactive elements, click @e, fill @e, agent-browser, click button on, click the button, fill in the field, extract from page, find on page, scrape page interactively, visit and click, open page and click, navigate and fill, semantic locator, accessibility tree, snapshot+ref, schau auf der Seite nach, klicke auf den Button, fülle das Feld, formular ausfüllen
5
+ priority: 9
6
+ category: automation
7
+ ---
8
+
9
+ # Agent Browser — Token-Efficient Snapshot+Ref Workflow
10
+
11
+ Use this skill when interactive browser automation is needed (click, fill,
12
+ extract, screenshot) AND `agent-browser` is installed. The accessibility-tree
13
+ snapshot makes per-page interaction roughly an order of magnitude cheaper in
14
+ tokens than parsing rendered HTML with Playwright.
15
+
16
+ ## Pre-flight: is the CLI installed?
17
+
18
+ ```bash
19
+ command -v agent-browser >/dev/null 2>&1 \
20
+ && echo "agent-browser ok" \
21
+ || echo "fall back to the Browser Automation skill"
22
+ ```
23
+
24
+ If absent: **stop and use the regular Browser Automation skill** (Tier 1
25
+ Stealth / Tier 2 CDP). Don't suggest installing it unless the user asks —
26
+ it's an opt-in tool, see `alvin-bot doctor` for installation hints.
27
+
28
+ ## Core loop
29
+
30
+ ```bash
31
+ agent-browser open <url>
32
+ agent-browser snapshot -i # interactive elements, with @e1..@eN refs
33
+ agent-browser click @e3 # act on a ref
34
+ agent-browser snapshot -i # CRITICAL — re-snapshot after every page change
35
+ agent-browser close
36
+ ```
37
+
38
+ Refs (`@e1`, `@e2`, …) are **assigned fresh every snapshot**. They go stale
39
+ the moment the page changes (click that navigates, form submit, dynamic
40
+ re-render, modal open). Always re-snapshot before the next ref interaction.
41
+ This single rule is the most common pitfall.
42
+
43
+ A snapshot looks like:
44
+
45
+ ```
46
+ Page: Example - Log in
47
+ URL: https://example.com/login
48
+
49
+ @e1 [heading] "Log in"
50
+ @e2 [form]
51
+ @e3 [input type="email"] placeholder="Email"
52
+ @e4 [input type="password"] placeholder="Password"
53
+ @e5 [button type="submit"] "Continue"
54
+ @e6 [link] "Forgot password?"
55
+ ```
56
+
57
+ ## Common patterns
58
+
59
+ ### Read a page
60
+
61
+ ```bash
62
+ agent-browser snapshot -i # interactive only (preferred)
63
+ agent-browser snapshot -i -u # include href URLs on links
64
+ agent-browser snapshot -i --json # machine-readable
65
+ agent-browser get text @e1 # visible text of an element
66
+ agent-browser get attr @e10 href # any attribute
67
+ agent-browser get url # current URL
68
+ ```
69
+
70
+ ### Interact
71
+
72
+ ```bash
73
+ agent-browser click @e1
74
+ agent-browser fill @e2 "user@example.com" # clear + type
75
+ agent-browser type @e2 " more text" # type without clearing
76
+ agent-browser press Enter
77
+ agent-browser select @e4 "option-value"
78
+ agent-browser upload @e5 file.pdf
79
+ agent-browser scroll down 500
80
+ agent-browser screenshot result.png
81
+ ```
82
+
83
+ ### Wait for the right thing (most failures come from bad waits)
84
+
85
+ ```bash
86
+ agent-browser wait @e1 # until an element appears
87
+ agent-browser wait --text "Success" # until specific text on the page
88
+ agent-browser wait --url "**/dashboard" # until URL matches glob
89
+ agent-browser wait --load networkidle # post-navigation catch-all
90
+ ```
91
+
92
+ Avoid bare `wait 2000` except in throwaway debugging. Default timeout: 25 s.
93
+
94
+ ### Find by semantics when refs aren't ergonomic
95
+
96
+ ```bash
97
+ agent-browser find role button click --name "Submit"
98
+ agent-browser find text "Sign In" click --exact
99
+ agent-browser find label "Email" fill "user@example.com"
100
+ agent-browser find placeholder "Search" type "query"
101
+ agent-browser find testid "submit-btn" click
102
+ ```
103
+
104
+ ### Multiple isolated browser sessions (parallel users)
105
+
106
+ ```bash
107
+ agent-browser --session a open https://app.example.com
108
+ agent-browser --session b open https://app.example.com
109
+ agent-browser --session a fill @e1 "alice@test.com"
110
+ agent-browser --session b fill @e1 "bob@test.com"
111
+ ```
112
+
113
+ ### Persist login across runs
114
+
115
+ ```bash
116
+ # Save once after a successful login:
117
+ agent-browser state save ./auth.json
118
+
119
+ # Resume already-logged-in:
120
+ agent-browser --state ./auth.json open https://app.example.com
121
+ ```
122
+
123
+ ### Auth vault (don't put passwords in shell history)
124
+
125
+ ```bash
126
+ agent-browser auth save my-app --url https://app.example.com/login \
127
+ --username user@example.com --password-stdin
128
+ # (paste password, Ctrl+D)
129
+
130
+ agent-browser auth login my-app
131
+ ```
132
+
133
+ ### Iframes
134
+
135
+ Iframes are inlined in the snapshot — refs work transparently. To scope a
136
+ snapshot to one iframe:
137
+
138
+ ```bash
139
+ agent-browser frame @e3
140
+ agent-browser snapshot -i
141
+ agent-browser frame main
142
+ ```
143
+
144
+ ### Mock network (testing)
145
+
146
+ ```bash
147
+ agent-browser network route "**/api/users" --body '{"users":[]}'
148
+ agent-browser network route "**/analytics" --abort
149
+ agent-browser network har start /tmp/trace.har
150
+ # ... do stuff ...
151
+ agent-browser network har stop
152
+ ```
153
+
154
+ ## When NOT to use this skill
155
+
156
+ | Situation | Skill |
157
+ |---|---|
158
+ | Bot-protected site (Cloudflare, DataDome) | regular **Browser Automation** skill, Tier 1 Stealth |
159
+ | Logged-in personal account on LinkedIn / Gmail | **Browser Automation**, Tier 2 CDP (`alvin-bot browser …`) |
160
+ | User wants to watch a complex flow live | **Browser Automation**, Tier 3 Extension |
161
+ | Static HTML / public JSON / RSS / API | `curl` / WebFetch — no browser engine needed |
162
+
163
+ agent-browser is great for **task automation on cooperative pages** (your
164
+ own apps, public data sites, form submissions). It is *not* a stealth tool.
165
+
166
+ ## Diagnostics
167
+
168
+ ```bash
169
+ agent-browser doctor # full env check
170
+ agent-browser doctor --quick # local-only
171
+ agent-browser dashboard start # observability UI on :4848
172
+ agent-browser skills get core # the upstream tool's own usage guide
173
+ ```
174
+
175
+ ## One-liner sanity test
176
+
177
+ ```bash
178
+ agent-browser open https://example.com \
179
+ && agent-browser snapshot -i \
180
+ && agent-browser close
181
+ ```
182
+
183
+ Expect two `@e` refs (heading + link). If that works, the tool is healthy.
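The core rule of the skill file above (refs go stale after every page change, so re-snapshot before the next ref interaction) can be expressed as a small driver loop. The sketch below is a hedged illustration, not part of the package: `runFlow`, `decide`, and `maxSteps` are hypothetical names, and the CLI is represented by an injected `run` function that takes an argument list and returns stdout. With the real tool, `run` could wrap `child_process.execFileSync("agent-browser", args, { encoding: "utf-8" })`; the subcommands used are the ones listed in the skill file.

```javascript
// Hypothetical driver for the open -> snapshot -> act -> re-snapshot loop.
// `decide(snapshot, step)` returns the next argument list (for example
// ["click", "@e3"]) or a falsy value to stop. It only ever sees the latest
// snapshot, so it cannot act on refs from an earlier page state.
function runFlow(run, url, decide, maxSteps = 10) {
  run(["open", url]);
  try {
    let snapshot = run(["snapshot", "-i"]);
    for (let step = 0; step < maxSteps; step++) {
      const action = decide(snapshot, step);
      if (!action) break;
      run(action);
      snapshot = run(["snapshot", "-i"]); // refs issued before the action are stale now
    }
    return snapshot;
  } finally {
    run(["close"]); // release the browser even if an action throws
  }
}

// Demo with a recording stand-in for the real CLI.
const calls = [];
const fakeRun = (args) => {
  calls.push(args.join(" "));
  return `output of: ${args.join(" ")}`;
};
const script = [["fill", "@e3", "user@example.com"], ["click", "@e5"]];
runFlow(fakeRun, "https://example.com/login", (_snapshot, step) => script[step] || null);
console.log(calls);
```

In a real flow `decide` would read the ref for each step out of the snapshot text it receives, instead of using a fixed script as the demo does.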
@@ -15,12 +15,20 @@ You have three browser strategies plus WebFetch. **Pick the cheapest suitable s
15
15
  | Task | Tool | Why |
16
16
  |------|------|-------|
17
17
  | Single public page, text only | `curl` or WebFetch | Fastest, no browser engine |
18
+ | Interactive (click/fill/extract) on a cooperative page | **Tier 1.5 agent-browser** *(if installed)* | The snapshot+ref workflow uses ~90 % fewer tokens than raw Playwright. See the "Agent Browser" skill. |
18
19
  | Public page with JS / Cloudflare | **Tier 1 Stealth** | Headless + fingerprint masking |
19
20
  | Page that requires login (LinkedIn, Gmail, …) | **Tier 2 CDP** | Real Chromium, persistent cookies |
20
21
  | Complex multi-step flow, user should watch | **Tier 3 Extension** | Only in interactive CLI sessions |
21
22
 
22
23
  **NEVER** run a bare `node -e "const {chromium}…"` against external sites: it gets blocked immediately.
23
24
 
25
+ **First check whether agent-browser is available:**
26
+ ```bash
27
+ command -v agent-browser >/dev/null 2>&1 && echo "Tier 1.5 available"
28
+ ```
29
+ If yes and the task is "click X, read Y, fill in Z" → use the `agent-browser` skill.
30
+ If no → continue with Tier 1/2/3 as described below. Install only if the user asks: `npm i -g agent-browser && agent-browser install`.
31
+
24
32
  ---
25
33
 
26
34
  ## Tier 0 — curl / WebFetch (fastest path)