alvin-bot 4.21.0 → 4.22.1

Files changed (41)
  1. package/02_yandex.png +0 -0
  2. package/CHANGELOG.md +67 -0
  3. package/README.md +151 -244
  4. package/bin/cli.js +48 -15
  5. package/dist/handlers/commands.js +6 -1
  6. package/dist/services/cron-scheduling.js +109 -29
  7. package/dist/services/embeddings/auto-detect.js +74 -0
  8. package/dist/services/embeddings/fts5.js +108 -0
  9. package/dist/services/embeddings/gemini.js +65 -0
  10. package/dist/services/embeddings/index.js +499 -0
  11. package/dist/services/embeddings/ollama.js +78 -0
  12. package/dist/services/embeddings/openai.js +49 -0
  13. package/dist/services/embeddings/provider.js +22 -0
  14. package/dist/services/embeddings/vector-base.js +113 -0
  15. package/dist/services/embeddings.js +6 -502
  16. package/dist/services/memory-inject-mode.js +43 -0
  17. package/dist/services/memory-layers.js +24 -15
  18. package/dist/services/memory.js +19 -13
  19. package/package.json +1 -1
  20. package/docs/screenshots/00-Login.png +0 -0
  21. package/docs/screenshots/01-Chat-Dark-Conversation.png +0 -0
  22. package/docs/screenshots/02-Chat.png +0 -0
  23. package/docs/screenshots/03-Dashboard-Overview.png +0 -0
  24. package/docs/screenshots/04-AI-Models-and-Providers.png +0 -0
  25. package/docs/screenshots/05-Personality-Editor.png +0 -0
  26. package/docs/screenshots/06-Memory-Manager.png +0 -0
  27. package/docs/screenshots/07-Active-Sessions.png +0 -0
  28. package/docs/screenshots/08-File-Browser.png +0 -0
  29. package/docs/screenshots/09-Scheduled-Jobs.png +0 -0
  30. package/docs/screenshots/10-Custom-Tools.png +0 -0
  31. package/docs/screenshots/11-Plugins-and-MCP.png +0 -0
  32. package/docs/screenshots/12-Messaging-Platforms.png +0 -0
  33. package/docs/screenshots/12.1-Messaging-Platforms-WhatsApp-Groups-List.png +0 -0
  34. package/docs/screenshots/12.2-Messaging-Platforms-WA-Group-Details.png +0 -0
  35. package/docs/screenshots/13-User-Management.png +0 -0
  36. package/docs/screenshots/14-Web-Terminal.png +0 -0
  37. package/docs/screenshots/15-Maintenance-and-Health.png +0 -0
  38. package/docs/screenshots/16-Settings-and-Env.png +0 -0
  39. package/docs/screenshots/TG-commands.png +0 -0
  40. package/docs/screenshots/TG.png +0 -0
  41. package/docs/screenshots/_Mac-Installer.png +0 -0
package/02_yandex.png ADDED
Binary file
package/CHANGELOG.md CHANGED
@@ -2,6 +2,73 @@
2
2
 
3
3
  All notable changes to Alvin Bot are documented here.
4
4
 
5
+ ## [4.22.1] — 2026-05-09
6
+
7
+ ### Fixed
8
+
9
+ - **Cron scheduler:** prevent duplicate catch-up runs for already-attempted slots. When a daily/scheduled job fired but crashed before completion, the boot-time catch-up would re-fire it hours later (e.g. an `0 8 * * *` job retried at 11:00 after a 10:51 reboot). The catch-up now skips jobs whose current schedule slot has already been attempted: for cron expressions, `lastAttemptAt >= mostRecentPastTrigger` short-circuits the rewind; for interval schedules, `now - lastAttemptAt < intervalMs` does the same. Crashed runs from *previous* slots within the 6h grace window still catch up as before. New `prevCronRun()` helper in `cron-scheduling.ts`. ([#cron-catchup-bug](src/services/cron-scheduling.ts))
10
+
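The slot check described in the 4.22.1 entry can be sketched roughly as below. This is an illustration under stated assumptions, not the package's code: `shouldCatchUp`, the `job` object shape, and the way the grace window is applied are hypothetical, and `mostRecentPastTrigger` is passed in as a plain timestamp because the signature of `prevCronRun()` is not shown in the changelog.

```javascript
// Hypothetical sketch of the boot-time catch-up guard.
// Only lastAttemptAt, mostRecentPastTrigger, intervalMs and the 6h grace
// window come from the changelog; everything else is an assumption.
const GRACE_MS = 6 * 60 * 60 * 1000;

function shouldCatchUp(job, now) {
  if (job.kind === "cron") {
    const slot = job.mostRecentPastTrigger; // ms timestamp of the latest due slot
    // Assumed: slots older than the grace window are never caught up.
    if (now - slot > GRACE_MS) return false;
    // The current slot was already attempted (even if it crashed): skip.
    if (job.lastAttemptAt != null && job.lastAttemptAt >= slot) return false;
    return true;
  }
  // Interval schedule: skip while the last attempt is still inside the interval.
  if (job.lastAttemptAt != null && now - job.lastAttemptAt < job.intervalMs) {
    return false;
  }
  return true;
}
```

With this shape, a job with an `0 8 * * *` schedule that crashed at 08:00 is not re-fired after a 10:51 reboot, while a job whose last attempt belongs to the previous day's slot still catches up.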
11
+ ## [4.22.0] — 2026-05-05
12
+
13
+ ### 🧠 Memory architecture overhaul: pluggable providers + smart inject
14
+
15
+ Public users without `GOOGLE_API_KEY` (the v4.20–v4.21 default for embeddings) now get a working indexed memory store out of the box. The embeddings layer is refactored behind a provider interface with four backends auto-detected at startup:
16
+
17
+ | Tier | Provider | Setup | Cost | Dim |
18
+ |---|---|---|---|---|
19
+ | 1 | Gemini (`gemini-embedding-001`) | `GOOGLE_API_KEY` | free tier | 3072 |
20
+ | 2 | OpenAI (`text-embedding-3-small`) | `OPENAI_API_KEY` | ~$0.02 / 1M tokens | 1536 |
21
+ | 3 | Ollama (default `nomic-embed-text`) | `ollama pull nomic-embed-text` | free, local, private | 768 |
22
+ | 4 | **FTS5 (BM25 keyword)** | nothing | free | n/a |
23
+
24
+ The FTS5 fallback is the headline: SQLite's built-in full-text-search virtual table with BM25 ranking. No API key, no network, no setup. Indexes the same chunks as the vector providers (`MEMORY.md`, daily logs, project files, hub memory, asset index) and ranks matches by relevance. Excellent for proper-noun and exact-term lookups (project names, commands, error messages); weaker than vector search for synonyms and conceptual paraphrase queries — but available everywhere.
25
+
26
+ **Upgrade path.** A user starts on FTS5 (no keys needed). Later they set `GOOGLE_API_KEY` in their `.env` → next bot start detects the schema mismatch via `meta.embedding_model`, drops the FTS5 table, initialises the vector schema, and reindexes. Same in reverse. All seamless, no manual steps.
27
+
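The upgrade path above boils down to comparing the model recorded in the store with the provider chosen at startup. A minimal sketch, assuming the stored value is read from `meta.embedding_model` as a string (or `null` for a fresh store); the function name and return values are hypothetical:

```javascript
// Hypothetical sketch: decide what to do with the index at startup.
// storedModel: value of meta.embedding_model, or null if the store is new.
// activeModel: identifier of the provider selected for this run.
function planIndexMigration(storedModel, activeModel) {
  if (storedModel == null) return "init";        // fresh store: create schema
  if (storedModel === activeModel) return "keep"; // schema matches: nothing to do
  return "rebuild"; // mismatch: drop old table, init new schema, reindex
}
```

Switching from `fts5-bm25` to `gemini-embedding-001` (or back) would land in the `"rebuild"` branch under these assumptions.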
28
+ Override the auto-detection with `EMBEDDINGS_PROVIDER=gemini|openai|ollama|fts5|auto` (default `auto`).
29
+
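The tier order and the override can be sketched like this. The env var names and the gemini → openai → ollama → fts5 order come from the changelog; `pickProvider`, the `ollamaAvailable` flag (how the bot actually probes for Ollama is not described here), and throwing on an unknown override are assumptions made for illustration.

```javascript
// Hypothetical sketch of tiered provider selection.
const PROVIDERS = ["gemini", "openai", "ollama", "fts5"];

function pickProvider(env, ollamaAvailable = false) {
  const override = (env.EMBEDDINGS_PROVIDER || "auto").toLowerCase();
  if (override !== "auto") {
    // Assumed behaviour: reject values outside the documented set.
    if (!PROVIDERS.includes(override)) {
      throw new Error(`Unknown EMBEDDINGS_PROVIDER: ${override}`);
    }
    return override;
  }
  if (env.GOOGLE_API_KEY) return "gemini"; // tier 1
  if (env.OPENAI_API_KEY) return "openai"; // tier 2
  if (ollamaAvailable) return "ollama";    // tier 3, local
  return "fts5";                           // tier 4, zero-config fallback
}
```

Calling `pickProvider(process.env)` on a machine with no keys and no Ollama falls through to `fts5`, which matches the zero-config behaviour described above.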
30
+ ### ✂️ MEMORY.md no longer bulk-injected into every system prompt (when SQLite is populated)
31
+
32
+ Pre-v4.22, `MEMORY.md` (typically tens of KB of curated long-term knowledge) and the last two daily logs were plain-text-injected into the system prompt on **every turn**. With a populated SQLite store, the same content is available via the smaller, query-targeted `searchMemory()` retrieval — much smaller prompts, much more relevant context.
33
+
34
+ New `MEMORY_INJECT_MODE` env var:
35
+
36
+ - `auto` (default) — sqlite when the store has indexed entries, else legacy
37
+ - `legacy` — pre-v4.22 behaviour, full plain-text inject every turn
38
+ - `sqlite` — never plain-text-inject `MEMORY.md` or daily logs (force smart mode regardless of store state)
39
+
40
+ Always plain-text injected regardless of mode: `identity.md` (L0) and `preferences.md` (L1) — these are tiny by design and contain always-on facts that semantic search may miss for short or generic queries. Recommended pattern: keep critical "never X" / "always Y" rules in `preferences.md`, let the bulk knowledge live in `MEMORY.md` and be retrieved on demand.
41
+
42
+ For users still on the legacy monolithic `MEMORY.md` setup (no `identity.md`, no `preferences.md`), auto mode kicks in only after the SQLite store is populated — until then, plain-text injection of `MEMORY.md` continues to work as before. Zero-touch upgrade.
43
+
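The three modes can be resolved with a few lines. This is a sketch only: `resolveInjectMode`, the `indexedEntries` argument, and treating unrecognised values as `auto` are assumptions; the mode names and the "sqlite when the store has indexed entries, else legacy" rule come from the list above.

```javascript
// Hypothetical sketch of the MEMORY_INJECT_MODE resolver.
const INJECT_MODES = ["auto", "legacy", "sqlite"];

function resolveInjectMode(env, indexedEntries) {
  const raw = (env.MEMORY_INJECT_MODE || "auto").toLowerCase();
  // Assumed: anything outside the documented set is treated as auto.
  const requested = INJECT_MODES.includes(raw) ? raw : "auto";
  if (requested !== "auto") {
    return { requested, effective: requested };
  }
  // auto: use smart retrieval only once the store actually has content.
  return { requested, effective: indexedEntries > 0 ? "sqlite" : "legacy" };
}
```

Returning both the requested and the effective mode mirrors the `Inject mode: sqlite (auto)` line that `alvin-bot doctor` prints.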
44
+ ### 🔇 Quieter logs for missing keys
45
+
46
+ The `⚠️ Embeddings init failed: Google API key not configured` warning is gone — that startup line is now `ℹ️ Memory provider: fts5-bm25 (keyword-local). Initial index will run on first use.` Public users without Gemini no longer see a scary warning that suggested the bot was broken when in fact it was working correctly.
47
+
48
+ ### 🩺 `alvin-bot doctor` Memory section expanded
49
+
50
+ Reports the active provider, dimension, indexed entry/file counts, last-reindex timestamp, and effective inject mode. For not-yet-initialised stores it predicts which provider will run on first start so users can confirm the auto-detection picked what they expected.
51
+
52
+ ```
53
+ Memory:
54
+ ✅ Provider: gemini-embedding-001 (vector-cloud, 3072-dim)
55
+ 3827 entries / 316 files indexed, 48.8 MB on disk
56
+ Last reindex: 25 h ago
57
+ Inject mode: sqlite (auto)
58
+ ```
59
+
60
+ ### Architecture
61
+
62
+ - New: `src/services/embeddings/` directory — `provider.ts` (interface), `vector-base.ts` (shared vector logic), `gemini.ts`, `openai.ts`, `ollama.ts`, `fts5.ts`, `auto-detect.ts`, `index.ts` (facade)
63
+ - New: `src/services/memory-inject-mode.ts` — env resolver
64
+ - Updated: `src/services/memory-layers.ts`, `src/services/memory.ts` — gate plain-text injection on inject mode
65
+ - `src/services/embeddings.ts` is now a thin re-export shim — all existing imports keep working
66
+
67
+ ### Tests
68
+
69
+ - 24 new tests across FTS5 provider, auto-detection, and inject-mode resolver
70
+ - All 535 existing tests still pass (one pre-existing port-binding flake in `web-server-integration.test.ts` is unrelated)
71
+
5
72
  ## [4.21.0] — 2026-05-04
6
73
 
7
74
  ### 🌐 New skill: Agent Browser (Tier-1.5)
package/README.md CHANGED
@@ -1,104 +1,49 @@
1
1
  # 🤖 Alvin Bot — Autonomous AI Agent
2
2
 
3
- > Your personal AI assistant — on Telegram, WhatsApp, Discord, Signal, Terminal, and Web.
3
+ > Your personal AI agent — on Telegram, WhatsApp, Discord, Slack, Signal, Terminal, and Web.
4
4
 
5
- Alvin Bot is an open-source, self-hosted AI agent that lives where you chat. Built on a multi-model engine with full system access, memory, plugins, and a rich web dashboard. Not just a chatbot, but an autonomous agent that remembers, acts, and learns.
5
+ Open-source, self-hosted, multi-model. Lives where you chat, has full shell + filesystem access, remembers across sessions, and dispatches detached sub-agents for long-running work. Built on the Claude Agent SDK with a provider-agnostic engine that also drives OpenAI, Groq, Gemini, NVIDIA NIM, OpenRouter, and Ollama.
6
6
 
7
+ > **What's new — v4.22 (May 2026):** Pluggable memory backends — Gemini · OpenAI · Ollama · **FTS5 keyword fallback (zero-config)**. Users without an embedding API key now get a working indexed memory store out of the box. Smart inject mode trims ~25 k tokens per turn off long system prompts. [Full changelog →](CHANGELOG.md)
7
8
 
8
9
  ---
9
10
 
10
- ## 📸 Preview
11
-
12
- <table>
13
- <tr>
14
- <td align="center"><b>💬 Chat (Dark Mode)</b><br><img src="docs/screenshots/01-Chat-Dark-Conversation.png" width="400"></td>
15
- <td align="center"><b>📊 Dashboard</b><br><img src="docs/screenshots/03-Dashboard-Overview.png" width="400"></td>
16
- </tr>
17
- <tr>
18
- <td align="center"><b>🤖 AI Models & Providers</b><br><img src="docs/screenshots/04-AI-Models-and-Providers.png" width="400"></td>
19
- <td align="center"><b>🎭 Personality Editor</b><br><img src="docs/screenshots/05-Personality-Editor.png" width="400"></td>
20
- </tr>
21
- <tr>
22
- <td align="center"><b>💬 Telegram</b><br><img src="docs/screenshots/TG.png" width="400"></td>
23
- <td align="center"><b>📱 Messaging Platforms</b><br><img src="docs/screenshots/12-Messaging-Platforms.png" width="400"></td>
24
- </tr>
25
- <tr>
26
- <td align="center"><b>🔧 Custom Tools</b><br><img src="docs/screenshots/10-Custom-Tools.png" width="400"></td>
27
- <td align="center"><b>🩺 Health & Maintenance</b><br><img src="docs/screenshots/15-Maintenance-and-Health.png" width="400"></td>
28
- </tr>
29
- </table>
30
-
31
- <details>
32
- <summary><b>🖼️ More Screenshots</b> (click to expand)</summary>
33
- <br>
34
-
35
- | Feature | Screenshot |
36
- |---------|-----------|
37
- | Login | <img src="docs/screenshots/00-Login.png" width="500"> |
38
- | Chat (Light) | <img src="docs/screenshots/02-Chat.png" width="500"> |
39
- | Memory Manager | <img src="docs/screenshots/06-Memory-Manager.png" width="500"> |
40
- | Active Sessions | <img src="docs/screenshots/07-Active-Sessions.png" width="500"> |
41
- | File Browser | <img src="docs/screenshots/08-File-Browser.png" width="500"> |
42
- | Scheduled Jobs | <img src="docs/screenshots/09-Scheduled-Jobs.png" width="500"> |
43
- | Plugins & MCP | <img src="docs/screenshots/11-Plugins-and-MCP.png" width="500"> |
44
- | WhatsApp Groups | <img src="docs/screenshots/12.1-Messaging-Platforms-WhatsApp-Groups-List.png" width="500"> |
45
- | WA Group Details | <img src="docs/screenshots/12.2-Messaging-Platforms-WA-Group-Details.png" width="500"> |
46
- | User Management | <img src="docs/screenshots/13-User-Management.png" width="500"> |
47
- | Web Terminal | <img src="docs/screenshots/14-Web-Terminal.png" width="500"> |
48
- | Settings & Env | <img src="docs/screenshots/16-Settings-and-Env.png" width="500"> |
49
- | Telegram Commands | <img src="docs/screenshots/TG-commands.png" width="500"> |
50
- | macOS Installer | <img src="docs/screenshots/_Mac-Installer.png" width="500"> |
51
-
52
- </details>
53
-
54
- ---
55
-
56
-
57
11
  ## ✨ Features
58
12
 
59
13
  ### 🧠 Intelligence
60
- - **Multi-Model Engine** — Claude (Agent SDK with full tool use), OpenAI, Groq, NVIDIA NIM, Google Gemini, OpenRouter, or any OpenAI-compatible API
61
- - **Automatic Fallback** — If one provider fails, seamlessly tries the next
62
- - **Heartbeat Monitor** — Pings providers every 5 minutes, auto-failover after 2 failures, auto-recovery
63
- - **User-Configurable Fallback Order** — Rearrange provider priority via Telegram (`/fallback`), Web UI, or API
64
- - **Adjustable Thinking** — From quick answers (`/effort low`) to deep analysis (`/effort max`)
65
- - **Persistent Memory** — Remembers across sessions via vector-indexed knowledge base; session state (Claude SDK resume tokens, conversation history, language, effort) survives bot restarts (v4.11.0)
66
- - **Multi-Session Workspaces** — Run multiple parallel, context-isolated sessions on the same bot — one per Slack channel or per Telegram `/workspace` — each with its own working directory, purpose, and persona. Memory, skills, and sub-agents stay globally shared (v4.12.0). [How-to ↓](#-multi-session-workspaces-v4120)
67
- - **Truly Detached Sub-Agents** — Claude dispatches long-running research/audit tasks via the `alvin_dispatch_agent` MCP tool, which spawns independent `claude -p` subprocesses with their own PID + process group. Main session stays fully responsive, user can interrupt freely without killing sub-agents. Results deliver as separate messages. Works identically on Telegram, Slack, Discord, and WhatsApp (v4.13.0+ dispatch, v4.14.0 multi-platform)
68
- - **Smart Tool Discovery** — Scans your system at startup, knows exactly what CLI tools, plugins, and APIs are available
69
- - **Skill System** — 12 built-in SKILL.md files (code, data analysis, email, docs, research, sysadmin, browse, etc.) auto-activate based on message context
70
- - **Self-Awareness** — Knows it IS the AI model, so it won't call external APIs for tasks it can do itself
71
- - **Automatic Language Detection** — Detects user language (EN/DE/ES/FR) and adapts; learns preference over time
14
+ - **Multi-model engine** — Claude Agent SDK · OpenAI · Groq · NVIDIA NIM · Gemini · OpenRouter · Ollama · Codex CLI · any OpenAI-compatible API
15
+ - **Automatic fallback + heartbeat monitor** — pings providers every 5 min, auto-failover after 2 failures, auto-recovery; reorder priority via Telegram `/fallback`, Web UI, or API
16
+ - **Adjustable thinking depth** — `/effort low` to `/effort max`
17
+ - **Pluggable memory backends (v4.22)** — Gemini · OpenAI · Ollama · FTS5 keyword fallback. Auto-detection picks the best available. Indexed search across `MEMORY.md`, daily logs, project files, hub memory, asset index. Override via `EMBEDDINGS_PROVIDER`.
18
+ - **Smart system-prompt injection (v4.22)** — once SQLite is populated, stops bulk-injecting `MEMORY.md` and surfaces only the chunks relevant to the user's current message. Cuts ~25 k tokens per turn for typical setups. `MEMORY_INJECT_MODE=auto|legacy|sqlite` to override.
19
+ - **Layered memory (L0–L3)** — `identity.md` + `preferences.md` always plain-text · project memories on topic match · daily logs / curated knowledge via semantic or keyword search
20
+ - **Persistent sessions** — Claude SDK resume tokens, conversation history, language, effort survive bot restarts
21
+ - **Multi-session workspaces** — parallel context-isolated sessions per Slack channel or `/workspace` switch, each with its own cwd, purpose, persona. Memory + skills stay globally shared. [How-to ↓](#-multi-session-workspaces-v4120)
22
+ - **Detached sub-agents** — `alvin_dispatch_agent` MCP tool spawns independent `claude -p` subprocesses that survive parent aborts. Results deliver as separate messages. Works identically on Telegram / Slack / Discord / WhatsApp.
23
+ - **Smart tool discovery** — scans your system at startup; typical install surfaces 30–70 tools depending on what's locally available
24
+ - **Skill system** — 14 SKILL.md files (see [Skills ↓](#-skills)) auto-activate based on message context
25
+ - **Self-awareness + auto-language** — knows it IS the AI · detects EN/DE/ES/FR and adapts; learns preference over time
72
26
 
73
27
  ### 💬 Multi-Platform
74
- - **Telegram** — Full-featured with streaming, inline keyboards, voice, photos, documents
75
- - **Slack** — Socket Mode bot via `@slack/bolt`, DMs + @mentions, file attachments, reactions, `assistant.threads.setStatus` typing indicator. **One channel = one isolated workspace.** See [Multi-Session Workspaces](#-multi-session-workspaces-v4120) below.
76
- - **WhatsApp** — Via WhatsApp Web: self-chat as AI notepad, group whitelist with per-contact access control, full media support (photos, docs, audio, video)
77
- - **WhatsApp Group Approval** — Owner gets approval requests via Telegram (or WhatsApp DM fallback) before the bot responds to group messages. Silent — group members see nothing.
78
- - **Discord** — Server bot with mention/reply detection, slash commands
79
- - **Signal** — Via signal-cli REST API with voice transcription
80
- - **Terminal** — Rich TUI with ANSI colors and streaming (`alvin-bot tui`)
81
- - **Web UI** — Full dashboard with chat, settings, file manager, terminal, workspace overview
28
+ - **Telegram** — streaming, inline keyboards, voice, photos, documents
29
+ - **Slack** — Socket Mode via `@slack/bolt`, DMs + @mentions, file attachments, `assistant.threads.setStatus` typing. **One channel = one isolated workspace.**
30
+ - **WhatsApp** — via WhatsApp Web; self-chat as AI notepad, group whitelist with per-contact access, full media. Owner approval gate routes to Telegram (DM / Discord / Signal fallback) before the bot replies.
31
+ - **Discord** — server bot with mention/reply detection and slash commands
32
+ - **Signal** — via signal-cli REST API, voice transcription
33
+ - **Terminal** — rich TUI with ANSI colors + streaming (`alvin-bot tui`)
34
+ - **Web UI** — full dashboard, chat, settings, file manager, terminal, workspace overview
82
35
 
83
36
  ### 🔧 Capabilities
84
- - **52+ Built-in Tools** — Shell, files, email, screenshots, PDF, media, git, system control
85
- - **Plugin System** — 6 built-in plugins (weather, finance, notes, calendar, email, smarthome)
86
- - **MCP Client** — Connect any Model Context Protocol server
87
- - **Cron Jobs** — Scheduled tasks with AI-driven creation ("check my email every morning")
88
- - **Voice** — Speech-to-text (Groq Whisper) + text-to-speech (Edge TTS)
89
- - **Vision** — Photo analysis, document scanning, screenshot understanding
90
- - **Image Generation** — Via Google Gemini / DALL·E (with API key)
91
- - **Web Browsing** — Fetch and summarize web pages
37
+ - **Tool layer** — Shell · files · Python · git · email · PDF · media · vision · screenshots · system control. Universal tool use across any provider that supports function calling; text-only fallback for those that don't.
38
+ - **6 built-in plugins:** weather · finance · notes · calendar · email · smarthome
39
+ - **MCP client** — connect any Model Context Protocol server
40
+ - **Cron** — AI-driven scheduled tasks (`"check my email every morning"`)
41
+ - **Voice** — STT via Groq Whisper, TTS via Edge TTS or ElevenLabs
42
+ - **Vision + image generation** — photo / document analysis · Gemini / DALL·E generation with API key
43
+ - **Browser** — 4-tier strategy: WebFetch · stealth Playwright · CDP with persistent profile · agent-browser CLI (Tier-1.5, opt-in)
92
44
 
93
45
  ### 🖥️ Web Dashboard
94
- - **Live Chat**: WebSocket streaming, same experience as Telegram
95
- - **Model Switcher** — Change AI models on the fly
96
- - **Platform Setup** — Configure all messengers and providers via UI, WhatsApp group management inline
97
- - **File Manager** — Browse, edit, create files in the working directory
98
- - **Memory Editor** — View and edit the agent's knowledge base
99
- - **Session Browser** — Inspect conversation history
100
- - **Terminal** — Run commands directly from the browser
101
- - **Maintenance** — Health checks, backups, bot controls
46
+ - WebSocket streaming chat · model switcher · platform & provider setup · file manager · memory editor · session browser · in-browser terminal · maintenance + health · workspace cards with cost aggregation
102
47
 
103
48
  ---
104
49
 
@@ -121,7 +66,7 @@ Free AI providers available — no credit card needed. **Privacy-first?** Pick t
121
66
 
122
67
  ### 📘 First-time setup walkthroughs
123
68
 
124
- Step-by-step guides with screenshots and screen-for-screen instructions:
69
+ Step-by-step printable PDF guides:
125
70
 
126
71
  | Platform | PDF (printable) |
127
72
  |---|---|
@@ -262,69 +207,72 @@ If your AI provider isn't working, run `doctor` — it tests the actual API conn
262
207
 
263
208
  ## 🏗️ Architecture
264
209
 
265
- ```
266
- ┌──────────────┐
267
- │ Web UI │ (Dashboard, Workspaces, Chat, Settings)
268
- └──────┬───────┘
269
- HTTP/WS
270
- ┌──────────┐ ┌───────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
271
- │ Telegram │ │ Slack │ │ WhatsApp │ │ Discord │ │ Signal │
272
- └────┬─────┘ └───┬───┘ └────┬─────┘ └────┬─────┘ └────┬─────┘
273
- │ │ │ │ │
274
- └────────────┴───────────┴─────────────┴──────────────┘
275
-
276
- ┌─────────┴──────────┐
277
- │ Workspace Resolver │ (per-channel context: cwd + persona)
278
- └─────────┬──────────┘
279
-
280
- ┌──────┴───────┐
281
- │ Engine │ (Query routing, fallback)
282
- └──────┬───────┘
283
-
284
- ┌────────────────┼────────────────┐
285
- │ │ │
286
- ┌──────┴──────┐ ┌─────┴──────┐ ┌──────┴──────┐
287
- │ Claude SDK │ │ OpenAI │ │ Custom │
288
- │ (full agent)│ │ Compatible │ │ Models │
289
- └─────────────┘ └────────────┘ └─────────────┘
210
+ ```mermaid
211
+ flowchart TB
212
+ classDef ent fill:#1e293b,color:#e2e8f0,stroke:#475569
213
+ classDef core fill:#0f172a,color:#f1f5f9,stroke:#3b82f6,stroke-width:2px
214
+ classDef prov fill:#1e293b,color:#cbd5e1,stroke:#64748b
215
+ classDef mem fill:#1e293b,color:#cbd5e1,stroke:#10b981
216
+
217
+ TG[Telegram]:::ent
218
+ SL[Slack]:::ent
219
+ WA[WhatsApp]:::ent
220
+ DC[Discord]:::ent
221
+ SG[Signal]:::ent
222
+ WEB[Web UI · TUI · CLI]:::ent
223
+
224
+ TG & SL & WA & DC & SG & WEB --> WR
225
+
226
+ WR[Workspace Resolver<br/>per-channel cwd + persona]:::core
227
+ WR --> ENG[Engine<br/>routing · fallback · heartbeat]:::core
228
+
229
+ ENG --> CL[Claude SDK]:::prov
230
+ ENG --> OAI[OpenAI · Groq · Gemini ·<br/>NVIDIA · OpenRouter]:::prov
231
+ ENG --> OL[Ollama · Codex CLI ·<br/>OpenAI-compatible]:::prov
232
+
233
+ ENG -.reads.-> MEM
234
+ MEM[Memory Layer]:::mem
235
+ MEM --> L0[L0 / L1<br/>identity.md · preferences.md<br/>always plain-text]:::mem
236
+ MEM --> SQL[SQLite store · provider auto-detect<br/>Gemini · OpenAI · Ollama · FTS5]:::mem
237
+
238
+ ENG -.dispatches.-> SUB[Detached sub-agents<br/>independent claude -p]:::prov
290
239
  ```
291
240
 
292
- ### Provider Types
241
+ ### Provider matrix
293
242
 
294
- | Provider | Tool Use | Streaming | Vision | Auth |
295
- |----------|----------|-----------|--------|------|
296
- | Claude SDK | ✅ Full (native Bash, Read, Write, Web) | ✅ | ✅ | Claude CLI (OAuth) |
297
- | OpenAI, Groq, Gemini | ✅ Full (Shell, Files, Python, Web) | ✅ | Varies | API Key |
298
- | NVIDIA NIM | ✅ Full (Shell, Files, Python, Web) | ✅ | Varies | API Key (free) |
299
- | OpenRouter | ✅ Full (Shell, Files, Python, Web) | ✅ | | API Key |
300
- | Other OpenAI-compatible | ⚡ Auto-detect | ✅ | Varies | API Key |
243
+ | Provider | Tool use | Streaming | Vision | Auth |
244
+ |---|---|---|---|---|
245
+ | Claude SDK (Agent) | ✅ native (Bash, Read, Write, Web, MCP) | ✅ | ✅ | Claude CLI OAuth |
246
+ | OpenAI · Groq · Gemini · NVIDIA NIM · OpenRouter | ✅ universal tool use | ✅ | varies | API key |
247
+ | Ollama (local) | ✅ via tool-bridge | ✅ | varies | none |
248
+ | Codex CLI | ✅ subprocess | ✅ | | Codex CLI auth |
249
+ | Any OpenAI-compatible | ⚡ auto-detect | ✅ | varies | API key |
301
250
 
302
- > **Universal Tool Use:** Alvin Bot gives full agent capabilities to *any* provider that supports function calling — not just Claude. Shell commands, file operations, Python execution, web search, and more work across all major providers. If a provider doesn't support tool calls, Alvin Bot automatically falls back to text-only chat mode.
251
+ > **Universal tool use:** Alvin gives full agent powers to any provider that supports function calling. Shell · files · Python · web work everywhere; providers without tool calls degrade cleanly to text-only chat.
303
252
 
304
- ### Project Structure
253
+ ### Project layout
305
254
 
306
255
  ```
307
- alvin-bot/
308
- ├── src/
309
- │ ├── index.ts # Entry point
310
- │ ├── engine.ts # Multi-model query engine
311
- │ ├── config.ts # Configuration
312
- │ ├── handlers/ # Message & command handlers
313
- │ ├── middleware/ # Auth & access control
314
- │ ├── platforms/ # Telegram, Slack, WhatsApp, Discord, Signal adapters
315
- │ ├── providers/ # AI provider implementations
316
- │ ├── services/ # Memory, voice, cron, plugins, workspaces, tool discovery
317
- │ ├── tui/ # Terminal UI
318
- │ └── web/ # Web server, APIs, setup wizard
319
- ├── web/public/ # Web UI (HTML/CSS/JS, zero build step)
320
- ├── plugins/ # Plugin directory (6 built-in)
321
- ├── docs/
322
- │ ├── install/ # Setup guides (macOS, Windows, Slack)
323
- │ └── custom-models.json # Custom model configurations
324
- ├── TOOLS.md # Custom tool definitions (Markdown)
325
- ├── SOUL.md # Agent personality
326
- ├── bin/cli.js # CLI entry point
327
- └── ecosystem.config.cjs # PM2 configuration
256
+ src/
257
+ ├── index.ts entry point
258
+ ├── engine.ts multi-model query engine
259
+ ├── handlers/ message + command handlers
260
+ ├── platforms/ Telegram · Slack · WhatsApp · Discord · Signal
261
+ ├── providers/ Claude SDK · OpenAI-compat · Ollama · Codex CLI
262
+ ├── services/
263
+ │ ├── embeddings/ v4.22 pluggable provider facade (Gemini/OpenAI/Ollama/FTS5)
264
+ │ ├── memory*.ts layered memory (L0-L3) + inject-mode resolver
265
+ │ ├── workspaces.ts per-channel cwd + persona registry
266
+ │ ├── alvin-dispatch.ts detached sub-agent orchestration
267
+ │ ├── browser-manager.ts 4-tier browser strategy
268
+ │ └── … cron · voice · skills · MCP · hooks · …
269
+ ├── tui/ terminal chat UI
270
+ └── web/ dashboard server + APIs
271
+ web/public/ zero-build HTML/CSS/JS UI
272
+ plugins/ 6 built-in plugins (hot-reload)
273
+ skills/ 14 SKILL.md files (hot-reload)
274
+ bin/cli.js CLI entry point
275
+ electron/ Electron wrapper for the .dmg build
328
276
  ```
329
277
 
330
278
  ---
@@ -368,7 +316,7 @@ The `cwd` auto-loads the project-specific `CLAUDE.md` via Claude SDK's `settingS
368
316
  ### Slack setup (5 minutes)
369
317
 
370
318
  1. Download the setup guide + manifest from the [latest release](https://github.com/alvbln/Alvin-Bot/releases/latest):
371
- - `slack-setup.md` — step-by-step instructions with screenshots
319
+ - `slack-setup.md` — step-by-step instructions
372
320
  - `slack-manifest.json` — copy-paste ready Slack App manifest
373
321
  2. Create a Slack App from the manifest at https://api.slack.com/apps → **Create New App** → **From an app manifest**
374
322
  3. Enable Socket Mode, generate an **App-Level Token** (starts with `xapp-`)
@@ -433,6 +381,13 @@ OPENROUTER_API_KEY=<key> # OpenRouter (100+ models)
433
381
  PRIMARY_PROVIDER=claude-sdk # Primary AI provider
434
382
  FALLBACK_PROVIDERS=nvidia-kimi-k2.5,nvidia-llama-3.3-70b
435
383
 
384
+ # Memory backend (v4.22+) — auto-detects based on what keys you have.
385
+ # Set to override the default priority: gemini → openai → ollama → fts5.
386
+ # fts5 is the zero-config keyword fallback — no key needed, works for everyone.
387
+ EMBEDDINGS_PROVIDER=auto # auto | gemini | openai | ollama | fts5
388
+ OLLAMA_EMBEDDING_MODEL=nomic-embed-text # only used for ollama provider
389
+ MEMORY_INJECT_MODE=auto # auto | legacy | sqlite (see CHANGELOG v4.22)
390
+
436
391
  # Optional Platforms
437
392
  WHATSAPP_ENABLED=true # Enable WhatsApp (needs Chrome)
438
393
  DISCORD_TOKEN=<token> # Enable Discord
@@ -541,18 +496,26 @@ Plugins are auto-loaded at startup. Create your own by adding a directory with a
541
496
 
542
497
  ## 🎯 Skills
543
498
 
544
- Built-in skills in `skills/`:
499
+ Skills are markdown files in `skills/` that auto-activate when the user's message matches their trigger keywords. The skill body gets injected into the system prompt, giving the agent specialized expertise on demand. 14 ship built-in:
545
500
 
546
- | Skill | Triggers | Description |
547
- |-------|----------|-------------|
548
- | code-project | code, build, implement, debug, refactor | Software development workflows, architecture patterns |
549
- | data-analysis | analyze, chart, csv, excel, statistics | Data processing, visualization, statistical analysis |
550
- | document-creation | document, report, letter, pdf, write | Professional document creation and formatting |
551
- | email-summary | email, inbox, unread, newsletter | Email triage, summarization, priority sorting |
552
- | system-admin | server, deploy, docker, nginx, ssl | DevOps, deployment, system administration |
553
- | web-research | research, compare, find, review | Deep web research with source verification |
554
-
555
- Skills activate automatically when your message matches their trigger keywords. The skill's SKILL.md content is injected into the system prompt, giving the agent specialized expertise for that task.
501
+ | Skill | Description |
502
+ |---|---|
503
+ | **agent-browser** | Token-efficient web automation via the agent-browser CLI (accessibility-tree snapshots); Tier 1.5 of the browser stack |
504
+ | **apple-notes** | Read, create, search Apple Notes via AppleScript (macOS) |
505
+ | **browse** | 3-tier browser control: WebFetch · stealth Playwright · CDP with persistent profile |
506
+ | **code-project** | Software development workflows: build, debug, refactor, architecture patterns |
507
+ | **data-analysis** | CSV / JSON / Excel processing, charts, statistics via Python |
508
+ | **document-creation** | Professional PDFs, reports, letters with formatting |
509
+ | **email-summary** | Inbox triage, newsletter digests, priority sorting |
510
+ | **github** | Issues, PRs, releases, workflows via the `gh` CLI |
511
+ | **social-fetch** | Analyse Instagram / TikTok / YouTube / X URLs the user shares |
512
+ | **summarize** | Condense URLs, PDFs, long documents |
513
+ | **system-admin** | Server management, deploys, Docker, nginx, SSL |
514
+ | **weather** | Forecasts and conditions |
515
+ | **web-research** | Deep multi-source research with citation aggregation |
516
+ | **webcheck** | Security / SEO audit of a website |
517
+
518
+ Drop your own `<name>/SKILL.md` into `~/.alvin-bot/skills/` for hot-reload. List active skills via `/skills` or `alvin-bot skills`.
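Trigger-based activation can be pictured as a keyword scan over the incoming message. The sketch below is a guess at the general idea, not the bot's matcher: `matchSkills`, the skill object shape, and plain case-insensitive substring matching are all assumptions, and the trigger lists are taken from the earlier version of this table.

```javascript
// Hypothetical sketch: pick the skills whose trigger keywords occur in a message.
const exampleSkills = [
  { name: "code-project", triggers: ["code", "build", "implement", "debug", "refactor"] },
  { name: "system-admin", triggers: ["server", "deploy", "docker", "nginx", "ssl"] },
];

function matchSkills(message, skills) {
  const text = message.toLowerCase();
  return skills
    // Assumed: a skill activates if any trigger appears as a substring.
    .filter((skill) => skill.triggers.some((t) => text.includes(t.toLowerCase())))
    .map((skill) => skill.name);
}
```

A real matcher would likely need word boundaries or scoring to avoid false positives (for example `ssl` inside an unrelated word); substring matching is used here only to keep the sketch short.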
556
519
 
557
520
  ---
558
521
 
@@ -579,96 +542,40 @@ alvin-bot version # Show version
579
542
 
580
543
  ## 🗺️ Roadmap
581
544
 
582
- [x] **Phase 1**: Multi-Model Engine (provider abstraction, fallback chains)
583
- - [x] **Phase 2** — Memory System (vector search, user profiles, smart context)
584
- - [x] **Phase 3** — Rich Interactions (video messages, browser automation, email)
585
- - [x] **Phase 4** — Plugins & Tools (plugin ecosystem, MCP client, custom tools)
586
- [x] **Phase 5**: CLI Installer (setup wizard, Docker, health check)
587
- - [x] **Phase 6** — Web Dashboard (chat, settings, file manager, terminal)
588
- - [x] **Phase 7** — Multi-Platform (Telegram, Discord, WhatsApp, Signal adapters)
589
- [x] **Phase 8**: Universal Tool Use *(NEW)*. All providers get agent powers:
590
- - Shell execution, file read/write/edit, directory listing
591
- - Python execution (Excel, PDF, charts, data processing)
592
- - Web fetch & search
593
- - Auto-detect function calling support per provider
594
- - Graceful fallback to text-only for providers without tool support
595
- - [x] **Phase 9** — Skill System + Self-Awareness + Language Adaptation:
596
- - SKILL.md files for specialized domain knowledge (email, data analysis, code, docs, research, sysadmin)
597
- - ✅ Auto-matching: skill triggers activate contextual expertise on demand
598
- - Self-Awareness Core: agent knows it IS the AI (no external LLM calls for text tasks)
599
- - ✅ Automatic language detection and adaptation (EN default, learns user preference)
600
- - Human-readable cron schedules + visual schedule builder in WebUI
601
- - ✅ Platform Manager refactor: all adapters via unified registration system
602
- - Cron notifications for all platforms (Telegram, WhatsApp, Discord, Signal)
603
- - ✅ PM2 auto-refresh on Maintenance page
604
- - WhatsApp group whitelist with per-contact access control
605
- - Owner approval gate (Telegram → WhatsApp DM → Discord → Signal fallback)
606
- - Full media processing: photos, documents, audio/voice, video across all platforms
607
- - File Browser: create, edit, delete files with safety guards
608
- - Git history sanitized (personal data removed via git-filter-repo)
609
- - [x] **Phase 10** — Anthropic API Provider + WebUI Provider Management
610
- - [x] Anthropic API key test case in WebUI (validation endpoint)
611
- - [x] "Add Provider" flow in WebUI: add new providers post-setup without editing `.env`
612
- - [x] Claude SDK guided setup from WebUI (install check, login status, step-by-step)
613
- - [x] `.env.example` update with `ANTHROPIC_API_KEY`
614
- - [x] **Phase 11** — WebUI Professional Redesign
615
- - [x] Replace emoji icons with Lucide SVG icons (60+ icons, sidebar, pages, buttons)
616
- - [x] i18n framework (`i18n.js`) — bilingual DE/EN with browser-locale detection (~400 keys)
617
- - [x] Language toggle in sidebar footer (DE | EN)
618
- - [x] Typography upgrade (Inter webfont via Google Fonts)
619
- - [x] Gradient accents + subtle glassmorphism on cards
620
- - [x] Smooth page transitions (fade animation on page switch)
621
- - [x] Skeleton loading states + status pulse animations
622
- - [x] Command Palette (Cmd+K / Ctrl+K) with fuzzy search
623
- - [x] **Phase 12** — Native Installers (Non-Techie Friendly)
624
- - [x] Electron wrapper (embedded Node.js + WebUI + tray icon)
625
- - [x] macOS `.dmg` build via electron-builder (arm64)
626
- - [ ] Windows `.exe` (NSIS) via electron-builder
627
- - [ ] Linux `.AppImage` + `.deb` via electron-builder
628
- - [x] Auto-update mechanism (electron-updater)
629
- - [x] GUI Setup Wizard (provider selection, Telegram token, first-run experience)
630
- - [ ] Homebrew formula (`brew install alvin-bot`)
631
- - [ ] Scoop manifest for Windows
632
- - [ ] One-line install script
633
- - [x] Docker Compose polish (production-ready `docker-compose.yml`)
634
- - [x] **Phase 13** — npm publish (security audit)
635
- - [x] **Phase 14** — Async Sub-Agents (v4.10.0)
636
- - [x] `run_in_background: true` system prompt hint for Claude SDK
637
- - [x] Async-agent watcher polling `outputFile` JSONL, delivering results as separate messages
638
- - [x] Session-bound sub-agents (each session spawns its own background workers)
639
- - [x] **Phase 15** — Memory Persistence + Smart Loading (v4.11.0)
640
- - [x] Session persistence across bot restarts (debounced atomic flush, v2 envelope)
641
- - [x] SDK memory injection (MEMORY.md in every system prompt, not just tool-call dependent)
642
- - [x] Semantic recall on SDK first-turn via embeddings
643
- - [x] Layered memory stack (L0 identity / L1 preferences / L2 projects / L3 vector search)
644
- - [x] Auto-fact extraction during compaction (Mem0-style)
645
- - [x] **Phase 16** — Multi-Session + Slack Interface (v4.12.0)
646
- - [x] Session-key fix: platform-message.ts routes through `buildSessionKey()`
647
- - [x] Workspace registry with hot-reload (`~/.alvin-bot/workspaces/*.md`)
648
- - [x] Workspace resolver in platform handlers (per-channel persona + cwd)
649
- - [x] Slack adapter polish: progress ticker (`chat.update`), typing status (`assistant.threads.setStatus`), channel name cache
650
- - [x] Telegram `/workspace` + `/workspaces` commands (feature parity)
651
- - [x] Per-workspace cost aggregation + Web UI workspace cards
652
- - [x] Slack setup guide + copy-paste app manifest (in GitHub Release assets)
653
- - [x] **Phase 17** — Truly detached sub-agents + multi-platform dispatch (v4.13.0 – v4.14.2, 2026-04-16)
654
- - [x] `alvin_dispatch_agent` MCP tool — spawns independent `claude -p` subprocesses that survive parent aborts (v4.13.0)
655
- - [x] Slack `/alvin` slash command (namespaced parent with subcommands: status / new / effort / help + LLM fallthrough) (v4.13.2)
656
- - [x] Sub-agent dispatch on Slack, Discord, WhatsApp via platform-aware delivery registry (v4.14.0)
657
- - [x] `/subagents list` merged view — v4.0.0 bot-level agents + v4.13+ detached dispatches in one list (v4.14.1)
658
- - [x] Watcher zombie guard — missing outputFile > 10 min delivers as failed instead of 12h timeout (v4.14.2)
659
- - [x] Staleness-based partial output recovery for interrupted sub-agents (v4.12.4)
660
- - [ ] SQLite migration of the embeddings index (currently 128 MB JSON)
661
- - [ ] Per-workspace memory layer (additive over global) — facts learned in one workspace stay there unless explicitly promoted to global
662
- - [ ] Per-workspace provider override (`provider:` in frontmatter) — e.g. one workspace uses Claude Opus, another uses a cheaper model
663
- - [ ] Per-workspace skill allowlist — scope Apple Notes to personal workspace, sysadmin only to devops workspace, etc.
664
- - [ ] Multi-User Slack (real `per-channel-peer` mode) — different users in the same Slack channel get their own sub-sessions
665
- - [ ] Workspace cloning / templates — `/workspace clone my-project as my-fork` spins up a new workspace from an existing one
666
- - [ ] Daily log decay / archive — older daily logs move to cold storage after N days
667
- - [ ] **Phase 18** — Security + Platform hardening (from v4.12.1 audit, prioritized)
668
- - [ ] **P1 — Electron major upgrade** (35 → 41+) — fixes 1 HIGH + 5 MODERATE Electron CVEs in the Desktop-Build path. Major version jump, requires full rebuild + test of `.dmg` flow. Separate release (likely bundled with Windows `.exe` work).
669
- - [ ] **P1 — Prompt injection defense strategy** — not a single fix but a design debate: heuristic filters vs allow-list vs no-sandbox-accept-the-risk. Currently handled as a documented design-constraint (README security section), not as a code filter. When we decide the policy, implement it across all message entry points.
670
- - [ ] **P2 — TypeScript 5 → 6 upgrade** — major release, likely breaking changes in strict mode. Needs a dedicated release + test sweep. Low priority since 5.x is still supported.
671
- - [ ] **P0 for v5.0 — MCP plugin sandboxing** — currently MCP servers run with full Node privileges. Plan: run each MCP in a child process with restricted FS + network policy (similar to deno-permission model). Architectural change, v5.0 territory.
545
+ > Per-version details: see [`CHANGELOG.md`](CHANGELOG.md). The roadmap is a forward-looking summary, not a changelog.
546
+
547
+ ### Recently shipped
548
+
549
+ | Version | Theme | Highlights |
550
+ |---|---|---|
551
+ | **v4.22** *(May 2026)* | Memory architecture overhaul | Pluggable embedding providers: **Gemini · OpenAI · Ollama · FTS5 (zero-config keyword fallback)**. Auto-detection picks the best available provider, so users with no API key still get a working indexed memory store. Smart inject mode stops bulk-injecting `MEMORY.md` once SQLite is populated. |
552
+ | **v4.21** | Agent Browser skill | Tier-1.5 token-efficient web automation via the [agent-browser](https://github.com/vercel-labs/agent-browser) CLI, opt-in by install. ~90% token reduction vs Playwright on cooperative pages. |
553
+ | **v4.20** | SQLite-backed vector memory | Replaces the legacy 128 MB JSON index. Automatic migration on first start, per-chunk INSERT/UPDATE, lazy native binary load with graceful fallback. |
554
+ | **v4.18 – v4.19** | Reliability + per-workspace overrides | SDK auto-recovery on token rotation / quota exhaustion / empty streams. Per-workspace `effort` / `provider` / `voice` / `temperature` / `toolset`. |
555
+ | **v4.17** | Hardening audit | Disk cleanup service, hardening fixes from internal audit. |
556
+ | **v4.13 – v4.14** | Detached sub-agents | `alvin_dispatch_agent` MCP tool spawns independent `claude -p` subprocesses that survive parent aborts. Multi-platform dispatch (Slack / Discord / WhatsApp). Watcher zombie guard. |
557
+ | **v4.10 – v4.12** | Multi-session + Slack | Workspace registry with hot-reload, per-channel personas + cwd, Slack adapter with progress ticker + typing status, owner approval gate, async sub-agents. |
558
+
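The v4.22 row describes auto-detection that falls back to FTS5 when no embedding API key is configured. A minimal sketch of that kind of fallback chain follows. It is not the package's `auto-detect.js`: the function name, the environment variable names, the `ollamaReachable` flag, and the priority order (Gemini, then OpenAI, then Ollama, then FTS5) are all assumptions made for illustration.

```typescript
// Hypothetical sketch only: names, env keys, and ordering are assumptions,
// not taken from alvin-bot's actual implementation.
type ProviderName = "gemini" | "openai" | "ollama" | "fts5";

interface DetectInput {
  // Environment variables as a plain map (assumed key names below).
  env: Record<string, string | undefined>;
  // Result of a prior health probe against a local Ollama server (assumed).
  ollamaReachable: boolean;
}

function pickEmbeddingProvider(input: DetectInput): ProviderName {
  // Hosted providers first, if a non-empty key is present.
  if (input.env["GEMINI_API_KEY"]) return "gemini";
  if (input.env["OPENAI_API_KEY"]) return "openai";
  // Local model server next.
  if (input.ollamaReachable) return "ollama";
  // No key and no local server: keyword search needs no configuration.
  return "fts5";
}

// Usage: a machine with no keys and no Ollama still gets a provider.
const chosen: ProviderName = pickEmbeddingProvider({ env: {}, ollamaReachable: false });
```

Keeping the chain as a pure function of its inputs makes the zero-config path easy to test without touching the network or real credentials.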
559
+ ### 🏛️ Foundations (built before v4.10)
560
+
561
+ Multi-model provider abstraction with fallback chains · plugin & skill ecosystems with hot-reload · multi-platform adapters (Telegram, WhatsApp, Discord, Signal, Slack) · Web UI with i18n + command palette · native macOS `.dmg` via Electron · Docker Compose · npm distribution · MCP client + custom tools · universal tool use across providers · full media pipeline (audio · video · photo · voice).
562
+
563
+ ### 🎯 On the radar
564
+
565
+ | Priority | Item | Why |
566
+ |---|---|---|
567
+ | **P0 v5.0** | MCP plugin sandboxing | MCP servers currently run with full Node privileges. Plan: child process with restricted FS + network policy (deno-permission style). Architectural change. |
568
+ | **P1** | Electron major upgrade (35 → 41+) + Windows `.exe` | Closes desktop-build CVEs, unblocks the only platform still missing a native installer. |
569
+ | **P1** | Prompt injection defense policy | Needs a design decision (heuristic filter / allow-list / accept-the-risk with clearer warnings) and consistent enforcement at every message entry point. |
570
+ | **P2** | Per-workspace memory layer | Facts learned in one workspace stay scoped unless explicitly promoted. Builds on the v4.22 SQLite store. |
571
+ | **P2** | Per-workspace skill allowlist | Scope Apple Notes to personal workspace, sysadmin tools to devops only, etc. |
572
+ | **P2** | Multi-user Slack (`per-channel-peer`) | Different users in the same Slack channel get their own sub-sessions. |
573
+ | **P3** | Linux `.AppImage` / `.deb`, Homebrew formula, Scoop manifest, one-line install script | Platform reach for non-npm users. |
574
+ | **P3** | Daily-log decay / archive | Older daily logs move to cold storage after N days. |
575
+ | **P3** | Workspace cloning / templates | `/workspace clone my-project as my-fork` spins up a new workspace from an existing one. |
576
+ | **P3** | TypeScript 5 → 6 | 5.x still supported; strict-mode break-fix work, not urgent. |
577
+
578
+ Pull requests welcome: see [`CONTRIBUTING.md`](CONTRIBUTING.md).
672
579
 
673
580
  ---
674
581