alvin-bot 4.12.0 → 4.12.1
- package/CHANGELOG.md +43 -0
- package/README.md +144 -16
- package/dist/handlers/commands.js +6 -0
- package/dist/handlers/message.js +54 -15
- package/dist/handlers/stuck-timer.js +54 -0
- package/dist/providers/claude-sdk-provider.js +25 -0
- package/dist/services/personality.js +55 -30
- package/package.json +1 -1
- package/skills/social-fetch/SKILL.md +385 -0
- package/skills/webcheck/SKILL.md +150 -0
- package/test/claude-sdk-tool-use-id.test.ts +180 -0
- package/test/stuck-timer.test.ts +116 -0
- package/test/sync-task-timeout.test.ts +153 -0
- package/test/system-prompt-background-hint.test.ts +17 -0
package/CHANGELOG.md
CHANGED
|
@@ -2,6 +2,49 @@
|
|
|
2
2
|
|
|
3
3
|
All notable changes to Alvin Bot are documented here.
|
|
4
4
|
|
|
5
|
+
## [4.12.1] — 2026-04-15
|
|
6
|
+
|
|
7
|
+
### 🐛 Patch: Sync sub-agent timeout + workspace command menu
|
|
8
|
+
|
|
9
|
+
Three issues from v4.12.0 production use, fixed:
|
|
10
|
+
|
|
11
|
+
- **Fix (Bug 1)**: `Task`/`Agent` tool calls without `run_in_background: true` were falsely aborted after 10 minutes. The Claude Agent SDK runs synchronous sub-agents entirely inside the tool call, so the parent stream emits no intermediate chunks during that time and the flat 10-minute stuck timer fired on legitimate long-running work. The new task-aware stuck timer detects sync Task/Agent tool calls (tracked by `toolUseId`) and automatically escalates the idle timeout to 120 minutes (configurable via `ALVIN_SYNC_AGENT_IDLE_TIMEOUT_MINUTES`). Once the matching `tool_result` arrives, the timer reverts to the normal 10-minute idle detection for genuine SDK hangs.
|
|
12
|
+
|
|
13
|
+
- **Mitigation (Bug 2)**: The `BACKGROUND_SUBAGENT_HINT` in `src/services/personality.ts` was rewritten with `⚠️ CRITICAL` framing, a concrete decision-tree structure, an aggressive ~30 second threshold (down from "2 minutes"), and an explicit warning about the Telegram session-blocking consequence. The goal is to get Claude to reliably set `run_in_background: true` when sub-agents will take more than a few seconds, so the main Telegram session doesn't stay blocked while the sub-agent works. This is defense-in-depth on top of the Bug 1 fix — the timer prevents false aborts regardless of Claude's compliance; the strengthened hint reduces how often main-session blocking happens in the first place. Compliance is monitored empirically via logs.
|
|
14
|
+
|
|
15
|
+
- **Fix (Bug 3)**: `/workspace` and `/workspaces` were registered as Telegram command handlers in v4.12.0 but not added to the `bot.api.setMyCommands` array, so they didn't appear in Telegram's auto-complete menu (the list that pops up when you type `/`). Added both, plus a new "🧭 Workspaces" block in the `/help` text.
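The Bug 3 drift (a handler registered but missing from the menu) can be avoided by deriving both from one table. The sketch below is illustrative only: `COMMANDS`, `registerAll`, and the stub `fakeBot` are hypothetical names, and the stub merely mimics the shape of grammY's `bot.command()` and `bot.api.setMyCommands()` so the snippet runs without a Telegram token. It is not how `commands.ts` is actually organized.

```javascript
// Hypothetical single source of truth for commands (not Alvin Bot's real table).
const COMMANDS = [
  { command: "workspaces", description: "List all workspaces", handler: () => "list" },
  { command: "workspace", description: "Switch active workspace", handler: () => "switch" },
];

// Registers every handler and publishes a menu derived from the same table,
// so a command cannot be handled without also being listed.
function registerAll(bot, commands) {
  for (const c of commands) bot.command(c.command, c.handler);
  const menu = commands.map(({ command, description }) => ({ command, description }));
  bot.api.setMyCommands(menu);
  return menu;
}

// Stub standing in for a grammY Bot instance, for demonstration only.
const registered = [];
let published = [];
const fakeBot = {
  command: (name, _handler) => registered.push(name),
  api: { setMyCommands: (menu) => { published = menu; } },
};

registerAll(fakeBot, COMMANDS);
console.log(registered, published.map((m) => m.command));
```

With a real bot, `setMyCommands` returns a promise that should be awaited; the stub is synchronous to keep the sketch short.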
|
|
16
|
+
|
|
17
|
+
#### Architecture details
|
|
18
|
+
|
|
19
|
+
**NEW `src/handlers/stuck-timer.ts`**: Pure state machine `createStuckTimer({normalMs, extendedMs, onTimeout})` returning `{reset, enterSync, exitSync, cancel}`. Testable in isolation without grammy/session/provider mocks via `vi.useFakeTimers()`. 8 unit tests cover normal fire, enterSync extends, exitSync returns, multi-pending, unknown-id no-op, cancel, reset-while-extended, idempotent enterSync.
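The behaviours listed above can be illustrated without a test framework. The sketch below is a variant of the state machine with an injectable scheduler; `createStuckTimerSketch`, `schedule`, and `unschedule` are hypothetical names introduced here, whereas the shipped module calls the global `setTimeout`/`clearTimeout` and is tested with `vi.useFakeTimers()`.

```javascript
// Variant of the state machine with an injectable scheduler (assumption:
// the real module uses the global setTimeout/clearTimeout directly).
function createStuckTimerSketch({ normalMs, extendedMs, onTimeout, schedule, unschedule }) {
  const pending = new Set(); // toolUseIds of sync Task/Agent calls still running
  let handle = null;
  const rearm = () => {
    if (handle !== null) unschedule(handle);
    handle = schedule(onTimeout, pending.size > 0 ? extendedMs : normalMs);
  };
  return {
    reset: rearm,
    enterSync(id) { pending.add(id); rearm(); },
    exitSync(id) { pending.delete(id); rearm(); },
    cancel() { if (handle !== null) { unschedule(handle); handle = null; } },
  };
}

// Fake scheduler that only records the delay most recently armed.
let armedMs = null;
let nextHandle = 1;
const schedule = (_fn, ms) => { armedMs = ms; return nextHandle++; };
const unschedule = () => { armedMs = null; };

const timer = createStuckTimerSketch({
  normalMs: 10 * 60 * 1000,
  extendedMs: 120 * 60 * 1000,
  onTimeout: () => {},
  schedule,
  unschedule,
});

const observed = [];
timer.reset();              observed.push(armedMs); // normal window
timer.enterSync("toolu_a"); observed.push(armedMs); // extended window
timer.enterSync("toolu_b"); observed.push(armedMs); // still extended
timer.exitSync("toolu_a");  observed.push(armedMs); // one call still pending
timer.exitSync("unknown");  observed.push(armedMs); // unknown id changes nothing
timer.exitSync("toolu_b");  observed.push(armedMs); // back to normal
timer.cancel();             observed.push(armedMs); // nothing armed
console.log(observed);
```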
|
|
20
|
+
|
|
21
|
+
**Protocol change in `src/providers/types.ts` + `claude-sdk-provider.ts`**: `StreamChunk` gains a new additive optional field `runInBackground?: boolean`. The provider extracts it from `block.input.run_in_background` **before** the existing 500-char JSON truncation on `toolInput` — this is load-bearing because for long prompts the serialized input can exceed 500 chars, and naive post-truncation parsing would lose the flag and misclassify sync tasks as async. `toolUseId` is now also yielded on `tool_use` chunks (previously only on `tool_result`) so the consumer can correlate tool_use → tool_result for sync tracking. 4 contract-pin tests mock `@anthropic-ai/claude-agent-sdk` with scripted assistant messages to verify the extraction logic.
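To see why the extraction has to happen before truncation, consider a tool input whose prompt alone is longer than 500 characters. The sketch below is a minimal reproduction under stated assumptions: the `block` object, the helper names, and the use of `slice(0, 500)` to model the truncation are all hypothetical and are not taken from the provider source.

```javascript
// Hypothetical tool_use block with a prompt long enough to exceed the cut-off.
const block = {
  name: "Task",
  id: "toolu_example",
  input: { description: "audit", prompt: "x".repeat(600), run_in_background: true },
};

// Reading the flag from the raw object works regardless of input size.
function extractRunInBackground(input) {
  if (input && typeof input === "object") {
    if (input.run_in_background === true) return true;
    if (input.run_in_background === false) return false;
  }
  return undefined;
}

// Naive alternative: parse the truncated string. The cut lands inside the
// prompt, so JSON.parse throws and the flag is lost.
function extractFromTruncated(serialized) {
  try {
    return JSON.parse(serialized).run_in_background;
  } catch {
    return undefined;
  }
}

const truncated = JSON.stringify(block.input).slice(0, 500);
const beforeTruncation = extractRunInBackground(block.input);
const afterTruncation = extractFromTruncated(truncated);
console.log({ beforeTruncation, afterTruncation });
```

Because the consumer treats anything other than `true` as a sync call, losing the flag changes which timeout is armed, which is why the raw object is inspected first.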
|
|
22
|
+
|
|
23
|
+
**Critical ordering in `message.ts`**: State mutation of the pending-sync-task set (`stuckTimer.enterSync` / `stuckTimer.exitSync`) happens **before** `stuckTimer.reset()` in the for-await loop, so the timer arms with the post-mutation state. Inline comment added documenting this invariant.
|
|
24
|
+
|
|
25
|
+
#### Known limitation (not fixed in v4.12.1)
|
|
26
|
+
|
|
27
|
+
A nanosecond-scale race in which the stuck timer fires at the same moment a `tool_result` arrives (fundamentally unfixable without `check-before-fire` semantics in `setTimeout`). With the 120-minute extended window, the race requires the tool_result to arrive at exactly 120:00:00.000, which is practically irrelevant. A proper fix would require rewriting the timer as a state machine with a pre-fire check; this is deferred to v4.13.0 if it ever matters.
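For readers wondering what a pre-fire check could look like, here is one possible shape. It is a hedged sketch, not the planned v4.13.0 design: `createCheckedIdleTimer`, `touch`, and the manual clock and scheduler are hypothetical names. The callback compares the clock against the last recorded activity before declaring a timeout, which narrows the window but cannot help if a chunk is still queued and unprocessed when the callback runs.

```javascript
// Hypothetical idle timer with a pre-fire check.
function createCheckedIdleTimer({ idleMs, onTimeout, now, schedule }) {
  let lastActivity = now();
  let cancelled = false;
  const arm = (delay) => schedule(() => {
    if (cancelled) return;
    const idleFor = now() - lastActivity;
    if (idleFor >= idleMs) onTimeout(); // genuinely idle for the full window
    else arm(idleMs - idleFor);         // activity arrived: wait for the remainder
  }, delay);
  arm(idleMs);
  return {
    touch() { lastActivity = now(); },  // call on every chunk
    cancel() { cancelled = true; },
  };
}

// Manual clock and scheduler so the demo is deterministic.
let clock = 0;
const queue = [];
const schedule = (fn, delay) => { queue.push({ at: clock + delay, fn }); };
const advanceTo = (t) => {
  // Run due callbacks in time order, including ones scheduled while draining.
  for (;;) {
    queue.sort((a, b) => a.at - b.at);
    if (queue.length === 0 || queue[0].at > t) break;
    const job = queue.shift();
    clock = job.at;
    job.fn();
  }
  clock = t;
};

let fired = 0;
const idle = createCheckedIdleTimer({
  idleMs: 100,
  onTimeout: () => { fired++; },
  now: () => clock,
  schedule,
});

advanceTo(99);
idle.touch();                  // activity just before the deadline
advanceTo(100);                // original deadline passes
const firedAtDeadline = fired; // pre-fire check saw recent activity
advanceTo(199);                // a full idle window after the last activity
const firedAfterIdle = fired;
console.log({ firedAtDeadline, firedAfterIdle });
```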
|
|
28
|
+
|
|
29
|
+
#### Testing
|
|
30
|
+
|
|
31
|
+
**350 tests total** (330 baseline from v4.12.0 + 20 new). All green, TSC clean.
|
|
32
|
+
|
|
33
|
+
- 8 `test/stuck-timer.test.ts` — pure state-machine unit tests
|
|
34
|
+
- 4 `test/claude-sdk-tool-use-id.test.ts` — contract pins for `toolUseId` + `runInBackground` on tool_use chunks
|
|
35
|
+
- 3 new assertions in `test/system-prompt-background-hint.test.ts` (CRITICAL framing, Telegram blocking, 30-second threshold)
|
|
36
|
+
- 5 `test/sync-task-timeout.test.ts` — integration tests over realistic timing scales + regression guard for the pre-fix flat-timeout behavior
|
|
37
|
+
|
|
38
|
+
Live verification after release: local bot restart, Telegram `/` auto-complete shows `/workspace` + `/workspaces`, `curl https://api.telegram.org/bot$TOKEN/getMyCommands` returns the new entries.
|
|
39
|
+
|
|
40
|
+
#### Files changed
|
|
41
|
+
|
|
42
|
+
- **NEW**: `src/handlers/stuck-timer.ts`
|
|
43
|
+
- **NEW tests**: `test/stuck-timer.test.ts`, `test/claude-sdk-tool-use-id.test.ts`, `test/sync-task-timeout.test.ts`
|
|
44
|
+
- **Modified**: `src/providers/types.ts` (`StreamChunk.runInBackground`), `src/providers/claude-sdk-provider.ts` (extract `runInBackground` before truncation, yield `toolUseId` on tool_use), `src/handlers/message.ts` (`createStuckTimer` integration + task-aware flow), `src/services/personality.ts` (`BACKGROUND_SUBAGENT_HINT` rewrite), `src/handlers/commands.ts` (setMyCommands + `/help`), `test/system-prompt-background-hint.test.ts` (3 new assertions)
|
|
45
|
+
|
|
46
|
+
---
|
|
47
|
+
|
|
5
48
|
## [4.12.0] — 2026-04-13
|
|
6
49
|
|
|
7
50
|
### 🧭 Multi-Session + Slack Interface — parallel contexts, per-channel workspaces
|
package/README.md
CHANGED
|
@@ -62,20 +62,23 @@ Alvin Bot is an open-source, self-hosted AI agent that lives where you chat. Bui
|
|
|
62
62
|
- **Heartbeat Monitor** — Pings providers every 5 minutes, auto-failover after 2 failures, auto-recovery
|
|
63
63
|
- **User-Configurable Fallback Order** — Rearrange provider priority via Telegram (`/fallback`), Web UI, or API
|
|
64
64
|
- **Adjustable Thinking** — From quick answers (`/effort low`) to deep analysis (`/effort max`)
|
|
65
|
-
- **Persistent Memory** — Remembers across sessions via vector-indexed knowledge base
|
|
65
|
+
- **Persistent Memory** — Remembers across sessions via vector-indexed knowledge base; session state (Claude SDK resume tokens, conversation history, language, effort) survives bot restarts (v4.11.0)
|
|
66
|
+
- **Multi-Session Workspaces** — Run multiple parallel, context-isolated sessions on the same bot — one per Slack channel or per Telegram `/workspace` — each with its own working directory, purpose, and persona. Memory, skills, and sub-agents stay globally shared (v4.12.0). [How-to ↓](#-multi-session-workspaces-v4120)
|
|
67
|
+
- **Background Sub-Agents** — Claude autonomously uses `run_in_background: true` for long audits/research; main session stays responsive, results deliver as separate messages (v4.10.0)
|
|
66
68
|
- **Smart Tool Discovery** — Scans your system at startup, knows exactly what CLI tools, plugins, and APIs are available
|
|
67
|
-
- **Skill System** —
|
|
69
|
+
- **Skill System** — 12 built-in SKILL.md files (code, data analysis, email, docs, research, sysadmin, browse, etc.) auto-activate based on message context
|
|
68
70
|
- **Self-Awareness** — Knows it IS the AI model — won't call external APIs for tasks it can do itself
|
|
69
|
-
- **Automatic Language Detection** — Detects user language (EN/DE) and adapts; learns preference over time
|
|
71
|
+
- **Automatic Language Detection** — Detects user language (EN/DE/ES/FR) and adapts; learns preference over time
|
|
70
72
|
|
|
71
73
|
### 💬 Multi-Platform
|
|
72
74
|
- **Telegram** — Full-featured with streaming, inline keyboards, voice, photos, documents
|
|
75
|
+
- **Slack** — Socket Mode bot via `@slack/bolt`, DMs + @mentions, file attachments, reactions, `assistant.threads.setStatus` typing indicator. **One channel = one isolated workspace.** See [Multi-Session Workspaces](#-multi-session-workspaces-v4120) below.
|
|
73
76
|
- **WhatsApp** — Via WhatsApp Web: self-chat as AI notepad, group whitelist with per-contact access control, full media support (photos, docs, audio, video)
|
|
74
77
|
- **WhatsApp Group Approval** — Owner gets approval requests via Telegram (or WhatsApp DM fallback) before the bot responds to group messages. Silent — group members see nothing.
|
|
75
78
|
- **Discord** — Server bot with mention/reply detection, slash commands
|
|
76
79
|
- **Signal** — Via signal-cli REST API with voice transcription
|
|
77
80
|
- **Terminal** — Rich TUI with ANSI colors and streaming (`alvin-bot tui`)
|
|
78
|
-
- **Web UI** — Full dashboard with chat, settings, file manager, terminal
|
|
81
|
+
- **Web UI** — Full dashboard with chat, settings, file manager, terminal, workspace overview
|
|
79
82
|
|
|
80
83
|
### 🔧 Capabilities
|
|
81
84
|
- **52+ Built-in Tools** — Shell, files, email, screenshots, PDF, media, git, system control
|
|
@@ -244,6 +247,8 @@ If your AI provider isn't working, run `doctor` — it tests the actual API conn
|
|
|
244
247
|
| `/remember <text>` | Save to memory |
|
|
245
248
|
| `/export` | Export conversation |
|
|
246
249
|
| `/dir <path>` | Change working directory |
|
|
250
|
+
| `/workspaces` | List all configured workspaces (v4.12.0) |
|
|
251
|
+
| `/workspace [name]` | Show or switch the active workspace — `/workspace default` resets (v4.12.0) |
|
|
247
252
|
| `/status` | Current session & cost info |
|
|
248
253
|
| `/setup` | Configure API keys & platforms |
|
|
249
254
|
| `/system <prompt>` | Set custom system prompt |
|
|
@@ -258,15 +263,19 @@ If your AI provider isn't working, run `doctor` — it tests the actual API conn
|
|
|
258
263
|
## 🏗️ Architecture
|
|
259
264
|
|
|
260
265
|
```
|
|
261
|
-
|
|
262
|
-
|
|
263
|
-
|
|
264
|
-
|
|
265
|
-
┌──────────┐ ┌──────────┐
|
|
266
|
-
│ Telegram │ │
|
|
267
|
-
└────┬─────┘ └────┬─────┘
|
|
268
|
-
│
|
|
269
|
-
|
|
266
|
+
┌──────────────┐
|
|
267
|
+
│ Web UI │ (Dashboard, Workspaces, Chat, Settings)
|
|
268
|
+
└──────┬───────┘
|
|
269
|
+
│ HTTP/WS
|
|
270
|
+
┌──────────┐ ┌───────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
|
|
271
|
+
│ Telegram │ │ Slack │ │ WhatsApp │ │ Discord │ │ Signal │
|
|
272
|
+
└────┬─────┘ └───┬───┘ └────┬─────┘ └────┬─────┘ └────┬─────┘
|
|
273
|
+
│ │ │ │ │
|
|
274
|
+
└────────────┴───────────┴─────────────┴──────────────┘
|
|
275
|
+
│
|
|
276
|
+
┌─────────┴──────────┐
|
|
277
|
+
│ Workspace Resolver │ (per-channel context: cwd + persona)
|
|
278
|
+
└─────────┬──────────┘
|
|
270
279
|
│
|
|
271
280
|
┌──────┴───────┐
|
|
272
281
|
│ Engine │ (Query routing, fallback)
|
|
@@ -302,14 +311,15 @@ alvin-bot/
|
|
|
302
311
|
│ ├── config.ts # Configuration
|
|
303
312
|
│ ├── handlers/ # Message & command handlers
|
|
304
313
|
│ ├── middleware/ # Auth & access control
|
|
305
|
-
│ ├── platforms/ # Telegram, WhatsApp, Discord, Signal adapters
|
|
314
|
+
│ ├── platforms/ # Telegram, Slack, WhatsApp, Discord, Signal adapters
|
|
306
315
|
│ ├── providers/ # AI provider implementations
|
|
307
|
-
│ ├── services/ # Memory, voice, cron, plugins, tool discovery
|
|
316
|
+
│ ├── services/ # Memory, voice, cron, plugins, workspaces, tool discovery
|
|
308
317
|
│ ├── tui/ # Terminal UI
|
|
309
318
|
│ └── web/ # Web server, APIs, setup wizard
|
|
310
319
|
├── web/public/ # Web UI (HTML/CSS/JS, zero build step)
|
|
311
320
|
├── plugins/ # Plugin directory (6 built-in)
|
|
312
321
|
├── docs/
|
|
322
|
+
│ ├── install/ # Setup guides (macOS, Windows, Slack)
|
|
313
323
|
│ └── custom-models.json # Custom model configurations
|
|
314
324
|
├── TOOLS.md # Custom tool definitions (Markdown)
|
|
315
325
|
├── SOUL.md # Agent personality
|
|
@@ -319,6 +329,89 @@ alvin-bot/
|
|
|
319
329
|
|
|
320
330
|
---
|
|
321
331
|
|
|
332
|
+
## 🧭 Multi-Session Workspaces (v4.12.0)
|
|
333
|
+
|
|
334
|
+
**Run multiple parallel Alvin sessions on the same bot — one per project, context-isolated, memory shared.** Think Claude Coworker, but on your own machine with your own tools. Each workspace has its own working directory, purpose, and optional persona. Sub-agents spawned in one workspace stay in that workspace. Memory, skills, and the knowledge base are globally shared across all of them.
|
|
335
|
+
|
|
336
|
+
### Why you'd want this
|
|
337
|
+
|
|
338
|
+
Without workspaces, Alvin has one big blob of context. If you ask about Alev-B deployment right after debugging a trading bot, Claude pollutes one context with the other. Workspaces solve this: **Slack channel = session**, or on Telegram, **`/workspace alev-b` = session**. Each one has its own Claude SDK `resume` token, history, and current project CLAUDE.md loaded via its working directory.
|
|
339
|
+
|
|
340
|
+
### How it works
|
|
341
|
+
|
|
342
|
+
1. **Drop a markdown file** into `~/.alvin-bot/workspaces/<name>.md` with YAML frontmatter.
|
|
343
|
+
2. **Alvin hot-reloads** the workspace registry (no restart needed — same pattern as skills).
|
|
344
|
+
3. On **Slack**, workspaces resolve by explicit channel ID first, then by channel name match (`#alev-b` → `workspaces/alev-b.md`, case-insensitive).
|
|
345
|
+
4. On **Telegram**, run `/workspace <name>` to switch — next message uses the new persona and cwd.
|
|
346
|
+
5. Nothing configured? Alvin falls back to the "default" workspace exactly like pre-v4.12 — **no breaking changes**.
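The Slack resolution order in steps 3 and 5 can be sketched as a small pure function. This is an illustration under assumptions: `resolveWorkspace` and the registry shape are hypothetical and do not reflect the actual resolver's API.

```javascript
// Hypothetical registry entries; `channels` mirrors the optional frontmatter field.
const workspaces = [
  { name: "alev-b", channels: ["C01ABCDEF"] },
  { name: "homes", channels: [] },
];

// Resolution order: explicit channel ID, then case-insensitive channel-name
// match against the workspace file name, then the "default" workspace.
function resolveWorkspace(registry, { channelId, channelName }) {
  const byId = registry.find((w) => (w.channels || []).includes(channelId));
  if (byId) return byId.name;
  const wanted = (channelName || "").replace(/^#/, "").toLowerCase();
  const byName = registry.find((w) => w.name.toLowerCase() === wanted);
  return byName ? byName.name : "default";
}

const byExplicitId = resolveWorkspace(workspaces, { channelId: "C01ABCDEF", channelName: "#random" });
const byChannelName = resolveWorkspace(workspaces, { channelId: "C999", channelName: "#Homes" });
const fallback = resolveWorkspace(workspaces, { channelId: "C999", channelName: "#general" });
console.log({ byExplicitId, byChannelName, fallback });
```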
|
|
347
|
+
|
|
348
|
+
### Example workspace file
|
|
349
|
+
|
|
350
|
+
Create `~/.alvin-bot/workspaces/alev-b.md`:
|
|
351
|
+
|
|
352
|
+
```markdown
|
|
353
|
+
---
|
|
354
|
+
purpose: Alev-B consulting website dev
|
|
355
|
+
cwd: ~/Projects/alev-b-website
|
|
356
|
+
emoji: "🏢"
|
|
357
|
+
color: "#6366f1"
|
|
358
|
+
channels: ["C01ABCDEF"]
|
|
359
|
+
---
|
|
360
|
+
You are focused on the Alev-B consulting website. Stack: React + Express +
|
|
361
|
+
Drizzle + MySQL. Production VPS 72.62.34.230, deploy via rsync. Prefer
|
|
362
|
+
concise, directly actionable answers about features, deployment, and
|
|
363
|
+
Stripe integration.
|
|
364
|
+
```
|
|
365
|
+
|
|
366
|
+
The `cwd` auto-loads the project-specific `CLAUDE.md` via Claude SDK's `settingSources: ["user", "project"]`, so each workspace inherits its project's conventions automatically. `channels` is optional — omit it to match by filename.
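A workspace file of this shape splits into metadata and persona text. The reader below is a rough sketch for the flat `key: value` shapes shown in the example only; it is not a YAML parser, and `parseWorkspaceFile` is a hypothetical name rather than the loader Alvin uses.

```javascript
// Minimal frontmatter reader for flat `key: value` files (NOT a YAML parser).
function parseWorkspaceFile(text) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(text);
  if (!match) return { meta: {}, persona: text.trim() };
  const meta = {};
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim();
    const raw = line.slice(idx + 1).trim();
    try {
      meta[key] = JSON.parse(raw); // quoted strings and ["..."] arrays are valid JSON
    } catch {
      meta[key] = raw;             // bare strings such as paths or free text
    }
  }
  return { meta, persona: match[2].trim() };
}

const sample = [
  "---",
  "purpose: Alev-B consulting website dev",
  "cwd: ~/Projects/alev-b-website",
  'color: "#6366f1"',
  'channels: ["C01ABCDEF"]',
  "---",
  "You are focused on the Alev-B consulting website.",
].join("\n");

const parsed = parseWorkspaceFile(sample);
console.log(parsed.meta, parsed.persona);
```

A production loader would use a real YAML library, since this shortcut misreads anything beyond single-line scalars and JSON-style arrays.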
|
|
367
|
+
|
|
368
|
+
### Slack setup (5 minutes)
|
|
369
|
+
|
|
370
|
+
1. Download the setup guide + manifest from the [latest release](https://github.com/alvbln/Alvin-Bot/releases/latest):
|
|
371
|
+
- `slack-setup.md` — step-by-step instructions with screenshots
|
|
372
|
+
- `slack-manifest.json` — copy-paste ready Slack App manifest
|
|
373
|
+
2. Create a Slack App from the manifest at https://api.slack.com/apps → **Create New App** → **From an app manifest**
|
|
374
|
+
3. Enable Socket Mode, generate an **App-Level Token** (starts with `xapp-`)
|
|
375
|
+
4. Install the app to your workspace, copy the **Bot User OAuth Token** (starts with `xoxb-`)
|
|
376
|
+
5. Add both to `~/.alvin-bot/.env`:
|
|
377
|
+
```bash
|
|
378
|
+
SLACK_APP_TOKEN=xapp-1-...
|
|
379
|
+
SLACK_BOT_TOKEN=xoxb-...
|
|
380
|
+
SLACK_ALLOWED_USERS=U01ABCDEF # optional, comma-separated
|
|
381
|
+
```
|
|
382
|
+
6. Restart Alvin. You should see `💬 Slack connected (Alvin @ YourWorkspace)` in the log.
|
|
383
|
+
7. Invite Alvin to channels with `/invite @Alvin`. DMs work without an invite.
|
|
384
|
+
|
|
385
|
+
### Telegram `/workspace` commands
|
|
386
|
+
|
|
387
|
+
| Command | Effect |
|
|
388
|
+
|---|---|
|
|
389
|
+
| `/workspaces` | List all configured workspaces with emojis and purposes (active one marked ✅) |
|
|
390
|
+
| `/workspace` | Show the currently active workspace |
|
|
391
|
+
| `/workspace <name>` | Switch to `<name>` — next message uses its persona and cwd |
|
|
392
|
+
| `/workspace default` | Reset to the default workspace (global cwd, no persona) |
|
|
393
|
+
|
|
394
|
+
Workspace selection is per Telegram user, persisted across bot restarts via `~/.alvin-bot/state/sessions.json` (v2 envelope format, backwards compatible with v4.11).
|
|
395
|
+
|
|
396
|
+
### Web UI
|
|
397
|
+
|
|
398
|
+
The dashboard has a dedicated **🧭 Workspaces** tab (Data section in the sidebar). Each workspace shows as a color-coded card with emoji, purpose, cwd, mapped channels, session count, message count, and cumulative cost. Useful for spotting which project is burning the most tokens.
|
|
399
|
+
|
|
400
|
+
Or query directly:
|
|
401
|
+
|
|
402
|
+
```bash
|
|
403
|
+
curl -s http://localhost:3100/api/workspaces | jq
|
|
404
|
+
```
|
|
405
|
+
|
|
406
|
+
### Architecture guarantees
|
|
407
|
+
|
|
408
|
+
- **Memory is global.** Facts Alvin learns in `#alev-b` are visible in `#homes` via the shared `MEMORY.md` and embeddings index. A per-workspace memory layer is on the v4.13 roadmap.
|
|
409
|
+
- **Sub-agents are per-session.** Each workspace can spawn its own `run_in_background` agents — results come back to the same channel automatically (v4.10.0).
|
|
410
|
+
- **Session state survives restart.** Claude SDK `resume` tokens, conversation history, language, effort, and `workspaceName` all persist via `session-persistence.ts` (v4.11.0).
|
|
411
|
+
- **Backwards compatible.** If you don't create any workspace files, everything behaves exactly like v4.11. Upgrade is a no-op.
|
|
412
|
+
|
|
413
|
+
---
|
|
414
|
+
|
|
322
415
|
## ⚙️ Configuration
|
|
323
416
|
|
|
324
417
|
### Environment Variables
|
|
@@ -345,13 +438,21 @@ WHATSAPP_ENABLED=true # Enable WhatsApp (needs Chrome)
|
|
|
345
438
|
DISCORD_TOKEN=<token> # Enable Discord
|
|
346
439
|
SIGNAL_API_URL=<url> # Signal REST API URL
|
|
347
440
|
SIGNAL_NUMBER=<number> # Signal phone number
|
|
441
|
+
SLACK_BOT_TOKEN=xoxb-... # Slack Bot User OAuth Token (Socket Mode)
|
|
442
|
+
SLACK_APP_TOKEN=xapp-1-... # Slack App-Level Token (connections:write scope)
|
|
443
|
+
SLACK_ALLOWED_USERS=U01... # Optional: comma-separated Slack user IDs allowlist
|
|
444
|
+
|
|
445
|
+
# Multi-Session (v4.12.0)
|
|
446
|
+
SESSION_MODE=per-channel # per-user (default) | per-channel | per-channel-peer
|
|
447
|
+
# per-channel gives each Slack channel / group its own isolated session
|
|
348
448
|
|
|
349
449
|
# Optional
|
|
350
|
-
WORKING_DIR=~ # Default working directory
|
|
450
|
+
WORKING_DIR=~ # Default working directory (used when no workspace is resolved)
|
|
351
451
|
MAX_BUDGET_USD=5.0 # Cost limit per session
|
|
352
452
|
WEB_PORT=3100 # Web UI port
|
|
353
453
|
WEB_PASSWORD=<password> # Web UI auth (optional)
|
|
354
454
|
CHROME_PATH=/path/to/chrome # Custom Chrome path (for WhatsApp)
|
|
455
|
+
MEMORY_EXTRACTION_DISABLED=1 # Opt out of v4.11.0 auto-fact-extraction in compaction
|
|
355
456
|
```
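The three `SESSION_MODE` values differ only in which identifiers make up the session key. The sketch below shows the idea; the key format and the `sessionKeySketch` function are assumptions made for illustration, since the real `buildSessionKey()` signature is not shown in this diff, and the roadmap lists full `per-channel-peer` support as future work.

```javascript
// Hypothetical key derivation; the real key format may differ.
function sessionKeySketch(mode, { platform, userId, channelId }) {
  switch (mode) {
    case "per-channel":
      return `${platform}:${channelId}`;           // everyone in a channel shares a session
    case "per-channel-peer":
      return `${platform}:${channelId}:${userId}`; // one session per user per channel
    case "per-user":
    default:
      return `${platform}:${userId}`;              // one session per user everywhere
  }
}

const msg = { platform: "slack", userId: "U01ABCDEF", channelId: "C01ABCDEF" };
const keys = {
  perUser: sessionKeySketch("per-user", msg),
  perChannel: sessionKeySketch("per-channel", msg),
  perChannelPeer: sessionKeySketch("per-channel-peer", msg),
};
console.log(keys);
```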
|
|
356
457
|
|
|
357
458
|
### Custom Models
|
|
@@ -531,6 +632,33 @@ alvin-bot version # Show version
|
|
|
531
632
|
- [ ] One-line install script
|
|
532
633
|
- [x] Docker Compose polish (production-ready `docker-compose.yml`)
|
|
533
634
|
- [x] **Phase 13** — npm publish (security audit)
|
|
635
|
+
- [x] **Phase 14** — Async Sub-Agents (v4.10.0)
|
|
636
|
+
- [x] `run_in_background: true` system prompt hint for Claude SDK
|
|
637
|
+
- [x] Async-agent watcher polling `outputFile` JSONL, delivering results as separate messages
|
|
638
|
+
- [x] Session-bound sub-agents (each session spawns its own background workers)
|
|
639
|
+
- [x] **Phase 15** — Memory Persistence + Smart Loading (v4.11.0)
|
|
640
|
+
- [x] Session persistence across bot restarts (debounced atomic flush, v2 envelope)
|
|
641
|
+
- [x] SDK memory injection (MEMORY.md in every system prompt, not just tool-call dependent)
|
|
642
|
+
- [x] Semantic recall on SDK first-turn via embeddings
|
|
643
|
+
- [x] Layered memory stack (L0 identity / L1 preferences / L2 projects / L3 vector search)
|
|
644
|
+
- [x] Auto-fact extraction during compaction (Mem0-style)
|
|
645
|
+
- [x] **Phase 16** — Multi-Session + Slack Interface (v4.12.0)
|
|
646
|
+
- [x] Session-key fix: platform-message.ts routes through `buildSessionKey()`
|
|
647
|
+
- [x] Workspace registry with hot-reload (`~/.alvin-bot/workspaces/*.md`)
|
|
648
|
+
- [x] Workspace resolver in platform handlers (per-channel persona + cwd)
|
|
649
|
+
- [x] Slack adapter polish: progress ticker (`chat.update`), typing status (`assistant.threads.setStatus`), channel name cache
|
|
650
|
+
- [x] Telegram `/workspace` + `/workspaces` commands (feature parity)
|
|
651
|
+
- [x] Per-workspace cost aggregation + Web UI workspace cards
|
|
652
|
+
- [x] Slack setup guide + copy-paste app manifest (in GitHub Release assets)
|
|
653
|
+
- [ ] **Phase 17** — Memory + Workspace polish (v4.13.0+)
|
|
654
|
+
- [ ] SQLite migration of the embeddings index (currently 128 MB JSON)
|
|
655
|
+
- [ ] Per-workspace memory layer (additive over global) — facts learned in `#alev-b` stay in `alev-b` unless explicitly promoted to global
|
|
656
|
+
- [ ] Per-workspace provider override (`provider:` in frontmatter) — e.g. Alev-B uses Claude Opus, JobSnack uses cheap Gemini
|
|
657
|
+
- [ ] Per-workspace skill allowlist — scope Apple Notes to personal workspace, sysadmin only to devops workspace, etc.
|
|
658
|
+
- [ ] Multi-User Slack (real `per-channel-peer` mode) — different users in the same Slack channel get their own sub-sessions
|
|
659
|
+
- [ ] Workspace cloning / templates — `/workspace clone alev-b as homes-dev` spins up a new workspace from an existing one
|
|
660
|
+
- [ ] Slack slash commands (`/alvin workspace`, `/alvin status`, `/alvin new`) — native Slack command integration via Bolt
|
|
661
|
+
- [ ] Daily log decay / archive — older daily logs move to cold storage after N days
|
|
534
662
|
|
|
535
663
|
---
|
|
536
664
|
|
package/dist/handlers/commands.js
CHANGED
|
@@ -110,6 +110,10 @@ export function registerCommands(bot) {
|
|
|
110
110
|
`/effort — Set reasoning depth\n` +
|
|
111
111
|
`/voice — Voice replies on/off\n` +
|
|
112
112
|
`/dir <path> — Working directory\n\n` +
|
|
113
|
+
`🧭 *Workspaces*\n` +
|
|
114
|
+
`/workspaces — List all workspaces\n` +
|
|
115
|
+
`/workspace <name> — Switch active workspace\n` +
|
|
116
|
+
`/workspace default — Reset to default\n\n` +
|
|
113
117
|
`🎨 *Extras*\n` +
|
|
114
118
|
`/imagine <prompt> — Generate image\n` +
|
|
115
119
|
`/remind <time> <text> — Set reminder\n` +
|
|
@@ -149,6 +153,8 @@ export function registerCommands(bot) {
|
|
|
149
153
|
{ command: "version", description: "Show Alvin Bot version" },
|
|
150
154
|
{ command: "new", description: "Start new session" },
|
|
151
155
|
{ command: "dir", description: "Change working directory" },
|
|
156
|
+
{ command: "workspaces", description: "List all workspaces" },
|
|
157
|
+
{ command: "workspace", description: "Switch active workspace" },
|
|
152
158
|
{ command: "web", description: "Quick web search" },
|
|
153
159
|
{ command: "imagine", description: "Generate image (e.g. /imagine A fox)" },
|
|
154
160
|
{ command: "remind", description: "Set reminder (e.g. /remind 30m Text)" },
|
package/dist/handlers/message.js
CHANGED
|
@@ -17,6 +17,7 @@ import { emitUserMessage as broadcastUserMessage, emitResponseStart as broadcast
|
|
|
17
17
|
import { t } from "../i18n.js";
|
|
18
18
|
import { isHarmlessTelegramError } from "../util/telegram-error-filter.js";
|
|
19
19
|
import { handleToolResultChunk } from "./async-agent-chunk-handler.js";
|
|
20
|
+
import { createStuckTimer } from "./stuck-timer.js";
|
|
20
21
|
/**
|
|
21
22
|
* Stuck-only timeout — NO absolute cap.
|
|
22
23
|
*
|
|
@@ -37,6 +38,26 @@ import { handleToolResultChunk } from "./async-agent-chunk-handler.js";
|
|
|
37
38
|
*/
|
|
38
39
|
const STUCK_TIMEOUT_MINUTES = Number(process.env.ALVIN_STUCK_TIMEOUT_MINUTES) || 10;
|
|
39
40
|
const STUCK_TIMEOUT_MS = STUCK_TIMEOUT_MINUTES * 60 * 1000;
|
|
41
|
+
/**
|
|
42
|
+
* v4.12.1 — Task-aware stuck timeout for sync Task/Agent tool calls.
|
|
43
|
+
*
|
|
44
|
+
* When Claude calls the Task/Agent tool WITHOUT run_in_background: true,
|
|
45
|
+
* the Claude Agent SDK runs the sub-agent synchronously inside the tool
|
|
46
|
+
* call. The parent stream emits NO intermediate chunks during that time
|
|
47
|
+
* — it's silent until the sub-agent finishes and the final tool_result
|
|
48
|
+
* arrives. With the normal STUCK_TIMEOUT_MS (10 min), this triggered a
|
|
49
|
+
* false abort on legitimate long-running sub-agents.
|
|
50
|
+
*
|
|
51
|
+
* The new approach: track pending sync Task/Agent tool calls by their
|
|
52
|
+
* toolUseId, and while any are active, escalate the idle timeout to
|
|
53
|
+
* SYNC_AGENT_IDLE_TIMEOUT_MS (default 120 min, env-configurable). After
|
|
54
|
+
* the matching tool_result arrives, revert to the normal timeout.
|
|
55
|
+
*
|
|
56
|
+
* The normal 10-min timeout still applies for genuine SDK hangs (no
|
|
57
|
+
* sync tool call active, no chunks arriving).
|
|
58
|
+
*/
|
|
59
|
+
const SYNC_AGENT_IDLE_TIMEOUT_MINUTES = Number(process.env.ALVIN_SYNC_AGENT_IDLE_TIMEOUT_MINUTES) || 120;
|
|
60
|
+
const SYNC_AGENT_IDLE_TIMEOUT_MS = SYNC_AGENT_IDLE_TIMEOUT_MINUTES * 60 * 1000;
|
|
40
61
|
/** Checkpoint reminder thresholds — kept in sync with
|
|
41
62
|
* src/providers/claude-sdk-provider.ts (where the actual hint injection
|
|
42
63
|
* happens). We mirror the check here so the session telemetry knows
|
|
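The `Number(process.env.X) || default` pattern used for both timeout constants in the hunk above has a few edge cases worth knowing when setting `ALVIN_SYNC_AGENT_IDLE_TIMEOUT_MINUTES`. The helper name `minutesFromEnv` is hypothetical; the expression inside it is the same one used in `message.js`.

```javascript
// Mirrors the `Number(value) || fallback` pattern used for the timeout constants.
function minutesFromEnv(value, fallback) {
  return Number(value) || fallback;
}

const cases = {
  unset: minutesFromEnv(undefined, 120), // NaN is falsy, so the default applies
  valid: minutesFromEnv("30", 120),      // numeric strings are honoured
  zero: minutesFromEnv("0", 120),        // 0 is falsy, so "0" cannot disable the timeout
  text: minutesFromEnv("abc", 120),      // NaN again, default applies
  negative: minutesFromEnv("-5", 120),   // truthy, so it passes through unvalidated
};
console.log(cases);
```

In practice this means a typo in the variable silently falls back to the default, and a negative value is accepted as written.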
@@ -165,21 +186,23 @@ export async function handleMessage(ctx) {
|
|
|
165
186
|
const typingInterval = setInterval(() => {
|
|
166
187
|
ctx.api.sendChatAction(ctx.chat.id, "typing").catch(() => { });
|
|
167
188
|
}, 4000);
|
|
168
|
-
//
|
|
169
|
-
//
|
|
170
|
-
//
|
|
171
|
-
|
|
172
|
-
|
|
173
|
-
|
|
174
|
-
|
|
175
|
-
|
|
189
|
+
// v4.12.1 — Task-aware stuck timer. Normal mode (STUCK_TIMEOUT_MS)
|
|
190
|
+
// fires after 10 min of silence. When a sync Task/Agent tool call is
|
|
191
|
+
// active (tracked by toolUseId in the for-await loop below), the
|
|
192
|
+
// timeout escalates to SYNC_AGENT_IDLE_TIMEOUT_MS (120 min) so
|
|
193
|
+
// legitimate long-running sub-agents that emit no intermediate chunks
|
|
194
|
+
// don't get falsely aborted. See src/handlers/stuck-timer.ts.
|
|
195
|
+
const stuckTimer = createStuckTimer({
|
|
196
|
+
normalMs: STUCK_TIMEOUT_MS,
|
|
197
|
+
extendedMs: SYNC_AGENT_IDLE_TIMEOUT_MS,
|
|
198
|
+
onTimeout: () => {
|
|
176
199
|
if (session.abortController && !session.abortController.signal.aborted) {
|
|
177
200
|
timedOut = true;
|
|
178
201
|
session.abortController.abort();
|
|
179
202
|
}
|
|
180
|
-
},
|
|
181
|
-
};
|
|
182
|
-
|
|
203
|
+
},
|
|
204
|
+
});
|
|
205
|
+
stuckTimer.reset();
|
|
183
206
|
try {
|
|
184
207
|
// React with 🤔 to show we're thinking
|
|
185
208
|
await react(ctx, "🤔");
|
|
@@ -304,8 +327,25 @@ export async function handleMessage(ctx) {
|
|
|
304
327
|
// not in the tool_result text). See Fix #17 Stage 2.
|
|
305
328
|
let lastAgentToolUseInput;
|
|
306
329
|
for await (const chunk of registry.queryWithFallback(queryOpts)) {
|
|
307
|
-
//
|
|
308
|
-
|
|
330
|
+
// v4.12.1 — Update pending-sync-task state FIRST so the timer's
|
|
331
|
+
// next reset picks up the new state. This ordering is load-bearing:
|
|
332
|
+
// reversing it means the timer rearms with stale state. A sync
|
|
333
|
+
// Task/Agent tool call switches the stuck timer to extended mode
|
|
334
|
+
// (120 min) to tolerate the silent gap until tool_result arrives.
|
|
335
|
+
if (chunk.type === "tool_use" &&
|
|
336
|
+
(chunk.toolName === "Task" || chunk.toolName === "Agent") &&
|
|
337
|
+
chunk.toolUseId &&
|
|
338
|
+
chunk.runInBackground !== true) {
|
|
339
|
+
stuckTimer.enterSync(chunk.toolUseId);
|
|
340
|
+
}
|
|
341
|
+
else if (chunk.type === "tool_result" && chunk.toolUseId) {
|
|
342
|
+
// Any tool_result may match a pending sync entry. Set.delete is
|
|
343
|
+
// a no-op if the id isn't in the set — safe for async results.
|
|
344
|
+
stuckTimer.exitSync(chunk.toolUseId);
|
|
345
|
+
}
|
|
346
|
+
// Any chunk is progress — reset the stuck timer (now with
|
|
347
|
+
// updated pending-sync state so the correct timeout is armed).
|
|
348
|
+
stuckTimer.reset();
|
|
309
349
|
switch (chunk.type) {
|
|
310
350
|
case "text":
|
|
311
351
|
finalText = chunk.text || "";
|
|
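The condition that decides whether a chunk starts a sync sub-agent can be read as a pure predicate. The function name `isSyncAgentToolUse` and the sample chunks are hypothetical; the logic restates the `if` condition from the hunk above, where a missing `runInBackground` counts as sync.

```javascript
// Pure predicate equivalent to the tool_use condition in the for-await loop.
function isSyncAgentToolUse(chunk) {
  return chunk.type === "tool_use" &&
    (chunk.toolName === "Task" || chunk.toolName === "Agent") &&
    Boolean(chunk.toolUseId) &&
    chunk.runInBackground !== true;
}

const samples = {
  syncTask: isSyncAgentToolUse({ type: "tool_use", toolName: "Task", toolUseId: "toolu_1" }),
  explicitSync: isSyncAgentToolUse({ type: "tool_use", toolName: "Agent", toolUseId: "toolu_2", runInBackground: false }),
  background: isSyncAgentToolUse({ type: "tool_use", toolName: "Task", toolUseId: "toolu_3", runInBackground: true }),
  otherTool: isSyncAgentToolUse({ type: "tool_use", toolName: "Bash", toolUseId: "toolu_4" }),
  missingId: isSyncAgentToolUse({ type: "tool_use", toolName: "Task" }),
};
console.log(samples);
```

Treating an absent flag as sync is the conservative choice: a misclassified background task only gets a longer idle window, while a misclassified sync task would be aborted after 10 minutes.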
@@ -473,8 +513,7 @@ export async function handleMessage(ctx) {
|
|
|
473
513
|
}
|
|
474
514
|
}
|
|
475
515
|
finally {
|
|
476
|
-
|
|
477
|
-
clearTimeout(stuckTimer);
|
|
516
|
+
stuckTimer.cancel();
|
|
478
517
|
clearInterval(typingInterval);
|
|
479
518
|
session.isProcessing = false;
|
|
480
519
|
session.abortController = null;
|
|
@@ -0,0 +1,54 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Task-aware stuck timer for the Telegram message handler (v4.12.1).
|
|
3
|
+
*
|
|
4
|
+
* The main handler must detect genuine SDK hangs (no chunks for N minutes)
+ * while NOT aborting legitimate long-running work — specifically sync Agent
+ * tool calls that emit no intermediate chunks for their entire duration.
+ *
+ * State machine:
+ *   - Normal mode: idle timeout = NORMAL_MS (default 10 min, env-configurable
+ *     in message.ts via ALVIN_STUCK_TIMEOUT_MINUTES)
+ *   - When any Agent/Task tool call is known to be running sync (tracked by
+ *     its toolUseId), the next reset() arms the timer with EXTENDED_MS
+ *     instead (default 120 min, env-configurable via
+ *     ALVIN_SYNC_AGENT_IDLE_TIMEOUT_MINUTES)
+ *   - Back to NORMAL_MS once all tracked sync tool calls have emitted their
+ *     tool_result and been released via exitSync()
+ *
+ * This module is pure — no grammy, no session, no provider. Takes its ms
+ * values and onTimeout callback as constructor args. Testable in isolation
+ * with vi.useFakeTimers(). The handler owns the state; the handler decides
+ * which chunks flip the mode based on chunk.toolName and chunk.runInBackground.
+ *
+ * See docs/superpowers/plans/... and test/stuck-timer.test.ts.
+ */
+export function createStuckTimer(cfg) {
+    const pending = new Set();
+    let handle = null;
+    const currentTimeout = () => pending.size > 0 ? cfg.extendedMs : cfg.normalMs;
+    const rearm = () => {
+        if (handle)
+            clearTimeout(handle);
+        handle = setTimeout(cfg.onTimeout, currentTimeout());
+    };
+    return {
+        reset: rearm,
+        enterSync(id) {
+            pending.add(id);
+            rearm();
+        },
+        exitSync(id) {
+            pending.delete(id);
+            rearm();
+        },
+        cancel() {
+            if (handle) {
+                clearTimeout(handle);
+                handle = null;
+            }
+        },
+        _pendingCount() {
+            return pending.size;
+        },
+    };
+}
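As a rough illustration of how a handler might drive this timer, the sketch below routes stream chunks to `enterSync`, `exitSync`, and `reset`. The `routeChunk` helper, the `SYNC_TOOLS` set, the `tool_result` chunk shape, and the inline copy of `createStuckTimer` are assumptions made so the example runs on its own; the real wiring lives in `message.js` and may differ.

```javascript
// Compact copy of createStuckTimer from the diff above, inlined so this runs standalone.
function createStuckTimer(cfg) {
    const pending = new Set();
    let handle = null;
    const currentTimeout = () => (pending.size > 0 ? cfg.extendedMs : cfg.normalMs);
    const rearm = () => {
        if (handle) clearTimeout(handle);
        handle = setTimeout(cfg.onTimeout, currentTimeout());
    };
    return {
        reset: rearm,
        enterSync(id) { pending.add(id); rearm(); },
        exitSync(id) { pending.delete(id); rearm(); },
        cancel() { if (handle) { clearTimeout(handle); handle = null; } },
        _pendingCount() { return pending.size; },
    };
}

// Hypothetical: tool names that launch sub-agents.
const SYNC_TOOLS = new Set(["Task", "Agent"]);

// Hypothetical router: decides which timer method a chunk should trigger.
function routeChunk(timer, chunk) {
    const isSyncAgent =
        chunk.type === "tool_use" &&
        SYNC_TOOLS.has(chunk.toolName) &&
        chunk.runInBackground !== true && // a missing flag counts as sync
        typeof chunk.toolUseId === "string";
    if (isSyncAgent) {
        timer.enterSync(chunk.toolUseId); // switch to the extended timeout
    } else if (chunk.type === "tool_result" && typeof chunk.toolUseId === "string") {
        timer.exitSync(chunk.toolUseId); // no-op delete for untracked ids, then re-arm
    } else {
        timer.reset(); // ordinary activity: re-arm with the current timeout
    }
}

const timer = createStuckTimer({
    normalMs: 10 * 60_000,
    extendedMs: 120 * 60_000,
    onTimeout: () => console.error("stream looks stuck, aborting"),
});

routeChunk(timer, { type: "text" });
routeChunk(timer, { type: "tool_use", toolName: "Task", toolUseId: "toolu_1" });
const pendingDuringSync = timer._pendingCount();
routeChunk(timer, { type: "tool_result", toolUseId: "toolu_1" });
const pendingAfterResult = timer._pendingCount();
timer.cancel(); // clear the timer so the process can exit
```

Tracking ids in a `Set` rather than a boolean means several overlapping sync sub-agents keep the extended timeout active until the last one has returned.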
@@ -161,6 +161,24 @@ export class ClaudeSDKProvider {
     }
     if ("name" in block) {
         localToolUseCount++;
+        // v4.12.1 — Extract run_in_background from the raw input
+        // object BEFORE the 500-char JSON truncation below. This is
+        // load-bearing: for long prompts the serialized input can
+        // exceed 500 chars, and naive post-truncation parsing would
+        // lose the flag and misclassify sync tasks as async (→ false
+        // 10-min abort on legitimate long-running sub-agents).
+        // See src/handlers/stuck-timer.ts and message.ts for the
+        // consumer side.
+        let runInBackground;
+        if ("input" in block &&
+            block.input &&
+            typeof block.input === "object") {
+            const input = block.input;
+            if (input.run_in_background === true)
+                runInBackground = true;
+            else if (input.run_in_background === false)
+                runInBackground = false;
+        }
         // Serialise the tool input (parameters) so the message
         // handler can surface detail for specific tools — most
         // importantly the "Task" tool where `input.description`
@@ -176,10 +194,17 @@ export class ClaudeSDKProvider {
                 // unserializable — skip
             }
         }
+        // Tool-use blocks in the Anthropic API always have an `id`
+        // at runtime, but the SDK's .d.ts shape doesn't guarantee it
+        // — defensive cast. Used by the task-aware stuck timer to
+        // correlate tool_use → tool_result for sync tracking.
+        const toolUseId = block.id;
         yield {
             type: "tool_use",
             toolName: block.name,
             toolInput: toolInputStr,
+            toolUseId,
+            runInBackground,
             sessionId: capturedSessionId,
         };
     }
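To see why the flag has to be read before truncation, consider the minimal sketch below. The 500-character cut comes from the comment in the hunk above; the `slice`-based truncation, both helper functions, and the sample input are assumptions for illustration, not the provider's actual code.

```javascript
// Sample tool input whose prompt pushes the serialized form well past 500 chars.
const input = {
    description: "Audit repository",
    prompt: "x".repeat(600),
    run_in_background: true,
};

// Reads the flag from the raw object, mirroring the tri-state logic in the diff:
// true, false, or undefined when the flag is absent or not a boolean.
function readFlagFromObject(obj) {
    if (obj && typeof obj === "object") {
        if (obj.run_in_background === true) return true;
        if (obj.run_in_background === false) return false;
    }
    return undefined;
}

// Hypothetical "naive" variant: serialize, truncate, then try to parse again.
function readFlagAfterTruncation(obj, limit = 500) {
    const truncated = JSON.stringify(obj).slice(0, limit);
    try {
        return readFlagFromObject(JSON.parse(truncated));
    } catch {
        return undefined; // truncated JSON no longer parses, so the flag is lost
    }
}

const fromObject = readFlagFromObject(input);
const fromTruncated = readFlagAfterTruncation(input);
console.log({ fromObject, fromTruncated });
// → { fromObject: true, fromTruncated: undefined }
```

In this sketch an `undefined` result after truncation is indistinguishable from a call that never set the flag, which is the misclassification the comment above warns about.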
@@ -61,43 +61,68 @@ const SDK_ADDON = `When you run commands or edit files, briefly explain what you
  * See test/system-prompt-background-hint.test.ts and
  * docs/superpowers/plans/2026-04-13-async-subagents.md
  */
-const BACKGROUND_SUBAGENT_HINT = `## Background Sub-Agents (
+const BACKGROUND_SUBAGENT_HINT = `## ⚠️ CRITICAL: Background Sub-Agents (Telegram blocking rule)
 
-For
-
-
-
-
+**The rule**: For any Agent/Task tool call that will take more than
+~30 seconds, you MUST set \`run_in_background: true\`. Otherwise the
+Telegram user is COMPLETELY BLOCKED from messaging me until the task
+finishes — terrible UX for anything that ends up taking minutes or
+hours. This is the single most important rule for using the Agent tool.
 
-
-
-
-
-
-
-
-message in this chat — the bot handles that delivery automatically.
-You don't need to poll the outputFile proactively.
+**Why it matters**: During a synchronous Agent tool call the parent
+session has no way to know the sub-agent is still working. It appears
+hung. After a long silence the session can be aborted, destroying the
+work. Using \`run_in_background: true\` solves this: the tool returns
+an \`agentId\` + \`outputFile\` path IMMEDIATELY, your turn ends in
+seconds, the user can keep chatting with me, and the bot automatically
+delivers the sub-agent's final result as a separate message when ready.
 
-
-
-
+**Decision tree** (apply every time you consider the Agent/Task tool):
+
+Does the task involve ANY of the following?
+• Visiting more than 2 URLs
+• Reading/analyzing more than 5 files
+• A full-repo scan, audit, or refactor proposal
+• A multi-section report or structured document
+• Tasks you'd describe as "thorough", "comprehensive", or "deep"
+• Shell commands that may take >30s (builds, installs, tests)
+• Crawling, scraping, or fetching multiple resources
+• Research across multiple sources or domains
+
+YES → \`run_in_background: true\` (no exceptions)
+NO → foreground is fine (single quick sub-query under 30s)
 
-**
--
-- Research
--
+**Examples where you MUST use \`run_in_background: true\`:**
+- ANY audit (SEO, security, code quality, performance, accessibility, GEO)
+- Research visiting more than 1-2 web pages
+- Code reviews on more than a single file
+- Full-repo analyses, dependency scans, architecture proposals
 - Report generation with multiple sub-steps
--
+- Build / install / test runs
+- Long data-processing jobs
+- Anything involving the word "analyze", "audit", "review", "scan", "research"
 
-**
--
--
--
-
+**Examples where foreground is fine:**
+- "Read this file and summarize it" (single file, <10s)
+- "What's 2+2?" (no sub-agent needed — answer yourself)
+- "Check if package.json has foo" (one quick tool call)
+
+**After launching a background agent, you MUST:**
+1. Tell the user in ONE short sentence what you kicked off.
+Example: "Starting SEO audit for gethomes.io in the background —
+I'll send the report when it's done."
+2. End your turn IMMEDIATELY. Do not continue working. Do not wait.
+3. The bot will deliver the result as a separate message when ready.
+You don't need to poll the outputFile proactively.
+
+If the user asks "is it done yet?" before the bot delivers the result,
+you MAY read the agent's \`outputFile\` (from the original tool result)
+using the Read tool to peek at progress — but don't block on it.
 
-
-
+**Never** call the Agent/Task tool without \`run_in_background: true\`
+for anything you're not 100% sure completes in under 30 seconds. The
+cost of unnecessary background mode is zero. The cost of blocking the
+Telegram user for 20 minutes on a synchronous call is very high.`;
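The decision tree in the hint is written for the model and is not executed by the bot. Purely as an illustration of the thresholds it names, the same rule can be sketched as a predicate. The `shouldRunInBackground` and `buildAgentInput` functions and the shape of the `task` argument are hypothetical and do not exist in alvin-bot.

```javascript
// Trigger words listed in the hint's last "MUST use" example.
const KEYWORDS = ["analyze", "audit", "review", "scan", "research"];

// Hypothetical predicate mirroring the hint's thresholds:
// more than 2 URLs, more than 5 files, longer than ~30 seconds, or a trigger word.
function shouldRunInBackground(task) {
    const text = (task.description ?? "").toLowerCase();
    return (
        (task.urlCount ?? 0) > 2 ||
        (task.fileCount ?? 0) > 5 ||
        (task.estimatedSeconds ?? 0) > 30 ||
        KEYWORDS.some((word) => text.includes(word)) // crude substring match
    );
}

// Hypothetical helper that builds the Agent/Task tool input with the flag set explicitly.
function buildAgentInput(task) {
    return {
        description: task.description,
        prompt: task.prompt,
        run_in_background: shouldRunInBackground(task),
    };
}

const quick = buildAgentInput({
    description: "Read this file and summarize it",
    prompt: "Summarize README.md",
    fileCount: 1,
    estimatedSeconds: 10,
});
const heavy = buildAgentInput({
    description: "SEO audit for gethomes.io",
    prompt: "Crawl the site and report issues",
    urlCount: 10,
});
console.log(quick.run_in_background, heavy.run_in_background);
```

Setting the flag explicitly to `false` in the foreground case, rather than omitting it, gives a consumer a definite value to read.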
 /**
  * Self-Awareness Core — Dynamic introspection block.
  *