alvin-bot 4.14.1 → 4.15.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,119 @@
2
2
 
3
3
  All notable changes to Alvin Bot are documented here.
4
4
 
5
+ ## [4.15.0] — 2026-04-16
6
+
7
+ ### ✨ Feature: auto-latest Claude model selection + per-workspace overrides
8
+
9
+ Alvin now picks up new Claude models (e.g. Opus 4.7 on Max subscription) automatically, and users can switch between Opus / Sonnet / Haiku tiers directly from Telegram — or pin a specific tier per workspace.
10
+
11
+ #### What's new
12
+
13
+ **`/model` now lists four Claude entries** (plus any configured custom providers + Ollama):
14
+ - `Claude (Agent SDK)` — CLI default (= whatever Anthropic ships as current, currently Opus 4.7)
15
+ - `Claude Opus (auto-latest)` — forwards `model: "opus"` to the Agent SDK → latest Opus tier
16
+ - `Claude Sonnet (auto-latest)` — same pattern with Sonnet
17
+ - `Claude Haiku (auto-latest)` — same pattern with Haiku
18
+
19
+ The three aliased entries all route through `ClaudeSDKProvider` with different `model:` values. Switching persists to `~/.alvin-bot/.env` (`PRIMARY_PROVIDER=…`), so the choice survives bot restarts.
20
+
21
+ **Workspaces can pin a model** via an optional YAML frontmatter field:
22
+
23
+ ```yaml
24
+ ---
25
+ purpose: Interview prep
26
+ cwd: ~/Documents/Interviews
27
+ model: sonnet # opus | sonnet | haiku | claude-opus-4-7 | ...
28
+ ---
29
+ ```
30
+
31
+ When `model:` is omitted (the default for all existing workspaces), the globally active `/model` choice is used — no behaviour change.
32
+
33
+ **Fallback on rate limits:** the Agent SDK is now always called with `fallbackModel: "haiku"`. Keeps the bot responsive when the primary tier is throttled.
34
+
35
+ #### Why this matters
36
+
37
+ Before v4.15, `claude-opus-4-6` was hardcoded in six places. When Anthropic released Opus 4.7 on the Max plan, the CLI picked it up automatically — but Alvin's `/status` still claimed `claude-opus-4-6`, and there was no way to force a specific tier from Telegram. The Agent SDK's `query()` call wasn't even receiving a `model:` parameter, so whatever lived in `config.model` was dead metadata.
38
+
39
+ Now:
40
+ - The default `"inherit"` means "don't pass model: — let the CLI pick its current default." Fresh installs on Max plans get Opus 4.7 automatically.
41
+ - Aliases (`opus` / `sonnet` / `haiku`) resolve to the latest tier each release cycle without any code change.
42
+ - Pinning a specific ID (e.g. `claude-opus-4-7`) is supported for reproducibility.
43
+
44
+ #### Implementation
45
+
46
+ - `src/providers/claude-sdk-provider.ts` — forwards `model:` and sets `fallbackModel: "haiku"` on every `query()` call. Resolution order: per-query `options.model` → provider `this.config.model` → `"inherit"` (= no model passed).
47
+ - `src/providers/registry.ts` — registers three virtual entries (`claude-opus`, `claude-sonnet`, `claude-haiku`) as additional keys all backed by `ClaudeSDKProvider` with different `model:` values.
48
+ - `src/services/env-file.ts` — new module extracting the `readEnv` / `writeEnvVar` / `removeEnvVar` helpers from `setup-api.ts` so Telegram command handlers can persist runtime choices.
49
+ - `src/handlers/commands.ts` — `switchProviderWithLifecycle` now calls `writeEnvVar("PRIMARY_PROVIDER", targetKey)` on every switch, not just Web UI changes.
50
+ - `src/services/workspaces.ts` — `Workspace` type gets optional `model?: string`, the YAML parser picks it up from frontmatter.
51
+ - `src/providers/types.ts` — `QueryOptions` gets optional `model?: string` for per-query overrides.
52
+ - `src/handlers/message.ts` + `src/handlers/platform-message.ts` — both forward `workspace.model` into `queryOpts` when the active workspace has one defined.
53
+
54
+ #### Backward compatibility
55
+
56
+ - Default provider config is `"inherit"` — identical to pre-v4.15 behaviour (no `model:` passed to the Agent SDK, CLI default wins).
57
+ - Workspaces without a `model:` field behave exactly as before.
58
+ - Stale presets `claude-sonnet-4-20250514` → `claude-sonnet-4-6` and `claude-3-5-haiku-20241022` → `claude-haiku-4-5` updated (previously unused — only affected the REST-API code paths, which nobody referenced).
59
+
60
+ #### Docs
61
+
62
+ Workspace guides updated (`docs/install/workspaces-de.html` + `workspaces-en.html`) — the YAML-field reference table now documents the new optional `model:` entry.
63
+
64
+ ### 🐛 Bonus: stale model-ID cleanup
65
+
66
+ Four hardcoded Claude model IDs replaced with current strings: `claude-sonnet-4-20250514` → `claude-sonnet-4-6`, `claude-3-5-haiku-20241022` → `claude-haiku-4-5`, openai-compat fallback `claude-opus-4` → `claude-opus-4-6`, setup-API defaults likewise. None of these were on active code paths, but they would have shipped confusing display names if anyone had referenced them.
67
+
68
+ ### Commits
69
+
70
+ - `fed4b91` — feat(providers): v4.15 — auto-latest Claude model selection via /model
71
+ - `b2a6e1f` — feat(workspaces): v4.15 — optional per-workspace model override
72
+
73
+ ---
74
+
75
+ ## [4.14.2] — 2026-04-16
76
+
77
+ ### 🐛 Patch: watcher zombie-entry fix (missing outputFile > 10 min = failed)
78
+
79
+ **Edge case Ali caught today:** a pending async-agent entry stuck in `/subagents list` for 3+ hours showing "running" — but the underlying `alvin_dispatch_agent` subprocess had already died (its output file was gone). The entry would have continued haunting the list until the 12-hour `giveUpAt` ceiling fired.
80
+
81
+ **Root cause:** `async-agent-watcher`'s `pollOnce` handled four states from `parseOutputFileStatus` — `completed` / `failed` / `running` / `missing`. For `missing` (file doesn't exist or is empty), the watcher just kept polling forever, on the assumption that a slow subprocess might eventually write. If the subprocess crashed before writing ANY output, the file never appeared, and we polled for 12 hours before timing out.
82
+
83
+ **Fix:** when `status.state === "missing"` AND `now - entry.startedAt > MISSING_FILE_FAILURE_MS` (default 10 min, configurable via `ALVIN_MISSING_FILE_FAILURE_MS` env var), deliver as failed with an explicit message:
84
+
85
+ > *Dispatched subprocess never wrote its output file (N m after start). Likely crashed before initializing, or the file was removed externally.*
86
+
87
+ 10 minutes is well above any legitimate `claude -p` startup variance (normal first-write latency is seconds) and well below the 12-hour hard ceiling.
88
+
89
+ ### What's preserved (regression-guard tested)
90
+
91
+ - Running agents (file has content but no `end_turn`/`result` yet) are untouched by this path — they still keep polling as before.
92
+ - Completed agents (clean `end_turn` or `stream-json result` event) still deliver normally.
93
+ - Explicit `failed` state from the parser (if ever used) still delivers error normally.
94
+ - v4.12.4's "file is stale but has text → deliver partial" path takes precedence over the new zombie check (the file has content, so not "missing").
95
+ - 12-hour `giveUpAt` hard ceiling still applies as the ultimate safety net.
96
+ - Session's `pendingBackgroundCount` decrement fires on zombie failure, same as every other delivery path.
97
+
98
+ ### Testing
99
+
100
+ - **Baseline**: 498 tests (v4.14.1)
101
+ - **New**: `test/watcher-zombie-fix.test.ts` — 6 tests:
102
+ - Young missing file (<threshold) stays pending
103
+ - Old missing file (>threshold) delivers failed + removes from pending
104
+ - Default threshold is 10 min when env var unset
105
+ - Running file (has content) is unaffected by zombie check
106
+ - Completed file delivers as completed (regression guard)
107
+ - Session's `pendingBackgroundCount` decrements on zombie delivery
108
+ - **Total**: 504 tests, all green, TSC clean
109
+
110
+ ### Files changed
111
+
112
+ - **Modified**: `src/services/async-agent-watcher.ts` (new `getMissingFileFailureMs()` + zombie branch in `pollOnce`)
113
+ - **NEW tests**: `test/watcher-zombie-fix.test.ts`
114
+ - **Version**: `package.json` 4.14.1 → 4.14.2
115
+
116
+ ---
117
+
5
118
  ## [4.14.1] — 2026-04-16
6
119
 
7
120
  ### 🐛 Patch: `/subagents list` now shows v4.13+ dispatch agents too
package/README.md CHANGED
@@ -64,7 +64,7 @@ Alvin Bot is an open-source, self-hosted AI agent that lives where you chat. Bui
64
64
  - **Adjustable Thinking** — From quick answers (`/effort low`) to deep analysis (`/effort max`)
65
65
  - **Persistent Memory** — Remembers across sessions via vector-indexed knowledge base; session state (Claude SDK resume tokens, conversation history, language, effort) survives bot restarts (v4.11.0)
66
66
  - **Multi-Session Workspaces** — Run multiple parallel, context-isolated sessions on the same bot — one per Slack channel or per Telegram `/workspace` — each with its own working directory, purpose, and persona. Memory, skills, and sub-agents stay globally shared (v4.12.0). [How-to ↓](#-multi-session-workspaces-v4120)
67
- - **Background Sub-Agents** — Claude autonomously uses `run_in_background: true` for long audits/research; main session stays responsive, results deliver as separate messages (v4.10.0)
67
+ - **Truly Detached Sub-Agents** — Claude dispatches long-running research/audit tasks via the `alvin_dispatch_agent` MCP tool, which spawns independent `claude -p` subprocesses with their own PID + process group. Main session stays fully responsive, user can interrupt freely without killing sub-agents. Results deliver as separate messages. Works identically on Telegram, Slack, Discord, and WhatsApp (v4.13.0+ dispatch, v4.14.0 multi-platform)
68
68
  - **Smart Tool Discovery** — Scans your system at startup, knows exactly what CLI tools, plugins, and APIs are available
69
69
  - **Skill System** — 12 built-in SKILL.md files (code, data analysis, email, docs, research, sysadmin, browse, etc.) auto-activate based on message context
70
70
  - **Self-Awareness** — Knows it IS the AI model — won't call external APIs for tasks it can do itself
@@ -406,7 +406,7 @@ curl -s http://localhost:3100/api/workspaces | jq
406
406
  ### Architecture guarantees
407
407
 
408
408
  - **Memory is global.** Facts Alvin learns in `#alev-b` are visible in `#homes` via the shared `MEMORY.md` and embeddings index. Per-workspace memory layer is on the v4.13 roadmap.
409
- - **Sub-agents are per-session.** Each workspace can spawn its own `run_in_background` agents — results come back to the same channel automatically (v4.10.0).
409
+ - **Sub-agents are per-session.** Each workspace can dispatch its own detached sub-agents via `alvin_dispatch_agent` — results come back to the originating channel on any platform (Telegram, Slack, Discord, WhatsApp), visible in `/subagents list` (v4.13.0+ dispatch, v4.14.0 cross-platform, v4.14.1 unified list view).
410
410
  - **Session state survives restart.** Claude SDK `resume` tokens, conversation history, language, effort, and `workspaceName` all persist via `session-persistence.ts` (v4.11.0).
411
411
  - **Backwards compatible.** If you don't create any workspace files, everything behaves exactly like v4.11. Upgrade is a no-op.
412
412
 
@@ -650,14 +650,19 @@ alvin-bot version # Show version
650
650
  - [x] Telegram `/workspace` + `/workspaces` commands (feature parity)
651
651
  - [x] Per-workspace cost aggregation + Web UI workspace cards
652
652
  - [x] Slack setup guide + copy-paste app manifest (in GitHub Release assets)
653
- - [ ] **Phase 17** — Memory + Workspace polish (v4.13.0+)
653
+ - [x] **Phase 17** — Truly detached sub-agents + multi-platform dispatch (v4.13.0 – v4.14.2, 2026-04-16)
654
+ - [x] `alvin_dispatch_agent` MCP tool — spawns independent `claude -p` subprocesses that survive parent aborts (v4.13.0)
655
+ - [x] Slack `/alvin` slash command (namespaced parent with subcommands: status / new / effort / help + LLM fallthrough) (v4.13.2)
656
+ - [x] Sub-agent dispatch on Slack, Discord, WhatsApp via platform-aware delivery registry (v4.14.0)
657
+ - [x] `/subagents list` merged view — v4.0.0 bot-level agents + v4.13+ detached dispatches in one list (v4.14.1)
658
+ - [x] Watcher zombie guard — missing outputFile > 10 min delivers as failed instead of 12h timeout (v4.14.2)
659
+ - [x] Staleness-based partial output recovery for interrupted sub-agents (v4.12.4)
654
660
  - [ ] SQLite migration of the embeddings index (currently 128 MB JSON)
655
661
  - [ ] Per-workspace memory layer (additive over global) — facts learned in `#alev-b` stay in `alev-b` unless explicitly promoted to global
656
662
  - [ ] Per-workspace provider override (`provider:` in frontmatter) — e.g. Alev-B uses Claude Opus, JobSnack uses cheap Gemini
657
663
  - [ ] Per-workspace skill allowlist — scope Apple Notes to personal workspace, sysadmin only to devops workspace, etc.
658
664
  - [ ] Multi-User Slack (real `per-channel-peer` mode) — different users in the same Slack channel get their own sub-sessions
659
665
  - [ ] Workspace cloning / templates — `/workspace clone alev-b as homes-dev` spins up a new workspace from an existing one
660
- - [ ] Slack slash commands (`/alvin workspace`, `/alvin status`, `/alvin new`) — native Slack command integration via Bolt
661
666
  - [ ] Daily log decay / archive — older daily logs move to cold storage after N days
662
667
  - [ ] **Phase 18** — Security + Platform hardening (from v4.12.1 audit, prioritized)
663
668
  - [ ] **P1 — Electron major upgrade** (35 → 41+) — fixes 1 HIGH + 5 MODERATE Electron CVEs in the Desktop-Build path. Major version jump, requires full rebuild + test of `.dmg` flow. Separate release (likely bundled with Windows `.exe` work).
@@ -16,6 +16,7 @@ import { getLoadedPlugins, getPluginsDir } from "../services/plugins.js";
16
16
  import { getMCPStatus, getMCPTools, callMCPTool } from "../services/mcp.js";
17
17
  import { listCustomTools, executeCustomTool } from "../services/custom-tools.js";
18
18
  import { screenshotUrl, extractText, generatePdf, hasPlaywright } from "../services/browser.js";
19
+ import { writeEnvVar } from "../services/env-file.js";
19
20
  import { listJobs, createJob, deleteJob, toggleJob, runJobNow, formatNextRun, humanReadableSchedule } from "../services/cron.js";
20
21
  import { resolveJobByNameOrId } from "../services/cron-resolver.js";
21
22
  import { buildTickerText, buildDoneText, escapeMarkdown } from "./cron-progress.js";
@@ -518,6 +519,15 @@ export function registerCommands(bot) {
518
519
  if (!registry.switchTo(targetKey)) {
519
520
  return { ok: false, error: "switch rejected by registry" };
520
521
  }
522
+ // v4.15 — Persist the switch to ~/.alvin-bot/.env so the choice
523
+ // survives bot restarts. In-memory switchTo() alone would revert to
524
+ // PRIMARY_PROVIDER on next boot.
525
+ try {
526
+ writeEnvVar("PRIMARY_PROVIDER", targetKey);
527
+ }
528
+ catch (err) {
529
+ console.warn("⚠️ Failed to persist PRIMARY_PROVIDER:", err);
530
+ }
521
531
  // Tear down the previous provider's lifecycle (if any) after the switch.
522
532
  // ensureStopped() internally checks isBotManaged — no-op for externally
523
533
  // managed daemons.
@@ -386,6 +386,9 @@ export async function handleMessage(ctx) {
386
386
  systemPrompt,
387
387
  workingDir: session.workingDir,
388
388
  effort: session.effort,
389
+ // v4.15 — Per-workspace model override (optional YAML `model:` field).
390
+ // When unset, falls through to the globally active provider's model.
391
+ ...(workspace.model ? { model: workspace.model } : {}),
389
392
  abortSignal: session.abortController.signal,
390
393
  // User's UI locale — registry uses it to localize failure messages.
391
394
  locale: session.language,
@@ -169,6 +169,9 @@ export async function handlePlatformMessage(msg, adapter) {
169
169
  systemPrompt,
170
170
  workingDir: session.workingDir,
171
171
  effort: session.effort,
172
+ // v4.15 — Per-workspace model override (optional YAML `model:` field).
173
+ // When unset, falls through to the globally active provider's model.
174
+ ...(workspace.model ? { model: workspace.model } : {}),
172
175
  sessionId: isSDK ? session.sessionId : null,
173
176
  history: !isSDK ? session.history : undefined,
174
177
  // v4.14 — Expose alvin_dispatch_agent MCP tool on non-Telegram
@@ -49,7 +49,10 @@ export class ClaudeSDKProvider {
49
49
  this.config = {
50
50
  type: "claude-sdk",
51
51
  name: "Claude (Agent SDK)",
52
- model: "claude-opus-4-6",
52
+ // "inherit" = don't pass model: to the SDK → Claude CLI default wins
53
+ // (currently Opus 4.7 on Max subscription). Override with an alias
54
+ // ("opus" | "sonnet" | "haiku") or a full ID ("claude-opus-4-7").
55
+ model: "inherit",
53
56
  supportsTools: true,
54
57
  supportsVision: true,
55
58
  supportsStreaming: true,
@@ -123,6 +126,13 @@ export class ClaudeSDKProvider {
123
126
  if (options.alvinDispatchContext) {
124
127
  defaultAllowed.push("mcp__alvin__dispatch_agent");
125
128
  }
129
+ // v4.15 — Forward model selection to the Agent SDK. Resolution order:
130
+ // 1. options.model (per-query override — e.g. workspace `model:` field)
131
+ // 2. this.config.model (provider-level default — e.g. claude-sonnet)
132
+ // 3. "inherit" → don't pass model: → Claude CLI default (Opus 4.7 on Max)
133
+ // Aliases "opus" | "sonnet" | "haiku" auto-resolve to the latest tier.
134
+ const rawModel = options.model ?? this.config.model;
135
+ const modelOverride = rawModel && rawModel !== "inherit" ? rawModel : undefined;
126
136
  const q = query({
127
137
  prompt,
128
138
  options: {
@@ -145,6 +155,12 @@ export class ClaudeSDKProvider {
145
155
  effort: (options.effort || "medium"),
146
156
  maxTurns: 50,
147
157
  betas: ["context-1m-2025-08-07"],
158
+ ...(modelOverride ? { model: modelOverride } : {}),
159
+ // Always prefer Haiku as fallback on rate-limit/overload — cheap
160
+ // and fast, keeps the bot responsive when the primary tier is
161
+ // throttled. Users can disable this by setting model: "inherit"
162
+ // and relying purely on CLI defaults.
163
+ fallbackModel: "haiku",
148
164
  },
149
165
  });
150
166
  let accumulatedText = "";
@@ -370,9 +386,12 @@ export class ClaudeSDKProvider {
370
386
  }
371
387
  }
372
388
  getInfo() {
389
+ const model = this.config.model === "inherit"
390
+ ? "CLI default (latest)"
391
+ : this.config.model;
373
392
  return {
374
393
  name: this.config.name,
375
- model: this.config.model,
394
+ model,
376
395
  status: "✅ Agent SDK (CLI auth)",
377
396
  };
378
397
  }
@@ -271,13 +271,38 @@ export function createRegistry(config) {
271
271
  model: "gpt-5.4",
272
272
  };
273
273
  }
274
- // Always register Claude SDK if it's referenced
275
- if (config.primary === "claude-sdk" || config.fallbacks?.includes("claude-sdk")) {
274
+ // Claude (Agent SDK) the base provider plus three tier-aliased virtual
275
+ // entries. All four route through the same ClaudeSDKProvider implementation
276
+ // but pass a different `model:` to the Agent SDK at query time. The aliases
277
+ // ("opus" | "sonnet" | "haiku") auto-resolve to the latest tier on the
278
+ // Claude CLI — no hardcoded version IDs, no manual updates when Anthropic
279
+ // releases a new model.
280
+ const claudeKeys = ["claude-sdk", "claude-opus", "claude-sonnet", "claude-haiku"];
281
+ const claudeReferenced = claudeKeys.some((k) => config.primary === k || config.fallbacks?.includes(k));
282
+ if (claudeReferenced) {
276
283
  providers["claude-sdk"] = {
277
284
  ...PROVIDER_PRESETS["claude-sdk"],
278
285
  type: "claude-sdk",
279
286
  name: "Claude (Agent SDK)",
280
- model: "claude-opus-4-6",
287
+ model: "inherit", // CLI default → currently Opus 4.7 on Max plan
288
+ };
289
+ providers["claude-opus"] = {
290
+ ...PROVIDER_PRESETS["claude-sdk"],
291
+ type: "claude-sdk",
292
+ name: "Claude Opus (auto-latest)",
293
+ model: "opus",
294
+ };
295
+ providers["claude-sonnet"] = {
296
+ ...PROVIDER_PRESETS["claude-sdk"],
297
+ type: "claude-sdk",
298
+ name: "Claude Sonnet (auto-latest)",
299
+ model: "sonnet",
300
+ };
301
+ providers["claude-haiku"] = {
302
+ ...PROVIDER_PRESETS["claude-sdk"],
303
+ type: "claude-sdk",
304
+ name: "Claude Haiku (auto-latest)",
305
+ model: "haiku",
281
306
  };
282
307
  }
283
308
  // Register Google Gemini only if explicitly referenced as primary/fallback
@@ -38,8 +38,8 @@ export const PROVIDER_PRESETS = {
38
38
  },
39
39
  "claude-sonnet": {
40
40
  type: "openai-compatible",
41
- name: "Claude Sonnet 4",
42
- model: "claude-sonnet-4-20250514",
41
+ name: "Claude Sonnet 4.6",
42
+ model: "claude-sonnet-4-6",
43
43
  baseUrl: "https://api.anthropic.com/v1/",
44
44
  supportsVision: true,
45
45
  supportsStreaming: true,
@@ -48,8 +48,8 @@ export const PROVIDER_PRESETS = {
48
48
  },
49
49
  "claude-haiku": {
50
50
  type: "openai-compatible",
51
- name: "Claude 3.5 Haiku",
52
- model: "claude-3-5-haiku-20241022",
51
+ name: "Claude Haiku 4.5",
52
+ model: "claude-haiku-4-5",
53
53
  baseUrl: "https://api.anthropic.com/v1/",
54
54
  supportsVision: true,
55
55
  supportsStreaming: true,
@@ -33,6 +33,31 @@ const POLL_INTERVAL_MS = 15_000;
33
33
  * a timeout banner. SEO audits historically take ~13 min, so 12h
34
34
  * is absurdly generous and protects against state-file growth. */
35
35
  const MAX_AGENT_AGE_MS = 12 * 60 * 60 * 1000;
36
+ /**
37
+ * v4.14.2 — When a dispatched subprocess never creates its outputFile
38
+ * (spawn failure, crash before first write, file deleted externally),
39
+ * `parseOutputFileStatus` returns "missing" on every poll. Pre-v4.14.2
40
+ * that meant waiting the full 12h MAX_AGENT_AGE_MS before delivering a
41
+ * timeout — a 12-hour zombie in `/subagents list`.
42
+ *
43
+ * This threshold caps how long we tolerate a missing file before
44
+ * declaring the agent failed. `claude -p` normally writes its first
45
+ * JSONL line within seconds of spawn; 10 minutes is way above any
46
+ * legitimate startup variance and well below the 12h ceiling.
47
+ *
48
+ * Configurable via the ALVIN_MISSING_FILE_FAILURE_MS env var. Tests
49
+ * use shorter values via the same hook. Only the getter is exposed
50
+ * so callers always see the current env value, not a stale constant.
51
+ */
52
+ function getMissingFileFailureMs() {
53
+ const raw = process.env.ALVIN_MISSING_FILE_FAILURE_MS;
54
+ if (raw) {
55
+ const n = Number(raw);
56
+ if (Number.isFinite(n) && n > 0)
57
+ return n;
58
+ }
59
+ return 10 * 60 * 1000; // default 10 min
60
+ }
36
61
  // ── Module state ──────────────────────────────────────────────────
37
62
  const pending = new Map();
38
63
  let pollTimer = null;
@@ -139,6 +164,7 @@ export function stopWatcher() {
139
164
  export async function pollOnce() {
140
165
  const now = Date.now();
141
166
  const toRemove = [];
167
+ const missingFileFailureMs = getMissingFileFailureMs();
142
168
  for (const entry of pending.values()) {
143
169
  entry.lastCheckedAt = now;
144
170
  // Timeout check first — if the agent is past its giveUpAt, give up
@@ -157,7 +183,15 @@ export async function pollOnce() {
157
183
  await deliverAsFailure(entry, "error", status.error);
158
184
  toRemove.push(entry.agentId);
159
185
  }
160
- // running / missing → keep polling next cycle
186
+ else if (status.state === "missing" &&
187
+ now - entry.startedAt > missingFileFailureMs) {
188
+ // v4.14.2 — Zombie guard: the subprocess never created its
189
+ // output file within `missingFileFailureMs` (default 10 min).
190
+ // Declare failed instead of polling until the 12h giveUpAt.
191
+ await deliverAsFailure(entry, "error", `Dispatched subprocess never wrote its output file (${Math.round((now - entry.startedAt) / 60_000)}m after start). Likely crashed before initializing, or the file was removed externally.`);
192
+ toRemove.push(entry.agentId);
193
+ }
194
+ // running / missing-but-young → keep polling next cycle
161
195
  }
162
196
  if (toRemove.length > 0) {
163
197
  for (const id of toRemove)
@@ -0,0 +1,46 @@
1
+ /**
2
+ * env-file — Shared helpers for reading and persisting key=value pairs
3
+ * in ~/.alvin-bot/.env. Previously private to setup-api.ts; extracted so
4
+ * Telegram command handlers (e.g. /model) can persist the user's runtime
5
+ * choices across bot restarts.
6
+ *
7
+ * All writes go through writeSecure() which enforces 0o600 on the env
8
+ * file — it contains bot tokens and API keys.
9
+ */
10
+ import fs from "fs";
11
+ import { ENV_FILE } from "../paths.js";
12
+ import { writeSecure } from "./file-permissions.js";
13
+ /** Read the env file into a plain object. Skips comments and malformed lines. */
14
+ export function readEnv() {
15
+ if (!fs.existsSync(ENV_FILE))
16
+ return {};
17
+ const lines = fs.readFileSync(ENV_FILE, "utf-8").split("\n");
18
+ const env = {};
19
+ for (const line of lines) {
20
+ if (line.startsWith("#") || !line.includes("="))
21
+ continue;
22
+ const idx = line.indexOf("=");
23
+ env[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
24
+ }
25
+ return env;
26
+ }
27
+ /** Upsert a key=value pair in the env file, preserving all other lines. */
28
+ export function writeEnvVar(key, value) {
29
+ let content = fs.existsSync(ENV_FILE) ? fs.readFileSync(ENV_FILE, "utf-8") : "";
30
+ const regex = new RegExp(`^${key}=.*$`, "m");
31
+ if (regex.test(content)) {
32
+ content = content.replace(regex, `${key}=${value}`);
33
+ }
34
+ else {
35
+ content = content.trimEnd() + `\n${key}=${value}\n`;
36
+ }
37
+ writeSecure(ENV_FILE, content);
38
+ }
39
+ /** Remove a key from the env file. No-op if missing. */
40
+ export function removeEnvVar(key) {
41
+ if (!fs.existsSync(ENV_FILE))
42
+ return;
43
+ let content = fs.readFileSync(ENV_FILE, "utf-8");
44
+ content = content.replace(new RegExp(`^${key}=.*\n?`, "m"), "");
45
+ writeSecure(ENV_FILE, content);
46
+ }
@@ -99,6 +99,7 @@ function readWorkspaceFile(filePath, name) {
99
99
  const cwd = expandHome(rawCwd);
100
100
  const color = typeof fm.color === "string" ? fm.color : undefined;
101
101
  const emoji = typeof fm.emoji === "string" ? fm.emoji : undefined;
102
+ const model = typeof fm.model === "string" && fm.model.trim() ? fm.model.trim() : undefined;
102
103
  const channels = Array.isArray(fm.channels)
103
104
  ? fm.channels.filter((c) => typeof c === "string")
104
105
  : [];
@@ -109,6 +110,7 @@ function readWorkspaceFile(filePath, name) {
109
110
  color,
110
111
  emoji,
111
112
  channels,
113
+ model,
112
114
  systemPromptOverride: body.trim(),
113
115
  };
114
116
  }
@@ -78,7 +78,7 @@ async function handleChatCompletions(req, res, body) {
78
78
  const { prompt, systemPrompt } = buildPromptFromMessages(oaiReq.messages);
79
79
  const completionId = `chatcmpl-${crypto.randomUUID().replace(/-/g, "").slice(0, 24)}`;
80
80
  const created = Math.floor(Date.now() / 1000);
81
- const model = oaiReq.model || "claude-opus-4";
81
+ const model = oaiReq.model || "claude-opus-4-6";
82
82
  // Optional session resumption via header
83
83
  const sessionId = req.headers["x-session-id"] || null;
84
84
  const p = getProvider();
@@ -12,42 +12,8 @@ import { execSync } from "child_process";
12
12
  import { getRegistry } from "../engine.js";
13
13
  import { listJobs, createJob, deleteJob, toggleJob, updateJob, runJobNow, formatNextRun, humanReadableSchedule } from "../services/cron.js";
14
14
  import { storePassword, revokePassword, getSudoStatus, verifyPassword, sudoExec, requestAdminViaDialog, openSystemSettings } from "../services/sudo.js";
15
- import { ENV_FILE, CUSTOM_MODELS as CUSTOM_MODELS_FILE, BOT_ROOT, WHATSAPP_AUTH } from "../paths.js";
16
- import { writeSecure } from "../services/file-permissions.js";
17
- // ── Env Helpers ─────────────────────────────────────────
18
- function readEnv() {
19
- if (!fs.existsSync(ENV_FILE))
20
- return {};
21
- const lines = fs.readFileSync(ENV_FILE, "utf-8").split("\n");
22
- const env = {};
23
- for (const line of lines) {
24
- if (line.startsWith("#") || !line.includes("="))
25
- continue;
26
- const idx = line.indexOf("=");
27
- env[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
28
- }
29
- return env;
30
- }
31
- function writeEnvVar(key, value) {
32
- let content = fs.existsSync(ENV_FILE) ? fs.readFileSync(ENV_FILE, "utf-8") : "";
33
- const regex = new RegExp(`^${key}=.*$`, "m");
34
- if (regex.test(content)) {
35
- content = content.replace(regex, `${key}=${value}`);
36
- }
37
- else {
38
- content = content.trimEnd() + `\n${key}=${value}\n`;
39
- }
40
- // v4.12.2 — .env contains all secrets (bot tokens, API keys). Enforce
41
- // 0o600 so other users on the machine can't read it.
42
- writeSecure(ENV_FILE, content);
43
- }
44
- function removeEnvVar(key) {
45
- if (!fs.existsSync(ENV_FILE))
46
- return;
47
- let content = fs.readFileSync(ENV_FILE, "utf-8");
48
- content = content.replace(new RegExp(`^${key}=.*\n?`, "m"), "");
49
- writeSecure(ENV_FILE, content);
50
- }
15
+ import { CUSTOM_MODELS as CUSTOM_MODELS_FILE, BOT_ROOT, WHATSAPP_AUTH } from "../paths.js";
16
+ import { readEnv, writeEnvVar, removeEnvVar } from "../services/env-file.js";
51
17
  function loadCustomModels() {
52
18
  try {
53
19
  return JSON.parse(fs.readFileSync(CUSTOM_MODELS_FILE, "utf-8"));
@@ -180,9 +146,9 @@ const PROVIDERS = [
180
146
  description: "Claude Opus, Sonnet, Haiku directly via API key. OpenAI-compatible.",
181
147
  envKey: "ANTHROPIC_API_KEY",
182
148
  models: [
183
- { key: "claude-opus", name: "Claude Opus 4", model: "claude-opus-4-6" },
184
- { key: "claude-sonnet", name: "Claude Sonnet 4", model: "claude-sonnet-4-20250514" },
185
- { key: "claude-haiku", name: "Claude 3.5 Haiku", model: "claude-3-5-haiku-20241022" },
149
+ { key: "claude-opus", name: "Claude Opus 4.6", model: "claude-opus-4-6" },
150
+ { key: "claude-sonnet", name: "Claude Sonnet 4.6", model: "claude-sonnet-4-6" },
151
+ { key: "claude-haiku", name: "Claude Haiku 4.5", model: "claude-haiku-4-5" },
186
152
  ],
187
153
  signupUrl: "https://console.anthropic.com/settings/keys",
188
154
  docsUrl: "https://docs.anthropic.com/en/api",
package/docs/security.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # Alvin Bot — Security Threat Model & Hardening Guide
2
2
 
3
- > **Last updated:** 2026-04-15 (v4.12.2)
3
+ > **Last updated:** 2026-04-16 (v4.14.2)
4
4
  > **Audience:** Operators installing Alvin Bot on their own machine.
5
5
  > **Short version:** Alvin Bot is a full AI agent with shell, filesystem, and network access on the machine it runs on. Treat it like you would `sudo` access. Only install on machines where you would trust Claude Code to run without supervision.
6
6
 
@@ -270,6 +270,20 @@ If you suspect the bot has been compromised or exfiltrated secrets:
270
270
 
271
271
  ## Version history
272
272
 
273
+ - **v4.14.2** (2026-04-16) — Watcher zombie guard: missing outputFile > 10 min (env-configurable) delivers as failed instead of 12h timeout. Prevents stuck pending entries when a dispatched `claude -p` subprocess crashes before writing output or the file gets removed externally. No new attack surface.
274
+
275
+ - **v4.14.1** (2026-04-16) — `/subagents list` unified view: merges v4.0.0 bot-level `activeAgents` registry with v4.13+ `async-agent-watcher` pending registry. Cosmetic/diagnostic only, no security implications.
276
+
277
+ - **v4.14.0** (2026-04-16) — Sub-agent dispatch on Slack / Discord / WhatsApp via the `alvin_dispatch_agent` MCP tool. New `delivery-registry` module routes sub-agent deliveries to the right platform adapter. Types widened (`chatId: number | string`, `platform?: ...`). Telegram path bit-for-bit unchanged. Trust boundary expanded: each non-Telegram platform adapter now has `sendText` access to its respective channel — same trust level as the main adapter's `sendText`, no new capabilities.
278
+
279
+ - **v4.13.2** (2026-04-16) — Slack `/alvin` slash command via Bolt `app.command()` handler. Requires the `commands` OAuth scope on the Slack app. Subcommand parsing is case-insensitive on the command word, preserves args verbatim. Ack within 3 seconds; response via `chat.postMessage` (persistent, channel-visible). No new network surface.
280
+
281
+ - **v4.13.1** (2026-04-16) — Slack Test Connection endpoint validated via `auth.test` (cheap, no ambient state change). Maintenance UI (`/api/pm2/*` routes, kept for compat) now auto-detects launchd / PM2 / standalone via new `process-manager` abstraction. No new external attack surface.
282
+
283
+ - **v4.13.0** (2026-04-16) — **Architectural**: `alvin_dispatch_agent` MCP tool spawns truly detached `claude -p` subprocesses via `child_process.spawn({ detached: true, ..., unref() })`. The subprocess inherits current env (with `CLAUDECODE`/`CLAUDE_CODE_ENTRYPOINT` stripped to prevent nested-session errors) and writes stream-json to `~/.alvin-bot/subagents/<agentId>.jsonl`. Trust boundary: each dispatched subprocess runs with the same user privileges as the parent bot — same trust as `Bash` tool executions. The subprocess has its own separate abort lifecycle; parent abort (e.g. bypass-abort from v4.12.3) no longer cascades into killing the sub-agent, which was a legitimate concern under the old Task-tool-based flow.
284
+
285
+ - **v4.12.4** (2026-04-16) — Parser staleness detection: if outputFile hasn't been written in `ALVIN_SUBAGENT_STALENESS_MS` (default 5 min) AND has usable assistant text, deliver as "completed with partial output" instead of waiting 12h for timeout. Recovers real work from agents interrupted mid-execution. No new privileges or surface.
286
+
273
287
  - **v4.12.2** (2026-04-15) — First formal security release: file-permissions hardening, ALLOWED_USERS hard-fail, webhook timing-safe comparison, exec-guard metachar rejection, cron shell-job execGuard integration, sub-agent toolset presets (readonly, research), axios + claude-agent-sdk CVE patches. This document.
274
288
 
275
289
  - **v4.12.0 – v4.12.1** — Multi-session + Slack + task-aware stuck timer. No dedicated security content, though the v4.12.0 session-key fix closed a confused-deputy bug on Slack/WhatsApp where all channels from the same user collapsed into one session.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "alvin-bot",
3
- "version": "4.14.1",
3
+ "version": "4.15.0",
4
4
  "description": "Alvin Bot \u2014 Your personal AI agent on Telegram, WhatsApp, Discord, Signal, and Web.",
5
5
  "type": "module",
6
6
  "main": "dist/index.js",
@@ -0,0 +1,252 @@
1
+ /**
2
+ * v4.14.2 — zombie-entry fix for async-agent-watcher.
3
+ *
4
+ * Problem: when the dispatched `claude -p` subprocess never produces
5
+ * its outputFile (crashed before the first write, spawn failed, file
6
+ * got deleted externally), `parseOutputFileStatus` returns "missing"
7
+ * on every poll. The watcher keeps polling forever until `giveUpAt`
8
+ * (12 hours) fires, then delivers a timeout banner. Meanwhile the
9
+ * entry hangs in `/subagents list` as a permanent "running" zombie.
10
+ *
11
+ * Fix: when status is "missing" for longer than
12
+ * `MISSING_FILE_FAILURE_MS` (default 10 min, env-configurable), the
13
+ * watcher declares the agent failed with a clear "output file never
14
+ * appeared" reason, delivers the failure banner, and removes the
15
+ * entry. 10 minutes is well above normal startup variance (seconds)
16
+ * and well below the 12h hard ceiling.
17
+ *
18
+ * Invariants preserved:
19
+ * - An agent whose output file DOES appear, even slowly, continues
20
+ * normally (missing on first poll, running on second, completed
21
+ * on third — same as v4.14.1).
22
+ * - The `completed` path (end_turn or stream-json result) is
23
+ * unchanged.
24
+ * - The `failed` path (existing "error" state from parser) is
25
+ * unchanged.
26
+ * - The 12h giveUpAt ceiling still applies — it's now just less
27
+ * likely to be hit because missing-file zombies resolve earlier.
28
+ */
29
+ import { describe, it, expect, beforeEach, afterEach, vi } from "vitest";
30
+ import fs from "fs";
31
+ import os from "os";
32
+ import { resolve } from "path";
33
+
34
+ const TEST_DATA_DIR = resolve(
35
+ os.tmpdir(),
36
+ `alvin-zombie-${process.pid}-${Date.now()}`,
37
+ );
38
+
39
+ interface Delivered {
40
+ info: { name: string; status: string };
41
+ result: { status: string; output: string; error?: string };
42
+ }
43
+ let delivered: Delivered[] = [];
44
+
45
+ beforeEach(async () => {
46
+ if (fs.existsSync(TEST_DATA_DIR)) {
47
+ fs.rmSync(TEST_DATA_DIR, { recursive: true, force: true });
48
+ }
49
+ fs.mkdirSync(TEST_DATA_DIR, { recursive: true });
50
+ process.env.ALVIN_DATA_DIR = TEST_DATA_DIR;
51
+ // Reset the env override between tests
52
+ delete process.env.ALVIN_MISSING_FILE_FAILURE_MS;
53
+ delivered = [];
54
+ vi.resetModules();
55
+ vi.doMock("../src/services/subagent-delivery.js", () => ({
56
+ deliverSubAgentResult: async (info: unknown, result: unknown) => {
57
+ delivered.push({
58
+ info: info as Delivered["info"],
59
+ result: result as Delivered["result"],
60
+ });
61
+ },
62
+ attachBotApi: () => {},
63
+ __setBotApiForTest: () => {},
64
+ }));
65
+ });
66
+
67
+ afterEach(async () => {
68
+ try {
69
+ const mod = await import("../src/services/async-agent-watcher.js");
70
+ mod.stopWatcher();
71
+ mod.__resetForTest();
72
+ } catch {}
73
+ delete process.env.ALVIN_MISSING_FILE_FAILURE_MS;
74
+ });
75
+
76
+ describe("watcher zombie fix (v4.14.2)", () => {
77
+ it("missing file younger than threshold stays pending (no premature fail)", async () => {
78
+ // Threshold = 10 min. Backdate only 2 min. Expect: still pending.
79
+ const mod = await import("../src/services/async-agent-watcher.js");
80
+ mod.registerPendingAgent({
81
+ agentId: "young-zombie",
82
+ outputFile: `${TEST_DATA_DIR}/nonexistent.jsonl`,
83
+ description: "young",
84
+ prompt: "p",
85
+ chatId: 1,
86
+ userId: 1,
87
+ toolUseId: null,
88
+ });
89
+ // Forcibly set startedAt to 2 min ago
90
+ const pending = mod.listPendingAgents();
91
+ expect(pending).toHaveLength(1);
92
+ (pending[0] as { startedAt: number }).startedAt = Date.now() - 2 * 60_000;
93
+
94
+ await mod.pollOnce();
95
+
96
+ expect(delivered).toHaveLength(0);
97
+ expect(mod.listPendingAgents()).toHaveLength(1);
98
+ });
99
+
100
+ it("missing file older than threshold delivers failed + removes from pending", async () => {
101
+ process.env.ALVIN_MISSING_FILE_FAILURE_MS = "120000"; // 2 min for test
102
+ const mod = await import("../src/services/async-agent-watcher.js");
103
+ mod.registerPendingAgent({
104
+ agentId: "old-zombie",
105
+ outputFile: `${TEST_DATA_DIR}/never-appears.jsonl`,
106
+ description: "stuck crash zombie",
107
+ prompt: "p",
108
+ chatId: 1,
109
+ userId: 1,
110
+ toolUseId: null,
111
+ });
112
+ // Backdate 5 min (> 2 min threshold)
113
+ const pending = mod.listPendingAgents();
114
+ (pending[0] as { startedAt: number }).startedAt = Date.now() - 5 * 60_000;
115
+
116
+ await mod.pollOnce();
117
+
118
+ expect(delivered).toHaveLength(1);
119
+ expect(delivered[0].result.status).toBe("error");
120
+ // Error message should be explicit so user understands
121
+ expect(delivered[0].result.error).toMatch(/output file|never appeared|never wrote/i);
122
+ expect(mod.listPendingAgents()).toHaveLength(0);
123
+ });
124
+
125
+ it("default threshold is 10 min when env var is not set", async () => {
126
+ const mod = await import("../src/services/async-agent-watcher.js");
127
+ mod.registerPendingAgent({
128
+ agentId: "at-default",
129
+ outputFile: `${TEST_DATA_DIR}/z.jsonl`,
130
+ description: "default threshold",
131
+ prompt: "p",
132
+ chatId: 1,
133
+ userId: 1,
134
+ toolUseId: null,
135
+ });
136
+ // Backdate 9 min — still under the 10-min default, should stay pending
137
+ let p = mod.listPendingAgents();
138
+ (p[0] as { startedAt: number }).startedAt = Date.now() - 9 * 60_000;
139
+ await mod.pollOnce();
140
+ expect(delivered).toHaveLength(0);
141
+ expect(mod.listPendingAgents()).toHaveLength(1);
142
+
143
+ // Backdate to 11 min — over threshold, should fire
144
+ p = mod.listPendingAgents();
145
+ (p[0] as { startedAt: number }).startedAt = Date.now() - 11 * 60_000;
146
+ await mod.pollOnce();
147
+ expect(delivered).toHaveLength(1);
148
+ });
149
+
150
+ it("running file (has content, no end_turn) is unaffected by zombie check", async () => {
151
+ // A file WITH content should never trigger the missing-file path
152
+ // regardless of age.
153
+ const outPath = `${TEST_DATA_DIR}/running.jsonl`;
154
+ fs.writeFileSync(
155
+ outPath,
156
+ JSON.stringify({
157
+ type: "assistant",
158
+ isSidechain: true,
159
+ agentId: "x",
160
+ message: {
161
+ role: "assistant",
162
+ content: [{ type: "tool_use", name: "Bash", input: {} }],
163
+ stop_reason: "tool_use",
164
+ },
165
+ }) + "\n",
166
+ "utf-8",
167
+ );
168
+ const mod = await import("../src/services/async-agent-watcher.js");
169
+ mod.registerPendingAgent({
170
+ agentId: "active-work",
171
+ outputFile: outPath,
172
+ description: "legitimately running",
173
+ prompt: "p",
174
+ chatId: 1,
175
+ userId: 1,
176
+ toolUseId: null,
177
+ });
178
+ const p = mod.listPendingAgents();
179
+ (p[0] as { startedAt: number }).startedAt = Date.now() - 30 * 60_000; // 30 min old
180
+
181
+ await mod.pollOnce();
182
+
183
+ // v4.12.4 staleness detection COULD fire here because the file has
184
+ // text content and is stale. That's a different (benign) path — the
185
+ // agent gets delivered as "completed with partial output". Either
186
+ // way, the zombie-fix error path must NOT fire.
187
+ const anyZombieError = delivered.some(
188
+ (d) => d.result.error && /output file never/i.test(d.result.error),
189
+ );
190
+ expect(anyZombieError).toBe(false);
191
+ });
192
+
193
+ it("completed file delivers as completed (unchanged)", async () => {
194
+ const outPath = `${TEST_DATA_DIR}/done.jsonl`;
195
+ fs.writeFileSync(
196
+ outPath,
197
+ JSON.stringify({
198
+ type: "assistant",
199
+ agentId: "x",
200
+ message: {
201
+ content: [{ type: "text", text: "all good" }],
202
+ stop_reason: "end_turn",
203
+ },
204
+ }) + "\n",
205
+ "utf-8",
206
+ );
207
+ const mod = await import("../src/services/async-agent-watcher.js");
208
+ mod.registerPendingAgent({
209
+ agentId: "done-agent",
210
+ outputFile: outPath,
211
+ description: "clean completion",
212
+ prompt: "p",
213
+ chatId: 1,
214
+ userId: 1,
215
+ toolUseId: null,
216
+ });
217
+ // Backdate 1h — would trigger zombie if misapplied
218
+ const p = mod.listPendingAgents();
219
+ (p[0] as { startedAt: number }).startedAt = Date.now() - 60 * 60_000;
220
+
221
+ await mod.pollOnce();
222
+
223
+ expect(delivered).toHaveLength(1);
224
+ expect(delivered[0].result.status).toBe("completed");
225
+ });
226
+
227
+ it("decrements session counter on zombie failure delivery", async () => {
228
+ process.env.ALVIN_MISSING_FILE_FAILURE_MS = "1000"; // 1 sec for fast test
229
+ const sessionMod = await import("../src/services/session.js");
230
+ const session = sessionMod.getSession("zombie-session");
231
+ session.pendingBackgroundCount = 1;
232
+
233
+ const mod = await import("../src/services/async-agent-watcher.js");
234
+ mod.registerPendingAgent({
235
+ agentId: "session-zombie",
236
+ outputFile: `${TEST_DATA_DIR}/gone.jsonl`,
237
+ description: "zombie for counter",
238
+ prompt: "p",
239
+ chatId: 1,
240
+ userId: 1,
241
+ toolUseId: null,
242
+ sessionKey: "zombie-session",
243
+ });
244
+ const p = mod.listPendingAgents();
245
+ (p[0] as { startedAt: number }).startedAt = Date.now() - 5000; // 5 sec ago, > 1sec threshold
246
+
247
+ await mod.pollOnce();
248
+
249
+ expect(delivered).toHaveLength(1);
250
+ expect(session.pendingBackgroundCount).toBe(0);
251
+ });
252
+ });