npm - alvin-bot - Versions diffs - 4.14.1 → 4.15.0 - Mend

alvin-bot 4.14.1 → 4.15.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (16) hide show

package/CHANGELOG.md +113 -0
package/README.md +9 -4
package/dist/handlers/commands.js +10 -0
package/dist/handlers/message.js +3 -0
package/dist/handlers/platform-message.js +3 -0
package/dist/providers/claude-sdk-provider.js +21 -2
package/dist/providers/registry.js +28 -3
package/dist/providers/types.js +4 -4
package/dist/services/async-agent-watcher.js +35 -1
package/dist/services/env-file.js +46 -0
package/dist/services/workspaces.js +2 -0
package/dist/web/openai-compat.js +1 -1
package/dist/web/setup-api.js +5 -39
package/docs/security.md +15 -1
package/package.json +1 -1
package/test/watcher-zombie-fix.test.ts +252 -0

package/CHANGELOG.md CHANGED Viewed

@@ -2,6 +2,119 @@
 All notable changes to Alvin Bot are documented here.
+## [4.15.0] — 2026-04-16
+### ✨ Feature: auto-latest Claude model selection + per-workspace overrides
+Alvin now picks up new Claude models (e.g. Opus 4.7 on Max subscription) automatically, and users can switch between Opus / Sonnet / Haiku tiers directly from Telegram — or pin a specific tier per workspace.
+#### What's new
+**`/model` now lists four Claude entries** (plus any configured custom providers + Ollama):
+- `Claude (Agent SDK)` — CLI default (= whatever Anthropic ships as current, currently Opus 4.7)
+- `Claude Opus (auto-latest)` — forwards `model: "opus"` to the Agent SDK → latest Opus tier
+- `Claude Sonnet (auto-latest)` — same pattern with Sonnet
+- `Claude Haiku (auto-latest)` — same pattern with Haiku
+The three aliased entries all route through `ClaudeSDKProvider` with different `model:` values. Switching persists to `~/.alvin-bot/.env` (`PRIMARY_PROVIDER=…`), so the choice survives bot restarts.
+**Workspaces can pin a model** via an optional YAML frontmatter field:
+```yaml
+---
+purpose: Interview prep
+cwd: ~/Documents/Interviews
+model: sonnet           # opus | sonnet | haiku | claude-opus-4-7 | ...
+---
+```
+When `model:` is omitted (the default for all existing workspaces), the globally active `/model` choice is used — no behaviour change.
+**Fallback on rate limits:** the Agent SDK is now always called with `fallbackModel: "haiku"`. Keeps the bot responsive when the primary tier is throttled.
+#### Why this matters
+Before v4.15, `claude-opus-4-6` was hardcoded in six places. When Anthropic released Opus 4.7 on the Max plan, the CLI picked it up automatically — but Alvin's `/status` still claimed `claude-opus-4-6`, and there was no way to force a specific tier from Telegram. The Agent SDK's `query()` call wasn't even receiving a `model:` parameter, so whatever lived in `config.model` was dead metadata.
+Now:
+- The default `"inherit"` means "don't pass model: — let the CLI pick its current default." Fresh installs on Max plans get Opus 4.7 automatically.
+- Aliases (`opus` / `sonnet` / `haiku`) resolve to the latest tier each release cycle without any code change.
+- Pinning a specific ID (e.g. `claude-opus-4-7`) is supported for reproducibility.
+#### Implementation
+- `src/providers/claude-sdk-provider.ts` — forwards `model:` and sets `fallbackModel: "haiku"` on every `query()` call. Resolution order: per-query `options.model` → provider `this.config.model` → `"inherit"` (= no model passed).
+- `src/providers/registry.ts` — registers three virtual entries (`claude-opus`, `claude-sonnet`, `claude-haiku`) as additional keys all backed by `ClaudeSDKProvider` with different `model:` values.
+- `src/services/env-file.ts` — new module extracting the `readEnv` / `writeEnvVar` / `removeEnvVar` helpers from `setup-api.ts` so Telegram command handlers can persist runtime choices.
+- `src/handlers/commands.ts` — `switchProviderWithLifecycle` now calls `writeEnvVar("PRIMARY_PROVIDER", targetKey)` on every switch, not just Web UI changes.
+- `src/services/workspaces.ts` — `Workspace` type gets optional `model?: string`, the YAML parser picks it up from frontmatter.
+- `src/providers/types.ts` — `QueryOptions` gets optional `model?: string` for per-query overrides.
+- `src/handlers/message.ts` + `src/handlers/platform-message.ts` — both forward `workspace.model` into `queryOpts` when the active workspace has one defined.
+#### Backward compatibility
+- Default provider config is `"inherit"` — identical to pre-v4.15 behaviour (no `model:` passed to the Agent SDK, CLI default wins).
+- Workspaces without a `model:` field behave exactly as before.
+- Stale presets `claude-sonnet-4-20250514` → `claude-sonnet-4-6` and `claude-3-5-haiku-20241022` → `claude-haiku-4-5` updated (previously unused — only affected the REST-API code paths, which nobody referenced).
+#### Docs
+Workspace guides updated (`docs/install/workspaces-de.html` + `workspaces-en.html`) — the YAML-field reference table now documents the new optional `model:` entry.
+### 🐛 Bonus: stale model-ID cleanup
+Four hardcoded Claude model IDs replaced with current strings: `claude-sonnet-4-20250514` → `claude-sonnet-4-6`, `claude-3-5-haiku-20241022` → `claude-haiku-4-5`, openai-compat fallback `claude-opus-4` → `claude-opus-4-6`, setup-API defaults likewise. None of these were on active code paths, but they would have shipped confusing display names if anyone had referenced them.
+### Commits
+- `fed4b91` — feat(providers): v4.15 — auto-latest Claude model selection via /model
+- `b2a6e1f` — feat(workspaces): v4.15 — optional per-workspace model override
+---
+## [4.14.2] — 2026-04-16
+### 🐛 Patch: watcher zombie-entry fix (missing outputFile > 10 min = failed)
+**Edge case Ali caught today:** a pending async-agent entry stuck in `/subagents list` for 3+ hours showing "running" — but the underlying `alvin_dispatch_agent` subprocess had already died (its output file was gone). The entry would have continued haunting the list until the 12-hour `giveUpAt` ceiling fired.
+**Root cause:** `async-agent-watcher`'s `pollOnce` handled four states from `parseOutputFileStatus` — `completed` / `failed` / `running` / `missing`. For `missing` (file doesn't exist or is empty), the watcher just kept polling forever, on the assumption that a slow subprocess might eventually write. If the subprocess crashed before writing ANY output, the file never appeared, and we polled for 12 hours before timing out.
+**Fix:** when `status.state === "missing"` AND `now - entry.startedAt > MISSING_FILE_FAILURE_MS` (default 10 min, configurable via `ALVIN_MISSING_FILE_FAILURE_MS` env var), deliver as failed with an explicit message:
+> *Dispatched subprocess never wrote its output file (N m after start). Likely crashed before initializing, or the file was removed externally.*
+10 minutes is well above any legitimate `claude -p` startup variance (normal first-write latency is seconds) and well below the 12-hour hard ceiling.
+### What's preserved (regression-guard tested)
+- Running agents (file has content but no `end_turn`/`result` yet) are untouched by this path — they still keep polling as before.
+- Completed agents (clean `end_turn` or `stream-json result` event) still deliver normally.
+- Explicit `failed` state from the parser (if ever used) still delivers error normally.
+- v4.12.4's "file is stale but has text → deliver partial" path takes precedence over the new zombie check (the file has content, so not "missing").
+- 12-hour `giveUpAt` hard ceiling still applies as the ultimate safety net.
+- Session's `pendingBackgroundCount` decrement fires on zombie failure, same as every other delivery path.
+### Testing
+- **Baseline**: 498 tests (v4.14.1)
+- **New**: `test/watcher-zombie-fix.test.ts` — 6 tests:
+  - Young missing file (<threshold) stays pending
+  - Old missing file (>threshold) delivers failed + removes from pending
+  - Default threshold is 10 min when env var unset
+  - Running file (has content) is unaffected by zombie check
+  - Completed file delivers as completed (regression guard)
+  - Session's `pendingBackgroundCount` decrements on zombie delivery
+- **Total**: 504 tests, all green, TSC clean
+### Files changed
+- **Modified**: `src/services/async-agent-watcher.ts` (new `getMissingFileFailureMs()` + zombie branch in `pollOnce`)
+- **NEW tests**: `test/watcher-zombie-fix.test.ts`
+- **Version**: `package.json` 4.14.1 → 4.14.2
+---
 ## [4.14.1] — 2026-04-16
 ### 🐛 Patch: `/subagents list` now shows v4.13+ dispatch agents too

package/README.md CHANGED Viewed

@@ -64,7 +64,7 @@ Alvin Bot is an open-source, self-hosted AI agent that lives where you chat. Bui
 - **Adjustable Thinking** — From quick answers (`/effort low`) to deep analysis (`/effort max`)
 - **Persistent Memory** — Remembers across sessions via vector-indexed knowledge base; session state (Claude SDK resume tokens, conversation history, language, effort) survives bot restarts (v4.11.0)
 - **Multi-Session Workspaces** — Run multiple parallel, context-isolated sessions on the same bot — one per Slack channel or per Telegram `/workspace` — each with its own working directory, purpose, and persona. Memory, skills, and sub-agents stay globally shared (v4.12.0). [How-to ↓](#-multi-session-workspaces-v4120)
-- **Background Sub-Agents** — Claude autonomously uses `run_in_background: true` for long audits/research; main session stays responsive, results deliver as separate messages (v4.10.0)
+- **Truly Detached Sub-Agents** — Claude dispatches long-running research/audit tasks via the `alvin_dispatch_agent` MCP tool, which spawns independent `claude -p` subprocesses with their own PID + process group. Main session stays fully responsive, user can interrupt freely without killing sub-agents. Results deliver as separate messages. Works identically on Telegram, Slack, Discord, and WhatsApp (v4.13.0+ dispatch, v4.14.0 multi-platform)
 - **Smart Tool Discovery** — Scans your system at startup, knows exactly what CLI tools, plugins, and APIs are available
 - **Skill System** — 12 built-in SKILL.md files (code, data analysis, email, docs, research, sysadmin, browse, etc.) auto-activate based on message context
 - **Self-Awareness** — Knows it IS the AI model — won't call external APIs for tasks it can do itself
@@ -406,7 +406,7 @@ curl -s http://localhost:3100/api/workspaces | jq
 ### Architecture guarantees
 - **Memory is global.** Facts Alvin learns in `#alev-b` are visible in `#homes` via the shared `MEMORY.md` and embeddings index. Per-workspace memory layer is on the v4.13 roadmap.
-- **Sub-agents are per-session.** Each workspace can spawn its own `run_in_background` agents — results come back to the same channel automatically (v4.10.0).
+- **Sub-agents are per-session.** Each workspace can dispatch its own detached sub-agents via `alvin_dispatch_agent` — results come back to the originating channel on any platform (Telegram, Slack, Discord, WhatsApp), visible in `/subagents list` (v4.13.0+ dispatch, v4.14.0 cross-platform, v4.14.1 unified list view).
 - **Session state survives restart.** Claude SDK `resume` tokens, conversation history, language, effort, and `workspaceName` all persist via `session-persistence.ts` (v4.11.0).
 - **Backwards compatible.** If you don't create any workspace files, everything behaves exactly like v4.11. Upgrade is a no-op.
@@ -650,14 +650,19 @@ alvin-bot version   # Show version
   - [x] Telegram `/workspace` + `/workspaces` commands (feature parity)
   - [x] Per-workspace cost aggregation + Web UI workspace cards
   - [x] Slack setup guide + copy-paste app manifest (in GitHub Release assets)
-- [ ] **Phase 17** — Memory + Workspace polish (v4.13.0+)
+- [x] **Phase 17** — Truly detached sub-agents + multi-platform dispatch (v4.13.0 – v4.14.2, 2026-04-16)
+  - [x] `alvin_dispatch_agent` MCP tool — spawns independent `claude -p` subprocesses that survive parent aborts (v4.13.0)
+  - [x] Slack `/alvin` slash command (namespaced parent with subcommands: status / new / effort / help + LLM fallthrough) (v4.13.2)
+  - [x] Sub-agent dispatch on Slack, Discord, WhatsApp via platform-aware delivery registry (v4.14.0)
+  - [x] `/subagents list` merged view — v4.0.0 bot-level agents + v4.13+ detached dispatches in one list (v4.14.1)
+  - [x] Watcher zombie guard — missing outputFile > 10 min delivers as failed instead of 12h timeout (v4.14.2)
+  - [x] Staleness-based partial output recovery for interrupted sub-agents (v4.12.4)
   - [ ] SQLite migration of the embeddings index (currently 128 MB JSON)
   - [ ] Per-workspace memory layer (additive over global) — facts learned in `#alev-b` stay in `alev-b` unless explicitly promoted to global
   - [ ] Per-workspace provider override (`provider:` in frontmatter) — e.g. Alev-B uses Claude Opus, JobSnack uses cheap Gemini
   - [ ] Per-workspace skill allowlist — scope Apple Notes to personal workspace, sysadmin only to devops workspace, etc.
   - [ ] Multi-User Slack (real `per-channel-peer` mode) — different users in the same Slack channel get their own sub-sessions
   - [ ] Workspace cloning / templates — `/workspace clone alev-b as homes-dev` spins up a new workspace from an existing one
-  - [ ] Slack slash commands (`/alvin workspace`, `/alvin status`, `/alvin new`) — native Slack command integration via Bolt
   - [ ] Daily log decay / archive — older daily logs move to cold storage after N days
 - [ ] **Phase 18** — Security + Platform hardening (from v4.12.1 audit, prioritized)
   - [ ] **P1 — Electron major upgrade** (35 → 41+) — fixes 1 HIGH + 5 MODERATE Electron CVEs in the Desktop-Build path. Major version jump, requires full rebuild + test of `.dmg` flow. Separate release (likely bundled with Windows `.exe` work).

package/dist/handlers/commands.js CHANGED Viewed

@@ -16,6 +16,7 @@ import { getLoadedPlugins, getPluginsDir } from "../services/plugins.js";
 import { getMCPStatus, getMCPTools, callMCPTool } from "../services/mcp.js";
 import { listCustomTools, executeCustomTool } from "../services/custom-tools.js";
 import { screenshotUrl, extractText, generatePdf, hasPlaywright } from "../services/browser.js";
+import { writeEnvVar } from "../services/env-file.js";
 import { listJobs, createJob, deleteJob, toggleJob, runJobNow, formatNextRun, humanReadableSchedule } from "../services/cron.js";
 import { resolveJobByNameOrId } from "../services/cron-resolver.js";
 import { buildTickerText, buildDoneText, escapeMarkdown } from "./cron-progress.js";
@@ -518,6 +519,15 @@ export function registerCommands(bot) {
         if (!registry.switchTo(targetKey)) {
             return { ok: false, error: "switch rejected by registry" };
         }
+        // v4.15 — Persist the switch to ~/.alvin-bot/.env so the choice
+        // survives bot restarts. In-memory switchTo() alone would revert to
+        // PRIMARY_PROVIDER on next boot.
+        try {
+            writeEnvVar("PRIMARY_PROVIDER", targetKey);
+        }
+        catch (err) {
+            console.warn("⚠️ Failed to persist PRIMARY_PROVIDER:", err);
+        }
         // Tear down the previous provider's lifecycle (if any) after the switch.
         // ensureStopped() internally checks isBotManaged — no-op for externally
         // managed daemons.

package/dist/handlers/message.js CHANGED Viewed

@@ -386,6 +386,9 @@ export async function handleMessage(ctx) {
             systemPrompt,
             workingDir: session.workingDir,
             effort: session.effort,
+            // v4.15 — Per-workspace model override (optional YAML `model:` field).
+            // When unset, falls through to the globally active provider's model.
+            ...(workspace.model ? { model: workspace.model } : {}),
             abortSignal: session.abortController.signal,
             // User's UI locale — registry uses it to localize failure messages.
             locale: session.language,

package/dist/handlers/platform-message.js CHANGED Viewed

@@ -169,6 +169,9 @@ export async function handlePlatformMessage(msg, adapter) {
             systemPrompt,
             workingDir: session.workingDir,
             effort: session.effort,
+            // v4.15 — Per-workspace model override (optional YAML `model:` field).
+            // When unset, falls through to the globally active provider's model.
+            ...(workspace.model ? { model: workspace.model } : {}),
             sessionId: isSDK ? session.sessionId : null,
             history: !isSDK ? session.history : undefined,
             // v4.14 — Expose alvin_dispatch_agent MCP tool on non-Telegram

package/dist/providers/claude-sdk-provider.js CHANGED Viewed

@@ -49,7 +49,10 @@ export class ClaudeSDKProvider {
         this.config = {
             type: "claude-sdk",
             name: "Claude (Agent SDK)",
-            model: "claude-opus-4-6",
+            // "inherit" = don't pass model: to the SDK → Claude CLI default wins
+            // (currently Opus 4.7 on Max subscription). Override with an alias
+            // ("opus" | "sonnet" | "haiku") or a full ID ("claude-opus-4-7").
+            model: "inherit",
             supportsTools: true,
             supportsVision: true,
             supportsStreaming: true,
@@ -123,6 +126,13 @@ export class ClaudeSDKProvider {
             if (options.alvinDispatchContext) {
                 defaultAllowed.push("mcp__alvin__dispatch_agent");
             }
+            // v4.15 — Forward model selection to the Agent SDK. Resolution order:
+            //   1. options.model (per-query override — e.g. workspace `model:` field)
+            //   2. this.config.model (provider-level default — e.g. claude-sonnet)
+            //   3. "inherit" → don't pass model: → Claude CLI default (Opus 4.7 on Max)
+            // Aliases "opus" | "sonnet" | "haiku" auto-resolve to the latest tier.
+            const rawModel = options.model ?? this.config.model;
+            const modelOverride = rawModel && rawModel !== "inherit" ? rawModel : undefined;
             const q = query({
                 prompt,
                 options: {
@@ -145,6 +155,12 @@ export class ClaudeSDKProvider {
                     effort: (options.effort || "medium"),
                     maxTurns: 50,
                     betas: ["context-1m-2025-08-07"],
+                    ...(modelOverride ? { model: modelOverride } : {}),
+                    // Always prefer Haiku as fallback on rate-limit/overload — cheap
+                    // and fast, keeps the bot responsive when the primary tier is
+                    // throttled. Users can disable this by setting model: "inherit"
+                    // and relying purely on CLI defaults.
+                    fallbackModel: "haiku",
                 },
             });
             let accumulatedText = "";
@@ -370,9 +386,12 @@ export class ClaudeSDKProvider {
         }
     }
     getInfo() {
+        const model = this.config.model === "inherit"
+            ? "CLI default (latest)"
+            : this.config.model;
         return {
             name: this.config.name,
-            model: this.config.model,
+            model,
             status: "✅ Agent SDK (CLI auth)",
         };
     }

package/dist/providers/registry.js CHANGED Viewed

@@ -271,13 +271,38 @@ export function createRegistry(config) {
             model: "gpt-5.4",
         };
     }
-    // Always register Claude SDK if it's referenced
-    if (config.primary === "claude-sdk" || config.fallbacks?.includes("claude-sdk")) {
+    // Claude (Agent SDK) — the base provider plus three tier-aliased virtual
+    // entries. All four route through the same ClaudeSDKProvider implementation
+    // but pass a different `model:` to the Agent SDK at query time. The aliases
+    // ("opus" | "sonnet" | "haiku") auto-resolve to the latest tier on the
+    // Claude CLI — no hardcoded version IDs, no manual updates when Anthropic
+    // releases a new model.
+    const claudeKeys = ["claude-sdk", "claude-opus", "claude-sonnet", "claude-haiku"];
+    const claudeReferenced = claudeKeys.some((k) => config.primary === k || config.fallbacks?.includes(k));
+    if (claudeReferenced) {
         providers["claude-sdk"] = {
             ...PROVIDER_PRESETS["claude-sdk"],
             type: "claude-sdk",
             name: "Claude (Agent SDK)",
-            model: "claude-opus-4-6",
+            model: "inherit", // CLI default → currently Opus 4.7 on Max plan
+        };
+        providers["claude-opus"] = {
+            ...PROVIDER_PRESETS["claude-sdk"],
+            type: "claude-sdk",
+            name: "Claude Opus (auto-latest)",
+            model: "opus",
+        };
+        providers["claude-sonnet"] = {
+            ...PROVIDER_PRESETS["claude-sdk"],
+            type: "claude-sdk",
+            name: "Claude Sonnet (auto-latest)",
+            model: "sonnet",
+        };
+        providers["claude-haiku"] = {
+            ...PROVIDER_PRESETS["claude-sdk"],
+            type: "claude-sdk",
+            name: "Claude Haiku (auto-latest)",
+            model: "haiku",
         };
     }
     // Register Google Gemini only if explicitly referenced as primary/fallback

package/dist/providers/types.js CHANGED Viewed

@@ -38,8 +38,8 @@ export const PROVIDER_PRESETS = {
     },
     "claude-sonnet": {
         type: "openai-compatible",
-        name: "Claude Sonnet 4",
-        model: "claude-sonnet-4-20250514",
+        name: "Claude Sonnet 4.6",
+        model: "claude-sonnet-4-6",
         baseUrl: "https://api.anthropic.com/v1/",
         supportsVision: true,
         supportsStreaming: true,
@@ -48,8 +48,8 @@ export const PROVIDER_PRESETS = {
     },
     "claude-haiku": {
         type: "openai-compatible",
-        name: "Claude 3.5 Haiku",
-        model: "claude-3-5-haiku-20241022",
+        name: "Claude Haiku 4.5",
+        model: "claude-haiku-4-5",
         baseUrl: "https://api.anthropic.com/v1/",
         supportsVision: true,
         supportsStreaming: true,

package/dist/services/async-agent-watcher.js CHANGED Viewed

@@ -33,6 +33,31 @@ const POLL_INTERVAL_MS = 15_000;
  *  a timeout banner. SEO audits historically take ~13 min, so 12h
  *  is absurdly generous and protects against state-file growth. */
 const MAX_AGENT_AGE_MS = 12 * 60 * 60 * 1000;
+/**
+ * v4.14.2 — When a dispatched subprocess never creates its outputFile
+ * (spawn failure, crash before first write, file deleted externally),
+ * `parseOutputFileStatus` returns "missing" on every poll. Pre-v4.14.2
+ * that meant waiting the full 12h MAX_AGENT_AGE_MS before delivering a
+ * timeout — a 12-hour zombie in `/subagents list`.
+ *
+ * This threshold caps how long we tolerate a missing file before
+ * declaring the agent failed. `claude -p` normally writes its first
+ * JSONL line within seconds of spawn; 10 minutes is way above any
+ * legitimate startup variance and well below the 12h ceiling.
+ *
+ * Configurable via the ALVIN_MISSING_FILE_FAILURE_MS env var. Tests
+ * use shorter values via the same hook. Only the getter is exposed
+ * so callers always see the current env value, not a stale constant.
+ */
+function getMissingFileFailureMs() {
+    const raw = process.env.ALVIN_MISSING_FILE_FAILURE_MS;
+    if (raw) {
+        const n = Number(raw);
+        if (Number.isFinite(n) && n > 0)
+            return n;
+    }
+    return 10 * 60 * 1000; // default 10 min
+}
 // ── Module state ──────────────────────────────────────────────────
 const pending = new Map();
 let pollTimer = null;
@@ -139,6 +164,7 @@ export function stopWatcher() {
 export async function pollOnce() {
     const now = Date.now();
     const toRemove = [];
+    const missingFileFailureMs = getMissingFileFailureMs();
     for (const entry of pending.values()) {
         entry.lastCheckedAt = now;
         // Timeout check first — if the agent is past its giveUpAt, give up
@@ -157,7 +183,15 @@ export async function pollOnce() {
             await deliverAsFailure(entry, "error", status.error);
             toRemove.push(entry.agentId);
         }
-        // running / missing → keep polling next cycle
+        else if (status.state === "missing" &&
+            now - entry.startedAt > missingFileFailureMs) {
+            // v4.14.2 — Zombie guard: the subprocess never created its
+            // output file within `missingFileFailureMs` (default 10 min).
+            // Declare failed instead of polling until the 12h giveUpAt.
+            await deliverAsFailure(entry, "error", `Dispatched subprocess never wrote its output file (${Math.round((now - entry.startedAt) / 60_000)}m after start). Likely crashed before initializing, or the file was removed externally.`);
+            toRemove.push(entry.agentId);
+        }
+        // running / missing-but-young → keep polling next cycle
     }
     if (toRemove.length > 0) {
         for (const id of toRemove)

package/dist/services/env-file.js ADDED Viewed

@@ -0,0 +1,46 @@
+/**
+ * env-file — Shared helpers for reading and persisting key=value pairs
+ * in ~/.alvin-bot/.env. Previously private to setup-api.ts; extracted so
+ * Telegram command handlers (e.g. /model) can persist the user's runtime
+ * choices across bot restarts.
+ *
+ * All writes go through writeSecure() which enforces 0o600 on the env
+ * file — it contains bot tokens and API keys.
+ */
+import fs from "fs";
+import { ENV_FILE } from "../paths.js";
+import { writeSecure } from "./file-permissions.js";
+/** Read the env file into a plain object. Skips comments and malformed lines. */
+export function readEnv() {
+    if (!fs.existsSync(ENV_FILE))
+        return {};
+    const lines = fs.readFileSync(ENV_FILE, "utf-8").split("\n");
+    const env = {};
+    for (const line of lines) {
+        if (line.startsWith("#") || !line.includes("="))
+            continue;
+        const idx = line.indexOf("=");
+        env[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
+    }
+    return env;
+}
+/** Upsert a key=value pair in the env file, preserving all other lines. */
+export function writeEnvVar(key, value) {
+    let content = fs.existsSync(ENV_FILE) ? fs.readFileSync(ENV_FILE, "utf-8") : "";
+    const regex = new RegExp(`^${key}=.*$`, "m");
+    if (regex.test(content)) {
+        content = content.replace(regex, `${key}=${value}`);
+    }
+    else {
+        content = content.trimEnd() + `\n${key}=${value}\n`;
+    }
+    writeSecure(ENV_FILE, content);
+}
+/** Remove a key from the env file. No-op if missing. */
+export function removeEnvVar(key) {
+    if (!fs.existsSync(ENV_FILE))
+        return;
+    let content = fs.readFileSync(ENV_FILE, "utf-8");
+    content = content.replace(new RegExp(`^${key}=.*\n?`, "m"), "");
+    writeSecure(ENV_FILE, content);
+}

package/dist/services/workspaces.js CHANGED Viewed

@@ -99,6 +99,7 @@ function readWorkspaceFile(filePath, name) {
         const cwd = expandHome(rawCwd);
         const color = typeof fm.color === "string" ? fm.color : undefined;
         const emoji = typeof fm.emoji === "string" ? fm.emoji : undefined;
+        const model = typeof fm.model === "string" && fm.model.trim() ? fm.model.trim() : undefined;
         const channels = Array.isArray(fm.channels)
             ? fm.channels.filter((c) => typeof c === "string")
             : [];
@@ -109,6 +110,7 @@ function readWorkspaceFile(filePath, name) {
             color,
             emoji,
             channels,
+            model,
             systemPromptOverride: body.trim(),
         };
     }

package/dist/web/openai-compat.js CHANGED Viewed

@@ -78,7 +78,7 @@ async function handleChatCompletions(req, res, body) {
     const { prompt, systemPrompt } = buildPromptFromMessages(oaiReq.messages);
     const completionId = `chatcmpl-${crypto.randomUUID().replace(/-/g, "").slice(0, 24)}`;
     const created = Math.floor(Date.now() / 1000);
-    const model = oaiReq.model || "claude-opus-4";
+    const model = oaiReq.model || "claude-opus-4-6";
     // Optional session resumption via header
     const sessionId = req.headers["x-session-id"] || null;
     const p = getProvider();

package/dist/web/setup-api.js CHANGED Viewed

@@ -12,42 +12,8 @@ import { execSync } from "child_process";
 import { getRegistry } from "../engine.js";
 import { listJobs, createJob, deleteJob, toggleJob, updateJob, runJobNow, formatNextRun, humanReadableSchedule } from "../services/cron.js";
 import { storePassword, revokePassword, getSudoStatus, verifyPassword, sudoExec, requestAdminViaDialog, openSystemSettings } from "../services/sudo.js";
-import { ENV_FILE, CUSTOM_MODELS as CUSTOM_MODELS_FILE, BOT_ROOT, WHATSAPP_AUTH } from "../paths.js";
-import { writeSecure } from "../services/file-permissions.js";
-// ── Env Helpers ─────────────────────────────────────────
-function readEnv() {
-    if (!fs.existsSync(ENV_FILE))
-        return {};
-    const lines = fs.readFileSync(ENV_FILE, "utf-8").split("\n");
-    const env = {};
-    for (const line of lines) {
-        if (line.startsWith("#") || !line.includes("="))
-            continue;
-        const idx = line.indexOf("=");
-        env[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
-    }
-    return env;
-}
-function writeEnvVar(key, value) {
-    let content = fs.existsSync(ENV_FILE) ? fs.readFileSync(ENV_FILE, "utf-8") : "";
-    const regex = new RegExp(`^${key}=.*$`, "m");
-    if (regex.test(content)) {
-        content = content.replace(regex, `${key}=${value}`);
-    }
-    else {
-        content = content.trimEnd() + `\n${key}=${value}\n`;
-    }
-    // v4.12.2 — .env contains all secrets (bot tokens, API keys). Enforce
-    // 0o600 so other users on the machine can't read it.
-    writeSecure(ENV_FILE, content);
-}
-function removeEnvVar(key) {
-    if (!fs.existsSync(ENV_FILE))
-        return;
-    let content = fs.readFileSync(ENV_FILE, "utf-8");
-    content = content.replace(new RegExp(`^${key}=.*\n?`, "m"), "");
-    writeSecure(ENV_FILE, content);
-}
+import { CUSTOM_MODELS as CUSTOM_MODELS_FILE, BOT_ROOT, WHATSAPP_AUTH } from "../paths.js";
+import { readEnv, writeEnvVar, removeEnvVar } from "../services/env-file.js";
 function loadCustomModels() {
     try {
         return JSON.parse(fs.readFileSync(CUSTOM_MODELS_FILE, "utf-8"));
@@ -180,9 +146,9 @@ const PROVIDERS = [
         description: "Claude Opus, Sonnet, Haiku directly via API key. OpenAI-compatible.",
         envKey: "ANTHROPIC_API_KEY",
         models: [
-            { key: "claude-opus", name: "Claude Opus 4", model: "claude-opus-4-6" },
-            { key: "claude-sonnet", name: "Claude Sonnet 4", model: "claude-sonnet-4-20250514" },
-            { key: "claude-haiku", name: "Claude 3.5 Haiku", model: "claude-3-5-haiku-20241022" },
+            { key: "claude-opus", name: "Claude Opus 4.6", model: "claude-opus-4-6" },
+            { key: "claude-sonnet", name: "Claude Sonnet 4.6", model: "claude-sonnet-4-6" },
+            { key: "claude-haiku", name: "Claude Haiku 4.5", model: "claude-haiku-4-5" },
         ],
         signupUrl: "https://console.anthropic.com/settings/keys",
         docsUrl: "https://docs.anthropic.com/en/api",

package/docs/security.md CHANGED Viewed

@@ -1,6 +1,6 @@
 # Alvin Bot — Security Threat Model & Hardening Guide
-> **Last updated:** 2026-04-15 (v4.12.2)
+> **Last updated:** 2026-04-16 (v4.14.2)
 > **Audience:** Operators installing Alvin Bot on their own machine.
 > **Short version:** Alvin Bot is a full AI agent with shell, filesystem, and network access on the machine it runs on. Treat it like you would `sudo` access. Only install on machines where you would trust Claude Code to run without supervision.
@@ -270,6 +270,20 @@ If you suspect the bot has been compromised or exfiltrated secrets:
 ## Version history
+- **v4.14.2** (2026-04-16) — Watcher zombie guard: missing outputFile > 10 min (env-configurable) delivers as failed instead of 12h timeout. Prevents stuck pending entries when a dispatched `claude -p` subprocess crashes before writing output or the file gets removed externally. No new attack surface.
+- **v4.14.1** (2026-04-16) — `/subagents list` unified view: merges v4.0.0 bot-level `activeAgents` registry with v4.13+ `async-agent-watcher` pending registry. Cosmetic/diagnostic only, no security implications.
+- **v4.14.0** (2026-04-16) — Sub-agent dispatch on Slack / Discord / WhatsApp via the `alvin_dispatch_agent` MCP tool. New `delivery-registry` module routes sub-agent deliveries to the right platform adapter. Types widened (`chatId: number | string`, `platform?: ...`). Telegram path bit-for-bit unchanged. Trust boundary expanded: each non-Telegram platform adapter now has `sendText` access to its respective channel — same trust level as the main adapter's `sendText`, no new capabilities.
+- **v4.13.2** (2026-04-16) — Slack `/alvin` slash command via Bolt `app.command()` handler. Requires the `commands` OAuth scope on the Slack app. Subcommand parsing is case-insensitive on the command word, preserves args verbatim. Ack within 3 seconds; response via `chat.postMessage` (persistent, channel-visible). No new network surface.
+- **v4.13.1** (2026-04-16) — Slack Test Connection endpoint validated via `auth.test` (cheap, no ambient state change). Maintenance UI (`/api/pm2/*` routes, kept for compat) now auto-detects launchd / PM2 / standalone via new `process-manager` abstraction. No new external attack surface.
+- **v4.13.0** (2026-04-16) — **Architectural**: `alvin_dispatch_agent` MCP tool spawns truly detached `claude -p` subprocesses via `child_process.spawn({ detached: true, ..., unref() })`. The subprocess inherits current env (with `CLAUDECODE`/`CLAUDE_CODE_ENTRYPOINT` stripped to prevent nested-session errors) and writes stream-json to `~/.alvin-bot/subagents/<agentId>.jsonl`. Trust boundary: each dispatched subprocess runs with the same user privileges as the parent bot — same trust as `Bash` tool executions. The subprocess has its own separate abort lifecycle; parent abort (e.g. bypass-abort from v4.12.3) no longer cascades into killing the sub-agent, which was a legitimate concern under the old Task-tool-based flow.
+- **v4.12.4** (2026-04-16) — Parser staleness detection: if outputFile hasn't been written in `ALVIN_SUBAGENT_STALENESS_MS` (default 5 min) AND has usable assistant text, deliver as "completed with partial output" instead of waiting 12h for timeout. Recovers real work from agents interrupted mid-execution. No new privileges or surface.
 - **v4.12.2** (2026-04-15) — First formal security release: file-permissions hardening, ALLOWED_USERS hard-fail, webhook timing-safe comparison, exec-guard metachar rejection, cron shell-job execGuard integration, sub-agent toolset presets (readonly, research), axios + claude-agent-sdk CVE patches. This document.
 - **v4.12.0 – v4.12.1** — Multi-session + Slack + task-aware stuck timer. No dedicated security content, though the v4.12.0 session-key fix closed a confused-deputy bug on Slack/WhatsApp where all channels from the same user collapsed into one session.

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "alvin-bot",
-  "version": "4.14.1",
+  "version": "4.15.0",
   "description": "Alvin Bot \u2014 Your personal AI agent on Telegram, WhatsApp, Discord, Signal, and Web.",
   "type": "module",
   "main": "dist/index.js",

package/test/watcher-zombie-fix.test.ts ADDED Viewed

@@ -0,0 +1,252 @@
+/**
+ * v4.14.2 — zombie-entry fix for async-agent-watcher.
+ *
+ * Problem: when the dispatched `claude -p` subprocess never produces
+ * its outputFile (crashed before the first write, spawn failed, file
+ * got deleted externally), `parseOutputFileStatus` returns "missing"
+ * on every poll. The watcher keeps polling forever until `giveUpAt`
+ * (12 hours) fires, then delivers a timeout banner. Meanwhile the
+ * entry hangs in `/subagents list` as a permanent "running" zombie.
+ *
+ * Fix: when status is "missing" for longer than
+ * `MISSING_FILE_FAILURE_MS` (default 10 min, env-configurable), the
+ * watcher declares the agent failed with a clear "output file never
+ * appeared" reason, delivers the failure banner, and removes the
+ * entry. 10 minutes is well above normal startup variance (seconds)
+ * and well below the 12h hard ceiling.
+ *
+ * Invariants preserved:
+ *   - An agent whose output file DOES appear, even slowly, continues
+ *     normally (missing on first poll, running on second, completed
+ *     on third — same as v4.14.1).
+ *   - The `completed` path (end_turn or stream-json result) is
+ *     unchanged.
+ *   - The `failed` path (existing "error" state from parser) is
+ *     unchanged.
+ *   - The 12h giveUpAt ceiling still applies — it's now just less
+ *     likely to be hit because missing-file zombies resolve earlier.
+ */
+import { describe, it, expect, beforeEach, afterEach, vi } from "vitest";
+import fs from "fs";
+import os from "os";
+import { resolve } from "path";
+const TEST_DATA_DIR = resolve(
+  os.tmpdir(),
+  `alvin-zombie-${process.pid}-${Date.now()}`,
+);
+interface Delivered {
+  info: { name: string; status: string };
+  result: { status: string; output: string; error?: string };
+}
+let delivered: Delivered[] = [];
+beforeEach(async () => {
+  if (fs.existsSync(TEST_DATA_DIR)) {
+    fs.rmSync(TEST_DATA_DIR, { recursive: true, force: true });
+  }
+  fs.mkdirSync(TEST_DATA_DIR, { recursive: true });
+  process.env.ALVIN_DATA_DIR = TEST_DATA_DIR;
+  // Reset the env override between tests
+  delete process.env.ALVIN_MISSING_FILE_FAILURE_MS;
+  delivered = [];
+  vi.resetModules();
+  vi.doMock("../src/services/subagent-delivery.js", () => ({
+    deliverSubAgentResult: async (info: unknown, result: unknown) => {
+      delivered.push({
+        info: info as Delivered["info"],
+        result: result as Delivered["result"],
+      });
+    },
+    attachBotApi: () => {},
+    __setBotApiForTest: () => {},
+  }));
+});
+afterEach(async () => {
+  try {
+    const mod = await import("../src/services/async-agent-watcher.js");
+    mod.stopWatcher();
+    mod.__resetForTest();
+  } catch {}
+  delete process.env.ALVIN_MISSING_FILE_FAILURE_MS;
+});
+describe("watcher zombie fix (v4.14.2)", () => {
+  it("missing file younger than threshold stays pending (no premature fail)", async () => {
+    // Threshold = 10 min. Backdate only 2 min. Expect: still pending.
+    const mod = await import("../src/services/async-agent-watcher.js");
+    mod.registerPendingAgent({
+      agentId: "young-zombie",
+      outputFile: `${TEST_DATA_DIR}/nonexistent.jsonl`,
+      description: "young",
+      prompt: "p",
+      chatId: 1,
+      userId: 1,
+      toolUseId: null,
+    });
+    // Forcibly set startedAt to 2 min ago
+    const pending = mod.listPendingAgents();
+    expect(pending).toHaveLength(1);
+    (pending[0] as { startedAt: number }).startedAt = Date.now() - 2 * 60_000;
+    await mod.pollOnce();
+    expect(delivered).toHaveLength(0);
+    expect(mod.listPendingAgents()).toHaveLength(1);
+  });
+  it("missing file older than threshold delivers failed + removes from pending", async () => {
+    process.env.ALVIN_MISSING_FILE_FAILURE_MS = "120000"; // 2 min for test
+    const mod = await import("../src/services/async-agent-watcher.js");
+    mod.registerPendingAgent({
+      agentId: "old-zombie",
+      outputFile: `${TEST_DATA_DIR}/never-appears.jsonl`,
+      description: "stuck crash zombie",
+      prompt: "p",
+      chatId: 1,
+      userId: 1,
+      toolUseId: null,
+    });
+    // Backdate 5 min (> 2 min threshold)
+    const pending = mod.listPendingAgents();
+    (pending[0] as { startedAt: number }).startedAt = Date.now() - 5 * 60_000;
+    await mod.pollOnce();
+    expect(delivered).toHaveLength(1);
+    expect(delivered[0].result.status).toBe("error");
+    // Error message should be explicit so user understands
+    expect(delivered[0].result.error).toMatch(/output file|never appeared|never wrote/i);
+    expect(mod.listPendingAgents()).toHaveLength(0);
+  });
+  it("default threshold is 10 min when env var is not set", async () => {
+    const mod = await import("../src/services/async-agent-watcher.js");
+    mod.registerPendingAgent({
+      agentId: "at-default",
+      outputFile: `${TEST_DATA_DIR}/z.jsonl`,
+      description: "default threshold",
+      prompt: "p",
+      chatId: 1,
+      userId: 1,
+      toolUseId: null,
+    });
+    // Backdate 9 min — still under the 10-min default, should stay pending
+    let p = mod.listPendingAgents();
+    (p[0] as { startedAt: number }).startedAt = Date.now() - 9 * 60_000;
+    await mod.pollOnce();
+    expect(delivered).toHaveLength(0);
+    expect(mod.listPendingAgents()).toHaveLength(1);
+    // Backdate to 11 min — over threshold, should fire
+    p = mod.listPendingAgents();
+    (p[0] as { startedAt: number }).startedAt = Date.now() - 11 * 60_000;
+    await mod.pollOnce();
+    expect(delivered).toHaveLength(1);
+  });
+  it("running file (has content, no end_turn) is unaffected by zombie check", async () => {
+    // A file WITH content should never trigger the missing-file path
+    // regardless of age.
+    const outPath = `${TEST_DATA_DIR}/running.jsonl`;
+    fs.writeFileSync(
+      outPath,
+      JSON.stringify({
+        type: "assistant",
+        isSidechain: true,
+        agentId: "x",
+        message: {
+          role: "assistant",
+          content: [{ type: "tool_use", name: "Bash", input: {} }],
+          stop_reason: "tool_use",
+        },
+      }) + "\n",
+      "utf-8",
+    );
+    const mod = await import("../src/services/async-agent-watcher.js");
+    mod.registerPendingAgent({
+      agentId: "active-work",
+      outputFile: outPath,
+      description: "legitimately running",
+      prompt: "p",
+      chatId: 1,
+      userId: 1,
+      toolUseId: null,
+    });
+    const p = mod.listPendingAgents();
+    (p[0] as { startedAt: number }).startedAt = Date.now() - 30 * 60_000; // 30 min old
+    await mod.pollOnce();
+    // v4.12.4 staleness detection COULD fire here because the file has
+    // text content and is stale. That's a different (benign) path — the
+    // agent gets delivered as "completed with partial output". Either
+    // way, the zombie-fix error path must NOT fire.
+    const anyZombieError = delivered.some(
+      (d) => d.result.error && /output file never/i.test(d.result.error),
+    );
+    expect(anyZombieError).toBe(false);
+  });
+  it("completed file delivers as completed (unchanged)", async () => {
+    const outPath = `${TEST_DATA_DIR}/done.jsonl`;
+    fs.writeFileSync(
+      outPath,
+      JSON.stringify({
+        type: "assistant",
+        agentId: "x",
+        message: {
+          content: [{ type: "text", text: "all good" }],
+          stop_reason: "end_turn",
+        },
+      }) + "\n",
+      "utf-8",
+    );
+    const mod = await import("../src/services/async-agent-watcher.js");
+    mod.registerPendingAgent({
+      agentId: "done-agent",
+      outputFile: outPath,
+      description: "clean completion",
+      prompt: "p",
+      chatId: 1,
+      userId: 1,
+      toolUseId: null,
+    });
+    // Backdate 1h — would trigger zombie if misapplied
+    const p = mod.listPendingAgents();
+    (p[0] as { startedAt: number }).startedAt = Date.now() - 60 * 60_000;
+    await mod.pollOnce();
+    expect(delivered).toHaveLength(1);
+    expect(delivered[0].result.status).toBe("completed");
+  });
+  it("decrements session counter on zombie failure delivery", async () => {
+    process.env.ALVIN_MISSING_FILE_FAILURE_MS = "1000"; // 1 sec for fast test
+    const sessionMod = await import("../src/services/session.js");
+    const session = sessionMod.getSession("zombie-session");
+    session.pendingBackgroundCount = 1;
+    const mod = await import("../src/services/async-agent-watcher.js");
+    mod.registerPendingAgent({
+      agentId: "session-zombie",
+      outputFile: `${TEST_DATA_DIR}/gone.jsonl`,
+      description: "zombie for counter",
+      prompt: "p",
+      chatId: 1,
+      userId: 1,
+      toolUseId: null,
+      sessionKey: "zombie-session",
+    });
+    const p = mod.listPendingAgents();
+    (p[0] as { startedAt: number }).startedAt = Date.now() - 5000; // 5 sec ago, > 1sec threshold
+    await mod.pollOnce();
+    expect(delivered).toHaveLength(1);
+    expect(session.pendingBackgroundCount).toBe(0);
+  });
+});