alvin-bot 4.14.1 → 4.15.0
- package/CHANGELOG.md +113 -0
- package/README.md +9 -4
- package/dist/handlers/commands.js +10 -0
- package/dist/handlers/message.js +3 -0
- package/dist/handlers/platform-message.js +3 -0
- package/dist/providers/claude-sdk-provider.js +21 -2
- package/dist/providers/registry.js +28 -3
- package/dist/providers/types.js +4 -4
- package/dist/services/async-agent-watcher.js +35 -1
- package/dist/services/env-file.js +46 -0
- package/dist/services/workspaces.js +2 -0
- package/dist/web/openai-compat.js +1 -1
- package/dist/web/setup-api.js +5 -39
- package/docs/security.md +15 -1
- package/package.json +1 -1
- package/test/watcher-zombie-fix.test.ts +252 -0
package/CHANGELOG.md
CHANGED
|
@@ -2,6 +2,119 @@
|
|
|
2
2
|
|
|
3
3
|
All notable changes to Alvin Bot are documented here.
|
|
4
4
|
|
|
5
|
+
## [4.15.0] — 2026-04-16
|
|
6
|
+
|
|
7
|
+
### ✨ Feature: auto-latest Claude model selection + per-workspace overrides
|
|
8
|
+
|
|
9
|
+
Alvin now picks up new Claude models (e.g. Opus 4.7 on Max subscription) automatically, and users can switch between Opus / Sonnet / Haiku tiers directly from Telegram — or pin a specific tier per workspace.
|
|
10
|
+
|
|
11
|
+
#### What's new
|
|
12
|
+
|
|
13
|
+
**`/model` now lists four Claude entries** (plus any configured custom providers + Ollama):
|
|
14
|
+
- `Claude (Agent SDK)` — CLI default (= whatever Anthropic ships as current, currently Opus 4.7)
|
|
15
|
+
- `Claude Opus (auto-latest)` — forwards `model: "opus"` to the Agent SDK → latest Opus tier
|
|
16
|
+
- `Claude Sonnet (auto-latest)` — same pattern with Sonnet
|
|
17
|
+
- `Claude Haiku (auto-latest)` — same pattern with Haiku
|
|
18
|
+
|
|
19
|
+
The three aliased entries all route through `ClaudeSDKProvider` with different `model:` values. Switching persists to `~/.alvin-bot/.env` (`PRIMARY_PROVIDER=…`), so the choice survives bot restarts.
|
|
20
|
+
|
|
21
|
+
**Workspaces can pin a model** via an optional YAML frontmatter field:
|
|
22
|
+
|
|
23
|
+
```yaml
|
|
24
|
+
---
|
|
25
|
+
purpose: Interview prep
|
|
26
|
+
cwd: ~/Documents/Interviews
|
|
27
|
+
model: sonnet # opus | sonnet | haiku | claude-opus-4-7 | ...
|
|
28
|
+
---
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
When `model:` is omitted (the default for all existing workspaces), the globally active `/model` choice is used — no behaviour change.
|
|
32
|
+
|
|
33
|
+
**Fallback on rate limits:** the Agent SDK is now always called with `fallbackModel: "haiku"`. Keeps the bot responsive when the primary tier is throttled.
|
|
34
|
+
|
|
35
|
+
#### Why this matters
|
|
36
|
+
|
|
37
|
+
Before v4.15, `claude-opus-4-6` was hardcoded in six places. When Anthropic released Opus 4.7 on the Max plan, the CLI picked it up automatically — but Alvin's `/status` still claimed `claude-opus-4-6`, and there was no way to force a specific tier from Telegram. The Agent SDK's `query()` call wasn't even receiving a `model:` parameter, so whatever lived in `config.model` was dead metadata.
|
|
38
|
+
|
|
39
|
+
Now:
|
|
40
|
+
- The default `"inherit"` means "don't pass model: — let the CLI pick its current default." Fresh installs on Max plans get Opus 4.7 automatically.
|
|
41
|
+
- Aliases (`opus` / `sonnet` / `haiku`) resolve to the latest tier each release cycle without any code change.
|
|
42
|
+
- Pinning a specific ID (e.g. `claude-opus-4-7`) is supported for reproducibility.
|
|
43
|
+
|
|
44
|
+
#### Implementation
|
|
45
|
+
|
|
46
|
+
- `src/providers/claude-sdk-provider.ts` — forwards `model:` and sets `fallbackModel: "haiku"` on every `query()` call. Resolution order: per-query `options.model` → provider `this.config.model` → `"inherit"` (= no model passed).
|
|
47
|
+
- `src/providers/registry.ts` — registers three virtual entries (`claude-opus`, `claude-sonnet`, `claude-haiku`) as additional keys all backed by `ClaudeSDKProvider` with different `model:` values.
|
|
48
|
+
- `src/services/env-file.ts` — new module extracting the `readEnv` / `writeEnvVar` / `removeEnvVar` helpers from `setup-api.ts` so Telegram command handlers can persist runtime choices.
|
|
49
|
+
- `src/handlers/commands.ts` — `switchProviderWithLifecycle` now calls `writeEnvVar("PRIMARY_PROVIDER", targetKey)` on every switch, not just Web UI changes.
|
|
50
|
+
- `src/services/workspaces.ts` — `Workspace` type gets optional `model?: string`, the YAML parser picks it up from frontmatter.
|
|
51
|
+
- `src/providers/types.ts` — `QueryOptions` gets optional `model?: string` for per-query overrides.
|
|
52
|
+
- `src/handlers/message.ts` + `src/handlers/platform-message.ts` — both forward `workspace.model` into `queryOpts` when the active workspace has one defined.
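The resolution order listed above can be sketched as a small pure function. This is a minimal illustration, not the shipped code: `resolveModelOverride` and `buildSdkOptions` are hypothetical names, and only the `??` chain, the `"inherit"` sentinel, and `fallbackModel: "haiku"` are taken from the diff.

```javascript
// Hypothetical helper mirroring the resolution order described above:
// per-query model, then provider-level model, then "inherit" (no model passed).
function resolveModelOverride(queryModel, providerModel) {
  const raw = queryModel ?? providerModel;
  // "inherit" (or an empty value) means: omit `model` and let the CLI decide.
  return raw && raw !== "inherit" ? raw : undefined;
}

// Hypothetical builder for the model-related part of the SDK options object.
function buildSdkOptions(queryModel, providerModel) {
  const model = resolveModelOverride(queryModel, providerModel);
  return {
    ...(model ? { model } : {}), // key is absent, not undefined, when inheriting
    fallbackModel: "haiku",
  };
}

console.log(buildSdkOptions("sonnet", "opus"));
console.log(buildSdkOptions(undefined, "inherit"));
```

Omitting the key instead of passing `model: undefined` keeps the call identical to the pre-v4.15 shape when nothing is pinned.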
|
|
53
|
+
|
|
54
|
+
#### Backward compatibility
|
|
55
|
+
|
|
56
|
+
- Default provider config is `"inherit"` — identical to pre-v4.15 behaviour (no `model:` passed to the Agent SDK, CLI default wins).
|
|
57
|
+
- Workspaces without a `model:` field behave exactly as before.
|
|
58
|
+
- Stale presets `claude-sonnet-4-20250514` → `claude-sonnet-4-6` and `claude-3-5-haiku-20241022` → `claude-haiku-4-5` updated (previously unused — only affected the REST-API code paths, which nobody referenced).
|
|
59
|
+
|
|
60
|
+
#### Docs
|
|
61
|
+
|
|
62
|
+
Workspace guides updated (`docs/install/workspaces-de.html` + `workspaces-en.html`) — the YAML-field reference table now documents the new optional `model:` entry.
|
|
63
|
+
|
|
64
|
+
### 🐛 Bonus: stale model-ID cleanup
|
|
65
|
+
|
|
66
|
+
Four hardcoded Claude model IDs replaced with current strings: `claude-sonnet-4-20250514` → `claude-sonnet-4-6`, `claude-3-5-haiku-20241022` → `claude-haiku-4-5`, openai-compat fallback `claude-opus-4` → `claude-opus-4-6`, setup-API defaults likewise. None of these were on active code paths, but they would have shipped confusing display names if anyone had referenced them.
|
|
67
|
+
|
|
68
|
+
### Commits
|
|
69
|
+
|
|
70
|
+
- `fed4b91` — feat(providers): v4.15 — auto-latest Claude model selection via /model
|
|
71
|
+
- `b2a6e1f` — feat(workspaces): v4.15 — optional per-workspace model override
|
|
72
|
+
|
|
73
|
+
---
|
|
74
|
+
|
|
75
|
+
## [4.14.2] — 2026-04-16
|
|
76
|
+
|
|
77
|
+
### 🐛 Patch: watcher zombie-entry fix (missing outputFile > 10 min = failed)
|
|
78
|
+
|
|
79
|
+
**Edge case Ali caught today:** a pending async-agent entry stuck in `/subagents list` for 3+ hours showing "running" — but the underlying `alvin_dispatch_agent` subprocess had already died (its output file was gone). The entry would have continued haunting the list until the 12-hour `giveUpAt` ceiling fired.
|
|
80
|
+
|
|
81
|
+
**Root cause:** `async-agent-watcher`'s `pollOnce` handled four states from `parseOutputFileStatus` — `completed` / `failed` / `running` / `missing`. For `missing` (file doesn't exist or is empty), the watcher just kept polling forever, on the assumption that a slow subprocess might eventually write. If the subprocess crashed before writing ANY output, the file never appeared, and we polled for 12 hours before timing out.
|
|
82
|
+
|
|
83
|
+
**Fix:** when `status.state === "missing"` AND `now - entry.startedAt > MISSING_FILE_FAILURE_MS` (default 10 min, configurable via `ALVIN_MISSING_FILE_FAILURE_MS` env var), deliver as failed with an explicit message:
|
|
84
|
+
|
|
85
|
+
> *Dispatched subprocess never wrote its output file (N m after start). Likely crashed before initializing, or the file was removed externally.*
|
|
86
|
+
|
|
87
|
+
10 minutes is well above any legitimate `claude -p` startup variance (normal first-write latency is seconds) and well below the 12-hour hard ceiling.
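The decision described in the fix can be sketched as a pure function over the parsed state and the entry's age. This is a simplified illustration under stated assumptions: `nextAction` and its return labels are hypothetical, and the real `pollOnce` also runs the `giveUpAt` timeout check and the delivery calls, which are left out here.

```javascript
const DEFAULT_MISSING_FILE_FAILURE_MS = 10 * 60 * 1000; // 10 min, as in the changelog

// Hypothetical classifier: returns what the watcher should do with one entry.
function nextAction(state, startedAt, now, thresholdMs = DEFAULT_MISSING_FILE_FAILURE_MS) {
  if (state === "completed") return "deliver";
  if (state === "failed") return "fail";
  // Zombie guard: the output file never appeared within the threshold.
  if (state === "missing" && now - startedAt > thresholdMs) return "fail-zombie";
  // running, or missing but still young: try again on the next poll
  return "keep-polling";
}

const minutes = (n) => n * 60 * 1000;
console.log(nextAction("missing", 0, minutes(5)));
console.log(nextAction("missing", 0, minutes(11)));
console.log(nextAction("running", 0, minutes(180)));
```

Because a file with content is reported as `running` and not `missing`, long-running agents never reach the zombie branch.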
|
|
88
|
+
|
|
89
|
+
### What's preserved (regression-guard tested)
|
|
90
|
+
|
|
91
|
+
- Running agents (file has content but no `end_turn`/`result` yet) are untouched by this path — they still keep polling as before.
|
|
92
|
+
- Completed agents (clean `end_turn` or `stream-json result` event) still deliver normally.
|
|
93
|
+
- Explicit `failed` state from the parser (if ever used) still delivers error normally.
|
|
94
|
+
- v4.12.4's "file is stale but has text → deliver partial" path takes precedence over the new zombie check (the file has content, so not "missing").
|
|
95
|
+
- 12-hour `giveUpAt` hard ceiling still applies as the ultimate safety net.
|
|
96
|
+
- Session's `pendingBackgroundCount` decrement fires on zombie failure, same as every other delivery path.
|
|
97
|
+
|
|
98
|
+
### Testing
|
|
99
|
+
|
|
100
|
+
- **Baseline**: 498 tests (v4.14.1)
|
|
101
|
+
- **New**: `test/watcher-zombie-fix.test.ts` — 6 tests:
|
|
102
|
+
- Young missing file (<threshold) stays pending
|
|
103
|
+
- Old missing file (>threshold) delivers failed + removes from pending
|
|
104
|
+
- Default threshold is 10 min when env var unset
|
|
105
|
+
- Running file (has content) is unaffected by zombie check
|
|
106
|
+
- Completed file delivers as completed (regression guard)
|
|
107
|
+
- Session's `pendingBackgroundCount` decrements on zombie delivery
|
|
108
|
+
- **Total**: 504 tests, all green, TSC clean
|
|
109
|
+
|
|
110
|
+
### Files changed
|
|
111
|
+
|
|
112
|
+
- **Modified**: `src/services/async-agent-watcher.ts` (new `getMissingFileFailureMs()` + zombie branch in `pollOnce`)
|
|
113
|
+
- **NEW tests**: `test/watcher-zombie-fix.test.ts`
|
|
114
|
+
- **Version**: `package.json` 4.14.1 → 4.14.2
|
|
115
|
+
|
|
116
|
+
---
|
|
117
|
+
|
|
5
118
|
## [4.14.1] — 2026-04-16
|
|
6
119
|
|
|
7
120
|
### 🐛 Patch: `/subagents list` now shows v4.13+ dispatch agents too
|
package/README.md
CHANGED
|
@@ -64,7 +64,7 @@ Alvin Bot is an open-source, self-hosted AI agent that lives where you chat. Bui
|
|
|
64
64
|
- **Adjustable Thinking** — From quick answers (`/effort low`) to deep analysis (`/effort max`)
|
|
65
65
|
- **Persistent Memory** — Remembers across sessions via vector-indexed knowledge base; session state (Claude SDK resume tokens, conversation history, language, effort) survives bot restarts (v4.11.0)
|
|
66
66
|
- **Multi-Session Workspaces** — Run multiple parallel, context-isolated sessions on the same bot — one per Slack channel or per Telegram `/workspace` — each with its own working directory, purpose, and persona. Memory, skills, and sub-agents stay globally shared (v4.12.0). [How-to ↓](#-multi-session-workspaces-v4120)
|
|
67
|
-
- **
|
|
67
|
+
- **Truly Detached Sub-Agents** — Claude dispatches long-running research/audit tasks via the `alvin_dispatch_agent` MCP tool, which spawns independent `claude -p` subprocesses with their own PID + process group. Main session stays fully responsive, user can interrupt freely without killing sub-agents. Results deliver as separate messages. Works identically on Telegram, Slack, Discord, and WhatsApp (v4.13.0+ dispatch, v4.14.0 multi-platform)
|
|
68
68
|
- **Smart Tool Discovery** — Scans your system at startup, knows exactly what CLI tools, plugins, and APIs are available
|
|
69
69
|
- **Skill System** — 12 built-in SKILL.md files (code, data analysis, email, docs, research, sysadmin, browse, etc.) auto-activate based on message context
|
|
70
70
|
- **Self-Awareness** — Knows it IS the AI model — won't call external APIs for tasks it can do itself
|
|
@@ -406,7 +406,7 @@ curl -s http://localhost:3100/api/workspaces | jq
|
|
|
406
406
|
### Architecture guarantees
|
|
407
407
|
|
|
408
408
|
- **Memory is global.** Facts Alvin learns in `#alev-b` are visible in `#homes` via the shared `MEMORY.md` and embeddings index. Per-workspace memory layer is on the v4.13 roadmap.
|
|
409
|
-
- **Sub-agents are per-session.** Each workspace can
|
|
409
|
+
- **Sub-agents are per-session.** Each workspace can dispatch its own detached sub-agents via `alvin_dispatch_agent` — results come back to the originating channel on any platform (Telegram, Slack, Discord, WhatsApp), visible in `/subagents list` (v4.13.0+ dispatch, v4.14.0 cross-platform, v4.14.1 unified list view).
|
|
410
410
|
- **Session state survives restart.** Claude SDK `resume` tokens, conversation history, language, effort, and `workspaceName` all persist via `session-persistence.ts` (v4.11.0).
|
|
411
411
|
- **Backwards compatible.** If you don't create any workspace files, everything behaves exactly like v4.11. Upgrade is a no-op.
|
|
412
412
|
|
|
@@ -650,14 +650,19 @@ alvin-bot version # Show version
|
|
|
650
650
|
- [x] Telegram `/workspace` + `/workspaces` commands (feature parity)
|
|
651
651
|
- [x] Per-workspace cost aggregation + Web UI workspace cards
|
|
652
652
|
- [x] Slack setup guide + copy-paste app manifest (in GitHub Release assets)
|
|
653
|
-
- [
|
|
653
|
+
- [x] **Phase 17** — Truly detached sub-agents + multi-platform dispatch (v4.13.0 – v4.14.2, 2026-04-16)
|
|
654
|
+
- [x] `alvin_dispatch_agent` MCP tool — spawns independent `claude -p` subprocesses that survive parent aborts (v4.13.0)
|
|
655
|
+
- [x] Slack `/alvin` slash command (namespaced parent with subcommands: status / new / effort / help + LLM fallthrough) (v4.13.2)
|
|
656
|
+
- [x] Sub-agent dispatch on Slack, Discord, WhatsApp via platform-aware delivery registry (v4.14.0)
|
|
657
|
+
- [x] `/subagents list` merged view — v4.0.0 bot-level agents + v4.13+ detached dispatches in one list (v4.14.1)
|
|
658
|
+
- [x] Watcher zombie guard — missing outputFile > 10 min delivers as failed instead of 12h timeout (v4.14.2)
|
|
659
|
+
- [x] Staleness-based partial output recovery for interrupted sub-agents (v4.12.4)
|
|
654
660
|
- [ ] SQLite migration of the embeddings index (currently 128 MB JSON)
|
|
655
661
|
- [ ] Per-workspace memory layer (additive over global) — facts learned in `#alev-b` stay in `alev-b` unless explicitly promoted to global
|
|
656
662
|
- [ ] Per-workspace provider override (`provider:` in frontmatter) — e.g. Alev-B uses Claude Opus, JobSnack uses cheap Gemini
|
|
657
663
|
- [ ] Per-workspace skill allowlist — scope Apple Notes to personal workspace, sysadmin only to devops workspace, etc.
|
|
658
664
|
- [ ] Multi-User Slack (real `per-channel-peer` mode) — different users in the same Slack channel get their own sub-sessions
|
|
659
665
|
- [ ] Workspace cloning / templates — `/workspace clone alev-b as homes-dev` spins up a new workspace from an existing one
|
|
660
|
-
- [ ] Slack slash commands (`/alvin workspace`, `/alvin status`, `/alvin new`) — native Slack command integration via Bolt
|
|
661
666
|
- [ ] Daily log decay / archive — older daily logs move to cold storage after N days
|
|
662
667
|
- [ ] **Phase 18** — Security + Platform hardening (from v4.12.1 audit, prioritized)
|
|
663
668
|
- [ ] **P1 — Electron major upgrade** (35 → 41+) — fixes 1 HIGH + 5 MODERATE Electron CVEs in the Desktop-Build path. Major version jump, requires full rebuild + test of `.dmg` flow. Separate release (likely bundled with Windows `.exe` work).
|
package/dist/handlers/commands.js
CHANGED
|
@@ -16,6 +16,7 @@ import { getLoadedPlugins, getPluginsDir } from "../services/plugins.js";
|
|
|
16
16
|
import { getMCPStatus, getMCPTools, callMCPTool } from "../services/mcp.js";
|
|
17
17
|
import { listCustomTools, executeCustomTool } from "../services/custom-tools.js";
|
|
18
18
|
import { screenshotUrl, extractText, generatePdf, hasPlaywright } from "../services/browser.js";
|
|
19
|
+
import { writeEnvVar } from "../services/env-file.js";
|
|
19
20
|
import { listJobs, createJob, deleteJob, toggleJob, runJobNow, formatNextRun, humanReadableSchedule } from "../services/cron.js";
|
|
20
21
|
import { resolveJobByNameOrId } from "../services/cron-resolver.js";
|
|
21
22
|
import { buildTickerText, buildDoneText, escapeMarkdown } from "./cron-progress.js";
|
|
@@ -518,6 +519,15 @@ export function registerCommands(bot) {
|
|
|
518
519
|
if (!registry.switchTo(targetKey)) {
|
|
519
520
|
return { ok: false, error: "switch rejected by registry" };
|
|
520
521
|
}
|
|
522
|
+
// v4.15 — Persist the switch to ~/.alvin-bot/.env so the choice
|
|
523
|
+
// survives bot restarts. In-memory switchTo() alone would revert to
|
|
524
|
+
// PRIMARY_PROVIDER on next boot.
|
|
525
|
+
try {
|
|
526
|
+
writeEnvVar("PRIMARY_PROVIDER", targetKey);
|
|
527
|
+
}
|
|
528
|
+
catch (err) {
|
|
529
|
+
console.warn("⚠️ Failed to persist PRIMARY_PROVIDER:", err);
|
|
530
|
+
}
|
|
521
531
|
// Tear down the previous provider's lifecycle (if any) after the switch.
|
|
522
532
|
// ensureStopped() internally checks isBotManaged — no-op for externally
|
|
523
533
|
// managed daemons.
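The hunk above treats persistence as best-effort: a failed write is logged and the in-memory switch still stands. A minimal sketch of that pattern follows, with the dependencies injected so it runs standalone. `switchAndPersist`, the injected `persist` and `warn` parameters, and the `{ ok: true }` success value are assumptions for illustration; only the rejection shape and the `PRIMARY_PROVIDER` key come from the diff.

```javascript
// Hypothetical wrapper: switch first, then try to persist; never fail the
// switch just because the env file could not be written.
function switchAndPersist(registry, targetKey, persist, warn = console.warn) {
  if (!registry.switchTo(targetKey)) {
    return { ok: false, error: "switch rejected by registry" };
  }
  try {
    persist("PRIMARY_PROVIDER", targetKey);
  } catch (err) {
    warn("Failed to persist PRIMARY_PROVIDER:", err);
  }
  return { ok: true };
}

// Usage with stand-in objects (not the real registry or env-file module).
const fakeRegistry = { switchTo: (key) => key.startsWith("claude-") };
const written = [];
console.log(switchAndPersist(fakeRegistry, "claude-opus", (k, v) => written.push(`${k}=${v}`)));
console.log(written);
```

The trade-off is that a write failure leaves the choice in effect only until the next restart, which the warning makes visible in the logs.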
|
package/dist/handlers/message.js
CHANGED
|
@@ -386,6 +386,9 @@ export async function handleMessage(ctx) {
|
|
|
386
386
|
systemPrompt,
|
|
387
387
|
workingDir: session.workingDir,
|
|
388
388
|
effort: session.effort,
|
|
389
|
+
// v4.15 — Per-workspace model override (optional YAML `model:` field).
|
|
390
|
+
// When unset, falls through to the globally active provider's model.
|
|
391
|
+
...(workspace.model ? { model: workspace.model } : {}),
|
|
389
392
|
abortSignal: session.abortController.signal,
|
|
390
393
|
// User's UI locale — registry uses it to localize failure messages.
|
|
391
394
|
locale: session.language,
|
package/dist/handlers/platform-message.js
CHANGED
|
@@ -169,6 +169,9 @@ export async function handlePlatformMessage(msg, adapter) {
|
|
|
169
169
|
systemPrompt,
|
|
170
170
|
workingDir: session.workingDir,
|
|
171
171
|
effort: session.effort,
|
|
172
|
+
// v4.15 — Per-workspace model override (optional YAML `model:` field).
|
|
173
|
+
// When unset, falls through to the globally active provider's model.
|
|
174
|
+
...(workspace.model ? { model: workspace.model } : {}),
|
|
172
175
|
sessionId: isSDK ? session.sessionId : null,
|
|
173
176
|
history: !isSDK ? session.history : undefined,
|
|
174
177
|
// v4.14 — Expose alvin_dispatch_agent MCP tool on non-Telegram
|
package/dist/providers/claude-sdk-provider.js
CHANGED
|
@@ -49,7 +49,10 @@ export class ClaudeSDKProvider {
|
|
|
49
49
|
this.config = {
|
|
50
50
|
type: "claude-sdk",
|
|
51
51
|
name: "Claude (Agent SDK)",
|
|
52
|
-
model:
|
|
52
|
+
// "inherit" = don't pass model: to the SDK → Claude CLI default wins
|
|
53
|
+
// (currently Opus 4.7 on Max subscription). Override with an alias
|
|
54
|
+
// ("opus" | "sonnet" | "haiku") or a full ID ("claude-opus-4-7").
|
|
55
|
+
model: "inherit",
|
|
53
56
|
supportsTools: true,
|
|
54
57
|
supportsVision: true,
|
|
55
58
|
supportsStreaming: true,
|
|
@@ -123,6 +126,13 @@ export class ClaudeSDKProvider {
|
|
|
123
126
|
if (options.alvinDispatchContext) {
|
|
124
127
|
defaultAllowed.push("mcp__alvin__dispatch_agent");
|
|
125
128
|
}
|
|
129
|
+
// v4.15 — Forward model selection to the Agent SDK. Resolution order:
|
|
130
|
+
// 1. options.model (per-query override — e.g. workspace `model:` field)
|
|
131
|
+
// 2. this.config.model (provider-level default — e.g. claude-sonnet)
|
|
132
|
+
// 3. "inherit" → don't pass model: → Claude CLI default (Opus 4.7 on Max)
|
|
133
|
+
// Aliases "opus" | "sonnet" | "haiku" auto-resolve to the latest tier.
|
|
134
|
+
const rawModel = options.model ?? this.config.model;
|
|
135
|
+
const modelOverride = rawModel && rawModel !== "inherit" ? rawModel : undefined;
|
|
126
136
|
const q = query({
|
|
127
137
|
prompt,
|
|
128
138
|
options: {
|
|
@@ -145,6 +155,12 @@ export class ClaudeSDKProvider {
|
|
|
145
155
|
effort: (options.effort || "medium"),
|
|
146
156
|
maxTurns: 50,
|
|
147
157
|
betas: ["context-1m-2025-08-07"],
|
|
158
|
+
...(modelOverride ? { model: modelOverride } : {}),
|
|
159
|
+
// Always prefer Haiku as fallback on rate-limit/overload — cheap
|
|
160
|
+
// and fast, keeps the bot responsive when the primary tier is
|
|
161
|
+
// throttled. Users can disable this by setting model: "inherit"
|
|
162
|
+
// and relying purely on CLI defaults.
|
|
163
|
+
fallbackModel: "haiku",
|
|
148
164
|
},
|
|
149
165
|
});
|
|
150
166
|
let accumulatedText = "";
|
|
@@ -370,9 +386,12 @@ export class ClaudeSDKProvider {
|
|
|
370
386
|
}
|
|
371
387
|
}
|
|
372
388
|
getInfo() {
|
|
389
|
+
const model = this.config.model === "inherit"
|
|
390
|
+
? "CLI default (latest)"
|
|
391
|
+
: this.config.model;
|
|
373
392
|
return {
|
|
374
393
|
name: this.config.name,
|
|
375
|
-
model
|
|
394
|
+
model,
|
|
376
395
|
status: "✅ Agent SDK (CLI auth)",
|
|
377
396
|
};
|
|
378
397
|
}
|
package/dist/providers/registry.js
CHANGED
|
@@ -271,13 +271,38 @@ export function createRegistry(config) {
|
|
|
271
271
|
model: "gpt-5.4",
|
|
272
272
|
};
|
|
273
273
|
}
|
|
274
|
-
//
|
|
275
|
-
|
|
274
|
+
// Claude (Agent SDK) — the base provider plus three tier-aliased virtual
|
|
275
|
+
// entries. All four route through the same ClaudeSDKProvider implementation
|
|
276
|
+
// but pass a different `model:` to the Agent SDK at query time. The aliases
|
|
277
|
+
// ("opus" | "sonnet" | "haiku") auto-resolve to the latest tier on the
|
|
278
|
+
// Claude CLI — no hardcoded version IDs, no manual updates when Anthropic
|
|
279
|
+
// releases a new model.
|
|
280
|
+
const claudeKeys = ["claude-sdk", "claude-opus", "claude-sonnet", "claude-haiku"];
|
|
281
|
+
const claudeReferenced = claudeKeys.some((k) => config.primary === k || config.fallbacks?.includes(k));
|
|
282
|
+
if (claudeReferenced) {
|
|
276
283
|
providers["claude-sdk"] = {
|
|
277
284
|
...PROVIDER_PRESETS["claude-sdk"],
|
|
278
285
|
type: "claude-sdk",
|
|
279
286
|
name: "Claude (Agent SDK)",
|
|
280
|
-
model: "
|
|
287
|
+
model: "inherit", // CLI default → currently Opus 4.7 on Max plan
|
|
288
|
+
};
|
|
289
|
+
providers["claude-opus"] = {
|
|
290
|
+
...PROVIDER_PRESETS["claude-sdk"],
|
|
291
|
+
type: "claude-sdk",
|
|
292
|
+
name: "Claude Opus (auto-latest)",
|
|
293
|
+
model: "opus",
|
|
294
|
+
};
|
|
295
|
+
providers["claude-sonnet"] = {
|
|
296
|
+
...PROVIDER_PRESETS["claude-sdk"],
|
|
297
|
+
type: "claude-sdk",
|
|
298
|
+
name: "Claude Sonnet (auto-latest)",
|
|
299
|
+
model: "sonnet",
|
|
300
|
+
};
|
|
301
|
+
providers["claude-haiku"] = {
|
|
302
|
+
...PROVIDER_PRESETS["claude-sdk"],
|
|
303
|
+
type: "claude-sdk",
|
|
304
|
+
name: "Claude Haiku (auto-latest)",
|
|
305
|
+
model: "haiku",
|
|
281
306
|
};
|
|
282
307
|
}
|
|
283
308
|
// Register Google Gemini only if explicitly referenced as primary/fallback
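The four Claude entries registered above differ only in key, display name, and `model:` value, so the same registration can be expressed as a loop over a table. This is an alternative sketch, not the shipped code: `CLAUDE_TIERS` and `registerClaudeEntries` are hypothetical names, and `preset` stands in for `PROVIDER_PRESETS["claude-sdk"]`.

```javascript
// Keys, names, and model values copied from the diff above.
const CLAUDE_TIERS = [
  { key: "claude-sdk", name: "Claude (Agent SDK)", model: "inherit" },
  { key: "claude-opus", name: "Claude Opus (auto-latest)", model: "opus" },
  { key: "claude-sonnet", name: "Claude Sonnet (auto-latest)", model: "sonnet" },
  { key: "claude-haiku", name: "Claude Haiku (auto-latest)", model: "haiku" },
];

// Hypothetical helper: registers all four entries if any one is referenced
// as primary or fallback, matching the condition in the diff.
function registerClaudeEntries(providers, preset, config) {
  const referenced = CLAUDE_TIERS.some(
    (t) => config.primary === t.key || config.fallbacks?.includes(t.key)
  );
  if (!referenced) return providers;
  for (const { key, name, model } of CLAUDE_TIERS) {
    providers[key] = { ...preset, type: "claude-sdk", name, model };
  }
  return providers;
}

const result = registerClaudeEntries({}, { supportsTools: true }, { primary: "claude-sonnet" });
console.log(Object.keys(result));
```

A table keeps the four entries from drifting apart if another tier alias is added later.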
|
package/dist/providers/types.js
CHANGED
|
@@ -38,8 +38,8 @@ export const PROVIDER_PRESETS = {
|
|
|
38
38
|
},
|
|
39
39
|
"claude-sonnet": {
|
|
40
40
|
type: "openai-compatible",
|
|
41
|
-
name: "Claude Sonnet 4",
|
|
42
|
-
model: "claude-sonnet-4-
|
|
41
|
+
name: "Claude Sonnet 4.6",
|
|
42
|
+
model: "claude-sonnet-4-6",
|
|
43
43
|
baseUrl: "https://api.anthropic.com/v1/",
|
|
44
44
|
supportsVision: true,
|
|
45
45
|
supportsStreaming: true,
|
|
@@ -48,8 +48,8 @@ export const PROVIDER_PRESETS = {
|
|
|
48
48
|
},
|
|
49
49
|
"claude-haiku": {
|
|
50
50
|
type: "openai-compatible",
|
|
51
|
-
name: "Claude
|
|
52
|
-
model: "claude-
|
|
51
|
+
name: "Claude Haiku 4.5",
|
|
52
|
+
model: "claude-haiku-4-5",
|
|
53
53
|
baseUrl: "https://api.anthropic.com/v1/",
|
|
54
54
|
supportsVision: true,
|
|
55
55
|
supportsStreaming: true,
|
package/dist/services/async-agent-watcher.js
CHANGED
|
@@ -33,6 +33,31 @@ const POLL_INTERVAL_MS = 15_000;
|
|
|
33
33
|
* a timeout banner. SEO audits historically take ~13 min, so 12h
|
|
34
34
|
* is absurdly generous and protects against state-file growth. */
|
|
35
35
|
const MAX_AGENT_AGE_MS = 12 * 60 * 60 * 1000;
|
|
36
|
+
/**
|
|
37
|
+
* v4.14.2 — When a dispatched subprocess never creates its outputFile
|
|
38
|
+
* (spawn failure, crash before first write, file deleted externally),
|
|
39
|
+
* `parseOutputFileStatus` returns "missing" on every poll. Pre-v4.14.2
|
|
40
|
+
* that meant waiting the full 12h MAX_AGENT_AGE_MS before delivering a
|
|
41
|
+
* timeout — a 12-hour zombie in `/subagents list`.
|
|
42
|
+
*
|
|
43
|
+
* This threshold caps how long we tolerate a missing file before
|
|
44
|
+
* declaring the agent failed. `claude -p` normally writes its first
|
|
45
|
+
* JSONL line within seconds of spawn; 10 minutes is way above any
|
|
46
|
+
* legitimate startup variance and well below the 12h ceiling.
|
|
47
|
+
*
|
|
48
|
+
* Configurable via the ALVIN_MISSING_FILE_FAILURE_MS env var. Tests
|
|
49
|
+
* use shorter values via the same hook. Only the getter is exposed
|
|
50
|
+
* so callers always see the current env value, not a stale constant.
|
|
51
|
+
*/
|
|
52
|
+
function getMissingFileFailureMs() {
|
|
53
|
+
const raw = process.env.ALVIN_MISSING_FILE_FAILURE_MS;
|
|
54
|
+
if (raw) {
|
|
55
|
+
const n = Number(raw);
|
|
56
|
+
if (Number.isFinite(n) && n > 0)
|
|
57
|
+
return n;
|
|
58
|
+
}
|
|
59
|
+
return 10 * 60 * 1000; // default 10 min
|
|
60
|
+
}
|
|
36
61
|
// ── Module state ──────────────────────────────────────────────────
|
|
37
62
|
const pending = new Map();
|
|
38
63
|
let pollTimer = null;
|
|
@@ -139,6 +164,7 @@ export function stopWatcher() {
|
|
|
139
164
|
export async function pollOnce() {
|
|
140
165
|
const now = Date.now();
|
|
141
166
|
const toRemove = [];
|
|
167
|
+
const missingFileFailureMs = getMissingFileFailureMs();
|
|
142
168
|
for (const entry of pending.values()) {
|
|
143
169
|
entry.lastCheckedAt = now;
|
|
144
170
|
// Timeout check first — if the agent is past its giveUpAt, give up
|
|
@@ -157,7 +183,15 @@ export async function pollOnce() {
|
|
|
157
183
|
await deliverAsFailure(entry, "error", status.error);
|
|
158
184
|
toRemove.push(entry.agentId);
|
|
159
185
|
}
|
|
160
|
-
|
|
186
|
+
else if (status.state === "missing" &&
|
|
187
|
+
now - entry.startedAt > missingFileFailureMs) {
|
|
188
|
+
// v4.14.2 — Zombie guard: the subprocess never created its
|
|
189
|
+
// output file within `missingFileFailureMs` (default 10 min).
|
|
190
|
+
// Declare failed instead of polling until the 12h giveUpAt.
|
|
191
|
+
await deliverAsFailure(entry, "error", `Dispatched subprocess never wrote its output file (${Math.round((now - entry.startedAt) / 60_000)}m after start). Likely crashed before initializing, or the file was removed externally.`);
|
|
192
|
+
toRemove.push(entry.agentId);
|
|
193
|
+
}
|
|
194
|
+
// running / missing-but-young → keep polling next cycle
|
|
161
195
|
}
|
|
162
196
|
if (toRemove.length > 0) {
|
|
163
197
|
for (const id of toRemove)
|
package/dist/services/env-file.js
|
@@ -0,0 +1,46 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* env-file — Shared helpers for reading and persisting key=value pairs
|
|
3
|
+
* in ~/.alvin-bot/.env. Previously private to setup-api.ts; extracted so
|
|
4
|
+
* Telegram command handlers (e.g. /model) can persist the user's runtime
|
|
5
|
+
* choices across bot restarts.
|
|
6
|
+
*
|
|
7
|
+
* All writes go through writeSecure() which enforces 0o600 on the env
|
|
8
|
+
* file — it contains bot tokens and API keys.
|
|
9
|
+
*/
|
|
10
|
+
import fs from "fs";
|
|
11
|
+
import { ENV_FILE } from "../paths.js";
|
|
12
|
+
import { writeSecure } from "./file-permissions.js";
|
|
13
|
+
/** Read the env file into a plain object. Skips comments and malformed lines. */
|
|
14
|
+
export function readEnv() {
|
|
15
|
+
if (!fs.existsSync(ENV_FILE))
|
|
16
|
+
return {};
|
|
17
|
+
const lines = fs.readFileSync(ENV_FILE, "utf-8").split("\n");
|
|
18
|
+
const env = {};
|
|
19
|
+
for (const line of lines) {
|
|
20
|
+
if (line.startsWith("#") || !line.includes("="))
|
|
21
|
+
continue;
|
|
22
|
+
const idx = line.indexOf("=");
|
|
23
|
+
env[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
|
|
24
|
+
}
|
|
25
|
+
return env;
|
|
26
|
+
}
|
|
27
|
+
/** Upsert a key=value pair in the env file, preserving all other lines. */
|
|
28
|
+
export function writeEnvVar(key, value) {
|
|
29
|
+
let content = fs.existsSync(ENV_FILE) ? fs.readFileSync(ENV_FILE, "utf-8") : "";
|
|
30
|
+
const regex = new RegExp(`^${key}=.*$`, "m");
|
|
31
|
+
if (regex.test(content)) {
|
|
32
|
+
content = content.replace(regex, `${key}=${value}`);
|
|
33
|
+
}
|
|
34
|
+
else {
|
|
35
|
+
content = content.trimEnd() + `\n${key}=${value}\n`;
|
|
36
|
+
}
|
|
37
|
+
writeSecure(ENV_FILE, content);
|
|
38
|
+
}
|
|
39
|
+
/** Remove a key from the env file. No-op if missing. */
|
|
40
|
+
export function removeEnvVar(key) {
|
|
41
|
+
if (!fs.existsSync(ENV_FILE))
|
|
42
|
+
return;
|
|
43
|
+
let content = fs.readFileSync(ENV_FILE, "utf-8");
|
|
44
|
+
content = content.replace(new RegExp(`^${key}=.*\n?`, "m"), "");
|
|
45
|
+
writeSecure(ENV_FILE, content);
|
|
46
|
+
}
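The upsert and parse logic of this module can be exercised without touching the filesystem by working on strings. The sketch below is an illustration under stated assumptions: `upsertEnvLine`, `parseEnv`, and `escapeRegExp` are hypothetical names. It also differs from the shipped helper in two ways that are additions of this sketch: the key is escaped before it goes into the `RegExp`, and a function replacer is used so `$` sequences in the value are not interpreted.

```javascript
// Escape regex metacharacters in a key (addition of this sketch).
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// String-only version of the upsert: replace the existing line or append one.
function upsertEnvLine(content, key, value) {
  const regex = new RegExp(`^${escapeRegExp(key)}=.*$`, "m");
  if (regex.test(content)) {
    return content.replace(regex, () => `${key}=${value}`);
  }
  return content.trimEnd() + `\n${key}=${value}\n`;
}

// String-only version of the reader: skips comments and lines without "=".
function parseEnv(content) {
  const env = {};
  for (const line of content.split("\n")) {
    if (line.startsWith("#") || !line.includes("=")) continue;
    const idx = line.indexOf("=");
    env[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return env;
}

const before = "BOT_TOKEN=abc\nPRIMARY_PROVIDER=claude-sdk\n";
const after = upsertEnvLine(before, "PRIMARY_PROVIDER", "claude-opus");
console.log(parseEnv(after).PRIMARY_PROVIDER);
```

Splitting on the first `=` only means values that themselves contain `=` survive a round trip.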
|
package/dist/services/workspaces.js
CHANGED
|
@@ -99,6 +99,7 @@ function readWorkspaceFile(filePath, name) {
|
|
|
99
99
|
const cwd = expandHome(rawCwd);
|
|
100
100
|
const color = typeof fm.color === "string" ? fm.color : undefined;
|
|
101
101
|
const emoji = typeof fm.emoji === "string" ? fm.emoji : undefined;
|
|
102
|
+
const model = typeof fm.model === "string" && fm.model.trim() ? fm.model.trim() : undefined;
|
|
102
103
|
const channels = Array.isArray(fm.channels)
|
|
103
104
|
? fm.channels.filter((c) => typeof c === "string")
|
|
104
105
|
: [];
|
|
@@ -109,6 +110,7 @@ function readWorkspaceFile(filePath, name) {
|
|
|
109
110
|
color,
|
|
110
111
|
emoji,
|
|
111
112
|
channels,
|
|
113
|
+
model,
|
|
112
114
|
systemPromptOverride: body.trim(),
|
|
113
115
|
};
|
|
114
116
|
}
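The frontmatter handling above and the spread used in the message handlers fit together as follows. This is a small sketch with hypothetical function names (`normalizeModel`, `buildQueryOpts`); the expressions inside them are the ones shown in the diff.

```javascript
// Accept only non-empty strings; anything else means "no override".
function normalizeModel(frontmatter) {
  return typeof frontmatter.model === "string" && frontmatter.model.trim()
    ? frontmatter.model.trim()
    : undefined;
}

// Add `model` to the query options only when the workspace defines one.
function buildQueryOpts(base, workspace) {
  return {
    ...base,
    ...(workspace.model ? { model: workspace.model } : {}),
  };
}

const pinned = { model: normalizeModel({ model: " sonnet " }) };
const unpinned = { model: normalizeModel({ purpose: "Interview prep" }) };
console.log(buildQueryOpts({ effort: "medium" }, pinned));
console.log(buildQueryOpts({ effort: "medium" }, unpinned));
```

Because the key is left out entirely for workspaces without `model:`, the provider's own resolution order still applies to them.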
|
package/dist/web/openai-compat.js
CHANGED
|
@@ -78,7 +78,7 @@ async function handleChatCompletions(req, res, body) {
|
|
|
78
78
|
const { prompt, systemPrompt } = buildPromptFromMessages(oaiReq.messages);
|
|
79
79
|
const completionId = `chatcmpl-${crypto.randomUUID().replace(/-/g, "").slice(0, 24)}`;
|
|
80
80
|
const created = Math.floor(Date.now() / 1000);
|
|
81
|
-
const model = oaiReq.model || "claude-opus-4";
|
|
81
|
+
const model = oaiReq.model || "claude-opus-4-6";
|
|
82
82
|
// Optional session resumption via header
|
|
83
83
|
const sessionId = req.headers["x-session-id"] || null;
|
|
84
84
|
const p = getProvider();
|
package/dist/web/setup-api.js
CHANGED
|
@@ -12,42 +12,8 @@ import { execSync } from "child_process";
|
|
|
12
12
|
import { getRegistry } from "../engine.js";
|
|
13
13
|
import { listJobs, createJob, deleteJob, toggleJob, updateJob, runJobNow, formatNextRun, humanReadableSchedule } from "../services/cron.js";
|
|
14
14
|
import { storePassword, revokePassword, getSudoStatus, verifyPassword, sudoExec, requestAdminViaDialog, openSystemSettings } from "../services/sudo.js";
|
|
15
|
-
import {
|
|
16
|
-
import {
|
|
17
|
-
// ── Env Helpers ─────────────────────────────────────────
|
|
18
|
-
function readEnv() {
|
|
19
|
-
if (!fs.existsSync(ENV_FILE))
|
|
20
|
-
return {};
|
|
21
|
-
const lines = fs.readFileSync(ENV_FILE, "utf-8").split("\n");
|
|
22
|
-
const env = {};
|
|
23
|
-
for (const line of lines) {
|
|
24
|
-
if (line.startsWith("#") || !line.includes("="))
|
|
25
|
-
continue;
|
|
26
|
-
const idx = line.indexOf("=");
|
|
27
|
-
env[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
|
|
28
|
-
}
|
|
29
|
-
return env;
|
|
30
|
-
}
|
|
31
|
-
function writeEnvVar(key, value) {
|
|
32
|
-
let content = fs.existsSync(ENV_FILE) ? fs.readFileSync(ENV_FILE, "utf-8") : "";
|
|
33
|
-
const regex = new RegExp(`^${key}=.*$`, "m");
|
|
34
|
-
if (regex.test(content)) {
|
|
35
|
-
content = content.replace(regex, `${key}=${value}`);
|
|
36
|
-
}
|
|
37
|
-
else {
|
|
38
|
-
content = content.trimEnd() + `\n${key}=${value}\n`;
|
|
39
|
-
}
|
|
40
|
-
// v4.12.2 — .env contains all secrets (bot tokens, API keys). Enforce
|
|
41
|
-
// 0o600 so other users on the machine can't read it.
|
|
42
|
-
writeSecure(ENV_FILE, content);
|
|
43
|
-
}
|
|
44
|
-
function removeEnvVar(key) {
|
|
45
|
-
if (!fs.existsSync(ENV_FILE))
|
|
46
|
-
return;
|
|
47
|
-
let content = fs.readFileSync(ENV_FILE, "utf-8");
|
|
48
|
-
content = content.replace(new RegExp(`^${key}=.*\n?`, "m"), "");
|
|
49
|
-
writeSecure(ENV_FILE, content);
|
|
50
|
-
}
|
|
15
|
+
import { CUSTOM_MODELS as CUSTOM_MODELS_FILE, BOT_ROOT, WHATSAPP_AUTH } from "../paths.js";
|
|
16
|
+
import { readEnv, writeEnvVar, removeEnvVar } from "../services/env-file.js";
|
|
51
17
|
function loadCustomModels() {
|
|
52
18
|
try {
|
|
53
19
|
return JSON.parse(fs.readFileSync(CUSTOM_MODELS_FILE, "utf-8"));
|
|
@@ -180,9 +146,9 @@ const PROVIDERS = [
|
|
|
180
146
|
description: "Claude Opus, Sonnet, Haiku directly via API key. OpenAI-compatible.",
|
|
181
147
|
envKey: "ANTHROPIC_API_KEY",
|
|
182
148
|
models: [
|
|
183
|
-
{ key: "claude-opus", name: "Claude Opus 4", model: "claude-opus-4-6" },
|
|
184
|
-
{ key: "claude-sonnet", name: "Claude Sonnet 4", model: "claude-sonnet-4-
|
|
185
|
-
{ key: "claude-haiku", name: "Claude
|
|
149
|
+
{ key: "claude-opus", name: "Claude Opus 4.6", model: "claude-opus-4-6" },
|
|
150
|
+
{ key: "claude-sonnet", name: "Claude Sonnet 4.6", model: "claude-sonnet-4-6" },
|
|
151
|
+
{ key: "claude-haiku", name: "Claude Haiku 4.5", model: "claude-haiku-4-5" },
|
|
186
152
|
],
|
|
187
153
|
signupUrl: "https://console.anthropic.com/settings/keys",
|
|
188
154
|
docsUrl: "https://docs.anthropic.com/en/api",
|
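Note on the env helpers moved out of `setup-api.js` in this hunk: the removed `writeEnvVar` built its regex directly from `key` and passed a plain string as the replacement, so a key containing regex metacharacters or a value containing `$&` could behave unexpectedly. The contents of the new `services/env-file.js` are not shown in this part of the diff, so the following is only a minimal sketch of a safer upsert, written as a pure string function. `escapeRegExp` and `upsertEnvVar` are hypothetical names, not the package's API.

```typescript
// Hypothetical helpers; not taken from services/env-file.js.

// Escape characters that have special meaning inside a RegExp source.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return `content` with `key=value` replaced in place, or appended if absent.
function upsertEnvVar(content: string, key: string, value: string): string {
  const line = `${key}=${value}`;
  const regex = new RegExp(`^${escapeRegExp(key)}=.*$`, "m");
  if (regex.test(content)) {
    // A function replacer keeps `$&`, `$1`, etc. in the value literal.
    return content.replace(regex, () => line);
  }
  const base = content.trimEnd();
  return (base ? base + "\n" : "") + line + "\n";
}
```

Keeping the function pure (string in, string out) leaves the file write, and the 0o600 permission handling done by `writeSecure`, to the caller.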
package/docs/security.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# Alvin Bot — Security Threat Model & Hardening Guide
|
|
2
2
|
|
|
3
|
-
> **Last updated:** 2026-04-
|
|
3
|
+
> **Last updated:** 2026-04-16 (v4.14.2)
|
|
4
4
|
> **Audience:** Operators installing Alvin Bot on their own machine.
|
|
5
5
|
> **Short version:** Alvin Bot is a full AI agent with shell, filesystem, and network access on the machine it runs on. Treat it like you would `sudo` access. Only install on machines where you would trust Claude Code to run without supervision.
|
|
6
6
|
|
|
@@ -270,6 +270,20 @@ If you suspect the bot has been compromised or exfiltrated secrets:
|
|
|
270
270
|
|
|
271
271
|
## Version history
|
|
272
272
|
|
|
273
|
+
- **v4.14.2** (2026-04-16) — Watcher zombie guard: missing outputFile > 10 min (env-configurable) delivers as failed instead of 12h timeout. Prevents stuck pending entries when a dispatched `claude -p` subprocess crashes before writing output or the file gets removed externally. No new attack surface.
|
|
274
|
+
|
|
275
|
+
- **v4.14.1** (2026-04-16) — `/subagents list` unified view: merges v4.0.0 bot-level `activeAgents` registry with v4.13+ `async-agent-watcher` pending registry. Cosmetic/diagnostic only, no security implications.
|
|
276
|
+
|
|
277
|
+
- **v4.14.0** (2026-04-16) — Sub-agent dispatch on Slack / Discord / WhatsApp via the `alvin_dispatch_agent` MCP tool. New `delivery-registry` module routes sub-agent deliveries to the right platform adapter. Types widened (`chatId: number | string`, `platform?: ...`). Telegram path bit-for-bit unchanged. Trust boundary expanded: each non-Telegram platform adapter now has `sendText` access to its respective channel — same trust level as the main adapter's `sendText`, no new capabilities.
|
|
278
|
+
|
|
279
|
+
- **v4.13.2** (2026-04-16) — Slack `/alvin` slash command via Bolt `app.command()` handler. Requires the `commands` OAuth scope on the Slack app. Subcommand parsing is case-insensitive on the command word, preserves args verbatim. Ack within 3 seconds; response via `chat.postMessage` (persistent, channel-visible). No new network surface.
|
|
280
|
+
|
|
281
|
+
- **v4.13.1** (2026-04-16) — Slack Test Connection endpoint validated via `auth.test` (cheap, no ambient state change). Maintenance UI (`/api/pm2/*` routes, kept for compat) now auto-detects launchd / PM2 / standalone via new `process-manager` abstraction. No new external attack surface.
|
|
282
|
+
|
|
283
|
+
- **v4.13.0** (2026-04-16) — **Architectural**: `alvin_dispatch_agent` MCP tool spawns truly detached `claude -p` subprocesses via `child_process.spawn({ detached: true, ..., unref() })`. The subprocess inherits current env (with `CLAUDECODE`/`CLAUDE_CODE_ENTRYPOINT` stripped to prevent nested-session errors) and writes stream-json to `~/.alvin-bot/subagents/<agentId>.jsonl`. Trust boundary: each dispatched subprocess runs with the same user privileges as the parent bot — same trust as `Bash` tool executions. The subprocess has its own separate abort lifecycle; parent abort (e.g. bypass-abort from v4.12.3) no longer cascades into killing the sub-agent, which was a legitimate concern under the old Task-tool-based flow.
|
|
284
|
+
|
|
285
|
+
- **v4.12.4** (2026-04-16) — Parser staleness detection: if outputFile hasn't been written in `ALVIN_SUBAGENT_STALENESS_MS` (default 5 min) AND has usable assistant text, deliver as "completed with partial output" instead of waiting 12h for timeout. Recovers real work from agents interrupted mid-execution. No new privileges or surface.
|
|
286
|
+
|
|
273
287
|
- **v4.12.2** (2026-04-15) — First formal security release: file-permissions hardening, ALLOWED_USERS hard-fail, webhook timing-safe comparison, exec-guard metachar rejection, cron shell-job execGuard integration, sub-agent toolset presets (readonly, research), axios + claude-agent-sdk CVE patches. This document.
|
|
274
288
|
|
|
275
289
|
- **v4.12.0 – v4.12.1** — Multi-session + Slack + task-aware stuck timer. No dedicated security content, though the v4.12.0 session-key fix closed a confused-deputy bug on Slack/WhatsApp where all channels from the same user collapsed into one session.
|
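Note on the v4.13.0 entry above: the detached-subprocess pattern it describes can be sketched roughly as follows. This is an illustration under assumptions, not the package's dispatch code: `buildChildEnv` and `spawnDetached` are hypothetical names, and only the stripped variable names and the `detached` / `unref()` usage come from the changelog text.

```typescript
import { spawn } from "node:child_process";
import * as fs from "node:fs";

// Variables the changelog says are stripped to prevent nested-session errors.
const STRIPPED_VARS = ["CLAUDECODE", "CLAUDE_CODE_ENTRYPOINT"];

// Copy the parent environment without the stripped variables (hypothetical helper).
function buildChildEnv(
  parent: Record<string, string | undefined>,
): Record<string, string | undefined> {
  const env = { ...parent };
  for (const name of STRIPPED_VARS) delete env[name];
  return env;
}

// Start a command that outlives the parent and appends its output to a file
// (hypothetical helper; the real tool's arguments are not shown in this diff).
function spawnDetached(command: string, args: string[], outputFile: string): number | undefined {
  const fd = fs.openSync(outputFile, "a");
  const child = spawn(command, args, {
    detached: true,               // own process group, survives parent abort
    stdio: ["ignore", fd, fd],    // stdout and stderr go to the output file
    env: buildChildEnv(process.env),
  });
  child.on("error", (err) => console.error("spawn failed:", err.message));
  child.unref();                  // do not keep the parent's event loop alive
  fs.closeSync(fd);               // the child holds its own copy of the descriptor
  return child.pid;
}
```

Because the child runs with the parent's user privileges, the trust boundary is the one the entry states: the same as a `Bash` tool execution.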
package/package.json
CHANGED
package/test/watcher-zombie-fix.test.ts
|
@@ -0,0 +1,252 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* v4.14.2 — zombie-entry fix for async-agent-watcher.
|
|
3
|
+
*
|
|
4
|
+
* Problem: when the dispatched `claude -p` subprocess never produces
|
|
5
|
+
* its outputFile (crashed before the first write, spawn failed, file
|
|
6
|
+
* got deleted externally), `parseOutputFileStatus` returns "missing"
|
|
7
|
+
* on every poll. The watcher keeps polling forever until `giveUpAt`
|
|
8
|
+
* (12 hours) fires, then delivers a timeout banner. Meanwhile the
|
|
9
|
+
* entry hangs in `/subagents list` as a permanent "running" zombie.
|
|
10
|
+
*
|
|
11
|
+
* Fix: when status is "missing" for longer than
|
|
12
|
+
* `MISSING_FILE_FAILURE_MS` (default 10 min, env-configurable), the
|
|
13
|
+
* watcher declares the agent failed with a clear "output file never
|
|
14
|
+
* appeared" reason, delivers the failure banner, and removes the
|
|
15
|
+
* entry. 10 minutes is well above normal startup variance (seconds)
|
|
16
|
+
* and well below the 12h hard ceiling.
|
|
17
|
+
*
|
|
18
|
+
* Invariants preserved:
|
|
19
|
+
* - An agent whose output file DOES appear, even slowly, continues
|
|
20
|
+
* normally (missing on first poll, running on second, completed
|
|
21
|
+
* on third — same as v4.14.1).
|
|
22
|
+
* - The `completed` path (end_turn or stream-json result) is
|
|
23
|
+
* unchanged.
|
|
24
|
+
* - The `failed` path (existing "error" state from parser) is
|
|
25
|
+
* unchanged.
|
|
26
|
+
* - The 12h giveUpAt ceiling still applies — it's now just less
|
|
27
|
+
* likely to be hit because missing-file zombies resolve earlier.
|
|
28
|
+
*/
|
|
29
|
+
import { describe, it, expect, beforeEach, afterEach, vi } from "vitest";
|
|
30
|
+
import fs from "fs";
|
|
31
|
+
import os from "os";
|
|
32
|
+
import { resolve } from "path";
|
|
33
|
+
|
|
34
|
+
const TEST_DATA_DIR = resolve(
|
|
35
|
+
os.tmpdir(),
|
|
36
|
+
`alvin-zombie-${process.pid}-${Date.now()}`,
|
|
37
|
+
);
|
|
38
|
+
|
|
39
|
+
interface Delivered {
|
|
40
|
+
info: { name: string; status: string };
|
|
41
|
+
result: { status: string; output: string; error?: string };
|
|
42
|
+
}
|
|
43
|
+
let delivered: Delivered[] = [];
|
|
44
|
+
|
|
45
|
+
beforeEach(async () => {
|
|
46
|
+
if (fs.existsSync(TEST_DATA_DIR)) {
|
|
47
|
+
fs.rmSync(TEST_DATA_DIR, { recursive: true, force: true });
|
|
48
|
+
}
|
|
49
|
+
fs.mkdirSync(TEST_DATA_DIR, { recursive: true });
|
|
50
|
+
process.env.ALVIN_DATA_DIR = TEST_DATA_DIR;
|
|
51
|
+
// Reset the env override between tests
|
|
52
|
+
delete process.env.ALVIN_MISSING_FILE_FAILURE_MS;
|
|
53
|
+
delivered = [];
|
|
54
|
+
vi.resetModules();
|
|
55
|
+
vi.doMock("../src/services/subagent-delivery.js", () => ({
|
|
56
|
+
deliverSubAgentResult: async (info: unknown, result: unknown) => {
|
|
57
|
+
delivered.push({
|
|
58
|
+
info: info as Delivered["info"],
|
|
59
|
+
result: result as Delivered["result"],
|
|
60
|
+
});
|
|
61
|
+
},
|
|
62
|
+
attachBotApi: () => {},
|
|
63
|
+
__setBotApiForTest: () => {},
|
|
64
|
+
}));
|
|
65
|
+
});
|
|
66
|
+
|
|
67
|
+
afterEach(async () => {
|
|
68
|
+
try {
|
|
69
|
+
const mod = await import("../src/services/async-agent-watcher.js");
|
|
70
|
+
mod.stopWatcher();
|
|
71
|
+
mod.__resetForTest();
|
|
72
|
+
} catch {}
|
|
73
|
+
delete process.env.ALVIN_MISSING_FILE_FAILURE_MS;
|
|
74
|
+
});
|
|
75
|
+
|
|
76
|
+
describe("watcher zombie fix (v4.14.2)", () => {
|
|
77
|
+
it("missing file younger than threshold stays pending (no premature fail)", async () => {
|
|
78
|
+
// Threshold = 10 min. Backdate only 2 min. Expect: still pending.
|
|
79
|
+
const mod = await import("../src/services/async-agent-watcher.js");
|
|
80
|
+
mod.registerPendingAgent({
|
|
81
|
+
agentId: "young-zombie",
|
|
82
|
+
outputFile: `${TEST_DATA_DIR}/nonexistent.jsonl`,
|
|
83
|
+
description: "young",
|
|
84
|
+
prompt: "p",
|
|
85
|
+
chatId: 1,
|
|
86
|
+
userId: 1,
|
|
87
|
+
toolUseId: null,
|
|
88
|
+
});
|
|
89
|
+
// Forcibly set startedAt to 2 min ago
|
|
90
|
+
const pending = mod.listPendingAgents();
|
|
91
|
+
expect(pending).toHaveLength(1);
|
|
92
|
+
(pending[0] as { startedAt: number }).startedAt = Date.now() - 2 * 60_000;
|
|
93
|
+
|
|
94
|
+
await mod.pollOnce();
|
|
95
|
+
|
|
96
|
+
expect(delivered).toHaveLength(0);
|
|
97
|
+
expect(mod.listPendingAgents()).toHaveLength(1);
|
|
98
|
+
});
|
|
99
|
+
|
|
100
|
+
it("missing file older than threshold delivers failed + removes from pending", async () => {
|
|
101
|
+
process.env.ALVIN_MISSING_FILE_FAILURE_MS = "120000"; // 2 min for test
|
|
102
|
+
const mod = await import("../src/services/async-agent-watcher.js");
|
|
103
|
+
mod.registerPendingAgent({
|
|
104
|
+
agentId: "old-zombie",
|
|
105
|
+
outputFile: `${TEST_DATA_DIR}/never-appears.jsonl`,
|
|
106
|
+
description: "stuck crash zombie",
|
|
107
|
+
prompt: "p",
|
|
108
|
+
chatId: 1,
|
|
109
|
+
userId: 1,
|
|
110
|
+
toolUseId: null,
|
|
111
|
+
});
|
|
112
|
+
// Backdate 5 min (> 2 min threshold)
|
|
113
|
+
const pending = mod.listPendingAgents();
|
|
114
|
+
(pending[0] as { startedAt: number }).startedAt = Date.now() - 5 * 60_000;
|
|
115
|
+
|
|
116
|
+
await mod.pollOnce();
|
|
117
|
+
|
|
118
|
+
expect(delivered).toHaveLength(1);
|
|
119
|
+
expect(delivered[0].result.status).toBe("error");
|
|
120
|
+
// Error message should be explicit so user understands
|
|
121
|
+
expect(delivered[0].result.error).toMatch(/output file|never appeared|never wrote/i);
|
|
122
|
+
expect(mod.listPendingAgents()).toHaveLength(0);
|
|
123
|
+
});
|
|
124
|
+
|
|
125
|
+
it("default threshold is 10 min when env var is not set", async () => {
|
|
126
|
+
const mod = await import("../src/services/async-agent-watcher.js");
|
|
127
|
+
mod.registerPendingAgent({
|
|
128
|
+
agentId: "at-default",
|
|
129
|
+
outputFile: `${TEST_DATA_DIR}/z.jsonl`,
|
|
130
|
+
description: "default threshold",
|
|
131
|
+
prompt: "p",
|
|
132
|
+
chatId: 1,
|
|
133
|
+
userId: 1,
|
|
134
|
+
toolUseId: null,
|
|
135
|
+
});
|
|
136
|
+
// Backdate 9 min — still under the 10-min default, should stay pending
|
|
137
|
+
let p = mod.listPendingAgents();
|
|
138
|
+
(p[0] as { startedAt: number }).startedAt = Date.now() - 9 * 60_000;
|
|
139
|
+
await mod.pollOnce();
|
|
140
|
+
expect(delivered).toHaveLength(0);
|
|
141
|
+
expect(mod.listPendingAgents()).toHaveLength(1);
|
|
142
|
+
|
|
143
|
+
// Backdate to 11 min — over threshold, should fire
|
|
144
|
+
p = mod.listPendingAgents();
|
|
145
|
+
(p[0] as { startedAt: number }).startedAt = Date.now() - 11 * 60_000;
|
|
146
|
+
await mod.pollOnce();
|
|
147
|
+
expect(delivered).toHaveLength(1);
|
|
148
|
+
});
|
|
149
|
+
|
|
150
|
+
it("running file (has content, no end_turn) is unaffected by zombie check", async () => {
|
|
151
|
+
// A file WITH content should never trigger the missing-file path
|
|
152
|
+
// regardless of age.
|
|
153
|
+
const outPath = `${TEST_DATA_DIR}/running.jsonl`;
|
|
154
|
+
fs.writeFileSync(
|
|
155
|
+
outPath,
|
|
156
|
+
JSON.stringify({
|
|
157
|
+
type: "assistant",
|
|
158
|
+
isSidechain: true,
|
|
159
|
+
agentId: "x",
|
|
160
|
+
message: {
|
|
161
|
+
role: "assistant",
|
|
162
|
+
content: [{ type: "tool_use", name: "Bash", input: {} }],
|
|
163
|
+
stop_reason: "tool_use",
|
|
164
|
+
},
|
|
165
|
+
}) + "\n",
|
|
166
|
+
"utf-8",
|
|
167
|
+
);
|
|
168
|
+
const mod = await import("../src/services/async-agent-watcher.js");
|
|
169
|
+
mod.registerPendingAgent({
|
|
170
|
+
agentId: "active-work",
|
|
171
|
+
outputFile: outPath,
|
|
172
|
+
description: "legitimately running",
|
|
173
|
+
prompt: "p",
|
|
174
|
+
chatId: 1,
|
|
175
|
+
userId: 1,
|
|
176
|
+
toolUseId: null,
|
|
177
|
+
});
|
|
178
|
+
const p = mod.listPendingAgents();
|
|
179
|
+
(p[0] as { startedAt: number }).startedAt = Date.now() - 30 * 60_000; // 30 min old
|
|
180
|
+
|
|
181
|
+
await mod.pollOnce();
|
|
182
|
+
|
|
183
|
+
// v4.12.4 staleness detection COULD fire here because the file has
|
|
184
|
+
// text content and is stale. That's a different (benign) path — the
|
|
185
|
+
// agent gets delivered as "completed with partial output". Either
|
|
186
|
+
// way, the zombie-fix error path must NOT fire.
|
|
187
|
+
const anyZombieError = delivered.some(
|
|
188
|
+
(d) => d.result.error && /output file never/i.test(d.result.error),
|
|
189
|
+
);
|
|
190
|
+
expect(anyZombieError).toBe(false);
|
|
191
|
+
});
|
|
192
|
+
|
|
193
|
+
it("completed file delivers as completed (unchanged)", async () => {
|
|
194
|
+
const outPath = `${TEST_DATA_DIR}/done.jsonl`;
|
|
195
|
+
fs.writeFileSync(
|
|
196
|
+
outPath,
|
|
197
|
+
JSON.stringify({
|
|
198
|
+
type: "assistant",
|
|
199
|
+
agentId: "x",
|
|
200
|
+
message: {
|
|
201
|
+
content: [{ type: "text", text: "all good" }],
|
|
202
|
+
stop_reason: "end_turn",
|
|
203
|
+
},
|
|
204
|
+
}) + "\n",
|
|
205
|
+
"utf-8",
|
|
206
|
+
);
|
|
207
|
+
const mod = await import("../src/services/async-agent-watcher.js");
|
|
208
|
+
mod.registerPendingAgent({
|
|
209
|
+
agentId: "done-agent",
|
|
210
|
+
outputFile: outPath,
|
|
211
|
+
description: "clean completion",
|
|
212
|
+
prompt: "p",
|
|
213
|
+
chatId: 1,
|
|
214
|
+
userId: 1,
|
|
215
|
+
toolUseId: null,
|
|
216
|
+
});
|
|
217
|
+
// Backdate 1h — would trigger zombie if misapplied
|
|
218
|
+
const p = mod.listPendingAgents();
|
|
219
|
+
(p[0] as { startedAt: number }).startedAt = Date.now() - 60 * 60_000;
|
|
220
|
+
|
|
221
|
+
await mod.pollOnce();
|
|
222
|
+
|
|
223
|
+
expect(delivered).toHaveLength(1);
|
|
224
|
+
expect(delivered[0].result.status).toBe("completed");
|
|
225
|
+
});
|
|
226
|
+
|
|
227
|
+
it("decrements session counter on zombie failure delivery", async () => {
|
|
228
|
+
process.env.ALVIN_MISSING_FILE_FAILURE_MS = "1000"; // 1 sec for fast test
|
|
229
|
+
const sessionMod = await import("../src/services/session.js");
|
|
230
|
+
const session = sessionMod.getSession("zombie-session");
|
|
231
|
+
session.pendingBackgroundCount = 1;
|
|
232
|
+
|
|
233
|
+
const mod = await import("../src/services/async-agent-watcher.js");
|
|
234
|
+
mod.registerPendingAgent({
|
|
235
|
+
agentId: "session-zombie",
|
|
236
|
+
outputFile: `${TEST_DATA_DIR}/gone.jsonl`,
|
|
237
|
+
description: "zombie for counter",
|
|
238
|
+
prompt: "p",
|
|
239
|
+
chatId: 1,
|
|
240
|
+
userId: 1,
|
|
241
|
+
toolUseId: null,
|
|
242
|
+
sessionKey: "zombie-session",
|
|
243
|
+
});
|
|
244
|
+
const p = mod.listPendingAgents();
|
|
245
|
+
(p[0] as { startedAt: number }).startedAt = Date.now() - 5000; // 5 sec ago, > 1sec threshold
|
|
246
|
+
|
|
247
|
+
await mod.pollOnce();
|
|
248
|
+
|
|
249
|
+
expect(delivered).toHaveLength(1);
|
|
250
|
+
expect(session.pendingBackgroundCount).toBe(0);
|
|
251
|
+
});
|
|
252
|
+
});
|
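Note on the tests above: the decision they exercise can be sketched as a small pure function. This is a sketch under stated assumptions rather than the watcher's actual implementation, which is not shown in this part of the diff. The type names, `missingFileThresholdMs`, and `decideOnMissing` are hypothetical; the env variable name, the 10-minute default, and the use of `startedAt` as the age reference are taken from the test file.

```typescript
// Hypothetical types and names; only the env var and default come from the tests.
type FileStatus = "missing" | "running" | "completed" | "error";

interface PendingEntry {
  agentId: string;
  startedAt: number; // epoch milliseconds
}

type Decision = { action: "keep" } | { action: "fail"; error: string };

const DEFAULT_MISSING_FILE_FAILURE_MS = 10 * 60_000;

// Read the threshold from the environment, falling back to the default
// when the variable is unset, empty, non-numeric, or not positive.
function missingFileThresholdMs(env: Record<string, string | undefined>): number {
  const raw = Number(env.ALVIN_MISSING_FILE_FAILURE_MS);
  return Number.isFinite(raw) && raw > 0 ? raw : DEFAULT_MISSING_FILE_FAILURE_MS;
}

// Fail an entry only when its output file is still missing past the threshold.
function decideOnMissing(
  entry: PendingEntry,
  status: FileStatus,
  now: number,
  env: Record<string, string | undefined> = {},
): Decision {
  if (status !== "missing") return { action: "keep" };
  const ageMs = now - entry.startedAt;
  if (ageMs <= missingFileThresholdMs(env)) return { action: "keep" };
  return {
    action: "fail",
    error: `output file never appeared after ${Math.round(ageMs / 1000)}s`,
  };
}
```

Any status other than "missing" returns "keep" here, which mirrors the invariant that the running, completed, and parser-error paths are left to their existing handlers.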