alvin-bot 4.25.1 → 4.26.0

package/CHANGELOG.md CHANGED
@@ -2,6 +2,101 @@
2
2
 
3
3
  All notable changes to Alvin Bot are documented here.
4
4
 
5
+ ## [4.26.0] — 2026-05-13
6
+
7
+ ### Self-Preservation Phase 1 — four new resilience features, zero hot-path cost
8
+
9
+ The bot now **survives more failure modes** and **alerts you when it can't survive them**. All four features run event-driven or on low-frequency timers, so they add no hot-path overhead: measured RSS +4.4 MB and cold-start +81 ms versus the 4.25.1 baseline on a real Apple Silicon Mac (within the +5 MB / +2000 ms tolerance budget).
10
+
11
+ #### Pre-Flight Sanity Check at startup (feature 1A)
12
+
13
+ In parallel at boot, the bot now checks: (1) Telegram `getMe`, (2) AI provider `isAvailable()` — provider-agnostic via the existing Provider interface, works equally for `claude-sdk` / `codex-cli` / `groq` / `gemini` / `offline-gemma4` / etc., (3) SQLite `PRAGMA quick_check` on the embeddings DB, (4) Disk space ≥ 1 GB. Fire-and-forget — startup is **not** delayed; results land ~1 s after `Alvin Bot started` with severity-tagged output:
14
+
15
+ ```
16
+ 🩺 ✅ Pre-Flight: all checks ok — 986ms total
17
+ ✓ telegram bot=@AlvinMBAM4_bot (405ms)
18
+ ✓ ai-provider claude-sdk reachable (922ms)
19
+ ✓ sqlite embeddings DB integrity ok (43ms)
20
+ ✓ disk 53.28 GB free (37ms)
21
+ ```
22
+
23
+ Per-check timeouts (3 s / 5 s / 10 s / 2 s) bound the cost. Critical findings will feed Phase 2's auto-diagnostic (already wired). Opt-out: `ALVIN_DISABLE_PREFLIGHT=true`.
24
+
25
+ #### Critical-Event Cross-Channel Notify (feature 1D)
26
+
27
+ When the bot hits a state it can't recover from on its own — watchdog crash-loop brake engaged, repeated Telegram 409s, all providers dead, disk critically low — it now alerts the operator through a **fallback chain that doesn't depend on the bot's own platform being healthy**:
28
+
29
+ 1. **`~/.alvin-bot/CRITICAL.log`** — durable audit trail, always written first. Plain text, dated, machine-readable.
30
+ 2. **macOS native notification** via `osascript` — visible immediately on the user's desktop.
31
+ 3. **Telegram DM to admin** via `curl` — synchronous in exit-imminent contexts so the alert lands before `process.exit()` kills any pending I/O.
32
+
33
+ The synchronous-vs-detached distinction matters: detached child processes get killed by macOS+launchd before they finish their fork-and-exec when the parent exits within a few ms. The watchdog brake explicitly uses `blockTelegram: true` to spawnSync the curl POST and confirm the HTTP response code. Plain-text body (not Markdown) so shell-command `suggestedAction`s with `"`, `&&`, etc. don't trigger Telegram's `Bad Request: can't parse entities` error. Opt-out: `ALVIN_DISABLE_CRITICAL_NOTIFY=true`.
34
+
35
+ #### Zombie Dead-Man-Switch (feature 2E)
36
+
37
+ Bot writes a unix-timestamp heartbeat to `~/.alvin-bot/heartbeat.txt` every 60 s. A **separate, tiny launchd LaunchAgent** (`com.alvinbot.deadman`) wakes every 5 min and checks the heartbeat — if older than 10 min, the watcher fires `launchctl kickstart -k gui/$UID/com.alvinbot.app` to force-restart.
38
+
39
+ Catches the failure mode the in-process watchdog **cannot** see: the process is alive but frozen (event-loop deadlock, blocked I/O, native-binding hang). A watchdog that runs inside the frozen process is frozen with it, so detection has to come from an external observer.
40
+
41
+ Threshold overridable for testing: `ALVIN_DEADMAN_THRESHOLD_SEC=60` (default 600). End-to-end verified on a real Mac: `kill -STOP` froze the bot at PID X, watcher detected stale heartbeat 700 s old, kickstart fired, fresh PID Y came up within 8 s. CPU cost of the watcher: 0.017 %.
42
+
43
+ #### Auto-Diagnostic Logs-Collector (feature 2F)
44
+
45
+ On any critical failure, the bot now writes a structured forensic Markdown bundle to `~/.alvin-bot/diagnostics/<timestamp>-<category>.md` containing:
46
+
47
+ 1. Event detail + suggested action
48
+ 2. Process state (PID, RSS, heap, uptime, node version, platform, argv)
49
+ 3. Non-secret environment vars (PATH, PRIMARY_PROVIDER, FALLBACK_PROVIDERS, WEB_*, …)
50
+ 4. Last 200 lines of `alvin-bot.err.log`
51
+ 5. Last 200 lines of `alvin-bot.out.log`
52
+ 6. Watchdog state (`~/.alvin-bot/state/watchdog.json`)
53
+ 7. System tool inventory (`node`, `npm`, `brew`, `pm2`, `codex`, `claude`, `yt-dlp`, `ffmpeg`, `wacli`, `agent-browser`)
54
+ 8. Disk space (`df -h ~/.alvin-bot`)
55
+ 9. PM2 status (if PM2 installed — the same kind of state that bit us in 4.25.1)
56
+
57
+ Bundles are ~18 KB each, capped at 50 retained files (oldest pruned automatically). The Telegram DM from feature 1D now includes the bundle path so the operator can immediately `cat` or `scp` it.
58
+
59
+ These bundles are also the input that the planned 5.0.0 AI-Self-Diagnosis (feature 3I) will feed to a sub-agent for automated analysis. In 4.26.0 they stand on their own as a human-readable forensic dump.
60
+
61
+ Opt-out: `ALVIN_DISABLE_AUTO_DIAGNOSTIC=true`.
62
+
63
+ ### Bundle wacli (WhatsApp CLI) with conditional opt-in
64
+
65
+ `wacli` (https://wacli.sh, brew tap `steipete/tap`, v0.8.1, ~25 MB Go binary) is now part of `BOOTSTRAP_TOOLS` — but with a **hybrid install condition** that avoids forcing it onto users who don't use WhatsApp:
66
+
67
+ - **If `wacli` is already installed** → bootstrap runs `brew upgrade wacli` (treated like any other bundled tool).
68
+ - **If `WHATSAPP_ENABLED=true` is set in `.env`** → bootstrap installs via `brew install steipete/tap/wacli`.
69
+ - **Otherwise** → silent skip with dimmer `·` icon: `· wacli (WhatsApp CLI) skipped (not opted in)`.
70
+
71
+ License: see https://wacli.sh. alvin-bot does not redistribute wacli; it only invokes the user's own Homebrew, so the user remains the licensee. macOS only (no Linux build upstream; bootstrap skips on Linux automatically).
72
+
73
+ ### Opt-out env vars summary
74
+
75
+ For users who want minimal footprint:
76
+
77
+ ```
78
+ ALVIN_DISABLE_SELF_PRESERVATION=true # skip ALL Phase-1 features
79
+ ALVIN_DISABLE_PREFLIGHT=true # skip Pre-Flight only
80
+ ALVIN_DISABLE_CRITICAL_NOTIFY=true # skip cross-channel notify
81
+ ALVIN_DISABLE_DEAD_MAN=true # skip heartbeat writer
82
+ ALVIN_DISABLE_AUTO_DIAGNOSTIC=true # skip diagnostic bundles
83
+ ALVIN_DEADMAN_THRESHOLD_SEC=600 # tune dead-man threshold (default 10 min)
84
+ ```
85
+
86
+ ### Performance budget verified on real hardware
87
+
88
+ End-to-end measurements on Apple Silicon Mac (.75 test box):
89
+
90
+ | Metric | Baseline 4.25.1 | 4.26.0 | Δ | Tolerance |
91
+ |---|---|---|---|---|
92
+ | Cold-start ready (median, throttled) | 5023 ms | 5104 ms | +81 ms | +2000 ms |
93
+ | Cold-start ready (unthrottled, 1st run) | 2189 ms | 2170 ms | -19 ms | +2000 ms |
94
+ | RSS idle steady-state | ~102 MB | 106.4 MB | +4.4 MB | +5 MB |
95
+ | CPU idle | 0.0 % | 0.0 % | 0 | +0.1 % |
96
+ | Log dir growth | stable | stable | n/a | <1 KB/s |
97
+
98
+ All five metrics within tolerance.
99
+
5
100
  ## [4.25.1] — 2026-05-13
6
101
 
7
102
  ### Fixed: `alvin-bot launchd install` now persists the PM2 cleanup
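The Pre-Flight entry in the 4.26.0 changelog above describes four checks that start together, each bounded by its own timeout, with the report printed after startup has already continued. `services/preflight.js` itself is not part of this excerpt, so the following is only a minimal sketch of that pattern under stated assumptions: `withTimeout`, `runChecks`, and the shape of the check objects are hypothetical names, not the package's API.

```javascript
// Hypothetical sketch; none of these names come from services/preflight.js.

// Run one check, but never wait longer than timeoutMs for it.
function withTimeout(name, timeoutMs, fn) {
  const started = Date.now();
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(
      () => resolve({ ok: false, detail: `timed out after ${timeoutMs} ms` }),
      timeoutMs,
    );
  });
  // Convert both success and failure into a plain result object,
  // so a failing check can never reject the overall run.
  const attempt = Promise.resolve()
    .then(fn)
    .then((detail) => ({ ok: true, detail }))
    .catch((err) => ({ ok: false, detail: err?.message ?? String(err) }));
  return Promise.race([attempt, timeout]).then((outcome) => {
    clearTimeout(timer);
    return { name, ...outcome, ms: Date.now() - started };
  });
}

// Start every check at once; wall-clock time is roughly the slowest check.
async function runChecks(checks) {
  const started = Date.now();
  const results = await Promise.all(
    checks.map((c) => withTimeout(c.name, c.timeoutMs, c.run)),
  );
  return { ok: results.every((r) => r.ok), results, totalMs: Date.now() - started };
}
```

Because every check settles to a result object instead of rejecting, a caller can start `runChecks(...)` without awaiting it and log the report whenever it arrives, which matches the fire-and-forget behaviour the changelog describes.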
package/bin/cli.js CHANGED
@@ -272,6 +272,24 @@ const BOOTSTRAP_TOOLS = [
272
272
  install: { macos: "brew install ffmpeg", linux: "sudo apt-get install -y ffmpeg" },
273
273
  upgrade: { macos: "brew upgrade ffmpeg", linux: "sudo apt-get install --only-upgrade -y ffmpeg" },
274
274
  },
275
+ {
276
+ // wacli — WhatsApp CLI from steipete/tap. Hybrid bootstrap: only
277
+ // install/upgrade if the user has already installed it (we
278
+ // respect their existing setup) or has explicitly opted in via
279
+ // WHATSAPP_ENABLED=true in .env. This avoids pulling a ~25 MB
280
+ // Go binary onto every public user's machine, including those
281
+ // who never touch WhatsApp.
282
+ cmd: "wacli",
283
+ name: "wacli (WhatsApp CLI)",
284
+ license: "see https://wacli.sh — installed via your own brew, you remain the licensee",
285
+ install: { macos: "brew install steipete/tap/wacli", linux: null },
286
+ upgrade: { macos: "brew upgrade wacli", linux: null },
287
+ // Hybrid: only bootstrap if the user has explicitly signalled
288
+ // interest. installCondition is checked BEFORE any install/upgrade
289
+ // attempt; returns false → tool silently skipped.
290
+ installCondition: (env) =>
291
+ hasCommand("wacli") || env.WHATSAPP_ENABLED === "true",
292
+ },
275
293
  ];
276
294
 
277
295
  // Memoized: `brew update` is slow (5-30s) but needs to run at least once
@@ -309,6 +327,22 @@ function detectPlatformPm() {
309
327
  function bootstrapOneTool(tool, platform) {
310
328
  const cmdAvailable = hasCommand(tool.cmd);
311
329
 
330
+ // installCondition: optional gate that respects user intent. A tool with
331
+ // installCondition returning false is treated as "user hasn't opted in,
332
+ // silently skip". This is how wacli avoids forcing a 25 MB WhatsApp CLI
333
+ // onto every public user — only installs if they have it already or
334
+ // explicitly set WHATSAPP_ENABLED=true in .env.
335
+ if (typeof tool.installCondition === "function") {
336
+ try {
337
+ if (!tool.installCondition(process.env)) {
338
+ return { ok: true, skipped: true, message: `${tool.name} skipped (not opted in)` };
339
+ }
340
+ } catch {
341
+ // condition function threw — be defensive, skip
342
+ return { ok: true, skipped: true, message: `${tool.name} skipped (condition error)` };
343
+ }
344
+ }
345
+
312
346
  // Linux-only prerequisite check (e.g. pipx for yt-dlp).
313
347
  if (platform === "linux" && tool.linuxSkipIf && !hasCommand(tool.linuxSkipIf)) {
314
348
  return { ok: false, message: `${tool.name} skipped — needs '${tool.linuxSkipIf}' on Linux` };
@@ -376,12 +410,12 @@ async function ensureBootstrapTools(opts = {}) {
376
410
  const platform = detectPlatformPm();
377
411
  if (!platform) return;
378
412
 
379
- console.log("\n🎬 Setting up media tools (yt-dlp + ffmpeg)...");
413
+ console.log("\n🎬 Setting up bundled tools (yt-dlp, ffmpeg, wacli on opt-in)...");
380
414
 
381
415
  // macOS needs brew on PATH — same trick as ensureBrewOnPath() uses.
382
416
  if (platform === "macos" && !hasCommand("brew")) {
383
417
  if (!ensureBrewOnPath()) {
384
- console.log(" ⚠️ Skipping media-tool bootstrap — Homebrew not installed.");
418
+ console.log(" ⚠️ Skipping tool bootstrap — Homebrew not installed.");
385
419
  console.log(" To enable: install brew from https://brew.sh and re-run setup.");
386
420
  return;
387
421
  }
@@ -389,7 +423,9 @@ async function ensureBootstrapTools(opts = {}) {
389
423
 
390
424
  for (const tool of BOOTSTRAP_TOOLS) {
391
425
  const result = bootstrapOneTool(tool, platform);
392
- console.log(` ${result.ok ? "✓" : "⚠"} ${result.message}`);
426
+ // skipped (opt-in not signaled) use dimmer icon, less attention-grabbing
427
+ const icon = result.skipped ? "·" : result.ok ? "✓" : "⚠";
428
+ console.log(` ${icon} ${result.message}`);
393
429
  }
394
430
  console.log("");
395
431
  }
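The `installCondition` gate in the hunks above can be read as a small decision function. The sketch below is not the CLI's code: `decideBootstrap` and `exampleTool` are hypothetical, the command lookup is injected as `isInstalled` (the real condition calls `hasCommand` directly and receives only `env`), and the upgrade-if-present / install-otherwise split is taken from the changelog bullets rather than from code shown here.

```javascript
// Hypothetical sketch of the opt-in gate; not the code in bin/cli.js.
function decideBootstrap(tool, env, isInstalled) {
  if (typeof tool.installCondition === "function") {
    let optedIn = false;
    try {
      optedIn = Boolean(tool.installCondition(env, isInstalled));
    } catch {
      // A throwing condition is treated like "no": skip rather than fail setup.
      return { action: "skip", reason: "condition error" };
    }
    if (!optedIn) return { action: "skip", reason: "not opted in" };
  }
  // Tools without a condition, or with a satisfied one, follow the normal path.
  return { action: isInstalled(tool.cmd) ? "upgrade" : "install" };
}

// Illustrative tool entry mirroring the wacli rule described in the changelog.
const exampleTool = {
  cmd: "wacli",
  installCondition: (env, isInstalled) =>
    isInstalled("wacli") || env.WHATSAPP_ENABLED === "true",
};
```

Injecting `isInstalled` keeps the decision testable without probing the real PATH; the shipped code does not need that indirection.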
@@ -2688,7 +2724,80 @@ function launchdPaths() {
2688
2724
  const entryPoint = resolve(join(import.meta.dirname, "..", "dist", "index.js"));
2689
2725
  const cwd = resolve(join(import.meta.dirname, ".."));
2690
2726
  const nodePath = process.execPath;
2691
- return { home, label, plistPath, logDir, entryPoint, cwd, nodePath };
2727
+ // Dead-man-switch watcher (Self-Preservation Phase 1, feature 2E).
2728
+ // Separate, tiny LaunchAgent that fires every 5 min and force-restarts
2729
+ // the main bot if its heartbeat-file is stale. The two agents are
2730
+ // intentionally independent: if the main bot is wedged, the dead-man
2731
+ // agent is still scheduling and reading the file.
2732
+ const deadmanLabel = "com.alvinbot.deadman";
2733
+ const deadmanPlistPath = join(home, "Library", "LaunchAgents", `${deadmanLabel}.plist`);
2734
+ return { home, label, plistPath, logDir, entryPoint, cwd, nodePath, deadmanLabel, deadmanPlistPath };
2735
+ }
2736
+
2737
+ /**
2738
+ * Generate the dead-man watcher LaunchAgent plist. It runs a tiny shell
2739
+ * script every 5 minutes (StartInterval) that compares the bot's
2740
+ * heartbeat-file timestamp against now. If the heartbeat is more than
2741
+ * 10 minutes stale, it `launchctl kickstart -k`s the main bot.
2742
+ *
2743
+ * The threshold is overridable via ALVIN_DEADMAN_THRESHOLD_SEC for
2744
+ * testing; default is 600 s = 10 minutes.
2745
+ *
2746
+ * Why inline shell instead of a bundled script:
2747
+ * - Zero extra files to ship via npm
2748
+ * - Trivial to audit: 12 lines of POSIX sh
2749
+ * - launchctl is called by absolute path; cat, tr, date and id resolve via launchd's default PATH
2750
+ */
2751
+ function renderDeadmanPlist({ deadmanLabel, mainLabel, home, logDir }) {
2752
+ // Inline shell, kept POSIX-clean: standard utilities (cat, tr, date, id) plus launchctl.
2753
+ // The redirect to logDir/deadman.log gives us a record of any
2754
+ // kickstart actions without the watcher writing more than ~50
2755
+ // bytes per event.
2756
+ const script = `
2757
+ HEARTBEAT="${home}/.alvin-bot/heartbeat.txt"
2758
+ LOG="${logDir}/deadman.log"
2759
+ THRESHOLD="\${ALVIN_DEADMAN_THRESHOLD_SEC:-600}"
2760
+ if [ ! -f "$HEARTBEAT" ]; then exit 0; fi
2761
+ LAST=$(cat "$HEARTBEAT" 2>/dev/null | tr -d ' \\n')
2762
+ NOW=$(date +%s)
2763
+ case "$LAST" in
2764
+ ''|*[!0-9]*) exit 0 ;;
2765
+ esac
2766
+ DIFF=$((NOW - LAST))
2767
+ if [ "$DIFF" -gt "$THRESHOLD" ]; then
2768
+ echo "$(date -u +%FT%TZ) deadman: heartbeat $DIFF s old (> $THRESHOLD s), kickstarting ${mainLabel}" >> "$LOG"
2769
+ /bin/launchctl kickstart -k "gui/$(id -u)/${mainLabel}" 2>>"$LOG" || true
2770
+ fi
2771
+ `.trim();
2772
+
2773
+ return `<?xml version="1.0" encoding="UTF-8"?>
2774
+ <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
2775
+ <plist version="1.0">
2776
+ <dict>
2777
+ <key>Label</key>
2778
+ <string>${deadmanLabel}</string>
2779
+
2780
+ <key>ProgramArguments</key>
2781
+ <array>
2782
+ <string>/bin/sh</string>
2783
+ <string>-c</string>
2784
+ <string>${script.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;")}</string>
2785
+ </array>
2786
+
2787
+ <key>StartInterval</key>
2788
+ <integer>300</integer>
2789
+
2790
+ <key>RunAtLoad</key>
2791
+ <false/>
2792
+
2793
+ <key>StandardErrorPath</key>
2794
+ <string>${logDir}/deadman.err.log</string>
2795
+
2796
+ <key>LimitLoadToSessionType</key>
2797
+ <string>Aqua</string>
2798
+ </dict>
2799
+ </plist>
2800
+ `;
2692
2801
  }
2693
2802
 
2694
2803
  async function launchdInstall() {
@@ -2830,6 +2939,38 @@ async function launchdInstall() {
2830
2939
  console.log(` protected files. (Granted path: ${fda.realNodePath})`);
2831
2940
  }
2832
2941
 
2942
+ // ── Dead-Man-Switch (Self-Preservation Phase 1, feature 2E) ──────────
2943
+ // Install a second tiny LaunchAgent that wakes every 5 min and force-
2944
+ // restarts the main bot if its heartbeat-file is stale. Catches "process
2945
+ // alive but frozen" — event-loop deadlocks, blocked I/O, etc. — that
2946
+ // the in-process watchdog can't see.
2947
+ // Opt-out: ALVIN_DISABLE_DEAD_MAN=true or ALVIN_DISABLE_SELF_PRESERVATION=true.
2948
+ if (
2949
+ process.env.ALVIN_DISABLE_DEAD_MAN !== "true" &&
2950
+ process.env.ALVIN_DISABLE_SELF_PRESERVATION !== "true"
2951
+ ) {
2952
+ const { deadmanLabel, deadmanPlistPath } = launchdPaths();
2953
+ const deadmanPlist = renderDeadmanPlist({
2954
+ deadmanLabel,
2955
+ mainLabel: label,
2956
+ home,
2957
+ logDir,
2958
+ });
2959
+ writeFileSync(deadmanPlistPath, deadmanPlist, { mode: 0o644 });
2960
+ console.log("");
2961
+ console.log(`📝 Wrote ${deadmanPlistPath}`);
2962
+ try {
2963
+ execSync(`launchctl bootout gui/$(id -u)/${deadmanLabel} 2>/dev/null || true`, { stdio: "pipe" });
2964
+ } catch {}
2965
+ try {
2966
+ execSync(`launchctl bootstrap gui/$(id -u) "${deadmanPlistPath}"`, { stdio: "pipe" });
2967
+ console.log("🛡️ Dead-man watcher active — checks every 5 min, force-restarts main bot if heartbeat > 10 min stale.");
2968
+ } catch (err) {
2969
+ console.log(`⚠️ Dead-man watcher load failed (non-fatal): ${err.message?.split("\n")[0] || err}`);
2970
+ console.log(" The main bot still works; only zombie-detection is disabled.");
2971
+ }
2972
+ }
2973
+
2833
2974
  process.exit(0);
2834
2975
  }
2835
2976
 
@@ -2859,6 +3000,20 @@ async function launchdUninstall() {
2859
3000
  console.log(`⚠️ Could not remove plist: ${err.message}`);
2860
3001
  }
2861
3002
 
3003
+ // Dead-Man watcher (feature 2E) — also remove its companion plist.
3004
+ const { deadmanLabel, deadmanPlistPath } = launchdPaths();
3005
+ if (existsSync(deadmanPlistPath)) {
3006
+ try {
3007
+ execSync(`launchctl bootout gui/$(id -u)/${deadmanLabel} 2>/dev/null || true`, { stdio: "pipe" });
3008
+ } catch {}
3009
+ try {
3010
+ execSync(`rm -f "${deadmanPlistPath}"`);
3011
+ console.log(`🗑 Removed ${deadmanPlistPath} (dead-man watcher)`);
3012
+ } catch (err) {
3013
+ console.log(`⚠️ Could not remove dead-man plist: ${err.message}`);
3014
+ }
3015
+ }
3016
+
2862
3017
  console.log("");
2863
3018
  console.log("✅ alvin-bot is no longer a launchd user agent.");
2864
3019
  process.exit(0);
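The inline shell in `renderDeadmanPlist` above boils down to one comparison. As a rough illustration, the same decision is restated here in JavaScript; `heartbeatVerdict` is a hypothetical helper, not something the package ships, and it takes the current time as a parameter so it can be exercised without touching the filesystem.

```javascript
// Hypothetical restatement of the watcher's shell logic; illustration only.
function heartbeatVerdict(rawHeartbeat, nowSec, thresholdSec = 600) {
  // Mirror `tr -d ' \n'`: strip spaces and newlines from the file content.
  const last = String(rawHeartbeat ?? "").replace(/[ \n]/g, "");
  // Mirror the shell `case`: empty or non-numeric content means "do nothing".
  if (!/^[0-9]+$/.test(last)) return { restart: false, ageSec: null };
  const ageSec = nowSec - Number(last);
  // Strictly greater than the threshold, as with `[ "$DIFF" -gt "$THRESHOLD" ]`.
  return { restart: ageSec > thresholdSec, ageSec };
}
```

A missing or unreadable heartbeat maps to "no restart" here, as in the script, so a bot that has never written a heartbeat is not kickstarted by the watcher.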
package/dist/index.js CHANGED
@@ -204,6 +204,17 @@ if (hasProvider) {
204
204
  else {
205
205
  console.warn("⚠️ Engine not initialized — no AI provider configured.");
206
206
  }
207
+ // Pre-Flight Sanity Check (Self-Preservation Phase 1, feature 1A) —
208
+ // runs in parallel, fire-and-forget. Does NOT block startup.
209
+ // Catches misconfigurations + degraded state at boot time.
210
+ import("./services/preflight.js")
211
+ .then(({ runPreFlight, formatPreFlightReport }) => runPreFlight(config.botToken, registry).then((report) => {
212
+ console.log(formatPreFlightReport(report));
213
+ }))
214
+ .catch((err) => {
215
+ // Pre-Flight itself must never crash the bot.
216
+ console.warn("⚠️ Pre-Flight check threw:", err?.message || err);
217
+ });
207
218
  // Load plugins
208
219
  const pluginResult = await loadPlugins();
209
220
  if (pluginResult.loaded.length > 0) {
@@ -527,6 +538,14 @@ setNotifyCallback(async (target, text) => {
527
538
  enqueue(target.platform, String(target.chatId), text);
528
539
  });
529
540
  startScheduler();
541
+ // Heartbeat-file writer (Self-Preservation Phase 1, feature 2E).
542
+ // Writes ~/.alvin-bot/heartbeat.txt every 60 s so an external
543
+ // dead-man-watch launchd agent can detect "process alive but frozen"
544
+ // and force-restart the bot. Catches event-loop deadlocks that the
545
+ // in-process watchdog cannot see.
546
+ import("./services/heartbeat-file.js").then(({ startHeartbeatWriter }) => {
547
+ startHeartbeatWriter();
548
+ });
530
549
  // Start the async-agent watcher (Fix #17 Stage 2). Polls outputFiles
531
550
  // of background sub-agents Claude launched with run_in_background and
532
551
  // delivers their completed reports as separate Telegram messages.
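`services/heartbeat-file.js` is imported above but not included in this excerpt. A writer of the kind the comment describes might look roughly like this sketch; the function name, the injected `write` callback, the immediate first beat, and the `unref()` call are all assumptions made for illustration.

```javascript
// Hypothetical sketch; the real services/heartbeat-file.js is not shown in this diff.
function startHeartbeatSketch({ write, intervalMs = 60000, now = () => Date.now() }) {
  const beat = () => {
    try {
      // Unix timestamp in whole seconds, the format the watcher script parses.
      write(String(Math.floor(now() / 1000)));
    } catch {
      // A failed write must not crash the bot; a stale file is itself the signal.
    }
  };
  beat(); // assumption: write once immediately so the file exists right after boot
  const timer = setInterval(beat, intervalMs);
  timer.unref(); // do not keep the process alive only for the heartbeat
  return () => clearInterval(timer); // caller can stop the writer
}
```

In a real process the `write` callback would be something like a synchronous file write to `~/.alvin-bot/heartbeat.txt`; it is injected here only so the sketch stays free of filesystem side effects.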
@@ -0,0 +1,228 @@
1
+ /**
2
+ * Auto-Diagnostic Logs-Collector (Self-Preservation Phase 1, feature 2F).
3
+ *
4
+ * On critical failure, write a structured Markdown "forensic bundle" to
5
+ * ~/.alvin-bot/diagnostics/<timestamp>-<category>.md containing:
6
+ *
7
+ * - Bot version + boot info
8
+ * - Last 200 lines of out.log + err.log
9
+ * - Current process state (PID, RSS, uptime, node version, platform)
10
+ * - Non-secret environment vars (PATH, PRIMARY_PROVIDER, …)
11
+ * - Watchdog state (~/.alvin-bot/state/watchdog.json)
12
+ * - System tool inventory (which node/codex/claude/pm2/yt-dlp/…)
13
+ * - Disk space snapshot
14
+ * - The triggering event itself + suggestion
15
+ *
16
+ * The bundle is the input that the 5.0.0 AI-Diagnostic feature (3I) will
17
+ * later feed to a sub-agent for automated analysis. As of 4.26.0 it's a
18
+ * "human-readable forensic dump" — useful on its own, no AI required.
19
+ *
20
+ * Auto-prune: max 50 retained bundles, oldest deleted on next write.
21
+ *
22
+ * Performance: <100KB per bundle, ~50-200ms wall-clock per write,
23
+ * synchronous (we're typically called right before process.exit so
24
+ * blocking is the right semantic). Files are atomic — full bundle or
25
+ * nothing.
26
+ *
27
+ * Opt-out:
28
+ * ALVIN_DISABLE_AUTO_DIAGNOSTIC=true → skip bundle writes
29
+ * ALVIN_DISABLE_SELF_PRESERVATION=true → skip ALL Phase-1
30
+ */
31
+ import { writeFileSync, readFileSync, mkdirSync, existsSync, readdirSync, statSync, unlinkSync, } from "fs";
32
+ import { join } from "path";
33
+ import { homedir } from "os";
34
+ import { execSync } from "child_process";
35
+ import { BOT_VERSION } from "../version.js";
36
+ const MAX_BUNDLES = 50;
37
+ function isDisabled() {
38
+ return (process.env.ALVIN_DISABLE_AUTO_DIAGNOSTIC === "true" ||
39
+ process.env.ALVIN_DISABLE_SELF_PRESERVATION === "true");
40
+ }
41
+ function safeReadTail(filename, n) {
42
+ try {
43
+ const path = join(homedir(), ".alvin-bot", "logs", filename);
44
+ if (!existsSync(path))
45
+ return "(log file not present)";
46
+ const content = readFileSync(path, "utf-8");
47
+ const lines = content.split("\n");
48
+ return lines.slice(Math.max(0, lines.length - n)).join("\n");
49
+ }
50
+ catch (err) {
51
+ return `(read failed: ${err instanceof Error ? err.message : String(err)})`;
52
+ }
53
+ }
54
+ function safeShell(cmd, timeoutMs = 5000) {
55
+ try {
56
+ return execSync(cmd, { encoding: "utf-8", timeout: timeoutMs, stdio: ["ignore", "pipe", "pipe"] }).trim();
57
+ }
58
+ catch (err) {
59
+ const e = err;
60
+ const out = e.stdout?.toString().trim() ?? "";
61
+ const stderr = e.stderr?.toString().trim() ?? "";
62
+ if (out)
63
+ return out + (stderr ? `\n[stderr]: ${stderr}` : "");
64
+ return `(command failed: ${e.message || "unknown"})`;
65
+ }
66
+ }
67
+ function safeReadFile(path) {
68
+ try {
69
+ return readFileSync(path, "utf-8").trim();
70
+ }
71
+ catch (err) {
72
+ return `(could not read ${path}: ${err instanceof Error ? err.message : String(err)})`;
73
+ }
74
+ }
75
+ /**
76
+ * Keep the newest MAX_BUNDLES (50) diagnostic bundles and delete the
77
+ * rest, ranked by mtime. Best-effort: silent on errors.
78
+ */
79
+ export function pruneDiagnostics(maxKeep = MAX_BUNDLES) {
80
+ try {
81
+ const dir = join(homedir(), ".alvin-bot", "diagnostics");
82
+ if (!existsSync(dir))
83
+ return;
84
+ const files = readdirSync(dir)
85
+ .filter((f) => f.endsWith(".md"))
86
+ .map((f) => {
87
+ try {
88
+ return { name: f, mtime: statSync(join(dir, f)).mtimeMs };
89
+ }
90
+ catch {
91
+ return { name: f, mtime: 0 };
92
+ }
93
+ })
94
+ .sort((a, b) => b.mtime - a.mtime);
95
+ for (const f of files.slice(maxKeep)) {
96
+ try {
97
+ unlinkSync(join(dir, f.name));
98
+ }
99
+ catch {
100
+ /* best-effort */
101
+ }
102
+ }
103
+ }
104
+ catch {
105
+ /* never fail the caller */
106
+ }
107
+ }
108
+ /**
109
+ * Write a diagnostic bundle for the given event. Returns the absolute
110
+ * path to the written file, or null if disabled / failed.
111
+ *
112
+ * Safe to call from any context — never throws. Side-effects:
113
+ * - Creates ~/.alvin-bot/diagnostics/ if absent
114
+ * - Writes a single ~50-100KB markdown file
115
+ * - Prunes to MAX_BUNDLES retained
116
+ */
117
+ export function writeDiagnosticBundle(event) {
118
+ if (isDisabled())
119
+ return null;
120
+ try {
121
+ const dir = join(homedir(), ".alvin-bot", "diagnostics");
122
+ mkdirSync(dir, { recursive: true });
123
+ const ts = (event.ts || new Date()).toISOString().replace(/[:.]/g, "-");
124
+ const filename = `${ts}-${event.category}.md`;
125
+ const filepath = join(dir, filename);
126
+ const mem = process.memoryUsage();
127
+ const rssMB = Math.round(mem.rss / 1024 / 1024);
128
+ const heapMB = Math.round(mem.heapUsed / 1024 / 1024);
129
+ const sections = [
130
+ `# Alvin Bot — Diagnostic Bundle`,
131
+ ``,
132
+ `**Generated:** ${new Date().toISOString()}`,
133
+ `**Bot version:** ${BOT_VERSION}`,
134
+ `**Trigger category:** ${event.category}`,
135
+ `**Severity:** ${event.severity}`,
136
+ `**Title:** ${event.title}`,
137
+ ``,
138
+ `## 1. Event Detail`,
139
+ ``,
140
+ "```",
141
+ event.detail,
142
+ "```",
143
+ ``,
144
+ ...(event.suggestedAction
145
+ ? [`### Suggested action`, ``, "```", event.suggestedAction, "```", ``]
146
+ : []),
147
+ `## 2. Process State`,
148
+ ``,
149
+ `- PID: ${process.pid}`,
150
+ `- RSS memory: ${rssMB} MB`,
151
+ `- Heap used: ${heapMB} MB`,
152
+ `- Uptime: ${Math.round(process.uptime())} s`,
153
+ `- Node.js: ${process.version}`,
154
+ `- Platform: ${process.platform} (${process.arch})`,
155
+ `- argv: ${process.argv.join(" ")}`,
156
+ ``,
157
+ `## 3. Environment (non-secret only)`,
158
+ ``,
159
+ ...[
160
+ "NODE_ENV",
161
+ "HOME",
162
+ "PATH",
163
+ "PRIMARY_PROVIDER",
164
+ "FALLBACK_PROVIDERS",
165
+ "AUTH_MODE",
166
+ "SESSION_MODE",
167
+ "WEB_HOST",
168
+ "WEB_PORT",
169
+ "WORKING_DIR",
170
+ "MAX_BUDGET_USD",
171
+ "ALVIN_DATA_DIR",
172
+ "ALVIN_DEADMAN_THRESHOLD_SEC",
173
+ "ALVIN_DISABLE_SELF_PRESERVATION",
174
+ ].map((key) => `- ${key}: ${process.env[key] ?? "(unset)"}`),
175
+ ``,
176
+ `## 4. Recent stderr (last 200 lines)`,
177
+ ``,
178
+ "```",
179
+ safeReadTail("alvin-bot.err.log", 200),
180
+ "```",
181
+ ``,
182
+ `## 5. Recent stdout (last 200 lines)`,
183
+ ``,
184
+ "```",
185
+ safeReadTail("alvin-bot.out.log", 200),
186
+ "```",
187
+ ``,
188
+ `## 6. Watchdog state`,
189
+ ``,
190
+ "```json",
191
+ safeReadFile(join(homedir(), ".alvin-bot", "state", "watchdog.json")),
192
+ "```",
193
+ ``,
194
+ `## 7. System tool inventory`,
195
+ ``,
196
+ "```",
197
+ safeShell("for t in node npm brew pm2 codex claude yt-dlp ffmpeg wacli agent-browser; do printf '%-15s %s\\n' \"$t\" \"$(command -v $t 2>/dev/null || echo NOT_FOUND)\"; done"),
198
+ "```",
199
+ ``,
200
+ `## 8. Disk space (.alvin-bot data dir)`,
201
+ ``,
202
+ "```",
203
+ safeShell(`df -h "${join(homedir(), ".alvin-bot")}" 2>&1 | head -2`),
204
+ "```",
205
+ ``,
206
+ `## 9. PM2 status (if installed)`,
207
+ ``,
208
+ "```",
209
+ safeShell("command -v pm2 >/dev/null && pm2 jlist 2>/dev/null | head -50 || echo 'pm2 not installed'", 3000),
210
+ "```",
211
+ ``,
212
+ `---`,
213
+ ``,
214
+ `*This bundle was generated automatically by the Alvin Bot auto-diagnostic system.*`,
215
+ `*Set \`ALVIN_DISABLE_AUTO_DIAGNOSTIC=true\` in ~/.alvin-bot/.env to opt out.*`,
216
+ ``,
217
+ ];
218
+ writeFileSync(filepath, sections.join("\n"), { mode: 0o600 });
219
+ pruneDiagnostics();
220
+ return filepath;
221
+ }
222
+ catch (err) {
223
+ // Diagnostic writer must not be a new failure mode. Log to stderr
224
+ // (which the critical-notify file flag will reference) and bail.
225
+ console.error(`[auto-diagnostic] failed to write bundle: ${err instanceof Error ? err.message : String(err)}`);
226
+ return null;
227
+ }
228
+ }
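The retention rule in `pruneDiagnostics` above (sort by mtime, newest first, delete everything past `maxKeep`) can be isolated as a pure function. `selectPrunable` is a hypothetical name used only for this sketch; it returns the file names that would be deleted and performs no I/O itself.

```javascript
// Hypothetical helper illustrating the retention rule; performs no deletion.
function selectPrunable(files, maxKeep = 50) {
  return [...files]                      // copy so the caller's array is not reordered
    .sort((a, b) => b.mtime - a.mtime)   // newest first
    .slice(maxKeep)                      // everything beyond the newest maxKeep
    .map((f) => f.name);
}
```

Separating selection from deletion in this way makes the "which files go" question easy to check in isolation; the shipped function does both in one pass.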
@@ -0,0 +1,203 @@
1
+ /**
2
+ * Critical-Event Cross-Channel Notify (Self-Preservation Phase 1, feature 1D).
3
+ *
4
+ * When something genuinely critical happens — watchdog brake engaged,
5
+ * repeated Telegram 409s, all providers dead, disk full, memory blow-up —
6
+ * deliver the alert through a fallback chain so the user actually finds
7
+ * out even if Telegram (the primary channel) is itself the failure mode.
8
+ *
9
+ * Channel cascade — ALL fire, in order of preference:
10
+ * 1. File flag at ~/.alvin-bot/CRITICAL.log [durable audit trail, always written]
11
+ * 2. macOS native notification (osascript) [if darwin, visible immediately]
12
+ * 3. Telegram DM to admin (curl) [detached by default; synchronous when blockTelegram is set]
13
+ *
14
+ * Order is deliberate: we ALWAYS persist the audit (1) first, so even
15
+ * if the process crashes mid-notify we have a forensic record. Then we
16
+ * try the user-facing channels (2, 3) best-effort.
17
+ *
18
+ * The Telegram channel shells out to `curl` because critical events
19
+ * often come paired with process.exit(), most notably the watchdog
20
+ * brake, where an in-process fetch() would be cancelled. Callers about
21
+ * to exit should pass blockTelegram: true; a detached spawn can lose that race.
22
+ *
23
+ * Performance: ZERO steady-state overhead. Only the file-flag write
24
+ * runs at all, and only when emitCritical() is called.
25
+ *
26
+ * Opt-out:
27
+ * ALVIN_DISABLE_CRITICAL_NOTIFY=true → skip Tier 1/2/3 entirely
28
+ * ALVIN_DISABLE_SELF_PRESERVATION=true → skip ALL Phase-1 features
29
+ */
30
+ import { spawn, execFileSync, spawnSync } from "child_process";
31
+ import { appendFileSync, mkdirSync } from "fs";
32
+ import { join } from "path";
33
+ import { homedir } from "os";
34
+ function isDisabled() {
35
+ return (process.env.ALVIN_DISABLE_CRITICAL_NOTIFY === "true" ||
36
+ process.env.ALVIN_DISABLE_SELF_PRESERVATION === "true");
37
+ }
38
+ function resolveOptions(opts) {
39
+ const botToken = opts?.botToken ?? process.env.BOT_TOKEN ?? undefined;
40
+ let adminChatId = opts?.adminChatId;
41
+ if (adminChatId === undefined && process.env.ALLOWED_USERS) {
42
+ const first = process.env.ALLOWED_USERS.split(",")[0]?.trim();
43
+ if (first) {
44
+ const parsed = parseInt(first, 10);
45
+ if (Number.isFinite(parsed))
46
+ adminChatId = parsed;
47
+ }
48
+ }
49
+ return { botToken, adminChatId };
50
+ }
51
+ // ── Tier 3: Durable file flag — ALWAYS written first ──────────────────────
52
+ function writeFileFlag(event) {
53
+ try {
54
+ const dir = join(homedir(), ".alvin-bot");
55
+ mkdirSync(dir, { recursive: true });
56
+ const path = join(dir, "CRITICAL.log");
57
+ const ts = (event.ts || new Date()).toISOString();
58
+ const block = [
59
+ `[${ts}] ${event.severity.toUpperCase()} ${event.category}`,
60
+ ` ${event.title}`,
61
+ ...event.detail.split("\n").map((l) => ` ${l}`),
62
+ ...(event.suggestedAction ? [` Suggested: ${event.suggestedAction}`] : []),
63
+ "",
64
+ ].join("\n");
65
+ appendFileSync(path, block);
66
+ return true;
67
+ }
68
+ catch {
69
+ return false;
70
+ }
71
+ }
72
+ // ── Tier 2: macOS native notification (silent on Linux/Windows) ───────────
73
+ function macosNotification(event) {
74
+ if (process.platform !== "darwin")
75
+ return false;
76
+ try {
77
+ // Escape any embedded double-quotes for AppleScript string literal
78
+ const message = `${event.title} — ${event.detail.split("\n")[0]}`.replace(/"/g, '\\"');
79
+ const title = `Alvin Bot ${event.severity === "critical" ? "🚨" : "⚠️"}`;
80
+ execFileSync("osascript", ["-e", `display notification "${message}" with title "${title}"`], { timeout: 3000, stdio: "pipe" });
81
+ return true;
82
+ }
83
+ catch {
84
+ return false;
85
+ }
86
+ }
87
+ // ── Tier 1: Telegram DM to admin via detached curl ────────────────────────
88
+ //
89
+ // Why detached + curl instead of in-process fetch:
90
+ // - emitCritical() is sometimes called moments before process.exit()
91
+ // (notably from the watchdog brake path). In-process async work
92
+ // would be cancelled.
93
+ // - A detached child with stdio:'ignore' + unref() lets the bot carry on
94
+ // without waiting; exit-imminent callers use blockTelegram instead.
95
+ // - curl is universally available on macOS + Linux. No node-only deps.
96
+ function telegramAdminDM(event, opts) {
97
+ if (!opts.botToken || !opts.adminChatId)
98
+ return false;
99
+ // Plain text — NOT Markdown. Critical events frequently contain shell
100
+ // commands in `suggestedAction` (paths with quotes, `&&` chains, etc.)
101
+ // which break Telegram's Markdown parser with HTTP 400. Reliability >
102
+ // visual prettiness for an alarm channel. The emoji prefix already
103
+ // makes it visually obvious.
104
+ const lines = [
105
+ `🚨 Alvin Bot — ${event.severity.toUpperCase()}`,
106
+ "",
107
+ event.title,
108
+ "",
109
+ event.detail,
110
+ ];
111
+ if (event.suggestedAction) {
112
+ lines.push("", `Suggested: ${event.suggestedAction}`);
113
+ }
114
+ const text = lines.join("\n");
115
+ const curlArgs = [
116
+ "-s",
117
+ "-o", "/dev/null",
118
+ "-X", "POST",
119
+ "--max-time", "5",
120
+ `https://api.telegram.org/bot${opts.botToken}/sendMessage`,
121
+ "-d", `chat_id=${opts.adminChatId}`,
122
+ "--data-urlencode", `text=${text}`,
123
+ ];
124
+ if (opts.blockTelegram) {
125
+ // Synchronous: caller is about to process.exit(). spawnSync blocks
126
+ // up to max-time + a small buffer, then returns. Guaranteed delivery
127
+ // attempt — no fork-race with process termination.
128
+ try {
129
+ // Drop -s -o /dev/null so we can see the HTTP response. The body
130
+ // is logged to stderr if Telegram returns a non-2xx.
131
+ const verboseArgs = curlArgs.filter((a) => a !== "-s" && a !== "/dev/null" && a !== "-o");
132
+ verboseArgs.push("-w", "HTTP=%{http_code}");
133
+ const result = spawnSync("curl", verboseArgs, { timeout: 7000, encoding: "utf-8" });
134
+ const stdout = (result.stdout || "").toString();
135
+ const stderr = (result.stderr || "").toString();
136
+ // Diagnostic — only logs in failure path. Helps debug "DM never arrived".
137
+ if (result.status !== 0 || !/HTTP=2\d\d/.test(stdout)) {
138
+ console.error(`[critical-notify] telegram sync curl status=${result.status} stdout=${stdout.slice(0, 200)} stderr=${stderr.slice(0, 200)}`);
139
+ return false;
140
+ }
141
+ return true;
142
+ }
143
+ catch (err) {
144
+ console.error(`[critical-notify] telegram sync curl threw: ${err instanceof Error ? err.message : String(err)}`);
145
+ return false;
146
+ }
147
+ }
148
+ // Async detached: bot keeps running afterwards, no need to block.
149
+ // detached + stdio:ignore + unref is the standard pattern for
150
+ // "fire and forget". Note: NOT safe if caller calls process.exit()
151
+ // immediately after — use blockTelegram:true for those cases.
152
+ try {
153
+ const child = spawn("curl", curlArgs, { detached: true, stdio: "ignore" });
154
+ child.unref();
155
+ return true;
156
+ }
157
+ catch {
158
+ return false;
159
+ }
160
+ }
161
+ /**
162
+ * Emit a critical event across all configured channels.
163
+ *
164
+ * Synchronous-fast: file flag + osascript run inline (<60ms total typical).
165
+ * By default Telegram is detached and `telegram` only means "scheduled".
166
+ * With opts.blockTelegram the POST runs via spawnSync and `telegram`
167
+ * reflects whether Telegram answered with a 2xx status.
168
+ *
169
+ * Always safe to call. Never throws. Worst case: ~3s (osascript timeout),
170
+ * plus up to 7s (spawnSync timeout) when blockTelegram is set.
171
+ *
172
+ * Outcome of each tier is also logged to stderr so users can diagnose
173
+ * "why didn't I get the Telegram DM?" by reading their err.log.
174
+ */
175
+ export function emitCritical(event, opts) {
176
+ if (isDisabled()) {
177
+ console.error("[critical-notify] skipped — opt-out via env var");
178
+ return { fileFlag: false, macos: false, telegram: false, reachedAtLeastOne: false };
179
+ }
180
+ // Tier 3 first — most durable, cheapest.
181
+ const fileFlag = writeFileFlag(event);
182
+ // Tier 2 — macOS user-facing.
183
+ const macos = macosNotification(event);
184
+ // Tier 1 — Telegram DM (sync if caller signaled exit, else detached).
185
+ const resolved = resolveOptions(opts);
186
+ const telegram = telegramAdminDM(event, { ...resolved, blockTelegram: opts?.blockTelegram });
187
+ // Diagnostics — written to stderr so even brake-context invocations
188
+ // leave a paper trail in err.log. The user previously hit a case
189
+ // where 1D fired the file flag and osascript but the Telegram DM
190
+ // seemingly never arrived — this log makes it obvious whether
191
+ // resolveOptions found a token + chat_id.
192
+ console.error(`[critical-notify] event="${event.category}" ` +
193
+ `file=${fileFlag ? "ok" : "fail"} ` +
194
+ `macos=${macos ? "ok" : "skip"} ` +
195
+ `telegram=${telegram ? "scheduled" : "skip"}` +
196
+ (telegram ? "" : ` (botToken=${resolved.botToken ? "set" : "missing"} adminChatId=${resolved.adminChatId ?? "missing"})`));
197
+ return {
198
+ fileFlag,
199
+ macos,
200
+ telegram,
201
+ reachedAtLeastOne: fileFlag || macos || telegram,
202
+ };
203
+ }
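The tiered fan-out in `emitCritical()` above can be summarized in a small sketch. It is a simplified illustration under stated assumptions, not the package's implementation: `runTiers` and the stub tier functions are hypothetical names, whereas the real tiers write a file flag, call `osascript`, and spawn `curl`.

```javascript
// Hypothetical sketch of a "never throws" notification fan-out.
// Each tier is a function returning true on success; a tier that throws
// is treated as failed so one broken channel cannot block the others.
function runTiers(event, tiers) {
  const outcome = {};
  for (const [name, tier] of Object.entries(tiers)) {
    try {
      outcome[name] = Boolean(tier(event));
    } catch {
      outcome[name] = false;
    }
  }
  // True when at least one channel reported success.
  outcome.reachedAtLeastOne = Object.keys(tiers).some((name) => outcome[name]);
  return outcome;
}

// Stub tiers standing in for the file flag, macOS notification and Telegram DM.
const demo = runTiers(
  { category: "demo", severity: "critical", title: "Example", detail: "n/a" },
  {
    fileFlag: () => true,
    macos: () => false,
    telegram: () => { throw new Error("network down"); },
  },
);
console.log(demo);
```

Running the cheapest, most durable tier first means a later failure still leaves a trace behind, which matches the ordering used in the diff.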
@@ -0,0 +1,65 @@
1
+ /**
2
+ * Heartbeat-File Writer (Self-Preservation Phase 1, feature 2E).
3
+ *
4
+ * Writes a unix timestamp (seconds) to ~/.alvin-bot/heartbeat.txt every
5
+ * 60 seconds. An external launchd-managed dead-man watcher reads this
6
+ * file every 5 minutes — if the timestamp is older than 10 minutes,
7
+ * the bot is presumed frozen (event-loop deadlock, blocked I/O,
8
+ * unresponsive but alive process) and the watcher force-restarts via
9
+ * `launchctl kickstart -k`.
10
+ *
11
+ * This complements the in-process watchdog (src/services/watchdog.ts)
12
+ * which only catches process exits — it cannot catch "process alive
13
+ * but frozen" because that's exactly the state where the watchdog's
14
+ * own beacon writer also stops.
15
+ *
16
+ * Why a file + external watcher instead of an internal timer:
17
+ * - An internal "I'm frozen" timer is a contradiction in terms.
18
+ * If the event loop is dead, the timer doesn't fire either.
19
+ * - The file-based external watcher is the only architecturally
20
+ * sound way to detect this class of failure.
21
+ *
22
+ * Performance: file write of 11 bytes every 60s. CPU cost ~1ms/min,
23
+ * disk I/O ~15.5 KB/day (11 bytes × 1,440 writes). Truly negligible.
24
+ *
25
+ * Opt-out:
26
+ * ALVIN_DISABLE_DEAD_MAN=true → skip heartbeat writer
27
+ * ALVIN_DISABLE_SELF_PRESERVATION=true → skip all Phase-1
28
+ */
29
+ import { writeFileSync, mkdirSync } from "fs";
30
+ import { join } from "path";
31
+ import { homedir } from "os";
32
+ const HEARTBEAT_PATH = join(homedir(), ".alvin-bot", "heartbeat.txt");
33
+ const HEARTBEAT_INTERVAL_MS = 60_000;
34
+ let heartbeatTimer = null;
35
+ function writeHeartbeat() {
36
+ try {
37
+ mkdirSync(join(homedir(), ".alvin-bot"), { recursive: true });
38
+ // 11 bytes — Unix seconds + newline. Easy to parse from shell.
39
+ writeFileSync(HEARTBEAT_PATH, `${Math.floor(Date.now() / 1000)}\n`);
40
+ }
41
+ catch {
42
+ // Disk full or permissions — non-fatal. The dead-man watcher will
43
+ // see a stale file and kickstart, which is the right behaviour:
44
+ // a bot that can't write its heartbeat IS effectively stuck.
45
+ }
46
+ }
47
+ export function startHeartbeatWriter() {
48
+ if (process.env.ALVIN_DISABLE_DEAD_MAN === "true" ||
49
+ process.env.ALVIN_DISABLE_SELF_PRESERVATION === "true") {
50
+ return;
51
+ }
52
+ // Write immediately so the dead-man watcher doesn't see a stale file
53
+ // from the previous process incarnation.
54
+ writeHeartbeat();
55
+ heartbeatTimer = setInterval(writeHeartbeat, HEARTBEAT_INTERVAL_MS);
56
+ // Allow the process to exit without waiting for this timer.
57
+ if (heartbeatTimer.unref)
58
+ heartbeatTimer.unref();
59
+ }
60
+ export function stopHeartbeatWriter() {
61
+ if (heartbeatTimer) {
62
+ clearInterval(heartbeatTimer);
63
+ heartbeatTimer = null;
64
+ }
65
+ }
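The external dead-man watcher is not part of this diff, so the following is only a sketch of the staleness rule described in the header comment (a heartbeat older than 10 minutes counts as frozen). `isHeartbeatStale` and its parameters are hypothetical names chosen for illustration.

```javascript
// Hypothetical watcher-side check: is the heartbeat older than maxAgeSeconds?
// `contents` is the raw text of heartbeat.txt (Unix seconds plus newline).
function isHeartbeatStale(contents, nowSeconds, maxAgeSeconds = 600) {
  const beat = Number.parseInt(String(contents).trim(), 10);
  // An unparseable file is treated as stale: the writer is not doing its job.
  if (!Number.isFinite(beat)) return true;
  return nowSeconds - beat > maxAgeSeconds;
}

const now = 1_800_000_000; // arbitrary fixed "current time" for the demo
console.log(isHeartbeatStale(`${now - 30}\n`, now));  // heartbeat 30 s old
console.log(isHeartbeatStale(`${now - 900}\n`, now)); // heartbeat 15 min old
console.log(isHeartbeatStale("garbage", now));        // unparseable contents
```

Treating an unreadable or malformed file as stale follows the same reasoning as the empty `catch` in `writeHeartbeat()`: a bot that cannot record its heartbeat is effectively stuck.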
@@ -0,0 +1,292 @@
1
+ /**
2
+ * Pre-Flight Sanity Check (Self-Preservation Phase 1, feature 1A).
3
+ *
4
+ * Runs in PARALLEL at startup, fire-and-forget: never blocks the bot's main
5
+ * startup sequence. Each check has a tight timeout. Results are logged with
6
+ * a severity classification (ok / warn / critical). Critical findings can
7
+ * optionally feed into the cross-channel notify pipeline (1D).
8
+ *
9
+ * Provider-agnostic: AI-provider check is routed through the active
10
+ * Provider's `isAvailable()` method, which every concrete provider
11
+ * implements — so the same check works for claude-sdk, codex-cli,
12
+ * groq, gemini, openai, openrouter, ollama (gemma), nvidia.
13
+ *
14
+ * Opt-out:
15
+ * ALVIN_DISABLE_PREFLIGHT=true → skip Pre-Flight specifically
16
+ * ALVIN_DISABLE_SELF_PRESERVATION=true → skip ALL Phase-1 features
17
+ *
18
+ * Performance budget (measured on Apple Silicon M-series):
19
+ * - Telegram getMe: typical 150-400ms, timeout 3000ms
20
+ * - AI Provider isAvailable: typical 50-800ms, timeout 5000ms
21
+ * - SQLite PRAGMA quick_check: typical 5-50ms, timeout 10000ms
22
+ * - df disk space: typical 5-15ms, timeout 2000ms
23
+ * - Total wall-clock = max of all four (Promise.all) — typically <1s
24
+ */
25
+ import { existsSync } from "fs";
26
+ import { join } from "path";
27
+ import { homedir } from "os";
28
+ function isDisabled() {
29
+ return (process.env.ALVIN_DISABLE_PREFLIGHT === "true" ||
30
+ process.env.ALVIN_DISABLE_SELF_PRESERVATION === "true");
31
+ }
32
+ /**
33
+ * Run a promise with a wall-clock timeout. Returns `fallback` if the
34
+ * promise rejects or doesn't settle in time. Never rejects.
35
+ */
36
+ function withTimeout(promise, ms, fallback) {
37
+ return new Promise((resolve) => {
38
+ let settled = false;
39
+ const timer = setTimeout(() => {
40
+ if (!settled) {
41
+ settled = true;
42
+ resolve(fallback);
43
+ }
44
+ }, ms);
45
+ promise.then((value) => {
46
+ if (!settled) {
47
+ settled = true;
48
+ clearTimeout(timer);
49
+ resolve(value);
50
+ }
51
+ }, () => {
52
+ if (!settled) {
53
+ settled = true;
54
+ clearTimeout(timer);
55
+ resolve(fallback);
56
+ }
57
+ });
58
+ });
59
+ }
60
+ async function checkTelegram(botToken) {
61
+ const start = Date.now();
62
+ if (!botToken) {
63
+ return {
64
+ name: "telegram",
65
+ ok: true,
66
+ severity: "ok",
67
+ message: "skipped (WebUI-only mode, no BOT_TOKEN)",
68
+ durationMs: Date.now() - start,
69
+ };
70
+ }
71
+ const url = `https://api.telegram.org/bot${botToken}/getMe`;
72
+ const result = await withTimeout(fetch(url).then(async (r) => ({ ok: r.ok, status: r.status, body: await r.json().catch(() => null) })), 3000, null);
73
+ if (!result) {
74
+ return {
75
+ name: "telegram",
76
+ ok: false,
77
+ severity: "warn",
78
+ message: "getMe failed or timed out (3s) — bot may have network / Telegram issues",
79
+ durationMs: Date.now() - start,
80
+ };
81
+ }
82
+ if (!result.ok) {
83
+ return {
84
+ name: "telegram",
85
+ ok: false,
86
+ severity: "critical",
87
+ message: `getMe HTTP ${result.status} — token may be invalid`,
88
+ durationMs: Date.now() - start,
89
+ };
90
+ }
91
+ const username = result.body?.result?.username;
92
+ return {
93
+ name: "telegram",
94
+ ok: true,
95
+ severity: "ok",
96
+ message: username ? `bot=@${username}` : "bot reachable",
97
+ durationMs: Date.now() - start,
98
+ };
99
+ }
100
+ async function checkAiProvider(registry) {
101
+ const start = Date.now();
102
+ if (!registry) {
103
+ return {
104
+ name: "ai-provider",
105
+ ok: false,
106
+ severity: "warn",
107
+ message: "no provider configured (AI features will be disabled)",
108
+ durationMs: Date.now() - start,
109
+ };
110
+ }
111
+ let provider;
112
+ let activeKey = "(unknown)";
113
+ try {
114
+ provider = registry.getActive();
115
+ activeKey = registry.getActiveKey();
116
+ }
117
+ catch {
118
+ return {
119
+ name: "ai-provider",
120
+ ok: false,
121
+ severity: "warn",
122
+ message: "no active provider in registry",
123
+ durationMs: Date.now() - start,
124
+ };
125
+ }
126
+ if (!provider) {
127
+ return {
128
+ name: "ai-provider",
129
+ ok: false,
130
+ severity: "warn",
131
+ message: "no active provider in registry",
132
+ durationMs: Date.now() - start,
133
+ };
134
+ }
135
+ const available = await withTimeout(provider.isAvailable(), 5000, false);
136
+ return {
137
+ name: "ai-provider",
138
+ ok: available,
139
+ severity: available ? "ok" : "warn",
140
+ message: available
141
+ ? `${activeKey} reachable`
142
+ : `${activeKey} not reachable / not configured — bot will degrade gracefully on AI calls`,
143
+ durationMs: Date.now() - start,
144
+ };
145
+ }
146
+ async function checkSqliteIntegrity() {
147
+ const start = Date.now();
148
+ const dbPath = join(homedir(), ".alvin-bot", "memory", ".embeddings.db");
149
+ if (!existsSync(dbPath)) {
150
+ return {
151
+ name: "sqlite",
152
+ ok: true,
153
+ severity: "ok",
154
+ message: "embeddings DB not yet created (lazily on first use)",
155
+ durationMs: Date.now() - start,
156
+ };
157
+ }
158
+ try {
159
+ const { createRequire } = await import("module");
160
+ const req = createRequire(import.meta.url);
161
+ const Database = req("better-sqlite3");
162
+ const db = new Database(dbPath, { readonly: true });
163
+ // PRAGMA quick_check is materially faster than integrity_check: it
164
+ // skips the UNIQUE-constraint and index-vs-table content checks. For
165
+ // "is the file readable + structurally sane?" that is enough. Note:
166
+ // better-sqlite3 is synchronous, so the 10 s timeout below is nominal.
167
+ const result = await withTimeout(Promise.resolve(db.prepare("PRAGMA quick_check").get()), 10_000, null);
168
+ db.close();
169
+ if (result === null) {
170
+ return {
171
+ name: "sqlite",
172
+ ok: false,
173
+ severity: "warn",
174
+ message: "PRAGMA quick_check timed out (>10s) — DB may be very large or locked",
175
+ durationMs: Date.now() - start,
176
+ };
177
+ }
178
+ const r = result;
179
+ const checkResult = r.quick_check || "(unknown)";
180
+ const ok = checkResult === "ok";
181
+ return {
182
+ name: "sqlite",
183
+ ok,
184
+ severity: ok ? "ok" : "critical",
185
+ message: ok ? "embeddings DB integrity ok" : `embeddings DB integrity FAILED: ${checkResult}`,
186
+ durationMs: Date.now() - start,
187
+ };
188
+ }
189
+ catch (err) {
190
+ const message = err instanceof Error ? err.message : String(err);
191
+ return {
192
+ name: "sqlite",
193
+ ok: true,
194
+ severity: "ok",
195
+ message: `check skipped: ${message.split("\n")[0]}`,
196
+ durationMs: Date.now() - start,
197
+ };
198
+ }
199
+ }
200
+ async function checkDiskSpace() {
201
+ const start = Date.now();
202
+ try {
203
+ const { execSync } = await import("child_process");
204
+ const dataDir = join(homedir(), ".alvin-bot");
205
+ const out = execSync(`df -k "${dataDir}"`, { encoding: "utf-8", timeout: 2000 });
206
+ const lines = out.trim().split("\n");
207
+ const data = lines[lines.length - 1].split(/\s+/);
208
+ // df output: Filesystem 1024-blocks Used Available Capacity ...
209
+ const availableKB = parseInt(data[3], 10);
210
+ if (!Number.isFinite(availableKB)) {
211
+ return {
212
+ name: "disk",
213
+ ok: true,
214
+ severity: "ok",
215
+ message: "could not parse df output",
216
+ durationMs: Date.now() - start,
217
+ };
218
+ }
219
+ const availableGB = availableKB / 1024 / 1024;
220
+ const severity = availableKB < 512 * 1024 ? "critical" :
221
+ availableKB < 1024 * 1024 ? "warn" :
222
+ "ok";
223
+ return {
224
+ name: "disk",
225
+ ok: severity === "ok",
226
+ severity,
227
+ message: `${availableGB.toFixed(2)} GB free`,
228
+ durationMs: Date.now() - start,
229
+ };
230
+ }
231
+ catch (err) {
232
+ const message = err instanceof Error ? err.message : String(err);
233
+ return {
234
+ name: "disk",
235
+ ok: true,
236
+ severity: "ok",
237
+ message: `check skipped: ${message.split("\n")[0]}`,
238
+ durationMs: Date.now() - start,
239
+ };
240
+ }
241
+ }
242
+ /**
243
+ * Run the full pre-flight suite in parallel. Always resolves (never
244
+ * throws). Returns a structured report so the caller can decide how
245
+ * to react.
246
+ */
247
+ export async function runPreFlight(botToken, registry) {
248
+ if (isDisabled()) {
249
+ return {
250
+ results: [],
251
+ slowestMs: 0,
252
+ totalMs: 0,
253
+ anyCritical: false,
254
+ anyWarning: false,
255
+ skipped: true,
256
+ };
257
+ }
258
+ const start = Date.now();
259
+ const results = await Promise.all([
260
+ checkTelegram(botToken),
261
+ checkAiProvider(registry),
262
+ checkSqliteIntegrity(),
263
+ checkDiskSpace(),
264
+ ]);
265
+ return {
266
+ results,
267
+ slowestMs: Math.max(...results.map((r) => r.durationMs)),
268
+ totalMs: Date.now() - start,
269
+ anyCritical: results.some((r) => r.severity === "critical"),
270
+ anyWarning: results.some((r) => r.severity === "warn"),
271
+ skipped: false,
272
+ };
273
+ }
274
+ /**
275
+ * Format a PreFlightReport for console output. Compact, single line per
276
+ * check, clear severity icons.
277
+ */
278
+ export function formatPreFlightReport(report) {
279
+ if (report.skipped) {
280
+ return "🩺 Pre-Flight: skipped (ALVIN_DISABLE_PREFLIGHT or ALVIN_DISABLE_SELF_PRESERVATION set)";
281
+ }
282
+ const icons = { ok: "✓", warn: "⚠", critical: "❌" };
283
+ const headline = report.anyCritical
284
+ ? "❌ Pre-Flight: critical issues"
285
+ : report.anyWarning
286
+ ? "⚠️ Pre-Flight: warnings"
287
+ : "✅ Pre-Flight: all checks ok";
288
+ const lines = report.results.map((r) => {
289
+ return ` ${icons[r.severity]} ${r.name.padEnd(12)} ${r.message} (${r.durationMs}ms)`;
290
+ });
291
+ return `🩺 ${headline} — ${report.totalMs}ms total\n${lines.join("\n")}`;
292
+ }
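The disk check in `checkDiskSpace()` combines two steps: pulling the "Available" column out of `df -k` output and mapping it to a severity. A minimal sketch of those two steps follows. `parseDfAvailableKB`, `classifyDiskSpace`, and the sample `df` output are hypothetical and only illustrate the thresholds visible in the diff (critical under 512 MB, warn under 1 GB).

```javascript
// Hypothetical helpers mirroring the thresholds used in checkDiskSpace().
function parseDfAvailableKB(dfOutput) {
  const lines = dfOutput.trim().split("\n");
  const fields = lines[lines.length - 1].trim().split(/\s+/);
  // Column order for `df -k`: Filesystem, 1K-blocks, Used, Available, ...
  const kb = Number.parseInt(fields[3], 10);
  return Number.isFinite(kb) ? kb : null;
}

function classifyDiskSpace(availableKB) {
  if (availableKB < 512 * 1024) return "critical"; // under 512 MB
  if (availableKB < 1024 * 1024) return "warn";    // under 1 GB
  return "ok";
}

// Made-up sample output, for illustration only.
const sample = [
  "Filesystem 1024-blocks      Used Available Capacity Mounted on",
  "/dev/disk3s5 482797652 410000000  55867801    89%   /System/Volumes/Data",
].join("\n");
const kb = parseDfAvailableKB(sample);
console.log(kb, classifyDiskSpace(kb));
```

Keeping parsing and classification as separate pure functions makes the thresholds easy to unit-test without shelling out to `df`.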
@@ -27,6 +27,8 @@ import { resolve } from "path";
27
27
  import os from "os";
28
28
  import { execSync } from "child_process";
29
29
  import { BOT_VERSION } from "../version.js";
30
+ import { emitCritical } from "./critical-notify.js";
31
+ import { writeDiagnosticBundle } from "./auto-diagnostic.js";
30
32
  import { decideBrakeAction, shouldResetCrashCounter, DEFAULTS, } from "./watchdog-brake.js";
31
33
  const DATA_DIR = process.env.ALVIN_DATA_DIR || resolve(os.homedir(), ".alvin-bot");
32
34
  const STATE_DIR = resolve(DATA_DIR, "state");
@@ -164,6 +166,51 @@ export function startWatchdog() {
164
166
  if (decision.action === "brake") {
165
167
  console.error(`[watchdog] crash-loop brake triggered: ${decision.reason}`);
166
168
  writeAlert(decision.reason, previous?.crashCount ?? 0);
169
+ // Critical-event notify (Self-Preservation Phase 1, feature 1D).
170
+ // emitCritical runs the file flag + osascript inline and, because we
171
+ // pass blockTelegram: true, sends the Telegram DM via a synchronous
172
+ // curl so the alert is attempted before the process.exit(3) below.
173
+ // A detached spawn would race the exit.
174
+ // Auto-diagnostic (feature 2F) — collect forensic bundle BEFORE
175
+ // emitCritical so the Telegram DM can reference the file path.
176
+ let bundlePath = null;
177
+ try {
178
+ bundlePath = writeDiagnosticBundle({
179
+ category: "watchdog-brake",
180
+ severity: "critical",
181
+ title: "Watchdog crash-loop brake engaged",
182
+ detail: `${decision.reason}\n` +
183
+ `Bot version: ${BOT_VERSION}`,
184
+ suggestedAction: `rm "${ALERT_FILE}" && alvin-bot launchd install`,
185
+ });
186
+ if (bundlePath) {
187
+ console.error(`[auto-diagnostic] forensic bundle written: ${bundlePath}`);
188
+ }
189
+ }
190
+ catch (err) {
191
+ console.error("[watchdog] auto-diagnostic failed:", err);
192
+ }
193
+ try {
194
+ emitCritical({
195
+ category: "watchdog-brake",
196
+ severity: "critical",
197
+ title: "Watchdog crash-loop brake engaged",
198
+ detail: `${decision.reason}\n` +
199
+ `Bot version: ${BOT_VERSION}\n` +
200
+ `The bot has stopped itself to prevent further damage.` +
201
+ (bundlePath ? `\n\nDiagnostic bundle: ${bundlePath}` : ""),
202
+ suggestedAction: `rm "${ALERT_FILE}" && alvin-bot launchd install`,
203
+ }, {
204
+ // We're about to process.exit(3). Block on the Telegram POST
205
+ // synchronously — detached spawn races the exit on macOS+launchd
206
+ // and the alert silently never lands. Adds ~1-2 s before exit;
207
+ // worth it to actually inform the user their bot just braked.
208
+ blockTelegram: true,
209
+ });
210
+ }
211
+ catch (err) {
212
+ console.error("[watchdog] critical-notify failed:", err);
213
+ }
167
214
  // checkCrashLoopBrake tries to unload the LaunchAgent so launchd stops
168
215
  // retrying. It only runs the exit path if ALERT_FILE exists, which is
169
216
  // normally true after writeAlert — but if writeAlert failed silently
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "alvin-bot",
3
- "version": "4.25.1",
3
+ "version": "4.26.0",
4
4
  "description": "Alvin Bot — Your personal AI agent on Telegram, WhatsApp, Discord, Signal, and Web.",
5
5
  "type": "module",
6
6
  "main": "dist/index.js",