alvin-bot 4.25.1 → 4.26.0
- package/CHANGELOG.md +95 -0
- package/bin/cli.js +159 -4
- package/dist/index.js +19 -0
- package/dist/services/auto-diagnostic.js +228 -0
- package/dist/services/critical-notify.js +203 -0
- package/dist/services/heartbeat-file.js +65 -0
- package/dist/services/preflight.js +292 -0
- package/dist/services/watchdog.js +47 -0
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
@@ -2,6 +2,101 @@
 
 All notable changes to Alvin Bot are documented here.
 
+## [4.26.0] — 2026-05-13
+
+### Self-Preservation Phase 1 — four new resilience features, zero hot-path cost
+
+Bot now **survives more failure modes** and **alerts you when it can't survive them**. All four features run event-driven or on low-frequency timers — no hot-path overhead, measured RSS +4 MB / cold-start +81 ms vs baseline on a real Apple Silicon Mac (within the +5 MB / +2000 ms tolerance budget).
+
+#### Pre-Flight Sanity Check at startup (feature 1A)
+
+In parallel at boot, the bot now checks: (1) Telegram `getMe`, (2) AI provider `isAvailable()` — provider-agnostic via the existing Provider interface, works equally for `claude-sdk` / `codex-cli` / `groq` / `gemini` / `offline-gemma4` / etc., (3) SQLite `PRAGMA quick_check` on the embeddings DB, (4) Disk space ≥ 1 GB. Fire-and-forget — startup is **not** delayed; results land ~1 s after `Alvin Bot started` with severity-tagged output:
+
+```
+🩺 ✅ Pre-Flight: all checks ok — 986ms total
+✓ telegram bot=@AlvinMBAM4_bot (405ms)
+✓ ai-provider claude-sdk reachable (922ms)
+✓ sqlite embeddings DB integrity ok (43ms)
+✓ disk 53.28 GB free (37ms)
+```
+
+Per-check timeouts (3 s / 5 s / 10 s / 2 s) bound the cost. Critical findings will feed Phase 2's auto-diagnostic (already wired). Opt-out: `ALVIN_DISABLE_PREFLIGHT=true`.
+
+#### Critical-Event Cross-Channel Notify (feature 1D)
+
+When the bot hits a state it can't recover from on its own — watchdog crash-loop brake engaged, repeated Telegram 409s, all providers dead, disk critically low — it now alerts the operator through a **fallback chain that doesn't depend on the bot's own platform being healthy**:
+
+1. **`~/.alvin-bot/CRITICAL.log`** — durable audit trail, always written first. Plain text, dated, machine-readable.
+2. **macOS native notification** via `osascript` — visible immediately on the user's desktop.
+3. **Telegram DM to admin** via `curl` — synchronous in exit-imminent contexts so the alert lands before `process.exit()` kills any pending I/O.
+
+The synchronous-vs-detached distinction matters: detached child processes get killed by macOS+launchd before they finish their fork-and-exec when the parent exits within a few ms. The watchdog brake explicitly uses `blockTelegram: true` to spawnSync the curl POST and confirm the HTTP response code. Plain-text body (not Markdown) so shell-command `suggestedAction`s with `"`, `&&`, etc. don't trigger Telegram's `Bad Request: can't parse entities` error. Opt-out: `ALVIN_DISABLE_CRITICAL_NOTIFY=true`.
+
+#### Zombie Dead-Man-Switch (feature 2E)
+
+Bot writes a unix-timestamp heartbeat to `~/.alvin-bot/heartbeat.txt` every 60 s. A **separate, tiny launchd LaunchAgent** (`com.alvinbot.deadman`) wakes every 5 min and checks the heartbeat — if older than 10 min, the watcher fires `launchctl kickstart -k gui/$UID/com.alvinbot.app` to force-restart.
+
+Catches the failure mode the in-process watchdog **cannot** see: process is alive but frozen (event-loop deadlock, blocked I/O, native-binding hang). The in-process watchdog can't detect its own death — that's a contradiction in terms — so the external observer is the only architecturally sound solution.
+
+Threshold overridable for testing: `ALVIN_DEADMAN_THRESHOLD_SEC=60` (default 600). End-to-end verified on a real Mac: `kill -STOP` froze the bot at PID X, watcher detected stale heartbeat 700 s old, kickstart fired, fresh PID Y came up within 8 s. CPU cost of the watcher: 0.017 %.
+
+#### Auto-Diagnostic Logs-Collector (feature 2F)
+
+On any critical failure, the bot now writes a structured forensic Markdown bundle to `~/.alvin-bot/diagnostics/<timestamp>-<category>.md` containing:
+
+1. Event detail + suggested action
+2. Process state (PID, RSS, heap, uptime, node version, platform, argv)
+3. Non-secret environment vars (PATH, PRIMARY_PROVIDER, FALLBACK_PROVIDERS, WEB_*, …)
+4. Last 200 lines of `alvin-bot.err.log`
+5. Last 200 lines of `alvin-bot.out.log`
+6. Watchdog state (`~/.alvin-bot/state/watchdog.json`)
+7. System tool inventory (`node`, `npm`, `brew`, `pm2`, `codex`, `claude`, `yt-dlp`, `ffmpeg`, `wacli`, `agent-browser`)
+8. Disk space (`df -h ~/.alvin-bot`)
+9. PM2 status (if PM2 installed — the same kind of state that bit us in 4.25.1)
+
+Bundles are ~18 KB each, capped at 50 retained files (oldest pruned automatically). The Telegram DM from feature 1D now includes the bundle path so the operator can immediately `cat` or scp it.
+
+This is also the data input the 5.0.0 AI-Self-Diagnosis (feature 3I) will feed to a sub-agent for automated analysis. As a 4.26.0 deliverable it stands on its own as "human-readable forensic dump".
+
+Opt-out: `ALVIN_DISABLE_AUTO_DIAGNOSTIC=true`.
+
+### Bundle wacli (WhatsApp CLI) with conditional opt-in
+
+`wacli` (https://wacli.sh, brew tap `steipete/tap`, v0.8.1, ~25 MB Go binary) is now part of `BOOTSTRAP_TOOLS` — but with a **hybrid install condition** that avoids forcing it onto users who don't use WhatsApp:
+
+- **If `wacli` is already installed** → bootstrap runs `brew upgrade wacli` (treated like any other bundled tool).
+- **If `WHATSAPP_ENABLED=true` is set in `.env`** → bootstrap installs via `brew install steipete/tap/wacli`.
+- **Otherwise** → silent skip with dimmer `·` icon: `· wacli (WhatsApp CLI) skipped (not opted in)`.
+
+License: see https://wacli.sh — alvin-bot does not bundle wacli, only invokes the user's brew, the user remains the licensee. macOS only (no Linux build upstream; bootstrap skips on Linux automatically).
+
+### Opt-out env vars summary
+
+For users who want minimal footprint:
+
+```
+ALVIN_DISABLE_SELF_PRESERVATION=true # skip ALL Phase-1 features
+ALVIN_DISABLE_PREFLIGHT=true # skip Pre-Flight only
+ALVIN_DISABLE_CRITICAL_NOTIFY=true # skip cross-channel notify
+ALVIN_DISABLE_DEAD_MAN=true # skip heartbeat writer
+ALVIN_DISABLE_AUTO_DIAGNOSTIC=true # skip diagnostic bundles
+ALVIN_DEADMAN_THRESHOLD_SEC=600 # tune dead-man threshold (default 10 min)
+```
+
+### Performance budget verified on real hardware
+
+End-to-end measurements on Apple Silicon Mac (.75 test box):
+
+| Metric | Baseline 4.25.1 | 4.26.0 | Δ | Tolerance |
+|---|---|---|---|---|
+| Cold-start ready (median, throttled) | 5023 ms | 5104 ms | +81 ms | +2000 ms |
+| Cold-start ready (unthrottled, 1st run) | 2189 ms | 2170 ms | -19 ms | +2000 ms |
+| RSS idle steady-state | ~102 MB | 106.4 MB | +4.4 MB | +5 MB |
+| CPU idle | 0.0 % | 0.0 % | 0 | +0.1 % |
+| Log dir growth | stable | stable | n/a | <1 KB/s |
+
+All five metrics within tolerance.
+
 ## [4.25.1] — 2026-05-13
 
 ### Fixed: `alvin-bot launchd install` now persists the PM2 cleanup
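The Pre-Flight entry in the changelog above describes checks that run in parallel, each bounded by its own timeout, without delaying startup. A minimal sketch of that pattern follows. It assumes nothing about the real `preflight.js` beyond what the changelog states; the names `withTimeout` and `runChecks` and the shape of the check objects are hypothetical.

```javascript
// Hypothetical sketch, not the package's preflight.js.
// Rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so a fast check does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Runs all checks concurrently; one slow or failing check never blocks the others.
async function runChecks(checks) {
  const started = Date.now();
  const settled = await Promise.allSettled(
    checks.map(async ({ name, timeoutMs, run }) => {
      const t0 = Date.now();
      const detail = await withTimeout(run(), timeoutMs, name);
      return { name, detail, ms: Date.now() - t0 };
    })
  );
  const results = settled.map((s, i) =>
    s.status === "fulfilled"
      ? { ok: true, ...s.value }
      : { ok: false, name: checks[i].name, detail: String(s.reason?.message ?? s.reason) }
  );
  return { ok: results.every((r) => r.ok), totalMs: Date.now() - started, results };
}

// Fire-and-forget usage with a stub check (a real check would call Telegram, the provider, etc.).
runChecks([{ name: "disk", timeoutMs: 2000, run: async () => "53.28 GB free" }])
  .then((report) => console.log(report.ok ? "all checks ok" : "degraded", report.results))
  .catch((err) => console.warn("pre-flight threw:", err?.message || err));
```

Because the call is not awaited, a hung check costs at most its own timeout and startup proceeds regardless, which matches the "startup is not delayed" claim in the changelog.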
package/bin/cli.js
CHANGED
@@ -272,6 +272,24 @@ const BOOTSTRAP_TOOLS = [
     install: { macos: "brew install ffmpeg", linux: "sudo apt-get install -y ffmpeg" },
     upgrade: { macos: "brew upgrade ffmpeg", linux: "sudo apt-get install --only-upgrade -y ffmpeg" },
   },
+  {
+    // wacli — WhatsApp CLI from steipete/tap. Hybrid bootstrap: only
+    // install/upgrade if the user has already installed it (we
+    // respect their existing setup) or has explicitly opted in via
+    // WHATSAPP_ENABLED=true in .env. This avoids pulling a ~25 MB
+    // Go binary onto every public user's machine, including those
+    // who never touch WhatsApp.
+    cmd: "wacli",
+    name: "wacli (WhatsApp CLI)",
+    license: "see https://wacli.sh — installed via your own brew, you remain the licensee",
+    install: { macos: "brew install steipete/tap/wacli", linux: null },
+    upgrade: { macos: "brew upgrade wacli", linux: null },
+    // Hybrid: only bootstrap if the user has explicitly signalled
+    // interest. installCondition is checked BEFORE any install/upgrade
+    // attempt; returns false → tool silently skipped.
+    installCondition: (env) =>
+      hasCommand("wacli") || env.WHATSAPP_ENABLED === "true",
+  },
 ];
 
 // Memoized: `brew update` is slow (5-30s) but needs to run at least once
@@ -309,6 +327,22 @@ function detectPlatformPm() {
 function bootstrapOneTool(tool, platform) {
   const cmdAvailable = hasCommand(tool.cmd);
 
+  // installCondition: optional gate that respects user intent. A tool with
+  // installCondition returning false is treated as "user hasn't opted in,
+  // silently skip". This is how wacli avoids forcing a 25 MB WhatsApp CLI
+  // onto every public user — only installs if they have it already or
+  // explicitly set WHATSAPP_ENABLED=true in .env.
+  if (typeof tool.installCondition === "function") {
+    try {
+      if (!tool.installCondition(process.env)) {
+        return { ok: true, skipped: true, message: `${tool.name} skipped (not opted in)` };
+      }
+    } catch {
+      // condition function threw — be defensive, skip
+      return { ok: true, skipped: true, message: `${tool.name} skipped (condition error)` };
+    }
+  }
+
   // Linux-only prerequisite check (e.g. pipx for yt-dlp).
   if (platform === "linux" && tool.linuxSkipIf && !hasCommand(tool.linuxSkipIf)) {
     return { ok: false, message: `${tool.name} skipped — needs '${tool.linuxSkipIf}' on Linux` };
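The gate added to `bootstrapOneTool` above has three outcomes for an opt-in tool: skip, install, or upgrade. The sketch below restates that decision in isolation. `decideBootstrap`, the `installed` set and the stubbed `hasCommand` are hypothetical stand-ins, and the "upgrade when present, otherwise install" branch is an assumption drawn from the changelog wording, since the rest of `bootstrapOneTool` is not part of this diff.

```javascript
// Hypothetical stand-in for the CLI's hasCommand(); the real one probes PATH.
const installed = new Set(["ffmpeg"]);
const hasCommand = (cmd) => installed.has(cmd);

const wacli = {
  cmd: "wacli",
  // Same condition as the BOOTSTRAP_TOOLS entry in the diff.
  installCondition: (env) => hasCommand("wacli") || env.WHATSAPP_ENABLED === "true",
};
const ffmpeg = { cmd: "ffmpeg" }; // no gate: always bootstrapped

// Illustrative name; cli.js has no function called decideBootstrap.
function decideBootstrap(tool, env) {
  if (typeof tool.installCondition === "function") {
    try {
      if (!tool.installCondition(env)) return "skip";
    } catch {
      return "skip"; // a throwing condition is treated like "not opted in"
    }
  }
  return hasCommand(tool.cmd) ? "upgrade" : "install";
}

console.log(decideBootstrap(wacli, {}));                           // skip
console.log(decideBootstrap(wacli, { WHATSAPP_ENABLED: "true" })); // install
console.log(decideBootstrap(ffmpeg, {}));                          // upgrade
```

Passing the environment in as an argument, as the diff does with `process.env`, keeps the condition easy to exercise with a plain object.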
@@ -376,12 +410,12 @@ async function ensureBootstrapTools(opts = {}) {
   const platform = detectPlatformPm();
   if (!platform) return;
 
-  console.log("\n🎬 Setting up
+  console.log("\n🎬 Setting up bundled tools (yt-dlp, ffmpeg, wacli on opt-in)...");
 
   // macOS needs brew on PATH — same trick as ensureBrewOnPath() uses.
   if (platform === "macos" && !hasCommand("brew")) {
     if (!ensureBrewOnPath()) {
-      console.log(" ⚠️  Skipping
+      console.log(" ⚠️  Skipping tool bootstrap — Homebrew not installed.");
       console.log(" To enable: install brew from https://brew.sh and re-run setup.");
       return;
     }
@@ -389,7 +423,9 @@ async function ensureBootstrapTools(opts = {}) {
 
   for (const tool of BOOTSTRAP_TOOLS) {
     const result = bootstrapOneTool(tool, platform);
-
+    // skipped (opt-in not signaled) → use dimmer icon, less attention-grabbing
+    const icon = result.skipped ? "·" : result.ok ? "✓" : "⚠";
+    console.log(` ${icon} ${result.message}`);
   }
   console.log("");
 }
@@ -2688,7 +2724,80 @@ function launchdPaths() {
   const entryPoint = resolve(join(import.meta.dirname, "..", "dist", "index.js"));
   const cwd = resolve(join(import.meta.dirname, ".."));
   const nodePath = process.execPath;
-
+  // Dead-man-switch watcher (Self-Preservation Phase 1, feature 2E).
+  // Separate, tiny LaunchAgent that fires every 5 min and force-restarts
+  // the main bot if its heartbeat-file is stale. The two agents are
+  // intentionally independent: if the main bot is wedged, the dead-man
+  // agent is still scheduling and reading the file.
+  const deadmanLabel = "com.alvinbot.deadman";
+  const deadmanPlistPath = join(home, "Library", "LaunchAgents", `${deadmanLabel}.plist`);
+  return { home, label, plistPath, logDir, entryPoint, cwd, nodePath, deadmanLabel, deadmanPlistPath };
+}
+
+/**
+ * Generate the dead-man watcher LaunchAgent plist. It runs a tiny shell
+ * script every 5 minutes (StartInterval) that compares the bot's
+ * heartbeat-file timestamp against now. If the heartbeat is more than
+ * 10 minutes stale, it `launchctl kickstart -k`s the main bot.
+ *
+ * The threshold is overridable via ALVIN_DEADMAN_THRESHOLD_SEC for
+ * testing; default is 600 s = 10 minutes.
+ *
+ * Why inline shell instead of a bundled script:
+ *   - Zero extra files to ship via npm
+ *   - Trivial to audit: 12 lines of POSIX sh
+ *   - No PATH dependency (uses absolute /bin paths)
+ */
+function renderDeadmanPlist({ deadmanLabel, mainLabel, home, logDir }) {
+  // Inline shell — kept POSIX-clean, uses only built-ins + launchctl.
+  // The redirect to logDir/deadman.log gives us a record of any
+  // kickstart actions without the watcher writing more than ~50
+  // bytes per event.
+  const script = `
+HEARTBEAT="${home}/.alvin-bot/heartbeat.txt"
+LOG="${logDir}/deadman.log"
+THRESHOLD="\${ALVIN_DEADMAN_THRESHOLD_SEC:-600}"
+if [ ! -f "$HEARTBEAT" ]; then exit 0; fi
+LAST=$(cat "$HEARTBEAT" 2>/dev/null | tr -d ' \\n')
+NOW=$(date +%s)
+case "$LAST" in
+''|*[!0-9]*) exit 0 ;;
+esac
+DIFF=$((NOW - LAST))
+if [ "$DIFF" -gt "$THRESHOLD" ]; then
+echo "$(date -u +%FT%TZ) deadman: heartbeat $DIFF s old (> $THRESHOLD s), kickstarting ${mainLabel}" >> "$LOG"
+/bin/launchctl kickstart -k "gui/$(id -u)/${mainLabel}" 2>>"$LOG" || true
+fi
+`.trim();
+
+  return `<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+<key>Label</key>
+<string>${deadmanLabel}</string>
+
+<key>ProgramArguments</key>
+<array>
+<string>/bin/sh</string>
+<string>-c</string>
+<string>${script.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;")}</string>
+</array>
+
+<key>StartInterval</key>
+<integer>300</integer>
+
+<key>RunAtLoad</key>
+<false/>
+
+<key>StandardErrorPath</key>
+<string>${logDir}/deadman.err.log</string>
+
+<key>LimitLoadToSessionType</key>
+<string>Aqua</string>
+</dict>
+</plist>
+`;
 }
 
 async function launchdInstall() {
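The plist generator above embeds a shell script in an XML `<string>` element, so any `&`, `<` or `>` in the script has to be replaced by its entity first (the script contains `>>` redirects and `2>/dev/null`). A small sketch of that escaping step follows; `escapeXmlText` is a hypothetical helper name, since the diff inlines the three `replace` calls rather than naming a function.

```javascript
// Hypothetical helper: escapes text for use inside an XML element body.
// The ampersand is handled first so the "&" of a freshly produced "&lt;"
// is not escaped a second time.
function escapeXmlText(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const line = 'cmd 2>>"$LOG" || true';
console.log(escapeXmlText(line)); // cmd 2&gt;&gt;"$LOG" || true
```

Quotes are left alone here because they only need escaping inside attribute values, not in element text.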
@@ -2830,6 +2939,38 @@ async function launchdInstall() {
     console.log(` protected files. (Granted path: ${fda.realNodePath})`);
   }
 
+  // ── Dead-Man-Switch (Self-Preservation Phase 1, feature 2E) ──────────
+  // Install a second tiny LaunchAgent that wakes every 5 min and force-
+  // restarts the main bot if its heartbeat-file is stale. Catches "process
+  // alive but frozen" — event-loop deadlocks, blocked I/O, etc. — that
+  // the in-process watchdog can't see.
+  // Opt-out: ALVIN_DISABLE_DEAD_MAN=true or ALVIN_DISABLE_SELF_PRESERVATION=true.
+  if (
+    process.env.ALVIN_DISABLE_DEAD_MAN !== "true" &&
+    process.env.ALVIN_DISABLE_SELF_PRESERVATION !== "true"
+  ) {
+    const { deadmanLabel, deadmanPlistPath } = launchdPaths();
+    const deadmanPlist = renderDeadmanPlist({
+      deadmanLabel,
+      mainLabel: label,
+      home,
+      logDir,
+    });
+    writeFileSync(deadmanPlistPath, deadmanPlist, { mode: 0o644 });
+    console.log("");
+    console.log(`📝 Wrote ${deadmanPlistPath}`);
+    try {
+      execSync(`launchctl bootout gui/$(id -u)/${deadmanLabel} 2>/dev/null || true`, { stdio: "pipe" });
+    } catch {}
+    try {
+      execSync(`launchctl bootstrap gui/$(id -u) "${deadmanPlistPath}"`, { stdio: "pipe" });
+      console.log("🛡️  Dead-man watcher active — checks every 5 min, force-restarts main bot if heartbeat > 10 min stale.");
+    } catch (err) {
+      console.log(`⚠️  Dead-man watcher load failed (non-fatal): ${err.message?.split("\n")[0] || err}`);
+      console.log(" The main bot still works; only zombie-detection is disabled.");
+    }
+  }
+
   process.exit(0);
 }
 
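The watcher installed above restarts the bot only when the heartbeat file holds a numeric timestamp older than the threshold; a missing, empty or malformed heartbeat is ignored. The same decision is restated below in JavaScript as a sketch. The shipped watcher is inline POSIX sh, and `heartbeatAction` with its return values is a hypothetical name chosen for illustration.

```javascript
// Hypothetical restatement of the watcher's shell logic.
// rawHeartbeat: file contents (or undefined if the file is missing)
// nowSec: current unix time in seconds; thresholdSec: staleness limit
function heartbeatAction(rawHeartbeat, nowSec, thresholdSec = 600) {
  // Mirrors `tr -d ' \n'`: drop spaces and newlines before validating.
  const last = String(rawHeartbeat ?? "").replace(/[ \n]/g, "");
  // Mirrors the shell `case`: empty or non-digit content means "do nothing".
  if (last === "" || /[^0-9]/.test(last)) return "ignore";
  const ageSec = nowSec - Number(last);
  // Strictly greater than, like `[ "$DIFF" -gt "$THRESHOLD" ]`.
  return ageSec > thresholdSec ? "kickstart" : "ok";
}

console.log(heartbeatAction("1000\n", 1700)); // kickstart (700 s old)
console.log(heartbeatAction("1000\n", 1600)); // ok (exactly 600 s is not stale)
console.log(heartbeatAction("garbage", 1700)); // ignore
```

Treating unreadable input as "ignore" rather than "kickstart" means a half-written or corrupted heartbeat file cannot by itself trigger a restart.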
@@ -2859,6 +3000,20 @@ async function launchdUninstall() {
     console.log(`⚠️  Could not remove plist: ${err.message}`);
   }
 
+  // Dead-Man watcher (feature 2E) — also remove its companion plist.
+  const { deadmanLabel, deadmanPlistPath } = launchdPaths();
+  if (existsSync(deadmanPlistPath)) {
+    try {
+      execSync(`launchctl bootout gui/$(id -u)/${deadmanLabel} 2>/dev/null || true`, { stdio: "pipe" });
+    } catch {}
+    try {
+      execSync(`rm -f "${deadmanPlistPath}"`);
+      console.log(`🗑  Removed ${deadmanPlistPath} (dead-man watcher)`);
+    } catch (err) {
+      console.log(`⚠️  Could not remove dead-man plist: ${err.message}`);
+    }
+  }
+
   console.log("");
   console.log("✅ alvin-bot is no longer a launchd user agent.");
   process.exit(0);
package/dist/index.js
CHANGED
@@ -204,6 +204,17 @@ if (hasProvider) {
 else {
     console.warn("⚠️  Engine not initialized — no AI provider configured.");
 }
+// Pre-Flight Sanity Check (Self-Preservation Phase 1, feature 1A) —
+// runs in parallel, fire-and-forget. Does NOT block startup.
+// Catches misconfigurations + degraded state at boot time.
+import("./services/preflight.js")
+    .then(({ runPreFlight, formatPreFlightReport }) => runPreFlight(config.botToken, registry).then((report) => {
+    console.log(formatPreFlightReport(report));
+}))
+    .catch((err) => {
+    // Pre-Flight itself must never crash the bot.
+    console.warn("⚠️  Pre-Flight check threw:", err?.message || err);
+});
 // Load plugins
 const pluginResult = await loadPlugins();
 if (pluginResult.loaded.length > 0) {
@@ -527,6 +538,14 @@ setNotifyCallback(async (target, text) => {
     enqueue(target.platform, String(target.chatId), text);
 });
 startScheduler();
+// Heartbeat-file writer (Self-Preservation Phase 1, feature 2E).
+// Writes ~/.alvin-bot/heartbeat.txt every 60 s so an external
+// dead-man-watch launchd agent can detect "process alive but frozen"
+// and force-restart the bot. Catches event-loop deadlocks that the
+// in-process watchdog cannot see.
+import("./services/heartbeat-file.js").then(({ startHeartbeatWriter }) => {
+    startHeartbeatWriter();
+});
 // Start the async-agent watcher (Fix #17 Stage 2). Polls outputFiles
 // of background sub-agents Claude launched with run_in_background and
 // delivers their completed reports as separate Telegram messages.
package/dist/services/auto-diagnostic.js
@@ -0,0 +1,228 @@
+/**
+ * Auto-Diagnostic Logs-Collector (Self-Preservation Phase 1, feature 2F).
+ *
+ * On critical failure, write a structured Markdown "forensic bundle" to
+ * ~/.alvin-bot/diagnostics/<timestamp>-<category>.md containing:
+ *
+ *   - Bot version + boot info
+ *   - Last 200 lines of out.log + err.log
+ *   - Current process state (PID, RSS, uptime, node version, platform)
+ *   - Non-secret environment vars (PATH, PRIMARY_PROVIDER, …)
+ *   - Watchdog state (~/.alvin-bot/state/watchdog.json)
+ *   - System tool inventory (which node/codex/claude/pm2/yt-dlp/…)
+ *   - Disk space snapshot
+ *   - The triggering event itself + suggestion
+ *
+ * The bundle is the input that the 5.0.0 AI-Diagnostic feature (3I) will
+ * later feed to a sub-agent for automated analysis. As of 4.26.0 it's a
+ * "human-readable forensic dump" — useful on its own, no AI required.
+ *
+ * Auto-prune: max 50 retained bundles, oldest deleted on next write.
+ *
+ * Performance: <100KB per bundle, ~50-200ms wall-clock per write,
+ * synchronous (we're typically called right before process.exit so
+ * blocking is the right semantic). Files are atomic — full bundle or
+ * nothing.
+ *
+ * Opt-out:
+ *   ALVIN_DISABLE_AUTO_DIAGNOSTIC=true → skip bundle writes
+ *   ALVIN_DISABLE_SELF_PRESERVATION=true → skip ALL Phase-1
+ */
+import { writeFileSync, readFileSync, mkdirSync, existsSync, readdirSync, statSync, unlinkSync, } from "fs";
+import { join } from "path";
+import { homedir } from "os";
+import { execSync } from "child_process";
+import { BOT_VERSION } from "../version.js";
+const MAX_BUNDLES = 50;
+function isDisabled() {
+    return (process.env.ALVIN_DISABLE_AUTO_DIAGNOSTIC === "true" ||
+        process.env.ALVIN_DISABLE_SELF_PRESERVATION === "true");
+}
+function safeReadTail(filename, n) {
+    try {
+        const path = join(homedir(), ".alvin-bot", "logs", filename);
+        if (!existsSync(path))
+            return "(log file not present)";
+        const content = readFileSync(path, "utf-8");
+        const lines = content.split("\n");
+        return lines.slice(Math.max(0, lines.length - n)).join("\n");
+    }
+    catch (err) {
+        return `(read failed: ${err instanceof Error ? err.message : String(err)})`;
+    }
+}
+function safeShell(cmd, timeoutMs = 5000) {
+    try {
+        return execSync(cmd, { encoding: "utf-8", timeout: timeoutMs, stdio: ["ignore", "pipe", "pipe"] }).trim();
+    }
+    catch (err) {
+        const e = err;
+        const out = e.stdout?.toString().trim() ?? "";
+        const stderr = e.stderr?.toString().trim() ?? "";
+        if (out)
+            return out + (stderr ? `\n[stderr]: ${stderr}` : "");
+        return `(command failed: ${e.message || "unknown"})`;
+    }
+}
+function safeReadFile(path) {
+    try {
+        return readFileSync(path, "utf-8").trim();
+    }
+    catch (err) {
+        return `(could not read ${path}: ${err instanceof Error ? err.message : String(err)})`;
+    }
+}
+/**
+ * Prune diagnostic bundles older than MAX_BUNDLES (50). Oldest deleted
+ * first by mtime. Best-effort: silent on errors.
+ */
+export function pruneDiagnostics(maxKeep = MAX_BUNDLES) {
+    try {
+        const dir = join(homedir(), ".alvin-bot", "diagnostics");
+        if (!existsSync(dir))
+            return;
+        const files = readdirSync(dir)
+            .filter((f) => f.endsWith(".md"))
+            .map((f) => {
+            try {
+                return { name: f, mtime: statSync(join(dir, f)).mtimeMs };
+            }
+            catch {
+                return { name: f, mtime: 0 };
+            }
+        })
+            .sort((a, b) => b.mtime - a.mtime);
+        for (const f of files.slice(maxKeep)) {
+            try {
+                unlinkSync(join(dir, f.name));
+            }
+            catch {
+                /* best-effort */
+            }
+        }
+    }
+    catch {
+        /* never fail the caller */
+    }
+}
+/**
+ * Write a diagnostic bundle for the given event. Returns the absolute
+ * path to the written file, or null if disabled / failed.
+ *
+ * Safe to call from any context — never throws. Side-effects:
+ *   - Creates ~/.alvin-bot/diagnostics/ if absent
+ *   - Writes a single ~50-100KB markdown file
+ *   - Prunes to MAX_BUNDLES retained
+ */
+export function writeDiagnosticBundle(event) {
+    if (isDisabled())
+        return null;
+    try {
+        const dir = join(homedir(), ".alvin-bot", "diagnostics");
+        mkdirSync(dir, { recursive: true });
+        const ts = (event.ts || new Date()).toISOString().replace(/[:.]/g, "-");
+        const filename = `${ts}-${event.category}.md`;
+        const filepath = join(dir, filename);
+        const mem = process.memoryUsage();
+        const rssMB = Math.round(mem.rss / 1024 / 1024);
+        const heapMB = Math.round(mem.heapUsed / 1024 / 1024);
+        const sections = [
+            `# Alvin Bot — Diagnostic Bundle`,
+            ``,
+            `**Generated:** ${new Date().toISOString()}`,
+            `**Bot version:** ${BOT_VERSION}`,
+            `**Trigger category:** ${event.category}`,
+            `**Severity:** ${event.severity}`,
+            `**Title:** ${event.title}`,
+            ``,
+            `## 1. Event Detail`,
+            ``,
+            "```",
+            event.detail,
+            "```",
+            ``,
+            ...(event.suggestedAction
+                ? [`### Suggested action`, ``, "```", event.suggestedAction, "```", ``]
+                : []),
+            `## 2. Process State`,
+            ``,
+            `- PID: ${process.pid}`,
+            `- RSS memory: ${rssMB} MB`,
+            `- Heap used: ${heapMB} MB`,
+            `- Uptime: ${Math.round(process.uptime())} s`,
+            `- Node.js: ${process.version}`,
+            `- Platform: ${process.platform} (${process.arch})`,
+            `- argv: ${process.argv.join(" ")}`,
+            ``,
+            `## 3. Environment (non-secret only)`,
+            ``,
+            ...[
+                "NODE_ENV",
+                "HOME",
+                "PATH",
+                "PRIMARY_PROVIDER",
+                "FALLBACK_PROVIDERS",
+                "AUTH_MODE",
+                "SESSION_MODE",
+                "WEB_HOST",
+                "WEB_PORT",
+                "WORKING_DIR",
+                "MAX_BUDGET_USD",
+                "ALVIN_DATA_DIR",
+                "ALVIN_DEADMAN_THRESHOLD_SEC",
+                "ALVIN_DISABLE_SELF_PRESERVATION",
+            ].map((key) => `- ${key}: ${process.env[key] ?? "(unset)"}`),
+            ``,
+            `## 4. Recent stderr (last 200 lines)`,
+            ``,
+            "```",
+            safeReadTail("alvin-bot.err.log", 200),
+            "```",
+            ``,
+            `## 5. Recent stdout (last 200 lines)`,
+            ``,
+            "```",
+            safeReadTail("alvin-bot.out.log", 200),
+            "```",
+            ``,
+            `## 6. Watchdog state`,
+            ``,
+            "```json",
+            safeReadFile(join(homedir(), ".alvin-bot", "state", "watchdog.json")),
+            "```",
+            ``,
+            `## 7. System tool inventory`,
+            ``,
+            "```",
+            safeShell("for t in node npm brew pm2 codex claude yt-dlp ffmpeg wacli agent-browser; do printf '%-15s %s\\n' \"$t\" \"$(command -v $t 2>/dev/null || echo NOT_FOUND)\"; done"),
+            "```",
+            ``,
+            `## 8. Disk space (.alvin-bot data dir)`,
+            ``,
+            "```",
+            safeShell(`df -h "${join(homedir(), ".alvin-bot")}" 2>&1 | head -2`),
+            "```",
+            ``,
+            `## 9. PM2 status (if installed)`,
+            ``,
+            "```",
+            safeShell("command -v pm2 >/dev/null && pm2 jlist 2>/dev/null | head -50 || echo 'pm2 not installed'", 3000),
+            "```",
+            ``,
+            `---`,
+            ``,
+            `*This bundle was generated automatically by the Alvin Bot auto-diagnostic system.*`,
+            `*Set \`ALVIN_DISABLE_AUTO_DIAGNOSTIC=true\` in ~/.alvin-bot/.env to opt out.*`,
+            ``,
+        ];
+        writeFileSync(filepath, sections.join("\n"), { mode: 0o600 });
+        pruneDiagnostics();
+        return filepath;
+    }
+    catch (err) {
+        // Diagnostic writer must not be a new failure mode. Log to stderr
+        // (which the critical-notify file flag will reference) and bail.
+        console.error(`[auto-diagnostic] failed to write bundle: ${err instanceof Error ? err.message : String(err)}`);
+        return null;
+    }
+}
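The retention rule in `pruneDiagnostics` above (keep the newest 50 `.md` bundles by mtime, delete the rest) can be isolated as a pure function, which makes it testable without touching the filesystem. The sketch below does that; `selectPrunable` and its input shape of `{ name, mtime }` objects are assumptions for illustration and are not part of the package.

```javascript
// Hypothetical pure version of the retention rule: given directory entries
// with modification times, return the names that should be deleted.
function selectPrunable(files, maxKeep = 50) {
  return files
    .filter((f) => f.name.endsWith(".md")) // only diagnostic bundles
    .sort((a, b) => b.mtime - a.mtime)     // newest first
    .slice(maxKeep)                        // everything past the newest maxKeep
    .map((f) => f.name);
}

const entries = [
  { name: "a.md", mtime: 100 },
  { name: "b.md", mtime: 300 },
  { name: "c.md", mtime: 200 },
  { name: "notes.txt", mtime: 1 },
];
console.log(selectPrunable(entries, 2)); // [ 'a.md' ]
```

Because `filter` returns a new array, the in-place `sort` does not reorder the caller's list. Entries whose `stat` failed get `mtime: 0` in the diff's version, so they sort last and are pruned first.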
package/dist/services/critical-notify.js
@@ -0,0 +1,203 @@
+/**
+ * Critical-Event Cross-Channel Notify (Self-Preservation Phase 1, feature 1D).
+ *
+ * When something genuinely critical happens — watchdog brake engaged,
+ * repeated Telegram 409s, all providers dead, disk full, memory blow-up —
+ * deliver the alert through a fallback chain so the user actually finds
+ * out even if Telegram (the primary channel) is itself the failure mode.
+ *
+ * Channel cascade — ALL fire, in order of preference:
+ *   1. File flag at ~/.alvin-bot/CRITICAL.log [durable audit trail, always written]
+ *   2. macOS native notification (osascript) [if darwin, visible immediately]
+ *   3. Telegram DM to admin (detached curl) [survives process exit via spawn+unref]
+ *
+ * Order is deliberate: we ALWAYS persist the audit (1) first, so even
+ * if the process crashes mid-notify we have a forensic record. Then we
+ * try the user-facing channels (2, 3) best-effort.
+ *
+ * The Telegram channel uses a detached child `curl` process precisely
+ * because critical events often come paired with process.exit() — most
+ * notably the watchdog brake. A normal in-process fetch() wouldn't
+ * survive parent termination. `spawn + detached + unref` does.
+ *
+ * Performance: ZERO steady-state overhead. Only the file-flag write
+ * runs at all, and only when emitCritical() is called.
+ *
+ * Opt-out:
+ *   ALVIN_DISABLE_CRITICAL_NOTIFY=true → skip Tier 1/2/3 entirely
+ *   ALVIN_DISABLE_SELF_PRESERVATION=true → skip ALL Phase-1 features
+ */
+import { spawn, execFileSync, spawnSync } from "child_process";
+import { appendFileSync, mkdirSync } from "fs";
+import { join } from "path";
+import { homedir } from "os";
+function isDisabled() {
+    return (process.env.ALVIN_DISABLE_CRITICAL_NOTIFY === "true" ||
+        process.env.ALVIN_DISABLE_SELF_PRESERVATION === "true");
+}
+function resolveOptions(opts) {
+    const botToken = opts?.botToken ?? process.env.BOT_TOKEN ?? undefined;
+    let adminChatId = opts?.adminChatId;
+    if (adminChatId === undefined && process.env.ALLOWED_USERS) {
+        const first = process.env.ALLOWED_USERS.split(",")[0]?.trim();
+        if (first) {
+            const parsed = parseInt(first, 10);
+            if (Number.isFinite(parsed))
+                adminChatId = parsed;
+        }
+    }
+    return { botToken, adminChatId };
+}
+// ── Tier 3: Durable file flag — ALWAYS written first ──────────────────────
+function writeFileFlag(event) {
+    try {
+        const dir = join(homedir(), ".alvin-bot");
+        mkdirSync(dir, { recursive: true });
+        const path = join(dir, "CRITICAL.log");
+        const ts = (event.ts || new Date()).toISOString();
+        const block = [
+            `[${ts}] ${event.severity.toUpperCase()} ${event.category}`,
+            ` ${event.title}`,
+            ...event.detail.split("\n").map((l) => ` ${l}`),
+            ...(event.suggestedAction ? [` Suggested: ${event.suggestedAction}`] : []),
+            "",
+        ].join("\n");
+        appendFileSync(path, block);
+        return true;
+    }
+    catch {
|
69
|
+
return false;
|
|
70
|
+
}
|
|
71
|
+
}
|
|
72
|
+
// ── Tier 2: macOS native notification (silent on Linux/Windows) ───────────
|
|
73
|
+
function macosNotification(event) {
|
|
74
|
+
if (process.platform !== "darwin")
|
|
75
|
+
return false;
|
|
76
|
+
try {
|
|
77
|
+
// Escape backslashes first, then embedded double-quotes, for the AppleScript string literal
|
|
78
|
+
const message = `${event.title} — ${event.detail.split("\n")[0]}`.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
|
|
79
|
+
const title = `Alvin Bot ${event.severity === "critical" ? "🚨" : "⚠️"}`;
|
|
80
|
+
execFileSync("osascript", ["-e", `display notification "${message}" with title "${title}"`], { timeout: 3000, stdio: "pipe" });
|
|
81
|
+
return true;
|
|
82
|
+
}
|
|
83
|
+
catch {
|
|
84
|
+
return false;
|
|
85
|
+
}
|
|
86
|
+
}
|
|
87
|
+
// ── Tier 1: Telegram DM to admin via detached curl ────────────────────────
|
|
88
|
+
//
|
|
89
|
+
// Why detached + curl instead of in-process fetch:
|
|
90
|
+
// - emitCritical() is sometimes called moments before process.exit()
|
|
91
|
+
// (notably from the watchdog brake path). In-process async work
|
|
92
|
+
// would be cancelled.
|
|
93
|
+
// - A detached child with stdio:'ignore' + unref() outlives its parent
|
|
94
|
+
// and is the standard pattern for "survive my own death" notifications.
|
|
95
|
+
// - curl is universally available on macOS + Linux. No node-only deps.
|
|
96
|
+
function telegramAdminDM(event, opts) {
|
|
97
|
+
if (!opts.botToken || !opts.adminChatId)
|
|
98
|
+
return false;
|
|
99
|
+
// Plain text — NOT Markdown. Critical events frequently contain shell
|
|
100
|
+
// commands in `suggestedAction` (paths with quotes, `&&` chains, etc.)
|
|
101
|
+
// which break Telegram's Markdown parser with HTTP 400. Reliability >
|
|
102
|
+
// visual prettiness for an alarm channel. The emoji prefix already
|
|
103
|
+
// makes it visually obvious.
|
|
104
|
+
const lines = [
|
|
105
|
+
`🚨 Alvin Bot — ${event.severity.toUpperCase()}`,
|
|
106
|
+
"",
|
|
107
|
+
event.title,
|
|
108
|
+
"",
|
|
109
|
+
event.detail,
|
|
110
|
+
];
|
|
111
|
+
if (event.suggestedAction) {
|
|
112
|
+
lines.push("", `Suggested: ${event.suggestedAction}`);
|
|
113
|
+
}
|
|
114
|
+
const text = lines.join("\n");
|
|
115
|
+
const curlArgs = [
|
|
116
|
+
"-s",
|
|
117
|
+
"-o", "/dev/null",
|
|
118
|
+
"-X", "POST",
|
|
119
|
+
"--max-time", "5",
|
|
120
|
+
`https://api.telegram.org/bot${opts.botToken}/sendMessage`,
|
|
121
|
+
"-d", `chat_id=${opts.adminChatId}`,
|
|
122
|
+
"--data-urlencode", `text=${text}`,
|
|
123
|
+
];
|
|
124
|
+
if (opts.blockTelegram) {
|
|
125
|
+
// Synchronous: caller is about to process.exit(). spawnSync blocks
|
|
126
|
+
// up to max-time + a small buffer, then returns. Guaranteed delivery
|
|
127
|
+
// attempt — no fork-race with process termination.
|
|
128
|
+
try {
|
|
129
|
+
// Drop -s -o /dev/null so we can see the HTTP response. The body
|
|
130
|
+
// is logged to stderr if Telegram returns a non-2xx.
|
|
131
|
+
const verboseArgs = curlArgs.filter((a) => a !== "-s" && a !== "/dev/null" && a !== "-o");
|
|
132
|
+
verboseArgs.push("-w", "HTTP=%{http_code}");
|
|
133
|
+
const result = spawnSync("curl", verboseArgs, { timeout: 7000, encoding: "utf-8" });
|
|
134
|
+
const stdout = (result.stdout || "").toString();
|
|
135
|
+
const stderr = (result.stderr || "").toString();
|
|
136
|
+
// Diagnostic — only logs in failure path. Helps debug "DM never arrived".
|
|
137
|
+
if (result.status !== 0 || !/HTTP=2\d\d/.test(stdout)) {
|
|
138
|
+
console.error(`[critical-notify] telegram sync curl status=${result.status} stdout=${stdout.slice(0, 200)} stderr=${stderr.slice(0, 200)}`);
|
|
139
|
+
return false;
|
|
140
|
+
}
|
|
141
|
+
return true;
|
|
142
|
+
}
|
|
143
|
+
catch (err) {
|
|
144
|
+
console.error(`[critical-notify] telegram sync curl threw: ${err instanceof Error ? err.message : String(err)}`);
|
|
145
|
+
return false;
|
|
146
|
+
}
|
|
147
|
+
}
|
|
148
|
+
// Async detached: bot keeps running afterwards, no need to block.
|
|
149
|
+
// detached + stdio:ignore + unref is the standard pattern for
|
|
150
|
+
// "fire and forget". Note: NOT safe if caller calls process.exit()
|
|
151
|
+
// immediately after — use blockTelegram:true for those cases.
|
|
152
|
+
try {
|
|
153
|
+
const child = spawn("curl", curlArgs, { detached: true, stdio: "ignore" });
|
|
154
|
+
child.unref();
|
|
155
|
+
return true;
|
|
156
|
+
}
|
|
157
|
+
catch {
|
|
158
|
+
return false;
|
|
159
|
+
}
|
|
160
|
+
}
|
|
161
|
+
/**
|
|
162
|
+
* Emit a critical event across all configured channels.
|
|
163
|
+
*
|
|
164
|
+
* Synchronous-fast: file flag + osascript run inline (<60ms total typical).
|
|
165
|
+
* Telegram is detached by default so it doesn't block; we return true if it was
|
|
166
|
+
* scheduled (not whether it succeeded — that we can't know synchronously
|
|
167
|
+
* without blocking).
|
|
168
|
+
*
|
|
169
|
+
* Always safe to call. Never throws. Never blocks longer than ~3s
|
|
170
|
+
* (osascript timeout) in the worst case, or ~10s when blockTelegram adds the synchronous curl (7s spawnSync timeout).
|
|
171
|
+
*
|
|
172
|
+
* Outcome of each tier is also logged to stderr so users can diagnose
|
|
173
|
+
* "why didn't I get the Telegram DM?" by reading their err.log.
|
|
174
|
+
*/
|
|
175
|
+
export function emitCritical(event, opts) {
|
|
176
|
+
if (isDisabled()) {
|
|
177
|
+
console.error("[critical-notify] skipped — opt-out via env var");
|
|
178
|
+
return { fileFlag: false, macos: false, telegram: false, reachedAtLeastOne: false };
|
|
179
|
+
}
|
|
180
|
+
// Tier 3 first — most durable, cheapest.
|
|
181
|
+
const fileFlag = writeFileFlag(event);
|
|
182
|
+
// Tier 2 — macOS user-facing.
|
|
183
|
+
const macos = macosNotification(event);
|
|
184
|
+
// Tier 1 — Telegram DM (sync if caller signaled exit, else detached).
|
|
185
|
+
const resolved = resolveOptions(opts);
|
|
186
|
+
const telegram = telegramAdminDM(event, { ...resolved, blockTelegram: opts?.blockTelegram });
|
|
187
|
+
// Diagnostics — written to stderr so even brake-context invocations
|
|
188
|
+
// leave a paper trail in err.log. The user previously hit a case
|
|
189
|
+
// where 1D fired the file flag and osascript but the Telegram DM
|
|
190
|
+
// seemingly never arrived — this log makes it obvious whether
|
|
191
|
+
// resolveOptions found a token + chat_id.
|
|
192
|
+
console.error(`[critical-notify] event="${event.category}" ` +
|
|
193
|
+
`file=${fileFlag ? "ok" : "fail"} ` +
|
|
194
|
+
`macos=${macos ? "ok" : "skip"} ` +
|
|
195
|
+
`telegram=${telegram ? "scheduled" : "skip"}` +
|
|
196
|
+
(telegram ? "" : ` (botToken=${resolved.botToken ? "set" : "missing"} adminChatId=${resolved.adminChatId ?? "missing"})`));
|
|
197
|
+
return {
|
|
198
|
+
fileFlag,
|
|
199
|
+
macos,
|
|
200
|
+
telegram,
|
|
201
|
+
reachedAtLeastOne: fileFlag || macos || telegram,
|
|
202
|
+
};
|
|
203
|
+
}
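The cascade described in the header comment (persist first, then try each user-facing channel best-effort, never throw) can be sketched in isolation. This is a minimal illustration, not the package's code: `runCascade`, the `{ name, send }` channel shape, and the stand-in channels are all hypothetical.

```javascript
// Hypothetical helper: run every channel in order and isolate failures,
// so one broken channel cannot stop the ones after it.
function runCascade(channels, event) {
  const delivered = {};
  let reachedAtLeastOne = false;
  for (const { name, send } of channels) {
    let ok = false;
    try {
      ok = Boolean(send(event));
    } catch {
      ok = false; // a throwing channel is recorded as a failed delivery
    }
    delivered[name] = ok;
    reachedAtLeastOne = reachedAtLeastOne || ok;
  }
  return { delivered, reachedAtLeastOne };
}

// Stand-in channels: the second one throws, the third still runs.
const calls = [];
const report = runCascade(
  [
    { name: "fileFlag", send: () => { calls.push("fileFlag"); return true; } },
    { name: "macos", send: () => { calls.push("macos"); throw new Error("osascript missing"); } },
    { name: "telegram", send: () => { calls.push("telegram"); return true; } },
  ],
  { category: "demo", title: "Demo event" }
);
console.log(report, calls);
```

Putting the durable channel first means a record exists even if a later channel hangs or the process dies mid-notify, which is the ordering argument the header comment makes.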
|
|
@@ -0,0 +1,65 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Heartbeat-File Writer (Self-Preservation Phase 1, feature 2E).
|
|
3
|
+
*
|
|
4
|
+
* Writes a unix timestamp (seconds) to ~/.alvin-bot/heartbeat.txt every
|
|
5
|
+
* 60 seconds. An external launchd-managed dead-man watcher reads this
|
|
6
|
+
* file every 5 minutes — if the timestamp is older than 10 minutes,
|
|
7
|
+
* the bot is presumed frozen (event-loop deadlock, blocked I/O,
|
|
8
|
+
* unresponsive but alive process) and the watcher force-restarts via
|
|
9
|
+
* `launchctl kickstart -k`.
|
|
10
|
+
*
|
|
11
|
+
* This complements the in-process watchdog (src/services/watchdog.ts)
|
|
12
|
+
* which only catches process exits — it cannot catch "process alive
|
|
13
|
+
* but frozen" because that's exactly the state where the watchdog's
|
|
14
|
+
* own beacon writer also stops.
|
|
15
|
+
*
|
|
16
|
+
* Why a file + external watcher instead of an internal timer:
|
|
17
|
+
* - An internal "I'm frozen" timer is a contradiction in terms.
|
|
18
|
+
* If the event loop is dead, the timer doesn't fire either.
|
|
19
|
+
* - The file-based external watcher is the only architecturally
|
|
20
|
+
* sound way to detect this class of failure.
|
|
21
|
+
*
|
|
22
|
+
* Performance: file write of 11 bytes every 60s. CPU cost ~1ms/min,
|
|
23
|
+
* disk I/O ~15.5 KB/day (11 bytes × 1,440 writes). Truly negligible.
|
|
24
|
+
*
|
|
25
|
+
* Opt-out:
|
|
26
|
+
* ALVIN_DISABLE_DEAD_MAN=true → skip heartbeat writer
|
|
27
|
+
* ALVIN_DISABLE_SELF_PRESERVATION=true → skip all Phase-1
|
|
28
|
+
*/
|
|
29
|
+
import { writeFileSync, mkdirSync } from "fs";
|
|
30
|
+
import { join } from "path";
|
|
31
|
+
import { homedir } from "os";
|
|
32
|
+
const HEARTBEAT_PATH = join(homedir(), ".alvin-bot", "heartbeat.txt");
|
|
33
|
+
const HEARTBEAT_INTERVAL_MS = 60_000;
|
|
34
|
+
let heartbeatTimer = null;
|
|
35
|
+
function writeHeartbeat() {
|
|
36
|
+
try {
|
|
37
|
+
mkdirSync(join(homedir(), ".alvin-bot"), { recursive: true });
|
|
38
|
+
// 11 bytes — Unix seconds + newline. Easy to parse from shell.
|
|
39
|
+
writeFileSync(HEARTBEAT_PATH, `${Math.floor(Date.now() / 1000)}\n`);
|
|
40
|
+
}
|
|
41
|
+
catch {
|
|
42
|
+
// Disk full or permissions — non-fatal. The dead-man watcher will
|
|
43
|
+
// see a stale file and kickstart, which is the right behaviour:
|
|
44
|
+
// a bot that can't write its heartbeat IS effectively stuck.
|
|
45
|
+
}
|
|
46
|
+
}
|
|
47
|
+
export function startHeartbeatWriter() {
|
|
48
|
+
if (process.env.ALVIN_DISABLE_DEAD_MAN === "true" ||
|
|
49
|
+
process.env.ALVIN_DISABLE_SELF_PRESERVATION === "true") {
|
|
50
|
+
return;
|
|
51
|
+
}
|
|
52
|
+
// Write immediately so the dead-man watcher doesn't see a stale file
|
|
53
|
+
// from the previous process incarnation.
|
|
54
|
+
writeHeartbeat();
|
|
55
|
+
heartbeatTimer = setInterval(writeHeartbeat, HEARTBEAT_INTERVAL_MS);
|
|
56
|
+
// Allow the process to exit without waiting for this timer.
|
|
57
|
+
if (heartbeatTimer.unref)
|
|
58
|
+
heartbeatTimer.unref();
|
|
59
|
+
}
|
|
60
|
+
export function stopHeartbeatWriter() {
|
|
61
|
+
if (heartbeatTimer) {
|
|
62
|
+
clearInterval(heartbeatTimer);
|
|
63
|
+
heartbeatTimer = null;
|
|
64
|
+
}
|
|
65
|
+
}
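The external dead-man watcher is not part of this diff. As a rough sketch of the decision it has to make, assuming the 10-minute threshold from the header comment: `isHeartbeatStale` and `MAX_AGE_SECONDS` are hypothetical names, and treating an unreadable file as stale is an assumption chosen to match the comment in `writeHeartbeat()`.

```javascript
// Hypothetical watcher-side check; the real watcher script is not shown here.
const MAX_AGE_SECONDS = 10 * 60; // "older than 10 minutes" per the header comment

function isHeartbeatStale(fileContents, nowSeconds, maxAgeSeconds = MAX_AGE_SECONDS) {
  const beat = Number.parseInt(String(fileContents).trim(), 10);
  // Assumption: an empty or unparsable file counts as stale.
  if (!Number.isFinite(beat)) return true;
  return nowSeconds - beat > maxAgeSeconds;
}

const now = Math.floor(Date.now() / 1000);
console.log(isHeartbeatStale(`${now - 30}\n`, now));  // beat written 30 s ago
console.log(isHeartbeatStale(`${now - 900}\n`, now)); // beat written 15 min ago
```

With a 60-second write interval and a 10-minute threshold, the writer can miss several beats in a row before the watcher would act.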
|
|
@@ -0,0 +1,292 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Pre-Flight Sanity Check (Self-Preservation Phase 1, feature 1A).
|
|
3
|
+
*
|
|
4
|
+
* Runs in PARALLEL at startup, fire-and-forget: never blocks the bot's main
|
|
5
|
+
* startup sequence. Each check has a tight timeout. Results are logged with
|
|
6
|
+
* a severity classification (ok / warn / critical). Critical findings can
|
|
7
|
+
* optionally feed into the cross-channel notify pipeline (1D).
|
|
8
|
+
*
|
|
9
|
+
* Provider-agnostic: AI-provider check is routed through the active
|
|
10
|
+
* Provider's `isAvailable()` method, which every concrete provider
|
|
11
|
+
* implements — so the same check works for claude-sdk, codex-cli,
|
|
12
|
+
* groq, gemini, openai, openrouter, ollama (gemma), nvidia.
|
|
13
|
+
*
|
|
14
|
+
* Opt-out:
|
|
15
|
+
* ALVIN_DISABLE_PREFLIGHT=true → skip Pre-Flight specifically
|
|
16
|
+
* ALVIN_DISABLE_SELF_PRESERVATION=true → skip ALL Phase-1 features
|
|
17
|
+
*
|
|
18
|
+
* Performance budget (measured on Apple Silicon M-series):
|
|
19
|
+
* - Telegram getMe: typical 150-400ms, timeout 3000ms
|
|
20
|
+
* - AI Provider isAvailable: typical 50-800ms, timeout 5000ms
|
|
21
|
+
* - SQLite PRAGMA quick_check: typical 5-50ms, timeout 10000ms
|
|
22
|
+
* - df disk space: typical 5-15ms, timeout 2000ms
|
|
23
|
+
* - Total wall-clock = max of all four (Promise.all) — typically <1s
|
|
24
|
+
*/
|
|
25
|
+
import { existsSync } from "fs";
|
|
26
|
+
import { join } from "path";
|
|
27
|
+
import { homedir } from "os";
|
|
28
|
+
function isDisabled() {
|
|
29
|
+
return (process.env.ALVIN_DISABLE_PREFLIGHT === "true" ||
|
|
30
|
+
process.env.ALVIN_DISABLE_SELF_PRESERVATION === "true");
|
|
31
|
+
}
|
|
32
|
+
/**
|
|
33
|
+
* Run a promise with a wall-clock timeout. Returns `fallback` if the
|
|
34
|
+
* promise doesn't settle in time. Never rejects.
|
|
35
|
+
*/
|
|
36
|
+
function withTimeout(promise, ms, fallback) {
|
|
37
|
+
return new Promise((resolve) => {
|
|
38
|
+
let settled = false;
|
|
39
|
+
const timer = setTimeout(() => {
|
|
40
|
+
if (!settled) {
|
|
41
|
+
settled = true;
|
|
42
|
+
resolve(fallback);
|
|
43
|
+
}
|
|
44
|
+
}, ms);
|
|
45
|
+
promise.then((value) => {
|
|
46
|
+
if (!settled) {
|
|
47
|
+
settled = true;
|
|
48
|
+
clearTimeout(timer);
|
|
49
|
+
resolve(value);
|
|
50
|
+
}
|
|
51
|
+
}, () => {
|
|
52
|
+
if (!settled) {
|
|
53
|
+
settled = true;
|
|
54
|
+
clearTimeout(timer);
|
|
55
|
+
resolve(fallback);
|
|
56
|
+
}
|
|
57
|
+
});
|
|
58
|
+
});
|
|
59
|
+
}
|
|
60
|
+
async function checkTelegram(botToken) {
|
|
61
|
+
const start = Date.now();
|
|
62
|
+
if (!botToken) {
|
|
63
|
+
return {
|
|
64
|
+
name: "telegram",
|
|
65
|
+
ok: true,
|
|
66
|
+
severity: "ok",
|
|
67
|
+
message: "skipped (WebUI-only mode, no BOT_TOKEN)",
|
|
68
|
+
durationMs: Date.now() - start,
|
|
69
|
+
};
|
|
70
|
+
}
|
|
71
|
+
const url = `https://api.telegram.org/bot${botToken}/getMe`;
|
|
72
|
+
const result = await withTimeout(fetch(url).then(async (r) => ({ ok: r.ok, status: r.status, body: await r.json().catch(() => null) })), 3000, null);
|
|
73
|
+
if (!result) {
|
|
74
|
+
return {
|
|
75
|
+
name: "telegram",
|
|
76
|
+
ok: false,
|
|
77
|
+
severity: "warn",
|
|
78
|
+
message: "getMe timed out (3s) — bot may have network / Telegram issues",
|
|
79
|
+
durationMs: Date.now() - start,
|
|
80
|
+
};
|
|
81
|
+
}
|
|
82
|
+
if (!result.ok) {
|
|
83
|
+
return {
|
|
84
|
+
name: "telegram",
|
|
85
|
+
ok: false,
|
|
86
|
+
severity: "critical",
|
|
87
|
+
message: `getMe HTTP ${result.status} — token may be invalid`,
|
|
88
|
+
durationMs: Date.now() - start,
|
|
89
|
+
};
|
|
90
|
+
}
|
|
91
|
+
const username = result.body?.result?.username;
|
|
92
|
+
return {
|
|
93
|
+
name: "telegram",
|
|
94
|
+
ok: true,
|
|
95
|
+
severity: "ok",
|
|
96
|
+
message: username ? `bot=@${username}` : "bot reachable",
|
|
97
|
+
durationMs: Date.now() - start,
|
|
98
|
+
};
|
|
99
|
+
}
|
|
100
|
+
async function checkAiProvider(registry) {
|
|
101
|
+
const start = Date.now();
|
|
102
|
+
if (!registry) {
|
|
103
|
+
return {
|
|
104
|
+
name: "ai-provider",
|
|
105
|
+
ok: false,
|
|
106
|
+
severity: "warn",
|
|
107
|
+
message: "no provider configured (AI features will be disabled)",
|
|
108
|
+
durationMs: Date.now() - start,
|
|
109
|
+
};
|
|
110
|
+
}
|
|
111
|
+
let provider;
|
|
112
|
+
let activeKey = "(unknown)";
|
|
113
|
+
try {
|
|
114
|
+
provider = registry.getActive();
|
|
115
|
+
activeKey = registry.getActiveKey();
|
|
116
|
+
}
|
|
117
|
+
catch {
|
|
118
|
+
return {
|
|
119
|
+
name: "ai-provider",
|
|
120
|
+
ok: false,
|
|
121
|
+
severity: "warn",
|
|
122
|
+
message: "no active provider in registry",
|
|
123
|
+
durationMs: Date.now() - start,
|
|
124
|
+
};
|
|
125
|
+
}
|
|
126
|
+
if (!provider) {
|
|
127
|
+
return {
|
|
128
|
+
name: "ai-provider",
|
|
129
|
+
ok: false,
|
|
130
|
+
severity: "warn",
|
|
131
|
+
message: "no active provider in registry",
|
|
132
|
+
durationMs: Date.now() - start,
|
|
133
|
+
};
|
|
134
|
+
}
|
|
135
|
+
const available = await withTimeout(provider.isAvailable(), 5000, false);
|
|
136
|
+
return {
|
|
137
|
+
name: "ai-provider",
|
|
138
|
+
ok: available,
|
|
139
|
+
severity: available ? "ok" : "warn",
|
|
140
|
+
message: available
|
|
141
|
+
? `${activeKey} reachable`
|
|
142
|
+
: `${activeKey} not reachable / not configured — bot will degrade gracefully on AI calls`,
|
|
143
|
+
durationMs: Date.now() - start,
|
|
144
|
+
};
|
|
145
|
+
}
|
|
146
|
+
async function checkSqliteIntegrity() {
|
|
147
|
+
const start = Date.now();
|
|
148
|
+
const dbPath = join(homedir(), ".alvin-bot", "memory", ".embeddings.db");
|
|
149
|
+
if (!existsSync(dbPath)) {
|
|
150
|
+
return {
|
|
151
|
+
name: "sqlite",
|
|
152
|
+
ok: true,
|
|
153
|
+
severity: "ok",
|
|
154
|
+
message: "embeddings DB not yet created (lazily on first use)",
|
|
155
|
+
durationMs: Date.now() - start,
|
|
156
|
+
};
|
|
157
|
+
}
|
|
158
|
+
try {
|
|
159
|
+
const { createRequire } = await import("module");
|
|
160
|
+
const req = createRequire(import.meta.url);
|
|
161
|
+
const Database = req("better-sqlite3");
|
|
162
|
+
const db = new Database(dbPath, { readonly: true });
|
|
163
|
+
// PRAGMA quick_check is materially faster than integrity_check
|
|
164
|
+
// (it skips UNIQUE-constraint checks and does not verify that index
|
|
165
|
+
// content matches table content). To answer "is the file readable +
|
|
166
|
+
// structurally sane?", quick_check is the right tool.
|
|
167
|
+
const result = await withTimeout(Promise.resolve(db.prepare("PRAGMA quick_check").get()), 10_000, null);
|
|
168
|
+
db.close();
|
|
169
|
+
if (result === null) {
|
|
170
|
+
return {
|
|
171
|
+
name: "sqlite",
|
|
172
|
+
ok: false,
|
|
173
|
+
severity: "warn",
|
|
174
|
+
message: "PRAGMA quick_check timed out (>10s) — DB may be very large or locked",
|
|
175
|
+
durationMs: Date.now() - start,
|
|
176
|
+
};
|
|
177
|
+
}
|
|
178
|
+
const r = result;
|
|
179
|
+
const checkResult = r.quick_check || "(unknown)";
|
|
180
|
+
const ok = checkResult === "ok";
|
|
181
|
+
return {
|
|
182
|
+
name: "sqlite",
|
|
183
|
+
ok,
|
|
184
|
+
severity: ok ? "ok" : "critical",
|
|
185
|
+
message: ok ? "embeddings DB integrity ok" : `embeddings DB integrity FAILED: ${checkResult}`,
|
|
186
|
+
durationMs: Date.now() - start,
|
|
187
|
+
};
|
|
188
|
+
}
|
|
189
|
+
catch (err) {
|
|
190
|
+
const message = err instanceof Error ? err.message : String(err);
|
|
191
|
+
return {
|
|
192
|
+
name: "sqlite",
|
|
193
|
+
ok: true,
|
|
194
|
+
severity: "ok",
|
|
195
|
+
message: `check skipped: ${message.split("\n")[0]}`,
|
|
196
|
+
durationMs: Date.now() - start,
|
|
197
|
+
};
|
|
198
|
+
}
|
|
199
|
+
}
|
|
200
|
+
async function checkDiskSpace() {
|
|
201
|
+
const start = Date.now();
|
|
202
|
+
try {
|
|
203
|
+
const { execSync } = await import("child_process");
|
|
204
|
+
const dataDir = join(homedir(), ".alvin-bot");
|
|
205
|
+
const out = execSync(`df -k "${dataDir}"`, { encoding: "utf-8", timeout: 2000 });
|
|
206
|
+
const lines = out.trim().split("\n");
|
|
207
|
+
const data = lines[lines.length - 1].split(/\s+/);
|
|
208
|
+
// df output: Filesystem 1024-blocks Used Available Capacity ...
|
|
209
|
+
const availableKB = parseInt(data[3], 10);
|
|
210
|
+
if (!Number.isFinite(availableKB)) {
|
|
211
|
+
return {
|
|
212
|
+
name: "disk",
|
|
213
|
+
ok: true,
|
|
214
|
+
severity: "ok",
|
|
215
|
+
message: "could not parse df output",
|
|
216
|
+
durationMs: Date.now() - start,
|
|
217
|
+
};
|
|
218
|
+
}
|
|
219
|
+
const availableGB = availableKB / 1024 / 1024;
|
|
220
|
+
const severity = availableKB < 512 * 1024 ? "critical" :
|
|
221
|
+
availableKB < 1024 * 1024 ? "warn" :
|
|
222
|
+
"ok";
|
|
223
|
+
return {
|
|
224
|
+
name: "disk",
|
|
225
|
+
ok: severity === "ok",
|
|
226
|
+
severity,
|
|
227
|
+
message: `${availableGB.toFixed(2)} GB free`,
|
|
228
|
+
durationMs: Date.now() - start,
|
|
229
|
+
};
|
|
230
|
+
}
|
|
231
|
+
catch (err) {
|
|
232
|
+
const message = err instanceof Error ? err.message : String(err);
|
|
233
|
+
return {
|
|
234
|
+
name: "disk",
|
|
235
|
+
ok: true,
|
|
236
|
+
severity: "ok",
|
|
237
|
+
message: `check skipped: ${message.split("\n")[0]}`,
|
|
238
|
+
durationMs: Date.now() - start,
|
|
239
|
+
};
|
|
240
|
+
}
|
|
241
|
+
}
|
|
242
|
+
/**
|
|
243
|
+
* Run the full pre-flight suite in parallel. Always resolves (never
|
|
244
|
+
* throws). Returns a structured report so the caller can decide how
|
|
245
|
+
* to react.
|
|
246
|
+
*/
|
|
247
|
+
export async function runPreFlight(botToken, registry) {
|
|
248
|
+
if (isDisabled()) {
|
|
249
|
+
return {
|
|
250
|
+
results: [],
|
|
251
|
+
slowestMs: 0,
|
|
252
|
+
totalMs: 0,
|
|
253
|
+
anyCritical: false,
|
|
254
|
+
anyWarning: false,
|
|
255
|
+
skipped: true,
|
|
256
|
+
};
|
|
257
|
+
}
|
|
258
|
+
const start = Date.now();
|
|
259
|
+
const results = await Promise.all([
|
|
260
|
+
checkTelegram(botToken),
|
|
261
|
+
checkAiProvider(registry),
|
|
262
|
+
checkSqliteIntegrity(),
|
|
263
|
+
checkDiskSpace(),
|
|
264
|
+
]);
|
|
265
|
+
return {
|
|
266
|
+
results,
|
|
267
|
+
slowestMs: Math.max(...results.map((r) => r.durationMs)),
|
|
268
|
+
totalMs: Date.now() - start,
|
|
269
|
+
anyCritical: results.some((r) => r.severity === "critical"),
|
|
270
|
+
anyWarning: results.some((r) => r.severity === "warn"),
|
|
271
|
+
skipped: false,
|
|
272
|
+
};
|
|
273
|
+
}
|
|
274
|
+
/**
|
|
275
|
+
* Format a PreFlightReport for console output. Compact, single line per
|
|
276
|
+
* check, clear severity icons.
|
|
277
|
+
*/
|
|
278
|
+
export function formatPreFlightReport(report) {
|
|
279
|
+
if (report.skipped) {
|
|
280
|
+
return "🩺 Pre-Flight: skipped (ALVIN_DISABLE_PREFLIGHT or ALVIN_DISABLE_SELF_PRESERVATION is true)";
|
|
281
|
+
}
|
|
282
|
+
const icons = { ok: "✓", warn: "⚠", critical: "❌" };
|
|
283
|
+
const headline = report.anyCritical
|
|
284
|
+
? "❌ Pre-Flight: critical issues"
|
|
285
|
+
: report.anyWarning
|
|
286
|
+
? "⚠️ Pre-Flight: warnings"
|
|
287
|
+
: "✅ Pre-Flight: all checks ok";
|
|
288
|
+
const lines = report.results.map((r) => {
|
|
289
|
+
return ` ${icons[r.severity]} ${r.name.padEnd(12)} ${r.message} (${r.durationMs}ms)`;
|
|
290
|
+
});
|
|
291
|
+
return `🩺 ${headline} — ${report.totalMs}ms total\n${lines.join("\n")}`;
|
|
292
|
+
}
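The disk check above reads the fourth column of the last `df -k` line and maps it to a severity. A minimal sketch of that parsing and the 512 MB / 1 GB thresholds as pure functions: `parseDfAvailableKB` and `classifyFreeSpace` are hypothetical names, and the sample output is made up in the macOS `df -k` layout.

```javascript
// Hypothetical helpers mirroring the logic in checkDiskSpace().
function parseDfAvailableKB(dfOutput) {
  const lines = dfOutput.trim().split("\n");
  const fields = lines[lines.length - 1].trim().split(/\s+/);
  const kb = Number.parseInt(fields[3], 10); // 4th column of `df -k` is Available
  return Number.isFinite(kb) ? kb : null;
}

function classifyFreeSpace(availableKB) {
  if (availableKB < 512 * 1024) return "critical"; // under 512 MB
  if (availableKB < 1024 * 1024) return "warn";    // under 1 GB
  return "ok";
}

// Made-up sample, not captured from a real machine.
const sample = [
  "Filesystem   1024-blocks      Used Available Capacity iused ifree %iused  Mounted on",
  "/dev/disk3s5   482797652 301234567    786432    100% 123456 654321   16%   /System/Volumes/Data",
].join("\n");

const availableKB = parseDfAvailableKB(sample);
console.log(availableKB, classifyFreeSpace(availableKB));
```

Splitting on whitespace assumes the filesystem name contains no spaces; a name with spaces would shift the columns, which is one reason the original falls back to "could not parse df output".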
|
|
@@ -27,6 +27,8 @@ import { resolve } from "path";
|
|
|
27
27
|
import os from "os";
|
|
28
28
|
import { execSync } from "child_process";
|
|
29
29
|
import { BOT_VERSION } from "../version.js";
|
|
30
|
+
import { emitCritical } from "./critical-notify.js";
|
|
31
|
+
import { writeDiagnosticBundle } from "./auto-diagnostic.js";
|
|
30
32
|
import { decideBrakeAction, shouldResetCrashCounter, DEFAULTS, } from "./watchdog-brake.js";
|
|
31
33
|
const DATA_DIR = process.env.ALVIN_DATA_DIR || resolve(os.homedir(), ".alvin-bot");
|
|
32
34
|
const STATE_DIR = resolve(DATA_DIR, "state");
|
|
@@ -164,6 +166,51 @@ export function startWatchdog() {
|
|
|
164
166
|
if (decision.action === "brake") {
|
|
165
167
|
console.error(`[watchdog] crash-loop brake triggered: ${decision.reason}`);
|
|
166
168
|
writeAlert(decision.reason, previous?.crashCount ?? 0);
|
|
169
|
+
// Critical-event notify (Self-Preservation Phase 1, feature 1D).
|
|
170
|
+
// emitCritical runs the file flag + osascript inline, and because we
|
|
171
|
+
// pass blockTelegram: true it also sends the Telegram DM synchronously
|
|
172
|
+
// via curl before the process.exit(3) below, which a detached spawn
|
|
173
|
+
// could lose.
|
|
174
|
+
// Auto-diagnostic (feature 2F) — collect forensic bundle BEFORE
|
|
175
|
+
// emitCritical so the Telegram DM can reference the file path.
|
|
176
|
+
let bundlePath = null;
|
|
177
|
+
try {
|
|
178
|
+
bundlePath = writeDiagnosticBundle({
|
|
179
|
+
category: "watchdog-brake",
|
|
180
|
+
severity: "critical",
|
|
181
|
+
title: "Watchdog crash-loop brake engaged",
|
|
182
|
+
detail: `${decision.reason}\n` +
|
|
183
|
+
`Bot version: ${BOT_VERSION}`,
|
|
184
|
+
suggestedAction: `rm "${ALERT_FILE}" && alvin-bot launchd install`,
|
|
185
|
+
});
|
|
186
|
+
if (bundlePath) {
|
|
187
|
+
console.error(`[auto-diagnostic] forensic bundle written: ${bundlePath}`);
|
|
188
|
+
}
|
|
189
|
+
}
|
|
190
|
+
catch (err) {
|
|
191
|
+
console.error("[watchdog] auto-diagnostic failed:", err);
|
|
192
|
+
}
|
|
193
|
+
try {
|
|
194
|
+
emitCritical({
|
|
195
|
+
category: "watchdog-brake",
|
|
196
|
+
severity: "critical",
|
|
197
|
+
title: "Watchdog crash-loop brake engaged",
|
|
198
|
+
detail: `${decision.reason}\n` +
|
|
199
|
+
`Bot version: ${BOT_VERSION}\n` +
|
|
200
|
+
`The bot has stopped itself to prevent further damage.` +
|
|
201
|
+
(bundlePath ? `\n\nDiagnostic bundle: ${bundlePath}` : ""),
|
|
202
|
+
suggestedAction: `rm "${ALERT_FILE}" && alvin-bot launchd install`,
|
|
203
|
+
}, {
|
|
204
|
+
// We're about to process.exit(3). Block on the Telegram POST
|
|
205
|
+
// synchronously — detached spawn races the exit on macOS+launchd
|
|
206
|
+
// and the alert silently never lands. Adds ~1-2 s before exit;
|
|
207
|
+
// worth it to actually inform the user their bot just braked.
|
|
208
|
+
blockTelegram: true,
|
|
209
|
+
});
|
|
210
|
+
}
|
|
211
|
+
catch (err) {
|
|
212
|
+
console.error("[watchdog] critical-notify failed:", err);
|
|
213
|
+
}
|
|
167
214
|
// checkCrashLoopBrake tries to unload the LaunchAgent so launchd stops
|
|
168
215
|
// retrying. It only runs the exit path if ALERT_FILE exists, which is
|
|
169
216
|
// normally true after writeAlert — but if writeAlert failed silently
|