alvin-bot 4.8.6 → 4.8.8

package/CHANGELOG.md CHANGED
@@ -2,6 +2,120 @@
2
2
 
3
3
  All notable changes to Alvin Bot are documented here.
4
4
 
5
+ ## [4.8.8] — 2026-04-11
6
+
7
+ ### ✨ Unlimited sub-agent & cron timeouts (user-configurable)
8
+
9
+ Sub-agents and `ai-query` cron jobs used to hard-cap at 5 minutes (`SUBAGENT_TIMEOUT=300000` default), and `shell` cron jobs at 60 s. Long-running research, deep-dive audits, or anything that crossed the threshold got killed mid-stream with `status: "timeout"`. 4.8.8 flips the default to **unlimited** and lets the user override both globally and per job.
10
+
11
+ **What changed:**
12
+
13
+ - **Default is now infinite.** `src/config.ts` seeds `subAgentTimeout` from `SUBAGENT_TIMEOUT` env or falls back to `-1` (unlimited). The runtime value lives in `~/.alvin-bot/sub-agents.json` as `defaultTimeoutMs` and is changeable at runtime without restart.
14
+ - **New `/subagents timeout` command.** `/subagents timeout` shows the current value; `/subagents timeout 3600` sets 1 h; `/subagents timeout off` (or `-1`, `0`, `unlimited`, `infinite`) disables the cap entirely. The default-status output now includes a `⏱ Timeout` line.
15
+ - **Per-job override on cron.** `/cron add 1h ai-query "deep audit" --timeout off` gives this one job no timeout. `/cron add 5m shell "pm2 ls" --timeout 30` caps this shell job at 30 s. Omitting `--timeout` leaves `timeoutMs` unset: `ai-query` jobs then inherit the current global default, while `shell` jobs run with no timeout. The same flag exists on `scripts/cron-manage.js add --timeout <sec|off>`.
16
+ - **`CronJob.timeoutMs` field.** Optional number in `cron-jobs.json`. Undefined = inherit global default. Value ≤ 0 = unlimited.
17
+ - **Semantics.** `spawnSubAgent` now only arms the `setTimeout(abort)` when `timeout > 0`. At ≤ 0, no abort timer is created, existing `if (timeoutId) clearTimeout(…)` call sites are null-safe, and the agent runs until it finishes, is cancelled via `/subagents cancel`, or the process dies.
18
+ - **Shell cron jobs default to no timeout.** If a shell job has no `timeoutMs` (or a value ≤ 0), `execSync` is called without a `timeout` option, which Node treats as no limit. This replaces the old hard-coded 60 s cap; pass `--timeout <sec>` to set a limit for a specific job.
19
+
20
+ **ENV var still works but is seed-only.** `SUBAGENT_TIMEOUT=600000` at startup still seeds the config on first load, but the persisted value in `sub-agents.json` wins after that.
21
+
22
+ ### 🐛 Silenced harmless `message is not modified` Telegram errors
23
+
24
+ Occasionally Ali would see a red banner at the bottom of an Alvin message:
25
+
26
+ > Error: Call to 'editMessageText' failed! (400: Bad Request: message is not modified: specified new message content and reply markup are exactly the same as a current content and reply markup of the message)
27
+
28
+ It never broke anything, but it polluted logs and showed up as an "internal error" reply to the user. Root cause: Telegram's Bot API refuses `editMessageText` when the new content + reply markup are byte-identical to the existing message. This happens legitimately in callback handlers — e.g. tapping a cron-toggle button twice, re-rendering a sudo/keys/platforms menu, language-switch callbacks that render the same content, or stream flushes where the throttled partial hasn't changed since the last edit.
29
+
30
+ **Fix**: `bot.catch()` in `src/index.ts` now filters out this specific error early. Two regex patterns (`/message is not modified/i` and `/specified new message content.*exactly the same/i`) cover both variants Telegram sends. Real errors (network, SDK, provider failures) still log and still surface the "internal error" reply to the user — only this one harmless class gets dropped.
31
+
32
+ ### 📝 CLAUDE.md: PM2 references updated to launchd
33
+
34
+ The project `CLAUDE.md` still said *"PM2: `alvin-bot` Prozess, Config in `ecosystem.config.cjs`"* (German: "PM2: `alvin-bot` process, config in `ecosystem.config.cjs`"), which has been outdated since the 4.8.6 switch to launchd. It now reflects the actual process manager (`~/Library/LaunchAgents/com.alvinbot.app.plist`, `KeepAlive=true`, `RunAtLoad=true`) and the log paths, and notes that `watchdog.ts` only brakes process crash-loops; it does **not** kill long-running sessions or sub-agents. `ecosystem.config.cjs` is now labelled legacy.
35
+
36
+ The global `~/.claude/CLAUDE.md` was also corrected: `alvin-bot` was removed from the VPS PM2-process list (it runs locally, not on the VPS), and the cron-hub note now correctly says "als **launchd LaunchAgent**" (German for "as a **launchd LaunchAgent**").
37
+
38
+ ## [4.8.7] — 2026-04-11
39
+
40
+ ### 🐛 `/update` now detects stale-runtime (rebuild without restart)
41
+
42
+ Caught immediately after publishing 4.8.6 on the Mac mini: `/update` reported "Already up to date — no new commits" even though the running process was on **v4.8.5** while the disk was already built at **v4.8.6**. The user could see the version mismatch in `/status` (v4.8.5) but `/update` refused to acknowledge it.
43
+
44
+ **Root cause**: The updater only compared **git commits** (or **npm registry version**) against the local install. It never checked whether the **running process's in-memory version** was older than the **on-disk built version**. This is the dev/CI loop scenario:
45
+
46
+ 1. You edit src/, bump package.json, commit + push
47
+ 2. `npm run build` regenerates dist/ at the new version
48
+ 3. The running process has the OLD code in memory
49
+ 4. You run `/update` in Telegram
50
+ 5. git: HEAD == origin/main (just pushed) → 0 commits behind → "up to date"
51
+ 6. Process never restarts → keeps running OLD code
52
+
53
+ **Fix**: New `isRuntimeStale()` check at the very start of `runUpdate()`. It compares `BOT_VERSION` (in-memory at process start) against `package.json.version` from disk via the existing semver compare. If the disk version is newer, `runUpdate()` returns success with `requiresRestart=true` immediately, skipping the git/npm fetch entirely and only signalling a restart so the fresh code takes effect.
54
+
55
+ After 4.8.7, running `/update` after a manual rebuild will correctly say *"Disk is already built at vX, running vY. Restarting to pick up the new code..."* and trigger the restart.
56
+
57
+ ### ✨ Internal watchdog with crash-loop brake (`src/services/watchdog.ts`)
58
+
59
+ Ali asked for the bot to be "derbe persistent" (German slang, roughly "seriously persistent"). `KeepAlive: true` from 4.8.6 already covered about 95% of that; the missing piece was a brake that stops the bot from restart-looping forever when a deterministic crash happens (corrupt state file, missing dependency, broken upgrade).
60
+
61
+ **New module**: `src/services/watchdog.ts`. It has two responsibilities (items 1 and 2), plus an auto-recovery rule and a startup check (items 3 and 4):
62
+
63
+ **1. Liveness beacon**. Every 30 s the bot writes `~/.alvin-bot/state/watchdog.json` with `{lastBeat, pid, bootTime, crashCount, crashWindowStart, version}`. Fast disk write, no I/O blocking.
64
+
65
+ **2. Crash-loop brake**. On every fresh boot, the watchdog reads the previous beacon:
66
+
67
+ - If the previous beacon is **less than 90 s old** → the previous process exited very recently → that's a crash (or a deliberate restart, treated the same way for the brake's purpose). Increment `crashCount`.
68
+ - If the previous beacon is **older than 90 s** → previous process had clean uptime → reset counter to 0.
69
+ - The crash window is **10 minutes**. Crashes within this window accumulate; older ones don't count.
70
+ - If `crashCount` reaches **10**, the brake engages:
71
+ - Writes `~/.alvin-bot/state/crash-loop.alert` with the timestamp, version, error log path, and recovery steps
72
+ - Tries to `launchctl unload -w` its own LaunchAgent so launchd stops retrying (otherwise `KeepAlive: true` would keep burning CPU forever)
73
+ - Exits with code 3
74
+
75
+ **3. Recovery**. After **5 minutes of clean uptime**, the watchdog auto-resets the crash counter to 0. So a healthy bot that occasionally has a transient hiccup doesn't slowly accumulate toward the brake over days.
76
+
77
+ **4. Brake check at startup**. `checkCrashLoopBrake()` runs in `index.ts` **before** any expensive init — if the alert file already exists, the bot exits cleanly with code 3 and tries to unload itself again. This prevents launchd from spinning the bot up just to write the same alert over and over.
78
+
79
+ **Recovery from a tripped brake**:
80
+
81
+ ```bash
82
+ # 1. Investigate the error log
83
+ cat ~/.alvin-bot/logs/alvin-bot.err.log
84
+
85
+ # 2. Fix whatever was wrong
86
+ # 3. Remove the alert file
87
+ rm ~/.alvin-bot/state/crash-loop.alert
88
+
89
+ # 4. Reload the LaunchAgent
90
+ alvin-bot launchd install
91
+ ```
92
+
93
+ **What this catches**:
94
+
95
+ - Process crashes (segfault, OOM kill) → exit non-zero → brake increments
96
+ - `process.exit()` from unhandled rejection → similar
97
+ - Tight crash loops → brake engages at 10 within 10 min
98
+ - Corrupted state files that crash on read → brake engages eventually
99
+
100
+ **What this does NOT catch (yet)**:
101
+
102
+ - Event-loop deadlocks where the process is alive but completely frozen. The watchdog beacon needs the event loop to be alive, so it can't detect freeze. A future release will add an external sister LaunchAgent (`com.alvinbot.watchdog`) that runs every 2 minutes via `StartInterval` and kills the main bot if its beacon file is too stale. Tracked as a follow-up.
103
+
104
+ **Telemetry surface**: `alvin-bot status` could read the beacon file in a future release to show "crash count: X in last Y minutes" — for now, the alert file is the main user-facing signal.
105
+
106
+ ### 🛡 LaunchAgent: ProcessType + LimitLoadToSessionType
107
+
108
+ Two small plist hardening tweaks:
109
+
110
+ - **`ProcessType: Background`**: explicit hint to launchd that this is a long-running background service. Per the `launchd.plist` man page, Background jobs get resource limits intended to keep them from disrupting the user experience, so expect lower CPU and I/O priority rather than extra protection under memory pressure (`Standard` is the default when the key is omitted).
111
+ - **`LimitLoadToSessionType: Aqua`** — only loads in user GUI sessions. Prevents the LaunchAgent from accidentally loading in non-GUI contexts (e.g. SSH login session) where it would not have Keychain access. Defensive: matches our existing assumption that the bot needs the GUI keychain unlocked for Claude SDK OAuth.
112
+
113
+ These don't change behaviour for normal use, but they're explicit about our intent. macOS will treat the bot as a proper background service rather than a generic foreground job.
114
+
115
+ ### Tests
116
+
117
+ 87 still passing — no test changes (the stale-runtime check is a fast-path branch that doesn't disturb the existing git/npm logic).
118
+
5
119
  ## [4.8.6] — 2026-04-11
6
120
 
7
121
  ### 🐛 LaunchAgent: `/restart` left the bot down forever
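The crash-loop brake rules in the 4.8.7 changelog entry above can be sketched as a pure function. This is an illustration only, not the code in `src/services/watchdog.ts`: the name `nextCrashState`, the constant names, the treatment of a beacon that is exactly 90 s old, and the way an expired crash window restarts the count are all assumptions. Only the thresholds (90 s, 10 minutes, 10 crashes) and the beacon field names come from the changelog.

```javascript
// Hypothetical sketch of the crash-loop brake decision; not the shipped watchdog code.
const CRASH_GAP_MS = 90_000;          // beacon younger than this at boot counts as a crash
const CRASH_WINDOW_MS = 10 * 60_000;  // crashes only accumulate inside this window
const BRAKE_THRESHOLD = 10;           // brake engages at this many crashes

/**
 * @param {{lastBeat: number, crashCount: number, crashWindowStart: number} | null} prev
 *        previous beacon contents, or null on the very first boot
 * @param {number} now current time in ms since epoch
 */
function nextCrashState(prev, now) {
  // No beacon, or the previous process had been gone for a while: clean start.
  if (!prev || now - prev.lastBeat >= CRASH_GAP_MS) {
    return { crashCount: 0, crashWindowStart: now, brake: false };
  }
  // Recent beacon: count a crash, starting a new window if the old one expired.
  const windowExpired = now - prev.crashWindowStart > CRASH_WINDOW_MS;
  const crashCount = windowExpired ? 1 : prev.crashCount + 1;
  const crashWindowStart = windowExpired ? now : prev.crashWindowStart;
  return { crashCount, crashWindowStart, brake: crashCount >= BRAKE_THRESHOLD };
}
```

Keeping the decision free of file and `launchctl` side effects makes the thresholds easy to unit-test; the caller would then write the alert file and unload the LaunchAgent when `brake` is true.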
package/bin/cli.js CHANGED
@@ -1466,6 +1466,12 @@ function renderLaunchdPlist({ label, nodePath, entryPoint, cwd, home, logDir })
1466
1466
  <key>ThrottleInterval</key>
1467
1467
  <integer>5</integer>
1468
1468
 
1469
+ <key>ProcessType</key>
1470
+ <string>Background</string>
1471
+
1472
+ <key>LimitLoadToSessionType</key>
1473
+ <string>Aqua</string>
1474
+
1469
1475
  <key>StandardOutPath</key>
1470
1476
  <string>${logDir}/alvin-bot.out.log</string>
1471
1477
 
package/dist/config.js CHANGED
@@ -45,7 +45,13 @@ export const config = {
45
45
  compactionThreshold: Number(process.env.COMPACTION_THRESHOLD) || 80000,
46
46
  // Sub-Agents
47
47
  maxSubAgents: Number(process.env.MAX_SUBAGENTS) || 4,
48
- subAgentTimeout: Number(process.env.SUBAGENT_TIMEOUT) || 300000, // 5 min
48
+ // Default sub-agent timeout. -1 / 0 = unlimited (no hard cut-off).
49
+ // The runtime value lives in sub-agents.json and can be changed at runtime
50
+ // via /subagents timeout; this value only seeds the initial config on
51
+ // first launch (SUBAGENT_TIMEOUT if set, otherwise -1).
52
+ subAgentTimeout: process.env.SUBAGENT_TIMEOUT !== undefined && process.env.SUBAGENT_TIMEOUT !== ""
53
+ ? Number(process.env.SUBAGENT_TIMEOUT)
54
+ : -1,
49
55
  // TTS Provider
50
56
  ttsProvider: (process.env.TTS_PROVIDER || "edge"),
51
57
  elevenlabs: {
@@ -1277,9 +1277,29 @@ export function registerCommands(bot) {
1277
1277
  `Commands: /cron add · delete · toggle · run · info`, { parse_mode: "HTML", reply_markup: keyboard });
1278
1278
  return;
1279
1279
  }
1280
- // /cron add <schedule> <type> <payload>
1280
+ // /cron add <schedule> <type> <payload> [--timeout <sec|off>]
1281
1281
  if (arg.startsWith("add ")) {
1282
- const rest = arg.slice(4).trim();
1282
+ let rest = arg.slice(4).trim();
1283
+ // Extract optional --timeout flag from anywhere in the command.
1284
+ // Accepts seconds, or "off" / "unlimited" / "infinite" / "-1" / "0" for
1285
+ // unlimited (-1). Other negative or non-numeric values are rejected.
1286
+ let timeoutMs;
1287
+ const timeoutMatch = rest.match(/(^|\s)--timeout\s+(\S+)/);
1288
+ if (timeoutMatch) {
1289
+ const val = timeoutMatch[2].toLowerCase();
1290
+ if (["off", "unlimited", "infinite", "-1", "0"].includes(val)) {
1291
+ timeoutMs = -1;
1292
+ }
1293
+ else {
1294
+ const secs = Number(timeoutMatch[2]);
1295
+ if (!Number.isFinite(secs) || secs < 0) {
1296
+ await ctx.reply(`❌ Invalid <code>--timeout</code> value: ${timeoutMatch[2]}`, { parse_mode: "HTML" });
1297
+ return;
1298
+ }
1299
+ timeoutMs = Math.floor(secs * 1000);
1300
+ }
1301
+ rest = rest.replace(/(^|\s)--timeout\s+\S+/, "").trim();
1302
+ }
1283
1303
  // Natural language schedule shortcuts (German + English)
1284
1304
  const naturalSchedules = {
1285
1305
  "täglich": "0 8 * * *", "daily": "0 8 * * *",
@@ -1342,7 +1362,7 @@ export function registerCommands(bot) {
1342
1362
  else {
1343
1363
  const sp = rest.indexOf(" ");
1344
1364
  if (sp < 0) {
1345
- await ctx.reply("Format: <code>/cron add &lt;schedule&gt; &lt;type&gt; &lt;payload&gt;</code>\n\nSchedule options:\n• <b>Intervals:</b> 5m, 1h, 30s, 2d\n• <b>Natural:</b> daily, weekly, monthly, weekdays, hourly\n• <b>With time:</b> 8:30 daily, weekdays 9:00\n• <b>German:</b> täglich, wöchentlich, morgens, abends\n• <b>Cron:</b> \"0 9 * * 1-5\"", { parse_mode: "HTML" });
1365
+ await ctx.reply("Format: <code>/cron add &lt;schedule&gt; &lt;type&gt; &lt;payload&gt; [--timeout &lt;sec|off&gt;]</code>\n\nSchedule options:\n• <b>Intervals:</b> 5m, 1h, 30s, 2d\n• <b>Natural:</b> daily, weekly, monthly, weekdays, hourly\n• <b>With time:</b> 8:30 daily, weekdays 9:00\n• <b>German:</b> täglich, wöchentlich, morgens, abends\n• <b>Cron:</b> \"0 9 * * 1-5\"\n\nOptional <code>--timeout</code> in seconds, or <code>off</code>/<code>-1</code> for unlimited.", { parse_mode: "HTML" });
1346
1366
  return;
1347
1367
  }
1348
1368
  schedule = rest.slice(0, sp);
@@ -1381,12 +1401,19 @@ export function registerCommands(bot) {
1381
1401
  payload,
1382
1402
  target: { platform: "telegram", chatId: String(chatId) },
1383
1403
  createdBy: `telegram:${userId}`,
1404
+ ...(timeoutMs !== undefined ? { timeoutMs } : {}),
1384
1405
  });
1385
1406
  const readableSched = humanReadableSchedule(job.schedule);
1407
+ const timeoutLine = typeof job.timeoutMs === "number"
1408
+ ? job.timeoutMs <= 0
1409
+ ? `<b>Timeout:</b> ∞ (unlimited)\n`
1410
+ : `<b>Timeout:</b> ${Math.round(job.timeoutMs / 1000)}s\n`
1411
+ : "";
1386
1412
  await ctx.reply(`✅ <b>Cron Job created</b>\n\n` +
1387
1413
  `<b>Name:</b> ${job.name}\n` +
1388
1414
  `📅 <b>${readableSched}</b>\n` +
1389
1415
  `<b>Type:</b> ${job.type}\n` +
1416
+ timeoutLine +
1390
1417
  `<b>Next run:</b> ${formatNextRun(job.nextRunAt)}\n` +
1391
1418
  `<b>ID:</b> <code>${job.id}</code>`, { parse_mode: "HTML" });
1392
1419
  return;
@@ -1734,7 +1761,7 @@ export function registerCommands(bot) {
1734
1761
  // type both "/sub-agents" and "/subagents" — Telegram routes both to this.
1735
1762
  bot.command(["sub_agents", "subagents"], async (ctx) => {
1736
1763
  const lang = getSession(ctx.from.id).language;
1737
- const { listSubAgents, cancelSubAgent, getSubAgentResult, getMaxParallelAgents, getConfiguredMaxParallel, setMaxParallelAgents, findSubAgentByName, getVisibility, setVisibility, getQueueCap, setQueueCap, } = await import("../services/subagents.js");
1764
+ const { listSubAgents, cancelSubAgent, getSubAgentResult, getMaxParallelAgents, getConfiguredMaxParallel, setMaxParallelAgents, findSubAgentByName, getVisibility, setVisibility, getQueueCap, setQueueCap, getDefaultTimeoutMs, setDefaultTimeoutMs, } = await import("../services/subagents.js");
1738
1765
  const arg = (ctx.match || "").trim();
1739
1766
  const tokens = arg.split(/\s+/).filter(Boolean);
1740
1767
  const sub = tokens[0]?.toLowerCase() || "";
@@ -1792,6 +1819,47 @@ export function registerCommands(bot) {
1792
1819
  await ctx.reply(lines.join("\n"), { parse_mode: "Markdown" });
1793
1820
  return;
1794
1821
  }
1822
+ // /subagents timeout [sec|off|unlimited|-1] — set default sub-agent timeout
1823
+ if (sub === "timeout") {
1824
+ const val = tokens[1];
1825
+ const formatTimeout = (ms) => {
1826
+ if (ms <= 0)
1827
+ return "∞ (unlimited)";
1828
+ if (ms < 1000)
1829
+ return `${ms}ms`;
1830
+ const sec = ms / 1000;
1831
+ if (sec < 60)
1832
+ return `${sec}s`;
1833
+ const min = sec / 60;
1834
+ if (min < 60)
1835
+ return `${min.toFixed(min < 10 ? 1 : 0)}min`;
1836
+ return `${(min / 60).toFixed(1)}h`;
1837
+ };
1838
+ if (!val) {
1839
+ const current = getDefaultTimeoutMs();
1840
+ await ctx.reply(`⏱ Default sub-agent timeout: *${formatTimeout(current)}*\n\n` +
1841
+ `Usage: \`/subagents timeout <sec>\` · \`/subagents timeout off\`\n` +
1842
+ `\`off\`, \`unlimited\`, \`-1\` oder \`0\` = kein Timeout. ` +
1843
+ `Gilt für neue Subagents und ai-query Cron-Jobs ohne eigenen Wert.`, { parse_mode: "Markdown" });
1844
+ return;
1845
+ }
1846
+ const lower = val.toLowerCase();
1847
+ let ms;
1848
+ if (["off", "unlimited", "infinite", "-1", "0"].includes(lower)) {
1849
+ ms = -1;
1850
+ }
1851
+ else {
1852
+ const secs = Number(val);
1853
+ if (!Number.isFinite(secs) || secs < 0) {
1854
+ await ctx.reply(`❌ Ungültiger Wert \`${val}\`. Nutze Sekunden (z.B. \`300\`) oder \`off\`.`, { parse_mode: "Markdown" });
1855
+ return;
1856
+ }
1857
+ ms = Math.floor(secs * 1000);
1858
+ }
1859
+ const effective = setDefaultTimeoutMs(ms);
1860
+ await ctx.reply(`✅ Default sub-agent timeout: *${formatTimeout(effective)}*`, { parse_mode: "Markdown" });
1861
+ return;
1862
+ }
1795
1863
  // /subagents queue <n> — set bounded-queue cap (0 disables queue)
1796
1864
  if (sub === "queue") {
1797
1865
  const n = parseInt(tokens[1] || "", 10);
@@ -1921,6 +1989,10 @@ export function registerCommands(bot) {
1921
1989
  ? `${t("bot.subagents.maxLabel", lang)} 0 ${t("bot.subagents.autoSuffix", lang, { n: effective })}`
1922
1990
  : `${t("bot.subagents.maxLabel", lang)} ${configured}`;
1923
1991
  const visibilityLabel = `${t("bot.subagents.visibilityLabel", lang)} *${getVisibility()}*`;
1992
+ const currentTimeout = getDefaultTimeoutMs();
1993
+ const timeoutLabel = currentTimeout <= 0
1994
+ ? `⏱ Timeout: *∞ (unlimited)*`
1995
+ : `⏱ Timeout: *${Math.round(currentTimeout / 1000)}s*`;
1924
1996
  const agents = listSubAgents();
1925
1997
  let body = "";
1926
1998
  if (agents.length === 0) {
@@ -1931,7 +2003,7 @@ export function registerCommands(bot) {
1931
2003
  }
1932
2004
  const header = t("bot.subagents.header", lang);
1933
2005
  const usage = `\n\n${t("bot.subagents.usage", lang)}`;
1934
- const full = `${header}\n${maxLabel}\n${visibilityLabel}${body}${usage}`;
2006
+ const full = `${header}\n${maxLabel}\n${visibilityLabel}\n${timeoutLabel}${body}${usage}`;
1935
2007
  await ctx.reply(full, { parse_mode: "Markdown" }).catch(() => ctx.reply(full));
1936
2008
  });
1937
2009
  }
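Both `/cron add … --timeout` and `/subagents timeout` in the hunks above apply the same parsing rules to the user's value. A minimal sketch of those rules as one helper follows. `parseTimeoutArg` is a hypothetical name that does not exist in the package, the `null` return for invalid input is an assumption made for illustration, and, unlike the diffed code, the sketch normalises sub-millisecond inputs to `-1` instead of `0`.

```javascript
// Hypothetical helper; mirrors the parsing rules visible in the diff above.
const UNLIMITED_WORDS = ["off", "unlimited", "infinite", "-1", "0"];

/** Returns a timeout in milliseconds, -1 for unlimited, or null if invalid. */
function parseTimeoutArg(raw) {
  const val = String(raw).trim().toLowerCase();
  if (UNLIMITED_WORDS.includes(val)) return -1;
  const secs = Number(val);
  // Number("") is 0, so reject empty input explicitly.
  if (val === "" || !Number.isFinite(secs) || secs < 0) return null;
  const ms = Math.floor(secs * 1000);
  // Values below 1 ms floor to 0, which the runtime treats as unlimited.
  return ms > 0 ? ms : -1;
}
```

For example, `parseTimeoutArg("3600")` yields `3600000` and `parseTimeoutArg("off")` yields `-1`; a caller would reply with the "invalid value" message whenever the result is `null`.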
package/dist/i18n.js CHANGED
@@ -519,10 +519,10 @@ const strings = {
519
519
  fr: "Durée : {sec}s · Tokens : {in}/{out}",
520
520
  },
521
521
  "bot.subagents.usage": {
522
- en: "Commands:\n/subagents — show status\n/subagents max <n> — set parallel limit (0=auto)\n/subagents visibility <auto|banner|silent|live> — delivery mode\n/subagents queue <n> — bounded-queue cap (0 = disabled)\n/subagents stats — last 24h run stats\n/subagents list — list all\n/subagents cancel <name|id> — cancel one\n/subagents result <name|id> — show result",
523
- de: "Befehle:\n/subagents — Status anzeigen\n/subagents max <n> — Parallel-Limit setzen (0=auto)\n/subagents visibility <auto|banner|silent|live> — Delivery-Modus\n/subagents list — alle anzeigen\n/subagents cancel <name|id> — abbrechen\n/subagents result <name|id> — Ergebnis anzeigen",
524
- es: "Comandos:\n/subagents — ver estado\n/subagents max <n> — establecer límite (0=auto)\n/subagents visibility <auto|banner|silent|live> — modo de entrega\n/subagents list — listar todos\n/subagents cancel <nombre|id> — cancelar uno\n/subagents result <nombre|id> — ver resultado",
525
- fr: "Commandes :\n/subagents — état\n/subagents max <n> — limite parallèle (0=auto)\n/subagents visibility <auto|banner|silent|live> — mode de livraison\n/subagents list — lister tous\n/subagents cancel <nom|id> — annuler un\n/subagents result <nom|id> — voir résultat",
522
+ en: "Commands:\n/subagents — show status\n/subagents max <n> — set parallel limit (0=auto)\n/subagents timeout <sec|off> — default timeout (off = unlimited)\n/subagents visibility <auto|banner|silent|live> — delivery mode\n/subagents queue <n> — bounded-queue cap (0 = disabled)\n/subagents stats — last 24h run stats\n/subagents list — list all\n/subagents cancel <name|id> — cancel one\n/subagents result <name|id> — show result",
523
+ de: "Befehle:\n/subagents — Status anzeigen\n/subagents max <n> — Parallel-Limit setzen (0=auto)\n/subagents timeout <sec|off> — Default-Timeout (off = unendlich)\n/subagents visibility <auto|banner|silent|live> — Delivery-Modus\n/subagents queue <n> — Queue-Cap (0 = deaktiviert)\n/subagents list — alle anzeigen\n/subagents cancel <name|id> — abbrechen\n/subagents result <name|id> — Ergebnis anzeigen",
524
+ es: "Comandos:\n/subagents — ver estado\n/subagents max <n> — establecer límite (0=auto)\n/subagents timeout <seg|off> — timeout por defecto (off = sin límite)\n/subagents visibility <auto|banner|silent|live> — modo de entrega\n/subagents list — listar todos\n/subagents cancel <nombre|id> — cancelar uno\n/subagents result <nombre|id> — ver resultado",
525
+ fr: "Commandes :\n/subagents — état\n/subagents max <n> — limite parallèle (0=auto)\n/subagents timeout <sec|off> — délai par défaut (off = illimité)\n/subagents visibility <auto|banner|silent|live> — mode de livraison\n/subagents list — lister tous\n/subagents cancel <nom|id> — annuler un\n/subagents result <nom|id> — voir résultat",
526
526
  },
527
527
  "bot.subagents.visibilityLabel": {
528
528
  en: "Visibility:",
package/dist/index.js CHANGED
@@ -14,6 +14,11 @@ if (hasLegacyData()) {
14
14
  }
15
15
  // 3. Seed defaults for any files that don't exist yet (fresh install)
16
16
  seedDefaults();
17
+ // 4. Crash-loop brake check — if we've crashed N times in a short window,
18
+ // refuse to start, write an alert file, and unload our LaunchAgent so
19
+ // launchd stops retrying. Runs BEFORE any expensive init so a broken
20
+ // state file doesn't tank the whole CPU.
21
+ checkCrashLoopBrake();
17
22
  // ── Normal imports (safe now — DATA_DIR is ready) ──────────────────
18
23
  import { Bot, InlineKeyboard } from "grammy";
19
24
  import { config } from "./config.js";
@@ -76,6 +81,7 @@ import { loadSkills } from "./services/skills.js";
76
81
  import { loadHooks } from "./services/hooks.js";
77
82
  import { registerShutdownHandler } from "./services/restart.js";
78
83
  import { cancelAllSubAgents } from "./services/subagents.js";
84
+ import { startWatchdog, stopWatchdog, checkCrashLoopBrake } from "./services/watchdog.js";
79
85
  import { getRegistry } from "./engine.js";
80
86
  import { scanAssets } from "./services/asset-index.js";
81
87
  // Scan asset directory and generate INDEX.json + INDEX.md
@@ -210,10 +216,20 @@ if (hasTelegram) {
210
216
  bot.on("message:photo", handlePhoto);
211
217
  bot.on("message:document", handleDocument);
212
218
  bot.on("message:text", handleMessage);
213
- // Error handling — log but don't crash
219
+ // Error handling — log but don't crash.
214
220
  bot.catch((err) => {
215
221
  const ctx = err.ctx;
216
222
  const e = err.error;
223
+ // Telegram's "message is not modified" (400) is harmless — it fires
224
+ // when a callback handler re-renders an inline keyboard / edited
225
+ // message with content that happens to match the current message
226
+ // exactly (e.g. double-tapped toggle button, identical list after
227
+ // re-render). Swallow it silently so it neither pollutes the logs
228
+ // nor bubbles up to the user as "internal error".
229
+ const msg = e instanceof Error ? e.message : String(e);
230
+ if (/message is not modified/i.test(msg) || /specified new message content.*exactly the same/i.test(msg)) {
231
+ return;
232
+ }
217
233
  console.error(`Error handling update ${ctx?.update?.update_id}:`, e);
218
234
  // Try to notify the user
219
235
  if (ctx?.chat?.id) {
@@ -235,6 +251,7 @@ const shutdown = async () => {
235
251
  // agents can post a cancellation message to Telegram before the bot
236
252
  // stops. Capped at 5s internally so a hang can't block shutdown.
237
253
  await cancelAllSubAgents(true);
254
+ stopWatchdog();
238
255
  stopScheduler();
239
256
  stopSessionCleanup();
240
257
  if (queueInterval)
@@ -472,6 +489,8 @@ if (bot) {
472
489
  console.log(` Users: ${config.allowedUsers.length} authorized`);
473
490
  // Start heartbeat monitor
474
491
  startHeartbeat();
492
+ // Start internal watchdog (crash-loop brake + liveness beacon)
493
+ startWatchdog();
475
494
  // Index memory vectors in background (non-blocking)
476
495
  initEmbeddings().catch(() => { });
477
496
  },
@@ -483,5 +502,6 @@ else {
483
502
  console.log(` WebUI: http://localhost:${process.env.WEB_PORT || 3100}`);
484
503
  // Start heartbeat monitor even without Telegram
485
504
  startHeartbeat();
505
+ startWatchdog();
486
506
  initEmbeddings().catch(() => { });
487
507
  }
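The `bot.catch()` filter in the `index.js` hunk above boils down to a predicate over the error message. The sketch below wraps the same two regular expressions in a standalone function so the behaviour can be checked in isolation; `isNotModifiedError` and `NOT_MODIFIED_PATTERNS` are hypothetical names, not identifiers from the package.

```javascript
// Sketch only: the two patterns from the diff, wrapped in a hypothetical predicate.
const NOT_MODIFIED_PATTERNS = [
  /message is not modified/i,
  /specified new message content.*exactly the same/i,
];

/** True when the error is Telegram's harmless "message is not modified" 400. */
function isNotModifiedError(e) {
  const msg = e instanceof Error ? e.message : String(e);
  return NOT_MODIFIED_PATTERNS.some((re) => re.test(msg));
}
```

An error handler could return early when the predicate is true and fall through to logging and the "internal error" reply otherwise.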
@@ -122,11 +122,16 @@ async function executeJob(job) {
122
122
  }
123
123
  case "shell": {
124
124
  const cmd = job.payload.command || "echo 'no command'";
125
- const output = execSync(cmd, {
126
- timeout: 60_000,
125
+ // Per-job timeout, default = no timeout (execSync treats timeout=0
126
+ // or "undefined" as infinite). Users opt in via /cron add … --timeout N.
127
+ const shellOpts = {
127
128
  stdio: "pipe",
128
129
  env: { ...process.env, PATH: process.env.PATH + ":/opt/homebrew/bin:/usr/local/bin" },
129
- }).toString().trim();
130
+ };
131
+ if (typeof job.timeoutMs === "number" && job.timeoutMs > 0) {
132
+ shellOpts.timeout = job.timeoutMs;
133
+ }
134
+ const output = execSync(cmd, shellOpts).toString().trim();
130
135
  // Notify with output
131
136
  if (notifyCallback && output) {
132
137
  await notifyCallback(job.target, `🔧 ${job.name}\n\`\`\`\n${output.slice(0, 3000)}\n\`\`\``);
@@ -173,14 +178,20 @@ async function executeJob(job) {
173
178
  ? Number(job.target.chatId)
174
179
  : undefined;
175
180
  const result = await new Promise((resolve, reject) => {
176
- spawnSubAgent({
181
+ // Only pass `timeout` through when the job has a per-job value.
182
+ // Otherwise the sub-agent inherits the current /subagents default.
183
+ const spawnConfig = {
177
184
  name: job.name,
178
185
  prompt,
179
186
  workingDir: BOT_ROOT,
180
187
  source: "cron",
181
188
  parentChatId,
182
189
  onComplete: (r) => resolve(r),
183
- }).catch(reject);
190
+ };
191
+ if (typeof job.timeoutMs === "number") {
192
+ spawnConfig.timeout = job.timeoutMs;
193
+ }
194
+ spawnSubAgent(spawnConfig).catch(reject);
184
195
  });
185
196
  // Non-success: don't notify here. The I3 delivery router has
186
197
  // already posted the appropriate banner (cancelled / timeout /
@@ -309,6 +320,7 @@ export function createJob(input) {
309
320
  nextRunAt: null,
310
321
  runCount: 0,
311
322
  createdBy: input.createdBy || "unknown",
323
+ ...(typeof input.timeoutMs === "number" ? { timeoutMs: input.timeoutMs } : {}),
312
324
  };
313
325
  // Calculate first run
314
326
  job.nextRunAt = calculateNextRun(job);
@@ -21,6 +21,14 @@ let configCache = null;
21
21
  function isValidVisibility(v) {
22
22
  return v === "auto" || v === "banner" || v === "silent" || v === "live";
23
23
  }
24
+ /** Resolve the initial default timeout from config.ts, which itself seeds
25
+ * from the SUBAGENT_TIMEOUT env var. -1 = unlimited. */
26
+ function seedDefaultTimeout() {
27
+ const raw = config.subAgentTimeout;
28
+ if (typeof raw !== "number" || !Number.isFinite(raw) || raw <= 0)
29
+ return -1;
30
+ return Math.floor(raw);
31
+ }
24
32
  function loadSubAgentsConfig() {
25
33
  if (configCache)
26
34
  return configCache;
@@ -33,14 +41,18 @@ function loadSubAgentsConfig() {
33
41
  queueCap: typeof parsed.queueCap === "number"
34
42
  ? Math.max(0, Math.min(Math.floor(parsed.queueCap), ABSOLUTE_MAX_QUEUE))
35
43
  : DEFAULT_QUEUE_CAP,
44
+ defaultTimeoutMs: typeof parsed.defaultTimeoutMs === "number" && Number.isFinite(parsed.defaultTimeoutMs)
45
+ ? (parsed.defaultTimeoutMs <= 0 ? -1 : Math.floor(parsed.defaultTimeoutMs))
46
+ : seedDefaultTimeout(),
36
47
  };
37
48
  }
38
49
  catch {
39
- // File missing or invalid — seed from env var then default to auto
50
+ // File missing or invalid — seed from env vars then default to auto/unlimited
40
51
  configCache = {
41
52
  maxParallel: Number(process.env.MAX_SUBAGENTS) || 0,
42
53
  visibility: "auto",
43
54
  queueCap: DEFAULT_QUEUE_CAP,
55
+ defaultTimeoutMs: seedDefaultTimeout(),
44
56
  };
45
57
  }
46
58
  return configCache;
@@ -102,6 +114,18 @@ export function setQueueCap(n) {
102
114
  saveSubAgentsConfig({ ...cfg, queueCap: clamped });
103
115
  return clamped;
104
116
  }
117
+ /** Current default timeout in ms. -1 = unlimited. */
118
+ export function getDefaultTimeoutMs() {
119
+ return loadSubAgentsConfig().defaultTimeoutMs;
120
+ }
121
+ /** Set the default timeout in ms. Any value ≤ 0 or non-finite collapses
122
+ * to -1 (unlimited). Returns the persisted value. */
123
+ export function setDefaultTimeoutMs(ms) {
124
+ const normalized = !Number.isFinite(ms) || ms <= 0 ? -1 : Math.floor(ms);
125
+ const cfg = loadSubAgentsConfig();
126
+ saveSubAgentsConfig({ ...cfg, defaultTimeoutMs: normalized });
127
+ return normalized;
128
+ }
105
129
  // ── State ───────────────────────────────────────────────
106
130
  const activeAgents = new Map();
107
131
  // ── Name resolver (B2) ──────────────────────────────────
@@ -433,14 +457,23 @@ export function spawnSubAgent(agentConfig) {
433
457
  const resolved = resolveAgentName(agentConfig.name);
434
458
  const resolvedName = resolved.name;
435
459
  const id = crypto.randomUUID();
436
- const timeout = agentConfig.timeout ?? config.subAgentTimeout;
460
+ // Timeout resolution order:
461
+ // 1. Per-spawn override (agentConfig.timeout) — used by cron jobs that
462
+ // carry their own timeoutMs.
463
+ // 2. Runtime default from sub-agents.json (set via /subagents timeout).
464
+ // 3. config.subAgentTimeout fallback (seeded from SUBAGENT_TIMEOUT env).
465
+ // Any value ≤ 0 means "no timeout" — we simply don't arm the abort timer.
466
+ // The existing null-safe `clearTimeout(timeoutId)` call sites make this
467
+ // a safe no-op when the agent finishes or is cancelled.
468
+ const timeout = agentConfig.timeout ?? getDefaultTimeoutMs();
437
469
  const abort = new AbortController();
438
- const timeoutId = setTimeout(() => abort.abort(), timeout);
470
+ const timeoutId = timeout > 0 ? setTimeout(() => abort.abort(), timeout) : null;
439
471
  const willRunImmediately = running < maxParallel;
440
472
  const canQueue = !willRunImmediately && queueCap > 0 && queuedLen < queueCap;
441
473
  if (!willRunImmediately && !canQueue) {
442
474
  // No slot, no queue room → priority-aware reject
443
- clearTimeout(timeoutId);
475
+ if (timeoutId)
476
+ clearTimeout(timeoutId);
444
477
  const source = sourceOf(agentConfig);
445
478
  const runningAgents = [...activeAgents.values()].filter((a) => a.info.status === "running");
446
479
  const userSlots = runningAgents.filter((a) => a.info.source === "user").length;
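The conditional timer arming in this hunk can be sketched on its own. `resolveTimeout` and `armAbortTimer` are hypothetical helper names introduced here. The real code inlines both expressions inside `spawnSubAgent` and reads the runtime default from `getDefaultTimeoutMs()`.

```javascript
// Hypothetical helpers, not part of alvin-bot. They mirror the two
// expressions from the hunk above so they can be run in isolation.

// A per-spawn override wins unless it is null or undefined.
function resolveTimeout(perSpawn, runtimeDefault) {
  return perSpawn ?? runtimeDefault;
}

// Arm the abort timer only for a positive timeout; otherwise return null.
function armAbortTimer(timeoutMs, abort) {
  return timeoutMs > 0 ? setTimeout(() => abort.abort(), timeoutMs) : null;
}

const abort = new AbortController();

// No override, unlimited default: no timer is created.
const unlimited = armAbortTimer(resolveTimeout(undefined, -1), abort);
console.log(unlimited); // null

// A 30 s override: a timer handle is returned and must be cleared later.
const capped = armAbortTimer(resolveTimeout(30000, -1), abort);
if (capped) clearTimeout(capped);
```

Because `??` only falls through on `null` or `undefined`, a per-spawn value of 0 still overrides a finite runtime default and results in no timeout.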
@@ -19,6 +19,7 @@ import { resolve, dirname } from "path";
19
19
  import { fileURLToPath } from "url";
20
20
  import fs from "fs";
21
21
  import os from "os";
22
+ import { BOT_VERSION } from "../version.js";
22
23
  const execAsync = promisify(exec);
23
24
  const PROJECT_ROOT = resolve(dirname(fileURLToPath(import.meta.url)), "../..");
24
25
  const DATA_DIR = process.env.ALVIN_DATA_DIR || resolve(os.homedir(), ".alvin-bot");
@@ -84,12 +85,40 @@ function compareSemver(a, b) {
84
85
  }
85
86
  return 0;
86
87
  }
88
+ /**
89
+ * Is the running bot's in-memory version older than what's already built
90
+ * on disk? This happens when the dev/CI rebuilt the bot mid-session and
91
+ * the process hasn't restarted yet. A manual /update without a git/npm
92
+ * fetch should still trigger a restart in this case so the fresh code
93
+ * takes effect.
94
+ */
95
+ function isRuntimeStale() {
96
+ const onDisk = readLocalVersion();
97
+ if (!onDisk || !BOT_VERSION || BOT_VERSION === "unknown")
98
+ return false;
99
+ return compareSemver(BOT_VERSION, onDisk) < 0;
100
+ }
87
101
  /** Pull latest changes, install deps, rebuild. Returns a structured result
88
102
  * instead of throwing so the /update command can report cleanly to Telegram.
89
103
  * Dispatches to the git path for source installs and the npm path for
90
- * npm-global installs. */
104
+ * npm-global installs.
105
+ *
106
+ * Before doing any fetch, checks whether the disk is already newer than
107
+ * the running process (i.e. someone rebuilt between the process start
108
+ * and this call). If so, returns success with requiresRestart=true so
109
+ * the command handler can trigger a graceful restart.
110
+ */
91
111
  export async function runUpdate() {
92
112
  try {
113
+ // Stale-runtime check: disk is already newer than the running code.
114
+ if (isRuntimeStale()) {
115
+ const onDisk = readLocalVersion();
116
+ return {
117
+ ok: true,
118
+ message: `Disk is already built at v${onDisk}, running v${BOT_VERSION}. Restarting to pick up the new code...`,
119
+ requiresRestart: true,
120
+ };
121
+ }
93
122
  if (isOwnGitRepo()) {
94
123
  return await runGitUpdate();
95
124
  }
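The stale-runtime check reduces to a version comparison with a few guard clauses. In the sketch below, `compareVersions` is a hypothetical minimal comparator for plain `major.minor.patch` strings. The project's own `compareSemver` is only partly visible in this diff and may behave differently, for example on pre-release tags.

```javascript
// Hypothetical comparator for "x.y.z" strings; an assumption standing in
// for the project's compareSemver, whose full body is not shown here.
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff < 0 ? -1 : 1;
  }
  return 0;
}

// Same guard logic as isRuntimeStale, with both versions passed in
// explicitly instead of being read from disk and from BOT_VERSION.
function isStale(running, onDisk) {
  if (!onDisk || !running || running === "unknown") return false;
  return compareVersions(running, onDisk) < 0;
}

console.log(isStale("4.8.6", "4.8.8"));   // true
console.log(isStale("4.8.8", "4.8.8"));   // false
console.log(isStale("unknown", "4.8.8")); // false
```

Passing the versions as arguments keeps the check free of file-system access, which makes the guard clauses easy to exercise.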
@@ -0,0 +1,236 @@
1
+ /**
2
+ * Internal Watchdog — Self-monitoring for crash-loop detection.
3
+ *
4
+ * Writes a liveness beacon file every 30 s with the current pid + boot
5
+ * time + crash counter. On startup, reads the beacon to detect whether
6
+ * the previous process exited cleanly or crashed. If too many crashes
7
+ * happen in a short window, refuses to keep restarting and writes an
8
+ * alert file so the user can investigate.
9
+ *
10
+ * Persistence layers this complements:
11
+ * - launchd KeepAlive: true → restarts on any exit (good)
12
+ * - ThrottleInterval: 5 → minimum 5 s between restarts (good)
13
+ * - This watchdog → caps the total restart count so we
14
+ * don't burn CPU on a truly broken state
15
+ *
16
+ * What this CAN catch:
17
+ * - Process crash → exit non-zero → launchd restarts → next boot reads
18
+ * beacon, sees a recent exit, increments crash counter
19
+ * - Tight crash loop → counter accumulates → hits brake at 10
20
+ *
21
+ * What this CANNOT catch (yet):
22
+ * - True event-loop deadlocks (process alive but frozen). That requires
23
+ * an external watchdog process — tracked as a follow-up.
24
+ */
25
+ import fs from "fs";
26
+ import { resolve } from "path";
27
+ import os from "os";
28
+ import { execSync } from "child_process";
29
+ import { BOT_VERSION } from "../version.js";
30
+ const DATA_DIR = process.env.ALVIN_DATA_DIR || resolve(os.homedir(), ".alvin-bot");
31
+ const STATE_DIR = resolve(DATA_DIR, "state");
32
+ const BEACON_FILE = resolve(STATE_DIR, "watchdog.json");
33
+ const ALERT_FILE = resolve(STATE_DIR, "crash-loop.alert");
34
+ const BEACON_INTERVAL_MS = 30_000; // write a beacon every 30 s
35
+ const CRASH_WINDOW_MS = 10 * 60 * 1000; // 10 min — crashes within this count toward the brake
36
+ const CRASH_BRAKE_THRESHOLD = 10; // after this many crashes in the window, brake
37
+ const STALE_BEACON_MS = 90_000; // a beacon older than this is considered "old enough that previous process really exited"
38
+ const RECOVERY_UPTIME_MS = 5 * 60 * 1000; // 5 min of clean uptime resets the counter
39
+ let beaconTimer = null;
40
+ let resetTimer = null;
41
+ let bootTime = 0;
42
+ function ensureStateDir() {
43
+ try {
44
+ fs.mkdirSync(STATE_DIR, { recursive: true });
45
+ }
46
+ catch (err) {
47
+ console.error("[watchdog] failed to create state dir:", err);
48
+ }
49
+ }
50
+ function readBeacon() {
51
+ try {
52
+ const raw = fs.readFileSync(BEACON_FILE, "utf-8");
53
+ const parsed = JSON.parse(raw);
54
+ if (typeof parsed.lastBeat === "number" &&
55
+ typeof parsed.pid === "number" &&
56
+ typeof parsed.bootTime === "number" &&
57
+ typeof parsed.crashCount === "number" &&
58
+ typeof parsed.crashWindowStart === "number" &&
59
+ typeof parsed.version === "string") {
60
+ return parsed;
61
+ }
62
+ return null;
63
+ }
64
+ catch {
65
+ return null;
66
+ }
67
+ }
68
+ function writeBeacon(data) {
69
+ try {
70
+ fs.writeFileSync(BEACON_FILE, JSON.stringify(data, null, 0), "utf-8");
71
+ }
72
+ catch (err) {
73
+ console.error("[watchdog] failed to write beacon:", err);
74
+ }
75
+ }
76
+ function writeAlert(reason, crashCount) {
77
+ try {
78
+ const content = [
79
+ `Alvin Bot crash-loop brake hit at ${new Date().toISOString()}`,
80
+ `Version: ${BOT_VERSION}`,
81
+ `Crashes in the last ${CRASH_WINDOW_MS / 60_000} minutes: ${crashCount}`,
82
+ `Threshold: ${CRASH_BRAKE_THRESHOLD}`,
83
+ ``,
84
+ `Reason: ${reason}`,
85
+ ``,
86
+ `The bot will refuse to start until this file is removed AND the`,
87
+ `LaunchAgent is reloaded. Investigate the recent error log:`,
88
+ ` ${resolve(DATA_DIR, "logs", "alvin-bot.err.log")}`,
89
+ ``,
90
+ `Recovery steps once you've fixed the underlying issue:`,
91
+ ` rm "${ALERT_FILE}"`,
92
+ ` alvin-bot launchd install # or just kickstart the service`,
93
+ ``,
94
+ ].join("\n");
95
+ fs.writeFileSync(ALERT_FILE, content, "utf-8");
96
+ }
97
+ catch (err) {
98
+ console.error("[watchdog] failed to write alert:", err);
99
+ }
100
+ }
101
+ /**
102
+ * Check whether the watchdog has hit the crash-loop brake. Called once
103
+ * at startup, BEFORE most of the bot initializes. If the brake is set
104
+ * (alert file exists), the bot exits cleanly with code 3 — and because
105
+ * launchd's KeepAlive will keep retrying, we also try to unload our
106
+ * own LaunchAgent so the retries stop.
107
+ */
108
+ export function checkCrashLoopBrake() {
109
+ if (!fs.existsSync(ALERT_FILE))
110
+ return;
111
+ console.error("");
112
+ console.error("==================================================");
113
+ console.error("⛔ alvin-bot crash-loop brake is engaged");
114
+ console.error("==================================================");
115
+ try {
116
+ const content = fs.readFileSync(ALERT_FILE, "utf-8");
117
+ console.error(content);
118
+ }
119
+ catch { /* ignore */ }
120
+ // Attempt to unload our own LaunchAgent so launchd stops retrying.
121
+ // If we don't do this, launchd just KeepAlive's us forever and we
122
+ // burn CPU writing the same alert.
123
+ if (process.platform === "darwin") {
124
+ try {
125
+ const home = os.homedir();
126
+ const plistPath = resolve(home, "Library", "LaunchAgents", "com.alvinbot.app.plist");
127
+ if (fs.existsSync(plistPath)) {
128
+ execSync(`launchctl unload -w "${plistPath}"`, { stdio: "pipe" });
129
+ console.error("[watchdog] LaunchAgent unloaded — bot will not auto-restart.");
130
+ }
131
+ }
132
+ catch (err) {
133
+ console.error("[watchdog] failed to unload LaunchAgent:", err);
134
+ }
135
+ }
136
+ // Exit with a distinct code so logs make the cause obvious
137
+ process.exit(3);
138
+ }
139
+ /**
140
+ * Start the watchdog. Called from src/index.ts after all services are
141
+ * initialized. Reads the previous beacon, increments crash counter if
142
+ * the previous run exited recently, schedules the periodic beacon
143
+ * writer, and schedules a recovery-mark reset after RECOVERY_UPTIME_MS
144
+ * of clean uptime.
145
+ */
146
+ export function startWatchdog() {
147
+ ensureStateDir();
148
+ bootTime = Date.now();
149
+ const previous = readBeacon();
150
+ let crashCount = 0;
151
+ let crashWindowStart = bootTime;
152
+ if (previous) {
153
+ const timeSinceLastBeat = bootTime - previous.lastBeat;
154
+ const inWindow = bootTime - previous.crashWindowStart < CRASH_WINDOW_MS;
155
+ if (timeSinceLastBeat < STALE_BEACON_MS) {
156
+ // Previous process exited very recently → that's a crash (or a
157
+ // graceful exit immediately followed by a restart, which we treat
158
+ // the same way for the brake — the goal is to detect rapid cycles).
159
+ if (inWindow) {
160
+ crashCount = previous.crashCount + 1;
161
+ crashWindowStart = previous.crashWindowStart;
162
+ }
163
+ else {
164
+ // Previous crash was outside the window → reset counter
165
+ crashCount = 1;
166
+ }
167
+ console.log(`[watchdog] detected restart after ${Math.round(timeSinceLastBeat / 1000)}s — crash ${crashCount}/${CRASH_BRAKE_THRESHOLD} in current ${CRASH_WINDOW_MS / 60_000}min window`);
168
+ if (crashCount >= CRASH_BRAKE_THRESHOLD) {
169
+ console.error(`[watchdog] crash-loop brake triggered (${crashCount} crashes in ${CRASH_WINDOW_MS / 60_000}min)`);
170
+ writeAlert(`Process restarted ${crashCount} times within ${CRASH_WINDOW_MS / 60_000} minutes. Last beacon was ${Math.round(timeSinceLastBeat / 1000)}s ago. Most likely a deterministic crash on startup.`, crashCount);
171
+ // Re-use the brake check to unload + exit cleanly
172
+ checkCrashLoopBrake();
173
+ }
174
+ }
175
+ else {
176
+ // Previous beacon was old → the bot was not running for a while before
177
+ // this boot (manual stop, long outage, or system reboot). Reset crash count.
178
+ crashCount = 0;
179
+ crashWindowStart = bootTime;
180
+ }
181
+ }
182
+ // Write the first beacon immediately so a fresh restart updates the file
183
+ writeBeacon({
184
+ lastBeat: bootTime,
185
+ pid: process.pid,
186
+ bootTime,
187
+ crashCount,
188
+ crashWindowStart,
189
+ version: BOT_VERSION,
190
+ });
191
+ // Periodic beacon writer
192
+ beaconTimer = setInterval(() => {
193
+ writeBeacon({
194
+ lastBeat: Date.now(),
195
+ pid: process.pid,
196
+ bootTime,
197
+ crashCount,
198
+ crashWindowStart,
199
+ version: BOT_VERSION,
200
+ });
201
+ }, BEACON_INTERVAL_MS);
202
+ // Schedule a recovery counter reset after RECOVERY_UPTIME_MS of clean
203
+ // uptime. If we make it that far without dying, the bot is healthy
204
+ // again and we shouldn't penalize a future single crash.
205
+ resetTimer = setTimeout(() => {
206
+ if (crashCount > 0) {
207
+ console.log(`[watchdog] ${RECOVERY_UPTIME_MS / 60_000}min clean uptime — resetting crash counter from ${crashCount} to 0`);
208
+ crashCount = 0;
209
+ crashWindowStart = Date.now();
210
+ writeBeacon({
211
+ lastBeat: Date.now(),
212
+ pid: process.pid,
213
+ bootTime,
214
+ crashCount,
215
+ crashWindowStart,
216
+ version: BOT_VERSION,
217
+ });
218
+ }
219
+ }, RECOVERY_UPTIME_MS);
220
+ console.log(`[watchdog] started — beacon every ${BEACON_INTERVAL_MS / 1000}s, brake at ${CRASH_BRAKE_THRESHOLD} crashes per ${CRASH_WINDOW_MS / 60_000}min, recovery after ${RECOVERY_UPTIME_MS / 60_000}min uptime`);
221
+ }
222
+ /**
223
+ * Stop the watchdog cleanly. Called from the shutdown handler in
224
+ * index.ts so beacon timers don't keep the process alive after the
225
+ * grammy bot has stopped.
226
+ */
227
+ export function stopWatchdog() {
228
+ if (beaconTimer) {
229
+ clearInterval(beaconTimer);
230
+ beaconTimer = null;
231
+ }
232
+ if (resetTimer) {
233
+ clearTimeout(resetTimer);
234
+ resetTimer = null;
235
+ }
236
+ }
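The crash-counting decision in `startWatchdog` can be written as a pure function of the previous beacon and the current time. `nextCrashState` is a hypothetical name used only for this sketch. The constants are copied from the file above, and all file I/O, logging, and the `launchctl` call are omitted.

```javascript
const CRASH_WINDOW_MS = 10 * 60 * 1000; // crashes inside this window accumulate
const STALE_BEACON_MS = 90000;          // older beacons do not count as a crash
const CRASH_BRAKE_THRESHOLD = 10;       // brake once this many crashes are seen

// Hypothetical pure version of the branch logic in startWatchdog.
// `previous` is the parsed beacon (or null); `now` is the boot time in ms.
function nextCrashState(previous, now) {
  // No beacon, or a stale one: start with a clean counter.
  if (!previous || now - previous.lastBeat >= STALE_BEACON_MS) {
    return { crashCount: 0, crashWindowStart: now, brake: false };
  }
  // Recent beacon: count a crash, continuing the window if still inside it.
  const inWindow = now - previous.crashWindowStart < CRASH_WINDOW_MS;
  const crashCount = inWindow ? previous.crashCount + 1 : 1;
  const crashWindowStart = inWindow ? previous.crashWindowStart : now;
  return { crashCount, crashWindowStart, brake: crashCount >= CRASH_BRAKE_THRESHOLD };
}

const now = Date.now();
const ninth = { lastBeat: now - 5000, crashCount: 9, crashWindowStart: now - 60000 };
console.log(nextCrashState(ninth, now).brake); // true
console.log(nextCrashState(null, now).crashCount); // 0
```

Since the beacon is rewritten every 30 s, its age at boot reflects how long the bot was down, not how long the previous process ran. A process that ran for several minutes and then crashed still counts, unless the 5 min recovery timer had already reset the counter.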
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "alvin-bot",
3
- "version": "4.8.6",
3
+ "version": "4.8.8",
4
4
  "description": "Alvin Bot — Your personal AI agent on Telegram, WhatsApp, Discord, Signal, and Web.",
5
5
  "type": "module",
6
6
  "main": "dist/index.js",