alvin-bot 4.16.1 → 4.18.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,53 @@
2
2
 
3
3
  All notable changes to Alvin Bot are documented here.
4
4
 
5
+ ## [4.18.0] — 2026-04-20
6
+
7
+ ### ⚡ Performance + Hardening: medium-priority cleanups from the stability audit
8
+
9
+ Completes the audit work started in 4.17.0 by addressing the remaining medium-severity findings.
10
+
11
+ **Performance (hot path):**
12
+ - **User profiles now cached in memory** (`src/services/users.ts`). Previously `touchProfile` — called on every inbound message — did a sync `readFileSync` + `writeFileSync` on disk. Now it updates an in-memory cache and schedules a debounced flush (2s batch window). A final flush runs on graceful shutdown so nothing is lost. Drops 2 blocking fs operations per message.
13
+ - **Embeddings index now cached** (`src/services/embeddings.ts`). Semantic search previously re-read + re-parsed the full on-disk index on every query (100+ MB for large memories). Now cached in memory with mtime-based invalidation — external reindexers are still picked up without a restart.
14
+ - **Skills no longer force-reload every 5 minutes** (`src/services/skills.ts`). `getSkills()` used to re-scan the disk after 5min even though `fs.watch` already triggers hot-reload on change. Cache is now authoritative.
15
+
16
+ **Hardening (unbounded growth):**
17
+ - **Sub-agents map capped at 1000** (`src/services/subagents.ts`). On overflow, evicts the oldest delivered/terminated entries until the map is back down to 90% of the cap. Running agents are never evicted.
18
+ - **Async-agent pending map capped at 500** (`src/services/async-agent-watcher.ts`). Same oldest-first eviction strategy for orphaned `registerPending` entries.
19
+ - **Browser gateway + MCP subprocess stdout/stderr streams now have error handlers** (`browser-manager.ts`, `mcp.ts`). Previously an unhandled stream `error` event could crash the Node process.
20
+
21
+ **Net effect:** message path now does zero blocking fs reads/writes on the profile/skills/embeddings side. Long-running installs can't grow the in-memory state beyond the caps. No API changes.
22
+
23
+ ## [4.17.0] — 2026-04-20
24
+
25
+ ### 🛡️ Hardening: long-running stability audit + leak fixes
26
+
27
+ Ran a full audit of leak/stability hazards for 24/7 operation. Fixed the critical findings and added a disk-cleanup service so the bot stays lean over months of uptime.
28
+
29
+ **Fixes:**
30
+ - **WhatsApp event-listener leak on reconnect** (`src/platforms/whatsapp.ts`): Before every new socket, the previous socket's listeners are now removed and the old socket is ended. Without this, every reconnect stacked new listeners on top of old ones — causing memory growth and duplicate message processing after long sessions.
31
+ - **CDP file-descriptor leak** (`src/services/cdp-bootstrap.ts`): The log-file fd passed to the detached Chromium spawn is now closed in the parent after the child inherits it. Previously leaked one fd per browser bootstrap.
32
+ - **Heartbeat + auto-update timers now `.unref()`'d** and explicitly stopped in the shutdown handler. Prevents timers from keeping the process alive during graceful exit.
33
+
34
+ ### 🧹 Feature: disk-cleanup service
35
+
36
+ New service (`src/services/disk-cleanup.ts`) that runs automatically once a day. Deletes transient files that grow without bound on long-running installs:
37
+ - Bot log rotation (>100 MB by default)
38
+ - Browser screenshots (>30 days)
39
+ - Subagent output streams (>30 days)
40
+ - `/tmp/alvin-bot/` media (>7 days)
41
+ - WhatsApp media cache (>30 days)
42
+ - CDP log file
43
+
44
+ **NEVER touched:** memory, assets, workspaces, cron-jobs, .env, session-store, delivery-queue. Memory is protected.
45
+
46
+ **Configuration via env:** `CLEANUP_LOG_MAX_MB`, `CLEANUP_SCREENSHOTS_DAYS`, `CLEANUP_SUBAGENTS_DAYS`, `CLEANUP_TMP_DAYS`, `CLEANUP_WA_MEDIA_DAYS`. Set any to `0` to disable that category.
47
+
48
+ **Telegram command:**
49
+ - `/cleanup` — show current policy + protected paths
50
+ - `/cleanup run` — trigger manual pass, get stats back
51
+
5
52
  ## [4.16.1] — 2026-04-20
6
53
 
7
54
  ### 🆕 Feature: /update shows release highlights
@@ -28,6 +28,7 @@ import { getWebPort } from "../web/server.js";
28
28
  import { getUsageSummary, getAllRateLimits, formatTokens } from "../services/usage-tracker.js";
29
29
  import { runUpdate, getAutoUpdate, setAutoUpdate, startAutoUpdateLoop } from "../services/updater.js";
30
30
  import { getReleaseHighlights } from "../services/release-highlights.js";
31
+ import { runCleanup, getCleanupPolicy } from "../services/disk-cleanup.js";
31
32
  import { getHealthStatus, isFailedOver } from "../services/heartbeat.js";
32
33
  import { t, LOCALE_NAMES, LOCALE_FLAGS } from "../i18n.js";
33
34
  // Kick off auto-update loop on module load if the persistent flag is set.
@@ -1919,6 +1920,36 @@ export function registerCommands(bot) {
1919
1920
  await ctx.reply(`${t("bot.autoupdate.statusLabel", lang)} *${status ? "ON" : "OFF"}*\n\n${t("bot.autoupdate.commandsLabel", lang)}\n\`/autoupdate on\`\n\`/autoupdate off\``, { parse_mode: "Markdown" });
1920
1921
  }
1921
1922
  });
1923
+ // /cleanup — trigger disk cleanup manually, or show current policy.
1924
+ // /cleanup → show policy
1925
+ // /cleanup run → run a cleanup pass and report what was deleted
1926
+ bot.command("cleanup", async (ctx) => {
1927
+ const arg = (ctx.match || "").trim().toLowerCase();
1928
+ if (arg === "run" || arg === "now") {
1929
+ await ctx.reply("🧹 Running disk cleanup...");
1930
+ const r = await runCleanup();
1931
+ const bytes = r.bytesReclaimed;
1932
+ const human = bytes < 1024 * 1024
1933
+ ? `${(bytes / 1024).toFixed(1)} KB`
1934
+ : bytes < 1024 * 1024 * 1024
1935
+ ? `${(bytes / 1024 / 1024).toFixed(1)} MB`
1936
+ : `${(bytes / 1024 / 1024 / 1024).toFixed(2)} GB`;
1937
+ const errLine = r.errors.length > 0 ? `\n⚠️ ${r.errors.length} error(s)` : "";
1938
+ await ctx.reply(`✅ Cleanup done\n• Files deleted: ${r.filesDeleted}\n• Logs rotated: ${r.logsRotated}\n• Reclaimed: ${human}${errLine}`);
1939
+ }
1940
+ else {
1941
+ const p = getCleanupPolicy();
1942
+ await ctx.reply(`🧹 *Cleanup policy*\n` +
1943
+ `• Log rotation: >${p.logMaxSizeMb} MB\n` +
1944
+ `• Screenshots: >${p.screenshotsMaxAgeDays} days\n` +
1945
+ `• Subagent outputs: >${p.subagentsMaxAgeDays} days\n` +
1946
+ `• /tmp/alvin-bot: >${p.tmpMaxAgeDays} days\n` +
1947
+ `• WhatsApp media: >${p.waMediaMaxAgeDays} days\n\n` +
1948
+ `Memory, assets, workspaces, cron jobs are NEVER touched.\n\n` +
1949
+ `Configure via env: \`CLEANUP_LOG_MAX_MB\`, \`CLEANUP_SCREENSHOTS_DAYS\`, \`CLEANUP_SUBAGENTS_DAYS\`, \`CLEANUP_TMP_DAYS\`, \`CLEANUP_WA_MEDIA_DAYS\`\n\n` +
1950
+ `Run manually: \`/cleanup run\``, { parse_mode: "Markdown" });
1951
+ }
1952
+ });
1922
1953
  // ── /sub-agents — manage background subagents (cron jobs + manual spawns) ──
1923
1954
  //
1924
1955
  // /sub-agents → show current config + running agents
package/dist/index.js CHANGED
@@ -155,7 +155,10 @@ import { startSessionCleanup, stopSessionCleanup, attachPersistHook } from "./se
155
155
  import { loadPersistedSessions, flushSessions, schedulePersist, } from "./services/session-persistence.js";
156
156
  import { processQueue, cleanupQueue, setSenders, enqueue } from "./services/delivery-queue.js";
157
157
  import { discoverTools } from "./services/tool-discovery.js";
158
- import { startHeartbeat } from "./services/heartbeat.js";
158
+ import { startHeartbeat, stopHeartbeat } from "./services/heartbeat.js";
159
+ import { stopAutoUpdateLoop } from "./services/updater.js";
160
+ import { startCleanupLoop, stopCleanupLoop } from "./services/disk-cleanup.js";
161
+ import { flushProfiles } from "./services/users.js";
159
162
  import { initEmbeddings } from "./services/embeddings.js";
160
163
  import { loadSkills } from "./services/skills.js";
161
164
  import { loadHooks } from "./services/hooks.js";
@@ -335,10 +338,19 @@ const shutdown = async () => {
335
338
  stopAsyncAgentWatcher();
336
339
  stopSessionCleanup();
337
340
  stopWorkspaceWatcher();
341
+ stopHeartbeat();
342
+ stopAutoUpdateLoop();
343
+ stopCleanupLoop();
338
344
  // v4.11.0 — Final immediate flush of in-memory sessions to disk before exit.
339
345
  // The debounced timer might be pending; flushSessions() cancels it and writes
340
346
  // synchronously so the next boot can rehydrate the latest state.
341
347
  await flushSessions().catch((err) => console.warn("[shutdown] flushSessions failed:", err));
348
+ try {
349
+ flushProfiles();
350
+ }
351
+ catch (err) {
352
+ console.warn("[shutdown] flushProfiles failed:", err);
353
+ }
342
354
  if (queueInterval)
343
355
  clearInterval(queueInterval);
344
356
  if (queueCleanupInterval)
@@ -612,5 +624,6 @@ else {
612
624
  // Start heartbeat monitor even without Telegram
613
625
  startHeartbeat();
614
626
  startWatchdog();
627
+ startCleanupLoop();
615
628
  initEmbeddings().catch(() => { });
616
629
  }
@@ -252,6 +252,19 @@ export class WhatsAppAdapter {
252
252
  fs.mkdirSync(authDir, { recursive: true });
253
253
  const { state, saveCreds } = await useMultiFileAuthState(authDir);
254
254
  const { version } = await fetchLatestBaileysVersion();
255
+ // Cleanup previous socket (reconnect path) — without this, every reconnect
256
+ // stacks a new set of listeners on baileys' EventEmitter, so messages get
257
+ // processed N times after N reconnects and closures leak.
258
+ if (this.sock) {
259
+ try {
260
+ this.sock.ev?.removeAllListeners?.();
261
+ this.sock.end?.(new Error("reconnect"));
262
+ }
263
+ catch {
264
+ // best-effort cleanup — ignore failures from already-dead socket
265
+ }
266
+ this.sock = null;
267
+ }
255
268
  const sock = makeWASocket({
256
269
  version,
257
270
  auth: {
@@ -62,6 +62,28 @@ function getMissingFileFailureMs() {
62
62
  const pending = new Map();
63
63
  let pollTimer = null;
64
64
  let started = false;
65
+ /**
66
+ * Hard cap on the pending-agents map. Without this, a bot that runs many
67
+ * async agents but sees some fail to write their outputFile would see
68
+ * entries linger up to `giveUpAt` (12h default). If the rate of
69
+ * registerPending() outpaces resolutions for days, memory and the disk
70
+ * state file grow unbounded. We evict oldest-first when over the cap.
71
+ */
72
+ const MAX_PENDING_AGENTS = 500;
73
+ function enforcePendingCap() {
74
+ if (pending.size < MAX_PENDING_AGENTS)
75
+ return;
76
+ const entries = [...pending.entries()].sort((a, b) => a[1].startedAt - b[1].startedAt);
77
+ const target = Math.floor(MAX_PENDING_AGENTS * 0.9);
78
+ let toEvict = pending.size - target;
79
+ for (const [id] of entries) {
80
+ if (toEvict <= 0)
81
+ break;
82
+ pending.delete(id);
83
+ toEvict--;
84
+ }
85
+ console.warn(`[async-agent-watcher] pending map hit cap ${MAX_PENDING_AGENTS}, evicted to ${pending.size}`);
86
+ }
65
87
  // ── Persistence ───────────────────────────────────────────────────
66
88
  function loadFromDisk() {
67
89
  try {
@@ -110,6 +132,7 @@ export function registerPendingAgent(input) {
110
132
  sessionKey: input.sessionKey,
111
133
  platform: input.platform,
112
134
  };
135
+ enforcePendingCap();
113
136
  pending.set(input.agentId, entry);
114
137
  saveToDisk();
115
138
  }
@@ -233,6 +233,17 @@ async function ensureGateway() {
233
233
  gatewayProcess.on("exit", () => {
234
234
  gatewayProcess = null;
235
235
  });
236
+ // Surface spawn failures so we don't silently think the gateway is running.
237
+ gatewayProcess.on("error", (err) => {
238
+ log(`gateway spawn error: ${err.message}`);
239
+ gatewayProcess = null;
240
+ });
241
+ // Drain stdio pipes — otherwise stdout/stderr buffer fills and the child
242
+ // blocks on write. We don't care about the content (just that they drain).
243
+ gatewayProcess.stdout?.on("error", () => { });
244
+ gatewayProcess.stderr?.on("error", () => { });
245
+ gatewayProcess.stdout?.resume();
246
+ gatewayProcess.stderr?.resume();
236
247
  // Wait for startup (max 10s)
237
248
  for (let i = 0; i < 20; i++) {
238
249
  await new Promise((r) => setTimeout(r, 500));
@@ -196,6 +196,12 @@ export async function ensureRunning(opts = {}) {
196
196
  detached: true,
197
197
  });
198
198
  child.unref();
199
+ // The child inherits its own copy of the fd. Close our copy so the parent
200
+ // process doesn't leak a file descriptor per Chromium bootstrap.
201
+ try {
202
+ fs.closeSync(logStream);
203
+ }
204
+ catch { /* already closed — fine */ }
199
205
  if (!child.pid) {
200
206
  throw new Error("Failed to spawn Chromium (no PID)");
201
207
  }
@@ -0,0 +1,203 @@
1
+ /**
2
+ * Disk Cleanup Service — periodic cleanup of transient bot files.
3
+ *
4
+ * Targets files that are SAFE to delete (logs, temp screenshots, browser
5
+ * artifacts, old subagent streams) and leaves critical data alone
6
+ * (memory, assets, workspaces, cron-jobs, .env, session-store).
7
+ *
8
+ * Strategy:
9
+ * - Each path has a max age (days) OR a max size (MB, with rotation)
10
+ * - Defaults are conservative: keep 30 days of artifacts, rotate logs >100MB
11
+ * - All knobs overridable via env (CLEANUP_* vars) and via /cleanup set <key>
12
+ * - Runs once at boot + every 24h thereafter, unref'd so it doesn't
13
+ * prevent shutdown
14
+ *
15
+ * NEVER cleaned:
16
+ * ~/.alvin-bot/memory/ (daily logs, long-term memory)
17
+ * ~/.alvin-bot/assets/ (user-supplied files)
18
+ * ~/.alvin-bot/workspaces/ (user configuration)
19
+ * ~/.alvin-bot/cron-jobs.json (scheduled tasks)
20
+ * ~/.alvin-bot/.env (secrets)
21
+ * ~/.alvin-bot/session-store.json (resume tokens)
22
+ * ~/.alvin-bot/delivery-queue.json
23
+ * ~/.alvin-bot/standing-orders
24
+ * ~/.alvin-bot/auto-update.flag
25
+ */
26
+ import fs from "fs";
27
+ import path from "path";
28
+ import os from "os";
29
+ import { DATA_DIR } from "../paths.js";
30
+ const DEFAULT_POLICY = {
31
+ logMaxSizeMb: parseInt(process.env.CLEANUP_LOG_MAX_MB || "100", 10),
32
+ screenshotsMaxAgeDays: parseInt(process.env.CLEANUP_SCREENSHOTS_DAYS || "30", 10),
33
+ subagentsMaxAgeDays: parseInt(process.env.CLEANUP_SUBAGENTS_DAYS || "30", 10),
34
+ tmpMaxAgeDays: parseInt(process.env.CLEANUP_TMP_DAYS || "7", 10),
35
+ waMediaMaxAgeDays: parseInt(process.env.CLEANUP_WA_MEDIA_DAYS || "30", 10),
36
+ };
37
+ const CLEANUP_INTERVAL_MS = 24 * 60 * 60 * 1000; // once a day
38
+ let cleanupTimer = null;
39
+ /**
40
+ * Return the current effective policy (env-overridden defaults).
41
+ */
42
+ export function getCleanupPolicy() {
43
+ return { ...DEFAULT_POLICY };
44
+ }
45
+ /**
46
+ * Run a cleanup pass once. Safe to call manually (e.g. /cleanup command).
47
+ */
48
+ export async function runCleanup(policyOverride) {
49
+ const policy = { ...DEFAULT_POLICY, ...policyOverride };
50
+ const result = {
51
+ filesDeleted: 0,
52
+ bytesReclaimed: 0,
53
+ logsRotated: 0,
54
+ errors: [],
55
+ details: [],
56
+ };
57
+ // 1. Rotate large log files (launchd stdout/stderr)
58
+ if (policy.logMaxSizeMb > 0) {
59
+ const logsDir = path.join(DATA_DIR, "logs");
60
+ try {
61
+ if (fs.existsSync(logsDir)) {
62
+ for (const name of fs.readdirSync(logsDir)) {
63
+ if (!name.endsWith(".log"))
64
+ continue;
65
+ const full = path.join(logsDir, name);
66
+ try {
67
+ const st = fs.statSync(full);
68
+ if (st.size > policy.logMaxSizeMb * 1024 * 1024) {
69
+ // Rotate: keep a .old, overwrite current. Launchd will reopen on next write.
70
+ const oldPath = full + ".old";
71
+ try {
72
+ fs.rmSync(oldPath, { force: true });
73
+ }
74
+ catch { }
75
+ fs.renameSync(full, oldPath);
76
+ fs.writeFileSync(full, "");
77
+ result.logsRotated++;
78
+ result.bytesReclaimed += st.size;
79
+ result.details.push({ path: full, action: "rotated", size: st.size });
80
+ }
81
+ }
82
+ catch (err) {
83
+ result.errors.push(`log-rotate ${full}: ${err.message}`);
84
+ }
85
+ }
86
+ }
87
+ }
88
+ catch (err) {
89
+ result.errors.push(`logs scan: ${err.message}`);
90
+ }
91
+ }
92
+ // 2. Browser screenshots (bot-owned CDP)
93
+ if (policy.screenshotsMaxAgeDays > 0) {
94
+ const dir = path.join(DATA_DIR, "browser", "screenshots");
95
+ cleanupOldFiles(dir, policy.screenshotsMaxAgeDays, result);
96
+ }
97
+ // 3. Subagent streaming outputs — only delete FINISHED ones (older than N days).
98
+ // We trust that the async-agent-watcher has already marked them done — files
99
+ // older than a few days are either delivered or definitively abandoned.
100
+ if (policy.subagentsMaxAgeDays > 0) {
101
+ const dir = path.join(DATA_DIR, "subagents");
102
+ cleanupOldFiles(dir, policy.subagentsMaxAgeDays, result, [".jsonl", ".err"]);
103
+ }
104
+ // 4. /tmp/alvin-bot/* (media, temp scrapes)
105
+ if (policy.tmpMaxAgeDays > 0) {
106
+ cleanupOldFiles("/tmp/alvin-bot", policy.tmpMaxAgeDays, result);
107
+ }
108
+ // 5. WhatsApp media cache
109
+ if (policy.waMediaMaxAgeDays > 0) {
110
+ const dir = path.join(DATA_DIR, "data", "wa-media");
111
+ cleanupOldFiles(dir, policy.waMediaMaxAgeDays, result);
112
+ }
113
+ // 6. CDP log (/tmp/chrome-cdp.log) — always keep just the latest boot
114
+ const cdpLog = path.join(os.tmpdir(), "chrome-cdp.log");
115
+ try {
116
+ if (fs.existsSync(cdpLog)) {
117
+ const st = fs.statSync(cdpLog);
118
+ const ageDays = (Date.now() - st.mtimeMs) / (24 * 60 * 60 * 1000);
119
+ if (ageDays > 7) {
120
+ fs.unlinkSync(cdpLog);
121
+ result.filesDeleted++;
122
+ result.bytesReclaimed += st.size;
123
+ result.details.push({ path: cdpLog, action: "deleted", size: st.size });
124
+ }
125
+ }
126
+ }
127
+ catch {
128
+ // Not critical
129
+ }
130
+ return result;
131
+ }
132
+ /**
133
+ * Delete files in `dir` older than `maxAgeDays`. Safe if `dir` doesn't exist.
134
+ * Optional extension filter — e.g. [".jsonl", ".err"] restricts to those types.
135
+ */
136
+ function cleanupOldFiles(dir, maxAgeDays, result, extensions) {
137
+ if (!fs.existsSync(dir))
138
+ return;
139
+ const cutoffMs = Date.now() - maxAgeDays * 24 * 60 * 60 * 1000;
140
+ try {
141
+ for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
142
+ const full = path.join(dir, entry.name);
143
+ if (!entry.isFile())
144
+ continue;
145
+ if (extensions && !extensions.some((ext) => entry.name.endsWith(ext)))
146
+ continue;
147
+ try {
148
+ const st = fs.statSync(full);
149
+ if (st.mtimeMs < cutoffMs) {
150
+ fs.unlinkSync(full);
151
+ result.filesDeleted++;
152
+ result.bytesReclaimed += st.size;
153
+ result.details.push({ path: full, action: "deleted", size: st.size });
154
+ }
155
+ }
156
+ catch (err) {
157
+ result.errors.push(`${full}: ${err.message}`);
158
+ }
159
+ }
160
+ }
161
+ catch (err) {
162
+ result.errors.push(`scan ${dir}: ${err.message}`);
163
+ }
164
+ }
165
+ /**
166
+ * Start the periodic cleanup loop. Runs first pass after 5 minutes (let the
167
+ * bot fully boot and avoid competing with startup I/O), then every 24h.
168
+ */
169
+ export function startCleanupLoop() {
170
+ if (cleanupTimer)
171
+ return;
172
+ // First run delayed so we don't step on a restart that's still writing logs
173
+ setTimeout(() => {
174
+ void runCleanup().then((r) => {
175
+ if (r.filesDeleted > 0 || r.logsRotated > 0) {
176
+ console.log(`[cleanup] ${r.filesDeleted} files deleted, ${r.logsRotated} logs rotated, ${formatBytes(r.bytesReclaimed)} reclaimed`);
177
+ }
178
+ });
179
+ }, 5 * 60 * 1000);
180
+ cleanupTimer = setInterval(() => {
181
+ void runCleanup().then((r) => {
182
+ if (r.filesDeleted > 0 || r.logsRotated > 0) {
183
+ console.log(`[cleanup] ${r.filesDeleted} files deleted, ${r.logsRotated} logs rotated, ${formatBytes(r.bytesReclaimed)} reclaimed`);
184
+ }
185
+ });
186
+ }, CLEANUP_INTERVAL_MS);
187
+ cleanupTimer.unref?.();
188
+ }
189
+ export function stopCleanupLoop() {
190
+ if (cleanupTimer) {
191
+ clearInterval(cleanupTimer);
192
+ cleanupTimer = null;
193
+ }
194
+ }
195
+ function formatBytes(n) {
196
+ if (n < 1024)
197
+ return `${n} B`;
198
+ if (n < 1024 * 1024)
199
+ return `${(n / 1024).toFixed(1)} KB`;
200
+ if (n < 1024 * 1024 * 1024)
201
+ return `${(n / 1024 / 1024).toFixed(1)} MB`;
202
+ return `${(n / 1024 / 1024 / 1024).toFixed(2)} GB`;
203
+ }
@@ -143,12 +143,26 @@ function chunkMarkdown(content, source) {
143
143
  return chunks;
144
144
  }
145
145
  // ── Index Management ────────────────────────────────────
146
+ // In-memory cache for the embedding index. Without this, every query would
147
+ // re-read and re-parse the on-disk index (can be 100+ MB, making searchMemory
148
+ // the slowest step in a message turn). We keep the parsed object and invalidate
149
+ // via mtime check — so external reindexers are still picked up.
150
+ let indexCache = null;
151
+ let indexCacheMtime = 0;
146
152
  function loadIndex() {
147
153
  try {
154
+ const st = fs.statSync(INDEX_FILE);
155
+ if (indexCache && st.mtimeMs === indexCacheMtime) {
156
+ return indexCache;
157
+ }
148
158
  const raw = fs.readFileSync(INDEX_FILE, "utf-8");
149
- return JSON.parse(raw);
159
+ indexCache = JSON.parse(raw);
160
+ indexCacheMtime = st.mtimeMs;
161
+ return indexCache;
150
162
  }
151
163
  catch {
164
+ // File missing or unparseable — return an empty index and don't cache it
165
+ // (next call will retry, so a freshly-written index gets picked up).
152
166
  return {
153
167
  model: EMBEDDING_MODEL,
154
168
  lastReindex: 0,
@@ -159,6 +173,15 @@ function loadIndex() {
159
173
  }
160
174
  function saveIndex(index) {
161
175
  fs.writeFileSync(INDEX_FILE, JSON.stringify(index));
176
+ // Refresh cache immediately so the next loadIndex() sees the new state
177
+ // without a disk round-trip.
178
+ indexCache = index;
179
+ try {
180
+ indexCacheMtime = fs.statSync(INDEX_FILE).mtimeMs;
181
+ }
182
+ catch {
183
+ indexCacheMtime = Date.now();
184
+ }
162
185
  }
163
186
  /**
164
187
  * Recursively walk a directory, returning file paths.
@@ -72,6 +72,10 @@ export function startHeartbeat() {
72
72
  setTimeout(() => {
73
73
  runHeartbeat();
74
74
  state.intervalId = setInterval(runHeartbeat, HEARTBEAT_INTERVAL_MS);
75
+ // .unref() so this interval alone doesn't keep the process alive during
76
+ // graceful shutdown — the bot's main loop (grammy, platforms) keeps it
77
+ // running, and once those stop we want the process to exit cleanly.
78
+ state.intervalId?.unref?.();
75
79
  }, 30_000);
76
80
  }
77
81
  /**
@@ -116,6 +116,17 @@ async function connectStdio(name, config) {
116
116
  proc.stderr.on("data", (data) => {
117
117
  console.error(`MCP ${name} stderr:`, data.toString().trim());
118
118
  });
119
+ // Surface stderr stream errors so we don't silently lose the channel
120
+ // (EPIPE, ECONNRESET etc). Without this, unhandled 'error' on the
121
+ // stream would crash the whole Node process.
122
+ proc.stderr.on("error", (err) => {
123
+ console.error(`MCP ${name} stderr stream error:`, err.message);
124
+ server.connected = false;
125
+ });
126
+ proc.stdout?.on("error", (err) => {
127
+ console.error(`MCP ${name} stdout stream error:`, err.message);
128
+ server.connected = false;
129
+ });
119
130
  proc.on("error", (err) => {
120
131
  console.error(`MCP ${name} process error:`, err);
121
132
  server.connected = false;
@@ -167,10 +167,12 @@ export function loadSkills() {
167
167
  return cachedSkills;
168
168
  }
169
169
  /**
170
- * Get all loaded skills.
170
+ * Get all loaded skills. Cached after the first loadSkills() call; hot-reload
171
+ * happens via fs.watch when files change on disk. We only force a scan here if
172
+ * the cache is empty (init-order edge case).
171
173
  */
172
174
  export function getSkills() {
173
- if (cachedSkills.length === 0 || Date.now() - lastScanAt > 300_000) {
175
+ if (cachedSkills.length === 0) {
174
176
  reloadAllSkills();
175
177
  }
176
178
  return cachedSkills;
@@ -128,6 +128,43 @@ export function setDefaultTimeoutMs(ms) {
128
128
  }
129
129
  // ── State ───────────────────────────────────────────────
130
130
  const activeAgents = new Map();
131
+ /**
132
+ * Hard cap on the activeAgents map. Without this, a long-running bot that
133
+ * spawns many agents (e.g. a chatty cron + manual triggers over months) would
134
+ * accumulate delivered entries indefinitely. The 30-min auto-cleanup inside
135
+ * runSubAgent only fires on graceful completion, so crashed/orphaned entries
136
+ * would linger until the 12h giveUpAt ceiling.
137
+ *
138
+ * Enforcement: whenever we insert a new entry and the map is at-or-over the
139
+ * cap, evict the oldest finished-and-delivered entries first. Running agents
140
+ * are never evicted.
141
+ */
142
+ const MAX_ACTIVE_AGENTS = 1000;
143
+ function enforceAgentCap() {
144
+ if (activeAgents.size < MAX_ACTIVE_AGENTS)
145
+ return;
146
+ // Collect evictable entries (delivered OR terminal status), sort by startedAt
147
+ const evictable = [];
148
+ for (const [id, entry] of activeAgents) {
149
+ const status = entry.info.status;
150
+ const done = entry.delivered || status === "error" || status === "timeout" || status === "cancelled";
151
+ if (done)
152
+ evictable.push([id, entry.info.startedAt]);
153
+ }
154
+ evictable.sort((a, b) => a[1] - b[1]);
155
+ // Evict enough to land 10% below the cap, so we don't oscillate.
156
+ const target = Math.floor(MAX_ACTIVE_AGENTS * 0.9);
157
+ let toEvict = activeAgents.size - target;
158
+ for (const [id] of evictable) {
159
+ if (toEvict <= 0)
160
+ break;
161
+ activeAgents.delete(id);
162
+ toEvict--;
163
+ }
164
+ if (toEvict > 0) {
165
+ console.warn(`[subagents] map at ${activeAgents.size}/${MAX_ACTIVE_AGENTS} — could not evict enough finished entries (too many still running)`);
166
+ }
167
+ }
131
168
  // ── Name resolver (B2) ──────────────────────────────────
132
169
  /**
133
170
  * Return all currently-tracked agents whose *base* name matches `base`.
@@ -563,6 +600,7 @@ export function spawnSubAgent(agentConfig) {
563
600
  nameIndex: resolved.index,
564
601
  queuePosition: willRunImmediately ? undefined : queuedLen + 1,
565
602
  };
603
+ enforceAgentCap();
566
604
  activeAgents.set(id, { info, abort, delivered: false });
567
605
  const queuedSpawn = { id, resolvedName, agentConfig, depth, timeoutId };
568
606
  if (willRunImmediately) {
@@ -272,6 +272,7 @@ export function startAutoUpdateLoop() {
272
272
  console.log(`[auto-update] check failed: ${result.message}`);
273
273
  }
274
274
  }, AUTO_CHECK_INTERVAL_MS);
275
+ autoTimer.unref?.();
275
276
  console.log(`[auto-update] loop started (interval: 6h)`);
276
277
  }
277
278
  export function stopAutoUpdateLoop() {
@@ -8,6 +8,12 @@
8
8
  *
9
9
  * The admin/owner user uses the global docs/memory/ and docs/MEMORY.md.
10
10
  * Additional users get isolated memory spaces.
11
+ *
12
+ * Performance:
13
+ * Profiles are cached in memory after first read. `touchProfile` — called
14
+ * on every inbound message — writes to cache and schedules a debounced
15
+ * disk flush (2s). This avoids two sync fs operations per message on the
16
+ * hot path. A final flush happens on graceful shutdown so nothing is lost.
11
17
  */
12
18
  import fs from "fs";
13
19
  import { resolve } from "path";
@@ -18,6 +24,42 @@ import { USERS_DIR, MEMORY_DIR } from "../paths.js";
18
24
  // Ensure users dir exists
19
25
  if (!fs.existsSync(USERS_DIR))
20
26
  fs.mkdirSync(USERS_DIR, { recursive: true });
27
+ // ── In-memory cache + debounced persistence ─────────────
28
+ const cache = new Map();
29
+ const dirty = new Set();
30
+ let flushTimer = null;
31
+ const FLUSH_DELAY_MS = 2000;
32
+ function schedule_flush() {
33
+ if (flushTimer)
34
+ return;
35
+ flushTimer = setTimeout(() => {
36
+ flushTimer = null;
37
+ flushProfiles();
38
+ }, FLUSH_DELAY_MS);
39
+ flushTimer.unref?.();
40
+ }
41
+ /**
42
+ * Write every dirty profile to disk synchronously. Called by the debounce
43
+ * timer AND by the graceful-shutdown handler so no in-flight updates are
44
+ * lost even if the bot exits between debounce ticks.
45
+ */
46
+ export function flushProfiles() {
47
+ if (dirty.size === 0)
48
+ return;
49
+ for (const userId of dirty) {
50
+ const profile = cache.get(userId);
51
+ if (!profile)
52
+ continue;
53
+ try {
54
+ fs.writeFileSync(profilePath(userId), JSON.stringify(profile, null, 2));
55
+ }
56
+ catch (err) {
57
+ // Don't throw — a persistent error would block future flushes.
58
+ console.warn(`[users] flush ${userId} failed: ${err.message}`);
59
+ }
60
+ }
61
+ dirty.clear();
62
+ }
21
63
  // ── Profile Management ──────────────────────────────────
22
64
  function profilePath(userId) {
23
65
  return resolve(USERS_DIR, `${userId}.json`);
@@ -26,22 +68,32 @@ function userMemoryDir(userId) {
26
68
  return resolve(USERS_DIR, `${userId}`);
27
69
  }
28
70
  /**
29
- * Load a user profile. Returns null if not found.
71
+ * Load a user profile. Returns null if not found. Reads from cache first,
72
+ * falls back to disk on cache miss.
30
73
  */
31
74
  export function loadProfile(userId) {
75
+ const cached = cache.get(userId);
76
+ if (cached)
77
+ return cached;
32
78
  try {
33
79
  const raw = fs.readFileSync(profilePath(userId), "utf-8");
34
- return JSON.parse(raw);
80
+ const profile = JSON.parse(raw);
81
+ cache.set(userId, profile);
82
+ return profile;
35
83
  }
36
84
  catch {
37
85
  return null;
38
86
  }
39
87
  }
40
88
  /**
41
- * Save a user profile.
89
+ * Save a user profile — updates cache and schedules a debounced disk flush.
90
+ * For immediate durability (e.g. during shutdown), call flushProfiles()
91
+ * after this.
42
92
  */
43
93
  export function saveProfile(profile) {
44
- fs.writeFileSync(profilePath(profile.userId), JSON.stringify(profile, null, 2));
94
+ cache.set(profile.userId, profile);
95
+ dirty.add(profile.userId);
96
+ schedule_flush();
45
97
  }
46
98
  /**
47
99
  * Get or create a user profile.
@@ -76,6 +128,9 @@ export function getOrCreateProfile(userId, name, username) {
76
128
  }
77
129
  /**
78
130
  * Update a user's activity (call on each message).
131
+ *
132
+ * Previously this did a sync read + write per message. Now it works purely
133
+ * in memory and lets the debounce timer batch writes to disk.
79
134
  */
80
135
  export function touchProfile(userId, name, username, platform, messageText) {
81
136
  const profile = getOrCreateProfile(userId, name, username);
@@ -95,20 +150,33 @@ export function touchProfile(userId, name, username, platform, messageText) {
95
150
  return profile;
96
151
  }
97
152
  /**
98
- * List all known user profiles.
153
+ * List all known user profiles. Reads from disk; populates cache for
154
+ * subsequent fast access.
99
155
  */
100
156
  export function listProfiles() {
101
157
  const profiles = [];
102
158
  try {
103
159
  const files = fs.readdirSync(USERS_DIR);
104
160
  for (const file of files) {
105
- if (file.endsWith(".json")) {
106
- try {
107
- const raw = fs.readFileSync(resolve(USERS_DIR, file), "utf-8");
108
- profiles.push(JSON.parse(raw));
109
- }
110
- catch { /* skip corrupt */ }
161
+ if (!file.endsWith(".json"))
162
+ continue;
163
+ // Parse user id from filename — skip non-numeric (e.g. stray files)
164
+ const userId = parseInt(file.slice(0, -5), 10);
165
+ if (!Number.isFinite(userId))
166
+ continue;
167
+ // If cached, use that; otherwise read once and cache
168
+ const cached = cache.get(userId);
169
+ if (cached) {
170
+ profiles.push(cached);
171
+ continue;
172
+ }
173
+ try {
174
+ const raw = fs.readFileSync(resolve(USERS_DIR, file), "utf-8");
175
+ const p = JSON.parse(raw);
176
+ cache.set(userId, p);
177
+ profiles.push(p);
111
178
  }
179
+ catch { /* skip corrupt */ }
112
180
  }
113
181
  }
114
182
  catch { /* dir doesn't exist */ }
@@ -145,6 +213,9 @@ export function addUserNote(userId, note) {
145
213
  export function deleteUser(userId) {
146
214
  const deleted = [];
147
215
  const errors = [];
216
+ // 0. Drop from cache + dirty set so the debounce doesn't re-create the file
217
+ cache.delete(userId);
218
+ dirty.delete(userId);
148
219
  // 1. Delete profile JSON
149
220
  const pPath = profilePath(userId);
150
221
  try {
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "alvin-bot",
3
- "version": "4.16.1",
3
+ "version": "4.18.0",
4
4
  "description": "Alvin Bot \u2014 Your personal AI agent on Telegram, WhatsApp, Discord, Signal, and Web.",
5
5
  "type": "module",
6
6
  "main": "dist/index.js",