npm - alvin-bot - Versions diffs - 4.9.4 → 4.11.0 - Mend

alvin-bot 4.9.4 → 4.11.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (27) hide show

package/CHANGELOG.md +154 -0
package/dist/handlers/async-agent-chunk-handler.js +33 -0
package/dist/handlers/commands.js +6 -1
package/dist/handlers/message.js +44 -11
package/dist/handlers/platform-message.js +4 -4
package/dist/index.js +20 -1
package/dist/paths.js +18 -1
package/dist/providers/claude-sdk-provider.js +43 -0
package/dist/services/async-agent-parser.js +152 -0
package/dist/services/async-agent-watcher.js +206 -0
package/dist/services/compaction.js +13 -0
package/dist/services/memory-extractor.js +178 -0
package/dist/services/memory-layers.js +147 -0
package/dist/services/memory.js +15 -8
package/dist/services/personality.js +100 -18
package/dist/services/session-persistence.js +159 -0
package/dist/services/session.js +30 -0
package/package.json +2 -2
package/test/async-agent-chunk-flow.test.ts +131 -0
package/test/async-agent-parser.test.ts +322 -0
package/test/async-agent-watcher.test.ts +229 -0
package/test/memory-extractor.test.ts +151 -0
package/test/memory-layers.test.ts +169 -0
package/test/memory-sdk-injection.test.ts +146 -0
package/test/memory-stress-restart.test.ts +336 -0
package/test/session-persistence.test.ts +192 -0
package/test/system-prompt-background-hint.test.ts +48 -0

package/CHANGELOG.md CHANGED Viewed

@@ -2,6 +2,160 @@
 All notable changes to Alvin Bot are documented here.
+## [4.11.0] — 2026-04-13
+### 🧠 Memory Persistence + Smart Loading — sessions survive restart, memory is layered
+A colleague asked the same day v4.10.0 shipped: *"Memory after session restart is also a bit fiddly. I installed mempalace as a workaround — maybe build something like that natively."* He was right. Alvin had a hand-curated `MEMORY.md`, a 128 MB embeddings vector index, and an AI-powered compaction service — but **the in-memory `sessions Map` was wiped on every bot restart**. Claude SDK then started a fresh conversation on the next user message, behaving like a goldfish despite all that memory infrastructure on disk.
+This release fixes that with **five complementary tasks**, all bundled into v4.11.0. Three core fixes (P0) plus two structural improvements (P1) inspired by mempalace's L0–L3 stack and Mem0's auto-extraction pattern.
+#### P0 #1 — Session Persistence (`src/services/session-persistence.ts`, NEW)
+The core fix. The `sessions Map` in `src/services/session.ts` was in-memory only; every `launchctl kickstart` wiped every user's `sessionId`, history, language, effort, voiceReply, and tracking counters.
+- **Debounced flush** (1.5 s coalesce window) writes a sanitized snapshot of `getAllSessions()` to `~/.alvin-bot/state/sessions.json` via atomic tmp+rename.
+- **`loadPersistedSessions()`** rehydrates the Map at bot startup; `flushSessions()` flushes synchronously on graceful shutdown (SIGINT/SIGTERM).
+- **`attachPersistHook()` / `markSessionDirty()`** in `session.ts` give handlers a callback to trigger persist after direct mutations (`/lang`, `/effort`, `/voice`). `addToHistory()` and `trackProviderUsage()` trigger it automatically.
+- History is capped at `MAX_PERSISTED_HISTORY = 50` per session so the file stays small.
+- Runtime-only fields (`abortController`, `isProcessing`, `messageQueue`) are stripped before persisting.
+- Schema drift is handled: missing fields fall back to defaults; corrupt JSON loads zero sessions; null root rejected gracefully.
+- **9 unit tests** + **18 stress tests** covering 100-session burst, 1000-mutate debounce coalescing, unicode (RTL/ZWJ/astral plane), atomic write recovery from stale `.tmp`, schema drift, hostile JSON, read-only filesystem, simulated bot restart.
+#### P0 #2 — MEMORY.md Auto-Inject for SDK (`src/services/personality.ts`)
+Before v4.11.0, only non-SDK providers (Groq, Gemini, NVIDIA) got `buildMemoryContext()` injected into their system prompt. The Claude SDK was *expected* to read memory files via tools, but in practice rarely did unless the user's first message specifically prompted it.
+- Drops the `!isSDK` guard around `buildMemoryContext()` and asset-index injection.
+- SDK now gets the same compact memory context (MEMORY.md + today + yesterday daily logs) at every turn — the same context non-SDK providers had since 4.0.
+- **3 unit tests** verifying SDK includes the memory section, non-SDK regression, and graceful behavior when MEMORY.md is missing.
+#### P0 #3 — Semantic Recall on SDK First Turn (`src/services/personality.ts`, `src/handlers/message.ts`, `src/handlers/platform-message.ts`)
+`buildSmartSystemPrompt()` now accepts an `isFirstTurn` flag. For SDK providers it runs the embeddings-based `searchMemory()` only on the first turn (`session.sessionId === null` — meaning Claude hasn't given us a resume token yet for this session). After the first turn Claude carries the recalled context inside the SDK session via resume, so spamming the embeddings API on every subsequent turn is wasted work. Non-SDK providers still run the search on every turn (no resume mechanism).
+- `handlers/message.ts` and `handlers/platform-message.ts` updated to compute `isFirstSDKTurn = isSDK && session.sessionId === null` and pass it through.
+- The bare `buildSystemPrompt` calls on the SDK paths are gone — `buildSmartSystemPrompt` is the single entry point.
+- **5 mocked-search tests** covering call-count semantics for SDK first/later turns, non-SDK every turn, missing `userMessage` skip, and graceful failure when `searchMemory` throws.
+#### P1 #4 — Layered Memory Loader (`src/services/memory-layers.ts`, NEW)
+Inspired by mempalace's L0–L3 stack. Replaces the monolithic `MEMORY.md → System Prompt` injection with a structured, token-budgeted layered loader:
+- **L0** `~/.alvin-bot/memory/identity.md` — always loaded, ~200 tokens (core user facts: name, location, family, contact)
+- **L1** `~/.alvin-bot/memory/preferences.md` — always loaded (communication style, do's and don'ts)
+- **L1** `~/.alvin-bot/memory/MEMORY.md` — backwards-compat: existing curated knowledge (full content if no split files exist; truncated to 1500 chars when split files coexist)
+- **L2** `~/.alvin-bot/memory/projects/*.md` — loaded only when the user's incoming query mentions the project topic (substring or first-200-char keyword overlap)
+- **L3** daily logs — still handled by `embeddings.ts` vector search (unchanged)
+The split is **opt-in**: if `identity.md` and `preferences.md` don't exist, the loader falls back to monolithic MEMORY.md exactly like before. No migration required for existing users. Users who want the cleaner layout can split MEMORY.md manually and the loader picks it up automatically. Token budget: L0+L1 capped at 5000 chars (~1300 tokens), L2 capped at 3000 chars total (~750 tokens, max 1500 per matched project file). New `query` parameter on `buildSystemPrompt()` and `buildMemoryContext()` propagates the user message all the way through. **9 unit tests** + 2 layered-context stress tests.
+#### P1 #5 — Auto-Fact-Extraction in Compaction (`src/services/memory-extractor.ts`, NEW)
+Inspired by Mem0's auto-extraction. When `compactSession()` archives old messages, it now runs an additional extraction pass that pulls structured facts (`user_facts`, `preferences`, `decisions`) out of the archived chunk via the active AI provider and appends them to MEMORY.md.
+- **`parseExtractedFacts(text)`** — tolerates JSON wrapped in markdown code fences, surrounding prose, null/undefined fields, non-string entries.
+- **`appendFactsToMemoryFile(facts)`** — exact-string dedup against existing MEMORY.md content, structured under `## Auto-extracted (YYYY-MM-DD)` header with `### User Facts` / `### Preferences` / `### Decisions` sub-sections.
+- **`extractAndStoreFacts(chunk)`** — safe wrapper, never throws. Opt-out via `MEMORY_EXTRACTION_DISABLED=1` env var. Uses effort=low for cost minimization. Skips short input (<50 chars). Provider failures are swallowed; compaction always continues.
+- Wired into `compactSession()` after the daily-log flush, before the AI summary generation.
+- Marked **experimental** in v4.11.0. Semantic dedup (vs current exact-string match) deferred to v4.12+.
+- **11 unit tests** covering JSON parsing edge cases, dedup, opt-out, short-input skip, garbage input, non-string filtering, graceful provider-failure handling.
+#### Architecture decisions
+- **mempalace as MCP server: rejected.** Considered installing mempalace as a Python MCP service. Rejected because (1) Alvin is all-TypeScript and adding a 2nd Python service to launchd is operational complexity, (2) Alvin already has an embeddings vector index — mempalace would be a parallel duplicate, (3) mempalace's MCP tools are only consumed by the SDK; cron jobs, sub-agents, and non-SDK providers wouldn't see them. Conclusion: **adopt the patterns natively** (L0–L3 layering, AAAK-style structured extraction) rather than running a second service.
+- **SQLite migration deferred.** The 128 MB JSON embeddings index is a known performance issue and is already noted in `~/.claude/projects/-Users-alvin-de/memory/project_alvinbot_sqlite_migration.md` for v4.12+. Orthogonal to the "frickelig nach Restart" UX problem this release targets.
+- **Multi-user isolation deferred.** Memories are still global per data dir. Single-user use case, not a privacy concern for Ali's setup.
+- **Decay/aging deferred.** Daily logs grow monotonically. Will be addressed alongside SQLite migration.
+#### Testing
+**292 tests total** (237 baseline + 55 new). All green. TSC clean.
+- 9 session-persistence unit tests
+- 8 SDK memory-injection tests (3 base + 5 smart-prompt mocked-search)
+- 9 memory-layers tests (loader + topic match + token budget)
+- 11 memory-extractor tests (parse + append + extract pipeline)
+- 18 stress tests (100 sessions, schema drift, unicode, atomic recovery, hostile JSON, simulated restart)
+**Live verification:**
+- `tmp/live-stress-memory.mjs` — 50 fake sessions against the built `dist/`, real ~/.alvin-bot/memory/MEMORY.md as the L1 source, simulated restart via Map clear + reload. Result: 215 KB state file, 1 ms flush, 1 ms reload, 50/50 perfect round-trip.
+- `tmp/live-edge-cases.mjs` — 7 hostile scenarios: all-null fields, 1000-burst debounce (2 ms), 20 concurrent flushes, extreme unicode (RTL + ZWJ + astral plane), 4-layer memory with project topic match, atomic write recovery from stale .tmp, empty project file skipping. All passed.
+#### Files changed
+- **NEW:** `src/services/session-persistence.ts`, `src/services/memory-layers.ts`, `src/services/memory-extractor.ts`
+- **NEW tests:** `test/session-persistence.test.ts`, `test/memory-sdk-injection.test.ts`, `test/memory-layers.test.ts`, `test/memory-extractor.test.ts`, `test/memory-stress-restart.test.ts`
+- **Modified:** `src/services/session.ts` (persist hook), `src/services/personality.ts` (SDK injection + isFirstTurn), `src/services/memory.ts` (use layered loader), `src/services/compaction.ts` (extractor hook), `src/handlers/message.ts` + `src/handlers/platform-message.ts` (smart prompt wiring), `src/handlers/commands.ts` (`markSessionDirty` calls), `src/index.ts` (load + flush wiring), `src/paths.ts` (4 new constants)
+- **Plan:** `docs/superpowers/plans/2026-04-13-memory-persistence.md`
+---
+## [4.10.0] — 2026-04-13
+### 🚀 Async sub-agents — main session no longer blocks during long tasks
+The big architecture upgrade: Claude can now delegate long-running work (SEO audits, multi-page research, full-repo analyses) to **background** sub-agents. The main Telegram session ends quickly, the user can keep chatting, and the sub-agent's final report arrives as a separate message when ready.
+A colleague flagged the underlying problem on 2026-04-13 via WhatsApp voice note: *"It's weird that the main routine crashes when the sub-agents are still running. It should just run in the background, and that should have zero impact on the main routine."* He was right. OpenClaw had this years ago because back then the SDK didn't support async; today's `@anthropic-ai/claude-agent-sdk@0.2.97` already ships `run_in_background: true` on the Agent tool — Alvin just wasn't using it.
+This release closes that gap in two complementary stages, both bundled into the same v4.10.0:
+#### Stage 1 — System prompt teaches Claude when to use `run_in_background`
+- New `BACKGROUND_SUBAGENT_HINT` constant in `src/services/personality.ts`, injected only into SDK sessions (non-SDK providers don't have an Agent tool).
+- The hint tells Claude: for audits / multi-page research / >2 min tasks → ALWAYS set `run_in_background: true`. After launching, end the turn promptly. The bot delivers the result automatically when done.
+- Net effect: Claude's main turn ends in ~5 s instead of 10+ minutes. `session.isProcessing` flips to `false` quickly so the user can keep chatting.
+#### Stage 2 — Async-agent watcher polls and delivers
+The hard part. Three new pure modules + one new wired-up service:
+- **`src/services/async-agent-parser.ts`** (NEW, pure) — two helpers:
+  - `parseAsyncLaunchedToolResult(text)` extracts `agentId` + `output_file` from the SDK's plain-text `Async agent launched successfully…` tool-result. **Important**: the `.d.ts` type in the SDK package claims this is a JSON object with `outputFile: string`. The runtime actually emits plain text with `output_file` (snake_case). Captured live via probe — see the parser test fixtures.
+  - `parseOutputFileStatus(path)` tail-reads (64 KB) the JSONL `output_file` and detects completion by finding the most-recent `assistant` message with `stop_reason: "end_turn"`. Concatenates `content[].text` blocks for the final answer. Token usage extracted from the `usage` field. Survives partial last lines, garbage lines, and tail-cuts on huge files. **19 unit tests** including a 200 KB tail-test.
+- **`src/services/async-agent-watcher.ts`** (NEW) — the polling service. `Map<agentId, PendingAsyncAgent>` in memory, persisted to `~/.alvin-bot/state/async-agents.json` for restart catch-up (same pattern as v4.9.0 cron scheduler). Public API: `startWatcher` / `stopWatcher` / `registerPendingAgent` / `pollOnce` / `listPendingAgents`. Polls every 15 s, gives up after 12 h per-agent (timeout banner). On completion → builds a `SubAgentInfo + SubAgentResult` and hands off to the existing `subagent-delivery.ts` from v4.9.x. **7 integration tests** including bot-restart catch-up.
+- **`src/handlers/async-agent-chunk-handler.ts`** (NEW) — bridge between provider stream chunks and the watcher. Inspects `tool_result` chunks for the async_launched payload, extracts the `description` from the immediately preceding `tool_use` chunk, registers with the watcher. **4 unit tests**.
+- **`src/providers/claude-sdk-provider.ts`** — extended to surface `tool_result` blocks from SDK `user` messages as a new `tool_result` chunk type. Previously the provider only emitted `text` and `tool_use` chunks.
+- **`src/providers/types.ts`** — `StreamChunk` gets two new optional fields: `toolUseId` and `toolResultContent`.
+- **`src/handlers/message.ts`** — captures `lastAgentToolUseInput` from each `tool_use` chunk and consumes it on the immediately-following `tool_result` chunk. Tool-name match also extended from `"Task"` → `"Task" | "Agent"` (the SDK renamed it in v2.1.63).
+- **`src/index.ts`** — `startAsyncAgentWatcher()` after the cron scheduler, `stopAsyncAgentWatcher()` in the shutdown handler.
+- **`src/paths.ts`** — new `ASYNC_AGENTS_STATE_FILE` constant under `~/.alvin-bot/state/`.
+#### Investigation artifacts (gitignored, maintainer-local)
+- `docs/superpowers/plans/2026-04-13-async-subagents.md` — full TDD plan
+- `docs/superpowers/specs/sdk-async-agent-outputfile-format.md` — live-captured SDK format spec; documents the `.d.ts` mismatch that ate ~30 minutes of debugging time
+#### Testing
+**237 tests total** (201 baseline + 36 new). All green. TSC clean.
+- 6 system-prompt-hint tests (Stage 1)
+- 19 parser tests (8 plain-text format + 11 JSONL format including 200 KB tail-test)
+- 7 watcher integration tests (register, deliver, persistence, restart catch-up, timeout, concurrent agents)
+- 4 chunk-handler unit tests
+Live-verified via isolated SDK probe (`node sdk-probe.mjs` inside the repo) which confirmed the real `output_file` path and JSONL format match the parser's expectations.
+#### What you'll see as a user
+Send: *"Make a SEO audit of gethomes.io and alev-b.com in parallel"*
+- **0 s** — Claude responds: *"Starting both audits in the background — I'll send the reports when done."* Main session **unlocks**.
+- **1–10 min later** — You can chat about anything else. The bot answers immediately.
+- **~13 min** (when each agent finishes) — Two separate banner messages arrive: *"✅ SEO audit gethomes.io completed · 13m 17s · 2.6M in / 28k out"* + the full report body, delivered via the v4.9.3 Markdown→plain-text fallback path.
+#### Non-goals
+- No session-mutex refactor (Stage 3 from the analysis, out of scope here)
+- No replacement for Alvin's existing cron `spawnSubAgent` system (different use case)
+- No SDK upgrade beyond `0.2.97`
+#### Compatibility
+- `CLAUDE_CODE_DISABLE_BACKGROUND_TASKS=1` in `.env` disables background mode at the SDK level → Stage 1 hint becomes inert, watcher idles; foreground behavior is restored
 ## [4.9.4] — 2026-04-13
 ### 🔌 Web UI fully decoupled from main bot — port conflicts no longer crash anything

package/dist/handlers/async-agent-chunk-handler.js ADDED Viewed

@@ -0,0 +1,33 @@
+import { parseAsyncLaunchedToolResult } from "../services/async-agent-parser.js";
+import { registerPendingAgent } from "../services/async-agent-watcher.js";
+/**
+ * Inspect a stream chunk; if it's an Agent async_launched tool_result,
+ * register the pending agent with the watcher.
+ *
+ * Safe to call on any chunk type — non-tool_result chunks are ignored.
+ */
+export function handleToolResultChunk(chunk, ctx) {
+    if (chunk.type !== "tool_result")
+        return;
+    if (!chunk.toolResultContent)
+        return;
+    const info = parseAsyncLaunchedToolResult(chunk.toolResultContent);
+    if (!info)
+        return;
+    // The description and prompt come from the original tool_use input,
+    // not the tool_result text. If we don't have them (e.g. test setup
+    // forgot to pass lastToolUseInput), fall back to a generic label so
+    // the user still sees something meaningful in the delivery banner.
+    const description = ctx.lastToolUseInput?.description?.trim() ||
+        `Background agent ${info.agentId.slice(0, 8)}`;
+    const prompt = ctx.lastToolUseInput?.prompt?.trim() || "";
+    registerPendingAgent({
+        agentId: info.agentId,
+        outputFile: info.outputFile,
+        description,
+        prompt,
+        chatId: ctx.chatId,
+        userId: ctx.userId,
+        toolUseId: chunk.toolUseId ?? null,
+    });
+}

package/dist/handlers/commands.js CHANGED Viewed

@@ -2,7 +2,7 @@ import { InlineKeyboard, InputFile } from "grammy";
 import fs from "fs";
 import path, { resolve } from "path";
 import os from "os";
-import { getSession, resetSession } from "../services/session.js";
+import { getSession, resetSession, markSessionDirty } from "../services/session.js";
 import { getRegistry } from "../engine.js";
 import { reloadSoul } from "../services/personality.js";
 import { parseDuration, createReminder, listReminders, cancelReminder } from "../services/reminders.js";
@@ -399,6 +399,7 @@ export function registerCommands(bot) {
         const userId = ctx.from.id;
         const session = getSession(userId);
         session.voiceReply = !session.voiceReply;
+        markSessionDirty(userId);
         await ctx.reply(session.voiceReply
             ? "Voice replies enabled. Responses will also be sent as voice messages."
             : "Voice replies disabled. Text-only responses.");
@@ -421,6 +422,7 @@ export function registerCommands(bot) {
             return;
         }
         session.effort = level;
+        markSessionDirty(userId);
         await ctx.reply(`✅ Effort: ${EFFORT_LABELS[session.effort]}`);
     });
     // Inline keyboard callback for effort switching
@@ -433,6 +435,7 @@ export function registerCommands(bot) {
         const userId = ctx.from.id;
         const session = getSession(userId);
         session.effort = level;
+        markSessionDirty(userId);
         const keyboard = new InlineKeyboard();
         for (const [key, label] of Object.entries(EFFORT_LABELS)) {
             const marker = key === session.effort ? "✅ " : "";
@@ -827,6 +830,7 @@ export function registerCommands(bot) {
         }
         else if (arg === "en" || arg === "de" || arg === "es" || arg === "fr") {
             session.language = arg;
+            markSessionDirty(userId);
             const { setExplicitLanguage } = await import("../services/language-detect.js");
             setExplicitLanguage(userId, arg);
             await ctx.reply(t("bot.lang.setFixed", arg, { name: LOCALE_NAMES[arg] }));
@@ -851,6 +855,7 @@ export function registerCommands(bot) {
         }
         const newLang = choice;
         session.language = newLang;
+        markSessionDirty(userId);
         const { setExplicitLanguage } = await import("../services/language-detect.js");
         setExplicitLanguage(userId, newLang);
         const currentName = `${LOCALE_FLAGS[newLang]} ${LOCALE_NAMES[newLang]}`;

package/dist/handlers/message.js CHANGED Viewed

@@ -4,7 +4,7 @@ import { getSession, addToHistory, trackProviderUsage, buildSessionKey } from ".
 import { TelegramStreamer } from "../services/telegram.js";
 import { getRegistry } from "../engine.js";
 import { textToSpeech } from "../services/voice.js";
-import { buildSystemPrompt, buildSmartSystemPrompt } from "../services/personality.js";
+import { buildSmartSystemPrompt } from "../services/personality.js";
 import { buildSkillContext } from "../services/skills.js";
 import { isForwardingAllowed } from "../services/access.js";
 import { touchProfile } from "../services/users.js";
@@ -15,6 +15,7 @@ import { trackUsage } from "../services/usage-tracker.js";
 import { emitUserMessage as broadcastUserMessage, emitResponseStart as broadcastResponseStart, emitResponseDelta as broadcastResponseDelta, emitResponseDone as broadcastResponseDone, } from "../services/broadcast.js";
 import { t } from "../i18n.js";
 import { isHarmlessTelegramError } from "../util/telegram-error-filter.js";
+import { handleToolResultChunk } from "./async-agent-chunk-handler.js";
 /**
  * Stuck-only timeout — NO absolute cap.
  *
@@ -218,12 +219,17 @@ export async function handleMessage(ctx) {
         if (adaptedLang !== session.language) {
             session.language = adaptedLang;
         }
-        // Build query options (with semantic memory search for non-SDK + skill injection)
+        // Build query options (with semantic memory search for non-SDK + skill injection).
+        // v4.11.0 P0 #3: SDK now also gets semantic recall on first-turn. The signal
+        // is `session.sessionId === null` — meaning Claude SDK hasn't given us a
+        // resume token yet for this session. True for: brand-new users, post-/new,
+        // and rehydrated sessions where the persisted snapshot lacked a sessionId.
+        // After the first SDK turn, Claude resumes via SDK session_id and already
+        // carries the recalled context — no need for another search per turn.
         const chatIdStr = String(ctx.chat.id);
         const skillContext = buildSkillContext(text);
-        const systemPrompt = (isSDK
-            ? buildSystemPrompt(isSDK, session.language, chatIdStr)
-            : await buildSmartSystemPrompt(isSDK, session.language, text, chatIdStr)) + skillContext;
+        const isFirstSDKTurn = isSDK && session.sessionId === null;
+        const systemPrompt = (await buildSmartSystemPrompt(isSDK, session.language, text, chatIdStr, isFirstSDKTurn)) + skillContext;
         // Track the user turn in history regardless of provider type. This keeps
         // the fallback path (Ollama etc.) aware of what was said on SDK turns.
         addToHistory(userId, { role: "user", content: text });
@@ -279,6 +285,11 @@ export async function handleMessage(ctx) {
         };
         // Stream response from provider (with fallback)
         let lastBroadcastLen = 0;
+        // Captured during tool_use chunks; consumed by tool_result chunks so
+        // the async-agent watcher can label pending agents with their human-
+        // readable description (which only appears in the tool_use input,
+        // not in the tool_result text). See Fix #17 Stage 2.
+        let lastAgentToolUseInput;
         for await (const chunk of registry.queryWithFallback(queryOpts)) {
             // Any chunk is progress — reset the stuck timer.
             resetStuckTimer();
@@ -309,13 +320,14 @@ export async function handleMessage(ctx) {
                     if (chunk.toolName) {
                         session.toolUseCount++;
                         const icon = TOOL_ICONS[chunk.toolName] || "🔧";
-                        // Special treatment for Claude's SDK-internal Task tool:
+                        // Special treatment for Claude's SDK-internal Task/Agent tool:
                         // track how many sub-tasks Claude delegated and surface the
                         // task description in the status line so the user sees WHAT
-                        // is being delegated, not just "Task…".
-                        if (chunk.toolName === "Task") {
+                        // is being delegated, not just "Task…". The tool was renamed
+                        // from "Task" to "Agent" in Claude Code v2.1.63 — match both.
+                        if (chunk.toolName === "Task" || chunk.toolName === "Agent") {
                             session.sdkSubTaskCount++;
-                            let label = "Task";
+                            let label = chunk.toolName;
                             if (chunk.toolInput) {
                                 try {
                                     const parsed = JSON.parse(chunk.toolInput);
@@ -324,11 +336,18 @@ export async function handleMessage(ctx) {
                                         const desc = parsed.description.length > 80
                                             ? parsed.description.slice(0, 80) + "…"
                                             : parsed.description;
-                                        label = `Task: ${desc}`;
+                                        label = `${chunk.toolName}: ${desc}`;
                                     }
                                     else if (parsed.subagent_type) {
-                                        label = `Task (${parsed.subagent_type})`;
+                                        label = `${chunk.toolName} (${parsed.subagent_type})`;
                                     }
+                                    // Capture the description+prompt for the upcoming
+                                    // tool_result. Used by Fix #17 Stage 2 to label
+                                    // background agents in the watcher's delivery banner.
+                                    lastAgentToolUseInput = {
+                                        description: parsed.description,
+                                        prompt: parsed.prompt,
+                                    };
                                 }
                                 catch {
                                     // not JSON — keep generic label
@@ -341,6 +360,20 @@ export async function handleMessage(ctx) {
                         }
                     }
                     break;
+                case "tool_result":
+                    // Fix #17 Stage 2: detect Agent async_launched payloads and
+                    // hand them off to the async-agent watcher. The watcher will
+                    // poll the outputFile and deliver the result as a separate
+                    // Telegram message when the background agent finishes.
+                    handleToolResultChunk(chunk, {
+                        chatId: ctx.chat.id,
+                        userId,
+                        lastToolUseInput: lastAgentToolUseInput,
+                    });
+                    // Reset the captured input — only the immediately following
+                    // tool_result should consume it.
+                    lastAgentToolUseInput = undefined;
+                    break;
                 case "done":
                     if (chunk.sessionId)
                         session.sessionId = chunk.sessionId;

package/dist/handlers/platform-message.js CHANGED Viewed

@@ -9,7 +9,7 @@
 import fs from "fs";
 import { getSession, addToHistory, trackProviderUsage } from "../services/session.js";
 import { getRegistry } from "../engine.js";
-import { buildSystemPrompt, buildSmartSystemPrompt } from "../services/personality.js";
+import { buildSmartSystemPrompt } from "../services/personality.js";
 import { buildSkillContext } from "../services/skills.js";
 import { touchProfile } from "../services/users.js";
 import { trackAndAdapt } from "../services/language-detect.js";
@@ -129,9 +129,9 @@ export async function handlePlatformMessage(msg, adapter) {
         const activeProvider = registry.getActive();
         const isSDK = activeProvider.config.type === "claude-sdk";
         const skillContext = buildSkillContext(fullText);
-        const systemPrompt = (isSDK
-            ? buildSystemPrompt(isSDK, session.language, msg.chatId)
-            : await buildSmartSystemPrompt(isSDK, session.language, fullText, msg.chatId)) + skillContext;
+        // v4.11.0 P0 #3 — SDK gets semantic recall on first turn (when no resume token yet).
+        const isFirstSDKTurn = isSDK && session.sessionId === null;
+        const systemPrompt = (await buildSmartSystemPrompt(isSDK, session.language, fullText, msg.chatId, isFirstSDKTurn)) + skillContext;
         const queryOpts = {
             prompt: fullText,
             systemPrompt,

package/dist/index.js CHANGED Viewed

@@ -78,7 +78,9 @@ import { loadPlugins, registerPluginCommands, unloadPlugins } from "./services/p
 import { initMCP, disconnectMCP, hasMCPConfig } from "./services/mcp.js";
 import { startWebServer, stopWebServer } from "./web/server.js";
 import { startScheduler, stopScheduler, setNotifyCallback } from "./services/cron.js";
-import { startSessionCleanup, stopSessionCleanup } from "./services/session.js";
+import { startWatcher as startAsyncAgentWatcher, stopWatcher as stopAsyncAgentWatcher } from "./services/async-agent-watcher.js";
+import { startSessionCleanup, stopSessionCleanup, attachPersistHook } from "./services/session.js";
+import { loadPersistedSessions, flushSessions, schedulePersist, } from "./services/session-persistence.js";
 import { processQueue, cleanupQueue, setSenders, enqueue } from "./services/delivery-queue.js";
 import { discoverTools } from "./services/tool-discovery.js";
 import { startHeartbeat } from "./services/heartbeat.js";
@@ -254,7 +256,12 @@ const shutdown = async () => {
     await cancelAllSubAgents(true);
     stopWatchdog();
     stopScheduler();
+    stopAsyncAgentWatcher();
     stopSessionCleanup();
+    // v4.11.0 — Final immediate flush of in-memory sessions to disk before exit.
+    // The debounced timer might be pending; flushSessions() cancels it and writes
+    // synchronously so the next boot can rehydrate the latest state.
+    await flushSessions().catch((err) => console.warn("[shutdown] flushSessions failed:", err));
     if (queueInterval)
         clearInterval(queueInterval);
     if (queueCleanupInterval)
@@ -420,9 +427,21 @@ setNotifyCallback(async (target, text) => {
     enqueue(target.platform, String(target.chatId), text);
 });
 startScheduler();
+// Start the async-agent watcher (Fix #17 Stage 2). Polls outputFiles
+// of background sub-agents Claude launched with run_in_background and
+// delivers their completed reports as separate Telegram messages.
+// Loads any persisted pending agents from disk on boot.
+startAsyncAgentWatcher();
 // Session memory hygiene: purge sessions idle > 7 days (configurable via
 // ALVIN_SESSION_TTL_DAYS). Never touches active sessions — see session.ts.
 startSessionCleanup();
+// Session persistence (v4.11.0): wire the debounced persist hook BEFORE we
+// load the snapshot, then rehydrate the in-memory Map from disk so users'
+// Claude SDK session_id, conversation history, language and effort all
+// survive bot restarts. Without this, every launchctl restart turns the
+// bot into a goldfish for every active conversation.
+attachPersistHook(schedulePersist);
+loadPersistedSessions();
 // Wire delivery queue senders
 setSenders({
     telegram: async (chatId, content) => {

package/dist/paths.js CHANGED Viewed

@@ -41,8 +41,15 @@ export const TOOLS_EXAMPLE_JSON = resolve(BOT_ROOT, "docs", "tools.example.json"
 export const ENV_FILE = resolve(DATA_DIR, ".env");
 /** memory/ — Daily logs and embeddings */
 export const MEMORY_DIR = resolve(DATA_DIR, "memory");
-/** memory/MEMORY.md — Long-term curated memory */
+/** memory/MEMORY.md — Long-term curated memory (legacy monolithic, still loaded) */
 export const MEMORY_FILE = resolve(DATA_DIR, "memory", "MEMORY.md");
+/** memory/identity.md — L0 layer (v4.11.0): core user facts, always loaded.
+ *  Optional. If missing, MEMORY.md acts as the L0+L1 fallback. */
+export const IDENTITY_FILE = resolve(DATA_DIR, "memory", "identity.md");
+/** memory/preferences.md — L1 layer (v4.11.0): communication style + don'ts. */
+export const PREFERENCES_FILE = resolve(DATA_DIR, "memory", "preferences.md");
+/** memory/projects/ — L2 layer (v4.11.0): per-project context loaded on topic match. */
+export const PROJECTS_MEMORY_DIR = resolve(DATA_DIR, "memory", "projects");
 /** memory/.embeddings.json — Vector index */
 export const EMBEDDINGS_IDX = resolve(DATA_DIR, "memory", ".embeddings.json");
 /** users/ — User profiles and per-user memory */
@@ -62,6 +69,16 @@ export const SUDO_ENC_FILE = resolve(DATA_DIR, "data", ".sudo-enc");
 export const SUDO_KEY_FILE = resolve(DATA_DIR, "data", ".sudo-key");
 /** backups/ — Config snapshots */
 export const BACKUP_DIR = resolve(DATA_DIR, "backups");
+/** state/async-agents.json — Pending background SDK agents (Fix #17 Stage 2).
+ *  See src/services/async-agent-watcher.ts for the watcher that polls and
+ *  delivers these. Survives bot restarts. */
+export const ASYNC_AGENTS_STATE_FILE = resolve(DATA_DIR, "state", "async-agents.json");
+/** state/sessions.json — Persisted user sessions across bot restarts (v4.11.0).
+ *  Includes: sessionId (Claude SDK resume token), language, effort, voiceReply,
+ *  workingDir, lastActivity, lastSdkHistoryIndex, history (capped). Atomic write
+ *  via tmp+rename. Loaded on startup, debounce-flushed on mutations.
+ *  See src/services/session-persistence.ts for the loader/flusher. */
+export const SESSIONS_STATE_FILE = resolve(DATA_DIR, "state", "sessions.json");
 /** soul.md — Bot personality */
 export const SOUL_FILE = resolve(DATA_DIR, "soul.md");
 /** tools.md — Custom tool definitions (Markdown) */

package/dist/providers/claude-sdk-provider.js CHANGED Viewed

@@ -186,6 +186,49 @@ export class ClaudeSDKProvider {
                         }
                     }
                 }
+                // User message — tool_results from the Claude API arrive as user
+                // messages in the SDK protocol. We surface tool_result blocks as
+                // chunks so the message handler can detect Agent async_launched
+                // payloads and register them with the watcher (Fix #17 Stage 2).
+                if (message.type === "user") {
+                    // eslint-disable-next-line @typescript-eslint/no-explicit-any
+                    const userMsg = message;
+                    const content = userMsg.message?.content;
+                    if (Array.isArray(content)) {
+                        for (const block of content) {
+                            if (block &&
+                                typeof block === "object" &&
+                                block.type === "tool_result" &&
+                                typeof block.tool_use_id === "string") {
+                                // The `content` field on a tool_result block can be a
+                                // plain string OR an array of content blocks. Normalize
+                                // to a single string so the chunk consumer doesn't need
+                                // to know about the SDK shape.
+                                let contentText = "";
+                                if (typeof block.content === "string") {
+                                    contentText = block.content;
+                                }
+                                else if (Array.isArray(block.content)) {
+                                    contentText = block.content
+                                        .map((c) => {
+                                        if (c && typeof c === "object" && "text" in c) {
+                                            const t = c.text;
+                                            return typeof t === "string" ? t : "";
+                                        }
+                                        return "";
+                                    })
+                                        .join("");
+                                }
+                                yield {
+                                    type: "tool_result",
+                                    toolUseId: block.tool_use_id,
+                                    toolResultContent: contentText,
+                                    sessionId: capturedSessionId,
+                                };
+                            }
+                        }
+                    }
+                }
                 // Result — done (extract full usage including cache tokens)
                 if (message.type === "result") {
                     const resultMsg = message;