alvin-bot 4.9.4 → 4.11.0

package/CHANGELOG.md CHANGED
@@ -2,6 +2,160 @@
2
2
 
3
3
  All notable changes to Alvin Bot are documented here.
4
4
 
5
+ ## [4.11.0] — 2026-04-13
6
+
7
+ ### 🧠 Memory Persistence + Smart Loading — sessions survive restart, memory is layered
8
+
9
+ A colleague asked the same day v4.10.0 shipped: *"Memory after session restart is also a bit fiddly. I installed mempalace as a workaround — maybe build something like that natively."* He was right. Alvin had a hand-curated `MEMORY.md`, a 128 MB embeddings vector index, and an AI-powered compaction service — but **the in-memory `sessions Map` was wiped on every bot restart**. Claude SDK then started a fresh conversation on the next user message, behaving like a goldfish despite all that memory infrastructure on disk.
10
+
11
+ This release fixes that with **five complementary tasks**, all bundled into v4.11.0: three core fixes (P0) plus two structural improvements (P1) inspired by mempalace's L0–L3 stack and Mem0's auto-extraction pattern.
12
+
13
+ #### P0 #1 — Session Persistence (`src/services/session-persistence.ts`, NEW)
14
+
15
+ The core fix. The `sessions Map` in `src/services/session.ts` was in-memory only; every `launchctl kickstart` wiped every user's `sessionId`, history, language, effort, voiceReply, and tracking counters.
16
+
17
+ - **Debounced flush** (1.5 s coalesce window) writes a sanitized snapshot of `getAllSessions()` to `~/.alvin-bot/state/sessions.json` via atomic tmp+rename.
18
+ - **`loadPersistedSessions()`** rehydrates the Map at bot startup; `flushSessions()` cancels any pending debounce timer and writes immediately on graceful shutdown (SIGINT/SIGTERM).
19
+ - **`attachPersistHook()` / `markSessionDirty()`** in `session.ts` give handlers a callback to trigger persist after direct mutations (`/lang`, `/effort`, `/voice`). `addToHistory()` and `trackProviderUsage()` trigger it automatically.
20
+ - History is capped at `MAX_PERSISTED_HISTORY = 50` per session so the file stays small.
21
+ - Runtime-only fields (`abortController`, `isProcessing`, `messageQueue`) are stripped before persisting.
22
+ - Schema drift is handled: missing fields fall back to defaults; corrupt JSON loads zero sessions; null root rejected gracefully.
23
+ - **9 unit tests** + **18 stress tests** covering 100-session burst, 1000-mutate debounce coalescing, unicode (RTL/ZWJ/astral plane), atomic write recovery from stale `.tmp`, schema drift, hostile JSON, read-only filesystem, simulated bot restart.
24
+
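The persistence scheme described above (coalesced writes, atomic tmp+rename, stripping runtime-only fields, capping history) can be sketched roughly as follows. This is a minimal illustration, not the code in `session-persistence.ts`: the helper names `writeJsonAtomic`, `createCoalescingPersister`, and `sanitizeSession` are hypothetical, and only the 1.5 s window, the 50-entry cap, and the stripped field names come from the changelog.

```javascript
// Illustrative sketch only; helper names are hypothetical, not Alvin's actual API.
const fs = require("node:fs");
const path = require("node:path");

// Write JSON to a sibling .tmp file, then rename it over the target.
// On POSIX filesystems a rename within one directory replaces the target in one step,
// so readers see either the old file or the new one, never a half-written file.
function writeJsonAtomic(file, data) {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  const tmp = `${file}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(data));
  fs.renameSync(tmp, file);
}

// Drop runtime-only fields and keep only the newest history entries.
function sanitizeSession(session, maxHistory = 50) {
  const { abortController, isProcessing, messageQueue, ...rest } = session;
  return { ...rest, history: (rest.history ?? []).slice(-maxHistory) };
}

// Coalesce bursts of mutations: the first schedule() arms a timer, later calls
// inside the window are no-ops, and the snapshot is taken when the timer fires.
function createCoalescingPersister(file, getSnapshot, delayMs = 1500) {
  let timer = null;
  const flush = () => {
    if (timer) {
      clearTimeout(timer);
      timer = null;
    }
    writeJsonAtomic(file, getSnapshot());
  };
  const schedule = () => {
    if (timer) return; // a write is already pending
    timer = setTimeout(flush, delayMs);
  };
  return { schedule, flush };
}
```

In this sketch a shutdown handler would call `flush()` directly, which cancels the pending timer and writes the latest snapshot before the process exits.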
25
+ #### P0 #2 — MEMORY.md Auto-Inject for SDK (`src/services/personality.ts`)
26
+
27
+ Before v4.11.0, only non-SDK providers (Groq, Gemini, NVIDIA) got `buildMemoryContext()` injected into their system prompt. The Claude SDK was *expected* to read memory files via tools, but in practice rarely did unless the user's first message specifically prompted it.
28
+
29
+ - Drops the `!isSDK` guard around `buildMemoryContext()` and asset-index injection.
30
+ - SDK now gets the same compact memory context (MEMORY.md + today + yesterday daily logs) on every turn, the same context non-SDK providers have had since 4.0.
31
+ - **3 unit tests** verifying SDK includes the memory section, non-SDK regression, and graceful behavior when MEMORY.md is missing.
32
+
33
+ #### P0 #3 — Semantic Recall on SDK First Turn (`src/services/personality.ts`, `src/handlers/message.ts`, `src/handlers/platform-message.ts`)
34
+
35
+ `buildSmartSystemPrompt()` now accepts an `isFirstTurn` flag. For SDK providers it runs the embeddings-based `searchMemory()` only on the first turn (`session.sessionId === null` — meaning Claude hasn't given us a resume token yet for this session). After the first turn Claude carries the recalled context inside the SDK session via resume, so spamming the embeddings API on every subsequent turn is wasted work. Non-SDK providers still run the search on every turn (no resume mechanism).
36
+
37
+ - `handlers/message.ts` and `handlers/platform-message.ts` updated to compute `isFirstSDKTurn = isSDK && session.sessionId === null` and pass it through.
38
+ - The bare `buildSystemPrompt` calls on the SDK paths are gone — `buildSmartSystemPrompt` is the single entry point.
39
+ - **5 mocked-search tests** covering call-count semantics for SDK first/later turns, non-SDK every turn, missing `userMessage` skip, and graceful failure when `searchMemory` throws.
40
+
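The call-count rule above can be expressed as a small decision function. The sketch below is an assumption about how such gating might look, not the body of `buildSmartSystemPrompt()`; `shouldRunSemanticSearch` and `recallContext` are hypothetical names, and `searchMemory` is passed in as a parameter rather than imported.

```javascript
// Hypothetical helper mirroring the rule described in the changelog.
function shouldRunSemanticSearch({ isSDK, sessionId, userMessage }) {
  if (!userMessage) return false; // nothing to search for
  if (!isSDK) return true;        // no resume mechanism: search on every turn
  return sessionId === null;      // SDK: search only before a resume token exists
}

// Wraps the search so a failing embeddings call never blocks the reply.
async function recallContext(opts, searchMemory) {
  if (!shouldRunSemanticSearch(opts)) return "";
  try {
    return await searchMemory(opts.userMessage);
  } catch {
    return ""; // degrade to "no recalled context"
  }
}
```

Keeping the decision in a pure function makes the call-count behaviour easy to test without mocking the embeddings API.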
41
+ #### P1 #4 — Layered Memory Loader (`src/services/memory-layers.ts`, NEW)
42
+
43
+ Inspired by mempalace's L0–L3 stack. Replaces the monolithic `MEMORY.md → System Prompt` injection with a structured, token-budgeted layered loader:
44
+
45
+ - **L0** `~/.alvin-bot/memory/identity.md` — always loaded, ~200 tokens (core user facts: name, location, family, contact)
46
+ - **L1** `~/.alvin-bot/memory/preferences.md` — always loaded (communication style, do's and don'ts)
47
+ - **L1** `~/.alvin-bot/memory/MEMORY.md` — backwards-compat: existing curated knowledge (full content if no split files exist; truncated to 1500 chars when split files coexist)
48
+ - **L2** `~/.alvin-bot/memory/projects/*.md` — loaded only when the user's incoming query mentions the project topic (substring or first-200-char keyword overlap)
49
+ - **L3** daily logs — still handled by `embeddings.ts` vector search (unchanged)
50
+
51
+ The split is **opt-in**: if `identity.md` and `preferences.md` don't exist, the loader falls back to monolithic MEMORY.md exactly like before. No migration required for existing users. Users who want the cleaner layout can split MEMORY.md manually and the loader picks it up automatically. Token budget: L0+L1 capped at 5000 chars (~1300 tokens), L2 capped at 3000 chars total (~750 tokens, max 1500 per matched project file). New `query` parameter on `buildSystemPrompt()` and `buildMemoryContext()` propagates the user message all the way through. **9 unit tests** + 2 layered-context stress tests.
52
+
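A rough sketch of the budgeted layering described above, under several simplifying assumptions: file contents are passed in as strings instead of being read from disk, project matching uses only a substring test on the project name (the keyword-overlap variant is omitted), and the 5000-character cap is also applied to the monolithic fallback. `buildLayeredContext` is a hypothetical name; the numeric budgets are the ones quoted in the changelog.

```javascript
// Sketch with hypothetical names; budgets are taken from the changelog text.
const ALWAYS_BUDGET = 5000;   // L0 + L1 together, in characters
const PROJECT_BUDGET = 3000;  // all matched L2 files together
const PER_PROJECT_CAP = 1500; // a single matched L2 file

function buildLayeredContext({ identity = "", preferences = "", memory = "", projects = {} }, query = "") {
  // MEMORY.md is used in full when no split files exist, truncated when they do.
  const hasSplit = identity.trim() !== "" || preferences.trim() !== "";
  const legacy = hasSplit ? memory.slice(0, 1500) : memory;
  const always = [identity, preferences, legacy]
    .filter((s) => s.trim())
    .join("\n\n")
    .slice(0, ALWAYS_BUDGET);

  // L2: include a project only when the query mentions its name.
  const q = query.toLowerCase();
  let remaining = PROJECT_BUDGET;
  const matched = [];
  for (const [name, body] of Object.entries(projects)) {
    if (!body.trim() || !q.includes(name.toLowerCase())) continue;
    const piece = body.slice(0, Math.min(PER_PROJECT_CAP, remaining));
    if (!piece) break; // L2 budget exhausted
    matched.push(`## ${name}\n${piece}`);
    remaining -= piece.length;
  }
  return [always, ...matched].filter(Boolean).join("\n\n");
}
```

With these caps, at most two full-size project files fit into the L2 budget; a third matching project is dropped rather than squeezed in.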
53
+ #### P1 #5 — Auto-Fact-Extraction in Compaction (`src/services/memory-extractor.ts`, NEW)
54
+
55
+ Inspired by Mem0's auto-extraction. When `compactSession()` archives old messages, it now runs an additional extraction pass that pulls structured facts (`user_facts`, `preferences`, `decisions`) out of the archived chunk via the active AI provider and appends them to MEMORY.md.
56
+
57
+ - **`parseExtractedFacts(text)`** — tolerates JSON wrapped in markdown code fences, surrounding prose, null/undefined fields, non-string entries.
58
+ - **`appendFactsToMemoryFile(facts)`** — exact-string dedup against existing MEMORY.md content, structured under `## Auto-extracted (YYYY-MM-DD)` header with `### User Facts` / `### Preferences` / `### Decisions` sub-sections.
59
+ - **`extractAndStoreFacts(chunk)`** — safe wrapper, never throws. Opt-out via `MEMORY_EXTRACTION_DISABLED=1` env var. Uses effort=low for cost minimization. Skips short input (<50 chars). Provider failures are swallowed; compaction always continues.
60
+ - Wired into `compactSession()` after the daily-log flush, before the AI summary generation.
61
+ - Marked **experimental** in v4.11.0. Semantic dedup (vs current exact-string match) deferred to v4.12+.
62
+ - **11 unit tests** covering JSON parsing edge cases, dedup, opt-out, short-input skip, garbage input, non-string filtering, graceful provider-failure handling.
63
+
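The tolerant parsing and exact-string dedup described above might look roughly like this. It is a sketch under assumptions, not the contents of `memory-extractor.ts`: the function names carry a `Sketch` suffix to mark them as hypothetical, and locating the JSON by its first `{` and last `}` is just one simple way to tolerate code fences and surrounding prose.

```javascript
// Hypothetical sketch; not the actual parseExtractedFacts implementation.
function parseFactsSketch(text) {
  const empty = { user_facts: [], preferences: [], decisions: [] };
  if (typeof text !== "string") return empty;

  // Tolerate code fences and prose by cutting out the outermost {...} span.
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return empty;

  let raw;
  try {
    raw = JSON.parse(text.slice(start, end + 1));
  } catch {
    return empty; // garbage input yields no facts
  }
  if (raw === null || typeof raw !== "object") return empty;

  // Keep only non-empty strings; null, undefined, and non-array fields become [].
  const clean = (v) =>
    Array.isArray(v)
      ? v.filter((x) => typeof x === "string" && x.trim()).map((x) => x.trim())
      : [];
  return {
    user_facts: clean(raw.user_facts),
    preferences: clean(raw.preferences),
    decisions: clean(raw.decisions),
  };
}

// Exact-string dedup: drop facts whose text already appears in the memory file.
function dedupeAgainstSketch(existingContent, facts) {
  return facts.filter((fact) => !existingContent.includes(fact));
}
```

Exact-string matching is cheap but misses paraphrases, which is consistent with the changelog deferring semantic dedup to a later release.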
64
+ #### Architecture decisions
65
+
66
+ - **mempalace as MCP server: rejected.** Considered installing mempalace as a Python MCP service. Rejected because (1) Alvin is all-TypeScript and adding a 2nd Python service to launchd is operational complexity, (2) Alvin already has an embeddings vector index — mempalace would be a parallel duplicate, (3) mempalace's MCP tools are only consumed by the SDK; cron jobs, sub-agents, and non-SDK providers wouldn't see them. Conclusion: **adopt the patterns natively** (L0–L3 layering, AAAK-style structured extraction) rather than running a second service.
67
+ - **SQLite migration deferred.** The 128 MB JSON embeddings index is a known performance issue and is already noted in `~/.claude/projects/-Users-alvin-de/memory/project_alvinbot_sqlite_migration.md` for v4.12+. Orthogonal to the "fiddly after restart" UX problem this release targets.
68
+ - **Multi-user isolation deferred.** Memories are still global per data dir. Single-user use case, not a privacy concern for Ali's setup.
69
+ - **Decay/aging deferred.** Daily logs grow monotonically. Will be addressed alongside SQLite migration.
70
+
71
+ #### Testing
72
+
73
+ **292 tests total** (237 baseline + 55 new). All green. TSC clean.
74
+
75
+ - 9 session-persistence unit tests
76
+ - 8 SDK memory-injection tests (3 base + 5 smart-prompt mocked-search)
77
+ - 9 memory-layers tests (loader + topic match + token budget)
78
+ - 11 memory-extractor tests (parse + append + extract pipeline)
79
+ - 18 stress tests (100 sessions, schema drift, unicode, atomic recovery, hostile JSON, simulated restart)
80
+
81
+ **Live verification:**
82
+ - `tmp/live-stress-memory.mjs` — 50 fake sessions against the built `dist/`, real ~/.alvin-bot/memory/MEMORY.md as the L1 source, simulated restart via Map clear + reload. Result: 215 KB state file, 1 ms flush, 1 ms reload, 50/50 perfect round-trip.
83
+ - `tmp/live-edge-cases.mjs` — 7 hostile scenarios: all-null fields, 1000-burst debounce (2 ms), 20 concurrent flushes, extreme unicode (RTL + ZWJ + astral plane), 4-layer memory with project topic match, atomic write recovery from stale .tmp, empty project file skipping. All passed.
84
+
85
+ #### Files changed
86
+
87
+ - **NEW:** `src/services/session-persistence.ts`, `src/services/memory-layers.ts`, `src/services/memory-extractor.ts`
88
+ - **NEW tests:** `test/session-persistence.test.ts`, `test/memory-sdk-injection.test.ts`, `test/memory-layers.test.ts`, `test/memory-extractor.test.ts`, `test/memory-stress-restart.test.ts`
89
+ - **Modified:** `src/services/session.ts` (persist hook), `src/services/personality.ts` (SDK injection + isFirstTurn), `src/services/memory.ts` (use layered loader), `src/services/compaction.ts` (extractor hook), `src/handlers/message.ts` + `src/handlers/platform-message.ts` (smart prompt wiring), `src/handlers/commands.ts` (`markSessionDirty` calls), `src/index.ts` (load + flush wiring), `src/paths.ts` (4 new constants)
90
+ - **Plan:** `docs/superpowers/plans/2026-04-13-memory-persistence.md`
91
+
92
+ ---
93
+
94
+ ## [4.10.0] — 2026-04-13
95
+
96
+ ### 🚀 Async sub-agents — main session no longer blocks during long tasks
97
+
98
+ The big architecture upgrade: Claude can now delegate long-running work (SEO audits, multi-page research, full-repo analyses) to **background** sub-agents. The main Telegram session ends quickly, the user can keep chatting, and the sub-agent's final report arrives as a separate message when ready.
99
+
100
+ A colleague flagged the underlying problem on 2026-04-13 via WhatsApp voice note: *"It's weird that the main routine crashes when the sub-agents are still running. It should just run in the background, and that should have zero impact on the main routine."* He was right. OpenClaw had this years ago because back then the SDK didn't support async; today's `@anthropic-ai/claude-agent-sdk@0.2.97` already ships `run_in_background: true` on the Agent tool — Alvin just wasn't using it.
101
+
102
+ This release closes that gap in two complementary stages, both bundled into the same v4.10.0:
103
+
104
+ #### Stage 1 — System prompt teaches Claude when to use `run_in_background`
105
+
106
+ - New `BACKGROUND_SUBAGENT_HINT` constant in `src/services/personality.ts`, injected only into SDK sessions (non-SDK providers don't have an Agent tool).
107
+ - The hint tells Claude: for audits / multi-page research / >2 min tasks → ALWAYS set `run_in_background: true`. After launching, end the turn promptly. The bot delivers the result automatically when done.
108
+ - Net effect: Claude's main turn ends in ~5 s instead of 10+ minutes. `session.isProcessing` flips to `false` quickly so the user can keep chatting.
109
+
110
+ #### Stage 2 — Async-agent watcher polls and delivers
111
+
112
+ The hard part. Three new modules (a pure parser, the polling watcher service, and a chunk-handler bridge) plus wiring changes in existing files:
113
+
114
+ - **`src/services/async-agent-parser.ts`** (NEW, pure) — two helpers:
115
+ - `parseAsyncLaunchedToolResult(text)` extracts `agentId` + `output_file` from the SDK's plain-text `Async agent launched successfully…` tool-result. **Important**: the `.d.ts` type in the SDK package claims this is a JSON object with `outputFile: string`. The runtime actually emits plain text with `output_file` (snake_case). Captured live via probe — see the parser test fixtures.
116
+ - `parseOutputFileStatus(path)` tail-reads (64 KB) the JSONL `output_file` and detects completion by finding the most-recent `assistant` message with `stop_reason: "end_turn"`. Concatenates `content[].text` blocks for the final answer. Token usage extracted from the `usage` field. Survives partial last lines, garbage lines, and tail-cuts on huge files. **19 unit tests** including a 200 KB tail-test.
117
+ - **`src/services/async-agent-watcher.ts`** (NEW) — the polling service. `Map<agentId, PendingAsyncAgent>` in memory, persisted to `~/.alvin-bot/state/async-agents.json` for restart catch-up (same pattern as v4.9.0 cron scheduler). Public API: `startWatcher` / `stopWatcher` / `registerPendingAgent` / `pollOnce` / `listPendingAgents`. Polls every 15 s, gives up after 12 h per-agent (timeout banner). On completion → builds a `SubAgentInfo + SubAgentResult` and hands off to the existing `subagent-delivery.ts` from v4.9.x. **7 integration tests** including bot-restart catch-up.
118
+ - **`src/handlers/async-agent-chunk-handler.ts`** (NEW) — bridge between provider stream chunks and the watcher. Inspects `tool_result` chunks for the async_launched payload, extracts the `description` from the immediately preceding `tool_use` chunk, registers with the watcher. **4 unit tests**.
119
+ - **`src/providers/claude-sdk-provider.ts`** — extended to surface `tool_result` blocks from SDK `user` messages as a new `tool_result` chunk type. Previously the provider only emitted `text` and `tool_use` chunks.
120
+ - **`src/providers/types.ts`** — `StreamChunk` gets two new optional fields: `toolUseId` and `toolResultContent`.
121
+ - **`src/handlers/message.ts`** — captures `lastAgentToolUseInput` from each `tool_use` chunk and consumes it on the immediately-following `tool_result` chunk. Tool-name match also extended from `"Task"` → `"Task" | "Agent"` (the tool was renamed in Claude Code v2.1.63).
122
+ - **`src/index.ts`** — `startAsyncAgentWatcher()` after the cron scheduler, `stopAsyncAgentWatcher()` in the shutdown handler.
123
+ - **`src/paths.ts`** — new `ASYNC_AGENTS_STATE_FILE` constant under `~/.alvin-bot/state/`.
124
+
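The completion check that `parseOutputFileStatus` is described as performing can be sketched as a backwards scan over the tail of the JSONL file. The line shape used here (`type`, `message.stop_reason`, `message.content[].text`, `message.usage`) is an assumption based on the changelog wording, not a verified SDK format, and `findFinalAnswerSketch` is a hypothetical name. Reading the last 64 KB of the file is left out; the function takes the tail as a string.

```javascript
// Hypothetical sketch. Assumed line shape (not verified against the SDK):
// {"type":"assistant","message":{"stop_reason":"end_turn","content":[{"type":"text","text":"..."}],"usage":{...}}}
function findFinalAnswerSketch(jsonlTail) {
  const lines = jsonlTail.split("\n");
  // Scan from the end so the most recent finished assistant message wins.
  for (let i = lines.length - 1; i >= 0; i--) {
    const line = lines[i].trim();
    if (!line) continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // partial last line, tail-cut first line, or garbage
    }
    const msg = entry && entry.type === "assistant" ? entry.message : null;
    if (!msg || msg.stop_reason !== "end_turn") continue;
    const text = (Array.isArray(msg.content) ? msg.content : [])
      .filter((block) => block && typeof block.text === "string")
      .map((block) => block.text)
      .join("");
    return { done: true, text, usage: msg.usage ?? null };
  }
  return { done: false, text: "", usage: null };
}
```

A watcher built on this would call the function on each poll and hand the result to delivery only once `done` is true; lines that fail to parse are skipped, so a file that is still being written does not raise an error.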
125
+ #### Investigation artifacts (gitignored, maintainer-local)
126
+
127
+ - `docs/superpowers/plans/2026-04-13-async-subagents.md` — full TDD plan
128
+ - `docs/superpowers/specs/sdk-async-agent-outputfile-format.md` — live-captured SDK format spec; documents the `.d.ts` mismatch that ate ~30 minutes of debugging time
129
+
130
+ #### Testing
131
+
132
+ **237 tests total** (201 baseline + 36 new). All green. TSC clean.
133
+
134
+ - 6 system-prompt-hint tests (Stage 1)
135
+ - 19 parser tests (8 plain-text format + 11 JSONL format including 200 KB tail-test)
136
+ - 7 watcher integration tests (register, deliver, persistence, restart catch-up, timeout, concurrent agents)
137
+ - 4 chunk-handler unit tests
138
+
139
+ Live-verified via isolated SDK probe (`node sdk-probe.mjs` inside the repo) which confirmed the real `output_file` path and JSONL format match the parser's expectations.
140
+
141
+ #### What you'll see as a user
142
+
143
+ Send: *"Make a SEO audit of gethomes.io and alev-b.com in parallel"*
144
+
145
+ - **0 s** — Claude responds: *"Starting both audits in the background — I'll send the reports when done."* Main session **unlocks**.
146
+ - **1–10 min later** — You can chat about anything else. The bot answers immediately.
147
+ - **~13 min** (when each agent finishes) — Two separate banner messages arrive: *"✅ SEO audit gethomes.io completed · 13m 17s · 2.6M in / 28k out"* + the full report body, delivered via the v4.9.3 Markdown→plain-text fallback path.
148
+
149
+ #### Non-goals
150
+
151
+ - No session-mutex refactor (Stage 3 from the analysis, out of scope here)
152
+ - No replacement for Alvin's existing cron `spawnSubAgent` system (different use case)
153
+ - No SDK upgrade beyond `0.2.97`
154
+
155
+ #### Compatibility
156
+
157
+ - `CLAUDE_CODE_DISABLE_BACKGROUND_TASKS=1` in `.env` disables background mode at the SDK level → Stage 1 hint becomes inert, watcher idles; foreground behavior is restored
158
+
5
159
  ## [4.9.4] — 2026-04-13
6
160
 
7
161
  ### 🔌 Web UI fully decoupled from main bot — port conflicts no longer crash anything
@@ -0,0 +1,33 @@
1
+ import { parseAsyncLaunchedToolResult } from "../services/async-agent-parser.js";
2
+ import { registerPendingAgent } from "../services/async-agent-watcher.js";
3
+ /**
4
+ * Inspect a stream chunk; if it's an Agent async_launched tool_result,
5
+ * register the pending agent with the watcher.
6
+ *
7
+ * Safe to call on any chunk type — non-tool_result chunks are ignored.
8
+ */
9
+ export function handleToolResultChunk(chunk, ctx) {
10
+ if (chunk.type !== "tool_result")
11
+ return;
12
+ if (!chunk.toolResultContent)
13
+ return;
14
+ const info = parseAsyncLaunchedToolResult(chunk.toolResultContent);
15
+ if (!info)
16
+ return;
17
+ // The description and prompt come from the original tool_use input,
18
+ // not the tool_result text. If we don't have them (e.g. test setup
19
+ // forgot to pass lastToolUseInput), fall back to a generic label so
20
+ // the user still sees something meaningful in the delivery banner.
21
+ const description = ctx.lastToolUseInput?.description?.trim() ||
22
+ `Background agent ${info.agentId.slice(0, 8)}`;
23
+ const prompt = ctx.lastToolUseInput?.prompt?.trim() || "";
24
+ registerPendingAgent({
25
+ agentId: info.agentId,
26
+ outputFile: info.outputFile,
27
+ description,
28
+ prompt,
29
+ chatId: ctx.chatId,
30
+ userId: ctx.userId,
31
+ toolUseId: chunk.toolUseId ?? null,
32
+ });
33
+ }
@@ -2,7 +2,7 @@ import { InlineKeyboard, InputFile } from "grammy";
2
2
  import fs from "fs";
3
3
  import path, { resolve } from "path";
4
4
  import os from "os";
5
- import { getSession, resetSession } from "../services/session.js";
5
+ import { getSession, resetSession, markSessionDirty } from "../services/session.js";
6
6
  import { getRegistry } from "../engine.js";
7
7
  import { reloadSoul } from "../services/personality.js";
8
8
  import { parseDuration, createReminder, listReminders, cancelReminder } from "../services/reminders.js";
@@ -399,6 +399,7 @@ export function registerCommands(bot) {
399
399
  const userId = ctx.from.id;
400
400
  const session = getSession(userId);
401
401
  session.voiceReply = !session.voiceReply;
402
+ markSessionDirty(userId);
402
403
  await ctx.reply(session.voiceReply
403
404
  ? "Voice replies enabled. Responses will also be sent as voice messages."
404
405
  : "Voice replies disabled. Text-only responses.");
@@ -421,6 +422,7 @@ export function registerCommands(bot) {
421
422
  return;
422
423
  }
423
424
  session.effort = level;
425
+ markSessionDirty(userId);
424
426
  await ctx.reply(`✅ Effort: ${EFFORT_LABELS[session.effort]}`);
425
427
  });
426
428
  // Inline keyboard callback for effort switching
@@ -433,6 +435,7 @@ export function registerCommands(bot) {
433
435
  const userId = ctx.from.id;
434
436
  const session = getSession(userId);
435
437
  session.effort = level;
438
+ markSessionDirty(userId);
436
439
  const keyboard = new InlineKeyboard();
437
440
  for (const [key, label] of Object.entries(EFFORT_LABELS)) {
438
441
  const marker = key === session.effort ? "✅ " : "";
@@ -827,6 +830,7 @@ export function registerCommands(bot) {
827
830
  }
828
831
  else if (arg === "en" || arg === "de" || arg === "es" || arg === "fr") {
829
832
  session.language = arg;
833
+ markSessionDirty(userId);
830
834
  const { setExplicitLanguage } = await import("../services/language-detect.js");
831
835
  setExplicitLanguage(userId, arg);
832
836
  await ctx.reply(t("bot.lang.setFixed", arg, { name: LOCALE_NAMES[arg] }));
@@ -851,6 +855,7 @@ export function registerCommands(bot) {
851
855
  }
852
856
  const newLang = choice;
853
857
  session.language = newLang;
858
+ markSessionDirty(userId);
854
859
  const { setExplicitLanguage } = await import("../services/language-detect.js");
855
860
  setExplicitLanguage(userId, newLang);
856
861
  const currentName = `${LOCALE_FLAGS[newLang]} ${LOCALE_NAMES[newLang]}`;
@@ -4,7 +4,7 @@ import { getSession, addToHistory, trackProviderUsage, buildSessionKey } from ".
4
4
  import { TelegramStreamer } from "../services/telegram.js";
5
5
  import { getRegistry } from "../engine.js";
6
6
  import { textToSpeech } from "../services/voice.js";
7
- import { buildSystemPrompt, buildSmartSystemPrompt } from "../services/personality.js";
7
+ import { buildSmartSystemPrompt } from "../services/personality.js";
8
8
  import { buildSkillContext } from "../services/skills.js";
9
9
  import { isForwardingAllowed } from "../services/access.js";
10
10
  import { touchProfile } from "../services/users.js";
@@ -15,6 +15,7 @@ import { trackUsage } from "../services/usage-tracker.js";
15
15
  import { emitUserMessage as broadcastUserMessage, emitResponseStart as broadcastResponseStart, emitResponseDelta as broadcastResponseDelta, emitResponseDone as broadcastResponseDone, } from "../services/broadcast.js";
16
16
  import { t } from "../i18n.js";
17
17
  import { isHarmlessTelegramError } from "../util/telegram-error-filter.js";
18
+ import { handleToolResultChunk } from "./async-agent-chunk-handler.js";
18
19
  /**
19
20
  * Stuck-only timeout — NO absolute cap.
20
21
  *
@@ -218,12 +219,17 @@ export async function handleMessage(ctx) {
218
219
  if (adaptedLang !== session.language) {
219
220
  session.language = adaptedLang;
220
221
  }
221
- // Build query options (with semantic memory search for non-SDK + skill injection)
222
+ // Build query options (with semantic memory search for non-SDK + skill injection).
223
+ // v4.11.0 P0 #3: SDK now also gets semantic recall on first-turn. The signal
224
+ // is `session.sessionId === null` — meaning Claude SDK hasn't given us a
225
+ // resume token yet for this session. True for: brand-new users, post-/new,
226
+ // and rehydrated sessions where the persisted snapshot lacked a sessionId.
227
+ // After the first SDK turn, Claude resumes via SDK session_id and already
228
+ // carries the recalled context — no need for another search per turn.
222
229
  const chatIdStr = String(ctx.chat.id);
223
230
  const skillContext = buildSkillContext(text);
224
- const systemPrompt = (isSDK
225
- ? buildSystemPrompt(isSDK, session.language, chatIdStr)
226
- : await buildSmartSystemPrompt(isSDK, session.language, text, chatIdStr)) + skillContext;
231
+ const isFirstSDKTurn = isSDK && session.sessionId === null;
232
+ const systemPrompt = (await buildSmartSystemPrompt(isSDK, session.language, text, chatIdStr, isFirstSDKTurn)) + skillContext;
227
233
  // Track the user turn in history regardless of provider type. This keeps
228
234
  // the fallback path (Ollama etc.) aware of what was said on SDK turns.
229
235
  addToHistory(userId, { role: "user", content: text });
@@ -279,6 +285,11 @@ export async function handleMessage(ctx) {
279
285
  };
280
286
  // Stream response from provider (with fallback)
281
287
  let lastBroadcastLen = 0;
288
+ // Captured during tool_use chunks; consumed by tool_result chunks so
289
+ // the async-agent watcher can label pending agents with their human-
290
+ // readable description (which only appears in the tool_use input,
291
+ // not in the tool_result text). See Fix #17 Stage 2.
292
+ let lastAgentToolUseInput;
282
293
  for await (const chunk of registry.queryWithFallback(queryOpts)) {
283
294
  // Any chunk is progress — reset the stuck timer.
284
295
  resetStuckTimer();
@@ -309,13 +320,14 @@ export async function handleMessage(ctx) {
309
320
  if (chunk.toolName) {
310
321
  session.toolUseCount++;
311
322
  const icon = TOOL_ICONS[chunk.toolName] || "🔧";
312
- // Special treatment for Claude's SDK-internal Task tool:
323
+ // Special treatment for Claude's SDK-internal Task/Agent tool:
313
324
  // track how many sub-tasks Claude delegated and surface the
314
325
  // task description in the status line so the user sees WHAT
315
- // is being delegated, not just "Task…".
316
- if (chunk.toolName === "Task") {
326
+ // is being delegated, not just "Task…". The tool was renamed
327
+ // from "Task" to "Agent" in Claude Code v2.1.63 — match both.
328
+ if (chunk.toolName === "Task" || chunk.toolName === "Agent") {
317
329
  session.sdkSubTaskCount++;
318
- let label = "Task";
330
+ let label = chunk.toolName;
319
331
  if (chunk.toolInput) {
320
332
  try {
321
333
  const parsed = JSON.parse(chunk.toolInput);
@@ -324,11 +336,18 @@ export async function handleMessage(ctx) {
324
336
  const desc = parsed.description.length > 80
325
337
  ? parsed.description.slice(0, 80) + "…"
326
338
  : parsed.description;
327
- label = `Task: ${desc}`;
339
+ label = `${chunk.toolName}: ${desc}`;
328
340
  }
329
341
  else if (parsed.subagent_type) {
330
- label = `Task (${parsed.subagent_type})`;
342
+ label = `${chunk.toolName} (${parsed.subagent_type})`;
331
343
  }
344
+ // Capture the description+prompt for the upcoming
345
+ // tool_result. Used by Fix #17 Stage 2 to label
346
+ // background agents in the watcher's delivery banner.
347
+ lastAgentToolUseInput = {
348
+ description: parsed.description,
349
+ prompt: parsed.prompt,
350
+ };
332
351
  }
333
352
  catch {
334
353
  // not JSON — keep generic label
@@ -341,6 +360,20 @@ export async function handleMessage(ctx) {
341
360
  }
342
361
  }
343
362
  break;
363
+ case "tool_result":
364
+ // Fix #17 Stage 2: detect Agent async_launched payloads and
365
+ // hand them off to the async-agent watcher. The watcher will
366
+ // poll the outputFile and deliver the result as a separate
367
+ // Telegram message when the background agent finishes.
368
+ handleToolResultChunk(chunk, {
369
+ chatId: ctx.chat.id,
370
+ userId,
371
+ lastToolUseInput: lastAgentToolUseInput,
372
+ });
373
+ // Reset the captured input — only the immediately following
374
+ // tool_result should consume it.
375
+ lastAgentToolUseInput = undefined;
376
+ break;
344
377
  case "done":
345
378
  if (chunk.sessionId)
346
379
  session.sessionId = chunk.sessionId;
@@ -9,7 +9,7 @@
9
9
  import fs from "fs";
10
10
  import { getSession, addToHistory, trackProviderUsage } from "../services/session.js";
11
11
  import { getRegistry } from "../engine.js";
12
- import { buildSystemPrompt, buildSmartSystemPrompt } from "../services/personality.js";
12
+ import { buildSmartSystemPrompt } from "../services/personality.js";
13
13
  import { buildSkillContext } from "../services/skills.js";
14
14
  import { touchProfile } from "../services/users.js";
15
15
  import { trackAndAdapt } from "../services/language-detect.js";
@@ -129,9 +129,9 @@ export async function handlePlatformMessage(msg, adapter) {
129
129
  const activeProvider = registry.getActive();
130
130
  const isSDK = activeProvider.config.type === "claude-sdk";
131
131
  const skillContext = buildSkillContext(fullText);
132
- const systemPrompt = (isSDK
133
- ? buildSystemPrompt(isSDK, session.language, msg.chatId)
134
- : await buildSmartSystemPrompt(isSDK, session.language, fullText, msg.chatId)) + skillContext;
132
+ // v4.11.0 P0 #3 — SDK gets semantic recall on first turn (when no resume token yet).
133
+ const isFirstSDKTurn = isSDK && session.sessionId === null;
134
+ const systemPrompt = (await buildSmartSystemPrompt(isSDK, session.language, fullText, msg.chatId, isFirstSDKTurn)) + skillContext;
135
135
  const queryOpts = {
136
136
  prompt: fullText,
137
137
  systemPrompt,
package/dist/index.js CHANGED
@@ -78,7 +78,9 @@ import { loadPlugins, registerPluginCommands, unloadPlugins } from "./services/p
78
78
  import { initMCP, disconnectMCP, hasMCPConfig } from "./services/mcp.js";
79
79
  import { startWebServer, stopWebServer } from "./web/server.js";
80
80
  import { startScheduler, stopScheduler, setNotifyCallback } from "./services/cron.js";
81
- import { startSessionCleanup, stopSessionCleanup } from "./services/session.js";
81
+ import { startWatcher as startAsyncAgentWatcher, stopWatcher as stopAsyncAgentWatcher } from "./services/async-agent-watcher.js";
82
+ import { startSessionCleanup, stopSessionCleanup, attachPersistHook } from "./services/session.js";
83
+ import { loadPersistedSessions, flushSessions, schedulePersist, } from "./services/session-persistence.js";
82
84
  import { processQueue, cleanupQueue, setSenders, enqueue } from "./services/delivery-queue.js";
83
85
  import { discoverTools } from "./services/tool-discovery.js";
84
86
  import { startHeartbeat } from "./services/heartbeat.js";
@@ -254,7 +256,12 @@ const shutdown = async () => {
254
256
  await cancelAllSubAgents(true);
255
257
  stopWatchdog();
256
258
  stopScheduler();
259
+ stopAsyncAgentWatcher();
257
260
  stopSessionCleanup();
261
+ // v4.11.0 — Final immediate flush of in-memory sessions to disk before exit.
262
+ // The debounced timer might be pending; flushSessions() cancels it and writes
263
+ // synchronously so the next boot can rehydrate the latest state.
264
+ await flushSessions().catch((err) => console.warn("[shutdown] flushSessions failed:", err));
258
265
  if (queueInterval)
259
266
  clearInterval(queueInterval);
260
267
  if (queueCleanupInterval)
@@ -420,9 +427,21 @@ setNotifyCallback(async (target, text) => {
420
427
  enqueue(target.platform, String(target.chatId), text);
421
428
  });
422
429
  startScheduler();
430
+ // Start the async-agent watcher (Fix #17 Stage 2). Polls outputFiles
431
+ // of background sub-agents Claude launched with run_in_background and
432
+ // delivers their completed reports as separate Telegram messages.
433
+ // Loads any persisted pending agents from disk on boot.
434
+ startAsyncAgentWatcher();
423
435
  // Session memory hygiene: purge sessions idle > 7 days (configurable via
424
436
  // ALVIN_SESSION_TTL_DAYS). Never touches active sessions — see session.ts.
425
437
  startSessionCleanup();
438
+ // Session persistence (v4.11.0): wire the debounced persist hook BEFORE we
439
+ // load the snapshot, then rehydrate the in-memory Map from disk so users'
440
+ // Claude SDK session_id, conversation history, language and effort all
441
+ // survive bot restarts. Without this, every launchctl restart turns the
442
+ // bot into a goldfish for every active conversation.
443
+ attachPersistHook(schedulePersist);
444
+ loadPersistedSessions();
426
445
  // Wire delivery queue senders
427
446
  setSenders({
428
447
  telegram: async (chatId, content) => {
package/dist/paths.js CHANGED
@@ -41,8 +41,15 @@ export const TOOLS_EXAMPLE_JSON = resolve(BOT_ROOT, "docs", "tools.example.json"
41
41
  export const ENV_FILE = resolve(DATA_DIR, ".env");
42
42
  /** memory/ — Daily logs and embeddings */
43
43
  export const MEMORY_DIR = resolve(DATA_DIR, "memory");
44
- /** memory/MEMORY.md — Long-term curated memory */
44
+ /** memory/MEMORY.md — Long-term curated memory (legacy monolithic, still loaded) */
45
45
  export const MEMORY_FILE = resolve(DATA_DIR, "memory", "MEMORY.md");
46
+ /** memory/identity.md — L0 layer (v4.11.0): core user facts, always loaded.
47
+ * Optional. If missing, MEMORY.md acts as the L0+L1 fallback. */
48
+ export const IDENTITY_FILE = resolve(DATA_DIR, "memory", "identity.md");
49
+ /** memory/preferences.md — L1 layer (v4.11.0): communication style + don'ts. */
50
+ export const PREFERENCES_FILE = resolve(DATA_DIR, "memory", "preferences.md");
51
+ /** memory/projects/ — L2 layer (v4.11.0): per-project context loaded on topic match. */
52
+ export const PROJECTS_MEMORY_DIR = resolve(DATA_DIR, "memory", "projects");
46
53
  /** memory/.embeddings.json — Vector index */
47
54
  export const EMBEDDINGS_IDX = resolve(DATA_DIR, "memory", ".embeddings.json");
48
55
  /** users/ — User profiles and per-user memory */
@@ -62,6 +69,16 @@ export const SUDO_ENC_FILE = resolve(DATA_DIR, "data", ".sudo-enc");
62
69
  export const SUDO_KEY_FILE = resolve(DATA_DIR, "data", ".sudo-key");
63
70
  /** backups/ — Config snapshots */
64
71
  export const BACKUP_DIR = resolve(DATA_DIR, "backups");
72
+ /** state/async-agents.json — Pending background SDK agents (Fix #17 Stage 2).
73
+ * See src/services/async-agent-watcher.ts for the watcher that polls and
74
+ * delivers these. Survives bot restarts. */
75
+ export const ASYNC_AGENTS_STATE_FILE = resolve(DATA_DIR, "state", "async-agents.json");
76
+ /** state/sessions.json — Persisted user sessions across bot restarts (v4.11.0).
77
+ * Includes: sessionId (Claude SDK resume token), language, effort, voiceReply,
78
+ * workingDir, lastActivity, lastSdkHistoryIndex, history (capped). Atomic write
79
+ * via tmp+rename. Loaded on startup, debounce-flushed on mutations.
80
+ * See src/services/session-persistence.ts for the loader/flusher. */
81
+ export const SESSIONS_STATE_FILE = resolve(DATA_DIR, "state", "sessions.json");
65
82
  /** soul.md — Bot personality */
66
83
  export const SOUL_FILE = resolve(DATA_DIR, "soul.md");
67
84
  /** tools.md — Custom tool definitions (Markdown) */
@@ -186,6 +186,49 @@ export class ClaudeSDKProvider {
186
186
  }
187
187
  }
188
188
  }
189
+ // User message — tool_results from the Claude API arrive as user
190
+ // messages in the SDK protocol. We surface tool_result blocks as
191
+ // chunks so the message handler can detect Agent async_launched
192
+ // payloads and register them with the watcher (Fix #17 Stage 2).
193
+ if (message.type === "user") {
194
+ // eslint-disable-next-line @typescript-eslint/no-explicit-any
195
+ const userMsg = message;
196
+ const content = userMsg.message?.content;
197
+ if (Array.isArray(content)) {
198
+ for (const block of content) {
199
+ if (block &&
200
+ typeof block === "object" &&
201
+ block.type === "tool_result" &&
202
+ typeof block.tool_use_id === "string") {
203
+ // The `content` field on a tool_result block can be a
204
+ // plain string OR an array of content blocks. Normalize
205
+ // to a single string so the chunk consumer doesn't need
206
+ // to know about the SDK shape.
207
+ let contentText = "";
208
+ if (typeof block.content === "string") {
209
+ contentText = block.content;
210
+ }
211
+ else if (Array.isArray(block.content)) {
212
+ contentText = block.content
213
+ .map((c) => {
214
+ if (c && typeof c === "object" && "text" in c) {
215
+ const t = c.text;
216
+ return typeof t === "string" ? t : "";
217
+ }
218
+ return "";
219
+ })
220
+ .join("");
221
+ }
222
+ yield {
223
+ type: "tool_result",
224
+ toolUseId: block.tool_use_id,
225
+ toolResultContent: contentText,
226
+ sessionId: capturedSessionId,
227
+ };
228
+ }
229
+ }
230
+ }
231
+ }
189
232
  // Result — done (extract full usage including cache tokens)
190
233
  if (message.type === "result") {
191
234
  const resultMsg = message;