alvin-bot 4.10.0 → 4.11.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,95 @@
 
 All notable changes to Alvin Bot are documented here.
 
+ ## [4.11.0] — 2026-04-13
+
+ ### 🧠 Memory Persistence + Smart Loading — sessions survive restart, memory is layered
+
+ A colleague asked the same day v4.10.0 shipped: *"Memory after session restart is also a bit fiddly. I installed mempalace as a workaround — maybe build something like that natively."* He was right. Alvin had a hand-curated `MEMORY.md`, a 128 MB embeddings vector index, and an AI-powered compaction service — but **the in-memory `sessions` Map was wiped on every bot restart**. On the next user message the Claude SDK then started a fresh conversation, behaving like a goldfish despite all that memory infrastructure on disk.
+
+ This release fixes that with **five complementary tasks**, all bundled into v4.11.0: three core fixes (P0) plus two structural improvements (P1) inspired by mempalace's L0–L3 stack and Mem0's auto-extraction pattern.
+
+ #### P0 #1 — Session Persistence (`src/services/session-persistence.ts`, NEW)
+
+ The core fix. The `sessions` Map in `src/services/session.ts` was in-memory only; every `launchctl kickstart` wiped every user's `sessionId`, history, language, effort, voiceReply, and tracking counters.
+
+ - **Debounced flush** (1.5 s coalesce window) writes a sanitized snapshot of `getAllSessions()` to `~/.alvin-bot/state/sessions.json` via atomic tmp+rename.
+ - **`loadPersistedSessions()`** rehydrates the Map at bot startup; `flushSessions()` flushes synchronously on graceful shutdown (SIGINT/SIGTERM).
+ - **`attachPersistHook()` / `markSessionDirty()`** in `session.ts` give handlers a callback that triggers a persist after direct mutations (`/lang`, `/effort`, `/voice`). `addToHistory()` and `trackProviderUsage()` trigger it automatically.
+ - History is capped at `MAX_PERSISTED_HISTORY = 50` entries per session so the file stays small.
+ - Runtime-only fields (`abortController`, `isProcessing`, `messageQueue`) are stripped before persisting.
+ - Schema drift is handled: missing fields fall back to defaults, corrupt JSON loads zero sessions, and a null root is rejected gracefully.
+ - **9 unit tests** + **18 stress tests** covering a 100-session burst, 1000-mutation debounce coalescing, unicode (RTL/ZWJ/astral plane), atomic-write recovery from a stale `.tmp`, schema drift, hostile JSON, a read-only filesystem, and a simulated bot restart.
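The new `src/services/session-persistence.ts` itself is not shown in this diff (only its `dist/` consumers are). A minimal sketch of the debounce + tmp+rename pattern the bullets describe might look like the following — `STATE_FILE`, the `Session` shape, and the scratch directory are illustrative stand-ins, not the real module:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";
import * as os from "node:os";

// Illustrative location — the real bot writes ~/.alvin-bot/state/sessions.json.
const STATE_FILE = path.join(fs.mkdtempSync(path.join(os.tmpdir(), "alvin-demo-")), "sessions.json");
const DEBOUNCE_MS = 1500; // the 1.5 s coalesce window described above

type Session = Record<string, unknown>;
const sessions = new Map<number, Session>();
let timer: ReturnType<typeof setTimeout> | null = null;

// Strip runtime-only fields and cap persisted history, per the bullets above.
function sanitize(s: Session): Session {
  const { abortController, isProcessing, messageQueue, ...rest } = s as Record<string, any>;
  if (Array.isArray(rest.history)) rest.history = rest.history.slice(-50); // MAX_PERSISTED_HISTORY
  return rest;
}

// Atomic flush: write a tmp sibling, then rename over the real file, so a
// crash mid-write can never leave a half-written sessions.json behind.
export function flushSessions(): void {
  if (timer) { clearTimeout(timer); timer = null; }
  const snapshot = Object.fromEntries([...sessions].map(([id, s]) => [id, sanitize(s)]));
  const tmp = STATE_FILE + ".tmp";
  fs.writeFileSync(tmp, JSON.stringify(snapshot));
  fs.renameSync(tmp, STATE_FILE);
}

// Debounced persist: rapid mutations coalesce into a single disk write.
export function schedulePersist(): void {
  if (timer) clearTimeout(timer);
  timer = setTimeout(flushSessions, DEBOUNCE_MS);
}

// Rehydrate at startup; corrupt or missing JSON just yields zero sessions.
export function loadPersistedSessions(): number {
  try {
    const raw = JSON.parse(fs.readFileSync(STATE_FILE, "utf-8"));
    if (!raw || typeof raw !== "object") return 0;
    for (const [id, s] of Object.entries(raw)) sessions.set(Number(id), s as Session);
    return sessions.size;
  } catch {
    return 0;
  }
}
```

Handlers would call `schedulePersist()` after mutations, while shutdown calls `flushSessions()` directly so a pending debounce window is never lost.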
+
+ #### P0 #2 — MEMORY.md Auto-Inject for SDK (`src/services/personality.ts`)
+
+ Before v4.11.0, only non-SDK providers (Groq, Gemini, NVIDIA) got `buildMemoryContext()` injected into their system prompt. The Claude SDK was *expected* to read memory files via tools, but in practice it rarely did unless the user's first message specifically prompted it.
+
+ - Drops the `!isSDK` guard around `buildMemoryContext()` and asset-index injection.
+ - The SDK now gets the same compact memory context (MEMORY.md plus today's and yesterday's daily logs) on every turn — the context non-SDK providers have had since 4.0.
+ - **3 unit tests** verifying that the SDK prompt includes the memory section, a non-SDK regression check, and graceful behavior when MEMORY.md is missing.
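The `personality.ts` change itself is not part of this diff; conceptually it is just removing a guard. A toy sketch, where the function names and strings are stand-ins rather than the real API:

```typescript
// Before v4.11.0 (simplified): the memory context was guarded by `!isSDK`,
// so Claude SDK prompts shipped without it.
function buildPromptOld(isSDK: boolean, base: string, memory: string): string {
  return !isSDK ? base + "\n" + memory : base;
}

// After: the guard is gone — every provider type gets the same memory section.
function buildPromptNew(_isSDK: boolean, base: string, memory: string): string {
  return base + "\n" + memory;
}
```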
+
+ #### P0 #3 — Semantic Recall on SDK First Turn (`src/services/personality.ts`, `src/handlers/message.ts`, `src/handlers/platform-message.ts`)
+
+ `buildSmartSystemPrompt()` now accepts an `isFirstTurn` flag. For SDK providers it runs the embeddings-based `searchMemory()` only on the first turn (`session.sessionId === null`, meaning Claude hasn't handed us a resume token yet for this session). After the first turn Claude carries the recalled context inside the SDK session via resume, so hitting the embeddings API on every subsequent turn would be wasted work. Non-SDK providers still run the search on every turn (they have no resume mechanism).
+
+ - `handlers/message.ts` and `handlers/platform-message.ts` now compute `isFirstSDKTurn = isSDK && session.sessionId === null` and pass it through.
+ - The bare `buildSystemPrompt` calls on the SDK paths are gone — `buildSmartSystemPrompt` is the single entry point.
+ - **5 mocked-search tests** covering call-count semantics for SDK first/later turns, non-SDK every-turn behavior, the missing-`userMessage` skip, and graceful failure when `searchMemory` throws.
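Since `personality.ts` is not included in the diff, here is a rough sketch of the gating logic this section describes — `searchMemory`, the prompt strings, and the signatures are stand-ins, not the real implementation:

```typescript
// Stand-in for the embeddings-based recall; the real searchMemory lives in the
// bot's memory service and is not shown in this diff.
let searchCalls = 0;
async function searchMemory(query: string): Promise<string> {
  searchCalls++;
  return `recalled context for: ${query}`;
}

// Sketch of the gating described above: SDK providers run recall only while
// there is no resume token yet; non-SDK providers run it on every turn.
async function buildSmartSystemPrompt(
  isSDK: boolean,
  userMessage: string,
  isFirstTurn: boolean,
): Promise<string> {
  const base = "You are Alvin."; // placeholder for the real personality prompt
  const shouldSearch = isSDK ? isFirstTurn : true;
  if (!shouldSearch || !userMessage) return base;
  try {
    return base + "\n" + (await searchMemory(userMessage));
  } catch {
    return base; // recall failure must never break the reply path
  }
}
```

Callers would pass `isSDK && session.sessionId === null` as the flag, matching the handler wiring described above.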
+
+ #### P1 #4 — Layered Memory Loader (`src/services/memory-layers.ts`, NEW)
+
+ Inspired by mempalace's L0–L3 stack. Replaces the monolithic `MEMORY.md → system prompt` injection with a structured, token-budgeted layered loader:
+
+ - **L0** `~/.alvin-bot/memory/identity.md` — always loaded, ~200 tokens (core user facts: name, location, family, contact)
+ - **L1** `~/.alvin-bot/memory/preferences.md` — always loaded (communication style, do's and don'ts)
+ - **L1** `~/.alvin-bot/memory/MEMORY.md` — backwards-compat: existing curated knowledge (full content if no split files exist; truncated to 1500 chars when split files coexist)
+ - **L2** `~/.alvin-bot/memory/projects/*.md` — loaded only when the user's incoming query mentions the project topic (substring match or keyword overlap with the first 200 chars)
+ - **L3** daily logs — still handled by the `embeddings.ts` vector search (unchanged)
+
+ The split is **opt-in**: if `identity.md` and `preferences.md` don't exist, the loader falls back to the monolithic MEMORY.md exactly as before, so existing users need no migration. Users who want the cleaner layout can split MEMORY.md manually and the loader picks it up automatically. Token budget: L0+L1 capped at 5000 chars (~1300 tokens); L2 capped at 3000 chars total (~750 tokens, max 1500 per matched project file). A new `query` parameter on `buildSystemPrompt()` and `buildMemoryContext()` propagates the user message all the way through. **9 unit tests** + 2 layered-context stress tests.
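Assuming the default data-dir layout listed above, the opt-in split is just three files. A scratch-dir illustration (file contents are placeholders, not real memory):

```shell
# Illustrative layout only — shown in a scratch dir, not the real ~/.alvin-bot.
MEMDIR="$(mktemp -d)/memory"
mkdir -p "$MEMDIR/projects"

# L0: always-loaded core facts (~200 tokens).
printf '# Identity\n- Name: (placeholder)\n' > "$MEMDIR/identity.md"
# L1: always-loaded communication preferences.
printf '# Preferences\n- Short answers\n' > "$MEMDIR/preferences.md"
# L2: one file per project; the filename doubles as the match topic.
printf '# alvin-bot notes\n' > "$MEMDIR/projects/alvin-bot.md"

ls "$MEMDIR"
```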
+
+ #### P1 #5 — Auto-Fact-Extraction in Compaction (`src/services/memory-extractor.ts`, NEW)
+
+ Inspired by Mem0's auto-extraction. When `compactSession()` archives old messages, it now runs an additional extraction pass that pulls structured facts (`user_facts`, `preferences`, `decisions`) out of the archived chunk via the active AI provider and appends them to MEMORY.md.
+
+ - **`parseExtractedFacts(text)`** — tolerates JSON wrapped in markdown code fences, surrounding prose, null/undefined fields, and non-string entries.
+ - **`appendFactsToMemoryFile(facts)`** — exact-string dedup against existing MEMORY.md content; output is structured under an `## Auto-extracted (YYYY-MM-DD)` header with `### User Facts` / `### Preferences` / `### Decisions` sub-sections.
+ - **`extractAndStoreFacts(chunk)`** — safe wrapper, never throws. Opt-out via the `MEMORY_EXTRACTION_DISABLED=1` env var. Uses effort=low to minimize cost. Skips short input (<50 chars). Provider failures are swallowed; compaction always continues.
+ - Wired into `compactSession()` after the daily-log flush, before the AI summary generation.
+ - Marked **experimental** in v4.11.0. Semantic dedup (vs. the current exact-string match) is deferred to v4.12+.
+ - **11 unit tests** covering JSON-parsing edge cases, dedup, opt-out, the short-input skip, garbage input, non-string filtering, and graceful provider-failure handling.
+
+ #### Architecture decisions
+
+ - **mempalace as MCP server: rejected.** Considered installing mempalace as a Python MCP service. Rejected because (1) Alvin is all-TypeScript, and adding a second Python service to launchd is operational complexity; (2) Alvin already has an embeddings vector index — mempalace would be a parallel duplicate; (3) mempalace's MCP tools are only consumed by the SDK, so cron jobs, sub-agents, and non-SDK providers wouldn't see them. Conclusion: **adopt the patterns natively** (L0–L3 layering, AAAK-style structured extraction) rather than running a second service.
+ - **SQLite migration deferred.** The 128 MB JSON embeddings index is a known performance issue and is already noted in `~/.claude/projects/-Users-alvin-de/memory/project_alvinbot_sqlite_migration.md` for v4.12+. Orthogonal to the "fiddly after restart" UX problem this release targets.
+ - **Multi-user isolation deferred.** Memories are still global per data dir. Single-user use case, so not a privacy concern for Ali's setup.
+ - **Decay/aging deferred.** Daily logs grow monotonically. Will be addressed alongside the SQLite migration.
+
+ #### Testing
+
+ **292 tests total** (237 baseline + 55 new). All green. TSC clean.
+
+ - 9 session-persistence unit tests
+ - 8 SDK memory-injection tests (3 base + 5 smart-prompt mocked-search)
+ - 9 memory-layers tests (loader + topic match + token budget)
+ - 11 memory-extractor tests (parse + append + extract pipeline)
+ - 18 stress tests (100 sessions, schema drift, unicode, atomic recovery, hostile JSON, simulated restart)
+
+ **Live verification:**
+ - `tmp/live-stress-memory.mjs` — 50 fake sessions against the built `dist/`, with the real `~/.alvin-bot/memory/MEMORY.md` as the L1 source and a simulated restart via Map clear + reload. Result: 215 KB state file, 1 ms flush, 1 ms reload, 50/50 perfect round-trip.
+ - `tmp/live-edge-cases.mjs` — 7 hostile scenarios: all-null fields, a 1000-burst debounce (2 ms), 20 concurrent flushes, extreme unicode (RTL + ZWJ + astral plane), 4-layer memory with a project topic match, atomic-write recovery from a stale `.tmp`, and empty-project-file skipping. All passed.
+
+ #### Files changed
+
+ - **NEW:** `src/services/session-persistence.ts`, `src/services/memory-layers.ts`, `src/services/memory-extractor.ts`
+ - **NEW tests:** `test/session-persistence.test.ts`, `test/memory-sdk-injection.test.ts`, `test/memory-layers.test.ts`, `test/memory-extractor.test.ts`, `test/memory-stress-restart.test.ts`
+ - **Modified:** `src/services/session.ts` (persist hook), `src/services/personality.ts` (SDK injection + isFirstTurn), `src/services/memory.ts` (use layered loader), `src/services/compaction.ts` (extractor hook), `src/handlers/message.ts` + `src/handlers/platform-message.ts` (smart-prompt wiring), `src/handlers/commands.ts` (`markSessionDirty` calls), `src/index.ts` (load + flush wiring), `src/paths.ts` (4 new constants)
+ - **Plan:** `docs/superpowers/plans/2026-04-13-memory-persistence.md`
+
+ ---
+
 ## [4.10.0] — 2026-04-13
 
 ### 🚀 Async sub-agents — main session no longer blocks during long tasks
package/dist/handlers/commands.js CHANGED
@@ -2,7 +2,7 @@ import { InlineKeyboard, InputFile } from "grammy";
 import fs from "fs";
 import path, { resolve } from "path";
 import os from "os";
- import { getSession, resetSession } from "../services/session.js";
+ import { getSession, resetSession, markSessionDirty } from "../services/session.js";
 import { getRegistry } from "../engine.js";
 import { reloadSoul } from "../services/personality.js";
 import { parseDuration, createReminder, listReminders, cancelReminder } from "../services/reminders.js";
@@ -399,6 +399,7 @@ export function registerCommands(bot) {
 const userId = ctx.from.id;
 const session = getSession(userId);
 session.voiceReply = !session.voiceReply;
+ markSessionDirty(userId);
 await ctx.reply(session.voiceReply
 ? "Voice replies enabled. Responses will also be sent as voice messages."
 : "Voice replies disabled. Text-only responses.");
@@ -421,6 +422,7 @@ export function registerCommands(bot) {
 return;
 }
 session.effort = level;
+ markSessionDirty(userId);
 await ctx.reply(`✅ Effort: ${EFFORT_LABELS[session.effort]}`);
 });
 // Inline keyboard callback for effort switching
@@ -433,6 +435,7 @@ export function registerCommands(bot) {
 const userId = ctx.from.id;
 const session = getSession(userId);
 session.effort = level;
+ markSessionDirty(userId);
 const keyboard = new InlineKeyboard();
 for (const [key, label] of Object.entries(EFFORT_LABELS)) {
 const marker = key === session.effort ? "✅ " : "";
@@ -827,6 +830,7 @@ export function registerCommands(bot) {
 }
 else if (arg === "en" || arg === "de" || arg === "es" || arg === "fr") {
 session.language = arg;
+ markSessionDirty(userId);
 const { setExplicitLanguage } = await import("../services/language-detect.js");
 setExplicitLanguage(userId, arg);
 await ctx.reply(t("bot.lang.setFixed", arg, { name: LOCALE_NAMES[arg] }));
@@ -851,6 +855,7 @@ export function registerCommands(bot) {
 }
 const newLang = choice;
 session.language = newLang;
+ markSessionDirty(userId);
 const { setExplicitLanguage } = await import("../services/language-detect.js");
 setExplicitLanguage(userId, newLang);
 const currentName = `${LOCALE_FLAGS[newLang]} ${LOCALE_NAMES[newLang]}`;
package/dist/handlers/message.js CHANGED
@@ -4,7 +4,7 @@ import { getSession, addToHistory, trackProviderUsage, buildSessionKey } from ".
 import { TelegramStreamer } from "../services/telegram.js";
 import { getRegistry } from "../engine.js";
 import { textToSpeech } from "../services/voice.js";
- import { buildSystemPrompt, buildSmartSystemPrompt } from "../services/personality.js";
+ import { buildSmartSystemPrompt } from "../services/personality.js";
 import { buildSkillContext } from "../services/skills.js";
 import { isForwardingAllowed } from "../services/access.js";
 import { touchProfile } from "../services/users.js";
@@ -219,12 +219,17 @@ export async function handleMessage(ctx) {
 if (adaptedLang !== session.language) {
 session.language = adaptedLang;
 }
- // Build query options (with semantic memory search for non-SDK + skill injection)
+ // Build query options (with semantic memory search for non-SDK + skill injection).
+ // v4.11.0 P0 #3: SDK now also gets semantic recall on first-turn. The signal
+ // is `session.sessionId === null` — meaning Claude SDK hasn't given us a
+ // resume token yet for this session. True for: brand-new users, post-/new,
+ // and rehydrated sessions where the persisted snapshot lacked a sessionId.
+ // After the first SDK turn, Claude resumes via SDK session_id and already
+ // carries the recalled context — no need for another search per turn.
 const chatIdStr = String(ctx.chat.id);
 const skillContext = buildSkillContext(text);
- const systemPrompt = (isSDK
- ? buildSystemPrompt(isSDK, session.language, chatIdStr)
- : await buildSmartSystemPrompt(isSDK, session.language, text, chatIdStr)) + skillContext;
+ const isFirstSDKTurn = isSDK && session.sessionId === null;
+ const systemPrompt = (await buildSmartSystemPrompt(isSDK, session.language, text, chatIdStr, isFirstSDKTurn)) + skillContext;
 // Track the user turn in history regardless of provider type. This keeps
 // the fallback path (Ollama etc.) aware of what was said on SDK turns.
 addToHistory(userId, { role: "user", content: text });
package/dist/handlers/platform-message.js CHANGED
@@ -9,7 +9,7 @@
 import fs from "fs";
 import { getSession, addToHistory, trackProviderUsage } from "../services/session.js";
 import { getRegistry } from "../engine.js";
- import { buildSystemPrompt, buildSmartSystemPrompt } from "../services/personality.js";
+ import { buildSmartSystemPrompt } from "../services/personality.js";
 import { buildSkillContext } from "../services/skills.js";
 import { touchProfile } from "../services/users.js";
 import { trackAndAdapt } from "../services/language-detect.js";
@@ -129,9 +129,9 @@ export async function handlePlatformMessage(msg, adapter) {
 const activeProvider = registry.getActive();
 const isSDK = activeProvider.config.type === "claude-sdk";
 const skillContext = buildSkillContext(fullText);
- const systemPrompt = (isSDK
- ? buildSystemPrompt(isSDK, session.language, msg.chatId)
- : await buildSmartSystemPrompt(isSDK, session.language, fullText, msg.chatId)) + skillContext;
+ // v4.11.0 P0 #3 — SDK gets semantic recall on first turn (when no resume token yet).
+ const isFirstSDKTurn = isSDK && session.sessionId === null;
+ const systemPrompt = (await buildSmartSystemPrompt(isSDK, session.language, fullText, msg.chatId, isFirstSDKTurn)) + skillContext;
 const queryOpts = {
 prompt: fullText,
 systemPrompt,
package/dist/index.js CHANGED
@@ -79,7 +79,8 @@ import { initMCP, disconnectMCP, hasMCPConfig } from "./services/mcp.js";
 import { startWebServer, stopWebServer } from "./web/server.js";
 import { startScheduler, stopScheduler, setNotifyCallback } from "./services/cron.js";
 import { startWatcher as startAsyncAgentWatcher, stopWatcher as stopAsyncAgentWatcher } from "./services/async-agent-watcher.js";
- import { startSessionCleanup, stopSessionCleanup } from "./services/session.js";
+ import { startSessionCleanup, stopSessionCleanup, attachPersistHook } from "./services/session.js";
+ import { loadPersistedSessions, flushSessions, schedulePersist, } from "./services/session-persistence.js";
 import { processQueue, cleanupQueue, setSenders, enqueue } from "./services/delivery-queue.js";
 import { discoverTools } from "./services/tool-discovery.js";
 import { startHeartbeat } from "./services/heartbeat.js";
@@ -257,6 +258,10 @@ const shutdown = async () => {
 stopScheduler();
 stopAsyncAgentWatcher();
 stopSessionCleanup();
+ // v4.11.0 — Final immediate flush of in-memory sessions to disk before exit.
+ // The debounced timer might be pending; flushSessions() cancels it and writes
+ // synchronously so the next boot can rehydrate the latest state.
+ await flushSessions().catch((err) => console.warn("[shutdown] flushSessions failed:", err));
 if (queueInterval)
 clearInterval(queueInterval);
 if (queueCleanupInterval)
@@ -430,6 +435,13 @@ startAsyncAgentWatcher();
 // Session memory hygiene: purge sessions idle > 7 days (configurable via
 // ALVIN_SESSION_TTL_DAYS). Never touches active sessions — see session.ts.
 startSessionCleanup();
+ // Session persistence (v4.11.0): wire the debounced persist hook BEFORE we
+ // load the snapshot, then rehydrate the in-memory Map from disk so users'
+ // Claude SDK session_id, conversation history, language and effort all
+ // survive bot restarts. Without this, every launchctl restart turns the
+ // bot into a goldfish for every active conversation.
+ attachPersistHook(schedulePersist);
+ loadPersistedSessions();
 // Wire delivery queue senders
 setSenders({
 telegram: async (chatId, content) => {
package/dist/paths.js CHANGED
@@ -41,8 +41,15 @@ export const TOOLS_EXAMPLE_JSON = resolve(BOT_ROOT, "docs", "tools.example.json"
 export const ENV_FILE = resolve(DATA_DIR, ".env");
 /** memory/ — Daily logs and embeddings */
 export const MEMORY_DIR = resolve(DATA_DIR, "memory");
- /** memory/MEMORY.md — Long-term curated memory */
+ /** memory/MEMORY.md — Long-term curated memory (legacy monolithic, still loaded) */
 export const MEMORY_FILE = resolve(DATA_DIR, "memory", "MEMORY.md");
+ /** memory/identity.md — L0 layer (v4.11.0): core user facts, always loaded.
+ * Optional. If missing, MEMORY.md acts as the L0+L1 fallback. */
+ export const IDENTITY_FILE = resolve(DATA_DIR, "memory", "identity.md");
+ /** memory/preferences.md — L1 layer (v4.11.0): communication style + don'ts. */
+ export const PREFERENCES_FILE = resolve(DATA_DIR, "memory", "preferences.md");
+ /** memory/projects/ — L2 layer (v4.11.0): per-project context loaded on topic match. */
+ export const PROJECTS_MEMORY_DIR = resolve(DATA_DIR, "memory", "projects");
 /** memory/.embeddings.json — Vector index */
 export const EMBEDDINGS_IDX = resolve(DATA_DIR, "memory", ".embeddings.json");
 /** users/ — User profiles and per-user memory */
@@ -66,6 +73,12 @@ export const BACKUP_DIR = resolve(DATA_DIR, "backups");
 * See src/services/async-agent-watcher.ts for the watcher that polls and
 * delivers these. Survives bot restarts. */
 export const ASYNC_AGENTS_STATE_FILE = resolve(DATA_DIR, "state", "async-agents.json");
+ /** state/sessions.json — Persisted user sessions across bot restarts (v4.11.0).
+ * Includes: sessionId (Claude SDK resume token), language, effort, voiceReply,
+ * workingDir, lastActivity, lastSdkHistoryIndex, history (capped). Atomic write
+ * via tmp+rename. Loaded on startup, debounce-flushed on mutations.
+ * See src/services/session-persistence.ts for the loader/flusher. */
+ export const SESSIONS_STATE_FILE = resolve(DATA_DIR, "state", "sessions.json");
 /** soul.md — Bot personality */
 export const SOUL_FILE = resolve(DATA_DIR, "soul.md");
 /** tools.md — Custom tool definitions (Markdown) */
package/dist/services/compaction.js CHANGED
@@ -65,6 +65,19 @@ export async function compactSession(session) {
 catch (err) {
 console.error("Compaction: failed to flush to memory:", err);
 }
+ // v4.11.0 P1 #5 — Auto-extract structured facts from the archived chunk
+ // and persist them to MEMORY.md. Experimental feature, opt-out via
+ // MEMORY_EXTRACTION_DISABLED=1. Safe wrapper — never throws.
+ try {
+ const { extractAndStoreFacts } = await import("./memory-extractor.js");
+ const result = await extractAndStoreFacts(summaryInput);
+ if (result.factsStored > 0) {
+ console.log(`🧠 memory-extractor: stored ${result.factsStored} new fact(s) in MEMORY.md`);
+ }
+ }
+ catch (err) {
+ console.warn("memory-extractor failed (non-fatal):", err instanceof Error ? err.message : err);
+ }
 // Try AI-powered summary
 let summaryText = null;
 try {
package/dist/services/memory-extractor.js CHANGED
@@ -0,0 +1,178 @@
+ /**
+ * Memory Extractor (v4.11.0, experimental)
+ *
+ * When the compaction service archives old conversation chunks, it normally
+ * dumps prose into the daily log. This extractor adds a structured pass that
+ * pulls user_facts, preferences, and decisions out of the chunk and appends
+ * them to MEMORY.md (de-duplicated by exact-string match).
+ *
+ * Pattern inspired by Mem0's auto-extraction. Designed to be safe:
+ * - Opt-out via MEMORY_EXTRACTION_DISABLED=1
+ * - Uses the active provider with effort=low
+ * - Failures are swallowed; compaction continues regardless
+ * - Dedup is exact-string only (no embedding-based semantic dedup yet)
+ */
+ import fs from "fs";
+ import { dirname } from "path";
+ import { MEMORY_FILE } from "../paths.js";
+ const EMPTY_FACTS = {
+ user_facts: [],
+ preferences: [],
+ decisions: [],
+ };
+ const EXTRACTION_PROMPT = `Extract structured facts from this conversation chunk. Return ONLY a JSON object with these keys:
+
+ {
+ "user_facts": ["concrete facts about the user that should persist forever"],
+ "preferences": ["communication style or workflow preferences the user expressed"],
+ "decisions": ["explicit decisions made (e.g., 'use X instead of Y')"]
+ }
+
+ Rules:
+ - Each entry must be ONE short, declarative sentence (max 100 chars).
+ - Skip transient conversation details (questions, todos, ephemeral state).
+ - Skip facts that are obvious from context (e.g., "user asked a question").
+ - Empty arrays are fine — don't invent facts.
+ - Output ONLY the JSON, no commentary.
+
+ Conversation chunk:
+ `;
+ /**
+ * Parse the JSON output from the AI extractor. Tolerates markdown code-fence
+ * wrapping and surrounding prose. Returns empty arrays on any parse failure.
+ */
+ export function parseExtractedFacts(text) {
+ if (!text || typeof text !== "string")
+ return { ...EMPTY_FACTS };
+ // Strip markdown code fences if present
+ let cleaned = text.trim();
+ const fenceMatch = cleaned.match(/^```(?:json)?\s*\n?([\s\S]*?)\n?```\s*$/);
+ if (fenceMatch)
+ cleaned = fenceMatch[1].trim();
+ // Try to find the first { ... } block if there's surrounding prose
+ const braceMatch = cleaned.match(/\{[\s\S]*\}/);
+ if (braceMatch)
+ cleaned = braceMatch[0];
+ try {
+ const parsed = JSON.parse(cleaned);
+ return {
+ user_facts: Array.isArray(parsed.user_facts)
+ ? parsed.user_facts.filter((s) => typeof s === "string")
+ : [],
+ preferences: Array.isArray(parsed.preferences)
+ ? parsed.preferences.filter((s) => typeof s === "string")
+ : [],
+ decisions: Array.isArray(parsed.decisions)
+ ? parsed.decisions.filter((s) => typeof s === "string")
+ : [],
+ };
+ }
+ catch {
+ return { ...EMPTY_FACTS };
+ }
+ }
+ /**
+ * Append extracted facts to MEMORY.md under structured headers, deduplicated
+ * by exact-string match against existing content.
+ */
+ export async function appendFactsToMemoryFile(facts) {
+ const total = facts.user_facts.length + facts.preferences.length + facts.decisions.length;
+ if (total === 0)
+ return 0;
+ // Read existing content for dedup
+ let existing = "";
+ try {
+ existing = fs.readFileSync(MEMORY_FILE, "utf-8");
+ }
+ catch {
+ // File doesn't exist yet — that's fine, mkdir parent
+ fs.mkdirSync(dirname(MEMORY_FILE), { recursive: true });
+ }
+ const isDuplicate = (line) => existing.includes(line);
+ const newLines = [];
+ const todayIso = new Date().toISOString().slice(0, 10);
+ const sectionHeader = `\n\n## Auto-extracted (${todayIso})\n`;
+ let stored = 0;
+ if (facts.user_facts.length > 0) {
+ const newOnes = facts.user_facts.filter(f => !isDuplicate(f));
+ if (newOnes.length > 0) {
+ newLines.push("\n### User Facts");
+ for (const f of newOnes) {
+ newLines.push(`- ${f}`);
+ stored++;
+ }
+ }
+ }
+ if (facts.preferences.length > 0) {
+ const newOnes = facts.preferences.filter(p => !isDuplicate(p));
+ if (newOnes.length > 0) {
+ newLines.push("\n### Preferences");
+ for (const p of newOnes) {
+ newLines.push(`- ${p}`);
+ stored++;
+ }
+ }
+ }
+ if (facts.decisions.length > 0) {
+ const newOnes = facts.decisions.filter(d => !isDuplicate(d));
+ if (newOnes.length > 0) {
+ newLines.push("\n### Decisions");
+ for (const d of newOnes) {
+ newLines.push(`- ${d}`);
+ stored++;
+ }
+ }
+ }
+ if (stored > 0) {
+ const block = sectionHeader + newLines.join("\n") + "\n";
+ fs.appendFileSync(MEMORY_FILE, block, "utf-8");
+ }
+ return stored;
+ }
+ /**
+ * Extract facts from a conversation chunk and store them in MEMORY.md.
+ * Safe wrapper — never throws, always returns an ExtractionResult.
+ */
+ export async function extractAndStoreFacts(conversationText) {
+ if (process.env.MEMORY_EXTRACTION_DISABLED === "1") {
+ return { disabled: true, factsStored: 0 };
+ }
+ if (!conversationText || conversationText.trim().length < 50) {
+ return { disabled: false, factsStored: 0 };
+ }
+ let extractedText = "";
+ try {
+ // Lazy-import the registry so test environments without an engine init
+ // don't crash on module load.
+ const { getRegistry } = await import("../engine.js");
+ const registry = getRegistry();
+ const opts = {
+ prompt: EXTRACTION_PROMPT + conversationText.slice(0, 8000),
+ systemPrompt: "You are a fact extractor. Output only valid JSON, no commentary.",
+ effort: "low",
+ };
+ for await (const chunk of registry.queryWithFallback(opts)) {
+ if (chunk.type === "text" && chunk.text) {
+ extractedText = chunk.text;
+ }
+ if (chunk.type === "error") {
+ // Provider failed — silent fallback
+ return { disabled: false, factsStored: 0 };
+ }
+ }
+ }
+ catch {
+ return { disabled: false, factsStored: 0 };
+ }
+ if (!extractedText)
+ return { disabled: false, factsStored: 0 };
+ const facts = parseExtractedFacts(extractedText);
+ let stored = 0;
+ try {
+ stored = await appendFactsToMemoryFile(facts);
+ }
+ catch {
+ // appendFactsToMemoryFile failed — non-fatal
+ }
+ return { disabled: false, factsStored: stored };
+ }
package/dist/services/memory-layers.js CHANGED
@@ -0,0 +1,147 @@
+ /**
+ * Memory Layers Service (v4.11.0)
+ *
+ * Layered memory loader inspired by mempalace's L0–L3 stack:
+ *
+ * L0 identity.md always loaded, ~200 tokens (core user facts)
+ * L1 preferences.md always loaded (communication style)
+ * L1 MEMORY.md backwards-compat: monolithic curated knowledge
+ * L2 projects/*.md loaded on topic match against the user's query
+ * L3 daily logs only via vector search (handled by embeddings.ts)
+ *
+ * If neither identity.md nor preferences.md exists, this loader still works
+ * via the monolithic MEMORY.md fallback, so existing setups need no migration.
+ *
+ * Token budget: capped at ~5000 chars for L0+L1, +~3000 chars for matched L2.
+ */
+ import fs from "fs";
+ import path from "path";
+ import { IDENTITY_FILE, PREFERENCES_FILE, PROJECTS_MEMORY_DIR, MEMORY_FILE, } from "../paths.js";
+ const MAX_L0_L1_CHARS = 5000;
+ const MAX_L2_PROJECT_CHARS = 1500;
+ const MAX_L2_TOTAL_CHARS = 3000;
+ function readSafe(file) {
+ try {
+ return fs.readFileSync(file, "utf-8");
+ }
+ catch {
+ return "";
+ }
+ }
+ /**
+ * Load all memory layers from disk. Cheap — no API calls, just file reads.
+ */
+ export function loadMemoryLayers() {
+ const identity = readSafe(IDENTITY_FILE);
+ const preferences = readSafe(PREFERENCES_FILE);
+ const longTerm = readSafe(MEMORY_FILE);
+ const projects = [];
+ try {
+ if (fs.existsSync(PROJECTS_MEMORY_DIR)) {
+ const entries = fs.readdirSync(PROJECTS_MEMORY_DIR);
+ for (const entry of entries) {
+ if (!entry.endsWith(".md") || entry.startsWith("."))
+ continue;
+ const fullPath = path.resolve(PROJECTS_MEMORY_DIR, entry);
+ const content = readSafe(fullPath);
+ if (content.trim()) {
+ projects.push({
+ topic: entry.replace(/\.md$/, ""),
+ content,
+ });
+ }
+ }
+ }
+ }
+ catch {
+ // projects dir missing or unreadable — fine
+ }
+ return { identity, preferences, longTerm, projects };
+ }
+ /**
+ * Match L2 projects against the user query.
+ * Topic match is naive substring (case-insensitive) on filename + first 200 chars
+ * of the project content. For v4.11.0 this is intentionally simple — vector
+ * search via embeddings.ts handles the deep cases.
+ */
+ function matchProjectsToQuery(projects, query) {
+ if (!query)
+ return [];
+ const q = query.toLowerCase();
+ const matched = [];
+ for (const p of projects) {
+ const topicLower = p.topic.toLowerCase();
+ if (q.includes(topicLower)) {
+ matched.push(p);
+ continue;
+ }
+ // Also check the first 200 chars of project content — this catches cases
+ // where the user mentions a project's headline term that isn't the
+ // filename (e.g., "VPS" matching alev-b.md which mentions "VPS:" upfront).
+ const head = p.content.slice(0, 200).toLowerCase();
+ const headWords = head.split(/[\s\W]+/).filter(w => w.length >= 4);
+ if (headWords.some(w => q.includes(w))) {
+ matched.push(p);
+ }
+ }
+ return matched;
+ }
+ /**
+ * Build a token-budgeted layered context string suitable for system prompt injection.
+ *
+ * @param query Optional user query. If provided, L2 projects matching the query
+ * get included. If omitted, only L0+L1 are loaded (boot-up brief).
+ */
+ export function buildLayeredContext(query) {
+ const layers = loadMemoryLayers();
+ const parts = [];
+ let l0l1Chars = 0;
+ if (layers.identity) {
+ const truncated = layers.identity.length > MAX_L0_L1_CHARS
+ ? layers.identity.slice(0, MAX_L0_L1_CHARS) + "\n[...truncated]"
+ : layers.identity;
+ parts.push("## Identity (L0)\n" + truncated);
+ l0l1Chars += truncated.length;
+ }
+ if (layers.preferences && l0l1Chars < MAX_L0_L1_CHARS) {
+ const remaining = MAX_L0_L1_CHARS - l0l1Chars;
+ const truncated = layers.preferences.length > remaining
+ ? layers.preferences.slice(0, remaining) + "\n[...truncated]"
+ : layers.preferences;
+ parts.push("## Preferences (L1)\n" + truncated);
+ l0l1Chars += truncated.length;
+ }
+ // Backwards-compat: if no identity AND no preferences, use the monolithic
+ // MEMORY.md as L1 fully (existing user setups). If split files exist,
+ // include MEMORY.md as a secondary L1 with tighter truncation.
+ if (!layers.identity && !layers.preferences && layers.longTerm) {
+ const truncated = layers.longTerm.length > MAX_L0_L1_CHARS
+ ? layers.longTerm.slice(0, MAX_L0_L1_CHARS) + "\n[...truncated]"
+ : layers.longTerm;
+ parts.push("## Long-term Memory (L1, monolithic)\n" + truncated);
+ }
+ else if (layers.longTerm) {
+ const SECONDARY_CAP = 1500;
+ const truncated = layers.longTerm.length > SECONDARY_CAP
+ ? layers.longTerm.slice(0, SECONDARY_CAP) + "\n[...truncated]"
+ : layers.longTerm;
+ parts.push("## Long-term Memory (L1, legacy MEMORY.md)\n" + truncated);
+ }
+ // L2: project-specific, only when a query is provided
131
+ if (query && layers.projects.length > 0) {
132
+ const matched = matchProjectsToQuery(layers.projects, query);
133
+ let l2TotalChars = 0;
134
+ for (const p of matched) {
135
+ if (l2TotalChars >= MAX_L2_TOTAL_CHARS)
136
+ break;
137
+ const remaining = MAX_L2_TOTAL_CHARS - l2TotalChars;
138
+ const cap = Math.min(MAX_L2_PROJECT_CHARS, remaining);
139
+ const content = p.content.length > cap
140
+ ? p.content.slice(0, cap) + "\n[...truncated]"
141
+ : p.content;
142
+ parts.push(`## Project: ${p.topic} (L2)\n${content}`);
143
+ l2TotalChars += content.length;
144
+ }
145
+ }
146
+ return parts.join("\n\n");
147
+ }
@@ -11,6 +11,7 @@ import fs from "fs";
  import { resolve } from "path";
  import { MEMORY_DIR, MEMORY_FILE } from "../paths.js";
  import { reindexMemory } from "./embeddings.js";
+ import { buildLayeredContext } from "./memory-layers.js";
  // Ensure dirs exist
  if (!fs.existsSync(MEMORY_DIR))
      fs.mkdirSync(MEMORY_DIR, { recursive: true });
@@ -65,16 +66,22 @@ export function appendDailyLog(entry) {
      reindexMemory().catch(() => { });
  }
  /**
-  * Build memory context for injection into non-SDK prompts.
-  * Returns relevant memory as a compact string.
+  * Build memory context for injection into prompts.
+  *
+  * v4.11.0 — Now uses the layered memory loader (memory-layers.ts) which
+  * combines L0 (identity.md), L1 (preferences.md + legacy MEMORY.md), and
+  * optional L2 (projects/*.md matched against the query). Falls back to the
+  * monolithic MEMORY.md alone if the split files don't exist.
+  *
+  * @param query Optional user query — when provided, L2 projects matching
+  * the query get included. When omitted, only L0+L1 are loaded.
   */
- export function buildMemoryContext() {
+ export function buildMemoryContext(query) {
      const parts = [];
-     // Long-term memory (truncate if too long)
-     const ltm = loadLongTermMemory();
-     if (ltm) {
-         const truncated = ltm.length > 2000 ? ltm.slice(0, 2000) + "\n[...truncated]" : ltm;
-         parts.push(`## Long-term Memory\n${truncated}`);
+     // L0+L1 (+ matched L2 if query) via layered loader
+     const layered = buildLayeredContext(query);
+     if (layered) {
+         parts.push(layered);
      }
      // Today's log
      const todayLog = loadDailyLog();