
claude-mem-lite 3.4.0 → 3.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

package/.claude-plugin/marketplace.json +1 -1
package/.claude-plugin/plugin.json +1 -1
package/README.md +31 -13
package/mem-cli.mjs +11 -4
package/package.json +1 -1
package/search-engine.mjs +40 -1
package/server.mjs +11 -5

package/.claude-plugin/marketplace.json CHANGED

@@ -10,7 +10,7 @@
   "plugins": [
     {
       "name": "claude-mem-lite",
-      "version": "3.4.0",
+      "version": "3.5.0",
       "source": "./",
       "description": "Persistent long-term memory for Claude Code via MCP — captures coding decisions, bugfixes, and context across sessions. Hybrid FTS5 + TF-IDF search with episode batching. Single SQLite DB, no external services. A lighter, lower-cost alternative to claude-mem (episode batching + a smaller model; cost savings are an internal estimate, not a measured benchmark)."
     }

package/.claude-plugin/plugin.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "claude-mem-lite",
-  "version": "3.4.0",
+  "version": "3.5.0",
   "description": "Persistent long-term memory for Claude Code via MCP — captures coding decisions, bugfixes, and context across sessions. Hybrid FTS5 + TF-IDF search with episode batching. Single SQLite DB, no external services. A lighter, lower-cost alternative to claude-mem (episode batching + a smaller model; cost savings are an internal estimate, not a measured benchmark).",
   "author": {
     "name": "sdsrss"

package/README.md CHANGED

@@ -649,10 +649,15 @@ The benchmark suite runs as a CI gate (`npm run benchmark:gate`) to prevent sear
 Beyond the in-repo micro-benchmark above, claude-mem-lite is measured on
 [LongMemEval](https://github.com/xiaowu0162/LongMemEval) (Wu et al.) — a
 500-question long-term-memory benchmark — so its recall is comparable to the
-field, not just to itself. Metric is **recall_any@k**: is a gold evidence session
-in the top *k* retrieved? Corpus is user-turns-only (the standard raw-baseline
-rule). Runners: `benchmark/longmemeval.mjs` (lexical) and
-`benchmark/longmemeval-rerank.mjs` (rerank).
+field, not just to itself. Metric is **recall_any@k**: does *any* gold evidence session appear in the
+top *k* retrieved? This is the same session-level definition the systems we
+compare against report on this split — [agentmemory](https://github.com/rohitg00/agentmemory)
+(BM25 + vector + graph) and dense-embedding systems like MemPalace — so the rows
+below sit on one axis, not metric-shopped. (Note: 65% of the 500 questions have
+multiple gold sessions, so `recall_any@k` is looser than fractional recall there;
+all systems in this comparison report the any-hit form.) Corpus is user-turns-only
+(the standard raw-baseline rule). Runners: `benchmark/longmemeval.mjs` (lexical)
+and `benchmark/longmemeval-rerank.mjs` (rerank).
 | Retriever (zero embeddings) | @1 | @5 | @10 |
 |---|---|---|---|
@@ -664,15 +669,28 @@ hands the top 20 lexical candidates to a single Haiku call (~1.4 s/query) that
 reorders them. It is **never worse than the lexical baseline by construction** —
 any LLM or parse failure falls back to the original candidate order.
-**On embeddings, honestly.** With no LLM in the loop, dense-embedding retrieval
-still wins on raw recall — a dense-embedding baseline reports ~96.6% @5 on this
-split, versus our 90.6%. The rerank row's point is that a *single cheap LLM call
-closes that gap*: a zero-embedding lexical stack reaches 96.8% @5, edging the
-embedding raw number, because the lexical candidate set is already rich enough
-(recall@20 = 97.8%) that ranking — not recall — is the bottleneck. An
-embedding-plus-rerank stack still leads when both sides spend an LLM call; the
-takeaway is that claude-mem-lite needs **no vector model, no Python, and no
-external service** to reach embedding-competitive precision.
+**Stricter metric, for the record.** The rows above are `recall_any@k` — does *any*
+gold session reach the top *k* — the metric agentmemory and MemPalace publish, so the
+comparison is like-for-like. Under the stricter **standard recall@k** (`|gold ∩ top-k| /
+|gold|`, the *fraction* of all gold sessions retrieved), the lexical stack scores
+@1 = 46.9% / @5 = 84.4% / @10 = 91.9%. The whole gap is the 65% of questions with
+multiple gold sessions — any-hit needs one, fractional needs them all, and @1 is capped
+at 1/|gold| there; single-gold question types score identically under both.
+`benchmark/longmemeval.mjs` reports both columns (the rerank row's fractional is not yet
+measured).
+**On embeddings, honestly.** With no LLM in the loop, both a dense-embedding
+baseline (MemPalace, ~96.6% @5) and a BM25 + vector + graph hybrid (agentmemory,
+95.2% @5) out-recall our zero-embedding lexical stack (90.6% @5) at the same
+retrieval stage — dense and graph signal genuinely help raw recall, and most of
+our remaining gap is paraphrase (single-session-preference is our lowest category
+at 63%). The rerank row's point is that a *single cheap LLM call closes it*:
+reordering the top-20 lexical candidates reaches 96.8% @5 — matching the dense raw
+number and edging the hybrid's retrieval score — because the lexical candidate set
+is already rich enough (recall@20 = 97.8%) that ranking, not recall, is the
+bottleneck. An embedding-plus-rerank stack still leads when both sides spend an LLM
+call; the takeaway is that claude-mem-lite reaches embedding-competitive precision
+with **no vector model, no knowledge graph, no Python, and no external service**.
 Per-category @5 (lexical → +rerank): knowledge-update 98.7 → 100.0 ·
 single-session-user 91.4 → 98.6 · temporal-reasoning 89.5 → 97.7 · multi-session
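
The README hunk above contrasts two metrics: `recall_any@k` (any gold session in the top k) and fractional recall@k (`|gold ∩ top-k| / |gold|`). The sketch below is illustrative only and is not taken from `benchmark/longmemeval.mjs`; the function names and the sample session ids are hypothetical, and it assumes a non-empty gold list with unique ids. It shows why the two diverge only on multi-gold questions.

```javascript
// Hypothetical helpers, not part of the package.
// recall_any@k: 1 if at least one gold session id is in the top k, else 0.
function recallAnyAtK(retrieved, gold, k) {
  const topK = new Set(retrieved.slice(0, k));
  return gold.some(id => topK.has(id)) ? 1 : 0;
}

// Fractional recall@k: share of all gold session ids found in the top k.
function recallFractionAtK(retrieved, gold, k) {
  const topK = new Set(retrieved.slice(0, k));
  const hits = gold.filter(id => topK.has(id)).length;
  return hits / gold.length;
}

// Made-up question with three gold sessions; two are retrieved, one is missed.
const retrieved = ['s7', 's3', 's1', 's9', 's2'];
const gold = ['s7', 's2', 's8'];

console.log(recallAnyAtK(retrieved, gold, 1));      // 1: s7 is a gold session
console.log(recallFractionAtK(retrieved, gold, 1)); // 1/3: capped at 1/|gold|
console.log(recallAnyAtK(retrieved, gold, 5));      // 1
console.log(recallFractionAtK(retrieved, gold, 5)); // 2/3: s8 never surfaced
```

With a single gold session, both functions return the same value for every k, which matches the README's statement that single-gold question types score identically under both metrics.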

package/mem-cli.mjs CHANGED

@@ -9,7 +9,7 @@ import { resolveProject } from './project-utils.mjs';
 import { TIER_CASE_SQL, tierSqlParams } from './tier.mjs';
 import { _resetVocabCache } from './tfidf.mjs';
 import { autoBoostIfNeeded, reRankWithContext, markSuperseded } from './server-internals.mjs';
-import { searchObservationsHybrid, countSearchTotal } from './search-engine.mjs';
+import { searchObservationsHybrid, countSearchTotal, attachBodyTokens } from './search-engine.mjs';
 import { deepSearch, resolveDeepMode, shouldEscalateToDeep, autoDeepLlmReady, hasEscalatableCorpus } from './deep-search.mjs';
 import { ensureRegistryDb, upsertResource } from './registry.mjs';
 import { searchResources } from './registry-retriever.mjs';
@@ -323,6 +323,9 @@ async function cmdSearch(db, args, { llm } = {}) {
       includeNoise,
     }), results.length);
   const paged = results.slice(offset, offset + limit);
+  // Enrich the final page with the ~Nt fetch-cost hint (paired with MCP mem_search; #8654 both
+  // source keys handled). Batch-fetches heavy obs fields by id — no-op on an empty page.
+  attachBodyTokens(db, paged);
   if (paged.length === 0) {
     if (jsonOutput) {
@@ -361,6 +364,7 @@ async function cmdSearch(db, args, { llm } = {}) {
         importance: r.importance ?? null,
         superseded: Boolean(r.superseded),
         files_modified: r.files_modified || null,
+        body_tokens: r.bodyTokens ?? null,
       };
     });
     out(JSON.stringify({
@@ -382,19 +386,22 @@ async function cmdSearch(db, args, { llm } = {}) {
   // Pluralize on total — "Found 1 of 44 result" reads wrong; the population (44) drives
   // grammatical number, not the page slice (1).
   out(`[mem] Found ${countLabel} result${total !== 1 ? 's' : ''} for "${query}"${fallbackHint}:${hasMixed ? ' (# observation, S# session, P# prompt)' : ''}`);
+  // `~Nt` = est. tokens to fetch this row's full body via mem_get (attachBodyTokens, paired with
+  // MCP). Conditional so a row that skipped enrichment renders cleanly, not "~undefinedt".
+  const tok = r => (r.bodyTokens ? ` ~${r.bodyTokens}t` : '');
   for (const r of paged) {
     const timeStr = showTime && r.created_at_epoch ? ` (${relativeTime(r.created_at_epoch)})` : '';
     if (r._source === 'session') {
       const date = fmtDateShort(r.created_at);
-      out(`S#${r.id} 📋 ${date}${timeStr} ${truncate(r.request || r.completed || '(no summary)', 80)}`);
+      out(`S#${r.id} 📋 ${date}${timeStr} ${truncate(r.request || r.completed || '(no summary)', 80)}${tok(r)}`);
     } else if (r._source === 'prompt') {
       const date = fmtDateShort(r.created_at);
-      out(`P#${r.id} 💬 ${date}${timeStr} ${truncate(r.prompt_text || '(empty)', 80)}`);
+      out(`P#${r.id} 💬 ${date}${timeStr} ${truncate(r.prompt_text || '(empty)', 80)}${tok(r)}`);
     } else {
       const date = fmtDateShort(r.created_at);
       const title = truncate(r.title || r.subtitle || '(untitled)', 80);
       const supersededTag = r.superseded ? ' [SUPERSEDED]' : '';
-      out(`#${r.id} ${typeIcon(r.type)} ${date}${timeStr} ${title}${supersededTag}`);
+      out(`#${r.id} ${typeIcon(r.type)} ${date}${timeStr} ${title}${supersededTag}${tok(r)}`);
       if (r.lesson_learned) {
         out(`  -> ${truncate(r.lesson_learned, 80)}`);
       }
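
The `tok` helper added in this hunk appends the `~Nt` hint only when a row carries a truthy `bodyTokens`. A minimal sketch of that behavior follows; the row objects and the simplified line format are hypothetical, and only the `tok` arrow function is copied from the diff.

```javascript
// Copied from the hunk: empty string when the row was never enriched.
const tok = r => (r.bodyTokens ? ` ~${r.bodyTokens}t` : '');

// Hypothetical rows: one enriched by attachBodyTokens, one that skipped it.
const rows = [
  { id: 12, title: 'Fix FTS tokenizer', bodyTokens: 240 },
  { id: 13, title: 'Unenriched row' },
];

// Simplified line format (the real CLI also prints an icon, date, and tags).
const lines = rows.map(r => `#${r.id} ${r.title}${tok(r)}`);
console.log(lines.join('\n'));
// #12 Fix FTS tokenizer ~240t
// #13 Unenriched row
```

Without the conditional, the second row would render as `~undefinedt`, which is the case the diff's comment calls out.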

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "claude-mem-lite",
-  "version": "3.4.0",
+  "version": "3.5.0",
   "description": "Persistent long-term memory for Claude Code via MCP — captures coding decisions, bugfixes, and context across sessions. Hybrid FTS5 + TF-IDF search with episode batching. Single SQLite DB, no external services. A lighter, lower-cost alternative to claude-mem (episode batching + a smaller model; cost savings are an internal estimate, not a measured benchmark).",
   "type": "module",
   "packageManager": "npm@10.9.2",

package/search-engine.mjs CHANGED

@@ -9,7 +9,7 @@ import {
   OBS_BM25, TYPE_DECAY_CASE, TYPE_QUALITY_CASE,
   DEFAULT_DECAY_HALF_LIFE_MS,
   notLowSignalTitleClause, LOW_SIGNAL_TITLE,
-  relaxFtsQueryToOr, debugLog, debugCatch,
+  relaxFtsQueryToOr, debugLog, debugCatch, estimateTokens,
 } from './utils.mjs';
 import { getVocabulary, computeVector, vectorSearch, rrfMerge } from './tfidf.mjs';
 import { extractPRFTerms, expandQueryByConcepts } from './server-internals.mjs';
@@ -190,6 +190,45 @@ export function ftsRowToResult(r, { scoreMultiplier, snippet } = {}) {
   };
 }
+// Per-result estimate of the token cost to fetch the FULL body via mem_get, surfaced as the
+// `~Nt` hint in search output so the agent can budget the 3-layer protocol (search → timeline →
+// get) before paying to expand any ID. Adopted from thedotmack/claude-mem's token-cost column
+// (reference_claude_mem_comparison) — the one genuinely portable idea from that analysis.
+//
+// Layer-1 search deliberately omits narrative/facts (that's what keeps the index light), so the
+// heavy obs fields are batch-fetched by id HERE rather than carried on every result. The source
+// key is read as `source || _source` because the two render paths disagree (#8654): MCP sets
+// `source`+`text`, CLI sets `_source`+`prompt_text`. estimateTokens floors at 1, so a missing row
+// or empty body yields 1 — never 0/NaN.
+export function attachBodyTokens(db, results) {
+  if (!Array.isArray(results) || results.length === 0) return results;
+  const obsIds = results
+    .filter(r => (r.source || r._source) === 'obs' && Number.isInteger(r.id))
+    .map(r => r.id);
+  const bodyById = new Map();
+  if (obsIds.length > 0) {
+    try {
+      const ph = obsIds.map(() => '?').join(',');
+      const rows = db.prepare(`SELECT id, narrative, facts, text FROM observations WHERE id IN (${ph})`).all(...obsIds);
+      for (const row of rows) bodyById.set(row.id, row);
+    } catch (e) { debugCatch(e, 'attachBodyTokens'); }
+  }
+  for (const r of results) {
+    const src = r.source || r._source;
+    let parts;
+    if (src === 'obs') {
+      const row = bodyById.get(r.id) || {};
+      parts = [r.title, r.subtitle, r.lesson_learned, row.narrative, row.facts, row.text];
+    } else if (src === 'session') {
+      parts = [r.request, r.completed, r.working_on];
+    } else {
+      parts = [r.text, r.prompt_text];
+    }
+    r.bodyTokens = estimateTokens(parts.filter(Boolean).join(' '));
+  }
+  return results;
+}
 function expandObsByConceptCo(db, ctx, now, existingIds, results, includeNoise = false) {
   const { ftsQuery, args, epochFrom, epochTo, limit } = ctx;
   if (results.length >= Math.ceil(limit / 2)) return;
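
To see how the `attachBodyTokens` function added above behaves across the two source-key conventions (`source` for MCP, `_source` for the CLI), the sketch below runs a condensed copy against a stand-in database. Several pieces are assumptions: `estimateTokens` lives in `utils.mjs` and is not shown in this diff, so a rough 4-characters-per-token estimator floored at 1 is substituted; `makeFakeDb` is a hypothetical in-memory stub that only mimics `prepare(sql).all(...ids)`; and the sample rows are made up.

```javascript
// ASSUMPTION: stand-in for utils.mjs estimateTokens (not shown in the diff).
// Roughly 4 characters per token, floored at 1 as the diff's comment describes.
const estimateTokens = text => Math.max(1, Math.ceil(String(text || '').length / 4));

// Hypothetical stub for the SQLite handle: ignores the SQL and filters by id.
function makeFakeDb(rows) {
  return {
    prepare() {
      return { all: (...ids) => rows.filter(row => ids.includes(row.id)) };
    },
  };
}

// Condensed copy of the function from the hunk above.
function attachBodyTokens(db, results) {
  if (!Array.isArray(results) || results.length === 0) return results;
  const obsIds = results
    .filter(r => (r.source || r._source) === 'obs' && Number.isInteger(r.id))
    .map(r => r.id);
  const bodyById = new Map();
  if (obsIds.length > 0) {
    try {
      const ph = obsIds.map(() => '?').join(',');
      const rows = db
        .prepare(`SELECT id, narrative, facts, text FROM observations WHERE id IN (${ph})`)
        .all(...obsIds);
      for (const row of rows) bodyById.set(row.id, row);
    } catch (e) { /* the original reports this via debugCatch */ }
  }
  for (const r of results) {
    const src = r.source || r._source;
    let parts;
    if (src === 'obs') {
      const row = bodyById.get(r.id) || {};
      parts = [r.title, r.subtitle, r.lesson_learned, row.narrative, row.facts, row.text];
    } else if (src === 'session') {
      parts = [r.request, r.completed, r.working_on];
    } else {
      parts = [r.text, r.prompt_text];
    }
    r.bodyTokens = estimateTokens(parts.filter(Boolean).join(' '));
  }
  return results;
}

const db = makeFakeDb([{ id: 1, narrative: 'x'.repeat(400), facts: null, text: null }]);
const page = attachBodyTokens(db, [
  { source: 'obs', id: 1, title: 'Fix cache' },  // MCP-style key, heavy body fetched by id
  { _source: 'prompt', id: 2, prompt_text: '' }, // CLI-style key, empty body
]);
console.log(page.map(r => r.bodyTokens)); // [ 103, 1 ] under the assumed estimator
```

The observation's estimate counts the title plus the batch-fetched narrative (410 characters in total), while the empty prompt falls back to the floor of 1 rather than 0 or NaN. The real numbers depend on the package's actual `estimateTokens`.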

package/server.mjs CHANGED

@@ -9,7 +9,7 @@ import { truncate, typeIcon, inferProject, scrubSecrets, fmtDate, debugLog, debu
 import { resolveProject as _resolveProjectShared } from './project-utils.mjs';
 import { ensureDb, DB_PATH, DB_DIR, REGISTRY_DB_PATH } from './schema.mjs';
 import { reRankWithContext, markSuperseded, autoBoostIfNeeded, runIdleCleanup, buildServerInstructions } from './server-internals.mjs';
-import { searchObservationsHybrid, countSearchTotal } from './search-engine.mjs';
+import { searchObservationsHybrid, countSearchTotal, attachBodyTokens } from './search-engine.mjs';
 import { deepSearch, resolveDeepMode, shouldEscalateToDeep, autoDeepLlmReady, hasEscalatableCorpus } from './deep-search.mjs';
 import { selectCompressionCandidates, groupByProjectWeek, compressGroup } from './lib/compress-core.mjs';
 import { resolveAnchorToken, formatAnchorError, resolveQueryAnchor, fetchRecentTimeline, fetchTimelineWindow } from './lib/timeline-core.mjs';
@@ -294,21 +294,24 @@ function formatSearchOutput(paginatedResults, args, ftsQuery, totalCount, orFall
   const fallbackHint = orFallbackFired && !args.or ? ' (relaxed AND→OR)' : '';
   lines.push(`Found ${countLabel} result(s)${qLabel}${fallbackHint}:${hasMixed ? ' (# observation, S# session, P# prompt)' : ''}\n`);
+  // `~Nt` = estimated tokens to fetch this row's full body via mem_get (attachBodyTokens).
+  // Conditional so a result that skipped enrichment renders cleanly, not "~undefinedt".
+  const tok = r => (r.bodyTokens ? ` ~${r.bodyTokens}t` : '');
   for (const r of paginatedResults) {
     if (r.source === 'obs') {
       const supersededTag = r.superseded ? ' [SUPERSEDED]' : '';
-      lines.push(`#${r.id} ${typeIcon(r.type)} [${r.type}] ${truncate(r.title || r.subtitle || '(untitled)')} | ${r.project} | ${fmtDate(r.date)}${supersededTag}`);
+      lines.push(`#${r.id} ${typeIcon(r.type)} [${r.type}] ${truncate(r.title || r.subtitle || '(untitled)')} | ${r.project} | ${fmtDate(r.date)}${supersededTag}${tok(r)}`);
       if (r.snippet && r.snippet.length > 10 && r.snippet !== r.title) {
         lines.push(`     ${truncate(r.snippet, 100)}`);
       }
     } else if (r.source === 'session') {
-      lines.push(`S#${r.id} 📋 ${truncate(r.request || r.completed || '(no summary)')} | ${r.project} | ${fmtDate(r.date)}`);
+      lines.push(`S#${r.id} 📋 ${truncate(r.request || r.completed || '(no summary)')} | ${r.project} | ${fmtDate(r.date)}${tok(r)}`);
     } else if (r.source === 'prompt') {
-      lines.push(`P#${r.id} 💬 ${truncate(r.text)} | ${fmtDate(r.date)}`);
+      lines.push(`P#${r.id} 💬 ${truncate(r.text)} | ${fmtDate(r.date)}${tok(r)}`);
     }
   }
-  lines.push(`\nWorkflow: mem_timeline(anchor=ID) for context | mem_get(ids=[...]) for full details`);
+  lines.push(`\nWorkflow: mem_timeline(anchor=ID) for context | mem_get(ids=[...]) for full details  ·  ~Nt = est. tokens to fetch full detail`);
   return { content: [{ type: 'text', text: lines.join('\n') }] };
 }
@@ -508,6 +511,9 @@ async function runSearchPipeline(db, args, { llm, rerankLlm } = {}) {
       }), results.length);
     // Always apply pagination — single-source results can exceed SQL LIMIT due to expansion (concept co-occurrence, PRF, vector search)
     const paginatedResults = (offset > 0 || results.length > limit) ? results.slice(offset, offset + limit) : results;
+    // Enrich the FINAL page with a fetch-cost estimate (~Nt) so the agent budgets before mem_get.
+    // Uses the same db threaded through the pipeline (#8743) — batch-fetches heavy obs fields by id.
+    attachBodyTokens(db, paginatedResults);
     // Observability: announce auto-escalation on stderr (parity with CLI deep note).
     if (escalated) process.stderr.write(`[mem] auto-escalated to deep search (weak results: ${escalatedObsCount} hits)\n`);