claude-mem-lite 3.4.0 → 3.5.0

@@ -10,7 +10,7 @@
10
10
  "plugins": [
11
11
  {
12
12
  "name": "claude-mem-lite",
13
- "version": "3.4.0",
13
+ "version": "3.5.0",
14
14
  "source": "./",
15
15
  "description": "Persistent long-term memory for Claude Code via MCP — captures coding decisions, bugfixes, and context across sessions. Hybrid FTS5 + TF-IDF search with episode batching. Single SQLite DB, no external services. A lighter, lower-cost alternative to claude-mem (episode batching + a smaller model; cost savings are an internal estimate, not a measured benchmark)."
16
16
  }
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "claude-mem-lite",
3
- "version": "3.4.0",
3
+ "version": "3.5.0",
4
4
  "description": "Persistent long-term memory for Claude Code via MCP — captures coding decisions, bugfixes, and context across sessions. Hybrid FTS5 + TF-IDF search with episode batching. Single SQLite DB, no external services. A lighter, lower-cost alternative to claude-mem (episode batching + a smaller model; cost savings are an internal estimate, not a measured benchmark).",
5
5
  "author": {
6
6
  "name": "sdsrss"
package/README.md CHANGED
@@ -649,10 +649,15 @@ The benchmark suite runs as a CI gate (`npm run benchmark:gate`) to prevent sear
649
649
  Beyond the in-repo micro-benchmark above, claude-mem-lite is measured on
650
650
  [LongMemEval](https://github.com/xiaowu0162/LongMemEval) (Wu et al.) — a
651
651
  500-question long-term-memory benchmark — so its recall is comparable to the
652
- field, not just to itself. Metric is **recall_any@k**: is a gold evidence session
653
- in the top *k* retrieved? Corpus is user-turns-only (the standard raw-baseline
654
- rule). Runners: `benchmark/longmemeval.mjs` (lexical) and
655
- `benchmark/longmemeval-rerank.mjs` (rerank).
652
+ field, not just to itself. Metric is **recall_any@k**: does *any* gold evidence session appear in the
653
+ top *k* retrieved? This is the same session-level definition the systems we
654
+ compare against report on this split — [agentmemory](https://github.com/rohitg00/agentmemory)
655
+ (BM25 + vector + graph) and dense-embedding systems like MemPalace — so the rows
656
+ below sit on one axis, not metric-shopped. (Note: 65% of the 500 questions have
657
+ multiple gold sessions, so `recall_any@k` is looser than fractional recall there;
658
+ all systems in this comparison report the any-hit form.) Corpus is user-turns-only
659
+ (the standard raw-baseline rule). Runners: `benchmark/longmemeval.mjs` (lexical)
660
+ and `benchmark/longmemeval-rerank.mjs` (rerank).
656
661
 
657
662
  | Retriever (zero embeddings) | @1 | @5 | @10 |
658
663
  |---|---|---|---|
@@ -664,15 +669,28 @@ hands the top 20 lexical candidates to a single Haiku call (~1.4 s/query) that
664
669
  reorders them. It is **never worse than the lexical baseline by construction** —
665
670
  any LLM or parse failure falls back to the original candidate order.
666
671
 
667
- **On embeddings, honestly.** With no LLM in the loop, dense-embedding retrieval
668
- still wins on raw recall: a dense-embedding baseline reports ~96.6% @5 on this
669
- split, versus our 90.6%. The rerank row's point is that a *single cheap LLM call
670
- closes that gap*: a zero-embedding lexical stack reaches 96.8% @5, edging the
671
- embedding raw number, because the lexical candidate set is already rich enough
672
- (recall@20 = 97.8%) that ranking, not recall, is the bottleneck. An
673
- embedding-plus-rerank stack still leads when both sides spend an LLM call; the
674
- takeaway is that claude-mem-lite needs **no vector model, no Python, and no
675
- external service** to reach embedding-competitive precision.
672
+ **Stricter metric, for the record.** The rows above are `recall_any@k` (does *any*
673
+ gold session reach the top *k*?), the metric agentmemory and MemPalace publish, so the
674
+ comparison is like-for-like. Under the stricter **standard recall@k** (`|gold ∩ top-k| /
675
+ |gold|`, the *fraction* of all gold sessions retrieved), the lexical stack scores
676
+ @1 = 46.9% / @5 = 84.4% / @10 = 91.9%. The whole gap is the 65% of questions with
677
+ multiple gold sessions: any-hit needs one, fractional needs them all, and @1 is capped
678
+ at 1/|gold| there; single-gold question types score identically under both.
679
+ `benchmark/longmemeval.mjs` reports both columns (the rerank row's fractional is not yet
680
+ measured).
681
+
682
+ **On embeddings, honestly.** With no LLM in the loop, both a dense-embedding
683
+ baseline (MemPalace, ~96.6% @5) and a BM25 + vector + graph hybrid (agentmemory,
684
+ 95.2% @5) out-recall our zero-embedding lexical stack (90.6% @5) at the same
685
+ retrieval stage — dense and graph signal genuinely help raw recall, and most of
686
+ our remaining gap is paraphrase (single-session-preference is our lowest category
687
+ at 63%). The rerank row's point is that a *single cheap LLM call closes it*:
688
+ reordering the top-20 lexical candidates reaches 96.8% @5 — matching the dense raw
689
+ number and edging the hybrid's retrieval score — because the lexical candidate set
690
+ is already rich enough (recall@20 = 97.8%) that ranking, not recall, is the
691
+ bottleneck. An embedding-plus-rerank stack still leads when both sides spend an LLM
692
+ call; the takeaway is that claude-mem-lite reaches embedding-competitive precision
693
+ with **no vector model, no knowledge graph, no Python, and no external service**.
676
694
 
677
695
  Per-category @5 (lexical → +rerank): knowledge-update 98.7 → 100.0 ·
678
696
  single-session-user 91.4 → 98.6 · temporal-reasoning 89.5 → 97.7 · multi-session
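The difference between the two metrics in the README hunk above can be sketched in a few lines. This is an illustration only: the function names `recallAnyAtK` and `fractionalRecallAtK` and the session ids are hypothetical and are not taken from `benchmark/longmemeval.mjs`.

```javascript
// Hypothetical helpers for illustration; not the benchmark runner's code.

// recall_any@k: 1 if at least one gold session is in the top k, else 0.
function recallAnyAtK(ranked, gold, k) {
  const topK = new Set(ranked.slice(0, k));
  return gold.some(id => topK.has(id)) ? 1 : 0;
}

// Fractional recall@k: |gold ∩ top-k| / |gold|.
function fractionalRecallAtK(ranked, gold, k) {
  if (gold.length === 0) return 0; // assumption: score 0 when no gold is defined
  const topK = new Set(ranked.slice(0, k));
  const hits = gold.filter(id => topK.has(id)).length;
  return hits / gold.length;
}

// A question with two gold sessions, one ranked first and one ranked fourth.
const ranked = ['s2', 's7', 's9', 's4', 's1'];
const gold = ['s2', 's4'];

const anyAt1 = recallAnyAtK(ranked, gold, 1);         // 1
const fracAt1 = fractionalRecallAtK(ranked, gold, 1); // 0.5, capped at 1/|gold|
const anyAt5 = recallAnyAtK(ranked, gold, 5);         // 1
const fracAt5 = fractionalRecallAtK(ranked, gold, 5); // 1
```

With a single gold session the two functions return the same value for every `k`, which is why the README notes that single-gold question types score identically under both definitions.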
package/mem-cli.mjs CHANGED
@@ -9,7 +9,7 @@ import { resolveProject } from './project-utils.mjs';
9
9
  import { TIER_CASE_SQL, tierSqlParams } from './tier.mjs';
10
10
  import { _resetVocabCache } from './tfidf.mjs';
11
11
  import { autoBoostIfNeeded, reRankWithContext, markSuperseded } from './server-internals.mjs';
12
- import { searchObservationsHybrid, countSearchTotal } from './search-engine.mjs';
12
+ import { searchObservationsHybrid, countSearchTotal, attachBodyTokens } from './search-engine.mjs';
13
13
  import { deepSearch, resolveDeepMode, shouldEscalateToDeep, autoDeepLlmReady, hasEscalatableCorpus } from './deep-search.mjs';
14
14
  import { ensureRegistryDb, upsertResource } from './registry.mjs';
15
15
  import { searchResources } from './registry-retriever.mjs';
@@ -323,6 +323,9 @@ async function cmdSearch(db, args, { llm } = {}) {
323
323
  includeNoise,
324
324
  }), results.length);
325
325
  const paged = results.slice(offset, offset + limit);
326
+ // Enrich the final page with the ~Nt fetch-cost hint (paired with MCP mem_search; #8654 both
327
+ // source keys handled). Batch-fetches heavy obs fields by id — no-op on an empty page.
328
+ attachBodyTokens(db, paged);
326
329
 
327
330
  if (paged.length === 0) {
328
331
  if (jsonOutput) {
@@ -361,6 +364,7 @@ async function cmdSearch(db, args, { llm } = {}) {
361
364
  importance: r.importance ?? null,
362
365
  superseded: Boolean(r.superseded),
363
366
  files_modified: r.files_modified || null,
367
+ body_tokens: r.bodyTokens ?? null,
364
368
  };
365
369
  });
366
370
  out(JSON.stringify({
@@ -382,19 +386,22 @@ async function cmdSearch(db, args, { llm } = {}) {
382
386
  // Pluralize on total — "Found 1 of 44 result" reads wrong; the population (44) drives
383
387
  // grammatical number, not the page slice (1).
384
388
  out(`[mem] Found ${countLabel} result${total !== 1 ? 's' : ''} for "${query}"${fallbackHint}:${hasMixed ? ' (# observation, S# session, P# prompt)' : ''}`);
389
+ // `~Nt` = est. tokens to fetch this row's full body via mem_get (attachBodyTokens, paired with
390
+ // MCP). Conditional so a row that skipped enrichment renders cleanly, not "~undefinedt".
391
+ const tok = r => (r.bodyTokens ? ` ~${r.bodyTokens}t` : '');
385
392
  for (const r of paged) {
386
393
  const timeStr = showTime && r.created_at_epoch ? ` (${relativeTime(r.created_at_epoch)})` : '';
387
394
  if (r._source === 'session') {
388
395
  const date = fmtDateShort(r.created_at);
389
- out(`S#${r.id} 📋 ${date}${timeStr} ${truncate(r.request || r.completed || '(no summary)', 80)}`);
396
+ out(`S#${r.id} 📋 ${date}${timeStr} ${truncate(r.request || r.completed || '(no summary)', 80)}${tok(r)}`);
390
397
  } else if (r._source === 'prompt') {
391
398
  const date = fmtDateShort(r.created_at);
392
- out(`P#${r.id} 💬 ${date}${timeStr} ${truncate(r.prompt_text || '(empty)', 80)}`);
399
+ out(`P#${r.id} 💬 ${date}${timeStr} ${truncate(r.prompt_text || '(empty)', 80)}${tok(r)}`);
393
400
  } else {
394
401
  const date = fmtDateShort(r.created_at);
395
402
  const title = truncate(r.title || r.subtitle || '(untitled)', 80);
396
403
  const supersededTag = r.superseded ? ' [SUPERSEDED]' : '';
397
- out(`#${r.id} ${typeIcon(r.type)} ${date}${timeStr} ${title}${supersededTag}`);
404
+ out(`#${r.id} ${typeIcon(r.type)} ${date}${timeStr} ${title}${supersededTag}${tok(r)}`);
398
405
  if (r.lesson_learned) {
399
406
  out(` -> ${truncate(r.lesson_learned, 80)}`);
400
407
  }
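As a rough illustration of the `tok` helper added in this hunk, the snippet below renders two made-up rows, one enriched and one not. The row contents and the simplified line format are assumptions for the example; only the `tok` arrow function is copied from the diff.

```javascript
// Copied from the diff: appends the fetch-cost hint only when bodyTokens is set.
const tok = r => (r.bodyTokens ? ` ~${r.bodyTokens}t` : '');

// Hypothetical rows; real results carry many more fields.
const rows = [
  { id: 12, title: 'Fix FTS5 tokenizer', bodyTokens: 340 },
  { id: 13, title: 'Not enriched' }, // attachBodyTokens never ran on this row
];

// Simplified line format (the CLI also prints an icon, date, and tags).
const lines = rows.map(r => `#${r.id} ${r.title}${tok(r)}`);
// lines[0] === '#12 Fix FTS5 tokenizer ~340t'
// lines[1] === '#13 Not enriched'
```

Because the check is on truthiness, a row without `bodyTokens` prints no suffix at all rather than `~undefinedt`.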
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "claude-mem-lite",
3
- "version": "3.4.0",
3
+ "version": "3.5.0",
4
4
  "description": "Persistent long-term memory for Claude Code via MCP — captures coding decisions, bugfixes, and context across sessions. Hybrid FTS5 + TF-IDF search with episode batching. Single SQLite DB, no external services. A lighter, lower-cost alternative to claude-mem (episode batching + a smaller model; cost savings are an internal estimate, not a measured benchmark).",
5
5
  "type": "module",
6
6
  "packageManager": "npm@10.9.2",
package/search-engine.mjs CHANGED
@@ -9,7 +9,7 @@ import {
9
9
  OBS_BM25, TYPE_DECAY_CASE, TYPE_QUALITY_CASE,
10
10
  DEFAULT_DECAY_HALF_LIFE_MS,
11
11
  notLowSignalTitleClause, LOW_SIGNAL_TITLE,
12
- relaxFtsQueryToOr, debugLog, debugCatch,
12
+ relaxFtsQueryToOr, debugLog, debugCatch, estimateTokens,
13
13
  } from './utils.mjs';
14
14
  import { getVocabulary, computeVector, vectorSearch, rrfMerge } from './tfidf.mjs';
15
15
  import { extractPRFTerms, expandQueryByConcepts } from './server-internals.mjs';
@@ -190,6 +190,45 @@ export function ftsRowToResult(r, { scoreMultiplier, snippet } = {}) {
190
190
  };
191
191
  }
192
192
 
193
+ // Per-result estimate of the token cost to fetch the FULL body via mem_get, surfaced as the
194
+ // `~Nt` hint in search output so the agent can budget the 3-layer protocol (search → timeline →
195
+ // get) before paying to expand any ID. Adopted from thedotmack/claude-mem's token-cost column
196
+ // (reference_claude_mem_comparison) — the one genuinely portable idea from that analysis.
197
+ //
198
+ // Layer-1 search deliberately omits narrative/facts (that's what keeps the index light), so the
199
+ // heavy obs fields are batch-fetched by id HERE rather than carried on every result. The source
200
+ // key is read as `source || _source` because the two render paths disagree (#8654): MCP sets
201
+ // `source`+`text`, CLI sets `_source`+`prompt_text`. estimateTokens floors at 1, so a missing row
202
+ // or empty body yields 1 — never 0/NaN.
203
+ export function attachBodyTokens(db, results) {
204
+ if (!Array.isArray(results) || results.length === 0) return results;
205
+ const obsIds = results
206
+ .filter(r => (r.source || r._source) === 'obs' && Number.isInteger(r.id))
207
+ .map(r => r.id);
208
+ const bodyById = new Map();
209
+ if (obsIds.length > 0) {
210
+ try {
211
+ const ph = obsIds.map(() => '?').join(',');
212
+ const rows = db.prepare(`SELECT id, narrative, facts, text FROM observations WHERE id IN (${ph})`).all(...obsIds);
213
+ for (const row of rows) bodyById.set(row.id, row);
214
+ } catch (e) { debugCatch(e, 'attachBodyTokens'); }
215
+ }
216
+ for (const r of results) {
217
+ const src = r.source || r._source;
218
+ let parts;
219
+ if (src === 'obs') {
220
+ const row = bodyById.get(r.id) || {};
221
+ parts = [r.title, r.subtitle, r.lesson_learned, row.narrative, row.facts, row.text];
222
+ } else if (src === 'session') {
223
+ parts = [r.request, r.completed, r.working_on];
224
+ } else {
225
+ parts = [r.text, r.prompt_text];
226
+ }
227
+ r.bodyTokens = estimateTokens(parts.filter(Boolean).join(' '));
228
+ }
229
+ return results;
230
+ }
231
+
193
232
  function expandObsByConceptCo(db, ctx, now, existingIds, results, includeNoise = false) {
194
233
  const { ftsQuery, args, epochFrom, epochTo, limit } = ctx;
195
234
  if (results.length >= Math.ceil(limit / 2)) return;
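The batch-fetch pattern in `attachBodyTokens` can be exercised without a real database. The sketch below makes two loud assumptions: `estimateTokensSketch` uses a rough four-characters-per-token heuristic floored at 1 (the real `estimateTokens` in `utils.mjs` may compute this differently), and `makeFakeDb` is a hypothetical in-memory stand-in that only mimics the `prepare(sql).all(...ids)` call shape used in the diff.

```javascript
// Assumption: ~4 characters per token, floored at 1. Not the real estimateTokens.
function estimateTokensSketch(text) {
  return Math.max(1, Math.ceil((text || '').length / 4));
}

// Hypothetical stand-in for the db handle: ignores the SQL and filters by id.
function makeFakeDb(rows) {
  return {
    prepare(_sql) {
      return { all: (...ids) => rows.filter(row => ids.includes(row.id)) };
    },
  };
}

// Same shape as the function in the diff, using the sketch estimator.
function attachBodyTokensSketch(db, results) {
  if (!Array.isArray(results) || results.length === 0) return results;
  // Accept both key spellings: MCP sets `source`, the CLI sets `_source`.
  const obsIds = results
    .filter(r => (r.source || r._source) === 'obs' && Number.isInteger(r.id))
    .map(r => r.id);
  const bodyById = new Map();
  if (obsIds.length > 0) {
    // One query for the whole page instead of one query per row.
    const placeholders = obsIds.map(() => '?').join(',');
    const sql = `SELECT id, narrative, facts, text FROM observations WHERE id IN (${placeholders})`;
    for (const row of db.prepare(sql).all(...obsIds)) bodyById.set(row.id, row);
  }
  for (const r of results) {
    const src = r.source || r._source;
    let parts;
    if (src === 'obs') {
      const row = bodyById.get(r.id) || {};
      parts = [r.title, r.subtitle, r.lesson_learned, row.narrative, row.facts, row.text];
    } else if (src === 'session') {
      parts = [r.request, r.completed, r.working_on];
    } else {
      parts = [r.text, r.prompt_text];
    }
    r.bodyTokens = estimateTokensSketch(parts.filter(Boolean).join(' '));
  }
  return results;
}

const fakeDb = makeFakeDb([{ id: 1, narrative: 'x'.repeat(396), facts: null, text: null }]);
const page = attachBodyTokensSketch(fakeDb, [
  { source: 'obs', id: 1, title: 'abc' },         // MCP-style key
  { _source: 'prompt', id: 2, prompt_text: '' },  // CLI-style key, empty body
]);
// page[0].bodyTokens === 100  ('abc' + ' ' + 396 chars = 400 chars / 4)
// page[1].bodyTokens === 1    (empty body hits the floor)
```

Fetching the heavy fields by id for the final page only keeps the layer-1 search results light, which is the design choice the comment block in the diff describes.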
package/server.mjs CHANGED
@@ -9,7 +9,7 @@ import { truncate, typeIcon, inferProject, scrubSecrets, fmtDate, debugLog, debu
9
9
  import { resolveProject as _resolveProjectShared } from './project-utils.mjs';
10
10
  import { ensureDb, DB_PATH, DB_DIR, REGISTRY_DB_PATH } from './schema.mjs';
11
11
  import { reRankWithContext, markSuperseded, autoBoostIfNeeded, runIdleCleanup, buildServerInstructions } from './server-internals.mjs';
12
- import { searchObservationsHybrid, countSearchTotal } from './search-engine.mjs';
12
+ import { searchObservationsHybrid, countSearchTotal, attachBodyTokens } from './search-engine.mjs';
13
13
  import { deepSearch, resolveDeepMode, shouldEscalateToDeep, autoDeepLlmReady, hasEscalatableCorpus } from './deep-search.mjs';
14
14
  import { selectCompressionCandidates, groupByProjectWeek, compressGroup } from './lib/compress-core.mjs';
15
15
  import { resolveAnchorToken, formatAnchorError, resolveQueryAnchor, fetchRecentTimeline, fetchTimelineWindow } from './lib/timeline-core.mjs';
@@ -294,21 +294,24 @@ function formatSearchOutput(paginatedResults, args, ftsQuery, totalCount, orFall
294
294
  const fallbackHint = orFallbackFired && !args.or ? ' (relaxed AND→OR)' : '';
295
295
  lines.push(`Found ${countLabel} result(s)${qLabel}${fallbackHint}:${hasMixed ? ' (# observation, S# session, P# prompt)' : ''}\n`);
296
296
 
297
+ // `~Nt` = estimated tokens to fetch this row's full body via mem_get (attachBodyTokens).
298
+ // Conditional so a result that skipped enrichment renders cleanly, not "~undefinedt".
299
+ const tok = r => (r.bodyTokens ? ` ~${r.bodyTokens}t` : '');
297
300
  for (const r of paginatedResults) {
298
301
  if (r.source === 'obs') {
299
302
  const supersededTag = r.superseded ? ' [SUPERSEDED]' : '';
300
- lines.push(`#${r.id} ${typeIcon(r.type)} [${r.type}] ${truncate(r.title || r.subtitle || '(untitled)')} | ${r.project} | ${fmtDate(r.date)}${supersededTag}`);
303
+ lines.push(`#${r.id} ${typeIcon(r.type)} [${r.type}] ${truncate(r.title || r.subtitle || '(untitled)')} | ${r.project} | ${fmtDate(r.date)}${supersededTag}${tok(r)}`);
301
304
  if (r.snippet && r.snippet.length > 10 && r.snippet !== r.title) {
302
305
  lines.push(` ${truncate(r.snippet, 100)}`);
303
306
  }
304
307
  } else if (r.source === 'session') {
305
- lines.push(`S#${r.id} 📋 ${truncate(r.request || r.completed || '(no summary)')} | ${r.project} | ${fmtDate(r.date)}`);
308
+ lines.push(`S#${r.id} 📋 ${truncate(r.request || r.completed || '(no summary)')} | ${r.project} | ${fmtDate(r.date)}${tok(r)}`);
306
309
  } else if (r.source === 'prompt') {
307
- lines.push(`P#${r.id} 💬 ${truncate(r.text)} | ${fmtDate(r.date)}`);
310
+ lines.push(`P#${r.id} 💬 ${truncate(r.text)} | ${fmtDate(r.date)}${tok(r)}`);
308
311
  }
309
312
  }
310
313
 
311
- lines.push(`\nWorkflow: mem_timeline(anchor=ID) for context | mem_get(ids=[...]) for full details`);
314
+ lines.push(`\nWorkflow: mem_timeline(anchor=ID) for context | mem_get(ids=[...]) for full details · ~Nt = est. tokens to fetch full detail`);
312
315
  return { content: [{ type: 'text', text: lines.join('\n') }] };
313
316
  }
314
317
 
@@ -508,6 +511,9 @@ async function runSearchPipeline(db, args, { llm, rerankLlm } = {}) {
508
511
  }), results.length);
509
512
  // Always apply pagination — single-source results can exceed SQL LIMIT due to expansion (concept co-occurrence, PRF, vector search)
510
513
  const paginatedResults = (offset > 0 || results.length > limit) ? results.slice(offset, offset + limit) : results;
514
+ // Enrich the FINAL page with a fetch-cost estimate (~Nt) so the agent budgets before mem_get.
515
+ // Uses the same db threaded through the pipeline (#8743) — batch-fetches heavy obs fields by id.
516
+ attachBodyTokens(db, paginatedResults);
511
517
 
512
518
  // Observability: announce auto-escalation on stderr (parity with CLI deep note).
513
519
  if (escalated) process.stderr.write(`[mem] auto-escalated to deep search (weak results: ${escalatedObsCount} hits)\n`);