claude-mem-lite 2.17.1 → 2.19.0

@@ -10,7 +10,7 @@
10
10
  "plugins": [
11
11
  {
12
12
  "name": "claude-mem-lite",
13
- "version": "2.17.1",
13
+ "version": "2.19.0",
14
14
  "source": "./",
15
15
  "description": "Lightweight persistent memory system for Claude Code — FTS5 search, episode batching, error-triggered recall"
16
16
  }
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "claude-mem-lite",
3
- "version": "2.17.1",
3
+ "version": "2.19.0",
4
4
  "description": "Lightweight persistent memory system for Claude Code — FTS5 search, episode batching, error-triggered recall",
5
5
  "author": {
6
6
  "name": "sdsrss"
package/README.md CHANGED
@@ -100,6 +100,12 @@ The original sends **everything to the LLM and hopes it filters well**. claude-m
100
100
  - **LLM concurrency control** -- File-based semaphore limits background workers to 2 concurrent LLM calls, preventing resource contention
101
101
  - **stdin overflow protection** -- Hook input truncated at 256KB with regex-based action salvage for oversized tool outputs
102
102
  - **Cross-session handoff** -- Captures session state (request, completed work, next steps, key files) on `/clear` or `/exit`, then injects context when the next session detects continuation intent via explicit keywords or FTS5 term overlap
103
+ - **In-place observation updates** -- `mem_update` tool modifies existing observations atomically (field update + FTS text rebuild + vector re-computation in one transaction), preserving original IDs and references
104
+ - **Bulk export** -- `mem_export` tool exports observations as JSON or JSONL, with project/type/date filtering and 1000-row pagination cap with batch guidance
105
+ - **FTS integrity management** -- `mem_fts_check` tool verifies FTS5 index health or rebuilds indexes on demand, useful after database recovery or when search results seem wrong
106
+ - **Atomic multi-table writes** -- `saveObservation` wraps observations + observation_files + observation_vectors INSERTs in a single `db.transaction()`, preventing orphaned rows on crash
107
+ - **Modular NLP pipeline** -- Synonym maps, stop words, scoring constants, and query building extracted into focused modules (`synonyms.mjs`, `stop-words.mjs`, `scoring-sql.mjs`, `nlp.mjs`) for independent testing and maintenance
108
+ - **Porter-aligned PRF** -- Pseudo-relevance feedback terms are now stemmed with the same Porter algorithm used by FTS5, ensuring PRF expansion terms match the search index
103
109
 
104
110
  ## Platform Support
105
111
 
@@ -148,7 +154,7 @@ Source files stay in the cloned repo. Update via `git pull && node install.mjs i
148
154
  ### What happens during installation
149
155
 
150
156
  1. **Install dependencies** -- `npm install --omit=dev` (compiles native `better-sqlite3`)
151
- 2. **Register MCP server** -- `mem` server with 9 tools (search, timeline, get, save, stats, delete, compress, maintain, registry)
157
+ 2. **Register MCP server** -- `mem` server with 12 tools (search, timeline, get, save, update, stats, delete, compress, maintain, registry, export, fts_check)
152
158
  3. **Configure hooks** -- `PostToolUse`, `SessionStart`, `Stop`, `UserPromptSubmit` lifecycle hooks
153
159
  4. **Create data directory** -- `~/.claude-mem-lite/` (hidden) for database, runtime, and managed resource files
154
160
  5. **Auto-migrate** -- If `~/.claude-mem/` (original claude-mem) or `~/claude-mem-lite/` (pre-v0.5 unhidden) exists, migrates database and runtime files to `~/.claude-mem-lite/`, preserving the original untouched
@@ -204,10 +210,13 @@ rm -rf ~/claude-mem-lite/ # pre-v0.5 unhidden (if not auto-moved)
204
210
  | `mem_timeline` | Browse observations chronologically around an anchor point. |
205
211
  | `mem_get` | Retrieve full details for specific observation IDs (includes importance and related_ids). |
206
212
  | `mem_save` | Manually save a memory/observation. |
213
+ | `mem_update` | Update an existing observation in-place. Preserves original ID and references. |
207
214
  | `mem_stats` | View statistics: counts, type distribution, top projects, daily activity. |
208
215
  | `mem_delete` | Delete observations by ID with preview/confirm workflow. FTS5 cleanup is automatic. |
209
216
  | `mem_compress` | Compress old low-value observations into weekly summaries to reduce noise. |
210
217
  | `mem_maintain` | Memory maintenance: scan for duplicates/stale/broken items, then execute cleanup/dedup/rebuild_vectors operations. |
218
+ | `mem_export` | Export observations as JSON or JSONL for backup or migration. Filters by project, type, date range. |
219
+ | `mem_fts_check` | Check FTS5 index integrity or rebuild indexes. Use when search results seem wrong or after DB recovery. |
211
220
  | `mem_registry` | Manage resource registry: search for skills/agents by need, list resources, view stats, import/remove tools, reindex. |
212
221
 
213
222
  ### Skill Commands (in Claude Code chat)
@@ -238,7 +247,8 @@ Five core tables with FTS5 virtual tables for search:
238
247
  id, memory_session_id, project, type, title, subtitle,
239
248
  text, narrative, concepts, facts, files_read, files_modified,
240
249
  importance, related_ids, created_at, created_at_epoch,
241
- lesson_learned, minhash_sig, access_count, compressed_into, search_aliases
250
+ lesson_learned, minhash_sig, access_count, compressed_into, search_aliases,
251
+ branch, superseded_at, superseded_by, last_accessed_at
242
252
  ```
243
253
 
244
254
  **session_summaries** -- LLM-generated session summaries
@@ -265,6 +275,11 @@ project, type, session_id, working_on, completed, unfinished,
265
275
  key_files, key_decisions, match_keywords, created_at_epoch
266
276
  ```
267
277
 
278
+ **observation_files** -- Normalized file membership for efficient file-based recall
279
+ ```
280
+ obs_id, filename
281
+ ```
282
+
268
283
  **observation_vectors** -- TF-IDF vector embeddings for hybrid search
269
284
  ```
270
285
  observation_id, vector (BLOB Float32Array), vocab_version, created_at_epoch
@@ -422,7 +437,16 @@ claude-mem-lite/
422
437
  tool-schemas.mjs # Shared Zod schemas for MCP tool validation
423
438
  tfidf.mjs # TF-IDF vector engine: tokenization, vocabulary building, vector computation, cosine similarity, RRF merge
424
439
  tier.mjs # Temporal tier system: activity-based time window classification
425
- utils.mjs # Shared utilities: FTS5 query building, BM25 weight constants, MinHash dedup, secret scrubbing, CJK synonym extraction
440
+ utils.mjs # Re-export hub: backward-compatible surface for all utility modules
441
+ nlp.mjs # FTS5 query building: synonym expansion, CJK bigrams, sanitization
442
+ scoring-sql.mjs # BM25 weight constants and type-differentiated decay half-lives
443
+ stop-words.mjs # Shared base stop-word set for all NLP/search modules
444
+ synonyms.mjs # Unified synonym source: SYNONYM_MAP (bidirectional) + DISPATCH_SYNONYMS
445
+ project-utils.mjs # Shared project name resolution with in-process cache
446
+ secret-scrub.mjs # API key, token, PEM, and credential pattern redaction
447
+ format-utils.mjs # String formatting: truncate, typeIcon, date/time/week formatting
448
+ hash-utils.mjs # MinHash signatures, Jaccard similarity for dedup
449
+ bash-utils.mjs # Bash output significance detection: errors, tests, builds, deploys
426
450
  # Resource registry
427
451
  registry.mjs # Resource registry DB: schema, CRUD, FTS5, invocation tracking
428
452
  registry-retriever.mjs # FTS5 retrieval with synonym expansion and composite scoring
package/README.zh-CN.md CHANGED
@@ -144,7 +144,7 @@ node install.mjs install
144
144
  ### 安装过程
145
145
 
146
146
  1. **安装依赖** -- `npm install --omit=dev`(编译原生 `better-sqlite3`)
147
- 2. **注册 MCP 服务器** -- `mem` 服务器,包含 7 个工具(search、timeline、get、save、stats、delete、compress)
147
+ 2. **注册 MCP 服务器** -- `mem` 服务器,包含 12 个工具(search、timeline、get、save、update、stats、delete、compress、maintain、registry、export、fts_check
148
148
  3. **配置钩子** -- `PostToolUse`、`PreToolUse`、`SessionStart`、`Stop`、`UserPromptSubmit` 生命周期钩子
149
149
  4. **创建数据目录** -- `~/.claude-mem-lite/`(隐藏目录),存放数据库、运行时和托管资源文件
150
150
  5. **自动迁移** -- 自动检测 `~/.claude-mem/`(原版 claude-mem)或 `~/claude-mem-lite/`(v0.5 前的非隐藏目录),将数据库和运行时文件迁移到 `~/.claude-mem-lite/`,原目录保持不变
@@ -200,9 +200,14 @@ rm -rf ~/claude-mem-lite/ # v0.5 前的非隐藏目录(如未自动迁移)
200
200
  | `mem_timeline` | 围绕锚点按时间顺序浏览观察。 |
201
201
  | `mem_get` | 获取指定观察 ID 的完整详情(包含重要度和关联 ID)。 |
202
202
  | `mem_save` | 手动保存记忆/观察。 |
203
+ | `mem_update` | 原地更新已有观察,保留原始 ID 和引用关系。 |
203
204
  | `mem_stats` | 查看统计:计数、类型分布、热门项目、每日活动。 |
204
205
  | `mem_delete` | 按 ID 删除观察,支持预览/确认工作流。FTS5 自动清理。 |
205
206
  | `mem_compress` | 压缩旧的低价值观察为每周摘要,减少噪声。 |
207
+ | `mem_maintain` | 记忆维护:扫描重复/过期/损坏条目,执行清理/去重/向量重建操作。 |
208
+ | `mem_export` | 导出观察为 JSON 或 JSONL 格式,支持按项目、类型、日期范围过滤。 |
209
+ | `mem_fts_check` | 检查 FTS5 索引完整性或重建索引。搜索结果异常或数据库恢复后使用。 |
210
+ | `mem_registry` | 管理资源注册表:按需搜索技能/代理、列表、统计、导入/移除、重索引。 |
206
211
 
207
212
  ### 技能命令(在 Claude Code 聊天中使用)
208
213
 
@@ -441,7 +446,16 @@ claude-mem-lite/
441
446
  hook-semaphore.mjs # LLM 并发控制:基于文件的信号量
442
447
  schema.mjs # 数据库 schema:表、迁移、FTS5 的单一事实来源
443
448
  tool-schemas.mjs # 共享 Zod schema,用于 MCP 工具校验
444
- utils.mjs # 共享工具:FTS5 查询构建、MinHash 去重、秘密擦除
449
+ utils.mjs # 重导出中心:所有工具模块的向后兼容入口
450
+ nlp.mjs # FTS5 查询构建:同义词扩展、CJK 二元组、查询清洗
451
+ scoring-sql.mjs # BM25 权重常量和类型差异化衰减半衰期
452
+ stop-words.mjs # 共享基础停用词集
453
+ synonyms.mjs # 统一同义词源:SYNONYM_MAP(双向)+ DISPATCH_SYNONYMS
454
+ project-utils.mjs # 共享项目名解析(含进程内缓存)
455
+ secret-scrub.mjs # API 密钥、令牌、PEM 证书等凭据模式擦除
456
+ format-utils.mjs # 字符串格式化:截断、类型图标、日期/时间格式化
457
+ hash-utils.mjs # MinHash 签名、Jaccard 相似度(去重用)
458
+ bash-utils.mjs # Bash 输出显著性检测:错误、测试、构建、部署
445
459
  # 智能调度
446
460
  dispatch.mjs # 三级调度编排:快速过滤、上下文信号、FTS5、Haiku
447
461
  dispatch-inject.mjs # 注入模板渲染:skill/agent 推荐
package/bash-utils.mjs ADDED
@@ -0,0 +1,109 @@
1
+ // claude-mem-lite: Bash command analysis and file path extraction
2
+ // Extracted from utils.mjs for focused responsibility
3
+
4
+ import { basename } from 'path';
5
+
6
+ /**
7
+ * Detect significance signals in a Bash command and its response.
8
+ * Checks for errors, test runs, builds, git operations, and deployments.
9
+ * @param {object} input Tool input with command field
10
+ * @param {string} response Command output text
11
+ * @returns {{isError: boolean, isTest: boolean, isBuild: boolean, isGit: boolean, isDeploy: boolean, isSignificant: boolean}}
12
+ */
13
+ export function detectBashSignificance(input, response) {
14
+ const cmd = (input.command || '').toLowerCase();
15
+ // Skip error keyword matching when the command is a read/search operation
16
+ // (grep output naturally contains matched keywords like "error")
17
+ const isSearchCmd = /\b(grep|rg|ag|ack|cat|head|tail|less|more|find|locate|wc|file|which|type)\b/i.test(cmd);
18
+ const isError = !isSearchCmd
19
+ && /\berror\b|\bERR!|fail(ed|ure)?|exception|panic|traceback|errno|enoent|command not found/i.test(response)
20
+ && response.length > 15;
21
+ // Match actual test runner invocations, not commands that merely reference "test" as a keyword
22
+ const isTest = /\b(npm\s+test|npm\s+run\s+test|yarn\s+test|pnpm\s+test|pnpm\s+run\s+test|bun\s+test|go\s+test|cargo\s+test)\b/i.test(cmd)
23
+ || /\b(jest|pytest|vitest|mocha|cypress|playwright)\b/i.test(cmd);
24
+ const isBuild = /\b(build|compile|tsc|webpack|vite|rollup|esbuild|make|cargo)\b/i.test(cmd);
25
+ const isGit = /\bgit\s+(commit|merge|rebase|cherry-pick|push)\b/i.test(cmd);
26
+ const isDeploy = /\b(deploy|docker|kubectl|terraform)\b/i.test(cmd);
27
+ return {
28
+ isError, isTest, isBuild, isGit, isDeploy,
29
+ isSignificant: isError || isTest || isBuild || isGit || isDeploy,
30
+ };
31
+ }
32
+
33
+ const ERROR_STOP_WORDS = new Set([
34
+ 'error', 'failed', 'cannot', 'could', 'with', 'from', 'that', 'this',
35
+ 'have', 'been', 'were', 'does', 'will', 'would', 'should', 'must',
36
+ 'true', 'false', 'null', 'undefined', 'function', 'return', 'const',
37
+ 'node', 'require', 'stack', 'trace',
38
+ ]);
39
+
40
+ /**
41
+ * Extract discriminative keywords from a failed command and its error output.
42
+ * Filters out common stop words to produce useful FTS5 search terms.
43
+ * @param {string} cmd The command that was executed
44
+ * @param {string} response The error output text
45
+ * @returns {string[]|null} Array of 1-6 keywords or null if none found
46
+ */
47
+ export function extractErrorKeywords(cmd, response) {
48
+ const words = new Set();
49
+ const cmdParts = cmd.split(/[\s/\\|&;]+/).filter(w => w.length > 2 && !/^-/.test(w));
50
+ for (const w of cmdParts.slice(0, 3)) {
51
+ const lw = w.toLowerCase();
52
+ if (!ERROR_STOP_WORDS.has(lw)) words.add(lw);
53
+ }
54
+ const errLines = response.split('\n').filter(l =>
55
+ /error|fail|exception|cannot|not found|undefined|null/i.test(l)
56
+ ).slice(0, 3);
57
+ for (const line of errLines) {
58
+ const tokens = line.replace(/[^a-zA-Z0-9_.-]/g, ' ').split(/\s+/)
59
+ .filter(w => w.length > 3 && !/^\d+$/.test(w));
60
+ for (const t of tokens.slice(0, 5)) {
61
+ const lt = t.toLowerCase();
62
+ if (!ERROR_STOP_WORDS.has(lt)) words.add(lt);
63
+ }
64
+ }
65
+ const result = [...words].slice(0, 6);
66
+ return result.length >= 1 ? result : null;
67
+ }
68
+
69
+ // ─── File Paths ──────────────────────────────────────────────────────────────
70
+
71
+ /**
72
+ * Extract file paths from tool input (file_path, path, filePath, or command args).
73
+ * Deduplicates and excludes /dev/, /proc/, and /tmp/ paths.
74
+ * @param {object} input Tool input object
75
+ * @returns {string[]} Unique array of file paths
76
+ */
77
+ export function extractFilePaths(input) {
78
+ const paths = [];
79
+ if (input.file_path) paths.push(input.file_path);
80
+ if (input.path) paths.push(input.path);
81
+ if (input.filePath) paths.push(input.filePath);
82
+ if (input.command) {
83
+ // Match absolute paths; extension optional to support Makefile, Dockerfile etc.
84
+ const match = input.command.match(/(?:^|\s)(\/[\w./-]+\w)/g);
85
+ if (match) {
86
+ for (const m of match) {
87
+ const p = m.trim();
88
+ if (!p.startsWith('/dev/') && !p.startsWith('/proc/') && !p.startsWith('/tmp/')
89
+ // Skip single-component paths like /exit, /clear — likely slash commands, not files
90
+ && (p.indexOf('/', 1) !== -1 || /\.\w+$/.test(p))) {
91
+ paths.push(p);
92
+ }
93
+ }
94
+ }
95
+ }
96
+ return [...new Set(paths)];
97
+ }
98
+
99
+ // ─── Episode Logic ───────────────────────────────────────────────────────────
100
+
101
+ /**
102
+ * Strip test/spec/e2e suffixes from a filename for sibling matching.
103
+ * Example: auth.test.ts → auth.ts, auth.spec.js → auth.js
104
+ * @param {string} filePath File path to strip
105
+ * @returns {string} Basename with test suffix removed
106
+ */
107
+ export function stripTestSuffix(filePath) {
108
+ return basename(filePath).replace(/\.(test|spec|e2e)\./i, '.');
109
+ }
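
Reviewer note: a minimal usage sketch of the search-command guard in `detectBashSignificance`. The function body is reproduced from the diff above (without `export`, so the snippet runs standalone); the sample commands and outputs are invented for illustration and are not taken from the package's tests.

```javascript
// Reproduced from the bash-utils.mjs diff above, minus `export`.
function detectBashSignificance(input, response) {
  const cmd = (input.command || '').toLowerCase();
  // Read/search commands echo matched keywords such as "error", so skip error detection for them.
  const isSearchCmd = /\b(grep|rg|ag|ack|cat|head|tail|less|more|find|locate|wc|file|which|type)\b/i.test(cmd);
  const isError = !isSearchCmd
    && /\berror\b|\bERR!|fail(ed|ure)?|exception|panic|traceback|errno|enoent|command not found/i.test(response)
    && response.length > 15;
  const isTest = /\b(npm\s+test|npm\s+run\s+test|yarn\s+test|pnpm\s+test|pnpm\s+run\s+test|bun\s+test|go\s+test|cargo\s+test)\b/i.test(cmd)
    || /\b(jest|pytest|vitest|mocha|cypress|playwright)\b/i.test(cmd);
  const isBuild = /\b(build|compile|tsc|webpack|vite|rollup|esbuild|make|cargo)\b/i.test(cmd);
  const isGit = /\bgit\s+(commit|merge|rebase|cherry-pick|push)\b/i.test(cmd);
  const isDeploy = /\b(deploy|docker|kubectl|terraform)\b/i.test(cmd);
  return {
    isError, isTest, isBuild, isGit, isDeploy,
    isSignificant: isError || isTest || isBuild || isGit || isDeploy,
  };
}

// Illustrative inputs (assumptions, not package fixtures).
const grepResult = detectBashSignificance(
  { command: 'grep -rn "error" src/' },
  'src/app.js:12: throw new Error("error loading config")'
);
const testResult = detectBashSignificance(
  { command: 'npm test' },
  'Tests failed: 2 of 10 did not pass'
);

console.log(grepResult.isError, grepResult.isSignificant); // false false
console.log(testResult.isError, testResult.isTest);        // true true
```

Because the guard matches any search-like word anywhere in the command, a compound command that mentions `cat` or `type` would also skip error detection; that appears to be an accepted trade-off rather than an oversight.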
package/format-utils.mjs ADDED
@@ -0,0 +1,71 @@
1
+ // claude-mem-lite: String formatting and display utilities
2
+ // Extracted from utils.mjs for focused responsibility
3
+
4
+ /**
5
+ * Truncate a string to a maximum length, replacing newlines with spaces.
6
+ * @param {string} str Input string
7
+ * @param {number} [max=80] Maximum character length
8
+ * @returns {string} Truncated string with ellipsis if needed
9
+ */
10
+ export function truncate(str, max = 80) {
11
+ if (!str) return '';
12
+ str = str.replace(/\n/g, ' ').trim();
13
+ return str.length > max ? str.slice(0, max - 1) + '\u2026' : str;
14
+ }
15
+
16
+ /**
17
+ * Map observation type to its display emoji icon.
18
+ * @param {string} type Observation type (decision, bugfix, feature, etc.)
19
+ * @returns {string} Emoji icon for the type
20
+ */
21
+ export function typeIcon(type) {
22
+ const icons = { decision: '\uD83D\uDFE1', bugfix: '\uD83D\uDD34', feature: '\uD83D\uDFE2', refactor: '\uD83D\uDD35', discovery: '\uD83D\uDD0D', change: '\uD83D\uDCDD' };
23
+ return icons[type] || '\u26AA';
24
+ }
25
+
26
+ // ─── Date Formatting ─────────────────────────────────────────────────────────
27
+
28
+ const MONTHS = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'];
29
+
30
+ /**
31
+ * Format an ISO date string as "Mon DD HH:MM" for compact display.
32
+ * @param {string} iso ISO 8601 date string
33
+ * @returns {string} Formatted date or empty string
34
+ */
35
+ export function fmtDate(iso) {
36
+ if (!iso) return '';
37
+ const d = new Date(iso);
38
+ const mon = MONTHS[d.getUTCMonth()];
39
+ const day = d.getUTCDate();
40
+ const h = String(d.getUTCHours()).padStart(2, '0');
41
+ const m = String(d.getUTCMinutes()).padStart(2, '0');
42
+ return `${mon} ${day} ${h}:${m}`;
43
+ }
44
+
45
+ /**
46
+ * Format an ISO date string as "HH:MM" for time-only display.
47
+ * @param {string} iso ISO 8601 date string
48
+ * @returns {string} Formatted time or empty string
49
+ */
50
+ export function fmtTime(iso) {
51
+ if (!iso) return '';
52
+ const d = new Date(iso);
53
+ return `${String(d.getUTCHours()).padStart(2, '0')}:${String(d.getUTCMinutes()).padStart(2, '0')}`;
54
+ }
55
+
56
+ // ─── ISO Week ────────────────────────────────────────────────────────────────
57
+
58
+ /**
59
+ * Convert an epoch timestamp to an ISO week key string (e.g. "2026-W06").
60
+ * @param {number} epochMs Epoch timestamp in milliseconds
61
+ * @returns {string} ISO week key in format "YYYY-Wnn"
62
+ */
63
+ export function isoWeekKey(epochMs) {
64
+ const d = new Date(epochMs);
65
+ const tmp = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
66
+ tmp.setUTCDate(tmp.getUTCDate() + 4 - (tmp.getUTCDay() || 7));
67
+ const yearStart = new Date(Date.UTC(tmp.getUTCFullYear(), 0, 1));
68
+ const weekNum = Math.ceil(((tmp - yearStart) / 86400000 + 1) / 7);
69
+ const isoYear = tmp.getUTCFullYear();
70
+ return `${isoYear}-W${String(weekNum).padStart(2, '0')}`;
71
+ }
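
Reviewer note: a small worked example of `isoWeekKey`, assuming the function body shown in the diff above (reproduced without `export`). The dates are chosen for illustration; they show that the ISO year in the key can differ from the calendar year near year boundaries.

```javascript
// Reproduced from the format-utils.mjs diff above, minus `export`.
function isoWeekKey(epochMs) {
  const d = new Date(epochMs);
  const tmp = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
  // Shift to the Thursday of the same ISO week; that Thursday decides the ISO year.
  tmp.setUTCDate(tmp.getUTCDate() + 4 - (tmp.getUTCDay() || 7));
  const yearStart = new Date(Date.UTC(tmp.getUTCFullYear(), 0, 1));
  const weekNum = Math.ceil(((tmp - yearStart) / 86400000 + 1) / 7);
  const isoYear = tmp.getUTCFullYear();
  return `${isoYear}-W${String(weekNum).padStart(2, '0')}`;
}

// Thursday 2026-02-05 falls in the sixth ISO week of 2026.
console.log(isoWeekKey(Date.UTC(2026, 1, 5)));   // 2026-W06

// Monday 2025-12-29 already belongs to ISO week 1 of 2026.
console.log(isoWeekKey(Date.UTC(2025, 11, 29))); // 2026-W01

// Friday 2027-01-01 still belongs to ISO week 53 of 2026.
console.log(isoWeekKey(Date.UTC(2027, 0, 1)));   // 2026-W53
```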
package/hash-utils.mjs ADDED
@@ -0,0 +1,77 @@
1
+ // claude-mem-lite: Hashing and similarity utilities
2
+ // Extracted from utils.mjs for focused responsibility
3
+
4
+ /**
5
+ * Compute word-level Jaccard similarity between two strings.
6
+ * @param {string} a First string
7
+ * @param {string} b Second string
8
+ * @returns {number} Similarity score between 0 and 1
9
+ */
10
+ export function jaccardSimilarity(a, b) {
11
+ if (!a || !b) return 0;
12
+ // Strip trailing punctuation from tokens to match MinHash normalization
13
+ // (prevents "server.rs," ≠ "server.rs" dedup failures)
14
+ const norm = s => s.toLowerCase().split(/\s+/).map(t => t.replace(/[,;:!?]+$/, ''));
15
+ const setA = new Set(norm(a));
16
+ const setB = new Set(norm(b));
17
+ let intersection = 0;
18
+ for (const w of setA) { if (setB.has(w)) intersection++; }
19
+ const union = setA.size + setB.size - intersection;
20
+ return union === 0 ? 0 : intersection / union;
21
+ }
22
+
23
+ // ─── MinHash Signatures ──────────────────────────────────────────────────
24
+
25
+ // FNV-1a hash: fast, non-cryptographic, ~10x faster than SHA-256 for MinHash
26
+ function fnv1a(str) {
27
+ let hash = 0x811c9dc5; // FNV offset basis (32-bit)
28
+ for (let i = 0; i < str.length; i++) {
29
+ hash ^= str.charCodeAt(i);
30
+ hash = Math.imul(hash, 0x01000193); // FNV prime
31
+ hash >>>= 0; // Keep as uint32
32
+ }
33
+ return hash;
34
+ }
35
+
36
+ /**
37
+ * Compute a MinHash signature for approximate set similarity.
38
+ * Returns null for texts with fewer than 3 tokens.
39
+ * @param {string} text Input text to hash
40
+ * @param {number} [numHashes=64] Number of hash functions
41
+ * @returns {string|null} Hex-encoded MinHash signature or null
42
+ */
43
+ export function computeMinHash(text, numHashes = 64) {
44
+ if (!text || typeof text !== 'string') return null;
45
+ const tokens = text.toLowerCase().replace(/[^a-z0-9\s]/g, ' ').split(/\s+/)
46
+ .filter(t => t.length > 2);
47
+ // Require at least 3 tokens for meaningful signature (avoids high collision on short texts)
48
+ if (tokens.length < 3) return null;
49
+
50
+ const mins = new Array(numHashes).fill(0xFFFFFFFF);
51
+ for (const token of tokens) {
52
+ for (let i = 0; i < numHashes; i++) {
53
+ const val = fnv1a(`${i}-${token}`);
54
+ if (val < mins[i]) mins[i] = val;
55
+ }
56
+ }
57
+ return mins.map(v => v.toString(16).padStart(8, '0')).join('');
58
+ }
59
+
60
+ /**
61
+ * Estimate Jaccard similarity from two MinHash signatures.
62
+ * @param {string} sig1 First hex-encoded MinHash signature
63
+ * @param {string} sig2 Second hex-encoded MinHash signature
64
+ * @returns {number} Estimated Jaccard similarity between 0 and 1
65
+ */
66
+ export function estimateJaccardFromMinHash(sig1, sig2) {
67
+ if (!sig1 || !sig2) return 0;
68
+ if (sig1.length !== sig2.length) return 0;
69
+ const numHashes = sig1.length / 8;
70
+ if (numHashes === 0) return 0;
71
+ let matches = 0;
72
+ for (let i = 0; i < numHashes; i++) {
73
+ const offset = i * 8;
74
+ if (sig1.slice(offset, offset + 8) === sig2.slice(offset, offset + 8)) matches++;
75
+ }
76
+ return matches / numHashes;
77
+ }
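
Reviewer note: a minimal sketch of how the MinHash helpers fit together, assuming the function bodies shown in the diff above (reproduced without `export`). The sample strings are invented; the estimate for two different texts depends on the hash slots, so no exact value is claimed for it.

```javascript
// Reproduced from the hash-utils.mjs diff above, minus `export`.
function fnv1a(str) {
  let hash = 0x811c9dc5; // FNV offset basis (32-bit)
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime
    hash >>>= 0; // keep as uint32
  }
  return hash;
}

function computeMinHash(text, numHashes = 64) {
  if (!text || typeof text !== 'string') return null;
  const tokens = text.toLowerCase().replace(/[^a-z0-9\s]/g, ' ').split(/\s+/)
    .filter(t => t.length > 2);
  if (tokens.length < 3) return null;
  const mins = new Array(numHashes).fill(0xFFFFFFFF);
  for (const token of tokens) {
    for (let i = 0; i < numHashes; i++) {
      // The slot index is mixed into the input to simulate numHashes distinct hash functions.
      const val = fnv1a(`${i}-${token}`);
      if (val < mins[i]) mins[i] = val;
    }
  }
  return mins.map(v => v.toString(16).padStart(8, '0')).join('');
}

function estimateJaccardFromMinHash(sig1, sig2) {
  if (!sig1 || !sig2) return 0;
  if (sig1.length !== sig2.length) return 0;
  const numHashes = sig1.length / 8;
  if (numHashes === 0) return 0;
  let matches = 0;
  for (let i = 0; i < numHashes; i++) {
    const offset = i * 8;
    if (sig1.slice(offset, offset + 8) === sig2.slice(offset, offset + 8)) matches++;
  }
  return matches / numHashes;
}

// Illustrative inputs (assumptions, not package fixtures).
const sigServer = computeMinHash('fix auth token refresh bug in server');
const sigClient = computeMinHash('fix auth token refresh bug in client');

console.log(sigServer.length);                                 // 512 (64 slots x 8 hex chars)
console.log(estimateJaccardFromMinHash(sigServer, sigServer)); // 1
console.log(computeMinHash('too short'));                      // null (fewer than 3 usable tokens)

// Over the tokens MinHash keeps (length > 2), the exact Jaccard here is 5/7;
// the signature comparison only approximates it.
console.log(estimateJaccardFromMinHash(sigServer, sigClient));
```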
package/hook-llm.mjs CHANGED
@@ -27,6 +27,11 @@ function buildFtsTextField(obs) {
27
27
  return { conceptsText, factsText, textField: [conceptsText, factsText, aliasesText, bigramText].filter(Boolean).join(' ') };
28
28
  }
29
29
 
30
+ /**
31
+ * Save an observation to the database with three-tier dedup.
32
+ * @returns {number|null} The saved observation ID, or null if deduped.
33
+ * Throws on DB error (callers should catch if needed).
34
+ */
30
35
  export function saveObservation(obs, projectOverride, sessionIdOverride, externalDb) {
31
36
  const db = externalDb || openDb();
32
37
  if (!db) return null;
@@ -41,7 +46,7 @@ export function saveObservation(obs, projectOverride, sessionIdOverride, externa
41
46
  VALUES (?, ?, ?, ?, ?, 'active')
42
47
  `).run(sessionId, sessionId, project, now.toISOString(), now.getTime());
43
48
 
44
- // Three-tier dedup
49
+ // Three-tier dedup — returns null (not throw) for dedup hits
45
50
  // Tier 1 (fast): 5-min Jaccard on titles
46
51
  const fiveMinAgo = now.getTime() - DEDUP_WINDOW_MS;
47
52
  const recent = db.prepare(`
@@ -51,7 +56,7 @@ export function saveObservation(obs, projectOverride, sessionIdOverride, externa
51
56
  `).all(project, fiveMinAgo);
52
57
 
53
58
  if (obs.title && recent.some(r => jaccardSimilarity(r.title, obs.title) > 0.7)) {
54
- return null;
59
+ return null; // dedup: Jaccard title match
55
60
  }
56
61
 
57
62
  // Tier 1.5: Extended title dedup for low-signal degraded titles
@@ -68,7 +73,7 @@ export function saveObservation(obs, projectOverride, sessionIdOverride, externa
68
73
  WHERE project = ? AND title = ? AND created_at_epoch > ? AND created_at_epoch <= ?
69
74
  LIMIT 1
70
75
  `).get(project, obs.title, sevenDaysAgo, fiveMinAgo);
71
- if (exactDup) return null;
76
+ if (exactDup) return null; // dedup: exact title match
72
77
  // Phase 2: Jaccard similarity for near-duplicates (3-day window)
73
78
  const extRecent = db.prepare(`
74
79
  SELECT title FROM observations
@@ -76,7 +81,7 @@ export function saveObservation(obs, projectOverride, sessionIdOverride, externa
76
81
  ORDER BY created_at_epoch DESC LIMIT 60
77
82
  `).all(project, threeDaysAgo, fiveMinAgo);
78
83
  if (extRecent.some(r => jaccardSimilarity(r.title, obs.title) > 0.85)) {
79
- return null;
84
+ return null; // dedup: low-signal Jaccard match
80
85
  }
81
86
  }
82
87
 
@@ -91,44 +96,57 @@ export function saveObservation(obs, projectOverride, sessionIdOverride, externa
91
96
  `).all(project, sevenDaysAgo);
92
97
 
93
98
  if (recentSigs.some(r => estimateJaccardFromMinHash(minhashSig, r.minhash_sig) > 0.8)) {
94
- return null;
99
+ return null; // dedup: MinHash similarity match
95
100
  }
96
101
  }
97
102
 
98
103
  const { conceptsText, factsText, textField } = buildFtsTextField(obs);
99
104
 
100
- const result = db.prepare(`
101
- INSERT INTO observations (memory_session_id, project, text, type, title, subtitle, narrative, concepts, facts, files_read, files_modified, importance, minhash_sig, lesson_learned, search_aliases, branch, created_at, created_at_epoch)
102
- VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
103
- `).run(
104
- sessionId, project,
105
- textField, obs.type, obs.title, obs.subtitle || '',
106
- obs.narrative || '',
107
- conceptsText,
108
- factsText,
109
- JSON.stringify(obs.filesRead || []),
110
- JSON.stringify(obs.files || []),
111
- obs.importance ?? 1,
112
- minhashSig,
113
- obs.lessonLearned || null,
114
- obs.searchAliases || null,
115
- getCurrentBranch(),
116
- now.toISOString(), now.getTime()
117
- );
118
- const savedId = Number(result.lastInsertRowid);
119
-
120
- // Write TF-IDF vector (non-critical)
121
- try {
122
- const vocab = getVocabulary(db);
123
- if (vocab) {
124
- const vecText = [obs.title || '', obs.narrative || '', (Array.isArray(obs.concepts) ? obs.concepts.join(' ') : '')].filter(Boolean).join(' ');
125
- const vec = computeVector(vecText, vocab);
126
- if (vec) {
127
- db.prepare('INSERT OR REPLACE INTO observation_vectors (observation_id, vector, vocab_version, created_at_epoch) VALUES (?, ?, ?, ?)')
128
- .run(savedId, Buffer.from(vec.buffer), vocab.version, Date.now());
105
+ // Atomic: observation INSERT + observation_files + vector in one transaction
106
+ const savedId = db.transaction(() => {
107
+ const result = db.prepare(`
108
+ INSERT INTO observations (memory_session_id, project, text, type, title, subtitle, narrative, concepts, facts, files_read, files_modified, importance, minhash_sig, lesson_learned, search_aliases, branch, created_at, created_at_epoch)
109
+ VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
110
+ `).run(
111
+ sessionId, project,
112
+ textField, obs.type, obs.title, obs.subtitle || '',
113
+ obs.narrative || '',
114
+ conceptsText,
115
+ factsText,
116
+ JSON.stringify(obs.filesRead || []),
117
+ JSON.stringify(obs.files || []),
118
+ obs.importance ?? 1,
119
+ minhashSig,
120
+ obs.lessonLearned || null,
121
+ obs.searchAliases || null,
122
+ getCurrentBranch(),
123
+ now.toISOString(), now.getTime()
124
+ );
125
+ const id = Number(result.lastInsertRowid);
126
+
127
+ // Populate observation_files junction table
128
+ if (id && obs.files && obs.files.length > 0) {
129
+ const insertFile = db.prepare('INSERT OR IGNORE INTO observation_files (obs_id, filename) VALUES (?, ?)');
130
+ for (const f of obs.files) {
131
+ if (typeof f === 'string' && f.length > 0) insertFile.run(id, f);
129
132
  }
130
133
  }
131
- } catch (e) { debugCatch(e, 'saveObservation-vector'); }
134
+
135
+ // Write TF-IDF vector (non-critical — catch inside transaction to avoid rollback)
136
+ try {
137
+ const vocab = getVocabulary(db);
138
+ if (vocab) {
139
+ const vecText = [obs.title || '', obs.narrative || '', (Array.isArray(obs.concepts) ? obs.concepts.join(' ') : '')].filter(Boolean).join(' ');
140
+ const vec = computeVector(vecText, vocab);
141
+ if (vec) {
142
+ db.prepare('INSERT OR REPLACE INTO observation_vectors (observation_id, vector, vocab_version, created_at_epoch) VALUES (?, ?, ?, ?)')
143
+ .run(id, Buffer.from(vec.buffer), vocab.version, Date.now());
144
+ }
145
+ }
146
+ } catch (e) { debugCatch(e, 'saveObservation-vector'); }
147
+
148
+ return id;
149
+ })();
132
150
 
133
151
  return savedId;
134
152
  } finally {
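
Reviewer note: the sketch below illustrates the semantics this hunk relies on, where a throw inside the transaction function undoes every write while an error caught inside it does not. It uses a hypothetical in-memory stand-in (`transaction`, `createStore`, and `save` are all invented here), not better-sqlite3 itself, so it only models the behavior rather than exercising the real driver.

```javascript
// Hypothetical in-memory stand-in for a transactional database (NOT better-sqlite3):
// it snapshots the store on entry and restores it if the wrapped function throws.
function transaction(store, fn) {
  return (...args) => {
    const snapshot = JSON.parse(JSON.stringify(store));
    try {
      return fn(...args);
    } catch (err) {
      for (const key of Object.keys(store)) store[key] = snapshot[key];
      throw err; // roll back, then propagate to the caller
    }
  };
}

function createStore() {
  return { observations: [], observation_files: [], observation_vectors: [] };
}

// Mirrors the shape of the new saveObservation body: two critical writes,
// then a non-critical vector write whose failure is caught inside the transaction.
function save(store, obs, computeVector) {
  return transaction(store, () => {
    const id = store.observations.length + 1;
    store.observations.push({ id, title: obs.title });
    for (const f of obs.files) store.observation_files.push({ obs_id: id, filename: f });
    try {
      store.observation_vectors.push({ observation_id: id, vector: computeVector(obs.title) });
    } catch (e) {
      // Swallowed here, so the two writes above are still committed.
    }
    return id;
  })();
}

const demoStore = createStore();
const failingVector = () => { throw new Error('vocabulary unavailable'); };

const firstId = save(demoStore, { title: 'first', files: ['a.mjs', 'b.mjs'] }, failingVector);
console.log(firstId, demoStore.observation_files.length, demoStore.observation_vectors.length); // 1 2 0

try {
  save(demoStore, { title: 'second', files: null }, failingVector); // iterating null throws
} catch (e) {
  console.log(demoStore.observations.length); // 1 (the partial write was rolled back)
}
```

The design choice visible in the diff is the same: the vector write stays inside the transaction for atomicity with the row it describes, but its `try`/`catch` keeps a vector failure from discarding the observation.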
package/hook-memory.mjs CHANGED
@@ -132,22 +132,20 @@ export function recallForFile(db, filePath, project) {
132
132
  const cutoff = Date.now() - FILE_RECALL_LOOKBACK_MS;
133
133
  // Escape SQL LIKE wildcards in filename to prevent injection
134
134
  const escaped = basename.replace(/%/g, '\\%').replace(/_/g, '\\_');
135
- // Match both full paths (/path/to/file.mjs) and basename-only entries ("file.mjs")
136
- // Two patterns avoid false positives: %/file.mjs"% won't match /webapp.mjs
137
- const pathPattern = `%/${escaped}"%`;
138
- const namePattern = `%"${escaped}"%`;
135
+ const likePattern = `%${escaped}`;
139
136
  const rows = db.prepare(`
140
- SELECT id, type, title, importance, lesson_learned
141
- FROM observations
142
- WHERE project = ?
143
- AND importance >= 2
144
- AND COALESCE(compressed_into, 0) = 0
145
- AND superseded_at IS NULL
146
- AND created_at_epoch > ?
147
- AND (files_modified LIKE ? ESCAPE '\\' OR files_modified LIKE ? ESCAPE '\\')
148
- ORDER BY created_at_epoch DESC
137
+ SELECT DISTINCT o.id, o.type, o.title, o.importance, o.lesson_learned
138
+ FROM observations o
139
+ JOIN observation_files of2 ON of2.obs_id = o.id
140
+ WHERE o.project = ?
141
+ AND o.importance >= 2
142
+ AND COALESCE(o.compressed_into, 0) = 0
143
+ AND o.superseded_at IS NULL
144
+ AND o.created_at_epoch > ?
145
+ AND (of2.filename = ? OR of2.filename LIKE ? ESCAPE '\\')
146
+ ORDER BY o.created_at_epoch DESC
149
147
  LIMIT ?
150
- `).all(project, cutoff, pathPattern, namePattern, MAX_FILE_RECALL);
148
+ `).all(project, cutoff, filePath, likePattern, MAX_FILE_RECALL);
151
149
  const now = Date.now();
152
150
  const updateStmt = db.prepare('UPDATE observations SET access_count = COALESCE(access_count, 0) + 1, last_accessed_at = ? WHERE id = ?');
153
151
  for (const r of rows) updateStmt.run(now, r.id);
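
Reviewer note: the removed comment in this hunk points out that anchoring the pattern on a path separator avoids matching `/webapp.mjs` when looking for `app.mjs`. The new `%${escaped}` pattern has no such anchor, so it may be worth checking. The sketch below is a hypothetical variant, not the package's code: `escapeLike` also escapes the backslash escape character itself, and `likeToRegExp` is an invented helper that approximates SQL `LIKE ... ESCAPE '\'` matching in plain JavaScript so the difference can be seen without a database.

```javascript
// Hypothetical helper: escape LIKE metacharacters, including the escape character itself.
function escapeLike(value) {
  return value.replace(/[\\%_]/g, ch => '\\' + ch);
}

// Hypothetical helper: approximate `LIKE pattern ESCAPE '\'` with a RegExp
// (% = any run of characters, _ = any single character, ASCII case-insensitive).
function likeToRegExp(pattern) {
  const quote = ch => ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  let src = '';
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === '\\' && i + 1 < pattern.length) {
      src += quote(pattern[++i]); // escaped character is taken literally
    } else if (ch === '%') {
      src += '[\\s\\S]*';
    } else if (ch === '_') {
      src += '[\\s\\S]';
    } else {
      src += quote(ch);
    }
  }
  return new RegExp('^' + src + '$', 'i');
}

const loose = likeToRegExp('%' + escapeLike('app.mjs'));     // shape used in the new code
const anchored = likeToRegExp('%/' + escapeLike('app.mjs')); // separator-anchored variant

console.log(loose.test('/src/app.mjs'), loose.test('/src/webapp.mjs'));       // true true
console.log(anchored.test('/src/app.mjs'), anchored.test('/src/webapp.mjs')); // true false
console.log(escapeLike('my_file.mjs'));                                       // my\_file.mjs
```

An anchored pattern would not match a filename stored without any directory component, so it would still need an equality check alongside it, much like the `of2.filename = ?` condition in the query above.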
package/hook-update.mjs CHANGED
@@ -200,6 +200,8 @@ const SOURCE_FILES = [
200
200
  'registry.mjs', 'registry-scanner.mjs', 'registry-indexer.mjs',
201
201
  'registry-retriever.mjs', 'resource-discovery.mjs',
202
202
  'install.mjs', 'install-metadata.mjs', 'mem-cli.mjs', 'tier.mjs', 'tfidf.mjs',
203
+ 'nlp.mjs', 'synonyms.mjs', 'scoring-sql.mjs', 'stop-words.mjs', 'project-utils.mjs',
204
+ 'secret-scrub.mjs', 'format-utils.mjs', 'hash-utils.mjs', 'bash-utils.mjs',
203
205
  ];
204
206
  const SWITCHABLE_PATHS = [...SOURCE_FILES, 'scripts', 'registry', 'node_modules'];
205
207
 
package/install.mjs CHANGED
@@ -206,6 +206,8 @@ async function install() {
206
206
  'registry.mjs', 'registry-scanner.mjs', 'registry-indexer.mjs',
207
207
  'registry-retriever.mjs', 'resource-discovery.mjs',
208
208
  'install-metadata.mjs', 'mem-cli.mjs', 'tier.mjs', 'tfidf.mjs',
209
+ 'nlp.mjs', 'synonyms.mjs', 'scoring-sql.mjs', 'stop-words.mjs', 'project-utils.mjs',
210
+ 'secret-scrub.mjs', 'format-utils.mjs', 'hash-utils.mjs', 'bash-utils.mjs',
209
211
  ];
210
212
 
211
213
  if (IS_DEV) {