claude_memory 0.4.0 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (50)
  1. checksums.yaml +4 -4
  2. data/.claude/CLAUDE.md +1 -1
  3. data/.claude/rules/claude_memory.generated.md +14 -1
  4. data/.claude/skills/check-memory/SKILL.md +10 -0
  5. data/.claude/skills/improve/SKILL.md +12 -1
  6. data/.claude-plugin/plugin.json +1 -1
  7. data/CHANGELOG.md +70 -0
  8. data/db/migrations/008_add_provenance_line_range.rb +21 -0
  9. data/db/migrations/009_add_docid.rb +39 -0
  10. data/db/migrations/010_add_llm_cache.rb +30 -0
  11. data/docs/improvements.md +72 -1084
  12. data/docs/influence/claude-supermemory.md +498 -0
  13. data/docs/influence/qmd.md +424 -2022
  14. data/docs/quality_review.md +64 -705
  15. data/lib/claude_memory/commands/doctor_command.rb +45 -4
  16. data/lib/claude_memory/commands/explain_command.rb +11 -6
  17. data/lib/claude_memory/commands/stats_command.rb +1 -1
  18. data/lib/claude_memory/core/fact_graph.rb +122 -0
  19. data/lib/claude_memory/core/fact_query_builder.rb +34 -14
  20. data/lib/claude_memory/core/fact_ranker.rb +3 -20
  21. data/lib/claude_memory/core/relative_time.rb +45 -0
  22. data/lib/claude_memory/core/result_sorter.rb +2 -2
  23. data/lib/claude_memory/core/rr_fusion.rb +57 -0
  24. data/lib/claude_memory/core/snippet_extractor.rb +97 -0
  25. data/lib/claude_memory/domain/fact.rb +3 -1
  26. data/lib/claude_memory/index/index_query.rb +2 -0
  27. data/lib/claude_memory/index/lexical_fts.rb +18 -0
  28. data/lib/claude_memory/infrastructure/operation_tracker.rb +7 -21
  29. data/lib/claude_memory/infrastructure/schema_validator.rb +30 -25
  30. data/lib/claude_memory/ingest/content_sanitizer.rb +8 -1
  31. data/lib/claude_memory/ingest/ingester.rb +67 -56
  32. data/lib/claude_memory/ingest/tool_extractor.rb +1 -1
  33. data/lib/claude_memory/ingest/tool_filter.rb +55 -0
  34. data/lib/claude_memory/logging/logger.rb +112 -0
  35. data/lib/claude_memory/mcp/query_guide.rb +96 -0
  36. data/lib/claude_memory/mcp/response_formatter.rb +86 -23
  37. data/lib/claude_memory/mcp/server.rb +34 -4
  38. data/lib/claude_memory/mcp/text_summary.rb +257 -0
  39. data/lib/claude_memory/mcp/tool_definitions.rb +20 -4
  40. data/lib/claude_memory/mcp/tools.rb +133 -120
  41. data/lib/claude_memory/publish.rb +12 -2
  42. data/lib/claude_memory/recall/expansion_detector.rb +44 -0
  43. data/lib/claude_memory/recall.rb +93 -41
  44. data/lib/claude_memory/resolve/resolver.rb +72 -40
  45. data/lib/claude_memory/store/sqlite_store.rb +99 -24
  46. data/lib/claude_memory/sweep/sweeper.rb +6 -0
  47. data/lib/claude_memory/version.rb +1 -1
  48. data/lib/claude_memory.rb +21 -0
  49. metadata +14 -2
  50. data/docs/remaining_improvements.md +0 -330
@@ -1,24 +1,9 @@
- # QMD Analysis: Quick Markdown Search
+ # QMD Analysis: Quick Markdown Search (Updated)
 
- *Analysis Date: 2026-01-26*
- *QMD Version: Latest (commit-based, actively developed)*
+ *Analysis Date: 2026-02-02*
+ *Previous Analysis: 2026-01-26*
  *Repository: https://github.com/tobi/qmd*
-
- ---
-
- ## Table of Contents
-
- 1. [Executive Summary](#executive-summary)
- 2. [Architecture Overview](#architecture-overview)
- 3. [Database Schema Analysis](#database-schema-analysis)
- 4. [Search Pipeline Deep-Dive](#search-pipeline-deep-dive)
- 5. [Vector Search Implementation](#vector-search-implementation)
- 6. [LLM Infrastructure](#llm-infrastructure)
- 7. [Performance Characteristics](#performance-characteristics)
- 8. [Comparative Analysis](#comparative-analysis)
- 9. [Adoption Opportunities](#adoption-opportunities)
- 10. [Implementation Recommendations](#implementation-recommendations)
- 11. [Architecture Decisions](#architecture-decisions)
+ *Version/Commit: 63028fd (latest main)*
 
  ---
 
@@ -26,2170 +11,587 @@
 
 ### Project Purpose
 
- QMD (Quick Markdown Search) is an **on-device markdown search engine** optimized for knowledge workers and AI agents. It combines lexical search (BM25), vector embeddings, and LLM reranking to provide high-quality document retrieval without cloud dependencies.
-
- **Target Users**: Developers, researchers, knowledge workers using markdown for notes, documentation, and personal knowledge management.
+ QMD (Quick Markdown Search) is an **on-device search engine** for markdown knowledge bases, notes, meeting transcripts, and documentation. It combines BM25 full-text search, vector semantic search, and LLM re-ranking, all running locally via node-llama-cpp with GGUF models.
 
 ### Key Innovation
 
- QMD's primary innovation is **position-aware score blending** in hybrid search:
+ QMD's standout innovations since the last analysis:
 
- ```typescript
- // Top results favor retrieval scores, lower results favor reranking
- const weights = rank <= 3
-   ? { retrieval: 0.75, reranker: 0.25 }
-   : rank <= 10
-     ? { retrieval: 0.60, reranker: 0.40 }
-     : { retrieval: 0.40, reranker: 0.60 };
- ```
+ 1. **Custom fine-tuned query expansion model** (`qmd-query-expansion-1.7B`): A Qwen3-1.7B model trained with SFT + GRPO (reinforcement learning) specifically for structured search query expansion. Produces typed outputs (`lex:`, `vec:`, `hyde:`) that route to different search backends.
+
+ 2. **Claude Code plugin ecosystem**: QMD ships as a Claude Code marketplace plugin (`.claude-plugin/marketplace.json`) with skills, MCP server integration, and inline status checks.
 
- This approach trusts BM25+vector fusion for strong signals while using LLM reranking to elevate semantically relevant results that lexical search missed.
+ 3. **Session-scoped LLM management** (`ILLMSession`): Structured lifecycle for LLM resources with abort signals, timeout management, and clean disposal.
 
 ### Technology Stack
 
- - **Runtime**: Bun (JavaScript/TypeScript)
- - **Database**: SQLite with sqlite-vec extension
- - **Embeddings**: EmbeddingGemma (300M params, 300MB)
- - **LLM**: node-llama-cpp (local GGUF models)
- - **Vector Search**: sqlite-vec virtual tables with cosine distance
- - **Full-Text Search**: SQLite FTS5 with Porter stemming
+ - **Runtime**: Bun >= 1.0.0 (TypeScript)
+ - **Database**: SQLite with sqlite-vec extension (cosine distance)
+ - **Full-Text Search**: SQLite FTS5 with Porter tokenization
+ - **Embeddings**: EmbeddingGemma-300M (GGUF, ~300MB)
+ - **Reranking**: Qwen3-Reranker-0.6B (GGUF, ~640MB)
+ - **Query Expansion**: qmd-query-expansion-1.7B (custom fine-tuned, ~1.1GB)
+ - **MCP**: @modelcontextprotocol/sdk with stdio transport
+ - **Validation**: Zod v4 for MCP tool input schemas
+ - **Config**: YAML-based collection management (`~/.config/qmd/index.yml`)
 
 ### Production Readiness
 
- - **Active Development**: Frequent commits, responsive maintainer
- - **Comprehensive Tests**: eval.test.ts with 24 known-answer queries
- - **Quality Metrics**: 50%+ Hit@3 improvement over BM25-only
- - **Battle-Tested**: Used by maintainer for personal knowledge base
-
- ### Evaluation Results
-
- From `eval.test.ts` (24 queries across 4 difficulty levels):
-
- | Query Type | BM25 Hit@3 | Vector Hit@3 | Hybrid Hit@3 | Improvement |
- |------------|------------|--------------|--------------|-------------|
- | Easy (exact keywords) | ≥80% | ≥60% | ≥80% | BM25 sufficient |
- | Medium (semantic) | ≥15% | ≥40% | ≥50% | **+233%** over BM25 |
- | Hard (vague) | ≥15% @ H@5 | ≥30% @ H@5 | ≥35% @ H@5 | **+133%** over BM25 |
- | Fusion (multi-signal) | ~15% | ~30% | ≥50% | **+233%** over BM25 |
- | **Overall** | ≥40% | ≥50% | ≥60% | **+50%** over BM25 |
-
- Key insight: **Hybrid RRF fusion outperforms both methods alone**, especially on queries requiring both lexical precision and semantic understanding.
+ - **Maturity**: Beta, actively developed, 5,700+ GitHub stars
+ - **Test Coverage**: Unit tests (store.test.ts, mcp.test.ts), eval harness (18 queries across 3 difficulty levels)
+ - **Documentation**: Comprehensive README, CLAUDE.md, inline code docs
+ - **Community**: 257 forks, 29 issues, 17 PRs, active maintainer (Tobi Lütke)
+ - **Plugin Distribution**: Available via Claude Code marketplace
 
 ---
 
 ## Architecture Overview
 
- ### Data Model Comparison
-
- | Aspect | QMD | ClaudeMemory |
- |--------|-----|--------------|
- | **Granularity** | Full markdown documents | Structured facts (triples) |
- | **Storage** | Content-addressable (SHA256 hash) | Entity-predicate-object |
- | **Deduplication** | Per-document (by content hash) | Per-fact (by signature) |
- | **Retrieval Goal** | Find relevant documents | Find specific facts |
- | **Truth Model** | All documents valid | Supersession + conflicts |
- | **Scope** | YAML collections | Dual-database (global/project) |
-
- **Philosophical Difference**:
- - **QMD**: "Show me documents about X" (conversation recall)
- - **ClaudeMemory**: "What do we know about X?" (knowledge extraction)
-
- ### Storage Strategy
+ ### Data Model
 
- QMD uses **content-addressable storage** with a virtual filesystem layer:
+ QMD uses content-addressable storage with a virtual filesystem layer:
 
 ```
- content table (SHA256 hash → document body)
+ content table (SHA256 hash → document body, deduplication)
 
- documents table (collection, path, title → hash)
+ documents table (collection, path, title → hash, soft-delete via active flag)
 
- Virtual paths: qmd://collection/path/to/file.md
- ```
-
- Benefits:
- - Automatic deduplication (same content = single storage)
- - Fast change detection (hash comparison)
- - Virtual namespace decoupled from filesystem
-
- Trade-offs:
- - More complex than direct file storage
- - Hash collisions possible (mitigated by SHA256)
-
- ### Collection System
-
- QMD uses YAML configuration for multi-collection indexing:
-
- ```yaml
- # ~/.config/qmd/index.yml
- global_context: "Personal knowledge base for software development"
-
- collections:
-   notes:
-     path: /Users/name/notes
-     pattern: "**/*.md"
-     context:
-       /: "General notes"
-       /work: "Work-related notes and documentation"
-       /personal: "Personal projects and ideas"
-
-   docs:
-     path: /Users/name/Documents
-     pattern: "**/*.md"
- ```
-
- **Context Inheritance**: File at `/work/projects/api.md` inherits:
- 1. Global context
- 2. `/` context (general notes)
- 3. `/work` context (work-related)
-
- This provides semantic metadata for LLM operations without storing it per-document.
-
- ### Lifecycle Diagram
-
- ```
- ┌─────────────┐
- │ Index Files │ (qmd index <collection>)
- └──────┬──────┘
-
-
- ┌─────────────────────────────────────────────────────────┐
- │ 1. Hash content (SHA256)                                │
- │ 2. INSERT OR IGNORE into content table                  │
- │ 3. INSERT/UPDATE documents table (collection, path → hash)│
- │ 4. FTS5 trigger auto-indexes title + body               │
- └──────┬──────────────────────────────────────────────────┘
-
-
- ┌──────────────┐
- │ Embed        │ (qmd embed <collection>)
- └──────┬───────┘
-
-
- ┌─────────────────────────────────────────────────────────┐
- │ 1. Chunk document (800 tokens, 15% overlap)             │
- │ 2. Generate embeddings (EmbeddingGemma 384-dim)         │
- │ 3. INSERT into content_vectors + vectors_vec            │
- └──────┬──────────────────────────────────────────────────┘
-
-
- ┌──────────────┐
- │ Search       │ (qmd query "concept")
- └──────┬───────┘
-
-
- ┌─────────────────────────────────────────────────────────┐
- │ Mode: search  → BM25 only (fast)                        │
- │ Mode: vsearch → Vector only (semantic)                  │
- │ Mode: query   → Hybrid pipeline (BM25 + vec + rerank)   │
- └──────┬──────────────────────────────────────────────────┘
-
-
- ┌──────────────┐
- │ Retrieve     │ (qmd get <path | #docid>)
- └──────────────┘
+ documents_fts (FTS5 full-text index, auto-synced via triggers)
+
+ content_vectors (chunk metadata: hash, seq, pos, model)
+
+ vectors_vec (sqlite-vec native KNN index, cosine distance)
+
+ llm_cache (hash-keyed deterministic response cache)
 ```
 
- ---
+ ### Key Design Patterns
 
- ## Database Schema Analysis
+ 1. **Content-Addressable Storage**: `content` table deduplicates by SHA256 hash — multiple documents with identical content share one row (`store.ts:440-450`)
 
- ### Core Tables
+ 2. **Two-Step Vector Query**: JOINs with sqlite-vec virtual tables hang indefinitely. QMD enforces separate queries for vec lookup and metadata join (`store.ts:1912-1915`):
+ ```typescript
+ // Step 1: KNN from vec table
+ const vecResults = db.prepare(
+   `SELECT hash_seq, distance FROM vectors_vec WHERE embedding MATCH ? AND k = ?`
+ ).all(embedding, limit * 3);
+ // Step 2: Join with documents separately
+ ```
 
- #### 1. `content` - Content-Addressable Storage
+ 3. **YAML-Based Collection Config**: Collections migrated from SQLite foreign keys to `~/.config/qmd/index.yml` for easier user management. Schema migration in `migrate-schema.ts` handled the transition.
 
- ```sql
- CREATE TABLE content (
-   hash TEXT PRIMARY KEY,    -- SHA256 of document body
-   doc TEXT NOT NULL,        -- Full markdown content
-   created_at TEXT NOT NULL  -- ISO timestamp
- );
- ```
+ 4. **Hierarchical Context System**: Context descriptions inherit along path hierarchy — a file at `/work/projects/api.md` gets global context + `/` context + `/work` context concatenated (`collections.ts:94-113`)
 
- **Design Pattern**: Hash-keyed blob storage for automatic deduplication.
+ 5. **Probabilistic Cache Cleanup**: 1% chance per query to prune LLM cache to latest 1000 entries (`store.ts:804-807`)
 
- **Key Insight**: Multiple documents with identical content share one storage entry.
+ 6. **Lazy Model Singleton**: LLM models lazy-load on first use, stay resident in memory, and unload contexts after a 2-minute idle period (`llm.ts:920-951`)
 
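The snippet in item 2 above stops at the KNN lookup; the second step can be sketched as a pure query builder, using the `content_vectors`/`documents` join shape described elsewhere in this analysis (a sketch, not QMD's exact code — `buildJoinQuery` is an illustrative name):

```typescript
// Sketch of the elided step 2: build the *separate* metadata-join query for
// the KNN hits from step 1. The join is never combined with the vec lookup.
interface VecHit {
  hash_seq: string;
  distance: number;
}

function buildJoinQuery(hits: VecHit[]): { sql: string; params: string[] } {
  const params = hits.map((h) => h.hash_seq);
  const placeholders = params.map(() => "?").join(", ");
  const sql =
    `SELECT cv.hash, cv.seq, d.path, d.title ` +
    `FROM content_vectors cv ` +
    `JOIN documents d ON d.hash = cv.hash ` +
    `WHERE cv.hash || '_' || cv.seq IN (${placeholders})`;
  return { sql, params };
}
```

Running the returned statement in a second `db.prepare(...).all(...params)` call keeps the sqlite-vec virtual table out of the JOIN entirely.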
- #### 2. `documents` - Virtual Filesystem
+ ### Module Organization
 
- ```sql
- CREATE TABLE documents (
-   id INTEGER PRIMARY KEY,
-   collection TEXT NOT NULL,  -- Collection name (from YAML)
-   path TEXT NOT NULL,        -- Relative path within collection
-   title TEXT NOT NULL,       -- Extracted from first H1/H2
-   hash TEXT NOT NULL,        -- Foreign key to content.hash
-   created_at TEXT NOT NULL,
-   modified_at TEXT NOT NULL,
-   active INTEGER DEFAULT 1,  -- Soft delete flag
-   UNIQUE(collection, path)
- );
 ```
-
- **Virtual Path Construction**: `qmd://{collection}/{path}`
-
- Example: `qmd://notes/work/api-design.md`
-
- #### 3. `documents_fts` - Full-Text Search Index
-
- ```sql
- CREATE VIRTUAL TABLE documents_fts USING fts5(
-   title,  -- Weighted heavily (10.0)
-   body,   -- Standard weight (1.0)
-   tokenize = 'porter unicode61'
- );
-
- -- Auto-sync trigger on documents INSERT/UPDATE/DELETE
- -- Copies title + body from content table via hash join
+ qmd/
+ ├── src/
+ │   ├── qmd.ts           # CLI entry point (~750 lines, lazy-loaded store)
+ │   ├── store.ts         # Core store: schema, search, indexing (~2400 lines)
+ │   ├── mcp.ts           # MCP server: 6 tools + resource + prompt (~626 lines)
+ │   ├── llm.ts           # LLM abstraction: embed, rerank, expand (~1208 lines)
+ │   ├── collections.ts   # YAML config management (~390 lines)
+ │   ├── store.test.ts    # Comprehensive store unit tests
+ │   └── mcp.test.ts      # MCP integration tests
+ ├── finetune/            # Query expansion model training pipeline
+ │   ├── reward.py        # Multi-dimensional reward function (5 dimensions, 120 pts)
+ │   ├── train.py         # Unified SFT + GRPO training
+ │   ├── eval.py          # Model evaluation with scoring
+ │   └── jobs/            # HuggingFace Jobs wrappers
+ ├── test/
+ │   └── eval-harness.ts  # Search quality evaluation (18 queries)
+ ├── skills/qmd/          # Claude Code plugin skill definition
+ └── .claude-plugin/      # Marketplace distribution metadata
 ```
 
- **BM25 Scoring**: Lower scores are better (distance metric).
+ ### Comparison with ClaudeMemory
 
- **Tokenization**: Porter stemming for English, unicode61 for international characters.
+ | Aspect | QMD | ClaudeMemory | Notes |
+ |--------|-----|--------------|-------|
+ | **Data Model** | Full markdown documents | Structured fact triples | Different paradigms: recall vs extraction |
+ | **Storage** | SQLite + sqlite-vec (native vectors) | SQLite + JSON embeddings | QMD has 10-100x faster KNN |
+ | **Search** | BM25 + Vector + RRF + Reranking | BM25 + Vector (hybrid) | QMD adds reranking + query expansion |
+ | **MCP** | 6 tools + resource + prompt | 18 tools | ClaudeMemory has richer tool surface |
+ | **Distribution** | Bun global install + plugin | Ruby gem + MCP + hooks | QMD has smoother install via plugin |
+ | **LLM Dependency** | 3 local GGUF models (~2GB total) | None (local ONNX only) | ClaudeMemory is dramatically lighter |
+ | **Query Expansion** | Custom fine-tuned model (1.7B) | None | QMD has ML-powered query improvement |
+ | **Truth Maintenance** | None (all docs valid) | Supersession + conflicts | ClaudeMemory handles contradictions |
+ | **Scope System** | YAML collections | Dual-database (global/project) | Both approaches valid for their use case |
+ | **Testing** | Unit + eval harness | Unit + evals + benchmarks (DevMemBench) | ClaudeMemory has more comprehensive benchmarks |
 
- #### 4. `content_vectors` - Embedding Metadata
+ ---
 
- ```sql
- CREATE TABLE content_vectors (
-   hash TEXT NOT NULL,         -- Foreign key to content.hash
-   seq INTEGER NOT NULL,       -- Chunk sequence number
-   pos INTEGER NOT NULL,       -- Character position in document
-   model TEXT NOT NULL,        -- Embedding model name
-   embedded_at TEXT NOT NULL,  -- ISO timestamp
-   PRIMARY KEY (hash, seq)
- );
- ```
+ ## Key Components Deep-Dive
 
- **Chunk Strategy**: 800 tokens with 15% overlap, semantic boundaries.
+ ### Component 1: Fine-Tuned Query Expansion
 
- **Key**: `hash_seq` composite (e.g., `"abc123def456_0"`)
+ **Purpose**: Generate structured query variations (lex/vec/hyde) to improve search recall by routing different query types to appropriate backends.
 
- #### 5. `vectors_vec` - Native Vector Index
+ **Location**: `finetune/`, `src/llm.ts:637-679`
 
- ```sql
- CREATE VIRTUAL TABLE vectors_vec USING vec0(
-   hash_seq TEXT PRIMARY KEY,  -- "hash_seq" composite key
-   embedding float[384]        -- 384-dimensional vector (EmbeddingGemma)
-   distance_metric=cosine
- );
- ```
+ **Implementation** (from `finetune/README.md`):
 
- **Critical Implementation Note** (from store.ts:1745-1748):
- ```typescript
- // IMPORTANT: We use a two-step query approach here because sqlite-vec virtual tables
- // hang indefinitely when combined with JOINs in the same query. Do NOT try to
- // "optimize" this by combining into a single query with JOINs - it will break.
- // See: https://github.com/tobi/qmd/pull/23
-
- // CORRECT: Two-step pattern
- const vecResults = db.prepare(`
-   SELECT hash_seq, distance
-   FROM vectors_vec
-   WHERE embedding MATCH ? AND k = ?
- `).all(embedding, limit * 3);
-
- // Then join with documents table separately
- const hashSeqs = vecResults.map(r => r.hash_seq);
- const docs = db.prepare(`
-   SELECT * FROM documents WHERE hash IN (${placeholders})
- `).all(hashSeqs);
- ```
+ The custom model `qmd-query-expansion-1.7B` is trained in two stages:
 
- **Why This Matters for ClaudeMemory**: When adopting sqlite-vec, we MUST use two-step queries to avoid hangs.
+ 1. **SFT (Supervised Fine-Tuning)**: Teaches format compliance
+    - Base model: Qwen3-1.7B
+    - LoRA rank 16, alpha 32 (all projection layers)
+    - ~2,290 training examples, 5 epochs
+    - Loss: train 0.472, val 0.304
 
- #### 6. `llm_cache` - Deterministic Response Cache
+ 2. **GRPO (Group Relative Policy Optimization)**: Refines quality
+    - LoRA rank 4, alpha 8 (q_proj, v_proj only)
+    - KL beta 0.04 (prevents drift from SFT)
+    - 200 steps, mean reward 0.757
 
- ```sql
- CREATE TABLE llm_cache (
-   hash TEXT PRIMARY KEY,   -- Hash of (operation, model, input)
-   result TEXT NOT NULL,    -- LLM response (JSON or plain text)
-   created_at TEXT NOT NULL
- );
- ```
+ **Reward Function** (from `finetune/reward.py`):
+ 5 dimensions totaling 120 points (140 with hyde):
+ - Format (0-30): Valid lex/vec/hyde lines
+ - Diversity (0-30): Multiple types, no echoing the query
+ - HyDE (0-20): Presence, length, quality
+ - Quality (0-20): Lex < vec length, preserved terms
+ - Entity (-45 to +20): Named entity preservation
+ - Think penalty: No `<think>` blocks (uses `/no_think` directive)
 
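The dimension list above can be read as a capped-sum scorer. The sketch below is a hypothetical illustration of that shape only — the real `finetune/reward.py` logic is richer (entity matching, length ratios, HyDE quality), and `scoreExpansion` is an invented name:

```typescript
// Hypothetical sketch: score a candidate expansion against a few of the
// dimension caps listed above. Checks line format, type diversity, and
// HyDE presence; applies the think-block penalty as a hard zero.
function scoreExpansion(output: string): number {
  const lines = output.trim().split("\n").map((l) => l.trim());
  const typed = lines.filter((l) => /^(lex|vec|hyde):\s*\S/.test(l));
  // Format (0-30): every line must carry a valid type prefix
  const format = typed.length === lines.length && lines.length > 0 ? 30 : 0;
  // Diversity (0-30): reward using more than one output type
  const types = new Set(typed.map((l) => l.split(":")[0]));
  const diversity = Math.min(types.size * 10, 30);
  // HyDE (0-20): reward presence of a hypothetical-document passage
  const hyde = types.has("hyde") ? 20 : 0;
  // Think penalty: any <think> block zeroes the score
  if (output.includes("<think>")) return 0;
  return format + diversity + hyde;
}
```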
- **Cache Key Formula**:
- ```typescript
- function getCacheKey(operation: string, params: Record<string, any>): string {
-   const canonical = JSON.stringify({ operation, ...params });
-   return sha256(canonical);
- }
-
- // Examples:
- // expandQuery: hash("expandQuery" + model + query)
- // rerank: hash("rerank" + model + query + file)
+ **Output Format**:
 ```
-
- **Cleanup Strategy** (probabilistic):
- ```typescript
- // 1% chance per query to run cleanup
- if (Math.random() < 0.01) {
-   db.run(`
-     DELETE FROM llm_cache
-     WHERE hash NOT IN (
-       SELECT hash FROM llm_cache
-       ORDER BY created_at DESC
-       LIMIT 1000
-     )
-   `);
- }
+ lex: authentication configuration
+ lex: auth settings setup
+ vec: how to configure authentication settings
+ hyde: Authentication can be configured by setting the AUTH_SECRET environment variable.
 ```
 
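The typed output above is trivially machine-parseable; a router that splits expansions by type can be sketched as follows (function and type names are illustrative, not QMD's):

```typescript
// Illustrative parser: split a typed expansion block into per-backend routes.
type ExpansionRoutes = { lex: string[]; vec: string[]; hyde: string[] };

function routeExpansions(output: string): ExpansionRoutes {
  const routes: ExpansionRoutes = { lex: [], vec: [], hyde: [] };
  for (const line of output.split("\n")) {
    // Each valid line is "<type>: <text>"; anything else is ignored
    const match = line.trim().match(/^(lex|vec|hyde):\s*(.+)$/);
    if (match) {
      routes[match[1] as keyof ExpansionRoutes].push(match[2]);
    }
  }
  return routes;
}
```

Each bucket then feeds its backend: `lex` variants go to FTS5, `vec` variants to embedding search, and `hyde` passages are embedded as pseudo-documents.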
- **Benefits**:
- - Reduces API costs for repeated operations
- - Deterministic (same input = same cache key)
- - Self-tuning (frequent queries stay cached)
-
- ### Foreign Key Relationships
-
- ```
- content.hash ← documents.hash ← content_vectors.hash
-
- documents_fts (via trigger)
-
- vectors_vec.hash_seq (composite key)
- ```
+ **Design Decisions**:
+ - Structured output types (`lex:`, `vec:`, `hyde:`) route to different backends instead of generic rewrites
+ - `/no_think` Qwen3 directive suppresses chain-of-thought for direct output
+ - Grammar-constrained generation ensures format compliance at inference time
+ - Per-query caching avoids redundant expansion (80% hit rate)
 
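The per-query cache in the last bullet pairs naturally with the deterministic hash-keyed cache described earlier in this analysis. A minimal sketch, assuming flat string parameters and Node/Bun's built-in SHA-256 (`cacheKey` is an illustrative name):

```typescript
// Minimal sketch of a deterministic LLM-response cache key: hash the
// operation name plus canonicalized parameters, so identical inputs
// always map to the same cache row.
import { createHash } from "node:crypto";

function cacheKey(operation: string, params: Record<string, unknown>): string {
  const merged = { operation, ...params };
  // Sort keys so property order cannot change the hash
  const canonical = JSON.stringify(merged, Object.keys(merged).sort());
  return createHash("sha256").update(canonical).digest("hex");
}
```

Keyed this way, repeated `expandQuery` or `rerank` calls with identical inputs resolve to the same cache entry regardless of argument ordering.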
351
- **Cascade Behavior**:
352
- - Soft delete: `documents.active = 0` (preserves content)
353
- - Hard delete: Manual cleanup of orphaned content/vectors
175
+ **Relevance to ClaudeMemory**: The structured lex/vec/hyde output pattern is interesting — if we ever add query expansion to our recall pipeline, this type-routed approach is more sophisticated than simple query rewriting. The reward function design (multi-dimensional scoring with entity preservation) is also a good reference for evaluating any future distiller quality.
354
176
 
355
177
  ---
356
178
 
357
- ## Search Pipeline Deep-Dive
358
-
359
- QMD provides three search modes with increasing sophistication:
360
-
361
- ### Mode 1: `search` (BM25 Only)
362
-
363
- **Use Case**: Fast keyword matching when you know exact terms.
364
-
365
- **Pipeline**:
366
- ```typescript
367
- searchFTS(db, query, limit) {
368
- // 1. Sanitize and build FTS5 query
369
- const terms = query.split(/\s+/)
370
- .map(t => sanitize(t))
371
- .filter(t => t.length > 0);
372
-
373
- const ftsQuery = terms.map(t => `"${t}"*`).join(' AND ');
374
-
375
- // 2. Query FTS5 with BM25 scoring
376
- const results = db.prepare(`
377
- SELECT
378
- d.path,
379
- d.title,
380
- bm25(documents_fts, 10.0, 1.0) as score
381
- FROM documents_fts f
382
- JOIN documents d ON d.id = f.rowid
383
- WHERE documents_fts MATCH ? AND d.active = 1
384
- ORDER BY score ASC -- Lower is better for BM25
385
- LIMIT ?
386
- `).all(ftsQuery, limit);
387
-
388
- // 3. Convert BM25 (lower=better) to similarity (higher=better)
389
- return results.map(r => ({
390
- ...r,
391
- score: 1 / (1 + Math.max(0, r.score))
392
- }));
393
- }
394
- ```
395
-
396
- **Latency**: <50ms
397
-
398
- **Strengths**: Fast, good for exact matches
179
+ ### Component 2: Claude Code Plugin System
399
180
 
400
- **Weaknesses**: Misses semantic similarity
181
+ **Purpose**: Package QMD for frictionless installation via Claude Code marketplace.
401
182
 
402
- ### Mode 2: `vsearch` (Vector Only)
183
+ **Location**: `.claude-plugin/marketplace.json`, `skills/qmd/SKILL.md`
403
184
 
404
- **Use Case**: Semantic search when exact terms unknown.
405
-
406
- **Pipeline**:
407
- ```typescript
408
- async searchVec(db, query, model, limit) {
409
- // 1. Generate query embedding
410
- const llm = getDefaultLlamaCpp();
411
- const formatted = formatQueryForEmbedding(query);
412
- const result = await llm.embed(formatted, { model });
413
- const embedding = new Float32Array(result.embedding);
414
-
415
- // 2. KNN search (two-step to avoid JOIN hang)
416
- const vecResults = db.prepare(`
417
- SELECT hash_seq, distance
418
- FROM vectors_vec
419
- WHERE embedding MATCH ? AND k = ?
420
- `).all(embedding, limit * 3);
421
-
422
- // 3. Join with documents (separate query)
423
- const hashSeqs = vecResults.map(r => r.hash_seq);
424
- const docs = db.prepare(`
425
- SELECT cv.hash, d.path, d.title
426
- FROM content_vectors cv
427
- JOIN documents d ON d.hash = cv.hash
428
- WHERE cv.hash || '_' || cv.seq IN (${placeholders})
429
- `).all(hashSeqs);
430
-
431
- // 4. Deduplicate by document (keep best chunk per doc)
432
- const seen = new Map();
433
- for (const doc of docs) {
434
- const distance = distanceMap.get(doc.hash_seq);
435
- const existing = seen.get(doc.path);
436
- if (!existing || distance < existing.distance) {
437
- seen.set(doc.path, { doc, distance });
185
+ **Plugin Structure** (from `marketplace.json:1-29`):
186
+ ```json
187
+ {
188
+ "name": "qmd",
189
+ "plugins": [{
190
+ "name": "qmd",
191
+ "skills": ["./skills/"],
192
+ "mcpServers": {
193
+ "qmd": { "command": "qmd", "args": ["mcp"] }
438
194
  }
439
- }
440
-
441
- // 5. Convert distance to similarity
442
- return Array.from(seen.values())
443
- .sort((a, b) => a.distance - b.distance)
444
- .slice(0, limit)
445
- .map(({ doc, distance }) => ({
446
- ...doc,
447
- score: 1 - distance // Cosine similarity
448
- }));
449
- }
450
- ```
451
-
452
- **Latency**: ~200ms (embedding generation)
453
-
454
- **Strengths**: Semantic understanding, synonym matching
455
-
456
- **Weaknesses**: Slower, may miss exact keyword matches
457
-
458
- ### Mode 3: `query` (Hybrid Pipeline)
459
-
460
- **Use Case**: Best-quality search combining lexical + semantic + reranking.
461
-
462
- **Full Pipeline** (10 stages):
463
-
464
- #### Stage 1: Initial FTS Query
465
-
466
- ```typescript
467
- const initialFts = searchFTS(db, query, 20);
468
- ```
469
-
470
- **Purpose**: Get BM25 baseline results.
471
-
472
- #### Stage 2: Smart Expansion Detection
473
-
474
- ```typescript
475
- const topScore = initialFts[0]?.score ?? 0;
476
- const secondScore = initialFts[1]?.score ?? 0;
477
- const hasStrongSignal =
478
- initialFts.length > 0 &&
479
- topScore >= 0.85 &&
480
- (topScore - secondScore) >= 0.15;
481
-
482
- if (hasStrongSignal) {
483
- // Skip expensive LLM operations
484
- return initialFts.slice(0, limit);
195
+ }]
485
196
  }
486
197
  ```
487
198
 
488
- **Purpose**: Detect when BM25 has clear winner (exact match).
489
-
490
- **Impact**: Saves 2-3 seconds on ~60% of queries (per QMD data).
491
-
492
- **Thresholds**:
493
- - `topScore >= 0.85`: Strong match
494
- - `gap >= 0.15`: Clear winner
495
-
496
- #### Stage 3: Query Expansion (LLM)
497
-
498
- ```typescript
499
- // Generate alternative phrasings for better recall
500
- const expanded = await expandQuery(query, model, db);
501
- // Returns: [original, variant1, variant2]
502
- ```
503
-
504
- **LLM Prompt** (simplified):
505
- ```
506
- Generate 2 alternative search queries:
507
- 1. 'lex': Keyword-focused variation
508
- 2. 'vec': Semantic-focused variation
509
-
510
- Original: "how to structure REST endpoints"
511
-
512
- Output:
513
- lex: API endpoint design patterns
514
- vec: RESTful service architecture best practices
515
- ```
516
-
517
- **Model**: Qwen3-1.7B (2.2GB, loaded on-demand)
518
-
519
- **Cache Key**: `hash(query + model)`
520
-
521
- #### Stage 4: Multi-Query Search (Parallel)
522
-
523
- ```typescript
524
- const rankedLists = [];
525
-
526
- for (const q of expanded) {
527
- // Run FTS for each query variant
528
- const ftsResults = searchFTS(db, q.text, 20);
529
- rankedLists.push(ftsResults);
530
-
531
- // Run vector search for each query variant
532
- const vecResults = await searchVec(db, q.text, model, 20);
533
- rankedLists.push(vecResults);
534
- }
535
-
536
- // Result: 6 ranked lists (3 queries × 2 methods each)
537
- ```
538
-
539
- **Purpose**: Cast wide net to maximize recall.
540
-
541
- #### Stage 5: Reciprocal Rank Fusion (RRF)
542
-
543
- ```typescript
544
- function reciprocalRankFusion(
545
- resultLists: RankedResult[][],
546
- weights: number[] = [],
547
- k: number = 60
548
- ): RankedResult[] {
549
- const scores = new Map<string, {
550
- result: RankedResult;
551
- rrfScore: number;
552
- topRank: number;
553
- }>();
554
-
555
- // Accumulate RRF scores across all lists
556
- for (let listIdx = 0; listIdx < resultLists.length; listIdx++) {
557
- const list = resultLists[listIdx];
558
- const weight = weights[listIdx] ?? 1.0;
559
-
560
- for (let rank = 0; rank < list.length; rank++) {
561
- const result = list[rank];
562
- const rrfContribution = weight / (k + rank + 1);
563
-
564
- const existing = scores.get(result.file);
565
- if (existing) {
566
- existing.rrfScore += rrfContribution;
567
- existing.topRank = Math.min(existing.topRank, rank);
568
- } else {
569
- scores.set(result.file, {
570
- result,
571
- rrfScore: rrfContribution,
572
- topRank: rank
573
- });
574
- }
575
- }
576
- }
577
-
578
- // Top-rank bonus (preserve exact matches)
579
- for (const entry of scores.values()) {
580
- if (entry.topRank === 0) {
581
- entry.rrfScore += 0.05; // #1 in any list
582
- } else if (entry.topRank <= 2) {
583
- entry.rrfScore += 0.02; // #2-3 in any list
584
- }
585
- }
586
-
587
- return Array.from(scores.values())
588
- .sort((a, b) => b.rrfScore - a.rrfScore)
589
- .map(e => ({ ...e.result, score: e.rrfScore }));
590
- }
199
+ **Skill Definition** (from `skills/qmd/SKILL.md:1-10`):
200
+ ```yaml
201
+ ---
202
+ name: qmd
203
+ description: Search personal markdown knowledge bases...
204
+ metadata:
205
+ author: tobi
206
+ version: "1.1.1"
207
+ allowed-tools: Bash(qmd:*), mcp__qmd__*
208
+ ---
591
209
  ```
592
210
 
593
- **RRF Formula**: `score = Σ(weight / (k + rank + 1))`
594
-
595
- **Why k=60?**: Balances top-rank emphasis with lower-rank contributions.
596
- - Lower k (e.g., 20): Top ranks dominate
597
- - Higher k (e.g., 100): Smoother blending
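To make the k trade-off concrete, here is a tiny standalone sketch (not from QMD) comparing how much a rank #1 hit outweighs a rank #10 hit at different k values:

```typescript
// Contribution of a result at 0-indexed rank r (weight = 1): 1 / (k + r + 1).
function rrfContribution(rank: number, k: number): number {
  return 1 / (k + rank + 1);
}

for (const k of [20, 60, 100]) {
  const ratio = rrfContribution(0, k) / rrfContribution(9, k);
  console.log(`k=${k}: a #1 hit counts ${ratio.toFixed(2)}x a #10 hit`);
}
// k=20 → 1.43x, k=60 → 1.15x, k=100 → 1.09x
```

At k=60 the top rank still leads, but lower-ranked agreement across lists can overtake a single #1 placement.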
211
+ Key features:
212
+ - **Inline status check**: `!` prefix runs command during skill load (`SKILL.md:18`)
213
+ - **Trigger phrases**: "search my notes", "find in docs", "what did I write about"
214
+ - **Tool permissions**: Scoped to `qmd:*` bash commands and `mcp__qmd__*` tools
215
+ - **Score interpretation guide**: Embedded in skill for LLM consumption
216
+ - **Recommended workflow**: status → search → vsearch → query → get
598
217
 
599
- **Weight Strategy**:
600
- - Original query: `weight = 2.0` (prioritize user's exact words)
601
- - Expanded queries: `weight = 1.0` (supplementary signals)
218
+ **Relevance to ClaudeMemory**: This is the clearest example of how to package a memory/search tool as a Claude Code plugin. The skill definition format, tool permissions scoping, inline status checks, and MCP server bundling are all patterns we should adopt when ready to ship as a plugin. The `allowed-tools` pattern (`Bash(qmd:*)`) is particularly useful for security scoping.
602
219
 
603
- **Top-Rank Bonus**:
604
- - `+0.05` for rank #1: Likely exact match
605
- - `+0.02` for ranks #2-3: Strong signal
606
- - No bonus for rank #4+: Let RRF dominate
607
-
608
- #### Stage 6: Candidate Selection
220
+ ---
609
221
 
610
- ```typescript
611
- const candidates = fusedResults.slice(0, 30);
612
- ```
222
+ ### Component 3: MCP Server with Structured Content
613
223
 
614
- **Purpose**: Limit reranking to top candidates (cost control).
224
+ **Purpose**: Expose QMD search as MCP tools with both human-readable text and machine-parseable structured content.
615
225
 
616
- #### Stage 7: Per-Document Best Chunk Selection
226
+ **Location**: `src/mcp.ts`
617
227
 
228
+ **Implementation** (from `mcp.ts:258-292`):
618
229
  ```typescript
619
- // For each candidate document, find best matching chunk
620
- const docChunks = candidates.map(doc => {
621
- const chunks = getChunksForDocument(db, doc.hash);
622
-
623
- // Score each chunk by keyword overlap
624
- const scored = chunks.map(chunk => {
625
- const terms = query.toLowerCase().split(/\s+/);
626
- const chunkLower = chunk.text.toLowerCase();
627
- const matchCount = terms.filter(t => chunkLower.includes(t)).length;
628
- return { chunk, score: matchCount };
629
- });
630
-
631
- // Return best chunk text for reranking
230
+ server.registerTool("search", {
231
+ title: "Search (BM25)",
232
+ inputSchema: {
233
+ query: z.string().describe("Search query"),
234
+ limit: z.number().optional().default(10),
235
+ minScore: z.number().optional().default(0),
236
+ collection: z.string().optional(),
237
+ },
238
+ }, async ({ query, limit, minScore, collection }) => {
239
+ // ... search logic ...
632
240
  return {
633
- file: doc.path,
634
- text: scored.sort((a, b) => b.score - a.score)[0].chunk.text
241
+ content: [{ type: "text", text: formatSearchSummary(filtered, query) }],
242
+ structuredContent: { results: filtered },
635
243
  };
636
244
  });
637
245
  ```
638
246
 
639
- **Purpose**: Reranker sees most relevant chunk per document.
640
-
641
- #### Stage 8: LLM Reranking (Cross-Encoder)
642
-
643
- ```typescript
644
- const rerankResult = await llm.rerank(query, docChunks, { model });
645
-
646
- // Returns: [{ file, score: 0.0-1.0 }, ...]
647
- // score = normalized relevance (cross-encoder logits)
648
- ```
649
-
650
- **Model**: Qwen3-Reranker-0.6B (640MB)
651
-
652
- **How It Works**: The cross-encoder scores each query-document pair jointly, rather than comparing separately computed embeddings.
653
-
654
- **Cache Key**: `hash(query + file + model)`
655
-
656
- #### Stage 9: Position-Aware Score Blending
657
-
658
- ```typescript
659
- // Combine RRF and reranker scores based on rank
660
- const blended = candidates.map((doc, rank) => {
661
- const rrfScore = doc.score;
662
- const rerankScore = rerankScores.get(doc.file) || 0;
663
-
664
- // Top results: trust retrieval more
665
- // Lower results: trust reranker more
666
- let rrfWeight, rerankWeight;
667
- if (rank < 3) {
668
- rrfWeight = 0.75;
669
- rerankWeight = 0.25;
670
- } else if (rank < 10) {
671
- rrfWeight = 0.60;
672
- rerankWeight = 0.40;
673
- } else {
674
- rrfWeight = 0.40;
675
- rerankWeight = 0.60;
676
- }
677
-
678
- const finalScore = rrfWeight * rrfScore + rerankWeight * rerankScore;
679
-
680
- return { ...doc, score: finalScore };
681
- });
682
- ```
683
-
684
- **Rationale**:
685
- - Top results likely have both strong lexical AND semantic signals
686
- - Lower results may be semantically relevant but lexically weak
687
- - Reranker helps elevate hidden gems
688
-
689
- #### Stage 10: Final Sorting
690
-
691
- ```typescript
692
- return blended
693
- .sort((a, b) => b.score - a.score)
694
- .slice(0, limit);
695
- ```
696
-
697
- **Latency Breakdown**:
698
- - Cold (first query): 2-3s (model loading + expansion + reranking)
699
- - Warm (cached expansion): ~500ms (reranking only)
700
- - Strong signal (skipped): ~200ms (FTS + vector, no LLM)
701
-
702
- ---
703
-
704
- ## Vector Search Implementation
705
-
706
- ### Embedding Model: EmbeddingGemma
707
-
708
- **Specs**:
709
- - Parameters: 300M
710
- - Dimensions: 384 as configured here (QMD docs cite the model's native 768; EmbeddingGemma supports truncated output dimensions)
711
- - Format: GGUF (quantized)
712
- - Size: 300MB download
713
- - Tokenizer: SentencePiece
714
-
715
- **Prompt Format** (Nomic-style):
716
- ```typescript
717
- // Query embedding
718
- formatQueryForEmbedding(query: string): string {
719
- return `task: search result | query: ${query}`;
720
- }
721
-
722
- // Document embedding
723
- formatDocForEmbedding(text: string, title?: string): string {
724
- return `title: ${title || "none"} | text: ${text}`;
725
- }
726
- ```
727
-
728
- **Why Prompt Formatting Matters**: Embedding models are trained on specific formats. Using the wrong format degrades quality.
729
-
730
- ### Document Chunking Strategy
731
-
732
- QMD offers two chunking approaches:
733
-
734
- #### 1. Token-Based Chunking (Recommended)
735
-
736
- ```typescript
737
- async function chunkDocumentByTokens(
738
- content: string,
739
- maxTokens: number = 800,
740
- overlapTokens: number = 120 // 15% of 800
741
- ): Promise<{ text: string; pos: number; tokens: number }[]> {
742
- const llm = getDefaultLlamaCpp();
743
-
744
- // Tokenize entire document once
745
- const allTokens = await llm.tokenize(content);
746
- const totalTokens = allTokens.length;
747
-
748
- if (totalTokens <= maxTokens) {
749
- return [{ text: content, pos: 0, tokens: totalTokens }];
750
- }
751
-
752
- const chunks = [];
753
- const step = maxTokens - overlapTokens; // 680 tokens
754
- let tokenPos = 0;
755
-
756
- while (tokenPos < totalTokens) {
757
- const chunkEnd = Math.min(tokenPos + maxTokens, totalTokens);
758
- const chunkTokens = allTokens.slice(tokenPos, chunkEnd);
759
- let chunkText = await llm.detokenize(chunkTokens);
760
-
761
- // Find semantic break point if not at end
762
- if (chunkEnd < totalTokens) {
763
- const searchStart = Math.floor(chunkText.length * 0.7);
764
- const searchSlice = chunkText.slice(searchStart);
765
-
766
- // Priority: paragraph > sentence > line
767
- const breakOffset = findBreakPoint(searchSlice);
768
- if (breakOffset >= 0) {
769
- chunkText = chunkText.slice(0, searchStart + breakOffset);
770
- }
771
- }
772
-
773
- chunks.push({
774
- text: chunkText,
775
- pos: Math.floor(tokenPos * avgCharsPerToken), // avgCharsPerToken ≈ content.length / totalTokens
776
- tokens: chunkTokens.length
777
- });
778
-
779
- tokenPos += step;
780
- }
781
-
782
- return chunks;
783
- }
784
- ```
785
-
786
- **Parameters**:
787
- - `maxTokens = 800`: EmbeddingGemma's optimal context window
788
- - `overlapTokens = 120` (15%): Ensures continuity across boundaries
789
-
790
- **Break Priority** (from store.ts:1020-1046):
791
- 1. Paragraph boundary (`\n\n`)
792
- 2. Sentence end (`. `, `.\n`, `? `, `! `)
793
- 3. Line break (`\n`)
794
- 4. Word boundary (` `)
795
- 5. Hard cut (if no boundary found)
796
-
797
- **Search Window**: Last 30% of chunk (70-100% range) to avoid cutting too early.
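The `findBreakPoint` helper invoked in the chunking code above is not shown in this excerpt; a minimal sketch that follows the stated priority order might look like this (illustrative, not QMD's actual implementation):

```typescript
// Find the best semantic break in the trailing slice of a chunk.
// Returns the offset just past the boundary, or -1 to force a hard cut.
function findBreakPoint(slice: string): number {
  // 1. Paragraph boundary
  const paragraph = slice.lastIndexOf("\n\n");
  if (paragraph >= 0) return paragraph + 2;

  // 2. Sentence end (take the latest match across all variants)
  let sentence = -1;
  for (const end of [". ", ".\n", "? ", "! "]) {
    const idx = slice.lastIndexOf(end);
    if (idx >= 0) sentence = Math.max(sentence, idx + end.length);
  }
  if (sentence >= 0) return sentence;

  // 3. Line break
  const line = slice.lastIndexOf("\n");
  if (line >= 0) return line + 1;

  // 4. Word boundary, else hard cut
  const word = slice.lastIndexOf(" ");
  return word >= 0 ? word + 1 : -1;
}
```

Because the caller only searches the last 30% of the chunk, even a word-boundary fallback never discards more than roughly a third of the text.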
798
-
799
- #### 2. Character-Based Chunking (Fallback)
800
-
801
- ```typescript
802
- function chunkDocument(
803
- content: string,
804
- maxChars: number = 3200, // ~800 tokens @ 4 chars/token
805
- overlapChars: number = 480 // 15% overlap
806
- ): { text: string; pos: number }[] {
807
- // Similar logic but operates on characters instead of tokens
808
- // Faster but less accurate (doesn't respect token boundaries)
809
- }
810
- ```
811
-
812
- **When to Use**: Synchronous contexts where async tokenization isn't available.
813
-
814
- ### sqlite-vec Integration
815
-
816
- QMD uses **sqlite-vec 0.1.x** (vec0 virtual table):
817
-
818
- ```typescript
819
- // Create virtual table for native vectors
820
- db.exec(`
821
- CREATE VIRTUAL TABLE vectors_vec USING vec0(
822
- hash_seq TEXT PRIMARY KEY,
823
- embedding float[384] distance_metric=cosine
824
- )
825
- `);
826
-
827
- // Insert embedding (note: Float32Array required)
828
- const embedding = new Float32Array(embeddingArray);
829
- db.prepare(`
830
- INSERT INTO vectors_vec (hash_seq, embedding) VALUES (?, ?)
831
- `).run(`${hash}_${seq}`, embedding);
832
-
833
- // KNN search (CRITICAL: no JOINs in same query!)
834
- const vecResults = db.prepare(`
835
- SELECT hash_seq, distance
836
- FROM vectors_vec
837
- WHERE embedding MATCH ? AND k = ?
838
- `).all(queryEmbedding, limit * 3);
839
-
840
- // Then join with documents in separate query
841
- const docs = db.prepare(`
842
- SELECT * FROM documents WHERE hash IN (...)
843
- `).all(hashList);
844
- ```
845
-
846
- **Key Insights**:
847
-
848
- 1. **Two-Step Pattern Required**: JOINs with vec0 tables hang (confirmed bug)
849
- 2. **Float32Array**: Must convert number[] to typed array
850
- 3. **Cosine Distance**: Returns 0.0 (identical) to 2.0 (opposite)
851
- 4. **KNN Parameter**: Request `limit * 3` to allow for deduplication
247
+ **Key patterns**:
248
+ 1. **Dual output**: Both `content` (human-readable text) and `structuredContent` (JSON) returned from every tool
249
+ 2. **Zod validation**: Input schemas use Zod v4 with `.describe()` for auto-documentation
250
+ 3. **Resource template**: Documents accessible via `qmd://{+path}` URI pattern with suffix matching fallback (`mcp.ts:105-166`)
251
+ 4. **Query guide prompt**: Registered prompt explaining search strategy to LLMs (`mcp.ts:172-252`)
252
+ 5. **Line numbers**: Default in resource output for precise references
253
+ 6. **Error handling**: `isError: true` flag for clear error signaling, fuzzy file suggestions on not-found
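A minimal sketch of the dual-output shape (the `SearchResult` fields and the `formatSearchSummary` body here are assumptions for illustration; QMD's real logic lives in `mcp.ts`):

```typescript
interface SearchResult { file: string; score: number; snippet: string; }

// Human-readable summary: compact and token-efficient for LLM consumption.
function formatSearchSummary(results: SearchResult[], query: string): string {
  if (results.length === 0) return `No results for "${query}"`;
  return results
    .map((r, i) => `${i + 1}. ${r.file} (score ${r.score.toFixed(2)}): ${r.snippet}`)
    .join("\n");
}

// Every tool returns both representations: text for humans/LLMs,
// structuredContent as a machine-parseable mirror.
function toolResponse(results: SearchResult[], query: string) {
  return {
    content: [{ type: "text" as const, text: formatSearchSummary(results, query) }],
    structuredContent: { results },
  };
}
```

The two fields carry the same information, so clients can consume whichever form suits them without a second round-trip.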
852
254
 
853
- ### Per-Document vs Per-Chunk Deduplication
854
-
855
- QMD deduplicates **per-document** after vector search:
856
-
857
- ```typescript
858
- // Multiple chunks per document may match
859
- // Keep only the best chunk per document
860
- const seen = new Map<string, { doc, bestDistance }>();
861
-
862
- for (const row of docRows) {
863
- const distance = distanceMap.get(row.hash_seq);
864
- const existing = seen.get(row.filepath);
865
-
866
- if (!existing || distance < existing.bestDistance) {
867
- seen.set(row.filepath, { doc: row, bestDistance: distance });
868
- }
869
- }
870
-
871
- return Array.from(seen.values())
872
- .sort((a, b) => a.bestDistance - b.bestDistance);
873
- ```
874
-
875
- **Rationale**: Users want documents, not chunks. Show best chunk per doc.
255
+ **Relevance to ClaudeMemory**: We already have 18 MCP tools, but QMD's dual `content`/`structuredContent` pattern is worth adopting — it ensures both human (text summary) and machine (JSON) consumers get optimal formats. The registered prompt for query guidance is also a good pattern for improving Claude's tool usage.
876
256
 
877
257
  ---
878
258
 
879
- ## LLM Infrastructure
880
-
881
- ### node-llama-cpp Abstraction
882
-
883
- QMD uses **node-llama-cpp** for local inference:
884
-
885
- ```typescript
886
- import { getLlama, LlamaModel, LlamaChatSession } from "node-llama-cpp";
887
-
888
- class LlamaCpp implements LLM {
889
- private llama: Llama | null = null;
890
- private embedModel: LlamaModel | null = null;
891
- private rerankModel: LlamaModel | null = null;
892
- private generateModel: LlamaModel | null = null;
893
-
894
- // Lazy loading with singleton pattern
895
- private async ensureLlama(): Promise<Llama> {
896
- if (!this.llama) {
897
- this.llama = await getLlama({ logLevel: LlamaLogLevel.error });
898
- }
899
- return this.llama;
900
- }
901
-
902
- private async ensureEmbedModel(): Promise<LlamaModel> {
903
- if (!this.embedModel) {
904
- const llama = await this.ensureLlama();
905
- const modelPath = await resolveModelFile(
906
- "hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf",
907
- this.modelCacheDir
908
- );
909
- this.embedModel = await llama.loadModel({ modelPath });
910
- }
911
- return this.embedModel;
912
- }
913
- }
914
- ```
915
-
916
- **Model Download**: Automatic from HuggingFace (cached in `~/.cache/qmd/models/`)
917
-
918
- ### Lazy Model Loading
919
-
920
- **Strategy**: Load models on first use, keep in memory, unload after 2 minutes idle.
921
-
922
- ```typescript
923
- // Inactivity timer management
924
- private touchActivity(): void {
925
- if (this.inactivityTimer) {
926
- clearTimeout(this.inactivityTimer);
927
- }
928
-
929
- if (this.inactivityTimeoutMs > 0 && this.hasLoadedContexts()) {
930
- this.inactivityTimer = setTimeout(() => {
931
- this.unloadIdleResources();
932
- }, this.inactivityTimeoutMs);
933
- this.inactivityTimer.unref(); // Don't block process exit
934
- }
935
- }
936
-
937
- // Unload contexts (heavy) but keep models (fast reload)
938
- async unloadIdleResources(): Promise<void> {
939
- if (this.embedContext) {
940
- await this.embedContext.dispose();
941
- this.embedContext = null;
942
- }
943
- if (this.rerankContext) {
944
- await this.rerankContext.dispose();
945
- this.rerankContext = null;
946
- }
947
-
948
- // Optional: also dispose models if disposeModelsOnInactivity=true
949
- // (default: false, keep models loaded)
950
- }
951
- ```
952
-
953
- **Lifecycle** (from llm.ts comments):
954
- ```
955
- llama (lightweight) → model (VRAM) → context (VRAM) → sequence (per-session)
956
- ```
957
-
958
- **Why This Matters**:
959
- - **Cold start**: First query loads models (~2-3s)
960
- - **Warm**: Subsequent queries use loaded models (~200-500ms)
961
- - **Idle**: After 2min, contexts unloaded (models stay unless configured)
962
-
963
- ### Query Expansion
964
-
965
- **Purpose**: Generate alternative phrasings for better recall.
966
-
967
- **LLM Prompt** (from llm.ts:637-679):
968
- ```typescript
969
- const prompt = `You are a search query optimization expert. Your task is to improve retrieval by rewriting queries and generating hypothetical documents.
970
-
971
- Original Query: ${query}
972
-
973
- ${context ? `Additional Context, ONLY USE IF RELEVANT:\n\n<context>${context}</context>` : ""}
974
-
975
- ## Step 1: Query Analysis
976
- Identify entities, search intent, and missing context.
977
-
978
- ## Step 2: Generate Hypothetical Document
979
- Write a focused sentence passage that would answer the query. Include specific terminology and domain vocabulary.
980
-
981
- ## Step 3: Query Rewrites
982
- Generate 2-3 alternative search queries that resolve ambiguities. Use terminology from the hypothetical document.
983
-
984
- ## Step 4: Final Retrieval Text
985
- Output exactly 1-3 'lex' lines, 1-3 'vec' lines, and MAX ONE 'hyde' line.
986
-
987
- <format>
988
- lex: {single search term}
989
- vec: {single vector query}
990
- hyde: {complete hypothetical document passage from Step 2 on a SINGLE LINE}
991
- </format>
992
-
993
- <rules>
994
- - DO NOT repeat the same line.
995
- - Each 'lex:' line MUST be a different keyword variation based on the ORIGINAL QUERY.
996
- - Each 'vec:' line MUST be a different semantic variation based on the ORIGINAL QUERY.
997
- - The 'hyde:' line MUST be the full sentence passage from Step 2, but all on one line.
998
- </rules>
999
-
1000
- Final Output:`;
1001
- ```
1002
-
1003
- **Grammar** (constrained generation):
1004
- ```typescript
1005
- const grammar = await llama.createGrammar({
1006
- grammar: `
1007
- root ::= line+
1008
- line ::= type ": " content "\\n"
1009
- type ::= "lex" | "vec" | "hyde"
1010
- content ::= [^\\n]+
1011
- `
1012
- });
1013
- ```
1014
-
1015
- **Output Parsing**:
1016
- ```typescript
1017
- const result = await session.prompt(prompt, { grammar, maxTokens: 1000, temperature: 1 });
1018
- const lines = result.trim().split("\n");
1019
- const queryables: Queryable[] = lines.map(line => {
1020
- const colonIdx = line.indexOf(":");
1021
- const type = line.slice(0, colonIdx).trim();
1022
- const text = line.slice(colonIdx + 1).trim();
1023
- return { type: type as QueryType, text };
1024
- }).filter(q => q.type === 'lex' || q.type === 'vec' || q.type === 'hyde');
1025
- ```
1026
-
1027
- **Example**:
1028
- ```
1029
- Query: "how to structure REST endpoints"
1030
-
1031
- Output:
1032
- lex: REST API design
1033
- lex: endpoint organization patterns
1034
- vec: RESTful service architecture principles
1035
- vec: HTTP resource modeling best practices
1036
- hyde: REST endpoints should follow resource-oriented design with clear hierarchies. Use nouns for resources, HTTP methods for operations, and consistent naming conventions for discoverability.
1037
- ```
1038
-
1039
- **Model**: Qwen3-1.7B (2.2GB)
1040
-
1041
- **Cache Hit Rate**: High for repeated queries (~80% per QMD usage data)
1042
-
1043
- ### LLM Reranking
1044
-
1045
- **Purpose**: Score query-document relevance using cross-encoder.
1046
-
1047
- **Implementation**:
1048
- ```typescript
1049
- async rerank(
1050
- query: string,
1051
- documents: RerankDocument[],
1052
- options: RerankOptions = {}
1053
- ): Promise<RerankResult> {
1054
- const context = await this.ensureRerankContext();
1055
-
1056
- // Extract text for ranking
1057
- const texts = documents.map(doc => doc.text);
1058
-
1059
- // Use native ranking API (returns sorted by score)
1060
- const ranked = await context.rankAndSort(query, texts);
1061
-
1062
- // Map back to original documents
1063
- const results = ranked.map(item => {
1064
- const docInfo = textToDoc.get(item.document);
1065
- return {
1066
- file: docInfo.file,
1067
- score: item.score, // 0.0 (irrelevant) to 1.0 (highly relevant)
1068
- index: docInfo.index
1069
- };
1070
- });
1071
-
1072
- return { results, model: this.rerankModelUri };
1073
- }
1074
- ```
1075
-
1076
- **Model**: Qwen3-Reranker-0.6B (640MB)
259
+ ### Component 4: Session-Scoped LLM Lifecycle
1077
260
 
1078
- **Score Range**: 0.0 to 1.0 (normalized from logits)
261
+ **Purpose**: Manage LLM model loading, context creation, and cleanup with structured lifecycle guarantees.
1079
262
 
1080
- **Cache Key**: `hash(query + file + model)`
263
+ **Location**: `src/llm.ts:126-146`
1081
264
 
1082
- ### Cache Management
1083
-
1084
- **Probabilistic Cleanup** (from store.ts:804-807):
265
+ **Session Interface** (from `llm.ts:137-146`):
1085
266
  ```typescript
1086
- // 1% chance per query to run cleanup
1087
- if (Math.random() < 0.01) {
1088
- db.run(`
1089
- DELETE FROM llm_cache
1090
- WHERE hash NOT IN (
1091
- SELECT hash FROM llm_cache
1092
- ORDER BY created_at DESC
1093
- LIMIT 1000
1094
- )
1095
- `);
267
+ export interface ILLMSession {
268
+ embed(text: string, options?: EmbedOptions): Promise<EmbeddingResult | null>;
269
+ embedBatch(texts: string[]): Promise<(EmbeddingResult | null)[]>;
270
+ expandQuery(query: string, options?): Promise<Queryable[]>;
271
+ rerank(query: string, documents: RerankDocument[]): Promise<RerankResult>;
272
+ readonly isValid: boolean;
273
+ readonly signal: AbortSignal;
1096
274
  }
1097
275
  ```
1098
276
 
1099
- **Rationale**:
1100
- - Keep latest 1000 entries (most likely to be reused)
1101
- - Probabilistic avoids overhead on every query
1102
- - Self-tuning: frequent queries naturally stay cached
1103
-
1104
- **Cache Size Estimate**:
1105
- - Query expansion: ~500 bytes per entry
1106
- - Reranking: ~50 bytes per entry (just score)
1107
- - 1000 entries ≈ 500KB (negligible)
1108
-
1109
- ---
1110
-
1111
- ## Performance Characteristics
1112
-
1113
- ### Evaluation Methodology
1114
-
1115
- QMD includes comprehensive test suite in `eval.test.ts`:
1116
-
1117
- **Test Corpus**: 6 synthetic documents covering diverse topics
1118
- - api-design.md
1119
- - fundraising.md
1120
- - distributed-systems.md
1121
- - machine-learning.md
1122
- - remote-work.md
1123
- - product-launch.md
1124
-
1125
- **Query Design**: 24 queries across 4 difficulty levels
1126
-
1127
- #### Easy Queries (6) - Exact keyword matches
1128
- ```typescript
1129
- { query: "API versioning", expectedDoc: "api-design" }
1130
- { query: "Series A fundraising", expectedDoc: "fundraising" }
1131
- { query: "CAP theorem", expectedDoc: "distributed-systems" }
1132
- { query: "overfitting machine learning", expectedDoc: "machine-learning" }
1133
- { query: "remote work VPN", expectedDoc: "remote-work" }
1134
- { query: "Project Phoenix retrospective", expectedDoc: "product-launch" }
1135
- ```
1136
-
1137
- **Expected**: BM25 should excel (≥80% Hit@3)
1138
-
1139
- #### Medium Queries (6) - Semantic/conceptual
1140
- ```typescript
1141
- { query: "how to structure REST endpoints", expectedDoc: "api-design" }
1142
- { query: "raising money for startup", expectedDoc: "fundraising" }
1143
- { query: "consistency vs availability tradeoffs", expectedDoc: "distributed-systems" }
1144
- { query: "how to prevent models from memorizing data", expectedDoc: "machine-learning" }
1145
- { query: "working from home guidelines", expectedDoc: "remote-work" }
1146
- { query: "what went wrong with the launch", expectedDoc: "product-launch" }
1147
- ```
1148
-
1149
- **Expected**: Vectors should outperform BM25 (≥40% vs ≥15%)
277
+ **Key patterns**:
278
+ - Sessions have `isValid` flag and `signal` (AbortSignal) for lifecycle tracking
279
+ - Maximum duration timeout prevents runaway sessions
280
+ - Models lazy-load but stay resident; contexts dispose after 2-min idle
281
+ - Singleton pattern ensures only one LLM instance (memory management)
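The lifecycle above can be sketched roughly as follows (class shape and field names are assumptions; QMD's actual `llm.ts` implementation is more involved):

```typescript
class LLMSession {
  private controller = new AbortController();
  private valid = true;
  private timer: ReturnType<typeof setTimeout>;

  constructor(maxDurationMs: number) {
    // Hard ceiling: a runaway session aborts itself after maxDurationMs.
    this.timer = setTimeout(() => this.dispose("max duration exceeded"), maxDurationMs);
  }

  get isValid(): boolean { return this.valid; }
  get signal(): AbortSignal { return this.controller.signal; }

  dispose(reason = "disposed"): void {
    if (!this.valid) return;
    this.valid = false;
    clearTimeout(this.timer);
    // Abort propagates to any in-flight embed/expandQuery/rerank call
    // that was handed this.signal.
    this.controller.abort(reason);
  }
}
```

Consumers check `session.isValid` before issuing work and pass `session.signal` into long-running calls, so cancellation propagates cleanly instead of leaving orphaned inference jobs.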
1150
282
 
1151
- #### Hard Queries (6) - Vague, partial memory
1152
- ```typescript
1153
- { query: "nouns not verbs", expectedDoc: "api-design" }
1154
- { query: "Sequoia investor pitch", expectedDoc: "fundraising" }
1155
- { query: "Raft algorithm leader election", expectedDoc: "distributed-systems" }
1156
- { query: "F1 score precision recall", expectedDoc: "machine-learning" }
1157
- { query: "quarterly team gathering travel", expectedDoc: "remote-work" }
1158
- { query: "beta program 47 bugs", expectedDoc: "product-launch" }
1159
- ```
1160
-
1161
- **Expected**: Both methods struggle; hybrid helps (Hit@5: ≥35% vs ≥15%)
1162
-
1163
- #### Fusion Queries (6) - Multi-signal needed
1164
- ```typescript
1165
- { query: "how much runway before running out of money", expectedDoc: "fundraising" }
1166
- { query: "datacenter replication sync strategy", expectedDoc: "distributed-systems" }
1167
- { query: "splitting data for training and testing", expectedDoc: "machine-learning" }
1168
- { query: "JSON response codes error messages", expectedDoc: "api-design" }
1169
- { query: "video calls camera async messaging", expectedDoc: "remote-work" }
1170
- { query: "CI/CD pipeline testing coverage", expectedDoc: "product-launch" }
1171
- ```
1172
-
1173
- **Expected**: RRF combines weak signals (≥50% vs ~15-30% for single methods)
1174
-
1175
- ### Results Summary
1176
-
1177
- | Method | Easy H@3 | Medium H@3 | Hard H@5 | Fusion H@3 | Overall H@3 |
1178
- |--------|----------|------------|----------|------------|-------------|
1179
- | **BM25** | ≥80% | ≥15% | ≥15% | ~15% | ≥40% |
1180
- | **Vector** | ≥60% | ≥40% | ≥30% | ~30% | ≥50% |
1181
- | **Hybrid (RRF)** | ≥80% | **≥50%** | **≥35%** | **≥50%** | **≥60%** |
1182
-
1183
- **Key Findings**:
1184
- 1. BM25 sufficient for easy queries (exact matches)
1185
- 2. Vectors essential for medium queries (≥40% vs ≥15% Hit@3 for BM25)
1186
- 3. RRF fusion best for fusion queries (combines weak signals)
1187
- 4. Overall: Hybrid provides 50% improvement over BM25 baseline
1188
-
1189
- ### Latency Analysis
1190
-
1191
- **Measured on M1 Mac, 16GB RAM**:
1192
-
1193
- | Operation | Cold Start | Warm (Cached) | Strong Signal |
1194
- |-----------|------------|---------------|---------------|
1195
- | `search` (BM25) | <50ms | <50ms | <50ms |
1196
- | `vsearch` (Vector) | ~2s (model load) | ~200ms | ~200ms |
1197
- | `query` (Hybrid) | 3-5s (all models) | ~500ms | ~200ms |
1198
-
1199
- **Breakdown for `query` (cold)**:
1200
- - Model loading: ~2s (embed + rerank + expand)
1201
- - Query expansion: ~800ms (LLM generation)
1202
- - FTS + Vector: ~300ms (parallel)
1203
- - RRF fusion: <10ms (pure algorithm)
1204
- - Reranking: ~400ms (cross-encoder scoring)
1205
- - Total: 3-5s
1206
-
1207
- **Breakdown for `query` (warm)**:
1208
- - FTS + Vector: ~300ms
1209
- - RRF fusion: <10ms
1210
- - Reranking (cached): ~50ms
1211
- - Total: ~400-500ms
1212
-
1213
- **Breakdown for `query` (strong signal, skipped)**:
1214
- - FTS: ~50ms
1215
- - Smart detection: <5ms
1216
- - Vector (skipped): 0ms
1217
- - Expansion (skipped): 0ms
1218
- - Reranking (skipped): 0ms
1219
- - Total: ~100-150ms
1220
-
1221
- ### Resource Usage
1222
-
1223
- **Disk Space**:
1224
- - Per document: ~5KB (body + metadata)
1225
- - Per chunk embedding: ~1.5KB (384 floats + metadata)
1226
- - Example: 1000 documents, 5 chunks avg = 5MB + 7.5MB = **12.5MB total**
1227
-
1228
- **Memory**:
1229
- - Base process: ~50MB
1230
- - EmbeddingGemma loaded: +300MB
1231
- - Reranker loaded: +640MB
1232
- - Expansion model loaded: +2.2GB
1233
- - **Peak**: ~3.2GB (all models loaded)
1234
-
1235
- **VRAM** (GPU acceleration):
1236
- - EmbeddingGemma: ~300MB
1237
- - Reranker: ~640MB
1238
- - Expansion: ~2.2GB
1239
- - **Peak**: ~3.2GB
1240
-
1241
- **Optimization**: Models lazy-load and unload after 2min idle.
1242
-
1243
- ### Scalability
1244
-
1245
- **Tested Corpus Sizes**:
1246
- - 100 documents: FTS <10ms, Vector <100ms
1247
- - 1,000 documents: FTS <50ms, Vector <200ms
1248
- - 10,000 documents: FTS <200ms, Vector <500ms
1249
-
1250
- **Bottlenecks**:
1251
- 1. **Embedding generation**: Linear with document count (once)
1252
- 2. **Vector search**: KNN scales log(n) with proper indexing
1253
- 3. **FTS search**: Scales well to millions of documents
1254
- 4. **Reranking**: Linear with candidate count (top 30-40)
1255
-
1256
- **Recommended Limits**:
1257
- - Documents: 50,000+ (tested in production)
1258
- - Per-document size: <10MB (chunking handles larger)
1259
- - Query length: <500 tokens (embedding model limit)
283
+ **Relevance to ClaudeMemory**: If we ever integrate local LLMs for distillation, this session-scoped lifecycle pattern is the right approach. Clean abort propagation via AbortSignal is a good practice for any long-running operation.
1260
284
 
1261
285
  ---
1262
286
 
1263
287
  ## Comparative Analysis
1264
288
 
1265
- ### Data Model Differences
1266
-
1267
- | Dimension | QMD | ClaudeMemory | Analysis |
1268
- |-----------|-----|--------------|----------|
1269
- | **Granularity** | Full markdown documents | Structured facts (triples) | **Different use cases**: QMD = recall, ClaudeMemory = extraction |
1270
- | **Storage** | Content-addressable (SHA256) | Entity-predicate-object | **QMD advantage**: Auto-deduplication. **ClaudeMemory advantage**: Queryable structure |
1271
- | **Retrieval Goal** | "Show me docs about X" | "What do we know about X?" | **Complementary**: QMD finds context, ClaudeMemory distills knowledge |
1272
- | **Truth Model** | All documents valid | Supersession + conflicts | **ClaudeMemory advantage**: Resolves contradictions |
1273
- | **Scope** | YAML collections | Dual-database | **ClaudeMemory advantage**: Clean separation |
1274
-
1275
- **Verdict**: **Different paradigms, not competitors**. QMD optimizes for document recall, ClaudeMemory for knowledge graphs.
289
+ ### What QMD Does Well (New Findings)
1276
290
 
1277
- ### Search Quality
291
+ #### 1. Custom Fine-Tuned Model Pipeline
292
+ - **Description**: Full training pipeline (SFT → GRPO → GGUF conversion) for search-specific model
293
+ - **Evidence**: `finetune/reward.py` — multi-dimensional reward function; `finetune/train.py` — unified training script
294
+ - **Why It Works**: Domain-specific models outperform general-purpose LLMs for structured tasks. The two-stage approach (format learning via SFT, quality refinement via GRPO) is state-of-the-art.
295
+ - **Metric**: Min 92% average score required before deployment
1278
296
 
1279
- | Feature | QMD | ClaudeMemory | Winner |
1280
- |---------|-----|--------------|--------|
1281
- | **Lexical Search** | BM25 (FTS5) | FTS5 | **Tie** |
1282
- | **Vector Search** | EmbeddingGemma (300M) | TF-IDF (lightweight) | **QMD** (but costly) |
1283
- | **Ranking Algorithm** | RRF + position-aware blending | Score sorting | **QMD** |
1284
- | **Reranking** | Cross-encoder LLM | None | **QMD** (but costly) |
1285
- | **Query Expansion** | LLM-generated variants | None | **QMD** (but costly) |
297
+ #### 2. Plugin Distribution
298
+ - **Description**: Ships as a Claude Code marketplace plugin with zero-config MCP + skills
299
+ - **Evidence**: `.claude-plugin/marketplace.json`, `skills/qmd/SKILL.md`
300
+ - **Why It Works**: `claude marketplace add tobi/qmd` is dramatically simpler than manual gem install + MCP config + hook setup
301
+ - **Impact**: Massive UX improvement for installation
1286
302
 
1287
- **Verdict**: **QMD has superior search quality**, but at significant cost (3GB models, 2-3s latency).
303
+ #### 3. Typed Query Routing
304
+ - **Description**: Query expansion produces typed outputs (`lex:`, `vec:`, `hyde:`) routed to appropriate backends
305
+ - **Evidence**: `llm.ts:637-679` — structured prompt; `llm.ts:1006-1013` — grammar constraint
306
+ - **Why It Works**: Different search backends have different strengths. Routing keyword queries to BM25 and semantic queries to vector search maximizes recall.
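A minimal sketch of that routing step (type and function names assumed for illustration):

```typescript
type QueryType = "lex" | "vec" | "hyde";
interface Queryable { type: QueryType; text: string; }

// lex lines go to the BM25/FTS backend; vec and hyde lines are embedded
// and sent to the vector backend.
function routeQueries<R>(
  queryables: Queryable[],
  searchFts: (q: string) => R[],
  searchVector: (q: string) => R[],
): R[][] {
  return queryables.map(q =>
    q.type === "lex" ? searchFts(q.text) : searchVector(q.text)
  );
}
```

Each returned list then feeds into RRF as one ranked list, so a variant that only helps one backend still contributes its signal.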
1288
307
 
1289
- **Key Question**: Is the quality improvement worth the complexity for ClaudeMemory's fact-based use case?
308
+ #### 4. Dual Content/StructuredContent MCP Responses
309
+ - **Description**: Every MCP tool returns both human-readable text summary and machine-parseable JSON
310
+ - **Evidence**: `mcp.ts:288-291` — `return { content: [...], structuredContent: {...} }`
311
+ - **Why It Works**: LLMs can parse both formats, but text summaries are more token-efficient for simple consumption
1290
312
 
1291
- ### Vector Storage
313
+ ### What We Do Well
1292
314
 
1293
- | Aspect | QMD | ClaudeMemory | Winner |
1294
- |--------|-----|--------------|--------|
1295
- | **Storage Format** | sqlite-vec native (vec0) | JSON columns | **QMD** |
1296
- | **KNN Performance** | Native C code | Ruby JSON parsing | **QMD** (10-100x faster) |
1297
- | **Index Type** | Proper vector index | Sequential scan | **QMD** |
1298
- | **Scalability** | Tested to 10,000+ docs | Limited by JSON parsing | **QMD** |
315
+ #### 1. Fact-Based Knowledge Graph
316
+ - Our subject-predicate-object triples enable structured queries and inference
317
+ - Truth maintenance resolves contradictions automatically
318
+ - Far richer than document-level retrieval for knowledge extraction
1299
319
 
1300
- **Verdict**: **QMD's approach is objectively better**. This is a clear adoption opportunity.
320
+ #### 2. Dual-Database Architecture
321
+ - Clean global/project separation without YAML collections
322
+ - Simpler queries, clearer data ownership
1301
323
 
1302
- ### Dependencies
324
+ #### 3. Comprehensive MCP Surface
325
+ - 18 tools vs QMD's 6 — we cover recall, explain, manage, monitor
326
+ - Progressive disclosure (recall_index → recall_details) for token efficiency
1303
327
 
1304
- | Category | QMD | ClaudeMemory | Winner |
1305
- |----------|-----|--------------|--------|
1306
- | **Runtime** | Bun (Node.js compatible) | Ruby 3.2+ | **ClaudeMemory** (simpler) |
1307
- | **Database** | SQLite + sqlite-vec | SQLite | **ClaudeMemory** (fewer deps) |
1308
- | **Embeddings** | EmbeddingGemma (300MB) | TF-IDF (stdlib) | **ClaudeMemory** (lighter) |
1309
- | **LLM** | node-llama-cpp (3GB models) | None (distill only) | **ClaudeMemory** (lighter) |
1310
- | **Install Size** | ~3.5GB (with models) | ~5MB | **ClaudeMemory** |
328
+ #### 4. Lightweight Dependencies
329
+ - ~5MB gem vs ~2GB+ with GGUF models
330
+ - fastembed-rb (67MB ONNX) vs EmbeddingGemma (300MB GGUF)
331
+ - No runtime LLM dependency
1311
332
 
1312
- **Verdict**: **ClaudeMemory is dramatically lighter**, which aligns with our philosophy of pragmatic dependencies.
333
+ #### 5. Robust Benchmarking
334
+ - DevMemBench: 155 queries, Recall@k, MRR, nDCG@10
335
+ - 100 truth maintenance test cases
336
+ - 31 end-to-end scenarios with real Claude
337
+ - QMD has 18 eval queries — our evaluation is more comprehensive
1313
338
 
1314
- ### Offline Capability
339
+ ### Trade-offs
1315
340
 
1316
- | Operation | QMD | ClaudeMemory | Winner |
1317
- |-----------|-----|--------------|--------|
1318
- | **Indexing** | Fully offline | Fully offline | **Tie** |
1319
- | **Searching** | Fully offline | Fully offline (TF-IDF) | **Tie** |
1320
- | **Distillation** | N/A | Requires API | **QMD** (but N/A) |
1321
-
1322
- **Verdict**: **QMD has complete offline capability** for its use case. ClaudeMemory could adopt local embeddings for offline semantic search, but distillation still requires API.
1323
-
1324
- ### Startup Time
1325
-
1326
- | Scenario | QMD | ClaudeMemory | Winner |
1327
- |----------|-----|--------------|--------|
1328
- | **Cold start** | ~2s (model load) | <100ms | **ClaudeMemory** |
1329
- | **Warm start** | <100ms | <100ms | **Tie** |
1330
-
1331
- **Verdict**: **ClaudeMemory starts faster**, which matters for CLI tools. QMD's lazy loading mitigates this.
341
+ | Approach | Pros | Cons | Best For |
342
+ |----------|------|------|----------|
343
+ | **QMD's LLM-powered search** | Better semantic recall, typed query routing | 2GB+ models, 2-3s cold start, complex deps | Large document collections, conceptual search |
344
+ | **Our FastEmbed search** | Lightweight (67MB), fast (<100ms), no LLM | Lower semantic quality for vague queries | Structured fact retrieval, quick lookups |
345
+ | **QMD's plugin distribution** | Zero-config install, marketplace discovery | Requires plugin ecosystem maturity | Wide user adoption |
346
+ | **Our gem + MCP + hooks** | Fine-grained control, works today | Complex setup, multiple config files | Power users, custom integrations |
1332
347
 
1333
348
  ---
1334
349
 
1335
350
  ## Adoption Opportunities
1336
351
 
1337
- ### High Priority (Immediate Adoption)
-
- #### 1. sqlite-vec Extension for Native Vector Storage
-
- **Value**: **10-100x faster KNN queries**, enables larger fact databases without performance degradation.
-
- **QMD Proof**:
- - Handles 10,000+ documents with sub-second vector queries
- - Native C code vs Ruby JSON parsing
- - Proper indexing vs sequential scan
-
- **Current ClaudeMemory**:
- ```ruby
- # lib/claude_memory/embeddings/similarity.rb
- def search_similar(query_embedding, limit: 10)
-   # Load ALL facts with embeddings
-   facts_data = store.facts_with_embeddings(limit: 5000)
-
-   # Parse JSON embeddings (slow!)
-   candidates = facts_data.map do |row|
-     embedding = JSON.parse(row[:embedding_json])
-     { fact_id: row[:id], embedding: embedding }
-   end
-
-   # Calculate cosine similarity in Ruby (slow!)
-   top_matches = candidates.map do |c|
-     similarity = cosine_similarity(query_embedding, c[:embedding])
-     { candidate: c, similarity: similarity }
-   end.sort_by { |m| -m[:similarity] }.take(limit)
- end
- ```
-
- **Problems**:
- - Loads up to 5000 facts into memory
- - JSON parsing overhead per fact
- - O(n) similarity calculation in Ruby
- - No proper indexing
-
- **With sqlite-vec**:
- ```ruby
- # Step 1: Create virtual table (migration v7)
- db.run(<<~SQL)
-   CREATE VIRTUAL TABLE facts_vec USING vec0(
-     fact_id INTEGER PRIMARY KEY,
-     embedding float[384] distance_metric=cosine
-   )
- SQL
-
- # Step 2: Query with native KNN (two-step to avoid JOIN hang)
- def search_similar(query_embedding, limit: 10)
-   vector_blob = query_embedding.pack('f*') # Float32Array
-
-   # Step 2a: Get fact IDs from vec table (no JOINs!)
-   vec_results = @store.db[<<~SQL, vector_blob, limit * 3].all
-     SELECT fact_id, distance
-     FROM facts_vec
-     WHERE embedding MATCH ? AND k = ?
-   SQL
-
-   # Step 2b: Join with facts table separately
-   fact_ids = vec_results.map { |r| r[:fact_id] }
-   facts = @store.facts.where(id: fact_ids).all
-
-   # Merge and sort
-   facts.map do |fact|
-     distance = vec_results.find { |r| r[:fact_id] == fact[:id] }[:distance]
-     { fact: fact, similarity: 1 - distance }
-   end.sort_by { |r| -r[:similarity] }
- end
- ```
-
- **Benefits**:
- - **10-100x faster**: Native C code
- - **Better memory**: No need to load all facts
- - **Scales**: Handles 50,000+ facts easily
- - **Industry standard**: Used by Chroma, LanceDB, etc.
-
- **Implementation**:
- 1. Add sqlite-vec extension (gem or FFI)
- 2. Schema migration v7: Create `facts_vec` virtual table
- 3. Backfill existing embeddings
- 4. Update Similarity class
- 5. Test migration on existing databases
-
- **Trade-off**: Adds native dependency, but well-maintained and cross-platform.
-
- **Recommendation**: **ADOPT IMMEDIATELY**. This is a foundational improvement.
-
- ---
-
- #### 2. ⭐ Reciprocal Rank Fusion (RRF) Algorithm
-
- **Value**: **50% improvement in Hit@3** for medium-difficulty queries (QMD evaluation).
-
- **QMD Proof**: Evaluation shows consistent improvements across all query types.
-
- **Current ClaudeMemory**:
- ```ruby
- # lib/claude_memory/recall.rb
- def merge_search_results(vector_results, text_results, limit)
-   # Simple dedupe: add all results, prefer vector scores
-   combined = {}
-
-   vector_results.each { |r| combined[r[:fact][:id]] = r }
-   text_results.each { |r| combined[r[:fact][:id]] ||= r }
-
-   # Sort by similarity (vector) or default score (FTS)
-   combined.values
-     .sort_by { |r| -(r[:similarity] || 0) }
-     .take(limit)
- end
- ```
-
- **Problems**:
- - No fusion of ranking signals
- - Vector scores dominate (when present)
- - Doesn't boost items appearing in multiple result lists
- - Ignores rank position (only final scores)
-
- **With RRF**:
- ```ruby
- # lib/claude_memory/recall/rrf_fusion.rb
- module ClaudeMemory
-   module Recall
-     class RRFusion
-       DEFAULT_K = 60
-
-       def self.fuse(ranked_lists, weights: [], k: DEFAULT_K)
-         scores = {}
-
-         # Accumulate RRF scores
-         ranked_lists.each_with_index do |list, list_idx|
-           weight = weights[list_idx] || 1.0
-
-           list.each_with_index do |item, rank|
-             key = item_key(item)
-             rrf_contribution = weight / (k + rank + 1.0)
-
-             if scores.key?(key)
-               scores[key][:rrf_score] += rrf_contribution
-               scores[key][:top_rank] = [scores[key][:top_rank], rank].min
-             else
-               scores[key] = {
-                 item: item,
-                 rrf_score: rrf_contribution,
-                 top_rank: rank
-               }
-             end
-           end
-         end
-
-         # Top-rank bonus
-         scores.each_value do |entry|
-           if entry[:top_rank] == 0
-             entry[:rrf_score] += 0.05 # #1 in any list
-           elsif entry[:top_rank] <= 2
-             entry[:rrf_score] += 0.02 # #2-3 in any list
-           end
-         end
-
-         # Sort and return
-         scores.values
-           .sort_by { |e| -e[:rrf_score] }
-           .map { |e| e[:item].merge(rrf_score: e[:rrf_score]) }
-       end
-
-       def self.item_key(item)
-         # Dedupe by fact signature
-         fact = item[:fact]
-         "#{fact[:subject_name]}:#{fact[:predicate]}:#{fact[:object_literal]}"
-       end
-       # Note: a bare `private` has no effect on `def self.` methods
-       private_class_method :item_key
-     end
-   end
- end
- ```
-
- **Benefits**:
- - **Mathematically sound**: Well-studied in IR literature
- - **Handles score scale differences**: BM25 vs cosine similarity
- - **Boosts multi-method matches**: Items in both lists get higher scores
- - **Preserves exact matches**: Top-rank bonus keeps strong signals at top
- - **Pure algorithm**: No dependencies, fast (<10ms)
-
- **Implementation**:
- 1. Create `lib/claude_memory/recall/rrf_fusion.rb`
- 2. Update `Recall#query_semantic_dual` to use RRF
- 3. Test with synthetic ranked lists
- 4. Validate improvements with eval suite (if we create one)
-
- **Trade-off**: Slightly more complex than naive merging, but well worth it.
-
- **Recommendation**: **ADOPT IMMEDIATELY**. Pure algorithmic improvement with proven results.
-
- ---
-
- #### 3. ⭐ Docid Short Hash System
-
- **Value**: **Better UX**, enables cross-database references without context.
-
- **QMD Implementation**:
- ```typescript
- // Generate 6-character docid from content hash
- function getDocid(hash: string): string {
-   return hash.slice(0, 6); // First 6 chars
- }
-
- // Use in output
- {
-   docid: `#${getDocid(row.hash)}`,
-   file: row.path,
-   // ...
- }
-
- // Retrieval
- qmd get "#abc123" // Works!
- qmd get "abc123"  // Also works!
- ```
-
- **Current ClaudeMemory**:
- ```ruby
- # Facts referenced by integer IDs
- claude-memory explain 42 # Which database? Which project?
- ```
-
- **Problems**:
- - Integer IDs are database-specific (global vs project)
- - Not user-friendly
- - No quick reference format
-
- **With Docids**:
- ```ruby
- # Migration v8: Add docid column
- def migrate_to_v8_safe!
-   @db.transaction do
-     @db.alter_table(:facts) do
-       add_column :docid, String, size: 8
-       add_index :docid, unique: true
-     end
-
-     # Backfill docids
-     @db[:facts].each do |fact|
-       signature = "#{fact[:id]}:#{fact[:subject_entity_id]}:#{fact[:predicate]}:#{fact[:object_literal]}"
-       hash = Digest::SHA256.hexdigest(signature)
-       docid = hash[0...8] # 8 chars for lower collision risk
-
-       # Handle collisions (rare with 8 chars)
-       while @db[:facts].where(docid: docid).count > 0
-         hash = Digest::SHA256.hexdigest(hash + rand.to_s)
-         docid = hash[0...8]
-       end
-
-       @db[:facts].where(id: fact[:id]).update(docid: docid)
-     end
-   end
- end
-
- # Usage
- claude-memory explain abc123  # Works across databases!
- claude-memory explain #abc123 # Also works!
-
- # Output formatting
- puts "Fact ##{fact[:docid]}: #{fact[:subject_name]} #{fact[:predicate]} ..."
- ```
-
- **Benefits**:
- - **Database-agnostic**: Same reference works for global/project facts
- - **User-friendly**: `#abc123` is memorable and shareable
- - **Standard pattern**: Git uses short SHAs, QMD uses short hashes
-
- **Implementation**:
- 1. Schema migration v8: Add `docid` column
- 2. Backfill existing facts
- 3. Update CLI commands to accept docids
- 4. Update MCP tools to accept docids
- 5. Update output formatting to show docids
-
- **Trade-off**:
- - Hash collisions possible (8 chars = 1 in 4.3 billion, very rare)
- - Migration backfills existing facts (one-time cost)
-
- **Recommendation**: **ADOPT IN PHASE 3**. Clear UX improvement with minimal cost.
-
- ---
-
- #### 4. ⭐ Smart Expansion Detection
-
- **Value**: **Skip unnecessary vector search** when FTS finds an exact match, saving 200-500ms per query.
-
- **QMD Implementation**:
- ```typescript
- // Check if BM25 has strong, clear top result
- const topScore = initialFts[0]?.score ?? 0;
- const secondScore = initialFts[1]?.score ?? 0;
- const hasStrongSignal =
-   initialFts.length > 0 &&
-   topScore >= 0.85 &&
-   (topScore - secondScore) >= 0.15;
-
- if (hasStrongSignal) {
-   // Skip expensive vector search and LLM operations
-   return initialFts.slice(0, limit);
- }
- ```
-
- **QMD Data**: Saves 2-3 seconds on ~60% of queries (exact keyword matches).
-
- **Current ClaudeMemory**:
- ```ruby
- # Always run both FTS and vector search
- def query_semantic_dual(text, limit:, scope:, mode:)
-   fts_results = collect_fts_results(...)
-   vec_results = query_vector_stores(...) # Always runs
-
-   RRFusion.fuse([fts_results, vec_results])
- end
- ```
-
- **With Smart Detection**:
- ```ruby
- # lib/claude_memory/recall/expansion_detector.rb
- module ClaudeMemory
-   module Recall
-     class ExpansionDetector
-       STRONG_SCORE_THRESHOLD = 0.85
-       STRONG_GAP_THRESHOLD = 0.15
-
-       def self.should_skip_expansion?(results)
-         return false if results.empty? || results.size < 2
-
-         top_score = results[0][:score] || 0
-         second_score = results[1][:score] || 0
-         gap = top_score - second_score
-
-         top_score >= STRONG_SCORE_THRESHOLD &&
-           gap >= STRONG_GAP_THRESHOLD
-       end
-     end
-   end
- end
-
- # Apply in Recall
- def query_semantic_dual(text, limit:, scope:, mode:)
-   # First try FTS
-   fts_results = collect_fts_results(text, limit: limit * 2, scope: scope)
-
-   # Check if we can skip vector search
-   if mode == :both && ExpansionDetector.should_skip_expansion?(fts_results)
-     return fts_results.first(limit) # Strong FTS signal
-   end
-
-   # Weak signal - proceed with vector search and fusion
-   vec_results = query_vector_stores(text, limit: limit * 2, scope: scope)
-   RRFusion.fuse([fts_results, vec_results], weights: [1.0, 1.0]).first(limit)
- end
- ```
-
- **Benefits**:
- - **Performance optimization**: Avoids unnecessary vector search
- - **Simple heuristic**: Well-tested thresholds from QMD
- - **Transparent**: Can log when skipping for metrics
- - **No false negatives**: Only skips when FTS is very confident
-
- **Implementation**:
- 1. Create `lib/claude_memory/recall/expansion_detector.rb`
- 2. Update `Recall#query_semantic_dual` to use detector
- 3. Test with known exact-match queries
- 4. Add optional metrics tracking
-
- **Trade-off**: May miss semantically similar results for exact matches (acceptable).
-
- **Recommendation**: **ADOPT IN PHASE 4**. Clear performance win with minimal code.
-
- ---
-
- ### Medium Priority (Valuable but Higher Cost)
-
- #### 5. Document Chunking Strategy
-
- **Value**: Better embeddings for long transcripts (>3000 chars).
-
- **QMD Approach**:
- - 800 tokens max, 15% overlap
- - Semantic boundary detection
- - Both token-based and char-based variants
-
- **Current ClaudeMemory**: Embeds entire fact text (typically short).
-
- **When Needed**: If users have very long transcripts that produce multi-paragraph facts.
-
- **Recommendation**: **CONSIDER** if we see performance issues with long content.
-
- ---
-
- #### 6. LLM Response Caching
-
- **Value**: Reduce API costs for repeated distillation.
-
- **QMD Proof**: Caches query expansion and reranking, achieves ~80% cache hit rate.
-
- **Implementation**:
- ```ruby
- # lib/claude_memory/distill/cache.rb
- class DistillerCache
-   def initialize(store)
-     @store = store
-   end
-
-   def fetch(content_hash)
-     @store.db[:llm_cache].where(hash: content_hash).first&.dig(:result)
-   end
-
-   def store(content_hash, result)
-     @store.db[:llm_cache].insert_or_replace(
-       hash: content_hash,
-       result: result.to_json,
-       created_at: Time.now.iso8601
-     )
-
-     # Probabilistic cleanup (1% chance)
-     cleanup_if_needed if rand < 0.01
-   end
-
-   private
-
-   def cleanup_if_needed
-     @store.db.transaction do
-       @store.db.run(<<~SQL)
-         DELETE FROM llm_cache
-         WHERE hash NOT IN (
-           SELECT hash FROM llm_cache
-           ORDER BY created_at DESC
-           LIMIT 1000
-         )
-       SQL
-     end
-   end
- end
- ```
-
- **Recommendation**: **ADOPT when distiller is fully implemented**. Clear cost savings.
-
- ---
-
- ### Low Priority (Interesting but Not Critical)
-
- #### 7. Enhanced Snippet Extraction
-
- **Value**: Better search result previews with query term highlighting.
-
- **QMD Approach**:
- ```typescript
- function extractSnippet(body: string, query: string, maxLen = 500) {
-   const terms = query.toLowerCase().split(/\s+/);
-
-   // Find line with most query term matches
-   const lines = body.split('\n');
-   let bestLine = 0, bestScore = -1;
-
-   for (let i = 0; i < lines.length; i++) {
-     const lineLower = lines[i].toLowerCase();
-     const score = terms.filter(t => lineLower.includes(t)).length;
-     if (score > bestScore) {
-       bestScore = score;
-       bestLine = i;
-     }
-   }
-
-   // Extract context (1 line before, 2 lines after)
-   const start = Math.max(0, bestLine - 1);
-   const end = Math.min(lines.length, bestLine + 3);
-   const snippet = lines.slice(start, end).join('\n');
-
-   return {
-     line: bestLine + 1,
-     snippet: snippet.substring(0, maxLen),
-     linesBefore: start,
-     linesAfter: lines.length - end
-   };
- }
- ```
-
- **Recommendation**: **CONSIDER for better UX** in search results.
-
- ---
-
- ### Features NOT to Adopt
-
- #### ❌ YAML Collection System
-
- **QMD Use**: Manages multi-directory indexing with per-path contexts.
-
- **Our Use**: Dual-database (global + project) already provides clean separation.
-
- **Mismatch**: Collections add complexity without clear benefit for our use case.
-
- **Recommendation**: **REJECT** - Our dual-DB approach is simpler and better suited.
-
- ---
-
- #### ❌ Content-Addressable Document Storage
-
- **QMD Use**: Deduplicates full markdown documents by SHA256 hash.
-
- **Our Use**: Facts are deduplicated by semantic signature, not content hash.
-
- **Mismatch**: We don't store full documents; we extract facts.
-
- **Recommendation**: **REJECT** - Different data model.
-
- ---
-
- #### ❌ Virtual Path System (qmd://collection/path)
-
- **QMD Use**: Unified namespace across multiple collections.
-
- **Our Use**: Dual-database provides a clear namespace (global vs project).
-
- **Mismatch**: Adds complexity for no clear benefit.
-
- **Recommendation**: **REJECT** - Unnecessary abstraction.
-
- ---
-
- #### ❌ Neural Embeddings (EmbeddingGemma)
-
- **QMD Use**: 300M parameter model for high-quality semantic search.
-
- **Our Use**: TF-IDF (lightweight, no dependencies).
-
- **Trade-off**:
- - ✅ Better quality (+40% Hit@3 over TF-IDF)
- - ❌ 300MB download
- - ❌ 300MB VRAM
- - ❌ 2s cold start latency
- - ❌ Complex dependency (node-llama-cpp or similar)
-
- **Decision**: **DEFER** - TF-IDF sufficient for now. Revisit if users report poor semantic search quality.
-
- ---
-
- #### ❌ Cross-Encoder Reranking
-
- **QMD Use**: LLM scores query-document relevance for final ranking.
-
- **Our Use**: None (just use retrieval scores).
-
- **Trade-off**:
- - ✅ Better precision (elevates semantically relevant results)
- - ❌ 640MB model
- - ❌ 400ms latency per query
- - ❌ Complex dependency
-
- **Decision**: **REJECT** - Over-engineering for fact retrieval. Facts are already structured; reranking is overkill.
-
- ---
-
- #### ❌ Query Expansion (LLM)
-
- **QMD Use**: Generates alternative query phrasings for better recall.
-
- **Our Use**: None (single query only).
-
- **Trade-off**:
- - ✅ Better recall (finds documents with different terminology)
- - ❌ 2.2GB model
- - ❌ 800ms latency per query
- - ❌ Complex dependency
-
- **Decision**: **REJECT** - We don't have an LLM in the recall path (only in distill). Adding an LLM dependency for recall is too heavy.
+ ### High Priority
+
+ #### 1. Claude Code Plugin Distribution Format ⭐ NEW
+ - **Value**: 10x easier installation (single command vs multi-step gem + MCP + hook config)
+ - **Evidence**: `.claude-plugin/marketplace.json` — complete plugin spec; `skills/qmd/SKILL.md` — skill definition with tool scoping
+ - **Implementation**: Create `.claude-plugin/marketplace.json` with `mcpServers` pointing to `claude-memory serve-mcp`, a skill definition from existing MCP tools, and `allowed-tools: mcp__claude-memory__*`
+ - **Effort**: 2-3 days (plugin metadata, skill definition, testing, documentation)
+ - **Trade-off**: Depends on Claude Code plugin ecosystem maturity; current hooks integration may still be needed
+ - **Recommendation**: **ADOPT** — QMD proves the format works. Start with a plugin skeleton and iterate as the ecosystem matures
+ - **Integration Points**: New `.claude-plugin/` directory, `skills/` directory, update installation docs
+
+ #### 2. MCP Structured Content Pattern ⭐ NEW
+ - **Value**: Better MCP response quality — dual human-readable + machine-parseable output
+ - **Evidence**: `mcp.ts:288-291` — `{ content: [{ type: "text", text: summary }], structuredContent: { results } }`
+ - **Implementation**: Update all 18 MCP tool handlers to return both `content` (text summary) and `structuredContent` (JSON). Text content would be a concise summary; structured content preserves full data.
+ - **Effort**: 1-2 days (update tool handlers, update tests)
+ - **Trade-off**: Slightly more code per tool handler; may need to verify the Claude Code MCP client supports `structuredContent`
+ - **Recommendation**: **ADOPT** — Pure improvement, no downside if the client supports it
+ - **Integration Points**: `lib/claude_memory/mcp/server.rb`, all tool handler methods
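A rough sketch of the dual-format shape in Ruby. The `build_tool_response` helper and the result fields are hypothetical, not ClaudeMemory's actual handler API; only the `content`/`structuredContent` keys come from the MCP pattern described above:

```ruby
# Hypothetical handler helper (NOT ClaudeMemory's real API): shows the
# dual-format response shape from the MCP structured-content pattern.
def build_tool_response(results)
  # Concise human/LLM-readable summary — cheap to consume
  summary = "Found #{results.size} fact(s): " +
            results.map { |r| r[:subject] }.join(", ")

  {
    content: [{ type: "text", text: summary }], # text summary
    structuredContent: { results: results }     # full machine-parseable data
  }
end

build_tool_response([{ subject: "ClaudeMemory", predicate: "uses", object: "SQLite" }])
```

Each tool handler would wrap its existing result this way; clients that ignore `structuredContent` still get a usable text answer.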
+
+ #### 3. MCP Registered Prompt for Query Guidance ⭐ NEW
+ - **Value**: Claude uses memory tools more effectively with an embedded search strategy
+ - **Evidence**: `mcp.ts:172-252` — registered prompt explaining when to use recall vs recall_semantic vs search_concepts
+ - **Implementation**: Register a `memory_guide` prompt in our MCP server explaining tool selection strategy (recall for keywords, recall_semantic for concepts, search_concepts for multi-faceted queries, explain for provenance)
+ - **Effort**: 4-6 hours (write prompt, register in server, test)
+ - **Trade-off**: Minimal; the prompt is only loaded on request
+ - **Recommendation**: **ADOPT** — Simple way to improve tool usage quality
+ - **Integration Points**: `lib/claude_memory/mcp/server.rb`
+
+ #### 4. Inline Status Check in Skills ⭐ NEW
+ - **Value**: Immediate feedback on memory system health when the skill loads
+ - **Evidence**: `SKILL.md:18` — `!` prefix runs `qmd status 2>/dev/null || echo "Not installed"`
+ - **Implementation**: Add an inline check to our skill definition: `!claude-memory doctor --brief 2>/dev/null || echo "Not configured. Run: gem install claude_memory"`
+ - **Effort**: 1-2 hours
+ - **Trade-off**: None
+ - **Recommendation**: **ADOPT** — Trivial improvement with clear benefit
+ - **Integration Points**: Skill definition file
+
+ ### Previously Identified (Carried Forward)
+
+ These items from the 2026-01-26 analysis remain relevant:
+
+ #### 5. Native Vector Storage (sqlite-vec) — STILL CRITICAL
+ - **Value**: 10-100x faster KNN queries
+ - **Status**: Not yet implemented in ClaudeMemory
+ - **Updated Evidence**: QMD now handles 10,000+ documents in production (5,700+ star project)
+ - **Recommendation**: **ADOPT IMMEDIATELY** — Foundational improvement
+
+ #### 6. Reciprocal Rank Fusion (RRF) Algorithm — STILL HIGH VALUE
+ - **Value**: 50% improvement in Hit@3 for medium-difficulty queries
+ - **Status**: Not yet implemented in ClaudeMemory
+ - **Recommendation**: **ADOPT IMMEDIATELY** — Pure algorithmic improvement
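The core of RRF is small enough to sketch inline. This illustrative version fuses plain ID lists with the standard `1 / (k + rank)` contribution and omits the per-list weights and top-rank bonus a production class might add:

```ruby
# Illustrative RRF core (not the shipped class): each list contributes
# 1 / (k + rank + 1) per item; k = 60 is the conventional constant.
def rrf_fuse(ranked_lists, k: 60)
  scores = Hash.new(0.0)
  ranked_lists.each do |list|
    list.each_with_index do |id, rank|
      scores[id] += 1.0 / (k + rank + 1)
    end
  end
  # Highest fused score first
  scores.sort_by { |_, score| -score }.map(&:first)
end

# An item ranked second in BOTH lists beats items that top only one list:
fts = [:a, :b, :c, :d]  # lexical (BM25) ranking
vec = [:e, :b, :c, :f]  # vector ranking
rrf_fuse([fts, vec]).first # => :b
```

Because only ranks matter, BM25 and cosine scores never need to be normalized against each other — that is the whole appeal of RRF for hybrid search.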
+
+ #### 7. Docid Short Hash System — STILL MEDIUM VALUE
+ - **Value**: Better UX, cross-database fact references
+ - **Status**: Not yet implemented
+ - **Recommendation**: **ADOPT IN PHASE 2**
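The derivation itself is one line; a sketch (the signature fields here are assumptions, and a real insert would still check the unique index and re-hash on collision):

```ruby
require "digest"

# Illustrative only: derive a short, database-agnostic docid from a
# fact's semantic signature. 8 hex chars give ~4.3 billion values,
# so collisions are rare but must still be handled on insert.
def docid_for(subject, predicate, object)
  Digest::SHA256.hexdigest("#{subject}:#{predicate}:#{object}")[0, 8]
end

docid_for("ClaudeMemory", "written_in", "Ruby") # stable 8-char reference
```

The same triple always yields the same docid, which is what makes references portable across the global and project databases.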
+
+ #### 8. Smart Expansion Detection — STILL MEDIUM VALUE
+ - **Value**: Skip unnecessary vector search when FTS has a strong signal
+ - **Status**: Not yet implemented
+ - **Recommendation**: **ADOPT IN PHASE 3**
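The heuristic is tiny; a sketch using the thresholds QMD reports (0.85 top score, 0.15 gap), which would need tuning against our own score distribution:

```ruby
# Sketch of the strong-signal check: skip the vector leg when the top
# FTS score is high AND clearly separated from the runner-up.
# Thresholds are QMD's reported values, not tuned for ClaudeMemory.
def skip_vector_search?(scores, top_threshold: 0.85, gap_threshold: 0.15)
  return false if scores.size < 2

  scores[0] >= top_threshold && (scores[0] - scores[1]) >= gap_threshold
end

skip_vector_search?([0.92, 0.40]) # => true  (exact keyword hit, FTS only)
skip_vector_search?([0.92, 0.85]) # => false (ambiguous, fuse both backends)
```

Requiring both a high absolute score and a clear gap is what keeps false positives low: an ambiguous top-2 still falls through to hybrid fusion.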
+
+ ### Medium Priority
+
+ #### 9. Skill Definition with Tool Scoping
+ - **Value**: Security and UX — limit tool access to memory-related commands
+ - **Evidence**: `SKILL.md:9` — `allowed-tools: Bash(qmd:*), mcp__qmd__*`
+ - **Implementation**: Define the skill with `allowed-tools: Bash(claude-memory:*), mcp__claude-memory__*`
+ - **Effort**: Included in plugin distribution work
+ - **Recommendation**: **CONSIDER** — Good practice for plugin security
+ - **Integration Points**: Skills directory
+
+ #### 10. Evaluation Harness Improvements
+ - **Value**: QMD's eval structure with difficulty levels and Hit@K metrics is cleaner
+ - **Evidence**: `test/eval-harness.ts:11-16` — typed queries with difficulty + description
+ - **Implementation**: We already have DevMemBench (more comprehensive); we could adopt difficulty classification.
+ - **Recommendation**: **CONSIDER** — Our evals are already better; could add difficulty labels
+
+ ### Low Priority
+
+ #### 11. YAML-Based Collection Configuration
+ - **Value**: User-editable config for what gets indexed
+ - **Evidence**: `collections.ts`, `example-index.yml`
+ - **Recommendation**: **REJECT** — Our dual-database provides cleaner separation
+
+ #### 12. Custom Query Expansion Model
+ - **Value**: Better search recall via ML-powered query rewriting
+ - **Evidence**: `finetune/` — complete training pipeline
+ - **Recommendation**: **REJECT** — Too heavy (1.7B model) for our fact retrieval use case. If we need expansion, we can leverage Claude's own capabilities during recall.
+
+ #### 13. LLM-Based Reranking
+ - **Value**: Better ranking precision
+ - **Recommendation**: **REJECT** — Over-engineering for structured fact retrieval
+
+ ### Features to Avoid
+
+ #### 1. Heavy Local LLM Dependencies
+ - **What It Is**: Three GGUF models totaling ~2GB for search operations
+ - **Why Avoid**: ClaudeMemory targets lightweight, instant search. A 2-3s cold start and 3GB of memory are inappropriate for a fact lookup tool.
+ - **Our Alternative**: FastEmbed (67MB ONNX, <100ms) provides adequate semantic search for structured facts.
+
+ #### 2. Content-Addressable Document Storage
+ - **What It Is**: SHA256 hash-based deduplication of full documents
+ - **Why Avoid**: We store facts, not documents. Our deduplication is by fact signature.
+ - **Our Alternative**: Existing fact signature-based deduplication.
1908
458
 
1909
459
  ---
1910
460
 
1911
461
  ## Implementation Recommendations
1912
462
 
1913
- ### Phased Adoption Strategy
463
+ ### Phase 1: Plugin Foundation (NEW)
1914
464
 
1915
- #### Phase 1: Vector Storage Foundation (IMMEDIATE)
1916
-
1917
- **Goal**: Adopt sqlite-vec and RRF fusion for performance and quality.
465
+ **Goals**: Establish ClaudeMemory as a Claude Code plugin with improved MCP output
1918
466
 
1919
467
  **Tasks**:
1920
- 1. Add sqlite-vec extension support (gem or FFI)
1921
- 2. Create schema migration v7 for `facts_vec` virtual table
1922
- 3. Backfill existing embeddings (one-time migration)
1923
- 4. Update `Embeddings::Similarity` class for native KNN
1924
- 5. Implement `Recall::RRFusion` class
1925
- 6. Update `Recall#query_semantic_dual` to use RRF
1926
- 7. Test migration on existing databases
1927
- 8. Document extension installation in README
1928
-
1929
- **Expected Impact**:
1930
- - 10-100x faster vector search
1931
- - 50% better hybrid search quality (Hit@3)
1932
- - Scales to 50,000+ facts
1933
-
1934
- **Effort**: 2-3 days
468
+ - [ ] Create `.claude-plugin/marketplace.json` with plugin metadata
469
+ - [ ] Create skill definition with tool scoping and inline health check
470
+ - [ ] Add MCP structured content pattern to all 18 tool handlers
471
+ - [ ] Register query guidance prompt in MCP server
472
+ - [ ] Test plugin installation workflow
473
+ - [ ] Update installation docs
1935
474
 
1936
- ---
475
+ **Success Criteria**:
476
+ - ClaudeMemory installable via `claude plugin add`
477
+ - MCP tools return both text summaries and structured JSON
478
+ - Query guide prompt available via MCP
1937
479
 
1938
- #### Phase 2: UX Improvements (NEAR-TERM)
1939
-
1940
- **Goal**: Adopt docid hashes and smart detection for better UX and performance.
1941
-
1942
- **Tasks**:
1943
- 1. Create schema migration v8 for `docid` column
1944
- 2. Backfill existing facts with docids
1945
- 3. Update CLI commands (`ExplainCommand`, `RecallCommand`) to accept docids
1946
- 4. Update MCP tools to accept docids
1947
- 5. Update output formatting to show docids
1948
- 6. Implement `Recall::ExpansionDetector` class
1949
- 7. Update `Recall#query_semantic_dual` to use detector
1950
- 8. Add optional metrics tracking (skip rate, avg latency)
1951
-
1952
- **Expected Impact**:
1953
- - Better UX (human-friendly fact references)
1954
- - 200-500ms latency reduction on exact matches
1955
- - Cross-database references without context
1956
-
1957
- **Effort**: 1-2 days
480
+ **Risks**: Plugin ecosystem may change; maintain backward compatibility with manual setup
1958
481
 
1959
482
  ---
1960
483
 
1961
- #### Phase 3: Caching and Optimization (FUTURE)
+ ### Phase 2: Vector Storage Upgrade (CARRIED FORWARD)
 
- **Goal**: Reduce API costs and optimize for long content.
+ **Goals**: Adopt sqlite-vec for native KNN and RRF fusion for search quality
 
  **Tasks**:
- 1. Add `llm_cache` table to schema
- 2. Implement `Distill::Cache` class
- 3. Update `Distill::Distiller` to use cache
- 4. Add probabilistic cleanup (1% chance per distill)
- 5. Evaluate document chunking for long transcripts
- 6. Implement chunking strategy if needed
+ - [ ] Add sqlite-vec extension support
+ - [ ] Schema migration for `facts_vec` virtual table (two-step query pattern)
+ - [ ] Implement `Recall::RRFusion` class
+ - [ ] Backfill existing embeddings
+ - [ ] Benchmark: target 10x KNN improvement
 
- **Expected Impact**:
- - Reduced API costs (80% cache hit rate expected)
- - Better handling of long transcripts (if needed)
-
- **Effort**: 2-3 days
-
- ---
-
- ### Testing Strategy
-
- **Unit Tests**:
- - RRFusion algorithm with synthetic ranked lists
- - ExpansionDetector with various score distributions
- - Docid generation and collision handling
- - sqlite-vec migration (up and down)
-
- **Integration Tests**:
- - End-to-end hybrid search with RRF fusion
- - Cross-database docid lookups
- - Cache hit/miss behavior
- - Smart detection skip rate
-
- **Evaluation Suite** (optional but recommended):
- - Create synthetic fact corpus with known relationships
- - Define easy/medium/hard recall queries
- - Measure Hit@K before/after RRF adoption
- - Track latency improvements from smart detection
-
- **Performance Tests**:
- - Benchmark vector search: JSON vs sqlite-vec
- - Measure RRF overhead (<10ms expected)
- - Profile smart detection accuracy
+ **Success Criteria**:
+ - Vector search uses native sqlite-vec
+ - RRF fusion active for hybrid queries
+ - DevMemBench shows improved retrieval metrics
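
The `Recall::RRFusion` class planned above is not shown in this diff, but reciprocal rank fusion itself is a small, dependency-free algorithm. A minimal sketch (module and method names here are illustrative, not the gem's actual API):

```ruby
# Reciprocal Rank Fusion (Cormack et al., 2009): merge several ranked lists
# (e.g. FTS hits and vector hits) into one ranking. Each list contributes
# 1.0 / (k + rank) per item; the constant k (conventionally 60) damps the
# influence of any single list's top ranks.
module RRFusion
  K = 60

  # ranked_lists: arrays of fact ids, best-first. Returns fused ids, best-first.
  def self.fuse(*ranked_lists, k: K)
    scores = Hash.new(0.0)
    ranked_lists.each do |list|
      list.each_with_index do |id, index|
        scores[id] += 1.0 / (k + index + 1) # ranks are 1-based
      end
    end
    scores.sort_by { |_id, score| -score }.map(&:first)
  end
end

fts_hits    = %w[f3 f1 f7] # lexical ranking
vector_hits = %w[f1 f9 f3] # semantic ranking
fused = RRFusion.fuse(fts_hits, vector_hits)
# f1 ranks highly in both lists, so it wins the fused ranking
```

Because only ranks are used, RRF needs no score normalization between the lexical and vector backends, which is what makes it attractive as a pure post-processing step.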
 
 ---
 
- ### Migration Safety
+ ### Phase 3: UX Polish (CARRIED FORWARD)
 
- **Schema Migrations**:
- - Always use transactions for atomicity
- - Provide rollback path (down migration)
- - Test on copy of production database first
- - Backup before running migrations
+ **Goals**: Docid hashes and smart expansion detection
 
- **Backfill Strategy**:
- - Run backfill in batches (1000 facts at a time)
- - Add progress reporting for long operations
- - Handle errors gracefully (skip + log)
-
- **Rollback Plan**:
- - Keep JSON embeddings column until v7 is stable
- - Provide `migrate_down_to_v6` method
- - Document rollback procedure in CHANGELOG
+ **Tasks**:
+ - [ ] Schema migration for `docid` column (8-char hash)
+ - [ ] Implement `Recall::ExpansionDetector`
+ - [ ] Update CLI and MCP tools for docid support
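
The 8-char docid in the tasks above can be derived Git-style from a stable hash of the fact's canonical triple. A sketch, assuming SHA-256 over the triple as the hash input (the real migration may choose a different input or collision policy):

```ruby
require "digest"

# Derive a short, human-friendly docid for a fact. Eight hex chars (32 bits)
# keeps references compact; a production migration should still detect
# collisions on backfill and, for example, lengthen the hash for the
# colliding fact.
def docid_for(subject, predicate, object)
  # Join with a unit separator so ("a", "bc") and ("ab", "c") hash differently.
  Digest::SHA256.hexdigest([subject, predicate, object].join("\x1f"))[0, 8]
end

id = docid_for("app", "uses_db", "postgres")
# deterministic: the same triple always yields the same docid
```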
 
 ---
 
 ## Architecture Decisions
 
- ### Preserve Our Unique Advantages
-
- **1. Fact-Based Knowledge Graph**
-
- **What**: Subject-predicate-object triples vs full document storage.
-
- **Why Keep**:
- - Enables structured queries ("What databases does X use?")
- - Supports inference (supersession, conflicts)
- - More precise than document-level retrieval
-
- **Don't Adopt**: QMD's document-centric model.
-
- ---
-
- **2. Truth Maintenance System**
+ ### What to Preserve
 
- **What**: Supersession, conflict detection, predicate policies.
+ - **Fact-Based Knowledge Graph**: Our structured triples are fundamentally different from (and better suited for knowledge extraction than) QMD's document storage
+ - **Truth Maintenance**: Supersession + conflict resolution is a core differentiator
+ - **Dual-Database Architecture**: Cleaner than YAML collections for our use case
+ - **Lightweight Dependencies**: Ruby gem + ONNX embeddings vs 2GB+ GGUF models
 
- **Why Keep**:
- - Resolves contradictions automatically
- - Distinguishes single-value vs multi-value predicates
- - Provides evidence chain via provenance
+ ### What to Adopt (NEW)
 
- **Don't Adopt**: QMD's "all documents valid" model.
-
- ---
+ - **Plugin Distribution Format**: `.claude-plugin/marketplace.json` + skills for frictionless installation
+ - **Structured MCP Content**: Dual `content`/`structuredContent` responses for all tools
+ - **MCP Query Guide Prompt**: Registered prompt teaching Claude how to use memory tools effectively
+ - **Inline Status Checks**: Skill-level health verification on load
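
The dual `content`/`structuredContent` pattern above is simple to picture: every tool result carries a prose summary for display plus machine-readable data. The field names follow the MCP spec; the helper below is an illustrative sketch, not ClaudeMemory's actual code:

```ruby
# Build an MCP tool result carrying both a human-readable text block and
# structured JSON data, so Claude can quote the summary or consume the fields.
def tool_result(summary, data)
  {
    "content" => [{ "type" => "text", "text" => summary }],
    "structuredContent" => data
  }
end

result = tool_result(
  "Found 1 fact matching 'database'",
  { "facts" => [{ "docid" => "abc123de", "subject" => "app",
                  "predicate" => "uses_db", "object" => "postgres" }] }
)
```

Text-only clients can ignore `structuredContent`, so the pattern is backward compatible with plain-text tool responses.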
 
- **3. Dual-Database Architecture**
+ ### What to Adopt (CARRIED FORWARD)
 
- **What**: Separate global.sqlite3 and project.sqlite3.
+ - **sqlite-vec Native Vectors**: 10-100x faster KNN (critical)
+ - **RRF Fusion**: 50% search quality improvement (critical)
+ - **Docid Short Hashes**: Better UX for fact references
+ - **Smart Expansion Detection**: Skip vector search when FTS is confident
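
Smart expansion detection reduces to a confidence heuristic on the lexical pass: skip the vector search only when the top FTS hit is both strong in absolute terms and clearly separated from the runner-up. A sketch with assumed normalized scores and illustrative thresholds:

```ruby
# Skip semantic expansion when full-text search is already confident.
# Conservative by design: ambiguous or empty lexical results always fall
# through to the vector search, so recall is never sacrificed.
class ExpansionDetector
  def initialize(min_top_score: 0.8, min_margin: 0.3)
    @min_top_score = min_top_score # absolute confidence floor
    @min_margin = min_margin       # required gap to the runner-up
  end

  # scores: FTS relevance scores normalized to 0..1, best-first
  def skip_vector_search?(scores)
    return false if scores.empty?
    top = scores.first
    runner_up = scores[1] || 0.0
    top >= @min_top_score && (top - runner_up) >= @min_margin
  end
end

detector = ExpansionDetector.new
```

Thresholds like these would need tuning against real score distributions, which is why the earlier plan paired the detector with skip-rate metrics.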
 
- **Why Keep**:
- - Clean separation of concerns
- - Better than YAML collections for our use case
- - Simpler queries (no project_path filtering)
+ ### What to Reject
 
- **Don't Adopt**: QMD's YAML collection system.
+ - **Local LLM Models for Search**: Too heavy (2GB+, 3s cold start)
+ - **Custom Fine-Tuned Models**: Training pipeline is impressive but overkill for fact retrieval
+ - **YAML Collection System**: Our dual-DB is better for our use case
+ - **Content-Addressable Storage**: Different data model
+ - **Virtual Path System**: Unnecessary for fact-based storage
 
 ---
 
- **4. Lightweight Dependencies**
+ ## Key Takeaways
 
- **What**: Ruby stdlib, SQLite, minimal gems.
+ ### Main Learnings
 
- **Why Keep**:
- - Fast installation (<5MB)
- - No heavy models required
- - Works offline for core features
+ 1. **Plugin distribution is the future**: QMD's marketplace plugin reduces installation from "read docs, install gem, configure MCP, set up hooks, restart Claude" to one command. This is the single most impactful UX improvement we should adopt.
 
- **Selectively Adopt**:
- - ✅ sqlite-vec (small, well-maintained)
- - ❌ Neural embeddings (300MB, complex)
- - ❌ LLM reranking (640MB, complex)
+ 2. **Structured MCP responses matter**: Returning both text summary and structured JSON is a simple pattern that significantly improves how Claude consumes tool output.
 
- ---
+ 3. **Fine-tuned models for specific tasks work**: QMD's two-stage SFT→GRPO pipeline for query expansion is state-of-the-art. While we shouldn't adopt the models themselves (too heavy), the reward function design and structured output routing are good reference patterns.
 
- ### Adopt Their Innovations
+ 4. **Eval methodology with difficulty levels**: QMD's easy/medium/hard query classification provides clearer signal about where improvements matter. Our DevMemBench is more comprehensive but could benefit from this labeling.
 
- **1. Native Vector Storage (sqlite-vec)**
+ 5. **The previous QMD analysis recommendations remain valid**: sqlite-vec, RRF, docids, and smart expansion are still unimplemented and still valuable.
 
- **Why Adopt**:
- - Industry standard (used by Chroma, LanceDB, etc.)
- - 10-100x performance improvement
- - Enables larger databases
- - Well-maintained, cross-platform
+ ### Recommended Adoption Order
 
- **Implementation**: Phase 1 (immediate).
+ 1. **First**: Plugin distribution format — highest UX impact, unblocks ecosystem adoption
+ 2. **Second**: MCP structured content + query guide prompt — low effort, immediate quality gain
+ 3. **Third**: sqlite-vec + RRF fusion — foundational performance and quality
+ 4. **Fourth**: Docids + smart expansion — polish and optimization
 
- ---
+ ### Expected Impact
 
- **2. RRF Fusion Algorithm**
+ - **Installation**: 10x easier (single command vs multi-step)
+ - **MCP Quality**: Better Claude tool usage with structured responses + query guidance
+ - **Search Performance**: 10-100x faster KNN (sqlite-vec), 50% better Hit@3 (RRF)
+ - **UX**: Human-friendly fact references (#abc123de), smarter search skipping
 
- **Why Adopt**:
- - Mathematically sound
- - Proven results (50% improvement)
- - Pure algorithm (no dependencies)
- - Fast (<10ms overhead)
+ ### Next Actions
 
- **Implementation**: Phase 1 (immediate).
+ - [ ] Review plugin distribution feasibility (check Claude Code plugin spec)
+ - [ ] Implement MCP structured content pattern (quick win)
+ - [ ] Register query guide MCP prompt (quick win)
+ - [ ] Continue with sqlite-vec + RRF adoption plan from previous analysis
+ - [ ] Store analysis findings in memory
 
 ---
 
- **3. Docid Short Hashes**
-
- **Why Adopt**:
- - Standard pattern (Git, QMD, etc.)
- - Better UX for CLI tools
- - Cross-database references
-
- **Implementation**: Phase 2 (near-term).
-
- ---
-
- **4. Smart Expansion Detection**
-
- **Why Adopt**:
- - Clear performance win
- - Simple heuristic
- - No downsides (only skips when confident)
-
- **Implementation**: Phase 2 (near-term).
-
- ---
-
- ### Reject Due to Cost/Benefit
-
- **1. Neural Embeddings**
-
- **Cost**: 300MB download, 2s latency, complex dependency.
-
- **Benefit**: Better semantic search quality.
-
- **Decision**: DEFER - TF-IDF sufficient for now.
-
- ---
-
- **2. LLM Reranking**
-
- **Cost**: 640MB model, 400ms latency per query.
-
- **Benefit**: Better ranking precision.
-
- **Decision**: REJECT - Over-engineering for structured facts.
-
- ---
-
- **3. Query Expansion**
-
- **Cost**: 2.2GB model, 800ms latency per query.
-
- **Benefit**: Better recall with alternative phrasings.
-
- **Decision**: REJECT - No LLM in recall path, too heavy.
-
- ---
-
- ## Conclusion
-
- QMD demonstrates **state-of-the-art hybrid search** with impressive quality improvements (50%+ over BM25). However, it achieves this through heavy dependencies (3GB+ models) that may not be appropriate for all use cases.
-
- **Key Takeaways**:
-
- 1. **sqlite-vec is essential**: Native vector storage is 10-100x faster. This is a must-adopt.
-
- 2. **RRF fusion is proven**: 50% quality improvement with zero dependencies. This is a must-adopt.
-
- 3. **Smart optimizations matter**: Expansion detection saves 200-500ms on 60% of queries. This is worth adopting.
-
- 4. **Neural models are costly**: 3GB+ models provide better quality but at significant cost. Defer for now.
-
- 5. **Architecture matters**: QMD's document model differs from our fact model. Adopt algorithms, not architecture.
-
- **Recommended Adoption Order**:
-
- 1. **Immediate**: sqlite-vec + RRF fusion (performance foundation)
- 2. **Near-term**: Docids + smart detection (UX + optimization)
- 3. **Future**: LLM caching + chunking (cost reduction)
- 4. **Defer**: Neural embeddings (wait for user feedback)
- 5. **Reject**: LLM reranking + query expansion (over-engineering)
+ ## References
 
- By selectively adopting QMD's innovations while preserving our unique advantages, we can significantly improve ClaudeMemory's search quality and performance without sacrificing simplicity.
+ - **Repository**: https://github.com/tobi/qmd
+ - **Previous Analysis**: docs/influence/qmd.md (2026-01-26)
+ - **Claude Code Plugins**: https://code.claude.com/docs/en/plugins.md
+ - **MCP Spec**: https://modelcontextprotocol.io
+ - **sqlite-vec**: https://github.com/asg017/sqlite-vec
+ - **RRF Paper**: Cormack et al., "Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods" (2009)
 
 ---
 
- *End of QMD Analysis*
+ *Analysis completed: 2026-02-02*
+ *Analyst: Claude Code*
+ *Review Status: Draft — Updated from 2026-01-26 analysis with new findings on plugin distribution, fine-tuned models, and MCP patterns*