claude_memory 0.4.0 → 0.5.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.claude/CLAUDE.md +1 -1
- data/.claude/rules/claude_memory.generated.md +14 -1
- data/.claude/skills/check-memory/SKILL.md +10 -0
- data/.claude/skills/improve/SKILL.md +12 -1
- data/.claude-plugin/plugin.json +1 -1
- data/CHANGELOG.md +70 -0
- data/db/migrations/008_add_provenance_line_range.rb +21 -0
- data/db/migrations/009_add_docid.rb +39 -0
- data/db/migrations/010_add_llm_cache.rb +30 -0
- data/docs/improvements.md +72 -1084
- data/docs/influence/claude-supermemory.md +498 -0
- data/docs/influence/qmd.md +424 -2022
- data/docs/quality_review.md +64 -705
- data/lib/claude_memory/commands/doctor_command.rb +45 -4
- data/lib/claude_memory/commands/explain_command.rb +11 -6
- data/lib/claude_memory/commands/stats_command.rb +1 -1
- data/lib/claude_memory/core/fact_graph.rb +122 -0
- data/lib/claude_memory/core/fact_query_builder.rb +34 -14
- data/lib/claude_memory/core/fact_ranker.rb +3 -20
- data/lib/claude_memory/core/relative_time.rb +45 -0
- data/lib/claude_memory/core/result_sorter.rb +2 -2
- data/lib/claude_memory/core/rr_fusion.rb +57 -0
- data/lib/claude_memory/core/snippet_extractor.rb +97 -0
- data/lib/claude_memory/domain/fact.rb +3 -1
- data/lib/claude_memory/index/index_query.rb +2 -0
- data/lib/claude_memory/index/lexical_fts.rb +18 -0
- data/lib/claude_memory/infrastructure/operation_tracker.rb +7 -21
- data/lib/claude_memory/infrastructure/schema_validator.rb +30 -25
- data/lib/claude_memory/ingest/content_sanitizer.rb +8 -1
- data/lib/claude_memory/ingest/ingester.rb +67 -56
- data/lib/claude_memory/ingest/tool_extractor.rb +1 -1
- data/lib/claude_memory/ingest/tool_filter.rb +55 -0
- data/lib/claude_memory/logging/logger.rb +112 -0
- data/lib/claude_memory/mcp/query_guide.rb +96 -0
- data/lib/claude_memory/mcp/response_formatter.rb +86 -23
- data/lib/claude_memory/mcp/server.rb +34 -4
- data/lib/claude_memory/mcp/text_summary.rb +257 -0
- data/lib/claude_memory/mcp/tool_definitions.rb +20 -4
- data/lib/claude_memory/mcp/tools.rb +133 -120
- data/lib/claude_memory/publish.rb +12 -2
- data/lib/claude_memory/recall/expansion_detector.rb +44 -0
- data/lib/claude_memory/recall.rb +93 -41
- data/lib/claude_memory/resolve/resolver.rb +72 -40
- data/lib/claude_memory/store/sqlite_store.rb +99 -24
- data/lib/claude_memory/sweep/sweeper.rb +6 -0
- data/lib/claude_memory/version.rb +1 -1
- data/lib/claude_memory.rb +21 -0
- metadata +14 -2
- data/docs/remaining_improvements.md +0 -330
data/docs/influence/qmd.md
CHANGED
@@ -1,24 +1,9 @@
-# QMD Analysis: Quick Markdown Search
+# QMD Analysis: Quick Markdown Search (Updated)
 
-*Analysis Date: 2026-
-*
+*Analysis Date: 2026-02-02*
+*Previous Analysis: 2026-01-26*
 *Repository: https://github.com/tobi/qmd*
-
----
-
-## Table of Contents
-
-1. [Executive Summary](#executive-summary)
-2. [Architecture Overview](#architecture-overview)
-3. [Database Schema Analysis](#database-schema-analysis)
-4. [Search Pipeline Deep-Dive](#search-pipeline-deep-dive)
-5. [Vector Search Implementation](#vector-search-implementation)
-6. [LLM Infrastructure](#llm-infrastructure)
-7. [Performance Characteristics](#performance-characteristics)
-8. [Comparative Analysis](#comparative-analysis)
-9. [Adoption Opportunities](#adoption-opportunities)
-10. [Implementation Recommendations](#implementation-recommendations)
-11. [Architecture Decisions](#architecture-decisions)
+*Version/Commit: 63028fd (latest main)*
 
 ---
 
@@ -26,2170 +11,587 @@
 
 ### Project Purpose
 
-QMD (Quick Markdown Search) is an **on-device
-
-**Target Users**: Developers, researchers, knowledge workers using markdown for notes, documentation, and personal knowledge management.
+QMD (Quick Markdown Search) is an **on-device search engine** for markdown knowledge bases, notes, meeting transcripts, and documentation. It combines BM25 full-text search, vector semantic search, and LLM re-ranking — all running locally via node-llama-cpp with GGUF models.
 
 ### Key Innovation
 
-QMD's
+QMD's standout innovations since last analysis:
 
-
-
-
-      ? { retrieval: 0.75, reranker: 0.25 }
-      : rank <= 10
-        ? { retrieval: 0.60, reranker: 0.40 }
-        : { retrieval: 0.40, reranker: 0.60 };
-```
+1. **Custom fine-tuned query expansion model** (`qmd-query-expansion-1.7B`): A Qwen3-1.7B model trained with SFT + GRPO (reinforcement learning) specifically for structured search query expansion. Produces typed outputs (`lex:`, `vec:`, `hyde:`) that route to different search backends.
+
+2. **Claude Code plugin ecosystem**: QMD ships as a Claude Code marketplace plugin (`.claude-plugin/marketplace.json`) with skills, MCP server integration, and inline status checks.
 
-
+3. **Session-scoped LLM management** (`ILLMSession`): Structured lifecycle for LLM resources with abort signals, timeout management, and clean disposal.
 
 ### Technology Stack
 
-- **Runtime**: Bun (
-- **Database**: SQLite with sqlite-vec extension
-- **
-- **
-- **
-- **
+- **Runtime**: Bun >= 1.0.0 (TypeScript)
+- **Database**: SQLite with sqlite-vec extension (cosine distance)
+- **Full-Text Search**: SQLite FTS5 with Porter tokenization
+- **Embeddings**: EmbeddingGemma-300M (GGUF, ~300MB)
+- **Reranking**: Qwen3-Reranker-0.6B (GGUF, ~640MB)
+- **Query Expansion**: qmd-query-expansion-1.7B (custom fine-tuned, ~1.1GB)
+- **MCP**: @modelcontextprotocol/sdk with stdio transport
+- **Validation**: Zod v4 for MCP tool input schemas
+- **Config**: YAML-based collection management (`~/.config/qmd/index.yml`)
 
 ### Production Readiness
 
-- **
-- **
-- **
-- **
-
-### Evaluation Results
-
-From `eval.test.ts` (24 queries across 4 difficulty levels):
-
-| Query Type | BM25 Hit@3 | Vector Hit@3 | Hybrid Hit@3 | Improvement |
-|------------|------------|--------------|--------------|-------------|
-| Easy (exact keywords) | ≥80% | ≥60% | ≥80% | BM25 sufficient |
-| Medium (semantic) | ≥15% | ≥40% | ≥50% | **+233%** over BM25 |
-| Hard (vague) | ≥15% @ H@5 | ≥30% @ H@5 | ≥35% @ H@5 | **+133%** over BM25 |
-| Fusion (multi-signal) | ~15% | ~30% | ≥50% | **+233%** over BM25 |
-| **Overall** | ≥40% | ≥50% | ≥60% | **+50%** over BM25 |
-
-Key insight: **Hybrid RRF fusion outperforms both methods alone**, especially on queries requiring both lexical precision and semantic understanding.
+- **Maturity**: Beta, actively developed, 5,700+ GitHub stars
+- **Test Coverage**: Unit tests (store.test.ts, mcp.test.ts), eval harness (18 queries across 3 difficulty levels)
+- **Documentation**: Comprehensive README, CLAUDE.md, inline code docs
+- **Community**: 257 forks, 29 issues, 17 PRs, active maintainer (Tobi Lütke)
+- **Plugin Distribution**: Available via Claude Code marketplace
 
 ---
 
 ## Architecture Overview
 
-### Data Model
-
-| Aspect | QMD | ClaudeMemory |
-|--------|-----|--------------|
-| **Granularity** | Full markdown documents | Structured facts (triples) |
-| **Storage** | Content-addressable (SHA256 hash) | Entity-predicate-object |
-| **Deduplication** | Per-document (by content hash) | Per-fact (by signature) |
-| **Retrieval Goal** | Find relevant documents | Find specific facts |
-| **Truth Model** | All documents valid | Supersession + conflicts |
-| **Scope** | YAML collections | Dual-database (global/project) |
-
-**Philosophical Difference**:
-- **QMD**: "Show me documents about X" (conversation recall)
-- **ClaudeMemory**: "What do we know about X?" (knowledge extraction)
-
-### Storage Strategy
+### Data Model
 
-QMD uses
+QMD uses content-addressable storage with a virtual filesystem layer:
 
 ```
-content table (SHA256 hash → document body)
+content table (SHA256 hash → document body, deduplication)
 ↓
-documents table (collection, path, title → hash)
+documents table (collection, path, title → hash, soft-delete via active flag)
 ↓
-
-
-
-
--
-
--
-
-Trade-offs:
-- More complex than direct file storage
-- Hash collisions possible (mitigated by SHA256)
-
-### Collection System
-
-QMD uses YAML configuration for multi-collection indexing:
-
-```yaml
-# ~/.config/qmd/index.yml
-global_context: "Personal knowledge base for software development"
-
-collections:
-  notes:
-    path: /Users/name/notes
-    pattern: "**/*.md"
-    context:
-      /: "General notes"
-      /work: "Work-related notes and documentation"
-      /personal: "Personal projects and ideas"
-
-  docs:
-    path: /Users/name/Documents
-    pattern: "**/*.md"
-```
-
-**Context Inheritance**: File at `/work/projects/api.md` inherits:
-1. Global context
-2. `/` context (general notes)
-3. `/work` context (work-related)
-
-This provides semantic metadata for LLM operations without storing it per-document.
-
-### Lifecycle Diagram
-
-```
-┌─────────────┐
-│ Index Files │ (qmd index <collection>)
-└──────┬──────┘
-       │
-       ↓
-┌─────────────────────────────────────────────────────────┐
-│ 1. Hash content (SHA256)                                 │
-│ 2. INSERT OR IGNORE into content table                   │
-│ 3. INSERT/UPDATE documents table (collection, path → hash)│
-│ 4. FTS5 trigger auto-indexes title + body                │
-└──────┬──────────────────────────────────────────────────┘
-       │
-       ↓
-┌──────────────┐
-│ Embed        │ (qmd embed <collection>)
-└──────┬───────┘
-       │
-       ↓
-┌─────────────────────────────────────────────────────────┐
-│ 1. Chunk document (800 tokens, 15% overlap)              │
-│ 2. Generate embeddings (EmbeddingGemma 384-dim)          │
-│ 3. INSERT into content_vectors + vectors_vec             │
-└──────┬──────────────────────────────────────────────────┘
-       │
-       ↓
-┌──────────────┐
-│ Search       │ (qmd query "concept")
-└──────┬───────┘
-       │
-       ↓
-┌─────────────────────────────────────────────────────────┐
-│ Mode: search  → BM25 only (fast)                         │
-│ Mode: vsearch → Vector only (semantic)                   │
-│ Mode: query   → Hybrid pipeline (BM25 + vec + rerank)    │
-└──────┬──────────────────────────────────────────────────┘
-       │
-       ↓
-┌──────────────┐
-│ Retrieve     │ (qmd get <path | #docid>)
-└──────────────┘
+documents_fts (FTS5 full-text index, auto-synced via triggers)
+↓
+content_vectors (chunk metadata: hash, seq, pos, model)
+↓
+vectors_vec (sqlite-vec native KNN index, cosine distance)
+↓
+llm_cache (hash-keyed deterministic response cache)
 ```
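The write path this layout implies is worth spelling out: hash the body, dedupe the blob, then point the document row at it. A minimal sketch, assuming Bun's `bun:sqlite` and the `content`/`documents` schema quoted from the previous revision below (helper name hypothetical, not QMD's actual function):

```typescript
import { Database } from "bun:sqlite";
import { createHash } from "node:crypto";

// Hypothetical sketch: INSERT OR IGNORE dedupes identical bodies by SHA256;
// the document row is then upserted to point at the content hash.
function indexDocument(db: Database, collection: string, path: string,
                       title: string, body: string): void {
  const hash = createHash("sha256").update(body).digest("hex");
  const now = new Date().toISOString();
  db.prepare(
    `INSERT OR IGNORE INTO content (hash, doc, created_at) VALUES (?, ?, ?)`
  ).run(hash, body, now);
  db.prepare(
    `INSERT INTO documents (collection, path, title, hash, created_at, modified_at, active)
     VALUES (?, ?, ?, ?, ?, ?, 1)
     ON CONFLICT(collection, path) DO UPDATE
       SET hash = excluded.hash, modified_at = excluded.modified_at`
  ).run(collection, path, title, hash, now, now);
}
```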
 
-
+### Key Design Patterns
 
-
+1. **Content-Addressable Storage**: `content` table deduplicates by SHA256 hash — multiple documents with identical content share one row (`store.ts:440-450`)
 
-
+2. **Two-Step Vector Query**: JOINs with sqlite-vec virtual tables hang indefinitely. QMD enforces separate queries for vec lookup and metadata join (`store.ts:1912-1915`):
+```typescript
+// Step 1: KNN from vec table
+const vecResults = db.prepare(
+  `SELECT hash_seq, distance FROM vectors_vec WHERE embedding MATCH ? AND k = ?`
+).all(embedding, limit * 3);
+// Step 2: Join with documents separately
+```
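The snippet elides step 2. A minimal sketch of that follow-up join, mirroring the two-step code quoted later in this diff from the previous revision (helper name hypothetical):

```typescript
// Hypothetical completion of step 2: resolve KNN hits to document rows in a
// separate statement, never JOINing against the vec0 virtual table itself.
function resolveVecHits(
  db: import("bun:sqlite").Database,
  vecResults: { hash_seq: string; distance: number }[]
) {
  const hashSeqs = vecResults.map(r => r.hash_seq);
  const placeholders = hashSeqs.map(() => "?").join(", ");
  return db.prepare(
    `SELECT cv.hash, cv.seq, d.collection, d.path, d.title
     FROM content_vectors cv
     JOIN documents d ON d.hash = cv.hash
     WHERE cv.hash || '_' || cv.seq IN (${placeholders}) AND d.active = 1`
  ).all(...hashSeqs);
}
```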
 
-
+3. **YAML-Based Collection Config**: Collections migrated from SQLite foreign keys to `~/.config/qmd/index.yml` for easier user management. Schema migration in `migrate-schema.ts` handled the transition.
 
-
-CREATE TABLE content (
-  hash TEXT PRIMARY KEY,    -- SHA256 of document body
-  doc TEXT NOT NULL,        -- Full markdown content
-  created_at TEXT NOT NULL  -- ISO timestamp
-);
-```
+4. **Hierarchical Context System**: Context descriptions inherit along path hierarchy — a file at `/work/projects/api.md` gets global context + `/` context + `/work` context concatenated (`collections.ts:94-113`)
 
-**
+5. **Probabilistic Cache Cleanup**: 1% chance per query to prune LLM cache to latest 1000 entries (`store.ts:804-807`)
 
-**
+6. **Lazy Model Singleton**: LLM models lazy-load on first use, keep in memory, and unload contexts after 2-minute idle (`llm.ts:920-951`)
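Pattern 5's pruning logic is quoted in full in the removed revision further down this diff; restated here for reference:

```typescript
// On ~1% of queries, delete every llm_cache row outside the 1000 most recent.
if (Math.random() < 0.01) {
  db.run(`
    DELETE FROM llm_cache
    WHERE hash NOT IN (
      SELECT hash FROM llm_cache
      ORDER BY created_at DESC
      LIMIT 1000
    )
  `);
}
```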
 
-
+### Module Organization
 
-```sql
-CREATE TABLE documents (
-  id INTEGER PRIMARY KEY,
-  collection TEXT NOT NULL,  -- Collection name (from YAML)
-  path TEXT NOT NULL,        -- Relative path within collection
-  title TEXT NOT NULL,       -- Extracted from first H1/H2
-  hash TEXT NOT NULL,        -- Foreign key to content.hash
-  created_at TEXT NOT NULL,
-  modified_at TEXT NOT NULL,
-  active INTEGER DEFAULT 1,  -- Soft delete flag
-  UNIQUE(collection, path)
-);
 ```
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+qmd/
+├── src/
+│   ├── qmd.ts           # CLI entry point (~750 lines, lazy-loaded store)
+│   ├── store.ts         # Core store: schema, search, indexing (~2400 lines)
+│   ├── mcp.ts           # MCP server: 6 tools + resource + prompt (~626 lines)
+│   ├── llm.ts           # LLM abstraction: embed, rerank, expand (~1208 lines)
+│   ├── collections.ts   # YAML config management (~390 lines)
+│   ├── store.test.ts    # Comprehensive store unit tests
+│   └── mcp.test.ts      # MCP integration tests
+├── finetune/            # Query expansion model training pipeline
+│   ├── reward.py        # Multi-dimensional reward function (5 dimensions, 120 pts)
+│   ├── train.py         # Unified SFT + GRPO training
+│   ├── eval.py          # Model evaluation with scoring
+│   └── jobs/            # HuggingFace Jobs wrappers
+├── test/
+│   └── eval-harness.ts  # Search quality evaluation (18 queries)
+├── skills/qmd/          # Claude Code plugin skill definition
+└── .claude-plugin/      # Marketplace distribution metadata
 ```
 
-
+### Comparison with ClaudeMemory
 
-
+| Aspect | QMD | ClaudeMemory | Notes |
+|--------|-----|--------------|-------|
+| **Data Model** | Full markdown documents | Structured fact triples | Different paradigms: recall vs extraction |
+| **Storage** | SQLite + sqlite-vec (native vectors) | SQLite + JSON embeddings | QMD has 10-100x faster KNN |
+| **Search** | BM25 + Vector + RRF + Reranking | BM25 + Vector (hybrid) | QMD adds reranking + query expansion |
+| **MCP** | 6 tools + resource + prompt | 18 tools | ClaudeMemory has richer tool surface |
+| **Distribution** | Bun global install + plugin | Ruby gem + MCP + hooks | QMD has smoother install via plugin |
+| **LLM Dependency** | 3 local GGUF models (~2GB total) | None (local ONNX only) | ClaudeMemory is dramatically lighter |
+| **Query Expansion** | Custom fine-tuned model (1.7B) | None | QMD has ML-powered query improvement |
+| **Truth Maintenance** | None (all docs valid) | Supersession + conflicts | ClaudeMemory handles contradictions |
+| **Scope System** | YAML collections | Dual-database (global/project) | Both approaches valid for their use case |
+| **Testing** | Unit + eval harness | Unit + evals + benchmarks (DevMemBench) | ClaudeMemory has more comprehensive benchmarks |
 
-
+---
 
-
-CREATE TABLE content_vectors (
-  hash TEXT NOT NULL,        -- Foreign key to content.hash
-  seq INTEGER NOT NULL,      -- Chunk sequence number
-  pos INTEGER NOT NULL,      -- Character position in document
-  model TEXT NOT NULL,       -- Embedding model name
-  embedded_at TEXT NOT NULL, -- ISO timestamp
-  PRIMARY KEY (hash, seq)
-);
-```
+## Key Components Deep-Dive
 
-
+### Component 1: Fine-Tuned Query Expansion
 
-**
+**Purpose**: Generate structured query variations (lex/vec/hyde) to improve search recall by routing different query types to appropriate backends.
 
-
+**Location**: `finetune/`, `src/llm.ts:637-679`
 
-
-CREATE VIRTUAL TABLE vectors_vec USING vec0(
-  hash_seq TEXT PRIMARY KEY,  -- "hash_seq" composite key
-  embedding float[384]        -- 384-dimensional vector (EmbeddingGemma)
-  distance_metric=cosine
-);
-```
+**Implementation** (from `finetune/README.md`):
 
-
-```typescript
-// IMPORTANT: We use a two-step query approach here because sqlite-vec virtual tables
-// hang indefinitely when combined with JOINs in the same query. Do NOT try to
-// "optimize" this by combining into a single query with JOINs - it will break.
-// See: https://github.com/tobi/qmd/pull/23
-
-// CORRECT: Two-step pattern
-const vecResults = db.prepare(`
-  SELECT hash_seq, distance
-  FROM vectors_vec
-  WHERE embedding MATCH ? AND k = ?
-`).all(embedding, limit * 3);
-
-// Then join with documents table separately
-const hashSeqs = vecResults.map(r => r.hash_seq);
-const docs = db.prepare(`
-  SELECT * FROM documents WHERE hash IN (${placeholders})
-`).all(hashSeqs);
-```
+The custom model `qmd-query-expansion-1.7B` is trained in two stages:
 
-**
+1. **SFT (Supervised Fine-Tuning)**: Teaches format compliance
+   - Base model: Qwen3-1.7B
+   - LoRA rank 16, alpha 32 (all projection layers)
+   - ~2,290 training examples, 5 epochs
+   - Loss: train 0.472, val 0.304
 
-
+2. **GRPO (Group Relative Policy Optimization)**: Refines quality
+   - LoRA rank 4, alpha 8 (q_proj, v_proj only)
+   - KL beta 0.04 (prevents drift from SFT)
+   - 200 steps, mean reward 0.757
 
-
-
-
-
-
-)
-
+**Reward Function** (from `finetune/reward.py`):
+5 dimensions totaling 120 points (140 with hyde):
+- Format (0-30): Valid lex/vec/hyde lines
+- Diversity (0-30): Multiple types, no echoing query
+- HyDE (0-20): Presence, length, quality
+- Quality (0-20): Lex < vec length, preserved terms
+- Entity (±45 to +20): Named entity preservation
+- Think penalty: No `<think>` blocks (uses `/no_think` directive)
 
-**
-```typescript
-function getCacheKey(operation: string, params: Record<string, any>): string {
-  const canonical = JSON.stringify({ operation, ...params });
-  return sha256(canonical);
-}
-
-// Examples:
-// expandQuery: hash("expandQuery" + model + query)
-// rerank: hash("rerank" + model + query + file)
+**Output Format**:
 ```
-
-
-
-
-if (Math.random() < 0.01) {
-  db.run(`
-    DELETE FROM llm_cache
-    WHERE hash NOT IN (
-      SELECT hash FROM llm_cache
-      ORDER BY created_at DESC
-      LIMIT 1000
-    )
-  `);
-}
+lex: authentication configuration
+lex: auth settings setup
+vec: how to configure authentication settings
+hyde: Authentication can be configured by setting the AUTH_SECRET environment variable.
 ```
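A minimal sketch of parsing this typed output into routable queries, adapted from the output-parsing snippet quoted later in this diff from the previous revision (`Queryable` as used by QMD):

```typescript
type QueryType = "lex" | "vec" | "hyde";
interface Queryable { type: QueryType; text: string }

// Split model output into typed, routable queries: `lex:` lines go to
// FTS5/BM25, `vec:` and `hyde:` lines go to embedding search.
function parseExpansion(result: string): Queryable[] {
  return result.trim().split("\n").map(line => {
    const colonIdx = line.indexOf(":");
    const type = line.slice(0, colonIdx).trim() as QueryType;
    const text = line.slice(colonIdx + 1).trim();
    return { type, text };
  }).filter(q => q.type === "lex" || q.type === "vec" || q.type === "hyde");
}
```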
 
-**
--
--
--
-
-### Foreign Key Relationships
-
-```
-content.hash ← documents.hash ← content_vectors.hash
-        ↓
-documents_fts (via trigger)
-        ↓
-vectors_vec.hash_seq (composite key)
-```
+**Design Decisions**:
+- Structured output types (`lex:`, `vec:`, `hyde:`) route to different backends instead of generic rewrites
+- `/no_think` Qwen3 directive suppresses chain-of-thought for direct output
+- Grammar-constrained generation ensures format compliance at inference time
+- Per-query caching avoids redundant expansion (80% hit rate)
 
-**
-- Soft delete: `documents.active = 0` (preserves content)
-- Hard delete: Manual cleanup of orphaned content/vectors
+**Relevance to ClaudeMemory**: The structured lex/vec/hyde output pattern is interesting — if we ever add query expansion to our recall pipeline, this type-routed approach is more sophisticated than simple query rewriting. The reward function design (multi-dimensional scoring with entity preservation) is also a good reference for evaluating any future distiller quality.
 
 ---
 
-
-
-QMD provides three search modes with increasing sophistication:
-
-### Mode 1: `search` (BM25 Only)
-
-**Use Case**: Fast keyword matching when you know exact terms.
-
-**Pipeline**:
-```typescript
-searchFTS(db, query, limit) {
-  // 1. Sanitize and build FTS5 query
-  const terms = query.split(/\s+/)
-    .map(t => sanitize(t))
-    .filter(t => t.length > 0);
-
-  const ftsQuery = terms.map(t => `"${t}"*`).join(' AND ');
-
-  // 2. Query FTS5 with BM25 scoring
-  const results = db.prepare(`
-    SELECT
-      d.path,
-      d.title,
-      bm25(documents_fts, 10.0, 1.0) as score
-    FROM documents_fts f
-    JOIN documents d ON d.id = f.rowid
-    WHERE documents_fts MATCH ? AND d.active = 1
-    ORDER BY score ASC  -- Lower is better for BM25
-    LIMIT ?
-  `).all(ftsQuery, limit);
-
-  // 3. Convert BM25 (lower=better) to similarity (higher=better)
-  return results.map(r => ({
-    ...r,
-    score: 1 / (1 + Math.max(0, r.score))
-  }));
-}
-```
-
-**Latency**: <50ms
-
-**Strengths**: Fast, good for exact matches
+### Component 2: Claude Code Plugin System
 
-**
+**Purpose**: Package QMD for frictionless installation via Claude Code marketplace.
 
-
+**Location**: `.claude-plugin/marketplace.json`, `skills/qmd/SKILL.md`
 
-**
-
-
-
-
-
-
-
-
-const embedding = new Float32Array(result.embedding);
-
-// 2. KNN search (two-step to avoid JOIN hang)
-const vecResults = db.prepare(`
-  SELECT hash_seq, distance
-  FROM vectors_vec
-  WHERE embedding MATCH ? AND k = ?
-`).all(embedding, limit * 3);
-
-// 3. Join with documents (separate query)
-const hashSeqs = vecResults.map(r => r.hash_seq);
-const docs = db.prepare(`
-  SELECT cv.hash, d.path, d.title
-  FROM content_vectors cv
-  JOIN documents d ON d.hash = cv.hash
-  WHERE cv.hash || '_' || cv.seq IN (${placeholders})
-`).all(hashSeqs);
-
-// 4. Deduplicate by document (keep best chunk per doc)
-const seen = new Map();
-for (const doc of docs) {
-  const distance = distanceMap.get(doc.hash_seq);
-  const existing = seen.get(doc.path);
-  if (!existing || distance < existing.distance) {
-    seen.set(doc.path, { doc, distance });
+**Plugin Structure** (from `marketplace.json:1-29`):
+```json
+{
+  "name": "qmd",
+  "plugins": [{
+    "name": "qmd",
+    "skills": ["./skills/"],
+    "mcpServers": {
+      "qmd": { "command": "qmd", "args": ["mcp"] }
     }
-}
-
-// 5. Convert distance to similarity
-return Array.from(seen.values())
-  .sort((a, b) => a.distance - b.distance)
-  .slice(0, limit)
-  .map(({ doc, distance }) => ({
-    ...doc,
-    score: 1 - distance // Cosine similarity
-  }));
-}
-```
-
-**Latency**: ~200ms (embedding generation)
-
-**Strengths**: Semantic understanding, synonym matching
-
-**Weaknesses**: Slower, may miss exact keyword matches
-
-### Mode 3: `query` (Hybrid Pipeline)
-
-**Use Case**: Best-quality search combining lexical + semantic + reranking.
-
-**Full Pipeline** (10 stages):
-
-#### Stage 1: Initial FTS Query
-
-```typescript
-const initialFts = searchFTS(db, query, 20);
-```
-
-**Purpose**: Get BM25 baseline results.
-
-#### Stage 2: Smart Expansion Detection
-
-```typescript
-const topScore = initialFts[0]?.score ?? 0;
-const secondScore = initialFts[1]?.score ?? 0;
-const hasStrongSignal =
-  initialFts.length > 0 &&
-  topScore >= 0.85 &&
-  (topScore - secondScore) >= 0.15;
-
-if (hasStrongSignal) {
-  // Skip expensive LLM operations
-  return initialFts.slice(0, limit);
+  }]
 }
 ```
 
-**
-
-
-
-
-
-
-
-
-
-```typescript
-// Generate alternative phrasings for better recall
-const expanded = await expandQuery(query, model, db);
-// Returns: [original, variant1, variant2]
-```
-
-**LLM Prompt** (simplified):
-```
-Generate 2 alternative search queries:
-1. 'lex': Keyword-focused variation
-2. 'vec': Semantic-focused variation
-
-Original: "how to structure REST endpoints"
-
-Output:
-lex: API endpoint design patterns
-vec: RESTful service architecture best practices
-```
-
-**Model**: Qwen3-1.7B (2.2GB, loaded on-demand)
-
-**Cache Key**: `hash(query + model)`
-
-#### Stage 4: Multi-Query Search (Parallel)
-
-```typescript
-const rankedLists = [];
-
-for (const q of expanded) {
-  // Run FTS for each query variant
-  const ftsResults = searchFTS(db, q.text, 20);
-  rankedLists.push(ftsResults);
-
-  // Run vector search for each query variant
-  const vecResults = await searchVec(db, q.text, model, 20);
-  rankedLists.push(vecResults);
-}
-
-// Result: 6 ranked lists (3 queries × 2 methods each)
-```
-
-**Purpose**: Cast wide net to maximize recall.
-
-#### Stage 5: Reciprocal Rank Fusion (RRF)
-
-```typescript
-function reciprocalRankFusion(
-  resultLists: RankedResult[][],
-  weights: number[] = [],
-  k: number = 60
-): RankedResult[] {
-  const scores = new Map<string, {
-    result: RankedResult;
-    rrfScore: number;
-    topRank: number;
-  }>();
-
-  // Accumulate RRF scores across all lists
-  for (let listIdx = 0; listIdx < resultLists.length; listIdx++) {
-    const list = resultLists[listIdx];
-    const weight = weights[listIdx] ?? 1.0;
-
-    for (let rank = 0; rank < list.length; rank++) {
-      const result = list[rank];
-      const rrfContribution = weight / (k + rank + 1);
-
-      const existing = scores.get(result.file);
-      if (existing) {
-        existing.rrfScore += rrfContribution;
-        existing.topRank = Math.min(existing.topRank, rank);
-      } else {
-        scores.set(result.file, {
-          result,
-          rrfScore: rrfContribution,
-          topRank: rank
-        });
-      }
-    }
-  }
-
-  // Top-rank bonus (preserve exact matches)
-  for (const entry of scores.values()) {
-    if (entry.topRank === 0) {
-      entry.rrfScore += 0.05; // #1 in any list
-    } else if (entry.topRank <= 2) {
-      entry.rrfScore += 0.02; // #2-3 in any list
-    }
-  }
-
-  return Array.from(scores.values())
-    .sort((a, b) => b.rrfScore - a.rrfScore)
-    .map(e => ({ ...e.result, score: e.rrfScore }));
-}
+**Skill Definition** (from `skills/qmd/SKILL.md:1-10`):
+```yaml
+---
+name: qmd
+description: Search personal markdown knowledge bases...
+metadata:
+  author: tobi
+  version: "1.1.1"
+allowed-tools: Bash(qmd:*), mcp__qmd__*
+---
 ```
 
-
-
-**
--
--
+Key features:
+- **Inline status check**: `!` prefix runs command during skill load (`SKILL.md:18`)
+- **Trigger phrases**: "search my notes", "find in docs", "what did I write about"
+- **Tool permissions**: Scoped to `qmd:*` bash commands and `mcp__qmd__*` tools
+- **Score interpretation guide**: Embedded in skill for LLM consumption
+- **Recommended workflow**: status → search → vsearch → query → get
 
-**
-- Original query: `weight = 2.0` (prioritize user's exact words)
-- Expanded queries: `weight = 1.0` (supplementary signals)
+**Relevance to ClaudeMemory**: This is the clearest example of how to package a memory/search tool as a Claude Code plugin. The skill definition format, tool permissions scoping, inline status checks, and MCP server bundling are all patterns we should adopt when ready to ship as a plugin. The `allowed-tools` pattern (`Bash(qmd:*)`) is particularly useful for security scoping.
 
-
-- `+0.05` for rank #1: Likely exact match
-- `+0.02` for ranks #2-3: Strong signal
-- No bonus for rank #4+: Let RRF dominate
-
-#### Stage 6: Candidate Selection
+---
 
-
-const candidates = fusedResults.slice(0, 30);
-```
+### Component 3: MCP Server with Structured Content
 
-**Purpose**:
+**Purpose**: Expose QMD search as MCP tools with both human-readable text and machine-parseable structured content.
 
-
+**Location**: `src/mcp.ts`
 
+**Implementation** (from `mcp.ts:258-292`):
 ```typescript
-
-
-
-
-
-
-
-
-
-
-});
-
-// Return best chunk text for reranking
+server.registerTool("search", {
+  title: "Search (BM25)",
+  inputSchema: {
+    query: z.string().describe("Search query"),
+    limit: z.number().optional().default(10),
+    minScore: z.number().optional().default(0),
+    collection: z.string().optional(),
+  },
+}, async ({ query, limit, minScore, collection }) => {
+  // ... search logic ...
   return {
-
-
+    content: [{ type: "text", text: formatSearchSummary(filtered, query) }],
+    structuredContent: { results: filtered },
   };
 });
 ```
 
-**
-
-
-
-
-
-
-// Returns: [{ file, score: 0.0-1.0 }, ...]
-// score = normalized relevance (cross-encoder logits)
-```
-
-**Model**: Qwen3-Reranker-0.6B (640MB)
-
-**How It Works**: Cross-encoder scores query-document pair directly (not separate embeddings).
-
-**Cache Key**: `hash(query + file + model)`
-
-#### Stage 9: Position-Aware Score Blending
-
-```typescript
-// Combine RRF and reranker scores based on rank
-const blended = candidates.map((doc, rank) => {
-  const rrfScore = doc.score;
-  const rerankScore = rerankScores.get(doc.file) || 0;
-
-  // Top results: trust retrieval more
-  // Lower results: trust reranker more
-  let rrfWeight, rerankWeight;
-  if (rank < 3) {
-    rrfWeight = 0.75;
-    rerankWeight = 0.25;
-  } else if (rank < 10) {
-    rrfWeight = 0.60;
-    rerankWeight = 0.40;
-  } else {
-    rrfWeight = 0.40;
-    rerankWeight = 0.60;
-  }
-
-  const finalScore = rrfWeight * rrfScore + rerankWeight * rerankScore;
-
-  return { ...doc, score: finalScore };
-});
-```
-
-**Rationale**:
-- Top results likely have both strong lexical AND semantic signals
-- Lower results may be semantically relevant but lexically weak
-- Reranker helps elevate hidden gems
-
-#### Stage 10: Final Sorting
-
-```typescript
-return blended
-  .sort((a, b) => b.score - a.score)
-  .slice(0, limit);
-```
-
-**Latency Breakdown**:
-- Cold (first query): 2-3s (model loading + expansion + reranking)
-- Warm (cached expansion): ~500ms (reranking only)
-- Strong signal (skipped): ~200ms (FTS + vector, no LLM)
-
----
-
-## Vector Search Implementation
-
-### Embedding Model: EmbeddingGemma
-
-**Specs**:
-- Parameters: 300M
-- Dimensions: 384 (QMD docs say 768, but 384 is standard)
-- Format: GGUF (quantized)
-- Size: 300MB download
-- Tokenizer: SentencePiece
-
-**Prompt Format** (Nomic-style):
-```typescript
-// Query embedding
-formatQueryForEmbedding(query: string): string {
-  return `task: search result | query: ${query}`;
-}
-
-// Document embedding
-formatDocForEmbedding(text: string, title?: string): string {
-  return `title: ${title || "none"} | text: ${text}`;
-}
-```
-
-**Why Prompt Formatting Matters**: Embedding models are trained on specific formats. Using the wrong format degrades quality.
-
-### Document Chunking Strategy
-
-QMD offers two chunking approaches:
-
-#### 1. Token-Based Chunking (Recommended)
-
-```typescript
-async function chunkDocumentByTokens(
-  content: string,
-  maxTokens: number = 800,
-  overlapTokens: number = 120  // 15% of 800
-): Promise<{ text: string; pos: number; tokens: number }[]> {
-  const llm = getDefaultLlamaCpp();
-
-  // Tokenize entire document once
-  const allTokens = await llm.tokenize(content);
-  const totalTokens = allTokens.length;
-
-  if (totalTokens <= maxTokens) {
-    return [{ text: content, pos: 0, tokens: totalTokens }];
-  }
-
-  const chunks = [];
-  const step = maxTokens - overlapTokens;  // 680 tokens
-  let tokenPos = 0;
-
-  while (tokenPos < totalTokens) {
-    const chunkEnd = Math.min(tokenPos + maxTokens, totalTokens);
-    const chunkTokens = allTokens.slice(tokenPos, chunkEnd);
-    let chunkText = await llm.detokenize(chunkTokens);
-
-    // Find semantic break point if not at end
-    if (chunkEnd < totalTokens) {
-      const searchStart = Math.floor(chunkText.length * 0.7);
-      const searchSlice = chunkText.slice(searchStart);
-
-      // Priority: paragraph > sentence > line
-      const breakOffset = findBreakPoint(searchSlice);
-      if (breakOffset >= 0) {
-        chunkText = chunkText.slice(0, searchStart + breakOffset);
-      }
-    }
-
-    chunks.push({
-      text: chunkText,
-      pos: Math.floor(tokenPos * avgCharsPerToken),
-      tokens: chunkTokens.length
-    });
-
-    tokenPos += step;
-  }
-
-  return chunks;
-}
-```
-
-**Parameters**:
-- `maxTokens = 800`: EmbeddingGemma's optimal context window
-- `overlapTokens = 120` (15%): Ensures continuity across boundaries
-
-**Break Priority** (from store.ts:1020-1046):
-1. Paragraph boundary (`\n\n`)
-2. Sentence end (`. `, `.\n`, `? `, `! `)
-3. Line break (`\n`)
-4. Word boundary (` `)
-5. Hard cut (if no boundary found)
-
-**Search Window**: Last 30% of chunk (70-100% range) to avoid cutting too early.
-
-#### 2. Character-Based Chunking (Fallback)
-
-```typescript
-function chunkDocument(
-  content: string,
-  maxChars: number = 3200,    // ~800 tokens @ 4 chars/token
-  overlapChars: number = 480  // 15% overlap
-): { text: string; pos: number }[] {
-  // Similar logic but operates on characters instead of tokens
-  // Faster but less accurate (doesn't respect token boundaries)
-}
-```
-
-**When to Use**: Synchronous contexts where async tokenization isn't available.
-
-### sqlite-vec Integration
-
-QMD uses **sqlite-vec 0.1.x** (vec0 virtual table):
-
-```typescript
-// Create virtual table for native vectors
-db.exec(`
-  CREATE VIRTUAL TABLE vectors_vec USING vec0(
-    hash_seq TEXT PRIMARY KEY,
-    embedding float[384] distance_metric=cosine
-  )
-`);
-
-// Insert embedding (note: Float32Array required)
-const embedding = new Float32Array(embeddingArray);
-db.prepare(`
-  INSERT INTO vectors_vec (hash_seq, embedding) VALUES (?, ?)
-`).run(`${hash}_${seq}`, embedding);
-
-// KNN search (CRITICAL: no JOINs in same query!)
-const vecResults = db.prepare(`
-  SELECT hash_seq, distance
-  FROM vectors_vec
-  WHERE embedding MATCH ? AND k = ?
-`).all(queryEmbedding, limit * 3);
-
-// Then join with documents in separate query
-const docs = db.prepare(`
-  SELECT * FROM documents WHERE hash IN (...)
-`).all(hashList);
-```
-
-**Key Insights**:
-
-1. **Two-Step Pattern Required**: JOINs with vec0 tables hang (confirmed bug)
-2. **Float32Array**: Must convert number[] to typed array
-3. **Cosine Distance**: Returns 0.0 (identical) to 2.0 (opposite)
-4. **KNN Parameter**: Request `limit * 3` to allow for deduplication
+**Key patterns**:
+1. **Dual output**: Both `content` (human-readable text) and `structuredContent` (JSON) returned from every tool
+2. **Zod validation**: Input schemas use Zod v4 with `.describe()` for auto-documentation
+3. **Resource template**: Documents accessible via `qmd://{+path}` URI pattern with suffix matching fallback (`mcp.ts:105-166`)
+4. **Query guide prompt**: Registered prompt explaining search strategy to LLMs (`mcp.ts:172-252`)
+5. **Line numbers**: Default in resource output for precise references
+6. **Error handling**: `isError: true` flag for clear error signaling, fuzzy file suggestions on not-found
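A minimal sketch of the dual-output shape including the `isError` path (handler body hypothetical; `formatSearchSummary` as referenced in the snippet above):

```typescript
// Hypothetical stubs; QMD's real search and formatter live in store.ts/mcp.ts.
declare function runSearch(query: string, limit: number): Promise<unknown[]>;
declare function formatSearchSummary(results: unknown[], query: string): string;

async function searchHandler({ query, limit }: { query: string; limit: number }) {
  try {
    const filtered = await runSearch(query, limit);
    // Dual output: text summary for humans, structuredContent for machines.
    return {
      content: [{ type: "text" as const, text: formatSearchSummary(filtered, query) }],
      structuredContent: { results: filtered },
    };
  } catch (err) {
    // Pattern 6: signal failure explicitly rather than returning empty results.
    return {
      isError: true,
      content: [{ type: "text" as const, text: `Search failed: ${String(err)}` }],
    };
  }
}
```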
 
-
-
-QMD deduplicates **per-document** after vector search:
-
-```typescript
-// Multiple chunks per document may match
-// Keep only the best chunk per document
-const seen = new Map<string, { doc, bestDistance }>();
-
-for (const row of docRows) {
-  const distance = distanceMap.get(row.hash_seq);
-  const existing = seen.get(row.filepath);
-
-  if (!existing || distance < existing.bestDistance) {
-    seen.set(row.filepath, { doc: row, bestDistance: distance });
-  }
-}
-
-return Array.from(seen.values())
-  .sort((a, b) => a.bestDistance - b.bestDistance);
-```
-
-**Rationale**: Users want documents, not chunks. Show best chunk per doc.
+**Relevance to ClaudeMemory**: We already have 18 MCP tools, but QMD's dual `content`/`structuredContent` pattern is worth adopting — it ensures both human (text summary) and machine (JSON) consumers get optimal formats. The registered prompt for query guidance is also a good pattern for improving Claude's tool usage.
 
 ---
 
-
-
-### node-llama-cpp Abstraction
-
-QMD uses **node-llama-cpp** for local inference:
-
-```typescript
-import { getLlama, LlamaModel, LlamaChatSession } from "node-llama-cpp";
-
-class LlamaCpp implements LLM {
-  private llama: Llama | null = null;
-  private embedModel: LlamaModel | null = null;
-  private rerankModel: LlamaModel | null = null;
-  private generateModel: LlamaModel | null = null;
-
-  // Lazy loading with singleton pattern
-  private async ensureLlama(): Promise<Llama> {
-    if (!this.llama) {
-      this.llama = await getLlama({ logLevel: LlamaLogLevel.error });
-    }
-    return this.llama;
-  }
-
-  private async ensureEmbedModel(): Promise<LlamaModel> {
-    if (!this.embedModel) {
-      const llama = await this.ensureLlama();
-      const modelPath = await resolveModelFile(
-        "hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf",
-        this.modelCacheDir
-      );
-      this.embedModel = await llama.loadModel({ modelPath });
-    }
-    return this.embedModel;
-  }
-}
-```
-
-**Model Download**: Automatic from HuggingFace (cached in `~/.cache/qmd/models/`)
-
-### Lazy Model Loading
-
-**Strategy**: Load models on first use, keep in memory, unload after 2 minutes idle.
-
-```typescript
-// Inactivity timer management
-private touchActivity(): void {
-  if (this.inactivityTimer) {
-    clearTimeout(this.inactivityTimer);
-  }
-
-  if (this.inactivityTimeoutMs > 0 && this.hasLoadedContexts()) {
-    this.inactivityTimer = setTimeout(() => {
-      this.unloadIdleResources();
-    }, this.inactivityTimeoutMs);
-    this.inactivityTimer.unref(); // Don't block process exit
-  }
-}
-
-// Unload contexts (heavy) but keep models (fast reload)
-async unloadIdleResources(): Promise<void> {
-  if (this.embedContext) {
-    await this.embedContext.dispose();
-    this.embedContext = null;
-  }
-  if (this.rerankContext) {
-    await this.rerankContext.dispose();
-    this.rerankContext = null;
-  }
-
-  // Optional: also dispose models if disposeModelsOnInactivity=true
-  // (default: false, keep models loaded)
-}
-```
-
-**Lifecycle** (from llm.ts comments):
-```
-llama (lightweight) → model (VRAM) → context (VRAM) → sequence (per-session)
-```
-
-**Why This Matters**:
-- **Cold start**: First query loads models (~2-3s)
-- **Warm**: Subsequent queries use loaded models (~200-500ms)
-- **Idle**: After 2min, contexts unloaded (models stay unless configured)
-
-### Query Expansion
-
-**Purpose**: Generate alternative phrasings for better recall.
-
-**LLM Prompt** (from llm.ts:637-679):
-```typescript
-const prompt = `You are a search query optimization expert. Your task is to improve retrieval by rewriting queries and generating hypothetical documents.
-
-Original Query: ${query}
-
-${context ? `Additional Context, ONLY USE IF RELEVANT:\n\n<context>${context}</context>` : ""}
-
-## Step 1: Query Analysis
-Identify entities, search intent, and missing context.
-
-## Step 2: Generate Hypothetical Document
-Write a focused sentence passage that would answer the query. Include specific terminology and domain vocabulary.
-
-## Step 3: Query Rewrites
-Generate 2-3 alternative search queries that resolve ambiguities. Use terminology from the hypothetical document.
-
-## Step 4: Final Retrieval Text
-Output exactly 1-3 'lex' lines, 1-3 'vec' lines, and MAX ONE 'hyde' line.
-
-<format>
-lex: {single search term}
-vec: {single vector query}
-hyde: {complete hypothetical document passage from Step 2 on a SINGLE LINE}
-</format>
-
-<rules>
-- DO NOT repeat the same line.
-- Each 'lex:' line MUST be a different keyword variation based on the ORIGINAL QUERY.
-- Each 'vec:' line MUST be a different semantic variation based on the ORIGINAL QUERY.
-- The 'hyde:' line MUST be the full sentence passage from Step 2, but all on one line.
-</rules>
-
-Final Output:`;
-```
-
-**Grammar** (constrained generation):
-```typescript
-const grammar = await llama.createGrammar({
-  grammar: `
-    root ::= line+
-    line ::= type ": " content "\\n"
-    type ::= "lex" | "vec" | "hyde"
-    content ::= [^\\n]+
-  `
-});
-```
-
-**Output Parsing**:
-```typescript
-const result = await session.prompt(prompt, { grammar, maxTokens: 1000, temperature: 1 });
-const lines = result.trim().split("\n");
-const queryables: Queryable[] = lines.map(line => {
-  const colonIdx = line.indexOf(":");
-  const type = line.slice(0, colonIdx).trim();
-  const text = line.slice(colonIdx + 1).trim();
-  return { type: type as QueryType, text };
-}).filter(q => q.type === 'lex' || q.type === 'vec' || q.type === 'hyde');
-```
-
-**Example**:
-```
-Query: "how to structure REST endpoints"
-
-Output:
-lex: REST API design
-lex: endpoint organization patterns
-vec: RESTful service architecture principles
-vec: HTTP resource modeling best practices
-hyde: REST endpoints should follow resource-oriented design with clear hierarchies. Use nouns for resources, HTTP methods for operations, and consistent naming conventions for discoverability.
-```
-
-**Model**: Qwen3-1.7B (2.2GB)
-
-**Cache Hit Rate**: High for repeated queries (~80% per QMD usage data)
-
-### LLM Reranking
-
-**Purpose**: Score query-document relevance using cross-encoder.
-
-**Implementation**:
-```typescript
-async rerank(
-  query: string,
-  documents: RerankDocument[],
-  options: RerankOptions = {}
-): Promise<RerankResult> {
-  const context = await this.ensureRerankContext();
-
-  // Extract text for ranking
-  const texts = documents.map(doc => doc.text);
-
-  // Use native ranking API (returns sorted by score)
-  const ranked = await context.rankAndSort(query, texts);
-
-  // Map back to original documents
-  const results = ranked.map(item => {
-    const docInfo = textToDoc.get(item.document);
-    return {
-      file: docInfo.file,
-      score: item.score, // 0.0 (irrelevant) to 1.0 (highly relevant)
-      index: docInfo.index
-    };
-  });
-
-  return { results, model: this.rerankModelUri };
-}
-```
-
-**Model**: Qwen3-Reranker-0.6B (640MB)
+### Component 4: Session-Scoped LLM Lifecycle
 
-**
+**Purpose**: Manage LLM model loading, context creation, and cleanup with structured lifecycle guarantees.
 
-**
+**Location**: `src/llm.ts:126-146`
 
-
-
-**Probabilistic Cleanup** (from store.ts:804-807):
+**Session Interface** (from `llm.ts:137-146`):
 ```typescript
-
-
-
-
-
-
-
-      LIMIT 1000
-    )
-  `);
+export interface ILLMSession {
+  embed(text: string, options?: EmbedOptions): Promise<EmbeddingResult | null>;
+  embedBatch(texts: string[]): Promise<(EmbeddingResult | null)[]>;
+  expandQuery(query: string, options?): Promise<Queryable[]>;
+  rerank(query: string, documents: RerankDocument[]): Promise<RerankResult>;
+  readonly isValid: boolean;
+  readonly signal: AbortSignal;
 }
 ```
 
-**
--
--
--
-
-**Cache Size Estimate**:
-- Query expansion: ~500 bytes per entry
-- Reranking: ~50 bytes per entry (just score)
-- 1000 entries ≈ 500KB (negligible)
-
----
-
-## Performance Characteristics
-
-### Evaluation Methodology
-
-QMD includes comprehensive test suite in `eval.test.ts`:
-
-**Test Corpus**: 6 synthetic documents covering diverse topics
-- api-design.md
-- fundraising.md
-- distributed-systems.md
-- machine-learning.md
-- remote-work.md
-- product-launch.md
-
-**Query Design**: 24 queries across 4 difficulty levels
-
-#### Easy Queries (6) - Exact keyword matches
-```typescript
-{ query: "API versioning", expectedDoc: "api-design" }
-{ query: "Series A fundraising", expectedDoc: "fundraising" }
-{ query: "CAP theorem", expectedDoc: "distributed-systems" }
-{ query: "overfitting machine learning", expectedDoc: "machine-learning" }
-{ query: "remote work VPN", expectedDoc: "remote-work" }
-{ query: "Project Phoenix retrospective", expectedDoc: "product-launch" }
-```
-
-**Expected**: BM25 should excel (≥80% Hit@3)
-
-#### Medium Queries (6) - Semantic/conceptual
-```typescript
-{ query: "how to structure REST endpoints", expectedDoc: "api-design" }
-{ query: "raising money for startup", expectedDoc: "fundraising" }
-{ query: "consistency vs availability tradeoffs", expectedDoc: "distributed-systems" }
-{ query: "how to prevent models from memorizing data", expectedDoc: "machine-learning" }
-{ query: "working from home guidelines", expectedDoc: "remote-work" }
-{ query: "what went wrong with the launch", expectedDoc: "product-launch" }
-```
-
-**Expected**: Vectors should outperform BM25 (≥40% vs ≥15%)
+**Key patterns**:
+- Sessions have `isValid` flag and `signal` (AbortSignal) for lifecycle tracking
+- Maximum duration timeout prevents runaway sessions
+- Models lazy-load but stay resident; contexts dispose after 2-min idle
+- Singleton pattern ensures only one LLM instance (memory management)
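A minimal sketch of how a caller might consume such a session (usage hypothetical; `ILLMSession` and `Queryable` as in the interface quoted above):

```typescript
// Hypothetical consumer: check validity, honor the session's AbortSignal,
// and let the session own model/context cleanup.
async function expandWithSession(session: ILLMSession, query: string) {
  if (!session.isValid) throw new Error("LLM session already disposed");
  session.signal.throwIfAborted(); // session timeout/abort surfaces here
  const queryables = await session.expandQuery(query);
  // Route lex: lines to FTS, vec:/hyde: lines to embedding search.
  return {
    lex: queryables.filter(q => q.type === "lex").map(q => q.text),
    vec: queryables.filter(q => q.type !== "lex").map(q => q.text),
  };
}
```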
|
|
1150
282
|
|
|
1151
|
-
|
|
1152
|
-
```typescript
|
|
1153
|
-
{ query: "nouns not verbs", expectedDoc: "api-design" }
|
|
1154
|
-
{ query: "Sequoia investor pitch", expectedDoc: "fundraising" }
|
|
1155
|
-
{ query: "Raft algorithm leader election", expectedDoc: "distributed-systems" }
|
|
1156
|
-
{ query: "F1 score precision recall", expectedDoc: "machine-learning" }
|
|
1157
|
-
{ query: "quarterly team gathering travel", expectedDoc: "remote-work" }
|
|
1158
|
-
{ query: "beta program 47 bugs", expectedDoc: "product-launch" }
|
|
1159
|
-
```
|
|
1160
|
-
|
|
1161
|
-
**Expected**: Both methods struggle, hybrid helps (≥35% @ H@5 vs ≥15%)
|
|
1162
|
-
|
|
1163
|
-
#### Fusion Queries (6) - Multi-signal needed
|
|
1164
|
-
```typescript
|
|
1165
|
-
{ query: "how much runway before running out of money", expectedDoc: "fundraising" }
|
|
1166
|
-
{ query: "datacenter replication sync strategy", expectedDoc: "distributed-systems" }
|
|
1167
|
-
{ query: "splitting data for training and testing", expectedDoc: "machine-learning" }
|
|
1168
|
-
{ query: "JSON response codes error messages", expectedDoc: "api-design" }
|
|
1169
|
-
{ query: "video calls camera async messaging", expectedDoc: "remote-work" }
|
|
1170
|
-
{ query: "CI/CD pipeline testing coverage", expectedDoc: "product-launch" }
|
|
1171
|
-
```
|
|
1172
|
-
|
|
1173
|
-
**Expected**: RRF combines weak signals (≥50% vs ~15-30% for single methods)
|
|
1174
|
-
|
|
1175
|
-
- ### Results Summary
-
- | Method | Easy H@3 | Medium H@3 | Hard H@5 | Fusion H@3 | Overall H@3 |
- |--------|----------|------------|----------|------------|-------------|
- | **BM25** | ≥80% | ≥15% | ≥15% | ~15% | ≥40% |
- | **Vector** | ≥60% | ≥40% | ≥30% | ~30% | ≥50% |
- | **Hybrid (RRF)** | ≥80% | **≥50%** | **≥35%** | **≥50%** | **≥60%** |
-
- **Key Findings**:
- 1. BM25 sufficient for easy queries (exact matches)
- 2. Vectors essential for medium queries (+233% improvement)
- 3. RRF fusion best for fusion queries (combines weak signals)
- 4. Overall: Hybrid provides 50% improvement over BM25 baseline
-
- ### Latency Analysis
-
- **Measured on M1 Mac, 16GB RAM**:
-
- | Operation | Cold Start | Warm (Cached) | Strong Signal |
- |-----------|------------|---------------|---------------|
- | `search` (BM25) | <50ms | <50ms | <50ms |
- | `vsearch` (Vector) | ~2s (model load) | ~200ms | ~200ms |
- | `query` (Hybrid) | 3-5s (all models) | ~500ms | ~200ms |
-
- **Breakdown for `query` (cold)**:
- - Model loading: ~2s (embed + rerank + expand)
- - Query expansion: ~800ms (LLM generation)
- - FTS + Vector: ~300ms (parallel)
- - RRF fusion: <10ms (pure algorithm)
- - Reranking: ~400ms (cross-encoder scoring)
- - Total: 3-5s
-
- **Breakdown for `query` (warm)**:
- - FTS + Vector: ~300ms
- - RRF fusion: <10ms
- - Reranking (cached): ~50ms
- - Total: ~400-500ms
-
- **Breakdown for `query` (strong signal, skipped)**:
- - FTS: ~50ms
- - Smart detection: <5ms
- - Vector (skipped): 0ms
- - Expansion (skipped): 0ms
- - Reranking (skipped): 0ms
- - Total: ~100-150ms
-
- ### Resource Usage
-
- **Disk Space**:
- - Per document: ~5KB (body + metadata)
- - Per chunk embedding: ~1.5KB (384 floats + metadata)
- - Example: 1000 documents, 5 chunks avg = 5MB + 7.5MB = **12.5MB total**
-
- **Memory**:
- - Base process: ~50MB
- - EmbeddingGemma loaded: +300MB
- - Reranker loaded: +640MB
- - Expansion model loaded: +2.2GB
- - **Peak**: ~3.2GB (all models loaded)
-
- **VRAM** (GPU acceleration):
- - EmbeddingGemma: ~300MB
- - Reranker: ~640MB
- - Expansion: ~2.2GB
- - **Peak**: ~3.2GB
-
- **Optimization**: Models lazy-load and unload after 2min idle.
-
- ### Scalability
-
- **Tested Corpus Sizes**:
- - 100 documents: FTS <10ms, Vector <100ms
- - 1,000 documents: FTS <50ms, Vector <200ms
- - 10,000 documents: FTS <200ms, Vector <500ms
-
- **Bottlenecks**:
- 1. **Embedding generation**: Linear with document count (once)
- 2. **Vector search**: KNN scales log(n) with proper indexing
- 3. **FTS search**: Scales well to millions of documents
- 4. **Reranking**: Linear with candidate count (top 30-40)
-
- **Recommended Limits**:
- - Documents: 50,000+ (tested in production)
- - Per-document size: <10MB (chunking handles larger)
- - Query length: <500 tokens (embedding model limit)
+ **Relevance to ClaudeMemory**: If we ever integrate local LLMs for distillation, this session-scoped lifecycle pattern is the right approach. Clean abort propagation via AbortSignal is a good practice for any long-running operation.
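To make that lifecycle concrete, here is a minimal sketch of a session with abort propagation and a maximum-duration timeout. The `MemorySession` name and the timeout value are illustrative assumptions, not QMD's or ClaudeMemory's actual code:

```typescript
// Hypothetical sketch: session-scoped lifecycle with abort propagation.
// MemorySession and MAX_SESSION_MS are illustrative names, not real APIs.
const MAX_SESSION_MS = 120_000;

class MemorySession {
  private controller = new AbortController();
  private timer: ReturnType<typeof setTimeout>;

  constructor(maxDurationMs: number = MAX_SESSION_MS) {
    // Maximum-duration timeout: a runaway session aborts itself.
    this.timer = setTimeout(() => this.controller.abort(), maxDurationMs);
  }

  // Long-running work (model calls, fetches) should honor this signal.
  get signal(): AbortSignal {
    return this.controller.signal;
  }

  get isValid(): boolean {
    return !this.controller.signal.aborted;
  }

  close(): void {
    clearTimeout(this.timer);
    this.controller.abort(); // cancels any in-flight work that checks the signal
  }
}
```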

  ---

  ## Comparative Analysis

- ###
-
- | Dimension | QMD | ClaudeMemory | Analysis |
- |-----------|-----|--------------|----------|
- | **Granularity** | Full markdown documents | Structured facts (triples) | **Different use cases**: QMD = recall, ClaudeMemory = extraction |
- | **Storage** | Content-addressable (SHA256) | Entity-predicate-object | **QMD advantage**: Auto-deduplication. **ClaudeMemory advantage**: Queryable structure |
- | **Retrieval Goal** | "Show me docs about X" | "What do we know about X?" | **Complementary**: QMD finds context, ClaudeMemory distills knowledge |
- | **Truth Model** | All documents valid | Supersession + conflicts | **ClaudeMemory advantage**: Resolves contradictions |
- | **Scope** | YAML collections | Dual-database | **ClaudeMemory advantage**: Clean separation |
-
- **Verdict**: **Different paradigms, not competitors**. QMD optimizes for document recall, ClaudeMemory for knowledge graphs.
+ ### What QMD Does Well (New Findings)

+ #### 1. Custom Fine-Tuned Model Pipeline
+ - **Description**: Full training pipeline (SFT → GRPO → GGUF conversion) for a search-specific model
+ - **Evidence**: `finetune/reward.py` — multi-dimensional reward function; `finetune/train.py` — unified training script
+ - **Why It Works**: Domain-specific models outperform general-purpose LLMs for structured tasks. The two-stage approach (format learning via SFT, quality refinement via GRPO) is state-of-the-art.
+ - **Metric**: Min 92% average score required before deployment

- | **Reranking** | Cross-encoder LLM | None | **QMD** (but costly) |
- | **Query Expansion** | LLM-generated variants | None | **QMD** (but costly) |
+ #### 2. Plugin Distribution
+ - **Description**: Ships as a Claude Code marketplace plugin with zero-config MCP + skills
+ - **Evidence**: `.claude-plugin/marketplace.json`, `skills/qmd/SKILL.md`
+ - **Why It Works**: `claude marketplace add tobi/qmd` is dramatically simpler than a manual gem install + MCP config + hook setup
+ - **Impact**: Massive UX improvement for installation

+ #### 3. Typed Query Routing
+ - **Description**: Query expansion produces typed outputs (`lex:`, `vec:`, `hyde:`) routed to appropriate backends
+ - **Evidence**: `llm.ts:637-679` — structured prompt; `llm.ts:1006-1013` — grammar constraint
+ - **Why It Works**: Different search backends have different strengths. Routing keyword queries to BM25 and semantic queries to vector search maximizes recall.
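As a concrete illustration of the routing idea, here is a small sketch that dispatches typed sub-queries by prefix. The names (`routeQueries`, `SearchBackends`) are hypothetical; QMD's actual implementation lives in `llm.ts`:

```typescript
// Hypothetical sketch of prefix-based query routing; not QMD's actual code.
type Queryable = string; // e.g. "lex:API versioning" or "vec:structuring REST endpoints"

interface SearchBackends {
  bm25(q: string): Promise<unknown[]>;   // lexical / FTS backend
  vector(q: string): Promise<unknown[]>; // embedding-similarity backend
  hyde(doc: string): Promise<unknown[]>; // embed a hypothetical answer document
}

async function routeQueries(queries: Queryable[], backends: SearchBackends) {
  const resultLists = await Promise.all(
    queries.map((q) => {
      if (q.startsWith("lex:")) return backends.bm25(q.slice(4));
      if (q.startsWith("vec:")) return backends.vector(q.slice(4));
      if (q.startsWith("hyde:")) return backends.hyde(q.slice(5));
      return backends.bm25(q); // default: treat untyped queries as lexical
    })
  );
  return resultLists; // ranked lists, ready for RRF-style fusion
}
```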

-
+ #### 4. Dual Content/StructuredContent MCP Responses
+ - **Description**: Every MCP tool returns both a human-readable text summary and machine-parseable JSON
+ - **Evidence**: `mcp.ts:288-291` — `return { content: [...], structuredContent: {...} }`
+ - **Why It Works**: LLMs can parse both formats, but text summaries are more token-efficient for simple consumption
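A minimal sketch of that response shape, paraphrasing the `mcp.ts:288-291` pattern cited above (the summary text and result fields here are invented for illustration):

```typescript
// Sketch of a dual-format MCP tool result, modeled on the cited QMD pattern.
// The handler body and result fields are illustrative, not QMD's actual code.
interface SearchHit { docid: string; file: string; score: number }

function formatToolResult(query: string, results: SearchHit[]) {
  const summary = results
    .map((r, i) => `${i + 1}. ${r.file} (${r.docid}, score ${r.score.toFixed(2)})`)
    .join("\n");

  return {
    // Human-readable, token-efficient summary for direct consumption...
    content: [{ type: "text", text: `Results for "${query}":\n${summary}` }],
    // ...plus the full machine-parseable payload for structured consumers.
    structuredContent: { query, results },
  };
}
```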

- ###
+ ### What We Do Well

- | **Index Type** | Proper vector index | Sequential scan | **QMD** |
- | **Scalability** | Tested to 10,000+ docs | Limited by JSON parsing | **QMD** |
+ #### 1. Fact-Based Knowledge Graph
+ - Our subject-predicate-object triples enable structured queries and inference
+ - Truth maintenance resolves contradictions automatically
+ - Far richer than document-level retrieval for knowledge extraction

-
+ #### 2. Dual-Database Architecture
+ - Clean global/project separation without YAML collections
+ - Simpler queries, clearer data ownership

-
+ #### 3. Comprehensive MCP Surface
+ - 18 tools vs QMD's 6 — we cover recall, explain, manage, monitor
+ - Progressive disclosure (recall_index → recall_details) for token efficiency

- | **Embeddings** | EmbeddingGemma (300MB) | TF-IDF (stdlib) | **ClaudeMemory** (lighter) |
- | **LLM** | node-llama-cpp (3GB models) | None (distill only) | **ClaudeMemory** (lighter) |
- | **Install Size** | ~3.5GB (with models) | ~5MB | **ClaudeMemory** |
+ #### 4. Lightweight Dependencies
+ - ~5MB gem vs ~2GB+ with GGUF models
+ - fastembed-rb (67MB ONNX) vs EmbeddingGemma (300MB GGUF)
+ - No runtime LLM dependency

-
+ #### 5. Robust Benchmarking
+ - DevMemBench: 155 queries, Recall@k, MRR, nDCG@10
+ - 100 truth maintenance test cases
+ - 31 end-to-end scenarios with real Claude
+ - QMD has 18 eval queries — our evaluation is more comprehensive

- ###
+ ### Trade-offs

- […]
-
- **Verdict**: **QMD has complete offline capability** for its use case. ClaudeMemory could adopt local embeddings for offline semantic search, but distillation still requires the API.
-
- ### Startup Time
-
- | Scenario | QMD | ClaudeMemory | Winner |
- |----------|-----|--------------|--------|
- | **Cold start** | ~2s (model load) | <100ms | **ClaudeMemory** |
- | **Warm start** | <100ms | <100ms | **Tie** |
-
- **Verdict**: **ClaudeMemory starts faster**, which matters for CLI tools. QMD's lazy loading mitigates this.
+ | Approach | Pros | Cons | Best For |
+ |----------|------|------|----------|
+ | **QMD's LLM-powered search** | Better semantic recall, typed query routing | 2GB+ models, 2-3s cold start, complex deps | Large document collections, conceptual search |
+ | **Our FastEmbed search** | Lightweight (67MB), fast (<100ms), no LLM | Lower semantic quality for vague queries | Structured fact retrieval, quick lookups |
+ | **QMD's plugin distribution** | Zero-config install, marketplace discovery | Requires plugin ecosystem maturity | Wide user adoption |
+ | **Our gem + MCP + hooks** | Fine-grained control, works today | Complex setup, multiple config files | Power users, custom integrations |

  ---

  ## Adoption Opportunities

- ### High Priority
-
- #### 1. ⭐ Native Vector Storage (sqlite-vec)
- […]
-
- #### 2. ⭐ Reciprocal Rank Fusion (RRF)
- […]
-
- **Value**:
- […]
-
- ```ruby
- # Sort by similarity (vector) or default score (FTS)
- combined.values
-   .sort_by { |r| -(r[:similarity] || 0) }
-   .take(limit)
- end
- ```
-
- **Problems**:
- - No fusion of ranking signals
- - Vector scores dominate (when present)
- - Doesn't boost items appearing in multiple result lists
- - Ignores rank position (only final scores)
-
- **With RRF**:
- ```ruby
- # lib/claude_memory/recall/rrf_fusion.rb
- module ClaudeMemory
-   module Recall
-     class RRFusion
-       DEFAULT_K = 60
-
-       def self.fuse(ranked_lists, weights: [], k: DEFAULT_K)
-         scores = {}
-
-         # Accumulate RRF scores
-         ranked_lists.each_with_index do |list, list_idx|
-           weight = weights[list_idx] || 1.0
-
-           list.each_with_index do |item, rank|
-             key = item_key(item)
-             rrf_contribution = weight / (k + rank + 1.0)
-
-             if scores.key?(key)
-               scores[key][:rrf_score] += rrf_contribution
-               scores[key][:top_rank] = [scores[key][:top_rank], rank].min
-             else
-               scores[key] = {
-                 item: item,
-                 rrf_score: rrf_contribution,
-                 top_rank: rank
-               }
-             end
-           end
-         end
-
-         # Top-rank bonus
-         scores.each_value do |entry|
-           if entry[:top_rank] == 0
-             entry[:rrf_score] += 0.05 # #1 in any list
-           elsif entry[:top_rank] <= 2
-             entry[:rrf_score] += 0.02 # #2-3 in any list
-           end
-         end
-
-         # Sort and return
-         scores.values
-           .sort_by { |e| -e[:rrf_score] }
-           .map { |e| e[:item].merge(rrf_score: e[:rrf_score]) }
-       end
-
-       def self.item_key(item)
-         # Dedupe by fact signature
-         fact = item[:fact]
-         "#{fact[:subject_name]}:#{fact[:predicate]}:#{fact[:object_literal]}"
-       end
-       private_class_method :item_key # a bare `private` would not hide a singleton method
-     end
-   end
- end
- ```
-
- **Benefits**:
- - **Mathematically sound**: Well-studied in IR literature
- - **Handles score scale differences**: BM25 vs cosine similarity
- - **Boosts multi-method matches**: Items in both lists get higher scores
- - **Preserves exact matches**: Top-rank bonus keeps strong signals at top
- - **Pure algorithm**: No dependencies, fast (<10ms)
-
- **Implementation**:
- 1. Create `lib/claude_memory/recall/rrf_fusion.rb`
- 2. Update `Recall#query_semantic_dual` to use RRF
- 3. Test with synthetic ranked lists
- 4. Validate improvements with eval suite (if we create one)
-
- **Trade-off**: Slightly more complex than naive merging, but well worth it.
-
- **Recommendation**: **ADOPT IMMEDIATELY**. Pure algorithmic improvement with proven results.
-
- ---
-
- #### 3. ⭐ Docid Short Hash System
-
- **Value**: **Better UX**, enables cross-database references without context.
-
- **QMD Implementation**:
- ```typescript
- // Generate 6-character docid from content hash
- function getDocid(hash: string): string {
-   return hash.slice(0, 6); // First 6 chars
- }
-
- // Use in output
- {
-   docid: `#${getDocid(row.hash)}`,
-   file: row.path,
-   // ...
- }
-
- // Retrieval
- qmd get "#abc123" // Works!
- qmd get "abc123"  // Also works!
- ```
-
- **Current ClaudeMemory**:
- ```ruby
- # Facts referenced by integer IDs
- claude-memory explain 42 # Which database? Which project?
- ```
-
- **Problems**:
- - Integer IDs are database-specific (global vs project)
- - Not user-friendly
- - No quick reference format
-
- **With Docids**:
- ```ruby
- # Migration v8: Add docid column
- def migrate_to_v8_safe!
-   @db.transaction do
-     @db.alter_table(:facts) do
-       add_column :docid, String, size: 8
-       add_index :docid, unique: true
-     end
-
-     # Backfill docids
-     @db[:facts].each do |fact|
-       signature = "#{fact[:id]}:#{fact[:subject_entity_id]}:#{fact[:predicate]}:#{fact[:object_literal]}"
-       hash = Digest::SHA256.hexdigest(signature)
-       docid = hash[0...8] # 8 chars for lower collision risk
-
-       # Handle collisions (rare with 8 chars)
-       while @db[:facts].where(docid: docid).count > 0
-         hash = Digest::SHA256.hexdigest(hash + rand.to_s)
-         docid = hash[0...8]
-       end
-
-       @db[:facts].where(id: fact[:id]).update(docid: docid)
-     end
-   end
- end
-
- # Usage
- claude-memory explain abc123   # Works across databases!
- claude-memory explain #abc123  # Also works!
-
- # Output formatting
- puts "Fact ##{fact[:docid]}: #{fact[:subject_name]} #{fact[:predicate]} ..."
- ```
-
- **Benefits**:
- - **Database-agnostic**: Same reference works for global/project facts
- - **User-friendly**: `#abc123` is memorable and shareable
- - **Standard pattern**: Git uses short SHAs, QMD uses short hashes
-
- **Implementation**:
- 1. Schema migration v8: Add `docid` column
- 2. Backfill existing facts
- 3. Update CLI commands to accept docids
- 4. Update MCP tools to accept docids
- 5. Update output formatting to show docids
-
- **Trade-off**:
- - Hash collisions possible (8 chars = 1 in 4.3 billion, very rare)
- - Migration backfills existing facts (one-time cost)
-
- **Recommendation**: **ADOPT IN PHASE 3**. Clear UX improvement with minimal cost.
-
- ---
-
- #### 4. ⭐ Smart Expansion Detection
-
- **Value**: **Skip unnecessary vector search** when FTS finds an exact match, saving 200-500ms per query.
-
- **QMD Implementation**:
- ```typescript
- // Check if BM25 has a strong, clear top result
- const topScore = initialFts[0]?.score ?? 0;
- const secondScore = initialFts[1]?.score ?? 0;
- const hasStrongSignal =
-   initialFts.length > 0 &&
-   topScore >= 0.85 &&
-   (topScore - secondScore) >= 0.15;
-
- if (hasStrongSignal) {
-   // Skip expensive vector search and LLM operations
-   return initialFts.slice(0, limit);
- }
- ```
-
- **QMD Data**: Saves 2-3 seconds on ~60% of queries (exact keyword matches).
-
- **Current ClaudeMemory**:
- ```ruby
- # Always run both FTS and vector search
- def query_semantic_dual(text, limit:, scope:, mode:)
-   fts_results = collect_fts_results(...)
-   vec_results = query_vector_stores(...) # Always runs
-
-   RRFusion.fuse([fts_results, vec_results])
- end
- ```
-
- **With Smart Detection**:
- ```ruby
- # lib/claude_memory/recall/expansion_detector.rb
- module ClaudeMemory
-   module Recall
-     class ExpansionDetector
-       STRONG_SCORE_THRESHOLD = 0.85
-       STRONG_GAP_THRESHOLD = 0.15
-
-       def self.should_skip_expansion?(results)
-         return false if results.empty? || results.size < 2
-
-         top_score = results[0][:score] || 0
-         second_score = results[1][:score] || 0
-         gap = top_score - second_score
-
-         top_score >= STRONG_SCORE_THRESHOLD &&
-           gap >= STRONG_GAP_THRESHOLD
-       end
-     end
-   end
- end
-
- # Apply in Recall
- def query_semantic_dual(text, limit:, scope:, mode:)
-   # First try FTS
-   fts_results = collect_fts_results(text, limit: limit * 2, scope: scope)
-
-   # Check if we can skip vector search
-   if mode == :both && ExpansionDetector.should_skip_expansion?(fts_results)
-     return fts_results.first(limit) # Strong FTS signal
-   end
-
-   # Weak signal - proceed with vector search and fusion
-   vec_results = query_vector_stores(text, limit: limit * 2, scope: scope)
-   RRFusion.fuse([fts_results, vec_results], weights: [1.0, 1.0]).first(limit)
- end
- ```
-
- **Benefits**:
- - **Performance optimization**: Avoids unnecessary vector search
- - **Simple heuristic**: Well-tested thresholds from QMD
- - **Transparent**: Can log when skipping for metrics
- - **No false negatives**: Only skips when FTS is very confident
-
- **Implementation**:
- 1. Create `lib/claude_memory/recall/expansion_detector.rb`
- 2. Update `Recall#query_semantic_dual` to use the detector
- 3. Test with known exact-match queries
- 4. Add optional metrics tracking
-
- **Trade-off**: May miss semantically similar results for exact matches (acceptable).
-
- **Recommendation**: **ADOPT IN PHASE 4**. Clear performance win with minimal code.
-
- ---
-
- ### Medium Priority (Valuable but Higher Cost)
-
- #### 5. Document Chunking Strategy
-
- **Value**: Better embeddings for long transcripts (>3000 chars).
-
- **QMD Approach**:
- - 800 tokens max, 15% overlap
- - Semantic boundary detection
- - Both token-based and char-based variants
-
- **Current ClaudeMemory**: Embeds entire fact text (typically short).
-
- **When Needed**: If users have very long transcripts that produce multi-paragraph facts.
-
- **Recommendation**: **CONSIDER** if we see performance issues with long content.
-
- ---
-
- #### 6. LLM Response Caching
-
- **Value**: Reduce API costs for repeated distillation.
-
- **QMD Proof**: Caches query expansion and reranking, achieves ~80% cache hit rate.
-
- **Implementation**:
- ```ruby
- # lib/claude_memory/distill/cache.rb
- class DistillerCache
-   def initialize(store)
-     @store = store
-   end
-
-   def fetch(content_hash)
-     @store.db[:llm_cache].where(hash: content_hash).first&.dig(:result)
-   end
-
-   def store(content_hash, result)
-     @store.db[:llm_cache].insert_or_replace(
-       hash: content_hash,
-       result: result.to_json,
-       created_at: Time.now.iso8601
-     )
-
-     # Probabilistic cleanup (1% chance)
-     cleanup_if_needed if rand < 0.01
-   end
-
-   private
-
-   def cleanup_if_needed
-     @store.db.transaction do
-       @store.db.run(<<~SQL)
-         DELETE FROM llm_cache
-         WHERE hash NOT IN (
-           SELECT hash FROM llm_cache
-           ORDER BY created_at DESC
-           LIMIT 1000
-         )
-       SQL
-     end
-   end
- end
- ```
-
- **Recommendation**: **ADOPT when distiller is fully implemented**. Clear cost savings.
-
- ---
-
- ### Low Priority (Interesting but Not Critical)
-
- #### 7. Enhanced Snippet Extraction
-
- **Value**: Better search result previews with query term highlighting.
-
- **QMD Approach**:
- ```typescript
- function extractSnippet(body: string, query: string, maxLen = 500) {
-   const terms = query.toLowerCase().split(/\s+/);
-
-   // Find line with most query term matches
-   const lines = body.split('\n');
-   let bestLine = 0, bestScore = -1;
-
-   for (let i = 0; i < lines.length; i++) {
-     const lineLower = lines[i].toLowerCase();
-     const score = terms.filter(t => lineLower.includes(t)).length;
-     if (score > bestScore) {
-       bestScore = score;
-       bestLine = i;
-     }
-   }
-
-   // Extract context (1 line before, 2 lines after)
-   const start = Math.max(0, bestLine - 1);
-   const end = Math.min(lines.length, bestLine + 3);
-   const snippet = lines.slice(start, end).join('\n');
-
-   return {
-     line: bestLine + 1,
-     snippet: snippet.substring(0, maxLen),
-     linesBefore: start,
-     linesAfter: lines.length - end
-   };
- }
- ```
-
- **Recommendation**: **CONSIDER for better UX** in search results.
-
- ---
-
- ### Features NOT to Adopt
-
- #### ❌ YAML Collection System
-
- **QMD Use**: Manages multi-directory indexing with per-path contexts.
-
- **Our Use**: Dual-database (global + project) already provides clean separation.
-
- **Mismatch**: Collections add complexity without clear benefit for our use case.
-
- **Recommendation**: **REJECT** - Our dual-DB approach is simpler and better suited.
-
- ---
-
- #### ❌ Content-Addressable Document Storage
-
- **QMD Use**: Deduplicates full markdown documents by SHA256 hash.
-
- **Our Use**: Facts are deduplicated by semantic signature, not content hash.
-
- **Mismatch**: We don't store full documents; we extract facts.
-
- **Recommendation**: **REJECT** - Different data model.
-
- ---
-
- #### ❌ Virtual Path System (qmd://collection/path)
-
- **QMD Use**: Unified namespace across multiple collections.
-
- **Our Use**: Dual-database provides a clear namespace (global vs project).
-
- **Mismatch**: Adds complexity for no clear benefit.
-
- **Recommendation**: **REJECT** - Unnecessary abstraction.
-
- ---
-
- #### ❌ Neural Embeddings (EmbeddingGemma)
-
- **QMD Use**: 300M parameter model for high-quality semantic search.
-
- **Our Use**: TF-IDF (lightweight, no dependencies).
-
- **Trade-off**:
- - ✅ Better quality (+40% Hit@3 over TF-IDF)
- - ❌ 300MB download
- - ❌ 300MB VRAM
- - ❌ 2s cold start latency
- - ❌ Complex dependency (node-llama-cpp or similar)
-
- **Decision**: **DEFER** - TF-IDF is sufficient for now. Revisit if users report poor semantic search quality.
-
- ---
-
- #### ❌ Cross-Encoder Reranking
-
- **QMD Use**: LLM scores query-document relevance for final ranking.
-
- **Our Use**: None (just use retrieval scores).
-
- **Trade-off**:
- - ✅ Better precision (elevates semantically relevant results)
- - ❌ 640MB model
- - ❌ 400ms latency per query
- - ❌ Complex dependency
-
- **Decision**: **REJECT** - Over-engineering for fact retrieval. Facts are already structured; reranking is overkill.
-
- ---
-
- #### ❌ Query Expansion (LLM)
-
- **QMD Use**: Generates alternative query phrasings for better recall.
-
- **Our Use**: None (single query only).
-
- **Trade-off**:
- - ✅ Better recall (finds documents with different terminology)
- - ❌ 2.2GB model
- - ❌ 800ms latency per query
- - ❌ Complex dependency
-
- **Decision**: **REJECT** - We don't have an LLM in the recall path (only in distill). Adding an LLM dependency for recall is too heavy.
+ ### High Priority ⭐
+
+ #### 1. Claude Code Plugin Distribution Format ⭐ NEW
+ - **Value**: 10x easier installation (single command vs multi-step gem + MCP + hook config)
+ - **Evidence**: `.claude-plugin/marketplace.json` — complete plugin spec; `skills/qmd/SKILL.md` — skill definition with tool scoping
+ - **Implementation**: Create `.claude-plugin/marketplace.json` with `mcpServers` pointing to `claude-memory serve-mcp`, a skill definition built from existing MCP tools, and `allowed-tools: mcp__claude-memory__*`
+ - **Effort**: 2-3 days (plugin metadata, skill definition, testing, documentation)
+ - **Trade-off**: Depends on Claude Code plugin ecosystem maturity; the current hooks integration may still be needed
+ - **Recommendation**: **ADOPT** — QMD proves the format works. Start with a plugin skeleton and iterate as the ecosystem matures
+ - **Integration Points**: New `.claude-plugin/` directory, `skills/` directory, update installation docs
+
+ #### 2. MCP Structured Content Pattern ⭐ NEW
+ - **Value**: Better MCP response quality — dual human-readable + machine-parseable output
+ - **Evidence**: `mcp.ts:288-291` — `return { content: [...], structuredContent: {...} }`
+ - **Implementation**: Update all 18 MCP tool handlers to return both `content` (text summary) and `structuredContent` (JSON). The text content would be a concise summary; the structured content preserves the full data.
+ - **Effort**: 1-2 days (update tool handlers, update tests)
+ - **Trade-off**: Slightly more code per tool handler; may need to verify that the Claude Code MCP client supports `structuredContent`
+ - **Recommendation**: **ADOPT** — Pure improvement, no downside if the client supports it
+ - **Integration Points**: `lib/claude_memory/mcp/server.rb`, all tool handler methods
+
+ #### 3. MCP Registered Prompt for Query Guidance ⭐ NEW
+ - **Value**: Claude uses memory tools more effectively with an embedded search strategy
+ - **Evidence**: `mcp.ts:172-252` — registered prompt explaining when to use recall vs recall_semantic vs search_concepts
+ - **Implementation**: Register a `memory_guide` prompt in our MCP server explaining tool selection strategy (recall for keywords, recall_semantic for concepts, search_concepts for multi-faceted queries, explain for provenance)
+ - **Effort**: 4-6 hours (write prompt, register in server, test)
+ - **Trade-off**: Minimal; the prompt is only loaded on request
+ - **Recommendation**: **ADOPT** — Simple way to improve tool usage quality
+ - **Integration Points**: `lib/claude_memory/mcp/server.rb`
+
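Our server is Ruby, but the registration shape is easiest to show against the TypeScript MCP SDK, which QMD uses. A sketch, assuming the SDK's `McpServer.prompt()` helper (verify the exact signature against the SDK version in use); the guide text paraphrases the tool-selection strategy listed above:

```typescript
// Sketch of registering a query-guidance prompt via the TypeScript MCP SDK.
// The prompt name and guide wording are ours; the messages shape follows MCP's
// GetPromptResult. Treat the .prompt() signature as an assumption to verify.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";

const GUIDE = `When searching memory:
- recall: keyword lookups (names, file paths, exact terms)
- recall_semantic: conceptual questions phrased in natural language
- search_concepts: multi-faceted queries touching several topics
- explain: provenance for a specific fact`;

export function registerMemoryGuide(server: McpServer): void {
  server.prompt("memory_guide", () => ({
    messages: [{ role: "user", content: { type: "text", text: GUIDE } }],
  }));
}
```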
+ #### 4. Inline Status Check in Skills ⭐ NEW
+ - **Value**: Immediate feedback on memory system health when the skill loads
+ - **Evidence**: `SKILL.md:18` — the `!` prefix runs `qmd status 2>/dev/null || echo "Not installed"`
+ - **Implementation**: Add an inline check to our skill definition: `!claude-memory doctor --brief 2>/dev/null || echo "Not configured. Run: gem install claude_memory"`
+ - **Effort**: 1-2 hours
+ - **Trade-off**: None
+ - **Recommendation**: **ADOPT** — Trivial improvement with clear benefit
+ - **Integration Points**: Skill definition file
+
+ ### Previously Identified (Carried Forward)
+
+ These items from the 2026-01-26 analysis remain relevant:
+
+ #### 5. ⭐ Native Vector Storage (sqlite-vec) — STILL CRITICAL
+ - **Value**: 10-100x faster KNN queries
+ - **Status**: Not yet implemented in ClaudeMemory
+ - **Updated Evidence**: QMD now handles 10,000+ documents in production (5,700+ star project)
+ - **Recommendation**: **ADOPT IMMEDIATELY** — Foundational improvement
+
+ #### 6. ⭐ Reciprocal Rank Fusion (RRF) Algorithm — STILL HIGH VALUE
+ - **Value**: 50% improvement in Hit@3 for medium-difficulty queries
+ - **Status**: Not yet implemented in ClaudeMemory
+ - **Recommendation**: **ADOPT IMMEDIATELY** — Pure algorithmic improvement
+
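For reference, the core of RRF is only a few lines. This sketch (hypothetical names, k=60 as in the standard formulation) shows the score accumulation that the previous analysis proposed in Ruby:

```typescript
// Minimal RRF sketch: score(d) = sum over lists of weight / (k + rank(d) + 1).
// Names are illustrative; the carried-forward plan implements this in Ruby.
function rrfFuse<T>(
  rankedLists: T[][],
  keyOf: (item: T) => string, // dedupe key, e.g. a fact signature
  weights: number[] = [],
  k = 60
): T[] {
  const scores = new Map<string, { item: T; score: number }>();

  rankedLists.forEach((list, li) => {
    const weight = weights[li] ?? 1.0;
    list.forEach((item, rank) => {
      const key = keyOf(item);
      const entry = scores.get(key) ?? { item, score: 0 };
      entry.score += weight / (k + rank + 1); // items in multiple lists accumulate
      scores.set(key, entry);
    });
  });

  return [...scores.values()]
    .sort((a, b) => b.score - a.score)
    .map((e) => e.item);
}
```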
+ #### 7. ⭐ Docid Short Hash System — STILL MEDIUM VALUE
+ - **Value**: Better UX, cross-database fact references
+ - **Status**: Not yet implemented
+ - **Recommendation**: **ADOPT IN PHASE 2**
+
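An 8-character docid is just a truncated SHA-256 of a stable fact signature. A sketch using Node's crypto module; the signature fields mirror the earlier Ruby sketch and are assumptions, not settled schema:

```typescript
// Sketch: derive a short, database-agnostic fact reference from a content hash.
// The signature format is assumed, mirroring the previous analysis's Ruby sketch.
import { createHash } from "node:crypto";

function docidFor(subject: string, predicate: string, object: string): string {
  const signature = `${subject}:${predicate}:${object}`;
  // First 8 hex chars; render to users as "#<docid>", like a short git SHA.
  return createHash("sha256").update(signature).digest("hex").slice(0, 8);
}
```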
+ #### 8. ⭐ Smart Expansion Detection — STILL MEDIUM VALUE
+ - **Value**: Skip unnecessary vector search when FTS has a strong signal
+ - **Status**: Not yet implemented
+ - **Recommendation**: **ADOPT IN PHASE 3**
+
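The detector itself is a two-threshold heuristic over the FTS score distribution. A sketch using QMD's published thresholds (0.85 top score, 0.15 gap); the function name is ours:

```typescript
// Sketch: skip expansion/vector search when BM25 already has a clear winner.
// Thresholds come from the QMD code quoted earlier; the name is illustrative.
interface Scored { score: number }

function shouldSkipExpansion(ftsResults: Scored[]): boolean {
  if (ftsResults.length < 2) return false; // need a runner-up to measure the gap
  const [top, second] = ftsResults;
  return top.score >= 0.85 && top.score - second.score >= 0.15;
}
```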
+ ### Medium Priority
+
+ #### 9. Skill Definition with Tool Scoping
+ - **Value**: Security and UX — limit tool access to memory-related commands
+ - **Evidence**: `SKILL.md:9` — `allowed-tools: Bash(qmd:*), mcp__qmd__*`
+ - **Implementation**: Define a skill with `allowed-tools: Bash(claude-memory:*), mcp__claude-memory__*`
+ - **Effort**: Included in the plugin distribution work
+ - **Recommendation**: **CONSIDER** — Good practice for plugin security
+ - **Integration Points**: Skills directory
+
+ #### 10. Evaluation Harness Improvements
+ - **Value**: QMD's eval structure with difficulty levels and Hit@K metrics is cleaner
+ - **Evidence**: `test/eval-harness.ts:11-16` — typed queries with difficulty + description
+ - **Implementation**: We already have DevMemBench (which is more comprehensive); we could adopt the difficulty classification.
+ - **Recommendation**: **CONSIDER** — Our evals are already better; we could add difficulty labels
+
+ ### Low Priority
+
+ #### 11. YAML-Based Collection Configuration
+ - **Value**: User-editable config for what gets indexed
+ - **Evidence**: `collections.ts`, `example-index.yml`
+ - **Recommendation**: **REJECT** — Our dual-database provides cleaner separation
+
+ #### 12. Custom Query Expansion Model
+ - **Value**: Better search recall via ML-powered query rewriting
+ - **Evidence**: `finetune/` — complete training pipeline
+ - **Recommendation**: **REJECT** — Too heavy (1.7B model) for our fact retrieval use case. If we need expansion, we can leverage Claude's own capabilities during recall.
+
+ #### 13. LLM-Based Reranking
+ - **Value**: Better ranking precision
+ - **Recommendation**: **REJECT** — Over-engineering for structured fact retrieval
+
+ ### Features to Avoid
+
+ #### 1. Heavy Local LLM Dependencies
+ - **What It Is**: Three GGUF models totaling ~2GB for search operations
+ - **Why Avoid**: ClaudeMemory targets lightweight, instant search. A 2-3s cold start and 3GB of memory is inappropriate for a fact lookup tool.
+ - **Our Alternative**: FastEmbed (67MB ONNX, <100ms) provides adequate semantic search for structured facts.
+
+ #### 2. Content-Addressable Document Storage
+ - **What It Is**: SHA256 hash-based deduplication of full documents
+ - **Why Avoid**: We store facts, not documents. Our deduplication is by fact signature.
+ - **Our Alternative**: Existing fact signature-based deduplication.

  ---

  ## Implementation Recommendations

- ###
+ ### Phase 1: Plugin Foundation (NEW)

- **Goal**: Adopt sqlite-vec and RRF fusion for performance and quality.
+ **Goals**: Establish ClaudeMemory as a Claude Code plugin with improved MCP output

  **Tasks**:
- 7. Test migration on existing databases
- 8. Document extension installation in README
-
- **Expected Impact**:
- - 10-100x faster vector search
- - 50% better hybrid search quality (Hit@3)
- - Scales to 50,000+ facts
-
- **Effort**: 2-3 days
+ - [ ] Create `.claude-plugin/marketplace.json` with plugin metadata
+ - [ ] Create a skill definition with tool scoping and an inline health check
+ - [ ] Add the MCP structured content pattern to all 18 tool handlers
+ - [ ] Register the query guidance prompt in the MCP server
+ - [ ] Test the plugin installation workflow
+ - [ ] Update installation docs

+ **Success Criteria**:
+ - ClaudeMemory installable via `claude plugin add`
+ - MCP tools return both text summaries and structured JSON
+ - Query guide prompt available via MCP
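Pulling the skill-related tasks above together, the skill definition could look roughly like the following. The frontmatter fields and the `!` status line are composed from the evidence cited earlier in this analysis (`SKILL.md:9`, `SKILL.md:18`, and the item 4 and item 9 implementation notes); treat the overall layout as a sketch, not a verified schema:

```markdown
---
name: claude-memory
description: Recall and manage facts from the claude_memory knowledge base
allowed-tools: Bash(claude-memory:*), mcp__claude-memory__*
---

Status: !claude-memory doctor --brief 2>/dev/null || echo "Not configured. Run: gem install claude_memory"

Use mcp__claude-memory__recall for keyword lookups and
mcp__claude-memory__recall_semantic for conceptual queries.
```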

- **Goal**: Adopt docid hashes and smart detection for better UX and performance.
-
- **Tasks**:
- 1. Create schema migration v8 for `docid` column
- 2. Backfill existing facts with docids
- 3. Update CLI commands (`ExplainCommand`, `RecallCommand`) to accept docids
- 4. Update MCP tools to accept docids
- 5. Update output formatting to show docids
- 6. Implement `Recall::ExpansionDetector` class
- 7. Update `Recall#query_semantic_dual` to use detector
- 8. Add optional metrics tracking (skip rate, avg latency)
-
- **Expected Impact**:
- - Better UX (human-friendly fact references)
- - 200-500ms latency reduction on exact matches
- - Cross-database references without context
-
- **Effort**: 1-2 days
+ **Risks**: The plugin ecosystem may change; maintain backward compatibility with the manual setup

  ---

+ ### Phase 2: Vector Storage Upgrade (CARRIED FORWARD)

- […]
+ **Goals**: Adopt sqlite-vec for native KNN and RRF fusion for search quality

  **Tasks**:
- 6. Implement chunking strategy if needed
+ - [ ] Add sqlite-vec extension support
+ - [ ] Schema migration for `facts_vec` virtual table (two-step query pattern)
+ - [ ] Implement `Recall::RRFusion` class
+ - [ ] Backfill existing embeddings
+ - [ ] Benchmark: target a 10x KNN improvement

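For the `facts_vec` two-step pattern, the usual shape is: run KNN against the virtual table first, then hydrate full rows by id. A sketch assuming better-sqlite3 with the sqlite-vec extension loaded; the table and column names are ours, not a settled schema, and the exact KNN query syntax should be checked against the sqlite-vec version in use:

```typescript
// Sketch of sqlite-vec's two-step query pattern (KNN ids first, then hydrate).
// Assumes better-sqlite3 with sqlite-vec loaded; names are illustrative.
import Database from "better-sqlite3";

function knnFacts(db: Database.Database, queryVec: Float32Array, k = 10) {
  // Step 1: nearest-neighbor search over the vec0 virtual table.
  const hits = db
    .prepare(
      `SELECT rowid, distance FROM facts_vec
       WHERE embedding MATCH ? ORDER BY distance LIMIT ?`
    )
    .all(Buffer.from(queryVec.buffer), k) as { rowid: number; distance: number }[];

  // Step 2: hydrate full fact rows by id from the ordinary facts table.
  const byId = db.prepare(`SELECT * FROM facts WHERE id = ?`);
  return hits.map((h) => ({ ...(byId.get(h.rowid) as object), distance: h.distance }));
}
```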
- […]
-
- **Effort**: 2-3 days
-
- ---
-
- ### Testing Strategy
-
- **Unit Tests**:
- - RRFusion algorithm with synthetic ranked lists
- - ExpansionDetector with various score distributions
- - Docid generation and collision handling
- - sqlite-vec migration (up and down)
-
- **Integration Tests**:
- - End-to-end hybrid search with RRF fusion
- - Cross-database docid lookups
- - Cache hit/miss behavior
- - Smart detection skip rate
-
- **Evaluation Suite** (optional but recommended):
- - Create synthetic fact corpus with known relationships
- - Define easy/medium/hard recall queries
- - Measure Hit@K before/after RRF adoption
- - Track latency improvements from smart detection
-
- **Performance Tests**:
- - Benchmark vector search: JSON vs sqlite-vec
- - Measure RRF overhead (<10ms expected)
- - Profile smart detection accuracy
+ **Success Criteria**:
+ - Vector search uses native sqlite-vec
+ - RRF fusion active for hybrid queries
+ - DevMemBench shows improved retrieval metrics

  ---

- ###
+ ### Phase 3: UX Polish (CARRIED FORWARD)

- - Always use transactions for atomicity
- - Provide rollback path (down migration)
- - Test on copy of production database first
- - Backup before running migrations
+ **Goals**: Docid hashes and smart expansion detection

- […]
-
- **Rollback Plan**:
- - Keep JSON embeddings column until v7 is stable
- - Provide `migrate_down_to_v6` method
- - Document rollback procedure in CHANGELOG
+ **Tasks**:
+ - [ ] Schema migration for `docid` column (8-char hash)
+ - [ ] Implement `Recall::ExpansionDetector`
+ - [ ] Update CLI and MCP tools for docid support

  ---

  ## Architecture Decisions

- ###
-
- **1. Fact-Based Knowledge Graph**
-
- **What**: Subject-predicate-object triples vs full document storage.
-
- **Why Keep**:
- - Enables structured queries ("What databases does X use?")
- - Supports inference (supersession, conflicts)
- - More precise than document-level retrieval
-
- **Don't Adopt**: QMD's document-centric model.
-
- ---
-
- **2. Truth Maintenance System**
+ ### What to Preserve

+ - **Fact-Based Knowledge Graph**: Our structured triples are fundamentally different from (and better suited for knowledge extraction than) QMD's document storage
+ - **Truth Maintenance**: Supersession + conflict resolution is a core differentiator
+ - **Dual-Database Architecture**: Cleaner than YAML collections for our use case
+ - **Lightweight Dependencies**: Ruby gem + ONNX embeddings vs 2GB+ GGUF models

- - Resolves contradictions automatically
- - Distinguishes single-value vs multi-value predicates
- - Provides evidence chain via provenance
+ ### What to Adopt (NEW)

+ - **Plugin Distribution Format**: `.claude-plugin/marketplace.json` + skills for frictionless installation
+ - **Structured MCP Content**: Dual `content`/`structuredContent` responses for all tools
+ - **MCP Query Guide Prompt**: A registered prompt teaching Claude how to use memory tools effectively
+ - **Inline Status Checks**: Skill-level health verification on load

+ ### What to Adopt (CARRIED FORWARD)

+ - **sqlite-vec Native Vectors**: 10-100x faster KNN (critical)
+ - **RRF Fusion**: 50% search quality improvement (critical)
+ - **Docid Short Hashes**: Better UX for fact references
+ - **Smart Expansion Detection**: Skip vector search when FTS is confident

- - Clean separation of concerns
- - Better than YAML collections for our use case
- - Simpler queries (no project_path filtering)
+ ### What to Reject

- […]
+ - **Local LLM Models for Search**: Too heavy (2GB+, 3s cold start)
+ - **Custom Fine-Tuned Models**: The training pipeline is impressive but overkill for fact retrieval
+ - **YAML Collection System**: Our dual-DB is better for our use case
+ - **Content-Addressable Storage**: Different data model
+ - **Virtual Path System**: Unnecessary for fact-based storage

  ---

+ ## Key Takeaways

+ ### Main Learnings

- - Fast installation (<5MB)
- - No heavy models required
- - Works offline for core features
+ 1. **Plugin distribution is the future**: QMD's marketplace plugin reduces installation from "read docs, install gem, configure MCP, set up hooks, restart Claude" to one command. This is the single most impactful UX improvement we should adopt.

- - ✅ sqlite-vec (small, well-maintained)
- - ❌ Neural embeddings (300MB, complex)
- - ❌ LLM reranking (640MB, complex)
+ 2. **Structured MCP responses matter**: Returning both a text summary and structured JSON is a simple pattern that significantly improves how Claude consumes tool output.

+ 3. **Fine-tuned models for specific tasks work**: QMD's two-stage SFT→GRPO pipeline for query expansion is state-of-the-art. While we shouldn't adopt the models themselves (too heavy), the reward function design and structured output routing are good reference patterns.

+ 4. **Eval methodology with difficulty levels**: QMD's easy/medium/hard query classification provides a clearer signal about where improvements matter. Our DevMemBench is more comprehensive but could benefit from this labeling.

+ 5. **The previous QMD analysis recommendations remain valid**: sqlite-vec, RRF, docids, and smart expansion are still unimplemented and still valuable.

- - Industry standard (used by Chroma, LanceDB, etc.)
- - 10-100x performance improvement
- - Enables larger databases
- - Well-maintained, cross-platform
+ ### Recommended Adoption Order

+ 1. **First**: Plugin distribution format — highest UX impact, unblocks ecosystem adoption
+ 2. **Second**: MCP structured content + query guide prompt — low effort, immediate quality gain
+ 3. **Third**: sqlite-vec + RRF fusion — foundational performance and quality
+ 4. **Fourth**: Docids + smart expansion — polish and optimization

+ ### Expected Impact

+ - **Installation**: 10x easier (single command vs multi-step)
+ - **MCP Quality**: Better Claude tool usage with structured responses + query guidance
+ - **Search Performance**: 10-100x faster KNN (sqlite-vec), 50% better Hit@3 (RRF)
+ - **UX**: Human-friendly fact references (#abc123de), smarter search skipping

- - Mathematically sound
- - Proven results (50% improvement)
- - Pure algorithm (no dependencies)
- - Fast (<10ms overhead)
+ ### Next Actions

+ - [ ] Review plugin distribution feasibility (check the Claude Code plugin spec)
+ - [ ] Implement the MCP structured content pattern (quick win)
+ - [ ] Register the query guide MCP prompt (quick win)
+ - [ ] Continue with the sqlite-vec + RRF adoption plan from the previous analysis
+ - [ ] Store analysis findings in memory

  ---

- **3. Docid Short Hash System**
-
- **Why Adopt**:
- - Standard pattern (Git, QMD, etc.)
- - Better UX for CLI tools
- - Cross-database references
-
- **Implementation**: Phase 2 (near-term).
-
- ---
-
- **4. Smart Expansion Detection**
-
- **Why Adopt**:
- - Clear performance win
- - Simple heuristic
- - No downsides (only skips when confident)
-
- **Implementation**: Phase 2 (near-term).
-
- ---
-
- ### Reject Due to Cost/Benefit
-
- **1. Neural Embeddings**
-
- **Cost**: 300MB download, 2s latency, complex dependency.
-
- **Benefit**: Better semantic search quality.
-
- **Decision**: DEFER - TF-IDF sufficient for now.
-
- ---
-
- **2. LLM Reranking**
-
- **Cost**: 640MB model, 400ms latency per query.
-
- **Benefit**: Better ranking precision.
-
- **Decision**: REJECT - Over-engineering for structured facts.
-
- ---
-
- **3. Query Expansion**
-
- **Cost**: 2.2GB model, 800ms latency per query.
-
- **Benefit**: Better recall with alternative phrasings.
-
- **Decision**: REJECT - No LLM in recall path, too heavy.
-
- ---
-
- ## Conclusion
-
- QMD demonstrates **state-of-the-art hybrid search** with impressive quality improvements (50%+ over BM25). However, it achieves this through heavy dependencies (3GB+ models) that may not be appropriate for all use cases.
-
- **Key Takeaways**:
-
- 1. **sqlite-vec is essential**: Native vector storage is 10-100x faster. This is a must-adopt.
-
- 2. **RRF fusion is proven**: 50% quality improvement with zero dependencies. This is a must-adopt.
-
- 3. **Smart optimizations matter**: Expansion detection saves 200-500ms on 60% of queries. This is worth adopting.
-
- 4. **Neural models are costly**: 3GB+ models provide better quality but at significant cost. Defer for now.
-
- 5. **Architecture matters**: QMD's document model differs from our fact model. Adopt algorithms, not architecture.
-
- **Recommended Adoption Order**:
-
- 1. **Immediate**: sqlite-vec + RRF fusion (performance foundation)
- 2. **Near-term**: Docids + smart detection (UX + optimization)
- 3. **Future**: LLM caching + chunking (cost reduction)
- 4. **Defer**: Neural embeddings (wait for user feedback)
- 5. **Reject**: LLM reranking + query expansion (over-engineering)
+ ## References

+ - **Repository**: https://github.com/tobi/qmd
+ - **Previous Analysis**: docs/influence/qmd.md (2026-01-26)
+ - **Claude Code Plugins**: https://code.claude.com/docs/en/plugins.md
+ - **MCP Spec**: https://modelcontextprotocol.io
+ - **sqlite-vec**: https://github.com/asg017/sqlite-vec
+ - **RRF Paper**: Cormack, Clarke & Buettcher, "Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods" (SIGIR 2009)

  ---

- *
+ *Analysis completed: 2026-02-02*
+ *Analyst: Claude Code*
+ *Review Status: Draft — Updated from 2026-01-26 analysis with new findings on plugin distribution, fine-tuned models, and MCP patterns*