claude_memory 0.4.0 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (50)
  1. checksums.yaml +4 -4
  2. data/.claude/CLAUDE.md +1 -1
  3. data/.claude/rules/claude_memory.generated.md +14 -1
  4. data/.claude/skills/check-memory/SKILL.md +10 -0
  5. data/.claude/skills/improve/SKILL.md +12 -1
  6. data/.claude-plugin/plugin.json +1 -1
  7. data/CHANGELOG.md +70 -0
  8. data/db/migrations/008_add_provenance_line_range.rb +21 -0
  9. data/db/migrations/009_add_docid.rb +39 -0
  10. data/db/migrations/010_add_llm_cache.rb +30 -0
  11. data/docs/improvements.md +72 -1084
  12. data/docs/influence/claude-supermemory.md +498 -0
  13. data/docs/influence/qmd.md +424 -2022
  14. data/docs/quality_review.md +64 -705
  15. data/lib/claude_memory/commands/doctor_command.rb +45 -4
  16. data/lib/claude_memory/commands/explain_command.rb +11 -6
  17. data/lib/claude_memory/commands/stats_command.rb +1 -1
  18. data/lib/claude_memory/core/fact_graph.rb +122 -0
  19. data/lib/claude_memory/core/fact_query_builder.rb +34 -14
  20. data/lib/claude_memory/core/fact_ranker.rb +3 -20
  21. data/lib/claude_memory/core/relative_time.rb +45 -0
  22. data/lib/claude_memory/core/result_sorter.rb +2 -2
  23. data/lib/claude_memory/core/rr_fusion.rb +57 -0
  24. data/lib/claude_memory/core/snippet_extractor.rb +97 -0
  25. data/lib/claude_memory/domain/fact.rb +3 -1
  26. data/lib/claude_memory/index/index_query.rb +2 -0
  27. data/lib/claude_memory/index/lexical_fts.rb +18 -0
  28. data/lib/claude_memory/infrastructure/operation_tracker.rb +7 -21
  29. data/lib/claude_memory/infrastructure/schema_validator.rb +30 -25
  30. data/lib/claude_memory/ingest/content_sanitizer.rb +8 -1
  31. data/lib/claude_memory/ingest/ingester.rb +67 -56
  32. data/lib/claude_memory/ingest/tool_extractor.rb +1 -1
  33. data/lib/claude_memory/ingest/tool_filter.rb +55 -0
  34. data/lib/claude_memory/logging/logger.rb +112 -0
  35. data/lib/claude_memory/mcp/query_guide.rb +96 -0
  36. data/lib/claude_memory/mcp/response_formatter.rb +86 -23
  37. data/lib/claude_memory/mcp/server.rb +34 -4
  38. data/lib/claude_memory/mcp/text_summary.rb +257 -0
  39. data/lib/claude_memory/mcp/tool_definitions.rb +20 -4
  40. data/lib/claude_memory/mcp/tools.rb +133 -120
  41. data/lib/claude_memory/publish.rb +12 -2
  42. data/lib/claude_memory/recall/expansion_detector.rb +44 -0
  43. data/lib/claude_memory/recall.rb +93 -41
  44. data/lib/claude_memory/resolve/resolver.rb +72 -40
  45. data/lib/claude_memory/store/sqlite_store.rb +99 -24
  46. data/lib/claude_memory/sweep/sweeper.rb +6 -0
  47. data/lib/claude_memory/version.rb +1 -1
  48. data/lib/claude_memory.rb +21 -0
  49. metadata +14 -2
  50. data/docs/remaining_improvements.md +0 -330
@@ -1,24 +1,9 @@
- # QMD Analysis: Quick Markdown Search
+ # QMD Analysis: Quick Markdown Search (Updated)
 
- *Analysis Date: 2026-01-26*
- *QMD Version: Latest (commit-based, actively developed)*
+ *Analysis Date: 2026-02-02*
+ *Previous Analysis: 2026-01-26*
  *Repository: https://github.com/tobi/qmd*
-
- ---
-
- ## Table of Contents
-
- 1. [Executive Summary](#executive-summary)
- 2. [Architecture Overview](#architecture-overview)
- 3. [Database Schema Analysis](#database-schema-analysis)
- 4. [Search Pipeline Deep-Dive](#search-pipeline-deep-dive)
- 5. [Vector Search Implementation](#vector-search-implementation)
- 6. [LLM Infrastructure](#llm-infrastructure)
- 7. [Performance Characteristics](#performance-characteristics)
- 8. [Comparative Analysis](#comparative-analysis)
- 9. [Adoption Opportunities](#adoption-opportunities)
- 10. [Implementation Recommendations](#implementation-recommendations)
- 11. [Architecture Decisions](#architecture-decisions)
+ *Version/Commit: 63028fd (latest main)*
 
  ---
 
@@ -26,2170 +11,587 @@
 
 ### Project Purpose
 
- QMD (Quick Markdown Search) is an **on-device markdown search engine** optimized for knowledge workers and AI agents. It combines lexical search (BM25), vector embeddings, and LLM reranking to provide high-quality document retrieval without cloud dependencies.
-
- **Target Users**: Developers, researchers, knowledge workers using markdown for notes, documentation, and personal knowledge management.
+ QMD (Quick Markdown Search) is an **on-device search engine** for markdown knowledge bases, notes, meeting transcripts, and documentation. It combines BM25 full-text search, vector semantic search, and LLM re-ranking, all running locally via node-llama-cpp with GGUF models.
 
 ### Key Innovation
 
- QMD's primary innovation is **position-aware score blending** in hybrid search:
+ QMD's standout innovations since the last analysis:
 
- ```typescript
- // Top results favor retrieval scores, lower results favor reranking
- const weights = rank <= 3
-   ? { retrieval: 0.75, reranker: 0.25 }
-   : rank <= 10
-     ? { retrieval: 0.60, reranker: 0.40 }
-     : { retrieval: 0.40, reranker: 0.60 };
- ```
+ 1. **Custom fine-tuned query expansion model** (`qmd-query-expansion-1.7B`): A Qwen3-1.7B model trained with SFT + GRPO (reinforcement learning) specifically for structured search query expansion. Produces typed outputs (`lex:`, `vec:`, `hyde:`) that route to different search backends.
+
+ 2. **Claude Code plugin ecosystem**: QMD ships as a Claude Code marketplace plugin (`.claude-plugin/marketplace.json`) with skills, MCP server integration, and inline status checks.
 
- This approach trusts BM25+vector fusion for strong signals while using LLM reranking to elevate semantically relevant results that lexical search missed.
+ 3. **Session-scoped LLM management** (`ILLMSession`): Structured lifecycle for LLM resources with abort signals, timeout management, and clean disposal.
 
 ### Technology Stack
 
- - **Runtime**: Bun (JavaScript/TypeScript)
- - **Database**: SQLite with sqlite-vec extension
- - **Embeddings**: EmbeddingGemma (300M params, 300MB)
- - **LLM**: node-llama-cpp (local GGUF models)
- - **Vector Search**: sqlite-vec virtual tables with cosine distance
- - **Full-Text Search**: SQLite FTS5 with Porter stemming
+ - **Runtime**: Bun >= 1.0.0 (TypeScript)
+ - **Database**: SQLite with sqlite-vec extension (cosine distance)
+ - **Full-Text Search**: SQLite FTS5 with Porter tokenization
+ - **Embeddings**: EmbeddingGemma-300M (GGUF, ~300MB)
+ - **Reranking**: Qwen3-Reranker-0.6B (GGUF, ~640MB)
+ - **Query Expansion**: qmd-query-expansion-1.7B (custom fine-tuned, ~1.1GB)
+ - **MCP**: @modelcontextprotocol/sdk with stdio transport
+ - **Validation**: Zod v4 for MCP tool input schemas
+ - **Config**: YAML-based collection management (`~/.config/qmd/index.yml`)
 
 ### Production Readiness
 
- - **Active Development**: Frequent commits, responsive maintainer
- - **Comprehensive Tests**: eval.test.ts with 24 known-answer queries
- - **Quality Metrics**: 50%+ Hit@3 improvement over BM25-only
- - **Battle-Tested**: Used by maintainer for personal knowledge base
-
- ### Evaluation Results
-
- From `eval.test.ts` (24 queries across 4 difficulty levels):
-
- | Query Type | BM25 Hit@3 | Vector Hit@3 | Hybrid Hit@3 | Improvement |
- |------------|------------|--------------|--------------|-------------|
- | Easy (exact keywords) | ≥80% | ≥60% | ≥80% | BM25 sufficient |
- | Medium (semantic) | ≥15% | ≥40% | ≥50% | **+233%** over BM25 |
- | Hard (vague) | ≥15% @ H@5 | ≥30% @ H@5 | ≥35% @ H@5 | **+133%** over BM25 |
- | Fusion (multi-signal) | ~15% | ~30% | ≥50% | **+233%** over BM25 |
- | **Overall** | ≥40% | ≥50% | ≥60% | **+50%** over BM25 |
-
- Key insight: **Hybrid RRF fusion outperforms both methods alone**, especially on queries requiring both lexical precision and semantic understanding.
+ - **Maturity**: Beta, actively developed, 5,700+ GitHub stars
+ - **Test Coverage**: Unit tests (store.test.ts, mcp.test.ts), eval harness (18 queries across 3 difficulty levels)
+ - **Documentation**: Comprehensive README, CLAUDE.md, inline code docs
+ - **Community**: 257 forks, 29 issues, 17 PRs, active maintainer (Tobi Lütke)
+ - **Plugin Distribution**: Available via Claude Code marketplace
 
 ---
 
 ## Architecture Overview
 
- ### Data Model Comparison
-
- | Aspect | QMD | ClaudeMemory |
- |--------|-----|--------------|
- | **Granularity** | Full markdown documents | Structured facts (triples) |
- | **Storage** | Content-addressable (SHA256 hash) | Entity-predicate-object |
- | **Deduplication** | Per-document (by content hash) | Per-fact (by signature) |
- | **Retrieval Goal** | Find relevant documents | Find specific facts |
- | **Truth Model** | All documents valid | Supersession + conflicts |
- | **Scope** | YAML collections | Dual-database (global/project) |
-
- **Philosophical Difference**:
- - **QMD**: "Show me documents about X" (conversation recall)
- - **ClaudeMemory**: "What do we know about X?" (knowledge extraction)
-
- ### Storage Strategy
+ ### Data Model
 
- QMD uses **content-addressable storage** with a virtual filesystem layer:
+ QMD uses content-addressable storage with a virtual filesystem layer:
 
 ```
- content table (SHA256 hash → document body)
+ content table (SHA256 hash → document body, deduplication)
 
- documents table (collection, path, title → hash)
+ documents table (collection, path, title → hash, soft-delete via active flag)
 
- Virtual paths: qmd://collection/path/to/file.md
- ```
-
- Benefits:
- - Automatic deduplication (same content = single storage)
- - Fast change detection (hash comparison)
- - Virtual namespace decoupled from filesystem
-
- Trade-offs:
- - More complex than direct file storage
- - Hash collisions possible (mitigated by SHA256)
-
- ### Collection System
-
- QMD uses YAML configuration for multi-collection indexing:
-
- ```yaml
- # ~/.config/qmd/index.yml
- global_context: "Personal knowledge base for software development"
-
- collections:
-   notes:
-     path: /Users/name/notes
-     pattern: "**/*.md"
-     context:
-       /: "General notes"
-       /work: "Work-related notes and documentation"
-       /personal: "Personal projects and ideas"
-
-   docs:
-     path: /Users/name/Documents
-     pattern: "**/*.md"
- ```
-
- **Context Inheritance**: File at `/work/projects/api.md` inherits:
- 1. Global context
- 2. `/` context (general notes)
- 3. `/work` context (work-related)
-
- This provides semantic metadata for LLM operations without storing it per-document.
-
- ### Lifecycle Diagram
-
- ```
- ┌─────────────┐
- │ Index Files │ (qmd index <collection>)
- └──────┬──────┘
-
-
- ┌─────────────────────────────────────────────────────────┐
- │ 1. Hash content (SHA256)                                │
- │ 2. INSERT OR IGNORE into content table                  │
- │ 3. INSERT/UPDATE documents table (collection, path → hash)│
- │ 4. FTS5 trigger auto-indexes title + body               │
- └──────┬──────────────────────────────────────────────────┘
-
-
- ┌──────────────┐
- │ Embed        │ (qmd embed <collection>)
- └──────┬───────┘
-
-
- ┌─────────────────────────────────────────────────────────┐
- │ 1. Chunk document (800 tokens, 15% overlap)             │
- │ 2. Generate embeddings (EmbeddingGemma 384-dim)         │
- │ 3. INSERT into content_vectors + vectors_vec            │
- └──────┬──────────────────────────────────────────────────┘
-
-
- ┌──────────────┐
- │ Search       │ (qmd query "concept")
- └──────┬───────┘
-
-
- ┌─────────────────────────────────────────────────────────┐
- │ Mode: search  → BM25 only (fast)                        │
- │ Mode: vsearch → Vector only (semantic)                  │
- │ Mode: query   → Hybrid pipeline (BM25 + vec + rerank)   │
- └──────┬──────────────────────────────────────────────────┘
-
-
- ┌──────────────┐
- │ Retrieve     │ (qmd get <path | #docid>)
- └──────────────┘
+ documents_fts (FTS5 full-text index, auto-synced via triggers)
+
+ content_vectors (chunk metadata: hash, seq, pos, model)
+
+ vectors_vec (sqlite-vec native KNN index, cosine distance)
+
+ llm_cache (hash-keyed deterministic response cache)
 ```
 
- ---
+ ### Key Design Patterns
 
- ## Database Schema Analysis
+ 1. **Content-Addressable Storage**: `content` table deduplicates by SHA256 hash — multiple documents with identical content share one row (`store.ts:440-450`)
 
- ### Core Tables
+ 2. **Two-Step Vector Query**: JOINs with sqlite-vec virtual tables hang indefinitely. QMD enforces separate queries for vec lookup and metadata join (`store.ts:1912-1915`):
+ ```typescript
+ // Step 1: KNN from vec table
+ const vecResults = db.prepare(
+   `SELECT hash_seq, distance FROM vectors_vec WHERE embedding MATCH ? AND k = ?`
+ ).all(embedding, limit * 3);
+ // Step 2: Join with documents separately
+ ```
 
- #### 1. `content` - Content-Addressable Storage
+ 3. **YAML-Based Collection Config**: Collections migrated from SQLite foreign keys to `~/.config/qmd/index.yml` for easier user management. Schema migration in `migrate-schema.ts` handled the transition.
 
- ```sql
- CREATE TABLE content (
-   hash TEXT PRIMARY KEY,    -- SHA256 of document body
-   doc TEXT NOT NULL,        -- Full markdown content
-   created_at TEXT NOT NULL  -- ISO timestamp
- );
- ```
+ 4. **Hierarchical Context System**: Context descriptions inherit along path hierarchy — a file at `/work/projects/api.md` gets global context + `/` context + `/work` context concatenated (`collections.ts:94-113`)
 
- **Design Pattern**: Hash-keyed blob storage for automatic deduplication.
+ 5. **Probabilistic Cache Cleanup**: 1% chance per query to prune LLM cache to latest 1000 entries (`store.ts:804-807`)
 
- **Key Insight**: Multiple documents with identical content share one storage entry.
+ 6. **Lazy Model Singleton**: LLM models lazy-load on first use, stay resident in memory, and unload contexts after a 2-minute idle period (`llm.ts:920-951`)
 
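The snippet in item 2 above stops at the KNN lookup; the second step can be sketched as a pure query builder, using the `content_vectors`/`documents` join shape described elsewhere in this analysis (a sketch, not QMD's exact code — `buildJoinQuery` is an illustrative name):

```typescript
// Sketch of the elided step 2: build the *separate* metadata-join query for
// the KNN hits from step 1. The join is never combined with the vec lookup.
interface VecHit {
  hash_seq: string;
  distance: number;
}

function buildJoinQuery(hits: VecHit[]): { sql: string; params: string[] } {
  const params = hits.map((h) => h.hash_seq);
  const placeholders = params.map(() => "?").join(", ");
  const sql =
    `SELECT cv.hash, cv.seq, d.path, d.title ` +
    `FROM content_vectors cv ` +
    `JOIN documents d ON d.hash = cv.hash ` +
    `WHERE cv.hash || '_' || cv.seq IN (${placeholders})`;
  return { sql, params };
}
```

Running the returned statement in a second `db.prepare(...).all(...params)` call keeps the sqlite-vec virtual table out of the JOIN entirely.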
- #### 2. `documents` - Virtual Filesystem
+ ### Module Organization
 
- ```sql
- CREATE TABLE documents (
-   id INTEGER PRIMARY KEY,
-   collection TEXT NOT NULL,  -- Collection name (from YAML)
-   path TEXT NOT NULL,        -- Relative path within collection
-   title TEXT NOT NULL,       -- Extracted from first H1/H2
-   hash TEXT NOT NULL,        -- Foreign key to content.hash
-   created_at TEXT NOT NULL,
-   modified_at TEXT NOT NULL,
-   active INTEGER DEFAULT 1,  -- Soft delete flag
-   UNIQUE(collection, path)
- );
 ```
-
- **Virtual Path Construction**: `qmd://{collection}/{path}`
-
- Example: `qmd://notes/work/api-design.md`
-
- #### 3. `documents_fts` - Full-Text Search Index
-
- ```sql
- CREATE VIRTUAL TABLE documents_fts USING fts5(
-   title,  -- Weighted heavily (10.0)
-   body,   -- Standard weight (1.0)
-   tokenize = 'porter unicode61'
- );
-
- -- Auto-sync trigger on documents INSERT/UPDATE/DELETE
- -- Copies title + body from content table via hash join
+ qmd/
+ ├── src/
+ │   ├── qmd.ts           # CLI entry point (~750 lines, lazy-loaded store)
+ │   ├── store.ts         # Core store: schema, search, indexing (~2400 lines)
+ │   ├── mcp.ts           # MCP server: 6 tools + resource + prompt (~626 lines)
+ │   ├── llm.ts           # LLM abstraction: embed, rerank, expand (~1208 lines)
+ │   ├── collections.ts   # YAML config management (~390 lines)
+ │   ├── store.test.ts    # Comprehensive store unit tests
+ │   └── mcp.test.ts      # MCP integration tests
+ ├── finetune/            # Query expansion model training pipeline
+ │   ├── reward.py        # Multi-dimensional reward function (5 dimensions, 120 pts)
+ │   ├── train.py         # Unified SFT + GRPO training
+ │   ├── eval.py          # Model evaluation with scoring
+ │   └── jobs/            # HuggingFace Jobs wrappers
+ ├── test/
+ │   └── eval-harness.ts  # Search quality evaluation (18 queries)
+ ├── skills/qmd/          # Claude Code plugin skill definition
+ └── .claude-plugin/      # Marketplace distribution metadata
 ```
 
- **BM25 Scoring**: Lower scores are better (distance metric).
+ ### Comparison with ClaudeMemory
 
- **Tokenization**: Porter stemming for English, unicode61 for international characters.
+ | Aspect | QMD | ClaudeMemory | Notes |
+ |--------|-----|--------------|-------|
+ | **Data Model** | Full markdown documents | Structured fact triples | Different paradigms: recall vs extraction |
+ | **Storage** | SQLite + sqlite-vec (native vectors) | SQLite + JSON embeddings | QMD has 10-100x faster KNN |
+ | **Search** | BM25 + Vector + RRF + Reranking | BM25 + Vector (hybrid) | QMD adds reranking + query expansion |
+ | **MCP** | 6 tools + resource + prompt | 18 tools | ClaudeMemory has richer tool surface |
+ | **Distribution** | Bun global install + plugin | Ruby gem + MCP + hooks | QMD has smoother install via plugin |
+ | **LLM Dependency** | 3 local GGUF models (~2GB total) | None (local ONNX only) | ClaudeMemory is dramatically lighter |
+ | **Query Expansion** | Custom fine-tuned model (1.7B) | None | QMD has ML-powered query improvement |
+ | **Truth Maintenance** | None (all docs valid) | Supersession + conflicts | ClaudeMemory handles contradictions |
+ | **Scope System** | YAML collections | Dual-database (global/project) | Both approaches valid for their use case |
+ | **Testing** | Unit + eval harness | Unit + evals + benchmarks (DevMemBench) | ClaudeMemory has more comprehensive benchmarks |
 
- #### 4. `content_vectors` - Embedding Metadata
+ ---
 
- ```sql
- CREATE TABLE content_vectors (
-   hash TEXT NOT NULL,         -- Foreign key to content.hash
-   seq INTEGER NOT NULL,       -- Chunk sequence number
-   pos INTEGER NOT NULL,       -- Character position in document
-   model TEXT NOT NULL,        -- Embedding model name
-   embedded_at TEXT NOT NULL,  -- ISO timestamp
-   PRIMARY KEY (hash, seq)
- );
- ```
+ ## Key Components Deep-Dive
 
- **Chunk Strategy**: 800 tokens with 15% overlap, semantic boundaries.
+ ### Component 1: Fine-Tuned Query Expansion
 
- **Key**: `hash_seq` composite (e.g., `"abc123def456_0"`)
+ **Purpose**: Generate structured query variations (lex/vec/hyde) to improve search recall by routing different query types to appropriate backends.
 
- #### 5. `vectors_vec` - Native Vector Index
+ **Location**: `finetune/`, `src/llm.ts:637-679`
 
- ```sql
- CREATE VIRTUAL TABLE vectors_vec USING vec0(
-   hash_seq TEXT PRIMARY KEY,  -- "hash_seq" composite key
-   embedding float[384]        -- 384-dimensional vector (EmbeddingGemma)
-   distance_metric=cosine
- );
- ```
+ **Implementation** (from `finetune/README.md`):
 
- **Critical Implementation Note** (from store.ts:1745-1748):
- ```typescript
- // IMPORTANT: We use a two-step query approach here because sqlite-vec virtual tables
- // hang indefinitely when combined with JOINs in the same query. Do NOT try to
- // "optimize" this by combining into a single query with JOINs - it will break.
- // See: https://github.com/tobi/qmd/pull/23
-
- // CORRECT: Two-step pattern
- const vecResults = db.prepare(`
-   SELECT hash_seq, distance
-   FROM vectors_vec
-   WHERE embedding MATCH ? AND k = ?
- `).all(embedding, limit * 3);
-
- // Then join with documents table separately
- const hashSeqs = vecResults.map(r => r.hash_seq);
- const docs = db.prepare(`
-   SELECT * FROM documents WHERE hash IN (${placeholders})
- `).all(hashSeqs);
- ```
+ The custom model `qmd-query-expansion-1.7B` is trained in two stages:
 
- **Why This Matters for ClaudeMemory**: When adopting sqlite-vec, we MUST use two-step queries to avoid hangs.
+ 1. **SFT (Supervised Fine-Tuning)**: Teaches format compliance
+    - Base model: Qwen3-1.7B
+    - LoRA rank 16, alpha 32 (all projection layers)
+    - ~2,290 training examples, 5 epochs
+    - Loss: train 0.472, val 0.304
 
- #### 6. `llm_cache` - Deterministic Response Cache
+ 2. **GRPO (Group Relative Policy Optimization)**: Refines quality
+    - LoRA rank 4, alpha 8 (q_proj, v_proj only)
+    - KL beta 0.04 (prevents drift from SFT)
+    - 200 steps, mean reward 0.757
 
- ```sql
- CREATE TABLE llm_cache (
-   hash TEXT PRIMARY KEY,   -- Hash of (operation, model, input)
-   result TEXT NOT NULL,    -- LLM response (JSON or plain text)
-   created_at TEXT NOT NULL
- );
- ```
+ **Reward Function** (from `finetune/reward.py`):
+ 5 dimensions totaling 120 points (140 with hyde):
+ - Format (0-30): Valid lex/vec/hyde lines
+ - Diversity (0-30): Multiple types, no echoing the query
+ - HyDE (0-20): Presence, length, quality
+ - Quality (0-20): Lex < vec length, preserved terms
+ - Entity (-45 to +20): Named entity preservation
+ - Think penalty: No `<think>` blocks (uses `/no_think` directive)
 
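The dimension list above can be read as a capped-sum scorer. The sketch below is a hypothetical illustration of that shape only — the real `finetune/reward.py` logic is richer (entity matching, length ratios, HyDE quality), and `scoreExpansion` is an invented name:

```typescript
// Hypothetical sketch: score a candidate expansion against a few of the
// dimension caps listed above. Checks line format, type diversity, and
// HyDE presence; applies the think-block penalty as a hard zero.
function scoreExpansion(output: string): number {
  const lines = output.trim().split("\n").map((l) => l.trim());
  const typed = lines.filter((l) => /^(lex|vec|hyde):\s*\S/.test(l));
  // Format (0-30): every line must carry a valid type prefix
  const format = typed.length === lines.length && lines.length > 0 ? 30 : 0;
  // Diversity (0-30): reward using more than one output type
  const types = new Set(typed.map((l) => l.split(":")[0]));
  const diversity = Math.min(types.size * 10, 30);
  // HyDE (0-20): reward presence of a hypothetical-document passage
  const hyde = types.has("hyde") ? 20 : 0;
  // Think penalty: any <think> block zeroes the score
  if (output.includes("<think>")) return 0;
  return format + diversity + hyde;
}
```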
- **Cache Key Formula**:
- ```typescript
- function getCacheKey(operation: string, params: Record<string, any>): string {
-   const canonical = JSON.stringify({ operation, ...params });
-   return sha256(canonical);
- }
-
- // Examples:
- // expandQuery: hash("expandQuery" + model + query)
- // rerank: hash("rerank" + model + query + file)
+ **Output Format**:
 ```
-
- **Cleanup Strategy** (probabilistic):
- ```typescript
- // 1% chance per query to run cleanup
- if (Math.random() < 0.01) {
-   db.run(`
-     DELETE FROM llm_cache
-     WHERE hash NOT IN (
-       SELECT hash FROM llm_cache
-       ORDER BY created_at DESC
-       LIMIT 1000
-     )
-   `);
- }
+ lex: authentication configuration
+ lex: auth settings setup
+ vec: how to configure authentication settings
+ hyde: Authentication can be configured by setting the AUTH_SECRET environment variable.
 ```
 
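The typed output above is trivially machine-parseable; a router that splits expansions by type can be sketched as follows (function and type names are illustrative, not QMD's):

```typescript
// Illustrative parser: split a typed expansion block into per-backend routes.
type ExpansionRoutes = { lex: string[]; vec: string[]; hyde: string[] };

function routeExpansions(output: string): ExpansionRoutes {
  const routes: ExpansionRoutes = { lex: [], vec: [], hyde: [] };
  for (const line of output.split("\n")) {
    // Each valid line is "<type>: <text>"; anything else is ignored
    const match = line.trim().match(/^(lex|vec|hyde):\s*(.+)$/);
    if (match) {
      routes[match[1] as keyof ExpansionRoutes].push(match[2]);
    }
  }
  return routes;
}
```

Each bucket then feeds its backend: `lex` variants go to FTS5, `vec` variants to embedding search, and `hyde` passages are embedded as pseudo-documents.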
- **Benefits**:
- - Reduces API costs for repeated operations
- - Deterministic (same input = same cache key)
- - Self-tuning (frequent queries stay cached)
-
- ### Foreign Key Relationships
-
- ```
- content.hash ← documents.hash ← content_vectors.hash
-
- documents_fts (via trigger)
-
- vectors_vec.hash_seq (composite key)
- ```
+ **Design Decisions**:
+ - Structured output types (`lex:`, `vec:`, `hyde:`) route to different backends instead of generic rewrites
+ - `/no_think` Qwen3 directive suppresses chain-of-thought for direct output
+ - Grammar-constrained generation ensures format compliance at inference time
+ - Per-query caching avoids redundant expansion (80% hit rate)
 
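The per-query cache in the last bullet pairs naturally with the deterministic hash-keyed cache described earlier in this analysis. A minimal sketch, assuming flat string parameters and Node/Bun's built-in SHA-256 (`cacheKey` is an illustrative name):

```typescript
// Minimal sketch of a deterministic LLM-response cache key: hash the
// operation name plus canonicalized parameters, so identical inputs
// always map to the same cache row.
import { createHash } from "node:crypto";

function cacheKey(operation: string, params: Record<string, unknown>): string {
  const merged = { operation, ...params };
  // Sort keys so property order cannot change the hash
  const canonical = JSON.stringify(merged, Object.keys(merged).sort());
  return createHash("sha256").update(canonical).digest("hex");
}
```

Keyed this way, repeated `expandQuery` or `rerank` calls with identical inputs resolve to the same cache entry regardless of argument ordering.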
351
- **Cascade Behavior**:
352
- - Soft delete: `documents.active = 0` (preserves content)
353
- - Hard delete: Manual cleanup of orphaned content/vectors
175
+ **Relevance to ClaudeMemory**: The structured lex/vec/hyde output pattern is interesting — if we ever add query expansion to our recall pipeline, this type-routed approach is more sophisticated than simple query rewriting. The reward function design (multi-dimensional scoring with entity preservation) is also a good reference for evaluating any future distiller quality.
354
176
 
355
177
  ---
356
178
 
357
- ## Search Pipeline Deep-Dive
358
-
359
- QMD provides three search modes with increasing sophistication:
360
-
361
- ### Mode 1: `search` (BM25 Only)
362
-
363
- **Use Case**: Fast keyword matching when you know exact terms.
364
-
365
- **Pipeline**:
366
- ```typescript
367
- searchFTS(db, query, limit) {
368
- // 1. Sanitize and build FTS5 query
369
- const terms = query.split(/\s+/)
370
- .map(t => sanitize(t))
371
- .filter(t => t.length > 0);
372
-
373
- const ftsQuery = terms.map(t => `"${t}"*`).join(' AND ');
374
-
375
- // 2. Query FTS5 with BM25 scoring
376
- const results = db.prepare(`
377
- SELECT
378
- d.path,
379
- d.title,
380
- bm25(documents_fts, 10.0, 1.0) as score
381
- FROM documents_fts f
382
- JOIN documents d ON d.id = f.rowid
383
- WHERE documents_fts MATCH ? AND d.active = 1
384
- ORDER BY score ASC -- Lower is better for BM25
385
- LIMIT ?
386
- `).all(ftsQuery, limit);
387
-
388
- // 3. Convert BM25 (lower=better) to similarity (higher=better)
389
- return results.map(r => ({
390
- ...r,
391
- score: 1 / (1 + Math.max(0, r.score))
392
- }));
393
- }
394
- ```
395
-
396
- **Latency**: <50ms
397
-
398
- **Strengths**: Fast, good for exact matches
179
+ ### Component 2: Claude Code Plugin System
399
180
 
400
- **Weaknesses**: Misses semantic similarity
181
+ **Purpose**: Package QMD for frictionless installation via Claude Code marketplace.
401
182
 
402
- ### Mode 2: `vsearch` (Vector Only)
183
+ **Location**: `.claude-plugin/marketplace.json`, `skills/qmd/SKILL.md`
403
184
 
404
- **Use Case**: Semantic search when exact terms unknown.
405
-
406
- **Pipeline**:
407
- ```typescript
408
- async searchVec(db, query, model, limit) {
409
- // 1. Generate query embedding
410
- const llm = getDefaultLlamaCpp();
411
- const formatted = formatQueryForEmbedding(query);
412
- const result = await llm.embed(formatted, { model });
413
- const embedding = new Float32Array(result.embedding);
414
-
415
- // 2. KNN search (two-step to avoid JOIN hang)
416
- const vecResults = db.prepare(`
417
- SELECT hash_seq, distance
418
- FROM vectors_vec
419
- WHERE embedding MATCH ? AND k = ?
420
- `).all(embedding, limit * 3);
421
-
422
- // 3. Join with documents (separate query)
423
- const hashSeqs = vecResults.map(r => r.hash_seq);
424
- const docs = db.prepare(`
425
- SELECT cv.hash, d.path, d.title
426
- FROM content_vectors cv
427
- JOIN documents d ON d.hash = cv.hash
428
- WHERE cv.hash || '_' || cv.seq IN (${placeholders})
429
- `).all(hashSeqs);
430
-
431
- // 4. Deduplicate by document (keep best chunk per doc)
432
- const seen = new Map();
433
- for (const doc of docs) {
434
- const distance = distanceMap.get(doc.hash_seq);
435
- const existing = seen.get(doc.path);
436
- if (!existing || distance < existing.distance) {
437
- seen.set(doc.path, { doc, distance });
185
+ **Plugin Structure** (from `marketplace.json:1-29`):
186
+ ```json
187
+ {
188
+ "name": "qmd",
189
+ "plugins": [{
190
+ "name": "qmd",
191
+ "skills": ["./skills/"],
192
+ "mcpServers": {
193
+ "qmd": { "command": "qmd", "args": ["mcp"] }
438
194
  }
439
- }
440
-
441
- // 5. Convert distance to similarity
442
- return Array.from(seen.values())
443
- .sort((a, b) => a.distance - b.distance)
444
- .slice(0, limit)
445
- .map(({ doc, distance }) => ({
446
- ...doc,
447
- score: 1 - distance // Cosine similarity
448
- }));
449
- }
450
- ```
451
-
452
- **Latency**: ~200ms (embedding generation)
453
-
454
- **Strengths**: Semantic understanding, synonym matching
455
-
456
- **Weaknesses**: Slower, may miss exact keyword matches
457
-
458
- ### Mode 3: `query` (Hybrid Pipeline)
459
-
460
- **Use Case**: Best-quality search combining lexical + semantic + reranking.
461
-
462
- **Full Pipeline** (10 stages):
463
-
464
- #### Stage 1: Initial FTS Query
465
-
466
- ```typescript
467
- const initialFts = searchFTS(db, query, 20);
468
- ```
469
-
470
- **Purpose**: Get BM25 baseline results.
471
-
472
- #### Stage 2: Smart Expansion Detection
473
-
474
- ```typescript
475
- const topScore = initialFts[0]?.score ?? 0;
476
- const secondScore = initialFts[1]?.score ?? 0;
477
- const hasStrongSignal =
478
- initialFts.length > 0 &&
479
- topScore >= 0.85 &&
480
- (topScore - secondScore) >= 0.15;
481
-
482
- if (hasStrongSignal) {
483
- // Skip expensive LLM operations
484
- return initialFts.slice(0, limit);
195
+ }]
485
196
  }
486
197
  ```
487
198
 
488
- **Purpose**: Detect when BM25 has clear winner (exact match).
489
-
490
- **Impact**: Saves 2-3 seconds on ~60% of queries (per QMD data).
491
-
492
- **Thresholds**:
493
- - `topScore >= 0.85`: Strong match
494
- - `gap >= 0.15`: Clear winner
495
-
496
- #### Stage 3: Query Expansion (LLM)
497
-
498
- ```typescript
499
- // Generate alternative phrasings for better recall
500
- const expanded = await expandQuery(query, model, db);
501
- // Returns: [original, variant1, variant2]
502
- ```
503
-
504
- **LLM Prompt** (simplified):
505
- ```
506
- Generate 2 alternative search queries:
507
- 1. 'lex': Keyword-focused variation
508
- 2. 'vec': Semantic-focused variation
509
-
510
- Original: "how to structure REST endpoints"
511
-
512
- Output:
513
- lex: API endpoint design patterns
514
- vec: RESTful service architecture best practices
515
- ```
516
-
517
- **Model**: Qwen3-1.7B (2.2GB, loaded on-demand)
518
-
519
- **Cache Key**: `hash(query + model)`
520
-
521
- #### Stage 4: Multi-Query Search (Parallel)
522
-
523
- ```typescript
524
- const rankedLists = [];
525
-
526
- for (const q of expanded) {
527
- // Run FTS for each query variant
528
- const ftsResults = searchFTS(db, q.text, 20);
529
- rankedLists.push(ftsResults);
530
-
531
- // Run vector search for each query variant
532
- const vecResults = await searchVec(db, q.text, model, 20);
533
- rankedLists.push(vecResults);
534
- }
535
-
536
- // Result: 6 ranked lists (3 queries × 2 methods each)
537
- ```
538
-
539
- **Purpose**: Cast wide net to maximize recall.
540
-
541
- #### Stage 5: Reciprocal Rank Fusion (RRF)
542
-
543
- ```typescript
544
- function reciprocalRankFusion(
545
- resultLists: RankedResult[][],
546
- weights: number[] = [],
547
- k: number = 60
548
- ): RankedResult[] {
549
- const scores = new Map<string, {
550
- result: RankedResult;
551
- rrfScore: number;
552
- topRank: number;
553
- }>();
554
-
555
- // Accumulate RRF scores across all lists
556
- for (let listIdx = 0; listIdx < resultLists.length; listIdx++) {
557
- const list = resultLists[listIdx];
558
- const weight = weights[listIdx] ?? 1.0;
559
-
560
- for (let rank = 0; rank < list.length; rank++) {
561
- const result = list[rank];
562
- const rrfContribution = weight / (k + rank + 1);
563
-
564
- const existing = scores.get(result.file);
565
- if (existing) {
566
- existing.rrfScore += rrfContribution;
567
- existing.topRank = Math.min(existing.topRank, rank);
568
- } else {
569
- scores.set(result.file, {
570
- result,
571
- rrfScore: rrfContribution,
572
- topRank: rank
573
- });
574
- }
575
- }
576
- }
577
-
578
- // Top-rank bonus (preserve exact matches)
579
- for (const entry of scores.values()) {
580
- if (entry.topRank === 0) {
581
- entry.rrfScore += 0.05; // #1 in any list
582
- } else if (entry.topRank <= 2) {
583
- entry.rrfScore += 0.02; // #2-3 in any list
584
- }
585
- }
586
-
587
- return Array.from(scores.values())
588
- .sort((a, b) => b.rrfScore - a.rrfScore)
589
- .map(e => ({ ...e.result, score: e.rrfScore }));
590
- }
199
+ **Skill Definition** (from `skills/qmd/SKILL.md:1-10`):
200
+ ```yaml
201
+ ---
202
+ name: qmd
203
+ description: Search personal markdown knowledge bases...
204
+ metadata:
205
+ author: tobi
206
+ version: "1.1.1"
207
+ allowed-tools: Bash(qmd:*), mcp__qmd__*
208
+ ---
591
209
  ```
592
210
 
593
- **RRF Formula**: `score = Σ(weight / (k + rank + 1))`
594
-
595
- **Why k=60?**: Balances top-rank emphasis with lower-rank contributions.
596
- - Lower k (e.g., 20): Top ranks dominate
597
- - Higher k (e.g., 100): Smoother blending
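To make the k trade-off concrete, here is a tiny standalone sketch (not from QMD) comparing how much a rank #1 hit outweighs a rank #10 hit at different k values:

```typescript
// Contribution of a result at 0-indexed rank r (weight = 1): 1 / (k + r + 1).
function rrfContribution(rank: number, k: number): number {
  return 1 / (k + rank + 1);
}

for (const k of [20, 60, 100]) {
  const ratio = rrfContribution(0, k) / rrfContribution(9, k);
  console.log(`k=${k}: a #1 hit counts ${ratio.toFixed(2)}x a #10 hit`);
}
// k=20 → 1.43x, k=60 → 1.15x, k=100 → 1.09x
```

At k=60 the top rank still leads, but lower-ranked agreement across lists can overtake a single #1 placement.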
211
+ Key features:
212
+ - **Inline status check**: `!` prefix runs command during skill load (`SKILL.md:18`)
213
+ - **Trigger phrases**: "search my notes", "find in docs", "what did I write about"
214
+ - **Tool permissions**: Scoped to `qmd:*` bash commands and `mcp__qmd__*` tools
215
+ - **Score interpretation guide**: Embedded in skill for LLM consumption
216
+ - **Recommended workflow**: status → search → vsearch → query → get
598
217
 
599
- **Weight Strategy**:
600
- - Original query: `weight = 2.0` (prioritize user's exact words)
601
- - Expanded queries: `weight = 1.0` (supplementary signals)
218
+ **Relevance to ClaudeMemory**: This is the clearest example of how to package a memory/search tool as a Claude Code plugin. The skill definition format, tool permissions scoping, inline status checks, and MCP server bundling are all patterns we should adopt when ready to ship as a plugin. The `allowed-tools` pattern (`Bash(qmd:*)`) is particularly useful for security scoping.
602
219
 
603
- **Top-Rank Bonus**:
604
- - `+0.05` for rank #1: Likely exact match
605
- - `+0.02` for ranks #2-3: Strong signal
606
- - No bonus for rank #4+: Let RRF dominate
607
-
608
- #### Stage 6: Candidate Selection
220
+ ---
609
221
 
610
- ```typescript
611
- const candidates = fusedResults.slice(0, 30);
612
- ```
222
+ ### Component 3: MCP Server with Structured Content
613
223
 
614
- **Purpose**: Limit reranking to top candidates (cost control).
224
+ **Purpose**: Expose QMD search as MCP tools with both human-readable text and machine-parseable structured content.
615
225
 
616
- #### Stage 7: Per-Document Best Chunk Selection
226
+ **Location**: `src/mcp.ts`
617
227
 
228
+ **Implementation** (from `mcp.ts:258-292`):
618
229
  ```typescript
619
- // For each candidate document, find best matching chunk
620
- const docChunks = candidates.map(doc => {
621
- const chunks = getChunksForDocument(db, doc.hash);
622
-
623
- // Score each chunk by keyword overlap
624
- const scored = chunks.map(chunk => {
625
- const terms = query.toLowerCase().split(/\s+/);
626
- const chunkLower = chunk.text.toLowerCase();
627
- const matchCount = terms.filter(t => chunkLower.includes(t)).length;
628
- return { chunk, score: matchCount };
629
- });
630
-
631
- // Return best chunk text for reranking
230
+ server.registerTool("search", {
231
+ title: "Search (BM25)",
232
+ inputSchema: {
233
+ query: z.string().describe("Search query"),
234
+ limit: z.number().optional().default(10),
235
+ minScore: z.number().optional().default(0),
236
+ collection: z.string().optional(),
237
+ },
238
+ }, async ({ query, limit, minScore, collection }) => {
239
+ // ... search logic ...
632
240
  return {
633
- file: doc.path,
634
- text: scored.sort((a, b) => b.score - a.score)[0].chunk.text
241
+ content: [{ type: "text", text: formatSearchSummary(filtered, query) }],
242
+ structuredContent: { results: filtered },
635
243
  };
636
244
  });
637
245
  ```
638
246
 
639
- **Purpose**: Reranker sees most relevant chunk per document.
640
-
641
- #### Stage 8: LLM Reranking (Cross-Encoder)
642
-
643
- ```typescript
644
- const rerankResult = await llm.rerank(query, docChunks, { model });
645
-
646
- // Returns: [{ file, score: 0.0-1.0 }, ...]
647
- // score = normalized relevance (cross-encoder logits)
648
- ```
649
-
650
- **Model**: Qwen3-Reranker-0.6B (640MB)
651
-
652
- **How It Works**: The cross-encoder scores each query-document pair jointly, rather than comparing separately computed embeddings.
653
-
654
- **Cache Key**: `hash(query + file + model)`
655
-
656
- #### Stage 9: Position-Aware Score Blending
657
-
658
- ```typescript
659
- // Combine RRF and reranker scores based on rank
660
- const blended = candidates.map((doc, rank) => {
661
- const rrfScore = doc.score;
662
- const rerankScore = rerankScores.get(doc.file) || 0;
663
-
664
- // Top results: trust retrieval more
665
- // Lower results: trust reranker more
666
- let rrfWeight, rerankWeight;
667
- if (rank < 3) {
668
- rrfWeight = 0.75;
669
- rerankWeight = 0.25;
670
- } else if (rank < 10) {
671
- rrfWeight = 0.60;
672
- rerankWeight = 0.40;
673
- } else {
674
- rrfWeight = 0.40;
675
- rerankWeight = 0.60;
676
- }
677
-
678
- const finalScore = rrfWeight * rrfScore + rerankWeight * rerankScore;
679
-
680
- return { ...doc, score: finalScore };
681
- });
682
- ```
683
-
684
- **Rationale**:
685
- - Top results likely have both strong lexical AND semantic signals
686
- - Lower results may be semantically relevant but lexically weak
687
- - Reranker helps elevate hidden gems
688
-
689
- #### Stage 10: Final Sorting
690
-
691
- ```typescript
692
- return blended
693
- .sort((a, b) => b.score - a.score)
694
- .slice(0, limit);
695
- ```
696
-
697
- **Latency Breakdown**:
698
- - Cold (first query): 2-3s (model loading + expansion + reranking)
699
- - Warm (cached expansion): ~500ms (reranking only)
700
- - Strong signal (skipped): ~200ms (FTS + vector, no LLM)
701
-
702
- ---
703
-
704
- ## Vector Search Implementation
705
-
706
- ### Embedding Model: EmbeddingGemma
707
-
708
- **Specs**:
709
- - Parameters: 300M
710
- - Dimensions: 384 as configured here (QMD docs cite the model's native 768; EmbeddingGemma supports truncated output dimensions)
711
- - Format: GGUF (quantized)
712
- - Size: 300MB download
713
- - Tokenizer: SentencePiece
714
-
715
- **Prompt Format** (Nomic-style):
716
- ```typescript
717
- // Query embedding
718
- formatQueryForEmbedding(query: string): string {
719
- return `task: search result | query: ${query}`;
720
- }
721
-
722
- // Document embedding
723
- formatDocForEmbedding(text: string, title?: string): string {
724
- return `title: ${title || "none"} | text: ${text}`;
725
- }
726
- ```
727
-
728
- **Why Prompt Formatting Matters**: Embedding models are trained on specific formats. Using the wrong format degrades quality.
729
-
730
- ### Document Chunking Strategy
731
-
732
- QMD offers two chunking approaches:
733
-
734
- #### 1. Token-Based Chunking (Recommended)
735
-
736
- ```typescript
737
- async function chunkDocumentByTokens(
738
- content: string,
739
- maxTokens: number = 800,
740
- overlapTokens: number = 120 // 15% of 800
741
- ): Promise<{ text: string; pos: number; tokens: number }[]> {
742
- const llm = getDefaultLlamaCpp();
743
-
744
- // Tokenize entire document once
745
- const allTokens = await llm.tokenize(content);
746
- const totalTokens = allTokens.length;
747
-
748
- if (totalTokens <= maxTokens) {
749
- return [{ text: content, pos: 0, tokens: totalTokens }];
750
- }
751
-
752
- const chunks = [];
753
- const step = maxTokens - overlapTokens; // 680 tokens
754
- let tokenPos = 0;
755
-
756
- while (tokenPos < totalTokens) {
757
- const chunkEnd = Math.min(tokenPos + maxTokens, totalTokens);
758
- const chunkTokens = allTokens.slice(tokenPos, chunkEnd);
759
- let chunkText = await llm.detokenize(chunkTokens);
760
-
761
- // Find semantic break point if not at end
762
- if (chunkEnd < totalTokens) {
763
- const searchStart = Math.floor(chunkText.length * 0.7);
764
- const searchSlice = chunkText.slice(searchStart);
765
-
766
- // Priority: paragraph > sentence > line
767
- const breakOffset = findBreakPoint(searchSlice);
768
- if (breakOffset >= 0) {
769
- chunkText = chunkText.slice(0, searchStart + breakOffset);
770
- }
771
- }
772
-
773
- chunks.push({
774
- text: chunkText,
775
- pos: Math.floor(tokenPos * avgCharsPerToken), // avgCharsPerToken ≈ content.length / totalTokens
776
- tokens: chunkTokens.length
777
- });
778
-
779
- tokenPos += step;
780
- }
781
-
782
- return chunks;
783
- }
784
- ```
785
-
786
- **Parameters**:
787
- - `maxTokens = 800`: EmbeddingGemma's optimal context window
788
- - `overlapTokens = 120` (15%): Ensures continuity across boundaries
789
-
790
- **Break Priority** (from store.ts:1020-1046):
791
- 1. Paragraph boundary (`\n\n`)
792
- 2. Sentence end (`. `, `.\n`, `? `, `! `)
793
- 3. Line break (`\n`)
794
- 4. Word boundary (` `)
795
- 5. Hard cut (if no boundary found)
796
-
797
- **Search Window**: Last 30% of chunk (70-100% range) to avoid cutting too early.
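The `findBreakPoint` helper invoked in the chunking code above is not shown in this excerpt; a minimal sketch that follows the stated priority order might look like this (illustrative, not QMD's actual implementation):

```typescript
// Find the best semantic break in the trailing slice of a chunk.
// Returns the offset just past the boundary, or -1 to force a hard cut.
function findBreakPoint(slice: string): number {
  // 1. Paragraph boundary
  const paragraph = slice.lastIndexOf("\n\n");
  if (paragraph >= 0) return paragraph + 2;

  // 2. Sentence end (take the latest match across all variants)
  let sentence = -1;
  for (const end of [". ", ".\n", "? ", "! "]) {
    const idx = slice.lastIndexOf(end);
    if (idx >= 0) sentence = Math.max(sentence, idx + end.length);
  }
  if (sentence >= 0) return sentence;

  // 3. Line break
  const line = slice.lastIndexOf("\n");
  if (line >= 0) return line + 1;

  // 4. Word boundary, else hard cut
  const word = slice.lastIndexOf(" ");
  return word >= 0 ? word + 1 : -1;
}
```

Because the caller only searches the last 30% of the chunk, even a word-boundary fallback never discards more than roughly a third of the text.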
798
-
799
- #### 2. Character-Based Chunking (Fallback)
800
-
801
- ```typescript
802
- function chunkDocument(
803
- content: string,
804
- maxChars: number = 3200, // ~800 tokens @ 4 chars/token
805
- overlapChars: number = 480 // 15% overlap
806
- ): { text: string; pos: number }[] {
807
- // Similar logic but operates on characters instead of tokens
808
- // Faster but less accurate (doesn't respect token boundaries)
809
- }
810
- ```
811
-
812
- **When to Use**: Synchronous contexts where async tokenization isn't available.
813
-
814
- ### sqlite-vec Integration
815
-
816
- QMD uses **sqlite-vec 0.1.x** (vec0 virtual table):
817
-
818
- ```typescript
819
- // Create virtual table for native vectors
820
- db.exec(`
821
- CREATE VIRTUAL TABLE vectors_vec USING vec0(
822
- hash_seq TEXT PRIMARY KEY,
823
- embedding float[384] distance_metric=cosine
824
- )
825
- `);
826
-
827
- // Insert embedding (note: Float32Array required)
828
- const embedding = new Float32Array(embeddingArray);
829
- db.prepare(`
830
- INSERT INTO vectors_vec (hash_seq, embedding) VALUES (?, ?)
831
- `).run(`${hash}_${seq}`, embedding);
832
-
833
- // KNN search (CRITICAL: no JOINs in same query!)
834
- const vecResults = db.prepare(`
835
- SELECT hash_seq, distance
836
- FROM vectors_vec
837
- WHERE embedding MATCH ? AND k = ?
838
- `).all(queryEmbedding, limit * 3);
839
-
840
- // Then join with documents in separate query
841
- const docs = db.prepare(`
842
- SELECT * FROM documents WHERE hash IN (...)
843
- `).all(hashList);
844
- ```
845
-
846
- **Key Insights**:
847
-
848
- 1. **Two-Step Pattern Required**: JOINs with vec0 tables hang (confirmed bug)
849
- 2. **Float32Array**: Must convert number[] to typed array
850
- 3. **Cosine Distance**: Returns 0.0 (identical) to 2.0 (opposite)
851
- 4. **KNN Parameter**: Request `limit * 3` to allow for deduplication
247
+ **Key patterns**:
248
+ 1. **Dual output**: Both `content` (human-readable text) and `structuredContent` (JSON) returned from every tool
249
+ 2. **Zod validation**: Input schemas use Zod v4 with `.describe()` for auto-documentation
250
+ 3. **Resource template**: Documents accessible via `qmd://{+path}` URI pattern with suffix matching fallback (`mcp.ts:105-166`)
251
+ 4. **Query guide prompt**: Registered prompt explaining search strategy to LLMs (`mcp.ts:172-252`)
252
+ 5. **Line numbers**: Default in resource output for precise references
253
+ 6. **Error handling**: `isError: true` flag for clear error signaling, fuzzy file suggestions on not-found
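A minimal sketch of the dual-output shape (the `SearchResult` fields and the `formatSearchSummary` body here are assumptions for illustration; QMD's real logic lives in `mcp.ts`):

```typescript
interface SearchResult { file: string; score: number; snippet: string; }

// Human-readable summary: compact and token-efficient for LLM consumption.
function formatSearchSummary(results: SearchResult[], query: string): string {
  if (results.length === 0) return `No results for "${query}"`;
  return results
    .map((r, i) => `${i + 1}. ${r.file} (score ${r.score.toFixed(2)}): ${r.snippet}`)
    .join("\n");
}

// Every tool returns both representations: text for humans/LLMs,
// structuredContent as a machine-parseable mirror.
function toolResponse(results: SearchResult[], query: string) {
  return {
    content: [{ type: "text" as const, text: formatSearchSummary(results, query) }],
    structuredContent: { results },
  };
}
```

The two fields carry the same information, so clients can consume whichever form suits them without a second round-trip.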
852
254
 
853
- ### Per-Document vs Per-Chunk Deduplication
854
-
855
- QMD deduplicates **per-document** after vector search:
856
-
857
- ```typescript
858
- // Multiple chunks per document may match
859
- // Keep only the best chunk per document
860
- const seen = new Map<string, { doc, bestDistance }>();
861
-
862
- for (const row of docRows) {
863
- const distance = distanceMap.get(row.hash_seq);
864
- const existing = seen.get(row.filepath);
865
-
866
- if (!existing || distance < existing.bestDistance) {
867
- seen.set(row.filepath, { doc: row, bestDistance: distance });
868
- }
869
- }
870
-
871
- return Array.from(seen.values())
872
- .sort((a, b) => a.bestDistance - b.bestDistance);
873
- ```
874
-
875
- **Rationale**: Users want documents, not chunks. Show best chunk per doc.
255
+ **Relevance to ClaudeMemory**: We already have 18 MCP tools, but QMD's dual `content`/`structuredContent` pattern is worth adopting — it ensures both human (text summary) and machine (JSON) consumers get optimal formats. The registered prompt for query guidance is also a good pattern for improving Claude's tool usage.
876
256
 
877
257
  ---
878
258
 
879
- ## LLM Infrastructure
880
-
881
- ### node-llama-cpp Abstraction
882
-
883
- QMD uses **node-llama-cpp** for local inference:
884
-
885
- ```typescript
886
- import { getLlama, LlamaModel, LlamaChatSession } from "node-llama-cpp";
887
-
888
- class LlamaCpp implements LLM {
889
- private llama: Llama | null = null;
890
- private embedModel: LlamaModel | null = null;
891
- private rerankModel: LlamaModel | null = null;
892
- private generateModel: LlamaModel | null = null;
893
-
894
- // Lazy loading with singleton pattern
895
- private async ensureLlama(): Promise<Llama> {
896
- if (!this.llama) {
897
- this.llama = await getLlama({ logLevel: LlamaLogLevel.error });
898
- }
899
- return this.llama;
900
- }
901
-
902
- private async ensureEmbedModel(): Promise<LlamaModel> {
903
- if (!this.embedModel) {
904
- const llama = await this.ensureLlama();
905
- const modelPath = await resolveModelFile(
906
- "hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf",
907
- this.modelCacheDir
908
- );
909
- this.embedModel = await llama.loadModel({ modelPath });
910
- }
911
- return this.embedModel;
912
- }
913
- }
914
- ```
915
-
916
- **Model Download**: Automatic from HuggingFace (cached in `~/.cache/qmd/models/`)
917
-
918
- ### Lazy Model Loading
919
-
920
- **Strategy**: Load models on first use, keep in memory, unload after 2 minutes idle.
921
-
922
- ```typescript
923
- // Inactivity timer management
924
- private touchActivity(): void {
925
- if (this.inactivityTimer) {
926
- clearTimeout(this.inactivityTimer);
927
- }
928
-
929
- if (this.inactivityTimeoutMs > 0 && this.hasLoadedContexts()) {
930
- this.inactivityTimer = setTimeout(() => {
931
- this.unloadIdleResources();
932
- }, this.inactivityTimeoutMs);
933
- this.inactivityTimer.unref(); // Don't block process exit
934
- }
935
- }
936
-
937
- // Unload contexts (heavy) but keep models (fast reload)
938
- async unloadIdleResources(): Promise<void> {
939
- if (this.embedContext) {
940
- await this.embedContext.dispose();
941
- this.embedContext = null;
942
- }
943
- if (this.rerankContext) {
944
- await this.rerankContext.dispose();
945
- this.rerankContext = null;
946
- }
947
-
948
- // Optional: also dispose models if disposeModelsOnInactivity=true
949
- // (default: false, keep models loaded)
950
- }
951
- ```
952
-
953
- **Lifecycle** (from llm.ts comments):
954
- ```
955
- llama (lightweight) → model (VRAM) → context (VRAM) → sequence (per-session)
956
- ```
957
-
958
- **Why This Matters**:
959
- - **Cold start**: First query loads models (~2-3s)
960
- - **Warm**: Subsequent queries use loaded models (~200-500ms)
961
- - **Idle**: After 2min, contexts unloaded (models stay unless configured)
962
-
963
- ### Query Expansion
964
-
965
- **Purpose**: Generate alternative phrasings for better recall.
966
-
967
- **LLM Prompt** (from llm.ts:637-679):
968
- ```typescript
969
- const prompt = `You are a search query optimization expert. Your task is to improve retrieval by rewriting queries and generating hypothetical documents.
970
-
971
- Original Query: ${query}
972
-
973
- ${context ? `Additional Context, ONLY USE IF RELEVANT:\n\n<context>${context}</context>` : ""}
974
-
975
- ## Step 1: Query Analysis
976
- Identify entities, search intent, and missing context.
977
-
978
- ## Step 2: Generate Hypothetical Document
979
- Write a focused sentence passage that would answer the query. Include specific terminology and domain vocabulary.
980
-
981
- ## Step 3: Query Rewrites
982
- Generate 2-3 alternative search queries that resolve ambiguities. Use terminology from the hypothetical document.
983
-
984
- ## Step 4: Final Retrieval Text
985
- Output exactly 1-3 'lex' lines, 1-3 'vec' lines, and MAX ONE 'hyde' line.
986
-
987
- <format>
988
- lex: {single search term}
989
- vec: {single vector query}
990
- hyde: {complete hypothetical document passage from Step 2 on a SINGLE LINE}
991
- </format>
992
-
993
- <rules>
994
- - DO NOT repeat the same line.
995
- - Each 'lex:' line MUST be a different keyword variation based on the ORIGINAL QUERY.
996
- - Each 'vec:' line MUST be a different semantic variation based on the ORIGINAL QUERY.
997
- - The 'hyde:' line MUST be the full sentence passage from Step 2, but all on one line.
998
- </rules>
999
-
1000
- Final Output:`;
1001
- ```
1002
-
1003
- **Grammar** (constrained generation):
1004
- ```typescript
1005
- const grammar = await llama.createGrammar({
1006
- grammar: `
1007
- root ::= line+
1008
- line ::= type ": " content "\\n"
1009
- type ::= "lex" | "vec" | "hyde"
1010
- content ::= [^\\n]+
1011
- `
1012
- });
1013
- ```
1014
-
1015
- **Output Parsing**:
1016
- ```typescript
1017
- const result = await session.prompt(prompt, { grammar, maxTokens: 1000, temperature: 1 });
1018
- const lines = result.trim().split("\n");
1019
- const queryables: Queryable[] = lines.map(line => {
1020
- const colonIdx = line.indexOf(":");
1021
- const type = line.slice(0, colonIdx).trim();
1022
- const text = line.slice(colonIdx + 1).trim();
1023
- return { type: type as QueryType, text };
1024
- }).filter(q => q.type === 'lex' || q.type === 'vec' || q.type === 'hyde');
1025
- ```
1026
-
1027
- **Example**:
1028
- ```
1029
- Query: "how to structure REST endpoints"
1030
-
1031
- Output:
1032
- lex: REST API design
1033
- lex: endpoint organization patterns
1034
- vec: RESTful service architecture principles
1035
- vec: HTTP resource modeling best practices
1036
- hyde: REST endpoints should follow resource-oriented design with clear hierarchies. Use nouns for resources, HTTP methods for operations, and consistent naming conventions for discoverability.
1037
- ```
1038
-
1039
- **Model**: Qwen3-1.7B (2.2GB)
1040
-
1041
- **Cache Hit Rate**: High for repeated queries (~80% per QMD usage data)
1042
-
1043
- ### LLM Reranking
1044
-
1045
- **Purpose**: Score query-document relevance using cross-encoder.
1046
-
1047
- **Implementation**:
1048
- ```typescript
1049
- async rerank(
1050
- query: string,
1051
- documents: RerankDocument[],
1052
- options: RerankOptions = {}
1053
- ): Promise<RerankResult> {
1054
- const context = await this.ensureRerankContext();
1055
-
1056
- // Extract text for ranking
1057
- const texts = documents.map(doc => doc.text);
1058
-
1059
- // Use native ranking API (returns sorted by score)
1060
- const ranked = await context.rankAndSort(query, texts);
1061
-
1062
- // Map back to original documents
1063
- const results = ranked.map(item => {
1064
- const docInfo = textToDoc.get(item.document);
1065
- return {
1066
- file: docInfo.file,
1067
- score: item.score, // 0.0 (irrelevant) to 1.0 (highly relevant)
1068
- index: docInfo.index
1069
- };
1070
- });
1071
-
1072
- return { results, model: this.rerankModelUri };
1073
- }
1074
- ```
1075
-
1076
- **Model**: Qwen3-Reranker-0.6B (640MB)
259
+ ### Component 4: Session-Scoped LLM Lifecycle
1077
260
 
1078
- **Score Range**: 0.0 to 1.0 (normalized from logits)
261
+ **Purpose**: Manage LLM model loading, context creation, and cleanup with structured lifecycle guarantees.
1079
262
 
1080
- **Cache Key**: `hash(query + file + model)`
263
+ **Location**: `src/llm.ts:126-146`
1081
264
 
1082
- ### Cache Management
1083
-
1084
- **Probabilistic Cleanup** (from store.ts:804-807):
265
+ **Session Interface** (from `llm.ts:137-146`):
1085
266
  ```typescript
1086
- // 1% chance per query to run cleanup
1087
- if (Math.random() < 0.01) {
1088
- db.run(`
1089
- DELETE FROM llm_cache
1090
- WHERE hash NOT IN (
1091
- SELECT hash FROM llm_cache
1092
- ORDER BY created_at DESC
1093
- LIMIT 1000
1094
- )
1095
- `);
267
+ export interface ILLMSession {
268
+ embed(text: string, options?: EmbedOptions): Promise<EmbeddingResult | null>;
269
+ embedBatch(texts: string[]): Promise<(EmbeddingResult | null)[]>;
270
+ expandQuery(query: string, options?): Promise<Queryable[]>;
271
+ rerank(query: string, documents: RerankDocument[]): Promise<RerankResult>;
272
+ readonly isValid: boolean;
273
+ readonly signal: AbortSignal;
1096
274
  }
1097
275
  ```
1098
276
 
1099
- **Rationale**:
1100
- - Keep latest 1000 entries (most likely to be reused)
1101
- - Probabilistic avoids overhead on every query
1102
- - Self-tuning: frequent queries naturally stay cached
1103
-
1104
- **Cache Size Estimate**:
1105
- - Query expansion: ~500 bytes per entry
1106
- - Reranking: ~50 bytes per entry (just score)
1107
- - 1000 entries ≈ 500KB (negligible)
1108
-
1109
- ---
1110
-
1111
- ## Performance Characteristics
1112
-
1113
- ### Evaluation Methodology
1114
-
1115
- QMD includes comprehensive test suite in `eval.test.ts`:
1116
-
1117
- **Test Corpus**: 6 synthetic documents covering diverse topics
1118
- - api-design.md
1119
- - fundraising.md
1120
- - distributed-systems.md
1121
- - machine-learning.md
1122
- - remote-work.md
1123
- - product-launch.md
1124
-
1125
- **Query Design**: 24 queries across 4 difficulty levels
1126
-
1127
- #### Easy Queries (6) - Exact keyword matches
1128
- ```typescript
1129
- { query: "API versioning", expectedDoc: "api-design" }
1130
- { query: "Series A fundraising", expectedDoc: "fundraising" }
1131
- { query: "CAP theorem", expectedDoc: "distributed-systems" }
1132
- { query: "overfitting machine learning", expectedDoc: "machine-learning" }
1133
- { query: "remote work VPN", expectedDoc: "remote-work" }
1134
- { query: "Project Phoenix retrospective", expectedDoc: "product-launch" }
1135
- ```
1136
-
1137
- **Expected**: BM25 should excel (≥80% Hit@3)
1138
-
1139
- #### Medium Queries (6) - Semantic/conceptual
1140
- ```typescript
1141
- { query: "how to structure REST endpoints", expectedDoc: "api-design" }
1142
- { query: "raising money for startup", expectedDoc: "fundraising" }
1143
- { query: "consistency vs availability tradeoffs", expectedDoc: "distributed-systems" }
1144
- { query: "how to prevent models from memorizing data", expectedDoc: "machine-learning" }
1145
- { query: "working from home guidelines", expectedDoc: "remote-work" }
1146
- { query: "what went wrong with the launch", expectedDoc: "product-launch" }
1147
- ```
1148
-
1149
- **Expected**: Vectors should outperform BM25 (≥40% vs ≥15%)
277
+ **Key patterns**:
278
+ - Sessions have `isValid` flag and `signal` (AbortSignal) for lifecycle tracking
279
+ - Maximum duration timeout prevents runaway sessions
280
+ - Models lazy-load but stay resident; contexts dispose after 2-min idle
281
+ - Singleton pattern ensures only one LLM instance (memory management)
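The lifecycle above can be sketched roughly as follows (class shape and field names are assumptions; QMD's actual `llm.ts` implementation is more involved):

```typescript
class LLMSession {
  private controller = new AbortController();
  private valid = true;
  private timer: ReturnType<typeof setTimeout>;

  constructor(maxDurationMs: number) {
    // Hard ceiling: a runaway session aborts itself after maxDurationMs.
    this.timer = setTimeout(() => this.dispose("max duration exceeded"), maxDurationMs);
  }

  get isValid(): boolean { return this.valid; }
  get signal(): AbortSignal { return this.controller.signal; }

  dispose(reason = "disposed"): void {
    if (!this.valid) return;
    this.valid = false;
    clearTimeout(this.timer);
    // Abort propagates to any in-flight embed/expandQuery/rerank call
    // that was handed this.signal.
    this.controller.abort(reason);
  }
}
```

Consumers check `session.isValid` before issuing work and pass `session.signal` into long-running calls, so cancellation propagates cleanly instead of leaving orphaned inference jobs.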
1150
282
 
1151
- #### Hard Queries (6) - Vague, partial memory
1152
- ```typescript
1153
- { query: "nouns not verbs", expectedDoc: "api-design" }
1154
- { query: "Sequoia investor pitch", expectedDoc: "fundraising" }
1155
- { query: "Raft algorithm leader election", expectedDoc: "distributed-systems" }
1156
- { query: "F1 score precision recall", expectedDoc: "machine-learning" }
1157
- { query: "quarterly team gathering travel", expectedDoc: "remote-work" }
1158
- { query: "beta program 47 bugs", expectedDoc: "product-launch" }
1159
- ```
1160
-
1161
- **Expected**: Both methods struggle; hybrid helps (Hit@5: ≥35% vs ≥15%)
1162
-
1163
- #### Fusion Queries (6) - Multi-signal needed
1164
- ```typescript
1165
- { query: "how much runway before running out of money", expectedDoc: "fundraising" }
1166
- { query: "datacenter replication sync strategy", expectedDoc: "distributed-systems" }
1167
- { query: "splitting data for training and testing", expectedDoc: "machine-learning" }
1168
- { query: "JSON response codes error messages", expectedDoc: "api-design" }
1169
- { query: "video calls camera async messaging", expectedDoc: "remote-work" }
1170
- { query: "CI/CD pipeline testing coverage", expectedDoc: "product-launch" }
1171
- ```
1172
-
1173
- **Expected**: RRF combines weak signals (≥50% vs ~15-30% for single methods)
1174
-
1175
- ### Results Summary
1176
-
1177
- | Method | Easy H@3 | Medium H@3 | Hard H@5 | Fusion H@3 | Overall H@3 |
1178
- |--------|----------|------------|----------|------------|-------------|
1179
- | **BM25** | ≥80% | ≥15% | ≥15% | ~15% | ≥40% |
1180
- | **Vector** | ≥60% | ≥40% | ≥30% | ~30% | ≥50% |
1181
- | **Hybrid (RRF)** | ≥80% | **≥50%** | **≥35%** | **≥50%** | **≥60%** |
1182
-
1183
- **Key Findings**:
1184
- 1. BM25 sufficient for easy queries (exact matches)
1185
- 2. Vectors essential for medium queries (≥40% vs ≥15% Hit@3 for BM25)
1186
- 3. RRF fusion best for fusion queries (combines weak signals)
1187
- 4. Overall: Hybrid provides 50% improvement over BM25 baseline
1188
-
1189
- ### Latency Analysis
1190
-
1191
- **Measured on M1 Mac, 16GB RAM**:
1192
-
1193
- | Operation | Cold Start | Warm (Cached) | Strong Signal |
1194
- |-----------|------------|---------------|---------------|
1195
- | `search` (BM25) | <50ms | <50ms | <50ms |
1196
- | `vsearch` (Vector) | ~2s (model load) | ~200ms | ~200ms |
1197
- | `query` (Hybrid) | 3-5s (all models) | ~500ms | ~200ms |
1198
-
1199
- **Breakdown for `query` (cold)**:
1200
- - Model loading: ~2s (embed + rerank + expand)
1201
- - Query expansion: ~800ms (LLM generation)
1202
- - FTS + Vector: ~300ms (parallel)
1203
- - RRF fusion: <10ms (pure algorithm)
1204
- - Reranking: ~400ms (cross-encoder scoring)
1205
- - Total: 3-5s
1206
-
1207
- **Breakdown for `query` (warm)**:
1208
- - FTS + Vector: ~300ms
1209
- - RRF fusion: <10ms
1210
- - Reranking (cached): ~50ms
1211
- - Total: ~400-500ms
1212
-
1213
- **Breakdown for `query` (strong signal, skipped)**:
1214
- - FTS: ~50ms
1215
- - Smart detection: <5ms
1216
- - Vector (skipped): 0ms
1217
- - Expansion (skipped): 0ms
1218
- - Reranking (skipped): 0ms
1219
- - Total: ~100-150ms
1220
-
1221
- ### Resource Usage
1222
-
1223
- **Disk Space**:
1224
- - Per document: ~5KB (body + metadata)
1225
- - Per chunk embedding: ~1.5KB (384 floats + metadata)
1226
- - Example: 1000 documents, 5 chunks avg = 5MB + 7.5MB = **12.5MB total**
1227
-
1228
- **Memory**:
1229
- - Base process: ~50MB
1230
- - EmbeddingGemma loaded: +300MB
1231
- - Reranker loaded: +640MB
1232
- - Expansion model loaded: +2.2GB
1233
- - **Peak**: ~3.2GB (all models loaded)
1234
-
1235
- **VRAM** (GPU acceleration):
1236
- - EmbeddingGemma: ~300MB
1237
- - Reranker: ~640MB
1238
- - Expansion: ~2.2GB
1239
- - **Peak**: ~3.2GB
1240
-
1241
- **Optimization**: Models lazy-load and unload after 2min idle.
1242
-
1243
- ### Scalability
1244
-
1245
- **Tested Corpus Sizes**:
1246
- - 100 documents: FTS <10ms, Vector <100ms
1247
- - 1,000 documents: FTS <50ms, Vector <200ms
1248
- - 10,000 documents: FTS <200ms, Vector <500ms
1249
-
1250
- **Bottlenecks**:
1251
- 1. **Embedding generation**: Linear with document count (once)
1252
- 2. **Vector search**: KNN scales log(n) with proper indexing
1253
- 3. **FTS search**: Scales well to millions of documents
1254
- 4. **Reranking**: Linear with candidate count (top 30-40)
1255
-
1256
- **Recommended Limits**:
1257
- - Documents: 50,000+ (tested in production)
1258
- - Per-document size: <10MB (chunking handles larger)
1259
- - Query length: <500 tokens (embedding model limit)
283
+ **Relevance to ClaudeMemory**: If we ever integrate local LLMs for distillation, this session-scoped lifecycle pattern is the right approach. Clean abort propagation via AbortSignal is a good practice for any long-running operation.
1260
284
 
1261
285
  ---
1262
286
 
1263
287
  ## Comparative Analysis
1264
288
 
1265
- ### Data Model Differences
1266
-
1267
- | Dimension | QMD | ClaudeMemory | Analysis |
1268
- |-----------|-----|--------------|----------|
1269
- | **Granularity** | Full markdown documents | Structured facts (triples) | **Different use cases**: QMD = recall, ClaudeMemory = extraction |
1270
- | **Storage** | Content-addressable (SHA256) | Entity-predicate-object | **QMD advantage**: Auto-deduplication. **ClaudeMemory advantage**: Queryable structure |
1271
- | **Retrieval Goal** | "Show me docs about X" | "What do we know about X?" | **Complementary**: QMD finds context, ClaudeMemory distills knowledge |
1272
- | **Truth Model** | All documents valid | Supersession + conflicts | **ClaudeMemory advantage**: Resolves contradictions |
1273
- | **Scope** | YAML collections | Dual-database | **ClaudeMemory advantage**: Clean separation |
1274
-
1275
- **Verdict**: **Different paradigms, not competitors**. QMD optimizes for document recall, ClaudeMemory for knowledge graphs.
289
+ ### What QMD Does Well (New Findings)
1276
290
 
1277
- ### Search Quality
291
+ #### 1. Custom Fine-Tuned Model Pipeline
292
+ - **Description**: Full training pipeline (SFT → GRPO → GGUF conversion) for search-specific model
293
+ - **Evidence**: `finetune/reward.py` — multi-dimensional reward function; `finetune/train.py` — unified training script
294
+ - **Why It Works**: Domain-specific models outperform general-purpose LLMs for structured tasks. The two-stage approach (format learning via SFT, quality refinement via GRPO) is state-of-the-art.
295
+ - **Metric**: Min 92% average score required before deployment
1278
296
 
1279
- | Feature | QMD | ClaudeMemory | Winner |
1280
- |---------|-----|--------------|--------|
1281
- | **Lexical Search** | BM25 (FTS5) | FTS5 | **Tie** |
1282
- | **Vector Search** | EmbeddingGemma (300M) | TF-IDF (lightweight) | **QMD** (but costly) |
1283
- | **Ranking Algorithm** | RRF + position-aware blending | Score sorting | **QMD** |
1284
- | **Reranking** | Cross-encoder LLM | None | **QMD** (but costly) |
1285
- | **Query Expansion** | LLM-generated variants | None | **QMD** (but costly) |
297
+ #### 2. Plugin Distribution
298
+ - **Description**: Ships as a Claude Code marketplace plugin with zero-config MCP + skills
299
+ - **Evidence**: `.claude-plugin/marketplace.json`, `skills/qmd/SKILL.md`
300
+ - **Why It Works**: `claude marketplace add tobi/qmd` is dramatically simpler than manual gem install + MCP config + hook setup
301
+ - **Impact**: Massive UX improvement for installation
1286
302
 
1287
- **Verdict**: **QMD has superior search quality**, but at significant cost (3GB models, 2-3s latency).
303
+ #### 3. Typed Query Routing
304
+ - **Description**: Query expansion produces typed outputs (`lex:`, `vec:`, `hyde:`) routed to appropriate backends
305
+ - **Evidence**: `llm.ts:637-679` — structured prompt; `llm.ts:1006-1013` — grammar constraint
306
+ - **Why It Works**: Different search backends have different strengths. Routing keyword queries to BM25 and semantic queries to vector search maximizes recall.
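A minimal sketch of that routing step (type and function names assumed for illustration):

```typescript
type QueryType = "lex" | "vec" | "hyde";
interface Queryable { type: QueryType; text: string; }

// lex lines go to the BM25/FTS backend; vec and hyde lines are embedded
// and sent to the vector backend.
function routeQueries<R>(
  queryables: Queryable[],
  searchFts: (q: string) => R[],
  searchVector: (q: string) => R[],
): R[][] {
  return queryables.map(q =>
    q.type === "lex" ? searchFts(q.text) : searchVector(q.text)
  );
}
```

Each returned list then feeds into RRF as one ranked list, so a variant that only helps one backend still contributes its signal.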
1288
307
 
1289
- **Key Question**: Is the quality improvement worth the complexity for ClaudeMemory's fact-based use case?
308
+ #### 4. Dual Content/StructuredContent MCP Responses
309
+ - **Description**: Every MCP tool returns both human-readable text summary and machine-parseable JSON
310
+ - **Evidence**: `mcp.ts:288-291` — `return { content: [...], structuredContent: {...} }`
311
+ - **Why It Works**: LLMs can parse both formats, but text summaries are more token-efficient for simple consumption
1290
312
 
1291
- ### Vector Storage
313
+ ### What We Do Well
1292
314
 
1293
- | Aspect | QMD | ClaudeMemory | Winner |
1294
- |--------|-----|--------------|--------|
1295
- | **Storage Format** | sqlite-vec native (vec0) | JSON columns | **QMD** |
1296
- | **KNN Performance** | Native C code | Ruby JSON parsing | **QMD** (10-100x faster) |
1297
- | **Index Type** | Proper vector index | Sequential scan | **QMD** |
1298
- | **Scalability** | Tested to 10,000+ docs | Limited by JSON parsing | **QMD** |
315
+ #### 1. Fact-Based Knowledge Graph
316
+ - Our subject-predicate-object triples enable structured queries and inference
317
+ - Truth maintenance resolves contradictions automatically
318
+ - Far richer than document-level retrieval for knowledge extraction
1299
319
 
1300
- **Verdict**: **QMD's approach is objectively better**. This is a clear adoption opportunity.
320
+ #### 2. Dual-Database Architecture
321
+ - Clean global/project separation without YAML collections
322
+ - Simpler queries, clearer data ownership
1301
323
 
1302
- ### Dependencies
324
+ #### 3. Comprehensive MCP Surface
325
+ - 18 tools vs QMD's 6 — we cover recall, explain, manage, monitor
326
+ - Progressive disclosure (recall_index → recall_details) for token efficiency
1303
327
 
1304
- | Category | QMD | ClaudeMemory | Winner |
1305
- |----------|-----|--------------|--------|
1306
- | **Runtime** | Bun (Node.js compatible) | Ruby 3.2+ | **ClaudeMemory** (simpler) |
1307
- | **Database** | SQLite + sqlite-vec | SQLite | **ClaudeMemory** (fewer deps) |
1308
- | **Embeddings** | EmbeddingGemma (300MB) | TF-IDF (stdlib) | **ClaudeMemory** (lighter) |
1309
- | **LLM** | node-llama-cpp (3GB models) | None (distill only) | **ClaudeMemory** (lighter) |
1310
- | **Install Size** | ~3.5GB (with models) | ~5MB | **ClaudeMemory** |
328
+ #### 4. Lightweight Dependencies
329
+ - ~5MB gem vs ~2GB+ with GGUF models
330
+ - fastembed-rb (67MB ONNX) vs EmbeddingGemma (300MB GGUF)
331
+ - No runtime LLM dependency
1311
332
 
1312
- **Verdict**: **ClaudeMemory is dramatically lighter**, which aligns with our philosophy of pragmatic dependencies.
333
+ #### 5. Robust Benchmarking
334
+ - DevMemBench: 155 queries, Recall@k, MRR, nDCG@10
335
+ - 100 truth maintenance test cases
336
+ - 31 end-to-end scenarios with real Claude
337
+ - QMD has 18 eval queries — our evaluation is more comprehensive
1313
338
 
1314
- ### Offline Capability
339
+ ### Trade-offs
1315
340
 
1316
- | Operation | QMD | ClaudeMemory | Winner |
1317
- |-----------|-----|--------------|--------|
1318
- | **Indexing** | Fully offline | Fully offline | **Tie** |
1319
- | **Searching** | Fully offline | Fully offline (TF-IDF) | **Tie** |
1320
- | **Distillation** | N/A | Requires API | **QMD** (but N/A) |
1321
-
1322
- **Verdict**: **QMD has complete offline capability** for its use case. ClaudeMemory could adopt local embeddings for offline semantic search, but distillation still requires API.
1323
-
1324
- ### Startup Time
1325
-
1326
- | Scenario | QMD | ClaudeMemory | Winner |
1327
- |----------|-----|--------------|--------|
1328
- | **Cold start** | ~2s (model load) | <100ms | **ClaudeMemory** |
1329
- | **Warm start** | <100ms | <100ms | **Tie** |
1330
-
1331
- **Verdict**: **ClaudeMemory starts faster**, which matters for CLI tools. QMD's lazy loading mitigates this.
341
+ | Approach | Pros | Cons | Best For |
342
+ |----------|------|------|----------|
343
+ | **QMD's LLM-powered search** | Better semantic recall, typed query routing | 2GB+ models, 2-3s cold start, complex deps | Large document collections, conceptual search |
344
+ | **Our FastEmbed search** | Lightweight (67MB), fast (<100ms), no LLM | Lower semantic quality for vague queries | Structured fact retrieval, quick lookups |
345
+ | **QMD's plugin distribution** | Zero-config install, marketplace discovery | Requires plugin ecosystem maturity | Wide user adoption |
346
+ | **Our gem + MCP + hooks** | Fine-grained control, works today | Complex setup, multiple config files | Power users, custom integrations |
1332
347
 
1333
348
  ---
1334
349
 
1335
350
  ## Adoption Opportunities
1336
351
 
1337
- ### High Priority (Immediate Adoption)
-
- #### 1. sqlite-vec Extension for Native Vector Storage
-
- **Value**: **10-100x faster KNN queries**, enables larger fact databases without performance degradation.
-
- **QMD Proof**:
- - Handles 10,000+ documents with sub-second vector queries
- - Native C code vs Ruby JSON parsing
- - Proper indexing vs sequential scan
-
- **Current ClaudeMemory**:
- ```ruby
- # lib/claude_memory/embeddings/similarity.rb
- def search_similar(query_embedding, limit: 10)
-   # Load ALL facts with embeddings
-   facts_data = store.facts_with_embeddings(limit: 5000)
-
-   # Parse JSON embeddings (slow!)
-   candidates = facts_data.map do |row|
-     embedding = JSON.parse(row[:embedding_json])
-     { fact_id: row[:id], embedding: embedding }
-   end
-
-   # Calculate cosine similarity in Ruby (slow!)
-   top_matches = candidates.map do |c|
-     similarity = cosine_similarity(query_embedding, c[:embedding])
-     { candidate: c, similarity: similarity }
-   end.sort_by { |m| -m[:similarity] }.take(limit)
- end
- ```
-
- **Problems**:
- - Loads up to 5000 facts into memory
- - JSON parsing overhead per fact
- - O(n) similarity calculation in Ruby
- - No proper indexing
-
- **With sqlite-vec**:
- ```ruby
- # Step 1: Create virtual table (migration v7)
- db.run(<<~SQL)
-   CREATE VIRTUAL TABLE facts_vec USING vec0(
-     fact_id INTEGER PRIMARY KEY,
-     embedding float[384] distance_metric=cosine
-   )
- SQL
-
- # Step 2: Query with native KNN (two-step to avoid JOIN hang)
- def search_similar(query_embedding, limit: 10)
-   vector_blob = query_embedding.pack('f*') # Float32Array
-
-   # Step 2a: Get fact IDs from vec table (no JOINs!)
-   vec_results = @store.db[<<~SQL, vector_blob, limit * 3].all
-     SELECT fact_id, distance
-     FROM facts_vec
-     WHERE embedding MATCH ? AND k = ?
-   SQL
-
-   # Step 2b: Join with facts table separately
-   fact_ids = vec_results.map { |r| r[:fact_id] }
-   facts = @store.facts.where(id: fact_ids).all
-
-   # Merge and sort
-   facts.map do |fact|
-     distance = vec_results.find { |r| r[:fact_id] == fact[:id] }[:distance]
-     { fact: fact, similarity: 1 - distance }
-   end.sort_by { |r| -r[:similarity] }
- end
- ```
-
- **Benefits**:
- - **10-100x faster**: Native C code
- - **Better memory**: No need to load all facts
- - **Scales**: Handles 50,000+ facts easily
- - **Industry standard**: Used by Chroma, LanceDB, etc.
-
- **Implementation**:
- 1. Add sqlite-vec extension (gem or FFI)
- 2. Schema migration v7: Create `facts_vec` virtual table
- 3. Backfill existing embeddings
- 4. Update Similarity class
- 5. Test migration on existing databases
-
- **Trade-off**: Adds native dependency, but well-maintained and cross-platform.
-
- **Recommendation**: **ADOPT IMMEDIATELY**. This is a foundational improvement.
-
- ---
-
- #### 2. ⭐ Reciprocal Rank Fusion (RRF) Algorithm
-
- **Value**: **50% improvement in Hit@3** for medium-difficulty queries (QMD evaluation).
-
- **QMD Proof**: Evaluation shows consistent improvements across all query types.
-
- **Current ClaudeMemory**:
- ```ruby
- # lib/claude_memory/recall.rb
- def merge_search_results(vector_results, text_results, limit)
-   # Simple dedupe: add all results, prefer vector scores
-   combined = {}
-
-   vector_results.each { |r| combined[r[:fact][:id]] = r }
-   text_results.each { |r| combined[r[:fact][:id]] ||= r }
-
-   # Sort by similarity (vector) or default score (FTS)
-   combined.values
-     .sort_by { |r| -(r[:similarity] || 0) }
-     .take(limit)
- end
- ```
-
- **Problems**:
- - No fusion of ranking signals
- - Vector scores dominate (when present)
- - Doesn't boost items appearing in multiple result lists
- - Ignores rank position (only final scores)
-
- **With RRF**:
- ```ruby
- # lib/claude_memory/recall/rrf_fusion.rb
- module ClaudeMemory
-   module Recall
-     class RRFusion
-       DEFAULT_K = 60
-
-       def self.fuse(ranked_lists, weights: [], k: DEFAULT_K)
-         scores = {}
-
-         # Accumulate RRF scores
-         ranked_lists.each_with_index do |list, list_idx|
-           weight = weights[list_idx] || 1.0
-
-           list.each_with_index do |item, rank|
-             key = item_key(item)
-             rrf_contribution = weight / (k + rank + 1.0)
-
-             if scores.key?(key)
-               scores[key][:rrf_score] += rrf_contribution
-               scores[key][:top_rank] = [scores[key][:top_rank], rank].min
-             else
-               scores[key] = {
-                 item: item,
-                 rrf_score: rrf_contribution,
-                 top_rank: rank
-               }
-             end
-           end
-         end
-
-         # Top-rank bonus
-         scores.each_value do |entry|
-           if entry[:top_rank] == 0
-             entry[:rrf_score] += 0.05 # #1 in any list
-           elsif entry[:top_rank] <= 2
-             entry[:rrf_score] += 0.02 # #2-3 in any list
-           end
-         end
-
-         # Sort and return
-         scores.values
-           .sort_by { |e| -e[:rrf_score] }
-           .map { |e| e[:item].merge(rrf_score: e[:rrf_score]) }
-       end
-
-       def self.item_key(item)
-         # Dedupe by fact signature
-         fact = item[:fact]
-         "#{fact[:subject_name]}:#{fact[:predicate]}:#{fact[:object_literal]}"
-       end
-       # Note: a bare `private` has no effect on `def self.` methods
-       private_class_method :item_key
-     end
-   end
- end
- ```
-
- **Benefits**:
- - **Mathematically sound**: Well-studied in IR literature
- - **Handles score scale differences**: BM25 vs cosine similarity
- - **Boosts multi-method matches**: Items in both lists get higher scores
- - **Preserves exact matches**: Top-rank bonus keeps strong signals at top
- - **Pure algorithm**: No dependencies, fast (<10ms)
-
- **Implementation**:
- 1. Create `lib/claude_memory/recall/rrf_fusion.rb`
- 2. Update `Recall#query_semantic_dual` to use RRF
- 3. Test with synthetic ranked lists
- 4. Validate improvements with eval suite (if we create one)
-
- **Trade-off**: Slightly more complex than naive merging, but well worth it.
-
- **Recommendation**: **ADOPT IMMEDIATELY**. Pure algorithmic improvement with proven results.
-
- ---
-
- #### 3. ⭐ Docid Short Hash System
-
- **Value**: **Better UX**, enables cross-database references without context.
-
- **QMD Implementation**:
- ```typescript
- // Generate 6-character docid from content hash
- function getDocid(hash: string): string {
-   return hash.slice(0, 6); // First 6 chars
- }
-
- // Use in output
- {
-   docid: `#${getDocid(row.hash)}`,
-   file: row.path,
-   // ...
- }
-
- // Retrieval
- qmd get "#abc123" // Works!
- qmd get "abc123"  // Also works!
- ```
-
- **Current ClaudeMemory**:
- ```ruby
- # Facts referenced by integer IDs
- claude-memory explain 42 # Which database? Which project?
- ```
-
- **Problems**:
- - Integer IDs are database-specific (global vs project)
- - Not user-friendly
- - No quick reference format
-
- **With Docids**:
- ```ruby
- # Migration v8: Add docid column
- def migrate_to_v8_safe!
-   @db.transaction do
-     @db.alter_table(:facts) do
-       add_column :docid, String, size: 8
-       add_index :docid, unique: true
-     end
-
-     # Backfill docids
-     @db[:facts].each do |fact|
-       signature = "#{fact[:id]}:#{fact[:subject_entity_id]}:#{fact[:predicate]}:#{fact[:object_literal]}"
-       hash = Digest::SHA256.hexdigest(signature)
-       docid = hash[0...8] # 8 chars for lower collision risk
-
-       # Handle collisions (rare with 8 chars)
-       while @db[:facts].where(docid: docid).count > 0
-         hash = Digest::SHA256.hexdigest(hash + rand.to_s)
-         docid = hash[0...8]
-       end
-
-       @db[:facts].where(id: fact[:id]).update(docid: docid)
-     end
-   end
- end
-
- # Usage
- claude-memory explain abc123  # Works across databases!
- claude-memory explain #abc123 # Also works!
-
- # Output formatting
- puts "Fact ##{fact[:docid]}: #{fact[:subject_name]} #{fact[:predicate]} ..."
- ```
-
- **Benefits**:
- - **Database-agnostic**: Same reference works for global/project facts
- - **User-friendly**: `#abc123` is memorable and shareable
- - **Standard pattern**: Git uses short SHAs, QMD uses short hashes
-
- **Implementation**:
- 1. Schema migration v8: Add `docid` column
- 2. Backfill existing facts
- 3. Update CLI commands to accept docids
- 4. Update MCP tools to accept docids
- 5. Update output formatting to show docids
-
- **Trade-off**:
- - Hash collisions possible (8 chars = 1 in 4.3 billion, very rare)
- - Migration backfills existing facts (one-time cost)
-
- **Recommendation**: **ADOPT IN PHASE 3**. Clear UX improvement with minimal cost.
-
- ---
-
- #### 4. ⭐ Smart Expansion Detection
-
- **Value**: **Skip unnecessary vector search** when FTS finds an exact match, saving 200-500ms per query.
-
- **QMD Implementation**:
- ```typescript
- // Check if BM25 has strong, clear top result
- const topScore = initialFts[0]?.score ?? 0;
- const secondScore = initialFts[1]?.score ?? 0;
- const hasStrongSignal =
-   initialFts.length > 0 &&
-   topScore >= 0.85 &&
-   (topScore - secondScore) >= 0.15;
-
- if (hasStrongSignal) {
-   // Skip expensive vector search and LLM operations
-   return initialFts.slice(0, limit);
- }
- ```
-
- **QMD Data**: Saves 2-3 seconds on ~60% of queries (exact keyword matches).
-
- **Current ClaudeMemory**:
- ```ruby
- # Always run both FTS and vector search
- def query_semantic_dual(text, limit:, scope:, mode:)
-   fts_results = collect_fts_results(...)
-   vec_results = query_vector_stores(...) # Always runs
-
-   RRFusion.fuse([fts_results, vec_results])
- end
- ```
-
- **With Smart Detection**:
- ```ruby
- # lib/claude_memory/recall/expansion_detector.rb
- module ClaudeMemory
-   module Recall
-     class ExpansionDetector
-       STRONG_SCORE_THRESHOLD = 0.85
-       STRONG_GAP_THRESHOLD = 0.15
-
-       def self.should_skip_expansion?(results)
-         return false if results.empty? || results.size < 2
-
-         top_score = results[0][:score] || 0
-         second_score = results[1][:score] || 0
-         gap = top_score - second_score
-
-         top_score >= STRONG_SCORE_THRESHOLD &&
-           gap >= STRONG_GAP_THRESHOLD
-       end
-     end
-   end
- end
-
- # Apply in Recall
- def query_semantic_dual(text, limit:, scope:, mode:)
-   # First try FTS
-   fts_results = collect_fts_results(text, limit: limit * 2, scope: scope)
-
-   # Check if we can skip vector search
-   if mode == :both && ExpansionDetector.should_skip_expansion?(fts_results)
-     return fts_results.first(limit) # Strong FTS signal
-   end
-
-   # Weak signal - proceed with vector search and fusion
-   vec_results = query_vector_stores(text, limit: limit * 2, scope: scope)
-   RRFusion.fuse([fts_results, vec_results], weights: [1.0, 1.0]).first(limit)
- end
- ```
-
- **Benefits**:
- - **Performance optimization**: Avoids unnecessary vector search
- - **Simple heuristic**: Well-tested thresholds from QMD
- - **Transparent**: Can log when skipping for metrics
- - **No false negatives**: Only skips when FTS is very confident
-
- **Implementation**:
- 1. Create `lib/claude_memory/recall/expansion_detector.rb`
- 2. Update `Recall#query_semantic_dual` to use detector
- 3. Test with known exact-match queries
- 4. Add optional metrics tracking
-
- **Trade-off**: May miss semantically similar results for exact matches (acceptable).
-
- **Recommendation**: **ADOPT IN PHASE 4**. Clear performance win with minimal code.
-
- ---
-
- ### Medium Priority (Valuable but Higher Cost)
-
- #### 5. Document Chunking Strategy
-
- **Value**: Better embeddings for long transcripts (>3000 chars).
-
- **QMD Approach**:
- - 800 tokens max, 15% overlap
- - Semantic boundary detection
- - Both token-based and char-based variants
-
- **Current ClaudeMemory**: Embeds entire fact text (typically short).
-
- **When Needed**: If users have very long transcripts that produce multi-paragraph facts.
-
- **Recommendation**: **CONSIDER** if we see performance issues with long content.
-
- ---
-
- #### 6. LLM Response Caching
-
- **Value**: Reduce API costs for repeated distillation.
-
- **QMD Proof**: Caches query expansion and reranking, achieves ~80% cache hit rate.
-
- **Implementation**:
- ```ruby
- # lib/claude_memory/distill/cache.rb
- class DistillerCache
-   def initialize(store)
-     @store = store
-   end
-
-   def fetch(content_hash)
-     @store.db[:llm_cache].where(hash: content_hash).first&.dig(:result)
-   end
-
-   def store(content_hash, result)
-     @store.db[:llm_cache].insert_or_replace(
-       hash: content_hash,
-       result: result.to_json,
-       created_at: Time.now.iso8601
-     )
-
-     # Probabilistic cleanup (1% chance)
-     cleanup_if_needed if rand < 0.01
-   end
-
-   private
-
-   def cleanup_if_needed
-     @store.db.transaction do
-       @store.db.run(<<~SQL)
-         DELETE FROM llm_cache
-         WHERE hash NOT IN (
-           SELECT hash FROM llm_cache
-           ORDER BY created_at DESC
-           LIMIT 1000
-         )
-       SQL
-     end
-   end
- end
- ```
-
- **Recommendation**: **ADOPT when distiller is fully implemented**. Clear cost savings.
-
- ---
-
- ### Low Priority (Interesting but Not Critical)
-
- #### 7. Enhanced Snippet Extraction
-
- **Value**: Better search result previews with query term highlighting.
-
- **QMD Approach**:
- ```typescript
- function extractSnippet(body: string, query: string, maxLen = 500) {
-   const terms = query.toLowerCase().split(/\s+/);
-
-   // Find line with most query term matches
-   const lines = body.split('\n');
-   let bestLine = 0, bestScore = -1;
-
-   for (let i = 0; i < lines.length; i++) {
-     const lineLower = lines[i].toLowerCase();
-     const score = terms.filter(t => lineLower.includes(t)).length;
-     if (score > bestScore) {
-       bestScore = score;
-       bestLine = i;
-     }
-   }
-
-   // Extract context (1 line before, 2 lines after)
-   const start = Math.max(0, bestLine - 1);
-   const end = Math.min(lines.length, bestLine + 3);
-   const snippet = lines.slice(start, end).join('\n');
-
-   return {
-     line: bestLine + 1,
-     snippet: snippet.substring(0, maxLen),
-     linesBefore: start,
-     linesAfter: lines.length - end
-   };
- }
- ```
-
- **Recommendation**: **CONSIDER for better UX** in search results.
-
- ---
-
- ### Features NOT to Adopt
-
- #### ❌ YAML Collection System
-
- **QMD Use**: Manages multi-directory indexing with per-path contexts.
-
- **Our Use**: Dual-database (global + project) already provides clean separation.
-
- **Mismatch**: Collections add complexity without clear benefit for our use case.
-
- **Recommendation**: **REJECT** - Our dual-DB approach is simpler and better suited.
-
- ---
-
- #### ❌ Content-Addressable Document Storage
-
- **QMD Use**: Deduplicates full markdown documents by SHA256 hash.
-
- **Our Use**: Facts are deduplicated by semantic signature, not content hash.
-
- **Mismatch**: We don't store full documents; we extract facts.
-
- **Recommendation**: **REJECT** - Different data model.
-
- ---
-
- #### ❌ Virtual Path System (qmd://collection/path)
-
- **QMD Use**: Unified namespace across multiple collections.
-
- **Our Use**: Dual-database provides a clear namespace (global vs project).
-
- **Mismatch**: Adds complexity for no clear benefit.
-
- **Recommendation**: **REJECT** - Unnecessary abstraction.
-
- ---
-
- #### ❌ Neural Embeddings (EmbeddingGemma)
-
- **QMD Use**: 300M parameter model for high-quality semantic search.
-
- **Our Use**: TF-IDF (lightweight, no dependencies).
-
- **Trade-off**:
- - ✅ Better quality (+40% Hit@3 over TF-IDF)
- - ❌ 300MB download
- - ❌ 300MB VRAM
- - ❌ 2s cold start latency
- - ❌ Complex dependency (node-llama-cpp or similar)
-
- **Decision**: **DEFER** - TF-IDF sufficient for now. Revisit if users report poor semantic search quality.
-
- ---
-
- #### ❌ Cross-Encoder Reranking
-
- **QMD Use**: LLM scores query-document relevance for final ranking.
-
- **Our Use**: None (just use retrieval scores).
-
- **Trade-off**:
- - ✅ Better precision (elevates semantically relevant results)
- - ❌ 640MB model
- - ❌ 400ms latency per query
- - ❌ Complex dependency
-
- **Decision**: **REJECT** - Over-engineering for fact retrieval. Facts are already structured; reranking is overkill.
-
- ---
-
- #### ❌ Query Expansion (LLM)
-
- **QMD Use**: Generates alternative query phrasings for better recall.
-
- **Our Use**: None (single query only).
-
- **Trade-off**:
- - ✅ Better recall (finds documents with different terminology)
- - ❌ 2.2GB model
- - ❌ 800ms latency per query
- - ❌ Complex dependency
-
- **Decision**: **REJECT** - We don't have an LLM in the recall path (only in distill). Adding an LLM dependency for recall is too heavy.
+ ### High Priority
+
+ #### 1. Claude Code Plugin Distribution Format ⭐ NEW
+ - **Value**: 10x easier installation (single command vs multi-step gem + MCP + hook config)
+ - **Evidence**: `.claude-plugin/marketplace.json` — complete plugin spec; `skills/qmd/SKILL.md` — skill definition with tool scoping
+ - **Implementation**: Create `.claude-plugin/marketplace.json` with `mcpServers` pointing to `claude-memory serve-mcp`, a skill definition from existing MCP tools, and `allowed-tools: mcp__claude-memory__*`
+ - **Effort**: 2-3 days (plugin metadata, skill definition, testing, documentation)
+ - **Trade-off**: Depends on Claude Code plugin ecosystem maturity; current hooks integration may still be needed
+ - **Recommendation**: **ADOPT** — QMD proves the format works. Start with a plugin skeleton and iterate as the ecosystem matures
+ - **Integration Points**: New `.claude-plugin/` directory, `skills/` directory, update installation docs
+
+ #### 2. MCP Structured Content Pattern ⭐ NEW
+ - **Value**: Better MCP response quality — dual human-readable + machine-parseable output
+ - **Evidence**: `mcp.ts:288-291` — `{ content: [{ type: "text", text: summary }], structuredContent: { results } }`
+ - **Implementation**: Update all 18 MCP tool handlers to return both `content` (text summary) and `structuredContent` (JSON). Text content would be a concise summary; structured content preserves full data.
+ - **Effort**: 1-2 days (update tool handlers, update tests)
+ - **Trade-off**: Slightly more code per tool handler; may need to verify the Claude Code MCP client supports `structuredContent`
+ - **Recommendation**: **ADOPT** — Pure improvement, no downside if the client supports it
+ - **Integration Points**: `lib/claude_memory/mcp/server.rb`, all tool handler methods
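A rough sketch of the dual-format shape in Ruby. The `build_tool_response` helper and the result fields are hypothetical, not ClaudeMemory's actual handler API; only the `content`/`structuredContent` keys come from the MCP pattern described above:

```ruby
# Hypothetical handler helper (NOT ClaudeMemory's real API): shows the
# dual-format response shape from the MCP structured-content pattern.
def build_tool_response(results)
  # Concise human/LLM-readable summary — cheap to consume
  summary = "Found #{results.size} fact(s): " +
            results.map { |r| r[:subject] }.join(", ")

  {
    content: [{ type: "text", text: summary }], # text summary
    structuredContent: { results: results }     # full machine-parseable data
  }
end

build_tool_response([{ subject: "ClaudeMemory", predicate: "uses", object: "SQLite" }])
```

Each tool handler would wrap its existing result this way; clients that ignore `structuredContent` still get a usable text answer.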
+
+ #### 3. MCP Registered Prompt for Query Guidance ⭐ NEW
+ - **Value**: Claude uses memory tools more effectively with an embedded search strategy
+ - **Evidence**: `mcp.ts:172-252` — registered prompt explaining when to use recall vs recall_semantic vs search_concepts
+ - **Implementation**: Register a `memory_guide` prompt in our MCP server explaining tool selection strategy (recall for keywords, recall_semantic for concepts, search_concepts for multi-faceted queries, explain for provenance)
+ - **Effort**: 4-6 hours (write prompt, register in server, test)
+ - **Trade-off**: Minimal; the prompt is only loaded on request
+ - **Recommendation**: **ADOPT** — Simple way to improve tool usage quality
+ - **Integration Points**: `lib/claude_memory/mcp/server.rb`
+
+ #### 4. Inline Status Check in Skills ⭐ NEW
+ - **Value**: Immediate feedback on memory system health when the skill loads
+ - **Evidence**: `SKILL.md:18` — `!` prefix runs `qmd status 2>/dev/null || echo "Not installed"`
+ - **Implementation**: Add an inline check to our skill definition: `!claude-memory doctor --brief 2>/dev/null || echo "Not configured. Run: gem install claude_memory"`
+ - **Effort**: 1-2 hours
+ - **Trade-off**: None
+ - **Recommendation**: **ADOPT** — Trivial improvement with clear benefit
+ - **Integration Points**: Skill definition file
+
+ ### Previously Identified (Carried Forward)
+
+ These items from the 2026-01-26 analysis remain relevant:
+
+ #### 5. Native Vector Storage (sqlite-vec) — STILL CRITICAL
+ - **Value**: 10-100x faster KNN queries
+ - **Status**: Not yet implemented in ClaudeMemory
+ - **Updated Evidence**: QMD now handles 10,000+ documents in production (5,700+ star project)
+ - **Recommendation**: **ADOPT IMMEDIATELY** — Foundational improvement
+
+ #### 6. Reciprocal Rank Fusion (RRF) Algorithm — STILL HIGH VALUE
+ - **Value**: 50% improvement in Hit@3 for medium-difficulty queries
+ - **Status**: Not yet implemented in ClaudeMemory
+ - **Recommendation**: **ADOPT IMMEDIATELY** — Pure algorithmic improvement
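The core of RRF is small enough to sketch inline. This illustrative version fuses plain ID lists with the standard `1 / (k + rank)` contribution and omits the per-list weights and top-rank bonus a production class might add:

```ruby
# Illustrative RRF core (not the shipped class): each list contributes
# 1 / (k + rank + 1) per item; k = 60 is the conventional constant.
def rrf_fuse(ranked_lists, k: 60)
  scores = Hash.new(0.0)
  ranked_lists.each do |list|
    list.each_with_index do |id, rank|
      scores[id] += 1.0 / (k + rank + 1)
    end
  end
  # Highest fused score first
  scores.sort_by { |_, score| -score }.map(&:first)
end

# An item ranked second in BOTH lists beats items that top only one list:
fts = [:a, :b, :c, :d]  # lexical (BM25) ranking
vec = [:e, :b, :c, :f]  # vector ranking
rrf_fuse([fts, vec]).first # => :b
```

Because only ranks matter, BM25 and cosine scores never need to be normalized against each other — that is the whole appeal of RRF for hybrid search.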
+
+ #### 7. Docid Short Hash System — STILL MEDIUM VALUE
+ - **Value**: Better UX, cross-database fact references
+ - **Status**: Not yet implemented
+ - **Recommendation**: **ADOPT IN PHASE 2**
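The derivation itself is one line; a sketch (the signature fields here are assumptions, and a real insert would still check the unique index and re-hash on collision):

```ruby
require "digest"

# Illustrative only: derive a short, database-agnostic docid from a
# fact's semantic signature. 8 hex chars give ~4.3 billion values,
# so collisions are rare but must still be handled on insert.
def docid_for(subject, predicate, object)
  Digest::SHA256.hexdigest("#{subject}:#{predicate}:#{object}")[0, 8]
end

docid_for("ClaudeMemory", "written_in", "Ruby") # stable 8-char reference
```

The same triple always yields the same docid, which is what makes references portable across the global and project databases.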
+
+ #### 8. Smart Expansion Detection — STILL MEDIUM VALUE
+ - **Value**: Skip unnecessary vector search when FTS has a strong signal
+ - **Status**: Not yet implemented
+ - **Recommendation**: **ADOPT IN PHASE 3**
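The heuristic is tiny; a sketch using the thresholds QMD reports (0.85 top score, 0.15 gap), which would need tuning against our own score distribution:

```ruby
# Sketch of the strong-signal check: skip the vector leg when the top
# FTS score is high AND clearly separated from the runner-up.
# Thresholds are QMD's reported values, not tuned for ClaudeMemory.
def skip_vector_search?(scores, top_threshold: 0.85, gap_threshold: 0.15)
  return false if scores.size < 2

  scores[0] >= top_threshold && (scores[0] - scores[1]) >= gap_threshold
end

skip_vector_search?([0.92, 0.40]) # => true  (exact keyword hit, FTS only)
skip_vector_search?([0.92, 0.85]) # => false (ambiguous, fuse both backends)
```

Requiring both a high absolute score and a clear gap is what keeps false positives low: an ambiguous top-2 still falls through to hybrid fusion.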
+
+ ### Medium Priority
+
+ #### 9. Skill Definition with Tool Scoping
+ - **Value**: Security and UX — limit tool access to memory-related commands
+ - **Evidence**: `SKILL.md:9` — `allowed-tools: Bash(qmd:*), mcp__qmd__*`
+ - **Implementation**: Define the skill with `allowed-tools: Bash(claude-memory:*), mcp__claude-memory__*`
+ - **Effort**: Included in plugin distribution work
+ - **Recommendation**: **CONSIDER** — Good practice for plugin security
+ - **Integration Points**: Skills directory
+
+ #### 10. Evaluation Harness Improvements
+ - **Value**: QMD's eval structure with difficulty levels and Hit@K metrics is cleaner
+ - **Evidence**: `test/eval-harness.ts:11-16` — typed queries with difficulty + description
+ - **Implementation**: We already have DevMemBench (more comprehensive); we could adopt difficulty classification.
+ - **Recommendation**: **CONSIDER** — Our evals are already better; could add difficulty labels
+
+ ### Low Priority
+
+ #### 11. YAML-Based Collection Configuration
+ - **Value**: User-editable config for what gets indexed
+ - **Evidence**: `collections.ts`, `example-index.yml`
+ - **Recommendation**: **REJECT** — Our dual-database provides cleaner separation
+
+ #### 12. Custom Query Expansion Model
+ - **Value**: Better search recall via ML-powered query rewriting
+ - **Evidence**: `finetune/` — complete training pipeline
+ - **Recommendation**: **REJECT** — Too heavy (1.7B model) for our fact retrieval use case. If we need expansion, we can leverage Claude's own capabilities during recall.
+
+ #### 13. LLM-Based Reranking
+ - **Value**: Better ranking precision
+ - **Recommendation**: **REJECT** — Over-engineering for structured fact retrieval
+
+ ### Features to Avoid
+
+ #### 1. Heavy Local LLM Dependencies
+ - **What It Is**: Three GGUF models totaling ~2GB for search operations
+ - **Why Avoid**: ClaudeMemory targets lightweight, instant search. A 2-3s cold start and 3GB of memory are inappropriate for a fact lookup tool.
+ - **Our Alternative**: FastEmbed (67MB ONNX, <100ms) provides adequate semantic search for structured facts.
+
+ #### 2. Content-Addressable Document Storage
+ - **What It Is**: SHA256 hash-based deduplication of full documents
+ - **Why Avoid**: We store facts, not documents. Our deduplication is by fact signature.
+ - **Our Alternative**: Existing fact signature-based deduplication.
1908
458
 
1909
459
  ---
1910
460
 
1911
461
  ## Implementation Recommendations
1912
462
 
1913
- ### Phased Adoption Strategy
463
+ ### Phase 1: Plugin Foundation (NEW)
1914
464
 
1915
- #### Phase 1: Vector Storage Foundation (IMMEDIATE)
1916
-
1917
- **Goal**: Adopt sqlite-vec and RRF fusion for performance and quality.
465
+ **Goals**: Establish ClaudeMemory as a Claude Code plugin with improved MCP output
1918
466
 
1919
467
  **Tasks**:
1920
- 1. Add sqlite-vec extension support (gem or FFI)
1921
- 2. Create schema migration v7 for `facts_vec` virtual table
1922
- 3. Backfill existing embeddings (one-time migration)
1923
- 4. Update `Embeddings::Similarity` class for native KNN
1924
- 5. Implement `Recall::RRFusion` class
1925
- 6. Update `Recall#query_semantic_dual` to use RRF
1926
- 7. Test migration on existing databases
1927
- 8. Document extension installation in README
1928
-
1929
- **Expected Impact**:
1930
- - 10-100x faster vector search
1931
- - 50% better hybrid search quality (Hit@3)
1932
- - Scales to 50,000+ facts
1933
-
1934
- **Effort**: 2-3 days
468
+ - [ ] Create `.claude-plugin/marketplace.json` with plugin metadata
469
+ - [ ] Create skill definition with tool scoping and inline health check
470
+ - [ ] Add MCP structured content pattern to all 18 tool handlers
471
+ - [ ] Register query guidance prompt in MCP server
472
+ - [ ] Test plugin installation workflow
473
+ - [ ] Update installation docs
1935
474
 
1936
- ---
475
+ **Success Criteria**:
476
+ - ClaudeMemory installable via `claude plugin add`
477
+ - MCP tools return both text summaries and structured JSON
478
+ - Query guide prompt available via MCP
1937
479
 
1938
- #### Phase 2: UX Improvements (NEAR-TERM)
1939
-
1940
- **Goal**: Adopt docid hashes and smart detection for better UX and performance.
1941
-
1942
- **Tasks**:
1943
- 1. Create schema migration v8 for `docid` column
1944
- 2. Backfill existing facts with docids
1945
- 3. Update CLI commands (`ExplainCommand`, `RecallCommand`) to accept docids
1946
- 4. Update MCP tools to accept docids
1947
- 5. Update output formatting to show docids
1948
- 6. Implement `Recall::ExpansionDetector` class
1949
- 7. Update `Recall#query_semantic_dual` to use detector
1950
- 8. Add optional metrics tracking (skip rate, avg latency)
1951
-
1952
- **Expected Impact**:
1953
- - Better UX (human-friendly fact references)
1954
- - 200-500ms latency reduction on exact matches
1955
- - Cross-database references without context
1956
-
1957
- **Effort**: 1-2 days
480
+ **Risks**: Plugin ecosystem may change; maintain backward compatibility with manual setup
1958
481
 
1959
482
  ---
1960
483
 
1961
- #### Phase 3: Caching and Optimization (FUTURE)
+ ### Phase 2: Vector Storage Upgrade (CARRIED FORWARD)
 
- **Goal**: Reduce API costs and optimize for long content.
+ **Goals**: Adopt sqlite-vec for native KNN and RRF fusion for search quality
 
  **Tasks**:
- 1. Add `llm_cache` table to schema
- 2. Implement `Distill::Cache` class
- 3. Update `Distill::Distiller` to use cache
- 4. Add probabilistic cleanup (1% chance per distill)
- 5. Evaluate document chunking for long transcripts
- 6. Implement chunking strategy if needed
+ - [ ] Add sqlite-vec extension support
+ - [ ] Schema migration for `facts_vec` virtual table (two-step query pattern)
+ - [ ] Implement `Recall::RRFusion` class
+ - [ ] Backfill existing embeddings
+ - [ ] Benchmark: target 10x KNN improvement
 
- **Expected Impact**:
- - Reduced API costs (80% cache hit rate expected)
- - Better handling of long transcripts (if needed)
-
- **Effort**: 2-3 days
-
- ---
-
- ### Testing Strategy
-
- **Unit Tests**:
- - RRFusion algorithm with synthetic ranked lists
- - ExpansionDetector with various score distributions
- - Docid generation and collision handling
- - sqlite-vec migration (up and down)
-
- **Integration Tests**:
- - End-to-end hybrid search with RRF fusion
- - Cross-database docid lookups
- - Cache hit/miss behavior
- - Smart detection skip rate
-
- **Evaluation Suite** (optional but recommended):
- - Create synthetic fact corpus with known relationships
- - Define easy/medium/hard recall queries
- - Measure Hit@K before/after RRF adoption
- - Track latency improvements from smart detection
-
- **Performance Tests**:
- - Benchmark vector search: JSON vs sqlite-vec
- - Measure RRF overhead (<10ms expected)
- - Profile smart detection accuracy
+ **Success Criteria**:
+ - Vector search uses native sqlite-vec
+ - RRF fusion active for hybrid queries
+ - DevMemBench shows improved retrieval metrics
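
The `Recall::RRFusion` class planned above is not shown in this diff, but reciprocal rank fusion itself is a small, dependency-free algorithm. A minimal sketch (module and method names here are illustrative, not the gem's actual API):

```ruby
# Reciprocal Rank Fusion (Cormack et al., 2009): merge several ranked lists
# (e.g. FTS hits and vector hits) into one ranking. Each list contributes
# 1.0 / (k + rank) per item; the constant k (conventionally 60) damps the
# influence of any single list's top ranks.
module RRFusion
  K = 60

  # ranked_lists: arrays of fact ids, best-first. Returns fused ids, best-first.
  def self.fuse(*ranked_lists, k: K)
    scores = Hash.new(0.0)
    ranked_lists.each do |list|
      list.each_with_index do |id, index|
        scores[id] += 1.0 / (k + index + 1) # ranks are 1-based
      end
    end
    scores.sort_by { |_id, score| -score }.map(&:first)
  end
end

fts_hits    = %w[f3 f1 f7] # lexical ranking
vector_hits = %w[f1 f9 f3] # semantic ranking
fused = RRFusion.fuse(fts_hits, vector_hits)
# f1 ranks highly in both lists, so it wins the fused ranking
```

Because only ranks are used, RRF needs no score normalization between the lexical and vector backends, which is what makes it attractive as a pure post-processing step.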
 
 ---
 
- ### Migration Safety
+ ### Phase 3: UX Polish (CARRIED FORWARD)
 
- **Schema Migrations**:
- - Always use transactions for atomicity
- - Provide rollback path (down migration)
- - Test on copy of production database first
- - Backup before running migrations
+ **Goals**: Docid hashes and smart expansion detection
 
- **Backfill Strategy**:
- - Run backfill in batches (1000 facts at a time)
- - Add progress reporting for long operations
- - Handle errors gracefully (skip + log)
-
- **Rollback Plan**:
- - Keep JSON embeddings column until v7 is stable
- - Provide `migrate_down_to_v6` method
- - Document rollback procedure in CHANGELOG
+ **Tasks**:
+ - [ ] Schema migration for `docid` column (8-char hash)
+ - [ ] Implement `Recall::ExpansionDetector`
+ - [ ] Update CLI and MCP tools for docid support
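
The 8-char docid in the tasks above can be derived Git-style from a stable hash of the fact's canonical triple. A sketch, assuming SHA-256 over the triple as the hash input (the real migration may choose a different input or collision policy):

```ruby
require "digest"

# Derive a short, human-friendly docid for a fact. Eight hex chars (32 bits)
# keeps references compact; a production migration should still detect
# collisions on backfill and, for example, lengthen the hash for the
# colliding fact.
def docid_for(subject, predicate, object)
  # Join with a unit separator so ("a", "bc") and ("ab", "c") hash differently.
  Digest::SHA256.hexdigest([subject, predicate, object].join("\x1f"))[0, 8]
end

id = docid_for("app", "uses_db", "postgres")
# deterministic: the same triple always yields the same docid
```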
 
 ---
 
 ## Architecture Decisions
 
- ### Preserve Our Unique Advantages
-
- **1. Fact-Based Knowledge Graph**
-
- **What**: Subject-predicate-object triples vs full document storage.
-
- **Why Keep**:
- - Enables structured queries ("What databases does X use?")
- - Supports inference (supersession, conflicts)
- - More precise than document-level retrieval
-
- **Don't Adopt**: QMD's document-centric model.
-
- ---
-
- **2. Truth Maintenance System**
+ ### What to Preserve
 
- **What**: Supersession, conflict detection, predicate policies.
+ - **Fact-Based Knowledge Graph**: Our structured triples are fundamentally different from (and better suited for knowledge extraction than) QMD's document storage
+ - **Truth Maintenance**: Supersession + conflict resolution is a core differentiator
+ - **Dual-Database Architecture**: Cleaner than YAML collections for our use case
+ - **Lightweight Dependencies**: Ruby gem + ONNX embeddings vs 2GB+ GGUF models
 
- **Why Keep**:
- - Resolves contradictions automatically
- - Distinguishes single-value vs multi-value predicates
- - Provides evidence chain via provenance
+ ### What to Adopt (NEW)
 
- **Don't Adopt**: QMD's "all documents valid" model.
-
- ---
+ - **Plugin Distribution Format**: `.claude-plugin/marketplace.json` + skills for frictionless installation
+ - **Structured MCP Content**: Dual `content`/`structuredContent` responses for all tools
+ - **MCP Query Guide Prompt**: Registered prompt teaching Claude how to use memory tools effectively
+ - **Inline Status Checks**: Skill-level health verification on load
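
The dual `content`/`structuredContent` pattern above is simple to picture: every tool result carries a prose summary for display plus machine-readable data. The field names follow the MCP spec; the helper below is an illustrative sketch, not ClaudeMemory's actual code:

```ruby
# Build an MCP tool result carrying both a human-readable text block and
# structured JSON data, so Claude can quote the summary or consume the fields.
def tool_result(summary, data)
  {
    "content" => [{ "type" => "text", "text" => summary }],
    "structuredContent" => data
  }
end

result = tool_result(
  "Found 1 fact matching 'database'",
  { "facts" => [{ "docid" => "abc123de", "subject" => "app",
                  "predicate" => "uses_db", "object" => "postgres" }] }
)
```

Text-only clients can ignore `structuredContent`, so the pattern is backward compatible with plain-text tool responses.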
 
- **3. Dual-Database Architecture**
+ ### What to Adopt (CARRIED FORWARD)
 
- **What**: Separate global.sqlite3 and project.sqlite3.
+ - **sqlite-vec Native Vectors**: 10-100x faster KNN (critical)
+ - **RRF Fusion**: 50% search quality improvement (critical)
+ - **Docid Short Hashes**: Better UX for fact references
+ - **Smart Expansion Detection**: Skip vector search when FTS is confident
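
Smart expansion detection reduces to a confidence heuristic on the lexical pass: skip the vector search only when the top FTS hit is both strong in absolute terms and clearly separated from the runner-up. A sketch with assumed normalized scores and illustrative thresholds:

```ruby
# Skip semantic expansion when full-text search is already confident.
# Conservative by design: ambiguous or empty lexical results always fall
# through to the vector search, so recall is never sacrificed.
class ExpansionDetector
  def initialize(min_top_score: 0.8, min_margin: 0.3)
    @min_top_score = min_top_score # absolute confidence floor
    @min_margin = min_margin       # required gap to the runner-up
  end

  # scores: FTS relevance scores normalized to 0..1, best-first
  def skip_vector_search?(scores)
    return false if scores.empty?
    top = scores.first
    runner_up = scores[1] || 0.0
    top >= @min_top_score && (top - runner_up) >= @min_margin
  end
end

detector = ExpansionDetector.new
```

Thresholds like these would need tuning against real score distributions, which is why the earlier plan paired the detector with skip-rate metrics.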
 
- **Why Keep**:
- - Clean separation of concerns
- - Better than YAML collections for our use case
- - Simpler queries (no project_path filtering)
+ ### What to Reject
 
- **Don't Adopt**: QMD's YAML collection system.
+ - **Local LLM Models for Search**: Too heavy (2GB+, 3s cold start)
+ - **Custom Fine-Tuned Models**: Training pipeline is impressive but overkill for fact retrieval
+ - **YAML Collection System**: Our dual-DB is better for our use case
+ - **Content-Addressable Storage**: Different data model
+ - **Virtual Path System**: Unnecessary for fact-based storage
 
 ---
 
- **4. Lightweight Dependencies**
+ ## Key Takeaways
 
- **What**: Ruby stdlib, SQLite, minimal gems.
+ ### Main Learnings
 
- **Why Keep**:
- - Fast installation (<5MB)
- - No heavy models required
- - Works offline for core features
+ 1. **Plugin distribution is the future**: QMD's marketplace plugin reduces installation from "read docs, install gem, configure MCP, set up hooks, restart Claude" to one command. This is the single most impactful UX improvement we should adopt.
 
- **Selectively Adopt**:
- - ✅ sqlite-vec (small, well-maintained)
- - ❌ Neural embeddings (300MB, complex)
- - ❌ LLM reranking (640MB, complex)
+ 2. **Structured MCP responses matter**: Returning both text summary and structured JSON is a simple pattern that significantly improves how Claude consumes tool output.
 
- ---
+ 3. **Fine-tuned models for specific tasks work**: QMD's two-stage SFT→GRPO pipeline for query expansion is state-of-the-art. While we shouldn't adopt the models themselves (too heavy), the reward function design and structured output routing are good reference patterns.
 
- ### Adopt Their Innovations
+ 4. **Eval methodology with difficulty levels**: QMD's easy/medium/hard query classification provides clearer signal about where improvements matter. Our DevMemBench is more comprehensive but could benefit from this labeling.
 
- **1. Native Vector Storage (sqlite-vec)**
+ 5. **The previous QMD analysis recommendations remain valid**: sqlite-vec, RRF, docids, and smart expansion are still unimplemented and still valuable.
 
- **Why Adopt**:
- - Industry standard (used by Chroma, LanceDB, etc.)
- - 10-100x performance improvement
- - Enables larger databases
- - Well-maintained, cross-platform
+ ### Recommended Adoption Order
 
- **Implementation**: Phase 1 (immediate).
+ 1. **First**: Plugin distribution format — highest UX impact, unblocks ecosystem adoption
+ 2. **Second**: MCP structured content + query guide prompt — low effort, immediate quality gain
+ 3. **Third**: sqlite-vec + RRF fusion — foundational performance and quality
+ 4. **Fourth**: Docids + smart expansion — polish and optimization
 
- ---
+ ### Expected Impact
 
- **2. RRF Fusion Algorithm**
+ - **Installation**: 10x easier (single command vs multi-step)
+ - **MCP Quality**: Better Claude tool usage with structured responses + query guidance
+ - **Search Performance**: 10-100x faster KNN (sqlite-vec), 50% better Hit@3 (RRF)
+ - **UX**: Human-friendly fact references (#abc123de), smarter search skipping
 
- **Why Adopt**:
- - Mathematically sound
- - Proven results (50% improvement)
- - Pure algorithm (no dependencies)
- - Fast (<10ms overhead)
+ ### Next Actions
 
- **Implementation**: Phase 1 (immediate).
+ - [ ] Review plugin distribution feasibility (check Claude Code plugin spec)
+ - [ ] Implement MCP structured content pattern (quick win)
+ - [ ] Register query guide MCP prompt (quick win)
+ - [ ] Continue with sqlite-vec + RRF adoption plan from previous analysis
+ - [ ] Store analysis findings in memory
 
 ---
 
- **3. Docid Short Hashes**
-
- **Why Adopt**:
- - Standard pattern (Git, QMD, etc.)
- - Better UX for CLI tools
- - Cross-database references
-
- **Implementation**: Phase 2 (near-term).
-
- ---
-
- **4. Smart Expansion Detection**
-
- **Why Adopt**:
- - Clear performance win
- - Simple heuristic
- - No downsides (only skips when confident)
-
- **Implementation**: Phase 2 (near-term).
-
- ---
-
- ### Reject Due to Cost/Benefit
-
- **1. Neural Embeddings**
-
- **Cost**: 300MB download, 2s latency, complex dependency.
-
- **Benefit**: Better semantic search quality.
-
- **Decision**: DEFER - TF-IDF sufficient for now.
-
- ---
-
- **2. LLM Reranking**
-
- **Cost**: 640MB model, 400ms latency per query.
-
- **Benefit**: Better ranking precision.
-
- **Decision**: REJECT - Over-engineering for structured facts.
-
- ---
-
- **3. Query Expansion**
-
- **Cost**: 2.2GB model, 800ms latency per query.
-
- **Benefit**: Better recall with alternative phrasings.
-
- **Decision**: REJECT - No LLM in recall path, too heavy.
-
- ---
-
- ## Conclusion
-
- QMD demonstrates **state-of-the-art hybrid search** with impressive quality improvements (50%+ over BM25). However, it achieves this through heavy dependencies (3GB+ models) that may not be appropriate for all use cases.
-
- **Key Takeaways**:
-
- 1. **sqlite-vec is essential**: Native vector storage is 10-100x faster. This is a must-adopt.
-
- 2. **RRF fusion is proven**: 50% quality improvement with zero dependencies. This is a must-adopt.
-
- 3. **Smart optimizations matter**: Expansion detection saves 200-500ms on 60% of queries. This is worth adopting.
-
- 4. **Neural models are costly**: 3GB+ models provide better quality but at significant cost. Defer for now.
-
- 5. **Architecture matters**: QMD's document model differs from our fact model. Adopt algorithms, not architecture.
-
- **Recommended Adoption Order**:
-
- 1. **Immediate**: sqlite-vec + RRF fusion (performance foundation)
- 2. **Near-term**: Docids + smart detection (UX + optimization)
- 3. **Future**: LLM caching + chunking (cost reduction)
- 4. **Defer**: Neural embeddings (wait for user feedback)
- 5. **Reject**: LLM reranking + query expansion (over-engineering)
+ ## References
 
- By selectively adopting QMD's innovations while preserving our unique advantages, we can significantly improve ClaudeMemory's search quality and performance without sacrificing simplicity.
+ - **Repository**: https://github.com/tobi/qmd
+ - **Previous Analysis**: docs/influence/qmd.md (2026-01-26)
+ - **Claude Code Plugins**: https://code.claude.com/docs/en/plugins.md
+ - **MCP Spec**: https://modelcontextprotocol.io
+ - **sqlite-vec**: https://github.com/asg017/sqlite-vec
+ - **RRF Paper**: Cormack et al., "Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods" (2009)
 
 ---
 
- *End of QMD Analysis*
+ *Analysis completed: 2026-02-02*
+ *Analyst: Claude Code*
+ *Review Status: Draft — Updated from 2026-01-26 analysis with new findings on plugin distribution, fine-tuned models, and MCP patterns*