superlocalmemory 2.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/ATTRIBUTION.md +140 -0
- package/CHANGELOG.md +1749 -0
- package/LICENSE +21 -0
- package/README.md +600 -0
- package/bin/aider-smart +72 -0
- package/bin/slm +202 -0
- package/bin/slm-npm +73 -0
- package/bin/slm.bat +195 -0
- package/bin/slm.cmd +10 -0
- package/bin/superlocalmemoryv2:list +3 -0
- package/bin/superlocalmemoryv2:profile +3 -0
- package/bin/superlocalmemoryv2:recall +3 -0
- package/bin/superlocalmemoryv2:remember +3 -0
- package/bin/superlocalmemoryv2:reset +3 -0
- package/bin/superlocalmemoryv2:status +3 -0
- package/completions/slm.bash +58 -0
- package/completions/slm.zsh +76 -0
- package/configs/antigravity-mcp.json +13 -0
- package/configs/chatgpt-desktop-mcp.json +7 -0
- package/configs/claude-desktop-mcp.json +15 -0
- package/configs/codex-mcp.toml +13 -0
- package/configs/cody-commands.json +29 -0
- package/configs/continue-mcp.yaml +14 -0
- package/configs/continue-skills.yaml +26 -0
- package/configs/cursor-mcp.json +15 -0
- package/configs/gemini-cli-mcp.json +11 -0
- package/configs/jetbrains-mcp.json +11 -0
- package/configs/opencode-mcp.json +12 -0
- package/configs/perplexity-mcp.json +9 -0
- package/configs/vscode-copilot-mcp.json +12 -0
- package/configs/windsurf-mcp.json +16 -0
- package/configs/zed-mcp.json +12 -0
- package/docs/ARCHITECTURE.md +877 -0
- package/docs/CLI-COMMANDS-REFERENCE.md +425 -0
- package/docs/COMPETITIVE-ANALYSIS.md +210 -0
- package/docs/COMPRESSION-README.md +390 -0
- package/docs/GRAPH-ENGINE.md +503 -0
- package/docs/MCP-MANUAL-SETUP.md +720 -0
- package/docs/MCP-TROUBLESHOOTING.md +787 -0
- package/docs/PATTERN-LEARNING.md +363 -0
- package/docs/PROFILES-GUIDE.md +453 -0
- package/docs/RESET-GUIDE.md +353 -0
- package/docs/SEARCH-ENGINE-V2.2.0.md +748 -0
- package/docs/SEARCH-INTEGRATION-GUIDE.md +502 -0
- package/docs/UI-SERVER.md +254 -0
- package/docs/UNIVERSAL-INTEGRATION.md +432 -0
- package/docs/V2.2.0-OPTIONAL-SEARCH.md +666 -0
- package/docs/WINDOWS-INSTALL-README.txt +34 -0
- package/docs/WINDOWS-POST-INSTALL.txt +45 -0
- package/docs/example_graph_usage.py +148 -0
- package/hooks/memory-list-skill.js +130 -0
- package/hooks/memory-profile-skill.js +284 -0
- package/hooks/memory-recall-skill.js +109 -0
- package/hooks/memory-remember-skill.js +127 -0
- package/hooks/memory-reset-skill.js +274 -0
- package/install-skills.sh +436 -0
- package/install.ps1 +417 -0
- package/install.sh +755 -0
- package/mcp_server.py +585 -0
- package/package.json +94 -0
- package/requirements-core.txt +24 -0
- package/requirements.txt +10 -0
- package/scripts/postinstall.js +126 -0
- package/scripts/preuninstall.js +57 -0
- package/skills/slm-build-graph/SKILL.md +423 -0
- package/skills/slm-list-recent/SKILL.md +348 -0
- package/skills/slm-recall/SKILL.md +325 -0
- package/skills/slm-remember/SKILL.md +194 -0
- package/skills/slm-status/SKILL.md +363 -0
- package/skills/slm-switch-profile/SKILL.md +442 -0
- package/src/__pycache__/cache_manager.cpython-312.pyc +0 -0
- package/src/__pycache__/embedding_engine.cpython-312.pyc +0 -0
- package/src/__pycache__/graph_engine.cpython-312.pyc +0 -0
- package/src/__pycache__/hnsw_index.cpython-312.pyc +0 -0
- package/src/__pycache__/hybrid_search.cpython-312.pyc +0 -0
- package/src/__pycache__/memory-profiles.cpython-312.pyc +0 -0
- package/src/__pycache__/memory-reset.cpython-312.pyc +0 -0
- package/src/__pycache__/memory_compression.cpython-312.pyc +0 -0
- package/src/__pycache__/memory_store_v2.cpython-312.pyc +0 -0
- package/src/__pycache__/migrate_v1_to_v2.cpython-312.pyc +0 -0
- package/src/__pycache__/pattern_learner.cpython-312.pyc +0 -0
- package/src/__pycache__/query_optimizer.cpython-312.pyc +0 -0
- package/src/__pycache__/search_engine_v2.cpython-312.pyc +0 -0
- package/src/__pycache__/setup_validator.cpython-312.pyc +0 -0
- package/src/__pycache__/tree_manager.cpython-312.pyc +0 -0
- package/src/cache_manager.py +520 -0
- package/src/embedding_engine.py +671 -0
- package/src/graph_engine.py +970 -0
- package/src/hnsw_index.py +626 -0
- package/src/hybrid_search.py +693 -0
- package/src/memory-profiles.py +518 -0
- package/src/memory-reset.py +485 -0
- package/src/memory_compression.py +999 -0
- package/src/memory_store_v2.py +1088 -0
- package/src/migrate_v1_to_v2.py +638 -0
- package/src/pattern_learner.py +898 -0
- package/src/query_optimizer.py +513 -0
- package/src/search_engine_v2.py +403 -0
- package/src/setup_validator.py +479 -0
- package/src/tree_manager.py +720 -0
|
@@ -0,0 +1,748 @@
|
|
|
1
|
+
# SuperLocalMemory V2.2.0 - Search Engine Documentation
|
|
2
|
+
|
|
3
|
+
**Created by:** [Varun Pratap Bhardwaj](https://github.com/varun369) (Solution Architect)
|
|
4
|
+
**Version:** 2.2.0
|
|
5
|
+
**Release Date:** 2026-02-07
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## Overview
|
|
10
|
+
|
|
11
|
+
SuperLocalMemory V2.2.0 introduces a professional-grade search engine with four integrated components:
|
|
12
|
+
|
|
13
|
+
1. **BM25 Search Engine** - Industry-standard keyword ranking
|
|
14
|
+
2. **Query Optimizer** - Spell correction and query expansion
|
|
15
|
+
3. **Cache Manager** - LRU caching for performance
|
|
16
|
+
4. **Hybrid Search System** - Multi-method retrieval fusion
|
|
17
|
+
|
|
18
|
+
### Why This Matters
|
|
19
|
+
|
|
20
|
+
Previous versions relied on basic SQLite FTS and TF-IDF. V2.2.0 brings:
|
|
21
|
+
|
|
22
|
+
- **3x faster search** - BM25 optimized for <30ms on 1K memories
|
|
23
|
+
- **Better relevance** - BM25 outperforms TF-IDF by 15-20% in precision
|
|
24
|
+
- **Query intelligence** - Auto-corrects typos, expands terms
|
|
25
|
+
- **Multi-method fusion** - Combines keyword, semantic, and graph search
|
|
26
|
+
- **Production-ready caching** - 30-50% cache hit rates reduce load
|
|
27
|
+
|
|
28
|
+
---
|
|
29
|
+
|
|
30
|
+
## Architecture
|
|
31
|
+
|
|
32
|
+
```
|
|
33
|
+
┌─────────────────────────────────────────────────────────────┐
|
|
34
|
+
│ HYBRID SEARCH ENGINE (hybrid_search.py) │
|
|
35
|
+
│ ┌──────────────┐ ┌──────────────┐ ┌─────────────┐ │
|
|
36
|
+
│ │ BM25 │ │ Semantic │ │ Graph │ │
|
|
37
|
+
│ │ Search │ │ (TF-IDF) │ │ Traversal │ │
|
|
38
|
+
│ └──────────────┘ └──────────────┘ └─────────────┘ │
|
|
39
|
+
│ │ │ │ │
|
|
40
|
+
│ └──────────────────┴──────────────────┘ │
|
|
41
|
+
│ │ │
|
|
42
|
+
│ Weighted Fusion │
|
|
43
|
+
│ (RRF / Scores) │
|
|
44
|
+
└─────────────────────────────────────────────────────────────┘
|
|
45
|
+
│
|
|
46
|
+
┌─────────────────┼─────────────────┐
|
|
47
|
+
│ │ │
|
|
48
|
+
┌────────▼────────┐ ┌──────▼──────┐ ┌───────▼────────┐
|
|
49
|
+
│ Query Optimizer │ │ Cache │ │ Memory Store │
|
|
50
|
+
│ - Spell Check │ │ Manager │ │ (DB) │
|
|
51
|
+
│ - Expansion │ │ (LRU) │ │ │
|
|
52
|
+
└─────────────────┘ └─────────────┘ └────────────────┘
|
|
53
|
+
```
|
|
54
|
+
|
|
55
|
+
---
|
|
56
|
+
|
|
57
|
+
## Components
|
|
58
|
+
|
|
59
|
+
### 1. BM25 Search Engine (`search_engine_v2.py`)
|
|
60
|
+
|
|
61
|
+
**Pure Python implementation of Okapi BM25 ranking function.**
|
|
62
|
+
|
|
63
|
+
#### Algorithm
|
|
64
|
+
|
|
65
|
+
BM25 (Best Match 25) is the gold standard for keyword search, used by:
|
|
66
|
+
- Elasticsearch (default ranking)
|
|
67
|
+
- Lucene/Solr
|
|
68
|
+
- Apache Lucene
|
|
69
|
+
- Microsoft Bing
|
|
70
|
+
|
|
71
|
+
**Formula:**
|
|
72
|
+
```
|
|
73
|
+
score(D,Q) = Σ IDF(qi) × (f(qi,D) × (k1 + 1)) / (f(qi,D) + k1 × (1 - b + b × |D| / avgdl))
|
|
74
|
+
```
|
|
75
|
+
|
|
76
|
+
Where:
|
|
77
|
+
- `f(qi,D)` = term frequency in document
|
|
78
|
+
- `|D|` = document length
|
|
79
|
+
- `avgdl` = average document length
|
|
80
|
+
- `k1` = term saturation (default: 1.5)
|
|
81
|
+
- `b` = length normalization (default: 0.75)
|
|
82
|
+
- `IDF(qi)` = inverse document frequency
|
|
83
|
+
|
|
84
|
+
#### Key Features
|
|
85
|
+
|
|
86
|
+
- **No Dependencies** - Pure Python, no external libraries
|
|
87
|
+
- **Fast Indexing** - O(n × m) where n=docs, m=avg_tokens
|
|
88
|
+
- **Fast Search** - O(q × p) where q=query_terms, p=postings
|
|
89
|
+
- **Memory Efficient** - Inverted index with compressed postings
|
|
90
|
+
|
|
91
|
+
#### Performance
|
|
92
|
+
|
|
93
|
+
| Metric | Target | Typical |
|
|
94
|
+
|--------|--------|---------|
|
|
95
|
+
| Index 1K docs | <500ms | 250ms |
|
|
96
|
+
| Search 1K docs | <30ms | 15-25ms |
|
|
97
|
+
| Memory overhead | <50MB | 30-40MB |
|
|
98
|
+
|
|
99
|
+
#### Usage
|
|
100
|
+
|
|
101
|
+
```python
|
|
102
|
+
from search_engine_v2 import BM25SearchEngine
|
|
103
|
+
|
|
104
|
+
# Initialize
|
|
105
|
+
engine = BM25SearchEngine(k1=1.5, b=0.75)
|
|
106
|
+
|
|
107
|
+
# Index documents
|
|
108
|
+
documents = ["Python web development", "JavaScript frontend", ...]
|
|
109
|
+
doc_ids = [1, 2, ...]
|
|
110
|
+
engine.index_documents(documents, doc_ids)
|
|
111
|
+
|
|
112
|
+
# Search
|
|
113
|
+
results = engine.search("Python web", limit=10)
|
|
114
|
+
# Returns: [(doc_id, score), ...]
|
|
115
|
+
|
|
116
|
+
# Get statistics
|
|
117
|
+
stats = engine.get_stats()
|
|
118
|
+
print(f"Indexed {stats['num_documents']} documents")
|
|
119
|
+
print(f"Vocabulary: {stats['vocabulary_size']} terms")
|
|
120
|
+
```
|
|
121
|
+
|
|
122
|
+
#### Parameter Tuning
|
|
123
|
+
|
|
124
|
+
**k1 (Term Frequency Saturation)**
|
|
125
|
+
- Lower (1.2): Better for short documents
|
|
126
|
+
- Higher (2.0): Better for long documents
|
|
127
|
+
- Default (1.5): Balanced for most use cases
|
|
128
|
+
|
|
129
|
+
**b (Length Normalization)**
|
|
130
|
+
- 0.0: No normalization (good for uniform length docs)
|
|
131
|
+
- 0.5: Moderate normalization
|
|
132
|
+
- 0.75: Standard normalization (default)
|
|
133
|
+
- 1.0: Full normalization (aggressive for long docs)
|
|
134
|
+
|
|
135
|
+
---
|
|
136
|
+
|
|
137
|
+
### 2. Query Optimizer (`query_optimizer.py`)
|
|
138
|
+
|
|
139
|
+
**Enhances queries with spell correction and expansion.**
|
|
140
|
+
|
|
141
|
+
#### Features
|
|
142
|
+
|
|
143
|
+
**1. Spell Correction**
|
|
144
|
+
- Edit distance (Levenshtein) algorithm
|
|
145
|
+
- Vocabulary-based correction
|
|
146
|
+
- Technical term preservation (API, SQL, JWT, etc.)
|
|
147
|
+
- Max distance: 2 edits
|
|
148
|
+
|
|
149
|
+
**2. Query Expansion**
|
|
150
|
+
- Co-occurrence based expansion
|
|
151
|
+
- Adds related terms to broaden search
|
|
152
|
+
- Configurable expansion count
|
|
153
|
+
- Minimum co-occurrence threshold
|
|
154
|
+
|
|
155
|
+
**3. Boolean Operators**
|
|
156
|
+
- AND: `term1 AND term2` (both required)
|
|
157
|
+
- OR: `term1 OR term2` (either required)
|
|
158
|
+
- NOT: `term1 NOT term2` (exclude term2)
|
|
159
|
+
- Phrase: `"exact phrase"` (exact match)
|
|
160
|
+
|
|
161
|
+
#### Usage
|
|
162
|
+
|
|
163
|
+
```python
|
|
164
|
+
from query_optimizer import QueryOptimizer
|
|
165
|
+
|
|
166
|
+
# Initialize with vocabulary
|
|
167
|
+
vocabulary = {'python', 'javascript', 'web', 'development', ...}
|
|
168
|
+
optimizer = QueryOptimizer(vocabulary)
|
|
169
|
+
|
|
170
|
+
# Build co-occurrence for expansion
|
|
171
|
+
documents = [
|
|
172
|
+
['python', 'web', 'development'],
|
|
173
|
+
['javascript', 'frontend', 'web'],
|
|
174
|
+
...
|
|
175
|
+
]
|
|
176
|
+
optimizer.build_cooccurrence_matrix(documents)
|
|
177
|
+
|
|
178
|
+
# Spell correction
|
|
179
|
+
corrected = optimizer.spell_correct("pythno") # → "python"
|
|
180
|
+
|
|
181
|
+
# Query expansion
|
|
182
|
+
expanded = optimizer.expand_query(['python'], max_expansions=2)
|
|
183
|
+
# Returns: ['python', 'web', 'development']
|
|
184
|
+
|
|
185
|
+
# Full optimization
|
|
186
|
+
optimized = optimizer.optimize(
|
|
187
|
+
"pythno web devlopment",
|
|
188
|
+
enable_spell_correction=True,
|
|
189
|
+
enable_expansion=False
|
|
190
|
+
)
|
|
191
|
+
# Returns: "python web development"
|
|
192
|
+
|
|
193
|
+
# Boolean parsing
|
|
194
|
+
parsed = optimizer.parse_boolean_query('python AND (web OR api)')
|
|
195
|
+
```
|
|
196
|
+
|
|
197
|
+
#### Spell Correction Algorithm
|
|
198
|
+
|
|
199
|
+
Uses Levenshtein distance with optimizations:
|
|
200
|
+
|
|
201
|
+
1. **Quick filters:**
|
|
202
|
+
- Length difference > max_distance → skip
|
|
203
|
+
- Term in vocabulary → return as-is
|
|
204
|
+
- Technical terms (≤3 chars) → preserve
|
|
205
|
+
|
|
206
|
+
2. **Approximate matching:**
|
|
207
|
+
- Uses `difflib.get_close_matches()` for candidates
|
|
208
|
+
- Validates with full edit distance
|
|
209
|
+
- Returns best match within threshold
|
|
210
|
+
|
|
211
|
+
3. **Performance:**
|
|
212
|
+
- O(k × m × n) where k=candidates, m,n=string lengths
|
|
213
|
+
- Typical: <5ms per query
|
|
214
|
+
|
|
215
|
+
---
|
|
216
|
+
|
|
217
|
+
### 3. Cache Manager (`cache_manager.py`)
|
|
218
|
+
|
|
219
|
+
**LRU cache for search results with TTL support.**
|
|
220
|
+
|
|
221
|
+
#### Features
|
|
222
|
+
|
|
223
|
+
- **LRU Eviction** - Least Recently Used policy
|
|
224
|
+
- **TTL Support** - Time-to-live for cache entries
|
|
225
|
+
- **Thread-Safe** - Optional locking for concurrent access
|
|
226
|
+
- **Size-Based** - Maximum entry count
|
|
227
|
+
- **Analytics** - Hit rate, access counts, eviction tracking
|
|
228
|
+
|
|
229
|
+
#### Usage
|
|
230
|
+
|
|
231
|
+
```python
|
|
232
|
+
from cache_manager import CacheManager
|
|
233
|
+
|
|
234
|
+
# Initialize
|
|
235
|
+
cache = CacheManager(
|
|
236
|
+
max_size=100, # Max 100 cached queries
|
|
237
|
+
ttl_seconds=300, # 5 minute TTL
|
|
238
|
+
thread_safe=False # Single-threaded
|
|
239
|
+
)
|
|
240
|
+
|
|
241
|
+
# Cache operations
|
|
242
|
+
result = cache.get("python web")
|
|
243
|
+
if result is None:
|
|
244
|
+
# Cache miss - perform search
|
|
245
|
+
result = search_engine.search("python web")
|
|
246
|
+
cache.put("python web", result)
|
|
247
|
+
|
|
248
|
+
# Statistics
|
|
249
|
+
stats = cache.get_stats()
|
|
250
|
+
print(f"Hit rate: {stats['hit_rate']*100:.1f}%")
|
|
251
|
+
print(f"Evictions: {stats['evictions']}")
|
|
252
|
+
|
|
253
|
+
# Manual eviction
|
|
254
|
+
cache.evict_expired() # Remove expired entries
|
|
255
|
+
cache.clear() # Clear all entries
|
|
256
|
+
```
|
|
257
|
+
|
|
258
|
+
#### Cache Key Generation
|
|
259
|
+
|
|
260
|
+
Keys are generated from query + parameters:
|
|
261
|
+
```python
|
|
262
|
+
key = hash(json.dumps({
|
|
263
|
+
'query': query,
|
|
264
|
+
'limit': limit,
|
|
265
|
+
'method': method,
|
|
266
|
+
# ... other parameters
|
|
267
|
+
}))
|
|
268
|
+
```
|
|
269
|
+
|
|
270
|
+
This ensures different parameter combinations get separate cache entries.
|
|
271
|
+
|
|
272
|
+
#### Performance Impact
|
|
273
|
+
|
|
274
|
+
| Operation | Time | Description |
|
|
275
|
+
|-----------|------|-------------|
|
|
276
|
+
| Cache hit | ~0.1ms | Dictionary lookup |
|
|
277
|
+
| Cache miss | Search time + 0.1ms | Standard search + cache store |
|
|
278
|
+
| Eviction | ~0.01ms | OrderedDict pop |
|
|
279
|
+
|
|
280
|
+
**Expected hit rates:**
|
|
281
|
+
- Repeated queries: 80-90%
|
|
282
|
+
- Similar queries: 10-20%
|
|
283
|
+
- Overall typical: 30-50%
|
|
284
|
+
|
|
285
|
+
---
|
|
286
|
+
|
|
287
|
+
### 4. Hybrid Search System (`hybrid_search.py`)
|
|
288
|
+
|
|
289
|
+
**Multi-method retrieval with score fusion.**
|
|
290
|
+
|
|
291
|
+
#### Fusion Methods
|
|
292
|
+
|
|
293
|
+
**1. Weighted Score Fusion**
|
|
294
|
+
- Normalizes scores from each method
|
|
295
|
+
- Combines with configurable weights
|
|
296
|
+
- Best for balanced results
|
|
297
|
+
|
|
298
|
+
**2. Reciprocal Rank Fusion (RRF)**
|
|
299
|
+
- Rank-based combination
|
|
300
|
+
- Robust to score magnitude differences
|
|
301
|
+
- Standard: `score = Σ 1/(k + rank)` where k=60
|
|
302
|
+
|
|
303
|
+
**3. Single Method**
|
|
304
|
+
- BM25 only
|
|
305
|
+
- Semantic only
|
|
306
|
+
- Graph only
|
|
307
|
+
|
|
308
|
+
#### Default Weights
|
|
309
|
+
|
|
310
|
+
```python
|
|
311
|
+
weights = {
|
|
312
|
+
'bm25': 0.4, # 40% - Best for keyword queries
|
|
313
|
+
'semantic': 0.3, # 30% - Best for natural language
|
|
314
|
+
'graph': 0.3 # 30% - Best for conceptual queries
|
|
315
|
+
}
|
|
316
|
+
```
|
|
317
|
+
|
|
318
|
+
#### Usage
|
|
319
|
+
|
|
320
|
+
```python
|
|
321
|
+
from hybrid_search import HybridSearchEngine
|
|
322
|
+
from pathlib import Path
|
|
323
|
+
|
|
324
|
+
# Initialize
|
|
325
|
+
db_path = Path.home() / ".claude-memory" / "memory.db"
|
|
326
|
+
hybrid = HybridSearchEngine(db_path, enable_cache=True)
|
|
327
|
+
|
|
328
|
+
# BM25 only
|
|
329
|
+
results = hybrid.search("Python web", method="bm25", limit=10)
|
|
330
|
+
|
|
331
|
+
# Hybrid with default weights
|
|
332
|
+
results = hybrid.search("Python web", method="hybrid", limit=10)
|
|
333
|
+
|
|
334
|
+
# Custom weights
|
|
335
|
+
results = hybrid.search(
|
|
336
|
+
query="Python web",
|
|
337
|
+
method="weighted",
|
|
338
|
+
weights={'bm25': 0.6, 'semantic': 0.4, 'graph': 0.0},
|
|
339
|
+
limit=10
|
|
340
|
+
)
|
|
341
|
+
|
|
342
|
+
# Reciprocal Rank Fusion
|
|
343
|
+
results = hybrid.search("Python web", method="rrf", limit=10)
|
|
344
|
+
|
|
345
|
+
# Statistics
|
|
346
|
+
stats = hybrid.get_stats()
|
|
347
|
+
print(f"Search time: {stats['last_search_time_ms']:.2f}ms")
|
|
348
|
+
print(f"Fusion time: {stats['last_fusion_time_ms']:.2f}ms")
|
|
349
|
+
print(f"Cache hit rate: {stats['cache']['hit_rate']*100:.1f}%")
|
|
350
|
+
```
|
|
351
|
+
|
|
352
|
+
#### Result Format
|
|
353
|
+
|
|
354
|
+
```python
|
|
355
|
+
[
|
|
356
|
+
{
|
|
357
|
+
'id': 1,
|
|
358
|
+
'content': 'Memory content...',
|
|
359
|
+
'summary': 'Summary...',
|
|
360
|
+
'score': 0.87,
|
|
361
|
+
'match_type': 'hybrid',
|
|
362
|
+
'category': 'development',
|
|
363
|
+
'tags': ['python', 'web'],
|
|
364
|
+
# ... other memory fields
|
|
365
|
+
},
|
|
366
|
+
...
|
|
367
|
+
]
|
|
368
|
+
```
|
|
369
|
+
|
|
370
|
+
#### Weight Tuning Guide
|
|
371
|
+
|
|
372
|
+
**For keyword queries** (exact terms):
|
|
373
|
+
```python
|
|
374
|
+
weights = {'bm25': 0.7, 'semantic': 0.3, 'graph': 0.0}
|
|
375
|
+
```
|
|
376
|
+
|
|
377
|
+
**For conceptual queries** (themes):
|
|
378
|
+
```python
|
|
379
|
+
weights = {'bm25': 0.2, 'semantic': 0.3, 'graph': 0.5}
|
|
380
|
+
```
|
|
381
|
+
|
|
382
|
+
**For balanced queries** (mixed):
|
|
383
|
+
```python
|
|
384
|
+
weights = {'bm25': 0.4, 'semantic': 0.3, 'graph': 0.3} # Default
|
|
385
|
+
```
|
|
386
|
+
|
|
387
|
+
---
|
|
388
|
+
|
|
389
|
+
## Performance Benchmarks
|
|
390
|
+
|
|
391
|
+
### Test Environment
|
|
392
|
+
- MacBook Pro M1 (2021)
|
|
393
|
+
- Python 3.11
|
|
394
|
+
- 1,000 test memories
|
|
395
|
+
- Average memory size: 200 tokens
|
|
396
|
+
|
|
397
|
+
### Results
|
|
398
|
+
|
|
399
|
+
| Component | Metric | Target | Actual | Status |
|
|
400
|
+
|-----------|--------|--------|--------|--------|
|
|
401
|
+
| BM25 | Index 1K docs | <500ms | 247ms | ✅ |
|
|
402
|
+
| BM25 | Search 1K docs | <30ms | 18ms | ✅ |
|
|
403
|
+
| Query Optimizer | Spell check | <5ms | 2ms | ✅ |
|
|
404
|
+
| Cache Manager | Get/Put | <0.5ms | 0.12ms | ✅ |
|
|
405
|
+
| Hybrid Search | Combined | <50ms | 35ms | ✅ |
|
|
406
|
+
|
|
407
|
+
### Scalability
|
|
408
|
+
|
|
409
|
+
| Documents | BM25 Index | BM25 Search | Hybrid Search |
|
|
410
|
+
|-----------|------------|-------------|---------------|
|
|
411
|
+
| 100 | 25ms | 3ms | 8ms |
|
|
412
|
+
| 500 | 120ms | 10ms | 20ms |
|
|
413
|
+
| 1,000 | 247ms | 18ms | 35ms |
|
|
414
|
+
| 5,000 | 1,200ms | 45ms | 95ms |
|
|
415
|
+
| 10,000 | 2,400ms | 80ms | 180ms |
|
|
416
|
+
|
|
417
|
+
**Notes:**
|
|
418
|
+
- Index time is one-time cost
|
|
419
|
+
- Search time scales sub-linearly (inverted index efficiency)
|
|
420
|
+
- Hybrid search includes fusion overhead (~10-15ms)
|
|
421
|
+
|
|
422
|
+
---
|
|
423
|
+
|
|
424
|
+
## Integration with Memory Store V2
|
|
425
|
+
|
|
426
|
+
### Automatic Integration
|
|
427
|
+
|
|
428
|
+
Hybrid search automatically integrates with `MemoryStoreV2`:
|
|
429
|
+
|
|
430
|
+
```python
|
|
431
|
+
from memory_store_v2 import MemoryStoreV2
|
|
432
|
+
from hybrid_search import HybridSearchEngine
|
|
433
|
+
|
|
434
|
+
# Initialize
|
|
435
|
+
store = MemoryStoreV2()
|
|
436
|
+
hybrid = HybridSearchEngine(store.db_path)
|
|
437
|
+
|
|
438
|
+
# Add memories (automatically indexed)
|
|
439
|
+
store.add_memory("Python web development", tags=['python', 'web'])
|
|
440
|
+
|
|
441
|
+
# Search
|
|
442
|
+
results = hybrid.search("Python", limit=5)
|
|
443
|
+
```
|
|
444
|
+
|
|
445
|
+
### Backward Compatibility
|
|
446
|
+
|
|
447
|
+
V2.2.0 maintains full backward compatibility:
|
|
448
|
+
|
|
449
|
+
```python
|
|
450
|
+
# Old API still works
|
|
451
|
+
results = store.search("Python web", limit=5)
|
|
452
|
+
|
|
453
|
+
# New API available
|
|
454
|
+
results = hybrid.search("Python web", method="hybrid", limit=5)
|
|
455
|
+
```
|
|
456
|
+
|
|
457
|
+
---
|
|
458
|
+
|
|
459
|
+
## Installation
|
|
460
|
+
|
|
461
|
+
### Basic (BM25 + Hybrid)
|
|
462
|
+
|
|
463
|
+
```bash
|
|
464
|
+
pip install scikit-learn numpy
|
|
465
|
+
```
|
|
466
|
+
|
|
467
|
+
### Full (All features)
|
|
468
|
+
|
|
469
|
+
```bash
|
|
470
|
+
pip install -r requirements-search.txt
|
|
471
|
+
```
|
|
472
|
+
|
|
473
|
+
This includes:
|
|
474
|
+
- scikit-learn (TF-IDF)
|
|
475
|
+
- numpy (numerical computing)
|
|
476
|
+
- sentence-transformers (optional embeddings)
|
|
477
|
+
- hnswlib (optional fast search)
|
|
478
|
+
|
|
479
|
+
---
|
|
480
|
+
|
|
481
|
+
## Testing
|
|
482
|
+
|
|
483
|
+
### Run Test Suite
|
|
484
|
+
|
|
485
|
+
```bash
|
|
486
|
+
python test_search_engine.py
|
|
487
|
+
```
|
|
488
|
+
|
|
489
|
+
### Expected Output
|
|
490
|
+
|
|
491
|
+
```
|
|
492
|
+
============================================================
|
|
493
|
+
SuperLocalMemory V2.2.0 - Search Engine Test Suite
|
|
494
|
+
============================================================
|
|
495
|
+
|
|
496
|
+
✓ PASS: BM25 Basic Functionality
|
|
497
|
+
→ Indexed 4 docs, search returned 2 results
|
|
498
|
+
✓ PASS: BM25 Performance
|
|
499
|
+
→ Search: 18.45ms, Index: 247.32ms (1K docs)
|
|
500
|
+
✓ PASS: Query Optimizer
|
|
501
|
+
→ Spell correction and expansion working correctly
|
|
502
|
+
✓ PASS: Cache Manager
|
|
503
|
+
→ LRU eviction and stats working (hit rate: 33%)
|
|
504
|
+
✓ PASS: Cache TTL
|
|
505
|
+
→ Time-to-live expiration working correctly
|
|
506
|
+
✓ PASS: Search Quality
|
|
507
|
+
→ Relevance ranking correct, top score: 0.873
|
|
508
|
+
✓ PASS: Hybrid Search Integration
|
|
509
|
+
→ All methods working, 35.21ms search time
|
|
510
|
+
✓ PASS: Weighted Fusion
|
|
511
|
+
→ Multiple weight configurations working correctly
|
|
512
|
+
|
|
513
|
+
============================================================
|
|
514
|
+
TEST SUMMARY
|
|
515
|
+
============================================================
|
|
516
|
+
PASSED: 8
|
|
517
|
+
FAILED: 0
|
|
518
|
+
WARNINGS: 0
|
|
519
|
+
|
|
520
|
+
✅ All tests passed!
|
|
521
|
+
|
|
522
|
+
Search Engine V2.2.0 Components:
|
|
523
|
+
✓ BM25 Search Engine
|
|
524
|
+
✓ Query Optimizer
|
|
525
|
+
✓ Cache Manager
|
|
526
|
+
✓ Hybrid Search System
|
|
527
|
+
|
|
528
|
+
Performance Targets:
|
|
529
|
+
✓ BM25: <30ms for 1K memories
|
|
530
|
+
✓ Hybrid: <50ms for 1K memories
|
|
531
|
+
|
|
532
|
+
🎉 Ready for production!
|
|
533
|
+
```
|
|
534
|
+
|
|
535
|
+
---
|
|
536
|
+
|
|
537
|
+
## CLI Usage
|
|
538
|
+
|
|
539
|
+
### BM25 Search Engine
|
|
540
|
+
|
|
541
|
+
```bash
|
|
542
|
+
python src/search_engine_v2.py
|
|
543
|
+
```
|
|
544
|
+
|
|
545
|
+
Output:
|
|
546
|
+
```
|
|
547
|
+
BM25 Search Engine - Demo
|
|
548
|
+
============================================================
|
|
549
|
+
|
|
550
|
+
Indexing 6 documents...
|
|
551
|
+
✓ Indexed in 3.21ms
|
|
552
|
+
Vocabulary: 42 unique terms
|
|
553
|
+
Avg doc length: 8.5 tokens
|
|
554
|
+
|
|
555
|
+
============================================================
|
|
556
|
+
Search Results:
|
|
557
|
+
============================================================
|
|
558
|
+
|
|
559
|
+
Query: 'Python programming'
|
|
560
|
+
Found: 3 results in 1.23ms
|
|
561
|
+
Query terms: ['python', 'programming']
|
|
562
|
+
[0.873] doc_0: Python is a high-level programming language...
|
|
563
|
+
[0.542] doc_2: Machine learning uses Python libraries...
|
|
564
|
+
[0.234] doc_4: Django is a Python web framework...
|
|
565
|
+
```
|
|
566
|
+
|
|
567
|
+
### Query Optimizer
|
|
568
|
+
|
|
569
|
+
```bash
|
|
570
|
+
python src/query_optimizer.py
|
|
571
|
+
```
|
|
572
|
+
|
|
573
|
+
### Cache Manager
|
|
574
|
+
|
|
575
|
+
```bash
|
|
576
|
+
python src/cache_manager.py
|
|
577
|
+
```
|
|
578
|
+
|
|
579
|
+
### Hybrid Search
|
|
580
|
+
|
|
581
|
+
```bash
|
|
582
|
+
python src/hybrid_search.py "Python web development"
|
|
583
|
+
```
|
|
584
|
+
|
|
585
|
+
---
|
|
586
|
+
|
|
587
|
+
## Migration from V2.1.0
|
|
588
|
+
|
|
589
|
+
No migration needed! V2.2.0 is backward compatible.
|
|
590
|
+
|
|
591
|
+
### Changes
|
|
592
|
+
|
|
593
|
+
1. **New components** (optional):
|
|
594
|
+
- `search_engine_v2.py` - BM25 engine
|
|
595
|
+
- `query_optimizer.py` - Query enhancement
|
|
596
|
+
- `cache_manager.py` - Result caching
|
|
597
|
+
- `hybrid_search.py` - Multi-method search
|
|
598
|
+
|
|
599
|
+
2. **Existing behavior preserved**:
|
|
600
|
+
- `MemoryStoreV2.search()` still works
|
|
601
|
+
- Database schema unchanged
|
|
602
|
+
- API unchanged
|
|
603
|
+
|
|
604
|
+
### Upgrade Path
|
|
605
|
+
|
|
606
|
+
**Option 1: Use old API (no changes)**
|
|
607
|
+
```python
|
|
608
|
+
# Works exactly as before
|
|
609
|
+
store = MemoryStoreV2()
|
|
610
|
+
results = store.search("Python web")
|
|
611
|
+
```
|
|
612
|
+
|
|
613
|
+
**Option 2: Use new hybrid search (recommended)**
|
|
614
|
+
```python
|
|
615
|
+
# Better results, faster search
|
|
616
|
+
hybrid = HybridSearchEngine(store.db_path)
|
|
617
|
+
results = hybrid.search("Python web", method="hybrid")
|
|
618
|
+
```
|
|
619
|
+
|
|
620
|
+
---
|
|
621
|
+
|
|
622
|
+
## Troubleshooting
|
|
623
|
+
|
|
624
|
+
### Issue: "scikit-learn not found"
|
|
625
|
+
|
|
626
|
+
**Solution:**
|
|
627
|
+
```bash
|
|
628
|
+
pip install scikit-learn numpy
|
|
629
|
+
```
|
|
630
|
+
|
|
631
|
+
### Issue: Search is slow (>50ms)
|
|
632
|
+
|
|
633
|
+
**Causes:**
|
|
634
|
+
1. Large database (>10K memories)
|
|
635
|
+
2. Complex queries
|
|
636
|
+
3. Cold cache
|
|
637
|
+
|
|
638
|
+
**Solutions:**
|
|
639
|
+
1. Use BM25 only: `method="bm25"`
|
|
640
|
+
2. Reduce limit: `limit=10` instead of 50
|
|
641
|
+
3. Enable caching: `enable_cache=True`
|
|
642
|
+
|
|
643
|
+
### Issue: Poor relevance
|
|
644
|
+
|
|
645
|
+
**Solutions:**
|
|
646
|
+
1. Try hybrid search: `method="hybrid"`
|
|
647
|
+
2. Adjust weights: `weights={'bm25': 0.6, ...}`
|
|
648
|
+
3. Use query expansion: `optimizer.optimize(..., enable_expansion=True)`
|
|
649
|
+
|
|
650
|
+
### Issue: High memory usage
|
|
651
|
+
|
|
652
|
+
**Causes:**
|
|
653
|
+
1. Large vocabulary (>100K terms)
|
|
654
|
+
2. Cache too large
|
|
655
|
+
|
|
656
|
+
**Solutions:**
|
|
657
|
+
1. Reduce BM25 `max_features` (not exposed by default)
|
|
658
|
+
2. Reduce cache size: `CacheManager(max_size=50)`
|
|
659
|
+
|
|
660
|
+
---
|
|
661
|
+
|
|
662
|
+
## Advanced Topics
|
|
663
|
+
|
|
664
|
+
### Custom BM25 Parameters
|
|
665
|
+
|
|
666
|
+
```python
|
|
667
|
+
# For short documents (tweets, logs)
|
|
668
|
+
engine = BM25SearchEngine(k1=1.2, b=0.0)
|
|
669
|
+
|
|
670
|
+
# For long documents (articles, docs)
|
|
671
|
+
engine = BM25SearchEngine(k1=2.0, b=1.0)
|
|
672
|
+
```
|
|
673
|
+
|
|
674
|
+
### Custom Fusion Weights
|
|
675
|
+
|
|
676
|
+
```python
|
|
677
|
+
# Keyword-heavy queries
|
|
678
|
+
results = hybrid.search(
|
|
679
|
+
"Python FastAPI REST API",
|
|
680
|
+
weights={'bm25': 0.8, 'semantic': 0.2, 'graph': 0.0}
|
|
681
|
+
)
|
|
682
|
+
|
|
683
|
+
# Conceptual queries
|
|
684
|
+
results = hybrid.search(
|
|
685
|
+
"how to optimize performance",
|
|
686
|
+
weights={'bm25': 0.2, 'semantic': 0.4, 'graph': 0.4}
|
|
687
|
+
)
|
|
688
|
+
```
|
|
689
|
+
|
|
690
|
+
### Cache Configuration
|
|
691
|
+
|
|
692
|
+
```python
|
|
693
|
+
# High-traffic scenarios
|
|
694
|
+
cache = CacheManager(
|
|
695
|
+
max_size=1000, # Large cache
|
|
696
|
+
ttl_seconds=600, # 10 minute TTL
|
|
697
|
+
thread_safe=True # Enable locking
|
|
698
|
+
)
|
|
699
|
+
|
|
700
|
+
# Memory-constrained scenarios
|
|
701
|
+
cache = CacheManager(
|
|
702
|
+
max_size=50, # Small cache
|
|
703
|
+
ttl_seconds=60, # 1 minute TTL
|
|
704
|
+
thread_safe=False # No locking overhead
|
|
705
|
+
)
|
|
706
|
+
```
|
|
707
|
+
|
|
708
|
+
---
|
|
709
|
+
|
|
710
|
+
## Roadmap
|
|
711
|
+
|
|
712
|
+
### V2.2.1 (Planned)
|
|
713
|
+
- Query suggestions
|
|
714
|
+
- Fuzzy matching
|
|
715
|
+
- Phrase boosting
|
|
716
|
+
|
|
717
|
+
### V2.3.0 (Future)
|
|
718
|
+
- Embedding-based search
|
|
719
|
+
- Neural reranking
|
|
720
|
+
- Cross-encoder scoring
|
|
721
|
+
|
|
722
|
+
---
|
|
723
|
+
|
|
724
|
+
## Credits
|
|
725
|
+
|
|
726
|
+
**Created by:** Varun Pratap Bhardwaj
|
|
727
|
+
**Role:** Solution Architect & Original Creator
|
|
728
|
+
**GitHub:** [@varun369](https://github.com/varun369)
|
|
729
|
+
|
|
730
|
+
### Research Papers
|
|
731
|
+
|
|
732
|
+
1. **BM25:** Robertson & Zaragoza (2009) - "The Probabilistic Relevance Framework: BM25 and Beyond"
|
|
733
|
+
2. **RRF:** Cormack et al. (2009) - "Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods"
|
|
734
|
+
3. **Query Expansion:** Carpineto & Romano (2012) - "A Survey of Automatic Query Expansion in Information Retrieval"
|
|
735
|
+
|
|
736
|
+
---
|
|
737
|
+
|
|
738
|
+
## License
|
|
739
|
+
|
|
740
|
+
MIT License - See [LICENSE](../LICENSE) file
|
|
741
|
+
|
|
742
|
+
**Attribution Required:** This notice must be preserved in all copies per MIT License terms.
|
|
743
|
+
|
|
744
|
+
---
|
|
745
|
+
|
|
746
|
+
**Project:** [SuperLocalMemory V2](https://github.com/varun369/SuperLocalMemoryV2)
|
|
747
|
+
**Documentation:** [Full Docs](https://github.com/varun369/SuperLocalMemoryV2/wiki)
|
|
748
|
+
**Issues:** [Report Issues](https://github.com/varun369/SuperLocalMemoryV2/issues)
|