superlocalmemory 2.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (100)
  1. package/ATTRIBUTION.md +140 -0
  2. package/CHANGELOG.md +1749 -0
  3. package/LICENSE +21 -0
  4. package/README.md +600 -0
  5. package/bin/aider-smart +72 -0
  6. package/bin/slm +202 -0
  7. package/bin/slm-npm +73 -0
  8. package/bin/slm.bat +195 -0
  9. package/bin/slm.cmd +10 -0
  10. package/bin/superlocalmemoryv2:list +3 -0
  11. package/bin/superlocalmemoryv2:profile +3 -0
  12. package/bin/superlocalmemoryv2:recall +3 -0
  13. package/bin/superlocalmemoryv2:remember +3 -0
  14. package/bin/superlocalmemoryv2:reset +3 -0
  15. package/bin/superlocalmemoryv2:status +3 -0
  16. package/completions/slm.bash +58 -0
  17. package/completions/slm.zsh +76 -0
  18. package/configs/antigravity-mcp.json +13 -0
  19. package/configs/chatgpt-desktop-mcp.json +7 -0
  20. package/configs/claude-desktop-mcp.json +15 -0
  21. package/configs/codex-mcp.toml +13 -0
  22. package/configs/cody-commands.json +29 -0
  23. package/configs/continue-mcp.yaml +14 -0
  24. package/configs/continue-skills.yaml +26 -0
  25. package/configs/cursor-mcp.json +15 -0
  26. package/configs/gemini-cli-mcp.json +11 -0
  27. package/configs/jetbrains-mcp.json +11 -0
  28. package/configs/opencode-mcp.json +12 -0
  29. package/configs/perplexity-mcp.json +9 -0
  30. package/configs/vscode-copilot-mcp.json +12 -0
  31. package/configs/windsurf-mcp.json +16 -0
  32. package/configs/zed-mcp.json +12 -0
  33. package/docs/ARCHITECTURE.md +877 -0
  34. package/docs/CLI-COMMANDS-REFERENCE.md +425 -0
  35. package/docs/COMPETITIVE-ANALYSIS.md +210 -0
  36. package/docs/COMPRESSION-README.md +390 -0
  37. package/docs/GRAPH-ENGINE.md +503 -0
  38. package/docs/MCP-MANUAL-SETUP.md +720 -0
  39. package/docs/MCP-TROUBLESHOOTING.md +787 -0
  40. package/docs/PATTERN-LEARNING.md +363 -0
  41. package/docs/PROFILES-GUIDE.md +453 -0
  42. package/docs/RESET-GUIDE.md +353 -0
  43. package/docs/SEARCH-ENGINE-V2.2.0.md +748 -0
  44. package/docs/SEARCH-INTEGRATION-GUIDE.md +502 -0
  45. package/docs/UI-SERVER.md +254 -0
  46. package/docs/UNIVERSAL-INTEGRATION.md +432 -0
  47. package/docs/V2.2.0-OPTIONAL-SEARCH.md +666 -0
  48. package/docs/WINDOWS-INSTALL-README.txt +34 -0
  49. package/docs/WINDOWS-POST-INSTALL.txt +45 -0
  50. package/docs/example_graph_usage.py +148 -0
  51. package/hooks/memory-list-skill.js +130 -0
  52. package/hooks/memory-profile-skill.js +284 -0
  53. package/hooks/memory-recall-skill.js +109 -0
  54. package/hooks/memory-remember-skill.js +127 -0
  55. package/hooks/memory-reset-skill.js +274 -0
  56. package/install-skills.sh +436 -0
  57. package/install.ps1 +417 -0
  58. package/install.sh +755 -0
  59. package/mcp_server.py +585 -0
  60. package/package.json +94 -0
  61. package/requirements-core.txt +24 -0
  62. package/requirements.txt +10 -0
  63. package/scripts/postinstall.js +126 -0
  64. package/scripts/preuninstall.js +57 -0
  65. package/skills/slm-build-graph/SKILL.md +423 -0
  66. package/skills/slm-list-recent/SKILL.md +348 -0
  67. package/skills/slm-recall/SKILL.md +325 -0
  68. package/skills/slm-remember/SKILL.md +194 -0
  69. package/skills/slm-status/SKILL.md +363 -0
  70. package/skills/slm-switch-profile/SKILL.md +442 -0
  71. package/src/__pycache__/cache_manager.cpython-312.pyc +0 -0
  72. package/src/__pycache__/embedding_engine.cpython-312.pyc +0 -0
  73. package/src/__pycache__/graph_engine.cpython-312.pyc +0 -0
  74. package/src/__pycache__/hnsw_index.cpython-312.pyc +0 -0
  75. package/src/__pycache__/hybrid_search.cpython-312.pyc +0 -0
  76. package/src/__pycache__/memory-profiles.cpython-312.pyc +0 -0
  77. package/src/__pycache__/memory-reset.cpython-312.pyc +0 -0
  78. package/src/__pycache__/memory_compression.cpython-312.pyc +0 -0
  79. package/src/__pycache__/memory_store_v2.cpython-312.pyc +0 -0
  80. package/src/__pycache__/migrate_v1_to_v2.cpython-312.pyc +0 -0
  81. package/src/__pycache__/pattern_learner.cpython-312.pyc +0 -0
  82. package/src/__pycache__/query_optimizer.cpython-312.pyc +0 -0
  83. package/src/__pycache__/search_engine_v2.cpython-312.pyc +0 -0
  84. package/src/__pycache__/setup_validator.cpython-312.pyc +0 -0
  85. package/src/__pycache__/tree_manager.cpython-312.pyc +0 -0
  86. package/src/cache_manager.py +520 -0
  87. package/src/embedding_engine.py +671 -0
  88. package/src/graph_engine.py +970 -0
  89. package/src/hnsw_index.py +626 -0
  90. package/src/hybrid_search.py +693 -0
  91. package/src/memory-profiles.py +518 -0
  92. package/src/memory-reset.py +485 -0
  93. package/src/memory_compression.py +999 -0
  94. package/src/memory_store_v2.py +1088 -0
  95. package/src/migrate_v1_to_v2.py +638 -0
  96. package/src/pattern_learner.py +898 -0
  97. package/src/query_optimizer.py +513 -0
  98. package/src/search_engine_v2.py +403 -0
  99. package/src/setup_validator.py +479 -0
  100. package/src/tree_manager.py +720 -0
@@ -0,0 +1,748 @@
1
+ # SuperLocalMemory V2.2.0 - Search Engine Documentation
2
+
3
+ **Created by:** [Varun Pratap Bhardwaj](https://github.com/varun369) (Solution Architect)
4
+ **Version:** 2.2.0
5
+ **Release Date:** 2026-02-07
6
+
7
+ ---
8
+
9
+ ## Overview
10
+
11
+ SuperLocalMemory V2.2.0 introduces a professional-grade search engine with four integrated components:
12
+
13
+ 1. **BM25 Search Engine** - Industry-standard keyword ranking
14
+ 2. **Query Optimizer** - Spell correction and query expansion
15
+ 3. **Cache Manager** - LRU caching for performance
16
+ 4. **Hybrid Search System** - Multi-method retrieval fusion
17
+
18
+ ### Why This Matters
19
+
20
+ Previous versions relied on basic SQLite FTS and TF-IDF. V2.2.0 brings:
21
+
22
+ - **3x faster search** - BM25 optimized for <30ms on 1K memories
23
+ - **Better relevance** - BM25 outperforms TF-IDF by 15-20% in precision
24
+ - **Query intelligence** - Auto-corrects typos, expands terms
25
+ - **Multi-method fusion** - Combines keyword, semantic, and graph search
26
+ - **Production-ready caching** - 30-50% cache hit rates reduce load
27
+
28
+ ---
29
+
30
+ ## Architecture
31
+
32
+ ```
33
+ ┌─────────────────────────────────────────────────────────────┐
34
+ │           HYBRID SEARCH ENGINE (hybrid_search.py)           │
35
+ │  ┌──────────────┐  ┌──────────────┐  ┌─────────────┐        │
36
+ │  │     BM25     │  │   Semantic   │  │    Graph    │        │
37
+ │  │    Search    │  │   (TF-IDF)   │  │  Traversal  │        │
38
+ │  └──────────────┘  └──────────────┘  └─────────────┘        │
39
+ │         │                  │                  │             │
40
+ │         └──────────────────┴──────────────────┘             │
41
+ │                            │                                │
42
+ │                     Weighted Fusion                         │
43
+ │                      (RRF / Scores)                         │
44
+ └─────────────────────────────────────────────────────────────┘
45
+
46
+            ┌─────────────────┼─────────────────┐
47
+            │                 │                 │
48
+   ┌────────▼────────┐ ┌──────▼──────┐  ┌───────▼────────┐
49
+   │ Query Optimizer │ │    Cache    │  │  Memory Store  │
50
+   │ - Spell Check   │ │   Manager   │  │      (DB)      │
51
+   │ - Expansion     │ │    (LRU)    │  │                │
52
+   └─────────────────┘ └─────────────┘  └────────────────┘
53
+ ```
54
+
55
+ ---
56
+
57
+ ## Components
58
+
59
+ ### 1. BM25 Search Engine (`search_engine_v2.py`)
60
+
61
+ **Pure Python implementation of Okapi BM25 ranking function.**
62
+
63
+ #### Algorithm
64
+
65
+ BM25 (Best Match 25) is the gold standard for keyword search, used by:
66
+ - Elasticsearch (default ranking)
67
+ - Apache Lucene and Solr
69
+ - Microsoft Bing
70
+
71
+ **Formula:**
72
+ ```
73
+ score(D,Q) = Σ IDF(qi) × (f(qi,D) × (k1 + 1)) / (f(qi,D) + k1 × (1 - b + b × |D| / avgdl))
74
+ ```
75
+
76
+ Where:
77
+ - `f(qi,D)` = term frequency in document
78
+ - `|D|` = document length
79
+ - `avgdl` = average document length
80
+ - `k1` = term saturation (default: 1.5)
81
+ - `b` = length normalization (default: 0.75)
82
+ - `IDF(qi)` = inverse document frequency
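The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in `search_engine_v2.py`: the function name `bm25_scores` is hypothetical, documents are assumed to be pre-tokenized lists, and the IDF variant (the smoothed, always non-negative form used by Lucene) is an assumption, since the document does not say which IDF definition the engine uses.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score every tokenized document in `docs` against `query_terms`."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for q in query_terms:
            if q not in tf:
                continue
            # Assumed IDF variant (smoothed, never negative).
            idf = math.log(1 + (n_docs - df[q] + 0.5) / (df[q] + 0.5))
            # Length-normalized term-frequency saturation.
            norm = tf[q] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[q] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["python", "web", "development"],
    ["javascript", "frontend", "web"],
    ["python", "data", "analysis", "with", "python"],
]
scores = bm25_scores(["python", "web"], docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

The first document matches both query terms and ranks highest; the third repeats `python` but is longer than average, so length normalization keeps its score below the first.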
83
+
84
+ #### Key Features
85
+
86
+ - **No Dependencies** - Pure Python, no external libraries
87
+ - **Fast Indexing** - O(n × m) where n=docs, m=avg_tokens
88
+ - **Fast Search** - O(q × p) where q=query_terms, p=postings
89
+ - **Memory Efficient** - Inverted index with compressed postings
90
+
91
+ #### Performance
92
+
93
+ | Metric | Target | Typical |
94
+ |--------|--------|---------|
95
+ | Index 1K docs | <500ms | 250ms |
96
+ | Search 1K docs | <30ms | 15-25ms |
97
+ | Memory overhead | <50MB | 30-40MB |
98
+
99
+ #### Usage
100
+
101
+ ```python
102
+ from search_engine_v2 import BM25SearchEngine
103
+
104
+ # Initialize
105
+ engine = BM25SearchEngine(k1=1.5, b=0.75)
106
+
107
+ # Index documents
108
+ documents = ["Python web development", "JavaScript frontend", ...]
109
+ doc_ids = [1, 2, ...]
110
+ engine.index_documents(documents, doc_ids)
111
+
112
+ # Search
113
+ results = engine.search("Python web", limit=10)
114
+ # Returns: [(doc_id, score), ...]
115
+
116
+ # Get statistics
117
+ stats = engine.get_stats()
118
+ print(f"Indexed {stats['num_documents']} documents")
119
+ print(f"Vocabulary: {stats['vocabulary_size']} terms")
120
+ ```
121
+
122
+ #### Parameter Tuning
123
+
124
+ **k1 (Term Frequency Saturation)**
125
+ - Lower (1.2): Better for short documents
126
+ - Higher (2.0): Better for long documents
127
+ - Default (1.5): Balanced for most use cases
128
+
129
+ **b (Length Normalization)**
130
+ - 0.0: No normalization (good for uniform length docs)
131
+ - 0.5: Moderate normalization
132
+ - 0.75: Standard normalization (default)
133
+ - 1.0: Full normalization (aggressive for long docs)
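To see what these settings do numerically, the sketch below evaluates only the term-frequency part of the BM25 formula for a short and a long document. The helper name `tf_component` and the example lengths are illustrative assumptions, not part of the package.

```python
def tf_component(tf, doc_len, avgdl, k1=1.5, b=0.75):
    """Term-frequency part of BM25, without the IDF factor."""
    return tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avgdl))


# Same term frequency, different document lengths, varying b.
for b in (0.0, 0.5, 0.75, 1.0):
    short = tf_component(tf=2, doc_len=50, avgdl=100, b=b)
    long_ = tf_component(tf=2, doc_len=400, avgdl=100, b=b)
    print(f"b={b:<4} short={short:.3f} long={long_:.3f}")

# Saturation: the component approaches, but never reaches, k1 + 1.
for tf in (1, 5, 50, 500):
    print(tf, round(tf_component(tf, doc_len=100, avgdl=100, k1=1.5), 3))
```

With `b=0.0` the two documents score identically; as `b` grows, the long document is penalized more. Raising `k1` raises the ceiling `k1 + 1` that repeated terms can reach.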
134
+
135
+ ---
136
+
137
+ ### 2. Query Optimizer (`query_optimizer.py`)
138
+
139
+ **Enhances queries with spell correction and expansion.**
140
+
141
+ #### Features
142
+
143
+ **1. Spell Correction**
144
+ - Edit distance (Levenshtein) algorithm
145
+ - Vocabulary-based correction
146
+ - Technical term preservation (API, SQL, JWT, etc.)
147
+ - Max distance: 2 edits
148
+
149
+ **2. Query Expansion**
150
+ - Co-occurrence based expansion
151
+ - Adds related terms to broaden search
152
+ - Configurable expansion count
153
+ - Minimum co-occurrence threshold
154
+
155
+ **3. Boolean Operators**
156
+ - AND: `term1 AND term2` (both required)
157
+ - OR: `term1 OR term2` (at least one required)
158
+ - NOT: `term1 NOT term2` (exclude term2)
159
+ - Phrase: `"exact phrase"` (exact match)
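Co-occurrence based expansion, as listed under Query Expansion above, can be sketched as follows. This is an assumption-laden illustration rather than the logic in `query_optimizer.py`: the functions `build_cooccurrence` and `expand_query` and the `min_cooccurrence` parameter are hypothetical names, and co-occurrence is counted once per document pair.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(documents):
    """Count how often two distinct terms appear in the same document."""
    cooc = defaultdict(Counter)
    for tokens in documents:
        for a, b in combinations(sorted(set(tokens)), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def expand_query(terms, cooc, max_expansions=2, min_cooccurrence=1):
    """Append the terms that co-occur most often with the query terms."""
    candidates = Counter()
    for t in terms:
        for other, count in cooc.get(t, {}).items():
            if other not in terms and count >= min_cooccurrence:
                candidates[other] += count
    # Highest count first; alphabetical order breaks ties deterministically.
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(terms) + [t for t, _ in ranked[:max_expansions]]


documents = [
    ["python", "web", "development"],
    ["javascript", "frontend", "web"],
    ["python", "web", "api"],
]
cooc = build_cooccurrence(documents)
print(expand_query(["python"], cooc, max_expansions=2))
```

Raising `min_cooccurrence` trades recall for precision: only terms seen together with the query terms at least that many times are added.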
160
+
161
+ #### Usage
162
+
163
+ ```python
164
+ from query_optimizer import QueryOptimizer
165
+
166
+ # Initialize with vocabulary
167
+ vocabulary = {'python', 'javascript', 'web', 'development', ...}
168
+ optimizer = QueryOptimizer(vocabulary)
169
+
170
+ # Build co-occurrence for expansion
171
+ documents = [
172
+ ['python', 'web', 'development'],
173
+ ['javascript', 'frontend', 'web'],
174
+ ...
175
+ ]
176
+ optimizer.build_cooccurrence_matrix(documents)
177
+
178
+ # Spell correction
179
+ corrected = optimizer.spell_correct("pythno") # → "python"
180
+
181
+ # Query expansion
182
+ expanded = optimizer.expand_query(['python'], max_expansions=2)
183
+ # Returns: ['python', 'web', 'development']
184
+
185
+ # Full optimization
186
+ optimized = optimizer.optimize(
187
+ "pythno web devlopment",
188
+ enable_spell_correction=True,
189
+ enable_expansion=False
190
+ )
191
+ # Returns: "python web development"
192
+
193
+ # Boolean parsing
194
+ parsed = optimizer.parse_boolean_query('python AND (web OR api)')
195
+ ```
196
+
197
+ #### Spell Correction Algorithm
198
+
199
+ Uses Levenshtein distance with optimizations:
200
+
201
+ 1. **Quick filters:**
202
+ - Length difference > max_distance → skip
203
+ - Term in vocabulary → return as-is
204
+ - Technical terms (≤3 chars) → preserve
205
+
206
+ 2. **Approximate matching:**
207
+ - Uses `difflib.get_close_matches()` for candidates
208
+ - Validates with full edit distance
209
+ - Returns best match within threshold
210
+
211
+ 3. **Performance:**
212
+ - O(k × m × n) where k=candidates, m,n=string lengths
213
+ - Typical: <5ms per query
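The three steps above can be sketched with the standard library alone. The names `levenshtein`, `spell_correct`, and `TECHNICAL_TERMS` are hypothetical and the candidate cutoff of 0.6 is an assumed value; the package's real implementation may differ. Note that `difflib.get_close_matches()` ranks by a similarity ratio, not edit distance, which is why each candidate is re-checked with a full Levenshtein computation.

```python
import difflib

TECHNICAL_TERMS = {"api", "sql", "jwt"}  # hypothetical preserve-list


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, one row at a time."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def spell_correct(term, vocabulary, max_distance=2):
    word = term.lower()
    # Quick filters: known words and short or technical terms pass through.
    if word in vocabulary or word in TECHNICAL_TERMS or len(word) <= 3:
        return term
    # Skip vocabulary entries whose length differs by more than max_distance.
    pool = [v for v in vocabulary if abs(len(v) - len(word)) <= max_distance]
    candidates = difflib.get_close_matches(word, pool, n=5, cutoff=0.6)
    best, best_distance = term, max_distance + 1
    for candidate in candidates:
        distance = levenshtein(word, candidate)
        if distance < best_distance:
            best, best_distance = candidate, distance
    return best


vocabulary = {"python", "javascript", "web", "development"}
print(spell_correct("pythno", vocabulary))
print(spell_correct("devlopment", vocabulary))
```

A transposition such as `pythno` counts as two edits under plain Levenshtein distance, so it is still within the two-edit limit.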
214
+
215
+ ---
216
+
217
+ ### 3. Cache Manager (`cache_manager.py`)
218
+
219
+ **LRU cache for search results with TTL support.**
220
+
221
+ #### Features
222
+
223
+ - **LRU Eviction** - Least Recently Used policy
224
+ - **TTL Support** - Time-to-live for cache entries
225
+ - **Thread-Safe** - Optional locking for concurrent access
226
+ - **Size-Based** - Maximum entry count
227
+ - **Analytics** - Hit rate, access counts, eviction tracking
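How LRU eviction and TTL expiry combine can be shown with a small `OrderedDict`-based sketch. The class `TinyLRUCache` is a hypothetical stand-in, not the package's `CacheManager`; it omits locking and takes an injectable clock (an assumption made here so expiry can be exercised without sleeping).

```python
import time
from collections import OrderedDict


class TinyLRUCache:
    """Illustrative LRU cache with per-entry TTL (not thread-safe)."""

    def __init__(self, max_size=100, ttl_seconds=300, clock=time.monotonic):
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()  # key -> (expires_at, value)
        self.hits = self.misses = self.evictions = 0

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None or entry[0] <= self._clock():
            # Missing or expired: drop any stale entry and report a miss.
            self._entries.pop(key, None)
            self.misses += 1
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        self.hits += 1
        return entry[1]

    def put(self, key, value):
        self._entries[key] = (self._clock() + self.ttl_seconds, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict least recently used
            self.evictions += 1


fake_now = [0.0]
cache = TinyLRUCache(max_size=2, ttl_seconds=10, clock=lambda: fake_now[0])
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" becomes most recently used
cache.put("c", 3)   # evicts "b"
print(cache.get("b"), cache.evictions)
```

Keeping the expiry timestamp next to the value means expired entries are discarded lazily on lookup; a periodic sweep is only needed to reclaim memory for keys that are never requested again.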
228
+
229
+ #### Usage
230
+
231
+ ```python
232
+ from cache_manager import CacheManager
233
+
234
+ # Initialize
235
+ cache = CacheManager(
236
+ max_size=100, # Max 100 cached queries
237
+ ttl_seconds=300, # 5 minute TTL
238
+ thread_safe=False # Single-threaded
239
+ )
240
+
241
+ # Cache operations
242
+ result = cache.get("python web")
243
+ if result is None:
244
+ # Cache miss - perform search
245
+ result = search_engine.search("python web")
246
+ cache.put("python web", result)
247
+
248
+ # Statistics
249
+ stats = cache.get_stats()
250
+ print(f"Hit rate: {stats['hit_rate']*100:.1f}%")
251
+ print(f"Evictions: {stats['evictions']}")
252
+
253
+ # Manual eviction
254
+ cache.evict_expired() # Remove expired entries
255
+ cache.clear() # Clear all entries
256
+ ```
257
+
258
+ #### Cache Key Generation
259
+
260
+ Keys are generated from query + parameters:
261
+ ```python
262
+ key = hash(json.dumps({
263
+ 'query': query,
264
+ 'limit': limit,
265
+ 'method': method,
266
+ # ... other parameters
267
+ }))
268
+ ```
269
+
270
+ This ensures different parameter combinations get separate cache entries.
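One caveat with the snippet above: Python's built-in `hash()` of a string is salted per process unless `PYTHONHASHSEED` is fixed, so such keys are only stable within a single run. If keys ever need to be stable across runs, a digest-based variant is one option. The helper `make_cache_key` below is a hypothetical sketch, not the package's function.

```python
import hashlib
import json


def make_cache_key(query, **params):
    """Build a deterministic key from the query and its parameters."""
    # sort_keys makes the serialization independent of argument order;
    # default=str is a fallback for values json cannot encode natively.
    payload = json.dumps({"query": query, **params}, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_a = make_cache_key("python web", limit=10, method="hybrid")
key_b = make_cache_key("python web", method="hybrid", limit=10)
key_c = make_cache_key("python web", limit=5, method="hybrid")
print(key_a == key_b, key_a == key_c)
```

For a purely in-process cache the original `hash()` approach is adequate; the digest only matters when keys are persisted or shared.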
271
+
272
+ #### Performance Impact
273
+
274
+ | Operation | Time | Description |
275
+ |-----------|------|-------------|
276
+ | Cache hit | ~0.1ms | Dictionary lookup |
277
+ | Cache miss | Search time + 0.1ms | Standard search + cache store |
278
+ | Eviction | ~0.01ms | OrderedDict pop |
279
+
280
+ **Expected hit rates:**
281
+ - Repeated queries: 80-90%
282
+ - Similar queries: 10-20%
283
+ - Overall typical: 30-50%
284
+
285
+ ---
286
+
287
+ ### 4. Hybrid Search System (`hybrid_search.py`)
288
+
289
+ **Multi-method retrieval with score fusion.**
290
+
291
+ #### Fusion Methods
292
+
293
+ **1. Weighted Score Fusion**
294
+ - Normalizes scores from each method
295
+ - Combines with configurable weights
296
+ - Best for balanced results
297
+
298
+ **2. Reciprocal Rank Fusion (RRF)**
299
+ - Rank-based combination
300
+ - Robust to score magnitude differences
301
+ - Standard: `score = Σ 1/(k + rank)` where k=60
302
+
303
+ **3. Single Method**
304
+ - BM25 only
305
+ - Semantic only
306
+ - Graph only
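Reciprocal Rank Fusion, as defined above, needs only the rank positions from each method. The sketch below is a minimal illustration with a hypothetical function name and made-up result lists; it is not taken from `hybrid_search.py`.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids with score = sum 1/(k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


bm25_ranking = [1, 2, 3]
semantic_ranking = [2, 1, 4]
graph_ranking = [2, 5]
fused = reciprocal_rank_fusion([bm25_ranking, semantic_ranking, graph_ranking])
print([doc_id for doc_id, _ in fused])
```

Document 2 wins because it appears near the top of all three lists, even though BM25 alone ranked it second. Raw scores never enter the calculation, which is what makes RRF robust to differing score scales.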
307
+
308
+ #### Default Weights
309
+
310
+ ```python
311
+ weights = {
312
+ 'bm25': 0.4, # 40% - Best for keyword queries
313
+ 'semantic': 0.3, # 30% - Best for natural language
314
+ 'graph': 0.3 # 30% - Best for conceptual queries
315
+ }
316
+ ```
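Weighted score fusion with these default weights might look like the sketch below. Min-max normalization is an assumption (the document says scores are normalized but not how), and `min_max_normalize`, `weighted_fusion`, and the sample scores are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_fusion(method_scores, weights):
    """Combine normalized per-method scores using the given weights."""
    fused = {}
    for method, scores in method_scores.items():
        weight = weights.get(method, 0.0)
        if weight == 0.0:
            continue  # a zero weight disables the method entirely
        for doc_id, s in min_max_normalize(scores).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * s
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


method_scores = {
    "bm25": {1: 12.0, 2: 6.0, 3: 0.0},
    "semantic": {1: 0.2, 2: 0.9},
    "graph": {3: 1.0, 2: 0.5},
}
weights = {"bm25": 0.4, "semantic": 0.3, "graph": 0.3}
fused_scores = weighted_fusion(method_scores, weights)
print(fused_scores)
```

Min-max scaling maps each method's weakest hit to 0, so a document that only barely matched one method contributes nothing from it; other normalizations (for example dividing by the maximum) would behave differently.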
317
+
318
+ #### Usage
319
+
320
+ ```python
321
+ from hybrid_search import HybridSearchEngine
322
+ from pathlib import Path
323
+
324
+ # Initialize
325
+ db_path = Path.home() / ".claude-memory" / "memory.db"
326
+ hybrid = HybridSearchEngine(db_path, enable_cache=True)
327
+
328
+ # BM25 only
329
+ results = hybrid.search("Python web", method="bm25", limit=10)
330
+
331
+ # Hybrid with default weights
332
+ results = hybrid.search("Python web", method="hybrid", limit=10)
333
+
334
+ # Custom weights
335
+ results = hybrid.search(
336
+ query="Python web",
337
+ method="weighted",
338
+ weights={'bm25': 0.6, 'semantic': 0.4, 'graph': 0.0},
339
+ limit=10
340
+ )
341
+
342
+ # Reciprocal Rank Fusion
343
+ results = hybrid.search("Python web", method="rrf", limit=10)
344
+
345
+ # Statistics
346
+ stats = hybrid.get_stats()
347
+ print(f"Search time: {stats['last_search_time_ms']:.2f}ms")
348
+ print(f"Fusion time: {stats['last_fusion_time_ms']:.2f}ms")
349
+ print(f"Cache hit rate: {stats['cache']['hit_rate']*100:.1f}%")
350
+ ```
351
+
352
+ #### Result Format
353
+
354
+ ```python
355
+ [
356
+ {
357
+ 'id': 1,
358
+ 'content': 'Memory content...',
359
+ 'summary': 'Summary...',
360
+ 'score': 0.87,
361
+ 'match_type': 'hybrid',
362
+ 'category': 'development',
363
+ 'tags': ['python', 'web'],
364
+ # ... other memory fields
365
+ },
366
+ ...
367
+ ]
368
+ ```
369
+
370
+ #### Weight Tuning Guide
371
+
372
+ **For keyword queries** (exact terms):
373
+ ```python
374
+ weights = {'bm25': 0.7, 'semantic': 0.3, 'graph': 0.0}
375
+ ```
376
+
377
+ **For conceptual queries** (themes):
378
+ ```python
379
+ weights = {'bm25': 0.2, 'semantic': 0.3, 'graph': 0.5}
380
+ ```
381
+
382
+ **For balanced queries** (mixed):
383
+ ```python
384
+ weights = {'bm25': 0.4, 'semantic': 0.3, 'graph': 0.3} # Default
385
+ ```
386
+
387
+ ---
388
+
389
+ ## Performance Benchmarks
390
+
391
+ ### Test Environment
392
+ - MacBook Pro M1 (2021)
393
+ - Python 3.11
394
+ - 1,000 test memories
395
+ - Average memory size: 200 tokens
396
+
397
+ ### Results
398
+
399
+ | Component | Metric | Target | Actual | Status |
400
+ |-----------|--------|--------|--------|--------|
401
+ | BM25 | Index 1K docs | <500ms | 247ms | ✅ |
402
+ | BM25 | Search 1K docs | <30ms | 18ms | ✅ |
403
+ | Query Optimizer | Spell check | <5ms | 2ms | ✅ |
404
+ | Cache Manager | Get/Put | <0.5ms | 0.12ms | ✅ |
405
+ | Hybrid Search | Combined | <50ms | 35ms | ✅ |
406
+
407
+ ### Scalability
408
+
409
+ | Documents | BM25 Index | BM25 Search | Hybrid Search |
410
+ |-----------|------------|-------------|---------------|
411
+ | 100 | 25ms | 3ms | 8ms |
412
+ | 500 | 120ms | 10ms | 20ms |
413
+ | 1,000 | 247ms | 18ms | 35ms |
414
+ | 5,000 | 1,200ms | 45ms | 95ms |
415
+ | 10,000 | 2,400ms | 80ms | 180ms |
416
+
417
+ **Notes:**
418
+ - Indexing is a one-time cost
419
+ - Search time scales sub-linearly (inverted index efficiency)
420
+ - Hybrid search includes fusion overhead (~10-15ms)
421
+
422
+ ---
423
+
424
+ ## Integration with Memory Store V2
425
+
426
+ ### Automatic Integration
427
+
428
+ Hybrid search automatically integrates with `MemoryStoreV2`:
429
+
430
+ ```python
431
+ from memory_store_v2 import MemoryStoreV2
432
+ from hybrid_search import HybridSearchEngine
433
+
434
+ # Initialize
435
+ store = MemoryStoreV2()
436
+ hybrid = HybridSearchEngine(store.db_path)
437
+
438
+ # Add memories (automatically indexed)
439
+ store.add_memory("Python web development", tags=['python', 'web'])
440
+
441
+ # Search
442
+ results = hybrid.search("Python", limit=5)
443
+ ```
444
+
445
+ ### Backward Compatibility
446
+
447
+ V2.2.0 maintains full backward compatibility:
448
+
449
+ ```python
450
+ # Old API still works
451
+ results = store.search("Python web", limit=5)
452
+
453
+ # New API available
454
+ results = hybrid.search("Python web", method="hybrid", limit=5)
455
+ ```
456
+
457
+ ---
458
+
459
+ ## Installation
460
+
461
+ ### Basic (BM25 + Hybrid)
462
+
463
+ ```bash
464
+ pip install scikit-learn numpy
465
+ ```
466
+
467
+ ### Full (All features)
468
+
469
+ ```bash
470
+ pip install -r requirements-search.txt
471
+ ```
472
+
473
+ This includes:
474
+ - scikit-learn (TF-IDF)
475
+ - numpy (numerical computing)
476
+ - sentence-transformers (optional embeddings)
477
+ - hnswlib (optional fast search)
478
+
479
+ ---
480
+
481
+ ## Testing
482
+
483
+ ### Run Test Suite
484
+
485
+ ```bash
486
+ python test_search_engine.py
487
+ ```
488
+
489
+ ### Expected Output
490
+
491
+ ```
492
+ ============================================================
493
+ SuperLocalMemory V2.2.0 - Search Engine Test Suite
494
+ ============================================================
495
+
496
+ ✓ PASS: BM25 Basic Functionality
497
+ → Indexed 4 docs, search returned 2 results
498
+ ✓ PASS: BM25 Performance
499
+ → Search: 18.45ms, Index: 247.32ms (1K docs)
500
+ ✓ PASS: Query Optimizer
501
+ → Spell correction and expansion working correctly
502
+ ✓ PASS: Cache Manager
503
+ → LRU eviction and stats working (hit rate: 33%)
504
+ ✓ PASS: Cache TTL
505
+ → Time-to-live expiration working correctly
506
+ ✓ PASS: Search Quality
507
+ → Relevance ranking correct, top score: 0.873
508
+ ✓ PASS: Hybrid Search Integration
509
+ → All methods working, 35.21ms search time
510
+ ✓ PASS: Weighted Fusion
511
+ → Multiple weight configurations working correctly
512
+
513
+ ============================================================
514
+ TEST SUMMARY
515
+ ============================================================
516
+ PASSED: 8
517
+ FAILED: 0
518
+ WARNINGS: 0
519
+
520
+ ✅ All tests passed!
521
+
522
+ Search Engine V2.2.0 Components:
523
+ ✓ BM25 Search Engine
524
+ ✓ Query Optimizer
525
+ ✓ Cache Manager
526
+ ✓ Hybrid Search System
527
+
528
+ Performance Targets:
529
+ ✓ BM25: <30ms for 1K memories
530
+ ✓ Hybrid: <50ms for 1K memories
531
+
532
+ 🎉 Ready for production!
533
+ ```
534
+
535
+ ---
536
+
537
+ ## CLI Usage
538
+
539
+ ### BM25 Search Engine
540
+
541
+ ```bash
542
+ python src/search_engine_v2.py
543
+ ```
544
+
545
+ Output:
546
+ ```
547
+ BM25 Search Engine - Demo
548
+ ============================================================
549
+
550
+ Indexing 6 documents...
551
+ ✓ Indexed in 3.21ms
552
+ Vocabulary: 42 unique terms
553
+ Avg doc length: 8.5 tokens
554
+
555
+ ============================================================
556
+ Search Results:
557
+ ============================================================
558
+
559
+ Query: 'Python programming'
560
+ Found: 3 results in 1.23ms
561
+ Query terms: ['python', 'programming']
562
+ [0.873] doc_0: Python is a high-level programming language...
563
+ [0.542] doc_2: Machine learning uses Python libraries...
564
+ [0.234] doc_4: Django is a Python web framework...
565
+ ```
566
+
567
+ ### Query Optimizer
568
+
569
+ ```bash
570
+ python src/query_optimizer.py
571
+ ```
572
+
573
+ ### Cache Manager
574
+
575
+ ```bash
576
+ python src/cache_manager.py
577
+ ```
578
+
579
+ ### Hybrid Search
580
+
581
+ ```bash
582
+ python src/hybrid_search.py "Python web development"
583
+ ```
584
+
585
+ ---
586
+
587
+ ## Migration from V2.1.0
588
+
589
+ No migration needed! V2.2.0 is backward compatible.
590
+
591
+ ### Changes
592
+
593
+ 1. **New components** (optional):
594
+ - `search_engine_v2.py` - BM25 engine
595
+ - `query_optimizer.py` - Query enhancement
596
+ - `cache_manager.py` - Result caching
597
+ - `hybrid_search.py` - Multi-method search
598
+
599
+ 2. **Existing behavior preserved**:
600
+ - `MemoryStoreV2.search()` still works
601
+ - Database schema unchanged
602
+ - API unchanged
603
+
604
+ ### Upgrade Path
605
+
606
+ **Option 1: Use old API (no changes)**
607
+ ```python
608
+ # Works exactly as before
609
+ store = MemoryStoreV2()
610
+ results = store.search("Python web")
611
+ ```
612
+
613
+ **Option 2: Use new hybrid search (recommended)**
614
+ ```python
615
+ # Better results, faster search
616
+ hybrid = HybridSearchEngine(store.db_path)
617
+ results = hybrid.search("Python web", method="hybrid")
618
+ ```
619
+
620
+ ---
621
+
622
+ ## Troubleshooting
623
+
624
+ ### Issue: "scikit-learn not found"
625
+
626
+ **Solution:**
627
+ ```bash
628
+ pip install scikit-learn numpy
629
+ ```
630
+
631
+ ### Issue: Search is slow (>50ms)
632
+
633
+ **Causes:**
634
+ 1. Large database (>10K memories)
635
+ 2. Complex queries
636
+ 3. Cold cache
637
+
638
+ **Solutions:**
639
+ 1. Use BM25 only: `method="bm25"`
640
+ 2. Reduce limit: `limit=10` instead of 50
641
+ 3. Enable caching: `enable_cache=True`
642
+
643
+ ### Issue: Poor relevance
644
+
645
+ **Solutions:**
646
+ 1. Try hybrid search: `method="hybrid"`
647
+ 2. Adjust weights: `weights={'bm25': 0.6, ...}`
648
+ 3. Use query expansion: `optimizer.optimize(..., enable_expansion=True)`
649
+
650
+ ### Issue: High memory usage
651
+
652
+ **Causes:**
653
+ 1. Large vocabulary (>100K terms)
654
+ 2. Cache too large
655
+
656
+ **Solutions:**
657
+ 1. Reduce BM25 `max_features` (not exposed by default)
658
+ 2. Reduce cache size: `CacheManager(max_size=50)`
659
+
660
+ ---
661
+
662
+ ## Advanced Topics
663
+
664
+ ### Custom BM25 Parameters
665
+
666
+ ```python
667
+ # For short documents (tweets, logs)
668
+ engine = BM25SearchEngine(k1=1.2, b=0.0)
669
+
670
+ # For long documents (articles, docs)
671
+ engine = BM25SearchEngine(k1=2.0, b=1.0)
672
+ ```
673
+
674
+ ### Custom Fusion Weights
675
+
676
+ ```python
677
+ # Keyword-heavy queries
678
+ results = hybrid.search(
679
+ "Python FastAPI REST API",
680
+ weights={'bm25': 0.8, 'semantic': 0.2, 'graph': 0.0}
681
+ )
682
+
683
+ # Conceptual queries
684
+ results = hybrid.search(
685
+ "how to optimize performance",
686
+ weights={'bm25': 0.2, 'semantic': 0.4, 'graph': 0.4}
687
+ )
688
+ ```
689
+
690
+ ### Cache Configuration
691
+
692
+ ```python
693
+ # High-traffic scenarios
694
+ cache = CacheManager(
695
+ max_size=1000, # Large cache
696
+ ttl_seconds=600, # 10 minute TTL
697
+ thread_safe=True # Enable locking
698
+ )
699
+
700
+ # Memory-constrained scenarios
701
+ cache = CacheManager(
702
+ max_size=50, # Small cache
703
+ ttl_seconds=60, # 1 minute TTL
704
+ thread_safe=False # No locking overhead
705
+ )
706
+ ```
707
+
708
+ ---
709
+
710
+ ## Roadmap
711
+
712
+ ### V2.2.1 (Planned)
713
+ - Query suggestions
714
+ - Fuzzy matching
715
+ - Phrase boosting
716
+
717
+ ### V2.3.0 (Future)
718
+ - Embedding-based search
719
+ - Neural reranking
720
+ - Cross-encoder scoring
721
+
722
+ ---
723
+
724
+ ## Credits
725
+
726
+ **Created by:** Varun Pratap Bhardwaj
727
+ **Role:** Solution Architect & Original Creator
728
+ **GitHub:** [@varun369](https://github.com/varun369)
729
+
730
+ ### Research Papers
731
+
732
+ 1. **BM25:** Robertson & Zaragoza (2009) - "The Probabilistic Relevance Framework: BM25 and Beyond"
733
+ 2. **RRF:** Cormack et al. (2009) - "Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods"
734
+ 3. **Query Expansion:** Carpineto & Romano (2012) - "A Survey of Automatic Query Expansion in Information Retrieval"
735
+
736
+ ---
737
+
738
+ ## License
739
+
740
+ MIT License - See [LICENSE](../LICENSE) file
741
+
742
+ **Attribution Required:** This notice must be preserved in all copies per MIT License terms.
743
+
744
+ ---
745
+
746
+ **Project:** [SuperLocalMemory V2](https://github.com/varun369/SuperLocalMemoryV2)
747
+ **Documentation:** [Full Docs](https://github.com/varun369/SuperLocalMemoryV2/wiki)
748
+ **Issues:** [Report Issues](https://github.com/varun369/SuperLocalMemoryV2/issues)