superlocalmemory 3.4.17 → 3.4.19

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (80)
  1. package/CHANGELOG.md +19 -0
  2. package/package.json +1 -3
  3. package/pyproject.toml +10 -1
  4. package/src/superlocalmemory/cli/setup_wizard.py +30 -0
  5. package/src/superlocalmemory/core/embeddings.py +8 -2
  6. package/src/superlocalmemory/retrieval/reranker.py +4 -2
  7. package/src/superlocalmemory.egg-info/PKG-INFO +4 -1
  8. package/src/superlocalmemory.egg-info/requires.txt +3 -0
  9. package/docs/ARCHITECTURE.md +0 -149
  10. package/docs/api-reference.md +0 -284
  11. package/docs/auto-memory.md +0 -150
  12. package/docs/cli-reference.md +0 -327
  13. package/docs/cloud-backup.md +0 -174
  14. package/docs/compliance.md +0 -191
  15. package/docs/configuration.md +0 -182
  16. package/docs/getting-started.md +0 -102
  17. package/docs/ide-setup.md +0 -261
  18. package/docs/mcp-tools.md +0 -220
  19. package/docs/migration-from-v2.md +0 -170
  20. package/docs/profiles.md +0 -173
  21. package/docs/screenshots/01-dashboard-main.png +0 -0
  22. package/docs/screenshots/02-knowledge-graph.png +0 -0
  23. package/docs/screenshots/03-math-health.png +0 -0
  24. package/docs/screenshots/03-patterns-learning.png +0 -0
  25. package/docs/screenshots/04-learning-dashboard.png +0 -0
  26. package/docs/screenshots/04-recall-lab.png +0 -0
  27. package/docs/screenshots/05-behavioral-analysis.png +0 -0
  28. package/docs/screenshots/05-trust-dashboard.png +0 -0
  29. package/docs/screenshots/06-graph-communities.png +0 -0
  30. package/docs/screenshots/06-settings.png +0 -0
  31. package/docs/screenshots/07-memories-blurred.png +0 -0
  32. package/docs/skill-evolution.md +0 -256
  33. package/docs/troubleshooting.md +0 -310
  34. package/docs/v2-archive/ACCESSIBILITY.md +0 -291
  35. package/docs/v2-archive/ARCHITECTURE.md +0 -886
  36. package/docs/v2-archive/CLI-COMMANDS-REFERENCE.md +0 -425
  37. package/docs/v2-archive/COMPRESSION-README.md +0 -390
  38. package/docs/v2-archive/FRAMEWORK-INTEGRATIONS.md +0 -300
  39. package/docs/v2-archive/MCP-MANUAL-SETUP.md +0 -775
  40. package/docs/v2-archive/MCP-TROUBLESHOOTING.md +0 -787
  41. package/docs/v2-archive/PATTERN-LEARNING.md +0 -228
  42. package/docs/v2-archive/PROFILES-GUIDE.md +0 -453
  43. package/docs/v2-archive/RESET-GUIDE.md +0 -353
  44. package/docs/v2-archive/SEARCH-ENGINE-V2.2.0.md +0 -749
  45. package/docs/v2-archive/SEARCH-INTEGRATION-GUIDE.md +0 -502
  46. package/docs/v2-archive/UI-SERVER.md +0 -262
  47. package/docs/v2-archive/UNIVERSAL-INTEGRATION.md +0 -488
  48. package/docs/v2-archive/V2.2.0-OPTIONAL-SEARCH.md +0 -666
  49. package/docs/v2-archive/WINDOWS-INSTALL-README.txt +0 -34
  50. package/docs/v2-archive/WINDOWS-POST-INSTALL.txt +0 -45
  51. package/docs/v2-archive/example_graph_usage.py +0 -146
  52. package/ui/index.html +0 -1879
  53. package/ui/js/agents.js +0 -192
  54. package/ui/js/auto-settings.js +0 -399
  55. package/ui/js/behavioral.js +0 -276
  56. package/ui/js/clusters.js +0 -206
  57. package/ui/js/compliance.js +0 -252
  58. package/ui/js/core.js +0 -246
  59. package/ui/js/dashboard.js +0 -110
  60. package/ui/js/events.js +0 -178
  61. package/ui/js/fact-detail.js +0 -92
  62. package/ui/js/feedback.js +0 -333
  63. package/ui/js/graph-core.js +0 -447
  64. package/ui/js/graph-filters.js +0 -220
  65. package/ui/js/graph-interactions.js +0 -351
  66. package/ui/js/graph-ui.js +0 -214
  67. package/ui/js/ide-status.js +0 -102
  68. package/ui/js/init.js +0 -45
  69. package/ui/js/learning.js +0 -435
  70. package/ui/js/lifecycle.js +0 -298
  71. package/ui/js/math-health.js +0 -98
  72. package/ui/js/memories.js +0 -264
  73. package/ui/js/modal.js +0 -357
  74. package/ui/js/patterns.js +0 -93
  75. package/ui/js/profiles.js +0 -236
  76. package/ui/js/recall-lab.js +0 -292
  77. package/ui/js/search.js +0 -59
  78. package/ui/js/settings.js +0 -224
  79. package/ui/js/timeline.js +0 -32
  80. package/ui/js/trust-dashboard.js +0 -73
package/docs/v2-archive/SEARCH-ENGINE-V2.2.0.md
@@ -1,749 +0,0 @@
1
- # SuperLocalMemory V2.2.0 - Search Engine Documentation
2
-
3
- **Created by:** [Varun Pratap Bhardwaj](https://github.com/varun369) (Solution Architect)
4
- **Version:** 2.2.0
5
- **Release Date:** 2026-02-07
6
-
7
- ---
8
-
9
- ## Overview
10
-
11
- SuperLocalMemory V2.2.0 introduces a professional-grade search engine with four integrated components:
12
-
13
- 1. **BM25 Search Engine** - Industry-standard keyword ranking
14
- 2. **Query Optimizer** - Spell correction and query expansion
15
- 3. **Cache Manager** - LRU caching for performance
16
- 4. **Hybrid Search System** - Multi-method retrieval fusion
17
-
18
- ### Why This Matters
19
-
20
- Previous versions relied on basic SQLite FTS and TF-IDF. V2.2.0 brings:
21
-
22
- - **3x faster search** - BM25 optimized for <30ms on 1K memories
23
- - **Better relevance** - BM25 outperforms TF-IDF by 15-20% in precision
24
- - **Query intelligence** - Auto-corrects typos, expands terms
25
- - **Multi-method fusion** - Combines keyword, semantic, and graph search
26
- - **Production-ready caching** - 30-50% cache hit rates reduce load
27
-
28
- ---
29
-
30
- ## Architecture
31
-
32
- ```
33
- ┌─────────────────────────────────────────────────────────────┐
34
- │ HYBRID SEARCH ENGINE (hybrid_search.py) │
35
- │ ┌──────────────┐ ┌──────────────┐ ┌─────────────┐ │
36
- │ │ BM25 │ │ Semantic │ │ Graph │ │
37
- │ │ Search │ │ (TF-IDF) │ │ Traversal │ │
38
- │ └──────────────┘ └──────────────┘ └─────────────┘ │
39
- │ │ │ │ │
40
- │ └──────────────────┴──────────────────┘ │
41
- │ │ │
42
- │ Weighted Fusion │
43
- │ (RRF / Scores) │
44
- └─────────────────────────────────────────────────────────────┘
45
-
46
- ┌─────────────────┼─────────────────┐
47
- │ │ │
48
- ┌────────▼────────┐ ┌──────▼──────┐ ┌───────▼────────┐
49
- │ Query Optimizer │ │ Cache │ │ Memory Store │
50
- │ - Spell Check │ │ Manager │ │ (DB) │
51
- │ - Expansion │ │ (LRU) │ │ │
52
- └─────────────────┘ └─────────────┘ └────────────────┘
53
- ```
54
-
55
- ---
56
-
57
- ## Components
58
-
59
- ### 1. BM25 Search Engine (`search_engine_v2.py`)
60
-
61
- **Pure Python implementation of Okapi BM25 ranking function.**
62
-
63
- #### Algorithm
64
-
65
- BM25 (Best Matching 25) is the gold standard for keyword search, used by:
66
- - Elasticsearch (default ranking)
67
- - Apache Solr
68
- - Apache Lucene
69
- - Microsoft Bing
70
-
71
- **Formula:**
72
- ```
73
- score(D,Q) = Σ IDF(qi) × (f(qi,D) × (k1 + 1)) / (f(qi,D) + k1 × (1 - b + b × |D| / avgdl))
74
- ```
75
-
76
- Where:
77
- - `f(qi,D)` = term frequency in document
78
- - `|D|` = document length
79
- - `avgdl` = average document length
80
- - `k1` = term saturation (default: 1.5)
81
- - `b` = length normalization (default: 0.75)
82
- - `IDF(qi)` = inverse document frequency
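
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the removed `search_engine_v2.py` implementation: the function name `bm25_scores`, the pre-tokenized input format, and the smoothed, non-negative IDF variant are all assumptions, since the removed document does not say which IDF form it used.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against `query_terms` (hypothetical helper)."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for q in query_terms:
            if tf[q] == 0:
                continue
            # Assumed IDF variant: smoothed so it never goes negative.
            idf = math.log(1 + (n_docs - df[q] + 0.5) / (df[q] + 0.5))
            # Denominator of the formula: f + k1 * (1 - b + b * |D| / avgdl).
            norm = tf[q] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[q] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["python", "web", "development"],
    ["javascript", "frontend", "web"],
    ["python", "data", "science", "python"],
]
scores = bm25_scores(["python", "web"], docs)
```

In this toy corpus the first document matches both query terms and ranks highest, while the second matches only `web` and ranks lowest.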
83
-
84
- #### Key Features
85
-
86
- - **No Dependencies** - Pure Python, no external libraries
87
- - **Fast Indexing** - O(n × m) where n=docs, m=avg_tokens
88
- - **Fast Search** - O(q × p) where q=query_terms, p=postings
89
- - **Memory Efficient** - Inverted index with compressed postings
90
-
91
- #### Performance
92
-
93
- | Metric | Target | Typical |
94
- |--------|--------|---------|
95
- | Index 1K docs | <500ms | 250ms |
96
- | Search 1K docs | <30ms | 15-25ms |
97
- | Memory overhead | <50MB | 30-40MB |
98
-
99
- #### Usage
100
-
101
- ```python
102
- from search_engine_v2 import BM25SearchEngine
103
-
104
- # Initialize
105
- engine = BM25SearchEngine(k1=1.5, b=0.75)
106
-
107
- # Index documents
108
- documents = ["Python web development", "JavaScript frontend", ...]
109
- doc_ids = [1, 2, ...]
110
- engine.index_documents(documents, doc_ids)
111
-
112
- # Search
113
- results = engine.search("Python web", limit=10)
114
- # Returns: [(doc_id, score), ...]
115
-
116
- # Get statistics
117
- stats = engine.get_stats()
118
- print(f"Indexed {stats['num_documents']} documents")
119
- print(f"Vocabulary: {stats['vocabulary_size']} terms")
120
- ```
121
-
122
- #### Parameter Tuning
123
-
124
- **k1 (Term Frequency Saturation)**
125
- - Lower (1.2): Better for short documents
126
- - Higher (2.0): Better for long documents
127
- - Default (1.5): Balanced for most use cases
128
-
129
- **b (Length Normalization)**
130
- - 0.0: No normalization (good for uniform length docs)
131
- - 0.5: Moderate normalization
132
- - 0.75: Standard normalization (default)
133
- - 1.0: Full normalization (aggressive for long docs)
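
To make the effect of `k1` and `b` concrete, the sketch below evaluates only the term-frequency part of the BM25 formula for a short and a long document. The helper name `tf_component` and the sample lengths are illustrative assumptions, not part of the removed code.

```python
def tf_component(tf, doc_len, avgdl, k1=1.5, b=0.75):
    """Term-frequency factor of BM25 for one term (IDF omitted)."""
    return tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avgdl))


# b = 0.0: document length is ignored, so short and long documents score alike.
short_b0 = tf_component(2, doc_len=50, avgdl=100, b=0.0)
long_b0 = tf_component(2, doc_len=200, avgdl=100, b=0.0)

# b = 1.0: full normalization, so the long document is penalized.
short_b1 = tf_component(2, doc_len=50, avgdl=100, b=1.0)
long_b1 = tf_component(2, doc_len=200, avgdl=100, b=1.0)

# k1 bounds how much repeated terms can add: the factor stays below k1 + 1.
saturated = tf_component(1000, doc_len=100, avgdl=100, k1=1.5)
```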
134
-
135
- ---
136
-
137
- ### 2. Query Optimizer (`query_optimizer.py`)
138
-
139
- **Enhances queries with spell correction and expansion.**
140
-
141
- #### Features
142
-
143
- **1. Spell Correction**
144
- - Edit distance (Levenshtein) algorithm
145
- - Vocabulary-based correction
146
- - Technical term preservation (API, SQL, JWT, etc.)
147
- - Max distance: 2 edits
148
-
149
- **2. Query Expansion**
150
- - Co-occurrence based expansion
151
- - Adds related terms to broaden search
152
- - Configurable expansion count
153
- - Minimum co-occurrence threshold
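
Co-occurrence based expansion, as described above, might look roughly like the following. The function names, the document-level co-occurrence window, and the tie-breaking rule are assumptions made for illustration; the removed `query_optimizer.py` may have worked differently.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(documents):
    """Count how often each ordered pair of distinct terms shares a document."""
    cooc = defaultdict(Counter)
    for tokens in documents:
        for a, b in permutations(set(tokens), 2):
            cooc[a][b] += 1
    return cooc


def expand_query(terms, cooc, max_expansions=2, min_cooccurrence=1):
    """Append up to `max_expansions` related terms per query term."""
    expanded = list(terms)
    for term in terms:
        # Highest count first; alphabetical order breaks ties deterministically.
        related = sorted(cooc[term].items(), key=lambda kv: (-kv[1], kv[0]))
        added = 0
        for candidate, count in related:
            if added >= max_expansions:
                break
            if count >= min_cooccurrence and candidate not in expanded:
                expanded.append(candidate)
                added += 1
    return expanded


documents = [
    ["python", "web", "development"],
    ["javascript", "frontend", "web"],
    ["python", "web", "api"],
]
cooc = build_cooccurrence(documents)
expanded = expand_query(["python"], cooc, max_expansions=2)
```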
154
-
155
- **3. Boolean Operators**
156
- - AND: `term1 AND term2` (both required)
157
- - OR: `term1 OR term2` (at least one required)
158
- - NOT: `term1 NOT term2` (exclude term2)
159
- - Phrase: `"exact phrase"` (exact match)
160
-
161
- #### Usage
162
-
163
- ```python
164
- from query_optimizer import QueryOptimizer
165
-
166
- # Initialize with vocabulary
167
- vocabulary = {'python', 'javascript', 'web', 'development', ...}
168
- optimizer = QueryOptimizer(vocabulary)
169
-
170
- # Build co-occurrence for expansion
171
- documents = [
172
- ['python', 'web', 'development'],
173
- ['javascript', 'frontend', 'web'],
174
- ...
175
- ]
176
- optimizer.build_cooccurrence_matrix(documents)
177
-
178
- # Spell correction
179
- corrected = optimizer.spell_correct("pythno") # → "python"
180
-
181
- # Query expansion
182
- expanded = optimizer.expand_query(['python'], max_expansions=2)
183
- # Returns: ['python', 'web', 'development']
184
-
185
- # Full optimization
186
- optimized = optimizer.optimize(
187
- "pythno web devlopment",
188
- enable_spell_correction=True,
189
- enable_expansion=False
190
- )
191
- # Returns: "python web development"
192
-
193
- # Boolean parsing
194
- parsed = optimizer.parse_boolean_query('python AND (web OR api)')
195
- ```
196
-
197
- #### Spell Correction Algorithm
198
-
199
- Uses Levenshtein distance with optimizations:
200
-
201
- 1. **Quick filters:**
202
- - Length difference > max_distance → skip
203
- - Term in vocabulary → return as-is
204
- - Technical terms (≤3 chars) → preserve
205
-
206
- 2. **Approximate matching:**
207
- - Uses `difflib.get_close_matches()` for candidates
208
- - Validates with full edit distance
209
- - Returns best match within threshold
210
-
211
- 3. **Performance:**
212
- - O(k × m × n) where k=candidates, m,n=string lengths
213
- - Typical: <5ms per query
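
The steps above can be sketched as follows. This is an approximation under stated assumptions: `difflib.get_close_matches` is the standard-library call named in the text, but the `cutoff` and `n` values, the function names, and the plain (non-transposition) Levenshtein distance are guesses rather than details taken from the removed module.

```python
import difflib


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, O(len(a) * len(b))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def spell_correct(term, vocabulary, max_distance=2):
    """Return the closest vocabulary word within `max_distance` edits, else `term`."""
    term_l = term.lower()
    # Quick filters: known words and very short (technical) terms pass through.
    if term_l in vocabulary or len(term_l) <= 3:
        return term
    # Skip words whose length alone rules them out.
    candidates = [w for w in vocabulary
                  if abs(len(w) - len(term_l)) <= max_distance]
    close = difflib.get_close_matches(term_l, candidates, n=5, cutoff=0.6)
    best, best_dist = term, max_distance + 1
    for cand in close:
        dist = levenshtein(term_l, cand)  # validate with the full edit distance
        if dist < best_dist:
            best, best_dist = cand, dist
    return best


vocabulary = {"python", "javascript", "web", "development"}
corrected = spell_correct("pythno", vocabulary)
```

Note that a transposition such as `pythno` counts as two edits under plain Levenshtein distance, which is why a maximum distance of 2 is enough to recover `python` here.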
214
-
215
- ---
216
-
217
- ### 3. Cache Manager (`cache_manager.py`)
218
-
219
- **LRU cache for search results with TTL support.**
220
-
221
- #### Features
222
-
223
- - **LRU Eviction** - Least Recently Used policy
224
- - **TTL Support** - Time-to-live for cache entries
225
- - **Thread-Safe** - Optional locking for concurrent access
226
- - **Size-Based** - Maximum entry count
227
- - **Analytics** - Hit rate, access counts, eviction tracking
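
A minimal sketch of an LRU cache with TTL, built on `collections.OrderedDict`, shows how the features listed above can fit together. The class name `LRUCacheSketch` and the injectable `clock` parameter are hypothetical, and locking is left out, so this corresponds to the `thread_safe=False` case only.

```python
import time
from collections import OrderedDict


class LRUCacheSketch:
    """Illustrative LRU + TTL cache; not the removed CacheManager class."""

    def __init__(self, max_size=100, ttl_seconds=300, clock=time.monotonic):
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()  # key -> (stored_at, value)
        self.hits = self.misses = self.evictions = 0

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            self.misses += 1
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self.ttl_seconds:
            del self._entries[key]  # expired entries count as misses
            self.misses += 1
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        self.hits += 1
        return value

    def put(self, key, value):
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = (self._clock(), value)
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # drop the least recently used
            self.evictions += 1


cache = LRUCacheSketch(max_size=2, ttl_seconds=300)
cache.put("a", 1)
cache.put("b", 2)
first = cache.get("a")   # touching "a" makes "b" the eviction candidate
cache.put("c", 3)        # exceeds max_size, so "b" is evicted
```

Passing a fake `clock` makes TTL expiry testable without sleeping.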
228
-
229
- #### Usage
230
-
231
- ```python
232
- from cache_manager import CacheManager
233
-
234
- # Initialize
235
- cache = CacheManager(
236
- max_size=100, # Max 100 cached queries
237
- ttl_seconds=300, # 5 minute TTL
238
- thread_safe=False # Single-threaded
239
- )
240
-
241
- # Cache operations
242
- result = cache.get("python web")
243
- if result is None:
244
- # Cache miss - perform search
245
- result = search_engine.search("python web")
246
- cache.put("python web", result)
247
-
248
- # Statistics
249
- stats = cache.get_stats()
250
- print(f"Hit rate: {stats['hit_rate']*100:.1f}%")
251
- print(f"Evictions: {stats['evictions']}")
252
-
253
- # Manual eviction
254
- cache.evict_expired() # Remove expired entries
255
- cache.clear() # Clear all entries
256
- ```
257
-
258
- #### Cache Key Generation
259
-
260
- Keys are generated from query + parameters:
261
- ```python
262
- key = hash(json.dumps({
263
- 'query': query,
264
- 'limit': limit,
265
- 'method': method,
266
- # ... other parameters
267
- }))
268
- ```
269
-
270
- This ensures different parameter combinations get separate cache entries.
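
One caveat with the snippet above: Python's built-in `hash()` on strings is salted per process by default, and `json.dumps` preserves dictionary insertion order, so the same parameters passed in a different order produce a different key. A hedged alternative, assuming a hypothetical helper named `make_cache_key`, is to sort the keys and use a stable digest:

```python
import hashlib
import json


def make_cache_key(query, **params):
    """Build a deterministic key from the query and its parameters."""
    payload = json.dumps({"query": query, **params}, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_a = make_cache_key("python web", limit=10, method="bm25")
key_b = make_cache_key("python web", method="bm25", limit=10)
key_c = make_cache_key("python web", limit=5, method="bm25")
```

For a purely in-process cache the salted `hash()` is still usable. The stable digest matters only if keys are persisted or shared between processes.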
271
-
272
- #### Performance Impact
273
-
274
- | Operation | Time | Description |
275
- |-----------|------|-------------|
276
- | Cache hit | ~0.1ms | Dictionary lookup |
277
- | Cache miss | Search time + 0.1ms | Standard search + cache store |
278
- | Eviction | ~0.01ms | OrderedDict pop |
279
-
280
- **Expected hit rates:**
281
- - Repeated queries: 80-90%
282
- - Similar queries: 10-20%
283
- - Overall typical: 30-50%
284
-
285
- ---
286
-
287
- ### 4. Hybrid Search System (`hybrid_search.py`)
288
-
289
- **Multi-method retrieval with score fusion.**
290
-
291
- #### Fusion Methods
292
-
293
- **1. Weighted Score Fusion**
294
- - Normalizes scores from each method
295
- - Combines with configurable weights
296
- - Best for balanced results
297
-
298
- **2. Reciprocal Rank Fusion (RRF)**
299
- - Rank-based combination
300
- - Robust to score magnitude differences
301
- - Standard: `score = Σ 1/(k + rank)` where k=60
302
-
303
- **3. Single Method**
304
- - BM25 only
305
- - Semantic only
306
- - Graph only
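
Both fusion strategies can be sketched briefly. The function names and input shapes below are assumptions, and min-max scaling is only one plausible reading of "normalizes scores", since the removed document does not specify the normalization.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion over ranked doc-id lists (best result first)."""
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def weighted_fuse(method_scores, weights):
    """Weighted sum of per-method scores after min-max normalization (assumed)."""
    fused = {}
    for method, scores in method_scores.items():
        w = weights.get(method, 0.0)
        if not scores or w == 0.0:
            continue
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        for doc_id, s in scores.items():
            norm = (s - lo) / span if span else 1.0
            fused[doc_id] = fused.get(doc_id, 0.0) + w * norm
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


rrf = rrf_fuse([[1, 2, 3], [1, 3, 4]])
weighted = weighted_fuse(
    {"bm25": {1: 12.0, 2: 6.0, 3: 0.0}, "semantic": {1: 0.2, 2: 0.9}},
    {"bm25": 0.4, "semantic": 0.3, "graph": 0.3},
)
```

RRF uses only rank positions, so the raw BM25 scores (unbounded) and similarity scores (typically 0 to 1) never need to be put on a common scale. Weighted fusion depends on the normalization step for exactly that reason.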
307
-
308
- #### Default Weights
309
-
310
- ```python
311
- weights = {
312
- 'bm25': 0.4, # 40% - Best for keyword queries
313
- 'semantic': 0.3, # 30% - Best for natural language
314
- 'graph': 0.3 # 30% - Best for conceptual queries
315
- }
316
- ```
317
-
318
- #### Usage
319
-
320
- ```python
321
- from hybrid_search import HybridSearchEngine
322
- from pathlib import Path
323
-
324
- # Initialize
325
- db_path = Path.home() / ".claude-memory" / "memory.db"
326
- hybrid = HybridSearchEngine(db_path, enable_cache=True)
327
-
328
- # BM25 only
329
- results = hybrid.search("Python web", method="bm25", limit=10)
330
-
331
- # Hybrid with default weights
332
- results = hybrid.search("Python web", method="hybrid", limit=10)
333
-
334
- # Custom weights
335
- results = hybrid.search(
336
- query="Python web",
337
- method="weighted",
338
- weights={'bm25': 0.6, 'semantic': 0.4, 'graph': 0.0},
339
- limit=10
340
- )
341
-
342
- # Reciprocal Rank Fusion
343
- results = hybrid.search("Python web", method="rrf", limit=10)
344
-
345
- # Statistics
346
- stats = hybrid.get_stats()
347
- print(f"Search time: {stats['last_search_time_ms']:.2f}ms")
348
- print(f"Fusion time: {stats['last_fusion_time_ms']:.2f}ms")
349
- print(f"Cache hit rate: {stats['cache']['hit_rate']*100:.1f}%")
350
- ```
351
-
352
- #### Result Format
353
-
354
- ```python
355
- [
356
- {
357
- 'id': 1,
358
- 'content': 'Memory content...',
359
- 'summary': 'Summary...',
360
- 'score': 0.87,
361
- 'match_type': 'hybrid',
362
- 'category': 'development',
363
- 'tags': ['python', 'web'],
364
- # ... other memory fields
365
- },
366
- ...
367
- ]
368
- ```
369
-
370
- #### Weight Tuning Guide
371
-
372
- **For keyword queries** (exact terms):
373
- ```python
374
- weights = {'bm25': 0.7, 'semantic': 0.3, 'graph': 0.0}
375
- ```
376
-
377
- **For conceptual queries** (themes):
378
- ```python
379
- weights = {'bm25': 0.2, 'semantic': 0.3, 'graph': 0.5}
380
- ```
381
-
382
- **For balanced queries** (mixed):
383
- ```python
384
- weights = {'bm25': 0.4, 'semantic': 0.3, 'graph': 0.3} # Default
385
- ```
386
-
387
- ---
388
-
389
- ## Performance Benchmarks
390
-
391
- ### Test Environment
392
- - MacBook Pro M1 (2021)
393
- - Python 3.11
394
- - 1,000 test memories
395
- - Average memory size: 200 tokens
396
-
397
- ### Results
398
-
399
- | Component | Metric | Target | Actual | Status |
400
- |-----------|--------|--------|--------|--------|
401
- | BM25 | Index 1K docs | <500ms | 247ms | ✅ |
402
- | BM25 | Search 1K docs | <30ms | 18ms | ✅ |
403
- | Query Optimizer | Spell check | <5ms | 2ms | ✅ |
404
- | Cache Manager | Get/Put | <0.5ms | 0.12ms | ✅ |
405
- | Hybrid Search | Combined | <50ms | 35ms | ✅ |
406
-
407
- ### Scalability
408
-
409
- | Documents | BM25 Index | BM25 Search | Hybrid Search |
410
- |-----------|------------|-------------|---------------|
411
- | 100 | 25ms | 3ms | 8ms |
412
- | 500 | 120ms | 10ms | 20ms |
413
- | 1,000 | 247ms | 18ms | 35ms |
414
- | 5,000 | 1,200ms | 45ms | 95ms |
415
- | 10,000 | 2,400ms | 80ms | 180ms |
416
-
417
- **Notes:**
418
- - Index time is one-time cost
419
- - Search time scales sub-linearly (inverted index efficiency)
420
- - Hybrid search includes fusion overhead (~10-15ms)
421
- - These are projected estimates for the optional BM25 engine. See wiki Performance Benchmarks for measured end-to-end search latency.
422
-
423
- ---
424
-
425
- ## Integration with Memory Store V2
426
-
427
- ### Automatic Integration
428
-
429
- Hybrid search automatically integrates with `MemoryStoreV2`:
430
-
431
- ```python
432
- from memory_store_v2 import MemoryStoreV2
433
- from hybrid_search import HybridSearchEngine
434
-
435
- # Initialize
436
- store = MemoryStoreV2()
437
- hybrid = HybridSearchEngine(store.db_path)
438
-
439
- # Add memories (automatically indexed)
440
- store.add_memory("Python web development", tags=['python', 'web'])
441
-
442
- # Search
443
- results = hybrid.search("Python", limit=5)
444
- ```
445
-
446
- ### Backward Compatibility
447
-
448
- V2.2.0 maintains full backward compatibility:
449
-
450
- ```python
451
- # Old API still works
452
- results = store.search("Python web", limit=5)
453
-
454
- # New API available
455
- results = hybrid.search("Python web", method="hybrid", limit=5)
456
- ```
457
-
458
- ---
459
-
460
- ## Installation
461
-
462
- ### Basic (BM25 + Hybrid)
463
-
464
- ```bash
465
- pip install scikit-learn numpy
466
- ```
467
-
468
- ### Full (All features)
469
-
470
- ```bash
471
- pip install -r requirements-search.txt
472
- ```
473
-
474
- This includes:
475
- - scikit-learn (TF-IDF)
476
- - numpy (numerical computing)
477
- - sentence-transformers (optional embeddings)
478
- - hnswlib (optional fast search)
479
-
480
- ---
481
-
482
- ## Testing
483
-
484
- ### Run Test Suite
485
-
486
- ```bash
487
- python test_search_engine.py
488
- ```
489
-
490
- ### Expected Output
491
-
492
- ```
493
- ============================================================
494
- SuperLocalMemory V2.2.0 - Search Engine Test Suite
495
- ============================================================
496
-
497
- ✓ PASS: BM25 Basic Functionality
498
- → Indexed 4 docs, search returned 2 results
499
- ✓ PASS: BM25 Performance
500
- → Search: 18.45ms, Index: 247.32ms (1K docs)
501
- ✓ PASS: Query Optimizer
502
- → Spell correction and expansion working correctly
503
- ✓ PASS: Cache Manager
504
- → LRU eviction and stats working (hit rate: 33%)
505
- ✓ PASS: Cache TTL
506
- → Time-to-live expiration working correctly
507
- ✓ PASS: Search Quality
508
- → Relevance ranking correct, top score: 0.873
509
- ✓ PASS: Hybrid Search Integration
510
- → All methods working, 35.21ms search time
511
- ✓ PASS: Weighted Fusion
512
- → Multiple weight configurations working correctly
513
-
514
- ============================================================
515
- TEST SUMMARY
516
- ============================================================
517
- PASSED: 8
518
- FAILED: 0
519
- WARNINGS: 0
520
-
521
- ✅ All tests passed!
522
-
523
- Search Engine V2.2.0 Components:
524
- ✓ BM25 Search Engine
525
- ✓ Query Optimizer
526
- ✓ Cache Manager
527
- ✓ Hybrid Search System
528
-
529
- Performance Targets:
530
- ✓ BM25: <30ms for 1K memories
531
- ✓ Hybrid: <50ms for 1K memories
532
-
533
- 🎉 Ready for production!
534
- ```
535
-
536
- ---
537
-
538
- ## CLI Usage
539
-
540
- ### BM25 Search Engine
541
-
542
- ```bash
543
- python src/search_engine_v2.py
544
- ```
545
-
546
- Output:
547
- ```
548
- BM25 Search Engine - Demo
549
- ============================================================
550
-
551
- Indexing 6 documents...
552
- ✓ Indexed in 3.21ms
553
- Vocabulary: 42 unique terms
554
- Avg doc length: 8.5 tokens
555
-
556
- ============================================================
557
- Search Results:
558
- ============================================================
559
-
560
- Query: 'Python programming'
561
- Found: 3 results in 1.23ms
562
- Query terms: ['python', 'programming']
563
- [0.873] doc_0: Python is a high-level programming language...
564
- [0.542] doc_2: Machine learning uses Python libraries...
565
- [0.234] doc_4: Django is a Python web framework...
566
- ```
567
-
568
- ### Query Optimizer
569
-
570
- ```bash
571
- python src/query_optimizer.py
572
- ```
573
-
574
- ### Cache Manager
575
-
576
- ```bash
577
- python src/cache_manager.py
578
- ```
579
-
580
- ### Hybrid Search
581
-
582
- ```bash
583
- python src/hybrid_search.py "Python web development"
584
- ```
585
-
586
- ---
587
-
588
- ## Migration from V2.1.0
589
-
590
- No migration needed! V2.2.0 is backward compatible.
591
-
592
- ### Changes
593
-
594
- 1. **New components** (optional):
595
- - `search_engine_v2.py` - BM25 engine
596
- - `query_optimizer.py` - Query enhancement
597
- - `cache_manager.py` - Result caching
598
- - `hybrid_search.py` - Multi-method search
599
-
600
- 2. **Existing behavior preserved**:
601
- - `MemoryStoreV2.search()` still works
602
- - Database schema unchanged
603
- - API unchanged
604
-
605
- ### Upgrade Path
606
-
607
- **Option 1: Use old API (no changes)**
608
- ```python
609
- # Works exactly as before
610
- store = MemoryStoreV2()
611
- results = store.search("Python web")
612
- ```
613
-
614
- **Option 2: Use new hybrid search (recommended)**
615
- ```python
616
- # Better results, faster search
617
- hybrid = HybridSearchEngine(store.db_path)
618
- results = hybrid.search("Python web", method="hybrid")
619
- ```
620
-
621
- ---
622
-
623
- ## Troubleshooting
624
-
625
- ### Issue: "scikit-learn not found"
626
-
627
- **Solution:**
628
- ```bash
629
- pip install scikit-learn numpy
630
- ```
631
-
632
- ### Issue: Search is slow (>50ms)
633
-
634
- **Causes:**
635
- 1. Large database (>10K memories)
636
- 2. Complex queries
637
- 3. Cold cache
638
-
639
- **Solutions:**
640
- 1. Use BM25 only: `method="bm25"`
641
- 2. Reduce limit: `limit=10` instead of 50
642
- 3. Enable caching: `enable_cache=True`
643
-
644
- ### Issue: Poor relevance
645
-
646
- **Solutions:**
647
- 1. Try hybrid search: `method="hybrid"`
648
- 2. Adjust weights: `weights={'bm25': 0.6, ...}`
649
- 3. Use query expansion: `optimizer.optimize(..., enable_expansion=True)`
650
-
651
- ### Issue: High memory usage
652
-
653
- **Causes:**
654
- 1. Large vocabulary (>100K terms)
655
- 2. Cache too large
656
-
657
- **Solutions:**
658
- 1. Reduce BM25 `max_features` (not exposed by default)
659
- 2. Reduce cache size: `CacheManager(max_size=50)`
660
-
661
- ---
662
-
663
- ## Advanced Topics
664
-
665
- ### Custom BM25 Parameters
666
-
667
- ```python
668
- # For short documents (tweets, logs)
669
- engine = BM25SearchEngine(k1=1.2, b=0.0)
670
-
671
- # For long documents (articles, docs)
672
- engine = BM25SearchEngine(k1=2.0, b=1.0)
673
- ```
674
-
675
- ### Custom Fusion Weights
676
-
677
- ```python
678
- # Keyword-heavy queries
679
- results = hybrid.search(
680
- "Python FastAPI REST API",
681
- weights={'bm25': 0.8, 'semantic': 0.2, 'graph': 0.0}
682
- )
683
-
684
- # Conceptual queries
685
- results = hybrid.search(
686
- "how to optimize performance",
687
- weights={'bm25': 0.2, 'semantic': 0.4, 'graph': 0.4}
688
- )
689
- ```
690
-
691
- ### Cache Configuration
692
-
693
- ```python
694
- # High-traffic scenarios
695
- cache = CacheManager(
696
- max_size=1000, # Large cache
697
- ttl_seconds=600, # 10 minute TTL
698
- thread_safe=True # Enable locking
699
- )
700
-
701
- # Memory-constrained scenarios
702
- cache = CacheManager(
703
- max_size=50, # Small cache
704
- ttl_seconds=60, # 1 minute TTL
705
- thread_safe=False # No locking overhead
706
- )
707
- ```
708
-
709
- ---
710
-
711
- ## Roadmap
712
-
713
- ### V2.2.1 (Planned)
714
- - Query suggestions
715
- - Fuzzy matching
716
- - Phrase boosting
717
-
718
- ### V2.3.0 (Future)
719
- - Embedding-based search
720
- - Neural reranking
721
- - Cross-encoder scoring
722
-
723
- ---
724
-
725
- ## Credits
726
-
727
- **Created by:** Varun Pratap Bhardwaj
728
- **Role:** Solution Architect & Original Creator
729
- **GitHub:** [@varun369](https://github.com/varun369)
730
-
731
- ### Research Papers
732
-
733
- 1. **BM25:** Robertson & Zaragoza (2009) - "The Probabilistic Relevance Framework: BM25 and Beyond"
734
- 2. **RRF:** Cormack et al. (2009) - "Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods"
735
- 3. **Query Expansion:** Carpineto & Romano (2012) - "A Survey of Automatic Query Expansion in Information Retrieval"
736
-
737
- ---
738
-
739
- ## License
740
-
741
- AGPL-3.0 - See [LICENSE](../LICENSE) file
742
-
743
- **Attribution Required:** This notice must be preserved in all copies per AGPL-3.0 terms.
744
-
745
- ---
746
-
747
- **Project:** [SuperLocalMemory V2](https://github.com/qualixar/superlocalmemory)
748
- **Documentation:** [Full Docs](https://github.com/qualixar/superlocalmemory/wiki)
749
- **Issues:** [Report Issues](https://github.com/qualixar/superlocalmemory/issues)