cozo-memory 1.0.9 → 1.1.0

package/README.md CHANGED
@@ -63,6 +63,8 @@ Now you can add the server to your MCP client (e.g. Claude Desktop).
 
  🧠 **Agentic Retrieval Layer (v2.0)** - Auto-routing engine that analyzes query intent via local LLM to select the optimal search strategy (Vector, Graph, or Community)
 
+ 🎯 **Tiny Learned Reranker (v2.0)** - Integrated Cross-Encoder model (`ms-marco-MiniLM-L-6-v2`) for ultra-precise re-ranking of top search results
+
  🎯 **Multi-Vector Support (since v1.7)** - Dual embeddings per entity: content-embedding for context, name-embedding for identification
 
  ⚡ **Semantic Caching (since v0.8.5)** - Two-level cache (L1 memory + L2 persistent) with semantic query matching
@@ -191,8 +193,10 @@ This tool compares strategies using a synthetic dataset and measures **Recall@K*
  | Method | Recall@10 | Avg Latency | Best For |
  | :--- | :--- | :--- | :--- |
  | **Graph-RAG** | **1.00** | **~32 ms** | Deep relational reasoning |
+ | **Graph-RAG (Reranked)** | **1.00** | **~36 ms** | Maximum precision for relational data |
  | **Graph-Walking** | 1.00 | ~50 ms | Associative path exploration |
  | **Hybrid Search** | 1.00 | ~89 ms | Broad factual retrieval |
+ | **Reranked Search** | 1.00 | ~20 ms* | Ultra-precise factual search (Warm cache) |
 
  ## Architecture
 
@@ -608,14 +612,14 @@ PDF Ingestion via File Path:
  ### query_memory (Read)
 
  Actions:
- - `search`: `{ query, limit?, entity_types?, include_entities?, include_observations? }`
- - `advancedSearch`: `{ query, limit?, filters?, graphConstraints?, vectorOptions? }` **(New v1.1 / v1.4)**: Extended search with native HNSW filters (types) and robust post-filtering (metadata, time).
+ - `search`: `{ query, limit?, entity_types?, include_entities?, include_observations?, rerank? }`
+ - `advancedSearch`: `{ query, limit?, filters?, graphConstraints?, vectorOptions?, rerank? }`
  - `context`: `{ query, context_window?, time_range_hours? }`
  - `entity_details`: `{ entity_id, as_of? }`
  - `history`: `{ entity_id }`
- - `graph_rag`: `{ query, max_depth?, limit?, filters? }` Graph-based reasoning. Finds vector seeds (with inline filtering) first and then expands transitive relationships. Uses recursive Datalog for efficient BFS expansion.
+ - `graph_rag`: `{ query, max_depth?, limit?, filters?, rerank? }` Graph-based reasoning. Finds vector seeds (with inline filtering) first and then expands transitive relationships. Uses recursive Datalog for efficient BFS expansion.
  - `graph_walking`: `{ query, start_entity_id?, max_depth?, limit? }` (v1.7) Recursive semantic graph search. Starts at vector seeds or a specific entity and follows relationships to other semantically relevant entities. Ideal for deeper path exploration.
- - `agentic_search`: `{ query, limit? }` **(New v2.0)**: **Auto-Routing Search**. Uses a local LLM (Ollama) to analyze query intent and automatically routes it to the most appropriate strategy (`vector_search`, `graph_walk`, or `community_summary`).
+ - `agentic_search`: `{ query, limit?, rerank? }` **(New v2.0)**: **Auto-Routing Search**. Uses a local LLM (Ollama) to analyze query intent and automatically routes it to the most appropriate strategy (`vector_search`, `graph_walk`, or `community_summary`).
  - `get_relation_evolution`: `{ from_id, to_id?, since?, until? }` (in `analyze_graph`) Shows temporal development of relationships including time range filter and diff summary.
 
  Important Details:
@@ -885,6 +889,15 @@ Uncertainty/Transparency:
  - Inference candidates are marked as `source: "inference"` and provide a short reason (uncertainty hint) in the result.
  - In `context` output, inferred entities additionally carry an `uncertainty_hint` so an LLM can distinguish "hard fact" vs. "conjecture".
 
+ ### Tiny Learned Reranker (Cross-Encoder)
+
+ For maximum precision, CozoDB Memory integrates a specialized **Cross-Encoder Reranker** (Phase 2 RAG).
+
+ - **Model**: `Xenova/ms-marco-MiniLM-L-6-v2` (Local ONNX)
+ - **Mechanism**: After initial hybrid retrieval, the top candidates (up to 30) are re-evaluated by the cross-encoder. Unlike bi-encoders (vectors), cross-encoders process query and document simultaneously, capturing deep semantic nuances.
+ - **Latency**: Minimal overhead (~4-6ms for top 10 candidates).
+ - **Supported Tools**: Available as a `rerank: true` parameter in `search`, `advancedSearch`, `graph_rag`, and `agentic_search`.
+
  ### Inference Engine
 
  Inference uses multiple strategies (non-persisting):
@@ -0,0 +1,402 @@
+ "use strict";
+ var __createBinding = (this && this.__createBinding) || (Object.create ? (function(o, m, k, k2) {
+ if (k2 === undefined) k2 = k;
+ var desc = Object.getOwnPropertyDescriptor(m, k);
+ if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
+ desc = { enumerable: true, get: function() { return m[k]; } };
+ }
+ Object.defineProperty(o, k2, desc);
+ }) : (function(o, m, k, k2) {
+ if (k2 === undefined) k2 = k;
+ o[k2] = m[k];
+ }));
+ var __setModuleDefault = (this && this.__setModuleDefault) || (Object.create ? (function(o, v) {
+ Object.defineProperty(o, "default", { enumerable: true, value: v });
+ }) : function(o, v) {
+ o["default"] = v;
+ });
+ var __importStar = (this && this.__importStar) || (function () {
+ var ownKeys = function(o) {
+ ownKeys = Object.getOwnPropertyNames || function (o) {
+ var ar = [];
+ for (var k in o) if (Object.prototype.hasOwnProperty.call(o, k)) ar[ar.length] = k;
+ return ar;
+ };
+ return ownKeys(o);
+ };
+ return function (mod) {
+ if (mod && mod.__esModule) return mod;
+ var result = {};
+ if (mod != null) for (var k = ownKeys(mod), i = 0; i < k.length; i++) if (k[i] !== "default") __createBinding(result, mod, k[i]);
+ __setModuleDefault(result, mod);
+ return result;
+ };
+ })();
+ Object.defineProperty(exports, "__esModule", { value: true });
+ require("dotenv/config");
+ const embedding_service_1 = require("./embedding-service");
+ const path = __importStar(require("path"));
+ const fs = __importStar(require("fs"));
+ // Test data - various scenarios
+ const TEST_QUERIES = [
+ "What is machine learning?",
+ "How do neural networks work?",
+ "Explain quantum computing",
+ "What are the benefits of TypeScript?",
+ "How to optimize database queries?",
+ "Best practices for API design",
+ "Understanding distributed systems",
+ "Introduction to graph databases",
+ "Microservices architecture patterns",
+ "Cloud computing fundamentals"
+ ];
+ const TEST_DOCUMENTS = [
+ "Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed.",
+ "Neural networks are computing systems inspired by biological neural networks that constitute animal brains. They consist of interconnected nodes or neurons.",
+ "Quantum computing uses quantum-mechanical phenomena such as superposition and entanglement to perform operations on data.",
+ "TypeScript is a strongly typed programming language that builds on JavaScript, giving you better tooling at any scale.",
+ "Database query optimization involves analyzing and improving query performance through indexing, query rewriting, and execution plan analysis.",
+ "API design best practices include using RESTful principles, proper versioning, clear documentation, and consistent error handling.",
+ "Distributed systems are computing systems whose components are located on different networked computers, which communicate and coordinate their actions.",
+ "Graph databases use graph structures with nodes, edges, and properties to represent and store data, ideal for connected data.",
+ "Microservices architecture is an approach to developing a single application as a suite of small services, each running in its own process.",
+ "Cloud computing delivers computing services including servers, storage, databases, networking, software, analytics, and intelligence over the Internet."
+ ];
+ // Compute cosine similarity
+ function cosineSimilarity(a, b) {
+ let dotProduct = 0;
+ let normA = 0;
+ let normB = 0;
+ for (let i = 0; i < a.length; i++) {
+ dotProduct += a[i] * b[i];
+ normA += a[i] * a[i];
+ normB += b[i] * b[i];
+ }
+ return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
+ }
+ // Test 1: Embedding speed
+ async function testEmbeddingSpeed(service, modelName) {
+ console.log(`\n${'='.repeat(70)}`);
+ console.log(`TEST 1: Embedding Speed - ${modelName}`);
+ console.log('='.repeat(70));
+ const times = [];
+ // Warmup
+ await service.embed("warmup");
+ // Single embeddings
+ for (const query of TEST_QUERIES) {
+ const start = performance.now();
+ await service.embed(query);
+ const end = performance.now();
+ times.push(end - start);
+ }
+ const avgTime = times.reduce((a, b) => a + b, 0) / times.length;
+ const minTime = Math.min(...times);
+ const maxTime = Math.max(...times);
+ console.log(`\nSingle Embedding Performance:`);
+ console.log(` Average: ${avgTime.toFixed(2)} ms`);
+ console.log(` Min: ${minTime.toFixed(2)} ms`);
+ console.log(` Max: ${maxTime.toFixed(2)} ms`);
+ return { avgTime, minTime, maxTime };
+ }
+ // Test 2: Batch performance
+ async function testBatchPerformance(service, modelName) {
+ console.log(`\n${'='.repeat(70)}`);
+ console.log(`TEST 2: Batch Performance - ${modelName}`);
+ console.log('='.repeat(70));
+ const start = performance.now();
+ await service.embedBatch(TEST_DOCUMENTS);
+ const end = performance.now();
+ const totalTime = end - start;
+ const avgPerDoc = totalTime / TEST_DOCUMENTS.length;
+ console.log(`\nBatch Embedding (${TEST_DOCUMENTS.length} documents):`);
+ console.log(` Total time: ${totalTime.toFixed(2)} ms`);
+ console.log(` Avg per doc: ${avgPerDoc.toFixed(2)} ms`);
+ console.log(` Throughput: ${(1000 / avgPerDoc).toFixed(2)} docs/sec`);
+ return { totalTime, avgPerDoc };
+ }
+ // Test 3: Semantic Similarity Quality
+ async function testSemanticSimilarity(service, modelName) {
+ console.log(`\n${'='.repeat(70)}`);
+ console.log(`TEST 3: Semantic Similarity Quality - ${modelName}`);
+ console.log('='.repeat(70));
+ // Embed queries and documents
+ const queryEmbeddings = await service.embedBatch(TEST_QUERIES);
+ const docEmbeddings = await service.embedBatch(TEST_DOCUMENTS);
+ // Expected matches (query index -> document index)
+ const expectedMatches = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
+ let correctMatches = 0;
+ const similarities = [];
+ console.log(`\nTop Match for each Query:`);
+ for (let i = 0; i < TEST_QUERIES.length; i++) {
+ const queryEmb = queryEmbeddings[i];
+ // Calculate similarities with all documents
+ const sims = docEmbeddings.map((docEmb, idx) => ({
+ idx,
+ similarity: cosineSimilarity(queryEmb, docEmb)
+ }));
+ // Sort by similarity
+ sims.sort((a, b) => b.similarity - a.similarity);
+ const topMatch = sims[0];
+ const isCorrect = topMatch.idx === expectedMatches[i];
+ if (isCorrect)
+ correctMatches++;
+ similarities.push(topMatch.similarity);
+ console.log(` Q${i}: "${TEST_QUERIES[i].substring(0, 40)}..."`);
+ console.log(` → Doc ${topMatch.idx} (sim: ${topMatch.similarity.toFixed(4)}) ${isCorrect ? '✓' : '✗'}`);
+ }
+ const accuracy = (correctMatches / TEST_QUERIES.length) * 100;
+ const avgSimilarity = similarities.reduce((a, b) => a + b, 0) / similarities.length;
+ console.log(`\nResults:`);
+ console.log(` Accuracy: ${accuracy.toFixed(1)}% (${correctMatches}/${TEST_QUERIES.length})`);
+ console.log(` Avg Similarity: ${avgSimilarity.toFixed(4)}`);
+ return { accuracy, avgSimilarity, correctMatches };
+ }
+ // Test 4: Long Context Handling
+ async function testLongContext(service, modelName) {
+ console.log(`\n${'='.repeat(70)}`);
+ console.log(`TEST 4: Long Context Handling - ${modelName}`);
+ console.log('='.repeat(70));
+ const shortText = "Machine learning is AI.";
+ const mediumText = TEST_DOCUMENTS[0]; // ~150 chars
+ const longText = TEST_DOCUMENTS.join(" "); // ~1000 chars
+ const veryLongText = longText.repeat(5); // ~5000 chars
+ const tests = [
+ { name: "Short (~20 chars)", text: shortText },
+ { name: "Medium (~150 chars)", text: mediumText },
+ { name: "Long (~1000 chars)", text: longText },
+ { name: "Very Long (~5000 chars)", text: veryLongText }
+ ];
+ console.log(`\nContext Length Performance:`);
+ const results = [];
+ for (const test of tests) {
+ const start = performance.now();
+ await service.embed(test.text);
+ const end = performance.now();
+ const time = end - start;
+ console.log(` ${test.name.padEnd(25)} ${time.toFixed(2)} ms`);
+ results.push({ name: test.name, time });
+ }
+ return results;
+ }
+ // Test 5: Cache Performance
+ async function testCachePerformance(service, modelName) {
+ console.log(`\n${'='.repeat(70)}`);
+ console.log(`TEST 5: Cache Performance - ${modelName}`);
+ console.log('='.repeat(70));
+ const testText = "Test cache performance";
+ // First call (cold)
+ const start1 = performance.now();
+ await service.embed(testText);
+ const end1 = performance.now();
+ const coldTime = end1 - start1;
+ // Second call (cached)
+ const start2 = performance.now();
+ await service.embed(testText);
+ const end2 = performance.now();
+ const cachedTime = end2 - start2;
+ const speedup = coldTime / cachedTime;
+ console.log(`\nCache Hit Performance:`);
+ console.log(` Cold (first call): ${coldTime.toFixed(2)} ms`);
+ console.log(` Cached (second): ${cachedTime.toFixed(2)} ms`);
+ console.log(` Speedup: ${speedup.toFixed(1)}x faster`);
+ const stats = service.getCacheStats();
+ console.log(`\nCache Statistics:`);
+ console.log(` Size: ${stats.size}/${stats.maxSize}`);
+ console.log(` Model: ${stats.model}`);
+ console.log(` Dims: ${stats.dimensions}`);
+ return { coldTime, cachedTime, speedup };
+ }
+ // Test 6: Memory Usage
+ async function testMemoryUsage(service, modelName) {
+ console.log(`\n${'='.repeat(70)}`);
+ console.log(`TEST 6: Memory Usage - ${modelName}`);
+ console.log('='.repeat(70));
+ const memBefore = process.memoryUsage();
+ // Embed a batch
+ await service.embedBatch(TEST_DOCUMENTS);
+ const memAfter = process.memoryUsage();
+ const heapUsedMB = (memAfter.heapUsed - memBefore.heapUsed) / 1024 / 1024;
+ const rssMB = (memAfter.rss - memBefore.rss) / 1024 / 1024;
+ console.log(`\nMemory Usage (after batch embedding):`);
+ console.log(` Heap Used: ${heapUsedMB.toFixed(2)} MB`);
+ console.log(` RSS: ${rssMB.toFixed(2)} MB`);
+ console.log(` Total Heap: ${(memAfter.heapTotal / 1024 / 1024).toFixed(2)} MB`);
+ return { heapUsedMB, rssMB };
+ }
+ // Main comparison function
+ async function runSingleModelTest(modelId) {
+ console.log('\n' + '█'.repeat(70));
+ console.log(`TESTING MODEL: ${modelId}`);
+ console.log('█'.repeat(70));
+ const service = new embedding_service_1.EmbeddingService();
+ const results = {
+ model: modelId,
+ timestamp: new Date().toISOString()
+ };
+ // Run tests
+ results.speed = await testEmbeddingSpeed(service, modelId);
+ results.batch = await testBatchPerformance(service, modelId);
+ results.similarity = await testSemanticSimilarity(service, modelId);
+ results.longContext = await testLongContext(service, modelId);
+ results.cache = await testCachePerformance(service, modelId);
+ results.memory = await testMemoryUsage(service, modelId);
+ // Save results
+ const resultsPath = path.join(__dirname, '..', `embedding-results-${modelId.replace(/\//g, '-')}.json`);
+ fs.writeFileSync(resultsPath, JSON.stringify(results, null, 2));
+ console.log(`\n✓ Results saved to: ${resultsPath}`);
+ return results;
+ }
+ async function compareResults() {
+ console.log('\n' + '█'.repeat(70));
+ console.log('LOADING AND COMPARING RESULTS');
+ console.log('█'.repeat(70));
+ const bgeFile = path.join(__dirname, '..', 'embedding-results-Xenova-bge-m3.json');
+ const pplxFile = path.join(__dirname, '..', 'embedding-results-perplexity-ai-pplx-embed-v1-0.6b.json');
+ if (!fs.existsSync(bgeFile)) {
+ console.error('\n✗ BGE-M3 results not found!');
+ console.log('Please run: EMBEDDING_MODEL=Xenova/bge-m3 npm run compare-embeddings');
+ return;
+ }
+ if (!fs.existsSync(pplxFile)) {
+ console.error('\n✗ pplx-embed results not found!');
+ console.log('Please run: EMBEDDING_MODEL=perplexity-ai/pplx-embed-v1-0.6b npm run compare-embeddings');
+ return;
+ }
+ const bge = JSON.parse(fs.readFileSync(bgeFile, 'utf-8'));
+ const pplx = JSON.parse(fs.readFileSync(pplxFile, 'utf-8'));
+ // Print comparison summary
+ console.log(`\n\n${'█'.repeat(70)}`);
+ console.log('COMPARISON SUMMARY');
+ console.log('█'.repeat(70));
+ console.log(`\n1. EMBEDDING SPEED (lower is better)`);
+ console.log(` BGE-M3: ${bge.speed.avgTime.toFixed(2)} ms`);
+ console.log(` pplx-embed: ${pplx.speed.avgTime.toFixed(2)} ms`);
+ const speedDiff = ((pplx.speed.avgTime - bge.speed.avgTime) / bge.speed.avgTime * 100);
+ console.log(` Winner: ${bge.speed.avgTime < pplx.speed.avgTime ? 'BGE-M3' : 'pplx-embed'} (${Math.abs(speedDiff).toFixed(1)}% ${speedDiff > 0 ? 'slower' : 'faster'})`);
+ console.log(`\n2. BATCH THROUGHPUT (higher is better)`);
+ const bgeThroughput = 1000 / bge.batch.avgPerDoc;
+ const pplxThroughput = 1000 / pplx.batch.avgPerDoc;
+ console.log(` BGE-M3: ${bgeThroughput.toFixed(2)} docs/sec`);
+ console.log(` pplx-embed: ${pplxThroughput.toFixed(2)} docs/sec`);
+ console.log(` Winner: ${bgeThroughput > pplxThroughput ? 'BGE-M3' : 'pplx-embed'}`);
+ console.log(`\n3. SEMANTIC SIMILARITY ACCURACY (higher is better)`);
+ console.log(` BGE-M3: ${bge.similarity.accuracy.toFixed(1)}% (${bge.similarity.correctMatches}/${TEST_QUERIES.length})`);
+ console.log(` pplx-embed: ${pplx.similarity.accuracy.toFixed(1)}% (${pplx.similarity.correctMatches}/${TEST_QUERIES.length})`);
+ console.log(` Winner: ${bge.similarity.accuracy > pplx.similarity.accuracy ? 'BGE-M3' : 'pplx-embed'} ${pplx.similarity.accuracy > bge.similarity.accuracy ? '🏆' : ''}`);
+ console.log(`\n4. AVERAGE SIMILARITY SCORE (higher is better)`);
+ console.log(` BGE-M3: ${bge.similarity.avgSimilarity.toFixed(4)}`);
+ console.log(` pplx-embed: ${pplx.similarity.avgSimilarity.toFixed(4)}`);
+ const simDiff = ((pplx.similarity.avgSimilarity - bge.similarity.avgSimilarity) / bge.similarity.avgSimilarity * 100);
+ console.log(` Winner: ${bge.similarity.avgSimilarity > pplx.similarity.avgSimilarity ? 'BGE-M3' : 'pplx-embed'} (${Math.abs(simDiff).toFixed(1)}% ${simDiff > 0 ? 'higher' : 'lower'})`);
+ console.log(`\n5. CACHE SPEEDUP (higher is better)`);
+ console.log(` BGE-M3: ${bge.cache.speedup.toFixed(1)}x`);
+ console.log(` pplx-embed: ${pplx.cache.speedup.toFixed(1)}x`);
+ console.log(`\n6. MEMORY USAGE (lower is better)`);
+ console.log(` BGE-M3: ${bge.memory.heapUsedMB.toFixed(2)} MB heap`);
+ console.log(` pplx-embed: ${pplx.memory.heapUsedMB.toFixed(2)} MB heap`);
+ console.log(` Winner: ${bge.memory.heapUsedMB < pplx.memory.heapUsedMB ? 'BGE-M3' : 'pplx-embed'}`);
+ // Score calculation
+ let bgeScore = 0;
+ let pplxScore = 0;
+ if (bge.speed.avgTime < pplx.speed.avgTime)
+ bgeScore++;
+ else
+ pplxScore++;
+ if (bgeThroughput > pplxThroughput)
+ bgeScore++;
+ else
+ pplxScore++;
+ if (bge.similarity.accuracy > pplx.similarity.accuracy)
+ bgeScore++;
+ else
+ pplxScore++;
+ if (bge.similarity.avgSimilarity > pplx.similarity.avgSimilarity)
+ bgeScore++;
+ else
+ pplxScore++;
+ if (bge.memory.heapUsedMB < pplx.memory.heapUsedMB)
+ bgeScore++;
+ else
+ pplxScore++;
+ // Overall winner
+ console.log(`\n${'='.repeat(70)}`);
+ console.log('OVERALL SCORE');
+ console.log('='.repeat(70));
+ console.log(`\n BGE-M3: ${bgeScore}/5 wins`);
+ console.log(` pplx-embed: ${pplxScore}/5 wins`);
+ console.log(`\n${'='.repeat(70)}`);
+ console.log('RECOMMENDATION');
+ console.log('='.repeat(70));
+ if (pplxScore > bgeScore) {
+ console.log(`\n✓ pplx-embed-v1-0.6b is RECOMMENDED 🏆`);
+ console.log(` Reasons:`);
+ if (pplx.similarity.accuracy > bge.similarity.accuracy) {
+ console.log(` ✓ Better semantic similarity accuracy (+${(pplx.similarity.accuracy - bge.similarity.accuracy).toFixed(1)}%)`);
+ }
+ if (pplx.similarity.avgSimilarity > bge.similarity.avgSimilarity) {
+ console.log(` ✓ Higher quality embeddings (+${(simDiff).toFixed(1)}%)`);
+ }
+ console.log(` ✓ 32K context length (vs 8K for BGE-M3)`);
+ console.log(` ✓ Better MTEB benchmark scores`);
+ }
+ else if (bgeScore > pplxScore) {
+ console.log(`\n✓ BGE-M3 is RECOMMENDED 🏆`);
+ console.log(` Reasons:`);
+ if (bge.speed.avgTime < pplx.speed.avgTime) {
+ console.log(` ✓ Faster embedding speed (-${Math.abs(speedDiff).toFixed(1)}%)`);
+ }
+ if (bge.memory.heapUsedMB < pplx.memory.heapUsedMB) {
+ console.log(` ✓ Lower memory usage`);
+ }
+ console.log(` ✓ Automatic download (no manual setup)`);
+ console.log(` ✓ Proven stability`);
+ }
+ else {
+ console.log(`\n⚖ BOTH MODELS ARE EQUALLY COMPETITIVE`);
+ console.log(` Choose based on your priorities:`);
+ console.log(` - pplx-embed: Better quality, longer context (32K)`);
+ console.log(` - BGE-M3: Faster, easier setup, automatic download`);
+ }
+ console.log('\n' + '█'.repeat(70));
+ console.log('COMPARISON COMPLETE');
+ console.log('█'.repeat(70) + '\n');
+ }
+ // Main entry point
+ async function main() {
+ const currentModel = process.env.EMBEDDING_MODEL || "Xenova/bge-m3";
+ // Check if we should compare existing results
+ const args = process.argv.slice(2);
+ if (args.includes('--compare')) {
+ await compareResults();
+ return;
+ }
+ console.log('\n' + '█'.repeat(70));
+ console.log('EMBEDDING MODEL BENCHMARK');
+ console.log('█'.repeat(70));
+ console.log(`\nCurrent model: ${currentModel}`);
+ console.log('\nThis will run a comprehensive test suite including:');
+ console.log(' 1. Embedding speed');
+ console.log(' 2. Batch performance');
+ console.log(' 3. Semantic similarity quality');
+ console.log(' 4. Long context handling');
+ console.log(' 5. Cache performance');
+ console.log(' 6. Memory usage');
+ console.log('\nEstimated time: 2-3 minutes\n');
+ await runSingleModelTest(currentModel);
+ console.log('\n' + '='.repeat(70));
+ console.log('NEXT STEPS');
+ console.log('='.repeat(70));
+ if (currentModel === 'Xenova/bge-m3') {
+ console.log('\nTo test pplx-embed, run:');
+ console.log(' EMBEDDING_MODEL=perplexity-ai/pplx-embed-v1-0.6b npm run compare-embeddings');
+ }
+ else if (currentModel === 'perplexity-ai/pplx-embed-v1-0.6b') {
+ console.log('\nTo test BGE-M3, run:');
+ console.log(' EMBEDDING_MODEL=Xenova/bge-m3 npm run compare-embeddings');
+ }
+ console.log('\nTo compare both results, run:');
+ console.log(' npm run compare-embeddings -- --compare');
+ console.log();
+ }
+ // Run
+ main().catch(console.error);
@@ -0,0 +1,151 @@
+ "use strict";
+ var __createBinding = (this && this.__createBinding) || (Object.create ? (function(o, m, k, k2) {
+ if (k2 === undefined) k2 = k;
+ var desc = Object.getOwnPropertyDescriptor(m, k);
+ if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
+ desc = { enumerable: true, get: function() { return m[k]; } };
+ }
+ Object.defineProperty(o, k2, desc);
+ }) : (function(o, m, k, k2) {
+ if (k2 === undefined) k2 = k;
+ o[k2] = m[k];
+ }));
+ var __setModuleDefault = (this && this.__setModuleDefault) || (Object.create ? (function(o, v) {
+ Object.defineProperty(o, "default", { enumerable: true, value: v });
+ }) : function(o, v) {
+ o["default"] = v;
+ });
+ var __importStar = (this && this.__importStar) || (function () {
+ var ownKeys = function(o) {
+ ownKeys = Object.getOwnPropertyNames || function (o) {
+ var ar = [];
+ for (var k in o) if (Object.prototype.hasOwnProperty.call(o, k)) ar[ar.length] = k;
+ return ar;
+ };
+ return ownKeys(o);
+ };
+ return function (mod) {
+ if (mod && mod.__esModule) return mod;
+ var result = {};
+ if (mod != null) for (var k = ownKeys(mod), i = 0; i < k.length; i++) if (k[i] !== "default") __createBinding(result, mod, k[i]);
+ __setModuleDefault(result, mod);
+ return result;
+ };
+ })();
+ Object.defineProperty(exports, "__esModule", { value: true });
+ require("dotenv/config");
+ const https = __importStar(require("https"));
+ const fs = __importStar(require("fs"));
+ const path = __importStar(require("path"));
+ const transformers_1 = require("@xenova/transformers");
+ // Configure cache path
+ const CACHE_DIR = path.resolve('./.cache');
+ transformers_1.env.cacheDir = CACHE_DIR;
+ const MODEL_ID = "perplexity-ai/pplx-embed-v1-0.6b";
+ const BASE_URL = `https://huggingface.co/${MODEL_ID}/resolve/main/onnx`;
+ // Model variant to download (quantized is recommended for smaller size)
+ const USE_QUANTIZED = true; // Set to false for FP32 full precision
+ // Files to download based on variant
+ const FILES = USE_QUANTIZED ? [
+ { name: 'model_quantized.onnx', size: '614 KB' },
+ { name: 'model_quantized.onnx_data', size: '706 MB' }
+ ] : [
+ { name: 'model.onnx', size: '520 KB' },
+ { name: 'model.onnx_data', size: '2.09 GB' },
+ { name: 'model.onnx_data_1', size: '306 MB' }
+ ];
+ // Target directory
+ const targetDir = path.join(CACHE_DIR, 'perplexity-ai', 'pplx-embed-v1-0.6b', 'onnx');
+ function downloadFile(url, dest) {
+ return new Promise((resolve, reject) => {
+ const file = fs.createWriteStream(dest);
+ https.get(url, (response) => {
+ if (response.statusCode === 302 || response.statusCode === 301) {
+ // Follow redirect
+ const redirectUrl = response.headers.location;
+ if (redirectUrl) {
+ file.close();
+ fs.unlinkSync(dest);
+ return downloadFile(redirectUrl, dest).then(resolve).catch(reject);
+ }
+ }
+ const totalSize = parseInt(response.headers['content-length'] || '0', 10);
+ let downloadedSize = 0;
+ response.on('data', (chunk) => {
+ downloadedSize += chunk.length;
+ const progress = ((downloadedSize / totalSize) * 100).toFixed(2);
+ process.stdout.write(`\r Progress: ${progress}% (${(downloadedSize / 1024 / 1024).toFixed(2)} MB / ${(totalSize / 1024 / 1024).toFixed(2)} MB)`);
+ });
+ response.pipe(file);
+ file.on('finish', () => {
+ file.close();
+ console.log('\n ✓ Download complete');
+ resolve();
+ });
+ }).on('error', (err) => {
+ fs.unlinkSync(dest);
+ reject(err);
+ });
+ });
+ }
+ async function downloadPplxEmbed() {
+ console.log('='.repeat(70));
+ console.log('Downloading Perplexity pplx-embed-v1-0.6b ONNX files');
+ console.log(`Variant: ${USE_QUANTIZED ? 'INT8 Quantized (Recommended)' : 'FP32 Full Precision'}`);
+ console.log('='.repeat(70));
+ console.log();
+ // Create target directory
+ if (!fs.existsSync(targetDir)) {
+ fs.mkdirSync(targetDir, { recursive: true });
+ console.log(`✓ Created directory: ${targetDir}`);
+ }
+ console.log();
+ console.log('Files to download:');
+ FILES.forEach(f => console.log(` - ${f.name} (${f.size})`));
+ console.log();
+ console.log(`Total size: ${USE_QUANTIZED ? '~706 MB' : '~2.5 GB'}`);
+ console.log(`This may take ${USE_QUANTIZED ? '3-10' : '10-30'} minutes depending on your internet connection.`);
+ console.log();
+ if (USE_QUANTIZED) {
+ console.log('ℹ Using INT8 quantized model (recommended):');
+ console.log(' ✓ 3x smaller than FP32 (~706 MB vs ~2.4 GB)');
+ console.log(' ✓ Minimal quality loss (~1.5% MTEB drop)');
+ console.log(' ✓ Faster inference');
+ console.log();
+ console.log(' To use FP32 instead, edit src/download-pplx-embed.ts');
+ console.log(' and set USE_QUANTIZED = false');
+ console.log();
+ }
+ // Download each file
+ for (const file of FILES) {
+ const filePath = path.join(targetDir, file.name);
+ // Skip if already exists
+ if (fs.existsSync(filePath)) {
+ console.log(`⊘ Skipping ${file.name} (already exists)`);
+ continue;
+ }
+ console.log(`⬇ Downloading ${file.name} (${file.size})...`);
+ const url = `${BASE_URL}/${file.name}`;
+ try {
+ await downloadFile(url, filePath);
+ }
+ catch (error) {
+ console.error(`✗ Failed to download ${file.name}:`, error.message);
+ console.error(' Please download manually from:');
+ console.error(` ${url}`);
+ process.exit(1);
+ }
+ }
+ console.log();
+ console.log('='.repeat(70));
+ console.log('✓ All files downloaded successfully!');
+ console.log('='.repeat(70));
+ console.log();
+ console.log('You can now use the model by setting in .env:');
+ console.log(' EMBEDDING_MODEL=perplexity-ai/pplx-embed-v1-0.6b');
+ console.log();
+ console.log('Then start the server with:');
+ console.log(' npm run start');
+ console.log();
+ }
+ downloadPplxEmbed().catch(console.error);
@@ -94,9 +94,16 @@ class EmbeddingService {
  tokenizer = null;
  modelId;
  dimensions;
+ useOllama;
+ ollamaModel;
+ ollamaBaseUrl;
  queue = Promise.resolve();
  constructor() {
  this.cache = new LRUCache(1000, 3600000); // 1000 entries, 1h TTL
+ // Check if Ollama should be used
+ this.useOllama = process.env.USE_OLLAMA === 'true';
+ this.ollamaModel = process.env.OLLAMA_EMBEDDING_MODEL || 'argus-ai/pplx-embed-v1-0.6b:q8_0';
+ this.ollamaBaseUrl = process.env.OLLAMA_BASE_URL || 'http://localhost:11434';
  // Support multiple embedding models via environment variable
  this.modelId = process.env.EMBEDDING_MODEL || "Xenova/bge-m3";
  // Set dimensions based on model
@@ -106,9 +113,20 @@ class EmbeddingService {
  "Xenova/bge-small-en-v1.5": 384,
  "Xenova/nomic-embed-text-v1": 768,
  "onnx-community/Qwen3-Embedding-0.6B-ONNX": 1024,
+ // Note: perplexity-ai models require manual ONNX file placement
+ // See PPLX_EMBED_INTEGRATION.md for instructions
+ "perplexity-ai/pplx-embed-v1-0.6b": 1024,
+ "perplexity-ai/pplx-embed-v1-4b": 2560,
+ // Ollama models
+ "argus-ai/pplx-embed-v1-0.6b:q8_0": 1024,
  };
- this.dimensions = dimensionMap[this.modelId] || 1024;
- console.error(`[EmbeddingService] Using model: ${this.modelId} (${this.dimensions} dimensions)`);
+ this.dimensions = dimensionMap[this.useOllama ? this.ollamaModel : this.modelId] || 1024;
+ if (this.useOllama) {
+ console.error(`[EmbeddingService] Using Ollama: ${this.ollamaModel} @ ${this.ollamaBaseUrl} (${this.dimensions} dimensions)`);
+ }
+ else {
+ console.error(`[EmbeddingService] Using ONNX model: ${this.modelId} (${this.dimensions} dimensions)`);
+ }
  }
  // Public getter for dimensions
  getDimensions() {
@@ -125,6 +143,11 @@ class EmbeddingService {
  async init() {
  if (this.session && this.tokenizer)
  return;
+ // Skip ONNX initialization if using Ollama
+ if (this.useOllama) {
+ console.error('[EmbeddingService] Using Ollama backend, skipping ONNX initialization');
+ return;
+ }
  try {
  // 1. Check if model needs to be downloaded
  // Extract namespace and model name from modelId (e.g., "Xenova/bge-m3" or "onnx-community/Qwen3-Embedding-0.6B-ONNX")
@@ -139,10 +162,23 @@ class EmbeddingService {
  if (!fs.existsSync(fp32Path) && !fs.existsSync(quantizedPath)) {
  console.log(`[EmbeddingService] Model not found, downloading ${this.modelId}...`);
  console.log(`[EmbeddingService] This may take a few minutes on first run.`);
- // Import AutoModel dynamically to trigger download
- const { AutoModel } = await import("@xenova/transformers");
- await AutoModel.from_pretrained(this.modelId, { quantized: false });
- console.log(`[EmbeddingService] Model download completed.`);
+ // Check if this is a Xenova-compatible model
+ if (namespace === 'Xenova' || namespace === 'onnx-community') {
+ // Import AutoModel dynamically to trigger download
+ const { AutoModel } = await import("@xenova/transformers");
+ await AutoModel.from_pretrained(this.modelId, { quantized: false });
+ console.log(`[EmbeddingService] Model download completed.`);
+ }
+ else {
+ // For non-Xenova models (like perplexity-ai), provide manual download instructions
+ console.error(`[EmbeddingService] ERROR: Model ${this.modelId} is not available via @xenova/transformers`);
+ console.error(`[EmbeddingService] Please download the model manually:`);
+ console.error(`[EmbeddingService] 1. Visit: https://huggingface.co/${this.modelId}`);
+ console.error(`[EmbeddingService] 2. Download the 'onnx' folder contents`);
+ console.error(`[EmbeddingService] 3. Place files in: ${baseDir}`);
+ console.error(`[EmbeddingService] See PPLX_EMBED_INTEGRATION.md for detailed instructions`);
+ throw new Error(`Model ${this.modelId} requires manual download. See error messages above.`);
+ }
  }
  // 2. Load Tokenizer
  if (!this.tokenizer) {
@@ -188,6 +224,10 @@ class EmbeddingService {
  return cached;
  }
  try {
+ // Use Ollama if enabled
+ if (this.useOllama) {
+ return await this.embedWithOllama(textStr);
+ }
  await this.init();
  if (!this.session || !this.tokenizer)
  throw new Error("Session/Tokenizer not initialized");
@@ -240,6 +280,37 @@ class EmbeddingService {
  }
  });
  }
+ async embedWithOllama(text) {
+ try {
+ const response = await fetch(`${this.ollamaBaseUrl}/api/embeddings`, {
+ method: 'POST',
+ headers: {
+ 'Content-Type': 'application/json',
+ },
+ body: JSON.stringify({
+ model: this.ollamaModel,
+ prompt: text,
+ }),
+ });
+ if (!response.ok) {
+ throw new Error(`Ollama API error: ${response.status} ${response.statusText}`);
+ }
+ const data = await response.json();
+ if (!data.embedding || !Array.isArray(data.embedding)) {
+ throw new Error('Invalid response from Ollama API');
+ }
+ const embedding = data.embedding;
+ // Normalize the embedding
+ const normalized = this.normalize(embedding);
+ // Cache it
+ this.cache.set(text, normalized);
+ return normalized;
+ }
+ catch (error) {
+ console.error(`[EmbeddingService] Ollama error for "${text.substring(0, 20)}...":`, error?.message || error);
+ return new Array(this.dimensions).fill(0);
+ }
+ }
  // Batch-Embeddings
  async embedBatch(texts) {
  // For now, process sequentially via serialized queue to avoid overloading
@@ -306,7 +377,8 @@ class EmbeddingService {
  return {
  size: this.cache.size(),
  maxSize: 1000,
- model: this.modelId,
+ model: this.useOllama ? this.ollamaModel : this.modelId,
+ backend: this.useOllama ? 'ollama' : 'onnx',
  dimensions: this.dimensions
  };
  }
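The `embedWithOllama` hunk above normalizes the raw Ollama embedding before caching it. The `this.normalize` helper itself is not part of this diff; a minimal standalone sketch, assuming standard L2 normalization (the function name and the zero-vector fallback here are assumptions, not the package's code):

```javascript
// L2-normalize an embedding vector so cosine similarity reduces to a dot product.
// Sketch of what the diff's `this.normalize(embedding)` plausibly does (assumption).
function l2Normalize(vec) {
  const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
  if (norm === 0) return vec.slice(); // keep zero vectors as-is, matching the error fallback above
  return vec.map(v => v / norm);
}

console.log(l2Normalize([3, 4])); // → [0.6, 0.8]
```

Normalizing at ingest time keeps stored vectors comparable regardless of which backend (ONNX or Ollama) produced them.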
@@ -88,10 +88,14 @@ async function runEvaluation() {
  }
  const server = new index_1.MemoryServer(EVAL_DB_PATH);
  await server.embeddingService.embed("warmup");
+ // Warmup reranker
+ await server.hybridSearch.advancedSearch({ query: "warmup", limit: 1, rerank: true });
  await setupEvalData(server);
  const methods = [
  { name: "Hybrid Search", func: (q) => server.hybridSearch.search({ query: q, limit: 10 }) },
+ { name: "Reranked Search", func: (q) => server.hybridSearch.search({ query: q, limit: 10, rerank: true }) },
  { name: "Graph-RAG", func: (q) => server.hybridSearch.graphRag({ query: q, limit: 10, graphConstraints: { maxDepth: 2 } }) },
+ { name: "Graph-RAG (Reranked)", func: (q) => server.hybridSearch.graphRag({ query: q, limit: 10, graphConstraints: { maxDepth: 2 }, rerank: true }) },
  { name: "Graph-Walking", func: (q) => server.graph_walking({ query: q, limit: 10, max_depth: 3 }) }
  ];
  const summary = [];
@@ -101,6 +105,9 @@ async function runEvaluation() {
  let totalRecall10 = 0;
  let totalMRR = 0;
  let totalLatency = 0;
+ const n = EVAL_DATASET.length;
+ // Reset cache between methods to get accurate latency
+ await server.hybridSearch.clearCache();
  for (const task of EVAL_DATASET) {
  const t0 = perf_hooks_1.performance.now();
  const results = await method.func(task.query);
@@ -113,7 +120,6 @@ async function runEvaluation() {
  totalMRR += mrr;
  totalLatency += (t1 - t0);
  }
- const n = EVAL_DATASET.length;
  summary.push({
  Method: method.name,
  "Recall@3": (totalRecall3 / n).toFixed(3),
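The summary divides accumulated recall and MRR by `n = EVAL_DATASET.length`. For reference, a self-contained sketch of the standard per-query Recall@K and reciprocal-rank definitions (these are the textbook formulas, not necessarily the package's exact helpers):

```javascript
// Recall@K: fraction of relevant ids that appear in the top-K ranked results.
function recallAtK(rankedIds, relevantIds, k) {
  const topK = new Set(rankedIds.slice(0, k));
  const hits = relevantIds.filter(id => topK.has(id)).length;
  return relevantIds.length === 0 ? 0 : hits / relevantIds.length;
}

// Reciprocal rank: 1 / rank of the first relevant result, 0 if none is found.
// Averaging this over all queries yields MRR.
function reciprocalRank(rankedIds, relevantIds) {
  const rel = new Set(relevantIds);
  const idx = rankedIds.findIndex(id => rel.has(id));
  return idx === -1 ? 0 : 1 / (idx + 1);
}

console.log(recallAtK(['a', 'b', 'c'], ['b', 'z'], 3)); // → 0.5
console.log(reciprocalRank(['a', 'b', 'c'], ['b', 'z'])); // → 0.5
```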
@@ -5,15 +5,18 @@ var __importDefault = (this && this.__importDefault) || function (mod) {
  Object.defineProperty(exports, "__esModule", { value: true });
  exports.HybridSearch = void 0;
  const crypto_1 = __importDefault(require("crypto"));
+ const reranker_service_1 = require("./reranker-service");
  const SEMANTIC_CACHE_THRESHOLD = 0.95;
  class HybridSearch {
  db;
  embeddingService;
+ rerankerService;
  searchCache = new Map();
  CACHE_TTL = 300000; // 5 minutes cache
  constructor(db, embeddingService) {
  this.db = db;
  this.embeddingService = embeddingService;
+ this.rerankerService = new reranker_service_1.RerankerService();
  }
  getCacheKey(options) {
  const str = JSON.stringify({
@@ -75,6 +78,36 @@ class HybridSearch {
  return { ...r, score };
  });
  }
+ async applyReranking(query, results) {
+ if (results.length <= 1)
+ return results;
+ console.error(`[HybridSearch] Reranking ${results.length} candidates...`);
+ const documents = results.map(r => {
+ const parts = [
+ r.name ? `Name: ${r.name}` : '',
+ r.type ? `Type: ${r.type}` : '',
+ r.text ? `Description: ${r.text}` : '',
+ r.metadata ? `Details: ${JSON.stringify(r.metadata)}` : ''
+ ].filter(p => p !== '');
+ return parts.join(' | ');
+ });
+ try {
+ const rerankedOrder = await this.rerankerService.rerank(query, documents);
+ return rerankedOrder.map((item, i) => {
+ const original = results[item.index];
+ return {
+ ...original,
+ score: (item.score + 1.0) / 2.0, // Normalize to 0-1 range if it's logits, or just use as is
+ explanation: (typeof original.explanation === 'string' ? original.explanation : JSON.stringify(original.explanation)) +
+ ` | Reranked (Rank ${i + 1}, Cross-Encoder Score: ${item.score.toFixed(4)})`
+ };
+ });
+ }
+ catch (e) {
+ console.error(`[HybridSearch] Reranking failed, returning original results:`, e);
+ return results;
+ }
+ }
  async advancedSearch(options) {
  console.error("[HybridSearch] Starting advancedSearch with options:", JSON.stringify(options, null, 2));
  const { query, limit = 10, filters, graphConstraints, vectorParams } = options;
@@ -212,6 +245,12 @@ class HybridSearch {
  });
  }
  const finalResults = this.applyTimeDecay(searchResults);
+ // Phase 3: Reranking
+ if (options.rerank) {
+ const rerankedResults = await this.applyReranking(options.query, finalResults);
+ await this.updateCache(options, queryEmbedding, rerankedResults);
+ return rerankedResults;
+ }
  await this.updateCache(options, queryEmbedding, finalResults);
  return finalResults;
  }
@@ -330,7 +369,11 @@ class HybridSearch {
  return Object.entries(filters.metadata).every(([key, val]) => r.metadata[key] === val);
  });
  }
- return this.applyTimeDecay(searchResults);
+ const decayedResults = this.applyTimeDecay(searchResults);
+ if (options.rerank) {
+ return await this.applyReranking(options.query, decayedResults);
+ }
+ return decayedResults;
  }
  catch (e) {
  console.error("[HybridSearch] Error in graphRag:", e.message);
@@ -414,7 +457,8 @@ No markdown, no explanation. Just the JSON.`;
  filters: {
  ...options.filters,
  entityTypes: ["CommunitySummary"]
- }
+ },
+ rerank: options.rerank
  });
  // If no community summaries found, fallback to standard search
  if (results.length === 0) {
@@ -437,5 +481,15 @@ No markdown, no explanation. Just the JSON.`;
  }
  }));
  }
+ async clearCache() {
+ this.searchCache.clear();
+ try {
+ await this.db.run(`{ ?[query_hash] := *search_cache{query_hash} :rm search_cache {query_hash} }`);
+ console.error("[HybridSearch] Cache cleared successfully.");
+ }
+ catch (e) {
+ console.error("[HybridSearch] Error clearing cache:", e);
+ }
+ }
  }
  exports.HybridSearch = HybridSearch;
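The candidate documents fed to the cross-encoder in `applyReranking` are flat strings assembled from each result's fields. A standalone sketch of that assembly step (field names taken from the diff; any result shape beyond `name`/`type`/`text`/`metadata` is an assumption):

```javascript
// Build the "Name: … | Type: … | Description: …" string that the reranker scores,
// mirroring the document construction in applyReranking above.
function toRerankDocument(r) {
  const parts = [
    r.name ? `Name: ${r.name}` : '',
    r.type ? `Type: ${r.type}` : '',
    r.text ? `Description: ${r.text}` : '',
    r.metadata ? `Details: ${JSON.stringify(r.metadata)}` : ''
  ].filter(p => p !== '');
  return parts.join(' | ');
}

console.log(toRerankDocument({ name: 'Ada', type: 'Person', text: 'Mathematician' }));
// → Name: Ada | Type: Person | Description: Mathematician
```

Collapsing structured fields into one labeled string lets a single query-document cross-encoder pass see entity names, types, and descriptions at once.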
package/dist/index.js CHANGED
@@ -2631,6 +2631,7 @@ Validation: Invalid syntax or missing columns in inference rules will result in
  entity_types: zod_1.z.array(zod_1.z.string()).optional().describe("Filter by entity types"),
  include_entities: zod_1.z.boolean().optional().default(true).describe("Include entities in search"),
  include_observations: zod_1.z.boolean().optional().default(true).describe("Include observations in search"),
+ rerank: zod_1.z.boolean().optional().default(false).describe("Use Cross-Encoder reranking for higher precision"),
  }),
  zod_1.z.object({
  action: zod_1.z.literal("advancedSearch"),
@@ -2658,6 +2659,7 @@ Validation: Invalid syntax or missing columns in inference rules will result in
  vectorParams: zod_1.z.object({
  efSearch: zod_1.z.number().optional().describe("HNSW search precision"),
  }).optional().describe("Vector parameters"),
+ rerank: zod_1.z.boolean().optional().default(false).describe("Use Cross-Encoder reranking for higher precision"),
  }),
  zod_1.z.object({
  action: zod_1.z.literal("context"),
@@ -2679,6 +2681,7 @@ Validation: Invalid syntax or missing columns in inference rules will result in
  query: zod_1.z.string().describe("Search query for initial vector seeds"),
  max_depth: zod_1.z.number().min(1).max(3).optional().default(2).describe("Maximum depth of graph expansion (Default: 2)"),
  limit: zod_1.z.number().optional().default(10).describe("Number of initial vector seeds"),
+ rerank: zod_1.z.boolean().optional().default(false).describe("Use Cross-Encoder reranking for higher precision"),
  }),
  zod_1.z.object({
  action: zod_1.z.literal("graph_walking"),
@@ -2691,6 +2694,7 @@ Validation: Invalid syntax or missing columns in inference rules will result in
  action: zod_1.z.literal("agentic_search"),
  query: zod_1.z.string().describe("Context query for agentic routing"),
  limit: zod_1.z.number().optional().default(10).describe("Maximum number of results"),
+ rerank: zod_1.z.boolean().optional().default(false).describe("Use Cross-Encoder reranking for higher precision"),
  }),
  ]);
  const QueryMemoryParameters = zod_1.z.object({
@@ -2711,6 +2715,7 @@ Validation: Invalid syntax or missing columns in inference rules will result in
  as_of: zod_1.z.string().optional().describe("Only for entity_details: ISO string or 'NOW'"),
  max_depth: zod_1.z.number().optional().describe("Only for graph_rag/graph_walking: Maximum expansion depth"),
  start_entity_id: zod_1.z.string().optional().describe("Only for graph_walking: Start entity"),
+ rerank: zod_1.z.boolean().optional().describe("Only for search/advancedSearch/agentic_search: Enable Cross-Encoder reranking"),
  });
  this.mcp.addTool({
  name: "query_memory",
@@ -2745,6 +2750,7 @@ Notes: 'agentic_search' is the most powerful and adaptable, 'context' is ideal f
  entityTypes: input.entity_types,
  includeEntities: input.include_entities,
  includeObservations: input.include_observations,
+ rerank: input.rerank,
  });
  const conflictEntityIds = Array.from(new Set(results
  .map((r) => (r.name ? r.id : r.entity_id))
@@ -2776,6 +2782,7 @@ Notes: 'agentic_search' is the most powerful and adaptable, 'context' is ideal f
  filters: input.filters,
  graphConstraints: input.graphConstraints,
  vectorParams: input.vectorParams,
+ rerank: input.rerank,
  });
  const conflictEntityIds = Array.from(new Set(results
  .map((r) => (r.name ? r.id : r.entity_id))
@@ -2888,6 +2895,7 @@ Notes: 'agentic_search' is the most powerful and adaptable, 'context' is ideal f
  const results = await this.hybridSearch.agenticRetrieve({
  query: input.query,
  limit: input.limit,
+ rerank: input.rerank,
  });
  return JSON.stringify(results);
  }
@@ -2900,7 +2908,8 @@ Notes: 'agentic_search' is the most powerful and adaptable, 'context' is ideal f
  limit: input.limit,
  graphConstraints: {
  maxDepth: input.max_depth
- }
+ },
+ rerank: input.rerank,
  });
  return JSON.stringify(results);
  }
@@ -0,0 +1,125 @@
+ "use strict";
+ var __createBinding = (this && this.__createBinding) || (Object.create ? (function(o, m, k, k2) {
+ if (k2 === undefined) k2 = k;
+ var desc = Object.getOwnPropertyDescriptor(m, k);
+ if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
+ desc = { enumerable: true, get: function() { return m[k]; } };
+ }
+ Object.defineProperty(o, k2, desc);
+ }) : (function(o, m, k, k2) {
+ if (k2 === undefined) k2 = k;
+ o[k2] = m[k];
+ }));
+ var __setModuleDefault = (this && this.__setModuleDefault) || (Object.create ? (function(o, v) {
+ Object.defineProperty(o, "default", { enumerable: true, value: v });
+ }) : function(o, v) {
+ o["default"] = v;
+ });
+ var __importStar = (this && this.__importStar) || (function () {
+ var ownKeys = function(o) {
+ ownKeys = Object.getOwnPropertyNames || function (o) {
+ var ar = [];
+ for (var k in o) if (Object.prototype.hasOwnProperty.call(o, k)) ar[ar.length] = k;
+ return ar;
+ };
+ return ownKeys(o);
+ };
+ return function (mod) {
+ if (mod && mod.__esModule) return mod;
+ var result = {};
+ if (mod != null) for (var k = ownKeys(mod), i = 0; i < k.length; i++) if (k[i] !== "default") __createBinding(result, mod, k[i]);
+ __setModuleDefault(result, mod);
+ return result;
+ };
+ })();
+ Object.defineProperty(exports, "__esModule", { value: true });
+ exports.RerankerService = void 0;
+ const transformers_1 = require("@xenova/transformers");
+ const path = __importStar(require("path"));
+ const fs = __importStar(require("fs"));
+ // Robust path to project root
+ const PROJECT_ROOT = path.resolve(__dirname, '..');
+ const CACHE_DIR = path.resolve(PROJECT_ROOT, '.cache');
+ transformers_1.env.cacheDir = CACHE_DIR;
+ transformers_1.env.allowLocalModels = true;
+ class RerankerService {
+ pipe = null;
+ modelId;
+ initialized = false;
+ constructor() {
+ // Using a tiny but effective cross-encoder
+ this.modelId = process.env.RERANKER_MODEL || "Xenova/ms-marco-MiniLM-L-6-v2";
+ console.error(`[RerankerService] Using model: ${this.modelId}`);
+ }
+ async init() {
+ if (this.initialized)
+ return;
+ try {
+ // Check if model exists locally in cache
+ const parts = this.modelId.split('/');
+ const namespace = parts[0];
+ const modelName = parts[1];
+ const modelDir = path.join(CACHE_DIR, namespace, modelName);
+ if (!fs.existsSync(modelDir)) {
+ console.log(`[RerankerService] Model not found, downloading ${this.modelId}...`);
+ }
+ // We use the sequence-classification task for cross-encoders
+ this.pipe = await (0, transformers_1.pipeline)('sequence-classification', this.modelId, {
+ quantized: true,
+ // @ts-ignore
+ progress_callback: (info) => {
+ if (info.status === 'done') {
+ console.error(`[RerankerService] Loaded shard: ${info.file}`);
+ }
+ }
+ });
+ this.initialized = true;
+ console.error(`[RerankerService] Initialization complete.`);
+ }
+ catch (error) {
+ console.error(`[RerankerService] Initialization failed:`, error);
+ throw error;
+ }
+ }
+ /**
+ * Reranks a list of documents based on a query.
+ * @param query The search query
+ * @param documents Array of document strings to rank
+ * @returns Array of { index, score } sorted by score descending
+ */
+ async rerank(query, documents) {
+ if (documents.length === 0)
+ return [];
+ await this.init();
+ try {
+ const results = [];
+ // Cross-encoders take pairs of [query, document]
+ // We can process them in a single batch
+ const inputs = documents.map(doc => [query, doc]);
+ // @ts-ignore
+ const outputs = await this.pipe(inputs, {
+ topk: 1 // We want the score for the "relevant" class (usually index 1 or the only output)
+ });
+ // Handle both array of results and single result (if only 1 doc)
+ const outputArray = Array.isArray(outputs) ? outputs : [outputs];
+ for (let i = 0; i < outputArray.length; i++) {
+ // Cross-encoders for ms-marco typically output a single logit/score or a 2-class distribution
+ // transformers.js sequence-classification returns { label: string, score: number }[]
+ // For ms-marco, label 'LABEL_1' is usually the relevance score
+ const out = outputArray[i];
+ results.push({
+ index: i,
+ score: out.score || 0
+ });
+ }
+ // Sort by score descending
+ return results.sort((a, b) => b.score - a.score);
+ }
+ catch (error) {
+ console.error(`[RerankerService] Reranking failed:`, error);
+ // Fallback: return original order with neutral scores
+ return documents.map((_, i) => ({ index: i, score: 0 }));
+ }
+ }
+ }
+ exports.RerankerService = RerankerService;
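`rerank` returns `{ index, score }` pairs sorted by score descending, which callers then map back onto the original result array. A minimal sketch of consuming that contract (the scores here are made up; `reorderByScores` is an illustrative helper, not part of the package):

```javascript
// Reorder a result array using the { index, score } pairs RerankerService.rerank
// returns, attaching each cross-encoder score to its result.
function reorderByScores(results, scored) {
  // scored is already sorted descending by score, so output order = final ranking
  return scored.map(({ index, score }) => ({ ...results[index], score }));
}

const results = [{ id: 'a' }, { id: 'b' }, { id: 'c' }];
const scored = [{ index: 2, score: 0.9 }, { index: 0, score: 0.4 }, { index: 1, score: 0.1 }];
console.log(reorderByScores(results, scored).map(r => r.id)); // → ['c', 'a', 'b']
```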
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "cozo-memory",
- "version": "1.0.9",
+ "version": "1.1.0",
  "mcpName": "io.github.tobs-code/cozo-memory",
  "description": "Local-first persistent memory system for AI agents with hybrid search, graph reasoning, and MCP integration",
  "main": "dist/index.js",
@@ -38,7 +38,9 @@
  "test": "echo \"Error: no test specified\" && exit 1",
  "benchmark": "ts-node src/benchmark.ts",
  "eval": "ts-node src/eval-suite.ts",
- "download-model": "ts-node src/download-model.ts"
+ "download-model": "ts-node src/download-model.ts",
+ "download-pplx-embed": "ts-node src/download-pplx-embed.ts",
+ "compare-embeddings": "ts-node src/compare-embeddings.ts"
  },
  "keywords": [
  "mcp",
@@ -95,4 +97,4 @@
  "tsx": "^4.21.0",
  "typescript": "^5.9.3"
  }
- }
+ }