cozo-memory 1.0.8 → 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +17 -4
- package/dist/compare-embeddings.js +402 -0
- package/dist/download-pplx-embed.js +151 -0
- package/dist/embedding-service.js +79 -7
- package/dist/eval-suite.js +7 -1
- package/dist/hybrid-search.js +56 -2
- package/dist/index.js +10 -1
- package/dist/reranker-service.js +125 -0
- package/package.json +5 -3
package/README.md
CHANGED
@@ -63,6 +63,8 @@ Now you can add the server to your MCP client (e.g. Claude Desktop).
 
 🧠 **Agentic Retrieval Layer (v2.0)** - Auto-routing engine that analyzes query intent via local LLM to select the optimal search strategy (Vector, Graph, or Community)
 
+🎯 **Tiny Learned Reranker (v2.0)** - Integrated Cross-Encoder model (`ms-marco-MiniLM-L-6-v2`) for ultra-precise re-ranking of top search results
+
 🎯 **Multi-Vector Support (since v1.7)** - Dual embeddings per entity: content-embedding for context, name-embedding for identification
 
 ⚡ **Semantic Caching (since v0.8.5)** - Two-level cache (L1 memory + L2 persistent) with semantic query matching
@@ -191,8 +193,10 @@ This tool compares strategies using a synthetic dataset and measures **Recall@K**
 | Method | Recall@10 | Avg Latency | Best For |
 | :--- | :--- | :--- | :--- |
 | **Graph-RAG** | **1.00** | **~32 ms** | Deep relational reasoning |
+| **Graph-RAG (Reranked)** | **1.00** | **~36 ms** | Maximum precision for relational data |
 | **Graph-Walking** | 1.00 | ~50 ms | Associative path exploration |
 | **Hybrid Search** | 1.00 | ~89 ms | Broad factual retrieval |
+| **Reranked Search** | 1.00 | ~20 ms* | Ultra-precise factual search (warm cache) |
 
 ## Architecture
 
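The reranked rows above are produced by the new `rerank` flag on the read actions. As a hypothetical illustration (not from the package itself), this is roughly what a rerank-enabled `search` request would look like when wrapped in a generic MCP `tools/call` body; only the argument shape (`query`, `limit`, `rerank`) follows the README, and how the server selects the `search` action is not shown in this diff:

```javascript
// Hypothetical invocation sketch: `buildToolCall` is a stand-in helper,
// not part of cozo-memory. Only the argument shape follows the README.
const searchArgs = {
  query: "broad factual retrieval example",
  limit: 10,
  rerank: true, // new in 1.1.0: cross-encoder pass over the top hits
};

function buildToolCall(name, args) {
  // Generic shape of an MCP tools/call request body.
  return { method: "tools/call", params: { name, arguments: args } };
}

const request = buildToolCall("query_memory", searchArgs);
```

The same flag applies unchanged to `advancedSearch`, `graph_rag`, and `agentic_search` arguments.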
@@ -608,14 +612,14 @@ PDF Ingestion via File Path:
 ### query_memory (Read)
 
 Actions:
-- `search`: `{ query, limit?, entity_types?, include_entities?, include_observations? }`
-- `advancedSearch`: `{ query, limit?, filters?, graphConstraints?, vectorOptions? }`
+- `search`: `{ query, limit?, entity_types?, include_entities?, include_observations?, rerank? }`
+- `advancedSearch`: `{ query, limit?, filters?, graphConstraints?, vectorOptions?, rerank? }`
 - `context`: `{ query, context_window?, time_range_hours? }`
 - `entity_details`: `{ entity_id, as_of? }`
 - `history`: `{ entity_id }`
-- `graph_rag`: `{ query, max_depth?, limit?, filters? }` Graph-based reasoning. Finds vector seeds (with inline filtering) first and then expands transitive relationships. Uses recursive Datalog for efficient BFS expansion.
+- `graph_rag`: `{ query, max_depth?, limit?, filters?, rerank? }` Graph-based reasoning. Finds vector seeds (with inline filtering) first and then expands transitive relationships. Uses recursive Datalog for efficient BFS expansion.
 - `graph_walking`: `{ query, start_entity_id?, max_depth?, limit? }` (v1.7) Recursive semantic graph search. Starts at vector seeds or a specific entity and follows relationships to other semantically relevant entities. Ideal for deeper path exploration.
-- `agentic_search`: `{ query, limit? }` **(New v2.0)**: **Auto-Routing Search**. Uses a local LLM (Ollama) to analyze query intent and automatically routes it to the most appropriate strategy (`vector_search`, `graph_walk`, or `community_summary`).
+- `agentic_search`: `{ query, limit?, rerank? }` **(New v2.0)**: **Auto-Routing Search**. Uses a local LLM (Ollama) to analyze query intent and automatically routes it to the most appropriate strategy (`vector_search`, `graph_walk`, or `community_summary`).
 - `get_relation_evolution`: `{ from_id, to_id?, since?, until? }` (in `analyze_graph`) Shows the temporal development of relationships, including a time-range filter and a diff summary.
 
 Important Details:
@@ -885,6 +889,15 @@ Uncertainty/Transparency:
 - Inference candidates are marked as `source: "inference"` and provide a short reason (uncertainty hint) in the result.
 - In `context` output, inferred entities additionally carry an `uncertainty_hint` so an LLM can distinguish "hard fact" vs. "conjecture".
 
+### Tiny Learned Reranker (Cross-Encoder)
+
+For maximum precision, CozoDB Memory integrates a specialized **Cross-Encoder Reranker** (Phase 2 RAG).
+
+- **Model**: `Xenova/ms-marco-MiniLM-L-6-v2` (local ONNX)
+- **Mechanism**: After initial hybrid retrieval, the top candidates (up to 30) are re-evaluated by the cross-encoder. Unlike bi-encoders (vectors), cross-encoders process query and document simultaneously, capturing deep semantic nuances.
+- **Latency**: Minimal overhead (~4-6 ms for the top 10 candidates).
+- **Supported Tools**: Available as a `rerank: true` parameter in `search`, `advancedSearch`, `graph_rag`, and `agentic_search`.
+
 ### Inference Engine
 
 Inference uses multiple strategies (non-persisting):
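The Phase-2 mechanism in the Tiny Learned Reranker section above (re-score the top candidates, then re-sort) can be sketched as follows. This is a runnable sketch, not the package's implementation: `scorePair` stands in for the actual cross-encoder call to `Xenova/ms-marco-MiniLM-L-6-v2` and is stubbed here with simple keyword overlap so the control flow runs without the model.

```javascript
// Stub scorer: in the real package this would be the cross-encoder's
// relevance score for the (query, document) pair.
async function scorePair(query, text) {
  const queryWords = new Set(query.toLowerCase().split(/\W+/));
  return text.toLowerCase().split(/\W+/).filter((w) => queryWords.has(w)).length;
}

// Re-score up to `topK` candidates from the initial retrieval, re-sort them
// by the new score, and keep the remaining tail in its original order.
async function rerank(query, candidates, topK = 30) {
  const head = candidates.slice(0, topK);
  const scored = await Promise.all(
    head.map(async (c) => ({ ...c, rerank_score: await scorePair(query, c.text) }))
  );
  scored.sort((a, b) => b.rerank_score - a.rerank_score);
  return [...scored, ...candidates.slice(topK)];
}
```

With a real cross-encoder, `scorePair` would tokenize query and document jointly and return the model's relevance logit; the surrounding flow stays the same, which is why the overhead is bounded by the candidate cap.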
package/dist/compare-embeddings.js
ADDED
@@ -0,0 +1,402 @@
+"use strict";
+var __createBinding = (this && this.__createBinding) || (Object.create ? (function(o, m, k, k2) {
+    if (k2 === undefined) k2 = k;
+    var desc = Object.getOwnPropertyDescriptor(m, k);
+    if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
+        desc = { enumerable: true, get: function() { return m[k]; } };
+    }
+    Object.defineProperty(o, k2, desc);
+}) : (function(o, m, k, k2) {
+    if (k2 === undefined) k2 = k;
+    o[k2] = m[k];
+}));
+var __setModuleDefault = (this && this.__setModuleDefault) || (Object.create ? (function(o, v) {
+    Object.defineProperty(o, "default", { enumerable: true, value: v });
+}) : function(o, v) {
+    o["default"] = v;
+});
+var __importStar = (this && this.__importStar) || (function () {
+    var ownKeys = function(o) {
+        ownKeys = Object.getOwnPropertyNames || function (o) {
+            var ar = [];
+            for (var k in o) if (Object.prototype.hasOwnProperty.call(o, k)) ar[ar.length] = k;
+            return ar;
+        };
+        return ownKeys(o);
+    };
+    return function (mod) {
+        if (mod && mod.__esModule) return mod;
+        var result = {};
+        if (mod != null) for (var k = ownKeys(mod), i = 0; i < k.length; i++) if (k[i] !== "default") __createBinding(result, mod, k[i]);
+        __setModuleDefault(result, mod);
+        return result;
+    };
+})();
+Object.defineProperty(exports, "__esModule", { value: true });
+require("dotenv/config");
+const embedding_service_1 = require("./embedding-service");
+const path = __importStar(require("path"));
+const fs = __importStar(require("fs"));
+// Test data - various scenarios
+const TEST_QUERIES = [
+    "What is machine learning?",
+    "How do neural networks work?",
+    "Explain quantum computing",
+    "What are the benefits of TypeScript?",
+    "How to optimize database queries?",
+    "Best practices for API design",
+    "Understanding distributed systems",
+    "Introduction to graph databases",
+    "Microservices architecture patterns",
+    "Cloud computing fundamentals"
+];
+const TEST_DOCUMENTS = [
+    "Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed.",
+    "Neural networks are computing systems inspired by biological neural networks that constitute animal brains. They consist of interconnected nodes or neurons.",
+    "Quantum computing uses quantum-mechanical phenomena such as superposition and entanglement to perform operations on data.",
+    "TypeScript is a strongly typed programming language that builds on JavaScript, giving you better tooling at any scale.",
+    "Database query optimization involves analyzing and improving query performance through indexing, query rewriting, and execution plan analysis.",
+    "API design best practices include using RESTful principles, proper versioning, clear documentation, and consistent error handling.",
+    "Distributed systems are computing systems whose components are located on different networked computers, which communicate and coordinate their actions.",
+    "Graph databases use graph structures with nodes, edges, and properties to represent and store data, ideal for connected data.",
+    "Microservices architecture is an approach to developing a single application as a suite of small services, each running in its own process.",
+    "Cloud computing delivers computing services including servers, storage, databases, networking, software, analytics, and intelligence over the Internet."
+];
+// Compute cosine similarity
+function cosineSimilarity(a, b) {
+    let dotProduct = 0;
+    let normA = 0;
+    let normB = 0;
+    for (let i = 0; i < a.length; i++) {
+        dotProduct += a[i] * b[i];
+        normA += a[i] * a[i];
+        normB += b[i] * b[i];
+    }
+    return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
+}
+// Test 1: Embedding speed
+async function testEmbeddingSpeed(service, modelName) {
+    console.log(`\n${'='.repeat(70)}`);
+    console.log(`TEST 1: Embedding Speed - ${modelName}`);
+    console.log('='.repeat(70));
+    const times = [];
+    // Warmup
+    await service.embed("warmup");
+    // Single embeddings
+    for (const query of TEST_QUERIES) {
+        const start = performance.now();
+        await service.embed(query);
+        const end = performance.now();
+        times.push(end - start);
+    }
+    const avgTime = times.reduce((a, b) => a + b, 0) / times.length;
+    const minTime = Math.min(...times);
+    const maxTime = Math.max(...times);
+    console.log(`\nSingle Embedding Performance:`);
+    console.log(`  Average: ${avgTime.toFixed(2)} ms`);
+    console.log(`  Min: ${minTime.toFixed(2)} ms`);
+    console.log(`  Max: ${maxTime.toFixed(2)} ms`);
+    return { avgTime, minTime, maxTime };
+}
+// Test 2: Batch performance
+async function testBatchPerformance(service, modelName) {
+    console.log(`\n${'='.repeat(70)}`);
+    console.log(`TEST 2: Batch Performance - ${modelName}`);
+    console.log('='.repeat(70));
+    const start = performance.now();
+    await service.embedBatch(TEST_DOCUMENTS);
+    const end = performance.now();
+    const totalTime = end - start;
+    const avgPerDoc = totalTime / TEST_DOCUMENTS.length;
+    console.log(`\nBatch Embedding (${TEST_DOCUMENTS.length} documents):`);
+    console.log(`  Total time: ${totalTime.toFixed(2)} ms`);
+    console.log(`  Avg per doc: ${avgPerDoc.toFixed(2)} ms`);
+    console.log(`  Throughput: ${(1000 / avgPerDoc).toFixed(2)} docs/sec`);
+    return { totalTime, avgPerDoc };
+}
+// Test 3: Semantic similarity quality
+async function testSemanticSimilarity(service, modelName) {
+    console.log(`\n${'='.repeat(70)}`);
+    console.log(`TEST 3: Semantic Similarity Quality - ${modelName}`);
+    console.log('='.repeat(70));
+    // Embed queries and documents
+    const queryEmbeddings = await service.embedBatch(TEST_QUERIES);
+    const docEmbeddings = await service.embedBatch(TEST_DOCUMENTS);
+    // Expected matches (query index -> document index)
+    const expectedMatches = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
+    let correctMatches = 0;
+    const similarities = [];
+    console.log(`\nTop Match for each Query:`);
+    for (let i = 0; i < TEST_QUERIES.length; i++) {
+        const queryEmb = queryEmbeddings[i];
+        // Calculate similarities with all documents
+        const sims = docEmbeddings.map((docEmb, idx) => ({
+            idx,
+            similarity: cosineSimilarity(queryEmb, docEmb)
+        }));
+        // Sort by similarity
+        sims.sort((a, b) => b.similarity - a.similarity);
+        const topMatch = sims[0];
+        const isCorrect = topMatch.idx === expectedMatches[i];
+        if (isCorrect)
+            correctMatches++;
+        similarities.push(topMatch.similarity);
+        console.log(`  Q${i}: "${TEST_QUERIES[i].substring(0, 40)}..."`);
+        console.log(`    → Doc ${topMatch.idx} (sim: ${topMatch.similarity.toFixed(4)}) ${isCorrect ? '✓' : '✗'}`);
+    }
+    const accuracy = (correctMatches / TEST_QUERIES.length) * 100;
+    const avgSimilarity = similarities.reduce((a, b) => a + b, 0) / similarities.length;
+    console.log(`\nResults:`);
+    console.log(`  Accuracy: ${accuracy.toFixed(1)}% (${correctMatches}/${TEST_QUERIES.length})`);
+    console.log(`  Avg Similarity: ${avgSimilarity.toFixed(4)}`);
+    return { accuracy, avgSimilarity, correctMatches };
+}
+// Test 4: Long context handling
+async function testLongContext(service, modelName) {
+    console.log(`\n${'='.repeat(70)}`);
+    console.log(`TEST 4: Long Context Handling - ${modelName}`);
+    console.log('='.repeat(70));
+    const shortText = "Machine learning is AI.";
+    const mediumText = TEST_DOCUMENTS[0]; // ~150 chars
+    const longText = TEST_DOCUMENTS.join(" "); // ~1000 chars
+    const veryLongText = longText.repeat(5); // ~5000 chars
+    const tests = [
+        { name: "Short (~20 chars)", text: shortText },
+        { name: "Medium (~150 chars)", text: mediumText },
+        { name: "Long (~1000 chars)", text: longText },
+        { name: "Very Long (~5000 chars)", text: veryLongText }
+    ];
+    console.log(`\nContext Length Performance:`);
+    const results = [];
+    for (const test of tests) {
+        const start = performance.now();
+        await service.embed(test.text);
+        const end = performance.now();
+        const time = end - start;
+        console.log(`  ${test.name.padEnd(25)} ${time.toFixed(2)} ms`);
+        results.push({ name: test.name, time });
+    }
+    return results;
+}
+// Test 5: Cache performance
+async function testCachePerformance(service, modelName) {
+    console.log(`\n${'='.repeat(70)}`);
+    console.log(`TEST 5: Cache Performance - ${modelName}`);
+    console.log('='.repeat(70));
+    const testText = "Test cache performance";
+    // First call (cold)
+    const start1 = performance.now();
+    await service.embed(testText);
+    const end1 = performance.now();
+    const coldTime = end1 - start1;
+    // Second call (cached)
+    const start2 = performance.now();
+    await service.embed(testText);
+    const end2 = performance.now();
+    const cachedTime = end2 - start2;
+    const speedup = coldTime / cachedTime;
+    console.log(`\nCache Hit Performance:`);
+    console.log(`  Cold (first call): ${coldTime.toFixed(2)} ms`);
+    console.log(`  Cached (second): ${cachedTime.toFixed(2)} ms`);
+    console.log(`  Speedup: ${speedup.toFixed(1)}x faster`);
+    const stats = service.getCacheStats();
+    console.log(`\nCache Statistics:`);
+    console.log(`  Size: ${stats.size}/${stats.maxSize}`);
+    console.log(`  Model: ${stats.model}`);
+    console.log(`  Dims: ${stats.dimensions}`);
+    return { coldTime, cachedTime, speedup };
+}
+// Test 6: Memory usage
+async function testMemoryUsage(service, modelName) {
+    console.log(`\n${'='.repeat(70)}`);
+    console.log(`TEST 6: Memory Usage - ${modelName}`);
+    console.log('='.repeat(70));
+    const memBefore = process.memoryUsage();
+    // Embed a batch
+    await service.embedBatch(TEST_DOCUMENTS);
+    const memAfter = process.memoryUsage();
+    const heapUsedMB = (memAfter.heapUsed - memBefore.heapUsed) / 1024 / 1024;
+    const rssMB = (memAfter.rss - memBefore.rss) / 1024 / 1024;
+    console.log(`\nMemory Usage (after batch embedding):`);
+    console.log(`  Heap Used: ${heapUsedMB.toFixed(2)} MB`);
+    console.log(`  RSS: ${rssMB.toFixed(2)} MB`);
+    console.log(`  Total Heap: ${(memAfter.heapTotal / 1024 / 1024).toFixed(2)} MB`);
+    return { heapUsedMB, rssMB };
+}
+// Main comparison function
+async function runSingleModelTest(modelId) {
+    console.log('\n' + '█'.repeat(70));
+    console.log(`TESTING MODEL: ${modelId}`);
+    console.log('█'.repeat(70));
+    const service = new embedding_service_1.EmbeddingService();
+    const results = {
+        model: modelId,
+        timestamp: new Date().toISOString()
+    };
+    // Run tests
+    results.speed = await testEmbeddingSpeed(service, modelId);
+    results.batch = await testBatchPerformance(service, modelId);
+    results.similarity = await testSemanticSimilarity(service, modelId);
+    results.longContext = await testLongContext(service, modelId);
+    results.cache = await testCachePerformance(service, modelId);
+    results.memory = await testMemoryUsage(service, modelId);
+    // Save results
+    const resultsPath = path.join(__dirname, '..', `embedding-results-${modelId.replace(/\//g, '-')}.json`);
+    fs.writeFileSync(resultsPath, JSON.stringify(results, null, 2));
+    console.log(`\n✓ Results saved to: ${resultsPath}`);
+    return results;
+}
+async function compareResults() {
+    console.log('\n' + '█'.repeat(70));
+    console.log('LOADING AND COMPARING RESULTS');
+    console.log('█'.repeat(70));
+    const bgeFile = path.join(__dirname, '..', 'embedding-results-Xenova-bge-m3.json');
+    const pplxFile = path.join(__dirname, '..', 'embedding-results-perplexity-ai-pplx-embed-v1-0.6b.json');
+    if (!fs.existsSync(bgeFile)) {
+        console.error('\n✗ BGE-M3 results not found!');
+        console.log('Please run: EMBEDDING_MODEL=Xenova/bge-m3 npm run compare-embeddings');
+        return;
+    }
+    if (!fs.existsSync(pplxFile)) {
+        console.error('\n✗ pplx-embed results not found!');
+        console.log('Please run: EMBEDDING_MODEL=perplexity-ai/pplx-embed-v1-0.6b npm run compare-embeddings');
+        return;
+    }
+    const bge = JSON.parse(fs.readFileSync(bgeFile, 'utf-8'));
+    const pplx = JSON.parse(fs.readFileSync(pplxFile, 'utf-8'));
+    // Print comparison summary
+    console.log(`\n\n${'█'.repeat(70)}`);
+    console.log('COMPARISON SUMMARY');
+    console.log('█'.repeat(70));
+    console.log(`\n1. EMBEDDING SPEED (lower is better)`);
+    console.log(`  BGE-M3:     ${bge.speed.avgTime.toFixed(2)} ms`);
+    console.log(`  pplx-embed: ${pplx.speed.avgTime.toFixed(2)} ms`);
+    const speedDiff = ((pplx.speed.avgTime - bge.speed.avgTime) / bge.speed.avgTime * 100);
+    console.log(`  Winner: ${bge.speed.avgTime < pplx.speed.avgTime ? 'BGE-M3' : 'pplx-embed'} (${Math.abs(speedDiff).toFixed(1)}% ${speedDiff > 0 ? 'slower' : 'faster'})`);
+    console.log(`\n2. BATCH THROUGHPUT (higher is better)`);
+    const bgeThroughput = 1000 / bge.batch.avgPerDoc;
+    const pplxThroughput = 1000 / pplx.batch.avgPerDoc;
+    console.log(`  BGE-M3:     ${bgeThroughput.toFixed(2)} docs/sec`);
+    console.log(`  pplx-embed: ${pplxThroughput.toFixed(2)} docs/sec`);
+    console.log(`  Winner: ${bgeThroughput > pplxThroughput ? 'BGE-M3' : 'pplx-embed'}`);
+    console.log(`\n3. SEMANTIC SIMILARITY ACCURACY (higher is better)`);
+    console.log(`  BGE-M3:     ${bge.similarity.accuracy.toFixed(1)}% (${bge.similarity.correctMatches}/${TEST_QUERIES.length})`);
+    console.log(`  pplx-embed: ${pplx.similarity.accuracy.toFixed(1)}% (${pplx.similarity.correctMatches}/${TEST_QUERIES.length})`);
+    console.log(`  Winner: ${bge.similarity.accuracy > pplx.similarity.accuracy ? 'BGE-M3' : 'pplx-embed'} ${pplx.similarity.accuracy > bge.similarity.accuracy ? '🏆' : ''}`);
+    console.log(`\n4. AVERAGE SIMILARITY SCORE (higher is better)`);
+    console.log(`  BGE-M3:     ${bge.similarity.avgSimilarity.toFixed(4)}`);
+    console.log(`  pplx-embed: ${pplx.similarity.avgSimilarity.toFixed(4)}`);
+    const simDiff = ((pplx.similarity.avgSimilarity - bge.similarity.avgSimilarity) / bge.similarity.avgSimilarity * 100);
+    console.log(`  Winner: ${bge.similarity.avgSimilarity > pplx.similarity.avgSimilarity ? 'BGE-M3' : 'pplx-embed'} (${Math.abs(simDiff).toFixed(1)}% ${simDiff > 0 ? 'higher' : 'lower'})`);
+    console.log(`\n5. CACHE SPEEDUP (higher is better)`);
+    console.log(`  BGE-M3:     ${bge.cache.speedup.toFixed(1)}x`);
+    console.log(`  pplx-embed: ${pplx.cache.speedup.toFixed(1)}x`);
+    console.log(`\n6. MEMORY USAGE (lower is better)`);
+    console.log(`  BGE-M3:     ${bge.memory.heapUsedMB.toFixed(2)} MB heap`);
+    console.log(`  pplx-embed: ${pplx.memory.heapUsedMB.toFixed(2)} MB heap`);
+    console.log(`  Winner: ${bge.memory.heapUsedMB < pplx.memory.heapUsedMB ? 'BGE-M3' : 'pplx-embed'}`);
+    // Score calculation
+    let bgeScore = 0;
+    let pplxScore = 0;
+    if (bge.speed.avgTime < pplx.speed.avgTime)
+        bgeScore++;
+    else
+        pplxScore++;
+    if (bgeThroughput > pplxThroughput)
+        bgeScore++;
+    else
+        pplxScore++;
+    if (bge.similarity.accuracy > pplx.similarity.accuracy)
+        bgeScore++;
+    else
+        pplxScore++;
+    if (bge.similarity.avgSimilarity > pplx.similarity.avgSimilarity)
+        bgeScore++;
+    else
+        pplxScore++;
+    if (bge.memory.heapUsedMB < pplx.memory.heapUsedMB)
+        bgeScore++;
+    else
+        pplxScore++;
+    // Overall winner
+    console.log(`\n${'='.repeat(70)}`);
+    console.log('OVERALL SCORE');
+    console.log('='.repeat(70));
+    console.log(`\n  BGE-M3:     ${bgeScore}/5 wins`);
+    console.log(`  pplx-embed: ${pplxScore}/5 wins`);
+    console.log(`\n${'='.repeat(70)}`);
+    console.log('RECOMMENDATION');
+    console.log('='.repeat(70));
+    if (pplxScore > bgeScore) {
+        console.log(`\n✓ pplx-embed-v1-0.6b is RECOMMENDED 🏆`);
+        console.log(`  Reasons:`);
+        if (pplx.similarity.accuracy > bge.similarity.accuracy) {
+            console.log(`  ✓ Better semantic similarity accuracy (+${(pplx.similarity.accuracy - bge.similarity.accuracy).toFixed(1)}%)`);
+        }
+        if (pplx.similarity.avgSimilarity > bge.similarity.avgSimilarity) {
+            console.log(`  ✓ Higher quality embeddings (+${(simDiff).toFixed(1)}%)`);
+        }
+        console.log(`  ✓ 32K context length (vs 8K for BGE-M3)`);
+        console.log(`  ✓ Better MTEB benchmark scores`);
+    }
+    else if (bgeScore > pplxScore) {
+        console.log(`\n✓ BGE-M3 is RECOMMENDED 🏆`);
+        console.log(`  Reasons:`);
+        if (bge.speed.avgTime < pplx.speed.avgTime) {
+            console.log(`  ✓ Faster embedding speed (-${Math.abs(speedDiff).toFixed(1)}%)`);
+        }
+        if (bge.memory.heapUsedMB < pplx.memory.heapUsedMB) {
+            console.log(`  ✓ Lower memory usage`);
+        }
+        console.log(`  ✓ Automatic download (no manual setup)`);
+        console.log(`  ✓ Proven stability`);
+    }
+    else {
+        console.log(`\n⚖ BOTH MODELS ARE EQUALLY COMPETITIVE`);
+        console.log(`  Choose based on your priorities:`);
+        console.log(`  - pplx-embed: Better quality, longer context (32K)`);
+        console.log(`  - BGE-M3: Faster, easier setup, automatic download`);
+    }
+    console.log('\n' + '█'.repeat(70));
+    console.log('COMPARISON COMPLETE');
+    console.log('█'.repeat(70) + '\n');
+}
+// Main entry point
+async function main() {
+    const currentModel = process.env.EMBEDDING_MODEL || "Xenova/bge-m3";
+    // Check if we should compare existing results
+    const args = process.argv.slice(2);
+    if (args.includes('--compare')) {
+        await compareResults();
+        return;
+    }
+    console.log('\n' + '█'.repeat(70));
+    console.log('EMBEDDING MODEL BENCHMARK');
+    console.log('█'.repeat(70));
+    console.log(`\nCurrent model: ${currentModel}`);
+    console.log('\nThis will run a comprehensive test suite including:');
+    console.log('  1. Embedding speed');
+    console.log('  2. Batch performance');
+    console.log('  3. Semantic similarity quality');
+    console.log('  4. Long context handling');
+    console.log('  5. Cache performance');
+    console.log('  6. Memory usage');
+    console.log('\nEstimated time: 2-3 minutes\n');
+    await runSingleModelTest(currentModel);
+    console.log('\n' + '='.repeat(70));
+    console.log('NEXT STEPS');
+    console.log('='.repeat(70));
+    if (currentModel === 'Xenova/bge-m3') {
+        console.log('\nTo test pplx-embed, run:');
+        console.log('  EMBEDDING_MODEL=perplexity-ai/pplx-embed-v1-0.6b npm run compare-embeddings');
+    }
+    else if (currentModel === 'perplexity-ai/pplx-embed-v1-0.6b') {
+        console.log('\nTo test BGE-M3, run:');
+        console.log('  EMBEDDING_MODEL=Xenova/bge-m3 npm run compare-embeddings');
+    }
+    console.log('\nTo compare both results, run:');
+    console.log('  npm run compare-embeddings -- --compare');
+    console.log();
+}
+// Run
+main().catch(console.error);
package/dist/download-pplx-embed.js
ADDED
@@ -0,0 +1,151 @@
+"use strict";
+var __createBinding = (this && this.__createBinding) || (Object.create ? (function(o, m, k, k2) {
+    if (k2 === undefined) k2 = k;
+    var desc = Object.getOwnPropertyDescriptor(m, k);
+    if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
+        desc = { enumerable: true, get: function() { return m[k]; } };
+    }
+    Object.defineProperty(o, k2, desc);
+}) : (function(o, m, k, k2) {
+    if (k2 === undefined) k2 = k;
+    o[k2] = m[k];
+}));
+var __setModuleDefault = (this && this.__setModuleDefault) || (Object.create ? (function(o, v) {
+    Object.defineProperty(o, "default", { enumerable: true, value: v });
+}) : function(o, v) {
+    o["default"] = v;
+});
+var __importStar = (this && this.__importStar) || (function () {
+    var ownKeys = function(o) {
+        ownKeys = Object.getOwnPropertyNames || function (o) {
+            var ar = [];
+            for (var k in o) if (Object.prototype.hasOwnProperty.call(o, k)) ar[ar.length] = k;
+            return ar;
+        };
+        return ownKeys(o);
+    };
+    return function (mod) {
+        if (mod && mod.__esModule) return mod;
+        var result = {};
+        if (mod != null) for (var k = ownKeys(mod), i = 0; i < k.length; i++) if (k[i] !== "default") __createBinding(result, mod, k[i]);
+        __setModuleDefault(result, mod);
+        return result;
+    };
+})();
+Object.defineProperty(exports, "__esModule", { value: true });
+require("dotenv/config");
+const https = __importStar(require("https"));
+const fs = __importStar(require("fs"));
+const path = __importStar(require("path"));
+const transformers_1 = require("@xenova/transformers");
+// Configure cache path
+const CACHE_DIR = path.resolve('./.cache');
+transformers_1.env.cacheDir = CACHE_DIR;
+const MODEL_ID = "perplexity-ai/pplx-embed-v1-0.6b";
+const BASE_URL = `https://huggingface.co/${MODEL_ID}/resolve/main/onnx`;
+// Model variant to download (quantized is recommended for smaller size)
+const USE_QUANTIZED = true; // Set to false for FP32 full precision
+// Files to download based on variant
+const FILES = USE_QUANTIZED ? [
+    { name: 'model_quantized.onnx', size: '614 KB' },
+    { name: 'model_quantized.onnx_data', size: '706 MB' }
+] : [
+    { name: 'model.onnx', size: '520 KB' },
+    { name: 'model.onnx_data', size: '2.09 GB' },
+    { name: 'model.onnx_data_1', size: '306 MB' }
+];
+// Target directory
+const targetDir = path.join(CACHE_DIR, 'perplexity-ai', 'pplx-embed-v1-0.6b', 'onnx');
+function downloadFile(url, dest) {
+    return new Promise((resolve, reject) => {
+        const file = fs.createWriteStream(dest);
+        https.get(url, (response) => {
+            if (response.statusCode === 302 || response.statusCode === 301) {
+                // Follow redirect
+                const redirectUrl = response.headers.location;
+                if (redirectUrl) {
+                    file.close();
+                    fs.unlinkSync(dest);
+                    return downloadFile(redirectUrl, dest).then(resolve).catch(reject);
+                }
+            }
+            const totalSize = parseInt(response.headers['content-length'] || '0', 10);
+            let downloadedSize = 0;
+            response.on('data', (chunk) => {
+                downloadedSize += chunk.length;
+                const progress = ((downloadedSize / totalSize) * 100).toFixed(2);
+                process.stdout.write(`\r  Progress: ${progress}% (${(downloadedSize / 1024 / 1024).toFixed(2)} MB / ${(totalSize / 1024 / 1024).toFixed(2)} MB)`);
+            });
+            response.pipe(file);
+            file.on('finish', () => {
+                file.close();
+                console.log('\n  ✓ Download complete');
+                resolve();
+            });
+        }).on('error', (err) => {
+            fs.unlinkSync(dest);
+            reject(err);
+        });
+    });
+}
+async function downloadPplxEmbed() {
+    console.log('='.repeat(70));
+    console.log('Downloading Perplexity pplx-embed-v1-0.6b ONNX files');
+    console.log(`Variant: ${USE_QUANTIZED ? 'INT8 Quantized (Recommended)' : 'FP32 Full Precision'}`);
+    console.log('='.repeat(70));
+    console.log();
+    // Create target directory
+    if (!fs.existsSync(targetDir)) {
+        fs.mkdirSync(targetDir, { recursive: true });
+        console.log(`✓ Created directory: ${targetDir}`);
+    }
+    console.log();
+    console.log('Files to download:');
+    FILES.forEach(f => console.log(`  - ${f.name} (${f.size})`));
+    console.log();
+    console.log(`Total size: ${USE_QUANTIZED ? '~706 MB' : '~2.5 GB'}`);
+    console.log(`This may take ${USE_QUANTIZED ? '3-10' : '10-30'} minutes depending on your internet connection.`);
+    console.log();
+    if (USE_QUANTIZED) {
+        console.log('ℹ Using INT8 quantized model (recommended):');
+        console.log('  ✓ 3x smaller than FP32 (~706 MB vs ~2.4 GB)');
+        console.log('  ✓ Minimal quality loss (~1.5% MTEB drop)');
+        console.log('  ✓ Faster inference');
+        console.log();
+        console.log('  To use FP32 instead, edit src/download-pplx-embed.ts');
+        console.log('  and set USE_QUANTIZED = false');
+        console.log();
+    }
+    // Download each file
+    for (const file of FILES) {
+        const filePath = path.join(targetDir, file.name);
+        // Skip if already exists
+        if (fs.existsSync(filePath)) {
+            console.log(`⊘ Skipping ${file.name} (already exists)`);
+            continue;
+        }
+        console.log(`⬇ Downloading ${file.name} (${file.size})...`);
+        const url = `${BASE_URL}/${file.name}`;
+        try {
+            await downloadFile(url, filePath);
+        }
+        catch (error) {
+            console.error(`✗ Failed to download ${file.name}:`, error.message);
+            console.error('  Please download manually from:');
|
135
|
+
console.error(` ${url}`);
|
|
136
|
+
process.exit(1);
|
|
137
|
+
}
|
|
138
|
+
}
|
|
139
|
+
console.log();
|
|
140
|
+
console.log('='.repeat(70));
|
|
141
|
+
console.log('✓ All files downloaded successfully!');
|
|
142
|
+
console.log('='.repeat(70));
|
|
143
|
+
console.log();
|
|
144
|
+
console.log('You can now use the model by setting in .env:');
|
|
145
|
+
console.log(' EMBEDDING_MODEL=perplexity-ai/pplx-embed-v1-0.6b');
|
|
146
|
+
console.log();
|
|
147
|
+
console.log('Then start the server with:');
|
|
148
|
+
console.log(' npm run start');
|
|
149
|
+
console.log();
|
|
150
|
+
}
|
|
151
|
+
downloadPplxEmbed().catch(console.error);
|
|
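For readers skimming the diff, the progress string that `downloadFile()` writes to stdout can be sketched in isolation. `progressLine` below is an illustrative helper, not part of the package; it reproduces the same `toFixed(2)` percentage and MB arithmetic as the added code.

```javascript
// Illustrative sketch of the progress line in downloadFile() above.
// `progressLine` is a hypothetical helper name, not in the package.
function progressLine(downloadedSize, totalSize) {
  const progress = ((downloadedSize / totalSize) * 100).toFixed(2);
  const mb = (bytes) => (bytes / 1024 / 1024).toFixed(2);
  return `Progress: ${progress}% (${mb(downloadedSize)} MB / ${mb(totalSize)} MB)`;
}

console.log(progressLine(524288, 1048576)); // → Progress: 50.00% (0.50 MB / 1.00 MB)
```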
package/dist/embedding-service.js
CHANGED
@@ -94,9 +94,16 @@ class EmbeddingService {
     tokenizer = null;
     modelId;
     dimensions;
+    useOllama;
+    ollamaModel;
+    ollamaBaseUrl;
     queue = Promise.resolve();
     constructor() {
         this.cache = new LRUCache(1000, 3600000); // 1000 entries, 1h TTL
+        // Check if Ollama should be used
+        this.useOllama = process.env.USE_OLLAMA === 'true';
+        this.ollamaModel = process.env.OLLAMA_EMBEDDING_MODEL || 'argus-ai/pplx-embed-v1-0.6b:q8_0';
+        this.ollamaBaseUrl = process.env.OLLAMA_BASE_URL || 'http://localhost:11434';
         // Support multiple embedding models via environment variable
         this.modelId = process.env.EMBEDDING_MODEL || "Xenova/bge-m3";
         // Set dimensions based on model
@@ -106,9 +113,20 @@ class EmbeddingService {
             "Xenova/bge-small-en-v1.5": 384,
             "Xenova/nomic-embed-text-v1": 768,
             "onnx-community/Qwen3-Embedding-0.6B-ONNX": 1024,
+            // Note: perplexity-ai models require manual ONNX file placement
+            // See PPLX_EMBED_INTEGRATION.md for instructions
+            "perplexity-ai/pplx-embed-v1-0.6b": 1024,
+            "perplexity-ai/pplx-embed-v1-4b": 2560,
+            // Ollama models
+            "argus-ai/pplx-embed-v1-0.6b:q8_0": 1024,
         };
-        this.dimensions = dimensionMap[this.modelId] || 1024;
-
+        this.dimensions = dimensionMap[this.useOllama ? this.ollamaModel : this.modelId] || 1024;
+        if (this.useOllama) {
+            console.error(`[EmbeddingService] Using Ollama: ${this.ollamaModel} @ ${this.ollamaBaseUrl} (${this.dimensions} dimensions)`);
+        }
+        else {
+            console.error(`[EmbeddingService] Using ONNX model: ${this.modelId} (${this.dimensions} dimensions)`);
+        }
     }
     // Public getter for dimensions
     getDimensions() {
@@ -125,6 +143,11 @@ class EmbeddingService {
     async init() {
         if (this.session && this.tokenizer)
             return;
+        // Skip ONNX initialization if using Ollama
+        if (this.useOllama) {
+            console.error('[EmbeddingService] Using Ollama backend, skipping ONNX initialization');
+            return;
+        }
         try {
             // 1. Check if model needs to be downloaded
             // Extract namespace and model name from modelId (e.g., "Xenova/bge-m3" or "onnx-community/Qwen3-Embedding-0.6B-ONNX")
@@ -139,10 +162,23 @@ class EmbeddingService {
             if (!fs.existsSync(fp32Path) && !fs.existsSync(quantizedPath)) {
                 console.log(`[EmbeddingService] Model not found, downloading ${this.modelId}...`);
                 console.log(`[EmbeddingService] This may take a few minutes on first run.`);
-                //
-
-
-
+                // Check if this is a Xenova-compatible model
+                if (namespace === 'Xenova' || namespace === 'onnx-community') {
+                    // Import AutoModel dynamically to trigger download
+                    const { AutoModel } = await import("@xenova/transformers");
+                    await AutoModel.from_pretrained(this.modelId, { quantized: false });
+                    console.log(`[EmbeddingService] Model download completed.`);
+                }
+                else {
+                    // For non-Xenova models (like perplexity-ai), provide manual download instructions
+                    console.error(`[EmbeddingService] ERROR: Model ${this.modelId} is not available via @xenova/transformers`);
+                    console.error(`[EmbeddingService] Please download the model manually:`);
+                    console.error(`[EmbeddingService] 1. Visit: https://huggingface.co/${this.modelId}`);
+                    console.error(`[EmbeddingService] 2. Download the 'onnx' folder contents`);
+                    console.error(`[EmbeddingService] 3. Place files in: ${baseDir}`);
+                    console.error(`[EmbeddingService] See PPLX_EMBED_INTEGRATION.md for detailed instructions`);
+                    throw new Error(`Model ${this.modelId} requires manual download. See error messages above.`);
+                }
             }
             // 2. Load Tokenizer
             if (!this.tokenizer) {
@@ -188,6 +224,10 @@ class EmbeddingService {
             return cached;
         }
         try {
+            // Use Ollama if enabled
+            if (this.useOllama) {
+                return await this.embedWithOllama(textStr);
+            }
             await this.init();
             if (!this.session || !this.tokenizer)
                 throw new Error("Session/Tokenizer not initialized");
@@ -240,6 +280,37 @@ class EmbeddingService {
             }
         });
     }
+    async embedWithOllama(text) {
+        try {
+            const response = await fetch(`${this.ollamaBaseUrl}/api/embeddings`, {
+                method: 'POST',
+                headers: {
+                    'Content-Type': 'application/json',
+                },
+                body: JSON.stringify({
+                    model: this.ollamaModel,
+                    prompt: text,
+                }),
+            });
+            if (!response.ok) {
+                throw new Error(`Ollama API error: ${response.status} ${response.statusText}`);
+            }
+            const data = await response.json();
+            if (!data.embedding || !Array.isArray(data.embedding)) {
+                throw new Error('Invalid response from Ollama API');
+            }
+            const embedding = data.embedding;
+            // Normalize the embedding
+            const normalized = this.normalize(embedding);
+            // Cache it
+            this.cache.set(text, normalized);
+            return normalized;
+        }
+        catch (error) {
+            console.error(`[EmbeddingService] Ollama error for "${text.substring(0, 20)}...":`, error?.message || error);
+            return new Array(this.dimensions).fill(0);
+        }
+    }
     // Batch-Embeddings
     async embedBatch(texts) {
         // For now, process sequentially via serialized queue to avoid overloading
@@ -306,7 +377,8 @@ class EmbeddingService {
         return {
             size: this.cache.size(),
             maxSize: 1000,
-            model: this.modelId,
+            model: this.useOllama ? this.ollamaModel : this.modelId,
+            backend: this.useOllama ? 'ollama' : 'onnx',
             dimensions: this.dimensions
         };
     }
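The new `embedWithOllama` path normalizes the raw vector returned by Ollama before caching it. The service's own `normalize()` is not shown in this diff; the sketch below assumes it is standard L2 (unit-length) normalization, which is what cosine-similarity search expects.

```javascript
// Assumed-equivalent sketch of the normalize() call in embedWithOllama().
// L2 normalization: scale the vector to unit length; zero vectors pass through.
function l2Normalize(vec) {
  const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? vec : vec.map(v => v / norm);
}

console.log(l2Normalize([3, 4])); // → [ 0.6, 0.8 ]
```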
package/dist/eval-suite.js
CHANGED
@@ -88,10 +88,14 @@ async function runEvaluation() {
     }
     const server = new index_1.MemoryServer(EVAL_DB_PATH);
     await server.embeddingService.embed("warmup");
+    // Warmup reranker
+    await server.hybridSearch.advancedSearch({ query: "warmup", limit: 1, rerank: true });
     await setupEvalData(server);
     const methods = [
         { name: "Hybrid Search", func: (q) => server.hybridSearch.search({ query: q, limit: 10 }) },
+        { name: "Reranked Search", func: (q) => server.hybridSearch.search({ query: q, limit: 10, rerank: true }) },
         { name: "Graph-RAG", func: (q) => server.hybridSearch.graphRag({ query: q, limit: 10, graphConstraints: { maxDepth: 2 } }) },
+        { name: "Graph-RAG (Reranked)", func: (q) => server.hybridSearch.graphRag({ query: q, limit: 10, graphConstraints: { maxDepth: 2 }, rerank: true }) },
         { name: "Graph-Walking", func: (q) => server.graph_walking({ query: q, limit: 10, max_depth: 3 }) }
     ];
     const summary = [];
@@ -101,6 +105,9 @@ async function runEvaluation() {
         let totalRecall10 = 0;
         let totalMRR = 0;
         let totalLatency = 0;
+        const n = EVAL_DATASET.length;
+        // Reset cache between methods to get accurate latency
+        await server.hybridSearch.clearCache();
         for (const task of EVAL_DATASET) {
             const t0 = perf_hooks_1.performance.now();
             const results = await method.func(task.query);
@@ -113,7 +120,6 @@ async function runEvaluation() {
             totalMRR += mrr;
             totalLatency += (t1 - t0);
         }
-        const n = EVAL_DATASET.length;
         summary.push({
             Method: method.name,
             "Recall@3": (totalRecall3 / n).toFixed(3),
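The eval suite reports Recall@K and MRR per method. Its actual metric helpers are not part of this diff; the sketch below shows the standard definitions with hypothetical function names, for orientation only.

```javascript
// Standard Recall@K / MRR definitions (hypothetical helper names; the
// suite's own implementation is not shown in this diff).
function recallAtK(rankedIds, relevantIds, k) {
  const top = new Set(rankedIds.slice(0, k));
  const hits = relevantIds.filter(id => top.has(id)).length;
  return hits / relevantIds.length;
}

function mrr(rankedIds, relevantIds) {
  const idx = rankedIds.findIndex(id => relevantIds.includes(id));
  return idx === -1 ? 0 : 1 / (idx + 1);
}

console.log(recallAtK(['a', 'b', 'c', 'd'], ['b', 'd'], 3)); // → 0.5
console.log(mrr(['a', 'b', 'c', 'd'], ['b', 'd']));          // → 0.5
```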
package/dist/hybrid-search.js
CHANGED
@@ -5,15 +5,18 @@ var __importDefault = (this && this.__importDefault) || function (mod) {
 Object.defineProperty(exports, "__esModule", { value: true });
 exports.HybridSearch = void 0;
 const crypto_1 = __importDefault(require("crypto"));
+const reranker_service_1 = require("./reranker-service");
 const SEMANTIC_CACHE_THRESHOLD = 0.95;
 class HybridSearch {
     db;
     embeddingService;
+    rerankerService;
     searchCache = new Map();
     CACHE_TTL = 300000; // 5 minutes cache
     constructor(db, embeddingService) {
         this.db = db;
         this.embeddingService = embeddingService;
+        this.rerankerService = new reranker_service_1.RerankerService();
     }
     getCacheKey(options) {
         const str = JSON.stringify({
@@ -75,6 +78,36 @@ class HybridSearch {
             return { ...r, score };
         });
     }
+    async applyReranking(query, results) {
+        if (results.length <= 1)
+            return results;
+        console.error(`[HybridSearch] Reranking ${results.length} candidates...`);
+        const documents = results.map(r => {
+            const parts = [
+                r.name ? `Name: ${r.name}` : '',
+                r.type ? `Type: ${r.type}` : '',
+                r.text ? `Description: ${r.text}` : '',
+                r.metadata ? `Details: ${JSON.stringify(r.metadata)}` : ''
+            ].filter(p => p !== '');
+            return parts.join(' | ');
+        });
+        try {
+            const rerankedOrder = await this.rerankerService.rerank(query, documents);
+            return rerankedOrder.map((item, i) => {
+                const original = results[item.index];
+                return {
+                    ...original,
+                    score: (item.score + 1.0) / 2.0, // Normalize to 0-1 range if it's logits, or just use as is
+                    explanation: (typeof original.explanation === 'string' ? original.explanation : JSON.stringify(original.explanation)) +
+                        ` | Reranked (Rank ${i + 1}, Cross-Encoder Score: ${item.score.toFixed(4)})`
+                };
+            });
+        }
+        catch (e) {
+            console.error(`[HybridSearch] Reranking failed, returning original results:`, e);
+            return results;
+        }
+    }
     async advancedSearch(options) {
         console.error("[HybridSearch] Starting advancedSearch with options:", JSON.stringify(options, null, 2));
         const { query, limit = 10, filters, graphConstraints, vectorParams } = options;
@@ -212,6 +245,12 @@ class HybridSearch {
             });
         }
         const finalResults = this.applyTimeDecay(searchResults);
+        // Phase 3: Reranking
+        if (options.rerank) {
+            const rerankedResults = await this.applyReranking(options.query, finalResults);
+            await this.updateCache(options, queryEmbedding, rerankedResults);
+            return rerankedResults;
+        }
         await this.updateCache(options, queryEmbedding, finalResults);
         return finalResults;
     }
@@ -330,7 +369,11 @@ class HybridSearch {
                     return Object.entries(filters.metadata).every(([key, val]) => r.metadata[key] === val);
                 });
             }
-
+            const decayedResults = this.applyTimeDecay(searchResults);
+            if (options.rerank) {
+                return await this.applyReranking(options.query, decayedResults);
+            }
+            return decayedResults;
         }
         catch (e) {
             console.error("[HybridSearch] Error in graphRag:", e.message);
@@ -414,7 +457,8 @@ No markdown, no explanation. Just the JSON.`;
             filters: {
                 ...options.filters,
                 entityTypes: ["CommunitySummary"]
-            }
+            },
+            rerank: options.rerank
         });
         // If no community summaries found, fallback to standard search
         if (results.length === 0) {
@@ -437,5 +481,15 @@ No markdown, no explanation. Just the JSON.`;
             }
         }));
     }
+    async clearCache() {
+        this.searchCache.clear();
+        try {
+            await this.db.run(`{ ?[query_hash] := *search_cache{query_hash} :rm search_cache {query_hash} }`);
+            console.error("[HybridSearch] Cache cleared successfully.");
+        }
+        catch (e) {
+            console.error("[HybridSearch] Error clearing cache:", e);
+        }
+    }
 }
 exports.HybridSearch = HybridSearch;
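The core move in `applyReranking` is mapping the reranker's sorted `[{ index, score }]` output back onto the original result objects, while rescaling a cross-encoder score into 0-1 via `(score + 1.0) / 2.0`. A minimal standalone sketch of that mapping (hypothetical `reorder` helper, stripped of the explanation bookkeeping):

```javascript
// Sketch of the index-remapping step in applyReranking() (helper name is
// illustrative, not in the package).
function reorder(results, rerankedOrder) {
  return rerankedOrder.map(item => ({
    ...results[item.index],
    // Rescale a [-1, 1]-style cross-encoder score into [0, 1]
    score: (item.score + 1.0) / 2.0,
  }));
}

const results = [{ id: 'a' }, { id: 'b' }, { id: 'c' }];
const order = [{ index: 2, score: 1.0 }, { index: 0, score: 0.0 }, { index: 1, score: -1.0 }];
console.log(reorder(results, order).map(r => r.id)); // → [ 'c', 'a', 'b' ]
```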
package/dist/index.js
CHANGED
@@ -2631,6 +2631,7 @@ Validation: Invalid syntax or missing columns in inference rules will result in
             entity_types: zod_1.z.array(zod_1.z.string()).optional().describe("Filter by entity types"),
             include_entities: zod_1.z.boolean().optional().default(true).describe("Include entities in search"),
             include_observations: zod_1.z.boolean().optional().default(true).describe("Include observations in search"),
+            rerank: zod_1.z.boolean().optional().default(false).describe("Use Cross-Encoder reranking for higher precision"),
         }),
         zod_1.z.object({
             action: zod_1.z.literal("advancedSearch"),
@@ -2658,6 +2659,7 @@ Validation: Invalid syntax or missing columns in inference rules will result in
             vectorParams: zod_1.z.object({
                 efSearch: zod_1.z.number().optional().describe("HNSW search precision"),
             }).optional().describe("Vector parameters"),
+            rerank: zod_1.z.boolean().optional().default(false).describe("Use Cross-Encoder reranking for higher precision"),
         }),
         zod_1.z.object({
             action: zod_1.z.literal("context"),
@@ -2679,6 +2681,7 @@ Validation: Invalid syntax or missing columns in inference rules will result in
             query: zod_1.z.string().describe("Search query for initial vector seeds"),
             max_depth: zod_1.z.number().min(1).max(3).optional().default(2).describe("Maximum depth of graph expansion (Default: 2)"),
             limit: zod_1.z.number().optional().default(10).describe("Number of initial vector seeds"),
+            rerank: zod_1.z.boolean().optional().default(false).describe("Use Cross-Encoder reranking for higher precision"),
         }),
         zod_1.z.object({
             action: zod_1.z.literal("graph_walking"),
@@ -2691,6 +2694,7 @@ Validation: Invalid syntax or missing columns in inference rules will result in
             action: zod_1.z.literal("agentic_search"),
             query: zod_1.z.string().describe("Context query for agentic routing"),
             limit: zod_1.z.number().optional().default(10).describe("Maximum number of results"),
+            rerank: zod_1.z.boolean().optional().default(false).describe("Use Cross-Encoder reranking for higher precision"),
         }),
     ]);
     const QueryMemoryParameters = zod_1.z.object({
@@ -2711,6 +2715,7 @@ Validation: Invalid syntax or missing columns in inference rules will result in
         as_of: zod_1.z.string().optional().describe("Only for entity_details: ISO string or 'NOW'"),
         max_depth: zod_1.z.number().optional().describe("Only for graph_rag/graph_walking: Maximum expansion depth"),
         start_entity_id: zod_1.z.string().optional().describe("Only for graph_walking: Start entity"),
+        rerank: zod_1.z.boolean().optional().describe("Only for search/advancedSearch/agentic_search: Enable Cross-Encoder reranking"),
     });
     this.mcp.addTool({
         name: "query_memory",
@@ -2745,6 +2750,7 @@ Notes: 'agentic_search' is the most powerful and adaptable, 'context' is ideal f
                 entityTypes: input.entity_types,
                 includeEntities: input.include_entities,
                 includeObservations: input.include_observations,
+                rerank: input.rerank,
             });
             const conflictEntityIds = Array.from(new Set(results
                 .map((r) => (r.name ? r.id : r.entity_id))
@@ -2776,6 +2782,7 @@ Notes: 'agentic_search' is the most powerful and adaptable, 'context' is ideal f
                 filters: input.filters,
                 graphConstraints: input.graphConstraints,
                 vectorParams: input.vectorParams,
+                rerank: input.rerank,
             });
             const conflictEntityIds = Array.from(new Set(results
                 .map((r) => (r.name ? r.id : r.entity_id))
@@ -2888,6 +2895,7 @@ Notes: 'agentic_search' is the most powerful and adaptable, 'context' is ideal f
             const results = await this.hybridSearch.agenticRetrieve({
                 query: input.query,
                 limit: input.limit,
+                rerank: input.rerank,
             });
             return JSON.stringify(results);
         }
@@ -2900,7 +2908,8 @@ Notes: 'agentic_search' is the most powerful and adaptable, 'context' is ideal f
                 limit: input.limit,
                 graphConstraints: {
                     maxDepth: input.max_depth
-                }
+                },
+                rerank: input.rerank,
             });
             return JSON.stringify(results);
         }
package/dist/reranker-service.js
ADDED
@@ -0,0 +1,125 @@
+"use strict";
+var __createBinding = (this && this.__createBinding) || (Object.create ? (function(o, m, k, k2) {
+    if (k2 === undefined) k2 = k;
+    var desc = Object.getOwnPropertyDescriptor(m, k);
+    if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
+        desc = { enumerable: true, get: function() { return m[k]; } };
+    }
+    Object.defineProperty(o, k2, desc);
+}) : (function(o, m, k, k2) {
+    if (k2 === undefined) k2 = k;
+    o[k2] = m[k];
+}));
+var __setModuleDefault = (this && this.__setModuleDefault) || (Object.create ? (function(o, v) {
+    Object.defineProperty(o, "default", { enumerable: true, value: v });
+}) : function(o, v) {
+    o["default"] = v;
+});
+var __importStar = (this && this.__importStar) || (function () {
+    var ownKeys = function(o) {
+        ownKeys = Object.getOwnPropertyNames || function (o) {
+            var ar = [];
+            for (var k in o) if (Object.prototype.hasOwnProperty.call(o, k)) ar[ar.length] = k;
+            return ar;
+        };
+        return ownKeys(o);
+    };
+    return function (mod) {
+        if (mod && mod.__esModule) return mod;
+        var result = {};
+        if (mod != null) for (var k = ownKeys(mod), i = 0; i < k.length; i++) if (k[i] !== "default") __createBinding(result, mod, k[i]);
+        __setModuleDefault(result, mod);
+        return result;
+    };
+})();
+Object.defineProperty(exports, "__esModule", { value: true });
+exports.RerankerService = void 0;
+const transformers_1 = require("@xenova/transformers");
+const path = __importStar(require("path"));
+const fs = __importStar(require("fs"));
+// Robust path to project root
+const PROJECT_ROOT = path.resolve(__dirname, '..');
+const CACHE_DIR = path.resolve(PROJECT_ROOT, '.cache');
+transformers_1.env.cacheDir = CACHE_DIR;
+transformers_1.env.allowLocalModels = true;
+class RerankerService {
+    pipe = null;
+    modelId;
+    initialized = false;
+    constructor() {
+        // Using a tiny but effective cross-encoder
+        this.modelId = process.env.RERANKER_MODEL || "Xenova/ms-marco-MiniLM-L-6-v2";
+        console.error(`[RerankerService] Using model: ${this.modelId}`);
+    }
+    async init() {
+        if (this.initialized)
+            return;
+        try {
+            // Check if model exists locally in cache
+            const parts = this.modelId.split('/');
+            const namespace = parts[0];
+            const modelName = parts[1];
+            const modelDir = path.join(CACHE_DIR, namespace, modelName);
+            if (!fs.existsSync(modelDir)) {
+                console.log(`[RerankerService] Model not found, downloading ${this.modelId}...`);
+            }
+            // We use the sequence-classification task for cross-encoders
+            this.pipe = await (0, transformers_1.pipeline)('sequence-classification', this.modelId, {
+                quantized: true,
+                // @ts-ignore
+                progress_callback: (info) => {
+                    if (info.status === 'done') {
+                        console.error(`[RerankerService] Loaded shard: ${info.file}`);
+                    }
+                }
+            });
+            this.initialized = true;
+            console.error(`[RerankerService] Initialization complete.`);
+        }
+        catch (error) {
+            console.error(`[RerankerService] Initialization failed:`, error);
+            throw error;
+        }
+    }
+    /**
+     * Reranks a list of documents based on a query.
+     * @param query The search query
+     * @param documents Array of document strings to rank
+     * @returns Array of { index, score } sorted by score descending
+     */
+    async rerank(query, documents) {
+        if (documents.length === 0)
+            return [];
+        await this.init();
+        try {
+            const results = [];
+            // Cross-encoders take pairs of [query, document]
+            // We can process them in a single batch
+            const inputs = documents.map(doc => [query, doc]);
+            // @ts-ignore
+            const outputs = await this.pipe(inputs, {
+                topk: 1 // We want the score for the "relevant" class (usually index 1 or the only output)
+            });
+            // Handle both array of results and single result (if only 1 doc)
+            const outputArray = Array.isArray(outputs) ? outputs : [outputs];
+            for (let i = 0; i < outputArray.length; i++) {
+                // Cross-encoders for ms-marco typically output a single logit/score or a 2-class distribution
+                // transformers.js sequence-classification returns { label: string, score: number }[]
+                // For ms-marco, label 'LABEL_1' is usually the relevance score
+                const out = outputArray[i];
+                results.push({
+                    index: i,
+                    score: out.score || 0
+                });
+            }
+            // Sort by score descending
+            return results.sort((a, b) => b.score - a.score);
+        }
+        catch (error) {
+            console.error(`[RerankerService] Reranking failed:`, error);
+            // Fallback: return original order with neutral scores
+            return documents.map((_, i) => ({ index: i, score: 0 }));
+        }
+    }
+}
+exports.RerankerService = RerankerService;
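The contract of `RerankerService.rerank` is simple: every document keeps its original index, gets a relevance score, and the pairs come back sorted by score descending (with a zero-score fallback in the original order on failure). A model-free sketch of that pair-and-sort step, under the assumption that per-document scores have already been computed:

```javascript
// Model-free sketch of rerank()'s pair-and-sort step (hypothetical helper;
// scores here stand in for cross-encoder outputs).
function sortByScore(scores) {
  return scores
    .map((score, index) => ({ index, score }))
    .sort((a, b) => b.score - a.score);
}

// Highest-scoring document first, original indices preserved.
console.log(sortByScore([0.1, 0.9, 0.4]).map(r => r.index)); // → [ 1, 2, 0 ]
```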
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "cozo-memory",
-  "version": "1.0.8",
+  "version": "1.1.0",
   "mcpName": "io.github.tobs-code/cozo-memory",
   "description": "Local-first persistent memory system for AI agents with hybrid search, graph reasoning, and MCP integration",
   "main": "dist/index.js",
@@ -38,7 +38,9 @@
     "test": "echo \"Error: no test specified\" && exit 1",
     "benchmark": "ts-node src/benchmark.ts",
     "eval": "ts-node src/eval-suite.ts",
-    "download-model": "ts-node src/download-model.ts"
+    "download-model": "ts-node src/download-model.ts",
+    "download-pplx-embed": "ts-node src/download-pplx-embed.ts",
+    "compare-embeddings": "ts-node src/compare-embeddings.ts"
   },
   "keywords": [
     "mcp",
@@ -95,4 +97,4 @@
     "tsx": "^4.21.0",
     "typescript": "^5.9.3"
   }
-}
+}