cozo-memory 1.0.8 → 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +17 -4
- package/dist/compare-embeddings.js +402 -0
- package/dist/download-pplx-embed.js +151 -0
- package/dist/embedding-service.js +79 -7
- package/dist/eval-suite.js +7 -1
- package/dist/hybrid-search.js +56 -2
- package/dist/index.js +10 -1
- package/dist/reranker-service.js +125 -0
- package/package.json +5 -3
package/README.md
CHANGED
@@ -63,6 +63,8 @@ Now you can add the server to your MCP client (e.g. Claude Desktop).
 
 🧠 **Agentic Retrieval Layer (v2.0)** - Auto-routing engine that analyzes query intent via local LLM to select the optimal search strategy (Vector, Graph, or Community)
 
+🎯 **Tiny Learned Reranker (v2.0)** - Integrated Cross-Encoder model (`ms-marco-MiniLM-L-6-v2`) for ultra-precise re-ranking of top search results
+
 🎯 **Multi-Vector Support (since v1.7)** - Dual embeddings per entity: content-embedding for context, name-embedding for identification
 
 ⚡ **Semantic Caching (since v0.8.5)** - Two-level cache (L1 memory + L2 persistent) with semantic query matching
@@ -191,8 +193,10 @@ This tool compares strategies using a synthetic dataset and measures **Recall@K**
 | Method | Recall@10 | Avg Latency | Best For |
 | :--- | :--- | :--- | :--- |
 | **Graph-RAG** | **1.00** | **~32 ms** | Deep relational reasoning |
+| **Graph-RAG (Reranked)** | **1.00** | **~36 ms** | Maximum precision for relational data |
 | **Graph-Walking** | 1.00 | ~50 ms | Associative path exploration |
 | **Hybrid Search** | 1.00 | ~89 ms | Broad factual retrieval |
+| **Reranked Search** | 1.00 | ~20 ms* | Ultra-precise factual search (warm cache) |
 
 ## Architecture
 
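The reranked rows above are produced by the new `rerank` flag on the read actions. As a hypothetical illustration (not from the package itself), this is roughly what a rerank-enabled `search` request would look like when wrapped in a generic MCP `tools/call` body; only the argument shape (`query`, `limit`, `rerank`) follows the README, and how the server selects the `search` action is not shown in this diff:

```javascript
// Hypothetical invocation sketch: `buildToolCall` is a stand-in helper,
// not part of cozo-memory. Only the argument shape follows the README.
const searchArgs = {
  query: "broad factual retrieval example",
  limit: 10,
  rerank: true, // new in 1.1.0: cross-encoder pass over the top hits
};

function buildToolCall(name, args) {
  // Generic shape of an MCP tools/call request body.
  return { method: "tools/call", params: { name, arguments: args } };
}

const request = buildToolCall("query_memory", searchArgs);
```

The same flag applies unchanged to `advancedSearch`, `graph_rag`, and `agentic_search` arguments.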
@@ -608,14 +612,14 @@ PDF Ingestion via File Path:
 ### query_memory (Read)
 
 Actions:
-- `search`: `{ query, limit?, entity_types?, include_entities?, include_observations? }`
-- `advancedSearch`: `{ query, limit?, filters?, graphConstraints?, vectorOptions? }`
+- `search`: `{ query, limit?, entity_types?, include_entities?, include_observations?, rerank? }`
+- `advancedSearch`: `{ query, limit?, filters?, graphConstraints?, vectorOptions?, rerank? }`
 - `context`: `{ query, context_window?, time_range_hours? }`
 - `entity_details`: `{ entity_id, as_of? }`
 - `history`: `{ entity_id }`
-- `graph_rag`: `{ query, max_depth?, limit?, filters? }` Graph-based reasoning. Finds vector seeds (with inline filtering) first and then expands transitive relationships. Uses recursive Datalog for efficient BFS expansion.
+- `graph_rag`: `{ query, max_depth?, limit?, filters?, rerank? }` Graph-based reasoning. Finds vector seeds (with inline filtering) first and then expands transitive relationships. Uses recursive Datalog for efficient BFS expansion.
 - `graph_walking`: `{ query, start_entity_id?, max_depth?, limit? }` (v1.7) Recursive semantic graph search. Starts at vector seeds or a specific entity and follows relationships to other semantically relevant entities. Ideal for deeper path exploration.
-- `agentic_search`: `{ query, limit? }` **(New v2.0)**: **Auto-Routing Search**. Uses a local LLM (Ollama) to analyze query intent and automatically routes it to the most appropriate strategy (`vector_search`, `graph_walk`, or `community_summary`).
+- `agentic_search`: `{ query, limit?, rerank? }` **(New v2.0)**: **Auto-Routing Search**. Uses a local LLM (Ollama) to analyze query intent and automatically routes it to the most appropriate strategy (`vector_search`, `graph_walk`, or `community_summary`).
 - `get_relation_evolution`: `{ from_id, to_id?, since?, until? }` (in `analyze_graph`) Shows the temporal development of relationships, including a time-range filter and a diff summary.
 
 Important Details:
@@ -885,6 +889,15 @@ Uncertainty/Transparency:
 - Inference candidates are marked as `source: "inference"` and provide a short reason (uncertainty hint) in the result.
 - In `context` output, inferred entities additionally carry an `uncertainty_hint` so an LLM can distinguish "hard fact" vs. "conjecture".
 
+### Tiny Learned Reranker (Cross-Encoder)
+
+For maximum precision, CozoDB Memory integrates a specialized **Cross-Encoder Reranker** (Phase 2 RAG).
+
+- **Model**: `Xenova/ms-marco-MiniLM-L-6-v2` (local ONNX)
+- **Mechanism**: After initial hybrid retrieval, the top candidates (up to 30) are re-evaluated by the cross-encoder. Unlike bi-encoders (vectors), cross-encoders process query and document simultaneously, capturing deep semantic nuances.
+- **Latency**: Minimal overhead (~4-6 ms for the top 10 candidates).
+- **Supported Tools**: Available as a `rerank: true` parameter in `search`, `advancedSearch`, `graph_rag`, and `agentic_search`.
+
 ### Inference Engine
 
 Inference uses multiple strategies (non-persisting):
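The Phase-2 mechanism in the Tiny Learned Reranker section above (re-score the top candidates, then re-sort) can be sketched as follows. This is a runnable sketch, not the package's implementation: `scorePair` stands in for the actual cross-encoder call to `Xenova/ms-marco-MiniLM-L-6-v2` and is stubbed here with simple keyword overlap so the control flow runs without the model.

```javascript
// Stub scorer: in the real package this would be the cross-encoder's
// relevance score for the (query, document) pair.
async function scorePair(query, text) {
  const queryWords = new Set(query.toLowerCase().split(/\W+/));
  return text.toLowerCase().split(/\W+/).filter((w) => queryWords.has(w)).length;
}

// Re-score up to `topK` candidates from the initial retrieval, re-sort them
// by the new score, and keep the remaining tail in its original order.
async function rerank(query, candidates, topK = 30) {
  const head = candidates.slice(0, topK);
  const scored = await Promise.all(
    head.map(async (c) => ({ ...c, rerank_score: await scorePair(query, c.text) }))
  );
  scored.sort((a, b) => b.rerank_score - a.rerank_score);
  return [...scored, ...candidates.slice(topK)];
}
```

With a real cross-encoder, `scorePair` would tokenize query and document jointly and return the model's relevance logit; the surrounding flow stays the same, which is why the overhead is bounded by the candidate cap.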
package/dist/compare-embeddings.js
ADDED
@@ -0,0 +1,402 @@
+"use strict";
+var __createBinding = (this && this.__createBinding) || (Object.create ? (function(o, m, k, k2) {
+    if (k2 === undefined) k2 = k;
+    var desc = Object.getOwnPropertyDescriptor(m, k);
+    if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
+        desc = { enumerable: true, get: function() { return m[k]; } };
+    }
+    Object.defineProperty(o, k2, desc);
+}) : (function(o, m, k, k2) {
+    if (k2 === undefined) k2 = k;
+    o[k2] = m[k];
+}));
+var __setModuleDefault = (this && this.__setModuleDefault) || (Object.create ? (function(o, v) {
+    Object.defineProperty(o, "default", { enumerable: true, value: v });
+}) : function(o, v) {
+    o["default"] = v;
+});
+var __importStar = (this && this.__importStar) || (function () {
+    var ownKeys = function(o) {
+        ownKeys = Object.getOwnPropertyNames || function (o) {
+            var ar = [];
+            for (var k in o) if (Object.prototype.hasOwnProperty.call(o, k)) ar[ar.length] = k;
+            return ar;
+        };
+        return ownKeys(o);
+    };
+    return function (mod) {
+        if (mod && mod.__esModule) return mod;
+        var result = {};
+        if (mod != null) for (var k = ownKeys(mod), i = 0; i < k.length; i++) if (k[i] !== "default") __createBinding(result, mod, k[i]);
+        __setModuleDefault(result, mod);
+        return result;
+    };
+})();
+Object.defineProperty(exports, "__esModule", { value: true });
+require("dotenv/config");
+const embedding_service_1 = require("./embedding-service");
+const path = __importStar(require("path"));
+const fs = __importStar(require("fs"));
+// Test data - various scenarios
+const TEST_QUERIES = [
+    "What is machine learning?",
+    "How do neural networks work?",
+    "Explain quantum computing",
+    "What are the benefits of TypeScript?",
+    "How to optimize database queries?",
+    "Best practices for API design",
+    "Understanding distributed systems",
+    "Introduction to graph databases",
+    "Microservices architecture patterns",
+    "Cloud computing fundamentals"
+];
+const TEST_DOCUMENTS = [
+    "Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed.",
+    "Neural networks are computing systems inspired by biological neural networks that constitute animal brains. They consist of interconnected nodes or neurons.",
+    "Quantum computing uses quantum-mechanical phenomena such as superposition and entanglement to perform operations on data.",
+    "TypeScript is a strongly typed programming language that builds on JavaScript, giving you better tooling at any scale.",
+    "Database query optimization involves analyzing and improving query performance through indexing, query rewriting, and execution plan analysis.",
+    "API design best practices include using RESTful principles, proper versioning, clear documentation, and consistent error handling.",
+    "Distributed systems are computing systems whose components are located on different networked computers, which communicate and coordinate their actions.",
+    "Graph databases use graph structures with nodes, edges, and properties to represent and store data, ideal for connected data.",
+    "Microservices architecture is an approach to developing a single application as a suite of small services, each running in its own process.",
+    "Cloud computing delivers computing services including servers, storage, databases, networking, software, analytics, and intelligence over the Internet."
+];
+// Compute cosine similarity
+function cosineSimilarity(a, b) {
+    let dotProduct = 0;
+    let normA = 0;
+    let normB = 0;
+    for (let i = 0; i < a.length; i++) {
+        dotProduct += a[i] * b[i];
+        normA += a[i] * a[i];
+        normB += b[i] * b[i];
+    }
+    return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
+}
+// Test 1: Embedding speed
+async function testEmbeddingSpeed(service, modelName) {
+    console.log(`\n${'='.repeat(70)}`);
+    console.log(`TEST 1: Embedding Speed - ${modelName}`);
+    console.log('='.repeat(70));
+    const times = [];
+    // Warmup
+    await service.embed("warmup");
+    // Single embeddings
+    for (const query of TEST_QUERIES) {
+        const start = performance.now();
+        await service.embed(query);
+        const end = performance.now();
+        times.push(end - start);
+    }
+    const avgTime = times.reduce((a, b) => a + b, 0) / times.length;
+    const minTime = Math.min(...times);
+    const maxTime = Math.max(...times);
+    console.log(`\nSingle Embedding Performance:`);
+    console.log(`  Average: ${avgTime.toFixed(2)} ms`);
+    console.log(`  Min: ${minTime.toFixed(2)} ms`);
+    console.log(`  Max: ${maxTime.toFixed(2)} ms`);
+    return { avgTime, minTime, maxTime };
+}
+// Test 2: Batch performance
+async function testBatchPerformance(service, modelName) {
+    console.log(`\n${'='.repeat(70)}`);
+    console.log(`TEST 2: Batch Performance - ${modelName}`);
+    console.log('='.repeat(70));
+    const start = performance.now();
+    await service.embedBatch(TEST_DOCUMENTS);
+    const end = performance.now();
+    const totalTime = end - start;
+    const avgPerDoc = totalTime / TEST_DOCUMENTS.length;
+    console.log(`\nBatch Embedding (${TEST_DOCUMENTS.length} documents):`);
+    console.log(`  Total time: ${totalTime.toFixed(2)} ms`);
+    console.log(`  Avg per doc: ${avgPerDoc.toFixed(2)} ms`);
+    console.log(`  Throughput: ${(1000 / avgPerDoc).toFixed(2)} docs/sec`);
+    return { totalTime, avgPerDoc };
+}
+// Test 3: Semantic similarity quality
+async function testSemanticSimilarity(service, modelName) {
+    console.log(`\n${'='.repeat(70)}`);
+    console.log(`TEST 3: Semantic Similarity Quality - ${modelName}`);
+    console.log('='.repeat(70));
+    // Embed queries and documents
+    const queryEmbeddings = await service.embedBatch(TEST_QUERIES);
+    const docEmbeddings = await service.embedBatch(TEST_DOCUMENTS);
+    // Expected matches (query index -> document index)
+    const expectedMatches = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
+    let correctMatches = 0;
+    const similarities = [];
+    console.log(`\nTop Match for each Query:`);
+    for (let i = 0; i < TEST_QUERIES.length; i++) {
+        const queryEmb = queryEmbeddings[i];
+        // Calculate similarities with all documents
+        const sims = docEmbeddings.map((docEmb, idx) => ({
+            idx,
+            similarity: cosineSimilarity(queryEmb, docEmb)
+        }));
+        // Sort by similarity
+        sims.sort((a, b) => b.similarity - a.similarity);
+        const topMatch = sims[0];
+        const isCorrect = topMatch.idx === expectedMatches[i];
+        if (isCorrect)
+            correctMatches++;
+        similarities.push(topMatch.similarity);
+        console.log(`  Q${i}: "${TEST_QUERIES[i].substring(0, 40)}..."`);
+        console.log(`    → Doc ${topMatch.idx} (sim: ${topMatch.similarity.toFixed(4)}) ${isCorrect ? '✓' : '✗'}`);
+    }
+    const accuracy = (correctMatches / TEST_QUERIES.length) * 100;
+    const avgSimilarity = similarities.reduce((a, b) => a + b, 0) / similarities.length;
+    console.log(`\nResults:`);
+    console.log(`  Accuracy: ${accuracy.toFixed(1)}% (${correctMatches}/${TEST_QUERIES.length})`);
+    console.log(`  Avg Similarity: ${avgSimilarity.toFixed(4)}`);
+    return { accuracy, avgSimilarity, correctMatches };
+}
+// Test 4: Long context handling
+async function testLongContext(service, modelName) {
+    console.log(`\n${'='.repeat(70)}`);
+    console.log(`TEST 4: Long Context Handling - ${modelName}`);
+    console.log('='.repeat(70));
+    const shortText = "Machine learning is AI.";
+    const mediumText = TEST_DOCUMENTS[0]; // ~150 chars
+    const longText = TEST_DOCUMENTS.join(" "); // ~1000 chars
+    const veryLongText = longText.repeat(5); // ~5000 chars
+    const tests = [
+        { name: "Short (~20 chars)", text: shortText },
+        { name: "Medium (~150 chars)", text: mediumText },
+        { name: "Long (~1000 chars)", text: longText },
+        { name: "Very Long (~5000 chars)", text: veryLongText }
+    ];
+    console.log(`\nContext Length Performance:`);
+    const results = [];
+    for (const test of tests) {
+        const start = performance.now();
+        await service.embed(test.text);
+        const end = performance.now();
+        const time = end - start;
+        console.log(`  ${test.name.padEnd(25)} ${time.toFixed(2)} ms`);
+        results.push({ name: test.name, time });
+    }
+    return results;
+}
+// Test 5: Cache performance
+async function testCachePerformance(service, modelName) {
+    console.log(`\n${'='.repeat(70)}`);
+    console.log(`TEST 5: Cache Performance - ${modelName}`);
+    console.log('='.repeat(70));
+    const testText = "Test cache performance";
+    // First call (cold)
+    const start1 = performance.now();
+    await service.embed(testText);
+    const end1 = performance.now();
+    const coldTime = end1 - start1;
+    // Second call (cached)
+    const start2 = performance.now();
+    await service.embed(testText);
+    const end2 = performance.now();
+    const cachedTime = end2 - start2;
+    const speedup = coldTime / cachedTime;
+    console.log(`\nCache Hit Performance:`);
+    console.log(`  Cold (first call): ${coldTime.toFixed(2)} ms`);
+    console.log(`  Cached (second): ${cachedTime.toFixed(2)} ms`);
+    console.log(`  Speedup: ${speedup.toFixed(1)}x faster`);
+    const stats = service.getCacheStats();
+    console.log(`\nCache Statistics:`);
+    console.log(`  Size: ${stats.size}/${stats.maxSize}`);
+    console.log(`  Model: ${stats.model}`);
+    console.log(`  Dims: ${stats.dimensions}`);
+    return { coldTime, cachedTime, speedup };
+}
+// Test 6: Memory usage
+async function testMemoryUsage(service, modelName) {
+    console.log(`\n${'='.repeat(70)}`);
+    console.log(`TEST 6: Memory Usage - ${modelName}`);
+    console.log('='.repeat(70));
+    const memBefore = process.memoryUsage();
+    // Embed a batch
+    await service.embedBatch(TEST_DOCUMENTS);
+    const memAfter = process.memoryUsage();
+    const heapUsedMB = (memAfter.heapUsed - memBefore.heapUsed) / 1024 / 1024;
+    const rssMB = (memAfter.rss - memBefore.rss) / 1024 / 1024;
+    console.log(`\nMemory Usage (after batch embedding):`);
+    console.log(`  Heap Used: ${heapUsedMB.toFixed(2)} MB`);
+    console.log(`  RSS: ${rssMB.toFixed(2)} MB`);
+    console.log(`  Total Heap: ${(memAfter.heapTotal / 1024 / 1024).toFixed(2)} MB`);
+    return { heapUsedMB, rssMB };
+}
+// Main comparison function
+async function runSingleModelTest(modelId) {
+    console.log('\n' + '█'.repeat(70));
+    console.log(`TESTING MODEL: ${modelId}`);
+    console.log('█'.repeat(70));
+    const service = new embedding_service_1.EmbeddingService();
+    const results = {
+        model: modelId,
+        timestamp: new Date().toISOString()
+    };
+    // Run tests
+    results.speed = await testEmbeddingSpeed(service, modelId);
+    results.batch = await testBatchPerformance(service, modelId);
+    results.similarity = await testSemanticSimilarity(service, modelId);
+    results.longContext = await testLongContext(service, modelId);
+    results.cache = await testCachePerformance(service, modelId);
+    results.memory = await testMemoryUsage(service, modelId);
+    // Save results
+    const resultsPath = path.join(__dirname, '..', `embedding-results-${modelId.replace(/\//g, '-')}.json`);
+    fs.writeFileSync(resultsPath, JSON.stringify(results, null, 2));
+    console.log(`\n✓ Results saved to: ${resultsPath}`);
+    return results;
+}
+async function compareResults() {
+    console.log('\n' + '█'.repeat(70));
+    console.log('LOADING AND COMPARING RESULTS');
+    console.log('█'.repeat(70));
+    const bgeFile = path.join(__dirname, '..', 'embedding-results-Xenova-bge-m3.json');
+    const pplxFile = path.join(__dirname, '..', 'embedding-results-perplexity-ai-pplx-embed-v1-0.6b.json');
+    if (!fs.existsSync(bgeFile)) {
+        console.error('\n✗ BGE-M3 results not found!');
+        console.log('Please run: EMBEDDING_MODEL=Xenova/bge-m3 npm run compare-embeddings');
+        return;
+    }
+    if (!fs.existsSync(pplxFile)) {
+        console.error('\n✗ pplx-embed results not found!');
+        console.log('Please run: EMBEDDING_MODEL=perplexity-ai/pplx-embed-v1-0.6b npm run compare-embeddings');
+        return;
+    }
+    const bge = JSON.parse(fs.readFileSync(bgeFile, 'utf-8'));
+    const pplx = JSON.parse(fs.readFileSync(pplxFile, 'utf-8'));
+    // Print comparison summary
+    console.log(`\n\n${'█'.repeat(70)}`);
+    console.log('COMPARISON SUMMARY');
+    console.log('█'.repeat(70));
+    console.log(`\n1. EMBEDDING SPEED (lower is better)`);
+    console.log(`  BGE-M3:     ${bge.speed.avgTime.toFixed(2)} ms`);
+    console.log(`  pplx-embed: ${pplx.speed.avgTime.toFixed(2)} ms`);
+    const speedDiff = ((pplx.speed.avgTime - bge.speed.avgTime) / bge.speed.avgTime * 100);
+    console.log(`  Winner: ${bge.speed.avgTime < pplx.speed.avgTime ? 'BGE-M3' : 'pplx-embed'} (${Math.abs(speedDiff).toFixed(1)}% ${speedDiff > 0 ? 'slower' : 'faster'})`);
+    console.log(`\n2. BATCH THROUGHPUT (higher is better)`);
+    const bgeThroughput = 1000 / bge.batch.avgPerDoc;
+    const pplxThroughput = 1000 / pplx.batch.avgPerDoc;
+    console.log(`  BGE-M3:     ${bgeThroughput.toFixed(2)} docs/sec`);
+    console.log(`  pplx-embed: ${pplxThroughput.toFixed(2)} docs/sec`);
+    console.log(`  Winner: ${bgeThroughput > pplxThroughput ? 'BGE-M3' : 'pplx-embed'}`);
+    console.log(`\n3. SEMANTIC SIMILARITY ACCURACY (higher is better)`);
+    console.log(`  BGE-M3:     ${bge.similarity.accuracy.toFixed(1)}% (${bge.similarity.correctMatches}/${TEST_QUERIES.length})`);
+    console.log(`  pplx-embed: ${pplx.similarity.accuracy.toFixed(1)}% (${pplx.similarity.correctMatches}/${TEST_QUERIES.length})`);
+    console.log(`  Winner: ${bge.similarity.accuracy > pplx.similarity.accuracy ? 'BGE-M3' : 'pplx-embed'} ${pplx.similarity.accuracy > bge.similarity.accuracy ? '🏆' : ''}`);
+    console.log(`\n4. AVERAGE SIMILARITY SCORE (higher is better)`);
+    console.log(`  BGE-M3:     ${bge.similarity.avgSimilarity.toFixed(4)}`);
+    console.log(`  pplx-embed: ${pplx.similarity.avgSimilarity.toFixed(4)}`);
+    const simDiff = ((pplx.similarity.avgSimilarity - bge.similarity.avgSimilarity) / bge.similarity.avgSimilarity * 100);
+    console.log(`  Winner: ${bge.similarity.avgSimilarity > pplx.similarity.avgSimilarity ? 'BGE-M3' : 'pplx-embed'} (${Math.abs(simDiff).toFixed(1)}% ${simDiff > 0 ? 'higher' : 'lower'})`);
+    console.log(`\n5. CACHE SPEEDUP (higher is better)`);
+    console.log(`  BGE-M3:     ${bge.cache.speedup.toFixed(1)}x`);
+    console.log(`  pplx-embed: ${pplx.cache.speedup.toFixed(1)}x`);
+    console.log(`\n6. MEMORY USAGE (lower is better)`);
+    console.log(`  BGE-M3:     ${bge.memory.heapUsedMB.toFixed(2)} MB heap`);
+    console.log(`  pplx-embed: ${pplx.memory.heapUsedMB.toFixed(2)} MB heap`);
+    console.log(`  Winner: ${bge.memory.heapUsedMB < pplx.memory.heapUsedMB ? 'BGE-M3' : 'pplx-embed'}`);
+    // Score calculation
+    let bgeScore = 0;
+    let pplxScore = 0;
+    if (bge.speed.avgTime < pplx.speed.avgTime)
+        bgeScore++;
+    else
+        pplxScore++;
+    if (bgeThroughput > pplxThroughput)
+        bgeScore++;
+    else
+        pplxScore++;
+    if (bge.similarity.accuracy > pplx.similarity.accuracy)
+        bgeScore++;
+    else
+        pplxScore++;
+    if (bge.similarity.avgSimilarity > pplx.similarity.avgSimilarity)
+        bgeScore++;
+    else
+        pplxScore++;
+    if (bge.memory.heapUsedMB < pplx.memory.heapUsedMB)
+        bgeScore++;
+    else
+        pplxScore++;
+    // Overall winner
+    console.log(`\n${'='.repeat(70)}`);
+    console.log('OVERALL SCORE');
+    console.log('='.repeat(70));
+    console.log(`\n  BGE-M3:     ${bgeScore}/5 wins`);
+    console.log(`  pplx-embed: ${pplxScore}/5 wins`);
+    console.log(`\n${'='.repeat(70)}`);
+    console.log('RECOMMENDATION');
+    console.log('='.repeat(70));
+    if (pplxScore > bgeScore) {
+        console.log(`\n✓ pplx-embed-v1-0.6b is RECOMMENDED 🏆`);
+        console.log(`  Reasons:`);
+        if (pplx.similarity.accuracy > bge.similarity.accuracy) {
+            console.log(`  ✓ Better semantic similarity accuracy (+${(pplx.similarity.accuracy - bge.similarity.accuracy).toFixed(1)}%)`);
+        }
+        if (pplx.similarity.avgSimilarity > bge.similarity.avgSimilarity) {
+            console.log(`  ✓ Higher quality embeddings (+${(simDiff).toFixed(1)}%)`);
+        }
+        console.log(`  ✓ 32K context length (vs 8K for BGE-M3)`);
+        console.log(`  ✓ Better MTEB benchmark scores`);
+    }
+    else if (bgeScore > pplxScore) {
+        console.log(`\n✓ BGE-M3 is RECOMMENDED 🏆`);
+        console.log(`  Reasons:`);
+        if (bge.speed.avgTime < pplx.speed.avgTime) {
+            console.log(`  ✓ Faster embedding speed (-${Math.abs(speedDiff).toFixed(1)}%)`);
+        }
+        if (bge.memory.heapUsedMB < pplx.memory.heapUsedMB) {
+            console.log(`  ✓ Lower memory usage`);
+        }
+        console.log(`  ✓ Automatic download (no manual setup)`);
+        console.log(`  ✓ Proven stability`);
+    }
+    else {
+        console.log(`\n⚖ BOTH MODELS ARE EQUALLY COMPETITIVE`);
+        console.log(`  Choose based on your priorities:`);
+        console.log(`  - pplx-embed: Better quality, longer context (32K)`);
+        console.log(`  - BGE-M3: Faster, easier setup, automatic download`);
+    }
+    console.log('\n' + '█'.repeat(70));
+    console.log('COMPARISON COMPLETE');
+    console.log('█'.repeat(70) + '\n');
+}
+// Main entry point
+async function main() {
+    const currentModel = process.env.EMBEDDING_MODEL || "Xenova/bge-m3";
+    // Check if we should compare existing results
+    const args = process.argv.slice(2);
+    if (args.includes('--compare')) {
+        await compareResults();
+        return;
+    }
+    console.log('\n' + '█'.repeat(70));
+    console.log('EMBEDDING MODEL BENCHMARK');
+    console.log('█'.repeat(70));
+    console.log(`\nCurrent model: ${currentModel}`);
+    console.log('\nThis will run a comprehensive test suite including:');
+    console.log('  1. Embedding speed');
+    console.log('  2. Batch performance');
+    console.log('  3. Semantic similarity quality');
+    console.log('  4. Long context handling');
+    console.log('  5. Cache performance');
+    console.log('  6. Memory usage');
+    console.log('\nEstimated time: 2-3 minutes\n');
+    await runSingleModelTest(currentModel);
+    console.log('\n' + '='.repeat(70));
+    console.log('NEXT STEPS');
+    console.log('='.repeat(70));
+    if (currentModel === 'Xenova/bge-m3') {
+        console.log('\nTo test pplx-embed, run:');
+        console.log('  EMBEDDING_MODEL=perplexity-ai/pplx-embed-v1-0.6b npm run compare-embeddings');
+    }
+    else if (currentModel === 'perplexity-ai/pplx-embed-v1-0.6b') {
+        console.log('\nTo test BGE-M3, run:');
+        console.log('  EMBEDDING_MODEL=Xenova/bge-m3 npm run compare-embeddings');
+    }
+    console.log('\nTo compare both results, run:');
+    console.log('  npm run compare-embeddings -- --compare');
+    console.log();
+}
+// Run
+main().catch(console.error);
package/dist/download-pplx-embed.js
ADDED
@@ -0,0 +1,151 @@
+"use strict";
+var __createBinding = (this && this.__createBinding) || (Object.create ? (function(o, m, k, k2) {
+    if (k2 === undefined) k2 = k;
+    var desc = Object.getOwnPropertyDescriptor(m, k);
+    if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
+        desc = { enumerable: true, get: function() { return m[k]; } };
+    }
+    Object.defineProperty(o, k2, desc);
+}) : (function(o, m, k, k2) {
+    if (k2 === undefined) k2 = k;
+    o[k2] = m[k];
+}));
+var __setModuleDefault = (this && this.__setModuleDefault) || (Object.create ? (function(o, v) {
+    Object.defineProperty(o, "default", { enumerable: true, value: v });
+}) : function(o, v) {
+    o["default"] = v;
+});
+var __importStar = (this && this.__importStar) || (function () {
+    var ownKeys = function(o) {
+        ownKeys = Object.getOwnPropertyNames || function (o) {
+            var ar = [];
+            for (var k in o) if (Object.prototype.hasOwnProperty.call(o, k)) ar[ar.length] = k;
+            return ar;
+        };
+        return ownKeys(o);
+    };
+    return function (mod) {
+        if (mod && mod.__esModule) return mod;
+        var result = {};
+        if (mod != null) for (var k = ownKeys(mod), i = 0; i < k.length; i++) if (k[i] !== "default") __createBinding(result, mod, k[i]);
+        __setModuleDefault(result, mod);
+        return result;
+    };
+})();
+Object.defineProperty(exports, "__esModule", { value: true });
+require("dotenv/config");
+const https = __importStar(require("https"));
+const fs = __importStar(require("fs"));
+const path = __importStar(require("path"));
+const transformers_1 = require("@xenova/transformers");
+// Configure cache path
+const CACHE_DIR = path.resolve('./.cache');
+transformers_1.env.cacheDir = CACHE_DIR;
+const MODEL_ID = "perplexity-ai/pplx-embed-v1-0.6b";
+const BASE_URL = `https://huggingface.co/${MODEL_ID}/resolve/main/onnx`;
+// Model variant to download (quantized is recommended for smaller size)
+const USE_QUANTIZED = true; // Set to false for FP32 full precision
+// Files to download based on variant
+const FILES = USE_QUANTIZED ? [
+    { name: 'model_quantized.onnx', size: '614 KB' },
+    { name: 'model_quantized.onnx_data', size: '706 MB' }
+] : [
+    { name: 'model.onnx', size: '520 KB' },
+    { name: 'model.onnx_data', size: '2.09 GB' },
+    { name: 'model.onnx_data_1', size: '306 MB' }
+];
+// Target directory
+const targetDir = path.join(CACHE_DIR, 'perplexity-ai', 'pplx-embed-v1-0.6b', 'onnx');
+function downloadFile(url, dest) {
+    return new Promise((resolve, reject) => {
+        const file = fs.createWriteStream(dest);
+        https.get(url, (response) => {
+            if (response.statusCode === 302 || response.statusCode === 301) {
+                // Follow redirect
+                const redirectUrl = response.headers.location;
+                if (redirectUrl) {
+                    file.close();
+                    fs.unlinkSync(dest);
+                    return downloadFile(redirectUrl, dest).then(resolve).catch(reject);
+                }
+            }
+            const totalSize = parseInt(response.headers['content-length'] || '0', 10);
+            let downloadedSize = 0;
+            response.on('data', (chunk) => {
+                downloadedSize += chunk.length;
+                const progress = ((downloadedSize / totalSize) * 100).toFixed(2);
+                process.stdout.write(`\r  Progress: ${progress}% (${(downloadedSize / 1024 / 1024).toFixed(2)} MB / ${(totalSize / 1024 / 1024).toFixed(2)} MB)`);
+            });
+            response.pipe(file);
+            file.on('finish', () => {
+                file.close();
+                console.log('\n  ✓ Download complete');
+                resolve();
+            });
+        }).on('error', (err) => {
+            fs.unlinkSync(dest);
+            reject(err);
+        });
+    });
+}
+async function downloadPplxEmbed() {
+    console.log('='.repeat(70));
+    console.log('Downloading Perplexity pplx-embed-v1-0.6b ONNX files');
+    console.log(`Variant: ${USE_QUANTIZED ? 'INT8 Quantized (Recommended)' : 'FP32 Full Precision'}`);
+    console.log('='.repeat(70));
+    console.log();
+    // Create target directory
+    if (!fs.existsSync(targetDir)) {
+        fs.mkdirSync(targetDir, { recursive: true });
+        console.log(`✓ Created directory: ${targetDir}`);
+    }
+    console.log();
+    console.log('Files to download:');
+    FILES.forEach(f => console.log(`  - ${f.name} (${f.size})`));
+    console.log();
+    console.log(`Total size: ${USE_QUANTIZED ? '~706 MB' : '~2.5 GB'}`);
+    console.log(`This may take ${USE_QUANTIZED ? '3-10' : '10-30'} minutes depending on your internet connection.`);
+    console.log();
+    if (USE_QUANTIZED) {
+        console.log('ℹ Using INT8 quantized model (recommended):');
+        console.log('  ✓ 3x smaller than FP32 (~706 MB vs ~2.4 GB)');
+        console.log('  ✓ Minimal quality loss (~1.5% MTEB drop)');
+        console.log('  ✓ Faster inference');
+        console.log();
+        console.log('  To use FP32 instead, edit src/download-pplx-embed.ts');
+        console.log('  and set USE_QUANTIZED = false');
+        console.log();
+    }
+    // Download each file
+    for (const file of FILES) {
+        const filePath = path.join(targetDir, file.name);
+        // Skip if already exists
+        if (fs.existsSync(filePath)) {
+            console.log(`⊘ Skipping ${file.name} (already exists)`);
+            continue;
+        }
+        console.log(`⬇ Downloading ${file.name} (${file.size})...`);
+        const url = `${BASE_URL}/${file.name}`;
+        try {
+            await downloadFile(url, filePath);
+        }
+        catch (error) {
+            console.error(`✗ Failed to download ${file.name}:`, error.message);
+            console.error('  Please download manually from:');
|
135
|
+
console.error(` ${url}`);
|
|
136
|
+
process.exit(1);
|
|
137
|
+
}
|
|
138
|
+
}
|
|
139
|
+
console.log();
|
|
140
|
+
console.log('='.repeat(70));
|
|
141
|
+
console.log('✓ All files downloaded successfully!');
|
|
142
|
+
console.log('='.repeat(70));
|
|
143
|
+
console.log();
|
|
144
|
+
console.log('You can now use the model by setting in .env:');
|
|
145
|
+
console.log(' EMBEDDING_MODEL=perplexity-ai/pplx-embed-v1-0.6b');
|
|
146
|
+
console.log();
|
|
147
|
+
console.log('Then start the server with:');
|
|
148
|
+
console.log(' npm run start');
|
|
149
|
+
console.log();
|
|
150
|
+
}
|
|
151
|
+
downloadPplxEmbed().catch(console.error);
|
|
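For readers skimming the diff, the progress string that `downloadFile()` writes to stdout can be sketched in isolation. `progressLine` below is an illustrative helper, not part of the package; it reproduces the same `toFixed(2)` percentage and MB arithmetic as the added code.

```javascript
// Illustrative sketch of the progress line in downloadFile() above.
// `progressLine` is a hypothetical helper name, not in the package.
function progressLine(downloadedSize, totalSize) {
  const progress = ((downloadedSize / totalSize) * 100).toFixed(2);
  const mb = (bytes) => (bytes / 1024 / 1024).toFixed(2);
  return `Progress: ${progress}% (${mb(downloadedSize)} MB / ${mb(totalSize)} MB)`;
}

console.log(progressLine(524288, 1048576)); // → Progress: 50.00% (0.50 MB / 1.00 MB)
```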
package/dist/embedding-service.js
CHANGED
@@ -94,9 +94,16 @@ class EmbeddingService {
     tokenizer = null;
     modelId;
     dimensions;
+    useOllama;
+    ollamaModel;
+    ollamaBaseUrl;
     queue = Promise.resolve();
     constructor() {
         this.cache = new LRUCache(1000, 3600000); // 1000 entries, 1h TTL
+        // Check if Ollama should be used
+        this.useOllama = process.env.USE_OLLAMA === 'true';
+        this.ollamaModel = process.env.OLLAMA_EMBEDDING_MODEL || 'argus-ai/pplx-embed-v1-0.6b:q8_0';
+        this.ollamaBaseUrl = process.env.OLLAMA_BASE_URL || 'http://localhost:11434';
         // Support multiple embedding models via environment variable
         this.modelId = process.env.EMBEDDING_MODEL || "Xenova/bge-m3";
         // Set dimensions based on model
@@ -106,9 +113,20 @@ class EmbeddingService {
             "Xenova/bge-small-en-v1.5": 384,
             "Xenova/nomic-embed-text-v1": 768,
             "onnx-community/Qwen3-Embedding-0.6B-ONNX": 1024,
+            // Note: perplexity-ai models require manual ONNX file placement
+            // See PPLX_EMBED_INTEGRATION.md for instructions
+            "perplexity-ai/pplx-embed-v1-0.6b": 1024,
+            "perplexity-ai/pplx-embed-v1-4b": 2560,
+            // Ollama models
+            "argus-ai/pplx-embed-v1-0.6b:q8_0": 1024,
         };
-        this.dimensions = dimensionMap[this.modelId] || 1024;
-
+        this.dimensions = dimensionMap[this.useOllama ? this.ollamaModel : this.modelId] || 1024;
+        if (this.useOllama) {
+            console.error(`[EmbeddingService] Using Ollama: ${this.ollamaModel} @ ${this.ollamaBaseUrl} (${this.dimensions} dimensions)`);
+        }
+        else {
+            console.error(`[EmbeddingService] Using ONNX model: ${this.modelId} (${this.dimensions} dimensions)`);
+        }
     }
     // Public getter for dimensions
     getDimensions() {
@@ -125,6 +143,11 @@ class EmbeddingService {
     async init() {
         if (this.session && this.tokenizer)
             return;
+        // Skip ONNX initialization if using Ollama
+        if (this.useOllama) {
+            console.error('[EmbeddingService] Using Ollama backend, skipping ONNX initialization');
+            return;
+        }
         try {
             // 1. Check if model needs to be downloaded
             // Extract namespace and model name from modelId (e.g., "Xenova/bge-m3" or "onnx-community/Qwen3-Embedding-0.6B-ONNX")
@@ -139,10 +162,23 @@ class EmbeddingService {
             if (!fs.existsSync(fp32Path) && !fs.existsSync(quantizedPath)) {
                 console.log(`[EmbeddingService] Model not found, downloading ${this.modelId}...`);
                 console.log(`[EmbeddingService] This may take a few minutes on first run.`);
-                //
-
-
-
+                // Check if this is a Xenova-compatible model
+                if (namespace === 'Xenova' || namespace === 'onnx-community') {
+                    // Import AutoModel dynamically to trigger download
+                    const { AutoModel } = await import("@xenova/transformers");
+                    await AutoModel.from_pretrained(this.modelId, { quantized: false });
+                    console.log(`[EmbeddingService] Model download completed.`);
+                }
+                else {
+                    // For non-Xenova models (like perplexity-ai), provide manual download instructions
+                    console.error(`[EmbeddingService] ERROR: Model ${this.modelId} is not available via @xenova/transformers`);
+                    console.error(`[EmbeddingService] Please download the model manually:`);
+                    console.error(`[EmbeddingService] 1. Visit: https://huggingface.co/${this.modelId}`);
+                    console.error(`[EmbeddingService] 2. Download the 'onnx' folder contents`);
+                    console.error(`[EmbeddingService] 3. Place files in: ${baseDir}`);
+                    console.error(`[EmbeddingService] See PPLX_EMBED_INTEGRATION.md for detailed instructions`);
+                    throw new Error(`Model ${this.modelId} requires manual download. See error messages above.`);
+                }
             }
             // 2. Load Tokenizer
             if (!this.tokenizer) {
@@ -188,6 +224,10 @@ class EmbeddingService {
             return cached;
         }
         try {
+            // Use Ollama if enabled
+            if (this.useOllama) {
+                return await this.embedWithOllama(textStr);
+            }
             await this.init();
             if (!this.session || !this.tokenizer)
                 throw new Error("Session/Tokenizer not initialized");
@@ -240,6 +280,37 @@ class EmbeddingService {
             }
         });
     }
+    async embedWithOllama(text) {
+        try {
+            const response = await fetch(`${this.ollamaBaseUrl}/api/embeddings`, {
+                method: 'POST',
+                headers: {
+                    'Content-Type': 'application/json',
+                },
+                body: JSON.stringify({
+                    model: this.ollamaModel,
+                    prompt: text,
+                }),
+            });
+            if (!response.ok) {
+                throw new Error(`Ollama API error: ${response.status} ${response.statusText}`);
+            }
+            const data = await response.json();
+            if (!data.embedding || !Array.isArray(data.embedding)) {
+                throw new Error('Invalid response from Ollama API');
+            }
+            const embedding = data.embedding;
+            // Normalize the embedding
+            const normalized = this.normalize(embedding);
+            // Cache it
+            this.cache.set(text, normalized);
+            return normalized;
+        }
+        catch (error) {
+            console.error(`[EmbeddingService] Ollama error for "${text.substring(0, 20)}...":`, error?.message || error);
+            return new Array(this.dimensions).fill(0);
+        }
+    }
     // Batch-Embeddings
     async embedBatch(texts) {
         // For now, process sequentially via serialized queue to avoid overloading
@@ -306,7 +377,8 @@ class EmbeddingService {
         return {
             size: this.cache.size(),
             maxSize: 1000,
-            model: this.modelId,
+            model: this.useOllama ? this.ollamaModel : this.modelId,
+            backend: this.useOllama ? 'ollama' : 'onnx',
             dimensions: this.dimensions
         };
     }
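The new `embedWithOllama` path normalizes the raw vector returned by Ollama before caching it. The service's own `normalize()` is not shown in this diff; the sketch below assumes it is standard L2 (unit-length) normalization, which is what cosine-similarity search expects.

```javascript
// Assumed-equivalent sketch of the normalize() call in embedWithOllama().
// L2 normalization: scale the vector to unit length; zero vectors pass through.
function l2Normalize(vec) {
  const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? vec : vec.map(v => v / norm);
}

console.log(l2Normalize([3, 4])); // → [ 0.6, 0.8 ]
```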
package/dist/eval-suite.js
CHANGED
@@ -88,10 +88,14 @@ async function runEvaluation() {
     }
     const server = new index_1.MemoryServer(EVAL_DB_PATH);
     await server.embeddingService.embed("warmup");
+    // Warmup reranker
+    await server.hybridSearch.advancedSearch({ query: "warmup", limit: 1, rerank: true });
     await setupEvalData(server);
     const methods = [
         { name: "Hybrid Search", func: (q) => server.hybridSearch.search({ query: q, limit: 10 }) },
+        { name: "Reranked Search", func: (q) => server.hybridSearch.search({ query: q, limit: 10, rerank: true }) },
         { name: "Graph-RAG", func: (q) => server.hybridSearch.graphRag({ query: q, limit: 10, graphConstraints: { maxDepth: 2 } }) },
+        { name: "Graph-RAG (Reranked)", func: (q) => server.hybridSearch.graphRag({ query: q, limit: 10, graphConstraints: { maxDepth: 2 }, rerank: true }) },
         { name: "Graph-Walking", func: (q) => server.graph_walking({ query: q, limit: 10, max_depth: 3 }) }
     ];
     const summary = [];
@@ -101,6 +105,9 @@ async function runEvaluation() {
         let totalRecall10 = 0;
         let totalMRR = 0;
         let totalLatency = 0;
+        const n = EVAL_DATASET.length;
+        // Reset cache between methods to get accurate latency
+        await server.hybridSearch.clearCache();
         for (const task of EVAL_DATASET) {
             const t0 = perf_hooks_1.performance.now();
             const results = await method.func(task.query);
@@ -113,7 +120,6 @@ async function runEvaluation() {
             totalMRR += mrr;
             totalLatency += (t1 - t0);
         }
-        const n = EVAL_DATASET.length;
         summary.push({
             Method: method.name,
             "Recall@3": (totalRecall3 / n).toFixed(3),
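The eval suite reports Recall@K and MRR per method. Its actual metric helpers are not part of this diff; the sketch below shows the standard definitions with hypothetical function names, for orientation only.

```javascript
// Standard Recall@K / MRR definitions (hypothetical helper names; the
// suite's own implementation is not shown in this diff).
function recallAtK(rankedIds, relevantIds, k) {
  const top = new Set(rankedIds.slice(0, k));
  const hits = relevantIds.filter(id => top.has(id)).length;
  return hits / relevantIds.length;
}

function mrr(rankedIds, relevantIds) {
  const idx = rankedIds.findIndex(id => relevantIds.includes(id));
  return idx === -1 ? 0 : 1 / (idx + 1);
}

console.log(recallAtK(['a', 'b', 'c', 'd'], ['b', 'd'], 3)); // → 0.5
console.log(mrr(['a', 'b', 'c', 'd'], ['b', 'd']));          // → 0.5
```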
package/dist/hybrid-search.js
CHANGED
@@ -5,15 +5,18 @@ var __importDefault = (this && this.__importDefault) || function (mod) {
 Object.defineProperty(exports, "__esModule", { value: true });
 exports.HybridSearch = void 0;
 const crypto_1 = __importDefault(require("crypto"));
+const reranker_service_1 = require("./reranker-service");
 const SEMANTIC_CACHE_THRESHOLD = 0.95;
 class HybridSearch {
     db;
     embeddingService;
+    rerankerService;
     searchCache = new Map();
     CACHE_TTL = 300000; // 5 minutes cache
     constructor(db, embeddingService) {
         this.db = db;
         this.embeddingService = embeddingService;
+        this.rerankerService = new reranker_service_1.RerankerService();
     }
     getCacheKey(options) {
         const str = JSON.stringify({
@@ -75,6 +78,36 @@ class HybridSearch {
             return { ...r, score };
         });
     }
+    async applyReranking(query, results) {
+        if (results.length <= 1)
+            return results;
+        console.error(`[HybridSearch] Reranking ${results.length} candidates...`);
+        const documents = results.map(r => {
+            const parts = [
+                r.name ? `Name: ${r.name}` : '',
+                r.type ? `Type: ${r.type}` : '',
+                r.text ? `Description: ${r.text}` : '',
+                r.metadata ? `Details: ${JSON.stringify(r.metadata)}` : ''
+            ].filter(p => p !== '');
+            return parts.join(' | ');
+        });
+        try {
+            const rerankedOrder = await this.rerankerService.rerank(query, documents);
+            return rerankedOrder.map((item, i) => {
+                const original = results[item.index];
+                return {
+                    ...original,
+                    score: (item.score + 1.0) / 2.0, // Normalize to 0-1 range if it's logits, or just use as is
+                    explanation: (typeof original.explanation === 'string' ? original.explanation : JSON.stringify(original.explanation)) +
+                        ` | Reranked (Rank ${i + 1}, Cross-Encoder Score: ${item.score.toFixed(4)})`
+                };
+            });
+        }
+        catch (e) {
+            console.error(`[HybridSearch] Reranking failed, returning original results:`, e);
+            return results;
+        }
+    }
     async advancedSearch(options) {
         console.error("[HybridSearch] Starting advancedSearch with options:", JSON.stringify(options, null, 2));
         const { query, limit = 10, filters, graphConstraints, vectorParams } = options;
@@ -212,6 +245,12 @@ class HybridSearch {
             });
         }
         const finalResults = this.applyTimeDecay(searchResults);
+        // Phase 3: Reranking
+        if (options.rerank) {
+            const rerankedResults = await this.applyReranking(options.query, finalResults);
+            await this.updateCache(options, queryEmbedding, rerankedResults);
+            return rerankedResults;
+        }
         await this.updateCache(options, queryEmbedding, finalResults);
         return finalResults;
     }
@@ -330,7 +369,11 @@ class HybridSearch {
                     return Object.entries(filters.metadata).every(([key, val]) => r.metadata[key] === val);
                 });
             }
-
+            const decayedResults = this.applyTimeDecay(searchResults);
+            if (options.rerank) {
+                return await this.applyReranking(options.query, decayedResults);
+            }
+            return decayedResults;
         }
         catch (e) {
             console.error("[HybridSearch] Error in graphRag:", e.message);
@@ -414,7 +457,8 @@ No markdown, no explanation. Just the JSON.`;
             filters: {
                 ...options.filters,
                 entityTypes: ["CommunitySummary"]
-            }
+            },
+            rerank: options.rerank
         });
         // If no community summaries found, fallback to standard search
         if (results.length === 0) {
@@ -437,5 +481,15 @@ No markdown, no explanation. Just the JSON.`;
             }
         }));
     }
+    async clearCache() {
+        this.searchCache.clear();
+        try {
+            await this.db.run(`{ ?[query_hash] := *search_cache{query_hash} :rm search_cache {query_hash} }`);
+            console.error("[HybridSearch] Cache cleared successfully.");
+        }
+        catch (e) {
+            console.error("[HybridSearch] Error clearing cache:", e);
+        }
+    }
 }
 exports.HybridSearch = HybridSearch;
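The core move in `applyReranking` is mapping the reranker's sorted `[{ index, score }]` output back onto the original result objects, while rescaling a cross-encoder score into 0-1 via `(score + 1.0) / 2.0`. A minimal standalone sketch of that mapping (hypothetical `reorder` helper, stripped of the explanation bookkeeping):

```javascript
// Sketch of the index-remapping step in applyReranking() (helper name is
// illustrative, not in the package).
function reorder(results, rerankedOrder) {
  return rerankedOrder.map(item => ({
    ...results[item.index],
    // Rescale a [-1, 1]-style cross-encoder score into [0, 1]
    score: (item.score + 1.0) / 2.0,
  }));
}

const results = [{ id: 'a' }, { id: 'b' }, { id: 'c' }];
const order = [{ index: 2, score: 1.0 }, { index: 0, score: 0.0 }, { index: 1, score: -1.0 }];
console.log(reorder(results, order).map(r => r.id)); // → [ 'c', 'a', 'b' ]
```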
package/dist/index.js
CHANGED
@@ -2631,6 +2631,7 @@ Validation: Invalid syntax or missing columns in inference rules will result in
             entity_types: zod_1.z.array(zod_1.z.string()).optional().describe("Filter by entity types"),
             include_entities: zod_1.z.boolean().optional().default(true).describe("Include entities in search"),
             include_observations: zod_1.z.boolean().optional().default(true).describe("Include observations in search"),
+            rerank: zod_1.z.boolean().optional().default(false).describe("Use Cross-Encoder reranking for higher precision"),
         }),
         zod_1.z.object({
             action: zod_1.z.literal("advancedSearch"),
@@ -2658,6 +2659,7 @@ Validation: Invalid syntax or missing columns in inference rules will result in
             vectorParams: zod_1.z.object({
                 efSearch: zod_1.z.number().optional().describe("HNSW search precision"),
             }).optional().describe("Vector parameters"),
+            rerank: zod_1.z.boolean().optional().default(false).describe("Use Cross-Encoder reranking for higher precision"),
         }),
         zod_1.z.object({
             action: zod_1.z.literal("context"),
@@ -2679,6 +2681,7 @@ Validation: Invalid syntax or missing columns in inference rules will result in
             query: zod_1.z.string().describe("Search query for initial vector seeds"),
             max_depth: zod_1.z.number().min(1).max(3).optional().default(2).describe("Maximum depth of graph expansion (Default: 2)"),
             limit: zod_1.z.number().optional().default(10).describe("Number of initial vector seeds"),
+            rerank: zod_1.z.boolean().optional().default(false).describe("Use Cross-Encoder reranking for higher precision"),
         }),
         zod_1.z.object({
             action: zod_1.z.literal("graph_walking"),
@@ -2691,6 +2694,7 @@ Validation: Invalid syntax or missing columns in inference rules will result in
             action: zod_1.z.literal("agentic_search"),
             query: zod_1.z.string().describe("Context query for agentic routing"),
             limit: zod_1.z.number().optional().default(10).describe("Maximum number of results"),
+            rerank: zod_1.z.boolean().optional().default(false).describe("Use Cross-Encoder reranking for higher precision"),
         }),
     ]);
     const QueryMemoryParameters = zod_1.z.object({
@@ -2711,6 +2715,7 @@ Validation: Invalid syntax or missing columns in inference rules will result in
         as_of: zod_1.z.string().optional().describe("Only for entity_details: ISO string or 'NOW'"),
         max_depth: zod_1.z.number().optional().describe("Only for graph_rag/graph_walking: Maximum expansion depth"),
         start_entity_id: zod_1.z.string().optional().describe("Only for graph_walking: Start entity"),
+        rerank: zod_1.z.boolean().optional().describe("Only for search/advancedSearch/agentic_search: Enable Cross-Encoder reranking"),
     });
     this.mcp.addTool({
         name: "query_memory",
@@ -2745,6 +2750,7 @@ Notes: 'agentic_search' is the most powerful and adaptable, 'context' is ideal f
                 entityTypes: input.entity_types,
                 includeEntities: input.include_entities,
                 includeObservations: input.include_observations,
+                rerank: input.rerank,
             });
             const conflictEntityIds = Array.from(new Set(results
                 .map((r) => (r.name ? r.id : r.entity_id))
@@ -2776,6 +2782,7 @@ Notes: 'agentic_search' is the most powerful and adaptable, 'context' is ideal f
                 filters: input.filters,
                 graphConstraints: input.graphConstraints,
                 vectorParams: input.vectorParams,
+                rerank: input.rerank,
             });
             const conflictEntityIds = Array.from(new Set(results
                 .map((r) => (r.name ? r.id : r.entity_id))
@@ -2888,6 +2895,7 @@ Notes: 'agentic_search' is the most powerful and adaptable, 'context' is ideal f
             const results = await this.hybridSearch.agenticRetrieve({
                 query: input.query,
                 limit: input.limit,
+                rerank: input.rerank,
             });
             return JSON.stringify(results);
         }
@@ -2900,7 +2908,8 @@ Notes: 'agentic_search' is the most powerful and adaptable, 'context' is ideal f
                 limit: input.limit,
                 graphConstraints: {
                     maxDepth: input.max_depth
-                }
+                },
+                rerank: input.rerank,
             });
             return JSON.stringify(results);
         }
package/dist/reranker-service.js
ADDED
@@ -0,0 +1,125 @@
+"use strict";
+var __createBinding = (this && this.__createBinding) || (Object.create ? (function(o, m, k, k2) {
+    if (k2 === undefined) k2 = k;
+    var desc = Object.getOwnPropertyDescriptor(m, k);
+    if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
+        desc = { enumerable: true, get: function() { return m[k]; } };
+    }
+    Object.defineProperty(o, k2, desc);
+}) : (function(o, m, k, k2) {
+    if (k2 === undefined) k2 = k;
+    o[k2] = m[k];
+}));
+var __setModuleDefault = (this && this.__setModuleDefault) || (Object.create ? (function(o, v) {
+    Object.defineProperty(o, "default", { enumerable: true, value: v });
+}) : function(o, v) {
+    o["default"] = v;
+});
+var __importStar = (this && this.__importStar) || (function () {
+    var ownKeys = function(o) {
+        ownKeys = Object.getOwnPropertyNames || function (o) {
+            var ar = [];
+            for (var k in o) if (Object.prototype.hasOwnProperty.call(o, k)) ar[ar.length] = k;
+            return ar;
+        };
+        return ownKeys(o);
+    };
+    return function (mod) {
+        if (mod && mod.__esModule) return mod;
+        var result = {};
+        if (mod != null) for (var k = ownKeys(mod), i = 0; i < k.length; i++) if (k[i] !== "default") __createBinding(result, mod, k[i]);
+        __setModuleDefault(result, mod);
+        return result;
+    };
+})();
+Object.defineProperty(exports, "__esModule", { value: true });
+exports.RerankerService = void 0;
+const transformers_1 = require("@xenova/transformers");
+const path = __importStar(require("path"));
+const fs = __importStar(require("fs"));
+// Robust path to project root
+const PROJECT_ROOT = path.resolve(__dirname, '..');
+const CACHE_DIR = path.resolve(PROJECT_ROOT, '.cache');
+transformers_1.env.cacheDir = CACHE_DIR;
+transformers_1.env.allowLocalModels = true;
+class RerankerService {
+    pipe = null;
+    modelId;
+    initialized = false;
+    constructor() {
+        // Using a tiny but effective cross-encoder
+        this.modelId = process.env.RERANKER_MODEL || "Xenova/ms-marco-MiniLM-L-6-v2";
+        console.error(`[RerankerService] Using model: ${this.modelId}`);
+    }
+    async init() {
+        if (this.initialized)
+            return;
+        try {
+            // Check if model exists locally in cache
+            const parts = this.modelId.split('/');
+            const namespace = parts[0];
+            const modelName = parts[1];
+            const modelDir = path.join(CACHE_DIR, namespace, modelName);
+            if (!fs.existsSync(modelDir)) {
+                console.log(`[RerankerService] Model not found, downloading ${this.modelId}...`);
+            }
+            // We use the sequence-classification task for cross-encoders
+            this.pipe = await (0, transformers_1.pipeline)('sequence-classification', this.modelId, {
+                quantized: true,
+                // @ts-ignore
+                progress_callback: (info) => {
+                    if (info.status === 'done') {
+                        console.error(`[RerankerService] Loaded shard: ${info.file}`);
+                    }
+                }
+            });
+            this.initialized = true;
+            console.error(`[RerankerService] Initialization complete.`);
+        }
+        catch (error) {
+            console.error(`[RerankerService] Initialization failed:`, error);
+            throw error;
+        }
+    }
+    /**
+     * Reranks a list of documents based on a query.
+     * @param query The search query
+     * @param documents Array of document strings to rank
+     * @returns Array of { index, score } sorted by score descending
+     */
+    async rerank(query, documents) {
+        if (documents.length === 0)
+            return [];
+        await this.init();
+        try {
+            const results = [];
+            // Cross-encoders take pairs of [query, document]
+            // We can process them in a single batch
+            const inputs = documents.map(doc => [query, doc]);
+            // @ts-ignore
+            const outputs = await this.pipe(inputs, {
+                topk: 1 // We want the score for the "relevant" class (usually index 1 or the only output)
+            });
+            // Handle both array of results and single result (if only 1 doc)
+            const outputArray = Array.isArray(outputs) ? outputs : [outputs];
+            for (let i = 0; i < outputArray.length; i++) {
+                // Cross-encoders for ms-marco typically output a single logit/score or a 2-class distribution
+                // transformers.js sequence-classification returns { label: string, score: number }[]
+                // For ms-marco, label 'LABEL_1' is usually the relevance score
+                const out = outputArray[i];
+                results.push({
+                    index: i,
+                    score: out.score || 0
+                });
+            }
+            // Sort by score descending
+            return results.sort((a, b) => b.score - a.score);
+        }
+        catch (error) {
+            console.error(`[RerankerService] Reranking failed:`, error);
+            // Fallback: return original order with neutral scores
+            return documents.map((_, i) => ({ index: i, score: 0 }));
+        }
+    }
+}
+exports.RerankerService = RerankerService;
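The contract of `RerankerService.rerank` is simple: every document keeps its original index, gets a relevance score, and the pairs come back sorted by score descending (with a zero-score fallback in the original order on failure). A model-free sketch of that pair-and-sort step, under the assumption that per-document scores have already been computed:

```javascript
// Model-free sketch of rerank()'s pair-and-sort step (hypothetical helper;
// scores here stand in for cross-encoder outputs).
function sortByScore(scores) {
  return scores
    .map((score, index) => ({ index, score }))
    .sort((a, b) => b.score - a.score);
}

// Highest-scoring document first, original indices preserved.
console.log(sortByScore([0.1, 0.9, 0.4]).map(r => r.index)); // → [ 1, 2, 0 ]
```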
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "cozo-memory",
-  "version": "1.0.8",
+  "version": "1.1.0",
   "mcpName": "io.github.tobs-code/cozo-memory",
   "description": "Local-first persistent memory system for AI agents with hybrid search, graph reasoning, and MCP integration",
   "main": "dist/index.js",
@@ -38,7 +38,9 @@
     "test": "echo \"Error: no test specified\" && exit 1",
     "benchmark": "ts-node src/benchmark.ts",
     "eval": "ts-node src/eval-suite.ts",
-    "download-model": "ts-node src/download-model.ts"
+    "download-model": "ts-node src/download-model.ts",
+    "download-pplx-embed": "ts-node src/download-pplx-embed.ts",
+    "compare-embeddings": "ts-node src/compare-embeddings.ts"
   },
   "keywords": [
     "mcp",
@@ -95,4 +97,4 @@
     "tsx": "^4.21.0",
     "typescript": "^5.9.3"
   }
-}
+}