cozo-memory 1.2.9 → 1.2.10

This diff shows the content of publicly available package versions released to a supported registry. It is provided for informational purposes only and reflects the changes between package versions as they appear in the public registry.

Files changed (3)

package/README.md CHANGED

@@ -7,6 +7,8 @@
 > **Why Cozo Memory?**
 > LLMs have short-term memory limits. Standard RAG retrieves documents but can't connect facts across time. Cozo Memory gives your AI agent **persistent, structured memory** – it remembers past conversations, infers relationships, detects contradictions, and explores its knowledge graph – fully on your machine, with **optional local LLM integration via Ollama** for intelligent actions (cleanup, reflection, summarization, agentic routing).
+>
+> Most memory stacks combine separate databases: SQLite for facts, Chroma for vector search, NetworkX for graphs. **CozoDB replaces all of that with one embedded engine**: relational, graph, vector, and full-text search in a single query language, one file, zero sync lag.
 **Local-first memory for Claude & AI agents with hybrid search, Graph-RAG, and time-travel – runs entirely on your machine. Optional [Ollama](https://ollama.ai) integration enables LLM-powered actions (cleanup, reflect, summarize, agentic retrieval).**
@@ -65,17 +67,36 @@ Now add the server to your MCP client (e.g. Claude Desktop) – see [Integration
 ## Positioning & Comparison
+### Why CozoDB instead of SQLite + Chroma + NetworkX?
+A common first question is: *"Why not just combine existing tools?"*
+| If you need... | Typical separate stack | CozoDB Memory |
+| :--- | :--- | :--- |
+| Structured data & relations | **SQLite** / PostgreSQL | ✅ Built-in relational engine |
+| Semantic / vector search | **Chroma** / Qdrant / Pinecone | ✅ HNSW + FTS + RRF in one engine |
+| Graph traversal & reasoning | **NetworkX** / Neo4j | ✅ Native graph queries + PageRank |
+| Time-travel / versioning | Custom audit tables | ✅ Built-in `Validity` time-travel |
+| Unified query language | Multiple APIs + glue code | ✅ Single Datalog query across all dimensions |
+**The core insight:** Most memory stacks bolt vector search onto a graph DB, or graph search onto a vector DB. CozoDB is different: it is a **single engine** that natively combines relational, graph, vector, and full-text search. That means:
+- **One query language** (Datalog) reaches every dimension.
+- **No sync lag** between separate indexes.
+- **No ETL bridge** between "vector results" and "graph expansion."
+- **Smaller operational surface**: one database file, one process, one dependency chain.
+### Comparison with other memory solutions
 Most "Memory" MCP servers fall into two categories:
 1. **Simple Knowledge Graphs**: CRUD operations on triples, often only text search
 2. **Pure Vector Stores**: Semantic search (RAG), but little understanding of complex relationships
-This server fills the gap in between ("Sweet Spot"): A **local, database-backed memory engine** combining vector, graph, and keyword signals.
-### Comparison with other solutions
+This server fills the gap in between ("Sweet Spot"): A **local, database-backed memory engine** combining vector, graph, and keyword signals — powered by CozoDB's unified engine rather than a patchwork of separate databases.
 | Feature | **CozoDB Memory (This Project)** | **Official Reference (`@modelcontextprotocol/server-memory`)** | **mcp-memory-service (Community)** | **Database Adapters (Qdrant/Neo4j)** |
 | :--- | :--- | :--- | :--- | :--- |
-| **Backend** | **CozoDB** (Graph + Vector + Relational) | JSON file (`memory.jsonl`) | SQLite / Cloudflare | Specialized DB (only Vector or Graph) |
+| **Backend** | **CozoDB** (Graph + Vector + Relational + FTS in one engine) | JSON file (`memory.jsonl`) | SQLite / Cloudflare | Specialized DB (only Vector or Graph) |
 | **Search Logic** | **Agentic (Auto-Route)**: Hybrid + Graph + Summaries | Keyword only / Exact Graph Match | Vector + Keyword | Mostly only one dimension |
 | **Inference** | **Yes**: Built-in engine for implicit knowledge | No | No ("Dreaming" is consolidation) | No (Retrieval only) |
 | **Community** | **Yes**: Hierarchical Community Summaries | No | No | Only clustering (no summary) |
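
The README rows above mention RRF (reciprocal rank fusion) as the step that merges vector and full-text results. As a rough illustration of the general technique only, and not of cozo-memory's actual implementation, the sketch below fuses two ranked id lists. The function name `reciprocalRankFusion`, the constant `k = 60`, and the sample ids are all assumptions chosen for this example.

```javascript
// Generic reciprocal rank fusion (RRF). This is an illustrative sketch and is
// NOT taken from the cozo-memory source. Each input list holds ids ordered
// best-first; k = 60 is a commonly used smoothing constant (an assumption here).
function reciprocalRankFusion(rankedLists, k = 60) {
    const scores = new Map();
    for (const list of rankedLists) {
        list.forEach((id, index) => {
            // Ranks are 1-based, so the top hit of a list contributes 1 / (k + 1).
            const contribution = 1 / (k + index + 1);
            scores.set(id, (scores.get(id) || 0) + contribution);
        });
    }
    // Highest fused score first.
    return [...scores.entries()]
        .sort((a, b) => b[1] - a[1])
        .map(([id, score]) => ({ id, score }));
}

// Hypothetical result lists from a vector index and a full-text index.
const vectorHits = ["project-x", "datalog", "rust"];
const keywordHits = ["datalog", "acme", "project-x"];

const fused = reciprocalRankFusion([vectorHits, keywordHits]);
console.log(fused.map((r) => r.id));
// prints [ 'datalog', 'project-x', 'acme', 'rust' ]
```

Ids that appear in both lists accumulate two contributions, which is why `datalog` and `project-x` outrank the ids found by only one index.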

package/dist/benchmark.js CHANGED

@@ -8,153 +8,431 @@ const path_1 = __importDefault(require("path"));
 const fs_1 = __importDefault(require("fs"));
 const perf_hooks_1 = require("perf_hooks");
 const BENCHMARK_DB_PATH = path_1.default.join(process.cwd(), "benchmark_db");
-async function runBenchmark() {
-    console.log("🚀 Starting Performance Benchmark...");
-    // Cleanup
+function parseArgs() {
+    const args = process.argv.slice(2);
+    const opts = {
+        format: process.env.BENCH_FORMAT || "text",
+        runs: parseInt(process.env.BENCH_RUNS || "5", 10),
+        warmupRuns: parseInt(process.env.BENCH_WARMUP || "2", 10),
+        enableRerank: (process.env.BENCH_ENABLE_RERANK || "false").toLowerCase() !== "false",
+    };
+    for (let i = 0; i < args.length; i++) {
+        const a = args[i];
+        if (a === "--format" && args[i + 1])
+            opts.format = args[++i];
+        else if (a === "--runs" && args[i + 1])
+            opts.runs = Math.max(1, parseInt(args[++i], 10));
+        else if (a === "--warmup" && args[i + 1])
+            opts.warmupRuns = Math.max(0, parseInt(args[++i], 10));
+        else if (a === "--csv" && args[i + 1])
+            opts.csvPath = args[++i];
+        else if (a === "--enable-rerank")
+            opts.enableRerank = true;
+        else if (a === "--no-rerank")
+            opts.enableRerank = false;
+    }
+    if (!["text", "json", "markdown"].includes(opts.format)) {
+        opts.format = "text";
+    }
+    return opts;
+}
+function percentile(sorted, p) {
+    if (sorted.length === 0)
+        return 0;
+    const pos = (sorted.length - 1) * p;
+    const base = Math.floor(pos);
+    const rest = pos - base;
+    if (sorted[base + 1] !== undefined) {
+        return sorted[base] + rest * (sorted[base + 1] - sorted[base]);
+    }
+    return sorted[base];
+}
+function mean(values) {
+    if (values.length === 0)
+        return 0;
+    return values.reduce((a, b) => a + b, 0) / values.length;
+}
+function median(values) {
+    const sorted = values.slice().sort((a, b) => a - b);
+    return percentile(sorted, 0.5);
+}
+function stddev(values) {
+    if (values.length < 2)
+        return 0;
+    const m = mean(values);
+    const v = values.reduce((s, x) => s + (x - m) ** 2, 0) / (values.length - 1);
+    return Math.sqrt(v);
+}
+function formatNum(n, digits = 2) {
+    return n.toFixed(digits);
+}
+async function time(fn) {
+    const t0 = perf_hooks_1.performance.now();
+    const result = await fn();
+    const t1 = perf_hooks_1.performance.now();
+    return { result, ms: t1 - t0 };
+}
+async function warmupServer(server, times = 2) {
+    const durations = [];
+    for (let i = 0; i < times; i++) {
+        const { ms } = await time(async () => {
+            await server.hybridSearch.search({ query: "warmup benchmark", limit: 5, includeEntities: true, includeObservations: true });
+            await server.hybridSearch.graphRag({ query: "warmup", limit: 5, graphConstraints: { maxDepth: 1 } });
+        });
+        durations.push(ms);
+    }
+    return durations.length ? mean(durations) : 0;
+}
+function computeNDCG(results, expectedNames, k) {
+    const topK = results.slice(0, k);
+    const relevances = topK.map((r) => {
+        const name = (r.name || "").toLowerCase();
+        return expectedNames.some(e => e.toLowerCase() === name) ? 1 : 0;
+    });
+    const ideal = expectedNames.slice(0, k).map(() => 1);
+    const dcg = relevances.reduce((sum, rel, idx) => sum + rel / Math.log2(idx + 2), 0);
+    const idealDcg = ideal.reduce((sum, rel, idx) => sum + rel / Math.log2(idx + 2), 0);
+    return idealDcg === 0 ? 0 : dcg / idealDcg;
+}
+function computeRecall(results, expectedNames, k) {
+    const topK = results.slice(0, k);
+    const found = expectedNames.filter(name => topK.some(r => (r.name || "").toLowerCase() === name.toLowerCase()));
+    return expectedNames.length ? found.length / expectedNames.length : 0;
+}
+function computeMRR(results, expectedNames) {
+    for (let i = 0; i < results.length; i++) {
+        const name = (results[i].name || "").toLowerCase();
+        if (expectedNames.some(e => e.toLowerCase() === name)) {
+            return 1 / (i + 1);
+        }
+    }
+    return 0;
+}
+async function seedData(server) {
+    const entities = [];
+    const addEntity = async (name, type, metadata) => {
+        const entity = await server.createEntity({ name, type, metadata });
+        entities.push(entity);
+        return entity;
+    };
+    const acme = await addEntity("Acme Corp", "Organization", {});
+    const openai = await addEntity("OpenAI", "Organization", {});
+    const google = await addEntity("Google", "Organization", {});
+    const samOpenAI = await addEntity("Sam Altman", "Person", {});
+    const samAcme = await addEntity("Sam Brown", "Person", {});
+    const aliceGoogle = await addEntity("Alice Chen", "Person", {});
+    const aliceAcme = await addEntity("Alice Walker", "Person", {});
+    const bobEngineer = await addEntity("Bob Martinez", "Person", {});
+    const projectX = await addEntity("Project X", "Project", {});
+    const projectY = await addEntity("Project Y", "Project", {});
+    const datalog = await addEntity("Datalog", "Technology", {});
+    const python = await addEntity("Python", "Technology", {});
+    const rust = await addEntity("Rust", "Technology", {});
+    const oldInitiative = await addEntity("Legacy Initiative", "Project", {});
+    const newInitiative = await addEntity("Cloud Initiative", "Project", {});
+    const obs = async (entityId, text, metadata) => server.addObservation({ entity_id: entityId, text, metadata });
+    const rel = async (fromId, toId, relationType, strength = 0.9) => server.createRelation({ from_id: fromId, to_id: toId, relation_type: relationType, strength });
+    await obs(samOpenAI.id, "Sam Altman is the CEO of OpenAI since 2019.", { year: 2019 });
+    await obs(samOpenAI.id, "Sam Altman briefly joined Acme Corp as advisor in 2023.", { year: 2023 });
+    await obs(samOpenAI.id, "Sam Altman returned to OpenAI full-time in late 2023.", { year: 2023 });
+    await obs(samAcme.id, "Sam Brown is the CFO of Acme Corp since 2021.", { year: 2021 });
+    await obs(aliceGoogle.id, "Alice Chen works at Google on search ranking.", { year: 2022 });
+    await obs(aliceGoogle.id, "Alice Chen moved to Acme Corp as VP Engineering in 2024.", { year: 2024 });
+    await obs(aliceAcme.id, "Alice Walker is a product manager at Acme Corp.", { year: 2020 });
+    await obs(bobEngineer.id, "Bob Martinez is a senior engineer on Project X.", { year: 2022 });
+    await obs(bobEngineer.id, "Bob Martinez switched from Python to Rust in 2024.", { year: 2024 });
+    await obs(projectX.id, "Project X is Acme Corp's internal search engine.", { year: 2021 });
+    await obs(projectX.id, "Project X is being rewritten in Rust.", { year: 2024 });
+    await obs(projectY.id, "Project Y is Acme Corp's data lake.", { year: 2022 });
+    await obs(projectY.id, "Project Y was paused in 2024.", { year: 2024 });
+    await obs(acme.id, "Acme Corp acquired a Datalog startup in 2022.", { year: 2022 });
+    await obs(acme.id, "Acme Corp is headquartered in Berlin.", { year: 2020 });
+    await obs(datalog.id, "Datalog is used inside Project X for policy rules.", { year: 2023 });
+    await obs(datalog.id, "Datalog was replaced by SQL in Project Y.", { year: 2024 });
+    await obs(oldInitiative.id, "Legacy Initiative was cancelled in 2023.", { year: 2023 });
+    await obs(newInitiative.id, "Cloud Initiative started in 2024.", { year: 2024 });
+    await rel(samOpenAI.id, openai.id, "works_at", 0.95);
+    await rel(samOpenAI.id, acme.id, "advised", 0.7);
+    await rel(samAcme.id, acme.id, "works_at", 0.95);
+    await rel(aliceGoogle.id, google.id, "works_at", 0.9);
+    await rel(aliceGoogle.id, acme.id, "works_at", 0.95);
+    await rel(aliceAcme.id, acme.id, "works_at", 0.95);
+    await rel(bobEngineer.id, projectX.id, "works_on", 0.95);
+    await rel(bobEngineer.id, projectY.id, "works_on", 0.4);
+    await rel(projectX.id, datalog.id, "uses_tech", 0.9);
+    await rel(projectX.id, rust.id, "uses_tech", 0.85);
+    await rel(projectY.id, python.id, "uses_tech", 0.8);
+    await rel(acme.id, oldInitiative.id, "owns", 0.7);
+    await rel(acme.id, newInitiative.id, "owns", 0.9);
+    const distractors = [];
+    for (let i = 0; i < 220; i++) {
+        distractors.push(`Background note ${i}: noise about ${i % 5 === 0 ? 'Paris' : i % 5 === 1 ? 'Tokyo' : i % 5 === 2 ? 'finance' : i % 5 === 3 ? 'marketing' : 'logistics'} seed ${i}.`);
+    }
+    for (let i = 0; i < distractors.length; i++) {
+        const target = entities[i % entities.length];
+        await server.addObservation({ entity_id: target.id, text: distractors[i] });
+    }
+    return { entities, NUM_ENTITIES: entities.length, NUM_OBSERVATIONS: 265, NUM_RELATIONS: 14 };
+}
+async function measureRecall(server, runs, warmupRuns, opts) {
+    const tasks = [
+        { query: "Who works at OpenAI?", expected: ["Sam Altman", "OpenAI"], type: "factual" },
+        { query: "Current CEO of OpenAI", expected: ["Sam Altman"], type: "factual" },
+        { query: "Alice engineering manager Acme", expected: ["Alice Walker", "Alice Chen"], type: "ambiguous" },
+        { query: "Who is Bob's colleague on the search engine project?", expected: ["Project X"], type: "relational" },
+        { query: "Project using Datalog and Rust", expected: ["Project X"], type: "multi-hop" },
+        { query: "Technology switched by Bob in 2024", expected: ["Rust", "Python"], type: "temporal" },
+        { query: "Current Acme active initiative 2024", expected: ["Cloud Initiative"], type: "temporal" },
+        { query: "Acme acquisition technology 2022", expected: ["Datalog"], type: "multi-hop" },
+        { query: "Sam Altman Acme advisor", expected: ["Sam Altman", "Acme Corp"], type: "relational" },
+        { query: "Person VP Engineering Acme 2024", expected: ["Alice Chen", "Alice Walker"], type: "temporal" },
+    ];
+    const methods = [
+        { name: "Hybrid Search", fn: (q) => server.hybridSearch.search({ query: q, limit: 10, includeEntities: true, includeObservations: true }) },
+        { name: "Graph-RAG", fn: (q) => server.hybridSearch.graphRag({ query: q, limit: 10, graphConstraints: { maxDepth: 2 } }) },
+        { name: "Graph-Walking", fn: (q) => server.graph_walking({ query: q, limit: 10, max_depth: 3 }) },
+        ...(opts.enableRerank ? [
+            { name: "Reranked Search", fn: (q) => server.hybridSearch.search({ query: q, limit: 10, rerank: true, includeEntities: true, includeObservations: true }) },
+            { name: "Graph-RAG (Reranked)", fn: (q) => server.hybridSearch.graphRag({ query: q, limit: 10, graphConstraints: { maxDepth: 2 }, rerank: true }) },
+        ] : []),
+    ];
+    const results = [];
+    for (const method of methods) {
+        const allRunsRecall10 = [];
+        const allRunsRecall3 = [];
+        const allRunsMRR = [];
+        const allRunsNDCG10 = [];
+        const allRunsLatency = [];
+        for (let r = 0; r < warmupRuns + runs; r++) {
+            await server.hybridSearch.clearCache();
+            let r10 = 0, r3 = 0, mrr = 0, ndcg = 0, lat = 0;
+            for (const task of tasks) {
+                const { result, ms } = await time(() => method.fn(task.query));
+                r10 += computeRecall(result, task.expected, 10);
+                r3 += computeRecall(result, task.expected, 3);
+                mrr += computeMRR(result, task.expected);
+                ndcg += computeNDCG(result, task.expected, 10);
+                lat += ms;
+            }
+            const n = tasks.length;
+            if (r >= warmupRuns) {
+                allRunsRecall10.push(r10 / n);
+                allRunsRecall3.push(r3 / n);
+                allRunsMRR.push(mrr / n);
+                allRunsNDCG10.push(ndcg / n);
+                allRunsLatency.push(lat / n);
+            }
+        }
+        results.push({
+            method: method.name,
+            recallAt10: mean(allRunsRecall10),
+            recallAt3: mean(allRunsRecall3),
+            mrr: mean(allRunsMRR),
+            ndcgAt10: mean(allRunsNDCG10),
+            avgLatencyMs: mean(allRunsLatency),
+            p50LatencyMs: median(allRunsLatency),
+            p95LatencyMs: percentile(allRunsLatency.slice().sort((a, b) => a - b), 0.95),
+        });
+    }
+    return results;
+}
+async function runBenchmark(opts) {
+    console.log(`🚀 Starting Performance Benchmark (runs=${opts.runs}, warmup=${opts.warmupRuns}, format=${opts.format})`);
     if (fs_1.default.existsSync(BENCHMARK_DB_PATH + ".db")) {
         fs_1.default.unlinkSync(BENCHMARK_DB_PATH + ".db");
     }
-    // Measure Memory Baseline
     const memStart = process.memoryUsage();
-    // Initialize Server
+    const envStart = perf_hooks_1.performance.now();
     console.log("• Initializing Server & Loading Embedding Model...");
-    const initStart = perf_hooks_1.performance.now();
     const server = new index_1.MemoryServer(BENCHMARK_DB_PATH);
-    // Force embedding model load
-    await server.embeddingService.embed("warmup");
-    const initEnd = perf_hooks_1.performance.now();
-    console.log(`  -> Init Time: ${(initEnd - initStart).toFixed(2)}ms`);
+    await server.initPromise;
+    const embedWarmupStart = perf_hooks_1.performance.now();
+    await server.embeddingService.embed("benchmark-warmup");
+    const embedWarmupEnd = perf_hooks_1.performance.now();
+    const initMs = embedWarmupEnd - envStart;
+    const firstEmbeddingMs = embedWarmupEnd - embedWarmupStart;
+    console.log(`  -> Init + Warmup: ${formatNum(initMs)}ms`);
+    console.log(`  -> First embedding: ${formatNum(firstEmbeddingMs)}ms`);
     const memAfterInit = process.memoryUsage();
-    console.log(`  -> Memory Increase (Init): ${((memAfterInit.rss - memStart.rss) / 1024 / 1024).toFixed(2)} MB RSS`);
-    // Data Generation
-    const NUM_ENTITIES = 50;
-    const NUM_OBSERVATIONS = 200;
-    const NUM_RELATIONS = 100;
-    console.log(`\n• Generating Data (${NUM_ENTITIES} Entities, ${NUM_OBSERVATIONS} Observations, ${NUM_RELATIONS} Relations)...`);
-    const dataStart = perf_hooks_1.performance.now();
-    // Entities
-    const entities = [];
-    for (let i = 0; i < NUM_ENTITIES; i++) {
-        entities.push(await server.createEntity({
-            name: `Entity_${i}`,
-            type: i % 2 === 0 ? "Person" : "Project",
-            metadata: { index: i }
-        }));
-    }
-    // Observations
-    for (let i = 0; i < NUM_OBSERVATIONS; i++) {
-        const entity = entities[i % NUM_ENTITIES];
-        // @ts-ignore
-        await server.addObservation({
-            // @ts-ignore
-            entity_id: entity.id,
-            text: `This is observation number ${i} for entity ${ // @ts-ignore
-            entity.name}. It contains some random keywords like apple, banana, and cherry.`
-        });
-    }
-    // Relations
-    for (let i = 0; i < NUM_RELATIONS; i++) {
-        const from = entities[i % NUM_ENTITIES];
-        const to = entities[(i + 1) % NUM_ENTITIES];
-        // @ts-ignore
-        await server.createRelation({
-            // @ts-ignore
-            from_id: from.id,
-            // @ts-ignore
-            to_id: to.id,
-            relation_type: "related_to",
-            strength: 0.5
-        });
-    }
-    const dataEnd = perf_hooks_1.performance.now();
-    console.log(`  -> Data Ingestion Time: ${(dataEnd - dataStart).toFixed(2)}ms`);
-    console.log(`  -> Avg Time per Operation: ${((dataEnd - dataStart) / (NUM_ENTITIES + NUM_OBSERVATIONS + NUM_RELATIONS)).toFixed(2)}ms`);
+    console.log(`\n• Seeding Data...`);
+    const seedStart = perf_hooks_1.performance.now();
+    const { entities, NUM_ENTITIES, NUM_OBSERVATIONS, NUM_RELATIONS } = await seedData(server);
+    const seedEnd = perf_hooks_1.performance.now();
+    const totalOps = NUM_ENTITIES + NUM_OBSERVATIONS + NUM_RELATIONS;
+    const ingestionTotalMs = seedEnd - seedStart;
+    const ingestionAvgMs = ingestionTotalMs / totalOps;
+    console.log(`  -> Data Ingestion: ${formatNum(ingestionTotalMs)}ms (${formatNum(ingestionAvgMs)} ms/op)`);
     const memAfterData = process.memoryUsage();
-    console.log(`  -> Memory Increase (Data): ${((memAfterData.rss - memAfterInit.rss) / 1024 / 1024).toFixed(2)} MB RSS`);
-    // Query Benchmark
-    console.log("\n• Running Queries (Hybrid Search)...");
+    await warmupServer(server, opts.warmupRuns);
+    console.log("\n• Running Query Benchmarks...");
     const queries = [
         "observation number 10",
-        "apple banana",
-        "Entity_0",
-        "Project related"
+        "alpha beta gamma",
+        `Entity_${NUM_ENTITIES - 1}`,
+        "Project related",
+        "delta observation keywords",
+        "colleague technology",
+        "Bob Alice relation",
+        "works on project",
+        "Senior Engineer Berlin",
+        "graph traversal seed",
     ];
-    const times = [];
-    for (const q of queries) {
-        const t0 = perf_hooks_1.performance.now();
-        await server.hybridSearch.search({
-            query: q,
-            limit: 10,
-            includeEntities: true,
-            includeObservations: true
-        });
-        const t1 = perf_hooks_1.performance.now();
-        times.push(t1 - t0);
-        process.stdout.write(".");
-    }
-    console.log("");
-    const avgQueryTime = times.reduce((a, b) => a + b, 0) / times.length;
-    const minQueryTime = Math.min(...times);
-    const maxQueryTime = Math.max(...times);
-    console.log(`  -> Avg Query Time: ${avgQueryTime.toFixed(2)}ms`);
-    console.log(`  -> Min Query Time: ${minQueryTime.toFixed(2)}ms`);
-    console.log(`  -> Max Query Time: ${maxQueryTime.toFixed(2)}ms`);
-    // RRF Overhead Estimation (Approximation)
-    // We perform a raw vector search (fastest component) and compare with hybrid search
-    // This is a rough proxy because hybrid search does 5 parallel searches + RRF
-    console.log("\n• Estimating RRF/Combination Overhead...");
-    const tVecStart = perf_hooks_1.performance.now();
-    // Access private method via any cast or just simulate a similar query
-    // Since we can't easily access private methods, we will rely on the fact that
-    // Hybrid Search = Promise.all([Vector, Keyword, Graph]) + RRF
-    // We'll run a search with ONLY vector enabled (by setting weights of others to 0? No, they still run)
-    // We will try to run a pure DB query to simulate vector search time
-    const vectorOnlyStart = perf_hooks_1.performance.now();
-    const qEmb = await server.embeddingService.embed("apple");
-    await server.db.run(`
-    ?[id, score] := ~entity:semantic { id | query: vec($qEmb), k: 10, ef: 20 }, score = 1.0
-  `, { qEmb });
-    const vectorOnlyEnd = perf_hooks_1.performance.now();
-    const vectorTime = vectorOnlyEnd - vectorOnlyStart;
-    console.log(`  -> Raw Vector Search Time: ${vectorTime.toFixed(2)}ms`);
-    console.log(`  -> Overhead (Hybrid Logic + RRF): ${(avgQueryTime - vectorTime).toFixed(2)}ms`);
-    // Graph Benchmark
-    console.log("\n• Running Graph Benchmarks (Graph-RAG & Graph-Walking)...");
-    // Graph-RAG
-    const ragStart = perf_hooks_1.performance.now();
-    // @ts-ignore
-    await server.hybridSearch.graphRag({
-        query: "Entity_0",
-        limit: 20,
-        graphConstraints: {
-            maxDepth: 2
+    const runs = { hybrid: [], reranked: [], graphRag: [], graphWalking: [], rawVector: [] };
+    for (let r = 0; r < opts.runs; r++) {
+        await server.hybridSearch.clearCache();
+        for (const q of queries) {
+            const hybridMs = (await time(() => server.hybridSearch.search({
+                query: q,
+                limit: 10,
+                includeEntities: true,
+                includeObservations: true,
+            }))).ms;
+            runs.hybrid.push(hybridMs);
+            if (opts.enableRerank) {
+                const rerankedMs = (await time(() => server.hybridSearch.search({
+                    query: q,
+                    limit: 10,
+                    rerank: true,
+                    includeEntities: true,
+                    includeObservations: true,
+                }))).ms;
+                runs.reranked.push(rerankedMs);
+            }
+            const graphRagMs = (await time(() => server.hybridSearch.graphRag({
+                query: q,
+                limit: 10,
+                graphConstraints: { maxDepth: 2 },
+            }))).ms;
+            runs.graphRag.push(graphRagMs);
+            const startEntityId = entities[r % entities.length].id;
+            const walkMs = (await time(() => server.graph_walking({
+                query: q,
+                start_entity_id: startEntityId,
+                max_depth: 3,
+                limit: 10,
+            }))).ms;
+            runs.graphWalking.push(walkMs);
         }
-    });
-    const ragEnd = perf_hooks_1.performance.now();
-    console.log(`  -> Graph-RAG (2-Hop) Time: ${(ragEnd - ragStart).toFixed(2)}ms`);
-    // Graph-Walking
-    const walkStart = perf_hooks_1.performance.now();
-    // @ts-ignore
-    const startEntityId = entities[0].id;
-    // @ts-ignore
-    await server.graph_walking({
-        query: "related concepts",
-        start_entity_id: startEntityId,
-        max_depth: 3,
-        limit: 10
-    });
-    const walkEnd = perf_hooks_1.performance.now();
-    console.log(`  -> Graph-Walking (Recursive) Time: ${(walkEnd - walkStart).toFixed(2)}ms`);
-    // Final Memory
+        const qEmb = await server.embeddingService.embed("benchmark-vector-baseline");
+        const vectorMs = (await time(() => server.db.run(`?[id, score] := ~entity:semantic { id | query: vec($qEmb), k: 10, ef: 20 }, score = 1.0`, { qEmb }))).ms;
+        runs.rawVector.push(vectorMs);
+    }
+    console.log("\n• Running Recall Evaluation...");
+    const recall = await measureRecall(server, opts.runs, opts.warmupRuns, opts);
     const memFinal = process.memoryUsage();
-    console.log("\n• Final Memory Stats:");
-    console.log(`  -> RSS: ${(memFinal.rss / 1024 / 1024).toFixed(2)} MB`);
-    console.log(`  -> Heap Used: ${(memFinal.heapUsed / 1024 / 1024).toFixed(2)} MB`);
-    // Cleanup
-    // @ts-ignore
+    console.log(`\n• Final Memory Stats:`);
+    console.log(`  -> RSS Init: ${formatNum(memAfterInit.rss / 1024 / 1024)} MB`);
+    console.log(`  -> RSS After Data: ${formatNum(memAfterData.rss / 1024 / 1024)} MB`);
+    console.log(`  -> RSS Final: ${formatNum(memFinal.rss / 1024 / 1024)} MB`);
+    console.log(`  -> Heap Used Final: ${formatNum(memFinal.heapUsed / 1024 / 1024)} MB`);
+    const summary = {
+        environment: {
+            nodeVersion: process.version,
+            platform: process.platform,
+            timestamp: new Date().toISOString(),
+            embeddingModel: process.env.EMBEDDING_MODEL || "Xenova/bge-m3",
+            dbEngine: process.env.DB_ENGINE || "sqlite",
+        },
+        warmup: {
+            initMs,
+            firstEmbeddingMs,
+        },
+        ingestion: {
+            totalMs: ingestionTotalMs,
+            avgPerOpMs: ingestionAvgMs,
+            throughputOpsPerSec: 1000 / ingestionAvgMs,
+        },
+        memory: {
+            rssAfterInitMB: memAfterInit.rss / 1024 / 1024,
+            rssAfterDataMB: memAfterData.rss / 1024 / 1024,
+            rssFinalMB: memFinal.rss / 1024 / 1024,
+            heapUsedFinalMB: memFinal.heapUsed / 1024 / 1024,
+        },
+        queries: {
+            rawVectorMs: { avg: mean(runs.rawVector), p50: median(runs.rawVector), p95: percentile(runs.rawVector.slice().sort((a, b) => a - b), 0.95) },
+            hybrid: { avg: mean(runs.hybrid), p50: median(runs.hybrid), p95: percentile(runs.hybrid.slice().sort((a, b) => a - b), 0.95) },
+            reranked: { avg: mean(runs.reranked), p50: median(runs.reranked), p95: percentile(runs.reranked.slice().sort((a, b) => a - b), 0.95) },
+            graphRag: { avg: mean(runs.graphRag), p50: median(runs.graphRag), p95: percentile(runs.graphRag.slice().sort((a, b) => a - b), 0.95) },
+            graphWalking: { avg: mean(runs.graphWalking), p50: median(runs.graphWalking), p95: percentile(runs.graphWalking.slice().sort((a, b) => a - b), 0.95) },
+        },
+        recall,
+    };
+    const output = renderSummary(summary, opts);
+    if (opts.format === "json") {
+        console.log(JSON.stringify(summary, null, 2));
+    }
+    else if (opts.format === "markdown") {
+        console.log(output);
+    }
+    else {
+        console.log(output);
+    }
+    if (opts.csvPath && recall.length) {
+        const header = ["method", "recall_at_10", "recall_at_3", "mrr", "ndcg_at_10", "avg_latency_ms", "p50_latency_ms", "p95_latency_ms"];
+        const rows = recall.map(r => [r.method, r.recallAt10.toFixed(4), r.recallAt3.toFixed(4), r.mrr.toFixed(4), r.ndcgAt10.toFixed(4), r.avgLatencyMs.toFixed(2), r.p50LatencyMs.toFixed(2), r.p95LatencyMs.toFixed(2)]);
+        const csv = [header.join(","), ...rows.map(r => r.join(","))].join("\n");
+        fs_1.default.writeFileSync(opts.csvPath, csv);
+        console.log(`\n• CSV written to: ${opts.csvPath}`);
+    }
     server.db.close();
     if (fs_1.default.existsSync(BENCHMARK_DB_PATH + ".db")) {
         fs_1.default.unlinkSync(BENCHMARK_DB_PATH + ".db");
     }
 }
-runBenchmark().catch(console.error);
+function renderSummary(s, opts) {
+    const lines = [];
+    lines.push("==================================================");
+    lines.push("CozoDB Memory Benchmark Results");
+    lines.push("==================================================");
+    lines.push(`Environment: ${s.environment.nodeVersion} on ${s.environment.platform}`);
+    lines.push(`Timestamp:   ${s.environment.timestamp}`);
+    lines.push(`Embedding:   ${s.environment.embeddingModel}`);
+    lines.push(`DB Engine:   ${s.environment.dbEngine}`);
+    lines.push(`Runs:        ${opts.runs}  Warmup: ${opts.warmupRuns}`);
+    lines.push("");
+    lines.push("## Warmup");
+    lines.push(`- Init + Warmup:          ${formatNum(s.warmup.initMs)} ms`);
+    lines.push(`- First embedding:       ${formatNum(s.warmup.firstEmbeddingMs)} ms`);
+    lines.push("");
+    lines.push("## Ingestion");
+    lines.push(`- Total ingestion:       ${formatNum(s.ingestion.totalMs)} ms`);
+    lines.push(`- Avg per operation:     ${formatNum(s.ingestion.avgPerOpMs)} ms`);
+    lines.push(`- Throughput:            ${formatNum(s.ingestion.throughputOpsPerSec)} ops/sec`);
+    lines.push("");
+    lines.push("## Memory");
+    lines.push(`- RSS after init:        ${formatNum(s.memory.rssAfterInitMB)} MB`);
+    lines.push(`- RSS after data load:   ${formatNum(s.memory.rssAfterDataMB)} MB`);
+    lines.push(`- RSS final:             ${formatNum(s.memory.rssFinalMB)} MB`);
+    lines.push(`- Heap used final:       ${formatNum(s.memory.heapUsedFinalMB)} MB`);
+    lines.push("");
+    lines.push("## Query Latency (ms)");
+    lines.push("| Method           | Avg      | P50      | P95      |");
+    lines.push("|------------------|----------|----------|----------|");
+    lines.push(`| Raw Vector       | ${formatNum(s.queries.rawVectorMs.avg, 2).padEnd(8)} | ${formatNum(s.queries.rawVectorMs.p50, 2).padEnd(8)} | ${formatNum(s.queries.rawVectorMs.p95, 2).padEnd(8)} |`);
+    lines.push(`| Hybrid Search    | ${formatNum(s.queries.hybrid.avg, 2).padEnd(8)} | ${formatNum(s.queries.hybrid.p50, 2).padEnd(8)} | ${formatNum(s.queries.hybrid.p95, 2).padEnd(8)} |`);
+    if (opts.enableRerank) {
+        lines.push(`| Reranked Search  | ${formatNum(s.queries.reranked.avg, 2).padEnd(8)} | ${formatNum(s.queries.reranked.p50, 2).padEnd(8)} | ${formatNum(s.queries.reranked.p95, 2).padEnd(8)} |`);
+    }
+    lines.push(`| Graph-RAG        | ${formatNum(s.queries.graphRag.avg, 2).padEnd(8)} | ${formatNum(s.queries.graphRag.p50, 2).padEnd(8)} | ${formatNum(s.queries.graphRag.p95, 2).padEnd(8)} |`);
+    lines.push(`| Graph-Walking    | ${formatNum(s.queries.graphWalking.avg, 2).padEnd(8)} | ${formatNum(s.queries.graphWalking.p50, 2).padEnd(8)} | ${formatNum(s.queries.graphWalking.p95, 2).padEnd(8)} |`);
+    lines.push("");
+    lines.push(`## Recall & Quality (Mean across ${opts.runs} runs)`);
+    lines.push("| Method                | Recall@10 | Recall@3 | MRR   | nDCG@10 | Avg Latency | P50 Latency | P95 Latency |");
+    lines.push("|-----------------------|-----------|----------|-------|---------|-------------|-------------|-------------|");
+    for (const r of s.recall) {
+        lines.push(`| ${r.method.padEnd(21)} | ${r.recallAt10.toFixed(3).padStart(9)} | ${r.recallAt3.toFixed(3).padStart(8)} | ${r.mrr.toFixed(3).padStart(5)} | ${r.ndcgAt10.toFixed(3).padStart(7)} | ${formatNum(r.avgLatencyMs, 2).padStart(11)} | ${formatNum(r.p50LatencyMs, 2).padStart(11)} | ${formatNum(r.p95LatencyMs, 2).padStart(11)} |`);
+    }
+    lines.push("");
+    lines.push("## Benchmarking external systems");
+    lines.push("Recall/Quality numbers for Chroma, Qdrant, Mem0 are not generated by this script.");
+    lines.push("Fill the comparison table in docs/BENCHMARKS.md only from published/public benchmarks.");
+    lines.push("Do not insert internal estimates into the publication table.");
+    lines.push("");
+    lines.push("==================================================");
+    return lines.join("\n");
+}
+const opts = parseArgs();
+runBenchmark(opts).catch((err) => {
+    console.error("Benchmark failed:", err);
+    process.exit(1);
+});
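
The quality metrics added in this file (`computeRecall`, `computeMRR`, `computeNDCG`) score results with binary relevance on case-insensitive name matches. The standalone sketch below applies the same formulas to a small result list so the numbers can be checked by hand. The helper `isHit`, the shortened function names, and the sample data are assumptions made for illustration and are not part of the package.

```javascript
// Standalone sketch of the binary-relevance metrics used in the diff above.
// Sample data and helper names are hypothetical.
const isHit = (result, expected) =>
    expected.some((e) => e.toLowerCase() === (result.name || "").toLowerCase());

// Fraction of expected names that appear anywhere in the top k results.
function recallAtK(results, expected, k) {
    const topK = results.slice(0, k);
    const found = expected.filter((name) => topK.some((r) => isHit(r, [name])));
    return expected.length ? found.length / expected.length : 0;
}

// Reciprocal rank of the first relevant result (0 if none is relevant).
function mrr(results, expected) {
    const index = results.findIndex((r) => isHit(r, expected));
    return index === -1 ? 0 : 1 / (index + 1);
}

// DCG of the top k results divided by the DCG of an ideal ordering.
function ndcgAtK(results, expected, k) {
    const discount = (idx) => 1 / Math.log2(idx + 2);
    const dcg = results
        .slice(0, k)
        .reduce((sum, r, idx) => sum + (isHit(r, expected) ? discount(idx) : 0), 0);
    const idealDcg = expected
        .slice(0, k)
        .reduce((sum, _name, idx) => sum + discount(idx), 0);
    return idealDcg === 0 ? 0 : dcg / idealDcg;
}

// Hypothetical search output: relevant names sit at positions 2 and 4.
const sampleResults = [
    { name: "Acme Corp" },
    { name: "Sam Altman" },
    { name: "Google" },
    { name: "OpenAI" },
];
const sampleExpected = ["Sam Altman", "OpenAI"];

console.log(recallAtK(sampleResults, sampleExpected, 3));  // 0.5
console.log(recallAtK(sampleResults, sampleExpected, 10)); // 1
console.log(mrr(sampleResults, sampleExpected));           // 0.5
console.log(ndcgAtK(sampleResults, sampleExpected, 10).toFixed(3)); // 0.651
```

Here nDCG@10 is (1/log2(3) + 1/log2(5)) / (1/log2(2) + 1/log2(3)), because the two relevant names land at ranks 2 and 4 while an ideal ordering would place them at ranks 1 and 2.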

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "cozo-memory",
-  "version": "1.2.9",
+  "version": "1.2.10",
   "mcpName": "io.github.tobs-code/cozo-memory",
   "description": "Local-first persistent memory system for AI agents with hybrid search, graph reasoning, and MCP integration",
   "main": "dist/index.js",