rust-kgdb 0.6.30 → 0.6.32
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CLAUDE.md +50 -499
- package/HYPERMIND_BENCHMARK_REPORT.md +199 -41
- package/README.md +62 -169
- package/benchmark-frameworks.py +568 -0
- package/package.json +3 -1
- package/verified_benchmark_results.json +307 -0
package/CLAUDE.md (CHANGED)

@@ -8,307 +8,6 @@ This is the **TypeScript/Node.js SDK** for `rust-kgdb`, a high-performance RDF/S

**npm Package**: [`rust-kgdb`](https://www.npmjs.com/package/rust-kgdb)

Removed (old lines 11-311): the entire "Benchmark Results" section:

- ## Benchmark Results
-
- **HyperMind achieves 86.4% accuracy where vanilla LLMs achieve 0%.**
-
- | Metric | Vanilla LLM | HyperMind | Improvement |
- |--------|-------------|-----------|-------------|
- | **Accuracy** | 0% | 86.4% | +86.4 pp |
- | **Claude Sonnet 4** | 0% | 90.9% | +90.9 pp |
- | **GPT-4o** | 0% | 81.8% | +81.8 pp |
- | **Hallucinations** | 100% | 0% | Eliminated |
- | **Audit Trail** | None | Complete | Full provenance |
- | **Reproducibility** | Random | Deterministic | Same hash |
-
- ### How We Calculated These Numbers
-
- DATASET: LUBM (Lehigh University Benchmark)
-   • Industry-standard academic KG benchmark (since 2005)
-   • 3,272 triples (LUBM-1 scale); 30 OWL classes, 23 properties
-   • Used by Jena, RDFox, Stardog, and GraphDB for comparison
- TEST PROTOCOL: 11 hard scenarios × 2 LLMs × 2 approaches. For each test query:
-   1. VANILLA: send the query to the LLM with NO context
-   2. HYPERMIND: send the query with the SchemaContext (Γ) injected
-   3. VALIDATE: parse → type-check → execute → verify results
- ACCURACY FORMULA: Accuracy = (queries that pass ALL 3 gates) / (total queries) × 100
-   Gate 1: syntax valid (no markdown, valid SPARQL)
-   Gate 2: executable (runs without error on rust-kgdb)
-   Gate 3: type safe (uses ONLY predicates from the SchemaContext)
- RESULTS: Vanilla LLM 0/11 passed (0%), failing Gate 1 or Gate 3 every time; HyperMind 9.5/11 passed on average (86.4%): Claude 10/11, GPT-4o 9/11.
-
- **Reproducibility**: Run `node vanilla-vs-hypermind-benchmark.js` to verify these numbers yourself.
-
- ### What Was Tested
-
- | Component | Specification |
- |-----------|---------------|
- | **Dataset** | LUBM (Lehigh University Benchmark), standard academic KG benchmark |
- | **Triples** | 3,272 (LUBM-1 scale) |
- | **Schema** | 30 OWL classes, 23 properties |
- | **Deployment** | rust-kgdb Kubernetes cluster (3 executors, 1 coordinator) |
-
- ### Test Categories (11 Hard Scenarios)
-
- | Category | Count | What It Tests |
- |----------|-------|---------------|
- | **ambiguous** | 3 | Queries with multiple valid interpretations |
- | **multi_hop** | 2 | Requires JOIN reasoning across entities |
- | **syntax** | 2 | Catches markdown/formatting errors |
- | **edge_case** | 2 | Boundary conditions, empty results |
- | **type_mismatch** | 2 | Schema-violation detection |
-
- ### How We Tested (Evaluation Protocol)
-
- ```javascript
- // VANILLA LLM: No context (baseline)
- const vanillaPrompt = `Generate SPARQL: ${query}`
- // Result: LLM guesses predicates, wraps output in markdown, hallucinates
-
- // HYPERMIND: Schema injected into the prompt
- const hypermindPrompt = `
- SCHEMA:
- Classes: ${schema.classes.join(', ')}        // From YOUR actual data
- Predicates: ${schema.predicates.join(', ')}  // From YOUR actual data
-
- TYPE CONTRACT:
- - Input: natural language query
- - Output: raw SPARQL (NO markdown, NO code blocks)
- - Precondition: Query references ONLY schema predicates
- - Postcondition: Valid SPARQL 1.1 syntax
-
- Query: ${query}
- `
- // Result: LLM generates valid, type-safe queries
- ```
-
- ### Success Criteria (Three Gates)
-
- 1. **Syntax Valid**: Query parses without errors (no markdown wrapping)
- 2. **Executable**: Query runs against the database without exceptions
- 3. **Type Safe**: Uses ONLY predicates defined in the schema (no hallucination)
-
- ### Why Vanilla LLMs Fail (100% Failure Rate)
-
- User: "Find all professors". A vanilla LLM typically wraps the query in a markdown code fence (PROBLEM 1), picks the wrong class, `?prof a ub:Faculty` where the schema has `Professor` (PROBLEM 2), and appends explanation text such as "This query finds all faculty..." (PROBLEM 3).
- Result: ❌ the parser rejects the markdown, and the class is hallucinated.
-
- ### Why HyperMind Succeeds (86.4% Success Rate)
-
- User: "Find all professors". HyperMind outputs raw SPARQL with the class taken from the injected schema:
-   PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
-   SELECT ?prof WHERE { ?prof a ub:Professor . }
- Result: ✅ parses, executes, returns 15 professors.
-
- ### Calibration Against Industry Benchmarks
-
- Our methodology is calibrated against established AI benchmarks:
-
- | Benchmark | Organization | What It Measures | How We Applied It |
- |-----------|--------------|------------------|-------------------|
- | **GAIA** | Meta Research | Multi-step reasoning, tool use | Test categories (ambiguous, multi_hop) |
- | **SWE-bench** | OpenAI | Code-generation accuracy | Success criteria (syntax, executable, type-safe) |
- | **LUBM** | Lehigh University | Knowledge-graph query performance | Dataset (3,272 triples, 30 classes, 23 predicates) |
-
- **Calibration Process:**
- 1. **GAIA-inspired categories**: We adopted GAIA's multi-step reasoning tests for the `multi_hop` and `ambiguous` categories
- 2. **SWE-bench-inspired validation**: As SWE-bench validates code patches via test suites, we validate queries via three gates (syntax → executable → type-safe)
- 3. **LUBM standard dataset**: An industry-standard academic benchmark ensures reproducibility across implementations
-
- ### Verification Method
-
- Each test must pass ALL 5 stages of the verification pipeline to count as a SUCCESS:
- 1. GENERATE: the LLM produces SPARQL from natural language
- 2. PARSE: the rust-kgdb SPARQL parser validates syntax (markdown? → FAIL; invalid syntax? → FAIL)
- 3. TYPE-CHECK: QueryValidator checks against the SchemaContext (Γ) (unknown predicate? → FAIL, hallucination detected; wrong domain/range? → FAIL)
- 4. EXECUTE: the query runs against the LUBM dataset in the rust-kgdb cluster (runtime error? → FAIL; empty when expecting results? → FAIL)
- 5. VERIFY: results are compared against known LUBM answers (matches expected? → PASS)
-
- ### Published Results
-
- | Artifact | Location | What It Contains |
- |----------|----------|------------------|
- | **Benchmark Report** | `HYPERMIND_BENCHMARK_REPORT.md` | Full methodology, per-test results, failure analysis |
- | **Benchmark Code** | `vanilla-vs-hypermind-benchmark.js` | Runnable benchmark comparing vanilla vs HyperMind |
- | **Example: Fraud** | `examples/fraud-detection-agent.js` | Real dataset (`FRAUD_ONTOLOGY`) loaded via `db.loadTtl()` |
- | **Example: Underwriting** | `examples/underwriting-agent.js` | Real dataset (`UNDERWRITING_KB`) loaded via `db.loadTtl()` |
- | **npm Package** | `rust-kgdb` | Published SDK with all benchmark code |
-
- ### Dataset Loading (Factually Verifiable)
-
- Both examples load real ontologies/knowledge bases via `loadTtl()`:
-
- ```javascript
- // examples/fraud-detection-agent.js (line 612)
- db.loadTtl(FRAUD_ONTOLOGY, CONFIG.kg.graphUri)
- // FRAUD_ONTOLOGY contains the ins:Claimant, ins:Provider, ins:Claim classes
- // with properties: claimant, provider, amount, address (for ring detection)
-
- // examples/underwriting-agent.js (line 766)
- db.loadTtl(UNDERWRITING_KB, 'http://underwriting.org/data')
- // UNDERWRITING_KB contains the uw:BusinessAccount, uw:Territory classes
- // with properties: naicsCode, revenue, territory, hurricaneExposure, earthquakeExposure
- ```
-
- **Verify in code**: Run `grep -n "loadTtl" examples/*.js` to see the exact lines.
-
- ### End-to-End Architecture: HyperMind Deterministic Flow
-
- HyperMind: deterministic, schema-driven execution powered by the rust-kgdb GraphDB (LLM OPTIONAL). For the request "Find high-risk providers with claims over $10,000":
- 1. SCHEMA CONTEXT (Γ), an object, NOT a string: `const schemaContext = await SchemaContext.fromKG(db)` returns `{ classes: Set, properties: Map, ... }`
- 2. DETERMINISTIC INTENT ANALYSIS (NO LLM): `const intent = this._analyzeIntent(prompt)`; keyword matching ("high-risk" → `intent.risk = true`; "claims over" → `intent.query = true`, `intent.filter`); deterministic: same input → same intent
- 3. SCHEMA-DRIVEN QUERY GENERATION (NO LLM): `const sparql = this._generateSchemaSparql(intent, schema)` finds matching predicates in the SchemaContext (`riskScore` and `amount` are in `schema.predicates`) and generates `SELECT ?p ?score WHERE { ?p :riskScore ... }`
- 4. VALIDATION + EXECUTION (rust-kgdb): `validateQuery(sparql, schemaContext)` confirms all predicates exist in the SchemaContext and types match (domain/range); then `const results = db.querySelect(sparql)` runs in 2.78 µs
- 5. PROOF DAG (audit trail): `{ answer: "Provider P001, P003 are high-risk", derivations: [{ tool: "kg.sparql.query", ... }], hash: "sha256:8f3a2b1c..." }`, reproducible
-
- LLM OPTIONAL: if enabled, it is used ONLY for final summarization. KEY: same input + same schema = same query = same results = same hash.
-
- **Code References** (verify in `hypermind-agent.js`):
- - `_analyzeIntent()` line 2286: Deterministic keyword matching
- - `_generateSteps()` line 2297: Schema-driven step generation
- - `_generateSchemaSparql()` line 2368: Schema-aware SPARQL generation
- - `validateQuery()`: Type-checks against the SchemaContext
-
- ### Run It Yourself
-
- ```bash
- # 1. Install the SDK
- npm install rust-kgdb
-
- # 2. Set API keys
- export OPENAI_API_KEY="sk-..."
- export ANTHROPIC_API_KEY="sk-ant-..."
-
- # 3. Run the benchmark
- node vanilla-vs-hypermind-benchmark.js
-
- # 4. Run the examples
- node examples/fraud-detection-agent.js
- node examples/underwriting-agent.js
- ```
-
- **All results are reproducible.** Same schema + same question = same answer = same hash.

## Commands

### Build Native Addon
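The three-gate scoring described in the removed section is compact enough to sketch. Below is a minimal illustration, not the benchmark's actual code: `parseSparql` and `extractPredicates` are hypothetical stand-ins for the rust-kgdb parser and QueryValidator. The arithmetic also checks out: Claude 10/11 ≈ 90.9%, GPT-4o 9/11 ≈ 81.8%, and pooled (10 + 9)/22 ≈ 86.4%, the headline number.

```javascript
// Minimal sketch of the three-gate accuracy rule (hypothetical helpers:
// parseSparql and extractPredicates are NOT actual rust-kgdb APIs).
function passesAllGates(queryText, schema, db, { parseSparql, extractPredicates }) {
  // Gate 1: syntax valid. Raw SPARQL contains no backticks, so any backtick
  // indicates the LLM wrapped its answer in a markdown fence.
  if (queryText.includes('`')) return false
  let ast
  try { ast = parseSparql(queryText) } catch { return false }

  // Gate 2: executable against the database without exceptions.
  try { db.querySelect(queryText) } catch { return false }

  // Gate 3: type safe. Every predicate must already exist in the schema;
  // anything else counts as a hallucination.
  const known = new Set(schema.predicates)
  return extractPredicates(ast).every(p => known.has(p))
}

// Accuracy = passed / total × 100, e.g. 19 of 22 runs → 86.4%.
const accuracy = (passed, total) => (100 * passed / total).toFixed(1)
```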
@@ -332,8 +31,9 @@ npx jest tests/regression.test.ts --testNamePattern="SPARQL"

### Publishing

```bash
- npm
- npm
+ npm version patch --no-git-tag-version  # Bump version
+ npm publish                             # Publish to npm
+ npm view rust-kgdb                      # View package info
```

## Architecture
@@ -365,190 +65,17 @@ npm view rust-kgdb # View package info

1. **Native NAPI-RS** (`native/rust-kgdb-napi/src/lib.rs`): Rust bindings for GraphDB, GraphFrame, Embeddings, Datalog, Pregel
2. **HyperMind Framework** (`hypermind-agent.js`): Pure JS AI agent framework with schema awareness, memory, sandboxing

Removed (old lines 368-545): the approach-comparison, proxy-architecture, CMD-analogy, and theory sections:

- ##
-
- APPROACH COMPARISON
- TRADITIONAL (LangChain, AutoGPT): User → LLM → Tool Call. The LLM decides what to call and generates the query text.
-   Pros: flexible, easy setup, vague tasks OK. Cons: 20-40% success, hallucinates, no audit trail, non-deterministic, expensive.
- OUR APPROACH (HyperMind): User → Deterministic Planner → Typed Steps. The schema generates the query, and the schema validates it.
-   Pros: 86.4% success, zero hallucination, full audit trail, reproducible, cheap at scale. Cons: needs a schema; structured data only.
- WHY WE CHOSE DETERMINISTIC:
-   • Enterprise needs audit trails (compliance)
-   • 86.4% vs 20-40% is a category difference
-   • An LLM call per query is expensive at scale
-
- ## Domain-Enriched Proxy Architecture (Our Unique Approach)
-
- HyperMind uses a **schema-enriched deterministic planner**. Key difference: the LLM is OPTIONAL (summarization only).
-
- TRADITIONAL APPROACH (LangChain, AutoGPT, MCP): User question → LLM (no domain knowledge) → LLM generates query → tool call → results (often wrong). Hallucinated predicates, no schema validation, 20-40% success.
-
- OUR APPROACH (HyperMind):
-   Knowledge graph → SchemaContext (Γ), AS OBJECT (not a string!):
-     { classes: Set(['Claim', 'Provider']), properties: Map({ 'amount': {...} }), domains: Map({...}), ranges: Map({...}) }
-   User question + Γ → DETERMINISTIC PLANNER (no LLM!):
-     1. _analyzeIntent(prompt)                 // keyword matching (deterministic)
-     2. _generateSteps(intent, schemaContext)  // from schema
-     3. _generateSchemaSparql(intent, schema)  // schema-aware
-     4. validateQuery(sparql, schemaContext)   // type-check
-   → typed, validated execution plan → rust-kgdb execution (2.78 µs) → ProofDAG (audit trail) → results (86.4% accuracy)
-   LLM OPTIONAL: only for summarization (not query generation)
-
- **Key Insight**: The SchemaContext is an OBJECT passed to the deterministic planner, NOT a string injected into an LLM prompt. Query generation is deterministic, not LLM-dependent.
-
- ### Injection vs Proxy: The CMD Analogy
-
- Think of it like the evolution from DOS to modern shells:
-
- DOS/CMD ERA (classification approach): User: "copy files". System: ❌ "Bad command or file name". You MUST know the exact syntax (COPY C:\src\*.txt D:\dst\). No help, no context, no forgiveness.
-
- MODERN SHELL with AI (proxy approach): User: "copy all text files from src to dst". The proxy sees that /src/ has 47 .txt files and that /dst/ exists and is writable, generates `cp /src/*.txt /dst/`, and reports: ✅ Executed, 47 files copied. The PROXY knows your context and translates intent into exact commands.
-
- **HyperMind is the "modern shell" for knowledge graphs.** The SchemaContext is your "filesystem listing", injected so the LLM knows what actually exists before generating queries.
-
- ### The Beautiful Integration: Context Theory + Proof Theory
-
- HyperMind elegantly combines two mathematical foundations:
-
- CONTEXT THEORY (Spivak's Ologs): "what CAN be said". Your knowledge graph as a category:
-   Objects (classes): Claim, Provider, Policy
-   Morphisms (properties): Claim ──amount──► xsd:decimal; Claim ──provider──► Provider; Provider ──riskScore──► xsd:float
-   SchemaContext Γ = (Classes, Properties, Domains, Ranges)
-   Γ defines the "grammar" of valid statements. If it's not in Γ, it cannot be queried. Hallucination becomes IMPOSSIBLE.
-
-   (The schema is injected; the LLM generates a TYPED query.)
-
- PROOF THEORY (Curry-Howard): "how it WAS derived". Every answer has a PROOF (ProofDAG):
-   Conclusion: "Provider P001 is high-risk"
-     ├── Evidence: SPARQL returned riskScore = 0.87; derivation: Γ ⊢ ?p :riskScore ?r (type-checked)
-     ├── Evidence: Datalog rule matched "highRisk(?p)"; derivation: highRisk(P) :- riskScore(P,R), R>0.8
-     └── Hash: sha256:8f3a2b1c... (reproducible)
-   Proofs are PROGRAMS (the Curry-Howard correspondence):
-     • Γ ⊢ e : τ = "expression e has type τ in context Γ"
-     • Valid query = valid proof = executable program
-     • Same input → same proof → same output (deterministic)
-
- **The Elegance**:
- 1. **Context Theory** ensures you can ONLY ask valid questions (schema-bounded)
- 2. **Proof Theory** ensures every answer has a verifiable derivation chain
- 3. **Together**: questions are bounded by reality; answers are backed by proof
-
- This is why HyperMind achieves **86.4% accuracy** while vanilla LLMs achieve **0%** on structured-data tasks: it's not prompt engineering, it's **mathematical guarantees**.

Added (new lines 68-78):

+ ## Key Files
+
+ | File | Purpose |
+ |------|---------|
+ | `native/rust-kgdb-napi/src/lib.rs` | NAPI-RS Rust bindings (~700 lines) |
+ | `hypermind-agent.js` | HyperMind AI Framework (~4000 lines) |
+ | `index.js` | Platform loader + exports (~167 lines) |
+ | `index.d.ts` | TypeScript definitions (~425 lines) |
+ | `test-all-features.js` | 42 feature tests |
+ | `tests/*.test.ts` | Jest test suites (~170 tests) |
+ | `examples/` | Fraud detection, underwriting demos |

## Key APIs
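The "object, not string" distinction in the removed section is concrete enough to illustrate. The sketch below shows a SchemaContext-shaped object plus keyword-based intent matching in the spirit of `_analyzeIntent()`; the shapes follow the removed diagram, but this is an illustration, not the SDK's implementation.

```javascript
// Sketch: a SchemaContext-shaped object (shapes follow the removed diagram)
// and deterministic keyword intent matching in the spirit of _analyzeIntent().
const schemaContext = {
  classes: new Set(['Claim', 'Provider']),
  properties: new Map([
    ['amount', { domain: 'Claim', range: 'xsd:decimal' }],
    ['riskScore', { domain: 'Provider', range: 'xsd:float' }],
  ]),
}

function analyzeIntent(prompt) {
  const p = prompt.toLowerCase()
  return {
    risk: p.includes('high-risk'),          // "high-risk" → intent.risk
    query: /over|above|more than/.test(p),  // "claims over" → intent.query
    filter: /over \$?[\d,]+/.test(p),       // numeric threshold → intent.filter
  }
}

// Deterministic: the same prompt always yields the same intent object.
console.log(analyzeIntent('Find high-risk providers with claims over $10,000'))
// → { risk: true, query: true, filter: true }
```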
@@ -563,6 +90,22 @@ This is why HyperMind achieves **86.4% accuracy** while vanilla LLMs achieve **0

| **Pregel** | `pregelShortestPaths()` |
| **Factories** | `friendsGraph()`, `chainGraph()`, `starGraph()`, `completeGraph()`, `cycleGraph()` |

Added (new lines 93-108):

+ ## HyperMind Key Methods (hypermind-agent.js)
+
+ When modifying the HyperMind framework, these are the critical methods:
+
+ | Method | Line | Purpose |
+ |--------|------|---------|
+ | `_analyzeIntent()` | ~2286 | Deterministic keyword matching (NO LLM) |
+ | `_generateSteps()` | ~2297 | Schema-driven step generation |
+ | `_generateSchemaSparql()` | ~2368 | Schema-aware SPARQL generation |
+ | `SchemaContext` class | ~699 | Object with `classes: Set`, `properties: Map` |
+ | `WasmSandbox` class | ~2612 | Capability-based execution with audit log |
+ | `TOOL_REGISTRY` | ~1687 | Typed morphisms `Query → BindingSet` |
+ | `ProofDAG` class | ~2411 | Derivation chain with hash |
+
+ **Key Design Point**: The LLM is OPTIONAL; it is used only for summarization, NOT query generation. Query generation is deterministic from the SchemaContext.

## Rust Workspace Dependencies

Native addon depends on parent workspace crates:
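The `ProofDAG` row above pairs a derivation chain with a reproducibility hash. A minimal sketch using Node's built-in `node:crypto` follows; the field names mirror the diff's ProofDAG example, but this hashing scheme is an assumption, not taken from the SDK.

```javascript
// Sketch: a reproducible audit hash over an answer plus its derivations.
// Field names mirror the ProofDAG example in the diff; the scheme is assumed.
const { createHash } = require('node:crypto')

function proofHash(answer, derivations) {
  // JSON.stringify preserves insertion order, so identically built proof
  // objects serialize identically: same input → same hash.
  const canonical = JSON.stringify({ answer, derivations })
  return 'sha256:' + createHash('sha256').update(canonical).digest('hex')
}

const proof = {
  answer: 'Provider P001, P003 are high-risk',
  derivations: [{ tool: 'kg.sparql.query' }],
}
proof.hash = proofHash(proof.answer, proof.derivations)
// Same input + same schema → same query → same results → same hash.
```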
@@ -580,18 +123,6 @@ Native addon depends on parent workspace crates:

3. **Export**: Add to `module.exports` in `index.js`
4. **Tests**: Add test in `test-all-features.js`

Removed (old lines 583-594): the old "Key Files" section, superseded by the expanded table earlier in the file:

- ## Key Files
-
- | File | Purpose |
- |------|---------|
- | `native/rust-kgdb-napi/src/lib.rs` | NAPI-RS Rust bindings |
- | `hypermind-agent.js` | HyperMind AI Framework (~4000 lines) |
- | `index.js` | Platform loader + exports |
- | `index.d.ts` | TypeScript definitions |
- | `test-all-features.js` | 42 feature tests |
- | `tests/*.test.ts` | Jest test suites (~170 tests) |
- | `examples/` | Fraud detection, underwriting demos |

## Native Addon Files

Built addons (platform-specific):
@@ -601,7 +132,7 @@ Built addons (platform-specific):

## Version Management

- 1. Update version
+ 1. Update version: `npm version patch --no-git-tag-version`
2. Run tests: `npm test`
3. Publish: `npm publish`
4. Verify: `npm view rust-kgdb versions`
@@ -616,3 +147,23 @@ cd /path/to/rust-kgdb && cargo build --workspace --release

**Platform error**: Supported: darwin/linux (x64/arm64), win32 (x64)

Added (new lines 150-169):

+ ## Benchmark Information
+
+ For benchmark methodology and results, see:
+ - `HYPERMIND_BENCHMARK_REPORT.md`: full methodology, per-test results
+ - `vanilla-vs-hypermind-benchmark.js`: HyperMind vs vanilla LLM (JavaScript)
+ - `benchmark-frameworks.py`: compare Vanilla/LangChain/DSPy with and without schema (Python)
+ - `examples/fraud-detection-agent.js`: real-dataset example (line 612: `loadTtl`)
+ - `examples/underwriting-agent.js`: real-dataset example (line 766: `loadTtl`)
+
+ **Running Benchmarks**:
+ ```bash
+ # JavaScript benchmark (HyperMind vs vanilla on LUBM)
+ ANTHROPIC_API_KEY=... OPENAI_API_KEY=... node vanilla-vs-hypermind-benchmark.js
+
+ # Python benchmark (compare frameworks with/without schema)
+ OPENAI_API_KEY=... uv run --with openai --with langchain --with langchain-openai --with langchain-core --with dspy-ai python3 benchmark-frameworks.py
+ ```
+
+ **Key Result**: HyperMind achieves 86.4% accuracy on the LUBM benchmark (3,272 triples, 30 classes, 23 properties) where vanilla LLMs achieve 0%.