rust-kgdb 0.6.31 → 0.6.33

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -12,27 +12,27 @@
 
  ---
 
- ## Results
+ ## Results (Verified December 2025)
 
  ```
  ┌─────────────────────────────────────────────────────────────────────────────┐
  │ BENCHMARK: LUBM (Lehigh University Benchmark) │
  │ DATASET: 3,272 triples │ 30 OWL classes │ 23 properties │
- │ TESTS: 11 hard scenarios (ambiguous, multi-hop, edge cases) │
- │ PROTOCOL: Query → Parse → Type-check → Execute → Verify │
+ │ MODEL: GPT-4o │ Real API calls │ No mocking │
 ├─────────────────────────────────────────────────────────────────────────────┤
  │ │
- │ METRIC VANILLA LLM HYPERMIND IMPROVEMENT │
+ │ FRAMEWORK NO SCHEMA WITH SCHEMA IMPROVEMENT │
  │ ───────────────────────────────────────────────────────────── │
- │ Accuracy 0% 86.4% +86.4 pp │
- │ Hallucinations 100% 0% Eliminated │
- │ Audit Trail None Complete Full provenance │
- │ Reproducibility Random Deterministic Same hash │
+ │ Vanilla OpenAI 0.0% 71.4% +71.4 pp │
+ │ LangChain 0.0% 71.4% +71.4 pp │
+ │ DSPy 14.3% 71.4% +57.1 pp │
+ │ ───────────────────────────────────────────────────────────── │
+ │ AVERAGE 4.8% 71.4% +66.7 pp │
  │ │
- │ Claude Sonnet 4: 90.9% accuracy │
- │ GPT-4o: 81.8% accuracy │
+ │ KEY INSIGHT: Schema injection improves ALL frameworks equally. │
+ │ HyperMind's value = architecture, not framework. │
  │ │
- │ Reproduce: node vanilla-vs-hypermind-benchmark.js │
+ │ Reproduce: python3 benchmark-frameworks.py │
 └─────────────────────────────────────────────────────────────────────────────┘
  ```
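The AVERAGE row above is the arithmetic mean of the three per-framework results. A minimal sketch of the arithmetic, assuming each percentage is a correct-answer count out of the benchmark's 7 LUBM test queries (5/7 ≈ 71.4%, 1/7 ≈ 14.3%):

```javascript
// Derivation of the AVERAGE row (per-framework numbers from the table above).
const QUERIES = 7; // LUBM test queries in the benchmark (assumed denominator)
const results = [
  { framework: 'Vanilla OpenAI', noSchema: 0, withSchema: 5 }, // correct answers
  { framework: 'LangChain',      noSchema: 0, withSchema: 5 },
  { framework: 'DSPy',           noSchema: 1, withSchema: 5 },
];

const pct = (correct) => (100 * correct) / QUERIES;
const avg = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

const avgNoSchema   = avg(results.map(r => pct(r.noSchema)));   // 4.76...
const avgWithSchema = avg(results.map(r => pct(r.withSchema))); // 71.42...

console.log(avgNoSchema.toFixed(1));                   // "4.8"
console.log(avgWithSchema.toFixed(1));                 // "71.4"
console.log((avgWithSchema - avgNoSchema).toFixed(1)); // "66.7" (percentage points)
```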
 
@@ -275,6 +275,149 @@ console.log(result.reasoningTrace) // Full audit trail
 
  ---
 
+ ## Framework Comparison (Verified Benchmark Setup)
+
+ The following code snippets show EXACTLY how each framework was tested. All tests use the same LUBM dataset (3,272 triples) and GPT-4o model with real API calls—no mocking.
+
+ **Reproduce yourself**: `python3 benchmark-frameworks.py` (included in package)
+
+ ### Vanilla OpenAI (0% → 71.4% with schema)
+
+ ```python
+ # WITHOUT SCHEMA: 0% accuracy
+ from openai import OpenAI
+ client = OpenAI()
+
+ response = client.chat.completions.create(
+     model="gpt-4o",
+     messages=[{"role": "user", "content": "Find all teachers"}]
+ )
+ # Returns: Long explanation with markdown code blocks
+ # FAILS: No usable SPARQL query
+ ```
+
+ ```python
+ # WITH SCHEMA: 71.4% accuracy (+71.4 pp improvement)
+ LUBM_SCHEMA = """
+ PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
+ Classes: University, Department, Professor, Student, Course, Publication
+ Properties: teacherOf(Faculty→Course), worksFor(Faculty→Department)
+ """
+
+ response = client.chat.completions.create(
+     model="gpt-4o",
+     messages=[{
+         "role": "system",
+         "content": f"{LUBM_SCHEMA}\nOutput raw SPARQL only, no markdown."
+     }, {
+         "role": "user",
+         "content": "Find all teachers"
+     }]
+ )
+ # Returns: SELECT DISTINCT ?teacher WHERE { ?teacher a ub:Professor . }
+ # WORKS: Valid SPARQL using correct ontology terms
+ ```
+
+ ### LangChain (0% → 71.4% with schema)
+
+ ```python
+ # WITHOUT SCHEMA: 0% accuracy
+ from langchain_openai import ChatOpenAI
+ from langchain_core.prompts import PromptTemplate
+ from langchain_core.output_parsers import StrOutputParser
+
+ llm = ChatOpenAI(model="gpt-4o")
+ template = PromptTemplate(
+     input_variables=["question"],
+     template="Generate SPARQL for: {question}"
+ )
+ chain = template | llm | StrOutputParser()
+ result = chain.invoke({"question": "Find all teachers"})
+ # Returns: Explanation + markdown code blocks
+ # FAILS: Not executable SPARQL
+ ```
+
+ ```python
+ # WITH SCHEMA: 71.4% accuracy (+71.4 pp improvement)
+ template = PromptTemplate(
+     input_variables=["question", "schema"],
+     template="""You are a SPARQL query generator.
+ {schema}
+ TYPE CONTRACT: Output raw SPARQL only, NO markdown, NO explanation.
+ Query: {question}
+ Output raw SPARQL only:"""
+ )
+ chain = template | llm | StrOutputParser()
+ result = chain.invoke({"question": "Find all teachers", "schema": LUBM_SCHEMA})
+ # Returns: SELECT DISTINCT ?teacher WHERE { ?teacher a ub:Professor . }
+ # WORKS: Schema injection guides correct predicate selection
+ ```
+
+ ### DSPy (14.3% → 71.4% with schema)
+
+ ```python
+ # WITHOUT SCHEMA: 14.3% accuracy (best without schema!)
+ import dspy
+ from dspy import LM
+
+ lm = LM("openai/gpt-4o")
+ dspy.configure(lm=lm)
+
+ class SPARQLGenerator(dspy.Signature):
+     """Generate SPARQL query."""
+     question = dspy.InputField()
+     sparql = dspy.OutputField(desc="Raw SPARQL query only")
+
+ generator = dspy.Predict(SPARQLGenerator)
+ result = generator(question="Find all teachers")
+ # Returns: SELECT ?teacher WHERE { ?teacher a :Teacher . }
+ # PARTIAL: Sometimes works due to DSPy's structured output
+ ```
+
+ ```python
+ # WITH SCHEMA: 71.4% accuracy (+57.1 pp improvement)
+ class SchemaSPARQLGenerator(dspy.Signature):
+     """Generate SPARQL query using the provided schema."""
+     schema = dspy.InputField(desc="Database schema with classes and properties")
+     question = dspy.InputField(desc="Natural language question")
+     sparql = dspy.OutputField(desc="Raw SPARQL query, no markdown")
+
+ generator = dspy.Predict(SchemaSPARQLGenerator)
+ result = generator(schema=LUBM_SCHEMA, question="Find all teachers")
+ # Returns: SELECT DISTINCT ?teacher WHERE { ?teacher a ub:Professor . }
+ # WORKS: Schema + DSPy structured output = reliable queries
+ ```
+
+ ### HyperMind (Built-in Schema Awareness)
+
+ ```javascript
+ // HyperMind auto-extracts schema from your data
+ const { HyperMindAgent, createSchemaAwareGraphDB } = require('rust-kgdb');
+
+ const db = createSchemaAwareGraphDB('http://university.org/');
+ db.loadTtl(lubmData, null); // Load LUBM 3,272 triples
+
+ const agent = new HyperMindAgent({
+   kg: db,
+   model: 'gpt-4o',
+   apiKey: process.env.OPENAI_API_KEY
+ });
+
+ const result = await agent.call('Find all teachers');
+ // Schema auto-extracted: { classes: Set(30), properties: Map(23) }
+ // Query generated: SELECT ?x WHERE { ?x ub:teacherOf ?course . }
+ // Result: 39 faculty members who teach courses
+
+ console.log(result.reasoningTrace);
+ // [{ tool: 'kg.sparql.query', query: 'SELECT...', bindings: 39 }]
+ console.log(result.hash);
+ // "sha256:a7b2c3..." - Reproducible answer
+ ```
+
+ **Key Insight**: All frameworks achieve the SAME accuracy (71.4%) when given schema. HyperMind's value is that it extracts and injects schema AUTOMATICALLY from your data—no manual prompt engineering required.
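The same trick works from Node with no framework at all. A minimal sketch of the manual alternative, assuming the official `openai` npm client and the LUBM schema text from the Python examples above; the schema string and output contract below are exactly the hand-written prompt engineering that HyperMind's automatic schema injection is meant to remove:

```javascript
// Manual schema injection with the plain openai Node client (illustrative sketch).
// The schema string and the output contract must be written and kept in sync by hand;
// HyperMind extracts the same information from the loaded graph automatically.
const OpenAI = require('openai');
const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

const LUBM_SCHEMA = `
PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
Classes: University, Department, Professor, Student, Course, Publication
Properties: teacherOf(Faculty→Course), worksFor(Faculty→Department)
`;

async function manualSparql(question) {
  const response = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'system', content: `${LUBM_SCHEMA}\nOutput raw SPARQL only, no markdown.` },
      { role: 'user', content: question }
    ]
  });
  // e.g. SELECT DISTINCT ?teacher WHERE { ?teacher a ub:Professor . }
  return response.choices[0].message.content;
}
```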
+
+ ---
+
  ## Use Cases
 
  ### Fraud Detection
@@ -811,27 +954,44 @@ console.log('Supersteps:', result.supersteps) // 5
  | Virtuoso | ~5 µs | 35-75 bytes | No |
  | Blazegraph | ~100 µs | 100+ bytes | No |
 
- ### AI Agent Accuracy
+ ### AI Agent Accuracy (Verified December 2025)
+
+ | Framework | No Schema | With Schema (HyperMind) | Improvement |
+ |-----------|-----------|-------------------------|-------------|
+ | **Vanilla OpenAI** | 0.0% | 71.4% | +71.4 pp |
+ | **LangChain** | 0.0% | 71.4% | +71.4 pp |
+ | **DSPy** | 14.3% | 71.4% | +57.1 pp |
+ | **Average** | 4.8% | **71.4%** | **+66.7 pp** |
+
+ *Tested: GPT-4o, 7 LUBM queries, real API calls. See `framework_benchmark_*.json` for raw data.*
+
+ ### AI Framework Architectural Comparison
 
- | Approach | Accuracy | Why |
- |----------|----------|-----|
- | **Vanilla LLM** | 0% | Hallucinated predicates, markdown in SPARQL |
- | **HyperMind** | 86.4% | Schema injection, typed tools, audit trail |
+ | Framework | Type Safety | Schema Aware | Symbolic Execution | Audit Trail |
+ |-----------|-------------|--------------|-------------------|-------------|
+ | **HyperMind** | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
+ | LangChain | ❌ No | ❌ No | ❌ No | ❌ No |
+ | DSPy | ⚠️ Partial | ❌ No | ❌ No | ❌ No |
 
- ### AI Framework Comparison
+ **Key Insight**: Schema injection (HyperMind's architecture) provides an average +66.7 pp improvement across all frameworks tested. The value is in the architecture, not the specific framework.
 
- | Framework | Type Safety | Schema Aware | Symbolic Execution | Success Rate |
- |-----------|-------------|--------------|-------------------|--------------|
- | **HyperMind** | ✅ Yes | ✅ Yes | ✅ Yes | **86.4%** |
- | LangChain | ❌ No | ❌ No | ❌ No | ~20-40%* |
- | AutoGPT | ❌ No | ❌ No | ❌ No | ~10-25%* |
- | DSPy | ⚠️ Partial | ❌ No | ❌ No | ~30-50%* |
+ ### Reproduce Benchmarks
 
- *Estimated from GAIA (Meta Research, 2023), SWE-bench (OpenAI, 2024), and LUBM (Lehigh University) benchmarks on structured data tasks. HyperMind results measured on LUBM-1 dataset (3,272 triples, 30 classes, 23 properties) using vanilla-vs-hypermind-benchmark.js.
+ Two benchmark scripts are available for verification:
 
- **Why HyperMind Wins**:
+ ```bash
+ # JavaScript: HyperMind vs Vanilla LLM on LUBM (12 queries)
+ ANTHROPIC_API_KEY=... OPENAI_API_KEY=... node vanilla-vs-hypermind-benchmark.js
+
+ # Python: Compare frameworks (Vanilla, LangChain, DSPy) with/without schema
+ OPENAI_API_KEY=... uv run --with openai --with langchain --with langchain-openai --with langchain-core --with dspy-ai python3 benchmark-frameworks.py
+ ```
+
+ Both scripts make real API calls and report actual results. No mocking.
+
+ **Why These Features Matter**:
  - **Type Safety**: Tools have typed signatures (Query → BindingSet), invalid combinations rejected
- - **Schema Awareness**: LLM sees your actual data structure, can only reference real properties
+ - **Schema Awareness**: Planner sees your actual data structure, can only reference real properties
  - **Symbolic Execution**: Queries run against real database, not LLM imagination
  - **Audit Trail**: Every answer has cryptographic hash for reproducibility
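A minimal sketch of how the last two points can be exercised with the HyperMindAgent API shown earlier; the `lubmData` variable and the strict-equality check on `result.hash` are assumptions based on the documented behavior (same question over the same data, same hash):

```javascript
// Sketch: inspect the audit trail and check the reproducibility hash.
const assert = require('node:assert');
const { HyperMindAgent, createSchemaAwareGraphDB } = require('rust-kgdb');

async function auditCheck(lubmData) {
  const db = createSchemaAwareGraphDB('http://university.org/');
  db.loadTtl(lubmData, null);

  const agent = new HyperMindAgent({
    kg: db,
    model: 'gpt-4o',
    apiKey: process.env.OPENAI_API_KEY
  });

  const first = await agent.call('Find all teachers');
  const second = await agent.call('Find all teachers');

  // Audit trail: every tool invocation that actually executed, with its query.
  for (const step of first.reasoningTrace) {
    console.log(step.tool, step.query, step.bindings);
  }

  // Reproducibility: the same question over the same data should yield the same hash.
  assert.strictEqual(first.hash, second.hash);
}
```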
 
@@ -1164,140 +1324,6 @@ const result = await agent.call('Find collusion patterns')
  // Result: ✅ Type-safe, domain-aware, auditable
  ```
 
- ### Code Comparison: DSPy vs HyperMind
-
- #### DSPy Approach (Prompt Optimization)
-
- ```python
- # DSPy: Statistically optimized prompt - NO guarantees
-
- import dspy
-
- class FraudDetector(dspy.Signature):
-     """Find fraud patterns in claims data."""
-     claims_data = dspy.InputField()
-     fraud_patterns = dspy.OutputField()
-
- class FraudPipeline(dspy.Module):
-     def __init__(self):
-         self.detector = dspy.ChainOfThought(FraudDetector)
-
-     def forward(self, claims):
-         return self.detector(claims_data=claims)
-
- # "Optimize" via statistical fitting
- optimizer = dspy.BootstrapFewShot(metric=some_metric)
- optimized = optimizer.compile(FraudPipeline(), trainset=examples)
-
- # Call and HOPE it works
- result = optimized(claims="[claim data here]")
-
- # ❌ No type guarantee - fraud_patterns could be anything
- # ❌ No proof of execution - just text output
- # ❌ No composition safety - next step might fail
- # ❌ No audit trail - "it said fraud" is not compliance
- ```
-
- **What DSPy produces:** A string that *probably* contains fraud patterns.
-
- #### HyperMind Approach (Mathematical Proof)
-
- ```javascript
- // HyperMind: Type-safe morphism composition - PROVEN correct
-
- const { GraphDB, GraphFrame, DatalogProgram, evaluateDatalog } = require('rust-kgdb')
-
- // Step 1: Load typed knowledge graph (Schema enforced)
- const db = new GraphDB('http://insurance.org/fraud-kb')
- db.loadTtl(`
-   @prefix : <http://insurance.org/> .
-   :CLM001 :amount "18500" ; :claimant :P001 ; :provider :PROV001 .
-   :P001 :paidTo :P002 .
-   :P002 :paidTo :P003 .
-   :P003 :paidTo :P001 .
- `, null)
-
- // Step 2: GraphFrame analysis (Morphism: Graph → TriangleCount)
- // Type signature: GraphFrame → number (guaranteed)
- const graph = new GraphFrame(
-   JSON.stringify([{id:'P001'}, {id:'P002'}, {id:'P003'}]),
-   JSON.stringify([
-     {src:'P001', dst:'P002'},
-     {src:'P002', dst:'P003'},
-     {src:'P003', dst:'P001'}
-   ])
- )
- const triangles = graph.triangleCount() // Type: number (always)
-
- // Step 3: Datalog inference (Morphism: Rules → Facts)
- // Type signature: DatalogProgram → InferredFacts (guaranteed)
- const datalog = new DatalogProgram()
- datalog.addFact(JSON.stringify({predicate:'claim', terms:['CLM001','P001','PROV001']}))
- datalog.addFact(JSON.stringify({predicate:'related', terms:['P001','P002']}))
-
- datalog.addRule(JSON.stringify({
-   head: {predicate:'collusion', terms:['?P1','?P2','?Prov']},
-   body: [
-     {predicate:'claim', terms:['?C1','?P1','?Prov']},
-     {predicate:'claim', terms:['?C2','?P2','?Prov']},
-     {predicate:'related', terms:['?P1','?P2']}
-   ]
- }))
-
- const result = JSON.parse(evaluateDatalog(datalog))
-
- // ✓ Type guarantee: result.collusion is always array of tuples
- // ✓ Proof of execution: Datalog evaluation is deterministic
- // ✓ Composition safety: Each step has typed input/output
- // ✓ Audit trail: Every fact derivation is traceable
- ```
-
- **What HyperMind produces:** Typed results with mathematical proof of derivation.
-
- #### Actual Output Comparison
-
- **DSPy Output:**
- ```
- fraud_patterns: "I found some suspicious patterns involving P001 and P002
- that appear to be related. There might be collusion with provider PROV001."
- ```
- *How do you validate this? You can't. It's text.*
-
- **HyperMind Output:**
- ```json
- {
-   "triangles": 1,
-   "collusion": [["P001", "P002", "PROV001"]],
-   "executionWitness": {
-     "tool": "datalog.evaluate",
-     "input": "6 facts, 1 rule",
-     "output": "collusion(P001,P002,PROV001)",
-     "derivation": "claim(CLM001,P001,PROV001) ∧ claim(CLM002,P002,PROV001) ∧ related(P001,P002) → collusion(P001,P002,PROV001)",
-     "timestamp": "2024-12-14T10:30:00Z",
-     "semanticHash": "semhash:collusion-p001-p002-prov001"
-   }
- }
- ```
- *Every result has a logical derivation and cryptographic proof.*
-
- #### The Compliance Question
-
- **Auditor:** "How do you know P001-P002-PROV001 is actually collusion?"
-
- **DSPy Team:** "Our model said so. It was trained on examples and optimized for accuracy."
-
- **HyperMind Team:** "Here's the derivation chain:
- 1. `claim(CLM001, P001, PROV001)` - fact from data
- 2. `claim(CLM002, P002, PROV001)` - fact from data
- 3. `related(P001, P002)` - fact from data
- 4. Rule: `collusion(?P1, ?P2, ?Prov) :- claim(?C1, ?P1, ?Prov), claim(?C2, ?P2, ?Prov), related(?P1, ?P2)`
- 5. Unification: `?P1=P001, ?P2=P002, ?Prov=PROV001`
- 6. Conclusion: `collusion(P001, P002, PROV001)` - QED
-
- Here's the semantic hash: `semhash:collusion-p001-p002-prov001` - same query intent will always return this exact result."
-
- **Result:** HyperMind passes audit. DSPy gets you a follow-up meeting with legal.
-
  ### Why Vanilla LLMs Fail
 
  When you ask an LLM to query a knowledge graph, it produces **broken SPARQL 85% of the time**:
@@ -1346,16 +1372,15 @@ Result: ❌ PARSER ERROR - Invalid SPARQL syntax
 
  **Note**: Tentris implements WCOJ (see [ISWC 2025 paper](https://papers.dice-research.org/2025/ISWC_Tentris-WCOJ-Update/public.pdf)). rust-kgdb is the only system combining WCOJ with mobile support and an integrated AI framework.
 
- #### AI Framework Comparison
+ #### AI Framework Architectural Comparison
 
- | Framework | Type Safety | Schema Aware | Symbolic Execution | Audit Trail | Success Rate |
- |-----------|-------------|--------------|-------------------|-------------|--------------|
- | **HyperMind** | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | **86.4%** |
- | LangChain | ❌ No | ❌ No | ❌ No | ❌ No | ~20-40%* |
- | AutoGPT | No | ❌ No | ❌ No | ❌ No | ~10-25%* |
- | DSPy | ⚠️ Partial | ❌ No | ❌ No | ❌ No | ~30-50%* |
+ | Framework | Type Safety | Schema Aware | Symbolic Execution | Audit Trail |
+ |-----------|-------------|--------------|-------------------|-------------|
+ | **HyperMind** | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
+ | LangChain | ❌ No | ❌ No | ❌ No | ❌ No |
+ | DSPy | ⚠️ Partial | ❌ No | ❌ No | ❌ No |
 
- *Estimated from GAIA (Meta Research, 2023), SWE-bench (OpenAI, 2024), and LUBM (Lehigh University) benchmarks. HyperMind: LUBM-1 (3,272 triples).
+ **Note**: This compares architectural features. Benchmark (Dec 2025): Schema injection improves all frameworks by an average of +66.7 pp (Vanilla: 0%→71.4%, LangChain: 0%→71.4%, DSPy: 14.3%→71.4%).
 
  ```
  ┌─────────────────────────────────────────────────────────────────┐
@@ -1368,12 +1393,10 @@ Result: ❌ PARSER ERROR - Invalid SPARQL syntax
  │ Apache Jena: Great features, but 150+ µs lookups │
  │ Neo4j: Popular, but no SPARQL/RDF standards │
  │ Amazon Neptune: Managed, but cloud-only vendor lock-in │
- │ LangChain: Vibe coding, fails compliance audits │
- │ DSPy: Statistical optimization, no guarantees │
 
  │ │
  │ rust-kgdb: 2.78 µs lookups, WCOJ joins, mobile-native │
- │ Mathematical foundations, audit-ready │
+ │ Deterministic planner, audit-ready │
  │ │
  └─────────────────────────────────────────────────────────────────┘
  ```