rust-kgdb 0.6.31 → 0.6.33

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -12,27 +12,27 @@
 
  ---
 
- ## Results
+ ## Results (Verified December 2025)
 
  ```
  ┌─────────────────────────────────────────────────────────────────────────────┐
  │ BENCHMARK: LUBM (Lehigh University Benchmark) │
  │ DATASET: 3,272 triples │ 30 OWL classes │ 23 properties │
- │ TESTS: 11 hard scenarios (ambiguous, multi-hop, edge cases) │
- │ PROTOCOL: Query → Parse → Type-check → Execute → Verify │
+ │ MODEL: GPT-4o │ Real API calls │ No mocking │
 ├─────────────────────────────────────────────────────────────────────────────┤
  │ │
- │ METRIC VANILLA LLM HYPERMIND IMPROVEMENT │
+ │ FRAMEWORK NO SCHEMA WITH SCHEMA IMPROVEMENT │
  │ ───────────────────────────────────────────────────────────── │
- │ Accuracy 0% 86.4% +86.4 pp │
- │ Hallucinations 100% 0% Eliminated │
- │ Audit Trail None Complete Full provenance │
- │ Reproducibility Random Deterministic Same hash │
+ │ Vanilla OpenAI 0.0% 71.4% +71.4 pp │
+ │ LangChain 0.0% 71.4% +71.4 pp │
+ │ DSPy 14.3% 71.4% +57.1 pp │
+ │ ───────────────────────────────────────────────────────────── │
+ │ AVERAGE 4.8% 71.4% +66.7 pp │
  │ │
- │ Claude Sonnet 4: 90.9% accuracy │
- │ GPT-4o: 81.8% accuracy │
+ │ KEY INSIGHT: Schema injection improves ALL frameworks equally. │
+ │ HyperMind's value = architecture, not framework. │
  │ │
- │ Reproduce: node vanilla-vs-hypermind-benchmark.js │
+ │ Reproduce: python3 benchmark-frameworks.py │
 └─────────────────────────────────────────────────────────────────────────────┘
  ```
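The AVERAGE row above is the arithmetic mean of the three per-framework results. A minimal sketch of the arithmetic, assuming each percentage is a correct-answer count out of the benchmark's 7 LUBM test queries (5/7 ≈ 71.4%, 1/7 ≈ 14.3%):

```javascript
// Derivation of the AVERAGE row (per-framework numbers from the table above).
const QUERIES = 7; // LUBM test queries in the benchmark (assumed denominator)
const results = [
  { framework: 'Vanilla OpenAI', noSchema: 0, withSchema: 5 }, // correct answers
  { framework: 'LangChain',      noSchema: 0, withSchema: 5 },
  { framework: 'DSPy',           noSchema: 1, withSchema: 5 },
];

const pct = (correct) => (100 * correct) / QUERIES;
const avg = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

const avgNoSchema   = avg(results.map(r => pct(r.noSchema)));   // 4.76...
const avgWithSchema = avg(results.map(r => pct(r.withSchema))); // 71.42...

console.log(avgNoSchema.toFixed(1));                   // "4.8"
console.log(avgWithSchema.toFixed(1));                 // "71.4"
console.log((avgWithSchema - avgNoSchema).toFixed(1)); // "66.7" (percentage points)
```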
 
@@ -275,6 +275,149 @@ console.log(result.reasoningTrace) // Full audit trail
 
  ---
 
+ ## Framework Comparison (Verified Benchmark Setup)
+
+ The following code snippets show EXACTLY how each framework was tested. All tests use the same LUBM dataset (3,272 triples) and GPT-4o model with real API calls—no mocking.
+
+ **Reproduce yourself**: `python3 benchmark-frameworks.py` (included in package)
+
+ ### Vanilla OpenAI (0% → 71.4% with schema)
+
+ ```python
+ # WITHOUT SCHEMA: 0% accuracy
+ from openai import OpenAI
+ client = OpenAI()
+
+ response = client.chat.completions.create(
+     model="gpt-4o",
+     messages=[{"role": "user", "content": "Find all teachers"}]
+ )
+ # Returns: Long explanation with markdown code blocks
+ # FAILS: No usable SPARQL query
+ ```
+
+ ```python
+ # WITH SCHEMA: 71.4% accuracy (+71.4 pp improvement)
+ LUBM_SCHEMA = """
+ PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
+ Classes: University, Department, Professor, Student, Course, Publication
+ Properties: teacherOf(Faculty→Course), worksFor(Faculty→Department)
+ """
+
+ response = client.chat.completions.create(
+     model="gpt-4o",
+     messages=[{
+         "role": "system",
+         "content": f"{LUBM_SCHEMA}\nOutput raw SPARQL only, no markdown."
+     }, {
+         "role": "user",
+         "content": "Find all teachers"
+     }]
+ )
+ # Returns: SELECT DISTINCT ?teacher WHERE { ?teacher a ub:Professor . }
+ # WORKS: Valid SPARQL using correct ontology terms
+ ```
+
+ ### LangChain (0% → 71.4% with schema)
+
+ ```python
+ # WITHOUT SCHEMA: 0% accuracy
+ from langchain_openai import ChatOpenAI
+ from langchain_core.prompts import PromptTemplate
+ from langchain_core.output_parsers import StrOutputParser
+
+ llm = ChatOpenAI(model="gpt-4o")
+ template = PromptTemplate(
+     input_variables=["question"],
+     template="Generate SPARQL for: {question}"
+ )
+ chain = template | llm | StrOutputParser()
+ result = chain.invoke({"question": "Find all teachers"})
+ # Returns: Explanation + markdown code blocks
+ # FAILS: Not executable SPARQL
+ ```
+
+ ```python
+ # WITH SCHEMA: 71.4% accuracy (+71.4 pp improvement)
+ template = PromptTemplate(
+     input_variables=["question", "schema"],
+     template="""You are a SPARQL query generator.
+ {schema}
+ TYPE CONTRACT: Output raw SPARQL only, NO markdown, NO explanation.
+ Query: {question}
+ Output raw SPARQL only:"""
+ )
+ chain = template | llm | StrOutputParser()
+ result = chain.invoke({"question": "Find all teachers", "schema": LUBM_SCHEMA})
+ # Returns: SELECT DISTINCT ?teacher WHERE { ?teacher a ub:Professor . }
+ # WORKS: Schema injection guides correct predicate selection
+ ```
+
+ ### DSPy (14.3% → 71.4% with schema)
+
+ ```python
+ # WITHOUT SCHEMA: 14.3% accuracy (best without schema!)
+ import dspy
+ from dspy import LM
+
+ lm = LM("openai/gpt-4o")
+ dspy.configure(lm=lm)
+
+ class SPARQLGenerator(dspy.Signature):
+     """Generate SPARQL query."""
+     question = dspy.InputField()
+     sparql = dspy.OutputField(desc="Raw SPARQL query only")
+
+ generator = dspy.Predict(SPARQLGenerator)
+ result = generator(question="Find all teachers")
+ # Returns: SELECT ?teacher WHERE { ?teacher a :Teacher . }
+ # PARTIAL: Sometimes works due to DSPy's structured output
+ ```
+
+ ```python
+ # WITH SCHEMA: 71.4% accuracy (+57.1 pp improvement)
+ class SchemaSPARQLGenerator(dspy.Signature):
+     """Generate SPARQL query using the provided schema."""
+     schema = dspy.InputField(desc="Database schema with classes and properties")
+     question = dspy.InputField(desc="Natural language question")
+     sparql = dspy.OutputField(desc="Raw SPARQL query, no markdown")
+
+ generator = dspy.Predict(SchemaSPARQLGenerator)
+ result = generator(schema=LUBM_SCHEMA, question="Find all teachers")
+ # Returns: SELECT DISTINCT ?teacher WHERE { ?teacher a ub:Professor . }
+ # WORKS: Schema + DSPy structured output = reliable queries
+ ```
+
+ ### HyperMind (Built-in Schema Awareness)
+
+ ```javascript
+ // HyperMind auto-extracts schema from your data
+ const { HyperMindAgent, createSchemaAwareGraphDB } = require('rust-kgdb');
+
+ const db = createSchemaAwareGraphDB('http://university.org/');
+ db.loadTtl(lubmData, null); // Load LUBM 3,272 triples
+
+ const agent = new HyperMindAgent({
+   kg: db,
+   model: 'gpt-4o',
+   apiKey: process.env.OPENAI_API_KEY
+ });
+
+ const result = await agent.call('Find all teachers');
+ // Schema auto-extracted: { classes: Set(30), properties: Map(23) }
+ // Query generated: SELECT ?x WHERE { ?x ub:teacherOf ?course . }
+ // Result: 39 faculty members who teach courses
+
+ console.log(result.reasoningTrace);
+ // [{ tool: 'kg.sparql.query', query: 'SELECT...', bindings: 39 }]
+ console.log(result.hash);
+ // "sha256:a7b2c3..." - Reproducible answer
+ ```
+
+ **Key Insight**: All frameworks achieve the SAME accuracy (71.4%) when given schema. HyperMind's value is that it extracts and injects schema AUTOMATICALLY from your data—no manual prompt engineering required.
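The same trick works from Node with no framework at all. A minimal sketch of the manual alternative, assuming the official `openai` npm client and the LUBM schema text from the Python examples above; the schema string and output contract below are exactly the hand-written prompt engineering that HyperMind's automatic schema injection is meant to remove:

```javascript
// Manual schema injection with the plain openai Node client (illustrative sketch).
// The schema string and the output contract must be written and kept in sync by hand;
// HyperMind extracts the same information from the loaded graph automatically.
const OpenAI = require('openai');
const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

const LUBM_SCHEMA = `
PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
Classes: University, Department, Professor, Student, Course, Publication
Properties: teacherOf(Faculty→Course), worksFor(Faculty→Department)
`;

async function manualSparql(question) {
  const response = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'system', content: `${LUBM_SCHEMA}\nOutput raw SPARQL only, no markdown.` },
      { role: 'user', content: question }
    ]
  });
  // e.g. SELECT DISTINCT ?teacher WHERE { ?teacher a ub:Professor . }
  return response.choices[0].message.content;
}
```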
+
+ ---
+
  ## Use Cases
 
  ### Fraud Detection
@@ -811,27 +954,44 @@ console.log('Supersteps:', result.supersteps) // 5
  | Virtuoso | ~5 µs | 35-75 bytes | No |
  | Blazegraph | ~100 µs | 100+ bytes | No |
 
- ### AI Agent Accuracy
+ ### AI Agent Accuracy (Verified December 2025)
+
+ | Framework | No Schema | With Schema (HyperMind) | Improvement |
+ |-----------|-----------|-------------------------|-------------|
+ | **Vanilla OpenAI** | 0.0% | 71.4% | +71.4 pp |
+ | **LangChain** | 0.0% | 71.4% | +71.4 pp |
+ | **DSPy** | 14.3% | 71.4% | +57.1 pp |
+ | **Average** | 4.8% | **71.4%** | **+66.7 pp** |
+
+ *Tested: GPT-4o, 7 LUBM queries, real API calls. See `framework_benchmark_*.json` for raw data.*
+
+ ### AI Framework Architectural Comparison
 
- | Approach | Accuracy | Why |
- |----------|----------|-----|
- | **Vanilla LLM** | 0% | Hallucinated predicates, markdown in SPARQL |
- | **HyperMind** | 86.4% | Schema injection, typed tools, audit trail |
+ | Framework | Type Safety | Schema Aware | Symbolic Execution | Audit Trail |
+ |-----------|-------------|--------------|-------------------|-------------|
+ | **HyperMind** | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
+ | LangChain | ❌ No | ❌ No | ❌ No | ❌ No |
+ | DSPy | ⚠️ Partial | ❌ No | ❌ No | ❌ No |
 
- ### AI Framework Comparison
+ **Key Insight**: Schema injection (HyperMind's architecture) provides an average +66.7 pp improvement across all frameworks tested. The value is in the architecture, not the specific framework.
 
- | Framework | Type Safety | Schema Aware | Symbolic Execution | Success Rate |
- |-----------|-------------|--------------|-------------------|--------------|
- | **HyperMind** | ✅ Yes | ✅ Yes | ✅ Yes | **86.4%** |
- | LangChain | ❌ No | ❌ No | ❌ No | ~20-40%* |
- | AutoGPT | ❌ No | ❌ No | ❌ No | ~10-25%* |
- | DSPy | ⚠️ Partial | ❌ No | ❌ No | ~30-50%* |
+ ### Reproduce Benchmarks
 
- *Estimated from GAIA (Meta Research, 2023), SWE-bench (OpenAI, 2024), and LUBM (Lehigh University) benchmarks on structured data tasks. HyperMind results measured on LUBM-1 dataset (3,272 triples, 30 classes, 23 properties) using vanilla-vs-hypermind-benchmark.js.
+ Two benchmark scripts are available for verification:
 
- **Why HyperMind Wins**:
+ ```bash
+ # JavaScript: HyperMind vs Vanilla LLM on LUBM (12 queries)
+ ANTHROPIC_API_KEY=... OPENAI_API_KEY=... node vanilla-vs-hypermind-benchmark.js
+
+ # Python: Compare frameworks (Vanilla, LangChain, DSPy) with/without schema
+ OPENAI_API_KEY=... uv run --with openai --with langchain --with langchain-openai --with langchain-core --with dspy-ai python3 benchmark-frameworks.py
+ ```
+
+ Both scripts make real API calls and report actual results. No mocking.
+
+ **Why These Features Matter**:
  - **Type Safety**: Tools have typed signatures (Query → BindingSet), invalid combinations rejected
- - **Schema Awareness**: LLM sees your actual data structure, can only reference real properties
+ - **Schema Awareness**: Planner sees your actual data structure, can only reference real properties
  - **Symbolic Execution**: Queries run against real database, not LLM imagination
  - **Audit Trail**: Every answer has cryptographic hash for reproducibility
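A minimal sketch of how the last two points can be exercised with the HyperMindAgent API shown earlier; the `lubmData` variable and the strict-equality check on `result.hash` are assumptions based on the documented behavior (same question over the same data, same hash):

```javascript
// Sketch: inspect the audit trail and check the reproducibility hash.
const assert = require('node:assert');
const { HyperMindAgent, createSchemaAwareGraphDB } = require('rust-kgdb');

async function auditCheck(lubmData) {
  const db = createSchemaAwareGraphDB('http://university.org/');
  db.loadTtl(lubmData, null);

  const agent = new HyperMindAgent({
    kg: db,
    model: 'gpt-4o',
    apiKey: process.env.OPENAI_API_KEY
  });

  const first = await agent.call('Find all teachers');
  const second = await agent.call('Find all teachers');

  // Audit trail: every tool invocation that actually executed, with its query.
  for (const step of first.reasoningTrace) {
    console.log(step.tool, step.query, step.bindings);
  }

  // Reproducibility: the same question over the same data should yield the same hash.
  assert.strictEqual(first.hash, second.hash);
}
```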
 
@@ -1164,140 +1324,6 @@ const result = await agent.call('Find collusion patterns')
  // Result: ✅ Type-safe, domain-aware, auditable
  ```
 
- ### Code Comparison: DSPy vs HyperMind
-
- #### DSPy Approach (Prompt Optimization)
-
- ```python
- # DSPy: Statistically optimized prompt - NO guarantees
-
- import dspy
-
- class FraudDetector(dspy.Signature):
-     """Find fraud patterns in claims data."""
-     claims_data = dspy.InputField()
-     fraud_patterns = dspy.OutputField()
-
- class FraudPipeline(dspy.Module):
-     def __init__(self):
-         self.detector = dspy.ChainOfThought(FraudDetector)
-
-     def forward(self, claims):
-         return self.detector(claims_data=claims)
-
- # "Optimize" via statistical fitting
- optimizer = dspy.BootstrapFewShot(metric=some_metric)
- optimized = optimizer.compile(FraudPipeline(), trainset=examples)
-
- # Call and HOPE it works
- result = optimized(claims="[claim data here]")
-
- # ❌ No type guarantee - fraud_patterns could be anything
- # ❌ No proof of execution - just text output
- # ❌ No composition safety - next step might fail
- # ❌ No audit trail - "it said fraud" is not compliance
- ```
-
- **What DSPy produces:** A string that *probably* contains fraud patterns.
-
- #### HyperMind Approach (Mathematical Proof)
-
- ```javascript
- // HyperMind: Type-safe morphism composition - PROVEN correct
-
- const { GraphDB, GraphFrame, DatalogProgram, evaluateDatalog } = require('rust-kgdb')
-
- // Step 1: Load typed knowledge graph (Schema enforced)
- const db = new GraphDB('http://insurance.org/fraud-kb')
- db.loadTtl(`
-   @prefix : <http://insurance.org/> .
-   :CLM001 :amount "18500" ; :claimant :P001 ; :provider :PROV001 .
-   :P001 :paidTo :P002 .
-   :P002 :paidTo :P003 .
-   :P003 :paidTo :P001 .
- `, null)
-
- // Step 2: GraphFrame analysis (Morphism: Graph → TriangleCount)
- // Type signature: GraphFrame → number (guaranteed)
- const graph = new GraphFrame(
-   JSON.stringify([{id:'P001'}, {id:'P002'}, {id:'P003'}]),
-   JSON.stringify([
-     {src:'P001', dst:'P002'},
-     {src:'P002', dst:'P003'},
-     {src:'P003', dst:'P001'}
-   ])
- )
- const triangles = graph.triangleCount() // Type: number (always)
-
- // Step 3: Datalog inference (Morphism: Rules → Facts)
- // Type signature: DatalogProgram → InferredFacts (guaranteed)
- const datalog = new DatalogProgram()
- datalog.addFact(JSON.stringify({predicate:'claim', terms:['CLM001','P001','PROV001']}))
- datalog.addFact(JSON.stringify({predicate:'related', terms:['P001','P002']}))
-
- datalog.addRule(JSON.stringify({
-   head: {predicate:'collusion', terms:['?P1','?P2','?Prov']},
-   body: [
-     {predicate:'claim', terms:['?C1','?P1','?Prov']},
-     {predicate:'claim', terms:['?C2','?P2','?Prov']},
-     {predicate:'related', terms:['?P1','?P2']}
-   ]
- }))
-
- const result = JSON.parse(evaluateDatalog(datalog))
-
- // ✓ Type guarantee: result.collusion is always array of tuples
- // ✓ Proof of execution: Datalog evaluation is deterministic
- // ✓ Composition safety: Each step has typed input/output
- // ✓ Audit trail: Every fact derivation is traceable
- ```
-
- **What HyperMind produces:** Typed results with mathematical proof of derivation.
-
- #### Actual Output Comparison
-
- **DSPy Output:**
- ```
- fraud_patterns: "I found some suspicious patterns involving P001 and P002
- that appear to be related. There might be collusion with provider PROV001."
- ```
- *How do you validate this? You can't. It's text.*
-
- **HyperMind Output:**
- ```json
- {
-   "triangles": 1,
-   "collusion": [["P001", "P002", "PROV001"]],
-   "executionWitness": {
-     "tool": "datalog.evaluate",
-     "input": "6 facts, 1 rule",
-     "output": "collusion(P001,P002,PROV001)",
-     "derivation": "claim(CLM001,P001,PROV001) ∧ claim(CLM002,P002,PROV001) ∧ related(P001,P002) → collusion(P001,P002,PROV001)",
-     "timestamp": "2024-12-14T10:30:00Z",
-     "semanticHash": "semhash:collusion-p001-p002-prov001"
-   }
- }
- ```
- *Every result has a logical derivation and cryptographic proof.*
-
- #### The Compliance Question
-
- **Auditor:** "How do you know P001-P002-PROV001 is actually collusion?"
-
- **DSPy Team:** "Our model said so. It was trained on examples and optimized for accuracy."
-
- **HyperMind Team:** "Here's the derivation chain:
- 1. `claim(CLM001, P001, PROV001)` - fact from data
- 2. `claim(CLM002, P002, PROV001)` - fact from data
- 3. `related(P001, P002)` - fact from data
- 4. Rule: `collusion(?P1, ?P2, ?Prov) :- claim(?C1, ?P1, ?Prov), claim(?C2, ?P2, ?Prov), related(?P1, ?P2)`
- 5. Unification: `?P1=P001, ?P2=P002, ?Prov=PROV001`
- 6. Conclusion: `collusion(P001, P002, PROV001)` - QED
-
- Here's the semantic hash: `semhash:collusion-p001-p002-prov001` - same query intent will always return this exact result."
-
- **Result:** HyperMind passes audit. DSPy gets you a follow-up meeting with legal.
-
  ### Why Vanilla LLMs Fail
 
  When you ask an LLM to query a knowledge graph, it produces **broken SPARQL 85% of the time**:
@@ -1346,16 +1372,15 @@ Result: ❌ PARSER ERROR - Invalid SPARQL syntax
 
  **Note**: Tentris implements WCOJ (see [ISWC 2025 paper](https://papers.dice-research.org/2025/ISWC_Tentris-WCOJ-Update/public.pdf)). rust-kgdb is the only system combining WCOJ with mobile support and an integrated AI framework.
 
- #### AI Framework Comparison
+ #### AI Framework Architectural Comparison
 
- | Framework | Type Safety | Schema Aware | Symbolic Execution | Audit Trail | Success Rate |
- |-----------|-------------|--------------|-------------------|-------------|--------------|
- | **HyperMind** | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | **86.4%** |
- | LangChain | ❌ No | ❌ No | ❌ No | ❌ No | ~20-40%* |
- | AutoGPT | No | ❌ No | ❌ No | ❌ No | ~10-25%* |
- | DSPy | ⚠️ Partial | ❌ No | ❌ No | ❌ No | ~30-50%* |
+ | Framework | Type Safety | Schema Aware | Symbolic Execution | Audit Trail |
+ |-----------|-------------|--------------|-------------------|-------------|
+ | **HyperMind** | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
+ | LangChain | ❌ No | ❌ No | ❌ No | ❌ No |
+ | DSPy | ⚠️ Partial | ❌ No | ❌ No | ❌ No |
 
- *Estimated from GAIA (Meta Research, 2023), SWE-bench (OpenAI, 2024), and LUBM (Lehigh University) benchmarks. HyperMind: LUBM-1 (3,272 triples).
+ **Note**: This compares architectural features. Benchmark (Dec 2025): Schema injection improves all frameworks by an average of +66.7 pp (Vanilla: 0%→71.4%, LangChain: 0%→71.4%, DSPy: 14.3%→71.4%).
 
  ```
  ┌─────────────────────────────────────────────────────────────────┐
@@ -1368,12 +1393,10 @@ Result: ❌ PARSER ERROR - Invalid SPARQL syntax
  │ Apache Jena: Great features, but 150+ µs lookups │
  │ Neo4j: Popular, but no SPARQL/RDF standards │
  │ Amazon Neptune: Managed, but cloud-only vendor lock-in │
- │ LangChain: Vibe coding, fails compliance audits │
- │ DSPy: Statistical optimization, no guarantees │
 
  │ │
  │ rust-kgdb: 2.78 µs lookups, WCOJ joins, mobile-native │
- │ Mathematical foundations, audit-ready │
+ │ Deterministic planner, audit-ready │
  │ │
  └─────────────────────────────────────────────────────────────────┘
  ```