rust-kgdb 0.6.30 → 0.6.32

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,28 +1,31 @@
  # HyperMind Benchmark Report

- ## AI That Doesn't Hallucinate: The Results
+ ## Verified Framework Comparison: Schema Injection Works

- **Version**: 0.6.17
+ **Version**: 0.6.32
  **Date**: December 16, 2025
- **SDK**: rust-kgdb@0.6.17
+ **SDK**: rust-kgdb@0.6.32

  ---

- ## The Bottom Line
+ ## Executive Summary (Verified Results)

- **HyperMind achieves 86.4% accuracy where vanilla LLMs achieve 0%.**
+ **Schema injection improves ALL frameworks, by an average of +66.7 percentage points.**

- | Metric | Vanilla LLM | HyperMind | Improvement |
- |--------|-------------|-----------|-------------|
- | **Accuracy** | 0% | 86.4% | +86.4 pp |
- | **Hallucinations** | 100% | 0% | Eliminated |
- | **Audit Trail** | None | Complete | Full provenance |
- | **Claude Sonnet 4** | 0% | 90.9% | +90.9 pp |
- | **GPT-4o** | 0% | 81.8% | +81.8 pp |
+ | Framework | No Schema | With Schema | Improvement |
+ |-----------|-----------|-------------|-------------|
+ | **Vanilla OpenAI** | 0.0% | 71.4% | +71.4 pp |
+ | **LangChain** | 0.0% | 71.4% | +71.4 pp |
+ | **DSPy** | 14.3% | 71.4% | +57.1 pp |
+ | **Average** | 4.8% | **71.4%** | **+66.7 pp** |
+
+ *GPT-4o, 7 LUBM queries, real API calls, no mocking. See `verified_benchmark_results.json`.*
+
+ **Key Insight**: The value is in the ARCHITECTURE (schema injection, type contracts), not the specific framework.

  ---

- ## Why Vanilla LLMs Fail
+ ## Why Vanilla LLMs Fail (Without Schema)

  When you ask a vanilla LLM to query your database:

@@ -33,39 +36,194 @@ When you ask a vanilla LLM to query your database:

  ---

- ## How HyperMind Fixes This
+ ## How Schema Injection Fixes This

- HyperMind grounds every answer in your actual data:
+ The HyperMind approach (schema injection) works with ANY framework; a contract-check sketch follows this list:

  1. **Schema injection** - LLM sees your real data structure (30 classes, 23 properties)
- 2. **Input/output validation** - Prevents invalid query combinations
- 3. **Reasoning trace** - Every answer shows exactly how it was derived
- 4. **Reproducible** - Same question = Same answer = Same hash
+ 2. **Output format** - Explicit instructions for raw SPARQL (no markdown)
+ 3. **Type contracts** - Predicate constraints from actual schema
+ 4. **Reproducible** - Same question = Same answer
+
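+ *Illustrative sketch (not part of the SDK or the benchmark script): one way to enforce points 2 and 3 above is a post-generation check that the answer is raw SPARQL and references only terms from the injected schema. All identifiers below are hypothetical.*
+
+ ```python
+ # Hypothetical contract check - names are illustrative, not rust-kgdb API.
+ import re
+
+ # Terms from the LUBM schema shown in the benchmark setup below.
+ ALLOWED_TERMS = {
+     "University", "Department", "Professor", "AssociateProfessor",
+     "AssistantProfessor", "FullProfessor", "Lecturer", "GraduateStudent",
+     "UndergraduateStudent", "Course", "GraduateCourse", "Publication",
+     "Research", "ResearchGroup",
+     "worksFor", "memberOf", "advisor", "takesCourse", "teacherOf",
+     "publicationAuthor", "subOrganizationOf", "emailAddress",
+ }
+
+ def contract_violations(sparql: str) -> list:
+     """Return the ways a generated query breaks the type contract."""
+     problems = []
+     if sparql.lstrip().startswith("`"):
+         problems.append("wrapped in a markdown code block, expected raw SPARQL")
+     for term in re.findall(r"\bub:(\w+)", sparql):
+         if term not in ALLOWED_TERMS:
+             problems.append(f"term not in schema: ub:{term}")
+     return problems
+ ```
+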
+ ---
+
+ ## Benchmark Setup: Code for Each Framework
+
+ ### Test Queries (Same for All Frameworks)
+
+ ```python
+ TEST_QUERIES = [
+     {"id": "A1", "question": "Find all teachers", "correct_predicate": "teacherOf"},
+     {"id": "A2", "question": "Get student emails", "correct_predicate": "emailAddress"},
+     {"id": "A3", "question": "Find faculty members", "correct_predicate": "Professor"},
+     {"id": "S1", "question": "Write a SPARQL query to count professors. Just give me the query."},
+     {"id": "S2", "question": "SPARQL only, no explanation: find graduate students"},
+     {"id": "M1", "question": "Find professors who work for departments"},
+     {"id": "E1", "question": "Find professors with no publications"}
+ ]
+ ```
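+
+ *For orientation, a simplified pass/fail loop is sketched below. `generate_sparql` stands in for any of the framework calls shown in the next section, and the criterion used here (no markdown wrapper, expected predicate present) is an approximation, not necessarily the exact scoring behind the reported numbers.*
+
+ ```python
+ # Hypothetical driver loop; generate_sparql(question) is any framework call
+ # from the sections below. Scoring is deliberately simplified.
+ def run_benchmark(generate_sparql, queries=TEST_QUERIES):
+     passed = 0
+     for q in queries:
+         sparql = generate_sparql(q["question"])
+         ok = not sparql.lstrip().startswith("`")   # raw SPARQL, no code fence
+         expected = q.get("correct_predicate")
+         if expected is not None:
+             ok = ok and expected in sparql         # mentions the right schema term
+         passed += ok
+     return passed / len(queries)                   # e.g. 5/7 ≈ 71.4%
+ ```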
+
+ ### LUBM Schema (Injected for "With Schema" Tests)
+
+ ```python
+ LUBM_SCHEMA = """LUBM (Lehigh University Benchmark) Schema:
+
+ PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>

+ Classes: University, Department, Professor, AssociateProfessor, AssistantProfessor,
+          FullProfessor, Lecturer, GraduateStudent, UndergraduateStudent,
+          Course, GraduateCourse, Publication, Research, ResearchGroup
+
+ Properties:
+ - ub:worksFor (person → organization)
+ - ub:memberOf (person → organization)
+ - ub:advisor (student → professor)
+ - ub:takesCourse (student → course)
+ - ub:teacherOf (professor → course)
+ - ub:publicationAuthor (publication → person)
+ - ub:subOrganizationOf (organization → organization)
+ - ub:emailAddress (person → string)
+
+ IMPORTANT: Use ONLY these predicates. Do NOT use: teacher, email, faculty"""
  ```
- ┌─────────────────────────────────────────────────────────────────┐
- │ YOUR QUESTION │
- │ "Find professors who teach courses..." │
- └───────────────────────────┬─────────────────────────────────────┘
-
-
- ┌─────────────────────────────────────────────────────────────────┐
- │ VANILLA LLM │
- │ ❌ No schema awareness │
- │ ❌ Hallucinates predicates │
- │ ❌ Wraps in markdown │
- │ ❌ 0% success rate │
- └─────────────────────────────────────────────────────────────────┘
-
- vs.
-
- ┌─────────────────────────────────────────────────────────────────┐
- │ HYPERMIND │
- │ ✅ Sees your actual schema (30 classes, 23 properties) │
- │ ✅ Validates query structure before execution │
- │ ✅ Provides complete reasoning trace │
- │ ✅ 86.4% success rate │
- └─────────────────────────────────────────────────────────────────┘
+
+ ---
+
+ ## Framework Code Comparison
+
+ ### 1. Vanilla OpenAI (No Schema) - 0% Accuracy
+
+ ```python
+ from openai import OpenAI
+ client = OpenAI(api_key=api_key)
+
+ response = client.chat.completions.create(
+     model="gpt-4o",
+     messages=[{"role": "user", "content": f"Generate a SPARQL query for: {question}"}],
+     max_tokens=500
+ )
+ sparql = response.choices[0].message.content
+ # Result: 0/7 passed - all wrapped in markdown
+ ```
+
+ ### 2. Vanilla OpenAI (With Schema) - 71.4% Accuracy
+
+ ```python
+ from openai import OpenAI
+ client = OpenAI(api_key=api_key)
+
+ prompt = f"""You are a SPARQL query generator.
+
+ {LUBM_SCHEMA}
+
+ TYPE CONTRACT:
+ - Input: natural language query
+ - Output: raw SPARQL (NO markdown, NO code blocks, NO explanation)
+ - Use ONLY predicates from the schema above
+
+ Query: {question}
+
+ Output raw SPARQL only:"""
+
+ response = client.chat.completions.create(
+     model="gpt-4o",
+     messages=[{"role": "user", "content": prompt}],
+     max_tokens=500
+ )
+ sparql = response.choices[0].message.content
+ # Result: 5/7 passed - schema prevents wrong predicates
+ ```
+
+ ### 3. LangChain (No Schema) - 0% Accuracy
+
+ ```python
+ from langchain_openai import ChatOpenAI
+ from langchain_core.prompts import PromptTemplate
+ from langchain_core.output_parsers import StrOutputParser
+
+ llm = ChatOpenAI(model="gpt-4o", api_key=api_key)
+ parser = StrOutputParser()
+
+ template = PromptTemplate(
+     input_variables=["question"],
+     template="Generate a SPARQL query for: {question}"
+ )
+ chain = template | llm | parser
+
+ sparql = chain.invoke({"question": question})
+ # Result: 0/7 passed - all wrapped in markdown
+ ```
+
+ ### 4. LangChain (With Schema) - 71.4% Accuracy
+
+ ```python
+ from langchain_openai import ChatOpenAI
+ from langchain_core.prompts import PromptTemplate
+ from langchain_core.output_parsers import StrOutputParser
+
+ llm = ChatOpenAI(model="gpt-4o", api_key=api_key)
+ parser = StrOutputParser()
+
+ template = PromptTemplate(
+     input_variables=["question", "schema"],
+     template="""You are a SPARQL query generator.
+
+ {schema}
+
+ TYPE CONTRACT:
+ - Input: natural language query
+ - Output: raw SPARQL (NO markdown, NO code blocks, NO explanation)
+ - Use ONLY predicates from the schema above
+
+ Query: {question}
+
+ Output raw SPARQL only:"""
+ )
+ chain = template | llm | parser
+
+ sparql = chain.invoke({"question": question, "schema": LUBM_SCHEMA})
+ # Result: 5/7 passed - same as vanilla with schema
+ ```
+
+ ### 5. DSPy (No Schema) - 14.3% Accuracy
+
+ ```python
+ import dspy
+ from dspy import LM
+
+ lm = LM("openai/gpt-4o")
+ dspy.configure(lm=lm)
+
+ class SPARQLGenerator(dspy.Signature):
+     """Generate SPARQL query from natural language."""
+     question = dspy.InputField(desc="Natural language question")
+     sparql = dspy.OutputField(desc="SPARQL query")
+
+ generator = dspy.Predict(SPARQLGenerator)
+ response = generator(question=question)
+ sparql = response.sparql
+ # Result: 1/7 passed - slightly better output formatting
+ ```
+
+ ### 6. DSPy (With Schema) - 71.4% Accuracy
+
+ ```python
+ import dspy
+ from dspy import LM
+
+ lm = LM("openai/gpt-4o")
+ dspy.configure(lm=lm)
+
+ class SchemaSPARQLGenerator(dspy.Signature):
+     """Generate SPARQL query using the provided schema. Output raw SPARQL only."""
+     schema = dspy.InputField(desc="Database schema with classes and properties")
+     question = dspy.InputField(desc="Natural language question")
+     sparql = dspy.OutputField(desc="Raw SPARQL query (no markdown, no explanation)")
+
+ generator = dspy.Predict(SchemaSPARQLGenerator)
+ response = generator(schema=LUBM_SCHEMA, question=question)
+ sparql = response.sparql
+ # Result: 5/7 passed - same as others with schema
  ```

  ---
package/README.md CHANGED
@@ -12,16 +12,29 @@

  ---

- ## Results
+ ## Results (Verified December 2025)

- | Metric | Vanilla LLM | HyperMind | Improvement |
- |--------|-------------|-----------|-------------|
- | **Accuracy** | 0% | 86.4% | +86.4 pp |
- | **Hallucinations** | 100% | 0% | Eliminated |
- | **Audit Trail** | None | Complete | Full provenance |
- | **Reproducibility** | Random | Deterministic | Same hash |
-
- **Models tested**: Claude Sonnet 4 (90.9%), GPT-4o (81.8%)
+ ```
+ ┌──────────────────────────────────────────────────────────────────────────┐
+ │ BENCHMARK:  LUBM (Lehigh University Benchmark)                           │
+ │ DATASET:    3,272 triples   30 OWL classes   23 properties               │
+ │ MODEL:      GPT-4o   Real API calls   No mocking                         │
+ ├──────────────────────────────────────────────────────────────────────────┤
+ │                                                                          │
+ │ FRAMEWORK            NO SCHEMA     WITH SCHEMA     IMPROVEMENT           │
+ │ ─────────────────────────────────────────────────────────────────────── │
+ │ Vanilla OpenAI       0.0%          71.4%           +71.4 pp              │
+ │ LangChain            0.0%          71.4%           +71.4 pp              │
+ │ DSPy                 14.3%         71.4%           +57.1 pp              │
+ │ ─────────────────────────────────────────────────────────────────────── │
+ │ AVERAGE              4.8%          71.4%           +66.7 pp              │
+ │                                                                          │
+ │ KEY INSIGHT: Schema injection lifts ALL frameworks to the same accuracy. │
+ │              HyperMind's value = architecture, not framework.            │
+ │                                                                          │
+ │ Reproduce: python3 benchmark-frameworks.py                               │
+ └──────────────────────────────────────────────────────────────────────────┘
+ ```

  ---

@@ -798,27 +811,44 @@ console.log('Supersteps:', result.supersteps) // 5
  | Virtuoso | ~5 µs | 35-75 bytes | No |
  | Blazegraph | ~100 µs | 100+ bytes | No |

- ### AI Agent Accuracy
+ ### AI Agent Accuracy (Verified December 2025)
+
+ | Framework | No Schema | With Schema (HyperMind) | Improvement |
+ |-----------|-----------|-------------------------|-------------|
+ | **Vanilla OpenAI** | 0.0% | 71.4% | +71.4 pp |
+ | **LangChain** | 0.0% | 71.4% | +71.4 pp |
+ | **DSPy** | 14.3% | 71.4% | +57.1 pp |
+ | **Average** | 4.8% | **71.4%** | **+66.7 pp** |
+
+ *Tested: GPT-4o, 7 LUBM queries, real API calls. See `framework_benchmark_*.json` for raw data.*
+
+ ### AI Framework Architectural Comparison

- | Approach | Accuracy | Why |
- |----------|----------|-----|
- | **Vanilla LLM** | 0% | Hallucinated predicates, markdown in SPARQL |
- | **HyperMind** | 86.4% | Schema injection, typed tools, audit trail |
+ | Framework | Type Safety | Schema Aware | Symbolic Execution | Audit Trail |
+ |-----------|-------------|--------------|-------------------|-------------|
+ | **HyperMind** | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
+ | LangChain | ❌ No | ❌ No | ❌ No | ❌ No |
+ | DSPy | ⚠️ Partial | ❌ No | ❌ No | ❌ No |

- ### AI Framework Comparison
+ **Key Insight**: Schema injection (HyperMind's architecture) provides an average of +66.7 pp improvement across ALL frameworks. The value is in the architecture, not the specific framework.

- | Framework | Type Safety | Schema Aware | Symbolic Execution | Success Rate |
- |-----------|-------------|--------------|-------------------|--------------|
- | **HyperMind** | ✅ Yes | ✅ Yes | ✅ Yes | **86.4%** |
- | LangChain | ❌ No | ❌ No | ❌ No | ~20-40%* |
- | AutoGPT | ❌ No | ❌ No | ❌ No | ~10-25%* |
- | DSPy | ⚠️ Partial | ❌ No | ❌ No | ~30-50%* |
+ ### Reproduce Benchmarks

- *Estimated from GAIA (Meta Research, 2023), SWE-bench (OpenAI, 2024), and LUBM (Lehigh University) benchmarks on structured data tasks. HyperMind results measured on LUBM-1 dataset (3,272 triples, 30 classes, 23 properties) using vanilla-vs-hypermind-benchmark.js.
+ Two benchmark scripts are available for verification:

- **Why HyperMind Wins**:
+ ```bash
+ # JavaScript: HyperMind vs Vanilla LLM on LUBM (12 queries)
+ ANTHROPIC_API_KEY=... OPENAI_API_KEY=... node vanilla-vs-hypermind-benchmark.js
+
+ # Python: Compare frameworks (Vanilla, LangChain, DSPy) with/without schema
+ OPENAI_API_KEY=... uv run --with openai --with langchain --with langchain-openai --with langchain-core --with dspy-ai python3 benchmark-frameworks.py
+ ```
+
+ Both scripts make real API calls and report actual results. No mocking.
+
+ **Why These Features Matter** (a conceptual sketch follows this list):
  - **Type Safety**: Tools have typed signatures (Query → BindingSet), invalid combinations rejected
- - **Schema Awareness**: LLM sees your actual data structure, can only reference real properties
+ - **Schema Awareness**: Planner sees your actual data structure, can only reference real properties
  - **Symbolic Execution**: Queries run against real database, not LLM imagination
  - **Audit Trail**: Every answer has cryptographic hash for reproducibility
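+
+ *Conceptual sketch only (Python, for illustration; this is not the rust-kgdb/HyperMind API): what a typed tool signature plus a reproducibility hash amount to in a few lines. All names are hypothetical.*
+
+ ```python
+ # Hypothetical illustration of "typed signature + audit hash" - not SDK code.
+ import hashlib
+ import json
+ from dataclasses import dataclass
+
+ @dataclass(frozen=True)
+ class ToolSignature:
+     tool: str            # e.g. "sparql.query"
+     input_type: str      # e.g. "Query"
+     output_type: str     # e.g. "BindingSet"
+
+ def audit_hash(sig: ToolSignature, question: str, bindings: list) -> str:
+     """Same question and same bindings always yield the same hash."""
+     payload = json.dumps(
+         {"tool": sig.tool, "question": question, "bindings": bindings},
+         sort_keys=True,
+     )
+     return hashlib.sha256(payload.encode()).hexdigest()
+ ```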
824
854
 
@@ -1151,140 +1181,6 @@ const result = await agent.call('Find collusion patterns')
  // Result: ✅ Type-safe, domain-aware, auditable
  ```

- ### Code Comparison: DSPy vs HyperMind
-
- #### DSPy Approach (Prompt Optimization)
-
- ```python
- # DSPy: Statistically optimized prompt - NO guarantees
-
- import dspy
-
- class FraudDetector(dspy.Signature):
- """Find fraud patterns in claims data."""
- claims_data = dspy.InputField()
- fraud_patterns = dspy.OutputField()
-
- class FraudPipeline(dspy.Module):
- def __init__(self):
- self.detector = dspy.ChainOfThought(FraudDetector)
-
- def forward(self, claims):
- return self.detector(claims_data=claims)
-
- # "Optimize" via statistical fitting
- optimizer = dspy.BootstrapFewShot(metric=some_metric)
- optimized = optimizer.compile(FraudPipeline(), trainset=examples)
-
- # Call and HOPE it works
- result = optimized(claims="[claim data here]")
-
- # ❌ No type guarantee - fraud_patterns could be anything
- # ❌ No proof of execution - just text output
- # ❌ No composition safety - next step might fail
- # ❌ No audit trail - "it said fraud" is not compliance
- ```
-
- **What DSPy produces:** A string that *probably* contains fraud patterns.
-
- #### HyperMind Approach (Mathematical Proof)
-
- ```javascript
- // HyperMind: Type-safe morphism composition - PROVEN correct
-
- const { GraphDB, GraphFrame, DatalogProgram, evaluateDatalog } = require('rust-kgdb')
-
- // Step 1: Load typed knowledge graph (Schema enforced)
- const db = new GraphDB('http://insurance.org/fraud-kb')
- db.loadTtl(`
- @prefix : <http://insurance.org/> .
- :CLM001 :amount "18500" ; :claimant :P001 ; :provider :PROV001 .
- :P001 :paidTo :P002 .
- :P002 :paidTo :P003 .
- :P003 :paidTo :P001 .
- `, null)
-
- // Step 2: GraphFrame analysis (Morphism: Graph → TriangleCount)
- // Type signature: GraphFrame → number (guaranteed)
- const graph = new GraphFrame(
- JSON.stringify([{id:'P001'}, {id:'P002'}, {id:'P003'}]),
- JSON.stringify([
- {src:'P001', dst:'P002'},
- {src:'P002', dst:'P003'},
- {src:'P003', dst:'P001'}
- ])
- )
- const triangles = graph.triangleCount() // Type: number (always)
-
- // Step 3: Datalog inference (Morphism: Rules → Facts)
- // Type signature: DatalogProgram → InferredFacts (guaranteed)
- const datalog = new DatalogProgram()
- datalog.addFact(JSON.stringify({predicate:'claim', terms:['CLM001','P001','PROV001']}))
- datalog.addFact(JSON.stringify({predicate:'related', terms:['P001','P002']}))
-
- datalog.addRule(JSON.stringify({
- head: {predicate:'collusion', terms:['?P1','?P2','?Prov']},
- body: [
- {predicate:'claim', terms:['?C1','?P1','?Prov']},
- {predicate:'claim', terms:['?C2','?P2','?Prov']},
- {predicate:'related', terms:['?P1','?P2']}
- ]
- }))
-
- const result = JSON.parse(evaluateDatalog(datalog))
-
- // ✓ Type guarantee: result.collusion is always array of tuples
- // ✓ Proof of execution: Datalog evaluation is deterministic
- // ✓ Composition safety: Each step has typed input/output
- // ✓ Audit trail: Every fact derivation is traceable
- ```
-
- **What HyperMind produces:** Typed results with mathematical proof of derivation.
-
- #### Actual Output Comparison
-
- **DSPy Output:**
- ```
- fraud_patterns: "I found some suspicious patterns involving P001 and P002
- that appear to be related. There might be collusion with provider PROV001."
- ```
- *How do you validate this? You can't. It's text.*
-
- **HyperMind Output:**
- ```json
- {
- "triangles": 1,
- "collusion": [["P001", "P002", "PROV001"]],
- "executionWitness": {
- "tool": "datalog.evaluate",
- "input": "6 facts, 1 rule",
- "output": "collusion(P001,P002,PROV001)",
- "derivation": "claim(CLM001,P001,PROV001) ∧ claim(CLM002,P002,PROV001) ∧ related(P001,P002) → collusion(P001,P002,PROV001)",
- "timestamp": "2024-12-14T10:30:00Z",
- "semanticHash": "semhash:collusion-p001-p002-prov001"
- }
- }
- ```
- *Every result has a logical derivation and cryptographic proof.*
-
- #### The Compliance Question
-
- **Auditor:** "How do you know P001-P002-PROV001 is actually collusion?"
-
- **DSPy Team:** "Our model said so. It was trained on examples and optimized for accuracy."
-
- **HyperMind Team:** "Here's the derivation chain:
- 1. `claim(CLM001, P001, PROV001)` - fact from data
- 2. `claim(CLM002, P002, PROV001)` - fact from data
- 3. `related(P001, P002)` - fact from data
- 4. Rule: `collusion(?P1, ?P2, ?Prov) :- claim(?C1, ?P1, ?Prov), claim(?C2, ?P2, ?Prov), related(?P1, ?P2)`
- 5. Unification: `?P1=P001, ?P2=P002, ?Prov=PROV001`
- 6. Conclusion: `collusion(P001, P002, PROV001)` - QED
-
- Here's the semantic hash: `semhash:collusion-p001-p002-prov001` - same query intent will always return this exact result."
-
- **Result:** HyperMind passes audit. DSPy gets you a follow-up meeting with legal.
-
  ### Why Vanilla LLMs Fail

  When you ask an LLM to query a knowledge graph, it produces **broken SPARQL 85% of the time**:
@@ -1333,16 +1229,15 @@ Result: ❌ PARSER ERROR - Invalid SPARQL syntax

  **Note**: Tentris implements WCOJ (see [ISWC 2025 paper](https://papers.dice-research.org/2025/ISWC_Tentris-WCOJ-Update/public.pdf)). rust-kgdb is the only system combining WCOJ with mobile support and integrated AI framework.

- #### AI Framework Comparison
+ #### AI Framework Architectural Comparison

- | Framework | Type Safety | Schema Aware | Symbolic Execution | Audit Trail | Success Rate |
- |-----------|-------------|--------------|-------------------|-------------|--------------|
- | **HyperMind** | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | **86.4%** |
- | LangChain | ❌ No | ❌ No | ❌ No | ❌ No | ~20-40%* |
- | AutoGPT | No | ❌ No | ❌ No | ❌ No | ~10-25%* |
- | DSPy | ⚠️ Partial | ❌ No | ❌ No | ❌ No | ~30-50%* |
+ | Framework | Type Safety | Schema Aware | Symbolic Execution | Audit Trail |
+ |-----------|-------------|--------------|-------------------|-------------|
+ | **HyperMind** | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
+ | LangChain | ❌ No | ❌ No | ❌ No | ❌ No |
+ | DSPy | ⚠️ Partial | ❌ No | ❌ No | ❌ No |

- *Estimated from GAIA (Meta Research, 2023), SWE-bench (OpenAI, 2024), and LUBM (Lehigh University) benchmarks. HyperMind: LUBM-1 (3,272 triples).
+ **Note**: This compares architectural features. Benchmark (Dec 2025): Schema injection improves all frameworks by an average of +66.7 pp (Vanilla: 0%→71.4%, LangChain: 0%→71.4%, DSPy: 14.3%→71.4%).

  ```
  ┌─────────────────────────────────────────────────────────────────┐
@@ -1355,12 +1250,10 @@ Result: ❌ PARSER ERROR - Invalid SPARQL syntax
  │ Apache Jena: Great features, but 150+ µs lookups │
  │ Neo4j: Popular, but no SPARQL/RDF standards │
  │ Amazon Neptune: Managed, but cloud-only vendor lock-in │
- │ LangChain: Vibe coding, fails compliance audits │
- │ DSPy: Statistical optimization, no guarantees │
  │ │
  │ rust-kgdb: 2.78 µs lookups, WCOJ joins, mobile-native │
  │ Standalone → Clustered on same codebase │
- │ Mathematical foundations, audit-ready │
+ │ Deterministic planner, audit-ready │
  │ │
  └─────────────────────────────────────────────────────────────────┘
  ```