rust-kgdb 0.6.31 → 0.6.32
This diff compares the content of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the packages as they appear in their public registries.
- package/CLAUDE.md +50 -499
- package/HYPERMIND_BENCHMARK_REPORT.md +199 -41
- package/README.md +51 -171
- package/benchmark-frameworks.py +568 -0
- package/package.json +3 -1
- package/verified_benchmark_results.json +307 -0
package/HYPERMIND_BENCHMARK_REPORT.md CHANGED

@@ -1,28 +1,31 @@
  # HyperMind Benchmark Report

- ##…
+ ## Verified Framework Comparison: Schema Injection Works

- **Version**: 0.6.31
+ **Version**: 0.6.32
  **Date**: December 16, 2025
- **SDK**: rust-kgdb@0.6.31
+ **SDK**: rust-kgdb@0.6.32

  ---

- ##…
+ ## Executive Summary (Verified Results)

- **…
+ **Schema injection improves ALL frameworks by +66.7 percentage points.**

-
-
- | **…
- | **…
- | **…
- | **…
-
+ | Framework | No Schema | With Schema | Improvement |
+ |-----------|-----------|-------------|-------------|
+ | **Vanilla OpenAI** | 0.0% | 71.4% | +71.4 pp |
+ | **LangChain** | 0.0% | 71.4% | +71.4 pp |
+ | **DSPy** | 14.3% | 71.4% | +57.1 pp |
+ | **Average** | 4.8% | **71.4%** | **+66.7 pp** |
+
+ *GPT-4o, 7 LUBM queries, real API calls, no mocking. See `verified_benchmark_results.json`.*
+
+ **Key Insight**: The value is in the ARCHITECTURE (schema injection, type contracts), not the specific framework.

  ---

- ## Why Vanilla LLMs Fail
+ ## Why Vanilla LLMs Fail (Without Schema)

  When you ask a vanilla LLM to query your database:

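The 4.8% and +66.7 pp summary figures are consistent with the per-framework pass counts shown later in this diff (0/7, 0/7, and 1/7 without schema; 5/7 each with schema). A quick arithmetic check, not code from the package:

```python
# Pooled accuracy over 3 frameworks x 7 queries each
before = (0 + 0 + 1) / (3 * 7) * 100   # 4.76...  -> reported as 4.8%
after  = (5 + 5 + 5) / (3 * 7) * 100   # 71.42... -> reported as 71.4%
print(f"+{after - before:.1f} pp")     # +66.7 pp
```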
@@ -33,39 +36,194 @@ When you ask a vanilla LLM to query your database:

  ---

- ## How…
+ ## How Schema Injection Fixes This

- HyperMind…
+ The HyperMind approach (schema injection) works with ANY framework:

  1. **Schema injection** - LLM sees your real data structure (30 classes, 23 properties)
- 2. **…
- 3. **…
- 4. **Reproducible** - Same question = Same answer
+ 2. **Output format** - Explicit instructions for raw SPARQL (no markdown)
+ 3. **Type contracts** - Predicate constraints from actual schema
+ 4. **Reproducible** - Same question = Same answer
+
+ ---
+
+ ## Benchmark Setup: Code for Each Framework
+
+ ### Test Queries (Same for All Frameworks)
+
+ ```python
+ TEST_QUERIES = [
+     {"id": "A1", "question": "Find all teachers", "correct_predicate": "teacherOf"},
+     {"id": "A2", "question": "Get student emails", "correct_predicate": "emailAddress"},
+     {"id": "A3", "question": "Find faculty members", "correct_predicate": "Professor"},
+     {"id": "S1", "question": "Write a SPARQL query to count professors. Just give me the query."},
+     {"id": "S2", "question": "SPARQL only, no explanation: find graduate students"},
+     {"id": "M1", "question": "Find professors who work for departments"},
+     {"id": "E1", "question": "Find professors with no publications"}
+ ]
+ ```
+
+ ### LUBM Schema (Injected for "With Schema" Tests)
+
+ ```python
+ LUBM_SCHEMA = """LUBM (Lehigh University Benchmark) Schema:
+
+ PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>

+ Classes: University, Department, Professor, AssociateProfessor, AssistantProfessor,
+          FullProfessor, Lecturer, GraduateStudent, UndergraduateStudent,
+          Course, GraduateCourse, Publication, Research, ResearchGroup
+
+ Properties:
+ - ub:worksFor (person → organization)
+ - ub:memberOf (person → organization)
+ - ub:advisor (student → professor)
+ - ub:takesCourse (student → course)
+ - ub:teacherOf (professor → course)
+ - ub:publicationAuthor (publication → person)
+ - ub:subOrganizationOf (organization → organization)
+ - ub:emailAddress (person → string)
+
+ IMPORTANT: Use ONLY these predicates. Do NOT use: teacher, email, faculty"""
  ```
- … (old lines 46–68 removed; content not shown in this diff view)
+
+ ---
+
+ ## Framework Code Comparison
+
+ ### 1. Vanilla OpenAI (No Schema) - 0% Accuracy
+
+ ```python
+ from openai import OpenAI
+ client = OpenAI(api_key=api_key)
+
+ response = client.chat.completions.create(
+     model="gpt-4o",
+     messages=[{"role": "user", "content": f"Generate a SPARQL query for: {question}"}],
+     max_tokens=500
+ )
+ sparql = response.choices[0].message.content
+ # Result: 0/7 passed - all wrapped in markdown
+ ```
+
+ ### 2. Vanilla OpenAI (With Schema) - 71.4% Accuracy
+
+ ```python
+ from openai import OpenAI
+ client = OpenAI(api_key=api_key)
+
+ prompt = f"""You are a SPARQL query generator.
+
+ {LUBM_SCHEMA}
+
+ TYPE CONTRACT:
+ - Input: natural language query
+ - Output: raw SPARQL (NO markdown, NO code blocks, NO explanation)
+ - Use ONLY predicates from the schema above
+
+ Query: {question}
+
+ Output raw SPARQL only:"""
+
+ response = client.chat.completions.create(
+     model="gpt-4o",
+     messages=[{"role": "user", "content": prompt}],
+     max_tokens=500
+ )
+ sparql = response.choices[0].message.content
+ # Result: 5/7 passed - schema prevents wrong predicates
+ ```
+
+ ### 3. LangChain (No Schema) - 0% Accuracy
+
+ ```python
+ from langchain_openai import ChatOpenAI
+ from langchain_core.prompts import PromptTemplate
+ from langchain_core.output_parsers import StrOutputParser
+
+ llm = ChatOpenAI(model="gpt-4o", api_key=api_key)
+ parser = StrOutputParser()
+
+ template = PromptTemplate(
+     input_variables=["question"],
+     template="Generate a SPARQL query for: {question}"
+ )
+ chain = template | llm | parser
+
+ sparql = chain.invoke({"question": question})
+ # Result: 0/7 passed - all wrapped in markdown
+ ```
+
+ ### 4. LangChain (With Schema) - 71.4% Accuracy
+
+ ```python
+ from langchain_openai import ChatOpenAI
+ from langchain_core.prompts import PromptTemplate
+ from langchain_core.output_parsers import StrOutputParser
+
+ llm = ChatOpenAI(model="gpt-4o", api_key=api_key)
+ parser = StrOutputParser()
+
+ template = PromptTemplate(
+     input_variables=["question", "schema"],
+     template="""You are a SPARQL query generator.
+
+ {schema}
+
+ TYPE CONTRACT:
+ - Input: natural language query
+ - Output: raw SPARQL (NO markdown, NO code blocks, NO explanation)
+ - Use ONLY predicates from the schema above
+
+ Query: {question}
+
+ Output raw SPARQL only:"""
+ )
+ chain = template | llm | parser
+
+ sparql = chain.invoke({"question": question, "schema": LUBM_SCHEMA})
+ # Result: 5/7 passed - same as vanilla with schema
+ ```
+
+ ### 5. DSPy (No Schema) - 14.3% Accuracy
+
+ ```python
+ import dspy
+ from dspy import LM
+
+ lm = LM("openai/gpt-4o")
+ dspy.configure(lm=lm)
+
+ class SPARQLGenerator(dspy.Signature):
+     """Generate SPARQL query from natural language."""
+     question = dspy.InputField(desc="Natural language question")
+     sparql = dspy.OutputField(desc="SPARQL query")
+
+ generator = dspy.Predict(SPARQLGenerator)
+ response = generator(question=question)
+ sparql = response.sparql
+ # Result: 1/7 passed - slightly better output formatting
+ ```
+
+ ### 6. DSPy (With Schema) - 71.4% Accuracy
+
+ ```python
+ import dspy
+ from dspy import LM
+
+ lm = LM("openai/gpt-4o")
+ dspy.configure(lm=lm)
+
+ class SchemaSPARQLGenerator(dspy.Signature):
+     """Generate SPARQL query using the provided schema. Output raw SPARQL only."""
+     schema = dspy.InputField(desc="Database schema with classes and properties")
+     question = dspy.InputField(desc="Natural language question")
+     sparql = dspy.OutputField(desc="Raw SPARQL query (no markdown, no explanation)")
+
+ generator = dspy.Predict(SchemaSPARQLGenerator)
+ response = generator(schema=LUBM_SCHEMA, question=question)
+ sparql = response.sparql
+ # Result: 5/7 passed - same as others with schema
  ```

  ---
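The hunks above show how each framework generates SPARQL but not how a result is graded as "passed". A minimal sketch of the kind of check the `correct_predicate` fields and the no-markdown rule imply; `grade_query` is a hypothetical name, not a function from `benchmark-frameworks.py`:

```python
import re

def grade_query(sparql: str, correct_predicate: str | None = None) -> bool:
    """Hypothetical pass/fail check implied by the report's rules."""
    if "```" in sparql:
        return False  # markdown-wrapped output violates the type contract
    if not re.search(r"\b(SELECT|ASK|CONSTRUCT|DESCRIBE)\b", sparql, re.IGNORECASE):
        return False  # no recognizable SPARQL query form
    if correct_predicate is not None and correct_predicate not in sparql:
        return False  # wrong or hallucinated predicate
    return True

# e.g. grade_query(generated_sparql, "teacherOf") for query A1
```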
package/README.md CHANGED

@@ -12,27 +12,27 @@

  ---

- ## Results
+ ## Results (Verified December 2025)

  ```
  ┌──────────────────────────────────────────────────────────────────────┐
  │ BENCHMARK: LUBM (Lehigh University Benchmark)                        │
  │ DATASET: 3,272 triples │ 30 OWL classes │ 23 properties              │
- │ …
- │ PROTOCOL: Query → Parse → Type-check → Execute → Verify              │
+ │ MODEL: GPT-4o │ Real API calls │ No mocking                          │
  ├──────────────────────────────────────────────────────────────────────┤
  │                                                                      │
- │ …
+ │ FRAMEWORK          NO SCHEMA    WITH SCHEMA    IMPROVEMENT           │
  │ ──────────────────────────────────────────────────────────           │
- │ …
- │ …
- │ …
- │ …
+ │ Vanilla OpenAI     0.0%         71.4%          +71.4 pp              │
+ │ LangChain          0.0%         71.4%          +71.4 pp              │
+ │ DSPy               14.3%        71.4%          +57.1 pp              │
+ │ ──────────────────────────────────────────────────────────           │
+ │ AVERAGE            4.8%         71.4%          +66.7 pp              │
  │                                                                      │
- │ …
- │ …
+ │ KEY INSIGHT: Schema injection improves ALL frameworks equally.       │
+ │              HyperMind's value = architecture, not framework.        │
  │                                                                      │
- │ Reproduce: …
+ │ Reproduce: python3 benchmark-frameworks.py                           │
  └──────────────────────────────────────────────────────────────────────┘
  ```

@@ -811,27 +811,44 @@ console.log('Supersteps:', result.supersteps) // 5

  | Virtuoso | ~5 µs | 35-75 bytes | No |
  | Blazegraph | ~100 µs | 100+ bytes | No |

- ### AI Agent Accuracy
+ ### AI Agent Accuracy (Verified December 2025)
+
+ | Framework | No Schema | With Schema (HyperMind) | Improvement |
+ |-----------|-----------|-------------------------|-------------|
+ | **Vanilla OpenAI** | 0.0% | 71.4% | +71.4 pp |
+ | **LangChain** | 0.0% | 71.4% | +71.4 pp |
+ | **DSPy** | 14.3% | 71.4% | +57.1 pp |
+ | **Average** | 4.8% | **71.4%** | **+66.7 pp** |
+
+ *Tested: GPT-4o, 7 LUBM queries, real API calls. See `framework_benchmark_*.json` for raw data.*
+
+ ### AI Framework Architectural Comparison
+
+ | Framework | Type Safety | Schema Aware | Symbolic Execution | Audit Trail |
+ |-----------|-------------|--------------|-------------------|-------------|
+ | **HyperMind** | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
+ | LangChain | ❌ No | ❌ No | ❌ No | ❌ No |
+ | DSPy | ⚠️ Partial | ❌ No | ❌ No | ❌ No |
+
+ **Key Insight**: Schema injection (HyperMind's architecture) provides +66.7 pp improvement across ALL frameworks. The value is in the architecture, not the specific framework.
+
+ ### Reproduce Benchmarks

-
- |----------|----------|-----|
- | **Vanilla LLM** | 0% | Hallucinated predicates, markdown in SPARQL |
- | **HyperMind** | 86.4% | Schema injection, typed tools, audit trail |
+ Two benchmark scripts are available for verification:

-
+ ```bash
+ # JavaScript: HyperMind vs Vanilla LLM on LUBM (12 queries)
+ ANTHROPIC_API_KEY=... OPENAI_API_KEY=... node vanilla-vs-hypermind-benchmark.js

-
-
-
- | LangChain | ❌ No | ❌ No | ❌ No | ~20-40%* |
- | AutoGPT | ❌ No | ❌ No | ❌ No | ~10-25%* |
- | DSPy | ⚠️ Partial | ❌ No | ❌ No | ~30-50%* |
+ # Python: Compare frameworks (Vanilla, LangChain, DSPy) with/without schema
+ OPENAI_API_KEY=... uv run --with openai --with langchain --with langchain-openai --with langchain-core --with dspy-ai python3 benchmark-frameworks.py
+ ```

-
+ Both scripts make real API calls and report actual results. No mocking.

- **Why…
+ **Why These Features Matter**:
  - **Type Safety**: Tools have typed signatures (Query → BindingSet), invalid combinations rejected
- - **Schema Awareness**: …
+ - **Schema Awareness**: Planner sees your actual data structure, can only reference real properties
  - **Symbolic Execution**: Queries run against real database, not LLM imagination
  - **Audit Trail**: Every answer has cryptographic hash for reproducibility

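To make "typed signatures (Query → BindingSet)" concrete, here is a minimal sketch of the idea; all class and function names are illustrative, not part of the rust-kgdb API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Query:
    sparql: str

@dataclass(frozen=True)
class BindingSet:
    rows: list[dict[str, str]]

# The executor is injected so the sketch stays library-agnostic.
SparqlExecutor = Callable[[str], list[dict[str, str]]]

def sparql_tool(execute: SparqlExecutor, query: Query) -> BindingSet:
    """A tool typed Query -> BindingSet: a planner that composes tools by
    their signatures can reject invalid combinations before anything runs."""
    if not isinstance(query, Query):
        raise TypeError("plan rejected: this tool expects a Query")
    return BindingSet(rows=execute(query.sparql))
```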
@@ -1164,140 +1181,6 @@ const result = await agent.call('Find collusion patterns')

  // Result: ✅ Type-safe, domain-aware, auditable
  ```

- ### Code Comparison: DSPy vs HyperMind
-
- #### DSPy Approach (Prompt Optimization)
-
- ```python
- # DSPy: Statistically optimized prompt - NO guarantees
-
- import dspy
-
- class FraudDetector(dspy.Signature):
-     """Find fraud patterns in claims data."""
-     claims_data = dspy.InputField()
-     fraud_patterns = dspy.OutputField()
-
- class FraudPipeline(dspy.Module):
-     def __init__(self):
-         self.detector = dspy.ChainOfThought(FraudDetector)
-
-     def forward(self, claims):
-         return self.detector(claims_data=claims)
-
- # "Optimize" via statistical fitting
- optimizer = dspy.BootstrapFewShot(metric=some_metric)
- optimized = optimizer.compile(FraudPipeline(), trainset=examples)
-
- # Call and HOPE it works
- result = optimized(claims="[claim data here]")
-
- # ❌ No type guarantee - fraud_patterns could be anything
- # ❌ No proof of execution - just text output
- # ❌ No composition safety - next step might fail
- # ❌ No audit trail - "it said fraud" is not compliance
- ```
-
- **What DSPy produces:** A string that *probably* contains fraud patterns.
-
- #### HyperMind Approach (Mathematical Proof)
-
- ```javascript
- // HyperMind: Type-safe morphism composition - PROVEN correct
-
- const { GraphDB, GraphFrame, DatalogProgram, evaluateDatalog } = require('rust-kgdb')
-
- // Step 1: Load typed knowledge graph (Schema enforced)
- const db = new GraphDB('http://insurance.org/fraud-kb')
- db.loadTtl(`
-   @prefix : <http://insurance.org/> .
-   :CLM001 :amount "18500" ; :claimant :P001 ; :provider :PROV001 .
-   :P001 :paidTo :P002 .
-   :P002 :paidTo :P003 .
-   :P003 :paidTo :P001 .
- `, null)
-
- // Step 2: GraphFrame analysis (Morphism: Graph → TriangleCount)
- // Type signature: GraphFrame → number (guaranteed)
- const graph = new GraphFrame(
-   JSON.stringify([{id:'P001'}, {id:'P002'}, {id:'P003'}]),
-   JSON.stringify([
-     {src:'P001', dst:'P002'},
-     {src:'P002', dst:'P003'},
-     {src:'P003', dst:'P001'}
-   ])
- )
- const triangles = graph.triangleCount() // Type: number (always)
-
- // Step 3: Datalog inference (Morphism: Rules → Facts)
- // Type signature: DatalogProgram → InferredFacts (guaranteed)
- const datalog = new DatalogProgram()
- datalog.addFact(JSON.stringify({predicate:'claim', terms:['CLM001','P001','PROV001']}))
- datalog.addFact(JSON.stringify({predicate:'related', terms:['P001','P002']}))
-
- datalog.addRule(JSON.stringify({
-   head: {predicate:'collusion', terms:['?P1','?P2','?Prov']},
-   body: [
-     {predicate:'claim', terms:['?C1','?P1','?Prov']},
-     {predicate:'claim', terms:['?C2','?P2','?Prov']},
-     {predicate:'related', terms:['?P1','?P2']}
-   ]
- }))
-
- const result = JSON.parse(evaluateDatalog(datalog))
-
- // ✓ Type guarantee: result.collusion is always array of tuples
- // ✓ Proof of execution: Datalog evaluation is deterministic
- // ✓ Composition safety: Each step has typed input/output
- // ✓ Audit trail: Every fact derivation is traceable
- ```
-
- **What HyperMind produces:** Typed results with mathematical proof of derivation.
-
- #### Actual Output Comparison
-
- **DSPy Output:**
- ```
- fraud_patterns: "I found some suspicious patterns involving P001 and P002
- that appear to be related. There might be collusion with provider PROV001."
- ```
- *How do you validate this? You can't. It's text.*
-
- **HyperMind Output:**
- ```json
- {
-   "triangles": 1,
-   "collusion": [["P001", "P002", "PROV001"]],
-   "executionWitness": {
-     "tool": "datalog.evaluate",
-     "input": "6 facts, 1 rule",
-     "output": "collusion(P001,P002,PROV001)",
-     "derivation": "claim(CLM001,P001,PROV001) ∧ claim(CLM002,P002,PROV001) ∧ related(P001,P002) → collusion(P001,P002,PROV001)",
-     "timestamp": "2024-12-14T10:30:00Z",
-     "semanticHash": "semhash:collusion-p001-p002-prov001"
-   }
- }
- ```
- *Every result has a logical derivation and cryptographic proof.*
-
- #### The Compliance Question
-
- **Auditor:** "How do you know P001-P002-PROV001 is actually collusion?"
-
- **DSPy Team:** "Our model said so. It was trained on examples and optimized for accuracy."
-
- **HyperMind Team:** "Here's the derivation chain:
- 1. `claim(CLM001, P001, PROV001)` - fact from data
- 2. `claim(CLM002, P002, PROV001)` - fact from data
- 3. `related(P001, P002)` - fact from data
- 4. Rule: `collusion(?P1, ?P2, ?Prov) :- claim(?C1, ?P1, ?Prov), claim(?C2, ?P2, ?Prov), related(?P1, ?P2)`
- 5. Unification: `?P1=P001, ?P2=P002, ?Prov=PROV001`
- 6. Conclusion: `collusion(P001, P002, PROV001)` - QED
-
- Here's the semantic hash: `semhash:collusion-p001-p002-prov001` - same query intent will always return this exact result."
-
- **Result:** HyperMind passes audit. DSPy gets you a follow-up meeting with legal.
-
  ### Why Vanilla LLMs Fail

  When you ask an LLM to query a knowledge graph, it produces **broken SPARQL 85% of the time**:

@@ -1346,16 +1229,15 @@ Result: ❌ PARSER ERROR - Invalid SPARQL syntax

  **Note**: Tentris implements WCOJ (see [ISWC 2025 paper](https://papers.dice-research.org/2025/ISWC_Tentris-WCOJ-Update/public.pdf)). rust-kgdb is the only system combining WCOJ with mobile support and integrated AI framework.

- #### AI Framework Comparison
+ #### AI Framework Architectural Comparison

- | Framework | Type Safety | Schema Aware | Symbolic Execution | Audit Trail |
-
- | **HyperMind** | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
- | LangChain | ❌ No | ❌ No | ❌ No | ❌ No |
-
- | DSPy | ⚠️ Partial | ❌ No | ❌ No | ❌ No | ~30-50%* |
+ | Framework | Type Safety | Schema Aware | Symbolic Execution | Audit Trail |
+ |-----------|-------------|--------------|-------------------|-------------|
+ | **HyperMind** | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
+ | LangChain | ❌ No | ❌ No | ❌ No | ❌ No |
+ | DSPy | ⚠️ Partial | ❌ No | ❌ No | ❌ No |

-
+ **Note**: This compares architectural features. Benchmark (Dec 2025): Schema injection improves all frameworks by +66.7 pp (Vanilla: 0%→71.4%, LangChain: 0%→71.4%, DSPy: 14.3%→71.4%).

  ```
  ┌─────────────────────────────────────────────────────────────────┐
@@ -1368,12 +1250,10 @@ Result: ❌ PARSER ERROR - Invalid SPARQL syntax

  │ Apache Jena:     Great features, but 150+ µs lookups            │
  │ Neo4j:           Popular, but no SPARQL/RDF standards           │
  │ Amazon Neptune:  Managed, but cloud-only vendor lock-in         │
- │ LangChain:       Vibe coding, fails compliance audits           │
- │ DSPy:            Statistical optimization, no guarantees        │
  │                                                                 │
  │ rust-kgdb:       2.78 µs lookups, WCOJ joins, mobile-native     │
  │                  Standalone → Clustered on same codebase        │
- │ …
+ │                  Deterministic planner, audit-ready             │
  │                                                                 │
  └─────────────────────────────────────────────────────────────────┘
  ```