rust-kgdb 0.6.30 → 0.6.32
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CLAUDE.md +50 -499
- package/HYPERMIND_BENCHMARK_REPORT.md +199 -41
- package/README.md +62 -169
- package/benchmark-frameworks.py +568 -0
- package/package.json +3 -1
- package/verified_benchmark_results.json +307 -0
package/CLAUDE.md (CHANGED)

@@ -8,307 +8,6 @@ This is the **TypeScript/Node.js SDK** for `rust-kgdb`, a high-performance RDF/S

**npm Package**: [`rust-kgdb`](https://www.npmjs.com/package/rust-kgdb)

Removed (old lines 11-311): the entire "Benchmark Results" section:

- ## Benchmark Results
-
- **HyperMind achieves 86.4% accuracy where vanilla LLMs achieve 0%.**
-
- | Metric | Vanilla LLM | HyperMind | Improvement |
- |--------|-------------|-----------|-------------|
- | **Accuracy** | 0% | 86.4% | +86.4 pp |
- | **Claude Sonnet 4** | 0% | 90.9% | +90.9 pp |
- | **GPT-4o** | 0% | 81.8% | +81.8 pp |
- | **Hallucinations** | 100% | 0% | Eliminated |
- | **Audit Trail** | None | Complete | Full provenance |
- | **Reproducibility** | Random | Deterministic | Same hash |
-
- ### How We Calculated These Numbers
-
- DATASET: LUBM (Lehigh University Benchmark)
-   • Industry-standard academic KG benchmark (since 2005)
-   • 3,272 triples (LUBM-1 scale); 30 OWL classes, 23 properties
-   • Used by Jena, RDFox, Stardog, and GraphDB for comparison
- TEST PROTOCOL: 11 hard scenarios × 2 LLMs × 2 approaches. For each test query:
-   1. VANILLA: send the query to the LLM with NO context
-   2. HYPERMIND: send the query with the SchemaContext (Γ) injected
-   3. VALIDATE: parse → type-check → execute → verify results
- ACCURACY FORMULA: Accuracy = (queries that pass ALL 3 gates) / (total queries) × 100
-   Gate 1: syntax valid (no markdown, valid SPARQL)
-   Gate 2: executable (runs without error on rust-kgdb)
-   Gate 3: type safe (uses ONLY predicates from the SchemaContext)
- RESULTS: Vanilla LLM 0/11 passed (0%), failing Gate 1 or Gate 3 every time; HyperMind 9.5/11 passed on average (86.4%): Claude 10/11, GPT-4o 9/11.
-
- **Reproducibility**: Run `node vanilla-vs-hypermind-benchmark.js` to verify these numbers yourself.
-
- ### What Was Tested
-
- | Component | Specification |
- |-----------|---------------|
- | **Dataset** | LUBM (Lehigh University Benchmark), standard academic KG benchmark |
- | **Triples** | 3,272 (LUBM-1 scale) |
- | **Schema** | 30 OWL classes, 23 properties |
- | **Deployment** | rust-kgdb Kubernetes cluster (3 executors, 1 coordinator) |
-
- ### Test Categories (11 Hard Scenarios)
-
- | Category | Count | What It Tests |
- |----------|-------|---------------|
- | **ambiguous** | 3 | Queries with multiple valid interpretations |
- | **multi_hop** | 2 | Requires JOIN reasoning across entities |
- | **syntax** | 2 | Catches markdown/formatting errors |
- | **edge_case** | 2 | Boundary conditions, empty results |
- | **type_mismatch** | 2 | Schema-violation detection |
-
- ### How We Tested (Evaluation Protocol)
-
- ```javascript
- // VANILLA LLM: No context (baseline)
- const vanillaPrompt = `Generate SPARQL: ${query}`
- // Result: LLM guesses predicates, wraps output in markdown, hallucinates
-
- // HYPERMIND: Schema injected into the prompt
- const hypermindPrompt = `
- SCHEMA:
- Classes: ${schema.classes.join(', ')}        // From YOUR actual data
- Predicates: ${schema.predicates.join(', ')}  // From YOUR actual data
-
- TYPE CONTRACT:
- - Input: natural language query
- - Output: raw SPARQL (NO markdown, NO code blocks)
- - Precondition: Query references ONLY schema predicates
- - Postcondition: Valid SPARQL 1.1 syntax
-
- Query: ${query}
- `
- // Result: LLM generates valid, type-safe queries
- ```
-
- ### Success Criteria (Three Gates)
-
- 1. **Syntax Valid**: Query parses without errors (no markdown wrapping)
- 2. **Executable**: Query runs against the database without exceptions
- 3. **Type Safe**: Uses ONLY predicates defined in the schema (no hallucination)
-
- ### Why Vanilla LLMs Fail (100% Failure Rate)
-
- User: "Find all professors". A vanilla LLM typically wraps the query in a markdown code fence (PROBLEM 1), picks the wrong class, `?prof a ub:Faculty` where the schema has `Professor` (PROBLEM 2), and appends explanation text such as "This query finds all faculty..." (PROBLEM 3).
- Result: ❌ the parser rejects the markdown, and the class is hallucinated.
-
- ### Why HyperMind Succeeds (86.4% Success Rate)
-
- User: "Find all professors". HyperMind outputs raw SPARQL with the class taken from the injected schema:
-   PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
-   SELECT ?prof WHERE { ?prof a ub:Professor . }
- Result: ✅ parses, executes, returns 15 professors.
-
- ### Calibration Against Industry Benchmarks
-
- Our methodology is calibrated against established AI benchmarks:
-
- | Benchmark | Organization | What It Measures | How We Applied It |
- |-----------|--------------|------------------|-------------------|
- | **GAIA** | Meta Research | Multi-step reasoning, tool use | Test categories (ambiguous, multi_hop) |
- | **SWE-bench** | OpenAI | Code-generation accuracy | Success criteria (syntax, executable, type-safe) |
- | **LUBM** | Lehigh University | Knowledge-graph query performance | Dataset (3,272 triples, 30 classes, 23 predicates) |
-
- **Calibration Process:**
- 1. **GAIA-inspired categories**: We adopted GAIA's multi-step reasoning tests for the `multi_hop` and `ambiguous` categories
- 2. **SWE-bench-inspired validation**: As SWE-bench validates code patches via test suites, we validate queries via three gates (syntax → executable → type-safe)
- 3. **LUBM standard dataset**: An industry-standard academic benchmark ensures reproducibility across implementations
-
- ### Verification Method
-
- Each test must pass ALL 5 stages of the verification pipeline to count as a SUCCESS:
- 1. GENERATE: the LLM produces SPARQL from natural language
- 2. PARSE: the rust-kgdb SPARQL parser validates syntax (markdown? → FAIL; invalid syntax? → FAIL)
- 3. TYPE-CHECK: QueryValidator checks against the SchemaContext (Γ) (unknown predicate? → FAIL, hallucination detected; wrong domain/range? → FAIL)
- 4. EXECUTE: the query runs against the LUBM dataset in the rust-kgdb cluster (runtime error? → FAIL; empty when expecting results? → FAIL)
- 5. VERIFY: results are compared against known LUBM answers (matches expected? → PASS)
-
- ### Published Results
-
- | Artifact | Location | What It Contains |
- |----------|----------|------------------|
- | **Benchmark Report** | `HYPERMIND_BENCHMARK_REPORT.md` | Full methodology, per-test results, failure analysis |
- | **Benchmark Code** | `vanilla-vs-hypermind-benchmark.js` | Runnable benchmark comparing vanilla vs HyperMind |
- | **Example: Fraud** | `examples/fraud-detection-agent.js` | Real dataset (`FRAUD_ONTOLOGY`) loaded via `db.loadTtl()` |
- | **Example: Underwriting** | `examples/underwriting-agent.js` | Real dataset (`UNDERWRITING_KB`) loaded via `db.loadTtl()` |
- | **npm Package** | `rust-kgdb` | Published SDK with all benchmark code |
-
- ### Dataset Loading (Factually Verifiable)
-
- Both examples load real ontologies/knowledge bases via `loadTtl()`:
-
- ```javascript
- // examples/fraud-detection-agent.js (line 612)
- db.loadTtl(FRAUD_ONTOLOGY, CONFIG.kg.graphUri)
- // FRAUD_ONTOLOGY contains the ins:Claimant, ins:Provider, ins:Claim classes
- // with properties: claimant, provider, amount, address (for ring detection)
-
- // examples/underwriting-agent.js (line 766)
- db.loadTtl(UNDERWRITING_KB, 'http://underwriting.org/data')
- // UNDERWRITING_KB contains the uw:BusinessAccount, uw:Territory classes
- // with properties: naicsCode, revenue, territory, hurricaneExposure, earthquakeExposure
- ```
-
- **Verify in code**: Run `grep -n "loadTtl" examples/*.js` to see the exact lines.
-
- ### End-to-End Architecture: HyperMind Deterministic Flow
-
- HyperMind: deterministic, schema-driven execution powered by the rust-kgdb GraphDB (LLM OPTIONAL). For the request "Find high-risk providers with claims over $10,000":
- 1. SCHEMA CONTEXT (Γ), an object, NOT a string: `const schemaContext = await SchemaContext.fromKG(db)` returns `{ classes: Set, properties: Map, ... }`
- 2. DETERMINISTIC INTENT ANALYSIS (NO LLM): `const intent = this._analyzeIntent(prompt)`; keyword matching ("high-risk" → `intent.risk = true`; "claims over" → `intent.query = true`, `intent.filter`); deterministic: same input → same intent
- 3. SCHEMA-DRIVEN QUERY GENERATION (NO LLM): `const sparql = this._generateSchemaSparql(intent, schema)` finds matching predicates in the SchemaContext (`riskScore` and `amount` are in `schema.predicates`) and generates `SELECT ?p ?score WHERE { ?p :riskScore ... }`
- 4. VALIDATION + EXECUTION (rust-kgdb): `validateQuery(sparql, schemaContext)` confirms all predicates exist in the SchemaContext and types match (domain/range); then `const results = db.querySelect(sparql)` runs in 2.78 µs
- 5. PROOF DAG (audit trail): `{ answer: "Provider P001, P003 are high-risk", derivations: [{ tool: "kg.sparql.query", ... }], hash: "sha256:8f3a2b1c..." }`, reproducible
-
- LLM OPTIONAL: if enabled, it is used ONLY for final summarization. KEY: same input + same schema = same query = same results = same hash.
-
- **Code References** (verify in `hypermind-agent.js`):
- - `_analyzeIntent()` line 2286: Deterministic keyword matching
- - `_generateSteps()` line 2297: Schema-driven step generation
- - `_generateSchemaSparql()` line 2368: Schema-aware SPARQL generation
- - `validateQuery()`: Type-checks against the SchemaContext
-
- ### Run It Yourself
-
- ```bash
- # 1. Install the SDK
- npm install rust-kgdb
-
- # 2. Set API keys
- export OPENAI_API_KEY="sk-..."
- export ANTHROPIC_API_KEY="sk-ant-..."
-
- # 3. Run the benchmark
- node vanilla-vs-hypermind-benchmark.js
-
- # 4. Run the examples
- node examples/fraud-detection-agent.js
- node examples/underwriting-agent.js
- ```
-
- **All results are reproducible.** Same schema + same question = same answer = same hash.

## Commands

### Build Native Addon
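The three-gate scoring described in the removed section is compact enough to sketch. Below is a minimal illustration, not the benchmark's actual code: `parseSparql` and `extractPredicates` are hypothetical stand-ins for the rust-kgdb parser and QueryValidator. The arithmetic also checks out: Claude 10/11 ≈ 90.9%, GPT-4o 9/11 ≈ 81.8%, and pooled (10 + 9)/22 ≈ 86.4%, the headline number.

```javascript
// Minimal sketch of the three-gate accuracy rule (hypothetical helpers:
// parseSparql and extractPredicates are NOT actual rust-kgdb APIs).
function passesAllGates(queryText, schema, db, { parseSparql, extractPredicates }) {
  // Gate 1: syntax valid. Raw SPARQL contains no backticks, so any backtick
  // indicates the LLM wrapped its answer in a markdown fence.
  if (queryText.includes('`')) return false
  let ast
  try { ast = parseSparql(queryText) } catch { return false }

  // Gate 2: executable against the database without exceptions.
  try { db.querySelect(queryText) } catch { return false }

  // Gate 3: type safe. Every predicate must already exist in the schema;
  // anything else counts as a hallucination.
  const known = new Set(schema.predicates)
  return extractPredicates(ast).every(p => known.has(p))
}

// Accuracy = passed / total × 100, e.g. 19 of 22 runs → 86.4%.
const accuracy = (passed, total) => (100 * passed / total).toFixed(1)
```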
@@ -332,8 +31,9 @@ npx jest tests/regression.test.ts --testNamePattern="SPARQL"

### Publishing

```bash
- npm
- npm
+ npm version patch --no-git-tag-version  # Bump version
+ npm publish                             # Publish to npm
+ npm view rust-kgdb                      # View package info
```

## Architecture
@@ -365,190 +65,17 @@ npm view rust-kgdb # View package info

1. **Native NAPI-RS** (`native/rust-kgdb-napi/src/lib.rs`): Rust bindings for GraphDB, GraphFrame, Embeddings, Datalog, Pregel
2. **HyperMind Framework** (`hypermind-agent.js`): Pure JS AI agent framework with schema awareness, memory, sandboxing

Removed (old lines 368-545): the approach-comparison, proxy-architecture, CMD-analogy, and theory sections:

- ##
-
- APPROACH COMPARISON
- TRADITIONAL (LangChain, AutoGPT): User → LLM → Tool Call. The LLM decides what to call and generates the query text.
-   Pros: flexible, easy setup, vague tasks OK. Cons: 20-40% success, hallucinates, no audit trail, non-deterministic, expensive.
- OUR APPROACH (HyperMind): User → Deterministic Planner → Typed Steps. The schema generates the query, and the schema validates it.
-   Pros: 86.4% success, zero hallucination, full audit trail, reproducible, cheap at scale. Cons: needs a schema; structured data only.
- WHY WE CHOSE DETERMINISTIC:
-   • Enterprise needs audit trails (compliance)
-   • 86.4% vs 20-40% is a category difference
-   • An LLM call per query is expensive at scale
-
- ## Domain-Enriched Proxy Architecture (Our Unique Approach)
-
- HyperMind uses a **schema-enriched deterministic planner**. Key difference: the LLM is OPTIONAL (summarization only).
-
- TRADITIONAL APPROACH (LangChain, AutoGPT, MCP): User question → LLM (no domain knowledge) → LLM generates query → tool call → results (often wrong). Hallucinated predicates, no schema validation, 20-40% success.
-
- OUR APPROACH (HyperMind):
-   Knowledge graph → SchemaContext (Γ), AS OBJECT (not a string!):
-     { classes: Set(['Claim', 'Provider']), properties: Map({ 'amount': {...} }), domains: Map({...}), ranges: Map({...}) }
-   User question + Γ → DETERMINISTIC PLANNER (no LLM!):
-     1. _analyzeIntent(prompt)                 // keyword matching (deterministic)
-     2. _generateSteps(intent, schemaContext)  // from schema
-     3. _generateSchemaSparql(intent, schema)  // schema-aware
-     4. validateQuery(sparql, schemaContext)   // type-check
-   → typed, validated execution plan → rust-kgdb execution (2.78 µs) → ProofDAG (audit trail) → results (86.4% accuracy)
-   LLM OPTIONAL: only for summarization (not query generation)
-
- **Key Insight**: The SchemaContext is an OBJECT passed to the deterministic planner, NOT a string injected into an LLM prompt. Query generation is deterministic, not LLM-dependent.
-
- ### Injection vs Proxy: The CMD Analogy
-
- Think of it like the evolution from DOS to modern shells:
-
- DOS/CMD ERA (classification approach): User: "copy files". System: ❌ "Bad command or file name". You MUST know the exact syntax (COPY C:\src\*.txt D:\dst\). No help, no context, no forgiveness.
-
- MODERN SHELL with AI (proxy approach): User: "copy all text files from src to dst". The proxy sees that /src/ has 47 .txt files and that /dst/ exists and is writable, generates `cp /src/*.txt /dst/`, and reports: ✅ Executed, 47 files copied. The PROXY knows your context and translates intent into exact commands.
-
- **HyperMind is the "modern shell" for knowledge graphs.** The SchemaContext is your "filesystem listing", injected so the LLM knows what actually exists before generating queries.
-
- ### The Beautiful Integration: Context Theory + Proof Theory
-
- HyperMind elegantly combines two mathematical foundations:
-
- CONTEXT THEORY (Spivak's Ologs): "what CAN be said". Your knowledge graph as a category:
-   Objects (classes): Claim, Provider, Policy
-   Morphisms (properties): Claim ──amount──► xsd:decimal; Claim ──provider──► Provider; Provider ──riskScore──► xsd:float
-   SchemaContext Γ = (Classes, Properties, Domains, Ranges)
-   Γ defines the "grammar" of valid statements. If it's not in Γ, it cannot be queried. Hallucination becomes IMPOSSIBLE.
-
-   (The schema is injected; the LLM generates a TYPED query.)
-
- PROOF THEORY (Curry-Howard): "how it WAS derived". Every answer has a PROOF (ProofDAG):
-   Conclusion: "Provider P001 is high-risk"
-     ├── Evidence: SPARQL returned riskScore = 0.87; derivation: Γ ⊢ ?p :riskScore ?r (type-checked)
-     ├── Evidence: Datalog rule matched "highRisk(?p)"; derivation: highRisk(P) :- riskScore(P,R), R>0.8
-     └── Hash: sha256:8f3a2b1c... (reproducible)
-   Proofs are PROGRAMS (the Curry-Howard correspondence):
-     • Γ ⊢ e : τ = "expression e has type τ in context Γ"
-     • Valid query = valid proof = executable program
-     • Same input → same proof → same output (deterministic)
-
- **The Elegance**:
- 1. **Context Theory** ensures you can ONLY ask valid questions (schema-bounded)
- 2. **Proof Theory** ensures every answer has a verifiable derivation chain
- 3. **Together**: questions are bounded by reality; answers are backed by proof
-
- This is why HyperMind achieves **86.4% accuracy** while vanilla LLMs achieve **0%** on structured-data tasks: it's not prompt engineering, it's **mathematical guarantees**.

Added (new lines 68-78):

+ ## Key Files
+
+ | File | Purpose |
+ |------|---------|
+ | `native/rust-kgdb-napi/src/lib.rs` | NAPI-RS Rust bindings (~700 lines) |
+ | `hypermind-agent.js` | HyperMind AI Framework (~4000 lines) |
+ | `index.js` | Platform loader + exports (~167 lines) |
+ | `index.d.ts` | TypeScript definitions (~425 lines) |
+ | `test-all-features.js` | 42 feature tests |
+ | `tests/*.test.ts` | Jest test suites (~170 tests) |
+ | `examples/` | Fraud detection, underwriting demos |

## Key APIs
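The "object, not string" distinction in the removed section is concrete enough to illustrate. The sketch below shows a SchemaContext-shaped object plus keyword-based intent matching in the spirit of `_analyzeIntent()`; the shapes follow the removed diagram, but this is an illustration, not the SDK's implementation.

```javascript
// Sketch: a SchemaContext-shaped object (shapes follow the removed diagram)
// and deterministic keyword intent matching in the spirit of _analyzeIntent().
const schemaContext = {
  classes: new Set(['Claim', 'Provider']),
  properties: new Map([
    ['amount', { domain: 'Claim', range: 'xsd:decimal' }],
    ['riskScore', { domain: 'Provider', range: 'xsd:float' }],
  ]),
}

function analyzeIntent(prompt) {
  const p = prompt.toLowerCase()
  return {
    risk: p.includes('high-risk'),          // "high-risk" → intent.risk
    query: /over|above|more than/.test(p),  // "claims over" → intent.query
    filter: /over \$?[\d,]+/.test(p),       // numeric threshold → intent.filter
  }
}

// Deterministic: the same prompt always yields the same intent object.
console.log(analyzeIntent('Find high-risk providers with claims over $10,000'))
// → { risk: true, query: true, filter: true }
```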
@@ -563,6 +90,22 @@ This is why HyperMind achieves **86.4% accuracy** while vanilla LLMs achieve **0

| **Pregel** | `pregelShortestPaths()` |
| **Factories** | `friendsGraph()`, `chainGraph()`, `starGraph()`, `completeGraph()`, `cycleGraph()` |

Added (new lines 93-108):

+ ## HyperMind Key Methods (hypermind-agent.js)
+
+ When modifying the HyperMind framework, these are the critical methods:
+
+ | Method | Line | Purpose |
+ |--------|------|---------|
+ | `_analyzeIntent()` | ~2286 | Deterministic keyword matching (NO LLM) |
+ | `_generateSteps()` | ~2297 | Schema-driven step generation |
+ | `_generateSchemaSparql()` | ~2368 | Schema-aware SPARQL generation |
+ | `SchemaContext` class | ~699 | Object with `classes: Set`, `properties: Map` |
+ | `WasmSandbox` class | ~2612 | Capability-based execution with audit log |
+ | `TOOL_REGISTRY` | ~1687 | Typed morphisms `Query → BindingSet` |
+ | `ProofDAG` class | ~2411 | Derivation chain with hash |
+
+ **Key Design Point**: The LLM is OPTIONAL; it is used only for summarization, NOT query generation. Query generation is deterministic from the SchemaContext.

## Rust Workspace Dependencies

Native addon depends on parent workspace crates:
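The `ProofDAG` row above pairs a derivation chain with a reproducibility hash. A minimal sketch using Node's built-in `node:crypto` follows; the field names mirror the diff's ProofDAG example, but this hashing scheme is an assumption, not taken from the SDK.

```javascript
// Sketch: a reproducible audit hash over an answer plus its derivations.
// Field names mirror the ProofDAG example in the diff; the scheme is assumed.
const { createHash } = require('node:crypto')

function proofHash(answer, derivations) {
  // JSON.stringify preserves insertion order, so identically built proof
  // objects serialize identically: same input → same hash.
  const canonical = JSON.stringify({ answer, derivations })
  return 'sha256:' + createHash('sha256').update(canonical).digest('hex')
}

const proof = {
  answer: 'Provider P001, P003 are high-risk',
  derivations: [{ tool: 'kg.sparql.query' }],
}
proof.hash = proofHash(proof.answer, proof.derivations)
// Same input + same schema → same query → same results → same hash.
```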
@@ -580,18 +123,6 @@ Native addon depends on parent workspace crates:

3. **Export**: Add to `module.exports` in `index.js`
4. **Tests**: Add test in `test-all-features.js`

Removed (old lines 583-594): the old "Key Files" section, superseded by the expanded table earlier in the file:

- ## Key Files
-
- | File | Purpose |
- |------|---------|
- | `native/rust-kgdb-napi/src/lib.rs` | NAPI-RS Rust bindings |
- | `hypermind-agent.js` | HyperMind AI Framework (~4000 lines) |
- | `index.js` | Platform loader + exports |
- | `index.d.ts` | TypeScript definitions |
- | `test-all-features.js` | 42 feature tests |
- | `tests/*.test.ts` | Jest test suites (~170 tests) |
- | `examples/` | Fraud detection, underwriting demos |

## Native Addon Files

Built addons (platform-specific):
@@ -601,7 +132,7 @@ Built addons (platform-specific):

## Version Management

- 1. Update version
+ 1. Update version: `npm version patch --no-git-tag-version`
2. Run tests: `npm test`
3. Publish: `npm publish`
4. Verify: `npm view rust-kgdb versions`
@@ -616,3 +147,23 @@ cd /path/to/rust-kgdb && cargo build --workspace --release

**Platform error**: Supported: darwin/linux (x64/arm64), win32 (x64)

Added (new lines 150-169):

+ ## Benchmark Information
+
+ For benchmark methodology and results, see:
+ - `HYPERMIND_BENCHMARK_REPORT.md`: full methodology, per-test results
+ - `vanilla-vs-hypermind-benchmark.js`: HyperMind vs vanilla LLM (JavaScript)
+ - `benchmark-frameworks.py`: compare Vanilla/LangChain/DSPy with and without schema (Python)
+ - `examples/fraud-detection-agent.js`: real-dataset example (line 612: `loadTtl`)
+ - `examples/underwriting-agent.js`: real-dataset example (line 766: `loadTtl`)
+
+ **Running Benchmarks**:
+ ```bash
+ # JavaScript benchmark (HyperMind vs vanilla on LUBM)
+ ANTHROPIC_API_KEY=... OPENAI_API_KEY=... node vanilla-vs-hypermind-benchmark.js
+
+ # Python benchmark (compare frameworks with/without schema)
+ OPENAI_API_KEY=... uv run --with openai --with langchain --with langchain-openai --with langchain-core --with dspy-ai python3 benchmark-frameworks.py
+ ```
+
+ **Key Result**: HyperMind achieves 86.4% accuracy on the LUBM benchmark (3,272 triples, 30 classes, 23 properties) where vanilla LLMs achieve 0%.