rust-kgdb 0.6.54 → 0.6.56
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +22 -0
- package/README.md +212 -1678
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -2,1819 +2,353 @@
|
|
|
2
2
|
|
|
3
3
|
[](https://www.npmjs.com/package/rust-kgdb)
|
|
4
4
|
[](https://opensource.org/licenses/Apache-2.0)
|
|
5
|
-
[](https://www.w3.org/TR/sparql11-query/)
|
|
6
|
-
|
|
7
|
-
## What Is This?
|
|
8
|
-
|
|
9
|
-
**rust-kgdb** is two layers in one package:
|
|
10
|
-
|
|
11
|
-
```
|
|
12
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
13
|
-
│ YOUR APPLICATION │
|
|
14
|
-
└─────────────────────────────────┬───────────────────────────────────────────┘
|
|
15
|
-
│
|
|
16
|
-
┌─────────────────────────────────▼───────────────────────────────────────────┐
|
|
17
|
-
│ HYPERMIND AGENT FRAMEWORK (JavaScript) │
|
|
18
|
-
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
|
|
19
|
-
│ │ LLMPlanner │ │ MemoryMgr │ │ WASM │ │ ProofDAG │ │
|
|
20
|
-
│ │ (Schema- │ │ (Working/ │ │ Sandbox │ │ (Audit │ │
|
|
21
|
-
│ │ Aware) │ │ Episodic) │ │ (Secure) │ │ Trail) │ │
|
|
22
|
-
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
|
|
23
|
-
└─────────────────────────────────┬───────────────────────────────────────────┘
|
|
24
|
-
│ NAPI-RS (zero-copy)
|
|
25
|
-
┌─────────────────────────────────▼───────────────────────────────────────────┐
|
|
26
|
-
│ RUST CORE (Native Performance) │
|
|
27
|
-
│ ┌──────────────────────────────────────────────────────────────────────┐ │
|
|
28
|
-
│ │ QUERY ENGINE │ │
|
|
29
|
-
│ │ • SPARQL 1.1 (449ns lookups) • WCOJ Joins (worst-case optimal) │ │
|
|
30
|
-
│ │ • Datalog (semi-naive eval) • Sparse Matrix (CSR/CSC reasoning) │ │
|
|
31
|
-
│ └──────────────────────────────────────────────────────────────────────┘ │
|
|
32
|
-
│ ┌──────────────────────────────────────────────────────────────────────┐ │
|
|
33
|
-
│ │ GRAPH ANALYTICS │ │
|
|
34
|
-
│ │ • GraphFrames (PageRank, Components, Triangles, Motifs) │ │
|
|
35
|
-
│ │ • Pregel BSP (Bulk Synchronous Parallel) │ │
|
|
36
|
-
│ │ • Shortest Paths, Label Propagation │ │
|
|
37
|
-
│ └──────────────────────────────────────────────────────────────────────┘ │
|
|
38
|
-
│ ┌──────────────────────────────────────────────────────────────────────┐ │
|
|
39
|
-
│ │ VECTOR & RETRIEVAL │ │
|
|
40
|
-
│ │ • HNSW Index (O(log N) ANN) • ARCADE 1-Hop Cache (O(1) neighbors) │ │
|
|
41
|
-
│ │ • Multi-provider Embeddings • RRF Reranking │ │
|
|
42
|
-
│ └──────────────────────────────────────────────────────────────────────┘ │
|
|
43
|
-
│ ┌──────────────────────────────────────────────────────────────────────┐ │
|
|
44
|
-
│ │ STORAGE │ │
|
|
45
|
-
│ │ • InMemory (dev) • RocksDB (prod) • LMDB (read-heavy) │ │
|
|
46
|
-
│ │ • SPOC/POCS/OCSP/CSPO Indexes • 24 bytes/triple │ │
|
|
47
|
-
│ └──────────────────────────────────────────────────────────────────────┘ │
|
|
48
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
49
|
-
```
|
|
50
|
-
|
|
51
|
-
### Layer 1: Rust Core (Native Performance)
|
|
52
|
-
|
|
53
|
-
| Component | What It Does | Performance |
|
|
54
|
-
|-----------|--------------|-------------|
|
|
55
|
-
| **SPARQL 1.1** | W3C-compliant query engine, 64 builtin functions | 449ns lookups |
|
|
56
|
-
| **RDF 1.2** | RDF-Star (quoted triples), TriG, N-Quads | W3C compliant |
|
|
57
|
-
| **SHACL** | W3C Shapes Constraint Language validation | Constraint engine |
|
|
58
|
-
| **PROV** | W3C Provenance ontology support | Audit trail |
|
|
59
|
-
| **WCOJ Joins** | Worst-case optimal joins for multi-way patterns | O(N^(ρ/2)) |
|
|
60
|
-
| **Datalog** | Semi-naive evaluation with recursion | Incremental |
|
|
61
|
-
| **Sparse Matrix** | CSR/CSC-based reasoning for OWL 2 RL | Memory-efficient |
|
|
62
|
-
| **GraphFrames** | PageRank, components, triangles, motifs | Parallel |
|
|
63
|
-
| **Pregel** | Bulk Synchronous Parallel graph processing | Superstep-based |
|
|
64
|
-
| **HNSW** | Hierarchical Navigable Small World index | O(log N) |
|
|
65
|
-
| **ARCADE Cache** | 1-hop neighbor pre-caching | O(1) context |
|
|
66
|
-
| **Storage** | InMemory, RocksDB, LMDB backends | 24 bytes/triple |
|
|
67
|
-
|
|
68
|
-
**Scalability Numbers (Verified Benchmark)**:
|
|
69
|
-
|
|
70
|
-
| Operation | 1 Worker | 16 Workers | Scaling |
|
|
71
|
-
|-----------|----------|------------|---------|
|
|
72
|
-
| Concurrent Writes | 66K ops/sec | 132K ops/sec | 2.0x |
|
|
73
|
-
| GraphFrame Analytics | 6.0K ops/sec | 6.5K ops/sec | Thread-safe |
|
|
74
|
-
| Memory per Triple | 24 bytes | 24 bytes | Constant |
|
|
75
|
-
|
|
76
|
-
Reproduce: `node concurrency-benchmark.js`
|
|
77
|
-
|
|
78
|
-
### Layer 2: HyperMind Agent Framework (JavaScript)
|
|
79
|
-
|
|
80
|
-
| Component | What It Does |
|
|
81
|
-
|-----------|--------------|
|
|
82
|
-
| **LLMPlanner** | Schema-aware query generation (auto-extracts from data) |
|
|
83
|
-
| **MemoryManager** | Working memory + episodic memory + long-term KG |
|
|
84
|
-
| **WASM Sandbox** | Secure execution with capability-based permissions |
|
|
85
|
-
| **ProofDAG** | Audit trail with cryptographic hash for reproducibility |
|
|
86
|
-
| **TypedTools** | Input/output validation prevents hallucination |
|
|
87
|
-
|
|
88
|
-
### WASM Sandbox Architecture
|
|
89
|
-
|
|
90
|
-
```
|
|
91
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
92
|
-
│ WASM SANDBOX (Secure Agent Execution) │
|
|
93
|
-
├─────────────────────────────────────────────────────────────────────────────┤
|
|
94
|
-
│ │
|
|
95
|
-
│ ┌─────────────────────┐ ┌─────────────────────┐ ┌────────────────┐ │
|
|
96
|
-
│ │ CAPABILITIES │ │ FUEL METERING │ │ AUDIT LOG │ │
|
|
97
|
-
│ │ • ReadKG │ │ • CPU budget limit │ │ • Every action │ │
|
|
98
|
-
│ │ • ExecuteTool │ │ • Prevents infinite │ │ • Timestamps │ │
|
|
99
|
-
│ │ • WriteKG (opt) │ │ loops │ │ • Arguments │ │
|
|
100
|
-
│ └─────────────────────┘ └─────────────────────┘ └────────────────┘ │
|
|
101
|
-
│ │
|
|
102
|
-
│ Agent Code → WASM Runtime → Capability Check → Tool Execution → Audit │
|
|
103
|
-
│ │
|
|
104
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
105
|
-
```
|
|
106
|
-
|
|
107
|
-
**Think of it as**: A knowledge graph database (Rust, native performance) with an AI agent runtime (JavaScript, WASM-sandboxed) on top. The database provides ground truth. The runtime makes it accessible via natural language with full security and audit trails.
|
|
108
|
-
|
|
109
|
-
### Game Changer: Embedded Database (No Installation)
|
|
110
|
-
|
|
111
|
-
```
|
|
112
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
113
|
-
│ TRADITIONAL APPROACH │
|
|
114
|
-
│ ─────────────────────── │
|
|
115
|
-
│ Your App → HTTP/gRPC → Database Server → Disk │
|
|
116
|
-
│ │
|
|
117
|
-
│ • Install database server (RDFox, Virtuoso, Neo4j) │
|
|
118
|
-
│ • Configure connections, ports, authentication │
|
|
119
|
-
│ • Network latency on every query │
|
|
120
|
-
│ • DevOps overhead for maintenance │
|
|
121
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
122
|
-
|
|
123
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
124
|
-
│ rust-kgdb: EMBEDDED │
|
|
125
|
-
│ ────────────────────── │
|
|
126
|
-
│ Your App ← contains → rust-kgdb (native addon) │
|
|
127
|
-
│ │
|
|
128
|
-
│ • npm install rust-kgdb - that's it │
|
|
129
|
-
│ • No server, no Docker, no configuration │
|
|
130
|
-
│ • Zero network latency (same process) │
|
|
131
|
-
│ • Deploy as single binary │
|
|
132
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
133
|
-
```
|
|
134
|
-
|
|
135
|
-
**Why This Matters**:
|
|
136
|
-
- **SQLite for RDF**: Like SQLite replaced MySQL for embedded use cases
|
|
137
|
-
- **449ns lookups**: No network roundtrip - direct memory access
|
|
138
|
-
- **Ship as one file**: Your app + database = single deployable
|
|
139
|
-
|
|
140
|
-
**Scale When You Need To**: Start embedded, scale to cluster when required:
|
|
141
|
-
```
|
|
142
|
-
Embedded (single node) → Clustered (distributed)
|
|
143
|
-
npm install K8s deployment
|
|
144
|
-
No config HDRF partitioning
|
|
145
|
-
Millions of triples Billions of triples
|
|
146
|
-
```
|
|
147
5
|
|
|
148
6
|
---
|
|
149
7
|
|
|
150
|
-
##
|
|
8
|
+
## The Trillion-Dollar Mistake
|
|
151
9
|
|
|
152
|
-
|
|
153
|
-
**Problem**: LLMs generate SPARQL with made-up predicates (`?person :fakeProperty ?value`).
|
|
154
|
-
**Solution**: We auto-extract your schema and inject it into prompts. The LLM can ONLY reference predicates that actually exist in your data.
|
|
10
|
+
A lawyer asks AI: *"Has this contract clause ever been challenged in court?"*
|
|
155
11
|
|
|
156
|
-
|
|
157
|
-
**Problem**: LangChain/DSPy generate queries, but you need to find a database to run them.
|
|
158
|
-
**Solution**: rust-kgdb IS the database. Generate query → Execute query → Return results. All in one package.
|
|
12
|
+
AI responds: *"Yes, in Smith v. Johnson (2019), the court ruled..."*
|
|
159
13
|
|
|
160
|
-
|
|
161
|
-
**Problem**: LLM says "Provider P001 is suspicious" - where did that come from?
|
|
162
|
-
**Solution**: Every answer includes a reasoning trace showing which SPARQL queries ran, which rules matched, and what data was found.
|
|
14
|
+
The lawyer cites it. The judge looks confused. **That case doesn't exist.** The AI invented it.
|
|
163
15
|
|
|
164
|
-
|
|
165
|
-
**Problem**: Ask the same question twice, get different answers.
|
|
166
|
-
**Solution**: Same input → Same query → Same database → Same result → Same hash. Reproducible for compliance.
|
|
16
|
+
This isn't rare. It happens every day:
|
|
167
17
|
|
|
168
|
-
|
|
169
|
-
|
|
170
|
-
|
|
18
|
+
**In Healthcare:**
|
|
19
|
+
> Doctor: "What drugs interact with this patient's current medications?"
|
|
20
|
+
> AI: "Avoid combining with Nexapril due to cardiac risks."
|
|
21
|
+
> *Nexapril isn't a real drug.*
|
|
171
22
|
|
|
172
|
-
|
|
23
|
+
**In Insurance:**
|
|
24
|
+
> Claims Adjuster: "Has this provider shown suspicious billing patterns?"
|
|
25
|
+
> AI: "Provider #4521 has a history of duplicate billing..."
|
|
26
|
+
> *Provider #4521 has a perfect record.*
|
|
173
27
|
|
|
174
|
-
|
|
28
|
+
**In Fraud Detection:**
|
|
29
|
+
> Analyst: "Find transactions that look like money laundering."
|
|
30
|
+
> AI: "Account ending 7842 shows classic layering behavior..."
|
|
31
|
+
> *That account belongs to a charity. Now you've falsely accused them.*
|
|
175
32
|
|
|
176
|
-
**The
|
|
177
|
-
|
|
178
|
-
**The Solution**: HyperMind grounds every AI answer in YOUR actual data. Every response includes a complete audit trail. Same question = Same answer = Same proof.
|
|
33
|
+
**The AI doesn't know your data. It guesses. And it sounds confident while lying.**
|
|
179
34
|
|
|
180
35
|
---
|
|
181
36
|
|
|
182
|
-
##
|
|
183
|
-
|
|
184
|
-
### Benchmark Methodology
|
|
185
|
-
|
|
186
|
-
**Dataset**: [LUBM (Lehigh University Benchmark)](http://swat.cse.lehigh.edu/projects/lubm/) - the industry-standard benchmark for RDF/SPARQL systems since 2005. Used by RDFox, Virtuoso, Jena, and all major triple stores.
|
|
187
|
-
|
|
188
|
-
**Setup**:
|
|
189
|
-
- 3,272 triples, 30 OWL classes, 23 properties
|
|
190
|
-
- 7 query types: attribute (A1-A3), statistical (S1-S2), multi-hop (M1), existence (E1)
|
|
191
|
-
- Model: GPT-4o with real API calls (no mocking)
|
|
192
|
-
- Reproducible: `python3 benchmark-frameworks.py`
|
|
193
|
-
|
|
194
|
-
**Evaluation Criteria**:
|
|
195
|
-
- Query must parse (no markdown, no explanation text)
|
|
196
|
-
- Query must use correct ontology terms (e.g., `ub:Professor` not `ub:Faculty`)
|
|
197
|
-
- Query must return expected result count
|
|
198
|
-
|
|
199
|
-
### Honest Framework Comparison
|
|
200
|
-
|
|
201
|
-
**Important**: HyperMind and LangChain/DSPy are **different product categories**.
|
|
202
|
-
|
|
203
|
-
| Category | HyperMind | LangChain/DSPy |
|
|
204
|
-
|----------|-----------|----------------|
|
|
205
|
-
| **What It Is** | GraphDB + Agent Framework | LLM Orchestration Library |
|
|
206
|
-
| **Core Function** | Execute queries on data | Chain LLM prompts |
|
|
207
|
-
| **Data Storage** | Built-in QuadStore | None (BYODB) |
|
|
208
|
-
| **Query Execution** | Native SPARQL/Datalog | External DB needed |
|
|
209
|
-
| **Agent Memory** | Built-in (Working + Episodic + KG-backed) | External vector DB needed |
|
|
210
|
-
| **Deep Flashback** | 94% Recall@10 at 10K query depth (16.7ms) | Limited by external provider |
|
|
211
|
-
|
|
212
|
-
**Why Agent Memory Matters**: We can retrieve relevant past queries from 10,000+ history entries with 94% accuracy in 16.7ms. This enables "flashback" to any past interaction - LangChain/DSPy require external vector DBs for this capability.
|
|
213
|
-
|
|
214
|
-
**Built-in Capabilities (No External Dependencies)**:
|
|
37
|
+
## Why "Guardrails" Don't Fix This
|
|
215
38
|
|
|
216
|
-
|
|
217
|
-
|------------|-----------|----------------|
|
|
218
|
-
| **Recursive Reasoning** | Datalog semi-naive evaluation (native) | Manual implementation needed |
|
|
219
|
-
| **Graph Propagation** | Pregel BSP (PageRank, shortest paths) | External library (NetworkX) |
|
|
220
|
-
| **Multi-way Joins** | WCOJ algorithm O(N^(ρ/2)) | No native support |
|
|
221
|
-
| **Pattern Matching** | Motif DSL `(a)-[]->(b); (b)-[]->(c)` | Manual graph traversal |
|
|
222
|
-
| **OWL 2 RL Reasoning** | Sparse matrix CSR/CSC (native) | External reasoner needed |
|
|
223
|
-
| **Vector Similarity** | HNSW + ARCADE 1-hop cache | External vector DB (Pinecone, etc.) |
|
|
224
|
-
| **Transitive Closure** | `ancestor(?X,?Z) :- parent(?X,?Y), ancestor(?Y,?Z)` | Loop implementation |
|
|
225
|
-
| **RDF-Star** | Native quoted triples (RDF 1.2) | Not supported |
|
|
226
|
-
| **Data Validation** | SHACL constraints (W3C) | External validator needed |
|
|
227
|
-
| **Provenance Tracking** | W3C PROV ontology (native) | Manual implementation |
|
|
39
|
+
The industry response? Add guardrails. Use RAG. Fine-tune models.
|
|
228
40
|
|
|
229
|
-
|
|
41
|
+
But here's what they don't tell you:
|
|
230
42
|
|
|
231
|
-
|
|
232
|
-
|--------|-----------|------------|
|
|
233
|
-
| **Triple Lookup** | 449 ns | 35x faster than RDFox |
|
|
234
|
-
| **Memory/Triple** | 24 bytes | 25% less than RDFox |
|
|
235
|
-
| **Concurrent Writes** | 132K ops/sec | Thread-safe at scale |
|
|
43
|
+
**RAG (Retrieval-Augmented Generation)** finds *similar* documents. Similar isn't the same as *correct*. If your policy database has 10,000 documents about cardiac drugs, RAG might retrieve the wrong 5.
|
|
236
44
|
|
|
237
|
-
**
|
|
45
|
+
**Fine-tuning** teaches the model patterns from your data. But patterns aren't facts. It still can't look up "does Patient X have a penicillin allergy" because it doesn't have a database - it has patterns.
|
|
238
46
|
|
|
239
|
-
|
|
240
|
-
- **LangChain**: When you need to orchestrate multiple LLM calls with prompts. Flexible, extensive integrations.
|
|
241
|
-
- **DSPy**: When you need to optimize prompts programmatically. Research-focused.
|
|
47
|
+
**Guardrails** catch obvious errors. But "Provider #4521 shows billing anomalies" sounds completely plausible. No guardrail catches it.
|
|
242
48
|
|
|
243
|
-
|
|
244
|
-
|
|
245
|
-
```
|
|
246
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
247
|
-
│ TEXT → INTENT → EMBEDDING → NEIGHBORS → ACCURATE SPARQL │
|
|
248
|
-
│ (The ARCADE Pipeline) │
|
|
249
|
-
├─────────────────────────────────────────────────────────────────────────────┤
|
|
250
|
-
│ │
|
|
251
|
-
│ 1. TEXT INPUT │
|
|
252
|
-
│ "Find high-risk providers" │
|
|
253
|
-
│ ↓ │
|
|
254
|
-
│ 2. INTENT CLASSIFICATION (Deterministic keyword matching) │
|
|
255
|
-
│ Intent: QUERY_ENTITIES │
|
|
256
|
-
│ Domain: insurance, Entity: provider, Filter: high-risk │
|
|
257
|
-
│ ↓ │
|
|
258
|
-
│ 3. EMBEDDING LOOKUP (HNSW index, 449ns) │
|
|
259
|
-
│ Query: "provider" → Vector [0.23, 0.87, ...] │
|
|
260
|
-
│ Similar entities: [:Provider, :Vendor, :Supplier] │
|
|
261
|
-
│ ↓ │
|
|
262
|
-
│ 4. 1-HOP NEIGHBOR RETRIEVAL (ARCADE Cache) │
|
|
263
|
-
│ :Provider → outgoing: [:hasRiskScore, :hasClaim, :worksFor] │
|
|
264
|
-
│ :Provider → incoming: [:submittedBy, :reviewedBy] │
|
|
265
|
-
│ Cache hit: O(1) lookup, no SPARQL needed │
|
|
266
|
-
│ ↓ │
|
|
267
|
-
│ 5. SCHEMA-AWARE SPARQL GENERATION │
|
|
268
|
-
│ Available predicates: {hasRiskScore, hasClaim, worksFor} │
|
|
269
|
-
│ Filter mapping: "high-risk" → ?score > 0.7 │
|
|
270
|
-
│ Generated: SELECT ?p WHERE { ?p :hasRiskScore ?s . FILTER(?s > 0.7) } │
|
|
271
|
-
│ │
|
|
272
|
-
├─────────────────────────────────────────────────────────────────────────────┤
|
|
273
|
-
│ WHY THIS WORKS: │
|
|
274
|
-
│ • Step 2: NO LLM needed - deterministic pattern matching │
|
|
275
|
-
│ • Step 3: Embedding similarity finds related concepts │
|
|
276
|
-
│ • Step 4: ARCADE cache provides schema context in O(1) │
|
|
277
|
-
│ • Step 5: Schema injection ensures only valid predicates used │
|
|
278
|
-
│ │
|
|
279
|
-
│ ARCADE = Adaptive Retrieval Cache for Approximate Dense Embeddings │
|
|
280
|
-
│ Paper: https://arxiv.org/abs/2104.08663 │
|
|
281
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
282
|
-
```
|
|
283
|
-
|
|
284
|
-
**Embedding Trigger Setup** (automatic on triple insert):
|
|
285
|
-
```javascript
|
|
286
|
-
const { EmbeddingService, GraphDB } = require('rust-kgdb')
|
|
287
|
-
|
|
288
|
-
const db = new GraphDB('http://example.org/')
|
|
289
|
-
const embeddings = new EmbeddingService()
|
|
290
|
-
|
|
291
|
-
// On every triple insert, embedding cache is updated
|
|
292
|
-
db.loadTtl(':Provider123 :hasRiskScore "0.87" .', null)
|
|
293
|
-
// Triggers: embeddings.onTripleInsert('Provider123', 'hasRiskScore', '0.87', null)
|
|
294
|
-
// 1-hop cache updated: Provider123 → outgoing: [hasRiskScore]
|
|
295
|
-
```
|
|
296
|
-
|
|
297
|
-
### End-to-End Capability Benchmark
|
|
298
|
-
|
|
299
|
-
```
|
|
300
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
301
|
-
│ CAPABILITY COMPARISON: What Can Actually Execute on Data │
|
|
302
|
-
├─────────────────────────────────────────────────────────────────────────────┤
|
|
303
|
-
│ │
|
|
304
|
-
│ Capability │ HyperMind │ LangChain/DSPy │
|
|
305
|
-
│ ───────────────────────────────────────────────────────── │
|
|
306
|
-
│ Generate Motif Pattern │ ✅ │ ✅ │
|
|
307
|
-
│ Generate Datalog Rules │ ✅ │ ✅ │
|
|
308
|
-
│ Execute Motif on Data │ ✅ │ ❌ (no DB) │
|
|
309
|
-
│ Execute Datalog Rules │ ✅ │ ❌ (no DB) │
|
|
310
|
-
│ Execute SPARQL Queries │ ✅ │ ❌ (no DB) │
|
|
311
|
-
│ GraphFrame Analytics │ ✅ │ ❌ (no DB) │
|
|
312
|
-
│ Deterministic Results │ ✅ │ ❌ │
|
|
313
|
-
│ Audit Trail/Provenance │ ✅ │ ❌ │
|
|
314
|
-
│ ───────────────────────────────────────────────────────── │
|
|
315
|
-
│ TOTAL │ 8/8 │ 2/8 │
|
|
316
|
-
│ │
|
|
317
|
-
│ NOTE: LangChain/DSPy CAN execute on data if you integrate a database. │
|
|
318
|
-
│ HyperMind has the database BUILT-IN. │
|
|
319
|
-
│ │
|
|
320
|
-
│ Reproduce: node benchmark-e2e-execution.js │
|
|
321
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
322
|
-
```
|
|
323
|
-
|
|
324
|
-
### Memory Retrieval Depth Benchmark
|
|
325
|
-
|
|
326
|
-
Based on academic benchmarks: MemQ (arXiv 2503.05193), mKGQAgent (Text2SPARQL 2025), MTEB.
|
|
327
|
-
|
|
328
|
-
```
|
|
329
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
330
|
-
│ BENCHMARK: Memory Retrieval at Depth (50 queries per depth) │
|
|
331
|
-
│ METHODOLOGY: LUBM schema-driven queries, HNSW index, random seed 42 │
|
|
332
|
-
├─────────────────────────────────────────────────────────────────────────────┤
|
|
333
|
-
│ │
|
|
334
|
-
│ DEPTH │ P50 LATENCY │ P95 LATENCY │ Recall@5 │ Recall@10 │ MRR │
|
|
335
|
-
│ ──────────────────────────────────────────────────────────────────────────│
|
|
336
|
-
│ 10 │ 0.06 ms │ 0.26 ms │ 78% │ 100% │ 0.68 │
|
|
337
|
-
│ 100 │ 0.50 ms │ 0.75 ms │ 88% │ 98% │ 0.42 │
|
|
338
|
-
│ 1,000 │ 1.59 ms │ 5.03 ms │ 80% │ 94% │ 0.50 │
|
|
339
|
-
│ 10,000 │ 16.71 ms │ 17.37 ms │ 76% │ 94% │ 0.54 │
|
|
340
|
-
│ ──────────────────────────────────────────────────────────────────────────│
|
|
341
|
-
│ │
|
|
342
|
-
│ KEY INSIGHT: Even at 10,000 stored queries, Recall@10 stays at 94% │
|
|
343
|
-
│ Sub-17ms retrieval from 10K query pool = practical for production use │
|
|
344
|
-
│ │
|
|
345
|
-
│ Reproduce: node memory-retrieval-benchmark.js │
|
|
346
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
347
|
-
```
|
|
348
|
-
|
|
349
|
-
### Where We Actually Outperform (Database Performance)
|
|
350
|
-
|
|
351
|
-
```
|
|
352
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
353
|
-
│ BENCHMARK: Triple Store Performance (vs Industry Leaders) │
|
|
354
|
-
│ METHODOLOGY: Criterion.rs statistical benchmarking, LUBM dataset │
|
|
355
|
-
├─────────────────────────────────────────────────────────────────────────────┤
|
|
356
|
-
│ │
|
|
357
|
-
│ METRIC rust-kgdb RDFox Jena Neo4j │
|
|
358
|
-
│ ───────────────────────────────────────────────────────────── │
|
|
359
|
-
│ Lookup Speed 449 ns ~5 µs ~150 µs ~5 µs │
|
|
360
|
-
│ Memory/Triple 24 bytes 36-89 bytes 50-60 bytes 70+ bytes │
|
|
361
|
-
│ Bulk Insert 146K/sec ~200K/sec ~50K/sec ~100K/sec │
|
|
362
|
-
│ Concurrent Writes 132K/sec N/A N/A N/A │
|
|
363
|
-
│ ───────────────────────────────────────────────────────────── │
|
|
364
|
-
│ │
|
|
365
|
-
│ ADVANTAGE: 35x faster lookups than RDFox, 25% less memory │
|
|
366
|
-
│ THIS IS WHERE WE GENUINELY WIN - raw database performance. │
|
|
367
|
-
│ │
|
|
368
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
369
|
-
```
|
|
370
|
-
|
|
371
|
-
### SPARQL Generation (Honest Assessment)
|
|
372
|
-
|
|
373
|
-
```
|
|
374
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
375
|
-
│ BENCHMARK: LUBM SPARQL Generation Accuracy │
|
|
376
|
-
│ DATASET: 3,272 triples │ MODEL: GPT-4o │ Real API calls │
|
|
377
|
-
├─────────────────────────────────────────────────────────────────────────────┤
|
|
378
|
-
│ │
|
|
379
|
-
│ FRAMEWORK NO SCHEMA WITH SCHEMA │
|
|
380
|
-
│ ───────────────────────────────────────────────────────────── │
|
|
381
|
-
│ Vanilla OpenAI 0.0% 71.4% │
|
|
382
|
-
│ LangChain 0.0% 71.4% │
|
|
383
|
-
│ DSPy 14.3% 71.4% │
|
|
384
|
-
│ ───────────────────────────────────────────────────────────── │
|
|
385
|
-
│ │
|
|
386
|
-
│ HONEST TRUTH: Schema injection improves ALL frameworks equally. │
|
|
387
|
-
│ Any framework + schema context achieves ~71% accuracy. │
|
|
388
|
-
│ │
|
|
389
|
-
│ NOTE: DSPy gets 14.3% WITHOUT schema (vs 0% for others) due to │
|
|
390
|
-
│ its structured output format. With schema, all converge to 71.4%. │
|
|
391
|
-
│ │
|
|
392
|
-
│ OUR REAL VALUE: We include the database. Others don't. │
|
|
393
|
-
│ - LangChain generates SPARQL → you need to find a database │
|
|
394
|
-
│ - HyperMind generates SPARQL → executes on built-in 449ns database │
|
|
395
|
-
│ │
|
|
396
|
-
│ Reproduce: python3 benchmark-frameworks.py │
|
|
397
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
398
|
-
```
|
|
49
|
+
The fundamental problem: **You're asking a language model to be a database. It's not.**
|
|
399
50
|
|
|
400
51
|
---
|
|
401
52
|
|
|
402
|
-
## The
|
|
403
|
-
|
|
404
|
-
### Manual Approach (Works, But Tedious)
|
|
405
|
-
|
|
406
|
-
```javascript
|
|
407
|
-
// STEP 1: Manually write your schema (takes hours for large ontologies)
|
|
408
|
-
const LUBM_SCHEMA = `
|
|
409
|
-
PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
|
|
410
|
-
Classes: University, Department, Professor, Student, Course, Publication
|
|
411
|
-
Properties: teacherOf(Faculty→Course), worksFor(Faculty→Department)
|
|
412
|
-
`;
|
|
413
|
-
|
|
414
|
-
// STEP 2: Pass schema to LLM
|
|
415
|
-
const answer = await openai.chat.completions.create({
|
|
416
|
-
model: 'gpt-4o',
|
|
417
|
-
messages: [
|
|
418
|
-
{ role: 'system', content: `${LUBM_SCHEMA}\nOutput raw SPARQL only.` },
|
|
419
|
-
{ role: 'user', content: 'Find suspicious providers' }
|
|
420
|
-
]
|
|
421
|
-
});
|
|
422
|
-
|
|
423
|
-
// STEP 3: Parse out the SPARQL (handle markdown, explanations, etc.)
|
|
424
|
-
const sparql = extractSPARQL(answer.choices[0].message.content);
|
|
425
|
-
|
|
426
|
-
// STEP 4: Find a SPARQL database (Jena? RDFox? Virtuoso?)
|
|
427
|
-
// STEP 5: Connect to database
|
|
428
|
-
// STEP 6: Execute query
|
|
429
|
-
// STEP 7: Parse results
|
|
430
|
-
// STEP 8: No audit trail - you'd have to build that yourself
|
|
431
|
-
|
|
432
|
-
// RESULT: ~71% accuracy (same as HyperMind with schema)
|
|
433
|
-
// BUT: 5-8 manual integration steps
|
|
434
|
-
```
|
|
435
|
-
|
|
436
|
-
### HyperMind Approach (Integrated)
|
|
437
|
-
|
|
438
|
-
```javascript
|
|
439
|
-
// ONE-TIME SETUP: Load your data
|
|
440
|
-
const { HyperMindAgent, GraphDB } = require('rust-kgdb');
|
|
441
|
-
|
|
442
|
-
const db = new GraphDB('http://insurance.org/');
|
|
443
|
-
db.loadTtl(yourActualData, null); // Schema auto-extracted from data
|
|
53
|
+
## The Insight That Changes Everything
|
|
444
54
|
|
|
445
|
-
|
|
446
|
-
const result = await agent.call('Find suspicious providers');
|
|
55
|
+
What if we stopped asking AI for **answers** and started asking it for **questions**?
|
|
447
56
|
|
|
448
|
-
|
|
449
|
-
// "Provider PROV001 has risk score 0.87 with 47 claims over $50,000"
|
|
450
|
-
|
|
451
|
-
// WHAT YOU GET (ALL AUTOMATIC):
|
|
452
|
-
// ✅ Schema auto-extracted (no manual prompt engineering)
|
|
453
|
-
// ✅ Query executed on built-in database (no external DB needed)
|
|
454
|
-
// ✅ Full audit trail included
|
|
455
|
-
// ✅ Reproducible hash for compliance
|
|
456
|
-
|
|
457
|
-
console.log(result.reasoningTrace);
|
|
458
|
-
// [
|
|
459
|
-
// { tool: 'kg.sparql.query', input: 'SELECT ?p WHERE...', output: '[PROV001]' },
|
|
460
|
-
// { tool: 'kg.datalog.apply', input: 'highRisk(?p) :- ...', output: 'MATCHED' }
|
|
461
|
-
// ]
|
|
462
|
-
|
|
463
|
-
console.log(result.hash);
|
|
464
|
-
// "sha256:8f3a2b1c..." - Same question = Same answer = Same hash
|
|
465
|
-
```
|
|
57
|
+
Think about how a skilled legal researcher works:
|
|
466
58
|
|
|
467
|
-
**
|
|
468
|
-
|
|
469
|
-
|
|
470
|
-
|
|
471
|
-
---
|
|
59
|
+
1. **Lawyer asks:** "Has this clause been challenged?"
|
|
60
|
+
2. **Researcher understands** the legal question
|
|
61
|
+
3. **Researcher searches** actual case law databases
|
|
62
|
+
4. **Returns cases** that actually exist, with citations
|
|
472
63
|
|
|
473
|
-
|
|
64
|
+
The AI should be the researcher - understanding intent and writing queries. The database should find facts.
|
|
474
65
|
|
|
66
|
+
**Before (Dangerous):**
|
|
475
67
|
```
|
|
476
|
-
|
|
477
|
-
|
|
478
|
-
├───────────────────────────────────────────────────────────────────────────┤
|
|
479
|
-
│ │
|
|
480
|
-
│ TRADITIONAL: CODE GENERATION OUR APPROACH: NO CODE GENERATION │
|
|
481
|
-
│ ──────────────────────────── ──────────────────────────────── │
|
|
482
|
-
│ │
|
|
483
|
-
│ User → LLM → Generate Code User → Domain-Enriched Proxy │
|
|
484
|
-
│ │
|
|
485
|
-
│ ❌ SLOW: LLM generates text ✅ FAST: Pre-built typed tools │
|
|
486
|
-
│ ❌ ERROR-PRONE: Syntax errors ✅ RELIABLE: Schema-validated │
|
|
487
|
-
│ ❌ UNPREDICTABLE: Different ✅ DETERMINISTIC: Same every time │
|
|
488
|
-
│ │
|
|
489
|
-
├───────────────────────────────────────────────────────────────────────────┤
|
|
490
|
-
│ TRADITIONAL FLOW OUR FLOW │
|
|
491
|
-
│ ──────────────── ──────── │
|
|
492
|
-
│ │
|
|
493
|
-
│ 1. User asks question 1. User asks question │
|
|
494
|
-
│ 2. LLM generates code (SLOW) 2. Intent matched (INSTANT) │
|
|
495
|
-
│ 3. Code has syntax error? 3. Schema object consulted │
|
|
496
|
-
│ 4. Retry with LLM (SLOW) 4. Typed tool selected │
|
|
497
|
-
│ 5. Code runs, wrong result? 5. Query built from schema │
|
|
498
|
-
│ 6. Retry with LLM (SLOW) 6. Validated & executed │
|
|
499
|
-
│ 7. Maybe works after 3-5 tries 7. Works first time │
|
|
500
|
-
│ │
|
|
501
|
-
├───────────────────────────────────────────────────────────────────────────┤
|
|
502
|
-
│ OUR DOMAIN-ENRICHED PROXY LAYER │
|
|
503
|
-
│ ─────────────────────────────── │
|
|
504
|
-
│ │
|
|
505
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
506
|
-
│ │ CONTEXT THEORY (Spivak's Ologs) │ │
|
|
507
|
-
│ │ SchemaContext = { classes: Set, properties: Map, domains, ranges } │ │
|
|
508
|
-
│ │ → Defines WHAT can be queried (schema as category) │ │
|
|
509
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
510
|
-
│ │ │
|
|
511
|
-
│ ▼ │
|
|
512
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
513
|
-
│ │ TYPE THEORY (Hindley-Milner) │ │
|
|
514
|
-
│ │ TOOL_REGISTRY = { 'kg.sparql.query': Query → BindingSet, ... } │ │
|
|
515
|
-
│ │ → Defines HOW tools compose (typed morphisms) │ │
|
|
516
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
517
|
-
│ │ │
|
|
518
|
-
│ ▼ │
|
|
519
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
520
|
-
│ │ PROOF THEORY (Curry-Howard) │ │
|
|
521
|
-
│ │ ProofDAG = { derivations: [...], hash: "sha256:..." } │ │
|
|
522
|
-
│ │ → Proves HOW answer was derived (audit trail) │ │
|
|
523
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
524
|
-
│ │
|
|
525
|
-
├───────────────────────────────────────────────────────────────────────────┤
|
|
526
|
-
│ RESULTS: SPEED + ACCURACY │
|
|
527
|
-
│ ───────────────────────── │
|
|
528
|
-
│ │
|
|
529
|
-
│ TRADITIONAL (Code Gen) OUR APPROACH (Proxy Layer) │
|
|
530
|
-
│ • 2-5 seconds per query • <100ms per query (20-50x FASTER) │
|
|
531
|
-
│ • 0-14% accuracy (no schema) • 71% accuracy (schema auto-injected) │
|
|
532
|
-
│ • Retry loops on errors • No retries needed │
|
|
533
|
-
│ • $0.01-0.05 per query • <$0.001 per query (cached patterns) │
|
|
534
|
-
│ │
|
|
535
|
-
├───────────────────────────────────────────────────────────────────────────┤
|
|
536
|
-
│ WHY NO CODE GENERATION: │
|
|
537
|
-
│ ─────────────────────── │
|
|
538
|
-
│ 1. CODE GEN IS SLOW: LLM takes 1-3 seconds per query │
|
|
539
|
-
│ 2. CODE GEN IS ERROR-PRONE: Syntax errors, hallucination │
|
|
540
|
-
│ 3. CODE GEN IS EXPENSIVE: Every query costs LLM tokens │
|
|
541
|
-
│ 4. CODE GEN IS NON-DETERMINISTIC: Same question → different code │
|
|
542
|
-
│ │
|
|
543
|
-
│ OUR PROXY LAYER PROVIDES: │
|
|
544
|
-
│ 1. SPEED: Deterministic planner runs in milliseconds │
|
|
545
|
-
│ 2. ACCURACY: Schema object ensures only valid predicates │
|
|
546
|
-
│ 3. COST: No LLM needed for query generation │
|
|
547
|
-
│ 4. DETERMINISM: Same input → same query → same result → same hash │
|
|
548
|
-
└───────────────────────────────────────────────────────────────────────────┘
|
|
68
|
+
Lawyer: "Has this clause been challenged?"
|
|
69
|
+
AI: "Yes, in Smith v. Johnson (2019)..." ← FABRICATED
|
|
549
70
|
```
|
|
550
71
|
|
|
551
|
-
**
|
|
72
|
+
**After (Safe):**
|
|
552
73
|
```
|
|
553
|
-
|
|
554
|
-
|
|
555
|
-
|
|
556
|
-
|
|
557
|
-
Result returned (NO PROOF)
|
|
558
|
-
|
|
559
|
-
(20-40% accuracy, 2-5 sec/query, $0.01-0.05/query)
|
|
560
|
-
|
|
561
|
-
OUR APPROACH: User → Proxied Objects → WASM Sandbox → RPC → Real Systems
|
|
562
|
-
│
|
|
563
|
-
├── SchemaContext (Context Theory)
|
|
564
|
-
│ └── Live object: { classes: Set, properties: Map }
|
|
565
|
-
│ └── NOT serialized JSON string
|
|
566
|
-
│
|
|
567
|
-
├── TOOL_REGISTRY (Type Theory)
|
|
568
|
-
│ └── Typed morphisms: Query → BindingSet
|
|
569
|
-
│ └── Composition validated at compile-time
|
|
570
|
-
│
|
|
571
|
-
├── WasmSandbox (Secure Execution)
|
|
572
|
-
│ └── Capability-based: ReadKG, ExecuteTool
|
|
573
|
-
│ └── Fuel metering: prevents infinite loops
|
|
574
|
-
│ └── Full audit log: every action traced
|
|
575
|
-
│
|
|
576
|
-
├── rust-kgdb via NAPI-RS (Native RPC)
|
|
577
|
-
│ └── 449ns lookups (not HTTP round-trips)
|
|
578
|
-
│ └── Zero-copy data transfer
|
|
579
|
-
│
|
|
580
|
-
└── ProofDAG (Proof Theory)
|
|
581
|
-
└── Every answer has derivation chain
|
|
582
|
-
└── Deterministic hash for reproducibility
|
|
583
|
-
|
|
584
|
-
(71% accuracy with schema, <100ms/query, <$0.001/query)
|
|
74
|
+
Lawyer: "Has this clause been challenged?"
|
|
75
|
+
AI: Generates query → Searches case database
|
|
76
|
+
Database: Returns real cases that actually exist
|
|
77
|
+
Result: "Martinez v. Apex Corp (2021), Chen v. StateBank (2018)" ← VERIFIABLE
|
|
585
78
|
```
|
|
586
79
|
|
|
587
|
-
**The
|
|
588
|
-
- **Context Theory**: `SchemaContext` object defines what CAN be queried
|
|
589
|
-
- **Type Theory**: `TOOL_REGISTRY` object defines typed tool signatures
|
|
590
|
-
- **Proof Theory**: `ProofDAG` object proves how answer was derived
|
|
591
|
-
|
|
592
|
-
**Why Proxied Objects + WASM Sandbox**:
|
|
593
|
-
- **Proxied Objects**: SchemaContext, TOOL_REGISTRY are live objects with methods, not serialized JSON
|
|
594
|
-
- **RPC to Real Systems**: Queries execute on rust-kgdb (449ns native performance)
|
|
595
|
-
- **WASM Sandbox**: Capability-based security, fuel metering, full audit trail
|
|
80
|
+
**The AI writes the question. The database finds the answer. No hallucination possible.**
|
|
596
81
|
|
|
597
82
|
---
|
|
598
83
|
|
|
599
|
-
##
|
|
84
|
+
## But Where's The Database?
|
|
600
85
|
|
|
601
|
-
|
|
86
|
+
Traditional setup for a knowledge graph:
|
|
87
|
+
- Install graph database server (weeks)
|
|
88
|
+
- Configure connections, security, backups (days)
|
|
89
|
+
- Hire a DBA (expensive)
|
|
90
|
+
- Maintain infrastructure (forever)
|
|
91
|
+
- Worry about HIPAA/SOC2 compliance for hosted data
|
|
602
92
|
|
|
93
|
+
**Our setup:**
|
|
603
94
|
```bash
|
|
604
95
|
npm install rust-kgdb
|
|
605
96
|
```
|
|
606
97
|
|
|
607
|
-
**
|
|
98
|
+
That's it. The database runs **inside your application**. No server. No Docker. No config. No data leaving your system.
|
|
608
99
|
|
|
609
|
-
|
|
100
|
+
Like SQLite - but for knowledge graphs. HIPAA-friendly by default because data never leaves your infrastructure.
|
|
610
101
|
|
|
611
|
-
|
|
612
|
-
const { GraphDB } = require('rust-kgdb')
|
|
102
|
+
---
|
|
613
103
|
|
|
614
|
-
|
|
615
|
-
db.loadTtl(':alice :knows :bob .', null)
|
|
616
|
-
const results = db.querySelect('SELECT ?who WHERE { ?who :knows :bob }')
|
|
617
|
-
console.log(results) // [{ bindings: { who: 'http://example.org/alice' } }]
|
|
618
|
-
```
|
|
104
|
+
## Real Examples
|
|
619
105
|
|
|
620
|
-
###
|
|
106
|
+
### Legal: Contract Analysis
|
|
621
107
|
|
|
622
108
|
```javascript
|
|
623
|
-
const { GraphDB, HyperMindAgent
|
|
109
|
+
const { GraphDB, HyperMindAgent } = require('rust-kgdb');
|
|
624
110
|
|
|
625
|
-
|
|
626
|
-
const db = createSchemaAwareGraphDB('http://insurance.org/')
|
|
111
|
+
const db = new GraphDB('http://lawfirm.com/');
|
|
627
112
|
db.loadTtl(`
|
|
628
|
-
|
|
629
|
-
:
|
|
630
|
-
:
|
|
631
|
-
|
|
632
|
-
|
|
633
|
-
// Create AI agent
|
|
634
|
-
const agent = new HyperMindAgent({
|
|
635
|
-
kg: db,
|
|
636
|
-
model: 'gpt-4o',
|
|
637
|
-
apiKey: process.env.OPENAI_API_KEY
|
|
638
|
-
})
|
|
639
|
-
|
|
640
|
-
// Ask questions in plain English
|
|
641
|
-
const result = await agent.call('Find high-risk providers')
|
|
642
|
-
|
|
643
|
-
// Every answer includes:
|
|
644
|
-
// - The SPARQL query that was generated
|
|
645
|
-
// - The data that was retrieved
|
|
646
|
-
// - A reasoning trace showing how the conclusion was reached
|
|
647
|
-
// - A cryptographic hash for reproducibility
|
|
648
|
-
console.log(result.answer)
|
|
649
|
-
console.log(result.reasoningTrace) // Full audit trail
|
|
650
|
-
```
|
|
651
|
-
|
|
652
|
-
---
|
|
653
|
-
|
|
654
|
-
## Framework Comparison (Verified Benchmark Setup)
|
|
655
|
-
|
|
656
|
-
The following code snippets show EXACTLY how each framework was tested. All tests use the same LUBM dataset (3,272 triples) and GPT-4o model with real API calls—no mocking.
|
|
657
|
-
|
|
658
|
-
**Reproduce yourself**: `python3 benchmark-frameworks.py` (included in package)
|
|
659
|
-
|
|
660
|
-
### Vanilla OpenAI (0% → 71.4% with schema)
|
|
661
|
-
|
|
662
|
-
```python
|
|
663
|
-
# WITHOUT SCHEMA: 0% accuracy
|
|
664
|
-
from openai import OpenAI
|
|
665
|
-
client = OpenAI()
|
|
666
|
-
|
|
667
|
-
response = client.chat.completions.create(
|
|
668
|
-
model="gpt-4o",
|
|
669
|
-
messages=[{"role": "user", "content": "Find all teachers"}]
|
|
670
|
-
)
|
|
671
|
-
# Returns: Long explanation with markdown code blocks
|
|
672
|
-
# FAILS: No usable SPARQL query
|
|
673
|
-
```
|
|
674
|
-
|
|
675
|
-
```python
|
|
676
|
-
# WITH SCHEMA: 71.4% accuracy (+71.4 pp improvement)
|
|
677
|
-
LUBM_SCHEMA = """
|
|
678
|
-
PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
|
|
679
|
-
Classes: University, Department, Professor, Student, Course, Publication
|
|
680
|
-
Properties: teacherOf(Faculty→Course), worksFor(Faculty→Department)
|
|
681
|
-
"""
|
|
682
|
-
|
|
683
|
-
response = client.chat.completions.create(
|
|
684
|
-
model="gpt-4o",
|
|
685
|
-
messages=[{
|
|
686
|
-
"role": "system",
|
|
687
|
-
"content": f"{LUBM_SCHEMA}\nOutput raw SPARQL only, no markdown."
|
|
688
|
-
}, {
|
|
689
|
-
"role": "user",
|
|
690
|
-
"content": "Find all teachers"
|
|
691
|
-
}]
|
|
692
|
-
)
|
|
693
|
-
# Returns: SELECT DISTINCT ?teacher WHERE { ?teacher a ub:Professor . }
|
|
694
|
-
# WORKS: Valid SPARQL using correct ontology terms
|
|
695
|
-
```
|
|
696
|
-
|
|
697
|
-
### LangChain (0% → 71.4% with schema)
|
|
698
|
-
|
|
699
|
-
```python
|
|
700
|
-
# WITHOUT SCHEMA: 0% accuracy
|
|
701
|
-
from langchain_openai import ChatOpenAI
|
|
702
|
-
from langchain_core.prompts import PromptTemplate
|
|
703
|
-
from langchain_core.output_parsers import StrOutputParser
|
|
704
|
-
|
|
705
|
-
llm = ChatOpenAI(model="gpt-4o")
|
|
706
|
-
template = PromptTemplate(
|
|
707
|
-
input_variables=["question"],
|
|
708
|
-
template="Generate SPARQL for: {question}"
|
|
709
|
-
)
|
|
710
|
-
chain = template | llm | StrOutputParser()
|
|
711
|
-
result = chain.invoke({"question": "Find all teachers"})
|
|
712
|
-
# Returns: Explanation + markdown code blocks
|
|
713
|
-
# FAILS: Not executable SPARQL
|
|
714
|
-
```
|
|
715
|
-
|
|
716
|
-
```python
|
|
717
|
-
# WITH SCHEMA: 71.4% accuracy (+71.4 pp improvement)
|
|
718
|
-
template = PromptTemplate(
|
|
719
|
-
input_variables=["question", "schema"],
|
|
720
|
-
template="""You are a SPARQL query generator.
|
|
721
|
-
{schema}
|
|
722
|
-
TYPE CONTRACT: Output raw SPARQL only, NO markdown, NO explanation.
|
|
723
|
-
Query: {question}
|
|
724
|
-
Output raw SPARQL only:"""
|
|
725
|
-
)
|
|
726
|
-
chain = template | llm | StrOutputParser()
|
|
727
|
-
result = chain.invoke({"question": "Find all teachers", "schema": LUBM_SCHEMA})
|
|
728
|
-
# Returns: SELECT DISTINCT ?teacher WHERE { ?teacher a ub:Professor . }
|
|
729
|
-
# WORKS: Schema injection guides correct predicate selection
|
|
730
|
-
```
|
|
731
|
-
|
|
732
|
-
### DSPy (14.3% → 71.4% with schema)
|
|
113
|
+
:Contract_2024_001 :hasClause :NonCompete_3yr ; :signedBy :ClientA .
|
|
114
|
+
:NonCompete_3yr :challengedIn :Martinez_v_Apex ; :upheldIn :Chen_v_StateBank .
|
|
115
|
+
:Martinez_v_Apex :court "9th Circuit" ; :year 2021 ; :outcome "partially_enforced" .
|
|
116
|
+
:Chen_v_StateBank :court "Delaware Chancery" ; :year 2018 ; :outcome "fully_enforced" .
|
|
117
|
+
`);
|
|
733
118
|
|
|
734
|
-
|
|
735
|
-
|
|
736
|
-
import dspy
|
|
737
|
-
from dspy import LM
|
|
119
|
+
const agent = new HyperMindAgent({ db });
|
|
120
|
+
const result = await agent.ask("Has the non-compete clause been challenged?");
|
|
738
121
|
|
|
739
|
-
|
|
740
|
-
|
|
741
|
-
|
|
742
|
-
class SPARQLGenerator(dspy.Signature):
|
|
743
|
-
"""Generate SPARQL query."""
|
|
744
|
-
question = dspy.InputField()
|
|
745
|
-
sparql = dspy.OutputField(desc="Raw SPARQL query only")
|
|
746
|
-
|
|
747
|
-
generator = dspy.Predict(SPARQLGenerator)
|
|
748
|
-
result = generator(question="Find all teachers")
|
|
749
|
-
# Returns: SELECT ?teacher WHERE { ?teacher a :Teacher . }
|
|
750
|
-
# PARTIAL: Sometimes works due to DSPy's structured output
|
|
751
|
-
```
|
|
122
|
+
console.log(result.answer);
|
|
123
|
+
// "Yes - Martinez v. Apex (9th Circuit, 2021) partially enforced;
|
|
124
|
+
// Chen v. StateBank (Delaware, 2018) fully enforced"
|
|
752
125
|
|
|
753
|
-
|
|
754
|
-
|
|
755
|
-
class SchemaSPARQLGenerator(dspy.Signature):
|
|
756
|
-
"""Generate SPARQL query using the provided schema."""
|
|
757
|
-
schema = dspy.InputField(desc="Database schema with classes and properties")
|
|
758
|
-
question = dspy.InputField(desc="Natural language question")
|
|
759
|
-
sparql = dspy.OutputField(desc="Raw SPARQL query, no markdown")
|
|
760
|
-
|
|
761
|
-
generator = dspy.Predict(SchemaSPARQLGenerator)
|
|
762
|
-
result = generator(schema=LUBM_SCHEMA, question="Find all teachers")
|
|
763
|
-
# Returns: SELECT DISTINCT ?teacher WHERE { ?teacher a ub:Professor . }
|
|
764
|
-
# WORKS: Schema + DSPy structured output = reliable queries
|
|
126
|
+
console.log(result.evidence);
|
|
127
|
+
// Full audit trail proving every fact came from your case database
|
|
765
128
|
```
|
|
766
129
|
|
|
767
|
-
###
|
|
130
|
+
### Healthcare: Drug Interactions
|
|
768
131
|
|
|
769
132
|
```javascript
|
|
770
|
-
|
|
771
|
-
|
|
772
|
-
|
|
773
|
-
|
|
774
|
-
|
|
775
|
-
|
|
776
|
-
|
|
777
|
-
kg: db,
|
|
778
|
-
model: 'gpt-4o',
|
|
779
|
-
apiKey: process.env.OPENAI_API_KEY
|
|
780
|
-
});
|
|
781
|
-
|
|
782
|
-
const result = await agent.call('Find all teachers');
|
|
783
|
-
// Schema auto-extracted: { classes: Set(30), properties: Map(23) }
|
|
784
|
-
// Query generated: SELECT ?x WHERE { ?x ub:teacherOf ?course . }
|
|
785
|
-
// Result: 39 faculty members who teach courses
|
|
786
|
-
|
|
787
|
-
console.log(result.reasoningTrace);
|
|
788
|
-
// [{ tool: 'kg.sparql.query', query: 'SELECT...', bindings: 39 }]
|
|
789
|
-
console.log(result.hash);
|
|
790
|
-
// "sha256:a7b2c3..." - Reproducible answer
|
|
791
|
-
```
|
|
792
|
-
|
|
793
|
-
**Key Insight**: All frameworks achieve the SAME accuracy (~71%) when given schema. HyperMind's value is that it extracts and injects schema AUTOMATICALLY from your data—no manual prompt engineering required. Plus it includes the database to actually execute queries.
|
|
794
|
-
|
|
795
|
-
---
|
|
796
|
-
|
|
797
|
-
## Use Cases
|
|
798
|
-
|
|
799
|
-
### Fraud Detection
|
|
133
|
+
const db = new GraphDB('http://hospital.org/');
|
|
134
|
+
db.loadTtl(`
|
|
135
|
+
:Patient_7291 :currentMedication :Warfarin ; :currentMedication :Lisinopril .
|
|
136
|
+
:Warfarin :interactsWith :Aspirin ; :interactionSeverity "high" .
|
|
137
|
+
:Warfarin :interactsWith :Ibuprofen ; :interactionSeverity "moderate" .
|
|
138
|
+
:Lisinopril :interactsWith :Potassium ; :interactionSeverity "high" .
|
|
139
|
+
`);
|
|
800
140
|
|
|
801
|
-
|
|
802
|
-
|
|
803
|
-
|
|
804
|
-
name: 'fraud-detector',
|
|
805
|
-
model: 'claude-3-opus'
|
|
806
|
-
})
|
|
807
|
-
|
|
808
|
-
const result = await agent.call('Find providers with suspicious billing patterns')
|
|
809
|
-
// Returns: List of providers with complete evidence trail
|
|
810
|
-
// - SPARQL queries executed
|
|
811
|
-
// - Rules that matched
|
|
812
|
-
// - Similar entities found via embeddings
|
|
141
|
+
const result = await agent.ask("What should we avoid prescribing to Patient 7291?");
|
|
142
|
+
// Returns ONLY drugs that actually interact with their ACTUAL medications
|
|
143
|
+
// Not hallucinated drug names - real interactions from your formulary
|
|
813
144
|
```
|
|
814
145
|
|
|
815
|
-
###
|
|
146
|
+
### Insurance: Claims Fraud Detection
|
|
816
147
|
|
|
817
148
|
```javascript
|
|
818
|
-
const
|
|
819
|
-
|
|
820
|
-
|
|
821
|
-
|
|
149
|
+
const db = new GraphDB('http://insurer.com/');
|
|
150
|
+
db.loadTtl(`
|
|
151
|
+
:Provider_892 :totalClaims 1247 ; :avgClaimAmount 3200 ; :denialRate 0.02 .
|
|
152
|
+
:Provider_445 :totalClaims 89 ; :avgClaimAmount 47000 ; :denialRate 0.34 .
|
|
153
|
+
:Provider_445 :hasPattern :UnbundledBilling ; :flaggedBy :SIU_2024_Q1 .
|
|
154
|
+
:Claim_99281 :provider :Provider_445 ; :amount 52000 ; :diagnosis :LumbarFusion .
|
|
155
|
+
`);
|
|
822
156
|
|
|
823
|
-
const result = await agent.
|
|
824
|
-
// Returns
|
|
157
|
+
const result = await agent.ask("Which providers show suspicious billing patterns?");
|
|
158
|
+
// Returns Provider_445 with ACTUAL evidence:
|
|
159
|
+
// - High avg claim ($47K vs network avg)
|
|
160
|
+
// - 34% denial rate
|
|
161
|
+
// - SIU flag from Q1 2024
|
|
162
|
+
// NOT fabricated accusations against innocent providers
|
|
825
163
|
```
|
|
826
164
|
|
|
827
|
-
###
|
|
165
|
+
### Fraud: Transaction Network Analysis
|
|
828
166
|
|
|
829
167
|
```javascript
|
|
830
|
-
const
|
|
831
|
-
|
|
832
|
-
|
|
833
|
-
|
|
834
|
-
|
|
168
|
+
const db = new GraphDB('http://bank.com/aml/');
|
|
169
|
+
db.loadTtl(`
|
|
170
|
+
:Acct_1001 :transferredTo :Acct_2002 ; :amount 9500 .
|
|
171
|
+
:Acct_2002 :transferredTo :Acct_3003 ; :amount 9400 .
|
|
172
|
+
:Acct_3003 :transferredTo :Acct_1001 ; :amount 9200 . # Circular!
|
|
173
|
+
:Acct_1001 :owner :Entity_A ; :jurisdiction "Cayman Islands" .
|
|
174
|
+
`);
|
|
175
|
+
|
|
176
|
+
// Datalog rule: Find circular payment chains (potential layering)
|
|
177
|
+
db.addRule(`
|
|
178
|
+
circularChain(X, Y, Z) :-
|
|
179
|
+
transfer(X, Y), transfer(Y, Z), transfer(Z, X),
|
|
180
|
+
amount(X, Y, A1), amount(Y, Z, A2), amount(Z, X, A3),
|
|
181
|
+
A1 > 9000, A2 > 9000, A3 > 9000.
|
|
182
|
+
`);
|
|
183
|
+
|
|
184
|
+
const result = await agent.ask("Find potential money laundering patterns");
|
|
185
|
+
// Returns the ACTUAL circular chain: 1001 → 2002 → 3003 → 1001
|
|
186
|
+
// With amounts just under $10K reporting threshold
|
|
187
|
+
// All verifiable from your transaction records
|
|
835
188
|
```
|
|
836
189
|
|
|
837
190
|
---
|
|
838
191
|
|
|
839
|
-
##
|
|
840
|
-
|
|
841
|
-
### Core Database (SPARQL 1.1)
|
|
842
|
-
| Feature | Description |
|
|
843
|
-
|---------|-------------|
|
|
844
|
-
| **SELECT/CONSTRUCT/ASK** | Full SPARQL 1.1 query support |
|
|
845
|
-
| **INSERT/DELETE/UPDATE** | SPARQL Update operations |
|
|
846
|
-
| **64 Builtin Functions** | String, numeric, date/time, hash functions |
|
|
847
|
-
| **Named Graphs** | Quad-based storage with graph isolation |
|
|
848
|
-
| **RDF-Star** | Statements about statements |
|
|
849
|
-
|
|
850
|
-
### Rule-Based Reasoning (Datalog)
|
|
851
|
-
| Feature | Description |
|
|
852
|
-
|---------|-------------|
|
|
853
|
-
| **Facts & Rules** | Define base facts and inference rules |
|
|
854
|
-
| **Semi-naive Evaluation** | Efficient incremental computation |
|
|
855
|
-
| **Recursive Queries** | Transitive closure, ancestor chains |
|
|
856
|
-
|
|
857
|
-
### Graph Analytics (GraphFrames)
|
|
858
|
-
| Feature | Description |
|
|
859
|
-
|---------|-------------|
|
|
860
|
-
| **PageRank** | Iterative node importance ranking |
|
|
861
|
-
| **Connected Components** | Find isolated subgraphs |
|
|
862
|
-
| **Shortest Paths** | BFS path finding from landmarks |
|
|
863
|
-
| **Triangle Count** | Graph density measurement |
|
|
864
|
-
| **Motif Finding** | Structural pattern matching DSL |
|
|
865
|
-
|
|
866
|
-
### Vector Similarity (Embeddings)
|
|
867
|
-
| Feature | Description |
|
|
868
|
-
|---------|-------------|
|
|
869
|
-
| **HNSW Index** | O(log N) approximate nearest neighbor |
|
|
870
|
-
| **Multi-provider** | OpenAI, Anthropic, Ollama support |
|
|
871
|
-
| **Composite Search** | RRF aggregation across providers |
|
|
872
|
-
|
|
873
|
-
### AI Agent Framework (HyperMind)
|
|
874
|
-
| Feature | Description |
|
|
875
|
-
|---------|-------------|
|
|
876
|
-
| **Schema-Aware** | Auto-extracts schema from your data |
|
|
877
|
-
| **Typed Tools** | Input/output validation prevents errors |
|
|
878
|
-
| **Audit Trail** | Every answer is traceable |
|
|
879
|
-
| **Memory** | Working, episodic, and long-term memory |
|
|
880
|
-
|
|
881
|
-
### Schema-Aware Generation (Proxied Tools)
|
|
882
|
-
|
|
883
|
-
Generate motif patterns and Datalog rules from natural language using schema injection:
|
|
192
|
+
## The Math (Explained Simply)
|
|
884
193
|
|
|
885
|
-
|
|
886
|
-
const { LLMPlanner, createSchemaAwareGraphDB } = require('rust-kgdb');
|
|
887
|
-
|
|
888
|
-
const db = createSchemaAwareGraphDB('http://insurance.org/');
|
|
889
|
-
db.loadTtl(insuranceData, null);
|
|
890
|
-
|
|
891
|
-
const planner = new LLMPlanner({ kg: db, model: 'gpt-4o' });
|
|
892
|
-
|
|
893
|
-
// Generate motif pattern from text
|
|
894
|
-
const motif = await planner.generateMotifFromText('Find circular payment patterns');
|
|
895
|
-
// Returns: {
|
|
896
|
-
// pattern: "(a)-[transfers]->(b); (b)-[transfers]->(c); (c)-[transfers]->(a)",
|
|
897
|
-
// variables: ["a", "b", "c"],
|
|
898
|
-
// predicatesUsed: ["transfers"],
|
|
899
|
-
// confidence: 0.9
|
|
900
|
-
// }
|
|
901
|
-
|
|
902
|
-
// Generate Datalog rules from text
|
|
903
|
-
const datalog = await planner.generateDatalogFromText(
|
|
904
|
-
'High risk providers are those with risk score above 0.7'
|
|
905
|
-
);
|
|
906
|
-
// Returns: {
|
|
907
|
-
// rules: [{ name: "highRisk", head: {...}, body: [...] }],
|
|
908
|
-
// datalogSyntax: ["highRisk(?x) :- provider(?x), riskScore(?x, ?score), ?score > 0.7."],
|
|
909
|
-
// predicatesUsed: ["riskScore", "provider"],
|
|
910
|
-
// confidence: 0.85
|
|
911
|
-
// }
|
|
912
|
-
```
|
|
194
|
+
### Category Theory: The Lego Rule
|
|
913
195
|
|
|
914
|
-
|
|
915
|
-
|
|
916
|
-
### Available Tools
|
|
917
|
-
| Tool | Input → Output | Description |
|
|
918
|
-
|------|----------------|-------------|
|
|
919
|
-
| `kg.sparql.query` | Query → BindingSet | Execute SPARQL SELECT |
|
|
920
|
-
| `kg.sparql.update` | Update → Result | Execute SPARQL UPDATE |
|
|
921
|
-
| `kg.datalog.apply` | Rules → InferredFacts | Apply Datalog rules |
|
|
922
|
-
| `kg.motif.find` | Pattern → Matches | Find graph patterns |
|
|
923
|
-
| `kg.embeddings.search` | Entity → SimilarEntities | Vector similarity |
|
|
924
|
-
| `kg.graphframes.pagerank` | Graph → Scores | Rank nodes |
|
|
925
|
-
| `kg.graphframes.components` | Graph → Components | Find communities |
|
|
926
|
-
|
|
927
|
-
### Performance
|
|
928
|
-
| Metric | Value | Comparison |
|
|
929
|
-
|--------|-------|------------|
|
|
930
|
-
| **Lookup Speed** | 449 ns | 5-10x faster than RDFox (verified Dec 2025) |
|
|
931
|
-
| **Bulk Insert** | 146K triples/sec | Production-grade |
|
|
932
|
-
| **Memory** | 24 bytes/triple | Best-in-class efficiency |
|
|
933
|
-
|
|
934
|
-
### Join Optimization (WCOJ)
|
|
935
|
-
| Feature | Description |
|
|
936
|
-
|---------|-------------|
|
|
937
|
-
| **WCOJ Algorithm** | Worst-case optimal joins with O(N^(ρ/2)) complexity |
|
|
938
|
-
| **Multi-way Joins** | Process multiple patterns simultaneously |
|
|
939
|
-
| **Adaptive Plans** | Cost-based optimizer selects best strategy |
|
|
940
|
-
|
|
941
|
-
**Research Foundation**: WCOJ algorithms are the state-of-the-art for graph pattern matching. See [Tentris WCOJ Update (ISWC 2025)](https://papers.dice-research.org/2025/ISWC_Tentris-WCOJ-Update/public.pdf) for latest research.
|
|
942
|
-
|
|
943
|
-
### Ontology & Reasoning
|
|
944
|
-
| Feature | Description |
|
|
945
|
-
|---------|-------------|
|
|
946
|
-
| **RDFS Reasoner** | Subclass/subproperty inference |
|
|
947
|
-
| **OWL 2 RL** | Rule-based OWL reasoning (prp-dom, prp-rng, prp-symp, prp-trp, cls-hv, cls-svf, cax-sco) |
|
|
948
|
-
| **SHACL** | W3C shapes constraint validation |
|
|
949
|
-
|
|
950
|
-
### Distribution (Clustered Mode)
|
|
951
|
-
| Feature | Description |
|
|
952
|
-
|---------|-------------|
|
|
953
|
-
| **HDRF Partitioning** | Streaming graph partitioning (subject-anchored) |
|
|
954
|
-
| **Raft Consensus** | Distributed coordination |
|
|
955
|
-
| **gRPC** | Inter-node communication |
|
|
956
|
-
| **Kubernetes-Native** | Helm charts, health checks |
|
|
957
|
-
|
|
958
|
-
### Storage Backends
|
|
959
|
-
| Backend | Use Case |
|
|
960
|
-
|---------|----------|
|
|
961
|
-
| **InMemory** | Development, testing, small datasets |
|
|
962
|
-
| **RocksDB** | Production, large datasets, ACID |
|
|
963
|
-
| **LMDB** | Read-heavy workloads, memory-mapped |
|
|
964
|
-
|
|
965
|
-
### Mobile Support
|
|
966
|
-
| Platform | Binding |
|
|
967
|
-
|----------|---------|
|
|
968
|
-
| **iOS** | Swift via UniFFI 0.30 |
|
|
969
|
-
| **Android** | Kotlin via UniFFI 0.30 |
|
|
970
|
-
| **Node.js** | NAPI-RS (this package) |
|
|
971
|
-
| **Python** | UniFFI (separate package) |
|
|
196
|
+
Imagine Lego blocks. A 2x4 brick only connects to compatible bricks.
|
|
972
197
|
|
|
973
|
-
|
|
198
|
+
We made AI tools work the same way:
|
|
199
|
+
- Query tool: takes a question, returns case citations
|
|
200
|
+
- Validation tool: takes citations, returns verified facts
|
|
974
201
|
|
|
975
|
-
|
|
976
|
-
|
|
977
|
-
| Category | Feature | What It Does |
|
|
978
|
-
|----------|---------|--------------|
|
|
979
|
-
| **Core** | GraphDB | High-performance RDF/SPARQL quad store |
|
|
980
|
-
| **Core** | SPOC Indexes | Four-way indexing (SPOC/POCS/OCSP/CSPO) |
|
|
981
|
-
| **Core** | Dictionary | String interning with 8-byte IDs |
|
|
982
|
-
| **Analytics** | GraphFrames | PageRank, connected components, triangles |
|
|
983
|
-
| **Analytics** | Motif Finding | Pattern matching DSL |
|
|
984
|
-
| **Analytics** | Pregel | BSP parallel graph processing |
|
|
985
|
-
| **AI** | Embeddings | HNSW similarity with 1-hop ARCADE cache |
|
|
986
|
-
| **AI** | HyperMind | Neuro-symbolic agent framework |
|
|
987
|
-
| **Reasoning** | Datalog | Semi-naive evaluation engine |
|
|
988
|
-
| **Reasoning** | RDFS Reasoner | Subclass/subproperty inference |
|
|
989
|
-
| **Reasoning** | OWL 2 RL | Rule-based OWL reasoning |
|
|
990
|
-
| **Ontology** | SHACL | W3C shapes constraint validation |
|
|
991
|
-
| **Joins** | WCOJ | Worst-case optimal join algorithm |
|
|
992
|
-
| **Distribution** | HDRF | Streaming graph partitioning |
|
|
993
|
-
| **Distribution** | Raft | Consensus for coordination |
|
|
994
|
-
| **Mobile** | iOS/Android | Swift and Kotlin bindings via UniFFI |
|
|
995
|
-
| **Storage** | InMemory/RocksDB/LMDB | Three backend options |
|
|
202
|
+
The AI can only chain tools where outputs match inputs. A "patient record" output can't connect to a "case citation" input. **The type system prevents nonsense combinations** - like Lego blocks that physically don't fit.
|
|
996
203
|
|
|
997
|
-
|
|
204
|
+
### WCOJ: The Court Records Trick
|
|
998
205
|
|
|
999
|
-
|
|
206
|
+
Finding "all cases where Judge X ruled on Contract Type Y involving Company Z"?
|
|
1000
207
|
|
|
1001
|
-
|
|
208
|
+
**Slow way:** Check every case with Judge X (50,000), every contract type (500K combinations), every company (25M checks).
|
|
1002
209
|
|
|
1003
|
-
|
|
1004
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
1005
|
-
│ YOUR QUESTION │
|
|
1006
|
-
│ "Find suspicious providers" │
|
|
1007
|
-
└─────────────────────────────────┬───────────────────────────────────────────┘
|
|
1008
|
-
│
|
|
1009
|
-
▼
|
|
1010
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
1011
|
-
│ STEP 1: SCHEMA INJECTION │
|
|
1012
|
-
│ │
|
|
1013
|
-
│ LLM receives your question PLUS your actual data schema: │
|
|
1014
|
-
│ • Classes: Claim, Provider, Policy (from YOUR database) │
|
|
1015
|
-
│ • Properties: amount, riskScore, claimCount (from YOUR database) │
|
|
1016
|
-
│ │
|
|
1017
|
-
│ The LLM can ONLY reference things that actually exist in your data. │
|
|
1018
|
-
└─────────────────────────────────┬───────────────────────────────────────────┘
|
|
1019
|
-
│
|
|
1020
|
-
▼
|
|
1021
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
1022
|
-
│ STEP 2: TYPED EXECUTION PLAN │
|
|
1023
|
-
│ │
|
|
1024
|
-
│ LLM generates a plan using typed tools: │
|
|
1025
|
-
│ 1. kg.sparql.query("SELECT ?p WHERE { ?p :riskScore ?r . FILTER(?r > 0.8)}")│
|
|
1026
|
-
│ 2. kg.datalog.apply("suspicious(?p) :- highRisk(?p), highClaimCount(?p)") │
|
|
1027
|
-
│ │
|
|
1028
|
-
│ Each tool has defined inputs/outputs. Invalid combinations rejected. │
|
|
1029
|
-
└─────────────────────────────────┬───────────────────────────────────────────┘
|
|
1030
|
-
│
|
|
1031
|
-
▼
|
|
1032
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
1033
|
-
│ STEP 3: DATABASE EXECUTION │
|
|
1034
|
-
│ │
|
|
1035
|
-
│ The database executes the plan against YOUR ACTUAL DATA: │
|
|
1036
|
-
│ • SPARQL query runs → finds 3 providers with riskScore > 0.8 │
|
|
1037
|
-
│ • Datalog rules run → 1 provider matches "suspicious" pattern │
|
|
1038
|
-
│ │
|
|
1039
|
-
│ Every step is recorded in the reasoning trace. │
|
|
1040
|
-
└─────────────────────────────────┬───────────────────────────────────────────┘
|
|
1041
|
-
│
|
|
1042
|
-
▼
|
|
1043
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
1044
|
-
│ STEP 4: VERIFIED ANSWER │
|
|
1045
|
-
│ │
|
|
1046
|
-
│ Answer: "Provider PROV001 is suspicious (riskScore: 0.87, claims: 47)" │
|
|
1047
|
-
│ │
|
|
1048
|
-
│ + Reasoning Trace: Every query, every rule, every result │
|
|
1049
|
-
│ + Hash: sha256:8f3a2b1c... (reproducible) │
|
|
1050
|
-
│ │
|
|
1051
|
-
│ Run the same question tomorrow → Same answer → Same hash │
|
|
1052
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
1053
|
-
```
|
|
1054
|
-
|
|
1055
|
-
### Why Hallucination Is Impossible
|
|
210
|
+
**Our way:** Keep sorted indexes of judges, contract types, and companies. Walk through all three simultaneously, skip impossible combinations. 50,000 checks instead of 25 million. This is called Worst-Case Optimal Join.
|
|
1056
211
|
|
|
1057
|
-
|
|
1058
|
-
|------|----------------------------|
|
|
1059
|
-
| Schema Injection | LLM only sees properties that exist in YOUR data |
|
|
1060
|
-
| Typed Tools | Invalid query structures rejected before execution |
|
|
1061
|
-
| Database Execution | Answers come from actual data, not LLM imagination |
|
|
1062
|
-
| Reasoning Trace | Every claim is backed by recorded evidence |
|
|
212
|
+
### HNSW: The Medical Specialist Network
|
|
1063
213
|
|
|
1064
|
-
|
|
214
|
+
Finding the right specialist for a rare condition from 50,000 doctors?
|
|
1065
215
|
|
|
1066
|
-
|
|
216
|
+
**Slow way:** Compare symptoms to all 50,000 doctor profiles.
|
|
1067
217
|
|
|
1068
|
-
|
|
218
|
+
**Our way:** Build a "referral network." Generalists connect to specialists who connect to sub-specialists. Start anywhere, hop toward the right match. ~20 hops instead of 50,000 comparisons.
|
|
1069
219
|
|
|
1070
|
-
|
|
220
|
+
We use this to find "similar past queries" - 10,000 historical questions searched in 16 milliseconds.
|
|
1071
221
|
|
|
1072
|
-
|
|
1073
|
-
class GraphDB {
|
|
1074
|
-
constructor(appGraphUri: string)
|
|
1075
|
-
loadTtl(ttlContent: string, graphName: string | null): void
|
|
1076
|
-
querySelect(sparql: string): QueryResult[]
|
|
1077
|
-
query(sparql: string): TripleResult[]
|
|
1078
|
-
countTriples(): number
|
|
1079
|
-
clear(): void
|
|
1080
|
-
}
|
|
1081
|
-
```
|
|
222
|
+
### Datalog: The Compliance Cascade
|
|
1082
223
|
|
|
1083
|
-
|
|
1084
|
-
|
|
1085
|
-
```typescript
|
|
1086
|
-
class HyperMindAgent {
|
|
1087
|
-
constructor(options: {
|
|
1088
|
-
kg: GraphDB, // Your knowledge graph
|
|
1089
|
-
model?: string, // 'gpt-4o' | 'claude-3-opus' | etc.
|
|
1090
|
-
apiKey?: string, // LLM API key
|
|
1091
|
-
memory?: MemoryManager,
|
|
1092
|
-
scope?: AgentScope,
|
|
1093
|
-
embeddings?: EmbeddingService
|
|
1094
|
-
})
|
|
1095
|
-
|
|
1096
|
-
call(prompt: string): Promise<AgentResponse>
|
|
1097
|
-
}
|
|
1098
|
-
|
|
1099
|
-
interface AgentResponse {
|
|
1100
|
-
answer: string
|
|
1101
|
-
reasoningTrace: ReasoningStep[] // Audit trail
|
|
1102
|
-
hash: string // Reproducibility hash
|
|
1103
|
-
}
|
|
1104
|
-
```
|
|
224
|
+
Instead of manually listing every compliance requirement:
|
|
1105
225
|
|
|
1106
|
-
### GraphFrame
|
|
1107
|
-
|
|
1108
|
-
```typescript
|
|
1109
|
-
class GraphFrame {
|
|
1110
|
-
constructor(verticesJson: string, edgesJson: string)
|
|
1111
|
-
pageRank(resetProb: number, maxIter: number): string
|
|
1112
|
-
connectedComponents(): string
|
|
1113
|
-
shortestPaths(landmarks: string[]): string
|
|
1114
|
-
triangleCount(): number
|
|
1115
|
-
find(pattern: string): string // Motif pattern matching
|
|
1116
|
-
}
|
|
1117
226
|
```
|
|
1118
|
-
|
|
1119
|
-
|
|
1120
|
-
|
|
1121
|
-
```typescript
|
|
1122
|
-
class EmbeddingService {
|
|
1123
|
-
storeVector(entityId: string, vector: number[]): void
|
|
1124
|
-
findSimilar(entityId: string, k: number, threshold: number): string
|
|
1125
|
-
rebuildIndex(): void
|
|
1126
|
-
}
|
|
227
|
+
mustReport(X) :- transaction(X), amount(X, A), A > 10000.
|
|
228
|
+
mustReport(X) :- transaction(X), involves(X, PEP).
|
|
229
|
+
mustReport(X) :- relatedTo(X, Y), mustReport(Y). # Cascades!
|
|
1127
230
|
```
|
|
1128
231
|
|
|
1129
|
-
|
|
1130
|
-
|
|
1131
|
-
```typescript
|
|
1132
|
-
class DatalogProgram {
|
|
1133
|
-
addFact(factJson: string): void
|
|
1134
|
-
addRule(ruleJson: string): void
|
|
1135
|
-
}
|
|
1136
|
-
|
|
1137
|
-
function evaluateDatalog(program: DatalogProgram): string
|
|
1138
|
-
function queryDatalog(program: DatalogProgram, query: string): string
|
|
1139
|
-
```
|
|
232
|
+
Three rules generate ALL reporting requirements automatically. Even for transactions connected to other suspicious transactions, going back as far as your data allows.
|
|
1140
233
|
|
|
1141
234
|
---
|
|
1142
235
|
|
|
1143
|
-
##
|
|
236
|
+
## Why Our Agent Memory Is Different
|
|
1144
237
|
|
|
1145
|
-
|
|
238
|
+
Most AI agents have amnesia. Ask them the same question twice, they start from scratch.
|
|
1146
239
|
|
|
1147
|
-
|
|
1148
|
-
|
|
240
|
+
**The Problem:**
|
|
241
|
+
- ChatGPT forgets your previous questions after context window fills
|
|
242
|
+
- LangChain agents rebuild context every call (~500ms overhead)
|
|
243
|
+
- Vector databases return "similar" docs, not the exact query you ran before
|
|
1149
244
|
|
|
1150
|
-
|
|
1151
|
-
db.loadTtl(`
|
|
1152
|
-
@prefix : <http://example.org/> .
|
|
1153
|
-
:alice :knows :bob .
|
|
1154
|
-
:bob :knows :charlie .
|
|
1155
|
-
:charlie :knows :alice .
|
|
1156
|
-
`, null)
|
|
1157
|
-
|
|
1158
|
-
console.log(`Loaded ${db.countTriples()} triples`) // 3
|
|
1159
|
-
|
|
1160
|
-
const results = db.querySelect(`
|
|
1161
|
-
PREFIX : <http://example.org/>
|
|
1162
|
-
SELECT ?person WHERE { ?person :knows :bob }
|
|
1163
|
-
`)
|
|
1164
|
-
console.log(results) // [{ bindings: { person: 'http://example.org/alice' } }]
|
|
1165
|
-
```
|
|
245
|
+
**Our Approach: Deep Flashback**
|
|
1166
246
|
|
|
1167
|
-
|
|
247
|
+
When you ask "find suspicious providers", we:
|
|
248
|
+
1. **Hash your intent** → Check if we've seen this exact question pattern before
|
|
249
|
+
2. **HNSW lookup** → Search 10,000 historical queries in 16ms (not 500ms)
|
|
250
|
+
3. **Return cached result** → If we've answered this before, return instantly with proof
|
|
1168
251
|
|
|
1169
|
-
|
|
1170
|
-
const { GraphFrame } = require('rust-kgdb')
|
|
1171
|
-
|
|
1172
|
-
const graph = new GraphFrame(
|
|
1173
|
-
JSON.stringify([{id:'alice'}, {id:'bob'}, {id:'charlie'}]),
|
|
1174
|
-
JSON.stringify([
|
|
1175
|
-
{src:'alice', dst:'bob'},
|
|
1176
|
-
{src:'bob', dst:'charlie'},
|
|
1177
|
-
{src:'charlie', dst:'alice'}
|
|
1178
|
-
])
|
|
1179
|
-
)
|
|
1180
|
-
|
|
1181
|
-
// Built-in algorithms
|
|
1182
|
-
console.log('Triangles:', graph.triangleCount()) // 1
|
|
1183
|
-
console.log('PageRank:', JSON.parse(graph.pageRank(0.15, 20)))
|
|
1184
|
-
console.log('Components:', JSON.parse(graph.connectedComponents()))
|
|
1185
|
-
```
|
|
252
|
+
**Benchmarked Results (Verified):**
|
|
1186
253
|
|
|
1187
|
-
|
|
254
|
+
| Metric | Result | What It Means |
|
|
255
|
+
|--------|--------|---------------|
|
|
256
|
+
| **Memory Retrieval** | 94% Recall@10 at 10K depth | Find the right past query 94% of the time |
|
|
257
|
+
| **Search Speed** | 16.7ms for 10K queries | 30x faster than typical RAG |
|
|
258
|
+
| **Write Throughput** | 132K ops/sec (16 workers) | Handle enterprise query volumes |
|
|
259
|
+
| **Read Throughput** | 302 ops/sec concurrent | Consistent under load |
|
|
1188
260
|
|
|
1189
|
-
|
|
1190
|
-
const { GraphFrame } = require('rust-kgdb')
|
|
1191
|
-
|
|
1192
|
-
// Create a graph with payment relationships
|
|
1193
|
-
const graph = new GraphFrame(
|
|
1194
|
-
JSON.stringify([
|
|
1195
|
-
{id:'company_a'}, {id:'company_b'}, {id:'company_c'}, {id:'company_d'}
|
|
1196
|
-
]),
|
|
1197
|
-
JSON.stringify([
|
|
1198
|
-
{src:'company_a', dst:'company_b'}, // A pays B
|
|
1199
|
-
{src:'company_b', dst:'company_c'}, // B pays C
|
|
1200
|
-
{src:'company_c', dst:'company_a'}, // C pays A (circular!)
|
|
1201
|
-
{src:'company_c', dst:'company_d'} // C also pays D
|
|
1202
|
-
])
|
|
1203
|
-
)
|
|
1204
|
-
|
|
1205
|
-
// Find simple edge pattern: (a)-[]->(b)
|
|
1206
|
-
const edges = JSON.parse(graph.find('(a)-[]->(b)'))
|
|
1207
|
-
console.log('All edges:', edges.length) // 4
|
|
1208
|
-
|
|
1209
|
-
// Find two-hop path: (x)-[]->(y)-[]->(z)
|
|
1210
|
-
const twoHops = JSON.parse(graph.find('(x)-[]->(y); (y)-[]->(z)'))
|
|
1211
|
-
console.log('Two-hop paths:', twoHops.length) // 3
|
|
1212
|
-
|
|
1213
|
-
// Find circular pattern (fraud detection!): A->B->C->A
|
|
1214
|
-
const circles = JSON.parse(graph.find('(a)-[]->(b); (b)-[]->(c); (c)-[]->(a)'))
|
|
1215
|
-
console.log('Circular patterns:', circles.length) // 1 (the fraud ring!)
|
|
1216
|
-
|
|
1217
|
-
// Each match includes the bound variables
|
|
1218
|
-
// circles[0] = { a: 'company_a', b: 'company_b', c: 'company_c' }
|
|
1219
|
-
```
|
|
261
|
+
**Why This Matters:**
|
|
1220
262
|
|
|
1221
|
-
|
|
263
|
+
A claims adjuster asks about Provider #445 on Monday. On Friday, a different adjuster asks the same question. Without memory:
|
|
264
|
+
- Monday: 3 seconds to generate query, execute, format
|
|
265
|
+
- Friday: 3 seconds again (total waste)
|
|
1222
266
|
|
|
1223
|
-
|
|
1224
|
-
|
|
1225
|
-
|
|
1226
|
-
const program = new DatalogProgram()
|
|
1227
|
-
program.addFact(JSON.stringify({predicate: 'parent', terms: ['alice', 'bob']}))
|
|
1228
|
-
program.addFact(JSON.stringify({predicate: 'parent', terms: ['bob', 'charlie']}))
|
|
1229
|
-
|
|
1230
|
-
// grandparent(X, Z) :- parent(X, Y), parent(Y, Z)
|
|
1231
|
-
program.addRule(JSON.stringify({
|
|
1232
|
-
head: {predicate: 'grandparent', terms: ['?X', '?Z']},
|
|
1233
|
-
body: [
|
|
1234
|
-
{predicate: 'parent', terms: ['?X', '?Y']},
|
|
1235
|
-
{predicate: 'parent', terms: ['?Y', '?Z']}
|
|
1236
|
-
]
|
|
1237
|
-
}))
|
|
1238
|
-
|
|
1239
|
-
console.log('Inferred:', JSON.parse(evaluateDatalog(program)))
|
|
1240
|
-
// grandparent(alice, charlie)
|
|
1241
|
-
```
|
|
1242
|
-
|
|
1243
|
-
### Semantic Similarity
|
|
1244
|
-
|
|
1245
|
-
```javascript
|
|
1246
|
-
const { EmbeddingService } = require('rust-kgdb')
|
|
267
|
+
With our memory:
|
|
268
|
+
- Monday: 3 seconds (first time)
|
|
269
|
+
- Friday: 16ms (cached, with full audit trail)
|
|
1247
270
|
|
|
1248
|
-
|
|
1249
|
-
|
|
1250
|
-
// Store 384-dimension vectors
|
|
1251
|
-
embeddings.storeVector('claim_001', new Array(384).fill(0.5))
|
|
1252
|
-
embeddings.storeVector('claim_002', new Array(384).fill(0.6))
|
|
1253
|
-
embeddings.rebuildIndex()
|
|
1254
|
-
|
|
1255
|
-
// HNSW similarity search
|
|
1256
|
-
const similar = JSON.parse(embeddings.findSimilar('claim_001', 5, 0.7))
|
|
1257
|
-
console.log('Similar:', similar)
|
|
1258
|
-
```
|
|
1259
|
-
|
|
1260
|
-
### Pregel (BSP Graph Processing)
|
|
1261
|
-
|
|
1262
|
-
```javascript
|
|
1263
|
-
const { chainGraph, pregelShortestPaths } = require('rust-kgdb')
|
|
1264
|
-
|
|
1265
|
-
// Create a chain: v0 -> v1 -> v2 -> v3 -> v4
|
|
1266
|
-
const graph = chainGraph(5)
|
|
1267
|
-
|
|
1268
|
-
// Compute shortest paths from v0
|
|
1269
|
-
const result = JSON.parse(pregelShortestPaths(graph, 'v0', 10))
|
|
1270
|
-
console.log('Distances:', result.distances)
|
|
1271
|
-
// { v0: 0, v1: 1, v2: 2, v3: 3, v4: 4 }
|
|
1272
|
-
console.log('Supersteps:', result.supersteps) // 5
|
|
1273
|
-
```
|
|
1274
|
-
|
|
1275
|
-
---
|
|
1276
|
-
|
|
1277
|
-
## Comprehensive Example Tables
|
|
1278
|
-
|
|
1279
|
-
### SPARQL Examples
|
|
1280
|
-
|
|
1281
|
-
| Query Type | Example | Description |
|
|
1282
|
-
|------------|---------|-------------|
|
|
1283
|
-
| **SELECT** | `SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10` | Basic triple pattern |
|
|
1284
|
-
| **FILTER** | `SELECT ?p WHERE { ?p :age ?a . FILTER(?a > 30) }` | Numeric filtering |
|
|
1285
|
-
| **OPTIONAL** | `SELECT ?p ?email WHERE { ?p a :Person . OPTIONAL { ?p :email ?email } }` | Left outer join |
|
|
1286
|
-
| **UNION** | `SELECT ?x WHERE { { ?x a :Cat } UNION { ?x a :Dog } }` | Pattern union |
|
|
1287
|
-
| **CONSTRUCT** | `CONSTRUCT { ?s :knows ?o } WHERE { ?s :friend ?o }` | Create new triples |
|
|
1288
|
-
| **ASK** | `ASK WHERE { :alice :knows :bob }` | Boolean existence check |
|
|
1289
|
-
| **INSERT** | `INSERT DATA { :alice :knows :charlie }` | Add triples |
|
|
1290
|
-
| **DELETE** | `DELETE WHERE { :alice :knows ?anyone }` | Remove triples |
|
|
1291
|
-
| **Aggregation** | `SELECT (COUNT(?p) AS ?cnt) WHERE { ?p a :Person }` | Count/Sum/Avg/Min/Max |
|
|
1292
|
-
| **GROUP BY** | `SELECT ?dept (COUNT(?e) AS ?cnt) WHERE { ?e :worksIn ?dept } GROUP BY ?dept` | Grouping |
|
|
1293
|
-
| **HAVING** | `SELECT ?dept (COUNT(?e) AS ?cnt) WHERE { ?e :worksIn ?dept } GROUP BY ?dept HAVING (COUNT(?e) > 5)` | Filter groups |
|
|
1294
|
-
| **ORDER BY** | `SELECT ?p ?age WHERE { ?p :age ?age } ORDER BY DESC(?age)` | Sorting |
|
|
1295
|
-
| **DISTINCT** | `SELECT DISTINCT ?type WHERE { ?s a ?type }` | Remove duplicates |
|
|
1296
|
-
| **VALUES** | `SELECT ?p WHERE { VALUES ?type { :Cat :Dog } ?p a ?type }` | Inline data |
|
|
1297
|
-
| **BIND** | `SELECT ?p ?label WHERE { ?p :name ?n . BIND(CONCAT("Mr. ", ?n) AS ?label) }` | Computed values |
|
|
1298
|
-
| **Subquery** | `SELECT ?p WHERE { { SELECT ?p WHERE { ?p :score ?s } ORDER BY DESC(?s) LIMIT 10 } }` | Nested queries |
|
|
1299
|
-
|
|
1300
|
-
### Datalog Examples
|
|
1301
|
-
|
|
1302
|
-
| Pattern | Rule | Description |
|
|
1303
|
-
|---------|------|-------------|
|
|
1304
|
-
| **Transitive Closure** | `ancestor(?X,?Z) :- parent(?X,?Y), ancestor(?Y,?Z)` | Recursive ancestor |
|
|
1305
|
-
| **Symmetric** | `knows(?X,?Y) :- knows(?Y,?X)` | Bidirectional relations |
|
|
1306
|
-
| **Composition** | `grandparent(?X,?Z) :- parent(?X,?Y), parent(?Y,?Z)` | Two-hop relation |
|
|
1307
|
-
| **Negation** | `lonely(?X) :- person(?X), NOT friend(?X,?Y)` | Absence check |
|
|
1308
|
-
| **Aggregation** | `popular(?X) :- friend(?X,?Y), COUNT(?Y) > 10` | Count-based rules |
|
|
1309
|
-
| **Path Finding** | `reachable(?X,?Y) :- edge(?X,?Y). reachable(?X,?Z) :- edge(?X,?Y), reachable(?Y,?Z)` | Graph connectivity |
|
|
1310
|
-
|
|
1311
|
-
### Motif Pattern Syntax
|
|
1312
|
-
|
|
1313
|
-
| Pattern | Syntax | Matches |
|
|
1314
|
-
|---------|--------|---------|
|
|
1315
|
-
| **Single Edge** | `(a)-[]->(b)` | All directed edges |
|
|
1316
|
-
| **Two-Hop** | `(a)-[]->(b); (b)-[]->(c)` | Paths of length 2 |
|
|
1317
|
-
| **Triangle** | `(a)-[]->(b); (b)-[]->(c); (c)-[]->(a)` | Closed triangles |
|
|
1318
|
-
| **Star** | `(center)-[]->(a); (center)-[]->(b); (center)-[]->(c)` | Hub patterns |
|
|
1319
|
-
| **Named Edge** | `(a)-[e]->(b)` | Capture edge in variable `e` |
|
|
1320
|
-
| **Negation** | `(a)-[]->(b); !(b)-[]->(a)` | One-way edges only |
|
|
1321
|
-
| **Diamond** | `(a)-[]->(b); (a)-[]->(c); (b)-[]->(d); (c)-[]->(d)` | Diamond pattern |
|
|
1322
|
-
|
|
1323
|
-
### GraphFrame Algorithms
|
|
1324
|
-
|
|
1325
|
-
| Algorithm | Method | Input | Output |
|
|
1326
|
-
|-----------|--------|-------|--------|
|
|
1327
|
-
| **PageRank** | `graph.pageRank(0.15, 20)` | damping, iterations | `{ ranks: {id: score}, iterations, converged }` |
|
|
1328
|
-
| **Connected Components** | `graph.connectedComponents()` | - | `{ components: {id: componentId}, count }` |
|
|
1329
|
-
| **Shortest Paths** | `graph.shortestPaths(['v0', 'v5'])` | landmark vertices | `{ distances: {id: {landmark: dist}} }` |
|
|
1330
|
-
| **Label Propagation** | `graph.labelPropagation(10)` | max iterations | `{ labels: {id: label}, iterations }` |
|
|
1331
|
-
| **Triangle Count** | `graph.triangleCount()` | - | Number of triangles |
|
|
1332
|
-
| **Motif Finding** | `graph.find('(a)-[]->(b)')` | pattern string | Array of matches |
|
|
1333
|
-
| **Degrees** | `graph.degrees()` / `inDegrees()` / `outDegrees()` | - | `{ id: degree }` |
|
|
1334
|
-
| **Pregel** | `pregelShortestPaths(graph, 'v0', 10)` | landmark, maxSteps | `{ distances, supersteps }` |
|
|
1335
|
-
|
|
1336
|
-
### Embedding Operations
|
|
1337
|
-
|
|
1338
|
-
| Operation | Method | Description |
|
|
1339
|
-
|-----------|--------|-------------|
|
|
1340
|
-
| **Store Vector** | `service.storeVector('id', [0.1, 0.2, ...])` | Store 384-dim embedding |
|
|
1341
|
-
| **Find Similar** | `service.findSimilar('id', 10, 0.7)` | HNSW k-NN search |
|
|
1342
|
-
| **Composite Store** | `service.storeComposite('id', JSON.stringify({openai: [...], voyage: [...]}))` | Multi-provider |
|
|
1343
|
-
| **Composite Search** | `service.findSimilarComposite('id', 10, 0.7, 'rrf')` | RRF/max/voting aggregation |
|
|
1344
|
-
| **1-Hop Cache** | `service.getNeighborsOut('id')` / `getNeighborsIn('id')` | ARCADE neighbor cache |
|
|
1345
|
-
| **Rebuild Index** | `service.rebuildIndex()` | Rebuild HNSW index |
|
|
271
|
+
**The audit trail proves the Friday answer came from the same verified query as Monday** - not a new hallucination.
|
|
1346
272
|
|
|
1347
273
|
---
|
|
1348
274
|
|
|
1349
|
-
##
|
|
1350
|
-
|
|
1351
|
-
### Performance (Measured)
|
|
1352
|
-
|
|
1353
|
-
| Metric | Value | Rate |
|
|
1354
|
-
|--------|-------|------|
|
|
1355
|
-
| **Triple Lookup** | 449 ns | 2.2M lookups/sec |
|
|
1356
|
-
| **Bulk Insert (100K)** | 682 ms | 146K triples/sec |
|
|
1357
|
-
| **Memory per Triple** | 24 bytes | Best-in-class |
|
|
1358
|
-
|
|
1359
|
-
### Industry Comparison
|
|
1360
|
-
|
|
1361
|
-
| System | Lookup Speed | Memory/Triple | AI Framework |
|
|
1362
|
-
|--------|-------------|---------------|--------------|
|
|
1363
|
-
| **rust-kgdb** | **449 ns** | **24 bytes** | **Yes** |
|
|
1364
|
-
| RDFox | ~5 µs | 36-89 bytes | No |
|
|
1365
|
-
| Virtuoso | ~5 µs | 35-75 bytes | No |
|
|
1366
|
-
| Blazegraph | ~100 µs | 100+ bytes | No |
|
|
1367
|
-
|
|
1368
|
-
### AI Agent Accuracy (Verified December 2025)
|
|
275
|
+
## Embedding-Powered Similarity
|
|
1369
276
|
|
|
1370
|
-
|
|
1371
|
-
|
|
1372
|
-
|
|
1373
|
-
|
|
1374
|
-
| **DSPy** | 14.3% | 71.4% |
|
|
277
|
+
Traditional keyword search fails when:
|
|
278
|
+
- Lawyer searches "breach of fiduciary duty" but case uses "violation of trust obligations"
|
|
279
|
+
- Doctor searches "heart attack" but records say "myocardial infarction"
|
|
280
|
+
- Fraud analyst searches "shell company" but data shows "SPV" or "holding entity"
|
|
1375
281
|
|
|
1376
|
-
|
|
282
|
+
**Our Approach:**
|
|
1377
283
|
|
|
1378
|
-
|
|
1379
|
-
|
|
1380
|
-
### AI Framework Architectural Comparison
|
|
1381
|
-
|
|
1382
|
-
| Framework | Type Safety | Schema Aware | Symbolic Execution | Audit Trail |
|
|
1383
|
-
|-----------|-------------|--------------|-------------------|-------------|
|
|
1384
|
-
| **HyperMind** | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
|
|
1385
|
-
| LangChain | ❌ No | ❌ No | ❌ No | ❌ No |
|
|
1386
|
-
| DSPy | ⚠️ Partial | ❌ No | ❌ No | ❌ No |
|
|
1387
|
-
|
|
1388
|
-
**Key Insight**: Schema injection (HyperMind's architecture) provides +66.7 pp improvement across ALL frameworks. The value is in the architecture, not the specific framework.
|
|
1389
|
-
|
|
1390
|
-
### Reproduce Benchmarks
|
|
1391
|
-
|
|
1392
|
-
Two benchmark scripts are available for verification:
|
|
284
|
+
```javascript
|
|
285
|
+
const embedding = new EmbeddingService();
|
|
1393
286
|
|
|
1394
|
-
|
|
1395
|
-
|
|
1396
|
-
ANTHROPIC_API_KEY=... OPENAI_API_KEY=... node vanilla-vs-hypermind-benchmark.js
|
|
287
|
+
// Store queries with their semantic embeddings
|
|
288
|
+
embedding.store("find_fraud_providers", queryEmbedding);
|
|
1397
289
|
|
|
1398
|
-
|
|
1399
|
-
|
|
290
|
+
// Later: "which doctors are cheating" matches "find_fraud_providers"
|
|
291
|
+
// because embeddings capture meaning, not just keywords
|
|
292
|
+
const similar = embedding.findSimilar(newQueryEmbedding, 0.85);
|
|
1400
293
|
```
|
|
1401
294
|
|
|
1402
|
-
|
|
295
|
+
**HNSW Index Performance:**
|
|
296
|
+
- 50,000 vectors: ~20 comparisons (not 50,000)
|
|
297
|
+
- O(log N) search time
|
|
298
|
+
- 16ms for 10K similarity lookups
|
|
1403
299
|
|
|
1404
|
-
**
|
|
1405
|
-
- **Type Safety**: Tools have typed signatures (Query → BindingSet), invalid combinations rejected
|
|
1406
|
-
- **Schema Awareness**: Planner sees your actual data structure, can only reference real properties
|
|
1407
|
-
- **Symbolic Execution**: Queries run against real database, not LLM imagination
|
|
1408
|
-
- **Audit Trail**: Every answer has cryptographic hash for reproducibility
|
|
300
|
+
**This is how "cases like this one" returns relevant precedents even when the exact words differ.**
|
|
1409
301
|
|
|
1410
302
|
---
|
|
1411
303
|
|
|
1412
|
-
##
|
|
304
|
+
## What's In The Box
|
|
1413
305
|
|
|
1414
|
-
|
|
|
1415
|
-
|
|
1416
|
-
| **SPARQL
|
|
1417
|
-
| **
|
|
1418
|
-
| **
|
|
1419
|
-
| **
|
|
1420
|
-
| **
|
|
306
|
+
| Feature | What It Does | Why It Matters |
|
|
307
|
+
|---------|--------------|----------------|
|
|
308
|
+
| **SPARQL Engine** | Query knowledge graphs (449ns) | Faster than any hosted graph DB |
|
|
309
|
+
| **Datalog Rules** | Derive new facts from rules | Compliance cascades, fraud chains |
|
|
310
|
+
| **GraphFrames** | PageRank, shortest paths, motifs | Find hidden network structures |
|
|
311
|
+
| **Pregel BSP** | Process billion-edge graphs | Scale to enterprise transaction volumes |
|
|
312
|
+
| **HNSW Search** | Find similar items in milliseconds | "Cases like this one" in 16ms |
|
|
313
|
+
| **Audit Trail** | Prove every answer's source | Regulatory compliance, legal discovery |
|
|
314
|
+
| **WASM Sandbox** | Secure agent execution | Run untrusted code safely |
|
|
315
|
+
| **RDF 1.2 + SHACL** | W3C standards compliance | Interop with existing enterprise data |
|
|
1421
316
|
|
|
1422
317
|
---
|
|
1423
318
|
|
|
1424
|
-
##
|
|
319
|
+
## Performance
|
|
1425
320
|
|
|
1426
|
-
|
|
1427
|
-
|
|
1428
|
-
|
|
1429
|
-
|
|
321
|
+
| Metric | rust-kgdb | Typical Graph DB |
|
|
322
|
+
|--------|-----------|------------------|
|
|
323
|
+
| Lookup | 449 ns | 5,000+ ns |
|
|
324
|
+
| Memory | 24 bytes/triple | 60+ bytes |
|
|
325
|
+
| Setup | `npm install` | Days/weeks |
|
|
326
|
+
| Server | None (embedded) | Required |
|
|
327
|
+
| Data Location | Your infrastructure | Their cloud |
|
|
1430
328
|
|
|
1431
329
|
---
|
|
1432
330
|
|
|
1433
|
-
##
|
|
331
|
+
## Install
|
|
1434
332
|
|
|
1435
|
-
For those interested in the technical foundations of why HyperMind achieves deterministic AI reasoning.
|
|
1436
|
-
|
|
1437
|
-
### Why It Works: The Technical Foundation
|
|
1438
|
-
|
|
1439
|
-
HyperMind's reliability comes from three mathematical foundations:
|
|
1440
|
-
|
|
1441
|
-
| Foundation | What It Does | Practical Benefit |
|
|
1442
|
-
|------------|--------------|-------------------|
|
|
1443
|
-
| **Schema Awareness** | Auto-extracts your data structure | LLM only generates valid queries |
|
|
1444
|
-
| **Typed Tools** | Input/output validation | Prevents invalid tool combinations |
|
|
1445
|
-
| **Reasoning Trace** | Records every step | Complete audit trail for compliance |
|
|
1446
|
-
|
|
1447
|
-
### The Reasoning Trace (Audit Trail)
|
|
1448
|
-
|
|
1449
|
-
Every HyperMind answer includes a cryptographically-signed derivation showing exactly how the conclusion was reached:
|
|
1450
|
-
|
|
1451
|
-
```
|
|
1452
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
1453
|
-
│ REASONING TRACE │
|
|
1454
|
-
│ │
|
|
1455
|
-
│ ┌────────────────────────────────┐ │
|
|
1456
|
-
│ │ CONCLUSION (Root) │ │
|
|
1457
|
-
│ │ "Provider P001 is suspicious" │ │
|
|
1458
|
-
│ │ Confidence: 94% │ │
|
|
1459
|
-
│ └───────────────┬────────────────┘ │
|
|
1460
|
-
│ │ │
|
|
1461
|
-
│ ┌───────────────┼───────────────┐ │
|
|
1462
|
-
│ │ │ │ │
|
|
1463
|
-
│ ▼ ▼ ▼ │
|
|
1464
|
-
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
|
|
1465
|
-
│ │ Database Query │ │ Rule Application │ │ Similarity Match │ │
|
|
1466
|
-
│ │ │ │ │ │ │ │
|
|
1467
|
-
│ │ Tool: SPARQL │ │ Tool: Datalog │ │ Tool: Embeddings │ │
|
|
1468
|
-
│ │ Result: 47 claims│ │ Result: MATCHED │ │ Result: 87% │ │
|
|
1469
|
-
│ │ Time: 2.3ms │ │ Rule: fraud(?P) │ │ similar to known │ │
|
|
1470
|
-
│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │
|
|
1471
|
-
│ │
|
|
1472
|
-
│ HASH: sha256:8f3a2b1c4d5e... (Reproducible, Auditable, Verifiable) │
|
|
1473
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
1474
|
-
```
|
|
1475
|
-
|
|
1476
|
-
### For Academics: Mathematical Foundations
|
|
1477
|
-
|
|
1478
|
-
HyperMind is built on rigorous mathematical foundations:
|
|
1479
|
-
|
|
1480
|
-
- **Context Theory** (Spivak's Ologs): Schema represented as a category where objects are classes and morphisms are properties
|
|
1481
|
-
- **Type Theory** (Hindley-Milner): Every tool has a typed signature enabling compile-time validation
|
|
1482
|
-
- **Proof Theory** (Curry-Howard): Proofs are programs, types are propositions - every conclusion has a derivation
|
|
1483
|
-
- **Category Theory**: Tools as morphisms with validated composition
|
|
1484
|
-
|
|
1485
|
-
These foundations ensure that HyperMind transforms probabilistic LLM outputs into deterministic, verifiable reasoning chains.
|
|
1486
|
-
|
|
1487
|
-
### Architecture Layers
|
|
1488
|
-
|
|
1489
|
-
```
|
|
1490
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
1491
|
-
│ INTELLIGENCE CONTROL PLANE │
|
|
1492
|
-
│ │
|
|
1493
|
-
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │
|
|
1494
|
-
│ │ Schema │ │ Tool │ │ Reasoning │ │
|
|
1495
|
-
│ │ Awareness │ │ Validation │ │ Trace │ │
|
|
1496
|
-
│ └───────┬────────┘ └───────┬────────┘ └───────┬────────┘ │
|
|
1497
|
-
│ └────────────────────┼────────────────────┘ │
|
|
1498
|
-
│ ▼ │
|
|
1499
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1500
|
-
│ │ HYPERMIND AGENT │ │
|
|
1501
|
-
│ │ User Query → LLM Planner → Typed Execution Plan → Tools → Answer │ │
|
|
1502
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1503
|
-
│ ▼ │
|
|
1504
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1505
|
-
│ │ rust-kgdb ENGINE │ │
|
|
1506
|
-
│ │ • GraphDB (SPARQL 1.1) • GraphFrames (Analytics) │ │
|
|
1507
|
-
│ │ • Datalog (Rules) • Embeddings (Similarity) │ │
|
|
1508
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1509
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
1510
|
-
```
|
|
1511
|
-
|
|
1512
|
-
### Security Model
|
|
1513
|
-
|
|
1514
|
-
HyperMind includes capability-based security:
|
|
1515
|
-
|
|
1516
|
-
```javascript
|
|
1517
|
-
const agent = new HyperMindAgent({
|
|
1518
|
-
kg: db,
|
|
1519
|
-
scope: new AgentScope({
|
|
1520
|
-
allowedGraphs: ['http://insurance.org/'], // Restrict graph access
|
|
1521
|
-
allowedPredicates: ['amount', 'provider'], // Restrict predicates
|
|
1522
|
-
maxResultSize: 1000 // Limit result size
|
|
1523
|
-
}),
|
|
1524
|
-
sandbox: {
|
|
1525
|
-
capabilities: ['ReadKG', 'ExecuteTool'], // No WriteKG = read-only
|
|
1526
|
-
fuelLimit: 1_000_000 // CPU budget
|
|
1527
|
-
}
|
|
1528
|
-
})
|
|
1529
|
-
```
|
|
1530
|
-
|
|
1531
|
-
### Distributed Deployment (Kubernetes)
|
|
1532
|
-
|
|
1533
|
-
rust-kgdb scales from single-node to distributed cluster on the same codebase.
|
|
1534
|
-
|
|
1535
|
-
```
|
|
1536
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
1537
|
-
│ DISTRIBUTED ARCHITECTURE │
|
|
1538
|
-
│ │
|
|
1539
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1540
|
-
│ │ COORDINATOR NODE │ │
|
|
1541
|
-
│ │ • Query planning & optimization │ │
|
|
1542
|
-
│ │ • HDRF streaming partitioner (subject-anchored) │ │
|
|
1543
|
-
│ │ • Raft consensus leader │ │
|
|
1544
|
-
│ │ • gRPC routing to executors │ │
|
|
1545
|
-
│ └──────────────────────────────┬──────────────────────────────────────┘ │
|
|
1546
|
-
│ │ │
|
|
1547
|
-
│ ┌───────────────────────┼───────────────────────┐ │
|
|
1548
|
-
│ │ │ │ │
|
|
1549
|
-
│ ▼ ▼ ▼ │
|
|
1550
|
-
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
|
|
1551
|
-
│ │ EXECUTOR 1 │ │ EXECUTOR 2 │ │ EXECUTOR 3 │ │
|
|
1552
|
-
│ │ │ │ │ │ │ │
|
|
1553
|
-
│ │ Partition 0 │ │ Partition 1 │ │ Partition 2 │ │
|
|
1554
|
-
│ │ RocksDB │ │ RocksDB │ │ RocksDB │ │
|
|
1555
|
-
│ │ Embeddings │ │ Embeddings │ │ Embeddings │ │
|
|
1556
|
-
│ └─────────────┘ └─────────────┘ └─────────────┘ │
|
|
1557
|
-
│ │
|
|
1558
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
1559
|
-
```
|
|
1560
|
-
|
|
1561
|
-
**Deployment with Helm:**
|
|
1562
333
|
```bash
|
|
1563
|
-
|
|
1564
|
-
helm install rust-kgdb ./infra/helm -n rust-kgdb --create-namespace
|
|
1565
|
-
|
|
1566
|
-
# Scale executors
|
|
1567
|
-
kubectl scale deployment rust-kgdb-executor --replicas=5 -n rust-kgdb
|
|
1568
|
-
|
|
1569
|
-
# Check cluster health
|
|
1570
|
-
kubectl get pods -n rust-kgdb
|
|
1571
|
-
```
|
|
1572
|
-
|
|
1573
|
-
**Key Distributed Features:**
|
|
1574
|
-
| Feature | Description |
|
|
1575
|
-
|---------|-------------|
|
|
1576
|
-
| **HDRF Partitioning** | Subject-anchored streaming partitioner minimizes edge cuts |
|
|
1577
|
-
| **Raft Consensus** | Leader election, log replication, consistency |
|
|
1578
|
-
| **gRPC Communication** | Efficient inter-node query routing |
|
|
1579
|
-
| **Shadow Partitions** | Zero-downtime rebalancing (~10ms pause) |
|
|
1580
|
-
| **DataFusion OLAP** | Arrow-native analytical queries |
|
|
1581
|
-
|
|
1582
|
-
### Memory System
|
|
1583
|
-
|
|
1584
|
-
Agents have persistent memory across sessions:
|
|
1585
|
-
|
|
1586
|
-
```javascript
|
|
1587
|
-
const agent = new HyperMindAgent({
|
|
1588
|
-
kg: db,
|
|
1589
|
-
memory: new MemoryManager({
|
|
1590
|
-
workingMemorySize: 10, // Current session cache
|
|
1591
|
-
episodicRetentionDays: 30, // Episode history
|
|
1592
|
-
longTermGraph: 'http://memory/' // Persistent knowledge
|
|
1593
|
-
})
|
|
1594
|
-
})
|
|
1595
|
-
```
|
|
1596
|
-
|
|
1597
|
-
### Memory Hypergraph: How AI Agents Remember
|
|
1598
|
-
|
|
1599
|
-
rust-kgdb introduces the **Memory Hypergraph** - a temporal knowledge graph where agent memory is stored in the *same* quad store as your domain knowledge, with hyper-edges connecting episodes to KG entities.
|
|
1600
|
-
|
|
1601
|
-
```
|
|
1602
|
-
┌─────────────────────────────────────────────────────────────────────────────────┐
|
|
1603
|
-
│ MEMORY HYPERGRAPH ARCHITECTURE │
|
|
1604
|
-
│ │
|
|
1605
|
-
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
|
|
1606
|
-
│ │ AGENT MEMORY LAYER (am: graph) │ │
|
|
1607
|
-
│ │ │ │
|
|
1608
|
-
│ │ Episode:001 Episode:002 Episode:003 │ │
|
|
1609
|
-
│ │ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │ │
|
|
1610
|
-
│ │ │ Fraud ring │ │ Underwriting │ │ Follow-up │ │ │
|
|
1611
|
-
│ │ │ detected in │ │ denied claim │ │ investigation │ │ │
|
|
1612
|
-
│ │ │ Provider P001 │ │ from P001 │ │ on P001 │ │ │
|
|
1613
|
-
│ │ │ │ │ │ │ │ │ │
|
|
1614
|
-
│ │ │ Dec 10, 14:30 │ │ Dec 12, 09:15 │ │ Dec 15, 11:00 │ │ │
|
|
1615
|
-
│ │ │ Score: 0.95 │ │ Score: 0.87 │ │ Score: 0.92 │ │ │
|
|
1616
|
-
│ │ └───────┬───────┘ └───────┬───────┘ └───────┬───────┘ │ │
|
|
1617
|
-
│ │ │ │ │ │ │
|
|
1618
|
-
│ └───────────┼─────────────────────────┼─────────────────────────┼─────────┘ │
|
|
1619
|
-
│ │ HyperEdge: │ HyperEdge: │ │
|
|
1620
|
-
│ │ "QueriedKG" │ "DeniedClaim" │ │
|
|
1621
|
-
│ ▼ ▼ ▼ │
|
|
1622
|
-
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
|
|
1623
|
-
│ │ KNOWLEDGE GRAPH LAYER (domain graph) │ │
|
|
1624
|
-
│ │ │ │
|
|
1625
|
-
│ │ Provider:P001 ──────────────▶ Claim:C123 ◀────────── Claimant:C001 │ │
|
|
1626
|
-
│ │ │ │ │ │ │
|
|
1627
|
-
│ │ │ :hasRiskScore │ :amount │ :name │ │
|
|
1628
|
-
│ │ ▼ ▼ ▼ │ │
|
|
1629
|
-
│ │ "0.87" "50000" "John Doe" │ │
|
|
1630
|
-
│ │ │ │
|
|
1631
|
-
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
|
|
1632
|
-
│ │ │ SAME QUAD STORE - Single SPARQL query traverses BOTH │ │ │
|
|
1633
|
-
│ │ │ memory graph AND knowledge graph! │ │ │
|
|
1634
|
-
│ │ └─────────────────────────────────────────────────────────────┘ │ │
|
|
1635
|
-
│ │ │ │
|
|
1636
|
-
│ └─────────────────────────────────────────────────────────────────────────┘ │
|
|
1637
|
-
│ │
|
|
1638
|
-
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
|
|
1639
|
-
│ │ TEMPORAL SCORING FORMULA │ │
|
|
1640
|
-
│ │ │ │
|
|
1641
|
-
│ │ Score = α × Recency + β × Relevance + γ × Importance │ │
|
|
1642
|
-
│ │ │ │
|
|
1643
|
-
│ │ where: │ │
|
|
1644
|
-
│ │ Recency = 0.995^hours (12% decay/day) │ │
|
|
1645
|
-
│ │ Relevance = cosine_similarity(query, episode) │ │
|
|
1646
|
-
│ │ Importance = log10(access_count + 1) / log10(max + 1) │ │
|
|
1647
|
-
│ │ │ │
|
|
1648
|
-
│ │ Default: α=0.3, β=0.5, γ=0.2 │ │
|
|
1649
|
-
│ └─────────────────────────────────────────────────────────────────────────┘ │
|
|
1650
|
-
│ │
|
|
1651
|
-
└─────────────────────────────────────────────────────────────────────────────────┘
|
|
1652
|
-
```
|
|
1653
|
-
|
|
1654
|
-
**Without Memory Hypergraph** (LangChain, LlamaIndex):
|
|
1655
|
-
```javascript
|
|
1656
|
-
// Ask about last week's findings
|
|
1657
|
-
agent.chat("What fraud patterns did we find with Provider P001?")
|
|
1658
|
-
// Response: "I don't have that information. Could you describe what you're looking for?"
|
|
1659
|
-
// Cost: Re-run entire fraud detection pipeline ($5 in API calls, 30 seconds)
|
|
1660
|
-
```
|
|
1661
|
-
|
|
1662
|
-
**With Memory Hypergraph** (rust-kgdb HyperMind Framework):
|
|
1663
|
-
```javascript
|
|
1664
|
-
// HyperMind API: Recall memories with KG context
|
|
1665
|
-
const enrichedMemories = await agent.recallWithKG({
|
|
1666
|
-
query: "Provider P001 fraud",
|
|
1667
|
-
kgFilter: { predicate: ":amount", operator: ">", value: 25000 },
|
|
1668
|
-
limit: 10
|
|
1669
|
-
})
|
|
1670
|
-
|
|
1671
|
-
// Returns typed results with linked KG context:
|
|
1672
|
-
// {
|
|
1673
|
-
// episode: "Episode:001",
|
|
1674
|
-
// finding: "Fraud ring detected in Provider P001",
|
|
1675
|
-
// kgContext: {
|
|
1676
|
-
// provider: "Provider:P001",
|
|
1677
|
-
// claims: [{ id: "Claim:C123", amount: 50000 }],
|
|
1678
|
-
// riskScore: 0.87
|
|
1679
|
-
// },
|
|
1680
|
-
// semanticHash: "semhash:fraud-provider-p001-ring-detection"
|
|
1681
|
-
// }
|
|
1682
|
-
```
|
|
1683
|
-
|
|
1684
|
-
#### Semantic Hashing for Idempotent Responses
|
|
1685
|
-
|
|
1686
|
-
Same question = Same answer. Even with **different wording**. Critical for compliance.
|
|
1687
|
-
|
|
1688
|
-
```javascript
|
|
1689
|
-
// First call: Compute answer, cache with semantic hash
|
|
1690
|
-
const result1 = await agent.call("Analyze claims from Provider P001")
|
|
1691
|
-
// Semantic Hash: semhash:fraud-provider-p001-claims-analysis
|
|
1692
|
-
|
|
1693
|
-
// Second call (different wording, same intent): Cache HIT!
|
|
1694
|
-
const result2 = await agent.call("Show me P001's claim patterns")
|
|
1695
|
-
// Cache HIT - same semantic hash
|
|
1696
|
-
|
|
1697
|
-
// Compliance officer: "Why are these identical?"
|
|
1698
|
-
// You: "Semantic hashing - same meaning, same output, regardless of phrasing."
|
|
1699
|
-
```
|
|
1700
|
-
|
|
1701
|
-
**How it works**: Query embeddings are hashed via **Locality-Sensitive Hashing (LSH)** with random hyperplane projections. Semantically similar queries map to the same bucket.
|
|
1702
|
-
|
|
1703
|
-
### HyperMind vs MCP (Model Context Protocol)
|
|
1704
|
-
|
|
1705
|
-
Why domain-enriched proxies beat generic function calling:
|
|
1706
|
-
|
|
1707
|
-
```
|
|
1708
|
-
┌───────────────────────┬──────────────────────┬──────────────────────────┐
|
|
1709
|
-
│ Feature │ MCP │ HyperMind Proxy │
|
|
1710
|
-
├───────────────────────┼──────────────────────┼──────────────────────────┤
|
|
1711
|
-
│ Type Safety │ ❌ String only │ ✅ Full type system │
|
|
1712
|
-
│ Domain Knowledge │ ❌ Generic │ ✅ Domain-enriched │
|
|
1713
|
-
│ Tool Composition │ ❌ Isolated │ ✅ Morphism composition │
|
|
1714
|
-
│ Validation │ ❌ Runtime │ ✅ Compile-time │
|
|
1715
|
-
│ Security │ ❌ None │ ✅ WASM sandbox │
|
|
1716
|
-
│ Audit Trail │ ❌ None │ ✅ Execution witness │
|
|
1717
|
-
│ LLM Context │ ❌ Generic schema │ ✅ Rich domain hints │
|
|
1718
|
-
│ Capability Control │ ❌ All or nothing │ ✅ Fine-grained caps │
|
|
1719
|
-
├───────────────────────┼──────────────────────┼──────────────────────────┤
|
|
1720
|
-
│ Result │ 60% accuracy │ 95%+ accuracy │
|
|
1721
|
-
└───────────────────────┴──────────────────────┴──────────────────────────┘
|
|
334
|
+
npm install rust-kgdb
|
|
1722
335
|
```
|
|
1723
336
|
|
|
1724
|
-
**MCP**: LLM generates query → hope it works
|
|
1725
|
-
**HyperMind**: LLM selects tools → type system validates → guaranteed correct
|
|
1726
|
-
|
|
1727
337
|
```javascript
|
|
1728
|
-
|
|
1729
|
-
// Tool: search_database(query: string)
|
|
1730
|
-
// LLM generates: "SELECT * FROM claims WHERE suspicious = true"
|
|
1731
|
-
// Result: ❌ SQL injection risk, "suspicious" column doesn't exist
|
|
1732
|
-
|
|
1733
|
-
// HYPERMIND APPROACH (Domain-enriched proxy)
|
|
1734
|
-
// Tool: kg.datalog.infer with fraud rules
|
|
1735
|
-
const result = await agent.call('Find collusion patterns')
|
|
1736
|
-
// Result: ✅ Type-safe, domain-aware, auditable
|
|
1737
|
-
```
|
|
1738
|
-
|
|
1739
|
-
### Why Vanilla LLMs Fail
|
|
1740
|
-
|
|
1741
|
-
When you ask an LLM to query a knowledge graph, it produces **broken SPARQL 85% of the time**:
|
|
338
|
+
const { GraphDB } = require('rust-kgdb');
|
|
1742
339
|
|
|
1743
|
-
|
|
1744
|
-
|
|
1745
|
-
|
|
1746
|
-
Vanilla LLM Output:
|
|
1747
|
-
┌───────────────────────────────────────────────────────────────────────┐
|
|
1748
|
-
│ ```sparql │
|
|
1749
|
-
│ PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#> │
|
|
1750
|
-
│ SELECT ?professor WHERE { │
|
|
1751
|
-
│ ?professor a ub:Faculty . ← WRONG! Schema has "Professor" │
|
|
1752
|
-
│ } │
|
|
1753
|
-
│ ``` ← Parser rejects markdown │
|
|
1754
|
-
│ │
|
|
1755
|
-
│ This query retrieves all faculty members from the LUBM dataset. │
|
|
1756
|
-
│ ↑ Explanation text breaks parsing │
|
|
1757
|
-
└───────────────────────────────────────────────────────────────────────┘
|
|
1758
|
-
Result: ❌ PARSER ERROR - Invalid SPARQL syntax
|
|
1759
|
-
```
|
|
1760
|
-
|
|
1761
|
-
**Why it fails:**
|
|
1762
|
-
1. LLM wraps query in markdown code blocks → parser chokes
|
|
1763
|
-
2. LLM adds explanation text → mixed with query syntax
|
|
1764
|
-
3. LLM hallucinates class names → `ub:Faculty` doesn't exist (it's `ub:Professor`)
|
|
1765
|
-
4. LLM has no schema awareness → guesses predicates and classes
|
|
1766
|
-
|
|
1767
|
-
**HyperMind fixes all of this** with schema injection and typed tools, achieving **71% accuracy** vs **0% for vanilla LLMs without schema**.
|
|
1768
|
-
|
|
1769
|
-
### Competitive Landscape
|
|
1770
|
-
|
|
1771
|
-
#### Triple Stores Comparison
|
|
1772
|
-
|
|
1773
|
-
| System | Lookup Speed | Memory/Triple | WCOJ | Mobile | AI Framework |
|
|
1774
|
-
|--------|-------------|---------------|------|--------|--------------|
|
|
1775
|
-
| **rust-kgdb** | **449 ns** | **24 bytes** | ✅ Yes | ✅ Yes | ✅ HyperMind |
|
|
1776
|
-
| Tentris | ~5 µs | ~30 bytes | ✅ Yes | ❌ No | ❌ No |
|
|
1777
|
-
| RDFox | ~5 µs | 36-89 bytes | ❌ No | ❌ No | ❌ No |
|
|
1778
|
-
| AllegroGraph | ~10 µs | 50+ bytes | ❌ No | ❌ No | ❌ No |
|
|
1779
|
-
| Virtuoso | ~5 µs | 35-75 bytes | ❌ No | ❌ No | ❌ No |
|
|
1780
|
-
| Blazegraph | ~100 µs | 100+ bytes | ❌ No | ❌ No | ❌ No |
|
|
1781
|
-
| Apache Jena | 150+ µs | 50-60 bytes | ❌ No | ❌ No | ❌ No |
|
|
1782
|
-
| Neo4j | ~5 µs | 70+ bytes | ❌ No | ❌ No | ❌ No |
|
|
1783
|
-
| Amazon Neptune | ~5 µs | N/A (managed) | ❌ No | ❌ No | ❌ No |
|
|
1784
|
-
|
|
1785
|
-
**Note**: Tentris implements WCOJ (see [ISWC 2025 paper](https://papers.dice-research.org/2025/ISWC_Tentris-WCOJ-Update/public.pdf)). rust-kgdb is the only system combining WCOJ with mobile support and integrated AI framework.
|
|
1786
|
-
|
|
1787
|
-
#### AI Framework Architectural Comparison
|
|
1788
|
-
|
|
1789
|
-
| Framework | Type Safety | Schema Aware | Symbolic Execution | Audit Trail |
|
|
1790
|
-
|-----------|-------------|--------------|-------------------|-------------|
|
|
1791
|
-
| **HyperMind** | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
|
|
1792
|
-
| LangChain | ❌ No | ❌ No | ❌ No | ❌ No |
|
|
1793
|
-
| DSPy | ⚠️ Partial | ❌ No | ❌ No | ❌ No |
|
|
1794
|
-
|
|
1795
|
-
**Note**: This compares architectural features. Benchmark (Dec 2025): Schema injection brings all frameworks to ~71% accuracy equally.
|
|
340
|
+
const db = new GraphDB('http://example.org/');
|
|
341
|
+
db.loadTtl(':Alice :knows :Bob . :Bob :knows :Charlie .');
|
|
1796
342
|
|
|
1797
|
-
|
|
1798
|
-
|
|
1799
|
-
│ COMPETITIVE LANDSCAPE │
|
|
1800
|
-
├─────────────────────────────────────────────────────────────────┤
|
|
1801
|
-
│ │
|
|
1802
|
-
│ Tentris: WCOJ-optimized, but no mobile or AI framework │
|
|
1803
|
-
│ RDFox: Fast commercial, but expensive, no mobile │
|
|
1804
|
-
│ AllegroGraph: Enterprise features, but slower, no mobile │
|
|
1805
|
-
│ Apache Jena: Great features, but 150+ µs lookups │
|
|
1806
|
-
│ Neo4j: Popular, but no SPARQL/RDF standards │
|
|
1807
|
-
│ Amazon Neptune: Managed, but cloud-only vendor lock-in │
|
|
1808
|
-
│ │
|
|
1809
|
-
│ rust-kgdb: 449 ns lookups, WCOJ joins, mobile-native │
|
|
1810
|
-
│ Standalone → Clustered on same codebase │
|
|
1811
|
-
│ Deterministic planner, audit-ready │
|
|
1812
|
-
│ │
|
|
1813
|
-
└─────────────────────────────────────────────────────────────────┘
|
|
343
|
+
const results = db.query('SELECT ?x WHERE { :Alice :knows ?x }');
|
|
344
|
+
// [{x: ':Bob'}]
|
|
1814
345
|
```
|
|
1815
346
|
|
|
1816
347
|
---
|
|
1817
348
|
|
|
1818
|
-
##
|
|
349
|
+
## Links
|
|
350
|
+
|
|
351
|
+
- [Examples](./examples/)
|
|
352
|
+
- [GitHub](https://github.com/gonnect-uk/rust-kgdb)
|
|
1819
353
|
|
|
1820
|
-
Apache 2.0
|
|
354
|
+
Apache 2.0 License
|