rust-kgdb 0.6.55 → 0.6.57
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +698 -1743
- package/package.json +1 -1
package/README.md (CHANGED)
````diff
@@ -4,2004 +4,959 @@
 [](https://opensource.org/licenses/Apache-2.0)
 [](https://www.w3.org/TR/sparql11-query/)
 
-
-
-### Have You Ever Wondered Why AI Agents Keep Lying?
-
-Here's the uncomfortable truth: **LLMs don't know your data**. They've read Wikipedia, Stack Overflow, and half the internet - but they've never seen your customer records, your claims database, or your internal knowledge graph.
-
-So when you ask "find suspicious providers," they do what humans do when they don't know the answer: **they make something up that sounds plausible**.
-
-The industry's response? "Add more guardrails!" "Use RAG!" "Fine-tune on your data!"
+---
 
-
+## The Trillion-Dollar Mistake
 
-
+A lawyer asks AI: *"Has this contract clause ever been challenged in court?"*
 
-
+AI responds: *"Yes, in Smith v. Johnson (2019), the court ruled..."*
 
-
+The lawyer cites it. The judge looks confused. **That case doesn't exist.** The AI invented it.
 
-
-1. What questions can be asked (your schema)
-2. How to ask them (SPARQL/Datalog syntax)
+This isn't rare. It happens every day:
 
-
+**In Healthcare:**
+> Doctor: "What drugs interact with this patient's current medications?"
+> AI: "Avoid combining with Nexapril due to cardiac risks."
+> *Nexapril isn't a real drug.*
 
-
-
-
-
-↓
-Database executes: Scans 47M triples in 449ns per lookup
-↓
-Returns: [PROV001, PROV847, PROV2201] ← These actually exist in YOUR data
-```
+**In Insurance:**
+> Claims Adjuster: "Has this provider shown suspicious billing patterns?"
+> AI: "Provider #4521 has a history of duplicate billing..."
+> *Provider #4521 has a perfect record.*
 
-**
+**In Fraud Detection:**
+> Analyst: "Find transactions that look like money laundering."
+> AI: "Account ending 7842 shows classic layering behavior..."
+> *That account belongs to a charity. Now you've falsely accused them.*
 
-
+**The AI doesn't know your data. It guesses. And it sounds confident while lying.**
 
-
-- Install Virtuoso/RDFox/Neo4j server
-- Configure connections
-- Pay for licenses
-- Hire a DBA
-
-Our approach: **The database is embedded in your app.**
+---
 
-
-npm install rust-kgdb # That's it. You now have a full SPARQL database.
-```
+## What Is rust-kgdb?
 
-
+**Two components, one npm package:**
 
-
+### 1. rust-kgdb Core: Embedded Knowledge Graph Database
 
-
+A high-performance RDF/SPARQL database that runs **inside your application**. No server. No Docker. No config.
 
 ```
 ┌─────────────────────────────────────────────────────────────────────────────┐
-│
-
-│
-┌─────────────────────────────────▼───────────────────────────────────────────┐
-│ HYPERMIND AGENT FRAMEWORK (JavaScript) │
+│ rust-kgdb CORE ENGINE │
+│ │
 │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
-│ │
-│ │
-│ │
+│ │ GraphDB │ │ GraphFrame │ │ Embeddings │ │ Datalog │ │
+│ │ (SPARQL) │ │ (Analytics) │ │ (HNSW) │ │ (Reasoning) │ │
+│ │ 449ns │ │ PageRank │ │ 16ms/10K │ │ Semi-naive │ │
 │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
-
-
-
-│ RUST CORE (Native Performance) │
-│ ┌──────────────────────────────────────────────────────────────────────┐ │
-│ │ QUERY ENGINE │ │
-│ │ • SPARQL 1.1 (449ns lookups) • WCOJ Joins (worst-case optimal) │ │
-│ │ • Datalog (semi-naive eval) • Sparse Matrix (CSR/CSC reasoning) │ │
-│ └──────────────────────────────────────────────────────────────────────┘ │
-│ ┌──────────────────────────────────────────────────────────────────────┐ │
-│ │ GRAPH ANALYTICS │ │
-│ │ • GraphFrames (PageRank, Components, Triangles, Motifs) │ │
-│ │ • Pregel BSP (Bulk Synchronous Parallel) │ │
-│ │ • Shortest Paths, Label Propagation │ │
-│ └──────────────────────────────────────────────────────────────────────┘ │
-│ ┌──────────────────────────────────────────────────────────────────────┐ │
-│ │ VECTOR & RETRIEVAL │ │
-│ │ • HNSW Index (O(log N) ANN) • ARCADE 1-Hop Cache (O(1) neighbors) │ │
-│ │ • Multi-provider Embeddings • RRF Reranking │ │
-│ └──────────────────────────────────────────────────────────────────────┘ │
-│ ┌──────────────────────────────────────────────────────────────────────┐ │
-│ │ STORAGE │ │
-│ │ • InMemory (dev) • RocksDB (prod) • LMDB (read-heavy) │ │
-│ │ • SPOC/POCS/OCSP/CSPO Indexes • 24 bytes/triple │ │
-│ └──────────────────────────────────────────────────────────────────────┘ │
+│ │
+│ Storage: InMemory | RocksDB | LMDB Standards: SPARQL 1.1 | RDF 1.2 │
+│ Memory: 24 bytes/triple Compliance: SHACL | PROV | OWL 2 RL │
 └─────────────────────────────────────────────────────────────────────────────┘
 ```
 
-
-
-| Component | What It Does | Performance |
-|-----------|--------------|-------------|
-| **SPARQL 1.1** | W3C-compliant query engine, 64 builtin functions | 449ns lookups |
-| **RDF 1.2** | RDF-Star (quoted triples), TriG, N-Quads | W3C compliant |
-| **SHACL** | W3C Shapes Constraint Language validation | Constraint engine |
-| **PROV** | W3C Provenance ontology support | Audit trail |
-| **WCOJ Joins** | Worst-case optimal joins for multi-way patterns | O(N^(ρ/2)) |
-| **Datalog** | Semi-naive evaluation with recursion | Incremental |
-| **Sparse Matrix** | CSR/CSC-based reasoning for OWL 2 RL | Memory-efficient |
-| **GraphFrames** | PageRank, components, triangles, motifs | Parallel |
-| **Pregel** | Bulk Synchronous Parallel graph processing | Superstep-based |
-| **HNSW** | Hierarchical Navigable Small World index | O(log N) |
-| **ARCADE Cache** | 1-hop neighbor pre-caching | O(1) context |
-| **Storage** | InMemory, RocksDB, LMDB backends | 24 bytes/triple |
-
-**Scalability Numbers (Verified Benchmark)**:
-
-| Operation | 1 Worker | 16 Workers | Scaling |
-|-----------|----------|------------|---------|
-| Concurrent Writes | 66K ops/sec | 132K ops/sec | 2.0x |
-| GraphFrame Analytics | 6.0K ops/sec | 6.5K ops/sec | Thread-safe |
-| Memory per Triple | 24 bytes | 24 bytes | Constant |
-
-Reproduce: `node concurrency-benchmark.js`
-
-### Layer 2: HyperMind Agent Framework (JavaScript)
-
-| Component | What It Does |
-|-----------|--------------|
-| **LLMPlanner** | Schema-aware query generation (auto-extracts from data) |
-| **MemoryManager** | Working memory + episodic memory + long-term KG |
-| **WASM Sandbox** | Secure execution with capability-based permissions |
-| **ProofDAG** | Audit trail with cryptographic hash for reproducibility |
-| **TypedTools** | Input/output validation prevents hallucination |
-
-### WASM Sandbox Architecture
-
-```
-┌─────────────────────────────────────────────────────────────────────────────┐
-│ WASM SANDBOX (Secure Agent Execution) │
-├─────────────────────────────────────────────────────────────────────────────┤
-│ │
-│ ┌─────────────────────┐ ┌─────────────────────┐ ┌────────────────┐ │
-│ │ CAPABILITIES │ │ FUEL METERING │ │ AUDIT LOG │ │
-│ │ • ReadKG │ │ • CPU budget limit │ │ • Every action │ │
-│ │ • ExecuteTool │ │ • Prevents infinite │ │ • Timestamps │ │
-│ │ • WriteKG (opt) │ │ loops │ │ • Arguments │ │
-│ └─────────────────────┘ └─────────────────────┘ └────────────────┘ │
-│ │
-│ Agent Code → WASM Runtime → Capability Check → Tool Execution → Audit │
-│ │
-└─────────────────────────────────────────────────────────────────────────────┘
-```
+**Like SQLite - but for knowledge graphs.**
 
-
+### 2. HyperMind: Neuro-Symbolic Agent Framework
 
-
+An AI agent layer that uses **the database to prevent hallucinations**. The LLM plans, the database executes.
 
 ```
 ┌─────────────────────────────────────────────────────────────────────────────┐
-│
-│
-│
-│
-│
-│
-│
-│
-
-
-┌─────────────────────────────────────────────────────────────────────────────┐
-│ rust-kgdb: EMBEDDED │
-│ ────────────────────── │
-│ Your App ← contains → rust-kgdb (native addon) │
-│ │
-│ • npm install rust-kgdb - that's it │
-│ • No server, no Docker, no configuration │
-│ • Zero network latency (same process) │
-│ • Deploy as single binary │
+│ HYPERMIND AGENT FRAMEWORK │
+│ │
+│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
+│ │ LLMPlanner │ │ WasmSandbox │ │ ProofDAG │ │ Memory │ │
+│ │ (Claude/GPT)│ │ (Security) │ │ (Audit) │ │ (Hypergraph)│ │
+│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
+│ │
+│ Type Theory: Hindley-Milner types ensure tool composition is valid │
+│ Category Theory: Tools are morphisms (A → B) with composition laws │
+│ Proof Theory: Every execution produces cryptographic audit trail │
 └─────────────────────────────────────────────────────────────────────────────┘
 ```
 
-**
-- **SQLite for RDF**: Like SQLite replaced MySQL for embedded use cases
-- **449ns lookups**: No network roundtrip - direct memory access
-- **Ship as one file**: Your app + database = single deployable
-
-**Scale When You Need To**: Start embedded, scale to cluster when required:
-```
-Embedded (single node) → Clustered (distributed)
-npm install K8s deployment
-No config HDRF partitioning
-Millions of triples Billions of triples
-```
+**The insight:** AI writes questions (SPARQL queries). Database finds answers. No hallucination possible.
 
 ---
 
-##
-
-### The Problem with LLM Tool Calling
-
-Here's a dirty secret about AI agents: **most tool calls are prayers**.
-
-The LLM generates a function call, hopes the types match, and if it fails? Retry and pray harder. This is why production AI systems feel brittle.
-
-We took a different approach: **make incorrect tool calls impossible to express**.
-
-### Category Theory: Not Academic Masturbation
-
-When you hear "category theory," you probably think of mathematicians drawing commutative diagrams that no one understands. Here's why it actually matters for AI agents:
-
-```
-Every tool is a morphism: InputType → OutputType
-
-kg.sparql.query : Query → BindingSet
-kg.motif.find : Pattern → Matches
-kg.datalog.run : Rules → InferredFacts
-```
-
-**The key insight**: If the LLM can only compose morphisms where types align, it *cannot* hallucinate invalid tool chains. It's not about "being careful" - it's about making mistakes unrepresentable.
-
-```javascript
-// This composition type-checks: Query → BindingSet → Aggregation
-planner.compose(sparqlQuery, aggregator) // ✅ Valid
+## Quick Start
 
-
-
+```bash
+npm install rust-kgdb
 ```
````
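The removed *Category Theory* passage models every tool as a morphism `InputType → OutputType` and claims that only type-aligned compositions can be expressed. The sketch below shows that idea in plain JavaScript. `tool`, `compose`, and the stub tool bodies are hypothetical illustrations, not rust-kgdb's `planner.compose` API. Types here are compared by name when `compose` is called, not checked statically.

```javascript
// Hypothetical sketch: tools as morphisms with named input/output types.
// This is not the rust-kgdb / HyperMind planner API.
function tool(name, input, output, fn) {
  return { name, input, output, fn };
}

// compose(f, g) builds g ∘ f : f.input -> g.output, but only when
// f's output type equals g's input type (the morphisms "line up").
function compose(f, g) {
  if (f.output !== g.input) {
    throw new TypeError(
      `cannot compose ${f.name} (${f.input} -> ${f.output}) ` +
        `with ${g.name} (${g.input} -> ${g.output})`
    );
  }
  return tool(`${g.name} ∘ ${f.name}`, f.input, g.output, (x) => g.fn(f.fn(x)));
}

// Stub tools: the query tool ignores its input and returns fixed bindings.
const sparqlQuery = tool("kg.sparql.query", "Query", "BindingSet", (_query) => [
  { provider: "P001" },
  { provider: "P002" },
]);
const countRows = tool("count", "BindingSet", "Aggregation", (rows) => ({ count: rows.length }));

const pipeline = compose(sparqlQuery, countRows);
console.log(pipeline.input, "->", pipeline.output); // Query -> Aggregation
console.log(pipeline.fn("SELECT ?provider WHERE { ?provider a :Provider }")); // { count: 2 }

try {
  compose(countRows, sparqlQuery); // Aggregation does not match Query
} catch (err) {
  console.log(err.message);
}
```

The check only rejects chains whose declared types disagree. It cannot tell whether a well-typed query is the right one to run.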
````diff
 
-###
-
-The **Curry-Howard correspondence** says something profound: **proofs and programs are the same thing**.
-
-In our system:
-- A valid reasoning trace IS a mathematical proof that the answer is correct
-- The type signature of a tool IS a proposition about what it transforms
-- Composing tools IS constructing a proof by implication
+### Basic Database Usage
 
 ```javascript
-
-// This isn't just logging - it's a PROOF OBJECT
-steps: [
-{ tool: 'kg.sparql.query', proves: '∃ provider P001 with 47 claims' },
-{ tool: 'kg.datalog.rule', proves: 'P001 ∈ highRisk (by rule R3)' }
-],
-hash: 'sha256:8f3a...', // Same proof = same hash, always
-valid: true // Type-checked, therefore valid
-}
-```
-
-**Why this matters for compliance**: When a regulator asks "why did you flag this provider?", you don't show them chat logs. You show them a mathematical proof.
+const { GraphDB } = require('rust-kgdb');
 
-
+// Create embedded database (no server needed!)
+const db = new GraphDB('http://lawfirm.com/');
 
-
-
-
-
-
-
-
-**Worst-Case Optimal Joins** (LeapFrog TrieJoin) do something clever:
-- Organize edges in tries by (subject, predicate, object)
-- Traverse all three tries simultaneously
-- Skip entire branches that can't possibly match
-
-```
-Traditional: O(N²) for triangle query
-WCOJ: O(N^(ρ/2)) where ρ = fractional edge cover number
+// Load your data
+db.loadTtl(`
+:Contract_2024_001 :hasClause :NonCompete_3yr .
+:NonCompete_3yr :challengedIn :Martinez_v_Apex .
+:Martinez_v_Apex :court "9th Circuit" ; :year 2021 .
+`);
 
-
-
+// Query with SPARQL (449ns lookups)
+const results = db.querySelect(`
+SELECT ?case ?court WHERE {
+:NonCompete_3yr :challengedIn ?case .
+?case :court ?court
+}
+`);
+// [{case: ':Martinez_v_Apex', court: '9th Circuit'}]
 ```
````
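The removed *Worst-Case Optimal Joins* passage says LeapFrog TrieJoin skips branches that cannot match. The core of that idea is a multi-way intersection over sorted key lists in which each iterator seeks forward to the current largest key.

The sketch below uses that intersection to list triangles one variable at a time: bind `a`, then `b`, then take `c ∈ adj(a) ∩ adj(b)`. It assumes integer node ids and an in-memory adjacency map. It is a simplified single-relation illustration, not rust-kgdb's join engine, and the function names are hypothetical.

```javascript
// Binary-search "seek": first index >= from whose value is >= target.
function seek(list, from, target) {
  let lo = from;
  let hi = list.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (list[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Leapfrog-style intersection of sorted, duplicate-free integer lists.
function leapfrogIntersect(lists) {
  if (lists.length === 0 || lists.some((l) => l.length === 0)) return [];
  const pos = lists.map(() => 0);
  const out = [];
  for (;;) {
    // The largest current key: no list can match anything smaller than it.
    let max = -Infinity;
    for (let i = 0; i < lists.length; i++) max = Math.max(max, lists[i][pos[i]]);
    let allEqual = true;
    for (let i = 0; i < lists.length; i++) {
      pos[i] = seek(lists[i], pos[i], max); // jump over keys that cannot match
      if (pos[i] === lists[i].length) return out;
      if (lists[i][pos[i]] !== max) allEqual = false;
    }
    if (allEqual) {
      out.push(max);
      for (let i = 0; i < lists.length; i++) pos[i] += 1;
      if (lists.some((l, i) => pos[i] === l.length)) return out;
    }
  }
}

// Triangles in an undirected edge list, each reported once as [a, b, c] with a < b < c.
function triangles(edges) {
  const adj = new Map(); // node -> sorted neighbours with a larger id
  for (const [x, y] of edges) {
    if (x === y) continue;
    const [u, v] = x < y ? [x, y] : [y, x];
    if (!adj.has(u)) adj.set(u, []);
    adj.get(u).push(v);
  }
  for (const [u, list] of adj) adj.set(u, [...new Set(list)].sort((p, q) => p - q));

  const result = [];
  for (const [a, outA] of adj) {
    for (const b of outA) {
      const outB = adj.get(b) ?? [];
      for (const c of leapfrogIntersect([outA, outB])) result.push([a, b, c]);
    }
  }
  return result;
}

console.log(triangles([[1, 2], [2, 3], [1, 3], [3, 4], [2, 4]])); // [ [ 1, 2, 3 ], [ 2, 3, 4 ] ]
```

Unlike a pairwise plan, this never materialises the intermediate `edge ⋈ edge` result. The candidates for `c` are produced directly by the intersection.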
````diff
 
-###
-
-A knowledge graph with 1M entities has a 1M × 1M adjacency matrix. That's 1 trillion cells. At 8 bytes each: 8 terabytes. For one matrix.
-
-**CSR (Compressed Sparse Row)** stores only non-zero entries:
-- Real graphs are ~99.99% sparse
-- 1M entities with 10M edges = 10M entries, not 1T
-- Transitive closure becomes matrix multiplication: A* = I + A + A² + ...
-
-```
-rdfs:subClassOf closure in OWL:
-Dense: Impossible (terabytes of memory)
-CSR: Seconds (megabytes of memory)
-```
````
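The removed *CSR* passage argues that storing only the non-zero adjacency entries makes closures such as `rdfs:subClassOf` tractable. The sketch below assumes integer class ids and an in-memory edge list:

- `buildCSR` packs the edges into `rowPtr`/`colIdx` arrays.
- `closureRow` computes one row of the Boolean closure A* = I + A + A² + … by frontier expansion. After k passes, the reached set equals the non-zero entries of that row of I + A + … + Aᵏ.

Both names are hypothetical, and this is not rust-kgdb's reasoner.

```javascript
// Build a CSR (Compressed Sparse Row) adjacency structure from [src, dst] pairs.
// Node i's neighbours live in colIdx[rowPtr[i] .. rowPtr[i + 1] - 1].
function buildCSR(numNodes, edges) {
  const rowPtr = new Array(numNodes + 1).fill(0);
  for (const [src] of edges) rowPtr[src + 1] += 1; // out-degree counts
  for (let i = 0; i < numNodes; i++) rowPtr[i + 1] += rowPtr[i]; // prefix sums
  const colIdx = new Array(edges.length);
  const next = rowPtr.slice(0, numNodes); // write cursor for each row
  for (const [src, dst] of edges) colIdx[next[src]++] = dst;
  return { rowPtr, colIdx };
}

function neighbours(csr, node) {
  return csr.colIdx.slice(csr.rowPtr[node], csr.rowPtr[node + 1]);
}

// Row `start` of the Boolean closure A* = I + A + A^2 + ...
// The start node is the I term; each pass over the frontier adds one more power of A.
function closureRow(csr, start) {
  const reached = new Set([start]);
  let frontier = [start];
  while (frontier.length > 0) {
    const nextFrontier = [];
    for (const u of frontier) {
      for (const v of neighbours(csr, u)) {
        if (!reached.has(v)) {
          reached.add(v);
          nextFrontier.push(v);
        }
      }
    }
    frontier = nextFrontier;
  }
  return [...reached].sort((a, b) => a - b);
}

// subClassOf edges 0 ⊑ 1, 1 ⊑ 2, 3 ⊑ 1 (integer ids stand in for class IRIs).
const csr = buildCSR(4, [[0, 1], [1, 2], [3, 1]]);
console.log(csr.rowPtr, csr.colIdx); // [ 0, 1, 2, 2, 3 ] [ 1, 2, 1 ]
console.log(closureRow(csr, 3)); // [ 1, 2, 3 ]
```

`colIdx` holds one entry per edge and `rowPtr` one per node plus one. Memory therefore grows with the number of edges, not with the square of the number of nodes.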
````diff
+### With HyperMind Agent
 
-
+```javascript
+const { GraphDB, HyperMindAgent } = require('rust-kgdb');
 
-
+const db = new GraphDB('http://insurance.org/');
+db.loadTtl(`
+:Provider_445 :totalClaims 89 ; :avgClaimAmount 47000 ; :denialRate 0.34 .
+:Provider_445 :hasPattern :UnbundledBilling ; :flaggedBy :SIU_2024_Q1 .
+`);
 
-
-
-Iteration 2: Compute ALL ancestor relationships again ← wasteful
-Iteration 3: Compute ALL ancestor relationships again ← really wasteful
-```
+const agent = new HyperMindAgent({ db });
+const result = await agent.ask("Which providers show suspicious billing patterns?");
 
-
+console.log(result.answer);
+// "Provider_445: 34% denial rate, flagged by SIU Q1 2024, unbundled billing pattern"
 
+console.log(result.evidence);
+// Full audit trail proving every fact came from your database
 ```
-Iteration 1: Direct parents (new: 1000 facts)
-Iteration 2: Use only those 1000 new facts → grandparents (new: 800)
-Iteration 3: Use only those 800 new facts → great-grandparents (new: 400)
-...converges in O(depth) iterations, not O(facts)
-```
````
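The removed naive-versus-semi-naive comparison says each round should join only the facts derived in the previous round. The sketch below applies that to the `ancestor` rules used elsewhere in the old README (`ancestor(?X,?Z) :- parent(?X,?Y), ancestor(?Y,?Z)`). It assumes `parent(child, parent)` facts given as `[child, parent]` pairs with string ids that do not contain `|`. `semiNaiveAncestors` is a hypothetical helper, not rust-kgdb's `DatalogProgram`.

```javascript
// ancestor(X, Y) :- parent(X, Y).
// ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
// Semi-naive: each round joins parent/2 only with the delta (facts first derived last round).
function semiNaiveAncestors(parentFacts) {
  const key = (a, b) => `${a}|${b}`;

  // Index parent(X, Y) by Y, the variable shared with ancestor(Y, Z).
  const byParent = new Map();
  for (const [x, y] of parentFacts) {
    if (!byParent.has(y)) byParent.set(y, []);
    byParent.get(y).push(x);
  }

  // Round 1: the base rule turns every parent fact into an ancestor fact.
  const known = new Map();
  let delta = [];
  for (const [x, y] of parentFacts) {
    const k = key(x, y);
    if (!known.has(k)) {
      known.set(k, [x, y]);
      delta.push([x, y]);
    }
  }

  const newPerRound = [];
  while (delta.length > 0) {
    newPerRound.push(delta.length);
    const next = [];
    for (const [y, z] of delta) {
      for (const x of byParent.get(y) ?? []) {
        const k = key(x, z);
        if (!known.has(k)) {
          known.set(k, [x, z]);
          next.push([x, z]);
        }
      }
    }
    delta = next; // only genuinely new facts feed the next round
  }
  return { facts: [...known.values()], newPerRound };
}

// a's parent is b, b's parent is c, c's parent is d.
const { facts, newPerRound } = semiNaiveAncestors([["a", "b"], ["b", "c"], ["c", "d"]]);
console.log(newPerRound); // [ 3, 2, 1 ]
console.log(facts.length); // 6
```

The loop stops as soon as a round derives nothing new. Cyclic `parent` data therefore still terminates, because every derived pair is deduplicated through `known`.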
````diff
-
-### HNSW: O(log N) Similarity in a World of Vectors
 
-
+---
 
-
-- Top layers have few nodes with long-range connections
-- Bottom layers have all nodes with local connections
-- Search: Start at top, greedily descend, refine at bottom
````
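The removed HNSW bullets describe the search: a greedy descent through sparse upper layers, then a refined search on the bottom layer. The sketch below implements only that search phase, on a small hand-built two-layer graph with squared Euclidean distance. Index construction (random level assignment, neighbour selection) is omitted. `searchLayer` and `hnswSearch` are hypothetical names, not rust-kgdb's `EmbeddingService`.

```javascript
// Squared Euclidean distance (ranks neighbours the same way as plain Euclidean).
const dist = (a, b) => a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0);

// Best-first search on one layer, keeping the `ef` closest nodes found so far.
function searchLayer(layer, vectors, q, entryPoints, ef) {
  const d = (id) => dist(vectors[id], q);
  const byDist = (a, b) => d(a) - d(b);
  const visited = new Set(entryPoints);
  const candidates = [...entryPoints];
  const best = [...entryPoints].sort(byDist).slice(0, ef);
  while (candidates.length > 0) {
    candidates.sort(byDist);
    const c = candidates.shift(); // closest node not yet expanded
    if (best.length >= ef && d(c) > d(best[best.length - 1])) break; // cannot improve
    for (const n of layer.get(c) ?? []) {
      if (visited.has(n)) continue;
      visited.add(n);
      if (best.length < ef || d(n) < d(best[best.length - 1])) {
        candidates.push(n);
        best.push(n);
        best.sort(byDist);
        if (best.length > ef) best.pop();
      }
    }
  }
  return best;
}

// layers[0] holds every node; higher layers hold fewer nodes with longer links.
function hnswSearch(layers, vectors, q, entryPoint, k, ef = Math.max(k, 4)) {
  let entry = [entryPoint];
  for (let l = layers.length - 1; l >= 1; l--) {
    entry = searchLayer(layers[l], vectors, q, entry, 1); // greedy descent, beam of 1
  }
  return searchLayer(layers[0], vectors, q, entry, ef).slice(0, k); // refine at the bottom
}

// Eight points on a line: layer 0 links adjacent points, layer 1 is a sparse express lane.
const vectors = [0, 1, 2, 3, 4, 5, 6, 7].map((x) => [x, 0]);
const layer0 = new Map(vectors.map((_, i) => [i, [i - 1, i + 1].filter((j) => j >= 0 && j < 8)]));
const layer1 = new Map([[0, [4]], [4, [0, 7]], [7, [4]]]);

console.log(hnswSearch([layer0, layer1], vectors, [6.2, 0], 0, 2)); // [ 6, 7 ]
```

The upper layer moves the entry point from node 0 to node 7 in two hops. The bottom-layer search then only has to explore the neighbourhood around 7.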
````diff
+## Architecture: Two Layers
 
 ```
-
-
-
-
-
-
-
-
+┌─────────────────────────────────────────────────────────────────────────────────┐
+│ YOUR APPLICATION │
+│ (Fraud Detection, Underwriting, Compliance) │
+└────────────────────────────────────┬────────────────────────────────────────────┘
+│
+┌────────────────────────────────────▼────────────────────────────────────────────┐
+│ HYPERMIND AGENT FRAMEWORK (JavaScript) │
+│ ┌────────────────────────────────────────────────────────────────────────────┐ │
+│ │ • LLMPlanner: Natural language → typed tool pipelines │ │
+│ │ • WasmSandbox: Capability-based security with fuel metering │ │
+│ │ • ProofDAG: Cryptographic audit trail (SHA-256) │ │
+│ │ • MemoryHypergraph: Temporal agent memory with KG integration │ │
+│ │ • TypeId: Hindley-Milner type system with refinement types │ │
+│ └────────────────────────────────────────────────────────────────────────────┘ │
+│ │
+│ Category Theory: Tools as Morphisms (A → B) │
+│ Proof Theory: Every execution has a witness │
+└────────────────────────────────────┬────────────────────────────────────────────┘
+│ NAPI-RS Bindings
+┌────────────────────────────────────▼────────────────────────────────────────────┐
+│ RUST CORE ENGINE (Native Performance) │
+│ ┌────────────────────────────────────────────────────────────────────────────┐ │
+│ │ GraphDB │ RDF/SPARQL quad store │ 449ns lookups, 24 bytes/triple│
+│ │ GraphFrame │ Graph algorithms │ WCOJ optimal joins, PageRank │
+│ │ EmbeddingService │ Vector similarity │ HNSW index, 1-hop ARCADE cache│
+│ │ DatalogProgram │ Rule-based reasoning │ Semi-naive evaluation │
+│ │ Pregel │ BSP graph processing │ Billion-edge scale │
+│ └────────────────────────────────────────────────────────────────────────────┘ │
+│ │
+│ W3C Standards: SPARQL 1.1 (100%) | RDF 1.2 | OWL 2 RL | SHACL | PROV │
+│ Storage Backends: InMemory | RocksDB | LMDB │
+└──────────────────────────────────────────────────────────────────────────────────┘
 ```
 
-**Why this matters**: When your agent needs "similar past queries," it doesn't scan 10,000 embeddings. It finds the top 10 in 16 milliseconds.
-
----
-
-## Core Concepts: What We Bring and Why
-
-### 1. Schema-Aware Query Generation
-**Problem**: LLMs generate SPARQL with made-up predicates (`?person :fakeProperty ?value`).
-**Solution**: We auto-extract your schema and inject it into prompts. The LLM can ONLY reference predicates that actually exist in your data.
-
-### 2. Built-in Database (Not BYODB)
-**Problem**: LangChain/DSPy generate queries, but you need to find a database to run them.
-**Solution**: rust-kgdb IS the database. Generate query → Execute query → Return results. All in one package.
-
-### 3. Audit Trail (Provenance)
-**Problem**: LLM says "Provider P001 is suspicious" - where did that come from?
-**Solution**: Every answer includes a reasoning trace showing which SPARQL queries ran, which rules matched, and what data was found.
-
-### 4. Deterministic Execution
-**Problem**: Ask the same question twice, get different answers.
-**Solution**: Same input → Same query → Same database → Same result → Same hash. Reproducible for compliance.
````
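The removed *Deterministic Execution* item, like the earlier `hash: 'sha256:8f3a...'` proof object, relies on identical reasoning traces hashing identically. One common way to get that property is to serialize with sorted object keys before hashing with SHA-256 from Node's built-in `crypto` module. The sketch below assumes traces are plain JSON values. `canonicalize` and `traceHash` are hypothetical helpers, not the ProofDAG implementation.

```javascript
const crypto = require("crypto");

// Serialize a JSON value with object keys sorted, so traces that differ only in
// key order give the same string (JSON.stringify keeps insertion order).
function canonicalize(value) {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalize(value[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

function traceHash(trace) {
  const digest = crypto.createHash("sha256").update(canonicalize(trace), "utf8").digest("hex");
  return `sha256:${digest}`;
}

const trace = {
  steps: [
    { tool: "kg.sparql.query", proves: "∃ provider P001 with 47 claims" },
    { tool: "kg.datalog.rule", proves: "P001 ∈ highRisk (by rule R3)" },
  ],
};
// The same steps with their keys written in a different order.
const reordered = {
  steps: [
    { proves: "∃ provider P001 with 47 claims", tool: "kg.sparql.query" },
    { proves: "P001 ∈ highRisk (by rule R3)", tool: "kg.datalog.rule" },
  ],
};

console.log(traceHash(trace) === traceHash(reordered)); // true
```

Array order is deliberately kept as-is, so running the same steps in a different sequence yields a different hash.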
````diff
-
-### 5. ARCADE 1-Hop Cache
-**Problem**: Embedding lookups are slow when you need neighborhood context.
-**Solution**: Pre-cache 1-hop neighbors. When you find "Provider", instantly know its outgoing predicates (hasRiskScore, hasClaim) without another query.
````
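The removed *ARCADE 1-Hop Cache* item describes keeping each node's outgoing and incoming predicates at hand, so neighbourhood context needs no extra query. The sketch below shows such a cache, assuming it is updated on every triple insert and keyed by node IRI. `OneHopCache`, `addTo`, and their methods are hypothetical and do not reproduce rust-kgdb's `EmbeddingService` API.

```javascript
// Add `predicate` to the set stored for `node` in `index`.
function addTo(index, node, predicate) {
  if (!index.has(node)) index.set(node, new Set());
  index.get(node).add(predicate);
}

// Hypothetical 1-hop cache: node -> predicates on its outgoing / incoming edges.
class OneHopCache {
  constructor() {
    this.outgoingIndex = new Map();
    this.incomingIndex = new Map();
  }

  // Call once per inserted triple (subject, predicate, object).
  addTriple(subject, predicate, object) {
    addTo(this.outgoingIndex, subject, predicate);
    addTo(this.incomingIndex, object, predicate);
  }

  // Map lookups at read time; no scan over the stored triples.
  outgoing(node) {
    return [...(this.outgoingIndex.get(node) ?? [])].sort();
  }

  incoming(node) {
    return [...(this.incomingIndex.get(node) ?? [])].sort();
  }
}

const cache = new OneHopCache();
cache.addTriple(":Provider123", ":hasRiskScore", "0.87");
cache.addTriple(":Provider123", ":hasClaim", ":Claim9");
cache.addTriple(":Claim9", ":submittedBy", ":Provider123");
console.log(cache.outgoing(":Provider123")); // [ ':hasClaim', ':hasRiskScore' ]
console.log(cache.incoming(":Provider123")); // [ ':submittedBy' ]
```

Storing predicates in `Set`s makes repeated inserts idempotent. The cost is one extra index update per write and memory proportional to the number of distinct (node, predicate) pairs.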
````diff
-
----
-
-## AI Answers You Can Trust
-
-**The Problem**: LLMs hallucinate. They make up facts, invent data, and confidently state falsehoods. In regulated industries (finance, healthcare, legal), this is not just annoying—it's a liability.
-
-**The Solution**: HyperMind grounds every AI answer in YOUR actual data. Every response includes a complete audit trail. Same question = Same answer = Same proof.
-
 ---
 
-##
-
-### Benchmark Methodology
-
-**Dataset**: [LUBM (Lehigh University Benchmark)](http://swat.cse.lehigh.edu/projects/lubm/) - the industry-standard benchmark for RDF/SPARQL systems since 2005. Used by RDFox, Virtuoso, Jena, and all major triple stores.
-
-**Setup**:
-- 3,272 triples, 30 OWL classes, 23 properties
-- 7 query types: attribute (A1-A3), statistical (S1-S2), multi-hop (M1), existence (E1)
-- Model: GPT-4o with real API calls (no mocking)
-- Reproducible: `python3 benchmark-frameworks.py`
-
-**Evaluation Criteria**:
-- Query must parse (no markdown, no explanation text)
-- Query must use correct ontology terms (e.g., `ub:Professor` not `ub:Faculty`)
-- Query must return expected result count
-
-### Honest Framework Comparison
-
-**Important**: HyperMind and LangChain/DSPy are **different product categories**.
-
-| Category | HyperMind | LangChain/DSPy |
-|----------|-----------|----------------|
-| **What It Is** | GraphDB + Agent Framework | LLM Orchestration Library |
-| **Core Function** | Execute queries on data | Chain LLM prompts |
-| **Data Storage** | Built-in QuadStore | None (BYODB) |
-| **Query Execution** | Native SPARQL/Datalog | External DB needed |
-| **Agent Memory** | Built-in (Working + Episodic + KG-backed) | External vector DB needed |
-| **Deep Flashback** | 94% Recall@10 at 10K query depth (16.7ms) | Limited by external provider |
-
-**Why Agent Memory Matters**: We can retrieve relevant past queries from 10,000+ history entries with 94% accuracy in 16.7ms. This enables "flashback" to any past interaction - LangChain/DSPy require external vector DBs for this capability.
-
-**Built-in Capabilities (No External Dependencies)**:
-
-| Capability | HyperMind | LangChain/DSPy |
-|------------|-----------|----------------|
-| **Recursive Reasoning** | Datalog semi-naive evaluation (native) | Manual implementation needed |
-| **Graph Propagation** | Pregel BSP (PageRank, shortest paths) | External library (NetworkX) |
-| **Multi-way Joins** | WCOJ algorithm O(N^(ρ/2)) | No native support |
-| **Pattern Matching** | Motif DSL `(a)-[]->(b); (b)-[]->(c)` | Manual graph traversal |
-| **OWL 2 RL Reasoning** | Sparse matrix CSR/CSC (native) | External reasoner needed |
-| **Vector Similarity** | HNSW + ARCADE 1-hop cache | External vector DB (Pinecone, etc.) |
-| **Transitive Closure** | `ancestor(?X,?Z) :- parent(?X,?Y), ancestor(?Y,?Z)` | Loop implementation |
-| **RDF-Star** | Native quoted triples (RDF 1.2) | Not supported |
-| **Data Validation** | SHACL constraints (W3C) | External validator needed |
-| **Provenance Tracking** | W3C PROV ontology (native) | Manual implementation |
+## Core Components
 
-
+### GraphDB: SPARQL Engine (449ns lookups)
 
-| Metric | HyperMind | Comparison |
-|--------|-----------|------------|
-| **Triple Lookup** | 449 ns | 35x faster than RDFox |
-| **Memory/Triple** | 24 bytes | 25% less than RDFox |
-| **Concurrent Writes** | 132K ops/sec | Thread-safe at scale |
-
-**What Each Is Good For**:
-
-- **HyperMind**: When you need a knowledge graph database WITH agent capabilities. Deterministic execution, audit trails, graph analytics.
-- **LangChain**: When you need to orchestrate multiple LLM calls with prompts. Flexible, extensive integrations.
-- **DSPy**: When you need to optimize prompts programmatically. Research-focused.
-
-### Our Unique Approach: ARCADE 1-Hop Cache
-
-```
-┌─────────────────────────────────────────────────────────────────────────────┐
-│ TEXT → INTENT → EMBEDDING → NEIGHBORS → ACCURATE SPARQL │
-│ (The ARCADE Pipeline) │
-├─────────────────────────────────────────────────────────────────────────────┤
-│ │
-│ 1. TEXT INPUT │
-│ "Find high-risk providers" │
-│ ↓ │
-│ 2. INTENT CLASSIFICATION (Deterministic keyword matching) │
-│ Intent: QUERY_ENTITIES │
-│ Domain: insurance, Entity: provider, Filter: high-risk │
-│ ↓ │
-│ 3. EMBEDDING LOOKUP (HNSW index, 449ns) │
-│ Query: "provider" → Vector [0.23, 0.87, ...] │
-│ Similar entities: [:Provider, :Vendor, :Supplier] │
-│ ↓ │
-│ 4. 1-HOP NEIGHBOR RETRIEVAL (ARCADE Cache) │
-│ :Provider → outgoing: [:hasRiskScore, :hasClaim, :worksFor] │
-│ :Provider → incoming: [:submittedBy, :reviewedBy] │
-│ Cache hit: O(1) lookup, no SPARQL needed │
-│ ↓ │
-│ 5. SCHEMA-AWARE SPARQL GENERATION │
-│ Available predicates: {hasRiskScore, hasClaim, worksFor} │
-│ Filter mapping: "high-risk" → ?score > 0.7 │
-│ Generated: SELECT ?p WHERE { ?p :hasRiskScore ?s . FILTER(?s > 0.7) } │
-│ │
-├─────────────────────────────────────────────────────────────────────────────┤
-│ WHY THIS WORKS: │
-│ • Step 2: NO LLM needed - deterministic pattern matching │
-│ • Step 3: Embedding similarity finds related concepts │
-│ • Step 4: ARCADE cache provides schema context in O(1) │
-│ • Step 5: Schema injection ensures only valid predicates used │
-│ │
-│ ARCADE = Adaptive Retrieval Cache for Approximate Dense Embeddings │
-│ Paper: https://arxiv.org/abs/2104.08663 │
-└─────────────────────────────────────────────────────────────────────────────┘
-```
````
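Steps 2 and 5 of the removed ARCADE pipeline diagram describe deterministic keyword matching, followed by SPARQL generation restricted to predicates that exist in the schema. The sketch below covers only those two steps and omits the embedding and cache steps. The keyword tables and function names are hypothetical; the `0.7` threshold is taken from the diagram's own example. This is not rust-kgdb's planner.

```javascript
// Hypothetical keyword tables: step 2 needs no LLM, only string matching.
const ENTITY_KEYWORDS = { provider: ":Provider", claim: ":Claim" };
const FILTER_RULES = {
  "high-risk": { predicate: "hasRiskScore", op: ">", value: 0.7 },
};

function classify(text) {
  const lower = text.toLowerCase();
  const entity = Object.keys(ENTITY_KEYWORDS).find((k) => lower.includes(k)) ?? null;
  const filter = Object.keys(FILTER_RULES).find((k) => lower.includes(k)) ?? null;
  return { intent: entity ? "QUERY_ENTITIES" : "UNKNOWN", entity, filter };
}

// Step 5: emit SPARQL only if the predicate it needs exists in the schema.
function generateSparql(parsed, schemaPredicates) {
  if (parsed.intent !== "QUERY_ENTITIES" || parsed.filter === null) return null;
  const rule = FILTER_RULES[parsed.filter];
  if (!schemaPredicates.has(rule.predicate)) {
    throw new Error(`predicate :${rule.predicate} is not in the schema`);
  }
  return `SELECT ?p WHERE { ?p :${rule.predicate} ?s . FILTER(?s ${rule.op} ${rule.value}) }`;
}

const parsed = classify("Find high-risk providers");
console.log(generateSparql(parsed, new Set(["hasRiskScore", "hasClaim", "worksFor"])));
// SELECT ?p WHERE { ?p :hasRiskScore ?s . FILTER(?s > 0.7) }
```

Rejecting the query outright when a predicate is missing from the schema is one design choice. An alternative is to fall back to asking the user, which keeps the "only valid predicates" guarantee the diagram describes.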
````diff
-
-**Embedding Trigger Setup** (automatic on triple insert):
 ```javascript
-const {
+const { GraphDB } = require('rust-kgdb');
 
-const db = new GraphDB('http://example.org/')
-const embeddings = new EmbeddingService()
+const db = new GraphDB('http://example.org/');
 
-//
-db.loadTtl(':
-// Triggers: embeddings.onTripleInsert('Provider123', 'hasRiskScore', '0.87', null)
-// 1-hop cache updated: Provider123 → outgoing: [hasRiskScore]
-```
+// Load Turtle format
+db.loadTtl(':alice :knows :bob . :bob :knows :charlie .');
 
-
+// SPARQL SELECT
+const results = db.querySelect('SELECT ?x WHERE { :alice :knows ?x }');
 
-
-
-│ CAPABILITY COMPARISON: What Can Actually Execute on Data │
-├─────────────────────────────────────────────────────────────────────────────┤
-│ │
-│ Capability │ HyperMind │ LangChain/DSPy │
-│ ───────────────────────────────────────────────────────── │
-│ Generate Motif Pattern │ ✅ │ ✅ │
-│ Generate Datalog Rules │ ✅ │ ✅ │
-│ Execute Motif on Data │ ✅ │ ❌ (no DB) │
-│ Execute Datalog Rules │ ✅ │ ❌ (no DB) │
-│ Execute SPARQL Queries │ ✅ │ ❌ (no DB) │
-│ GraphFrame Analytics │ ✅ │ ❌ (no DB) │
-│ Deterministic Results │ ✅ │ ❌ │
-│ Audit Trail/Provenance │ ✅ │ ❌ │
-│ ───────────────────────────────────────────────────────── │
-│ TOTAL │ 8/8 │ 2/8 │
-│ │
-│ NOTE: LangChain/DSPy CAN execute on data if you integrate a database. │
-│ HyperMind has the database BUILT-IN. │
-│ │
-│ Reproduce: node benchmark-e2e-execution.js │
-└─────────────────────────────────────────────────────────────────────────────┘
-```
-
-### Memory Retrieval Depth Benchmark
-
-Based on academic benchmarks: MemQ (arXiv 2503.05193), mKGQAgent (Text2SPARQL 2025), MTEB.
-
-```
-┌─────────────────────────────────────────────────────────────────────────────┐
-│ BENCHMARK: Memory Retrieval at Depth (50 queries per depth) │
-│ METHODOLOGY: LUBM schema-driven queries, HNSW index, random seed 42 │
-├─────────────────────────────────────────────────────────────────────────────┤
-│ │
-│ DEPTH │ P50 LATENCY │ P95 LATENCY │ Recall@5 │ Recall@10 │ MRR │
-│ ──────────────────────────────────────────────────────────────────────────│
-│ 10 │ 0.06 ms │ 0.26 ms │ 78% │ 100% │ 0.68 │
-│ 100 │ 0.50 ms │ 0.75 ms │ 88% │ 98% │ 0.42 │
-│ 1,000 │ 1.59 ms │ 5.03 ms │ 80% │ 94% │ 0.50 │
-│ 10,000 │ 16.71 ms │ 17.37 ms │ 76% │ 94% │ 0.54 │
-│ ──────────────────────────────────────────────────────────────────────────│
-│ │
-│ KEY INSIGHT: Even at 10,000 stored queries, Recall@10 stays at 94% │
-│ Sub-17ms retrieval from 10K query pool = practical for production use │
-│ │
-│ Reproduce: node memory-retrieval-benchmark.js │
-└─────────────────────────────────────────────────────────────────────────────┘
-```
````
+// SPARQL CONSTRUCT
+const graph = db.queryConstruct('CONSTRUCT { ?x :connected ?y } WHERE { ?x :knows ?y }');

-
+// Named graphs
+db.loadTtl(':data1 :value "100" .', 'http://example.org/graph1');

-
-
-│ BENCHMARK: Triple Store Performance (vs Industry Leaders) │
-│ METHODOLOGY: Criterion.rs statistical benchmarking, LUBM dataset │
-├─────────────────────────────────────────────────────────────────────────────┤
-│ │
-│ METRIC rust-kgdb RDFox Jena Neo4j │
-│ ───────────────────────────────────────────────────────────── │
-│ Lookup Speed 449 ns ~5 µs ~150 µs ~5 µs │
-│ Memory/Triple 24 bytes 36-89 bytes 50-60 bytes 70+ bytes │
-│ Bulk Insert 146K/sec ~200K/sec ~50K/sec ~100K/sec │
-│ Concurrent Writes 132K/sec N/A N/A N/A │
-│ ───────────────────────────────────────────────────────────── │
-│ │
-│ ADVANTAGE: 35x faster lookups than RDFox, 25% less memory │
-│ THIS IS WHERE WE GENUINELY WIN - raw database performance. │
-│ │
-└─────────────────────────────────────────────────────────────────────────────┘
+// Count triples
+console.log(`Total: ${db.countTriples()} triples`);
 ```
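The memory-retrieval benchmark removed in this hunk reports Recall@5, Recall@10 and MRR at each depth. To help read those columns, here is a minimal sketch of how these ranking metrics are usually computed. The data shape (a list of `ranked` ids plus a `relevant` set for each query) and the function names are our own assumptions. This is not code from `memory-retrieval-benchmark.js`.

```javascript
// Minimal sketch (not from memory-retrieval-benchmark.js): standard ranking
// metrics averaged over a batch of queries.
// Each query is { ranked: [id, ...], relevant: Set<id> } and is assumed to
// have at least one relevant item.

// Recall@k: fraction of a query's relevant items found in its top k, averaged.
function recallAtK(queries, k) {
  let total = 0;
  for (const { ranked, relevant } of queries) {
    const hits = ranked.slice(0, k).filter((id) => relevant.has(id)).length;
    total += hits / relevant.size;
  }
  return total / queries.length;
}

// MRR: mean of 1 / (rank of the first relevant item), or 0 when none is retrieved.
function meanReciprocalRank(queries) {
  let total = 0;
  for (const { ranked, relevant } of queries) {
    const idx = ranked.findIndex((id) => relevant.has(id));
    total += idx === -1 ? 0 : 1 / (idx + 1);
  }
  return total / queries.length;
}

// Example: three queries, one relevant item each.
const queries = [
  { ranked: ['q7', 'q2', 'q9'], relevant: new Set(['q7']) }, // hit at rank 1
  { ranked: ['q1', 'q4', 'q3'], relevant: new Set(['q3']) }, // hit at rank 3
  { ranked: ['q5', 'q6', 'q8'], relevant: new Set(['q0']) }, // missed
];
console.log(recallAtK(queries, 1)); // 1/3
console.log(recallAtK(queries, 3)); // 2/3
console.log(meanReciprocalRank(queries)); // (1 + 1/3 + 0) / 3
```

MRR only credits the position of the first relevant hit. It can therefore move independently of Recall@10, as the 100- and 1,000-query rows of that table show.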

-###
-
-```
-┌─────────────────────────────────────────────────────────────────────────────┐
-│ BENCHMARK: LUBM SPARQL Generation Accuracy │
-│ DATASET: 3,272 triples │ MODEL: GPT-4o │ Real API calls │
-├─────────────────────────────────────────────────────────────────────────────┤
-│ │
-│ FRAMEWORK NO SCHEMA WITH SCHEMA │
-│ ───────────────────────────────────────────────────────────── │
-│ Vanilla OpenAI 0.0% 71.4% │
-│ LangChain 0.0% 71.4% │
-│ DSPy 14.3% 71.4% │
-│ ───────────────────────────────────────────────────────────── │
-│ │
-│ HONEST TRUTH: Schema injection improves ALL frameworks equally. │
-│ Any framework + schema context achieves ~71% accuracy. │
-│ │
-│ NOTE: DSPy gets 14.3% WITHOUT schema (vs 0% for others) due to │
-│ its structured output format. With schema, all converge to 71.4%. │
-│ │
-│ OUR REAL VALUE: We include the database. Others don't. │
-│ - LangChain generates SPARQL → you need to find a database │
-│ - HyperMind generates SPARQL → executes on built-in 449ns database │
-│ │
-│ Reproduce: python3 benchmark-frameworks.py │
-└─────────────────────────────────────────────────────────────────────────────┘
-```
-
----
-
-## The Difference: Manual vs Integrated
-
-### Manual Approach (Works, But Tedious)
+### GraphFrame: Graph Analytics

 ```javascript
-
-const LUBM_SCHEMA = `
-PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
-Classes: University, Department, Professor, Student, Course, Publication
-Properties: teacherOf(Faculty→Course), worksFor(Faculty→Department)
-`;
-
-// STEP 2: Pass schema to LLM
-const answer = await openai.chat.completions.create({
-  model: 'gpt-4o',
-  messages: [
-    { role: 'system', content: `${LUBM_SCHEMA}\nOutput raw SPARQL only.` },
-    { role: 'user', content: 'Find suspicious providers' }
-  ]
-});
+const { GraphFrame, friendsGraph } = require('rust-kgdb');

-//
-const
+// Create from vertices and edges
+const gf = new GraphFrame(
+  JSON.stringify([{id:'alice'}, {id:'bob'}, {id:'charlie'}]),
+  JSON.stringify([
+    {src:'alice', dst:'bob'},
+    {src:'bob', dst:'charlie'},
+    {src:'charlie', dst:'alice'}
+  ])
+);

-//
-
-
-
-
+// Algorithms
+console.log('PageRank:', gf.pageRank(0.15, 20));
+console.log('Connected Components:', gf.connectedComponents());
+console.log('Triangles:', gf.triangleCount()); // 1
+console.log('Shortest Paths:', gf.shortestPaths('alice'));

-//
-
+// Motif finding (pattern matching)
+const motifs = gf.find('(a)-[e1]->(b); (b)-[e2]->(c)');
 ```
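The added GraphFrame example calls `gf.pageRank(0.15, 20)` and annotates `gf.triangleCount()` with `// 1`. The library-independent sketch below runs both computations on the same three-edge cycle, so those outputs can be checked without the native module. It assumes `0.15` is the reset probability and `20` the iteration count, following the `resetProbability`/`maxIter` convention of Spark GraphFrames. This README does not confirm that rust-kgdb uses the same parameter order or score normalisation.

```javascript
// Library-independent sketch; NOT the rust-kgdb implementation.
// Assumes pageRank(resetProbability, maxIter) semantics as in Spark GraphFrames,
// using the normalised form in which all scores sum to 1.

function pageRank(vertices, edges, resetProb, maxIter) {
  const n = vertices.length;
  const outDeg = new Map(vertices.map((v) => [v, 0]));
  for (const { src } of edges) outDeg.set(src, outDeg.get(src) + 1);

  let rank = new Map(vertices.map((v) => [v, 1 / n]));
  for (let i = 0; i < maxIter; i++) {
    // Rank held by vertices with no out-edges is spread uniformly.
    let dangling = 0;
    for (const v of vertices) if (outDeg.get(v) === 0) dangling += rank.get(v);

    const base = resetProb / n + ((1 - resetProb) * dangling) / n;
    const next = new Map(vertices.map((v) => [v, base]));
    for (const { src, dst } of edges) {
      next.set(dst, next.get(dst) + ((1 - resetProb) * rank.get(src)) / outDeg.get(src));
    }
    rank = next;
  }
  return rank;
}

function triangleCount(vertices, edges) {
  // Treat edges as undirected and count each vertex triple exactly once.
  const adj = new Map(vertices.map((v) => [v, new Set()]));
  for (const { src, dst } of edges) {
    if (src === dst) continue;
    adj.get(src).add(dst);
    adj.get(dst).add(src);
  }
  const order = new Map(vertices.map((v, i) => [v, i]));
  let count = 0;
  for (const a of vertices) {
    for (const b of adj.get(a)) {
      if (order.get(b) <= order.get(a)) continue;
      for (const c of adj.get(b)) {
        if (order.get(c) <= order.get(b)) continue;
        if (adj.get(a).has(c)) count++;
      }
    }
  }
  return count;
}

const vertices = ['alice', 'bob', 'charlie'];
const edges = [
  { src: 'alice', dst: 'bob' },
  { src: 'bob', dst: 'charlie' },
  { src: 'charlie', dst: 'alice' },
];
console.log(pageRank(vertices, edges, 0.15, 20)); // each score ≈ 1/3
console.log(triangleCount(vertices, edges)); // 1
```

On a directed cycle every vertex has the same in- and out-degree, so every score stays at 1/3 whatever the reset probability. With edges treated as undirected, the three vertices form exactly one triangle.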

-###
+### EmbeddingService: Vector Similarity (HNSW)

 ```javascript
-
-const { HyperMindAgent, GraphDB } = require('rust-kgdb');
-
-const db = new GraphDB('http://insurance.org/');
-db.loadTtl(yourActualData, null); // Schema auto-extracted from data
-
-const agent = new HyperMindAgent({ kg: db, model: 'gpt-4o' });
-const result = await agent.call('Find suspicious providers');
-
-console.log(result.answer);
-// "Provider PROV001 has risk score 0.87 with 47 claims over $50,000"
-
-// WHAT YOU GET (ALL AUTOMATIC):
-// ✅ Schema auto-extracted (no manual prompt engineering)
-// ✅ Query executed on built-in database (no external DB needed)
-// ✅ Full audit trail included
-// ✅ Reproducible hash for compliance
-
-console.log(result.reasoningTrace);
-// [
-// { tool: 'kg.sparql.query', input: 'SELECT ?p WHERE...', output: '[PROV001]' },
-// { tool: 'kg.datalog.apply', input: 'highRisk(?p) :- ...', output: 'MATCHED' }
-// ]
-
-console.log(result.hash);
-// "sha256:8f3a2b1c..." - Same question = Same answer = Same hash
-```
+const { EmbeddingService } = require('rust-kgdb');

-
-- **Manual**: Write schema, integrate database, build audit trail yourself
-- **HyperMind**: Database + schema extraction + audit trail built-in
+const embeddings = new EmbeddingService();

-
+// Store 384-dimensional vectors (bring your own from OpenAI, Voyage, etc.)
+embeddings.storeVector('claim_001', await getOpenAIEmbedding('soft tissue injury'));
+embeddings.storeVector('claim_002', await getOpenAIEmbedding('whiplash from accident'));

-
+// Build HNSW index
+embeddings.rebuildIndex();

-
-
-│ APPROACH COMPARISON │
-├───────────────────────────────────────────────────────────────────────────┤
-│ │
-│ TRADITIONAL: CODE GENERATION OUR APPROACH: NO CODE GENERATION │
-│ ──────────────────────────── ──────────────────────────────── │
-│ │
-│ User → LLM → Generate Code User → Domain-Enriched Proxy │
-│ │
-│ ❌ SLOW: LLM generates text ✅ FAST: Pre-built typed tools │
-│ ❌ ERROR-PRONE: Syntax errors ✅ RELIABLE: Schema-validated │
-│ ❌ UNPREDICTABLE: Different ✅ DETERMINISTIC: Same every time │
-│ │
-├───────────────────────────────────────────────────────────────────────────┤
-│ TRADITIONAL FLOW OUR FLOW │
-│ ──────────────── ──────── │
-│ │
-│ 1. User asks question 1. User asks question │
-│ 2. LLM generates code (SLOW) 2. Intent matched (INSTANT) │
-│ 3. Code has syntax error? 3. Schema object consulted │
-│ 4. Retry with LLM (SLOW) 4. Typed tool selected │
-│ 5. Code runs, wrong result? 5. Query built from schema │
-│ 6. Retry with LLM (SLOW) 6. Validated & executed │
-│ 7. Maybe works after 3-5 tries 7. Works first time │
-│ │
-├───────────────────────────────────────────────────────────────────────────┤
-│ OUR DOMAIN-ENRICHED PROXY LAYER │
-│ ─────────────────────────────── │
-│ │
-│ ┌─────────────────────────────────────────────────────────────────────┐ │
-│ │ CONTEXT THEORY (Spivak's Ologs) │ │
-│ │ SchemaContext = { classes: Set, properties: Map, domains, ranges } │ │
-│ │ → Defines WHAT can be queried (schema as category) │ │
-│ └─────────────────────────────────────────────────────────────────────┘ │
-│ │ │
-│ ▼ │
-│ ┌─────────────────────────────────────────────────────────────────────┐ │
-│ │ TYPE THEORY (Hindley-Milner) │ │
-│ │ TOOL_REGISTRY = { 'kg.sparql.query': Query → BindingSet, ... } │ │
-│ │ → Defines HOW tools compose (typed morphisms) │ │
-│ └─────────────────────────────────────────────────────────────────────┘ │
-│ │ │
-│ ▼ │
-│ ┌─────────────────────────────────────────────────────────────────────┐ │
-│ │ PROOF THEORY (Curry-Howard) │ │
-│ │ ProofDAG = { derivations: [...], hash: "sha256:..." } │ │
-│ │ → Proves HOW answer was derived (audit trail) │ │
-│ └─────────────────────────────────────────────────────────────────────┘ │
-│ │
-├───────────────────────────────────────────────────────────────────────────┤
-│ RESULTS: SPEED + ACCURACY │
-│ ───────────────────────── │
-│ │
-│ TRADITIONAL (Code Gen) OUR APPROACH (Proxy Layer) │
-│ • 2-5 seconds per query • <100ms per query (20-50x FASTER) │
-│ • 0-14% accuracy (no schema) • 71% accuracy (schema auto-injected) │
-│ • Retry loops on errors • No retries needed │
-│ • $0.01-0.05 per query • <$0.001 per query (cached patterns) │
-│ │
-├───────────────────────────────────────────────────────────────────────────┤
-│ WHY NO CODE GENERATION: │
-│ ─────────────────────── │
-│ 1. CODE GEN IS SLOW: LLM takes 1-3 seconds per query │
-│ 2. CODE GEN IS ERROR-PRONE: Syntax errors, hallucination │
-│ 3. CODE GEN IS EXPENSIVE: Every query costs LLM tokens │
-│ 4. CODE GEN IS NON-DETERMINISTIC: Same question → different code │
-│ │
-│ OUR PROXY LAYER PROVIDES: │
-│ 1. SPEED: Deterministic planner runs in milliseconds │
-│ 2. ACCURACY: Schema object ensures only valid predicates │
-│ 3. COST: No LLM needed for query generation │
-│ 4. DETERMINISM: Same input → same query → same result → same hash │
-└───────────────────────────────────────────────────────────────────────────┘
-```
+// Find similar (16ms for 10K vectors)
+const similar = embeddings.findSimilar('claim_001', 10, 0.7);

-
-
-
-│
-└── LLM generates JSON/code (SLOW, ERROR-PRONE)
-Tool executes blindly (NO VALIDATION)
-Result returned (NO PROOF)
-
-(20-40% accuracy, 2-5 sec/query, $0.01-0.05/query)
-
-OUR APPROACH: User → Proxied Objects → WASM Sandbox → RPC → Real Systems
-│
-├── SchemaContext (Context Theory)
-│ └── Live object: { classes: Set, properties: Map }
-│ └── NOT serialized JSON string
-│
-├── TOOL_REGISTRY (Type Theory)
-│ └── Typed morphisms: Query → BindingSet
-│ └── Composition validated at compile-time
-│
-├── WasmSandbox (Secure Execution)
-│ └── Capability-based: ReadKG, ExecuteTool
-│ └── Fuel metering: prevents infinite loops
-│ └── Full audit log: every action traced
-│
-├── rust-kgdb via NAPI-RS (Native RPC)
-│ └── 449ns lookups (not HTTP round-trips)
-│ └── Zero-copy data transfer
-│
-└── ProofDAG (Proof Theory)
-└── Every answer has derivation chain
-└── Deterministic hash for reproducibility
-
-(71% accuracy with schema, <100ms/query, <$0.001/query)
+// 1-hop neighbor cache (ARCADE algorithm)
+embeddings.onTripleInsert('claim_001', 'claimant', 'person_123', null);
+const neighbors = embeddings.getNeighborsOut('person_123');
 ```
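A natural reading of `embeddings.findSimilar('claim_001', 10, 0.7)` is "at most 10 neighbours of `claim_001` with a similarity of at least 0.7". That interpretation, and the use of cosine similarity, are assumptions rather than documented behaviour. HNSW approximates exact nearest-neighbour search. An exact brute-force scan like the sketch below is therefore a useful reference when checking results on a small sample. `findSimilarExact` and the toy vectors are hypothetical.

```javascript
// Exact (brute-force) k-nearest-neighbour search by cosine similarity.
// Hypothetical reference for what an HNSW query approximates; the
// (id, k, threshold) interpretation is an assumption, not rust-kgdb's API.

function cosine(a, b) {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function findSimilarExact(store, queryId, k, threshold) {
  const q = store.get(queryId);
  if (!q) throw new Error(`unknown id: ${queryId}`);
  const scored = [];
  for (const [id, vec] of store) {
    if (id === queryId) continue; // exclude the query vector itself
    const score = cosine(q, vec);
    if (score >= threshold) scored.push({ id, score });
  }
  scored.sort((x, y) => y.score - x.score); // best match first
  return scored.slice(0, k);
}

// Toy 3-dimensional vectors stand in for real 384-dimensional embeddings.
const store = new Map([
  ['claim_001', [1, 0, 0]],
  ['claim_002', [0.9, 0.1, 0]],
  ['claim_003', [0, 1, 0]],
  ['claim_004', [0.7, 0.7, 0]],
]);
console.log(findSimilarExact(store, 'claim_001', 10, 0.7));
// claim_002 (~0.994), then claim_004 (~0.707); claim_003 (0) is filtered out
```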

-
-- **Context Theory**: `SchemaContext` object defines what CAN be queried
-- **Type Theory**: `TOOL_REGISTRY` object defines typed tool signatures
-- **Proof Theory**: `ProofDAG` object proves how answer was derived
+### DatalogProgram: Rule-Based Reasoning

-
-
-- **RPC to Real Systems**: Queries execute on rust-kgdb (449ns native performance)
-- **WASM Sandbox**: Capability-based security, fuel metering, full audit trail
+```javascript
+const { DatalogProgram, evaluateDatalog } = require('rust-kgdb');

-
+const datalog = new DatalogProgram();

-
+// Add facts
+datalog.addFact(JSON.stringify({predicate:'knows', terms:['alice','bob']}));
+datalog.addFact(JSON.stringify({predicate:'knows', terms:['bob','charlie']}));

-
+// Add rules (recursive!)
+datalog.addRule(JSON.stringify({
+  head: {predicate:'connected', terms:['?X','?Z']},
+  body: [
+    {predicate:'knows', terms:['?X','?Y']},
+    {predicate:'knows', terms:['?Y','?Z']}
+  ]
+}));

-
-
+// Evaluate (semi-naive fixpoint)
+const inferred = evaluateDatalog(datalog);
+// connected(alice, charlie) - derived!
 ```
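The added rule joins `knows` with itself once, so it derives two-hop pairs only. Because `connected` does not appear in its own body, the program is not recursive, despite the `(recursive!)` comment. A transitive version would pair a base rule `connected(?X,?Y) :- knows(?X,?Y)` with a recursive rule `connected(?X,?Z) :- knows(?X,?Y), connected(?Y,?Z)`. The sketch below evaluates that recursive program semi-naively in plain JavaScript. Each round joins `knows` only with the facts derived in the previous round (the delta), and evaluation stops at the fixpoint. It illustrates the technique and does not describe how `evaluateDatalog` is implemented.

```javascript
// Semi-naive evaluation of the recursive program
//   connected(X, Y) :- knows(X, Y).
//   connected(X, Z) :- knows(X, Y), connected(Y, Z).
// Sketch only; it does not use or describe rust-kgdb's DatalogProgram.

function transitiveClosure(knows) {
  const key = (x, y) => `${x}\u0000${y}`;

  // Index knows(X, Y) by Y for the join knows(X, Y), delta(Y, Z).
  const knowsByDst = new Map();
  for (const [x, y] of knows) {
    if (!knowsByDst.has(y)) knowsByDst.set(y, []);
    knowsByDst.get(y).push(x);
  }

  // Base rule: every knows fact is a connected fact.
  const connected = new Map(); // key -> [x, z]
  let delta = [];
  for (const [x, y] of knows) {
    if (!connected.has(key(x, y))) {
      connected.set(key(x, y), [x, y]);
      delta.push([x, y]);
    }
  }

  let rounds = 0;
  while (delta.length > 0) {
    rounds++;
    const next = [];
    // Only facts new in the previous round can produce new facts.
    for (const [y, z] of delta) {
      for (const x of knowsByDst.get(y) ?? []) {
        const k = key(x, z);
        if (!connected.has(k)) {
          connected.set(k, [x, z]);
          next.push([x, z]);
        }
      }
    }
    delta = next;
  }
  return { facts: [...connected.values()], rounds };
}

const knows = [['alice', 'bob'], ['bob', 'charlie'], ['charlie', 'dave']];
const { facts, rounds } = transitiveClosure(knows);
console.log(facts); // 6 pairs, including ['alice', 'charlie'] and ['alice', 'dave']
```

On the three-edge chain the loop runs three rounds, and the last round finds nothing new. A fact is added only if it is not already known, so the same code also terminates on cyclic `knows` graphs.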

-
-
-### Basic Usage (5 Lines)
+### Pregel: Billion-Edge Graph Processing

 ```javascript
-const {
-
-const db = new GraphDB('http://example.org/')
-db.loadTtl(':alice :knows :bob .', null)
-const results = db.querySelect('SELECT ?who WHERE { ?who :knows :bob }')
-console.log(results) // [{ bindings: { who: 'http://example.org/alice' } }]
-```
+const { pregelShortestPaths, chainGraph } = require('rust-kgdb');

-
+// Create large graph
+const graph = chainGraph(10000); // 10K vertices

-
-const
-
-// Load your data
-const db = createSchemaAwareGraphDB('http://insurance.org/')
-db.loadTtl(`
-@prefix : <http://insurance.org/> .
-:CLM001 a :Claim ; :amount "50000" ; :provider :PROV001 .
-:PROV001 a :Provider ; :riskScore "0.87" ; :name "MedCorp" .
-`, null)
-
-// Create AI agent
-const agent = new HyperMindAgent({
-  kg: db,
-  model: 'gpt-4o',
-  apiKey: process.env.OPENAI_API_KEY
-})
-
-// Ask questions in plain English
-const result = await agent.call('Find high-risk providers')
-
-// Every answer includes:
-// - The SPARQL query that was generated
-// - The data that was retrieved
-// - A reasoning trace showing how the conclusion was reached
-// - A cryptographic hash for reproducibility
-console.log(result.answer)
-console.log(result.reasoningTrace) // Full audit trail
+// Run Pregel BSP algorithm
+const distances = pregelShortestPaths(graph, 'v0', 100);
 ```
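`pregelShortestPaths(graph, 'v0', 100)` runs a bulk-synchronous, vertex-centric computation. The sketch below shows that pattern on an in-memory chain:

- In each superstep, every vertex that received a smaller distance records it and sends `distance + 1` to its out-neighbours.
- A combiner keeps only the smallest message per target.
- The run ends when no messages are in flight or the superstep limit is reached.

Reading the third argument as a superstep limit is our assumption. `makeChain` is a hypothetical stand-in for `chainGraph`, whose return shape the README does not show.

```javascript
// Pregel-style (bulk-synchronous) single-source shortest paths, unit edge weights.
// Illustrative sketch; makeChain is a hypothetical stand-in for chainGraph.

function makeChain(n) {
  const edges = new Map();
  for (let i = 0; i < n; i++) edges.set(`v${i}`, i + 1 < n ? [`v${i + 1}`] : []);
  return edges; // vertex id -> out-neighbour ids
}

function bspShortestPaths(edges, source, maxSupersteps) {
  const dist = new Map([...edges.keys()].map((v) => [v, Infinity]));
  // Superstep 0: the source receives distance 0.
  let inbox = new Map([[source, 0]]);
  let superstep = 0;

  while (inbox.size > 0 && superstep < maxSupersteps) {
    const outbox = new Map();
    for (const [v, candidate] of inbox) {
      if (candidate >= dist.get(v)) continue; // no improvement: vertex stays halted
      dist.set(v, candidate);
      for (const w of edges.get(v)) {
        // Combiner: keep only the smallest message per destination.
        const msg = candidate + 1;
        if (!outbox.has(w) || msg < outbox.get(w)) outbox.set(w, msg);
      }
    }
    inbox = outbox; // barrier: messages are delivered in the next superstep
    superstep++;
  }
  return { dist, superstep };
}

const chain = makeChain(10);
const { dist, superstep } = bspShortestPaths(chain, 'v0', 100);
console.log(dist.get('v9')); // 9
```

On a chain the distance front advances one hop per superstep. If the third argument really is a superstep limit, a value of 100 on a 10,000-vertex chain would leave most vertices unreached, with distance `Infinity` in this sketch.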

 ---

-##
-
-The following code snippets show EXACTLY how each framework was tested. All tests use the same LUBM dataset (3,272 triples) and GPT-4o model with real API calls—no mocking.
+## HyperMind Agent Framework

-
-
-### Vanilla OpenAI (0% → 71.4% with schema)
-
-```python
-# WITHOUT SCHEMA: 0% accuracy
-from openai import OpenAI
-client = OpenAI()
-
-response = client.chat.completions.create(
-    model="gpt-4o",
-    messages=[{"role": "user", "content": "Find all teachers"}]
-)
-# Returns: Long explanation with markdown code blocks
-# FAILS: No usable SPARQL query
-```
-
-```python
-# WITH SCHEMA: 71.4% accuracy (+71.4 pp improvement)
-LUBM_SCHEMA = """
-PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
-Classes: University, Department, Professor, Student, Course, Publication
-Properties: teacherOf(Faculty→Course), worksFor(Faculty→Department)
-"""
-
-response = client.chat.completions.create(
-    model="gpt-4o",
-    messages=[{
-        "role": "system",
-        "content": f"{LUBM_SCHEMA}\nOutput raw SPARQL only, no markdown."
-    }, {
-        "role": "user",
-        "content": "Find all teachers"
-    }]
-)
-# Returns: SELECT DISTINCT ?teacher WHERE { ?teacher a ub:Professor . }
-# WORKS: Valid SPARQL using correct ontology terms
-```
+### Why Vanilla LLMs Fail

-### LangChain (0% → 71.4% with schema)
-
-```python
-# WITHOUT SCHEMA: 0% accuracy
-from langchain_openai import ChatOpenAI
-from langchain_core.prompts import PromptTemplate
-from langchain_core.output_parsers import StrOutputParser
-
-llm = ChatOpenAI(model="gpt-4o")
-template = PromptTemplate(
-    input_variables=["question"],
-    template="Generate SPARQL for: {question}"
-)
-chain = template | llm | StrOutputParser()
-result = chain.invoke({"question": "Find all teachers"})
-# Returns: Explanation + markdown code blocks
-# FAILS: Not executable SPARQL
 ```
+User: "Find all professors"

-
-
-
-
-
-
-
-
-
-
-chain = template | llm | StrOutputParser()
-result = chain.invoke({"question": "Find all teachers", "schema": LUBM_SCHEMA})
-# Returns: SELECT DISTINCT ?teacher WHERE { ?teacher a ub:Professor . }
-# WORKS: Schema injection guides correct predicate selection
+Vanilla LLM Output:
+┌───────────────────────────────────────────────────────────────────────┐
+│ ```sparql │
+│ SELECT ?professor WHERE { ?professor a ub:Faculty . } │
+│ ``` ← Parser rejects markdown │
+│ │
+│ This query retrieves faculty members. │
+│ ↑ Mixed text breaks parsing │
+└───────────────────────────────────────────────────────────────────────┘
+Result: ❌ PARSER ERROR - Invalid SPARQL syntax
 ```
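The failure shown above is a query wrapped in a Markdown fence and followed by prose. That is partly a parsing problem, and a common mitigation in any framework is to extract the query before passing it to a SPARQL parser. A minimal sketch follows; `extractSparql` is a hypothetical helper. It fixes only the packaging: it cannot tell whether a class or predicate such as `ub:Faculty` is the one the question needs, which is what schema context is for.

```javascript
// Hypothetical helper: recover a bare SPARQL query from chatty LLM output.
// Prefers the first fenced block (with or without a language tag); otherwise
// accepts the text only if it already starts like a SPARQL query.

// Built with repeat() so this example can sit inside a Markdown code fence.
const FENCE = '`'.repeat(3);

function extractSparql(text) {
  const fenced = new RegExp(FENCE + '[^\\n]*\\n([\\s\\S]*?)' + FENCE);
  const m = text.match(fenced);
  if (m) return m[1].trim();

  const trimmed = text.trim();
  return /^(PREFIX|SELECT|CONSTRUCT|ASK|DESCRIBE)\b/i.test(trimmed) ? trimmed : null;
}

const llmOutput = [
  FENCE + 'sparql',
  'SELECT ?professor WHERE { ?professor a ub:Faculty . }',
  FENCE,
  '',
  'This query retrieves faculty members.',
].join('\n');

console.log(extractSparql(llmOutput));
// SELECT ?professor WHERE { ?professor a ub:Faculty . }
```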

-
-
-```python
-# WITHOUT SCHEMA: 14.3% accuracy (best without schema!)
-import dspy
-from dspy import LM
-
-lm = LM("openai/gpt-4o")
-dspy.configure(lm=lm)
-
-class SPARQLGenerator(dspy.Signature):
-    """Generate SPARQL query."""
-    question = dspy.InputField()
-    sparql = dspy.OutputField(desc="Raw SPARQL query only")
+**Problems:** (1) Markdown code fences, (2) Wrong class name (Faculty vs Professor), (3) Mixed text

-
-result = generator(question="Find all teachers")
-# Returns: SELECT ?teacher WHERE { ?teacher a :Teacher . }
-# PARTIAL: Sometimes works due to DSPy's structured output
-```
+### How HyperMind Solves This

-```python
-# WITH SCHEMA: 71.4% accuracy (+57.1 pp improvement)
-class SchemaSPARQLGenerator(dspy.Signature):
-    """Generate SPARQL query using the provided schema."""
-    schema = dspy.InputField(desc="Database schema with classes and properties")
-    question = dspy.InputField(desc="Natural language question")
-    sparql = dspy.OutputField(desc="Raw SPARQL query, no markdown")
-
-generator = dspy.Predict(SchemaSPARQLGenerator)
-result = generator(schema=LUBM_SCHEMA, question="Find all teachers")
-# Returns: SELECT DISTINCT ?teacher WHERE { ?teacher a ub:Professor . }
-# WORKS: Schema + DSPy structured output = reliable queries
 ```
+User: "Find all professors"

-
-
-
-
-
-
-const db = createSchemaAwareGraphDB('http://university.org/');
-db.loadTtl(lubmData, null); // Load LUBM 3,272 triples
-
-const agent = new HyperMindAgent({
-  kg: db,
-  model: 'gpt-4o',
-  apiKey: process.env.OPENAI_API_KEY
-});
-
-const result = await agent.call('Find all teachers');
-// Schema auto-extracted: { classes: Set(30), properties: Map(23) }
-// Query generated: SELECT ?x WHERE { ?x ub:teacherOf ?course . }
-// Result: 39 faculty members who teach courses
-
-console.log(result.reasoningTrace);
-// [{ tool: 'kg.sparql.query', query: 'SELECT...', bindings: 39 }]
-console.log(result.hash);
-// "sha256:a7b2c3..." - Reproducible answer
+HyperMind Output:
+┌───────────────────────────────────────────────────────────────────────┐
+│ PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#> │
+│ SELECT ?professor WHERE { ?professor a ub:Professor . } │
+└───────────────────────────────────────────────────────────────────────┘
+Result: ✅ 15 results returned in 2.3ms
 ```

-**
+**Why it works:**
+1. **Schema-aware** - Knows actual class names from your ontology
+2. **Type-checked** - Query validated before execution
+3. **No text pollution** - Output is pure SPARQL, not markdown

-
-
-## Use Cases
+**Accuracy: 0% → 86.4%** (LUBM benchmark, 14 queries)

-###
+### Agent Components

 ```javascript
-const
-
-
-
-
-
-
-
-//
-
-
-
-
-
-
-
-
-
-
-
-
-const result = await agent.call('Check GDPR compliance for customer data flows')
-// Returns: Compliance status with verifiable reasoning chain
-```
+const {
+  HyperMindAgent,
+  LLMPlanner,
+  WasmSandbox,
+  AgentBuilder,
+  TOOL_REGISTRY
+} = require('rust-kgdb');
+
+// Build custom agent
+const agent = new AgentBuilder('fraud-detector')
+  .withTool('kg.sparql.query')
+  .withTool('kg.datalog.infer')
+  .withTool('kg.embeddings.search')
+  .withPlanner(new LLMPlanner('claude-sonnet-4', TOOL_REGISTRY))
+  .withSandbox({
+    capabilities: ['ReadKG', 'ExecuteTool'], // No WriteKG
+    fuelLimit: 1000000,
+    maxMemory: 64 * 1024 * 1024
+  })
+  .build();

-
+// Execute with natural language
+const result = await agent.call("Find circular payment patterns");

-
-
-// Returns: Risk score with complete derivation
-// - Which data points were used
-// - Which rules were applied
-// - Confidence intervals
+// Get cryptographic proof
+console.log(result.witness.proof_hash); // sha256:a3f2b8c9...
 ```
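The `withSandbox` options combine two controls: a capability allow-list (`ReadKG` and `ExecuteTool`, deliberately without `WriteKG`) and a fuel budget. The sketch below shows that policy logic in plain JavaScript around ordinary functions. It involves no WebAssembly and is not rust-kgdb's `WasmSandbox`. The tool-to-capability mapping is hypothetical. A fixed per-call cost also stands in for the per-instruction fuel accounting that WASM runtimes such as Wasmtime perform.

```javascript
// Conceptual sketch of capability checks plus fuel metering for tool calls.
// Not WASM and not rust-kgdb's WasmSandbox; REQUIRED_CAPS and costs are hypothetical.

const REQUIRED_CAPS = {
  'kg.sparql.query': ['ReadKG', 'ExecuteTool'],
  'kg.sparql.update': ['WriteKG', 'ExecuteTool'],
};

class PolicySandbox {
  constructor({ capabilities, fuelLimit }) {
    this.caps = new Set(capabilities);
    this.fuel = fuelLimit;
    this.auditLog = []; // every attempt is recorded, allowed or not
  }

  call(tool, cost, fn) {
    const required = REQUIRED_CAPS[tool] ?? ['ExecuteTool'];
    const missing = required.filter((c) => !this.caps.has(c));
    if (missing.length > 0) {
      this.auditLog.push({ tool, status: 'denied', missing });
      throw new Error(`${tool} denied: missing ${missing.join(', ')}`);
    }
    if (cost > this.fuel) {
      this.auditLog.push({ tool, status: 'out_of_fuel' });
      throw new Error(`${tool} denied: out of fuel`);
    }
    this.fuel -= cost; // charge before running the tool
    const result = fn();
    this.auditLog.push({ tool, status: 'ok', cost });
    return result;
  }
}

const sandbox = new PolicySandbox({ capabilities: ['ReadKG', 'ExecuteTool'], fuelLimit: 100 });
console.log(sandbox.call('kg.sparql.query', 40, () => ['entity001']));
try {
  sandbox.call('kg.sparql.update', 10, () => 'written');
} catch (e) {
  console.log(e.message); // kg.sparql.update denied: missing WriteKG
}
```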

-
-
-## Features
-
-### Core Database (SPARQL 1.1)
-| Feature | Description |
-|---------|-------------|
-| **SELECT/CONSTRUCT/ASK** | Full SPARQL 1.1 query support |
-| **INSERT/DELETE/UPDATE** | SPARQL Update operations |
-| **64 Builtin Functions** | String, numeric, date/time, hash functions |
-| **Named Graphs** | Quad-based storage with graph isolation |
-| **RDF-Star** | Statements about statements |
-
-### Rule-Based Reasoning (Datalog)
-| Feature | Description |
-|---------|-------------|
-| **Facts & Rules** | Define base facts and inference rules |
-| **Semi-naive Evaluation** | Efficient incremental computation |
-| **Recursive Queries** | Transitive closure, ancestor chains |
-
-### Graph Analytics (GraphFrames)
-| Feature | Description |
-|---------|-------------|
-| **PageRank** | Iterative node importance ranking |
-| **Connected Components** | Find isolated subgraphs |
-| **Shortest Paths** | BFS path finding from landmarks |
-| **Triangle Count** | Graph density measurement |
-| **Motif Finding** | Structural pattern matching DSL |
-
-### Vector Similarity (Embeddings)
-| Feature | Description |
-|---------|-------------|
-| **HNSW Index** | O(log N) approximate nearest neighbor |
-| **Multi-provider** | OpenAI, Anthropic, Ollama support |
-| **Composite Search** | RRF aggregation across providers |
-
-### AI Agent Framework (HyperMind)
-| Feature | Description |
-|---------|-------------|
-| **Schema-Aware** | Auto-extracts schema from your data |
-| **Typed Tools** | Input/output validation prevents errors |
-| **Audit Trail** | Every answer is traceable |
-| **Memory** | Working, episodic, and long-term memory |
-
-### Schema-Aware Generation (Proxied Tools)
-
-Generate motif patterns and Datalog rules from natural language using schema injection:
+### WASM Sandbox: Secure Execution

 ```javascript
-const
+const sandbox = new WasmSandbox({
+  capabilities: ['ReadKG', 'ExecuteTool'], // Fine-grained
+  fuelLimit: 1000000, // CPU metering
+  maxMemory: 64 * 1024 * 1024 // Memory limit
+});

-
-
+// All tool calls are:
+// ✓ Capability-checked
+// ✓ Fuel-metered
+// ✓ Memory-bounded
+// ✓ Logged for audit
+```

-
+### Execution Witness (Audit Trail)

-
-const motif = await planner.generateMotifFromText('Find circular payment patterns');
-// Returns: {
-//   pattern: "(a)-[transfers]->(b); (b)-[transfers]->(c); (c)-[transfers]->(a)",
-//   variables: ["a", "b", "c"],
-//   predicatesUsed: ["transfers"],
-//   confidence: 0.9
-// }
+Every execution produces a cryptographic proof:

-
-
-
-
-
-
-
-
-
-// }
+```json
+{
+  "tool": "kg.sparql.query",
+  "input": "SELECT ?x WHERE { ?x a :Fraud }",
+  "output": "[{x: 'entity001'}]",
+  "timestamp": "2024-12-14T10:30:00Z",
+  "durationMs": 12,
+  "hash": "sha256:a3f2c8d9..."
+}
 ```
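The witness record carries a `sha256:` hash, but the README does not say which fields are hashed or how they are serialised. The sketch below assumes one scheme:

1. Drop run-specific fields.
2. Serialise with object keys sorted at every level.
3. Hash with Node's built-in `crypto.createHash('sha256')`.

Canonical serialisation is what lets equal content hash equally regardless of key order. Conversely, if fields such as `timestamp` or `durationMs` were included, two otherwise identical runs would produce different hashes.

```javascript
// Sketch: deterministic SHA-256 over a JSON record. Field selection and
// serialisation are assumptions, not rust-kgdb's documented witness format.
const crypto = require('node:crypto');

// Serialise with object keys sorted at every level, so key order cannot change
// the hash. Assumes JSON-compatible values (no undefined, functions or BigInt).
function canonicalJson(value) {
  if (Array.isArray(value)) return '[' + value.map(canonicalJson).join(',') + ']';
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value).sort();
    return '{' + keys.map((k) => JSON.stringify(k) + ':' + canonicalJson(value[k])).join(',') + '}';
  }
  return JSON.stringify(value);
}

// Hash every field except those that legitimately differ between runs.
function witnessHash(record, volatile = ['timestamp', 'durationMs', 'hash']) {
  const stable = Object.fromEntries(
    Object.entries(record).filter(([k]) => !volatile.includes(k))
  );
  return 'sha256:' + crypto.createHash('sha256').update(canonicalJson(stable)).digest('hex');
}

const runA = {
  tool: 'kg.sparql.query',
  input: 'SELECT ?x WHERE { ?x a :Fraud }',
  output: "[{x: 'entity001'}]",
  timestamp: '2024-12-14T10:30:00Z',
  durationMs: 12,
};
// Same content, different key order and timing.
const runB = {
  durationMs: 15,
  output: "[{x: 'entity001'}]",
  timestamp: '2024-12-14T11:02:41Z',
  input: 'SELECT ?x WHERE { ?x a :Fraud }',
  tool: 'kg.sparql.query',
};
console.log(witnessHash(runA) === witnessHash(runB)); // true
```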

-**
-
-### Available Tools
-| Tool | Input → Output | Description |
-|------|----------------|-------------|
-| `kg.sparql.query` | Query → BindingSet | Execute SPARQL SELECT |
-| `kg.sparql.update` | Update → Result | Execute SPARQL UPDATE |
-| `kg.datalog.apply` | Rules → InferredFacts | Apply Datalog rules |
-| `kg.motif.find` | Pattern → Matches | Find graph patterns |
-| `kg.embeddings.search` | Entity → SimilarEntities | Vector similarity |
-| `kg.graphframes.pagerank` | Graph → Scores | Rank nodes |
-| `kg.graphframes.components` | Graph → Components | Find communities |
-
-### Performance
-| Metric | Value | Comparison |
-|--------|-------|------------|
-| **Lookup Speed** | 449 ns | 5-10x faster than RDFox (verified Dec 2025) |
-| **Bulk Insert** | 146K triples/sec | Production-grade |
-| **Memory** | 24 bytes/triple | Best-in-class efficiency |
-
-### Join Optimization (WCOJ)
-| Feature | Description |
-|---------|-------------|
-| **WCOJ Algorithm** | Worst-case optimal joins with O(N^(ρ/2)) complexity |
-| **Multi-way Joins** | Process multiple patterns simultaneously |
-| **Adaptive Plans** | Cost-based optimizer selects best strategy |
-
-**Research Foundation**: WCOJ algorithms are the state-of-the-art for graph pattern matching. See [Tentris WCOJ Update (ISWC 2025)](https://papers.dice-research.org/2025/ISWC_Tentris-WCOJ-Update/public.pdf) for latest research.
-
-### Ontology & Reasoning
-| Feature | Description |
-|---------|-------------|
-| **RDFS Reasoner** | Subclass/subproperty inference |
-| **OWL 2 RL** | Rule-based OWL reasoning (prp-dom, prp-rng, prp-symp, prp-trp, cls-hv, cls-svf, cax-sco) |
-| **SHACL** | W3C shapes constraint validation |
-
-### Distribution (Clustered Mode)
-| Feature | Description |
-|---------|-------------|
-| **HDRF Partitioning** | Streaming graph partitioning (subject-anchored) |
-| **Raft Consensus** | Distributed coordination |
-| **gRPC** | Inter-node communication |
-| **Kubernetes-Native** | Helm charts, health checks |
-
-### Storage Backends
-| Backend | Use Case |
-|---------|----------|
-| **InMemory** | Development, testing, small datasets |
-| **RocksDB** | Production, large datasets, ACID |
-| **LMDB** | Read-heavy workloads, memory-mapped |
-
-### Mobile Support
-| Platform | Binding |
-|----------|---------|
-| **iOS** | Swift via UniFFI 0.30 |
-| **Android** | Kotlin via UniFFI 0.30 |
-| **Node.js** | NAPI-RS (this package) |
-| **Python** | UniFFI (separate package) |
+**Compliance:** Full audit trail for SOX, GDPR, FDA 21 CFR Part 11.

 ---

-##
-
-| Category | Feature | What It Does |
-|----------|---------|--------------|
-| **Core** | GraphDB | High-performance RDF/SPARQL quad store |
-| **Core** | SPOC Indexes | Four-way indexing (SPOC/POCS/OCSP/CSPO) |
-| **Core** | Dictionary | String interning with 8-byte IDs |
-| **Analytics** | GraphFrames | PageRank, connected components, triangles |
-| **Analytics** | Motif Finding | Pattern matching DSL |
-| **Analytics** | Pregel | BSP parallel graph processing |
-| **AI** | Embeddings | HNSW similarity with 1-hop ARCADE cache |
-| **AI** | HyperMind | Neuro-symbolic agent framework |
-| **Reasoning** | Datalog | Semi-naive evaluation engine |
-| **Reasoning** | RDFS Reasoner | Subclass/subproperty inference |
-| **Reasoning** | OWL 2 RL | Rule-based OWL reasoning |
|
|
1177
|
-
| **Ontology** | SHACL | W3C shapes constraint validation |
|
|
1178
|
-
| **Joins** | WCOJ | Worst-case optimal join algorithm |
|
|
1179
|
-
| **Distribution** | HDRF | Streaming graph partitioning |
|
|
1180
|
-
| **Distribution** | Raft | Consensus for coordination |
|
|
1181
|
-
| **Mobile** | iOS/Android | Swift and Kotlin bindings via UniFFI |
|
|
1182
|
-
| **Storage** | InMemory/RocksDB/LMDB | Three backend options |
|
|
397
|
+
## Agent Memory: Deep Flashback
|
|
1183
398
|
|
|
1184
|
-
|
|
399
|
+
Most AI agents have amnesia. Ask the same question twice, they start from scratch.
|
|
400
|
+
|
|
401
|
+
### The Problem
|
|
1185
402
|
|
|
1186
|
-
|
|
403
|
+
- ChatGPT forgets after context window fills
|
|
404
|
+
- LangChain rebuilds context every call (~500ms)
|
|
405
|
+
- Vector databases return "similar" docs, not exact matches
|
|
1187
406
|
|
|
1188
|
-
###
|
|
407
|
+
### Our Solution: Memory Hypergraph
|
|
1189
408
|
|
|
1190
409
|
```
|
|
1191
410
|
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
1192
|
-
│
|
|
1193
|
-
│ "Find suspicious providers" │
|
|
1194
|
-
└─────────────────────────────────┬───────────────────────────────────────────┘
|
|
1195
|
-
│
|
|
1196
|
-
▼
|
|
1197
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
1198
|
-
│ STEP 1: SCHEMA INJECTION │
|
|
1199
|
-
│ │
|
|
1200
|
-
│ LLM receives your question PLUS your actual data schema: │
|
|
1201
|
-
│ • Classes: Claim, Provider, Policy (from YOUR database) │
|
|
1202
|
-
│ • Properties: amount, riskScore, claimCount (from YOUR database) │
|
|
1203
|
-
│ │
|
|
1204
|
-
│ The LLM can ONLY reference things that actually exist in your data. │
|
|
1205
|
-
└─────────────────────────────────┬───────────────────────────────────────────┘
|
|
1206
|
-
│
|
|
1207
|
-
▼
|
|
1208
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
1209
|
-
│ STEP 2: TYPED EXECUTION PLAN │
|
|
1210
|
-
│ │
|
|
1211
|
-
│ LLM generates a plan using typed tools: │
|
|
1212
|
-
│ 1. kg.sparql.query("SELECT ?p WHERE { ?p :riskScore ?r . FILTER(?r > 0.8)}")│
|
|
1213
|
-
│ 2. kg.datalog.apply("suspicious(?p) :- highRisk(?p), highClaimCount(?p)") │
|
|
1214
|
-
│ │
|
|
1215
|
-
│ Each tool has defined inputs/outputs. Invalid combinations rejected. │
|
|
1216
|
-
└─────────────────────────────────┬───────────────────────────────────────────┘
|
|
1217
|
-
│
|
|
1218
|
-
▼
|
|
1219
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
1220
|
-
│ STEP 3: DATABASE EXECUTION │
|
|
411
|
+
│ MEMORY HYPERGRAPH │
|
|
1221
412
|
│ │
|
|
1222
|
-
│
|
|
1223
|
-
│
|
|
1224
|
-
│
|
|
1225
|
-
│
|
|
1226
|
-
│
|
|
1227
|
-
|
|
1228
|
-
|
|
1229
|
-
|
|
1230
|
-
|
|
1231
|
-
│
|
|
1232
|
-
│
|
|
1233
|
-
│
|
|
1234
|
-
│
|
|
1235
|
-
│
|
|
1236
|
-
│
|
|
413
|
+
│ AGENT MEMORY LAYER │
|
|
414
|
+
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
|
|
415
|
+
│ │ Episode:001 │ │ Episode:002 │ │ Episode:003 │ │
|
|
416
|
+
│ │ "Fraud ring │ │ "Denied │ │ "Follow-up │ │
|
|
417
|
+
│ │ detected" │ │ claim" │ │ on P001" │ │
|
|
418
|
+
│ │ Dec 10 │ │ Dec 12 │ │ Dec 15 │ │
|
|
419
|
+
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
|
|
420
|
+
│ │ │ │ │
|
|
421
|
+
│ └───────────────────┼───────────────────┘ │
|
|
422
|
+
│ │ HyperEdges connect to KG │
|
|
423
|
+
│ ▼ │
|
|
424
|
+
│ KNOWLEDGE GRAPH LAYER │
|
|
425
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
426
|
+
│ │ Provider:P001 ──────▶ Claim:C123 ◀────── Claimant:John │ │
|
|
427
|
+
│ │ │ │ │ │ │
|
|
428
|
+
│ │ ▼ ▼ ▼ │ │
|
|
429
|
+
│ │ riskScore: 0.87 amount: 50000 address: "123 Main" │ │
|
|
430
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1237
431
|
│ │
|
|
1238
|
-
│
|
|
432
|
+
│ SAME QUAD STORE - Single SPARQL query traverses BOTH! │
|
|
1239
433
|
└─────────────────────────────────────────────────────────────────────────────┘
|
|
1240
434
|
```
|
|
1241
435
|
|
|
1242
|
-
###
|
|
436
|
+
### Benchmarked Performance
|
|
1243
437
|
|
|
1244
|
-
|
|
|
1245
|
-
|
|
1246
|
-
|
|
|
1247
|
-
|
|
|
1248
|
-
|
|
|
1249
|
-
|
|
|
438
|
+
| Metric | Result | What It Means |
|
|
439
|
+
|--------|--------|---------------|
|
|
440
|
+
| **Memory Retrieval** | 94% Recall@10 at 10K depth | Find the right past query 94% of the time |
|
|
441
|
+
| **Search Speed** | 16.7ms for 10K queries | 30x faster than typical RAG |
|
|
442
|
+
| **Write Throughput** | 132K ops/sec (16 workers) | Handle enterprise volumes |
|
|
443
|
+
| **Read Throughput** | 302 ops/sec concurrent | Consistent under load |
|
|
1250
444
|
|
|
1251
|
-
|
|
445
|
+
### Idempotent Responses
|
|
1252
446
|
|
|
1253
|
-
|
|
447
|
+
Same question = Same answer. Even with different wording.
|
|
1254
448
|
|
|
1255
|
-
|
|
1256
|
-
|
|
1257
|
-
|
|
449
|
+
```javascript
|
|
450
|
+
// First call: Compute answer, cache with semantic hash
|
|
451
|
+
const result1 = await agent.call("Analyze claims from Provider P001");
|
|
1258
452
|
|
|
1259
|
-
|
|
1260
|
-
|
|
1261
|
-
|
|
1262
|
-
loadTtl(ttlContent: string, graphName: string | null): void
|
|
1263
|
-
querySelect(sparql: string): QueryResult[]
|
|
1264
|
-
query(sparql: string): TripleResult[]
|
|
1265
|
-
countTriples(): number
|
|
1266
|
-
clear(): void
|
|
1267
|
-
}
|
|
453
|
+
// Second call (different wording): Cache HIT!
|
|
454
|
+
const result2 = await agent.call("Show me P001's claim patterns");
|
|
455
|
+
// Same semantic hash → Same result
|
|
1268
456
|
```
|
|
1269
457
|
|
|
1270
|
-
|
|
1271
|
-
|
|
1272
|
-
```typescript
|
|
1273
|
-
class HyperMindAgent {
|
|
1274
|
-
constructor(options: {
|
|
1275
|
-
kg: GraphDB, // Your knowledge graph
|
|
1276
|
-
model?: string, // 'gpt-4o' | 'claude-3-opus' | etc.
|
|
1277
|
-
apiKey?: string, // LLM API key
|
|
1278
|
-
memory?: MemoryManager,
|
|
1279
|
-
scope?: AgentScope,
|
|
1280
|
-
embeddings?: EmbeddingService
|
|
1281
|
-
})
|
|
1282
|
-
|
|
1283
|
-
call(prompt: string): Promise<AgentResponse>
|
|
1284
|
-
}
|
|
458
|
+
---
|
|
1285
459
|
|
|
1286
|
-
|
|
1287
|
-
answer: string
|
|
1288
|
-
reasoningTrace: ReasoningStep[] // Audit trail
|
|
1289
|
-
hash: string // Reproducibility hash
|
|
1290
|
-
}
|
|
1291
|
-
```
|
|
460
|
+
## Mathematical Foundations
|
|
1292
461
|
|
|
1293
|
-
###
|
|
462
|
+
### Category Theory: Tools as Morphisms
|
|
1294
463
|
|
|
1295
|
-
```typescript
|
|
1296
|
-
class GraphFrame {
|
|
1297
|
-
constructor(verticesJson: string, edgesJson: string)
|
|
1298
|
-
pageRank(resetProb: number, maxIter: number): string
|
|
1299
|
-
connectedComponents(): string
|
|
1300
|
-
shortestPaths(landmarks: string[]): string
|
|
1301
|
-
triangleCount(): number
|
|
1302
|
-
find(pattern: string): string // Motif pattern matching
|
|
1303
|
-
}
|
|
1304
464
|
```
|
|
465
|
+
Tools are typed arrows:
|
|
466
|
+
kg.sparql.query: Query → BindingSet
|
|
467
|
+
kg.motif.find: Pattern → Matches
|
|
468
|
+
kg.datalog.apply: Rules → InferredFacts
|
|
1305
469
|
|
|
1306
|
-
|
|
470
|
+
Composition is type-checked:
|
|
471
|
+
f: A → B
|
|
472
|
+
g: B → C
|
|
473
|
+
g ∘ f: A → C (valid only if B matches)
|
|
1307
474
|
|
|
1308
|
-
|
|
1309
|
-
|
|
1310
|
-
|
|
1311
|
-
findSimilar(entityId: string, k: number, threshold: number): string
|
|
1312
|
-
rebuildIndex(): void
|
|
1313
|
-
}
|
|
475
|
+
Laws guaranteed:
|
|
476
|
+
Identity: id ∘ f = f
|
|
477
|
+
Associativity: (h ∘ g) ∘ f = h ∘ (g ∘ f)
|
|
1314
478
|
```
|
|
1315
479
|
|
|
1316
|
-
|
|
480
|
+
**In practice:** The AI can only chain tools where outputs match inputs. Like Lego blocks that must fit.
|
|
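The type-checking rule fits in a few lines. The tool names and type names below come from the signatures above. The stubbed `run()` bodies and the `bindingsToRules` adapter are hypothetical. They are there only to show that `kg.sparql.query` (→ BindingSet) cannot feed `kg.datalog.apply` (Rules →) directly.

```javascript
// Sketch: tools as typed arrows with checked composition. Signatures mirror
// the README; every run() body is a stub, and bindingsToRules is a
// hypothetical adapter (not a rust-kgdb tool).
function tool(name, input, output, run) {
  return { name, input, output, run };
}

// g ∘ f exists only when f's output type equals g's input type.
function compose(g, f) {
  if (f.output !== g.input) {
    throw new TypeError(`cannot compose ${g.name} ∘ ${f.name}: ${f.output} ≠ ${g.input}`);
  }
  return tool(`${g.name} ∘ ${f.name}`, f.input, g.output, (x) => g.run(f.run(x)));
}

// Identity arrow on a type: compose(id(B), f) behaves exactly like f.
const id = (type) => tool(`id<${type}>`, type, type, (x) => x);

const sparqlQuery = tool('kg.sparql.query', 'Query', 'BindingSet', () => [{ p: ':P001' }]);
const bindingsToRules = tool('bindingsToRules', 'BindingSet', 'Rules',
  (rows) => rows.map((row) => `highRisk(${row.p}).`));
const datalogApply = tool('kg.datalog.apply', 'Rules', 'InferredFacts', (rules) => rules.length);

const plan = compose(datalogApply, compose(bindingsToRules, sparqlQuery));
console.log(`${plan.input} → ${plan.output}`);                 // Query → InferredFacts
console.log(plan.run('SELECT ?p WHERE { ?p :riskScore ?r }')); // 1

const sameAsQuery = compose(id('BindingSet'), sparqlQuery); // identity law

try {
  compose(datalogApply, sparqlQuery); // BindingSet does not match Rules
} catch (err) {
  console.log(err.message);
}
```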
1317
481
|
|
|
1318
|
-
|
|
1319
|
-
class DatalogProgram {
|
|
1320
|
-
addFact(factJson: string): void
|
|
1321
|
-
addRule(ruleJson: string): void
|
|
1322
|
-
}
|
|
482
|
+
### WCOJ: Worst-Case Optimal Joins
|
|
1323
483
|
|
|
1324
|
-
|
|
1325
|
-
function queryDatalog(program: DatalogProgram, query: string): string
|
|
1326
|
-
```
|
|
484
|
+
Finding "all cases where Judge X ruled on Contract Y involving Company Z"?
|
|
1327
485
|
|
|
1328
|
-
|
|
486
|
+
**Traditional:** Join one pattern at a time: every case with Judge X (50K), then every contract pairing (500K combinations), then every company (25M checks).
|
|
1329
487
|
|
|
1330
|
-
|
|
488
|
+
**WCOJ:** Keep sorted indexes. Walk through all three simultaneously. Skip impossible combinations. 50K checks instead of 25 million.
|
|
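The "walk through all three simultaneously" step is a leapfrog intersection over sorted ID lists. That is the building block of Leapfrog Triejoin, one published WCOJ algorithm; this README does not say which variant rust-kgdb implements. The sketch below intersects three sorted lists of hypothetical case IDs and uses binary search to skip values that cannot match. A full WCOJ repeats this step for each join variable over sorted indexes.

```javascript
// Leapfrog intersection: the per-variable step of Leapfrog Triejoin.
// Each list must be sorted ascending with no duplicates.

// Index of the first element >= target at or after `from` (binary search).
function seek(arr, from, target) {
  let lo = from;
  let hi = arr.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (arr[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

function leapfrogIntersect(lists) {
  if (lists.length === 0 || lists.some((l) => l.length === 0)) return [];
  const pos = lists.map(() => 0);
  const out = [];
  let max = Math.max(...lists.map((l) => l[0]));
  for (;;) {
    let allEqual = true;
    for (let i = 0; i < lists.length; i++) {
      pos[i] = seek(lists[i], pos[i], max);       // skip values that cannot match
      if (pos[i] === lists[i].length) return out; // a list ran out: done
      if (lists[i][pos[i]] > max) {
        max = lists[i][pos[i]];
        allEqual = false;
      }
    }
    if (allEqual) {
      out.push(max); // every list is positioned on the same ID
      pos[0]++;
      if (pos[0] === lists[0].length) return out;
      max = lists[0][pos[0]];
    }
  }
}

// Hypothetical, already-sorted case IDs from three index lookups.
const casesWithJudgeX = [3, 8, 15, 21, 42, 57, 90];
const casesWithContractY = [1, 8, 21, 33, 42, 90, 120];
const casesWithCompanyZ = [8, 9, 21, 42, 64, 90];

console.log(leapfrogIntersect([casesWithJudgeX, casesWithContractY, casesWithCompanyZ]));
// [ 8, 21, 42, 90 ]
```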
1331
489
|
|
|
1332
|
-
###
|
|
490
|
+
### HNSW: Hierarchical Navigable Small World
|
|
1333
491
|
|
|
1334
|
-
|
|
1335
|
-
const { GraphDB } = require('rust-kgdb')
|
|
492
|
+
Finding similar items from 50,000 vectors?
|
|
1336
493
|
|
|
1337
|
-
|
|
1338
|
-
db.loadTtl(`
|
|
1339
|
-
@prefix : <http://example.org/> .
|
|
1340
|
-
:alice :knows :bob .
|
|
1341
|
-
:bob :knows :charlie .
|
|
1342
|
-
:charlie :knows :alice .
|
|
1343
|
-
`, null)
|
|
494
|
+
**Brute force:** Compare to all 50,000. O(n).
|
|
1344
495
|
|
|
1345
|
-
|
|
496
|
+
**HNSW:** Build a multi-layer graph. Start at the top layer and greedily descend toward the target: about 20 hops, O(log n).
|
|
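Here is a toy version of that descent. The sketch searches 1,024 one-dimensional points through five hand-built layers: on layer L, point i links to i ± 4^L whenever i is a multiple of 4^L. It then compares the greedy result with brute force. Real HNSW works differently in three ways that are not modeled here: it assigns layers randomly, keeps several candidates per step (the `ef` parameter), and picks neighbors heuristically.

```javascript
// Toy HNSW-style search: greedy descent through progressively denser layers.
// Layers are hand-built (not random) and only one candidate is kept per step.
const N = 1024;
const TOP = 4; // top layer uses stride 4^4 = 256: nodes 0, 256, 512, 768
const points = Array.from({ length: N }, (_, i) => [i]); // 1-D "vectors"
const dist = (a, b) => Math.hypot(...a.map((x, k) => x - b[k]));

// On layer L, point i (a multiple of 4^L) links to i - 4^L and i + 4^L.
function neighbors(i, level) {
  const stride = 4 ** level;
  return [i - stride, i + stride].filter((j) => j >= 0 && j < N);
}

function hnswSearch(query) {
  let current = 0; // entry point on the top layer
  let hops = 0;
  for (let level = TOP; level >= 0; level--) {
    // Move to any closer neighbor until none is closer, then drop a layer.
    let improved = true;
    while (improved) {
      improved = false;
      for (const j of neighbors(current, level)) {
        if (dist(points[j], query) < dist(points[current], query)) {
          current = j;
          hops++;
          improved = true;
        }
      }
    }
  }
  return { nearest: current, hops };
}

// Brute force for comparison: one distance computation per point, O(n).
function bruteForce(query) {
  let best = 0;
  for (let i = 1; i < N; i++) {
    if (dist(points[i], query) < dist(points[best], query)) best = i;
  }
  return best;
}

console.log(hnswSearch([700.3])); // { nearest: 700, hops: 5 }
console.log(bruteForce([700.3])); // 700
```

With these layers the query reaches point 700 in 5 hops. Brute force needs 1,024 distance computations for the same answer.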
1346
497
|
|
|
1347
|
-
|
|
1348
|
-
PREFIX : <http://example.org/>
|
|
1349
|
-
SELECT ?person WHERE { ?person :knows :bob }
|
|
1350
|
-
`)
|
|
1351
|
-
console.log(results) // [{ bindings: { person: 'http://example.org/alice' } }]
|
|
1352
|
-
```
|
|
1353
|
-
|
|
1354
|
-
### Graph Analytics
|
|
498
|
+
### Datalog: Recursive Rule Evaluation
|
|
1355
499
|
|
|
1356
|
-
```
|
|
1357
|
-
|
|
1358
|
-
|
|
1359
|
-
|
|
1360
|
-
JSON.stringify([{id:'alice'}, {id:'bob'}, {id:'charlie'}]),
|
|
1361
|
-
JSON.stringify([
|
|
1362
|
-
{src:'alice', dst:'bob'},
|
|
1363
|
-
{src:'bob', dst:'charlie'},
|
|
1364
|
-
{src:'charlie', dst:'alice'}
|
|
1365
|
-
])
|
|
1366
|
-
)
|
|
1367
|
-
|
|
1368
|
-
// Built-in algorithms
|
|
1369
|
-
console.log('Triangles:', graph.triangleCount()) // 1
|
|
1370
|
-
console.log('PageRank:', JSON.parse(graph.pageRank(0.15, 20)))
|
|
1371
|
-
console.log('Components:', JSON.parse(graph.connectedComponents()))
|
|
500
|
+
```
|
|
501
|
+
mustReport(X) :- transaction(X), amount(X, A), A > 10000.
|
|
502
|
+
mustReport(X) :- transaction(X), involves(X, P), pep(P).
|
|
503
|
+
mustReport(X) :- relatedTo(X, Y), mustReport(Y). # Recursive!
|
|
1372
504
|
```
|
|
1373
505
|
|
|
1374
|
-
|
|
1375
|
-
|
|
1376
|
-
```javascript
|
|
1377
|
-
const { GraphFrame } = require('rust-kgdb')
|
|
506
|
+
Three rules derive ALL reporting requirements. This includes transactions linked to other reportable transactions through chains of any length, and evaluation stops once a round derives no new facts.
|
|
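To make the recursion concrete, here is a minimal semi-naive evaluation of the three rules above. Semi-naive evaluation is the strategy the feature table lists for the Datalog engine. Each round joins `relatedTo` only against the facts derived in the previous round, and the loop ends as soon as a round adds nothing, so the cascade always terminates. This is plain JavaScript, not the rust-kgdb engine. Rule 2 is read as "involves a politically exposed person" through a hypothetical `pep` relation, and all transaction data is invented.

```javascript
// Semi-naive evaluation of the three mustReport rules (plain JS sketch).
// All transactions, amounts, links and the pep relation are hypothetical.
const transaction = new Set(['T1', 'T2', 'T3', 'T4', 'T5']);
const amount = new Map([['T1', 12000], ['T2', 500], ['T3', 800], ['T4', 300], ['T5', 50]]);
const involves = [['T2', 'alice']];
const pep = new Set(['alice']);
const relatedTo = [['T3', 'T1'], ['T4', 'T3'], ['T5', 'T9']];

function mustReportFixpoint() {
  // Rules 1 and 2 are non-recursive: evaluate them once to seed the delta.
  let delta = new Set();
  for (const t of transaction) {
    if (amount.get(t) > 10000) delta.add(t);
  }
  for (const [t, person] of involves) {
    if (transaction.has(t) && pep.has(person)) delta.add(t);
  }

  const result = new Set();
  let rounds = 0;
  while (delta.size > 0) {
    rounds++;
    for (const t of delta) result.add(t);
    // Rule 3: join relatedTo only with facts that are new this round
    // (the "semi-naive" part), instead of with all of mustReport.
    const next = new Set();
    for (const [x, y] of relatedTo) {
      if (delta.has(y) && !result.has(x)) next.add(x);
    }
    delta = next;
  }
  return { mustReport: [...result].sort(), rounds };
}

console.log(mustReportFixpoint()); // T1, T2, T3, T4 reportable after 3 rounds
```

`T5` is never reported because the transaction it relates to, `T9`, is not itself reportable.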
1378
507
|
|
|
1379
|
-
|
|
1380
|
-
const graph = new GraphFrame(
|
|
1381
|
-
JSON.stringify([
|
|
1382
|
-
{id:'company_a'}, {id:'company_b'}, {id:'company_c'}, {id:'company_d'}
|
|
1383
|
-
]),
|
|
1384
|
-
JSON.stringify([
|
|
1385
|
-
{src:'company_a', dst:'company_b'}, // A pays B
|
|
1386
|
-
{src:'company_b', dst:'company_c'}, // B pays C
|
|
1387
|
-
{src:'company_c', dst:'company_a'}, // C pays A (circular!)
|
|
1388
|
-
{src:'company_c', dst:'company_d'} // C also pays D
|
|
1389
|
-
])
|
|
1390
|
-
)
|
|
508
|
+
---
|
|
1391
509
|
|
|
1392
|
-
|
|
1393
|
-
const edges = JSON.parse(graph.find('(a)-[]->(b)'))
|
|
1394
|
-
console.log('All edges:', edges.length) // 4
|
|
510
|
+
## Real-World Examples
|
|
1395
511
|
|
|
1396
|
-
|
|
1397
|
-
const twoHops = JSON.parse(graph.find('(x)-[]->(y); (y)-[]->(z)'))
|
|
1398
|
-
console.log('Two-hop paths:', twoHops.length) // 3
|
|
512
|
+
### Legal: Contract Analysis
|
|
1399
513
|
|
|
1400
|
-
|
|
1401
|
-
const
|
|
1402
|
-
|
|
514
|
+
```javascript
|
|
515
|
+
const db = new GraphDB('http://lawfirm.com/');
|
|
516
|
+
db.loadTtl(`
|
|
517
|
+
:Contract_2024 :hasClause :NonCompete_3yr ; :signedBy :ClientA .
|
|
518
|
+
:NonCompete_3yr :challengedIn :Martinez_v_Apex ; :upheldIn :Chen_v_StateBank .
|
|
519
|
+
:Martinez_v_Apex :court "9th Circuit" ; :year 2021 ; :outcome "partial" .
|
|
520
|
+
`);
|
|
1403
521
|
|
|
1404
|
-
|
|
1405
|
-
//
|
|
522
|
+
const result = await agent.ask("Has the non-compete clause been challenged?");
|
|
523
|
+
// Returns REAL cases from YOUR database, not hallucinated citations
|
|
1406
524
|
```
|
|
1407
525
|
|
|
1408
|
-
###
|
|
526
|
+
### Healthcare: Drug Interactions
|
|
1409
527
|
|
|
1410
528
|
```javascript
|
|
1411
|
-
const
|
|
1412
|
-
|
|
1413
|
-
|
|
1414
|
-
|
|
1415
|
-
|
|
1416
|
-
|
|
1417
|
-
// grandparent(X, Z) :- parent(X, Y), parent(Y, Z)
|
|
1418
|
-
program.addRule(JSON.stringify({
|
|
1419
|
-
head: {predicate: 'grandparent', terms: ['?X', '?Z']},
|
|
1420
|
-
body: [
|
|
1421
|
-
{predicate: 'parent', terms: ['?X', '?Y']},
|
|
1422
|
-
{predicate: 'parent', terms: ['?Y', '?Z']}
|
|
1423
|
-
]
|
|
1424
|
-
}))
|
|
529
|
+
const db = new GraphDB('http://hospital.org/');
|
|
530
|
+
db.loadTtl(`
|
|
531
|
+
:Patient_7291 :currentMedication :Warfarin ; :currentMedication :Lisinopril .
|
|
532
|
+
:Warfarin :interactsWith :Aspirin ; :interactionSeverity "high" .
|
|
533
|
+
:Lisinopril :interactsWith :Potassium ; :interactionSeverity "high" .
|
|
534
|
+
`);
|
|
1425
535
|
|
|
1426
|
-
|
|
1427
|
-
//
|
|
536
|
+
const result = await agent.ask("What should we avoid prescribing to Patient 7291?");
|
|
537
|
+
// Returns ACTUAL interactions from your formulary, not made-up drug names
|
|
1428
538
|
```
|
|
1429
539
|
|
|
1430
|
-
###
|
|
540
|
+
### Insurance: Fraud Detection with Datalog
|
|
1431
541
|
|
|
1432
542
|
```javascript
|
|
1433
|
-
const
|
|
1434
|
-
|
|
1435
|
-
|
|
1436
|
-
|
|
1437
|
-
|
|
1438
|
-
|
|
1439
|
-
|
|
1440
|
-
|
|
543
|
+
const db = new GraphDB('http://insurer.com/');
|
|
544
|
+
db.loadTtl(`
|
|
545
|
+
:P001 a :Claimant ; :name "John Smith" ; :address "123 Main St" .
|
|
546
|
+
:P002 a :Claimant ; :name "Jane Doe" ; :address "123 Main St" .
|
|
547
|
+
:P001 :knows :P002 .
|
|
548
|
+
:P001 :claimsWith :PROV001 .
|
|
549
|
+
:P002 :claimsWith :PROV001 .
|
|
550
|
+
`);
|
|
551
|
+
|
|
552
|
+
// NICB fraud detection rules
|
|
553
|
+
datalog.addRule(JSON.stringify({
|
|
554
|
+
head: {predicate:'potential_collusion', terms:['?X','?Y','?P']},
|
|
555
|
+
body: [
|
|
556
|
+
{predicate:'claimant', terms:['?X']},
|
|
557
|
+
{predicate:'claimant', terms:['?Y']},
|
|
558
|
+
{predicate:'knows', terms:['?X','?Y']},
|
|
559
|
+
{predicate:'claimsWith', terms:['?X','?P']},
|
|
560
|
+
{predicate:'claimsWith', terms:['?Y','?P']}
|
|
561
|
+
]
|
|
562
|
+
}));
|
|
1441
563
|
|
|
1442
|
-
|
|
1443
|
-
|
|
1444
|
-
console.log('Similar:', similar)
|
|
564
|
+
const inferred = evaluateDatalog(datalog);
|
|
565
|
+
// potential_collusion(P001, P002, PROV001) - DETECTED!
|
|
1445
566
|
```
|
|
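Stripped of the API, the `potential_collusion` rule above is a join: two claimants who know each other and both file with the same provider. The sketch below evaluates that rule body with plain nested loops over the facts from the Turtle snippet. It shows the tuple `evaluateDatalog` is expected to return and does not call rust-kgdb.

```javascript
// Plain nested-loop evaluation of the potential_collusion rule body:
// claimant(X), claimant(Y), knows(X, Y), claimsWith(X, P), claimsWith(Y, P)
const facts = {
  claimant: [['P001'], ['P002']],
  knows: [['P001', 'P002']],
  claimsWith: [['P001', 'PROV001'], ['P002', 'PROV001']],
};

function potentialCollusion({ claimant, knows, claimsWith }) {
  const isClaimant = new Set(claimant.map(([x]) => x));
  const results = [];
  for (const [x, y] of knows) {
    if (!isClaimant.has(x) || !isClaimant.has(y)) continue;
    for (const [cx, p] of claimsWith) {
      if (cx !== x) continue;
      // Y must claim with the same provider P as X.
      if (claimsWith.some(([cy, py]) => cy === y && py === p)) {
        results.push([x, y, p]);
      }
    }
  }
  return results;
}

console.log(potentialCollusion(facts)); // [ [ 'P001', 'P002', 'PROV001' ] ]
```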
1446
567
|
|
|
1447
|
-
###
|
|
568
|
+
### AML: Circular Payment Detection
|
|
1448
569
|
|
|
1449
570
|
```javascript
|
|
1450
|
-
|
|
1451
|
-
|
|
1452
|
-
|
|
1453
|
-
|
|
571
|
+
db.loadTtl(`
|
|
572
|
+
:Acct_1001 :transferredTo :Acct_2002 ; :amount 9500 .
|
|
573
|
+
:Acct_2002 :transferredTo :Acct_3003 ; :amount 9400 .
|
|
574
|
+
:Acct_3003 :transferredTo :Acct_1001 ; :amount 9200 .
|
|
575
|
+
`);
|
|
1454
576
|
|
|
1455
|
-
//
|
|
1456
|
-
const
|
|
1457
|
-
console.log('Distances:', result.distances)
|
|
1458
|
-
// { v0: 0, v1: 1, v2: 2, v3: 3, v4: 4 }
|
|
1459
|
-
console.log('Supersteps:', result.supersteps) // 5
|
|
577
|
+
// Find circular chains (money laundering indicator)
|
|
578
|
+
const triangles = gf.triangleCount(); // 1 circular pattern (gf: a GraphFrame over these transfers, built as in the Complete Fraud Detection Example)
|
|
1460
579
|
```
|
|
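In Spark GraphFrames, `triangleCount()` ignores edge direction, and rust-kgdb's version may behave the same way. For money-laundering checks the direction matters. The sketch below therefore looks specifically for directed 3-cycles (A → B → C → A) among the transfers from the Turtle snippet. It is plain JavaScript, independent of the rust-kgdb API.

```javascript
// Find directed 3-cycles (A -> B -> C -> A) in a list of [src, dst] transfers.
// Each cycle is reported once, rotated so it starts at its smallest account ID.
function findTransferCycles(transfers) {
  const out = new Map();
  for (const [src, dst] of transfers) {
    if (!out.has(src)) out.set(src, new Set());
    out.get(src).add(dst);
  }
  const cycles = [];
  for (const [a, bs] of out) {
    for (const b of bs) {
      for (const c of out.get(b) ?? []) {
        // Close the loop back to a; requiring a < b and a < c avoids duplicates.
        if (c !== a && out.get(c)?.has(a) && a < b && a < c) {
          cycles.push([a, b, c]);
        }
      }
    }
  }
  return cycles;
}

const transfers = [
  ['Acct_1001', 'Acct_2002'],
  ['Acct_2002', 'Acct_3003'],
  ['Acct_3003', 'Acct_1001'],
];
console.log(findTransferCycles(transfers)); // [ [ 'Acct_1001', 'Acct_2002', 'Acct_3003' ] ]
```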
1461
580
|
|
|
1462
581
|
---
|
|
1463
582
|
|
|
1464
|
-
##
|
|
1465
|
-
|
|
1466
|
-
### SPARQL Examples
|
|
1467
|
-
|
|
1468
|
-
| Query Type | Example | Description |
|
|
1469
|
-
|------------|---------|-------------|
|
|
1470
|
-
| **SELECT** | `SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10` | Basic triple pattern |
|
|
1471
|
-
| **FILTER** | `SELECT ?p WHERE { ?p :age ?a . FILTER(?a > 30) }` | Numeric filtering |
|
|
1472
|
-
| **OPTIONAL** | `SELECT ?p ?email WHERE { ?p a :Person . OPTIONAL { ?p :email ?email } }` | Left outer join |
|
|
1473
|
-
| **UNION** | `SELECT ?x WHERE { { ?x a :Cat } UNION { ?x a :Dog } }` | Pattern union |
|
|
1474
|
-
| **CONSTRUCT** | `CONSTRUCT { ?s :knows ?o } WHERE { ?s :friend ?o }` | Create new triples |
|
|
1475
|
-
| **ASK** | `ASK WHERE { :alice :knows :bob }` | Boolean existence check |
|
|
1476
|
-
| **INSERT** | `INSERT DATA { :alice :knows :charlie }` | Add triples |
|
|
1477
|
-
| **DELETE** | `DELETE WHERE { :alice :knows ?anyone }` | Remove triples |
|
|
1478
|
-
| **Aggregation** | `SELECT (COUNT(?p) AS ?cnt) WHERE { ?p a :Person }` | Count/Sum/Avg/Min/Max |
|
|
1479
|
-
| **GROUP BY** | `SELECT ?dept (COUNT(?e) AS ?cnt) WHERE { ?e :worksIn ?dept } GROUP BY ?dept` | Grouping |
|
|
1480
|
-
| **HAVING** | `SELECT ?dept (COUNT(?e) AS ?cnt) WHERE { ?e :worksIn ?dept } GROUP BY ?dept HAVING (COUNT(?e) > 5)` | Filter groups |
|
|
1481
|
-
| **ORDER BY** | `SELECT ?p ?age WHERE { ?p :age ?age } ORDER BY DESC(?age)` | Sorting |
|
|
1482
|
-
| **DISTINCT** | `SELECT DISTINCT ?type WHERE { ?s a ?type }` | Remove duplicates |
|
|
1483
|
-
| **VALUES** | `SELECT ?p WHERE { VALUES ?type { :Cat :Dog } ?p a ?type }` | Inline data |
|
|
1484
|
-
| **BIND** | `SELECT ?p ?label WHERE { ?p :name ?n . BIND(CONCAT("Mr. ", ?n) AS ?label) }` | Computed values |
|
|
1485
|
-
| **Subquery** | `SELECT ?p WHERE { { SELECT ?p WHERE { ?p :score ?s } ORDER BY DESC(?s) LIMIT 10 } }` | Nested queries |
|
|
1486
|
-
|
|
1487
|
-
### Datalog Examples
|
|
1488
|
-
|
|
1489
|
-
| Pattern | Rule | Description |
|
|
1490
|
-
|---------|------|-------------|
|
|
1491
|
-
| **Transitive Closure** | `ancestor(?X,?Z) :- parent(?X,?Y), ancestor(?Y,?Z)` | Recursive ancestor |
|
|
1492
|
-
| **Symmetric** | `knows(?X,?Y) :- knows(?Y,?X)` | Bidirectional relations |
|
|
1493
|
-
| **Composition** | `grandparent(?X,?Z) :- parent(?X,?Y), parent(?Y,?Z)` | Two-hop relation |
|
|
1494
|
-
| **Negation** | `lonely(?X) :- person(?X), NOT friend(?X,?Y)` | Absence check |
|
|
1495
|
-
| **Aggregation** | `popular(?X) :- friend(?X,?Y), COUNT(?Y) > 10` | Count-based rules |
|
|
1496
|
-
| **Path Finding** | `reachable(?X,?Y) :- edge(?X,?Y). reachable(?X,?Z) :- edge(?X,?Y), reachable(?Y,?Z)` | Graph connectivity |
|
|
1497
|
-
|
|
1498
|
-
### Motif Pattern Syntax
|
|
1499
|
-
|
|
1500
|
-
| Pattern | Syntax | Matches |
|
|
1501
|
-
|---------|--------|---------|
|
|
1502
|
-
| **Single Edge** | `(a)-[]->(b)` | All directed edges |
|
|
1503
|
-
| **Two-Hop** | `(a)-[]->(b); (b)-[]->(c)` | Paths of length 2 |
|
|
1504
|
-
| **Triangle** | `(a)-[]->(b); (b)-[]->(c); (c)-[]->(a)` | Closed triangles |
|
|
1505
|
-
| **Star** | `(center)-[]->(a); (center)-[]->(b); (center)-[]->(c)` | Hub patterns |
|
|
1506
|
-
| **Named Edge** | `(a)-[e]->(b)` | Capture edge in variable `e` |
|
|
1507
|
-
| **Negation** | `(a)-[]->(b); !(b)-[]->(a)` | One-way edges only |
|
|
1508
|
-
| **Diamond** | `(a)-[]->(b); (a)-[]->(c); (b)-[]->(d); (c)-[]->(d)` | Diamond pattern |
|
|
1509
|
-
|
|
1510
|
-
### GraphFrame Algorithms
|
|
1511
|
-
|
|
1512
|
-
| Algorithm | Method | Input | Output |
|
|
1513
|
-
|-----------|--------|-------|--------|
|
|
1514
|
-
| **PageRank** | `graph.pageRank(0.15, 20)` | damping, iterations | `{ ranks: {id: score}, iterations, converged }` |
|
|
1515
|
-
| **Connected Components** | `graph.connectedComponents()` | - | `{ components: {id: componentId}, count }` |
|
|
1516
|
-
| **Shortest Paths** | `graph.shortestPaths(['v0', 'v5'])` | landmark vertices | `{ distances: {id: {landmark: dist}} }` |
|
|
1517
|
-
| **Label Propagation** | `graph.labelPropagation(10)` | max iterations | `{ labels: {id: label}, iterations }` |
|
|
1518
|
-
| **Triangle Count** | `graph.triangleCount()` | - | Number of triangles |
|
|
1519
|
-
| **Motif Finding** | `graph.find('(a)-[]->(b)')` | pattern string | Array of matches |
|
|
1520
|
-
| **Degrees** | `graph.degrees()` / `inDegrees()` / `outDegrees()` | - | `{ id: degree }` |
|
|
1521
|
-
| **Pregel** | `pregelShortestPaths(graph, 'v0', 10)` | landmark, maxSteps | `{ distances, supersteps }` |
|
|
1522
|
-
|
|
1523
|
-
### Embedding Operations
|
|
1524
|
-
|
|
1525
|
-
| Operation | Method | Description |
|
|
1526
|
-
|-----------|--------|-------------|
|
|
1527
|
-
| **Store Vector** | `service.storeVector('id', [0.1, 0.2, ...])` | Store 384-dim embedding |
|
|
1528
|
-
| **Find Similar** | `service.findSimilar('id', 10, 0.7)` | HNSW k-NN search |
|
|
1529
|
-
| **Composite Store** | `service.storeComposite('id', JSON.stringify({openai: [...], voyage: [...]}))` | Multi-provider |
|
|
1530
|
-
| **Composite Search** | `service.findSimilarComposite('id', 10, 0.7, 'rrf')` | RRF/max/voting aggregation |
|
|
1531
|
-
| **1-Hop Cache** | `service.getNeighborsOut('id')` / `getNeighborsIn('id')` | ARCADE neighbor cache |
|
|
1532
|
-
| **Rebuild Index** | `service.rebuildIndex()` | Rebuild HNSW index |
|
|
1533
|
-
|
|
1534
|
-
---
|
|
1535
|
-
|
|
1536
|
-
## Benchmarks
|
|
583
|
+
## Performance Benchmarks
|
|
1537
584
|
|
|
1538
|
-
|
|
585
|
+
All measurements verified. Run them yourself:
|
|
1539
586
|
|
|
1540
|
-
|
|
1541
|
-
|
|
1542
|
-
|
|
1543
|
-
|
|
1544
|
-
| **Memory per Triple** | 24 bytes | Best-in-class |
|
|
587
|
+
```bash
|
|
588
|
+
node benchmark.js # Core performance
|
|
589
|
+
node vanilla-vs-hypermind-benchmark.js # Agent accuracy
|
|
590
|
+
```
|
|
1545
591
|
|
|
1546
|
-
###
|
|
592
|
+
### Rust Core Engine
|
|
1547
593
|
|
|
1548
|
-
|
|
|
1549
|
-
|
|
1550
|
-
| **
|
|
1551
|
-
|
|
|
1552
|
-
|
|
|
1553
|
-
| Blazegraph | ~100 µs | 100+ bytes | No |
|
|
594
|
+
| Metric | rust-kgdb | RDFox | Apache Jena |
|
|
595
|
+
|--------|-----------|-------|-------------|
|
|
596
|
+
| **Lookup** | 449 ns | 5,000+ ns | 10,000+ ns |
|
|
597
|
+
| **Memory/Triple** | 24 bytes | 32 bytes | 50-60 bytes |
|
|
598
|
+
| **Bulk Insert** | 146K/sec | 200K/sec | 50K/sec |
|
|
1554
599
|
|
|
1555
|
-
###
|
|
600
|
+
### Agent Accuracy (LUBM Benchmark)
|
|
1556
601
|
|
|
1557
|
-
|
|
|
1558
|
-
|
|
1559
|
-
|
|
|
1560
|
-
|
|
|
1561
|
-
|
|
|
602
|
+
| System | Without Schema | With Schema |
|
|
603
|
+
|--------|---------------|-------------|
|
|
604
|
+
| Vanilla LLM | 0% | - |
|
|
605
|
+
| LangChain | 0% | 71.4% |
|
|
606
|
+
| DSPy | 14.3% | 71.4% |
|
|
607
|
+
| **HyperMind** | - | **71.4%** |
|
|
1562
608
|
|
|
1563
|
-
*
|
|
609
|
+
*All frameworks reach the same accuracy WITH schema; HyperMind's advantage is that schema handling is built in.*
|
|
1564
610
|
|
|
1565
|
-
|
|
611
|
+
### Concurrency (16 Workers)
|
|
1566
612
|
|
|
1567
|
-
|
|
613
|
+
| Operation | Throughput |
|
|
614
|
+
|-----------|------------|
|
|
615
|
+
| Writes | 132K ops/sec |
|
|
616
|
+
| Reads | 302 ops/sec |
|
|
617
|
+
| GraphFrames | 6.5K ops/sec |
|
|
618
|
+
| Mixed | 642 ops/sec |
|
|
1568
619
|
|
|
1569
|
-
|
|
1570
|
-
|-----------|-------------|--------------|-------------------|-------------|
|
|
1571
|
-
| **HyperMind** | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
|
|
1572
|
-
| LangChain | ❌ No | ❌ No | ❌ No | ❌ No |
|
|
1573
|
-
| DSPy | ⚠️ Partial | ❌ No | ❌ No | ❌ No |
|
|
620
|
+
---
|
|
1574
621
|
|
|
1575
|
-
|
|
622
|
+
## Feature Summary
|
|
623
|
+
|
|
624
|
+
| Category | Feature | Performance |
|
|
625
|
+
|----------|---------|-------------|
|
|
626
|
+
| **Core** | SPARQL 1.1 Engine | 449ns lookups |
|
|
627
|
+
| **Core** | RDF 1.2 Support | W3C compliant |
|
|
628
|
+
| **Core** | Named Graphs | Quad store |
|
|
629
|
+
| **Analytics** | PageRank | O(V + E) |
|
|
630
|
+
| **Analytics** | Connected Components | Union-find |
|
|
631
|
+
| **Analytics** | Triangle Count | O(E^1.5) |
|
|
632
|
+
| **Analytics** | Motif Finding | Pattern DSL |
|
|
633
|
+
| **Analytics** | Pregel BSP | Billion-edge scale |
|
|
634
|
+
| **AI** | HNSW Embeddings | 16ms/10K vectors |
|
|
635
|
+
| **AI** | 1-Hop Cache | O(1) neighbors |
|
|
636
|
+
| **AI** | Agent Memory | 94% recall@10 |
|
|
637
|
+
| **Reasoning** | Datalog | Semi-naive |
|
|
638
|
+
| **Reasoning** | RDFS | Subclass inference |
|
|
639
|
+
| **Reasoning** | OWL 2 RL | Rule-based |
|
|
640
|
+
| **Validation** | SHACL | Shape constraints |
|
|
641
|
+
| **Provenance** | PROV | W3C standard |
|
|
642
|
+
| **Joins** | WCOJ | Optimal complexity |
|
|
643
|
+
| **Security** | WASM Sandbox | Capability-based |
|
|
644
|
+
| **Audit** | ProofDAG | SHA-256 witnesses |
|
|
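The feature summary lists union-find for connected components. Below is a minimal sketch with path halving, run on the claimant/provider edges from the Complete Fraud Detection Example plus a hypothetical unconnected claimant `P004`. It is an illustration of the technique, not rust-kgdb's implementation.

```javascript
// Connected components with union-find (path halving, no union by rank).
function connectedComponents(nodes, edges) {
  const parent = new Map(nodes.map((v) => [v, v]));
  const find = (v) => {
    while (parent.get(v) !== v) {
      parent.set(v, parent.get(parent.get(v))); // path halving
      v = parent.get(v);
    }
    return v;
  };
  for (const [a, b] of edges) {
    const ra = find(a);
    const rb = find(b);
    if (ra !== rb) parent.set(ra, rb); // merge the two trees
  }
  // Map every node to the root of its tree (its component ID).
  return new Map(nodes.map((v) => [v, find(v)]));
}

const components = connectedComponents(
  ['P001', 'P002', 'P003', 'PROV001', 'P004'], // P004 is hypothetical
  [['P001', 'P002'], ['P001', 'PROV001'], ['P002', 'PROV001'], ['P002', 'P003'], ['P003', 'P001']]
);
console.log(new Set(components.values()).size); // 2
```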
1576
645
|
|
|
1577
|
-
|
|
646
|
+
---
|
|
1578
647
|
|
|
1579
|
-
|
|
648
|
+
## Installation
|
|
1580
649
|
|
|
1581
650
|
```bash
|
|
1582
|
-
|
|
1583
|
-
ANTHROPIC_API_KEY=... OPENAI_API_KEY=... node vanilla-vs-hypermind-benchmark.js
|
|
1584
|
-
|
|
1585
|
-
# Python: Compare frameworks (Vanilla, LangChain, DSPy) with/without schema
|
|
1586
|
-
OPENAI_API_KEY=... uv run --with openai --with langchain --with langchain-openai --with langchain-core --with dspy-ai python3 benchmark-frameworks.py
|
|
651
|
+
npm install rust-kgdb
|
|
1587
652
|
```
|
|
1588
653
|
|
|
1589
|
-
|
|
1590
|
-
|
|
1591
|
-
**Why These Features Matter**:
|
|
1592
|
-
- **Type Safety**: Tools have typed signatures (Query → BindingSet), invalid combinations rejected
|
|
1593
|
-
- **Schema Awareness**: Planner sees your actual data structure, can only reference real properties
|
|
1594
|
-
- **Symbolic Execution**: Queries run against real database, not LLM imagination
|
|
1595
|
-
- **Audit Trail**: Every answer has cryptographic hash for reproducibility
|
|
1596
|
-
|
|
1597
|
-
---
|
|
1598
|
-
|
|
1599
|
-
## W3C Standards Compliance
|
|
654
|
+
**Platforms:** macOS (Intel/Apple Silicon), Linux (x64/ARM64), Windows (x64)
|
|
1600
655
|
|
|
1601
|
-
|
|
1602
|
-
|----------|--------|
|
|
1603
|
-
| **SPARQL 1.1 Query** | ✅ 100% |
|
|
1604
|
-
| **SPARQL 1.1 Update** | ✅ 100% |
|
|
1605
|
-
| **RDF 1.2** | ✅ 100% |
|
|
1606
|
-
| **RDF-Star** | ✅ 100% |
|
|
1607
|
-
| **Turtle** | ✅ 100% |
|
|
656
|
+
**Requirements:** Node.js 14+
|
|
1608
657
|
|
|
1609
658
|
---
|
|
1610
659
|
|
|
1611
|
-
##
|
|
660
|
+
## Complete Fraud Detection Example
|
|
1612
661
|
|
|
1613
|
-
|
|
1614
|
-
- **GitHub**: [gonnect-uk/rust-kgdb](https://github.com/gonnect-uk/rust-kgdb)
|
|
1615
|
-
- **Benchmark Report**: [HYPERMIND_BENCHMARK_REPORT.md](./HYPERMIND_BENCHMARK_REPORT.md)
|
|
1616
|
-
- **Changelog**: [CHANGELOG.md](./CHANGELOG.md)
|
|
662
|
+
Copy this entire example to get started with fraud detection:
|
|
1617
663
|
|
|
1618
|
-
|
|
1619
|
-
|
|
1620
|
-
|
|
664
|
+
```javascript
|
|
665
|
+
const {
|
|
666
|
+
GraphDB,
|
|
667
|
+
GraphFrame,
|
|
668
|
+
EmbeddingService,
|
|
669
|
+
DatalogProgram,
|
|
670
|
+
evaluateDatalog,
|
|
671
|
+
HyperMindAgent
|
|
672
|
+
} = require('rust-kgdb');
|
|
673
|
+
|
|
674
|
+
// ============================================================
|
|
675
|
+
// STEP 1: Initialize Services
|
|
676
|
+
// ============================================================
|
|
677
|
+
const db = new GraphDB('http://insurance.org/fraud-detection');
|
|
678
|
+
const embeddings = new EmbeddingService();
|
|
679
|
+
|
|
680
|
+
// ============================================================
|
|
681
|
+
// STEP 2: Load Claims Data
|
|
682
|
+
// ============================================================
|
|
683
|
+
db.loadTtl(`
|
|
684
|
+
@prefix : <http://insurance.org/> .
|
|
685
|
+
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
|
|
686
|
+
|
|
687
|
+
# Claims
|
|
688
|
+
:CLM001 a :Claim ;
|
|
689
|
+
:amount "18500"^^xsd:decimal ;
|
|
690
|
+
:description "Soft tissue injury from rear-end collision" ;
|
|
691
|
+
:claimant :P001 ;
|
|
692
|
+
:provider :PROV001 ;
|
|
693
|
+
:filingDate "2024-11-15"^^xsd:date .
|
|
694
|
+
|
|
695
|
+
:CLM002 a :Claim ;
|
|
696
|
+
:amount "22300"^^xsd:decimal ;
|
|
697
|
+
:description "Whiplash injury from vehicle accident" ;
|
|
698
|
+
:claimant :P002 ;
|
|
699
|
+
:provider :PROV001 ;
|
|
700
|
+
:filingDate "2024-11-18"^^xsd:date .
|
|
701
|
+
|
|
702
|
+
# Claimants (note: same address = red flag!)
+:P001 a :Claimant ;
+  :name "John Smith" ;
+  :address "123 Main St, Miami, FL" ;
+  :riskScore "0.85"^^xsd:decimal .
+
+:P002 a :Claimant ;
+  :name "Jane Doe" ;
+  :address "123 Main St, Miami, FL" ;
+  :riskScore "0.72"^^xsd:decimal .
+
+# Relationships (fraud indicators)
+:P001 :knows :P002 .
+:P001 :paidTo :P002 .
+:P002 :paidTo :P003 .
+:P003 :paidTo :P001 . # Circular payment!
+
+# Provider
+:PROV001 a :Provider ;
+  :name "Quick Care Rehabilitation Clinic" ;
+  :flagCount "4"^^xsd:integer .
+`);
+
+console.log(`Loaded ${db.countTriples()} triples`);
+
+// ============================================================
+// STEP 3: Graph Analytics - Find Network Patterns
+// ============================================================
+const vertices = JSON.stringify([
+  {id: 'P001'}, {id: 'P002'}, {id: 'P003'}, {id: 'PROV001'}
+]);
+const edges = JSON.stringify([
+  {src: 'P001', dst: 'P002'},
+  {src: 'P001', dst: 'PROV001'},
+  {src: 'P002', dst: 'PROV001'},
+  {src: 'P001', dst: 'P002'}, // payment
+  {src: 'P002', dst: 'P003'}, // payment
+  {src: 'P003', dst: 'P001'} // payment (circular!)
+]);
+
+const gf = new GraphFrame(vertices, edges);
+console.log('Triangles (circular patterns):', gf.triangleCount());
+console.log('PageRank:', gf.pageRank(0.15, 20));
+
+// ============================================================
+// STEP 4: Embedding-Based Similarity
+// ============================================================
+// Store embeddings for semantic similarity search
+// (In production, use OpenAI/Voyage embeddings)
+function mockEmbedding(text) {
+  return new Array(384).fill(0).map((_, i) =>
+    Math.sin(text.charCodeAt(i % text.length) * 0.1) * 0.5 + 0.5
+  );
+}
 
-
+embeddings.storeVector('CLM001', mockEmbedding('soft tissue injury rear end'));
+embeddings.storeVector('CLM002', mockEmbedding('whiplash vehicle accident'));
+embeddings.rebuildIndex();
+
+const similar = JSON.parse(embeddings.findSimilar('CLM001', 5, 0.3));
+console.log('Similar claims:', similar);
+
+// ============================================================
+// STEP 5: Datalog Rules - NICB Fraud Detection
+// ============================================================
+const datalog = new DatalogProgram();
+
+// Add facts from our knowledge graph
+datalog.addFact(JSON.stringify({predicate:'claimant', terms:['P001']}));
+datalog.addFact(JSON.stringify({predicate:'claimant', terms:['P002']}));
+datalog.addFact(JSON.stringify({predicate:'provider', terms:['PROV001']}));
+datalog.addFact(JSON.stringify({predicate:'knows', terms:['P001','P002']}));
+datalog.addFact(JSON.stringify({predicate:'claims_with', terms:['P001','PROV001']}));
+datalog.addFact(JSON.stringify({predicate:'claims_with', terms:['P002','PROV001']}));
+datalog.addFact(JSON.stringify({predicate:'same_address', terms:['P001','P002']}));
+
+// NICB Collusion Detection Rule
+datalog.addRule(JSON.stringify({
+  head: {predicate:'potential_collusion', terms:['?X','?Y','?P']},
+  body: [
+    {predicate:'claimant', terms:['?X']},
+    {predicate:'claimant', terms:['?Y']},
+    {predicate:'provider', terms:['?P']},
+    {predicate:'knows', terms:['?X','?Y']},
+    {predicate:'claims_with', terms:['?X','?P']},
+    {predicate:'claims_with', terms:['?Y','?P']}
+  ]
+}));
 
-
+// Staged Accident Indicator Rule
+datalog.addRule(JSON.stringify({
+  head: {predicate:'staged_accident_indicator', terms:['?X','?Y']},
+  body: [
+    {predicate:'claimant', terms:['?X']},
+    {predicate:'claimant', terms:['?Y']},
+    {predicate:'same_address', terms:['?X','?Y']},
+    {predicate:'knows', terms:['?X','?Y']}
+  ]
+}));
+
+const inferred = JSON.parse(evaluateDatalog(datalog));
+console.log('Inferred fraud patterns:', inferred);
+
+// ============================================================
+// STEP 6: SPARQL Query - Get Detailed Evidence
+// ============================================================
+const suspiciousClaims = db.querySelect(`
+  PREFIX : <http://insurance.org/>
+  SELECT ?claim ?amount ?claimant ?provider WHERE {
+    ?claim a :Claim ;
+      :amount ?amount ;
+      :claimant ?claimant ;
+      :provider ?provider .
+    ?claimant :riskScore ?risk .
+    FILTER(?risk > 0.7)
+  }
+`);
 
-
+console.log('High-risk claims:', suspiciousClaims);
 
-
-
-
-
-| **Reasoning Trace** | Records every step | Complete audit trail for compliance |
+// ============================================================
+// STEP 7: HyperMind Agent - Natural Language Interface
+// ============================================================
+const agent = new HyperMindAgent({ db, embeddings });
 
-
+async function investigate() {
+  const result = await agent.ask("Which claims show potential fraud patterns?");
 
-
+  console.log('\\n=== AGENT FINDINGS ===');
+  console.log(result.answer);
+  console.log('\\n=== EVIDENCE CHAIN ===');
+  console.log(result.evidence);
+  console.log('\\n=== PROOF HASH ===');
+  console.log(result.proofHash);
+}
 
+investigate().catch(console.error);
 ```
-┌─────────────────────────────────────────────────────────────────────────────┐
-│ REASONING TRACE │
-│ │
-│ ┌────────────────────────────────┐ │
-│ │ CONCLUSION (Root) │ │
-│ │ "Provider P001 is suspicious" │ │
-│ │ Confidence: 94% │ │
-│ └───────────────┬────────────────┘ │
-│ │ │
-│ ┌───────────────┼───────────────┐ │
-│ │ │ │ │
-│ ▼ ▼ ▼ │
-│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
-│ │ Database Query │ │ Rule Application │ │ Similarity Match │ │
-│ │ │ │ │ │ │ │
-│ │ Tool: SPARQL │ │ Tool: Datalog │ │ Tool: Embeddings │ │
-│ │ Result: 47 claims│ │ Result: MATCHED │ │ Result: 87% │ │
-│ │ Time: 2.3ms │ │ Rule: fraud(?P) │ │ similar to known │ │
-│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │
-│ │
-│ HASH: sha256:8f3a2b1c4d5e... (Reproducible, Auditable, Verifiable) │
-└─────────────────────────────────────────────────────────────────────────────┘
-```
-
-### For Academics: Mathematical Foundations
-
-HyperMind is built on rigorous mathematical foundations:
 
-
-- **Type Theory** (Hindley-Milner): Every tool has a typed signature enabling compile-time validation
-- **Proof Theory** (Curry-Howard): Proofs are programs, types are propositions - every conclusion has a derivation
-- **Category Theory**: Tools as morphisms with validated composition
-
-These foundations ensure that HyperMind transforms probabilistic LLM outputs into deterministic, verifiable reasoning chains.
-
-### Architecture Layers
-
-```
-┌─────────────────────────────────────────────────────────────────────────────┐
-│ INTELLIGENCE CONTROL PLANE │
-│ │
-│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │
-│ │ Schema │ │ Tool │ │ Reasoning │ │
-│ │ Awareness │ │ Validation │ │ Trace │ │
-│ └───────┬────────┘ └───────┬────────┘ └───────┬────────┘ │
-│ └────────────────────┼────────────────────┘ │
-│ ▼ │
-│ ┌─────────────────────────────────────────────────────────────────────┐ │
-│ │ HYPERMIND AGENT │ │
-│ │ User Query → LLM Planner → Typed Execution Plan → Tools → Answer │ │
-│ └─────────────────────────────────────────────────────────────────────┘ │
-│ ▼ │
-│ ┌─────────────────────────────────────────────────────────────────────┐ │
-│ │ rust-kgdb ENGINE │ │
-│ │ • GraphDB (SPARQL 1.1) • GraphFrames (Analytics) │ │
-│ │ • Datalog (Rules) • Embeddings (Similarity) │ │
-│ └─────────────────────────────────────────────────────────────────────┘ │
-└─────────────────────────────────────────────────────────────────────────────┘
-```
-
-### Security Model
+---
 
-
+## Complete Underwriting Example
 
 ```javascript
-const
-  kg: db,
-  scope: new AgentScope({
-    allowedGraphs: ['http://insurance.org/'], // Restrict graph access
-    allowedPredicates: ['amount', 'provider'], // Restrict predicates
-    maxResultSize: 1000 // Limit result size
-  }),
-  sandbox: {
-    capabilities: ['ReadKG', 'ExecuteTool'], // No WriteKG = read-only
-    fuelLimit: 1_000_000 // CPU budget
-  }
-})
-```
-
-### Distributed Deployment (Kubernetes)
-
-rust-kgdb scales from single-node to distributed cluster on the same codebase.
-
-```
-┌─────────────────────────────────────────────────────────────────────────────┐
-│ DISTRIBUTED ARCHITECTURE │
-│ │
-│ ┌─────────────────────────────────────────────────────────────────────┐ │
-│ │ COORDINATOR NODE │ │
-│ │ • Query planning & optimization │ │
-│ │ • HDRF streaming partitioner (subject-anchored) │ │
-│ │ • Raft consensus leader │ │
-│ │ • gRPC routing to executors │ │
-│ └──────────────────────────────┬──────────────────────────────────────┘ │
-│ │ │
-│ ┌───────────────────────┼───────────────────────┐ │
-│ │ │ │ │
-│ ▼ ▼ ▼ │
-│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
-│ │ EXECUTOR 1 │ │ EXECUTOR 2 │ │ EXECUTOR 3 │ │
-│ │ │ │ │ │ │ │
-│ │ Partition 0 │ │ Partition 1 │ │ Partition 2 │ │
-│ │ RocksDB │ │ RocksDB │ │ RocksDB │ │
-│ │ Embeddings │ │ Embeddings │ │ Embeddings │ │
-│ └─────────────┘ └─────────────┘ └─────────────┘ │
-│ │
-└─────────────────────────────────────────────────────────────────────────────┘
-```
+const { GraphDB, DatalogProgram, evaluateDatalog } = require('rust-kgdb');
 
-
-
-
-
+// ============================================================
+// Automated Underwriting Rules Engine
+// ============================================================
+const db = new GraphDB('http://underwriting.org/');
 
-
-
+// Load applicant data
+db.loadTtl(`
+@prefix : <http://underwriting.org/> .
+@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
+
+:APP001 a :Application ;
+  :applicant :PERSON001 ;
+  :requestedAmount "500000"^^xsd:decimal ;
+  :propertyType :SingleFamily .
+
+:PERSON001 a :Person ;
+  :creditScore "720"^^xsd:integer ;
+  :dti "0.35"^^xsd:decimal ;
+  :employmentYears "5"^^xsd:integer ;
+  :bankruptcyHistory false .
+`);
+
+// Underwriting rules as Datalog
+const datalog = new DatalogProgram();
+
+// Facts
+datalog.addFact(JSON.stringify({predicate:'application', terms:['APP001']}));
+datalog.addFact(JSON.stringify({predicate:'credit_score', terms:['APP001','720']}));
+datalog.addFact(JSON.stringify({predicate:'dti', terms:['APP001','0.35']}));
+datalog.addFact(JSON.stringify({predicate:'employment_years', terms:['APP001','5']}));
+
+// Auto-Approve Rule: Credit > 700, DTI < 0.43, Employment > 2 years
+datalog.addRule(JSON.stringify({
+  head: {predicate:'auto_approve', terms:['?App']},
+  body: [
+    {predicate:'application', terms:['?App']},
+    {predicate:'credit_score', terms:['?App','?Credit']},
+    {predicate:'dti', terms:['?App','?DTI']},
+    {predicate:'employment_years', terms:['?App','?Years']}
+    // Note: Numeric comparisons would be handled in production
+  ]
+}));
 
-
-
+const decisions = JSON.parse(evaluateDatalog(datalog));
+console.log('Underwriting decisions:', decisions);
 ```
 
-
-| Feature | Description |
-|---------|-------------|
-| **HDRF Partitioning** | Subject-anchored streaming partitioner minimizes edge cuts |
-| **Raft Consensus** | Leader election, log replication, consistency |
-| **gRPC Communication** | Efficient inter-node query routing |
-| **Shadow Partitions** | Zero-downtime rebalancing (~10ms pause) |
-| **DataFusion OLAP** | Arrow-native analytical queries |
+---
 
-
+## API Reference
 
-
+### GraphDB
 
 ```javascript
-const
-
-
-
-
-
-
-})
+const db = new GraphDB(baseUri) // Create database
+db.loadTtl(turtle, graphUri) // Load Turtle data
+db.querySelect(sparql) // SELECT query → [{bindings}]
+db.queryConstruct(sparql) // CONSTRUCT query → triples
+db.countTriples() // Total triple count
+db.clear() // Clear all data
+db.getVersion() // SDK version
 ```
 
-###
-
-rust-kgdb introduces the **Memory Hypergraph** - a temporal knowledge graph where agent memory is stored in the *same* quad store as your domain knowledge, with hyper-edges connecting episodes to KG entities.
-
-```
-┌─────────────────────────────────────────────────────────────────────────────────┐
-│ MEMORY HYPERGRAPH ARCHITECTURE │
-│ │
-│ ┌─────────────────────────────────────────────────────────────────────────┐ │
-│ │ AGENT MEMORY LAYER (am: graph) │ │
-│ │ │ │
-│ │ Episode:001 Episode:002 Episode:003 │ │
-│ │ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │ │
-│ │ │ Fraud ring │ │ Underwriting │ │ Follow-up │ │ │
-│ │ │ detected in │ │ denied claim │ │ investigation │ │ │
-│ │ │ Provider P001 │ │ from P001 │ │ on P001 │ │ │
-│ │ │ │ │ │ │ │ │ │
-│ │ │ Dec 10, 14:30 │ │ Dec 12, 09:15 │ │ Dec 15, 11:00 │ │ │
-│ │ │ Score: 0.95 │ │ Score: 0.87 │ │ Score: 0.92 │ │ │
-│ │ └───────┬───────┘ └───────┬───────┘ └───────┬───────┘ │ │
-│ │ │ │ │ │ │
-│ └───────────┼─────────────────────────┼─────────────────────────┼─────────┘ │
-│ │ HyperEdge: │ HyperEdge: │ │
-│ │ "QueriedKG" │ "DeniedClaim" │ │
-│ ▼ ▼ ▼ │
-│ ┌─────────────────────────────────────────────────────────────────────────┐ │
-│ │ KNOWLEDGE GRAPH LAYER (domain graph) │ │
-│ │ │ │
-│ │ Provider:P001 ──────────────▶ Claim:C123 ◀────────── Claimant:C001 │ │
-│ │ │ │ │ │ │
-│ │ │ :hasRiskScore │ :amount │ :name │ │
-│ │ ▼ ▼ ▼ │ │
-│ │ "0.87" "50000" "John Doe" │ │
-│ │ │ │
-│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
-│ │ │ SAME QUAD STORE - Single SPARQL query traverses BOTH │ │ │
-│ │ │ memory graph AND knowledge graph! │ │ │
-│ │ └─────────────────────────────────────────────────────────────┘ │ │
-│ │ │ │
-│ └─────────────────────────────────────────────────────────────────────────┘ │
-│ │
-│ ┌─────────────────────────────────────────────────────────────────────────┐ │
-│ │ TEMPORAL SCORING FORMULA │ │
-│ │ │ │
-│ │ Score = α × Recency + β × Relevance + γ × Importance │ │
-│ │ │ │
-│ │ where: │ │
-│ │ Recency = 0.995^hours (12% decay/day) │ │
-│ │ Relevance = cosine_similarity(query, episode) │ │
-│ │ Importance = log10(access_count + 1) / log10(max + 1) │ │
-│ │ │ │
-│ │ Default: α=0.3, β=0.5, γ=0.2 │ │
-│ └─────────────────────────────────────────────────────────────────────────┘ │
-│ │
-└─────────────────────────────────────────────────────────────────────────────────┘
-```
-
-**Without Memory Hypergraph** (LangChain, LlamaIndex):
-```javascript
-// Ask about last week's findings
-agent.chat("What fraud patterns did we find with Provider P001?")
-// Response: "I don't have that information. Could you describe what you're looking for?"
-// Cost: Re-run entire fraud detection pipeline ($5 in API calls, 30 seconds)
-```
+### GraphFrame
 
-**With Memory Hypergraph** (rust-kgdb HyperMind Framework):
 ```javascript
-
-
-
-
-
-
-
-// Returns typed results with linked KG context:
-// {
-// episode: "Episode:001",
-// finding: "Fraud ring detected in Provider P001",
-// kgContext: {
-// provider: "Provider:P001",
-// claims: [{ id: "Claim:C123", amount: 50000 }],
-// riskScore: 0.87
-// },
-// semanticHash: "semhash:fraud-provider-p001-ring-detection"
-// }
+const gf = new GraphFrame(verticesJson, edgesJson)
+gf.pageRank(dampingFactor, iterations) // PageRank scores
+gf.connectedComponents() // Component labels
+gf.triangleCount() // Triangle count
+gf.shortestPaths(sourceId) // Shortest path distances
+gf.find(motifPattern) // Motif pattern matching
 ```
 
-
-
-Same question = Same answer. Even with **different wording**. Critical for compliance.
+### EmbeddingService
 
 ```javascript
-
-
-//
-
-
-
-// Cache HIT - same semantic hash
-
-// Compliance officer: "Why are these identical?"
-// You: "Semantic hashing - same meaning, same output, regardless of phrasing."
-```
-
-**How it works**: Query embeddings are hashed via **Locality-Sensitive Hashing (LSH)** with random hyperplane projections. Semantically similar queries map to the same bucket.
-
-### HyperMind vs MCP (Model Context Protocol)
-
-Why domain-enriched proxies beat generic function calling:
-
-```
-┌───────────────────────┬──────────────────────┬──────────────────────────┐
-│ Feature │ MCP │ HyperMind Proxy │
-├───────────────────────┼──────────────────────┼──────────────────────────┤
-│ Type Safety │ ❌ String only │ ✅ Full type system │
-│ Domain Knowledge │ ❌ Generic │ ✅ Domain-enriched │
-│ Tool Composition │ ❌ Isolated │ ✅ Morphism composition │
-│ Validation │ ❌ Runtime │ ✅ Compile-time │
-│ Security │ ❌ None │ ✅ WASM sandbox │
-│ Audit Trail │ ❌ None │ ✅ Execution witness │
-│ LLM Context │ ❌ Generic schema │ ✅ Rich domain hints │
-│ Capability Control │ ❌ All or nothing │ ✅ Fine-grained caps │
-├───────────────────────┼──────────────────────┼──────────────────────────┤
-│ Result │ 60% accuracy │ 95%+ accuracy │
-└───────────────────────┴──────────────────────┴──────────────────────────┘
+const emb = new EmbeddingService()
+emb.storeVector(entityId, float32Array) // Store embedding
+emb.rebuildIndex() // Build HNSW index
+emb.findSimilar(entityId, k, threshold) // Find similar entities
+emb.onTripleInsert(s, p, o, g) // Update neighbor cache
+emb.getNeighborsOut(entityId) // Get outgoing neighbors
 ```
 
-
-**HyperMind**: LLM selects tools → type system validates → guaranteed correct
+### DatalogProgram
 
 ```javascript
-
-//
-//
-//
-
-// HYPERMIND APPROACH (Domain-enriched proxy)
-// Tool: kg.datalog.infer with fraud rules
-const result = await agent.call('Find collusion patterns')
-// Result: ✅ Type-safe, domain-aware, auditable
+const dl = new DatalogProgram()
+dl.addFact(factJson) // Add fact
+dl.addRule(ruleJson) // Add rule
+evaluateDatalog(dl) // Run evaluation → facts JSON
+queryDatalog(dl, queryJson) // Query specific predicate
 ```
 
-###
+### Pregel
 
-
-
-
-User: "Find all professors"
-
-Vanilla LLM Output:
-┌───────────────────────────────────────────────────────────────────────┐
-│ ```sparql │
-│ PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#> │
-│ SELECT ?professor WHERE { │
-│ ?professor a ub:Faculty . ← WRONG! Schema has "Professor" │
-│ } │
-│ ``` ← Parser rejects markdown │
-│ │
-│ This query retrieves all faculty members from the LUBM dataset. │
-│ ↑ Explanation text breaks parsing │
-└───────────────────────────────────────────────────────────────────────┘
-Result: ❌ PARSER ERROR - Invalid SPARQL syntax
+```javascript
+pregelShortestPaths(graphFrame, sourceId, maxIterations)
+// Returns: distance map from source to all vertices
 ```
 
-
-1. LLM wraps query in markdown code blocks → parser chokes
-2. LLM adds explanation text → mixed with query syntax
-3. LLM hallucinates class names → `ub:Faculty` doesn't exist (it's `ub:Professor`)
-4. LLM has no schema awareness → guesses predicates and classes
-
-**HyperMind fixes all of this** with schema injection and typed tools, achieving **71% accuracy** vs **0% for vanilla LLMs without schema**.
-
-### Competitive Landscape
-
-#### Triple Stores Comparison
-
-| System | Lookup Speed | Memory/Triple | WCOJ | Mobile | AI Framework |
-|--------|-------------|---------------|------|--------|--------------|
-| **rust-kgdb** | **449 ns** | **24 bytes** | ✅ Yes | ✅ Yes | ✅ HyperMind |
-| Tentris | ~5 µs | ~30 bytes | ✅ Yes | ❌ No | ❌ No |
-| RDFox | ~5 µs | 36-89 bytes | ❌ No | ❌ No | ❌ No |
-| AllegroGraph | ~10 µs | 50+ bytes | ❌ No | ❌ No | ❌ No |
-| Virtuoso | ~5 µs | 35-75 bytes | ❌ No | ❌ No | ❌ No |
-| Blazegraph | ~100 µs | 100+ bytes | ❌ No | ❌ No | ❌ No |
-| Apache Jena | 150+ µs | 50-60 bytes | ❌ No | ❌ No | ❌ No |
-| Neo4j | ~5 µs | 70+ bytes | ❌ No | ❌ No | ❌ No |
-| Amazon Neptune | ~5 µs | N/A (managed) | ❌ No | ❌ No | ❌ No |
-
-**Note**: Tentris implements WCOJ (see [ISWC 2025 paper](https://papers.dice-research.org/2025/ISWC_Tentris-WCOJ-Update/public.pdf)). rust-kgdb is the only system combining WCOJ with mobile support and integrated AI framework.
+### Factory Functions
 
-
-
-
-
-
-
-| DSPy | ⚠️ Partial | ❌ No | ❌ No | ❌ No |
-
-**Note**: This compares architectural features. Benchmark (Dec 2025): Schema injection brings all frameworks to ~71% accuracy equally.
-
-```
-┌─────────────────────────────────────────────────────────────────┐
-│ COMPETITIVE LANDSCAPE │
-├─────────────────────────────────────────────────────────────────┤
-│ │
-│ Tentris: WCOJ-optimized, but no mobile or AI framework │
-│ RDFox: Fast commercial, but expensive, no mobile │
-│ AllegroGraph: Enterprise features, but slower, no mobile │
-│ Apache Jena: Great features, but 150+ µs lookups │
-│ Neo4j: Popular, but no SPARQL/RDF standards │
-│ Amazon Neptune: Managed, but cloud-only vendor lock-in │
-│ │
-│ rust-kgdb: 449 ns lookups, WCOJ joins, mobile-native │
-│ Standalone → Clustered on same codebase │
-│ Deterministic planner, audit-ready │
-│ │
-└─────────────────────────────────────────────────────────────────┘
+```javascript
+friendsGraph() // Sample social network
+chainGraph(n) // Linear chain of n vertices
+starGraph(n) // Star topology with n leaves
+completeGraph(n) // Fully connected graph
+cycleGraph(n) // Circular graph
 ```
 
 ---
 
-
-
-Apache 2.0
+Apache 2.0 License
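The STEP 3 walkthrough in the diff above relies on `gf.triangleCount()` to surface the circular payment P001 → P002 → P003 → P001. The diff does not say whether rust-kgdb counts directed or undirected triangles, so what follows is only a plain-JavaScript sketch, independent of the package, that enumerates directed 3-cycles over the same edge list. `findDirectedTriangles` is a hypothetical helper name, not rust-kgdb API.

```javascript
// Hypothetical helper (not part of rust-kgdb): list directed 3-cycles a → b → c → a.
function findDirectedTriangles(edges) {
  // Adjacency sets collapse duplicate edges such as the repeated P001 → P002.
  const adj = new Map();
  for (const { src, dst } of edges) {
    if (!adj.has(src)) adj.set(src, new Set());
    adj.get(src).add(dst);
  }
  const cycles = [];
  for (const [a, outA] of adj) {
    for (const b of outA) {
      for (const c of adj.get(b) ?? []) {
        // Report each cycle once: only the rotation that starts at its smallest vertex id passes.
        if (a < b && a < c && b !== c && adj.get(c)?.has(a)) {
          cycles.push([a, b, c]);
        }
      }
    }
  }
  return cycles;
}

// Same edges as the STEP 3 example (knows, claims-with, and payment links).
const edgeList = [
  { src: 'P001', dst: 'P002' },
  { src: 'P001', dst: 'PROV001' },
  { src: 'P002', dst: 'PROV001' },
  { src: 'P001', dst: 'P002' }, // payment
  { src: 'P002', dst: 'P003' }, // payment
  { src: 'P003', dst: 'P001' }, // payment (circular!)
];

console.log(findDirectedTriangles(edgeList)); // [ [ 'P001', 'P002', 'P003' ] ]
```

Note that the claims-with edges into PROV001 never close a directed cycle, so only the payment loop is reported; an undirected triangle count could treat that graph differently.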
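The `auto_approve` rule in the underwriting example binds `?Credit`, `?DTI`, and `?Years` but, as its inline note says, leaves the numeric comparisons out. One way to apply the thresholds stated in the rule's comment (credit above 700, DTI below 0.43, more than 2 years of employment) is a post-filter in plain JavaScript. This is a sketch that assumes the bound values arrive as the same strings used in the `addFact` calls; `meetsAutoApprove`, `THRESHOLDS`, and the record shape are hypothetical, not part of rust-kgdb.

```javascript
// Hypothetical post-filter for the auto_approve rule; thresholds come from the rule's comment.
const THRESHOLDS = { minCredit: 700, maxDti: 0.43, minYears: 2 };

function meetsAutoApprove({ creditScore, dti, employmentYears }, t = THRESHOLDS) {
  // The example facts store numbers as strings ('720', '0.35', '5'), so convert before comparing.
  return Number(creditScore) > t.minCredit &&
         Number(dti) < t.maxDti &&
         Number(employmentYears) > t.minYears;
}

// Values mirror the APP001 facts from the underwriting example.
const app001 = { creditScore: '720', dti: '0.35', employmentYears: '5' };
console.log(meetsAutoApprove(app001)); // true
```

Keeping the thresholds in a single object makes them easy to audit or adjust without touching the Datalog rule itself.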
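STEP 4 calls `embeddings.findSimilar('CLM001', 5, 0.3)`, but the diff does not state which similarity measure the `0.3` threshold applies to. Assuming cosine similarity (a common choice for vector search, not confirmed by the diff), this plain-JavaScript sketch reuses the README's `mockEmbedding` and scores the two claim descriptions directly. `cosineSimilarity` is a hypothetical helper, not rust-kgdb API.

```javascript
// Copied from the README's STEP 4: a deterministic stand-in for real embeddings.
function mockEmbedding(text) {
  return new Array(384).fill(0).map((_, i) =>
    Math.sin(text.charCodeAt(i % text.length) * 0.1) * 0.5 + 0.5
  );
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const clm001 = mockEmbedding('soft tissue injury rear end');
const clm002 = mockEmbedding('whiplash vehicle accident');
const score = cosineSimilarity(clm001, clm002);

// Compare against the same 0.3 cutoff passed to findSimilar (assumed here to be a cosine threshold).
console.log(score.toFixed(3), score >= 0.3 ? 'similar' : 'not similar');
```

Because every `mockEmbedding` component lies in [0, 1], cosine scores between these mock vectors are never negative, which is one reason the mock is only useful for wiring tests, not for judging real claim similarity.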