rust-kgdb 0.6.16 → 0.6.17
- package/CHANGELOG.md +123 -0
- package/HYPERMIND_BENCHMARK_REPORT.md +32 -35
- package/README.md +324 -3335
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -4,3517 +4,506 @@
|
|
|
4
4
|
[](https://opensource.org/licenses/Apache-2.0)
|
|
5
5
|
[](https://www.w3.org/TR/sparql11-query/)
|
|
6
6
|
|
|
7
|
-
|
|
7
|
+
## AI Answers You Can Trust
|
|
8
8
|
|
|
9
|
-
|
|
9
|
+
**The Problem**: LLMs hallucinate. They make up facts, invent data, and confidently state falsehoods. In regulated industries (finance, healthcare, legal), this is not just annoying—it's a liability.
|
|
10
10
|
|
|
11
|
-
|
|
11
|
+
**The Solution**: HyperMind grounds every AI answer in YOUR actual data. Every response includes a complete audit trail. Same question = Same answer = Same proof.
|
|
12
12
|
|
|
13
13
|
---
|
|
14
14
|
|
|
15
|
-
##
|
|
15
|
+
## Results
|
|
16
16
|
|
|
17
|
-
|
|
17
|
+
| Metric | Vanilla LLM | HyperMind | Improvement |
|
|
18
|
+
|--------|-------------|-----------|-------------|
|
|
19
|
+
| **Accuracy** | 0% | 86.4% | +86.4 pp |
|
|
20
|
+
| **Hallucinations** | 100% | 0% | Eliminated |
|
|
21
|
+
| **Audit Trail** | None | Complete | Full provenance |
|
|
22
|
+
| **Reproducibility** | Random | Deterministic | Same hash |
|
|
18
23
|
|
|
19
|
-
**
|
|
20
|
-
|
|
21
|
-
```
|
|
22
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
23
|
-
│ FROM PROBABILISTIC LLM TO DETERMINISTIC REASONING │
|
|
24
|
-
│ │
|
|
25
|
-
│ USER QUERY ──────────────────────────────────────────────────────────────▶│
|
|
26
|
-
│ "Find suspicious providers" │
|
|
27
|
-
│ │
|
|
28
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
29
|
-
│ │ INTELLIGENCE CONTROL PLANE │ │
|
|
30
|
-
│ │ │ │
|
|
31
|
-
│ │ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │ │
|
|
32
|
-
│ │ │ CONTEXT THEORY │ │ TYPE THEORY │ │ PROOF THEORY │ │ │
|
|
33
|
-
│ │ │ │ │ │ │ │ │ │
|
|
34
|
-
│ │ │ Spivak's Ologs │ │ Hindley-Milner │ │ Curry-Howard │ │ │
|
|
35
|
-
│ │ │ │ │ │ │ │ │ │
|
|
36
|
-
│ │ │ • Schema as │ │ • Typed tool │ │ • Proofs are │ │ │
|
|
37
|
-
│ │ │ Category │ │ signatures │ │ programs │ │ │
|
|
38
|
-
│ │ │ • Morphisms = │ │ • Composition │ │ • Types are │ │ │
|
|
39
|
-
│ │ │ Properties │ │ validation │ │ propositions │ │ │
|
|
40
|
-
│ │ │ • Functors = │ │ • Compile-time │ │ • Derivation │ │ │
|
|
41
|
-
│ │ │ Transforms │ │ rejection │ │ chains │ │ │
|
|
42
|
-
│ │ └───────┬────────┘ └───────┬────────┘ └───────┬────────┘ │ │
|
|
43
|
-
│ │ │ │ │ │ │
|
|
44
|
-
│ │ └────────────────────┼────────────────────┘ │ │
|
|
45
|
-
│ │ │ │ │
|
|
46
|
-
│ │ ▼ │ │
|
|
47
|
-
│ │ ┌────────────────────────────────────────────────────────────┐ │ │
|
|
48
|
-
│ │ │ HYPERMIND AGENT FRAMEWORK │ │ │
|
|
49
|
-
│ │ │ │ │ │
|
|
50
|
-
│ │ │ User Query → LLM Planner → Typed Execution Plan → Tools │ │ │
|
|
51
|
-
│ │ │ │ │ │
|
|
52
|
-
│ │ │ "Find suspicious" → kg.sparql.query → kg.datalog.apply │ │ │
|
|
53
|
-
│ │ │ → kg.embeddings.search → COMBINE │ │ │
|
|
54
|
-
│ │ └────────────────────────────────────────────────────────────┘ │ │
|
|
55
|
-
│ │ │ │ │
|
|
56
|
-
│ │ ▼ │ │
|
|
57
|
-
│ │ ┌────────────────────────────────────────────────────────────┐ │ │
|
|
58
|
-
│ │ │ rust-kgdb ENGINE │ │ │
|
|
59
|
-
│ │ │ │ │ │
|
|
60
|
-
│ │ │ • GraphDB: SPARQL 1.1 + RDF 1.2 + Hypergraph │ │ │
|
|
61
|
-
│ │ │ • GraphFrames: Distributed analytics (no Spark needed) │ │ │
|
|
62
|
-
│ │ │ • Datalog: Semi-naive evaluation + stratified negation │ │ │
|
|
63
|
-
│ │ │ • Embeddings: HNSW + ARCADE 1-hop cache │ │ │
|
|
64
|
-
│ │ └────────────────────────────────────────────────────────────┘ │ │
|
|
65
|
-
│ │ │ │ │
|
|
66
|
-
│ └────────────────────────────────┼────────────────────────────────────┘ │
|
|
67
|
-
│ │ │
|
|
68
|
-
│ ▼ │
|
|
69
|
-
│ ◀──────────────────────────────────────────────────────────────── OUTPUT │
|
|
70
|
-
│ ProofDAG: Cryptographically-signed derivation chain │
|
|
71
|
-
│ Hash: sha256:8f3a2b1c... (Reproducible, Auditable, Verifiable) │
|
|
72
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
73
|
-
```
|
|
74
|
-
|
|
75
|
-
**How Mathematical Foundations Make This Possible**:
|
|
76
|
-
|
|
77
|
-
| Foundation | Role | What It Provides |
|
|
78
|
-
|------------|------|-----------------|
|
|
79
|
-
| **Context Theory** (Spivak's Ologs) | Schema as Category | Automatic schema detection, semantic validation, consistent interpretation |
|
|
80
|
-
| **Type Theory** (Hindley-Milner) | Typed Tool Signatures | Compile-time validation, prevents invalid tool compositions |
|
|
81
|
-
| **Proof Theory** (Curry-Howard) | Proofs = Programs | Every conclusion has a derivation chain, reproducible reasoning |
|
|
82
|
-
| **Category Theory** | Morphism Composition | Tools as morphisms, validated composition, guaranteed well-formedness |
|
|
83
|
-
|
|
84
|
-
**The Three-Layer Stack**:
|
|
85
|
-
|
|
86
|
-
1. **rust-kgdb** (Foundation) - High-performance knowledge graph database
|
|
87
|
-
- 2.78µs lookup speed (35x faster than RDFox)
|
|
88
|
-
- Native Rust, zero-copy semantics, 24 bytes/triple
|
|
89
|
-
|
|
90
|
-
2. **HyperMind Agent** (Execution) - Schema-aware agent framework
|
|
91
|
-
- LLM Planner with schema injection
|
|
92
|
-
- Typed tool composition (kg.sparql.query, kg.datalog.apply, etc.)
|
|
93
|
-
- Memory management (working, episodic, long-term)
|
|
94
|
-
|
|
95
|
-
3. **Intelligence Control Plane** (Orchestration) - Neuro-symbolic integration
|
|
96
|
-
- Mathematical foundations (Context + Type + Proof Theory)
|
|
97
|
-
- ProofDAG generation for auditability
|
|
98
|
-
- Deterministic LLM outputs through symbolic grounding
|
|
99
|
-
|
|
100
|
-
**Result**: Transform any LLM from a "black box" into a **verifiable reasoning system** where every answer comes with mathematical proof of correctness.
|
|
101
|
-
|
|
102
|
-
---
|
|
103
|
-
|
|
104
|
-
## The ProofDAG: Verifiable AI Reasoning
|
|
105
|
-
|
|
106
|
-
Every HyperMind answer comes with a **ProofDAG** - a cryptographically-signed derivation graph that makes LLM outputs auditable and reproducible.
|
|
107
|
-
|
|
108
|
-
```
|
|
109
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
110
|
-
│ PROOFDAG VISUALIZATION │
|
|
111
|
-
│ │
|
|
112
|
-
│ ┌────────────────────────────────┐ │
|
|
113
|
-
│ │ CONCLUSION (Root) │ │
|
|
114
|
-
│ │ │ │
|
|
115
|
-
│ │ "Provider P001 is suspicious"│ │
|
|
116
|
-
│ │ Risk Score: 0.91 │ │
|
|
117
|
-
│ │ Confidence: 94% │ │
|
|
118
|
-
│ │ │ │
|
|
119
|
-
│ └───────────────┬────────────────┘ │
|
|
120
|
-
│ │ │
|
|
121
|
-
│ ┌───────────────┼───────────────┐ │
|
|
122
|
-
│ │ │ │ │
|
|
123
|
-
│ ▼ ▼ ▼ │
|
|
124
|
-
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
|
|
125
|
-
│ │ SPARQL Evidence │ │ Datalog Derived │ │ Embedding Match │ │
|
|
126
|
-
│ │ │ │ │ │ │ │
|
|
127
|
-
│ │ Tool: kg.sparql │ │ Tool: kg.datalog │ │ Tool: embeddings │ │
|
|
128
|
-
│ │ Query: SELECT... │ │ Rule: fraud(?P) │ │ Entity: P001 │ │
|
|
129
|
-
│ │ │ │ :- high_amount, │ │ │ │
|
|
130
|
-
│ │ Result: │ │ rapid_filing │ │ Result: │ │
|
|
131
|
-
│ │ 47 claims found │ │ │ │ 87% similar to │ │
|
|
132
|
-
│ │ Time: 2.3ms │ │ Result: MATCHED │ │ known fraud │ │
|
|
133
|
-
│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │
|
|
134
|
-
│ │
|
|
135
|
-
│ ════════════════════════════════════════════════════════════════ │
|
|
136
|
-
│ PROOF HASH: sha256:8f3a2b1c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a │
|
|
137
|
-
│ TIMESTAMP: 2025-12-15T10:30:00Z │
|
|
138
|
-
│ ════════════════════════════════════════════════════════════════ │
|
|
139
|
-
│ │
|
|
140
|
-
│ VERIFICATION: Anyone can replay this exact derivation and get │
|
|
141
|
-
│ the same conclusion with the same hash │
|
|
142
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
143
|
-
```
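The reproducible proof hash described above can be illustrated with a small sketch. This is not rust-kgdb's implementation: the node shape (`tool`, `input`, `result`, `children`) and the helper names `canonicalize` and `hashNode` are assumptions made for illustration, and the timestamp is deliberately left out of the hashed payload so that a replay yields the same digest.

```javascript
const { createHash } = require('node:crypto')

// Serialize a value with object keys sorted, so logically equal
// inputs always produce the same string.
function canonicalize(value) {
  if (Array.isArray(value)) return '[' + value.map(canonicalize).join(',') + ']'
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value).sort()
    return '{' + keys.map(k => JSON.stringify(k) + ':' + canonicalize(value[k])).join(',') + '}'
  }
  return JSON.stringify(value)
}

// Hash a node together with the hashes of its children (Merkle-style),
// so changing any piece of evidence changes the root hash.
function hashNode(node) {
  const childHashes = (node.children || []).map(hashNode)
  const payload = canonicalize({
    tool: node.tool,
    input: node.input,
    result: node.result,
    children: childHashes
  })
  return 'sha256:' + createHash('sha256').update(payload).digest('hex')
}

// Hypothetical proof graph mirroring the diagram (query text left elided).
const proof = {
  tool: 'conclusion',
  input: 'Provider P001 is suspicious',
  result: { riskScore: 0.91 },
  children: [
    { tool: 'kg.sparql', input: 'SELECT...', result: { claims: 47 } },
    { tool: 'kg.datalog', input: 'fraud(?P) :- high_amount, rapid_filing', result: 'MATCHED' },
    { tool: 'embeddings', input: 'P001', result: { similarity: 0.87 } }
  ]
}

const first = hashNode(proof)
const replay = hashNode(JSON.parse(JSON.stringify(proof)))
console.log(first === replay) // true
```

A digest like this only shows that two derivations are identical. A signature over the digest would be a separate step and is not shown here.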
|
|
144
|
-
|
|
145
|
-
### How ProofDAGs Solve the LLM Evaluation Problem
|
|
146
|
-
|
|
147
|
-
Traditional LLMs have a fundamental problem: **no way to verify correctness**. HyperMind solves this with mathematical proof theory:
|
|
148
|
-
|
|
149
|
-
```
|
|
150
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
151
|
-
│ LLM EVALUATION: THE PROBLEM & SOLUTION │
|
|
152
|
-
│ │
|
|
153
|
-
│ THE PROBLEM WITH VANILLA LLMs: │
|
|
154
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
155
|
-
│ │ User: "Is Provider P001 suspicious?" │ │
|
|
156
|
-
│ │ LLM: "Yes, Provider P001 appears suspicious because..." │ │
|
|
157
|
-
│ │ │ │
|
|
158
|
-
│ │ Questions that CAN'T be answered: │ │
|
|
159
|
-
│ │ ✗ What data did the LLM actually look at? │ │
|
|
160
|
-
│ │ ✗ Did it hallucinate the evidence? │ │
|
|
161
|
-
│ │ ✗ Can we reproduce this answer tomorrow? │ │
|
|
162
|
-
│ │ ✗ How do we audit this decision for regulators? │ │
|
|
163
|
-
│ │ ✗ What's the basis for the confidence score? │ │
|
|
164
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
165
|
-
│ │
|
|
166
|
-
│ HYPERMIND'S SOLUTION: Proof Theory + Type Theory + Category Theory │
|
|
167
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
168
|
-
│ │ │ │
|
|
169
|
-
│ │ TYPE THEORY (Hindley-Milner): │ │
|
|
170
|
-
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
|
|
171
|
-
│ │ │ Every tool has a typed signature: │ │ │
|
|
172
|
-
│ │ │ kg.sparql.query : Query → BindingSet │ │ │
|
|
173
|
-
│ │ │ kg.datalog.apply : RuleSet → InferredFacts │ │ │
|
|
174
|
-
│ │ │ kg.embeddings.search : Entity → SimilarEntities │ │ │
|
|
175
|
-
│ │ │ │ │ │
|
|
176
|
-
│ │ │ LLM must produce plans that TYPE CHECK │ │ │
|
|
177
|
-
│ │ │ Invalid tool composition → compile-time rejection │ │ │
|
|
178
|
-
│ │ └─────────────────────────────────────────────────────────────┘ │ │
|
|
179
|
-
│ │ │ │
|
|
180
|
-
│ │ CATEGORY THEORY (Morphism Composition): │ │
|
|
181
|
-
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
|
|
182
|
-
│ │ │ Tools are morphisms in a category: │ │ │
|
|
183
|
-
│ │ │ │ │ │
|
|
184
|
-
│ │ │ Query ──sparql──→ BindingSet ──datalog──→ InferredFacts │ │ │
|
|
185
|
-
│ │ │ │ │ │
|
|
186
|
-
│ │ │ Composition validated: output(f) = input(g) for f;g │ │ │
|
|
187
|
-
│ │ │ This guarantees well-formed execution plans │ │ │
|
|
188
|
-
│ │ └─────────────────────────────────────────────────────────────┘ │ │
|
|
189
|
-
│ │ │ │
|
|
190
|
-
│ │ PROOF THEORY (Curry-Howard): │ │
|
|
191
|
-
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
|
|
192
|
-
│ │ │ Proofs are Programs, Types are Propositions │ │ │
|
|
193
|
-
│ │ │ │ │ │
|
|
194
|
-
│ │ │ Proposition: "P001 is suspicious" │ │ │
|
|
195
|
-
│ │ │ Proof: ProofDAG with derivation chain │ │ │
|
|
196
|
-
│ │ │ │ │ │
|
|
197
|
-
│ │ │ Γ ⊢ sparql("...") : BindingSet (47 claims) │ │ │
|
|
198
|
-
│ │ │ Γ ⊢ datalog(rules) : InferredFact (fraud matched) │ │ │
|
|
199
|
-
│ │ │ Γ ⊢ embedding(P001) : Similarity (0.87 score) │ │ │
|
|
200
|
-
│ │ │ ────────────────────────────────────────────────────── │ │ │
|
|
201
|
-
│ │ │ Γ ⊢ suspicious(P001) : Conclusion (QED) │ │ │
|
|
202
|
-
│ │ └─────────────────────────────────────────────────────────────┘ │ │
|
|
203
|
-
│ │ │ │
|
|
204
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
205
|
-
│ │
|
|
206
|
-
│ RESULT: LLM outputs become MATHEMATICALLY VERIFIABLE │
|
|
207
|
-
│ ✓ Every claim traced to specific SPARQL results │
|
|
208
|
-
│ ✓ Every inference justified by Datalog rule application │
|
|
209
|
-
│ ✓ Every similarity score backed by embedding computation │
|
|
210
|
-
│ ✓ Deterministic hash enables reproducibility │
|
|
211
|
-
│ ✓ Full audit trail for regulatory compliance │
|
|
212
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
213
|
-
```
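The composition rule in the diagram, `output(f) = input(g)`, can be sketched as a plain runtime check. This is a simplified illustration rather than Hindley-Milner inference or the library's actual validator: the short tool names and the `checkPlan` helper are hypothetical, and the signatures follow the `Query → BindingSet → InferredFacts` chain shown in the category theory box.

```javascript
// Hypothetical signature table (assumption, not the rust-kgdb API).
const SIGNATURES = new Map([
  ['sparql', { input: 'Query', output: 'BindingSet' }],
  ['datalog', { input: 'BindingSet', output: 'InferredFacts' }],
  ['embeddings', { input: 'Entity', output: 'SimilarEntities' }]
])

// Accept a plan only if each step's output type equals the next step's input type.
function checkPlan(steps) {
  if (steps.length === 0) return { ok: false, error: 'empty plan' }
  for (const name of steps) {
    if (!SIGNATURES.has(name)) return { ok: false, error: `unknown tool: ${name}` }
  }
  for (let i = 0; i + 1 < steps.length; i++) {
    const f = SIGNATURES.get(steps[i])
    const g = SIGNATURES.get(steps[i + 1])
    if (f.output !== g.input) {
      return { ok: false, error: `${steps[i]} returns ${f.output} but ${steps[i + 1]} expects ${g.input}` }
    }
  }
  const first = SIGNATURES.get(steps[0])
  const last = SIGNATURES.get(steps[steps.length - 1])
  return { ok: true, type: `${first.input} -> ${last.output}` }
}

console.log(checkPlan(['sparql', 'datalog']))
// { ok: true, type: 'Query -> InferredFacts' }
console.log(checkPlan(['sparql', 'embeddings']).error)
// sparql returns BindingSet but embeddings expects Entity
```

Rejecting the second plan before any tool runs is the point of the check: an ill-typed plan proposed by the LLM never reaches execution.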
|
|
214
|
-
|
|
215
|
-
**LLM Evaluation Metrics Improved by ProofDAGs**:
|
|
216
|
-
|
|
217
|
-
| Metric | Vanilla LLM | HyperMind + ProofDAG | Improvement |
|
|
218
|
-
|--------|-------------|---------------------|-------------|
|
|
219
|
-
| **Factual Accuracy** | ~60% (hallucinations) | 100% (grounded in KG) | +66% |
|
|
220
|
-
| **Reproducibility** | 0% (non-deterministic) | 100% (same hash = same answer) | ∞ |
|
|
221
|
-
| **Auditability** | 0% (black box) | 100% (full derivation chain) | ∞ |
|
|
222
|
-
| **Explainability** | Low (post-hoc) | High (proof witnesses) | +300% |
|
|
223
|
-
| **Regulatory Compliance** | Fails | Passes (GDPR Art. 22, SOX) | Required |
|
|
224
|
-
|
|
225
|
-
---
|
|
226
|
-
|
|
227
|
-
## What rust-kgdb Provides
|
|
228
|
-
|
|
229
|
-
### Core Database
|
|
230
|
-
- **GraphDB** - W3C compliant RDF quad store with SPOC/POCS/OCSP/CSPO indexes
|
|
231
|
-
- **SPARQL 1.1** - Full query and update support (64 builtin functions)
|
|
232
|
-
- **RDF 1.2** - Complete standard implementation
|
|
233
|
-
- **RDF-Star (RDF*)** - Quoted triples for statements about statements
|
|
234
|
-
- **Native Hypergraph** - Beyond RDF triples: n-ary relationships, hyperedges
|
|
235
|
-
|
|
236
|
-
### Data Model: RDF + Hypergraph
|
|
237
|
-
|
|
238
|
-
```
|
|
239
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
240
|
-
│ DATA MODEL COMPARISON │
|
|
241
|
-
│ │
|
|
242
|
-
│ TRADITIONAL RDF: HYPERGRAPH (rust-kgdb native): │
|
|
243
|
-
│ ┌─────────────────────┐ ┌─────────────────────────────────┐ │
|
|
244
|
-
│ │ Subject → Object │ │ Hyperedge connects N nodes │ │
|
|
245
|
-
│ │ (binary relation) │ │ (n-ary relation) │ │
|
|
246
|
-
│ │ │ │ │ │
|
|
247
|
-
│ │ A ──pred──→ B │ │ A ──┐ │ │
|
|
248
|
-
│ │ │ │ │ │ │
|
|
249
|
-
│ │ │ │ B ──┼── hyperedge ──→ D │ │
|
|
250
|
-
│ │ │ │ │ │ │
|
|
251
|
-
│ │ │ │ C ──┘ │ │
|
|
252
|
-
│ └─────────────────────┘ └─────────────────────────────────┘ │
|
|
253
|
-
│ │
|
|
254
|
-
│ RDF-Star (Quoted Triples): Memory Hypergraph (Agent Memory): │
|
|
255
|
-
│ ┌─────────────────────┐ ┌─────────────────────────────────┐ │
|
|
256
|
-
│ │ << A :knows B >> │ │ Episode links to N KG entities │ │
|
|
257
|
-
│ │ :certainty │ │ │ │
|
|
258
|
-
│ │ 0.95 │ │ Episode:001 ──→ Provider:P001 │ │
|
|
259
|
-
│ │ │ │ ──→ Claim:C123 │ │
|
|
260
|
-
│ │ (statement about │ │ ──→ Claimant:C001 │ │
|
|
261
|
-
│ │ a statement) │ │ │ │
|
|
262
|
-
│ └─────────────────────┘ └─────────────────────────────────┘ │
|
|
263
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
264
|
-
```
|
|
265
|
-
|
|
266
|
-
**RDF-Star Example** (metadata on statements):
|
|
267
|
-
```javascript
|
|
268
|
-
const db = new GraphDB('http://example.org/')
|
|
269
|
-
|
|
270
|
-
// Load RDF-Star data - quoted triples with metadata
|
|
271
|
-
db.loadTtl(`
|
|
272
|
-
@prefix : <http://example.org/> . @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
|
|
273
|
-
|
|
274
|
-
# Standard triple
|
|
275
|
-
:alice :knows :bob .
|
|
276
|
-
|
|
277
|
-
# RDF-Star: statement about a statement
|
|
278
|
-
<< :alice :knows :bob >> :certainty 0.95 ;
|
|
279
|
-
:source :linkedin ;
|
|
280
|
-
:validUntil "2025-12-31"^^xsd:date .
|
|
281
|
-
`, null)
|
|
282
|
-
|
|
283
|
-
// Query metadata about statements
|
|
284
|
-
const results = db.querySelect(`
|
|
285
|
-
PREFIX : <http://example.org/>
|
|
286
|
-
SELECT ?certainty ?source WHERE {
|
|
287
|
-
<< :alice :knows :bob >> :certainty ?certainty ;
|
|
288
|
-
:source ?source .
|
|
289
|
-
}
|
|
290
|
-
`)
|
|
291
|
-
// Returns: [{ certainty: "0.95", source: "http://example.org/linkedin" }]
|
|
292
|
-
```
|
|
293
|
-
|
|
294
|
-
**Native Hypergraph Use Cases**:
|
|
295
|
-
|
|
296
|
-
| Use Case | Why Hypergraph | RDF Workaround |
|
|
297
|
-
|----------|---------------|----------------|
|
|
298
|
-
| **Event participation** | Event links N participants directly | Reification (verbose) |
|
|
299
|
-
| **Document authorship** | Paper links N co-authors | Multiple triples |
|
|
300
|
-
| **Chemical reactions** | Reaction links N compounds | Named graphs |
|
|
301
|
-
| **Agent memory** | Episode links N entities investigated | Blank nodes |
|
|
302
|
-
|
|
303
|
-
**Hyperedge in Memory Ontology**:
|
|
304
|
-
```turtle
|
|
305
|
-
@prefix am: <http://hypermind.ai/memory#> .
|
|
306
|
-
@prefix ins: <http://insurance.org/> .
|
|
307
|
-
|
|
308
|
-
# Hyperedge: Episode links to multiple KG entities
|
|
309
|
-
<episode:001> a am:Episode ;
|
|
310
|
-
am:linksToEntity ins:Provider_P001 ; # N-ary link
|
|
311
|
-
am:linksToEntity ins:Claim_C123 ; # N-ary link
|
|
312
|
-
am:linksToEntity ins:Claimant_C001 ; # N-ary link
|
|
313
|
-
am:prompt "Investigate fraud ring" .
|
|
314
|
-
```
|
|
315
|
-
|
|
316
|
-
### Graph Analytics (GraphFrames)
|
|
317
|
-
- **PageRank** - Iterative ranking algorithm
|
|
318
|
-
- **Connected Components** - Union-find based component detection
|
|
319
|
-
- **Shortest Paths** - Landmark-based path finding
|
|
320
|
-
- **Triangle Count** - Graph density measurement
|
|
321
|
-
- **Motif Finding** - Pattern matching DSL (e.g., `"(a)-[e1]->(b); (b)-[e2]->(c)"`)
|
|
322
|
-
- **Label Propagation** - Community detection
|
|
323
|
-
- **Pregel API** - Bulk Synchronous Parallel computation model
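As an illustration of the union-find approach mentioned for Connected Components, here is a minimal plain-JavaScript sketch. It does not call rust-kgdb; the function name `connectedComponents` and the sample data are assumptions, and every edge is assumed to reference a listed vertex.

```javascript
// Group vertices into connected components using union-find.
// Edges are treated as undirected.
function connectedComponents(vertexIds, edges) {
  const parent = new Map(vertexIds.map(id => [id, id]))

  // Follow parent links to the root, shortening the path as we go (path halving).
  const find = (x) => {
    while (parent.get(x) !== x) {
      parent.set(x, parent.get(parent.get(x)))
      x = parent.get(x)
    }
    return x
  }

  // Union the two endpoints of every edge.
  for (const { src, dst } of edges) {
    const a = find(src)
    const b = find(dst)
    if (a !== b) parent.set(a, b)
  }

  // Collect vertices by their root.
  const groups = new Map()
  for (const id of vertexIds) {
    const root = find(id)
    if (!groups.has(root)) groups.set(root, [])
    groups.get(root).push(id)
  }
  return [...groups.values()]
}

const components = connectedComponents(
  ['alice', 'bob', 'charlie', 'david', 'erin', 'frank'],
  [
    { src: 'alice', dst: 'bob' },
    { src: 'bob', dst: 'charlie' },
    { src: 'erin', dst: 'frank' }
  ]
)
console.log(components)
// [ [ 'alice', 'bob', 'charlie' ], [ 'david' ], [ 'erin', 'frank' ] ]
```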
|
|
324
|
-
|
|
325
|
-
### Why GraphFrames + SQL over SPARQL?
|
|
326
|
-
|
|
327
|
-
SPARQL excels at graph pattern matching but struggles with analytical workloads. GraphFrames bridges this gap: your data stays in RDF, but analytics run on Apache Arrow columnar format for 10-100x faster execution.
|
|
328
|
-
|
|
329
|
-
**SPARQL vs GraphFrames Comparison**:
|
|
330
|
-
|
|
331
|
-
| Use Case | SPARQL | GraphFrames | Winner |
|
|
332
|
-
|----------|--------|-------------|--------|
|
|
333
|
-
| **Simple Pattern Match** | `SELECT ?s ?o WHERE { ?s :knows ?o }` | `graph.find("(a)-[:knows]->(b)")` | SPARQL (simpler) |
|
|
334
|
-
| **Aggregation (1M rows)** | `SELECT (COUNT(?x) as ?c) GROUP BY ?g` - 850ms | `df.groupBy("g").count()` - 12ms | **GraphFrames (70x)** |
|
|
335
|
-
| **Window Function** | Not supported natively | `RANK() OVER (PARTITION BY dept ORDER BY salary)` | **GraphFrames** |
|
|
336
|
-
| **Running Total** | Requires SPARQL 1.1 subqueries | `SUM(amount) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING)` | **GraphFrames** |
|
|
337
|
-
| **Top-K per Group** | Complex nested queries | `ROW_NUMBER() OVER (PARTITION BY category) <= 10` | **GraphFrames** |
|
|
338
|
-
| **Percentiles** | Not supported | `PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY latency)` | **GraphFrames** |
|
|
339
|
-
| **Export to Parquet** | Not supported | Native Apache Arrow integration | **GraphFrames** |
|
|
340
|
-
| **BI Tool Integration** | Limited | Direct connection via Arrow Flight | **GraphFrames** |
|
|
341
|
-
|
|
342
|
-
**Concrete Examples**:
|
|
343
|
-
|
|
344
|
-
```javascript
|
|
345
|
-
// SPARQL: Count claims by provider (takes 850ms on 1M rows)
|
|
346
|
-
const sparqlResult = db.querySelect(`
|
|
347
|
-
SELECT ?provider (COUNT(?claim) as ?count)
|
|
348
|
-
WHERE { ?claim :provider ?provider }
|
|
349
|
-
GROUP BY ?provider
|
|
350
|
-
ORDER BY DESC(?count)
|
|
351
|
-
LIMIT 10
|
|
352
|
-
`)
|
|
353
|
-
|
|
354
|
-
// GraphFrames: Same query (takes 12ms on 1M rows - 70x faster)
|
|
355
|
-
const gfResult = graph.sql(`
|
|
356
|
-
SELECT provider, COUNT(*) as claim_count
|
|
357
|
-
FROM edges
|
|
358
|
-
WHERE relationship = 'provider'
|
|
359
|
-
GROUP BY provider
|
|
360
|
-
ORDER BY claim_count DESC
|
|
361
|
-
LIMIT 10
|
|
362
|
-
`)
|
|
363
|
-
|
|
364
|
-
// GraphFrames: Window functions (impossible in SPARQL)
|
|
365
|
-
const ranked = graph.sql(`
|
|
366
|
-
SELECT
|
|
367
|
-
provider,
|
|
368
|
-
claim_amount,
|
|
369
|
-
RANK() OVER (PARTITION BY region ORDER BY claim_amount DESC) as region_rank,
|
|
370
|
-
SUM(claim_amount) OVER (PARTITION BY provider ORDER BY claim_date) as running_total,
|
|
371
|
-
AVG(claim_amount) OVER (PARTITION BY provider ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) as moving_avg
|
|
372
|
-
FROM claims
|
|
373
|
-
`)
|
|
374
|
-
|
|
375
|
-
// GraphFrames: Percentile analysis (impossible in SPARQL)
|
|
376
|
-
const percentiles = graph.sql(`
|
|
377
|
-
SELECT
|
|
378
|
-
provider,
|
|
379
|
-
PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY claim_amount) as median,
|
|
380
|
-
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY claim_amount) as p95,
|
|
381
|
-
PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY claim_amount) as p99
|
|
382
|
-
FROM claims
|
|
383
|
-
GROUP BY provider
|
|
384
|
-
`)
|
|
385
|
-
```
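To make the window-function semantics concrete, the sketch below computes in plain JavaScript what `SUM(claim_amount) OVER (PARTITION BY provider ORDER BY claim_date)` would return. It does not use `graph.sql`; the helper name `runningTotals` and the sample rows are assumptions, and it assumes no two claims of one provider share a date, since SQL would treat tied rows as peers.

```javascript
// Generic three-way comparison for strings.
const cmp = (x, y) => (x < y ? -1 : x > y ? 1 : 0)

// Running total per provider, ordered by claim date.
function runningTotals(claims) {
  const sorted = [...claims].sort(
    (a, b) => cmp(a.provider, b.provider) || cmp(a.claim_date, b.claim_date)
  )
  const totals = new Map() // provider -> sum so far
  return sorted.map(row => {
    const total = (totals.get(row.provider) || 0) + row.claim_amount
    totals.set(row.provider, total)
    return { ...row, running_total: total }
  })
}

const rows = runningTotals([
  { provider: 'P001', claim_date: '2025-01-03', claim_amount: 100 },
  { provider: 'P002', claim_date: '2025-01-01', claim_amount: 50 },
  { provider: 'P001', claim_date: '2025-01-01', claim_amount: 200 },
  { provider: 'P001', claim_date: '2025-01-02', claim_amount: 50 }
])
console.log(rows.map(r => `${r.provider} ${r.claim_date} ${r.running_total}`))
// [ 'P001 2025-01-01 200', 'P001 2025-01-02 250', 'P001 2025-01-03 350', 'P002 2025-01-01 50' ]
```

Each output row keeps its own identity and gains an aggregate over its partition, which is what distinguishes a window function from `GROUP BY`.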
|
|
386
|
-
|
|
387
|
-
**When to Use Each**:
|
|
388
|
-
|
|
389
|
-
| Scenario | Recommendation | Reason |
|
|
390
|
-
|----------|---------------|--------|
|
|
391
|
-
| Graph traversal (friends-of-friends) | SPARQL | Property path syntax is cleaner |
|
|
392
|
-
| Pattern matching (fraud rings) | SPARQL or Motif | Both support cyclic patterns |
|
|
393
|
-
| Large aggregations | GraphFrames | Columnar execution is 10-100x faster |
|
|
394
|
-
| Window functions | GraphFrames | Not available in SPARQL |
|
|
395
|
-
| Export/BI integration | GraphFrames | Native Parquet/Arrow support |
|
|
396
|
-
| Schema inference | SPARQL | CONSTRUCT queries for RDF generation |
|
|
397
|
-
|
|
398
|
-
### OLAP Analytics Engine
|
|
399
|
-
|
|
400
|
-
rust-kgdb provides high-performance OLAP analytics over graph data:
|
|
401
|
-
|
|
402
|
-
```
|
|
403
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
404
|
-
│ OLAP ANALYTICS STACK │
|
|
405
|
-
│ │
|
|
406
|
-
│ ┌─────────────────────────────────────────────────────────────────────────┐│
|
|
407
|
-
│ │ GraphFrame API ││
|
|
408
|
-
│ │ graph.pageRank(), graph.connectedComponents(), graph.find(pattern) ││
|
|
409
|
-
│ └─────────────────────────────────────────────────────────────────────────┘│
|
|
410
|
-
│ ↓ │
|
|
411
|
-
│ ┌─────────────────────────────────────────────────────────────────────────┐│
|
|
412
|
-
│ │ Query Optimization Layer ││
|
|
413
|
-
│ │ - Predicate pushdown ││
|
|
414
|
-
│ │ - Join reordering ││
|
|
415
|
-
│ │ - WCOJ for cyclic queries ││
|
|
416
|
-
│ └─────────────────────────────────────────────────────────────────────────┘│
|
|
417
|
-
│ ↓ │
|
|
418
|
-
│ ┌─────────────────────────────────────────────────────────────────────────┐│
|
|
419
|
-
│ │ Columnar Execution Engine ││
|
|
420
|
-
│ │ - Vectorized operations ││
|
|
421
|
-
│ │ - Cache-optimized memory layout ││
|
|
422
|
-
│ │ - SIMD acceleration ││
|
|
423
|
-
│ └─────────────────────────────────────────────────────────────────────────┘│
|
|
424
|
-
│ ↓ │
|
|
425
|
-
│ ┌─────────────────────────────────────────────────────────────────────────┐│
|
|
426
|
-
│ │ GraphFrame (Vertices + Edges) ││
|
|
427
|
-
│ │ - vertices: id, properties ││
|
|
428
|
-
│ │ - edges: src, dst, relationship ││
|
|
429
|
-
│ └─────────────────────────────────────────────────────────────────────────┘│
|
|
430
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
431
|
-
```
|
|
432
|
-
|
|
433
|
-
**Graph Algorithms**:
|
|
434
|
-
|
|
435
|
-
| Algorithm | Complexity | Use Case |
|
|
436
|
-
|-----------|------------|----------|
|
|
437
|
-
| **PageRank** | O(E × iterations) | Influence ranking, fraud detection |
|
|
438
|
-
| **Connected Components** | O(V + E) | Cluster detection, entity resolution |
|
|
439
|
-
| **Shortest Paths** | O(V + E) | Path finding, relationship distance |
|
|
440
|
-
| **Triangle Count** | O(E^1.5) | Graph density, community structure |
|
|
441
|
-
| **Label Propagation** | O(E × iterations) | Community detection |
|
|
442
|
-
| **Motif Finding** | O(pattern-dependent) | Pattern matching, fraud rings |
|
|
443
|
-
|
|
444
|
-
**No Apache Spark Required**: Unlike traditional graph analytics that require separate Spark clusters, rust-kgdb includes a **native distributed OLAP engine** built on Apache Arrow columnar format. GraphFrames, Pregel, and all analytics run directly in your rust-kgdb cluster without additional infrastructure.
|
|
445
|
-
|
|
446
|
-
---
|
|
447
|
-
|
|
448
|
-
## Deep Dive: Pregel BSP (Bulk Synchronous Parallel)
|
|
449
|
-
|
|
450
|
-
**What is Pregel?**
|
|
451
|
-
|
|
452
|
-
Pregel is Google's **vertex-centric graph processing model**. Instead of thinking about edges, you think about vertices that:
|
|
453
|
-
1. **Receive** messages from neighbors
|
|
454
|
-
2. **Compute** based on messages and local state
|
|
455
|
-
3. **Send** messages to neighbors
|
|
456
|
-
4. **Vote to halt** when done
|
|
457
|
-
|
|
458
|
-
```
|
|
459
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
460
|
-
│ PREGEL: BULK SYNCHRONOUS PARALLEL │
|
|
461
|
-
│ │
|
|
462
|
-
│ Traditional vs Pregel Thinking: │
|
|
463
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
464
|
-
│ │ TRADITIONAL (edge-centric): PREGEL (vertex-centric): │ │
|
|
465
|
-
│ │ for each edge (u, v): for each vertex v in parallel: │ │
|
|
466
|
-
│ │ process(u, v) msgs = receive() │ │
|
|
467
|
-
│ │ v.state = compute(msgs) │ │
|
|
468
|
-
│ │ Problem: Hard to parallelize send(neighbors, newMsg) │ │
|
|
469
|
-
│ │ if done: voteToHalt() │ │
|
|
470
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
471
|
-
│ │
|
|
472
|
-
│ SUPERSTEP EXECUTION: │
|
|
473
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
474
|
-
│ │ │ │
|
|
475
|
-
│ │ Superstep 0 Superstep 1 Superstep 2 HALT │ │
|
|
476
|
-
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌────┐ │ │
|
|
477
|
-
│ │ │ A: init │───────→│ A: recv │───────→│ A: recv │───────→│ A:✓│ │ │
|
|
478
|
-
│ │ │ B: init │───────→│ B: recv │───────→│ B: recv │───────→│ B:✓│ │ │
|
|
479
|
-
│ │ │ C: init │───────→│ C: recv │───────→│ C: recv │───────→│ C:✓│ │ │
|
|
480
|
-
│ │ └─────────┘ └─────────┘ └─────────┘ └────┘ │ │
|
|
481
|
-
│ │ │ │ │ │ │
|
|
482
|
-
│ │ ▼ ▼ ▼ │ │
|
|
483
|
-
│ │ BARRIER BARRIER BARRIER DONE │ │
|
|
484
|
-
│ │ (all sync) (all sync) (all sync) │ │
|
|
485
|
-
│ │ │ │
|
|
486
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
487
|
-
│ │
|
|
488
|
-
│ KEY INSIGHT: Vertices process in PARALLEL, synchronize at BARRIERS │
|
|
489
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
490
|
-
```
|
|
491
|
-
|
|
492
|
-
**Pregel Shortest Paths Example**:
|
|
493
|
-
|
|
494
|
-
```javascript
|
|
495
|
-
const { pregelShortestPaths, GraphFrame } = require('rust-kgdb')
|
|
496
|
-
|
|
497
|
-
// Create a weighted graph
|
|
498
|
-
const graph = new GraphFrame(
|
|
499
|
-
JSON.stringify([
|
|
500
|
-
{ id: 'A' }, { id: 'B' }, { id: 'C' }, { id: 'D' }, { id: 'E' }
|
|
501
|
-
]),
|
|
502
|
-
JSON.stringify([
|
|
503
|
-
{ src: 'A', dst: 'B', weight: 1 },
|
|
504
|
-
{ src: 'A', dst: 'C', weight: 4 },
|
|
505
|
-
{ src: 'B', dst: 'C', weight: 2 },
|
|
506
|
-
{ src: 'B', dst: 'D', weight: 5 },
|
|
507
|
-
{ src: 'C', dst: 'D', weight: 1 },
|
|
508
|
-
{ src: 'D', dst: 'E', weight: 3 }
|
|
509
|
-
])
|
|
510
|
-
)
|
|
511
|
-
|
|
512
|
-
// Find shortest paths from landmarks A and B to all vertices
|
|
513
|
-
const distances = pregelShortestPaths(graph, ['A', 'B'])
|
|
514
|
-
console.log('Shortest distances:', JSON.parse(distances))
|
|
515
|
-
// Output:
|
|
516
|
-
// {
|
|
517
|
-
// "A": { "from_A": 0, "from_B": 1 },
|
|
518
|
-
// "B": { "from_A": 1, "from_B": 0 },
|
|
519
|
-
// "C": { "from_A": 3, "from_B": 2 },
|
|
520
|
-
// "D": { "from_A": 4, "from_B": 3 },
|
|
521
|
-
// "E": { "from_A": 7, "from_B": 6 }
|
|
522
|
-
// }
|
|
523
|
-
```
|
|
524
|
-
|
|
525
|
-
**How Pregel Shortest Paths Works**:
|
|
526
|
-
|
|
527
|
-
```
|
|
528
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
529
|
-
│ PREGEL SHORTEST PATHS EXECUTION │
|
|
530
|
-
│ │
|
|
531
|
-
│ Graph: A─1→B─2→C─1→D─3→E │
|
|
532
|
-
│ └──4──┘ │
|
|
533
|
-
│ │
|
|
534
|
-
│ SUPERSTEP 0 (Initialize): │
|
|
535
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
536
|
-
│ │ A.dist = 0 (source) │ │
|
|
537
|
-
│ │ B.dist = ∞ │ │
|
|
538
|
-
│ │ C.dist = ∞ │ │
|
|
539
|
-
│ │ D.dist = ∞ │ │
|
|
540
|
-
│ │ E.dist = ∞ │ │
|
|
541
|
-
│ │ A sends: (B, 1), (C, 4) │ │
|
|
542
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
543
|
-
│ │
|
|
544
|
-
│ SUPERSTEP 1 (Process A's messages): │
|
|
545
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
546
|
-
│ │ B receives (B, 1) → B.dist = min(∞, 1) = 1 │ │
|
|
547
|
-
│ │ C receives (C, 4) → C.dist = min(∞, 4) = 4 │ │
|
|
548
|
-
│ │ B sends: (C, 1+2=3), (D, 1+5=6) │ │
|
|
549
|
-
│ │ C sends: (D, 4+1=5) │ │
|
|
550
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
551
|
-
│ │
|
|
552
|
-
│ SUPERSTEP 2 (Process B, C messages): │
|
|
553
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
554
|
-
│ │ C receives (C, 3) → C.dist = min(4, 3) = 3 ← IMPROVED! │ │
|
|
555
|
-
│ │ D receives (D, 6), (D, 5) → D.dist = min(∞, 5) = 5 │ │
|
|
556
|
-
│ │ C sends: (D, 3+1=4) ← Propagate improvement │ │
|
|
557
|
-
│ │ D sends: (E, 5+3=8) │ │
|
|
558
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
559
|
-
│ │
|
|
560
|
-
│ SUPERSTEP 3: │
|
|
561
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
562
|
-
│ │ D receives (D, 4) → D.dist = min(5, 4) = 4 ← IMPROVED! │ │
|
|
563
|
-
│ │ E receives (E, 8) → E.dist = min(∞, 8) = 8 │ │
|
|
564
|
-
│ │ D sends: (E, 4+3=7) ← Propagate improvement │ │
|
|
565
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
566
|
-
│ │
|
|
567
|
-
│ SUPERSTEP 4: │
|
|
568
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
569
|
-
│ │ E receives (E, 7) → E.dist = min(8, 7) = 7 ← FINAL │ │
|
|
570
|
-
│ │ No new improvements → All vertices vote to halt │ │
|
|
571
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
572
|
-
│ │
|
|
573
|
-
│ RESULT: A=0, B=1, C=3, D=4, E=7 │
|
|
574
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
575
|
-
```
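The superstep trace above can be reproduced with a small single-process simulation. This is a sketch of the vertex-centric model only, not the library's `pregelShortestPaths`: the function name `pregelShortestPathsSketch`, its parameters, and its return shape are assumptions made for illustration.

```javascript
// Simulate Pregel-style single-source shortest paths.
// Each loop iteration is one superstep; swapping inbox for outbox is the barrier.
function pregelShortestPathsSketch(vertexIds, edges, source) {
  const dist = new Map(vertexIds.map(id => [id, Infinity]))
  let inbox = new Map([[source, [0]]]) // superstep 0: only the source is active
  let supersteps = 0

  while (inbox.size > 0) {
    const outbox = new Map()
    for (const [v, msgs] of inbox) {
      const best = Math.min(...msgs)
      if (best < dist.get(v)) {
        // Improvement found: update state and notify out-neighbors.
        dist.set(v, best)
        for (const e of edges) {
          if (e.src !== v) continue
          if (!outbox.has(e.dst)) outbox.set(e.dst, [])
          outbox.get(e.dst).push(best + e.weight)
        }
      }
      // No improvement: the vertex sends nothing, i.e. it votes to halt.
    }
    inbox = outbox // barrier: messages become visible in the next superstep
    supersteps++
  }
  return { distances: Object.fromEntries(dist), supersteps }
}

const result = pregelShortestPathsSketch(
  ['A', 'B', 'C', 'D', 'E'],
  [
    { src: 'A', dst: 'B', weight: 1 },
    { src: 'A', dst: 'C', weight: 4 },
    { src: 'B', dst: 'C', weight: 2 },
    { src: 'B', dst: 'D', weight: 5 },
    { src: 'C', dst: 'D', weight: 1 },
    { src: 'D', dst: 'E', weight: 3 }
  ],
  'A'
)
console.log(result.distances)  // { A: 0, B: 1, C: 3, D: 4, E: 7 }
console.log(result.supersteps) // 5 (supersteps 0 through 4)
```

The computation stops when a superstep produces no messages, which corresponds to every vertex having voted to halt.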
|
|
576
|
-
|
|
577
|
-
**Pregel vs Other Approaches**:
|
|
578
|
-
|
|
579
|
-
| Approach | Pros | Cons | When to Use |
|
|
580
|
-
|----------|------|------|-------------|
|
|
581
|
-
| **Pregel (BSP)** | Simple model, automatic parallelism | Barrier overhead | Iterative algorithms |
|
|
582
|
-
| **GraphX (Spark)** | Mature ecosystem | Requires Spark cluster | Already using Spark |
|
|
583
|
-
| **Native (rust-kgdb)** | Zero dependencies, fastest | Less mature | Production deployment |
|
|
584
|
-
| **MapReduce** | Fault tolerant | High latency | Batch processing |
|
|
585
|
-
|
|
586
|
-
**Algorithms Built on Pregel in rust-kgdb**:
|
|
587
|
-
|
|
588
|
-
| Algorithm | Supersteps | Message Type | Use Case |
|
|
589
|
-
|-----------|------------|--------------|----------|
|
|
590
|
-
| **Shortest Paths** | O(diameter) | (vertex, distance) | Route finding |
|
|
591
|
-
| **PageRank** | 20 (typical) | (vertex, rank contribution) | Influence ranking |
|
|
592
|
-
| **Connected Components** | O(diameter) | (vertex, component_id) | Cluster detection |
|
|
593
|
-
| **Label Propagation** | O(log n) | (vertex, label) | Community detection |
|
|
594
|
-
|
|
595
|
-
---
|
|
596
|
-
|
|
597
|
-
**GraphFrame Example - Degrees & Analytics**:
|
|
598
|
-
```javascript
|
|
599
|
-
const { GraphFrame, friendsGraph } = require('rust-kgdb')
|
|
600
|
-
|
|
601
|
-
// Create graph from vertices and edges
|
|
602
|
-
const graph = new GraphFrame(
|
|
603
|
-
JSON.stringify([
|
|
604
|
-
{ id: 'alice' }, { id: 'bob' }, { id: 'charlie' }, { id: 'david' }
|
|
605
|
-
]),
|
|
606
|
-
JSON.stringify([
|
|
607
|
-
{ src: 'alice', dst: 'bob' },
|
|
608
|
-
{ src: 'alice', dst: 'charlie' },
|
|
609
|
-
{ src: 'bob', dst: 'charlie' },
|
|
610
|
-
{ src: 'charlie', dst: 'david' }
|
|
611
|
-
])
|
|
612
|
-
)
|
|
613
|
-
|
|
614
|
-
// Degree analysis
|
|
615
|
-
const degrees = JSON.parse(graph.degrees())
|
|
616
|
-
console.log('Degrees:', degrees)
|
|
617
|
-
// Output: { alice: { in: 0, out: 2 }, bob: { in: 1, out: 1 }, charlie: { in: 2, out: 1 }, david: { in: 1, out: 0 } }
|
|
618
|
-
|
|
619
|
-
// PageRank (fraud detection: who has most influence?)
|
|
620
|
-
const pagerank = JSON.parse(graph.pageRank(0.85, 20))
|
|
621
|
-
console.log('PageRank:', pagerank)
|
|
622
|
-
// Example output (illustrative; exact scores depend on the implementation's normalization): { alice: 0.15, bob: 0.21, charlie: 0.38, david: 0.26 }
|
|
623
|
-
|
|
624
|
-
// Triangle count (graph density)
|
|
625
|
-
console.log('Triangles:', graph.triangleCount()) // 1
|
|
626
|
-
|
|
627
|
-
// Motif finding (pattern matching)
|
|
628
|
-
const patterns = JSON.parse(graph.find('(a)-[e1]->(b); (b)-[e2]->(c)'))
|
|
629
|
-
console.log('Chain patterns:', patterns)
|
|
630
|
-
// Finds: alice→bob→charlie, alice→charlie→david, bob→charlie→david
|
|
631
|
-
```
|
|
632
|
-
|
|
633
|
-
### Query Optimizations
|
|
634
|
-
|
|
635
|
-
**WCOJ (Worst-Case Optimal Join)**:
|
|
636
|
-
```
|
|
637
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
638
|
-
│ WCOJ vs TRADITIONAL JOIN │
|
|
639
|
-
│ │
|
|
640
|
-
│ Query: Find triangles (a)→(b)→(c)→(a) │
|
|
641
|
-
│ │
|
|
642
|
-
│ TRADITIONAL (Hash Join): WCOJ (Leapfrog Triejoin): │
|
|
643
|
-
│ ┌─────────────────────────┐ ┌─────────────────────────┐ │
|
|
644
|
-
│ │ Step 1: Join(E1, E2) │ │ Intersect iterators │ │
|
|
645
|
-
│ │ O(n²) worst │ │ on sorted indexes │ │
|
|
646
|
-
│ │ Step 2: Join(result, E3)│ │ │ │
|
|
647
|
-
│  │         O(n²) worst     │      │ O(n^w) guaranteed       │        │
|
|
648
|
-
│ │ │ │ w = fractional edge │ │
|
|
649
|
-
│  │ Total: O(n²) possible   │      │ cover number            │        │
|
|
650
|
-
│ └─────────────────────────┘ └─────────────────────────┘ │
|
|
651
|
-
│ │
|
|
652
|
-
│  For cyclic queries (fraud rings!), WCOJ is asymptotically faster           │
|
|
653
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
654
|
-
```
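The core idea in the right-hand box, intersecting sorted adjacency lists instead of materializing a pairwise join, can be sketched in plain JavaScript. This is a simplified illustration and not a full Leapfrog Triejoin or rust-kgdb's implementation; `intersectSorted` and `findDirectedTriangles` are hypothetical helper names.

```javascript
// Two-pointer intersection of two sorted arrays
function intersectSorted(xs, ys) {
  const out = []
  let i = 0
  let j = 0
  while (i < xs.length && j < ys.length) {
    if (xs[i] === ys[j]) { out.push(xs[i]); i++; j++ }
    else if (xs[i] < ys[j]) i++
    else j++
  }
  return out
}

// Enumerate directed triangles a→b→c→a from [src, dst] pairs (no duplicate edges assumed)
function findDirectedTriangles(edgeList) {
  const outAdj = new Map()
  const inAdj = new Map()
  for (const [s, d] of edgeList) {
    if (!outAdj.has(s)) outAdj.set(s, [])
    if (!inAdj.has(d)) inAdj.set(d, [])
    outAdj.get(s).push(d)
    inAdj.get(d).push(s)
  }
  for (const list of [...outAdj.values(), ...inAdj.values()]) list.sort()
  const triangles = []
  for (const [a, bs] of outAdj) {
    for (const b of bs) {
      // c must satisfy b→c and c→a: intersect out(b) with in(a)
      const cs = intersectSorted(outAdj.get(b) || [], inAdj.get(a) || [])
      for (const c of cs) {
        // Report each cycle once, starting from its smallest vertex
        if (a < b && a < c) triangles.push([a, b, c])
      }
    }
  }
  return triangles
}

const rings = findDirectedTriangles([
  ['account_A', 'account_B'],
  ['account_B', 'account_C'],
  ['account_C', 'account_A'],
  ['account_D', 'account_A']
])
console.log(rings)
```

No intermediate two-hop table is built: for each edge a→b, only candidates for c that close the cycle are ever touched.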
|
|
655
|
-
|
|
656
|
-
**Sparse Matrix Representations** (for Datalog reasoning):
|
|
657
|
-
|
|
658
|
-
| Format | Structure | Best For |
|
|
659
|
-
|--------|-----------|----------|
|
|
660
|
-
| **CSR** (Compressed Sparse Row) | Row pointers + column indices | Forward traversal (S→P→O) |
|
|
661
|
-
| **CSC** (Compressed Sparse Column) | Column pointers + row indices | Backward traversal (O→P→S) |
|
|
662
|
-
| **COO** (Coordinate) | (row, col, val) tuples | Incremental updates |
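
To make the formats in the table concrete, here is a small sketch that converts COO tuples into CSR and reads back a row's neighbors. The helper names `cooToCsr` and `neighbors`, and the integer vertex numbering, are assumptions for illustration; rust-kgdb's internal layout may differ.

```javascript
// Build CSR (row pointers + column indices) from COO [row, col] pairs
function cooToCsr(numRows, coo) {
  const rowPtr = new Array(numRows + 1).fill(0)
  for (const [r] of coo) rowPtr[r + 1]++                // count entries per row
  for (let r = 0; r < numRows; r++) rowPtr[r + 1] += rowPtr[r] // prefix sum
  const colIdx = new Array(coo.length)
  const cursor = rowPtr.slice(0, numRows)               // next write slot per row
  for (const [r, c] of coo) colIdx[cursor[r]++] = c
  return { rowPtr, colIdx }
}

// Forward traversal: all columns stored for row r
function neighbors(csr, r) {
  return csr.colIdx.slice(csr.rowPtr[r], csr.rowPtr[r + 1])
}

// alice=0, bob=1, charlie=2, david=3
const coo = [[0, 1], [0, 2], [1, 2], [2, 3]]
const csr = cooToCsr(4, coo)
console.log(csr, neighbors(csr, 0))
```

CSC is the same construction with rows and columns swapped, which is why it suits backward traversal.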
|
|
663
|
-
|
|
664
|
-
**Semi-Naive Datalog Evaluation**:
|
|
665
|
-
```
|
|
666
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
667
|
-
│ SEMI-NAIVE OPTIMIZATION │
|
|
668
|
-
│ │
|
|
669
|
-
│ Naive: Each iteration re-evaluates ALL rules on ALL facts │
|
|
670
|
-
│ Semi-Naive: Only evaluate rules on NEW facts from previous iteration │
|
|
671
|
-
│ │
|
|
672
|
-
│ Iteration 1: Δ¹ = immediate consequences of base facts │
|
|
673
|
-
│ Iteration 2: Δ² = rules applied to Δ¹ only (not base facts again) │
|
|
674
|
-
│ ... │
|
|
675
|
-
│ Fixpoint: When Δⁿ = ∅ │
|
|
676
|
-
│ │
|
|
677
|
-
│ Speedup: O(n) → O(Δ) per iteration │
|
|
678
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
679
|
-
```
|
|
680
|
-
|
|
681
|
-
**Index Structures**:
|
|
682
|
-
|
|
683
|
-
| Index | Pattern | Lookup Time |
|
|
684
|
-
|-------|---------|-------------|
|
|
685
|
-
| **SPOC** | Subject-Predicate-Object-Context | O(1) exact match |
|
|
686
|
-
| **POCS** | Predicate-Object-Context-Subject | O(1) reverse lookup |
|
|
687
|
-
| **OCSP** | Object-Context-Subject-Predicate | O(1) object queries |
|
|
688
|
-
| **CSPO** | Context-Subject-Predicate-Object | O(1) named graph queries |
|
|
689
|
-
|
|
690
|
-
### Distributed GraphDB Cluster (v0.2.0)
|
|
691
|
-
|
|
692
|
-
Production-ready distributed architecture for billion-triple scale:
|
|
693
|
-
|
|
694
|
-
```
|
|
695
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
696
|
-
│ DISTRIBUTED CLUSTER ARCHITECTURE │
|
|
697
|
-
│ │
|
|
698
|
-
│ ┌─────────────────────────────────────────────────────────────────────────┐│
|
|
699
|
-
│ │ COORDINATOR NODE ││
|
|
700
|
-
│ │ - Query routing & optimization ││
|
|
701
|
-
│ │ - HDRF partition assignment ││
|
|
702
|
-
│ │ - Result aggregation ││
|
|
703
|
-
│  │  - Raft consensus leader (planned)                                      ││
|
|
704
|
-
│ └──────────────────────────────┬──────────────────────────────────────────┘│
|
|
705
|
-
│ │ gRPC │
|
|
706
|
-
│ ┌──────────────────────┼──────────────────────┐ │
|
|
707
|
-
│ │ │ │ │
|
|
708
|
-
│ ▼ ▼ ▼ │
|
|
709
|
-
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
|
710
|
-
│ │ EXECUTOR 0 │ │ EXECUTOR 1 │ │ EXECUTOR 2 │ │
|
|
711
|
-
│ │ │ │ │ │ │ │
|
|
712
|
-
│ │ Partition 0 │ │ Partition 1 │ │ Partition 2 │ │
|
|
713
|
-
│ │ Partition 3 │ │ Partition 4 │ │ Partition 5 │ │
|
|
714
|
-
│ │ │ │ │ │ │ │
|
|
715
|
-
│ │ RocksDB/LMDB │ │ RocksDB/LMDB │ │ RocksDB/LMDB │ │
|
|
716
|
-
│ └──────────────┘ └──────────────┘ └──────────────┘ │
|
|
717
|
-
│ │
|
|
718
|
-
│ HDRF Partitioning: High-degree vertices replicated for load balancing │
|
|
719
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
720
|
-
```
|
|
721
|
-
|
|
722
|
-
**HDRF (High-Degree-Replicated-First) Partitioning**:
|
|
723
|
-
- Streaming edge partitioner - O(1) assignment decisions
|
|
724
|
-
- High-degree vertices (hubs) replicated across partitions
|
|
725
|
-
- Minimizes cross-partition communication
|
|
726
|
-
- Subject-anchored: all triples for a subject on same partition
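
A streaming partitioner of this kind can be sketched as a single pass that scores each partition per edge. The code below is a simplified illustration in the spirit of HDRF (a replication term that favors the partition already holding the lower-degree endpoint, plus a balance term weighted by `lambda`); it is not rust-kgdb's implementation, and `hdrfPartition` and its parameters are hypothetical.

```javascript
function hdrfPartition(edges, numPartitions, lambda = 1.0) {
  const epsilon = 1e-9
  const degree = new Map()   // partial degree seen so far
  const replicas = new Map() // vertex -> Set of partitions holding a copy
  const sizes = new Array(numPartitions).fill(0)
  const assignment = []

  // Replication score: reward partitions that already hold v,
  // more strongly when v is the lower-degree endpoint
  const g = (v, other, p) => {
    if (!replicas.has(v) || !replicas.get(v).has(p)) return 0
    const theta = degree.get(v) / (degree.get(v) + degree.get(other))
    return 1 + (1 - theta)
  }

  for (const [u, v] of edges) {
    degree.set(u, (degree.get(u) || 0) + 1)
    degree.set(v, (degree.get(v) || 0) + 1)
    const maxSize = Math.max(...sizes)
    const minSize = Math.min(...sizes)
    let best = 0
    let bestScore = -Infinity
    for (let p = 0; p < numPartitions; p++) {
      const rep = g(u, v, p) + g(v, u, p)
      const bal = lambda * (maxSize - sizes[p]) / (epsilon + maxSize - minSize)
      if (rep + bal > bestScore) { bestScore = rep + bal; best = p }
    }
    assignment.push(best)
    sizes[best]++
    for (const x of [u, v]) {
      if (!replicas.has(x)) replicas.set(x, new Set())
      replicas.get(x).add(best)
    }
  }
  return { assignment, sizes, replicas }
}

// A hub with six spokes, three partitions, balance weight 2
const star = ['s1', 's2', 's3', 's4', 's5', 's6'].map(s => ['hub', s])
const parts = hdrfPartition(star, 3, 2)
console.log(parts.sizes, [...parts.replicas.get('hub')])
```

In this run the hub ends up replicated on every partition while each spoke lives on exactly one, which is the behavior the bullets describe. Each decision uses only counters maintained so far, so the cost per edge is independent of graph size.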
|
|
727
|
-
|
|
728
|
-
**Deployment** (Kubernetes):
|
|
729
|
-
```bash
|
|
730
|
-
# Deploy cluster via Helm
|
|
731
|
-
helm install rust-kgdb ./infra/helm -n rust-kgdb --create-namespace
|
|
732
|
-
|
|
733
|
-
# Scale executors
|
|
734
|
-
kubectl scale deployment rust-kgdb-executor --replicas=5 -n rust-kgdb
|
|
735
|
-
```
|
|
736
|
-
|
|
737
|
-
**Storage Backends**:
|
|
738
|
-
| Backend | Persistence | Use Case |
|
|
739
|
-
|---------|-------------|----------|
|
|
740
|
-
| **InMemory** | None | Development, testing |
|
|
741
|
-
| **RocksDB** | LSM-tree | Write-heavy workloads |
|
|
742
|
-
| **LMDB** | B+tree, mmap | Read-heavy workloads |
|
|
743
|
-
|
|
744
|
-
### Distributed Cluster (v0.2.0)
|
|
745
|
-
- **HDRF Partitioning** - High-Degree-Replicated-First streaming partitioner
|
|
746
|
-
- **Coordinator + Executors** - gRPC-based query distribution
|
|
747
|
-
- **Raft Consensus** - Distributed coordination (planned)
|
|
748
|
-
- **Kubernetes Native** - Helm charts included
|
|
749
|
-
|
|
750
|
-
### AI & Embeddings
|
|
751
|
-
- **EmbeddingService** - HNSW approximate nearest neighbor search
|
|
752
|
-
- **1-Hop ARCADE Cache** - Neighbor-aware embedding retrieval
|
|
753
|
-
- **Multiple Providers** - OpenAI, Ollama, Anthropic, or custom
|
|
754
|
-
|
|
755
|
-
### Reasoning
|
|
756
|
-
- **Datalog** - Semi-naive rule evaluation with stratified negation (distributed-ready)
|
|
757
|
-
- **HyperMindAgent** - Pattern-based intent classification (no LLM calls)
|
|
758
|
-
|
|
759
|
-
---
|
|
760
|
-
|
|
761
|
-
## Deep Dive: Motif Pattern Matching
|
|
762
|
-
|
|
763
|
-
**What is Motif Finding?**
|
|
764
|
-
|
|
765
|
-
Motif finding is a **graph pattern search** that returns every subgraph matching a specified pattern. Unlike SPARQL, which matches RDF triple patterns, motif finding uses a compact DSL designed for relationship analysis.
|
|
766
|
-
|
|
767
|
-
```
|
|
768
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
769
|
-
│ MOTIF vs SPARQL: WHEN TO USE EACH │
|
|
770
|
-
│ │
|
|
771
|
-
│ SPARQL (RDF Triple Patterns): MOTIF (Graph Pattern DSL): │
|
|
772
|
-
│ ┌─────────────────────────────┐ ┌─────────────────────────────┐ │
|
|
773
|
-
│ │ SELECT ?a ?b ?c WHERE { │ │ "(a)-[e1]->(b); (b)-[e2]->(c)" │
|
|
774
|
-
│ │ ?a :knows ?b . │ │ │ │
|
|
775
|
-
│ │ ?b :knows ?c . │ │ More readable for complex │ │
|
|
776
|
-
│ │ } │ │ multi-hop patterns │ │
|
|
777
|
-
│ └─────────────────────────────┘ └─────────────────────────────┘ │
|
|
778
|
-
│ │
|
|
779
|
-
│ SPARQL is better for: MOTIF is better for: │
|
|
780
|
-
│ • RDF data with named predicates • Relationship chains │
|
|
781
|
-
│ • FILTER expressions • Cyclic patterns (fraud rings) │
|
|
782
|
-
│ • OPTIONAL patterns • Subgraph matching │
|
|
783
|
-
│ • Aggregation (COUNT, GROUP BY) • Visual pattern specification │
|
|
784
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
785
|
-
```
|
|
786
|
-
|
|
787
|
-
**Motif Pattern Syntax**:
|
|
788
|
-
|
|
789
|
-
| Pattern | Meaning | Example Match |
|
|
790
|
-
|---------|---------|---------------|
|
|
791
|
-
| `(a)-[e]->(b)` | a has edge e to b | alice→bob |
|
|
792
|
-
| `(a)-[e1]->(b); (b)-[e2]->(c)` | Chain: a→b→c | alice→bob→charlie |
|
|
793
|
-
| `(a)-[e1]->(b); (a)-[e2]->(c)` | Fork: a→b and a→c | alice→bob, alice→charlie |
|
|
794
|
-
| `(a)-[e1]->(b); (b)-[e2]->(a)` | **Cycle**: a→b→a | Mutual relationship (fraud ring) |
|
|
795
|
-
| `(a)-[e1]->(b); (b)-[e2]->(c); (c)-[e3]->(a)` | **Triangle** | Classic fraud pattern |
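
To show what these patterns mean operationally, here is a naive matcher in plain JavaScript that extends variable bindings one pattern term at a time. It is an illustrative sketch, not the `GraphFrame.find` implementation; `parseMotif` and `findMotif` are hypothetical names, edge variables are parsed but not bound, and different variables may bind to the same vertex.

```javascript
// Parse "(a)-[e1]->(b); (b)-[e2]->(c)" into terms
function parseMotif(pattern) {
  return pattern.split(';').map(part => {
    const m = part.trim().match(/^\((\w+)\)-\[(\w+)\]->\((\w+)\)$/)
    if (!m) throw new Error(`unsupported motif term: ${part}`)
    return { srcVar: m[1], edgeVar: m[2], dstVar: m[3] }
  })
}

// Return all variable bindings that satisfy every term
function findMotif(edges, pattern) {
  let bindings = [{}]
  for (const { srcVar, dstVar } of parseMotif(pattern)) {
    const extended = []
    for (const b of bindings) {
      for (const e of edges) {
        if (srcVar in b && b[srcVar] !== e.src) continue
        if (dstVar in b && b[dstVar] !== e.dst) continue
        if (srcVar === dstVar && e.src !== e.dst) continue
        extended.push({ ...b, [srcVar]: e.src, [dstVar]: e.dst })
      }
    }
    bindings = extended
  }
  return bindings
}

const friendEdges = [
  { src: 'alice', dst: 'bob' },
  { src: 'alice', dst: 'charlie' },
  { src: 'bob', dst: 'charlie' },
  { src: 'charlie', dst: 'david' }
]
const chainMatches = findMotif(friendEdges, '(a)-[e1]->(b); (b)-[e2]->(c)')
const cycleMatches = findMotif(friendEdges, '(a)-[e1]->(b); (b)-[e2]->(a)')
console.log(chainMatches.length, cycleMatches.length)
```

This nested-loop version scans every edge for every partial binding; an indexed engine would look up candidate edges by the already-bound endpoint instead. Note that a matcher like this reports each rotation of a cycle as a separate binding.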
|
|
796
|
-
|
|
797
|
-
**Fraud Ring Detection with Motif**:
|
|
798
|
-
|
|
799
|
-
```javascript
|
|
800
|
-
const { GraphFrame } = require('rust-kgdb')
|
|
801
|
-
|
|
802
|
-
// Build transaction graph
|
|
803
|
-
const txGraph = new GraphFrame(
|
|
804
|
-
JSON.stringify([
|
|
805
|
-
{ id: 'account_A' }, { id: 'account_B' },
|
|
806
|
-
{ id: 'account_C' }, { id: 'account_D' }
|
|
807
|
-
]),
|
|
808
|
-
JSON.stringify([
|
|
809
|
-
{ src: 'account_A', dst: 'account_B', relationship: 'transfer', amount: 50000 },
|
|
810
|
-
{ src: 'account_B', dst: 'account_C', relationship: 'transfer', amount: 49500 },
|
|
811
|
-
{ src: 'account_C', dst: 'account_A', relationship: 'transfer', amount: 49000 }, // CYCLE!
|
|
812
|
-
{ src: 'account_D', dst: 'account_A', relationship: 'transfer', amount: 1000 } // Normal
|
|
813
|
-
])
|
|
814
|
-
)
|
|
815
|
-
|
|
816
|
-
// Find triangular money flows (classic money laundering pattern)
|
|
817
|
-
const triangles = txGraph.find('(a)-[e1]->(b); (b)-[e2]->(c); (c)-[e3]->(a)')
|
|
818
|
-
console.log('Suspicious triangles:', JSON.parse(triangles))
|
|
819
|
-
// Output: [{ a: 'account_A', b: 'account_B', c: 'account_C', ... }]
|
|
820
|
-
|
|
821
|
-
// Find 3-hop chains (structuring detection)
|
|
822
|
-
const chains = txGraph.find('(a)-[e1]->(b); (b)-[e2]->(c); (c)-[e3]->(d)')
|
|
823
|
-
console.log('Long chains:', JSON.parse(chains))
|
|
824
|
-
```
|
|
825
|
-
|
|
826
|
-
**Performance Characteristics**:
|
|
827
|
-
|
|
828
|
-
| Pattern Type | Complexity | Notes |
|
|
829
|
-
|--------------|------------|-------|
|
|
830
|
-
| Simple edge `(a)->(b)` | O(E) | Linear scan |
|
|
831
|
-
| 2-hop chain `(a)->(b)->(c)` | O(E × avg_degree) | Index-assisted |
|
|
832
|
-
| Triangle `(a)->(b)->(c)->(a)` | O(E^1.5) | WCOJ optimization |
|
|
833
|
-
| 4-clique | O(E²) worst | Uses worst-case optimal joins |
|
|
834
|
-
|
|
835
|
-
---
|
|
836
|
-
|
|
837
|
-
## Deep Dive: Datalog Rule Engine
|
|
838
|
-
|
|
839
|
-
**What is Datalog?**
|
|
840
|
-
|
|
841
|
-
Datalog is a **declarative logic programming language** for expressing recursive queries. Unlike a plain SPARQL pattern match, which only returns bindings for data that is already stored, Datalog can **derive new facts** from existing facts using rules.
|
|
842
|
-
|
|
843
|
-
```
|
|
844
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
845
|
-
│ DATALOG: RULE-BASED REASONING │
|
|
846
|
-
│ │
|
|
847
|
-
│ FACTS (What we know): │
|
|
848
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
849
|
-
│ │ parent(alice, bob). % Alice is parent of Bob │ │
|
|
850
|
-
│ │ parent(bob, charlie). % Bob is parent of Charlie │ │
|
|
851
|
-
│ │ parent(charlie, diana). % Charlie is parent of Diana │ │
|
|
852
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
853
|
-
│ │
|
|
854
|
-
│ RULES (How to derive new facts): │
|
|
855
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
856
|
-
│ │ ancestor(X, Y) :- parent(X, Y). % Direct parent │ │
|
|
857
|
-
│ │ ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y). % Recursive! │ │
|
|
858
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
859
|
-
│ │
|
|
860
|
-
│ DERIVED FACTS (Automatically computed): │
|
|
861
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
862
|
-
│ │ ancestor(alice, bob). % From rule 1 │ │
|
|
863
|
-
│ │ ancestor(bob, charlie). % From rule 1 │ │
|
|
864
|
-
│ │ ancestor(alice, charlie). % From rule 2: alice→bob→charlie │ │
|
|
865
|
-
│ │ ancestor(alice, diana). % From rule 2: alice→bob→charlie→diana │ │
|
|
866
|
-
│ │ ancestor(bob, diana). % From rule 2: bob→charlie→diana │ │
|
|
867
|
-
│ │ ancestor(charlie, diana). % From rule 1 │ │
|
|
868
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
869
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
870
|
-
```
|
|
871
|
-
|
|
872
|
-
### Semi-Naive Evaluation (Performance Optimization)
|
|
873
|
-
|
|
874
|
-
**What is Semi-Naive?**
|
|
875
|
-
|
|
876
|
-
When evaluating recursive rules, the naive approach re-evaluates ALL rules on ALL facts every iteration. Semi-naive only evaluates rules on **newly derived facts** from the previous iteration.
|
|
877
|
-
|
|
878
|
-
```
|
|
879
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
880
|
-
│ NAIVE vs SEMI-NAIVE EVALUATION │
|
|
881
|
-
│ │
|
|
882
|
-
│ Rule: ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y). │
|
|
883
|
-
│ Base: 3 parent facts │
|
|
884
|
-
│ │
|
|
885
|
-
│ NAIVE APPROACH: SEMI-NAIVE APPROACH: │
|
|
886
|
-
│ ┌─────────────────────────┐ ┌─────────────────────────┐ │
|
|
887
|
-
│ │ Iter 1: 3×3 = 9 checks │ │ Iter 1: 3 new ancestors │ │
|
|
888
|
-
│ │ Iter 2: 6×6 = 36 checks │ │ Iter 2: only check Δ¹ │ │
|
|
889
|
-
│ │ Iter 3: 9×9 = 81 checks │ │ Iter 3: only check Δ² │ │
|
|
890
|
-
│  │ ...quadratic growth     │      │ ...linear in new facts  │        │
|
|
891
|
-
│ └─────────────────────────┘ └─────────────────────────┘ │
|
|
892
|
-
│ │
|
|
893
|
-
│ Mathematical notation: │
|
|
894
|
-
│ Δⁿ = facts derived in iteration n │
|
|
895
|
-
│ Semi-naive: only join base facts with Δⁿ⁻¹ (not entire fact set) │
|
|
896
|
-
│ │
|
|
897
|
-
│ Speedup: O(n²) → O(n × Δ) where Δ << n │
|
|
898
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
899
|
-
```
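The delta-driven loop can be written out directly for the `ancestor` rules. This is a minimal sketch of semi-naive evaluation for this one linear rule, not rust-kgdb's engine; `ancestorsSemiNaive` and the string key encoding are assumptions made for illustration.

```javascript
function ancestorsSemiNaive(parentFacts) {
  const key = (x, y) => `${x}|${y}`
  const all = new Set()
  const deltaSizes = []

  // Δ¹: ancestor(X, Y) :- parent(X, Y)
  let delta = []
  for (const [x, y] of parentFacts) {
    if (!all.has(key(x, y))) { all.add(key(x, y)); delta.push([x, y]) }
  }

  while (delta.length > 0) {
    deltaSizes.push(delta.length)
    const nextDelta = []
    // ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y)
    // Only ancestor facts from the previous delta are joined with parent
    for (const [z, y] of delta) {
      for (const [x, z2] of parentFacts) {
        if (z2 === z && !all.has(key(x, y))) {
          all.add(key(x, y))
          nextDelta.push([x, y])
        }
      }
    }
    delta = nextDelta // fixpoint when no new facts were derived
  }
  return { facts: all, deltaSizes }
}

const derived = ancestorsSemiNaive([
  ['alice', 'bob'],
  ['bob', 'charlie'],
  ['charlie', 'diana']
])
console.log([...derived.facts], derived.deltaSizes)
```

Each round joins the base `parent` facts only with the facts derived in the previous round, so work shrinks as the deltas shrink.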
|
|
900
|
-
|
|
901
|
-
### Stratified Negation (Safe Negation in Rules)
|
|
902
|
-
|
|
903
|
-
**What is Stratified Negation?**
|
|
904
|
-
|
|
905
|
-
Negation in Datalog is tricky: `not fraud(X)` means "X is not proven to be fraud". But what if the rule deriving `fraud(X)` hasn't run yet? Stratification solves this by:
|
|
906
|
-
|
|
907
|
-
1. **Ordering rules into strata** - Rules with negation run AFTER the rules they negate
|
|
908
|
-
2. **Computing each stratum to fixpoint** - Before moving to the next
|
|
909
|
-
|
|
910
|
-
```
|
|
911
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
912
|
-
│ STRATIFIED NEGATION │
|
|
913
|
-
│ │
|
|
914
|
-
│ Problem: When can we evaluate "not fraud(X)"? │
|
|
915
|
-
│ │
|
|
916
|
-
│ UNSTRATIFIED (WRONG): │
|
|
917
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
918
|
-
│ │ safe(X) :- claim(X), not fraud(X). % Safe if not fraud │ │
|
|
919
|
-
│ │ fraud(X) :- claim(X), high_amount(X).% Fraud if high amount │ │
|
|
920
|
-
│ │ │ │
|
|
921
|
-
│ │ If we evaluate safe(X) before fraud(X) is computed, │ │
|
|
922
|
-
│ │ we get WRONG results (everything looks safe!) │ │
|
|
923
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
924
|
-
│ │
|
|
925
|
-
│ STRATIFIED (CORRECT): │
|
|
926
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
927
|
-
│ │ STRATUM 1: Compute all positive facts │ │
|
|
928
|
-
│ │ fraud(X) :- claim(X), high_amount(X). ← Run first! │ │
|
|
929
|
-
│ │ │ │
|
|
930
|
-
│ │ STRATUM 2: Now negation is safe │ │
|
|
931
|
-
│ │ safe(X) :- claim(X), not fraud(X). ← Run after stratum 1 │ │
|
|
932
|
-
│ │ │ │
|
|
933
|
-
│ │ Dependency graph: safe depends on NOT fraud, so fraud must be │ │
|
|
934
|
-
│ │ fully computed before safe can be evaluated. │ │
|
|
935
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
936
|
-
│ │
|
|
937
|
-
│ rust-kgdb automatically stratifies your rules! │
|
|
938
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
939
|
-
```
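Stratum assignment can be computed from the rule dependencies alone. The sketch below uses the standard iterative scheme: a head must sit at least as high as each positive body predicate and strictly higher than each negated one. The rule shape (`head`, `body`, `negated`) and the `stratify` helper are hypothetical, not rust-kgdb's API.

```javascript
function stratify(rules) {
  const preds = new Set()
  for (const r of rules) {
    preds.add(r.head)
    for (const b of r.body) preds.add(b.pred)
  }
  const stratum = new Map()
  for (const p of preds) stratum.set(p, 1)

  let changed = true
  while (changed) {
    changed = false
    for (const r of rules) {
      for (const b of r.body) {
        // Negated dependencies must be fully computed in a lower stratum
        const required = stratum.get(b.pred) + (b.negated ? 1 : 0)
        if (stratum.get(r.head) < required) {
          if (required > preds.size) {
            throw new Error('not stratifiable: negation inside a recursive cycle')
          }
          stratum.set(r.head, required)
          changed = true
        }
      }
    }
  }
  return stratum
}

const strata = stratify([
  { head: 'safe', body: [{ pred: 'claim' }, { pred: 'fraud', negated: true }] },
  { head: 'fraud', body: [{ pred: 'claim' }, { pred: 'high_amount' }] }
])
console.log(Object.fromEntries(strata))
```

A program such as `p :- not q. q :- not p.` has no valid ordering, and the sketch rejects it once a stratum number exceeds the number of predicates.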
|
|
940
|
-
|
|
941
|
-
### Datalog in Distributed Mode
|
|
942
|
-
|
|
943
|
-
**Distributed Datalog Execution**: rust-kgdb's Datalog engine works in distributed clusters:
|
|
944
|
-
|
|
945
|
-
```
|
|
946
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
947
|
-
│ DISTRIBUTED DATALOG EXECUTION │
|
|
948
|
-
│ │
|
|
949
|
-
│ COORDINATOR │
|
|
950
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
951
|
-
│ │ 1. Parse Datalog program │ │
|
|
952
|
-
│ │ 2. Stratify rules (compute dependency order) │ │
|
|
953
|
-
│ │ 3. For each stratum: │ │
|
|
954
|
-
│ │ a. Broadcast rules to all executors │ │
|
|
955
|
-
│ │ b. Each executor evaluates on local partition │ │
|
|
956
|
-
│ │ c. Exchange facts at partition boundaries (shuffle) │ │
|
|
957
|
-
│ │ d. Repeat until global fixpoint │ │
|
|
958
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
959
|
-
│ │ │
|
|
960
|
-
│ ┌───────────────┼───────────────┐ │
|
|
961
|
-
│ ▼ ▼ ▼ │
|
|
962
|
-
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
|
|
963
|
-
│ │ EXECUTOR 0 │ │ EXECUTOR 1 │ │ EXECUTOR 2 │ │
|
|
964
|
-
│ │ │ │ │ │ │ │
|
|
965
|
-
│ │ Local facts │ │ Local facts │ │ Local facts │ │
|
|
966
|
-
│ │ + Rules │ │ + Rules │ │ + Rules │ │
|
|
967
|
-
│ │ = Local Δ │ │ = Local Δ │ │ = Local Δ │ │
|
|
968
|
-
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
|
|
969
|
-
│ │ │ │ │
|
|
970
|
-
│ └───────────────┼───────────────┘ │
|
|
971
|
-
│ ▼ │
|
|
972
|
-
│ FACT EXCHANGE │
|
|
973
|
-
│ (hash-partitioned shuffle) │
|
|
974
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
975
|
-
```
|
|
976
|
-
|
|
977
|
-
**Complete Datalog Example**:
|
|
978
|
-
|
|
979
|
-
```javascript
|
|
980
|
-
const { DatalogProgram, evaluateDatalog, queryDatalog } = require('rust-kgdb')
|
|
981
|
-
|
|
982
|
-
const program = new DatalogProgram()
|
|
983
|
-
|
|
984
|
-
// Add base facts (from your knowledge graph)
|
|
985
|
-
program.addFact(JSON.stringify({ predicate: 'claim', terms: ['CLM001'] }))
|
|
986
|
-
program.addFact(JSON.stringify({ predicate: 'claim', terms: ['CLM002'] }))
|
|
987
|
-
program.addFact(JSON.stringify({ predicate: 'claim', terms: ['CLM003'] }))
|
|
988
|
-
program.addFact(JSON.stringify({ predicate: 'amount', terms: ['CLM001', '150000'] }))
|
|
989
|
-
program.addFact(JSON.stringify({ predicate: 'amount', terms: ['CLM002', '500'] }))
|
|
990
|
-
program.addFact(JSON.stringify({ predicate: 'amount', terms: ['CLM003', '200000'] }))
|
|
991
|
-
program.addFact(JSON.stringify({ predicate: 'provider', terms: ['CLM001', 'PROV_A'] }))
|
|
992
|
-
program.addFact(JSON.stringify({ predicate: 'provider', terms: ['CLM003', 'PROV_A'] }))
|
|
993
|
-
|
|
994
|
-
// Define rules (NICB fraud patterns)
|
|
995
|
-
// Rule 1: High amount claims (> $100,000) are suspicious
|
|
996
|
-
program.addRule(JSON.stringify({
|
|
997
|
-
head: { predicate: 'high_amount', terms: ['?C'] },
|
|
998
|
-
body: [
|
|
999
|
-
{ predicate: 'claim', terms: ['?C'] },
|
|
1000
|
-
{ predicate: 'amount', terms: ['?C', '?A'] },
|
|
1001
|
-
{ predicate: 'gt', terms: ['?A', '100000'] } // Built-in comparison
|
|
1002
|
-
]
|
|
1003
|
-
}))
|
|
1004
|
-
|
|
1005
|
-
// Rule 2: Providers with multiple high-amount claims need investigation
|
|
1006
|
-
program.addRule(JSON.stringify({
|
|
1007
|
-
head: { predicate: 'investigate_provider', terms: ['?P'] },
|
|
1008
|
-
body: [
|
|
1009
|
-
{ predicate: 'high_amount', terms: ['?C1'] },
|
|
1010
|
-
{ predicate: 'high_amount', terms: ['?C2'] },
|
|
1011
|
-
{ predicate: 'provider', terms: ['?C1', '?P'] },
|
|
1012
|
-
{ predicate: 'provider', terms: ['?C2', '?P'] },
|
|
1013
|
-
{ predicate: 'neq', terms: ['?C1', '?C2'] } // Different claims
|
|
1014
|
-
]
|
|
1015
|
-
}))
|
|
1016
|
-
|
|
1017
|
-
// Evaluate to fixpoint (semi-naive, stratified)
|
|
1018
|
-
const allFacts = JSON.parse(evaluateDatalog(program))
|
|
1019
|
-
console.log('Derived facts:', allFacts)
|
|
1020
|
-
// Includes: high_amount(CLM001), high_amount(CLM003), investigate_provider(PROV_A)
|
|
1021
|
-
|
|
1022
|
-
// Query specific predicate
|
|
1023
|
-
const toInvestigate = JSON.parse(queryDatalog(program, 'investigate_provider'))
|
|
1024
|
-
console.log('Providers to investigate:', toInvestigate)
|
|
1025
|
-
// Output: [{ predicate: 'investigate_provider', terms: ['PROV_A'] }]
|
|
1026
|
-
```
|
|
1027
|
-
|
|
1028
|
-
---
|
|
1029
|
-
|
|
1030
|
-
## Deep Dive: ARCADE 1-Hop Cache
|
|
1031
|
-
|
|
1032
|
-
**What is ARCADE?**
|
|
1033
|
-
|
|
1034
|
-
ARCADE (Adaptive Retrieval Cache for Approximate Dense Embeddings) is a caching strategy that improves embedding retrieval by **preloading 1-hop neighbors** of frequently accessed entities.
|
|
1035
|
-
|
|
1036
|
-
```
|
|
1037
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
1038
|
-
│ ARCADE 1-HOP CACHE │
|
|
1039
|
-
│ │
|
|
1040
|
-
│ PROBLEM: Embedding lookups are expensive │
|
|
1041
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1042
|
-
│ │ Query: "Find entities similar to Alice" │ │
|
|
1043
|
-
│ │ Step 1: Get Alice's embedding → 2ms (disk/network) │ │
|
|
1044
|
-
│ │ Step 2: HNSW search for neighbors → 5ms │ │
|
|
1045
|
-
│ │ Step 3: Get Bob's embedding → 2ms (disk/network) │ │
|
|
1046
|
-
│ │ Step 4: Get Charlie's embedding → 2ms (disk/network) │ │
|
|
1047
|
-
│ │ Total: 11ms │ │
|
|
1048
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1049
|
-
│ │
|
|
1050
|
-
│ SOLUTION: Cache 1-hop neighbors proactively │
|
|
1051
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1052
|
-
│ │ When Alice is accessed: │ │
|
|
1053
|
-
│ │ 1. Load Alice's embedding │ │
|
|
1054
|
-
│ │ 2. ALSO load embeddings of Alice's graph neighbors: │ │
|
|
1055
|
-
│ │ - Bob (Alice knows Bob) │ │
|
|
1056
|
-
│ │ - Company_X (Alice works at Company_X) │ │
|
|
1057
|
-
│ │ - Project_Y (Alice contributes to Project_Y) │ │
|
|
1058
|
-
│ │ │ │
|
|
1059
|
-
│ │ Next query about Bob? Already in cache → 0ms │ │
|
|
1060
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1061
|
-
│ │
|
|
1062
|
-
│ WHY "1-HOP"? │
|
|
1063
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1064
|
-
│ │ │ │
|
|
1065
|
-
│ │ [Company_X]←────┐ │ │
|
|
1066
|
-
│ │ │ │ │
|
|
1067
|
-
│ │ [Project_Y]←──[ALICE]──→[Bob]──→[Charlie] │ │
|
|
1068
|
-
│ │ ↑ │ │
|
|
1069
|
-
│ │ │ │ │
|
|
1070
|
-
│ │ 1-HOP NEIGHBORS 2-HOP (not cached) │ │
|
|
1071
|
-
│ │ │ │
|
|
1072
|
-
│ │ 1-hop = directly connected = high probability of access │ │
|
|
1073
|
-
│ │ 2-hop = too many, cache would explode │ │
|
|
1074
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1075
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
1076
|
-
```
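The preloading behavior in the diagram can be sketched as a small cache wrapper. This is an illustrative, synchronous sketch with oldest-first eviction, not the `EmbeddingService` internals; `OneHopCache`, `fetchEmbedding`, and `neighborsOf` are hypothetical names supplied by the caller.

```javascript
class OneHopCache {
  constructor(fetchEmbedding, neighborsOf, maxSize = 10000) {
    this.fetchEmbedding = fetchEmbedding // slow lookup (disk/network)
    this.neighborsOf = neighborsOf       // 1-hop graph neighbors of an entity
    this.maxSize = maxSize
    this.cache = new Map()
    this.hits = 0
    this.misses = 0
  }

  put(id, vec) {
    if (this.cache.has(id)) this.cache.delete(id)
    this.cache.set(id, vec)
    if (this.cache.size > this.maxSize) {
      // Map preserves insertion order, so the first key is the oldest entry
      this.cache.delete(this.cache.keys().next().value)
    }
  }

  get(id) {
    if (this.cache.has(id)) {
      this.hits++
      return this.cache.get(id)
    }
    this.misses++
    const vec = this.fetchEmbedding(id)
    this.put(id, vec)
    // Preload direct neighbors only; 2-hop entities are left uncached
    for (const n of this.neighborsOf(id)) {
      if (!this.cache.has(n)) this.put(n, this.fetchEmbedding(n))
    }
    return vec
  }
}

const kg = { Alice: ['Bob', 'Company_X', 'Project_Y'], Bob: ['Charlie'] }
let fetches = 0
const cache = new OneHopCache(
  id => { fetches++; return [id.length] }, // stand-in for a real embedding
  id => kg[id] || []
)
cache.get('Alice')   // miss: loads Alice and her three neighbors
cache.get('Bob')     // hit: preloaded as Alice's neighbor
cache.get('Charlie') // miss: Charlie is 2 hops from Alice
console.log({ hits: cache.hits, misses: cache.misses, fetches })
```

The trade-off is visible in the counters: the first access pays for four fetches so that later neighbor lookups pay for none.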
|
|
1077
|
-
|
|
1078
|
-
**Performance Impact**:
|
|
1079
|
-
|
|
1080
|
-
| Scenario | Without ARCADE | With ARCADE | Improvement |
|
|
1081
|
-
|----------|---------------|-------------|-------------|
|
|
1082
|
-
| Single entity lookup | 2ms | 2ms | Same |
|
|
1083
|
-
| Entity + neighbors (5) | 12ms | 2ms | **6x faster** |
|
|
1084
|
-
| Fraud ring traversal (10 entities) | 25ms | 4ms | **6x faster** |
|
|
1085
|
-
| Cold start | N/A | +5ms initial | One-time cost |
|
|
1086
|
-
|
|
1087
|
-
**When ARCADE Helps**:
|
|
1088
|
-
|
|
1089
|
-
| Use Case | Benefit | Why |
|
|
1090
|
-
|----------|---------|-----|
|
|
1091
|
-
| Fraud ring detection | High | Ring members are 1-hop connected |
|
|
1092
|
-
| Entity resolution | High | Similar entities share neighbors |
|
|
1093
|
-
| Recommendation | High | "Users like you" are 1-hop away |
|
|
1094
|
-
| Random lookups | Low | No locality to exploit |
|
|
1095
|
-
|
|
1096
|
-
```javascript
|
|
1097
|
-
const { EmbeddingService } = require('rust-kgdb')
|
|
1098
|
-
|
|
1099
|
-
// ARCADE is enabled by default
|
|
1100
|
-
const embeddings = new EmbeddingService({
|
|
1101
|
-
provider: 'openai',
|
|
1102
|
-
arcadeCache: {
|
|
1103
|
-
enabled: true,
|
|
1104
|
-
maxSize: 10000, // Cache up to 10K embeddings
|
|
1105
|
-
ttlSeconds: 300, // 5 minute TTL
|
|
1106
|
-
preloadDepth: 1 // 1-hop neighbors (default)
|
|
1107
|
-
}
|
|
1108
|
-
})
|
|
1109
|
-
|
|
1110
|
-
// First access: loads Alice + 1-hop neighbors (await needs an async function or an ES module)
|
|
1111
|
-
const aliceEmbedding = await embeddings.get('http://example.org/Alice')
|
|
1112
|
-
|
|
1113
|
-
// Bob is Alice's neighbor: CACHE HIT (0ms instead of 2ms)
|
|
1114
|
-
const bobEmbedding = await embeddings.get('http://example.org/Bob')
|
|
1115
|
-
```
|
|
1116
|
-
|
|
1117
|
-
### Mathematical Foundations (HyperMind Framework)
|
|
1118
|
-
|
|
1119
|
-
The HyperMind agent framework is built on three mathematical pillars:
|
|
1120
|
-
|
|
1121
|
-
| Theory | Purpose | Implementation |
|
|
1122
|
-
|--------|---------|----------------|
|
|
1123
|
-
| **Type Theory** | Compile-time contracts for tool inputs/outputs | Hindley-Milner type inference, refinement types |
|
|
1124
|
-
| **Category Theory** | Tool composition with mathematical guarantees | Morphisms (A → B), functors, natural transformations |
|
|
1125
|
-
| **Proof Theory** | Every execution produces a verifiable witness | Curry-Howard correspondence, proof DAGs |
|
|
1126
|
-
|
|
1127
|
-
**Example**: A fraud detection query composes morphisms:
|
|
1128
|
-
```
|
|
1129
|
-
Query → BindingSet → RiskScore → FraudReport
|
|
1130
|
-
(morphism) (morphism) (morphism)
|
|
1131
|
-
```
|
|
1132
|
-
Each step has typed contracts. Composition is validated at compile time.
|
|
1133
|
-
|
|
1134
|
-
### Security: Object Capability Model (WASM Sandbox)
|
|
1135
|
-
|
|
1136
|
-
Unlike MCP (Model Context Protocol), which relies on trust-based access, rust-kgdb uses an **Object Capability (OCAP) security model**:
|
|
1137
|
-
|
|
1138
|
-
| Aspect | MCP | rust-kgdb WASM Sandbox |
|
|
1139
|
-
|--------|-----|------------------------|
|
|
1140
|
-
| **Access Control** | Trust-based (server decides) | Capability-based (code has what it's given) |
|
|
1141
|
-
| **Isolation** | Process boundaries | WASM linear memory isolation |
|
|
1142
|
-
| **Resource Limits** | None built-in | Fuel metering (CPU), memory limits |
|
|
1143
|
-
| **Audit Trail** | Optional logging | Built-in execution trace |
|
|
1144
|
-
|
|
1145
|
-
**Capabilities** granted to agents:
|
|
1146
|
-
```javascript
|
|
1147
|
-
const agent = new HyperMindAgent({
|
|
1148
|
-
kg: db,
|
|
1149
|
-
sandbox: {
|
|
1150
|
-
capabilities: ['ReadKG', 'ExecuteTool'], // No WriteKG = read-only
|
|
1151
|
-
fuelLimit: 1_000_000 // CPU budget
|
|
1152
|
-
}
|
|
1153
|
-
})
|
|
1154
|
-
```
|
|
1155
|
-
|
|
1156
|
-
Available capabilities: `ReadKG`, `WriteKG`, `ExecuteTool`, `SpawnAgent`, `HttpAccess`
|
|
1157
|
-
|
|
1158
|
-
**Why OCAP over MCP?**
|
|
1159
|
-
- **Principle of Least Authority**: Agent only has capabilities explicitly granted
|
|
1160
|
-
- **No Ambient Authority**: Can't access resources just because they exist
|
|
1161
|
-
- **Composable Security**: Capabilities can be attenuated when passed down
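
Attenuation, the third point above, is easy to show in isolation. The sketch below models a capability set that can only shrink when handed to a child; `makeCapabilities` is a hypothetical helper for illustration and is not part of the rust-kgdb sandbox API.

```javascript
function makeCapabilities(granted) {
  const set = new Set(granted)
  return Object.freeze({
    has: cap => set.has(cap),
    require(cap) {
      if (!set.has(cap)) throw new Error(`capability not granted: ${cap}`)
    },
    // A child can receive at most what the parent already holds
    attenuate: subset => makeCapabilities(subset.filter(c => set.has(c))),
    list: () => [...set]
  })
}

const parentCaps = makeCapabilities(['ReadKG', 'ExecuteTool'])
// Requesting WriteKG has no effect: the parent never held it
const childCaps = parentCaps.attenuate(['ReadKG', 'WriteKG'])
console.log(parentCaps.list(), childCaps.list())
```

Because a child's set is always filtered through its parent's, there is no path by which authority can grow as it is passed down.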
|
|
1162
|
-
|
|
1163
|
-
---
|
|
1164
|
-
|
|
1165
|
-
## Architecture Layers
|
|
1166
|
-
|
|
1167
|
-
### Layer Diagram
|
|
1168
|
-
|
|
1169
|
-
```
|
|
1170
|
-
┌─────────────────────────────────────────────────────────────────────────┐
|
|
1171
|
-
│ YOUR APPLICATION │
|
|
1172
|
-
│ (Fraud Detection, Risk Analysis, Compliance) │
|
|
1173
|
-
└────────────────────────────────┬────────────────────────────────────────┘
|
|
1174
|
-
│
|
|
1175
|
-
┌────────────────────────────────▼────────────────────────────────────────┐
|
|
1176
|
-
│ LAYER 1: SDK BINDINGS │
|
|
1177
|
-
│ TypeScript (NAPI-RS) | Python (UniFFI) | Kotlin (UniFFI) | Swift │
|
|
1178
|
-
└────────────────────────────────┬────────────────────────────────────────┘
|
|
1179
|
-
│
|
|
1180
|
-
┌────────────────────────────────▼────────────────────────────────────────┐
|
|
1181
|
-
│ LAYER 2: HYPERMIND FRAMEWORK │
|
|
1182
|
-
├─────────────────────────────────────────────────────────────────────────┤
|
|
1183
|
-
│ Intent Classification │ Tool Orchestration │ Memory Management │
|
|
1184
|
-
│ (keyword patterns) │ (morphism compose) │ (episode storage) │
|
|
1185
|
-
├─────────────────────────────────────────────────────────────────────────┤
|
|
1186
|
-
│ Type Theory │ Category Theory │ Proof Theory │
|
|
1187
|
-
│ (Hindley-Milner) │ (morphisms A→B) │ (Curry-Howard) │
|
|
1188
|
-
├─────────────────────────────────────────────────────────────────────────┤
|
|
1189
|
-
│ WASM Sandbox: Object Capability Security + Fuel Metering │
|
|
1190
|
-
└────────────────────────────────┬────────────────────────────────────────┘
|
|
1191
|
-
│
|
|
1192
|
-
┌────────────────────────────────▼────────────────────────────────────────┐
|
|
1193
|
-
│ LAYER 3: RUST CORE ENGINES │
|
|
1194
|
-
├──────────────────┬──────────────────┬──────────────────┬────────────────┤
|
|
1195
|
-
│ RDF/SPARQL │ GraphFrames │ Embeddings │ Datalog │
|
|
1196
|
-
│ • Quad Store │ • DataFusion SQL │ • HNSW ANN │ • Semi-naive │
|
|
1197
|
-
│ • SPOC Indexes │ • Arrow Columnar │ • 1-Hop Cache │ • Stratified │
|
|
1198
|
-
│ • 64 Builtins │ • Pregel BSP │ • Multi-Provider │ • Negation │
|
|
1199
|
-
└──────────────────┴──────────────────┴──────────────────┴────────────────┘
|
|
1200
|
-
│
|
|
1201
|
-
┌────────────────────────────────▼────────────────────────────────────────┐
|
|
1202
|
-
│ LAYER 4: STORAGE │
|
|
1203
|
-
│ InMemory (HashMap) │ RocksDB (LSM-tree) │ LMDB (B+tree, mmap) │
|
|
1204
|
-
└────────────────────────────────┬────────────────────────────────────────┘
|
|
1205
|
-
│
|
|
1206
|
-
┌────────────────────────────────▼────────────────────────────────────────┐
|
|
1207
|
-
│ LAYER 5: DISTRIBUTED (v0.2.0) │
|
|
1208
|
-
│ HDRF Partitioner │ gRPC Protocol │ Coordinator/Executor │ Raft (planned)│
|
|
1209
|
-
└─────────────────────────────────────────────────────────────────────────┘
|
|
1210
|
-
```
|
|
1211
|
-
|
|
1212
|
-
### Memory Hypergraph: Temporal + Long-Term Knowledge
|
|
1213
|
-
|
|
1214
|
-
The Memory Hypergraph solves a fundamental AI agent problem: **memory persistence across sessions**.
|
|
1215
|
-
|
|
1216
|
-
**Two Storage Layers, One Quad Store**:
|
|
1217
|
-
|
|
1218
|
-
| Layer | Purpose | Lifespan | Named Graph |
|
|
1219
|
-
|-------|---------|----------|-------------|
|
|
1220
|
-
| **Temporal Memory** | Agent episodes, conversations, findings | Session → months | `https://gonnect.ai/memory/` |
|
|
1221
|
-
| **Long-Term Knowledge** | Domain facts, entities, relationships | Permanent | Default graph |
|
|
1222
|
-
|
|
1223
|
-
**How They Connect**:
|
|
1224
|
-
|
|
1225
|
-
```
|
|
1226
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
1227
|
-
│ TEMPORAL MEMORY LAYER │
|
|
1228
|
-
│ (Named Graph: https://gonnect.ai/memory/) │
|
|
1229
|
-
│ │
|
|
1230
|
-
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
|
1231
|
-
│ │ Episode:001 │────→│ Episode:002 │────→│ Episode:003 │ │
|
|
1232
|
-
│ │ │ │ │ │ │ │
|
|
1233
|
-
│ │ prompt: │ │ prompt: │ │ prompt: │ │
|
|
1234
|
-
│ │ "Investigate │ │ "Check claim │ │ "Summarize │ │
|
|
1235
|
-
│ │ P001" │ │ C123" │ │ investigation"│ │
|
|
1236
|
-
│ │ │ │ │ │ │ │
|
|
1237
|
-
│ │ timestamp: │ │ timestamp: │ │ timestamp: │ │
|
|
1238
|
-
│ │ Dec 10 9:00 │ │ Dec 12 14:30 │ │ Dec 14 11:00 │ │
|
|
1239
|
-
│ │ │ │ │ │ │ │
|
|
1240
|
-
│ │ success: ✓ │ │ success: ✓ │ │ success: ✓ │ │
|
|
1241
|
-
│ │ │ │ │ │ │ │
|
|
1242
|
-
│ │ accessCount: │ │ accessCount: │ │ accessCount: │ │
|
|
1243
|
-
│ │ 5 │ │ 3 │ │ 1 │ │
|
|
1244
|
-
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
|
|
1245
|
-
│ │ am:kgEntity │ am:kgEntity │ am:kgEntity │
|
|
1246
|
-
└──────────┼────────────────────┼────────────────────┼────────────────────────┘
|
|
1247
|
-
│ │ │
|
|
1248
|
-
│ HYPER-EDGES │ (link temporal │ to permanent)
|
|
1249
|
-
│ ═══════════ │ │
|
|
1250
|
-
▼ ▼ ▼
|
|
1251
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
1252
|
-
│ LONG-TERM KNOWLEDGE LAYER │
|
|
1253
|
-
│ (Default Graph) │
|
|
1254
|
-
│ │
|
|
1255
|
-
│ ┌────────────────┐ ┌────────────────┐ │
|
|
1256
|
-
│ │ Provider:P001 │───submittedClaim──→│ Claim:C123 │ │
|
|
1257
|
-
│ │ │ │ │ │
|
|
1258
|
-
│ │ riskScore: 0.87│ │ amount: $50000 │ │
|
|
1259
|
-
│ │ name: "MedCorp"│ │ status: "open" │ │
|
|
1260
|
-
│ └────────────────┘ └───────┬────────┘ │
|
|
1261
|
-
│ │ │
|
|
1262
|
-
│ filedBy│ │
|
|
1263
|
-
│ ▼ │
|
|
1264
|
-
│ ┌────────────────┐ │
|
|
1265
|
-
│ │ Claimant:C001 │ │
|
|
1266
|
-
│ │ │ │
|
|
1267
|
-
│ │ name: "J.Smith"│ │
|
|
1268
|
-
│ │ riskScore: 0.85│ │
|
|
1269
|
-
│ └────────────────┘ │
|
|
1270
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
1271
|
-
```
|
|
1272
|
-
|
|
1273
|
-
**Memory Scoring Formula** (for retrieval):
|
|
1274
|
-
```
|
|
1275
|
-
Score = α × Recency + β × Relevance + γ × Importance
|
|
1276
|
-
where α = 0.3, β = 0.5, γ = 0.2
|
|
1277
|
-
|
|
1278
|
-
Recency = 0.995^hours_since_episode (decays ~11% per day, since 0.995^24 ≈ 0.887)
|
|
1279
|
-
Relevance = cosine_similarity(query_embedding, episode_embedding)
|
|
1280
|
-
Importance = log10(access_count + 1) / log10(max_access + 1)
|
|
1281
|
-
```
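
The scoring formula can be sketched in plain JavaScript. This is a minimal illustration under stated assumptions, not the library's implementation: `scoreEpisode`, `cosineSimilarity`, and the episode field names are hypothetical, and the zero guard for `maxAccess` is an assumption, since the formula does not say what happens when no episode has been accessed yet.

```javascript
// Hypothetical sketch of the scoring formula; function and field names are illustrative.
const WEIGHTS = { recency: 0.3, relevance: 0.5, importance: 0.2 }

function cosineSimilarity(a, b) {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  // A zero vector has no direction, so treat its similarity as 0.
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB)
}

function scoreEpisode(episode, queryEmbedding, maxAccess) {
  const recency = Math.pow(0.995, episode.hoursSinceEpisode)
  const relevance = cosineSimilarity(queryEmbedding, episode.embedding)
  // Assumption: importance is 0 when no episode has been accessed (avoids 0/0).
  const importance = maxAccess > 0
    ? Math.log10(episode.accessCount + 1) / Math.log10(maxAccess + 1)
    : 0
  return (
    WEIGHTS.recency * recency +
    WEIGHTS.relevance * relevance +
    WEIGHTS.importance * importance
  )
}

// A day-old episode pointing the same way as the query, accessed 5 of a maximum 5 times.
const dayOld = { hoursSinceEpisode: 24, embedding: [0.6, 0.8], accessCount: 5 }
console.log(scoreEpisode(dayOld, [0.6, 0.8], 5).toFixed(3))
```

Because relevance carries the largest weight, an old but closely matching episode can still outrank a recent, unrelated one.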
|
|
1282
|
-
|
|
1283
|
-
**Rolling Context Window** (adaptive retrieval):
|
|
1284
|
-
```
|
|
1285
|
-
Pass 1: Search last 1 hour → 0 episodes → expand window
|
|
1286
|
-
Pass 2: Search last 24 hours → 1 episode → expand window
|
|
1287
|
-
Pass 3: Search last 7 days → 3 episodes → sufficient context!
|
|
1288
|
-
```
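
The widening passes above can be sketched as a loop over progressively larger time windows. The function name `rollingContext`, the `timestampMs` field, and the threshold of three episodes for "sufficient context" are assumptions made for illustration; the actual retrieval policy may differ.

```javascript
// Hypothetical sketch: widen the search window until enough episodes are found.
const WINDOWS_HOURS = [1, 24, 24 * 7] // 1 hour, 24 hours, 7 days

function rollingContext(episodes, nowMs, minEpisodes) {
  let found = []
  for (const hours of WINDOWS_HOURS) {
    const cutoffMs = nowMs - hours * 3600 * 1000
    found = episodes.filter(e => e.timestampMs >= cutoffMs && e.timestampMs <= nowMs)
    if (found.length >= minEpisodes) {
      return { windowHours: hours, episodes: found }
    }
  }
  // Even the widest window fell short: return whatever it contained.
  return { windowHours: WINDOWS_HOURS[WINDOWS_HOURS.length - 1], episodes: found }
}

const HOUR_MS = 3600 * 1000
const now = Date.UTC(2024, 11, 15, 12, 0, 0)
const history = [
  { id: 'Episode:003', timestampMs: now - 5 * HOUR_MS },
  { id: 'Episode:002', timestampMs: now - 3 * 24 * HOUR_MS },
  { id: 'Episode:001', timestampMs: now - 6 * 24 * HOUR_MS }
]
const context = rollingContext(history, now, 3)
console.log(context.windowHours, context.episodes.length)
```

With this sample history the first pass finds nothing, the second finds one episode, and the third finds all three, mirroring the passes listed above.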
|
|
1289
|
-
|
|
1290
|
-
**Single Query Traverses Both Layers**:
|
|
1291
|
-
```sparql
|
|
1292
|
-
PREFIX am: <https://gonnect.ai/ontology/agent-memory#>
|
|
1293
|
-
PREFIX ins: <http://insurance.org/>
|
|
1294
|
-
|
|
1295
|
-
# Find past investigations and current risk scores
|
|
1296
|
-
SELECT ?episode ?finding ?providerRisk ?claimAmount WHERE {
|
|
1297
|
-
# Temporal layer: past agent memory
|
|
1298
|
-
GRAPH <https://gonnect.ai/memory/> {
|
|
1299
|
-
?episode a am:Episode ;
|
|
1300
|
-
am:prompt ?finding ;
|
|
1301
|
-
am:kgEntity ?provider .
|
|
1302
|
-
}
|
|
1303
|
-
# Long-term layer: current facts
|
|
1304
|
-
?provider ins:riskScore ?providerRisk .
|
|
1305
|
-
?provider ins:submittedClaim ?claim .
|
|
1306
|
-
?claim ins:amount ?claimAmount .
|
|
1307
|
-
}
|
|
1308
|
-
ORDER BY DESC(?providerRisk)
|
|
1309
|
-
```
|
|
1310
|
-
|
|
1311
|
-
**Key Benefits**:
|
|
1312
|
-
- **Session Persistence**: Agent remembers past investigations
|
|
1313
|
-
- **Contextual Recall**: "What did we find about P001 last week?"
|
|
1314
|
-
- **Idempotent Responses**: Same question → same answer (semantic hash)
|
|
1315
|
-
- **Full Provenance**: Every conclusion traceable to source episodes + KG facts
|
|
1316
|
-
|
|
1317
|
-
### Agent Identity & Session Persistence
|
|
1318
|
-
|
|
1319
|
-
Each agent has a persistent identity stored in the Memory Hypergraph:
|
|
1320
|
-
|
|
1321
|
-
```javascript
|
|
1322
|
-
const agent = new HyperMindAgent({
|
|
1323
|
-
kg: db,
|
|
1324
|
-
name: 'fraud-detector-alpha' // Agent identity
|
|
1325
|
-
})
|
|
1326
|
-
```
|
|
1327
|
-
|
|
1328
|
-
**Agent Memory Structure**:
|
|
1329
|
-
```
|
|
1330
|
-
┌────────────────────────────────────────────────────────────────────────────┐
|
|
1331
|
-
│ Agent: fraud-detector-alpha │
|
|
1332
|
-
│ Created: 2024-12-10 09:00:00 │
|
|
1333
|
-
│ Total Episodes: 47 │
|
|
1334
|
-
│ Last Active: 2024-12-15 14:30:00 │
|
|
1335
|
-
├────────────────────────────────────────────────────────────────────────────┤
|
|
1336
|
-
│ Session 1 (Dec 10) │ Session 2 (Dec 12) │ Session 3... │
|
|
1337
|
-
│ ├─ Episode:001 │ ├─ Episode:010 │ │
|
|
1338
|
-
│ ├─ Episode:002 │ ├─ Episode:011 │ │
|
|
1339
|
-
│ └─ Episode:003 │ └─ Episode:012 │ │
|
|
1340
|
-
└────────────────────────────────────────────────────────────────────────────┘
|
|
1341
|
-
```
|
|
1342
|
-
|
|
1343
|
-
**Cross-Session Continuity**:
|
|
1344
|
-
```javascript
|
|
1345
|
-
// Monday: First investigation
|
|
1346
|
-
const agent = new HyperMindAgent({ kg: db, name: 'fraud-detector' })
|
|
1347
|
-
await agent.call('Investigate Provider P001')
|
|
1348
|
-
// Memory stored: Episode:001 → linked to Provider:P001
|
|
1349
|
-
|
|
1350
|
-
// Wednesday: Agent recalls Monday's work
|
|
1351
|
-
const resumedAgent = new HyperMindAgent({ kg: db, name: 'fraud-detector' }) // new session, same agent name
|
|
1352
|
-
await resumedAgent.call('What did we find about P001?')
|
|
1353
|
-
// Returns: "On Monday at 9:00am, we investigated P001 and found..."
|
|
1354
|
-
```
|
|
1355
|
-
|
|
1356
|
-
**SPARQL to Query Agent History**:
|
|
1357
|
-
```sparql
|
|
1358
|
-
PREFIX am: <https://gonnect.ai/ontology/agent-memory#>
|
|
1359
|
-
|
|
1360
|
-
SELECT ?episode ?prompt ?timestamp ?success WHERE {
|
|
1361
|
-
GRAPH <https://gonnect.ai/memory/> {
|
|
1362
|
-
?episode a am:Episode ;
|
|
1363
|
-
am:agent "fraud-detector-alpha" ;
|
|
1364
|
-
am:prompt ?prompt ;
|
|
1365
|
-
am:timestamp ?timestamp ;
|
|
1366
|
-
am:success ?success .
|
|
1367
|
-
}
|
|
1368
|
-
}
|
|
1369
|
-
ORDER BY DESC(?timestamp)
|
|
1370
|
-
LIMIT 10
|
|
1371
|
-
```
|
|
1372
|
-
|
|
1373
|
-
### Memory Ontology Specification
|
|
1374
|
-
|
|
1375
|
-
The agent memory system uses a formal OWL ontology available at [`ontology/agent-memory.ttl`](./ontology/agent-memory.ttl).
|
|
1376
|
-
|
|
1377
|
-
**Namespace**: `http://hypermind.ai/memory#` (prefix: `am:`)
|
|
1378
|
-
|
|
1379
|
-
**Core Classes**:
|
|
1380
|
-
|
|
1381
|
-
| Class | Description |
|
|
1382
|
-
|-------|-------------|
|
|
1383
|
-
| `am:Episode` | A discrete interaction record (prompt → response) |
|
|
1384
|
-
| `am:ExecutionRecord` | Tool execution within an episode |
|
|
1385
|
-
| `am:Agent` | Persistent agent identity |
|
|
1386
|
-
| `am:Session` | Bounded interaction period |
|
|
1387
|
-
| `am:ProofDAG` | Reasoning chain (Curry-Howard proof witness) |
|
|
1388
|
-
|
|
1389
|
-
**Key Properties**:
|
|
1390
|
-
|
|
1391
|
-
| Property | Domain | Range | Description |
|
|
1392
|
-
|----------|--------|-------|-------------|
|
|
1393
|
-
| `am:prompt` | Episode | xsd:string | User prompt that initiated the episode |
|
|
1394
|
-
| `am:success` | Episode | xsd:boolean | Whether execution succeeded |
|
|
1395
|
-
| `am:timestamp` | Episode | xsd:dateTime | When the episode occurred |
|
|
1396
|
-
| `am:durationMs` | Episode | xsd:integer | Execution time in milliseconds |
|
|
1397
|
-
| `am:accessCount` | Episode | xsd:integer | Retrieval count (for importance scoring) |
|
|
1398
|
-
| `am:linksToEntity` | Episode | rdfs:Resource | **Hyper-edge to KG entity** |
|
|
1399
|
-
| `am:embedding` | Episode | xsd:string | 384-dim vector (JSON array) |
|
|
1400
|
-
| `am:tool` | ExecutionRecord | xsd:string | Tool identifier (e.g., 'kg.sparql.query') |
|
|
1401
|
-
| `am:performedBy` | Episode | Agent | Agent that executed the episode |
|
|
1402
|
-
|
|
1403
|
-
**Hyper-Edge Pattern** (linking temporal memory to KG):
|
|
1404
|
-
|
|
1405
|
-
```turtle
|
|
1406
|
-
@prefix am: <http://hypermind.ai/memory#> .
|
|
1407
|
-
@prefix ins: <http://insurance.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
|
|
1408
|
-
|
|
1409
|
-
# Episode links to multiple KG entities via hyper-edges
|
|
1410
|
-
<episode:001> a am:Episode ;
|
|
1411
|
-
am:prompt "Investigate fraud ring involving P001 and C123" ;
|
|
1412
|
-
am:success true ;
|
|
1413
|
-
am:timestamp "2025-12-15T10:30:00Z"^^xsd:dateTime ;
|
|
1414
|
-
am:linksToEntity ins:P001 ; # Hyper-edge to Provider
|
|
1415
|
-
am:linksToEntity ins:C123 ; # Hyper-edge to Claim
|
|
1416
|
-
am:performedBy <agent:fraud-detector> .
|
|
1417
|
-
```
|
|
1418
|
-
|
|
1419
|
-
**Named Graphs**:
|
|
1420
|
-
|
|
1421
|
-
| Graph | Purpose |
|
|
1422
|
-
|-------|---------|
|
|
1423
|
-
| `http://hypermind.ai/memory/` | Default episodic memory storage |
|
|
1424
|
-
| `http://memory.hypermind.ai/` | Long-term persistent memory |
|
|
1425
|
-
|
|
1426
|
-
The ontology is constructed from:
|
|
1427
|
-
1. **User conversations** - Prompts and natural language queries
|
|
1428
|
-
2. **Agent responses** - Results, explanations, proofs
|
|
1429
|
-
3. **Temporal metadata** - Timestamps, durations, access patterns
|
|
1430
|
-
4. **KG linkage** - Hyper-edges connecting episodes to business entities
|
|
1431
|
-
|
|
1432
|
-
### Schema-Aware GraphDB (v0.6.13+)
|
|
1433
|
-
|
|
1434
|
-
The engine extracts the schema automatically at load time; no separate extraction call is needed:
|
|
1435
|
-
|
|
1436
|
-
```javascript
|
|
1437
|
-
const { GraphDB, createSchemaAwareGraphDB, wrapWithSchemaAwareness } = require('rust-kgdb')
|
|
1438
|
-
|
|
1439
|
-
// Option 1: Create new schema-aware database
|
|
1440
|
-
const db = createSchemaAwareGraphDB('http://example.org/', {
|
|
1441
|
-
autoExtract: true // Extract schema after every load operation
|
|
1442
|
-
})
|
|
1443
|
-
|
|
1444
|
-
// Option 2: Wrap existing database
|
|
1445
|
-
const rawDb = new GraphDB('http://example.org/')
|
|
1446
|
-
const schemaDb = wrapWithSchemaAwareness(rawDb, { autoExtract: true })
|
|
1447
|
-
|
|
1448
|
-
// Load data - schema extraction happens automatically
|
|
1449
|
-
db.loadTtl(`
|
|
1450
|
-
@prefix : <http://example.org/> .
|
|
1451
|
-
:alice a :Person ; :knows :bob .
|
|
1452
|
-
:bob a :Person ; :age 30 .
|
|
1453
|
-
`, null)
|
|
1454
|
-
|
|
1455
|
-
// Wait for schema to be ready (handles race conditions)
|
|
1456
|
-
const schema = await db.waitForSchema()
|
|
1457
|
-
console.log('Classes:', schema.context.classes) // ['Person']
|
|
1458
|
-
console.log('Predicates:', schema.context.predicates) // ['knows', 'age']
|
|
1459
|
-
```
|
|
1460
|
-
|
|
1461
|
-
**Key Features**:
|
|
1462
|
-
- **Auto-extraction**: Schema extracted asynchronously after `loadTtl()`, `loadNtriples()`, `updateInsert()`
|
|
1463
|
-
- **Race condition handling**: the promise returned by `waitForSchema()` resolves once extraction completes
|
|
1464
|
-
- **Caching**: Schema cached globally via `SCHEMA_CACHE` (5-minute TTL)
|
|
1465
|
-
- **No redundant extraction**: Only triggers on data modifications, not reads
|
|
1466
|
-
|
|
1467
|
-
### Schema Caching (v0.6.12+)
|
|
1468
|
-
|
|
1469
|
-
Cross-agent schema sharing via global singleton:
|
|
1470
|
-
|
|
1471
|
-
```javascript
|
|
1472
|
-
const { SCHEMA_CACHE, SchemaCache, SchemaContext } = require('rust-kgdb')
|
|
1473
|
-
|
|
1474
|
-
// Global singleton - shared across all agents
|
|
1475
|
-
SCHEMA_CACHE.set('http://insurance.org/', schema)
|
|
1476
|
-
const cached = SCHEMA_CACHE.get('http://insurance.org/')
|
|
1477
|
-
|
|
1478
|
-
// Cache-aside pattern for automatic computation
|
|
1479
|
-
const schema = await SCHEMA_CACHE.getOrCompute(
|
|
1480
|
-
'http://insurance.org/',
|
|
1481
|
-
async () => SchemaContext.fromKG(db)
|
|
1482
|
-
)
|
|
1483
|
-
|
|
1484
|
-
// Invalidate on data changes
|
|
1485
|
-
SCHEMA_CACHE.invalidate('http://insurance.org/')
|
|
1486
|
-
|
|
1487
|
-
// Monitor cache performance
|
|
1488
|
-
console.log(SCHEMA_CACHE.getStats()) // { hits: 42, misses: 3, evictions: 1 }
|
|
1489
|
-
```
|
|
1490
|
-
|
|
1491
|
-
**Cache Configuration** (via `CONFIG.SCHEMA_CACHE_TTL_MS`):
|
|
1492
|
-
- Default TTL: 5 minutes (300,000 ms)
|
|
1493
|
-
- Eviction: Automatic when cache exceeds 100 entries
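
To make the TTL and eviction behaviour concrete, here is a minimal sketch of a cache with the same shape of API (`get`, `set`, `getOrCompute`, `invalidate`). It is not the rust-kgdb source: the class name `TtlCache`, the injectable `now` clock, and the oldest-first eviction order are assumptions, since the text only states that eviction happens once the cache exceeds 100 entries.

```javascript
// Hypothetical sketch of a TTL cache with cache-aside lookup; not the rust-kgdb source.
class TtlCache {
  constructor({ ttlMs = 300000, maxEntries = 100, now = Date.now } = {}) {
    this.ttlMs = ttlMs
    this.maxEntries = maxEntries
    this.now = now
    this.entries = new Map() // Map preserves insertion order
    this.stats = { hits: 0, misses: 0, evictions: 0 }
  }

  get(key) {
    const entry = this.entries.get(key)
    if (!entry || this.now() - entry.storedAt > this.ttlMs) {
      this.entries.delete(key) // drop expired entries lazily
      this.stats.misses++
      return undefined
    }
    this.stats.hits++
    return entry.value
  }

  set(key, value) {
    this.entries.delete(key) // re-insert so the key becomes the newest
    this.entries.set(key, { value, storedAt: this.now() })
    // Assumption: evict the oldest insertion first once the limit is exceeded.
    while (this.entries.size > this.maxEntries) {
      this.entries.delete(this.entries.keys().next().value)
      this.stats.evictions++
    }
  }

  async getOrCompute(key, compute) {
    const cached = this.get(key)
    if (cached !== undefined) return cached
    const value = await compute()
    this.set(key, value)
    return value
  }

  invalidate(key) {
    this.entries.delete(key)
  }
}

const cache = new TtlCache()
cache
  .getOrCompute('http://insurance.org/', async () => ({ classes: ['Claim'] }))
  .then(schema => console.log(schema.classes))
```

Injecting the clock keeps expiry testable without waiting five minutes; a production cache might also deduplicate concurrent `getOrCompute` calls for the same key, which this sketch does not.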
|
|
1494
|
-
|
|
1495
|
-
### Context Theory (v0.6.11+)
|
|
1496
|
-
|
|
1497
|
-
Type-theoretic schema validation based on Spivak's Ologs:
|
|
1498
|
-
|
|
1499
|
-
```javascript
|
|
1500
|
-
const { SchemaContext, TypeJudgment, QueryValidator, ProofDAG } = require('rust-kgdb')
|
|
1501
|
-
|
|
1502
|
-
// Extract schema as category (Objects = Classes, Morphisms = Properties)
|
|
1503
|
-
const schema = SchemaContext.fromKG(db)
|
|
1504
|
-
console.log(schema.objects) // Classes: ['Claim', 'Provider', 'Claimant']
|
|
1505
|
-
console.log(schema.morphisms) // Properties: ['submittedBy', 'amount', 'riskScore']
|
|
1506
|
-
|
|
1507
|
-
// Validate SPARQL queries against schema
|
|
1508
|
-
const validator = new QueryValidator(schema)
|
|
1509
|
-
const result = validator.validate(`
|
|
1510
|
-
SELECT ?claim ?amount WHERE {
|
|
1511
|
-
?claim :amount ?amount .
|
|
1512
|
-
?claim :unknownPredicate ?x .
|
|
1513
|
-
}
|
|
1514
|
-
`)
|
|
1515
|
-
// result: { valid: false, errors: ['unknownPredicate not in schema morphisms'] }
|
|
1516
|
-
|
|
1517
|
-
// Build proof DAG for verifiable reasoning
|
|
1518
|
-
const proof = new ProofDAG()
|
|
1519
|
-
proof.addNode('sparql_result', { bindings: [/* ... */] })
|
|
1520
|
-
proof.addNode('datalog_inference', { rule: 'fraud_rule' })
|
|
1521
|
-
proof.setRoot('conclusion', {
|
|
1522
|
-
derives_from: ['sparql_result', 'datalog_inference']
|
|
1523
|
-
})
|
|
1524
|
-
console.log(proof.hash) // Deterministic hash for auditability
|
|
1525
|
-
```
|
|
1526
|
-
|
|
1527
|
-
**Mathematical Foundation**:
|
|
1528
|
-
- Schema as category (Spivak's Ologs)
|
|
1529
|
-
- Queries as functors (structure-preserving)
|
|
1530
|
-
- Type judgments: Γ ⊢ t : T (context proves term has type)
|
|
1531
|
-
- Curry-Howard correspondence for proof witnesses
|
|
1532
|
-
|
|
1533
|
-
### Automatic Schema Detection: Mathematical Foundations
|
|
1534
|
-
|
|
1535
|
-
When no schema is explicitly provided, HyperMind uses **Context Theory** (based on Spivak's categorical approach to databases and ologs) to discover the schema automatically from your knowledge graph data.
|
|
1536
|
-
|
|
1537
|
-
```
|
|
1538
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
1539
|
-
│ MATHEMATICAL SCHEMA DETECTION │
|
|
1540
|
-
│ │
|
|
1541
|
-
│ STEP 1: Category Construction (Objects) │
|
|
1542
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1543
|
-
│ │ For every triple (s, rdf:type, C), add C to Objects │ │
|
|
1544
|
-
│ │ │ │
|
|
1545
|
-
│ │ Input triples: │ │
|
|
1546
|
-
│ │ :claim001 a :Claim . │ │
|
|
1547
|
-
│ │ :provider001 a :Provider . │ │
|
|
1548
|
-
│ │ │ │
|
|
1549
|
-
│ │ Discovered Objects (Classes): { Claim, Provider } │ │
|
|
1550
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1551
|
-
│ │
|
|
1552
|
-
│ STEP 2: Morphism Discovery (Properties) │
|
|
1553
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1554
|
-
│ │ For every triple (s, p, o) where p ≠ rdf:type: │ │
|
|
1555
|
-
│ │ - p becomes a morphism │ │
|
|
1556
|
-
│ │ - domain(p) = type(s) (inferred from rdf:type of subject) │ │
|
|
1557
|
-
│ │ - codomain(p) = type(o) (inferred from rdf:type or literal type)│ │
|
|
1558
|
-
│ │ │ │
|
|
1559
|
-
│ │ Input triples: │ │
|
|
1560
|
-
│ │ :claim001 :submittedBy :provider001 . │ │
|
|
1561
|
-
│ │ :claim001 :amount "50000"^^xsd:decimal . │ │
|
|
1562
|
-
│ │ │ │
|
|
1563
|
-
│ │ Discovered Morphisms: │ │
|
|
1564
|
-
│ │ submittedBy : Claim → Provider (object property) │ │
|
|
1565
|
-
│ │ amount : Claim → xsd:decimal (datatype property) │ │
|
|
1566
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1567
|
-
│ │
|
|
1568
|
-
│ STEP 3: Type Judgment Formation │
|
|
1569
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1570
|
-
│ │ Context Γ = { claim001 : Claim, provider001 : Provider } │ │
|
|
1571
|
-
│ │ │ │
|
|
1572
|
-
│ │ Type Judgment: Γ ⊢ submittedBy(claim001) : Provider │ │
|
|
1573
|
-
│ │ (Under context Γ, applying submittedBy to claim001 yields Provider)│ │
|
|
1574
|
-
│ │ │ │
|
|
1575
|
-
│ │ This forms the basis for SPARQL validation: │ │
|
|
1576
|
-
│ │ - If query uses ?claim :submittedBy ?x, we know ?x : Provider │ │
|
|
1577
|
-
│ │ - If query uses ?claim :unknownPred ?x → TYPE ERROR (not in Γ) │ │
|
|
1578
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1579
|
-
│ │
|
|
1580
|
-
│ RESULT: Schema as Category C │
|
|
1581
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1582
|
-
│ │ Objects: { Claim, Provider, xsd:decimal, xsd:string, ... } │ │
|
|
1583
|
-
│ │ Morphisms: { submittedBy, amount, name, riskScore, ... } │ │
|
|
1584
|
-
│ │ Composition: name ∘ submittedBy : Claim → xsd:string │ │
|
|
1585
|
-
│ │ (claim's provider's name) │ │
|
|
1586
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1587
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
1588
|
-
```
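
The three steps in the diagram can be sketched over plain triple objects. This is an illustration only, not the `SchemaContext` implementation: `detectSchema` and `typeOf` are hypothetical names, and the sketch assumes each subject has a single `rdf:type` and each predicate a single domain and codomain (the last triple seen wins).

```javascript
// Hypothetical sketch of the three detection steps over plain triple objects.
const RDF_TYPE = 'rdf:type'

function detectSchema(triples) {
  // Step 1: every (s, rdf:type, C) adds C to the objects and s : C to the context.
  const context = new Map()
  const objects = new Set()
  for (const { s, p, o } of triples) {
    if (p === RDF_TYPE) {
      context.set(s, o)
      objects.add(o)
    }
  }
  // Step 2: every other predicate becomes a morphism domain -> codomain.
  const morphisms = new Map()
  for (const { s, p, o, datatype } of triples) {
    if (p === RDF_TYPE) continue
    const domain = context.get(s)
    const codomain = datatype || context.get(o) // literal datatype or class of the object
    if (codomain) objects.add(codomain)
    morphisms.set(p, { domain, codomain })
  }
  return { context, objects, morphisms }
}

// Step 3: a type judgment looks up the codomain; unknown predicates are type errors.
function typeOf(schema, predicate) {
  const morphism = schema.morphisms.get(predicate)
  if (!morphism) throw new TypeError(`${predicate} not in schema morphisms`)
  return morphism.codomain
}

const schema = detectSchema([
  { s: 'claim001', p: RDF_TYPE, o: 'Claim' },
  { s: 'provider001', p: RDF_TYPE, o: 'Provider' },
  { s: 'claim001', p: 'submittedBy', o: 'provider001' },
  { s: 'claim001', p: 'amount', o: '50000', datatype: 'xsd:decimal' }
])
console.log(typeOf(schema, 'submittedBy'), typeOf(schema, 'amount'))
```

Calling `typeOf(schema, 'unknownPred')` throws, which corresponds to the TYPE ERROR case in Step 3.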
|
|
1589
|
-
|
|
1590
|
-
**Key Mathematical Concepts**:
|
|
1591
|
-
|
|
1592
|
-
| Concept | Mathematical Definition | In HyperMind |
|
|
1593
|
-
|---------|------------------------|--------------|
|
|
1594
|
-
| **Olog (Ontology Log)** | Category where objects are types, morphisms are functional relations | `SchemaContext` class |
|
|
1595
|
-
| **Functor** | Structure-preserving map between categories | SPARQL query as `Schema → Results` functor |
|
|
1596
|
-
| **Type Judgment** | Γ ⊢ t : T (context proves term has type) | Validates query variables against schema |
|
|
1597
|
-
| **Pullback** | Fiber product of two morphisms | JOIN operation in SPARQL |
|
|
1598
|
-
| **Curry-Howard** | Proofs = Programs, Types = Propositions | ProofDAG witnesses for audit |
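
As an illustration of the pullback row, the sketch below computes the fiber product of two functions into a shared key set, which is the same result a join on that key produces. The function name `pullback` and the sample rows are hypothetical and chosen only to echo the insurance example.

```javascript
// Pullback of f: A -> C and g: B -> C is the set of pairs (a, b) with f(a) = g(b).
function pullback(as, bs, f, g) {
  // Index B by its image under g, as a hash join would.
  const byKey = new Map()
  for (const b of bs) {
    const key = g(b)
    if (!byKey.has(key)) byKey.set(key, [])
    byKey.get(key).push(b)
  }
  const pairs = []
  for (const a of as) {
    for (const b of byKey.get(f(a)) || []) pairs.push([a, b])
  }
  return pairs
}

// Hypothetical rows: claims and risk scores that share a provider key.
const claims = [
  { claim: 'C123', provider: 'P001' },
  { claim: 'C456', provider: 'P002' }
]
const risks = [{ provider: 'P001', riskScore: 0.87 }]

const joined = pullback(claims, risks, c => c.provider, r => r.provider)
console.log(joined.map(([c, r]) => `${c.claim} ${r.riskScore}`))
```

Only the claim whose provider also appears in the risk rows survives, just as two SPARQL triple patterns sharing a `?provider` variable keep only the bindings that agree on it.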
|
|
1599
|
-
|
|
1600
|
-
**Why This Matters**:
|
|
1601
|
-
|
|
1602
|
-
1. **No Schema? No Problem**: HyperMind extracts schema from your data structure
|
|
1603
|
-
2. **Type-Safe Queries**: Invalid predicates caught at planning time, not runtime
|
|
1604
|
-
3. **LLM Grounding**: Schema injected into LLM prompts ensures valid SPARQL generation
|
|
1605
|
-
4. **Provenance**: Every inference traceable through the categorical structure
|
|
1606
|
-
|
|
1607
|
-
### Intelligence Control Plane: The Neuro-Symbolic Stack
|
|
1608
|
-
|
|
1609
|
-
HyperMind implements an **Intelligence Control Plane** - a formal architecture layer that governs how AI agents interact with knowledge, based on research from MIT (David Spivak's Categorical Databases) and Stanford (Pat Langley's Cognitive Architectures).
|
|
1610
|
-
|
|
1611
|
-
```
|
|
1612
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
1613
|
-
│ INTELLIGENCE CONTROL PLANE │
|
|
1614
|
-
│ (Neuro-Symbolic Integration Layer) │
|
|
1615
|
-
│ │
|
|
1616
|
-
│ Research Foundations: │
|
|
1617
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1618
|
-
│ │ • MIT - Spivak's "Category Theory for Databases" (2014) │ │
|
|
1619
|
-
│ │ • Stanford - Langley's Cognitive Systems Architecture │ │
|
|
1620
|
-
│ │ • CMU - Curry-Howard Correspondence for AI Verification │ │
|
|
1621
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1622
|
-
│ │
|
|
1623
|
-
│ LAYER 1: NEURAL PERCEPTION (LLM) │
|
|
1624
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1625
|
-
│ │ Input: "Find suspicious billing patterns for Provider P001" │ │
|
|
1626
|
-
│ │ Output: Intent classification + tool selection │ │
|
|
1627
|
-
│ │ Constraint: Schema-bounded generation (no hallucinated predicates) │ │
|
|
1628
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1629
|
-
│ │ │
|
|
1630
|
-
│ ▼ │
|
|
1631
|
-
│ LAYER 2: SYMBOLIC REASONING (SPARQL + Datalog) │
|
|
1632
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1633
|
-
│ │ Query Execution: SELECT ?claim WHERE { ?claim :provider :P001 } │ │
|
|
1634
|
-
│ │ Rule Application: fraud(?C) :- high_amount(?C), rapid_filing(?C) │ │
|
|
1635
|
-
│ │ Guarantee: Deterministic, reproducible, auditable │ │
|
|
1636
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1637
|
-
│ │ │
|
|
1638
|
-
│ ▼ │
|
|
1639
|
-
│ LAYER 3: PROOF SYNTHESIS (Curry-Howard) │
|
|
1640
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1641
|
-
│ │ ProofDAG: Every conclusion backed by derivation chain │ │
|
|
1642
|
-
│ │ │ │
|
|
1643
|
-
│ │ [CONCLUSION: P001 is suspicious] │ │
|
|
1644
|
-
│ │ │ │ │
|
|
1645
|
-
│ │ ┌─────────────┼─────────────┐ │ │
|
|
1646
|
-
│ │ │ │ │ │ │
|
|
1647
|
-
│ │ [SPARQL] [Datalog] [Embedding] │ │
|
|
1648
|
-
│ │ 47 claims fraud rule 0.87 similarity │ │
|
|
1649
|
-
│ │ matched matched to known fraud │ │
|
|
1650
|
-
│ │ │ │
|
|
1651
|
-
│ │ Hash: sha256:8f3a2b1c... (deterministic, verifiable) │ │
|
|
1652
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1653
|
-
│ │ │
|
|
1654
|
-
│ ▼ │
|
|
1655
|
-
│ OUTPUT: Verified Answer with Full Provenance │
|
|
1656
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1657
|
-
│ │ "Provider P001 is flagged for review. Evidence: │ │
|
|
1658
|
-
│ │ - 47 high-value claims in 30 days (SPARQL) │ │
|
|
1659
|
-
│ │ - Matches fraud pattern fraud_rapid_high (Datalog) │ │
|
|
1660
|
-
│ │ - 87% similar to 3 previously confirmed fraudulent providers │ │
|
|
1661
|
-
│ │ Proof hash: sha256:8f3a2b1c4d5e6f7a8b9c0d1e2f3a4b5c" │ │
|
|
1662
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1663
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
1664
|
-
```
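
The rule shown in Layer 2, `fraud(?C) :- high_amount(?C), rapid_filing(?C)`, can be illustrated with a toy evaluator. This is a sketch under strong simplifying assumptions: it handles a single non-recursive rule over unary predicates, whereas the engine described here uses semi-naive, stratified evaluation. The fact base and the name `applyConjunctiveRule` are hypothetical.

```javascript
// Hypothetical fact base: unary predicates mapped to the entities they hold for.
const facts = {
  high_amount: new Set(['C123', 'C456']),
  rapid_filing: new Set(['C123', 'C789'])
}

// Derive the head for every entity that satisfies all body predicates.
function applyConjunctiveRule(factBase, bodyPredicates) {
  if (bodyPredicates.length === 0) return []
  const [first, ...rest] = bodyPredicates.map(name => factBase[name] || new Set())
  return [...first].filter(entity => rest.every(set => set.has(entity))).sort()
}

// fraud(?C) :- high_amount(?C), rapid_filing(?C)
const fraud = applyConjunctiveRule(facts, ['high_amount', 'rapid_filing'])
console.log(fraud)
```

The result depends only on the facts and the rule, never on sampling, which is what makes this layer reproducible.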
|
|
1665
|
-
|
|
1666
|
-
**Why "Control Plane"?**
|
|
1667
|
-
|
|
1668
|
-
In networking, the **control plane** makes decisions about where traffic should go, while the **data plane** actually forwards the packets. Similarly:
|
|
1669
|
-
|
|
1670
|
-
| Concept | Networking | HyperMind |
|
|
1671
|
-
|---------|-----------|-----------|
|
|
1672
|
-
| **Control Plane** | Routing decisions | LLM planning + type validation + proof synthesis |
|
|
1673
|
-
| **Data Plane** | Packet forwarding | SPARQL execution + Datalog evaluation + embedding lookup |
|
|
1674
|
-
| **Policy** | ACLs, firewall rules | AgentScope, capabilities, fuel limits |
|
|
1675
|
-
| **Verification** | Routing table consistency | ProofDAG with Curry-Howard witnesses |
|
|
1676
|
-
|
|
1677
|
-
**The Curry-Howard Insight**:
|
|
1678
|
-
|
|
1679
|
-
The Curry-Howard correspondence states that **proofs are programs** and **types are propositions**. HyperMind applies this:
|
|
1680
|
-
|
|
1681
|
-
```
|
|
1682
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
1683
|
-
│ CURRY-HOWARD IN HYPERMIND │
|
|
1684
|
-
│ │
|
|
1685
|
-
│ PROPOSITION (Type): "Provider P001 has fraud indicators" │
|
|
1686
|
-
│ │
|
|
1687
|
-
│ PROOF (Program): ProofDAG with derivation steps │
|
|
1688
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1689
|
-
│ │ 1. sparql_result: 47 claims found │ │
|
|
1690
|
-
│ │ Γ ⊢ sparql("SELECT ?c WHERE {...}") : BindingSet │ │
|
|
1691
|
-
│ │ │ │
|
|
1692
|
-
│ │ 2. datalog_derivation: fraud rule matched │ │
|
|
1693
|
-
│ │ Γ, sparql_result ⊢ fraud(P001) : InferredFact │ │
|
|
1694
|
-
│ │ │ │
|
|
1695
|
-
│ │ 3. embedding_similarity: 0.87 match to known fraud │ │
|
|
1696
|
-
│ │ Γ ⊢ similar(P001, fraud_cluster) : SimilarityScore │ │
|
|
1697
|
-
│ │ │ │
|
|
1698
|
-
│ │ 4. conclusion: conjunction of evidence │ │
|
|
1699
|
-
│ │ Γ, (2), (3) ⊢ suspicious(P001) : FraudIndicator │ │
|
|
1700
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1701
|
-
│ │
|
|
1702
|
-
│ VERIFICATION: Given ProofDAG, anyone can: │
|
|
1703
|
-
│ 1. Re-execute each step │
|
|
1704
|
-
│ 2. Verify types match │
|
|
1705
|
-
│ 3. Confirm deterministic hash │
|
|
1706
|
-
│ 4. Audit the complete reasoning chain │
|
|
1707
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
1708
|
-
```
|
|
1709
|
-
|
|
1710
|
-
**ProofDAG Structure**:
|
|
1711
|
-
|
|
1712
|
-
```javascript
|
|
1713
|
-
const proof = {
|
|
1714
|
-
root: {
|
|
1715
|
-
id: 'conclusion',
|
|
1716
|
-
type: 'FraudIndicator',
|
|
1717
|
-
value: { provider: 'P001', riskScore: 0.91, confidence: 0.94 },
|
|
1718
|
-
derives_from: ['sparql_evidence', 'datalog_derivation', 'embedding_match']
|
|
1719
|
-
},
|
|
1720
|
-
nodes: [
|
|
1721
|
-
{
|
|
1722
|
-
id: 'sparql_evidence',
|
|
1723
|
-
tool: 'kg.sparql.query',
|
|
1724
|
-
input_type: 'Query',
|
|
1725
|
-
output_type: 'BindingSet',
|
|
1726
|
-
query: 'SELECT ?claim WHERE { ?claim :provider :P001 ; :amount ?a . FILTER(?a > 10000) }',
|
|
1727
|
-
result: { count: 47, time_ms: 2.3 }
|
|
1728
|
-
},
|
|
1729
|
-
{
|
|
1730
|
-
id: 'datalog_derivation',
|
|
1731
|
-
tool: 'kg.datalog.apply',
|
|
1732
|
-
input_type: 'RuleSet',
|
|
1733
|
-
output_type: 'InferredFacts',
|
|
1734
|
-
rule: 'fraud(?P) :- provider(?P), high_claim_count(?P), rapid_filing(?P)',
|
|
1735
|
-
result: { matched: true, bindings: { P: 'P001' } }
|
|
1736
|
-
},
|
|
1737
|
-
{
|
|
1738
|
-
id: 'embedding_match',
|
|
1739
|
-
tool: 'kg.embeddings.search',
|
|
1740
|
-
input_type: 'Entity',
|
|
1741
|
-
output_type: 'SimilarEntities',
|
|
1742
|
-
entity: 'P001',
|
|
1743
|
-
result: { similar: ['FRAUD_001', 'FRAUD_002', 'FRAUD_003'], score: 0.87 }
|
|
1744
|
-
}
|
|
1745
|
-
],
|
|
1746
|
-
hash: 'sha256:8f3a2b1c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b7c8d9e0f1a',
|
|
1747
|
-
timestamp: '2025-12-15T10:30:00Z'
|
|
1748
|
-
}
|
|
1749
|
-
|
|
1750
|
-
// Anyone can verify this proof independently
|
|
1751
|
-
const isValid = ProofDAG.verify(proof) // true if all derivations check out
|
|
1752
|
-
```
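
One way to obtain a deterministic hash for such a structure is to serialize it canonically (object keys sorted) and hash the bytes with SHA-256. The sketch below uses Node's built-in `crypto` module; `canonicalize`, `proofHash`, and `verifyHash` are hypothetical helpers, and excluding the `hash` and `timestamp` fields from the hashed content is an assumption, not documented `ProofDAG` behaviour.

```javascript
const crypto = require('crypto')

// Serialize with sorted object keys so logically equal proofs give identical bytes.
// Assumes JSON-compatible values; undefined is serialized as null.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return '[' + value.map(canonicalize).join(',') + ']'
  }
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value).sort()
    return '{' + keys.map(k => JSON.stringify(k) + ':' + canonicalize(value[k])).join(',') + '}'
  }
  return value === undefined ? 'null' : JSON.stringify(value)
}

function proofHash(proof) {
  // Assumption: the stored hash and the timestamp are not part of the hashed content.
  const { hash, timestamp, ...content } = proof
  return 'sha256:' + crypto.createHash('sha256').update(canonicalize(content)).digest('hex')
}

function verifyHash(proof) {
  return proof.hash === proofHash(proof)
}

const sample = {
  root: { id: 'conclusion', derives_from: ['sparql_evidence'] },
  nodes: [{ id: 'sparql_evidence', tool: 'kg.sparql.query', result: { count: 47 } }]
}
sample.hash = proofHash(sample)
console.log(verifyHash(sample))
```

Array order is preserved on purpose, so reordering `derives_from` changes the hash, while reordering object keys does not. A hash check like this only detects tampering; re-executing each step is still needed to confirm that the derivations themselves hold.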
|
|
1753
|
-
|
|
1754
|
-
### Deterministic LLM Usage in Planner
|
|
1755
|
-
|
|
1756
|
-
The LLMPlanner makes LLM usage **deterministic** in practice (the same validated SPARQL, modulo variable names) by constraining outputs to the schema category:
|
|
1757
|
-
|
|
1758
|
-
```
|
|
1759
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
1760
|
-
│ DETERMINISTIC LLM PLANNING │
|
|
1761
|
-
│ │
|
|
1762
|
-
│ PROBLEM: LLMs are inherently non-deterministic │
|
|
1763
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1764
|
-
│ │ Same prompt → Different outputs each time │ │
|
|
1765
|
-
│ │ "Find high-risk claims" → SELECT ?x WHERE {...} (run 1) │ │
|
|
1766
|
-
│ │ "Find high-risk claims" → SELECT ?claim WHERE {...} (run 2) │ │
|
|
1767
|
-
│ │ Different variable names! │ │
|
|
1768
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1769
|
-
│ │
|
|
1770
|
-
│ SOLUTION: Schema-constrained generation │
|
|
1771
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1772
|
-
│ │ 1. SCHEMA INJECTION: LLM receives exact predicates from schema │ │
|
|
1773
|
-
│ │ "Available predicates: submittedBy, amount, riskScore" │ │
|
|
1774
|
-
│ │ │ │
|
|
1775
|
-
│ │ 2. TEMPLATE ENFORCEMENT: Output must follow typed template │ │
|
|
1776
|
-
│ │ { │ │
|
|
1777
|
-
│ │ "tool": "kg.sparql.query", // From TOOL_REGISTRY │ │
|
|
1778
|
-
│ │ "query": "SELECT ...", // Must use schema predicates │ │
|
|
1779
|
-
│ │ "expected_type": "BindingSet" // From TypeId │ │
|
|
1780
|
-
│ │ } │ │
|
|
1781
|
-
│ │ │ │
|
|
1782
|
-
│ │ 3. VALIDATION: Generated SPARQL checked against schema category │ │
|
|
1783
|
-
│ │ - All predicates ∈ schema.morphisms? ✓ │ │
|
|
1784
|
-
│ │ - All types ∈ schema.objects? ✓ │ │
|
|
1785
|
-
│ │ - Variable bindings type-correct? ✓ │ │
|
|
1786
|
-
│ │ │ │
|
|
1787
|
-
│ │ 4. RETRY ON FAILURE: If validation fails, regenerate with hint │ │
|
|
1788
|
-
│ │ "Previous query used ':badPredicate' not in schema. Try again" │ │
|
|
1789
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1790
|
-
│ │
|
|
1791
|
-
│ RESULT: Same semantic query → Same valid SPARQL (modulo variable names) │
|
|
1792
|
-
│ │
|
|
1793
|
-
│ "Find high-risk claims" → Always generates: │
|
|
1794
|
-
│ SELECT ?claim WHERE { ?claim :riskScore ?score . FILTER(?score > 0.7) } │
|
|
1795
|
-
│ Because :riskScore is the ONLY risk-related predicate in schema │
|
|
1796
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
1797
|
-
```
|
|
1798
|
-
|
|
1799
|
-
**Determinism Guarantees**:
|
|
1800
|
-
|
|
1801
|
-
| Aspect | How Determinism is Achieved |
|
|
1802
|
-
|--------|---------------------------|
|
|
1803
|
-
| **Predicate Selection** | LLM can ONLY use predicates from extracted schema |
|
|
1804
|
-
| **Type Consistency** | Output types validated against TypeId registry |
|
|
1805
|
-
| **Tool Selection** | TOOL_REGISTRY defines exact tool signatures |
|
|
1806
|
-
| **Error Recovery** | Failed validations trigger constrained retry |
|
|
1807
|
-
| **Caching** | Identical queries return cached SPARQL (no re-generation) |
|
|
1808
|
-
|
|
1809
|
-
```javascript
|
|
1810
|
-
// Deterministic LLM Planning in action
|
|
1811
|
-
const planner = new LLMPlanner({
|
|
1812
|
-
model: 'gpt-4o',
|
|
1813
|
-
apiKey: process.env.OPENAI_API_KEY,
|
|
1814
|
-
schema: SchemaContext.fromKG(db), // Schema constrains LLM output
|
|
1815
|
-
temperature: 0, // Minimize randomness
|
|
1816
|
-
cacheTTL: 300000 // Cache results for 5 minutes
|
|
1817
|
-
})
|
|
1818
|
-
|
|
1819
|
-
// These produce identical SPARQL because schema only has one risk predicate
|
|
1820
|
-
const plan1 = await planner.plan('Find risky claims')
|
|
1821
|
-
const plan2 = await planner.plan('Show me dangerous claims')
|
|
1822
|
-
const plan3 = await planner.plan('Which claims are high-risk?')
|
|
1823
|
-
|
|
1824
|
-
// All three generate the same validated SPARQL
|
|
1825
|
-
console.log(plan1.sparql === plan2.sparql) // true (after normalization)
|
|
1826
|
-
```
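
The validate-then-retry loop from steps 3 and 4 can be sketched without any LLM by passing in a generator function. Everything here is hypothetical: `validateQuery` and `planWithRetry` are illustrative names, `generate` stands in for the model call, and the regex is deliberately naive (it only sees `?var :predicate` patterns with the empty prefix). A real validator would parse the query.

```javascript
// Naive predicate extraction: matches "?var :predicate" with the empty prefix only.
function validateQuery(query, schemaPredicates) {
  const used = [...query.matchAll(/\?\w+\s+:(\w+)/g)].map(match => match[1])
  const unknown = used.filter(p => !schemaPredicates.includes(p))
  return { valid: unknown.length === 0, unknown }
}

// Regenerate with a hint until the query only uses schema predicates.
function planWithRetry(generate, schemaPredicates, maxAttempts = 3) {
  let hint = null
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const query = generate(hint)
    const { valid, unknown } = validateQuery(query, schemaPredicates)
    if (valid) return { query, attempts: attempt }
    hint = `Previous query used ':${unknown[0]}' not in schema. Try again`
  }
  throw new Error(`No schema-valid query after ${maxAttempts} attempts`)
}

// Stub generator standing in for the LLM: wrong on the first try, right once hinted.
const stubGenerate = hint =>
  hint
    ? 'SELECT ?claim WHERE { ?claim :riskScore ?score . FILTER(?score > 0.7) }'
    : 'SELECT ?claim WHERE { ?claim :badPredicate ?score }'

const plan = planWithRetry(stubGenerate, ['submittedBy', 'amount', 'riskScore'])
console.log(plan.attempts, plan.query)
```

Bounding the number of attempts keeps a persistently invalid generator from looping forever; the caller decides what to do when the loop gives up.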
|
|
1827
|
-
|
|
1828
|
-
### Bring Your Own Ontology (BYOO) - Enterprise Support
|
|
1829
|
-
|
|
1830
|
-
For organizations with existing ontology teams:
|
|
1831
|
-
|
|
1832
|
-
```javascript
|
|
1833
|
-
const { SchemaContext } = require('rust-kgdb')
|
|
1834
|
-
|
|
1835
|
-
// Load enterprise ontology (TTL/OWL format)
|
|
1836
|
-
const ontologyTtl = `
|
|
1837
|
-
@prefix owl: <http://www.w3.org/2002/07/owl#> .
|
|
1838
|
-
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
|
|
1839
|
-
  @prefix ins: <http://insurance.org/> .
  @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
|
|
1840
|
-
|
|
1841
|
-
ins:Claim a owl:Class ;
|
|
1842
|
-
rdfs:label "Insurance Claim" .
|
|
1843
|
-
|
|
1844
|
-
ins:Provider a owl:Class ;
|
|
1845
|
-
rdfs:label "Healthcare Provider" .
|
|
1846
|
-
|
|
1847
|
-
ins:submittedBy a owl:ObjectProperty ;
|
|
1848
|
-
rdfs:domain ins:Claim ;
|
|
1849
|
-
rdfs:range ins:Provider .
|
|
1850
|
-
|
|
1851
|
-
ins:amount a owl:DatatypeProperty ;
|
|
1852
|
-
rdfs:domain ins:Claim ;
|
|
1853
|
-
rdfs:range xsd:decimal .
|
|
1854
|
-
`
|
|
1855
|
-
|
|
1856
|
-
// Create schema from external ontology
|
|
1857
|
-
const ontologySchema = SchemaContext.fromOntology(db, ontologyTtl)
|
|
1858
|
-
|
|
1859
|
-
// Or merge ontology with KG-derived schema
|
|
1860
|
-
const kgSchema = SchemaContext.fromKG(db)
|
|
1861
|
-
const mergedSchema = SchemaContext.merge(ontologySchema, kgSchema)
|
|
1862
|
-
|
|
1863
|
-
// Use in HyperMind agent
|
|
1864
|
-
const agent = new HyperMindAgent({
|
|
1865
|
-
kg: db,
|
|
1866
|
-
schema: mergedSchema // Agent uses your enterprise ontology
|
|
1867
|
-
})
|
|
1868
|
-
```
|
|
1869
|
-
|
|
1870
|
-
**Use Cases**:
|
|
1871
|
-
- **Large Enterprises**: Central ontology team defines schemas
|
|
1872
|
-
- **Industry Standards**: Use FIBO, HL7 FHIR, or domain-specific ontologies
|
|
1873
|
-
- **Governance**: Schema changes go through formal approval process
|
|
1874
|
-
|
|
1875
|
-
---
|
|
1876
|
-
|
|
1877
|
-
## Installation
|
|
1878
|
-
|
|
1879
|
-
```bash
|
|
1880
|
-
npm install rust-kgdb
|
|
1881
|
-
```
|
|
1882
|
-
|
|
1883
|
-
**Platforms**: macOS (Intel/Apple Silicon), Linux (x64/ARM64), Windows (x64)
|
|
1884
|
-
|
|
1885
|
-
---
|
|
1886
|
-
|
|
1887
|
-
## Quick Start
|
|
1888
|
-
|
|
1889
|
-
### 1. Knowledge Graph
|
|
1890
|
-
|
|
1891
|
-
```javascript
|
|
1892
|
-
const { GraphDB, getVersion } = require('rust-kgdb')
|
|
1893
|
-
|
|
1894
|
-
console.log('Version:', getVersion())  // prints the installed package version
|
|
1895
|
-
|
|
1896
|
-
const db = new GraphDB('http://example.org/')
|
|
1897
|
-
|
|
1898
|
-
db.loadTtl(`
|
|
1899
|
-
@prefix : <http://example.org/> .
|
|
1900
|
-
:alice :knows :bob .
|
|
1901
|
-
:bob :knows :charlie .
|
|
1902
|
-
:charlie :knows :alice .
|
|
1903
|
-
`, null)
|
|
1904
|
-
|
|
1905
|
-
console.log(`Loaded ${db.countTriples()} triples`) // 3
|
|
1906
|
-
|
|
1907
|
-
const results = db.querySelect(`
|
|
1908
|
-
PREFIX : <http://example.org/>
|
|
1909
|
-
SELECT ?person WHERE { ?person :knows :bob }
|
|
1910
|
-
`)
|
|
1911
|
-
console.log(results) // [{ bindings: { person: 'http://example.org/alice' } }]
|
|
1912
|
-
```
|
|
1913
|
-
|
|
1914
|
-
### 2. Graph Analytics
|
|
1915
|
-
|
|
1916
|
-
```javascript
|
|
1917
|
-
const { GraphFrame } = require('rust-kgdb')
|
|
1918
|
-
|
|
1919
|
-
const graph = new GraphFrame(
|
|
1920
|
-
JSON.stringify([{id:'alice'}, {id:'bob'}, {id:'charlie'}]),
|
|
1921
|
-
JSON.stringify([
|
|
1922
|
-
{src:'alice', dst:'bob'},
|
|
1923
|
-
{src:'bob', dst:'charlie'},
|
|
1924
|
-
{src:'charlie', dst:'alice'}
|
|
1925
|
-
])
|
|
1926
|
-
)
|
|
1927
|
-
|
|
1928
|
-
console.log('Triangles:', graph.triangleCount()) // 1
|
|
1929
|
-
console.log('PageRank:', JSON.parse(graph.pageRank(0.15, 20)))
|
|
1930
|
-
console.log('Components:', JSON.parse(graph.connectedComponents()))
|
|
1931
|
-
```
|
|
1932
|
-
|
|
1933
|
-
### 3. Semantic Similarity
|
|
1934
|
-
|
|
1935
|
-
```javascript
|
|
1936
|
-
const { EmbeddingService } = require('rust-kgdb')
|
|
1937
|
-
|
|
1938
|
-
const embeddings = new EmbeddingService()
|
|
1939
|
-
|
|
1940
|
-
// Store 384-dimension vectors
|
|
1941
|
-
embeddings.storeVector('claim_001', new Array(384).fill(0.5))
|
|
1942
|
-
embeddings.storeVector('claim_002', new Array(384).fill(0.6))
|
|
1943
|
-
embeddings.rebuildIndex()
|
|
1944
|
-
|
|
1945
|
-
// HNSW similarity search
|
|
1946
|
-
const similar = JSON.parse(embeddings.findSimilar('claim_001', 5, 0.7))
|
|
1947
|
-
console.log('Similar:', similar)
|
|
1948
|
-
```
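To make the similarity threshold concrete, here is a minimal sketch of cosine similarity in plain JavaScript. It assumes cosine is the metric being thresholded, which the README does not state; `cosineSimilarity` is a hypothetical helper and not part of `rust-kgdb`.

```javascript
// Hypothetical helper (not part of rust-kgdb): cosine similarity of two
// equal-length numeric vectors, i.e. dot(a, b) / (|a| * |b|).
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('dimension mismatch')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// The two constant vectors from the example above point in the same
// direction, so their cosine similarity is 1 up to rounding error.
const v1 = new Array(384).fill(0.5)
const v2 = new Array(384).fill(0.6)
console.log(cosineSimilarity(v1, v2).toFixed(3)) // 1.000
```

Under this assumption, constant-filled vectors are a weak test fixture because every pair scores about 1; vectors with varied components exercise the threshold more meaningfully.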
|
|
1949
|
-
|
|
1950
|
-
### 4. Rule-Based Reasoning
|
|
1951
|
-
|
|
1952
|
-
```javascript
|
|
1953
|
-
const { DatalogProgram, evaluateDatalog, queryDatalog } = require('rust-kgdb')
|
|
1954
|
-
|
|
1955
|
-
const program = new DatalogProgram()
|
|
1956
|
-
|
|
1957
|
-
program.addFact(JSON.stringify({predicate: 'parent', terms: ['alice', 'bob']}))
|
|
1958
|
-
program.addFact(JSON.stringify({predicate: 'parent', terms: ['bob', 'charlie']}))
|
|
1959
|
-
|
|
1960
|
-
// grandparent(X, Z) :- parent(X, Y), parent(Y, Z)
|
|
1961
|
-
program.addRule(JSON.stringify({
|
|
1962
|
-
head: {predicate: 'grandparent', terms: ['?X', '?Z']},
|
|
1963
|
-
body: [
|
|
1964
|
-
{predicate: 'parent', terms: ['?X', '?Y']},
|
|
1965
|
-
{predicate: 'parent', terms: ['?Y', '?Z']}
|
|
1966
|
-
]
|
|
1967
|
-
}))
|
|
1968
|
-
|
|
1969
|
-
console.log('Inferred:', JSON.parse(evaluateDatalog(program)))
|
|
1970
|
-
```
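For readers new to Datalog, the `grandparent` rule above amounts to a join of the `parent` relation with itself on the shared variable `?Y`. The sketch below shows that join in plain JavaScript; it illustrates the rule's semantics only and makes no claim about how `evaluateDatalog` is implemented. The `grandparents` function is hypothetical.

```javascript
// Facts from the example above, as [X, Y] pairs meaning parent(X, Y).
const parentFacts = [
  ['alice', 'bob'],
  ['bob', 'charlie']
]

// grandparent(X, Z) :- parent(X, Y), parent(Y, Z)
// A nested loop joins the relation with itself where the middle term matches.
function grandparents(facts) {
  const derived = []
  for (const [x, y] of facts) {
    for (const [y2, z] of facts) {
      if (y === y2) derived.push([x, z])
    }
  }
  return derived
}

console.log(grandparents(parentFacts)) // [ [ 'alice', 'charlie' ] ]
```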
|
|
1971
|
-
|
|
1972
|
-
### 5. HyperMind Agent (Complete Example)
|
|
1973
|
-
|
|
1974
|
-
```javascript
|
|
1975
|
-
const {
|
|
1976
|
-
GraphDB, EmbeddingService, HyperMindAgent,
|
|
1977
|
-
MemoryManager, AgentScope, LLMPlanner,
|
|
1978
|
-
createSchemaAwareGraphDB
|
|
1979
|
-
} = require('rust-kgdb')
|
|
1980
|
-
|
|
1981
|
-
// Create schema-aware database (auto-extracts schema on load)
|
|
1982
|
-
const db = createSchemaAwareGraphDB('http://insurance.org/')
|
|
1983
|
-
db.loadTtl(`
|
|
1984
|
-
@prefix : <http://insurance.org/> .
|
|
1985
|
-
:CLM001 a :Claim ; :amount "50000" ; :provider :PROV001 .
|
|
1986
|
-
:CLM002 a :Claim ; :amount "75000" ; :provider :PROV001 .
|
|
1987
|
-
:PROV001 a :Provider ; :riskScore "0.87" ; :name "MedCorp" .
|
|
1988
|
-
:PROV002 a :Provider ; :riskScore "0.35" ; :name "HealthCo" .
|
|
1989
|
-
`, null)
|
|
1990
|
-
|
|
1991
|
-
// Full configuration showing all layers
|
|
1992
|
-
const agent = new HyperMindAgent({
|
|
1993
|
-
// === REQUIRED ===
|
|
1994
|
-
kg: db,
|
|
1995
|
-
|
|
1996
|
-
// === LAYER 1: LLM Planner (Production Mode) ===
|
|
1997
|
-
model: 'gpt-4o', // LLM model for intent + SPARQL
|
|
1998
|
-
apiKey: process.env.OPENAI_API_KEY, // Required for LLM calls
|
|
1999
|
-
|
|
2000
|
-
// === LAYER 2: Memory ===
|
|
2001
|
-
memory: new MemoryManager({
|
|
2002
|
-
workingMemorySize: 10, // LRU cache for current session
|
|
2003
|
-
episodicRetentionDays: 30, // How long to keep episodes
|
|
2004
|
-
longTermGraph: 'http://memory.hypermind.ai/' // Persistent memory
|
|
2005
|
-
}),
|
|
2006
|
-
|
|
2007
|
-
// === LAYER 3: Scope ===
|
|
2008
|
-
scope: new AgentScope({
|
|
2009
|
-
allowedGraphs: ['http://insurance.org/'], // Graphs agent can access
|
|
2010
|
-
allowedPredicates: null, // null = all predicates
|
|
2011
|
-
maxResultSize: 10000 // Limit result set size
|
|
2012
|
-
}),
|
|
24
|
+
**Models tested**: Claude Sonnet 4 (90.9%), GPT-4o (81.8%)
|
|
2013
25
|
|
|
2014
|
-
|
|
2015
|
-
embeddings: new EmbeddingService(), // For similarity search
|
|
26
|
+
[Full Benchmark Report →](./HYPERMIND_BENCHMARK_REPORT.md)
|
|
2016
27
|
|
|
2017
|
-
|
|
2018
|
-
sandbox: {
|
|
2019
|
-
capabilities: ['ReadKG', 'ExecuteTool'], // No WriteKG = read-only
|
|
2020
|
-
fuelLimit: 1_000_000 // CPU budget
|
|
2021
|
-
},
|
|
2022
|
-
|
|
2023
|
-
// === LAYER 6: Identity & Session ===
|
|
2024
|
-
name: 'fraud-detector', // Persistent agent identity
|
|
2025
|
-
userId: 'user:alice@company.com', // User identity (for multi-tenant)
|
|
2026
|
-
sessionId: 'session:2025-12-15-001' // Session tracking
|
|
2027
|
-
})
|
|
28
|
+
---
|
|
2028
29
|
|
|
2029
|
-
|
|
2030
|
-
await db.waitForSchema()
|
|
30
|
+
## Quick Start
|
|
2031
31
|
|
|
2032
|
-
|
|
2033
|
-
const result = await agent.call('Find all high-risk claims')
|
|
32
|
+
### Installation
|
|
2034
33
|
|
|
2035
|
-
|
|
2036
|
-
|
|
2037
|
-
console.log('SPARQL Generated:', result.explanation.sparql_queries)
|
|
2038
|
-
console.log('Proof Hash:', result.proof?.hash)
|
|
34
|
+
```bash
|
|
35
|
+
npm install rust-kgdb
|
|
2039
36
|
```
|
|
2040
37
|
|
|
2041
|
-
**
|
|
2042
|
-
|
|
2043
|
-
| Layer | Default Value |
|
|
2044
|
-
|-------|---------------|
|
|
2045
|
-
| Memory | Disabled (no session persistence) |
|
|
2046
|
-
| Scope | Unrestricted (all graphs, all predicates) |
|
|
2047
|
-
| Embeddings | Disabled (no similarity search) |
|
|
2048
|
-
| Sandbox | `['ReadKG', 'ExecuteTool']`, fuel: 1M |
|
|
2049
|
-
| LLM Model | None (demo mode with keyword matching) |
|
|
2050
|
-
| Identity | Auto-generated UUID, no user tracking |
|
|
2051
|
-
|
|
2052
|
-
### Session Management: User Identity & Agent Persistence
|
|
2053
|
-
|
|
2054
|
-
HyperMind provides **recognized and persisted** identities for multi-tenant, audit-compliant deployments:
|
|
2055
|
-
|
|
2056
|
-
```
|
|
2057
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
2058
|
-
│ SESSION & IDENTITY MODEL │
|
|
2059
|
-
│ │
|
|
2060
|
-
│ THREE IDENTITY LAYERS: │
|
|
2061
|
-
│ │
|
|
2062
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
2063
|
-
│ │ 1. AGENT NAME (Persistent) │ │
|
|
2064
|
-
│ │ - Unique identifier for the agent type │ │
|
|
2065
|
-
│ │ - Persists across sessions, users, and restarts │ │
|
|
2066
|
-
│ │ - Example: 'fraud-detector', 'underwriter', 'claims-reviewer' │ │
|
|
2067
|
-
│ │ - Used for: Role-based access, audit trails, agent memory │ │
|
|
2068
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
2069
|
-
│ │
|
|
2070
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
2071
|
-
│ │ 2. USER ID (Multi-tenant) │ │
|
|
2072
|
-
│ │ - Identity of the human user invoking the agent │ │
|
|
2073
|
-
│ │ - Persisted in episodic memory for audit compliance │ │
|
|
2074
|
-
│ │ - Example: 'user:alice@company.com', 'user:claims-team' │ │
|
|
2075
|
-
│ │ - Used for: Access control, usage tracking, billing │ │
|
|
2076
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
2077
|
-
│ │
|
|
2078
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
2079
|
-
│ │ 3. SESSION ID (Ephemeral) │ │
|
|
2080
|
-
│ │ - Unique identifier for a single conversation/interaction │ │
|
|
2081
|
-
│ │ - Links all operations within one user interaction │ │
|
|
2082
|
-
│ │ - Example: 'session:2025-12-15-001', auto-generated UUID │ │
|
|
2083
|
-
│ │ - Used for: Conversation context, working memory scope │ │
|
|
2084
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
2085
|
-
│ │
|
|
2086
|
-
│ PERSISTENCE MODEL: │
|
|
2087
|
-
│ │
|
|
2088
|
-
│ Agent Name ─────► Stored in KG: <agent:fraud-detector> a am:Agent . │
|
|
2089
|
-
│ User ID ─────► Stored in KG: <user:alice> a am:User . │
|
|
2090
|
-
│ Session ID ─────► Stored in KG: <session:001> a am:Session . │
|
|
2091
|
-
│ │
|
|
2092
|
-
│ Episode ─────────► Links all three: │
|
|
2093
|
-
│ <episode:123> am:performedBy <agent:fraud-detector> ; │
|
|
2094
|
-
│ am:requestedBy <user:alice> ; │
|
|
2095
|
-
│ am:inSession <session:001> . │
|
|
2096
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
2097
|
-
```
|
|
38
|
+
**Platforms**: macOS (Intel/Apple Silicon), Linux (x64/ARM64), Windows (x64)
|
|
2098
39
|
|
|
2099
|
-
|
|
40
|
+
### Basic Usage (5 Lines)
|
|
2100
41
|
|
|
2101
42
|
```javascript
|
|
2102
|
-
const {
|
|
2103
|
-
|
|
2104
|
-
// Create agent with full identity configuration
|
|
2105
|
-
const agent = new HyperMindAgent({
|
|
2106
|
-
kg: db,
|
|
2107
|
-
|
|
2108
|
-
// Agent identity (persistent across all users/sessions)
|
|
2109
|
-
name: 'fraud-detector',
|
|
2110
|
-
|
|
2111
|
-
// User identity (for multi-tenant deployments)
|
|
2112
|
-
userId: 'user:alice@acme-insurance.com',
|
|
43
|
+
const { GraphDB } = require('rust-kgdb')
|
|
2113
44
|
|
|
2114
|
-
|
|
2115
|
-
|
|
2116
|
-
|
|
2117
|
-
//
|
|
2118
|
-
memory: new MemoryManager({
|
|
2119
|
-
workingMemorySize: 20, // In-session context
|
|
2120
|
-
episodicRetentionDays: 90, // 90-day retention for compliance
|
|
2121
|
-
longTermGraph: 'http://memory.acme-insurance.com/'
|
|
2122
|
-
})
|
|
2123
|
-
})
|
|
2124
|
-
|
|
2125
|
-
// First query in session
|
|
2126
|
-
await agent.call('Find claims over $100,000')
|
|
2127
|
-
|
|
2128
|
-
// Second query - agent remembers context from first query
|
|
2129
|
-
await agent.call('Now show me which of those are from Provider P001')
|
|
2130
|
-
|
|
2131
|
-
// Episodic memory stores the full conversation:
|
|
2132
|
-
// <episode:uuid-1> am:prompt "Find claims over $100,000" ;
|
|
2133
|
-
// am:performedBy <agent:fraud-detector> ;
|
|
2134
|
-
// am:requestedBy <user:alice@acme-insurance.com> ;
|
|
2135
|
-
// am:inSession <session:web-ui-2025-12-15-143022> ;
|
|
2136
|
-
// am:timestamp "2025-12-15T14:30:22Z" .
|
|
45
|
+
const db = new GraphDB('http://example.org/')
|
|
46
|
+
db.loadTtl('@prefix : <http://example.org/> . :alice :knows :bob .', null)
|
|
47
|
+
const results = db.querySelect('PREFIX : <http://example.org/> SELECT ?who WHERE { ?who :knows :bob }')
|
|
48
|
+
console.log(results) // [{ bindings: { who: 'http://example.org/alice' } }]
|
|
2137
49
|
```
|
|
2138
50
|
|
|
2139
|
-
|
|
51
|
+
### Complete Example with AI Agent
|
|
2140
52
|
|
|
2141
|
-
|
|
2142
|
-
|
|
2143
|
-
| `name` | String | Permanent (KG) | Agent type identification |
|
|
2144
|
-
| `userId` | URI or String | Per-episode | Audit trails, multi-tenant isolation |
|
|
2145
|
-
| `sessionId` | UUID or String | Per-session | Conversation continuity |
|
|
53
|
+
```javascript
|
|
54
|
+
const { GraphDB, HyperMindAgent, createSchemaAwareGraphDB } = require('rust-kgdb')
|
|
2146
55
|
|
|
2147
|
-
|
|
56
|
+
// Load your data
|
|
57
|
+
const db = createSchemaAwareGraphDB('http://insurance.org/')
|
|
58
|
+
db.loadTtl(`
|
|
59
|
+
@prefix : <http://insurance.org/> .
|
|
60
|
+
:CLM001 a :Claim ; :amount "50000" ; :provider :PROV001 .
|
|
61
|
+
:PROV001 a :Provider ; :riskScore "0.87" ; :name "MedCorp" .
|
|
62
|
+
`, null)
|
|
2148
63
|
|
|
2149
|
-
|
|
2150
|
-
// New session, same user - retrieve previous context
|
|
64
|
+
// Create AI agent
|
|
2151
65
|
const agent = new HyperMindAgent({
|
|
2152
66
|
kg: db,
|
|
2153
|
-
|
|
2154
|
-
|
|
2155
|
-
sessionId: 'session:web-ui-2025-12-16-091500', // New session
|
|
2156
|
-
memory: new MemoryManager({ episodicRetentionDays: 90 })
|
|
67
|
+
model: 'gpt-4o',
|
|
68
|
+
apiKey: process.env.OPENAI_API_KEY
|
|
2157
69
|
})
|
|
2158
70
|
|
|
2159
|
-
//
|
|
2160
|
-
const previousInvestigations = await agent.memory.query(`
|
|
2161
|
-
SELECT ?prompt ?result ?timestamp WHERE {
|
|
2162
|
-
?episode am:requestedBy <user:alice@acme-insurance.com> ;
|
|
2163
|
-
am:prompt ?prompt ;
|
|
2164
|
-
am:result ?result ;
|
|
2165
|
-
am:timestamp ?timestamp .
|
|
2166
|
-
} ORDER BY DESC(?timestamp) LIMIT 10
|
|
2167
|
-
`)
|
|
2168
|
-
// Returns: Last 10 queries by Alice across all her sessions
|
|
2169
|
-
```
|
|
2170
|
-
|
|
2171
|
-
**Audit Compliance Features**:
|
|
2172
|
-
|
|
2173
|
-
| Requirement | How HyperMind Addresses It |
|
|
2174
|
-
|-------------|---------------------------|
|
|
2175
|
-
| Who ran the query? | `userId` persisted in every episode |
|
|
2176
|
-
| What agent was used? | `name` links to agent's capabilities |
|
|
2177
|
-
| When did it happen? | `am:timestamp` on every episode |
|
|
2178
|
-
| What was the result? | `am:result` with full execution trace |
|
|
2179
|
-
| Can we replay it? | ProofDAG enables deterministic replay |
|
|
2180
|
-
| Retention policy? | `episodicRetentionDays` enforces TTL |
|
|
2181
|
-
|
|
2182
|
-
### Schema-Aware Intent: Different Words → Same Result
|
|
2183
|
-
|
|
2184
|
-
The LLM Planner + Schema injection ensures consistent results regardless of phrasing:
|
|
2185
|
-
|
|
2186
|
-
```javascript
|
|
2187
|
-
// All these queries produce the SAME SPARQL because LLM knows your schema
|
|
2188
|
-
await agent.call('Find high-risk providers') // "high-risk"
|
|
2189
|
-
await agent.call('Show me suspicious vendors') // "suspicious vendors"
|
|
2190
|
-
await agent.call('Which suppliers have elevated risk?') // "elevated risk"
|
|
2191
|
-
await agent.call('List providers with bad scores') // "bad scores"
|
|
2192
|
-
|
|
2193
|
-
// Generated SPARQL (same for all above):
|
|
2194
|
-
// SELECT ?provider ?name ?score WHERE {
|
|
2195
|
-
// ?provider a :Provider ; :name ?name ; :riskScore ?score .
|
|
2196
|
-
// FILTER(?score > 0.7)
|
|
2197
|
-
// }
|
|
2198
|
-
```
|
|
2199
|
-
|
|
2200
|
-
**How it works**:
|
|
2201
|
-
1. LLM receives your schema: `{ classes: ['Claim', 'Provider'], predicates: ['riskScore', 'amount'] }`
|
|
2202
|
-
2. LLM understands "vendors", "suppliers", "providers" all map to `:Provider`
|
|
2203
|
-
3. LLM understands "high-risk", "suspicious", "bad" all map to `:riskScore > threshold`
|
|
2204
|
-
4. Generated SPARQL uses YOUR actual predicates, not hallucinated ones
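Step 4 can be checked mechanically. The following is a minimal sketch of such a check, assuming a schema object shaped like the one in step 1 (with `name` added because the generated query above uses it) and queries that use only the default `:` prefix. `findUnknownTerms` is a hypothetical helper, not part of `rust-kgdb`, and the regex is a rough token scan rather than a SPARQL parser.

```javascript
// Hypothetical schema shape, mirroring step 1 above.
const schema = {
  classes: ['Claim', 'Provider'],
  predicates: ['riskScore', 'amount', 'name']
}

// Hypothetical helper: return every `:localName` term in the query that is
// neither a known class nor a known predicate.
function findUnknownTerms(sparql, schema) {
  const known = new Set([...schema.classes, ...schema.predicates])
  const unknown = []
  // Match `:name` only at the start or after whitespace/punctuation, so the
  // colon inside full IRIs such as <http://...> is ignored.
  for (const match of sparql.matchAll(/(?:^|[\s;{(,]):([A-Za-z_]\w*)/g)) {
    if (!known.has(match[1])) unknown.push(match[1])
  }
  return unknown
}

const good = 'SELECT ?provider WHERE { ?provider a :Provider ; :riskScore ?score . }'
const bad = 'SELECT ?provider WHERE { ?provider :fraudProbability ?p . }'

console.log(findUnknownTerms(good, schema)) // []
console.log(findUnknownTerms(bad, schema))  // [ 'fraudProbability' ]
```

A query that returns a non-empty list could be rejected or sent back to the LLM for regeneration before it ever reaches the knowledge graph.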
|
|
2205
|
-
|
|
2206
|
-
### Mathematical Foundation: Predictable AI
|
|
2207
|
-
|
|
2208
|
-
Unlike black-box LLMs, HyperMind produces **deterministic, verifiable results**:
|
|
2209
|
-
|
|
2210
|
-
```
|
|
2211
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
2212
|
-
│ NEURO-SYMBOLIC ARCHITECTURE │
|
|
2213
|
-
│ │
|
|
2214
|
-
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │
|
|
2215
|
-
│ │ Neural │ │ Symbolic │ │ Output │ │
|
|
2216
|
-
│ │ (LLM) │────→│ (SPARQL) │────→│ (Proof DAG) │ │
|
|
2217
|
-
│ │ │ │ │ │ │ │
|
|
2218
|
-
│ │ Intent classif │ │ Query execution│ │ Verifiable │ │
|
|
2219
|
-
│ │ SPARQL gen │ │ Datalog rules │ │ Reproducible │ │
|
|
2220
|
-
│ └────────────────┘ └────────────────┘ └────────────────┘ │
|
|
2221
|
-
│ │
|
|
2222
|
-
│ "Find fraud" → SELECT ?claim WHERE {...} → { hash: "0x8f3a...", │
|
|
2223
|
-
│ derivation: [...] } │
|
|
2224
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
2225
|
-
```
|
|
2226
|
-
|
|
2227
|
-
**Three Mathematical Pillars**:
|
|
2228
|
-
|
|
2229
|
-
| Pillar | Guarantee | Implementation |
|
|
2230
|
-
|--------|-----------|----------------|
|
|
2231
|
-
| **Type Theory** | Input/output contracts enforced | `kg.sparql.query: Query → BindingSet` |
|
|
2232
|
-
| **Category Theory** | Safe tool composition | Morphisms compose: `A → B → C` |
|
|
2233
|
-
| **Proof Theory** | Every answer has provenance | ProofDAG with Curry-Howard witness |
|
|
2234
|
-
|
|
2235
|
-
**Why This Matters**:
|
|
2236
|
-
- **No Hallucination**: SPARQL results come from your actual data
|
|
2237
|
-
- **Audit Trail**: Every conclusion traceable to source triples
|
|
2238
|
-
- **Reproducibility**: Same query → same answer → same proof hash
|
|
2239
|
-
- **Compliance Ready**: Full provenance for regulatory requirements
|
|
2240
|
-
|
|
2241
|
-
### Comparison with Agentic Frameworks
|
|
2242
|
-
|
|
2243
|
-
How HyperMind differs from popular LLM orchestration frameworks:
|
|
2244
|
-
|
|
2245
|
-
| Feature | HyperMind | LangChain | DSPy | CrewAI | AutoGPT |
|
|
2246
|
-
|---------|-----------|-----------|------|--------|---------|
|
|
2247
|
-
| **Core Paradigm** | Neuro-Symbolic | Chain-of-Thought | Prompt Optimization | Multi-Agent Roles | Autonomous Loop |
|
|
2248
|
-
| **Prompt Optimization** | ✅ Schema injection | ❌ Manual templates | ✅ Compiled prompts | ❌ Role-based | ❌ Fixed prompts |
|
|
2249
|
-
| **Grounding Source** | Knowledge Graph | External retrievers | Training data | Tool calls | Web search |
|
|
2250
|
-
| **Verification** | ✅ ProofDAG | ❌ Trust LLM | ❌ Trust LLM | ❌ Trust LLM | ❌ Trust LLM |
|
|
2251
|
-
| **Determinism** | ✅ Same hash | ❌ Varies | ❌ Varies | ❌ Varies | ❌ Varies |
|
|
2252
|
-
| **Memory Model** | Temporal + LT KG | VectorDB | None | VectorDB | VectorDB |
|
|
2253
|
-
| **Security** | WASM OCAP | Trust-based | None | Trust-based | Trust-based |
|
|
2254
|
-
| **Type Safety** | ✅ Curry-Howard | ❌ Runtime | ❌ Runtime | ❌ Runtime | ❌ Runtime |
|
|
2255
|
-
|
|
2256
|
-
#### Prompt Optimization: Schema Injection vs. Others
|
|
2257
|
-
|
|
2258
|
-
**LangChain (Manual Prompts)**:
|
|
2259
|
-
```python
|
|
2260
|
-
# Developer writes prompts by hand - error-prone, doesn't know actual schema
|
|
2261
|
-
template = """Given this context: {context}
|
|
2262
|
-
Answer: {question}"""
|
|
2263
|
-
# Problem: Context is unstructured, LLM may hallucinate predicates
|
|
2264
|
-
```
|
|
2265
|
-
|
|
2266
|
-
**DSPy (Compiled Prompts)**:
|
|
2267
|
-
```python
|
|
2268
|
-
# Learns optimal prompts from training examples
|
|
2269
|
-
class FraudDetector(dspy.Signature):
|
|
2270
|
-
claim = dspy.InputField()
|
|
2271
|
-
is_fraud = dspy.OutputField()
|
|
2272
|
-
# Problem: Still no grounding - outputs are unverified predictions
|
|
2273
|
-
```
|
|
2274
|
-
|
|
2275
|
-
**HyperMind (Schema-Injected Prompts)**:
|
|
2276
|
-
```javascript
|
|
2277
|
-
// Automatic schema extraction + injection
|
|
2278
|
-
const schema = SchemaContext.fromKG(db)
|
|
2279
|
-
// schema = { classes: ['Claim', 'Provider'], predicates: ['amount', 'riskScore'] }
|
|
2280
|
-
|
|
2281
|
-
// LLM receives YOUR schema - can only use valid predicates
|
|
2282
|
-
// Prompt: "Generate SPARQL using ONLY: amount, riskScore"
|
|
2283
|
-
// Result: Valid SPARQL that executes against YOUR data
|
|
2284
|
-
```
|
|
2285
|
-
|
|
2286
|
-
**Why Schema Injection > Prompt Templates**:
|
|
2287
|
-
|
|
2288
|
-
| Approach | Hallucination Risk | Schema Drift | Verification |
|
|
2289
|
-
|----------|-------------------|--------------|--------------|
|
|
2290
|
-
| Manual templates | High | Not handled | None |
|
|
2291
|
-
| DSPy compiled | Medium | Not handled | None |
|
|
2292
|
-
| **HyperMind schema** | **Low** | **Auto-detected** | **ProofDAG** |
|
|
2293
|
-
|
|
2294
|
-
```
|
|
2295
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
2296
|
-
│ PROMPT OPTIMIZATION COMPARISON │
|
|
2297
|
-
│ │
|
|
2298
|
-
│ LANGCHAIN: HYPERMIND: │
|
|
2299
|
-
│ ┌──────────────────┐ ┌──────────────────┐ │
|
|
2300
|
-
│ │ Static Prompt │ │ Schema Extract │ ← Auto from KG │
|
|
2301
|
-
│ │ "Find fraud..." │ │ {classes, pred} │ │
|
|
2302
|
-
│ └────────┬─────────┘ └────────┬─────────┘ │
|
|
2303
|
-
│ │ │ │
|
|
2304
|
-
│ ▼ ▼ │
|
|
2305
|
-
│ ┌──────────────────┐ ┌──────────────────┐ │
|
|
2306
|
-
│ │ LLM │ │ LLM + Schema │ ← Constrained │
|
|
2307
|
-
│ │ (unconstrained) │ │ injection │ │
|
|
2308
|
-
│ └────────┬─────────┘ └────────┬─────────┘ │
|
|
2309
|
-
│ │ │ │
|
|
2310
|
-
│ ▼ ▼ │
|
|
2311
|
-
│ ┌──────────────────┐ ┌──────────────────┐ │
|
|
2312
|
-
│ │ "fraud in the │ │ SELECT ?claim │ ← Valid SPARQL │
|
|
2313
|
-
│ │ insurance..." │ │ WHERE {valid} │ │
|
|
2314
|
-
│ │ (unstructured) │ └────────┬─────────┘ │
|
|
2315
|
-
│ └──────────────────┘ │ │
|
|
2316
|
-
│ ▼ │
|
|
2317
|
-
│ ┌──────────────────┐ │
|
|
2318
|
-
│ │ Execute against │ ← Actual data │
|
|
2319
|
-
│ │ Knowledge Graph │ │
|
|
2320
|
-
│ └────────┬─────────┘ │
|
|
2321
|
-
│ │ │
|
|
2322
|
-
│ ▼ │
|
|
2323
|
-
│ ┌──────────────────┐ │
|
|
2324
|
-
│ │ ProofDAG │ ← Verifiable │
|
|
2325
|
-
│ │ hash: 0x8f3a... │ │
|
|
2326
|
-
│ └──────────────────┘ │
|
|
2327
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
2328
|
-
```
|
|
2329
|
-
|
|
2330
|
-
**Key Insight**: DSPy optimizes prompts for *output format*. HyperMind optimizes prompts for *semantic correctness* by grounding in your actual data schema.
|
|
2331
|
-
|
|
2332
|
-
### HyperMind as Intelligence Control Plane
|
|
2333
|
-
|
|
2334
|
-
HyperMind implements a **control plane architecture** for LLM agents, aligning with recent research on the "missing coordination layer" for AI systems (see [Chang 2025](https://arxiv.org/abs/2512.05765)).
|
|
2335
|
-
|
|
2336
|
-
```
|
|
2337
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
2338
|
-
│ HYPERMIND CONTROL PLANE │
|
|
2339
|
-
│ │
|
|
2340
|
-
│ ┌─────────────────────────────────────────────────────────────────────────┐│
|
|
2341
|
-
│ │ LAYER 3: PROOF/VERIFICATION (Type Theory) ││
|
|
2342
|
-
│ │ - Curry-Howard correspondence: proofs as programs ││
|
|
2343
|
-
│ │ - ProofDAG: verifiable reasoning chains ││
|
|
2344
|
-
│ │ - Deterministic hashes: reproducible conclusions ││
|
|
2345
|
-
│ └─────────────────────────────────────────────────────────────────────────┘│
|
|
2346
|
-
│ ↑ │
|
|
2347
|
-
│ ┌─────────────────────────────────────────────────────────────────────────┐│
|
|
2348
|
-
│ │ LAYER 2: SCHEMA/CONSTRAINT (Category Theory) ││
|
|
2349
|
-
│ │ - SchemaContext: semantic anchoring to KG structure ││
|
|
2350
|
-
│ │ - Tool composition: morphisms A → B → C ││
|
|
2351
|
-
│ │ - Type contracts: Query → BindingSet (enforced) ││
|
|
2352
|
-
│ └─────────────────────────────────────────────────────────────────────────┘│
|
|
2353
|
-
│ ↑ │
|
|
2354
|
-
│ ┌─────────────────────────────────────────────────────────────────────────┐│
|
|
2355
|
-
│ │ LAYER 1: MEMORY/PERSISTENCE (Hypergraph) ││
|
|
2356
|
-
│ │ - Episodic memory: temporal scoring, rolling context ││
|
|
2357
|
-
│ │ - Long-term KG: persistent facts + relationships ││
|
|
2358
|
-
│ │ - Session continuity: cross-invocation state ││
|
|
2359
|
-
│ └─────────────────────────────────────────────────────────────────────────┘│
|
|
2360
|
-
│ ↑ │
|
|
2361
|
-
│ ┌─────────────────────────────────────────────────────────────────────────┐│
|
|
2362
|
-
│ │ LLM (Pattern Layer - e.g., Claude, GPT-4o) ││
|
|
2363
|
-
│ │ - Intent classification ││
|
|
2364
|
-
│ │ - SPARQL generation (constrained by schema) ││
|
|
2365
|
-
│ │ - Natural language understanding ││
|
|
2366
|
-
│ └─────────────────────────────────────────────────────────────────────────┘│
|
|
2367
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
2368
|
-
```
|
|
2369
|
-
|
|
2370
|
-
**Key Insight**: LLMs alone produce "pattern alchemy" - plausible but unverified outputs. HyperMind adds **coordination physics** through:
|
|
2371
|
-
|
|
2372
|
-
| Control Mechanism | Implementation | Effect |
|
|
2373
|
-
|-------------------|----------------|--------|
|
|
2374
|
-
| **Semantic Anchoring** | SchemaContext injection | LLM outputs constrained to valid predicates |
|
|
2375
|
-
| **Goal-Directed Constraints** | Type contracts (TOOL_REGISTRY) | Tool composition validated at compile-time |
|
|
2376
|
-
| **Transactional Memory** | Memory Hypergraph | Context persists across sessions |
|
|
2377
|
-
| **Verification Layer** | ProofDAG | Every conclusion has auditable derivation |
|
|
2378
|
-
|
|
2379
|
-
**Research Alignment**:
|
|
2380
|
-
- [Chang 2025 - "The Missing Layer of AGI"](https://arxiv.org/abs/2512.05765): Coordination layer shifts LLM outputs from unguided to goal-directed
|
|
2381
|
-
- [Curry-Howard Correspondence](https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspondence): Proofs = Programs (HyperMind implements this)
|
|
2382
|
-
- [Spivak's Ologs](https://arxiv.org/abs/1102.1889): Category-theoretic knowledge representation
|
|
2383
|
-
|
|
2384
|
-
### ProofDAG Example Output
|
|
2385
|
-
|
|
2386
|
-
Every HyperMind agent response includes a verifiable proof:
|
|
2387
|
-
|
|
2388
|
-
```javascript
|
|
71
|
+
// Ask questions in plain English
|
|
2389
72
|
const result = await agent.call('Find high-risk providers')
|
|
2390
73
|
|
|
2391
|
-
|
|
2392
|
-
|
|
2393
|
-
|
|
2394
|
-
|
|
2395
|
-
|
|
2396
|
-
|
|
2397
|
-
|
|
2398
|
-
│ │
|
|
2399
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
2400
|
-
│ │ ROOT: conclusion │ │
|
|
2401
|
-
│ │ hash: 0x8f3a2b1c... │ │
|
|
2402
|
-
│ │ type: FraudReport │ │
|
|
2403
|
-
│ │ confidence: 0.94 │ │
|
|
2404
|
-
│ └──────────────────────────┬──────────────────────────────────────────┘ │
|
|
2405
|
-
│ │ │
|
|
2406
|
-
│ ┌────────────────┼────────────────┐ │
|
|
2407
|
-
│ │ │ │ │
|
|
2408
|
-
│ ▼ ▼ ▼ │
|
|
2409
|
-
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
|
2410
|
-
│ │ sparql_result│ │datalog_rule │ │embedding_sim │ │
|
|
2411
|
-
│ │ │ │ │ │ │ │
|
|
2412
|
-
│ │ tool: query │ │ tool: apply │ │ tool: search │ │
|
|
2413
|
-
│ │ bindings: 47 │ │ rule: fraud │ │ similar: 3 │ │
|
|
2414
|
-
│ │ time: 2.3ms │ │ inferred: 12 │ │ threshold:0.8│ │
|
|
2415
|
-
│ └──────────────┘ └──────────────┘ └──────────────┘ │
|
|
2416
|
-
│ │
|
|
2417
|
-
│ Derivation Chain: │
|
|
2418
|
-
│ 1. kg.sparql.query → 47 high-amount claims from Provider P001 │
|
|
2419
|
-
│ 2. kg.datalog.apply → fraud_pattern rule matched 12 claims │
|
|
2420
|
-
│ 3. kg.embeddings.search → P001 similar to 3 known fraud providers │
|
|
2421
|
-
│ 4. CONCLUSION: P001 risk score 0.87 (high confidence) │
|
|
2422
|
-
│ │
|
|
2423
|
-
│ Proof Hash: 0x8f3a2b1c4d5e6f7a8b9c0d1e2f3a4b5c │
|
|
2424
|
-
│ (Deterministic - same inputs always produce same hash) │
|
|
2425
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
2426
|
-
```
|
|
2427
|
-
|
|
2428
|
-
**JSON Structure**:
|
|
2429
|
-
```json
|
|
2430
|
-
{
|
|
2431
|
-
"hash": "0x8f3a2b1c4d5e6f7a8b9c0d1e2f3a4b5c",
|
|
2432
|
-
"type": "curry_howard_witness",
|
|
2433
|
-
"root": {
|
|
2434
|
-
"id": "conclusion",
|
|
2435
|
-
"type": "FraudReport",
|
|
2436
|
-
"confidence": 0.94,
|
|
2437
|
-
"derives_from": ["sparql_result", "datalog_rule", "embedding_sim"]
|
|
2438
|
-
},
|
|
2439
|
-
"nodes": [
|
|
2440
|
-
{
|
|
2441
|
-
"id": "sparql_result",
|
|
2442
|
-
"tool": "kg.sparql.query",
|
|
2443
|
-
"input_type": "Query",
|
|
2444
|
-
"output_type": "BindingSet",
|
|
2445
|
-
"result": { "count": 47, "time_ms": 2.3 }
|
|
2446
|
-
},
|
|
2447
|
-
{
|
|
2448
|
-
"id": "datalog_rule",
|
|
2449
|
-
"tool": "kg.datalog.apply",
|
|
2450
|
-
"input_type": "RuleSet",
|
|
2451
|
-
"output_type": "InferredFacts",
|
|
2452
|
-
"result": { "rule": "fraud_pattern", "inferred": 12 }
|
|
2453
|
-
},
|
|
2454
|
-
{
|
|
2455
|
-
"id": "embedding_sim",
|
|
2456
|
-
"tool": "kg.embeddings.search",
|
|
2457
|
-
"input_type": "Entity",
|
|
2458
|
-
"output_type": "SimilarEntities",
|
|
2459
|
-
"result": { "similar": 3, "threshold": 0.8 }
|
|
2460
|
-
}
|
|
2461
|
-
],
|
|
2462
|
-
"timestamp": "2025-12-15T10:30:00Z",
|
|
2463
|
-
"agent": "fraud-detector"
|
|
2464
|
-
}
|
|
2465
|
-
```
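The claim that the same inputs always produce the same hash requires hashing a canonical form of the proof, since JSON key order and volatile fields such as `timestamp` would otherwise change the digest. Below is a minimal sketch of that idea using Node's built-in `crypto` module. The choice of SHA-256, the set of excluded fields, and the helper names `canonicalize` and `proofHash` are all assumptions for illustration; the README does not specify HyperMind's actual hashing scheme.

```javascript
const crypto = require('crypto')

// Hypothetical helper: serialize JSON-compatible data with object keys
// sorted, so that logically equal objects yield identical strings.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(',')}]`
  }
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value).sort()
    return `{${keys.map(k => `${JSON.stringify(k)}:${canonicalize(value[k])}`).join(',')}}`
  }
  return JSON.stringify(value)
}

// Hypothetical helper: hash everything except fields assumed to be volatile
// (the timestamp and any previously stored hash).
function proofHash(proof) {
  const { timestamp, hash, ...stable } = proof
  return '0x' + crypto.createHash('sha256').update(canonicalize(stable)).digest('hex')
}

const proofA = {
  agent: 'fraud-detector',
  root: { id: 'conclusion', confidence: 0.94 },
  timestamp: '2025-12-15T10:30:00Z'
}
const proofB = {
  timestamp: '2025-12-16T09:00:00Z',
  root: { confidence: 0.94, id: 'conclusion' },
  agent: 'fraud-detector'
}

console.log(proofHash(proofA) === proofHash(proofB)) // true
```

Excluding the timestamp is a design choice: it makes replays comparable across runs, at the cost of the hash no longer attesting to when the proof was produced.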
|
|
2466
|
-
|
|
2467
|
-
**How Intent Classification Works:**
|
|
2468
|
-
|
|
2469
|
-
For accurate natural language → SPARQL conversion, the agent needs:
|
|
2470
|
-
|
|
2471
|
-
1. **Schema Awareness** - Know actual predicates in your graph
|
|
2472
|
-
2. **Semantic Understanding** - Map natural language to graph operations
|
|
2473
|
-
3. **Dynamic Query Generation** - Build SPARQL for your specific schema
|
|
2474
|
-
|
|
2475
|
-
**Two Modes of Operation:**
|
|
2476
|
-
|
|
2477
|
-
| Mode | Intent Classification | SPARQL Generation | Use Case |
|
|
2478
|
-
|------|----------------------|-------------------|----------|
|
|
2479
|
-
| **Demo Mode** (default) | Keyword patterns | Hardcoded templates | Quick testing, demos |
|
|
2480
|
-
| **Production Mode** | LLM + Schema injection | LLM-generated | Accurate queries on real data |
|
|
2481
|
-
|
|
2482
|
-
### Demo Mode (Current Default)
|
|
2483
|
-
|
|
2484
|
-
Works with keyword matching and pre-built templates:
|
|
2485
|
-
|
|
2486
|
-
```javascript
|
|
2487
|
-
const agent = new HyperMindAgent({ kg: db })
|
|
2488
|
-
|
|
2489
|
-
// Works: keyword "fraud" matches detect_fraud intent
|
|
2490
|
-
await agent.call('Find fraud cases')
|
|
2491
|
-
|
|
2492
|
-
// Fails: "anomalous" doesn't match any keyword
|
|
2493
|
-
await agent.call('Find anomalous billing patterns') // Falls back to generic query
|
|
74
|
+
// Every answer includes:
|
|
75
|
+
// - The SPARQL query that was generated
|
|
76
|
+
// - The data that was retrieved
|
|
77
|
+
// - A reasoning trace showing how the conclusion was reached
|
|
78
|
+
// - A cryptographic hash for reproducibility
|
|
79
|
+
console.log(result.answer)
|
|
80
|
+
console.log(result.reasoningTrace) // Full audit trail
|
|
2494
81
|
```
|
|
2495
82
|
|
|
2496
|
-
|
|
2497
|
-
- Only matches exact keywords: "fraud", "suspicious", "risk", "similar", etc.
|
|
2498
|
-
- Uses hardcoded SPARQL templates that may not match your schema
|
|
2499
|
-
- Suitable for demos with insurance/LUBM ontologies only
|
|
83
|
+
---
|
|
2500
84
|
|
|
2501
|
-
|
|
85
|
+
## Use Cases
|
|
2502
86
|
|
|
2503
|
-
|
|
87
|
+
### Fraud Detection
|
|
2504
88
|
|
|
2505
89
|
```javascript
|
|
2506
90
|
const agent = new HyperMindAgent({
|
|
2507
|
-
kg:
|
|
2508
|
-
|
|
2509
|
-
model: 'claude-
|
|
2510
|
-
apiKey: process.env.ANTHROPIC_API_KEY // Required for LLM calls
|
|
91
|
+
kg: insuranceDB,
|
|
92
|
+
name: 'fraud-detector',
|
|
93
|
+
model: 'claude-3-opus'
|
|
2511
94
|
})
|
|
2512
95
|
|
|
2513
|
-
|
|
2514
|
-
|
|
2515
|
-
|
|
2516
|
-
|
|
2517
|
-
|
|
2518
|
-
|
|
96
|
+
const result = await agent.call('Find providers with suspicious billing patterns')
|
|
97
|
+
// Returns: List of providers with complete evidence trail
|
|
98
|
+
// - SPARQL queries executed
|
|
99
|
+
// - Rules that matched
|
|
100
|
+
// - Similar entities found via embeddings
|
|
2519
101
|
```
|
|
2520
|
-
User Query: "Find anomalous billing patterns"
|
|
2521
|
-
│
|
|
2522
|
-
▼
|
|
2523
|
-
┌─────────────────────────────────────────────────────────────────┐
|
|
2524
|
-
│ 1. SCHEMA INJECTION │
|
|
2525
|
-
│ Agent extracts predicates from KG: │
|
|
2526
|
-
│ Classes: Claim, Provider, Claimant │
|
|
2527
|
-
│ Predicates: submittedBy, amount, riskScore, filedDate │
|
|
2528
|
-
└─────────────────────────────────────────────────────────────────┘
|
|
2529
|
-
│
|
|
2530
|
-
▼
|
|
2531
|
-
┌─────────────────────────────────────────────────────────────────┐
|
|
2532
|
-
│ 2. LLM INTENT CLASSIFICATION │
|
|
2533
|
-
│ Prompt: "Given schema {classes, predicates}, classify: │
|
|
2534
|
-
│ 'Find anomalous billing patterns'" │
|
|
2535
|
-
│ Response: { intent: 'detect_fraud', confidence: 0.92 } │
|
|
2536
|
-
└─────────────────────────────────────────────────────────────────┘
|
|
2537
|
-
│
|
|
2538
|
-
▼
|
|
2539
|
-
┌─────────────────────────────────────────────────────────────────┐
|
|
2540
|
-
│ 3. LLM SPARQL GENERATION │
|
|
2541
|
-
│ Prompt: "Generate SPARQL for detect_fraud using: │
|
|
2542
|
-
│ - Predicates: {submittedBy, amount, riskScore} │
|
|
2543
|
-
│ - Type contracts: Output must be valid SPARQL 1.1" │
|
|
2544
|
-
│ Response: Valid SPARQL matching YOUR schema │
|
|
2545
|
-
└─────────────────────────────────────────────────────────────────┘
|
|
2546
|
-
```
|
|
2547
|
-
|
|
2548
|
-
**Why EmbeddingService?**
|
|
2549
102
|
|
|
2550
|
-
|
|
2551
|
-
1. **Semantic Search Tool** - `find_similar` intent uses `kg.embeddings.search`
|
|
2552
|
-
2. **Memory Retrieval** - Find similar past queries for context
|
|
103
|
+
### Regulatory Compliance
|
|
2553
104
|
|
|
2554
105
|
```javascript
|
|
2555
|
-
// Without embeddings: only SPARQL + Datalog tools available
|
|
2556
|
-
const agent = new HyperMindAgent({ kg: db, model: 'claude-sonnet-4', apiKey })
|
|
2557
|
-
|
|
2558
|
-
// With embeddings: adds semantic search capability
|
|
2559
106
|
const agent = new HyperMindAgent({
|
|
2560
|
-
kg:
|
|
2561
|
-
|
|
2562
|
-
model: 'claude-sonnet-4',
|
|
2563
|
-
apiKey
|
|
107
|
+
kg: complianceDB,
|
|
108
|
+
scope: { allowedGraphs: ['http://compliance.org/'] } // Restrict access
|
|
2564
109
|
})
|
|
2565
|
-
|
|
110
|
+
|
|
111
|
+
const result = await agent.call('Check GDPR compliance for customer data flows')
|
|
112
|
+
// Returns: Compliance status with verifiable reasoning chain
|
|
2566
113
|
```
|
|
2567
114
|
|
|
2568
|
-
|
|
115
|
+
### Risk Assessment
|
|
2569
116
|
|
|
2570
117
|
```javascript
|
|
2571
|
-
const
|
|
2572
|
-
|
|
2573
|
-
|
|
2574
|
-
|
|
2575
|
-
|
|
2576
|
-
name: 'fraud-detector', // Optional: Agent identity for memory
|
|
2577
|
-
sandbox: { ... } // Optional: Security capabilities
|
|
2578
|
-
})
|
|
118
|
+
const result = await agent.call('Calculate risk score for entity P001')
|
|
119
|
+
// Returns: Risk score with complete derivation
|
|
120
|
+
// - Which data points were used
|
|
121
|
+
// - Which rules were applied
|
|
122
|
+
// - Confidence intervals
|
|
2579
123
|
```
|
|
2580
124
|
|
|
2581
125
|
---
|
|
2582
126
|
|
|
2583
|
-
##
|
|
2584
|
-
|
|
2585
|
-
### Test Environment
|
|
2586
|
-
|
|
2587
|
-
All benchmarks run on **commodity hardware** (Intel Mac) using the InMemory storage backend.
|
|
2588
|
-
|
|
2589
|
-
| Component | Specification |
|
|
2590
|
-
|-----------|---------------|
|
|
2591
|
-
| **Hardware** | Intel Mac (commodity laptop) |
|
|
2592
|
-
| **Backend** | InMemoryBackend (zero-copy, no GC) |
|
|
2593
|
-
| **Dataset** | [LUBM](http://swat.cse.lehigh.edu/projects/lubm/) (Lehigh University Benchmark) |
|
|
2594
|
-
| **Triples** | 3,272 (LUBM-1 scale factor) |
|
|
2595
|
-
| **Tool** | [Criterion.rs](https://github.com/bheisler/criterion.rs) statistical benchmarking |
|
|
2596
|
-
|
|
2597
|
-
### Measured Performance (Our Benchmarks)
|
|
2598
|
-
|
|
2599
|
-
| Metric | Measured Value | Rate |
|
|
2600
|
-
|--------|----------------|------|
|
|
2601
|
-
| **Triple Lookup** | 2.78 µs | 359K lookups/sec |
|
|
2602
|
-
| **Bulk Insert (100K)** | 682 ms | 146K triples/sec |
|
|
2603
|
-
| **Dictionary Intern (new)** | 1.10 ms / 1K | 909K/sec |
|
|
2604
|
-
| **Dictionary Lookup (cached)** | 60.4 µs / 100 | 1.65M/sec |
|
|
2605
|
-
|
|
2606
|
-
### Memory Efficiency
|
|
2607
|
-
|
|
2608
|
-
| Metric | Value | Calculation |
|
|
2609
|
-
|--------|-------|-------------|
|
|
2610
|
-
| **Bytes per Triple** | 24 bytes | 3 × 8-byte node references |
|
|
2611
|
-
| **Index Overhead** | 4 indexes | SPOC, POCS, OCSP, CSPO |
|
|
2612
|
-
|
|
2613
|
-
### Industry Comparison (Published Research)
|
|
2614
|
-
|
|
2615
|
-
All competitor numbers are from peer-reviewed papers and official documentation. **Direct same-hardware comparison requires independent benchmarking.**
|
|
2616
|
-
|
|
2617
|
-
#### Triple Store Performance Comparison
|
|
2618
|
-
|
|
2619
|
-
| System | Lookup Speed | Insert Rate | Memory/Triple | Source |
|
|
2620
|
-
|--------|-------------|-------------|---------------|--------|
|
|
2621
|
-
| **rust-kgdb** | **2.78 µs** | 146K/sec | **24 bytes** | [Our Criterion.rs benchmarks](./HYPERMIND_BENCHMARK_REPORT.md) |
|
|
2622
|
-
| RDFox | ~5 µs | 200-1000K/sec | 36-89 bytes | [Oxford Semantic 2024](https://www.oxfordsemantic.tech/rdfox) |
|
|
2623
|
-
| Tentris | ~10-50 µs | 67ms/update | 32-64 bytes | [ISWC 2020/2025](https://papers.dice-research.org/2020/ISWC_Tentris/iswc2020_tentris_public.pdf) |
|
|
2624
|
-
| Virtuoso | ~5 µs | 12-36K/sec | 35-75 bytes | [OpenLink LUBM](https://vos.openlinksw.com/owiki/wiki/VOS/VOSArticleLUBMBenchmark) |
|
|
2625
|
-
| Blazegraph | ~100 µs | ~50K/sec | 100+ bytes | [Blazegraph Wiki](https://github.com/blazegraph/database/wiki) |
|
|
2626
|
-
| AllegroGraph | ~50 µs | ~20K/sec | 100+ bytes | [Franz SP2 Benchmark](https://allegrograph.com/benchmarks-sp2/) |
|
|
2627
|
-
|
|
2628
|
-
#### Query Algorithm Comparison
|
|
2629
|
-
|
|
2630
|
-
| System | Join Algorithm | Cyclic Query | Worst-Case | Notes |
|
|
2631
|
-
|--------|---------------|--------------|------------|-------|
|
|
2632
|
-
| **rust-kgdb** | **WCOJ** | **O(n^(w/2))** | **Optimal** | Worst-case optimal joins |
|
|
2633
|
-
| Tentris | WCOJ (Einstein) | O(n^(w/2)) | Optimal | Tensor-based hypertrie |
|
|
2634
|
-
| RDFox | Hash Join | O(n²) | Not optimal | Fast for star queries |
|
|
2635
|
-
| Virtuoso | Hash/Merge | O(n²) | Not optimal | Good for simple patterns |
|
|
2636
|
-
| Blazegraph | Hash Join | O(n²) | Not optimal | Optimized for Wikidata |
|
|
2637
|
-
|
|
2638
|
-
**WCOJ Advantage**: Cyclic queries (fraud rings, circular dependencies) run optimally. Traditional hash joins degrade to O(n²).
|
|
2639
|
-
|
|
2640
|
-
#### Queries per Second (Published Benchmarks)
|
|
2641
|
-
|
|
2642
|
-
| System | SWDF (372K) | DBpedia (681M) | WatDiv (1B) | Source |
|
|
2643
|
-
|--------|-------------|----------------|-------------|--------|
|
|
2644
|
-
| Tentris | 4088 QpS | 4825 QpS | ~2000 QpS | [ISWC 2022](https://link.springer.com/chapter/10.1007/978-3-031-19433-7_4) |
|
|
2645
|
-
| Virtuoso | ~1000 QpS | ~500 QpS | ~200 QpS | [Tentris comparison](https://papers.dice-research.org/2020/ISWC_Tentris/iswc2020_tentris_public.pdf) |
|
|
2646
|
-
| Blazegraph | ~800 QpS | ~300 QpS | ~150 QpS | [Tentris comparison](https://papers.dice-research.org/2020/ISWC_Tentris/iswc2020_tentris_public.pdf) |
|
|
2647
|
-
| RDFox | N/A | 62 QpS (Wikidata) | N/A | [Oxford 2024](https://www.oxfordsemantic.tech/blog/enhancing-wikidata-performance-with-rdfox-how-to-dissect-the-worlds-leading-rdf-database-faster) |
|
|
2648
|
-
|
|
2649
|
-
**Note**: QpS varies significantly by query complexity and dataset. Tentris excels on analytical workloads with WCOJ.
|
|
2650
|
-
|
|
2651
|
-
#### Unique rust-kgdb Advantages
|
|
2652
|
-
|
|
2653
|
-
| Feature | rust-kgdb | Tentris | RDFox | Virtuoso | Blazegraph |
|
|
2654
|
-
|---------|-----------|---------|-------|----------|------------|
|
|
2655
|
-
| **Mobile (iOS/Android)** | ✅ UniFFI | ❌ | ❌ | ❌ | ❌ |
|
|
2656
|
-
| **AI Agent Framework** | ✅ HyperMind | ❌ | ❌ | ❌ | ❌ |
|
|
2657
|
-
| **Proof DAG (Curry-Howard)** | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
2658
|
-
| **WASM Sandbox** | ✅ OCAP | ❌ | ❌ | ❌ | ❌ |
|
|
2659
|
-
| **Zero-Copy (no GC)** | ✅ Rust | ❌ C++ | ❌ C++ | ❌ C | ❌ Java |
|
|
2660
|
-
| **WCOJ Algorithm** | ✅ | ✅ | ❌ | ❌ | ❌ |
|
|
2661
|
-
| **Memory Hypergraph** | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
2662
|
-
| **Schema-Aware LLM** | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
2663
|
-
|
|
2664
|
-
#### Honest Assessment
|
|
127
|
+
## Features
|
|
2665
128
|
|
|
2666
|
-
|
|
2667
|
-
- **
|
|
2668
|
-
- **
|
|
2669
|
-
- **
|
|
2670
|
-
- **
|
|
2671
|
-
|
|
2672
|
-
**Sources**:
|
|
2673
|
-
- [Tentris ISWC 2020 Paper](https://papers.dice-research.org/2020/ISWC_Tentris/iswc2020_tentris_public.pdf)
|
|
2674
|
-
- [Tentris WCOJ Update 2025](https://papers.dice-research.org/2025/ISWC_Tentris-WCOJ-Update/public.pdf)
|
|
2675
|
-
- [RDFox Oxford Semantic](https://www.oxfordsemantic.tech/rdfox)
|
|
2676
|
-
- [Virtuoso LUBM Benchmark](https://vos.openlinksw.com/owiki/wiki/VOS/VOSArticleLUBMBenchmark)
|
|
2677
|
-
- [AllegroGraph SP2](https://allegrograph.com/benchmarks-sp2/)
|
|
2678
|
-
|
|
2679
|
-
### HyperMind Agent Accuracy
|
|
2680
|
-
|
|
2681
|
-
Tested on LUBM dataset with 11 hard query scenarios:
|
|
2682
|
-
|
|
2683
|
-
| Approach | Valid SPARQL Generated | Why |
|
|
2684
|
-
|----------|------------------------|-----|
|
|
2685
|
-
| **Vanilla LLM** | 0% | Markdown fences, hallucinated predicates |
|
|
2686
|
-
| **HyperMind + Schema** | 86.4% avg | Schema injection, type contracts |
|
|
2687
|
-
|
|
2688
|
-
**Models tested**: Claude Sonnet 4 (90.9%), GPT-4o (81.8%)
|
|
2689
|
-
|
|
2690
|
-
**Methodology**: [HYPERMIND_BENCHMARK_REPORT.md](./HYPERMIND_BENCHMARK_REPORT.md)
|
|
2691
|
-
|
|
2692
|
-
### Run Benchmarks Yourself
|
|
2693
|
-
|
|
2694
|
-
```bash
|
|
2695
|
-
# Database benchmarks (requires Rust)
|
|
2696
|
-
cargo bench --package storage --bench triple_store_benchmark
|
|
2697
|
-
|
|
2698
|
-
# HyperMind agent benchmarks
|
|
2699
|
-
node hypermind-benchmark.js
|
|
2700
|
-
```
|
|
2701
|
-
|
|
2702
|
-
---
|
|
2703
|
-
|
|
2704
|
-
## W3C Standards Compliance
|
|
129
|
+
### Core Database
|
|
130
|
+
- **SPARQL 1.1** - Full query and update support (64 builtin functions)
|
|
131
|
+
- **RDF 1.2** - Complete W3C standard implementation
|
|
132
|
+
- **RDF-Star** - Statements about statements
|
|
133
|
+
- **Hypergraph** - N-ary relationships beyond triples
|
|
2705
134
|
|
|
2706
|
-
|
|
2707
|
-
|
|
2708
|
-
|
|
2709
|
-
|
|
2710
|
-
|
|
2711
|
-
|
|
2712
|
-
|
|
2713
|
-
|
|
2714
|
-
|
|
2715
|
-
|
|
2716
|
-
|
|
2717
|
-
|
|
2718
|
-
|
|
2719
|
-
|
|
2720
|
-
|
|
2721
|
-
|
|
2722
|
-
|
|
2723
|
-
| **SPARQL-Star** | ✅ | ❌ | ✅ | ⚠️ Partial | ⚠️ RDR |
|
|
2724
|
-
| **Native Hypergraph** | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
2725
|
-
| **64 Builtins** | ✅ | ~30 | ~40 | ~50 | ~45 |
|
|
2726
|
-
|
|
2727
|
-
**64 SPARQL Builtin Functions** implemented:
|
|
2728
|
-
- String: `STR`, `CONCAT`, `SUBSTR`, `STRLEN`, `REGEX`, `REPLACE`, etc.
|
|
2729
|
-
- Numeric: `ABS`, `ROUND`, `CEIL`, `FLOOR`, `RAND`
|
|
2730
|
-
- Date/Time: `NOW`, `YEAR`, `MONTH`, `DAY`, `HOURS`, `MINUTES`, `SECONDS`
|
|
2731
|
-
- Hash: `MD5`, `SHA1`, `SHA256`, `SHA384`, `SHA512`
|
|
2732
|
-
- Aggregates: `COUNT`, `SUM`, `AVG`, `MIN`, `MAX`, `GROUP_CONCAT`
|
|
135
|
+
### Graph Analytics
|
|
136
|
+
- **PageRank** - Iterative ranking algorithm
|
|
137
|
+
- **Connected Components** - Community detection
|
|
138
|
+
- **Shortest Paths** - Path finding
|
|
139
|
+
- **Triangle Count** - Graph density
|
|
140
|
+
- **Motif Finding** - Pattern matching
|
|
141
|
+
|
|
142
|
+
### AI Agent Framework
|
|
143
|
+
- **Schema-Aware** - Auto-extracts schema from your data
|
|
144
|
+
- **Typed Tools** - Input/output validation prevents errors
|
|
145
|
+
- **Audit Trail** - Every answer is traceable
|
|
146
|
+
- **Memory** - Working, episodic, and long-term memory
|
|
147
|
+
|
|
148
|
+
### Performance
|
|
149
|
+
- **2.78 µs** lookup speed (vs. ~5 µs published for RDFox, about 1.8x; not a same-hardware comparison)
|
|
150
|
+
- **146K triples/sec** bulk insert
|
|
151
|
+
- **24 bytes/triple** memory efficiency
|
|
2733
152
|
|
|
2734
153
|
---
|
|
2735
154
|
|
|
2736
|
-
##
|
|
2737
|
-
|
|
2738
|
-
### GraphDB
|
|
2739
|
-
|
|
2740
|
-
```typescript
|
|
2741
|
-
class GraphDB {
|
|
2742
|
-
constructor(appGraphUri: string)
|
|
2743
|
-
loadTtl(ttlContent: string, graphName: string | null): void
|
|
2744
|
-
querySelect(sparql: string): QueryResult[]
|
|
2745
|
-
query(sparql: string): TripleResult[]
|
|
2746
|
-
countTriples(): number
|
|
2747
|
-
clear(): void
|
|
2748
|
-
getGraphUri(): string
|
|
2749
|
-
}
|
|
2750
|
-
```
|
|
2751
|
-
|
|
2752
|
-
### GraphFrame
|
|
2753
|
-
|
|
2754
|
-
```typescript
|
|
2755
|
-
class GraphFrame {
|
|
2756
|
-
constructor(verticesJson: string, edgesJson: string)
|
|
2757
|
-
pageRank(resetProb: number, maxIter: number): string
|
|
2758
|
-
connectedComponents(): string
|
|
2759
|
-
shortestPaths(landmarks: string[]): string
|
|
2760
|
-
triangleCount(): number
|
|
2761
|
-
find(pattern: string): string // Motif finding
|
|
2762
|
-
}
|
|
2763
|
-
|
|
2764
|
-
// Factory functions
|
|
2765
|
-
friendsGraph(), chainGraph(n), starGraph(n), completeGraph(n), cycleGraph(n)
|
|
2766
|
-
```
|
|
2767
|
-
|
|
2768
|
-
### EmbeddingService
|
|
155
|
+
## How It Works
|
|
2769
156
|
|
|
2770
|
-
|
|
2771
|
-
class EmbeddingService {
|
|
2772
|
-
constructor()
|
|
2773
|
-
storeVector(entityId: string, vector: number[]): void
|
|
2774
|
-
getVector(entityId: string): number[] | null
|
|
2775
|
-
findSimilar(entityId: string, k: number, threshold: number): string
|
|
2776
|
-
rebuildIndex(): void
|
|
2777
|
-
onTripleInsert(subject: string, predicate: string, object: string, graph: string | null): void
|
|
2778
|
-
}
|
|
2779
|
-
```
|
|
157
|
+
HyperMind combines two approaches:
|
|
2780
158
|
|
|
2781
|
-
|
|
159
|
+
1. **Neural** (LLM): Understands your question in natural language
|
|
160
|
+
2. **Symbolic** (Database): Executes precise queries against your data
|
|
2782
161
|
|
|
2783
|
-
```typescript
|
|
2784
|
-
class DatalogProgram {
|
|
2785
|
-
constructor()
|
|
2786
|
-
addFact(factJson: string): void
|
|
2787
|
-
addRule(ruleJson: string): void
|
|
2788
|
-
}
|
|
2789
|
-
function evaluateDatalog(program: DatalogProgram): string
|
|
2790
|
-
function queryDatalog(program: DatalogProgram, predicate: string): string
|
|
2791
162
|
```
|
|
2792
|
-
|
|
2793
|
-
|
|
2794
|
-
|
|
2795
|
-
|
|
2796
|
-
|
|
2797
|
-
|
|
2798
|
-
kg: GraphDB | SchemaAwareGraphDB, // REQUIRED
|
|
2799
|
-
embeddings?: EmbeddingService, // Optional: for similarity search
|
|
2800
|
-
model?: string, // Optional: 'claude-sonnet-4', 'gpt-4o'
|
|
2801
|
-
apiKey?: string, // Required if model specified
|
|
2802
|
-
name?: string, // Default: 'hypermind-agent'
|
|
2803
|
-
memory?: MemoryManager, // Optional: session persistence
|
|
2804
|
-
scope?: AgentScope, // Optional: access control
|
|
2805
|
-
sandbox?: { // Default: secure (ReadKG, ExecuteTool)
|
|
2806
|
-
capabilities: string[], // 'ReadKG', 'WriteKG', 'ExecuteTool', 'SpawnAgent', 'HttpAccess'
|
|
2807
|
-
fuelLimit: number // CPU budget (default: 1_000_000)
|
|
2808
|
-
}
|
|
2809
|
-
})
|
|
2810
|
-
|
|
2811
|
-
call(prompt: string): Promise<{
|
|
2812
|
-
answer: string,
|
|
2813
|
-
explanation: { tools_used: string[], sparql_queries: string[] },
|
|
2814
|
-
proof: { hash: string, type: string, derivation: object[] }
|
|
2815
|
-
}>
|
|
2816
|
-
|
|
2817
|
-
addRule(name: string, rule: object): void
|
|
2818
|
-
getAuditLog(): object[]
|
|
2819
|
-
}
|
|
163
|
+
Your Question → LLM Plans Query → Database Executes → Verified Answer
|
|
164
|
+
↓ ↓ ↓ ↓
|
|
165
|
+
"Find fraud" SELECT ?x WHERE... 47 results "Provider P001
|
|
166
|
+
is suspicious"
|
|
167
|
+
+ reasoning trace
|
|
168
|
+
+ audit hash
|
|
2820
169
|
```
|
|
2821
170
|
|
|
2822
|
-
|
|
2823
|
-
|
|
2824
|
-
```typescript
|
|
2825
|
-
class SchemaAwareGraphDB {
|
|
2826
|
-
constructor(baseUriOrDb: string | GraphDB, options?: {
|
|
2827
|
-
autoExtract?: boolean, // Default: true - extract schema on load
|
|
2828
|
-
ontology?: string // Optional: TTL ontology to use
|
|
2829
|
-
})
|
|
2830
|
-
|
|
2831
|
-
// All GraphDB methods available (loadTtl, querySelect, etc.)
|
|
2832
|
-
loadTtl(data: string, graphUri: string | null): void
|
|
2833
|
-
querySelect(sparql: string): QueryResult[]
|
|
171
|
+
The LLM plans WHAT to look for. The database finds EXACTLY that. Every answer traces back to actual data. No hallucination possible.
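The split between planning and execution can be sketched in plain JavaScript. Everything here is a stand-in: `planQuery` is a hard-coded stub in place of an LLM call, the triples and the `riskScore` threshold are made up, and hashing the trace with SHA-256 is one plausible way to get a reproducibility hash, not necessarily what HyperMind does.

```javascript
const crypto = require('crypto')

// Toy in-memory "graph": an array of [subject, predicate, object] triples.
const triples = [
  ['P001', 'riskScore', 0.91],
  ['P002', 'riskScore', 0.12]
]

// Stand-in for the LLM planning step: it returns a structured plan, not an answer.
function planQuery(question) {
  return { question, predicate: 'riskScore', minValue: 0.8 }
}

// Symbolic step: a deterministic filter over the data.
function executePlan(plan) {
  return triples.filter(([, p, o]) => p === plan.predicate && o >= plan.minValue)
}

// Combine both steps and hash the trace so identical runs can be compared.
function answer(question) {
  const plan = planQuery(question)
  const rows = executePlan(plan)
  const reasoningTrace = { plan, rows }
  const hash = crypto
    .createHash('sha256')
    .update(JSON.stringify(reasoningTrace))
    .digest('hex')
  return {
    answer: rows.map(([s]) => `${s} is suspicious`).join('; '),
    reasoningTrace,
    hash
  }
}

const first = answer('Find fraud')
const second = answer('Find fraud')
console.log(first.answer)
console.log(first.hash === second.hash)
```

Because the answer is built only from `rows`, any statement in it can be checked against the trace. The hash is stable here only because the stub planner is deterministic.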
|
|
2834
172
|
|
|
2835
|
-
|
|
2836
|
-
waitForSchema(timeoutMs?: number): Promise<SchemaContext>
|
|
2837
|
-
getSchema(): SchemaContext | null
|
|
2838
|
-
refreshSchema(): Promise<void>
|
|
2839
|
-
}
|
|
173
|
+
---
|
|
2840
174
|
|
|
2841
|
-
|
|
2842
|
-
function createSchemaAwareGraphDB(baseUri: string, options?: object): SchemaAwareGraphDB
|
|
2843
|
-
function wrapWithSchemaAwareness(db: GraphDB, options?: object): SchemaAwareGraphDB
|
|
2844
|
-
```
|
|
175
|
+
## API Reference
|
|
2845
176
|
|
|
2846
|
-
###
|
|
177
|
+
### GraphDB
|
|
2847
178
|
|
|
2848
179
|
```typescript
|
|
2849
|
-
class
|
|
2850
|
-
|
|
2851
|
-
|
|
2852
|
-
|
|
2853
|
-
|
|
2854
|
-
|
|
2855
|
-
|
|
2856
|
-
static merge(...contexts: SchemaContext[]): SchemaContext
|
|
180
|
+
class GraphDB {
|
|
181
|
+
constructor(appGraphUri: string)
|
|
182
|
+
loadTtl(ttlContent: string, graphName: string | null): void
|
|
183
|
+
querySelect(sparql: string): QueryResult[]
|
|
184
|
+
query(sparql: string): TripleResult[]
|
|
185
|
+
countTriples(): number
|
|
186
|
+
clear(): void
|
|
2857
187
|
}
|
|
2858
188
|
```
|
|
2859
189
|
|
|
2860
|
-
###
|
|
190
|
+
### HyperMindAgent
|
|
2861
191
|
|
|
2862
192
|
```typescript
|
|
2863
|
-
class
|
|
2864
|
-
constructor(
|
|
2865
|
-
kg: GraphDB,
|
|
2866
|
-
model?: string, // '
|
|
2867
|
-
apiKey?: string
|
|
193
|
+
class HyperMindAgent {
|
|
194
|
+
constructor(options: {
|
|
195
|
+
kg: GraphDB, // Your knowledge graph
|
|
196
|
+
model?: string, // 'gpt-4o' | 'claude-3-opus' | etc.
|
|
197
|
+
apiKey?: string, // LLM API key
|
|
198
|
+
memory?: MemoryManager,
|
|
199
|
+
scope?: AgentScope,
|
|
200
|
+
embeddings?: EmbeddingService
|
|
2868
201
|
})
|
|
2869
202
|
|
|
2870
|
-
|
|
2871
|
-
|
|
2872
|
-
|
|
203
|
+
call(prompt: string): Promise<AgentResponse>
|
|
204
|
+
}
|
|
205
|
+
|
|
206
|
+
interface AgentResponse {
|
|
207
|
+
answer: string
|
|
208
|
+
reasoningTrace: ReasoningStep[] // Audit trail
|
|
209
|
+
hash: string // Reproducibility hash
|
|
2873
210
|
}
|
|
2874
211
|
```
|
|
2875
212
|
|
|
2876
|
-
###
|
|
213
|
+
### GraphFrame
|
|
2877
214
|
|
|
2878
215
|
```typescript
|
|
2879
|
-
class
|
|
2880
|
-
constructor(
|
|
2881
|
-
|
|
2882
|
-
|
|
2883
|
-
|
|
2884
|
-
|
|
2885
|
-
|
|
2886
|
-
storeEpisode(episode: object): void
|
|
2887
|
-
recall(query: string, limit?: number): object[]
|
|
2888
|
-
getWorkingMemory(): object[]
|
|
2889
|
-
clearWorkingMemory(): void
|
|
216
|
+
class GraphFrame {
|
|
217
|
+
constructor(verticesJson: string, edgesJson: string)
|
|
218
|
+
pageRank(resetProb: number, maxIter: number): string
|
|
219
|
+
connectedComponents(): string
|
|
220
|
+
shortestPaths(landmarks: string[]): string
|
|
221
|
+
triangleCount(): number
|
|
222
|
+
find(pattern: string): string // Motif pattern matching
|
|
2890
223
|
}
|
|
2891
224
|
```
|
|
2892
225
|
|
|
2893
|
-
###
|
|
226
|
+
### EmbeddingService
|
|
2894
227
|
|
|
2895
228
|
```typescript
|
|
2896
|
-
class
|
|
2897
|
-
|
|
2898
|
-
|
|
2899
|
-
|
|
2900
|
-
maxResultSize?: number // Default: 10000
|
|
2901
|
-
})
|
|
2902
|
-
|
|
2903
|
-
checkAccess(graph: string, predicate: string): boolean
|
|
2904
|
-
enforceLimit(results: any[]): any[]
|
|
229
|
+
class EmbeddingService {
|
|
230
|
+
storeVector(entityId: string, vector: number[]): void
|
|
231
|
+
findSimilar(entityId: string, k: number, threshold: number): string
|
|
232
|
+
rebuildIndex(): void
|
|
2905
233
|
}
|
|
2906
234
|
```
|
|
2907
235
|
|
|
2908
|
-
###
|
|
236
|
+
### DatalogProgram
|
|
2909
237
|
|
|
2910
238
|
```typescript
|
|
2911
|
-
class
|
|
2912
|
-
|
|
2913
|
-
|
|
2914
|
-
fuelLimit: number // CPU budget
|
|
2915
|
-
})
|
|
2916
|
-
|
|
2917
|
-
execute(tool: string, args: object): Promise<object>
|
|
2918
|
-
getRemainingFuel(): number
|
|
2919
|
-
getExecutionTrace(): object[]
|
|
239
|
+
class DatalogProgram {
|
|
240
|
+
addFact(factJson: string): void
|
|
241
|
+
addRule(ruleJson: string): void
|
|
2920
242
|
}
|
|
243
|
+
|
|
244
|
+
function evaluateDatalog(program: DatalogProgram): string
|
|
245
|
+
function queryDatalog(program: DatalogProgram, query: string): string
|
|
2921
246
|
```
|
|
2922
247
|
|
|
2923
248
|
---
|
|
2924
249
|
|
|
2925
|
-
##
|
|
2926
|
-
|
|
2927
|
-
HyperMind implements three complementary security layers for AI agent execution:
|
|
2928
|
-
|
|
2929
|
-
### 1. AgentScope: Data Access Control
|
|
2930
|
-
|
|
2931
|
-
**Concept**: Scope defines WHAT data an agent can access - a whitelist-based filter on graphs and predicates.
|
|
2932
|
-
|
|
2933
|
-
```
|
|
2934
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
2935
|
-
│ AGENT SCOPE MODEL │
|
|
2936
|
-
│ │
|
|
2937
|
-
│ ┌─────────────────────────────────────────────────────────────────────────┐│
|
|
2938
|
-
│ │ KNOWLEDGE GRAPH ││
|
|
2939
|
-
│ │ ┌──────────────────────────────────────────────────────────────────┐ ││
|
|
2940
|
-
│ │ │ Graph: http://insurance.org/claims ← ALLOWED │ ││
|
|
2941
|
-
│ │ │ :Claim :amount, :provider, :status │ ││
|
|
2942
|
-
│ │ └──────────────────────────────────────────────────────────────────┘ ││
|
|
2943
|
-
│ │ ┌──────────────────────────────────────────────────────────────────┐ ││
|
|
2944
|
-
│ │ │ Graph: http://insurance.org/internal ← BLOCKED │ ││
|
|
2945
|
-
│ │ │ :Employee :salary, :ssn, :performance │ ││
|
|
2946
|
-
│ │ └──────────────────────────────────────────────────────────────────┘ ││
|
|
2947
|
-
│ │ ┌──────────────────────────────────────────────────────────────────┐ ││
|
|
2948
|
-
│ │ │ Graph: http://insurance.org/customers ← ALLOWED │ ││
|
|
2949
|
-
│ │ │ :Customer :riskScore (allowed), :creditCard (blocked) │ ││
|
|
2950
|
-
│ │ └──────────────────────────────────────────────────────────────────┘ ││
|
|
2951
|
-
│ └─────────────────────────────────────────────────────────────────────────┘│
|
|
2952
|
-
│ │
|
|
2953
|
-
│ AgentScope: │
|
|
2954
|
-
│ allowedGraphs: ['http://insurance.org/claims', 'http://insurance.org/customers']│
|
|
2955
|
-
│ allowedPredicates: [':amount', ':provider', ':status', ':riskScore'] │
|
|
2956
|
-
│ maxResultSize: 1000 │
|
|
2957
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
2958
|
-
```
|
|
250
|
+
## More Examples
|
|
2959
251
|
|
|
2960
|
-
|
|
2961
|
-
- **Principle of Least Privilege**: Agent only sees data relevant to its task
|
|
2962
|
-
- **Data Isolation**: PII, financials, internal data can be excluded
|
|
2963
|
-
- **Compliance**: GDPR, HIPAA, SOX - restrict access by role
|
|
252
|
+
### Knowledge Graph
|
|
2964
253
|
|
|
2965
254
|
```javascript
|
|
2966
|
-
|
|
2967
|
-
const claimsScope = new AgentScope({
|
|
2968
|
-
allowedGraphs: ['http://insurance.org/claims'],
|
|
2969
|
-
allowedPredicates: [':amount', ':provider', ':status', ':dateSubmitted'],
|
|
2970
|
-
maxResultSize: 5000 // Prevent data exfiltration
|
|
2971
|
-
})
|
|
2972
|
-
|
|
2973
|
-
// Executive dashboard - broader access, still limited
|
|
2974
|
-
const execScope = new AgentScope({
|
|
2975
|
-
allowedGraphs: ['http://insurance.org/claims', 'http://insurance.org/analytics'],
|
|
2976
|
-
allowedPredicates: null, // All predicates
|
|
2977
|
-
maxResultSize: 50000
|
|
2978
|
-
})
|
|
2979
|
-
```
|
|
255
|
+
const { GraphDB } = require('rust-kgdb')
|
|
2980
256
|
|
|
2981
|
-
|
|
2982
|
-
|
|
2983
|
-
|
|
257
|
+
const db = new GraphDB('http://example.org/')
|
|
258
|
+
db.loadTtl(`
|
|
259
|
+
@prefix : <http://example.org/> .
|
|
260
|
+
:alice :knows :bob .
|
|
261
|
+
:bob :knows :charlie .
|
|
262
|
+
:charlie :knows :alice .
|
|
263
|
+
`, null)
|
|
2984
264
|
|
|
2985
|
-
|
|
265
|
+
console.log(`Loaded ${db.countTriples()} triples`) // 3
|
|
2986
266
|
|
|
267
|
+
const results = db.querySelect(`
|
|
268
|
+
PREFIX : <http://example.org/>
|
|
269
|
+
SELECT ?person WHERE { ?person :knows :bob }
|
|
270
|
+
`)
|
|
271
|
+
console.log(results) // [{ bindings: { person: 'http://example.org/alice' } }]
|
|
2987
272
|
```
|
|
2988
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
2989
|
-
│ FUEL: THE PREPAID COMPUTATION MODEL │
|
|
2990
|
-
│ │
|
|
2991
|
-
│ ANALOGY: Prepaid Phone Card │
|
|
2992
|
-
│ │
|
|
2993
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
2994
|
-
│ │ You buy a phone card with 100 minutes │ │
|
|
2995
|
-
│ │ Local call (SPARQL query): -2 minutes │ │
|
|
2996
|
-
│ │ Long distance (Datalog): -10 minutes │ │
|
|
2997
|
-
│ │ International (Graph algo): -30 minutes │ │
|
|
2998
|
-
│ │ │ │
|
|
2999
|
-
│ │ When minutes = 0 → Card stops working │ │
|
|
3000
|
-
│ │ No overdraft, no credit, no exceptions │ │
|
|
3001
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
3002
|
-
│ │
|
|
3003
|
-
│ SAME FOR AGENTS: │
|
|
3004
|
-
│ │
|
|
3005
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
3006
|
-
│ │ Agent gets 1,000,000 fuel units │ │
|
|
3007
|
-
│ │ Simple query: -1,000 fuel │ │
|
|
3008
|
-
│ │ Complex join: -15,000 fuel │ │
|
|
3009
|
-
│ │ PageRank: -100,000 fuel │ │
|
|
3010
|
-
│ │ │ │
|
|
3011
|
-
│ │ When fuel = 0 → Agent halts immediately │ │
|
|
3012
|
-
│ │ Operation in progress? Aborted. │ │
|
|
3013
|
-
│ │ No "just one more query", no exceptions │ │
|
|
3014
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
3015
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
3016
|
-
```
|
|
3017
|
-
|
|
3018
|
-
**Why Fuel Matters**:
|
|
3019
273
|
|
|
3020
|
-
|
|
3021
|
-
|---------|--------------|-----------|
|
|
3022
|
-
| **Infinite Loop** | Agent runs forever, system hangs | Agent stops when fuel exhausted |
|
|
3023
|
-
| **Malicious Query** | `SELECT * FROM trillion_rows` crashes system | Query aborted at fuel limit |
|
|
3024
|
-
| **Cost Control** | Unknown compute costs | Predictable: 1M fuel = ~$0.01 |
|
|
3025
|
-
| **Multi-tenant** | One agent starves others | Each agent has guaranteed budget |
|
|
3026
|
-
| **Audit** | "Why did this cost so much?" | Fuel log shows exact operations |
|
|
274
|
+
### Graph Analytics
|
|
3027
275
|
|
|
3028
|
-
|
|
3029
|
-
|
|
3030
|
-
**Why is it called "CPU Budget"?**
|
|
276
|
+
```javascript
|
|
277
|
+
const { GraphFrame } = require('rust-kgdb')
|
|
3031
278
|
|
|
3032
|
-
|
|
279
|
+
const graph = new GraphFrame(
|
|
280
|
+
JSON.stringify([{id:'alice'}, {id:'bob'}, {id:'charlie'}]),
|
|
281
|
+
JSON.stringify([
|
|
282
|
+
{src:'alice', dst:'bob'},
|
|
283
|
+
{src:'bob', dst:'charlie'},
|
|
284
|
+
{src:'charlie', dst:'alice'}
|
|
285
|
+
])
|
|
286
|
+
)
|
|
3033
287
|
|
|
3034
|
-
|
|
3035
|
-
|
|
3036
|
-
|
|
3037
|
-
│ │
|
|
3038
|
-
│ 1 fuel unit ≈ 1 microsecond of CPU time (approximate) │
|
|
3039
|
-
│ │
|
|
3040
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
3041
|
-
│ │ FUEL LIMIT APPROXIMATE CPU TIME TYPICAL USE CASE │ │
|
|
3042
|
-
│ │ ───────────────────────────────────────────────────────────────── │ │
|
|
3043
|
-
│ │ 100,000 ~100ms Simple query │ │
|
|
3044
|
-
│ │ 1,000,000 ~1 second Standard agent task │ │
|
|
3045
|
-
│ │ 10,000,000 ~10 seconds Complex analysis │ │
|
|
3046
|
-
│ │ 100,000,000 ~100 seconds Batch processing │ │
|
|
3047
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
3048
|
-
│ │
|
|
3049
|
-
│ WHY "FUEL" INSTEAD OF "TIME"? │
|
|
3050
|
-
│ │
|
|
3051
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
3052
|
-
│ │ TIME (wall clock): FUEL (CPU budget): │ │
|
|
3053
|
-
│ │ • Varies by machine speed • Consistent across machines │ │
|
|
3054
|
-
│ │ • Includes I/O wait • Only counts computation │ │
|
|
3055
|
-
│ │ • Hard to predict • Deterministic per operation │ │
|
|
3056
|
-
│ │ • Can't pause/resume • Checkpoint and continue │ │
|
|
3057
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
3058
|
-
│ │
|
|
3059
|
-
│ FUEL COST = OPERATION COMPLEXITY │
|
|
3060
|
-
│ │
|
|
3061
|
-
│ Simple SELECT: ~1,000 fuel (scans 100 triples) │
|
|
3062
|
-
│ Complex JOIN: ~15,000 fuel (joins 3 tables, 1000 rows each) │
|
|
3063
|
-
│ PageRank(100): ~100,000 fuel (20 iterations on 100-node graph) │
|
|
3064
|
-
│ │
|
|
3065
|
-
│ The cost is based on ALGORITHM COMPLEXITY, not wall-clock time. │
|
|
3066
|
-
│ A 1000-fuel query takes 1000 fuel whether it runs on a laptop or server. │
|
|
3067
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
288
|
+
console.log('Triangles:', graph.triangleCount()) // 1
|
|
289
|
+
console.log('PageRank:', JSON.parse(graph.pageRank(0.15, 20)))
|
|
290
|
+
console.log('Components:', JSON.parse(graph.connectedComponents()))
|
|
3068
291
|
```
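To see why the three-node cycle above yields one triangle, here is a dependency-free sketch of triangle counting. It assumes edges are treated as undirected, which may or may not match how `GraphFrame.triangleCount()` is implemented. `countTriangles` is a hypothetical helper, not part of rust-kgdb.

```javascript
// Count triangles in an undirected view of a { src, dst } edge list.
function countTriangles(edges) {
  const neighbors = new Map()
  const link = (a, b) => {
    if (!neighbors.has(a)) neighbors.set(a, new Set())
    neighbors.get(a).add(b)
  }
  for (const { src, dst } of edges) {
    link(src, dst)
    link(dst, src)
  }

  // Visit each triple of nodes once by requiring a < b < c.
  let count = 0
  for (const [a, adjacent] of neighbors) {
    for (const b of adjacent) {
      if (b <= a) continue
      for (const c of neighbors.get(b)) {
        if (c > b && adjacent.has(c)) count++
      }
    }
  }
  return count
}

const edges = [
  { src: 'alice', dst: 'bob' },
  { src: 'bob', dst: 'charlie' },
  { src: 'charlie', dst: 'alice' }
]
console.log(countTriangles(edges))
```

The ordering constraint avoids counting the same triangle six times, once per permutation of its vertices.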
|
|
3069
292
|
|
|
3070
|
-
|
|
293
|
+
### Rule-Based Reasoning
|
|
3071
294
|
|
|
3072
295
|
```javascript
|
|
3073
|
-
const
|
|
3074
|
-
kg: db,
|
|
3075
|
-
sandbox: {
|
|
3076
|
-
capabilities: ['ReadKG', 'ExecuteTool'],
|
|
3077
|
-
fuelLimit: 1_000_000 // 1 million fuel ≈ 1 second of CPU budget
|
|
3078
|
-
}
|
|
3079
|
-
})
|
|
3080
|
-
|
|
3081
|
-
// Agent executes:
|
|
3082
|
-
// 1. SPARQL query: costs 5,000 fuel
|
|
3083
|
-
// 2. Datalog evaluation: costs 25,000 fuel
|
|
3084
|
-
// 3. Embedding search: costs 2,000 fuel
|
|
3085
|
-
// Total: 32,000 fuel used, 968,000 remaining
|
|
296
|
+
const { DatalogProgram, evaluateDatalog } = require('rust-kgdb')
|
|
3086
297
|
|
|
3087
|
-
|
|
3088
|
-
|
|
3089
|
-
|
|
3090
|
-
```
|
|
298
|
+
const program = new DatalogProgram()
|
|
299
|
+
program.addFact(JSON.stringify({predicate: 'parent', terms: ['alice', 'bob']}))
|
|
300
|
+
program.addFact(JSON.stringify({predicate: 'parent', terms: ['bob', 'charlie']}))
|
|
3091
301
|
|
|
3092
|
-
|
|
302
|
+
// grandparent(X, Z) :- parent(X, Y), parent(Y, Z)
|
|
303
|
+
program.addRule(JSON.stringify({
|
|
304
|
+
head: {predicate: 'grandparent', terms: ['?X', '?Z']},
|
|
305
|
+
body: [
|
|
306
|
+
{predicate: 'parent', terms: ['?X', '?Y']},
|
|
307
|
+
{predicate: 'parent', terms: ['?Y', '?Z']}
|
|
308
|
+
]
|
|
309
|
+
}))
|
|
3093
310
|
|
|
3094
|
-
|
|
3095
|
-
|
|
3096
|
-
│ FUEL METERING MODEL │
|
|
3097
|
-
│ │
|
|
3098
|
-
│ Initial Fuel: 1,000,000 │
|
|
3099
|
-
│ │
|
|
3100
|
-
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
|
3101
|
-
│ │ Operation 1: SPARQL Query (complex join) │ │
|
|
3102
|
-
│ │ Cost: -15,000 fuel │ │
|
|
3103
|
-
│ │ Remaining: 985,000 │ │
|
|
3104
|
-
│ └───────────────────────────────────────────────────────────────────────┘ │
|
|
3105
|
-
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
|
3106
|
-
│ │ Operation 2: Datalog evaluation (50 rules) │ │
|
|
3107
|
-
│ │ Cost: -45,000 fuel │ │
|
|
3108
|
-
│ │ Remaining: 940,000 │ │
|
|
3109
|
-
│ └───────────────────────────────────────────────────────────────────────┘ │
|
|
3110
|
-
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
|
3111
|
-
│ │ Operation 3: Embedding similarity search │ │
|
|
3112
|
-
│ │ Cost: -2,000 fuel │ │
|
|
3113
|
-
│ │ Remaining: 938,000 │ │
|
|
3114
|
-
│ └───────────────────────────────────────────────────────────────────────┘ │
|
|
3115
|
-
│ ... │
|
|
3116
|
-
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
|
3117
|
-
│ │ Operation N: Attempted complex analysis │ │
|
|
3118
|
-
│ │ Cost: -950,000 fuel │ │
|
|
3119
|
-
│ │ ERROR: FuelExhausted - execution halted │ │
|
|
3120
|
-
│ └───────────────────────────────────────────────────────────────────────┘ │
|
|
3121
|
-
│ │
|
|
3122
|
-
│ WHY FUEL? │
|
|
3123
|
-
│ • Prevents infinite loops │
|
|
3124
|
-
│ • Enables cost accounting per agent │
|
|
3125
|
-
│ • DoS protection (runaway queries) │
|
|
3126
|
-
│ • Multi-tenant resource fairness │
|
|
3127
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
311
|
+
console.log('Inferred:', JSON.parse(evaluateDatalog(program)))
|
|
312
|
+
// grandparent(alice, charlie)
|
|
3128
313
|
```
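To make the fixpoint idea behind the `grandparent` rule concrete, here is a minimal sketch of naive bottom-up Datalog evaluation in plain JavaScript. It is not how rust-kgdb implements `evaluateDatalog`; the helper names (`unify`, `matchBody`, `naiveEvaluate`) are hypothetical and only reuse the fact and rule shapes shown above.

```javascript
// Illustrative only: a tiny bottom-up evaluator, not the rust-kgdb engine.
// Terms starting with '?' are variables, mirroring the README's rule format.
const isVar = (t) => typeof t === 'string' && t.startsWith('?')

// Try to extend `bindings` so that `atom` matches `fact`; return null on mismatch.
function unify(atom, fact, bindings) {
  if (atom.predicate !== fact.predicate || atom.terms.length !== fact.terms.length) return null
  const next = { ...bindings }
  for (let i = 0; i < atom.terms.length; i++) {
    const t = atom.terms[i]
    const v = fact.terms[i]
    if (isVar(t)) {
      if (t in next && next[t] !== v) return null
      next[t] = v
    } else if (t !== v) {
      return null
    }
  }
  return next
}

// Find every variable binding that satisfies all body atoms against `facts`.
function matchBody(body, facts, bindings = {}) {
  if (body.length === 0) return [bindings]
  const [first, ...rest] = body
  return facts.flatMap((fact) => {
    const b = unify(first, fact, bindings)
    return b ? matchBody(rest, facts, b) : []
  })
}

// Apply every rule repeatedly until no new fact is derived (a fixpoint).
function naiveEvaluate(facts, rules) {
  const known = new Map(facts.map((f) => [JSON.stringify(f), f]))
  let changed = true
  while (changed) {
    changed = false
    for (const rule of rules) {
      for (const b of matchBody(rule.body, [...known.values()])) {
        const derived = {
          predicate: rule.head.predicate,
          terms: rule.head.terms.map((t) => (isVar(t) ? b[t] : t))
        }
        const key = JSON.stringify(derived)
        if (!known.has(key)) {
          known.set(key, derived)
          changed = true
        }
      }
    }
  }
  return [...known.values()]
}

const facts = [
  { predicate: 'parent', terms: ['alice', 'bob'] },
  { predicate: 'parent', terms: ['bob', 'charlie'] }
]
const rules = [{
  head: { predicate: 'grandparent', terms: ['?X', '?Z'] },
  body: [
    { predicate: 'parent', terms: ['?X', '?Y'] },
    { predicate: 'parent', terms: ['?Y', '?Z'] }
  ]
}]

const inferred = naiveEvaluate(facts, rules).filter((f) => f.predicate === 'grandparent')
console.log(inferred) // one derived fact: grandparent(alice, charlie)
```

The loop stops once a full pass over the rules adds nothing new, which is why the same facts and rules always yield the same derived set.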
|
|
3129
314
|
|
|
3130
|
-
|
|
315
|
+
### Semantic Similarity
|
|
3131
316
|
|
|
3132
|
-
|
|
3133
|
-
|
|
3134
|
-
| Simple SPARQL SELECT | 1,000 - 5,000 | BGP with 1-3 patterns |
|
|
3135
|
-
| Complex SPARQL (joins) | 10,000 - 50,000 | Multiple joins, filters |
|
|
3136
|
-
| Datalog evaluation | 5,000 - 100,000 | Depends on rule count |
|
|
3137
|
-
| Embedding search | 500 - 2,000 | HNSW lookup |
|
|
3138
|
-
| Graph algorithm | 10,000 - 500,000 | PageRank, components |
|
|
3139
|
-
| Memory retrieval | 100 - 500 | Episode lookup |
|
|
317
|
+
```javascript
|
|
318
|
+
const { EmbeddingService } = require('rust-kgdb')
|
|
3140
319
|
|
|
3141
|
-
|
|
320
|
+
const embeddings = new EmbeddingService()
|
|
3142
321
|
|
|
3143
|
-
|
|
322
|
+
// Store 384-dimension vectors
|
|
323
|
+
embeddings.storeVector('claim_001', new Array(384).fill(0.5))
|
|
324
|
+
embeddings.storeVector('claim_002', new Array(384).fill(0.6))
|
|
325
|
+
embeddings.rebuildIndex()
|
|
3144
326
|
|
|
327
|
+
// HNSW similarity search
|
|
328
|
+
const similar = JSON.parse(embeddings.findSimilar('claim_001', 5, 0.7))
|
|
329
|
+
console.log('Similar:', similar)
|
|
3145
330
|
```
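The third argument to `findSimilar` reads like a similarity threshold. As a hedged illustration, the sketch below computes cosine similarity in plain JavaScript; whether `EmbeddingService` uses cosine similarity internally is an assumption, and `cosineSimilarity` is a hypothetical helper, not part of the rust-kgdb API.

```javascript
// Illustrative only: cosine similarity between two equal-length vectors.
// Assumption: the README's threshold (0.7) is compared against a score like this.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('dimension mismatch')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// The two example vectors above point in the same direction,
// so their cosine similarity is 1 up to floating-point error.
const claim001 = new Array(384).fill(0.5)
const claim002 = new Array(384).fill(0.6)
const parallelScore = cosineSimilarity(claim001, claim002)

// Orthogonal vectors share no direction and score 0.
const orthogonalScore = cosineSimilarity([1, 0], [0, 1])

console.log(parallelScore.toFixed(3), orthogonalScore)
```

Under this measure the constant-valued example vectors are maximally similar, so real embeddings with varied components are needed to see scores that fall on either side of a threshold.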
|
|
3146
|
-
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
3147
|
-
│ OCAP vs TRADITIONAL ACCESS CONTROL │
|
|
3148
|
-
│ │
|
|
3149
|
-
│ TRADITIONAL (ACL/RBAC): OCAP (HyperMind): │
|
|
3150
|
-
│ ┌─────────────────────────┐ ┌─────────────────────────┐ │
|
|
3151
|
-
│ │ Agent requests │ │ Agent receives │ │
|
|
3152
|
-
│ │ "read claims" │ │ capability token │ │
|
|
3153
|
-
│ │ │ │ │ │ │ │
|
|
3154
|
-
│ │ ▼ │ │ ▼ │ │
|
|
3155
|
-
│ │ ┌──────────────┐ │ │ ┌──────────────┐ │ │
|
|
3156
|
-
│ │ │ Access │ │ │ │ Token = │ │ │
|
|
3157
|
-
│ │ │ Control List │ │ │ │ ReadKG cap │ │ │
|
|
3158
|
-
│ │ │ (centralized)│ │ │ │ (unforgeable)│ │ │
|
|
3159
|
-
│ │ └──────────────┘ │ │ └──────────────┘ │ │
|
|
3160
|
-
│ │ │ │ │ │ │ │
|
|
3161
|
-
│ │ Check role → grant │ │ Has token → use it │ │
|
|
3162
|
-
│ │ │ │ │ │
|
|
3163
|
-
│ │ Problem: Ambient │ │ Benefit: No ambient │ │
|
|
3164
|
-
│ │ authority - agent │ │ authority - only what │ │
|
|
3165
|
-
│ │ could escalate │ │ was explicitly granted │ │
|
|
3166
|
-
│ └─────────────────────────┘ └─────────────────────────┘ │
|
|
3167
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
3168
|
-
```
|
|
3169
|
-
|
|
3170
|
-
**Available Capabilities**:
|
|
3171
331
|
|
|
3172
|
-
|
|
3173
|
-
|------------|----------------|------------|
|
|
3174
|
-
| `ReadKG` | Query knowledge graph (SELECT, CONSTRUCT, ASK) | Low |
|
|
3175
|
-
| `WriteKG` | Modify knowledge graph (INSERT, DELETE) | Medium |
|
|
3176
|
-
| `ExecuteTool` | Run registered tools (Datalog, GraphFrame) | Medium |
|
|
3177
|
-
| `SpawnAgent` | Create child agents | High |
|
|
3178
|
-
| `HttpAccess` | Make external HTTP requests | High |
|
|
332
|
+
---
|
|
3179
333
|
|
|
3180
|
-
|
|
3181
|
-
- **Memory Isolation**: Agent cannot access host memory
|
|
3182
|
-
- **Linear Memory**: Fixed-size sandbox, cannot grow unbounded
|
|
3183
|
-
- **No Ambient Authority**: Cannot access filesystem, network unless granted
|
|
3184
|
-
- **Deterministic Execution**: Same inputs → same outputs
|
|
334
|
+
## Benchmarks
|
|
3185
335
|
|
|
3186
|
-
|
|
3187
|
-
// Minimal permissions for read-only analysis
|
|
3188
|
-
const readOnlyAgent = new HyperMindAgent({
|
|
3189
|
-
kg: db,
|
|
3190
|
-
sandbox: {
|
|
3191
|
-
capabilities: ['ReadKG'], // Cannot write or execute tools
|
|
3192
|
-
fuelLimit: 100_000
|
|
3193
|
-
}
|
|
3194
|
-
})
|
|
336
|
+
### Performance (Measured)
|
|
3195
337
|
|
|
3196
|
-
|
|
3197
|
-
|
|
3198
|
-
|
|
3199
|
-
|
|
3200
|
-
|
|
3201
|
-
fuelLimit: 10_000_000
|
|
3202
|
-
}
|
|
3203
|
-
})
|
|
338
|
+
| Metric | Value | Rate |
|
|
339
|
+
|--------|-------|------|
|
|
340
|
+
| **Triple Lookup** | 2.78 µs | 359K lookups/sec |
|
|
341
|
+
| **Bulk Insert (100K)** | 682 ms | 146K triples/sec |
|
|
342
|
+
| **Memory per Triple** | 24 bytes | Best-in-class |
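
The Rate column follows directly from the Value column. The short sketch below reproduces that arithmetic using only the figures in the table; the variable names are illustrative.

```javascript
// Derive the throughput figures from the measured values in the table.
const lookupLatencySeconds = 2.78e-6 // 2.78 µs per triple lookup
const lookupsPerSecond = 1 / lookupLatencySeconds

const bulkInsertTriples = 100_000
const bulkInsertSeconds = 0.682 // 682 ms
const triplesPerSecond = bulkInsertTriples / bulkInsertSeconds

// Truncate to thousands, matching the table's "359K" and "146K".
console.log(Math.floor(lookupsPerSecond / 1000) + 'K lookups/sec')
console.log(Math.floor(triplesPerSecond / 1000) + 'K triples/sec')
```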
|
|
3204
343
|
|
|
3205
|
-
|
|
3206
|
-
const adminAgent = new HyperMindAgent({
|
|
3207
|
-
kg: db,
|
|
3208
|
-
sandbox: {
|
|
3209
|
-
capabilities: ['ReadKG', 'WriteKG', 'ExecuteTool', 'SpawnAgent'],
|
|
3210
|
-
fuelLimit: 100_000_000
|
|
3211
|
-
}
|
|
3212
|
-
})
|
|
3213
|
-
```
|
|
344
|
+
### Industry Comparison
|
|
3214
345
|
|
|
3215
|
-
|
|
346
|
+
| System | Lookup Speed | Memory/Triple | AI Framework |
|
|
347
|
+
|--------|-------------|---------------|--------------|
|
|
348
|
+
| **rust-kgdb** | **2.78 µs** | **24 bytes** | **Yes** |
|
|
349
|
+
| RDFox | ~5 µs | 36-89 bytes | No |
|
|
350
|
+
| Virtuoso | ~5 µs | 35-75 bytes | No |
|
|
351
|
+
| Blazegraph | ~100 µs | 100+ bytes | No |
|
|
3216
352
|
|
|
3217
|
-
|
|
353
|
+
### AI Agent Accuracy
|
|
3218
354
|
|
|
3219
|
-
|
|
3220
|
-
|
|
3221
|
-
|
|
3222
|
-
|
|
3223
|
-
│ User Query: "Find high-risk claims and update their status" │
|
|
3224
|
-
│ │
|
|
3225
|
-
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
|
3226
|
-
│ │ LAYER 1: SCOPE CHECK │ │
|
|
3227
|
-
│ │ ✅ Graph 'claims' is in allowedGraphs │ │
|
|
3228
|
-
│ │ ✅ Predicates 'riskScore', 'status' are allowed │ │
|
|
3229
|
-
│ │ ❌ If accessing 'internal' graph → BLOCKED │ │
|
|
3230
|
-
│ └───────────────────────────────────────────────────────────────────────┘ │
|
|
3231
|
-
│ ↓ │
|
|
3232
|
-
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
|
3233
|
-
│ │ LAYER 2: CAPABILITY CHECK │ │
|
|
3234
|
-
│ │ ✅ Has 'ReadKG' → SELECT query allowed │ │
|
|
3235
|
-
│ │ ❓ Has 'WriteKG'? → If yes, UPDATE allowed; if no, BLOCKED │ │
|
|
3236
|
-
│ │ ✅ Has 'ExecuteTool' → Datalog rules can run │ │
|
|
3237
|
-
│ └───────────────────────────────────────────────────────────────────────┘ │
|
|
3238
|
-
│ ↓ │
|
|
3239
|
-
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
|
3240
|
-
│ │ LAYER 3: FUEL CHECK │ │
|
|
3241
|
-
│ │ Query cost estimate: 25,000 fuel │ │
|
|
3242
|
-
│ │ Available fuel: 938,000 │ │
|
|
3243
|
-
│ │ ✅ Sufficient fuel → EXECUTE │ │
|
|
3244
|
-
│ │ (After execution: 913,000 remaining) │ │
|
|
3245
|
-
│ └───────────────────────────────────────────────────────────────────────┘ │
|
|
3246
|
-
│ ↓ │
|
|
3247
|
-
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
|
3248
|
-
│ │ RESULT: Query executed, results returned │ │
|
|
3249
|
-
│ │ All operations logged in audit trail │ │
|
|
3250
|
-
│ └───────────────────────────────────────────────────────────────────────┘ │
|
|
3251
|
-
└─────────────────────────────────────────────────────────────────────────────┘
|
|
3252
|
-
```
|
|
355
|
+
| Approach | Accuracy | Why |
|
|
356
|
+
|----------|----------|-----|
|
|
357
|
+
| **Vanilla LLM** | 0% | Hallucinated predicates, markdown in SPARQL |
|
|
358
|
+
| **HyperMind** | 86.4% | Schema injection, typed tools, audit trail |
|
|
3253
359
|
|
|
3254
360
|
---
|
|
3255
361
|
|
|
3256
|
-
|
|
3257
|
-
|
|
3258
|
-
Fuel metering prevents runaway computations and enables resource accounting:
|
|
362
|
+
## W3C Standards Compliance
|
|
3259
363
|
|
|
3260
|
-
|
|
3261
|
-
|
|
3262
|
-
|
|
3263
|
-
|
|
3264
|
-
|
|
3265
|
-
|
|
3266
|
-
|
|
3267
|
-
})
|
|
364
|
+
| Standard | Status |
|
|
365
|
+
|----------|--------|
|
|
366
|
+
| **SPARQL 1.1 Query** | ✅ 100% |
|
|
367
|
+
| **SPARQL 1.1 Update** | ✅ 100% |
|
|
368
|
+
| **RDF 1.2** | ✅ 100% |
|
|
369
|
+
| **RDF-Star** | ✅ 100% |
|
|
370
|
+
| **Turtle** | ✅ 100% |
|
|
3268
371
|
|
|
3269
|
-
|
|
3270
|
-
// - SPARQL query: ~1000-10000 fuel (depends on complexity)
|
|
3271
|
-
// - Datalog evaluation: ~5000-50000 fuel
|
|
3272
|
-
// - Embedding search: ~500-2000 fuel
|
|
372
|
+
---
|
|
3273
373
|
|
|
3274
|
-
|
|
3275
|
-
// Error: FuelExhausted - agent exceeded CPU budget
|
|
374
|
+
## Running Tests
|
|
3276
375
|
|
|
3277
|
-
|
|
3278
|
-
|
|
3279
|
-
|
|
376
|
+
```bash
|
|
377
|
+
npm test # 42 feature tests
|
|
378
|
+
npm run test:jest # 217 unit tests
|
|
3280
379
|
```
|
|
3281
380
|
|
|
3282
|
-
**Fuel Limits by Use Case**:
|
|
3283
|
-
|
|
3284
|
-
| Use Case | Recommended Fuel | Rationale |
|
|
3285
|
-
|----------|------------------|-----------|
|
|
3286
|
-
| Simple queries | 100,000 | Single SPARQL + formatting |
|
|
3287
|
-
| Complex analysis | 1,000,000 | Multiple queries + Datalog |
|
|
3288
|
-
| Long-running agent | 10,000,000 | Extended conversation |
|
|
3289
|
-
| Batch processing | 100,000,000 | Many independent queries |
|
|
3290
|
-
|
|
3291
381
|
---
|
|
3292
382
|
|
|
3293
|
-
##
|
|
383
|
+
## Links
|
|
384
|
+
|
|
385
|
+
- **npm**: [rust-kgdb](https://www.npmjs.com/package/rust-kgdb)
|
|
386
|
+
- **GitHub**: [gonnect-uk/rust-kgdb](https://github.com/gonnect-uk/rust-kgdb)
|
|
387
|
+
- **Benchmark Report**: [HYPERMIND_BENCHMARK_REPORT.md](./HYPERMIND_BENCHMARK_REPORT.md)
|
|
388
|
+
- **Changelog**: [CHANGELOG.md](./CHANGELOG.md)
|
|
3294
389
|
|
|
3295
|
-
|
|
390
|
+
---
|
|
3296
391
|
|
|
3297
|
-
|
|
392
|
+
## Advanced Topics
|
|
3298
393
|
|
|
3299
|
-
|
|
3300
|
-
const { HyperMindAgent, GraphDB, DatalogProgram, evaluateDatalog, GraphFrame } = require('rust-kgdb')
|
|
394
|
+
This section covers the technical foundations behind HyperMind's deterministic AI reasoning.
|
|
3301
395
|
|
|
3302
|
-
|
|
3303
|
-
const db = new GraphDB('http://insurance.org/')
|
|
3304
|
-
db.loadTtl(FRAUD_ONTOLOGY, 'http://insurance.org/fraud-kb')
|
|
396
|
+
### Why It Works: The Technical Foundation
|
|
3305
397
|
|
|
3306
|
-
|
|
3307
|
-
kg: db,
|
|
3308
|
-
name: 'fraud-detector',
|
|
3309
|
-
sandbox: {
|
|
3310
|
-
capabilities: ['ReadKG', 'ExecuteTool'], // Read-only!
|
|
3311
|
-
fuelLimit: 1_000_000
|
|
3312
|
-
}
|
|
3313
|
-
})
|
|
398
|
+
HyperMind's reliability comes from three mathematical foundations:
|
|
3314
399
|
|
|
3315
|
-
|
|
3316
|
-
|
|
3317
|
-
|
|
3318
|
-
|
|
3319
|
-
|
|
3320
|
-
{ predicate: 'claimant', terms: ['?Y'] },
|
|
3321
|
-
{ predicate: 'provider', terms: ['?P'] },
|
|
3322
|
-
{ predicate: 'claims_with', terms: ['?X', '?P'] },
|
|
3323
|
-
{ predicate: 'claims_with', terms: ['?Y', '?P'] },
|
|
3324
|
-
{ predicate: 'knows', terms: ['?X', '?Y'] }
|
|
3325
|
-
]
|
|
3326
|
-
})
|
|
400
|
+
| Foundation | What It Does | Practical Benefit |
|
|
401
|
+
|------------|--------------|-------------------|
|
|
402
|
+
| **Schema Awareness** | Auto-extracts your data structure | LLM only generates valid queries |
|
|
403
|
+
| **Typed Tools** | Input/output validation | Prevents invalid tool combinations |
|
|
404
|
+
| **Reasoning Trace** | Records every step | Complete audit trail for compliance |
|
|
3327
405
|
|
|
3328
|
-
|
|
3329
|
-
const result = await agent.call('Find all claimants with high risk scores')
|
|
406
|
+
### The Reasoning Trace (Audit Trail)
|
|
3330
407
|
|
|
3331
|
-
|
|
3332
|
-
console.log(result.explanation) // Full execution trace
|
|
3333
|
-
console.log(result.proof) // Curry-Howard proof witness
|
|
3334
|
-
```
|
|
408
|
+
Every HyperMind answer includes a derivation, identified by a SHA-256 hash, showing exactly how the conclusion was reached:
|
|
3335
409
|
|
|
3336
|
-
**Fraud Agent ProofDAG Output**:
|
|
3337
410
|
```
|
|
3338
411
|
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
3339
|
-
│
|
|
3340
|
-
│ │
|
|
3341
|
-
│ ROOT: Collusion Detection (P001 ↔ P002 ↔ PROV001) │
|
|
3342
|
-
│ ═══════════════════════════════════════════════════ │
|
|
3343
|
-
│ │
|
|
3344
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
3345
|
-
│ │ Rule: potential_collusion(?X, ?Y, ?P) │ │
|
|
3346
|
-
│ │ Bindings: ?X=P001, ?Y=P002, ?P=PROV001 │ │
|
|
3347
|
-
│ │ │ │
|
|
3348
|
-
│ │ Proof Tree: │ │
|
|
3349
|
-
│ │ claimant(P001) ✓ [fact from KG] │ │
|
|
3350
|
-
│ │ claimant(P002) ✓ [fact from KG] │ │
|
|
3351
|
-
│ │ provider(PROV001) ✓ [fact from KG] │ │
|
|
3352
|
-
│ │ claims_with(P001,PROV001) ✓ [inferred from CLM001] │ │
|
|
3353
|
-
│ │ claims_with(P002,PROV001) ✓ [inferred from CLM002] │ │
|
|
3354
|
-
│ │ knows(P001,P002) ✓ [fact from KG] │ │
|
|
3355
|
-
│ │ ───────────────────────────────────────────── │ │
|
|
3356
|
-
│ │ ∴ potential_collusion(P001,P002,PROV001) ✓ [DERIVED] │ │
|
|
3357
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
412
|
+
│ REASONING TRACE │
|
|
3358
413
|
│ │
|
|
3359
|
-
│
|
|
3360
|
-
│
|
|
3361
|
-
│
|
|
3362
|
-
│
|
|
3363
|
-
│
|
|
3364
|
-
│
|
|
3365
|
-
│
|
|
3366
|
-
│
|
|
3367
|
-
│
|
|
414
|
+
│ ┌────────────────────────────────┐ │
|
|
415
|
+
│ │ CONCLUSION (Root) │ │
|
|
416
|
+
│ │ "Provider P001 is suspicious" │ │
|
|
417
|
+
│ │ Confidence: 94% │ │
|
|
418
|
+
│ └───────────────┬────────────────┘ │
|
|
419
|
+
│ │ │
|
|
420
|
+
│ ┌───────────────┼───────────────┐ │
|
|
421
|
+
│ │ │ │ │
|
|
422
|
+
│ ▼ ▼ ▼ │
|
|
423
|
+
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
|
|
424
|
+
│ │ Database Query │ │ Rule Application │ │ Similarity Match │ │
|
|
425
|
+
│ │ │ │ │ │ │ │
|
|
426
|
+
│ │ Tool: SPARQL │ │ Tool: Datalog │ │ Tool: Embeddings │ │
|
|
427
|
+
│ │ Result: 47 claims│ │ Result: MATCHED │ │ Result: 87% │ │
|
|
428
|
+
│ │ Time: 2.3ms │ │ Rule: fraud(?P) │ │ similar to known │ │
|
|
429
|
+
│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │
|
|
3368
430
|
│ │
|
|
3369
|
-
│
|
|
431
|
+
│ HASH: sha256:8f3a2b1c4d5e... (Reproducible, Auditable, Verifiable) │
|
|
3370
432
|
└─────────────────────────────────────────────────────────────────────────────┘
|
|
3371
433
|
```
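The "same question, same hash" property depends on hashing a canonical form of the trace. Here is a minimal sketch using Node's built-in `crypto` module; the trace shape and the `canonicalize` and `traceHash` helpers are assumptions for illustration, not HyperMind's actual format or API.

```javascript
const { createHash } = require('crypto')

// Recursively sort object keys so that logically equal traces
// serialize to the same JSON string regardless of key order.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize)
  if (value !== null && typeof value === 'object') {
    return Object.keys(value).sort().reduce((acc, key) => {
      acc[key] = canonicalize(value[key])
      return acc
    }, {})
  }
  return value
}

// Hash the canonical JSON form of a trace with SHA-256.
function traceHash(trace) {
  const digest = createHash('sha256')
    .update(JSON.stringify(canonicalize(trace)))
    .digest('hex')
  return 'sha256:' + digest
}

// Hypothetical trace modelled on the diagram above.
const traceA = {
  conclusion: 'Provider P001 is suspicious',
  steps: [
    { tool: 'SPARQL', result: '47 claims' },
    { tool: 'Datalog', result: 'MATCHED' },
    { tool: 'Embeddings', result: '87%' }
  ]
}

// Same content with keys in a different order.
const traceB = {
  steps: [
    { result: '47 claims', tool: 'SPARQL' },
    { result: 'MATCHED', tool: 'Datalog' },
    { result: '87%', tool: 'Embeddings' }
  ],
  conclusion: 'Provider P001 is suspicious'
}

const hashA = traceHash(traceA)
const hashB = traceHash(traceB)
console.log(hashA === hashB) // true: key order does not affect the hash
```

Note that a hash alone shows the trace was not altered after the fact; it does not prove who produced it, which would require a signature over the hash.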
|
|
3372
434
|
|
|
3373
|
-
###
|
|
3374
|
-
|
|
3375
|
-
**Use Case**: Commercial insurance underwriting with ISO/NAIC rating factors.
|
|
3376
|
-
|
|
3377
|
-
```javascript
|
|
3378
|
-
const { HyperMindAgent, GraphDB, DatalogProgram, evaluateDatalog } = require('rust-kgdb')
|
|
3379
|
-
|
|
3380
|
-
const db = new GraphDB('http://underwriting.org/')
|
|
3381
|
-
db.loadTtl(UNDERWRITING_KB, 'http://underwriting.org/data')
|
|
3382
|
-
|
|
3383
|
-
const agent = new HyperMindAgent({
|
|
3384
|
-
kg: db,
|
|
3385
|
-
name: 'underwriter',
|
|
3386
|
-
sandbox: {
|
|
3387
|
-
capabilities: ['ReadKG', 'ExecuteTool'], // Read-only for audit compliance
|
|
3388
|
-
fuelLimit: 500_000
|
|
3389
|
-
}
|
|
3390
|
-
})
|
|
435
|
+
### For Academics: Mathematical Foundations
|
|
3391
436
|
|
|
3392
|
-
|
|
3393
|
-
agent.addRule('auto_approval', {
|
|
3394
|
-
head: { predicate: 'auto_approve', terms: ['?Account'] },
|
|
3395
|
-
body: [
|
|
3396
|
-
{ predicate: 'account', terms: ['?Account'] },
|
|
3397
|
-
{ predicate: 'loss_ratio', terms: ['?Account', '?LR'] },
|
|
3398
|
-
{ predicate: 'years_in_business', terms: ['?Account', '?Years'] },
|
|
3399
|
-
{ predicate: 'builtin_lt', terms: ['?LR', '0.35'] },
|
|
3400
|
-
{ predicate: 'builtin_gt', terms: ['?Years', '5'] }
|
|
3401
|
-
]
|
|
3402
|
-
})
|
|
437
|
+
HyperMind is built on rigorous mathematical foundations:
|
|
3403
438
|
|
|
3404
|
-
|
|
3405
|
-
|
|
3406
|
-
|
|
3407
|
-
|
|
3408
|
-
{ predicate: 'loss_ratio', terms: ['?Account', '?LR'] },
|
|
3409
|
-
{ predicate: 'builtin_gt', terms: ['?LR', '0.50'] }
|
|
3410
|
-
]
|
|
3411
|
-
})
|
|
439
|
+
- **Context Theory** (Spivak's Ologs): Schema represented as a category where objects are classes and morphisms are properties
|
|
440
|
+
- **Type Theory** (Hindley-Milner): Every tool has a typed signature enabling compile-time validation
|
|
441
|
+
- **Proof Theory** (Curry-Howard): Proofs are programs, types are propositions - every conclusion has a derivation
|
|
442
|
+
- **Category Theory**: Tools as morphisms with validated composition
|
|
3412
443
|
|
|
3413
|
-
|
|
3414
|
-
function calculatePremium(baseRate, exposure, territoryMod, lossRatio, yearsInBusiness) {
|
|
3415
|
-
const experienceMod = yearsInBusiness >= 10 ? 0.90 : yearsInBusiness >= 5 ? 0.95 : 1.05
|
|
3416
|
-
const lossMod = lossRatio < 0.30 ? 0.85 : lossRatio < 0.50 ? 1.00 : lossRatio < 0.70 ? 1.15 : 1.35
|
|
3417
|
-
return baseRate * exposure * territoryMod * experienceMod * lossMod
|
|
3418
|
-
}
|
|
444
|
+
These foundations ensure that HyperMind transforms probabilistic LLM outputs into deterministic, verifiable reasoning chains.
|
|
3419
445
|
|
|
3420
|
-
|
|
3421
|
-
const result = await agent.call('Which accounts need manual underwriter review?')
|
|
3422
|
-
```
|
|
446
|
+
### Architecture Layers
|
|
3423
447
|
|
|
3424
|
-
**Underwriting Agent ProofDAG Output**:
|
|
3425
448
|
```
|
|
3426
449
|
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
3427
|
-
│
|
|
3428
|
-
│ │
|
|
3429
|
-
│ Decision: BUS003 (SafeHaul Logistics) → REFER_TO_UNDERWRITER │
|
|
3430
|
-
│ ═════════════════════════════════════════════════════════ │
|
|
3431
|
-
│ │
|
|
3432
|
-
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
3433
|
-
│ │ RULE FIRED: refer_to_underwriter(?A) │ │
|
|
3434
|
-
│ │ │ │
|
|
3435
|
-
│ │ Datalog Definition: │ │
|
|
3436
|
-
│ │ refer_to_underwriter(?A) :- │ │
|
|
3437
|
-
│ │ account(?A), │ │
|
|
3438
|
-
│ │ loss_ratio(?A, ?L), │ │
|
|
3439
|
-
│ │ ?L > 0.5. │ │
|
|
3440
|
-
│ │ │ │
|
|
3441
|
-
│ │ Matching Facts: │ │
|
|
3442
|
-
│ │ account(BUS003) ✓ SafeHaul is an account │ │
|
|
3443
|
-
│ │ loss_ratio(BUS003, 0.72) ✓ Loss ratio is 72% │ │
|
|
3444
|
-
│ │ 0.72 > 0.5 ✓ Threshold exceeded │ │
|
|
3445
|
-
│ │ ───────────────────────────────────────────── │ │
|
|
3446
|
-
│ │ ∴ refer_to_underwriter(BUS003) ✓ [DERIVED] │ │
|
|
3447
|
-
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
3448
|
-
│ │
|
|
3449
|
-
│ Premium Calculation Trace: │
|
|
3450
|
-
│ ├─ Base Rate: $18.75/100 (NAICS 484110: General Freight Trucking) │
|
|
3451
|
-
│ ├─ Exposure: $4,200,000 revenue │
|
|
3452
|
-
│ ├─ Territory Mod: 1.45 (FEMA Zone AE - high flood risk) │
|
|
3453
|
-
│ ├─ Experience Mod: 0.95 (8 years in business) │
|
|
3454
|
-
│ ├─ Loss Mod: 1.35 (72% loss ratio - poor history) │
|
|
3455
|
-
│ └─ PREMIUM: $18.75 × 42000 × 1.45 × 0.95 × 1.35 = $1,463,925 │
|
|
3456
|
-
│ │
|
|
3457
|
-
│ Risk Factors (from GraphFrame): │
|
|
3458
|
-
│ ├─ Industry: Transportation (ISO high-risk class) │
|
|
3459
|
-
│ ├─ PageRank: 0.1847 (high network centrality in risk graph) │
|
|
3460
|
-
│ └─ Territory: TX-201 (hurricane corridor exposure) │
|
|
3461
|
-
│ │
|
|
3462
|
-
│ Auto-Approved Accounts (low risk): │
|
|
3463
|
-
│ ├─ BUS002 (TechStart LLC): loss_ratio=0.15, years=3 │
|
|
3464
|
-
│ └─ BUS004 (Downtown Restaurant): loss_ratio=0.28, years=12 │
|
|
3465
|
-
│ │
|
|
3466
|
-
│ Proof Hash: sha256:9d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8g │
|
|
3467
|
-
│ Timestamp: 2025-12-15T14:45:00Z │
|
|
3468
|
-
│ Agent: underwriter │
|
|
450
|
+
│ INTELLIGENCE CONTROL PLANE │
|
|
3469
451
|
│ │
|
|
3470
|
-
│
|
|
452
|
+
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │
|
|
453
|
+
│ │ Schema │ │ Tool │ │ Reasoning │ │
|
|
454
|
+
│ │ Awareness │ │ Validation │ │ Trace │ │
|
|
455
|
+
│ └───────┬────────┘ └───────┬────────┘ └───────┬────────┘ │
|
|
456
|
+
│ └────────────────────┼────────────────────┘ │
|
|
457
|
+
│ ▼ │
|
|
458
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
459
|
+
│ │ HYPERMIND AGENT │ │
|
|
460
|
+
│ │ User Query → LLM Planner → Typed Execution Plan → Tools → Answer │ │
|
|
461
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
462
|
+
│ ▼ │
|
|
463
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
464
|
+
│ │ rust-kgdb ENGINE │ │
|
|
465
|
+
│ │ • GraphDB (SPARQL 1.1) • GraphFrames (Analytics) │ │
|
|
466
|
+
│ │ • Datalog (Rules) • Embeddings (Similarity) │ │
|
|
467
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
3471
468
|
└─────────────────────────────────────────────────────────────────────────────┘
|
|
3472
469
|
```
|
|
3473
470
|
|
|
3474
|
-
###
|
|
471
|
+
### Security Model
|
|
3475
472
|
|
|
3476
|
-
|
|
3477
|
-
|--------|-------------|----------------------|
|
|
3478
|
-
| **Audit Question** | "Why was this flagged?" | Hash: 9d4e5f6a → Full derivation chain |
|
|
3479
|
-
| **Regulatory Review** | Black box | "Rule R1 matched facts F1, F2, F3" |
|
|
3480
|
-
| **Reproducibility** | Different each time | Same inputs → Same hash |
|
|
3481
|
-
| **Liability Defense** | "The AI said so" | "ISO guideline + NAIC rule + KG facts" |
|
|
3482
|
-
| **SOX/GDPR Compliance** | Cannot prove | Full execution witness |
|
|
473
|
+
HyperMind includes capability-based security:
|
|
3483
474
|
|
|
3484
|
-
```
|
|
3485
|
-
|
|
3486
|
-
|
|
3487
|
-
|
|
475
|
+
```javascript
|
|
476
|
+
const agent = new HyperMindAgent({
|
|
477
|
+
kg: db,
|
|
478
|
+
scope: new AgentScope({
|
|
479
|
+
allowedGraphs: ['http://insurance.org/'], // Restrict graph access
|
|
480
|
+
allowedPredicates: ['amount', 'provider'], // Restrict predicates
|
|
481
|
+
maxResultSize: 1000 // Limit result size
|
|
482
|
+
}),
|
|
483
|
+
sandbox: {
|
|
484
|
+
capabilities: ['ReadKG', 'ExecuteTool'], // No WriteKG = read-only
|
|
485
|
+
fuelLimit: 1_000_000 // CPU budget
|
|
486
|
+
}
|
|
487
|
+
})
|
|
3488
488
|
```
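The configuration above combines three kinds of limits: scope, capabilities, and fuel. The sketch below shows one way such checks could be ordered before a request runs. It is a standalone illustration: the `authorize` function and the `policy` and `request` shapes are hypothetical and do not reflect how `HyperMindAgent` enforces its sandbox internally.

```javascript
// Illustrative only: a layered allow/deny check in plain JavaScript.
function authorize(policy, request) {
  // Layer 1: scope. The graph and every predicate must be allow-listed.
  if (!policy.allowedGraphs.includes(request.graph)) {
    return { allowed: false, reason: 'graph out of scope: ' + request.graph }
  }
  const blocked = request.predicates.find((p) => !policy.allowedPredicates.includes(p))
  if (blocked !== undefined) {
    return { allowed: false, reason: 'predicate out of scope: ' + blocked }
  }
  // Layer 2: capability. The operation needs an explicitly granted capability.
  if (!policy.capabilities.includes(request.capability)) {
    return { allowed: false, reason: 'missing capability: ' + request.capability }
  }
  // Layer 3: fuel. The estimated cost must fit in the remaining budget.
  if (request.estimatedFuel > policy.fuelRemaining) {
    return { allowed: false, reason: 'insufficient fuel' }
  }
  return { allowed: true, fuelRemaining: policy.fuelRemaining - request.estimatedFuel }
}

// Policy values mirror the configuration shown above.
const policy = {
  allowedGraphs: ['http://insurance.org/'],
  allowedPredicates: ['amount', 'provider'],
  capabilities: ['ReadKG', 'ExecuteTool'],
  fuelRemaining: 1_000_000
}

const readResult = authorize(policy, {
  graph: 'http://insurance.org/',
  predicates: ['amount'],
  capability: 'ReadKG',
  estimatedFuel: 25_000
})

const writeResult = authorize(policy, {
  graph: 'http://insurance.org/',
  predicates: ['amount'],
  capability: 'WriteKG',
  estimatedFuel: 25_000
})

console.log(readResult)  // allowed, with the remaining fuel after the estimated cost
console.log(writeResult) // denied: WriteKG was never granted
```

Checking scope and capability before fuel means a request that was never permitted is rejected without consuming any budget.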
|
|
3489
489
|
|
|
3490
|
-
|
|
3491
|
-
|
|
3492
|
-
## Examples
|
|
3493
|
-
|
|
3494
|
-
```bash
|
|
3495
|
-
# Fraud detection agent
|
|
3496
|
-
node examples/fraud-detection-agent.js
|
|
490
|
+
### Memory System
|
|
3497
491
|
|
|
3498
|
-
|
|
3499
|
-
node examples/underwriting-agent.js
|
|
492
|
+
Agents have persistent memory across sessions:
|
|
3500
493
|
|
|
3501
|
-
|
|
3502
|
-
|
|
3503
|
-
|
|
494
|
+
```javascript
|
|
495
|
+
const agent = new HyperMindAgent({
|
|
496
|
+
kg: db,
|
|
497
|
+
memory: new MemoryManager({
|
|
498
|
+
workingMemorySize: 10, // Current session cache
|
|
499
|
+
episodicRetentionDays: 30, // Episode history
|
|
500
|
+
longTermGraph: 'http://memory/' // Persistent knowledge
|
|
501
|
+
})
|
|
502
|
+
})
|
|
3504
503
|
```
|
|
3505
504
|
|
|
3506
505
|
---
|
|
3507
506
|
|
|
3508
|
-
## Links
|
|
3509
|
-
|
|
3510
|
-
- **npm**: [rust-kgdb](https://www.npmjs.com/package/rust-kgdb)
|
|
3511
|
-
- **GitHub**: [gonnect-uk/rust-kgdb](https://github.com/gonnect-uk/rust-kgdb)
|
|
3512
|
-
- **Benchmark Report**: [HYPERMIND_BENCHMARK_REPORT.md](./HYPERMIND_BENCHMARK_REPORT.md)
|
|
3513
|
-
- **Changelog**: [CHANGELOG.md](./CHANGELOG.md)
|
|
3514
|
-
- **Archive**: [README.archive.md](./README.archive.md) - Previous comprehensive documentation
|
|
3515
|
-
|
|
3516
|
-
---
|
|
3517
|
-
|
|
3518
507
|
## License
|
|
3519
508
|
|
|
3520
509
|
Apache 2.0
|