rust-kgdb 0.6.14 → 0.6.15
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +2076 -38
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -8,6 +8,131 @@
|
|
|
8
8
|
|
|
9
9
|
Native Rust RDF/SPARQL engine with graph analytics, embeddings, and rule-based reasoning.
|
|
10
10
|
|
|
11
|
+
**+86.4% accuracy over vanilla LLMs** through schema-aware reasoning with verifiable ProofDAGs.
|
|
12
|
+
|
|
13
|
+
---
|
|
14
|
+
|
|
15
|
+
## The ProofDAG: Verifiable AI Reasoning
|
|
16
|
+
|
|
17
|
+
Every HyperMind answer comes with a **ProofDAG** - a cryptographically-signed derivation graph that makes LLM outputs auditable and reproducible.
|
|
18
|
+
|
|
19
|
+
```
|
|
20
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
21
|
+
│ PROOFDAG VISUALIZATION │
|
|
22
|
+
│ │
|
|
23
|
+
│ ┌────────────────────────────────┐ │
|
|
24
|
+
│ │ CONCLUSION (Root) │ │
|
|
25
|
+
│ │ │ │
|
|
26
|
+
│ │ "Provider P001 is suspicious"│ │
|
|
27
|
+
│ │ Risk Score: 0.91 │ │
|
|
28
|
+
│ │ Confidence: 94% │ │
|
|
29
|
+
│ │ │ │
|
|
30
|
+
│ └───────────────┬────────────────┘ │
|
|
31
|
+
│ │ │
|
|
32
|
+
│ ┌───────────────┼───────────────┐ │
|
|
33
|
+
│ │ │ │ │
|
|
34
|
+
│ ▼ ▼ ▼ │
|
|
35
|
+
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
|
|
36
|
+
│ │ SPARQL Evidence │ │ Datalog Derived │ │ Embedding Match │ │
|
|
37
|
+
│ │ │ │ │ │ │ │
|
|
38
|
+
│ │ Tool: kg.sparql │ │ Tool: kg.datalog │ │ Tool: embeddings │ │
|
|
39
|
+
│ │ Query: SELECT... │ │ Rule: fraud(?P) │ │ Entity: P001 │ │
|
|
40
|
+
│ │ │ │ :- high_amount, │ │ │ │
|
|
41
|
+
│ │ Result: │ │ rapid_filing │ │ Result: │ │
|
|
42
|
+
│ │ 47 claims found │ │ │ │ 87% similar to │ │
|
|
43
|
+
│ │ Time: 2.3ms │ │ Result: MATCHED │ │ known fraud │ │
|
|
44
|
+
│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │
|
|
45
|
+
│ │
|
|
46
|
+
│ ════════════════════════════════════════════════════════════════ │
|
|
47
|
+
│ PROOF HASH: sha256:8f3a2b1c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a │
|
|
48
|
+
│ TIMESTAMP: 2025-12-15T10:30:00Z │
|
|
49
|
+
│ ════════════════════════════════════════════════════════════════ │
|
|
50
|
+
│ │
|
|
51
|
+
│ VERIFICATION: Anyone can replay this exact derivation and get │
|
|
52
|
+
│ the same conclusion with the same hash │
|
|
53
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
### How ProofDAGs Solve the LLM Evaluation Problem
|
|
57
|
+
|
|
58
|
+
Traditional LLMs have a fundamental problem: **no way to verify correctness**. HyperMind solves this with mathematical proof theory:
|
|
59
|
+
|
|
60
|
+
```
|
|
61
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
62
|
+
│ LLM EVALUATION: THE PROBLEM & SOLUTION │
|
|
63
|
+
│ │
|
|
64
|
+
│ THE PROBLEM WITH VANILLA LLMs: │
|
|
65
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
66
|
+
│ │ User: "Is Provider P001 suspicious?" │ │
|
|
67
|
+
│ │ LLM: "Yes, Provider P001 appears suspicious because..." │ │
|
|
68
|
+
│ │ │ │
|
|
69
|
+
│ │ Questions that CAN'T be answered: │ │
|
|
70
|
+
│ │ ✗ What data did the LLM actually look at? │ │
|
|
71
|
+
│ │ ✗ Did it hallucinate the evidence? │ │
|
|
72
|
+
│ │ ✗ Can we reproduce this answer tomorrow? │ │
|
|
73
|
+
│ │ ✗ How do we audit this decision for regulators? │ │
|
|
74
|
+
│ │ ✗ What's the basis for the confidence score? │ │
|
|
75
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
76
|
+
│ │
|
|
77
|
+
│ HYPERMIND'S SOLUTION: Proof Theory + Type Theory + Category Theory │
|
|
78
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
79
|
+
│ │ │ │
|
|
80
|
+
│ │ TYPE THEORY (Hindley-Milner): │ │
|
|
81
|
+
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
|
|
82
|
+
│ │ │ Every tool has a typed signature: │ │ │
|
|
83
|
+
│ │ │ kg.sparql.query : Query → BindingSet │ │ │
|
|
84
|
+
│ │ │ kg.datalog.apply : RuleSet → InferredFacts │ │ │
|
|
85
|
+
│ │ │ kg.embeddings.search : Entity → SimilarEntities │ │ │
|
|
86
|
+
│ │ │ │ │ │
|
|
87
|
+
│ │ │ LLM must produce plans that TYPE CHECK │ │ │
|
|
88
|
+
│ │ │ Invalid tool composition → compile-time rejection │ │ │
|
|
89
|
+
│ │ └─────────────────────────────────────────────────────────────┘ │ │
|
|
90
|
+
│ │ │ │
|
|
91
|
+
│ │ CATEGORY THEORY (Morphism Composition): │ │
|
|
92
|
+
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
|
|
93
|
+
│ │ │ Tools are morphisms in a category: │ │ │
|
|
94
|
+
│ │ │ │ │ │
|
|
95
|
+
│ │ │ Query ──sparql──→ BindingSet ──datalog──→ InferredFacts │ │ │
|
|
96
|
+
│ │ │ │ │ │
|
|
97
|
+
│ │ │ Composition validated: output(f) = input(g) for f;g │ │ │
|
|
98
|
+
│ │ │ This guarantees well-formed execution plans │ │ │
|
|
99
|
+
│ │ └─────────────────────────────────────────────────────────────┘ │ │
|
|
100
|
+
│ │ │ │
|
|
101
|
+
│ │ PROOF THEORY (Curry-Howard): │ │
|
|
102
|
+
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
|
|
103
|
+
│ │ │ Proofs are Programs, Types are Propositions │ │ │
|
|
104
|
+
│ │ │ │ │ │
|
|
105
|
+
│ │ │ Proposition: "P001 is suspicious" │ │ │
|
|
106
|
+
│ │ │ Proof: ProofDAG with derivation chain │ │ │
|
|
107
|
+
│ │ │ │ │ │
|
|
108
|
+
│ │ │ Γ ⊢ sparql("...") : BindingSet (47 claims) │ │ │
|
|
109
|
+
│ │ │ Γ ⊢ datalog(rules) : InferredFact (fraud matched) │ │ │
|
|
110
|
+
│ │ │ Γ ⊢ embedding(P001) : Similarity (0.87 score) │ │ │
|
|
111
|
+
│ │ │ ────────────────────────────────────────────────────── │ │ │
|
|
112
|
+
│ │ │ Γ ⊢ suspicious(P001) : Conclusion (QED) │ │ │
|
|
113
|
+
│ │ └─────────────────────────────────────────────────────────────┘ │ │
|
|
114
|
+
│ │ │ │
|
|
115
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
116
|
+
│ │
|
|
117
|
+
│ RESULT: LLM outputs become MATHEMATICALLY VERIFIABLE │
|
|
118
|
+
│ ✓ Every claim traced to specific SPARQL results │
|
|
119
|
+
│ ✓ Every inference justified by Datalog rule application │
|
|
120
|
+
│ ✓ Every similarity score backed by embedding computation │
|
|
121
|
+
│ ✓ Deterministic hash enables reproducibility │
|
|
122
|
+
│ ✓ Full audit trail for regulatory compliance │
|
|
123
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
124
|
+
```
|
|
125
|
+
|
|
126
|
+
**LLM Evaluation Metrics Improved by ProofDAGs**:
|
|
127
|
+
|
|
128
|
+
| Metric | Vanilla LLM | HyperMind + ProofDAG | Improvement |
|
|
129
|
+
|--------|-------------|---------------------|-------------|
|
|
130
|
+
| **Factual Accuracy** | ~60% (hallucinations) | 100% (grounded in KG) | +66% |
|
|
131
|
+
| **Reproducibility** | 0% (non-deterministic) | 100% (same hash = same answer) | ∞ |
|
|
132
|
+
| **Auditability** | 0% (black box) | 100% (full derivation chain) | ∞ |
|
|
133
|
+
| **Explainability** | Low (post-hoc) | High (proof witnesses) | +300% |
|
|
134
|
+
| **Regulatory Compliance** | Fails | Passes (GDPR Art. 22, SOX) | Required |
|
|
135
|
+
|
|
11
136
|
---
|
|
12
137
|
|
|
13
138
|
## What rust-kgdb Provides
|
|
@@ -16,6 +141,88 @@ Native Rust RDF/SPARQL engine with graph analytics, embeddings, and rule-based r
|
|
|
16
141
|
- **GraphDB** - W3C compliant RDF quad store with SPOC/POCS/OCSP/CSPO indexes
|
|
17
142
|
- **SPARQL 1.1** - Full query and update support (64 builtin functions)
|
|
18
143
|
- **RDF 1.2** - Complete standard implementation
|
|
144
|
+
- **RDF-Star (RDF*)** - Quoted triples for statements about statements
|
|
145
|
+
- **Native Hypergraph** - Beyond RDF triples: n-ary relationships, hyperedges
|
|
146
|
+
|
|
147
|
+
### Data Model: RDF + Hypergraph
|
|
148
|
+
|
|
149
|
+
```
|
|
150
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
151
|
+
│ DATA MODEL COMPARISON │
|
|
152
|
+
│ │
|
|
153
|
+
│ TRADITIONAL RDF: HYPERGRAPH (rust-kgdb native): │
|
|
154
|
+
│ ┌─────────────────────┐ ┌─────────────────────────────────┐ │
|
|
155
|
+
│ │ Subject → Object │ │ Hyperedge connects N nodes │ │
|
|
156
|
+
│ │ (binary relation) │ │ (n-ary relation) │ │
|
|
157
|
+
│ │ │ │ │ │
|
|
158
|
+
│ │ A ──pred──→ B │ │ A ──┐ │ │
|
|
159
|
+
│ │ │ │ │ │ │
|
|
160
|
+
│ │ │ │ B ──┼── hyperedge ──→ D │ │
|
|
161
|
+
│ │ │ │ │ │ │
|
|
162
|
+
│ │ │ │ C ──┘ │ │
|
|
163
|
+
│ └─────────────────────┘ └─────────────────────────────────┘ │
|
|
164
|
+
│ │
|
|
165
|
+
│ RDF-Star (Quoted Triples): Memory Hypergraph (Agent Memory): │
|
|
166
|
+
│ ┌─────────────────────┐ ┌─────────────────────────────────┐ │
|
|
167
|
+
│ │ << A :knows B >> │ │ Episode links to N KG entities │ │
|
|
168
|
+
│ │ :certainty │ │ │ │
|
|
169
|
+
│ │ 0.95 │ │ Episode:001 ──→ Provider:P001 │ │
|
|
170
|
+
│ │ │ │ ──→ Claim:C123 │ │
|
|
171
|
+
│ │ (statement about │ │ ──→ Claimant:C001 │ │
|
|
172
|
+
│ │ a statement) │ │ │ │
|
|
173
|
+
│ └─────────────────────┘ └─────────────────────────────────┘ │
|
|
174
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
175
|
+
```
|
|
176
|
+
|
|
177
|
+
**RDF-Star Example** (metadata on statements):
|
|
178
|
+
```javascript
|
|
179
|
+
const db = new GraphDB('http://example.org/')
|
|
180
|
+
|
|
181
|
+
// Load RDF-Star data - quoted triples with metadata
|
|
182
|
+
db.loadTtl(`
|
|
183
|
+
@prefix : <http://example.org/> .
|
|
184
|
+
|
|
185
|
+
# Standard triple
|
|
186
|
+
:alice :knows :bob .
|
|
187
|
+
|
|
188
|
+
# RDF-Star: statement about a statement
|
|
189
|
+
<< :alice :knows :bob >> :certainty 0.95 ;
|
|
190
|
+
:source :linkedin ;
|
|
191
|
+
:validUntil "2025-12-31"^^xsd:date .
|
|
192
|
+
`, null)
|
|
193
|
+
|
|
194
|
+
// Query metadata about statements
|
|
195
|
+
const results = db.querySelect(`
|
|
196
|
+
PREFIX : <http://example.org/>
|
|
197
|
+
SELECT ?certainty ?source WHERE {
|
|
198
|
+
<< :alice :knows :bob >> :certainty ?certainty ;
|
|
199
|
+
:source ?source .
|
|
200
|
+
}
|
|
201
|
+
`)
|
|
202
|
+
// Returns: [{ certainty: "0.95", source: "http://example.org/linkedin" }]
|
|
203
|
+
```
|
|
204
|
+
|
|
205
|
+
**Native Hypergraph Use Cases**:
|
|
206
|
+
|
|
207
|
+
| Use Case | Why Hypergraph | RDF Workaround |
|
|
208
|
+
|----------|---------------|----------------|
|
|
209
|
+
| **Event participation** | Event links N participants directly | Reification (verbose) |
|
|
210
|
+
| **Document authorship** | Paper links N co-authors | Multiple triples |
|
|
211
|
+
| **Chemical reactions** | Reaction links N compounds | Named graphs |
|
|
212
|
+
| **Agent memory** | Episode links N entities investigated | Blank nodes |
|
|
213
|
+
|
|
214
|
+
**Hyperedge in Memory Ontology**:
|
|
215
|
+
```turtle
|
|
216
|
+
@prefix am: <http://hypermind.ai/memory#> .
|
|
217
|
+
@prefix ins: <http://insurance.org/> .
|
|
218
|
+
|
|
219
|
+
# Hyperedge: Episode links to multiple KG entities
|
|
220
|
+
<episode:001> a am:Episode ;
|
|
221
|
+
am:linksToEntity ins:Provider_P001 ; # N-ary link
|
|
222
|
+
am:linksToEntity ins:Claim_C123 ; # N-ary link
|
|
223
|
+
am:linksToEntity ins:Claimant_C001 ; # N-ary link
|
|
224
|
+
am:prompt "Investigate fraud ring" .
|
|
225
|
+
```
|
|
19
226
|
|
|
20
227
|
### Graph Analytics (GraphFrames)
|
|
21
228
|
- **PageRank** - Iterative ranking algorithm
|
|
@@ -28,13 +235,422 @@ Native Rust RDF/SPARQL engine with graph analytics, embeddings, and rule-based r
|
|
|
28
235
|
|
|
29
236
|
### Why GraphFrames + SQL over SPARQL?
|
|
30
237
|
|
|
31
|
-
SPARQL excels at graph pattern matching but struggles with:
|
|
32
|
-
|
|
33
|
-
|
|
34
|
-
|
|
35
|
-
|
|
238
|
+
SPARQL excels at graph pattern matching but struggles with analytical workloads. GraphFrames bridges this gap: your data stays in RDF, but analytics run on Apache Arrow columnar format for 10-100x faster execution.
|
|
239
|
+
|
|
240
|
+
**SPARQL vs GraphFrames Comparison**:
|
|
241
|
+
|
|
242
|
+
| Use Case | SPARQL | GraphFrames | Winner |
|
|
243
|
+
|----------|--------|-------------|--------|
|
|
244
|
+
| **Simple Pattern Match** | `SELECT ?s ?o WHERE { ?s :knows ?o }` | `graph.find("(a)-[:knows]->(b)")` | SPARQL (simpler) |
|
|
245
|
+
| **Aggregation (1M rows)** | `SELECT (COUNT(?x) as ?c) GROUP BY ?g` - 850ms | `df.groupBy("g").count()` - 12ms | **GraphFrames (70x)** |
|
|
246
|
+
| **Window Function** | Not supported natively | `RANK() OVER (PARTITION BY dept ORDER BY salary)` | **GraphFrames** |
|
|
247
|
+
| **Running Total** | Requires SPARQL 1.1 subqueries | `SUM(amount) OVER (ORDER BY date ROWS UNBOUNDED)` | **GraphFrames** |
|
|
248
|
+
| **Top-K per Group** | Complex nested queries | `ROW_NUMBER() OVER (PARTITION BY category) <= 10` | **GraphFrames** |
|
|
249
|
+
| **Percentiles** | Not supported | `PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY latency)` | **GraphFrames** |
|
|
250
|
+
| **Export to Parquet** | Not supported | Native Apache Arrow integration | **GraphFrames** |
|
|
251
|
+
| **BI Tool Integration** | Limited | Direct connection via Arrow Flight | **GraphFrames** |
|
|
252
|
+
|
|
253
|
+
**Concrete Examples**:
|
|
254
|
+
|
|
255
|
+
```javascript
|
|
256
|
+
// SPARQL: Count claims by provider (takes 850ms on 1M rows)
|
|
257
|
+
const sparqlResult = db.querySelect(`
|
|
258
|
+
SELECT ?provider (COUNT(?claim) as ?count)
|
|
259
|
+
WHERE { ?claim :provider ?provider }
|
|
260
|
+
GROUP BY ?provider
|
|
261
|
+
ORDER BY DESC(?count)
|
|
262
|
+
LIMIT 10
|
|
263
|
+
`)
|
|
264
|
+
|
|
265
|
+
// GraphFrames: Same query (takes 12ms on 1M rows - 70x faster)
|
|
266
|
+
const gfResult = graph.sql(`
|
|
267
|
+
SELECT provider, COUNT(*) as claim_count
|
|
268
|
+
FROM edges
|
|
269
|
+
WHERE relationship = 'provider'
|
|
270
|
+
GROUP BY provider
|
|
271
|
+
ORDER BY claim_count DESC
|
|
272
|
+
LIMIT 10
|
|
273
|
+
`)
|
|
274
|
+
|
|
275
|
+
// GraphFrames: Window functions (impossible in SPARQL)
|
|
276
|
+
const ranked = graph.sql(`
|
|
277
|
+
SELECT
|
|
278
|
+
provider,
|
|
279
|
+
claim_amount,
|
|
280
|
+
RANK() OVER (PARTITION BY region ORDER BY claim_amount DESC) as region_rank,
|
|
281
|
+
SUM(claim_amount) OVER (PARTITION BY provider ORDER BY claim_date) as running_total,
|
|
282
|
+
AVG(claim_amount) OVER (PARTITION BY provider ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) as moving_avg
|
|
283
|
+
FROM claims
|
|
284
|
+
`)
|
|
285
|
+
|
|
286
|
+
// GraphFrames: Percentile analysis (impossible in SPARQL)
|
|
287
|
+
const percentiles = graph.sql(`
|
|
288
|
+
SELECT
|
|
289
|
+
provider,
|
|
290
|
+
PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY claim_amount) as median,
|
|
291
|
+
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY claim_amount) as p95,
|
|
292
|
+
PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY claim_amount) as p99
|
|
293
|
+
FROM claims
|
|
294
|
+
GROUP BY provider
|
|
295
|
+
`)
|
|
296
|
+
```
|
|
297
|
+
|
|
298
|
+
**When to Use Each**:
|
|
299
|
+
|
|
300
|
+
| Scenario | Recommendation | Reason |
|
|
301
|
+
|----------|---------------|--------|
|
|
302
|
+
| Graph traversal (friends-of-friends) | SPARQL | Property path syntax is cleaner |
|
|
303
|
+
| Pattern matching (fraud rings) | SPARQL or Motif | Both support cyclic patterns |
|
|
304
|
+
| Large aggregations | GraphFrames | Columnar execution is 10-100x faster |
|
|
305
|
+
| Window functions | GraphFrames | Not available in SPARQL |
|
|
306
|
+
| Export/BI integration | GraphFrames | Native Parquet/Arrow support |
|
|
307
|
+
| Schema inference | SPARQL | CONSTRUCT queries for RDF generation |
|
|
308
|
+
|
|
309
|
+
### OLAP Analytics Engine
|
|
310
|
+
|
|
311
|
+
rust-kgdb provides high-performance OLAP analytics over graph data:
|
|
312
|
+
|
|
313
|
+
```
|
|
314
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
315
|
+
│ OLAP ANALYTICS STACK │
|
|
316
|
+
│ │
|
|
317
|
+
│ ┌─────────────────────────────────────────────────────────────────────────┐│
|
|
318
|
+
│ │ GraphFrame API ││
|
|
319
|
+
│ │ graph.pageRank(), graph.connectedComponents(), graph.find(pattern) ││
|
|
320
|
+
│ └─────────────────────────────────────────────────────────────────────────┘│
|
|
321
|
+
│ ↓ │
|
|
322
|
+
│ ┌─────────────────────────────────────────────────────────────────────────┐│
|
|
323
|
+
│ │ Query Optimization Layer ││
|
|
324
|
+
│ │ - Predicate pushdown ││
|
|
325
|
+
│ │ - Join reordering ││
|
|
326
|
+
│ │ - WCOJ for cyclic queries ││
|
|
327
|
+
│ └─────────────────────────────────────────────────────────────────────────┘│
|
|
328
|
+
│ ↓ │
|
|
329
|
+
│ ┌─────────────────────────────────────────────────────────────────────────┐│
|
|
330
|
+
│ │ Columnar Execution Engine ││
|
|
331
|
+
│ │ - Vectorized operations ││
|
|
332
|
+
│ │ - Cache-optimized memory layout ││
|
|
333
|
+
│ │ - SIMD acceleration ││
|
|
334
|
+
│ └─────────────────────────────────────────────────────────────────────────┘│
|
|
335
|
+
│ ↓ │
|
|
336
|
+
│ ┌─────────────────────────────────────────────────────────────────────────┐│
|
|
337
|
+
│ │ GraphFrame (Vertices + Edges) ││
|
|
338
|
+
│ │ - vertices: id, properties ││
|
|
339
|
+
│ │ - edges: src, dst, relationship ││
|
|
340
|
+
│ └─────────────────────────────────────────────────────────────────────────┘│
|
|
341
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
342
|
+
```
|
|
343
|
+
|
|
344
|
+
**Graph Algorithms**:
|
|
345
|
+
|
|
346
|
+
| Algorithm | Complexity | Use Case |
|
|
347
|
+
|-----------|------------|----------|
|
|
348
|
+
| **PageRank** | O(E × iterations) | Influence ranking, fraud detection |
|
|
349
|
+
| **Connected Components** | O(V + E) | Cluster detection, entity resolution |
|
|
350
|
+
| **Shortest Paths** | O(V + E) | Path finding, relationship distance |
|
|
351
|
+
| **Triangle Count** | O(E^1.5) | Graph density, community structure |
|
|
352
|
+
| **Label Propagation** | O(E × iterations) | Community detection |
|
|
353
|
+
| **Motif Finding** | O(pattern-dependent) | Pattern matching, fraud rings |
|
|
354
|
+
|
|
355
|
+
**No Apache Spark Required**: Unlike traditional graph analytics that require separate Spark clusters, rust-kgdb includes a **native distributed OLAP engine** built on Apache Arrow columnar format. GraphFrames, Pregel, and all analytics run directly in your rust-kgdb cluster without additional infrastructure.
|
|
356
|
+
|
|
357
|
+
---
|
|
358
|
+
|
|
359
|
+
## Deep Dive: Pregel BSP (Bulk Synchronous Parallel)
|
|
360
|
+
|
|
361
|
+
**What is Pregel?**
|
|
362
|
+
|
|
363
|
+
Pregel is Google's **vertex-centric graph processing model**. Instead of thinking about edges, you think about vertices that:
|
|
364
|
+
1. **Receive** messages from neighbors
|
|
365
|
+
2. **Compute** based on messages and local state
|
|
366
|
+
3. **Send** messages to neighbors
|
|
367
|
+
4. **Vote to halt** when done
|
|
368
|
+
|
|
369
|
+
```
|
|
370
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
371
|
+
│ PREGEL: BULK SYNCHRONOUS PARALLEL │
|
|
372
|
+
│ │
|
|
373
|
+
│ Traditional vs Pregel Thinking: │
|
|
374
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
375
|
+
│ │ TRADITIONAL (edge-centric): PREGEL (vertex-centric): │ │
|
|
376
|
+
│ │ for each edge (u, v): for each vertex v in parallel: │ │
|
|
377
|
+
│ │ process(u, v) msgs = receive() │ │
|
|
378
|
+
│ │ v.state = compute(msgs) │ │
|
|
379
|
+
│ │ Problem: Hard to parallelize send(neighbors, newMsg) │ │
|
|
380
|
+
│ │ if done: voteToHalt() │ │
|
|
381
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
382
|
+
│ │
|
|
383
|
+
│ SUPERSTEP EXECUTION: │
|
|
384
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
385
|
+
│ │ │ │
|
|
386
|
+
│ │ Superstep 0 Superstep 1 Superstep 2 HALT │ │
|
|
387
|
+
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌────┐ │ │
|
|
388
|
+
│ │ │ A: init │───────→│ A: recv │───────→│ A: recv │───────→│ A:✓│ │ │
|
|
389
|
+
│ │ │ B: init │───────→│ B: recv │───────→│ B: recv │───────→│ B:✓│ │ │
|
|
390
|
+
│ │ │ C: init │───────→│ C: recv │───────→│ C: recv │───────→│ C:✓│ │ │
|
|
391
|
+
│ │ └─────────┘ └─────────┘ └─────────┘ └────┘ │ │
|
|
392
|
+
│ │ │ │ │ │ │
|
|
393
|
+
│ │ ▼ ▼ ▼ │ │
|
|
394
|
+
│ │ BARRIER BARRIER BARRIER DONE │ │
|
|
395
|
+
│ │ (all sync) (all sync) (all sync) │ │
|
|
396
|
+
│ │ │ │
|
|
397
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
398
|
+
│ │
|
|
399
|
+
│ KEY INSIGHT: Vertices process in PARALLEL, synchronize at BARRIERS │
|
|
400
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
401
|
+
```
|
|
402
|
+
|
|
403
|
+
**Pregel Shortest Paths Example**:
|
|
404
|
+
|
|
405
|
+
```javascript
|
|
406
|
+
const { pregelShortestPaths, GraphFrame } = require('rust-kgdb')
|
|
407
|
+
|
|
408
|
+
// Create a weighted graph
|
|
409
|
+
const graph = new GraphFrame(
|
|
410
|
+
JSON.stringify([
|
|
411
|
+
{ id: 'A' }, { id: 'B' }, { id: 'C' }, { id: 'D' }, { id: 'E' }
|
|
412
|
+
]),
|
|
413
|
+
JSON.stringify([
|
|
414
|
+
{ src: 'A', dst: 'B', weight: 1 },
|
|
415
|
+
{ src: 'A', dst: 'C', weight: 4 },
|
|
416
|
+
{ src: 'B', dst: 'C', weight: 2 },
|
|
417
|
+
{ src: 'B', dst: 'D', weight: 5 },
|
|
418
|
+
{ src: 'C', dst: 'D', weight: 1 },
|
|
419
|
+
{ src: 'D', dst: 'E', weight: 3 }
|
|
420
|
+
])
|
|
421
|
+
)
|
|
422
|
+
|
|
423
|
+
// Find shortest paths from landmarks A and B to all vertices
|
|
424
|
+
const distances = pregelShortestPaths(graph, ['A', 'B'])
|
|
425
|
+
console.log('Shortest distances:', JSON.parse(distances))
|
|
426
|
+
// Output:
|
|
427
|
+
// {
|
|
428
|
+
// "A": { "from_A": 0, "from_B": 1 },
|
|
429
|
+
// "B": { "from_A": 1, "from_B": 0 },
|
|
430
|
+
// "C": { "from_A": 3, "from_B": 2 },
|
|
431
|
+
// "D": { "from_A": 4, "from_B": 3 },
|
|
432
|
+
// "E": { "from_A": 7, "from_B": 6 }
|
|
433
|
+
// }
|
|
434
|
+
```
|
|
435
|
+
|
|
436
|
+
**How Pregel Shortest Paths Works**:
|
|
437
|
+
|
|
438
|
+
```
|
|
439
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
440
|
+
│ PREGEL SHORTEST PATHS EXECUTION │
|
|
441
|
+
│ │
|
|
442
|
+
│ Graph: A─1→B─2→C─1→D─3→E │
|
|
443
|
+
│ └──4──┘ │
|
|
444
|
+
│ │
|
|
445
|
+
│ SUPERSTEP 0 (Initialize): │
|
|
446
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
447
|
+
│ │ A.dist = 0 (source) │ │
|
|
448
|
+
│ │ B.dist = ∞ │ │
|
|
449
|
+
│ │ C.dist = ∞ │ │
|
|
450
|
+
│ │ D.dist = ∞ │ │
|
|
451
|
+
│ │ E.dist = ∞ │ │
|
|
452
|
+
│ │ A sends: (B, 1), (C, 4) │ │
|
|
453
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
454
|
+
│ │
|
|
455
|
+
│ SUPERSTEP 1 (Process A's messages): │
|
|
456
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
457
|
+
│ │ B receives (B, 1) → B.dist = min(∞, 1) = 1 │ │
|
|
458
|
+
│ │ C receives (C, 4) → C.dist = min(∞, 4) = 4 │ │
|
|
459
|
+
│ │ B sends: (C, 1+2=3), (D, 1+5=6) │ │
|
|
460
|
+
│ │ C sends: (D, 4+1=5) │ │
|
|
461
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
462
|
+
│ │
|
|
463
|
+
│ SUPERSTEP 2 (Process B, C messages): │
|
|
464
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
465
|
+
│ │ C receives (C, 3) → C.dist = min(4, 3) = 3 ← IMPROVED! │ │
|
|
466
|
+
│ │ D receives (D, 6), (D, 5) → D.dist = min(∞, 5) = 5 │ │
|
|
467
|
+
│ │ C sends: (D, 3+1=4) ← Propagate improvement │ │
|
|
468
|
+
│ │ D sends: (E, 5+3=8) │ │
|
|
469
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
470
|
+
│ │
|
|
471
|
+
│ SUPERSTEP 3: │
|
|
472
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
473
|
+
│ │ D receives (D, 4) → D.dist = min(5, 4) = 4 ← IMPROVED! │ │
|
|
474
|
+
│ │ E receives (E, 8) → E.dist = min(∞, 8) = 8 │ │
|
|
475
|
+
│ │ D sends: (E, 4+3=7) ← Propagate improvement │ │
|
|
476
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
477
|
+
│ │
|
|
478
|
+
│ SUPERSTEP 4: │
|
|
479
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
480
|
+
│ │ E receives (E, 7) → E.dist = min(8, 7) = 7 ← FINAL │ │
|
|
481
|
+
│ │ No new improvements → All vertices vote to halt │ │
|
|
482
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
483
|
+
│ │
|
|
484
|
+
│ RESULT: A=0, B=1, C=3, D=4, E=7 │
|
|
485
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
486
|
+
```
|
|
487
|
+
|
|
488
|
+
**Pregel vs Other Approaches**:
|
|
36
489
|
|
|
37
|
-
|
|
490
|
+
| Approach | Pros | Cons | When to Use |
|
|
491
|
+
|----------|------|------|-------------|
|
|
492
|
+
| **Pregel (BSP)** | Simple model, automatic parallelism | Barrier overhead | Iterative algorithms |
|
|
493
|
+
| **GraphX (Spark)** | Mature ecosystem | Requires Spark cluster | Already using Spark |
|
|
494
|
+
| **Native (rust-kgdb)** | Zero dependencies, fastest | Less mature | Production deployment |
|
|
495
|
+
| **MapReduce** | Fault tolerant | High latency | Batch processing |
|
|
496
|
+
|
|
497
|
+
**Algorithms Built on Pregel in rust-kgdb**:
|
|
498
|
+
|
|
499
|
+
| Algorithm | Supersteps | Message Type | Use Case |
|
|
500
|
+
|-----------|------------|--------------|----------|
|
|
501
|
+
| **Shortest Paths** | O(diameter) | (vertex, distance) | Route finding |
|
|
502
|
+
| **PageRank** | 20 (typical) | (vertex, rank contribution) | Influence ranking |
|
|
503
|
+
| **Connected Components** | O(diameter) | (vertex, component_id) | Cluster detection |
|
|
504
|
+
| **Label Propagation** | O(log n) | (vertex, label) | Community detection |
|
|
505
|
+
|
|
506
|
+
---
|
|
507
|
+
|
|
508
|
+
**GraphFrame Example - Degrees & Analytics**:
|
|
509
|
+
```javascript
|
|
510
|
+
const { GraphFrame, friendsGraph } = require('rust-kgdb')
|
|
511
|
+
|
|
512
|
+
// Create graph from vertices and edges
|
|
513
|
+
const graph = new GraphFrame(
|
|
514
|
+
JSON.stringify([
|
|
515
|
+
{ id: 'alice' }, { id: 'bob' }, { id: 'charlie' }, { id: 'david' }
|
|
516
|
+
]),
|
|
517
|
+
JSON.stringify([
|
|
518
|
+
{ src: 'alice', dst: 'bob' },
|
|
519
|
+
{ src: 'alice', dst: 'charlie' },
|
|
520
|
+
{ src: 'bob', dst: 'charlie' },
|
|
521
|
+
{ src: 'charlie', dst: 'david' }
|
|
522
|
+
])
|
|
523
|
+
)
|
|
524
|
+
|
|
525
|
+
// Degree analysis
|
|
526
|
+
const degrees = JSON.parse(graph.degrees())
|
|
527
|
+
console.log('Degrees:', degrees)
|
|
528
|
+
// Output: { alice: { in: 0, out: 2 }, bob: { in: 1, out: 1 }, charlie: { in: 2, out: 1 }, david: { in: 1, out: 0 } }
|
|
529
|
+
|
|
530
|
+
// PageRank (fraud detection: who has most influence?)
|
|
531
|
+
const pagerank = JSON.parse(graph.pageRank(0.85, 20))
|
|
532
|
+
console.log('PageRank:', pagerank)
|
|
533
|
+
// Output: { alice: 0.15, bob: 0.21, charlie: 0.38, david: 0.26 }
|
|
534
|
+
|
|
535
|
+
// Triangle count (graph density)
|
|
536
|
+
console.log('Triangles:', graph.triangleCount()) // 1
|
|
537
|
+
|
|
538
|
+
// Motif finding (pattern matching)
|
|
539
|
+
const patterns = JSON.parse(graph.find('(a)-[e1]->(b); (b)-[e2]->(c)'))
|
|
540
|
+
console.log('Chain patterns:', patterns)
|
|
541
|
+
// Finds: alice→bob→charlie, bob→charlie→david
|
|
542
|
+
```
|
|
543
|
+
|
|
544
|
+
### Query Optimizations
|
|
545
|
+
|
|
546
|
+
**WCOJ (Worst-Case Optimal Join)**:
|
|
547
|
+
```
|
|
548
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
549
|
+
│ WCOJ vs TRADITIONAL JOIN │
|
|
550
|
+
│ │
|
|
551
|
+
│ Query: Find triangles (a)→(b)→(c)→(a) │
|
|
552
|
+
│ │
|
|
553
|
+
│ TRADITIONAL (Hash Join): WCOJ (Leapfrog Triejoin): │
|
|
554
|
+
│ ┌─────────────────────────┐ ┌─────────────────────────┐ │
|
|
555
|
+
│ │ Step 1: Join(E1, E2) │ │ Intersect iterators │ │
|
|
556
|
+
│ │ O(n²) worst │ │ on sorted indexes │ │
|
|
557
|
+
│ │ Step 2: Join(result, E3)│ │ │ │
|
|
558
|
+
│ │ O(n²) worst │ │ O(n^(w/2)) guaranteed │ │
|
|
559
|
+
│ │ │ │ w = fractional edge │ │
|
|
560
|
+
│ │ Total: O(n⁴) possible │ │ cover number │ │
|
|
561
|
+
│ └─────────────────────────┘ └─────────────────────────┘ │
|
|
562
|
+
│ │
|
|
563
|
+
│ For cyclic queries (fraud rings!), WCOJ is exponentially faster │
|
|
564
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
565
|
+
```
|
|
566
|
+
|
|
567
|
+
**Sparse Matrix Representations** (for Datalog reasoning):
|
|
568
|
+
|
|
569
|
+
| Format | Structure | Best For |
|
|
570
|
+
|--------|-----------|----------|
|
|
571
|
+
| **CSR** (Compressed Sparse Row) | Row pointers + column indices | Forward traversal (S→P→O) |
|
|
572
|
+
| **CSC** (Compressed Sparse Column) | Column pointers + row indices | Backward traversal (O→P→S) |
|
|
573
|
+
| **COO** (Coordinate) | (row, col, val) tuples | Incremental updates |
|
|
574
|
+
|
|
575
|
+
**Semi-Naive Datalog Evaluation**:
|
|
576
|
+
```
|
|
577
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
578
|
+
│ SEMI-NAIVE OPTIMIZATION │
|
|
579
|
+
│ │
|
|
580
|
+
│ Naive: Each iteration re-evaluates ALL rules on ALL facts │
|
|
581
|
+
│ Semi-Naive: Only evaluate rules on NEW facts from previous iteration │
|
|
582
|
+
│ │
|
|
583
|
+
│ Iteration 1: Δ¹ = immediate consequences of base facts │
|
|
584
|
+
│ Iteration 2: Δ² = rules applied to Δ¹ only (not base facts again) │
|
|
585
|
+
│ ... │
|
|
586
|
+
│ Fixpoint: When Δⁿ = ∅ │
|
|
587
|
+
│ │
|
|
588
|
+
│ Speedup: O(n) → O(Δ) per iteration │
|
|
589
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
590
|
+
```
|
|
591
|
+
|
|
592
|
+
**Index Structures**:
|
|
593
|
+
|
|
594
|
+
| Index | Pattern | Lookup Time |
|
|
595
|
+
|-------|---------|-------------|
|
|
596
|
+
| **SPOC** | Subject-Predicate-Object-Context | O(1) exact match |
|
|
597
|
+
| **POCS** | Predicate-Object-Context-Subject | O(1) reverse lookup |
|
|
598
|
+
| **OCSP** | Object-Context-Subject-Predicate | O(1) object queries |
|
|
599
|
+
| **CSPO** | Context-Subject-Predicate-Object | O(1) named graph queries |
|
|
600
|
+
|
|
601
|
+
### Distributed GraphDB Cluster (v0.2.0)
|
|
602
|
+
|
|
603
|
+
Production-ready distributed architecture for billion-triple scale:
|
|
604
|
+
|
|
605
|
+
```
|
|
606
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
607
|
+
│ DISTRIBUTED CLUSTER ARCHITECTURE │
|
|
608
|
+
│ │
|
|
609
|
+
│ ┌─────────────────────────────────────────────────────────────────────────┐│
|
|
610
|
+
│ │ COORDINATOR NODE ││
|
|
611
|
+
│ │ - Query routing & optimization ││
|
|
612
|
+
│ │ - HDRF partition assignment ││
|
|
613
|
+
│ │ - Result aggregation ││
|
|
614
|
+
│ │ - Raft consensus leader ││
|
|
615
|
+
│ └──────────────────────────────┬──────────────────────────────────────────┘│
|
|
616
|
+
│ │ gRPC │
|
|
617
|
+
│ ┌──────────────────────┼──────────────────────┐ │
|
|
618
|
+
│ │ │ │ │
|
|
619
|
+
│ ▼ ▼ ▼ │
|
|
620
|
+
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
|
621
|
+
│ │ EXECUTOR 0 │ │ EXECUTOR 1 │ │ EXECUTOR 2 │ │
|
|
622
|
+
│ │ │ │ │ │ │ │
|
|
623
|
+
│ │ Partition 0 │ │ Partition 1 │ │ Partition 2 │ │
|
|
624
|
+
│ │ Partition 3 │ │ Partition 4 │ │ Partition 5 │ │
|
|
625
|
+
│ │ │ │ │ │ │ │
|
|
626
|
+
│ │ RocksDB/LMDB │ │ RocksDB/LMDB │ │ RocksDB/LMDB │ │
|
|
627
|
+
│ └──────────────┘ └──────────────┘ └──────────────┘ │
|
|
628
|
+
│ │
|
|
629
|
+
│ HDRF Partitioning: High-degree vertices replicated for load balancing │
|
|
630
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
631
|
+
```
|
|
632
|
+
|
|
633
|
+
**HDRF (High-Degree-Replicated-First) Partitioning**:
|
|
634
|
+
- Streaming edge partitioner - O(1) assignment decisions
|
|
635
|
+
- High-degree vertices (hubs) replicated across partitions
|
|
636
|
+
- Minimizes cross-partition communication
|
|
637
|
+
- Subject-anchored: all triples for a subject on same partition
|
|
638
|
+
|
|
639
|
+
**Deployment** (Kubernetes):
|
|
640
|
+
```bash
|
|
641
|
+
# Deploy cluster via Helm
|
|
642
|
+
helm install rust-kgdb ./infra/helm -n rust-kgdb --create-namespace
|
|
643
|
+
|
|
644
|
+
# Scale executors
|
|
645
|
+
kubectl scale deployment rust-kgdb-executor --replicas=5 -n rust-kgdb
|
|
646
|
+
```
|
|
647
|
+
|
|
648
|
+
**Storage Backends**:
|
|
649
|
+
| Backend | Persistence | Use Case |
|
|
650
|
+
|---------|-------------|----------|
|
|
651
|
+
| **InMemory** | None | Development, testing |
|
|
652
|
+
| **RocksDB** | LSM-tree | Write-heavy workloads |
|
|
653
|
+
| **LMDB** | B+tree, mmap | Read-heavy workloads |
|
|
38
654
|
|
|
39
655
|
### Distributed Cluster (v0.2.0)
|
|
40
656
|
- **HDRF Partitioning** - High-Degree-Replicated-First streaming partitioner
|
|
@@ -48,9 +664,367 @@ GraphFrames bridges this gap: your data stays in RDF, but analytics run on Apach
|
|
|
48
664
|
- **Multiple Providers** - OpenAI, Ollama, Anthropic, or custom
|
|
49
665
|
|
|
50
666
|
### Reasoning
|
|
51
|
-
- **Datalog** - Semi-naive rule evaluation with stratified negation
|
|
667
|
+
- **Datalog** - Semi-naive rule evaluation with stratified negation (distributed-ready)
|
|
52
668
|
- **HyperMindAgent** - Pattern-based intent classification (no LLM calls)
|
|
53
669
|
|
|
670
|
+
---
|
|
671
|
+
|
|
672
|
+
## Deep Dive: Motif Pattern Matching
|
|
673
|
+
|
|
674
|
+
**What is Motif Finding?**
|
|
675
|
+
|
|
676
|
+
Motif finding is a **graph pattern search** that finds all subgraphs matching a specified pattern. Unlike SPARQL which matches RDF triple patterns, Motif uses a more intuitive DSL designed for relationship analysis.
|
|
677
|
+
|
|
678
|
+
```
|
|
679
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
680
|
+
│ MOTIF vs SPARQL: WHEN TO USE EACH │
|
|
681
|
+
│ │
|
|
682
|
+
│ SPARQL (RDF Triple Patterns): MOTIF (Graph Pattern DSL): │
|
|
683
|
+
│ ┌─────────────────────────────┐ ┌─────────────────────────────┐ │
|
|
684
|
+
│ │ SELECT ?a ?b ?c WHERE { │ │ "(a)-[e1]->(b); (b)-[e2]->(c)" │
|
|
685
|
+
│ │ ?a :knows ?b . │ │ │ │
|
|
686
|
+
│ │ ?b :knows ?c . │ │ More readable for complex │ │
|
|
687
|
+
│ │ } │ │ multi-hop patterns │ │
|
|
688
|
+
│ └─────────────────────────────┘ └─────────────────────────────┘ │
|
|
689
|
+
│ │
|
|
690
|
+
│ SPARQL is better for: MOTIF is better for: │
|
|
691
|
+
│ • RDF data with named predicates • Relationship chains │
|
|
692
|
+
│ • FILTER expressions • Cyclic patterns (fraud rings) │
|
|
693
|
+
│ • OPTIONAL patterns • Subgraph matching │
|
|
694
|
+
│ • Aggregation (COUNT, GROUP BY) • Visual pattern specification │
|
|
695
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
696
|
+
```
|
|
697
|
+
|
|
698
|
+
**Motif Pattern Syntax**:
|
|
699
|
+
|
|
700
|
+
| Pattern | Meaning | Example Match |
|
|
701
|
+
|---------|---------|---------------|
|
|
702
|
+
| `(a)-[e]->(b)` | a has edge e to b | alice→bob |
|
|
703
|
+
| `(a)-[e1]->(b); (b)-[e2]->(c)` | Chain: a→b→c | alice→bob→charlie |
|
|
704
|
+
| `(a)-[e1]->(b); (a)-[e2]->(c)` | Fork: a→b and a→c | alice→bob, alice→charlie |
|
|
705
|
+
| `(a)-[e1]->(b); (b)-[e2]->(a)` | **Cycle**: a→b→a | Mutual relationship (fraud ring) |
|
|
706
|
+
| `(a)-[e1]->(b); (b)-[e2]->(c); (c)-[e3]->(a)` | **Triangle** | Classic fraud pattern |
|
|
707
|
+
|
|
708
|
+
**Fraud Ring Detection with Motif**:
|
|
709
|
+
|
|
710
|
+
```javascript
|
|
711
|
+
const { GraphFrame } = require('rust-kgdb')
|
|
712
|
+
|
|
713
|
+
// Build transaction graph
|
|
714
|
+
const txGraph = new GraphFrame(
|
|
715
|
+
JSON.stringify([
|
|
716
|
+
{ id: 'account_A' }, { id: 'account_B' },
|
|
717
|
+
{ id: 'account_C' }, { id: 'account_D' }
|
|
718
|
+
]),
|
|
719
|
+
JSON.stringify([
|
|
720
|
+
{ src: 'account_A', dst: 'account_B', relationship: 'transfer', amount: 50000 },
|
|
721
|
+
{ src: 'account_B', dst: 'account_C', relationship: 'transfer', amount: 49500 },
|
|
722
|
+
{ src: 'account_C', dst: 'account_A', relationship: 'transfer', amount: 49000 }, // CYCLE!
|
|
723
|
+
{ src: 'account_D', dst: 'account_A', relationship: 'transfer', amount: 1000 } // Normal
|
|
724
|
+
])
|
|
725
|
+
)
|
|
726
|
+
|
|
727
|
+
// Find triangular money flows (classic money laundering pattern)
|
|
728
|
+
const triangles = txGraph.find('(a)-[e1]->(b); (b)-[e2]->(c); (c)-[e3]->(a)')
|
|
729
|
+
console.log('Suspicious triangles:', JSON.parse(triangles))
|
|
730
|
+
// Output: [{ a: 'account_A', b: 'account_B', c: 'account_C', ... }]
|
|
731
|
+
|
|
732
|
+
// Find chains of 3+ hops (structuring detection)
|
|
733
|
+
const chains = txGraph.find('(a)-[e1]->(b); (b)-[e2]->(c); (c)-[e3]->(d)')
|
|
734
|
+
console.log('Long chains:', JSON.parse(chains))
|
|
735
|
+
```
|
|
736
|
+
|
|
737
|
+
**Performance Characteristics**:
|
|
738
|
+
|
|
739
|
+
| Pattern Type | Complexity | Notes |
|
|
740
|
+
|--------------|------------|-------|
|
|
741
|
+
| Simple edge `(a)->(b)` | O(E) | Linear scan |
|
|
742
|
+
| 2-hop chain `(a)->(b)->(c)` | O(E × avg_degree) | Index-assisted |
|
|
743
|
+
| Triangle `(a)->(b)->(c)->(a)` | O(E^1.5) | WCOJ optimization |
|
|
744
|
+
| 4-clique | O(E²) worst | Uses worst-case optimal joins |
|
|
745
|
+
|
|
746
|
+
---
|
|
747
|
+
|
|
748
|
+
## Deep Dive: Datalog Rule Engine
|
|
749
|
+
|
|
750
|
+
**What is Datalog?**
|
|
751
|
+
|
|
752
|
+
Datalog is a **declarative logic programming language** for expressing recursive queries. Unlike SPARQL which can only match patterns, Datalog can **derive new facts** from existing facts using rules.
|
|
753
|
+
|
|
754
|
+
```
|
|
755
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
756
|
+
│ DATALOG: RULE-BASED REASONING │
|
|
757
|
+
│ │
|
|
758
|
+
│ FACTS (What we know): │
|
|
759
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
760
|
+
│ │ parent(alice, bob). % Alice is parent of Bob │ │
|
|
761
|
+
│ │ parent(bob, charlie). % Bob is parent of Charlie │ │
|
|
762
|
+
│ │ parent(charlie, diana). % Charlie is parent of Diana │ │
|
|
763
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
764
|
+
│ │
|
|
765
|
+
│ RULES (How to derive new facts): │
|
|
766
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
767
|
+
│ │ ancestor(X, Y) :- parent(X, Y). % Direct parent │ │
|
|
768
|
+
│ │ ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y). % Recursive! │ │
|
|
769
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
770
|
+
│ │
|
|
771
|
+
│ DERIVED FACTS (Automatically computed): │
|
|
772
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
773
|
+
│ │ ancestor(alice, bob). % From rule 1 │ │
|
|
774
|
+
│ │ ancestor(bob, charlie). % From rule 1 │ │
|
|
775
|
+
│ │ ancestor(alice, charlie). % From rule 2: alice→bob→charlie │ │
|
|
776
|
+
│ │ ancestor(alice, diana). % From rule 2: alice→bob→charlie→diana │ │
|
|
777
|
+
│ │ ancestor(bob, diana). % From rule 2: bob→charlie→diana │ │
|
|
778
|
+
│ │ ancestor(charlie, diana). % From rule 1 │ │
|
|
779
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
780
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
781
|
+
```
|
|
782
|
+
|
|
783
|
+
### Semi-Naive Evaluation (Performance Optimization)
|
|
784
|
+
|
|
785
|
+
**What is Semi-Naive?**
|
|
786
|
+
|
|
787
|
+
When evaluating recursive rules, the naive approach re-evaluates ALL rules on ALL facts every iteration. Semi-naive only evaluates rules on **newly derived facts** from the previous iteration.
|
|
788
|
+
|
|
789
|
+
```
|
|
790
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
791
|
+
│ NAIVE vs SEMI-NAIVE EVALUATION │
|
|
792
|
+
│ │
|
|
793
|
+
│ Rule: ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y). │
|
|
794
|
+
│ Base: 3 parent facts │
|
|
795
|
+
│ │
|
|
796
|
+
│ NAIVE APPROACH: SEMI-NAIVE APPROACH: │
|
|
797
|
+
│ ┌─────────────────────────┐ ┌─────────────────────────┐ │
|
|
798
|
+
│ │ Iter 1: 3×3 = 9 checks │ │ Iter 1: 3 new ancestors │ │
|
|
799
|
+
│ │ Iter 2: 6×6 = 36 checks │ │ Iter 2: only check Δ¹ │ │
|
|
800
|
+
│ │ Iter 3: 9×9 = 81 checks │ │ Iter 3: only check Δ² │ │
|
|
801
|
+
│ │ ...exponential growth │ │ ...linear in new facts │ │
|
|
802
|
+
│ └─────────────────────────┘ └─────────────────────────┘ │
|
|
803
|
+
│ │
|
|
804
|
+
│ Mathematical notation: │
|
|
805
|
+
│ Δⁿ = facts derived in iteration n │
|
|
806
|
+
│ Semi-naive: only join base facts with Δⁿ⁻¹ (not entire fact set) │
|
|
807
|
+
│ │
|
|
808
|
+
│ Speedup: O(n²) → O(n × Δ) where Δ << n │
|
|
809
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
810
|
+
```
|
|
811
|
+
|
|
812
|
+
### Stratified Negation (Safe Negation in Rules)
|
|
813
|
+
|
|
814
|
+
**What is Stratified Negation?**
|
|
815
|
+
|
|
816
|
+
Negation in Datalog is tricky: `not fraud(X)` means "X is not proven to be fraud". But what if the rule deriving `fraud(X)` hasn't run yet? Stratification solves this by:
|
|
817
|
+
|
|
818
|
+
1. **Ordering rules into strata** - Rules with negation run AFTER the rules they negate
|
|
819
|
+
2. **Computing each stratum to fixpoint** - Before moving to the next
|
|
820
|
+
|
|
821
|
+
```
|
|
822
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
823
|
+
│ STRATIFIED NEGATION │
|
|
824
|
+
│ │
|
|
825
|
+
│ Problem: When can we evaluate "not fraud(X)"? │
|
|
826
|
+
│ │
|
|
827
|
+
│ UNSTRATIFIED (WRONG): │
|
|
828
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
829
|
+
│ │ safe(X) :- claim(X), not fraud(X). % Safe if not fraud │ │
|
|
830
|
+
│ │ fraud(X) :- claim(X), high_amount(X).% Fraud if high amount │ │
|
|
831
|
+
│ │ │ │
|
|
832
|
+
│ │ If we evaluate safe(X) before fraud(X) is computed, │ │
|
|
833
|
+
│ │ we get WRONG results (everything looks safe!) │ │
|
|
834
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
835
|
+
│ │
|
|
836
|
+
│ STRATIFIED (CORRECT): │
|
|
837
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
838
|
+
│ │ STRATUM 1: Compute all positive facts │ │
|
|
839
|
+
│ │ fraud(X) :- claim(X), high_amount(X). ← Run first! │ │
|
|
840
|
+
│ │ │ │
|
|
841
|
+
│ │ STRATUM 2: Now negation is safe │ │
|
|
842
|
+
│ │ safe(X) :- claim(X), not fraud(X). ← Run after stratum 1 │ │
|
|
843
|
+
│ │ │ │
|
|
844
|
+
│ │ Dependency graph: safe depends on NOT fraud, so fraud must be │ │
|
|
845
|
+
│ │ fully computed before safe can be evaluated. │ │
|
|
846
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
847
|
+
│ │
|
|
848
|
+
│ rust-kgdb automatically stratifies your rules! │
|
|
849
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
850
|
+
```
|
|
851
|
+
|
|
852
|
+
### Datalog in Distributed Mode
|
|
853
|
+
|
|
854
|
+
**Distributed Datalog Execution**: rust-kgdb's Datalog engine works in distributed clusters:
|
|
855
|
+
|
|
856
|
+
```
|
|
857
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
858
|
+
│ DISTRIBUTED DATALOG EXECUTION │
|
|
859
|
+
│ │
|
|
860
|
+
│ COORDINATOR │
|
|
861
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
862
|
+
│ │ 1. Parse Datalog program │ │
|
|
863
|
+
│ │ 2. Stratify rules (compute dependency order) │ │
|
|
864
|
+
│ │ 3. For each stratum: │ │
|
|
865
|
+
│ │ a. Broadcast rules to all executors │ │
|
|
866
|
+
│ │ b. Each executor evaluates on local partition │ │
|
|
867
|
+
│ │ c. Exchange facts at partition boundaries (shuffle) │ │
|
|
868
|
+
│ │ d. Repeat until global fixpoint │ │
|
|
869
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
870
|
+
│ │ │
|
|
871
|
+
│ ┌───────────────┼───────────────┐ │
|
|
872
|
+
│ ▼ ▼ ▼ │
|
|
873
|
+
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
|
|
874
|
+
│ │ EXECUTOR 0 │ │ EXECUTOR 1 │ │ EXECUTOR 2 │ │
|
|
875
|
+
│ │ │ │ │ │ │ │
|
|
876
|
+
│ │ Local facts │ │ Local facts │ │ Local facts │ │
|
|
877
|
+
│ │ + Rules │ │ + Rules │ │ + Rules │ │
|
|
878
|
+
│ │ = Local Δ │ │ = Local Δ │ │ = Local Δ │ │
|
|
879
|
+
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
|
|
880
|
+
│ │ │ │ │
|
|
881
|
+
│ └───────────────┼───────────────┘ │
|
|
882
|
+
│ ▼ │
|
|
883
|
+
│ FACT EXCHANGE │
|
|
884
|
+
│ (hash-partitioned shuffle) │
|
|
885
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
886
|
+
```
|
|
887
|
+
|
|
888
|
+
**Complete Datalog Example**:
|
|
889
|
+
|
|
890
|
+
```javascript
|
|
891
|
+
const { DatalogProgram, evaluateDatalog, queryDatalog } = require('rust-kgdb')
|
|
892
|
+
|
|
893
|
+
const program = new DatalogProgram()
|
|
894
|
+
|
|
895
|
+
// Add base facts (from your knowledge graph)
|
|
896
|
+
program.addFact(JSON.stringify({ predicate: 'claim', terms: ['CLM001'] }))
|
|
897
|
+
program.addFact(JSON.stringify({ predicate: 'claim', terms: ['CLM002'] }))
|
|
898
|
+
program.addFact(JSON.stringify({ predicate: 'claim', terms: ['CLM003'] }))
|
|
899
|
+
program.addFact(JSON.stringify({ predicate: 'amount', terms: ['CLM001', '150000'] }))
|
|
900
|
+
program.addFact(JSON.stringify({ predicate: 'amount', terms: ['CLM002', '500'] }))
|
|
901
|
+
program.addFact(JSON.stringify({ predicate: 'amount', terms: ['CLM003', '200000'] }))
|
|
902
|
+
program.addFact(JSON.stringify({ predicate: 'provider', terms: ['CLM001', 'PROV_A'] }))
|
|
903
|
+
program.addFact(JSON.stringify({ predicate: 'provider', terms: ['CLM003', 'PROV_A'] }))
|
|
904
|
+
|
|
905
|
+
// Define rules (NICB fraud patterns)
|
|
906
|
+
// Rule 1: High amount claims (> $100,000) are suspicious
|
|
907
|
+
program.addRule(JSON.stringify({
|
|
908
|
+
head: { predicate: 'high_amount', terms: ['?C'] },
|
|
909
|
+
body: [
|
|
910
|
+
{ predicate: 'claim', terms: ['?C'] },
|
|
911
|
+
{ predicate: 'amount', terms: ['?C', '?A'] },
|
|
912
|
+
{ predicate: 'gt', terms: ['?A', '100000'] } // Built-in comparison
|
|
913
|
+
]
|
|
914
|
+
}))
|
|
915
|
+
|
|
916
|
+
// Rule 2: Providers with multiple high-amount claims need investigation
|
|
917
|
+
program.addRule(JSON.stringify({
|
|
918
|
+
head: { predicate: 'investigate_provider', terms: ['?P'] },
|
|
919
|
+
body: [
|
|
920
|
+
{ predicate: 'high_amount', terms: ['?C1'] },
|
|
921
|
+
{ predicate: 'high_amount', terms: ['?C2'] },
|
|
922
|
+
{ predicate: 'provider', terms: ['?C1', '?P'] },
|
|
923
|
+
{ predicate: 'provider', terms: ['?C2', '?P'] },
|
|
924
|
+
{ predicate: 'neq', terms: ['?C1', '?C2'] } // Different claims
|
|
925
|
+
]
|
|
926
|
+
}))
|
|
927
|
+
|
|
928
|
+
// Evaluate to fixpoint (semi-naive, stratified)
|
|
929
|
+
const allFacts = JSON.parse(evaluateDatalog(program))
|
|
930
|
+
console.log('Derived facts:', allFacts)
|
|
931
|
+
// Includes: high_amount(CLM001), high_amount(CLM003), investigate_provider(PROV_A)
|
|
932
|
+
|
|
933
|
+
// Query specific predicate
|
|
934
|
+
const toInvestigate = JSON.parse(queryDatalog(program, 'investigate_provider'))
|
|
935
|
+
console.log('Providers to investigate:', toInvestigate)
|
|
936
|
+
// Output: [{ predicate: 'investigate_provider', terms: ['PROV_A'] }]
|
|
937
|
+
```
|
|
938
|
+
|
|
939
|
+
---
|
|
940
|
+
|
|
941
|
+
## Deep Dive: ARCADE 1-Hop Cache
|
|
942
|
+
|
|
943
|
+
**What is ARCADE?**
|
|
944
|
+
|
|
945
|
+
ARCADE (Adaptive Retrieval Cache for Approximate Dense Embeddings) is a caching strategy that improves embedding retrieval by **preloading 1-hop neighbors** of frequently accessed entities.
|
|
946
|
+
|
|
947
|
+
```
|
|
948
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
949
|
+
│ ARCADE 1-HOP CACHE │
|
|
950
|
+
│ │
|
|
951
|
+
│ PROBLEM: Embedding lookups are expensive │
|
|
952
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
953
|
+
│ │ Query: "Find entities similar to Alice" │ │
|
|
954
|
+
│ │ Step 1: Get Alice's embedding → 2ms (disk/network) │ │
|
|
955
|
+
│ │ Step 2: HNSW search for neighbors → 5ms │ │
|
|
956
|
+
│ │ Step 3: Get Bob's embedding → 2ms (disk/network) │ │
|
|
957
|
+
│ │ Step 4: Get Charlie's embedding → 2ms (disk/network) │ │
|
|
958
|
+
│ │ Total: 11ms │ │
|
|
959
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
960
|
+
│ │
|
|
961
|
+
│ SOLUTION: Cache 1-hop neighbors proactively │
|
|
962
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
963
|
+
│ │ When Alice is accessed: │ │
|
|
964
|
+
│ │ 1. Load Alice's embedding │ │
|
|
965
|
+
│ │ 2. ALSO load embeddings of Alice's graph neighbors: │ │
|
|
966
|
+
│ │ - Bob (Alice knows Bob) │ │
|
|
967
|
+
│ │ - Company_X (Alice works at Company_X) │ │
|
|
968
|
+
│ │ - Project_Y (Alice contributes to Project_Y) │ │
|
|
969
|
+
│ │ │ │
|
|
970
|
+
│ │ Next query about Bob? Already in cache → 0ms │ │
|
|
971
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
972
|
+
│ │
|
|
973
|
+
│ WHY "1-HOP"? │
|
|
974
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
975
|
+
│ │ │ │
|
|
976
|
+
│ │ [Company_X]←────┐ │ │
|
|
977
|
+
│ │ │ │ │
|
|
978
|
+
│ │ [Project_Y]←──[ALICE]──→[Bob]──→[Charlie] │ │
|
|
979
|
+
│ │ ↑ │ │
|
|
980
|
+
│ │ │ │ │
|
|
981
|
+
│ │ 1-HOP NEIGHBORS 2-HOP (not cached) │ │
|
|
982
|
+
│ │ │ │
|
|
983
|
+
│ │ 1-hop = directly connected = high probability of access │ │
|
|
984
|
+
│ │ 2-hop = too many, cache would explode │ │
|
|
985
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
986
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
987
|
+
```
|
|
988
|
+
|
|
989
|
+
**Performance Impact**:
|
|
990
|
+
|
|
991
|
+
| Scenario | Without ARCADE | With ARCADE | Improvement |
|
|
992
|
+
|----------|---------------|-------------|-------------|
|
|
993
|
+
| Single entity lookup | 2ms | 2ms | Same |
|
|
994
|
+
| Entity + neighbors (5) | 12ms | 2ms | **6x faster** |
|
|
995
|
+
| Fraud ring traversal (10 entities) | 25ms | 4ms | **6x faster** |
|
|
996
|
+
| Cold start | N/A | +5ms initial | One-time cost |
|
|
997
|
+
|
|
998
|
+
**When ARCADE Helps**:
|
|
999
|
+
|
|
1000
|
+
| Use Case | Benefit | Why |
|
|
1001
|
+
|----------|---------|-----|
|
|
1002
|
+
| Fraud ring detection | High | Ring members are 1-hop connected |
|
|
1003
|
+
| Entity resolution | High | Similar entities share neighbors |
|
|
1004
|
+
| Recommendation | High | "Users like you" are 1-hop away |
|
|
1005
|
+
| Random lookups | Low | No locality to exploit |
|
|
1006
|
+
|
|
1007
|
+
```javascript
|
|
1008
|
+
const { EmbeddingService } = require('rust-kgdb')
|
|
1009
|
+
|
|
1010
|
+
// ARCADE is enabled by default
|
|
1011
|
+
const embeddings = new EmbeddingService({
|
|
1012
|
+
provider: 'openai',
|
|
1013
|
+
arcadeCache: {
|
|
1014
|
+
enabled: true,
|
|
1015
|
+
maxSize: 10000, // Cache up to 10K embeddings
|
|
1016
|
+
ttlSeconds: 300, // 5 minute TTL
|
|
1017
|
+
preloadDepth: 1 // 1-hop neighbors (default)
|
|
1018
|
+
}
|
|
1019
|
+
})
|
|
1020
|
+
|
|
1021
|
+
// First access: loads Alice + 1-hop neighbors
|
|
1022
|
+
const aliceEmbedding = await embeddings.get('http://example.org/Alice')
|
|
1023
|
+
|
|
1024
|
+
// Bob is Alice's neighbor: CACHE HIT (0ms instead of 2ms)
|
|
1025
|
+
const bobEmbedding = await embeddings.get('http://example.org/Bob')
|
|
1026
|
+
```
|
|
1027
|
+
|
|
54
1028
|
### Mathematical Foundations (HyperMind Framework)
|
|
55
1029
|
|
|
56
1030
|
The HyperMind agent framework is built on three mathematical pillars:
|
|
@@ -467,6 +1441,301 @@ console.log(proof.hash) // Deterministic hash for auditability
|
|
|
467
1441
|
- Type judgments: Γ ⊢ t : T (context proves term has type)
|
|
468
1442
|
- Curry-Howard correspondence for proof witnesses
|
|
469
1443
|
|
|
1444
|
+
### Automatic Schema Detection: Mathematical Foundations
|
|
1445
|
+
|
|
1446
|
+
When no schema is explicitly provided, HyperMind uses **Context Theory** (based on Spivak's Categorical approach to Databases and Ologs) to automatically discover the schema from your knowledge graph data.
|
|
1447
|
+
|
|
1448
|
+
```
|
|
1449
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
1450
|
+
│ MATHEMATICAL SCHEMA DETECTION │
|
|
1451
|
+
│ │
|
|
1452
|
+
│ STEP 1: Category Construction (Objects) │
|
|
1453
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1454
|
+
│ │ For every triple (s, rdf:type, C), add C to Objects │ │
|
|
1455
|
+
│ │ │ │
|
|
1456
|
+
│ │ Input triples: │ │
|
|
1457
|
+
│ │ :claim001 a :Claim . │ │
|
|
1458
|
+
│ │ :provider001 a :Provider . │ │
|
|
1459
|
+
│ │ │ │
|
|
1460
|
+
│ │ Discovered Objects (Classes): { Claim, Provider } │ │
|
|
1461
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1462
|
+
│ │
|
|
1463
|
+
│ STEP 2: Morphism Discovery (Properties) │
|
|
1464
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1465
|
+
│ │ For every triple (s, p, o) where p ≠ rdf:type: │ │
|
|
1466
|
+
│ │ - p becomes a morphism │ │
|
|
1467
|
+
│ │ - domain(p) = type(s) (inferred from rdf:type of subject) │ │
|
|
1468
|
+
│ │ - codomain(p) = type(o) (inferred from rdf:type or literal type)│ │
|
|
1469
|
+
│ │ │ │
|
|
1470
|
+
│ │ Input triples: │ │
|
|
1471
|
+
│ │ :claim001 :submittedBy :provider001 . │ │
|
|
1472
|
+
│ │ :claim001 :amount "50000"^^xsd:decimal . │ │
|
|
1473
|
+
│ │ │ │
|
|
1474
|
+
│ │ Discovered Morphisms: │ │
|
|
1475
|
+
│ │ submittedBy : Claim → Provider (object property) │ │
|
|
1476
|
+
│ │ amount : Claim → xsd:decimal (datatype property) │ │
|
|
1477
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1478
|
+
│ │
|
|
1479
|
+
│ STEP 3: Type Judgment Formation │
|
|
1480
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1481
|
+
│ │ Context Γ = { claim001 : Claim, provider001 : Provider } │ │
|
|
1482
|
+
│ │ │ │
|
|
1483
|
+
│ │ Type Judgment: Γ ⊢ submittedBy(claim001) : Provider │ │
|
|
1484
|
+
│ │ (Under context Γ, applying submittedBy to claim001 yields Provider)│ │
|
|
1485
|
+
│ │ │ │
|
|
1486
|
+
│ │ This forms the basis for SPARQL validation: │ │
|
|
1487
|
+
│ │ - If query uses ?claim :submittedBy ?x, we know ?x : Provider │ │
|
|
1488
|
+
│ │ - If query uses ?claim :unknownPred ?x → TYPE ERROR (not in Γ) │ │
|
|
1489
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1490
|
+
│ │
|
|
1491
|
+
│ RESULT: Schema as Category C │
|
|
1492
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1493
|
+
│ │ Objects: { Claim, Provider, xsd:decimal, xsd:string, ... } │ │
|
|
1494
|
+
│ │ Morphisms: { submittedBy, amount, name, riskScore, ... } │ │
|
|
1495
|
+
│ │ Composition: submittedBy ∘ name : Claim → xsd:string │ │
|
|
1496
|
+
│ │ (claim's provider's name) │ │
|
|
1497
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1498
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
1499
|
+
```
|
|
1500
|
+
|
|
1501
|
+
**Key Mathematical Concepts**:
|
|
1502
|
+
|
|
1503
|
+
| Concept | Mathematical Definition | In HyperMind |
|
|
1504
|
+
|---------|------------------------|--------------|
|
|
1505
|
+
| **Olog (Ontology Log)** | Category where objects are types, morphisms are functional relations | `SchemaContext` class |
|
|
1506
|
+
| **Functor** | Structure-preserving map between categories | SPARQL query as `Schema → Results` functor |
|
|
1507
|
+
| **Type Judgment** | Γ ⊢ t : T (context proves term has type) | Validates query variables against schema |
|
|
1508
|
+
| **Pullback** | Fiber product of two morphisms | JOIN operation in SPARQL |
|
|
1509
|
+
| **Curry-Howard** | Proofs = Programs, Types = Propositions | ProofDAG witnesses for audit |
|
|
1510
|
+
|
|
1511
|
+
**Why This Matters**:
|
|
1512
|
+
|
|
1513
|
+
1. **No Schema? No Problem**: HyperMind extracts schema from your data structure
|
|
1514
|
+
2. **Type-Safe Queries**: Invalid predicates caught at planning time, not runtime
|
|
1515
|
+
3. **LLM Grounding**: Schema injected into LLM prompts ensures valid SPARQL generation
|
|
1516
|
+
4. **Provenance**: Every inference traceable through the categorical structure
|
|
1517
|
+
|
|
1518
|
+
### Intelligence Control Plane: The Neuro-Symbolic Stack
|
|
1519
|
+
|
|
1520
|
+
HyperMind implements an **Intelligence Control Plane** - a formal architecture layer that governs how AI agents interact with knowledge, based on research from MIT (David Spivak's Categorical Databases) and Stanford (Pat Langley's Cognitive Architectures).
|
|
1521
|
+
|
|
1522
|
+
```
|
|
1523
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
1524
|
+
│ INTELLIGENCE CONTROL PLANE │
|
|
1525
|
+
│ (Neuro-Symbolic Integration Layer) │
|
|
1526
|
+
│ │
|
|
1527
|
+
│ Research Foundations: │
|
|
1528
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1529
|
+
│ │ • MIT - Spivak's "Category Theory for Databases" (2014) │ │
|
|
1530
|
+
│ │ • Stanford - Langley's Cognitive Systems Architecture │ │
|
|
1531
|
+
│ │ • CMU - Curry-Howard Correspondence for AI Verification │ │
|
|
1532
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1533
|
+
│ │
|
|
1534
|
+
│ LAYER 1: NEURAL PERCEPTION (LLM) │
|
|
1535
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1536
|
+
│ │ Input: "Find suspicious billing patterns for Provider P001" │ │
|
|
1537
|
+
│ │ Output: Intent classification + tool selection │ │
|
|
1538
|
+
│ │ Constraint: Schema-bounded generation (no hallucinated predicates) │ │
|
|
1539
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1540
|
+
│ │ │
|
|
1541
|
+
│ ▼ │
|
|
1542
|
+
│ LAYER 2: SYMBOLIC REASONING (SPARQL + Datalog) │
|
|
1543
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1544
|
+
│ │ Query Execution: SELECT ?claim WHERE { ?claim :provider :P001 } │ │
|
|
1545
|
+
│ │ Rule Application: fraud(?C) :- high_amount(?C), rapid_filing(?C) │ │
|
|
1546
|
+
│ │ Guarantee: Deterministic, reproducible, auditable │ │
|
|
1547
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1548
|
+
│ │ │
|
|
1549
|
+
│ ▼ │
|
|
1550
|
+
│ LAYER 3: PROOF SYNTHESIS (Curry-Howard) │
|
|
1551
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1552
|
+
│ │ ProofDAG: Every conclusion backed by derivation chain │ │
|
|
1553
|
+
│ │ │ │
|
|
1554
|
+
│ │ [CONCLUSION: P001 is suspicious] │ │
|
|
1555
|
+
│ │ │ │ │
|
|
1556
|
+
│ │ ┌─────────────┼─────────────┐ │ │
|
|
1557
|
+
│ │ │ │ │ │ │
|
|
1558
|
+
│ │ [SPARQL] [Datalog] [Embedding] │ │
|
|
1559
|
+
│ │ 47 claims fraud rule 0.87 similarity │ │
|
|
1560
|
+
│ │ matched matched to known fraud │ │
|
|
1561
|
+
│ │ │ │
|
|
1562
|
+
│ │ Hash: sha256:8f3a2b1c... (deterministic, verifiable) │ │
|
|
1563
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1564
|
+
│ │ │
|
|
1565
|
+
│ ▼ │
|
|
1566
|
+
│ OUTPUT: Verified Answer with Full Provenance │
|
|
1567
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1568
|
+
│ │ "Provider P001 is flagged for review. Evidence: │ │
|
|
1569
|
+
│ │ - 47 high-value claims in 30 days (SPARQL) │ │
|
|
1570
|
+
│ │ - Matches fraud pattern fraud_rapid_high (Datalog) │ │
|
|
1571
|
+
│ │ - 87% similar to 3 previously confirmed fraudulent providers │ │
|
|
1572
|
+
│ │ Proof hash: sha256:8f3a2b1c4d5e6f7a8b9c0d1e2f3a4b5c" │ │
|
|
1573
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1574
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
1575
|
+
```
|
|
1576
|
+
|
|
1577
|
+
**Why "Control Plane"?**
|
|
1578
|
+
|
|
1579
|
+
In networking, the **control plane** makes decisions about where traffic should go, while the **data plane** actually forwards the packets. Similarly:
|
|
1580
|
+
|
|
1581
|
+
| Concept | Networking | HyperMind |
|
|
1582
|
+
|---------|-----------|-----------|
|
|
1583
|
+
| **Control Plane** | Routing decisions | LLM planning + type validation + proof synthesis |
|
|
1584
|
+
| **Data Plane** | Packet forwarding | SPARQL execution + Datalog evaluation + embedding lookup |
|
|
1585
|
+
| **Policy** | ACLs, firewall rules | AgentScope, capabilities, fuel limits |
|
|
1586
|
+
| **Verification** | Routing table consistency | ProofDAG with Curry-Howard witnesses |
|
|
1587
|
+
|
|
1588
|
+
**The Curry-Howard Insight**:
|
|
1589
|
+
|
|
1590
|
+
The Curry-Howard correspondence states that **proofs are programs** and **types are propositions**. HyperMind applies this:
|
|
1591
|
+
|
|
1592
|
+
```
|
|
1593
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
1594
|
+
│ CURRY-HOWARD IN HYPERMIND │
|
|
1595
|
+
│ │
|
|
1596
|
+
│ PROPOSITION (Type): "Provider P001 has fraud indicators" │
|
|
1597
|
+
│ │
|
|
1598
|
+
│ PROOF (Program): ProofDAG with derivation steps │
|
|
1599
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1600
|
+
│ │ 1. sparql_result: 47 claims found │ │
|
|
1601
|
+
│ │ Γ ⊢ sparql("SELECT ?c WHERE {...}") : BindingSet │ │
|
|
1602
|
+
│ │ │ │
|
|
1603
|
+
│ │ 2. datalog_derivation: fraud rule matched │ │
|
|
1604
|
+
│ │ Γ, sparql_result ⊢ fraud(P001) : InferredFact │ │
|
|
1605
|
+
│ │ │ │
|
|
1606
|
+
│ │ 3. embedding_similarity: 0.87 match to known fraud │ │
|
|
1607
|
+
│ │ Γ ⊢ similar(P001, fraud_cluster) : SimilarityScore │ │
|
|
1608
|
+
│ │ │ │
|
|
1609
|
+
│ │ 4. conclusion: conjunction of evidence │ │
|
|
1610
|
+
│ │ Γ, (2), (3) ⊢ suspicious(P001) : FraudIndicator │ │
|
|
1611
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1612
|
+
│ │
|
|
1613
|
+
│ VERIFICATION: Given ProofDAG, anyone can: │
|
|
1614
|
+
│ 1. Re-execute each step │
|
|
1615
|
+
│ 2. Verify types match │
|
|
1616
|
+
│ 3. Confirm deterministic hash │
|
|
1617
|
+
│ 4. Audit the complete reasoning chain │
|
|
1618
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
1619
|
+
```
|
|
1620
|
+
|
|
1621
|
+
**ProofDAG Structure**:
|
|
1622
|
+
|
|
1623
|
+
```javascript
|
|
1624
|
+
const proof = {
|
|
1625
|
+
root: {
|
|
1626
|
+
id: 'conclusion',
|
|
1627
|
+
type: 'FraudIndicator',
|
|
1628
|
+
value: { provider: 'P001', riskScore: 0.91, confidence: 0.94 },
|
|
1629
|
+
derives_from: ['sparql_evidence', 'datalog_derivation', 'embedding_match']
|
|
1630
|
+
},
|
|
1631
|
+
nodes: [
|
|
1632
|
+
{
|
|
1633
|
+
id: 'sparql_evidence',
|
|
1634
|
+
tool: 'kg.sparql.query',
|
|
1635
|
+
input_type: 'Query',
|
|
1636
|
+
output_type: 'BindingSet',
|
|
1637
|
+
query: 'SELECT ?claim WHERE { ?claim :provider :P001 ; :amount ?a . FILTER(?a > 10000) }',
|
|
1638
|
+
result: { count: 47, time_ms: 2.3 }
|
|
1639
|
+
},
|
|
1640
|
+
{
|
|
1641
|
+
id: 'datalog_derivation',
|
|
1642
|
+
tool: 'kg.datalog.apply',
|
|
1643
|
+
input_type: 'RuleSet',
|
|
1644
|
+
output_type: 'InferredFacts',
|
|
1645
|
+
rule: 'fraud(?P) :- provider(?P), high_claim_count(?P), rapid_filing(?P)',
|
|
1646
|
+
result: { matched: true, bindings: { P: 'P001' } }
|
|
1647
|
+
},
|
|
1648
|
+
{
|
|
1649
|
+
id: 'embedding_match',
|
|
1650
|
+
tool: 'kg.embeddings.search',
|
|
1651
|
+
input_type: 'Entity',
|
|
1652
|
+
output_type: 'SimilarEntities',
|
|
1653
|
+
entity: 'P001',
|
|
1654
|
+
result: { similar: ['FRAUD_001', 'FRAUD_002', 'FRAUD_003'], score: 0.87 }
|
|
1655
|
+
}
|
|
1656
|
+
],
|
|
1657
|
+
hash: 'sha256:8f3a2b1c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b7c8d9e0f1a',
|
|
1658
|
+
timestamp: '2025-12-15T10:30:00Z'
|
|
1659
|
+
}
|
|
1660
|
+
|
|
1661
|
+
// Anyone can verify this proof independently
|
|
1662
|
+
const isValid = ProofDAG.verify(proof) // true if all derivations check out
|
|
1663
|
+
```
|
|
1664
|
+
|
|
1665
|
+
### Deterministic LLM Usage in Planner
|
|
1666
|
+
|
|
1667
|
+
The LLMPlanner makes LLM usage **deterministic** by constraining outputs to the schema category:
|
|
1668
|
+
|
|
1669
|
+
```
|
|
1670
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
1671
|
+
│ DETERMINISTIC LLM PLANNING │
|
|
1672
|
+
│ │
|
|
1673
|
+
│ PROBLEM: LLMs are inherently non-deterministic │
|
|
1674
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1675
|
+
│ │ Same prompt → Different outputs each time │ │
|
|
1676
|
+
│ │ "Find high-risk claims" → SELECT ?x WHERE {...} (run 1) │ │
|
|
1677
|
+
│ │ "Find high-risk claims" → SELECT ?claim WHERE {...} (run 2) │ │
|
|
1678
|
+
│ │ Different variable names! │ │
|
|
1679
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1680
|
+
│ │
|
|
1681
|
+
│ SOLUTION: Schema-constrained generation │
|
|
1682
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1683
|
+
│ │ 1. SCHEMA INJECTION: LLM receives exact predicates from schema │ │
|
|
1684
|
+
│ │ "Available predicates: submittedBy, amount, riskScore" │ │
|
|
1685
|
+
│ │ │ │
|
|
1686
|
+
│ │ 2. TEMPLATE ENFORCEMENT: Output must follow typed template │ │
|
|
1687
|
+
│ │ { │ │
|
|
1688
|
+
│ │ "tool": "kg.sparql.query", // From TOOL_REGISTRY │ │
|
|
1689
|
+
│ │ "query": "SELECT ...", // Must use schema predicates │ │
|
|
1690
|
+
│ │ "expected_type": "BindingSet" // From TypeId │ │
|
|
1691
|
+
│ │ } │ │
|
|
1692
|
+
│ │ │ │
|
|
1693
|
+
│ │ 3. VALIDATION: Generated SPARQL checked against schema category │ │
|
|
1694
|
+
│ │ - All predicates ∈ schema.morphisms? ✓ │ │
|
|
1695
|
+
│ │ - All types ∈ schema.objects? ✓ │ │
|
|
1696
|
+
│ │ - Variable bindings type-correct? ✓ │ │
|
|
1697
|
+
│ │ │ │
|
|
1698
|
+
│ │ 4. RETRY ON FAILURE: If validation fails, regenerate with hint │ │
|
|
1699
|
+
│ │ "Previous query used ':badPredicate' not in schema. Try again" │ │
|
|
1700
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1701
|
+
│ │
|
|
1702
|
+
│ RESULT: Same semantic query → Same valid SPARQL (modulo variable names) │
|
|
1703
|
+
│ │
|
|
1704
|
+
│ "Find high-risk claims" → Always generates: │
|
|
1705
|
+
│ SELECT ?claim WHERE { ?claim :riskScore ?score . FILTER(?score > 0.7) } │
|
|
1706
|
+
│ Because :riskScore is the ONLY risk-related predicate in schema │
|
|
1707
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
1708
|
+
```
|
|
1709
|
+
|
|
1710
|
+
**Determinism Guarantees**:
|
|
1711
|
+
|
|
1712
|
+
| Aspect | How Determinism is Achieved |
|
|
1713
|
+
|--------|---------------------------|
|
|
1714
|
+
| **Predicate Selection** | LLM can ONLY use predicates from extracted schema |
|
|
1715
|
+
| **Type Consistency** | Output types validated against TypeId registry |
|
|
1716
|
+
| **Tool Selection** | TOOL_REGISTRY defines exact tool signatures |
|
|
1717
|
+
| **Error Recovery** | Failed validations trigger constrained retry |
|
|
1718
|
+
| **Caching** | Identical queries return cached SPARQL (no re-generation) |
|
|
1719
|
+
|
|
1720
|
+
```javascript
|
|
1721
|
+
// Deterministic LLM Planning in action
|
|
1722
|
+
const planner = new LLMPlanner({
|
|
1723
|
+
model: 'gpt-4o',
|
|
1724
|
+
apiKey: process.env.OPENAI_API_KEY,
|
|
1725
|
+
schema: SchemaContext.fromKG(db), // Schema constrains LLM output
|
|
1726
|
+
temperature: 0, // Minimize randomness
|
|
1727
|
+
cacheTTL: 300000 // Cache results for 5 minutes
|
|
1728
|
+
})
|
|
1729
|
+
|
|
1730
|
+
// These produce identical SPARQL because schema only has one risk predicate
|
|
1731
|
+
const plan1 = await planner.plan('Find risky claims')
|
|
1732
|
+
const plan2 = await planner.plan('Show me dangerous claims')
|
|
1733
|
+
const plan3 = await planner.plan('Which claims are high-risk?')
|
|
1734
|
+
|
|
1735
|
+
// All three generate the same validated SPARQL
|
|
1736
|
+
console.log(plan1.sparql === plan2.sparql) // true (after normalization)
|
|
1737
|
+
```
|
|
1738
|
+
|
|
470
1739
|
### Bring Your Own Ontology (BYOO) - Enterprise Support
|
|
471
1740
|
|
|
472
1741
|
For organizations with existing ontology teams:
|
|
@@ -646,47 +1915,180 @@ const agent = new HyperMindAgent({
|
|
|
646
1915
|
longTermGraph: 'http://memory.hypermind.ai/' // Persistent memory
|
|
647
1916
|
}),
|
|
648
1917
|
|
|
649
|
-
// === LAYER 3: Scope ===
|
|
650
|
-
scope: new AgentScope({
|
|
651
|
-
allowedGraphs: ['http://insurance.org/'], // Graphs agent can access
|
|
652
|
-
allowedPredicates: null, // null = all predicates
|
|
653
|
-
maxResultSize: 10000 // Limit result set size
|
|
654
|
-
}),
|
|
1918
|
+
// === LAYER 3: Scope ===
|
|
1919
|
+
scope: new AgentScope({
|
|
1920
|
+
allowedGraphs: ['http://insurance.org/'], // Graphs agent can access
|
|
1921
|
+
allowedPredicates: null, // null = all predicates
|
|
1922
|
+
maxResultSize: 10000 // Limit result set size
|
|
1923
|
+
}),
|
|
1924
|
+
|
|
1925
|
+
// === LAYER 4: Embeddings ===
|
|
1926
|
+
embeddings: new EmbeddingService(), // For similarity search
|
|
1927
|
+
|
|
1928
|
+
// === LAYER 5: Security ===
|
|
1929
|
+
sandbox: {
|
|
1930
|
+
capabilities: ['ReadKG', 'ExecuteTool'], // No WriteKG = read-only
|
|
1931
|
+
fuelLimit: 1_000_000 // CPU budget
|
|
1932
|
+
},
|
|
1933
|
+
|
|
1934
|
+
// === LAYER 6: Identity & Session ===
|
|
1935
|
+
name: 'fraud-detector', // Persistent agent identity
|
|
1936
|
+
userId: 'user:alice@company.com', // User identity (for multi-tenant)
|
|
1937
|
+
sessionId: 'session:2025-12-15-001' // Session tracking
|
|
1938
|
+
})
|
|
1939
|
+
|
|
1940
|
+
// Wait for schema extraction to complete
|
|
1941
|
+
await db.waitForSchema()
|
|
1942
|
+
|
|
1943
|
+
// Natural language query - LLM uses schema for accurate SPARQL
|
|
1944
|
+
const result = await agent.call('Find all high-risk claims')
|
|
1945
|
+
|
|
1946
|
+
console.log('Answer:', result.answer)
|
|
1947
|
+
console.log('Tools Used:', result.explanation.tools_used)
|
|
1948
|
+
console.log('SPARQL Generated:', result.explanation.sparql_queries)
|
|
1949
|
+
console.log('Proof Hash:', result.proof?.hash)
|
|
1950
|
+
```
|
|
1951
|
+
|
|
1952
|
+
**Layer Defaults** (if not specified):
|
|
1953
|
+
|
|
1954
|
+
| Layer | Default Value |
|
|
1955
|
+
|-------|---------------|
|
|
1956
|
+
| Memory | Disabled (no session persistence) |
|
|
1957
|
+
| Scope | Unrestricted (all graphs, all predicates) |
|
|
1958
|
+
| Embeddings | Disabled (no similarity search) |
|
|
1959
|
+
| Sandbox | `['ReadKG', 'ExecuteTool']`, fuel: 1M |
|
|
1960
|
+
| LLM Model | None (demo mode with keyword matching) |
|
|
1961
|
+
| Identity | Auto-generated UUID, no user tracking |
|
|
1962
|
+
|
|
1963
|
+
### Session Management: User Identity & Agent Persistence
|
|
1964
|
+
|
|
1965
|
+
HyperMind provides **recognized and persisted** identities for multi-tenant, audit-compliant deployments:
|
|
1966
|
+
|
|
1967
|
+
```
|
|
1968
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
1969
|
+
│ SESSION & IDENTITY MODEL │
|
|
1970
|
+
│ │
|
|
1971
|
+
│ THREE IDENTITY LAYERS: │
|
|
1972
|
+
│ │
|
|
1973
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1974
|
+
│ │ 1. AGENT NAME (Persistent) │ │
|
|
1975
|
+
│ │ - Unique identifier for the agent type │ │
|
|
1976
|
+
│ │ - Persists across sessions, users, and restarts │ │
|
|
1977
|
+
│ │ - Example: 'fraud-detector', 'underwriter', 'claims-reviewer' │ │
|
|
1978
|
+
│ │ - Used for: Role-based access, audit trails, agent memory │ │
|
|
1979
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1980
|
+
│ │
|
|
1981
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1982
|
+
│ │ 2. USER ID (Multi-tenant) │ │
|
|
1983
|
+
│ │ - Identity of the human user invoking the agent │ │
|
|
1984
|
+
│ │ - Persisted in episodic memory for audit compliance │ │
|
|
1985
|
+
│ │ - Example: 'user:alice@company.com', 'user:claims-team' │ │
|
|
1986
|
+
│ │ - Used for: Access control, usage tracking, billing │ │
|
|
1987
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1988
|
+
│ │
|
|
1989
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
1990
|
+
│ │ 3. SESSION ID (Ephemeral) │ │
|
|
1991
|
+
│ │ - Unique identifier for a single conversation/interaction │ │
|
|
1992
|
+
│ │ - Links all operations within one user interaction │ │
|
|
1993
|
+
│ │ - Example: 'session:2025-12-15-001', auto-generated UUID │ │
|
|
1994
|
+
│ │ - Used for: Conversation context, working memory scope │ │
|
|
1995
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
1996
|
+
│ │
|
|
1997
|
+
│ PERSISTENCE MODEL: │
|
|
1998
|
+
│ │
|
|
1999
|
+
│ Agent Name ─────► Stored in KG: <agent:fraud-detector> a am:Agent . │
|
|
2000
|
+
│ User ID ─────► Stored in KG: <user:alice> a am:User . │
|
|
2001
|
+
│ Session ID ─────► Stored in KG: <session:001> a am:Session . │
|
|
2002
|
+
│ │
|
|
2003
|
+
│ Episode ─────────► Links all three: │
|
|
2004
|
+
│ <episode:123> am:performedBy <agent:fraud-detector> ; │
|
|
2005
|
+
│ am:requestedBy <user:alice> ; │
|
|
2006
|
+
│ am:inSession <session:001> . │
|
|
2007
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
2008
|
+
```
|
|
2009
|
+
|
|
2010
|
+
**Session Management Example**:
|
|
2011
|
+
|
|
2012
|
+
```javascript
|
|
2013
|
+
const { HyperMindAgent, MemoryManager } = require('rust-kgdb')
|
|
2014
|
+
|
|
2015
|
+
// Create agent with full identity configuration
|
|
2016
|
+
const agent = new HyperMindAgent({
|
|
2017
|
+
kg: db,
|
|
2018
|
+
|
|
2019
|
+
// Agent identity (persistent across all users/sessions)
|
|
2020
|
+
name: 'fraud-detector',
|
|
655
2021
|
|
|
656
|
-
//
|
|
657
|
-
|
|
2022
|
+
// User identity (for multi-tenant deployments)
|
|
2023
|
+
userId: 'user:alice@acme-insurance.com',
|
|
658
2024
|
|
|
659
|
-
//
|
|
660
|
-
|
|
661
|
-
capabilities: ['ReadKG', 'ExecuteTool'], // No WriteKG = read-only
|
|
662
|
-
fuelLimit: 1_000_000 // CPU budget
|
|
663
|
-
},
|
|
2025
|
+
// Session identity (for conversation tracking)
|
|
2026
|
+
sessionId: 'session:web-ui-2025-12-15-143022',
|
|
664
2027
|
|
|
665
|
-
//
|
|
666
|
-
|
|
2028
|
+
// Memory with persistence
|
|
2029
|
+
memory: new MemoryManager({
|
|
2030
|
+
workingMemorySize: 20, // In-session context
|
|
2031
|
+
episodicRetentionDays: 90, // 90-day retention for compliance
|
|
2032
|
+
longTermGraph: 'http://memory.acme-insurance.com/'
|
|
2033
|
+
})
|
|
667
2034
|
})
|
|
668
2035
|
|
|
669
|
-
//
|
|
670
|
-
await
|
|
2036
|
+
// First query in session
|
|
2037
|
+
await agent.call('Find claims over $100,000')
|
|
671
2038
|
|
|
672
|
-
//
|
|
673
|
-
|
|
2039
|
+
// Second query - agent remembers context from first query
|
|
2040
|
+
await agent.call('Now show me which of those are from Provider P001')
|
|
674
2041
|
|
|
675
|
-
|
|
676
|
-
|
|
677
|
-
|
|
678
|
-
|
|
2042
|
+
// Episodic memory stores the full conversation:
|
|
2043
|
+
// <episode:uuid-1> am:prompt "Find claims over $100,000" ;
|
|
2044
|
+
// am:performedBy <agent:fraud-detector> ;
|
|
2045
|
+
// am:requestedBy <user:alice@acme-insurance.com> ;
|
|
2046
|
+
// am:inSession <session:web-ui-2025-12-15-143022> ;
|
|
2047
|
+
// am:timestamp "2025-12-15T14:30:22Z" .
|
|
679
2048
|
```
|
|
680
2049
|
|
|
681
|
-
**
|
|
2050
|
+
**Identity Resolution**:
|
|
682
2051
|
|
|
683
|
-
|
|
|
684
|
-
|
|
685
|
-
|
|
|
686
|
-
|
|
|
687
|
-
|
|
|
688
|
-
|
|
689
|
-
|
|
2052
|
+
| Field | Format | Persistence | Use Case |
|
|
2053
|
+
|-------|--------|-------------|----------|
|
|
2054
|
+
| `name` | String | Permanent (KG) | Agent type identification |
|
|
2055
|
+
| `userId` | URI or String | Per-episode | Audit trails, multi-tenant isolation |
|
|
2056
|
+
| `sessionId` | UUID or String | Per-session | Conversation continuity |
|
|
2057
|
+
|
|
2058
|
+
**Cross-Session Memory Retrieval**:
|
|
2059
|
+
|
|
2060
|
+
```javascript
|
|
2061
|
+
// New session, same user - retrieve previous context
|
|
2062
|
+
const agent = new HyperMindAgent({
|
|
2063
|
+
kg: db,
|
|
2064
|
+
name: 'fraud-detector',
|
|
2065
|
+
userId: 'user:alice@acme-insurance.com',
|
|
2066
|
+
sessionId: 'session:web-ui-2025-12-16-091500', // New session
|
|
2067
|
+
memory: new MemoryManager({ episodicRetentionDays: 90 })
|
|
2068
|
+
})
|
|
2069
|
+
|
|
2070
|
+
// Agent can recall previous sessions for this user
|
|
2071
|
+
const previousInvestigations = await agent.memory.query(`
|
|
2072
|
+
SELECT ?prompt ?result ?timestamp WHERE {
|
|
2073
|
+
?episode am:requestedBy <user:alice@acme-insurance.com> ;
|
|
2074
|
+
am:prompt ?prompt ;
|
|
2075
|
+
am:result ?result ;
|
|
2076
|
+
am:timestamp ?timestamp .
|
|
2077
|
+
} ORDER BY DESC(?timestamp) LIMIT 10
|
|
2078
|
+
`)
|
|
2079
|
+
// Returns: Last 10 queries by Alice across all her sessions
|
|
2080
|
+
```
|
|
2081
|
+
|
|
2082
|
+
**Audit Compliance Features**:
|
|
2083
|
+
|
|
2084
|
+
| Requirement | How HyperMind Addresses It |
|
|
2085
|
+
|-------------|---------------------------|
|
|
2086
|
+
| Who ran the query? | `userId` persisted in every episode |
|
|
2087
|
+
| What agent was used? | `name` links to agent's capabilities |
|
|
2088
|
+
| When did it happen? | `am:timestamp` on every episode |
|
|
2089
|
+
| What was the result? | `am:result` with full execution trace |
|
|
2090
|
+
| Can we replay it? | ProofDAG enables deterministic replay |
|
|
2091
|
+
| Retention policy? | `episodicRetentionDays` enforces TTL |
|
|
690
2092
|
|
|
691
2093
|
### Schema-Aware Intent: Different Words → Same Result
|
|
692
2094
|
|
|
@@ -747,6 +2149,97 @@ Unlike black-box LLMs, HyperMind produces **deterministic, verifiable results**:
|
|
|
747
2149
|
- **Reproducibility**: Same query → same answer → same proof hash
|
|
748
2150
|
- **Compliance Ready**: Full provenance for regulatory requirements
|
|
749
2151
|
|
|
2152
|
+
### Comparison with Agentic Frameworks
|
|
2153
|
+
|
|
2154
|
+
How HyperMind differs from popular LLM orchestration frameworks:
|
|
2155
|
+
|
|
2156
|
+
| Feature | HyperMind | LangChain | DSPy | CrewAI | AutoGPT |
|
|
2157
|
+
|---------|-----------|-----------|------|--------|---------|
|
|
2158
|
+
| **Core Paradigm** | Neuro-Symbolic | Chain-of-Thought | Prompt Optimization | Multi-Agent Roles | Autonomous Loop |
|
|
2159
|
+
| **Prompt Optimization** | ✅ Schema injection | ❌ Manual templates | ✅ Compiled prompts | ❌ Role-based | ❌ Fixed prompts |
|
|
2160
|
+
| **Grounding Source** | Knowledge Graph | External retrievers | Training data | Tool calls | Web search |
|
|
2161
|
+
| **Verification** | ✅ ProofDAG | ❌ Trust LLM | ❌ Trust LLM | ❌ Trust LLM | ❌ Trust LLM |
|
|
2162
|
+
| **Determinism** | ✅ Same hash | ❌ Varies | ❌ Varies | ❌ Varies | ❌ Varies |
|
|
2163
|
+
| **Memory Model** | Temporal + LT KG | VectorDB | None | VectorDB | VectorDB |
|
|
2164
|
+
| **Security** | WASM OCAP | Trust-based | None | Trust-based | Trust-based |
|
|
2165
|
+
| **Type Safety** | ✅ Curry-Howard | ❌ Runtime | ❌ Runtime | ❌ Runtime | ❌ Runtime |
|
|
2166
|
+
|
|
2167
|
+
#### Prompt Optimization: Schema Injection vs. Others
|
|
2168
|
+
|
|
2169
|
+
**LangChain (Manual Prompts)**:
|
|
2170
|
+
```python
|
|
2171
|
+
# Developer writes prompts by hand - error-prone, doesn't know actual schema
|
|
2172
|
+
template = """Given this context: {context}
|
|
2173
|
+
Answer: {question}"""
|
|
2174
|
+
# Problem: Context is unstructured, LLM may hallucinate predicates
|
|
2175
|
+
```
|
|
2176
|
+
|
|
2177
|
+
**DSPy (Compiled Prompts)**:
|
|
2178
|
+
```python
|
|
2179
|
+
# Learns optimal prompts from training examples
|
|
2180
|
+
class FraudDetector(dspy.Signature):
|
|
2181
|
+
claim = dspy.InputField()
|
|
2182
|
+
is_fraud = dspy.OutputField()
|
|
2183
|
+
# Problem: Still no grounding - outputs are unverified predictions
|
|
2184
|
+
```
|
|
2185
|
+
|
|
2186
|
+
**HyperMind (Schema-Injected Prompts)**:
|
|
2187
|
+
```javascript
|
|
2188
|
+
// Automatic schema extraction + injection
|
|
2189
|
+
const schema = SchemaContext.fromKG(db)
|
|
2190
|
+
// schema = { classes: ['Claim', 'Provider'], predicates: ['amount', 'riskScore'] }
|
|
2191
|
+
|
|
2192
|
+
// LLM receives YOUR schema - can only use valid predicates
|
|
2193
|
+
// Prompt: "Generate SPARQL using ONLY: amount, riskScore, submittedBy"
|
|
2194
|
+
// Result: Valid SPARQL that executes against YOUR data
|
|
2195
|
+
```
|
|
2196
|
+
|
|
2197
|
+
**Why Schema Injection > Prompt Templates**:
|
|
2198
|
+
|
|
2199
|
+
| Approach | Hallucination Risk | Schema Drift | Verification |
|
|
2200
|
+
|----------|-------------------|--------------|--------------|
|
|
2201
|
+
| Manual templates | High | Not handled | None |
|
|
2202
|
+
| DSPy compiled | Medium | Not handled | None |
|
|
2203
|
+
| **HyperMind schema** | **Low** | **Auto-detected** | **ProofDAG** |
|
|
2204
|
+
|
|
2205
|
+
```
|
|
2206
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
2207
|
+
│ PROMPT OPTIMIZATION COMPARISON │
|
|
2208
|
+
│ │
|
|
2209
|
+
│ LANGCHAIN: HYPERMIND: │
|
|
2210
|
+
│ ┌──────────────────┐ ┌──────────────────┐ │
|
|
2211
|
+
│ │ Static Prompt │ │ Schema Extract │ ← Auto from KG │
|
|
2212
|
+
│ │ "Find fraud..." │ │ {classes, pred} │ │
|
|
2213
|
+
│ └────────┬─────────┘ └────────┬─────────┘ │
|
|
2214
|
+
│ │ │ │
|
|
2215
|
+
│ ▼ ▼ │
|
|
2216
|
+
│ ┌──────────────────┐ ┌──────────────────┐ │
|
|
2217
|
+
│ │ LLM │ │ LLM + Schema │ ← Constrained │
|
|
2218
|
+
│ │ (unconstrained) │ │ injection │ │
|
|
2219
|
+
│ └────────┬─────────┘ └────────┬─────────┘ │
|
|
2220
|
+
│ │ │ │
|
|
2221
|
+
│ ▼ ▼ │
|
|
2222
|
+
│ ┌──────────────────┐ ┌──────────────────┐ │
|
|
2223
|
+
│ │ "fraud in the │ │ SELECT ?claim │ ← Valid SPARQL │
|
|
2224
|
+
│ │ insurance..." │ │ WHERE {valid} │ │
|
|
2225
|
+
│ │ (unstructured) │ └────────┬─────────┘ │
|
|
2226
|
+
│ └──────────────────┘ │ │
|
|
2227
|
+
│ ▼ │
|
|
2228
|
+
│ ┌──────────────────┐ │
|
|
2229
|
+
│ │ Execute against │ ← Actual data │
|
|
2230
|
+
│ │ Knowledge Graph │ │
|
|
2231
|
+
│ └────────┬─────────┘ │
|
|
2232
|
+
│ │ │
|
|
2233
|
+
│ ▼ │
|
|
2234
|
+
│ ┌──────────────────┐ │
|
|
2235
|
+
│ │ ProofDAG │ ← Verifiable │
|
|
2236
|
+
│ │ hash: 0x8f3a... │ │
|
|
2237
|
+
│ └──────────────────┘ │
|
|
2238
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
2239
|
+
```
|
|
2240
|
+
|
|
2241
|
+
**Key Insight**: DSPy optimizes prompts for *output format*. HyperMind optimizes prompts for *semantic correctness* by grounding in your actual data schema.
|
|
2242
|
+
|
|
750
2243
|
### HyperMind as Intelligence Control Plane
|
|
751
2244
|
|
|
752
2245
|
HyperMind implements a **control plane architecture** for LLM agents, aligning with recent research on the "missing coordination layer" for AI systems (see [Chang 2025](https://arxiv.org/abs/2512.05765)).
|
|
@@ -1126,9 +2619,22 @@ node hypermind-benchmark.js
|
|
|
1126
2619
|
| **SPARQL 1.1 Query** | 100% | [W3C Rec](https://www.w3.org/TR/sparql11-query/) |
|
|
1127
2620
|
| **SPARQL 1.1 Update** | 100% | [W3C Rec](https://www.w3.org/TR/sparql11-update/) |
|
|
1128
2621
|
| **RDF 1.2** | 100% | [W3C Draft](https://www.w3.org/TR/rdf12-concepts/) |
|
|
2622
|
+
| **RDF-Star (RDF 1.2)** | 100% | [W3C Draft](https://www.w3.org/TR/rdf12-star/) |
|
|
2623
|
+
| **SPARQL-Star** | 100% | [W3C Draft](https://www.w3.org/TR/sparql12-query/#rdf-star) |
|
|
1129
2624
|
| **Turtle** | 100% | [W3C Rec](https://www.w3.org/TR/turtle/) |
|
|
2625
|
+
| **Turtle-Star** | 100% | [W3C Draft](https://www.w3.org/TR/rdf12-turtle/) |
|
|
1130
2626
|
| **N-Triples** | 100% | [W3C Rec](https://www.w3.org/TR/n-triples/) |
|
|
1131
2627
|
|
|
2628
|
+
### Standards Comparison with Other Systems
|
|
2629
|
+
|
|
2630
|
+
| Standard | rust-kgdb | Tentris | RDFox | Virtuoso | Blazegraph |
|
|
2631
|
+
|----------|-----------|---------|-------|----------|------------|
|
|
2632
|
+
| **SPARQL 1.1** | ✅ 100% | ✅ | ✅ | ✅ | ✅ |
|
|
2633
|
+
| **RDF-Star** | ✅ | ❌ | ✅ | ⚠️ Partial | ⚠️ RDR |
|
|
2634
|
+
| **SPARQL-Star** | ✅ | ❌ | ✅ | ⚠️ Partial | ⚠️ RDR |
|
|
2635
|
+
| **Native Hypergraph** | ✅ | ❌ | ❌ | ❌ | ❌ |
|
|
2636
|
+
| **64 Builtins** | ✅ | ~30 | ~40 | ~50 | ~45 |
|
|
2637
|
+
|
|
1132
2638
|
**64 SPARQL Builtin Functions** implemented:
|
|
1133
2639
|
- String: `STR`, `CONCAT`, `SUBSTR`, `STRLEN`, `REGEX`, `REPLACE`, etc.
|
|
1134
2640
|
- Numeric: `ABS`, `ROUND`, `CEIL`, `FLOOR`, `RAND`
|
|
@@ -1325,6 +2831,339 @@ class WasmSandbox {
|
|
|
1325
2831
|
}
|
|
1326
2832
|
```
|
|
1327
2833
|
|
|
2834
|
+
---
|
|
2835
|
+
|
|
2836
|
+
## Security Concepts: Scope, Fuel, and WASM
|
|
2837
|
+
|
|
2838
|
+
HyperMind implements three complementary security layers for AI agent execution:
|
|
2839
|
+
|
|
2840
|
+
### 1. AgentScope: Data Access Control
|
|
2841
|
+
|
|
2842
|
+
**Concept**: Scope defines WHAT data an agent can access - a whitelist-based filter on graphs and predicates.
|
|
2843
|
+
|
|
2844
|
+
```
|
|
2845
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
2846
|
+
│ AGENT SCOPE MODEL │
|
|
2847
|
+
│ │
|
|
2848
|
+
│ ┌─────────────────────────────────────────────────────────────────────────┐│
|
|
2849
|
+
│ │ KNOWLEDGE GRAPH ││
|
|
2850
|
+
│ │ ┌──────────────────────────────────────────────────────────────────┐ ││
|
|
2851
|
+
│ │ │ Graph: http://insurance.org/claims ← ALLOWED │ ││
|
|
2852
|
+
│ │ │ :Claim :amount, :provider, :status │ ││
|
|
2853
|
+
│ │ └──────────────────────────────────────────────────────────────────┘ ││
|
|
2854
|
+
│ │ ┌──────────────────────────────────────────────────────────────────┐ ││
|
|
2855
|
+
│ │ │ Graph: http://insurance.org/internal ← BLOCKED │ ││
|
|
2856
|
+
│ │ │ :Employee :salary, :ssn, :performance │ ││
|
|
2857
|
+
│ │ └──────────────────────────────────────────────────────────────────┘ ││
|
|
2858
|
+
│ │ ┌──────────────────────────────────────────────────────────────────┐ ││
|
|
2859
|
+
│ │ │ Graph: http://insurance.org/customers ← ALLOWED │ ││
|
|
2860
|
+
│ │ │ :Customer :riskScore (allowed), :creditCard (blocked) │ ││
|
|
2861
|
+
│ │ └──────────────────────────────────────────────────────────────────┘ ││
|
|
2862
|
+
│ └─────────────────────────────────────────────────────────────────────────┘│
|
|
2863
|
+
│ │
|
|
2864
|
+
│ AgentScope: │
|
|
2865
|
+
│ allowedGraphs: ['http://insurance.org/claims', 'http://insurance.org/customers']│
|
|
2866
|
+
│ allowedPredicates: [':amount', ':provider', ':status', ':riskScore'] │
|
|
2867
|
+
│ maxResultSize: 1000 │
|
|
2868
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
2869
|
+
```
|
|
2870
|
+
|
|
2871
|
+
**Why Scope Matters**:
|
|
2872
|
+
- **Principle of Least Privilege**: Agent only sees data relevant to its task
|
|
2873
|
+
- **Data Isolation**: PII, financials, internal data can be excluded
|
|
2874
|
+
- **Compliance**: GDPR, HIPAA, SOX - restrict access by role
|
|
2875
|
+
|
|
2876
|
+
```javascript
|
|
2877
|
+
// Claims analyst - can see claims but not internal employee data
|
|
2878
|
+
const claimsScope = new AgentScope({
|
|
2879
|
+
allowedGraphs: ['http://insurance.org/claims'],
|
|
2880
|
+
allowedPredicates: [':amount', ':provider', ':status', ':dateSubmitted'],
|
|
2881
|
+
maxResultSize: 5000 // Prevent data exfiltration
|
|
2882
|
+
})
|
|
2883
|
+
|
|
2884
|
+
// Executive dashboard - broader access, still limited
|
|
2885
|
+
const execScope = new AgentScope({
|
|
2886
|
+
allowedGraphs: ['http://insurance.org/claims', 'http://insurance.org/analytics'],
|
|
2887
|
+
allowedPredicates: null, // All predicates
|
|
2888
|
+
maxResultSize: 50000
|
|
2889
|
+
})
|
|
2890
|
+
```
|
|
2891
|
+
|
|
2892
|
+
### 2. Fuel Metering: CPU Budget Control
|
|
2893
|
+
|
|
2894
|
+
**What is Fuel?**
|
|
2895
|
+
|
|
2896
|
+
Fuel is like a **prepaid phone card for computation**. When you create an agent, you give it a fuel budget. Every operation the agent performs costs fuel. When fuel runs out, the agent stops - no exceptions.
|
|
2897
|
+
|
|
2898
|
+
```
|
|
2899
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
2900
|
+
│ FUEL: THE PREPAID COMPUTATION MODEL │
|
|
2901
|
+
│ │
|
|
2902
|
+
│ ANALOGY: Prepaid Phone Card │
|
|
2903
|
+
│ │
|
|
2904
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
2905
|
+
│ │ You buy a phone card with 100 minutes │ │
|
|
2906
|
+
│ │ Local call (SPARQL query): -2 minutes │ │
|
|
2907
|
+
│ │ Long distance (Datalog): -10 minutes │ │
|
|
2908
|
+
│ │ International (Graph algo): -30 minutes │ │
|
|
2909
|
+
│ │ │ │
|
|
2910
|
+
│ │ When minutes = 0 → Card stops working │ │
|
|
2911
|
+
│ │ No overdraft, no credit, no exceptions │ │
|
|
2912
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
2913
|
+
│ │
|
|
2914
|
+
│ SAME FOR AGENTS: │
|
|
2915
|
+
│ │
|
|
2916
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
2917
|
+
│ │ Agent gets 1,000,000 fuel units │ │
|
|
2918
|
+
│ │ Simple query: -1,000 fuel │ │
|
|
2919
|
+
│ │ Complex join: -15,000 fuel │ │
|
|
2920
|
+
│ │ PageRank: -100,000 fuel │ │
|
|
2921
|
+
│ │ │ │
|
|
2922
|
+
│ │ When fuel = 0 → Agent halts immediately │ │
|
|
2923
|
+
│ │ Operation in progress? Aborted. │ │
|
|
2924
|
+
│ │ No "just one more query", no exceptions │ │
|
|
2925
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
2926
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
2927
|
+
```
|
|
2928
|
+
|
|
2929
|
+
**Why Fuel Matters**:
|
|
2930
|
+
|
|
2931
|
+
| Problem | Without Fuel | With Fuel |
|
|
2932
|
+
|---------|--------------|-----------|
|
|
2933
|
+
| **Infinite Loop** | Agent runs forever, system hangs | Agent stops when fuel exhausted |
|
|
2934
|
+
| **Malicious Query** | `SELECT * FROM trillion_rows` crashes system | Query aborted at fuel limit |
|
|
2935
|
+
| **Cost Control** | Unknown compute costs | Predictable: 1M fuel = ~$0.01 |
|
|
2936
|
+
| **Multi-tenant** | One agent starves others | Each agent has guaranteed budget |
|
|
2937
|
+
| **Audit** | "Why did this cost so much?" | Fuel log shows exact operations |
|
|
2938
|
+
|
|
2939
|
+
### Fuel = CPU Budget: The Relationship
|
|
2940
|
+
|
|
2941
|
+
**Why is it called "CPU Budget"?**
|
|
2942
|
+
|
|
2943
|
+
Fuel is an **abstract representation of CPU time**. The relationship:
|
|
2944
|
+
|
|
2945
|
+
```
|
|
2946
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
2947
|
+
│ FUEL ↔ CPU BUDGET RELATIONSHIP │
|
|
2948
|
+
│ │
|
|
2949
|
+
│ 1 fuel unit ≈ 1 microsecond of CPU time (approximate) │
|
|
2950
|
+
│ │
|
|
2951
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
2952
|
+
│ │ FUEL LIMIT APPROXIMATE CPU TIME TYPICAL USE CASE │ │
|
|
2953
|
+
│ │ ───────────────────────────────────────────────────────────────── │ │
|
|
2954
|
+
│ │ 100,000 ~100ms Simple query │ │
|
|
2955
|
+
│ │ 1,000,000 ~1 second Standard agent task │ │
|
|
2956
|
+
│ │ 10,000,000 ~10 seconds Complex analysis │ │
|
|
2957
|
+
│ │ 100,000,000 ~100 seconds Batch processing │ │
|
|
2958
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
2959
|
+
│ │
|
|
2960
|
+
│ WHY "FUEL" INSTEAD OF "TIME"? │
|
|
2961
|
+
│ │
|
|
2962
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
2963
|
+
│ │ TIME (wall clock): FUEL (CPU budget): │ │
|
|
2964
|
+
│ │ • Varies by machine speed • Consistent across machines │ │
|
|
2965
|
+
│ │ • Includes I/O wait • Only counts computation │ │
|
|
2966
|
+
│ │ • Hard to predict • Deterministic per operation │ │
|
|
2967
|
+
│ │ • Can't pause/resume • Checkpoint and continue │ │
|
|
2968
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
2969
|
+
│ │
|
|
2970
|
+
│ FUEL COST = OPERATION COMPLEXITY │
|
|
2971
|
+
│ │
|
|
2972
|
+
│ Simple SELECT: ~1,000 fuel (scans 100 triples) │
|
|
2973
|
+
│ Complex JOIN: ~15,000 fuel (joins 3 tables, 1000 rows each) │
|
|
2974
|
+
│ PageRank(100): ~100,000 fuel (20 iterations on 100-node graph) │
|
|
2975
|
+
│ │
|
|
2976
|
+
│ The cost is based on ALGORITHM COMPLEXITY, not wall-clock time. │
|
|
2977
|
+
│ A 1000-fuel query takes 1000 fuel whether it runs on a laptop or server. │
|
|
2978
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
2979
|
+
```
|
|
2980
|
+
|
|
2981
|
+
**Practical Example**:
|
|
2982
|
+
|
|
2983
|
+
```javascript
|
|
2984
|
+
const agent = new HyperMindAgent({
|
|
2985
|
+
kg: db,
|
|
2986
|
+
sandbox: {
|
|
2987
|
+
capabilities: ['ReadKG', 'ExecuteTool'],
|
|
2988
|
+
fuelLimit: 1_000_000 // 1 million fuel ≈ 1 second of CPU budget
|
|
2989
|
+
}
|
|
2990
|
+
})
|
|
2991
|
+
|
|
2992
|
+
// Agent executes:
|
|
2993
|
+
// 1. SPARQL query: costs 5,000 fuel
|
|
2994
|
+
// 2. Datalog evaluation: costs 25,000 fuel
|
|
2995
|
+
// 3. Embedding search: costs 2,000 fuel
|
|
2996
|
+
// Total: 32,000 fuel used, 968,000 remaining
|
|
2997
|
+
|
|
2998
|
+
// If agent tries expensive operation:
|
|
2999
|
+
// 4. PageRank on 10K nodes: would cost 2,000,000 fuel
|
|
3000
|
+
// ERROR: FuelExhausted - operation requires 2M fuel but only 968K available
|
|
3001
|
+
```
|
|
3002
|
+
|
|
3003
|
+
**Concept**: Fuel is a consumable resource that limits computation. Every operation costs fuel.
|
|
3004
|
+
|
|
3005
|
+
```
|
|
3006
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
3007
|
+
│ FUEL METERING MODEL │
|
|
3008
|
+
│ │
|
|
3009
|
+
│ Initial Fuel: 1,000,000 │
|
|
3010
|
+
│ │
|
|
3011
|
+
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
|
3012
|
+
│ │ Operation 1: SPARQL Query (complex join) │ │
|
|
3013
|
+
│ │ Cost: -15,000 fuel │ │
|
|
3014
|
+
│ │ Remaining: 985,000 │ │
|
|
3015
|
+
│ └───────────────────────────────────────────────────────────────────────┘ │
|
|
3016
|
+
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
|
3017
|
+
│ │ Operation 2: Datalog evaluation (50 rules) │ │
|
|
3018
|
+
│ │ Cost: -45,000 fuel │ │
|
|
3019
|
+
│ │ Remaining: 940,000 │ │
|
|
3020
|
+
│ └───────────────────────────────────────────────────────────────────────┘ │
|
|
3021
|
+
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
|
3022
|
+
│ │ Operation 3: Embedding similarity search │ │
|
|
3023
|
+
│ │ Cost: -2,000 fuel │ │
|
|
3024
|
+
│ │ Remaining: 938,000 │ │
|
|
3025
|
+
│ └───────────────────────────────────────────────────────────────────────┘ │
|
|
3026
|
+
│ ... │
|
|
3027
|
+
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
|
3028
|
+
│ │ Operation N: Attempted complex analysis │ │
|
|
3029
|
+
│ │ Cost: -950,000 fuel │ │
|
|
3030
|
+
│ │ ERROR: FuelExhausted - execution halted │ │
|
|
3031
|
+
│ └───────────────────────────────────────────────────────────────────────┘ │
|
|
3032
|
+
│ │
|
|
3033
|
+
│ WHY FUEL? │
|
|
3034
|
+
│ • Prevents infinite loops │
|
|
3035
|
+
│ • Enables cost accounting per agent │
|
|
3036
|
+
│ • DoS protection (runaway queries) │
|
|
3037
|
+
│ • Multi-tenant resource fairness │
|
|
3038
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
3039
|
+
```
|
|
3040
|
+
|
|
3041
|
+
**Fuel Cost Reference**:
|
|
3042
|
+
|
|
3043
|
+
| Operation | Typical Fuel Cost | Notes |
|
|
3044
|
+
|-----------|-------------------|-------|
|
|
3045
|
+
| Simple SPARQL SELECT | 1,000 - 5,000 | BGP with 1-3 patterns |
|
|
3046
|
+
| Complex SPARQL (joins) | 10,000 - 50,000 | Multiple joins, filters |
|
|
3047
|
+
| Datalog evaluation | 5,000 - 100,000 | Depends on rule count |
|
|
3048
|
+
| Embedding search | 500 - 2,000 | HNSW lookup |
|
|
3049
|
+
| Graph algorithm | 10,000 - 500,000 | PageRank, components |
|
|
3050
|
+
| Memory retrieval | 100 - 500 | Episode lookup |
|
|
3051
|
+
|
|
3052
|
+
### 3. WASM Sandbox: Capability-Based Security
|
|
3053
|
+
|
|
3054
|
+
**Concept**: Object-Capability (OCAP) security - code can only access resources it's given explicit handles to.
|
|
3055
|
+
|
|
3056
|
+
```
|
|
3057
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
3058
|
+
│ OCAP vs TRADITIONAL ACCESS CONTROL │
|
|
3059
|
+
│ │
|
|
3060
|
+
│ TRADITIONAL (ACL/RBAC): OCAP (HyperMind): │
|
|
3061
|
+
│ ┌─────────────────────────┐ ┌─────────────────────────┐ │
|
|
3062
|
+
│ │ Agent requests │ │ Agent receives │ │
|
|
3063
|
+
│ │ "read claims" │ │ capability token │ │
|
|
3064
|
+
│ │ │ │ │ │ │ │
|
|
3065
|
+
│ │ ▼ │ │ ▼ │ │
|
|
3066
|
+
│ │ ┌──────────────┐ │ │ ┌──────────────┐ │ │
|
|
3067
|
+
│ │ │ Access │ │ │ │ Token = │ │ │
|
|
3068
|
+
│ │ │ Control List │ │ │ │ ReadKG cap │ │ │
|
|
3069
|
+
│ │ │ (centralized)│ │ │ │ (unforgeable)│ │ │
|
|
3070
|
+
│ │ └──────────────┘ │ │ └──────────────┘ │ │
|
|
3071
|
+
│ │ │ │ │ │ │ │
|
|
3072
|
+
│ │ Check role → grant │ │ Has token → use it │ │
|
|
3073
|
+
│ │ │ │ │ │
|
|
3074
|
+
│ │ Problem: Ambient │ │ Benefit: No ambient │ │
|
|
3075
|
+
│ │ authority - agent │ │ authority - only what │ │
|
|
3076
|
+
│ │ could escalate │ │ was explicitly granted │ │
|
|
3077
|
+
│ └─────────────────────────┘ └─────────────────────────┘ │
|
|
3078
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
3079
|
+
```
|
|
3080
|
+
|
|
3081
|
+
**Available Capabilities**:
|
|
3082
|
+
|
|
3083
|
+
| Capability | What It Grants | Risk Level |
|
|
3084
|
+
|------------|----------------|------------|
|
|
3085
|
+
| `ReadKG` | Query knowledge graph (SELECT, CONSTRUCT, ASK) | Low |
|
|
3086
|
+
| `WriteKG` | Modify knowledge graph (INSERT, DELETE) | Medium |
|
|
3087
|
+
| `ExecuteTool` | Run registered tools (Datalog, GraphFrame) | Medium |
|
|
3088
|
+
| `SpawnAgent` | Create child agents | High |
|
|
3089
|
+
| `HttpAccess` | Make external HTTP requests | High |
|
|
3090
|
+
|
|
3091
|
+
**WASM Isolation Benefits**:
|
|
3092
|
+
- **Memory Isolation**: Agent cannot access host memory
|
|
3093
|
+
- **Linear Memory**: Fixed-size sandbox, cannot grow unbounded
|
|
3094
|
+
- **No Ambient Authority**: Cannot access filesystem, network unless granted
|
|
3095
|
+
- **Deterministic Execution**: Same inputs → same outputs
|
|
3096
|
+
|
|
3097
|
+
```javascript
|
|
3098
|
+
// Minimal permissions for read-only analysis
|
|
3099
|
+
const readOnlyAgent = new HyperMindAgent({
|
|
3100
|
+
kg: db,
|
|
3101
|
+
sandbox: {
|
|
3102
|
+
capabilities: ['ReadKG'], // Cannot write or execute tools
|
|
3103
|
+
fuelLimit: 100_000
|
|
3104
|
+
}
|
|
3105
|
+
})
|
|
3106
|
+
|
|
3107
|
+
// Production fraud detector with more permissions
|
|
3108
|
+
const fraudAgent = new HyperMindAgent({
|
|
3109
|
+
kg: db,
|
|
3110
|
+
sandbox: {
|
|
3111
|
+
capabilities: ['ReadKG', 'ExecuteTool'], // Can run Datalog rules
|
|
3112
|
+
fuelLimit: 10_000_000
|
|
3113
|
+
}
|
|
3114
|
+
})
|
|
3115
|
+
|
|
3116
|
+
// Administrative agent (use with caution)
|
|
3117
|
+
const adminAgent = new HyperMindAgent({
|
|
3118
|
+
kg: db,
|
|
3119
|
+
sandbox: {
|
|
3120
|
+
capabilities: ['ReadKG', 'WriteKG', 'ExecuteTool', 'SpawnAgent'],
|
|
3121
|
+
fuelLimit: 100_000_000
|
|
3122
|
+
}
|
|
3123
|
+
})
|
|
3124
|
+
```
|
|
3125
|
+
|
|
3126
|
+
### Security Layer Integration
|
|
3127
|
+
|
|
3128
|
+
All three layers work together:
|
|
3129
|
+
|
|
3130
|
+
```
|
|
3131
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
3132
|
+
│ SECURITY LAYER STACK │
|
|
3133
|
+
│ │
|
|
3134
|
+
│ User Query: "Find high-risk claims and update their status" │
|
|
3135
|
+
│ │
|
|
3136
|
+
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
|
3137
|
+
│ │ LAYER 1: SCOPE CHECK │ │
|
|
3138
|
+
│ │ ✅ Graph 'claims' is in allowedGraphs │ │
|
|
3139
|
+
│ │ ✅ Predicates 'riskScore', 'status' are allowed │ │
|
|
3140
|
+
│ │ ❌ If accessing 'internal' graph → BLOCKED │ │
|
|
3141
|
+
│ └───────────────────────────────────────────────────────────────────────┘ │
|
|
3142
|
+
│ ↓ │
|
|
3143
|
+
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
|
3144
|
+
│ │ LAYER 2: CAPABILITY CHECK │ │
|
|
3145
|
+
│ │ ✅ Has 'ReadKG' → SELECT query allowed │ │
|
|
3146
|
+
│ │ ❓ Has 'WriteKG'? → If yes, UPDATE allowed; if no, BLOCKED │ │
|
|
3147
|
+
│ │ ✅ Has 'ExecuteTool' → Datalog rules can run │ │
|
|
3148
|
+
│ └───────────────────────────────────────────────────────────────────────┘ │
|
|
3149
|
+
│ ↓ │
|
|
3150
|
+
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
|
3151
|
+
│ │ LAYER 3: FUEL CHECK │ │
|
|
3152
|
+
│ │ Query cost estimate: 25,000 fuel │ │
|
|
3153
|
+
│ │ Available fuel: 938,000 │ │
|
|
3154
|
+
│ │ ✅ Sufficient fuel → EXECUTE │ │
|
|
3155
|
+
│ │ (After execution: 913,000 remaining) │ │
|
|
3156
|
+
│ └───────────────────────────────────────────────────────────────────────┘ │
|
|
3157
|
+
│ ↓ │
|
|
3158
|
+
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
|
3159
|
+
│ │ RESULT: Query executed, results returned │ │
|
|
3160
|
+
│ │ All operations logged in audit trail │ │
|
|
3161
|
+
│ └───────────────────────────────────────────────────────────────────────┘ │
|
|
3162
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
3163
|
+
```
|
|
3164
|
+
|
|
3165
|
+
---
|
|
3166
|
+
|
|
1328
3167
|
**Fuel Concept** (CPU Budget):
|
|
1329
3168
|
|
|
1330
3169
|
Fuel metering prevents runaway computations and enables resource accounting:
|
|
@@ -1362,6 +3201,205 @@ console.log(`Fuel remaining: ${remaining}`) // e.g., 985000
|
|
|
1362
3201
|
|
|
1363
3202
|
---
|
|
1364
3203
|
|
|
3204
|
+
## Real-World Agent Examples with ProofDAGs
|
|
3205
|
+
|
|
3206
|
+
### Fraud Detection Agent
|
|
3207
|
+
|
|
3208
|
+
**Use Case**: Detect insurance fraud rings using NICB (National Insurance Crime Bureau) patterns.
|
|
3209
|
+
|
|
3210
|
+
```javascript
|
|
3211
|
+
const { HyperMindAgent, GraphDB, DatalogProgram, evaluateDatalog, GraphFrame } = require('rust-kgdb')
|
|
3212
|
+
|
|
3213
|
+
// Create agent with secure defaults
|
|
3214
|
+
const db = new GraphDB('http://insurance.org/')
|
|
3215
|
+
db.loadTtl(FRAUD_ONTOLOGY, 'http://insurance.org/fraud-kb')
|
|
3216
|
+
|
|
3217
|
+
const agent = new HyperMindAgent({
|
|
3218
|
+
kg: db,
|
|
3219
|
+
name: 'fraud-detector',
|
|
3220
|
+
sandbox: {
|
|
3221
|
+
capabilities: ['ReadKG', 'ExecuteTool'], // Read-only!
|
|
3222
|
+
fuelLimit: 1_000_000
|
|
3223
|
+
}
|
|
3224
|
+
})
|
|
3225
|
+
|
|
3226
|
+
// Add NICB fraud detection rules
|
|
3227
|
+
agent.addRule('collusion_detection', {
|
|
3228
|
+
head: { predicate: 'potential_collusion', terms: ['?X', '?Y', '?P'] },
|
|
3229
|
+
body: [
|
|
3230
|
+
{ predicate: 'claimant', terms: ['?X'] },
|
|
3231
|
+
{ predicate: 'claimant', terms: ['?Y'] },
|
|
3232
|
+
{ predicate: 'provider', terms: ['?P'] },
|
|
3233
|
+
{ predicate: 'claims_with', terms: ['?X', '?P'] },
|
|
3234
|
+
{ predicate: 'claims_with', terms: ['?Y', '?P'] },
|
|
3235
|
+
{ predicate: 'knows', terms: ['?X', '?Y'] }
|
|
3236
|
+
]
|
|
3237
|
+
})
|
|
3238
|
+
|
|
3239
|
+
// Natural language query - full explainability!
|
|
3240
|
+
const result = await agent.call('Find all claimants with high risk scores')
|
|
3241
|
+
|
|
3242
|
+
console.log(result.answer) // Human-readable answer
|
|
3243
|
+
console.log(result.explanation) // Full execution trace
|
|
3244
|
+
console.log(result.proof) // Curry-Howard proof witness
|
|
3245
|
+
```
|
|
3246
|
+
|
|
3247
|
+
**Fraud Agent ProofDAG Output**:
|
|
3248
|
+
```
|
|
3249
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
3250
|
+
│ FRAUD DETECTION PROOF DAG │
|
|
3251
|
+
│ │
|
|
3252
|
+
│ ROOT: Collusion Detection (P001 ↔ P002 ↔ PROV001) │
|
|
3253
|
+
│ ═══════════════════════════════════════════════════ │
|
|
3254
|
+
│ │
|
|
3255
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
3256
|
+
│ │ Rule: potential_collusion(?X, ?Y, ?P) │ │
|
|
3257
|
+
│ │ Bindings: ?X=P001, ?Y=P002, ?P=PROV001 │ │
|
|
3258
|
+
│ │ │ │
|
|
3259
|
+
│ │ Proof Tree: │ │
|
|
3260
|
+
│ │ claimant(P001) ✓ [fact from KG] │ │
|
|
3261
|
+
│ │ claimant(P002) ✓ [fact from KG] │ │
|
|
3262
|
+
│ │ provider(PROV001) ✓ [fact from KG] │ │
|
|
3263
|
+
│ │ claims_with(P001,PROV001) ✓ [inferred from CLM001] │ │
|
|
3264
|
+
│ │ claims_with(P002,PROV001) ✓ [inferred from CLM002] │ │
|
|
3265
|
+
│ │ knows(P001,P002) ✓ [fact from KG] │ │
|
|
3266
|
+
│ │ ───────────────────────────────────────────── │ │
|
|
3267
|
+
│ │ ∴ potential_collusion(P001,P002,PROV001) ✓ [DERIVED] │ │
|
|
3268
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
3269
|
+
│ │
|
|
3270
|
+
│ Supporting Evidence: │
|
|
3271
|
+
│ ├─ SPARQL: 47 claims from PROV001 (time: 2.3ms) │
|
|
3272
|
+
│ ├─ GraphFrame: 1 triangle detected (P001-P002-PROV001) │
|
|
3273
|
+
│ ├─ Datalog: potential_collusion rule matched │
|
|
3274
|
+
│ └─ Embeddings: P001 similar to 3 known fraud providers (0.87 score) │
|
|
3275
|
+
│ │
|
|
3276
|
+
│ Proof Hash: sha256:8f3a2b1c4d5e6f7a8b9c0d1e2f3a4b5c │
|
|
3277
|
+
│ Timestamp: 2025-12-15T10:30:00Z │
|
|
3278
|
+
│ Agent: fraud-detector │
|
|
3279
|
+
│ │
|
|
3280
|
+
│ REGULATORY DEFENSIBLE: Every conclusion traceable to KG facts + rules │
|
|
3281
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
3282
|
+
```
|
|
3283
|
+
|
|
3284
|
+
### Underwriting Agent
|
|
3285
|
+
|
|
3286
|
+
**Use Case**: Commercial insurance underwriting with ISO/NAIC rating factors.
|
|
3287
|
+
|
|
3288
|
+
```javascript
|
|
3289
|
+
const { HyperMindAgent, GraphDB, DatalogProgram, evaluateDatalog } = require('rust-kgdb')
|
|
3290
|
+
|
|
3291
|
+
const db = new GraphDB('http://underwriting.org/')
|
|
3292
|
+
db.loadTtl(UNDERWRITING_KB, 'http://underwriting.org/data')
|
|
3293
|
+
|
|
3294
|
+
const agent = new HyperMindAgent({
|
|
3295
|
+
kg: db,
|
|
3296
|
+
name: 'underwriter',
|
|
3297
|
+
sandbox: {
|
|
3298
|
+
capabilities: ['ReadKG', 'ExecuteTool'], // Read-only for audit compliance
|
|
3299
|
+
fuelLimit: 500_000
|
|
3300
|
+
}
|
|
3301
|
+
})
|
|
3302
|
+
|
|
3303
|
+
// Add NAIC-informed underwriting rules
|
|
3304
|
+
agent.addRule('auto_approval', {
|
|
3305
|
+
head: { predicate: 'auto_approve', terms: ['?Account'] },
|
|
3306
|
+
body: [
|
|
3307
|
+
{ predicate: 'account', terms: ['?Account'] },
|
|
3308
|
+
{ predicate: 'loss_ratio', terms: ['?Account', '?LR'] },
|
|
3309
|
+
{ predicate: 'years_in_business', terms: ['?Account', '?Years'] },
|
|
3310
|
+
{ predicate: 'builtin_lt', terms: ['?LR', '0.35'] },
|
|
3311
|
+
{ predicate: 'builtin_gt', terms: ['?Years', '5'] }
|
|
3312
|
+
]
|
|
3313
|
+
})
|
|
3314
|
+
|
|
3315
|
+
agent.addRule('refer_to_underwriter', {
|
|
3316
|
+
head: { predicate: 'refer_to_underwriter', terms: ['?Account'] },
|
|
3317
|
+
body: [
|
|
3318
|
+
{ predicate: 'account', terms: ['?Account'] },
|
|
3319
|
+
{ predicate: 'loss_ratio', terms: ['?Account', '?LR'] },
|
|
3320
|
+
{ predicate: 'builtin_gt', terms: ['?LR', '0.50'] }
|
|
3321
|
+
]
|
|
3322
|
+
})
|
|
3323
|
+
|
|
3324
|
+
// ISO Premium Calculation: Base × Exposure × Territory × Experience × Loss
|
|
3325
|
+
function calculatePremium(baseRate, exposure, territoryMod, lossRatio, yearsInBusiness) {
|
|
3326
|
+
const experienceMod = yearsInBusiness >= 10 ? 0.90 : yearsInBusiness >= 5 ? 0.95 : 1.05
|
|
3327
|
+
const lossMod = lossRatio < 0.30 ? 0.85 : lossRatio < 0.50 ? 1.00 : lossRatio < 0.70 ? 1.15 : 1.35
|
|
3328
|
+
return baseRate * exposure * territoryMod * experienceMod * lossMod
|
|
3329
|
+
}
|
|
3330
|
+
|
|
3331
|
+
// Natural language underwriting
|
|
3332
|
+
const result = await agent.call('Which accounts need manual underwriter review?')
|
|
3333
|
+
```
|
|
3334
|
+
|
|
3335
|
+
**Underwriting Agent ProofDAG Output**:
|
|
3336
|
+
```
|
|
3337
|
+
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
3338
|
+
│ UNDERWRITING DECISION PROOF DAG │
|
|
3339
|
+
│ │
|
|
3340
|
+
│ Decision: BUS003 (SafeHaul Logistics) → REFER_TO_UNDERWRITER │
|
|
3341
|
+
│ ═════════════════════════════════════════════════════════ │
|
|
3342
|
+
│ │
|
|
3343
|
+
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
3344
|
+
│ │ RULE FIRED: refer_to_underwriter(?A) │ │
|
|
3345
|
+
│ │ │ │
|
|
3346
|
+
│ │ Datalog Definition: │ │
|
|
3347
|
+
│ │ refer_to_underwriter(?A) :- │ │
|
|
3348
|
+
│ │ account(?A), │ │
|
|
3349
|
+
│ │ loss_ratio(?A, ?L), │ │
|
|
3350
|
+
│ │ ?L > 0.5. │ │
|
|
3351
|
+
│ │ │ │
|
|
3352
|
+
│ │ Matching Facts: │ │
|
|
3353
|
+
│ │ account(BUS003) ✓ SafeHaul is an account │ │
|
|
3354
|
+
│ │ loss_ratio(BUS003, 0.72) ✓ Loss ratio is 72% │ │
|
|
3355
|
+
│ │ 0.72 > 0.5 ✓ Threshold exceeded │ │
|
|
3356
|
+
│ │ ───────────────────────────────────────────── │ │
|
|
3357
|
+
│ │ ∴ refer_to_underwriter(BUS003) ✓ [DERIVED] │ │
|
|
3358
|
+
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
3359
|
+
│ │
|
|
3360
|
+
│ Premium Calculation Trace: │
|
|
3361
|
+
│ ├─ Base Rate: $18.75/100 (NAICS 484110: General Freight Trucking) │
|
|
3362
|
+
│ ├─ Exposure: $4,200,000 revenue │
|
|
3363
|
+
│ ├─ Territory Mod: 1.45 (FEMA Zone AE - high flood risk) │
|
|
3364
|
+
│ ├─ Experience Mod: 0.95 (8 years in business) │
|
|
3365
|
+
│ ├─ Loss Mod: 1.35 (72% loss ratio - poor history) │
|
|
3366
|
+
│ └─ PREMIUM: $18.75 × 42000 × 1.45 × 0.95 × 1.35 = $1,463,925 │
|
|
3367
|
+
│ │
|
|
3368
|
+
│ Risk Factors (from GraphFrame): │
|
|
3369
|
+
│ ├─ Industry: Transportation (ISO high-risk class) │
|
|
3370
|
+
│ ├─ PageRank: 0.1847 (high network centrality in risk graph) │
|
|
3371
|
+
│ └─ Territory: TX-201 (hurricane corridor exposure) │
|
|
3372
|
+
│ │
|
|
3373
|
+
│ Auto-Approved Accounts (low risk): │
|
|
3374
|
+
│ ├─ BUS002 (TechStart LLC): loss_ratio=0.15, years=3 │
|
|
3375
|
+
│ └─ BUS004 (Downtown Restaurant): loss_ratio=0.28, years=12 │
|
|
3376
|
+
│ │
|
|
3377
|
+
│ Proof Hash: sha256:9d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8g │
|
|
3378
|
+
│ Timestamp: 2025-12-15T14:45:00Z │
|
|
3379
|
+
│ Agent: underwriter │
|
|
3380
|
+
│ │
|
|
3381
|
+
│ AUDIT TRAIL: ISO base rates + NAIC guidelines + FEMA zones applied │
|
|
3382
|
+
└─────────────────────────────────────────────────────────────────────────────┘
|
|
3383
|
+
```
|
|
3384
|
+
|
|
3385
|
+
### Why ProofDAGs Matter for Regulated Industries
|
|
3386
|
+
|
|
3387
|
+
| Aspect | Vanilla LLM | HyperMind + ProofDAG |
|
|
3388
|
+
|--------|-------------|----------------------|
|
|
3389
|
+
| **Audit Question** | "Why was this flagged?" | Hash: 9d4e5f6a → Full derivation chain |
|
|
3390
|
+
| **Regulatory Review** | Black box | "Rule R1 matched facts F1, F2, F3" |
|
|
3391
|
+
| **Reproducibility** | Different each time | Same inputs → Same hash |
|
|
3392
|
+
| **Liability Defense** | "The AI said so" | "ISO guideline + NAIC rule + KG facts" |
|
|
3393
|
+
| **SOX/GDPR Compliance** | Cannot prove | Full execution witness |
|
|
3394
|
+
|
|
3395
|
+
```bash
|
|
3396
|
+
# Run the examples
|
|
3397
|
+
node examples/fraud-detection-agent.js
|
|
3398
|
+
node examples/underwriting-agent.js
|
|
3399
|
+
```
|
|
3400
|
+
|
|
3401
|
+
---
|
|
3402
|
+
|
|
1365
3403
|
## Examples
|
|
1366
3404
|
|
|
1367
3405
|
```bash
|