rust-kgdb 0.5.6 → 0.5.9
- package/CHANGELOG.md +205 -0
- package/README.md +812 -36
- package/examples/embeddings-example.ts +4 -4
- package/examples/hypermind-agent-architecture.js +1709 -0
- package/hypermind-agent.js +636 -1
- package/index.d.ts +248 -0
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -4,6 +4,72 @@
|
|
|
4
4
|
[License: Apache-2.0](https://opensource.org/licenses/Apache-2.0)
|
|
5
5
|
[W3C SPARQL 1.1](https://www.w3.org/TR/sparql11-query/)
|
|
6
6
|
|
|
7
|
+
> **Two-Layer Architecture**: High-performance Rust knowledge graph database + HyperMind neuro-symbolic agent framework with mathematical foundations.
|
|
8
|
+
|
|
9
|
+
**Naming Note**: The `GraphDB` class in this SDK is not affiliated with [Ontotext GraphDB](https://www.ontotext.com/products/graphdb/). The `GraphFrame` API is inspired by [Apache Spark GraphFrames](https://graphframes.github.io/graphframes/docs/_site/index.html).
|
|
10
|
+
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
## Architecture: What Powers rust-kgdb
|
|
14
|
+
|
|
15
|
+
```
|
|
16
|
+
┌─────────────────────────────────────────────────────────────────────────────────┐
|
|
17
|
+
│ YOUR APPLICATION │
|
|
18
|
+
│ (Fraud Detection, Underwriting, Compliance) │
|
|
19
|
+
└────────────────────────────────────┬────────────────────────────────────────────┘
|
|
20
|
+
│
|
|
21
|
+
┌────────────────────────────────────▼────────────────────────────────────────────┐
|
|
22
|
+
│ HYPERMIND AGENT FRAMEWORK (SDK Layer) │
|
|
23
|
+
│ ┌────────────────────────────────────────────────────────────────────────────┐ │
|
|
24
|
+
│ │ Mathematical Abstractions (High-Level) │ │
|
|
25
|
+
│ │ • TypeId: Hindley-Milner type system with refinement types │ │
|
|
26
|
+
│ │ • LLMPlanner: Natural language → typed tool pipelines │ │
|
|
27
|
+
│ │ • WasmSandbox: WASM isolation with capability-based security │ │
|
|
28
|
+
│ │ • AgentBuilder: Fluent composition of typed tools │ │
|
|
29
|
+
│ │ • ExecutionWitness: Cryptographic proofs (SHA-256) │ │
|
|
30
|
+
│ └────────────────────────────────────────────────────────────────────────────┘ │
|
|
31
|
+
│ │ │
|
|
32
|
+
│ Category Theory: Tools as Morphisms (A → B) │
|
|
33
|
+
│ Proof Theory: Every execution has a witness │
|
|
34
|
+
└────────────────────────────────────┬────────────────────────────────────────────┘
|
|
35
|
+
│ NAPI-RS Bindings
|
|
36
|
+
┌────────────────────────────────────▼────────────────────────────────────────────┐
|
|
37
|
+
│ RUST CORE ENGINE (Native Performance) │
|
|
38
|
+
│ ┌────────────────────────────────────────────────────────────────────────────┐ │
|
|
39
|
+
│ │ GraphDB │ RDF/SPARQL quad store │ 2.78µs lookups, 24 bytes/triple│
|
|
40
|
+
│ │ GraphFrame │ Graph algorithms │ WCOJ optimal joins, PageRank │
|
|
41
|
+
│ │ EmbeddingService │ Vector similarity │ HNSW index, 1-hop ARCADE cache│
|
|
42
|
+
│ │ DatalogProgram │ Rule-based reasoning │ Semi-naive evaluation │
|
|
43
|
+
│ │ Pregel │ BSP graph processing │ Iterative algorithms │
|
|
44
|
+
│ └────────────────────────────────────────────────────────────────────────────┘ │
|
|
45
|
+
│ │
|
|
46
|
+
│ W3C Standards: SPARQL 1.1 (100%) | RDF 1.2 | OWL 2 RL | SHACL | RDFS │
|
|
47
|
+
│ Storage Backends: InMemory | RocksDB | LMDB │
|
|
48
|
+
│ Distribution: HDRF Partitioning | Raft Consensus | gRPC │
|
|
49
|
+
└──────────────────────────────────────────────────────────────────────────────────┘
|
|
50
|
+
```
|
|
51
|
+
|
|
52
|
+
**Key Insight**: The Rust core provides raw performance (2.78µs lookups). The HyperMind framework adds mathematical guarantees (type safety, composition laws, proof generation) without sacrificing speed.
|
|
53
|
+
|
|
54
|
+
### What's Rust vs JavaScript?
|
|
55
|
+
|
|
56
|
+
| Component | Implementation | Performance | Notes |
|
|
57
|
+
|-----------|---------------|-------------|-------|
|
|
58
|
+
| **GraphDB** | Rust via NAPI-RS | 2.78µs lookups | Zero-copy RDF quad store |
|
|
59
|
+
| **GraphFrame** | Rust via NAPI-RS | WCOJ optimal | PageRank, triangles, components |
|
|
60
|
+
| **EmbeddingService** | Rust via NAPI-RS | Sub-ms search | HNSW index + 1-hop cache |
|
|
61
|
+
| **DatalogProgram** | Rust via NAPI-RS | Semi-naive eval | Rule-based reasoning |
|
|
62
|
+
| **Pregel** | Rust via NAPI-RS | BSP model | Iterative graph algorithms |
|
|
63
|
+
| **TypeId** | JavaScript | N/A | Type system labels |
|
|
64
|
+
| **LLMPlanner** | JavaScript + HTTP | LLM latency | Claude/GPT integration |
|
|
65
|
+
| **WasmSandbox** | JavaScript Proxy | Capability check | All Rust calls proxied |
|
|
66
|
+
| **AgentBuilder** | JavaScript | N/A | Fluent composition |
|
|
67
|
+
| **ExecutionWitness** | JavaScript | SHA-256 | Cryptographic audit |
|
|
68
|
+
|
|
69
|
+
**Security Model**: All interactions with Rust components flow through NAPI-RS bindings with memory isolation. The WasmSandbox wraps these bindings with capability-based access control, ensuring agents can only invoke tools they're explicitly granted. This provides defense-in-depth: NAPI-RS for memory safety, WasmSandbox for capability control.
|
|
70
|
+
|
|
71
|
+
---
|
|
72
|
+
|
|
7
73
|
## The Problem
|
|
8
74
|
|
|
9
75
|
We asked GPT-4 to write a simple SPARQL query: *"Find all professors."*
|
|
@@ -87,6 +153,393 @@ We don't make claims we can't prove. All measurements use **publicly available,
|
|
|
87
153
|
|
|
88
154
|
**Reproducibility:** All benchmarks at `crates/storage/benches/` and `crates/hypergraph/benches/`. Run with `cargo bench --workspace`.
|
|
89
155
|
|
|
156
|
+
### Benchmark Methodology
|
|
157
|
+
|
|
158
|
+
**How we measure performance:**
|
|
159
|
+
|
|
160
|
+
1. **LUBM Data Generation**
|
|
161
|
+
```bash
|
|
162
|
+
# Generate test data (matches official Java UBA generator)
|
|
163
|
+
rustc tools/lubm_generator.rs -O -o tools/lubm_generator
|
|
164
|
+
./tools/lubm_generator 1 /tmp/lubm_1.nt # 3,272 triples
|
|
165
|
+
./tools/lubm_generator 10 /tmp/lubm_10.nt # ~32K triples
|
|
166
|
+
```
|
|
167
|
+
|
|
168
|
+
2. **Storage Benchmarks**
|
|
169
|
+
```bash
|
|
170
|
+
# Run Criterion benchmarks (statistical analysis, 10K+ samples)
|
|
171
|
+
cargo bench --package storage --bench triple_store_benchmark
|
|
172
|
+
|
|
173
|
+
# Results include:
|
|
174
|
+
# - Mean, median, standard deviation
|
|
175
|
+
# - Outlier detection
|
|
176
|
+
# - Comparison vs baseline
|
|
177
|
+
```
|
|
178
|
+
|
|
179
|
+
3. **HyperMind Agent Accuracy**
|
|
180
|
+
```bash
|
|
181
|
+
# Run LUBM benchmark comparing Vanilla LLM vs HyperMind
|
|
182
|
+
node hypermind-benchmark.js
|
|
183
|
+
|
|
184
|
+
# Tests 12 queries (Easy: 3, Medium: 5, Hard: 4)
|
|
185
|
+
# Measures: Syntax validity, execution success, latency
|
|
186
|
+
```
|
|
187
|
+
|
|
188
|
+
4. **Hardware Requirements**
|
|
189
|
+
- Minimum: 4GB RAM, any x64/ARM64 CPU
|
|
190
|
+
- Recommended: 8GB+ RAM, Apple Silicon or modern x64
|
|
191
|
+
- Benchmarks run on: M2 MacBook Pro (baseline measurements)
|
|
192
|
+
|
|
193
|
+
5. **Fair Comparison Conditions**
|
|
194
|
+
- All systems tested with identical LUBM datasets
|
|
195
|
+
- Same SPARQL queries across all systems
|
|
196
|
+
- Cold-start measurements (no warm cache)
|
|
197
|
+
- 10,000+ iterations per measurement for statistical significance
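
The summary statistics named in step 2 (mean, median, standard deviation) can be computed for any set of latency samples. The block below is a minimal sketch of that arithmetic, not the Criterion harness itself; the `summarize` helper and the sample values are illustrative assumptions.

```javascript
// Hypothetical helper: summarize latency samples (for example, in microseconds).
function summarize(samples) {
  if (samples.length === 0) throw new Error('no samples')
  const sorted = [...samples].sort((a, b) => a - b)
  const n = sorted.length
  const mean = sorted.reduce((sum, x) => sum + x, 0) / n
  const mid = Math.floor(n / 2)
  const median = n % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
  // Sample standard deviation (n - 1 in the denominator).
  const variance = n > 1
    ? sorted.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (n - 1)
    : 0
  return { n, mean, median, stddev: Math.sqrt(variance) }
}

// Made-up sample values, used only to exercise the helper.
const stats = summarize([2, 4, 4, 4, 5, 5, 7, 9])
console.log(stats)
```

Reporting the median next to the mean helps when a few slow outliers skew the average.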
|
|
198
|
+
|
|
199
|
+
---
|
|
200
|
+
|
|
201
|
+
## Why Embeddings? The Rise of Neuro-Symbolic AI
|
|
202
|
+
|
|
203
|
+
### The Problem with Pure Symbolic Systems
|
|
204
|
+
|
|
205
|
+
Traditional knowledge graphs are powerful for **structured reasoning**:
|
|
206
|
+
|
|
207
|
+
```sparql
|
|
208
|
+
SELECT ?fraud WHERE {
|
|
209
|
+
?claim :amount ?amt .
|
|
210
|
+
FILTER(?amt > 50000)
|
|
211
|
+
?claim :provider ?prov .
|
|
212
|
+
?prov :flaggedCount ?flags .
|
|
213
|
+
FILTER(?flags > 3)
|
|
214
|
+
}
|
|
215
|
+
```
|
|
216
|
+
|
|
217
|
+
But they fail at **semantic similarity**: "Find claims similar to this suspicious one" requires understanding meaning, not just matching predicates.
|
|
218
|
+
|
|
219
|
+
### The Problem with Pure Neural Systems
|
|
220
|
+
|
|
221
|
+
LLMs and embedding models excel at **semantic understanding**:
|
|
222
|
+
|
|
223
|
+
```javascript
|
|
224
|
+
// Find semantically similar claims
|
|
225
|
+
const similar = embeddings.findSimilar('CLM001', 10, 0.85)
|
|
226
|
+
```
|
|
227
|
+
|
|
228
|
+
But they hallucinate, have no audit trail, and can't explain their reasoning.
|
|
229
|
+
|
|
230
|
+
### The Neuro-Symbolic Solution
|
|
231
|
+
|
|
232
|
+
**rust-kgdb combines both**: Use embeddings for semantic discovery, symbolic reasoning for provable conclusions.
|
|
233
|
+
|
|
234
|
+
```
|
|
235
|
+
┌─────────────────────────────────────────────────────────────────────────┐
|
|
236
|
+
│ NEURO-SYMBOLIC PIPELINE │
|
|
237
|
+
│ │
|
|
238
|
+
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
|
239
|
+
│ │ NEURAL │ │ SYMBOLIC │ │ NEURAL │ │
|
|
240
|
+
│ │ (Discovery) │ ───▶ │ (Reasoning) │ ───▶ │ (Explain) │ │
|
|
241
|
+
│ └──────────────┘ └──────────────┘ └──────────────┘ │
|
|
242
|
+
│ │
|
|
243
|
+
│ "Find similar" "Apply rules" "Summarize for │
|
|
244
|
+
│ Embeddings search Datalog inference human consumption" │
|
|
245
|
+
│ HNSW index Semi-naive eval LLM generation │
|
|
246
|
+
│ Sub-ms latency Deterministic Cryptographic proof │
|
|
247
|
+
└─────────────────────────────────────────────────────────────────────────┘
|
|
248
|
+
```
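
The symbolic stage in the diagram names semi-naive evaluation. As a rough illustration of that technique (not the `DatalogProgram` implementation), the sketch below derives a transitive `reachable` relation from `knows` edges. Each round joins only the facts derived in the previous round (the delta) instead of the whole relation. The rule set and the function name are assumptions made for this example.

```javascript
// Semi-naive evaluation of two illustrative rules:
//   reachable(X, Y) :- knows(X, Y).
//   reachable(X, Z) :- reachable(X, Y), knows(Y, Z).
function reachable(knows) {
  // Index knows(Y, Z) by Y so the join is a map lookup.
  const byFrom = new Map()
  for (const [from, to] of knows) {
    if (!byFrom.has(from)) byFrom.set(from, [])
    byFrom.get(from).push(to)
  }
  const key = (x, y) => `${x}\u0000${y}`
  const all = new Set() // every fact derived so far
  let delta = []        // facts that were new in the previous round
  for (const [x, y] of knows) {
    if (!all.has(key(x, y))) {
      all.add(key(x, y))
      delta.push([x, y])
    }
  }
  while (delta.length > 0) {
    const next = []
    // Only the delta is joined against knows; older facts were already expanded.
    for (const [x, y] of delta) {
      for (const z of byFrom.get(y) ?? []) {
        if (!all.has(key(x, z))) {
          all.add(key(x, z))
          next.push([x, z])
        }
      }
    }
    delta = next
  }
  return [...all].map((k) => k.split('\u0000')).sort()
}

const facts = reachable([['P001', 'P002'], ['P002', 'P003']])
console.log(facts)
```

The loop stops once a round derives nothing new, so the result is the same on every run for the same input facts.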
|
|
249
|
+
|
|
250
|
+
### Why 1-Hop Embeddings Matter
|
|
251
|
+
|
|
252
|
+
The ARCADE (Adaptive Relation-Aware Cache for Dynamic Embeddings) algorithm provides **1-hop neighbor awareness**:
|
|
253
|
+
|
|
254
|
+
```javascript
|
|
255
|
+
const service = new EmbeddingService()
|
|
256
|
+
|
|
257
|
+
// Build neighbor cache from triples
|
|
258
|
+
service.onTripleInsert('CLM001', 'claimant', 'P001', null)
|
|
259
|
+
service.onTripleInsert('P001', 'knows', 'P002', null)
|
|
260
|
+
|
|
261
|
+
// 1-hop aware similarity: finds entities connected in the graph
|
|
262
|
+
const neighbors = service.getNeighborsOut('P001') // ['P002']
|
|
263
|
+
|
|
264
|
+
// Combine structural + semantic similarity
|
|
265
|
+
// "Find similar claims that are also connected to this claimant"
|
|
266
|
+
```
|
|
267
|
+
|
|
268
|
+
**Why it matters**: Pure embedding similarity finds semantically similar entities. 1-hop awareness finds entities that are both similar AND structurally connected - critical for fraud ring detection where relationships matter as much as content.
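
To make the combination of structural and semantic signals concrete, here is a small in-memory sketch. `NeighborCache` and `filterConnected` are hypothetical stand-ins written for this example; they mirror the method names used above but are not the Rust-backed `EmbeddingService`.

```javascript
// Hypothetical in-memory stand-in for a 1-hop neighbor cache.
class NeighborCache {
  constructor() {
    this.outgoing = new Map() // subject -> Set of objects
    this.incoming = new Map() // object -> Set of subjects
  }

  onTripleInsert(subject, _predicate, object) {
    if (!this.outgoing.has(subject)) this.outgoing.set(subject, new Set())
    if (!this.incoming.has(object)) this.incoming.set(object, new Set())
    this.outgoing.get(subject).add(object)
    this.incoming.get(object).add(subject)
  }

  getNeighborsOut(id) { return [...(this.outgoing.get(id) ?? [])] }
  getNeighborsIn(id) { return [...(this.incoming.get(id) ?? [])] }
}

// Keep only candidates (e.g. semantic search hits) that share a 1-hop neighbor with the seed.
function filterConnected(cache, seed, candidates) {
  const seedNeighbors = new Set([
    ...cache.getNeighborsOut(seed),
    ...cache.getNeighborsIn(seed),
  ])
  return candidates.filter((candidate) =>
    [...cache.getNeighborsOut(candidate), ...cache.getNeighborsIn(candidate)]
      .some((neighbor) => seedNeighbors.has(neighbor)))
}

const cache = new NeighborCache()
cache.onTripleInsert('CLM001', 'claimant', 'P001')
cache.onTripleInsert('CLM002', 'claimant', 'P001') // same claimant as CLM001
cache.onTripleInsert('CLM003', 'claimant', 'P009') // unrelated claimant
const connected = filterConnected(cache, 'CLM001', ['CLM002', 'CLM003'])
console.log(connected)
```

In a fraud-ring setting, the candidate list would come from the similarity search and the filter would keep only the claims that also touch the same people or providers.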
|
|
269
|
+
|
|
270
|
+
---
|
|
271
|
+
|
|
272
|
+
## Embedding Service: Multi-Provider Vector Search
|
|
273
|
+
|
|
274
|
+
### Provider Abstraction
|
|
275
|
+
|
|
276
|
+
The EmbeddingService supports multiple embedding providers with a unified API:
|
|
277
|
+
|
|
278
|
+
```javascript
|
|
279
|
+
const { EmbeddingService } = require('rust-kgdb')
|
|
280
|
+
|
|
281
|
+
// Initialize service (uses built-in 384-dim embeddings by default)
|
|
282
|
+
const service = new EmbeddingService()
|
|
283
|
+
|
|
284
|
+
// Store embeddings from any provider
|
|
285
|
+
service.storeVector('entity1', openaiEmbedding) // 384-dim
|
|
286
|
+
service.storeVector('entity2', anthropicEmbedding) // 384-dim
|
|
287
|
+
service.storeVector('entity3', cohereEmbedding) // 384-dim
|
|
288
|
+
|
|
289
|
+
// HNSW similarity search (Rust-native, sub-ms)
|
|
290
|
+
service.rebuildIndex()
|
|
291
|
+
const similar = JSON.parse(service.findSimilar('entity1', 10, 0.7))
|
|
292
|
+
```
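
For intuition about what a `findSimilar(id, k, threshold)` call returns, the following sketch performs the same kind of search exhaustively. It assumes cosine similarity as the score, which the text above does not state explicitly, and `findSimilarExact` is a hypothetical helper rather than part of the package. An HNSW index approximates this result without scanning every stored vector.

```javascript
function cosine(a, b) {
  if (a.length !== b.length) throw new Error('dimension mismatch')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB)
}

// Exhaustive top-k search over a Map of id -> vector (assumed score: cosine similarity).
function findSimilarExact(vectors, queryId, k, threshold) {
  const query = vectors.get(queryId)
  if (!query) throw new Error(`unknown entity: ${queryId}`)
  return [...vectors]
    .filter(([id]) => id !== queryId)
    .map(([id, vector]) => ({ id, score: cosine(query, vector) }))
    .filter((hit) => hit.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
}

// Tiny 2-dimensional vectors keep the arithmetic easy to follow.
const vectors = new Map([
  ['entity1', [1, 0]],
  ['entity2', [1, 1]],
  ['entity3', [0, 1]],
])
const hits = findSimilarExact(vectors, 'entity1', 10, 0.7)
console.log(hits)
```

The exhaustive version costs one comparison per stored vector, which is the cost an approximate index is meant to avoid at scale.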
|
|
293
|
+
|
|
294
|
+
### Composite Multi-Provider Embeddings
|
|
295
|
+
|
|
296
|
+
For production deployments, combine multiple providers for robustness:
|
|
297
|
+
|
|
298
|
+
```javascript
|
|
299
|
+
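// Store embeddings from multiple providers for the same entity.
// Run this inside an async function: `await` is not allowed at the top level of a CommonJS module.
// `openai`, `voyage`, and `cohere` are assumed to be thin wrappers that expose `embed(text)`.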
|
|
300
|
+
service.storeComposite('CLM001', JSON.stringify({
|
|
301
|
+
openai: await openai.embed('Insurance claim for soft tissue injury'),
|
|
302
|
+
voyage: await voyage.embed('Insurance claim for soft tissue injury'),
|
|
303
|
+
cohere: await cohere.embed('Insurance claim for soft tissue injury')
|
|
304
|
+
}))
|
|
305
|
+
|
|
306
|
+
// Search with aggregation strategies
|
|
307
|
+
const rrfResults = service.findSimilarComposite('CLM001', 10, 0.7, 'rrf') // Reciprocal Rank Fusion
|
|
308
|
+
const maxResults = service.findSimilarComposite('CLM001', 10, 0.7, 'max') // Max score
|
|
309
|
+
const voteResults = service.findSimilarComposite('CLM001', 10, 0.7, 'voting') // Majority voting
|
|
310
|
+
```
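
Of the three aggregation strategies, Reciprocal Rank Fusion is the least obvious. The sketch below shows the standard formula, where each item scores the sum of `1 / (k + rank)` over every ranking it appears in. It illustrates the general technique under the conventional constant `k = 60`; how `findSimilarComposite` implements `'rrf'` internally is not shown in this README.

```javascript
// Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d)), with rank starting at 1.
function reciprocalRankFusion(rankings, k = 60) {
  const scores = new Map()
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1))
    })
  }
  return [...scores]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
}

// Three hypothetical per-provider rankings, best match first.
const fused = reciprocalRankFusion([
  ['CLM002', 'CLM003', 'CLM004'],
  ['CLM003', 'CLM002'],
  ['CLM003', 'CLM004'],
])
console.log(fused.map((hit) => hit.id))
```

Because only ranks are used, the providers' raw similarity scores never need to be on a comparable scale.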
|
|
311
|
+
|
|
312
|
+
### Provider Configuration
|
|
313
|
+
|
|
314
|
+
Configure your embedding providers with API keys:
|
|
315
|
+
|
|
316
|
+
```javascript
|
|
317
|
+
// Example: Using OpenAI embeddings
|
|
318
|
+
const { OpenAI } = require('openai')
|
|
319
|
+
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })
|
|
320
|
+
|
|
321
|
+
async function getOpenAIEmbedding(text) {
|
|
322
|
+
const response = await openai.embeddings.create({
|
|
323
|
+
model: 'text-embedding-3-small',
|
|
324
|
+
input: text,
|
|
325
|
+
dimensions: 384 // Match rust-kgdb's 384-dim format
|
|
326
|
+
})
|
|
327
|
+
return response.data[0].embedding
|
|
328
|
+
}
|
|
329
|
+
|
|
330
|
+
// Example: Using Anthropic (via their embedding partner)
|
|
331
|
+
// Note: Anthropic doesn't provide embeddings directly; use Voyage AI
|
|
332
|
+
const { VoyageAIClient } = require('voyageai')
|
|
333
|
+
const voyage = new VoyageAIClient({ apiKey: process.env.VOYAGE_API_KEY })
|
|
334
|
+
|
|
335
|
+
async function getVoyageEmbedding(text) {
|
|
336
|
+
const response = await voyage.embed({
|
|
337
|
+
input: text,
|
|
338
|
+
model: 'voyage-2'
|
|
339
|
+
})
|
|
340
|
+
return response.embeddings[0].slice(0, 384) // Truncate to 384-dim
|
|
341
|
+
}
|
|
342
|
+
```
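
Slicing a longer vector down to 384 dimensions leaves it without unit length, which can distort cosine or dot-product scores. A minimal sketch of truncating and re-normalizing follows; `truncateAndNormalize` is a hypothetical helper. Whether truncation keeps enough quality depends on the embedding model, which this README does not address.

```javascript
// Hypothetical helper: keep the first `dims` components, then rescale to unit length.
function truncateAndNormalize(vector, dims = 384) {
  if (vector.length < dims) {
    throw new Error(`expected at least ${dims} dimensions, got ${vector.length}`)
  }
  const head = vector.slice(0, dims)
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0))
  if (norm === 0) throw new Error('cannot normalize a zero vector')
  return head.map((x) => x / norm)
}

// Small example: keep 2 of 3 components, then rescale.
const unit = truncateAndNormalize([3, 4, 12], 2)
console.log(unit)
```

Where a provider can return the target size directly, as the OpenAI call above does with `dimensions: 384`, that is likely the safer route.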
|
|
343
|
+
|
|
344
|
+
---
|
|
345
|
+
|
|
346
|
+
## Graph Ingestion Pipeline with Embedding Triggers
|
|
347
|
+
|
|
348
|
+
### Automatic Embedding on Triple Insert
|
|
349
|
+
|
|
350
|
+
Configure your pipeline to automatically generate embeddings when triples are inserted:
|
|
351
|
+
|
|
352
|
+
```javascript
|
|
353
|
+
const { GraphDB, EmbeddingService } = require('rust-kgdb')
|
|
354
|
+
|
|
355
|
+
// Initialize services
|
|
356
|
+
const db = new GraphDB('http://insurance.org/claims')
|
|
357
|
+
const embeddings = new EmbeddingService()
|
|
358
|
+
|
|
359
|
+
// Embedding provider (configure with your API key)
|
|
360
|
+
async function getEmbedding(text) {
|
|
361
|
+
// Replace with your provider (OpenAI, Voyage, Cohere, etc.)
|
|
362
|
+
return new Array(384).fill(0).map(() => Math.random())
|
|
363
|
+
}
|
|
364
|
+
|
|
365
|
+
// Ingestion pipeline with embedding triggers
|
|
366
|
+
async function ingestClaim(claim) {
|
|
367
|
+
// 1. Insert structured data into knowledge graph
|
|
368
|
+
db.loadTtl(`
|
|
369
|
+
@prefix : <http://insurance.org/> .
|
|
370
|
+
:${claim.id} a :Claim ;
|
|
371
|
+
:amount "${claim.amount}" ;
|
|
372
|
+
:description "${claim.description}" ;
|
|
373
|
+
:claimant :${claim.claimantId} ;
|
|
374
|
+
:provider :${claim.providerId} .
|
|
375
|
+
`, null)
|
|
376
|
+
|
|
377
|
+
// 2. Generate and store embedding for semantic search
|
|
378
|
+
const vector = await getEmbedding(claim.description)
|
|
379
|
+
embeddings.storeVector(claim.id, vector)
|
|
380
|
+
|
|
381
|
+
// 3. Update 1-hop cache for neighbor-aware search
|
|
382
|
+
embeddings.onTripleInsert(claim.id, 'claimant', claim.claimantId, null)
|
|
383
|
+
embeddings.onTripleInsert(claim.id, 'provider', claim.providerId, null)
|
|
384
|
+
|
|
385
|
+
// 4. Rebuild the index (when ingesting a batch, a single rebuild after the loop, as in processBatch, avoids repeated work)
|
|
386
|
+
embeddings.rebuildIndex()
|
|
387
|
+
|
|
388
|
+
return { tripleCount: db.countTriples(), embeddingStored: true }
|
|
389
|
+
}
|
|
390
|
+
|
|
391
|
+
// Process batch with embedding triggers
|
|
392
|
+
async function processBatch(claims) {
|
|
393
|
+
for (const claim of claims) {
|
|
394
|
+
await ingestClaim(claim)
|
|
395
|
+
console.log(`Ingested: ${claim.id}`)
|
|
396
|
+
}
|
|
397
|
+
|
|
398
|
+
// Rebuild HNSW index after batch
|
|
399
|
+
embeddings.rebuildIndex()
|
|
400
|
+
console.log(`Index rebuilt with ${claims.length} new embeddings`)
|
|
401
|
+
}
|
|
402
|
+
```
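
The template above interpolates claim fields straight into Turtle, so a description containing a double quote or a newline would yield invalid Turtle or unintended triples. The sketch below shows one way to guard against that. The helper names are hypothetical, and typing the amount as `xsd:double` is an assumption made so that numeric filters such as `?amt > 50000` compare numbers rather than strings.

```javascript
// Escape a string for use inside a double-quoted Turtle literal.
function escapeTurtleString(value) {
  return String(value)
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r')
    .replace(/\t/g, '\\t')
}

// Hypothetical guard: accept only identifiers that are safe as a Turtle local name.
function assertLocalName(id) {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(id)) throw new Error(`unsafe identifier: ${id}`)
  return id
}

function claimToTurtle(claim) {
  const amount = Number(claim.amount)
  if (!Number.isFinite(amount)) throw new Error('amount must be numeric')
  return [
    '@prefix : <http://insurance.org/> .',
    '@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .',
    `:${assertLocalName(claim.id)} a :Claim ;`,
    `    :amount "${amount}"^^xsd:double ;`,
    `    :description "${escapeTurtleString(claim.description)}" ;`,
    `    :claimant :${assertLocalName(claim.claimantId)} ;`,
    `    :provider :${assertLocalName(claim.providerId)} .`,
  ].join('\n')
}

const ttl = claimToTurtle({
  id: 'CLM001',
  amount: 75000,
  description: 'Said "whiplash"\nafter impact',
  claimantId: 'P001',
  providerId: 'PRV01',
})
console.log(ttl)
```

The resulting string can then be passed to `db.loadTtl(ttl, null)` in place of the inline template.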
|
|
403
|
+
|
|
404
|
+
### Pipeline Architecture
|
|
405
|
+
|
|
406
|
+
```
|
|
407
|
+
┌─────────────────────────────────────────────────────────────────────────┐
|
|
408
|
+
│ GRAPH INGESTION PIPELINE │
|
|
409
|
+
│ │
|
|
410
|
+
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
|
|
411
|
+
│ │ Data Source │ │ Transform │ │ Enrich │ │
|
|
412
|
+
│ │ (JSON/CSV) │────▶│ (to RDF) │────▶│ (+Embeddings)│ │
|
|
413
|
+
│ └───────────────┘ └───────────────┘ └───────┬───────┘ │
|
|
414
|
+
│ │ │
|
|
415
|
+
│ ┌───────────────────────────────────────────────────┼───────────────┐ │
|
|
416
|
+
│ │ TRIGGERS │ │ │
|
|
417
|
+
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┴─────────────┐ │ │
|
|
418
|
+
│ │ │ Embedding │ │ 1-Hop │ │ HNSW Index │ │ │
|
|
419
|
+
│ │ │ Generation │ │ Cache │ │ Rebuild │ │ │
|
|
420
|
+
│ │ │ (per entity)│ │ Update │ │ (batch/periodic) │ │ │
|
|
421
|
+
│ │ └─────────────┘ └─────────────┘ └───────────────────────────┘ │ │
|
|
422
|
+
│ └───────────────────────────────────────────────────────────────────┘ │
|
|
423
|
+
│ │ │
|
|
424
|
+
│ ▼ │
|
|
425
|
+
│ ┌───────────────────────────────────────────────────────────────────┐ │
|
|
426
|
+
│ │ RUST CORE (NAPI-RS) │ │
|
|
427
|
+
│ │ GraphDB (triples) │ EmbeddingService (vectors) │ HNSW (index) │ │
|
|
428
|
+
│ └───────────────────────────────────────────────────────────────────┘ │
|
|
429
|
+
└─────────────────────────────────────────────────────────────────────────┘
|
|
430
|
+
```
|
|
431
|
+
|
|
432
|
+
---
|
|
433
|
+
|
|
434
|
+
## HyperAgent Framework Components
|
|
435
|
+
|
|
436
|
+
The HyperMind agent framework provides complete infrastructure for building neuro-symbolic AI agents:
|
|
437
|
+
|
|
438
|
+
### Architecture Overview
|
|
439
|
+
|
|
440
|
+
```
|
|
441
|
+
┌─────────────────────────────────────────────────────────────────────────┐
|
|
442
|
+
│ HYPERAGENT FRAMEWORK │
|
|
443
|
+
│ │
|
|
444
|
+
│ ┌─────────────────────────────────────────────────────────────────┐ │
|
|
445
|
+
│ │ GOVERNANCE LAYER │ │
|
|
446
|
+
│ │ Policy Engine | Capability Grants | Audit Trail | Compliance │ │
|
|
447
|
+
│ └─────────────────────────────────────────────────────────────────┘ │
|
|
448
|
+
│ │ │
|
|
449
|
+
│ ┌───────────────────────────────┼─────────────────────────────────┐ │
|
|
450
|
+
│ │ RUNTIME LAYER │ │
|
|
451
|
+
│ │ ┌──────────────┐ ┌───────┴───────┐ ┌──────────────┐ │ │
|
|
452
|
+
│ │ │ LLMPlanner │ │ PlanExecutor │ │ WasmSandbox │ │ │
|
|
453
|
+
│ │ │ (Claude/GPT)│───▶│ (Type-safe) │───▶│ (Isolated) │ │ │
|
|
454
|
+
│ │ └──────────────┘ └───────────────┘ └──────┬───────┘ │ │
|
|
455
|
+
│ └──────────────────────────────────────────────────┼──────────────┘ │
|
|
456
|
+
│ │ │
|
|
457
|
+
│ ┌──────────────────────────────────────────────────┼──────────────┐ │
|
|
458
|
+
│ │ PROXY LAYER │ │ │
|
|
459
|
+
│ │ Object Proxy: All tool calls flow through typed morphism layer │ │
|
|
460
|
+
│ │ ┌────────────────────────────────────────────────┴───────────┐ │ │
|
|
461
|
+
│ │ │ proxy.call('kg.sparql.query', { query }) → BindingSet │ │ │
|
|
462
|
+
│ │ │ proxy.call('kg.motif.find', { pattern }) → List<Match> │ │ │
|
|
463
|
+
│ │ │ proxy.call('kg.datalog.infer', { rules }) → List<Fact> │ │ │
|
|
464
|
+
│ │ │ proxy.call('kg.embeddings.search', { entity }) → Similar │ │ │
|
|
465
|
+
│ │ └────────────────────────────────────────────────────────────┘ │ │
|
|
466
|
+
│ └─────────────────────────────────────────────────────────────────┘ │
|
|
467
|
+
│ │
|
|
468
|
+
│ ┌─────────────────────────────────────────────────────────────────┐ │
|
|
469
|
+
│ │ MEMORY LAYER │ │
|
|
470
|
+
│ │ Working Memory | Long-term Memory | Episodic Memory │ │
|
|
471
|
+
│ │ (Current context) (Knowledge graph) (Execution history) │ │
|
|
472
|
+
│ └─────────────────────────────────────────────────────────────────┘ │
|
|
473
|
+
│ │
|
|
474
|
+
│ ┌─────────────────────────────────────────────────────────────────┐ │
|
|
475
|
+
│ │ SCOPE LAYER │ │
|
|
476
|
+
│ │ Namespace isolation | Resource limits | Capability boundaries │ │
|
|
477
|
+
│ └─────────────────────────────────────────────────────────────────┘ │
|
|
478
|
+
└─────────────────────────────────────────────────────────────────────────┘
|
|
479
|
+
```
|
|
480
|
+
|
|
481
|
+
### Component Details
|
|
482
|
+
|
|
483
|
+
**Governance Layer**: Policy-based control over agent behavior
|
|
484
|
+
```javascript
|
|
485
|
+
const agent = new AgentBuilder('compliance-agent')
|
|
486
|
+
.withPolicy({
|
|
487
|
+
maxExecutionTime: 30000, // 30 second timeout
|
|
488
|
+
allowedTools: ['kg.sparql.query', 'kg.datalog.infer'],
|
|
489
|
+
deniedTools: ['kg.update', 'kg.delete'], // Read-only
|
|
490
|
+
auditLevel: 'full' // Log all tool calls
|
|
491
|
+
})
|
|
492
|
+
```
|
|
493
|
+
|
|
494
|
+
**Runtime Layer**: Type-safe plan execution
|
|
495
|
+
```javascript
|
|
496
|
+
const { LLMPlanner, TOOL_REGISTRY } = require('rust-kgdb/hypermind-agent')
|
|
497
|
+
|
|
498
|
+
const planner = new LLMPlanner('claude-sonnet-4', TOOL_REGISTRY)
|
|
499
|
+
const plan = await planner.plan("Find suspicious claims")
|
|
500
|
+
// plan.steps: [{tool: 'kg.sparql.query', args: {...}}, ...]
|
|
501
|
+
// plan.confidence: 0.92
|
|
502
|
+
```
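
Type-safe plan execution rests on one check: a step may follow another only if the previous step's output type matches its input type. The sketch below illustrates that composition rule with made-up `demo.*` tools and type names. It is not how `TOOL_REGISTRY` is structured, which this README does not show.

```javascript
// Hypothetical signatures: each tool is treated as a morphism input -> output.
const SIGNATURES = new Map([
  ['demo.query', { input: 'String', output: 'BindingSet' }],
  ['demo.toFacts', { input: 'BindingSet', output: 'List<Fact>' }],
  ['demo.search', { input: 'Entity', output: 'List<Similar>' }],
])

// A pipeline f ; g is well-typed only if output(f) equals input(g).
function checkPipeline(toolNames, signatures = SIGNATURES) {
  const errors = []
  let previous = null
  toolNames.forEach((name, i) => {
    const sig = signatures.get(name)
    if (!sig) {
      errors.push(`step ${i}: unknown tool ${name}`)
      previous = null
      return
    }
    if (previous && previous.output !== sig.input) {
      errors.push(`step ${i}: ${name} expects ${sig.input} but receives ${previous.output}`)
    }
    previous = sig
  })
  return { ok: errors.length === 0, errors }
}

const good = checkPipeline(['demo.query', 'demo.toFacts'])
const bad = checkPipeline(['demo.query', 'demo.search'])
console.log(good, bad)
```

Running the check on a plan before any tool is invoked means a mismatched pipeline fails early, without touching the knowledge graph.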
|
|
503
|
+
|
|
504
|
+
**Proxy Layer**: All Rust interactions through typed morphisms
|
|
505
|
+
```javascript
|
|
506
|
+
const sandbox = new WasmSandbox({
|
|
507
|
+
capabilities: ['ReadKG', 'ExecuteTool'],
|
|
508
|
+
fuelLimit: 1000000
|
|
509
|
+
})
|
|
510
|
+
|
|
511
|
+
const proxy = sandbox.createObjectProxy({
|
|
512
|
+
'kg.sparql.query': (args) => db.querySelect(args.query),
|
|
513
|
+
'kg.embeddings.search': (args) => embeddings.findSimilar(args.entity, args.k, args.threshold)
|
|
514
|
+
})
|
|
515
|
+
|
|
516
|
+
// All calls are logged, metered, and capability-checked
|
|
517
|
+
const result = await proxy['kg.sparql.query']({ query: 'SELECT ?x WHERE { ?x a :Fraud }' })
|
|
518
|
+
```
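
The comment "logged, metered, and capability-checked" can be made concrete with a plain JavaScript `Proxy`. The following is a simplified sketch under stated assumptions: `createToolProxy`, the `required` capability map, and the flat cost of one fuel unit per call are all invented for illustration and do not describe `WasmSandbox` internals.

```javascript
// Hypothetical stand-in for a capability-checked, metered tool proxy.
function createToolProxy(tools, { capabilities, fuelLimit, required }) {
  const granted = new Set(capabilities)
  const log = []
  let fuel = fuelLimit
  const proxy = new Proxy(tools, {
    get(target, name) {
      if (typeof name !== 'string' || !Object.prototype.hasOwnProperty.call(target, name)) {
        return undefined
      }
      // Returns whatever the tool returns, so async tools still yield promises.
      return (args) => {
        const missing = (required[name] ?? []).filter((cap) => !granted.has(cap))
        if (missing.length > 0) {
          throw new Error(`${name}: missing capability ${missing.join(', ')}`)
        }
        if (fuel <= 0) throw new Error(`${name}: fuel exhausted`)
        fuel -= 1 // flat cost per call, purely illustrative
        log.push({ tool: name, args })
        return target[name](args)
      }
    },
  })
  return { proxy, log, remainingFuel: () => fuel }
}

const { proxy, log, remainingFuel } = createToolProxy(
  {
    'kg.sparql.query': ({ query }) => [`rows for: ${query}`],
    'kg.update': () => 'updated',
  },
  {
    capabilities: ['ReadKG', 'ExecuteTool'],
    fuelLimit: 2,
    required: { 'kg.sparql.query': ['ReadKG'], 'kg.update': ['WriteKG'] },
  },
)

const rows = proxy['kg.sparql.query']({ query: 'SELECT ?x WHERE { ?x a :Fraud }' })
let denied = null
try {
  proxy['kg.update']({})
} catch (err) {
  denied = err.message // WriteKG was never granted
}
console.log(rows, denied, log.length, remainingFuel())
```

A denied call is rejected before it consumes fuel or reaches the underlying tool, so the log records only calls that actually ran.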
|
|
519
|
+
|
|
520
|
+
**Memory Layer**: Context management across agent lifecycle
|
|
521
|
+
```javascript
|
|
522
|
+
const agent = new AgentBuilder('investigator')
|
|
523
|
+
.withMemory({
|
|
524
|
+
working: { maxSize: 1024 * 1024 }, // 1MB working memory
|
|
525
|
+
episodic: { retentionDays: 30 }, // 30-day execution history
|
|
526
|
+
longTerm: db // Knowledge graph as long-term memory
|
|
527
|
+
})
|
|
528
|
+
```
|
|
529
|
+
|
|
530
|
+
**Scope Layer**: Resource isolation and boundaries
|
|
531
|
+
```javascript
|
|
532
|
+
const agent = new AgentBuilder('scoped-agent')
|
|
533
|
+
.withScope({
|
|
534
|
+
namespace: 'fraud-detection',
|
|
535
|
+
resourceLimits: {
|
|
536
|
+
maxTriples: 1000000,
|
|
537
|
+
maxEmbeddings: 100000,
|
|
538
|
+
maxConcurrentQueries: 10
|
|
539
|
+
}
|
|
540
|
+
})
|
|
541
|
+
```
|
|
542
|
+
|
|
90
543
|
---
|
|
91
544
|
|
|
92
545
|
## Feature Overview
|
|
@@ -253,6 +706,202 @@ console.log('Inferred:', evaluateDatalog(datalog))
|
|
|
253
706
|
|
|
254
707
|
---
|
|
255
708
|
|
|
709
|
+
## HyperMind Architecture Deep Dive
|
|
710
|
+
|
|
711
|
+
For a complete walkthrough of the architecture, run:
|
|
712
|
+
```bash
|
|
713
|
+
node examples/hypermind-agent-architecture.js
|
|
714
|
+
```
|
|
715
|
+
|
|
716
|
+
### Full System Architecture
|
|
717
|
+
|
|
718
|
+
```
|
|
719
|
+
╔════════════════════════════════════════════════════════════════════════════════╗
|
|
720
|
+
║ HYPERMIND NEURO-SYMBOLIC ARCHITECTURE ║
|
|
721
|
+
╠════════════════════════════════════════════════════════════════════════════════╣
|
|
722
|
+
║ ║
|
|
723
|
+
║ ┌────────────────────────────────────────────────────────────────────────┐ ║
|
|
724
|
+
║ │ APPLICATION LAYER │ ║
|
|
725
|
+
║ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ ║
|
|
726
|
+
║ │ │ Fraud │ │ Underwriting│ │ Compliance │ │ Custom │ │ ║
|
|
727
|
+
║ │ │ Detection │ │ Agent │ │ Checker │ │ Agents │ │ ║
|
|
728
|
+
║ │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │ ║
|
|
729
|
+
║ └─────────┼────────────────┼────────────────┼────────────────┼───────────┘ ║
|
|
730
|
+
║ └────────────────┴────────┬───────┴────────────────┘ ║
|
|
731
|
+
║ │ ║
|
|
732
|
+
║ ┌───────────────────────────────────┼────────────────────────────────────┐ ║
|
|
733
|
+
║ │ HYPERMIND RUNTIME │ ║
|
|
734
|
+
║ │ ┌────────────────┐ ┌─────────┴─────────┐ ┌─────────────────┐ │ ║
|
|
735
|
+
║ │ │ LLM PLANNER │ │ PLAN EXECUTOR │ │ WASM SANDBOX │ │ ║
|
|
736
|
+
║ │ │ • Claude/GPT │───▶│ • Type validation │───▶│ • Capabilities │ │ ║
|
|
737
|
+
║ │ │ • Intent parse │ │ • Morphism compose│ │ • Fuel metering │ │ ║
|
|
738
|
+
║ │ │ • Tool select │ │ • Step execution │ │ • Memory limits │ │ ║
|
|
739
|
+
║ │ └────────────────┘ └───────────────────┘ └────────┬────────┘ │ ║
|
|
740
|
+
║ │ │ │ ║
|
|
741
|
+
║ │ ┌───────────────────────────────────────────────────────┼───────────┐ │ ║
|
|
742
|
+
║ │ │ OBJECT PROXY (gRPC-style) │ │ │ ║
|
|
743
|
+
║ │ │ proxy.call("kg.sparql.query", args) ────────────────┤ │ │ ║
|
|
744
|
+
║ │ │ proxy.call("kg.motif.find", args) ────────────────┤ │ │ ║
|
|
745
|
+
║ │ │ proxy.call("kg.datalog.infer", args) ────────────────┤ │ │ ║
|
|
746
|
+
║ │ └───────────────────────────────────────────────────────┼───────────┘ │ ║
|
|
747
|
+
║ └──────────────────────────────────────────────────────────┼─────────────┘ ║
|
|
748
|
+
║ │ ║
|
|
749
|
+
║ ┌──────────────────────────────────────────────────────────┼─────────────┐ ║
|
|
750
|
+
║ │ HYPERMIND TOOLS │ │ ║
|
|
751
|
+
║ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌───┴─────────┐ │ ║
|
|
752
|
+
║ │ │ SPARQL │ │ MOTIF │ │ DATALOG │ │ EMBEDDINGS │ │ ║
|
|
753
|
+
║ │ │ String → │ │ Pattern → │ │ Rules → │ │ Entity → │ │ ║
|
|
754
|
+
║ │ │ BindingSet │ │ List<Match> │ │ List<Fact> │ │ List<Sim> │ │ ║
|
|
755
|
+
║ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ ║
|
|
756
|
+
║ └────────────────────────────────────────────────────────────────────────┘ ║
|
|
757
|
+
║ ║
|
|
758
|
+
║ ┌────────────────────────────────────────────────────────────────────────┐ ║
|
|
759
|
+
║ │ rust-kgdb KNOWLEDGE GRAPH │ ║
|
|
760
|
+
║ │ RDF Triples | SPARQL 1.1 | GraphFrames | Embeddings | Datalog │ ║
|
|
761
|
+
║ │ 2.78µs lookups | 24 bytes/triple | 35x faster than RDFox │ ║
|
|
762
|
+
║ └────────────────────────────────────────────────────────────────────────┘ ║
|
|
763
|
+
╚════════════════════════════════════════════════════════════════════════════════╝
|
|
764
|
+
```
|
|
765
|
+
|
|
766
|
+
### Agent Execution Sequence
|
|
767
|
+
|
|
768
|
+
```
|
|
769
|
+
╔════════════════════════════════════════════════════════════════════════════════╗
|
|
770
|
+
║ HYPERMIND AGENT EXECUTION - SEQUENCE DIAGRAM ║
|
|
771
|
+
╠════════════════════════════════════════════════════════════════════════════════╣
|
|
772
|
+
║ ║
|
|
773
|
+
║ User SDK Planner Sandbox Proxy KG ║
|
|
774
|
+
║ │ │ │ │ │ │ ║
|
|
775
|
+
║ │ "Find suspicious claims" │ │ │ │ ║
|
|
776
|
+
║ │────────────▶│ │ │ │ │ ║
|
|
777
|
+
║ │ │ plan(prompt) │ │ │ │ ║
|
|
778
|
+
║ │ │─────────────▶│ │ │ │ ║
|
|
779
|
+
║ │ │ │ ┌──────────────────────────┐│ │ ║
|
|
780
|
+
║ │ │ │ │ LLM Reasoning: ││ │ ║
|
|
781
|
+
║ │ │ │ │ 1. Parse intent ││ │ ║
|
|
782
|
+
║ │ │ │ │ 2. Select tools ││ │ ║
|
|
783
|
+
║ │ │ │ │ 3. Validate types ││ │ ║
|
|
784
|
+
║ │ │ │ └──────────────────────────┘│ │ ║
|
|
785
|
+
║ │ │ Plan{steps, confidence} │ │ │ ║
|
|
786
|
+
║ │ │◀─────────────│ │ │ │ ║
|
|
787
|
+
║ │ │ execute(plan)│ │ │ │ ║
|
|
788
|
+
║ │ │─────────────────────────────▶ │ │ ║
|
|
789
|
+
║ │ │ │ ┌────────────────────────┐ │ │ ║
|
|
790
|
+
║ │ │ │ │ Sandbox Init: │ │ │ ║
|
|
791
|
+
║ │ │ │ │ • Capabilities: [Read] │ │ │ ║
|
|
792
|
+
║ │ │ │ │ • Fuel: 1,000,000 │ │ │ ║
|
|
793
|
+
║ │ │ │ └────────────────────────┘ │ │ ║
|
|
794
|
+
║ │ │ │ │ kg.sparql │ │ ║
|
|
795
|
+
║ │ │ │ │─────────────▶│───────────▶│ ║
|
|
796
|
+
║ │ │ │ │ │ BindingSet │ ║
|
|
797
|
+
║ │ │ │ │◀─────────────│◀───────────│ ║
|
|
798
|
+
║ │ │ │ │ kg.datalog │ │ ║
|
|
799
|
+
║ │ │ │ │─────────────▶│───────────▶│ ║
|
|
800
|
+
║ │ │ │ │ │ List<Fact> │ ║
|
|
801
|
+
║ │ │ │ │◀─────────────│◀───────────│ ║
|
|
802
|
+
║ │ │ ExecutionResult{findings, witness} │ │ ║
|
|
803
|
+
║ │ │◀───────────────────────────── │ │ ║
|
|
804
|
+
║ │ "Found 2 collusion patterns. Evidence: ..." │ │ ║
|
|
805
|
+
║ │◀────────────│ │ │ │ │ ║
|
|
806
|
+
╚════════════════════════════════════════════════════════════════════════════════╝
|
|
807
|
+
```
|
|
808
|
+
|
|
809
|
+
### Architecture Components (v0.5.8+)
|
|
810
|
+
|
|
811
|
+
The TypeScript SDK exports production-ready HyperMind components. All tool execution flows through the **WasmSandbox** layer, which applies capability checks, fuel metering, and memory limits:
|
|
812
|
+
|
|
813
|
+
```javascript
|
|
814
|
+
const {
|
|
815
|
+
// Type System (Hindley-Milner style)
|
|
816
|
+
TypeId, // Base types + refinement types (RiskScore, PolicyNumber)
|
|
817
|
+
TOOL_REGISTRY, // Tools as typed morphisms (category theory)
|
|
818
|
+
|
|
819
|
+
// Runtime Components
|
|
820
|
+
LLMPlanner, // Natural language → typed tool pipelines
|
|
821
|
+
WasmSandbox, // Secure WASM isolation with capability-based security
|
|
822
|
+
AgentBuilder, // Fluent builder for agent composition
|
|
823
|
+
ComposedAgent, // Executable agent with execution witness
|
|
824
|
+
} = require('rust-kgdb/hypermind-agent')
|
|
825
|
+
```
|
|
826
|
+
|
|
827
|
+
**Example: Build a Custom Agent**
|
|
828
|
+
```javascript
|
|
829
|
+
const { AgentBuilder, LLMPlanner, TypeId, TOOL_REGISTRY } = require('rust-kgdb/hypermind-agent')
|
|
830
|
+
|
|
831
|
+
// Compose an agent using the builder pattern
|
|
832
|
+
const agent = new AgentBuilder('compliance-checker')
|
|
833
|
+
.withTool('kg.sparql.query')
|
|
834
|
+
.withTool('kg.datalog.infer')
|
|
835
|
+
.withPlanner(new LLMPlanner('claude-sonnet-4', TOOL_REGISTRY))
|
|
836
|
+
.withSandbox({
|
|
837
|
+
capabilities: ['ReadKG', 'ExecuteTool'], // No WriteKG for safety
|
|
838
|
+
fuelLimit: 1000000,
|
|
839
|
+
maxMemory: 64 * 1024 * 1024 // 64MB
|
|
840
|
+
})
|
|
841
|
+
.withHook('afterExecute', (step, result) => {
|
|
842
|
+
console.log(`Completed: ${step.tool} → ${result.length} results`)
|
|
843
|
+
})
|
|
844
|
+
.build()
|
|
845
|
+
|
|
846
|
+
// Execute with natural language
|
|
847
|
+
const result = await agent.call("Check compliance status for all vendors")
|
|
848
|
+
console.log(result.witness.proof_hash) // sha256:...
|
|
849
|
+
```
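
The `proof_hash` printed above is a SHA-256 digest. As a rough sketch of how such a witness could be derived from an execution trace, the code below hashes a canonical serialization of the steps using Node's built-in `crypto` module. The trace shape and the `canonicalize` and `makeWitness` helpers are assumptions. The README does not specify what `ExecutionWitness` actually hashes.

```javascript
const { createHash } = require('crypto')

// Serialize with sorted object keys so equal traces always hash to the same value.
function canonicalize(value) {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(',')}]`
  if (value && typeof value === 'object') {
    const entries = Object.keys(value)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(value[k])}`)
    return `{${entries.join(',')}}`
  }
  return JSON.stringify(value)
}

// Hypothetical witness: a SHA-256 digest over the canonical execution trace.
function makeWitness(trace) {
  const digest = createHash('sha256').update(canonicalize(trace)).digest('hex')
  return { proof_hash: `sha256:${digest}`, steps: trace.length }
}

// Same trace, different key order: the hashes should match.
const a = makeWitness([{ tool: 'kg.sparql.query', args: { query: 'Q' }, resultCount: 2 }])
const b = makeWitness([{ resultCount: 2, args: { query: 'Q' }, tool: 'kg.sparql.query' }])
console.log(a.proof_hash === b.proof_hash)
```

A digest like this makes later tampering with a stored trace detectable, provided the hash itself is kept somewhere the agent cannot rewrite.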
|
|
850
|
+
|
|
851
|
+
---
|
|
852
|
+
|
|
853
|
+
## HyperMind vs MCP (Model Context Protocol)
|
|
854
|
+
|
|
855
|
+
Why domain-enriched proxies beat generic function calling:
|
|
856
|
+
|
|
857
|
+
```
┌───────────────────────┬──────────────────────┬──────────────────────────┐
│ Feature               │ MCP                  │ HyperMind Proxy          │
├───────────────────────┼──────────────────────┼──────────────────────────┤
│ Type Safety           │ ❌ String only       │ ✅ Full type system      │
│ Domain Knowledge      │ ❌ Generic           │ ✅ Domain-enriched       │
│ Tool Composition      │ ❌ Isolated          │ ✅ Morphism composition  │
│ Validation            │ ❌ Runtime           │ ✅ Compile-time          │
│ Security              │ ❌ None              │ ✅ WASM sandbox          │
│ Audit Trail           │ ❌ None              │ ✅ Execution witness     │
│ LLM Context           │ ❌ Generic schema    │ ✅ Rich domain hints     │
│ Capability Control    │ ❌ All or nothing    │ ✅ Fine-grained caps     │
├───────────────────────┼──────────────────────┼──────────────────────────┤
│ Result                │ 60% accuracy         │ 95%+ accuracy            │
│                       │ "I think this might  │ "Rule R1 matched facts   │
│                       │ be suspicious..."    │ F1,F2,F3. Proof: ..."    │
└───────────────────────┴──────────────────────┴──────────────────────────┘
```

### The Key Insight

- **MCP**: LLM generates query → hope it works
- **HyperMind**: LLM selects tools → type system validates → guaranteed correct

```javascript
// MCP APPROACH (Generic function calling)
// Tool: search_database(query: string)
// LLM generates: "SELECT * FROM claims WHERE suspicious = true"
// Result: ❌ SQL injection risk, "suspicious" column doesn't exist

// HYPERMIND APPROACH (Domain-enriched proxy)
// Tool: kg.datalog.infer with NICB fraud rules
const proxy = sandbox.createObjectProxy(tools)
const result = await proxy['kg.datalog.infer']({
  rules: ['potential_collusion', 'staged_accident']
})
// Result: ✅ Type-safe, domain-aware, auditable
```
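
To make the "LLM selects tools → type system validates" step concrete, the sketch below checks tool arguments against a declared specification before anything runs. It is a plain-JavaScript approximation under stated assumptions: `toolSpec`, `KNOWN_RULES`, and `checkedCall` are hypothetical names for illustration, and the real proxy may validate quite differently.

```javascript
// Hypothetical registry entry; the spec shape is illustrative, not rust-kgdb's.
const KNOWN_RULES = new Set(['potential_collusion', 'staged_accident'])

const toolSpec = {
  name: 'kg.datalog.infer',
  // Returns a list of problems; an empty list means the arguments are acceptable
  validate(args) {
    const errors = []
    if (!Array.isArray(args.rules) || args.rules.length === 0) {
      errors.push('rules must be a non-empty array')
    } else {
      for (const rule of args.rules) {
        if (!KNOWN_RULES.has(rule)) errors.push(`unknown rule: ${rule}`)
      }
    }
    return errors
  }
}

// Run the tool only when validation passes; otherwise report why it was refused
function checkedCall(spec, args, run) {
  const errors = spec.validate(args)
  if (errors.length > 0) return { ok: false, errors }
  return { ok: true, value: run(args) }
}

console.log(checkedCall(toolSpec, { rules: ['potential_collusion'] }, a => `evaluating ${a.rules.length} rule(s)`))
console.log(checkedCall(toolSpec, { rules: ['drop_tables'] }, () => 'never runs'))
```

The second call never reaches the tool body: an argument the model invented is rejected up front with a specific error, instead of being passed to a query engine as free text.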

**Why Domain Proxies Win:**
1. LLM becomes **orchestrator**, not executor
2. Domain knowledge **reduces hallucination**
3. Composition **multiplies capability**
4. Audit trail **enables compliance**
5. Security **enables enterprise deployment**
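
The composition point can be sketched as follows: if each tool declares an input and an output type, two tools can be chained only when the first tool's output type matches the second tool's input type. This is a minimal illustration, not HyperMind's type system; the `tools` table, the type names (`Query`, `BindingSet`, `FactSet`), and `compose` are assumptions made up for the example.

```javascript
// Hypothetical tool signatures; type names and tool bodies are illustrative only.
const tools = {
  query: { input: 'Query', output: 'BindingSet', run: () => [{ claim: 'CLM001' }] },
  infer: { input: 'BindingSet', output: 'FactSet', run: rows => rows.map(r => ['flagged', r.claim]) }
}

// Compose f then g, refusing pipelines whose types do not line up
function compose(f, g) {
  if (f.output !== g.input) {
    throw new TypeError(`cannot compose: ${f.output} does not match ${g.input}`)
  }
  return { input: f.input, output: g.output, run: x => g.run(f.run(x)) }
}

const pipeline = compose(tools.query, tools.infer)
console.log(pipeline.input, '→', pipeline.output)
console.log(JSON.stringify(pipeline.run('SELECT ?claim WHERE { ?claim a :Claim }')))
```

Composing in the wrong order, `compose(tools.infer, tools.query)`, throws before any tool runs, which is the property the list above relies on.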

---

## Why Vanilla LLMs Fail

When you ask an LLM to query a knowledge graph, it produces **broken SPARQL 85% of the time**:
@@ -551,51 +1200,178 @@ rust-kgdb includes a complete ontology engine based on W3C standards.

**Pattern Recognition:** Circular payment detection mirrors real SIU (Special Investigation Unit) methodologies from major insurers.

### Pre-Steps: Dataset and Embedding Configuration

Before running the fraud detection pipeline, configure your environment:

```javascript
// ============================================================
// STEP 1: Environment Configuration
// ============================================================
const { GraphDB, GraphFrame, EmbeddingService, DatalogProgram, evaluateDatalog } = require('rust-kgdb')
const { AgentBuilder, LLMPlanner, WasmSandbox, TOOL_REGISTRY } = require('rust-kgdb/hypermind-agent')

// Configure embedding provider (choose one)
const EMBEDDING_PROVIDER = process.env.EMBEDDING_PROVIDER || 'mock'
const OPENAI_API_KEY = process.env.OPENAI_API_KEY
const VOYAGE_API_KEY = process.env.VOYAGE_API_KEY

// Embedding dimension must match provider output
const EMBEDDING_DIM = 384

// ============================================================
// STEP 2: Initialize Services
// ============================================================
const db = new GraphDB('http://insurance.org/fraud-kb')
const embeddings = new EmbeddingService()

// ============================================================
// STEP 3: Configure Embedding Provider
// ============================================================
async function getEmbedding(text) {
  switch (EMBEDDING_PROVIDER) {
    case 'openai': {
      const { OpenAI } = require('openai')
      const openai = new OpenAI({ apiKey: OPENAI_API_KEY })
      const resp = await openai.embeddings.create({
        model: 'text-embedding-3-small',
        input: text,
        dimensions: EMBEDDING_DIM
      })
      return resp.data[0].embedding
    }

    case 'voyage': {
      const { VoyageAIClient } = require('voyageai')
      const voyage = new VoyageAIClient({ apiKey: VOYAGE_API_KEY })
      const vResp = await voyage.embed({ input: text, model: 'voyage-2' })
      // The Voyage response carries vectors under data[i].embedding.
      // Truncating to EMBEDDING_DIM keeps lengths consistent across providers,
      // but may reduce embedding quality.
      return vResp.data[0].embedding.slice(0, EMBEDDING_DIM)
    }

    default:  // Mock embeddings for testing
      return new Array(EMBEDDING_DIM).fill(0).map((_, i) =>
        Math.sin(text.charCodeAt(i % text.length) * 0.1) * 0.5 + 0.5
      )
  }
}

// ============================================================
// STEP 4: Load Dataset with Embedding Triggers
// ============================================================
async function loadClaimsDataset() {
  // Load structured RDF data
  db.loadTtl(`
    @prefix : <http://insurance.org/> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    # Claims
    :CLM001 a :Claim ;
      :amount "18500"^^xsd:decimal ;
      :description "Soft tissue injury from rear-end collision" ;
      :claimant :P001 ;
      :provider :PROV001 ;
      :filingDate "2024-11-15"^^xsd:date .

    :CLM002 a :Claim ;
      :amount "22300"^^xsd:decimal ;
      :description "Whiplash injury from vehicle accident" ;
      :claimant :P002 ;
      :provider :PROV001 ;
      :filingDate "2024-11-18"^^xsd:date .

    # Claimants
    :P001 a :Claimant ;
      :name "John Smith" ;
      :address "123 Main St, Miami, FL" ;
      :riskScore "0.85"^^xsd:decimal .

    :P002 a :Claimant ;
      :name "Jane Doe" ;
      :address "123 Main St, Miami, FL" ;  # Same address!
      :riskScore "0.72"^^xsd:decimal .

    # Relationships (fraud indicators)
    :P001 :knows :P002 .
    :P001 :paidTo :P002 .
    :P002 :paidTo :P003 .
    :P003 :paidTo :P001 .  # Circular payment!

    # Provider
    :PROV001 a :Provider ;
      :name "Quick Care Rehabilitation Clinic" ;
      :flagCount "4"^^xsd:integer .
  `, null)

  console.log(`[Dataset] Loaded ${db.countTriples()} triples`)

  // Generate embeddings for claims (TRIGGER)
  const claims = ['CLM001', 'CLM002']
  for (const claimId of claims) {
    const desc = db.querySelect(`
      PREFIX : <http://insurance.org/>
      SELECT ?desc WHERE { :${claimId} :description ?desc }
    `)[0]?.bindings?.desc || claimId

    const vector = await getEmbedding(desc)
    embeddings.storeVector(claimId, vector)
    console.log(`[Embedding] Stored ${claimId}: ${vector.slice(0, 3).map(v => v.toFixed(3)).join(', ')}...`)
  }

  // Update 1-hop cache (TRIGGER)
  embeddings.onTripleInsert('CLM001', 'claimant', 'P001', null)
  embeddings.onTripleInsert('CLM001', 'provider', 'PROV001', null)
  embeddings.onTripleInsert('CLM002', 'claimant', 'P002', null)
  embeddings.onTripleInsert('CLM002', 'provider', 'PROV001', null)
  embeddings.onTripleInsert('P001', 'knows', 'P002', null)
  console.log('[1-Hop Cache] Updated neighbor relationships')

  // Rebuild HNSW index
  embeddings.rebuildIndex()
  console.log('[HNSW Index] Rebuilt for similarity search')
}

// ============================================================
// STEP 5: Run Fraud Detection Pipeline
// ============================================================
async function runFraudDetection() {
  await loadClaimsDataset()

  // Graph network analysis
  const graph = new GraphFrame(
    JSON.stringify([{id:'P001'}, {id:'P002'}, {id:'P003'}]),
    JSON.stringify([
      {src:'P001', dst:'P002'},
      {src:'P002', dst:'P003'},
      {src:'P003', dst:'P001'}
    ])
  )

  const triangles = graph.triangleCount()
  console.log(`[GraphFrame] Fraud rings detected: ${triangles}`)

  // Semantic similarity search
  const similarClaims = JSON.parse(embeddings.findSimilar('CLM001', 5, 0.7))
  console.log(`[Embeddings] Claims similar to CLM001:`, similarClaims)

  // Datalog rule-based inference
  const datalog = new DatalogProgram()
  datalog.addFact(JSON.stringify({predicate:'claim', terms:['CLM001','P001','PROV001']}))
  datalog.addFact(JSON.stringify({predicate:'claim', terms:['CLM002','P002','PROV001']}))
  datalog.addFact(JSON.stringify({predicate:'related', terms:['P001','P002']}))

  datalog.addRule(JSON.stringify({
    head: {predicate:'collusion', terms:['?P1','?P2','?Prov']},
    body: [
      {predicate:'claim', terms:['?C1','?P1','?Prov']},
      {predicate:'claim', terms:['?C2','?P2','?Prov']},
      {predicate:'related', terms:['?P1','?P2']}
    ]
  }))

  const result = JSON.parse(evaluateDatalog(datalog))
  console.log('[Datalog] Collusion detected:', result.collusion)
  // Output: [["P001","P002","PROV001"]]
}

// Surface any failure from the async pipeline instead of leaving the rejection unhandled
runFraudDetection().catch(console.error)
```
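
To see what the `collusion` rule in Step 5 computes, the same join can be written out in plain JavaScript. This sketch does not use rust-kgdb and says nothing about how `evaluateDatalog` is implemented; `findCollusion` is a hypothetical helper that hard-codes the one rule, using the same three facts as the example.

```javascript
// Facts mirror the Datalog example: claim(ClaimId, Claimant, Provider) and related(P1, P2)
const claims = [
  ['CLM001', 'P001', 'PROV001'],
  ['CLM002', 'P002', 'PROV001']
]
const related = [['P001', 'P002']]

// collusion(P1, P2, Prov) :- claim(_, P1, Prov), claim(_, P2, Prov), related(P1, P2)
function findCollusion(claims, related) {
  const seen = new Set()
  const out = []
  for (const [p1, p2] of related) {
    for (const [, claimant1, prov1] of claims) {
      if (claimant1 !== p1) continue
      for (const [, claimant2, prov2] of claims) {
        // Both claimants must share the same provider
        if (claimant2 !== p2 || prov2 !== prov1) continue
        const key = `${p1}|${p2}|${prov1}`
        if (!seen.has(key)) {
          seen.add(key)  // Deduplicate, since Datalog results are sets
          out.push([p1, p2, prov1])
        }
      }
    }
  }
  return out
}

console.log(JSON.stringify(findCollusion(claims, related)))
// prints [["P001","P002","PROV001"]]
```

A Datalog engine generalizes this: it derives the join from the rule body instead of hand-written loops and re-applies rules until no new facts appear.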

**Run it yourself:**