rust-kgdb 0.6.63 → 0.6.64
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.archive.md.old +1206 -0
- package/README.md +196 -874
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -1,331 +1,111 @@
|
|
|
1
1
|
# rust-kgdb
|
|
2
2
|
|
|
3
|
-
|
|
4
|
-
[](https://opensource.org/licenses/Apache-2.0)
|
|
5
|
-
[](https://www.w3.org/TR/sparql11-query/)
|
|
6
|
-
|
|
7
|
-
---
|
|
3
|
+
High-performance RDF/SPARQL database with AI agent framework.
|
|
8
4
|
|
|
9
5
|
## The Problem With AI Today
|
|
10
6
|
|
|
11
7
|
Enterprise AI projects keep failing. Not because the technology is bad, but because organizations use it wrong.
|
|
12
8
|
|
|
13
|
-
A claims investigator asks ChatGPT:
|
|
9
|
+
A claims investigator asks ChatGPT: "Has Provider #4521 shown suspicious billing patterns?"
|
|
14
10
|
|
|
15
|
-
The AI responds confidently:
|
|
11
|
+
The AI responds confidently: "Yes, Provider #4521 has a history of duplicate billing and upcoding."
|
|
16
12
|
|
|
17
|
-
The investigator opens a case. Weeks later, legal discovers
|
|
13
|
+
The investigator opens a case. Weeks later, legal discovers Provider #4521 has a perfect record. The AI made it up. Lawsuit incoming.
|
|
18
14
|
|
|
19
15
|
This keeps happening:
|
|
20
16
|
|
|
21
|
-
|
|
22
|
-
|
|
23
|
-
|
|
24
|
-
|
|
25
|
-
**A fraud analyst** flags Account #7842 for money laundering. It belongs to a children's charity.
|
|
17
|
+
- A lawyer cites "Smith v. Johnson (2019)" in court. The judge is confused. That case does not exist.
|
|
18
|
+
- A doctor avoids prescribing "Nexapril" due to cardiac interactions. Nexapril is not a real drug.
|
|
19
|
+
- A fraud analyst flags Account #7842 for money laundering. It belongs to a children's charity.
|
|
26
20
|
|
|
27
21
|
Every time, the same pattern: The AI sounds confident. The AI is wrong. People get hurt.
|
|
28
22
|
|
|
29
|
-
---
|
|
30
|
-
|
|
31
|
-
## The Engineering Problem
|
|
32
|
-
|
|
33
|
-
**The root cause is simple:** LLMs are language models, not databases. They predict plausible text. They don't look up facts.
|
|
34
|
-
|
|
35
|
-
When you ask "Has Provider #4521 shown suspicious patterns?", the LLM doesn't query your claims database. It generates text that *sounds like* an answer based on patterns from its training data.
|
|
36
|
-
|
|
37
|
-
**The industry's response?** Add guardrails. Use RAG. Fine-tune models.
|
|
38
|
-
|
|
39
|
-
These help, but they're patches. RAG retrieves *similar* documents - similar isn't the same as *correct*. Fine-tuning teaches patterns, not facts. Guardrails catch obvious errors, but "Provider #4521 has billing anomalies" sounds perfectly plausible.
|
|
40
|
-
|
|
41
|
-
**A real solution requires a different architecture.** One built on solid engineering principles, not hope.
|
|
42
|
-
|
|
43
|
-
---
|
|
44
|
-
|
|
45
23
|
## The Solution
|
|
46
24
|
|
|
47
|
-
What if AI stopped providing
|
|
25
|
+
What if AI stopped providing answers and started generating queries?
|
|
48
26
|
|
|
49
|
-
|
|
50
|
-
-
|
|
51
|
-
-
|
|
52
|
-
- **You need both** working together
|
|
27
|
+
- Your database knows the facts (claims, providers, transactions)
|
|
28
|
+
- AI understands language (can parse "find suspicious patterns")
|
|
29
|
+
- You need both working together
|
|
53
30
|
|
|
54
31
|
The AI translates intent into queries. The database finds facts. The AI never makes up data.
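A minimal sketch of that flow in code, using the `GraphDB` and `HyperMindAgent` APIs shown later in this README (the base URI, the data, and the exact result fields are illustrative):

```javascript
const { GraphDB, EmbeddingService, HyperMindAgent } = require('rust-kgdb');

// The facts live in YOUR database, not in the model's weights.
const db = new GraphDB('http://claims.example.org/');
db.loadTtl(`
  @prefix : <http://claims.example.org/> .
  :PROV4521 a :Provider .
`);

const agent = new HyperMindAgent({ db, embeddings: new EmbeddingService() });

// The agent turns the question into SPARQL, runs it against db,
// and can only report what the query actually returned.
agent.ask('Is Provider #4521 suspicious?').then((result) => {
  console.log(result.answer);    // grounded in the query results; no evidence means "no evidence"
  console.log(result.evidence);  // the queries and bindings behind the answer
  console.log(result.proofHash); // audit-trail hash
});
```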
|
|
55
32
|
|
|
56
|
-
|
|
57
|
-
Before (Dangerous):
|
|
58
|
-
Human: "Is Provider #4521 suspicious?"
|
|
59
|
-
AI: "Yes, they have billing anomalies" <- FABRICATED
|
|
60
|
-
|
|
61
|
-
After (Safe):
|
|
62
|
-
Human: "Is Provider #4521 suspicious?"
|
|
63
|
-
AI: Generates SPARQL query -> Executes against YOUR database
|
|
64
|
-
Database: Returns actual facts about Provider #4521
|
|
65
|
-
Result: Real data with audit trail <- VERIFIABLE
|
|
66
|
-
```
|
|
67
|
-
|
|
68
|
-
rust-kgdb is a knowledge graph database with an AI layer that **cannot hallucinate** because it only returns data from your actual systems.
|
|
69
|
-
|
|
70
|
-
---
|
|
33
|
+
rust-kgdb is a knowledge graph database with an AI layer that cannot hallucinate because it only returns data from your actual systems.
|
|
71
34
|
|
|
72
35
|
## The Business Value
|
|
73
36
|
|
|
74
|
-
|
|
75
|
-
-
|
|
76
|
-
-
|
|
77
|
-
-
|
|
78
|
-
- **Instant deployment** - `npm install` and you're running
|
|
79
|
-
|
|
80
|
-
**For Engineering Teams:**
|
|
81
|
-
- **449ns lookups** - 35x faster than RDFox, the previous gold standard
|
|
82
|
-
- **24 bytes per triple** - 25% more memory efficient than competitors
|
|
83
|
-
- **132K writes/sec** - Handle enterprise transaction volumes
|
|
84
|
-
- **94% recall on memory retrieval** - Agent remembers past queries accurately
|
|
37
|
+
For Enterprises:
|
|
38
|
+
- Zero hallucinations - Every answer traces back to your actual data
|
|
39
|
+
- Full audit trail - Regulators can verify every AI decision (SOX, GDPR, FDA 21 CFR Part 11)
|
|
40
|
+
- No infrastructure - Runs embedded in your app, no servers to manage
|
|
85
41
|
|
|
86
|
-
|
|
87
|
-
-
|
|
88
|
-
-
|
|
89
|
-
-
|
|
90
|
-
- **Schema-aware generation** - AI uses YOUR ontology, not guessed class names
|
|
42
|
+
For Engineering Teams:
|
|
43
|
+
- 449ns lookups - 35x faster than RDFox
|
|
44
|
+
- 24 bytes per triple - 25% more memory efficient than competitors
|
|
45
|
+
- 132K writes/sec - Handle enterprise transaction volumes
|
|
91
46
|
|
|
92
|
-
|
|
93
|
-
|
|
94
|
-
|
|
47
|
+
For AI/ML Teams:
|
|
48
|
+
- 86.4% SPARQL accuracy - vs 0% with vanilla LLMs on LUBM benchmark
|
|
49
|
+
- 16ms similarity search - Find related entities across 10K vectors
|
|
50
|
+
- Schema-aware generation - AI uses YOUR ontology, not guessed class names
|
|
95
51
|
|
|
96
52
|
## What Is rust-kgdb?
|
|
97
53
|
|
|
98
|
-
|
|
54
|
+
Two components, one npm package:
|
|
99
55
|
|
|
100
56
|
### rust-kgdb Core: Embedded Knowledge Graph Database
|
|
101
57
|
|
|
102
|
-
A high-performance RDF/SPARQL database that runs
|
|
58
|
+
A high-performance RDF/SPARQL database that runs inside your application. No server. No Docker. No config.
|
|
103
59
|
|
|
104
60
|
```
|
|
105
61
|
+-----------------------------------------------------------------------------+
|
|
106
|
-
| rust-kgdb CORE ENGINE
|
|
107
|
-
|
|
|
108
|
-
|
|
|
109
|
-
| |
|
|
110
|
-
| |
|
|
111
|
-
| | 449ns
|
|
112
|
-
|
|
|
113
|
-
|
|
|
62
|
+
| rust-kgdb CORE ENGINE |
|
|
63
|
+
| |
|
|
64
|
+
| +-----------+ +-----------+ +-----------+ +-----------+ |
|
|
65
|
+
| | GraphDB | |GraphFrame | |Embeddings | | Datalog | |
|
|
66
|
+
| | (SPARQL) | |(Analytics)| | (HNSW) | |(Reasoning)| |
|
|
67
|
+
| | 449ns | | PageRank | | 16ms/10K | |Semi-naive | |
|
|
68
|
+
| +-----------+ +-----------+ +-----------+ +-----------+ |
|
|
69
|
+
| |
|
|
114
70
|
| Storage: InMemory | RocksDB | LMDB Standards: SPARQL 1.1 | RDF 1.2 |
|
|
115
|
-
| Memory: 24 bytes/triple Compliance: SHACL | PROV | OWL 2 RL |
|
|
116
71
|
+-----------------------------------------------------------------------------+
|
|
117
72
|
```
|
|
118
73
|
|
|
119
|
-
|
|
120
|
-
|
|
121
|
-
|
|
|
122
|
-
|
|
123
|
-
|
|
|
124
|
-
| **Memory/Triple** | 24 bytes | 32 bytes | 50-60 bytes | Fit more data in memory |
|
|
125
|
-
| **Bulk Insert** | 146K/sec | 200K/sec | 50K/sec | Load million-record datasets fast |
|
|
126
|
-
| **Concurrent Writes** | 132K ops/sec | - | - | Handle enterprise transaction volumes |
|
|
74
|
+
| Metric | rust-kgdb | RDFox | Apache Jena |
|
|
75
|
+
|--------|-----------|-------|-------------|
|
|
76
|
+
| Lookup | 449 ns | 5,000+ ns | 10,000+ ns |
|
|
77
|
+
| Memory/Triple | 24 bytes | 32 bytes | 50-60 bytes |
|
|
78
|
+
| Bulk Insert | 146K/sec | 200K/sec | 50K/sec |
|
|
127
79
|
|
|
128
|
-
|
|
80
|
+
Like SQLite - but for knowledge graphs.
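A minimal sketch of what "embedded" means in practice, using the `GraphDB` calls from the API Reference below (data and base URI are illustrative):

```javascript
const { GraphDB } = require('rust-kgdb');

// Embedded: the store lives inside this process - nothing to install or start.
const db = new GraphDB('http://example.org/');

db.loadTtl(`
  @prefix : <http://example.org/> .
  :CLM001 a :Claim ; :provider :PROV001 .
  :CLM002 a :Claim ; :provider :PROV001 .
`);

const claims = db.querySelect(`
  PREFIX : <http://example.org/>
  SELECT ?claim WHERE { ?claim :provider :PROV001 }
`);

console.log(claims);            // bindings for CLM001 and CLM002
console.log(db.countTriples()); // 4
```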
|
|
129
81
|
|
|
130
82
|
### HyperMind: Neuro-Symbolic Agent Framework
|
|
131
83
|
|
|
132
|
-
An AI agent layer that uses
|
|
84
|
+
An AI agent layer that uses the database to prevent hallucinations. The LLM plans, the database executes.
|
|
133
85
|
|
|
134
86
|
```
|
|
135
87
|
+-----------------------------------------------------------------------------+
|
|
136
|
-
| HYPERMIND AGENT FRAMEWORK
|
|
137
|
-
|
|
|
138
|
-
|
|
|
139
|
-
| |
|
|
140
|
-
| |
|
|
141
|
-
|
|
|
142
|
-
|
|
|
143
|
-
| Type Theory: Hindley-Milner types ensure tool composition is valid
|
|
144
|
-
| Category Theory: Tools are morphisms (A -> B) with composition laws
|
|
145
|
-
| Proof Theory: Every execution produces cryptographic audit trail
|
|
88
|
+
| HYPERMIND AGENT FRAMEWORK |
|
|
89
|
+
| |
|
|
90
|
+
| +-----------+ +-----------+ +-----------+ +-----------+ |
|
|
91
|
+
| |LLMPlanner | |WasmSandbox| | ProofDAG | | Memory | |
|
|
92
|
+
| |(Claude/GPT| | (Security)| | (Audit) | |(Hypergraph| |
|
|
93
|
+
| +-----------+ +-----------+ +-----------+ +-----------+ |
|
|
94
|
+
| |
|
|
95
|
+
| Type Theory: Hindley-Milner types ensure tool composition is valid |
|
|
96
|
+
| Category Theory: Tools are morphisms (A -> B) with composition laws |
|
|
97
|
+
| Proof Theory: Every execution produces cryptographic audit trail |
|
|
146
98
|
+-----------------------------------------------------------------------------+
|
|
147
99
|
```
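How those pieces get wired together - a sketch based on the builder-style configuration that appears elsewhere in this diff (`AgentBuilder`, `LLMPlanner`, `WasmSandbox`); the model name and limits are illustrative:

```javascript
const { AgentBuilder, LLMPlanner, TOOL_REGISTRY } = require('rust-kgdb');

const agent = new AgentBuilder('fraud-detector')
  .withTool('kg.sparql.query')
  .withTool('kg.embeddings.search')
  .withPlanner(new LLMPlanner('claude-sonnet-4', TOOL_REGISTRY)) // the LLM only proposes plans
  .withSandbox({
    capabilities: ['ReadKG', 'ExecuteTool'], // no WriteKG granted
    fuelLimit: 1000000,                      // CPU metering
    maxMemory: 64 * 1024 * 1024              // 64MB cap
  })
  .build();

agent.call('Find circular payment patterns').then((result) => {
  console.log(result.witness.proof_hash); // ProofDAG entry, e.g. sha256:a3f2...
});
```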
|
|
148
100
|
|
|
149
|
-
|
|
150
|
-
|
|
151
|
-
|
|
|
152
|
-
|
|
153
|
-
|
|
|
154
|
-
|
|
|
155
|
-
| **DSPy** | 14.3% | 71.4% | Better prompting helps slightly |
|
|
156
|
-
| **HyperMind** | - | 71.4% | Schema integrated by design |
|
|
157
|
-
|
|
158
|
-
*Honest numbers: All frameworks achieve similar accuracy WITH schema. The difference is HyperMind integrates schema handling - you don't manually inject it.*
|
|
159
|
-
|
|
160
|
-
**Memory Retrieval (Agent Recall Benchmark):**
|
|
161
|
-
|
|
162
|
-
| Metric | HyperMind | Typical RAG | Why It Matters |
|
|
163
|
-
|--------|-----------|-------------|----------------|
|
|
164
|
-
| **Recall@10** | 94% at 10K depth | ~70% | Find the right past query |
|
|
165
|
-
| **Search Speed** | 16.7ms / 10K queries | 500ms+ | 30x faster context retrieval |
|
|
166
|
-
| **Idempotent Responses** | Yes (semantic hash) | No | Same question = same answer |
|
|
167
|
-
|
|
168
|
-
**Long-Term Memory: Deep Flashback**
|
|
169
|
-
|
|
170
|
-
Most AI agents forget everything between sessions. HyperMind stores memory in the *same* knowledge graph as your data:
|
|
171
|
-
|
|
172
|
-
- **Episodes** link to **KG entities** via hyper-edges
|
|
173
|
-
- **Embeddings** enable semantic search over past queries
|
|
174
|
-
- **Temporal decay** prioritizes recent, relevant memories
|
|
175
|
-
- **Single SPARQL query** traverses both memory AND knowledge graph
|
|
176
|
-
|
|
177
|
-
When your fraud analyst asks "What did we find about Provider X last month?", the agent doesn't say "I don't remember." It retrieves the exact investigation with full context - 94% recall at 10,000 queries deep.
|
|
178
|
-
|
|
179
|
-
**The insight:** AI writes questions (SPARQL queries). Database finds answers. No hallucination possible.
|
|
180
|
-
|
|
181
|
-
---
|
|
182
|
-
|
|
183
|
-
## The Engineering Choices
|
|
184
|
-
|
|
185
|
-
Every decision in this codebase has a reason:
|
|
186
|
-
|
|
187
|
-
**Why embedded, not client-server?**
|
|
188
|
-
Because data shouldn't leave your infrastructure. An embedded database means your patient records, claims data, and transaction histories never cross a network boundary. HIPAA compliance by architecture, not policy.
|
|
189
|
-
|
|
190
|
-
**Why SPARQL, not SQL?**
|
|
191
|
-
Because relationships matter. "Find all providers connected to this claimant through any intermediary" is one line in SPARQL. It's a nightmare in SQL with recursive CTEs. Knowledge graphs are built for connection queries.
|
|
192
|
-
|
|
193
|
-
**Why category theory for tools?**
|
|
194
|
-
Because composition must be safe. When Tool A outputs a `BindingSet` and Tool B expects a `Pattern`, the type system catches it at build time. No runtime surprises. No "undefined is not a function."
|
|
195
|
-
|
|
196
|
-
**Why WASM sandbox for agents?**
|
|
197
|
-
Because AI shouldn't have unlimited power. The sandbox enforces capability-based security. An agent can read the knowledge graph but can't delete data. It can execute 1M operations but not infinite loop. Defense in depth.
|
|
198
|
-
|
|
199
|
-
**Why Datalog for reasoning?**
|
|
200
|
-
Because rules should cascade. A fraud pattern that triggers another rule that triggers another - Datalog handles recursive inference naturally. Semi-naive evaluation ensures we don't recompute what we already know.
|
|
201
|
-
|
|
202
|
-
**Why HNSW for embeddings?**
|
|
203
|
-
Because O(log n) beats O(n). Finding similar claims from 100K vectors shouldn't scan all 100K. HNSW builds a navigable graph - ~20 hops to find your answer regardless of dataset size.
|
|
204
|
-
|
|
205
|
-
**Why clustered mode for scale?**
|
|
206
|
-
Because some problems don't fit on one machine. The same codebase that runs embedded on your laptop scales to Kubernetes clusters for billion-triple graphs. HDRF (High-Degree Replicated First) partitioning keeps high-connectivity nodes available across partitions. Raft consensus ensures consistency. gRPC handles inter-node communication. You write the same code - deployment decides the scale.
|
|
207
|
-
|
|
208
|
-
These aren't arbitrary choices. Each one solves a real problem I encountered building enterprise AI systems.
|
|
209
|
-
|
|
210
|
-
---
|
|
211
|
-
|
|
212
|
-
## Why Our Tool Calling Is Different
|
|
213
|
-
|
|
214
|
-
Traditional AI tool calling (OpenAI Functions, LangChain Tools) has fundamental problems:
|
|
215
|
-
|
|
216
|
-
**The Traditional Approach:**
|
|
217
|
-
```
|
|
218
|
-
LLM generates JSON -> Runtime validates schema -> Tool executes -> Hope it works
|
|
219
|
-
```
|
|
220
|
-
|
|
221
|
-
1. **Schema is decorative.** The LLM sees a JSON schema and tries to match it. No guarantee outputs are correct types.
|
|
222
|
-
2. **Composition is ad-hoc.** Chain Tool A -> Tool B? Pray that A's output format happens to match B's input.
|
|
223
|
-
3. **Errors happen at runtime.** You find out a tool chain is broken when a user hits it in production.
|
|
224
|
-
4. **No mathematical guarantees.** "It usually works" is the best you get.
|
|
225
|
-
|
|
226
|
-
**Our Approach: Tools as Typed Morphisms**
|
|
227
|
-
```
|
|
228
|
-
Tools are arrows in a category:
|
|
229
|
-
kg.sparql.query: Query -> BindingSet
|
|
230
|
-
kg.motif.find: Pattern -> Matches
|
|
231
|
-
kg.embeddings.search: EntityId -> SimilarEntities
|
|
232
|
-
|
|
233
|
-
Composition is verified:
|
|
234
|
-
f: A -> B
|
|
235
|
-
g: B -> C
|
|
236
|
-
g o f: A -> C [x] Compiles only if types match
|
|
237
|
-
|
|
238
|
-
Errors caught at plan time, not runtime.
|
|
239
|
-
```
|
|
240
|
-
|
|
241
|
-
**What this means in practice:**
|
|
242
|
-
|
|
243
|
-
| Problem | Traditional | HyperMind |
|
|
244
|
-
|---------|-------------|-----------|
|
|
245
|
-
| **Type mismatch** | Runtime error | Won't compile |
|
|
246
|
-
| **Tool chaining** | Hope it works | Type-checked composition |
|
|
247
|
-
| **Output validation** | Schema validation (partial) | Refinement types (complete) |
|
|
248
|
-
| **Audit trail** | Optional logging | Built-in proof witnesses |
|
|
249
|
-
|
|
250
|
-
**Refinement Types: Beyond Basic Types**
|
|
251
|
-
|
|
252
|
-
We don't just have `string` and `number`. We have:
|
|
253
|
-
- `RiskScore` (number between 0 and 1)
|
|
254
|
-
- `PolicyNumber` (matches regex `^POL-\d{8}$`)
|
|
255
|
-
- `CreditScore` (integer between 300 and 850)
|
|
256
|
-
|
|
257
|
-
The type system *guarantees* a tool that outputs `RiskScore` produces a valid risk score. Not "probably" - mathematically proven.
|
|
258
|
-
|
|
259
|
-
**The Insight:** Category theory isn't academic overhead. It's the same math that makes your database transactions safe (ACID = category theory applied to data). We apply it to tool composition.
|
|
260
|
-
|
|
261
|
-
**Trust Model: Proxied Execution**
|
|
262
|
-
|
|
263
|
-
Traditional tool calling trusts the LLM output completely:
|
|
264
|
-
```
|
|
265
|
-
LLM -> Tool (direct execution) -> Result
|
|
266
|
-
```
|
|
267
|
-
|
|
268
|
-
The LLM decides what to execute. The tool runs it blindly. This is why prompt injection attacks work - the LLM's output *is* the program.
|
|
269
|
-
|
|
270
|
-
**Our approach: Agent -> Proxy -> Sandbox -> Tool**
|
|
271
|
-
```
|
|
272
|
-
+---------------------------------------------------------------------+
|
|
273
|
-
| Agent Request: "Find suspicious claims" |
|
|
274
|
-
+----------------------------+----------------------------------------+
|
|
275
|
-
|
|
|
276
|
-
v
|
|
277
|
-
+---------------------------------------------------------------------+
|
|
278
|
-
| LLMPlanner: Generates tool call plan |
|
|
279
|
-
| -> kg.sparql.query(pattern) |
|
|
280
|
-
| -> kg.datalog.infer(rules) |
|
|
281
|
-
+----------------------------+----------------------------------------+
|
|
282
|
-
| Plan (NOT executed yet)
|
|
283
|
-
v
|
|
284
|
-
+---------------------------------------------------------------------+
|
|
285
|
-
| HyperAgentProxy: Validates plan against capabilities |
|
|
286
|
-
| [x] Does agent have ReadKG capability? Yes |
|
|
287
|
-
| [x] Is query schema-valid? Yes |
|
|
288
|
-
| [x] Are all types correct? Yes |
|
|
289
|
-
| [ ] Blocked: WriteKG not in capability set |
|
|
290
|
-
+----------------------------+----------------------------------------+
|
|
291
|
-
| Validated plan only
|
|
292
|
-
v
|
|
293
|
-
+---------------------------------------------------------------------+
|
|
294
|
-
| WasmSandbox: Executes with resource limits |
|
|
295
|
-
| * Fuel metering: 1M operations max |
|
|
296
|
-
| * Memory cap: 64MB |
|
|
297
|
-
| * Capability enforcement: Cannot exceed granted permissions |
|
|
298
|
-
+----------------------------+----------------------------------------+
|
|
299
|
-
| Execution with audit
|
|
300
|
-
v
|
|
301
|
-
+---------------------------------------------------------------------+
|
|
302
|
-
| ProofDAG: Records execution witness |
|
|
303
|
-
| * What tool ran |
|
|
304
|
-
| * What inputs were used |
|
|
305
|
-
| * What outputs were produced |
|
|
306
|
-
| * SHA-256 hash of entire execution |
|
|
307
|
-
+---------------------------------------------------------------------+
|
|
308
|
-
```
|
|
309
|
-
|
|
310
|
-
The LLM never executes directly. It proposes. The proxy validates. The sandbox enforces. The proof records. Four independent layers of defense.
|
|
311
|
-
|
|
312
|
-
---
|
|
313
|
-
|
|
314
|
-
## What You Can Do
|
|
315
|
-
|
|
316
|
-
| Query Type | Use Case | Example |
|
|
317
|
-
|------------|----------|---------|
|
|
318
|
-
| **SPARQL** | Find connected entities | `SELECT ?claim WHERE { ?claim :provider :PROV001 }` |
|
|
319
|
-
| **Datalog** | Recursive fraud detection | `fraud_ring(X,Y) :- knows(X,Y), claims_with(X,P), claims_with(Y,P)` |
|
|
320
|
-
| **Motif** | Network pattern matching | `(a)-[e1]->(b); (b)-[e2]->(a)` finds circular relationships |
|
|
321
|
-
| **GraphFrame** | Social network analysis | `gf.pageRank(0.15, 20)` ranks entities by connection importance |
|
|
322
|
-
| **Pregel** | Shortest paths at scale | `pregelShortestPaths(gf, 'source', 100)` for billion-edge graphs |
|
|
323
|
-
| **Embeddings** | Semantic similarity | `embeddings.findSimilar('CLM001', 10, 0.7)` finds related claims |
|
|
324
|
-
| **Agent** | Natural language interface | `agent.ask("Which providers show fraud patterns?")` |
|
|
325
|
-
|
|
326
|
-
Each of these runs in the same embedded database. No separate systems to maintain.
|
|
101
|
+
| Framework | Without Schema | With Schema |
|
|
102
|
+
|-----------|---------------|-------------|
|
|
103
|
+
| Vanilla LLM | 0% | - |
|
|
104
|
+
| LangChain | 0% | 71.4% |
|
|
105
|
+
| DSPy | 14.3% | 71.4% |
|
|
106
|
+
| HyperMind | - | 71.4% |
|
|
327
107
|
|
|
328
|
-
|
|
108
|
+
All frameworks achieve similar accuracy WITH schema. The difference is HyperMind integrates schema handling - you do not manually inject it.
|
|
329
109
|
|
|
330
110
|
## Quick Start
|
|
331
111
|
|
|
@@ -379,47 +159,6 @@ console.log(result.evidence);
|
|
|
379
159
|
// Full audit trail proving every fact came from your database
|
|
380
160
|
```
|
|
381
161
|
|
|
382
|
-
---
|
|
383
|
-
|
|
384
|
-
## Architecture: Two Layers
|
|
385
|
-
|
|
386
|
-
```
|
|
387
|
-
+---------------------------------------------------------------------------------+
|
|
388
|
-
| YOUR APPLICATION |
|
|
389
|
-
| (Fraud Detection, Underwriting, Compliance) |
|
|
390
|
-
+------------------------------------+--------------------------------------------+
|
|
391
|
-
|
|
|
392
|
-
+------------------------------------v--------------------------------------------+
|
|
393
|
-
| HYPERMIND AGENT FRAMEWORK (JavaScript) |
|
|
394
|
-
| +----------------------------------------------------------------------------+ |
|
|
395
|
-
| | * LLMPlanner: Natural language -> typed tool pipelines | |
|
|
396
|
-
| | * WasmSandbox: Capability-based security with fuel metering | |
|
|
397
|
-
| | * ProofDAG: Cryptographic audit trail (SHA-256) | |
|
|
398
|
-
| | * MemoryHypergraph: Temporal agent memory with KG integration | |
|
|
399
|
-
| | * TypeId: Hindley-Milner type system with refinement types | |
|
|
400
|
-
| +----------------------------------------------------------------------------+ |
|
|
401
|
-
| |
|
|
402
|
-
| Category Theory: Tools as Morphisms (A -> B) |
|
|
403
|
-
| Proof Theory: Every execution has a witness |
|
|
404
|
-
+------------------------------------+--------------------------------------------+
|
|
405
|
-
| NAPI-RS Bindings
|
|
406
|
-
+------------------------------------v--------------------------------------------+
|
|
407
|
-
| RUST CORE ENGINE (Native Performance) |
|
|
408
|
-
| +----------------------------------------------------------------------------+ |
|
|
409
|
-
| | GraphDB | RDF/SPARQL quad store | 449ns lookups, 24 bytes/triple|
|
|
410
|
-
| | GraphFrame | Graph algorithms | WCOJ optimal joins, PageRank |
|
|
411
|
-
| | EmbeddingService | Vector similarity | HNSW index, 1-hop ARCADE cache|
|
|
412
|
-
| | DatalogProgram | Rule-based reasoning | Semi-naive evaluation |
|
|
413
|
-
| | Pregel | BSP graph processing | Billion-edge scale |
|
|
414
|
-
| +----------------------------------------------------------------------------+ |
|
|
415
|
-
| |
|
|
416
|
-
| W3C Standards: SPARQL 1.1 (100%) | RDF 1.2 | OWL 2 RL | SHACL | PROV |
|
|
417
|
-
| Storage Backends: InMemory | RocksDB | LMDB |
|
|
418
|
-
+----------------------------------------------------------------------------------+
|
|
419
|
-
```
|
|
420
|
-
|
|
421
|
-
---
|
|
422
|
-
|
|
423
162
|
## Core Components
|
|
424
163
|
|
|
425
164
|
### GraphDB: SPARQL Engine (449ns lookups)
|
|
@@ -463,7 +202,7 @@ const gf = new GraphFrame(
|
|
|
463
202
|
// Algorithms
|
|
464
203
|
console.log('PageRank:', gf.pageRank(0.15, 20));
|
|
465
204
|
console.log('Connected Components:', gf.connectedComponents());
|
|
466
|
-
console.log('Triangles:', gf.triangleCount());
|
|
205
|
+
console.log('Triangles:', gf.triangleCount());
|
|
467
206
|
console.log('Shortest Paths:', gf.shortestPaths('alice'));
|
|
468
207
|
|
|
469
208
|
// Motif finding (pattern matching)
|
|
@@ -477,19 +216,15 @@ const { EmbeddingService } = require('rust-kgdb');
|
|
|
477
216
|
|
|
478
217
|
const embeddings = new EmbeddingService();
|
|
479
218
|
|
|
480
|
-
// Store 384-dimensional vectors
|
|
481
|
-
embeddings.storeVector('claim_001',
|
|
482
|
-
embeddings.storeVector('claim_002',
|
|
219
|
+
// Store 384-dimensional vectors
|
|
220
|
+
embeddings.storeVector('claim_001', vectorFromOpenAI); // a 384-dim Float32Array from your embedding provider
|
|
221
|
+
embeddings.storeVector('claim_002', vectorFromOpenAI);
|
|
483
222
|
|
|
484
223
|
// Build HNSW index
|
|
485
224
|
embeddings.rebuildIndex();
|
|
486
225
|
|
|
487
226
|
// Find similar (16ms for 10K vectors)
|
|
488
227
|
const similar = embeddings.findSimilar('claim_001', 10, 0.7);
|
|
489
|
-
|
|
490
|
-
// 1-hop neighbor cache (ARCADE algorithm)
|
|
491
|
-
embeddings.onTripleInsert('claim_001', 'claimant', 'person_123', null);
|
|
492
|
-
const neighbors = embeddings.getNeighborsOut('person_123');
|
|
493
228
|
```
|
|
494
229
|
|
|
495
230
|
### DatalogProgram: Rule-Based Reasoning
|
|
@@ -517,239 +252,114 @@ const inferred = evaluateDatalog(datalog);
|
|
|
517
252
|
// connected(alice, charlie) - derived!
|
|
518
253
|
```
|
|
519
254
|
|
|
520
|
-
|
|
521
|
-
|
|
522
|
-
```javascript
|
|
523
|
-
const { pregelShortestPaths, chainGraph } = require('rust-kgdb');
|
|
524
|
-
|
|
525
|
-
// Create large graph
|
|
526
|
-
const graph = chainGraph(10000); // 10K vertices
|
|
527
|
-
|
|
528
|
-
// Run Pregel BSP algorithm
|
|
529
|
-
const distances = pregelShortestPaths(graph, 'v0', 100);
|
|
530
|
-
```
|
|
531
|
-
|
|
532
|
-
---
|
|
255
|
+
## Why Our Tool Calling Is Different
|
|
533
256
|
|
|
534
|
-
|
|
257
|
+
Traditional AI tool calling (OpenAI Functions, LangChain Tools) has problems:
|
|
535
258
|
|
|
536
|
-
|
|
259
|
+
1. Schema is decorative - The LLM sees a JSON schema and tries to match it. No guarantee outputs are correct types.
|
|
260
|
+
2. Composition is ad-hoc - Chain Tool A to Tool B? Pray that A's output format happens to match B's input.
|
|
261
|
+
3. Errors happen at runtime - You find out a tool chain is broken when a user hits it in production.
|
|
537
262
|
|
|
538
|
-
|
|
539
|
-
User: "Find all professors"
|
|
540
|
-
|
|
541
|
-
Vanilla LLM Output:
|
|
542
|
-
+-----------------------------------------------------------------------+
|
|
543
|
-
| ```sparql |
|
|
544
|
-
| SELECT ?professor WHERE { ?professor a ub:Faculty . } |
|
|
545
|
-
| ``` <- Parser rejects markdown |
|
|
546
|
-
| |
|
|
547
|
-
| This query retrieves faculty members. |
|
|
548
|
-
| ^ Mixed text breaks parsing |
|
|
549
|
-
+-----------------------------------------------------------------------+
|
|
550
|
-
Result: FAIL PARSER ERROR - Invalid SPARQL syntax
|
|
551
|
-
```
|
|
263
|
+
Our Approach: Tools as Typed Morphisms
|
|
552
264
|
|
|
553
|
-
|
|
265
|
+
Tools are arrows in a category with verified composition:
|
|
266
|
+
- kg.sparql.query: Query to BindingSet
|
|
267
|
+
- kg.motif.find: Pattern to Matches
|
|
268
|
+
- kg.embeddings.search: EntityId to SimilarEntities
|
|
554
269
|
|
|
555
|
-
|
|
270
|
+
The type system catches mismatches at plan time, not runtime.
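The check itself is easy to picture. A toy sketch of plan-time validation over the tool signatures listed above - the `SIGNATURES` table and `checkPlan` helper are illustrative, not part of the package API:

```javascript
// Illustrative only: a minimal plan-time type check over the morphisms above.
const SIGNATURES = {
  'kg.sparql.query':      { input: 'Query',    output: 'BindingSet' },
  'kg.motif.find':        { input: 'Pattern',  output: 'Matches' },
  'kg.embeddings.search': { input: 'EntityId', output: 'SimilarEntities' },
};

function checkPlan(steps) {
  for (let i = 1; i < steps.length; i++) {
    const prev = SIGNATURES[steps[i - 1]].output;
    const next = SIGNATURES[steps[i]].input;
    if (prev !== next) {
      throw new TypeError(
        `Cannot compose: ${steps[i - 1]} outputs ${prev}, but ${steps[i]} expects ${next}`
      );
    }
  }
  return steps; // g . f is accepted only when the middle types line up
}

checkPlan(['kg.sparql.query', 'kg.motif.find']); // throws TypeError - rejected before anything runs
```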
|
|
556
271
|
|
|
557
|
-
|
|
558
|
-
|
|
559
|
-
|
|
560
|
-
|
|
561
|
-
|
|
562
|
-
|
|
|
563
|
-
| SELECT ?professor WHERE { ?professor a ub:Professor . } |
|
|
564
|
-
+-----------------------------------------------------------------------+
|
|
565
|
-
Result: OK 15 results returned in 2.3ms
|
|
566
|
-
```
|
|
567
|
-
|
|
568
|
-
**Why it works:**
|
|
569
|
-
1. **Schema-aware** - Knows actual class names from your ontology
|
|
570
|
-
2. **Type-checked** - Query validated before execution
|
|
571
|
-
3. **No text pollution** - Output is pure SPARQL, not markdown
|
|
572
|
-
|
|
573
|
-
**Accuracy: 0% -> 86.4%** (LUBM benchmark, 14 queries)
|
|
272
|
+
| Problem | Traditional | HyperMind |
|
|
273
|
+
|---------|-------------|-----------|
|
|
274
|
+
| Type mismatch | Runtime error | Will not compile |
|
|
275
|
+
| Tool chaining | Hope it works | Type-checked composition |
|
|
276
|
+
| Output validation | Schema validation (partial) | Refinement types (complete) |
|
|
277
|
+
| Audit trail | Optional logging | Built-in proof witnesses |
|
|
574
278
|
|
|
575
|
-
|
|
279
|
+
## Trust Model: Proxied Execution
|
|
576
280
|
|
|
577
|
-
|
|
578
|
-
const {
|
|
579
|
-
HyperMindAgent,
|
|
580
|
-
LLMPlanner,
|
|
581
|
-
WasmSandbox,
|
|
582
|
-
AgentBuilder,
|
|
583
|
-
TOOL_REGISTRY
|
|
584
|
-
} = require('rust-kgdb');
|
|
585
|
-
|
|
586
|
-
// Build custom agent
|
|
587
|
-
const agent = new AgentBuilder('fraud-detector')
|
|
588
|
-
.withTool('kg.sparql.query')
|
|
589
|
-
.withTool('kg.datalog.infer')
|
|
590
|
-
.withTool('kg.embeddings.search')
|
|
591
|
-
.withPlanner(new LLMPlanner('claude-sonnet-4', TOOL_REGISTRY))
|
|
592
|
-
.withSandbox({
|
|
593
|
-
capabilities: ['ReadKG', 'ExecuteTool'], // No WriteKG
|
|
594
|
-
fuelLimit: 1000000,
|
|
595
|
-
maxMemory: 64 * 1024 * 1024
|
|
596
|
-
})
|
|
597
|
-
.build();
|
|
598
|
-
|
|
599
|
-
// Execute with natural language
|
|
600
|
-
const result = await agent.call("Find circular payment patterns");
|
|
601
|
-
|
|
602
|
-
// Get cryptographic proof
|
|
603
|
-
console.log(result.witness.proof_hash); // sha256:a3f2b8c9...
|
|
604
|
-
```
|
|
281
|
+
Traditional tool calling trusts the LLM output completely. The LLM decides what to execute. The tool runs it blindly.
|
|
605
282
|
|
|
606
|
-
|
|
283
|
+
Our approach: Agent to Proxy to Sandbox to Tool
|
|
607
284
|
|
|
608
|
-
```javascript
|
|
609
|
-
const sandbox = new WasmSandbox({
|
|
610
|
-
capabilities: ['ReadKG', 'ExecuteTool'], // Fine-grained
|
|
611
|
-
fuelLimit: 1000000, // CPU metering
|
|
612
|
-
maxMemory: 64 * 1024 * 1024 // Memory limit
|
|
613
|
-
});
|
|
614
|
-
|
|
615
|
-
// All tool calls are:
|
|
616
|
-
// [x] Capability-checked
|
|
617
|
-
// [x] Fuel-metered
|
|
618
|
-
// [x] Memory-bounded
|
|
619
|
-
// [x] Logged for audit
|
|
620
285
|
```
|
|
621
|
-
|
|
622
|
-
|
|
623
|
-
|
|
624
|
-
|
|
625
|
-
|
|
626
|
-
|
|
627
|
-
|
|
628
|
-
|
|
629
|
-
|
|
630
|
-
|
|
631
|
-
|
|
632
|
-
|
|
633
|
-
|
|
634
|
-
|
|
286
|
+
+---------------------------------------------------------------------+
|
|
287
|
+
| Agent Request: "Find suspicious claims" |
|
|
288
|
+
+--------------------------------+------------------------------------+
|
|
289
|
+
|
|
|
290
|
+
v
|
|
291
|
+
+---------------------------------------------------------------------+
|
|
292
|
+
| LLMPlanner: Generates tool call plan |
|
|
293
|
+
| -> kg.sparql.query(pattern) |
|
|
294
|
+
| -> kg.datalog.infer(rules) |
|
|
295
|
+
+--------------------------------+------------------------------------+
|
|
296
|
+
| Plan (NOT executed yet)
|
|
297
|
+
v
|
|
298
|
+
+---------------------------------------------------------------------+
|
|
299
|
+
| HyperAgentProxy: Validates plan against capabilities |
|
|
300
|
+
| [x] Does agent have ReadKG capability? Yes |
|
|
301
|
+
| [x] Is query schema-valid? Yes |
|
|
302
|
+
| [ ] Blocked: WriteKG not in capability set |
|
|
303
|
+
+--------------------------------+------------------------------------+
|
|
304
|
+
| Validated plan only
|
|
305
|
+
v
|
|
306
|
+
+---------------------------------------------------------------------+
|
|
307
|
+
| WasmSandbox: Executes with resource limits |
|
|
308
|
+
| - Fuel metering: 1M operations max |
|
|
309
|
+
| - Memory cap: 64MB |
|
|
310
|
+
| - Capability enforcement |
|
|
311
|
+
+--------------------------------+------------------------------------+
|
|
312
|
+
| Execution with audit
|
|
313
|
+
v
|
|
314
|
+
+---------------------------------------------------------------------+
|
|
315
|
+
| ProofDAG: Records execution witness |
|
|
316
|
+
| - What tool ran |
|
|
317
|
+
| - What inputs/outputs |
|
|
318
|
+
| - SHA-256 hash of entire execution |
|
|
319
|
+
+---------------------------------------------------------------------+
|
|
635
320
|
```
|
|
636
321
|
|
|
637
|
-
|
|
638
|
-
|
|
639
|
-
---
|
|
322
|
+
The LLM never executes directly. It proposes. The proxy validates. The sandbox enforces. The proof records. Four independent layers of defense.
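The sandbox layer in that pipeline is configured explicitly. A sketch based on the `WasmSandbox` options shown elsewhere in this diff (the limits are illustrative):

```javascript
const { WasmSandbox } = require('rust-kgdb');

const sandbox = new WasmSandbox({
  capabilities: ['ReadKG', 'ExecuteTool'], // fine-grained: no WriteKG granted
  fuelLimit: 1000000,                      // hard cap on executed operations
  maxMemory: 64 * 1024 * 1024              // 64MB memory ceiling
});

// Every tool call routed through the sandbox is capability-checked,
// fuel-metered, memory-bounded, and logged for the ProofDAG audit trail.
```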
|
|
640
323
|
|
|
641
324
|
## Agent Memory: Deep Flashback
|
|
642
325
|
|
|
643
|
-
Most AI agents
|
|
644
|
-
|
|
645
|
-
### The Problem
|
|
646
|
-
|
|
647
|
-
- ChatGPT forgets after context window fills
|
|
648
|
-
- LangChain rebuilds context every call (~500ms)
|
|
649
|
-
- Vector databases return "similar" docs, not exact matches
|
|
650
|
-
|
|
651
|
-
### Our Solution: Memory Hypergraph
|
|
326
|
+
Most AI agents forget everything between sessions. HyperMind stores memory in the same knowledge graph as your data.
|
|
652
327
|
|
|
653
328
|
```
|
|
654
329
|
+-----------------------------------------------------------------------------+
|
|
655
|
-
| MEMORY HYPERGRAPH
|
|
656
|
-
|
|
|
657
|
-
| AGENT MEMORY LAYER
|
|
658
|
-
|
|
|
659
|
-
| |
|
|
660
|
-
| |
|
|
661
|
-
| |
|
|
662
|
-
|
|
|
663
|
-
|
|
|
664
|
-
|
|
|
665
|
-
|
|
|
666
|
-
|
|
|
667
|
-
|
|
|
668
|
-
|
|
|
669
|
-
|
|
|
670
|
-
| |
|
|
671
|
-
| |
|
|
672
|
-
| |
|
|
673
|
-
| |
|
|
674
|
-
|
|
|
675
|
-
| |
|
|
330
|
+
| MEMORY HYPERGRAPH |
|
|
331
|
+
| |
|
|
332
|
+
| AGENT MEMORY LAYER |
|
|
333
|
+
| +-----------+ +-----------+ +-----------+ |
|
|
334
|
+
| |Episode:001| |Episode:002| |Episode:003| |
|
|
335
|
+
| |"Fraud ring| |"Denied | |"Follow-up | |
|
|
336
|
+
| | detected" | | claim" | | on P001" | |
|
|
337
|
+
| +-----+-----+ +-----+-----+ +-----+-----+ |
|
|
338
|
+
| | | | |
|
|
339
|
+
| +-----------------+-----------------+ |
|
|
340
|
+
| | HyperEdges connect to KG |
|
|
341
|
+
| v |
|
|
342
|
+
| KNOWLEDGE GRAPH LAYER |
|
|
343
|
+
| +-----------------------------------------------------------------+ |
|
|
344
|
+
| | Provider:P001 -----> Claim:C123 <----- Claimant:John | |
|
|
345
|
+
| | | | | | |
|
|
346
|
+
| | v v v | |
|
|
347
|
+
| | riskScore: 0.87 amount: 50000 address: "123 Main" | |
|
|
348
|
+
| +-----------------------------------------------------------------+ |
|
|
349
|
+
| |
|
|
676
350
|
| SAME QUAD STORE - Single SPARQL query traverses BOTH! |
|
|
677
351
|
+-----------------------------------------------------------------------------+
|
|
678
352
|
```
|
|
679
353
|
|
|
680
|
-
|
|
681
|
-
|
|
682
|
-
|
|
683
|
-
|
|
684
|
-
| **Memory Retrieval** | 94% Recall@10 at 10K depth | Find the right past query 94% of the time |
|
|
685
|
-
| **Search Speed** | 16.7ms for 10K queries | 30x faster than typical RAG |
|
|
686
|
-
| **Write Throughput** | 132K ops/sec (16 workers) | Handle enterprise volumes |
|
|
687
|
-
| **Read Throughput** | 302 ops/sec concurrent | Consistent under load |
|
|
688
|
-
|
|
689
|
-
### Idempotent Responses
|
|
690
|
-
|
|
691
|
-
Same question = Same answer. Even with different wording.
|
|
692
|
-
|
|
693
|
-
```javascript
|
|
694
|
-
// First call: Compute answer, cache with semantic hash
|
|
695
|
-
const result1 = await agent.call("Analyze claims from Provider P001");
|
|
696
|
-
|
|
697
|
-
// Second call (different wording): Cache HIT!
|
|
698
|
-
const result2 = await agent.call("Show me P001's claim patterns");
|
|
699
|
-
// Same semantic hash -> Same result
|
|
700
|
-
```
|
|
701
|
-
|
|
702
|
-
---
|
|
703
|
-
|
|
704
|
-
## Mathematical Foundations
|
|
705
|
-
|
|
706
|
-
### Category Theory: Tools as Morphisms
|
|
707
|
-
|
|
708
|
-
```
|
|
709
|
-
Tools are typed arrows:
|
|
710
|
-
kg.sparql.query: Query -> BindingSet
|
|
711
|
-
kg.motif.find: Pattern -> Matches
|
|
712
|
-
kg.datalog.apply: Rules -> InferredFacts
|
|
713
|
-
|
|
714
|
-
Composition is type-checked:
|
|
715
|
-
f: A -> B
|
|
716
|
-
g: B -> C
|
|
717
|
-
g o f: A -> C (valid only if B matches)
|
|
718
|
-
|
|
719
|
-
Laws guaranteed:
|
|
720
|
-
Identity: id o f = f
|
|
721
|
-
Associativity: (h o g) o f = h o (g o f)
|
|
722
|
-
```
|
|
723
|
-
|
|
724
|
-
**In practice:** The AI can only chain tools where outputs match inputs. Like Lego blocks that must fit.
|
|
725
|
-
|
|
726
|
-
### WCOJ: Worst-Case Optimal Joins
|
|
354
|
+
- Episodes link to KG entities via hyper-edges
|
|
355
|
+
- Embeddings enable semantic search over past queries
|
|
356
|
+
- Temporal decay prioritizes recent, relevant memories
|
|
357
|
+
- Single SPARQL query traverses both memory AND knowledge graph
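What "a single SPARQL query traverses both" can look like - a sketch run through `db.querySelect` (assuming a `GraphDB` instance `db` as in the earlier examples), with illustrative predicate names such as `:mentions` and `:riskScore` standing in for whatever your memory and domain vocabularies actually use:

```javascript
// One query, two layers: past agent episodes plus the live knowledge graph.
const rows = db.querySelect(`
  PREFIX : <http://insurance.org/>
  SELECT ?episode ?provider ?risk WHERE {
    ?episode a :Episode ;            # agent memory layer
             :mentions ?provider .   # hyper-edge into the KG layer
    ?provider :riskScore ?risk .     # domain facts about that provider
    FILTER(?risk > 0.7)
  }
`);
console.log(rows); // past investigations tied to currently high-risk providers
```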
|
|
727
358
|
|
|
728
|
-
|
|
729
|
-
|
|
730
|
-
|
|
731
|
-
|
|
732
|
-
**WCOJ:** Keep sorted indexes. Walk through all three simultaneously. Skip impossible combinations. 50K checks instead of 25 million.
|
|
733
|
-
|
|
734
|
-
### HNSW: Hierarchical Navigable Small World
|
|
735
|
-
|
|
736
|
-
Finding similar items from 50,000 vectors?
|
|
737
|
-
|
|
738
|
-
**Brute force:** Compare to all 50,000. O(n).
|
|
739
|
-
|
|
740
|
-
**HNSW:** Build a multi-layer graph. Start at top layer, descend toward target. ~20 hops. O(log n).
|
|
741
|
-
|
|
742
|
-
### Datalog: Recursive Rule Evaluation
|
|
743
|
-
|
|
744
|
-
```
|
|
745
|
-
mustReport(X) :- transaction(X), amount(X, A), A > 10000.
|
|
746
|
-
mustReport(X) :- transaction(X), involves(X, PEP).
|
|
747
|
-
mustReport(X) :- relatedTo(X, Y), mustReport(Y). # Recursive!
|
|
748
|
-
```
|
|
749
|
-
|
|
750
|
-
Three rules generate ALL reporting requirements. Even for transactions connected to other suspicious transactions, cascading infinitely.
|
|
751
|
-
|
|
752
|
-
---
|
|
359
|
+
Memory Retrieval Performance:
|
|
360
|
+
- 94% Recall@10 at 10K query depth
|
|
361
|
+
- 16.7ms search speed for 10K queries
|
|
362
|
+
- 132K ops/sec write throughput
|
|
753
363
|
|
|
754
364
|
## Real-World Examples
|
|
755
365
|
|
|
@@ -781,7 +391,7 @@ const result = await agent.ask("What should we avoid prescribing to Patient 7291
|
|
|
781
391
|
// Returns ACTUAL interactions from your formulary, not made-up drug names
|
|
782
392
|
```
|
|
783
393
|
|
|
784
|
-
### Insurance: Fraud Detection
|
|
394
|
+
### Insurance: Fraud Detection
|
|
785
395
|
|
|
786
396
|
```javascript
|
|
787
397
|
const db = new GraphDB('http://insurer.com/');
|
|
@@ -809,48 +419,22 @@ const inferred = evaluateDatalog(datalog);
|
|
|
809
419
|
// potential_collusion(P001, P002, PROV001) - DETECTED!
|
|
810
420
|
```
|
|
811
421
|
|
|
812
|
-
### AML: Circular Payment Detection
|
|
813
|
-
|
|
814
|
-
```javascript
|
|
815
|
-
db.loadTtl(`
|
|
816
|
-
:Acct_1001 :transferredTo :Acct_2002 ; :amount 9500 .
|
|
817
|
-
:Acct_2002 :transferredTo :Acct_3003 ; :amount 9400 .
|
|
818
|
-
:Acct_3003 :transferredTo :Acct_1001 ; :amount 9200 .
|
|
819
|
-
`);
|
|
820
|
-
|
|
821
|
-
// Find circular chains (money laundering indicator)
|
|
822
|
-
const triangles = gf.triangleCount(); // 1 circular pattern
|
|
823
|
-
```
|
|
824
|
-
|
|
825
|
-
---
|
|
826
|
-
|
|
827
422
|
## Performance Benchmarks
|
|
828
423
|
|
|
829
424
|
All measurements verified. Run them yourself:
|
|
830
425
|
|
|
831
426
|
```bash
|
|
832
|
-
node benchmark.js
|
|
833
|
-
node vanilla-vs-hypermind-benchmark.js
|
|
427
|
+
node benchmark.js
|
|
428
|
+
node vanilla-vs-hypermind-benchmark.js
|
|
834
429
|
```
|
|
835
430
|
|
|
836
431
|
### Rust Core Engine
|
|
837
432
|
|
|
838
433
|
| Metric | rust-kgdb | RDFox | Apache Jena |
|
|
839
434
|
|--------|-----------|-------|-------------|
|
|
840
|
-
|
|
|
841
|
-
|
|
|
842
|
-
|
|
|
843
|
-
|
|
844
|
-
### Agent Accuracy (LUBM Benchmark)
|
|
845
|
-
|
|
846
|
-
| System | Without Schema | With Schema |
|
|
847
|
-
|--------|---------------|-------------|
|
|
848
|
-
| Vanilla LLM | 0% | - |
|
|
849
|
-
| LangChain | 0% | 71.4% |
|
|
850
|
-
| DSPy | 14.3% | 71.4% |
|
|
851
|
-
| **HyperMind** | - | **71.4%** |
|
|
852
|
-
|
|
853
|
-
*All frameworks achieve same accuracy WITH schema. HyperMind's advantage is integrated schema handling.*
|
|
435
|
+
| Lookup | 449 ns | 5,000+ ns | 10,000+ ns |
|
|
436
|
+
| Memory/Triple | 24 bytes | 32 bytes | 50-60 bytes |
|
|
437
|
+
| Bulk Insert | 146K/sec | 200K/sec | 50K/sec |
|
|
854
438
|
|
|
855
439
|
### Concurrency (16 Workers)
|
|
856
440
|
|
|
@@ -859,348 +443,86 @@ node vanilla-vs-hypermind-benchmark.js # Agent accuracy
|
|
|
859
443
|
| Writes | 132K ops/sec |
|
|
860
444
|
| Reads | 302 ops/sec |
|
|
861
445
|
| GraphFrames | 6.5K ops/sec |
|
|
862
|
-
| Mixed | 642 ops/sec |
|
|
863
|
-
|
|
864
|
-
---
|
|
865
446
|
|
|
866
447
|
## Feature Summary
|
|
867
448
|
|
|
868
449
|
| Category | Feature | Performance |
|
|
869
450
|
|----------|---------|-------------|
|
|
870
|
-
|
|
|
871
|
-
|
|
|
872
|
-
|
|
|
873
|
-
|
|
|
874
|
-
|
|
|
875
|
-
|
|
|
876
|
-
|
|
|
877
|
-
|
|
|
878
|
-
|
|
|
879
|
-
|
|
|
880
|
-
|
|
|
881
|
-
|
|
|
882
|
-
| **Reasoning** | RDFS | Subclass inference |
|
|
883
|
-
| **Reasoning** | OWL 2 RL | Rule-based |
|
|
884
|
-
| **Validation** | SHACL | Shape constraints |
|
|
885
|
-
| **Provenance** | PROV | W3C standard |
|
|
886
|
-
| **Joins** | WCOJ | Optimal complexity |
|
|
887
|
-
| **Security** | WASM Sandbox | Capability-based |
|
|
888
|
-
| **Audit** | ProofDAG | SHA-256 witnesses |
|
|
889
|
-
|
|
890
|
-
---
|
|
891
|
-
|
|
892
|
-
## Installation
|
|
893
|
-
|
|
894
|
-
```bash
|
|
895
|
-
npm install rust-kgdb
|
|
896
|
-
```
|
|
897
|
-
|
|
898
|
-
**Platforms:** macOS (Intel/Apple Silicon), Linux (x64/ARM64), Windows (x64)
|
|
899
|
-
|
|
900
|
-
**Requirements:** Node.js 14+
|
|
901
|
-
|
|
902
|
-
---
|
|
903
|
-
|
|
904
|
-
## Complete Fraud Detection Example
|
|
905
|
-
|
|
906
|
-
Copy this entire example to get started with fraud detection:
|
|
907
|
-
|
|
908
|
-
```javascript
|
|
909
|
-
const {
|
|
910
|
-
GraphDB,
|
|
911
|
-
GraphFrame,
|
|
912
|
-
EmbeddingService,
|
|
913
|
-
DatalogProgram,
|
|
914
|
-
evaluateDatalog,
|
|
915
|
-
HyperMindAgent
|
|
916
|
-
} = require('rust-kgdb');
|
|
917
|
-
|
|
918
|
-
// ============================================================
|
|
919
|
-
// STEP 1: Initialize Services
|
|
920
|
-
// ============================================================
|
|
921
|
-
const db = new GraphDB('http://insurance.org/fraud-detection');
|
|
922
|
-
const embeddings = new EmbeddingService();
|
|
923
|
-
|
|
924
|
-
// ============================================================
|
|
925
|
-
// STEP 2: Load Claims Data
|
|
926
|
-
// ============================================================
|
|
927
|
-
db.loadTtl(`
|
|
928
|
-
@prefix : <http://insurance.org/> .
|
|
929
|
-
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
|
|
930
|
-
|
|
931
|
-
# Claims
|
|
932
|
-
:CLM001 a :Claim ;
|
|
933
|
-
:amount "18500"^^xsd:decimal ;
|
|
934
|
-
:description "Soft tissue injury from rear-end collision" ;
|
|
935
|
-
:claimant :P001 ;
|
|
936
|
-
:provider :PROV001 ;
|
|
937
|
-
:filingDate "2024-11-15"^^xsd:date .
|
|
938
|
-
|
|
939
|
-
:CLM002 a :Claim ;
|
|
940
|
-
:amount "22300"^^xsd:decimal ;
|
|
941
|
-
:description "Whiplash injury from vehicle accident" ;
|
|
942
|
-
:claimant :P002 ;
|
|
943
|
-
:provider :PROV001 ;
|
|
944
|
-
:filingDate "2024-11-18"^^xsd:date .
|
|
945
|
-
|
|
946
|
-
# Claimants (note: same address = red flag!)
|
|
947
|
-
:P001 a :Claimant ;
|
|
948
|
-
:name "John Smith" ;
|
|
949
|
-
:address "123 Main St, Miami, FL" ;
|
|
950
|
-
:riskScore "0.85"^^xsd:decimal .
|
|
951
|
-
|
|
952
|
-
:P002 a :Claimant ;
|
|
953
|
-
:name "Jane Doe" ;
|
|
954
|
-
:address "123 Main St, Miami, FL" ;
|
|
955
|
-
:riskScore "0.72"^^xsd:decimal .
|
|
956
|
-
|
|
957
|
-
# Relationships (fraud indicators)
|
|
958
|
-
:P001 :knows :P002 .
|
|
959
|
-
:P001 :paidTo :P002 .
|
|
960
|
-
:P002 :paidTo :P003 .
|
|
961
|
-
:P003 :paidTo :P001 . # Circular payment!
|
|
962
|
-
|
|
963
|
-
# Provider
|
|
964
|
-
:PROV001 a :Provider ;
|
|
965
|
-
:name "Quick Care Rehabilitation Clinic" ;
|
|
966
|
-
:flagCount "4"^^xsd:integer .
|
|
967
|
-
`);
|
|
968
|
-
|
|
969
|
-
console.log(`Loaded ${db.countTriples()} triples`);
|
|
970
|
-
|
|
971
|
-
// ============================================================
|
|
972
|
-
// STEP 3: Graph Analytics - Find Network Patterns
|
|
973
|
-
// ============================================================
|
|
974
|
-
const vertices = JSON.stringify([
|
|
975
|
-
{id: 'P001'}, {id: 'P002'}, {id: 'P003'}, {id: 'PROV001'}
|
|
976
|
-
]);
|
|
977
|
-
const edges = JSON.stringify([
|
|
978
|
-
{src: 'P001', dst: 'P002'},
|
|
979
|
-
{src: 'P001', dst: 'PROV001'},
|
|
980
|
-
{src: 'P002', dst: 'PROV001'},
|
|
981
|
-
{src: 'P001', dst: 'P002'}, // payment
|
|
982
|
-
{src: 'P002', dst: 'P003'}, // payment
|
|
983
|
-
{src: 'P003', dst: 'P001'} // payment (circular!)
|
|
984
|
-
]);
|
|
985
|
-
|
|
986
|
-
const gf = new GraphFrame(vertices, edges);
|
|
987
|
-
console.log('Triangles (circular patterns):', gf.triangleCount());
|
|
988
|
-
console.log('PageRank:', gf.pageRank(0.15, 20));
|
|
989
|
-
|
|
990
|
-
// ============================================================
|
|
991
|
-
// STEP 4: Embedding-Based Similarity
|
|
992
|
-
// ============================================================
|
|
993
|
-
// Store embeddings for semantic similarity search
|
|
994
|
-
// (In production, use OpenAI/Voyage embeddings)
|
|
995
|
-
function mockEmbedding(text) {
|
|
996
|
-
return new Array(384).fill(0).map((_, i) =>
|
|
997
|
-
Math.sin(text.charCodeAt(i % text.length) * 0.1) * 0.5 + 0.5
|
|
998
|
-
);
|
|
999
|
-
}
|
|
1000
|
-
|
|
1001
|
-
embeddings.storeVector('CLM001', mockEmbedding('soft tissue injury rear end'));
|
|
1002
|
-
embeddings.storeVector('CLM002', mockEmbedding('whiplash vehicle accident'));
|
|
1003
|
-
embeddings.rebuildIndex();
|
|
1004
|
-
|
|
1005
|
-
const similar = JSON.parse(embeddings.findSimilar('CLM001', 5, 0.3));
|
|
1006
|
-
console.log('Similar claims:', similar);
|
|
1007
|
-
|
|
1008
|
-
// ============================================================
|
|
1009
|
-
// STEP 5: Datalog Rules - NICB Fraud Detection
|
|
1010
|
-
// ============================================================
|
|
1011
|
-
const datalog = new DatalogProgram();
|
|
1012
|
-
|
|
1013
|
-
// Add facts from our knowledge graph
|
|
1014
|
-
datalog.addFact(JSON.stringify({predicate:'claimant', terms:['P001']}));
|
|
1015
|
-
datalog.addFact(JSON.stringify({predicate:'claimant', terms:['P002']}));
|
|
1016
|
-
datalog.addFact(JSON.stringify({predicate:'provider', terms:['PROV001']}));
|
|
1017
|
-
datalog.addFact(JSON.stringify({predicate:'knows', terms:['P001','P002']}));
|
|
1018
|
-
datalog.addFact(JSON.stringify({predicate:'claims_with', terms:['P001','PROV001']}));
|
|
1019
|
-
datalog.addFact(JSON.stringify({predicate:'claims_with', terms:['P002','PROV001']}));
|
|
1020
|
-
datalog.addFact(JSON.stringify({predicate:'same_address', terms:['P001','P002']}));
|
|
1021
|
-
|
|
1022
|
-
// NICB Collusion Detection Rule
|
|
1023
|
-
datalog.addRule(JSON.stringify({
|
|
1024
|
-
head: {predicate:'potential_collusion', terms:['?X','?Y','?P']},
|
|
1025
|
-
body: [
|
|
1026
|
-
{predicate:'claimant', terms:['?X']},
|
|
1027
|
-
{predicate:'claimant', terms:['?Y']},
|
|
1028
|
-
{predicate:'provider', terms:['?P']},
|
|
1029
|
-
{predicate:'knows', terms:['?X','?Y']},
|
|
1030
|
-
{predicate:'claims_with', terms:['?X','?P']},
|
|
1031
|
-
{predicate:'claims_with', terms:['?Y','?P']}
|
|
1032
|
-
]
|
|
1033
|
-
}));
|
|
1034
|
-
|
|
1035
|
-
// Staged Accident Indicator Rule
|
|
1036
|
-
datalog.addRule(JSON.stringify({
|
|
1037
|
-
head: {predicate:'staged_accident_indicator', terms:['?X','?Y']},
|
|
1038
|
-
body: [
|
|
1039
|
-
{predicate:'claimant', terms:['?X']},
|
|
1040
|
-
{predicate:'claimant', terms:['?Y']},
|
|
1041
|
-
{predicate:'same_address', terms:['?X','?Y']},
|
|
1042
|
-
{predicate:'knows', terms:['?X','?Y']}
|
|
1043
|
-
]
|
|
1044
|
-
}));
|
|
1045
|
-
|
|
1046
|
-
const inferred = JSON.parse(evaluateDatalog(datalog));
|
|
1047
|
-
console.log('Inferred fraud patterns:', inferred);
|
|
1048
|
-
|
|
1049
|
-
// ============================================================
|
|
1050
|
-
// STEP 6: SPARQL Query - Get Detailed Evidence
|
|
1051
|
-
// ============================================================
|
|
1052
|
-
const suspiciousClaims = db.querySelect(`
|
|
1053
|
-
PREFIX : <http://insurance.org/>
|
|
1054
|
-
SELECT ?claim ?amount ?claimant ?provider WHERE {
|
|
1055
|
-
?claim a :Claim ;
|
|
1056
|
-
:amount ?amount ;
|
|
1057
|
-
:claimant ?claimant ;
|
|
1058
|
-
:provider ?provider .
|
|
1059
|
-
?claimant :riskScore ?risk .
|
|
1060
|
-
FILTER(?risk > 0.7)
|
|
1061
|
-
}
|
|
1062
|
-
`);
|
|
1063
|
-
|
|
1064
|
-
console.log('High-risk claims:', suspiciousClaims);
|
|
1065
|
-
|
|
1066
|
-
// ============================================================
|
|
1067
|
-
// STEP 7: HyperMind Agent - Natural Language Interface
|
|
1068
|
-
// ============================================================
|
|
1069
|
-
const agent = new HyperMindAgent({ db, embeddings });
|
|
1070
|
-
|
|
1071
|
-
async function investigate() {
|
|
1072
|
-
const result = await agent.ask("Which claims show potential fraud patterns?");
|
|
1073
|
-
|
|
1074
|
-
console.log('\\n=== AGENT FINDINGS ===');
|
|
1075
|
-
console.log(result.answer);
|
|
1076
|
-
console.log('\\n=== EVIDENCE CHAIN ===');
|
|
1077
|
-
console.log(result.evidence);
|
|
1078
|
-
console.log('\\n=== PROOF HASH ===');
|
|
1079
|
-
console.log(result.proofHash);
|
|
1080
|
-
}
|
|
1081
|
-
|
|
1082
|
-
investigate().catch(console.error);
|
|
1083
|
-
```
|
|
1084
|
-
|
|
1085
|
-
---
|
|
1086
|
-
|
|
1087
|
-
## Complete Underwriting Example
|
|
1088
|
-
|
|
1089
|
-
```javascript
|
|
1090
|
-
const { GraphDB, DatalogProgram, evaluateDatalog } = require('rust-kgdb');
|
|
1091
|
-
|
|
1092
|
-
// ============================================================
|
|
1093
|
-
// Automated Underwriting Rules Engine
|
|
1094
|
-
// ============================================================
|
|
1095
|
-
const db = new GraphDB('http://underwriting.org/');
|
|
1096
|
-
|
|
1097
|
-
// Load applicant data
|
|
1098
|
-
db.loadTtl(`
|
|
1099
|
-
@prefix : <http://underwriting.org/> .
|
|
1100
|
-
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
|
|
1101
|
-
|
|
1102
|
-
:APP001 a :Application ;
|
|
1103
|
-
:applicant :PERSON001 ;
|
|
1104
|
-
:requestedAmount "500000"^^xsd:decimal ;
|
|
1105
|
-
:propertyType :SingleFamily .
|
|
1106
|
-
|
|
1107
|
-
:PERSON001 a :Person ;
|
|
1108
|
-
:creditScore "720"^^xsd:integer ;
|
|
1109
|
-
:dti "0.35"^^xsd:decimal ;
|
|
1110
|
-
:employmentYears "5"^^xsd:integer ;
|
|
1111
|
-
:bankruptcyHistory false .
|
|
1112
|
-
`);
|
|
1113
|
-
|
|
1114
|
-
// Underwriting rules as Datalog
|
|
1115
|
-
const datalog = new DatalogProgram();
|
|
1116
|
-
|
|
1117
|
-
// Facts
|
|
1118
|
-
datalog.addFact(JSON.stringify({predicate:'application', terms:['APP001']}));
|
|
1119
|
-
datalog.addFact(JSON.stringify({predicate:'credit_score', terms:['APP001','720']}));
|
|
1120
|
-
datalog.addFact(JSON.stringify({predicate:'dti', terms:['APP001','0.35']}));
|
|
1121
|
-
datalog.addFact(JSON.stringify({predicate:'employment_years', terms:['APP001','5']}));
|
|
1122
|
-
|
|
1123
|
-
// Auto-Approve Rule: Credit > 700, DTI < 0.43, Employment > 2 years
|
|
1124
|
-
datalog.addRule(JSON.stringify({
|
|
1125
|
-
head: {predicate:'auto_approve', terms:['?App']},
|
|
1126
|
-
body: [
|
|
1127
|
-
{predicate:'application', terms:['?App']},
|
|
1128
|
-
{predicate:'credit_score', terms:['?App','?Credit']},
|
|
1129
|
-
{predicate:'dti', terms:['?App','?DTI']},
|
|
1130
|
-
{predicate:'employment_years', terms:['?App','?Years']}
|
|
1131
|
-
// Note: Numeric comparisons would be handled in production
|
|
1132
|
-
]
|
|
1133
|
-
}));
|
|
1134
|
-
|
|
1135
|
-
const decisions = JSON.parse(evaluateDatalog(datalog));
|
|
1136
|
-
console.log('Underwriting decisions:', decisions);
|
|
1137
|
-
```
|
|
1138
|
-
|
|
1139
|
-
---
|
|
451
|
+
| Core | SPARQL 1.1 Engine | 449ns lookups |
|
|
452
|
+
| Core | RDF 1.2 Support | W3C compliant |
|
|
453
|
+
| Core | Named Graphs | Quad store |
|
|
454
|
+
| Analytics | PageRank | O(V + E) |
|
|
455
|
+
| Analytics | Connected Components | Union-find |
|
|
456
|
+
| Analytics | Triangle Count | O(E^1.5) |
|
|
457
|
+
| Analytics | Motif Finding | Pattern DSL |
|
|
458
|
+
| AI | HNSW Embeddings | 16ms/10K vectors |
|
|
459
|
+
| AI | Agent Memory | 94% recall |
|
|
460
|
+
| Reasoning | Datalog | Semi-naive |
|
|
461
|
+
| Security | WASM Sandbox | Capability-based |
|
|
462
|
+
| Audit | ProofDAG | SHA-256 witnesses |
|
|
1140
463
|
|
|
1141
464
|
## API Reference
|
|
1142
465
|
|
|
1143
466
|
### GraphDB
|
|
1144
467
|
|
|
1145
468
|
```javascript
|
|
1146
|
-
const db = new GraphDB(baseUri)
|
|
1147
|
-
db.loadTtl(turtle, graphUri)
|
|
1148
|
-
db.querySelect(sparql)
|
|
1149
|
-
db.queryConstruct(sparql)
|
|
1150
|
-
db.countTriples()
|
|
1151
|
-
db.clear()
|
|
1152
|
-
db.getVersion() // SDK version
|
|
469
|
+
const db = new GraphDB(baseUri)
|
|
470
|
+
db.loadTtl(turtle, graphUri)
|
|
471
|
+
db.querySelect(sparql)
|
|
472
|
+
db.queryConstruct(sparql)
|
|
473
|
+
db.countTriples()
|
|
474
|
+
db.clear()
|
|
1153
475
|
```
|
|
1154
476
|
|
|
1155
477
|
### GraphFrame
|
|
1156
478
|
|
|
1157
479
|
```javascript
|
|
1158
480
|
const gf = new GraphFrame(verticesJson, edgesJson)
|
|
1159
|
-
gf.pageRank(dampingFactor, iterations)
|
|
1160
|
-
gf.connectedComponents()
|
|
1161
|
-
gf.triangleCount()
|
|
1162
|
-
gf.shortestPaths(sourceId)
|
|
1163
|
-
gf.find(motifPattern)
|
|
481
|
+
gf.pageRank(dampingFactor, iterations)
|
|
482
|
+
gf.connectedComponents()
|
|
483
|
+
gf.triangleCount()
|
|
484
|
+
gf.shortestPaths(sourceId)
|
|
485
|
+
gf.find(motifPattern)
|
|
1164
486
|
```
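The two JSON arguments are arrays of `{id}` vertices and `{src, dst}` edges, as in the examples elsewhere in this diff. A short sketch:

```javascript
const { GraphFrame } = require('rust-kgdb');

const vertices = JSON.stringify([{ id: 'P001' }, { id: 'P002' }, { id: 'P003' }]);
const edges = JSON.stringify([
  { src: 'P001', dst: 'P002' },
  { src: 'P002', dst: 'P003' },
  { src: 'P003', dst: 'P001' } // closes a triangle
]);

const gf = new GraphFrame(vertices, edges);
console.log(gf.triangleCount());    // 1 circular pattern
console.log(gf.pageRank(0.15, 20)); // damping factor 0.15, 20 iterations
```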
|
|
1165
487
|
|
|
1166
488
|
### EmbeddingService
|
|
1167
489
|
|
|
1168
490
|
```javascript
|
|
1169
491
|
const emb = new EmbeddingService()
|
|
1170
|
-
emb.storeVector(entityId, float32Array)
|
|
1171
|
-
emb.rebuildIndex()
|
|
1172
|
-
emb.findSimilar(entityId, k, threshold)
|
|
1173
|
-
emb.onTripleInsert(s, p, o, g) // Update neighbor cache
|
|
1174
|
-
emb.getNeighborsOut(entityId) // Get outgoing neighbors
|
|
492
|
+
emb.storeVector(entityId, float32Array)
|
|
493
|
+
emb.rebuildIndex()
|
|
494
|
+
emb.findSimilar(entityId, k, threshold)
|
|
1175
495
|
```
|
|
1176
496
|
|
|
1177
497
|
### DatalogProgram
|
|
1178
498
|
|
|
1179
499
|
```javascript
|
|
1180
500
|
const dl = new DatalogProgram()
|
|
1181
|
-
dl.addFact(factJson)
|
|
1182
|
-
dl.addRule(ruleJson)
|
|
1183
|
-
evaluateDatalog(dl)
|
|
1184
|
-
queryDatalog(dl, queryJson) // Query specific predicate
|
|
501
|
+
dl.addFact(factJson)
|
|
502
|
+
dl.addRule(ruleJson)
|
|
503
|
+
evaluateDatalog(dl)
|
|
1185
504
|
```
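Facts and rules are passed as JSON strings. The shapes below mirror the fraud-rule example that appears elsewhere in this diff:

```javascript
const { DatalogProgram, evaluateDatalog } = require('rust-kgdb');

const dl = new DatalogProgram();

// Facts: { predicate, terms }
dl.addFact(JSON.stringify({ predicate: 'knows', terms: ['P001', 'P002'] }));
dl.addFact(JSON.stringify({ predicate: 'claims_with', terms: ['P001', 'PROV001'] }));
dl.addFact(JSON.stringify({ predicate: 'claims_with', terms: ['P002', 'PROV001'] }));

// Rules: { head, body }, with '?X'-style variables
dl.addRule(JSON.stringify({
  head: { predicate: 'fraud_ring', terms: ['?X', '?Y'] },
  body: [
    { predicate: 'knows', terms: ['?X', '?Y'] },
    { predicate: 'claims_with', terms: ['?X', '?P'] },
    { predicate: 'claims_with', terms: ['?Y', '?P'] }
  ]
}));

const inferred = JSON.parse(evaluateDatalog(dl));
console.log(inferred); // includes fraud_ring(P001, P002)
```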
|
|
1186
505
|
|
|
1187
|
-
###
|
|
506
|
+
### Factory Functions
|
|
1188
507
|
|
|
1189
508
|
```javascript
|
|
1190
|
-
|
|
1191
|
-
|
|
509
|
+
friendsGraph()
|
|
510
|
+
chainGraph(n)
|
|
511
|
+
starGraph(n)
|
|
512
|
+
completeGraph(n)
|
|
513
|
+
cycleGraph(n)
|
|
1192
514
|
```
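A short usage sketch, following the Pregel example that appears elsewhere in this diff (`pregelShortestPaths` is exported alongside the factories):

```javascript
const { chainGraph, pregelShortestPaths } = require('rust-kgdb');

// Linear chain v0 -> v1 -> ... -> v9999
const graph = chainGraph(10000);

// BSP shortest paths from v0, capped at 100 supersteps
const distances = pregelShortestPaths(graph, 'v0', 100);
console.log(distances);
```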
|
|
1193
515
|
|
|
1194
|
-
|
|
516
|
+
## Installation
|
|
1195
517
|
|
|
1196
|
-
```
|
|
1197
|
-
|
|
1198
|
-
chainGraph(n) // Linear chain of n vertices
|
|
1199
|
-
starGraph(n) // Star topology with n leaves
|
|
1200
|
-
completeGraph(n) // Fully connected graph
|
|
1201
|
-
cycleGraph(n) // Circular graph
|
|
518
|
+
```bash
|
|
519
|
+
npm install rust-kgdb
|
|
1202
520
|
```
|
|
1203
521
|
|
|
1204
|
-
|
|
522
|
+
Platforms: macOS (Intel/Apple Silicon), Linux (x64/ARM64), Windows (x64)
|
|
523
|
+
|
|
524
|
+
Requirements: Node.js 14+
|
|
525
|
+
|
|
526
|
+
## License
|
|
1205
527
|
|
|
1206
|
-
Apache 2.0
|
|
528
|
+
Apache 2.0
|