rust-kgdb 0.6.54 → 0.6.55
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +22 -0
- package/README.md +187 -0
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED

@@ -2,6 +2,28 @@

All notable changes to the rust-kgdb TypeScript SDK will be documented in this file.

## [0.6.55] - 2025-12-17

### Thought-Provoking Documentation Rewrite

#### New Opening: "Why AI Agents Keep Lying"

Rewrote the entire opening to explain the core insight in a human, thought-provoking style:
- LLMs don't know YOUR data - they make things up
- The industry response (guardrails, RAG, fine-tuning) treats symptoms, not causes
- Our insight: what if AI couldn't lie even if it wanted to? Through ARCHITECTURE.
- The LLM generates queries; the database executes them on YOUR data - no hallucination possible

#### Mathematical Foundations Section

Rewrote in SME style, explaining WHY each concept matters:
- **Category Theory**: "Not Academic Masturbation" - makes incorrect tool calls unrepresentable
- **Curry-Howard**: proofs you can execute - audit trails ARE mathematical proofs
- **WCOJ**: when O(N²) is unacceptable - O(N^1.5) vs O(N²) operations for triangle queries
- **Sparse Matrix**: why your RAM doesn't explode - CSR makes terabyte matrices practical
- **Semi-Naive Datalog**: don't repeat yourself - O(depth) vs O(facts) iterations
- **HNSW**: O(log N) similarity - find the top 10 of 10K embeddings in 16ms

---

## [0.6.54] - 2025-12-17

### Embedded Database: Game Changer
package/README.md
CHANGED

@@ -6,6 +6,58 @@

## What Is This?

### Have You Ever Wondered Why AI Agents Keep Lying?

Here's the uncomfortable truth: **LLMs don't know your data**. They've read Wikipedia, Stack Overflow, and half the internet - but they've never seen your customer records, your claims database, or your internal knowledge graph.

So when you ask "find suspicious providers," they do what humans do when they don't know the answer: **they make something up that sounds plausible**.

The industry's response? "Add more guardrails!" "Use RAG!" "Fine-tune on your data!"

We asked a different question: **What if the AI couldn't lie even if it wanted to?**

Not through prompting. Not through fine-tuning. Through **architecture**.

### The Insight That Changes Everything

What if, instead of asking an LLM to generate answers, we asked it to generate **database queries**?

The LLM doesn't need to know your data. It just needs to know:
1. What questions can be asked (your schema)
2. How to ask them (SPARQL/Datalog syntax)

Then a **real database** - with your actual data - executes the query and returns facts. Not hallucinations. Facts.

```
User: "Find suspicious providers"
        ↓
LLM generates: SELECT ?provider WHERE { ?provider :riskScore ?s . FILTER(?s > 0.8) }
        ↓
Database executes: scans 47M triples at 449ns per lookup
        ↓
Returns: [PROV001, PROV847, PROV2201] ← these actually exist in YOUR data
```

**The AI suggests what to look for. The database finds exactly that. No hallucination possible.**
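That split can be sketched in a few lines. This is an illustrative toy, not the rust-kgdb API: an in-memory array stands in for the triple store, and the "generated query" is reduced to a filter spec.

```javascript
// Illustrative only: a toy store plays the database's role so the
// generate-then-execute split is visible end to end.
const triples = [
  { s: 'PROV001', p: 'riskScore', o: 0.92 },
  { s: 'PROV002', p: 'riskScore', o: 0.31 },
  { s: 'PROV847', p: 'riskScore', o: 0.88 },
];

// The LLM's only job: turn intent into a query (here reduced to a filter spec).
const generatedQuery = { predicate: 'riskScore', min: 0.8 };

// The database's only job: evaluate that query against real data.
function execute(store, q) {
  return store.filter(t => t.p === q.predicate && t.o > q.min).map(t => t.s);
}

execute(triples, generatedQuery); // every ID returned exists in the store
```

Whatever the model gets wrong about thresholds or predicates, `execute` can only ever return subjects that are physically present in `store`.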
### But Wait - Where's the Database?

Here's where it gets interesting. The traditional approach:
- Install a Virtuoso/RDFox/Neo4j server
- Configure connections
- Pay for licenses
- Hire a DBA

Our approach: **the database is embedded in your app.**

```bash
npm install rust-kgdb  # That's it. You now have a full SPARQL database.
```

47.2MB native addon. Zero configuration. 449ns lookups. Embedded like SQLite, powerful like RDFox.

---

**rust-kgdb** is two layers in one package:

```
@@ -147,6 +199,141 @@ Embedded (single node) → Clustered (distributed)

---

## Mathematical Foundations: Why This Actually Works

### The Problem with LLM Tool Calling

Here's a dirty secret about AI agents: **most tool calls are prayers**.

The LLM generates a function call, hopes the types match, and if it fails? Retry and pray harder. This is why production AI systems feel brittle.

We took a different approach: **make incorrect tool calls impossible to express**.

### Category Theory: Not Academic Masturbation

When you hear "category theory," you probably think of mathematicians drawing commutative diagrams that no one understands. Here's why it actually matters for AI agents:

```
Every tool is a morphism: InputType → OutputType

kg.sparql.query  : Query   → BindingSet
kg.motif.find    : Pattern → Matches
kg.datalog.run   : Rules   → InferredFacts
```

**The key insight**: if the LLM can only compose morphisms where types align, it *cannot* hallucinate invalid tool chains. It's not about "being careful" - it's about making mistakes unrepresentable.

```javascript
// This composition type-checks: Query → BindingSet → Aggregation
planner.compose(sparqlQuery, aggregator)     // ✅ Valid

// This doesn't even compile conceptually
planner.compose(sparqlQuery, imageGenerator) // ❌ Type error
```
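The same idea can be made runnable with explicit type tags. This is a hedged sketch: the tags and the `compose` helper are hypothetical stand-ins, not the planner's real interface.

```javascript
// Hypothetical sketch: tools carry input/output type tags, and compose()
// rejects any chain whose tags don't line up.
const sparqlQuery    = { input: 'Query',      output: 'BindingSet',  run: q => [q] };
const aggregator     = { input: 'BindingSet', output: 'Aggregation', run: rows => rows.length };
const imageGenerator = { input: 'Prompt',     output: 'Image',       run: p => p };

function compose(f, g) {
  if (f.output !== g.input) {
    throw new TypeError(`cannot compose ${f.output} → ${g.input}`);
  }
  return x => g.run(f.run(x)); // only reachable when the types align
}

compose(sparqlQuery, aggregator)('q');   // valid chain
// compose(sparqlQuery, imageGenerator)  // rejected before anything runs
```

The mismatched chain fails at composition time, before any tool executes - that is the whole point of typing the morphisms.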
### Curry-Howard: Proofs You Can Execute

The **Curry-Howard correspondence** says something profound: **proofs and programs are the same thing**.

In our system:
- A valid reasoning trace IS a mathematical proof that the answer is correct
- The type signature of a tool IS a proposition about what it transforms
- Composing tools IS constructing a proof by implication

```javascript
result.proofDAG = {
  // This isn't just logging - it's a PROOF OBJECT
  steps: [
    { tool: 'kg.sparql.query', proves: '∃ provider P001 with 47 claims' },
    { tool: 'kg.datalog.rule', proves: 'P001 ∈ highRisk (by rule R3)' }
  ],
  hash: 'sha256:8f3a...', // Same proof = same hash, always
  valid: true             // Type-checked, therefore valid
}
```

**Why this matters for compliance**: When a regulator asks "why did you flag this provider?", you don't show them chat logs. You show them a mathematical proof.
### WCOJ: When O(N²) Is Unacceptable

Finding triangles in a graph (A→B→C→A) seems simple. The naive approach:
1. For each edge A→B
2. For each edge B→C
3. Check whether C→A exists

That's O(N²) - fine for toy graphs, death for production.

**Worst-Case Optimal Joins** (LeapFrog TrieJoin) do something clever:
- Organize edges in tries by (subject, predicate, object)
- Traverse all three tries simultaneously
- Skip entire branches that can't possibly match

```
Traditional: O(N²) for the triangle query
WCOJ:        O(N^ρ*), where ρ* is the fractional edge cover number

For triangles: ρ* = 3/2, so O(N^1.5) vs O(N²)
At 1M edges: ~10⁹ operations vs ~10¹² operations
```
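A miniature of the same trick: real LeapFrog TrieJoin walks sorted tries with skip-ahead, but even a per-edge set intersection shows the shape of the algorithm - candidates come from intersecting two constraints at once rather than enumerating all two-hop paths.

```javascript
// For each edge a→b, a completing node c must satisfy b→c AND c→a,
// i.e. lie in the intersection succ(b) ∩ pred(a). Each directed triangle
// is found three times (once per edge), so divide by 3.
const edges = [['A','B'], ['B','C'], ['C','A'], ['A','D']];

const succ = new Map(), pred = new Map();
for (const [a, b] of edges) {
  if (!succ.has(a)) succ.set(a, new Set());
  if (!pred.has(b)) pred.set(b, new Set());
  succ.get(a).add(b);
  pred.get(b).add(a);
}

function countTriangles() {
  let n = 0;
  for (const [a, b] of edges) {
    const cands = succ.get(b) ?? new Set();
    const back  = pred.get(a) ?? new Set();
    // iterate the smaller side, probe the larger - skip-ahead in miniature
    const [small, large] = cands.size <= back.size ? [cands, back] : [back, cands];
    for (const c of small) if (large.has(c)) n++;
  }
  return n / 3;
}

countTriangles(); // exactly one directed triangle: A→B→C→A
```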
### Sparse Matrix: Why Your RAM Doesn't Explode

A knowledge graph with 1M entities has a 1M × 1M adjacency matrix. That's 1 trillion cells. At 8 bytes each: 8 terabytes. For one matrix.

**CSR (Compressed Sparse Row)** stores only the non-zero entries:
- Real graphs are ~99.99% sparse
- 1M entities with 10M edges = 10M entries, not 1T
- Transitive closure becomes matrix multiplication: A* = I + A + A² + ...

```
rdfs:subClassOf closure in OWL:
Dense: impossible (terabytes of memory)
CSR:   seconds (megabytes of memory)
```
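A minimal CSR sketch (my own toy graph, not rust-kgdb internals): a 4-node graph with edges 0→1, 0→2, 2→3 stored as two small arrays, plus per-node reachability computed by walking CSR rows - the iterative form of A* = I + A + A² + ...

```javascript
// CSR stores only the non-zeros: row i's targets live in
// colIdx[rowPtr[i] .. rowPtr[i+1]).
const rowPtr = [0, 2, 2, 3, 3]; // 4 rows + sentinel
const colIdx = [1, 2, 3];       // targets of edges 0→1, 0→2, 2→3

function neighbors(i) {
  return colIdx.slice(rowPtr[i], rowPtr[i + 1]);
}

// Transitive closure of one node = BFS over CSR rows; each frontier
// expansion is a sparse matrix-vector product in disguise.
function reachable(src) {
  const seen = new Set([src]);
  const queue = [src];
  while (queue.length) {
    for (const j of neighbors(queue.shift())) {
      if (!seen.has(j)) { seen.add(j); queue.push(j); }
    }
  }
  return [...seen].sort();
}

reachable(0); // 0 reaches 1, 2, and (through 2) 3
```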
### Semi-Naive Datalog: Don't Repeat Yourself

Recursive rules need fixpoint iteration. The naive way recomputes everything:

```
Iteration 1: Compute ALL ancestor relationships
Iteration 2: Compute ALL ancestor relationships again ← wasteful
Iteration 3: Compute ALL ancestor relationships again ← really wasteful
```

**Semi-naive evaluation**: only derive facts using the NEW facts from the previous iteration.

```
Iteration 1: Direct parents (new: 1000 facts)
Iteration 2: Use only those 1000 new facts → grandparents (new: 800)
Iteration 3: Use only those 800 new facts → great-grandparents (new: 400)
...converges in O(depth) iterations, not O(facts)
```
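The delta trick fits in a few lines. A toy sketch for the classic rule `ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z)` - each round joins only the facts that were new in the previous round:

```javascript
// Semi-naive fixpoint: join parent against the delta, never the full relation.
const parent = [['a','b'], ['b','c'], ['c','d']];

function ancestors() {
  const all = new Set(parent.map(p => p.join('→')));
  let delta = [...parent];                 // round 1's "new" facts: the base facts
  while (delta.length) {
    const next = [];
    for (const [x, y] of parent) {
      for (const [y2, z] of delta) {       // parent ⋈ Δancestor
        const key = `${x}→${z}`;
        if (y === y2 && !all.has(key)) {
          all.add(key);
          next.push([x, z]);
        }
      }
    }
    delta = next;                          // only genuinely new facts survive
  }
  return [...all].sort();
}

ancestors(); // the six pairs of the chain a→b→c→d, in 3 rounds (the chain's depth)
```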
### HNSW: O(log N) Similarity in a World of Vectors

Finding the nearest neighbor among a million vectors should take a million comparisons. It doesn't have to.

**HNSW** builds a navigable graph where:
- Top layers have few nodes with long-range connections
- Bottom layers have all nodes with local connections
- Search: start at the top, greedily descend, refine at the bottom

```
Layer 2: ●───────────────────● (sparse, long jumps)
         │                   │
Layer 1: ●────●────●────●────● (medium density)
         │    │    │    │    │
Layer 0: ●─●─●─●─●─●─●─●─●─●─● (all nodes, local connections)

Search path: start top-left, jump to the approximate region, refine locally
Result: O(log N) comparisons, ~95% recall
```

**Why this matters**: when your agent needs "similar past queries," it doesn't scan 10,000 embeddings. It finds the top 10 in 16 milliseconds.
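A one-dimensional toy of the coarse-to-fine descent (hedged: real HNSW randomizes layer membership and keeps bounded neighbor lists per node; here each layer is just a sorted chain over 64 scalar "embeddings"):

```javascript
// Three layers: sparse on top, dense at the bottom. Search greedily walks
// each layer's chain toward the query, then drops down and refines.
const layer2 = [0, 16, 32, 48];
const layer1 = Array.from({ length: 16 }, (_, i) => i * 4); // 0, 4, ..., 60
const layer0 = Array.from({ length: 64 }, (_, i) => i);     // 0, 1, ..., 63

// Greedy walk on a sorted chain: move to an adjacent node while it improves.
function searchLayer(layer, query, start) {
  let i = start, comparisons = 0;
  for (;;) {
    let best = i;
    for (const j of [i - 1, i + 1]) {
      if (j < 0 || j >= layer.length) continue;
      comparisons++;
      if (Math.abs(layer[j] - query) < Math.abs(layer[best] - query)) best = j;
    }
    if (best === i) return { i, comparisons };
    i = best;
  }
}

function nearest(query) {
  const r2 = searchLayer(layer2, query, 0);                // long jumps first
  const r1 = searchLayer(layer1, query, layer2[r2.i] / 4); // reuse the coarse result
  const r0 = searchLayer(layer0, query, layer1[r1.i]);     // refine locally
  return {
    value: layer0[r0.i],
    comparisons: r2.comparisons + r1.comparisons + r0.comparisons,
  };
}

nearest(37.3); // lands on 37 after far fewer than the 64 comparisons a scan needs
```

Most of layer 0 is never touched: the top layers deliver the search to the right neighborhood, and only a handful of local steps remain.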
---

## Core Concepts: What We Bring and Why

### 1. Schema-Aware Query Generation
package/package.json
CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "rust-kgdb",
-  "version": "0.6.54",
+  "version": "0.6.55",
   "description": "High-performance RDF/SPARQL database with AI agent framework. GraphDB (449ns lookups, 35x faster than RDFox), GraphFrames analytics (PageRank, motifs), Datalog reasoning, HNSW vector embeddings. HyperMindAgent for schema-aware query generation with audit trails. W3C SPARQL 1.1 compliant. Native performance via Rust + NAPI-RS.",
   "main": "index.js",
   "types": "index.d.ts",