rust-kgdb 0.6.54 → 0.6.55

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/CHANGELOG.md +22 -0
  2. package/README.md +187 -0
  3. package/package.json +1 -1
package/CHANGELOG.md CHANGED
@@ -2,6 +2,28 @@
 
  All notable changes to the rust-kgdb TypeScript SDK will be documented in this file.
 
+ ## [0.6.55] - 2025-12-17
+
+ ### Thought-Provoking Documentation Rewrite
+
+ #### New Opening: "Why AI Agents Keep Lying"
+ Rewrote the entire opening to explain the core insight in a human, thought-provoking style:
+ - LLMs don't know YOUR data - they make things up
+ - Industry response (guardrails, RAG, fine-tuning) treats symptoms, not causes
+ - Our insight: What if AI couldn't lie even if it wanted to? Through ARCHITECTURE.
+ - LLM generates queries, database executes on YOUR data - no hallucination possible
+
+ #### Mathematical Foundations Section
+ Rewrote in SME style explaining WHY each concept matters:
+ - **Category Theory**: "Not Academic Masturbation" - makes incorrect tool calls unrepresentable
+ - **Curry-Howard**: Proofs you can execute - audit trails ARE mathematical proofs
+ - **WCOJ**: When O(N²) is unacceptable - ~1B vs 1T operations for triangle queries
+ - **Sparse Matrix**: Why your RAM doesn't explode - CSR makes terabyte matrices practical
+ - **Semi-Naive Datalog**: Don't repeat yourself - O(depth) vs O(facts) iterations
+ - **HNSW**: O(log N) similarity - find top 10 from 10K embeddings in 16ms
+
+ ---
+
  ## [0.6.54] - 2025-12-17
 
  ### Embedded Database: Game Changer
package/README.md CHANGED
@@ -6,6 +6,58 @@
 
  ## What Is This?
 
+ ### Have You Ever Wondered Why AI Agents Keep Lying?
+
+ Here's the uncomfortable truth: **LLMs don't know your data**. They've read Wikipedia, Stack Overflow, and half the internet - but they've never seen your customer records, your claims database, or your internal knowledge graph.
+
+ So when you ask "find suspicious providers," they do what humans do when they don't know the answer: **they make something up that sounds plausible**.
+
+ The industry's response? "Add more guardrails!" "Use RAG!" "Fine-tune on your data!"
+
+ We asked a different question: **What if the AI couldn't lie even if it wanted to?**
+
+ Not through prompting. Not through fine-tuning. Through **architecture**.
+
+ ### The Insight That Changes Everything
+
+ What if instead of asking an LLM to generate answers, we asked it to generate **database queries**?
+
+ The LLM doesn't need to know your data. It just needs to know:
+ 1. What questions can be asked (your schema)
+ 2. How to ask them (SPARQL/Datalog syntax)
+
+ Then a **real database** - with your actual data - executes the query and returns facts. Not hallucinations. Facts.
+
+ ```
+ User: "Find suspicious providers"
+
+ LLM generates: SELECT ?provider WHERE { ?provider :riskScore ?s . FILTER(?s > 0.8) }
+
+ Database executes: Scans 47M triples at 449ns per lookup
+
+ Returns: [PROV001, PROV847, PROV2201] ← These actually exist in YOUR data
+ ```
+
+ **The AI suggests what to look for. The database finds exactly that. No hallucination possible.**
+
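+ The whole pattern fits in a few lines. The sketch below is illustrative only - `db.query` and `askLLM` are hypothetical stand-ins for an embedded store handle and an LLM call, not the exact rust-kgdb API - but it shows where the boundary sits: the model writes the query, the database supplies every fact.
+
+ ```javascript
+ // Sketch of the generate-then-execute loop. `db.query` and `askLLM` are
+ // illustrative stand-ins, not the exact rust-kgdb API.
+ async function answer(question, schemaDescription, db, askLLM) {
+   // 1. The LLM sees only the schema and the question - never the data.
+   const sparql = await askLLM(
+     `Schema:\n${schemaDescription}\n\nWrite a SPARQL SELECT query for: ${question}`
+   )
+
+   // 2. The embedded database executes the query against YOUR triples.
+   //    Every binding it returns exists in the store; nothing is invented.
+   return db.query(sparql)
+ }
+ ```
+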
+ ### But Wait - Where's the Database?
+
+ Here's where it gets interesting. Traditional approach:
+ - Install Virtuoso/RDFox/Neo4j server
+ - Configure connections
+ - Pay for licenses
+ - Hire a DBA
+
+ Our approach: **The database is embedded in your app.**
+
+ ```bash
+ npm install rust-kgdb # That's it. You now have a full SPARQL database.
+ ```
+
+ 47.2MB native addon. Zero configuration. 449ns lookups. Embedded like SQLite, powerful like RDFox.
+
+ ---
+
  **rust-kgdb** is two layers in one package:
 
  ```
@@ -147,6 +199,141 @@ Embedded (single node) → Clustered (distributed)
 
  ---
 
+ ## Mathematical Foundations: Why This Actually Works
+
+ ### The Problem with LLM Tool Calling
+
+ Here's a dirty secret about AI agents: **most tool calls are prayers**.
+
+ The LLM generates a function call, hopes the types match, and if it fails? Retry and pray harder. This is why production AI systems feel brittle.
+
+ We took a different approach: **make incorrect tool calls impossible to express**.
+
+ ### Category Theory: Not Academic Masturbation
+
+ When you hear "category theory," you probably think of mathematicians drawing commutative diagrams that no one understands. Here's why it actually matters for AI agents:
+
+ ```
+ Every tool is a morphism: InputType → OutputType
+
+ kg.sparql.query : Query → BindingSet
+ kg.motif.find : Pattern → Matches
+ kg.datalog.run : Rules → InferredFacts
+ ```
+
+ **The key insight**: If the LLM can only compose morphisms where types align, it *cannot* hallucinate invalid tool chains. It's not about "being careful" - it's about making mistakes unrepresentable.
+
+ ```javascript
+ // This composition type-checks: Query → BindingSet → Aggregation
+ planner.compose(sparqlQuery, aggregator) // ✅ Valid
+
+ // This doesn't even compile conceptually
+ planner.compose(sparqlQuery, imageGenerator) // ❌ Type error
+ ```
+
+ ### Curry-Howard: Proofs You Can Execute
+
+ The **Curry-Howard correspondence** says something profound: **proofs and programs are the same thing**.
+
+ In our system:
+ - A valid reasoning trace IS a mathematical proof that the answer is correct
+ - The type signature of a tool IS a proposition about what it transforms
+ - Composing tools IS constructing a proof by implication
+
+ ```javascript
+ result.proofDAG = {
+   // This isn't just logging - it's a PROOF OBJECT
+   steps: [
+     { tool: 'kg.sparql.query', proves: '∃ provider P001 with 47 claims' },
+     { tool: 'kg.datalog.rule', proves: 'P001 ∈ highRisk (by rule R3)' }
+   ],
+   hash: 'sha256:8f3a...', // Same proof = same hash, always
+   valid: true // Type-checked, therefore valid
+ }
+ ```
+
+ **Why this matters for compliance**: When a regulator asks "why did you flag this provider?", you don't show them chat logs. You show them a mathematical proof.
+
+ ### WCOJ: When O(N²) is Unacceptable
+
+ Finding triangles in a graph (A→B→C→A) seems simple. The naive approach:
+ 1. For each edge A→B
+ 2. For each edge B→C
+ 3. Check if C→A exists
+
+ That's O(N²) - fine for toy graphs, death for production.
+
+ **Worst-Case Optimal Joins** (LeapFrog TrieJoin) do something clever:
+ - Organize edges in tries by (subject, predicate, object)
+ - Traverse all three tries simultaneously
+ - Skip entire branches that can't possibly match
+
+ ```
+ Traditional: O(N²) for triangle query
+ WCOJ: O(N^ρ*) where ρ* = fractional edge cover number
+
+ For triangles: ρ* = 1.5, so O(N^1.5) vs O(N²)
+ At 1M edges: ~1B operations vs 1T operations
+ ```
+
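+ The core of the trick is intersecting sorted lists instead of nesting loops. The toy sketch below is not LeapFrog TrieJoin itself - just the skip-ahead intersection idea it builds on, over plain sorted adjacency lists:
+
+ ```javascript
+ // Toy illustration of the idea behind worst-case optimal joins: intersect
+ // sorted neighbor lists and skip past values that cannot match, instead of
+ // nesting loops over all edge pairs. Not the LeapFrog TrieJoin implementation.
+ function countTriangles(adj) {
+   // adj: Map from node id -> sorted array of neighbor ids (undirected graph)
+   let triangles = 0
+   for (const [a, neighborsA] of adj) {
+     for (const b of neighborsA) {
+       if (b <= a) continue                      // visit each edge once
+       const neighborsB = adj.get(b) || []
+       let i = 0, j = 0                          // two-pointer "leapfrog"
+       while (i < neighborsA.length && j < neighborsB.length) {
+         const x = neighborsA[i], y = neighborsB[j]
+         if (x === y) { if (x > b) triangles++; i++; j++ }
+         else if (x < y) i++                     // skip ahead in one list
+         else j++                                // ...or the other
+       }
+     }
+   }
+   return triangles
+ }
+
+ // A triangle 1-2-3 plus a dangling edge 3-4
+ const adj = new Map([[1, [2, 3]], [2, [1, 3]], [3, [1, 2, 4]], [4, [3]]])
+ console.log(countTriangles(adj)) // 1
+ ```
+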
+ ### Sparse Matrix: Why Your RAM Doesn't Explode
+
+ A knowledge graph with 1M entities has a 1M × 1M adjacency matrix. That's 1 trillion cells. At 8 bytes each: 8 terabytes. For one matrix.
+
+ **CSR (Compressed Sparse Row)** stores only non-zero entries:
+ - Real graphs are ~99.99% sparse
+ - 1M entities with 10M edges = 10M entries, not 1T
+ - Transitive closure becomes matrix multiplication: A* = I + A + A² + ...
+
+ ```
+ rdfs:subClassOf closure in OWL:
+ Dense: Impossible (terabytes of memory)
+ CSR: Seconds (megabytes of memory)
+ ```
+
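+ For intuition, here is what CSR looks like for a tiny graph: two flat arrays instead of an N × N grid. This is a generic illustration of the format, not rust-kgdb's internal layout:
+
+ ```javascript
+ // Generic CSR illustration (not rust-kgdb's internal layout).
+ // Graph: 0 -> 1, 0 -> 2, 2 -> 3   (4 nodes, 3 edges)
+ const csr = {
+   rowPtr: [0, 2, 2, 3, 3], // edges of node i live at colIdx[rowPtr[i]..rowPtr[i+1]]
+   colIdx: [1, 2, 3],       // edge targets, laid out row by row
+ }
+
+ // Neighbors of node i in O(degree) - no 4x4 matrix is ever materialized.
+ function neighbors(csr, i) {
+   return csr.colIdx.slice(csr.rowPtr[i], csr.rowPtr[i + 1])
+ }
+
+ // One sparse "matrix-vector" step: everything reachable in exactly one hop
+ // from a frontier. Iterating this to a fixpoint gives the transitive
+ // closure A* = I + A + A² + ... without ever building the dense matrix.
+ function step(csr, frontier) {
+   const next = new Set()
+   for (const i of frontier) for (const j of neighbors(csr, i)) next.add(j)
+   return next
+ }
+
+ console.log(neighbors(csr, 0))            // [1, 2]
+ console.log([...step(csr, new Set([0]))]) // [1, 2]
+ ```
+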
+ ### Semi-Naive Datalog: Don't Repeat Yourself
+
+ Recursive rules need fixpoint iteration. The naive way recomputes everything:
+
+ ```
+ Iteration 1: Compute ALL ancestor relationships
+ Iteration 2: Compute ALL ancestor relationships again ← wasteful
+ Iteration 3: Compute ALL ancestor relationships again ← really wasteful
+ ```
+
+ **Semi-naive evaluation**: Only derive facts using NEW facts from the previous iteration.
+
+ ```
+ Iteration 1: Direct parents (new: 1000 facts)
+ Iteration 2: Use only those 1000 new facts → grandparents (new: 800)
+ Iteration 3: Use only those 800 new facts → great-grandparents (new: 400)
+ ...converges in O(depth) iterations, not O(facts)
+ ```
+
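+ The delta trick is easy to see in code. Below is a minimal sketch for the ancestor example - a generic illustration, not the engine's actual evaluator:
+
+ ```javascript
+ // Minimal semi-naive fixpoint for: ancestor(X, Z) :- ancestor(X, Y), parent(Y, Z).
+ // Generic sketch, not the engine's actual evaluator.
+ function ancestors(parentOf) {
+   // parentOf: Map child -> parent (one parent per node keeps the sketch small)
+   const all = new Set()   // every derived ancestor fact, as "X->Y" strings
+   let delta = new Set()   // facts that are NEW this iteration
+
+   // Iteration 0: the base facts are the direct parent edges.
+   for (const [child, parent] of parentOf) delta.add(`${child}->${parent}`)
+   delta.forEach(f => all.add(f))
+
+   // Each round joins ONLY the new facts against the base relation.
+   while (delta.size > 0) {
+     const next = new Set()
+     for (const fact of delta) {
+       const [x, y] = fact.split('->')
+       const z = parentOf.get(y)              // extend X->Y by Y's parent
+       if (z !== undefined && !all.has(`${x}->${z}`)) next.add(`${x}->${z}`)
+     }
+     next.forEach(f => all.add(f))
+     delta = next                             // shrinks to empty in O(depth) rounds
+   }
+   return all
+ }
+
+ console.log(ancestors(new Map([['c', 'b'], ['b', 'a']])))
+ // Set(3) { 'c->b', 'b->a', 'c->a' }
+ ```
+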
+ ### HNSW: O(log N) Similarity in a World of Vectors
+
+ Finding the nearest neighbor in a million vectors should take a million comparisons. It doesn't have to.
+
+ **HNSW** builds a navigable graph where:
+ - Top layers have few nodes with long-range connections
+ - Bottom layers have all nodes with local connections
+ - Search: Start at top, greedily descend, refine at bottom
+
+ ```
+ Layer 2: ●───────────────────● (sparse, long jumps)
+          │                   │
+ Layer 1: ●────●────●────●────● (medium density)
+          │    │    │    │    │
+ Layer 0: ●─●─●─●─●─●─●─●─●─●─● (all nodes, local connections)
+
+ Search path: Start top-left, jump to approximate region, refine locally
+ Result: O(log N) comparisons, ~95% recall
+ ```
+
+ **Why this matters**: When your agent needs "similar past queries," it doesn't scan 10,000 embeddings. It finds the top 10 in 16 milliseconds.
+
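+ To see why the layer stack helps, here is a stripped-down greedy descent over pre-built layers. It is a conceptual sketch only - single entry point, one candidate per layer, no beam - and it is not rust-kgdb's actual index code:
+
+ ```javascript
+ // Conceptual HNSW-style search: greedily walk each layer toward the query,
+ // then drop to the denser layer below and refine. Every node in an upper
+ // layer also exists in all layers beneath it.
+ function dist(a, b) {
+   let s = 0
+   for (let i = 0; i < a.length; i++) s += (a[i] - b[i]) ** 2
+   return s // squared Euclidean is enough for comparisons
+ }
+
+ // layers[l] = Map(nodeId -> { vector, neighbors: [nodeId, ...] }), layer 0 = all nodes
+ function search(layers, entryId, query) {
+   let current = entryId
+   for (let l = layers.length - 1; l >= 0; l--) {
+     const layer = layers[l]
+     let improved = true
+     while (improved) {
+       improved = false
+       for (const n of layer.get(current).neighbors) {
+         // Move to any neighbor strictly closer to the query.
+         if (dist(layer.get(n).vector, query) < dist(layer.get(current).vector, query)) {
+           current = n
+           improved = true
+         }
+       }
+     }
+     // `current` seeds the search on the next layer down.
+   }
+   return current // approximate nearest neighbor
+ }
+ ```
+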
+ ---
+
  ## Core Concepts: What We Bring and Why
 
  ### 1. Schema-Aware Query Generation
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "rust-kgdb",
- "version": "0.6.54",
+ "version": "0.6.55",
  "description": "High-performance RDF/SPARQL database with AI agent framework. GraphDB (449ns lookups, 35x faster than RDFox), GraphFrames analytics (PageRank, motifs), Datalog reasoning, HNSW vector embeddings. HyperMindAgent for schema-aware query generation with audit trails. W3C SPARQL 1.1 compliant. Native performance via Rust + NAPI-RS.",
  "main": "index.js",
  "types": "index.d.ts",