rust-kgdb 0.6.78 → 0.6.80
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +129 -1
- package/package.json +1 -1
package/README.md (changed)

````diff
@@ -4,7 +4,24 @@
 [](https://opensource.org/licenses/Apache-2.0)
 [](https://www.w3.org/TR/sparql11-query/)
 
-> **
+> **Enterprise Knowledge Graph with Native Graph Embeddings**: A production-grade RDF database featuring built-in RDF2Vec, multi-vector composite search, and distributed SPARQL execution—engineered for teams who need verifiable AI at scale.
+
+---
+
+## What's New in v0.6.79
+
+| Feature | Description | Performance |
+|---------|-------------|-------------|
+| **Rdf2VecEngine** | Native graph embeddings from random walks | 68 µs lookup (3,000x faster than APIs) |
+| **Composite Multi-Vector** | RRF fusion of RDF2Vec + OpenAI + domain | +26% recall improvement |
+| **Distributed SPARQL** | HDRF-partitioned Kubernetes clusters | 66-141ms across 3 executors |
+| **Auto-Embedding Triggers** | Vectors generated on graph insert/update | 37 µs incremental updates |
+
+```javascript
+const { GraphDB, Rdf2VecEngine, EmbeddingService } = require('rust-kgdb')
+```
+
+*See [Native Graph Embeddings](#native-graph-embeddings-rdf2vec-engine) for complete documentation and benchmarks.*
 
 ---
 
````
````diff
@@ -107,6 +124,26 @@ The math matters. When your fraud detection runs 35x faster, you catch fraud bef
 
 Most AI frameworks trust the LLM. We don't.
 
+### Core Capabilities
+
+| Layer | Feature | What It Does |
+|-------|---------|--------------|
+| **Database** | GraphDB | W3C SPARQL 1.1 compliant RDF store with 449ns lookups |
+| **Database** | Distributed SPARQL | HDRF partitioning across Kubernetes executors |
+| **Embeddings** | Rdf2VecEngine | Train 384-dim vectors from graph random walks |
+| **Embeddings** | EmbeddingService | Multi-provider composite vectors with RRF fusion |
+| **Embeddings** | HNSW Index | Approximate nearest neighbor search in 303µs |
+| **Analytics** | GraphFrames | PageRank, connected components, motif matching |
+| **Analytics** | Pregel API | Bulk synchronous parallel graph algorithms |
+| **Reasoning** | Datalog Engine | Recursive rule evaluation with fixpoint semantics |
+| **AI Agent** | HyperMindAgent | Schema-aware SPARQL generation from natural language |
+| **AI Agent** | Type System | Hindley-Milner type inference for query validation |
+| **AI Agent** | Proof DAG | SHA-256 audit trail for every AI decision |
+| **Security** | WASM Sandbox | Capability-based isolation with fuel metering |
+| **Security** | Schema Cache | Cross-agent ontology sharing with validation |
+
````
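The Datalog row's "recursive rule evaluation with fixpoint semantics" means rules are applied repeatedly until a round derives no new facts. The sketch below shows semi-naive evaluation for one classic recursive program (ancestor as the transitive closure of parent) in plain JavaScript. It illustrates the technique only; the program, the function name `ancestors`, and the pair-based fact format are assumptions, not rust-kgdb's Datalog engine or API.

```javascript
// Semi-naive fixpoint evaluation of a recursive Datalog program:
//   ancestor(X, Y) :- parent(X, Y).
//   ancestor(X, Z) :- ancestor(X, Y), parent(Y, Z).
// parentPairs is an array of [x, y] facts meaning parent(x, y).
function ancestors(parentPairs) {
  const key = (x, y) => JSON.stringify([x, y]) // collision-free set key

  // Index parent(Y, Z) by Y so the recursive rule's join is a lookup.
  const childrenOf = new Map()
  for (const [y, z] of parentPairs) {
    if (!childrenOf.has(y)) childrenOf.set(y, [])
    childrenOf.get(y).push(z)
  }

  const derived = new Map() // every ancestor fact found so far
  let delta = []            // facts first derived in the previous round

  // Base rule: every parent fact is an ancestor fact.
  for (const [x, y] of parentPairs) {
    const k = key(x, y)
    if (!derived.has(k)) {
      derived.set(k, [x, y])
      delta.push([x, y])
    }
  }

  // Recursive rule: join only the new facts each round (semi-naive).
  // The loop stops when a round derives nothing new, i.e. at the fixpoint.
  while (delta.length > 0) {
    const next = []
    for (const [x, y] of delta) {
      for (const z of childrenOf.get(y) || []) {
        const k = key(x, z)
        if (!derived.has(k)) {
          derived.set(k, [x, z])
          next.push([x, z])
        }
      }
    }
    delta = next
  }
  return [...derived.values()]
}

const family = [['alice', 'bob'], ['bob', 'carol'], ['carol', 'dave']]
console.log(ancestors(family).length) // 6
```

Because facts are only added when new, the loop also terminates on cyclic input such as `[['a', 'b'], ['b', 'a']]`.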
````diff
+### The Architecture Difference
+
 ```
 +===========================================================================+
 | |
````
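The Proof DAG row in the capability table describes a SHA-256 audit trail. One common way to build such a trail is a hash-linked DAG: each node's hash covers its own payload plus its parents' hashes, so altering any earlier step changes the hash of every step derived from it. The node shape, field names, and the three example steps below are assumptions for illustration, not rust-kgdb's proof format.

```javascript
const crypto = require('crypto')

// Build a proof node whose SHA-256 hash covers its payload and its parents' hashes.
// Parent hashes are sorted so the result does not depend on argument order.
// Note: JSON.stringify is key-order sensitive, so payloads must be built consistently.
function proofNode(payload, parents = []) {
  const parentHashes = parents.map((p) => p.hash).sort()
  const hash = crypto
    .createHash('sha256')
    .update(JSON.stringify({ payload, parents: parentHashes }))
    .digest('hex')
  return { payload, parents: parentHashes, hash }
}

// Hypothetical three-step trail: question -> generated query -> result
const question = proofNode({ step: 'question', text: 'Which providers share patients?' })
const query = proofNode({ step: 'sparql', text: 'SELECT ?p WHERE { ?p a ex:Provider }' }, [question])
const answer = proofNode({ step: 'result', rows: 3 }, [query])
console.log(answer.hash) // 64 hex characters
```

An auditor holding the same payloads can recompute each hash bottom-up and compare; a mismatch pinpoints the first altered step.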
````diff
@@ -305,6 +342,97 @@ The difference? HyperMind treats tools as **typed morphisms** (category theory),
 
 ---
 
+## Native Graph Embeddings: RDF2Vec Engine
+
+Traditional embedding pipelines introduce significant latency: serialize your entity, make an HTTP request to OpenAI or Cohere, wait 200-500ms, parse the response. For applications requiring real-time similarity—fraud detection, recommendation engines, entity resolution—this latency model becomes a critical bottleneck.
+
+**RDF2Vec takes a fundamentally different approach.** Instead of treating entities as text to be embedded by external APIs, it learns vector representations directly from your graph's topology. The algorithm performs random walks across your knowledge graph, treating the resulting paths as "sentences" that capture structural relationships. These walks train a Word2Vec model in-process, producing embeddings that encode how entities relate to each other.
+
````
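The walk-extraction step described above can be sketched in plain JavaScript. The README's example calls `extractRandomWalks(db)` without defining it. The version below is hypothetical: it takes an array of `[subject, predicate, object]` triples rather than a `GraphDB`, and it returns each walk as an array of alternating entity and predicate tokens, the "sentences" a Word2Vec trainer would consume. The input format that `Rdf2VecEngine.train` actually expects is not documented in this diff, and the Word2Vec step itself is out of scope here.

```javascript
// Hypothetical helper: the README calls extractRandomWalks(db) without defining it.
// This sketch takes plain [subject, predicate, object] triples instead of a GraphDB
// and returns each walk as an array of tokens: entity, predicate, entity, ...

// mulberry32: a tiny seeded PRNG so the same seed always yields the same walks.
function mulberry32(seed) {
  let a = seed >>> 0
  return function () {
    a = (a + 0x6d2b79f5) >>> 0
    let t = a
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

function extractRandomWalks(triples, { walksPerEntity = 2, depth = 4, seed = 42 } = {}) {
  // Adjacency list: subject -> outgoing [predicate, object] edges
  const out = new Map()
  for (const [s, p, o] of triples) {
    if (!out.has(s)) out.set(s, [])
    out.get(s).push([p, o])
  }
  const rand = mulberry32(seed)
  const walks = []
  for (const start of out.keys()) {
    for (let w = 0; w < walksPerEntity; w++) {
      const walk = [start]
      let current = start
      for (let hop = 0; hop < depth; hop++) {
        const edges = out.get(current)
        if (!edges) break // no outgoing edges: end this walk early
        const [p, o] = edges[Math.floor(rand() * edges.length)]
        walk.push(p, o)
        current = o
      }
      walks.push(walk)
    }
  }
  return walks
}

// Provider -> submits -> Claim -> involves -> Patient, as in the README comment
const sample = [
  ['ex:provider1', 'ex:submits', 'ex:claim1'],
  ['ex:claim1', 'ex:involves', 'ex:patient1'],
]
console.log(extractRandomWalks(sample, { walksPerEntity: 1 }))
```

With one walk per subject, this sample yields `ex:provider1 ex:submits ex:claim1 ex:involves ex:patient1` and `ex:claim1 ex:involves ex:patient1`. `walksPerEntity` and `depth` correspond to the walk count and walk length that the throughput table describes as configurable.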
````diff
+```javascript
+const { GraphDB, Rdf2VecEngine } = require('rust-kgdb')
+
+// Load your knowledge graph
+const db = new GraphDB('http://enterprise/claims')
+db.loadTtl(claimsOntology, null) // 130,923 triples/sec throughput
+
+// Initialize the RDF2Vec engine
+const rdf2vec = new Rdf2VecEngine()
+
+// Train embeddings from graph structure
+// Walks capture: Provider → submits → Claim → involves → Patient
+const walks = extractRandomWalks(db)
+rdf2vec.train(JSON.stringify(walks)) // 1,207 walks/sec → 384-dim vectors
+
+// Retrieve embeddings with microsecond latency
+const embedding = rdf2vec.getEmbedding('http://claims/provider/4521') // 68 µs
+
+// Find structurally similar entities
+const similar = rdf2vec.findSimilar(provider, candidateProviders, 10) // 303 µs
+```
+
````
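The README's `findSimilar(provider, candidateProviders, 10)` call relies on names it never defines, and the engine's internals are not shown. As a hedged illustration of what a k-nearest-neighbour query over embeddings computes, here is an exact brute-force cosine top-k in plain JavaScript. The function name `topKSimilar` and the `{ id, vector }` candidate shape are assumptions. The HNSW index listed in the capability table approximates this same ranking without scoring every candidate.

```javascript
// Cosine similarity of two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0 // zero vector: treat as no similarity
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Exact (brute-force) top-k: score every candidate, sort best-first, keep k.
// Hypothetical candidate shape: { id, vector }.
function topKSimilar(query, candidates, k) {
  return candidates
    .map(({ id, vector }) => ({ id, score: cosine(query, vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
}

const candidates = [
  { id: 'provider/1', vector: [1, 0, 0] },
  { id: 'provider/2', vector: [0, 1, 0] },
  { id: 'provider/3', vector: [1, 1, 0] },
]
console.log(topKSimilar([1, 0, 0], candidates, 2))
```

Here `provider/1` scores 1 and `provider/3` scores about 0.707, so those two are returned. Brute force costs one dot product per candidate, which is why an approximate index becomes worthwhile as the candidate pool grows.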
````diff
+### Performance: Why Microseconds Matter
+
+| Operation | rust-kgdb (RDF2Vec) | External API (OpenAI) | Advantage |
+|-----------|---------------------|----------------------|-----------|
+| **Single Embedding Lookup** | 68 µs | 200-500 ms | **3,000-7,000x faster** |
+| **Similarity Search (k=10)** | 303 µs | 300-800 ms | **1,000-2,600x faster** |
+| **Batch Training (1K walks)** | 829 ms | N/A | Graph-native training |
+| **Rate Limits** | None (in-process) | Quota-restricted | Unlimited throughput |
+
+**Practical Impact**: When investigating a flagged claim, an analyst might check 50 similar providers. At 300ms per API call, that's 15 seconds of waiting. With RDF2Vec at 303µs per lookup, the same operation completes in 15 milliseconds—a 1,000x improvement that transforms the user experience from "waiting for AI" to "instant insight."
+
````
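The Practical Impact figures are sequential arithmetic: number of lookups times per-call latency. The check below assumes the 50 calls run one after another with no batching or parallelism; the README does not say either way. At exactly 300 ms versus 303 µs the ratio comes out near 990x, which the README rounds to 1,000x.

```javascript
// Wall time for `calls` lookups issued one after another (no batching, no parallelism).
function sequentialMs(calls, perCallMs) {
  return calls * perCallMs
}

const calls = 50
const apiMs = sequentialMs(calls, 300)     // 300 ms per external API call
const localMs = sequentialMs(calls, 0.303) // 303 µs per in-process lookup

console.log(`API: ${apiMs / 1000} s, in-process: ${localMs.toFixed(2)} ms`)
console.log(`speedup: ~${Math.round(apiMs / localMs)}x`)
```

This prints `API: 15 s, in-process: 15.15 ms` followed by `speedup: ~990x`.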
````diff
+### Multi-Vector Composite Embeddings with RRF
+
+Real-world similarity often requires multiple perspectives. A claim's structural relationships (RDF2Vec) tell a different story than its textual description (OpenAI) or domain-specific features (custom model). The `EmbeddingService` supports composite embeddings with Reciprocal Rank Fusion (RRF) to combine these views:
+
+```javascript
+const service = new EmbeddingService()
+
+// Store embeddings from multiple sources
+service.storeComposite('CLM-2024-0847', JSON.stringify({
+  rdf2vec: rdf2vec.getEmbedding('CLM-2024-0847'), // Graph structure
+  openai: await openaiEmbed(claimNarrative), // Semantic content
+  domain: fraudRiskEmbedding // Domain-specific signals
+}))
+
+// RRF fusion combines rankings from each source
+// Formula: Score = Σ(1 / (k + rank_i)), k=60
+const similar = service.findSimilarComposite('CLM-2024-0847', 10, 0.7, 'rrf')
+```
+
````
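The formula in the README comment, Score = Σ 1/(k + rank_i) with k = 60, is standard Reciprocal Rank Fusion. The sketch below fuses plain ranked lists of IDs so the mechanics are visible. It assumes 1-based ranks and ignores the `0.7` argument of `findSimilarComposite`, whose meaning the README does not specify. It is an illustration of RRF, not the `EmbeddingService` implementation, and the claim IDs are made up.

```javascript
// Reciprocal Rank Fusion over several ranked lists of ids.
// Each list is ordered best-first; rank is 1-based, so the top item adds 1/(k+1).
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map()
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1
      scores.set(id, (scores.get(id) || 0) + 1 / (k + rank))
    })
  }
  // Items missing from a list simply receive no contribution from it.
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
}

const fused = rrfFuse([
  ['CLM-A', 'CLM-B', 'CLM-C'], // e.g. ranking from structural (RDF2Vec) vectors
  ['CLM-B', 'CLM-C', 'CLM-A'], // e.g. ranking from text embeddings
  ['CLM-B', 'CLM-A'],          // e.g. ranking from a domain model
])
console.log(fused.map((r) => r.id)) // [ 'CLM-B', 'CLM-A', 'CLM-C' ]
```

Because RRF uses only ranks, it never compares raw similarity scores across sources. That makes it robust when the RDF2Vec, OpenAI, and domain vectors live on different scales. The constant k dampens the advantage of the very top positions.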
````diff
+| Candidate Pool | Single-Source Recall | RRF Composite Recall | Improvement |
+|----------------|---------------------|---------------------|-------------|
+| 100 entities | 78% | **89%** | +14% |
+| 1,000 entities | 72% | **85%** | +18% |
+| 10,000 entities | 65% | **82%** | +26% |
+
````
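The Improvement column matches a relative gain over the single-source baseline, not a difference in percentage points. For example, 65% to 82% is 17 points, or about +26% relative. A quick check of all three rows:

```javascript
// Relative improvement of `after` over `before`, in percent.
function relativeGainPct(before, after) {
  return ((after - before) / before) * 100
}

for (const [before, after] of [[78, 89], [72, 85], [65, 82]]) {
  const points = after - before
  const rel = relativeGainPct(before, after).toFixed(0)
  console.log(`${before}% -> ${after}%: +${points} pts, +${rel}% relative`)
}
```

The three lines report +11, +13 and +17 points, which correspond to the table's +14%, +18% and +26%.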
````diff
+### Distributed Cluster Benchmarks (Kubernetes)
+
+For deployments exceeding single-node capacity, rust-kgdb supports distributed execution across Kubernetes clusters. Verified benchmarks on the LUBM academic dataset:
+
+| Query | Pattern | Results | Latency |
+|-------|---------|---------|---------|
+| Q1 | Type lookup (GraduateStudent) | 150 | **66 ms** |
+| Q4 | Join (student → advisor) | 150 | **101 ms** |
+| Q6 | 2-hop join (advisor → department) | 46 | **75 ms** |
+| Q7 | Course enrollment scan | 570 | **141 ms** |
+
+**Configuration**: 1 coordinator + 3 executors, HDRF partitioning, NodePort access at `localhost:30080`. Triples distribute automatically across executors; multi-hop joins execute seamlessly across partition boundaries.
+
````
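The Configuration line says triples are spread across executors with HDRF (High-Degree Replicated First), a streaming vertex-cut partitioner. Each edge goes to the partition with the highest score. The score has two parts. A replication term favors partitions that already hold the edge's endpoints, especially the lower-degree endpoint, so high-degree hubs are the vertices that get replicated. A balance term favors lighter partitions. The sketch below follows that published scoring, treats each triple as a subject-object edge, ignores predicates, and uses λ = 1 and ε = 1. rust-kgdb's actual partitioner, its parameters, and its handling of predicates are not documented here, so this is an illustration only.

```javascript
// Streaming HDRF edge partitioning (vertex-cut). Each triple is treated as an
// edge between its subject and object; predicates are ignored in this sketch.
function hdrfPartition(triples, numPartitions, { lambda = 1, epsilon = 1 } = {}) {
  const degree = new Map()   // partial degree seen so far, per vertex
  const replicas = new Map() // vertex -> Set of partitions holding a copy
  const sizes = new Array(numPartitions).fill(0)
  const assignment = []

  const bump = (v) => degree.set(v, (degree.get(v) || 0) + 1)
  const has = (v, p) => replicas.has(v) && replicas.get(v).has(p)

  for (const [s, , o] of triples) {
    bump(s)
    bump(o)
    const thetaS = degree.get(s) / (degree.get(s) + degree.get(o)) // normalized degree
    const thetaO = 1 - thetaS
    const maxSize = Math.max(...sizes)
    const minSize = Math.min(...sizes)

    let best = 0
    let bestScore = -Infinity
    for (let p = 0; p < numPartitions; p++) {
      // Replication term: reward partitions already holding an endpoint,
      // more so for the lower-degree endpoint.
      const gS = has(s, p) ? 1 + (1 - thetaS) : 0
      const gO = has(o, p) ? 1 + (1 - thetaO) : 0
      // Balance term: reward less-loaded partitions.
      const bal = (lambda * (maxSize - sizes[p])) / (epsilon + maxSize - minSize)
      const score = gS + gO + bal
      if (score > bestScore) { // strict '>' breaks ties toward the lowest index
        bestScore = score
        best = p
      }
    }
    assignment.push(best)
    sizes[best]++
    for (const v of [s, o]) {
      if (!replicas.has(v)) replicas.set(v, new Set())
      replicas.get(v).add(best)
    }
  }
  return { assignment, sizes }
}

const demo = [['a', 'p', 'b'], ['c', 'p', 'd'], ['e', 'p', 'f'], ['a', 'p', 'c']]
console.log(hdrfPartition(demo, 3).assignment) // [ 0, 1, 2, 0 ]
```

The first three disjoint edges spread out because of the balance term. The fourth edge, `a`-`c`, joins a partition that already holds one of its endpoints. `lambda` scales the balance term, so larger values trade extra replication for more even partition sizes.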
````diff
+### End-to-End Pipeline Throughput
+
+| Stage | Throughput | Notes |
+|-------|------------|-------|
+| Graph ingestion | **130,923 triples/sec** | Bulk load with indexing |
+| RDF2Vec training | **1,207 walks/sec** | Configurable walk length/count |
+| Embedding lookup | **68 µs** (14,700/sec) | In-memory, zero network |
+| Similarity search | **303 µs** (3,300/sec) | HNSW index |
+| Incremental update | **37 µs** | No full retrain required |
+
+*For detailed configuration options, see [Walk Configuration](#walk-configuration-tuning-rdf2vec-performance) and [Auto-Embedding Triggers](#auto-embedding-triggers-automatic-on-graph-insertupdate) below.*
+
````
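The per-second figures in this table follow from the per-operation latencies: 1 s divided by 68 µs is about 14,700 lookups/sec. The same arithmetic gives a back-of-envelope pipeline estimate, but only under a loud assumption: that the measured rates stay constant at other graph sizes, which the README does not claim. The workload sizes below (1M triples, 10K walks) are hypothetical.

```javascript
// Convert a per-operation latency in microseconds to operations per second.
function opsPerSecond(latencyMicros) {
  return 1e6 / latencyMicros
}

// Seconds needed to process `count` items at a fixed rate (items per second).
function secondsAt(count, ratePerSec) {
  return count / ratePerSec
}

console.log(Math.round(opsPerSecond(68)))  // 14706, the table's "14,700/sec"
console.log(Math.round(opsPerSecond(303))) // 3300
console.log(Math.round(secondsAt(1000, 1207) * 1000)) // 829 ms for 1K walks

// Hypothetical workload, assuming the rates scale linearly:
const ingestSec = secondsAt(1000000, 130923) // 1M triples
const trainSec = secondsAt(10000, 1207)      // 10K walks
console.log(ingestSec.toFixed(1), trainSec.toFixed(1)) // 7.6 8.3
```

The 829 ms result matches the "Batch Training (1K walks)" row in the performance table, so the two tables are consistent with each other.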
````diff
+---
+
 ## The Deeper Problem: AI Agents Forget
 
 Fixing SPARQL syntax is table stakes. Here's what keeps enterprise architects up at night:
````
package/package.json (changed)

````diff
@@ -1,6 +1,6 @@
 {
   "name": "rust-kgdb",
-  "version": "0.6.
+  "version": "0.6.80",
   "description": "High-performance RDF/SPARQL database with AI agent framework. GraphDB (449ns lookups, 35x faster than RDFox), GraphFrames analytics (PageRank, motifs), Datalog reasoning, HNSW vector embeddings. HyperMindAgent for schema-aware query generation with audit trails. W3C SPARQL 1.1 compliant. Native performance via Rust + NAPI-RS.",
   "main": "index.js",
   "types": "index.d.ts",
````