rust-kgdb 0.1.9 → 0.1.11
- package/README.md +426 -268
- package/package.json +7 -7
package/README.md
CHANGED
|
@@ -1,237 +1,469 @@
|
|
|
1
|
-
# rust-kgdb
|
|
1
|
+
# rust-kgdb
|
|
2
2
|
|
|
3
3
|
[](https://www.npmjs.com/package/rust-kgdb)
|
|
4
4
|
[](https://opensource.org/licenses/Apache-2.0)
|
|
5
5
|
|
|
6
|
-
**Production-ready
|
|
6
|
+
**Production-ready RDF/hypergraph database with 100% W3C SPARQL 1.1 + RDF 1.2 compliance, worst-case optimal joins (WCOJ), and pluggable storage backends.**
|
|
7
7
|
|
|
8
|
-
|
|
8
|
+
---
|
|
9
9
|
|
|
10
|
-
|
|
11
|
-
- **100% W3C RDF 1.2 Compliance** - Full standard implementation
|
|
12
|
-
- **WCOJ Execution** (v0.1.8) - LeapFrog TrieJoin for optimal multi-way joins
|
|
13
|
-
- **Zero-Copy Semantics** - Minimal allocations, maximum performance
|
|
14
|
-
- **Blazing Fast** - 2.78 µs triple lookups, 146K triples/sec bulk insert
|
|
15
|
-
- **Memory Efficient** - 24 bytes/triple (25% better than RDFox)
|
|
16
|
-
- **Native Rust** - Safe, reliable, production-ready
|
|
10
|
+
## Why rust-kgdb?
|
|
17
11
|
|
|
18
|
-
|
|
12
|
+
| Feature | rust-kgdb | Apache Jena | RDFox |
|
|
13
|
+
|---------|-----------|-------------|-------|
|
|
14
|
+
| **Lookup Speed** | 2.78 µs | ~50 µs | 50-100 µs |
|
|
15
|
+
| **Memory/Triple** | 24 bytes | 50-60 bytes | 32 bytes |
|
|
16
|
+
| **SPARQL 1.1** | 100% | 100% | 95% |
|
|
17
|
+
| **RDF 1.2** | 100% | Partial | No |
|
|
18
|
+
| **WCOJ** | ✅ LeapFrog | ❌ | ❌ |
|
|
19
|
+
| **Mobile-Ready** | ✅ iOS/Android | ❌ | ❌ |
|
|
19
20
|
|
|
20
|
-
|
|
21
|
+
---
|
|
21
22
|
|
|
22
|
-
|
|
23
|
-
|------------|---------------------|--------------|------------------|
|
|
24
|
-
| **Star Queries** (3+ patterns) | O(n³) | O(n log n) | **50-100x** |
|
|
25
|
-
| **Complex Joins** (4+ patterns) | O(n⁴) | O(n log n) | **100-1000x** |
|
|
26
|
-
| **Chain Queries** | O(n²) | O(n log n) | **10-20x** |
|
|
23
|
+
## Core Technical Innovations
|
|
27
24
|
|
|
28
|
-
###
|
|
25
|
+
### 1. Worst-Case Optimal Joins (WCOJ)
|
|
29
26
|
|
|
30
|
-
|
|
31
|
-
|--------|--------|------|----------|
|
|
32
|
-
| **Lookup** | 2.78 µs | 359K/sec | ✅ **35-180x faster** |
|
|
33
|
-
| **Bulk Insert** | 682 ms (100K) | 146K/sec | ⚠️ 73% speed (gap closing) |
|
|
34
|
-
| **Memory** | 24 bytes/triple | - | ✅ **25% better** |
|
|
27
|
+
Traditional databases use **nested-loop joins** with O(n²) to O(n⁴) complexity. rust-kgdb implements the **LeapFrog TrieJoin** algorithm—a worst-case optimal join that achieves O(n log n) for multi-way joins.
|
|
35
28
|
|
|
36
|
-
|
|
29
|
+
**How it works:**
|
|
30
|
+
- **Trie Data Structure**: Triples indexed hierarchically (S→P→O) using BTreeMap for sorted access
|
|
31
|
+
- **Variable Ordering**: Frequency-based analysis orders variables for optimal intersection
|
|
32
|
+
- **LeapFrog Iterator**: Binary search across sorted iterators finds intersections without materializing intermediate results
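The seek-and-intersect idea in the list above can be sketched in a few lines. This is a minimal illustration, assuming sorted, duplicate-free arrays of integer dictionary IDs; the names `seek` and `leapfrogIntersect` are hypothetical and this is not rust-kgdb's internal code.

```typescript
// Leapfrog-style intersection of sorted, duplicate-free ID lists.
// Illustrative sketch only; names and shapes are assumptions.

/** Smallest index i >= from such that list[i] >= target (binary search). */
function seek(list: number[], target: number, from: number): number {
  let lo = from
  let hi = list.length
  while (lo < hi) {
    const mid = (lo + hi) >>> 1
    if (list[mid] < target) lo = mid + 1
    else hi = mid
  }
  return lo
}

/** IDs present in every list, found without building intermediate results. */
function leapfrogIntersect(lists: number[][]): number[] {
  const out: number[] = []
  if (lists.length === 0) return out
  for (const l of lists) if (l.length === 0) return out
  const pos: number[] = lists.map(() => 0)
  // Start from the largest first element: nothing smaller can match.
  let candidate = Math.max(...lists.map((l) => l[0]))
  for (;;) {
    let agreed = true
    for (let i = 0; i < lists.length; i++) {
      pos[i] = seek(lists[i], candidate, pos[i])
      if (pos[i] === lists[i].length) return out // one list is exhausted
      if (lists[i][pos[i]] !== candidate) {
        candidate = lists[i][pos[i]] // overshoot: raise the target and retry
        agreed = false
      }
    }
    if (agreed) {
      out.push(candidate)
      candidate += 1 // integer IDs: continue just past the match
    }
  }
}
```

For example, `leapfrogIntersect([[1, 3, 4, 7, 9], [3, 4, 5, 9], [0, 3, 9, 12]])` returns `[3, 9]`: each list only ever seeks forward, so values that cannot match are skipped rather than enumerated.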
|
|
37
33
|
|
|
38
|
-
|
|
34
|
+
```
|
|
35
|
+
Query: SELECT ?x ?y ?z WHERE { ?x :p ?y . ?y :q ?z . ?x :r ?z }
|
|
39
36
|
|
|
40
|
-
|
|
41
|
-
|
|
42
|
-
|
|
43
|
-
| **Q2: 5-way star** | 234ms | **183ms** | ✅ **22% faster** | Strong |
|
|
44
|
-
| **Q3: 3-way star** | 177ms | **62ms** | 🔥 **65% faster** | Exceptional |
|
|
45
|
-
| **Q4: 3-hop chain** | 254ms | **101ms** | 🔥 **60% faster** | Exceptional |
|
|
46
|
-
| **Q5: 2-hop chain** | 230ms | **53ms** | 🔥 **77% faster** | **BEST** |
|
|
47
|
-
| **Q6: 6-way complex** | 641ms | **464ms** | ✅ **28% faster** | Good |
|
|
48
|
-
| **Q7: Hierarchy** | 343ms | **198ms** | ✅ **42% faster** | Strong |
|
|
49
|
-
| **Q8: Triangle** | 410ms | **193ms** | ✅ **53% faster** | Strong |
|
|
37
|
+
Nested Loop: O(n³) - examines every combination
|
|
38
|
+
WCOJ: O(n log n) - iterates in sorted order, seeks forward on mismatch
|
|
39
|
+
```
|
|
50
40
|
|
|
51
|
-
|
|
52
|
-
|
|
53
|
-
-
|
|
54
|
-
|
|
55
|
-
|
|
41
|
+
| Query Pattern | Before (Nested Loop) | After (WCOJ) | Speedup |
|
|
42
|
+
|---------------|---------------------|--------------|---------|
|
|
43
|
+
| 3-way star | O(n³) | O(n log n) | **50-100x** |
|
|
44
|
+
| 4+ way complex | O(n⁴) | O(n log n) | **100-1000x** |
|
|
45
|
+
| Chain queries | O(n²) | O(n log n) | **10-20x** |
|
|
56
46
|
|
|
57
|
-
|
|
58
|
-
1. **Instrumentation Build**: Compiler adds profiling hooks
|
|
59
|
-
2. **Profile Collection**: Run real workload (23 runtime profiles collected)
|
|
60
|
-
3. **Profile Merging**: Combine profiles into 5.9M merged dataset
|
|
61
|
-
4. **Optimized Rebuild**: Compiler uses runtime data for:
|
|
62
|
-
- Optimized hot paths (loops, function calls)
|
|
63
|
-
- Improved branch prediction
|
|
64
|
-
- Enhanced instruction cache locality
|
|
65
|
-
- Better CPU pipelining
|
|
47
|
+
### 2. Sparse Matrix Engine (CSR Format)
|
|
66
48
|
|
|
67
|
-
**
|
|
49
|
+
Binary relations (e.g., `foaf:knows`, `rdfs:subClassOf`) are converted to **Compressed Sparse Row (CSR)** matrices for cache-efficient join evaluation:
|
|
68
50
|
|
|
69
|
-
|
|
51
|
+
- **Memory**: O(nnz) where nnz = number of edges (not O(n²))
|
|
52
|
+
- **Matrix Multiplication**: Replaces nested-loop joins
|
|
53
|
+
- **Transitive Closure**: Semi-naive Δ-matrix evaluation (not iterated powers)
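To make the CSR layout concrete, here is a small sketch that builds the row-pointer and column-index arrays from an edge list and then walks them. It assumes nodes are numbered `0..nodeCount-1`; `buildCsr`, `neighbours`, and `twoHop` are hypothetical names used only for illustration.

```typescript
// Minimal CSR (Compressed Sparse Row) sketch for one binary relation.
// Memory is one slot per edge plus one slot per node: O(nnz + n).

interface Csr {
  rowPtr: number[] // rowPtr[i]..rowPtr[i+1] delimits the edges leaving node i
  colIdx: number[] // target node of each edge, grouped by source node
}

function zeros(n: number): number[] {
  const a: number[] = []
  for (let i = 0; i < n; i++) a.push(0)
  return a
}

function buildCsr(nodeCount: number, edges: Array<[number, number]>): Csr {
  const rowPtr = zeros(nodeCount + 1)
  for (const [src] of edges) rowPtr[src + 1] += 1 // out-degree per node
  for (let i = 0; i < nodeCount; i++) rowPtr[i + 1] += rowPtr[i] // prefix sums
  const colIdx = zeros(edges.length)
  const next = rowPtr.slice(0, nodeCount) // next free slot in each row
  for (const [src, dst] of edges) {
    colIdx[next[src]] = dst
    next[src] += 1
  }
  return { rowPtr, colIdx }
}

/** Outgoing neighbours of a node: one contiguous slice, no hashing. */
function neighbours(m: Csr, node: number): number[] {
  return m.colIdx.slice(m.rowPtr[node], m.rowPtr[node + 1])
}

/** Two-hop join (one row of the matrix product M x M), deduplicated and sorted. */
function twoHop(m: Csr, node: number): number[] {
  const seen: Record<number, boolean> = {}
  const out: number[] = []
  for (const mid of neighbours(m, node)) {
    for (const dst of neighbours(m, mid)) {
      if (!seen[dst]) {
        seen[dst] = true
        out.push(dst)
      }
    }
  }
  return out.sort((a, b) => a - b)
}
```

With edges `[[0, 1], [0, 2], [1, 2], [3, 0]]` over four nodes, `rowPtr` is `[0, 2, 3, 3, 4]` and `colIdx` is `[1, 2, 2, 0]`, so the neighbours of node 0 sit next to each other in memory.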
|
|
70
54
|
|
|
71
|
-
```
|
|
72
|
-
|
|
55
|
+
```rust
|
|
56
|
+
// Traditional: O(n²) nested loops
|
|
57
|
+
for (s, p, o) in triples { /* inner loop over all candidate matches */ }
|
|
58
|
+
|
|
59
|
+
// CSR Matrix: O(nnz) cache-friendly iteration
|
|
60
|
+
for j in row_ptr[i]..row_ptr[i + 1] { let (col, val) = (col_indices[j], values[j]); /* edge i → col */ }
|
|
73
61
|
```
|
|
74
62
|
|
|
75
|
-
|
|
63
|
+
**Used for**: RDFS/OWL reasoning, transitive closure, Datalog evaluation.
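Semi-naive evaluation, mentioned above for transitive closure and Datalog, can be illustrated as follows. This is a sketch under simple assumptions (integer node IDs, a plain edge list instead of a CSR matrix); `transitiveClosure` is a hypothetical name, not an API of the package.

```typescript
// Semi-naive transitive closure: each round joins only the facts discovered
// in the previous round (the delta) with the base edges, instead of
// re-joining everything that is already known.

type Edge = [number, number]

function transitiveClosure(edges: Edge[]): Edge[] {
  const known: Record<string, boolean> = {}
  const succ: Record<number, number[]> = {} // adjacency of the base relation
  let delta: Edge[] = []

  for (const [a, b] of edges) {
    if (!succ[a]) succ[a] = []
    succ[a].push(b)
    const key = a + '->' + b
    if (!known[key]) {
      known[key] = true
      delta.push([a, b])
    }
  }

  const all: Edge[] = delta.slice()
  while (delta.length > 0) {
    const next: Edge[] = []
    for (const [a, b] of delta) {
      // Extend each newly found path a ~> b by one base edge b -> c.
      for (const c of succ[b] || []) {
        const key = a + '->' + c
        if (!known[key]) {
          known[key] = true
          next.push([a, c])
        }
      }
    }
    for (const e of next) all.push(e)
    delta = next // only the new facts feed the next round
  }
  return all
}
```

For the chain `1 -> 2 -> 3 -> 4` the first round adds `(1,3)` and `(2,4)`, the second adds `(1,4)`, and the third finds nothing new, so the loop stops with six pairs in total.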
|
|
64
|
+
|
|
65
|
+
### 3. SIMD + PGO Compiler Optimizations
|
|
66
|
+
|
|
67
|
+
**Zero code changes—pure compiler-level performance gains.**
|
|
68
|
+
|
|
69
|
+
| Optimization | Technology | Effect |
|
|
70
|
+
|--------------|------------|--------|
|
|
71
|
+
| **SIMD Vectorization** | AVX2/BMI2 (Intel), NEON (ARM) | 8-wide parallel operations |
|
|
72
|
+
| **Profile-Guided Optimization** | LLVM PGO | Hot path optimization, branch prediction |
|
|
73
|
+
| **Link-Time Optimization** | LTO (fat) | Cross-crate inlining, dead code elimination |
|
|
74
|
+
|
|
75
|
+
**Benchmark Results (LUBM, Intel Skylake):**
|
|
76
|
+
|
|
77
|
+
| Query | Before | After (SIMD+PGO) | Improvement |
|
|
78
|
+
|-------|--------|------------------|-------------|
|
|
79
|
+
| Q5: 2-hop chain | 230ms | 53ms | **77% faster** |
|
|
80
|
+
| Q3: 3-way star | 177ms | 62ms | **65% faster** |
|
|
81
|
+
| Q4: 3-hop chain | 254ms | 101ms | **60% faster** |
|
|
82
|
+
| Q8: Triangle | 410ms | 193ms | **53% faster** |
|
|
83
|
+
| Q7: Hierarchy | 343ms | 198ms | **42% faster** |
|
|
84
|
+
| Q6: 6-way complex | 641ms | 464ms | **28% faster** |
|
|
85
|
+
| Q2: 5-way star | 234ms | 183ms | **22% faster** |
|
|
86
|
+
| Q1: 4-way star | 283ms | 258ms | **9% faster** |
|
|
87
|
+
|
|
88
|
+
**Average improvement: 44.5%** across all eight queries (mean reduction in query time).
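The percentages in the table can be reproduced from the before/after timings. The sketch below treats "N% faster" as the reduction in run time, `(before - after) / before`, which is the reading that matches the published numbers; the helper names are illustrative.

```typescript
// Recompute the per-query improvements and their mean from the table above.

const timingsMs: Array<[string, number, number]> = [
  ['Q1', 283, 258],
  ['Q2', 234, 183],
  ['Q3', 177, 62],
  ['Q4', 254, 101],
  ['Q5', 230, 53],
  ['Q6', 641, 464],
  ['Q7', 343, 198],
  ['Q8', 410, 193],
]

/** Reduction in run time, as a whole-number percentage. */
function reductionPercent(before: number, after: number): number {
  return Math.round(((before - after) / before) * 100)
}

/** Speedup as a ratio: how many times faster the optimized build runs. */
function speedupFactor(before: number, after: number): number {
  return before / after
}

const reductions = timingsMs.map(([, before, after]) => reductionPercent(before, after))
const meanReduction = reductions.reduce((sum, r) => sum + r, 0) / reductions.length
```

Here `reductions` evaluates to `[9, 22, 65, 60, 77, 28, 42, 53]` and `meanReduction` to `44.5`. Expressed as a ratio instead, Q5 goes from 230 ms to 53 ms, which is roughly a 4.3x speedup.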
|
|
89
|
+
|
|
90
|
+
### 4. Quad Indexing (SPOC)
|
|
91
|
+
|
|
92
|
+
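Four complementary indexes keep pattern matching independent of total store size for the common query shapes: a fully bound lookup is O(1), and a partially bound pattern scans only the k matching entries of the best-fitting index: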
|
|
93
|
+
|
|
94
|
+
| Index | Pattern | Use Case |
|
|
95
|
+
|-------|---------|----------|
|
|
96
|
+
| **SPOC** | `(?s, ?p, ?o, ?c)` | Subject-centric queries |
|
|
97
|
+
| **POCS** | `(?p, ?o, ?c, ?s)` | Property enumeration |
|
|
98
|
+
| **OCSP** | `(?o, ?c, ?s, ?p)` | Object lookups (reverse links) |
|
|
99
|
+
| **CSPO** | `(?c, ?s, ?p, ?o)` | Named graph iteration |
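One way to read the table: for a given pattern, pick the index whose key order starts with the longest run of bound positions, so the bound terms form a lookup prefix. The selection rule below is an assumption made for illustration (the source does not describe how rust-kgdb chooses an index), and `chooseIndex` is a hypothetical name.

```typescript
// Pick the quad index whose key order has the longest prefix of bound terms.
// s = subject, p = predicate, o = object, c = context (named graph).

type Position = 's' | 'p' | 'o' | 'c'

const INDEX_ORDERS: Record<string, Position[]> = {
  SPOC: ['s', 'p', 'o', 'c'],
  POCS: ['p', 'o', 'c', 's'],
  OCSP: ['o', 'c', 's', 'p'],
  CSPO: ['c', 's', 'p', 'o'],
}

/** Number of leading key components that are bound in the pattern. */
function boundPrefixLength(order: Position[], bound: Position[]): number {
  let n = 0
  while (n < order.length && bound.indexOf(order[n]) !== -1) n++
  return n
}

/** Name of the index with the longest bound prefix (first one wins ties). */
function chooseIndex(bound: Position[]): string {
  let best = 'SPOC'
  let bestLen = -1
  for (const name of Object.keys(INDEX_ORDERS)) {
    const len = boundPrefixLength(INDEX_ORDERS[name], bound)
    if (len > bestLen) {
      best = name
      bestLen = len
    }
  }
  return best
}
```

Under this rule a pattern with predicate and object bound, such as `?s foaf:knows ex:bob`, maps to POCS, an object-only lookup maps to OCSP, and a pattern restricted only by named graph maps to CSPO.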
|
|
100
|
+
|
|
101
|
+
---
|
|
102
|
+
|
|
103
|
+
## Storage Backends
|
|
104
|
+
|
|
105
|
+
rust-kgdb uses a pluggable storage architecture. **Default is in-memory** (zero configuration). For persistence, enable RocksDB.
|
|
106
|
+
|
|
107
|
+
| Backend | Feature Flag | Use Case | Status |
|
|
108
|
+
|---------|--------------|----------|--------|
|
|
109
|
+
| **InMemory** | `default` | Development, testing, embedded | ✅ **Production Ready** |
|
|
110
|
+
| **RocksDB** | `rocksdb-backend` | Production, large datasets | ✅ **61 tests passing** |
|
|
111
|
+
| **LMDB** | `lmdb-backend` | Read-heavy workloads | ⏳ Planned v0.2.0 |
|
|
112
|
+
|
|
113
|
+
### InMemory (Default)
|
|
114
|
+
|
|
115
|
+
Zero configuration, maximum performance. Data is volatile (lost on process exit).
|
|
116
|
+
|
|
117
|
+
**High-Performance Data Structures:**
|
|
118
|
+
|
|
119
|
+
| Component | Structure | Why |
|
|
120
|
+
|-----------|-----------|-----|
|
|
121
|
+
| **Triple Store** | `DashMap` | Sharded concurrent hash map, 100K pre-allocation |
|
|
122
|
+
| **WCOJ Trie** | `BTreeMap` | Sorted iteration for LeapFrog intersection |
|
|
123
|
+
| **Dictionary** | `FxHashSet` | String interning with rustc-optimized hashing |
|
|
124
|
+
| **Hypergraph** | `FxHashMap` | Fast node→edge adjacency lists |
|
|
125
|
+
| **Reasoning** | `AHashMap` | RDFS/OWL inference with DoS-resistant hashing |
|
|
126
|
+
| **Datalog** | `FxHashMap` | Semi-naive evaluation with delta propagation |
|
|
127
|
+
|
|
128
|
+
**Why these structures enable microsecond-scale lookups:**
|
|
129
|
+
- **DashMap**: Sharded locks (16 shards default) → near-linear scaling on multi-core
|
|
130
|
+
- **FxHashMap**: Rust compiler's hash function → 30% faster than std HashMap
|
|
131
|
+
- **BTreeMap**: O(log n) seeks with ordered iteration → enables the seek step in LeapFrog
|
|
132
|
+
- **Pre-allocation**: 100K capacity avoids rehashing during bulk inserts
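The dictionary row in the table relies on string interning: every IRI or literal is stored once and referred to by a compact integer ID. The following is a minimal sketch of that idea; `TermDictionary` and its methods are hypothetical names, and the real implementation uses Rust hash sets rather than a JavaScript object.

```typescript
// String interning: map each distinct term to a small integer ID, once.

class TermDictionary {
  private ids: Record<string, number> = {}
  private terms: string[] = []

  /** Return the existing ID for a term, or assign the next free one. */
  intern(term: string): number {
    // The prefix keeps keys from clashing with built-in object properties.
    const key = 't:' + term
    const existing = this.ids[key]
    if (existing !== undefined) return existing
    const id = this.terms.length
    this.ids[key] = id
    this.terms.push(term)
    return id
  }

  /** Reverse lookup: ID back to the original string. */
  lookup(id: number): string | undefined {
    return this.terms[id]
  }

  get size(): number {
    return this.terms.length
  }
}

const dict = new TermDictionary()
const alice = dict.intern('http://example.org/alice')
const knows = dict.intern('http://xmlns.com/foaf/0.1/knows')
const aliceAgain = dict.intern('http://example.org/alice') // same ID as `alice`
```

Because repeated terms resolve to the same ID, a triple can be stored as three fixed-width integers no matter how long its IRIs are.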
|
|
133
|
+
|
|
134
|
+
```rust
|
|
135
|
+
use storage::{QuadStore, InMemoryBackend};
|
|
136
|
+
|
|
137
|
+
let store = QuadStore::new(InMemoryBackend::new());
|
|
138
|
+
// Ultra-fast: 2.78 µs lookups, zero disk I/O
|
|
139
|
+
```
|
|
140
|
+
|
|
141
|
+
### RocksDB (Persistent)
|
|
142
|
+
|
|
143
|
+
LSM-tree based storage with ACID transactions. Tested with **61 comprehensive tests**.
|
|
144
|
+
|
|
145
|
+
```toml
|
|
146
|
+
# Cargo.toml - Enable RocksDB backend
|
|
147
|
+
[dependencies]
|
|
148
|
+
storage = { version = "0.1.10", features = ["rocksdb-backend"] }
|
|
149
|
+
```
|
|
150
|
+
|
|
151
|
+
```rust
|
|
152
|
+
use storage::{QuadStore, RocksDbBackend};
|
|
153
|
+
|
|
154
|
+
// Create persistent database
|
|
155
|
+
let backend = RocksDbBackend::new("/path/to/data")?;
|
|
156
|
+
let store = QuadStore::new(backend);
|
|
157
|
+
|
|
158
|
+
// Features:
|
|
159
|
+
// - ACID transactions
|
|
160
|
+
// - Snappy compression (automatic)
|
|
161
|
+
// - Crash recovery
|
|
162
|
+
// - Range & prefix scanning
|
|
163
|
+
// - 1MB+ value support
|
|
164
|
+
|
|
165
|
+
// Force sync to disk
|
|
166
|
+
store.flush()?;
|
|
167
|
+
```
|
|
168
|
+
|
|
169
|
+
**RocksDB Test Coverage:**
|
|
170
|
+
- Basic CRUD operations (14 tests)
|
|
171
|
+
- Range scanning (8 tests)
|
|
172
|
+
- Prefix scanning (6 tests)
|
|
173
|
+
- Batch operations (8 tests)
|
|
174
|
+
- Transactions (8 tests)
|
|
175
|
+
- Concurrent access (5 tests)
|
|
176
|
+
- Unicode & binary data (4 tests)
|
|
177
|
+
- Large key/value handling (8 tests)
|
|
178
|
+
|
|
179
|
+
### TypeScript SDK
|
|
180
|
+
|
|
181
|
+
The npm package uses the in-memory backend—ideal for:
|
|
182
|
+
- Knowledge graph queries
|
|
183
|
+
- SPARQL execution
|
|
184
|
+
- Data transformation pipelines
|
|
185
|
+
- Embedded applications
|
|
186
|
+
|
|
187
|
+
```typescript
|
|
188
|
+
import { GraphDB } from 'rust-kgdb'
import * as fs from 'fs'
|
|
76
189
|
|
|
77
|
-
-
|
|
78
|
-
|
|
190
|
+
// In-memory database (default, no configuration needed)
|
|
191
|
+
const db = new GraphDB('http://example.org/app')
|
|
192
|
+
|
|
193
|
+
// For persistence, export via CONSTRUCT:
|
|
194
|
+
const ntriples = db.queryConstruct('CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }')
|
|
195
|
+
fs.writeFileSync('backup.nt', ntriples)
|
|
196
|
+
```
|
|
197
|
+
|
|
198
|
+
---
|
|
199
|
+
|
|
200
|
+
## Installation
|
|
201
|
+
|
|
202
|
+
```bash
|
|
203
|
+
npm install rust-kgdb
|
|
204
|
+
```
|
|
79
205
|
|
|
80
206
|
### Platform Support
|
|
81
207
|
|
|
82
|
-
| Platform | Architecture | Status |
|
|
83
|
-
|
|
84
|
-
| **macOS** |
|
|
85
|
-
| **macOS** |
|
|
86
|
-
| **Linux** | x64 | ✅
|
|
87
|
-
| **Linux** | arm64 | ✅
|
|
88
|
-
| **Windows** | x64 | ✅
|
|
89
|
-
| **Windows** | arm64 | ⏳
|
|
208
|
+
| Platform | Architecture | Status | SIMD |
|
|
209
|
+
|----------|-------------|--------|------|
|
|
210
|
+
| **macOS** | Intel (x64) | ✅ | AVX2, BMI2, POPCNT |
|
|
211
|
+
| **macOS** | Apple Silicon (arm64) | ✅ | NEON |
|
|
212
|
+
| **Linux** | x64 | ✅ | AVX2, BMI2, POPCNT |
|
|
213
|
+
| **Linux** | arm64 | ✅ | NEON |
|
|
214
|
+
| **Windows** | x64 | ✅ | AVX2, BMI2, POPCNT |
|
|
215
|
+
| **Windows** | arm64 | ⏳ v0.2.0 | — |
|
|
90
216
|
|
|
91
|
-
**
|
|
92
|
-
|
|
93
|
-
|
|
94
|
-
- **Profile-Guided Optimization (PGO)**: Runtime profile-based code generation
|
|
217
|
+
**No compilation required**—pre-built native binaries included.
|
|
218
|
+
|
|
219
|
+
---
|
|
95
220
|
|
|
96
|
-
|
|
221
|
+
## Quick Start
|
|
97
222
|
|
|
98
|
-
|
|
223
|
+
### Complete Working Example
|
|
99
224
|
|
|
100
225
|
```typescript
|
|
101
|
-
import { GraphDB
|
|
226
|
+
import { GraphDB } from 'rust-kgdb'
|
|
102
227
|
|
|
103
|
-
// Create
|
|
104
|
-
const db = new GraphDB('http://example.org/
|
|
228
|
+
// 1. Create database
|
|
229
|
+
const db = new GraphDB('http://example.org/myapp')
|
|
105
230
|
|
|
106
|
-
//
|
|
231
|
+
// 2. Load data (Turtle format)
|
|
107
232
|
db.loadTtl(`
|
|
108
233
|
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
|
|
234
|
+
@prefix ex: <http://example.org/> .
|
|
109
235
|
|
|
110
|
-
|
|
111
|
-
|
|
112
|
-
|
|
236
|
+
ex:alice a foaf:Person ;
|
|
237
|
+
foaf:name "Alice" ;
|
|
238
|
+
foaf:age 30 ;
|
|
239
|
+
foaf:knows ex:bob, ex:charlie .
|
|
113
240
|
|
|
114
|
-
|
|
115
|
-
|
|
241
|
+
ex:bob a foaf:Person ;
|
|
242
|
+
foaf:name "Bob" ;
|
|
243
|
+
foaf:age 25 ;
|
|
244
|
+
foaf:knows ex:charlie .
|
|
245
|
+
|
|
246
|
+
ex:charlie a foaf:Person ;
|
|
247
|
+
foaf:name "Charlie" ;
|
|
248
|
+
foaf:age 35 .
|
|
116
249
|
`, null)
|
|
117
250
|
|
|
118
|
-
//
|
|
119
|
-
const
|
|
251
|
+
// 3. Query: Find friends-of-friends (WCOJ optimized!)
|
|
252
|
+
const fof = db.querySelect(`
|
|
120
253
|
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
|
|
254
|
+
PREFIX ex: <http://example.org/>
|
|
121
255
|
|
|
122
|
-
SELECT ?person ?
|
|
123
|
-
?person foaf:
|
|
124
|
-
|
|
256
|
+
SELECT ?person ?friend ?fof WHERE {
|
|
257
|
+
?person foaf:knows ?friend .
|
|
258
|
+
?friend foaf:knows ?fof .
|
|
259
|
+
FILTER(?person != ?fof)
|
|
125
260
|
}
|
|
126
|
-
ORDER BY DESC(?age)
|
|
127
261
|
`)
|
|
262
|
+
console.log('Friends of Friends:', fof)
|
|
263
|
+
// [{ person: 'ex:alice', friend: 'ex:bob', fof: 'ex:charlie' }]
|
|
264
|
+
|
|
265
|
+
// 4. Aggregation: Average age
|
|
266
|
+
const stats = db.querySelect(`
|
|
267
|
+
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
|
|
128
268
|
|
|
129
|
-
|
|
130
|
-
|
|
131
|
-
|
|
132
|
-
|
|
133
|
-
|
|
269
|
+
SELECT (COUNT(?p) AS ?count) (AVG(?age) AS ?avgAge) WHERE {
|
|
270
|
+
?p a foaf:Person ; foaf:age ?age .
|
|
271
|
+
}
|
|
272
|
+
`)
|
|
273
|
+
console.log('Stats:', stats)
|
|
274
|
+
// [{ count: '3', avgAge: '30.0' }]
|
|
134
275
|
|
|
135
|
-
//
|
|
276
|
+
// 5. ASK query
|
|
136
277
|
const hasAlice = db.queryAsk(`
|
|
137
|
-
|
|
278
|
+
PREFIX ex: <http://example.org/>
|
|
279
|
+
ASK { ex:alice a <http://xmlns.com/foaf/0.1/Person> }
|
|
138
280
|
`)
|
|
139
|
-
console.log(hasAlice)
|
|
281
|
+
console.log('Has Alice?', hasAlice) // true
|
|
282
|
+
|
|
283
|
+
// 6. CONSTRUCT query
|
|
284
|
+
const graph = db.queryConstruct(`
|
|
285
|
+
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
|
|
286
|
+
PREFIX ex: <http://example.org/>
|
|
287
|
+
|
|
288
|
+
CONSTRUCT { ?p foaf:knows ?f }
|
|
289
|
+
WHERE { ?p foaf:knows ?f }
|
|
290
|
+
`)
|
|
291
|
+
console.log('Extracted graph:', graph)
|
|
292
|
+
|
|
293
|
+
// 7. Count and cleanup
|
|
294
|
+
console.log('Triple count:', db.count()) // 12
|
|
295
|
+
db.clear()
|
|
296
|
+
```
|
|
297
|
+
|
|
298
|
+
### Save to File
|
|
299
|
+
|
|
300
|
+
```typescript
|
|
301
|
+
import { writeFileSync } from 'fs'
import { GraphDB } from 'rust-kgdb'
|
|
302
|
+
|
|
303
|
+
// Save as N-Triples
|
|
304
|
+
const db = new GraphDB('http://example.org/export')
|
|
305
|
+
db.loadTtl(`<http://example.org/s> <http://example.org/p> "value" .`, null)
|
|
140
306
|
|
|
141
|
-
|
|
142
|
-
|
|
307
|
+
const ntriples = db.queryConstruct(`CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }`)
|
|
308
|
+
writeFileSync('output.nt', ntriples)
|
|
143
309
|
```
|
|
144
310
|
|
|
145
|
-
|
|
311
|
+
---
|
|
146
312
|
|
|
147
|
-
|
|
313
|
+
## SPARQL 1.1 Features (100% W3C Compliant)
|
|
314
|
+
|
|
315
|
+
### Query Forms
|
|
148
316
|
|
|
149
317
|
```typescript
|
|
150
|
-
//
|
|
151
|
-
|
|
152
|
-
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
|
|
318
|
+
// SELECT - return bindings
|
|
319
|
+
db.querySelect('SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10')
|
|
153
320
|
|
|
154
|
-
|
|
155
|
-
|
|
156
|
-
|
|
157
|
-
|
|
158
|
-
|
|
321
|
+
// ASK - boolean existence check
|
|
322
|
+
db.queryAsk('ASK { <http://example.org/x> ?p ?o }')
|
|
323
|
+
|
|
324
|
+
// CONSTRUCT - build new graph
|
|
325
|
+
db.queryConstruct('CONSTRUCT { ?s <http://new/prop> ?o } WHERE { ?s ?p ?o }')
|
|
326
|
+
```
|
|
327
|
+
|
|
328
|
+
### Aggregates
|
|
329
|
+
|
|
330
|
+
```typescript
|
|
331
|
+
db.querySelect(`
|
|
332
|
+
SELECT ?type (COUNT(*) AS ?count) (AVG(?value) AS ?avg)
|
|
333
|
+
WHERE { ?s a ?type ; <http://ex/value> ?value }
|
|
334
|
+
GROUP BY ?type
|
|
335
|
+
HAVING (COUNT(*) > 5)
|
|
336
|
+
ORDER BY DESC(?count)
|
|
159
337
|
`)
|
|
338
|
+
```
|
|
339
|
+
|
|
340
|
+
### Property Paths
|
|
160
341
|
|
|
161
|
-
|
|
162
|
-
//
|
|
342
|
+
```typescript
|
|
343
|
+
// Transitive closure (rdfs:subClassOf*)
|
|
344
|
+
db.querySelect('PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> SELECT ?class WHERE { ?class rdfs:subClassOf* <http://top/Class> }')
|
|
345
|
+
|
|
346
|
+
// Alternative paths
|
|
347
|
+
db.querySelect('PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> SELECT ?name WHERE { ?x (foaf:name|rdfs:label) ?name }')
|
|
348
|
+
|
|
349
|
+
// Sequence paths
|
|
350
|
+
db.querySelect('PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?grandparent WHERE { ?x foaf:parent/foaf:parent ?grandparent }')
|
|
163
351
|
```
|
|
164
352
|
|
|
165
|
-
###
|
|
353
|
+
### Named Graphs
|
|
166
354
|
|
|
167
355
|
```typescript
|
|
168
|
-
//
|
|
169
|
-
|
|
170
|
-
|
|
171
|
-
|
|
172
|
-
|
|
173
|
-
|
|
174
|
-
|
|
175
|
-
?person1 org:name ?name1 .
|
|
176
|
-
?person2 org:name ?name2 .
|
|
177
|
-
FILTER(?person1 != ?person2)
|
|
356
|
+
// Load into named graph
|
|
357
|
+
db.loadTtl('<http://s> <http://p> "o" .', 'http://example.org/graph1')
|
|
358
|
+
|
|
359
|
+
// Query specific graph
|
|
360
|
+
db.querySelect(`
|
|
361
|
+
SELECT ?s ?p ?o WHERE {
|
|
362
|
+
GRAPH <http://example.org/graph1> { ?s ?p ?o }
|
|
178
363
|
}
|
|
179
364
|
`)
|
|
365
|
+
```
|
|
366
|
+
|
|
367
|
+
### UPDATE Operations
|
|
180
368
|
|
|
181
|
-
|
|
182
|
-
//
|
|
369
|
+
```typescript
|
|
370
|
+
// INSERT DATA
|
|
371
|
+
db.updateInsert(`
|
|
372
|
+
INSERT DATA { <http://ex/new> <http://ex/prop> "value" }
|
|
373
|
+
`)
|
|
374
|
+
|
|
375
|
+
// DELETE WHERE
|
|
376
|
+
db.updateDelete(`
|
|
377
|
+
DELETE WHERE { ?s <http://ex/deprecated> ?o }
|
|
378
|
+
`)
|
|
183
379
|
```
|
|
184
380
|
|
|
185
|
-
|
|
381
|
+
---
|
|
382
|
+
|
|
383
|
+
## Sample Application
|
|
384
|
+
|
|
385
|
+
### Knowledge Graph Demo
|
|
386
|
+
|
|
387
|
+
A complete, production-ready sample application demonstrating enterprise knowledge graph capabilities is available in the repository.
|
|
388
|
+
|
|
389
|
+
**Location**: [`examples/knowledge-graph-demo/`](../../examples/knowledge-graph-demo/)
|
|
390
|
+
|
|
391
|
+
**Features Demonstrated**:
|
|
392
|
+
- Complete organizational knowledge graph (employees, departments, projects, skills)
|
|
393
|
+
- SPARQL SELECT queries with star and chain patterns (WCOJ-optimized)
|
|
394
|
+
- Aggregations (COUNT, AVG, GROUP BY, HAVING)
|
|
395
|
+
- Property paths for transitive closure (organizational hierarchy)
|
|
396
|
+
- SPARQL ASK and CONSTRUCT queries
|
|
397
|
+
- Named graphs for multi-tenant data isolation
|
|
398
|
+
- Data export to Turtle format
|
|
399
|
+
|
|
400
|
+
**Run the Demo**:
|
|
401
|
+
|
|
402
|
+
```bash
|
|
403
|
+
cd examples/knowledge-graph-demo
|
|
404
|
+
npm install
|
|
405
|
+
npm start
|
|
406
|
+
```
|
|
407
|
+
|
|
408
|
+
**Sample Output**:
|
|
409
|
+
|
|
410
|
+
The demo creates a realistic knowledge graph with:
|
|
411
|
+
- 5 employees across 4 departments
|
|
412
|
+
- 13 technical and soft skills
|
|
413
|
+
- 2 software projects
|
|
414
|
+
- Reporting hierarchies and salary data
|
|
415
|
+
- Named graph for sensitive compensation data
|
|
416
|
+
|
|
417
|
+
**Example Query from Demo** (finds all direct and indirect reports):
|
|
186
418
|
|
|
187
419
|
```typescript
|
|
188
|
-
|
|
189
|
-
|
|
420
|
+
const pathQuery = `
|
|
421
|
+
PREFIX ex: <http://example.org/>
|
|
190
422
|
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
|
|
191
423
|
|
|
192
|
-
SELECT ?
|
|
193
|
-
?
|
|
194
|
-
?
|
|
424
|
+
SELECT ?employee ?name WHERE {
|
|
425
|
+
?employee ex:reportsTo+ ex:alice . # Transitive closure
|
|
426
|
+
?employee foaf:name ?name .
|
|
195
427
|
}
|
|
196
|
-
|
|
197
|
-
|
|
198
|
-
|
|
199
|
-
// Expected speedup: 10-20x over nested loop
|
|
428
|
+
ORDER BY ?name
|
|
429
|
+
`
|
|
430
|
+
const results = db.querySelect(pathQuery)
|
|
200
431
|
```
|
|
201
432
|
|
|
202
|
-
|
|
433
|
+
**Learn More**: See the [demo README](../../examples/knowledge-graph-demo/README.md) for full documentation, query examples, and how to customize the knowledge graph.
|
|
434
|
+
|
|
435
|
+
---
|
|
436
|
+
|
|
437
|
+
## API Reference
|
|
203
438
|
|
|
204
439
|
### GraphDB Class
|
|
205
440
|
|
|
206
441
|
```typescript
|
|
207
442
|
class GraphDB {
|
|
208
|
-
// Create
|
|
209
|
-
static inMemory(): GraphDB
|
|
210
|
-
constructor(baseUri: string)
|
|
443
|
+
constructor(baseUri: string) // Create with base URI
|
|
444
|
+
static inMemory(): GraphDB // Create anonymous in-memory DB
|
|
211
445
|
|
|
212
|
-
// Data
|
|
213
|
-
loadTtl(data: string,
|
|
214
|
-
loadNTriples(data: string,
|
|
446
|
+
// Data Loading
|
|
447
|
+
loadTtl(data: string, graph: string | null): void
|
|
448
|
+
loadNTriples(data: string, graph: string | null): void
|
|
215
449
|
|
|
216
|
-
// SPARQL
|
|
450
|
+
// SPARQL Queries (WCOJ-optimized)
|
|
217
451
|
querySelect(sparql: string): Array<Record<string, string>>
|
|
218
452
|
queryAsk(sparql: string): boolean
|
|
219
|
-
queryConstruct(sparql: string): string
|
|
453
|
+
queryConstruct(sparql: string): string // Returns N-Triples
|
|
220
454
|
|
|
221
|
-
// SPARQL
|
|
455
|
+
// SPARQL Updates
|
|
222
456
|
updateInsert(sparql: string): void
|
|
223
457
|
updateDelete(sparql: string): void
|
|
224
458
|
|
|
225
|
-
// Database
|
|
459
|
+
// Database Operations
|
|
226
460
|
count(): number
|
|
227
461
|
clear(): void
|
|
228
|
-
|
|
229
|
-
// Metadata
|
|
230
462
|
getVersion(): string
|
|
231
463
|
}
|
|
232
464
|
```
|
|
233
465
|
|
|
234
|
-
### Node Class
|
|
466
|
+
### Node Class
|
|
235
467
|
|
|
236
468
|
```typescript
|
|
237
469
|
class Node {
|
|
@@ -245,165 +477,91 @@ class Node {
|
|
|
245
477
|
}
|
|
246
478
|
```
|
|
247
479
|
|
|
248
|
-
|
|
480
|
+
---
|
|
249
481
|
|
|
250
|
-
|
|
482
|
+
## Performance Characteristics
|
|
251
483
|
|
|
252
|
-
|
|
253
|
-
// INSERT DATA
|
|
254
|
-
db.updateInsert(`
|
|
255
|
-
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
|
|
484
|
+
### Complexity Analysis
|
|
256
485
|
|
|
257
|
-
|
|
258
|
-
|
|
259
|
-
|
|
260
|
-
|
|
261
|
-
|
|
486
|
+
| Operation | Complexity | Notes |
|
|
487
|
+
|-----------|------------|-------|
|
|
488
|
+
| Triple lookup | O(1) | Hash-based SPOC index |
|
|
489
|
+
| Pattern scan | O(k) | k = matching triples |
|
|
490
|
+
| Star join (WCOJ) | O(n log n) | LeapFrog intersection |
|
|
491
|
+
| Complex join (WCOJ) | O(n log n) | Trie-based |
|
|
492
|
+
| Transitive closure | O(n²) worst | CSR matrix optimization |
|
|
493
|
+
| Bulk insert | O(n) | Batch indexing |
|
|
262
494
|
|
|
263
|
-
|
|
264
|
-
db.updateDelete(`
|
|
265
|
-
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
|
|
495
|
+
### Memory Layout
|
|
266
496
|
|
|
267
|
-
DELETE WHERE {
|
|
268
|
-
?person foaf:age ?age .
|
|
269
|
-
FILTER(?age < 18)
|
|
270
|
-
}
|
|
271
|
-
`)
|
|
272
497
|
```
|
|
273
|
-
|
|
274
|
-
|
|
275
|
-
|
|
276
|
-
|
|
277
|
-
|
|
278
|
-
|
|
279
|
-
|
|
280
|
-
|
|
281
|
-
|
|
282
|
-
// Query specific graph
|
|
283
|
-
const results = db.querySelect(`
|
|
284
|
-
SELECT ?s ?p ?o WHERE {
|
|
285
|
-
GRAPH <http://example.org/graph1> {
|
|
286
|
-
?s ?p ?o .
|
|
287
|
-
}
|
|
288
|
-
}
|
|
289
|
-
`)
|
|
290
|
-
```
|
|
291
|
-
|
|
292
|
-
### SPARQL 1.1 Aggregates
|
|
293
|
-
|
|
294
|
-
```typescript
|
|
295
|
-
// COUNT, AVG, MIN, MAX, SUM
|
|
296
|
-
const aggregates = db.querySelect(`
|
|
297
|
-
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
|
|
298
|
-
|
|
299
|
-
SELECT
|
|
300
|
-
(COUNT(?person) AS ?count)
|
|
301
|
-
(AVG(?age) AS ?avgAge)
|
|
302
|
-
(MIN(?age) AS ?minAge)
|
|
303
|
-
(MAX(?age) AS ?maxAge)
|
|
304
|
-
WHERE {
|
|
305
|
-
?person foaf:age ?age .
|
|
306
|
-
}
|
|
307
|
-
`)
|
|
498
|
+
Triple: 24 bytes
|
|
499
|
+
├── Subject: 8 bytes (dictionary ID)
|
|
500
|
+
├── Predicate: 8 bytes (dictionary ID)
|
|
501
|
+
└── Object: 8 bytes (dictionary ID)
|
|
502
|
+
|
|
503
|
+
String Interning: All URIs/literals stored once in Dictionary
|
|
504
|
+
Index Overhead: ~4x base triple size (4 indexes)
|
|
505
|
+
Total: ~120 bytes/triple including indexes
|
|
308
506
|
```
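The figures in the layout above follow from simple arithmetic: three 8-byte dictionary IDs give 24 bytes, and four index copies on top of the base triple give roughly 120 bytes. The sketch below packs a triple into a 24-byte buffer and reproduces that estimate; the big-endian encoding and the function names are assumptions for illustration, not the on-disk or in-memory format of rust-kgdb.

```typescript
// Fixed-width triple encoding: three 8-byte dictionary IDs = 24 bytes.

const ID_BYTES = 8
const TRIPLE_BYTES = 3 * ID_BYTES // 24
const TWO_POW_32 = 4294967296

/** Pack subject, predicate and object IDs (safe integers) into 24 bytes. */
function encodeTriple(s: number, p: number, o: number): ArrayBuffer {
  const buf = new ArrayBuffer(TRIPLE_BYTES)
  const view = new DataView(buf)
  const ids = [s, p, o]
  for (let i = 0; i < ids.length; i++) {
    view.setUint32(i * ID_BYTES, Math.floor(ids[i] / TWO_POW_32)) // high 32 bits
    view.setUint32(i * ID_BYTES + 4, ids[i] % TWO_POW_32) // low 32 bits
  }
  return buf
}

/** Inverse of encodeTriple. */
function decodeTriple(buf: ArrayBuffer): [number, number, number] {
  const view = new DataView(buf)
  const read = (i: number): number =>
    view.getUint32(i * ID_BYTES) * TWO_POW_32 + view.getUint32(i * ID_BYTES + 4)
  return [read(0), read(1), read(2)]
}

/** Rough footprint: base triple plus one 24-byte entry per index. */
function estimatedBytes(tripleCount: number, indexCount: number): number {
  return tripleCount * TRIPLE_BYTES * (1 + indexCount)
}
```

With four indexes, `estimatedBytes(1, 4)` is 120, matching the total quoted above, and one million triples come to about 120 MB before dictionary storage is counted.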
|
|
309
507
|
|
|
310
|
-
|
|
311
|
-
|
|
312
|
-
```typescript
|
|
313
|
-
// Transitive closure with *
|
|
314
|
-
const transitiveKnows = db.querySelect(`
|
|
315
|
-
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
|
|
316
|
-
|
|
317
|
-
SELECT ?person ?connected WHERE {
|
|
318
|
-
<http://example.org/alice> foaf:knows* ?connected .
|
|
319
|
-
}
|
|
320
|
-
`)
|
|
321
|
-
|
|
322
|
-
// Alternative paths with |
|
|
323
|
-
const nameOrLabel = db.querySelect(`
|
|
324
|
-
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
|
|
325
|
-
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
|
|
326
|
-
|
|
327
|
-
SELECT ?resource ?name WHERE {
|
|
328
|
-
?resource (foaf:name|rdfs:label) ?name .
|
|
329
|
-
}
|
|
330
|
-
`)
|
|
331
|
-
```
|
|
508
|
+
---
|
|
332
509
|
|
|
333
|
-
##
|
|
510
|
+
## Version History
|
|
334
511
|
|
|
335
|
-
|
|
336
|
-
- **Bindings**: NAPI-RS for native Node.js addon
|
|
337
|
-
- **Storage**: Pluggable backends (InMemory, RocksDB, LMDB)
|
|
338
|
-
- **Indexing**: SPOC, POCS, OCSP, CSPO quad indexes
|
|
339
|
-
- **Query Optimizer**: Automatic WCOJ detection and execution
|
|
340
|
-
- **WCOJ Engine**: LeapFrog TrieJoin with variable ordering analysis
|
|
512
|
+
### v0.1.9 (2025-12-01) - SIMD + PGO Release
|
|
341
513
|
|
|
342
|
-
|
|
514
|
+
- **44.5% average speedup** via SIMD + PGO compiler optimizations
|
|
515
|
+
- WCOJ execution with LeapFrog TrieJoin
|
|
516
|
+
- Release automation infrastructure
|
|
517
|
+
- All packages updated to gonnect-uk namespace
|
|
343
518
|
|
|
344
|
-
### v0.1.8 (2025-12-01) - WCOJ Execution
|
|
519
|
+
### v0.1.8 (2025-12-01) - WCOJ Execution
|
|
345
520
|
|
|
346
|
-
-
|
|
347
|
-
-
|
|
348
|
-
-
|
|
349
|
-
- ✅ **100-1000x Speedup** for complex joins (4+ patterns)
|
|
350
|
-
- ✅ **577 Tests Passing** - Comprehensive end-to-end verification
|
|
351
|
-
- ✅ **Zero Regressions** - All existing queries work unchanged
|
|
521
|
+
- WCOJ execution path activated
|
|
522
|
+
- Variable ordering analysis for optimal joins
|
|
523
|
+
- 577 tests passing
|
|
352
524
|
|
|
353
525
|
### v0.1.7 (2025-11-30)
|
|
354
526
|
|
|
355
527
|
- Query optimizer with automatic strategy selection
|
|
356
528
|
- WCOJ algorithm integration (planning phase)
|
|
357
|
-
- Query plan visualization API
|
|
358
529
|
|
|
359
530
|
### v0.1.3 (2025-11-18)
|
|
360
531
|
|
|
361
|
-
- Initial TypeScript SDK
|
|
532
|
+
- Initial TypeScript SDK
|
|
362
533
|
- 100% W3C SPARQL 1.1 compliance
|
|
363
534
|
- 100% W3C RDF 1.2 compliance
|
|
364
535
|
|
|
365
|
-
|
|
366
|
-
|
|
367
|
-
```bash
|
|
368
|
-
# Run test suite
|
|
369
|
-
npm test
|
|
370
|
-
|
|
371
|
-
# Run specific tests
|
|
372
|
-
npm test -- --testNamePattern="star query"
|
|
373
|
-
```
|
|
374
|
-
|
|
375
|
-
## 🤝 Contributing
|
|
536
|
+
---
|
|
376
537
|
|
|
377
|
-
|
|
538
|
+
## Use Cases
|
|
378
539
|
|
|
379
|
-
|
|
540
|
+
| Domain | Application |
|
|
541
|
+
|--------|-------------|
|
|
542
|
+
| **Knowledge Graphs** | Enterprise ontologies, taxonomies |
|
|
543
|
+
| **Semantic Search** | Structured queries over unstructured data |
|
|
544
|
+
| **Data Integration** | ETL with SPARQL CONSTRUCT |
|
|
545
|
+
| **Compliance** | SHACL validation, provenance tracking |
|
|
546
|
+
| **Graph Analytics** | Pattern detection, community analysis |
|
|
547
|
+
| **Mobile Apps** | Embedded RDF on iOS/Android |
|
|
380
548
|
|
|
381
|
-
|
|
549
|
+
---
|
|
382
550
|
|
|
383
|
-
##
|
|
551
|
+
## Links
|
|
384
552
|
|
|
385
553
|
- [GitHub Repository](https://github.com/gonnect-uk/rust-kgdb)
|
|
386
554
|
- [Documentation](https://github.com/gonnect-uk/rust-kgdb/tree/main/docs)
|
|
387
555
|
- [CHANGELOG](https://github.com/gonnect-uk/rust-kgdb/blob/main/CHANGELOG.md)
|
|
388
|
-
- [W3C SPARQL 1.1
|
|
389
|
-
- [W3C RDF 1.2
|
|
556
|
+
- [W3C SPARQL 1.1](https://www.w3.org/TR/sparql11-query/)
|
|
557
|
+
- [W3C RDF 1.2](https://www.w3.org/TR/rdf12-concepts/)
|
|
390
558
|
|
|
391
|
-
|
|
392
|
-
|
|
393
|
-
- **Knowledge Graphs** - Build semantic data models
|
|
394
|
-
- **Semantic Search** - Query structured data with SPARQL
|
|
395
|
-
- **Data Integration** - Combine data from multiple sources
|
|
396
|
-
- **Ontology Reasoning** - RDFS and OWL inference
|
|
397
|
-
- **Graph Analytics** - Complex pattern matching with WCOJ
|
|
398
|
-
- **Mobile Apps** - Embedded RDF database for iOS/Android
|
|
559
|
+
---
|
|
399
560
|
|
|
400
|
-
##
|
|
561
|
+
## License
|
|
401
562
|
|
|
402
|
-
|
|
403
|
-
- [ ] v0.1.9: Manual SIMD vectorization for 2-4x additional speedup
|
|
404
|
-
- [ ] v0.2.0: Windows ARM64 support + distributed query execution
|
|
405
|
-
- [ ] v0.3.0: Graph analytics and reasoning engines
|
|
563
|
+
Apache License 2.0
|
|
406
564
|
|
|
407
565
|
---
|
|
408
566
|
|
|
409
|
-
**Built with
|
|
567
|
+
**Built with Rust + NAPI-RS**
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "rust-kgdb",
|
|
3
|
-
"version": "0.1.
|
|
3
|
+
"version": "0.1.11",
|
|
4
4
|
"description": "High-performance RDF/SPARQL database with 100% W3C compliance and WCOJ execution",
|
|
5
5
|
"main": "index.js",
|
|
6
6
|
"types": "index.d.ts",
|
|
@@ -21,7 +21,7 @@
|
|
|
21
21
|
"build:debug": "napi build --platform native/rust-kgdb-napi",
|
|
22
22
|
"prepublishOnly": "napi prepublish -t npm",
|
|
23
23
|
"test": "jest",
|
|
24
|
-
"version": "0.1.
|
|
24
|
+
"version": "0.1.11"
|
|
25
25
|
},
|
|
26
26
|
"keywords": [
|
|
27
27
|
"rdf",
|
|
@@ -56,10 +56,10 @@
|
|
|
56
56
|
"*.node"
|
|
57
57
|
],
|
|
58
58
|
"optionalDependencies": {
|
|
59
|
-
"rust-kgdb-win32-x64-msvc": "0.1.
|
|
60
|
-
"rust-kgdb-darwin-x64": "0.1.
|
|
61
|
-
"rust-kgdb-linux-x64-gnu": "0.1.
|
|
62
|
-
"rust-kgdb-darwin-arm64": "0.1.
|
|
63
|
-
"rust-kgdb-linux-arm64-gnu": "0.1.
|
|
59
|
+
"rust-kgdb-win32-x64-msvc": "0.1.11",
|
|
60
|
+
"rust-kgdb-darwin-x64": "0.1.11",
|
|
61
|
+
"rust-kgdb-linux-x64-gnu": "0.1.11",
|
|
62
|
+
"rust-kgdb-darwin-arm64": "0.1.11",
|
|
63
|
+
"rust-kgdb-linux-arm64-gnu": "0.1.11"
|
|
64
64
|
}
|
|
65
65
|
}
|