rdf-starbase 0.1.0__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- rdf_starbase/__init__.py +57 -0
- rdf_starbase/ai_grounding.py +728 -0
- rdf_starbase/compat/__init__.py +26 -0
- rdf_starbase/compat/rdflib.py +1104 -0
- rdf_starbase/formats/__init__.py +29 -0
- rdf_starbase/formats/jsonld.py +488 -0
- rdf_starbase/formats/ntriples.py +419 -0
- rdf_starbase/formats/rdfxml.py +434 -0
- rdf_starbase/formats/turtle.py +882 -0
- rdf_starbase/models.py +92 -0
- rdf_starbase/registry.py +540 -0
- rdf_starbase/repositories.py +407 -0
- rdf_starbase/repository_api.py +739 -0
- rdf_starbase/sparql/__init__.py +35 -0
- rdf_starbase/sparql/ast.py +910 -0
- rdf_starbase/sparql/executor.py +1925 -0
- rdf_starbase/sparql/parser.py +1716 -0
- rdf_starbase/storage/__init__.py +44 -0
- rdf_starbase/storage/executor.py +1914 -0
- rdf_starbase/storage/facts.py +850 -0
- rdf_starbase/storage/lsm.py +531 -0
- rdf_starbase/storage/persistence.py +338 -0
- rdf_starbase/storage/quoted_triples.py +292 -0
- rdf_starbase/storage/reasoner.py +1035 -0
- rdf_starbase/storage/terms.py +628 -0
- rdf_starbase/store.py +1049 -0
- rdf_starbase/store_legacy.py +748 -0
- rdf_starbase/web.py +568 -0
- rdf_starbase-0.1.0.dist-info/METADATA +706 -0
- rdf_starbase-0.1.0.dist-info/RECORD +31 -0
- rdf_starbase-0.1.0.dist-info/WHEEL +4 -0
@@ -0,0 +1,706 @@
Metadata-Version: 2.4
Name: rdf-starbase
Version: 0.1.0
Summary: A blazingly fast RDF-Star database powered by Polars
Project-URL: Homepage, https://github.com/ontus/rdf-starbase
Project-URL: Repository, https://github.com/ontus/rdf-starbase
Project-URL: Documentation, https://rdf-starbase.readthedocs.io
Author-email: Ontus <team@ontus.dev>
License: MIT
Keywords: knowledge-graph,polars,rdf,rdf-star,semantic-web
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Requires-Dist: polars>=0.20.0
Requires-Dist: pyarrow>=14.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: typing-extensions>=4.8.0
Provides-Extra: dev
Requires-Dist: black>=23.0.0; extra == 'dev'
Requires-Dist: ipython>=8.12.0; extra == 'dev'
Requires-Dist: jupyter>=1.0.0; extra == 'dev'
Requires-Dist: mypy>=1.7.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
Requires-Dist: pytest>=7.4.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: query
Requires-Dist: pyparsing>=3.1.0; extra == 'query'
Provides-Extra: viz
Requires-Dist: matplotlib>=3.8.0; extra == 'viz'
Requires-Dist: networkx>=3.2.0; extra == 'viz'
Provides-Extra: web
Requires-Dist: fastapi>=0.104.0; extra == 'web'
Requires-Dist: uvicorn>=0.24.0; extra == 'web'
Description-Content-Type: text/markdown

# RDF-StarBase

> **A blazingly fast RDF★ database with native provenance tracking**

[Python 3.10+](https://www.python.org/downloads/) · [MIT License](https://opensource.org/licenses/MIT)

RDF-StarBase is a native RDF★ platform for storing, querying, and visualizing **assertions about data** — not just data itself. Every triple carries full provenance: **who** said it, **when**, **how confident** they were, and **which process** generated it.

## Key Features

- **Blazingly Fast** — Built on [Polars](https://pola.rs/) with Rust-speed DataFrame operations
- **Native RDF-Star** — First-class support for quoted triples and statement metadata
- **Full Provenance** — Every assertion tracked with source, timestamp, confidence, and process
- **Competing Claims** — See ALL assertions, not just the "winning" one
- **SPARQL-Star** — Query with standard SPARQL syntax plus provenance extensions
- **Assertion Registry** — Track data sources, APIs, and mappings as first-class entities
- **REST API** — FastAPI-powered web interface with interactive docs
- **Graph Visualization** — React + D3.js frontend for exploring knowledge graphs
- **Parquet Persistence** — Efficient columnar storage for analytics workloads

## Why RDF-StarBase?

Traditional databases store **values**.
Traditional catalogs store **descriptions**.
**RDF-StarBase stores assertions about reality.**

When your CRM says `customer.age = 34` and your Data Lake says `customer.age = 36`, most systems silently overwrite one value with the other. RDF-StarBase **keeps both**, letting you:

- See competing claims side by side
- Filter by source, confidence, or recency
- Maintain full audit trails
- Let downstream systems choose which claim to trust

## Installation

```bash
pip install rdf-starbase
```

Or install from source:

```bash
git clone https://github.com/ontus/rdf-starbase.git
cd rdf-starbase
pip install -e ".[dev]"
```

## Quick Start

```python
from rdf_starbase import TripleStore, ProvenanceContext

# Create a store
store = TripleStore()

# Add triples with provenance
prov = ProvenanceContext(
    source="CRM_System",
    confidence=0.85,
    process="api_sync",
)

store.add_triple(
    "http://example.org/customer/123",
    "http://xmlns.com/foaf/0.1/name",
    "Alice Johnson",
    prov,
)

# Query with provenance filtering
results = store.get_triples(
    subject="http://example.org/customer/123",
    min_confidence=0.8,
)

# Detect competing claims
claims = store.get_competing_claims(
    subject="http://example.org/customer/123",
    predicate="http://example.org/age",
)
```

## 🔍 SPARQL-Star Queries

```python
from rdf_starbase import execute_sparql

# Standard SPARQL
results = execute_sparql(store, """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name WHERE {
        <http://example.org/customer/123> foaf:name ?name
    }
""")

# With provenance extensions
results = execute_sparql(store, """
    SELECT ?s ?p ?o WHERE {
        ?s ?p ?o .
        FILTER_CONFIDENCE(>= 0.9)
        FILTER_SOURCE("CRM_System")
    }
""")

# ASK queries
exists = execute_sparql(store, """
    ASK WHERE {
        <http://example.org/customer/123> <http://xmlns.com/foaf/0.1/name> ?name
    }
""")  # Returns: True
```

### Advanced Query Features

```python
# OPTIONAL - include data when available
results = execute_sparql(store, """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?person ?name ?email WHERE {
        ?person foaf:name ?name .
        OPTIONAL { ?person foaf:mbox ?email }
    }
""")

# UNION - combine multiple patterns
results = execute_sparql(store, """
    SELECT ?entity ?label WHERE {
        { ?entity rdfs:label ?label }
        UNION
        { ?entity foaf:name ?label }
    }
""")

# BIND - computed values
results = execute_sparql(store, """
    SELECT ?product ?price ?taxed WHERE {
        ?product ex:price ?price .
        BIND(?price * 1.1 AS ?taxed)
    }
""")

# Aggregates with GROUP BY (?source and ?confidence are bound from the
# store's provenance metadata, not from the triple pattern itself)
results = execute_sparql(store, """
    SELECT ?source (COUNT(*) AS ?count) (AVG(?confidence) AS ?avg_conf) WHERE {
        ?s ?p ?o .
    }
    GROUP BY ?source
    HAVING (COUNT(*) > 10)
""")

# CONSTRUCT - generate new triples
results = execute_sparql(store, """
    CONSTRUCT {
        ?person foaf:knows ?other .
    }
    WHERE {
        ?person ex:worksAt ?company .
        ?other ex:worksAt ?company .
        FILTER(?person != ?other)
    }
""")

# INSERT DATA - add new triples
execute_sparql(store, """
    INSERT DATA {
        <http://example.org/alice> foaf:name "Alice" .
        <http://example.org/alice> foaf:age 30 .
    }
""")

# DELETE DATA - remove specific triples
execute_sparql(store, """
    DELETE DATA {
        <http://example.org/alice> foaf:age 30 .
    }
""")

# DELETE WHERE - remove matching patterns
execute_sparql(store, """
    DELETE WHERE {
        <http://example.org/alice> foaf:knows ?anyone .
    }
""")

# DELETE/INSERT WHERE - update values atomically
execute_sparql(store, """
    DELETE { ?s ex:status "active" }
    INSERT { ?s ex:status "archived" }
    WHERE { ?s ex:status "active" }
""")

# Property paths - navigate graph relationships
results = execute_sparql(store, """
    SELECT ?ancestor WHERE {
        <http://example.org/alice> foaf:knows+ ?ancestor .  # One or more hops
    }
""")

results = execute_sparql(store, """
    SELECT ?connected WHERE {
        <http://example.org/alice> (foaf:knows|foaf:worksWith)* ?connected .  # Zero or more hops via knows OR worksWith
    }
""")

results = execute_sparql(store, """
    SELECT ?knower WHERE {
        ?knower ^foaf:knows <http://example.org/bob> .  # Inverse: who knows Bob?
    }
""")

# Time-travel queries - query historical state
results = execute_sparql(store, """
    SELECT ?s ?name WHERE {
        ?s foaf:name ?name .
    }
    AS OF "2025-01-15T00:00:00Z"
""")

# ASK with time-travel
existed = execute_sparql(store, """
    ASK WHERE {
        <http://example.org/alice> foaf:name ?name .
    }
    AS OF "2024-06-01"
""")  # Returns: True if Alice existed on that date
```

## 📊 Named Graph Management

RDF-StarBase supports named graphs (graph containers/clusters) with the full set of SPARQL 1.1 Update graph management operations:

```python
from rdf_starbase import execute_sparql

# CREATE GRAPH - create a new named graph
execute_sparql(store, """
    CREATE GRAPH <http://example.org/graphs/customers>
""")

# LOAD - load RDF data from a file into a graph
execute_sparql(store, """
    LOAD <file:///data/customers.ttl>
    INTO GRAPH <http://example.org/graphs/customers>
""")

# Or load from HTTP
execute_sparql(store, """
    LOAD <https://example.org/data/products.ttl>
    INTO GRAPH <http://example.org/graphs/products>
""")

# COPY - copy all triples from one graph to another
execute_sparql(store, """
    COPY GRAPH <http://example.org/graphs/customers>
    TO GRAPH <http://example.org/graphs/customers_backup>
""")

# MOVE - move triples (copy, then clear the source)
execute_sparql(store, """
    MOVE GRAPH <http://example.org/graphs/staging>
    TO GRAPH <http://example.org/graphs/production>
""")

# ADD - merge triples into another graph
execute_sparql(store, """
    ADD GRAPH <http://example.org/graphs/updates>
    TO GRAPH <http://example.org/graphs/main>
""")

# CLEAR - remove all triples from a graph (the graph itself remains)
execute_sparql(store, """
    CLEAR GRAPH <http://example.org/graphs/temp>
""")

# DROP - delete a graph and all its triples
execute_sparql(store, """
    DROP GRAPH <http://example.org/graphs/old_data>
""")

# Special graph targets
execute_sparql(store, "CLEAR DEFAULT")  # Clear the default graph
execute_sparql(store, "DROP NAMED")     # Drop all named graphs
execute_sparql(store, "CLEAR ALL")      # Clear everything

# SILENT mode - don't fail if the graph doesn't exist
execute_sparql(store, """
    DROP SILENT GRAPH <http://example.org/graphs/maybe_exists>
""")

# List all named graphs
graphs = store.list_graphs()
print(graphs)  # ['http://example.org/graphs/customers', 'http://example.org/graphs/products']
```

### Querying Named Graphs

```python
# FROM clause - restrict query to specific graph
results = execute_sparql(store, """
    SELECT ?customer ?name
    FROM <http://example.org/graphs/customers>
    WHERE {
        ?customer foaf:name ?name
    }
""")

# FROM with multiple graphs (union of datasets)
results = execute_sparql(store, """
    SELECT ?entity ?label
    FROM <http://example.org/graphs/customers>
    FROM <http://example.org/graphs/products>
    WHERE {
        ?entity rdfs:label ?label
    }
""")

# GRAPH pattern - query specific named graph in WHERE clause
results = execute_sparql(store, """
    SELECT ?customer ?name WHERE {
        GRAPH <http://example.org/graphs/customers> {
            ?customer foaf:name ?name
        }
    }
""")

# GRAPH with variable - discover which graph contains data
results = execute_sparql(store, """
    SELECT ?graph ?entity ?name WHERE {
        GRAPH ?graph {
            ?entity foaf:name ?name
        }
    }
""")

# Combined patterns - default graph + specific named graph
results = execute_sparql(store, """
    SELECT ?person ?friend ?friendName WHERE {
        ?person foaf:knows ?friend .
        GRAPH <http://example.org/graphs/profiles> {
            ?friend foaf:name ?friendName
        }
    }
""")

# FROM NAMED - specify available named graphs for GRAPH patterns
results = execute_sparql(store, """
    SELECT ?g ?s ?name
    FROM NAMED <http://example.org/graphs/customers>
    FROM NAMED <http://example.org/graphs/employees>
    WHERE {
        GRAPH ?g { ?s foaf:name ?name }
    }
""")
```

## ⭐ RDF-Star: Quoted Triples

RDF-Star allows you to make statements **about statements**:

```python
# The assertion "Alice knows Bob" is claimed by Wikipedia
store.add_quoted_triple(
    subject="<<http://example.org/alice http://xmlns.com/foaf/0.1/knows http://example.org/bob>>",
    predicate="http://example.org/assertedBy",
    obj="http://dbpedia.org/resource/Wikipedia",
    provenance=prov,
)
```

Query with SPARQL-Star:

```sparql
SELECT ?who WHERE {
    << ?person foaf:knows ?other >> ex:assertedBy ?who
}
```
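
The same query can be run through `execute_sparql`; a minimal sketch, assuming `ex:` is bound to `http://example.org/` to match the `assertedBy` predicate above:

```python
# Who asserted any "knows" relationship?
results = execute_sparql(store, """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX ex: <http://example.org/>
    SELECT ?who WHERE {
        << ?person foaf:knows ?other >> ex:assertedBy ?who
    }
""")
```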

## Competing Claims Detection

```python
customer = "http://example.org/customer/123"

# Multiple systems report different ages
crm_prov = ProvenanceContext(source="CRM", confidence=0.85)
lake_prov = ProvenanceContext(source="DataLake", confidence=0.92)

store.add_triple(customer, "http://example.org/age", 34, crm_prov)
store.add_triple(customer, "http://example.org/age", 36, lake_prov)

# See all competing values
claims = store.get_competing_claims(customer, "http://example.org/age")
print(claims)
# shape: (2, 4)
# ┌────────┬──────────┬────────────┬─────────────────────┐
# │ object │ source   │ confidence │ timestamp           │
# ├────────┼──────────┼────────────┼─────────────────────┤
# │ 36     │ DataLake │ 0.92       │ 2026-01-16 03:00:00 │
# │ 34     │ CRM      │ 0.85       │ 2026-01-16 02:00:00 │
# └────────┴──────────┴────────────┴─────────────────────┘
```
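
Because `get_competing_claims` returns a Polars DataFrame (the `shape: (2, 4)` output above), a trust policy is an ordinary DataFrame operation. A minimal sketch of one such policy, picking the highest-confidence claim:

```python
# A policy choice for downstream consumers: trust the most confident source.
best = claims.sort("confidence", descending=True).head(1)
print(best["object"][0], best["source"][0])  # 36 DataLake
```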

## Persistence

```python
# Save to Parquet (columnar, fast, compressible)
store.save("knowledge_graph.parquet")

# Load back
loaded_store = TripleStore.load("knowledge_graph.parquet")
```

## Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                             RDF-StarBase                            │
├─────────────────────────────────────────────────────────────────────┤
│  React + D3.js Frontend      │  REST API (FastAPI)                  │
├──────────────────────────────┼──────────────────────────────────────┤
│  SPARQL-Star Parser  │  Query Executor  │  Assertion Registry       │
├─────────────────────────────────────────────────────────────────────┤
│                  Triple Store (Polars DataFrames)                   │
├─────────────────────────────────────────────────────────────────────┤
│  Parquet I/O  │  Provenance Tracking  │  Competing Claims Detection │
└─────────────────────────────────────────────────────────────────────┘
```

**Core Stack:**

- **Polars** — Rust-powered DataFrames for blazing performance
- **FastAPI** — Modern async REST API framework
- **pyparsing** — SPARQL-Star parser
- **Pydantic** — Data model validation
- **D3.js** — Graph visualization
- **PyArrow** — Parquet persistence

## Performance

RDF-StarBase leverages Polars' Rust backend for:

- **Vectorized operations** on millions of triples
- **Lazy evaluation** for query optimization
- **Zero-copy reads** from Parquet
- **Parallel execution** across cores
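
As a concrete illustration of lazy evaluation, here is a plain-Polars sketch of the kind of pushed-down scan the store can run over its persisted Parquet files. The file name comes from the Persistence example; the column names are assumptions for illustration, not the store's documented schema:

```python
import polars as pl

# Lazily scan the persisted triples; the filter is pushed down into the
# Parquet reader, so non-matching data is skipped rather than materialized.
high_conf = (
    pl.scan_parquet("knowledge_graph.parquet")
    .filter(pl.col("confidence") >= 0.9)  # assumed column name
    .collect()
)
```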

## Web API

Start the server:

```bash
# Using uvicorn directly
uvicorn rdf_starbase.web:app --reload

# Or with the module
python -m rdf_starbase.web
```

Then open:

- **API Docs**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc

### REST Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/triples` | GET | Query triples with filters |
| `/triples` | POST | Add a new triple with provenance |
| `/triples/{subject}/claims` | GET | Get competing claims for a subject |
| `/sparql` | POST | Execute a SPARQL-Star query |
| `/sources` | GET/POST | Manage data sources |
| `/graph/nodes` | GET | Visualization node data |
| `/graph/edges` | GET | Visualization edge data |
| `/stats` | GET | Database statistics |
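
For example, the `/sparql` endpoint can be exercised from any HTTP client. A minimal sketch with `httpx`; the `{"query": ...}` body shape is an assumption here, so check the interactive docs at `/docs` for the exact request schema:

```python
import httpx

# POST a SPARQL query to a locally running server.
resp = httpx.post("http://localhost:8000/sparql", json={
    "query": "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10",
})
print(resp.json())
```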

### 🤖 AI Grounding API

A specialized API layer designed for AI/LLM consumption, separate from the UI visualization endpoints:

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/ai/query` | POST | Structured fact retrieval with provenance for RAG |
| `/ai/verify` | POST | Verify whether a claim is supported by the knowledge base |
| `/ai/context/{iri}` | GET | Get all facts about an entity, with citations |
| `/ai/materialize` | POST | Trigger reasoning and persist inferences |
| `/ai/inferences` | GET | List materialized inferences |
| `/ai/health` | GET | AI API health check |

#### Why a Separate AI API?

| Aspect | UI API (`/graph/*`) | AI Grounding API (`/ai/*`) |
|--------|---------------------|----------------------------|
| **Consumer** | D3.js visualization | LLM tool calls / agents |
| **Response format** | Nodes + edges for rendering | Facts + provenance + citations |
| **Query pattern** | Browsing, neighborhood exploration | Precise fact lookup, verification |
| **Filtering** | Limit by count, visual simplicity | Confidence threshold, freshness |

#### Example: Grounding an AI Response

```python
import httpx
from urllib.parse import quote

# 1. Query relevant facts for RAG
response = httpx.post("http://localhost:8000/ai/query", json={
    "subject": "http://example.org/customer/123",
    "min_confidence": "high",  # high (>=0.9), medium (>=0.7), low (>=0.5), any
    "max_age_days": 30,        # Only recent facts
})
facts = response.json()["facts"]

# 2. Verify a claim before stating it
verify = httpx.post("http://localhost:8000/ai/verify", json={
    "subject": "http://example.org/customer/123",
    "predicate": "http://xmlns.com/foaf/0.1/age",
    "expected_object": "34",
})
result = verify.json()
if result["claim_supported"]:
    print(f"Claim verified with {result['confidence']:.0%} confidence")
elif result["has_conflicts"]:
    print("Warning: Competing claims exist!")
    print(result["recommendation"])

# 3. Get full entity context (URL-encode the IRI used as a path parameter)
iri = quote("http://example.org/customer/123", safe="")
context = httpx.get(f"http://localhost:8000/ai/context/{iri}")
entity_facts = context.json()["facts"]
related = context.json()["related_entities"]
```

#### Inference Materialization

Materialize RDFS/OWL inferences with provenance tracking:

```python
# Run reasoning engine and persist inferred triples
response = httpx.post("http://localhost:8000/ai/materialize", json={
    "enable_rdfs": True,   # RDFS entailment rules
    "enable_owl": True,    # OWL 2 RL rules
    "max_iterations": 100,
})
print(f"Inferred {response.json()['triples_inferred']} triples")

# Query inferred facts (marked with source='reasoner')
inferences = httpx.get("http://localhost:8000/ai/inferences")
for fact in inferences.json()["inferences"]:
    print(f"Inferred: {fact['subject']} {fact['predicate']} {fact['object']}")
```

## 📋 Assertion Registry

Track data sources as first-class entities:

```python
from rdf_starbase import AssertionRegistry, SourceType

registry = AssertionRegistry()

# Register a data source
source = registry.register_source(
    name="CRM_Production",
    source_type=SourceType.API,
    uri="https://api.crm.example.com/v2",
    owner="sales-team",
    tags=["production", "customer-data"],
)

# Track sync runs
run = registry.start_sync(source.id)
# ... perform sync ...
registry.complete_sync(run.id, records_processed=1000)

# Get sync history
history = registry.get_sync_history(source.id)
```

## 🧪 Development

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=src/rdf_starbase

# Format and lint code
black src/ tests/
ruff check src/ tests/
```

## 📊 Frontend (React + D3)

```bash
cd frontend
npm install
npm run dev
```

Then open http://localhost:3000 (the dev server proxies API requests to :8000).

## 📚 Examples

See the `examples/` directory:

- `quickstart.py` — Core features demonstration
- `competing_claims.py` — Handling conflicting data from multiple sources
- `sparql_queries.py` — SPARQL-Star query examples
- `registry_demo.py` — Assertion Registry usage

## 🗺️ Roadmap

### ✅ Completed (MVP)

- [x] Native RDF-Star storage
- [x] Provenance tracking (source, timestamp, confidence, process)
- [x] Competing claims detection
- [x] SPARQL-Star parser (SELECT, ASK, FILTER, ORDER BY, LIMIT, OFFSET)
- [x] SPARQL-Star executor with Polars backend
- [x] Provenance filter extensions
- [x] Parquet persistence
- [x] Assertion Registry (datasets, APIs, mappings)
- [x] REST API with FastAPI
- [x] React + D3 graph visualization

### ✅ Completed (Advanced Query Features)

- [x] OPTIONAL patterns (left outer joins)
- [x] UNION patterns (combine result sets)
- [x] MINUS patterns (set difference)
- [x] FILTER expressions (comparisons, boolean logic, regex, string functions)
- [x] BIND clauses (variable assignment, expressions, functions)
- [x] VALUES inline data
- [x] Aggregate functions (COUNT, SUM, AVG, MIN, MAX, GROUP_CONCAT, SAMPLE)
- [x] GROUP BY and HAVING
- [x] CONSTRUCT queries (template-based triple generation)
- [x] DESCRIBE queries (resource description)
- [x] SPARQL UPDATE (INSERT DATA, DELETE DATA, DELETE WHERE, DELETE/INSERT WHERE)
- [x] OWL reasoning (rdfs:subClassOf, owl:sameAs, owl:inverseOf, owl:TransitiveProperty)
- [x] Property path queries (`/`, `|`, `^`, `*`, `+`, `?`)
- [x] Time-travel queries (`AS OF "2025-01-15T00:00:00Z"`)
- [x] AI Grounding API (`/ai/query`, `/ai/verify`, `/ai/context`)
- [x] Inference materialization (`/ai/materialize`, `/ai/inferences`)
- [x] Named graph management (CREATE, DROP, CLEAR, LOAD, COPY, MOVE, ADD)
- [x] FROM clause dataset specification
- [x] GRAPH pattern queries

### 🔜 Next

- [ ] Trust scoring and decay

### 🚀 Future

- [ ] Federation across instances
- [ ] Governance workflows

## 📄 License

MIT License — see [LICENSE](LICENSE) for details.

## 🙏 Acknowledgments

- [Polars](https://pola.rs/) — The lightning-fast DataFrame library
- [RDF-Star Working Group](https://w3c.github.io/rdf-star/) — For the specification
- [FastAPI](https://fastapi.tiangolo.com/) — Modern Python web framework
- [D3.js](https://d3js.org/) — Data visualization library
- [pyparsing](https://pyparsing-docs.readthedocs.io/) — Parser combinators for Python

---

**RDF-StarBase** — *The place where enterprises store beliefs, not just data.*