rdf-lmdb 0.3.2 → 0.3.3
- checksums.yaml +4 -4
- data/TODO.org +138 -0
- data/lib/rdf/lmdb/version.rb +1 -1
- data/lib/rdf/lmdb.rb +40 -1
- data/rdf-lmdb.gemspec +1 -1
- metadata +12 -5
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 0c021040c4ca9c7021c79808072cf55ce2c4f378f995726dfd7eec48c1f1b3ca
+  data.tar.gz: 62d9f85388152eddb2f7965ac32cb4fcbb0955d35ab410fa36726632f8f235c0
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c76ff224facd0128e6554018bb85937a0f27175f061e2a59093ba007bd6c790330a93ec20c5346d0ba5ec45f242bad0f393d05b322b8d8ce80d705ee122af1f6
+  data.tar.gz: 5771bec23b120002df5d94e2c8f1e85657409f447924cacb1f2ec464e1fab5751ac1adbaf5fe3286e7e96051fe012e530ddac8ead2097e3e39a0033195b599c1
data/TODO.org
ADDED
@@ -0,0 +1,138 @@
+#+STARTUP: showall hidestars indent
+* Preamble
+- Let us first disabuse ourselves of the notion that this is anything more than a toy database.
+- That said, it's written in a language which is easy to experiment with, on top of a simple database which is easy to use.
+- Also, none of what I'm proposing in here is peculiar to either Ruby or LMDB.
+- Indeed, any language and any direct-attached key-value store that does transactions could support this (I think?)
+- So whereas other products like [[https://github.com/oxigraph/oxigraph][Oxigraph]] are focused on features like SPARQL, I am particularly interested in how you lay out a key-value store /in general/ such that you can represent an RDF store with characteristics like:
+- RDF-star (which I should just do anyway)
+- a change history (i.e., undo)
+- dealing with multiple users
+- (i.e., access control)
+- efficient storage of typed literals
+- efficient handling of large literals and ~data:~ URIs
+- unicode normalization for literals for sure
+- outsourcing to content-addressable storage would be ideal
+- There are going to be really silly SPARQL queries like searching substrings in ~data:~ URIs
+- at the basic graph level we will probably just have to serve those up and deal with the cost of doing that
+- Inferencing:
+- RDFS, OWL, SHACL inferencing for basic graph queries
+- don't /generate/ statements here, just return them if the inferences resolve
+- Layering:
+- think [[https://en.wikipedia.org/wiki/Union_mount][~unionfs~]] but for RDF stores
+- "union graphs", contexts which merge two or more other contexts together
+- no context is kind of like the union of all contexts
+- except triples have to be stored in an invisible null context if they aren't explicitly ascribed a context
+- if you select without a context it should return statements from all contexts at once
+- if you delete a triple (ie not a quad) it should delete it from all contexts (y/n?)
+- it should be possible to specify contexts that union other arbitrary contexts together
+- this should recurse but probably not loop/self-reference
+- the question (as ever) will be when you write to one of these, what happens?
+- "consensus graphs" which extend the idea of union graphs to a shared reality for multiple users
+- "proxy graphs" that map to other systems (e.g. SQL)
+- or even other RDF stores
+- statement-generating layers that do things we actually /do/ want statements in the graph for, but /generated/ rather than stored (or perhaps merely /cached/, and thus not subject to versioning)
+- e.g. "soft" inferences, stuff written in vocab specs that had no way to formally express at the time
+- I'm thinking specifically how ~?c a skos:OrderedCollection; skos:memberList (?m1 ?m2 ?mn)~ implies ~?c skos:member ?m1~ and so on.
+- Totally achievable with [[https://www.w3.org/TR/shacl-af/#rules][SHACL rules]]
+- e.g. stateful or aggregate statements computed from other statements
+- again this is totally doable with SHACL.
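The "union graphs" bullets in the preamble say a union context should recurse into other unions but not loop. A minimal sketch of that expansion follows, assuming a plain Ruby hash describes which contexts each union merges. The `UNIONS` table, the `urn:` names, and `resolve_contexts` are hypothetical and not part of rdf-lmdb.

```ruby
require 'set'

# Hypothetical table: each union context maps to the contexts it merges.
UNIONS = {
  'urn:union:all'  => ['urn:union:team', 'urn:ctx:public'],
  'urn:union:team' => ['urn:ctx:alice', 'urn:ctx:bob', 'urn:union:all'] # loops back
}.freeze

# Expand a context into the set of plain (non-union) contexts it covers.
# `seen` guards against loops and self-reference.
def resolve_contexts context, unions = UNIONS, seen = Set.new
  return Set.new unless seen.add? context # already visited: stop here
  members = unions[context]
  return Set[context] unless members      # a plain context expands to itself
  members.reduce(Set.new) { |acc, m| acc | resolve_contexts(m, unions, seen) }
end

expanded = resolve_contexts 'urn:union:all'
```

Sharing one `seen` set across the whole walk means each context is expanded at most once, which is enough for a union. It does not answer the open question of what a write to a union context should do.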
+* TODO RDF-star
+- at root there are terms
+- terms can be normalized and hashed
+- each term is assigned a numeric identifier that is local to the database and not otherwise exposed
+- assume this is a native-endian ~size_t~ integer; we are not gonna screw around with portability across cpu architectures
+- so intel (and apple silicon coincidentally) will be 64-bit little-endian
+- statements are composed of terms
+- statements can be represented as: ~statement id => [subject id, predicate id, object id]~
+- quad stores have contexts
+- a context is just a term
+- ~context id => statement id~
+- also ~statement id => context id~
+- the gist of RDF* is that entire statements can also be terms
+- and this can be recursive
+- so subjects and objects can now be /statements/ in addition to URIs and bnodes (and literals for objects)
+- so it shouldn't be the end of the world to make that a thing
+- albeit backward-compatibility to existing stores might be a problem
+- well if anybody wants to hire me to do that for them, they can
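The layout in this section (terms hashed to local integer ids, statements stored as three ids, statements themselves usable as terms) can be sketched in memory as below. This is only an illustration under assumptions: plain hashes stand in for the key-value store, `TermTable` and its methods are hypothetical names, and the `J` pack directive (Ruby's native-endian pointer-width unsigned integer) is used as a stand-in for `size_t`.

```ruby
require 'digest'

# Hypothetical in-memory stand-in for the term and statement tables.
class TermTable
  def initialize
    @hash2id = {} # sha256 digest => integer id
    @id2term = {} # integer id => [kind, payload]
  end

  # Look up or assign the local id of an ordinary term.
  def intern term
    store :term, term.to_s
  end

  # Look up or assign the id of a statement. Any position may itself be
  # a nested [s, p, o] array, which is how a statement becomes a term.
  def intern_statement s, p, o
    ids = [s, p, o].map { |t| t.is_a?(Array) ? intern_statement(*t) : intern(t) }
    store :statement, ids.pack('J3')
  end

  # Returns [:term, string] or [:statement, [s_id, p_id, o_id]].
  def lookup id
    kind, payload = @id2term.fetch id
    kind == :statement ? [kind, payload.unpack('J3')] : [kind, payload]
  end

  private

  # The kind tag keeps a packed statement from colliding with a term
  # that happens to have the same bytes.
  def store kind, payload
    key = Digest::SHA256.digest "#{kind}\0".b + payload.b
    @hash2id[key] ||= begin
      id = @id2term.size + 1
      @id2term[id] = [kind, payload]
      id
    end
  end
end

table = TermTable.new
inner = ['<urn:a>', '<urn:knows>', '<urn:b>']
sid   = table.intern_statement(*inner)
# the same statement reused as the subject of another statement
outer = table.intern_statement inner, '<urn:saidBy>', '<urn:c>'
```

Because the nested statement resolves to the id it already has, the outer statement stores `sid` in its subject slot rather than a second copy of the triple.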
+* TODO change history
+- anyway, that aside, what we're actually after is being able to access the state of the database at the instant of a particular transaction
+- random access is ideal
+- indeed random access is probably /necessary/, all things considered
+- so there should be a basic key-value map that maps statement identifiers to statements
+- then there is another one that maps statements to contexts; this is how contexts are handled
+- each transaction can basically be seen as a "meta-context"
+- i.e., the state after the transaction is committed may as well have its own context URL.
+- the grammar of change in an rdf store reduces to:
+- statements added
+- statements removed
+- we can work with this
+- again, you have layer /zero/ which maps between terms and hashes/internal IDs
+- this is like saying "the database has seen /these/ terms."
+- you have layer /one/ which maps statements (which are also considered terms) to their referents
+- this is like saying "the database has seen /these/ statements."
+- (again note statements are also terms under RDF*.)
+- layer /two/ says which /contexts/ the statements belong to.
+- this is like saying "the /context/ currently contains these statements."
+- there is a "null" context that includes all statements ever
+** TODO make a sandwich layer between raw statements and context for current state
+- between-/ish/: you can easily imagine removing a statement from one context and adding it to another within a single transaction
+- every transaction can be represented as adding and/or removing zero or more quads such that the union of both sets is nonempty
+- otherwise there's nothing to record
+- in other words to be recorded as a transaction you have to /either/ add /or/ remove at least one quad, otherwise it's a no-op
+- originally considered using generated contexts as a surrogate interface for identifying individual states
+- this obviously isn't going to work because a context implies what remains is a /triple/, not a /quad/, so diffs that don't change anything but the context of a given statement aren't going to be visible
+- although ehhh that's gonna be weird already because you'll have to have individual contexts for the add side /and/ remove side
+- how else are you going to represent statements that were removed?
+- anyway there is the technical problem of how to implement this without a shitload of waste
+- change ID
+- statements removed
+- statements added
+- if the change ID monotonically increases (it should, at least internally) on retrieval we just do this:
+- retrieve the statement from whatever stateless storage
+- check if it has been added by whatever change ID we're currently looking at
+- check if it has not been subsequently removed
+- if it /has/ been subsequently removed, check if it has been re-added
+- basically we need a mapping of statement ID to change ID
+- why not just stick a bit on the end of that as to whether it's added or removed
+- so we have ~added~ and ~removed~ tables of the form ~change id => statement id~
+- we also have i dunno, ~state~ or something of the form ~statement id => change id, bit for added/removed~
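The retrieval steps above (was it added by this change ID, was it later removed, was it re-added) reduce to finding the most recent entry for a statement at or before the change of interest. A rough in-memory sketch follows, assuming monotonically increasing change IDs as the notes suggest. `ChangeLog`, `commit`, and `present?` are hypothetical names, and Ruby hashes stand in for the `added`, `removed`, and `state` tables.

```ruby
# Hypothetical in-memory version of the added/removed/state tables.
class ChangeLog
  def initialize
    @added   = Hash.new { |h, k| h[k] = [] } # change id => statement ids
    @removed = Hash.new { |h, k| h[k] = [] } # change id => statement ids
    @state   = Hash.new { |h, k| h[k] = [] } # statement id => [[change id, added?], ...]
    @change  = 0
  end

  # Record one transaction. Returns its change id, or nil when nothing
  # is added or removed (a no-op is not recorded).
  def commit add: [], remove: []
    return if add.empty? && remove.empty?
    @change += 1
    remove.each { |s| @removed[@change] << s; @state[s] << [@change, false] }
    add.each    { |s| @added[@change]   << s; @state[s] << [@change, true] }
    @change
  end

  # Was the statement present just after the given change was committed?
  def present? statement_id, at:
    return false unless @state.key? statement_id
    # entries are appended in change order, so scan from the newest
    entry = @state[statement_id].reverse_each.find { |change, _| change <= at }
    entry ? entry.last : false
  end
end

log = ChangeLog.new
c1  = log.commit add: [42]
c2  = log.commit remove: [42]
c3  = log.commit add: [42]
```

With this history, statement 42 is present as of `c1` and `c3` but not as of `c2`. A real store would need a range scan or a sorted duplicate key in place of the per-statement array, which this sketch does not attempt.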
+** TODO global mtime
+- which resources have been affected since this time/transaction id
+* TODO principals (multi-user)
+- each individual user gets their own quad store from their point of view
+- "consensus graph" for multiple users
+- union of individual spaces
+- one context identifier everybody involved can read in its totality
+- every statement you /add/ goes into your own slice and is visible to everybody in the group
+- you can't add or delete statements in other people's slices and they can't change yours
+- though they should be able to transfer ownership of a set of statements to you somehow
+- (but the person receiving should be able to decline the transfer)
+** TODO lensing
+- my concern here is a way to have a single repository that can support multiple users without "leaking" content from other users
+- like we /could/ just partition these on the disk but there are reasons not to do this:
+1. it's gonna be a pain in the ass for downstream applications in the best case
+2. there will be at least /some/ material in common across all users, so content will be unnecessarily duplicated
+3. LMDB uses ~mmap~ and so running multiple repositories will use lots of ram
+- i.e., rudimentary "access control"
+- it would behave like a second, invisible context
+- something like ~lensed_repo = repo.lens uri~
+- then you can use ~lensed_repo~ without worrying that it will leak
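One way to read "a second, invisible context" is an extra owner slot on every stored quad that a lens fills in on write and filters on when reading. The sketch below is only a guess at the shape of `repo.lens uri`: `TinyRepo`, `Lens`, and every method on them are hypothetical, and an array stands in for the store.

```ruby
# Hypothetical store where each quad carries a hidden owner slot.
class TinyRepo
  def initialize
    @rows = [] # [owner, subject, predicate, object, context]
  end

  # Hand out a view that can only see and write one owner's rows.
  def lens owner
    Lens.new self, owner
  end

  def insert owner, *quad
    @rows << [owner, *quad]
  end

  # Yield only the quads belonging to the given owner, minus the owner slot.
  def each_for owner
    @rows.each { |o, *quad| yield quad if o == owner }
  end
end

# A view of the repository restricted to a single owner.
class Lens
  def initialize repo, owner
    @repo  = repo
    @owner = owner
  end

  def insert s, p, o, context = nil
    @repo.insert @owner, s, p, o, context
  end

  def statements
    out = []
    @repo.each_for(@owner) { |quad| out << quad }
    out
  end
end

repo  = TinyRepo.new
alice = repo.lens 'urn:user:alice'
bob   = repo.lens 'urn:user:bob'
alice.insert '<urn:a>', '<urn:p>', '<urn:b>'
bob.insert   '<urn:a>', '<urn:p>', '<urn:c>'
```

Both users share one store, yet `alice.statements` never includes the quad that `bob` wrote. Material common to all users would still be duplicated per owner here, so this only addresses the leaking concern.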
+** TODO access control
+- evaluate different approaches
+- resource-based
+- individual resources or sets of resources?
+- privileges:
+- know the existence of a resource
+- i.e. you don't see statements with this resource
+- read statements where the resource is a subject
+- going to have to censor ~owl:inverseOf~ etc, i.e. access control will have to be evaluated before inferences
+- add statements with this subject
+- remove statements with this subject
+- statement-based
+- just access-control entire contexts?
+- that would probably be easiest
+- identity-oriented vs capability-oriented
+- would kinda love to do capability-oriented
+* TODO layered graphs
+- yeah this is gonna be hard lol
data/lib/rdf/lmdb/version.rb
CHANGED
data/lib/rdf/lmdb.rb
CHANGED
@@ -5,6 +5,7 @@ require 'rdf/ntriples'
 require 'pathname'
 require 'lmdb'
 require 'digest'
+require 'time'
 require 'unf' # lol unf unf unf
 
 module RDF
@@ -165,6 +166,9 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
   # databases are opened in a transaction, who knew
   @lmdb.transaction do # |t|
     @dbs = {
+      # this is the control database, it gets no flags
+      control: [],
+      # actual instance data
       statement: [:integerkey], # key: int; val: ints
       hash2term: [], # key: sha256, val: int
       int2term: [:integerkey], # key: int, val: string
@@ -187,6 +191,9 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
       **(flags + [:create]).map { |f| [f, true] }.to_h)]
     end.to_h
 
+    # this will write the mtime if it isn't already there
+    mtime
+
     # t.commit
   end
   @lmdb.sync
@@ -635,6 +642,14 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
     ret
   end
 
+  def log_mtime time = nil
+    time ||= Time.now in: ?Z
+    nsecs = time.utc.to_r
+    nsecs = (nsecs * 10**9).numerator
+    @lmdb.transaction { @dbs[:control]['mtime'] = [nsecs].pack ?q }
+    time
+  end
+
   public
 
   def initialize dir = nil, uri: nil, title: nil, **options, &block
@@ -679,11 +694,28 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
     @lmdb.close
   end
 
+  # Return a {::Time} object representing when the store was last written.
+  #
+  # @return [Time] said modification time
+  #
+  def mtime
+    if packed = @dbs[:control]['mtime']
+      nsecs = Rational(packed.unpack1(?q), 10 ** 9)
+      Time.at nsecs, in: ?Z
+    else
+      log_mtime
+    end
+  end
+
   # data manipulation
 
   def insert_statement statement
     complete! statement
-    @lmdb.transaction
+    @lmdb.transaction do |t|
+      add_one statement
+      log_mtime
+      t.commit # cargo cult?
+    end
     nil
   end
 
@@ -698,6 +730,9 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
       else
         rm_one statement
       end
+
+      log_mtime
+
       t.commit
     end
     nil
@@ -709,6 +744,7 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
         complete! statement
         add_one statement
       end
+      log_mtime
     end
 
     nil
@@ -736,6 +772,9 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
       end
 
       clean_terms hashes.uniq
+
+      log_mtime
+
       t.commit
     end
 
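The `log_mtime` and `mtime` methods in this file's diff store the modification time as integer nanoseconds packed with the `q` directive (signed 64-bit, native-endian). The standalone sketch below walks one timestamp through the same conversions without touching LMDB. The fixed instant and the local variable names are illustrative assumptions.

```ruby
# A fixed instant with nanosecond precision, in UTC.
time = Time.at(1_700_000_000, 123_456_789, :nsec).utc

# As in log_mtime: Rational seconds -> integer nanoseconds -> 8 packed bytes.
nsecs  = (time.to_r * 10**9).numerator
packed = [nsecs].pack 'q'

# As in mtime: unpack the integer and rebuild the Time from a Rational.
restored = Time.at(Rational(packed.unpack1('q'), 10**9)).utc
```

Using a `Rational` on the way back avoids the rounding a `Float` would introduce, so `restored` compares equal to `time`. Two caveats worth keeping in mind: `numerator` only equals the nanosecond count when the time has no sub-nanosecond component, and a signed 64-bit nanosecond counter runs out in the year 2262.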
data/rdf-lmdb.gemspec
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: rdf-lmdb
 version: !ruby/object:Gem::Version
-  version: 0.3.
+  version: 0.3.3
 platform: ruby
 authors:
 - Dorian Taylor
 autorequire:
 bindir: exe
 cert_chain: []
-date:
+date: 2025-04-05 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -100,14 +100,20 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 0.6
+        version: '0.6'
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 0.6.2
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 0.6
+        version: '0.6'
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 0.6.2
 description: |
   This module implements RDF::Repository on top of LMDB, a fast and
   robust key-value store.
@@ -124,6 +130,7 @@ files:
 - LICENSE
 - README.md
 - Rakefile
+- TODO.org
 - bin/console
 - bin/setup
 - lib/rdf-lmdb.rb
@@ -149,7 +156,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.
+rubygems_version: 3.4.20
 signing_key:
 specification_version: 4
 summary: Symas LMDB back-end for RDF::Repository