rdf-lmdb 0.3.2 → 0.3.4
- checksums.yaml +4 -4
- data/TODO.org +138 -0
- data/lib/rdf/lmdb/version.rb +1 -1
- data/lib/rdf/lmdb.rb +50 -10
- data/rdf-lmdb.gemspec +1 -1
- metadata +12 -8
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: af391230f651d1a891379a96c2ce2fd5a28dc0fc1e094c61bdfa285d0f5778d3
+  data.tar.gz: da776b5c1acff8e8184f99955868124aecdcc386f48a4fb7d2710fef77a9bfb5
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e1dfec2d450287001e0bd2ee8998844f66ea4196fab3d644a5dfc12104b482e43cbff882e2697f6661b5f8bedb6bf9f5d07e9bb94f2edebb9813faf7da23e10a
+  data.tar.gz: 46aec34649aaff508943353d7de50288b9a2eed00083b37204f65969ea9401ab58fb39a904935a8aefac7b39a62d394cbd65c8e34734a53127b1a24265735fee
data/TODO.org
ADDED
@@ -0,0 +1,138 @@
+#+STARTUP: showall hidestars indent
+* Preamble
+- Let us first disabuse ourselves of the notion that this is anything more than a toy database.
+- That said, it's written in a language which is easy to experiment with, on top of a simple database which is easy to use.
+- Also, none of what I'm proposing in here is peculiar to either Ruby or LMDB.
+- Indeed, any language and any direct-attached key-value store that does transactions could support this (I think?)
+- So whereas other products like [[https://github.com/oxigraph/oxigraph][Oxigraph]] are focused on features like SPARQL, I am particularly interested in how you lay out a key-value store /in general/ such that you can represent an RDF store with characteristics like:
+- RDF-star (which I should just do anyway)
+- a change history (i.e., undo)
+- dealing with multiple users
+- (i.e., access control)
+- efficient storage of typed literals
+- efficient handling of large literals and ~data:~ URIs
+- unicode normalization for literals for sure
+- outsourcing to content-addressable storage would be ideal
+- There are going to be really silly SPARQL queries like searching substrings in ~data:~ URIs
+- at the basic graph level we will probably just have to serve those up and deal with the cost of doing that
+- Inferencing:
+- RDFS, OWL, SHACL inferencing for basic graph queries
+- don't /generate/ statements here, just return them if the inferences resolve
+- Layering:
+- think [[https://en.wikipedia.org/wiki/Union_mount][~unionfs~]] but for RDF stores
+- "union graphs", contexts which merge two or more other contexts together
+- no context is kind of like the union of all contexts
+- except triples have to be stored in an invisible null context if they aren't explicitly ascribed a context
+- if you select without a context it should return statements from all contexts at once
+- if you delete a triple (i.e., not a quad) it should delete it from all contexts (y/n?)
+- it should be possible to specify contexts that union other arbitrary contexts together
+- this should recurse but probably not loop/self-reference
+- the question (as ever) will be when you write to one of these, what happens?
+- "consensus graphs" which extend the idea of union graphs to a shared reality for multiple users
+- "proxy graphs" that map to other systems (e.g. SQL)
+- or even other RDF stores
+- statement-generating layers that do things we actually /do/ want statements in the graph for, but /generated/ rather than stored (or perhaps merely /cached/, and thus not subject to versioning)
+- e.g. "soft" inferences, stuff written in vocab specs that had no way to formally express at the time
+- I'm thinking specifically how ~?c a skos:OrderedCollection; skos:memberList (?m1 ?m2 ?mn)~ implies ~?c skos:member ?m1~ and so on.
+- Totally achievable with [[https://www.w3.org/TR/shacl-af/#rules][SHACL rules]]
+- e.g. stateful or aggregate statements computed from other statements
+- again this is totally doable with SHACL.
+* TODO RDF-star
+- at root there are terms
+- terms can be normalized and hashed
+- each term is assigned a numeric identifier that is local to the database and not otherwise exposed
+- assume this is a native-endian ~size_t~ integer; we are not gonna screw around with portability across cpu architectures
+- so intel (and apple silicon coincidentally) will be 64-bit little-endian
+- statements are composed of terms
+- statements can be represented as: ~statement id => [subject id, predicate id, object id]~
+- quad stores have contexts
+- a context is just a term
+- ~context id => statement id~
+- also ~statement id => context id~
+- the gist of RDF* is that entire statements can also be terms
+- and this can be recursive
+- so subjects and objects can now be /statements/ in addition to URIs and bnodes (and literals for objects)
+- so it shouldn't be the end of the world to make that a thing
+- albeit backward-compatibility to existing stores might be a problem
+- well if anybody wants to hire me to do that for them, they can
+* TODO change history
+- anyway, that aside, what we're actually after is being able to access the state of the database at the instant of a particular transaction
+- random access is ideal
+- indeed random access is probably /necessary/, all things considered
+- so there should be a basic key-value map that maps statement identifiers to statements
+- then there is another one that maps statements to contexts; this is how contexts are handled
+- each transaction can basically be seen as a "meta-context"
+- i.e., the state after the transaction is committed may as well have its own context URL.
+- the grammar of change in an rdf store reduces to:
+- statements added
+- statements removed
+- we can work with this
+- again, you have layer /zero/ which maps between terms and hashes/internal IDs
+- this is like saying "the database has seen /these/ terms."
+- you have layer /one/ which maps statements (which are also considered terms) to their referents
+- this is like saying "the database has seen /these/ statements."
+- (again note statements are also terms under RDF*.)
+- layer /two/ says which /contexts/ the statements belong to.
+- this is like saying "the /context/ currently contains these statements."
+- there is a "null" context that includes all statements ever
+** TODO make a sandwich layer between raw statements and context for current state
+- between-/ish/: you can easily imagine removing a statement from one context and adding it to another within a single transaction
+- every transaction can be represented as adding and/or removing zero or more quads such that the union of both sets is nonempty
+- otherwise there's nothing to record
+- in other words to be recorded as a transaction you have to /either/ add /or/ remove at least one quad, otherwise it's a no-op
+- originally considered using generated contexts as a surrogate interface for identifying individual states
+- this obviously isn't going to work because a context implies what remains is a /triple/, not a /quad/, so diffs that don't change anything but the context of a given statement aren't going to be visible
+- although ehhh that's gonna be weird already because you'll have to have individual contexts for the add side /and/ remove side
+- how else are you going to represent statements that were removed?
+- anyway there is the technical problem of how to implement this without a shitload of waste
+- change ID
+- statements removed
+- statements added
+- if the change ID monotonically increases (it should, at least internally) on retrieval we just do this:
+- retrieve the statement from whatever stateless storage
+- check if it has been added by whatever change ID we're currently looking at
+- check if it has not been subsequently removed
+- if it /has/ been subsequently removed, check if it has been re-added
+- basically we need a mapping of statement ID to change ID
+- why not just stick a bit on the end of that as to whether it's added or removed
+- so we have ~added~ and ~removed~ tables of the form ~change id => statement id~
+- we also have i dunno, ~state~ or something of the form ~statement id => change id, bit for added/removed~
+** TODO global mtime
+- which resources have been affected since this time/transaction id
+* TODO principals (multi-user)
+- each individual user gets their own quad store from their point of view
+- "consensus graph" for multiple users
+- union of individual spaces
+- one context identifier everybody involved can read in its totality
+- every statement you /add/ goes into your own slice and is visible to everybody in the group
+- you can't add or delete statements in other people's slices and they can't change yours
+- though they should be able to transfer ownership of a set of statements to you somehow
+- (but the person receiving should be able to decline the transfer)
+** TODO lensing
+- my concern here is a way to have a single repository that can support multiple users without "leaking" content from other users
+- like we /could/ just partition these on the disk but there are reasons not to do this:
+1. it's gonna be a pain in the ass for downstream applications in the best case
+2. there will be at least /some/ material in common across all users, so content will be unnecessarily duplicated
+3. LMDB uses ~mmap~ and so running multiple repositories will use lots of ram
+- i.e., rudimentary "access control"
+- it would behave like a second, invisible context
+- something like ~lensed_repo = repo.lens uri~
+- then you can use ~lensed_repo~ without worrying that it will leak
+** TODO access control
+- evaluate different approaches
+- resource-based
+- individual resources or sets of resources?
+- privileges:
+- know the existence of a resource
+- i.e. you don't see statements with this resource
+- read statements where the resource is a subject
+- going to have to censor ~owl:inverseOf~ etc, i.e. access control will have to be evaluated before inferences
+- add statements with this subject
+- remove statements with this subject
+- statement-based
+- just access-control entire contexts?
+- that would probably be easiest
+- identity-oriented vs capability-oriented
+- would kinda love to do capability-oriented
+* TODO layered graphs
+- yeah this is gonna be hard lol
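The retrieval steps listed under "change history" in TODO.org (a monotonically increasing change ID, ~added~ and ~removed~ tables, and a per-statement ~state~ record carrying an added/removed bit) can be sketched in memory roughly as follows. This is only an illustration under stated assumptions: the `ChangeLog` class and its `commit` and `present?` methods are hypothetical names, plain Hashes stand in for LMDB databases, and none of this is taken from the gem itself.

```ruby
# Hypothetical in-memory model of the tables described in TODO.org.
# Plain Hashes stand in for LMDB databases; statement IDs are integers.
class ChangeLog
  def initialize
    @change  = 0                              # monotonically increasing change ID
    @added   = Hash.new { |h, k| h[k] = [] }  # change id => statement ids
    @removed = Hash.new { |h, k| h[k] = [] }  # change id => statement ids
    @state   = Hash.new { |h, k| h[k] = [] }  # statement id => [[change id, added?], ...]
  end

  # Record one transaction. Returns the new change ID, or nil when
  # nothing is added or removed (a no-op is not recorded).
  def commit(add: [], remove: [])
    return if add.empty? && remove.empty?

    @change += 1
    remove.each do |sid|
      @removed[@change] << sid
      @state[sid] << [@change, false]
    end
    add.each do |sid|
      @added[@change] << sid
      @state[sid] << [@change, true]
    end
    @change
  end

  # Was the statement present once the given change had been committed?
  # The most recent event at or before that change decides the answer.
  def present?(sid, at: @change)
    return false unless @state.key?(sid)

    last = @state[sid].reverse_each.find { |cid, _| cid <= at }
    last ? last[1] : false
  end
end

log = ChangeLog.new
c1  = log.commit(add: [1, 2])  # statements 1 and 2 appear
c2  = log.commit(remove: [2])  # statement 2 is removed
c3  = log.commit(add: [2])     # statement 2 is re-added
puts [c1, c2, c3].map { |c| log.present?(2, at: c) }.inspect
```

Keeping the per-statement events in change-ID order means a lookup only needs the latest event at or before the requested change, which covers the "removed and then re-added" case without scanning the ~added~ and ~removed~ tables.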
data/lib/rdf/lmdb/version.rb
CHANGED
data/lib/rdf/lmdb.rb
CHANGED
@@ -5,6 +5,7 @@ require 'rdf/ntriples'
 require 'pathname'
 require 'lmdb'
 require 'digest'
+require 'time'
 require 'unf' # lol unf unf unf
 
 module RDF
@@ -165,6 +166,9 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
 # databases are opened in a transaction, who knew
 @lmdb.transaction do # |t|
   @dbs = {
+    # this is the control database, it gets no flags
+    control: [],
+    # actual instance data
     statement: [:integerkey], # key: int; val: ints
     hash2term: [], # key: sha256, val: int
     int2term: [:integerkey], # key: int, val: string
@@ -187,6 +191,9 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
     **(flags + [:create]).map { |f| [f, true] }.to_h)]
   end.to_h
 
+  # this will write the mtime if it isn't already there
+  mtime
+
   # t.commit
 end
 @lmdb.sync
@@ -522,7 +529,7 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
 ihash = thash.transform_values { |v| int_for v }
 cache = thash.keys.map { |k| [ihash[k], thash[k]] }.to_h
 
-body = -> do
+body = -> _ = nil do
   # if the graph is nonexistent there is nothing to show
   return if thash[:graph_name] and !ihash[:graph_name]
 
@@ -579,8 +586,9 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
 return unless db.has? anchor
 
 db.each_value anchor do |spack|
-  spo
-
+  spo = @dbs[:statement][spack]
+  gpack = [ihash[:graph_name]].pack ?J
+  return unless @dbs[:stmt2g].has? spack, gpack
   spo = resolve_terms spo
   yield RDF::Statement(*spo, graph_name: thash[:graph_name])
 end
@@ -625,14 +633,22 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
     end
   end
 
-
+  @lmdb.active_txn ? body.call : @lmdb.transaction(true, &body)
 
-  ret = nil
-  @lmdb.transaction do
-
-  end
+  # ret = nil
+  # @lmdb.transaction do
+  # ret = body.call
+  # end
 
-  ret
+  # ret
+end
+
+def log_mtime time = nil
+  time ||= Time.now in: ?Z
+  nsecs = time.utc.to_r
+  nsecs = (nsecs * 10**9).numerator
+  @lmdb.transaction { @dbs[:control]['mtime'] = [nsecs].pack ?q }
+  time
 end
 
 public
@@ -679,11 +695,28 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
   @lmdb.close
 end
 
+# Return a {::Time} object representing when the store was last written.
+#
+# @return [Time] said modification time
+#
+def mtime
+  if packed = @dbs[:control]['mtime']
+    nsecs = Rational(packed.unpack1(?q), 10 ** 9)
+    Time.at nsecs, in: ?Z
+  else
+    log_mtime
+  end
+end
+
 # data manipulation
 
 def insert_statement statement
   complete! statement
-  @lmdb.transaction
+  @lmdb.transaction do |t|
+    add_one statement
+    log_mtime
+    t.commit # cargo cult?
+  end
   nil
 end
 
@@ -698,6 +731,9 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
     else
       rm_one statement
     end
+
+    log_mtime
+
     t.commit
   end
   nil
@@ -709,6 +745,7 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
       complete! statement
       add_one statement
     end
+    log_mtime
   end
 
   nil
@@ -736,6 +773,9 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
     end
 
     clean_terms hashes.uniq
+
+    log_mtime
+
     t.commit
   end
 
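The new `log_mtime` and `mtime` methods store the modification time as nanoseconds since the epoch, packed with the `q` directive (a native-endian signed 64-bit integer), and read it back into a UTC `Time`. A minimal sketch of that round trip, without LMDB, might look like the following. The helper names `pack_mtime` and `unpack_mtime` and the `control` Hash are assumptions made for illustration; they are not part of the gem.

```ruby
# A plain Hash stands in for the LMDB control database (an assumption
# so that the example runs without the lmdb gem installed).
control = {}

# Encode a Time as nanoseconds since the epoch in a native-endian
# signed 64-bit integer, mirroring the approach taken by log_mtime.
def pack_mtime(time)
  nsecs = (time.utc.to_r * 10**9).to_i
  [nsecs].pack('q')
end

# Decode the 8 packed bytes back into a UTC Time with nanosecond
# precision, mirroring the approach taken by mtime.
def unpack_mtime(packed)
  Time.at(Rational(packed.unpack1('q'), 10**9), in: 'Z')
end

# An arbitrary instant with a nonzero nanosecond component.
now = Time.at(1_746_576_000, 123_456_789, :nsec, in: 'Z')

control['mtime'] = pack_mtime(now)
restored = unpack_mtime(control['mtime'])

puts restored == now  # the round trip is lossless at nanosecond precision
```

Two consequences of this encoding are worth keeping in mind: `q` is native-endian, so the stored value is not portable across byte orders (consistent with the stance in TODO.org), and a signed 64-bit count of nanoseconds runs out in the year 2262.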
data/rdf-lmdb.gemspec
CHANGED
metadata
CHANGED
@@ -1,14 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: rdf-lmdb
 version: !ruby/object:Gem::Version
-  version: 0.3.
+  version: 0.3.4
 platform: ruby
 authors:
 - Dorian Taylor
-autorequire:
 bindir: exe
 cert_chain: []
-date:
+date: 2025-05-07 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -100,14 +99,20 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 0.6
+        version: '0.6'
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 0.6.2
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 0.6
+        version: '0.6'
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 0.6.2
 description: |
   This module implements RDF::Repository on top of LMDB, a fast and
   robust key-value store.
@@ -124,6 +129,7 @@ files:
 - LICENSE
 - README.md
 - Rakefile
+- TODO.org
 - bin/console
 - bin/setup
 - lib/rdf-lmdb.rb
@@ -134,7 +140,6 @@ homepage: https://github.com/doriantaylor/rb-rdf-lmdb
 licenses:
 - Apache-2.0
 metadata: {}
-post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -149,8 +154,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.3
-signing_key:
+rubygems_version: 3.6.3
 specification_version: 4
 summary: Symas LMDB back-end for RDF::Repository
 test_files: []
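The metadata change above adds a `>= 0.6.2` constraint alongside the existing `~> 0.6` on one runtime dependency (the hunk does not show which one). As a hedged illustration of what the combined requirement accepts, the sketch below checks a handful of candidate version numbers with RubyGems' own `Gem::Requirement`; the candidate versions are made up for the example.

```ruby
require 'rubygems'

# The two constraints recorded in the metadata, combined into one requirement.
req = Gem::Requirement.new('~> 0.6', '>= 0.6.2')

# Hypothetical candidate versions, chosen only to probe the boundaries.
results = %w[0.6.1 0.6.2 0.7.0 1.0.0].to_h do |v|
  [v, req.satisfied_by?(Gem::Version.new(v))]
end

results.each { |v, ok| puts "#{v}: #{ok ? 'accepted' : 'rejected'}" }
# 0.6.1: rejected   (below the new floor)
# 0.6.2: accepted
# 0.7.0: accepted   (~> 0.6 allows anything below 1.0)
# 1.0.0: rejected
```

In effect the pessimistic constraint still sets the ceiling at 1.0, while the added constraint raises the floor from 0.6 to 0.6.2.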