rdf-lmdb 0.3.2 → 0.3.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: ea35cfcc72e22d7a71f047728b412575ce712691c2815a3693d23f94909a78b2
- data.tar.gz: ab6e65528c0b9bc70bf4d18a7f47b2f47206dbfd1d09aa99c4551a799694b3a0
+ metadata.gz: 0c021040c4ca9c7021c79808072cf55ce2c4f378f995726dfd7eec48c1f1b3ca
+ data.tar.gz: 62d9f85388152eddb2f7965ac32cb4fcbb0955d35ab410fa36726632f8f235c0
  SHA512:
- metadata.gz: 483d8b372297ee978595110432dd21170cd34fafde285b7f0f1269fea9d5d06428066b19e8e9b6cf294c6e71dbf378e1b30e37215e36810fae19dd6ebbeda5fc
- data.tar.gz: 20fac6ce5e493fa10ef923bd12e0e549e33b83a46f859b623f5fe5838ab42348ec1ef67884bfbe7baaa396ae1937a6bb3256897f86e8fd0343b0705ac2ed423d
+ metadata.gz: c76ff224facd0128e6554018bb85937a0f27175f061e2a59093ba007bd6c790330a93ec20c5346d0ba5ec45f242bad0f393d05b322b8d8ce80d705ee122af1f6
+ data.tar.gz: 5771bec23b120002df5d94e2c8f1e85657409f447924cacb1f2ec464e1fab5751ac1adbaf5fe3286e7e96051fe012e530ddac8ead2097e3e39a0033195b599c1
data/TODO.org ADDED
@@ -0,0 +1,138 @@
+ #+STARTUP: showall hidestars indent
+ * Preamble
+ - Let us first disabuse ourselves of the notion that this is anything more than a toy database.
+ - That said, it's written in a language which is easy to experiment with, on top of a simple database which is easy to use.
+ - Also, none of what I'm proposing in here is peculiar to either Ruby or LMDB.
+ - Indeed, any language and any direct-attached key-value store that does transactions could support this (I think?)
+ - So whereas other products like [[https://github.com/oxigraph/oxigraph][Oxigraph]] are focused on features like SPARQL, I am particularly interested in how you lay out a key-value store /in general/ such that you can represent an RDF store with characteristics like:
+ - RDF-star (which I should just do anyway)
+ - a change history (i.e., undo)
+ - dealing with multiple users
+ - (i.e., access control)
+ - efficient storage of typed literals
+ - efficient handling of large literals and ~data:~ URIs
+ - unicode normalization for literals for sure
+ - outsourcing to content-addressable storage would be ideal
+ - There are going to be really silly SPARQL queries like searching substrings in ~data:~ URIs
+ - at the basic graph level we will probably just have to serve those up and deal with the cost of doing that
+ - Inferencing:
+ - RDFS, OWL, SHACL inferencing for basic graph queries
+ - don't /generate/ statements here, just return them if the inferences resolve
+ - Layering:
+ - think [[https://en.wikipedia.org/wiki/Union_mount][~unionfs~]] but for RDF stores
+ - "union graphs", contexts which merge two or more other contexts together
+ - no context is kind of like the union of all contexts
+ - except triples have to be stored in an invisible null context if they aren't explicitly ascribed a context
+ - if you select without a context it should return statements from all contexts at once
+ - if you delete a triple (ie not a quad) it should delete it from all contexts (y/n?)
+ - it should be possible to specify contexts that union other arbitrary contexts together
+ - this should recurse but probably not loop/self-reference
+ - the question (as ever) will be when you write to one of these, what happens?
+ - "consensus graphs" which extend the idea of union graphs to a shared reality for multiple users
+ - "proxy graphs" that map to other systems (e.g. SQL)
+ - or even other RDF stores
+ - statement-generating layers that do things we actually /do/ want statements in the graph for, but /generated/ rather than stored (or perhaps merely /cached/, and thus not subject to versioning)
+ - e.g. "soft" inferences, stuff written in vocab specs that had no way to formally express at the time
+ - I'm thinking specifically how ~?c a skos:OrderedCollection; skos:memberList (?m1 ?m2 ?mn)~ implies ~?c skos:member ?m1~ and so on.
+ - Totally achievable with [[https://www.w3.org/TR/shacl-af/#rules][SHACL rules]]
+ - e.g. stateful or aggregate statements computed from other statements
+ - again this is totally doable with SHACL.
+ * TODO RDF-star
+ - at root there are terms
+ - terms can be normalized and hashed
+ - each term is assigned a numeric identifier that is local to the database and not otherwise exposed
+ - assume this is a native-endian ~size_t~ integer; we are not gonna screw around with portability across cpu architectures
+ - so intel (and apple silicon coincidentally) will be 64-bit little-endian
+ - statements are composed of terms
+ - statements can be represented as: ~statement id => [subject id, predicate id, object id]~
+ - quad stores have contexts
+ - a context is just a term
+ - ~context id => statement id~
+ - also ~statement id => context id~
+ - the gist of RDF* is that entire statements can also be terms
+ - and this can be recursive
+ - so subjects and objects can now be /statements/ in addition to URIs and bnodes (and literals for objects)
+ - so it shouldn't be the end of the world to make that a thing
+ - albeit backward-compatibility to existing stores might be a problem
+ - well if anybody wants to hire me to do that for them, they can
+ * TODO change history
+ - anyway, that aside, what we're actually after is being able to access the state of the database at the instant of a particular transaction
+ - random access is ideal
+ - indeed random access is probably /necessary/, all things considered
+ - so there should be a basic key-value map that maps statement identifiers to statements
+ - then there is another one that maps statements to contexts; this is how contexts are handled
+ - each transaction can basically be seen as a "meta-context"
+ - i.e., the state after the transaction is committed may as well have its own context URL.
+ - the grammar of change in an rdf store reduces to:
+ - statements added
+ - statements removed
+ - we can work with this
+ - again, you have layer /zero/ which maps between terms and hashes/internal IDs
+ - this is like saying "the database has seen /these/ terms."
+ - you have layer /one/ which maps statements (which are also considered terms) to their referents
+ - this is like saying "the database has seen /these/ statements."
+ - (again note statements are also terms under RDF*.)
+ - layer /two/ says which /contexts/ the statements belong to.
+ - this is like saying "the /context/ currently contains these statements."
+ - there is a "null" context that includes all statements ever
+ ** TODO make a sandwich layer between raw statements and context for current state
+ - between-/ish/: you can easily imagine removing a statement from one context and adding it to another within a single transaction
+ - every transaction can be represented as adding and/or removing zero or more quads such that the union of both sets is nonempty
+ - otherwise there's nothing to record
+ - in other words to be recorded as a transaction you have to /either/ add /or/ remove at least one quad, otherwise it's a no-op
+ - originally considered using generated contexts as a surrogate interface for identifying individual states
+ - this obviously isn't going to work because a context implies what remains is a /triple/, not a /quad/, so diffs that don't change anything but the context of a given statement aren't going to be visible
+ - although ehhh that's gonna be weird already because you'll have to have individual contexts for the add side /and/ remove side
+ - how else are you going to represent statements that were removed?
+ - anyway there is the technical problem of how to implement this without a shitload of waste
+ - change ID
+ - statements removed
+ - statements added
+ - if the change ID monotonically increases (it should, at least internally) on retrieval we just do this:
+ - retrieve the statement from whatever stateless storage
+ - check if it has been added by whatever change ID we're currently looking at
+ - check if it has not been subsequently removed
+ - if it /has/ been subsequently removed, check if it has been re-added
+ - basically we need a mapping of statement ID to change ID
+ - why not just stick a bit on the end of that as to whether it's added or removed
+ - so we have ~added~ and ~removed~ tables of the form ~change id => statement id~
+ - we also have i dunno, ~state~ or something of the form ~statement id => change id, bit for added/removed~
+ ** TODO global mtime
+ - which resources have been affected since this time/transaction id
+ * TODO principals (multi-user)
+ - each individual user gets their own quad store from their point of view
+ - "consensus graph" for multiple users
+ - union of individual spaces
+ - one context identifier everybody involved can read in its totality
+ - every statement you /add/ goes into your own slice and is visible to everybody in the group
+ - you can't add or delete statements in other people's slices and they can't change yours
+ - though they should be able to transfer ownership of a set of statements to you somehow
+ - (but the person receiving should be able to decline the transfer)
+ ** TODO lensing
+ - my concern here is a way to have a single repository that can support multiple users without "leaking" content from other users
+ - like we /could/ just partition these on the disk but there are reasons not to do this:
+ 1. it's gonna be a pain in the ass for downstream applications in the best case
+ 2. there will be at least /some/ material in common across all users, so content will be unnecessarily duplicated
+ 3. LMDB uses ~mmap~ and so running multiple repositories will use lots of ram
+ - i.e., rudimentary "access control"
+ - it would behave like a second, invisible context
+ - something like ~lensed_repo = repo.lens uri~
+ - then you can use ~lensed_repo~ without worrying that it will leak
+ ** TODO access control
+ - evaluate different approaches
+ - resource-based
+ - individual resources or sets of resources?
+ - privileges:
+ - know the existence of a resource
+ - i.e. you don't see statements with this resource
+ - read statements where the resource is a subject
+ - going to have to censor ~owl:inverseOf~ etc, i.e. access control will have to be evaluated before inferences
+ - add statements with this subject
+ - remove statements with this subject
+ - statement-based
+ - just access-control entire contexts?
+ - that would probably be easiest
+ - identity-oriented vs capability-oriented
+ - would kinda love to do capability-oriented
+ * TODO layered graphs
+ - yeah this is gonna be hard lol
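The retrieval steps in the "sandwich layer" notes of TODO.org (a monotonically increasing change ID, ~added~ and ~removed~ tables keyed by change ID, and a ~state~ table keyed by statement ID) can be modeled in memory roughly as follows. This is an illustrative sketch only, not code from rdf-lmdb: the `ChangeLog` class, its method names, and the use of plain Ruby hashes in place of LMDB tables are all assumptions.

```ruby
# Hypothetical in-memory model of the tables described in TODO.org.
# Plain hashes stand in for LMDB databases; none of these names exist in rdf-lmdb.
class ChangeLog
  def initialize
    @change  = 0                              # monotonically increasing change id
    @added   = Hash.new { |h, k| h[k] = [] }  # change id => statement ids added
    @removed = Hash.new { |h, k| h[k] = [] }  # change id => statement ids removed
    @state   = Hash.new { |h, k| h[k] = [] }  # statement id => [[change id, added?], ...]
  end

  # Record one transaction. Returns the new change id, or nil when nothing
  # was added or removed (a no-op is not recorded, per the notes).
  def commit(add: [], remove: [])
    return nil if add.empty? && remove.empty?

    @change += 1
    remove.each { |s| @removed[@change] << s; @state[s] << [@change, false] }
    add.each    { |s| @added[@change]   << s; @state[s] << [@change, true] }
    @change
  end

  # Statements touched by a given change, without creating empty entries.
  def diff(change)
    { added: @added.fetch(change, []), removed: @removed.fetch(change, []) }
  end

  # Was the statement present immediately after change `at`?
  # The latest event at or before `at` decides: added means present.
  def present?(statement, at: @change)
    event = @state.fetch(statement, []).reverse_each.find { |id, _| id <= at }
    event ? event.last : false
  end
end

log = ChangeLog.new
c1 = log.commit(add: [1, 2])   # statements 1 and 2 appear
c2 = log.commit(remove: [1])   # statement 1 is removed
c3 = log.commit(add: [1])      # statement 1 is re-added

log.present?(1, at: c1) # => true
log.present?(1, at: c2) # => false
log.present?(1, at: c3) # => true
```

Keeping the per-statement event list ordered by change ID is what makes the "removed, then re-added" check a single backward scan. A real key-value layout would presumably need a sorted duplicate-key table or a composite key to get the same effect.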
@@ -1,5 +1,5 @@
  module RDF
  module LMDB
- VERSION = "0.3.2"
+ VERSION = "0.3.3"
  end
  end
data/lib/rdf/lmdb.rb CHANGED
@@ -5,6 +5,7 @@ require 'rdf/ntriples'
  require 'pathname'
  require 'lmdb'
  require 'digest'
+ require 'time'
  require 'unf' # lol unf unf unf

  module RDF
  module RDF
@@ -165,6 +166,9 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
  # databases are opened in a transaction, who knew
  @lmdb.transaction do # |t|
  @dbs = {
+ # this is the control database, it gets no flags
+ control: [],
+ # actual instance data
  statement: [:integerkey], # key: int; val: ints
  hash2term: [], # key: sha256, val: int
  int2term: [:integerkey], # key: int, val: string
@@ -187,6 +191,9 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
  **(flags + [:create]).map { |f| [f, true] }.to_h)]
  end.to_h

+ # this will write the mtime if it isn't already there
+ mtime
+
  # t.commit
  end
  @lmdb.sync
@@ -635,6 +642,14 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
  ret
  end

+ def log_mtime time = nil
+ time ||= Time.now in: ?Z
+ nsecs = time.utc.to_r
+ nsecs = (nsecs * 10**9).numerator
+ @lmdb.transaction { @dbs[:control]['mtime'] = [nsecs].pack ?q }
+ time
+ end
+
  public

  def initialize dir = nil, uri: nil, title: nil, **options, &block
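As a rough illustration of the storage format that `log_mtime` uses in the hunk above (nanoseconds since the epoch, packed with the `q` directive as a signed 64-bit native-endian integer), the following standalone sketch round-trips a timestamp without touching LMDB. The helper names `pack_mtime` and `unpack_mtime` are hypothetical and do not exist in the gem. The sketch also uses `Time#utc` rather than the `in: ?Z` keyword, purely to stay independent of Ruby version.

```ruby
# Hypothetical helpers mirroring the encoding used by log_mtime and mtime;
# they are not part of rdf-lmdb.

# Encode a Time as whole nanoseconds since the epoch in 8 bytes.
def pack_mtime(time)
  nsecs = (time.utc.to_r * 10**9).to_i # exact: Time#to_r is a Rational
  [nsecs].pack('q')                    # signed 64-bit, native byte order
end

# Decode the 8-byte value back into a UTC Time with nanosecond precision.
def unpack_mtime(packed)
  Time.at(Rational(packed.unpack1('q'), 10**9)).utc
end

t      = Time.at(1_743_811_200, 123_456_789, :nsec).utc
packed = pack_mtime(t)

packed.bytesize           # => 8
unpack_mtime(packed) == t # => true
```

Going through `Rational` instead of `Float` is what keeps the nanosecond component intact on the way back. Because `q` is native-endian, the stored bytes are not portable across CPU architectures with different byte orders.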
@@ -679,11 +694,28 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
  @lmdb.close
  end

+ # Return a {::Time} object representing when the store was last written.
+ #
+ # @return [Time] said modification time
+ #
+ def mtime
+ if packed = @dbs[:control]['mtime']
+ nsecs = Rational(packed.unpack1(?q), 10 ** 9)
+ Time.at nsecs, in: ?Z
+ else
+ log_mtime
+ end
+ end
+
  # data manipulation

  def insert_statement statement
  complete! statement
- @lmdb.transaction { |t| add_one statement; t.commit }
+ @lmdb.transaction do |t|
+ add_one statement
+ log_mtime
+ t.commit # cargo cult?
+ end
  nil
  end

@@ -698,6 +730,9 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
  else
  rm_one statement
  end
+
+ log_mtime
+
  t.commit
  end
  nil
@@ -709,6 +744,7 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
  complete! statement
  add_one statement
  end
+ log_mtime
  end

  nil
@@ -736,6 +772,9 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
  end

  clean_terms hashes.uniq
+
+ log_mtime
+
  t.commit
  end

data/rdf-lmdb.gemspec CHANGED
@@ -36,5 +36,5 @@ robust key-value store.
  # stuff we use
  spec.add_runtime_dependency 'unf', '~> 0.1'
  spec.add_runtime_dependency 'rdf', '~> 3'
- spec.add_runtime_dependency 'lmdb', '~> 0.6.1'
+ spec.add_runtime_dependency 'lmdb', '~> 0.6', '>= 0.6.2'
  end
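To see what the changed `lmdb` constraint admits, the two requirement strings from the gemspec can be compared using RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The sample version numbers below are arbitrary probes chosen for illustration, not a list of actual `lmdb` releases.

```ruby
require 'rubygems'

# Requirement strings copied from the gemspec hunk above.
old_req = Gem::Requirement.new('~> 0.6.1')
new_req = Gem::Requirement.new('~> 0.6', '>= 0.6.2')

# Arbitrary probe versions (not necessarily real lmdb releases).
%w[0.6.1 0.6.2 0.6.9 0.7.0 1.0.0].each do |v|
  version = Gem::Version.new(v)
  puts format('%-6s old: %-5s new: %s',
              v, old_req.satisfied_by?(version), new_req.satisfied_by?(version))
end
```

The new constraint does two things: it raises the floor from 0.6.1 to 0.6.2, and it relaxes the ceiling from "below 0.7" to "below 1.0".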
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: rdf-lmdb
  version: !ruby/object:Gem::Version
- version: 0.3.2
+ version: 0.3.3
  platform: ruby
  authors:
  - Dorian Taylor
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2024-02-19 00:00:00.000000000 Z
+ date: 2025-04-05 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: bundler
@@ -100,14 +100,20 @@ dependencies:
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 0.6.1
+ version: '0.6'
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: 0.6.2
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 0.6.1
+ version: '0.6'
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: 0.6.2
  description: |
  This module implements RDF::Repository on top of LMDB, a fast and
  robust key-value store.
@@ -124,6 +130,7 @@ files:
  - LICENSE
  - README.md
  - Rakefile
+ - TODO.org
  - bin/console
  - bin/setup
  - lib/rdf-lmdb.rb
@@ -149,7 +156,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.3.15
+ rubygems_version: 3.4.20
  signing_key:
  specification_version: 4
  summary: Symas LMDB back-end for RDF::Repository