rdf-lmdb 0.3.2 → 0.3.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: ea35cfcc72e22d7a71f047728b412575ce712691c2815a3693d23f94909a78b2
- data.tar.gz: ab6e65528c0b9bc70bf4d18a7f47b2f47206dbfd1d09aa99c4551a799694b3a0
+ metadata.gz: af391230f651d1a891379a96c2ce2fd5a28dc0fc1e094c61bdfa285d0f5778d3
+ data.tar.gz: da776b5c1acff8e8184f99955868124aecdcc386f48a4fb7d2710fef77a9bfb5
  SHA512:
- metadata.gz: 483d8b372297ee978595110432dd21170cd34fafde285b7f0f1269fea9d5d06428066b19e8e9b6cf294c6e71dbf378e1b30e37215e36810fae19dd6ebbeda5fc
- data.tar.gz: 20fac6ce5e493fa10ef923bd12e0e549e33b83a46f859b623f5fe5838ab42348ec1ef67884bfbe7baaa396ae1937a6bb3256897f86e8fd0343b0705ac2ed423d
+ metadata.gz: e1dfec2d450287001e0bd2ee8998844f66ea4196fab3d644a5dfc12104b482e43cbff882e2697f6661b5f8bedb6bf9f5d07e9bb94f2edebb9813faf7da23e10a
+ data.tar.gz: 46aec34649aaff508943353d7de50288b9a2eed00083b37204f65969ea9401ab58fb39a904935a8aefac7b39a62d394cbd65c8e34734a53127b1a24265735fee
data/TODO.org ADDED
@@ -0,0 +1,138 @@
+ #+STARTUP: showall hidestars indent
+ * Preamble
+ - Let us first disabuse ourselves of the notion that this is anything more than a toy database.
+ - That said, it's written in a language which is easy to experiment with, on top of a simple database which is easy to use.
+ - Also, none of what I'm proposing in here is peculiar to either Ruby or LMDB.
+ - Indeed, any language and any direct-attached key-value store that does transactions could support this (I think?)
+ - So whereas other products like [[https://github.com/oxigraph/oxigraph][Oxigraph]] are focused on features like SPARQL, I am particularly interested in how you lay out a key-value store /in general/ such that you can represent an RDF store with characteristics like:
+ - RDF-star (which I should just do anyway)
+ - a change history (i.e., undo)
+ - dealing with multiple users
+ - (i.e., access control)
+ - efficient storage of typed literals
+ - efficient handling of large literals and ~data:~ URIs
+ - unicode normalization for literals for sure
+ - outsourcing to content-addressable storage would be ideal
+ - There are going to be really silly SPARQL queries like searching substrings in ~data:~ URIs
+ - at the basic graph level we will probably just have to serve those up and deal with the cost of doing that
+ - Inferencing:
+ - RDFS, OWL, SHACL inferencing for basic graph queries
+ - don't /generate/ statements here, just return them if the inferences resolve
+ - Layering:
+ - think [[https://en.wikipedia.org/wiki/Union_mount][~unionfs~]] but for RDF stores
+ - "union graphs", contexts which merge two or more other contexts together
+ - no context is kind of like the union of all contexts
+ - except triples have to be stored in an invisible null context if they aren't explicitly ascribed a context
+ - if you select without a context it should return statements from all contexts at once
+ - if you delete a triple (ie not a quad) it should delete it from all contexts (y/n?)
+ - it should be possible to specify contexts that union other arbitrary contexts together
+ - this should recurse but probably not loop/self-reference
+ - the question (as ever) will be when you write to one of these, what happens?
+ - "consensus graphs" which extend the idea of union graphs to a shared reality for multiple users
+ - "proxy graphs" that map to other systems (e.g. SQL)
+ - or even other RDF stores
+ - statement-generating layers that do things we actually /do/ want statements in the graph for, but /generated/ rather than stored (or perhaps merely /cached/, and thus not subject to versioning)
+ - e.g. "soft" inferences, stuff written in vocab specs that had no way to formally express at the time
+ - I'm thinking specifically how ~?c a skos:OrderedCollection; skos:memberList (?m1 ?m2 ?mn)~ implies ~?c skos:member ?m1~ and so on.
+ - Totally achievable with [[https://www.w3.org/TR/shacl-af/#rules][SHACL rules]]
+ - e.g. stateful or aggregate statements computed from other statements
+ - again this is totally doable with SHACL.
+ * TODO RDF-star
+ - at root there are terms
+ - terms can be normalized and hashed
+ - each term is assigned a numeric identifier that is local to the database and not otherwise exposed
+ - assume this is a native-endian ~size_t~ integer; we are not gonna screw around with portability across cpu architectures
+ - so intel (and apple silicon coincidentally) will be 64-bit little-endian
+ - statements are composed of terms
+ - statements can be represented as: ~statement id => [subject id, predicate id, object id]~
+ - quad stores have contexts
+ - a context is just a term
+ - ~context id => statement id~
+ - also ~statement id => context id~
+ - the gist of RDF* is that entire statements can also be terms
+ - and this can be recursive
+ - so subjects and objects can now be /statements/ in addition to URIs and bnodes (and literals for objects)
+ - so it shouldn't be the end of the world to make that a thing
+ - albeit backward-compatibility to existing stores might be a problem
+ - well if anybody wants to hire me to do that for them, they can
+ * TODO change history
+ - anyway, that aside, what we're actually after is being able to access the state of the database at the instant of a particular transaction
+ - random access is ideal
+ - indeed random access is probably /necessary/, all things considered
+ - so there should be a basic key-value map that maps statement identifiers to statements
+ - then there is another one that maps statements to contexts; this is how contexts are handled
+ - each transaction can basically be seen as a "meta-context"
+ - i.e., the state after the transaction is committed may as well have its own context URL.
+ - the grammar of change in an rdf store reduces to:
+ - statements added
+ - statements removed
+ - we can work with this
+ - again, you have layer /zero/ which maps between terms and hashes/internal IDs
+ - this is like saying "the database has seen /these/ terms."
+ - you have layer /one/ which maps statements (which are also considered terms) to their referents
+ - this is like saying "the database has seen /these/ statements."
+ - (again note statements are also terms under RDF*.)
+ - layer /two/ says which /contexts/ the statements belong to.
+ - this is like saying "the /context/ currently contains these statements."
+ - there is a "null" context that includes all statements ever
+ ** TODO make a sandwich layer between raw statements and context for current state
+ - between-/ish/: you can easily imagine removing a statement from one context and adding it to another within a single transaction
+ - every transaction can be represented as adding and/or removing zero or more quads such that the union of both sets is nonempty
+ - otherwise there's nothing to record
+ - in other words to be recorded as a transaction you have to /either/ add /or/ remove at least one quad, otherwise it's a no-op
+ - originally considered using generated contexts as a surrogate interface for identifying individual states
+ - this obviously isn't going to work because a context implies what remains is a /triple/, not a /quad/, so diffs that don't change anything but the context of a given statement aren't going to be visible
+ - although ehhh that's gonna be weird already because you'll have to have individual contexts for the add side /and/ remove side
+ - how else are you going to represent statements that were removed?
+ - anyway there is the technical problem of how to implement this without a shitload of waste
+ - change ID
+ - statements removed
+ - statements added
+ - if the change ID monotonically increases (it should, at least internally) on retrieval we just do this:
+ - retrieve the statement from whatever stateless storage
+ - check if it has been added by whatever change ID we're currently looking at
+ - check if it has not been subsequently removed
+ - if it /has/ been subsequently removed, check if it has been re-added
+ - basically we need a mapping of statement ID to change ID
+ - why not just stick a bit on the end of that as to whether it's added or removed
+ - so we have ~added~ and ~removed~ tables of the form ~change id => statement id~
+ - we also have i dunno, ~state~ or something of the form ~statement id => change id, bit for added/removed~
+ ** TODO global mtime
+ - which resources have been affected since this time/transaction id
+ * TODO principals (multi-user)
+ - each individual user gets their own quad store from their point of view
+ - "consensus graph" for multiple users
+ - union of individual spaces
+ - one context identifier everybody involved can read in its totality
+ - every statement you /add/ goes into your own slice and is visible to everybody in the group
+ - you can't add or delete statements in other people's slices and they can't change yours
+ - though they should be able to transfer ownership of a set of statements to you somehow
+ - (but the person receiving should be able to decline the transfer)
+ ** TODO lensing
+ - my concern here is a way to have a single repository that can support multiple users without "leaking" content from other users
+ - like we /could/ just partition these on the disk but there are reasons not to do this:
+ 1. it's gonna be a pain in the ass for downstream applications in the best case
+ 2. there will be at least /some/ material in common across all users, so content will be unnecessarily duplicated
+ 3. LMDB uses ~mmap~ and so running multiple repositories will use lots of ram
+ - i.e., rudimentary "access control"
+ - it would behave like a second, invisible context
+ - something like ~lensed_repo = repo.lens uri~
+ - then you can use ~lensed_repo~ without worrying that it will leak
+ ** TODO access control
+ - evaluate different approaches
+ - resource-based
+ - individual resources or sets of resources?
+ - privileges:
+ - know the existence of a resource
+ - i.e. you don't see statements with this resource
+ - read statements where the resource is a subject
+ - going to have to censor ~owl:inverseOf~ etc, i.e. access control will have to be evaluated before inferences
+ - add statements with this subject
+ - remove statements with this subject
+ - statement-based
+ - just access-control entire contexts?
+ - that would probably be easiest
+ - identity-oriented vs capability-oriented
+ - would kinda love to do capability-oriented
+ * TODO layered graphs
+ - yeah this is gonna be hard lol
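
The retrieval steps in the "sandwich layer" notes (was the statement added by the change we are looking at, and has it been removed or re-added since) could be sketched roughly as follows. This is only an in-memory illustration using plain hashes rather than LMDB; the `ChangeLog` class and its `commit` and `present?` methods are hypothetical names, not anything in rdf-lmdb.

```ruby
# Hypothetical in-memory stand-in for the proposed `state` table:
# statement id => list of [change id, added?] events, in ascending change order.
class ChangeLog
  def initialize
    @state  = Hash.new { |h, k| h[k] = [] }
    @change = 0
  end

  # Record one transaction. Returns the new change id, or nil when
  # nothing was added or removed (a no-op is not recorded).
  def commit(added: [], removed: [])
    return nil if added.empty? && removed.empty?

    @change += 1
    removed.each { |s| @state[s] << [@change, false] }
    added.each   { |s| @state[s] << [@change, true] }
    @change
  end

  # Was the statement present as of the given change id?
  def present?(statement_id, at:)
    return false unless @state.key?(statement_id)

    # The most recent event at or before `at` decides the answer.
    event = @state[statement_id].reverse_each.find { |change, _| change <= at }
    event ? event.last : false
  end
end

log = ChangeLog.new
c1 = log.commit(added: [1, 2])
c2 = log.commit(removed: [2])
c3 = log.commit(added: [2])
noop = log.commit
```

Because the change ID only ever increases, the latest event at or before a given change is enough to decide visibility, which is the "bit on the end" idea from the notes.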
@@ -1,5 +1,5 @@
  module RDF
  module LMDB
- VERSION = "0.3.2"
+ VERSION = "0.3.4"
  end
  end
data/lib/rdf/lmdb.rb CHANGED
@@ -5,6 +5,7 @@ require 'rdf/ntriples'
  require 'pathname'
  require 'lmdb'
  require 'digest'
+ require 'time'
  require 'unf' # lol unf unf unf

  module RDF
@@ -165,6 +166,9 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
  # databases are opened in a transaction, who knew
  @lmdb.transaction do # |t|
  @dbs = {
+ # this is the control database, it gets no flags
+ control: [],
+ # actual instance data
  statement: [:integerkey], # key: int; val: ints
  hash2term: [], # key: sha256, val: int
  int2term: [:integerkey], # key: int, val: string
@@ -187,6 +191,9 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
  **(flags + [:create]).map { |f| [f, true] }.to_h)]
  end.to_h

+ # this will write the mtime if it isn't already there
+ mtime
+
  # t.commit
  end
  @lmdb.sync
@@ -522,7 +529,7 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
  ihash = thash.transform_values { |v| int_for v }
  cache = thash.keys.map { |k| [ihash[k], thash[k]] }.to_h

- body = -> do
+ body = -> _ = nil do
  # if the graph is nonexistent there is nothing to show
  return if thash[:graph_name] and !ihash[:graph_name]

@@ -579,8 +586,9 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
  return unless db.has? anchor

  db.each_value anchor do |spack|
- spo = @dbs[:statement][spack]
- return unless @dbs[:stmt2g].has? spack, ihash[:graph_name]
+ spo = @dbs[:statement][spack]
+ gpack = [ihash[:graph_name]].pack ?J
+ return unless @dbs[:stmt2g].has? spack, gpack
  spo = resolve_terms spo
  yield RDF::Statement(*spo, graph_name: thash[:graph_name])
  end
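
The change in this hunk packs the graph identifier before passing it to `has?`, presumably because the values stored in `stmt2g` are packed byte strings rather than Ruby integers. A small sketch of what `pack ?J` produces (`?J` is just the one-character string `"J"`, the directive for a native-endian, pointer-width unsigned integer); the identifier value here is made up for illustration.

```ruby
# Hypothetical internal term identifier, chosen only for illustration.
graph_id = 42

# Pack it the same way the patched line does.
packed = [graph_id].pack('J')

# The packed form is a binary String, so it never equals the bare Integer.
same_as_integer = (packed == graph_id)

# Unpacking with the same directive recovers the original number.
round_trip = packed.unpack1('J')
width = packed.bytesize
```

On a 64-bit platform `width` would be 8; the notes in TODO.org assume exactly that layout.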
@@ -625,14 +633,22 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
  end
  end

- #@lmdb.active_txn ? body.call : @lmdb.transaction(true, &body)
+ @lmdb.active_txn ? body.call : @lmdb.transaction(true, &body)

- ret = nil
- @lmdb.transaction do
- ret = body.call
- end
+ # ret = nil
+ # @lmdb.transaction do
+ # ret = body.call
+ # end

- ret
+ # ret
+ end
+
+ def log_mtime time = nil
+ time ||= Time.now in: ?Z
+ nsecs = time.utc.to_r
+ nsecs = (nsecs * 10**9).numerator
+ @lmdb.transaction { @dbs[:control]['mtime'] = [nsecs].pack ?q }
+ time
  end

  public
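
The re-enabled line passes `body` as a block with `&body`, which is likely why the lambda earlier in this file gained an optional parameter (`-> _ = nil do`): a lambda keeps its strict arity even when used as a block, so it raises if the caller yields an argument it does not accept. A minimal sketch in plain Ruby, where `with_handle` is a hypothetical stand-in for a method that yields one value to its block:

```ruby
# Hypothetical stand-in for a method that yields a single argument,
# the way a transaction method might yield a transaction handle.
def with_handle
  yield :handle
end

strict = -> { :ran }          # accepts no arguments
loose  = ->(_ = nil) { :ran } # accepts zero or one argument

# A zero-arity lambda used as a block rejects the yielded argument.
strict_result =
  begin
    with_handle(&strict)
  rescue ArgumentError
    :arity_error
  end

# The optional parameter lets the same lambda serve both call styles.
loose_direct = loose.call
loose_block  = with_handle(&loose)
```

With the optional parameter, one lambda works both for `body.call` inside an active transaction and as the block of a new one.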
@@ -679,11 +695,28 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
  @lmdb.close
  end

+ # Return a {::Time} object representing when the store was last written.
+ #
+ # @return [Time] said modification time
+ #
+ def mtime
+ if packed = @dbs[:control]['mtime']
+ nsecs = Rational(packed.unpack1(?q), 10 ** 9)
+ Time.at nsecs, in: ?Z
+ else
+ log_mtime
+ end
+ end
+
  # data manipulation

  def insert_statement statement
  complete! statement
- @lmdb.transaction { |t| add_one statement; t.commit }
+ @lmdb.transaction do |t|
+ add_one statement
+ log_mtime
+ t.commit # cargo cult?
+ end
  nil
  end

@@ -698,6 +731,9 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
  else
  rm_one statement
  end
+
+ log_mtime
+
  t.commit
  end
  nil
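
The `log_mtime` and `mtime` methods added in the preceding hunks store the modification time as a signed 64-bit count of nanoseconds since the epoch. The round trip can be sketched without LMDB as below; `pack_mtime` and `unpack_mtime` are hypothetical helper names, and the sketch calls `.utc` on the result rather than passing `in: ?Z` as the gem does.

```ruby
require 'time'

# Hypothetical helper: encode a Time as native-endian signed 64-bit
# nanoseconds since the Unix epoch.
def pack_mtime(time)
  nsecs = (time.to_r * 10**9).to_i
  [nsecs].pack('q')
end

# Hypothetical helper: decode the packed value back into a UTC Time.
def unpack_mtime(packed)
  Time.at(Rational(packed.unpack1('q'), 10**9)).utc
end

# A fixed instant with nanosecond precision, chosen only for illustration.
original = Time.at(1_700_000_000, 123_456_789, :nsec).utc

packed   = pack_mtime(original)
restored = unpack_mtime(packed)
```

Using a `Rational` on both sides avoids the rounding that a floating-point timestamp would introduce at nanosecond precision.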
@@ -709,6 +745,7 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
  complete! statement
  add_one statement
  end
+ log_mtime
  end

  nil
@@ -736,6 +773,9 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
  end

  clean_terms hashes.uniq
+
+ log_mtime
+
  t.commit
  end

data/rdf-lmdb.gemspec CHANGED
@@ -36,5 +36,5 @@ robust key-value store.
  # stuff we use
  spec.add_runtime_dependency 'unf', '~> 0.1'
  spec.add_runtime_dependency 'rdf', '~> 3'
- spec.add_runtime_dependency 'lmdb', '~> 0.6.1'
+ spec.add_runtime_dependency 'lmdb', '~> 0.6', '>= 0.6.2'
  end
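
The effect of the changed `lmdb` constraint can be checked with RubyGems' own requirement classes. The sketch below compares the old and new requirement against a few sample version numbers; the versions are picked only for illustration and are not a claim about which `lmdb` releases exist.

```ruby
require 'rubygems'

old_req = Gem::Requirement.new('~> 0.6.1')
new_req = Gem::Requirement.new('~> 0.6', '>= 0.6.2')

# Sample version numbers, chosen only to show where the two constraints differ.
%w[0.6.1 0.6.2 0.7.0 1.0.0].each do |v|
  version = Gem::Version.new(v)
  puts format('%-6s old: %-5s new: %s',
              v, old_req.satisfied_by?(version), new_req.satisfied_by?(version))
end
```

The old constraint allowed any 0.6.x from 0.6.1 up; the new one raises the floor to 0.6.2 and widens the ceiling to anything below 1.0.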
metadata CHANGED
@@ -1,14 +1,13 @@
  --- !ruby/object:Gem::Specification
  name: rdf-lmdb
  version: !ruby/object:Gem::Version
- version: 0.3.2
+ version: 0.3.4
  platform: ruby
  authors:
  - Dorian Taylor
- autorequire:
  bindir: exe
  cert_chain: []
- date: 2024-02-19 00:00:00.000000000 Z
+ date: 2025-05-07 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: bundler
@@ -100,14 +99,20 @@ dependencies:
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 0.6.1
+ version: '0.6'
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: 0.6.2
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 0.6.1
+ version: '0.6'
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: 0.6.2
  description: |
  This module implements RDF::Repository on top of LMDB, a fast and
  robust key-value store.
@@ -124,6 +129,7 @@ files:
  - LICENSE
  - README.md
  - Rakefile
+ - TODO.org
  - bin/console
  - bin/setup
  - lib/rdf-lmdb.rb
@@ -134,7 +140,6 @@ homepage: https://github.com/doriantaylor/rb-rdf-lmdb
  licenses:
  - Apache-2.0
  metadata: {}
- post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -149,8 +154,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.3.15
- signing_key:
+ rubygems_version: 3.6.3
  specification_version: 4
  summary: Symas LMDB back-end for RDF::Repository
  test_files: []