rdf-lmdb 0.2.1

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 44a37cfc92db953f40ab6973acdacd645d652f1b5cc0c4e2524727eadaddbe3a
4
+ data.tar.gz: 25fffce08cbe02ca7baeacd485bd1bf8124cd735344a6203b51c79bbcba7f33d
5
+ SHA512:
6
+ metadata.gz: 5efc6ee2af73f21d4e1810f176b4652b78517d396f7a13ff70680ad5042a4b43549f923845abf39e0b79193806d3ae10452e3f2aa090f577c796824d4a0ae2de
7
+ data.tar.gz: '08f9e412178177a143b513a9379606a4290f1fa89de0f74afed6dd9d6068d63813bc3ea8c032f373866f0b9d9a1b066b4366b50e39b549b0569716ed1356b666'
@@ -0,0 +1,13 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /_yardoc/
4
+ /coverage/
5
+ /doc/
6
+ /pkg/
7
+ /spec/reports/
8
+ /tmp/
9
+
10
+ # rspec failure tracking
11
+ .rspec_status
12
+ Gemfile.lock
13
+ *.gem
data/.rspec ADDED
@@ -0,0 +1,3 @@
1
+ --format documentation
2
+ --color
3
+ --require spec_helper
@@ -0,0 +1,7 @@
1
+ ---
2
+ sudo: false
3
+ language: ruby
4
+ cache: bundler
5
+ rvm:
6
+ - 2.6.3
7
+ before_install: gem install bundler -v 2.0.2
data/Gemfile ADDED
@@ -0,0 +1,7 @@
1
+ source "https://rubygems.org"
2
+
3
+ #gem 'lmdb', git: 'https://github.com/doriantaylor/lmdb.git',
4
+ # branch: 'cursor-get-both'
5
+
6
+ # Specify your gem's dependencies in rdf-lmdb.gemspec
7
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,202 @@
1
+
2
+ Apache License
3
+ Version 2.0, January 2004
4
+ http://www.apache.org/licenses/
5
+
6
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
7
+
8
+ 1. Definitions.
9
+
10
+ "License" shall mean the terms and conditions for use, reproduction,
11
+ and distribution as defined by Sections 1 through 9 of this document.
12
+
13
+ "Licensor" shall mean the copyright owner or entity authorized by
14
+ the copyright owner that is granting the License.
15
+
16
+ "Legal Entity" shall mean the union of the acting entity and all
17
+ other entities that control, are controlled by, or are under common
18
+ control with that entity. For the purposes of this definition,
19
+ "control" means (i) the power, direct or indirect, to cause the
20
+ direction or management of such entity, whether by contract or
21
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
22
+ outstanding shares, or (iii) beneficial ownership of such entity.
23
+
24
+ "You" (or "Your") shall mean an individual or Legal Entity
25
+ exercising permissions granted by this License.
26
+
27
+ "Source" form shall mean the preferred form for making modifications,
28
+ including but not limited to software source code, documentation
29
+ source, and configuration files.
30
+
31
+ "Object" form shall mean any form resulting from mechanical
32
+ transformation or translation of a Source form, including but
33
+ not limited to compiled object code, generated documentation,
34
+ and conversions to other media types.
35
+
36
+ "Work" shall mean the work of authorship, whether in Source or
37
+ Object form, made available under the License, as indicated by a
38
+ copyright notice that is included in or attached to the work
39
+ (an example is provided in the Appendix below).
40
+
41
+ "Derivative Works" shall mean any work, whether in Source or Object
42
+ form, that is based on (or derived from) the Work and for which the
43
+ editorial revisions, annotations, elaborations, or other modifications
44
+ represent, as a whole, an original work of authorship. For the purposes
45
+ of this License, Derivative Works shall not include works that remain
46
+ separable from, or merely link (or bind by name) to the interfaces of,
47
+ the Work and Derivative Works thereof.
48
+
49
+ "Contribution" shall mean any work of authorship, including
50
+ the original version of the Work and any modifications or additions
51
+ to that Work or Derivative Works thereof, that is intentionally
52
+ submitted to Licensor for inclusion in the Work by the copyright owner
53
+ or by an individual or Legal Entity authorized to submit on behalf of
54
+ the copyright owner. For the purposes of this definition, "submitted"
55
+ means any form of electronic, verbal, or written communication sent
56
+ to the Licensor or its representatives, including but not limited to
57
+ communication on electronic mailing lists, source code control systems,
58
+ and issue tracking systems that are managed by, or on behalf of, the
59
+ Licensor for the purpose of discussing and improving the Work, but
60
+ excluding communication that is conspicuously marked or otherwise
61
+ designated in writing by the copyright owner as "Not a Contribution."
62
+
63
+ "Contributor" shall mean Licensor and any individual or Legal Entity
64
+ on behalf of whom a Contribution has been received by Licensor and
65
+ subsequently incorporated within the Work.
66
+
67
+ 2. Grant of Copyright License. Subject to the terms and conditions of
68
+ this License, each Contributor hereby grants to You a perpetual,
69
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
70
+ copyright license to reproduce, prepare Derivative Works of,
71
+ publicly display, publicly perform, sublicense, and distribute the
72
+ Work and such Derivative Works in Source or Object form.
73
+
74
+ 3. Grant of Patent License. Subject to the terms and conditions of
75
+ this License, each Contributor hereby grants to You a perpetual,
76
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
77
+ (except as stated in this section) patent license to make, have made,
78
+ use, offer to sell, sell, import, and otherwise transfer the Work,
79
+ where such license applies only to those patent claims licensable
80
+ by such Contributor that are necessarily infringed by their
81
+ Contribution(s) alone or by combination of their Contribution(s)
82
+ with the Work to which such Contribution(s) was submitted. If You
83
+ institute patent litigation against any entity (including a
84
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
85
+ or a Contribution incorporated within the Work constitutes direct
86
+ or contributory patent infringement, then any patent licenses
87
+ granted to You under this License for that Work shall terminate
88
+ as of the date such litigation is filed.
89
+
90
+ 4. Redistribution. You may reproduce and distribute copies of the
91
+ Work or Derivative Works thereof in any medium, with or without
92
+ modifications, and in Source or Object form, provided that You
93
+ meet the following conditions:
94
+
95
+ (a) You must give any other recipients of the Work or
96
+ Derivative Works a copy of this License; and
97
+
98
+ (b) You must cause any modified files to carry prominent notices
99
+ stating that You changed the files; and
100
+
101
+ (c) You must retain, in the Source form of any Derivative Works
102
+ that You distribute, all copyright, patent, trademark, and
103
+ attribution notices from the Source form of the Work,
104
+ excluding those notices that do not pertain to any part of
105
+ the Derivative Works; and
106
+
107
+ (d) If the Work includes a "NOTICE" text file as part of its
108
+ distribution, then any Derivative Works that You distribute must
109
+ include a readable copy of the attribution notices contained
110
+ within such NOTICE file, excluding those notices that do not
111
+ pertain to any part of the Derivative Works, in at least one
112
+ of the following places: within a NOTICE text file distributed
113
+ as part of the Derivative Works; within the Source form or
114
+ documentation, if provided along with the Derivative Works; or,
115
+ within a display generated by the Derivative Works, if and
116
+ wherever such third-party notices normally appear. The contents
117
+ of the NOTICE file are for informational purposes only and
118
+ do not modify the License. You may add Your own attribution
119
+ notices within Derivative Works that You distribute, alongside
120
+ or as an addendum to the NOTICE text from the Work, provided
121
+ that such additional attribution notices cannot be construed
122
+ as modifying the License.
123
+
124
+ You may add Your own copyright statement to Your modifications and
125
+ may provide additional or different license terms and conditions
126
+ for use, reproduction, or distribution of Your modifications, or
127
+ for any such Derivative Works as a whole, provided Your use,
128
+ reproduction, and distribution of the Work otherwise complies with
129
+ the conditions stated in this License.
130
+
131
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
132
+ any Contribution intentionally submitted for inclusion in the Work
133
+ by You to the Licensor shall be under the terms and conditions of
134
+ this License, without any additional terms or conditions.
135
+ Notwithstanding the above, nothing herein shall supersede or modify
136
+ the terms of any separate license agreement you may have executed
137
+ with Licensor regarding such Contributions.
138
+
139
+ 6. Trademarks. This License does not grant permission to use the trade
140
+ names, trademarks, service marks, or product names of the Licensor,
141
+ except as required for reasonable and customary use in describing the
142
+ origin of the Work and reproducing the content of the NOTICE file.
143
+
144
+ 7. Disclaimer of Warranty. Unless required by applicable law or
145
+ agreed to in writing, Licensor provides the Work (and each
146
+ Contributor provides its Contributions) on an "AS IS" BASIS,
147
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
148
+ implied, including, without limitation, any warranties or conditions
149
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
150
+ PARTICULAR PURPOSE. You are solely responsible for determining the
151
+ appropriateness of using or redistributing the Work and assume any
152
+ risks associated with Your exercise of permissions under this License.
153
+
154
+ 8. Limitation of Liability. In no event and under no legal theory,
155
+ whether in tort (including negligence), contract, or otherwise,
156
+ unless required by applicable law (such as deliberate and grossly
157
+ negligent acts) or agreed to in writing, shall any Contributor be
158
+ liable to You for damages, including any direct, indirect, special,
159
+ incidental, or consequential damages of any character arising as a
160
+ result of this License or out of the use or inability to use the
161
+ Work (including but not limited to damages for loss of goodwill,
162
+ work stoppage, computer failure or malfunction, or any and all
163
+ other commercial damages or losses), even if such Contributor
164
+ has been advised of the possibility of such damages.
165
+
166
+ 9. Accepting Warranty or Additional Liability. While redistributing
167
+ the Work or Derivative Works thereof, You may choose to offer,
168
+ and charge a fee for, acceptance of support, warranty, indemnity,
169
+ or other liability obligations and/or rights consistent with this
170
+ License. However, in accepting such obligations, You may act only
171
+ on Your own behalf and on Your sole responsibility, not on behalf
172
+ of any other Contributor, and only if You agree to indemnify,
173
+ defend, and hold each Contributor harmless for any liability
174
+ incurred by, or claims asserted against, such Contributor by reason
175
+ of your accepting any such warranty or additional liability.
176
+
177
+ END OF TERMS AND CONDITIONS
178
+
179
+ APPENDIX: How to apply the Apache License to your work.
180
+
181
+ To apply the Apache License to your work, attach the following
182
+ boilerplate notice, with the fields enclosed by brackets "[]"
183
+ replaced with your own identifying information. (Don't include
184
+ the brackets!) The text should be enclosed in the appropriate
185
+ comment syntax for the file format. We also recommend that a
186
+ file or class name and description of purpose be included on the
187
+ same "printed page" as the copyright notice for easier
188
+ identification within third-party archives.
189
+
190
+ Copyright [yyyy] [name of copyright owner]
191
+
192
+ Licensed under the Apache License, Version 2.0 (the "License");
193
+ you may not use this file except in compliance with the License.
194
+ You may obtain a copy of the License at
195
+
196
+ http://www.apache.org/licenses/LICENSE-2.0
197
+
198
+ Unless required by applicable law or agreed to in writing, software
199
+ distributed under the License is distributed on an "AS IS" BASIS,
200
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
201
+ See the License for the specific language governing permissions and
202
+ limitations under the License.
@@ -0,0 +1,67 @@
1
+ # RDF::LMDB - Lightweight, persistent, transactional RDF store
2
+
3
+ This library implements `RDF::Repository` using the Symas Lightning
4
+ MDB key-value database. It is intended to be a basic, durable,
5
+ locally-attached quad store that avails itself of the properties of
6
+ LMDB.
7
+
8
+ `RDF::LMDB` is _also_ intended to provide a reference implementation
9
+ of an architecture for storing RDF in _any_ key-value database, such
10
+ that this adapter could be ported, or indeed the data _imported_, to
11
+ other back-ends (e.g. Berkeley DB, LevelDB, Kyoto Cabinet…) without
12
+ having to significantly change the design. The only real requirement
13
+ for the back-end is cursor support and the ability to store multiple
14
+ values under a single key.
15
+
16
+ ## Architecture
17
+
18
+ The system uses SHA-256 digests of the N-Triples representations of
19
+ terms and statements. Terms are normalized to Unicode NFC before being
20
+ hashed, and the digests are stored as raw 32-byte binary strings.
21
+
22
+ ### Triples
23
+
24
+ The main content of the store is keyed on the hash of a normalized
25
+ N-Triples statement (including the terminating ` .`). Its values are
26
+ the concatenated hashes of the individual terms:
27
+
28
+ sha256(s <sp> p <sp> o " .") => sha256(s) sha256(p) sha256(o)
29
+
30
+ ### GSPO
31
+
32
+ There are four indices, one each for _graph_, _subject_, _predicate_,
33
+ and _object_, that resolve a term to the statements containing it:
34
+
35
+ sha256(term) => sha256(s <sp> p <sp> o " .")
36
+
37
+ ### Node Resolution
38
+
39
+ Finally, there is an index that maps the digests of the terms back to
40
+ their normalized N-Triples representations:
41
+
42
+ sha256(term) => term
43
+
44
+ ## API Documentation
45
+
46
+ Generated and deposited
47
+ [in the usual place](http://www.rubydoc.info/gems/rdf-lmdb/).
48
+
49
+ ## Installation
50
+
51
+ Come on, you know how to do this:
52
+
53
+ $ gem install rdf-lmdb
54
+
55
+ Or, [download it off rubygems.org](https://rubygems.org/gems/rdf-lmdb).
56
+
57
+ ## Contributing
58
+
59
+ Bug reports and pull requests are welcome at
60
+ [the GitHub repository](https://github.com/doriantaylor/rb-rdf-sak).
61
+
62
+ ## Copyright & License
63
+
64
+ ©2019 [Dorian Taylor](https://doriantaylor.com/)
65
+
66
+ This software is provided under
67
+ the [Apache License, 2.0](https://www.apache.org/licenses/LICENSE-2.0).
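The hash-keyed layout the README describes can be sketched with Ruby's standard library alone. This is a hedged, in-memory illustration, not the gem's code: plain Hashes stand in for the LMDB databases, terms are passed as already-serialized N-Triples strings, the helpers `term_digest` and `statement_digest` are hypothetical, and `String#unicode_normalize(:nfc)` stands in for the `unf` gem's `to_nfc`.

```ruby
require 'digest'

# Hypothetical helper: raw 32-byte SHA-256 digest of an NFC-normalized
# N-Triples string.
def term_digest(ntriples)
  Digest::SHA256.digest(ntriples.unicode_normalize(:nfc))
end

# Hypothetical helper: statement key is the digest of "s p o ."
# (terminating " ." kept, no trailing newline).
def statement_digest(subj, pred, obj)
  term_digest("#{subj} #{pred} #{obj} .")
end

subj = '<http://example.org/alice>'
pred = '<http://xmlns.com/foaf/0.1/name>'
obj  = '"Alice"'
key  = statement_digest(subj, pred, obj)

# Triples: statement digest => concatenated term digests (3 x 32 bytes)
triples = { key => [subj, pred, obj].map { |t| term_digest(t) }.join }

# Node resolution: term digest => normalized N-Triples term
nodes = [subj, pred, obj].map { |t| [term_digest(t), t.unicode_normalize(:nfc)] }.to_h

# One of the four GSPO indices (subject); a term may map to many statements
subject_index = Hash.new { |h, k| h[k] = [] }
subject_index[term_digest(subj)] << key
```

Because every key is a fixed-width digest, the first 32 bytes of a `triples` value can be fed straight back into `nodes` to recover the subject, which is what makes the layout portable to any key-value store with multi-valued keys.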
@@ -0,0 +1,6 @@
1
+ require "bundler/gem_tasks"
2
+ require "rspec/core/rake_task"
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task :default => :spec
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "rdf/lmdb"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start(__FILE__)
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
@@ -0,0 +1 @@
1
+ require 'rdf/lmdb'
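The library code in this release assigns each term and statement a positive integer id (packed with `pack('J')`, a native-endian, pointer-width unsigned integer) and uses the SHA-256 digest only to find a term's id. The sketch below mirrors the `store_term` / `store_stmt` logic under stated assumptions: plain Hashes stand in for the LMDB databases `int2term`, `hash2term`, `statement`, `ints2stmt` and `sp2stmt`, and the "next id" is computed with `max` instead of an LMDB cursor. It is an illustration, not the gem's implementation.

```ruby
require 'digest'

# Sketch of store_term: return the integer id for an N-Triples term,
# allocating the next id (starting at 1; 0 means "no graph") if it is new.
def store_term(nt, int2term, hash2term)
  nt = nt.unicode_normalize(:nfc)
  digest = Digest::SHA256.digest(nt)
  if (packed = hash2term[digest])
    return packed.unpack1('J')
  end

  id = (int2term.keys.map { |k| k.unpack1('J') }.max || 0) + 1
  packed = [id].pack('J')
  int2term[packed] = nt        # id     => term
  hash2term[digest] = packed   # digest => id
  id
end

# Sketch of store_stmt: a statement is identified by its packed
# (s, p, o) id triple and gets an integer id of its own.
def store_stmt(ids, statement, ints2stmt)
  key = ids.pack('J3')
  if (packed = ints2stmt[key])
    return packed.unpack1('J')
  end

  id = (statement.keys.map { |k| k.unpack1('J') }.max || 0) + 1
  packed = [id].pack('J')
  statement[packed] = key      # statement id => s, p, o ids
  ints2stmt[key] = packed      # s, p, o ids  => statement id
  id
end

int2term, hash2term, statement, ints2stmt = {}, {}, {}, {}
# Multi-valued pair index: packed (s, p) ids => packed statement ids
sp2stmt = Hash.new { |h, k| h[k] = [] }

ids = ['<http://example.org/alice>', '<http://xmlns.com/foaf/0.1/name>', '"Alice"']
        .map { |t| store_term(t, int2term, hash2term) }
sid = store_stmt(ids, statement, ints2stmt)

# Test before inserting so the duplicate list never holds the same id twice
pair_key = ids.values_at(0, 1).pack('J2')
spack = [sid].pack('J')
sp2stmt[pair_key] << spack unless sp2stmt[pair_key].include?(spack)
```

Storing the same term or statement twice returns the existing id, which is why the pair and single-term indices only ever need to hold small fixed-width integers.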
@@ -0,0 +1,873 @@
1
+ require 'rdf/lmdb/version'
2
+
3
+ require 'rdf'
4
+ require 'rdf/ntriples'
5
+ require 'pathname'
6
+ require 'lmdb'
7
+ require 'digest'
8
+ require 'unf' # lol unf unf unf
9
+
10
+ module RDF
11
+
12
+ class Node
13
+
14
+ private
15
+
16
+ B64_ALPHA = ((?A..?Z).to_a + (?a..?z).to_a + (?0..?9).to_a + %w(- _)).freeze
17
+
18
+ def make_cheapo_b64_uuid_ncname
19
+ vals = (1..20).map { rand 64 } # generate the content
20
+ vals.push(rand(4) + 8) # last digit is special
21
+ 'E' + vals.map { |v| B64_ALPHA[v] }.join('') # 'E' for UUID v4
22
+ end
23
+
24
+ public
25
+
26
+ # Monkeypatch the bnode identifier generator because memory
27
+ # addresses have a tendency to be the same across runs on certain systems
28
+ def initialize(id = nil)
29
+ id = nil if id.to_s.empty?
30
+ @id = (id || make_cheapo_b64_uuid_ncname).to_s.freeze
31
+ end
32
+
33
+ end
34
+
35
+ module LMDB
36
+
37
+ # An RDF::Transaction backed by an LMDB transaction
38
+ class Transaction < ::RDF::Transaction
39
+ private
40
+
41
+ # LMDB transactions have to happen inside a block, while
42
+ # RDF::Transactable transactions can float freely.
43
+
44
+ def wrap_txn &block
45
+ begin
46
+ @repository.env.transaction !@mutable do |t|
47
+ @txn = t
48
+
49
+ case block.arity
50
+ when 1 then block.call(self)
51
+ else self.instance_eval(&block)
52
+ end
53
+
54
+ # and now we make sure we change it
55
+ execute unless @rolledback
56
+ @txn = nil
57
+ end
58
+ rescue => error
59
+ raise error
60
+ end
61
+ end
62
+
63
+ public
64
+
65
+ def initialize repository,
66
+ graph_name: nil, mutable: false, **options, &block
67
+ @repository = repository
68
+ @snapshot =
69
+ repository.supports?(:snapshots) ? repository.snapshot : repository
70
+ @options = options.dup
71
+ @mutable = mutable
72
+ @graph_name = graph_name
73
+
74
+ raise TransactionError,
75
+ 'Tried to open a mutable transaction on an immutable repository' if
76
+ @mutable && !@repository.mutable?
77
+
78
+ @changes = RDF::Changeset.new
79
+
80
+ #warn caller[0]
81
+
82
+ wrap_txn(&block) if block_given?
83
+ end
84
+
85
+ def execute
86
+ raise TransactionError,
87
+ 'Cannot execute a rolled back transaction. Open a new one instead.' if
88
+ @rolledback
89
+
90
+ ret = if @txn
91
+ @changes.apply(@repository)
92
+ else
93
+ wrap_txn { @changes.apply(@repository) }
94
+ end
95
+
96
+ @changes = RDF::Changeset.new
97
+
98
+ ret
99
+ end
100
+
101
+ def rollback
102
+ if @txn
103
+ @txn.abort
104
+ @txn = nil
105
+ end
106
+
107
+ super
108
+ end
109
+ end
110
+
111
+ #
112
+ # RDF::LMDB::Repository implements a lightweight, transactional,
113
+ # locally-attached data store using Symas LMDB.
114
+ #
115
+ class Repository < ::RDF::Repository
116
+ private
117
+
118
+ DEFAULT_TX_CLASS = RDF::LMDB::Transaction
119
+
120
+ SUPPORTS = %i[
121
+ graph_name literal_equality atomic_writes
122
+ ].map {|s| [s, s] }.to_h.freeze
123
+
124
+ # give us the binary hash of the initial sha256 state
125
+ NULL_SHA256 = [
126
+ 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
127
+ ].pack('H*').freeze
128
+
129
+ def init_lmdb dir, **options
130
+ dir = Pathname(dir).expand_path
131
+ dir.mkdir unless dir.exist?
132
+
133
+ # fire up the environment
134
+ @lmdb = ::LMDB.new dir, **options
135
+
136
+ # XXX trip over the old database layout for now
137
+ dbs = @lmdb.database.keys.map(&:to_sym)
138
+ unless dbs.empty? or dbs.include? :int2term
139
+ err = <<-ERR.tr_s("\n ", ' ')
140
+ This version uses an updated (and incompatible) database layout.
141
+ Currently you have to dump from the old layout and reload the new one. Sorry!
142
+ ERR
143
+ raise ArgumentError, err
144
+ end
145
+
146
+ # databases are opened in a transaction, who knew
147
+ @lmdb.transaction do # |t|
148
+ @dbs = {
149
+ statement: [:integerkey], # key: int; val: ints
150
+ hash2term: [], # key: sha256, val: int
151
+ int2term: [:integerkey], # key: int, val: string
152
+ ints2stmt: [], # key: 3x ints, val: int
153
+ s2stmt: [:integerkey, :dupsort, :dupfixed],
154
+ p2stmt: [:integerkey, :dupsort, :dupfixed],
155
+ o2stmt: [:integerkey, :dupsort, :dupfixed],
156
+ g2stmt: [:integerkey, :dupsort, :dupfixed],
157
+ stmt2g: [:integerkey, :dupsort, :dupfixed],
158
+ sp2stmt: [:dupsort, :dupfixed],
159
+ so2stmt: [:dupsort, :dupfixed],
160
+ po2stmt: [:dupsort, :dupfixed],
161
+ # on the fence about whether or not to include graph
162
+ # indexes; my inclination is that they would be redundant
163
+ # gs2stmt: [:dupsort, :dupfixed],
164
+ # gp2stmt: [:dupsort, :dupfixed],
165
+ # go2stmt: [:dupsort, :dupfixed],
166
+ }.map do |name, flags|
167
+ [name, @lmdb.database(name.to_s,
168
+ **(flags + [:create]).map { |f| [f, true] }.to_h)]
169
+ end.to_h
170
+
171
+ # t.commit
172
+ end
173
+ @lmdb.sync
174
+ end
175
+
176
+ SPO = %i[subject predicate object].freeze
177
+ SPO_MAP = {
178
+ subject: :s2stmt,
179
+ predicate: :p2stmt,
180
+ object: :o2stmt,
181
+ }.freeze
182
+ SPOG_MAP = SPO_MAP.merge({ graph_name: :g2stmt }).freeze
183
+ PAIR_MAP = {
184
+ [:subject, :predicate] => :sp2stmt,
185
+ [:predicate, :object] => :po2stmt,
186
+ [:subject, :object] => :so2stmt,
187
+ # [:graph_name, :subject] => :gs2stmt,
188
+ # [:graph_name, :predicate] => :gp2stmt,
189
+ # [:graph_name, :object] => :go2stmt,
190
+ }.freeze
191
+
192
+ def last_key db
193
+ db = @dbs[db] if db.is_a? Symbol
194
+ return nil if db.size == 0
195
+ # the last entry in the database should be the highest number
196
+ db.cursor { |c| c.last }.first.unpack1 ?J
197
+ end
198
+
199
+ def int_for term
200
+ case term
201
+ when nil then 0
202
+ when RDF::Statement
203
+ terms = term.to_a.map { |t| int_for t }
204
+ return if terms.include? nil # the statement implicitly not here
205
+
206
+ if raw = @dbs[:ints2stmt].get(terms.pack 'J3')
207
+ raw.unpack1 ?J
208
+ end
209
+ when Hash # of integers
210
+ if raw = @dbs[:ints2stmt].get(term.values_at(*SPO).pack 'J3')
211
+ raw.unpack1 ?J
212
+ end
213
+ when RDF::Term
214
+ thash = hash_term term
215
+ if raw = @dbs[:hash2term].get(thash)
216
+ raw.unpack1 ?J
217
+ end
218
+ when String
219
+ # assume this is the hash string
220
+ if raw = @dbs[:hash2term].get(term)
221
+ raw.unpack1 ?J
222
+ end
223
+ end
224
+ end
225
+
226
+ def store_term term
227
+ return 0 if term.nil?
228
+ raise ArgumentError, 'must be a term' unless term.is_a? RDF::Term
229
+ # get the hash first
230
+ thash = hash_term term
231
+ if ix = int_for(thash)
232
+ return ix
233
+ end
234
+
235
+ # this should start with 1, not zero
236
+ ix = (last_key(@dbs[:int2term]) || 0) + 1
237
+ ib = [ix].pack ?J
238
+ @dbs[:int2term].put ib, term.to_ntriples.to_nfc
239
+
240
+ # we need the hash too to resolve the term the other way
241
+ @dbs[:hash2term].put thash, ib
242
+
243
+ ix # return the current index
244
+ end
245
+
246
+ def store_stmt statement, ints = nil
247
+ ints ||= statement.to_h.transform_values { |v| store_term v }
248
+ ik = ints.values_at(*SPO).pack 'J3'
249
+ if ib = @dbs[:ints2stmt].get(ik)
250
+ return ib.unpack1 ?J
251
+ end
252
+
253
+ # this should start with 1, not zero
254
+ ix = (last_key(:statement) || 0) + 1
255
+ ib = [ix].pack ?J
256
+
257
+ @dbs[:statement].put ib, ik # number to triple-number
258
+ @dbs[:ints2stmt].put ik, ib # triple-number to number
259
+
260
+ ix # the index integer
261
+ end
262
+
263
+ # everything gets normalized to NFC on the way in (i
264
+ # consternated for a very long time about NFC vs NFKC)
265
+ def hash_term term
266
+ Digest::SHA256.digest term.to_ntriples.to_nfc
267
+ end
268
+
269
+ # note we leave the period but we nuke the newline
270
+ def hash_statement stmt
271
+ Digest::SHA256.digest stmt.to_ntriples.to_nfc.chomp
272
+ end
273
+
274
+ def add_one statement
275
+ # get the integer keys for the terms and statement
276
+ terms = statement.to_h
277
+ ints = terms.transform_values { |v| store_term v }
278
+ ipack = ints.transform_values { |v| [v].pack ?J }
279
+ sint = store_stmt statement, ints
280
+ spack = [sint].pack ?J
281
+
282
+ # now we map the SPO indices
283
+ SPO_MAP.each do |k, d|
284
+ db = @dbs[d]
285
+ ik = ipack[k]
286
+ # note we test before inserting or lmdb will dutifully
287
+ # create unlimited duplicate values and results will be wrong
288
+ db.put ik, spack unless db.has? ik, spack
289
+ end
290
+
291
+ # now we do the pair indices
292
+ PAIR_MAP.each do |pair, d|
293
+ db = @dbs[d]
294
+ ik = ipack.values_at(*pair).join
295
+ db.put ik, spack unless db.has? ik, spack
296
+ end
297
+
298
+ # associate the statement with its graph; note zero is the null graph
299
+ gint = ints[:graph_name] || 0
300
+ gpack = [gint].pack ?J
301
+ @dbs[:g2stmt].put gpack, spack unless @dbs[:g2stmt].has? gpack, spack
302
+ @dbs[:stmt2g].put spack, gpack unless @dbs[:stmt2g].has? spack, gpack
303
+ end
304
+
305
+ def rm_one statement, scan: true
306
+ terms = statement.to_h
307
+ ints = terms.transform_values { |v| int_for v }
308
+ # if any of the terms fail to resolve, we don't have it
309
+ return [] if ints.values_at(*SPO).include? nil
310
+ # same goes for the statement
311
+ sint = int_for(ints) or return []
312
+ spack = [sint].pack ?J
313
+
314
+
315
+ gint = ints[:graph_name] or return []
316
+ gpack = [gint].pack ?J
317
+ graphs = @dbs[:stmt2g].each_value(spack).to_a.uniq
318
+
319
+ out = []
320
+ unless graphs.empty?
321
+ # this will dissociate the statement from the graph
322
+ @dbs[:g2stmt].delete? gpack, spack
323
+ @dbs[:stmt2g].delete? spack, gpack
324
+
325
+ if graphs.size == 1 and graphs.first == gpack
326
+ # nuke the statement if this is the only instance of it
327
+ @dbs[:statement].delete? spack
328
+ @dbs[:ints2stmt].delete? ints.values_at(*SPO).pack('J3')
329
+
330
+ # now we nuke the indexes
331
+
332
+ # first the original spo
333
+ SPO_MAP.map do |k, d|
334
+ ib = [ints[k]].pack ?J
335
+ @dbs[d].delete? ib, spack
336
+ out << ints[k]
337
+ end
338
+
339
+ # add the graph if it is not null
340
+ out << terms[:graph_name] if terms[:graph_name] and gint != 0
341
+
342
+ # and now the pair map
343
+ ipack = ints.slice(*SPO).transform_values { |v| [v].pack ?J }
344
+ PAIR_MAP.map do |pair, d|
345
+ ib = ipack.values_at(*pair).join
346
+ @dbs[d].delete? ib, spack
347
+ end
348
+ end
349
+ end
350
+
351
+ # nuke any unused terms
352
+ clean_terms out if scan
353
+
354
+ out
355
+ end
356
+
357
+ def clean_terms terms
358
+ terms.map! { |t| t.is_a?(RDF::Term) ? hash_term(t) : t.to_s }.uniq
359
+ @lmdb.transaction do
360
+ terms.each do |hash|
361
+ next if hash == NULL_SHA256
362
+ next unless ib = @dbs[:hash2term].get(hash)
363
+ unless SPOG_MAP.values.any? {|d| @dbs[d].get ib }
364
+ @dbs[:int2term].delete? ib
365
+ @dbs[:hash2term].delete? hash
366
+ end
367
+ end
368
+ end
369
+ end
370
+
371
+ def complete! statement
372
+ raise ArgumentError, "Statement #{statement.inspect} is incomplete" if
373
+ statement.incomplete?
374
+ end
375
+
376
+ def resolve_term candidate, cache: {}, write: false
377
+ int = nil
378
+ term = case candidate
379
+ when nil then return
380
+ when Integer
381
+ int = candidate
382
+ return if int == 0
383
+ return cache[int] if cache[int]
384
+ str = [int].pack ?J
385
+ @dbs[:int2term][str] or return
386
+ when String
387
+ int = candidate.unpack1 ?J
388
+ str = [int].pack ?J
389
+ if candidate == str
390
+ return if int == 0
391
+ return cache[int] if cache[int]
392
+ @dbs[:int2term][str] or return
393
+ else
394
+ return if candidate == NULL_SHA256
395
+ str = @dbs[:hash2term][candidate] or return
396
+ @dbs[:int2term][str] or return
397
+ int = str.unpack1 ?J
398
+ end
399
+ else
400
+ raise ArgumentError, 'not an integer or a string'
401
+ end
402
+
403
+ term.force_encoding 'utf-8'
404
+
405
+ term = RDF::NTriples::Reader.parse_object term, intern: true
406
+ cache[int] = term if write
407
+ term
408
+ end
409
+
410
+ def split_fixed string, length
411
+ string = string.dup
412
+ seq = []
413
+ until string.empty?
414
+ seq << string.slice!(0, length)
415
+ end
416
+ seq
417
+ end
418
+
419
+ def resolve_terms string, cache: {}, write: false, hash: false
420
+ seq = []
421
+ out = string.unpack('J*').map do |i|
422
+ seq << i
423
+ j = resolve_term(i, cache: cache, write: write)
424
+ [i, j]
425
+ end.to_h
426
+
427
+ # if we aren't returning a hash, make sure the result is
428
+ # returned in order
429
+ hash ? out : out.values_at(*seq)
430
+ end
431
+
432
+ def each_maybe_with_graph has_graph = false, &block
433
+ body = -> do
434
+ cache = {}
435
+ @dbs[:statement].each do |spack, spo|
436
+ spo = resolve_terms spo, cache: cache, write: true
437
+
438
+ @dbs[:stmt2g].each_value spack do |gpack|
439
+ gint = gpack.unpack1 ?J
440
+ next if has_graph and gint == 0
441
+ graph = resolve_term gpack, cache: cache, write: true
442
+ block.call RDF::Statement(*spo, graph_name: graph)
443
+ end
444
+ end
445
+ end
446
+
447
+ @lmdb.transaction do
448
+ body.call
449
+ end
450
+
451
+ #@lmdb.active_txn ? body.call : @lmdb.transaction(true, &body)
452
+ end
453
+
454
+ def check_triple_quad arg, name: :triple, quad: false
455
+ raise ArgumentError, "#{name} must be Array-able" unless
456
+ arg.respond_to? :to_a
457
+ arg = arg.to_a
458
+ spo = arg.take 3
459
+ raise ArgumentError,
460
+ "#{name} must be at least 3 RDF::Term elements" unless
461
+ spo.length == 3 and spo.all? { |x| x.is_a? RDF::Term }
462
+ graph = nil
463
+ if quad
464
+ graph = arg[3]
465
+ raise ArgumentError, 'quad must be nil or an RDF::Term' unless
466
+ graph.nil? or graph.is_a? RDF::Term
467
+ end
468
+
469
+ RDF::Statement(*spo, graph_name: graph)
470
+ end
471
+
472
+ protected
473
+
474
+ def begin_transaction mutable: false, graph_name: nil, &block
475
+ @tx_class.new self, mutable: mutable, graph_name: graph_name, &block
476
+ end
477
+
478
+ def commit_transaction txn = nil
479
+ nil # nothing lol
480
+ end
481
+
482
+ def rollback_transaction txn = nil
483
+ nil # nothing lol
484
+ end
485
+
486
+ def query_pattern pattern, options = {}, &block
487
+ return enum_for :query_pattern, pattern, options unless block_given?
488
+
489
+ # coerce to hash
490
+ pattern = pattern.to_h
491
+
492
+ # flag if the graph is a variable
493
+ gv = pattern[:graph_name] && pattern[:graph_name].variable?
494
+
495
+ # hash of terms we get from the pattern
496
+ thash = pattern.reject { |_, v| !v or v.variable? }
497
+
498
+ # if nothing in the pattern is present then this is the same
499
+ # as #each/#each_statement
500
+ return each_maybe_with_graph(gv, &block) if thash.empty?
501
+
502
+ # hash of integer keys we retrieve for the terms
503
+ ihash = thash.transform_values { |v| int_for v }
504
+ cache = thash.keys.map { |k| [ihash[k], thash[k]] }.to_h
505
+
506
+ body = -> do
507
+ # if the graph is nonexistent there is nothing to show
508
+ return if thash[:graph_name] and !ihash[:graph_name]
509
+
510
+ if (SPO - thash.keys).empty?
511
+ # if all of SPO are defined then we can just construct a
512
+ # statement and hash it; then if G is defined on top of that
513
+ # we can just check :stmt2g
514
+ stmt = RDF::Statement.new(**thash)
515
+ sint = int_for(stmt) or return
516
+ spack = [sint].pack ?J
517
+ first = @dbs[:statement].get(spack) or return
518
+
519
+ # warn thash.inspect, ihash.inspect
520
+
521
+ # with a graph given, just check membership; otherwise yield per graph
522
+ if gint = ihash[:graph_name]
523
+ gpack = [gint].pack ?J
524
+ return unless @dbs[:stmt2g].has? spack, gpack
525
+ yield stmt
526
+ else
527
+ @dbs[:stmt2g].each_value spack do |gpack|
528
+ # return if gpack.unpack1(?J) == 0
529
+ graph = resolve_term gpack, cache: cache, write: true
530
+ yield RDF::Statement.from(stmt, graph_name: graph)
531
+ end
532
+ end
533
+ elsif thash.keys.count == 1
534
+ # if only a single component (e.g. :subject) is present then
535
+ # we only need to check (e.g.) :s2stmt.
536
+ pos = thash.keys.first
537
+ db = @dbs[SPOG_MAP[pos]]
538
+ ix = ihash[pos] or return # note ihash[pos] may be nil
539
+ anchor = [ix].pack ?J
540
+ return unless db.has? anchor
541
+
542
+ db.each_value anchor do |spack|
543
+ spo = resolve_terms @dbs[:statement][spack],
544
+ cache: cache, write: true
545
+ if pos == :graph_name
546
+ yield RDF::Statement(*spo, graph_name: thash[:graph_name])
547
+ else
548
+ @dbs[:stmt2g].each_value spack do |gpack|
549
+ gint = gpack.unpack1 ?J
550
+ graph = resolve_term gint, cache: cache, write: true
551
+ yield RDF::Statement(*spo, graph_name: graph)
552
+ end
553
+ end
554
+ end
555
+ elsif thash.keys.count == 2 and thash[:graph_name]
556
+ pos = (thash.keys - [:graph_name]).first
557
+ db = @dbs[SPO_MAP[pos]]
558
+ ix = ihash[pos] or return
559
+ anchor = [ix].pack ?J
560
+ return unless db.has? anchor
561
+
562
+ db.each_value anchor do |spack|
563
+ spo = @dbs[:statement][spack]
564
+ next unless @dbs[:stmt2g].has? spack, [ihash[:graph_name]].pack(?J)
565
+ spo = resolve_terms spo, cache: cache, write: true
566
+ yield RDF::Statement(*spo, graph_name: thash[:graph_name])
567
+ end
568
+ else
569
+ # okay we will have either two or three terms
570
+
571
+ # select the pair of term keys with the lowest non-zero
572
+ # cardinality
573
+ pair = PAIR_MAP.select do |pr, _|
574
+ # we check for keys present as well as values (e.g. a nil graph)
575
+ (pr - thash.keys).empty? and ihash.values_at(*pr).none?(&:nil?)
576
+ end.map do |pr, _|
577
+ v = ihash.values_at(*pr).pack 'J2'
578
+ c = @dbs[PAIR_MAP[pr]].cardinality(v)
579
+ [c, pr]
580
+ end.sort do |a, b|
581
+ a.first <=> b.first
582
+ end.reject { |x| x.first == 0 }.map(&:last).first or return
583
+
584
+ # grab the graph if we have it
585
+ g = resolve_term(ihash[:graph_name],
586
+ cache: cache, write: true) if ihash[:graph_name]
587
+
588
+ ib = ihash.values_at(*pair).pack 'J2'
589
+ @dbs[PAIR_MAP[pair]].each_value ib do |spack|
590
+ spo = resolve_terms @dbs[:statement][spack],
591
+ cache: cache, write: true
592
+
593
+ if ihash[:graph_name]
594
+ # warn g, ihash.inspect
595
+ gpack = [ihash[:graph_name]].pack ?J
596
+ next unless @dbs[:stmt2g].has? spack, gpack
597
+ yield RDF::Statement(*spo, graph_name: g)
598
+ else
599
+ @dbs[:stmt2g].each_value spack do |gpack|
600
+ gint = gpack.unpack1 ?J
601
+ g = resolve_term gint, cache: cache, write: true
602
+ yield RDF::Statement(*spo, graph_name: g)
603
+ end
604
+ end
605
+ end
606
+ end
607
+ end
608
+
609
+ #@lmdb.active_txn ? body.call : @lmdb.transaction(true, &body)
610
+
611
+ ret = nil
612
+ @lmdb.transaction do
613
+ ret = body.call
614
+ end
615
+
616
+ ret
617
+ end
618
+
619
+ public
620
+
621
+ def initialize dir = nil, uri: nil, title: nil, **options, &block
622
+ dir ||= options.delete(:dir) if options[:dir]
623
+
624
+ # unclear why @tx_class isn't inherited, so set it explicitly here
625
+ @tx_class ||= options.delete(:transaction_class) { DEFAULT_TX_CLASS }
626
+ raise ArgumentError, "Invalid transaction class #{@tx_class}" unless
627
+ @tx_class.is_a? Class and @tx_class <= DEFAULT_TX_CLASS
628
+
629
+ init_lmdb dir, **options
630
+ super uri: uri, title: title, **options, &block
631
+ end
632
+
633
+ # housekeeping
634
+
635
+ def supports? feature
636
+ !!SUPPORTS[feature.to_s.to_sym]
637
+ end
638
+
639
+ def isolation_level
640
+ :serializable
641
+ end
642
+
643
+ def path
644
+ Pathname(@lmdb.path)
645
+ end
646
+
647
+ def clear
648
+ @lmdb.transaction do
649
+ @dbs.each_value { |db| db.clear }
650
+ end
651
+ # we do not clear the main database; that nukes the sub-databases
652
+ # @lmdb.database.clear
653
+ end
654
+
655
+ def open dir, **options
656
+ init_lmdb dir, **options
657
+ end
658
+
659
+ def close
660
+ @lmdb.close
661
+ end
662
+
663
+ # data manipulation
664
+
665
+ def insert_statement statement
666
+ complete! statement
667
+ @lmdb.transaction { |t| add_one statement; t.commit }
668
+ nil
669
+ end
670
+
671
+ def delete_statement statement
672
+ complete! statement
673
+ @lmdb.transaction { |t| rm_one statement; t.commit }
674
+ nil
675
+ end
676
+
677
+ def insert_statements statements
678
+ @lmdb.transaction do
679
+ statements.each do |statement|
680
+ complete! statement
681
+ add_one statement
682
+ end
683
+ end
684
+
685
+ nil
686
+ end
687
+
688
+ def delete_statements statements
689
+ @lmdb.transaction do |t|
690
+ hashes = []
691
+ statements.each do |statement|
692
+ complete! statement
693
+ hashes += rm_one statement, scan: false
694
+ end
695
+
696
+ clean_terms hashes
697
+ t.commit
698
+ end
699
+
700
+ nil
701
+ end
702
+
703
+ # data retrieval
704
+
705
+ def each &block
706
+ return enum_for :each unless block_given?
707
+
708
+ each_maybe_with_graph(&block)
709
+ end
710
+
711
+ def each_subject &block
712
+ return enum_for :each_subject unless block_given?
713
+ @dbs[:s2stmt].cursor do |c|
714
+ while (k, _ = c.next true)
715
+ yield resolve_term k
716
+ end
717
+ end
718
+ end
719
+
720
+ def each_predicate &block
721
+ return enum_for :each_predicate unless block_given?
722
+ @dbs[:p2stmt].cursor do |c|
723
+ while (k, _ = c.next true)
724
+ yield resolve_term k
725
+ end
726
+ end
727
+ end
728
+
729
+ def each_object &block
730
+ return enum_for :each_object unless block_given?
731
+ @dbs[:o2stmt].cursor do |c|
732
+ while (k, _ = c.next true)
733
+ yield resolve_term k
734
+ end
735
+ end
736
+ end
737
+
738
+ def each_graph &block
739
+ return enum_for :each_graph unless block_given?
740
+ @dbs[:g2stmt].cursor do |c|
741
+ while (k, _ = c.next true)
742
+ yield RDF::Graph.new(graph_name: resolve_term(k), data: self)
743
+ end
744
+ end
745
+ end
746
+
747
+ def each_term &block
748
+ return enum_for :each_term unless block_given?
749
+ @dbs[:int2term].cursor do |c|
750
+ while (_, v = c.next)
751
+ # yield RDF::NTriples::Reader.unserialize v
752
+ v.force_encoding 'utf-8'
753
+ yield RDF::NTriples::Reader.parse_object(v, intern: true)
754
+ end
755
+ end
756
+ end
757
+
758
+ def project_graph graph_name, &block
759
+ return enum_for :project_graph, graph_name unless block_given?
760
+ body = -> do
761
+ gint = graph_name ? int_for(graph_name) : 0
762
+ return unless gint
763
+ gpack = [gint].pack ?J
764
+ cache = {}
765
+ @dbs[:statement].each do |spack, spo|
766
+ next unless @dbs[:stmt2g].has? spack, gpack
767
+ spo = resolve_terms spo, cache: cache, write: true
768
+
769
+ block.call RDF::Statement(*spo, graph_name: graph_name)
770
+ end
771
+ end
772
+
773
+ @lmdb.transaction do
774
+ body.call
775
+ end
776
+
777
+ #@lmdb.active_txn ? body.call : @lmdb.transaction(true, &body)
778
+ end
779
+
780
+ def count
781
+ @dbs[:stmt2g].size
782
+ end
783
+
784
+ def empty?
785
+ count == 0
786
+ end
787
+
788
+ # def apply_changeset changeset
789
+ # @lmdb.transaction do |t|
790
+ # delete_insert(changeset.deletes, changeset.inserts)
791
+ # end
792
+ # end
793
+
794
+ def delete_insert deletes, inserts
795
+ ret = super(deletes, inserts)
796
+ commit_transaction # this is to satiate the test suite
797
+ ret
798
+ end
799
+
800
+ def env
801
+ @lmdb
802
+ end
803
+
804
+ def transaction mutable: false, &block
805
+ return begin_transaction mutable: mutable unless block_given?
806
+
807
+ begin
808
+ begin_transaction mutable: mutable, &block
809
+ rescue => error
810
+ rollback_transaction # to sate the test suite
811
+ raise error
812
+ end
813
+ #commit_transaction # to sate the test suite
814
+ self
815
+ end
816
+
817
+ def has_statement? statement
818
+ raise ArgumentError, 'Argument must be an RDF::Statement' unless
819
+ statement.is_a? RDF::Statement
820
+ !query_pattern(statement.to_h).to_a.empty?
821
+ end
822
+
823
+ def has_graph? graph_name
824
+ raise ArgumentError, 'graph_name must be an RDF::Term' unless
825
+ graph_name.is_a? RDF::Term
826
+ int = int_for(graph_name) or return
827
+ pack = [int].pack ?J
828
+ @dbs[:g2stmt].has? pack
829
+ end
830
+
831
+ def has_subject? subject
832
+ raise ArgumentError, 'subject must be an RDF::Term' unless
833
+ subject.is_a? RDF::Term
834
+ int = int_for(subject) or return
835
+ pack = [int].pack ?J
836
+ @dbs[:s2stmt].has? pack
837
+ end
838
+
839
+ def has_predicate? predicate
840
+ raise ArgumentError, 'predicate must be an RDF::Term' unless
841
+ predicate.is_a? RDF::Term
842
+ int = int_for(predicate) or return
843
+ pack = [int].pack ?J
844
+ @dbs[:p2stmt].has? pack
845
+ end
846
+
847
+ def has_object? object
848
+ raise ArgumentError, 'object must be an RDF::Term' unless
849
+ object.is_a? RDF::Term
850
+ int = int_for(object) or return
851
+ pack = [int].pack ?J
852
+ @dbs[:o2stmt].has? pack
853
+ end
854
+
855
+ def has_term? term
856
+ raise ArgumentError, 'term must be an RDF::Term' unless
857
+ term.is_a? RDF::Term
858
+ @dbs[:hash2term].has? hash_term(term)
859
+ end
860
+
861
+ def has_triple? triple
862
+ has_statement? check_triple_quad triple
863
+ end
864
+
865
+ def has_quad? quad
866
+ has_statement? check_triple_quad quad, quad: true
867
+ end
868
+
869
+
870
+ # lol, ruby
871
+ end
872
+ end
873
+ end
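The store keys its LMDB sub-databases on packed integers. A single term or statement ID is written with `[int].pack ?J`, and a pair-index key is written with `.pack 'J2'`. In Ruby, `J` is a native-endian, pointer-width unsigned integer, so every key has a fixed byte width on a given platform.

The sketch below shows that encoding in isolation. The helper names `pack_id`, `unpack_id` and `pack_pair` are hypothetical and not part of rdf-lmdb.

```ruby
# Hypothetical helpers mirroring how the store packs integer IDs.
# 'J' = native-endian, pointer-width unsigned integer (uintptr_t).
def pack_id(int)
  [int].pack('J')
end

# Inverse of pack_id: recover the integer from its packed bytes.
def unpack_id(bytes)
  bytes.unpack1('J')
end

# Two IDs side by side, as used for a pair-index key ('J2').
def pack_pair(a, b)
  [a, b].pack('J2')
end

unpack_id(pack_id(42))             # => 42
pack_pair(1, 2).unpack('J2')       # => [1, 2]
```

Because the width is fixed, a pair key is exactly twice the size of a single-ID key. Note that native byte order also means these keys are only portable between machines of the same endianness.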
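The fallback branch of `query_pattern` works in three steps:

1. It considers every pair index whose two positions are bound.
2. It asks each one for the cardinality of the packed pair key.
3. It walks the index with the smallest non-zero count.

The pure-Ruby sketch below reproduces that selection under two assumptions. A hypothetical `PAIR_INDEXES` table stands in for the real `PAIR_MAP`, whose contents are not shown here. Plain hashes keyed by integer pairs stand in for the LMDB sub-databases and their packed `J2` keys.

```ruby
# Hypothetical stand-in for PAIR_MAP: a pair of positions => index name.
PAIR_INDEXES = {
  %i[subject predicate] => :sp2stmt,
  %i[subject object]    => :so2stmt,
  %i[predicate object]  => :po2stmt,
}.freeze

# ihash:  position => integer term ID (only bound positions present)
# counts: index name => { [id_a, id_b] => number of statements }
# Returns the bound pair with the lowest non-zero count, or nil.
def cheapest_pair(ihash, counts)
  candidates = PAIR_INDEXES.map do |pair, index|
    next unless pair.all? { |k| ihash[k] }     # both positions must be bound
    c = counts.fetch(index, {}).fetch(ihash.values_at(*pair), 0)
    [c, pair] if c > 0                         # zero: the pair never co-occurs
  end.compact
  candidates.min_by(&:first)&.last
end

counts = { sp2stmt: { [1, 2] => 5 }, so2stmt: { [1, 3] => 2 } }
cheapest_pair({ subject: 1, predicate: 2, object: 3 }, counts)
# => [:subject, :object]
```

A pair with a zero count can contribute no matches, so it is dropped. When no candidate survives, the original method returns without yielding anything.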
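`query_pattern` and `project_graph` do their work inside a lambda (`body = -> do ... end`). A bare `return`, as in `first = ... or return`, therefore leaves only the lambda, and the surrounding `@lmdb.transaction` block still finishes normally.

The same rule covers blocks nested inside that lambda:

- A `return` inside an `each_value` block ends the entire lookup.
- `next` only skips the current statement.

The self-contained demonstration below uses illustrative method names that are not part of the gem.

```ruby
# Collect even numbers; `next` skips odd ones and the loop continues.
def evens_with_next(nums)
  out = []
  body = -> do
    nums.each do |n|
      next unless n.even?
      out << n
    end
  end
  body.call
  out
end

# Same loop, but `return` inside the block exits the whole lambda
# at the first odd number, so later matches are never collected.
def evens_with_return(nums)
  out = []
  body = -> do
    nums.each do |n|
      return unless n.even?
      out << n
    end
  end
  body.call
  out
end

evens_with_next([2, 3, 4])    # => [2, 4]
evens_with_return([2, 3, 4])  # => [2]
```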