rdf-lmdb 0.2.1

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 44a37cfc92db953f40ab6973acdacd645d652f1b5cc0c4e2524727eadaddbe3a
4
+ data.tar.gz: 25fffce08cbe02ca7baeacd485bd1bf8124cd735344a6203b51c79bbcba7f33d
5
+ SHA512:
6
+ metadata.gz: 5efc6ee2af73f21d4e1810f176b4652b78517d396f7a13ff70680ad5042a4b43549f923845abf39e0b79193806d3ae10452e3f2aa090f577c796824d4a0ae2de
7
+ data.tar.gz: '08f9e412178177a143b513a9379606a4290f1fa89de0f74afed6dd9d6068d63813bc3ea8c032f373866f0b9d9a1b066b4366b50e39b549b0569716ed1356b666'
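The digests above cover the `metadata.gz` and `data.tar.gz` members of the packaged gem. A minimal sketch of checking an extracted member against such a pair of digests, using only Ruby's standard `digest` library, might look like the following. The helper name `checksums_match?` is hypothetical, and the demonstration runs on a throwaway file rather than the real gem contents.

```ruby
require 'digest'
require 'tempfile'

# Hypothetical helper: true only when the file at +path+ matches both
# expected hex digests (as listed under SHA256: and SHA512: above).
def checksums_match?(path, sha256:, sha512:)
  Digest::SHA256.file(path).hexdigest == sha256 &&
    Digest::SHA512.file(path).hexdigest == sha512
end

# Demonstration on a temporary file; the payload is an arbitrary stand-in.
Tempfile.create('member') do |f|
  f.write('example payload')
  f.flush

  ok = checksums_match?(
    f.path,
    sha256: Digest::SHA256.hexdigest('example payload'),
    sha512: Digest::SHA512.hexdigest('example payload')
  )
  puts "digests match: #{ok}"
end
```

To check a real gem, the expected values would be the hex strings from the listing and `path` would point at the extracted `metadata.gz` or `data.tar.gz`.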
@@ -0,0 +1,13 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /_yardoc/
4
+ /coverage/
5
+ /doc/
6
+ /pkg/
7
+ /spec/reports/
8
+ /tmp/
9
+
10
+ # rspec failure tracking
11
+ .rspec_status
12
+ Gemfile.lock
13
+ *.gem
data/.rspec ADDED
@@ -0,0 +1,3 @@
1
+ --format documentation
2
+ --color
3
+ --require spec_helper
@@ -0,0 +1,7 @@
1
+ ---
2
+ sudo: false
3
+ language: ruby
4
+ cache: bundler
5
+ rvm:
6
+ - 2.6.3
7
+ before_install: gem install bundler -v 2.0.2
data/Gemfile ADDED
@@ -0,0 +1,7 @@
1
+ source "https://rubygems.org"
2
+
3
+ #gem 'lmdb', git: 'https://github.com/doriantaylor/lmdb.git',
4
+ # branch: 'cursor-get-both'
5
+
6
+ # Specify your gem's dependencies in rdf-lmdb.gemspec
7
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,202 @@
1
+
2
+ Apache License
3
+ Version 2.0, January 2004
4
+ http://www.apache.org/licenses/
5
+
6
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
7
+
8
+ 1. Definitions.
9
+
10
+ "License" shall mean the terms and conditions for use, reproduction,
11
+ and distribution as defined by Sections 1 through 9 of this document.
12
+
13
+ "Licensor" shall mean the copyright owner or entity authorized by
14
+ the copyright owner that is granting the License.
15
+
16
+ "Legal Entity" shall mean the union of the acting entity and all
17
+ other entities that control, are controlled by, or are under common
18
+ control with that entity. For the purposes of this definition,
19
+ "control" means (i) the power, direct or indirect, to cause the
20
+ direction or management of such entity, whether by contract or
21
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
22
+ outstanding shares, or (iii) beneficial ownership of such entity.
23
+
24
+ "You" (or "Your") shall mean an individual or Legal Entity
25
+ exercising permissions granted by this License.
26
+
27
+ "Source" form shall mean the preferred form for making modifications,
28
+ including but not limited to software source code, documentation
29
+ source, and configuration files.
30
+
31
+ "Object" form shall mean any form resulting from mechanical
32
+ transformation or translation of a Source form, including but
33
+ not limited to compiled object code, generated documentation,
34
+ and conversions to other media types.
35
+
36
+ "Work" shall mean the work of authorship, whether in Source or
37
+ Object form, made available under the License, as indicated by a
38
+ copyright notice that is included in or attached to the work
39
+ (an example is provided in the Appendix below).
40
+
41
+ "Derivative Works" shall mean any work, whether in Source or Object
42
+ form, that is based on (or derived from) the Work and for which the
43
+ editorial revisions, annotations, elaborations, or other modifications
44
+ represent, as a whole, an original work of authorship. For the purposes
45
+ of this License, Derivative Works shall not include works that remain
46
+ separable from, or merely link (or bind by name) to the interfaces of,
47
+ the Work and Derivative Works thereof.
48
+
49
+ "Contribution" shall mean any work of authorship, including
50
+ the original version of the Work and any modifications or additions
51
+ to that Work or Derivative Works thereof, that is intentionally
52
+ submitted to Licensor for inclusion in the Work by the copyright owner
53
+ or by an individual or Legal Entity authorized to submit on behalf of
54
+ the copyright owner. For the purposes of this definition, "submitted"
55
+ means any form of electronic, verbal, or written communication sent
56
+ to the Licensor or its representatives, including but not limited to
57
+ communication on electronic mailing lists, source code control systems,
58
+ and issue tracking systems that are managed by, or on behalf of, the
59
+ Licensor for the purpose of discussing and improving the Work, but
60
+ excluding communication that is conspicuously marked or otherwise
61
+ designated in writing by the copyright owner as "Not a Contribution."
62
+
63
+ "Contributor" shall mean Licensor and any individual or Legal Entity
64
+ on behalf of whom a Contribution has been received by Licensor and
65
+ subsequently incorporated within the Work.
66
+
67
+ 2. Grant of Copyright License. Subject to the terms and conditions of
68
+ this License, each Contributor hereby grants to You a perpetual,
69
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
70
+ copyright license to reproduce, prepare Derivative Works of,
71
+ publicly display, publicly perform, sublicense, and distribute the
72
+ Work and such Derivative Works in Source or Object form.
73
+
74
+ 3. Grant of Patent License. Subject to the terms and conditions of
75
+ this License, each Contributor hereby grants to You a perpetual,
76
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
77
+ (except as stated in this section) patent license to make, have made,
78
+ use, offer to sell, sell, import, and otherwise transfer the Work,
79
+ where such license applies only to those patent claims licensable
80
+ by such Contributor that are necessarily infringed by their
81
+ Contribution(s) alone or by combination of their Contribution(s)
82
+ with the Work to which such Contribution(s) was submitted. If You
83
+ institute patent litigation against any entity (including a
84
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
85
+ or a Contribution incorporated within the Work constitutes direct
86
+ or contributory patent infringement, then any patent licenses
87
+ granted to You under this License for that Work shall terminate
88
+ as of the date such litigation is filed.
89
+
90
+ 4. Redistribution. You may reproduce and distribute copies of the
91
+ Work or Derivative Works thereof in any medium, with or without
92
+ modifications, and in Source or Object form, provided that You
93
+ meet the following conditions:
94
+
95
+ (a) You must give any other recipients of the Work or
96
+ Derivative Works a copy of this License; and
97
+
98
+ (b) You must cause any modified files to carry prominent notices
99
+ stating that You changed the files; and
100
+
101
+ (c) You must retain, in the Source form of any Derivative Works
102
+ that You distribute, all copyright, patent, trademark, and
103
+ attribution notices from the Source form of the Work,
104
+ excluding those notices that do not pertain to any part of
105
+ the Derivative Works; and
106
+
107
+ (d) If the Work includes a "NOTICE" text file as part of its
108
+ distribution, then any Derivative Works that You distribute must
109
+ include a readable copy of the attribution notices contained
110
+ within such NOTICE file, excluding those notices that do not
111
+ pertain to any part of the Derivative Works, in at least one
112
+ of the following places: within a NOTICE text file distributed
113
+ as part of the Derivative Works; within the Source form or
114
+ documentation, if provided along with the Derivative Works; or,
115
+ within a display generated by the Derivative Works, if and
116
+ wherever such third-party notices normally appear. The contents
117
+ of the NOTICE file are for informational purposes only and
118
+ do not modify the License. You may add Your own attribution
119
+ notices within Derivative Works that You distribute, alongside
120
+ or as an addendum to the NOTICE text from the Work, provided
121
+ that such additional attribution notices cannot be construed
122
+ as modifying the License.
123
+
124
+ You may add Your own copyright statement to Your modifications and
125
+ may provide additional or different license terms and conditions
126
+ for use, reproduction, or distribution of Your modifications, or
127
+ for any such Derivative Works as a whole, provided Your use,
128
+ reproduction, and distribution of the Work otherwise complies with
129
+ the conditions stated in this License.
130
+
131
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
132
+ any Contribution intentionally submitted for inclusion in the Work
133
+ by You to the Licensor shall be under the terms and conditions of
134
+ this License, without any additional terms or conditions.
135
+ Notwithstanding the above, nothing herein shall supersede or modify
136
+ the terms of any separate license agreement you may have executed
137
+ with Licensor regarding such Contributions.
138
+
139
+ 6. Trademarks. This License does not grant permission to use the trade
140
+ names, trademarks, service marks, or product names of the Licensor,
141
+ except as required for reasonable and customary use in describing the
142
+ origin of the Work and reproducing the content of the NOTICE file.
143
+
144
+ 7. Disclaimer of Warranty. Unless required by applicable law or
145
+ agreed to in writing, Licensor provides the Work (and each
146
+ Contributor provides its Contributions) on an "AS IS" BASIS,
147
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
148
+ implied, including, without limitation, any warranties or conditions
149
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
150
+ PARTICULAR PURPOSE. You are solely responsible for determining the
151
+ appropriateness of using or redistributing the Work and assume any
152
+ risks associated with Your exercise of permissions under this License.
153
+
154
+ 8. Limitation of Liability. In no event and under no legal theory,
155
+ whether in tort (including negligence), contract, or otherwise,
156
+ unless required by applicable law (such as deliberate and grossly
157
+ negligent acts) or agreed to in writing, shall any Contributor be
158
+ liable to You for damages, including any direct, indirect, special,
159
+ incidental, or consequential damages of any character arising as a
160
+ result of this License or out of the use or inability to use the
161
+ Work (including but not limited to damages for loss of goodwill,
162
+ work stoppage, computer failure or malfunction, or any and all
163
+ other commercial damages or losses), even if such Contributor
164
+ has been advised of the possibility of such damages.
165
+
166
+ 9. Accepting Warranty or Additional Liability. While redistributing
167
+ the Work or Derivative Works thereof, You may choose to offer,
168
+ and charge a fee for, acceptance of support, warranty, indemnity,
169
+ or other liability obligations and/or rights consistent with this
170
+ License. However, in accepting such obligations, You may act only
171
+ on Your own behalf and on Your sole responsibility, not on behalf
172
+ of any other Contributor, and only if You agree to indemnify,
173
+ defend, and hold each Contributor harmless for any liability
174
+ incurred by, or claims asserted against, such Contributor by reason
175
+ of your accepting any such warranty or additional liability.
176
+
177
+ END OF TERMS AND CONDITIONS
178
+
179
+ APPENDIX: How to apply the Apache License to your work.
180
+
181
+ To apply the Apache License to your work, attach the following
182
+ boilerplate notice, with the fields enclosed by brackets "[]"
183
+ replaced with your own identifying information. (Don't include
184
+ the brackets!) The text should be enclosed in the appropriate
185
+ comment syntax for the file format. We also recommend that a
186
+ file or class name and description of purpose be included on the
187
+ same "printed page" as the copyright notice for easier
188
+ identification within third-party archives.
189
+
190
+ Copyright [yyyy] [name of copyright owner]
191
+
192
+ Licensed under the Apache License, Version 2.0 (the "License");
193
+ you may not use this file except in compliance with the License.
194
+ You may obtain a copy of the License at
195
+
196
+ http://www.apache.org/licenses/LICENSE-2.0
197
+
198
+ Unless required by applicable law or agreed to in writing, software
199
+ distributed under the License is distributed on an "AS IS" BASIS,
200
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
201
+ See the License for the specific language governing permissions and
202
+ limitations under the License.
@@ -0,0 +1,67 @@
1
+ # RDF::LMDB - Lightweight, persistent, transactional RDF store
2
+
3
+ This library implements `RDF::Repository` using the Symas Lightning
4
+ MDB key-value database. It is intended to be a basic, durable,
5
+ locally-attached quad store, that avails itself of the properties of
6
+ LMDB.
7
+
8
+ `RDF::LMDB` is _also_ intended to provide a reference implementation
9
+ of an architecture for storing RDF in _any_ key-value database, such
10
+ that this adapter could be ported, or indeed the data _imported_, to
11
+ other back-ends (e.g. Berkeley DB, LevelDB, Kyoto Cabinet…) without
12
+ having to significantly change the design. The only real requirement
13
+ for the back-end is some kind of cursor functionality, and the
14
+ handling of multi-valued keys.
15
+
16
+ ## Architecture
17
+
18
+ The system uses binary SHA-256 digests of N-Triples representations of
19
+ terms and statements. Terms are normalized first before being hashed.
20
+ The hashes themselves are stored in their binary representation.
21
+
22
+ ### Triples
23
+
24
+ The main content of the store is keyed on the hash of a normalized
25
+ N-Triples statement (including the terminating ` .`). Its values are
26
+ the concatenated hashes of the individual terms:
27
+
28
+ sha256(s <sp> p <sp> o " .") => sha256(s) sha256(p) sha256(o)
29
+
30
+ ### GSPO
31
+
32
+ There are four indices, one each for _graph_, _subject_, _predicate_,
33
+ and _object_, that resolve a term to the statements it appears in:
34
+
35
+ sha256(term) => sha256(s <sp> p <sp> o " .")
36
+
37
+ ### Node Resolution
38
+
39
+ Finally, there is an index that maps the digests of the terms back to
40
+ their normalized N-Triples representations:
41
+
42
+ sha256(term) => term
43
+
44
+ ## API Documentation
45
+
46
+ Generated and deposited
47
+ [in the usual place](http://www.rubydoc.info/gems/rdf-lmdb/).
48
+
49
+ ## Installation
50
+
51
+ Come on, you know how to do this:
52
+
53
+ $ gem install rdf-lmdb
54
+
55
+ Or, [download it off rubygems.org](https://rubygems.org/gems/rdf-lmdb).
56
+
57
+ ## Contributing
58
+
59
+ Bug reports and pull requests are welcome at
60
+ [the GitHub repository](https://github.com/doriantaylor/rb-rdf-sak).
61
+
62
+ ## Copyright & License
63
+
64
+ ©2019 [Dorian Taylor](https://doriantaylor.com/)
65
+
66
+ This software is provided under
67
+ the [Apache License, 2.0](https://www.apache.org/licenses/LICENSE-2.0).
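The key layout described under "Architecture" in the README above can be sketched with nothing but Ruby's standard `digest` library. This is an illustration of the scheme as the README states it, not the gem's actual code: plain Ruby hashes stand in for the key-value databases, the helper names `term_key` and `statement_key` are hypothetical, and the terms are assumed to be already-normalized N-Triples strings.

```ruby
require 'digest'

# Hypothetical helpers; inputs are assumed to be normalized N-Triples terms.
def term_key(term)
  Digest::SHA256.digest(term) # 32-byte binary digest
end

def statement_key(subj, pred, obj)
  # the statement is hashed with its terminating " ." but no newline
  Digest::SHA256.digest("#{subj} #{pred} #{obj} .")
end

subj = '<http://example.org/alice>'
pred = '<http://xmlns.com/foaf/0.1/name>'
obj  = '"Alice"'

stmt_key = statement_key(subj, pred, obj)

# Triples: sha256(statement) => sha256(s) sha256(p) sha256(o)
triples = { stmt_key => [subj, pred, obj].map { |t| term_key(t) }.join }

# Per-position indices: sha256(term) => statement digests (multi-valued)
indices = %i[subject predicate object].zip([subj, pred, obj]).to_h do |pos, t|
  [pos, { term_key(t) => [stmt_key] }]
end

# Node resolution: sha256(term) => term
nodes = [subj, pred, obj].to_h { |t| [term_key(t), t] }

# Walk from a subject back to the statement's terms.
hits  = indices[:subject][term_key(subj)]
terms = triples[hits.first].scan(/.{32}/mn).map { |k| nodes[k] }
puts terms.join(' ')
```

Because every key and value is a fixed-width digest, a statement's value can be split into its three term digests without any delimiter, which is what the `scan` call relies on.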
@@ -0,0 +1,6 @@
1
+ require "bundler/gem_tasks"
2
+ require "rspec/core/rake_task"
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task :default => :spec
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "rdf/lmdb"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start(__FILE__)
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
@@ -0,0 +1 @@
1
+ require 'rdf/lmdb'
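The main library file that follows keys most of its LMDB databases on integers packed with Ruby's `J` directive (a native-endian unsigned integer of pointer width) rather than on raw digests. A minimal sketch of that packing, assuming nothing beyond core Ruby, is shown here. The helper names `pack_key` and `unpack_key` are hypothetical and do not appear in the gem.

```ruby
# Width in bytes of one packed integer on this platform.
WIDTH = [0].pack('J').bytesize

# Hypothetical helper: pack any number of integers into one binary key.
def pack_key(*ints)
  ints.pack('J*')
end

# Hypothetical helper: recover the integers from a packed key.
def unpack_key(raw)
  raw.unpack('J*')
end

stmt_key = pack_key(1, 2, 3) # three term ids, like a 'J3' statement key
pair_key = pack_key(1, 2)    # two term ids, like a pair-index key

puts "one integer occupies #{WIDTH} bytes"
puts unpack_key(stmt_key).inspect
puts pair_key == pack_key(1) + pack_key(2)
```

Since each integer has a fixed width, joining individually packed integers gives the same bytes as packing them together, so composite keys can be built either way.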
@@ -0,0 +1,873 @@
1
+ require 'rdf/lmdb/version'
2
+
3
+ require 'rdf'
4
+ require 'rdf/ntriples'
5
+ require 'pathname'
6
+ require 'lmdb'
7
+ require 'digest'
8
+ require 'unf' # lol unf unf unf
9
+
10
+ module RDF
11
+
12
+ class Node
13
+
14
+ private
15
+
16
+ B64_ALPHA = ((?A..?Z).to_a + (?a..?z).to_a + (?0..?9).to_a + %w(- _)).freeze
17
+
18
+ def make_cheapo_b64_uuid_ncname
19
+ vals = (1..20).map { rand 64 } # generate the content
20
+ vals.push(rand(4) + 8) # last digit is special
21
+ 'E' + vals.map { |v| B64_ALPHA[v] }.join('') # 'E' for UUID v4
22
+ end
23
+
24
+ public
25
+
26
+ # Monkeypatch the bnode identifier generator because memory
27
+ # addresses have a tendency to be the same across runs on certain systems
28
+ def initialize(id = nil)
29
+ id = nil if id.to_s.empty?
30
+ @id = (id || make_cheapo_b64_uuid_ncname).to_s.freeze
31
+ end
32
+
33
+ end
34
+
35
+ module LMDB
36
+
37
+ # ???
38
+ class Transaction < ::RDF::Transaction
39
+ private
40
+
41
+ # LMDB transactions have to happen inside a block, while
42
+ # RDF::Transactable transactions can float freely.
43
+
44
+ def wrap_txn &block
45
+ begin
46
+ @repository.env.transaction !@mutable do |t|
47
+ @txn = t
48
+
49
+ case block.arity
50
+ when 1 then block.call(self)
51
+ else self.instance_eval(&block)
52
+ end
53
+
54
+ # and now we make sure we change it
55
+ execute unless @rolledback
56
+ @txn = nil
57
+ end
58
+ rescue => error
59
+ raise error
60
+ end
61
+ end
62
+
63
+ public
64
+
65
+ def initialize repository,
66
+ graph_name: nil, mutable: false, **options, &block
67
+ @repository = repository
68
+ @snapshot =
69
+ repository.supports?(:snapshots) ? repository.snapshot : repository
70
+ @options = options.dup
71
+ @mutable = mutable
72
+ @graph_name = graph_name
73
+
74
+ raise TransactionError,
75
+ 'Tried to open a mutable transaction on an immutable repository' if
76
+ @mutable && !@repository.mutable?
77
+
78
+ @changes = RDF::Changeset.new
79
+
80
+ #warn caller[0]
81
+
82
+ wrap_txn(&block) if block_given?
83
+ end
84
+
85
+ def execute
86
+ raise TransactionError,
87
+ 'Cannot execute a rolled back transaction. Open a new one instead.' if
88
+ @rolledback
89
+
90
+ ret = if @txn
91
+ @changes.apply(@repository)
92
+ else
93
+ wrap_txn { @changes.apply(@repository) }
94
+ end
95
+
96
+ @changes = RDF::Changeset.new
97
+
98
+ ret
99
+ end
100
+
101
+ def rollback
102
+ if @txn
103
+ @txn.abort
104
+ @txn = nil
105
+ end
106
+
107
+ super
108
+ end
109
+ end
110
+
111
+ #
112
+ # RDF::LMDB::Repository implements a lightweight, transactional,
113
+ # locally-attached data store using Symas LMDB.
114
+ #
115
+ class Repository < ::RDF::Repository
116
+ private
117
+
118
+ DEFAULT_TX_CLASS = RDF::LMDB::Transaction
119
+
120
+ SUPPORTS = %i[
121
+ graph_name literal_equality atomic_writes
122
+ ].map {|s| [s, s] }.to_h.freeze
123
+
124
+ # give us the binary hash of the initial sha256 state
125
+ NULL_SHA256 = [
126
+ 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
127
+ ].pack('H*').freeze
128
+
129
+ def init_lmdb dir, **options
130
+ dir = Pathname(dir).expand_path
131
+ dir.mkdir unless dir.exist?
132
+
133
+ # fire up the environment
134
+ @lmdb = ::LMDB.new dir, **options
135
+
136
+ # XXX trip over the old database layout for now
137
+ dbs = @lmdb.database.keys.map(&:to_sym)
138
+ unless dbs.empty? or dbs.include? :int2term
139
+ err = <<-ERR.tr_s("\n ", ' ')
140
+ This version uses an updated (and incompatible) database layout.
141
+ Currently you have to dump from the old layout and reload the new one. Sorry!
142
+ ERR
143
+ raise ArgumentError, err
144
+ end
145
+
146
+ # databases are opened in a transaction, who knew
147
+ @lmdb.transaction do # |t|
148
+ @dbs = {
149
+ statement: [:integerkey], # key: int; val: ints
150
+ hash2term: [], # key: sha256, val: int
151
+ int2term: [:integerkey], # key: int, val: string
152
+ ints2stmt: [], # key: 3x ints, val: int
153
+ s2stmt: [:integerkey, :dupsort, :dupfixed],
154
+ p2stmt: [:integerkey, :dupsort, :dupfixed],
155
+ o2stmt: [:integerkey, :dupsort, :dupfixed],
156
+ g2stmt: [:integerkey, :dupsort, :dupfixed],
157
+ stmt2g: [:integerkey, :dupsort, :dupfixed],
158
+ sp2stmt: [:dupsort, :dupfixed],
159
+ so2stmt: [:dupsort, :dupfixed],
160
+ po2stmt: [:dupsort, :dupfixed],
161
+ # on the fence about whether or not to include graph
162
+ # indexes; my inclination is that they would be redundant
163
+ # gs2stmt: [:dupsort, :dupfixed],
164
+ # gp2stmt: [:dupsort, :dupfixed],
165
+ # go2stmt: [:dupsort, :dupfixed],
166
+ }.map do |name, flags|
167
+ [name, @lmdb.database(name.to_s,
168
+ **(flags + [:create]).map { |f| [f, true] }.to_h)]
169
+ end.to_h
170
+
171
+ # t.commit
172
+ end
173
+ @lmdb.sync
174
+ end
175
+
176
+ SPO = %i[subject predicate object].freeze
177
+ SPO_MAP = {
178
+ subject: :s2stmt,
179
+ predicate: :p2stmt,
180
+ object: :o2stmt,
181
+ }.freeze
182
+ SPOG_MAP = SPO_MAP.merge({ graph_name: :g2stmt }).freeze
183
+ PAIR_MAP = {
184
+ [:subject, :predicate] => :sp2stmt,
185
+ [:predicate, :object] => :po2stmt,
186
+ [:subject, :object] => :so2stmt,
187
+ # [:graph_name, :subject] => :gs2stmt,
188
+ # [:graph_name, :predicate] => :gp2stmt,
189
+ # [:graph_name, :object] => :go2stmt,
190
+ }.freeze
191
+
192
+ def last_key db
193
+ db = @dbs[db] if db.is_a? Symbol
194
+ return nil if db.size == 0
195
+ # the last entry in the database should be the highest number
196
+ db.cursor { |c| c.last }.first.unpack1 ?J
197
+ end
198
+
199
+ def int_for term
200
+ case term
201
+ when nil then 0
202
+ when RDF::Statement
203
+ terms = term.to_a.map { |t| int_for t }
204
+ return if terms.include? nil # the statement is implicitly not here
205
+
206
+ if raw = @dbs[:ints2stmt].get(terms.pack 'J3')
207
+ raw.unpack1 ?J
208
+ end
209
+ when Hash # of integers
210
+ if raw = @dbs[:ints2stmt].get(term.values_at(*SPO).pack 'J3')
211
+ raw.unpack1 ?J
212
+ end
213
+ when RDF::Term
214
+ thash = hash_term term
215
+ if raw = @dbs[:hash2term].get(thash)
216
+ raw.unpack1 ?J
217
+ end
218
+ when String
219
+ # assume this is the hash string
220
+ if raw = @dbs[:hash2term].get(term)
221
+ raw.unpack1 ?J
222
+ end
223
+ end
224
+ end
225
+
226
+ def store_term term
227
+ return 0 if term.nil?
228
+ raise ArgumentError, 'must be a term' unless term.is_a? RDF::Term
229
+ # get the hash first
230
+ thash = hash_term term
231
+ if ix = int_for(thash)
232
+ return ix
233
+ end
234
+
235
+ # this should start with 1, not zero
236
+ ix = (last_key(@dbs[:int2term]) || 0) + 1
237
+ ib = [ix].pack ?J
238
+ @dbs[:int2term].put ib, term.to_ntriples.to_nfc
239
+
240
+ # we need the hash too to resolve the term the other way
241
+ @dbs[:hash2term].put thash, ib
242
+
243
+ ix # return the current index
244
+ end
245
+
246
+ def store_stmt statement, ints = nil
247
+ ints ||= statement.to_h.transform_values { |v| store_term v }
248
+ ik = ints.values_at(*SPO).pack 'J3'
249
+ if ib = @dbs[:ints2stmt].get(ik)
250
+ return ib.unpack1 ?J
251
+ end
252
+
253
+ # this should start with 1, not zero
254
+ ix = (last_key(:statement) || 0) + 1
255
+ ib = [ix].pack ?J
256
+
257
+ @dbs[:statement].put ib, ik # number to triple-number
258
+ @dbs[:ints2stmt].put ik, ib # triple-number to number
259
+
260
+ ix # the index integer
261
+ end
262
+
263
+ # everything gets normalized to NFC on the way in (i
264
+ # consternated for a very long time about NFC vs NFKC)
265
+ def hash_term term
266
+ Digest::SHA256.digest term.to_ntriples.to_nfc
267
+ end
268
+
269
+ # note we leave the period but we nuke the newline
270
+ def hash_statement stmt
271
+ Digest::SHA256.digest stmt.to_ntriples.to_nfc.chomp
272
+ end
273
+
274
+ def add_one statement
275
+ # get the integer keys for the terms and statement
276
+ terms = statement.to_h
277
+ ints = terms.transform_values { |v| store_term v }
278
+ ipack = ints.transform_values { |v| [v].pack ?J }
279
+ sint = store_stmt statement, ints
280
+ spack = [sint].pack ?J
281
+
282
+ # now we map the SPO indices
283
+ SPO_MAP.each do |k, d|
284
+ db = @dbs[d]
285
+ ik = ipack[k]
286
+ # note we test before inserting or lmdb will dutifully
287
+ # create unlimited duplicate values and results will be wrong
288
+ db.put ik, spack unless db.has? ik, spack
289
+ end
290
+
291
+ # now we do the pair indices
292
+ PAIR_MAP.each do |pair, d|
293
+ db = @dbs[d]
294
+ ik = ipack.values_at(*pair).join
295
+ db.put ik, spack unless db.has? ik, spack
296
+ end
297
+
298
+ # associate the statement with its graph; note zero is the null graph
299
+ gint = ints[:graph_name] || 0
300
+ gpack = [gint].pack ?J
301
+ @dbs[:g2stmt].put gpack, spack unless @dbs[:g2stmt].has? gpack, spack
302
+ @dbs[:stmt2g].put spack, gpack unless @dbs[:stmt2g].has? spack, gpack
303
+ end
304
+
305
+ def rm_one statement, scan: true
306
+ terms = statement.to_h
307
+ ints = terms.transform_values { |v| int_for v }
308
+ # if any of the subject, predicate or object fails to resolve, we don't have it
309
+ return [] if ints.values_at(*SPO).include? nil
310
+ # same goes for the statement
311
+ sint = int_for(ints) or return []
312
+ spack = [sint].pack ?J
313
+
314
+
315
+ gint = ints[:graph_name] or return []
316
+ gpack = [gint].pack ?J
317
+ graphs = @dbs[:stmt2g].each_value(spack).to_a.uniq
318
+
319
+ out = []
320
+ unless graphs.empty?
321
+ # this will dissociate the statement from the graph
322
+ @dbs[:g2stmt].delete? gpack, spack
323
+ @dbs[:stmt2g].delete? spack, gpack
324
+
325
+ if graphs.size == 1 and graphs.first == gpack
326
+ # nuke the statement if this is the only instance of it
327
+ @dbs[:statement].delete? spack
328
+ @dbs[:ints2stmt].delete? ints.values_at(*SPO).pack('J3')
329
+
330
+ # now we nuke the indexes
331
+
332
+ # first the original spo
333
+ SPO_MAP.map do |k, d|
334
+ ib = [ints[k]].pack ?J
335
+ @dbs[d].delete? ib, spack
336
+ out << ints[k]
337
+ end
338
+
339
+ # add the graph if it is not null
340
+ out << terms[:graph_name] if terms[:graph_name] and gint != 0
341
+
342
+ # and now the pair map
343
+ ipack = ints.slice(*SPO).transform_values { |v| [v].pack ?J }
344
+ PAIR_MAP.map do |pair, d|
345
+ ib = ipack.values_at(*pair).join
346
+ @dbs[d].delete? ib, spack
347
+ end
348
+ end
349
+ end
350
+
351
+ # nuke any unused terms
352
+ clean_terms out if scan
353
+
354
+ out
355
+ end
356
+
357
+ def clean_terms terms
358
+ terms.map! { |t| t.is_a?(RDF::Term) ? hash_term(t) : t.to_s }.uniq
359
+ @lmdb.transaction do
360
+ terms.each do |hash|
361
+ next if hash == NULL_SHA256
362
+ next unless ib = @dbs[:hash2term].get(hash)
363
+ unless SPOG_MAP.values.any? {|d| @dbs[d].get ib }
364
+ @dbs[:int2term].delete? ib
365
+ @dbs[:hash2term].delete? hash
366
+ end
367
+ end
368
+ end
369
+ end
370
+
371
+ def complete! statement
372
+ raise ArgumentError, "Statement #{statement.inspect} is incomplete" if
373
+ statement.incomplete?
374
+ end
375
+
376
+ def resolve_term candidate, cache: {}, write: false
377
+ int = nil
378
+ term = case candidate
379
+ when nil then return
380
+ when Integer
381
+ int = candidate
382
+ return if int == 0
383
+ return cache[int] if cache[int]
384
+ str = [int].pack ?J
385
+ @dbs[:int2term][str] or return
386
+ when String
387
+ int = candidate.unpack1 ?J
388
+ str = [int].pack ?J
389
+ if candidate == str
390
+ return if int == 0
391
+ return cache[int] if cache[int]
392
+ @dbs[:int2term][str] or return
393
+ else
394
+ return if candidate == NULL_SHA256
395
+ str = @dbs[:hash2term][candidate] or return
396
+ @dbs[:int2term][str] or return
397
+ int = str.unpack1 ?J
398
+ end
399
+ else
400
+ raise ArgumentError, 'not an integer or a string'
401
+ end
402
+
403
+ term.force_encoding 'utf-8'
404
+
405
+ term = RDF::NTriples::Reader.parse_object term, intern: true
406
+ cache[int] = term if write
407
+ term
408
+ end
409
+
410
+ def split_fixed string, length
411
+ string = string.dup
412
+ seq = []
413
+ until string.empty?
414
+ seq << string.slice!(0, length)
415
+ end
416
+ seq
417
+ end
418
+
419
+ def resolve_terms string, cache: {}, write: false, hash: false
420
+ seq = []
421
+ out = string.unpack('J*').map do |i|
422
+ seq << i
423
+ j = resolve_term(i, cache: cache, write: write)
424
+ [i, j]
425
+ end.to_h
426
+
427
+ # if we aren't returning a hash, make sure the result is
428
+ # returned in order
429
+ hash ? out : out.values_at(*seq)
430
+ end
431
+
432
+ def each_maybe_with_graph has_graph = false, &block
433
+ body = -> do
434
+ cache = {}
435
+ @dbs[:statement].each do |spack, spo|
436
+ spo = resolve_terms spo, cache: cache, write: true
437
+
438
+ @dbs[:stmt2g].each_value spack do |gpack|
439
+ gint = gpack.unpack1 ?J
440
+ next if has_graph and gint == 0
441
+ graph = resolve_term gpack, cache: cache, write: true
442
+ block.call RDF::Statement(*spo, graph_name: graph)
443
+ end
444
+ end
445
+ end
446
+
447
+ @lmdb.transaction do
448
+ body.call
449
+ end
450
+
451
+ #@lmdb.active_txn ? body.call : @lmdb.transaction(true, &body)
452
+ end
453
+
454
+ def check_triple_quad arg, name: :triple, quad: false
455
+ raise ArgumentError, "#{name} must be Array-able" unless
456
+ arg.respond_to? :to_a
457
+ arg = arg.to_a
458
+ spo = arg.take 3
459
+ raise ArgumentError,
460
+ "#{name} must be at least 3 RDF::Term elements" unless
461
+ spo.length == 3 and spo.all? { |x| x.is_a? RDF::Term }
462
+ graph = nil
463
+ if quad
464
+ graph = arg[3]
465
+ raise ArgumentError, 'quad must be nil or an RDF::Term' unless
466
+ graph.nil? or graph.is_a? RDF::Term
467
+ end
468
+
469
+ RDF::Statement(*spo, graph_name: graph)
470
+ end
471
+
472
+ protected
473
+
474
+ def begin_transaction mutable: false, graph_name: nil, &block
475
+ @tx_class.new self, mutable: mutable, graph_name: graph_name, &block
476
+ end
477
+
478
+ def commit_transaction txn = nil
479
+ nil # nothing lol
480
+ end
481
+
482
+ def rollback_transaction txn = nil
483
+ nil # nothing lol
484
+ end
485
+
486
+ def query_pattern pattern, options = {}, &block
487
+ return enum_for :query_pattern, pattern, options unless block_given?
488
+
489
+ # coerce to hash
490
+ pattern = pattern.to_h
491
+
492
+ # flag if the graph is a variable
493
+ gv = pattern[:graph_name] && pattern[:graph_name].variable?
494
+
495
+ # hash of terms we get from the pattern
496
+ thash = pattern.reject { |_, v| !v or v.variable? }
497
+
498
+ # if nothing in the pattern is present then this is the same
499
+ # as #each/#each_statement
500
+ return each_maybe_with_graph(gv, &block) if thash.empty?
501
+
502
+ # hash of integer keys we retrieve for the terms
503
+ ihash = thash.transform_values { |v| int_for v }
504
+ cache = thash.keys.map { |k| [ihash[k], thash[k]] }.to_h
505
+
506
+ body = -> do
507
+ # if the graph is nonexistent there is nothing to show
508
+ return if thash[:graph_name] and !ihash[:graph_name]
509
+
510
+ if (SPO - thash.keys).empty?
511
+ # if all of SPO are defined then we can just construct a
512
+ # statement and hash it; then if G is defined on top of that
513
+ # we can just check :stmt2g
514
+ stmt = RDF::Statement.new(**thash)
515
+ sint = int_for(stmt) or return
516
+ spack = [sint].pack ?J
517
+ first = @dbs[:statement].get(spack) or return
518
+
519
+ # warn thash.inspect, ihash.inspect
520
+
521
+ # note: with a graph given we only confirm membership; otherwise we emit one statement per graph
522
+ if gint = ihash[:graph_name]
523
+ gpack = [gint].pack ?J
524
+ return unless @dbs[:stmt2g].has? spack, gpack
525
+ yield stmt
526
+ else
527
+ @dbs[:stmt2g].each_value spack do |gpack|
528
+ # return if gpack.unpack1(?J) == 0
529
+ graph = resolve_term gpack, cache: cache, write: true
530
+ yield RDF::Statement.from(stmt, graph_name: graph)
531
+ end
532
+ end
533
+ elsif thash.keys.count == 1
534
+ # if only a single component (e.g. :subject) is present then
535
+ # we only need to check (e.g.) :s2stmt.
536
+ pos = thash.keys.first
537
+ db = @dbs[SPOG_MAP[pos]]
538
+ ix = ihash[pos] or return # note ihash[pos] may be nil
539
+ anchor = [ix].pack ?J
540
+ return unless db.has? anchor
541
+
542
+ db.each_value anchor do |spack|
543
+ spo = resolve_terms @dbs[:statement][spack],
544
+ cache: cache, write: true
545
+ if pos == :graph_name
546
+ yield RDF::Statement(*spo, graph_name: thash[:graph_name])
547
+ else
548
+ @dbs[:stmt2g].each_value spack do |gpack|
549
+ gint = gpack.unpack1 ?J
550
+ graph = resolve_term gint, cache: cache, write: true
551
+ yield RDF::Statement(*spo, graph_name: graph)
552
+ end
553
+ end
554
+ end
555
+ elsif thash.keys.count == 2 and thash[:graph_name]
556
+ pos = (thash.keys - [:graph_name]).first
557
+ db = @dbs[SPO_MAP[pos]]
558
+ ix = ihash[pos] or return
559
+ anchor = [ix].pack ?J
560
+ return unless db.has? anchor
561
+
562
+ db.each_value anchor do |spack|
563
+ spo = @dbs[:statement][spack]
564
+ next unless @dbs[:stmt2g].has? spack, [ihash[:graph_name]].pack(?J)
565
+ spo = resolve_terms spo, cache: cache, write: true
566
+ yield RDF::Statement(*spo, graph_name: thash[:graph_name])
567
+ end
568
+ else
569
+ # okay we will have either two or three terms
570
+
571
+ # select the pair of term keys with the lowest non-zero
572
+ # cardinality
573
+ pair = PAIR_MAP.select do |pr, _|
574
+ # we check for keys present as well as values (eg nil graph)
575
+ (pr - thash.keys).empty? and ihash.values_at(*pr).none?(&:nil?)
576
+ end.map do |pr, _|
577
+ v = ihash.values_at(*pr).pack 'J2'
578
+ c = @dbs[PAIR_MAP[pr]].cardinality(v)
579
+ [c, pr]
580
+ end.sort do |a, b|
581
+ a.first <=> b.first
582
+ end.reject { |x| x.first == 0 }.map(&:last).first or return
583
+
584
+ # grab the graph if we have it
585
+ g = resolve_term(ihash[:graph_name],
586
+ cache: cache, write: true) if ihash[:graph_name]
587
+
588
+ ib = ihash.values_at(*pair).pack 'J2'
589
+ @dbs[PAIR_MAP[pair]].each_value ib do |spack|
590
+ spo = resolve_terms @dbs[:statement][spack],
591
+ cache: cache, write: true
592
+
593
+ if ihash[:graph_name]
594
+ # warn g, ihash.inspect
595
+ gpack = [ihash[:graph_name]].pack ?J
596
+ next unless @dbs[:stmt2g].has? spack, gpack
597
+ yield RDF::Statement(*spo, graph_name: g)
598
+ else
599
+ @dbs[:stmt2g].each_value spack do |gpack|
600
+ gint = gpack.unpack1 ?J
601
+ g = resolve_term gint, cache: cache, write: true
602
+ yield RDF::Statement(*spo, graph_name: g)
603
+ end
604
+ end
605
+ end
606
+ end
607
+ end
608
+
609
+ #@lmdb.active_txn ? body.call : @lmdb.transaction(true, &body)
610
+
611
+ ret = nil
612
+ @lmdb.transaction do
613
+ ret = body.call
614
+ end
615
+
616
+ ret
617
+ end
618
+
619
+ public
620
+
621
+ def initialize dir = nil, uri: nil, title: nil, **options, &block
622
+ dir ||= options.delete(:dir) if options[:dir]
623
+
624
+ # wtf no idea why this won't inherit
625
+ @tx_class ||= options.delete(:transaction_class) { DEFAULT_TX_CLASS }
626
+ raise ArgumentError, "Invalid transaction class #{@tx_class}" unless
627
+ @tx_class.is_a? Class and @tx_class <= DEFAULT_TX_CLASS
628
+
629
+ init_lmdb dir, **options
630
+ super uri: uri, title: title, **options, &block
631
+ end
632
+
633
+ # housekeeping
634
+
635
+ def supports? feature
636
+ !!SUPPORTS[feature.to_s.to_sym]
637
+ end
638
+
639
+ def isolation_level
640
+ :serializable
641
+ end
642
+
643
+ def path
644
+ Pathname(@lmdb.path)
645
+ end
646
+
647
+ def clear
648
+ @lmdb.transaction do
649
+ @dbs.each_value { |db| db.clear }
650
+ end
651
+ # we do not clear the main database; that nukes the sub-databases
652
+ # @lmdb.database.clear
653
+ end
654
+
655
+ def open dir, **options
656
+ init_lmdb dir, **options
657
+ end
658
+
659
+ def close
660
+ @lmdb.close
661
+ end
662
+
663
+ # data manipulation
664
+
665
+ def insert_statement statement
666
+ complete! statement
667
+ @lmdb.transaction { |t| add_one statement; t.commit }
668
+ nil
669
+ end
670
+
671
+ def delete_statement statement
672
+ complete! statement
673
+ @lmdb.transaction { |t| rm_one statement; t.commit }
674
+ nil
675
+ end
676
+
677
+ def insert_statements statements
678
+ @lmdb.transaction do
679
+ statements.each do |statement|
680
+ complete! statement
681
+ add_one statement
682
+ end
683
+ end
684
+
685
+ nil
686
+ end
687
+
688
+ def delete_statements statements
689
+ @lmdb.transaction do |t|
690
+ hashes = []
691
+ statements.each do |statement|
692
+ complete! statement
693
+ hashes += rm_one statement, scan: false
694
+ end
695
+
696
+ clean_terms hashes
697
+ t.commit
698
+ end
699
+
700
+ nil
701
+ end
702
+
703
+ # data retrieval
704
+
705
+ def each &block
706
+ return enum_for :each unless block_given?
707
+
708
+ each_maybe_with_graph(&block)
709
+ end
710
+
711
+ def each_subject &block
712
+ return enum_for :each_subject unless block_given?
713
+ @dbs[:s2stmt].cursor do |c|
714
+ while (k, _ = c.next true)
715
+ yield resolve_term k
716
+ end
717
+ end
718
+ end
719
+
720
+ def each_predicate &block
721
+ return enum_for :each_predicate unless block_given?
722
+ @dbs[:p2stmt].cursor do |c|
723
+ while (k, _ = c.next true)
724
+ yield resolve_term k
725
+ end
726
+ end
727
+ end
728
+
729
+ def each_object &block
730
+ return enum_for :each_object unless block_given?
731
+ @dbs[:o2stmt].cursor do |c|
732
+ while (k, _ = c.next true)
733
+ yield resolve_term k
734
+ end
735
+ end
736
+ end
737
+
738
+ def each_graph &block
739
+ return enum_for :each_graph unless block_given?
740
+ @dbs[:g2stmt].cursor do |c|
741
+ while (k, _ = c.next true)
742
+ yield RDF::Graph.new(graph_name: resolve_term(k), data: self)
743
+ end
744
+ end
745
+ end
746
+
747
+ def each_term &block
748
+ return enum_for :each_term unless block_given?
749
+ @dbs[:int2term].cursor do |c|
750
+ while (_, v = c.next)
751
+ # yield RDF::NTriples::Reader.unserialize v
752
+ v.force_encoding 'utf-8'
753
+ yield RDF::NTriples::Reader.parse_object(v, intern: true)
754
+ end
755
+ end
756
+ end
757
+
758
+ def project_graph graph_name, &block
759
+ return enum_for :project_graph, graph_name unless block_given?
760
+ body = -> do
761
+ gint = graph_name ? int_for(graph_name) : 0
762
+ return unless gint
763
+ gpack = [gint].pack ?J
764
+ cache = {}
765
+ @dbs[:statement].each do |spack, spo|
766
+ next unless @dbs[:stmt2g].has? spack, gpack
767
+ spo = resolve_terms spo, cache: cache, write: true
768
+
769
+ block.call RDF::Statement(*spo, graph_name: graph_name)
770
+ end
771
+ end
772
+
773
+ @lmdb.transaction do
774
+ body.call
775
+ end
776
+
777
+ #@lmdb.active_txn ? body.call : @lmdb.transaction(true, &body)
778
+ end
779
+
780
+ def count
781
+ @dbs[:stmt2g].size
782
+ end
783
+
784
+ def empty?
785
+ count == 0
786
+ end
787
+
788
+ # def apply_changeset changeset
789
+ # @lmdb.transaction do |t|
790
+ # delete_insert(changeset.deletes, changeset.inserts)
791
+ # end
792
+ # end
793
+
794
+ def delete_insert deletes, inserts
795
+ ret = super(deletes, inserts)
796
+ commit_transaction # this is to satiate the test suite
797
+ ret
798
+ end
799
+
800
+ def env
801
+ @lmdb
802
+ end
803
+
804
+ def transaction mutable: false, &block
805
+ return begin_transaction mutable: mutable unless block_given?
806
+
807
+ begin
808
+ begin_transaction mutable: mutable, &block
809
+ rescue => error
810
+ rollback_transaction # to sate the test suite
811
+ raise error
812
+ end
813
+ #commit_transaction # to sate the test suite
814
+ self
815
+ end
816
+
817
+ def has_statement? statement
818
+ raise ArgumentError, 'Argument must be an RDF::Statement' unless
819
+ statement.is_a? RDF::Statement
820
+ !query_pattern(statement.to_h).to_a.empty?
821
+ end
822
+
823
+ def has_graph? graph_name
824
+ raise ArgumentError, 'graph_name must be an RDF::Term' unless
825
+ graph_name.is_a? RDF::Term
826
+ int = int_for(graph_name) or return false
827
+ pack = [int].pack ?J
828
+ @dbs[:g2stmt].has? pack
829
+ end
830
+
831
+ def has_subject? subject
832
+ raise ArgumentError, 'subject must be an RDF::Term' unless
833
+ subject.is_a? RDF::Term
834
+ int = int_for(subject) or return false
835
+ pack = [int].pack ?J
836
+ @dbs[:s2stmt].has? pack
837
+ end
838
+
839
+ def has_predicate? predicate
840
+ raise ArgumentError, 'predicate must be an RDF::Term' unless
841
+ predicate.is_a? RDF::Term
842
+ int = int_for(predicate) or return false
843
+ pack = [int].pack ?J
844
+ @dbs[:p2stmt].has? pack
845
+ end
846
+
847
+ def has_object? object
848
+ raise ArgumentError, 'object must be an RDF::Term' unless
849
+ object.is_a? RDF::Term
850
+ int = int_for(object) or return false
851
+ pack = [int].pack ?J
852
+ @dbs[:o2stmt].has? pack
853
+ end
854
+
855
+ def has_term? term
856
+ raise ArgumentError, 'term must be an RDF::Term' unless
857
+ term.is_a? RDF::Term
858
+ @dbs[:hash2term].has? hash_term(term)
859
+ end
860
+
861
+ def has_triple? triple
862
+ has_statement? check_triple_quad triple
863
+ end
864
+
865
+ def has_quad? quad
866
+ has_statement? check_triple_quad quad, quad: true
867
+ end
868
+
869
+
870
+ # lol, ruby
871
+ end
872
+ end
873
+ end
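The pattern query above picks, among the pair indexes that apply, the one with the lowest non-zero cardinality before iterating. The sketch below illustrates that selection step in isolation, using plain Ruby hashes in place of the LMDB sub-databases. `PAIR_INDEX` and `pick_pair` are hypothetical names invented for this illustration and are not part of rdf-lmdb; only the `'J2'` packing of integer keys mirrors the code in the diff.

```ruby
# Hypothetical stand-in for the pair indexes: each pair of positions maps
# a packed two-integer key to the statement ids filed under it.
# (PAIR_INDEX and pick_pair are illustrative names, not rdf-lmdb API.)
PAIR_INDEX = {
  %i[subject predicate] => { [1, 2].pack('J2') => [10, 11, 12] },
  %i[predicate object]  => { [2, 3].pack('J2') => [11] },
  %i[subject object]    => { [1, 3].pack('J2') => [] },
}.freeze

# Return the pair of positions whose index holds the fewest (but at
# least one) entries for the given integer term ids, or nil if no
# usable pair has any entries.
def pick_pair ids
  PAIR_INDEX.keys.select do |pair|
    # only consider pairs for which both term ids are known
    ids.values_at(*pair).none?(&:nil?)
  end.map do |pair|
    key = ids.values_at(*pair).pack 'J2'
    [PAIR_INDEX[pair].fetch(key, []).size, pair]
  end.reject { |count, _| count == 0 }.min_by(&:first)&.last
end

ids = { subject: 1, predicate: 2, object: 3 }
p pick_pair(ids) # prints the pair with the smallest non-empty index
```

Scanning the smallest non-empty index keeps the number of candidate statements to resolve as low as possible. An index with zero entries is skipped rather than treated as proof of no match, which is the same choice the diff makes.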