store-digest 0.1.3

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: c6d86e00c5ff94af896658561fdc0076c17650e704dc611f4520e5580043c45a
4
+ data.tar.gz: 68524ba5e9d3e253b1d3b7d589d777d617d3d360d9d7c9ed5f8cd019ff3ea2eb
5
+ SHA512:
6
+ metadata.gz: 713edaa0ced8d81587b276f09a9740af9f0b25f39689336cf96377a3a38c5fcf081f9b3532e1b85e8051cfaed730cca1885a68a477e5d3f5a2a3664f4d4cf3d2
7
+ data.tar.gz: 7113db7fd653759cd6431e2c0c5f5ea06063daabb43ff767e93f47c942967915e54315c40d1ead17f51244e5c24f92dfb508df689977a2aad61ee527c61f87b1
@@ -0,0 +1,15 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /_yardoc/
4
+ /coverage/
5
+ /doc/
6
+ /pkg/
7
+ /spec/reports/
8
+ /tmp/
9
+
10
+ # rspec failure tracking
11
+ .rspec_status
12
+ .\#*
13
+ \#*\#
14
+ Gemfile.lock
15
+ *.gem
data/.rspec ADDED
@@ -0,0 +1,3 @@
1
+ --format documentation
2
+ --color
3
+ --require spec_helper
@@ -0,0 +1,7 @@
1
+ ---
2
+ sudo: false
3
+ language: ruby
4
+ cache: bundler
5
+ rvm:
6
+ - 2.5.5
7
+ before_install: gem install bundler -v 1.17.3
data/Gemfile ADDED
@@ -0,0 +1,6 @@
1
+ source "https://rubygems.org"
2
+
3
+ git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }
4
+
5
+ # Specify your gem's dependencies in store-digest.gemspec
6
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,202 @@
1
+
2
+ Apache License
3
+ Version 2.0, January 2004
4
+ http://www.apache.org/licenses/
5
+
6
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
7
+
8
+ 1. Definitions.
9
+
10
+ "License" shall mean the terms and conditions for use, reproduction,
11
+ and distribution as defined by Sections 1 through 9 of this document.
12
+
13
+ "Licensor" shall mean the copyright owner or entity authorized by
14
+ the copyright owner that is granting the License.
15
+
16
+ "Legal Entity" shall mean the union of the acting entity and all
17
+ other entities that control, are controlled by, or are under common
18
+ control with that entity. For the purposes of this definition,
19
+ "control" means (i) the power, direct or indirect, to cause the
20
+ direction or management of such entity, whether by contract or
21
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
22
+ outstanding shares, or (iii) beneficial ownership of such entity.
23
+
24
+ "You" (or "Your") shall mean an individual or Legal Entity
25
+ exercising permissions granted by this License.
26
+
27
+ "Source" form shall mean the preferred form for making modifications,
28
+ including but not limited to software source code, documentation
29
+ source, and configuration files.
30
+
31
+ "Object" form shall mean any form resulting from mechanical
32
+ transformation or translation of a Source form, including but
33
+ not limited to compiled object code, generated documentation,
34
+ and conversions to other media types.
35
+
36
+ "Work" shall mean the work of authorship, whether in Source or
37
+ Object form, made available under the License, as indicated by a
38
+ copyright notice that is included in or attached to the work
39
+ (an example is provided in the Appendix below).
40
+
41
+ "Derivative Works" shall mean any work, whether in Source or Object
42
+ form, that is based on (or derived from) the Work and for which the
43
+ editorial revisions, annotations, elaborations, or other modifications
44
+ represent, as a whole, an original work of authorship. For the purposes
45
+ of this License, Derivative Works shall not include works that remain
46
+ separable from, or merely link (or bind by name) to the interfaces of,
47
+ the Work and Derivative Works thereof.
48
+
49
+ "Contribution" shall mean any work of authorship, including
50
+ the original version of the Work and any modifications or additions
51
+ to that Work or Derivative Works thereof, that is intentionally
52
+ submitted to Licensor for inclusion in the Work by the copyright owner
53
+ or by an individual or Legal Entity authorized to submit on behalf of
54
+ the copyright owner. For the purposes of this definition, "submitted"
55
+ means any form of electronic, verbal, or written communication sent
56
+ to the Licensor or its representatives, including but not limited to
57
+ communication on electronic mailing lists, source code control systems,
58
+ and issue tracking systems that are managed by, or on behalf of, the
59
+ Licensor for the purpose of discussing and improving the Work, but
60
+ excluding communication that is conspicuously marked or otherwise
61
+ designated in writing by the copyright owner as "Not a Contribution."
62
+
63
+ "Contributor" shall mean Licensor and any individual or Legal Entity
64
+ on behalf of whom a Contribution has been received by Licensor and
65
+ subsequently incorporated within the Work.
66
+
67
+ 2. Grant of Copyright License. Subject to the terms and conditions of
68
+ this License, each Contributor hereby grants to You a perpetual,
69
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
70
+ copyright license to reproduce, prepare Derivative Works of,
71
+ publicly display, publicly perform, sublicense, and distribute the
72
+ Work and such Derivative Works in Source or Object form.
73
+
74
+ 3. Grant of Patent License. Subject to the terms and conditions of
75
+ this License, each Contributor hereby grants to You a perpetual,
76
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
77
+ (except as stated in this section) patent license to make, have made,
78
+ use, offer to sell, sell, import, and otherwise transfer the Work,
79
+ where such license applies only to those patent claims licensable
80
+ by such Contributor that are necessarily infringed by their
81
+ Contribution(s) alone or by combination of their Contribution(s)
82
+ with the Work to which such Contribution(s) was submitted. If You
83
+ institute patent litigation against any entity (including a
84
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
85
+ or a Contribution incorporated within the Work constitutes direct
86
+ or contributory patent infringement, then any patent licenses
87
+ granted to You under this License for that Work shall terminate
88
+ as of the date such litigation is filed.
89
+
90
+ 4. Redistribution. You may reproduce and distribute copies of the
91
+ Work or Derivative Works thereof in any medium, with or without
92
+ modifications, and in Source or Object form, provided that You
93
+ meet the following conditions:
94
+
95
+ (a) You must give any other recipients of the Work or
96
+ Derivative Works a copy of this License; and
97
+
98
+ (b) You must cause any modified files to carry prominent notices
99
+ stating that You changed the files; and
100
+
101
+ (c) You must retain, in the Source form of any Derivative Works
102
+ that You distribute, all copyright, patent, trademark, and
103
+ attribution notices from the Source form of the Work,
104
+ excluding those notices that do not pertain to any part of
105
+ the Derivative Works; and
106
+
107
+ (d) If the Work includes a "NOTICE" text file as part of its
108
+ distribution, then any Derivative Works that You distribute must
109
+ include a readable copy of the attribution notices contained
110
+ within such NOTICE file, excluding those notices that do not
111
+ pertain to any part of the Derivative Works, in at least one
112
+ of the following places: within a NOTICE text file distributed
113
+ as part of the Derivative Works; within the Source form or
114
+ documentation, if provided along with the Derivative Works; or,
115
+ within a display generated by the Derivative Works, if and
116
+ wherever such third-party notices normally appear. The contents
117
+ of the NOTICE file are for informational purposes only and
118
+ do not modify the License. You may add Your own attribution
119
+ notices within Derivative Works that You distribute, alongside
120
+ or as an addendum to the NOTICE text from the Work, provided
121
+ that such additional attribution notices cannot be construed
122
+ as modifying the License.
123
+
124
+ You may add Your own copyright statement to Your modifications and
125
+ may provide additional or different license terms and conditions
126
+ for use, reproduction, or distribution of Your modifications, or
127
+ for any such Derivative Works as a whole, provided Your use,
128
+ reproduction, and distribution of the Work otherwise complies with
129
+ the conditions stated in this License.
130
+
131
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
132
+ any Contribution intentionally submitted for inclusion in the Work
133
+ by You to the Licensor shall be under the terms and conditions of
134
+ this License, without any additional terms or conditions.
135
+ Notwithstanding the above, nothing herein shall supersede or modify
136
+ the terms of any separate license agreement you may have executed
137
+ with Licensor regarding such Contributions.
138
+
139
+ 6. Trademarks. This License does not grant permission to use the trade
140
+ names, trademarks, service marks, or product names of the Licensor,
141
+ except as required for reasonable and customary use in describing the
142
+ origin of the Work and reproducing the content of the NOTICE file.
143
+
144
+ 7. Disclaimer of Warranty. Unless required by applicable law or
145
+ agreed to in writing, Licensor provides the Work (and each
146
+ Contributor provides its Contributions) on an "AS IS" BASIS,
147
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
148
+ implied, including, without limitation, any warranties or conditions
149
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
150
+ PARTICULAR PURPOSE. You are solely responsible for determining the
151
+ appropriateness of using or redistributing the Work and assume any
152
+ risks associated with Your exercise of permissions under this License.
153
+
154
+ 8. Limitation of Liability. In no event and under no legal theory,
155
+ whether in tort (including negligence), contract, or otherwise,
156
+ unless required by applicable law (such as deliberate and grossly
157
+ negligent acts) or agreed to in writing, shall any Contributor be
158
+ liable to You for damages, including any direct, indirect, special,
159
+ incidental, or consequential damages of any character arising as a
160
+ result of this License or out of the use or inability to use the
161
+ Work (including but not limited to damages for loss of goodwill,
162
+ work stoppage, computer failure or malfunction, or any and all
163
+ other commercial damages or losses), even if such Contributor
164
+ has been advised of the possibility of such damages.
165
+
166
+ 9. Accepting Warranty or Additional Liability. While redistributing
167
+ the Work or Derivative Works thereof, You may choose to offer,
168
+ and charge a fee for, acceptance of support, warranty, indemnity,
169
+ or other liability obligations and/or rights consistent with this
170
+ License. However, in accepting such obligations, You may act only
171
+ on Your own behalf and on Your sole responsibility, not on behalf
172
+ of any other Contributor, and only if You agree to indemnify,
173
+ defend, and hold each Contributor harmless for any liability
174
+ incurred by, or claims asserted against, such Contributor by reason
175
+ of your accepting any such warranty or additional liability.
176
+
177
+ END OF TERMS AND CONDITIONS
178
+
179
+ APPENDIX: How to apply the Apache License to your work.
180
+
181
+ To apply the Apache License to your work, attach the following
182
+ boilerplate notice, with the fields enclosed by brackets "[]"
183
+ replaced with your own identifying information. (Don't include
184
+ the brackets!) The text should be enclosed in the appropriate
185
+ comment syntax for the file format. We also recommend that a
186
+ file or class name and description of purpose be included on the
187
+ same "printed page" as the copyright notice for easier
188
+ identification within third-party archives.
189
+
190
+ Copyright [yyyy] [name of copyright owner]
191
+
192
+ Licensed under the Apache License, Version 2.0 (the "License");
193
+ you may not use this file except in compliance with the License.
194
+ You may obtain a copy of the License at
195
+
196
+ http://www.apache.org/licenses/LICENSE-2.0
197
+
198
+ Unless required by applicable law or agreed to in writing, software
199
+ distributed under the License is distributed on an "AS IS" BASIS,
200
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
201
+ See the License for the specific language governing permissions and
202
+ limitations under the License.
@@ -0,0 +1,231 @@
1
+ # Store::Digest - An RFC6920-compliant content-addressable store
2
+
3
+ There are a number of content-addressable stores out there. [Git is
4
+ one](https://git-scm.com/book/en/v2/Git-Internals-Git-Objects),
5
+ [Perkeep](https://perkeep.org/) (née Camlistore) is another. Why
6
+ another one? Well:
7
+
8
+ 1. **[RFC6920](https://tools.ietf.org/html/rfc6920) URI interface:** the
9
+ primary way you talk to this thing is through `ni:///…` URIs.
10
+ 2. **Multiple digest algorithms:** every object in the store is
11
+ identified by more than one cryptographic digest, as cryptographic
12
+ digest algorithms tend to come and go.
13
+ 3. **Network-optional:** don't run another network service when an
14
+ embedded solution will suffice.
15
+ 4. **Rudimentary metadata:** Aside from internal bookeeping
16
+ information, store _only_ the handful of facts that would enable a
17
+ blob to be properly rendered in a Web browser without integrating
18
+ another metadata source, _and nothing else_.
19
+ 5. **Organizational memory:** Retain a record, not only of every
20
+ object _currently_ in the store, but also every object that has
21
+ _ever been_ in the store.
22
+
23
+ ## How to use Store::Digest
24
+
25
+ ```ruby
26
+ require 'store/digest'
27
+ require 'pathname'
28
+ require 'uri/ni'
29
+
30
+ store = Store::Digest.new driver: :LMDB, dir: '/var/lib/store-digest'
31
+
32
+ objs = Pathname('~/Desktop').expand_path.glob(?*).map do |f|
33
+ store.add f if f.file?
34
+ end
35
+ # congratulations, you have just copied all the stuff on your desktop.
36
+
37
+ # let's make an identifer
38
+
39
+ uri = URI::NI.compute 'some data'
40
+ # => #<URI::NI ni:///sha-256;EweZDmulyhRes16ZGCqb7EZTG8VN32VqYCx4D6AkDe4>
41
+
42
+ store.get uri
43
+ # => nil
44
+
45
+ # of course not because we didn't put the content in there, so let's do that:
46
+
47
+ store.add 'some data'
48
+ # => #<Store::Digest::Object:0x00007fa00d5ee3e0
49
+ # @content=
50
+ # #<Proc:0x00007fa00d5ee430:1 (lambda)>,
51
+ # @ctime=2020-01-18 14:05:56 -0800,
52
+ # @digests=
53
+ # {:md5=>#<URI::NI ni:///md5;HlAhCgICSX-3m8OLat5sNA>,
54
+ # :"sha-1"=>#<URI::NI ni:///sha-1;uvNFUf7LSKzD2oaOuF4bbayd41Y>,
55
+ # :"sha-256"=>
56
+ # #<URI::NI ni:///sha-256;EweZDmulyhRes16ZGCqb7EZTG8VN32VqYCx4D6AkDe4>,
57
+ # :"sha-384"=>
58
+ # #<URI::NI ni:///sha-384;qcYaFi9LVypj5rDitFrvRztzAn1ZBVWWakwJGFg3_3KhAZHBNuw_RhTXkU0dqCPw>,
59
+ # :"sha-512"=>
60
+ # #<URI::NI ni:///sha-512;4WRedJLwMvtixnTbdVAL57Jgv8DaqWWCHds_ikm10zeI7j8EZ0TiuVr7XD2PJQDFScqJ15_GiQiF0o4FUAdCTw>},
61
+ # @dtime=nil,
62
+ # @flags=0,
63
+ # @mtime=2020-01-18 14:05:56 -0800,
64
+ # @ptime=2020-01-18 14:05:56 -0800,
65
+ # @size=9,
66
+ # @type="text/plain">
67
+
68
+ store.get uri
69
+ # ...same thing...
70
+ ```
71
+
72
+ The main operations are, of course, `add`, `get`, `remove`, and
73
+ `forget`. I am currently working on a `search` which will match
74
+ partial digests and metadata values.
75
+
76
+ For each object, immutable bookkeeping metadata include:
77
+
78
+ * Size (bytes)
79
+ * Added to store (timestamp)
80
+ * Metadata modified (timestamp)
81
+ * Blob deleted (timestamp if present)
82
+
83
+ User-manipulable metadata consists of:
84
+
85
+ * Modification time (timestamp)
86
+ * Content-type (MIME identifier, e.g. `text/html`)
87
+ * Language (optional [RFC5646](https://tools.ietf.org/html/rfc5646)
88
+ token, e.g. `en-ca`)
89
+ * Character set (optional token, e.g. `utf-8`, `iso-8859-1`, `windows-1252`)
90
+ * Content-encoding (optional token, e.g. `gzip`, `deflate`)
91
+ * Flags (8-bit unsigned integer)
92
+
93
+ There are four flags, each with two bits of information:
94
+
95
+ * Content-type
96
+ * Character set
97
+ * Content-encoding
98
+ * Syntax
99
+
100
+ For each of these flags, the values 0 to 3 signify:
101
+
102
+ 0. Unverified
103
+ 1. Invalid
104
+ 2. Recheck validation
105
+ 3. Verified valid
106
+
107
+ These metadata fields are "user-manipulable" for an _extremely_ loose
108
+ definition of "user". The idea, in particular for the fields that have
109
+ associated flags, is that any initial value is a _claim_ that may be
110
+ subsequently verified, by, for example, a separate maintenance daemon
111
+ that scans newly-inserted objects and supplants their metadata with
112
+ whatever it finds. For example, one could insert a compressed file of
113
+ type `application/gzip`, and some other maintenance process could
114
+ come along and realize that in fact the file is `image/svg+xml` with
115
+ an _encoding_ of `gzip`, but there is also a syntax error, so it
116
+ should not be served without first attempting to repair it. Because of
117
+ the enormous combination of types, encodings, and syntaxes, such a
118
+ maintenance daemon is way out of scope for this project, despite being
119
+ a desirable and likely future addition to it.
120
+
121
+ ## API Documentation
122
+
123
+ Generated and deposited [in the usual
124
+ place](https://www.rubydoc.info/github/doriantaylor/rb-store-digest/master).
125
+
126
+ ## Design
127
+
128
+ This package is intended to be the simplest possible substrate for
129
+ managing opaque data segments in terms of their contents. The central
130
+ ambition is to solidify a consistent interface for all the desired
131
+ behaviour, and then subsequently expand that interface to different
132
+ languages and platforms, including, when possible, adapters to
133
+ existing content-addressable storage platforms.
134
+
135
+ This version is written in Ruby, and is the maturation of [an earlier
136
+ version written in Perl](https://metacpan.org/pod/Store::Digest).
137
+ Whereas the Perl implementation uses [Berkeley
138
+ DB](https://www.oracle.com/database/berkeley-db/) to store its
139
+ metadata, this version uses [LMDB](https://symas.com/lmdb/). Indeed, a
140
+ subsidiary goal of this project is to define a _pattern_ for
141
+ lower-level database storage, and key-value stores in particular, such
142
+ that complying digest store interfaces running on different
143
+ programming languages can attach to the _same_ storage repository.
144
+
145
+ The current implementation uses the file system store its blobs. It
146
+ does so by taking the primary digest algorithm (`sha-256` by default)
147
+ of the given blob, encoding it as base-32 (to be case-insensitive),
148
+ and transforming the first few bits into a hashed directory structure
149
+ (similar to Git, though it uses hexadecimal encoding). Internally, the
150
+ blob and metadata subsystems are decoupled, such that the two can be
151
+ mixed and matched.
152
+
153
+ ### Basic Architecture
154
+
155
+ `Store::Digest` proper is a unified interface over a `Driver` which
156
+ provides the concrete implementation. A driver may be further
157
+ decoupled (as is the case with `LMDB`) into `Blob` and `Meta`
158
+ subcomponents, which themselves may share `Trait`s (like storing its
159
+ state in a `RootDir`). We can imagine this bifurcation not being
160
+ universal, e.g. a prospective PostgreSQL driver could handle both
161
+ blobs _and_ metadata within its own confines.
162
+
163
+ > I have yet to decide on the final layout of this system, so don't
164
+ > get too used to it.
165
+
166
+ ### A note on storage efficiency
167
+
168
+ Unlike Git, which uses `pigz` (or rather, `deflate`) to compress its
169
+ contents (along with a small amount of embedded metadata), objects in
170
+ this system are stored as-is. This is deliberate: if you want to
171
+ compress your objects, you should compress them _before_ adding them
172
+ to the store, and signal the fact that they are compressed in the
173
+ metadata.
174
+
175
+ I have also elected to have the system designate a "primary" digest
176
+ algorithm, which defaults to `sha-256`. Its binary representation is
177
+ 32 bytes long. By all accounts, this is way too big for an internal
178
+ identifier. This is how the original Perl version worked, and likely
179
+ how I plan to keep it until I can work out the additional complexity
180
+ of a shorter identifier (like an unsigned integer) which is guaranteed
181
+ to be unique (namely, recycling discarded identifiers). For the time
182
+ being however I am not worried, as storage is large and computers are
183
+ fast.
184
+
185
+ ### Issues with hash collisions
186
+
187
+ MD5 and SHA-1 are both considered unsafe for the purposes of
188
+ cryptographic integrity, since collision attacks have been
189
+ demonstrated against both algorithms. (This is why neither is
190
+ recommended as a primary key.)
191
+
192
+ Currently, this system does not support duplicate mappings from one
193
+ digest algorithm to another. If two different blobs with the same hash
194
+ are entered into the store, the second one will probably be ignored. I
195
+ will probably change this eventually so smaller hashes can accommodate
196
+ duplicate entries, as dumping crafted data into this system would
197
+ likely be a convenient way to identify collision targets.
198
+
199
+ ### Future directions
200
+
201
+ The first order of business is to create a search function for the
202
+ various metadata elements. Then, probably a Rack app to put this whole
203
+ business online (much like the analogous one I wrote in Perl).
204
+ Following that, probably start poking around at that maintenance
205
+ daemon I mentioned in the other section, and whatever else I think is
206
+ a good idea and not too much effort.
207
+
208
+ Afterward, I will probably take the show on the road: write a version
209
+ in Python and/or JavaScript, for example. Maybe look at other
210
+ back-ends. We'll see.
211
+
212
+ ## Installation
213
+
214
+ When it is ready, you know how to do this:
215
+
216
+ $ gem install store-digest
217
+
218
+ You will know when it's far enough along when you can [download it off
219
+ rubygems.org](https://rubygems.org/gems/store-digest).
220
+
221
+ ## Contributing
222
+
223
+ Bug reports and pull requests are welcome at
224
+ [the GitHub repository](https://github.com/doriantaylor/rb-store-digest/issues).
225
+
226
+ ## Copyright & License
227
+
228
+ ©2019 [Dorian Taylor](https://doriantaylor.com/)
229
+
230
+ This software is provided under
231
+ the [Apache License, 2.0](https://www.apache.org/licenses/LICENSE-2.0).