store-digest 0.3.1 → 0.4.4
- checksums.yaml +4 -4
- data/TODO.org +115 -4
- data/lib/store/digest/blob/filesystem.rb +10 -5
- data/lib/store/digest/driver.rb +4 -0
- data/lib/store/digest/entry.rb +1214 -0
- data/lib/store/digest/error.rb +28 -0
- data/lib/store/digest/meta/lmdb/v0.rb +388 -0
- data/lib/store/digest/meta/lmdb/v1.rb +737 -0
- data/lib/store/digest/meta/lmdb.rb +59 -1041
- data/lib/store/digest/meta.rb +1 -1
- data/lib/store/digest/readwrapper.rb +174 -0
- data/lib/store/digest/version.rb +1 -1
- data/lib/store/digest.rb +335 -117
- data/store-digest.gemspec +6 -7
- metadata +45 -17
- data/lib/store/digest/object.rb +0 -623
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz:
- data.tar.gz:
+ metadata.gz: '09eefcbc9e86e0b771a61edd3a16f135f559c228f9ea0ebd453bd198f3d70ae3'
+ data.tar.gz: 9bd41bdcd3709b07081daeaf49e2f7f8e7f5513f3c2469b3636d5ffcfc9f9948
  SHA512:
- metadata.gz:
- data.tar.gz:
+ metadata.gz: d4f6199427973e30fda2380738bb8d7697c8e12f6c1f641e78f901c9827ab8a87742a62cd07fa422d89617e504181a41c9ba227427d64955b4416fdd0ebec6d6
+ data.tar.gz: b67f8805a959d787914b5adbc0854d969e4d0e5d5a2a5bb541c9aa259728f789f0a01e0e50bf7a4014bfbc9466d526f33ff24e4f809fbd0a8f644ae5cd4004fe
data/TODO.org
CHANGED
@@ -1,11 +1,122 @@
  #+STARTUP: showall indent hidestars
+ * TODO 2026-06-07 remaining tasks for scanning overhaul
+ - [ ] ~Store::Digest~
+ - [ ] create ~#has?~ predicate
+ - [ ] test it doyy
+ - [ ] overhaul ~#add~
+ - [ ] make it pass through to ~Entry~ constructor
+ - [ ] create ~#add_raw~
+ - [ ] canonicalize media type
+ - [ ] update TTL on cache entries on match
+ - [ ] return a hash
+ - [ ] overhaul ~#remove~
+ - [ ] add ~tombstone:~ parameter to return already-deleted entities
+ - [ ] create ~#remove_raw~
+ - [ ] do we /really/ need to create a ~#remove_raw~?
+ - [ ] can we add a ~remove:~ flag to ~#get_raw~?
+ - [ ] overhaul ~#get~
+ - [ ] add ~tombstone:~ parameter to return already-deleted entities
+ - [ ] make it pass through to ~Entry~ constructor
+ - [ ] create ~#get_raw~
+ - [ ] silently remove expired cache items
+ - could probably chuck some kinda ~#purge~ operation in a thread; run it every few seconds
+ - [ ] return the same hash as ~#add_raw~
+ - [ ] ~Store::Digest::Entry~
+ - [ ] finish modifications to ~#initialize~
+ - [ ] prune out parameters that you shouldn't be able to mess with
+ - [ ] add ~#store.get_raw~ scenario to constructor
+ - [ ] if ~content~ is ~nil~ and there is a ~store:~ supplied
+ - [ ] (~content~ /must not/ be ~nil~ /except/ if there is a ~store:~)
+ - [ ] maybe implement as a block?
+ - [ ] implement ~#add~
+ - [ ] may be a noop if already associated with the store
+ - [ ] should produce an error if entry is a tombstone/has no content
+ - [ ] otherwise the entry will have to be rescanned
+ - [ ] implement ~#remove~
+ - [ ] implement ~#scan!~
+ - [ ] use ~store.add_raw~ if associated with a store
+ - (this returns a file handle or precursor thereto)
+ - [ ] otherwise use ~self.class.scan_raw~
+ - (this needs its own filehandle, or we can reuse the input if rewindable)
+ * TODO 2026-05-26 beef up scanning
+ - *GOAL*: We want to be able to drop one of these objects into a ~Rack~ message body — either request or response — and it should JFW.
+ - We should be able to take what's already in there and initialize the object with that.
+ - The result must be rewindable.
+ - [-] ~Store::Digest::Object~ changes
+ - [X] rename to ~Store::Digest::Entry~
+ - [X] add a ~Store::Digest::Object~ constant for the time being
+ - [X] absorb ~Store::Digest::Object::IOWrapper~
+ - [X] expose a minimal ~#each~ ~#read~ ~#gets~ ~#rewind~ ~#close~ interface
+ - [X] ~#content~ becomes a no-op
+ - [-] overhaul ~#initialize~
+ - [X] [[https://github.com/rack/rack/blob/main/SPEC.rdoc#the-body][ingest all kinds of objects]] (via ~Store::Digest::ReadWrapper~)
+ - [X] strings
+ - [X] arrays of strings / enumerators that yield strings / ~#each~
+ - [X] [[https://github.com/rack/rack/blob/main/SPEC.rdoc#the-input-stream][things that quack like]] ~#gets~ ~#read~ ~#close~
+ - [X] things that respond to ~#call~
+ - [X] that takes no arguments and returns one of the above
+ - [X] that takes a write handle as its first (only?) argument
+ - [ ] provide a ~store:~ named parameter for associating with a ~Store::Digest~ instance
+ - [ ] make it so ~digests:~ can take an ~Array~ of ~Symbol~ objects that represent the desired algorithms when the content is eventually scanned
+ - [ ] add a ~cache:~ named parameter which can be falsy, truthy, an integer representing a TTL, or a ~Time~ object indicating the desired expiry date
+ - [ ] add scan ~scan:~ named parameter (effectively the same as invoking ~.new(…).scan~)
+ - [ ] do not scan input until an attempt to access its contents
+ - [ ] first call to ~#each~, ~#gets~, ~#read~, or ~#content~
+ - [ ] first call to the ~#digests~ hash or any of the dependent metadata, such as ~#size~
+ - [ ] may not scan at all if passed in an object which is already scanned
+ - [ ] overhaul ~#scan~ instance method
+ - [ ] make it idempotent
+ - [ ] create a ~#scan!~ method that forces the issue
+ - [ ] deprecate the arguments/parameters and have it just operate over itself
+ - [ ] scanning the input adds it to the store if a backreference is present and replaces the input with the settled blob in the store
+ - [ ] swap in a different read handle if there is no backreference to the store
+ - [ ] the passed-in handle itself if it responds to ~#binmode~ and ~#rewind~
+ - [ ] ~StringIO~ for small blobs (1k? 10k? 64k?)
+ - how would I do that? start writing to a ~StringIO~ and if it ticks over the limit, dump that into a temp file and switch?
+ - (probably)
+ - (the benefit to doing that, aside from being faster, would be to mitigate the number of concurrent open file handles)
+ - (at the cost of bigger stuff being one blit slower)
+ - (honestly it'd probably get buffered in ram anyway)
+ - [ ] otherwise an ordinary temporary file
+ - [ ] Overhaul ~.scan~ class method to align with new constructor
+ - [ ] ~store:~ named parameter
+ - [ ] ~cache:~ named parameter
+ - [ ] initialize with ~scan: true~, obviously
+ - [X] create a ~.scan_raw~ class method so ~Store::Digest~ can use it independently
+ - [X] make it return a minimal set: digest hashes, byte count, and optional (scanned) content type
+ - [ ] if hashes are already supplied in the constructor then it is assumed to be scanned already
+ - [ ] a call to ~#scan~ is a no-op (because ~#scan~ should be idempotent)
+ - [ ] a call to ~#scan!~ is assumed to mean you want to verify the content and should blow up if *any* of the supplied hashes don't match
+ - [ ] implement ~IO~ and/or ~Enumerable~ emulation methods
+ - [ ] Any call to any of these should start a scan that ultimately replaces the underlying content with a rewindable file handle
+ - [ ] (unless the original object implements ~#rewind~ and there is no association with a store)
+ - (although note the caveat that the original object can be written to in situ, like it's more efficient to reuse it but less secure)
+ - [ ] maybe make an option in the constructor to not do this?
+ - [ ] ~#each~
+ - [ ] ~#gets~
+ - [ ] ~#read~
+ - [ ] ~#rewind~
+ - [ ] ~#open~ (no-op)
+ - [ ] ~#close~ (no-op)
+ - [ ] make ~#content~ a no-op
+ - [ ] going to have to overhaul ~#to_h~
+ - [ ] make a version that doesn't scan?
+ - [ ] add ~#stored?~
+ - [-] ~Store::Digest~ changes
+ - [ ] add something like a ~#add_raw~ method that will take the entry's readable object and return a ~Hash~ of metadata plus a lambda that returns a rewindable file handle
+ - [ ] overhaul ~#add~
+ - [ ] overhaul ~#get~
+ - [X] nuke ~#lazy_add~; it is no longer necessary
+ - [X] ~MimeMagic~ monkeypatch
+ - [X] lobby maintainers to merge my patches (again; not a peep in years)
+ - [X] failing that, ship a fork
  * TODO upgrade metadata format
- - [
- - [
+ - [X] get rid of the idea of a 'primary' digest algorithm
+ - [X] change the main record to being keyed by integers and contain all hashes
  - old: ≥209 bytes + 316 mapping
  - new: ≥218 + 220 mapping
- - [
- - [
+ - [X] make digest -> integer mapping
+ - [X] add version to metadata
  - [ ] add function to upgrade metadata store
  * TODO add cache capability
  - [ ] add fields to control:
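The TODO notes above float the idea of writing small blobs to a ~StringIO~ and switching to a temp file once a size limit is crossed. A minimal sketch of that idea follows, assuming a 64k default limit. The ~SpillBuffer~ class and its methods are hypothetical names for illustration and are not part of store-digest.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper (not part of store-digest): buffers writes in memory
# and spills to a Tempfile once the configured limit would be exceeded.
class SpillBuffer
  def initialize(limit: 64 * 1024)
    @limit = limit
    # Start with a binary in-memory buffer.
    @io = StringIO.new(String.new(encoding: Encoding::BINARY))
  end

  # Append a chunk, spilling to disk first if it would cross the limit.
  def write(chunk)
    spill! if !spilled? && @io.size + chunk.bytesize > @limit
    @io.write(chunk)
  end

  # True once the content lives in a temp file rather than in memory.
  def spilled?
    !@io.is_a?(StringIO)
  end

  # Rewind the underlying handle; returns self so calls can be chained.
  def rewind
    @io.rewind
    self
  end

  def read(*args)
    @io.read(*args)
  end

  private

  # Copy what has been buffered so far into a binary temp file and
  # continue writing there.
  def spill!
    tmp = Tempfile.new('blob', encoding: Encoding::BINARY)
    tmp.binmode
    tmp.write(@io.string)
    @io = tmp
  end
end
```

Small inputs never touch the file system in this sketch, which is the open-file-handle saving the notes mention; larger inputs pay for one extra copy at the moment of the switch.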
data/lib/store/digest/blob/filesystem.rb
CHANGED
@@ -57,20 +57,24 @@ module Store::Digest::Blob::FileSystem
  # Return an open tempfile in the designated temp directory
  # @return [Tempfile]
  def temp_blob
- Tempfile.new 'blob', tmp
+ Tempfile.new 'blob', tmp, encoding: Encoding::BINARY
  end

  # Settle a blob from its temporary location to its permanent location.
+ #
  # @param bin [String] The binary representation of the keying digest
  # @param fh [File] An open filehandle, presumably a temp file
  # @param mtime [nil, Time, DateTime, Integer] the modification time
  # (defaults to now)
  # @param overwrite [false, true] whether to overwrite the target
- #
+ #
  # @raise [SystemCallError] as we are mucking with the file system
-
+ #
+ # @return [Proc] a lambda that returns a read handle
+ #
+ def settle_blob bin, fh, mtime: nil, overwrite: false, direct: false
  # get the mtimes
- mtime ||= Time.now
+ mtime ||= Time.now(in: ?Z)
  mtime = case mtime
  when Time then mtime.to_i
  when Integer then mtime
@@ -100,7 +104,8 @@ module Store::Digest::Blob::FileSystem
  target.utime mtime, mtime
  end

-
+ # return a proc that returns the open file handle
+ direct ? target.open('rb') : -> { target.open 'rb' }
  end

  # Return a blob filehandle (or closure that will return said blob).
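The last added line of `settle_blob` returns either an open read handle (when `direct:` is truthy) or a lambda that opens one on demand. The sketch below isolates that return convention so the difference is easy to see. It assumes `target` is a `Pathname`, which the diff does not confirm, and `read_handle_for` is a hypothetical stand-in name, not a method of the gem.

```ruby
require 'pathname'
require 'tmpdir'

# Hypothetical stand-in for the tail of settle_blob: returns an open
# binary read handle when direct is truthy, otherwise a lambda that
# opens one only when called.
def read_handle_for(target, direct: false)
  direct ? target.open('rb') : -> { target.open 'rb' }
end

Dir.mktmpdir do |dir|
  target = Pathname(dir) + 'blob'
  target.binwrite('hello')

  # direct: true hands back a File that is already open.
  fh = read_handle_for(target, direct: true)
  puts fh.read
  fh.close

  # The default defers opening, so no file descriptor is held
  # until the caller actually wants to read.
  opener = read_handle_for(target)
  fh = opener.call
  puts fh.read
  fh.close
end
```

Deferring the open this way fits the concern raised in TODO.org about limiting the number of concurrently open file handles, since a caller that never reads never opens the file.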