store-digest 0.3.1 → 0.4.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 53000b2277712556bc9fdc4e3a8aa04bc1f2d6c4a432cd672ca1363901f1772d
- data.tar.gz: ae7ff0deb6b21b7eb24031a44a74aed287866e792891331164fcf79df6894ab5
+ metadata.gz: da26c3144a1f78075a51a15259d48c3d68866f5ccd624e55a1b951aae8db540a
+ data.tar.gz: 777dc486557ea9d1f53432ba20e8be72ef193a1a407f7bebc4ce007f950e6d72
  SHA512:
- metadata.gz: fc4f0a08994b1382bb75cd34015e8984f65121439597bae891cfa5bdd55f7ab046b70f86995a4e3840eac699304540d0d92d408450366a11d324543f97ab8cb6
- data.tar.gz: 19097b9434eb605b7ebfbc5c1af3d59caefae3d06d241437b7e66c24f9a95b9172fa5a40f754804dfd9192f7833dd4fdc0a51e38273345e0fe43fbac9f28ee4e
+ metadata.gz: 99fe3437512764a8ecc3df0653353724d9b01fcc391c21bd2c4ebbfa06078b32fb74a1f3f5a28f4489b7cce4b65bf85ebddae3f96a1a51a6e91c823a27b490c9
+ data.tar.gz: 768f4346e6bdd607ab0cb8fa2177913367abe2d87539788033f7a4e76a7f78ca7337f83690d015960067713f23dad8746adcf5a81915b6487bd2094e718c321e
data/TODO.org CHANGED
@@ -1,11 +1,122 @@
  #+STARTUP: showall indent hidestars
+ * TODO 2026-06-07 remaining tasks for scanning overhaul
+ - [ ] ~Store::Digest~
+ - [ ] create ~#has?~ predicate
+ - [ ] test it doyy
+ - [ ] overhaul ~#add~
+ - [ ] make it pass through to ~Entry~ constructor
+ - [ ] create ~#add_raw~
+ - [ ] canonicalize media type
+ - [ ] update TTL on cache entries on match
+ - [ ] return a hash
+ - [ ] overhaul ~#remove~
+ - [ ] add ~tombstone:~ parameter to return already-deleted entities
+ - [ ] create ~#remove_raw~
+ - [ ] do we /really/ need to create a ~#remove_raw~?
+ - [ ] can we add a ~remove:~ flag to ~#get_raw~?
+ - [ ] overhaul ~#get~
+ - [ ] add ~tombstone:~ parameter to return already-deleted entities
+ - [ ] make it pass through to ~Entry~ constructor
+ - [ ] create ~#get_raw~
+ - [ ] silently remove expired cache items
+ - could probably chuck some kinda ~#purge~ operation in a thread; run it every few seconds
+ - [ ] return the same hash as ~#add_raw~
+ - [ ] ~Store::Digest::Entry~
+ - [ ] finish modifications to ~#initialize~
+ - [ ] prune out parameters that you shouldn't be able to mess with
+ - [ ] add ~#store.get_raw~ scenario to constructor
+ - [ ] if ~content~ is ~nil~ and there is a ~store:~ supplied
+ - [ ] (~content~ /must not/ be ~nil~ /except/ if there is a ~store:~)
+ - [ ] maybe implement as a block?
+ - [ ] implement ~#add~
+ - [ ] may be a noop if already associated with the store
+ - [ ] should produce an error if entry is a tombstone/has no content
+ - [ ] otherwise the entry will have to be rescanned
+ - [ ] implement ~#remove~
+ - [ ] implement ~#scan!~
+ - [ ] use ~store.add_raw~ if associated with a store
+ - (this returns a file handle or precursor thereto)
+ - [ ] otherwise use ~self.class.scan_raw~
+ - (this needs its own filehandle, or we can reuse the input if rewindable)
+ * TODO 2026-05-26 beef up scanning
+ - *GOAL*: We want to be able to drop one of these objects into a ~Rack~ message body — either request or response — and it should JFW.
+ - We should be able to take what's already in there and initialize the object with that.
+ - The result must be rewindable.
+ - [-] ~Store::Digest::Object~ changes
+ - [X] rename to ~Store::Digest::Entry~
+ - [X] add a ~Store::Digest::Object~ constant for the time being
+ - [X] absorb ~Store::Digest::Object::IOWrapper~
+ - [X] expose a minimal ~#each~ ~#read~ ~#gets~ ~#rewind~ ~#close~ interface
+ - [X] ~#content~ becomes a no-op
+ - [-] overhaul ~#initialize~
+ - [X] [[https://github.com/rack/rack/blob/main/SPEC.rdoc#the-body][ingest all kinds of objects]] (via ~Store::Digest::ReadWrapper~)
+ - [X] strings
+ - [X] arrays of strings / enumerators that yield strings / ~#each~
+ - [X] [[https://github.com/rack/rack/blob/main/SPEC.rdoc#the-input-stream][things that quack like]] ~#gets~ ~#read~ ~#close~
+ - [X] things that respond to ~#call~
+ - [X] that takes no arguments and returns one of the above
+ - [X] that takes a write handle as its first (only?) argument
+ - [ ] provide a ~store:~ named parameter for associating with a ~Store::Digest~ instance
+ - [ ] make it so ~digests:~ can take an ~Array~ of ~Symbol~ objects that represent the desired algorithms when the content is eventually scanned
+ - [ ] add a ~cache:~ named parameter which can be falsy, truthy, an integer representing a TTL, or a ~Time~ object indicating the desired expiry date
+ - [ ] add scan ~scan:~ named parameter (effectively the same as invoking ~.new(…).scan~)
+ - [ ] do not scan input until an attempt to access its contents
+ - [ ] first call to ~#each~, ~#gets~, ~#read~, or ~#content~
+ - [ ] first call to the ~#digests~ hash or any of the dependent metadata, such as ~#size~
+ - [ ] may not scan at all if passed in an object which is already scanned
+ - [ ] overhaul ~#scan~ instance method
+ - [ ] make it idempotent
+ - [ ] create a ~#scan!~ method that forces the issue
+ - [ ] deprecate the arguments/parameters and have it just operate over itself
+ - [ ] scanning the input adds it to the store if a backreference is present and replaces the input with the settled blob in the store
+ - [ ] swap in a different read handle if there is no backreference to the store
+ - [ ] the passed-in handle itself if it responds to ~#binmode~ and ~#rewind~
+ - [ ] ~StringIO~ for small blobs (1k? 10k? 64k?)
+ - how would I do that? start writing to a ~StringIO~ and if it ticks over the limit, dump that into a temp file and switch?
+ - (probably)
+ - (the benefit to doing that, aside from being faster, would be to mitigate the number of concurrent open file handles)
+ - (at the cost of bigger stuff being one blit slower)
+ - (honestly it'd probably get buffered in ram anyway)
+ - [ ] otherwise an ordinary temporary file
+ - [ ] Overhaul ~.scan~ class method to align with new constructor
+ - [ ] ~store:~ named parameter
+ - [ ] ~cache:~ named parameter
+ - [ ] initialize with ~scan: true~, obviously
+ - [X] create a ~.scan_raw~ class method so ~Store::Digest~ can use it independently
+ - [X] make it return a minimal set: digest hashes, byte count, and optional (scanned) content type
+ - [ ] if hashes are already supplied in the constructor then it is assumed to be scanned already
+ - [ ] a call to ~#scan~ is a no-op (because ~#scan~ should be idempotent)
+ - [ ] a call to ~#scan!~ is assumed to mean you want to verify the content and should blow up if *any* of the supplied hashes don't match
+ - [ ] implement ~IO~ and/or ~Enumerable~ emulation methods
+ - [ ] Any call to any of these should start a scan that ultimately replaces the underlying content with a rewindable file handle
+ - [ ] (unless the original object implements ~#rewind~ and there is no association with a store)
+ - (although note the caveat that the original object can be written to in situ, like it's more efficient to reuse it but less secure)
+ - [ ] maybe make an option in the constructor to not do this?
+ - [ ] ~#each~
+ - [ ] ~#gets~
+ - [ ] ~#read~
+ - [ ] ~#rewind~
+ - [ ] ~#open~ (no-op)
+ - [ ] ~#close~ (no-op)
+ - [ ] make ~#content~ a no-op
+ - [ ] going to have to overhaul ~#to_h~
+ - [ ] make a version that doesn't scan?
+ - [ ] add ~#stored?~
+ - [-] ~Store::Digest~ changes
+ - [ ] add something like a ~#add_raw~ method that will take the entry's readable object and return a ~Hash~ of metadata plus a lambda that returns a rewindable file handle
+ - [ ] overhaul ~#add~
+ - [ ] overhaul ~#get~
+ - [X] nuke ~#lazy_add~; it is no longer necessary
+ - [X] ~MimeMagic~ monkeypatch
+ - [X] lobby maintainers to merge my patches (again; not a peep in years)
+ - [X] failing that, ship a fork
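The open question in the notes above (start writing to a ~StringIO~ and switch to a temp file once a size limit is crossed) could be sketched roughly as follows. This is only an illustration: the ~SpillBuffer~ class, its method names, and the 64 KiB default are hypothetical and do not appear in store-digest.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper (not part of store-digest): buffers writes in
# memory and spills to a temp file once a size limit is exceeded.
class SpillBuffer
  attr_reader :io

  # limit is an assumed threshold in bytes; the notes leave it undecided.
  def initialize limit: 64 * 1024
    @limit = limit
    @io    = StringIO.new(String.new(encoding: Encoding::BINARY))
  end

  # Append a chunk, switching to a Tempfile if the limit would be crossed.
  def write chunk
    spill! if @io.is_a?(StringIO) && @io.size + chunk.bytesize > @limit
    @io.write chunk
  end

  # True once the content lives in a temp file rather than in memory.
  def spilled?
    !@io.is_a?(StringIO)
  end

  # Rewind and hand back the underlying (rewindable) read handle.
  def finish
    @io.rewind
    @io
  end

  private

  # Copy what has been buffered so far into a temp file and swap it in.
  def spill!
    tmp = Tempfile.new 'spill', encoding: Encoding::BINARY
    tmp.write @io.string
    @io = tmp
  end
end
```

Small payloads never touch the file system this way, while a larger payload pays for one extra copy at the moment it spills, which matches the trade-off the notes describe.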
  * TODO upgrade metadata format
- - [ ] get rid of the idea of a 'primary' digest algorithm
- - [ ] change the main record to being keyed by integers and contain all hashes
+ - [X] get rid of the idea of a 'primary' digest algorithm
+ - [X] change the main record to being keyed by integers and contain all hashes
  - old: ≥209 bytes + 316 mapping
  - new: ≥218 + 220 mapping
- - [ ] make digest -> integer mapping
- - [ ] add version to metadata
+ - [X] make digest -> integer mapping
+ - [X] add version to metadata
  - [ ] add function to upgrade metadata store
  * TODO add cache capability
  - [ ] add fields to control:
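The TODO notes above ask for an idempotent ~#scan~, a ~#scan!~ that forces a rescan and fails when supplied hashes do not match, and scanning deferred until metadata is first accessed. A minimal sketch of that behaviour, assuming a single SHA-256 digest and an in-memory string as content, might look like this. ~LazyEntry~ and everything inside it are hypothetical stand-ins, not the real ~Store::Digest::Entry~.

```ruby
require 'digest'
require 'stringio'

# Hypothetical stand-in for an entry (not the real Store::Digest::Entry).
class LazyEntry
  class Mismatch < StandardError; end

  # Number of times the content has actually been read end to end.
  attr_reader :scans

  # digests: optional hash of precomputed hex digests, e.g. { sha256: '…' };
  # when given, the entry is treated as already scanned.
  def initialize content, digests: nil
    @io      = StringIO.new content
    @digests = digests
    @scans   = 0
  end

  # Idempotent: does nothing if digests are already known.
  def scan
    @digests ? self : scan!
  end

  # Always re-reads the content; raises if any supplied digest disagrees.
  # This sketch only computes SHA-256, so only that key is compared.
  def scan!
    @io.rewind
    fresh = { sha256: Digest::SHA256.hexdigest(@io.read) }
    @io.rewind
    @scans += 1
    if @digests && @digests.any? { |k, v| fresh.key?(k) && fresh[k] != v }
      raise Mismatch, 'supplied digest does not match content'
    end
    @digests = fresh
    self
  end

  # Accessing metadata triggers a scan only when one is needed.
  def digests
    scan
    @digests
  end
end
```

Constructing an entry costs nothing until ~#digests~ is called, repeated calls to ~#scan~ read the content at most once, and ~#scan!~ doubles as a verification step for digests passed to the constructor.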
@@ -57,18 +57,22 @@ module Store::Digest::Blob::FileSystem
  # Return an open tempfile in the designated temp directory
  # @return [Tempfile]
  def temp_blob
- Tempfile.new 'blob', tmp
+ Tempfile.new 'blob', tmp, encoding: Encoding::BINARY
  end
 
  # Settle a blob from its temporary location to its permanent location.
+ #
  # @param bin [String] The binary representation of the keying digest
  # @param fh [File] An open filehandle, presumably a temp file
  # @param mtime [nil, Time, DateTime, Integer] the modification time
  # (defaults to now)
  # @param overwrite [false, true] whether to overwrite the target
- # @return [true] a throwaway return value
+ #
  # @raise [SystemCallError] as we are mucking with the file system
- def settle_blob bin, fh, mtime: nil, overwrite: false
+ #
+ # @return [Proc] a lambda that returns a read handle
+ #
+ def settle_blob bin, fh, mtime: nil, overwrite: false, direct: false
  # get the mtimes
  mtime ||= Time.now
  mtime = case mtime
  mtime = case mtime
@@ -100,7 +104,8 @@ module Store::Digest::Blob::FileSystem
  target.utime mtime, mtime
  end
 
- true
+ # return a proc that returns the open file handle
+ direct ? target.open('rb') : -> { target.open 'rb' }
  end
 
  # Return a blob filehandle (or closure that will return said blob).
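The new return value of settle_blob in the hunk above is either an open read handle (direct: true) or a lambda that opens one on demand. The sketch below isolates just that last line, assuming target is a Pathname; the read_handle name is hypothetical. Note that the @return tag in the diff only documents the Proc case.

```ruby
require 'pathname'
require 'tmpdir'

# Hypothetical helper mirroring only the final expression of the diffed
# method: an open read handle when direct is true, otherwise a lambda
# that opens a fresh handle each time it is called.
def read_handle target, direct: false
  direct ? target.open('rb') : -> { target.open 'rb' }
end

Dir.mktmpdir do |dir|
  target = Pathname(dir) + 'blob'
  target.binwrite 'payload'

  # Lazy form: nothing is opened until the lambda is called.
  opener = read_handle target
  fh = opener.call
  puts fh.read
  fh.close

  # Direct form: the caller gets an open File immediately.
  fh = read_handle target, direct: true
  puts fh.read
  fh.close
end
```

Returning a lambda by default means a caller that only needs the metadata never holds an open file descriptor, which fits the concern about concurrent open handles raised in the TODO notes.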
@@ -30,4 +30,8 @@ module Store::Digest::Driver
  def setup **options
  raise NotImplementedError, 'gotta roll your own, holmes'
  end
+
+ def close_internal
+ raise NotImplementedError, 'close_internal not implemented'
+ end
  end
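The close_internal stub added above follows the same raise-by-default pattern as setup: the driver module declares the hook and a concrete driver is expected to override it. A minimal sketch of that pattern is shown here. ExampleDriver, MemoryDriver, BareDriver, and the close and closed? wrappers are hypothetical; only the raising stub is taken from the diff.

```ruby
# Hypothetical module; only the raise-by-default hook mirrors the diff.
module ExampleDriver
  def close_internal
    raise NotImplementedError, 'close_internal not implemented'
  end

  # Hypothetical public wrapper that delegates to the driver hook.
  def close
    close_internal
    @closed = true
  end

  def closed?
    !!@closed
  end
end

# A driver that supplies the hook.
class MemoryDriver
  include ExampleDriver

  def close_internal
    # release driver-specific resources here
  end
end

# A driver that forgot to supply it.
class BareDriver
  include ExampleDriver
end

MemoryDriver.new.close # succeeds

begin
  BareDriver.new.close
rescue NotImplementedError => e
  puts e.message
end
```

NotImplementedError descends from ScriptError rather than StandardError, so a bare rescue clause will not catch it; it has to be named explicitly as above.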