zip_tricks 5.1.1 → 5.5.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 311852db3fa83e4f6631fb8ecf563255c0f16a65bf2b3c2f2375f41040eb3661
-   data.tar.gz: e7b307a3b5eb5ed344d20c85d8d5cef40e89e9afbe61ce5dded2f3971b60002a
+   metadata.gz: 50ba2d6a0b5bde1cf51443c7ff4228a7e99a1cc1ac843e3ef23c0d197878a04b
+   data.tar.gz: 5fdb377cc34fd6d9edb4e7932e79ebe28bc2dc91aa71348e386b8b601e4f06f1
  SHA512:
-   metadata.gz: 031da6b3ab1199c8967f7cfbf0116beb9025346b10ae3679391f08c6c3205672842b1f38309efeb710af758c0c28a9c2ed0b5ce784936f3c3bfdb79b3920a3a2
-   data.tar.gz: 5e412daec62f191e4e54cfe8ff60127784264b0c851454db58b95b646af379a219d271e9d6d94aa00beb0d91b986960da07266940cc58764abb2e3e8892372b6
+   metadata.gz: 4c2ae765d7e6c584632b7606d66b970c100b9679cb6b503209be1179bc9582f83727b13154bfce5281dd8dc105f1f5112d2794df7f97ebb987a1e224f67f4840
+   data.tar.gz: e06306028f18fe1eb16abe12c48cd93182d11451298cde6068b9ccb37b78846be469b6bd48848507f38f8796b00281f27391589d20d9163b3845cba4112411c5
@@ -11,3 +11,5 @@ Style/GlobalVars:
      - qa/*.rb
      - spec/spec_helper.rb
      - spec/support/zip_inspection.rb
+ Layout/IndentHeredoc:
+   Enabled: false
@@ -4,8 +4,5 @@ rvm:
  - jruby-9.0
  sudo: false
  cache: bundler
- matrix:
-   allow_failures:
-     - rvm: jruby-9.0
  script:
  - bundle exec rake
@@ -1,3 +1,39 @@
+ ## 5.5.0
+
+ * In `OutputEnumerator` apply some amount of buffering to be within a UNIX socket size for metadata writes. This
+   speeds up usage with Puma by about 20 percent, as there won't be as many `syswrite` calls on the socket.
+ * Make `StoredWriter` and `DeflatedWriter` public constants so that standalone tests can be written for them
+
+ ## 5.4.0
+
+ * Use block form for zlib Deflater calls to conserve memory
+ * Do not change string encoding in writer wrappers (avoid extra work)
+ * Fix a zlib deflater object being leaked per archived file
+ * Speed up streaming CRC32 computation
+ * When running tests, assign the port for the Puma server dynamically
+ * Reduce string allocations in the block deflate spec
+ * Make sure RemoteUncap specs run under JRuby correctly
+ * Replace Rails::Live streaming with iterable body streaming to avoid issues with Rails::Live across the board
+ * Remove `qa/` directory and scripts, as the tests for the library proper should now be sufficient
+ * Fix some documentation and sample code omissions and inconsistencies.
+
+ ## 5.3.1
+
+ * Fix extended timestamp value encoding. Previously we would use an incorrect encoding for the timestamp value, which would output correct but nonsensical timestamps. The pack specifier is now changed to output the correct value.
+
+ ## 5.3.0
+
+ * Raise in `Streamer#close` when the IO offset of the Streamer does not match the size of the written entries. This is a situation which
+   can occur if one adds the local headers, writes the bodies of the files to the socket/output directly, and forgets to adjust the internal
+   Streamer offset. The unadjusted offset would then produce incorrect values in both the local headers which come after the missing
+   offset adjustment _and_ in the central directory headers. Some ZIP unarchivers are able to recover from this (ones that read
+   files "straight-ahead") but others aren't - if the ZIP unarchiver uses central directory entries it would be using incorrect offsets.
+   Instead of producing an invalid ZIP, raise an exception which explains what happened and how it can be resolved.
+
+ ## 5.2.0
+
+ * Remove `Streamer#add_compressed_entry` and `SizeEstimator#add_compressed_entry`
+
  ## 5.1.1

  * Fix extended timestamp extra field output. The first bit of the flag would be set instead of the last bit of
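The 5.4.0 entry about the block form of zlib deflater calls can be illustrated with Ruby's standard `Zlib::Deflate`. This is a minimal sketch and not the gem's internal writer: `sink` and the sample data are assumptions made for illustration only.

```ruby
require 'zlib'

# Raw deflate stream (negative window bits), the flavour stored inside ZIP entries
deflater = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, -Zlib::MAX_WBITS)
sink = String.new # hypothetical destination, stands in for a socket or a file

begin
  # With a block, every compressed chunk is yielded as soon as it is produced
  # instead of being accumulated into one large return value
  deflater.deflate('some,csv,data' * 1000) { |chunk| sink << chunk }
  deflater.finish { |chunk| sink << chunk }
ensure
  # Closing explicitly releases the zlib state instead of waiting for GC
  deflater.close
end

# Round-trip the output to check that the chunks form a valid stream
inflated = Zlib::Inflate.new(-Zlib::MAX_WBITS).inflate(sink)
```

Closing the deflater in `ensure` is one plausible way to avoid the per-file leak that the same release mentions, although the changelog does not say how the gem fixed it.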
data/README.md CHANGED
@@ -24,20 +24,20 @@ to [32 bit sizes.](https://github.com/jruby/jruby/issues/3817)

  ## Diving in: send some large CSV reports from Rails

- The easiest is to use the Rails' built-in streaming feature:
+ The easiest way is to include the `ZipTricks::RailsStreaming` module in your
+ controller.

  ```ruby
  class ZipsController < ActionController::Base
-   include ActionController::Live # required for streaming
    include ZipTricks::RailsStreaming

    def download
      zip_tricks_stream do |zip|
        zip.write_deflated_file('report1.csv') do |sink|
          CSV(sink) do |csv_write|
-           csv << Person.column_names
+           csv_write << Person.column_names
            Person.all.find_each do |person|
-             csv << person.attributes.values
+             csv_write << person.attributes.values
            end
          end
        end
@@ -49,6 +49,10 @@ class ZipsController < ActionController::Base
  end
  ```

+ If you want some more conveniences you can also use [zipline](https://github.com/fringd/zipline) which
+ will automatically process and stream attachments (Carrierwave, Shrine, ActiveStorage) and remote objects
+ via HTTP.
+
  ## Create a ZIP file without size estimation, compress on-the-fly during writes

  Basic use case is compressing on the fly. Some data will be buffered by the Zlib deflater, but
@@ -71,12 +75,15 @@ since you do not know how large the compressed data segments are going to be.

  ## Send a ZIP from a Rack response

- Create a `RackBody` object and give it's constructor a block that adds files.
- The block will only be called when actually sending the response to the client
+ To "pull" data from ZipTricks you can create an `OutputEnumerator` object which will yield the binary chunks piece
+ by piece, and apply some amount of buffering as well. Since this `OutputEnumerator` responds to `#each` and yields
+ Strings it also can (and should!) be used as a Rack response body. Return it to your webserver and you will
+ have your ZIP streamed. The block that you give to the `OutputEnumerator` will only start executing once your
+ response body starts getting iterated over - when actually sending the response to the client
  (unless you are using a buffering Rack webserver, such as Webrick).

  ```ruby
- body = ZipTricks::RackBody.new do | zip |
+ body = ZipTricks::Streamer.output_enum do | zip |
    zip.write_stored_file('mov.mp4') do |sink| # Those MPEG4 files do not compress that well
      File.open('mov.mp4', 'rb'){|source| IO.copy_stream(source, sink) }
    end
@@ -84,7 +91,7 @@ body = ZipTricks::RackBody.new do | zip |
      File.open('novel.txt', 'rb'){|source| IO.copy_stream(source, sink) }
    end
  end
- [200, {'Transfer-Encoding' => 'chunked'}, body]
+ [200, {}, body]
  ```

  ## Send a ZIP file of known size, with correct headers
@@ -123,11 +130,12 @@ ZipTricks::Streamer.open(io) do | zip |
    # Write the local file header first..
    zip.add_stored_entry(filename: "first-file.bin", size: raw_file.size, crc32: raw_file_crc32)

-   # then send the actual file contents bypassing the Streamer interface
+   # Adjust the ZIP offsets within the Streamer
+   zip.simulate_write(my_temp_file.size)
+
+   # ...and then send the actual file contents bypassing the Streamer interface
    io.sendfile(my_temp_file)

-   # ...and then adjust the ZIP offsets within the Streamer
-   zip.simulate_write(my_temp_file.size)
  end
  ```
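Why the `simulate_write` call matters can be shown with a toy model of the offset bookkeeping. The sketch below is hypothetical: `TinyOffsetTracker`, its method signatures and the byte counts are invented for illustration and are not the gem's `Streamer`. It only mirrors the idea from the 5.3.0 changelog entry that closing should raise when the tracked offset disagrees with the declared entries.

```ruby
# Hypothetical stand-in for the offset bookkeeping, NOT the gem's Streamer
class TinyOffsetTracker
  OffsetOutOfSync = Class.new(StandardError)

  def initialize
    @offset = 0         # bytes the tracker knows have been written
    @declared_total = 0 # bytes promised by the declared entries
  end

  # Declares an entry whose header goes through the tracker while the body
  # is written to the socket directly by the caller
  def add_stored_entry(header_size:, body_size:)
    @offset += header_size
    @declared_total += header_size + body_size
    self
  end

  # Tells the tracker about bytes that bypassed it
  def simulate_write(num_bytes)
    @offset += num_bytes
    self
  end

  # Refuses to finish when the bookkeeping does not add up
  def close
    return true if @offset == @declared_total

    raise OffsetOutOfSync,
          "offset is #{@offset} but entries add up to #{@declared_total}"
  end
end

in_sync = TinyOffsetTracker.new
in_sync.add_stored_entry(header_size: 30, body_size: 1024)
in_sync.simulate_write(1024) # body was sent out of band
in_sync.close

forgotten = TinyOffsetTracker.new
forgotten.add_stored_entry(header_size: 30, body_size: 1024)
error_message =
  begin
    forgotten.close
    nil
  rescue TinyOffsetTracker::OffsetOutOfSync => e
    e.message
  end
```

Failing at close time trades a late error for the guarantee that no archive with wrong central directory offsets is completed silently.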
@@ -174,9 +182,12 @@ that have not been formally verified (ours hasn't been).
  * Commit and push until you are happy with your contribution.
  * Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
  * Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or is otherwise necessary, that is fine, but please isolate to its own commit so I can cherry-pick around it.
- * If you alter the `ZipWriter`, please take the time to run the test in the `qa/` directory. Ensure that the generated (large) files open manually - see README_QA for more.

- ## Copyright
+ ## Copyright and license
+
+ Copyright (c) 2020 WeTransfer.

- Copyright (c) 2019 WeTransfer. `zip_tricks` is distributed under the conditions of the [Hippocratic License](https://firstdonoharm.dev/version/1/2/license.html)
- - See LICENSE.txt for further details.
+ `zip_tricks` is distributed under the conditions of the [Hippocratic License](https://firstdonoharm.dev/version/1/2/license.html)
+ See LICENSE.txt for further details. If this license is not acceptable for your use case we still maintain the 4.x version tree
+ which remains under the MIT license, see https://rubygems.org/gems/zip_tricks/versions for more information.
+ Note that we only backport some performance optimizations and crucial bugfixes but not the new features to that tree.
@@ -1,11 +1,25 @@
  # frozen_string_literal: true

- # Stashes a block given by the Rack webserver when calling each() on a body, and calls
- # that block every time it is written to using :<< (shovel). Poses as an IO for rubyzip.
-
+ # Acts as a converter between callers which send data to the `#<<` method (such as all the ZipTricks
+ # writer methods, which push onto anything), and a given block. Every time `#<<` gets called on the BlockWrite,
+ # the block given to the constructor will be called with the same argument. ZipTricks uses this object
+ # when integrating with Rack and in the OutputEnumerator. Normally you wouldn't need to use it manually but
+ # you always can. BlockWrite will also ensure the binary string encoding is forced onto any string
+ # that passes through it.
+ #
+ # For example, you can create a Rack response body like so:
+ #
+ #     class MyRackResponse
+ #       def each
+ #         writer = ZipTricks::BlockWrite.new {|chunk| yield(chunk) }
+ #         writer << "Hello" << "world" << "!"
+ #       end
+ #     end
+ #     [200, {}, MyRackResponse.new]
  class ZipTricks::BlockWrite
-   # The block is the block given to each() of the Rack body, or other block you want
-   # to receive the string chunks written by the zip compressor.
+   # Creates a new BlockWrite.
+   #
+   # @param block The block that will be called when this object receives the `<<` message
    def initialize(&block)
      @block = block
    end
@@ -17,26 +31,17 @@ class ZipTricks::BlockWrite
      end
    end

-   # Every time this object gets written to, call the Rack body each() block
-   # with the bytes given instead.
+   # Sends a string through to the block stored in the BlockWrite.
+   #
+   # @param buf[String] the string to write. Note that a zero-length String
+   #     will not be forwarded to the block, as it has special meaning when used
+   #     with chunked encoding (it indicates the end of the stream).
+   # @return self
    def <<(buf)
      # Zero-size output has a special meaning when using chunked encoding
      return if buf.nil? || buf.bytesize.zero?

-     # Ensure we ALWAYS write in binary encoding.
-     encoded =
-       if buf.encoding != Encoding::BINARY
-         # If we got a frozen string we can't force_encoding on it
-         begin
-           buf.force_encoding(Encoding::BINARY)
-         rescue
-           buf.dup.force_encoding(Encoding::BINARY)
-         end
-       else
-         buf
-       end
-
-     @block.call(encoded)
+     @block.call(buf.b)
      self
    end
  end
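The replacement of the `force_encoding` dance with `buf.b` relies on how `String#b` behaves in Ruby's core library. A small sketch with made-up sample data shows the properties that matter here: the receiver may be frozen, it is left untouched, and the result is a binary copy of the same bytes.

```ruby
# Sample data is illustrative; "\u00e9" forces a UTF-8 string with a 2-byte character
original = "h\u00e9llo".freeze

# String#b returns a copy tagged as binary and works on frozen receivers,
# whereas force_encoding would raise FrozenError on the frozen original
binary = original.b

encodings = [original.encoding, binary.encoding]
lengths = [original.length, binary.length] # characters vs. bytes
```

The copy is what makes the `begin`/`rescue` branch of the removed code unnecessary, at the cost of one allocation per write.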
@@ -19,7 +19,7 @@ require 'stringio'
  # ## Usage
  #
  #     File.open('zipfile.zip', 'rb') do |f|
- #       entries = FileReader.read_zip_structure(f)
+ #       entries = ZipTricks::FileReader.read_zip_structure(io: f)
  #       entries.each do |e|
  #         File.open(e.filename, 'wb') do |extracted_file|
  #           ex = e.extractor_from(f)
@@ -82,7 +82,9 @@ class ZipTricks::FileReader

    private_constant :StoredReader, :InflatingReader

-   # Represents a file within the ZIP archive being read
+   # Represents a file within the ZIP archive being read. This is different from
+   # the Entry object used in Streamer for ZIP writing, since during writing more
+   # data can be kept in memory for immediate use.
    class ZipEntry
      # @return [Fixnum] bit-packed version signature of the program that made the archive
      attr_accessor :made_by
@@ -279,7 +281,7 @@ class ZipTricks::FileReader
        seek(io, next_local_header_offset) # Seek to the next entry, and raise if seek is impossible
      end
      entries
-   rescue ReadError
+   rescue ReadError, RangeError # RangeError is raised if offset exceeds int32/int64 range
      log do
        'Got a read/seek error after reaching %<cur_offset>d, no more entries can be recovered' %
          {cur_offset: cur_offset}
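The added `RangeError` rescue covers offsets that do not fit into a native integer. A minimal sketch with `StringIO` from the standard library shows where such an error comes from; the bogus offset and the symbols are assumptions for illustration and do not reproduce the reader's own `seek` helper.

```ruby
require 'stringio'

io = StringIO.new('PK')
bogus_offset = 2**64 # larger than any signed 64-bit offset, e.g. from a corrupt header

outcome =
  begin
    io.seek(bogus_offset)
    :seeked
  rescue RangeError
    # A straight-ahead reader can stop here and keep the entries found so far
    :unrecoverable
  end
```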
@@ -363,7 +365,7 @@ class ZipTricks::FileReader
    # (read starting at this offset to get the data).
    #
    # @param io[#seek, #read] an IO-ish object the ZIP file can be read from
-   # @param local_header_offset[Fixnum] absolute offset (0-based) where the
+   # @param local_file_header_offset[Fixnum] absolute offset (0-based) where the
    #   local file header is supposed to begin @return [Fixnum] absolute offset
    #   (0-based) of where the compressed data begins for this file within the ZIP
    def get_compressed_data_offset(io:, local_file_header_offset:)
@@ -375,7 +377,7 @@ class ZipTricks::FileReader
    # Parse an IO handle to a ZIP archive into an array of Entry objects, reading from the end
    # of the IO object.
    #
-   # @see {#read_zip_structure}
+   # @see #read_zip_structure
    # @param options[Hash] any options the instance method of the same name accepts
    # @return [Array<ZipEntry>] an array of entries within the ZIP being parsed
    def self.read_zip_structure(**options)
@@ -385,7 +387,7 @@ class ZipTricks::FileReader
    # Parse an IO handle to a ZIP archive into an array of Entry objects, reading from the start of
    # the file and parsing local file headers one-by-one
    #
-   # @see {#read_zip_straight_ahead}
+   # @see #read_zip_straight_ahead
    # @param options[Hash] any options the instance method of the same name accepts
    # @return [Array<ZipEntry>] an array of entries within the ZIP being parsed
    def self.read_zip_straight_ahead(**options)
@@ -4,7 +4,7 @@
  # write operations, but want to discard the data (like when
  # estimating the size of a ZIP)
  module ZipTricks::NullWriter
-   # @param data[String] the data to write
+   # @param _[String] the data to write
    # @return [self]
    def self.<<(_)
      self
@@ -1,43 +1,64 @@
  # frozen_string_literal: true

- # Can be used as a Rack response body directly. Will yield
- # a {ZipTricks::Streamer} for adding entries to the archive and writing
- # zip entry bodies.
+ # The output enumerator makes it possible to "pull" from a ZipTricks streamer
+ # object instead of having it "push" writes to you. It will "stash" the block which
+ # writes the ZIP archive through the streamer, and when you call `each` on the Enumerator
+ # it will yield you the bytes the block writes. Since it is an enumerator you can
+ # use `next` to take chunks written by the ZipTricks streamer one by one. It can be very
+ # convenient when you need to segment your ZIP output into bigger chunks for, say,
+ # uploading them to a cloud storage provider such as S3.
+ #
+ # Another use of the output enumerator is outputting a ZIP archive from Rails or Rack,
+ # where an object responding to `each` is required which yields Strings. For instance,
+ # you can return a ZIP archive from Rack like so:
+ #
+ #     iterable_zip_body = ZipTricks::OutputEnumerator.new do | streamer |
+ #       streamer.write_deflated_file('big.csv') do |sink|
+ #         CSV(sink) do |csv_writer|
+ #           csv_writer << Person.column_names
+ #           Person.all.find_each do |person|
+ #             csv_writer << person.attributes.values
+ #           end
+ #         end
+ #       end
+ #     end
+ #
+ #     [200, {'Content-Type' => 'binary/octet-stream'}, iterable_zip_body]
  class ZipTricks::OutputEnumerator
-   # Prepares a new Rack response body with a Zip output stream.
-   # The block given to the constructor will be called when the response
-   # body will be read by the webserver, and will receive a {ZipTricks::Streamer}
-   # as it's block argument. You can then add entries to the Streamer as usual.
-   # The archive will be automatically closed at the end of the block.
+   DEFAULT_WRITE_BUFFER_SIZE = 64 * 1024
+   # Creates a new OutputEnumerator.
    #
-   #     # Precompute the Content-Length ahead of time
-   #     content_length = ZipTricks::SizeEstimator.estimate do | estimator |
-   #       estimator.add_stored_entry(filename: 'large.tif', size: 1289894)
-   #     end
-   #
-   #     # Prepare the response body.
-   #     # The block will only be called when the
-   #     # response starts to be written.
-   #     body = ZipTricks::OutputEnumerator.new do | streamer |
-   #       streamer.add_stored_entry(filename: 'large.tif', size: 1289894, crc32: 198210)
-   #       streamer << large_file.read(1024*1024) until large_file.eof?
-   #       ...
-   #     end
-   #
-   #     return [200, {'Content-Type' => 'binary/octet-stream',
-   #       'Content-Length' => content_length.to_s}, body]
-   def initialize(**streamer_options, &blk)
+   # @param streamer_options[Hash] options for Streamer, see {ZipTricks::Streamer.new}
+   # @param write_buffer_size[Integer] By default all ZipTricks writes are unbuffered. For output to sockets
+   #   it is beneficial to bulkify those writes so that they are roughly sized to a socket buffer chunk. This
+   #   object will bulkify writes for you in this way (so `each` will yield not on every call to `<<` from the Streamer
+   #   but at block size boundaries or greater). Set it to 0 for unbuffered writes.
+   # @param blk a block that will receive the Streamer object when executing. The block will not be executed
+   #   immediately but only once `each` is called on the OutputEnumerator
+   def initialize(write_buffer_size: DEFAULT_WRITE_BUFFER_SIZE, **streamer_options, &blk)
      @streamer_options = streamer_options.to_h
+     @bufsize = write_buffer_size.to_i
      @archiving_block = blk
    end

    # Executes the block given to the constructor with a {ZipTricks::Streamer}
    # and passes each written chunk to the block given to the method. This allows one
-   # to "take" output of the ZIP piecewise.
+   # to "take" output of the ZIP piecewise. If called without a block will return an Enumerator
+   # that you can pull data from using `next`.
+   #
+   # **NOTE** Because the `WriteBuffer` inside this object can reuse the buffer, it is important
+   # that the `String` that is yielded gets consumed eagerly (written byte-by-byte somewhere, or `#dup`-ed)
+   # since the write buffer will clear it after your block returns. If you expand this Enumerator
+   # eagerly into an Array you might notice that a lot of the segments of your ZIP output are
+   # empty - this means that you need to duplicate them.
+   #
+   # @yield [String] a chunk of the ZIP output in binary encoding
    def each
      if block_given?
        block_write = ZipTricks::BlockWrite.new { |chunk| yield(chunk) }
-       ZipTricks::Streamer.open(block_write, **@streamer_options, &@archiving_block)
+       buffer = ZipTricks::WriteBuffer.new(block_write, @bufsize)
+       ZipTricks::Streamer.open(buffer, **@streamer_options, &@archiving_block)
+       buffer.flush
      else
        enum_for(:each)
      end
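The NOTE in the hunk above warns that yielded strings may be cleared after the block returns. The effect can be reproduced with a hypothetical buffer that reuses a single `String`. `ReusingBuffer` is an illustration written for this purpose and is not the gem's `WriteBuffer`, whose implementation is not part of this diff.

```ruby
# Hypothetical buffer that reuses one String; NOT the gem's WriteBuffer
class ReusingBuffer
  def initialize(limit, &sink)
    @limit = limit
    @buf = String.new
    @sink = sink
  end

  # Collects writes and hands them on once the limit is reached
  def <<(data)
    @buf << data
    flush if @buf.bytesize >= @limit
    self
  end

  def flush
    return if @buf.empty?

    @sink.call(@buf)
    @buf.clear # the very String that was just yielded is emptied here
  end
end

kept_by_reference = []
kept_by_copy = []
buffer = ReusingBuffer.new(4) do |chunk|
  kept_by_reference << chunk   # keeps a pointer to the reused buffer
  kept_by_copy << chunk.dup    # keeps an independent copy
end

buffer << 'ab' << 'cd' << 'ef'
buffer.flush
```

After this runs, every element of `kept_by_reference` is the same emptied string, while `kept_by_copy` holds the actual chunks, which is the symptom the NOTE describes for an enumerator expanded into an Array.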
@@ -32,6 +32,10 @@
  # conflict is avoided. This is not possible to apply to directories, because when one of the
  # path components is reused in multiple filenames it means those entities should end up in
  # the same directory (subdirectory) once the archive is opened.
+ #
+ # The `PathSet` keeps track of entries as they get added using 2 Sets (cheap presence checks),
+ # one for directories and one for files. It will raise a `Conflict` exception if there are
+ # files clobbering one another, or in case files collide with directories.
  class ZipTricks::PathSet
    class Conflict < StandardError
    end
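The two-Set bookkeeping described in the added comment can be sketched in a few lines. `TinyPathSet` and its `add_file` method are hypothetical names chosen for this illustration; the real `PathSet` API is not shown in this diff and may differ.

```ruby
require 'set'

# Hypothetical, simplified illustration; NOT the gem's PathSet
class TinyPathSet
  Conflict = Class.new(StandardError)

  def initialize
    @files = Set.new
    @directories = Set.new
  end

  def add_file(path)
    raise Conflict, "#{path} is already a directory" if @directories.include?(path)
    raise Conflict, "#{path} is already a file" if @files.include?(path)

    # Every leading part of the path becomes a directory: "a/b/c.txt" -> "a", "a/b"
    parts = path.split('/')
    parents = (1...parts.length).map { |n| parts.first(n).join('/') }
    parents.each do |dir|
      raise Conflict, "#{dir} is a file and cannot contain #{path}" if @files.include?(dir)
    end

    @directories.merge(parents)
    @files << path
    self
  end
end

paths = TinyPathSet.new
paths.add_file('docs/a.txt')
paths.add_file('docs/b.txt') # same directory, different file: fine

conflicts = ['docs/a.txt', 'docs', 'docs/a.txt/nested.txt'].map do |candidate|
  begin
    paths.add_file(candidate)
    false
  rescue TinyPathSet::Conflict
    true
  end
end
```

Using `Set` keeps each presence check cheap no matter how many entries the archive holds, which matches the motivation given in the comment.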
@@ -6,17 +6,16 @@ module ZipTricks::RailsStreaming
    # Opens a {ZipTricks::Streamer} and yields it to the caller. The output of the streamer
    # gets automatically forwarded to the Rails response stream. When the output completes,
    # the Rails response stream is going to be closed automatically.
+   # @param zip_streamer_options[Hash] options that will be passed to the Streamer.
+   #     See {ZipTricks::Streamer#initialize} for the full list of options.
    # @yield [Streamer] the streamer that can be written to
-   def zip_tricks_stream
+   # @return [ZipTricks::OutputEnumerator] The output enumerator assigned to the response body
+   def zip_tricks_stream(**zip_streamer_options, &zip_streaming_blk)
      # Set a reasonable content type
      response.headers['Content-Type'] = 'application/zip'
      # Make sure nginx buffering is suppressed - see https://github.com/WeTransfer/zip_tricks/issues/48
      response.headers['X-Accel-Buffering'] = 'no'
-     # Create a wrapper for the write call that quacks like something you
-     # can << to, used by ZipTricks
-     w = ZipTricks::BlockWrite.new { |chunk| response.stream.write(chunk) }
-     ZipTricks::Streamer.open(w) { |z| yield(z) }
-   ensure
-     response.stream.close
+     response.sending_file = true
+     self.response_body = ZipTricks::OutputEnumerator.new(**zip_streamer_options, &zip_streaming_blk)
    end
  end
@@ -73,9 +73,6 @@ class ZipTricks::SizeEstimator
      self
    end

-   # Will be phased out in ZipTricks 5.x
-   alias_method :add_compressed_entry, :add_deflated_entry
-
    # Add an empty directory to the archive.
    #
    # @param dirname [String] the name of the directory
@@ -27,7 +27,7 @@ class ZipTricks::StreamCRC32

    # Creates a new streaming CRC32 calculator
    def initialize
-     @crc = Zlib.crc32('')
+     @crc = Zlib.crc32
    end

    # Append data to the CRC32. Updates the contained CRC32 value in place.
@@ -35,7 +35,7 @@ class ZipTricks::StreamCRC32
    # @param blob[String] the string to compute the CRC32 from
    # @return [self]
    def <<(blob)
-     @crc = Zlib.crc32_combine(@crc, Zlib.crc32(blob), blob.bytesize)
+     @crc = Zlib.crc32(blob, @crc)
      self
    end
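The CRC32 change swaps `Zlib.crc32_combine` for the two-argument form of `Zlib.crc32`, which continues a running checksum. A short sketch with made-up chunks, using only the standard `zlib` library, checks that both approaches agree with a checksum over the whole input.

```ruby
require 'zlib'

chunks = ['first chunk, ', 'second chunk, ', 'third chunk'] # illustrative data

# New approach: feed the previous value in as the starting CRC
running = Zlib.crc32 # with no arguments this is the initial CRC value
chunks.each { |chunk| running = Zlib.crc32(chunk, running) }

# Old approach: checksum every chunk separately, then combine the two values
combined = chunks.reduce(Zlib.crc32('')) do |crc, chunk|
  Zlib.crc32_combine(crc, Zlib.crc32(chunk), chunk.bytesize)
end

whole = Zlib.crc32(chunks.join)
```

The running form makes one zlib call per chunk instead of two, which is a plausible source of the speed-up mentioned in the 5.4.0 changelog entry.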