zip_tricks 5.1.0 → 5.4.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 983b83d36f88a25db8276a74e45d506f801ac3ecdd64197fd6fbc98bdb39cb03
- data.tar.gz: c74a8a3936f4afb1f71ad7775025b73ccd04d2db309df8942a20ff64e82cea81
+ metadata.gz: 05e8eea8ecf1ad0b9b9cb132c54dde1abd60dc003afd1e5a1ce989786ce89608
+ data.tar.gz: fbdc2172fc3becefa4dd8720713acb11d838900465ac3cb42de38fec76fd0d90
  SHA512:
- metadata.gz: 2b6eed391cd16c4077003535b0d7fb52e6930f351b230d31ba68aab2f05905df3987565aad76741083e888a91d054eaff3beab507117088b812b6c95f9ecf4f3
- data.tar.gz: 9c4ad53ef462d28f8cdffcc439200788cb260e9d1b35ee231aed3b7979630d0c31881db95198a944c375c836678063e0db187fbfb3c2672819ad5e06d1023dc9
+ metadata.gz: 449c59e898d2b54a089b60d7aebe7633d9f65ad64a5ce014a7a44e1683f6d6cec4997f732fd06746edeec96594d5041b68efcf19571ef92c9798acc05ffda7bd
+ data.tar.gz: 5ed26109e12373acfb9866531ef03c4302141bbb8a312f759ff768295c8d7926f650745f3f2624ad652333e57cd5bc5f98f932ace41504aeca45a668ef7a4f4a
@@ -11,3 +11,5 @@ Style/GlobalVars:
  - qa/*.rb
  - spec/spec_helper.rb
  - spec/support/zip_inspection.rb
+ Layout/IndentHeredoc:
+ Enabled: false
@@ -4,8 +4,5 @@ rvm:
  - jruby-9.0
  sudo: false
  cache: bundler
- matrix:
- allow_failures:
- - rvm: jruby-9.0
  script:
  - bundle exec rake
@@ -1,3 +1,39 @@
+ ## 5.4.0
+
+ * Use block form for zlib Deflater calls to conserve memory
+ * Do not change string encoding in writer wrappers (avoid extra work)
+ * Fix a zlib deflater object being leaked per archived file
+ * Speed up streaming CRC32 computation
+ * When running tests, assign the port for the Puma server dynamically
+ * Reduce string allocations in the block deflate spec
+ * Make sure RemoteUncap specs run under JRuby correctly
+ * Replace Rails::Live streaming with iterable body streaming to avoid issues with Rails::Live across the board
+ * Remove `qa/` directory and scripts, as the tests for the library proper should now be sufficient
+ * Fix some documentation and sample code omissions and inconsistencies.
+
+ ## 5.3.1
+
+ * Fix extended timestamp timestamp value encoding. Previously we would use an incorrect encoding for the timestamp value, which would output correct but nonsensical timestamps. The pack specifier is now changed to output the correct value.
+
+ ## 5.3.0
+
+ * Raise in `Streamer#close` when the IO offset of the Streamer does not match the size of the written entries. This is a situation which
+ can occur if one adds the local headers, writes the bodies of the files to the socket/output directly, and forgets to adjust the internal
+ Streamer offset. The unadjusted offset would then produce incorrect values in both the local headers which come after the missing
+ offset adjustment _and_ in the central directory headers. Some ZIP unarchivers are able to recover from this (ones that read
+ files "straight-ahead" but others aren't - if the ZIP unarchiver uses central directory entries it would be using incorrect offsets.
+ Instead of producing an invalid ZIP, raise an exception which explains what happened and how it can be resolved.
+
+ ## 5.2.0
+
+ * Remove `Streamer#add_compressed_entry` and `SizeEstimator#add_compressed_entry`
+
+ ## 5.1.1
+
+ * Fix extended timestamp extra field output. The first bit of the flag would be set instead of the last bit of
+ the flag, which made it impossible for Rubyzip to read the timestamp of the entry - and it would also make
+ the extra field useless for most reading applications.
+
  ## 5.1.0
 
  * Slightly rework `RemoteIO` and `RemoteUncap` and make sure they work correctly by spinning up a test webserver
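The first and third 5.4.0 entries concern how the zlib deflater is driven. The sketch below is not the zip_tricks code. It relies only on Ruby's standard `zlib` library, in which `Zlib::Deflate#deflate` and `#finish` hand compressed output to a block when one is given, and it closes the deflater in an `ensure` so that no native stream is left open per compressed entry. The helper name `deflate_chunks` is hypothetical. Raw deflate (negative window bits) is used because ZIP entries store deflate data without a zlib header.

```ruby
require 'zlib'

# Hypothetical helper: compresses an enumerable of string chunks into `out`
# (anything that responds to `<<`) as a raw deflate stream, like a ZIP entry body.
def deflate_chunks(chunks, out)
  # Negative window bits => raw deflate, no zlib header or trailer
  deflater = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, -Zlib::MAX_WBITS)
  chunks.each do |chunk|
    # Block form: compressed output is yielded in pieces instead of
    # being accumulated and returned as one string
    deflater.deflate(chunk) { |compressed| out << compressed }
  end
  deflater.finish { |compressed| out << compressed }
  out
ensure
  # Release the native zlib stream even if writing raised
  deflater.close if deflater && !deflater.closed?
end

compressed = deflate_chunks(['hello '] * 1000, ''.b)
```

Closing the stream explicitly, rather than waiting for garbage collection, is the kind of fix the "deflater object being leaked per archived file" entry points at.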
@@ -25,16 +61,16 @@
 
  ## 4.7.4
 
- * Use a single fixed capacity string in StreamCRC32.from_io to avoid unnecessary allocations
+ * Use a single fixed capacity string in `StreamCRC32.from_io` to avoid unnecessary allocations
  * Fix a few tests that were calling out to external binaries
 
  ## 4.7.3
 
- * Fix RemoteUncap#request_object_size to function correctly
+ * Fix `RemoteUncap#request_object_size` to function correctly
 
  ## 4.7.2
 
- * Relax bundler dependency so that both 1.x and 2.x are supported cleanly
+ * Relax bundler dependency so that both bundler 1.x and 2.x are supported cleanly
 
  ## 4.7.1
 
data/README.md CHANGED
@@ -24,11 +24,11 @@ to [32 bit sizes.](https://github.com/jruby/jruby/issues/3817)
 
  ## Diving in: send some large CSV reports from Rails
 
- The easiest is to use the Rails' built-in streaming feature:
+ The easiest is to include the `ZipTricks::RailsStreaming` module into your
+ controller.
 
  ```ruby
  class ZipsController < ActionController::Base
- include ActionController::Live # required for streaming
  include ZipTricks::RailsStreaming
 
  def download
@@ -49,6 +49,10 @@ class ZipsController < ActionController::Base
  end
  ```
 
+ If you want some more conveniences you can also use [zipline](https://github.com/fringd/zipline) which
+ will automatically process and stream attachments (Carrierwave, Shrine, ActiveStorage) and remote objects
+ via HTTP.
+
  ## Create a ZIP file without size estimation, compress on-the-fly during writes
 
  Basic use case is compressing on the fly. Some data will be buffered by the Zlib deflater, but
@@ -84,7 +88,7 @@ body = ZipTricks::RackBody.new do | zip |
  File.open('novel.txt', 'rb'){|source| IO.copy_stream(source, sink) }
  end
  end
- [200, {'Transfer-Encoding' => 'chunked'}, body]
+ [200, {}, body]
  ```
 
  ## Send a ZIP file of known size, with correct headers
@@ -174,9 +178,12 @@ that have not been formally verified (ours hasn't been).
  * Commit and push until you are happy with your contribution.
  * Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
  * Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or is otherwise necessary, that is fine, but please isolate to its own commit so I can cherry-pick around it.
- * If you alter the `ZipWriter`, please take the time to run the test in the `qa/` directory. Ensure that the generated (large) files open manually - see README_QA for more.
 
- ## Copyright
+ ## Copyright and license
+
+ Copyright (c) 2020 WeTransfer.
 
- Copyright (c) 2019 WeTransfer. `zip_tricks` is distributed under the conditions of the [Hippocratic License](https://firstdonoharm.dev/version/1/2/license.html)
- - See LICENSE.txt for further details.
+ `zip_tricks` is distributed under the conditions of the [Hippocratic License](https://firstdonoharm.dev/version/1/2/license.html)
+ See LICENSE.txt for further details. If this license is not acceptable for your use case we still maintain the 4.x version tree
+ which remains under the MIT license, see https://rubygems.org/gems/zip_tricks/versions for more information.
+ Note that we only backport some performance optimizations and crucial bugfixes but not the new features to that tree.
@@ -1,11 +1,25 @@
  # frozen_string_literal: true
 
- # Stashes a block given by the Rack webserver when calling each() on a body, and calls
- # that block every time it is written to using :<< (shovel). Poses as an IO for rubyzip.
-
+ # Acts as a converter between callers which send data to the `#<<` method (such as all the ZipTricks
+ # writer methods, which push onto anything), and a given block. Every time `#<<` gets called on the BlockWrite,
+ # the block given to the constructor will be called with the same argument. ZipTricks uses this object
+ # when integrating with Rack and in the OutputEnumerator. Normally you wouldn't need to use it manually but
+ # you always can. BlockWrite will also ensure the binary string encoding is forced onto any string
+ # that passes through it.
+ #
+ # For example, you can create a Rack response body like so:
+ #
+ # class MyRackResponse
+ # def each
+ # writer = ZipTricks::BlockWrite.new {|chunk| yield(chunk) }
+ # writer << "Hello" << "world" << "!"
+ # end
+ # end
+ # [200, {}, MyRackResponse.new]
  class ZipTricks::BlockWrite
- # The block is the block given to each() of the Rack body, or other block you want
- # to receive the string chunks written by the zip compressor.
+ # Creates a new BlockWrite.
+ #
+ # @param block The block that will be called when this object receives the `<<` message
  def initialize(&block)
  @block = block
  end
@@ -17,26 +31,17 @@ class ZipTricks::BlockWrite
  end
  end
 
- # Every time this object gets written to, call the Rack body each() block
- # with the bytes given instead.
+ # Sends a string through to the block stored in the BlockWrite.
+ #
+ # @param buf[String] the string to write. Note that a zero-length String
+ # will not be forwarded to the block, as it has special meaning when used
+ # with chunked encoding (it indicates the end of the stream).
+ # @return self
  def <<(buf)
  # Zero-size output has a special meaning when using chunked encoding
  return if buf.nil? || buf.bytesize.zero?
 
- # Ensure we ALWAYS write in binary encoding.
- encoded =
- if buf.encoding != Encoding::BINARY
- # If we got a frozen string we can't force_encoding on it
- begin
- buf.force_encoding(Encoding::BINARY)
- rescue
- buf.dup.force_encoding(Encoding::BINARY)
- end
- else
- buf
- end
-
- @block.call(encoded)
+ @block.call(buf.b)
  self
  end
  end
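The new `BlockWrite#<<` forwards `buf.b` instead of calling `force_encoding` in place. `String#b` returns a binary-encoded copy, so frozen strings need no special casing and the caller's string keeps its own encoding. The class below, `ChunkForwarder`, is a hypothetical standalone stand-in, not the gem's class. Unlike the hunk above, it returns `self` for empty input too, so chained `<<` calls keep working (a bare `return` evaluates to `nil`).

```ruby
# Hypothetical stand-in for ZipTricks::BlockWrite, for illustration only
class ChunkForwarder
  def initialize(&block)
    @block = block
  end

  def <<(buf)
    # Skip nil and empty strings: an empty chunk terminates chunked transfer encoding
    return self if buf.nil? || buf.bytesize.zero?

    # String#b returns a binary (ASCII-8BIT) copy; the original string,
    # frozen or not, is left untouched
    @block.call(buf.b)
    self
  end
end

received = []
writer = ChunkForwarder.new { |chunk| received << chunk }
source = 'héllo'.freeze
writer << source << '' << '!'
```

After the last line, `received` holds two binary strings, and `source` is still a frozen UTF-8 string.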
@@ -19,7 +19,7 @@ require 'stringio'
  # ## Usage
  #
  # File.open('zipfile.zip', 'rb') do |f|
- # entries = FileReader.read_zip_structure(f)
+ # entries = ZipTricks::FileReader.read_zip_structure(io: f)
  # entries.each do |e|
  # File.open(e.filename, 'wb') do |extracted_file|
  # ex = e.extractor_from(f)
@@ -82,7 +82,9 @@ class ZipTricks::FileReader
 
  private_constant :StoredReader, :InflatingReader
 
- # Represents a file within the ZIP archive being read
+ # Represents a file within the ZIP archive being read. This is different from
+ # the Entry object used in Streamer for ZIP writing, since during writing more
+ # data can be kept in memory for immediate use.
  class ZipEntry
  # @return [Fixnum] bit-packed version signature of the program that made the archive
  attr_accessor :made_by
@@ -279,7 +281,7 @@ class ZipTricks::FileReader
  seek(io, next_local_header_offset) # Seek to the next entry, and raise if seek is impossible
  end
  entries
- rescue ReadError
+ rescue ReadError, RangeError # RangeError is raised if offset exceeds int32/int64 range
  log do
  'Got a read/seek error after reaching %<cur_offset>d, no more entries can be recovered' %
  {cur_offset: cur_offset}
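The added `RangeError` rescue covers seeks to offsets that cannot be converted to a native integer, which can happen when a damaged header produces a huge offset. The hypothetical `try_seek` helper below shows this with `StringIO`, whose `seek` raises `RangeError` for an Integer too large for a C `long`. It illustrates the error class only and is not the FileReader code.

```ruby
require 'stringio'

# Hypothetical helper: true if the seek succeeded, false if the offset
# could not even be represented as a native integer.
def try_seek(io, offset)
  io.seek(offset, IO::SEEK_SET)
  true
rescue RangeError
  false
end

io = StringIO.new('PK'.b)
try_seek(io, 1)     # => true
try_seek(io, 2**64) # => false
```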
@@ -363,7 +365,7 @@ class ZipTricks::FileReader
  # (read starting at this offset to get the data).
  #
  # @param io[#seek, #read] an IO-ish object the ZIP file can be read from
- # @param local_header_offset[Fixnum] absolute offset (0-based) where the
+ # @param local_file_header_offset[Fixnum] absolute offset (0-based) where the
  # local file header is supposed to begin @return [Fixnum] absolute offset
  # (0-based) of where the compressed data begins for this file within the ZIP
  def get_compressed_data_offset(io:, local_file_header_offset:)
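`get_compressed_data_offset` returns the offset at which an entry's data starts. Under the ZIP format (PKWARE APPNOTE), a local file header has 30 fixed bytes, followed by the filename and then the extra field. The lengths of those two fields are stored as little-endian 16-bit values at byte offsets 26 and 28 of the header. The sketch below computes the same quantity in plain Ruby. The method and constant names are hypothetical, and this is not the FileReader implementation.

```ruby
require 'stringio'

LOCAL_FILE_HEADER_SIGNATURE = 0x04034b50
LOCAL_FILE_HEADER_FIXED_SIZE = 30

# Hypothetical helper: absolute offset at which an entry's (possibly compressed)
# data begins, given the absolute offset of its local file header.
def compressed_data_offset(io, local_file_header_offset)
  io.seek(local_file_header_offset, IO::SEEK_SET)
  header = io.read(LOCAL_FILE_HEADER_FIXED_SIZE)
  if header.nil? || header.bytesize < LOCAL_FILE_HEADER_FIXED_SIZE
    raise ArgumentError, 'truncated local file header'
  end
  raise ArgumentError, 'bad local file header signature' unless header.unpack1('V') == LOCAL_FILE_HEADER_SIGNATURE

  # Filename length and extra field length: two little-endian uint16 at bytes 26..29
  filename_length, extra_length = header.byteslice(26, 4).unpack('vv')
  local_file_header_offset + LOCAL_FILE_HEADER_FIXED_SIZE + filename_length + extra_length
end

# Fake local header for "a.txt" with a 4-byte extra field, preceded by 10 junk bytes
header = [LOCAL_FILE_HEADER_SIGNATURE, 20, 0, 0, 0, 0, 0, 0, 0, 5, 4].pack('VvvvvvVVVvv')
zip = StringIO.new(('x' * 10).b + header + 'a.txt'.b + ("\0" * 4).b + 'DATA'.b)
compressed_data_offset(zip, 10) # => 49 (10 + 30 + 5 + 4)
```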
@@ -375,7 +377,7 @@ class ZipTricks::FileReader
  # Parse an IO handle to a ZIP archive into an array of Entry objects, reading from the end
  # of the IO object.
  #
- # @see {#read_zip_structure}
+ # @see #read_zip_structure
  # @param options[Hash] any options the instance method of the same name accepts
  # @return [Array<ZipEntry>] an array of entries within the ZIP being parsed
  def self.read_zip_structure(**options)
@@ -385,7 +387,7 @@ class ZipTricks::FileReader
  # Parse an IO handle to a ZIP archive into an array of Entry objects, reading from the start of
  # the file and parsing local file headers one-by-one
  #
- # @see {#read_zip_straight_ahead}
+ # @see #read_zip_straight_ahead
  # @param options[Hash] any options the instance method of the same name accepts
  # @return [Array<ZipEntry>] an array of entries within the ZIP being parsed
  def self.read_zip_straight_ahead(**options)
@@ -4,7 +4,7 @@
  # write operations, but want to discard the data (like when
  # estimating the size of a ZIP)
  module ZipTricks::NullWriter
- # @param data[String] the data to write
+ # @param _[String] the data to write
  # @return [self]
  def self.<<(_)
  self
@@ -32,6 +32,10 @@
  # conflict is avoided. This is not possible to apply to directories, because when one of the
  # path components is reused in multiple filenames it means those entities should end up in
  # the same directory (subdirectory) once the archive is opened.
+ #
+ # The `PathSet` keeps track of entries as they get added using 2 Sets (cheap presence checks),
+ # one for directories and one for files. It will raise a `Conflict` exception if there are
+ # files clobbering one another, or in case files collide with directories.
  class ZipTricks::PathSet
  class Conflict < StandardError
  end
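The `PathSet` comment describes two sets: one for files and one for the directories they imply. The class below is a simplified, hypothetical take on that idea. Its name and method are illustrative and do not reproduce the `ZipTricks::PathSet` API. Like the real class, it raises on a file clobbering another file and on a file colliding with a directory.

```ruby
require 'set'

# Hypothetical, simplified two-set path tracker (not ZipTricks::PathSet)
class TinyPathSet
  Conflict = Class.new(StandardError)

  def initialize
    @files = Set.new
    @dirs = Set.new
  end

  def add_file_path(path)
    parts = path.split('/')
    # Every proper prefix of the path is an implied directory ("a/b/c" => "a", "a/b")
    parent_dirs = (1...parts.size).map { |n| parts.first(n).join('/') }

    raise Conflict, "file #{path} was already added" if @files.include?(path)
    raise Conflict, "file #{path} would clobber a directory" if @dirs.include?(path)
    clashing = parent_dirs.find { |dir| @files.include?(dir) }
    raise Conflict, "directory #{clashing} would clobber a file" if clashing

    @dirs.merge(parent_dirs)
    @files << path
    self
  end
end

set = TinyPathSet.new
set.add_file_path('docs/a.txt').add_file_path('docs/b.txt')
# set.add_file_path('docs') would raise TinyPathSet::Conflict
```

Both checks are plain `Set#include?` lookups, which is why the comment calls them cheap presence checks.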
@@ -6,17 +6,16 @@ module ZipTricks::RailsStreaming
  # Opens a {ZipTricks::Streamer} and yields it to the caller. The output of the streamer
  # gets automatically forwarded to the Rails response stream. When the output completes,
  # the Rails response stream is going to be closed automatically.
+ # @param zip_streamer_options[Hash] options that will be passed to the Streamer.
+ # See {ZipTricks::Streamer#initialize} for the full list of options.
  # @yield [Streamer] the streamer that can be written to
- def zip_tricks_stream
+ # @return [ZipTricks::OutputEnumerator] The output enumerator assigned to the response body
+ def zip_tricks_stream(**zip_streamer_options, &zip_streaming_blk)
  # Set a reasonable content type
  response.headers['Content-Type'] = 'application/zip'
  # Make sure nginx buffering is suppressed - see https://github.com/WeTransfer/zip_tricks/issues/48
  response.headers['X-Accel-Buffering'] = 'no'
- # Create a wrapper for the write call that quacks like something you
- # can << to, used by ZipTricks
- w = ZipTricks::BlockWrite.new { |chunk| response.stream.write(chunk) }
- ZipTricks::Streamer.open(w) { |z| yield(z) }
- ensure
- response.stream.close
+ response.sending_file = true
+ self.response_body = ZipTricks::OutputEnumerator.new(**zip_streamer_options, &zip_streaming_blk)
  end
  end
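The reworked `zip_tricks_stream` no longer writes to `response.stream`. Instead it assigns an object to `response_body`, and the server pulls chunks from that object by calling `each`, as the Rack SPEC requires of response bodies. `ZipTricks::OutputEnumerator` itself is not part of this diff. The sketch below only illustrates the general iterable-body pattern in plain Ruby, using a hypothetical `IterableBody` class whose producer block writes with `<<`.

```ruby
# Hypothetical iterable response body: `each` runs the producer block and
# forwards everything it writes with `<<` to the block given to `each`.
class IterableBody
  # Adapter that turns `sink << chunk` into a call of the each() block
  Sink = Struct.new(:target) do
    def <<(chunk)
      # Empty chunks are skipped, since an empty chunk ends chunked encoding
      target.call(chunk) unless chunk.empty?
      self
    end
  end

  def initialize(&producer)
    @producer = producer
  end

  def each(&blk)
    # Without a block, return an Enumerator so Enumerable methods can be used
    return enum_for(:each) unless blk

    @producer.call(Sink.new(blk))
  end
end

body = IterableBody.new { |out| out << 'PK' << '' << 'data' }
body.each.to_a # => ["PK", "data"]
```

Nothing is produced until `each` is called, so the web server decides when and how fast the archive is generated.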
@@ -73,9 +73,6 @@ class ZipTricks::SizeEstimator
  self
  end
 
- # Will be phased out in ZipTricks 5.x
- alias_method :add_compressed_entry, :add_deflated_entry
-
  # Add an empty directory to the archive.
  #
  # @param dirname [String] the name of the directory
@@ -27,7 +27,7 @@ class ZipTricks::StreamCRC32
 
  # Creates a new streaming CRC32 calculator
  def initialize
- @crc = Zlib.crc32('')
+ @crc = Zlib.crc32
  end
 
  # Append data to the CRC32. Updates the contained CRC32 value in place.
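The `StreamCRC32` hunks replace `Zlib.crc32('')` with `Zlib.crc32` (both return the initial value 0). They also replace `Zlib.crc32_combine` with seeding `Zlib.crc32` with the running value, which avoids computing a separate CRC per chunk and then combining it. The snippet below uses only standard-library Ruby, not the gem's class. It checks that both approaches agree with a one-shot CRC over the whole input.

```ruby
require 'zlib'

data = 'The quick brown fox jumps over the lazy dog. ' * 200
chunks = data.scan(/.{1,64}/m)

# New approach: feed each chunk with the running CRC as the seed
seeded = chunks.inject(Zlib.crc32) { |crc, chunk| Zlib.crc32(chunk, crc) }

# Old approach: CRC each chunk on its own, then combine with the running value
combined = chunks.inject(Zlib.crc32) do |crc, chunk|
  Zlib.crc32_combine(crc, Zlib.crc32(chunk), chunk.bytesize)
end

seeded == Zlib.crc32(data) # => true
combined == seeded         # => true
```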
@@ -35,7 +35,7 @@ class ZipTricks::StreamCRC32
  # @param blob[String] the string to compute the CRC32 from
  # @return [self]
  def <<(blob)
- @crc = Zlib.crc32_combine(@crc, Zlib.crc32(blob), blob.bytesize)
+ @crc = Zlib.crc32(blob, @crc)
  self
  end
 
@@ -91,6 +91,7 @@ class ZipTricks::Streamer
  InvalidOutput = Class.new(ArgumentError)
  Overflow = Class.new(StandardError)
  UnknownMode = Class.new(StandardError)
+ OffsetOutOfSync = Class.new(StandardError)
 
  private_constant :DeflatedWriter, :StoredWriter, :STORED, :DEFLATED
 
@@ -130,7 +131,7 @@ class ZipTricks::Streamer
  # end
  #
  # @param kwargs_for_new [Hash] keyword arguments for {Streamer.new}
- # @return [Enumerator] the enumerator you can read bytestrings of the ZIP from using `each`
+ # @return [ZipTricks::OutputEnumerator] the enumerator you can read bytestrings of the ZIP from by calling `each`
  def self.output_enum(**kwargs_for_new, &zip_streamer_block)
  ZipTricks::OutputEnumerator.new(**kwargs_for_new, &zip_streamer_block)
  end
@@ -149,7 +150,6 @@ class ZipTricks::Streamer
  @dedupe_filenames = auto_rename_duplicate_filenames
  @out = ZipTricks::WriteAndTell.new(stream)
  @files = []
- @local_header_offsets = []
  @path_set = ZipTricks::PathSet.new
  @writer = writer
  end
@@ -213,9 +213,6 @@ class ZipTricks::Streamer
  @out.tell
  end
 
- # Will be phased out in ZipTricks 5.x
- alias_method :add_compressed_entry, :add_deflated_entry
-
  # Writes out the local header for an entry (file in the ZIP) that is using
  # the stored storage model (is stored as-is).
  # Once this method is called, the `<<` method has to be called one or more
@@ -363,14 +360,16 @@ class ZipTricks::Streamer
  #
  # @return [Integer] the offset the output IO is at after closing the archive
  def close
+ # Make sure offsets are in order
+ verify_offsets!
+
  # Record the central directory offset, so that it can be written into the EOCD record
  cdir_starts_at = @out.tell
 
  # Write out the central directory entries, one for each file
- @files.each_with_index do |entry, i|
- header_loc = @local_header_offsets.fetch(i)
+ @files.each do |entry|
  @writer.write_central_directory_file_header(io: @out,
- local_file_header_location: header_loc,
+ local_file_header_location: entry.local_header_offset,
  gp_flags: entry.gp_flags,
  storage_mode: entry.storage_mode,
  compressed_size: entry.compressed_size,
@@ -423,15 +422,40 @@ class ZipTricks::Streamer
  last_entry.compressed_size = compressed_size
  last_entry.uncompressed_size = uncompressed_size
 
+ offset_before_data_descriptor = @out.tell
  @writer.write_data_descriptor(io: @out,
  crc32: last_entry.crc32,
  compressed_size: last_entry.compressed_size,
  uncompressed_size: last_entry.uncompressed_size)
+ last_entry.bytes_used_for_data_descriptor = @out.tell - offset_before_data_descriptor
+
  @out.tell
  end
 
  private
 
+ def verify_offsets!
+ # We need to check whether the offsets noted for the entries actually make sense
+ computed_offset = @files.map(&:total_bytes_used).inject(0, &:+)
+ actual_offset = @out.tell
+ if computed_offset != actual_offset
+ message = <<-EMS
+ The offset of the Streamer output IO is out of sync with the expected value. All entries written so far,
+ including their compressed bodies, local headers and data descriptors, add up to a certain offset,
+ but this offset does not match the actual offset of the IO.
+
+ Entries add up to #{computed_offset} bytes and the IO is at #{actual_offset} bytes.
+
+ This can happen if you write local headers for an entry, write the "body" of the entry directly to the IO
+ object which is your destination, but do not adjust the offset known to the Streamer object. To adjust
+ the offfset you need to call `Streamer#simulate_write(body_size)` after outputting the entry. Otherwise
+ the local header offsets of the entries you write are going to be incorrect and some ZIP applications
+ are going to have problems opening your archive.
+ EMS
+ raise OffsetOutOfSync, message
+ end
+ end
+
  def add_file_and_write_local_header(
  filename:,
  modification_time:,
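`verify_offsets!` compares the sum of the bytes each entry accounts for with the current position of the output IO. The sketch below reproduces that bookkeeping outside the gem. It uses a hypothetical `EntryRecord` struct whose `total_bytes_used` adds up the local header, the compressed body and the data descriptor, as the error message describes. It triggers the failure mode from the 5.3.0 changelog, where body bytes are written straight to the IO without being recorded. In the gem, the remedy named in the message is `Streamer#simulate_write(body_size)`.

```ruby
require 'stringio'

OffsetOutOfSync = Class.new(StandardError)

# Hypothetical per-entry bookkeeping, loosely modeled on the fields this diff adds to Entry
EntryRecord = Struct.new(:local_header_offset, :bytes_used_for_local_header,
                         :compressed_size, :bytes_used_for_data_descriptor) do
  def total_bytes_used
    bytes_used_for_local_header + compressed_size + bytes_used_for_data_descriptor
  end
end

# Raises if the recorded entries do not add up to the current IO position
def verify_offsets!(entries, io)
  computed_offset = entries.map(&:total_bytes_used).inject(0, :+)
  actual_offset = io.pos
  return if computed_offset == actual_offset

  raise OffsetOutOfSync,
        "entries add up to #{computed_offset} bytes but the IO is at #{actual_offset} bytes"
end

io = StringIO.new(''.b)
io.write('LOCALHDR') # stand-in for an 8-byte local header
io.write('hello')    # 5 bytes of stored body
entry = EntryRecord.new(0, 8, 5, 0)
verify_offsets!([entry], io) # passes: 8 + 5 + 0 == 13

io.write('xyz') # body bytes written without updating the bookkeeping
begin
  verify_offsets!([entry], io)
rescue OffsetOutOfSync => e
  puts e.message # prints "entries add up to 13 bytes but the IO is at 16 bytes"
end
```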
@@ -464,16 +488,18 @@ class ZipTricks::Streamer
  uncompressed_size = 0
  end
 
+ local_header_starts_at = @out.tell
+
  e = Entry.new(filename,
  crc32,
  compressed_size,
  uncompressed_size,
  storage_mode,
  modification_time,
- use_data_descriptor)
-
- @files << e
- @local_header_offsets << @out.tell
+ use_data_descriptor,
+ _local_file_header_offset = local_header_starts_at,
+ _bytes_used_for_local_header = 0,
+ _bytes_used_for_data_descriptor = 0)
 
  @writer.write_local_file_header(io: @out,
  gp_flags: e.gp_flags,
@@ -483,6 +509,9 @@ class ZipTricks::Streamer
  mtime: e.mtime,
  filename: e.filename,
  storage_mode: e.storage_mode)
+ e.bytes_used_for_local_header = @out.tell - e.local_header_offset
+
+ @files << e
  end
 
  def remove_backslash(filename)