
zip_tricks 2.8.0 → 2.8.1

Files changed (7)

checksums.yaml +4 -4
data/IMPLEMENTATION_DETAILS.md +106 -0
data/lib/zip_tricks/microzip.rb +21 -49
data/lib/zip_tricks.rb +1 -1
data/spec/zip_tricks/microzip_spec.rb +345 -35
data/zip_tricks.gemspec +4 -3
metadata +3 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 84302e70bc873418b8a458456a75e0da7b5bc67d
-  data.tar.gz: 06f81fc860d5cf77fdbd4b570602ae4984297d43
+  metadata.gz: 05fb68904433a8f06dd50d668c29d3c0cf8780f2
+  data.tar.gz: af0dfe83697cc9cd6bb4ed1a8714cffbb3ceedd7
 SHA512:
-  metadata.gz: 932fe6e3a095f43996505c642fce11009efba4298302e958f5fdf7649cd3ca9fd28ab1de6a33d1e20be2955a7efd58b4d432486d3004ab70437a55413c27934d
-  data.tar.gz: b9d43dbf16156cd3c877a8353dd0dd4a34e4c8b2e43664e34c4c3f2562c3a9937390bf398b0278449bf338f89337c348f30da637414e61b17eab58a939b05fc5
+  metadata.gz: 4b583b966eb87b502f428bd43d7bd9d9c800112bb3c256aa63a8cda6dcfb02052cf653de2df4022e23f49db37e157d81d336b0db8c8527c88f6e2de2e8974172
+  data.tar.gz: 22f263d209256d28c552c6fe04b9981d1181eb7f062233f7bc407146987faf2d1f14db9b8902ddf89d4038680839971291c5e47d8b57cbc8ed248a9360bf431b

data/IMPLEMENTATION_DETAILS.md ADDED

@@ -0,0 +1,106 @@
+# Implementation details
+The ZipTricks streaming implementation is designed around the following requirements:
+* Only ahead-writes (no IO seek or rewind)
+* Automatic switching to Zip64 as the files get written (no IO seeks), but not requiring Zip64 support if the archive can do without
+* Make use of the fact that CRC32 checksums and the sizes of the files (compressed _and_ uncompressed) are known upfront
+It strives to be compatible with the following unzip programs _at the minimum:_
+* OSX - builtin ArchiveUtility (except the Zip64 support when files larger than 4GB are in the archive)
+* OSX - The Unarchiver, at least 3.10.1
+* Windows 7 - built-in Explorer zip browser (except for Unicode filenames which it just doesn't support)
+* Windows 7 - 7Zip 9.20
+Below is the list of _specific_ decisions taken when writing the implementation, with an explanation for each.
+We specifically _omit_ a number of things that we could do, but that are not necessary to satisfy our objectives.
+The omissions are _intentional_ since we do not want to include things that we merely _assume_ work, or things
+that work only for one obscure unarchiver in one obscure case (like WinRAR with Chinese filenames).
+## Data descriptors (postfix CRC32/file sizes)
+Data descriptors permit you to generate "postfix" ZIP files: you write the local file header without having to
+know the CRC32 and the file size upfront, then write the compressed file data, and only then, once you know the CRC32
+and the compressed and uncompressed sizes, write them into a data descriptor that follows the file data.
+The streamer does _not_ use data descriptors, because their use [is problematic](https://github.com/thejoshwolfe/yazl/issues/13)
+with the 7Zip version that we want to support. Or rather - not the use of data descriptors themselves, but the use of the GP flag
+bit 3 that trips up that version of 7Zip. If we were to use data descriptors, we would have to up the minimum supported version
+of 7Zip.
+That means, in turn, that **to use the ZipTricks streamer you have to know the CRC32 and the sizes of the compressed/uncompressed
+file upfront.** So you have to precompute them in some way. To do that, you can use `BlockDeflate` to precompress the file in
+parallel, and `StreamCRC32` to compute the CRC checksum, before feeding them to the ZIP writer.
+This approach might be reconsidered in the future.
+For more info see https://github.com/thejoshwolfe/yazl#general-purpose-bit-flag
+## Zip64 support
+Zip64 support switches on _by itself_, automatically, when _any_ of the following conditions is met:
+* The start of the central directory lies beyond the 4GB limit
+* The ZIP archive has more than 65535 files added to it
+* Any entry is present whose compressed _or_ uncompressed size is above 4GB
+When writing out local file headers, the Zip64 extra field is written (and the related changes to the standard
+fields are made) _only_ if one of the file sizes is larger than 4GB. Otherwise the Zip64 extra will _only_ be
+written in the central directory entry, but not in the local file header.
+This has to do with the fact that otherwise we would write Zip64 extra fields for all local file headers,
+regardless of whether the file actually requires Zip64 or not. That might impede some older tools from reading
+the archive, which is a problem you don't want to have if your archive otherwise fits perfectly below all
+the Zip64 thresholds.
+To be compatible with Windows 7 built-in tools, the Zip64 extra field _must_ be written as _the first_ extra
+field; any other extra fields should come after.
+## International filename support and the Info-ZIP extra field
+If a diacritic-containing character (such as å) does fit into the DOS-437
+codepage, it should be encodable as such. This would, in theory, let older Windows tools
+decode the filename correctly. However, this kills the filename decoding for the OSX builtin
+archive utility (it assumes the filename to be UTF-8, regardless). So if we allow filenames
+to be encoded in DOS-437, we _potentially_ have support in Windows but we upset everyone on Mac.
+If we just use UTF-8 and set the right EFS bit in general purpose flags, we upset Windows users
+because most of the Windows unarchive tools (at least the builtin ones) do not give a flying eff
+about the EFS support bit being set.
+Additionally, if we use Unarchiver on OSX (which is our recommended unpacker for large files),
+it will (very rightfully) ask us how we should decode each filename that does not have the EFS bit,
+but does contain something non-ASCII-decodable. This is horrible UX for users.
+So, basically, we have 2 choices, for filenames containing diacritics (for bona-fide UTF-8 you do not
+even get those choices, you _have_ to use UTF-8):
+* Make life easier for Windows users by setting stuff to DOS, not care about the standard _and_ make
+  most Mac users upset
+* Make life easy for Mac users and conform to the standard, and tell Windows users to get a _decent_
+  ZIP unarchiving tool.
+We are going with option 2, and this is well-thought-out. Trust me. If you want the crazytown
+filename encoding scheme that is described here http://stackoverflow.com/questions/13261347
+you can try this:
+   [Encoding::CP437, Encoding::ISO_8859_1, Encoding::UTF_8]
+We don't want no such thing, and sorry Windows users, you are going to need a decent unarchiver
+that honors the standard. Alas, alas.
+Additionally, the tests with the unarchivers we _do_ support have shown that including the InfoZIP
+extra field does not actually help any of them recognize the file name correctly. And the use of
+those fields for the UTF-8 filename, per spec, tells us we should not set the EFS bit - which ruins
+the unarchiving for all other solutions. As any other, this decision may be changed in the future.
+There are some interesting notes about the Info-ZIP/EFS combination here
+https://commons.apache.org/proper/commons-compress/zip.html
+## Directory support
+ZIP makes it possible to store empty directories (folders). For our purposes, however, we are going
+to store only the files. If you store a file, called, say, `docs/item.doc` then the unarchiver will
+automatically create the `docs` directory if it doesn't exist already. You can also store an entry
+with a length of 0 and set its external attributes to be an empty directory, but we do not need
+that functionality - so it is also omitted.

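The "Data descriptors" section of IMPLEMENTATION_DETAILS.md above requires the CRC32 and both sizes to be known before the local file header is written. A minimal sketch of precomputing them follows. It uses only Ruby's standard `Zlib` rather than the gem's `BlockDeflate` and `StreamCRC32` helpers (whose exact signatures are not shown in this diff), and `precompute_zip_entry_fields` is a hypothetical name introduced for illustration.

```ruby
require 'zlib'

# Hypothetical helper (not part of zip_tricks): computes the values a
# streaming ZIP writer needs before it can emit a local file header.
def precompute_zip_entry_fields(data)
  # Negative window bits produce a raw deflate stream (no zlib header),
  # which is the form stored in ZIP entries with storage mode 8.
  deflater = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, -Zlib::MAX_WBITS)
  compressed = deflater.deflate(data, Zlib::FINISH)
  deflater.close

  {
    crc32: Zlib.crc32(data),               # CRC32 of the *uncompressed* bytes
    uncompressed_size: data.bytesize,
    compressed_size: compressed.bytesize,
    compressed_body: compressed
  }
end

fields = precompute_zip_entry_fields('hello ' * 1000)
```

The resulting `crc32`, `compressed_size` and `uncompressed_size` are the kind of values passed to `add_local_file_header` in the specs further down; for large inputs the same computation would have to be done in chunks instead of on one in-memory string.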
data/lib/zip_tricks/microzip.rb CHANGED

@@ -13,6 +13,7 @@ class ZipTricks::Microzip
   DEFLATED = 8
   TooMuch = Class.new(StandardError)
+  PathError = Class.new(StandardError)
   DuplicateFilenames = Class.new(StandardError)
   UnknownMode = Class.new(StandardError)
@@ -42,21 +43,14 @@ class ZipTricks::Microzip
   C_v = 'v'.freeze
   C_Qe = 'Q<'.freeze
-  module Bytesize
-    def bytesize_of
-      ''.force_encoding(Encoding::BINARY).tap {|b| yield(b) }.bytesize
-    end
-  end
-  include Bytesize
   class Entry < Struct.new(:filename, :crc32, :compressed_size, :uncompressed_size, :storage_mode, :mtime)
-    include Bytesize
     def initialize(*)
       super
+      filename.force_encoding(Encoding::UTF_8)
+      @requires_efs_flag = !(filename.encode(Encoding::ASCII) rescue false)
       @requires_zip64 = (compressed_size > FOUR_BYTE_MAX_UINT || uncompressed_size > FOUR_BYTE_MAX_UINT)
-      if filename.bytesize > TWO_BYTE_MAX_UINT
-        raise TooMuch, "The given filename is too long to fit (%d bytes)" % filename.bytesize
-      end
+      raise TooMuch, "Filename is too long" if filename.bytesize > TWO_BYTE_MAX_UINT
+      raise PathError, "Paths in ZIP may only contain forward slashes (UNIX separators)" if filename.include?('\\')
     end
     def requires_zip64?
@@ -67,41 +61,8 @@ class ZipTricks::Microzip
     # bit (bit 11) which should be set if the filename is UTF8. If it is, we need to set the
     # bit so that the unarchiving application knows that the filename in the archive is UTF-8
     # encoded, and not some DOS default. For ASCII entries it does not matter.
-    #
-    # Now, strictly speaking, if a diacritic-containing character (such as å) does fit into the DOS-437
-    # codepage, it should be encodable as such. This would, in theory, let older Windows tools
-    # decode the filename correctly. However, this kills the filename decoding for the OSX builtin
-    # archive utility (it assumes the filename to be UTF-8, regardless). So if we allow filenames
-    # to be encoded in DOS-437, we _potentially_ have support in Windows but we upset everyone on Mac.
-    # If we just use UTF-8 and set the right EFS bit in general purpose flags, we upset Windows users
-    # because most of the Windows unarchive tools (at least the builtin ones) do not give a flying eff
-    # about the EFS support bit being set.
-    #
-    # Additionally, if we use Unarchiver on OSX (which is our recommended unpacker for large files),
-    # it will (very rightfully) ask us how we should decode each filename that does not have the EFS bit,
-    # but does contain something non-ASCII-decodable. This is horrible UX for users.
-    #
-    # So, basically, we have 2 choices, for filenames containing diacritics (for bona-fide UTF-8 you do not
-    # even get those choices, you _have_ to use UTF-8):
-    #
-    # * Make life easier for Windows users by setting stuff to DOS, not care about the standard _and_ make
-    #   most of Mac users upset
-    # * Make life easy for Mac users and conform to the standard, and tell Windows users to get a _decent_
-    #   ZIP unarchiving tool.
-    #
-    # We are going with option 2, and this is well-thought-out. Trust me. If you want the crazytown
-    # filename encoding scheme that is described here http://stackoverflow.com/questions/13261347
-    # you can try this:
-    #
-    #  [Encoding::CP437, Encoding::ISO_8859_1, Encoding::UTF_8]
-    #
-    # We don't want no such thing, and sorry Windows users, you are going to need a decent unarchiver
-    # that honors the standard. Alas, alas.
     def gp_flags_based_on_filename
-      filename.encode(Encoding::ASCII)
-      0b00000000000
-    rescue EncodingError
-      0b00000000000 | 0b100000000000
+      @requires_efs_flag ? (0b00000000000 | 0b100000000000) : 0b00000000000
     end
     def write_local_file_header(io)
@@ -212,9 +173,16 @@ class ZipTricks::Microzip
       io << [extra_size].pack(C_v)                        # extra field length              2 bytes
       io << [0].pack(C_v)                                 # file comment length             2 bytes
-      io << [0].pack(C_v)                                 # disk number start               2 bytes
-      io << [0].pack(C_v)                                 # internal file attributes        2 bytes
+      # For The Unarchiver < 3.11.1 this field has to be set to the overflow value if zip64 is used
+      # because otherwise it does not properly advance the pointer when reading the Zip64 extra field
+      # https://bitbucket.org/WAHa_06x36/theunarchiver/pull-requests/2/bug-fix-for-zip64-extra-field-parser/diff
+      if @requires_zip64
+        io << [TWO_BYTE_MAX_UINT].pack(C_v)               # disk number start               2 bytes
+      else
+        io << [0].pack(C_v)                               # disk number start               2 bytes
+      end
+      io << [0].pack(C_v)                                # internal file attributes        2 bytes
       io << [DEFAULT_EXTERNAL_ATTRS].pack(C_V)           # external file attributes        4 bytes
       if @requires_zip64
@@ -232,6 +200,10 @@ class ZipTricks::Microzip
     private
+    def bytesize_of
+      ''.force_encoding(Encoding::BINARY).tap {|b| yield(b) }.bytesize
+    end
     def to_binary_dos_time(t)
       (t.sec/2) + (t.min << 5) + (t.hour << 11)
     end
@@ -313,10 +285,10 @@ class ZipTricks::Microzip
                                                              # offset of start of central
                                                              # directory with respect to
       io << [start_of_central_directory].pack(C_Qe)          # the starting disk number        8 bytes
-                                                              # zip64 extensible data sector    (variable size)
+                                                             # zip64 extensible data sector    (variable size), blank for us
       # [zip64 end of central directory locator]
-      io << [0x07064b50].pack("V")                           # zip64 end of central dir locator
+      io << [0x07064b50].pack(C_V)                           # zip64 end of central dir locator
                                                              # signature                       4 bytes  (0x07064b50)
       io << [0].pack(C_V)                                    # number of the disk with the
                                                              # start of the zip64 end of

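The `Entry#initialize` changes above decide once, at construction time, whether the EFS flag (general purpose bit 11) is needed, and reject backslashes in paths. A standalone sketch of that decision is shown here under a few assumptions: `gp_flags_for` is a hypothetical name, it raises `ArgumentError` instead of the gem's `PathError`, it works on a copy of the string rather than mutating the argument, and it uses `String#ascii_only?` where the diff uses `encode(Encoding::ASCII)` with a `rescue`.

```ruby
EFS_FLAG = 0b100000000000 # general purpose bit 11, i.e. 2048

# Hypothetical helper: returns the general purpose flags for a filename.
def gp_flags_for(filename)
  # Treat the bytes as UTF-8 regardless of how the caller tagged the string,
  # so a UTF-8 name mislabelled as US-ASCII is still detected.
  name = filename.dup.force_encoding(Encoding::UTF_8)
  if name.include?('\\')
    raise ArgumentError, 'Paths in ZIP may only contain forward slashes (UNIX separators)'
  end
  # Pure ASCII names need no flag; anything else gets the EFS bit.
  name.ascii_only? ? 0 : EFS_FLAG
end

ascii_flags   = gp_flags_for('first-file.bin')
unicode_flags = gp_flags_for('файл.bin')
```

The value 2048 for a non-ASCII name matches what the specs below read back from the local file header.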
data/lib/zip_tricks.rb CHANGED

@@ -2,7 +2,7 @@ require 'zip'
 require 'very_tiny_state_machine'
 module ZipTricks
-  VERSION = '2.8.0'
+  VERSION = '2.8.1'
   # Require all the sub-components except myself
   Dir.glob(__dir__ + '/**/*.rb').sort.each {|p| require p unless p == __FILE__ }

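The specs in this diff assert a last mod file time of 28160 and a last mod file date of 18673 for `Time.utc(2016, 7, 17, 13, 48)`. The sketch below shows where those numbers come from. `to_binary_dos_time` is copied from the microzip.rb hunk above; `to_binary_dos_date` does not appear in this diff and is written here as an assumption, following the standard MS-DOS date layout (day in bits 0-4, month in bits 5-8, years since 1980 in bits 9-15).

```ruby
# Seconds are stored with 2-second resolution in bits 0-4,
# minutes in bits 5-10, hours in bits 11-15.
def to_binary_dos_time(t)
  (t.sec / 2) + (t.min << 5) + (t.hour << 11)
end

# Assumed counterpart for the date field (standard MS-DOS layout).
def to_binary_dos_date(t)
  t.day + (t.month << 5) + ((t.year - 1980) << 9)
end

mtime = Time.utc(2016, 7, 17, 13, 48)
dos_time = to_binary_dos_time(mtime) # (13 << 11) + (48 << 5) + 0 = 28160
dos_date = to_binary_dos_date(mtime) # (36 << 9) + (7 << 5) + 17  = 18673
```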
data/spec/zip_tricks/microzip_spec.rb CHANGED

@@ -1,28 +1,35 @@
 require_relative '../spec_helper'
+require_relative '../../testing/support'
 describe ZipTricks::Microzip do
   class ByteReader < Struct.new(:io)
     def read_2b
       io.read(2).unpack('v').first
     end
     def read_2c
       io.read(2).unpack('CC').first
     end
     def read_4b
       io.read(4).unpack('V').first
     end
     def read_8b
       io.read(8).unpack('Q<').first
     end
     def read_n(n)
       io.read(n)
     end
   end
+  class IOWrapper < ZipTricks::WriteAndTell
+    def read(n)
+      @io.read(n)
+    end
+  end
   it 'raises an exception if the filename is non-unique in the already existing set' do
     z = described_class.new
     z.add_local_file_header(io: StringIO.new, filename: 'foo.txt', crc32: 0, compressed_size: 0, uncompressed_size: 0, storage_mode: 0)
@@ -31,15 +38,23 @@ describe ZipTricks::Microzip do
     }.to raise_error(/already/)
   end
+  it 'raises an exception if the filename contains backward slashes' do
+    z = described_class.new
+    expect {
+      z.add_local_file_header(io: StringIO.new, filename: 'windows\not\welcome.txt',
+        crc32: 0, compressed_size: 0, uncompressed_size: 0, storage_mode: 0)
+    }.to raise_error(/UNIX/)
+  end
   it 'raises an exception if the filename does not fit in 0xFFFF bytes' do
     longest_filename_in_the_universe = "x" * (0xFFFF + 1)
     z = described_class.new
     expect {
-      z.add_local_file_header(io: StringIO.new, filename: longest_filename_in_the_universe,
+      z.add_local_file_header(io: StringIO.new, filename: longest_filename_in_the_universe,
         crc32: 0, compressed_size: 0, uncompressed_size: 0, storage_mode: 0)
-    }.to raise_error(/filename/)
+    }.to raise_error(/is too long/)
   end
   describe '#add_local_file_header' do
     it 'writes out the local file header for an entry that fits into a standard ZIP' do
       buf = StringIO.new
@@ -47,7 +62,7 @@ describe ZipTricks::Microzip do
       mtime = Time.utc(2016, 7, 17, 13, 48)
       zip.add_local_file_header(io: buf, filename: 'first-file.bin', crc32: 123, compressed_size: 8981,
         uncompressed_size: 90981, storage_mode: 8, mtime: mtime)
       buf.rewind
       br = ByteReader.new(buf)
       expect(br.read_4b).to eq(0x04034b50) # Signature
@@ -64,28 +79,45 @@ describe ZipTricks::Microzip do
       expect(br.read_n('first-file.bin'.bytesize)).to eq('first-file.bin') # the filename
       expect(buf).to be_eof
     end
     it 'writes out the local file header for an entry with a UTF-8 filename, setting the proper GP flag bit' do
       buf = StringIO.new
       zip = described_class.new
       mtime = Time.utc(2016, 7, 17, 13, 48)
       zip.add_local_file_header(io: buf, filename: 'файл.bin', crc32: 123, compressed_size: 8981,
         uncompressed_size: 90981, storage_mode: 8, mtime: mtime)
+      buf.rewind
+      br = ByteReader.new(buf)
+      br.read_4b # Signature
+      br.read_2b # Version needed to extract
+      expect(br.read_2b).to eq(2048)       # gp flags
+    end
+    it "correctly recognizes UTF-8 filenames even if they are tagged as ASCII" do
+      name = 'файл.bin'
+      name.force_encoding(Encoding::US_ASCII)
+      buf = StringIO.new
+      zip = described_class.new
+      mtime = Time.utc(2016, 7, 17, 13, 48)
+      zip.add_local_file_header(io: buf, filename: name, crc32: 123, compressed_size: 8981,
+                                uncompressed_size: 90981, storage_mode: 8, mtime: mtime)
       buf.rewind
       br = ByteReader.new(buf)
       br.read_4b # Signature
       br.read_2b # Version needed to extract
       expect(br.read_2b).to eq(2048)       # gp flags
     end
     it 'writes out the local file header for an entry with a filename with diacritics, setting the proper GP flag bit' do
       buf = StringIO.new
       zip = described_class.new
       mtime = Time.utc(2016, 7, 17, 13, 48)
       zip.add_local_file_header(io: buf, filename: 'Kungälv', crc32: 123, compressed_size: 8981,
         uncompressed_size: 90981, storage_mode: 8, mtime: mtime)
       buf.rewind
       br = ByteReader.new(buf)
       br.read_4b # Signature
@@ -102,14 +134,14 @@ describe ZipTricks::Microzip do
       filename_readback = br.read_n('Kungälv'.bytesize)
       expect(filename_readback.force_encoding(Encoding::UTF_8)).to eq('Kungälv')
     end
     it 'writes out the local file header for an entry that requires Zip64 based on its compressed size _only_' do
       buf = StringIO.new
       zip = described_class.new
       mtime = Time.utc(2016, 7, 17, 13, 48)
       zip.add_local_file_header(io: buf, filename: 'first-file.bin', crc32: 123, compressed_size: (0xFFFFFFFF + 1),
         uncompressed_size: 90981, storage_mode: 8, mtime: mtime)
       buf.rewind
       br = ByteReader.new(buf)
       expect(br.read_4b).to eq(0x04034b50) # Signature
@@ -130,14 +162,14 @@ describe ZipTricks::Microzip do
       expect(br.read_8b).to eq(0xFFFFFFFF + 1) # True compressed size
       expect(buf).to be_eof
     end
     it 'writes out the local file header for an entry that requires Zip64 based on its uncompressed size _only_' do
       buf = StringIO.new
       zip = described_class.new
       mtime = Time.utc(2016, 7, 17, 13, 48)
       zip.add_local_file_header(io: buf, filename: 'first-file.bin', crc32: 123, compressed_size: 90981,
         uncompressed_size: (0xFFFFFFFF + 1), storage_mode: 8, mtime: mtime)
       buf.rewind
       br = ByteReader.new(buf)
       expect(br.read_4b).to eq(0x04034b50) # Signature
@@ -158,7 +190,7 @@ describe ZipTricks::Microzip do
       expect(br.read_8b).to eq(90981)          # True compressed size
       expect(buf).to be_eof
     end
     it 'does not write out the Zip64 extra if the position in the destination IO is beyond the Zip64 size limit' do
       buf = StringIO.new
       zip = described_class.new
@@ -166,7 +198,7 @@ describe ZipTricks::Microzip do
       expect(buf).to receive(:tell).and_return(0xFFFFFFFF + 1)
       zip.add_local_file_header(io: buf, filename: 'first-file.bin', crc32: 123, compressed_size: 123,
         uncompressed_size: 456, storage_mode: 8, mtime: mtime)
       buf.rewind
       br = ByteReader.new(buf)
       expect(br.read_4b).to eq(0x04034b50) # Signature
@@ -182,14 +214,14 @@ describe ZipTricks::Microzip do
       expect(br.read_2b).to be_zero
     end
   end
   describe '#write_central_directory' do
-    it 'can write the central directory and makes it a valid one even if there were no files' do
+    it 'writes the central directory and makes it a valid one even if there were no files' do
       buf = StringIO.new
       zip = described_class.new
       zip.write_central_directory(buf)
       buf.rewind
       br = ByteReader.new(buf)
       expect(br.read_4b).to eq(0x06054b50) # EOCD signature
@@ -202,35 +234,313 @@ describe ZipTricks::Microzip do
       expect(br.read_2b).to eq(0)          # ZIP file comment length
       expect(buf).to be_eof
     end
     it 'writes the central directory for 2 files' do
       zip = described_class.new
       mtime = Time.utc(2016, 7, 17, 13, 48)
       buf = StringIO.new
       zip.add_local_file_header(io: buf, filename: 'first-file.bin', crc32: 123, compressed_size: 5,
         uncompressed_size: 8, storage_mode: 8, mtime: mtime)
       buf << Random.new.bytes(5)
-      zip.add_local_file_header(io: buf, filename: 'first-file.txt', crc32: 123, compressed_size: 9,
+      zip.add_local_file_header(io: buf, filename: 'second-file.txt', crc32: 546, compressed_size: 9,
         uncompressed_size: 9, storage_mode: 0, mtime: mtime)
       buf << Random.new.bytes(5)
       central_dir_offset = buf.tell
       zip.write_central_directory(buf)
       # Seek to where the central directory begins
       buf.rewind
       buf.seek(central_dir_offset)
       br = ByteReader.new(buf)
+      # Central directory entry for the first file
+      expect(br.read_4b).to eq(0x02014b50) # Central directory entry sig
+      expect(br.read_2b).to eq(820)        # version made by
+      expect(br.read_2b).to eq(20)         # version need to extract
+      expect(br.read_2b).to eq(0)          # general purpose bit flag
+      expect(br.read_2b).to eq(8)          # compression method (deflated here)
+      expect(br.read_2b).to eq(28160)      # last mod file time
+      expect(br.read_2b).to eq(18673)      # last mod file date
+      expect(br.read_4b).to eq(123)        # crc32
+      expect(br.read_4b).to eq(5)          # compressed size
+      expect(br.read_4b).to eq(8)          # uncompressed size
+      expect(br.read_2b).to eq(14)         # filename length
+      expect(br.read_2b).to eq(0)          # extra field length
+      expect(br.read_2b).to eq(0)          # file comment
+      expect(br.read_2b).to eq(0)          # disk number start (stays 0: this entry does not need Zip64)
+      expect(br.read_2b).to eq(0)          # internal file attributes
+      expect(br.read_4b).to eq(2175008768) # external file attributes
+      expect(br.read_4b).to eq(0)          # relative offset of local header
+      expect(br.read_n(14)).to eq('first-file.bin') # the filename
+      # Central directory entry for the second file
       expect(br.read_4b).to eq(0x02014b50) # Central directory entry sig
+      expect(br.read_2b).to eq(820)        # version made by
+      expect(br.read_2b).to eq(20)         # version need to extract
+      expect(br.read_2b).to eq(0)          # general purpose bit flag
+      expect(br.read_2b).to eq(0)          # compression method (stored here)
+      expect(br.read_2b).to eq(28160)      # last mod file time
+      expect(br.read_2b).to eq(18673)      # last mod file date
+      expect(br.read_4b).to eq(546)        # crc32
+      expect(br.read_4b).to eq(9)          # compressed size
+      expect(br.read_4b).to eq(9)          # uncompressed size
+      expect(br.read_2b).to eq('second-file.txt'.bytesize)         # filename length
+      expect(br.read_2b).to eq(0)          # extra field length
+      expect(br.read_2b).to eq(0)          # file comment
+      expect(br.read_2b).to eq(0)          # disk number start (stays 0: this entry does not need Zip64)
+      expect(br.read_2b).to eq(0)          # internal file attributes
+      expect(br.read_4b).to eq(2175008768) # external file attributes
+      expect(br.read_4b).to eq(49)         # relative offset of local header
+      expect(br.read_n('second-file.txt'.bytesize)).to eq('second-file.txt') # the filename
+      expect(br.read_4b).to eq(0x06054b50) # end of central dir signature
+      br.read_2b
+      br.read_2b
+      br.read_2b
+      br.read_2b
+      br.read_4b
+      br.read_4b
+      br.read_2b
+      expect(buf).to be_eof
+    end
+    it 'writes the central directory for 1 file that is larger than 4GB' do
+      zip   = described_class.new
+      buf   = StringIO.new
+      big   = 0xFFFFFFFF + 2048
+      mtime = Time.utc(2016, 7, 17, 13, 48)
+      zip.add_local_file_header(io: buf, filename: 'big-file.bin', crc32: 12345, compressed_size: big,
+                                uncompressed_size: big, storage_mode: 0, mtime: mtime)
+      central_dir_offset = buf.tell
+      zip.write_central_directory(buf)
+      # Seek to where the central directory begins
+      buf.rewind
+      buf.seek(central_dir_offset)
+      br = ByteReader.new(buf)
+      # Standard central directory entry (similar to the local file header)
+      expect(br.read_4b).to eq(0x02014b50)  # Central directory entry sig
+      expect(br.read_2b).to eq(820)         # version made by
+      expect(br.read_2b).to eq(45)          # version need to extract (45 for Zip64)
+      expect(br.read_2b).to eq(0)           # general purpose bit flag
+      expect(br.read_2b).to eq(0)           # compression method (stored here)
+      expect(br.read_2b).to eq(28160)       # last mod file time
+      expect(br.read_2b).to eq(18673)       # last mod file date
+      expect(br.read_4b).to eq(12345)       # crc32
+      expect(br.read_4b).to eq(0xFFFFFFFF)  # compressed size
+      expect(br.read_4b).to eq(0xFFFFFFFF)  # uncompressed size
+      expect(br.read_2b).to eq(12)          # filename length
+      expect(br.read_2b).to eq(32)          # extra field length (we store the Zip64 extra field for this file)
+      expect(br.read_2b).to eq(0)           # file comment
+      expect(br.read_2b).to eq(0xFFFF)      # disk number, must be blanked to the maximum value because of The Unarchiver bug
+      expect(br.read_2b).to eq(0)           # internal file attributes
+      expect(br.read_4b).to eq(2175008768)  # external file attributes
+      expect(br.read_4b).to eq(0xFFFFFFFF)  # relative offset of local header
+      expect(br.read_n(12)).to eq('big-file.bin') # the filename
+      # Zip64 extra field
+      expect(br.read_2b).to eq(0x0001) # Tag for the "extra" block
+      expect(br.read_2b).to eq(28) # Size of this "extra" block. For us it will always be 28
+      expect(br.read_8b).to eq(big) # Original uncompressed file size
+      expect(br.read_8b).to eq(big) # Original compressed file size
+      expect(br.read_8b).to eq(0) # Offset of local header record
+      expect(br.read_4b).to eq(0) # Number of the disk on which this file starts
+    end
+    it 'writes the central directory for 2 files which, together, make the central directory start beyond the 4GB threshold' do
+      zip   = described_class.new
+      raw_buf = StringIO.new
+      zip_write_buf   = IOWrapper.new(raw_buf)
+      big1  = 0xFFFFFFFF/2 + 512
+      big2  = 0xFFFFFFFF/2 + 1024
+      mtime = Time.utc(2016, 7, 17, 13, 48)
+      zip.add_local_file_header(io: zip_write_buf, filename: 'first-big-file.bin', crc32: 12345, compressed_size: big1,
+                                uncompressed_size: big1, storage_mode: 0, mtime: mtime)
+      zip_write_buf.advance_position_by(big1)
+      zip.add_local_file_header(io: zip_write_buf, filename: 'second-big-file.bin', crc32: 54321, compressed_size: big2,
+                                uncompressed_size: big2, storage_mode: 0, mtime: mtime)
+      zip_write_buf.advance_position_by(big2)
+      fake_central_dir_offset   = zip_write_buf.tell # Position as tracked by the wrapper (includes the skipped file bodies)
+      actual_central_dir_offset = raw_buf.tell # Grab the position in the underlying buffer
+      zip.write_central_directory(zip_write_buf)
+      # Seek to where the central directory begins
+      raw_buf.seek(actual_central_dir_offset, IO::SEEK_SET)
+      br = ByteReader.new(raw_buf)
+      # Standard central directory entry (similar to the local file header)
+      expect(br.read_4b).to eq(0x02014b50)  # Central directory entry sig
+      expect(br.read_2b).to eq(820)         # version made by
+      expect(br.read_2b).to eq(20)          # version need to extract (20: this entry itself does not need Zip64)
+      expect(br.read_2b).to eq(0)           # general purpose bit flag
+      expect(br.read_2b).to eq(0)           # compression method (stored here)
+      expect(br.read_2b).to eq(28160)       # last mod file time
+      expect(br.read_2b).to eq(18673)       # last mod file date
+      expect(br.read_4b).to eq(12345)       # crc32
+      expect(br.read_4b).to eq(2147484159)  # compressed size
+      expect(br.read_4b).to eq(2147484159)  # uncompressed size
+      expect(br.read_2b).to eq(18)          # filename length
+      expect(br.read_2b).to eq(0)           # extra field length
+      expect(br.read_2b).to eq(0)           # file comment length
+      expect(br.read_2b).to eq(0)           # disk number start (stays 0: this entry does not need Zip64)
+      expect(br.read_2b).to eq(0)           # internal file attributes
+      expect(br.read_4b).to eq(2175008768)  # external file attributes
+      expect(br.read_4b).to eq(0)           # relative offset of local header
+      expect(br.read_n(18)).to eq("first-big-file.bin") # the filename
+      # Standard central directory entry (similar to the local file header)
+      expect(br.read_4b).to eq(0x02014b50)  # Central directory entry sig
+      expect(br.read_2b).to eq(820)         # version made by
+      expect(br.read_2b).to eq(20)          # version need to extract (20: this entry itself does not need Zip64)
+      expect(br.read_2b).to eq(0)           # general purpose bit flag
+      expect(br.read_2b).to eq(0)           # compression method (stored here)
+      expect(br.read_2b).to eq(28160)       # last mod file time
+      expect(br.read_2b).to eq(18673)       # last mod file date
+      expect(br.read_4b).to eq(54321)       # crc32
+      expect(br.read_4b).to eq(2147484671)  # compressed size
+      expect(br.read_4b).to eq(2147484671)  # uncompressed size
+      expect(br.read_2b).to eq(19)          # filename length
+      expect(br.read_2b).to eq(0)           # extra field length
+      expect(br.read_2b).to eq(0)           # file comment length
+      expect(br.read_2b).to eq(0)           # disk number start (stays 0: this entry does not need Zip64)
+      expect(br.read_2b).to eq(0)           # internal file attributes
+      expect(br.read_4b).to eq(2175008768)  # external file attributes
+      expect(br.read_4b).to eq(2147484207)  # relative offset of local header
+      expect(br.read_n(19)).to eq('second-big-file.bin') # the filename
+      # zip64 specific values for a whole central directory
+      expect(br.read_4b).to eq(0x06064b50) # zip64 end of central dir signature
+      expect(br.read_8b).to eq(44) # size of zip64 end of central directory record
+      expect(br.read_2b).to eq(820) # version made by
+      expect(br.read_2b).to eq(45) # version needed to extract
+      expect(br.read_4b).to eq(0) # number of this disk
+      expect(br.read_4b).to eq(0) # number of the disk with the start of the central directory
+      expect(br.read_8b).to eq(2) # total number of entries in the central directory on this disk
+      expect(br.read_8b).to eq(2) # total number of entries in the central directory
+      expect(br.read_8b).to eq(129) # size of central directory
+      expect(br.read_8b).to eq(4294968927) # central directory offset from start of disk
+      expect(br.read_4b).to eq(0x07064b50) # zip64 end of central dir locator signature
+      expect(br.read_4b).to eq(0) # number of disk ...
+      expect(br.read_8b).to eq(4294969056) # relative offset of the zip64 end of central directory record
+      expect(br.read_4b).to eq(1) # total number of disks
+    end
+    it 'writes the central directory for 3 files, the third of which will require the Zip64 extra since it is past the 4GB offset' do
+      zip   = described_class.new
+      raw_buf = StringIO.new
+      zip_write_buf   = IOWrapper.new(raw_buf)
+      big1  = 0xFFFFFFFF/2 + 512
+      big2  = 0xFFFFFFFF/2 + 1024
+      big3  = 0xFFFFFFFF/2 + 1024
+      mtime = Time.utc(2016, 7, 17, 13, 48)
+      zip.add_local_file_header(io: zip_write_buf, filename: 'one', crc32: 12345, compressed_size: big1,
+                                uncompressed_size: big1, storage_mode: 0, mtime: mtime)
+      zip_write_buf.advance_position_by(big1)
+      zip.add_local_file_header(io: zip_write_buf, filename: 'two', crc32: 54321, compressed_size: big2,
+                                uncompressed_size: big2, storage_mode: 0, mtime: mtime)
+      zip_write_buf.advance_position_by(big2)
+      big3_offset = zip_write_buf.tell
+      zip.add_local_file_header(io: zip_write_buf, filename: 'three', crc32: 54321, compressed_size: big3,
+                                uncompressed_size: big3, storage_mode: 0, mtime: mtime)
+      zip_write_buf.advance_position_by(big3)
+      fake_central_dir_offset   = zip_write_buf.tell # Grab the position reported by the wrapper (includes the skipped file bodies)
+      actual_central_dir_offset = raw_buf.tell # Grab the position in the underlying buffer
+      zip.write_central_directory(zip_write_buf)
+      # Seek to where the central directory begins
+      raw_buf.seek(actual_central_dir_offset, IO::SEEK_SET)
+      br = ByteReader.new(raw_buf)
+      # Standard central directory entry (similar to the local file header)
+      # Skip over two entries, because the other example has a 1-to-1 repeat of this
+      2.times {
+        br.read_4b
+        br.read_2b
+        br.read_2b
+        br.read_2b
+        br.read_2b
+        br.read_2b
+        br.read_2b
+        br.read_4b
+        br.read_4b
+        br.read_4b
+        br.read_2b
+        br.read_2b
+        br.read_2b
+        br.read_2b
+        br.read_2b
+        br.read_4b
+        br.read_4b
+        br.read_n(3)
+      }
+      # Entry for the third file DOES bear the Zip64 extra field
+      expect(br.read_4b).to eq(0x02014b50)  # Central directory entry sig
+      expect(br.read_2b).to eq(820)         # version made by
+      expect(br.read_2b).to eq(45)          # version needed to extract (45 for Zip64) - this entry requires it
+      expect(br.read_2b).to eq(0)           # general purpose bit flag
+      expect(br.read_2b).to eq(0)           # compression method (stored here)
+      expect(br.read_2b).to eq(28160)       # last mod file time
+      expect(br.read_2b).to eq(18673)       # last mod file date
+      expect(br.read_4b).to eq(54321)       # crc32
+      expect(br.read_4b).to eq(0xFFFFFFFF)  # compressed size - blanked for Zip64
+      expect(br.read_4b).to eq(0xFFFFFFFF)  # uncompressed size - blanked for Zip64
+      expect(br.read_2b).to eq(5)           # filename length
+      expect(br.read_2b).to eq(32)          # extra field length (length of the Zip64 extra)
+      expect(br.read_2b).to eq(0)           # file comment length
+      expect(br.read_2b).to eq(0xFFFF)      # disk number, with Zip64 must be blanked to the maximum value because of The Unarchiver bug
+      expect(br.read_2b).to eq(0)           # internal file attributes
+      expect(br.read_4b).to eq(2175008768)  # external file attributes
+      expect(br.read_4b).to eq(4294967295)  # relative offset of local header
+      expect(br.read_n(5)).to eq('three') # the filename
+      # then the Zip64 extra for that last file _only_
+      expect(br.read_2b).to eq(0x0001) # Tag for the "extra" block
+      expect(br.read_2b).to eq(28) # Size of this "extra" block. For us it will always be 28
+      expect(br.read_8b).to eq(big3) # Original uncompressed file size
+      expect(br.read_8b).to eq(big3) # Original compressed file size
+      expect(br.read_8b).to eq(big3_offset) # Offset of local header record
+      expect(br.read_4b).to eq(0) # Number of the disk on which this file starts
+      # zip64 specific values for a whole central directory
+      expect(br.read_4b).to eq(0x06064b50)  # zip64 end of central dir signature
+      expect(br.read_8b).to eq(44)          # size of zip64 end of central directory record
+      expect(br.read_2b).to eq(820)         # version made by
+      expect(br.read_2b).to eq(45)          # version needed to extract
+      expect(br.read_4b).to eq(0)           # number of this disk
+      expect(br.read_4b).to eq(0)           # number of the disk with the start of the central directory
+      expect(br.read_8b).to eq(3)           # total number of entries in the central directory on this disk
+      expect(br.read_8b).to eq(3)           # total number of entries in the central directory
+      expect(br.read_8b).to eq(181)         # size of central directory
+      expect(br.read_8b).to eq(6442453602)  # central directory offset from start of disk
-      skip "Not finished"
+      expect(br.read_4b).to eq(0x07064b50)  # Zip64 EOCD locator signature
+      expect(br.read_4b).to eq(0)           # Disk number with the start of central directory
+      expect(br.read_8b).to eq(6442453783)  # relative offset of the zip64 end of central directory record
+      expect(br.read_4b).to eq(1)           # total number of disks
     end
-    it 'writes the central directory 1 file that is larger than 4GB'
-    it 'writes the central directory for 2 files which, together, make the central directory start beyound the 4GB threshold'
   end
 end
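
The literal values asserted in the spec above can be recomputed by hand. The following is a minimal sketch, not code from the gem: `dos_time`, `dos_date` and `archive_layout` are hypothetical helper names, and the rule used for deciding when an entry gets the Zip64 extra (any size or offset above `0xFFFFFFFF`) is an assumption inferred from the assertions. The fixed sizes (30 bytes for a local header, 46 bytes for a central directory entry) come from the ZIP format, and the 32-byte Zip64 extra is the length the spec asserts.

```ruby
# Hypothetical helpers for re-deriving the constants asserted in the spec.
LOCAL_HEADER_FIXED  = 30 # fixed part of a local file header, before the filename
CENTRAL_ENTRY_FIXED = 46 # fixed part of a central directory entry, before the filename
ZIP64_EXTRA_SIZE    = 32 # 2-byte tag + 2-byte size + 28-byte payload
FOUR_BYTE_MAX       = 0xFFFFFFFF

# MS-DOS time: hours in bits 11-15, minutes in bits 5-10, seconds/2 in bits 0-4
def dos_time(t)
  (t.hour << 11) | (t.min << 5) | (t.sec / 2)
end

# MS-DOS date: years since 1980 in bits 9-15, month in bits 5-8, day in bits 0-4
def dos_date(t)
  ((t.year - 1980) << 9) | (t.month << 5) | t.day
end

# entries is an array of [filename, stored_size] pairs, in archive order.
# Assumes headers without any extra fields in the local file headers.
def archive_layout(entries)
  position = 0
  offsets = entries.map do |name, size|
    offset = position
    position += LOCAL_HEADER_FIXED + name.bytesize + size
    offset
  end
  central_dir_size = entries.zip(offsets).sum do |(name, size), offset|
    # Assumed rule: the Zip64 extra is added only when a value overflows 4 bytes
    needs_zip64 = [size, offset].any? { |v| v > FOUR_BYTE_MAX }
    CENTRAL_ENTRY_FIXED + name.bytesize + (needs_zip64 ? ZIP64_EXTRA_SIZE : 0)
  end
  {
    local_header_offsets: offsets,
    central_dir_offset: position,
    central_dir_size: central_dir_size,
    zip64_eocd_offset: position + central_dir_size
  }
end

mtime = Time.utc(2016, 7, 17, 13, 48)
big1  = 0xFFFFFFFF / 2 + 512
big2  = 0xFFFFFFFF / 2 + 1024
big3  = 0xFFFFFFFF / 2 + 1024

puts dos_time(mtime) # 28160, the "last mod file time" in the spec
puts dos_date(mtime) # 18673, the "last mod file date" in the spec
p archive_layout([['one', big1], ['two', big2], ['three', big3]])
```

For the three-file example this yields a central directory offset of 6442453602, a central directory size of 181 (49 + 49 + 83) and a Zip64 end of central directory record at 6442453783, matching the assertions. The same helper applied to `first-big-file.bin` and `second-big-file.bin` gives 4294968927, 129 and 4294969056 for the two-file example.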

data/zip_tricks.gemspec CHANGED

@@ -2,16 +2,16 @@
 # DO NOT EDIT THIS FILE DIRECTLY
 # Instead, edit Jeweler::Tasks in Rakefile, and run 'rake gemspec'
 # -*- encoding: utf-8 -*-
-# stub: zip_tricks 2.8.0 ruby lib
+# stub: zip_tricks 2.8.1 ruby lib
 Gem::Specification.new do |s|
   s.name = "zip_tricks"
-  s.version = "2.8.0"
+  s.version = "2.8.1"
   s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
   s.require_paths = ["lib"]
   s.authors = ["Julik Tarkhanov"]
-  s.date = "2016-07-18"
+  s.date = "2016-07-22"
   s.description = "Makes rubyzip stream, for real"
   s.email = "me@julik.nl"
   s.extra_rdoc_files = [
@@ -24,6 +24,7 @@ Gem::Specification.new do |s|
     ".travis.yml",
     ".yardopts",
     "Gemfile",
+    "IMPLEMENTATION_DETAILS.md",
     "LICENSE.txt",
     "README.md",
     "Rakefile",

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: zip_tricks
 version: !ruby/object:Gem::Version
-  version: 2.8.0
+  version: 2.8.1
 platform: ruby
 authors:
 - Julik Tarkhanov
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-07-18 00:00:00.000000000 Z
+date: 2016-07-22 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rubyzip
@@ -161,6 +161,7 @@ files:
 - ".travis.yml"
 - ".yardopts"
 - Gemfile
+- IMPLEMENTATION_DETAILS.md
 - LICENSE.txt
 - README.md
 - Rakefile