
just-keep-zipping 0.0.1 → 0.0.2

Files changed (4)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
-SHA1:
-  metadata.gz: dc550e2a2c93e0959eab63397efdd23c99dc07fe
-  data.tar.gz: f09576ae77642fcb284faac26623ebf741ed89ce
+SHA256:
+  metadata.gz: 98bb5c411d4764cdbaa2b15a6d938443a9ee4b8ce05863338d0e234222c135a3
+  data.tar.gz: 3fe9a60cde27e9c204c5509dc994bc165153d478bb51ade721cb76433f8cc191
 SHA512:
-  metadata.gz: aa481e0ef062dbd9ee8e9c861fdf9c13cc7887bdfe63804cca593da3315773ecfc7c4a7263699d5b836e692146d7ca7faa323c3fff17213abcbb29cb5da2b76b
-  data.tar.gz: 2fd58dbbb21d993d613e6e4c86b1cbfd76ece44d81da2f7f2b9bd0b8f1a362e3132e3172a3e7733fa3a97ac4285c100a4124a6805973a34981f7764094bdde51
+  metadata.gz: 584cb29baaa3ffff8d7adb59703fbf853f13a14757ebd654004b626a2e705f0edd5df112ceb67c178d3fe5717d3efaa6c57e1d2bbd93e9ae86f00f392042e103
+  data.tar.gz: 81f45ea7a320dd08c1627103cd6730c296995084c42510c3de62acace301a51b07e967cfafdfff8fd5f9af16bbe7e11084cd77c0ac9d3328504f854d65a7629e

data/README.md CHANGED

@@ -1,6 +1,20 @@
 # just-keep-zipping
-Produce a zip file from many source files, in a streaming or distributed fashion.
+Produce a ZIP file from many source files, in a streaming or distributed fashion.
+The ZIP format is well suited for quick updates, allowing appends of new data without needing to extract and compress
+the entire archive. This is possible because the ZIP header is written at the end of the file, and a new header can be
+added after new data is added. However, the file must be available locally for ZIP tools to operate effectively. If the
+file is remote, then the entire archive must be downloaded, updated, then uploaded--which is a heavyweight method of
+adding small files to a large archive.
+Memory, disk space, and CPU time are all limited when running in a cloud environment, and requiring an entire ZIP
+archive to be produced within a single processing unit does not always scale.
+Just Keep Zipping allows a large ZIP archive to be produced in parts, on one machine or many, and can be used with
+Amazon S3 or Google Cloud Storage.
+The instance is Marshallable, and the `progress_data` used between steps can be stored in Redis or another object store.
 ## Usage
@@ -32,3 +46,44 @@ Assemble the zip
 	data = incomplete_data + ending_data
+## Amazon S3
+https://docs.aws.amazon.com/sdkforruby/api/Aws/S3/Object.html#initiate_multipart_upload-instance_method
+At each interval (e.g. every 50-100 files), save the current data to S3. When finished, use a Multipart Upload with
+`copy_part` to combine the parts into a whole.
+	zip = JustKeepZipping.new
+	zip.add 'file1.txt', 'Data to be zipped'
+	bucket.object('part_one').put body: zip.read
+	zip.add 'file2.txt', 'More data to be zipped'
+	zip.close
+	bucket.object('part_two').put body: zip.read
+	upload = bucket.object('archive.zip').initiate_multipart_upload
+	upload.part(1).copy_from copy_source: "bucket/part_one"
+	upload.part(2).copy_from copy_source: "bucket/part_two"
+	upload.complete compute_parts: true
+## Google Cloud Storage
+http://googleapis.github.io/google-cloud-ruby/docs/google-cloud-storage/latest/Google/Cloud/Storage/Bucket.html#compose-instance_method
+At each interval (e.g. every 50-100 files), save the current data to Google Cloud Storage. When finished, use the
+compose method to join the parts into a whole (for more than 32 parts, iteratively compose the destination file as an
+input of the next group).
+	zip = JustKeepZipping.new
+	zip.add 'file1.txt', 'Data to be zipped'
+	bucket.create_file StringIO.new(zip.read), 'part_one'
+	zip.add 'file2.txt', 'More data to be zipped'
+	zip.close
+	bucket.create_file StringIO.new(zip.read), 'part_two'
+	bucket.compose ['part_one', 'part_two'], 'archive.zip'
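
The README's note about composing more than 32 parts can be sketched as a small planning step. This is only an illustration, not part of the gem: `compose_plan` and its `limit` parameter are hypothetical names, and the plan is plain data so it runs without any cloud client.

```ruby
# Hypothetical helper (not part of just-keep-zipping): plans the compose
# calls needed when a single compose accepts at most `limit` sources.
# Returns a list of [sources, destination] pairs, in the order to run them.
def compose_plan(part_names, destination, limit = 32)
  raise ArgumentError, 'limit must be at least 2' if limit < 2
  return [] if part_names.empty?

  # The first group takes up to `limit` parts.
  steps = [[part_names.first(limit), destination]]

  # Every later group starts with the destination itself, which leaves
  # limit - 1 slots for new parts.
  part_names.drop(limit).each_slice(limit - 1) do |group|
    steps << [[destination] + group, destination]
  end
  steps
end

parts = (1..70).map { |i| format('part_%03d', i) }
plan = compose_plan(parts, 'archive.zip')
plan.each { |sources, dest| puts "compose #{sources.size} sources into #{dest}" }
```

Each pair in the plan would then correspond to one `bucket.compose sources, destination` call of the kind shown in the README.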

data/lib/just-keep-zipping.rb CHANGED

@@ -1,17 +1,45 @@
 require 'zip'
+# Allows the creation of large ZIP files in a streaming fashion.
+#
+# Example:
+#   zip = JustKeepZipping.new
+#   zip.add 'file1.txt', 'Data to be zipped'
+#   data1 = zip.read
+#   progress = Marshal.dump zip # into an object store?
+#
+#   zip = Marshal.load progress # load from object store?
+#   zip.add 'file2.txt', 'More data to be zipped'
+#   zip.close
+#   data2 = zip.read
+#
+#   complete_archive = data1 + data2
+#
 class JustKeepZipping
   attr_reader :entries
+  # Use the constructor for the initial object creation.
+  # Use Marshal.dump and Marshal.load (e.g. with Redis) to transfer this instance between
+  # compute units (e.g. Sidekiq jobs).
+  #
   def initialize
     @entries = []
     @data = ''
   end
+  # The current data size. Use this as a stopping or checkpoint condition, to
+  # keep memory from growing too large.
+  #
   def current_size
     @data.size
   end
+  # Add a file to the archive.
+  #
+  # Params:
+  # +filename+:: a string representing the name of the file as it should appear in the archive
+  # +body+:: a string or IO object that represents the contents of the file
+  #
   def add(filename, body)
     io = Zip::OutputStream.write_buffer do |zip|
       zip.put_next_entry filename
@@ -31,6 +59,10 @@ class JustKeepZipping
     nil
   end
+  # Finalizes the archive by adding the trailing ZIP header. A final read must be called to get the data.
+  #
+  # No further files should be added after calling close.
+  #
   def close
     contents_size = 0
     @entries.each do |e|
@@ -51,6 +83,11 @@ class JustKeepZipping
     contents_size
   end
+  # Get the current ZIP data, to save in an object store like S3 or GCS.
+  #
+  # Do this before persisting this instance with Marshal.dump, to avoid
+  # placing too much progress data into a temporary object store like Redis.
+  #
   def read
     data = @data
     @data = ''
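
The comments on `current_size` and `read` suggest a checkpoint loop: flush the buffer once it passes a threshold, then persist the now-small instance with Marshal. A minimal sketch of that pattern follows. `StandInZipper`, `store`, and `limit` are hypothetical; the stand-in only mimics the public interface shown in this diff and buffers raw strings rather than real ZIP data, so that it runs without the rubyzip dependency.

```ruby
# Stand-in with the same public interface as the class in the diff
# (add, read, current_size, entries). It does NOT produce ZIP data.
class StandInZipper
  attr_reader :entries

  def initialize
    @entries = []
    @data = +''
  end

  def current_size
    @data.size
  end

  def add(filename, body)
    @entries << filename
    @data << body
    nil
  end

  # Returns the buffered data and resets the buffer, like the real read.
  def read
    data = @data
    @data = +''
    data
  end
end

store = {}        # hypothetical stand-in for S3 or GCS
part_number = 0
limit = 16        # hypothetical checkpoint threshold, in bytes
zip = StandInZipper.new

%w[alpha beta gamma delta].each_with_index do |body, i|
  zip.add "file#{i}.txt", body * 3
  next if zip.current_size < limit

  # Flush the buffered data first, so the marshalled instance stays small.
  store["part_#{part_number += 1}"] = zip.read
  progress = Marshal.dump(zip)   # what would be written to Redis
  zip = Marshal.load(progress)   # what the next job would load
end

# Save whatever is left after the last checkpoint.
store["part_#{part_number += 1}"] = zip.read unless zip.current_size.zero?
```

With the real class, `zip.close` would be called before the final `read` so that the trailing ZIP header ends up in the last part.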

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: just-keep-zipping
 version: !ruby/object:Gem::Version
-  version: 0.0.1
+  version: 0.0.2
 platform: ruby
 authors:
 - Ryan Calhoun
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-11-27 00:00:00.000000000 Z
+date: 2018-11-28 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rubyzip
@@ -54,7 +54,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.6.11
+rubygems_version: 2.7.6
 signing_key:
 specification_version: 4
 summary: Just Keep Zipping