rocketjob 6.0.0.rc1 → 6.0.1

Files changed (48)
  1. checksums.yaml +4 -4
  2. data/README.md +164 -8
  3. data/lib/rocket_job/batch/categories.rb +25 -18
  4. data/lib/rocket_job/batch/io.rb +130 -130
  5. data/lib/rocket_job/batch/performance.rb +2 -2
  6. data/lib/rocket_job/batch/statistics.rb +2 -2
  7. data/lib/rocket_job/batch/throttle_running_workers.rb +1 -1
  8. data/lib/rocket_job/batch/worker.rb +14 -12
  9. data/lib/rocket_job/batch.rb +0 -1
  10. data/lib/rocket_job/category/base.rb +10 -7
  11. data/lib/rocket_job/category/input.rb +61 -1
  12. data/lib/rocket_job/category/output.rb +9 -0
  13. data/lib/rocket_job/cli.rb +1 -1
  14. data/lib/rocket_job/dirmon_entry.rb +1 -1
  15. data/lib/rocket_job/extensions/mongoid/contextual/mongo.rb +2 -2
  16. data/lib/rocket_job/extensions/rocket_job_adapter.rb +2 -2
  17. data/lib/rocket_job/job_exception.rb +1 -1
  18. data/lib/rocket_job/jobs/conversion_job.rb +43 -0
  19. data/lib/rocket_job/jobs/dirmon_job.rb +24 -35
  20. data/lib/rocket_job/jobs/housekeeping_job.rb +4 -5
  21. data/lib/rocket_job/jobs/on_demand_batch_job.rb +15 -11
  22. data/lib/rocket_job/jobs/on_demand_job.rb +2 -2
  23. data/lib/rocket_job/jobs/re_encrypt/relational_job.rb +103 -97
  24. data/lib/rocket_job/jobs/upload_file_job.rb +6 -3
  25. data/lib/rocket_job/lookup_collection.rb +4 -3
  26. data/lib/rocket_job/plugins/cron.rb +60 -20
  27. data/lib/rocket_job/plugins/job/persistence.rb +36 -0
  28. data/lib/rocket_job/plugins/job/throttle.rb +2 -2
  29. data/lib/rocket_job/plugins/restart.rb +3 -110
  30. data/lib/rocket_job/plugins/state_machine.rb +2 -2
  31. data/lib/rocket_job/plugins/throttle_dependent_jobs.rb +43 -0
  32. data/lib/rocket_job/sliced/bzip2_output_slice.rb +18 -19
  33. data/lib/rocket_job/sliced/compressed_slice.rb +3 -6
  34. data/lib/rocket_job/sliced/encrypted_bzip2_output_slice.rb +49 -0
  35. data/lib/rocket_job/sliced/encrypted_slice.rb +4 -6
  36. data/lib/rocket_job/sliced/input.rb +42 -54
  37. data/lib/rocket_job/sliced/slice.rb +7 -3
  38. data/lib/rocket_job/sliced/slices.rb +12 -9
  39. data/lib/rocket_job/sliced/writer/input.rb +46 -18
  40. data/lib/rocket_job/sliced/writer/output.rb +0 -1
  41. data/lib/rocket_job/sliced.rb +1 -19
  42. data/lib/rocket_job/throttle_definitions.rb +7 -1
  43. data/lib/rocket_job/version.rb +1 -1
  44. data/lib/rocketjob.rb +4 -5
  45. metadata +12 -12
  46. data/lib/rocket_job/batch/tabular/input.rb +0 -133
  47. data/lib/rocket_job/batch/tabular/output.rb +0 -67
  48. data/lib/rocket_job/batch/tabular.rb +0 -58
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 2794f5dc5e0ada3ffdc3da9a13fd0cb6c5713f89254d93b69d60677283bc2d64
-   data.tar.gz: 3a208b181aca760b07432348bc2e51443a9da03cc6a143be81765ca2b3c0e37a
+   metadata.gz: 305189df5d57cf64c3d771bc05f940df6be6fd8c322ae9f3f796166fe99e1b75
+   data.tar.gz: a16ed0d77f1d1cb4e0ec4beefba922278a8b05e90ab5ecaffe6a3c347abdfed0
  SHA512:
-   metadata.gz: 44816973f2f63dc300fe41e168ae485cd8b7def5e3bbab173501f6ef2935d3f65707ec4c2f9eb7cb6b43bab2d464b163f3e10216be50babf2dab5e82a7998439
-   data.tar.gz: c0b2d210a3bb3faa49f30eeaf687052ed79e9da802635c6434e374c4f1ccc3538a71a4db9391f41e18a6265106aa7fedeaf920edf4e9d11acf81c9bc632534bd
+   metadata.gz: f2afe61652e3b6b225515b95e7b370f8c4a8853c30cad45d0fc93d49a86571764b480aaca5a5cfb264ea1ea86912b6b680e4e122952383f2425585e3683e8581
+   data.tar.gz: eb1b041fbe425143c5f2b64aa5e799c35bf6885c00020964cf87d87485ba359e19796f7e5c5378baefe9f4ca93fe5de900b78e6669c7ffe83838c87e197ab0e1
data/README.md CHANGED
@@ -17,21 +17,177 @@ Checkout https://rocketjob.io/
  * Questions? Join the chat room on Gitter for [rocketjob support](https://gitter.im/rocketjob/support)
  * [Report bugs](https://github.com/rocketjob/rocketjob/issues)
 
- ## Rocket Job v5
+ ## Rocket Job v6
 
  - Support for Ruby v3 and Rails 6.
- - Multiple output file support through extended `output_categories` capability.
- - File output formats for each category. For example: CSV, PSV, JSON, etc.
- - Support for AWS DocumentDB as the data store.
+ - Major enhancements in Batch job support:
+   - Direct built-in Tabular support for all input and output categories.
+   - Multiple output file support, each with its own settings for:
+     - Compression
+       - GZip, Zip, BZip2 (Chunked for much faster loading into Apache Spark).
+     - Encryption
+       - PGP, Symmetric Encryption.
+     - File format
+       - CSV, PSV, JSON, Fixed Format, xlsx.
+ - Significant error handling improvements, especially around throttle failures
+   that used to result in "hanging" jobs.
+ - Support for AWS DocumentDB in addition to MongoDB as the data store.
  - Removed use of Symbols to meet Symbol deprecation in MongoDB and Mongoid.
 
- The following plugins have been deprecated and will be removed in Rocket Job v5.1
- - RocketJob::Batch::Tabular::Input
- - RocketJob::Batch::Tabular::Output
+ ### Upgrading to Rocket Job v6
+
+ The following plugins have been deprecated and are no longer loaded by default:
+ - `RocketJob::Batch::Tabular::Input`
+ - `RocketJob::Batch::Tabular::Output`
+
+ If your code relies on these plugins and you still want to upgrade to Rocket Job v6,
+ add the following require statement to any jobs that still use them:
+
+ ~~~ruby
+ require "rocket_job/batch/tabular"
+ ~~~
+
+ It is important to migrate away from these plugins, since they will be removed in a future release.
+
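+ For example, a job that still depends on these plugins might look like the following
+ (the job class is illustrative, not part of Rocket Job):
+
+ ~~~ruby
+ require "rocket_job/batch/tabular"
+
+ class LegacyTabularJob < RocketJob::Job
+   include RocketJob::Batch
+   # Deprecated plugins, loaded explicitly via the require above:
+   include RocketJob::Batch::Tabular::Input
+   include RocketJob::Batch::Tabular::Output
+
+   def perform(row)
+     row
+   end
+ end
+ ~~~
+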
+ #### Scheduled Jobs
+
+ For any scheduled jobs that include the `RocketJob::Plugins::Cron` plugin, the default behavior has changed
+ so that the next scheduled job instance is created immediately after the current instance starts.
+
+ To maintain the old behavior of creating the next instance only when the current one fails, aborts, or completes,
+ add the following line to each of the applicable jobs:
+
+ ~~~ruby
+ self.cron_after_start = false
+ ~~~
+
+ Additionally, scheduled jobs will now prevent a new instance from being created when another scheduled instance
+ of the same job is already queued or running with the _same_ `cron_schedule`.
+
+ To maintain the old behavior of allowing multiple instances with the same cron schedule, add the following
+ line to each of the applicable jobs:
+
+ ~~~ruby
+ self.cron_singleton = false
+ ~~~
+
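+ For example, a scheduled job that keeps both v5 behaviors (the job class and
+ schedule are illustrative):
+
+ ~~~ruby
+ class NightlyReconcileJob < RocketJob::Job
+   include RocketJob::Plugins::Cron
+
+   # Run every day at 2am UTC.
+   self.cron_schedule = "0 2 * * *"
+
+   # v5 behavior: only create the next instance when this one fails, aborts, or completes.
+   self.cron_after_start = false
+
+   # v5 behavior: allow multiple queued or running instances with the same cron schedule.
+   self.cron_singleton = false
+
+   def perform
+     # Work to run on the schedule above.
+   end
+ end
+ ~~~
+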
+ ##### Singleton
+
+ Since scheduled jobs now implement their own singleton logic, remove the `RocketJob::Plugins::Singleton` plugin from any scheduled jobs.
+
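+ For example (illustrative):
+
+ ~~~ruby
+ class NightlyReconcileJob < RocketJob::Job
+   include RocketJob::Plugins::Cron
+   # Remove this line when upgrading; scheduled jobs are now singletons by default:
+   # include RocketJob::Plugins::Singleton
+
+   self.cron_schedule = "0 2 * * *"
+ end
+ ~~~
+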
+ #### Upgrading Batch Jobs to Rocket Job v6
+
+ Rocket Job v6 replaces the array-of-symbols type for `input_categories` and `output_categories`
+ with an array of `RocketJob::Category::Input` and `RocketJob::Category::Output` objects.
+
+ Jobs that added or modified the input or output categories need to be upgraded. For example:
+ ~~~ruby
+ class MyJob < RocketJob::Job
+   include RocketJob::Batch
+
+   self.output_categories = [:main, :errors, :ignored]
+ end
+ ~~~
+
+ Needs to be changed to:
+ ~~~ruby
+ class MyJob < RocketJob::Job
+   include RocketJob::Batch
+
+   output_category name: :main
+   output_category name: :errors
+   output_category name: :ignored
+ end
+ ~~~
+
+ ##### slice_size, encrypt, compress
+
+ These fields have been removed from the job itself:
+ ~~~ruby
+ class MyJob < RocketJob::Job
+   include RocketJob::Batch
+
+   self.slice_size = 1_000
+   self.encrypt    = true
+   self.compress   = true
+ end
+ ~~~
+
+ They are now specified on the `input_category` as follows:
+ - `slice_size` just moves under `input_category`.
+ - `encrypt` becomes an option to `serializer`.
+ - `compress` is now the default for all batch jobs, so it is no longer needed.
+
+ If the serializer is set to `encrypt`, the data is automatically compressed as well.
+
+ ~~~ruby
+ class MyJob < RocketJob::Job
+   include RocketJob::Batch
+
+   input_category slice_size: 1_000, serializer: :encrypt
+ end
+ ~~~
+
+ ##### collect_output, collect_nil_output
+
+ The following fields have been moved from the job itself:
+ ~~~ruby
+ class MyJob < RocketJob::Job
+   include RocketJob::Batch
+
+   self.collect_output     = true
+   self.collect_nil_output = true
+ end
+ ~~~
+
+ Into the corresponding `output_category`:
+ - `collect_output` no longer has any meaning. Output is collected anytime an `output_category` is defined.
+ - `collect_nil_output` is now the option `nils` on the `output_category`.
+   It defaults to `false` so that by default any `nil` output from the `perform` method is not collected.
+
+ ~~~ruby
+ class MyJob < RocketJob::Job
+   include RocketJob::Batch
+
+   output_category nils: true
+ end
+ ~~~
+
+ ##### name
+
+ For both `input_category` and `output_category`, when the `name` argument is not supplied
+ it defaults to `:main`.
+
+ For example:
+ ~~~ruby
+ class MyJob < RocketJob::Job
+   include RocketJob::Batch
+
+   input_category name: :main, serializer: :encrypt
+   output_category name: :main
+ end
+ ~~~
+
+ Is the same as:
+ ~~~ruby
+ class MyJob < RocketJob::Job
+   include RocketJob::Batch
+
+   input_category serializer: :encrypt
+   output_category
+ end
+ ~~~
+
+ ##### Existing and in-flight jobs
+
+ When migrating to Rocket Job v6, it is recommended to load every job and then save it back again as part of the
+ deployment. When a job loads, it automatically converts itself from the old schema to the new v6 schema.
+
+ In-flight jobs should not be affected, but it is important to shut down all running batch
+ servers _before_ starting any new instances.
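+
+ A minimal migration sketch that can be run from a console during the deployment
+ (assumes all job classes are loadable in that environment):
+
+ ~~~ruby
+ # Loading each job converts it to the v6 schema; saving persists the conversion.
+ RocketJob::Job.all.each do |job|
+   job.save!
+ rescue StandardError => e
+   puts "Unable to migrate job #{job.id}: #{e.message}"
+ end
+ ~~~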
 
  ## Rocket Job v4
 
- Rocket Job Pro is now open source and included in Rocket Job.
+ Rocket Job Pro is now fully open source and included in Rocket Job under the Apache License.
 
  The `RocketJob::Batch` plugin now adds batch processing capabilities to break up a single task into many
  concurrent workers processing slices of the entire job at the same time.
data/lib/rocket_job/batch/categories.rb CHANGED
@@ -72,31 +72,38 @@ module RocketJob
        end
 
        def input_category(category_name = :main)
+         return category_name if category_name.is_a?(Category::Input)
+         raise(ArgumentError, "Cannot supply Output Category to input category") if category_name.is_a?(Category::Output)
+
          category_name = category_name.to_sym
-         category = nil
-         # .find does not work against this association
-         input_categories.each { |catg| category = catg if catg.name == category_name }
-         unless category
-           # Auto-register main input category if missing
-           if category_name == :main
-             category = Category::Input.new
-             self.input_categories = [category]
-           else
-             raise(ArgumentError, "Unknown Input Category: #{category_name.inspect}. Registered categories: #{input_categories.collect(&:name).join(',')}")
-           end
+         # find does not work against this association
+         input_categories.each { |category| return category if category.name == category_name }
+
+         unless category_name == :main
+           raise(
+             ArgumentError,
+             "Unknown Input Category: #{category_name.inspect}. Registered categories: #{input_categories.collect(&:name).join(',')}"
+           )
          end
+
+         # Auto-register main input category when not defined
+         category = Category::Input.new(job: self)
+         self.input_categories << category
          category
        end
 
        def output_category(category_name = :main)
+         return category_name if category_name.is_a?(Category::Output)
+         raise(ArgumentError, "Cannot supply Input Category to output category") if category_name.is_a?(Category::Input)
+
          category_name = category_name.to_sym
-         category = nil
          # .find does not work against this association
-         output_categories.each { |catg| category = catg if catg.name == category_name }
-         unless category
-           raise(ArgumentError, "Unknown Output Category: #{category_name.inspect}. Registered categories: #{output_categories.collect(&:name).join(',')}")
-         end
-         category
+         output_categories.each { |category| return category if category.name == category_name }
+
+         raise(
+           ArgumentError,
+           "Unknown Output Category: #{category_name.inspect}. Registered categories: #{output_categories.collect(&:name).join(',')}"
+         )
        end
 
        # Returns [true|false] whether the named category has already been defined
@@ -211,7 +218,7 @@ module RocketJob
          category.tabular.render(row)
        end
 
-       # Migrate existing v4 batch jobs to v5.0
+       # Migrate existing v5 batch jobs to v6
        def rocketjob_categories_migrate
          return unless attribute_present?(:input_categories) && self[:input_categories]&.first.is_a?(Symbol)
 
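A sketch of the reworked category lookup behavior implied by the diff above (the job class is illustrative):

~~~ruby
class MyJob < RocketJob::Job
  include RocketJob::Batch

  output_category name: :errors
end

job = MyJob.new

# The :main input category is auto-registered on first access:
job.input_category.name #=> :main

# Passing a Category instance back in returns it unchanged:
job.output_category(job.output_category(:errors)).name #=> :errors

# Unknown output categories now raise an ArgumentError listing the registered names:
job.output_category(:missing) # => ArgumentError
~~~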
data/lib/rocket_job/batch/io.rb CHANGED
@@ -14,11 +14,9 @@ module RocketJob
        # Default: None ( Uses the single default input collection for this job )
        # Validates: This value must be one of those listed in #input_categories
        def input(category = :main)
-         raise(ArgumentError, "Cannot supply Output Category to input category") if category.is_a?(Category::Output)
+         category = input_category(category)
 
-         category = input_category(category) unless category.is_a?(Category::Input)
-
-         (@inputs ||= {})[category.name] ||= RocketJob::Sliced.factory(:input, category, self)
+         (@inputs ||= {})[category.name] ||= category.data_store(self)
        end
 
        # Returns [RocketJob::Sliced::Output] output collection for holding output slices
@@ -30,11 +28,9 @@ module RocketJob
        # Default: None ( Uses the single default output collection for this job )
        # Validates: This value must be one of those listed in #output_categories
        def output(category = :main)
-         raise(ArgumentError, "Cannot supply Input Category to output category") if category.is_a?(Category::Input)
-
-         category = output_category(category) unless category.is_a?(Category::Output)
+         category = output_category(category)
 
-         (@outputs ||= {})[category.name] ||= RocketJob::Sliced.factory(:output, category, self)
+         (@outputs ||= {})[category.name] ||= category.data_store(self)
        end
 
        # Rapidly upload individual records in batches.
@@ -59,19 +55,19 @@ module RocketJob
        # The category or the name of the category to access or download data from
        # Default: None ( Uses the single default output collection for this job )
        # Validates: This value must be one of those listed in #input_categories
-       def lookup_collection(category = :main)
-         category = input_category(category) unless category.is_a?(Category::Input)
-
-         collection = (@lookup_collections ||= {})[category.name]
-
-         unless collection
-           collection_name = "rocket_job.inputs.#{id}"
-           collection_name << ".#{category.name}" unless category.name == :main
-
-           @lookup_collections[category.name] ||=
-             LookupCollection.new(Sliced::Slice.collection.database, collection_name)
-         end
-       end
+       # def lookup_collection(category = :main)
+       #   category = input_category(category) unless category.is_a?(Category::Input)
+       #
+       #   collection = (@lookup_collections ||= {})[category.name]
+       #
+       #   unless collection
+       #     collection_name = "rocket_job.inputs.#{id}"
+       #     collection_name << ".#{category.name}" unless category.name == :main
+       #
+       #     @lookup_collections[category.name] ||=
+       #       LookupCollection.new(Sliced::Slice.collection.database, collection_name)
+       #   end
+       # end
 
        # Upload the supplied file, io, IOStreams::Path, or IOStreams::Stream.
        #
@@ -154,53 +150,7 @@ module RocketJob
        # * If an io stream is supplied, it is read until it returns nil.
        # * Only use this method for UTF-8 data, for binary data use #input_slice or #input_records.
        # * CSV parsing is slow, so it is usually left for the workers to do.
-       def upload(stream = nil, file_name: nil, category: :main, stream_mode: :line, on_first: nil, **args, &block)
-         raise(ArgumentError, "Either stream, or a block must be supplied") unless stream || block
-
-         category = input_category(category) unless category.is_a?(Category::Input)
-         stream ||= category.file_name
-         path = nil
-
-         if stream
-           path = IOStreams.new(stream)
-           path.file_name = file_name if file_name
-           category.file_name = path.file_name
-
-           # Auto detect the format based on the upload file name if present.
-           if category.format == :auto
-             format = path.format
-             if format
-               # Rebuild tabular with the above file name
-               category.reset_tabular
-               category.format = format
-             end
-           end
-         end
-
-         # Tabular transformations required for upload?
-         if category.tabular?
-           # Remove non-printable characters from tabular input formats
-           # Cannot change the length of fixed width lines
-           replace = category.format == :fixed ? " " : ""
-           path&.option_or_stream(:encode, encoding: "UTF-8", cleaner: :printable, replace: replace)
-
-           # Extract the header line during the file upload when needed.
-           on_first = rocket_job_upload_header_lambda(category, on_first) if category.tabular.header?
-         end
-
-         count =
-           if block
-             input(category).upload(on_first: on_first, &block)
-           else
-             input(category).upload(on_first: on_first) do |io|
-               path.each(stream_mode, **args) { |line| io << line }
-             end
-           end
-
-         self.record_count = (record_count || 0) + count
-         count
-       end
-
+       #
        # Upload results from an Arel into RocketJob::SlicedJob.
        #
        # Params
@@ -227,18 +177,13 @@ module RocketJob
        #
        # Example: Upload user_name and zip_code
        #   arel = User.where(country_code: 'US')
-       #   job.upload_arel(arel, :user_name, :zip_code)
+       #   job.upload_arel(arel, columns: [:user_name, :zip_code])
        #
        # Notes:
        # * Only call from one thread at a time against a single instance of this job.
        # * The record_count for the job is set to the number of records returned by the arel.
        # * If an exception is raised while uploading data, the input collection is cleared out
        #   so that if a job is retried during an upload failure, data is not duplicated.
-       def upload_arel(arel, *column_names, category: :main, &block)
-         count = input(category).upload_arel(arel, *column_names, &block)
-         self.record_count = (record_count || 0) + count
-         count
-       end
 
        # Upload the result of a MongoDB query to the input collection for processing
        # Useful when an entire MongoDB collection, or part thereof needs to be
@@ -266,24 +211,19 @@ module RocketJob
        #   criteria = User.where(state: 'FL')
        #   job.record_count = job.upload_mongo_query(criteria)
        #
-       # Example: Upload just the supplied column
+       # Example: Upload only the specified column(s)
        #   criteria = User.where(state: 'FL')
-       #   job.record_count = job.upload_mongo_query(criteria, :zip_code)
+       #   job.record_count = job.upload_mongo_query(criteria, columns: [:zip_code])
        #
        # Notes:
        # * Only call from one thread at a time against a single instance of this job.
        # * The record_count for the job is set to the number of records returned by the mongo query.
        # * If an exception is raised while uploading data, the input collection is cleared out
        #   so that if a job is retried during an upload failure, data is not duplicated.
-       def upload_mongo_query(criteria, *column_names, category: :main, &block)
-         count = input(category).upload_mongo_query(criteria, *column_names, &block)
-         self.record_count = (record_count || 0) + count
-         count
-       end
 
        # Upload sliced range of integer requests as arrays of start and end ids.
        #
-       # Returns [Integer] last_id - start_id + 1.
+       # Returns [Integer] the number of slices uploaded.
        #
        # Uploads one range per slice so that the response can return multiple records
        # for each slice processed
@@ -302,17 +242,11 @@ module RocketJob
        # * The record_count for the job is set to: last_id - start_id + 1.
        # * If an exception is raised while uploading data, the input collection is cleared out
        #   so that if a job is retried during an upload failure, data is not duplicated.
-       def upload_integer_range(start_id, last_id, category: :main)
-         input(category).upload_integer_range(start_id, last_id)
-         count = last_id - start_id + 1
-         self.record_count = (record_count || 0) + count
-         count
-       end
 
        # Upload sliced range of integer requests as arrays of start and end ids
        # starting with the last range first
        #
-       # Returns [Integer] last_id - start_id + 1.
+       # Returns [Integer] the number of slices uploaded.
        #
        # Uploads one range per slice so that the response can return multiple records
        # for each slice processed.
@@ -334,14 +268,102 @@ module RocketJob
        # * The record_count for the job is set to: last_id - start_id + 1.
        # * If an exception is raised while uploading data, the input collection is cleared out
        #   so that if a job is retried during an upload failure, data is not duplicated.
-       def upload_integer_range_in_reverse_order(start_id, last_id, category: :main)
-         input(category).upload_integer_range_in_reverse_order(start_id, last_id)
-         count = last_id - start_id + 1
+
+       def upload(object = nil, category: :main, file_name: nil, stream_mode: nil, on_first: nil, columns: nil, slice_batch_size: nil, **args, &block)
+         input_collection = input(category)
+
+         if block
+           raise(ArgumentError, "Cannot supply both an object to upload, and a block.") if object
+           if stream_mode || columns || slice_batch_size || args.size > 0
+             raise(ArgumentError, "Unknown keyword arguments when uploading a block. Only accepts :category, :file_name, or :on_first")
+           end
+
+           category = input_category(category)
+           category.file_name = file_name if file_name
+
+           # Extract the header line during the upload when applicable.
+           extract_header = category.extract_header_callback(on_first)
+
+           count = input_collection.upload(on_first: extract_header, slice_batch_size: slice_batch_size, &block)
+           self.record_count = (record_count || 0) + count
+           return count
+         end
+
+         count =
+           case object
+           when Range
+             if file_name || stream_mode || on_first || args.size > 0
+               raise(ArgumentError, "Unknown keyword arguments when uploading a Range. Only accepts :category, :columns, or :slice_batch_size")
+             end
+
+             first = object.first
+             last  = object.last
+             if first < last
+               input_collection.upload_integer_range(first, last, slice_batch_size: slice_batch_size || 1_000)
+             else
+               input_collection.upload_integer_range_in_reverse_order(last, first, slice_batch_size: slice_batch_size || 1_000)
+             end
+           when Mongoid::Criteria
+             if file_name || stream_mode || on_first || args.size > 0
+               raise(ArgumentError, "Unknown keyword arguments when uploading a Mongoid::Criteria. Only accepts :category, :columns, or :slice_batch_size")
+             end
+
+             input_collection.upload_mongo_query(object, columns: columns, slice_batch_size: slice_batch_size, &block)
+           when defined?(ActiveRecord::Relation) ? ActiveRecord::Relation : false
+             if file_name || stream_mode || on_first || args.size > 0
+               raise(ArgumentError, "Unknown keyword arguments when uploading an ActiveRecord::Relation. Only accepts :category, :columns, or :slice_batch_size")
+             end
+
+             input_collection.upload_arel(object, columns: columns, slice_batch_size: slice_batch_size, &block)
+           else
+             raise(ArgumentError, "Unknown keyword argument :columns when uploading a file") if columns
+
+             category = input_category(category)
+
+             # Extract the header line during the upload when applicable.
+             extract_header = category.extract_header_callback(on_first)
+             path           = category.upload_path(object, original_file_name: file_name)
+
+             input_collection.upload(on_first: extract_header, slice_batch_size: slice_batch_size) do |io|
+               path.each(stream_mode || :line, **args) { |line| io << line }
+             end
+           end
+
+         self.record_count = (record_count || 0) + count
+         count
+       end
+
+       # @deprecated
+       def upload_arel(arel, *column_names, category: :main, &block)
+         count = input(category).upload_arel(arel, columns: column_names, &block)
          self.record_count = (record_count || 0) + count
         count
        end
 
-       # Upload the supplied slices for processing by workers
+       # @deprecated
+       def upload_mongo_query(criteria, *column_names, category: :main, &block)
+         count = input(category).upload_mongo_query(criteria, columns: column_names, &block)
+         self.record_count = (record_count || 0) + count
+         count
+       end
+
+       # @deprecated
+       def upload_integer_range(start_id, last_id, category: :main, slice_batch_size: 1_000)
+         count = input(category).upload_integer_range(start_id, last_id, slice_batch_size: slice_batch_size)
+         self.record_count = (record_count || 0) + count
+         count
+       end
+
+       # @deprecated
+       def upload_integer_range_in_reverse_order(start_id, last_id, category: :main, slice_batch_size: 1_000)
+         count = input(category).upload_integer_range_in_reverse_order(start_id, last_id, slice_batch_size: slice_batch_size)
+         self.record_count = (record_count || 0) + count
+         count
+       end
+
+       # Upload the supplied slice for processing by workers
        #
        # Updates the record_count after adding the records
        #
@@ -421,56 +443,34 @@ module RocketJob
        def download(stream = nil, category: :main, header_line: nil, **args, &block)
          raise "Cannot download incomplete job: #{id}. Currently in state: #{state}-#{sub_state}" if rocket_job_processing?
 
-         category = output_category(category) unless category.is_a?(Category::Output)
-         output_collection = output(category)
+         category          = output_category(category) unless category.is_a?(Category::Output)
+         output_collection = output(category)
 
          # Store the output file name in the category
          category.file_name = stream if !block && (stream.is_a?(String) || stream.is_a?(IOStreams::Path))
 
-         if output_collection.binary?
-           raise(ArgumentError, "A `header_line` is not supported with binary output collections") if header_line
-
-           return output_collection.download(&block) if block
+         header_line ||= category.render_header
 
-           IOStreams.new(stream || category.file_name).stream(:none).writer(**args) do |io|
-             output_collection.download { |record| io << record[:binary] }
-           end
-         else
-           header_line ||= category.render_header
+         return output_collection.download(header_line: header_line, &block) if block
 
-           return output_collection.download(header_line: header_line, &block) if block
+         raise(ArgumentError, "Missing mandatory `stream` or `category.file_name`") unless stream || category.file_name
 
-           raise(ArgumentError, "Missing mandatory `stream` or `category.file_name`") unless stream || category.file_name
+         if output_collection.slice_class.binary_format
+           binary_header_line = output_collection.slice_class.to_binary(header_line) if header_line
 
+           # Don't overwrite supplied stream options if any
+           stream = stream&.is_a?(IOStreams::Stream) ? stream.dup : IOStreams.new(category.file_name)
+           stream.remove_from_pipeline(output_collection.slice_class.binary_format)
+           stream.writer(**args) do |io|
+             # TODO: Binary formats should return the record count, instead of the slice count.
+             output_collection.download(header_line: binary_header_line) { |record| io.write(record) }
+           end
+         else
           IOStreams.new(stream || category.file_name).writer(:line, **args) do |io|
             output_collection.download(header_line: header_line) { |record| io << record }
           end
         end
       end
-
-       private
-
-       # Return a lambda to extract the header row from the uploaded file.
-       def rocket_job_upload_header_lambda(category, on_first)
-         case category.mode
-         when :line
-           lambda do |line|
-             category.tabular.parse_header(line)
-             category.cleanse_header!
-             category.columns = category.tabular.header.columns
-             # Call chained on_first if present
-             on_first&.call(line)
-           end
-         when :array
-           lambda do |row|
-             category.tabular.header.columns = row
-             category.cleanse_header!
-             category.columns = category.tabular.header.columns
-             # Call chained on_first if present
-             on_first&.call(line)
-           end
-         end
-       end
      end
    end
  end
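The new unified `upload` method above dispatches on the type of its first argument, replacing the separate upload helpers (which remain as deprecated wrappers). A usage sketch (the job and data are illustrative):

~~~ruby
job = MyJob.new

# File or IOStreams path: the format is auto-detected from the file name.
job.upload("data/users.csv")

# Range: uploads one start/end id pair per slice (a reversed range uploads in reverse order).
job.upload(1..1_000_000)

# ActiveRecord::Relation (when ActiveRecord is loaded), selected columns only.
job.upload(User.where(country_code: "US"), columns: [:user_name, :zip_code])

# Block: write records directly to the input collection.
job.upload do |io|
  io << "first line"
  io << "second line"
end
~~~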
data/lib/rocket_job/batch/performance.rb CHANGED
@@ -22,7 +22,7 @@ module RocketJob
        count_running_workers
 
        puts "Loading job with #{count} records/lines"
-       job = RocketJob::Jobs::PerformanceJob.new(log_level: :warn)
+       job = RocketJob::Jobs::PerformanceJob.new(log_level: :warn)
        job.input_category.slice_size = slice_size
        if encrypt
          job.input_category.serializer = :encrypt
@@ -64,7 +64,7 @@ module RocketJob
 
      # Parse command line options
      def parse(argv)
-       parser = OptionParser.new do |o|
+       parser = OptionParser.new do |o|
          o.on("-c", "--count COUNT", "Count of records to enqueue") do |arg|
            self.count = arg.to_i
          end
data/lib/rocket_job/batch/statistics.rb CHANGED
@@ -49,7 +49,7 @@ module RocketJob
        last = paths.pop
        return unless last
 
-       last_target = paths.inject(in_memory) do |target, sub_key|
+       last_target = paths.inject(in_memory) do |target, sub_key|
          target.key?(sub_key) ? target[sub_key] : target[sub_key] = Hash.new(0)
        end
        last_target[last] += increment
@@ -99,7 +99,7 @@ module RocketJob
 
      # Overrides RocketJob::Batch::Logger#rocket_job_batch_log_payload
      def rocket_job_batch_log_payload
-       h = {
+       h = {
          from:  aasm.from_state,
          to:    aasm.to_state,
          event: aasm.current_event
data/lib/rocket_job/batch/throttle_running_workers.rb CHANGED
@@ -53,7 +53,7 @@ module RocketJob
      # Allows another job with a higher priority to start even though this one is running already
      # @overrides RocketJob::Plugins::Job::ThrottleRunningJobs#throttle_running_jobs_base_query
      def throttle_running_jobs_base_query
-       query = super
+       query = super
        query[:priority.lte] = priority if throttle_running_workers&.positive?
        query
      end
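A sketch of what this override means in practice (the job class is illustrative; `throttle_running_workers` and `priority` are standard Rocket Job fields):

~~~ruby
class BillingRunJob < RocketJob::Job
  include RocketJob::Batch

  # Limit this job class to 10 concurrent workers.
  self.throttle_running_workers = 10

  # Lower number = higher priority (the Rocket Job default is 50).
  self.priority = 30

  def perform(record)
    # Process one record of the batch.
  end
end

# With the query override above, a queued instance with a higher priority
# (lower number) is allowed to start working even while a lower-priority
# instance of the same job is already running.
~~~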