RubyGems - sidekiq-iteration - Versions diffs - 0.1.0 → 0.3.0 - Mend

sidekiq-iteration 0.1.0 → 0.3.0

Files changed (16)

checksums.yaml +4 -4
data/CHANGELOG.md +23 -0
data/README.md +19 -11
data/guides/argument-semantics.md +130 -0
data/guides/best-practices.md +5 -0
data/guides/custom-enumerator.md +34 -7
data/lib/sidekiq_iteration/active_record_enumerator.rb +152 -23
data/lib/sidekiq_iteration/csv_enumerator.rb +2 -10
data/lib/sidekiq_iteration/enumerators.rb +3 -4
data/lib/sidekiq_iteration/iteration.rb +13 -7
data/lib/sidekiq_iteration/job_retry_patch.rb +16 -3
data/lib/sidekiq_iteration/version.rb +1 -1
data/lib/sidekiq_iteration.rb +11 -0
metadata +5 -6
data/lib/sidekiq_iteration/active_record_batch_enumerator.rb +0 -127
data/lib/sidekiq_iteration/active_record_cursor.rb +0 -89

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 82316cffa840b2c9619792b6f0c5bb7ec696f964ad81edf6b0ed5861339ca064
-  data.tar.gz: 8337c0e87e6be8858d5b9d868c8a04b05f91de315be2e9aca9c40e8447c78644
+  metadata.gz: 40efca13e06cd7fdcfc1ff59ad08fea8fc731ee1b5560ae5b75b2591379bcb63
+  data.tar.gz: eec2991b40bb67ffc1dcea55f1c0e8acc98a18cd59ad2fd117cd80c1a94c3e79
 SHA512:
-  metadata.gz: 63712780bca873613cbe3ef89ff0037c3eaa5633a28382eb02427311360714a89f092e5efef29873efa75067d22f32745eea3f04d7606442e29da33e2e2e6a08
-  data.tar.gz: 4ded7fc6ab772c019154e6559027c87d389a356a16915d2c869e318fb26f5339dd98355a0f1eb422358c1cd9f7fe31bc2a70d5627a577f8f45394db15d0593b7
+  metadata.gz: 1162ffafc4d157e7a8f9d2b8f69163e90e83431daac129707e16605a9b0df250c1cf2dd9063f651782b01ab0dcd9a0f3848d381981f94ef5a2daf36f43d591be
+  data.tar.gz: 45a1efa4e1e65ae322b7c923cb5795ceda901863afc4d135cb25e1b342c9604e7f90719259b7fd93b8c38b15a5bcb4cab984617ea915fb58055f21936470269c

data/CHANGELOG.md CHANGED

@@ -1,5 +1,28 @@
 ## master (unreleased)
+## 0.3.0 (2023-05-20)
+- Allow a default retry backoff to be configured
+    ```ruby
+    SidekiqIteration.default_retry_backoff = 10.seconds
+    ```
+- Add ability to iterate Active Record enumerators in reverse order
+    ```ruby
+    active_record_records_enumerator(User.all, order: :desc)
+    ```
+## 0.2.0 (2022-11-11)
+- Fix storing run metadata when the job fails for sidekiq < 6.5.2
+- Make enumerators resume from the last cursor position
+  This fixes `NestedEnumerator` to work correctly. Previously, each intermediate enumerator
+  was resumed from the next cursor position, possibly skipping remaining inner items.
 ## 0.1.0 (2022-11-02)
 - First release

data/README.md CHANGED

@@ -2,7 +2,7 @@
 [![Build Status](https://github.com/fatkodima/sidekiq-iteration/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/fatkodima/sidekiq-iteration/actions/workflows/ci.yml)
-Meet Iteration, an extension for [Sidekiq](https://github.com/mperham/sidekiq) that makes your jobs interruptible and resumable, saving all progress that the job has made (aka checkpoint for jobs).
+Meet Iteration, an extension for [Sidekiq](https://github.com/mperham/sidekiq) that makes your long-running jobs interruptible and resumable, saving all progress that the job has made (aka checkpoint for jobs).
 ## Background
@@ -33,7 +33,7 @@ Software that is designed for high availability [must be resilient](https://12fa
 - Ruby 2.7+ (if you need support for older ruby, [open an issue](https://github.com/fatkodima/sidekiq-iteration/issues/new))
 - Sidekiq 6+
-## Getting started
+## Installation
 Add this line to your application's Gemfile:
@@ -45,6 +45,8 @@ And then execute:
     $ bundle
+## Getting started
 In the job, include `SidekiqIteration::Iteration` module and start describing the job with two methods (`build_enumerator` and `each_iteration`) instead of `perform`:
 ```ruby
@@ -136,10 +138,10 @@ class BatchesJob
 end
 ```
-### Iterating over batches of Active Record Relations
+### Iterating over Active Record Relations
 ```ruby
-class BatchesAsRelationJob
+class RelationsJob
   include Sidekiq::Job
   include SidekiqIteration::Iteration
@@ -151,14 +153,14 @@ class BatchesAsRelationJob
     )
   end
-  def each_iteration(batch_of_comments, product_id)
-    # batch_of_comments will be a Comment::ActiveRecord_Relation
-    batch_of_comments.update_all(deleted: true)
+  def each_iteration(comments_relation, product_id)
+    # comments_relation will be a Comment::ActiveRecord_Relation
+    comments_relation.update_all(deleted: true)
   end
 end
 ```
-### Iterating over arrays
+### Iterating over arbitrary arrays
 ```ruby
 class ArrayJob
@@ -184,10 +186,10 @@ class CsvJob
   def build_enumerator(import_id, cursor:)
     import = Import.find(import_id)
-    csv_enumereator(import.csv, cursor: cursor)
+    csv_enumerator(import.csv, cursor: cursor)
   end
-  def each_iteration(csv_row)
+  def each_iteration(csv_row, import_id)
     # insert csv_row to database
   end
 end
@@ -220,6 +222,7 @@ end
 ## Guides
 * [Iteration: how it works](guides/iteration-how-it-works.md)
+* [Job argument semantics](guides/argument-semantics.md)
 * [Best practices](guides/best-practices.md)
 * [Writing custom enumerator](guides/custom-enumerator.md)
 * [Throttling](guides/throttling.md)
@@ -228,10 +231,15 @@ For more detailed documentation, see [rubydoc](https://rubydoc.info/gems/sidekiq
 ## API
-Iteration job must respond to `build_enumerator` and `each_iteration` methods. `build_enumerator` must return [`Enumerator`](https://ruby-doc.org/core-3.1.2/Enumerator.htmll) object that respects the `cursor` value.
+Iteration job must respond to `build_enumerator` and `each_iteration` methods. `build_enumerator` must return [`Enumerator`](https://ruby-doc.org/core-3.1.2/Enumerator.html) object that respects the `cursor` value.
 ## FAQ
+**Advantages of this pattern over splitting a large job into many small jobs?**
+* Having one job is easier for redis in terms of memory, time and # of requests needed for enqueuing.
+* It simplifies sidekiq monitoring, because you have a predictable number of jobs in the queues, instead of having thousands of them at one time and millions at another. Also easier to navigate its web UI.
+* You can stop/pause/delete just one job, if something goes wrong. With many jobs it is harder and can take a long time, if it is critical to stop it right now.
 **Why can't I just iterate in `#perform` method and do whatever I want?** You can, but then your job has to comply with a long list of requirements, such as the ones above. This creates leaky abstractions more easily, when instead we can expose a more powerful abstraction for developers without exposing the underlying infrastructure.
 **What happens when my job is interrupted?** A checkpoint will be persisted to Redis after the current `each_iteration`, and the job will be re-enqueued. Once it's popped off the queue, the worker will work off from the next iteration.

data/guides/argument-semantics.md ADDED

@@ -0,0 +1,130 @@
+# Argument Semantics
+`sidekiq-iteration` defines the `perform` method, required by `sidekiq`, to allow for iteration.
+The call sequence is usually 3 methods:
+`perform -> build_enumerator -> each_iteration`
+In that sense `sidekiq-iteration` works like a framework (it calls your code) rather than like a library (that you call). When using jobs with parameters, the following rules of thumb are good to keep in mind.
+## Jobs without arguments
+Jobs without arguments do not pass anything into either `build_enumerator` or `each_iteration` except for the `cursor` which `sidekiq-iteration` persists by itself:
+```ruby
+class ArglessJob
+  include Sidekiq::Job
+  include SidekiqIteration::Iteration
+  def build_enumerator(cursor:)
+    # ...
+  end
+  def each_iteration(single_object_yielded_from_enumerator)
+    # ...
+  end
+end
+```
+To enqueue the job:
+```ruby
+ArglessJob.perform_async
+```
+## Jobs with positional arguments
+Jobs with positional arguments will have those arguments available to both `build_enumerator` and `each_iteration`:
+```ruby
+class ArgumentativeJob
+  include Sidekiq::Job
+  include SidekiqIteration::Iteration
+  def build_enumerator(arg1, arg2, arg3, cursor:)
+    # ...
+  end
+  def each_iteration(single_object_yielded_from_enumerator, arg1, arg2, arg3)
+    # ...
+  end
+end
+```
+To enqueue the job:
+```ruby
+ArgumentativeJob.perform_async(_arg1 = "One", _arg2 = "Two", _arg3 = "Three")
+```
+## Jobs with keyword arguments
+Jobs with keyword arguments will have the keyword arguments available to both `build_enumerator` and `each_iteration`, but these arguments come packaged into a Hash in both cases. You will need to `fetch` or `[]` your parameter from the `Hash` you get passed in:
+```ruby
+class ParameterizedJob
+  include Sidekiq::Job
+  include SidekiqIteration::Iteration
+  def build_enumerator(kwargs, cursor:)
+    name = kwargs.fetch("name")
+    email = kwargs.fetch("email")
+    # ...
+  end
+  def each_iteration(object_yielded_from_enumerator, kwargs)
+    name = kwargs.fetch("name")
+    email = kwargs.fetch("email")
+    # ...
+  end
+end
+```
+To enqueue the job:
+```ruby
+ParameterizedJob.perform_async("name" => "Jane", "email" => "jane@host.example")
+```
+## Jobs with both positional and keyword arguments
+Jobs with keyword arguments will have the keyword arguments available to both `build_enumerator` and `each_iteration`, but these arguments come packaged into a Hash in both cases. You will need to `fetch` or `[]` your parameter from the `Hash` you get passed in. Positional arguments get passed first and "unsplatted" (not combined into an array), the `Hash` containing keyword arguments comes after:
+```ruby
+class HighlyConfigurableGreetingJob
+  include Sidekiq::Job
+  include SidekiqIteration::Iteration
+  def build_enumerator(subject_line, kwargs, cursor:)
+    name = kwargs.fetch("sender_name")
+    email = kwargs.fetch("sender_email")
+    # ...
+  end
+  def each_iteration(object_yielded_from_enumerator, subject_line, kwargs)
+    name = kwargs.fetch("sender_name")
+    email = kwargs.fetch("sender_email")
+    # ...
+  end
+end
+```
+To enqueue the job:
+```ruby
+HighlyConfigurableGreetingJob.perform_async(_subject_line = "Greetings everybody!", "sender_name" => "Jane", "sender_email" => "jane@host.example")
+```
+## Returning (yielding) from enumerators
+When defining a custom enumerator (see the [custom enumerator guide](custom-enumerator.md)) you need to yield two positional arguments from it: the object that will be the value for the current iteration (like a single ActiveModel instance, a single number...) and the value you want to be persisted as the `cursor` value should `sidekiq-iteration` decide to interrupt you after this iteration. Calling the enumerator with that cursor should return the next object after the one returned in this iteration. That new `cursor` value does not get passed to `each_iteration`:
+```ruby
+Enumerator.new do |yielder|
+  # In this case `cursor` is an Integer
+  cursor.upto(99999) do |offset|
+    yielder.yield(fetch_record_at(offset), offset)
+  end
+end
+```
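
Reviewer note: the call sequence described in this guide (`perform -> build_enumerator -> each_iteration`) can be modelled in plain Ruby. The sketch below is a simplified, hypothetical driver and not the gem's implementation: `MiniIterationJob`, its `perform` body, and the sample data are assumptions made for illustration. It shows that positional arguments and the trailing `Hash` reach both methods, while the second yielded value (the cursor) never reaches `each_iteration`.

```ruby
# Simplified, hypothetical model of the call sequence; not sidekiq-iteration's code.
class MiniIterationJob
  attr_reader :seen

  def initialize
    @seen = []
  end

  # Plays the role of `perform`: builds the enumerator, then hands each
  # yielded object plus the original job arguments to `each_iteration`.
  def perform(*arguments)
    cursor = nil
    build_enumerator(*arguments, cursor: cursor).each do |object, new_cursor|
      each_iteration(object, *arguments)
      cursor = new_cursor # the value that would be persisted on interruption
    end
    cursor
  end

  # Yields two positional values: the object and the cursor for that object.
  def build_enumerator(prefix, options, cursor:)
    items = %w[a b c]
    Enumerator.new do |yielder|
      (cursor || 0).upto(items.size - 1) do |offset|
        yielder.yield(items[offset], offset)
      end
    end
  end

  # Receives the yielded object first, then the job arguments; no cursor.
  def each_iteration(item, prefix, options)
    @seen << "#{prefix}#{item}#{options.fetch("suffix")}"
  end
end

job = MiniIterationJob.new
last_cursor = job.perform("x-", { "suffix" => "!" })
```

After the run, `job.seen` holds one formatted string per item and `last_cursor` is the offset of the last item yielded.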

data/guides/best-practices.md CHANGED

@@ -1,5 +1,10 @@
 # Best practices
+## Considerations when writing jobs
+* Duration of `#each_iteration`: processing a single element from the enumerator built in `#build_enumerator` should take less than 25 seconds, or the duration set as a timeout for Sidekiq. It allows the job to be safely interrupted and resumed.
+* Idempotency of `#each_iteration`: it should be safe to run `#each_iteration` multiple times for the same element from the enumerator. Read more in [this Sidekiq best practice](https://github.com/mperham/sidekiq/wiki/Best-Practices#2-make-your-job-idempotent-and-transactional). It's important if the job errors and you run it again, because the same element that errored the job may be processed again. It especially matters in the situation described above, when the iteration duration exceeds the timeout: if the job is re-enqueued, multiple elements may be processed again.
 ## Batch iteration
 Regardless of the active record enumerator used in the task, `sidekiq-iteration` gem loads records in batches of 100 (by default).

data/guides/custom-enumerator.md CHANGED

@@ -2,6 +2,17 @@
 Iteration leverages the [`Enumerator`](https://ruby-doc.org/core-3.1.2/Enumerator.html) pattern from the Ruby standard library, which allows us to use almost any resource as a collection to iterate.
+Before writing an enumerator, it is important to understand [how Iteration works](iteration-how-it-works.md) and how
+your enumerator will be used by it. An enumerator must `yield` two things in the following order as positional
+arguments:
+- An object to be processed in a job `each_iteration` method
+- A cursor position, which Iteration will persist if `each_iteration` returns successfully and the job is forced to shut
+  down. It can be any data type your job backend can serialize and deserialize correctly.
+A job that includes Iteration is first started with `nil` as the cursor. When resuming an interrupted job, Iteration
+will deserialize the persisted cursor and pass it to the job's `build_enumerator` method, which your enumerator uses to
+find objects that come _after_ the last successfully processed object.
 ## Cursorless Enumerator
 Consider a custom Enumerator that takes items from a Redis list. Because a Redis list is essentially a queue, we can ignore the cursor:
@@ -23,7 +34,7 @@ class ListJob
     end
   end
-  def each_iteration(item)
+  def each_iteration(item_from_redis)
     # ...
   end
 end
@@ -31,14 +42,15 @@ end
 ## Enumerator with cursor
-But what about iterating based on a cursor? Consider this Enumerator that wraps third party API (Stripe) for paginated iteration:
+For a more complex example, consider this Enumerator that wraps a third party API (Stripe) for paginated iteration and
+stores a string as the cursor position:
 ```ruby
 class StripeListEnumerator
   # @param resource [Stripe::APIResource] The type of Stripe object to request
   # @param params [Hash] Query parameters for the request
   # @param options [Hash] Request options, such as API key or version
-  # @param cursor [String]
+  # @param cursor [nil, String] The Stripe ID of the last item iterated over
   def initialize(resource, params: {}, options: {}, cursor:)
     pagination_params = {}
     pagination_params[:starting_after] = cursor unless cursor.nil?
@@ -59,6 +71,9 @@ class StripeListEnumerator
   def each
     loop do
       @list.each do |item, _index|
+        # The first argument is what gets passed to `each_iteration`.
+        # The second argument (item.id) is going to be persisted as the cursor,
+        # it doesn't get passed to `each_iteration`.
         yield item, item.id
       end
@@ -71,26 +86,38 @@ class StripeListEnumerator
 end
 ```
+Here we leverage the Stripe cursor pagination where the cursor is an ID of a specific item in the collection. The job
+which uses such an `Enumerator` would then look like so:
 ```ruby
-class StripeJob
+class LoadRefundsForChargeJob
   include Sidekiq::Job
   include SidekiqIteration::Iteration
-  def build_enumerator(params, cursor:)
+  def build_enumerator(charge_id, cursor:)
     StripeListEnumerator.new(
       Stripe::Refund,
-      params: { charge: "ch_123" },
+      params: { charge: charge_id }, # "charge_id" will be a prefixed Stripe ID such as "chrg_123"
       options: { api_key: "sk_test_123", stripe_version: "2018-01-18" },
       cursor: cursor
     ).to_enumerator
   end
-  def each_iteration(stripe_refund, _params)
+  # Note that in this case `each_iteration` will only receive one positional argument per iteration.
+  # If what your enumerator yields is a composite object you will need to unpack it yourself
+  # inside the `each_iteration`.
+  def each_iteration(stripe_refund, charge_id)
     # ...
   end
 end
 ```
+and you initiate the job with
+```ruby
+LoadRefundsForChargeJob.perform_async(_charge_id = "chrg_345")
+```
 ## Notes
 We recommend that you read the implementation of the other enumerators that come with the library (`CsvEnumerator`, `ActiveRecordEnumerator`) to gain a better understanding of building Enumerator objects.

data/lib/sidekiq_iteration/active_record_enumerator.rb CHANGED

@@ -1,28 +1,51 @@
 # frozen_string_literal: true
-require_relative "active_record_cursor"
 module SidekiqIteration
-  # Builds Enumerator based on ActiveRecord Relation. Supports enumerating on rows and batches.
   # @private
   class ActiveRecordEnumerator
-    SQL_DATETIME_WITH_NSEC = "%Y-%m-%d %H:%M:%S.%N"
+    SQL_DATETIME_WITH_NSEC = "%Y-%m-%d %H:%M:%S.%6N"
-    def initialize(relation, columns: nil, batch_size: 100, cursor: nil)
+    def initialize(relation, columns: nil, batch_size: 100, order: :asc, cursor: nil)
       unless relation.is_a?(ActiveRecord::Relation)
         raise ArgumentError, "relation must be an ActiveRecord::Relation"
       end
-      @relation = relation
+      unless order == :asc || order == :desc
+        raise ArgumentError, ":order must be :asc or :desc, got #{order.inspect}"
+      end
+      @primary_key = "#{relation.table_name}.#{relation.primary_key}"
+      @columns = Array(columns&.map(&:to_s) || @primary_key)
+      @primary_key_index = @columns.index(@primary_key) || @columns.index(relation.primary_key)
+      @pluck_columns = if @primary_key_index
+                         @columns
+                       else
+                         @columns + [@primary_key]
+                       end
       @batch_size = batch_size
-      @columns = Array(columns || "#{relation.table_name}.#{relation.primary_key}")
-      @cursor = cursor
+      @order = order
+      @cursor = Array.wrap(cursor)
+      raise ArgumentError, "Must specify at least one column" if @columns.empty?
+      if relation.joins_values.present? && !@columns.all?(/\./)
+        raise ArgumentError, "You need to specify fully-qualified columns if you join a table"
+      end
+      if relation.arel.orders.present? || relation.arel.taken.present?
+        raise ArgumentError,
+          "The relation cannot use ORDER BY or LIMIT due to the way how iteration with a cursor is designed. " \
+          "You can use other ways to limit the number of rows, e.g. a WHERE condition on the primary key column."
+      end
+      ordering = @columns.to_h { |column| [column, @order] }
+      @base_relation = relation.reorder(ordering)
+      @iteration_count = 0
     end
     def records
-      Enumerator.new(-> { size }) do |yielder|
+      Enumerator.new(-> { records_size }) do |yielder|
         batches.each do |batch, _|
           batch.each do |record|
+            @iteration_count += 1
             yielder.yield(record, cursor_value(record))
           end
         end
@@ -30,40 +53,146 @@ module SidekiqIteration
     end
     def batches
-      cursor = ActiveRecordCursor.new(@relation, @columns, @cursor)
-      Enumerator.new(-> { size }) do |yielder|
-        while (records = cursor.next_batch(@batch_size))
-          yielder.yield(records, cursor_value(records.last)) if records.any?
+      Enumerator.new(-> { records_size }) do |yielder|
+        while (batch = next_batch(load: true))
+          @iteration_count += 1
+          yielder.yield(batch, cursor_value(batch.last))
         end
       end
     end
-    def size
-      @relation.count(:all)
+    def relations
+      Enumerator.new(-> { relations_size }) do |yielder|
+        while (batch = next_batch(load: false))
+          @iteration_count += 1
+          yielder.yield(batch, unwrap_array(@cursor))
+        end
+      end
     end
     private
+      def records_size
+        @base_relation.count(:all)
+      end
+      def relations_size
+        (records_size + @batch_size - 1) / @batch_size # ceiling division
+      end
+      def next_batch(load:)
+        batch_relation = @base_relation.limit(@batch_size)
+        if conditions.any?
+          batch_relation = batch_relation.where(*conditions)
+        end
+        records = nil
+        cursor_values, ids = batch_relation.uncached do
+          if load
+            records = batch_relation.records
+            pluck_columns(records)
+          else
+            pluck_columns(batch_relation)
+          end
+        end
+        cursor = cursor_values.last
+        return unless cursor.present?
+        # The primary key was plucked, but original cursor did not include it, so we should remove it
+        cursor.pop unless @primary_key_index
+        @cursor = Array.wrap(cursor)
+        # Yields relations by selecting the primary keys of records in the batch.
+        # Post.where(published: nil) results in an enumerator of relations like:
+        # Post.where(published: nil, ids: batch_of_ids)
+        relation = @base_relation.where(@primary_key => ids)
+        relation.send(:load_records, records) if load
+        relation
+      end
+      def pluck_columns(batch)
+        columns =
+          if batch.is_a?(Array)
+            @pluck_columns.map { |column| column.to_s.split(".").last }
+          else
+            @pluck_columns
+          end
+        if columns.size == 1 # only the primary key
+          column_values = batch.pluck(columns.first)
+          return [column_values, column_values]
+        end
+        column_values = batch.pluck(*columns)
+        primary_key_index = @primary_key_index || -1
+        primary_key_values = column_values.map { |values| values[primary_key_index] }
+        serialize_column_values!(column_values)
+        [column_values, primary_key_values]
+      end
       def cursor_value(record)
         positions = @columns.map do |column|
           attribute_name = column.to_s.split(".").last
-          column_value(record, attribute_name)
+          column_value(record[attribute_name])
+        end
+        unwrap_array(positions)
+      end
+      def conditions
+        return [] if @cursor.empty?
+        binds = []
+        sql = build_starts_after_conditions(0, binds)
+        # Start from the record pointed by cursor.
+        # We use the property that `>=` is equivalent to `> or =`.
+        if @iteration_count == 0
+          binds.unshift(*@cursor)
+          columns_equality = @columns.map { |column| "#{column} = ?" }.join(" AND ")
+          sql = "(#{columns_equality}) OR (#{sql})"
         end
-        if positions.size == 1
-          positions.first
+        [sql, *binds]
+      end
+      # (x, y) > (a, b) iff (x > a or (x = a and y > b))
+      # (x, y) < (a, b) iff (x < a or (x = a and y < b))
+      def build_starts_after_conditions(index, binds)
+        column = @columns[index]
+        if index < @cursor.size - 1
+          binds << @cursor[index] << @cursor[index]
+          "#{column} #{@order == :asc ? '>' : '<'} ? OR (#{column} = ? AND (#{build_starts_after_conditions(index + 1, binds)}))"
         else
-          positions
+          binds << @cursor[index]
+          if @columns.size == @cursor.size
+            @order == :asc ? "#{column} > ?" : "#{column} < ?"
+          else
+            @order == :asc ? "#{column} >= ?" : "#{column} <= ?"
+          end
         end
       end
-      def column_value(record, attribute)
-        value = record.read_attribute(attribute.to_sym)
-        case record.class.columns_hash.fetch(attribute).type
-        when :datetime
+      def serialize_column_values!(column_values)
+        column_values.map! { |values| values.map! { |value| column_value(value) } }
+      end
+      def column_value(value)
+        if value.is_a?(Time)
           value.strftime(SQL_DATETIME_WITH_NSEC)
         else
           value
         end
       end
+      def unwrap_array(array)
+        if array.size == 1
+          array.first
+        else
+          array
+        end
+      end
   end
 end
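
Reviewer note: the recursive `build_starts_after_conditions` in this file expands a tuple comparison such as `(x, y) > (a, b)` into nested `OR`/`AND` clauses. The standalone sketch below mirrors that recursion outside of Active Record so the generated SQL fragment can be inspected. `starts_after_sql` is a hypothetical helper name, and the sketch assumes the cursor has one value per column (the diff additionally uses `>=`/`<=` when the cursor is shorter than the column list, which is not reproduced here).

```ruby
# Standalone sketch of the lexicographic "starts after" condition.
# Assumes columns.size == cursor.size; not the gem's actual method.
def starts_after_sql(columns, cursor, order, index = 0, binds = [])
  column = columns[index]
  operator = order == :asc ? ">" : "<"

  if index < cursor.size - 1
    # One bind for the strict comparison, one for the equality branch.
    binds << cursor[index] << cursor[index]
    inner, = starts_after_sql(columns, cursor, order, index + 1, binds)
    ["#{column} #{operator} ? OR (#{column} = ? AND (#{inner}))", binds]
  else
    binds << cursor[index]
    ["#{column} #{operator} ?", binds]
  end
end

sql, binds = starts_after_sql(
  %w[created_at id],
  ["2023-05-20 10:00:00.000000", 42],
  :asc
)
# sql   => "created_at > ? OR (created_at = ? AND (id > ?))"
# binds => ["2023-05-20 10:00:00.000000", "2023-05-20 10:00:00.000000", 42]
```

With `order: :desc` every `>` becomes `<`, which is what makes reverse iteration possible with the same cursor format.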

data/lib/sidekiq_iteration/csv_enumerator.rb CHANGED

@@ -49,7 +49,7 @@ module SidekiqIteration
     def rows(cursor:)
       @csv.lazy
         .each_with_index
-        .drop(count_of_processed_rows(cursor))
+        .drop(cursor || 0)
         .to_enum { count_of_rows_in_file }
     end
@@ -60,7 +60,7 @@ module SidekiqIteration
       @csv.lazy
         .each_slice(batch_size)
         .with_index
-        .drop(count_of_processed_rows(cursor))
+        .drop(cursor || 0)
         .to_enum { (count_of_rows_in_file.to_f / batch_size).ceil }
     end
@@ -73,13 +73,5 @@ module SidekiqIteration
         count -= 1 if @csv.headers
         count
       end
-      def count_of_processed_rows(cursor)
-        if cursor
-          cursor + 1
-        else
-          0
-        end
-      end
   end
 end

data/lib/sidekiq_iteration/enumerators.rb CHANGED

@@ -1,7 +1,6 @@
 # frozen_string_literal: true
 require_relative "active_record_enumerator"
-require_relative "active_record_batch_enumerator"
 require_relative "csv_enumerator"
 require_relative "nested_enumerator"
@@ -22,8 +21,7 @@ module SidekiqIteration
         raise ArgumentError, "array cannot contain ActiveRecord objects"
       end
-      drop = cursor ? cursor + 1 : 0
-      array.each_with_index.drop(drop).to_enum { array.size }
+      array.each_with_index.drop(cursor || 0).to_enum { array.size }
     end
     # Builds Enumerator from Active Record Relation. Each Enumerator tick moves the cursor one row forward.
@@ -33,6 +31,7 @@ module SidekiqIteration
     # @option options :columns [Array<String, Symbol>] used to build the actual query for iteration,
     #   defaults to primary key
     # @option options :batch_size [Integer] (100) size of the batch
+    # @option options :order [:asc, :desc] (:asc) specifies iteration order
     #
     # +columns:+ argument is used to build the actual query for iteration. +columns+: defaults to primary key:
     #
@@ -115,7 +114,7 @@ module SidekiqIteration
     #   end
     #
     def active_record_relations_enumerator(scope, cursor:, **options)
-      ActiveRecordBatchEnumerator.new(scope, cursor: cursor, **options).each
+      ActiveRecordEnumerator.new(scope, cursor: cursor, **options).relations
     end
     # Builds Enumerator from a CSV file.
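
Reviewer note: the array and CSV enumerators both change from `drop(cursor + 1)` to `drop(cursor || 0)`, which matches the 0.2.0 changelog entry about resuming from the last cursor position. The sketch below contrasts the two expressions on a plain array. `resume_old` and `resume_new` are hypothetical names, and the `.to_enum { array.size }` wrapper from the diff is left out for brevity.

```ruby
array = %w[a b c d]

# 0.1.0 expression: skip the element at the cursor index and resume after it.
def resume_old(array, cursor)
  drop = cursor ? cursor + 1 : 0
  array.each_with_index.drop(drop)
end

# 0.3.0 expression: resume at the element the cursor points to.
def resume_new(array, cursor)
  array.each_with_index.drop(cursor || 0)
end

old_pairs = resume_old(array, 1) # => [["c", 2], ["d", 3]]
new_pairs = resume_new(array, 1) # => [["b", 1], ["c", 2], ["d", 3]]
```

Both variants return the full list of `[element, index]` pairs when the cursor is `nil`, so only resumed runs behave differently.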

data/lib/sidekiq_iteration/iteration.rb CHANGED

@@ -13,13 +13,13 @@ module SidekiqIteration
       base.extend(Throttling)
       base.class_eval do
-        throttle_on(backoff: 0) do |job|
+        throttle_on(backoff: SidekiqIteration.default_retry_backoff) do |job|
           job.class.max_job_runtime &&
             job.start_time &&
             (Time.now.utc - job.start_time) > job.class.max_job_runtime
         end
-        throttle_on(backoff: 0) do
+        throttle_on(backoff: SidekiqIteration.default_retry_backoff) do
           defined?(Sidekiq::CLI) &&
             Sidekiq::CLI.instance.launcher.stopping?
         end
@@ -56,16 +56,22 @@ module SidekiqIteration
     attr_reader :executions,
       :cursor_position,
-      :start_time,
       :times_interrupted,
-      :total_time,
       :current_run_iterations
+    # The time when the job starts running. If the job is interrupted and runs again,
+    # the value is updated.
+    attr_reader :start_time
+    # The total time the job has been running, including multiple iterations.
+    # The time isn't reset if the job is interrupted.
+    attr_reader :total_time
     # @private
     def initialize
       super
       @arguments = nil
-      @job_iteration_retry_backoff = nil
+      @job_iteration_retry_backoff = SidekiqIteration.default_retry_backoff
       @needs_reenqueue = false
       @current_run_iterations = 0
     end
@@ -191,14 +197,14 @@ module SidekiqIteration
           )
         end
-        adjust_total_time
         true
+      ensure
+        adjust_total_time
       end
       def reenqueue_iteration_job
         SidekiqIteration.logger.info("[SidekiqIteration::Iteration] Interrupting and re-enqueueing the job cursor_position=#{cursor_position}")
-        adjust_total_time
         @times_interrupted += 1
         arguments = @arguments
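
Reviewer note: the last hunk above moves `adjust_total_time` into an `ensure` clause. A plausible reading is that the total time should be updated even when an iteration raises, although the diff itself does not state the motivation. The sketch below shows the general Ruby behaviour this relies on. `RunTracker` and its counter are hypothetical and only stand in for the real bookkeeping.

```ruby
# Hypothetical stand-in showing that `ensure` runs on both success and failure.
class RunTracker
  attr_reader :adjustments

  def initialize
    @adjustments = 0
  end

  def iterate
    yield
    true
  ensure
    # Runs whether the block returned normally or raised.
    adjust_total_time
  end

  private

  def adjust_total_time
    @adjustments += 1
  end
end

tracker = RunTracker.new
tracker.iterate { :ok }

begin
  tracker.iterate { raise "boom" }
rescue RuntimeError
  # The exception still propagates; only the bookkeeping is guaranteed.
end
```

After both calls `tracker.adjustments` is 2, one adjustment per run regardless of outcome.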

data/lib/sidekiq_iteration/job_retry_patch.rb CHANGED

@@ -7,6 +7,17 @@ module SidekiqIteration
   module JobRetryPatch
     private
       def process_retry(jobinst, msg, queue, exception)
+        add_sidekiq_iteration_metadata(jobinst, msg)
+        super
+      end
+      # The method was renamed in https://github.com/mperham/sidekiq/commit/0676a5202e89aa9da4ad7991f4111b97a9d8a0a4.
+      def attempt_retry(jobinst, msg, queue, exception)
+        add_sidekiq_iteration_metadata(jobinst, msg)
+        super
+      end
+      def add_sidekiq_iteration_metadata(jobinst, msg)
         if jobinst.is_a?(Iteration)
           unless msg["args"].last.is_a?(Hash)
             msg["args"].push({})
@@ -19,12 +30,14 @@ module SidekiqIteration
             "total_time" => jobinst.total_time,
           }
         end
-        super
       end
   end
 end
-if Sidekiq::JobRetry.instance_method(:process_retry)
+if Sidekiq::JobRetry.private_method_defined?(:process_retry) ||
+   Sidekiq::JobRetry.private_method_defined?(:attempt_retry)
   Sidekiq::JobRetry.prepend(SidekiqIteration::JobRetryPatch)
+else
+  raise "Sidekiq #{Sidekiq::VERSION} removed the #process_retry method. " \
+        "Please open an issue at the `sidekiq-iteration` gem."
 end
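
Reviewer note: the guard changes from `instance_method(:process_retry)` to `private_method_defined?`. `Module#instance_method` raises `NameError` for an undefined method instead of returning a falsy value, so the old check could never reach an `else` branch. The sketch below demonstrates the new pattern on two hypothetical classes (`OldRetry`, `NewRetry`) that stand in for Sidekiq versions before and after the rename; none of these names come from Sidekiq itself.

```ruby
# Hypothetical stand-ins for two Sidekiq versions; not Sidekiq's real classes.
class OldRetry
  def run
    process_retry("job")
  end

  private

  def process_retry(job)
    "retried #{job}"
  end
end

class NewRetry
  def run
    attempt_retry("job")
  end

  private

  def attempt_retry(job)
    "retried #{job}"
  end
end

# Wraps whichever private method the host class defines.
module MetadataPatch
  private

  def process_retry(job)
    super("#{job}+metadata")
  end

  def attempt_retry(job)
    super("#{job}+metadata")
  end
end

[OldRetry, NewRetry].each do |klass|
  if klass.private_method_defined?(:process_retry) ||
     klass.private_method_defined?(:attempt_retry)
    klass.prepend(MetadataPatch)
  end
end

old_result = OldRetry.new.run
new_result = NewRetry.new.run
```

Defining both wrappers in one module is safe here because the wrapper whose `super` target is missing is never called by the host class.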

data/lib/sidekiq_iteration/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module SidekiqIteration
-  VERSION = "0.1.0"
+  VERSION = "0.3.0"
 end

data/lib/sidekiq_iteration.rb CHANGED

@@ -22,6 +22,17 @@ module SidekiqIteration
     #
     attr_accessor :max_job_runtime
+    # Configures a delay duration to wait before resuming an interrupted job.
+    #
+    # @example
+    #   SidekiqIteration.default_retry_backoff = 10.seconds
+    #
+    # Defaults to nil which means interrupted jobs will be retried immediately.
+    # This value will be ignored when an interruption is raised by a throttle enumerator,
+    # where the throttle backoff value will take precedence over this setting.
+    #
+    attr_accessor :default_retry_backoff
     # Set a custom logger for sidekiq-iteration.
     # Defaults to `Sidekiq.logger`.
     #
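
Reviewer note: the new `default_retry_backoff` comment describes a precedence rule (a throttle backoff wins over the default, and `nil` means an immediate retry). The sketch below models that rule with a module-level accessor. `MiniIteration` and `reenqueue_delay` are hypothetical names, the way the gem actually combines the two values is an assumption, and a plain integer is used instead of `10.seconds` so that the example does not need ActiveSupport.

```ruby
# Hypothetical model of the documented precedence; not the gem's code.
module MiniIteration
  class << self
    # nil by default, meaning "retry immediately".
    attr_accessor :default_retry_backoff
  end

  # A throttle-specific backoff, when given, takes precedence over the default.
  def self.reenqueue_delay(throttle_backoff = nil)
    throttle_backoff || default_retry_backoff || 0
  end
end

immediate = MiniIteration.reenqueue_delay      # no default configured
MiniIteration.default_retry_backoff = 10
delayed = MiniIteration.reenqueue_delay        # falls back to the default
throttled = MiniIteration.reenqueue_delay(30)  # throttle value wins
```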

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: sidekiq-iteration
 version: !ruby/object:Gem::Version
-  version: 0.1.0
+  version: 0.3.0
 platform: ruby
 authors:
 - fatkodima
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2022-11-02 00:00:00.000000000 Z
+date: 2023-05-20 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: sidekiq
@@ -35,14 +35,13 @@ files:
 - CHANGELOG.md
 - LICENSE.txt
 - README.md
+- guides/argument-semantics.md
 - guides/best-practices.md
 - guides/custom-enumerator.md
 - guides/iteration-how-it-works.md
 - guides/throttling.md
 - lib/sidekiq-iteration.rb
 - lib/sidekiq_iteration.rb
-- lib/sidekiq_iteration/active_record_batch_enumerator.rb
-- lib/sidekiq_iteration/active_record_cursor.rb
 - lib/sidekiq_iteration/active_record_enumerator.rb
 - lib/sidekiq_iteration/csv_enumerator.rb
 - lib/sidekiq_iteration/enumerators.rb
@@ -73,8 +72,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.1.6
+rubygems_version: 3.4.12
 signing_key:
 specification_version: 4
-summary: Makes your sidekiq jobs interruptible and resumable.
+summary: Makes your long-running sidekiq jobs interruptible and resumable.
 test_files: []

data/lib/sidekiq_iteration/active_record_batch_enumerator.rb DELETED

@@ -1,127 +0,0 @@
-# frozen_string_literal: true
-module SidekiqIteration
-  # Batch Enumerator based on ActiveRecord Relation.
-  # @private
-  class ActiveRecordBatchEnumerator
-    include Enumerable
-    SQL_DATETIME_WITH_NSEC = "%Y-%m-%d %H:%M:%S.%N"
-    def initialize(relation, columns: nil, batch_size: 100, cursor: nil)
-      @primary_key = "#{relation.table_name}.#{relation.primary_key}"
-      @columns = Array(columns&.map(&:to_s) || @primary_key)
-      @primary_key_index = @columns.index(@primary_key) || @columns.index(relation.primary_key)
-      @pluck_columns = if @primary_key_index
-                         @columns
-                       else
-                         @columns + [@primary_key]
-                       end
-      @batch_size = batch_size
-      @cursor = Array.wrap(cursor)
-      @initial_cursor = @cursor
-      raise ArgumentError, "Must specify at least one column" if @columns.empty?
-      if relation.joins_values.present? && !@columns.all?(/\./)
-        raise ArgumentError, "You need to specify fully-qualified columns if you join a table"
-      end
-      if relation.arel.orders.present? || relation.arel.taken.present?
-        raise ArgumentError,
-          "The relation cannot use ORDER BY or LIMIT due to the way how iteration with a cursor is designed. " \
-          "You can use other ways to limit the number of rows, e.g. a WHERE condition on the primary key column."
-      end
-      @base_relation = relation.reorder(@columns.join(", "))
-    end
-    def each
-      return to_enum { size } unless block_given?
-      while (relation = next_batch)
-        yield relation, cursor_value
-      end
-    end
-    def size
-      (@base_relation.count(:all) + @batch_size - 1) / @batch_size # ceiling division
-    end
-    private
-      def next_batch
-        relation = @base_relation.limit(@batch_size)
-        if conditions.any?
-          relation = relation.where(*conditions)
-        end
-        cursor_values, ids = relation.uncached do
-          pluck_columns(relation)
-        end
-        cursor = cursor_values.last
-        unless cursor.present?
-          @cursor = @initial_cursor
-          return
-        end
-        # The primary key was plucked, but original cursor did not include it, so we should remove it
-        cursor.pop unless @primary_key_index
-        @cursor = Array.wrap(cursor)
-        # Yields relations by selecting the primary keys of records in the batch.
-        # Post.where(published: nil) results in an enumerator of relations like:
-        # Post.where(published: nil, ids: batch_of_ids)
-        @base_relation.where(@primary_key => ids)
-      end
-      def pluck_columns(relation)
-        if @pluck_columns.size == 1 # only the primary key
-          column_values = relation.pluck(*@pluck_columns)
-          return [column_values, column_values]
-        end
-        column_values = relation.pluck(*@pluck_columns)
-        primary_key_index = @primary_key_index || -1
-        primary_key_values = column_values.map { |values| values[primary_key_index] }
-        serialize_column_values!(column_values)
-        [column_values, primary_key_values]
-      end
-      def cursor_value
-        if @cursor.size == 1
-          @cursor.first
-        else
-          @cursor
-        end
-      end
-      def conditions
-        column_index = @cursor.size - 1
-        column = @columns[column_index]
-        where_clause = if @columns.size == @cursor.size
-                         "#{column} > ?"
-                       else
-                         "#{column} >= ?"
-                       end
-        while column_index > 0
-          column_index -= 1
-          column = @columns[column_index]
-          where_clause = "#{column} > ? OR (#{column} = ? AND (#{where_clause}))"
-        end
-        ret = @cursor.reduce([where_clause]) { |params, value| params << value << value }
-        ret.pop
-        ret
-      end
-      def serialize_column_values!(column_values)
-        column_values.map! { |values| values.map! { |value| column_value(value) } }
-      end
-      def column_value(value)
-        if value.is_a?(Time)
-          value.strftime(SQL_DATETIME_WITH_NSEC)
-        else
-          value
-        end
-      end
-  end
-end
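
The removed `size` and `column_value` methods above rely on two small pieces of arithmetic and formatting that are easy to check in isolation. The following is a minimal standalone sketch, not the gem's code: the helper names `batch_count` and `serialize_cursor_value` are hypothetical, and only the format string and the ceiling-division expression are taken from the deleted file.

```ruby
# Format string copied from the deleted file; %N emits nanoseconds (9 digits).
SQL_DATETIME_WITH_NSEC = "%Y-%m-%d %H:%M:%S.%N"

# Hypothetical helper: number of batches needed for `total` rows,
# using the same integer ceiling division as the removed `size` method.
def batch_count(total, batch_size)
  (total + batch_size - 1) / batch_size
end

# Hypothetical helper: mirrors the removed `column_value` method, which
# turns Time values into strings so the cursor can be stored as plain
# job arguments; other values pass through unchanged.
def serialize_cursor_value(value)
  if value.is_a?(Time)
    value.strftime(SQL_DATETIME_WITH_NSEC)
  else
    value
  end
end

puts batch_count(250, 100)                                      # 3
puts serialize_cursor_value(Time.utc(2023, 5, 20, 12, 0, 0))    # 2023-05-20 12:00:00.000000000
puts serialize_cursor_value(42)                                 # 42
```

Keeping the nanosecond part in the serialized timestamp matters when a timestamp column is part of the cursor: truncating it could make the resumed query skip or repeat rows that share the same second.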

data/lib/sidekiq_iteration/active_record_cursor.rb DELETED

@@ -1,89 +0,0 @@
-# frozen_string_literal: true
-module SidekiqIteration
-  # @private
-  class ActiveRecordCursor
-    include Comparable
-    attr_reader :position, :reached_end
-    def initialize(relation, columns = nil, position = nil)
-      columns ||= "#{relation.table_name}.#{relation.primary_key}"
-      @columns = Array.wrap(columns)
-      raise ArgumentError, "Must specify at least one column" if @columns.empty?
-      self.position = Array.wrap(position)
-      if relation.joins_values.present? && !@columns.all?(/\./)
-        raise ArgumentError, "You need to specify fully-qualified columns if you join a table"
-      end
-      if relation.arel.orders.present? || relation.arel.taken.present?
-        raise ArgumentError,
-          "The relation cannot use ORDER BY or LIMIT due to the way how iteration with a cursor is designed. " \
-          "You can use other ways to limit the number of rows, e.g. a WHERE condition on the primary key column."
-      end
-      @base_relation = relation.reorder(@columns.join(", "))
-      @reached_end = false
-    end
-    def <=>(other)
-      if reached_end == other.reached_end
-        position <=> other.position
-      else
-        reached_end ? 1 : -1
-      end
-    end
-    def position=(position)
-      raise ArgumentError, "Cursor position cannot contain nil values" if position.any?(&:nil?)
-      @position = position
-    end
-    def next_batch(batch_size)
-      return if @reached_end
-      relation = @base_relation.limit(batch_size)
-      if (conditions = self.conditions).any?
-        relation = relation.where(*conditions)
-      end
-      records = relation.uncached do
-        relation.to_a
-      end
-      update_from_record(records.last) if records.any?
-      @reached_end = records.size < batch_size
-      records if records.any?
-    end
-    private
-      def conditions
-        i = @position.size - 1
-        column = @columns[i]
-        conditions = if @columns.size == @position.size
-                       "#{column} > ?"
-                     else
-                       "#{column} >= ?"
-                     end
-        while i > 0
-          i -= 1
-          column = @columns[i]
-          conditions = "#{column} > ? OR (#{column} = ? AND (#{conditions}))"
-        end
-        ret = @position.reduce([conditions]) { |params, value| params << value << value }
-        ret.pop
-        ret
-      end
-      def update_from_record(record)
-        self.position = @columns.map do |column|
-          method = column.to_s.split(".").last
-          record.send(method)
-        end
-      end
-  end
-end
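
Both deleted classes build the same keyset-style WHERE clause in their private `conditions` method. The sketch below reproduces that logic as a free-standing function so the generated SQL fragment can be inspected without ActiveRecord. The function name `cursor_conditions` and the sample column names are assumptions made for illustration; the clause-building steps follow the removed code.

```ruby
# Hypothetical standalone version of the removed `conditions` method.
# `columns`  - ordered column names the relation is sorted by
# `position` - last seen values for a prefix (or all) of those columns
# Returns [sql_fragment, *bind_values], or [] when there is no position yet.
def cursor_conditions(columns, position)
  index = position.size - 1
  column = columns[index]

  # A full position means "strictly after the last row"; a partial
  # position can only say "at or after" for its last known column.
  clause = if columns.size == position.size
             "#{column} > ?"
           else
             "#{column} >= ?"
           end

  # Wrap outwards: each earlier column is either greater, or equal
  # with the inner condition deciding.
  while index > 0
    index -= 1
    column = columns[index]
    clause = "#{column} > ? OR (#{column} = ? AND (#{clause}))"
  end

  # Every value is bound twice except the last, which is bound once.
  # With an empty position the pop removes the clause itself, giving [].
  params = position.reduce([clause]) { |acc, value| acc << value << value }
  params.pop
  params
end

p cursor_conditions(["posts.id"], [10])
# ["posts.id > ?", 10]
p cursor_conditions(["posts.created_at", "posts.id"], ["2023-05-20", 42])
# ["posts.created_at > ? OR (posts.created_at = ? AND (posts.id > ?))", "2023-05-20", "2023-05-20", 42]
p cursor_conditions(["posts.id"], [])
# []
```

The returned array has the shape expected by `relation.where(*conditions)` in the removed code, and the empty array for a fresh cursor is why both classes guard the call with `conditions.any?`.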