sidekiq-iteration 0.2.0 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15)

checksums.yaml +4 -4
data/CHANGELOG.md +44 -0
data/README.md +14 -3
data/guides/argument-semantics.md +130 -0
data/guides/best-practices.md +1 -1
data/guides/custom-enumerator.md +34 -7
data/guides/iteration-how-it-works.md +6 -0
data/guides/throttling.md +1 -1
data/lib/sidekiq_iteration/active_record_enumerator.rb +139 -72
data/lib/sidekiq_iteration/csv_enumerator.rb +1 -1
data/lib/sidekiq_iteration/enumerators.rb +3 -6
data/lib/sidekiq_iteration/iteration.rb +23 -17
data/lib/sidekiq_iteration/version.rb +1 -1
data/lib/sidekiq_iteration.rb +21 -2
metadata +5 -4

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 342540da75582c7f102f6ead29643a5196038978c5626b0c6a00a43db04f4f1f
-  data.tar.gz: cfb7cf80031976e5c68b2503d6ab0a13ffe92efc016419ad20944d3c5ce3e56d
+  metadata.gz: ea1fc3e6f5faff037ecfade45cc58bd2035385cafb888ef67ce253d12295aee8
+  data.tar.gz: a4bf097d4a1a8750f4d3e2ac9ce4f01191b5e7313635fed3d958b89647639c9a
 SHA512:
-  metadata.gz: 9c22d0b3d74888b394fcb26ca759a0a9a74e762f06c1d655b1dcb94b791c5d4be3880c6a127d8b5c0d149317fa4f25e1a2550026a14bde62b3743de7b058557c
-  data.tar.gz: 13ca6cd11f437d9c1b25e6dbd380f018674b1b80b7b4fece7c3a91c042b59b9dc8f8e1bc6a42a14325ec89ab4af4b022d4c3bd00e15aca6cb6189082e03d099a
+  metadata.gz: 415f27277011c3721853ae64c1c78c566a3bfe2c690ebaeb8e9e05bb10b4f3faac00cda0f6f95ef9fd980993695edc6621085833c7901bb09b14ff6027467622
+  data.tar.gz: 68568c8205c0370a3d1765e5bd6431655d1a3d5fd0ff07ee65fc6e8bf1dd0447fba1a62978c45fea3850029a9046f4cbe7768bce885f5bcb4d9c8e322a539ab6

data/CHANGELOG.md CHANGED

@@ -1,5 +1,49 @@
 ## master (unreleased)
+## 0.4.0 (2024-05-10)
+- Support ordering using multiple directions for ActiveRecord enumerators
+    ```ruby
+    active_record_records_enumerator(..., columns: [:shop_id, :id], order: [:asc, :desc])
+    ```
+- Support iterating over ActiveRecord models with composite primary keys
+- Use Arel to generate SQL in ActiveRecord enumerator
+    Previously, the enumerator coerced numeric ids to a string value (e.g.: `... AND id > '1'`),
+    which can cause problems on some DBMSes (like BigQuery).
+- Require an explicitly passed `:columns` value for ActiveRecord enumerators to include the primary key
+    Previously, the primary key column was added implicitly if it was not in the list.
+    ```ruby
+    # before
+    active_record_records_enumerator(..., columns: [:updated_at])
+    # after
+    active_record_records_enumerator(..., columns: [:updated_at, :id])
+    ```
+- Accept single values as a `:columns` for ActiveRecord enumerators
+- Add `around_iteration` hook
+## 0.3.0 (2023-05-20)
+- Allow a default retry backoff to be configured
+    ```ruby
+    SidekiqIteration.default_retry_backoff = 10.seconds
+    ```
+- Add ability to iterate Active Record enumerators in reverse order
+    ```ruby
+    active_record_records_enumerator(User.all, order: :desc)
+    ```
 ## 0.2.0 (2022-11-11)
 - Fix storing run metadata when the job fails for sidekiq < 6.5.2

data/README.md CHANGED

@@ -4,6 +4,8 @@
 Meet Iteration, an extension for [Sidekiq](https://github.com/mperham/sidekiq) that makes your long-running jobs interruptible and resumable, saving all progress that the job has made (aka checkpoint for jobs).
+You may consider the [`pluck_in_batches`](https://github.com/fatkodima/pluck_in_batches) gem to speed up iterating over large database tables.
 ## Background
 Imagine the following job:
@@ -33,7 +35,7 @@ Software that is designed for high availability [must be resilient](https://12fa
 - Ruby 2.7+ (if you need support for older ruby, [open an issue](https://github.com/fatkodima/sidekiq-iteration/issues/new))
 - Sidekiq 6+
-## Getting started
+## Installation
 Add this line to your application's Gemfile:
@@ -45,6 +47,8 @@ And then execute:
     $ bundle
+## Getting started
 In the job, include `SidekiqIteration::Iteration` module and start describing the job with two methods (`build_enumerator` and `each_iteration`) instead of `perform`:
 ```ruby
@@ -97,6 +101,12 @@ class NotifyUsersJob
     # Will be called when the job starts iterating. Called only once, for the first time.
   end
+  def around_iteration
+    # Will be called around each iteration.
+    # Can be useful for some metrics collection, performance tracking etc.
+    yield
+  end
   def on_resume
     # Called when the job resumes iterating.
   end
@@ -184,10 +194,10 @@ class CsvJob
   def build_enumerator(import_id, cursor:)
     import = Import.find(import_id)
-    csv_enumereator(import.csv, cursor: cursor)
+    csv_enumerator(import.csv, cursor: cursor)
   end
-  def each_iteration(csv_row)
+  def each_iteration(csv_row, import_id)
     # insert csv_row to database
   end
 end
@@ -220,6 +230,7 @@ end
 ## Guides
 * [Iteration: how it works](guides/iteration-how-it-works.md)
+* [Job argument semantics](guides/argument-semantics.md)
 * [Best practices](guides/best-practices.md)
 * [Writing custom enumerator](guides/custom-enumerator.md)
 * [Throttling](guides/throttling.md)

data/guides/argument-semantics.md ADDED

@@ -0,0 +1,130 @@
+# Argument Semantics
+`sidekiq-iteration` defines the `perform` method, required by `sidekiq`, to allow for iteration.
+The call sequence is usually 3 methods:
+`perform -> build_enumerator -> each_iteration`
+In that sense `sidekiq-iteration` works like a framework (it calls your code) rather than like a library (that you call). When using jobs with parameters, the following rules of thumb are good to keep in mind.
+## Jobs without arguments
+Jobs without arguments do not pass anything into either `build_enumerator` or `each_iteration` except for the `cursor` which `sidekiq-iteration` persists by itself:
+```ruby
+class ArglessJob
+  include Sidekiq::Job
+  include SidekiqIteration::Iteration
+  def build_enumerator(cursor:)
+    # ...
+  end
+  def each_iteration(single_object_yielded_from_enumerator)
+    # ...
+  end
+end
+```
+To enqueue the job:
+```ruby
+ArglessJob.perform_async
+```
+## Jobs with positional arguments
+Jobs with positional arguments will have those arguments available to both `build_enumerator` and `each_iteration`:
+```ruby
+class ArgumentativeJob
+  include Sidekiq::Job
+  include SidekiqIteration::Iteration
+  def build_enumerator(arg1, arg2, arg3, cursor:)
+    # ...
+  end
+  def each_iteration(single_object_yielded_from_enumerator, arg1, arg2, arg3)
+    # ...
+  end
+end
+```
+To enqueue the job:
+```ruby
+ArgumentativeJob.perform_async(_arg1 = "One", _arg2 = "Two", _arg3 = "Three")
+```
+## Jobs with keyword arguments
+Jobs with keyword arguments will have the keyword arguments available to both `build_enumerator` and `each_iteration`, but these arguments come packaged into a Hash in both cases. You will need to `fetch` or `[]` your parameter from the `Hash` you get passed in:
+```ruby
+class ParameterizedJob
+  include Sidekiq::Job
+  include SidekiqIteration::Iteration
+  def build_enumerator(kwargs, cursor:)
+    name = kwargs.fetch("name")
+    email = kwargs.fetch("email")
+    # ...
+  end
+  def each_iteration(object_yielded_from_enumerator, kwargs)
+    name = kwargs.fetch("name")
+    email = kwargs.fetch("email")
+    # ...
+  end
+end
+```
+To enqueue the job:
+```ruby
+ParameterizedJob.perform_async("name" => "Jane", "email" => "jane@host.example")
+```
+## Jobs with both positional and keyword arguments
+Jobs with both positional and keyword arguments will have all of them available to both `build_enumerator` and `each_iteration`, but the keyword arguments come packaged into a Hash in both cases. You will need to `fetch` or `[]` your parameter from the `Hash` you get passed in. Positional arguments get passed first and "unsplatted" (not combined into an array); the `Hash` containing keyword arguments comes after:
+```ruby
+class HighlyConfigurableGreetingJob
+  include Sidekiq::Job
+  include SidekiqIteration::Iteration
+  def build_enumerator(subject_line, kwargs, cursor:)
+    name = kwargs.fetch("sender_name")
+    email = kwargs.fetch("sender_email")
+    # ...
+  end
+  def each_iteration(object_yielded_from_enumerator, subject_line, kwargs)
+    name = kwargs.fetch("sender_name")
+    email = kwargs.fetch("sender_email")
+    # ...
+  end
+end
+```
+To enqueue the job:
+```ruby
+HighlyConfigurableGreetingJob.perform_async(_subject_line = "Greetings everybody!", "sender_name" => "Jane", "sender_email" => "jane@host.example")
+```
+## Returning (yielding) from enumerators
+When defining a custom enumerator (see the [custom enumerator guide](custom-enumerator.md)) you need to yield two positional arguments from it: the object that will be the value for the current iteration (like a single ActiveModel instance, a single number...) and the value you want to be persisted as the `cursor` value should `sidekiq-iteration` decide to interrupt you after this iteration. Calling the enumerator with that cursor should return the next object after the one returned in this iteration. That new `cursor` value does not get passed to `each_iteration`:
+```ruby
+Enumerator.new do |yielder|
+  # In this case `cursor` is an Integer
+  cursor.upto(99999) do |offset|
+    yielder.yield(fetch_record_at(offset), offset)
+  end
+end
+```
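
The argument-forwarding rules in this guide can be tried out without Sidekiq. The following is a minimal sketch, not the gem's implementation: `MiniIteration`, `GreetingJob`, and the `log` reader are hypothetical names invented for illustration. It assumes only what the diff shows, namely that `build_enumerator` receives the job arguments plus `cursor:` and that `each_iteration` receives the yielded object followed by the same arguments.

```ruby
# Hypothetical stand-in for the gem's dispatch; the real module does much more.
module MiniIteration
  def perform(*arguments)
    enumerator = build_enumerator(*arguments, cursor: nil)
    enumerator.each do |object, _cursor|
      # The cursor is never forwarded; job arguments follow the yielded object.
      each_iteration(object, *arguments)
    end
  end
end

class GreetingJob
  include MiniIteration

  attr_reader :log

  def build_enumerator(subject_line, kwargs, cursor:)
    recipients = ["ann", "bob"]
    # Yields [object, cursor] pairs, skipping everything up to the cursor.
    recipients.each_with_index.drop(cursor.nil? ? 0 : cursor + 1).each
  end

  def each_iteration(recipient, subject_line, kwargs)
    (@log ||= []) << "#{subject_line} #{recipient} from #{kwargs.fetch("sender_name")}"
  end
end

job = GreetingJob.new
# The keyword-style arguments arrive as one trailing positional Hash.
job.perform("Hello", { "sender_name" => "Jane" })
# job.log => ["Hello ann from Jane", "Hello bob from Jane"]
```

The Hash uses string keys here because that is how the guide's examples read it back with `fetch`.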

data/guides/best-practices.md CHANGED

@@ -2,7 +2,7 @@
 ## Considerations when writing jobs
-* Duration of `#each_iteration`: processing a single element from the enumerator builded in `#build_enumerator` should take less than 25 seconds, or the duration set as a timeout for Sidekiq. It allows the job to be safely interrupted and resumed.
+* Duration of `#each_iteration`: processing a single element from the enumerator built in `#build_enumerator` should take less than 25 seconds, or the duration set as a timeout for Sidekiq. It allows the job to be safely interrupted and resumed.
 * Idempotency of `#each_iteration`: it should be safe to run `#each_iteration` multiple times for the same element from the enumerator. Read more in [this Sidekiq best practice](https://github.com/mperham/sidekiq/wiki/Best-Practices#2-make-your-job-idempotent-and-transactional). It's important if the job errors and you run it again, because the same element that errored the job may be processed again. It especially matters in the situation described above, when the iteration duration exceeds the timeout: if the job is re-enqueued, multiple elements may be processed again.
 ## Batch iteration

data/guides/custom-enumerator.md CHANGED

@@ -2,6 +2,17 @@
 Iteration leverages the [`Enumerator`](https://ruby-doc.org/core-3.1.2/Enumerator.html) pattern from the Ruby standard library, which allows us to use almost any resource as a collection to iterate.
+Before writing an enumerator, it is important to understand [how Iteration works](iteration-how-it-works.md) and how
+your enumerator will be used by it. An enumerator must `yield` two things in the following order as positional
+arguments:
+- An object to be processed in a job `each_iteration` method
+- A cursor position, which Iteration will persist if `each_iteration` returns successfully and the job is forced to shut
+  down. It can be any data type your job backend can serialize and deserialize correctly.
+A job that includes Iteration is first started with `nil` as the cursor. When resuming an interrupted job, Iteration
+will deserialize the persisted cursor and pass it to the job's `build_enumerator` method, which your enumerator uses to
+find objects that come _after_ the last successfully processed object.
 ## Cursorless Enumerator
 Consider a custom Enumerator that takes items from a Redis list. Because a Redis list is essentially a queue, we can ignore the cursor:
@@ -23,7 +34,7 @@ class ListJob
     end
   end
-  def each_iteration(item)
+  def each_iteration(item_from_redis)
     # ...
   end
 end
@@ -31,14 +42,15 @@ end
 ## Enumerator with cursor
-But what about iterating based on a cursor? Consider this Enumerator that wraps third party API (Stripe) for paginated iteration:
+For a more complex example, consider this Enumerator that wraps a third party API (Stripe) for paginated iteration and
+stores a string as the cursor position:
 ```ruby
 class StripeListEnumerator
   # @param resource [Stripe::APIResource] The type of Stripe object to request
   # @param params [Hash] Query parameters for the request
   # @param options [Hash] Request options, such as API key or version
-  # @param cursor [String]
+  # @param cursor [nil, String] The Stripe ID of the last item iterated over
   def initialize(resource, params: {}, options: {}, cursor:)
     pagination_params = {}
     pagination_params[:starting_after] = cursor unless cursor.nil?
@@ -59,6 +71,9 @@ class StripeListEnumerator
   def each
     loop do
       @list.each do |item, _index|
+        # The first argument is what gets passed to `each_iteration`.
+        # The second argument (item.id) is going to be persisted as the cursor,
+        # it doesn't get passed to `each_iteration`.
         yield item, item.id
       end
@@ -71,26 +86,38 @@ class StripeListEnumerator
 end
 ```
+Here we leverage the Stripe cursor pagination where the cursor is an ID of a specific item in the collection. The job
+which uses such an `Enumerator` would then look like so:
 ```ruby
-class StripeJob
+class LoadRefundsForChargeJob
   include Sidekiq::Job
   include SidekiqIteration::Iteration
-  def build_enumerator(params, cursor:)
+  def build_enumerator(charge_id, cursor:)
     StripeListEnumerator.new(
       Stripe::Refund,
-      params: { charge: "ch_123" },
+      params: { charge: charge_id }, # "charge_id" will be a prefixed Stripe ID such as "chrg_123"
       options: { api_key: "sk_test_123", stripe_version: "2018-01-18" },
       cursor: cursor
     ).to_enumerator
   end
-  def each_iteration(stripe_refund, _params)
+  # Note that in this case `each_iteration` will only receive one positional argument per iteration.
+  # If what your enumerator yields is a composite object you will need to unpack it yourself
+  # inside the `each_iteration`.
+  def each_iteration(stripe_refund, charge_id)
     # ...
   end
 end
 ```
+and you initiate the job with
+```ruby
+LoadRefundsForChargeJob.perform_async(_charge_id = "chrg_345")
+```
 ## Notes
 We recommend that you read the implementation of the other enumerators that come with the library (`CsvEnumerator`, `ActiveRecordEnumerator`) to gain a better understanding of building Enumerator objects.
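
The cursor contract described in this guide (yield the object first, then the cursor, and resume strictly after the persisted cursor) can be exercised with plain Ruby. This is a sketch under stated assumptions: `ITEMS`, `fetch_page`, and `refund_enumerator` are hypothetical, and the in-memory list stands in for a paginated API such as the Stripe one in the guide.

```ruby
# Hypothetical in-memory "API": items are identified by their ID string.
ITEMS = %w[re_1 re_2 re_3 re_4 re_5].freeze

# Returns up to `limit` items that come after the item with ID `starting_after`.
def fetch_page(starting_after:, limit: 2)
  start = starting_after.nil? ? 0 : ITEMS.index(starting_after) + 1
  ITEMS[start, limit] || []
end

def refund_enumerator(cursor:)
  Enumerator.new do |yielder|
    loop do
      page = fetch_page(starting_after: cursor)
      break if page.empty?

      page.each do |id|
        # First value is the object to process, second is the cursor to persist.
        yielder.yield(id, id)
      end
      cursor = page.last
    end
  end
end

# A fresh job starts with a nil cursor and sees every item.
fresh = refund_enumerator(cursor: nil).map { |item, _cursor| item }
# A resumed job starts after the last successfully processed item.
resumed = refund_enumerator(cursor: "re_3").map { |item, _cursor| item }
# resumed => ["re_4", "re_5"]
```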

data/guides/iteration-how-it-works.md CHANGED

@@ -36,6 +36,12 @@ SELECT "users".* FROM "users" ORDER BY "users"."id" LIMIT 100
 SELECT "users".* FROM "users" WHERE "users"."id" > 2 ORDER BY "users"."id" LIMIT 100
 ```
+## Exceptions inside `each_iteration`
+When an unrescued exception happens inside the `each_iteration` block, the job will stop and re-enqueue itself with the last successful cursor. This means that the iteration that failed will be retried with the same parameters and the cursor will only move if that iteration succeeds. This behaviour may be enough for intermittent errors, such as network connection failures, but if your execution is deterministic and you have an error, subsequent iterations will never run.
+In other words, if you are trying to process 100 records but the job consistently fails on the 61st, only the first 60 will be processed and the job will try to process the 61st record until retries are exhausted.
 ## Signals
 It's critical to know [UNIX signals](https://www.tutorialspoint.com/unix/unix-signals-traps.htm) in order to understand how interruption works. There are two main signals that Sidekiq uses: `SIGTERM` and `SIGKILL`. `SIGTERM` is the graceful termination signal, which means that the process should exit _soon_, not immediately. For Iteration, it means that we have time to wait for the last iteration to finish and to push the job back to the queue with the last cursor position.
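
The behaviour described under "Exceptions inside `each_iteration`" in the hunk above can be illustrated with a small simulation. It is only a sketch: `run_from` is a hypothetical helper, the cursor is assumed to be an array index, and real retries are scheduled by Sidekiq rather than called by hand.

```ruby
# Hypothetical runner: processes records after `cursor` and reports how far it got.
def run_from(cursor, records)
  start = cursor.nil? ? 0 : cursor + 1
  records.each_with_index.drop(start).each do |record, index|
    yield record   # stands in for each_iteration
    cursor = index # the cursor moves only after the iteration succeeded
  end
  [:completed, cursor]
rescue StandardError
  [:failed, cursor]
end

records = (1..100).to_a
attempts = Hash.new(0)
work = lambda do |record|
  attempts[record] += 1
  raise "deterministic failure" if record == 61
end

# First run: records 1..60 succeed, the 61st raises.
status, cursor = run_from(nil, records, &work)
# Retry: resumes right after the last successful cursor and hits the 61st again.
retry_status, retry_cursor = run_from(cursor, records, &work)
# status => :failed, cursor => 59 (index of the 60th record)
```

Because the failure is deterministic, record 62 and everything after it is never reached, which matches the "61st record" example in the guide.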

data/guides/throttling.md CHANGED

@@ -25,7 +25,7 @@ class DeleteAccountsThrottledJob
 end
 ```
-Note that it’s up to you to define a throttling condition that makes sense for your app.
+Note that it's up to you to define a throttling condition that makes sense for your app.
 For example, `DatabaseStatus.healthy?` can check various MySQL metrics such as replication lag, DB threads, whether DB writes are available, etc.
 Jobs can define multiple throttle conditions. Throttle conditions are inherited by descendants, and new conditions will be appended without impacting existing conditions.

data/lib/sidekiq_iteration/active_record_enumerator.rb CHANGED

@@ -5,41 +5,69 @@ module SidekiqIteration
   class ActiveRecordEnumerator
     SQL_DATETIME_WITH_NSEC = "%Y-%m-%d %H:%M:%S.%6N"
-    def initialize(relation, columns: nil, batch_size: 100, cursor: nil)
+    def initialize(relation, columns: nil, batch_size: 100, order: :asc, cursor: nil)
       unless relation.is_a?(ActiveRecord::Relation)
         raise ArgumentError, "relation must be an ActiveRecord::Relation"
       end
-      @primary_key = "#{relation.table_name}.#{relation.primary_key}"
-      @columns = Array(columns&.map(&:to_s) || @primary_key)
-      @primary_key_index = @columns.index(@primary_key) || @columns.index(relation.primary_key)
-      @pluck_columns = if @primary_key_index
-                         @columns
-                       else
-                         @columns + [@primary_key]
-                       end
-      @batch_size = batch_size
-      @cursor = Array.wrap(cursor)
-      raise ArgumentError, "Must specify at least one column" if @columns.empty?
-      if relation.joins_values.present? && !@columns.all?(/\./)
-        raise ArgumentError, "You need to specify fully-qualified columns if you join a table"
-      end
       if relation.arel.orders.present? || relation.arel.taken.present?
         raise ArgumentError,
           "The relation cannot use ORDER BY or LIMIT due to the way how iteration with a cursor is designed. " \
           "You can use other ways to limit the number of rows, e.g. a WHERE condition on the primary key column."
       end
-      @base_relation = relation.reorder(@columns.join(", "))
+      @relation = relation
+      @primary_key = relation.primary_key
+      columns = Array(columns || @primary_key).map(&:to_s)
+      if (Array(order) - [:asc, :desc]).any?
+        raise ArgumentError, ":order must be :asc or :desc or an array consisting of :asc or :desc, got #{order.inspect}"
+      end
+      if order.is_a?(Array) && order.size != columns.size
+        raise ArgumentError, ":order must include a direction for each batching column"
+      end
+      @primary_key_index = primary_key_index(columns, relation)
+      if @primary_key_index.nil? || (composite_primary_key? && @primary_key_index.any?(nil))
+        raise ArgumentError, ":columns must include a primary key columns"
+      end
+      @batch_size = batch_size
+      @order = batch_order(columns, order)
+      @cursor = Array(cursor)
+      if @cursor.present? && @cursor.size != columns.size
+        raise ArgumentError, ":cursor must include values for all the columns from :columns"
+      end
+      if columns.any?(/\W/)
+        arel_columns = columns.map.with_index do |column, i|
+          arel_column(column).as("cursor_column_#{i + 1}")
+        end
+        @cursor_columns = arel_columns.map { |column| column.right.to_s }
+        relation =
+          if relation.select_values.empty?
+            relation.select(@relation.arel_table[Arel.star], arel_columns)
+          else
+            relation.select(arel_columns)
+          end
+      else
+        @cursor_columns = columns
+      end
+      @columns = columns
+      ordering = @columns.zip(@order).to_h
+      @base_relation = relation.reorder(ordering)
       @iteration_count = 0
     end
     def records
       Enumerator.new(-> { records_size }) do |yielder|
-        batches.each do |batch, _|
+        batches.each do |batch, _| # rubocop:disable Style/HashEachMethods
           batch.each do |record|
-            @iteration_count += 1
+            increment_iteration
             yielder.yield(record, cursor_value(record))
           end
         end
@@ -49,7 +77,7 @@ module SidekiqIteration
     def batches
       Enumerator.new(-> { records_size }) do |yielder|
         while (batch = next_batch(load: true))
-          @iteration_count += 1
+          increment_iteration
           yielder.yield(batch, cursor_value(batch.last))
         end
       end
@@ -58,13 +86,44 @@ module SidekiqIteration
     def relations
       Enumerator.new(-> { relations_size }) do |yielder|
         while (batch = next_batch(load: false))
-          @iteration_count += 1
+          increment_iteration
           yielder.yield(batch, unwrap_array(@cursor))
         end
       end
     end
     private
+      def primary_key_index(columns, relation)
+        indexes = Array(@primary_key).map do |pk_column|
+          columns.index do |column|
+            column == pk_column ||
+              (column.include?(relation.table_name) && column.include?(pk_column))
+          end
+        end
+        if composite_primary_key?
+          indexes
+        else
+          indexes.first
+        end
+      end
+      def batch_order(columns, order)
+        if order.is_a?(Array)
+          order
+        else
+          [order] * columns.size
+        end
+      end
+      def arel_column(column)
+        if column.include?(".")
+          Arel.sql(column)
+        else
+          @relation.arel_table[column]
+        end
+      end
       def records_size
         @base_relation.count(:all)
       end
@@ -75,8 +134,8 @@ module SidekiqIteration
       def next_batch(load:)
         batch_relation = @base_relation.limit(@batch_size)
-        if conditions.any?
-          batch_relation = batch_relation.where(*conditions)
+        if @cursor.present?
+          batch_relation = apply_cursor(batch_relation)
         end
         records = nil
@@ -92,9 +151,7 @@ module SidekiqIteration
         cursor = cursor_values.last
         return unless cursor.present?
-        # The primary key was plucked, but original cursor did not include it, so we should remove it
-        cursor.pop unless @primary_key_index
-        @cursor = Array.wrap(cursor)
+        @cursor = Array(cursor)
         # Yields relations by selecting the primary keys of records in the batch.
         # Post.where(published: nil) results in an enumerator of relations like:
@@ -105,79 +162,89 @@ module SidekiqIteration
       end
       def pluck_columns(batch)
-        columns =
-          if batch.is_a?(Array)
-            @pluck_columns.map { |column| column.to_s.split(".").last }
-          else
-            @pluck_columns
-          end
-        if columns.size == 1 # only the primary key
-          column_values = batch.pluck(columns.first)
+        if @cursor_columns.size == 1 # only the primary key
+          column_values = batch.pluck(@cursor_columns.first)
           return [column_values, column_values]
         end
-        column_values = batch.pluck(*columns)
-        primary_key_index = @primary_key_index || -1
-        primary_key_values = column_values.map { |values| values[primary_key_index] }
+        column_values = batch.pluck(*@cursor_columns)
+        primary_key_values =
+          if composite_primary_key?
+            column_values.map { |values| values.values_at(*@primary_key_index) }
+          else
+            column_values.map { |values| values[@primary_key_index] }
+          end
-        serialize_column_values!(column_values)
+        column_values = serialize_column_values(column_values)
         [column_values, primary_key_values]
       end
       def cursor_value(record)
-        positions = @columns.map do |column|
-          attribute_name = column.to_s.split(".").last
-          column_value(record[attribute_name])
+        positions = @cursor_columns.map do |column|
+          column_value(record[column])
         end
         unwrap_array(positions)
       end
-      def conditions
-        return [] if @cursor.empty?
+      # (x, y) >= (a, b) iff (x > a or (x = a and y >= b))
+      # (x, y) <= (a, b) iff (x < a or (x = a and y <= b))
+      def apply_cursor(relation)
+        arel_columns = @columns.map { |column| arel_column(column) }
+        cursor_positions = arel_columns.zip(@cursor, cursor_operators)
-        binds = []
-        sql = build_starts_after_conditions(0, binds)
-        # Start from the record pointed by cursor.
-        # We use the property that `>=` is equivalent to `> or =`.
-        if @iteration_count == 0
-          binds.unshift(*@cursor)
-          columns_equality = @columns.map { |column| "#{column} = ?" }.join(" AND ")
-          sql = "(#{columns_equality}) OR (#{sql})"
+        where_clause = nil
+        cursor_positions.reverse_each.with_index do |(arel_column, value, operator), index|
+          where_clause =
+            if index == 0
+              arel_column.public_send(operator, value)
+            else
+              arel_column.public_send(operator, value).or(
+                arel_column.eq(value).and(where_clause),
+              )
+            end
         end
-        [sql, *binds]
+        relation.where(where_clause)
       end
-      # (x, y) > (a, b) iff (x > a or (x = a and y > b))
-      def build_starts_after_conditions(index, binds)
-        column = @columns[index]
+      def serialize_column_values(column_values)
+        column_values.map { |values| values.map { |value| column_value(value) } }
+      end
-        if index < @cursor.size - 1
-          binds << @cursor[index] << @cursor[index]
-          "#{column} > ? OR (#{column} = ? AND (#{build_starts_after_conditions(index + 1, binds)}))"
+      def column_value(value)
+        if value.is_a?(Time)
+          value.strftime(SQL_DATETIME_WITH_NSEC)
         else
-          binds << @cursor[index]
-          if @columns.size == @cursor.size
-            "#{column} > ?"
+          value
+        end
+      end
+      def cursor_operators
+        # Start from the record pointed by cursor when just starting.
+        @columns.zip(@order).map do |column, order|
+          if column == @columns.last
+            if order == :asc
+              first_iteration? ? :gteq : :gt
+            else
+              first_iteration? ? :lteq : :lt
+            end
           else
-            "#{column} >= ?"
+            order == :asc ? :gt : :lt
           end
         end
       end
-      def serialize_column_values!(column_values)
-        column_values.map! { |values| values.map! { |value| column_value(value) } }
+      def increment_iteration
+        @iteration_count += 1
       end
-      def column_value(value)
-        if value.is_a?(Time)
-          value.strftime(SQL_DATETIME_WITH_NSEC)
-        else
-          value
-        end
+      def first_iteration?
+        @iteration_count == 0
+      end
+      def composite_primary_key?
+        @primary_key.is_a?(Array)
       end
       def unwrap_array(array)
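
The comment in `apply_cursor` states the lexicographic rule `(x, y) >= (a, b) iff (x > a or (x = a and y >= b))`, and `cursor_operators` flips the comparison per column for `:desc`. The sketch below applies the same idea to plain arrays instead of Arel nodes. `after_cursor?` is a hypothetical helper, and it implements only the strict form (the one the diff uses after the first iteration).

```ruby
# Returns true when `row` comes strictly after `cursor` in the given ordering.
# `row` and `cursor` are arrays of column values; `directions` holds :asc or :desc
# for each column. The predicate is built from the last column backwards,
# as the new `apply_cursor` does with `reverse_each`.
def after_cursor?(row, cursor, directions)
  row.zip(cursor, directions).reverse_each.reduce(false) do |rest, (value, position, direction)|
    beyond = direction == :asc ? value > position : value < position
    beyond || (value == position && rest)
  end
end

# Rows as [shop_id, id], sorted by shop_id ascending and id descending.
rows = [[1, 30], [1, 20], [1, 10], [2, 50], [2, 40]]
cursor = [1, 20]
remaining = rows.select { |row| after_cursor?(row, cursor, [:asc, :desc]) }
# remaining => [[1, 10], [2, 50], [2, 40]]
```

With a single direction for every column this reduces to the old `build_starts_after_conditions` behaviour; the mixed case is what the `order: [:asc, :desc]` option in the changelog enables.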

data/lib/sidekiq_iteration/csv_enumerator.rb CHANGED

@@ -36,7 +36,7 @@ module SidekiqIteration
     #   SidekiqIteration::CsvEnumerator.new(csv).rows(cursor: cursor)
     #
     def initialize(csv)
-      unless csv.instance_of?(CSV)
+      unless defined?(CSV) && csv.instance_of?(CSV)
         raise ArgumentError, "CsvEnumerator.new takes CSV object"
       end

data/lib/sidekiq_iteration/enumerators.rb CHANGED

@@ -17,10 +17,6 @@ module SidekiqIteration
     def array_enumerator(array, cursor:)
       raise ArgumentError, "array must be an Array" unless array.is_a?(Array)
-      if defined?(ActiveRecord) && array.any?(ActiveRecord::Base)
-        raise ArgumentError, "array cannot contain ActiveRecord objects"
-      end
       array.each_with_index.drop(cursor || 0).to_enum { array.size }
     end
@@ -28,9 +24,10 @@ module SidekiqIteration
     #
     # @param scope [ActiveRecord::Relation] scope to iterate
     # @param cursor [Object] offset to start iteration from, usually an id
-    # @option options :columns [Array<String, Symbol>] used to build the actual query for iteration,
+    # @option options :columns [Array<String, Symbol>, String, Symbol] used to build the actual query for iteration,
     #   defaults to primary key
     # @option options :batch_size [Integer] (100) size of the batch
+    # @option options :order [:asc, :desc, Array<:asc, :desc>] (:asc) specifies iteration order
     #
     # +columns:+ argument is used to build the actual query for iteration. +columns+: defaults to primary key:
     #
@@ -58,7 +55,7 @@ module SidekiqIteration
     # As a result of this query pattern, if the values in these columns change for the records in scope during
     # iteration, they may be skipped or yielded multiple times depending on the nature of the update and the
     # cursor's value. If the value gets updated to a greater value than the cursor's value, it will get yielded
-    # again. Similarly, if the value gets updated to a lesser value than the curor's value, it will get skipped.
+    # again. Similarly, if the value gets updated to a lesser value than the cursor's value, it will get skipped.
     #
     # @example
     #   def build_enumerator(cursor:)

data/lib/sidekiq_iteration/iteration.rb CHANGED

@@ -13,15 +13,14 @@ module SidekiqIteration
       base.extend(Throttling)
       base.class_eval do
-        throttle_on(backoff: 0) do |job|
+        throttle_on(backoff: SidekiqIteration.default_retry_backoff) do |job|
           job.class.max_job_runtime &&
             job.start_time &&
             (Time.now.utc - job.start_time) > job.class.max_job_runtime
         end
-        throttle_on(backoff: 0) do
-          defined?(Sidekiq::CLI) &&
-            Sidekiq::CLI.instance.launcher.stopping?
+        throttle_on(backoff: SidekiqIteration.default_retry_backoff) do
+          SidekiqIteration.stopping
         end
       end
@@ -56,16 +55,22 @@ module SidekiqIteration
     attr_reader :executions,
       :cursor_position,
-      :start_time,
       :times_interrupted,
-      :total_time,
       :current_run_iterations
+    # The time when the job starts running. If the job is interrupted and runs again,
+    # the value is updated.
+    attr_reader :start_time
+    # The total time the job has been running, including multiple iterations.
+    # The time isn't reset if the job is interrupted.
+    attr_reader :total_time
     # @private
     def initialize
       super
       @arguments = nil
-      @job_iteration_retry_backoff = nil
+      @job_iteration_retry_backoff = SidekiqIteration.default_retry_backoff
       @needs_reenqueue = false
       @current_run_iterations = 0
     end
@@ -82,6 +87,12 @@ module SidekiqIteration
     def on_start
     end
+    # A hook to override that will be called around each iteration.
+    # Can be useful for some metrics collection, performance tracking etc.
+    def around_iteration
+      yield
+    end
     # A hook to override that will be called when the job resumes iterating.
     def on_resume
     end
@@ -172,7 +183,9 @@ module SidekiqIteration
         enumerator.each do |object_from_enumerator, index|
           found_record = true
-          each_iteration(object_from_enumerator, *arguments)
+          around_iteration do
+            each_iteration(object_from_enumerator, *arguments)
+          end
           @cursor_position = index
           @current_run_iterations += 1
@@ -191,14 +204,14 @@ module SidekiqIteration
           )
         end
-        adjust_total_time
         true
+      ensure
+        adjust_total_time
       end
       def reenqueue_iteration_job
         SidekiqIteration.logger.info("[SidekiqIteration::Iteration] Interrupting and re-enqueueing the job cursor_position=#{cursor_position}")
-        adjust_total_time
         @times_interrupted += 1
         arguments = @arguments
@@ -252,13 +265,6 @@ module SidekiqIteration
           true
         when false, :skip_complete_callback
           false
-        when Array # can be used to return early from the enumerator
-          reason, backoff = completed
-          raise "Unknown reason: #{reason}" unless reason == :retry
-          @job_iteration_retry_backoff = backoff
-          @needs_reenqueue = true
-          false
         else
           raise "Unexpected thrown value: #{completed.inspect}"
         end
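
The new `around_iteration` hook wraps every `each_iteration` call, which makes it a natural place for timing. Below is a minimal sketch of that usage. `TimedJob`, its `run` loop, and the `durations` and `processed` readers are hypothetical stand-ins; in a real job the gem drives the loop and only `around_iteration` and `each_iteration` would be defined.

```ruby
class TimedJob
  attr_reader :durations, :processed

  def initialize
    @durations = []
    @processed = []
  end

  # Same shape as the hook in the diff: do work around the iteration and yield to it.
  def around_iteration
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    # Recorded even when the iteration raises.
    @durations << Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end

  def each_iteration(record)
    @processed << record
  end

  # Hypothetical stand-in for the gem's iteration loop.
  def run(records)
    records.each do |record|
      around_iteration { each_iteration(record) }
    end
  end
end

job = TimedJob.new
job.run([1, 2, 3])
# job.processed => [1, 2, 3], with one duration recorded per iteration
```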

data/lib/sidekiq_iteration/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module SidekiqIteration
-  VERSION = "0.2.0"
+  VERSION = "0.4.0"
 end

data/lib/sidekiq_iteration.rb CHANGED

@@ -1,6 +1,8 @@
 # frozen_string_literal: true
 require "sidekiq"
+require_relative "sidekiq_iteration/iteration"
+require_relative "sidekiq_iteration/job_retry_patch"
 require_relative "sidekiq_iteration/version"
 module SidekiqIteration
@@ -22,6 +24,17 @@ module SidekiqIteration
     #
     attr_accessor :max_job_runtime
+    # Configures a delay duration to wait before resuming an interrupted job.
+    #
+    # @example
+    #   SidekiqIteration.default_retry_backoff = 10.seconds
+    #
+    # Defaults to nil which means interrupted jobs will be retried immediately.
+    # This value will be ignored when an interruption is raised by a throttle enumerator,
+    # where the throttle backoff value will take precedence over this setting.
+    #
+    attr_accessor :default_retry_backoff
     # Set a custom logger for sidekiq-iteration.
     # Defaults to `Sidekiq.logger`.
     #
@@ -33,8 +46,14 @@ module SidekiqIteration
     def logger
       @logger ||= Sidekiq.logger
     end
+    # @private
+    attr_accessor :stopping
   end
 end
-require_relative "sidekiq_iteration/iteration"
-require_relative "sidekiq_iteration/job_retry_patch"
+Sidekiq.configure_server do |config|
+  config.on(:quiet) do
+    SidekiqIteration.stopping = true
+  end
+end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: sidekiq-iteration
 version: !ruby/object:Gem::Version
-  version: 0.2.0
+  version: 0.4.0
 platform: ruby
 authors:
 - fatkodima
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2022-11-11 00:00:00.000000000 Z
+date: 2024-05-10 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: sidekiq
@@ -35,6 +35,7 @@ files:
 - CHANGELOG.md
 - LICENSE.txt
 - README.md
+- guides/argument-semantics.md
 - guides/best-practices.md
 - guides/custom-enumerator.md
 - guides/iteration-how-it-works.md
@@ -71,8 +72,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.1.6
+rubygems_version: 3.4.19
 signing_key:
 specification_version: 4
-summary: Makes your sidekiq jobs interruptible and resumable.
+summary: Makes your long-running sidekiq jobs interruptible and resumable.
 test_files: []