RubyGems - jobba - Versions diffs - 1.5.0 → 1.6.0 - Mend

jobba 1.5.0 → 1.6.0


Files changed (11)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: cb1d6250e365236cfdbe87245b5ccd520809fe5a
-  data.tar.gz: 24e9272312ad529d7ec0e33c4d21e8f06c4836b5
+  metadata.gz: 9d8bc9c17314ba74b80cf4c79c5d638f9d35b83b
+  data.tar.gz: e5f612733209502da66c0200cc924fb941d8f773
 SHA512:
-  metadata.gz: 26f7a2a14a1de1761acec0c584bc3d3946ae23f955ba583e2a72dcd30ee3bd647cf2daa41f32b312729d8e451ed8ffbf5a3d7deaae191746f4b9f54bc106e0ab
-  data.tar.gz: ff3d42902f0094392848149d37379739c87b72a4cacd77f1e3e84343855defbd574b9dc8a1b9c3c2deefe08c0b150f776dc832228f8981edd6614a8aab52d366
+  metadata.gz: e185dfac6ec0e3d5d1daad73b62abc307dd8098251c1889f8427dbf1137970a9bb6e24ccdfda10eeacbd7ddd29648ee28cc7a7a8b1e4ced5b1c664208a307527
+  data.tar.gz: 259cd22746fb655039ca5e4b11c7095567c725ba03bb238445ef702ad24e07aa540eddab71613e21990508006050ddde68ec0173b6afc4cd32ecfdd3491ba541

data/README.md CHANGED

@@ -439,6 +439,17 @@ Jobba.where(...).run.count   # These pull data back to Ruby and count in Ruby
 Jobba.where(...).run.empty?
 ```
+## Pagination
+Pagination is supported with an ActiveRecord-like interface.  You can call `.limit(x)` and `.offset(y)` on
+queries, e.g.
+```ruby
+Jobba.where(state: :succeeded).limit(10).offset(20).to_a
+```
+Specifying a limit does not guarantee that you'll get that many elements back, as there may not be that many left in the result.
 ## Notes
 ### Times
@@ -449,6 +460,8 @@ Note that, in operations having to do with time, this gem ignores anything beyon
 Jobba strives to do all of its operations as efficiently as possible using built-in Redis operations.  If you find a place where the efficiency can be improved, please submit an issue or a pull request.
+Single-clause queries (those with one `where` call) have been optimized.  `Jobba.all` is a single-clause query.  If you have lots of IDs, try to get by with single-clause queries.  Multi-clause queries (including `count`) have to copy sets into temporary working sets where query clauses are ANDed together.  This can be expensive for large datasets.
 ### Write from one; Read from many
 Jobba assumes that any job is being run at one time by only one worker.  Jobba makes no accommodations for multiple processes updating a Status at the same time; multiple processes reading of a Status are fine of course.
@@ -463,6 +476,12 @@ $> USE_REAL_REDIS=true rspec
 Travis runs the specs with both `fakeredis` and real Redis.
+Clauses need to implement three methods:
+1. `to_new_set` - puts the IDs indicated by the clause into a new sorted set in redis
+2. `result_ids` - used to get the IDs indicated by the clause when the clause is the only one in the query
+3. `result_count` - used to get the count of IDs indicated by the clause when the clause is the only one in the query
 ## TODO
 1. Provide job min, max, and average durations.

data/Rakefile CHANGED

@@ -3,4 +3,4 @@ require "rspec/core/rake_task"
 RSpec::Core::RakeTask.new(:spec)
-task :default => :spec
+task default: :spec

data/lib/jobba.rb CHANGED

@@ -38,4 +38,27 @@ module Jobba
     )
   end
+  # Clears the whole shebang!  USE WITH CARE!
+  def self.clear_all_jobba_data!
+    keys = Jobba.redis.keys("*")
+    keys.each_slice(1000) do |some_keys|
+      Jobba.redis.del(*some_keys)
+    end
+  end
+  def self.cleanup(seconds_ago: 60*60*24*30*12, batch_size: 1000)
+    start_time = Jobba::Time.now
+    delete_before = start_time - seconds_ago
+    jobs_count = 0
+    loop do
+      jobs = where(recorded_at: { before: delete_before }).limit(batch_size).to_a
+      jobs.each(&:delete!)
+      num_jobs = jobs.size
+      jobs_count += num_jobs
+      break if jobs.size < batch_size
+    end
+  end
 end
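
The new `cleanup` method deletes old jobs in batches and stops once a batch comes back smaller than `batch_size`. Its default `seconds_ago` of `60*60*24*30*12` works out to 360 days. The loop can be sketched against an in-memory array instead of Redis. The name `cleanup_in_batches` and the hash records are hypothetical stand-ins, and unlike the gem's method this sketch returns the number of deleted records.

```ruby
# In-memory sketch of the batched-delete loop; `records` stands in for jobs in Redis.
def cleanup_in_batches(records, delete_before:, batch_size: 1000)
  deleted = 0
  loop do
    # Equivalent in spirit to where(recorded_at: { before: ... }).limit(batch_size).to_a
    batch = records.select { |r| r[:recorded_at] < delete_before }.first(batch_size)
    batch.each { |r| records.delete(r) }
    deleted += batch.size
    # A short batch means nothing older than the cutoff is left
    break if batch.size < batch_size
  end
  deleted
end

records = (1..5).map { |t| { id: t, recorded_at: t } }
deleted = cleanup_in_batches(records, delete_before: 4, batch_size: 2)
remaining_ids = records.map { |r| r[:id] }
```

With a batch size of 2 the loop runs twice here: a full batch of two, then a short batch of one that ends the loop.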

data/lib/jobba/clause.rb CHANGED

@@ -4,7 +4,9 @@ class Jobba::Clause
   include Jobba::Common
   # if `keys` or `suffixes` is an array, all entries will be included in the resulting set
-  def initialize(prefix: nil, suffixes: nil, keys: nil, min: nil, max: nil)
+  def initialize(prefix: nil, suffixes: nil, keys: nil, min: nil, max: nil,
+                 keys_contain_only_unique_ids: false)
     if keys.nil? && prefix.nil? && suffixes.nil?
       raise ArgumentError, "Either `keys` or both `prefix` and `suffix` must be specified.", caller
     end
@@ -22,6 +24,8 @@ class Jobba::Clause
     @min = min
     @max = max
+    @keys_contain_only_unique_ids = keys_contain_only_unique_ids
   end
   def to_new_set
@@ -41,4 +45,84 @@ class Jobba::Clause
     new_key
   end
+  def result_ids(offset: nil, limit: nil)
+    # If we have one key and it is sorted, we can let redis return limited IDs,
+    # so handle that case specially.
+    id_data =
+      if @keys.one?
+        # offset and limit may or may not be used, so have to do again below
+        get_members(key: @keys.first, offset: offset, limit: limit)
+      else
+        ids = @keys.flat_map do |key|
+          # don't do limiting here -- doesn't make sense til we collect all the members
+          get_members(key: key)[:ids]
+        end
+        ids.sort!
+        ids.uniq! unless @keys_contain_only_unique_ids
+        {ids: ids, is_limited: false}
+      end
+    if !offset.nil? && !limit.nil? && id_data[:is_limited] == false
+      id_data[:ids].slice(offset, limit)
+    else
+      id_data[:ids]
+    end
+  end
+  def get_members(key:, offset: nil, limit: nil)
+    if sorted_key?(key)
+      min = @min.nil? ? "-inf" : "#{@min}"
+      max = @max.nil? ? "+inf" : "#{@max}"
+      options = {}
+      is_limited = false
+      if !offset.nil? && !limit.nil?
+        options[:limit] = [offset, limit]
+        is_limited = true
+      end
+      ids = redis.zrangebyscore(key, min, max, options)
+      {ids: ids, is_limited: is_limited}
+    else
+      ids = redis.smembers(key)
+      ids.sort!
+      {ids: ids, is_limited: false}
+    end
+  end
+  def result_count(offset: nil, limit: nil)
+    if @keys.one? || @keys_contain_only_unique_ids
+      # can count each key on its own using fast redis ops and add them up
+      nonlimited_count = @keys.map do |key|
+        if sorted_key?(key)
+          if @min.nil? && @max.nil?
+            redis.zcard(key)
+          else
+            min = @min.nil? ? "-inf" : "#{@min}"
+            max = @max.nil? ? "+inf" : "#{@max}"
+            redis.zcount(key, min, max)
+          end
+        else
+          redis.scard(key)
+        end
+      end.reduce(:+)
+      Jobba::Utils.limited_count(nonlimited_count: nonlimited_count,
+                                 offset: offset, limit: limit)
+    else
+      # Because we need to get a count of uniq members, have to do a full query
+      result_ids(offset: offset, limit: limit).count
+    end
+  end
+  def sorted_key?(key)
+    key.match(/_at$/)
+  end
 end
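
The multi-key branch of `result_ids` collects members from every key, sorts them, optionally de-duplicates, and only then applies `offset` and `limit`. A minimal in-memory sketch of that ordering follows. The method name `merged_ids` and its `unique:` parameter are hypothetical, and plain arrays stand in for Redis sets. Because `Array#slice` returns `nil` when the start index is past the end of the array, the sketch falls back to an empty array, which is an addition not present in the diff.

```ruby
# In-memory stand-in for the multi-key branch of Clause#result_ids.
def merged_ids(id_lists, unique: false, offset: nil, limit: nil)
  ids = id_lists.flatten.sort
  # Skip the uniq pass when the caller promises the lists never overlap
  ids = ids.uniq unless unique
  if !offset.nil? && !limit.nil?
    # Array#slice returns nil for a start index beyond the array, so default to []
    ids.slice(offset, limit) || []
  else
    ids
  end
end

page = merged_ids([%w[b a], %w[c a]], offset: 1, limit: 2)
all_ids = merged_ids([%w[d b], %w[c a]], unique: true)
past_end = merged_ids([%w[a b]], offset: 10, limit: 2)
```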

data/lib/jobba/clause_factory.rb CHANGED

@@ -65,7 +65,10 @@ class Jobba::ClauseFactory
     }.uniq
     validate_state_name!(state)
-    Jobba::Clause.new(keys: state)
+    # An ID is in only one state at a time, so we can tell `Clause` that
+    # info via `keys_contain_only_unique_ids` -- helps it be more efficient
+    Jobba::Clause.new(keys: state, keys_contain_only_unique_ids: true)
   end
   def self.validate_state_name!(state_name)

data/lib/jobba/id_clause.rb CHANGED

@@ -11,4 +11,12 @@ class Jobba::IdClause
     redis.zadd(new_key, @ids.collect{|id| [0, id]}) if @ids.any?
     new_key
   end
+  def result_ids(offset: nil, limit: nil)
+    @ids.map(&:to_s).slice(offset || 0, limit || @ids.count)
+  end
+  def result_count(offset: nil, limit: nil)
+    Jobba::Utils.limited_count(nonlimited_count: @ids.count, offset: offset, limit: limit)
+  end
 end

data/lib/jobba/query.rb CHANGED

@@ -6,6 +6,8 @@ class Jobba::Query
   include Jobba::Common
+  attr_reader :_limit, :_offset
   def where(options)
     options.each do |kk,vv|
       clauses.push(Jobba::ClauseFactory.new_clause(kk,vv))
@@ -14,8 +16,19 @@ class Jobba::Query
     self
   end
+  def limit(number)
+    @_limit = number
+    @_offset ||= 0
+    self
+  end
+  def offset(number)
+    @_offset = number
+    self
+  end
   def count
-    _run(COUNT_STATUSES)
+    _run(CountStatuses.new(self))
   end
   def empty?
@@ -39,7 +52,7 @@ class Jobba::Query
   end
   def run
-    _run(GET_STATUSES)
+    _run(GetStatuses.new(self))
   end
   protected
@@ -50,73 +63,111 @@ class Jobba::Query
     @clauses = []
   end
-  class RunBlocks
-    attr_reader :redis_block, :output_block
+  class Operations
+    attr_reader :query, :redis
+    def initialize(query)
+      @query = query
+      @redis = query.redis
+    end
-    def initialize(redis_block, output_block)
-      @redis_block = redis_block
-      @output_block = output_block
+    # Standalone method that gives the final result when the query is one clause
+    def handle_single_clause(clause)
+      raise "AbstractMethod"
     end
-    def output_block_result_is_redis_block_result?
-      output_block.nil?
+    # When the query is multiple clauses, this method is called on the final set
+    # that represents the ANDing of all clauses.  It is called inside a `redis.multi`
+    # block.
+    def multi_clause_last_redis_op(result_set)
+      raise "AbstractMethod"
+    end
+    # Called on the output from the redis multi block for multi-clause queries.
+    def multi_clause_postprocess(redis_output)
+      raise "AbstractMethod"
     end
   end
-  GET_STATUSES = RunBlocks.new(
-    ->(working_set, redis) {
-      redis.zrange(working_set, 0, -1)
-    },
-    ->(ids) {
+  class GetStatuses < Operations
+    def handle_single_clause(clause)
+      ids = clause.result_ids(limit: query._limit, offset: query._offset)
       Jobba::Statuses.new(ids)
-    }
-  )
-  COUNT_STATUSES = RunBlocks.new(
-    ->(working_set, redis) {
-      redis.zcard(working_set)
-    },
-    nil
-  )
-  def _run(run_blocks)
-    # Each clause in a query is converted to a sorted set (which may be filtered,
-    # e.g. in the case of timestamp clauses) and then the sets are successively
-    # intersected.
-    #
-    # Different users of this method have different uses for the final "working"
-    # set.  Because we want to bundle all of the creations and intersections of
-    # clause sets into one call to Redis (via a `multi` block), we have users
-    # of this method provide a final block to run on the working set within
-    # Redis (and within the `multi` call) and then another block to run on
-    # the output of the first block.
-    multi_result = redis.multi do
-      load_default_clause if clauses.empty?
-      working_set = nil
-      clauses.each do |clause|
-        clause_set = clause.to_new_set
-        if working_set.nil?
-          working_set = clause_set
-        else
-          redis.zinterstore(working_set, [working_set, clause_set], weights: [0, 0])
-          redis.del(clause_set)
-        end
-      end
+    end
-      # This is later accessed as `multi_result[-2]` since it is the second to last output
-      run_blocks.redis_block.call(working_set, redis)
+    def multi_clause_last_redis_op(result_set)
+      start = query._offset || 0
+      stop = query._limit.nil? ? -1 : start + query._limit - 1
+      redis.zrange(result_set, start, stop)
+    end
-      redis.del(working_set)
+    def multi_clause_postprocess(ids)
+      Jobba::Statuses.new(ids)
     end
+  end
-    redis_block_output = multi_result[-2]
+  class CountStatuses < Operations
+    def handle_single_clause(clause)
+      clause.result_count(limit: query._limit, offset: query._offset)
+    end
+    def multi_clause_last_redis_op(result_set)
+      redis.zcard(result_set)
+    end
+    def multi_clause_postprocess(redis_output)
+      Jobba::Utils.limited_count(nonlimited_count: redis_output, offset: query._offset, limit: query._limit)
+    end
+  end
+  def _run(operations)
+    if _limit.nil? && !_offset.nil?
+      raise ArgumentError, "`limit` must be set if `offset` is set", caller
+    end
-    run_blocks.output_block_result_is_redis_block_result? ?
-      redis_block_output :
-      run_blocks.output_block.call(redis_block_output)
+    load_default_clause if clauses.empty?
+    if clauses.one?
+      # We can make specialized calls that don't need intermediate copies of sets
+      # to be made (which are costly)
+      operations.handle_single_clause(clauses.first)
+    else
+      # Each clause in a query is converted to a sorted set (which may be filtered,
+      # e.g. in the case of timestamp clauses) and then the sets are successively
+      # intersected.
+      #
+      # Different users of this method have different uses for the final "working"
+      # set.  Because we want to bundle all of the creations and intersections of
+      # clause sets into one call to Redis (via a `multi` block), we have users
+      # of this method provide a final block to run on the working set within
+      # Redis (and within the `multi` call) and then another block to run on
+      # the output of the first block.
+      #
+      # This code also works for the single clause case, but it is less efficient
+      multi_result = redis.multi do
+        working_set = nil
+        clauses.each do |clause|
+          clause_set = clause.to_new_set
+          if working_set.nil?
+            working_set = clause_set
+          else
+            redis.zinterstore(working_set, [working_set, clause_set], weights: [0, 0])
+            redis.del(clause_set)
+          end
+        end
+        # This is later accessed as `multi_result[-2]` since it is the second to last output
+        operations.multi_clause_last_redis_op(working_set)
+        redis.del(working_set)
+      end
+      operations.multi_clause_postprocess(multi_result[-2])
+    end
   end
   def load_default_clause
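
For multi-clause queries, `GetStatuses#multi_clause_last_redis_op` turns `offset` and `limit` into the inclusive start and stop indexes that `ZRANGE` expects. The conversion can be sketched on its own, using a Ruby array range as a stand-in for the sorted set. The helper name `zrange_bounds` is hypothetical.

```ruby
# Mirrors the start/stop arithmetic from the diff; ZRANGE indexes are inclusive
# and -1 means "the last element".
def zrange_bounds(offset:, limit:)
  start = offset || 0
  stop = limit.nil? ? -1 : start + limit - 1
  [start, stop]
end

ids = %w[1 2 3 4 5 6 7]

start, stop = zrange_bounds(offset: 2, limit: 3)
page = ids[start..stop]

start_all, stop_all = zrange_bounds(offset: nil, limit: nil)
everything = ids[start_all..stop_all]

zero_limit = zrange_bounds(offset: 0, limit: 0)
```

One edge case falls out of the formula: with `offset` 0 and `limit` 0 it yields a stop of -1, which `ZRANGE` reads as "through the last element", so the range would cover the whole set rather than nothing.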

data/lib/jobba/utils.rb CHANGED

@@ -26,4 +26,45 @@ module Jobba::Utils
     "temp:#{SecureRandom.hex(10)}"
   end
+  def self.limited_count(nonlimited_count:, offset:, limit:)
+    raise(ArgumentError, "`limit` cannot be negative") if !limit.nil? && limit < 0
+    raise(ArgumentError, "`offset` cannot be negative") if !offset.nil? && offset < 0
+    # If we get a count of an array or set that doesn't take into account
+    # specified offsets and limits (what we call a `nonlimited_count`), but
+    # we need the count to effectively have been done with an offset and
+    # limit, this method calculates that limited count.
+    #
+    # This can happen when it is more efficient to calculate an unlimited
+    # count and then limit it after the fact.
+    #
+    # E.g.
+    #
+    # Get count of
+    #   array = [a b c d e f g]
+    # where
+    #   offset = 4
+    #   limit = 5
+    #
+    # nonlimited_count = 7
+    #
+    # The limited array includes the highlighted (^) elements
+    #   array = [a b c d e f g]
+    #                    ^ ^ ^ ^ ^
+    # Element `e` is the first element indicated by an offset of 4.  The
+    # limit of 5 then causes us to take the rest of the elements in the array.
+    # The limit here is effectively 3 since there are only 3 elements left.
+    #
+    # So the limited_count is 3.
+    first_position_counted = offset || 0
+    # The `min` here is to make sure we don't go beyond the end of the array.  The `- 1`
+    # is because we are getting a zero-indexed position from a count.
+    last_position_counted = [first_position_counted + (limit || nonlimited_count), nonlimited_count].min - 1
+    # Guard against first position being after last position by forcing min of 0
+    [last_position_counted - first_position_counted + 1, 0].max
+  end
 end
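
As a quick check of the arithmetic in `limited_count`, the following standalone sketch restates the same calculation outside the gem and runs the worked example from the comment. The top-level method here is a local copy for illustration, not `Jobba::Utils.limited_count` itself.

```ruby
# Standalone restatement of the calculation shown in the diff above.
def limited_count(nonlimited_count:, offset: nil, limit: nil)
  raise ArgumentError, "`limit` cannot be negative" if !limit.nil? && limit < 0
  raise ArgumentError, "`offset` cannot be negative" if !offset.nil? && offset < 0

  first = offset || 0
  # Clamp to the end of the collection, then convert the count to a zero-based index
  last = [first + (limit || nonlimited_count), nonlimited_count].min - 1
  # An offset past the end gives a negative span, so floor at 0
  [last - first + 1, 0].max
end

worked_example = limited_count(nonlimited_count: 7, offset: 4, limit: 5)  # 3
past_the_end   = limited_count(nonlimited_count: 7, offset: 10, limit: 5) # 0
no_paging      = limited_count(nonlimited_count: 7)                       # 7
```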

data/lib/jobba/version.rb CHANGED

@@ -1,3 +1,3 @@
 module Jobba
-  VERSION = "1.5.0"
+  VERSION = "1.6.0"
 end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: jobba
 version: !ruby/object:Gem::Version
-  version: 1.5.0
+  version: 1.6.0
 platform: ruby
 authors:
 - JP Slavinsky
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2017-04-17 00:00:00.000000000 Z
+date: 2017-09-22 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: redis