redis-scheduler 0.3 → 0.4 (RubyGems version diff)

Files changed (3)

data/README CHANGED

@@ -1,20 +1,28 @@
-This is a basic chronological scheduler for Redis. It allows you to schedule
-items to be processed at arbitrary points in time (via #schedule!), and then
-to easily retrieve only those items that are due to be processed (via #each).
+RedisScheduler is a chronological scheduler for Redis. It allows you to schedule
+items to be processed at arbitrary points in time (via RedisScheduler#schedule!)
+and easily retrieve only those items that are due to be processed (via
+RedisScheduler#each).
-Items are represented as strings.
-It does everything you'd want from a production scheduler:
+It does everything you'd expect from a production scheduler:
 * You can schedule items at arbitrary times.
-* It supports multiple simultaneous readers and writers.
-* An exception causes the in-process item to be rescheduled at the original time.
-* A crash leaves the item in a separate error queue, from which it can later be recovered.
 * You can iterate over ready items in either blocking or non-blocking mode.
-In non-blocking mode (the default), #each will iterate only over those work items
-whose scheduled time is less than or equal to the current time, and then stop.
-In blocking mode, #each will iterate over the same items, but will also block
-until items are available. In blocking mode, #each will never return.
+* It supports multiple simultaneous readers and writers.
+* A Ruby exception causes the item to be rescheduled at the original time.
+* Work items lost as part of a Ruby crash or segfault are recoverable.
+In non-blocking mode (the default), RedisScheduler#each will iterate only over
+those work items whose scheduled time is less than or equal to the current time,
+and then stop. In blocking mode, RedisScheduler#each will iterate over the same
+items, but will also block until items are available. In this mode, #each will
+never return.
+For debugging purposes, you can use RedisScheduler#items to iterate over all
+items in the queue, but note that this method is not guaranteed to be
+consistent.
+For error recovery purposes, you can use RedisScheduler#processing_set_items
+to iterate over all the items in the processing set to determine whether any
+of them are the result of a process crash.
 == Synopsis
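
The README text in this hunk describes non-blocking iteration and exception handling only in prose. As a rough illustration, the in-memory sketch below mimics those two behaviours without Redis. `ToySchedule` and `each_ready` are hypothetical names invented for this example, not part of the gem, and the sketch ignores concurrency and the processing set entirely.

```ruby
# In-memory stand-in for the schedule. NOT the gem's implementation:
# no Redis, no check-and-set, no processing set.
class ToySchedule
  def initialize
    @entries = [] # [Time, String] pairs standing in for the Redis sorted set
  end

  # Items are stored as strings, as the README says.
  def schedule!(item, time)
    @entries << [time, item.to_s]
  end

  def size
    @entries.size
  end

  # Non-blocking iteration: yield items whose scheduled time is less than or
  # equal to +now+, earliest first, then stop.
  def each_ready(now = Time.now)
    loop do
      due = @entries.select { |at, _| at <= now }.min_by { |at, _| at }
      break if due.nil?
      @entries.delete_at(@entries.index(due))
      at, item = due
      begin
        yield item, at
      rescue Exception
        schedule!(item, at) # put the item back at its original time
        raise
      end
    end
  end
end

now = Time.at(1_000)
schedule = ToySchedule.new
schedule.schedule!("late", now + 60)
schedule.schedule!("b", now - 10)
schedule.schedule!("a", now - 20)

seen = []
schedule.each_ready(now) { |item, _at| seen << item }
# seen is now ["a", "b"]; "late" is still scheduled
```

If the block raises, the item goes back at its original time before the exception propagates, which is the behaviour the README bullet describes for Ruby exceptions.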

data/lib/redis-scheduler.rb CHANGED

@@ -1,105 +1,154 @@
+## A basic chronological scheduler for Redis.
+##
+## Use #schedule! to add an item to be processed at an arbitrary point in time.
+## The item will be converted to a string and later returned to you as such.
+##
+## Use #each to iterate over those items in the schedule which are ready for
+## processing. In blocking mode, this call will never terminate. In nonblocking
+## mode, this call will terminate when there are no items ready for processing.
+##
+## Use #items to iterate over all items in the queue, for debugging purposes.
+##
+## == Ensuring reliable behavior in the presence of segfaults
+##
+## The scheduler maintains a "processing set" of items currently being
+## processed. If a process dies (i.e. not as a result of a Ruby exception, but
+## as the result of a segfault), the item will remain in this set but will
+## no longer appear in the schedule. To avoid losing scheduled work due to
+## segfaults, you must periodically iterate through this set and recover
+## any items that have been abandoned, using #processing_set_items. Setting a
+## proper 'descriptor' argument in #each is suggested.
 class RedisScheduler
   include Enumerable
   POLL_DELAY = 1.0 # seconds
   CAS_DELAY  = 0.5 # seconds
-  ## options:
-  ## * +namespace+: prefix for redis data, e.g. "scheduler/"
+  ## Options:
+  ## * +namespace+: prefix for Redis keys, e.g. "scheduler/"
   ## * +blocking+: whether #each should block or return immediately if there are no items ready to be processed.
   ##
-  ## Note that nonblocking mode may still actually block as part of the
-  ## check-and-set semantics, i.e. block during contention from multiple
-  ## clients. "Nonblocking" mode just refers to whether the scheduler
-  ## should wait until events in the schedule are ready, or only return
-  ## those items that are ready currently.
+  ## Note that nonblocking mode may still actually block momentarily as part of
+  ## the check-and-set semantics, i.e. block during contention from multiple
+  ## clients. "Nonblocking" refers to whether the scheduler should wait until
+  ## events in the schedule are ready, or only return those items that are
+  ## ready currently.
   def initialize redis, opts={}
     @redis = redis
     @namespace = opts[:namespace]
     @blocking = opts[:blocking]
     @queue = [@namespace, "q"].join
-    @error_queue = [@namespace, "errorq"].join
+    @processing_set = [@namespace, "processing"].join
     @counter = [@namespace, "counter"].join
   end
-  ## schedule an item at a specific time. item will be converted to a
-  ## string.
+  ## Schedule an item at a specific time. item will be converted to a string.
   def schedule! item, time
     id = @redis.incr @counter
     @redis.zadd @queue, time.to_f, "#{id}:#{item}"
   end
+  ## Drop all data and reset the schedule entirely.
   def reset!
-    [@queue, @error_queue, @counter].each { |k| @redis.del k }
+    [@queue, @processing_set, @counter].each { |k| @redis.del k }
   end
+  ## Return the total number of items in the schedule.
   def size; @redis.zcard @queue end
-  def error_queue_size; @redis.llen @error_queue end
-  ## yields items along with their scheduled times. only returns items
-  ## on or after their scheduled times. items returned as strings. if
-  ## @blocking is false, will stop once there are no more items that can
-  ## be processed immediately; if it's true, will wait until items
-  ## become available (and never terminate).
-  def each
-    while(x = get)
-      item, erritem, at = x
+  ## Returns the total number of items currently being processed.
+  def processing_set_size; @redis.scard @processing_set end
+  ## Yields items along with their scheduled times. Only returns items on or
+  ## after their scheduled times. Items are returned as strings. If @blocking is
+  ## false, will stop once there are no more items that can be processed
+  ## immediately; if it's true, will wait until items become available (and
+  ## never terminate).
+  ##
+  ## +descriptor+ is an optional string that will be associated with this item
+  ## while in the processing set. This is useful for providing whatever
+  ## information you need to determine whether the item needs to be recovered
+  ## when iterating through the processing set.
+  def each descriptor=nil
+    while(x = get(descriptor))
+      item, processing_descriptor, at = x
       begin
         yield item, at
       rescue Exception # back in the hole!
         schedule! item, at
         raise
       ensure
-        cleanup! erritem
+        cleanup! processing_descriptor
       end
     end
   end
-  ## returns an Enumerable of [item, schedule time] pairs, which can be used to
-  ## easily iterate over all the items in the queue, in order of earliest- to
-  ## latest-scheduled. note that this view is not coordinated with write
-  ## operations, and may be inconsistent (e.g. return duplicates, miss items,
-  ## etc).
+  ## Returns an Enumerable of [item, scheduled time] pairs, which can be used
+  ## to iterate over all the items in the queue, in order of earliest- to
+  ## latest-scheduled, regardless of the schedule time.
+  ##
+  ## Note that this view is not synchronized with write operations, and thus
+  ## may be inconsistent (e.g. return duplicates, miss items, etc) if changes
+  ## to the schedule happen while iterating.
   ##
-  ## for these reasons, this operation is mainly useful for debugging purposes.
+  ## For these reasons, this is mainly useful for debugging purposes.
   def items; ItemEnumerator.new(@redis, @queue) end
+  ## Returns an Array of [item, timestamp, descriptor] tuples representing the
+  ## set of in-process items. The timestamp corresponds to the time at which
+  ## the item was removed from the schedule for processing.
+  def processing_set_items
+    @redis.smembers(@processing_set).map do |x|
+      item, timestamp, descriptor = Marshal.load(x)
+      [item, Time.at(timestamp), descriptor]
+    end
+  end
 private
-  def get; @blocking ? blocking_get : nonblocking_get end
+  def get descriptor; @blocking ? blocking_get(descriptor) : nonblocking_get(descriptor) end
-  def blocking_get
-    sleep POLL_DELAY until(x = nonblocking_get)
+  def blocking_get descriptor
+    sleep POLL_DELAY until(x = nonblocking_get(descriptor))
     x
   end
+  ## Raised by some RedisScheduler operations if an entry in the Redis zset
+  ## underlying the schedule is not parseable. This should basically never
+  ## happen, unless you are naughty and are adding/removing items from that
+  ## zset yourself.
   class InvalidEntryException < StandardError; end
-  def nonblocking_get
+  def nonblocking_get descriptor
     catch :cas_retry do
       @redis.watch @queue
-      item, at = @redis.zrangebyscore @queue, 0, Time.now.to_f,
-        :withscores => true, :limit => [0, 1]
-      if item
+      entry, at = @redis.zrangebyscore @queue, 0, Time.now.to_f, :withscores => true, :limit => [0, 1]
+      if entry
+        entry =~ /^\d+:(\S+)$/ or raise InvalidEntryException, entry
+        item = $1
+        processing_descriptor = Marshal.dump [item, Time.now.to_i, descriptor]
         @redis.multi do # try and grab it
-          @redis.zrem @queue, item
-          @redis.lpush @error_queue, item
+          @redis.zrem @queue, entry
+          @redis.sadd @processing_set, processing_descriptor
         end or begin
           sleep CAS_DELAY
           throw :cas_retry
         end
-        item =~ /^\d+:(\S+)$/ or raise InvalidEntryException, item
-        original = $1
-        [original, item, Time.at(at.to_f)]
+        [item, processing_descriptor, Time.at(at.to_f)]
       end
     end
   end
   def cleanup! item
-    @redis.lrem @error_queue, 1, item
+    @redis.srem @processing_set, item
   end
-  ## enumerable for just iterating over everything in the queue
+  ## Enumerable class for iterating over everything in the schedule. Paginates
+  ## calls to Redis under the hood (and is thus usable for very large
+  ## schedules), but is not synchronized with write traffic and thus may return
+  ## duplicates or skip items when paginating.
+  ##
+  ## Supports random access with #[], with the same caveats as above.
   class ItemEnumerator
     include Enumerable
     def initialize redis, q
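
The hunk above stores each in-process item as `Marshal.dump [item, Time.now.to_i, descriptor]` and leaves the recovery policy to the caller. The sketch below shows one possible policy under stated assumptions: an entry counts as abandoned once it has been in the processing set longer than some threshold. `abandoned_items` and `max_age` are hypothetical names for this example, not part of the gem, and the input is a plain array of marshalled strings rather than a live Redis set.

```ruby
# Hypothetical helper, not part of the gem. Takes raw processing-set members
# (marshalled [item, timestamp, descriptor] tuples, the layout used in the
# diff) and returns those older than +max_age+ seconds, with the timestamp
# converted to a Time as #processing_set_items does.
def abandoned_items(members, now, max_age)
  members
    .map { |raw| Marshal.load(raw) }
    .select { |_item, timestamp, _descriptor| now.to_i - timestamp > max_age }
    .map { |item, timestamp, descriptor| [item, Time.at(timestamp), descriptor] }
end

now = Time.at(10_000)
members = [
  Marshal.dump(["job-1", now.to_i - 600, "worker-a"]), # taken 10 minutes ago
  Marshal.dump(["job-2", now.to_i - 5, "worker-b"])    # taken 5 seconds ago
]

stale = abandoned_items(members, now, 300)
stale.each do |item, taken_at, descriptor|
  puts "#{item} taken at #{taken_at.to_i} by #{descriptor.inspect}"
end
```

Age alone may not be a reliable signal for long-running work. The source suggests passing a meaningful descriptor to #each, so a real check would probably inspect that value as well. What to do with the result, for example passing the item back to #schedule!, is left to the caller.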
@@ -107,21 +156,26 @@ private
       @q = q
     end
-    BLOCK_SIZE = 10
+    PAGE_SIZE = 50
     def each
       start = 0
       while start < size
-        elements = @redis.zrange @q, start, start + BLOCK_SIZE,
-          :withscores => true
-        elements.each_slice(2) do |item, at| # isgh
-          item =~ /^\d+:(\S+)$/ or raise InvalidEntryException, item
-          item = $1
-          yield item, Time.at(at.to_f)
-        end
+        elements = self[start, PAGE_SIZE]
+        elements.each { |*x| yield(*x) }
         start += elements.size
       end
     end
+    def [] start, num=nil
+      elements = @redis.zrange @q, start, start + (num || 0) - 1, :withscores => true
+      v = elements.each_slice(2).map do |item, at|
+        item =~ /^\d+:(\S+)$/ or raise InvalidEntryException, item
+        item = $1
+        [item, Time.at(at.to_f)]
+      end
+      num ? v : v.first
+    end
     def size; @redis.zcard @q end
   end
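
The pagination in this hunk relies on ZRANGE treating both indices as inclusive, hence the `start + num - 1` arithmetic. The sketch below models that arithmetic on a plain Ruby array. `toy_zrange`, `page`, and `each_paged` are hypothetical names for this example, and the model is deliberately simplified rather than a faithful copy of every Redis edge case.

```ruby
# Toy model of ZRANGE on an already-sorted Ruby array: both indices are
# inclusive and negative indices count from the end. Simplified: it does not
# reproduce every Redis edge case (e.g. a negative start past the beginning).
def toy_zrange(sorted, start, stop)
  sorted[start..stop] || []
end

# One page of +num+ elements beginning at +start+, using the same
# start + num - 1 arithmetic as the diff.
def page(sorted, start, num)
  toy_zrange(sorted, start, start + num - 1)
end

# Walk the whole collection page by page, in the style of ItemEnumerator#each.
def each_paged(sorted, page_size)
  start = 0
  while start < sorted.size
    elements = page(sorted, start, page_size)
    break if elements.empty? # guard added for the sketch
    elements.each { |e| yield e }
    start += elements.size
  end
end

queue = %w[a b c d e]
seen = []
each_paged(queue, 2) { |e| seen << e }
# seen now holds all five elements, fetched as pages of 2, 2 and 1
```

Because the stop index is inclusive, requesting `start, start + size` (as the removed 0.3 line did with BLOCK_SIZE) asks for one more element than the page size. Note also that when `num` is omitted, the expression in `#[]` evaluates to a stop of `start - 1`. In this model that selects the whole range for start 0 and an empty range for any larger start, which may be worth checking against a real Redis instance before relying on single-index access.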

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: redis-scheduler
 version: !ruby/object:Gem::Version
-  version: '0.3'
+  version: '0.4'
   prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-04-29 00:00:00.000000000 Z
+date: 2012-05-12 00:00:00.000000000 Z
 dependencies: []
 description: A basic chronological scheduler for Redis. Add work items to be processed
   at specific times in the future, and easily retrieve all items that are ready for