RubyGems - progressor - Versions diffs - 0.0.1 → 0.1.0 - Mend

progressor 0.0.1 → 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

checksums.yaml +4 -4
data/README.md +97 -7
data/lib/progressor.rb +54 -73
data/lib/progressor/error.rb +7 -0
data/lib/progressor/formatting.rb +39 -0
data/lib/progressor/limited_sequence.rb +126 -0
data/lib/progressor/unlimited_sequence.rb +104 -0
data/lib/progressor/version.rb +3 -0
metadata +21 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: fa7eacefe70e6b186f9a72625d6158db19b7f9b45b327c397b7b4402dc56bc47
-  data.tar.gz: 5cb7a42fffe3d6650a48454e41c73d8f74e7cb72fdf521fb2fdd6e59bb2cae8c
+  metadata.gz: 67733b5ccdcf4efdbe628a511246b38238f5bd368e320ae9251ec34dc1be7cc5
+  data.tar.gz: 2c84c065e48fdfab6e9ab4079c7c595be60cbfe16f7378970a451cebe36af326
 SHA512:
-  metadata.gz: 2d11d38113127328cb2d0cd6b6d98652f9e6010816aebdbbd0c44cc27b331ba03c9b961308f2ec71e6abb5f0b0694f4952132214a126cdbecb5d062741e7cbe0
-  data.tar.gz: cba7008ee6ca3151ba3c89159dc7835d8d76b5cb981da4017d3dffe0fdb14a9bf0c3ef9e18a6f532ab220f4abececc487a8145e86e34b77b226b9a81f9473f13
+  metadata.gz: 4735274940fcb4bd4b54089dce074bbbb6fe42cc530db202f3a3f4738c396624685d3d6740e8707882d8cfa9c0f4e2053851f5ed1c987c3d0a88c1fb665b2f28
+  data.tar.gz: 375e03f4109ea4a74de4483b55aa889611d7d2d0ec8901950363b2e5196f8652fb8995d20bcddb8d8118e842e6ead21d71bb47dec00cd071795962361f505371

data/README.md CHANGED

@@ -1,8 +1,31 @@
-A very basic library to measure loops in a long-running task.
+Full documentation for the latest released version can be found at: https://www.rubydoc.info/gems/progressor
-*Note: Very incomplete, so mostly for personal usage. Will hopefully flesh it out, write tests, configuration, etc, at some point (PRs welcome). Until then, a similar library can be found here: https://github.com/mkdynamic/ke*
+## Basic example
-Example usage:
+Here's an example long-running task:
+``` ruby
+Product.find_each do |product|
+  next if product.not_something_we_want_to_process?
+  product.calculate_interesting_stats
+end
+```
+In order to understand how it's progressing, we might add some print statements:
+``` ruby
+Product.find_each do |product|
+  if product.not_something_we_want_to_process?
+    puts "Skipping product: #{product.id}"
+    next
+  end
+  puts "Working on product: #{product.id}"
+  product.calculate_interesting_stats
+end
+```
+This gives us some indication of progress, but no idea how much time is left. We could take a count and maintain a manual index, and then eyeball it based on how fast the numbers are adding up. Progressor automates that process:
 ``` ruby
 progressor = Progressor.new(total_count: Product.count)
@@ -20,12 +43,79 @@ Product.find_each do |product|
 end
 ```
-Example output:
+Each invocation of `run` measures how long its block took and records it. The yielded `progress` parameter is an object that can be `to_s`-ed to provide progress information.
+The output might look like this:
 ```
 ...
-[0038/1000, (004%), t/i: 0.5s, ETA: 8m:0.27s] Product 38
-[0039/1000, (004%), t/i: 0.5s, ETA: 7m:58.47s] Product 39
-[0040/1000, (004%), t/i: 0.5s, ETA: 7m:57.08s] Product 40
+[0038/1000, (004%), t/i: 0.5s, ETA: 8m:00s] Product 38
+[0039/1000, (004%), t/i: 0.5s, ETA: 7m:58s] Product 39
+[0040/1000, (004%), t/i: 0.5s, ETA: 7m:57s] Product 40
 ...
 ```
+You can check the documentation for the [Progressor](https://www.rubydoc.info/gems/progressor/Progressor) class for details on the methods you can call to get the individual pieces of data shown in the report.
+## Limited and unlimited sequences
+Initializing a `Progressor` with a provided `total_count:` parameter gives you a limited sequence, which can give you not only a progress report, but an estimation of when it'll be done:
+```
+[<current loop>/<total count>, (<progress>%), t/i: <time per iteration>, ETA: <time until it's done>]
+```
+The calculation is done by maintaining a list of measurements with a limited size, and a list of averages of those measurements. The average of averages is the "time per iteration" and it's multiplied by the remaining count to produce the estimation.
+I can't really say how reliable this is, but it seems to provide smoothly changing estimations that seem more or less correct to me, for similarly-sized chunks of work per iteration.
+**Not** providing a `total_count:` parameter leads to less available information:
+``` ruby
+progressor = Progressor.new
+(1..100).each do |i|
+  progressor.run do |progress|
+    sleep rand
+    puts progress
+  end
+end
+```
+A sample of output might look like this:
+```
+...
+11, t: 5.32s, t/i: 442.39ms
+12, t: 5.58s, t/i: 446.11ms
+...
+```
+The format is:
+```
+<current>, t: <time from start>, t/i: <time per iteration>
+```
+## Configuration
+Apart from `total_count`, which is optional and affects the kind of sequence that will be stored, you can provide `min_samples` and `max_samples`. You can also provide a custom formatter:
+``` ruby
+progressor = Progressor.new({
+  total_count: 1000,
+  min_samples: 5,
+  max_samples: 10,
+  formatter: -> (p) { p.eta }
+})
+```
+The option `min_samples` determines how many loops the tool will wait until trying to produce an estimation. A higher number means no information in the beginning, but no wild fluctuations, either. It needs to be at least 1 and the default is 1.
+The option `max_samples` is how many measurements will be retained. Those measurements will be averaged, and then those averages averaged to get a time-per-iteration estimate. A smaller number means giving more weight to later events, while a larger one would average over a larger amount of samples. The default is 100.
+The `formatter` is a callback that gets a progress object as an argument and you can return your own string to output on every loop. Check `LimitedSequence` and `UnlimitedSequence` for the available methods and accessors you can use.
+## Related work
+A very similar tool is the gem [ke](https://github.com/mkdynamic/ke). It provides its estimation by maintaining the median quartile range of the stored measurements, removing outliers. It also automates the output of the progress report, only printing it every N loops. Depending on your needs and preferences, it might be better for your use case.
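
The "average of averages" calculation described in the new README can be sketched in isolation. This is a minimal illustration, not the gem's code: `SmoothedEstimate` is a hypothetical name, and it leaves out `min_samples`, formatting and skipping.

```ruby
# Illustrative sketch only: SmoothedEstimate is a hypothetical class, not part
# of the progressor gem.
class SmoothedEstimate
  def initialize(total_count:, max_samples: 100)
    @total_count  = total_count
    @max_samples  = max_samples
    @current      = 0
    @measurements = []
    @averages     = []
  end

  # Record one iteration's duration (in seconds) and refresh the averages.
  def push(duration)
    @current += 1
    @measurements << duration
    # Keep only the most recent max_samples measurements.
    @measurements.shift if @measurements.size > @max_samples
    @averages << mean(@measurements)
    # Keep only the most recent max_samples averages.
    @averages.shift if @averages.size > @max_samples
  end

  # Time per iteration: the average of the stored averages.
  def per_iteration
    return nil if @averages.empty?
    mean(@averages)
  end

  # Remaining time: time per iteration multiplied by the remaining count.
  def eta
    time = per_iteration
    time && (time * (@total_count - @current)).round(2)
  end

  private

  def mean(values)
    values.sum / values.size.to_f
  end
end

estimate = SmoothedEstimate.new(total_count: 10, max_samples: 3)
[1.0, 2.0, 3.0].each { |seconds| estimate.push(seconds) }

# The stored averages are 1.0, 1.5 and 2.0, so the time per iteration is 1.5.
puts estimate.per_iteration
# 1.5 seconds * (10 - 3) remaining loops = 10.5 seconds.
puts estimate.eta
```

Averaging the averages makes the estimate lag behind sudden changes, which is the smoothing effect the README describes.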

data/lib/progressor.rb CHANGED

@@ -1,7 +1,13 @@
+require 'progressor/version'
+require 'progressor/error'
+require 'progressor/formatting'
+require 'progressor/limited_sequence'
+require 'progressor/unlimited_sequence'
 require 'benchmark'
 # Used to measure the running time of parts of a long-running task and output
-# an estimation based on the average of the last 10-100 measurements.
+# an estimation based on the average of the last 1-100 measurements.
 #
 # Example usage:
 #
@@ -22,21 +28,23 @@ require 'benchmark'
 # Example output:
 #
 #   ...
-#   [0038/1000, (004%), t/i: 0.5s, ETA: 8m:0.27s] Product 38
-#   [0039/1000, (004%), t/i: 0.5s, ETA: 7m:58.47s] Product 39
-#   [0040/1000, (004%), t/i: 0.5s, ETA: 7m:57.08s] Product 40
+#   [0038/1000, 004%, t/i: 0.5s, ETA: 8m:00s] Product 38
+#   [0039/1000, 004%, t/i: 0.5s, ETA: 7m:58s] Product 39
+#   [0040/1000, 004%, t/i: 0.5s, ETA: 7m:57s] Product 40
 #   ...
 #
 class Progressor
-  VERSION = '0.0.1'
+  include Formatting
   # Utility method to print a message with the time it took to run the contents
   # of the block.
   #
-  # > Progressor.puts("Working on a thing") { thing_work }
+  #   Progressor.puts("Working on a thing") { thing_work }
+  #
+  # Output:
   #
-  # Working on a thing...
-  # Working on a thing DONE: 2.1s
+  #   Working on a thing...
+  #   Working on a thing DONE: 2.1s
   #
   def self.puts(message, &block)
     Kernel.puts "#{message}..."
@@ -44,76 +52,49 @@ class Progressor
     Kernel.puts "#{message} DONE: #{format_time(measurement.real)}"
   end
-  def initialize(total_count:)
-    @total_count = total_count
-    @total_count_digits = total_count.to_s.length
-    @current = 0
-    @measurements = []
-    @averages = []
+  # Set up a new Progressor instance. Optional parameters:
+  #
+  # - total_count: If given, the tool will be able to provide an ETA.
+  #
+  # - min_samples: The number of samples to collect before attempting to
+  #   calculate a time per iteration. Default: 1
+  #
+  # - max_samples: The maximum number of measurements to collect and average.
+  #   Default: 100.
+  #
+  # - formatter: A callable that accepts a progress object and returns a
+  #   custom formatted string.
+  #
+  def initialize(total_count: nil, min_samples: 1, max_samples: 100, formatter: nil)
+    params = {
+      min_samples: min_samples,
+      max_samples: max_samples,
+      formatter:   formatter,
+    }
+    if total_count
+      @sequence = LimitedSequence.new(total_count: total_count, **params)
+    else
+      @sequence = UnlimitedSequence.new(**params)
+    end
   end
+  # Run the given block of code, yielding a sequence object that holds progress
+  # information.
+  #
+  # Example usage:
+  #
+  #   progressor.run { |progress| puts progress; long_running_task() }
+  #
   def run
-    @current += 1
-    measurement = Benchmark.measure { yield self }
-    @measurements << measurement.real
-    # only keep last 1000
-    @measurements.shift if @measurements.count > 1000
-    @averages << average(@measurements)
-    @averages = @averages.compact
-    # only keep last 100
-    @averages.shift if @averages.count > 100
+    measurement = Benchmark.measure { yield @sequence }
+    @sequence.push(measurement.real)
   end
+  # Skips the given number of loops (will likely be 1), updating the
+  # estimations appropriately.
+  #
   def skip(n)
-    @total_count -= n
-  end
-  def to_s
-    [
-      "#{@current.to_s.rjust(@total_count_digits, '0')}/#{@total_count}",
-      "(#{((@current / @total_count.to_f) * 100).round.to_s.rjust(3, '0')}%)",
-      "t/i: #{self.class.format_time(per_iteration)}",
-      "ETA: #{self.class.format_time(eta)}",
-    ].join(', ')
-  end
-  def per_iteration
-    return nil if @measurements.count < 10
-    average(@averages)
-  end
-  def eta
-    return nil if @measurements.count < 10
-    remaining_time = per_iteration * (@total_count - @current)
-    remaining_time.round(2)
-  end
-  private
-  def self.format_time(time)
-    return "?s" if time.nil?
-    if time < 0.1
-      "#{(time * 1000).round(2)}ms"
-    elsif time < 60
-      "#{time.round(2)}s"
-    elsif time < 3600
-      minutes = time.to_i / 60
-      seconds = (time - minutes * 60).round(2)
-      "#{minutes}m:#{seconds}s"
-    else
-      hours = time.to_i / 3600
-      minutes = (time.to_i % 3600) / 60
-      seconds = (time - (hours * 3600 + minutes * 60)).round(2)
-      "#{hours}h:#{minutes}m:#{seconds}s"
-    end
-  end
-  def average(collection)
-    collection.inject(&:+) / collection.count.to_f
+    @sequence.skip(n)
   end
 end

data/lib/progressor/error.rb ADDED

@@ -0,0 +1,7 @@
+class Progressor
+  # A custom error class for targeted catching. All Progressor errors will be
+  # wrapped in a Progressor::Error.
+  #
+  class Error < RuntimeError
+  end
+end

data/lib/progressor/formatting.rb ADDED

@@ -0,0 +1,39 @@
+class Progressor
+  module Formatting
+    # Formats the given time in seconds to something human readable. Examples:
+    #
+    # - 1 second:      1.00s
+    # - 0.123 seconds: 123.00ms
+    # - 100 seconds:   01m:40s
+    # - 101.5 seconds: 01m:41s
+    # - 3661 seconds:  01h:01m:01s
+    def format_time(time)
+      return "?s" if time.nil?
+      if time < 1
+        "#{format_float((time * 1000).round(2))}ms"
+      elsif time < 60
+        "#{format_float(time.round(2))}s"
+      elsif time < 3600
+        minutes = time.to_i / 60
+        seconds = (time - minutes * 60).round(2)
+        "#{format_int(minutes)}m:#{format_int(seconds)}s"
+      else
+        hours = time.to_i / 3600
+        minutes = (time.to_i % 3600) / 60
+        seconds = (time - (hours * 3600 + minutes * 60)).round(2)
+        "#{format_int(hours)}h:#{format_int(minutes)}m:#{format_int(seconds)}s"
+      end
+    end
+    # :nodoc:
+    def format_int(value)
+      sprintf("%02d", value)
+    end
+    # :nodoc:
+    def format_float(value)
+      sprintf("%0.2f", value)
+    end
+  end
+end
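
The examples in the `format_time` doc comment can be checked with a standalone copy of the helpers. The sketch below assumes nothing beyond the code in this diff; `FormattingDemo` and `demo` are hypothetical names used so the snippet runs without the gem installed. Values under one second are now rendered in milliseconds (the 0.0.1 threshold was 0.1 seconds), so 0.5 seconds comes out as `500.00ms`.

```ruby
# Standalone copy of the helpers added in this version, for experimentation
# only. FormattingDemo is a hypothetical name, not the gem's module.
module FormattingDemo
  def format_time(time)
    return "?s" if time.nil?

    if time < 1
      "#{format_float((time * 1000).round(2))}ms"
    elsif time < 60
      "#{format_float(time.round(2))}s"
    elsif time < 3600
      minutes = time.to_i / 60
      seconds = (time - minutes * 60).round(2)
      "#{format_int(minutes)}m:#{format_int(seconds)}s"
    else
      hours   = time.to_i / 3600
      minutes = (time.to_i % 3600) / 60
      seconds = (time - (hours * 3600 + minutes * 60)).round(2)
      "#{format_int(hours)}h:#{format_int(minutes)}m:#{format_int(seconds)}s"
    end
  end

  # Zero-pads to two digits; fractional seconds are dropped by %d.
  def format_int(value)
    sprintf("%02d", value)
  end

  # Always prints two decimal places.
  def format_float(value)
    sprintf("%0.2f", value)
  end
end

demo = Object.new.extend(FormattingDemo)

puts demo.format_time(nil)   # ?s
puts demo.format_time(0.123) # 123.00ms
puts demo.format_time(0.5)   # 500.00ms
puts demo.format_time(1)     # 1.00s
puts demo.format_time(100)   # 01m:40s
puts demo.format_time(101.5) # 01m:41s
puts demo.format_time(3661)  # 01h:01m:01s
```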

data/lib/progressor/limited_sequence.rb ADDED

@@ -0,0 +1,126 @@
+class Progressor
+  class LimitedSequence
+    include Formatting
+    attr_reader :total_count, :min_samples, :max_samples
+    # The current loop index, starts at 1
+    attr_reader :current
+    # The time the object was created
+    attr_reader :start_time
+    # Creates a new LimitedSequence with the given parameters:
+    #
+    # - total_count: The expected number of loops.
+    #
+    # - min_samples: The number of samples to collect before attempting to
+    #   calculate a time per iteration. Default: 1
+    #
+    # - max_samples: The maximum number of measurements to collect and average.
+    #   Default: 100.
+    #
+    # - formatter: A callable that accepts the sequence object and returns a
+    #   custom formatted string.
+    #
+    def initialize(total_count:, min_samples: 1, max_samples: 100, formatter: nil)
+      @total_count = total_count
+      @min_samples = min_samples
+      @max_samples = [max_samples, total_count].min
+      @formatter   = formatter
+      raise Error.new("min_samples needs to be a positive number") if min_samples <= 0
+      raise Error.new("max_samples needs to be larger than min_samples") if max_samples <= min_samples
+      @start_time         = Time.now
+      @total_count_digits = total_count.to_s.length
+      @current            = 0
+      @measurements       = []
+      @averages           = []
+    end
+    # Adds a duration in seconds to the internal storage of samples. Updates
+    # averages accordingly.
+    #
+    def push(duration)
+      @current += 1
+      @measurements << duration
+      # only keep last `max_samples`
+      @measurements.shift if @measurements.count > max_samples
+      @averages << average(@measurements)
+      @averages = @averages.compact
+      # only keep last `max_samples`
+      @averages.shift if @averages.count > max_samples
+    end
+    # Skips an iteration, updating the total count and ETA
+    #
+    def skip(n)
+      @total_count -= n
+    end
+    # Outputs a textual representation of the current state of the
+    # UnlimitedSequence. Shows:
+    #
+    # - the current number of iterations and the total count
+    # - completion level in percentage
+    # - how long a single iteration takes
+    # - estimated time of arrival (ETA) -- time until it's done
+    #
+    # A custom `formatter` provided at construction time overrides this default
+    # output.
+    #
+    # If the "current" number of iterations goes over the total count, an ETA
+    # can't be shown anymore, so it'll just be the current number over the
+    # expected one, and the time per iteration.
+    #
+    def to_s
+      return @formatter.call(self).to_s if @formatter
+      if @current > @total_count
+        return [
+          "#{@current} (expected #{@total_count})",
+          "t/i: #{format_time(per_iteration)}",
+          "ETA: ???",
+        ].join(', ')
+      end
+      [
+        "#{@current.to_s.rjust(@total_count_digits, '0')}/#{@total_count}",
+        "#{((@current / @total_count.to_f) * 100).round.to_s.rjust(3, '0')}%",
+        "t/i: #{format_time(per_iteration)}",
+        "ETA: #{format_time(eta)}",
+      ].join(', ')
+    end
+    # Returns an estimation for the time per single iteration. Implemented as
+    # an average of averages to provide a smoother gradient from loop to loop.
+    #
+    # Returns nil if not enough samples have been collected yet.
+    #
+    def per_iteration
+      return nil if @measurements.count < min_samples
+      average(@averages)
+    end
+    # Returns an estimation for the Estimated Time of Arrival (time until
+    # done).
+    #
+    # Calculated by multiplying the average time per iteration with the
+    # remaining number of loops.
+    #
+    def eta
+      return nil if @measurements.count < min_samples
+      remaining_time = per_iteration * (@total_count - @current)
+      remaining_time.round(2)
+    end
+    private
+    def average(collection)
+      collection.inject(&:+) / collection.count.to_f
+    end
+  end
+end
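
As a worked example of the ETA arithmetic in `LimitedSequence#eta`, and of why `skip` shortens the estimate: skipping lowers the total count, so fewer loops remain to be multiplied by the time per iteration. The helper `remaining_seconds` and its `skipped:` parameter are hypothetical, introduced only for this sketch.

```ruby
# Hypothetical helper mirroring the arithmetic in LimitedSequence#eta; it is
# not part of the gem.
def remaining_seconds(per_iteration:, total_count:, current:, skipped: 0)
  # Skipped loops are subtracted from the total, as LimitedSequence#skip does.
  effective_total = total_count - skipped
  (per_iteration * (effective_total - current)).round(2)
end

# 40 of 1000 loops done at 0.5s each: 0.5 * 960 = 480.0 seconds left.
puts remaining_seconds(per_iteration: 0.5, total_count: 1000, current: 40)

# Same progress, but 60 loops were skipped: 0.5 * 900 = 450.0 seconds left.
puts remaining_seconds(per_iteration: 0.5, total_count: 1000, current: 40, skipped: 60)
```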

data/lib/progressor/unlimited_sequence.rb ADDED

@@ -0,0 +1,104 @@
+class Progressor
+  class UnlimitedSequence
+    include Formatting
+    attr_reader :min_samples, :max_samples
+    # The current loop index, starts at 1
+    attr_reader :current
+    # The time the object was created
+    attr_reader :start_time
+    # Creates a new UnlimitedSequence with the given parameters:
+    #
+    # - min_samples: The number of samples to collect before attempting to
+    #   calculate a time per iteration. Default: 1
+    #
+    # - max_samples: The maximum number of measurements to collect and average.
+    #   Default: 100.
+    #
+    # - formatter: A callable that accepts the sequence object and returns a
+    #   custom formatted string.
+    #
+    def initialize(min_samples: 1, max_samples: 100, formatter: nil)
+      @min_samples = min_samples
+      @max_samples = max_samples
+      @formatter   = formatter
+      raise Error.new("min_samples needs to be a positive number") if min_samples <= 0
+      raise Error.new("max_samples needs to be larger than min_samples") if max_samples <= min_samples
+      @start_time   = Time.now
+      @current      = 0
+      @measurements = []
+      @averages     = []
+    end
+    # Adds a duration in seconds to the internal storage of samples. Updates
+    # averages accordingly.
+    #
+    def push(duration)
+      @current += 1
+      @measurements << duration
+      # only keep last `max_samples`
+      @measurements.shift if @measurements.count > max_samples
+      @averages << average(@measurements)
+      @averages = @averages.compact
+      # only keep last `max_samples`
+      @averages.shift if @averages.count > max_samples
+    end
+    # "Skips" an iteration, which, in the context of an UnlimitedSequence is a no-op.
+    #
+    def skip(_n)
+      # Nothing to do
+    end
+    # Outputs a textual representation of the current state of the
+    # UnlimitedSequence. Shows:
+    #
+    # - the current (1-indexed) number of iterations
+    # - how long since the start time
+    # - how long a single iteration takes
+    #
+    # A custom `formatter` provided at construction time overrides this default
+    # output.
+    #
+    def to_s
+      return @formatter.call(self).to_s if @formatter
+      [
+        "#{@current + 1}",
+        "t: #{format_time(Time.now - @start_time)}",
+        "t/i: #{format_time(per_iteration)}",
+      ].join(', ')
+    end
+    # Returns an estimation for the time per single iteration. Implemented as
+    # an average of averages to provide a smoother gradient from loop to loop.
+    #
+    # Returns nil if not enough samples have been collected yet.
+    #
+    def per_iteration
+      return nil if @measurements.count < min_samples
+      average(@averages)
+    end
+    # Is supposed to return an estimation for the Estimated Time of Arrival
+    # (time until done).
+    #
+    # For an UnlimitedSequence, this always returns nil.
+    #
+    def eta
+      # No estimation possible
+    end
+    private
+    def average(collection)
+      collection.inject(&:+) / collection.count.to_f
+    end
+  end
+end

data/lib/progressor/version.rb ADDED

@@ -0,0 +1,3 @@
+class Progressor
+  VERSION = '0.1.0'
+end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: progressor
 version: !ruby/object:Gem::Version
-  version: 0.0.1
+  version: 0.1.0
 platform: ruby
 authors:
 - Andrew Radev
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2019-02-20 00:00:00.000000000 Z
+date: 2019-03-15 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -52,6 +52,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '3.0'
+- !ruby/object:Gem::Dependency
+  name: timecop
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.9'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.9'
 description: |
   Provides a way to measure how long each loop in a task took, outputting a
   report with an estimated time till the task is done.
@@ -65,6 +79,11 @@ files:
 - LICENSE
 - README.md
 - lib/progressor.rb
+- lib/progressor/error.rb
+- lib/progressor/formatting.rb
+- lib/progressor/limited_sequence.rb
+- lib/progressor/unlimited_sequence.rb
+- lib/progressor/version.rb
 homepage: https://github.com/AndrewRadev/progressor
 licenses:
 - MIT