RubyGems - rspecq - Versions diffs - 0.0.1.pre2 → 0.1.0 - Mend

rspecq 0.0.1.pre2 → 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12)

checksums.yaml +4 -4
data/CHANGELOG.md +18 -0
data/README.md +165 -63
data/Rakefile +9 -0
data/bin/rspecq +79 -28
data/lib/rspecq.rb +5 -7
data/lib/rspecq/formatters/failure_recorder.rb +3 -2
data/lib/rspecq/queue.rb +21 -5
data/lib/rspecq/reporter.rb +3 -4
data/lib/rspecq/version.rb +1 -1
data/lib/rspecq/worker.rb +112 -63
metadata +55 -11

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 4fcc5311329946efb2a7801087f4cdb5f0be8becc01bd1bb8f367b6d130a02ea
-  data.tar.gz: d6f64c6d0c1dae8a53af8bf2d7724e2fb988b03fa795c59d0f5ecd18a92b072a
+  metadata.gz: d6b4c91525a2fb29e2198f290877ffe5ef1e753dafe0f9babd75a581e25d7af8
+  data.tar.gz: 27d3705ee014a5dc77514238b36386eaf11d1bb76601ab25c463796184ea5795
 SHA512:
-  metadata.gz: 7611cf0944ea7751eaf93a7aae5686f6c03563b01e71978d9e1d61f30f14f89b12de0ce9ac590f9351eff22a4d4811f9e2f6c241232754ede3162142225f2c27
-  data.tar.gz: bdbd6da559607026b8e6fead442d4b15b1bb73e957a63b022c4459a83aba0c2a8e297204d9c29ae3bbc700d39ea2434c7b99e55dfb3752a56af5376d0511fea0
+  metadata.gz: 21803fa664abe45f173f7121dc948ebc8dbc0df41046f8e6269e8ec3751b647701b07a8b6a872aad0da22f438c043ebe6bf0fc69bc9c9c8b327e3b157dccab04
+  data.tar.gz: 4d683884610e2e28ca5ce5cf891e37135886ea4c6ec905f26607c1f0fdd62c1952e9c49983bb0191eb5d42433a800bf88f8227354a213b50645742f21f95af16

data/CHANGELOG.md CHANGED

@@ -1,4 +1,22 @@
 # Changelog
+Breaking changes are prefixed with a "[BREAKING]" label.
 ## master (unreleased)
+## 0.1.0 (2020-08-27)
+### Added
+- Sentry integration for various RSpecQ-level events [[#16](https://github.com/skroutz/rspecq/pull/16)]
+- CLI: Flags can now also be set via environment variables [[c519230](https://github.com/skroutz/rspecq/commit/c5192303e229f361e8ac86ae449b4ea84d42e022)]
+- CLI: Added shorthand versions of some flags [[df9faa8](https://github.com/skroutz/rspecq/commit/df9faa8ec6721af8357cfee4de6a2fe7b32070fc)]
+- CLI: Added `--help` and `--version` flags [[df9faa8](https://github.com/skroutz/rspecq/commit/df9faa8ec6721af8357cfee4de6a2fe7b32070fc)]
+- CLI: Max number of retries for failed examples is now configurable via the `--max-requeues` option [[#14](https://github.com/skroutz/rspecq/pull/14)]
+### Changed
+- [BREAKING] CLI: Renamed `--timings` to `--update-timings` [[c519230](https://github.com/skroutz/rspecq/commit/c5192303e229f361e8ac86ae449b4ea84d42e022)]
+- [BREAKING] CLI: Renamed `--build-id` to `--build` and `--worker-id` to `--worker` [[df9faa8](https://github.com/skroutz/rspecq/commit/df9faa8ec6721af8357cfee4de6a2fe7b32070fc)]
+- CLI: `--worker` is not required when `--reporter` is used [[4323a75](https://github.com/skroutz/rspecq/commit/4323a75ca357274069d02ba9fb51cdebb04e0be4)]
+- CLI: Improved help output [[df9faa8](https://github.com/skroutz/rspecq/commit/df9faa8ec6721af8357cfee4de6a2fe7b32070fc)]

data/README.md CHANGED

@@ -1,102 +1,204 @@
-# RSpecQ
+RSpec Queue
+=========================================================================
+[![Build Status](https://travis-ci.com/skroutz/rspecq.svg?branch=master)](https://travis-ci.com/github/skroutz/rspecq)
+[![Gem Version](https://badge.fury.io/rb/rspecq.svg)](https://badge.fury.io/rb/rspecq)
-RSpecQ (`rspecq`) distributes and executes an RSpec suite over many workers,
-using a centralized queue backed by Redis.
+RSpec Queue (RSpecQ) distributes and executes RSpec suites among parallel
+workers. It uses a centralized queue that workers connect to and pop off
+tests from. It ensures optimal scheduling of tests based on their run time,
+facilitating faster CI builds.
-RSpecQ is heavily inspired by [test-queue](https://github.com/tmm1/test-queue)
+RSpecQ is inspired by [test-queue](https://github.com/tmm1/test-queue)
 and [ci-queue](https://github.com/Shopify/ci-queue).
-## Why don't you just use ci-queue?
+## Features
+- Run an RSpec suite among many workers
+  (potentially located in different hosts) in a distributed fashion,
+  facilitating faster CI builds.
+- Consolidated, real-time reporting of a build's progress.
+- Optimal scheduling of test execution by using timings statistics from previous runs and
+  automatically scheduling slow spec files as individual examples. See
+  [*Spec file splitting*](#spec-file-splitting).
+- Automatic retry of test failures before being considered legit, in order to
+  rule out flakiness. See [*Requeues*](#requeues).
+- Handles intermittent worker failures (e.g. network hiccups, faulty hardware etc.)
+  by detecting non-responsive workers and requeuing their jobs. See [*Worker failures*](#worker-failures).
+- [Sentry](https://sentry.io) integration for monitoring important
+  RSpecQ-level events.
+- [PLANNED] StatsD integration for various build-level metrics and insights.
+  See [#2](https://github.com/skroutz/rspecq/issues/2).
-While evaluating ci-queue for our RSpec suite, we observed slow boot times
-in the workers (up to 3 minutes), increased memory consumption and too much
-disk I/O on boot. This is due to the fact that a worker in ci-queue has to
-load every spec file on boot. This can be problematic for applications with
-a large number of spec files.
-RSpecQ works with spec files as its unit of work (as opposed to ci-queue which
-works with individual examples). This means that an RSpecQ worker does not
-have to load all spec files at once and so it doesn't have the aforementioned
-problems. It also allows suites to keep using `before(:all)` hooks
-(which ci-queue explicitly rejects). (Note: RSpecQ also schedules individual
-examples, but only when this is deemed necessary, see section
-"Spec file splitting").
+## Usage
-We also observed faster build times by scheduling spec files instead of
-individual examples, due to way less Redis operations.
+A worker needs to be given a name and the build it will participate in.
+Assuming there's a Redis instance listening at `localhost`, starting a worker
+is as simple as:
-The downside of this design is that it's more complicated, since the scheduling
-of spec files happens based on timings calculated from previous runs. This
-means that RSpecQ maintains a key with the timing of each job and updates it
-on every run (if the `--timings` option was used). Also, RSpecQ has a "slow
-file threshold" which, currently has to be set manually (but this can be
-improved).
+```shell
+$ rspecq --build=123 --worker=foo1 spec/
+```
-*Update*: ci-queue deprecated support for RSpec, so there's that.
+To start more workers for the same build, use distinct worker IDs but the same
+build ID:
-## Usage
+```shell
+$ rspecq --build=123 --worker=foo2
+```
-Each worker needs to know the build it will participate in, its name and where
-Redis is located. To start a worker:
+To view the progress of the build use `--report`:
 ```shell
-$ rspecq --build-id=foo --worker-id=worker1 --redis=redis://localhost
+$ rspecq --build=123 --report
 ```
-To view the progress of the build print use `--report`:
+For detailed info use `--help`:
-```shell
-$ rspecq --build-id=foo --worker-id=reporter --redis=redis://localhost --report
 ```
+NAME:
+    rspecq - Optimally distribute and run RSpec suites among parallel workers
+USAGE:
+    rspecq [<options>] [spec files or directories]
+OPTIONS:
+    -b, --build ID                   A unique identifier for the build. Should be common among workers participating in the same build.
+    -w, --worker ID                  An identifier for the worker. Workers participating in the same build should have distinct IDs.
+    -r, --redis HOST                 Redis host to connect to (default: 127.0.0.1).
+        --update-timings             Update the global job timings key with the timings of this build. Note: This key is used as the basis for job scheduling.
+        --file-split-threshold N     Split spec files slower than N seconds and schedule them as individual examples.
+        --report                     Enable reporter mode: do not pull tests off the queue; instead print build progress and exit when it's finished.
+                                     Exits with a non-zero status code if there were any failures.
+        --report-timeout N           Fail if build is not finished after N seconds. Only applicable if --report is enabled (default: 3600).
+        --max-requeues N             Retry failed examples up to N times before considering them legit failures (default: 3).
+    -h, --help                       Show this message.
+    -v, --version                    Print the version and exit.
+```
+### Sentry integration
-For detailed info use `--help`.
+RSpecQ can optionally emit build events to a
+[Sentry](https://sentry.io) project by setting the
+[`SENTRY_DSN`](https://github.com/getsentry/raven-ruby#raven-only-runs-when-sentry_dsn-is-set)
+environment variable.
+This is convenient for monitoring important warnings/errors that may impact
+build times, such as the fact that no previous timings were found and
+therefore job scheduling was effectively random for a particular build.
 ## How it works
-The basic idea is identical to ci-queue so please refer to its README
+The core design is almost identical to ci-queue so please refer to its
+[README](https://github.com/Shopify/ci-queue/blob/master/README.md) instead.
 ### Terminology
-- Job: the smallest unit of work, which is usually a spec file
+- **Job**: the smallest unit of work, which is usually a spec file
   (e.g. `./spec/models/foo_spec.rb`) but can also be an individual example
-  (e.g. `./spec/models/foo_spec.rb[1:2:1]`) if the file is too slow
-- Queue: a collection of Redis-backed structures that hold all the necessary
-  information for RSpecQ to function. This includes timing statistics, jobs to
-  be executed, the failure reports, requeueing statistics and more.
-- Worker: a process that, given a build id, pops up jobs of that build and
-  executes them using RSpec
-- Reporter: a process that, given a build id, waits for the build to finish
-  and prints the summary report (examples executed, build result, failures etc.)
+  (e.g. `./spec/models/foo_spec.rb[1:2:1]`) if the file is too slow.
+- **Queue**: a collection of Redis-backed structures that hold all the necessary
+  information for an RSpecQ build to run. This includes timing statistics,
+  jobs to be executed, the failure reports and more.
+- **Build**: a particular test suite run. Each build has its own **Queue**.
+- **Worker**: an `rspecq` process that, given a build id, consumes jobs off the
+  build's queue and executes them using RSpec
+- **Reporter**: an `rspecq` process that, given a build id, waits for the build's
+  queue to be drained and prints the build summary report
 ### Spec file splitting
-Very slow files may put a limit to how fast the suite can execute. For example,
-a worker may spend 10 minutes running a single slow file, while all the other
-workers finish after 8 minutes. To overcome this issue, rspecq splits
-files that their execution time is above a certain threshold
-(set with the `--file-split-threshold` option) and will instead schedule them as
-individual examples.
+Particularly slow spec files may set a limit to how fast a build can be.
+For example, a single file may need 10 minutes to run while all other
+files finish after 8 minutes. This would cause all but one worker to be
+sitting idle for 2 minutes.
+To overcome this issue, RSpecQ can split files whose execution time is
+above a certain threshold (set with the `--file-split-threshold` option)
+and instead schedule them as individual examples.
-In the future, we'd like for the slow threshold to be calculated and set
-dynamically.
+Note: In the future, we'd like for the slow threshold to be calculated and set
+dynamically (see #3).
 ### Requeues
-As a mitigation measure for flaky tests, if an example fails it will be put
-back to the queue to be picked up by
-another worker. This will be repeated up to a certain number of times before,
-after which the example will be considered a legit failure and will be printed
-in the final report (`--report`).
+As a mitigation technique against flaky tests, if an example fails it will be
+put back to the queue to be picked up by another worker. This will be repeated
+up to a certain number of times (set with the `--max-requeues` option), after
+which the example will be considered a legit failure and printed as such in the
+final report.
 ### Worker failures
-Workers emit a timestamp after each example, as a heartbeat, to denote
-that they're fine and performing jobs. If a worker hasn't reported for
-a given amount of time (see `WORKER_LIVENESS_SEC`) it is considered dead
-and the job it reserved will be requeued, so that it is picked up by another worker.
+It's not uncommon for CI processes to encounter unrecoverable failures for
+various reasons: faulty hardware, network hiccups, segmentation faults in
+MRI etc.
+For resiliency against such issues, workers emit a heartbeat after each
+example they execute, to signal
+that they're healthy and performing jobs as expected. If a worker hasn't
+emitted a heartbeat for a given amount of time (set by `WORKER_LIVENESS_SEC`)
+it is considered dead and its reserved job will be put back to the queue, to
+be picked up by another healthy worker.
+## Rationale
+### Why didn't you use ci-queue?
+**Update**: ci-queue [deprecated support for RSpec](https://github.com/Shopify/ci-queue/pull/149).
+While evaluating ci-queue we experienced slow worker boot
+times (up to 3 minutes in some cases) combined with disk IO saturation and
+increased memory consumption. This is due to the fact that a worker in
+ci-queue has to load every spec file on boot. In applications with a large
+number of spec files this may result in a significant performance hit and
+in case of cloud environments, increased costs.
+We also observed slower build times compared to our previous solution which
+scheduled whole spec files (as opposed to individual examples), due to
+big differences in runtimes of individual examples, something common in big
+RSpec suites.
+We decided for RSpecQ to use whole spec files as its main unit of work (as
+opposed to ci-queue which uses individual examples). This means that an RSpecQ
+worker only loads the files needed and ends up with a subset of all the suite's
+files.  (Note: RSpecQ also schedules individual examples, but only when this is
+deemed necessary, see [Spec file splitting](#spec-file-splitting)).
+This kept boot and test run times considerably fast. As a side benefit, this
+allows suites to keep using `before(:all)` hooks (which ci-queue explicitly
+rejects).
+The downside of this design is that it's more complicated, since the scheduling
+of spec files happens based on timings calculated from previous runs. This
+means that RSpecQ maintains a key with the timing of each job and updates it
+on every run (if the `--update-timings` option was used). Also, RSpecQ has a "slow
+file threshold" which currently has to be set manually (but this can be
+improved in the future).
+## Development
+Install the required dependencies:
+```
+$ bundle install
+```
+Then you can execute the tests after spinning up a Redis instance at
+`127.0.0.1:6379`:
+```
+$ bundle exec rake
+```
+To enable verbose output in the tests:
+```
+$ RSPECQ_DEBUG=1 bundle exec rake
+```
-This protects us against unrecoverable worker failures (e.g. segfault).
 ## License
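
The scheduling rationale in the README changes above (slowest jobs first, slow files split into individual examples) can be illustrated with a small simulation. This is only a sketch: `makespan` is a hypothetical helper, the durations are made-up numbers, and workers are modeled as taking the next job as soon as they are free. It is not RSpecQ's implementation.

```ruby
# Simulate workers popping jobs off a queue sorted slowest-first.
# Returns the time at which the last worker finishes (the build duration).
def makespan(durations, workers)
  free_at = Array.new(workers, 0.0)
  durations.sort.reverse.each do |duration|
    # the next job goes to whichever worker becomes free first
    idx = free_at.index(free_at.min)
    free_at[idx] += duration
  end
  free_at.max
end

# One slow file (600s) plus eight 60s files, two workers:
# the slow file alone dictates the build time.
unsplit = [600] + [60] * 8
puts makespan(unsplit, 2) # => 600.0

# Assume the slow file splits into ten 60s examples:
# the work now spreads evenly across both workers.
split = [60] * 10 + [60] * 8
puts makespan(split, 2) # => 540.0
```

In this toy model, splitting the slow file removes the idle time of the second worker, which is the effect `--file-split-threshold` aims for.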

data/Rakefile ADDED

@@ -0,0 +1,9 @@
+require "rake/testtask"
+Rake::TestTask.new do |t|
+  t.libs << "test"
+  t.test_files = FileList['test/test*.rb']
+  t.verbose = true
+end
+task default: :test

data/bin/rspecq CHANGED

@@ -1,67 +1,118 @@
 #!/usr/bin/env ruby
-require "optionparser"
+require "optparse"
 require "rspecq"
+DEFAULT_REDIS_HOST = "127.0.0.1"
+DEFAULT_REPORT_TIMEOUT = 3600 # 1 hour
+DEFAULT_MAX_REQUEUES = 3
+def env_set?(var)
+  ["1", "true"].include?(ENV[var])
+end
 opts = {}
 OptionParser.new do |o|
-  o.banner = "Usage: #{$PROGRAM_NAME} [opts] [files_or_directories_to_run]"
+  name = File.basename($PROGRAM_NAME)
-  o.on("--build-id ID", "A unique identifier denoting the build") do |v|
-    opts[:build_id] = v
+  o.banner = <<~BANNER
+    NAME:
+        #{name} - Optimally distribute and run RSpec suites among parallel workers
+    USAGE:
+        #{name} [<options>] [spec files or directories]
+  BANNER
+  o.separator ""
+  o.separator "OPTIONS:"
+  o.on("-b", "--build ID", "A unique identifier for the build. Should be " \
+       "common among workers participating in the same build.") do |v|
+    opts[:build] = v
   end
-  o.on("--worker-id ID", "A unique identifier denoting the worker") do |v|
-    opts[:worker_id] = v
+  o.on("-w", "--worker ID", "An identifier for the worker. Workers " \
+       "participating in the same build should have distinct IDs.") do |v|
+    opts[:worker] = v
   end
-  o.on("--redis HOST", "Redis HOST to connect to (default: 127.0.0.1)") do |v|
-    opts[:redis_host] = v || "127.0.0.1"
+  o.on("-r", "--redis HOST", "Redis host to connect to " \
+       "(default: #{DEFAULT_REDIS_HOST}).") do |v|
+    opts[:redis_host] = v
   end
-  o.on("--timings", "Populate global job timings in Redis") do |v|
+  o.on("--update-timings", "Update the global job timings key with the "     \
+       "timings of this build. Note: This key is used as the basis for job " \
+       "scheduling.") do |v|
     opts[:timings] = v
   end
-  o.on("--file-split-threshold N", "Split spec files slower than N sec. and " \
-       "schedule them by example (default: 999999)") do |v|
-    opts[:file_split_threshold] = Float(v)
+  o.on("--file-split-threshold N", Integer, "Split spec files slower than N " \
+       "seconds and schedule them as individual examples.") do |v|
+    opts[:file_split_threshold] = v
   end
-  o.on("--report", "Do not execute tests but wait until queue is empty and " \
-       "print a report") do |v|
+  o.on("--report", "Enable reporter mode: do not pull tests off the queue; " \
+                   "instead print build progress and exit when it's "        \
+                   "finished.\n#{o.summary_indent*9} "                       \
+                   "Exits with a non-zero status code if there were any "    \
+                   "failures.") do |v|
     opts[:report] = v
   end
-  o.on("--report-timeout N", Integer, "Fail if queue is not empty after " \
-       "N seconds. Only applicable if --report is enabled "               \
-       "(default: 3600)") do |v|
+  o.on("--report-timeout N", Integer, "Fail if build is not finished after " \
+       "N seconds. Only applicable if --report is enabled "                  \
+       "(default: #{DEFAULT_REPORT_TIMEOUT}).") do |v|
     opts[:report_timeout] = v
   end
+  o.on("--max-requeues N", Integer, "Retry failed examples up to N times "   \
+       "before considering them legit failures "                             \
+       "(default: #{DEFAULT_MAX_REQUEUES}).") do |v|
+    opts[:max_requeues] = v
+  end
+  o.on_tail("-h", "--help", "Show this message.") do
+    puts o
+    exit
+  end
+  o.on_tail("-v", "--version", "Print the version and exit.") do
+    puts "#{name} #{RSpecQ::VERSION}"
+    exit
+  end
 end.parse!
-[:build_id, :worker_id].each do |o|
-  raise OptionParser::MissingArgument.new(o) if opts[o].nil?
-end
+opts[:build] ||= ENV["RSPECQ_BUILD"]
+opts[:worker] ||= ENV["RSPECQ_WORKER"]
+opts[:redis_host] ||= ENV["RSPECQ_REDIS"] || DEFAULT_REDIS_HOST
+opts[:timings] ||= env_set?("RSPECQ_UPDATE_TIMINGS")
+opts[:file_split_threshold] ||= Integer(ENV["RSPECQ_FILE_SPLIT_THRESHOLD"] || 9999999)
+opts[:report] ||= env_set?("RSPECQ_REPORT")
+opts[:report_timeout] ||= Integer(ENV["RSPECQ_REPORT_TIMEOUT"] || DEFAULT_REPORT_TIMEOUT)
+opts[:max_requeues] ||= Integer(ENV["RSPECQ_MAX_REQUEUES"] || DEFAULT_MAX_REQUEUES)
+raise OptionParser::MissingArgument.new(:build) if opts[:build].nil?
+raise OptionParser::MissingArgument.new(:worker) if !opts[:report] && opts[:worker].nil?
 if opts[:report]
   reporter = RSpecQ::Reporter.new(
-    build_id: opts[:build_id],
-    worker_id: opts[:worker_id],
-    timeout: opts[:report_timeout] || 3600,
+    build_id: opts[:build],
+    timeout: opts[:report_timeout],
     redis_host: opts[:redis_host],
   )
   reporter.report
 else
   worker = RSpecQ::Worker.new(
-    build_id: opts[:build_id],
-    worker_id: opts[:worker_id],
-    redis_host: opts[:redis_host],
-    files_or_dirs_to_run: ARGV[0] || "spec",
+    build_id: opts[:build],
+    worker_id: opts[:worker],
+    redis_host: opts[:redis_host]
   )
+  worker.files_or_dirs_to_run = ARGV[0] if ARGV[0]
   worker.populate_timings = opts[:timings]
-  worker.file_split_threshold = opts[:file_split_threshold] || 999999
+  worker.file_split_threshold = opts[:file_split_threshold]
+  worker.max_requeues = opts[:max_requeues]
   worker.work
 end
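
The `||=` assignments in the script above give command-line flags precedence over environment variables, which in turn take precedence over built-in defaults. A minimal sketch of that precedence follows; `resolve_options` is a hypothetical helper that takes the environment as a plain hash so it can run without the gem, and it covers only a subset of the options.

```ruby
DEFAULT_REDIS_HOST = "127.0.0.1"
DEFAULT_MAX_REQUEUES = 3

# Boolean-style variables count as set only for "1" or "true".
def env_set?(env, var)
  ["1", "true"].include?(env[var])
end

# Hypothetical helper: CLI value first, then environment, then default.
def resolve_options(cli_opts, env)
  opts = cli_opts.dup
  opts[:build] ||= env["RSPECQ_BUILD"]
  opts[:redis_host] ||= env["RSPECQ_REDIS"] || DEFAULT_REDIS_HOST
  opts[:report] ||= env_set?(env, "RSPECQ_REPORT")
  opts[:max_requeues] ||= Integer(env["RSPECQ_MAX_REQUEUES"] || DEFAULT_MAX_REQUEUES)
  opts
end

env = { "RSPECQ_BUILD" => "999", "RSPECQ_MAX_REQUEUES" => "5" }
opts = resolve_options({ build: "123" }, env)
puts opts[:build]        # => 123 (the flag wins over RSPECQ_BUILD)
puts opts[:max_requeues] # => 5 (taken from the environment)
puts opts[:redis_host]   # => 127.0.0.1 (default)
```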

data/lib/rspecq.rb CHANGED

@@ -1,11 +1,10 @@
 require "rspec/core"
+require "sentry-raven"
 module RSpecQ
-  MAX_REQUEUES = 3
-  # If a worker haven't executed an RSpec example for more than this time
-  # (in seconds), it is considered dead and its reserved work will be put back
-  # to the queue, to be picked up by another worker.
+  # If a worker hasn't executed an example for more than WORKER_LIVENESS_SEC
+  # seconds, it is considered dead and its reserved work will be put back
+  # to the queue to be picked up by another worker.
   WORKER_LIVENESS_SEC = 60.0
 end
@@ -16,6 +15,5 @@ require_relative "rspecq/formatters/worker_heartbeat_recorder"
 require_relative "rspecq/queue"
 require_relative "rspecq/reporter"
-require_relative "rspecq/worker"
 require_relative "rspecq/version"
+require_relative "rspecq/worker"
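
To illustrate how `WORKER_LIVENESS_SEC` is meant to be used, here is a rough sketch of a liveness check. The gem keeps heartbeats in Redis; this example assumes a plain in-memory hash instead, and `dead_workers` is a hypothetical helper, not part of RSpecQ's API.

```ruby
WORKER_LIVENESS_SEC = 60.0
# Workers refresh their heartbeat several times per liveness window
# (worker.rb in this diff uses WORKER_LIVENESS_SEC / 6).
HEARTBEAT_FREQUENCY = WORKER_LIVENESS_SEC / 6

# Hypothetical helper: given worker => last heartbeat timestamp (seconds),
# return the workers whose last heartbeat is older than the liveness window.
def dead_workers(heartbeats, now)
  heartbeats.select { |_worker, last_seen| now - last_seen > WORKER_LIVENESS_SEC }.keys
end

heartbeats = { "foo1" => 100.0, "foo2" => 30.0 }
p dead_workers(heartbeats, 105.0) # => ["foo2"] (last seen 75 seconds ago)
```

A job reserved by a worker flagged this way would then be put back on the queue for a healthy worker to pick up.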

data/lib/rspecq/formatters/failure_recorder.rb CHANGED

@@ -1,11 +1,12 @@
 module RSpecQ
   module Formatters
     class FailureRecorder
-      def initialize(queue, job)
+      def initialize(queue, job, max_requeues)
         @queue = queue
         @job = job
         @colorizer = RSpec::Core::Formatters::ConsoleCodes
         @non_example_error_recorded = false
+        @max_requeues = max_requeues
       end
       # Here we're notified about errors occurring outside of examples.
@@ -24,7 +25,7 @@ module RSpecQ
       def example_failed(notification)
         example = notification.example
-        if @queue.requeue_job(example.id, MAX_REQUEUES)
+        if @queue.requeue_job(example.id, @max_requeues)
           # HACK: try to avoid picking the job we just requeued; we want it
           # to be picked up by a different worker
           sleep 0.5
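
The hunk above passes the configurable `@max_requeues` to `requeue_job`, whose truthy return value means the example went back on the queue. The Redis-side implementation is not part of this diff, so the following is only an in-memory sketch of the counting logic; `InMemoryRequeues` and its internals are assumptions made for illustration.

```ruby
# Hypothetical in-memory stand-in for the requeue bookkeeping.
class InMemoryRequeues
  attr_reader :queue

  def initialize
    @requeues = Hash.new(0) # job => number of times it was requeued
    @queue = []
  end

  # Returns true if the job was put back on the queue, false once it has
  # already been requeued max_requeues times (i.e. it is a legit failure).
  def requeue_job(job, max_requeues)
    return false if @requeues[job] >= max_requeues

    @requeues[job] += 1
    @queue.push(job) # queue position is an assumption of this sketch
    true
  end
end

q = InMemoryRequeues.new
job = "./spec/models/foo_spec.rb[1:2:1]"
results = 4.times.map { q.requeue_job(job, 3) }
p results # => [true, true, true, false]
```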

data/lib/rspecq/queue.rb CHANGED

@@ -57,6 +57,8 @@ module RSpecQ
     STATUS_INITIALIZING = "initializing".freeze
     STATUS_READY = "ready".freeze
+    attr_reader :redis
     def initialize(build_id, worker_id, redis_host)
       @build_id = build_id
       @worker_id = worker_id
@@ -150,13 +152,21 @@ module RSpecQ
     end
     def example_count
-      @redis.get(key_example_count) || 0
+      @redis.get(key_example_count).to_i
     end
     def processed_jobs_count
       @redis.scard(key_queue_processed)
     end
+    def processed_jobs
+      @redis.smembers(key_queue_processed)
+    end
+    def requeued_jobs
+      @redis.hgetall(key_requeues)
+    end
     def become_master
       @redis.setnx(key_queue_status, STATUS_INITIALIZING)
     end
@@ -200,10 +210,10 @@ module RSpecQ
       exhausted? && example_failures.empty? && non_example_errors.empty?
     end
-    private
-    def key(*keys)
-      [@build_id, keys].join(":")
+    # The remaining jobs to be processed. Jobs at the head of the list will
+    # be processed first.
+    def unprocessed_jobs
+      @redis.lrange(key_queue_unprocessed, 0, -1)
     end
     # redis: STRING [STATUS_INITIALIZING, STATUS_READY]
@@ -279,6 +289,12 @@ module RSpecQ
       "build_times"
     end
+    private
+    def key(*keys)
+      [@build_id, keys].join(":")
+    end
     # We don't use any Ruby `Time` methods because specs that use timecop in
     # before(:all) hooks will mess up our times.
     def current_time
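
Two details in the queue.rb hunks can be checked in isolation. First, the `key` helper relies on `Array#join` flattening nested arrays, so each key is namespaced by the build ID. Second, `example_count` now calls `to_i` on the value returned by a Redis `GET`, which is a String or `nil`, so callers always receive an Integer. The sketch below assumes a standalone `key` method taking the build ID as an argument, and the key parts are made-up names.

```ruby
# Standalone variant of the helper: the build ID is passed explicitly.
def key(build_id, *keys)
  [build_id, keys].join(":")
end

puts key("123", "queue", "status") # => 123:queue:status

# A Redis GET yields a String when the key exists and nil otherwise.
p "42".to_i # => 42
p nil.to_i  # => 0

# The previous `value || 0` form returned a String when the key was set.
p("42" || 0) # => "42"
```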

data/lib/rspecq/reporter.rb CHANGED

@@ -1,10 +1,9 @@
 module RSpecQ
   class Reporter
-    def initialize(build_id:, worker_id:, timeout:, redis_host:)
+    def initialize(build_id:, timeout:, redis_host:)
       @build_id = build_id
-      @worker_id = worker_id
       @timeout = timeout
-      @queue = Queue.new(build_id, worker_id, redis_host)
+      @queue = Queue.new(build_id, "reporter", redis_host)
       # We want feedback to be immediately printed to CI users, so
       # we disable buffering.
@@ -12,7 +11,7 @@ module RSpecQ
     end
     def report
-      t = measure_duration { @queue.wait_until_published }
+      @queue.wait_until_published
       finished = false

data/lib/rspecq/version.rb CHANGED

@@ -1,3 +1,3 @@
 module RSpecQ
-  VERSION = "0.0.1.pre2".freeze
+  VERSION = "0.1.0".freeze
 end

data/lib/rspecq/worker.rb CHANGED

@@ -1,10 +1,16 @@
 require "json"
+require "pathname"
 require "pp"
 module RSpecQ
   class Worker
     HEARTBEAT_FREQUENCY = WORKER_LIVENESS_SEC / 6
+    # The root path or individual spec files to execute.
+    #
+    # Defaults to "spec" (just like in RSpec)
+    attr_accessor :files_or_dirs_to_run
     # If true, job timings will be populated in the global Redis timings key
     #
     # Defaults to false
@@ -12,15 +18,27 @@ module RSpecQ
     # If set, spec files that are known to take more than this value to finish,
     # will be split and scheduled on a per-example basis.
+    #
+    # Defaults to 999999
     attr_accessor :file_split_threshold
-    def initialize(build_id:, worker_id:, redis_host:, files_or_dirs_to_run:)
+    # Retry failed examples up to N times (with N being the supplied value)
+    # before considering them legit failures
+    #
+    # Defaults to 3
+    attr_accessor :max_requeues
+    attr_reader :queue
+    def initialize(build_id:, worker_id:, redis_host:)
       @build_id = build_id
       @worker_id = worker_id
       @queue = Queue.new(build_id, worker_id, redis_host)
-      @files_or_dirs_to_run = files_or_dirs_to_run
+      @files_or_dirs_to_run = "spec"
       @populate_timings = false
       @file_split_threshold = 999999
+      @heartbeat_updated_at = nil
+      @max_requeues = 3
       RSpec::Core::Formatters.register(Formatters::JobTimingRecorder, :dump_summary)
       RSpec::Core::Formatters.register(Formatters::ExampleCountRecorder, :dump_summary)
@@ -31,23 +49,23 @@ module RSpecQ
     def work
       puts "Working for build #{@build_id} (worker=#{@worker_id})"
-      try_publish_queue!(@queue)
-      @queue.wait_until_published
+      try_publish_queue!(queue)
+      queue.wait_until_published
       loop do
         # we have to bootstrap this so that it can be used in the first call
         # to `requeue_lost_job` inside the work loop
         update_heartbeat
-        lost = @queue.requeue_lost_job
+        lost = queue.requeue_lost_job
         puts "Requeued lost job: #{lost}" if lost
         # TODO: can we make `reserve_job` also act like exhausted? and get
         # rid of `exhausted?` (i.e. return false if no jobs remain)
-        job = @queue.reserve_job
+        job = queue.reserve_job
         # build is finished
-        return if job.nil? && @queue.exhausted?
+        return if job.nil? && queue.exhausted?
         next if job.nil?
@@ -60,112 +78,125 @@ module RSpecQ
         RSpec.configuration.detail_color = :magenta
         RSpec.configuration.seed = srand && srand % 0xFFFF
         RSpec.configuration.backtrace_formatter.filter_gem('rspecq')
-        RSpec.configuration.add_formatter(Formatters::FailureRecorder.new(@queue, job))
-        RSpec.configuration.add_formatter(Formatters::ExampleCountRecorder.new(@queue))
+        RSpec.configuration.add_formatter(Formatters::FailureRecorder.new(queue, job, max_requeues))
+        RSpec.configuration.add_formatter(Formatters::ExampleCountRecorder.new(queue))
         RSpec.configuration.add_formatter(Formatters::WorkerHeartbeatRecorder.new(self))
         if populate_timings
-          RSpec.configuration.add_formatter(Formatters::JobTimingRecorder.new(@queue, job))
+          RSpec.configuration.add_formatter(Formatters::JobTimingRecorder.new(queue, job))
         end
         opts = RSpec::Core::ConfigurationOptions.new(["--format", "progress", job])
         _result = RSpec::Core::Runner.new(opts).run($stderr, $stdout)
-        @queue.acknowledge_job(job)
+        queue.acknowledge_job(job)
       end
     end
     # Update the worker heartbeat if necessary
     def update_heartbeat
       if @heartbeat_updated_at.nil? || elapsed(@heartbeat_updated_at) >= HEARTBEAT_FREQUENCY
-        @queue.record_worker_heartbeat
+        queue.record_worker_heartbeat
         @heartbeat_updated_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
       end
     end
-    private
-    def reset_rspec_state!
-      RSpec.clear_examples
-      # TODO: remove after https://github.com/rspec/rspec-core/pull/2723
-      RSpec.world.instance_variable_set(:@example_group_counts_by_spec_file, Hash.new(0))
-      # RSpec.clear_examples does not reset those, which causes issues when
-      # a non-example error occurs (subsequent jobs are not executed)
-      # TODO: upstream
-      RSpec.world.non_example_failure = false
-      # we don't want an error that occured outside of the examples (which
-      # would set this to `true`) to stop the worker
-      RSpec.world.wants_to_quit = false
-    end
     def try_publish_queue!(queue)
       return if !queue.become_master
-      RSpec.configuration.files_or_directories_to_run = @files_or_dirs_to_run
+      RSpec.configuration.files_or_directories_to_run = files_or_dirs_to_run
       files_to_run = RSpec.configuration.files_to_run.map { |j| relative_path(j) }
       timings = queue.timings
       if timings.empty?
-        # TODO: should be a warning reported somewhere (Sentry?)
         q_size = queue.publish(files_to_run.shuffle)
-        puts "WARNING: No timings found! Published queue in " \
-             "random order (size=#{q_size})"
+        log_event(
+          "No timings found! Published queue in random order (size=#{q_size})",
+          "warning"
+        )
         return
       end
-      slow_files = timings.take_while do |_job, duration|
-        duration >= file_split_threshold
-      end.map(&:first) & files_to_run
+      # prepare jobs to run
+      jobs = []
+      slow_files = []
-      if slow_files.any?
-        puts "Slow files (threshold=#{file_split_threshold}): #{slow_files}"
+      if file_split_threshold
+        slow_files = timings.take_while do |_job, duration|
+          duration >= file_split_threshold
+        end.map(&:first) & files_to_run
       end
-      # prepare jobs to run
-      jobs = []
-      jobs.concat(files_to_run - slow_files)
-      jobs.concat(files_to_example_ids(slow_files)) if slow_files.any?
+      if slow_files.any?
+        jobs.concat(files_to_run - slow_files)
+        jobs.concat(files_to_example_ids(slow_files))
+      else
+        jobs.concat(files_to_run)
+      end
-      # assign timings to all of them
       default_timing = timings.values[timings.values.size/2]
+      # assign timings (based on previous runs) to all jobs
       jobs = jobs.each_with_object({}) do |j, h|
-        # heuristic: put untimed jobs in the middle of the queue
-        puts "New/untimed job: #{j}" if timings[j].nil?
+        puts "Untimed job: #{j}" if timings[j].nil?
+        # HEURISTIC: put jobs without previous timings (e.g. a newly added
+        # spec file) in the middle of the queue
         h[j] = timings[j] || default_timing
       end
-      # finally, sort them based on their timing (slowest first)
+      # sort jobs based on their timings (slowest to be processed first)
       jobs = jobs.sort_by { |_j, t| -t }.map(&:first)
       puts "Published queue (size=#{queue.publish(jobs)})"
     end
+    private
+    def reset_rspec_state!
+      RSpec.clear_examples
+      # see https://github.com/rspec/rspec-core/pull/2723
+      if Gem::Version.new(RSpec::Core::Version::STRING) <= Gem::Version.new("3.9.1")
+        RSpec.world.instance_variable_set(
+          :@example_group_counts_by_spec_file, Hash.new(0))
+      end
+      # RSpec.clear_examples does not reset those, which causes issues when
+      # a non-example error occurs (subsequent jobs are not executed)
+      # TODO: upstream
+      RSpec.world.non_example_failure = false
+      # we don't want an error that occurred outside of the examples (which
+      # would set this to `true`) to stop the worker
+      RSpec.world.wants_to_quit = false
+    end
     # NOTE: RSpec has to load the files before we can split them as individual
     # examples. In case a file to be split fails to be loaded
-    # (e.g. contains a syntax error), we return the slow files unchanged,
-    # thereby falling back to scheduling them normally.
-    #
-    # Their errors will be reported in the normal flow, when they're picked up
-    # as jobs by a worker.
+    # (e.g. contains a syntax error), we return the files unchanged, thereby
+    # falling back to scheduling them as whole files. Their errors will be
+    # reported in the normal flow when they're eventually picked up by a worker.
     def files_to_example_ids(files)
-      # TODO: do this programatically
-      cmd = "DISABLE_SPRING=1 bin/rspec --dry-run --format json #{files.join(' ')}"
+      cmd = "DISABLE_SPRING=1 bundle exec rspec --dry-run --format json #{files.join(' ')} 2>&1"
       out = `#{cmd}`
+      cmd_result = $?
-      if !$?.success?
-        # TODO: emit warning to Sentry
-        puts "WARNING: Error splitting slow files; falling back to regular scheduling:"
+      if !cmd_result.success?
+        rspec_output = begin
+                         JSON.parse(out)
+                       rescue JSON::ParserError
+                         out
+                       end
-        begin
-          pp JSON.parse(out)
-        rescue JSON::ParserError
-          puts out
-        end
-        puts
+        log_event(
+          "Failed to split slow files, falling back to regular scheduling",
+          "error",
+          rspec_output: rspec_output,
+          cmd_result: cmd_result.inspect,
+        )
+        pp rspec_output
         return files
       end
@@ -181,5 +212,23 @@ module RSpecQ
     def elapsed(since)
       Process.clock_gettime(Process::CLOCK_MONOTONIC) - since
     end
+    # Prints msg to standard output and emits an event to Sentry, if the
+    # SENTRY_DSN environment variable is set.
+    def log_event(msg, level, additional={})
+      puts msg
+      Raven.capture_message(msg, level: level, extra: {
+        build: @build_id,
+        worker: @worker_id,
+        queue: queue.inspect,
+        files_or_dirs_to_run: files_or_dirs_to_run,
+        populate_timings: populate_timings,
+        file_split_threshold: file_split_threshold,
+        heartbeat_updated_at: @heartbeat_updated_at,
+        object: self.inspect,
+        pid: Process.pid,
+      }.merge(additional))
+    end
   end
 end
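
The queue-ordering heuristic in the worker.rb hunks above can be illustrated in isolation. What follows is a minimal sketch, not rspecq's actual implementation: `order_jobs` is a hypothetical helper, and it assumes (as the diff's `timings.values[timings.values.size/2]` suggests) that `timings` is a hash ordered slowest first, so its middle value can stand in for jobs that have no recorded timing.

```ruby
# Hypothetical helper (not part of rspecq): orders jobs slowest first and
# assigns the middle known timing to jobs without a previous timing.
def order_jobs(jobs, timings)
  # Assumption: timings is ordered slowest first, so the middle element
  # approximates the median duration.
  values = timings.values
  default_timing = values[values.size / 2]

  # Attach a timing to every job, falling back to the default.
  with_timings = jobs.each_with_object({}) do |job, h|
    h[job] = timings[job] || default_timing
  end

  # Slowest jobs come first so they start as early as possible.
  with_timings.sort_by { |_job, t| -t }.map(&:first)
end

# Illustrative data only; these file names and durations are made up.
timings = { "a_spec.rb" => 30.0, "b_spec.rb" => 10.0, "c_spec.rb" => 2.0 }
jobs = ["c_spec.rb", "new_spec.rb", "a_spec.rb", "b_spec.rb"]
puts order_jobs(jobs, timings).inspect
```

Here `new_spec.rb` has no timing, so it receives the middle value and lands between the slowest and the fastest job. Its position relative to `b_spec.rb` is not guaranteed, since both end up with the same timing and `sort_by` is not a stable sort.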

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: rspecq
 version: !ruby/object:Gem::Version
-  version: 0.0.1.pre2
+  version: 0.1.0
 platform: ruby
 authors:
 - Agis Anastasopoulos
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2020-06-26 00:00:00.000000000 Z
+date: 2020-08-27 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec-core
@@ -38,22 +38,64 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  name: sentry-raven
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: pry-byebug
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 - !ruby/object:Gem::Dependency
   name: minitest
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - "~>"
+    - - ">="
       - !ruby/object:Gem::Version
-        version: '5.14'
+        version: '0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - "~>"
+    - - ">="
       - !ruby/object:Gem::Version
-        version: '5.14'
+        version: '0'
 - !ruby/object:Gem::Dependency
-  name: rake
+  name: rspec
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
@@ -76,6 +118,7 @@ files:
 - CHANGELOG.md
 - LICENSE
 - README.md
+- Rakefile
 - bin/rspecq
 - lib/rspecq.rb
 - lib/rspecq/formatters/example_count_recorder.rb
@@ -101,12 +144,13 @@ required_ruby_version: !ruby/object:Gem::Requirement
       version: '0'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
-  - - ">"
+  - - ">="
     - !ruby/object:Gem::Version
-      version: 1.3.1
+      version: '0'
 requirements: []
-rubygems_version: 3.1.2
+rubygems_version: 3.1.4
 signing_key:
 specification_version: 4
-summary: Distribute an RSpec suite among many workers
+summary: Optimally distribute and run RSpec suites among parallel workers; for faster
+  CI builds
 test_files: []
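
The metadata diff above loosens the minitest development dependency from `~> 5.14` to `>= 0`. As a rough illustration of what that change permits, the sketch below evaluates both constraints with RubyGems' `Gem::Requirement`. The `allowed?` helper and the version strings are hypothetical and chosen only for demonstration.

```ruby
require "rubygems"

# Hypothetical helper: does the given version string satisfy the
# given requirement string?
def allowed?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

# Illustrative versions only; they are not taken from the diff.
["5.13.0", "5.14.0", "5.20.1", "6.0.0"].each do |v|
  puts "#{v}: '~> 5.14' => #{allowed?('~> 5.14', v)}, " \
       "'>= 0' => #{allowed?('>= 0', v)}"
end
```

The pessimistic constraint `~> 5.14` accepts versions from 5.14 up to, but not including, 6.0, whereas `>= 0` accepts any release.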