RubyGems - rspecq - Versions diffs - 0.0.1.pre2 → 0.3.0 - Mend

rspecq 0.0.1.pre2 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13)

checksums.yaml +4 -4
data/CHANGELOG.md +54 -0
data/README.md +170 -63
data/Rakefile +9 -0
data/bin/rspecq +99 -28
data/lib/rspecq.rb +5 -7
data/lib/rspecq/formatters/README.md +4 -0
data/lib/rspecq/formatters/failure_recorder.rb +3 -2
data/lib/rspecq/queue.rb +47 -6
data/lib/rspecq/reporter.rb +57 -6
data/lib/rspecq/version.rb +1 -1
data/lib/rspecq/worker.rb +128 -67
metadata +56 -11

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 4fcc5311329946efb2a7801087f4cdb5f0be8becc01bd1bb8f367b6d130a02ea
-  data.tar.gz: d6f64c6d0c1dae8a53af8bf2d7724e2fb988b03fa795c59d0f5ecd18a92b072a
+  metadata.gz: 89dbfa98d1eaceb06c39d41ab85e7fa6923d0c87e9a15b9cbfaf7399ff2aaff3
+  data.tar.gz: b7cd028440e6eb03401dc623c7ee0fc0fe74f6ffa12a25ecc23d0cf54e6acd1e
 SHA512:
-  metadata.gz: 7611cf0944ea7751eaf93a7aae5686f6c03563b01e71978d9e1d61f30f14f89b12de0ce9ac590f9351eff22a4d4811f9e2f6c241232754ede3162142225f2c27
-  data.tar.gz: bdbd6da559607026b8e6fead442d4b15b1bb73e957a63b022c4459a83aba0c2a8e297204d9c29ae3bbc700d39ea2434c7b99e55dfb3752a56af5376d0511fea0
+  metadata.gz: a43f0630e8a02a001132f45c9f68cacf7edae8e90487112e640eb611e7d1345f68ad0ab163ae01c91cb38ebc879e98cda9b54044c0ee676293c7aa3bf7c17942
+  data.tar.gz: bf98027dc02ac56d02cc258700f5efa766c40d4d69c55c529d311083965d1fe4f76423b05fddb35a37496bb6eb3c11a8460a8e76f94897c4bb38a744b2fb40df

data/CHANGELOG.md CHANGED

@@ -1,4 +1,58 @@
 # Changelog
+Breaking changes are prefixed with a "[BREAKING]" label.
 ## master (unreleased)
+## 0.3.0 (2020-10-05)
+### Added
+- Providing a Redis URL is now possible using the `--redis-url` option
+  [[#40](https://github.com/skroutz/rspecq/pull/40)]
+### Changed
+- [DEPRECATION] The `--redis` option is now deprecated. Use `--redis-host`
+  instead [[#40](https://github.com/skroutz/rspecq/pull/40)]
+## 0.2.2 (2020-09-10)
+### Fixed
+- Worker would fail if application code was writing to stderr
+ [[#35](https://github.com/skroutz/rspecq/pull/35)]
+## 0.2.1 (2020-09-09)
+### Changed
+- Sentry Integration: Changed the way events for flaky jobs are emitted to a
+  per-flaky-job fashion. This ultimately improves grouping and filtering of the
+  flaky events in Sentry [[#33](https://github.com/skroutz/rspecq/pull/33)]
+## 0.2.0 (2020-08-31)
+This is a feature release with no breaking changes.
+### Added
+- Flaky jobs are now printed by the reporter in the final build output and also
+  emitted to Sentry (if the integration is enabled) [[#26](https://github.com/skroutz/rspecq/pull/26)]
+## 0.1.0 (2020-08-27)
+### Added
+- Sentry integration for various RSpecQ-level events [[#16](https://github.com/skroutz/rspecq/pull/16)]
+- CLI: Flags can now also be set via environment variables [[c519230](https://github.com/skroutz/rspecq/commit/c5192303e229f361e8ac86ae449b4ea84d42e022)]
+- CLI: Added shorthand versions for some flags [[df9faa8](https://github.com/skroutz/rspecq/commit/df9faa8ec6721af8357cfee4de6a2fe7b32070fc)]
+- CLI: Added `--help` and `--version` flags [[df9faa8](https://github.com/skroutz/rspecq/commit/df9faa8ec6721af8357cfee4de6a2fe7b32070fc)]
+- CLI: Max number of retries for failed examples is now configurable via the `--max-requeues` option [[#14](https://github.com/skroutz/rspecq/pull/14)]
+### Changed
+- [BREAKING] CLI: Renamed `--timings` to `--update-timings` [[c519230](https://github.com/skroutz/rspecq/commit/c5192303e229f361e8ac86ae449b4ea84d42e022)]
+- [BREAKING] CLI: Renamed `--build-id` to `--build` and `--worker-id` to `--worker` [[df9faa8](https://github.com/skroutz/rspecq/commit/df9faa8ec6721af8357cfee4de6a2fe7b32070fc)]
+- CLI: `--worker` is not required when `--reporter` is used [[4323a75](https://github.com/skroutz/rspecq/commit/4323a75ca357274069d02ba9fb51cdebb04e0be4)]
+- CLI: Improved help output [[df9faa8](https://github.com/skroutz/rspecq/commit/df9faa8ec6721af8357cfee4de6a2fe7b32070fc)]

data/README.md CHANGED

@@ -1,102 +1,209 @@
-# RSpecQ
+RSpec Queue
+=========================================================================
+[![Build Status](https://travis-ci.com/skroutz/rspecq.svg?branch=master)](https://travis-ci.com/github/skroutz/rspecq)
+[![Gem Version](https://badge.fury.io/rb/rspecq.svg)](https://badge.fury.io/rb/rspecq)
-RSpecQ (`rspecq`) distributes and executes an RSpec suite over many workers,
-using a centralized queue backed by Redis.
+RSpec Queue (RSpecQ) distributes and executes RSpec suites among parallel
+workers. It uses a centralized queue that workers connect to and pop off
+tests from. It ensures optimal scheduling of tests based on their run time,
+facilitating faster CI builds.
-RSpecQ is heavily inspired by [test-queue](https://github.com/tmm1/test-queue)
+RSpecQ is inspired by [test-queue](https://github.com/tmm1/test-queue)
 and [ci-queue](https://github.com/Shopify/ci-queue).
-## Why don't you just use ci-queue?
+## Features
+- Run an RSpec suite among many workers
+  (potentially located in different hosts) in a distributed fashion,
+  facilitating faster CI builds.
+- Consolidated, real-time reporting of a build's progress.
+- Optimal scheduling of test execution by using timings statistics from previous runs and
+  automatically scheduling slow spec files as individual examples. See
+  [*Spec file splitting*](#spec-file-splitting).
+- Automatic retry of test failures before being considered legit, in order to
+  rule out flakiness. Additionally, flaky tests are detected and provided to
+  the user. See [*Requeues*](#requeues).
+- Handles intermittent worker failures (e.g. network hiccups, faulty hardware etc.)
+  by detecting non-responsive workers and requeueing their jobs. See [*Worker failures*](#worker-failures).
+- Sentry integration for monitoring build-level events. See [*Sentry integration*](#sentry-integration).
+- [PLANNED] StatsD integration for various build-level metrics and insights.
+  See [#2](https://github.com/skroutz/rspecq/issues/2).
-While evaluating ci-queue for our RSpec suite, we observed slow boot times
-in the workers (up to 3 minutes), increased memory consumption and too much
-disk I/O on boot. This is due to the fact that a worker in ci-queue has to
-load every spec file on boot. This can be problematic for applications with
-a large number of spec files.
-RSpecQ works with spec files as its unit of work (as opposed to ci-queue which
-works with individual examples). This means that an RSpecQ worker does not
-have to load all spec files at once and so it doesn't have the aforementioned
-problems. It also allows suites to keep using `before(:all)` hooks
-(which ci-queue explicitly rejects). (Note: RSpecQ also schedules individual
-examples, but only when this is deemed necessary, see section
-"Spec file splitting").
+## Usage
-We also observed faster build times by scheduling spec files instead of
-individual examples, due to way less Redis operations.
+A worker needs to be given a name and the build it will participate in.
+Assuming there's a Redis instance listening at `localhost`, starting a worker
+is as simple as:
-The downside of this design is that it's more complicated, since the scheduling
-of spec files happens based on timings calculated from previous runs. This
-means that RSpecQ maintains a key with the timing of each job and updates it
-on every run (if the `--timings` option was used). Also, RSpecQ has a "slow
-file threshold" which, currently has to be set manually (but this can be
-improved).
+```shell
+$ rspecq --build=123 --worker=foo1 spec/
+```
-*Update*: ci-queue deprecated support for RSpec, so there's that.
+To start more workers for the same build, use distinct worker IDs but the same
+build ID:
-## Usage
+```shell
+$ rspecq --build=123 --worker=foo2
+```
-Each worker needs to know the build it will participate in, its name and where
-Redis is located. To start a worker:
+To view the progress of the build use `--report`:
 ```shell
-$ rspecq --build-id=foo --worker-id=worker1 --redis=redis://localhost
+$ rspecq --build=123 --report
 ```
-To view the progress of the build print use `--report`:
+For detailed info use `--help`:
-```shell
-$ rspecq --build-id=foo --worker-id=reporter --redis=redis://localhost --report
 ```
+NAME:
+    rspecq - Optimally distribute and run RSpec suites among parallel workers
+USAGE:
+    rspecq [<options>] [spec files or directories]
+OPTIONS:
+    -b, --build ID                   A unique identifier for the build. Should be common among workers participating in the same build.
+    -w, --worker ID                  An identifier for the worker. Workers participating in the same build should have distinct IDs.
+    -r, --redis HOST                 --redis is deprecated. Use --redis-host or --redis-url instead. Redis host to connect to (default: 127.0.0.1).
+        --redis-host HOST            Redis host to connect to (default: 127.0.0.1).
+        --redis-url URL              Redis URL to connect to (e.g.: redis://127.0.0.1:6379/0).
+        --update-timings             Update the global job timings key with the timings of this build. Note: This key is used as the basis for job scheduling.
+        --file-split-threshold N     Split spec files slower than N seconds and schedule them as individual examples.
+        --report                     Enable reporter mode: do not pull tests off the queue; instead print build progress and exit when it's finished.
+                                     Exits with a non-zero status code if there were any failures.
+        --report-timeout N           Fail if build is not finished after N seconds. Only applicable if --report is enabled (default: 3600).
+        --max-requeues N             Retry failed examples up to N times before considering them legit failures (default: 3).
+    -h, --help                       Show this message.
+    -v, --version                    Print the version and exit.
+```
+### Sentry integration
+RSpecQ can optionally emit build events to a
+[Sentry](https://sentry.io) project by setting the
+[`SENTRY_DSN`](https://github.com/getsentry/raven-ruby#raven-only-runs-when-sentry_dsn-is-set)
+environment variable.
-For detailed info use `--help`.
+This is convenient for monitoring important warnings/errors that may impact
+build times, such as the fact that no previous timings were found and
+therefore job scheduling was effectively random for a particular build.
 ## How it works
-The basic idea is identical to ci-queue so please refer to its README
+The core design is almost identical to ci-queue so please refer to its
+[README](https://github.com/Shopify/ci-queue/blob/master/README.md) instead.
 ### Terminology
-- Job: the smallest unit of work, which is usually a spec file
+- **Job**: the smallest unit of work, which is usually a spec file
   (e.g. `./spec/models/foo_spec.rb`) but can also be an individual example
-  (e.g. `./spec/models/foo_spec.rb[1:2:1]`) if the file is too slow
-- Queue: a collection of Redis-backed structures that hold all the necessary
-  information for RSpecQ to function. This includes timing statistics, jobs to
-  be executed, the failure reports, requeueing statistics and more.
-- Worker: a process that, given a build id, pops up jobs of that build and
-  executes them using RSpec
-- Reporter: a process that, given a build id, waits for the build to finish
-  and prints the summary report (examples executed, build result, failures etc.)
+  (e.g. `./spec/models/foo_spec.rb[1:2:1]`) if the file is too slow.
+- **Queue**: a collection of Redis-backed structures that hold all the necessary
+  information for an RSpecQ build to run. This includes timing statistics,
+  jobs to be executed, the failure reports and more.
+- **Build**: a particular test suite run. Each build has its own **Queue**.
+- **Worker**: an `rspecq` process that, given a build id, consumes jobs off the
+  build's queue and executes them using RSpec
+- **Reporter**: an `rspecq` process that, given a build id, waits for the build's
+  queue to be drained and prints the build summary report
 ### Spec file splitting
-Very slow files may put a limit to how fast the suite can execute. For example,
-a worker may spend 10 minutes running a single slow file, while all the other
-workers finish after 8 minutes. To overcome this issue, rspecq splits
-files that their execution time is above a certain threshold
-(set with the `--file-split-threshold` option) and will instead schedule them as
-individual examples.
+Particularly slow spec files may set a limit to how fast a build can be.
+For example, a single file may need 10 minutes to run while all other
+files finish after 8 minutes. This would cause all but one worker to be
+sitting idle for 2 minutes.
-In the future, we'd like for the slow threshold to be calculated and set
-dynamically.
+To overcome this issue, RSpecQ can split files whose execution time is
+above a certain threshold (set with the `--file-split-threshold` option)
+and instead schedule them as individual examples.
+Note: In the future, we'd like for the slow threshold to be calculated and set
+dynamically (see #3).
 ### Requeues
-As a mitigation measure for flaky tests, if an example fails it will be put
-back to the queue to be picked up by
-another worker. This will be repeated up to a certain number of times before,
-after which the example will be considered a legit failure and will be printed
-in the final report (`--report`).
+As a mitigation technique against flaky tests, if an example fails it will be
+put back to the queue to be picked up by another worker. This will be repeated
+up to a certain number of times (set with the `--max-requeues` option), after
+which the example will be considered a legit failure and printed as such in the
+final report.
+Flaky tests are also detected and printed as such in the final report. They are
+also emitted to Sentry (see [Sentry integration](#sentry-integration)).
 ### Worker failures
-Workers emit a timestamp after each example, as a heartbeat, to denote
-that they're fine and performing jobs. If a worker hasn't reported for
-a given amount of time (see `WORKER_LIVENESS_SEC`) it is considered dead
-and the job it reserved will be requeued, so that it is picked up by another worker.
+It's not uncommon for CI processes to encounter unrecoverable failures for
+various reasons: faulty hardware, network hiccups, segmentation faults in
+MRI etc.
+For resiliency against such issues, workers emit a heartbeat after each
+example they execute, to signal
+that they're healthy and performing jobs as expected. If a worker hasn't
+emitted a heartbeat for a given amount of time (set by `WORKER_LIVENESS_SEC`)
+it is considered dead and its reserved job will be put back to the queue, to
+be picked up by another healthy worker.
+## Rationale
+### Why didn't you use ci-queue?
+**Update**: ci-queue [deprecated support for RSpec](https://github.com/Shopify/ci-queue/pull/149).
+While evaluating ci-queue we experienced slow worker boot
+times (up to 3 minutes in some cases) combined with disk IO saturation and
+increased memory consumption. This is due to the fact that a worker in
+ci-queue has to load every spec file on boot. In applications with a large
+number of spec files this may result in a significant performance hit and
+in case of cloud environments, increased costs.
+We also observed slower build times compared to our previous solution which
+scheduled whole spec files (as opposed to individual examples), due to
+big differences in runtimes of individual examples, something common in big
+RSpec suites.
+We decided for RSpecQ to use whole spec files as its main unit of work (as
+opposed to ci-queue which uses individual examples). This means that an RSpecQ
+worker only loads the files needed and ends up with a subset of all the suite's
+files.  (Note: RSpecQ also schedules individual examples, but only when this is
+deemed necessary, see [Spec file splitting](#spec-file-splitting)).
+This kept boot and test run times considerably fast. As a side benefit, this
+allows suites to keep using `before(:all)` hooks (which ci-queue explicitly
+rejects).
+The downside of this design is that it's more complicated, since the scheduling
+of spec files happens based on timings calculated from previous runs. This
+means that RSpecQ maintains a key with the timing of each job and updates it
+on every run (if the `--update-timings` option was used). Also, RSpecQ has a
+"slow file threshold" which currently has to be set manually (but this can be
+improved in the future).
+## Development
+Install the required dependencies:
+```
+$ bundle install
+```
+Then you can execute the tests after spinning up a Redis instance at
+`127.0.0.1:6379`:
+```
+$ bundle exec rake
+```
+To enable verbose output in the tests:
+```
+$ RSPECQ_DEBUG=1 bundle exec rake
+```
-This protects us against unrecoverable worker failures (e.g. segfault).
 ## License
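
The spec file splitting described in the README above can be sketched as follows. This is a minimal illustration and not RSpecQ's actual scheduler: `build_jobs` and the `examples_in` callable are hypothetical names, and the timings are plain Ruby values rather than data read from Redis.

```ruby
# Hypothetical sketch of spec file splitting (not RSpecQ's real code).
#
# timings     - Hash of spec file path => duration in seconds from a previous run
# threshold   - files slower than this many seconds are split
# examples_in - assumed callable returning the example IDs contained in a file
def build_jobs(timings, threshold, examples_in)
  timings.flat_map do |file, duration|
    if duration > threshold
      examples_in.call(file) # schedule each example as its own job
    else
      [file]                 # schedule the whole file as a single job
    end
  end
end

timings = {
  "./spec/models/foo_spec.rb" => 600.0,
  "./spec/models/bar_spec.rb" => 12.0
}
examples_in = ->(file) { ["#{file}[1:1]", "#{file}[1:2]"] }

puts build_jobs(timings, 300, examples_in)
```

With a threshold of 300 seconds, only the slow file is expanded into example-level jobs (using the `file[1:2]` ID style shown in the Terminology section), while the fast file stays a single job.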

data/Rakefile ADDED

@@ -0,0 +1,9 @@
+require "rake/testtask"
+Rake::TestTask.new do |t|
+  t.libs << "test"
+  t.test_files = FileList['test/test_*.rb']
+  t.verbose = true
+end
+task default: :test

data/bin/rspecq CHANGED

@@ -1,67 +1,138 @@
 #!/usr/bin/env ruby
-require "optionparser"
+require "optparse"
 require "rspecq"
+DEFAULT_REDIS_HOST = "127.0.0.1"
+DEFAULT_REPORT_TIMEOUT = 3600 # 1 hour
+DEFAULT_MAX_REQUEUES = 3
+def env_set?(var)
+  ["1", "true"].include?(ENV[var])
+end
 opts = {}
 OptionParser.new do |o|
-  o.banner = "Usage: #{$PROGRAM_NAME} [opts] [files_or_directories_to_run]"
+  name = File.basename($PROGRAM_NAME)
+  o.banner = <<~BANNER
+    NAME:
+        #{name} - Optimally distribute and run RSpec suites among parallel workers
+    USAGE:
+        #{name} [<options>] [spec files or directories]
+  BANNER
+  o.separator ""
+  o.separator "OPTIONS:"
+  o.on("-b", "--build ID", "A unique identifier for the build. Should be " \
+       "common among workers participating in the same build.") do |v|
+    opts[:build] = v
+  end
+  o.on("-w", "--worker ID", "An identifier for the worker. Workers " \
+       "participating in the same build should have distinct IDs.") do |v|
+    opts[:worker] = v
+  end
-  o.on("--build-id ID", "A unique identifier denoting the build") do |v|
-    opts[:build_id] = v
+  o.on("-r", "--redis HOST", "Redis host to connect to " \
+       "(default: #{DEFAULT_REDIS_HOST}).") do |v|
+    puts "--redis is deprecated. Use --redis-host or --redis-url instead"
+    opts[:redis_host] = v
   end
-  o.on("--worker-id ID", "A unique identifier denoting the worker") do |v|
-    opts[:worker_id] = v
+  o.on("--redis-host HOST", "Redis host to connect to " \
+       "(default: #{DEFAULT_REDIS_HOST}).") do |v|
+    opts[:redis_host] = v
   end
-  o.on("--redis HOST", "Redis HOST to connect to (default: 127.0.0.1)") do |v|
-    opts[:redis_host] = v || "127.0.0.1"
+  o.on("--redis-url URL", "The URL of the Redis host to connect to " \
+       "(e.g.: redis://127.0.0.1:6379/0).") do |v|
+    opts[:redis_url] = v
   end
-  o.on("--timings", "Populate global job timings in Redis") do |v|
+  o.on("--update-timings", "Update the global job timings key with the "     \
+       "timings of this build. Note: This key is used as the basis for job " \
+       "scheduling.") do |v|
     opts[:timings] = v
   end
-  o.on("--file-split-threshold N", "Split spec files slower than N sec. and " \
-       "schedule them by example (default: 999999)") do |v|
-    opts[:file_split_threshold] = Float(v)
+  o.on("--file-split-threshold N", Integer, "Split spec files slower than N " \
+       "seconds and schedule them as individual examples.") do |v|
+    opts[:file_split_threshold] = v
   end
-  o.on("--report", "Do not execute tests but wait until queue is empty and " \
-       "print a report") do |v|
+  o.on("--report", "Enable reporter mode: do not pull tests off the queue; " \
+                   "instead print build progress and exit when it's "        \
+                   "finished.\n#{o.summary_indent*9} "                       \
+                   "Exits with a non-zero status code if there were any "    \
+                   "failures.") do |v|
     opts[:report] = v
   end
-  o.on("--report-timeout N", Integer, "Fail if queue is not empty after " \
-       "N seconds. Only applicable if --report is enabled "               \
-       "(default: 3600)") do |v|
+  o.on("--report-timeout N", Integer, "Fail if build is not finished after " \
+       "N seconds. Only applicable if --report is enabled "                  \
+       "(default: #{DEFAULT_REPORT_TIMEOUT}).") do |v|
     opts[:report_timeout] = v
   end
+  o.on("--max-requeues N", Integer, "Retry failed examples up to N times "   \
+       "before considering them legit failures "                             \
+       "(default: #{DEFAULT_MAX_REQUEUES}).") do |v|
+    opts[:max_requeues] = v
+  end
+  o.on_tail("-h", "--help", "Show this message.") do
+    puts o
+    exit
+  end
+  o.on_tail("-v", "--version", "Print the version and exit.") do
+    puts "#{name} #{RSpecQ::VERSION}"
+    exit
+  end
 end.parse!
-[:build_id, :worker_id].each do |o|
-  raise OptionParser::MissingArgument.new(o) if opts[o].nil?
+opts[:build] ||= ENV["RSPECQ_BUILD"]
+opts[:worker] ||= ENV["RSPECQ_WORKER"]
+opts[:redis_host] ||= ENV["RSPECQ_REDIS"] || DEFAULT_REDIS_HOST
+opts[:timings] ||= env_set?("RSPECQ_UPDATE_TIMINGS")
+opts[:file_split_threshold] ||= Integer(ENV["RSPECQ_FILE_SPLIT_THRESHOLD"] || 9999999)
+opts[:report] ||= env_set?("RSPECQ_REPORT")
+opts[:report_timeout] ||= Integer(ENV["RSPECQ_REPORT_TIMEOUT"] || DEFAULT_REPORT_TIMEOUT)
+opts[:max_requeues] ||= Integer(ENV["RSPECQ_MAX_REQUEUES"] || DEFAULT_MAX_REQUEUES)
+opts[:redis_url] ||= ENV["RSPECQ_REDIS_URL"]
+raise OptionParser::MissingArgument.new(:build) if opts[:build].nil?
+raise OptionParser::MissingArgument.new(:worker) if !opts[:report] && opts[:worker].nil?
+redis_opts = {}
+if opts[:redis_url]
+  redis_opts[:url] = opts[:redis_url]
+else
+  redis_opts[:host] = opts[:redis_host]
 end
 if opts[:report]
   reporter = RSpecQ::Reporter.new(
-    build_id: opts[:build_id],
-    worker_id: opts[:worker_id],
-    timeout: opts[:report_timeout] || 3600,
-    redis_host: opts[:redis_host],
+    build_id: opts[:build],
+    timeout: opts[:report_timeout],
+    redis_opts: redis_opts,
   )
   reporter.report
 else
   worker = RSpecQ::Worker.new(
-    build_id: opts[:build_id],
-    worker_id: opts[:worker_id],
-    redis_host: opts[:redis_host],
-    files_or_dirs_to_run: ARGV[0] || "spec",
+    build_id: opts[:build],
+    worker_id: opts[:worker],
+    redis_opts: redis_opts
   )
+  worker.files_or_dirs_to_run = ARGV[0] if ARGV[0]
   worker.populate_timings = opts[:timings]
-  worker.file_split_threshold = opts[:file_split_threshold] || 999999
+  worker.file_split_threshold = opts[:file_split_threshold]
+  worker.max_requeues = opts[:max_requeues]
   worker.work
 end
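
The precedence that the new `bin/rspecq` applies to the Redis settings (command-line value, then environment variable, then default, with a URL taking priority over a host) can be condensed into one function. The sketch below is illustrative only: `resolve_redis_opts` is a hypothetical helper that does not exist in the gem, and it takes the environment as a plain Hash so it can be exercised without touching `ENV`.

```ruby
# Hypothetical helper mirroring the option resolution in bin/rspecq.
#
# cli_opts     - Hash of values parsed from the command line
# env          - Hash standing in for ENV
# default_host - fallback host, matching DEFAULT_REDIS_HOST in the script
def resolve_redis_opts(cli_opts, env, default_host = "127.0.0.1")
  url  = cli_opts[:redis_url]  || env["RSPECQ_REDIS_URL"]
  host = cli_opts[:redis_host] || env["RSPECQ_REDIS"] || default_host

  # A URL, when present, wins over the host setting.
  url ? { url: url } : { host: host }
end

p resolve_redis_opts({}, {})
p resolve_redis_opts({ redis_host: "10.0.0.5" }, { "RSPECQ_REDIS" => "10.0.0.9" })
p resolve_redis_opts({}, { "RSPECQ_REDIS_URL" => "redis://127.0.0.1:6379/0" })
```

The resulting Hash has the same shape as the `redis_opts` value that the script hands to `RSpecQ::Worker.new` and `RSpecQ::Reporter.new`.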

data/lib/rspecq.rb CHANGED

@@ -1,11 +1,10 @@
 require "rspec/core"
+require "sentry-raven"
 module RSpecQ
-  MAX_REQUEUES = 3
-  # If a worker haven't executed an RSpec example for more than this time
-  # (in seconds), it is considered dead and its reserved work will be put back
-  # to the queue, to be picked up by another worker.
+  # If a worker hasn't executed an example for more than WORKER_LIVENESS_SEC
+  # seconds, it is considered dead and its reserved work will be put back
+  # to the queue to be picked up by another worker.
   WORKER_LIVENESS_SEC = 60.0
 end
@@ -16,6 +15,5 @@ require_relative "rspecq/formatters/worker_heartbeat_recorder"
 require_relative "rspecq/queue"
 require_relative "rspecq/reporter"
-require_relative "rspecq/worker"
 require_relative "rspecq/version"
+require_relative "rspecq/worker"
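
The liveness rule documented for `WORKER_LIVENESS_SEC` amounts to a simple time comparison. The sketch below is an assumption about how such a check could look, not the gem's implementation: `worker_dead?` is a hypothetical name and the timestamps are plain floats in seconds.

```ruby
# Hypothetical liveness check based on the WORKER_LIVENESS_SEC comment.
#
# last_heartbeat_at - time (in seconds) of the worker's last heartbeat
# now               - current time (in seconds)
# liveness_sec      - allowed silence, 60.0 as in WORKER_LIVENESS_SEC
def worker_dead?(last_heartbeat_at, now, liveness_sec = 60.0)
  now - last_heartbeat_at > liveness_sec
end

puts worker_dead?(100.0, 150.0) # 50 seconds of silence
puts worker_dead?(100.0, 161.0) # 61 seconds of silence
```

For comparison, `worker.rb` defines `HEARTBEAT_FREQUENCY` as `WORKER_LIVENESS_SEC / 6`, which evaluates to 10.0 with the value above.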

data/lib/rspecq/formatters/README.md ADDED

@@ -0,0 +1,4 @@
+RSpec Formatters are used by RSpecQ as hooks for various execution events.
+For more info on formatters in general, see
+https://rubydoc.info/gems/rspec-core/RSpec/Core/Formatters.

data/lib/rspecq/formatters/failure_recorder.rb CHANGED

@@ -1,11 +1,12 @@
 module RSpecQ
   module Formatters
     class FailureRecorder
-      def initialize(queue, job)
+      def initialize(queue, job, max_requeues)
         @queue = queue
         @job = job
         @colorizer = RSpec::Core::Formatters::ConsoleCodes
         @non_example_error_recorded = false
+        @max_requeues = max_requeues
       end
       # Here we're notified about errors occurring outside of examples.
@@ -24,7 +25,7 @@ module RSpecQ
       def example_failed(notification)
         example = notification.example
-        if @queue.requeue_job(example.id, MAX_REQUEUES)
+        if @queue.requeue_job(example.id, @max_requeues)
           # HACK: try to avoid picking the job we just requeued; we want it
           # to be picked up by a different worker
           sleep 0.5

data/lib/rspecq/queue.rb CHANGED

@@ -1,6 +1,17 @@
 require "redis"
 module RSpecQ
+  # Queue is the data store interface (Redis) and is used to manage the work
+  # queue for a particular build. All Redis operations happen via Queue.
+  #
+  # A queue typically contains all the data needed for a particular build to
+  # happen. These include (but are not limited to) the following:
+  #
+  # - the list of jobs (spec files and/or examples) to be executed
+  # - the failed examples along with their backtrace
+  # - the set of running jobs
+  # - previous job timing statistics used to optimally schedule the jobs
+  # - the set of executed jobs
   class Queue
     RESERVE_JOB = <<~LUA.freeze
       local queue = KEYS[1]
@@ -57,10 +68,12 @@ module RSpecQ
     STATUS_INITIALIZING = "initializing".freeze
     STATUS_READY = "ready".freeze
-    def initialize(build_id, worker_id, redis_host)
+    attr_reader :redis
+    def initialize(build_id, worker_id, redis_opts)
       @build_id = build_id
       @worker_id = worker_id
-      @redis = Redis.new(host: redis_host, id: worker_id)
+      @redis = Redis.new(redis_opts.merge(id: worker_id))
     end
     # NOTE: jobs will be processed from head to tail (lpop)
@@ -150,13 +163,21 @@ module RSpecQ
     end
     def example_count
-      @redis.get(key_example_count) || 0
+      @redis.get(key_example_count).to_i
     end
     def processed_jobs_count
       @redis.scard(key_queue_processed)
     end
+    def processed_jobs
+      @redis.smembers(key_queue_processed)
+    end
+    def requeued_jobs
+      @redis.hgetall(key_requeues)
+    end
     def become_master
       @redis.setnx(key_queue_status, STATUS_INITIALIZING)
     end
@@ -174,6 +195,7 @@ module RSpecQ
       @redis.hgetall(key_errors)
     end
+    # True if the build is complete, false otherwise
     def exhausted?
       return false if !published?
@@ -200,10 +222,23 @@ module RSpecQ
       exhausted? && example_failures.empty? && non_example_errors.empty?
     end
-    private
+    # The remaining jobs to be processed. Jobs at the head of the list will
+    # be processed first.
+    def unprocessed_jobs
+      @redis.lrange(key_queue_unprocessed, 0, -1)
+    end
-    def key(*keys)
-      [@build_id, keys].join(":")
+    # Returns the jobs considered flaky (i.e. initially failed but passed
+    # after being retried). Must be called after the build is complete,
+    # otherwise an exception will be raised.
+    def flaky_jobs
+      raise "Queue is not yet exhausted" if !exhausted?
+      requeued = @redis.hkeys(key_requeues)
+      return [] if requeued.empty?
+      requeued - @redis.hkeys(key_failures)
     end
     # redis: STRING [STATUS_INITIALIZING, STATUS_READY]
@@ -279,6 +314,12 @@ module RSpecQ
       "build_times"
     end
+    private
+    def key(*keys)
+      [@build_id, keys].join(":")
+    end
     # We don't use any Ruby `Time` methods because specs that use timecop in
     # before(:all) hooks will mess up our times.
     def current_time
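
The new `flaky_jobs` method boils down to a set difference: jobs that were requeued at least once, minus jobs that ended up as failures. The sketch below shows the same idea with plain Hashes standing in for the Redis hashes; `flaky_jobs_from` is a hypothetical name, and the Hash contents are invented sample data.

```ruby
# Hypothetical, Redis-free version of the flaky job computation.
#
# requeues - Hash of job ID => requeue count (stand-in for the requeues hash)
# failures - Hash of job ID => failure output (stand-in for the failures hash)
def flaky_jobs_from(requeues, failures)
  requeues.keys - failures.keys
end

requeues = {
  "./spec/models/foo_spec.rb[1:2:1]" => "1",
  "./spec/models/bar_spec.rb[1:1]"   => "3"
}
failures = { "./spec/models/bar_spec.rb[1:1]" => "expected true, got false" }

p flaky_jobs_from(requeues, failures)
```

Here the `foo_spec.rb` example failed once and then passed, so it is reported as flaky, while the `bar_spec.rb` example exhausted its retries and counts as a real failure.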

data/lib/rspecq/reporter.rb CHANGED

@@ -1,10 +1,18 @@
 module RSpecQ
+  # A Reporter, given a build ID, is responsible for consolidating the results
+  # from different workers and printing a complete build summary to the user,
+  # along with any failures that might have occurred.
+  #
+  # The failures are printed in real-time as they occur, while the final
+  # summary is printed after the queue is empty and no tests are being
+  # executed. If the build failed, the status code of the reporter is non-zero.
+  #
+  # Reporters are readers of the queue.
   class Reporter
-    def initialize(build_id:, worker_id:, timeout:, redis_host:)
+    def initialize(build_id:, timeout:, redis_opts:)
       @build_id = build_id
-      @worker_id = worker_id
       @timeout = timeout
-      @queue = Queue.new(build_id, worker_id, redis_host)
+      @queue = Queue.new(build_id, "reporter", redis_opts)
       # We want feedback to be immediately printed to CI users, so
       # we disable buffering.
@@ -12,7 +20,7 @@ module RSpecQ
     end
     def report
-      t = measure_duration { @queue.wait_until_published }
+      @queue.wait_until_published
       finished = false
@@ -46,8 +54,13 @@ module RSpecQ
       raise "Build not finished after #{@timeout} seconds" if !finished
       @queue.record_build_time(tests_duration)
+      flaky_jobs = @queue.flaky_jobs
       puts summary(@queue.example_failures, @queue.non_example_errors,
-                   humanize_duration(tests_duration))
+        flaky_jobs, humanize_duration(tests_duration))
+      flaky_jobs_to_sentry(flaky_jobs, tests_duration)
       exit 1 if !@queue.build_successful?
     end
@@ -61,7 +74,7 @@ module RSpecQ
     end
     # We try to keep this output consistent with RSpec's original output
-    def summary(failures, errors, duration)
+    def summary(failures, errors, flaky_jobs, duration)
       failed_examples_section = "\nFailed examples:\n\n"
       failures.each do |_job, msg|
@@ -82,6 +95,14 @@ module RSpecQ
                  "#{errors.count} errors"
       summary << "\n\n"
       summary << "Spec execution time: #{duration}"
+      if !flaky_jobs.empty?
+        summary << "\n\n"
+        summary << "Flaky jobs detected (count=#{flaky_jobs.count}):\n"
+        flaky_jobs.each { |j| summary << "  #{j}\n" }
+      end
+      summary
     end
     def failure_formatted(rspec_output)
@@ -91,5 +112,35 @@ module RSpecQ
     def humanize_duration(seconds)
       Time.at(seconds).utc.strftime("%H:%M:%S")
     end
+    def flaky_jobs_to_sentry(jobs, build_duration)
+      return if jobs.empty?
+      jobs.each do |job|
+        filename = job.sub(/\[.+\]/, '')
+        extra = {
+          build: @build_id,
+          build_timeout: @timeout,
+          queue: @queue.inspect,
+          object: self.inspect,
+          pid: Process.pid,
+          job_path: job,
+          build_duration: build_duration
+        }
+        tags = {
+          flaky: true,
+          spec_file: filename
+        }
+        Raven.capture_message(
+          "Flaky test in #{filename}",
+          level: 'warning',
+          extra: extra,
+          tags: tags
+        )
+      end
+    end
   end
 end
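
Two small transformations in the reporter are easy to check in isolation: stripping the example ID from a job path (used for the `spec_file` Sentry tag) and formatting the build duration. The snippet below copies the two expressions from the diff into standalone methods; `spec_file_for` is a hypothetical name chosen for this illustration.

```ruby
# Strips a trailing example ID such as "[1:2:1]" from a job path,
# using the same expression as flaky_jobs_to_sentry.
def spec_file_for(job)
  job.sub(/\[.+\]/, "")
end

# Same expression as Reporter#humanize_duration.
def humanize_duration(seconds)
  Time.at(seconds).utc.strftime("%H:%M:%S")
end

puts spec_file_for("./spec/models/foo_spec.rb[1:2:1]")
puts humanize_duration(3725)
```

One consequence of the `strftime("%H:%M:%S")` approach is that the hour field wraps at 24, so a duration of 25 hours would be printed as `01:00:00`. With the default report timeout of 3600 seconds this is unlikely to matter in practice.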

data/lib/rspecq/version.rb CHANGED

@@ -1,3 +1,3 @@
 module RSpecQ
-  VERSION = "0.0.1.pre2".freeze
+  VERSION = "0.3.0".freeze
 end

data/lib/rspecq/worker.rb CHANGED

@@ -1,10 +1,28 @@
 require "json"
+require "pathname"
 require "pp"
+require "open3"
 module RSpecQ
+  # A Worker, given a build ID, continuously consumes tests off the
+  # corresponding queue and executes them, until the queue is empty.
+  # It is also responsible for populating the initial queue.
+  #
+  # Essentially, a worker is an RSpec runner that prints the results of the
+  # tests it executes to standard output.
+  #
+  # The typical use case is to spawn many workers for a given build, thereby
+  # parallelizing the work and achieving faster build times.
+  #
+  # Workers are readers+writers of the queue.
   class Worker
     HEARTBEAT_FREQUENCY = WORKER_LIVENESS_SEC / 6
+    # The root path or individual spec files to execute.
+    #
+    # Defaults to "spec" (similar to RSpec)
+    attr_accessor :files_or_dirs_to_run
     # If true, job timings will be populated in the global Redis timings key
     #
     # Defaults to false
@@ -12,15 +30,27 @@ module RSpecQ
     # If set, spec files that are known to take more than this value to finish,
     # will be split and scheduled on a per-example basis.
+    #
+    # Defaults to 999999
     attr_accessor :file_split_threshold
-    def initialize(build_id:, worker_id:, redis_host:, files_or_dirs_to_run:)
+    # Retry failed examples up to N times (with N being the supplied value)
+    # before considering them legit failures
+    #
+    # Defaults to 3
+    attr_accessor :max_requeues
+    attr_reader :queue
+    def initialize(build_id:, worker_id:, redis_opts:)
       @build_id = build_id
       @worker_id = worker_id
-      @queue = Queue.new(build_id, worker_id, redis_host)
-      @files_or_dirs_to_run = files_or_dirs_to_run
+      @queue = Queue.new(build_id, worker_id, redis_opts)
+      @files_or_dirs_to_run = "spec"
       @populate_timings = false
       @file_split_threshold = 999999
+      @heartbeat_updated_at = nil
+      @max_requeues = 3
       RSpec::Core::Formatters.register(Formatters::JobTimingRecorder, :dump_summary)
       RSpec::Core::Formatters.register(Formatters::ExampleCountRecorder, :dump_summary)
@@ -31,23 +61,23 @@ module RSpecQ
     def work
       puts "Working for build #{@build_id} (worker=#{@worker_id})"
-      try_publish_queue!(@queue)
-      @queue.wait_until_published
+      try_publish_queue!(queue)
+      queue.wait_until_published
       loop do
         # we have to bootstrap this so that it can be used in the first call
         # to `requeue_lost_job` inside the work loop
         update_heartbeat
-        lost = @queue.requeue_lost_job
+        lost = queue.requeue_lost_job
         puts "Requeued lost job: #{lost}" if lost
         # TODO: can we make `reserve_job` also act like exhausted? and get
         # rid of `exhausted?` (i.e. return false if no jobs remain)
-        job = @queue.reserve_job
+        job = queue.reserve_job
         # build is finished
-        return if job.nil? && @queue.exhausted?
+        return if job.nil? && queue.exhausted?
         next if job.nil?
@@ -60,112 +90,125 @@ module RSpecQ
         RSpec.configuration.detail_color = :magenta
         RSpec.configuration.seed = srand && srand % 0xFFFF
         RSpec.configuration.backtrace_formatter.filter_gem('rspecq')
-        RSpec.configuration.add_formatter(Formatters::FailureRecorder.new(@queue, job))
-        RSpec.configuration.add_formatter(Formatters::ExampleCountRecorder.new(@queue))
+        RSpec.configuration.add_formatter(Formatters::FailureRecorder.new(queue, job, max_requeues))
+        RSpec.configuration.add_formatter(Formatters::ExampleCountRecorder.new(queue))
         RSpec.configuration.add_formatter(Formatters::WorkerHeartbeatRecorder.new(self))
         if populate_timings
-          RSpec.configuration.add_formatter(Formatters::JobTimingRecorder.new(@queue, job))
+          RSpec.configuration.add_formatter(Formatters::JobTimingRecorder.new(queue, job))
         end
         opts = RSpec::Core::ConfigurationOptions.new(["--format", "progress", job])
         _result = RSpec::Core::Runner.new(opts).run($stderr, $stdout)
-        @queue.acknowledge_job(job)
+        queue.acknowledge_job(job)
       end
     end
     # Update the worker heartbeat if necessary
     def update_heartbeat
       if @heartbeat_updated_at.nil? || elapsed(@heartbeat_updated_at) >= HEARTBEAT_FREQUENCY
-        @queue.record_worker_heartbeat
+        queue.record_worker_heartbeat
         @heartbeat_updated_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
       end
     end
-    private
-    def reset_rspec_state!
-      RSpec.clear_examples
-      # TODO: remove after https://github.com/rspec/rspec-core/pull/2723
-      RSpec.world.instance_variable_set(:@example_group_counts_by_spec_file, Hash.new(0))
-      # RSpec.clear_examples does not reset those, which causes issues when
-      # a non-example error occurs (subsequent jobs are not executed)
-      # TODO: upstream
-      RSpec.world.non_example_failure = false
-      # we don't want an error that occurred outside of the examples (which
-      # would set this to `true`) to stop the worker
-      RSpec.world.wants_to_quit = false
-    end
     def try_publish_queue!(queue)
       return if !queue.become_master
-      RSpec.configuration.files_or_directories_to_run = @files_or_dirs_to_run
+      RSpec.configuration.files_or_directories_to_run = files_or_dirs_to_run
       files_to_run = RSpec.configuration.files_to_run.map { |j| relative_path(j) }
       timings = queue.timings
       if timings.empty?
-        # TODO: should be a warning reported somewhere (Sentry?)
         q_size = queue.publish(files_to_run.shuffle)
-        puts "WARNING: No timings found! Published queue in " \
-             "random order (size=#{q_size})"
+        log_event(
+          "No timings found! Published queue in random order (size=#{q_size})",
+          "warning"
+        )
         return
       end
-      slow_files = timings.take_while do |_job, duration|
-        duration >= file_split_threshold
-      end.map(&:first) & files_to_run
+      # prepare jobs to run
+      jobs = []
+      slow_files = []
-      if slow_files.any?
-        puts "Slow files (threshold=#{file_split_threshold}): #{slow_files}"
+      if file_split_threshold
+        slow_files = timings.take_while do |_job, duration|
+          duration >= file_split_threshold
+        end.map(&:first) & files_to_run
       end
-      # prepare jobs to run
-      jobs = []
-      jobs.concat(files_to_run - slow_files)
-      jobs.concat(files_to_example_ids(slow_files)) if slow_files.any?
+      if slow_files.any?
+        jobs.concat(files_to_run - slow_files)
+        jobs.concat(files_to_example_ids(slow_files))
+      else
+        jobs.concat(files_to_run)
+      end
-      # assign timings to all of them
       default_timing = timings.values[timings.values.size/2]
+      # assign timings (based on previous runs) to all jobs
       jobs = jobs.each_with_object({}) do |j, h|
-        # heuristic: put untimed jobs in the middle of the queue
-        puts "New/untimed job: #{j}" if timings[j].nil?
+        puts "Untimed job: #{j}" if timings[j].nil?
+        # HEURISTIC: put jobs without previous timings (e.g. a newly added
+        # spec file) in the middle of the queue
         h[j] = timings[j] || default_timing
       end
-      # finally, sort them based on their timing (slowest first)
+      # sort jobs based on their timings (slowest to be processed first)
       jobs = jobs.sort_by { |_j, t| -t }.map(&:first)
       puts "Published queue (size=#{queue.publish(jobs)})"
     end
+    private
+    def reset_rspec_state!
+      RSpec.clear_examples
+      # see https://github.com/rspec/rspec-core/pull/2723
+      if Gem::Version.new(RSpec::Core::Version::STRING) <= Gem::Version.new("3.9.1")
+        RSpec.world.instance_variable_set(
+          :@example_group_counts_by_spec_file, Hash.new(0))
+      end
+      # RSpec.clear_examples does not reset those, which causes issues when
+      # a non-example error occurs (subsequent jobs are not executed)
+      # TODO: upstream
+      RSpec.world.non_example_failure = false
+      # we don't want an error that occurred outside of the examples (which
+      # would set this to `true`) to stop the worker
+      RSpec.world.wants_to_quit = false
+    end
     # NOTE: RSpec has to load the files before we can split them as individual
     # examples. In case a file to be split fails to be loaded
-    # (e.g. contains a syntax error), we return the slow files unchanged,
-    # thereby falling back to scheduling them normally.
-    #
-    # Their errors will be reported in the normal flow, when they're picked up
-    # as jobs by a worker.
+    # (e.g. contains a syntax error), we return the files unchanged, thereby
+    # falling back to scheduling them as whole files. Their errors will be
+    # reported in the normal flow when they're eventually picked up by a worker.
     def files_to_example_ids(files)
-      # TODO: do this programmatically
-      cmd = "DISABLE_SPRING=1 bin/rspec --dry-run --format json #{files.join(' ')}"
-      out = `#{cmd}`
-      if !$?.success?
-        # TODO: emit warning to Sentry
-        puts "WARNING: Error splitting slow files; falling back to regular scheduling:"
-        begin
-          pp JSON.parse(out)
-        rescue JSON::ParserError
-          puts out
-        end
-        puts
+      cmd = "DISABLE_SPRING=1 bundle exec rspec --dry-run --format json #{files.join(' ')}"
+      out, err, cmd_result = Open3.capture3(cmd)
+      if !cmd_result.success?
+        rspec_output = begin
+                         JSON.parse(out)
+                       rescue JSON::ParserError
+                         out
+                       end
+        log_event(
+          "Failed to split slow files, falling back to regular scheduling.\n #{err}",
+          "error",
+          rspec_stdout: rspec_output,
+          rspec_stderr: err,
+          cmd_result: cmd_result.inspect,
+        )
+        pp rspec_output
         return files
       end
@@ -181,5 +224,23 @@ module RSpecQ
     def elapsed(since)
       Process.clock_gettime(Process::CLOCK_MONOTONIC) - since
     end
+    # Prints msg to standard output and emits an event to Sentry, if the
+    # SENTRY_DSN environment variable is set.
+    def log_event(msg, level, additional={})
+      puts msg
+      Raven.capture_message(msg, level: level, extra: {
+        build: @build_id,
+        worker: @worker_id,
+        queue: queue.inspect,
+        files_or_dirs_to_run: files_or_dirs_to_run,
+        populate_timings: populate_timings,
+        file_split_threshold: file_split_threshold,
+        heartbeat_updated_at: @heartbeat_updated_at,
+        object: self.inspect,
+        pid: Process.pid,
+      }.merge(additional))
+    end
   end
 end
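
Note on `try_publish_queue!` above: jobs without a recorded timing get the middle value of the known timings and the queue is then sorted slowest first. The following is a standalone sketch of that ordering, not the gem's code. It assumes `timings` is a non-empty Hash of job to duration, already sorted slowest first (which the `take_while` in the diff suggests); `order_jobs` and the sample file names are hypothetical:

```ruby
# Hypothetical standalone version of the ordering step.
# Assumes `timings` is non-empty; the original method returns early
# (publishing in random order) when no timings exist.
def order_jobs(jobs, timings)
  # Middle element of the known timings, used for untimed jobs.
  default_timing = timings.values[timings.values.size / 2]

  timed = jobs.each_with_object({}) do |job, h|
    h[job] = timings[job] || default_timing
  end

  # Slowest jobs first, so they start as early as possible.
  timed.sort_by { |_job, t| -t }.map(&:first)
end

timings = { "a_spec.rb" => 30.0, "b_spec.rb" => 10.0, "c_spec.rb" => 2.0 }
jobs = ["c_spec.rb", "new_spec.rb", "a_spec.rb", "b_spec.rb"]
p order_jobs(jobs, timings)
```

Here `new_spec.rb` receives the default timing of 10.0, so it lands between the slowest job (`a_spec.rb`, first) and the fastest (`c_spec.rb`, last). Its position relative to `b_spec.rb`, which has the same timing, is not guaranteed because `sort_by` is not a stable sort.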

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: rspecq
 version: !ruby/object:Gem::Version
-  version: 0.0.1.pre2
+  version: 0.3.0
 platform: ruby
 authors:
 - Agis Anastasopoulos
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2020-06-26 00:00:00.000000000 Z
+date: 2020-10-05 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec-core
@@ -38,22 +38,64 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  name: sentry-raven
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: pry-byebug
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 - !ruby/object:Gem::Dependency
   name: minitest
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - "~>"
+    - - ">="
       - !ruby/object:Gem::Version
-        version: '5.14'
+        version: '0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - "~>"
+    - - ">="
       - !ruby/object:Gem::Version
-        version: '5.14'
+        version: '0'
 - !ruby/object:Gem::Dependency
-  name: rake
+  name: rspec
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
@@ -76,8 +118,10 @@ files:
 - CHANGELOG.md
 - LICENSE
 - README.md
+- Rakefile
 - bin/rspecq
 - lib/rspecq.rb
+- lib/rspecq/formatters/README.md
 - lib/rspecq/formatters/example_count_recorder.rb
 - lib/rspecq/formatters/failure_recorder.rb
 - lib/rspecq/formatters/job_timing_recorder.rb
@@ -101,12 +145,13 @@ required_ruby_version: !ruby/object:Gem::Requirement
       version: '0'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
-  - - ">"
+  - - ">="
     - !ruby/object:Gem::Version
-      version: 1.3.1
+      version: '0'
 requirements: []
-rubygems_version: 3.1.2
+rubygems_version: 3.1.4
 signing_key:
 specification_version: 4
-summary: Distribute an RSpec suite among many workers
+summary: Optimally distribute and run RSpec suites among parallel workers; for faster
+  CI builds
 test_files: []