RubyGems - rspecq - Versions diffs - 0.0.1.pre2 → 0.3.0 - Mend

rspecq 0.0.1.pre2 → 0.3.0

Files changed (13)

checksums.yaml +4 -4
data/CHANGELOG.md +54 -0
data/README.md +170 -63
data/Rakefile +9 -0
data/bin/rspecq +99 -28
data/lib/rspecq.rb +5 -7
data/lib/rspecq/formatters/README.md +4 -0
data/lib/rspecq/formatters/failure_recorder.rb +3 -2
data/lib/rspecq/queue.rb +47 -6
data/lib/rspecq/reporter.rb +57 -6
data/lib/rspecq/version.rb +1 -1
data/lib/rspecq/worker.rb +128 -67
metadata +56 -11

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 4fcc5311329946efb2a7801087f4cdb5f0be8becc01bd1bb8f367b6d130a02ea
-  data.tar.gz: d6f64c6d0c1dae8a53af8bf2d7724e2fb988b03fa795c59d0f5ecd18a92b072a
+  metadata.gz: 89dbfa98d1eaceb06c39d41ab85e7fa6923d0c87e9a15b9cbfaf7399ff2aaff3
+  data.tar.gz: b7cd028440e6eb03401dc623c7ee0fc0fe74f6ffa12a25ecc23d0cf54e6acd1e
 SHA512:
-  metadata.gz: 7611cf0944ea7751eaf93a7aae5686f6c03563b01e71978d9e1d61f30f14f89b12de0ce9ac590f9351eff22a4d4811f9e2f6c241232754ede3162142225f2c27
-  data.tar.gz: bdbd6da559607026b8e6fead442d4b15b1bb73e957a63b022c4459a83aba0c2a8e297204d9c29ae3bbc700d39ea2434c7b99e55dfb3752a56af5376d0511fea0
+  metadata.gz: a43f0630e8a02a001132f45c9f68cacf7edae8e90487112e640eb611e7d1345f68ad0ab163ae01c91cb38ebc879e98cda9b54044c0ee676293c7aa3bf7c17942
+  data.tar.gz: bf98027dc02ac56d02cc258700f5efa766c40d4d69c55c529d311083965d1fe4f76423b05fddb35a37496bb6eb3c11a8460a8e76f94897c4bb38a744b2fb40df

data/CHANGELOG.md CHANGED

@@ -1,4 +1,58 @@
 # Changelog
+Breaking changes are prefixed with a "[BREAKING]" label.
 ## master (unreleased)
+## 0.3.0 (2020-10-05)
+### Added
+- Providing a Redis URL is now possible using the `--redis-url` option
+  [[#40](https://github.com/skroutz/rspecq/pull/40)]
+### Changed
+- [DEPRECATION] The `--redis` option is now deprecated. Use `--redis-host`
+  instead [[#40](https://github.com/skroutz/rspecq/pull/40)]
+## 0.2.2 (2020-09-10)
+### Fixed
+- Worker would fail if application code was writing to stderr
+ [[#35](https://github.com/skroutz/rspecq/pull/35)]
+## 0.2.1 (2020-09-09)
+### Changed
+- Sentry Integration: Changed the way events for flaky jobs are emitted to a
+  per-flaky-job fashion. This ultimately improves grouping and filtering of the
+  flaky events in Sentry [[#33](https://github.com/skroutz/rspecq/pull/33)]
+## 0.2.0 (2020-08-31)
+This is a feature release with no breaking changes.
+### Added
+- Flaky jobs are now printed by the reporter in the final build output and also
+  emitted to Sentry (if the integration is enabled) [[#26](https://github.com/skroutz/rspecq/pull/26)]
+## 0.1.0 (2020-08-27)
+### Added
+- Sentry integration for various RSpecQ-level events [[#16](https://github.com/skroutz/rspecq/pull/16)]
+- CLI: Flags can now also be set via environment variables [[c519230](https://github.com/skroutz/rspecq/commit/c5192303e229f361e8ac86ae449b4ea84d42e022)]
+- CLI: Added shorthand versions of some flags [[df9faa8](https://github.com/skroutz/rspecq/commit/df9faa8ec6721af8357cfee4de6a2fe7b32070fc)]
+- CLI: Added `--help` and `--version` flags [[df9faa8](https://github.com/skroutz/rspecq/commit/df9faa8ec6721af8357cfee4de6a2fe7b32070fc)]
+- CLI: Max number of retries for failed examples is now configurable via the `--max-requeues` option [[#14](https://github.com/skroutz/rspecq/pull/14)]
+### Changed
+- [BREAKING] CLI: Renamed `--timings` to `--update-timings` [[c519230](https://github.com/skroutz/rspecq/commit/c5192303e229f361e8ac86ae449b4ea84d42e022)]
+- [BREAKING] CLI: Renamed `--build-id` to `--build` and `--worker-id` to `--worker` [[df9faa8](https://github.com/skroutz/rspecq/commit/df9faa8ec6721af8357cfee4de6a2fe7b32070fc)]
+- CLI: `--worker` is not required when `--reporter` is used [[4323a75](https://github.com/skroutz/rspecq/commit/4323a75ca357274069d02ba9fb51cdebb04e0be4)]
+- CLI: Improved help output [[df9faa8](https://github.com/skroutz/rspecq/commit/df9faa8ec6721af8357cfee4de6a2fe7b32070fc)]

data/README.md CHANGED

@@ -1,102 +1,209 @@
-# RSpecQ
+RSpec Queue
+=========================================================================
+[![Build Status](https://travis-ci.com/skroutz/rspecq.svg?branch=master)](https://travis-ci.com/github/skroutz/rspecq)
+[![Gem Version](https://badge.fury.io/rb/rspecq.svg)](https://badge.fury.io/rb/rspecq)
-RSpecQ (`rspecq`) distributes and executes an RSpec suite over many workers,
-using a centralized queue backed by Redis.
+RSpec Queue (RSpecQ) distributes and executes RSpec suites among parallel
+workers. It uses a centralized queue that workers connect to and pop off
+tests from. It ensures optimal scheduling of tests based on their run time,
+facilitating faster CI builds.
-RSpecQ is heavily inspired by [test-queue](https://github.com/tmm1/test-queue)
+RSpecQ is inspired by [test-queue](https://github.com/tmm1/test-queue)
 and [ci-queue](https://github.com/Shopify/ci-queue).
-## Why don't you just use ci-queue?
+## Features
+- Run an RSpec suite among many workers
+  (potentially located in different hosts) in a distributed fashion,
+  facilitating faster CI builds.
+- Consolidated, real-time reporting of a build's progress.
+- Optimal scheduling of test execution by using timings statistics from previous runs and
+  automatically scheduling slow spec files as individual examples. See
+  [*Spec file splitting*](#spec-file-splitting).
+- Automatic retry of test failures before being considered legit, in order to
+  rule out flakiness. Additionally, flaky tests are detected and provided to
+  the user. See [*Requeues*](#requeues).
+- Handles intermittent worker failures (e.g. network hiccups, faulty hardware etc.)
+  by detecting non-responsive workers and requeuing their jobs. See [*Worker failures*](#worker-failures).
+- Sentry integration for monitoring build-level events. See [*Sentry integration*](#sentry-integration).
+- [PLANNED] StatsD integration for various build-level metrics and insights.
+  See [#2](https://github.com/skroutz/rspecq/issues/2).
-While evaluating ci-queue for our RSpec suite, we observed slow boot times
-in the workers (up to 3 minutes), increased memory consumption and too much
-disk I/O on boot. This is due to the fact that a worker in ci-queue has to
-load every spec file on boot. This can be problematic for applications with
-a large number of spec files.
-RSpecQ works with spec files as its unit of work (as opposed to ci-queue which
-works with individual examples). This means that an RSpecQ worker does not
-have to load all spec files at once and so it doesn't have the aforementioned
-problems. It also allows suites to keep using `before(:all)` hooks
-(which ci-queue explicitly rejects). (Note: RSpecQ also schedules individual
-examples, but only when this is deemed necessary, see section
-"Spec file splitting").
+## Usage
-We also observed faster build times by scheduling spec files instead of
-individual examples, due to way less Redis operations.
+A worker needs to be given a name and the build it will participate in.
+Assuming there's a Redis instance listening at `localhost`, starting a worker
+is as simple as:
-The downside of this design is that it's more complicated, since the scheduling
-of spec files happens based on timings calculated from previous runs. This
-means that RSpecQ maintains a key with the timing of each job and updates it
-on every run (if the `--timings` option was used). Also, RSpecQ has a "slow
-file threshold" which, currently has to be set manually (but this can be
-improved).
+```shell
+$ rspecq --build=123 --worker=foo1 spec/
+```
-*Update*: ci-queue deprecated support for RSpec, so there's that.
+To start more workers for the same build, use distinct worker IDs but the same
+build ID:
-## Usage
+```shell
+$ rspecq --build=123 --worker=foo2
+```
-Each worker needs to know the build it will participate in, its name and where
-Redis is located. To start a worker:
+To view the progress of the build use `--report`:
 ```shell
-$ rspecq --build-id=foo --worker-id=worker1 --redis=redis://localhost
+$ rspecq --build=123 --report
 ```
-To view the progress of the build print use `--report`:
+For detailed info use `--help`:
-```shell
-$ rspecq --build-id=foo --worker-id=reporter --redis=redis://localhost --report
 ```
+NAME:
+    rspecq - Optimally distribute and run RSpec suites among parallel workers
+USAGE:
+    rspecq [<options>] [spec files or directories]
+OPTIONS:
+    -b, --build ID                   A unique identifier for the build. Should be common among workers participating in the same build.
+    -w, --worker ID                  An identifier for the worker. Workers participating in the same build should have distinct IDs.
+    -r, --redis HOST                 --redis is deprecated. Use --redis-host or --redis-url instead. Redis host to connect to (default: 127.0.0.1).
+        --redis-host HOST            Redis host to connect to (default: 127.0.0.1).
+        --redis-url URL              Redis URL to connect to (e.g.: redis://127.0.0.1:6379/0).
+        --update-timings             Update the global job timings key with the timings of this build. Note: This key is used as the basis for job scheduling.
+        --file-split-threshold N     Split spec files slower than N seconds and schedule them as individual examples.
+        --report                     Enable reporter mode: do not pull tests off the queue; instead print build progress and exit when it's finished.
+                                     Exits with a non-zero status code if there were any failures.
+        --report-timeout N           Fail if build is not finished after N seconds. Only applicable if --report is enabled (default: 3600).
+        --max-requeues N             Retry failed examples up to N times before considering them legit failures (default: 3).
+    -h, --help                       Show this message.
+    -v, --version                    Print the version and exit.
+```
+### Sentry integration
+RSpecQ can optionally emit build events to a
+[Sentry](https://sentry.io) project by setting the
+[`SENTRY_DSN`](https://github.com/getsentry/raven-ruby#raven-only-runs-when-sentry_dsn-is-set)
+environment variable.
-For detailed info use `--help`.
+This is convenient for monitoring important warnings/errors that may impact
+build times, such as the fact that no previous timings were found and
+therefore job scheduling was effectively random for a particular build.
 ## How it works
-The basic idea is identical to ci-queue so please refer to its README
+The core design is almost identical to ci-queue so please refer to its
+[README](https://github.com/Shopify/ci-queue/blob/master/README.md) instead.
 ### Terminology
-- Job: the smallest unit of work, which is usually a spec file
+- **Job**: the smallest unit of work, which is usually a spec file
   (e.g. `./spec/models/foo_spec.rb`) but can also be an individual example
-  (e.g. `./spec/models/foo_spec.rb[1:2:1]`) if the file is too slow
-- Queue: a collection of Redis-backed structures that hold all the necessary
-  information for RSpecQ to function. This includes timing statistics, jobs to
-  be executed, the failure reports, requeueing statistics and more.
-- Worker: a process that, given a build id, pops up jobs of that build and
-  executes them using RSpec
-- Reporter: a process that, given a build id, waits for the build to finish
-  and prints the summary report (examples executed, build result, failures etc.)
+  (e.g. `./spec/models/foo_spec.rb[1:2:1]`) if the file is too slow.
+- **Queue**: a collection of Redis-backed structures that hold all the necessary
+  information for an RSpecQ build to run. This includes timing statistics,
+  jobs to be executed, the failure reports and more.
+- **Build**: a particular test suite run. Each build has its own **Queue**.
+- **Worker**: an `rspecq` process that, given a build id, consumes jobs off the
+  build's queue and executes them using RSpec.
+- **Reporter**: an `rspecq` process that, given a build id, waits for the build's
+  queue to be drained and prints the build summary report.
 ### Spec file splitting
-Very slow files may put a limit to how fast the suite can execute. For example,
-a worker may spend 10 minutes running a single slow file, while all the other
-workers finish after 8 minutes. To overcome this issue, rspecq splits
-files that their execution time is above a certain threshold
-(set with the `--file-split-threshold` option) and will instead schedule them as
-individual examples.
+Particularly slow spec files may limit how fast a build can be.
+For example, a single file may need 10 minutes to run while all other
+files finish after 8 minutes. This would cause all but one worker to
+sit idle for 2 minutes.
-In the future, we'd like for the slow threshold to be calculated and set
-dynamically.
+To overcome this issue, RSpecQ can split files whose execution time is
+above a certain threshold (set with the `--file-split-threshold` option)
+and instead schedule them as individual examples.
+Note: In the future, we'd like for the slow threshold to be calculated and set
+dynamically (see #3).
 ### Requeues
-As a mitigation measure for flaky tests, if an example fails it will be put
-back to the queue to be picked up by
-another worker. This will be repeated up to a certain number of times before,
-after which the example will be considered a legit failure and will be printed
-in the final report (`--report`).
+As a mitigation technique against flaky tests, if an example fails it will be
+put back to the queue to be picked up by another worker. This will be repeated
+up to a certain number of times (set with the `--max-requeues` option), after
+which the example will be considered a legit failure and printed as such in the
+final report.
+Flaky tests are also detected and printed as such in the final report. They are
+also emitted to Sentry (see [Sentry integration](#sentry-integration)).
 ### Worker failures
-Workers emit a timestamp after each example, as a heartbeat, to denote
-that they're fine and performing jobs. If a worker hasn't reported for
-a given amount of time (see `WORKER_LIVENESS_SEC`) it is considered dead
-and the job it reserved will be requeued, so that it is picked up by another worker.
+It's not uncommon for CI processes to encounter unrecoverable failures for
+various reasons: faulty hardware, network hiccups, segmentation faults in
+MRI etc.
+For resiliency against such issues, workers emit a heartbeat after each
+example they execute, to signal
+that they're healthy and performing jobs as expected. If a worker hasn't
+emitted a heartbeat for a given amount of time (set by `WORKER_LIVENESS_SEC`)
+it is considered dead and its reserved job will be put back to the queue, to
+be picked up by another healthy worker.
+## Rationale
+### Why didn't you use ci-queue?
+**Update**: ci-queue [deprecated support for RSpec](https://github.com/Shopify/ci-queue/pull/149).
+While evaluating ci-queue we experienced slow worker boot
+times (up to 3 minutes in some cases) combined with disk IO saturation and
+increased memory consumption. This is due to the fact that a worker in
+ci-queue has to load every spec file on boot. In applications with a large
+number of spec files this may result in a significant performance hit and,
+in the case of cloud environments, increased costs.
+We also observed slower build times compared to our previous solution which
+scheduled whole spec files (as opposed to individual examples), due to
+big differences in runtimes of individual examples, something common in big
+RSpec suites.
+We decided for RSpecQ to use whole spec files as its main unit of work (as
+opposed to ci-queue which uses individual examples). This means that an RSpecQ
+worker only loads the files needed and ends up with a subset of all the suite's
+files. (Note: RSpecQ also schedules individual examples, but only when this is
+deemed necessary; see [Spec file splitting](#spec-file-splitting).)
+This kept boot and test run times low. As a side benefit, this
+allows suites to keep using `before(:all)` hooks (which ci-queue explicitly
+rejects).
+The downside of this design is that it's more complicated, since the scheduling
+of spec files happens based on timings calculated from previous runs. This
+means that RSpecQ maintains a key with the timing of each job and updates it
+on every run (if the `--update-timings` option was used). Also, RSpecQ has a
+"slow file threshold" which currently has to be set manually (but this can be
+improved in the future).
+## Development
+Install the required dependencies:
+```
+$ bundle install
+```
+Then you can execute the tests after spinning up a Redis instance at
+`127.0.0.1:6379`:
+```
+$ bundle exec rake
+```
+To enable verbose output in the tests:
+```
+$ RSPECQ_DEBUG=1 bundle exec rake
+```
-This protects us against unrecoverable worker failures (e.g. segfault).
 ## License
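
The spec file splitting described in the README changes above can be illustrated with a small, self-contained sketch. This is not the gem's implementation: the `schedule` helper, the sample timings and the slowest-first ordering are assumptions made for illustration only. It shows how a file whose previous run time exceeds the threshold is replaced by its individual examples, while faster files stay whole.

```ruby
# Made-up timings (in seconds) from a previous build, keyed by spec file.
timings = {
  "./spec/a_spec.rb" => 600.0,
  "./spec/b_spec.rb" => 20.0,
  "./spec/c_spec.rb" => 45.0,
}

# Made-up example IDs for the slow file. A real worker would have to ask
# RSpec for these; here they are hard-coded.
examples_by_file = {
  "./spec/a_spec.rb" => ["./spec/a_spec.rb[1:1]", "./spec/a_spec.rb[1:2]"],
}

# Hypothetical helper: orders jobs slowest-first (an assumption) and
# replaces files slower than `threshold` seconds with their examples.
def schedule(timings, examples_by_file, threshold)
  timings.sort_by { |_file, secs| -secs }.flat_map do |file, secs|
    secs > threshold ? examples_by_file.fetch(file) : [file]
  end
end

jobs = schedule(timings, examples_by_file, 300)
puts jobs
```

With a threshold of 300 seconds only `a_spec.rb` is split, so no single job dominates the end of the build.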

data/Rakefile ADDED

@@ -0,0 +1,9 @@
+require "rake/testtask"
+Rake::TestTask.new do |t|
+  t.libs << "test"
+  t.test_files = FileList['test/test_*.rb']
+  t.verbose = true
+end
+task default: :test

data/bin/rspecq CHANGED

@@ -1,67 +1,138 @@
 #!/usr/bin/env ruby
-require "optionparser"
+require "optparse"
 require "rspecq"
+DEFAULT_REDIS_HOST = "127.0.0.1"
+DEFAULT_REPORT_TIMEOUT = 3600 # 1 hour
+DEFAULT_MAX_REQUEUES = 3
+def env_set?(var)
+  ["1", "true"].include?(ENV[var])
+end
 opts = {}
 OptionParser.new do |o|
-  o.banner = "Usage: #{$PROGRAM_NAME} [opts] [files_or_directories_to_run]"
+  name = File.basename($PROGRAM_NAME)
+  o.banner = <<~BANNER
+    NAME:
+        #{name} - Optimally distribute and run RSpec suites among parallel workers
+    USAGE:
+        #{name} [<options>] [spec files or directories]
+  BANNER
+  o.separator ""
+  o.separator "OPTIONS:"
+  o.on("-b", "--build ID", "A unique identifier for the build. Should be " \
+       "common among workers participating in the same build.") do |v|
+    opts[:build] = v
+  end
+  o.on("-w", "--worker ID", "An identifier for the worker. Workers " \
+       "participating in the same build should have distinct IDs.") do |v|
+    opts[:worker] = v
+  end
-  o.on("--build-id ID", "A unique identifier denoting the build") do |v|
-    opts[:build_id] = v
+  o.on("-r", "--redis HOST", "Redis host to connect to " \
+       "(default: #{DEFAULT_REDIS_HOST}).") do |v|
+    puts "--redis is deprecated. Use --redis-host or --redis-url instead"
+    opts[:redis_host] = v
   end
-  o.on("--worker-id ID", "A unique identifier denoting the worker") do |v|
-    opts[:worker_id] = v
+  o.on("--redis-host HOST", "Redis host to connect to " \
+       "(default: #{DEFAULT_REDIS_HOST}).") do |v|
+    opts[:redis_host] = v
   end
-  o.on("--redis HOST", "Redis HOST to connect to (default: 127.0.0.1)") do |v|
-    opts[:redis_host] = v || "127.0.0.1"
+  o.on("--redis-url URL", "The URL of the Redis host to connect to " \
+       "(e.g.: redis://127.0.0.1:6379/0).") do |v|
+    opts[:redis_url] = v
   end
-  o.on("--timings", "Populate global job timings in Redis") do |v|
+  o.on("--update-timings", "Update the global job timings key with the "     \
+       "timings of this build. Note: This key is used as the basis for job " \
+       "scheduling.") do |v|
     opts[:timings] = v
   end
-  o.on("--file-split-threshold N", "Split spec files slower than N sec. and " \
-       "schedule them by example (default: 999999)") do |v|
-    opts[:file_split_threshold] = Float(v)
+  o.on("--file-split-threshold N", Integer, "Split spec files slower than N " \
+       "seconds and schedule them as individual examples.") do |v|
+    opts[:file_split_threshold] = v
   end
-  o.on("--report", "Do not execute tests but wait until queue is empty and " \
-       "print a report") do |v|
+  o.on("--report", "Enable reporter mode: do not pull tests off the queue; " \
+                   "instead print build progress and exit when it's "        \
+                   "finished.\n#{o.summary_indent*9} "                       \
+                   "Exits with a non-zero status code if there were any "    \
+                   "failures.") do |v|
     opts[:report] = v
   end
-  o.on("--report-timeout N", Integer, "Fail if queue is not empty after " \
-       "N seconds. Only applicable if --report is enabled "               \
-       "(default: 3600)") do |v|
+  o.on("--report-timeout N", Integer, "Fail if build is not finished after " \
+       "N seconds. Only applicable if --report is enabled "                  \
+       "(default: #{DEFAULT_REPORT_TIMEOUT}).") do |v|
     opts[:report_timeout] = v
   end
+  o.on("--max-requeues N", Integer, "Retry failed examples up to N times "   \
+       "before considering them legit failures "                             \
+       "(default: #{DEFAULT_MAX_REQUEUES}).") do |v|
+    opts[:max_requeues] = v
+  end
+  o.on_tail("-h", "--help", "Show this message.") do
+    puts o
+    exit
+  end
+  o.on_tail("-v", "--version", "Print the version and exit.") do
+    puts "#{name} #{RSpecQ::VERSION}"
+    exit
+  end
 end.parse!
-[:build_id, :worker_id].each do |o|
-  raise OptionParser::MissingArgument.new(o) if opts[o].nil?
+opts[:build] ||= ENV["RSPECQ_BUILD"]
+opts[:worker] ||= ENV["RSPECQ_WORKER"]
+opts[:redis_host] ||= ENV["RSPECQ_REDIS"] || DEFAULT_REDIS_HOST
+opts[:timings] ||= env_set?("RSPECQ_UPDATE_TIMINGS")
+opts[:file_split_threshold] ||= Integer(ENV["RSPECQ_FILE_SPLIT_THRESHOLD"] || 9999999)
+opts[:report] ||= env_set?("RSPECQ_REPORT")
+opts[:report_timeout] ||= Integer(ENV["RSPECQ_REPORT_TIMEOUT"] || DEFAULT_REPORT_TIMEOUT)
+opts[:max_requeues] ||= Integer(ENV["RSPECQ_MAX_REQUEUES"] || DEFAULT_MAX_REQUEUES)
+opts[:redis_url] ||= ENV["RSPECQ_REDIS_URL"]
+raise OptionParser::MissingArgument.new(:build) if opts[:build].nil?
+raise OptionParser::MissingArgument.new(:worker) if !opts[:report] && opts[:worker].nil?
+redis_opts = {}
+if opts[:redis_url]
+  redis_opts[:url] = opts[:redis_url]
+else
+  redis_opts[:host] = opts[:redis_host]
 end
 if opts[:report]
   reporter = RSpecQ::Reporter.new(
-    build_id: opts[:build_id],
-    worker_id: opts[:worker_id],
-    timeout: opts[:report_timeout] || 3600,
-    redis_host: opts[:redis_host],
+    build_id: opts[:build],
+    timeout: opts[:report_timeout],
+    redis_opts: redis_opts,
   )
   reporter.report
 else
   worker = RSpecQ::Worker.new(
-    build_id: opts[:build_id],
-    worker_id: opts[:worker_id],
-    redis_host: opts[:redis_host],
-    files_or_dirs_to_run: ARGV[0] || "spec",
+    build_id: opts[:build],
+    worker_id: opts[:worker],
+    redis_opts: redis_opts
   )
+  worker.files_or_dirs_to_run = ARGV[0] if ARGV[0]
   worker.populate_timings = opts[:timings]
-  worker.file_split_threshold = opts[:file_split_threshold] || 999999
+  worker.file_split_threshold = opts[:file_split_threshold]
+  worker.max_requeues = opts[:max_requeues]
   worker.work
 end
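
The option handling in this version of `bin/rspecq` resolves each setting in a fixed order: command-line flag first, then the matching `RSPECQ_*` environment variable, then a built-in default. The sketch below isolates that precedence for two of the options. `resolve_opts` is a hypothetical helper written for this illustration (the real script works on `ARGV` and `ENV` directly), and it takes the environment as a plain hash so it can be exercised without touching the process environment.

```ruby
require "optparse"

DEFAULT_MAX_REQUEUES = 3

# Hypothetical helper (not part of rspecq): resolves options with the
# precedence flag > environment variable > default.
def resolve_opts(argv, env)
  opts = {}
  OptionParser.new do |o|
    o.on("-b", "--build ID") { |v| opts[:build] = v }
    o.on("--max-requeues N", Integer) { |v| opts[:max_requeues] = v }
  end.parse!(argv.dup)

  # Fall back to the environment, then to the default.
  opts[:build] ||= env["RSPECQ_BUILD"]
  opts[:max_requeues] ||= Integer(env["RSPECQ_MAX_REQUEUES"] || DEFAULT_MAX_REQUEUES)
  opts
end

from_flag = resolve_opts(["--build=123", "--max-requeues", "5"], {})
from_env  = resolve_opts([], { "RSPECQ_BUILD" => "abc", "RSPECQ_MAX_REQUEUES" => "1" })

p from_flag # flags win when given
p from_env  # otherwise the environment is consulted
```

Because the fallback uses `||=`, an option that was never passed on the command line (and is therefore `nil`) picks up the environment value, and only then the default.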

data/lib/rspecq.rb CHANGED

@@ -1,11 +1,10 @@
 require "rspec/core"
+require "sentry-raven"
 module RSpecQ
-  MAX_REQUEUES = 3
-  # If a worker haven't executed an RSpec example for more than this time
-  # (in seconds), it is considered dead and its reserved work will be put back
-  # to the queue, to be picked up by another worker.
+  # If a worker hasn't executed an example for more than WORKER_LIVENESS_SEC
+  # seconds, it is considered dead and its reserved work will be put back
+  # to the queue to be picked up by another worker.
   WORKER_LIVENESS_SEC = 60.0
 end
@@ -16,6 +15,5 @@ require_relative "rspecq/formatters/worker_heartbeat_recorder"
 require_relative "rspecq/queue"
 require_relative "rspecq/reporter"
-require_relative "rspecq/worker"
 require_relative "rspecq/version"
+require_relative "rspecq/worker"
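
To make the liveness rule concrete, here is a minimal sketch of how a dead worker could be detected from heartbeat timestamps. The two constants mirror the ones in the gem (`HEARTBEAT_FREQUENCY` is defined in `worker.rb` as `WORKER_LIVENESS_SEC / 6`); the `dead_workers` helper and the in-memory hash are assumptions for illustration, since the gem keeps this state in Redis.

```ruby
WORKER_LIVENESS_SEC = 60.0
HEARTBEAT_FREQUENCY = WORKER_LIVENESS_SEC / 6

# Hypothetical helper: `heartbeats` maps a worker ID to the time of its
# last heartbeat (seconds); a worker is dead if it has been silent for
# more than WORKER_LIVENESS_SEC.
def dead_workers(heartbeats, now)
  heartbeats.select { |_id, ts| now - ts > WORKER_LIVENESS_SEC }.keys
end

heartbeats = { "foo1" => 95.0, "foo2" => 30.0 }
dead = dead_workers(heartbeats, 100.0)
puts dead
```

In this sample, `foo2` has been silent for 70 seconds and is reported as dead, while `foo1` is still within the liveness window.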

data/lib/rspecq/formatters/README.md ADDED

@@ -0,0 +1,4 @@
+RSpec Formatters are used by RSpecQ as hooks for various execution events.
+For more info on formatters in general, see
+https://rubydoc.info/gems/rspec-core/RSpec/Core/Formatters.

data/lib/rspecq/formatters/failure_recorder.rb CHANGED

@@ -1,11 +1,12 @@
 module RSpecQ
   module Formatters
     class FailureRecorder
-      def initialize(queue, job)
+      def initialize(queue, job, max_requeues)
         @queue = queue
         @job = job
         @colorizer = RSpec::Core::Formatters::ConsoleCodes
         @non_example_error_recorded = false
+        @max_requeues = max_requeues
       end
       # Here we're notified about errors occurring outside of examples.
@@ -24,7 +25,7 @@ module RSpecQ
       def example_failed(notification)
         example = notification.example
-        if @queue.requeue_job(example.id, MAX_REQUEUES)
+        if @queue.requeue_job(example.id, @max_requeues)
           # HACK: try to avoid picking the job we just requeued; we want it
           # to be picked up by a different worker
           sleep 0.5

data/lib/rspecq/queue.rb CHANGED

@@ -1,6 +1,17 @@
 require "redis"
 module RSpecQ
+  # Queue is the data store interface (Redis) and is used to manage the work
+  # queue for a particular build. All Redis operations happen via Queue.
+  #
+  # A queue typically contains all the data needed for a particular build to
+  # happen. These include (but are not limited to) the following:
+  #
+  # - the list of jobs (spec files and/or examples) to be executed
+  # - the failed examples along with their backtrace
+  # - the set of running jobs
+  # - previous job timing statistics used to optimally schedule the jobs
+  # - the set of executed jobs
   class Queue
     RESERVE_JOB = <<~LUA.freeze
       local queue = KEYS[1]
@@ -57,10 +68,12 @@ module RSpecQ
     STATUS_INITIALIZING = "initializing".freeze
     STATUS_READY = "ready".freeze
-    def initialize(build_id, worker_id, redis_host)
+    attr_reader :redis
+    def initialize(build_id, worker_id, redis_opts)
       @build_id = build_id
       @worker_id = worker_id
-      @redis = Redis.new(host: redis_host, id: worker_id)
+      @redis = Redis.new(redis_opts.merge(id: worker_id))
     end
     # NOTE: jobs will be processed from head to tail (lpop)
@@ -150,13 +163,21 @@ module RSpecQ
     end
     def example_count
-      @redis.get(key_example_count) || 0
+      @redis.get(key_example_count).to_i
     end
     def processed_jobs_count
       @redis.scard(key_queue_processed)
     end
+    def processed_jobs
+      @redis.smembers(key_queue_processed)
+    end
+    def requeued_jobs
+      @redis.hgetall(key_requeues)
+    end
     def become_master
       @redis.setnx(key_queue_status, STATUS_INITIALIZING)
     end
@@ -174,6 +195,7 @@ module RSpecQ
       @redis.hgetall(key_errors)
     end
+    # True if the build is complete, false otherwise
     def exhausted?
       return false if !published?
@@ -200,10 +222,23 @@ module RSpecQ
       exhausted? && example_failures.empty? && non_example_errors.empty?
     end
-    private
+    # The remaining jobs to be processed. Jobs at the head of the list will
+    # be processed first.
+    def unprocessed_jobs
+      @redis.lrange(key_queue_unprocessed, 0, -1)
+    end
-    def key(*keys)
-      [@build_id, keys].join(":")
+    # Returns the jobs considered flaky (i.e. initially failed but passed
+    # after being retried). Must be called after the build is complete,
+    # otherwise an exception will be raised.
+    def flaky_jobs
+      raise "Queue is not yet exhausted" if !exhausted?
+      requeued = @redis.hkeys(key_requeues)
+      return [] if requeued.empty?
+      requeued - @redis.hkeys(key_failures)
     end
     # redis: STRING [STATUS_INITIALIZING, STATUS_READY]
@@ -279,6 +314,12 @@ module RSpecQ
       "build_times"
     end
+    private
+    def key(*keys)
+      [@build_id, keys].join(":")
+    end
     # We don't use any Ruby `Time` methods because specs that use timecop in
     # before(:all) hooks will mess up our times.
     def current_time

data/lib/rspecq/reporter.rb CHANGED

@@ -1,10 +1,18 @@
 module RSpecQ
+  # A Reporter, given a build ID, is responsible for consolidating the results
+  # from different workers and printing a complete build summary to the user,
+  # along with any failures that might have occurred.
+  #
+  # The failures are printed in real-time as they occur, while the final
+  # summary is printed after the queue is empty and no tests are being
+  # executed. If the build failed, the status code of the reporter is non-zero.
+  #
+  # Reporters are readers of the queue.
   class Reporter
-    def initialize(build_id:, worker_id:, timeout:, redis_host:)
+    def initialize(build_id:, timeout:, redis_opts:)
       @build_id = build_id
-      @worker_id = worker_id
       @timeout = timeout
-      @queue = Queue.new(build_id, worker_id, redis_host)
+      @queue = Queue.new(build_id, "reporter", redis_opts)
       # We want feedback to be immediately printed to CI users, so
       # we disable buffering.
@@ -12,7 +20,7 @@ module RSpecQ
     end
     def report
-      t = measure_duration { @queue.wait_until_published }
+      @queue.wait_until_published
       finished = false
@@ -46,8 +54,13 @@ module RSpecQ
       raise "Build not finished after #{@timeout} seconds" if !finished
       @queue.record_build_time(tests_duration)
+      flaky_jobs = @queue.flaky_jobs
       puts summary(@queue.example_failures, @queue.non_example_errors,
-                   humanize_duration(tests_duration))
+        flaky_jobs, humanize_duration(tests_duration))
+      flaky_jobs_to_sentry(flaky_jobs, tests_duration)
       exit 1 if !@queue.build_successful?
     end
@@ -61,7 +74,7 @@ module RSpecQ
     end
     # We try to keep this output consistent with RSpec's original output
-    def summary(failures, errors, duration)
+    def summary(failures, errors, flaky_jobs, duration)
       failed_examples_section = "\nFailed examples:\n\n"
       failures.each do |_job, msg|
@@ -82,6 +95,14 @@ module RSpecQ
                  "#{errors.count} errors"
       summary << "\n\n"
       summary << "Spec execution time: #{duration}"
+      if !flaky_jobs.empty?
+        summary << "\n\n"
+        summary << "Flaky jobs detected (count=#{flaky_jobs.count}):\n"
+        flaky_jobs.each { |j| summary << "  #{j}\n" }
+      end
+      summary
     end
     def failure_formatted(rspec_output)
@@ -91,5 +112,35 @@ module RSpecQ
     def humanize_duration(seconds)
       Time.at(seconds).utc.strftime("%H:%M:%S")
     end
+    def flaky_jobs_to_sentry(jobs, build_duration)
+      return if jobs.empty?
+      jobs.each do |job|
+        filename = job.sub(/\[.+\]/, '')
+        extra = {
+          build: @build_id,
+          build_timeout: @timeout,
+          queue: @queue.inspect,
+          object: self.inspect,
+          pid: Process.pid,
+          job_path: job,
+          build_duration: build_duration
+        }
+        tags = {
+          flaky: true,
+          spec_file: filename
+        }
+        Raven.capture_message(
+          "Flaky test in #{filename}",
+          level: 'warning',
+          extra: extra,
+          tags: tags
+        )
+      end
+    end
   end
 end
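
Two small string transformations in the reporter are easy to check in isolation: stripping the example ID from a job to get the spec file (used for the Sentry `spec_file` tag) and formatting the build duration. The sketch below copies both expressions from the diff; `spec_file_for` is a name chosen for this example, since the reporter performs the substitution inline.

```ruby
# Strip the RSpec example ID suffix (e.g. "[1:2:1]") to get the spec file.
# The name `spec_file_for` is hypothetical.
def spec_file_for(job)
  job.sub(/\[.+\]/, "")
end

# Same expression as Reporter#humanize_duration.
def humanize_duration(seconds)
  Time.at(seconds).utc.strftime("%H:%M:%S")
end

puts spec_file_for("./spec/models/foo_spec.rb[1:2:1]") # ./spec/models/foo_spec.rb
puts spec_file_for("./spec/models/foo_spec.rb")        # unchanged: no example ID
puts humanize_duration(3725)                           # 01:02:05
```

Because the duration is formatted as a time of day, the output wraps around for builds lasting 24 hours or more; with the default report timeout of one hour this should not matter in practice.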

data/lib/rspecq/version.rb CHANGED

@@ -1,3 +1,3 @@
 module RSpecQ
-  VERSION = "0.0.1.pre2".freeze
+  VERSION = "0.3.0".freeze
 end

data/lib/rspecq/worker.rb CHANGED

@@ -1,10 +1,28 @@
 require "json"
+require "pathname"
 require "pp"
+require "open3"
 module RSpecQ
+  # A Worker, given a build ID, continuously consumes tests off the
+  # corresponding queue and executes them, until the queue is empty.
+  # It is also responsible for populating the initial queue.
+  #
+  # Essentially, a worker is an RSpec runner that prints the results of the
+  # tests it executes to standard output.
+  #
+  # The typical use case is to spawn many workers for a given build, thereby
+  # parallelizing the work and achieving faster build times.
+  #
+  # Workers are readers+writers of the queue.
   class Worker
     HEARTBEAT_FREQUENCY = WORKER_LIVENESS_SEC / 6
+    # The root path or individual spec files to execute.
+    #
+    # Defaults to "spec" (similar to RSpec)
+    attr_accessor :files_or_dirs_to_run
     # If true, job timings will be populated in the global Redis timings key
     #
     # Defaults to false
@@ -12,15 +30,27 @@ module RSpecQ
     # If set, spec files that are known to take more than this value to finish,
     # will be split and scheduled on a per-example basis.
+    #
+    # Defaults to 999999
     attr_accessor :file_split_threshold
-    def initialize(build_id:, worker_id:, redis_host:, files_or_dirs_to_run:)
+    # Retry failed examples up to N times (with N being the supplied value)
+    # before considering them legit failures
+    #
+    # Defaults to 3
+    attr_accessor :max_requeues
+    attr_reader :queue
+    def initialize(build_id:, worker_id:, redis_opts:)
       @build_id = build_id
       @worker_id = worker_id
-      @queue = Queue.new(build_id, worker_id, redis_host)
-      @files_or_dirs_to_run = files_or_dirs_to_run
+      @queue = Queue.new(build_id, worker_id, redis_opts)
+      @files_or_dirs_to_run = "spec"
       @populate_timings = false
       @file_split_threshold = 999999
+      @heartbeat_updated_at = nil
+      @max_requeues = 3
       RSpec::Core::Formatters.register(Formatters::JobTimingRecorder, :dump_summary)
       RSpec::Core::Formatters.register(Formatters::ExampleCountRecorder, :dump_summary)
@@ -31,23 +61,23 @@ module RSpecQ
     def work
       puts "Working for build #{@build_id} (worker=#{@worker_id})"
-      try_publish_queue!(@queue)
-      @queue.wait_until_published
+      try_publish_queue!(queue)
+      queue.wait_until_published
       loop do
         # we have to bootstrap this so that it can be used in the first call
         # to `requeue_lost_job` inside the work loop
         update_heartbeat
-        lost = @queue.requeue_lost_job
+        lost = queue.requeue_lost_job
         puts "Requeued lost job: #{lost}" if lost
         # TODO: can we make `reserve_job` also act like exhausted? and get
         # rid of `exhausted?` (i.e. return false if no jobs remain)
-        job = @queue.reserve_job
+        job = queue.reserve_job
         # build is finished
-        return if job.nil? && @queue.exhausted?
+        return if job.nil? && queue.exhausted?
         next if job.nil?
@@ -60,112 +90,125 @@ module RSpecQ
         RSpec.configuration.detail_color = :magenta
         RSpec.configuration.seed = srand && srand % 0xFFFF
         RSpec.configuration.backtrace_formatter.filter_gem('rspecq')
-        RSpec.configuration.add_formatter(Formatters::FailureRecorder.new(@queue, job))
-        RSpec.configuration.add_formatter(Formatters::ExampleCountRecorder.new(@queue))
+        RSpec.configuration.add_formatter(Formatters::FailureRecorder.new(queue, job, max_requeues))
+        RSpec.configuration.add_formatter(Formatters::ExampleCountRecorder.new(queue))
         RSpec.configuration.add_formatter(Formatters::WorkerHeartbeatRecorder.new(self))
         if populate_timings
-          RSpec.configuration.add_formatter(Formatters::JobTimingRecorder.new(@queue, job))
+          RSpec.configuration.add_formatter(Formatters::JobTimingRecorder.new(queue, job))
         end
         opts = RSpec::Core::ConfigurationOptions.new(["--format", "progress", job])
         _result = RSpec::Core::Runner.new(opts).run($stderr, $stdout)
-        @queue.acknowledge_job(job)
+        queue.acknowledge_job(job)
       end
     end
     # Update the worker heartbeat if necessary
     def update_heartbeat
       if @heartbeat_updated_at.nil? || elapsed(@heartbeat_updated_at) >= HEARTBEAT_FREQUENCY
-        @queue.record_worker_heartbeat
+        queue.record_worker_heartbeat
         @heartbeat_updated_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
       end
     end
-    private
-    def reset_rspec_state!
-      RSpec.clear_examples
-      # TODO: remove after https://github.com/rspec/rspec-core/pull/2723
-      RSpec.world.instance_variable_set(:@example_group_counts_by_spec_file, Hash.new(0))
-      # RSpec.clear_examples does not reset those, which causes issues when
-      # a non-example error occurs (subsequent jobs are not executed)
-      # TODO: upstream
-      RSpec.world.non_example_failure = false
-      # we don't want an error that occurred outside of the examples (which
-      # would set this to `true`) to stop the worker
-      RSpec.world.wants_to_quit = false
-    end
     def try_publish_queue!(queue)
       return if !queue.become_master
-      RSpec.configuration.files_or_directories_to_run = @files_or_dirs_to_run
+      RSpec.configuration.files_or_directories_to_run = files_or_dirs_to_run
       files_to_run = RSpec.configuration.files_to_run.map { |j| relative_path(j) }
       timings = queue.timings
       if timings.empty?
-        # TODO: should be a warning reported somewhere (Sentry?)
         q_size = queue.publish(files_to_run.shuffle)
-        puts "WARNING: No timings found! Published queue in " \
-             "random order (size=#{q_size})"
+        log_event(
+          "No timings found! Published queue in random order (size=#{q_size})",
+          "warning"
+        )
         return
       end
-      slow_files = timings.take_while do |_job, duration|
-        duration >= file_split_threshold
-      end.map(&:first) & files_to_run
+      # prepare jobs to run
+      jobs = []
+      slow_files = []
-      if slow_files.any?
-        puts "Slow files (threshold=#{file_split_threshold}): #{slow_files}"
+      if file_split_threshold
+        slow_files = timings.take_while do |_job, duration|
+          duration >= file_split_threshold
+        end.map(&:first) & files_to_run
       end
-      # prepare jobs to run
-      jobs = []
-      jobs.concat(files_to_run - slow_files)
-      jobs.concat(files_to_example_ids(slow_files)) if slow_files.any?
+      if slow_files.any?
+        jobs.concat(files_to_run - slow_files)
+        jobs.concat(files_to_example_ids(slow_files))
+      else
+        jobs.concat(files_to_run)
+      end
-      # assign timings to all of them
       default_timing = timings.values[timings.values.size/2]
+      # assign timings (based on previous runs) to all jobs
       jobs = jobs.each_with_object({}) do |j, h|
-        # heuristic: put untimed jobs in the middle of the queue
-        puts "New/untimed job: #{j}" if timings[j].nil?
+        puts "Untimed job: #{j}" if timings[j].nil?
+        # HEURISTIC: put jobs without previous timings (e.g. a newly added
+        # spec file) in the middle of the queue
         h[j] = timings[j] || default_timing
       end
-      # finally, sort them based on their timing (slowest first)
+      # sort jobs based on their timings (slowest to be processed first)
       jobs = jobs.sort_by { |_j, t| -t }.map(&:first)
       puts "Published queue (size=#{queue.publish(jobs)})"
     end
+    private
+    def reset_rspec_state!
+      RSpec.clear_examples
+      # see https://github.com/rspec/rspec-core/pull/2723
+      if Gem::Version.new(RSpec::Core::Version::STRING) <= Gem::Version.new("3.9.1")
+        RSpec.world.instance_variable_set(
+          :@example_group_counts_by_spec_file, Hash.new(0))
+      end
+      # RSpec.clear_examples does not reset those, which causes issues when
+      # a non-example error occurs (subsequent jobs are not executed)
+      # TODO: upstream
+      RSpec.world.non_example_failure = false
+      # we don't want an error that occurred outside of the examples (which
+      # would set this to `true`) to stop the worker
+      RSpec.world.wants_to_quit = false
+    end
     # NOTE: RSpec has to load the files before we can split them as individual
     # examples. In case a file to be split fails to be loaded
-    # (e.g. contains a syntax error), we return the slow files unchanged,
-    # thereby falling back to scheduling them normally.
-    #
-    # Their errors will be reported in the normal flow, when they're picked up
-    # as jobs by a worker.
+    # (e.g. contains a syntax error), we return the files unchanged, thereby
+    # falling back to scheduling them as whole files. Their errors will be
+    # reported in the normal flow when they're eventually picked up by a worker.
     def files_to_example_ids(files)
-      # TODO: do this programatically
-      cmd = "DISABLE_SPRING=1 bin/rspec --dry-run --format json #{files.join(' ')}"
-      out = `#{cmd}`
-      if !$?.success?
-        # TODO: emit warning to Sentry
-        puts "WARNING: Error splitting slow files; falling back to regular scheduling:"
-        begin
-          pp JSON.parse(out)
-        rescue JSON::ParserError
-          puts out
-        end
-        puts
+      cmd = "DISABLE_SPRING=1 bundle exec rspec --dry-run --format json #{files.join(' ')}"
+      out, err, cmd_result = Open3.capture3(cmd)
+      if !cmd_result.success?
+        rspec_output = begin
+                         JSON.parse(out)
+                       rescue JSON::ParserError
+                         out
+                       end
+        log_event(
+          "Failed to split slow files, falling back to regular scheduling.\n #{err}",
+          "error",
+          rspec_stdout: rspec_output,
+          rspec_stderr: err,
+          cmd_result: cmd_result.inspect,
+        )
+        pp rspec_output
         return files
       end
@@ -181,5 +224,23 @@ module RSpecQ
     def elapsed(since)
       Process.clock_gettime(Process::CLOCK_MONOTONIC) - since
     end
+    # Prints msg to standard output and emits an event to Sentry, if the
+    # SENTRY_DSN environment variable is set.
+    def log_event(msg, level, additional={})
+      puts msg
+      Raven.capture_message(msg, level: level, extra: {
+        build: @build_id,
+        worker: @worker_id,
+        queue: queue.inspect,
+        files_or_dirs_to_run: files_or_dirs_to_run,
+        populate_timings: populate_timings,
+        file_split_threshold: file_split_threshold,
+        heartbeat_updated_at: @heartbeat_updated_at,
+        object: self.inspect,
+        pid: Process.pid,
+      }.merge(additional))
+    end
   end
 end
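
Note on `try_publish_queue!`: the ordering logic in the diff above can be sketched in isolation. This is a simplified illustration only: the method name `order_jobs` and the sample data are hypothetical, the splitting of slow files into example IDs is left out, and `timings` is assumed to be a Hash of job to seconds ordered slowest first (the real worker obtains it from `queue.timings`).

```ruby
# Simplified sketch of the scheduling heuristic; not the gem's actual code.
# files_to_run: Array of job names
# timings:      Hash of job => duration in seconds, assumed slowest first
def order_jobs(files_to_run, timings)
  # Without any recorded timings there is nothing to sort by
  return files_to_run.shuffle if timings.empty?

  # The middle element of the recorded timings serves as the default, so
  # jobs without a previous timing land around the middle of the queue
  default_timing = timings.values[timings.values.size / 2]

  timed = files_to_run.each_with_object({}) do |job, h|
    h[job] = timings[job] || default_timing
  end

  # Slowest jobs are scheduled first
  timed.sort_by { |_job, t| -t }.map(&:first)
end

# Hypothetical timings from a previous build
timings = { "a_spec.rb" => 30.0, "old_spec.rb" => 10.0, "c_spec.rb" => 2.0 }
files   = ["c_spec.rb", "new_spec.rb", "a_spec.rb"]

puts order_jobs(files, timings).inspect
```

Scheduling the slowest jobs first reduces the chance that a long-running file is picked up last and leaves the other workers idle while it finishes.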

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: rspecq
 version: !ruby/object:Gem::Version
-  version: 0.0.1.pre2
+  version: 0.3.0
 platform: ruby
 authors:
 - Agis Anastasopoulos
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2020-06-26 00:00:00.000000000 Z
+date: 2020-10-05 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec-core
@@ -38,22 +38,64 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  name: sentry-raven
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: pry-byebug
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 - !ruby/object:Gem::Dependency
   name: minitest
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - "~>"
+    - - ">="
       - !ruby/object:Gem::Version
-        version: '5.14'
+        version: '0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - "~>"
+    - - ">="
       - !ruby/object:Gem::Version
-        version: '5.14'
+        version: '0'
 - !ruby/object:Gem::Dependency
-  name: rake
+  name: rspec
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
@@ -76,8 +118,10 @@ files:
 - CHANGELOG.md
 - LICENSE
 - README.md
+- Rakefile
 - bin/rspecq
 - lib/rspecq.rb
+- lib/rspecq/formatters/README.md
 - lib/rspecq/formatters/example_count_recorder.rb
 - lib/rspecq/formatters/failure_recorder.rb
 - lib/rspecq/formatters/job_timing_recorder.rb
@@ -101,12 +145,13 @@ required_ruby_version: !ruby/object:Gem::Requirement
       version: '0'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
-  - - ">"
+  - - ">="
     - !ruby/object:Gem::Version
-      version: 1.3.1
+      version: '0'
 requirements: []
-rubygems_version: 3.1.2
+rubygems_version: 3.1.4
 signing_key:
 specification_version: 4
-summary: Distribute an RSpec suite among many workers
+summary: Optimally distribute and run RSpec suites among parallel workers; for faster
+  CI builds
 test_files: []