rspecq 0.0.1.pre1
- checksums.yaml +7 -0
- data/CHANGELOG.md +4 -0
- data/LICENSE +20 -0
- data/README.md +103 -0
- data/bin/rspecq +67 -0
- data/lib/rspecq.rb +21 -0
- data/lib/rspecq/formatters/example_count_recorder.rb +15 -0
- data/lib/rspecq/formatters/failure_recorder.rb +50 -0
- data/lib/rspecq/formatters/job_timing_recorder.rb +14 -0
- data/lib/rspecq/formatters/worker_heartbeat_recorder.rb +17 -0
- data/lib/rspecq/queue.rb +288 -0
- data/lib/rspecq/reporter.rb +95 -0
- data/lib/rspecq/version.rb +3 -0
- data/lib/rspecq/worker.rb +185 -0
- metadata +98 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: a1c9e27a7a39ff772ee8f4303d26af799f4c7f20232cdc8729f3fa1ddcc4c144
+  data.tar.gz: 2dc1200b575b95b10f2dca4e4a1f3ba90e1577a6af4fd177691aece592249ed6
+SHA512:
+  metadata.gz: c7654d037340e28e5ed31dfbed7826e30b84a2e092f930df13e76d92d0513c6ef2d25727c18bb7b84ad89675835e28e848c85f5b010ab13384674e2c2763f06f
+  data.tar.gz: 2b7421273d4b38848e8110526fb29740f67f9a620d2f205a612887503ce3b04463fafb9eff0dd9845c1eb56c4810fca97eed37761a6a23fa9fc39ab962d5373b
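To check a downloaded copy of this release against the digests above, one can unpack the `.gem` archive (a plain tar containing `metadata.gz` and `data.tar.gz`) and hash its members. A minimal sketch, assuming `gem`, `tar` and `sha256sum` are available locally:

```shell
# Fetch exactly this release and pull the two hashed members out of the .gem tar.
gem fetch rspecq --version 0.0.1.pre1
tar -xf rspecq-0.0.1.pre1.gem metadata.gz data.tar.gz

# The output should match the SHA256 entries in checksums.yaml above
# (use sha512sum for the SHA512 entries).
sha256sum metadata.gz data.tar.gz
```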
data/CHANGELOG.md
ADDED
data/LICENSE
ADDED
@@ -0,0 +1,20 @@
+The MIT License
+
+Copyright (c) 2020 Skroutz S.A.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
+the Software, and to permit persons to whom the Software is furnished to do so,
+subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
+COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
+IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,103 @@
+# RSpecQ
+
+RSpecQ (`rspecq`) distributes and executes an RSpec suite over many workers,
+using a centralized queue backed by Redis.
+
+RSpecQ is heavily inspired by [test-queue](https://github.com/tmm1/test-queue)
+and [ci-queue](https://github.com/Shopify/ci-queue).
+
+## Why don't you just use ci-queue?
+
+While evaluating ci-queue for our RSpec suite, we observed slow boot times
+in the workers (up to 3 minutes), increased memory consumption and too much
+disk I/O on boot. This is due to the fact that a worker in ci-queue has to
+load every spec file on boot. This can be problematic for applications with
+a large number of spec files.
+
+RSpecQ works with spec files as its unit of work (as opposed to ci-queue, which
+works with individual examples). This means that an RSpecQ worker does not
+have to load all spec files at once and so it doesn't have the aforementioned
+problems. It also allows suites to keep using `before(:all)` hooks
+(which ci-queue explicitly rejects). (Note: RSpecQ also schedules individual
+examples, but only when this is deemed necessary; see the section
+"Spec file splitting").
+
+We also observed faster build times by scheduling spec files instead of
+individual examples, due to far fewer Redis operations.
+
+The downside of this design is that it's more complicated, since the scheduling
+of spec files happens based on timings calculated from previous runs. This
+means that RSpecQ maintains a key with the timing of each job and updates it
+on every run (if the `--timings` option was used). Also, RSpecQ has a "slow
+file threshold" which currently has to be set manually (but this can be
+improved).
+
+*Update*: ci-queue has since deprecated support for RSpec, so there's that.
+
+## Usage
+
+Each worker needs to know the build it will participate in, its name and where
+Redis is located. To start a worker:
+
+```shell
+$ rspecq --build-id=foo --worker-id=worker1 --redis=redis://localhost
+```
+
+To view the progress of the build, use `--report`:
+
+```shell
+$ rspecq --build-id=foo --worker-id=reporter --redis=redis://localhost --report
+```
+
+For detailed info use `--help`.
+
+
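In CI, several workers are typically started against the same build id, one per available node or core. A sketch of such a setup (the worker count and the `$CI_BUILD_ID` variable are assumptions of this example, not something RSpecQ provides):

```shell
# Start four workers that coordinate through the same Redis-backed queue.
for i in 1 2 3 4; do
  rspecq --build-id="$CI_BUILD_ID" --worker-id="worker$i" \
         --redis=redis://localhost &
done

# In a separate step, wait for the queue to drain and print the final report.
rspecq --build-id="$CI_BUILD_ID" --worker-id=reporter \
       --redis=redis://localhost --report

wait
```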
+## How it works
+
+The basic idea is identical to ci-queue, so please refer to its README.
+
+### Terminology
+
+- Job: the smallest unit of work, which is usually a spec file
+  (e.g. `./spec/models/foo_spec.rb`) but can also be an individual example
+  (e.g. `./spec/models/foo_spec.rb[1:2:1]`) if the file is too slow
+- Queue: a collection of Redis-backed structures that hold all the necessary
+  information for RSpecQ to function. This includes timing statistics, jobs to
+  be executed, the failure reports, requeueing statistics and more.
+- Worker: a process that, given a build id, pops jobs of that build off the
+  queue and executes them using RSpec
+- Reporter: a process that, given a build id, waits for the build to finish
+  and prints the summary report (examples executed, build result, failures etc.)
+
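These structures can be inspected directly while a build runs, which is handy when debugging a stuck build. A sketch using `redis-cli` (build id `foo` assumed; the key names follow `lib/rspecq/queue.rb`):

```shell
# Jobs waiting to be picked up (LIST), jobs currently reserved per worker (HASH)
# and jobs already finished (SET).
redis-cli lrange foo:queue:unprocessed 0 -1
redis-cli hgetall foo:queue:running
redis-cli smembers foo:queue:processed

# Recorded failures and per-job requeue counters.
redis-cli hkeys foo:example_failures
redis-cli hgetall foo:requeues
```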
+### Spec file splitting
+
+Very slow files may put a limit on how fast the suite can execute. For example,
+a worker may spend 10 minutes running a single slow file, while all the other
+workers finish after 8 minutes. To overcome this issue, rspecq splits
+files whose execution time is above a certain threshold
+(set with the `--file-split-threshold` option) and instead schedules them as
+individual examples.
+
+In the future, we'd like the slow file threshold to be calculated and set
+dynamically.
+
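In practice the two related flags are combined: `--timings` records per-file durations (typically only on the main branch, since the timings key is global), and `--file-split-threshold` consumes them on later builds. A sketch, with an arbitrary 20-second threshold and `$BUILD` as a placeholder build id:

```shell
# On main-branch builds: record how long each spec file takes.
rspecq --build-id="$BUILD" --worker-id=worker1 --redis=redis://localhost --timings

# On other builds: schedule files known to take longer than 20s
# as individual examples instead of as whole files.
rspecq --build-id="$BUILD" --worker-id=worker1 --redis=redis://localhost \
       --file-split-threshold=20
```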
+### Requeues
+
+As a mitigation measure for flaky tests, if an example fails it will be put
+back into the queue to be picked up by
+another worker. This is repeated up to a certain number of times,
+after which the example is considered a legitimate failure and is printed
+in the final report (`--report`).
+
+### Worker failures
+
+Workers emit a timestamp after each example, as a heartbeat, to denote
+that they're alive and performing jobs. If a worker hasn't reported for
+a given amount of time (see `WORKER_LIVENESS_SEC`), it is considered dead
+and the job it had reserved will be requeued, so that it is picked up by another worker.
+
+This protects us against unrecoverable worker failures (e.g. a segfault).
+
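The heartbeats live in a per-build sorted set scored by the Redis server time, so a stalled worker can also be spotted by hand. A sketch (build id `foo` assumed; key name and the 60-second window taken from the source files below):

```shell
# Last heartbeat (Redis server time, in seconds) per worker.
redis-cli zrange foo:worker_heartbeats 0 -1 WITHSCORES

# Workers that haven't reported within WORKER_LIVENESS_SEC (60s) count as dead;
# their reserved jobs get requeued by the next healthy worker that checks.
now=$(redis-cli time | head -1)
redis-cli zrangebyscore foo:worker_heartbeats 0 $((now - 60))
```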
+## License
+
+RSpecQ is licensed under MIT. See [LICENSE](LICENSE).
data/bin/rspecq
ADDED
@@ -0,0 +1,67 @@
+#!/usr/bin/env ruby
+require "optionparser"
+require "rspecq"
+
+opts = {}
+OptionParser.new do |o|
+  o.banner = "Usage: #{$PROGRAM_NAME} [opts] [files_or_directories_to_run]"
+
+  o.on("--build-id ID", "A unique identifier denoting the build") do |v|
+    opts[:build_id] = v
+  end
+
+  o.on("--worker-id ID", "A unique identifier denoting the worker") do |v|
+    opts[:worker_id] = v
+  end
+
+  o.on("--redis HOST", "Redis HOST to connect to (default: 127.0.0.1)") do |v|
+    opts[:redis_host] = v || "127.0.0.1"
+  end
+
+  o.on("--timings", "Populate global job timings in Redis") do |v|
+    opts[:timings] = v
+  end
+
+  o.on("--file-split-threshold N", "Split spec files slower than N sec. and " \
+       "schedule them by example (default: 999999)") do |v|
+    opts[:file_split_threshold] = Float(v)
+  end
+
+  o.on("--report", "Do not execute tests but wait until queue is empty and " \
+       "print a report") do |v|
+    opts[:report] = v
+  end
+
+  o.on("--report-timeout N", Integer, "Fail if queue is not empty after " \
+       "N seconds. Only applicable if --report is enabled " \
+       "(default: 3600)") do |v|
+    opts[:report_timeout] = v
+  end
+
+end.parse!
+
+[:build_id, :worker_id].each do |o|
+  raise OptionParser::MissingArgument.new(o) if opts[o].nil?
+end
+
+if opts[:report]
+  reporter = RSpecQ::Reporter.new(
+    build_id: opts[:build_id],
+    worker_id: opts[:worker_id],
+    timeout: opts[:report_timeout] || 3600,
+    redis_host: opts[:redis_host],
+  )
+
+  reporter.report
+else
+  worker = RSpecQ::Worker.new(
+    build_id: opts[:build_id],
+    worker_id: opts[:worker_id],
+    redis_host: opts[:redis_host],
+    files_or_dirs_to_run: ARGV[0] || "spec",
+  )
+
+  worker.populate_timings = opts[:timings]
+  worker.file_split_threshold = opts[:file_split_threshold] || 999999
+  worker.work
+end
data/lib/rspecq.rb
ADDED
@@ -0,0 +1,21 @@
+require "rspec/core"
+
+module RSpecQ
+  MAX_REQUEUES = 3
+
+  # If a worker hasn't executed an RSpec example for more than this time
+  # (in seconds), it is considered dead and its reserved work will be put back
+  # to the queue, to be picked up by another worker.
+  WORKER_LIVENESS_SEC = 60.0
+end
+
+require_relative "rspecq/formatters/example_count_recorder"
+require_relative "rspecq/formatters/failure_recorder"
+require_relative "rspecq/formatters/job_timing_recorder"
+require_relative "rspecq/formatters/worker_heartbeat_recorder"
+
+require_relative "rspecq/queue"
+require_relative "rspecq/reporter"
+require_relative "rspecq/worker"
+
+require_relative "rspecq/version"
data/lib/rspecq/formatters/example_count_recorder.rb
ADDED
@@ -0,0 +1,15 @@
+module RSpecQ
+  module Formatters
+    # Increments the example counter after each job.
+    class ExampleCountRecorder
+      def initialize(queue)
+        @queue = queue
+      end
+
+      def dump_summary(summary)
+        n = summary.examples.count
+        @queue.increment_example_count(n) if n > 0
+      end
+    end
+  end
+end
data/lib/rspecq/formatters/failure_recorder.rb
ADDED
@@ -0,0 +1,50 @@
+module RSpecQ
+  module Formatters
+    class FailureRecorder
+      def initialize(queue, job)
+        @queue = queue
+        @job = job
+        @colorizer = RSpec::Core::Formatters::ConsoleCodes
+        @non_example_error_recorded = false
+      end
+
+      # Here we're notified about errors occurring outside of examples.
+      #
+      # NOTE: Upon such an error, RSpec emits multiple notifications but we only
+      # want the _first_, which is the one that contains the error backtrace.
+      # That's why we have to keep track of whether we've already received the
+      # needed notification and act accordingly.
+      def message(n)
+        if RSpec.world.non_example_failure && !@non_example_error_recorded
+          @queue.record_non_example_error(@job, n.message)
+          @non_example_error_recorded = true
+        end
+      end
+
+      def example_failed(notification)
+        example = notification.example
+
+        if @queue.requeue_job(example.id, MAX_REQUEUES)
+          # HACK: try to avoid picking the job we just requeued; we want it
+          # to be picked up by a different worker
+          sleep 0.5
+          return
+        end
+
+        presenter = RSpec::Core::Formatters::ExceptionPresenter.new(
+          example.exception, example)
+
+        msg = presenter.fully_formatted(nil, @colorizer)
+        msg << "\n"
+        msg << @colorizer.wrap(
+          "bin/rspec #{example.location_rerun_argument}",
+          RSpec.configuration.failure_color)
+
+        msg << @colorizer.wrap(
+          " # #{example.full_description}", RSpec.configuration.detail_color)
+
+        @queue.record_example_failure(notification.example.id, msg)
+      end
+    end
+  end
+end
data/lib/rspecq/formatters/worker_heartbeat_recorder.rb
ADDED
@@ -0,0 +1,17 @@
+module RSpecQ
+  module Formatters
+    # Updates the respective heartbeat key of the worker after each example.
+    #
+    # Refer to the documentation of WORKER_LIVENESS_SEC for more info.
+    class WorkerHeartbeatRecorder
+      def initialize(worker)
+        @worker = worker
+      end
+
+      def example_finished(*)
+        @worker.update_heartbeat
+      end
+    end
+  end
+end
+
data/lib/rspecq/queue.rb
ADDED
@@ -0,0 +1,288 @@
+require "redis"
+
+module RSpecQ
+  class Queue
+    RESERVE_JOB = <<~LUA.freeze
+      local queue = KEYS[1]
+      local queue_running = KEYS[2]
+      local worker_id = ARGV[1]
+
+      local job = redis.call('lpop', queue)
+      if job then
+        redis.call('hset', queue_running, worker_id, job)
+        return job
+      else
+        return nil
+      end
+    LUA
+
+    # Scans for dead workers and puts their reserved jobs back to the queue.
+    REQUEUE_LOST_JOB = <<~LUA.freeze
+      local worker_heartbeats = KEYS[1]
+      local queue_running = KEYS[2]
+      local queue_unprocessed = KEYS[3]
+      local time_now = ARGV[1]
+      local timeout = ARGV[2]
+
+      local dead_workers = redis.call('zrangebyscore', worker_heartbeats, 0, time_now - timeout)
+      for _, worker in ipairs(dead_workers) do
+        local job = redis.call('hget', queue_running, worker)
+        if job then
+          redis.call('lpush', queue_unprocessed, job)
+          redis.call('hdel', queue_running, worker)
+          return job
+        end
+      end
+
+      return nil
+    LUA
+
+    REQUEUE_JOB = <<~LUA.freeze
+      local key_queue_unprocessed = KEYS[1]
+      local key_requeues = KEYS[2]
+      local job = ARGV[1]
+      local max_requeues = ARGV[2]
+
+      local requeued_times = redis.call('hget', key_requeues, job)
+      if requeued_times and requeued_times >= max_requeues then
+        return nil
+      end
+
+      redis.call('lpush', key_queue_unprocessed, job)
+      redis.call('hincrby', key_requeues, job, 1)
+
+      return true
+    LUA
+
+    STATUS_INITIALIZING = "initializing".freeze
+    STATUS_READY = "ready".freeze
+
+    def initialize(build_id, worker_id, redis_host)
+      @build_id = build_id
+      @worker_id = worker_id
+      @redis = Redis.new(host: redis_host, id: worker_id)
+    end
+
+    # NOTE: jobs will be processed from head to tail (lpop)
+    def publish(jobs)
+      @redis.multi do
+        @redis.rpush(key_queue_unprocessed, jobs)
+        @redis.set(key_queue_status, STATUS_READY)
+      end.first
+    end
+
+    def reserve_job
+      @redis.eval(
+        RESERVE_JOB,
+        keys: [
+          key_queue_unprocessed,
+          key_queue_running,
+        ],
+        argv: [@worker_id]
+      )
+    end
+
+    def requeue_lost_job
+      @redis.eval(
+        REQUEUE_LOST_JOB,
+        keys: [
+          key_worker_heartbeats,
+          key_queue_running,
+          key_queue_unprocessed
+        ],
+        argv: [
+          current_time,
+          WORKER_LIVENESS_SEC
+        ]
+      )
+    end
+
+    # NOTE: The same job might happen to be acknowledged more than once, in
+    # the case of requeues.
+    def acknowledge_job(job)
+      @redis.multi do
+        @redis.hdel(key_queue_running, @worker_id)
+        @redis.sadd(key_queue_processed, job)
+      end
+    end
+
+    # Put job at the head of the queue to be re-processed right after, by
+    # another worker. This is a mitigation measure against flaky tests.
+    #
+    # Returns nil if the job hit the requeue limit and therefore was not
+    # requeued and should be considered a failure.
+    def requeue_job(job, max_requeues)
+      return false if max_requeues.zero?
+
+      @redis.eval(
+        REQUEUE_JOB,
+        keys: [key_queue_unprocessed, key_requeues],
+        argv: [job, max_requeues],
+      )
+    end
+
+    def record_example_failure(example_id, message)
+      @redis.hset(key_failures, example_id, message)
+    end
+
+    # For errors that occurred outside of examples (e.g. while loading a spec file)
+    def record_non_example_error(job, message)
+      @redis.hset(key_errors, job, message)
+    end
+
+    def record_timing(job, duration)
+      @redis.zadd(key_timings, duration, job)
+    end
+
+    def record_build_time(duration)
+      @redis.multi do
+        @redis.lpush(key_build_times, Float(duration))
+        @redis.ltrim(key_build_times, 0, 99)
+      end
+    end
+
+    def record_worker_heartbeat
+      @redis.zadd(key_worker_heartbeats, current_time, @worker_id)
+    end
+
+    def increment_example_count(n)
+      @redis.incrby(key_example_count, n)
+    end
+
+    def example_count
+      @redis.get(key_example_count) || 0
+    end
+
+    def processed_jobs_count
+      @redis.scard(key_queue_processed)
+    end
+
+    def become_master
+      @redis.setnx(key_queue_status, STATUS_INITIALIZING)
+    end
+
+    # ordered by execution time desc (slowest are in the head)
+    def timings
+      Hash[@redis.zrevrange(key_timings, 0, -1, withscores: true)]
+    end
+
+    def example_failures
+      @redis.hgetall(key_failures)
+    end
+
+    def non_example_errors
+      @redis.hgetall(key_errors)
+    end
+
+    def exhausted?
+      return false if !published?
+
+      @redis.multi do
+        @redis.llen(key_queue_unprocessed)
+        @redis.hlen(key_queue_running)
+      end.inject(:+).zero?
+    end
+
+    def published?
+      @redis.get(key_queue_status) == STATUS_READY
+    end
+
+    def wait_until_published(timeout=30)
+      (timeout * 10).times do
+        return if published?
+        sleep 0.1
+      end
+
+      raise "Queue not yet published after #{timeout} seconds"
+    end
+
+    def build_successful?
+      exhausted? && example_failures.empty? && non_example_errors.empty?
+    end
+
+    private
+
+    def key(*keys)
+      [@build_id, keys].join(":")
+    end
+
+    # redis: STRING [STATUS_INITIALIZING, STATUS_READY]
+    def key_queue_status
+      key("queue", "status")
+    end
+
+    # redis: LIST<job>
+    def key_queue_unprocessed
+      key("queue", "unprocessed")
+    end
+
+    # redis: HASH<worker_id => job>
+    def key_queue_running
+      key("queue", "running")
+    end
+
+    # redis: SET<job>
+    def key_queue_processed
+      key("queue", "processed")
+    end
+
+    # Contains regular RSpec example failures.
+    #
+    # redis: HASH<example_id => error message>
+    def key_failures
+      key("example_failures")
+    end
+
+    # Contains errors raised outside of RSpec examples
+    # (e.g. a syntax error in spec_helper.rb).
+    #
+    # redis: HASH<job => error message>
+    def key_errors
+      key("errors")
+    end
+
+    # As a mitigation mechanism for flaky tests, we requeue example failures
+    # to be retried by another worker, up to a certain number of times.
+    #
+    # redis: HASH<job => times_retried>
+    def key_requeues
+      key("requeues")
+    end
+
+    # The total number of examples, including those that were requeued.
+    #
+    # redis: STRING<integer>
+    def key_example_count
+      key("example_count")
+    end
+
+    # redis: ZSET<worker_id => timestamp>
+    #
+    # Timestamp of the last example processed by each worker.
+    def key_worker_heartbeats
+      key("worker_heartbeats")
+    end
+
+    # redis: ZSET<job => duration>
+    #
+    # NOTE: This key is not scoped to a build (i.e. shared among all builds),
+    # so be careful to only publish timings from a single branch (e.g. master).
+    # Otherwise, timings won't be accurate.
+    def key_timings
+      "timings"
+    end
+
+    # redis: LIST<duration>
+    #
+    # Last build is at the head of the list.
+    def key_build_times
+      "build_times"
+    end
+
+    # We don't use any Ruby `Time` methods because specs that use timecop in
+    # before(:all) hooks will mess up our times.
+    def current_time
+      @redis.time[0]
+    end
+  end
+end
data/lib/rspecq/reporter.rb
ADDED
@@ -0,0 +1,95 @@
+module RSpecQ
+  class Reporter
+    def initialize(build_id:, worker_id:, timeout:, redis_host:)
+      @build_id = build_id
+      @worker_id = worker_id
+      @timeout = timeout
+      @queue = Queue.new(build_id, worker_id, redis_host)
+
+      # We want feedback to be immediately printed to CI users, so
+      # we disable buffering.
+      STDOUT.sync = true
+    end
+
+    def report
+      t = measure_duration { @queue.wait_until_published }
+
+      finished = false
+
+      reported_failures = {}
+      failure_heading_printed = false
+
+      tests_duration = measure_duration do
+        @timeout.times do |i|
+          @queue.example_failures.each do |job, rspec_output|
+            next if reported_failures[job]
+
+            if !failure_heading_printed
+              puts "\nFailures:\n"
+              failure_heading_printed = true
+            end
+
+            reported_failures[job] = true
+            puts failure_formatted(rspec_output)
+          end
+
+          if !@queue.exhausted?
+            sleep 1
+            next
+          end
+
+          finished = true
+          break
+        end
+      end
+
+      raise "Build not finished after #{@timeout} seconds" if !finished
+
+      @queue.record_build_time(tests_duration)
+      puts summary(@queue.example_failures, @queue.non_example_errors,
+                   humanize_duration(tests_duration))
+
+      exit 1 if !@queue.build_successful?
+    end
+
+    private
+
+    def measure_duration
+      start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
+      yield
+      (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start).round(2)
+    end
+
+    # We try to keep this output consistent with RSpec's original output
+    def summary(failures, errors, duration)
+      failed_examples_section = "\nFailed examples:\n\n"
+
+      failures.each do |_job, msg|
+        parts = msg.split("\n")
+        failed_examples_section << "  #{parts[-1]}\n"
+      end
+
+      summary = ""
+      summary << failed_examples_section if !failures.empty?
+
+      errors.each { |_job, msg| summary << msg }
+
+      summary << "\n"
+      summary << "Total results:\n"
+      summary << "  #{@queue.example_count} examples " \
+                 "(#{@queue.processed_jobs_count} jobs), " \
+                 "#{failures.count} failures, " \
+                 "#{errors.count} errors"
+      summary << "\n\n"
+      summary << "Spec execution time: #{duration}"
+    end
+
+    def failure_formatted(rspec_output)
+      rspec_output.split("\n")[0..-2].join("\n")
+    end
+
+    def humanize_duration(seconds)
+      Time.at(seconds).utc.strftime("%H:%M:%S")
+    end
+  end
+end
data/lib/rspecq/worker.rb
ADDED
@@ -0,0 +1,185 @@
+require "json"
+require "pp"
+
+module RSpecQ
+  class Worker
+    HEARTBEAT_FREQUENCY = WORKER_LIVENESS_SEC / 6
+
+    # If true, job timings will be populated in the global Redis timings key
+    #
+    # Defaults to false
+    attr_accessor :populate_timings
+
+    # If set, spec files that are known to take more than this value to finish,
+    # will be split and scheduled on a per-example basis.
+    attr_accessor :file_split_threshold
+
+    def initialize(build_id:, worker_id:, redis_host:, files_or_dirs_to_run:)
+      @build_id = build_id
+      @worker_id = worker_id
+      @queue = Queue.new(build_id, worker_id, redis_host)
+      @files_or_dirs_to_run = files_or_dirs_to_run
+      @populate_timings = false
+      @file_split_threshold = 999999
+
+      RSpec::Core::Formatters.register(Formatters::JobTimingRecorder, :dump_summary)
+      RSpec::Core::Formatters.register(Formatters::ExampleCountRecorder, :dump_summary)
+      RSpec::Core::Formatters.register(Formatters::FailureRecorder, :example_failed, :message)
+      RSpec::Core::Formatters.register(Formatters::WorkerHeartbeatRecorder, :example_finished)
+    end
+
+    def work
+      puts "Working for build #{@build_id} (worker=#{@worker_id})"
+
+      try_publish_queue!(@queue)
+      @queue.wait_until_published
+
+      loop do
+        # we have to bootstrap this so that it can be used in the first call
+        # to `requeue_lost_job` inside the work loop
+        update_heartbeat
+
+        lost = @queue.requeue_lost_job
+        puts "Requeued lost job: #{lost}" if lost
+
+        # TODO: can we make `reserve_job` also act like exhausted? and get
+        # rid of `exhausted?` (i.e. return false if no jobs remain)
+        job = @queue.reserve_job
+
+        # build is finished
+        return if job.nil? && @queue.exhausted?
+
+        next if job.nil?
+
+        puts
+        puts "Executing #{job}"
+
+        reset_rspec_state!
+
+        # reconfigure rspec
+        RSpec.configuration.detail_color = :magenta
+        RSpec.configuration.seed = srand && srand % 0xFFFF
+        RSpec.configuration.backtrace_formatter.filter_gem('rspecq')
+        RSpec.configuration.add_formatter(Formatters::FailureRecorder.new(@queue, job))
+        RSpec.configuration.add_formatter(Formatters::ExampleCountRecorder.new(@queue))
+        RSpec.configuration.add_formatter(Formatters::WorkerHeartbeatRecorder.new(self))
+
+        if populate_timings
+          RSpec.configuration.add_formatter(Formatters::JobTimingRecorder.new(@queue, job))
+        end
+
+        opts = RSpec::Core::ConfigurationOptions.new(["--format", "progress", job])
+        _result = RSpec::Core::Runner.new(opts).run($stderr, $stdout)
+
+        @queue.acknowledge_job(job)
+      end
+    end
+
+    # Update the worker heartbeat if necessary
+    def update_heartbeat
+      if @heartbeat_updated_at.nil? || elapsed(@heartbeat_updated_at) >= HEARTBEAT_FREQUENCY
+        @queue.record_worker_heartbeat
+        @heartbeat_updated_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
+      end
+    end
+
+    private
+
+    def reset_rspec_state!
+      RSpec.clear_examples
+
+      # TODO: remove after https://github.com/rspec/rspec-core/pull/2723
+      RSpec.world.instance_variable_set(:@example_group_counts_by_spec_file, Hash.new(0))
+
+      # RSpec.clear_examples does not reset those, which causes issues when
+      # a non-example error occurs (subsequent jobs are not executed)
+      # TODO: upstream
+      RSpec.world.non_example_failure = false
+
+      # we don't want an error that occurred outside of the examples (which
+      # would set this to `true`) to stop the worker
+      RSpec.world.wants_to_quit = false
+    end
+
+    def try_publish_queue!(queue)
+      return if !queue.become_master
+
+      RSpec.configuration.files_or_directories_to_run = @files_or_dirs_to_run
+      files_to_run = RSpec.configuration.files_to_run.map { |j| relative_path(j) }
+
+      timings = queue.timings
+      if timings.empty?
+        # TODO: should be a warning reported somewhere (Sentry?)
+        q_size = queue.publish(files_to_run.shuffle)
+        puts "WARNING: No timings found! Published queue in " \
+             "random order (size=#{q_size})"
+        return
+      end
+
+      slow_files = timings.take_while do |_job, duration|
+        duration >= file_split_threshold
+      end.map(&:first) & files_to_run
+
+      if slow_files.any?
+        puts "Slow files (threshold=#{file_split_threshold}): #{slow_files}"
+      end
+
+      # prepare jobs to run
+      jobs = []
+      jobs.concat(files_to_run - slow_files)
+      jobs.concat(files_to_example_ids(slow_files)) if slow_files.any?
+
+      # assign timings to all of them
+      default_timing = timings.values[timings.values.size/2]
+
+      jobs = jobs.each_with_object({}) do |j, h|
+        # heuristic: put untimed jobs in the middle of the queue
+        puts "New/untimed job: #{j}" if timings[j].nil?
+        h[j] = timings[j] || default_timing
+      end
+
+      # finally, sort them based on their timing (slowest first)
+      jobs = jobs.sort_by { |_j, t| -t }.map(&:first)
+
+      puts "Published queue (size=#{queue.publish(jobs)})"
+    end
+
+    # NOTE: RSpec has to load the files before we can split them as individual
+    # examples. In case a file to be split fails to be loaded
+    # (e.g. contains a syntax error), we return the slow files unchanged,
+    # thereby falling back to scheduling them normally.
+    #
+    # Their errors will be reported in the normal flow, when they're picked up
+    # as jobs by a worker.
+    def files_to_example_ids(files)
+      # TODO: do this programmatically
+      cmd = "DISABLE_SPRING=1 bin/rspec --dry-run --format json #{files.join(' ')}"
+      out = `#{cmd}`
+
+      if !$?.success?
+        # TODO: emit warning to Sentry
+        puts "WARNING: Error splitting slow files; falling back to regular scheduling:"
+
+        begin
+          pp JSON.parse(out)
+        rescue JSON::ParserError
+          puts out
+        end
+        puts
+
+        return files
+      end
+
+      JSON.parse(out)["examples"].map { |e| e["id"] }
+    end
+
+    def relative_path(job)
+      @cwd ||= Pathname.new(Dir.pwd)
+      "./#{Pathname.new(job).relative_path_from(@cwd)}"
+    end
+
+    def elapsed(since)
+      Process.clock_gettime(Process::CLOCK_MONOTONIC) - since
+    end
+  end
+end
metadata
ADDED
@@ -0,0 +1,98 @@
+--- !ruby/object:Gem::Specification
+name: rspecq
+version: !ruby/object:Gem::Version
+  version: 0.0.1.pre1
+platform: ruby
+authors:
+- Agis Anastasopoulos
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2020-06-26 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: rspec-core
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: minitest
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '5.14'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '5.14'
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+description:
+email: agis.anast@gmail.com
+executables:
+- rspecq
+extensions: []
+extra_rdoc_files: []
+files:
+- CHANGELOG.md
+- LICENSE
+- README.md
+- bin/rspecq
+- lib/rspecq.rb
+- lib/rspecq/formatters/example_count_recorder.rb
+- lib/rspecq/formatters/failure_recorder.rb
+- lib/rspecq/formatters/job_timing_recorder.rb
+- lib/rspecq/formatters/worker_heartbeat_recorder.rb
+- lib/rspecq/queue.rb
+- lib/rspecq/reporter.rb
+- lib/rspecq/version.rb
+- lib/rspecq/worker.rb
+homepage: https://github.com/skroutz/rspecq
+licenses:
+- MIT
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">"
+    - !ruby/object:Gem::Version
+      version: 1.3.1
+requirements: []
+rubygems_version: 3.1.2
+signing_key:
+specification_version: 4
+summary: Distribute an RSpec suite among many workers
+test_files: []