rspecq 0.0.1.pre1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/CHANGELOG.md +4 -0
- data/LICENSE +20 -0
- data/README.md +103 -0
- data/bin/rspecq +67 -0
- data/lib/rspecq.rb +21 -0
- data/lib/rspecq/formatters/example_count_recorder.rb +15 -0
- data/lib/rspecq/formatters/failure_recorder.rb +50 -0
- data/lib/rspecq/formatters/job_timing_recorder.rb +14 -0
- data/lib/rspecq/formatters/worker_heartbeat_recorder.rb +17 -0
- data/lib/rspecq/queue.rb +288 -0
- data/lib/rspecq/reporter.rb +95 -0
- data/lib/rspecq/version.rb +3 -0
- data/lib/rspecq/worker.rb +185 -0
- metadata +98 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
---
SHA256:
  metadata.gz: a1c9e27a7a39ff772ee8f4303d26af799f4c7f20232cdc8729f3fa1ddcc4c144
  data.tar.gz: 2dc1200b575b95b10f2dca4e4a1f3ba90e1577a6af4fd177691aece592249ed6
SHA512:
  metadata.gz: c7654d037340e28e5ed31dfbed7826e30b84a2e092f930df13e76d92d0513c6ef2d25727c18bb7b84ad89675835e28e848c85f5b010ab13384674e2c2763f06f
  data.tar.gz: 2b7421273d4b38848e8110526fb29740f67f9a620d2f205a612887503ce3b04463fafb9eff0dd9845c1eb56c4810fca97eed37761a6a23fa9fc39ab962d5373b
data/CHANGELOG.md
ADDED
data/LICENSE
ADDED
@@ -0,0 +1,20 @@
The MIT License

Copyright (c) 2020 Skroutz S.A.

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,103 @@
# RSpecQ

RSpecQ (`rspecq`) distributes and executes an RSpec suite over many workers,
using a centralized queue backed by Redis.

RSpecQ is heavily inspired by [test-queue](https://github.com/tmm1/test-queue)
and [ci-queue](https://github.com/Shopify/ci-queue).

## Why don't you just use ci-queue?

While evaluating ci-queue for our RSpec suite, we observed slow boot times in
the workers (up to 3 minutes), increased memory consumption and excessive disk
I/O on boot. This is because a ci-queue worker has to load every spec file on
boot, which can be problematic for applications with a large number of spec
files.

RSpecQ uses spec files as its unit of work (as opposed to ci-queue, which works
with individual examples). This means that an RSpecQ worker does not have to
load all spec files at once, so it avoids the aforementioned problems. It also
allows suites to keep using `before(:all)` hooks (which ci-queue explicitly
rejects). (Note: RSpecQ also schedules individual examples, but only when this
is deemed necessary; see the "Spec file splitting" section.)

We also observed faster build times by scheduling spec files instead of
individual examples, thanks to far fewer Redis operations.

The downside of this design is that it's more complicated, since spec files are
scheduled based on timings calculated from previous runs. This means that
RSpecQ maintains a key with the timing of each job and updates it on every run
(if the `--timings` option was used). Also, RSpecQ has a "slow file threshold"
which currently has to be set manually (but this can be improved).

*Update*: ci-queue deprecated support for RSpec, so there's that.

## Usage

Each worker needs to know the build it will participate in, its name and where
Redis is located. To start a worker:

```shell
$ rspecq --build-id=foo --worker-id=worker1 --redis=redis://localhost
```

To view the progress of the build, use `--report`:

```shell
$ rspecq --build-id=foo --worker-id=reporter --redis=redis://localhost --report
```

For detailed info use `--help`.
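In CI you would typically start several such workers in parallel, all sharing
the same build id, plus one `--report` process. The snippet below is only an
illustrative sketch and not part of the gem; the build id, worker count and
Redis URL are placeholder assumptions.

```ruby
# Illustrative sketch: launch N workers for the same build and wait for them.
# BUILD_ID, WORKER_COUNT and the Redis URL are hypothetical placeholder values.
build_id     = ENV.fetch("BUILD_ID", "build-123")
worker_count = Integer(ENV.fetch("WORKER_COUNT", "4"))

pids = worker_count.times.map do |i|
  Process.spawn(
    "rspecq",
    "--build-id=#{build_id}",
    "--worker-id=worker#{i}",
    "--redis=redis://localhost"
  )
end

# A non-zero exit status here means a worker process itself crashed; individual
# example failures are surfaced by the separate --report process.
statuses = pids.map { |pid| Process.wait2(pid).last }
exit(1) unless statuses.all?(&:success?)
```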
## How it works

The basic idea is identical to ci-queue, so please refer to its README.

### Terminology

- Job: the smallest unit of work, which is usually a spec file
  (e.g. `./spec/models/foo_spec.rb`) but can also be an individual example
  (e.g. `./spec/models/foo_spec.rb[1:2:1]`) if the file is too slow
- Queue: a collection of Redis-backed structures that hold all the necessary
  information for RSpecQ to function. This includes timing statistics, jobs to
  be executed, the failure reports, requeueing statistics and more.
- Worker: a process that, given a build id, pops jobs of that build and
  executes them using RSpec
- Reporter: a process that, given a build id, waits for the build to finish
  and prints the summary report (examples executed, build result, failures etc.)
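To make the terminology concrete, here is a condensed sketch of how a worker
drives the queue (adapted from `lib/rspecq/worker.rb` further down in this
diff; formatter setup and error handling are omitted):

```ruby
# Condensed worker loop (see Worker#work in lib/rspecq/worker.rb).
queue = RSpecQ::Queue.new("foo", "worker1", "localhost")
queue.wait_until_published  # wait until the job list has been published

loop do
  queue.record_worker_heartbeat
  queue.requeue_lost_job                  # reclaim jobs reserved by dead workers

  job = queue.reserve_job                 # e.g. "./spec/models/foo_spec.rb"
  break if job.nil? && queue.exhausted?   # build is finished
  next if job.nil?

  # ...execute the job with RSpec, recording failures and timings...
  queue.acknowledge_job(job)
end
```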
### Spec file splitting

Very slow files may limit how fast the suite can execute. For example, a worker
may spend 10 minutes running a single slow file, while all the other workers
finish after 8 minutes. To overcome this issue, RSpecQ splits files whose
execution time is above a certain threshold (set with the
`--file-split-threshold` option) and schedules them as individual examples
instead.

In the future, we'd like the slow threshold to be calculated and set
dynamically.
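Roughly, the selection of files to split looks like the sketch below
(simplified from `Worker#try_publish_queue!` in `lib/rspecq/worker.rb`).
`timings` maps each job to its last recorded duration (slowest first) and
`files_to_run` is the list of candidate spec files:

```ruby
# Simplified sketch of the file-splitting decision.
threshold = 60.0  # seconds; in practice set via --file-split-threshold

slow_files = timings.take_while { |_job, duration| duration >= threshold }
                    .map(&:first) & files_to_run

jobs  = files_to_run - slow_files          # fast files are scheduled whole
jobs += files_to_example_ids(slow_files)   # slow files become example ids
```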
### Requeues

As a mitigation measure for flaky tests, if an example fails it is put back to
the queue to be picked up by another worker. This is repeated up to a certain
number of times, after which the example is considered a legitimate failure and
is printed in the final report (`--report`).
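Conceptually, the requeue bookkeeping boils down to the non-atomic sketch
below; the real implementation is the `REQUEUE_JOB` Lua script in
`lib/rspecq/queue.rb`, and `unprocessed` and `requeues` here stand in for the
corresponding Redis structures:

```ruby
# Non-atomic sketch of requeueing a failed example (see REQUEUE_JOB in queue.rb).
# `unprocessed` is an Array standing in for the Redis list; `requeues` is a
# Hash of job => times requeued so far.
def requeue_job(job, unprocessed, requeues, max_requeues)
  return false if requeues.fetch(job, 0) >= max_requeues

  unprocessed.unshift(job)  # put it at the head of the queue, to be retried soon
  requeues[job] = requeues.fetch(job, 0) + 1
  true
end
```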
### Worker failures

Workers emit a timestamp after each example, as a heartbeat, to denote that
they're alive and performing jobs. If a worker hasn't reported for a given
amount of time (see `WORKER_LIVENESS_SEC`), it is considered dead and the job
it had reserved is requeued, so that it is picked up by another worker.

This protects us against unrecoverable worker failures (e.g. segfault).
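The liveness check itself is just a timestamp comparison against the shared
heartbeats structure; a rough, non-atomic equivalent of the `REQUEUE_LOST_JOB`
Lua script in `lib/rspecq/queue.rb` would be:

```ruby
# Rough sketch of the dead-worker scan. `heartbeats` stands in for the Redis
# sorted set of worker_id => last heartbeat timestamp (epoch seconds).
def dead_workers(heartbeats, now, liveness = 60.0) # WORKER_LIVENESS_SEC
  heartbeats.select { |_worker_id, seen_at| now - seen_at >= liveness }.keys
end
```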
## License

RSpecQ is licensed under MIT. See [LICENSE](LICENSE).
data/bin/rspecq
ADDED
@@ -0,0 +1,67 @@
#!/usr/bin/env ruby
require "optionparser"
require "rspecq"

opts = {}
OptionParser.new do |o|
  o.banner = "Usage: #{$PROGRAM_NAME} [opts] [files_or_directories_to_run]"

  o.on("--build-id ID", "A unique identifier denoting the build") do |v|
    opts[:build_id] = v
  end

  o.on("--worker-id ID", "A unique identifier denoting the worker") do |v|
    opts[:worker_id] = v
  end

  o.on("--redis HOST", "Redis HOST to connect to (default: 127.0.0.1)") do |v|
    opts[:redis_host] = v || "127.0.0.1"
  end

  o.on("--timings", "Populate global job timings in Redis") do |v|
    opts[:timings] = v
  end

  o.on("--file-split-threshold N", "Split spec files slower than N sec. and " \
       "schedule them by example (default: 999999)") do |v|
    opts[:file_split_threshold] = Float(v)
  end

  o.on("--report", "Do not execute tests but wait until queue is empty and " \
       "print a report") do |v|
    opts[:report] = v
  end

  o.on("--report-timeout N", Integer, "Fail if queue is not empty after " \
       "N seconds. Only applicable if --report is enabled " \
       "(default: 3600)") do |v|
    opts[:report_timeout] = v
  end
end.parse!

[:build_id, :worker_id].each do |o|
  raise OptionParser::MissingArgument.new(o) if opts[o].nil?
end

if opts[:report]
  reporter = RSpecQ::Reporter.new(
    build_id: opts[:build_id],
    worker_id: opts[:worker_id],
    timeout: opts[:report_timeout] || 3600,
    redis_host: opts[:redis_host],
  )

  reporter.report
else
  worker = RSpecQ::Worker.new(
    build_id: opts[:build_id],
    worker_id: opts[:worker_id],
    redis_host: opts[:redis_host],
    files_or_dirs_to_run: ARGV[0] || "spec",
  )

  worker.populate_timings = opts[:timings]
  worker.file_split_threshold = opts[:file_split_threshold] || 999999
  worker.work
end
data/lib/rspecq.rb
ADDED
@@ -0,0 +1,21 @@
require "rspec/core"

module RSpecQ
  MAX_REQUEUES = 3

  # If a worker hasn't executed an RSpec example for more than this time
  # (in seconds), it is considered dead and its reserved work will be put back
  # to the queue, to be picked up by another worker.
  WORKER_LIVENESS_SEC = 60.0
end

require_relative "rspecq/formatters/example_count_recorder"
require_relative "rspecq/formatters/failure_recorder"
require_relative "rspecq/formatters/job_timing_recorder"
require_relative "rspecq/formatters/worker_heartbeat_recorder"

require_relative "rspecq/queue"
require_relative "rspecq/reporter"
require_relative "rspecq/worker"

require_relative "rspecq/version"

data/lib/rspecq/formatters/example_count_recorder.rb
ADDED
@@ -0,0 +1,15 @@
module RSpecQ
  module Formatters
    # Increments the example counter after each job.
    class ExampleCountRecorder
      def initialize(queue)
        @queue = queue
      end

      def dump_summary(summary)
        n = summary.examples.count
        @queue.increment_example_count(n) if n > 0
      end
    end
  end
end

data/lib/rspecq/formatters/failure_recorder.rb
ADDED
@@ -0,0 +1,50 @@
module RSpecQ
  module Formatters
    class FailureRecorder
      def initialize(queue, job)
        @queue = queue
        @job = job
        @colorizer = RSpec::Core::Formatters::ConsoleCodes
        @non_example_error_recorded = false
      end

      # Here we're notified about errors occurring outside of examples.
      #
      # NOTE: Upon such an error, RSpec emits multiple notifications but we only
      # want the _first_, which is the one that contains the error backtrace.
      # That's why we have to keep track of whether we've already received the
      # needed notification and act accordingly.
      def message(n)
        if RSpec.world.non_example_failure && !@non_example_error_recorded
          @queue.record_non_example_error(@job, n.message)
          @non_example_error_recorded = true
        end
      end

      def example_failed(notification)
        example = notification.example

        if @queue.requeue_job(example.id, MAX_REQUEUES)
          # HACK: try to avoid picking the job we just requeued; we want it
          # to be picked up by a different worker
          sleep 0.5
          return
        end

        presenter = RSpec::Core::Formatters::ExceptionPresenter.new(
          example.exception, example)

        msg = presenter.fully_formatted(nil, @colorizer)
        msg << "\n"
        msg << @colorizer.wrap(
          "bin/rspec #{example.location_rerun_argument}",
          RSpec.configuration.failure_color)

        msg << @colorizer.wrap(
          " # #{example.full_description}", RSpec.configuration.detail_color)

        @queue.record_example_failure(notification.example.id, msg)
      end
    end
  end
end

data/lib/rspecq/formatters/worker_heartbeat_recorder.rb
ADDED
@@ -0,0 +1,17 @@
module RSpecQ
  module Formatters
    # Updates the respective heartbeat key of the worker after each example.
    #
    # Refer to the documentation of WORKER_LIVENESS_SEC for more info.
    class WorkerHeartbeatRecorder
      def initialize(worker)
        @worker = worker
      end

      def example_finished(*)
        @worker.update_heartbeat
      end
    end
  end
end
data/lib/rspecq/queue.rb
ADDED
@@ -0,0 +1,288 @@
require "redis"

module RSpecQ
  class Queue
    RESERVE_JOB = <<~LUA.freeze
      local queue = KEYS[1]
      local queue_running = KEYS[2]
      local worker_id = ARGV[1]

      local job = redis.call('lpop', queue)
      if job then
        redis.call('hset', queue_running, worker_id, job)
        return job
      else
        return nil
      end
    LUA

    # Scans for dead workers and puts their reserved jobs back to the queue.
    REQUEUE_LOST_JOB = <<~LUA.freeze
      local worker_heartbeats = KEYS[1]
      local queue_running = KEYS[2]
      local queue_unprocessed = KEYS[3]
      local time_now = ARGV[1]
      local timeout = ARGV[2]

      local dead_workers = redis.call('zrangebyscore', worker_heartbeats, 0, time_now - timeout)
      for _, worker in ipairs(dead_workers) do
        local job = redis.call('hget', queue_running, worker)
        if job then
          redis.call('lpush', queue_unprocessed, job)
          redis.call('hdel', queue_running, worker)
          return job
        end
      end

      return nil
    LUA

    REQUEUE_JOB = <<~LUA.freeze
      local key_queue_unprocessed = KEYS[1]
      local key_requeues = KEYS[2]
      local job = ARGV[1]
      local max_requeues = ARGV[2]

      local requeued_times = redis.call('hget', key_requeues, job)
      if requeued_times and requeued_times >= max_requeues then
        return nil
      end

      redis.call('lpush', key_queue_unprocessed, job)
      redis.call('hincrby', key_requeues, job, 1)

      return true
    LUA

    STATUS_INITIALIZING = "initializing".freeze
    STATUS_READY = "ready".freeze

    def initialize(build_id, worker_id, redis_host)
      @build_id = build_id
      @worker_id = worker_id
      @redis = Redis.new(host: redis_host, id: worker_id)
    end

    # NOTE: jobs will be processed from head to tail (lpop)
    def publish(jobs)
      @redis.multi do
        @redis.rpush(key_queue_unprocessed, jobs)
        @redis.set(key_queue_status, STATUS_READY)
      end.first
    end

    def reserve_job
      @redis.eval(
        RESERVE_JOB,
        keys: [
          key_queue_unprocessed,
          key_queue_running,
        ],
        argv: [@worker_id]
      )
    end

    def requeue_lost_job
      @redis.eval(
        REQUEUE_LOST_JOB,
        keys: [
          key_worker_heartbeats,
          key_queue_running,
          key_queue_unprocessed
        ],
        argv: [
          current_time,
          WORKER_LIVENESS_SEC
        ]
      )
    end

    # NOTE: The same job might happen to be acknowledged more than once, in
    # the case of requeues.
    def acknowledge_job(job)
      @redis.multi do
        @redis.hdel(key_queue_running, @worker_id)
        @redis.sadd(key_queue_processed, job)
      end
    end

    # Put job at the head of the queue to be re-processed right after, by
    # another worker. This is a mitigation measure against flaky tests.
    #
    # Returns nil if the job hit the requeue limit and therefore was not
    # requeued and should be considered a failure.
    def requeue_job(job, max_requeues)
      return false if max_requeues.zero?

      @redis.eval(
        REQUEUE_JOB,
        keys: [key_queue_unprocessed, key_requeues],
        argv: [job, max_requeues],
      )
    end

    def record_example_failure(example_id, message)
      @redis.hset(key_failures, example_id, message)
    end

    # For errors occurred outside of examples (e.g. while loading a spec file)
    def record_non_example_error(job, message)
      @redis.hset(key_errors, job, message)
    end

    def record_timing(job, duration)
      @redis.zadd(key_timings, duration, job)
    end

    def record_build_time(duration)
      @redis.multi do
        @redis.lpush(key_build_times, Float(duration))
        @redis.ltrim(key_build_times, 0, 99)
      end
    end

    def record_worker_heartbeat
      @redis.zadd(key_worker_heartbeats, current_time, @worker_id)
    end

    def increment_example_count(n)
      @redis.incrby(key_example_count, n)
    end

    def example_count
      @redis.get(key_example_count) || 0
    end

    def processed_jobs_count
      @redis.scard(key_queue_processed)
    end

    def become_master
      @redis.setnx(key_queue_status, STATUS_INITIALIZING)
    end

    # ordered by execution time desc (slowest are in the head)
    def timings
      Hash[@redis.zrevrange(key_timings, 0, -1, withscores: true)]
    end

    def example_failures
      @redis.hgetall(key_failures)
    end

    def non_example_errors
      @redis.hgetall(key_errors)
    end

    def exhausted?
      return false if !published?

      @redis.multi do
        @redis.llen(key_queue_unprocessed)
        @redis.hlen(key_queue_running)
      end.inject(:+).zero?
    end

    def published?
      @redis.get(key_queue_status) == STATUS_READY
    end

    def wait_until_published(timeout=30)
      (timeout * 10).times do
        return if published?
        sleep 0.1
      end

      raise "Queue not yet published after #{timeout} seconds"
    end

    def build_successful?
      exhausted? && example_failures.empty? && non_example_errors.empty?
    end

    private

    def key(*keys)
      [@build_id, keys].join(":")
    end

    # redis: STRING [STATUS_INITIALIZING, STATUS_READY]
    def key_queue_status
      key("queue", "status")
    end

    # redis: LIST<job>
    def key_queue_unprocessed
      key("queue", "unprocessed")
    end

    # redis: HASH<worker_id => job>
    def key_queue_running
      key("queue", "running")
    end

    # redis: SET<job>
    def key_queue_processed
      key("queue", "processed")
    end

    # Contains regular RSpec example failures.
    #
    # redis: HASH<example_id => error message>
    def key_failures
      key("example_failures")
    end

    # Contains errors raised outside of RSpec examples
    # (e.g. a syntax error in spec_helper.rb).
    #
    # redis: HASH<job => error message>
    def key_errors
      key("errors")
    end

    # As a mitigation mechanism for flaky tests, we requeue example failures
    # to be retried by another worker, up to a certain number of times.
    #
    # redis: HASH<job => times_retried>
    def key_requeues
      key("requeues")
    end

    # The total number of examples, including those that were requeued.
    #
    # redis: STRING<integer>
    def key_example_count
      key("example_count")
    end

    # redis: ZSET<worker_id => timestamp>
    #
    # Timestamp of the last example processed by each worker.
    def key_worker_heartbeats
      key("worker_heartbeats")
    end

    # redis: ZSET<job => duration>
    #
    # NOTE: This key is not scoped to a build (i.e. shared among all builds),
    # so be careful to only publish timings from a single branch (e.g. master).
    # Otherwise, timings won't be accurate.
    def key_timings
      "timings"
    end

    # redis: LIST<duration>
    #
    # Last build is at the head of the list.
    def key_build_times
      "build_times"
    end

    # We don't use any Ruby `Time` methods because specs that use timecop in
    # before(:all) hooks will mess up our times.
    def current_time
      @redis.time[0]
    end
  end
end

data/lib/rspecq/reporter.rb
ADDED
|
module RSpecQ
  class Reporter
    def initialize(build_id:, worker_id:, timeout:, redis_host:)
      @build_id = build_id
      @worker_id = worker_id
      @timeout = timeout
      @queue = Queue.new(build_id, worker_id, redis_host)

      # We want feedback to be immediately printed to CI users, so
      # we disable buffering.
      STDOUT.sync = true
    end

    def report
      t = measure_duration { @queue.wait_until_published }

      finished = false

      reported_failures = {}
      failure_heading_printed = false

      tests_duration = measure_duration do
        @timeout.times do |i|
          @queue.example_failures.each do |job, rspec_output|
            next if reported_failures[job]

            if !failure_heading_printed
              puts "\nFailures:\n"
              failure_heading_printed = true
            end

            reported_failures[job] = true
            puts failure_formatted(rspec_output)
          end

          if !@queue.exhausted?
            sleep 1
            next
          end

          finished = true
          break
        end
      end

      raise "Build not finished after #{@timeout} seconds" if !finished

      @queue.record_build_time(tests_duration)
      puts summary(@queue.example_failures, @queue.non_example_errors,
                   humanize_duration(tests_duration))

      exit 1 if !@queue.build_successful?
    end

    private

    def measure_duration
      start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      yield
      (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start).round(2)
    end

    # We try to keep this output consistent with RSpec's original output
    def summary(failures, errors, duration)
      failed_examples_section = "\nFailed examples:\n\n"

      failures.each do |_job, msg|
        parts = msg.split("\n")
        failed_examples_section << "  #{parts[-1]}\n"
      end

      summary = ""
      summary << failed_examples_section if !failures.empty?

      errors.each { |_job, msg| summary << msg }

      summary << "\n"
      summary << "Total results:\n"
      summary << "  #{@queue.example_count} examples " \
                 "(#{@queue.processed_jobs_count} jobs), " \
                 "#{failures.count} failures, " \
                 "#{errors.count} errors"
      summary << "\n\n"
      summary << "Spec execution time: #{duration}"
    end

    def failure_formatted(rspec_output)
      rspec_output.split("\n")[0..-2].join("\n")
    end

    def humanize_duration(seconds)
      Time.at(seconds).utc.strftime("%H:%M:%S")
    end
  end
end

data/lib/rspecq/worker.rb
ADDED
@@ -0,0 +1,185 @@
require "json"
require "pp"

module RSpecQ
  class Worker
    HEARTBEAT_FREQUENCY = WORKER_LIVENESS_SEC / 6

    # If true, job timings will be populated in the global Redis timings key
    #
    # Defaults to false
    attr_accessor :populate_timings

    # If set, spec files that are known to take more than this value to finish,
    # will be split and scheduled on a per-example basis.
    attr_accessor :file_split_threshold

    def initialize(build_id:, worker_id:, redis_host:, files_or_dirs_to_run:)
      @build_id = build_id
      @worker_id = worker_id
      @queue = Queue.new(build_id, worker_id, redis_host)
      @files_or_dirs_to_run = files_or_dirs_to_run
      @populate_timings = false
      @file_split_threshold = 999999

      RSpec::Core::Formatters.register(Formatters::JobTimingRecorder, :dump_summary)
      RSpec::Core::Formatters.register(Formatters::ExampleCountRecorder, :dump_summary)
      RSpec::Core::Formatters.register(Formatters::FailureRecorder, :example_failed, :message)
      RSpec::Core::Formatters.register(Formatters::WorkerHeartbeatRecorder, :example_finished)
    end

    def work
      puts "Working for build #{@build_id} (worker=#{@worker_id})"

      try_publish_queue!(@queue)
      @queue.wait_until_published

      loop do
        # we have to bootstrap this so that it can be used in the first call
        # to `requeue_lost_job` inside the work loop
        update_heartbeat

        lost = @queue.requeue_lost_job
        puts "Requeued lost job: #{lost}" if lost

        # TODO: can we make `reserve_job` also act like exhausted? and get
        # rid of `exhausted?` (i.e. return false if no jobs remain)
        job = @queue.reserve_job

        # build is finished
        return if job.nil? && @queue.exhausted?

        next if job.nil?

        puts
        puts "Executing #{job}"

        reset_rspec_state!

        # reconfigure rspec
        RSpec.configuration.detail_color = :magenta
        RSpec.configuration.seed = srand && srand % 0xFFFF
        RSpec.configuration.backtrace_formatter.filter_gem('rspecq')
        RSpec.configuration.add_formatter(Formatters::FailureRecorder.new(@queue, job))
        RSpec.configuration.add_formatter(Formatters::ExampleCountRecorder.new(@queue))
        RSpec.configuration.add_formatter(Formatters::WorkerHeartbeatRecorder.new(self))

        if populate_timings
          RSpec.configuration.add_formatter(Formatters::JobTimingRecorder.new(@queue, job))
        end

        opts = RSpec::Core::ConfigurationOptions.new(["--format", "progress", job])
        _result = RSpec::Core::Runner.new(opts).run($stderr, $stdout)

        @queue.acknowledge_job(job)
      end
    end

    # Update the worker heartbeat if necessary
    def update_heartbeat
      if @heartbeat_updated_at.nil? || elapsed(@heartbeat_updated_at) >= HEARTBEAT_FREQUENCY
        @queue.record_worker_heartbeat
        @heartbeat_updated_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      end
    end

    private

    def reset_rspec_state!
      RSpec.clear_examples

      # TODO: remove after https://github.com/rspec/rspec-core/pull/2723
      RSpec.world.instance_variable_set(:@example_group_counts_by_spec_file, Hash.new(0))

      # RSpec.clear_examples does not reset those, which causes issues when
      # a non-example error occurs (subsequent jobs are not executed)
      # TODO: upstream
      RSpec.world.non_example_failure = false

      # we don't want an error that occurred outside of the examples (which
      # would set this to `true`) to stop the worker
      RSpec.world.wants_to_quit = false
    end

    def try_publish_queue!(queue)
      return if !queue.become_master

      RSpec.configuration.files_or_directories_to_run = @files_or_dirs_to_run
      files_to_run = RSpec.configuration.files_to_run.map { |j| relative_path(j) }

      timings = queue.timings
      if timings.empty?
        # TODO: should be a warning reported somewhere (Sentry?)
        q_size = queue.publish(files_to_run.shuffle)
        puts "WARNING: No timings found! Published queue in " \
             "random order (size=#{q_size})"
        return
      end

      slow_files = timings.take_while do |_job, duration|
        duration >= file_split_threshold
      end.map(&:first) & files_to_run

      if slow_files.any?
        puts "Slow files (threshold=#{file_split_threshold}): #{slow_files}"
      end

      # prepare jobs to run
      jobs = []
      jobs.concat(files_to_run - slow_files)
      jobs.concat(files_to_example_ids(slow_files)) if slow_files.any?

      # assign timings to all of them
      default_timing = timings.values[timings.values.size/2]

      jobs = jobs.each_with_object({}) do |j, h|
        # heuristic: put untimed jobs in the middle of the queue
        puts "New/untimed job: #{j}" if timings[j].nil?
        h[j] = timings[j] || default_timing
      end

      # finally, sort them based on their timing (slowest first)
      jobs = jobs.sort_by { |_j, t| -t }.map(&:first)

      puts "Published queue (size=#{queue.publish(jobs)})"
    end

    # NOTE: RSpec has to load the files before we can split them into individual
    # examples. In case a file to be split fails to load
    # (e.g. it contains a syntax error), we return the slow files unchanged,
    # thereby falling back to scheduling them normally.
    #
    # Their errors will be reported in the normal flow, when they're picked up
    # as jobs by a worker.
    def files_to_example_ids(files)
      # TODO: do this programmatically
      cmd = "DISABLE_SPRING=1 bin/rspec --dry-run --format json #{files.join(' ')}"
      out = `#{cmd}`

      if !$?.success?
        # TODO: emit warning to Sentry
        puts "WARNING: Error splitting slow files; falling back to regular scheduling:"

        begin
          pp JSON.parse(out)
        rescue JSON::ParserError
          puts out
        end
        puts

        return files
      end

      JSON.parse(out)["examples"].map { |e| e["id"] }
    end

    def relative_path(job)
      @cwd ||= Pathname.new(Dir.pwd)
      "./#{Pathname.new(job).relative_path_from(@cwd)}"
    end

    def elapsed(since)
      Process.clock_gettime(Process::CLOCK_MONOTONIC) - since
    end
  end
end
metadata
ADDED
@@ -0,0 +1,98 @@
--- !ruby/object:Gem::Specification
name: rspecq
version: !ruby/object:Gem::Version
  version: 0.0.1.pre1
platform: ruby
authors:
- Agis Anastasopoulos
autorequire:
bindir: bin
cert_chain: []
date: 2020-06-26 00:00:00.000000000 Z
dependencies:
- !ruby/object:Gem::Dependency
  name: rspec-core
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: '0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: '0'
- !ruby/object:Gem::Dependency
  name: minitest
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '5.14'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '5.14'
- !ruby/object:Gem::Dependency
  name: rake
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: '0'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: '0'
description:
email: agis.anast@gmail.com
executables:
- rspecq
extensions: []
extra_rdoc_files: []
files:
- CHANGELOG.md
- LICENSE
- README.md
- bin/rspecq
- lib/rspecq.rb
- lib/rspecq/formatters/example_count_recorder.rb
- lib/rspecq/formatters/failure_recorder.rb
- lib/rspecq/formatters/job_timing_recorder.rb
- lib/rspecq/formatters/worker_heartbeat_recorder.rb
- lib/rspecq/queue.rb
- lib/rspecq/reporter.rb
- lib/rspecq/version.rb
- lib/rspecq/worker.rb
homepage: https://github.com/skroutz/rspecq
licenses:
- MIT
metadata: {}
post_install_message:
rdoc_options: []
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: '0'
required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">"
    - !ruby/object:Gem::Version
      version: 1.3.1
requirements: []
rubygems_version: 3.1.2
signing_key:
specification_version: 4
summary: Distribute an RSpec suite among many workers
test_files: []