rspecq 0.0.1.pre1
- checksums.yaml +7 -0
- data/CHANGELOG.md +4 -0
- data/LICENSE +20 -0
- data/README.md +103 -0
- data/bin/rspecq +67 -0
- data/lib/rspecq.rb +21 -0
- data/lib/rspecq/formatters/example_count_recorder.rb +15 -0
- data/lib/rspecq/formatters/failure_recorder.rb +50 -0
- data/lib/rspecq/formatters/job_timing_recorder.rb +14 -0
- data/lib/rspecq/formatters/worker_heartbeat_recorder.rb +17 -0
- data/lib/rspecq/queue.rb +288 -0
- data/lib/rspecq/reporter.rb +95 -0
- data/lib/rspecq/version.rb +3 -0
- data/lib/rspecq/worker.rb +185 -0
- metadata +98 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: a1c9e27a7a39ff772ee8f4303d26af799f4c7f20232cdc8729f3fa1ddcc4c144
+  data.tar.gz: 2dc1200b575b95b10f2dca4e4a1f3ba90e1577a6af4fd177691aece592249ed6
+SHA512:
+  metadata.gz: c7654d037340e28e5ed31dfbed7826e30b84a2e092f930df13e76d92d0513c6ef2d25727c18bb7b84ad89675835e28e848c85f5b010ab13384674e2c2763f06f
+  data.tar.gz: 2b7421273d4b38848e8110526fb29740f67f9a620d2f205a612887503ce3b04463fafb9eff0dd9845c1eb56c4810fca97eed37761a6a23fa9fc39ab962d5373b
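To check a downloaded copy of this release against the digests above, one can unpack the `.gem` archive (a plain tar containing `metadata.gz` and `data.tar.gz`) and hash its members. A minimal sketch, assuming `gem`, `tar` and `sha256sum` are available locally:

```shell
# Fetch exactly this release and pull the two hashed members out of the .gem tar.
gem fetch rspecq --version 0.0.1.pre1
tar -xf rspecq-0.0.1.pre1.gem metadata.gz data.tar.gz

# The output should match the SHA256 entries in checksums.yaml above
# (use sha512sum for the SHA512 entries).
sha256sum metadata.gz data.tar.gz
```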
data/CHANGELOG.md
ADDED
data/LICENSE
ADDED
@@ -0,0 +1,20 @@
+The MIT License
+
+Copyright (c) 2020 Skroutz S.A.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
+the Software, and to permit persons to whom the Software is furnished to do so,
+subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
+COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
+IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,103 @@
+# RSpecQ
+
+RSpecQ (`rspecq`) distributes and executes an RSpec suite over many workers,
+using a centralized queue backed by Redis.
+
+RSpecQ is heavily inspired by [test-queue](https://github.com/tmm1/test-queue)
+and [ci-queue](https://github.com/Shopify/ci-queue).
+
+## Why don't you just use ci-queue?
+
+While evaluating ci-queue for our RSpec suite, we observed slow boot times
+in the workers (up to 3 minutes), increased memory consumption and too much
+disk I/O on boot. This is due to the fact that a worker in ci-queue has to
+load every spec file on boot. This can be problematic for applications with
+a large number of spec files.
+
+RSpecQ works with spec files as its unit of work (as opposed to ci-queue, which
+works with individual examples). This means that an RSpecQ worker does not
+have to load all spec files at once and so it doesn't have the aforementioned
+problems. It also allows suites to keep using `before(:all)` hooks
+(which ci-queue explicitly rejects). (Note: RSpecQ also schedules individual
+examples, but only when this is deemed necessary; see the section
+"Spec file splitting").
+
+We also observed faster build times by scheduling spec files instead of
+individual examples, due to far fewer Redis operations.
+
+The downside of this design is that it's more complicated, since the scheduling
+of spec files happens based on timings calculated from previous runs. This
+means that RSpecQ maintains a key with the timing of each job and updates it
+on every run (if the `--timings` option was used). Also, RSpecQ has a "slow
+file threshold" which currently has to be set manually (but this can be
+improved).
+
+*Update*: ci-queue has since deprecated support for RSpec, so there's that.
+
+## Usage
+
+Each worker needs to know the build it will participate in, its name and where
+Redis is located. To start a worker:
+
+```shell
+$ rspecq --build-id=foo --worker-id=worker1 --redis=redis://localhost
+```
+
+To view the progress of the build, use `--report`:
+
+```shell
+$ rspecq --build-id=foo --worker-id=reporter --redis=redis://localhost --report
+```
+
+For detailed info use `--help`.
+
+
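In CI, several workers are typically started against the same build id, one per available node or core. A sketch of such a setup (the worker count and the `$CI_BUILD_ID` variable are assumptions of this example, not something RSpecQ provides):

```shell
# Start four workers that coordinate through the same Redis-backed queue.
for i in 1 2 3 4; do
  rspecq --build-id="$CI_BUILD_ID" --worker-id="worker$i" \
         --redis=redis://localhost &
done

# In a separate step, wait for the queue to drain and print the final report.
rspecq --build-id="$CI_BUILD_ID" --worker-id=reporter \
       --redis=redis://localhost --report

wait
```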
+## How it works
+
+The basic idea is identical to ci-queue, so please refer to its README.
+
+### Terminology
+
+- Job: the smallest unit of work, which is usually a spec file
+  (e.g. `./spec/models/foo_spec.rb`) but can also be an individual example
+  (e.g. `./spec/models/foo_spec.rb[1:2:1]`) if the file is too slow
+- Queue: a collection of Redis-backed structures that hold all the necessary
+  information for RSpecQ to function. This includes timing statistics, jobs to
+  be executed, the failure reports, requeueing statistics and more.
+- Worker: a process that, given a build id, pops jobs of that build off the
+  queue and executes them using RSpec
+- Reporter: a process that, given a build id, waits for the build to finish
+  and prints the summary report (examples executed, build result, failures etc.)
+
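These structures can be inspected directly while a build runs, which is handy when debugging a stuck build. A sketch using `redis-cli` (build id `foo` assumed; the key names follow `lib/rspecq/queue.rb`):

```shell
# Jobs waiting to be picked up (LIST), jobs currently reserved per worker (HASH)
# and jobs already finished (SET).
redis-cli lrange foo:queue:unprocessed 0 -1
redis-cli hgetall foo:queue:running
redis-cli smembers foo:queue:processed

# Recorded failures and per-job requeue counters.
redis-cli hkeys foo:example_failures
redis-cli hgetall foo:requeues
```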
+### Spec file splitting
+
+Very slow files may put a limit on how fast the suite can execute. For example,
+a worker may spend 10 minutes running a single slow file, while all the other
+workers finish after 8 minutes. To overcome this issue, rspecq splits
+files whose execution time is above a certain threshold
+(set with the `--file-split-threshold` option) and instead schedules them as
+individual examples.
+
+In the future, we'd like the slow file threshold to be calculated and set
+dynamically.
+
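In practice the two related flags are combined: `--timings` records per-file durations (typically only on the main branch, since the timings key is global), and `--file-split-threshold` consumes them on later builds. A sketch, with an arbitrary 20-second threshold and `$BUILD` as a placeholder build id:

```shell
# On main-branch builds: record how long each spec file takes.
rspecq --build-id="$BUILD" --worker-id=worker1 --redis=redis://localhost --timings

# On other builds: schedule files known to take longer than 20s
# as individual examples instead of as whole files.
rspecq --build-id="$BUILD" --worker-id=worker1 --redis=redis://localhost \
       --file-split-threshold=20
```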
+### Requeues
+
+As a mitigation measure for flaky tests, if an example fails it will be put
+back into the queue to be picked up by
+another worker. This is repeated up to a certain number of times,
+after which the example is considered a legitimate failure and is printed
+in the final report (`--report`).
+
+### Worker failures
+
+Workers emit a timestamp after each example, as a heartbeat, to denote
+that they're alive and performing jobs. If a worker hasn't reported for
+a given amount of time (see `WORKER_LIVENESS_SEC`), it is considered dead
+and the job it had reserved will be requeued, so that it is picked up by another worker.
+
+This protects us against unrecoverable worker failures (e.g. a segfault).
+
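The heartbeats live in a per-build sorted set scored by the Redis server time, so a stalled worker can also be spotted by hand. A sketch (build id `foo` assumed; key name and the 60-second window taken from the source files below):

```shell
# Last heartbeat (Redis server time, in seconds) per worker.
redis-cli zrange foo:worker_heartbeats 0 -1 WITHSCORES

# Workers that haven't reported within WORKER_LIVENESS_SEC (60s) count as dead;
# their reserved jobs get requeued by the next healthy worker that checks.
now=$(redis-cli time | head -1)
redis-cli zrangebyscore foo:worker_heartbeats 0 $((now - 60))
```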
+## License
+
+RSpecQ is licensed under MIT. See [LICENSE](LICENSE).
data/bin/rspecq
ADDED
@@ -0,0 +1,67 @@
+#!/usr/bin/env ruby
+require "optionparser"
+require "rspecq"
+
+opts = {}
+OptionParser.new do |o|
+  o.banner = "Usage: #{$PROGRAM_NAME} [opts] [files_or_directories_to_run]"
+
+  o.on("--build-id ID", "A unique identifier denoting the build") do |v|
+    opts[:build_id] = v
+  end
+
+  o.on("--worker-id ID", "A unique identifier denoting the worker") do |v|
+    opts[:worker_id] = v
+  end
+
+  o.on("--redis HOST", "Redis HOST to connect to (default: 127.0.0.1)") do |v|
+    opts[:redis_host] = v || "127.0.0.1"
+  end
+
+  o.on("--timings", "Populate global job timings in Redis") do |v|
+    opts[:timings] = v
+  end
+
+  o.on("--file-split-threshold N", "Split spec files slower than N sec. and " \
+       "schedule them by example (default: 999999)") do |v|
+    opts[:file_split_threshold] = Float(v)
+  end
+
+  o.on("--report", "Do not execute tests but wait until queue is empty and " \
+       "print a report") do |v|
+    opts[:report] = v
+  end
+
+  o.on("--report-timeout N", Integer, "Fail if queue is not empty after " \
+       "N seconds. Only applicable if --report is enabled " \
+       "(default: 3600)") do |v|
+    opts[:report_timeout] = v
+  end
+
+end.parse!
+
+[:build_id, :worker_id].each do |o|
+  raise OptionParser::MissingArgument.new(o) if opts[o].nil?
+end
+
+if opts[:report]
+  reporter = RSpecQ::Reporter.new(
+    build_id: opts[:build_id],
+    worker_id: opts[:worker_id],
+    timeout: opts[:report_timeout] || 3600,
+    redis_host: opts[:redis_host],
+  )
+
+  reporter.report
+else
+  worker = RSpecQ::Worker.new(
+    build_id: opts[:build_id],
+    worker_id: opts[:worker_id],
+    redis_host: opts[:redis_host],
+    files_or_dirs_to_run: ARGV[0] || "spec",
+  )
+
+  worker.populate_timings = opts[:timings]
+  worker.file_split_threshold = opts[:file_split_threshold] || 999999
+  worker.work
+end
data/lib/rspecq.rb
ADDED
@@ -0,0 +1,21 @@
+require "rspec/core"
+
+module RSpecQ
+  MAX_REQUEUES = 3
+
+  # If a worker hasn't executed an RSpec example for more than this time
+  # (in seconds), it is considered dead and its reserved work will be put back
+  # to the queue, to be picked up by another worker.
+  WORKER_LIVENESS_SEC = 60.0
+end
+
+require_relative "rspecq/formatters/example_count_recorder"
+require_relative "rspecq/formatters/failure_recorder"
+require_relative "rspecq/formatters/job_timing_recorder"
+require_relative "rspecq/formatters/worker_heartbeat_recorder"
+
+require_relative "rspecq/queue"
+require_relative "rspecq/reporter"
+require_relative "rspecq/worker"
+
+require_relative "rspecq/version"
data/lib/rspecq/formatters/example_count_recorder.rb
ADDED
@@ -0,0 +1,15 @@
+module RSpecQ
+  module Formatters
+    # Increments the example counter after each job.
+    class ExampleCountRecorder
+      def initialize(queue)
+        @queue = queue
+      end
+
+      def dump_summary(summary)
+        n = summary.examples.count
+        @queue.increment_example_count(n) if n > 0
+      end
+    end
+  end
+end
data/lib/rspecq/formatters/failure_recorder.rb
ADDED
@@ -0,0 +1,50 @@
+module RSpecQ
+  module Formatters
+    class FailureRecorder
+      def initialize(queue, job)
+        @queue = queue
+        @job = job
+        @colorizer = RSpec::Core::Formatters::ConsoleCodes
+        @non_example_error_recorded = false
+      end
+
+      # Here we're notified about errors occurring outside of examples.
+      #
+      # NOTE: Upon such an error, RSpec emits multiple notifications but we only
+      # want the _first_, which is the one that contains the error backtrace.
+      # That's why we have to keep track of whether we've already received the
+      # needed notification and act accordingly.
+      def message(n)
+        if RSpec.world.non_example_failure && !@non_example_error_recorded
+          @queue.record_non_example_error(@job, n.message)
+          @non_example_error_recorded = true
+        end
+      end
+
+      def example_failed(notification)
+        example = notification.example
+
+        if @queue.requeue_job(example.id, MAX_REQUEUES)
+          # HACK: try to avoid picking the job we just requeued; we want it
+          # to be picked up by a different worker
+          sleep 0.5
+          return
+        end
+
+        presenter = RSpec::Core::Formatters::ExceptionPresenter.new(
+          example.exception, example)
+
+        msg = presenter.fully_formatted(nil, @colorizer)
+        msg << "\n"
+        msg << @colorizer.wrap(
+          "bin/rspec #{example.location_rerun_argument}",
+          RSpec.configuration.failure_color)
+
+        msg << @colorizer.wrap(
+          " # #{example.full_description}", RSpec.configuration.detail_color)
+
+        @queue.record_example_failure(notification.example.id, msg)
+      end
+    end
+  end
+end
data/lib/rspecq/formatters/worker_heartbeat_recorder.rb
ADDED
@@ -0,0 +1,17 @@
+module RSpecQ
+  module Formatters
+    # Updates the respective heartbeat key of the worker after each example.
+    #
+    # Refer to the documentation of WORKER_LIVENESS_SEC for more info.
+    class WorkerHeartbeatRecorder
+      def initialize(worker)
+        @worker = worker
+      end
+
+      def example_finished(*)
+        @worker.update_heartbeat
+      end
+    end
+  end
+end
+
data/lib/rspecq/queue.rb
ADDED
@@ -0,0 +1,288 @@
+require "redis"
+
+module RSpecQ
+  class Queue
+    RESERVE_JOB = <<~LUA.freeze
+      local queue = KEYS[1]
+      local queue_running = KEYS[2]
+      local worker_id = ARGV[1]
+
+      local job = redis.call('lpop', queue)
+      if job then
+        redis.call('hset', queue_running, worker_id, job)
+        return job
+      else
+        return nil
+      end
+    LUA
+
+    # Scans for dead workers and puts their reserved jobs back to the queue.
+    REQUEUE_LOST_JOB = <<~LUA.freeze
+      local worker_heartbeats = KEYS[1]
+      local queue_running = KEYS[2]
+      local queue_unprocessed = KEYS[3]
+      local time_now = ARGV[1]
+      local timeout = ARGV[2]
+
+      local dead_workers = redis.call('zrangebyscore', worker_heartbeats, 0, time_now - timeout)
+      for _, worker in ipairs(dead_workers) do
+        local job = redis.call('hget', queue_running, worker)
+        if job then
+          redis.call('lpush', queue_unprocessed, job)
+          redis.call('hdel', queue_running, worker)
+          return job
+        end
+      end
+
+      return nil
+    LUA
+
+    REQUEUE_JOB = <<~LUA.freeze
+      local key_queue_unprocessed = KEYS[1]
+      local key_requeues = KEYS[2]
+      local job = ARGV[1]
+      local max_requeues = ARGV[2]
+
+      local requeued_times = redis.call('hget', key_requeues, job)
+      if requeued_times and requeued_times >= max_requeues then
+        return nil
+      end
+
+      redis.call('lpush', key_queue_unprocessed, job)
+      redis.call('hincrby', key_requeues, job, 1)
+
+      return true
+    LUA
+
+    STATUS_INITIALIZING = "initializing".freeze
+    STATUS_READY = "ready".freeze
+
+    def initialize(build_id, worker_id, redis_host)
+      @build_id = build_id
+      @worker_id = worker_id
+      @redis = Redis.new(host: redis_host, id: worker_id)
+    end
+
+    # NOTE: jobs will be processed from head to tail (lpop)
+    def publish(jobs)
+      @redis.multi do
+        @redis.rpush(key_queue_unprocessed, jobs)
+        @redis.set(key_queue_status, STATUS_READY)
+      end.first
+    end
+
+    def reserve_job
+      @redis.eval(
+        RESERVE_JOB,
+        keys: [
+          key_queue_unprocessed,
+          key_queue_running,
+        ],
+        argv: [@worker_id]
+      )
+    end
+
+    def requeue_lost_job
+      @redis.eval(
+        REQUEUE_LOST_JOB,
+        keys: [
+          key_worker_heartbeats,
+          key_queue_running,
+          key_queue_unprocessed
+        ],
+        argv: [
+          current_time,
+          WORKER_LIVENESS_SEC
+        ]
+      )
+    end
+
+    # NOTE: The same job might happen to be acknowledged more than once, in
+    # the case of requeues.
+    def acknowledge_job(job)
+      @redis.multi do
+        @redis.hdel(key_queue_running, @worker_id)
+        @redis.sadd(key_queue_processed, job)
+      end
+    end
+
+    # Put job at the head of the queue to be re-processed right after, by
+    # another worker. This is a mitigation measure against flaky tests.
+    #
+    # Returns nil if the job hit the requeue limit and therefore was not
+    # requeued and should be considered a failure.
+    def requeue_job(job, max_requeues)
+      return false if max_requeues.zero?
+
+      @redis.eval(
+        REQUEUE_JOB,
+        keys: [key_queue_unprocessed, key_requeues],
+        argv: [job, max_requeues],
+      )
+    end
+
+    def record_example_failure(example_id, message)
+      @redis.hset(key_failures, example_id, message)
+    end
+
+    # For errors that occurred outside of examples (e.g. while loading a spec file)
+    def record_non_example_error(job, message)
+      @redis.hset(key_errors, job, message)
+    end
+
+    def record_timing(job, duration)
+      @redis.zadd(key_timings, duration, job)
+    end
+
+    def record_build_time(duration)
+      @redis.multi do
+        @redis.lpush(key_build_times, Float(duration))
+        @redis.ltrim(key_build_times, 0, 99)
+      end
+    end
+
+    def record_worker_heartbeat
+      @redis.zadd(key_worker_heartbeats, current_time, @worker_id)
+    end
+
+    def increment_example_count(n)
+      @redis.incrby(key_example_count, n)
+    end
+
+    def example_count
+      @redis.get(key_example_count) || 0
+    end
+
+    def processed_jobs_count
+      @redis.scard(key_queue_processed)
+    end
+
+    def become_master
+      @redis.setnx(key_queue_status, STATUS_INITIALIZING)
+    end
+
+    # ordered by execution time desc (slowest are in the head)
+    def timings
+      Hash[@redis.zrevrange(key_timings, 0, -1, withscores: true)]
+    end
+
+    def example_failures
+      @redis.hgetall(key_failures)
+    end
+
+    def non_example_errors
+      @redis.hgetall(key_errors)
+    end
+
+    def exhausted?
+      return false if !published?
+
+      @redis.multi do
+        @redis.llen(key_queue_unprocessed)
+        @redis.hlen(key_queue_running)
+      end.inject(:+).zero?
+    end
+
+    def published?
+      @redis.get(key_queue_status) == STATUS_READY
+    end
+
+    def wait_until_published(timeout=30)
+      (timeout * 10).times do
+        return if published?
+        sleep 0.1
+      end
+
+      raise "Queue not yet published after #{timeout} seconds"
+    end
+
+    def build_successful?
+      exhausted? && example_failures.empty? && non_example_errors.empty?
+    end
+
+    private
+
+    def key(*keys)
+      [@build_id, keys].join(":")
+    end
+
+    # redis: STRING [STATUS_INITIALIZING, STATUS_READY]
+    def key_queue_status
+      key("queue", "status")
+    end
+
+    # redis: LIST<job>
+    def key_queue_unprocessed
+      key("queue", "unprocessed")
+    end
+
+    # redis: HASH<worker_id => job>
+    def key_queue_running
+      key("queue", "running")
+    end
+
+    # redis: SET<job>
+    def key_queue_processed
+      key("queue", "processed")
+    end
+
+    # Contains regular RSpec example failures.
+    #
+    # redis: HASH<example_id => error message>
+    def key_failures
+      key("example_failures")
+    end
+
+    # Contains errors raised outside of RSpec examples
+    # (e.g. a syntax error in spec_helper.rb).
+    #
+    # redis: HASH<job => error message>
+    def key_errors
+      key("errors")
+    end
+
+    # As a mitigation mechanism for flaky tests, we requeue example failures
+    # to be retried by another worker, up to a certain number of times.
+    #
+    # redis: HASH<job => times_retried>
+    def key_requeues
+      key("requeues")
+    end
+
+    # The total number of examples, including those that were requeued.
+    #
+    # redis: STRING<integer>
+    def key_example_count
+      key("example_count")
+    end
+
+    # redis: ZSET<worker_id => timestamp>
+    #
+    # Timestamp of the last example processed by each worker.
+    def key_worker_heartbeats
+      key("worker_heartbeats")
+    end
+
+    # redis: ZSET<job => duration>
+    #
+    # NOTE: This key is not scoped to a build (i.e. shared among all builds),
+    # so be careful to only publish timings from a single branch (e.g. master).
+    # Otherwise, timings won't be accurate.
+    def key_timings
+      "timings"
+    end
+
+    # redis: LIST<duration>
+    #
+    # Last build is at the head of the list.
+    def key_build_times
+      "build_times"
+    end
+
+    # We don't use any Ruby `Time` methods because specs that use timecop in
+    # before(:all) hooks will mess up our times.
+    def current_time
+      @redis.time[0]
+    end
+  end
+end
data/lib/rspecq/reporter.rb
ADDED
@@ -0,0 +1,95 @@
+module RSpecQ
+  class Reporter
+    def initialize(build_id:, worker_id:, timeout:, redis_host:)
+      @build_id = build_id
+      @worker_id = worker_id
+      @timeout = timeout
+      @queue = Queue.new(build_id, worker_id, redis_host)
+
+      # We want feedback to be immediately printed to CI users, so
+      # we disable buffering.
+      STDOUT.sync = true
+    end
+
+    def report
+      t = measure_duration { @queue.wait_until_published }
+
+      finished = false
+
+      reported_failures = {}
+      failure_heading_printed = false
+
+      tests_duration = measure_duration do
+        @timeout.times do |i|
+          @queue.example_failures.each do |job, rspec_output|
+            next if reported_failures[job]
+
+            if !failure_heading_printed
+              puts "\nFailures:\n"
+              failure_heading_printed = true
+            end
+
+            reported_failures[job] = true
+            puts failure_formatted(rspec_output)
+          end
+
+          if !@queue.exhausted?
+            sleep 1
+            next
+          end
+
+          finished = true
+          break
+        end
+      end
+
+      raise "Build not finished after #{@timeout} seconds" if !finished
+
+      @queue.record_build_time(tests_duration)
+      puts summary(@queue.example_failures, @queue.non_example_errors,
+                   humanize_duration(tests_duration))
+
+      exit 1 if !@queue.build_successful?
+    end
+
+    private
+
+    def measure_duration
+      start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
+      yield
+      (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start).round(2)
+    end
+
+    # We try to keep this output consistent with RSpec's original output
+    def summary(failures, errors, duration)
+      failed_examples_section = "\nFailed examples:\n\n"
+
+      failures.each do |_job, msg|
+        parts = msg.split("\n")
+        failed_examples_section << "  #{parts[-1]}\n"
+      end
+
+      summary = ""
+      summary << failed_examples_section if !failures.empty?
+
+      errors.each { |_job, msg| summary << msg }
+
+      summary << "\n"
+      summary << "Total results:\n"
+      summary << "  #{@queue.example_count} examples " \
+                 "(#{@queue.processed_jobs_count} jobs), " \
+                 "#{failures.count} failures, " \
+                 "#{errors.count} errors"
+      summary << "\n\n"
+      summary << "Spec execution time: #{duration}"
+    end
+
+    def failure_formatted(rspec_output)
+      rspec_output.split("\n")[0..-2].join("\n")
+    end
+
+    def humanize_duration(seconds)
+      Time.at(seconds).utc.strftime("%H:%M:%S")
+    end
+  end
+end
data/lib/rspecq/worker.rb
ADDED
@@ -0,0 +1,185 @@
+require "json"
+require "pp"
+
+module RSpecQ
+  class Worker
+    HEARTBEAT_FREQUENCY = WORKER_LIVENESS_SEC / 6
+
+    # If true, job timings will be populated in the global Redis timings key
+    #
+    # Defaults to false
+    attr_accessor :populate_timings
+
+    # If set, spec files that are known to take more than this value to finish,
+    # will be split and scheduled on a per-example basis.
+    attr_accessor :file_split_threshold
+
+    def initialize(build_id:, worker_id:, redis_host:, files_or_dirs_to_run:)
+      @build_id = build_id
+      @worker_id = worker_id
+      @queue = Queue.new(build_id, worker_id, redis_host)
+      @files_or_dirs_to_run = files_or_dirs_to_run
+      @populate_timings = false
+      @file_split_threshold = 999999
+
+      RSpec::Core::Formatters.register(Formatters::JobTimingRecorder, :dump_summary)
+      RSpec::Core::Formatters.register(Formatters::ExampleCountRecorder, :dump_summary)
+      RSpec::Core::Formatters.register(Formatters::FailureRecorder, :example_failed, :message)
+      RSpec::Core::Formatters.register(Formatters::WorkerHeartbeatRecorder, :example_finished)
+    end
+
+    def work
+      puts "Working for build #{@build_id} (worker=#{@worker_id})"
+
+      try_publish_queue!(@queue)
+      @queue.wait_until_published
+
+      loop do
+        # we have to bootstrap this so that it can be used in the first call
+        # to `requeue_lost_job` inside the work loop
+        update_heartbeat
+
+        lost = @queue.requeue_lost_job
+        puts "Requeued lost job: #{lost}" if lost
+
+        # TODO: can we make `reserve_job` also act like exhausted? and get
+        # rid of `exhausted?` (i.e. return false if no jobs remain)
+        job = @queue.reserve_job
+
+        # build is finished
+        return if job.nil? && @queue.exhausted?
+
+        next if job.nil?
+
+        puts
+        puts "Executing #{job}"
+
+        reset_rspec_state!
+
+        # reconfigure rspec
+        RSpec.configuration.detail_color = :magenta
+        RSpec.configuration.seed = srand && srand % 0xFFFF
+        RSpec.configuration.backtrace_formatter.filter_gem('rspecq')
+        RSpec.configuration.add_formatter(Formatters::FailureRecorder.new(@queue, job))
+        RSpec.configuration.add_formatter(Formatters::ExampleCountRecorder.new(@queue))
+        RSpec.configuration.add_formatter(Formatters::WorkerHeartbeatRecorder.new(self))
+
+        if populate_timings
+          RSpec.configuration.add_formatter(Formatters::JobTimingRecorder.new(@queue, job))
+        end
+
+        opts = RSpec::Core::ConfigurationOptions.new(["--format", "progress", job])
+        _result = RSpec::Core::Runner.new(opts).run($stderr, $stdout)
+
+        @queue.acknowledge_job(job)
+      end
+    end
+
+    # Update the worker heartbeat if necessary
+    def update_heartbeat
+      if @heartbeat_updated_at.nil? || elapsed(@heartbeat_updated_at) >= HEARTBEAT_FREQUENCY
+        @queue.record_worker_heartbeat
+        @heartbeat_updated_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
+      end
+    end
+
+    private
+
+    def reset_rspec_state!
+      RSpec.clear_examples
+
+      # TODO: remove after https://github.com/rspec/rspec-core/pull/2723
+      RSpec.world.instance_variable_set(:@example_group_counts_by_spec_file, Hash.new(0))
+
+      # RSpec.clear_examples does not reset those, which causes issues when
+      # a non-example error occurs (subsequent jobs are not executed)
+      # TODO: upstream
+      RSpec.world.non_example_failure = false
+
+      # we don't want an error that occurred outside of the examples (which
+      # would set this to `true`) to stop the worker
+      RSpec.world.wants_to_quit = false
+    end
+
+    def try_publish_queue!(queue)
+      return if !queue.become_master
+
+      RSpec.configuration.files_or_directories_to_run = @files_or_dirs_to_run
+      files_to_run = RSpec.configuration.files_to_run.map { |j| relative_path(j) }
+
+      timings = queue.timings
+      if timings.empty?
+        # TODO: should be a warning reported somewhere (Sentry?)
+        q_size = queue.publish(files_to_run.shuffle)
+        puts "WARNING: No timings found! Published queue in " \
+             "random order (size=#{q_size})"
+        return
+      end
+
+      slow_files = timings.take_while do |_job, duration|
+        duration >= file_split_threshold
+      end.map(&:first) & files_to_run
+
+      if slow_files.any?
+        puts "Slow files (threshold=#{file_split_threshold}): #{slow_files}"
+      end
+
+      # prepare jobs to run
+      jobs = []
+      jobs.concat(files_to_run - slow_files)
+      jobs.concat(files_to_example_ids(slow_files)) if slow_files.any?
+
+      # assign timings to all of them
+      default_timing = timings.values[timings.values.size/2]
+
+      jobs = jobs.each_with_object({}) do |j, h|
+        # heuristic: put untimed jobs in the middle of the queue
+        puts "New/untimed job: #{j}" if timings[j].nil?
+        h[j] = timings[j] || default_timing
+      end
+
+      # finally, sort them based on their timing (slowest first)
+      jobs = jobs.sort_by { |_j, t| -t }.map(&:first)
+
+      puts "Published queue (size=#{queue.publish(jobs)})"
+    end
+
+    # NOTE: RSpec has to load the files before we can split them as individual
+    # examples. In case a file to be split fails to be loaded
+    # (e.g. contains a syntax error), we return the slow files unchanged,
+    # thereby falling back to scheduling them normally.
+    #
+    # Their errors will be reported in the normal flow, when they're picked up
+    # as jobs by a worker.
+    def files_to_example_ids(files)
+      # TODO: do this programmatically
+      cmd = "DISABLE_SPRING=1 bin/rspec --dry-run --format json #{files.join(' ')}"
+      out = `#{cmd}`
+
+      if !$?.success?
+        # TODO: emit warning to Sentry
+        puts "WARNING: Error splitting slow files; falling back to regular scheduling:"
+
+        begin
+          pp JSON.parse(out)
+        rescue JSON::ParserError
+          puts out
+        end
+        puts
+
+        return files
+      end
+
+      JSON.parse(out)["examples"].map { |e| e["id"] }
+    end
+
+    def relative_path(job)
+      @cwd ||= Pathname.new(Dir.pwd)
+      "./#{Pathname.new(job).relative_path_from(@cwd)}"
+    end
+
+    def elapsed(since)
+      Process.clock_gettime(Process::CLOCK_MONOTONIC) - since
+    end
+  end
+end
metadata
ADDED
@@ -0,0 +1,98 @@
+--- !ruby/object:Gem::Specification
+name: rspecq
+version: !ruby/object:Gem::Version
+  version: 0.0.1.pre1
+platform: ruby
+authors:
+- Agis Anastasopoulos
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2020-06-26 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: rspec-core
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: minitest
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '5.14'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '5.14'
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+description:
+email: agis.anast@gmail.com
+executables:
+- rspecq
+extensions: []
+extra_rdoc_files: []
+files:
+- CHANGELOG.md
+- LICENSE
+- README.md
+- bin/rspecq
+- lib/rspecq.rb
+- lib/rspecq/formatters/example_count_recorder.rb
+- lib/rspecq/formatters/failure_recorder.rb
+- lib/rspecq/formatters/job_timing_recorder.rb
+- lib/rspecq/formatters/worker_heartbeat_recorder.rb
+- lib/rspecq/queue.rb
+- lib/rspecq/reporter.rb
+- lib/rspecq/version.rb
+- lib/rspecq/worker.rb
+homepage: https://github.com/skroutz/rspecq
+licenses:
+- MIT
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">"
+    - !ruby/object:Gem::Version
+      version: 1.3.1
+requirements: []
+rubygems_version: 3.1.2
+signing_key:
+specification_version: 4
+summary: Distribute an RSpec suite among many workers
+test_files: []