rspecq 0.0.1.pre2 → 0.1.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +18 -0
- data/README.md +165 -63
- data/Rakefile +9 -0
- data/bin/rspecq +79 -28
- data/lib/rspecq.rb +5 -7
- data/lib/rspecq/formatters/failure_recorder.rb +3 -2
- data/lib/rspecq/queue.rb +21 -5
- data/lib/rspecq/reporter.rb +3 -4
- data/lib/rspecq/version.rb +1 -1
- data/lib/rspecq/worker.rb +112 -63
- metadata +55 -11
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: d6b4c91525a2fb29e2198f290877ffe5ef1e753dafe0f9babd75a581e25d7af8
|
4
|
+
data.tar.gz: 27d3705ee014a5dc77514238b36386eaf11d1bb76601ab25c463796184ea5795
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 21803fa664abe45f173f7121dc948ebc8dbc0df41046f8e6269e8ec3751b647701b07a8b6a872aad0da22f438c043ebe6bf0fc69bc9c9c8b327e3b157dccab04
|
7
|
+
data.tar.gz: 4d683884610e2e28ca5ce5cf891e37135886ea4c6ec905f26607c1f0fdd62c1952e9c49983bb0191eb5d42433a800bf88f8227354a213b50645742f21f95af16
|
data/CHANGELOG.md
CHANGED
@@ -1,4 +1,22 @@
|
|
1
1
|
# Changelog
|
2
2
|
|
3
|
+
Breaking changes are prefixed with a "[BREAKING]" label.
|
4
|
+
|
3
5
|
## master (unreleased)
|
4
6
|
|
7
|
+
## 0.1.0 (2020-08-27)
|
8
|
+
|
9
|
+
### Added
|
10
|
+
|
11
|
+
- Sentry integration for various RSpecQ-level events [[#16](https://github.com/skroutz/rspecq/pull/16)]
|
12
|
+
- CLI: Flags can now also be set via environment variables [[c519230](https://github.com/skroutz/rspecq/commit/c5192303e229f361e8ac86ae449b4ea84d42e022)]
|
13
|
+
- CLI: Added shorthand versions for some flags [[df9faa8](https://github.com/skroutz/rspecq/commit/df9faa8ec6721af8357cfee4de6a2fe7b32070fc)]
|
14
|
+
- CLI: Added `--help` and `--version` flags [[df9faa8](https://github.com/skroutz/rspecq/commit/df9faa8ec6721af8357cfee4de6a2fe7b32070fc)]
|
15
|
+
- CLI: Max number of retries for failed examples is now configurable via the `--max-requeues` option [[#14](https://github.com/skroutz/rspecq/pull/14)]
|
16
|
+
|
17
|
+
### Changed
|
18
|
+
|
19
|
+
- [BREAKING] CLI: Renamed `--timings` to `--update-timings` [[c519230](https://github.com/skroutz/rspecq/commit/c5192303e229f361e8ac86ae449b4ea84d42e022)]
|
20
|
+
- [BREAKING] CLI: Renamed `--build-id` to `--build` and `--worker-id` to `--worker` [[df9faa8](https://github.com/skroutz/rspecq/commit/df9faa8ec6721af8357cfee4de6a2fe7b32070fc)]
|
21
|
+
- CLI: `--worker` is not required when `--reporter` is used [[4323a75](https://github.com/skroutz/rspecq/commit/4323a75ca357274069d02ba9fb51cdebb04e0be4)]
|
22
|
+
- CLI: Improved help output [[df9faa8](https://github.com/skroutz/rspecq/commit/df9faa8ec6721af8357cfee4de6a2fe7b32070fc)]
|
data/README.md
CHANGED
@@ -1,102 +1,204 @@
|
|
1
|
-
|
1
|
+
RSpec Queue
|
2
|
+
=========================================================================
|
3
|
+
[![Build Status](https://travis-ci.com/skroutz/rspecq.svg?branch=master)](https://travis-ci.com/github/skroutz/rspecq)
|
4
|
+
[![Gem Version](https://badge.fury.io/rb/rspecq.svg)](https://badge.fury.io/rb/rspecq)
|
2
5
|
|
3
|
-
|
4
|
-
|
6
|
+
RSpec Queue (RSpecQ) distributes and executes RSpec suites among parallel
|
7
|
+
workers. It uses a centralized queue that workers connect to and pop off
|
8
|
+
tests from. It ensures optimal scheduling of tests based on their run time,
|
9
|
+
facilitating faster CI builds.
|
5
10
|
|
6
|
-
RSpecQ is
|
11
|
+
RSpecQ is inspired by [test-queue](https://github.com/tmm1/test-queue)
|
7
12
|
and [ci-queue](https://github.com/Shopify/ci-queue).
|
8
13
|
|
9
|
-
##
|
14
|
+
## Features
|
15
|
+
|
16
|
+
- Run an RSpec suite among many workers
|
17
|
+
(potentially located in different hosts) in a distributed fashion,
|
18
|
+
facilitating faster CI builds.
|
19
|
+
- Consolidated, real-time reporting of a build's progress.
|
20
|
+
- Optimal scheduling of test execution by using timing statistics from previous runs and
|
21
|
+
automatically scheduling slow spec files as individual examples. See
|
22
|
+
[*Spec file splitting*](#spec-file-splitting).
|
23
|
+
- Automatic retry of test failures before being considered legit, in order to
|
24
|
+
rule out flakiness. See [*Requeues*](#requeues).
|
25
|
+
- Handles intermittent worker failures (e.g. network hiccups, faulty hardware etc.)
|
26
|
+
by detecting non-responsive workers and requeuing their jobs. See [*Worker failures*](#worker-failures).
|
27
|
+
- [Sentry](https://sentry.io) integration for monitoring important
|
28
|
+
RSpecQ-level events.
|
29
|
+
- [PLANNED] StatsD integration for various build-level metrics and insights.
|
30
|
+
See [#2](https://github.com/skroutz/rspecq/issues/2).
|
10
31
|
|
11
|
-
|
12
|
-
in the workers (up to 3 minutes), increased memory consumption and too much
|
13
|
-
disk I/O on boot. This is due to the fact that a worker in ci-queue has to
|
14
|
-
load every spec file on boot. This can be problematic for applications with
|
15
|
-
a large number of spec files.
|
16
|
-
|
17
|
-
RSpecQ works with spec files as its unit of work (as opposed to ci-queue which
|
18
|
-
works with individual examples). This means that an RSpecQ worker does not
|
19
|
-
have to load all spec files at once and so it doesn't have the aforementioned
|
20
|
-
problems. It also allows suites to keep using `before(:all)` hooks
|
21
|
-
(which ci-queue explicitly rejects). (Note: RSpecQ also schedules individual
|
22
|
-
examples, but only when this is deemed necessary, see section
|
23
|
-
"Spec file splitting").
|
32
|
+
## Usage
|
24
33
|
|
25
|
-
|
26
|
-
|
34
|
+
A worker needs to be given a name and the build it will participate in.
|
35
|
+
Assuming there's a Redis instance listening at `localhost`, starting a worker
|
36
|
+
is as simple as:
|
27
37
|
|
28
|
-
|
29
|
-
|
30
|
-
|
31
|
-
on every run (if the `--timings` option was used). Also, RSpecQ has a "slow
|
32
|
-
file threshold" which, currently has to be set manually (but this can be
|
33
|
-
improved).
|
38
|
+
```shell
|
39
|
+
$ rspecq --build=123 --worker=foo1 spec/
|
40
|
+
```
|
34
41
|
|
35
|
-
|
42
|
+
To start more workers for the same build, use distinct worker IDs but the same
|
43
|
+
build ID:
|
36
44
|
|
37
|
-
|
45
|
+
```shell
|
46
|
+
$ rspecq --build=123 --worker=foo2
|
47
|
+
```
|
38
48
|
|
39
|
-
|
40
|
-
Redis is located. To start a worker:
|
49
|
+
To view the progress of the build use `--report`:
|
41
50
|
|
42
51
|
```shell
|
43
|
-
$ rspecq --build
|
52
|
+
$ rspecq --build=123 --report
|
44
53
|
```
|
45
54
|
|
46
|
-
|
55
|
+
For detailed info use `--help`:
|
47
56
|
|
48
|
-
```shell
|
49
|
-
$ rspecq --build-id=foo --worker-id=reporter --redis=redis://localhost --report
|
50
57
|
```
|
58
|
+
NAME:
|
59
|
+
rspecq - Optimally distribute and run RSpec suites among parallel workers
|
60
|
+
|
61
|
+
USAGE:
|
62
|
+
rspecq [<options>] [spec files or directories]
|
63
|
+
|
64
|
+
OPTIONS:
|
65
|
+
-b, --build ID A unique identifier for the build. Should be common among workers participating in the same build.
|
66
|
+
-w, --worker ID An identifier for the worker. Workers participating in the same build should have distinct IDs.
|
67
|
+
-r, --redis HOST Redis host to connect to (default: 127.0.0.1).
|
68
|
+
--update-timings Update the global job timings key with the timings of this build. Note: This key is used as the basis for job scheduling.
|
69
|
+
--file-split-threshold N Split spec files slower than N seconds and schedule them as individual examples.
|
70
|
+
--report Enable reporter mode: do not pull tests off the queue; instead print build progress and exit when it's finished.
|
71
|
+
Exits with a non-zero status code if there were any failures.
|
72
|
+
--report-timeout N Fail if build is not finished after N seconds. Only applicable if --report is enabled (default: 3600).
|
73
|
+
--max-requeues N Retry failed examples up to N times before considering them legit failures (default: 3).
|
74
|
+
-h, --help Show this message.
|
75
|
+
-v, --version Print the version and exit.
|
76
|
+
```
|
77
|
+
|
78
|
+
### Sentry integration
|
51
79
|
|
52
|
-
|
80
|
+
RSpecQ can optionally emit build events to a
|
81
|
+
[Sentry](https://sentry.io) project by setting the
|
82
|
+
[`SENTRY_DSN`](https://github.com/getsentry/raven-ruby#raven-only-runs-when-sentry_dsn-is-set)
|
83
|
+
environment variable.
|
84
|
+
|
85
|
+
This is convenient for monitoring important warnings/errors that may impact
|
86
|
+
build times, such as the fact that no previous timings were found and
|
87
|
+
therefore job scheduling was effectively random for a particular build.
|
53
88
|
|
54
89
|
|
55
90
|
## How it works
|
56
91
|
|
57
|
-
The
|
92
|
+
The core design is almost identical to ci-queue so please refer to its
|
93
|
+
[README](https://github.com/Shopify/ci-queue/blob/master/README.md) instead.
|
58
94
|
|
59
95
|
### Terminology
|
60
96
|
|
61
|
-
- Job
|
97
|
+
- **Job**: the smallest unit of work, which is usually a spec file
|
62
98
|
(e.g. `./spec/models/foo_spec.rb`) but can also be an individual example
|
63
|
-
(e.g. `./spec/models/foo_spec.rb[1:2:1]`) if the file is too slow
|
64
|
-
- Queue
|
65
|
-
information for RSpecQ to
|
66
|
-
be executed, the failure reports
|
67
|
-
-
|
68
|
-
|
69
|
-
|
70
|
-
|
99
|
+
(e.g. `./spec/models/foo_spec.rb[1:2:1]`) if the file is too slow.
|
100
|
+
- **Queue**: a collection of Redis-backed structures that hold all the necessary
|
101
|
+
information for an RSpecQ build to run. This includes timing statistics,
|
102
|
+
jobs to be executed, the failure reports and more.
|
103
|
+
- **Build**: a particular test suite run. Each build has its own **Queue**.
|
104
|
+
- **Worker**: an `rspecq` process that, given a build id, consumes jobs off the
|
105
|
+
build's queue and executes them using RSpec
|
106
|
+
- **Reporter**: an `rspecq` process that, given a build id, waits for the build's
|
107
|
+
queue to be drained and prints the build summary report
|
71
108
|
|
72
109
|
### Spec file splitting
|
73
110
|
|
74
|
-
|
75
|
-
a
|
76
|
-
|
77
|
-
|
78
|
-
|
79
|
-
|
111
|
+
Particularly slow spec files may set a limit to how fast a build can be.
|
112
|
+
For example, a single file may need 10 minutes to run while all other
|
113
|
+
files finish after 8 minutes. This would cause all but one worker to be
|
114
|
+
sitting idle for 2 minutes.
|
115
|
+
|
116
|
+
To overcome this issue, RSpecQ can split files whose execution time is
|
117
|
+
above a certain threshold (set with the `--file-split-threshold` option)
|
118
|
+
and instead schedule them as individual examples.
|
80
119
|
|
81
|
-
In the future, we'd like for the slow threshold to be calculated and set
|
82
|
-
dynamically.
|
120
|
+
Note: In the future, we'd like for the slow threshold to be calculated and set
|
121
|
+
dynamically (see #3).
|
83
122
|
|
84
123
|
### Requeues
|
85
124
|
|
86
|
-
As a mitigation
|
87
|
-
back to the queue to be picked up by
|
88
|
-
|
89
|
-
|
90
|
-
|
125
|
+
As a mitigation technique against flaky tests, if an example fails it will be
|
126
|
+
put back to the queue to be picked up by another worker. This will be repeated
|
127
|
+
up to a certain number of times (set with the `--max-requeues` option), after
|
128
|
+
which the example will be considered a legit failure and printed as such in the
|
129
|
+
final report.
|
91
130
|
|
92
131
|
### Worker failures
|
93
132
|
|
94
|
-
|
95
|
-
|
96
|
-
|
97
|
-
|
133
|
+
It's not uncommon for CI processes to encounter unrecoverable failures for
|
134
|
+
various reasons: faulty hardware, network hiccups, segmentation faults in
|
135
|
+
MRI etc.
|
136
|
+
|
137
|
+
For resiliency against such issues, workers emit a heartbeat after each
|
138
|
+
example they execute, to signal
|
139
|
+
that they're healthy and performing jobs as expected. If a worker hasn't
|
140
|
+
emitted a heartbeat for a given amount of time (set by `WORKER_LIVENESS_SEC`)
|
141
|
+
it is considered dead and its reserved job will be put back to the queue, to
|
142
|
+
be picked up by another healthy worker.
|
143
|
+
|
144
|
+
|
145
|
+
## Rationale
|
146
|
+
|
147
|
+
### Why didn't you use ci-queue?
|
148
|
+
|
149
|
+
**Update**: ci-queue [deprecated support for RSpec](https://github.com/Shopify/ci-queue/pull/149).
|
150
|
+
|
151
|
+
While evaluating ci-queue we experienced slow worker boot
|
152
|
+
times (up to 3 minutes in some cases) combined with disk IO saturation and
|
153
|
+
increased memory consumption. This is due to the fact that a worker in
|
154
|
+
ci-queue has to load every spec file on boot. In applications with a large
|
155
|
+
number of spec files this may result in a significant performance hit and
|
156
|
+
in case of cloud environments, increased costs.
|
157
|
+
|
158
|
+
We also observed slower build times compared to our previous solution which
|
159
|
+
scheduled whole spec files (as opposed to individual examples), due to
|
160
|
+
big differences in runtimes of individual examples, something common in big
|
161
|
+
RSpec suites.
|
162
|
+
|
163
|
+
We decided for RSpecQ to use whole spec files as its main unit of work (as
|
164
|
+
opposed to ci-queue which uses individual examples). This means that an RSpecQ
|
165
|
+
worker only loads the files needed and ends up with a subset of all the suite's
|
166
|
+
files. (Note: RSpecQ also schedules individual examples, but only when this is
|
167
|
+
deemed necessary, see [Spec file splitting](#spec-file-splitting)).
|
168
|
+
|
169
|
+
This kept boot and test run times considerably low. As a side benefit, this
|
170
|
+
allows suites to keep using `before(:all)` hooks (which ci-queue explicitly
|
171
|
+
rejects).
|
172
|
+
|
173
|
+
The downside of this design is that it's more complicated, since the scheduling
|
174
|
+
of spec files happens based on timings calculated from previous runs. This
|
175
|
+
means that RSpecQ maintains a key with the timing of each job and updates it
|
176
|
+
on every run (if the `--update-timings` option was used). Also, RSpecQ has a "slow
|
177
|
+
file threshold" which currently has to be set manually (but this can be
|
178
|
+
improved in the future).
|
179
|
+
|
180
|
+
|
181
|
+
## Development
|
182
|
+
|
183
|
+
Install the required dependencies:
|
184
|
+
|
185
|
+
```
|
186
|
+
$ bundle install
|
187
|
+
```
|
188
|
+
|
189
|
+
Then you can execute the tests after spinning up a Redis instance at
|
190
|
+
`127.0.0.1:6379`:
|
191
|
+
|
192
|
+
```
|
193
|
+
$ bundle exec rake
|
194
|
+
```
|
195
|
+
|
196
|
+
To enable verbose output in the tests:
|
197
|
+
|
198
|
+
```
|
199
|
+
$ RSPECQ_DEBUG=1 bundle exec rake
|
200
|
+
```
|
98
201
|
|
99
|
-
This protects us against unrecoverable worker failures (e.g. segfault).
|
100
202
|
|
101
203
|
## License
|
102
204
|
|
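The heartbeat mechanism described in the README's *Worker failures* section can be illustrated with a small in-memory model. This is only a sketch: RSpecQ keeps this state in Redis, and the `LivenessTracker` class, its method names and the explicit `now` timestamps are hypothetical, introduced here for illustration. Only the 60-second liveness window comes from `WORKER_LIVENESS_SEC` in the diff.

```ruby
# Hypothetical in-memory model of worker liveness tracking.
# RSpecQ itself stores heartbeats and reserved jobs in Redis.
class LivenessTracker
  def initialize(liveness_sec)
    @liveness_sec = liveness_sec
    @heartbeats = {} # worker id => timestamp of the last heartbeat
    @reserved = {}   # worker id => job currently reserved by that worker
  end

  # Records that the worker is alive at time `now` (in seconds).
  def heartbeat(worker, now)
    @heartbeats[worker] = now
  end

  # Marks `job` as reserved by `worker`; reserving also counts as a heartbeat.
  def reserve(worker, job, now)
    @reserved[worker] = job
    heartbeat(worker, now)
  end

  # Returns the jobs held by workers that have been silent for longer than
  # the liveness window, and forgets those workers.
  def requeue_lost_jobs(now)
    dead = @heartbeats.select { |_worker, at| now - at > @liveness_sec }.keys
    dead.map do |worker|
      @heartbeats.delete(worker)
      @reserved.delete(worker)
    end.compact
  end
end

tracker = LivenessTracker.new(60.0)
tracker.reserve("foo1", "./spec/a_spec.rb", 0.0)
tracker.reserve("foo2", "./spec/b_spec.rb", 0.0)
tracker.heartbeat("foo2", 50.0) # foo2 keeps reporting, foo1 goes silent

lost = tracker.requeue_lost_jobs(70.0)
puts "Requeued lost jobs: #{lost.inspect}"
```

Passing timestamps explicitly keeps the sketch deterministic; the real worker reads a monotonic clock instead.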
data/Rakefile
ADDED
data/bin/rspecq
CHANGED
@@ -1,67 +1,118 @@
|
|
1
1
|
#!/usr/bin/env ruby
|
2
|
-
require "
|
2
|
+
require "optparse"
|
3
3
|
require "rspecq"
|
4
4
|
|
5
|
+
DEFAULT_REDIS_HOST = "127.0.0.1"
|
6
|
+
DEFAULT_REPORT_TIMEOUT = 3600 # 1 hour
|
7
|
+
DEFAULT_MAX_REQUEUES = 3
|
8
|
+
|
9
|
+
def env_set?(var)
|
10
|
+
["1", "true"].include?(ENV[var])
|
11
|
+
end
|
12
|
+
|
5
13
|
opts = {}
|
14
|
+
|
6
15
|
OptionParser.new do |o|
|
7
|
-
|
16
|
+
name = File.basename($PROGRAM_NAME)
|
8
17
|
|
9
|
-
o.
|
10
|
-
|
18
|
+
o.banner = <<~BANNER
|
19
|
+
NAME:
|
20
|
+
#{name} - Optimally distribute and run RSpec suites among parallel workers
|
21
|
+
|
22
|
+
USAGE:
|
23
|
+
#{name} [<options>] [spec files or directories]
|
24
|
+
BANNER
|
25
|
+
|
26
|
+
o.separator ""
|
27
|
+
o.separator "OPTIONS:"
|
28
|
+
|
29
|
+
o.on("-b", "--build ID", "A unique identifier for the build. Should be " \
|
30
|
+
"common among workers participating in the same build.") do |v|
|
31
|
+
opts[:build] = v
|
11
32
|
end
|
12
33
|
|
13
|
-
o.on("--worker
|
14
|
-
|
34
|
+
o.on("-w", "--worker ID", "An identifier for the worker. Workers " \
|
35
|
+
"participating in the same build should have distinct IDs.") do |v|
|
36
|
+
opts[:worker] = v
|
15
37
|
end
|
16
38
|
|
17
|
-
o.on("--redis HOST", "Redis
|
18
|
-
|
39
|
+
o.on("-r", "--redis HOST", "Redis host to connect to " \
|
40
|
+
"(default: #{DEFAULT_REDIS_HOST}).") do |v|
|
41
|
+
opts[:redis_host] = v
|
19
42
|
end
|
20
43
|
|
21
|
-
o.on("--timings", "
|
44
|
+
o.on("--update-timings", "Update the global job timings key with the " \
|
45
|
+
"timings of this build. Note: This key is used as the basis for job " \
|
46
|
+
"scheduling.") do |v|
|
22
47
|
opts[:timings] = v
|
23
48
|
end
|
24
49
|
|
25
|
-
o.on("--file-split-threshold N", "Split spec files slower than N
|
26
|
-
"schedule them
|
27
|
-
opts[:file_split_threshold] =
|
50
|
+
o.on("--file-split-threshold N", Integer, "Split spec files slower than N " \
|
51
|
+
"seconds and schedule them as individual examples.") do |v|
|
52
|
+
opts[:file_split_threshold] = v
|
28
53
|
end
|
29
54
|
|
30
|
-
o.on("--report", "
|
31
|
-
|
55
|
+
o.on("--report", "Enable reporter mode: do not pull tests off the queue; " \
|
56
|
+
"instead print build progress and exit when it's " \
|
57
|
+
"finished.\n#{o.summary_indent*9} " \
|
58
|
+
"Exits with a non-zero status code if there were any " \
|
59
|
+
"failures.") do |v|
|
32
60
|
opts[:report] = v
|
33
61
|
end
|
34
62
|
|
35
|
-
o.on("--report-timeout N", Integer, "Fail if
|
36
|
-
"N seconds. Only applicable if --report is enabled "
|
37
|
-
"(default:
|
63
|
+
o.on("--report-timeout N", Integer, "Fail if build is not finished after " \
|
64
|
+
"N seconds. Only applicable if --report is enabled " \
|
65
|
+
"(default: #{DEFAULT_REPORT_TIMEOUT}).") do |v|
|
38
66
|
opts[:report_timeout] = v
|
39
67
|
end
|
40
68
|
|
69
|
+
o.on("--max-requeues N", Integer, "Retry failed examples up to N times " \
|
70
|
+
"before considering them legit failures " \
|
71
|
+
"(default: #{DEFAULT_MAX_REQUEUES}).") do |v|
|
72
|
+
opts[:max_requeues] = v
|
73
|
+
end
|
74
|
+
|
75
|
+
o.on_tail("-h", "--help", "Show this message.") do
|
76
|
+
puts o
|
77
|
+
exit
|
78
|
+
end
|
79
|
+
|
80
|
+
o.on_tail("-v", "--version", "Print the version and exit.") do
|
81
|
+
puts "#{name} #{RSpecQ::VERSION}"
|
82
|
+
exit
|
83
|
+
end
|
41
84
|
end.parse!
|
42
85
|
|
43
|
-
[:
|
44
|
-
|
45
|
-
|
86
|
+
opts[:build] ||= ENV["RSPECQ_BUILD"]
|
87
|
+
opts[:worker] ||= ENV["RSPECQ_WORKER"]
|
88
|
+
opts[:redis_host] ||= ENV["RSPECQ_REDIS"] || DEFAULT_REDIS_HOST
|
89
|
+
opts[:timings] ||= env_set?("RSPECQ_UPDATE_TIMINGS")
|
90
|
+
opts[:file_split_threshold] ||= Integer(ENV["RSPECQ_FILE_SPLIT_THRESHOLD"] || 9999999)
|
91
|
+
opts[:report] ||= env_set?("RSPECQ_REPORT")
|
92
|
+
opts[:report_timeout] ||= Integer(ENV["RSPECQ_REPORT_TIMEOUT"] || DEFAULT_REPORT_TIMEOUT)
|
93
|
+
opts[:max_requeues] ||= Integer(ENV["RSPECQ_MAX_REQUEUES"] || DEFAULT_MAX_REQUEUES)
|
94
|
+
|
95
|
+
raise OptionParser::MissingArgument.new(:build) if opts[:build].nil?
|
96
|
+
raise OptionParser::MissingArgument.new(:worker) if !opts[:report] && opts[:worker].nil?
|
46
97
|
|
47
98
|
if opts[:report]
|
48
99
|
reporter = RSpecQ::Reporter.new(
|
49
|
-
build_id: opts[:
|
50
|
-
|
51
|
-
timeout: opts[:report_timeout] || 3600,
|
100
|
+
build_id: opts[:build],
|
101
|
+
timeout: opts[:report_timeout],
|
52
102
|
redis_host: opts[:redis_host],
|
53
103
|
)
|
54
104
|
|
55
105
|
reporter.report
|
56
106
|
else
|
57
107
|
worker = RSpecQ::Worker.new(
|
58
|
-
build_id: opts[:
|
59
|
-
worker_id: opts[:
|
60
|
-
redis_host: opts[:redis_host]
|
61
|
-
files_or_dirs_to_run: ARGV[0] || "spec",
|
108
|
+
build_id: opts[:build],
|
109
|
+
worker_id: opts[:worker],
|
110
|
+
redis_host: opts[:redis_host]
|
62
111
|
)
|
63
112
|
|
113
|
+
worker.files_or_dirs_to_run = ARGV[0] if ARGV[0]
|
64
114
|
worker.populate_timings = opts[:timings]
|
65
|
-
worker.file_split_threshold = opts[:file_split_threshold]
|
115
|
+
worker.file_split_threshold = opts[:file_split_threshold]
|
116
|
+
worker.max_requeues = opts[:max_requeues]
|
66
117
|
worker.work
|
67
118
|
end
|
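The option handling above resolves each setting in the order "CLI flag, then environment variable, then default". The following is a minimal sketch of that precedence, assuming the environment is passed in as a plain Hash so the behaviour can be exercised without touching `ENV`. The `resolve_options` and `env_flag?` helpers and the `SKETCH_DEFAULTS` constant are hypothetical names and are not part of rspecq; the environment variable names are the ones shown in the diff.

```ruby
# Hypothetical helpers illustrating the precedence used in bin/rspecq:
# CLI flag first, then environment variable, then built-in default.
SKETCH_DEFAULTS = { redis_host: "127.0.0.1", max_requeues: 3 }.freeze

# Boolean environment variables count as set only for "1" or "true".
def env_flag?(env, var)
  ["1", "true"].include?(env[var])
end

def resolve_options(cli_opts, env)
  opts = cli_opts.dup
  opts[:build]        ||= env["RSPECQ_BUILD"]
  opts[:redis_host]   ||= env["RSPECQ_REDIS"] || SKETCH_DEFAULTS[:redis_host]
  opts[:timings]      ||= env_flag?(env, "RSPECQ_UPDATE_TIMINGS")
  opts[:max_requeues] ||= Integer(env["RSPECQ_MAX_REQUEUES"] || SKETCH_DEFAULTS[:max_requeues])
  opts
end

# The flag value for :build wins over RSPECQ_BUILD, while :max_requeues
# falls back to the environment because no flag was given.
resolved = resolve_options(
  { build: "123" },
  { "RSPECQ_BUILD" => "999", "RSPECQ_MAX_REQUEUES" => "5" }
)
puts resolved.inspect
```

Because the fallbacks use `||=`, a boolean environment variable can only switch a setting on; it cannot override a flag that was passed explicitly.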
data/lib/rspecq.rb
CHANGED
@@ -1,11 +1,10 @@
|
|
1
1
|
require "rspec/core"
|
2
|
+
require "sentry-raven"
|
2
3
|
|
3
4
|
module RSpecQ
|
4
|
-
|
5
|
-
|
6
|
-
#
|
7
|
-
# (in seconds), it is considered dead and its reserved work will be put back
|
8
|
-
# to the queue, to be picked up by another worker.
|
5
|
+
# If a worker hasn't executed an example for more than WORKER_LIVENESS_SEC
|
6
|
+
# seconds, it is considered dead and its reserved work will be put back
|
7
|
+
# to the queue to be picked up by another worker.
|
9
8
|
WORKER_LIVENESS_SEC = 60.0
|
10
9
|
end
|
11
10
|
|
@@ -16,6 +15,5 @@ require_relative "rspecq/formatters/worker_heartbeat_recorder"
|
|
16
15
|
|
17
16
|
require_relative "rspecq/queue"
|
18
17
|
require_relative "rspecq/reporter"
|
19
|
-
require_relative "rspecq/worker"
|
20
|
-
|
21
18
|
require_relative "rspecq/version"
|
19
|
+
require_relative "rspecq/worker"
|
@@ -1,11 +1,12 @@
|
|
1
1
|
module RSpecQ
|
2
2
|
module Formatters
|
3
3
|
class FailureRecorder
|
4
|
-
def initialize(queue, job)
|
4
|
+
def initialize(queue, job, max_requeues)
|
5
5
|
@queue = queue
|
6
6
|
@job = job
|
7
7
|
@colorizer = RSpec::Core::Formatters::ConsoleCodes
|
8
8
|
@non_example_error_recorded = false
|
9
|
+
@max_requeues = max_requeues
|
9
10
|
end
|
10
11
|
|
11
12
|
# Here we're notified about errors occurring outside of examples.
|
@@ -24,7 +25,7 @@ module RSpecQ
|
|
24
25
|
def example_failed(notification)
|
25
26
|
example = notification.example
|
26
27
|
|
27
|
-
if @queue.requeue_job(example.id,
|
28
|
+
if @queue.requeue_job(example.id, @max_requeues)
|
28
29
|
# HACK: try to avoid picking the job we just requeued; we want it
|
29
30
|
# to be picked up by a different worker
|
30
31
|
sleep 0.5
|
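The formatter change above passes the retry budget down to `requeue_job`, whose implementation is not part of this diff. As a rough illustration of the contract the formatter relies on (a truthy return while retries remain, a falsy one afterwards), here is an in-memory stand-in. The `InMemoryRequeueQueue` class and its storage are assumptions made for this sketch; the real queue is Redis-backed and may place requeued jobs differently.

```ruby
# Hypothetical in-memory stand-in for the Redis-backed queue.
class InMemoryRequeueQueue
  attr_reader :unprocessed

  def initialize
    @unprocessed = []        # jobs waiting to be picked up
    @requeues = Hash.new(0)  # job => number of times it was requeued
  end

  # Puts the job back in the queue and returns true, unless it has already
  # been requeued max_requeues times, in which case it returns false.
  def requeue_job(job, max_requeues)
    return false if @requeues[job] >= max_requeues

    @requeues[job] += 1
    @unprocessed << job # placement within the queue is an assumption
    true
  end
end

queue = InMemoryRequeueQueue.new
example_id = "./spec/models/foo_spec.rb[1:2:1]"

# With a budget of 3, the fourth failure is no longer requeued and would be
# reported as a legit failure.
attempts = 4.times.map { queue.requeue_job(example_id, 3) }
puts attempts.inspect
```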
data/lib/rspecq/queue.rb
CHANGED
@@ -57,6 +57,8 @@ module RSpecQ
|
|
57
57
|
STATUS_INITIALIZING = "initializing".freeze
|
58
58
|
STATUS_READY = "ready".freeze
|
59
59
|
|
60
|
+
attr_reader :redis
|
61
|
+
|
60
62
|
def initialize(build_id, worker_id, redis_host)
|
61
63
|
@build_id = build_id
|
62
64
|
@worker_id = worker_id
|
@@ -150,13 +152,21 @@ module RSpecQ
|
|
150
152
|
end
|
151
153
|
|
152
154
|
def example_count
|
153
|
-
@redis.get(key_example_count)
|
155
|
+
@redis.get(key_example_count).to_i
|
154
156
|
end
|
155
157
|
|
156
158
|
def processed_jobs_count
|
157
159
|
@redis.scard(key_queue_processed)
|
158
160
|
end
|
159
161
|
|
162
|
+
def processed_jobs
|
163
|
+
@redis.smembers(key_queue_processed)
|
164
|
+
end
|
165
|
+
|
166
|
+
def requeued_jobs
|
167
|
+
@redis.hgetall(key_requeues)
|
168
|
+
end
|
169
|
+
|
160
170
|
def become_master
|
161
171
|
@redis.setnx(key_queue_status, STATUS_INITIALIZING)
|
162
172
|
end
|
@@ -200,10 +210,10 @@ module RSpecQ
|
|
200
210
|
exhausted? && example_failures.empty? && non_example_errors.empty?
|
201
211
|
end
|
202
212
|
|
203
|
-
|
204
|
-
|
205
|
-
def
|
206
|
-
|
213
|
+
# The remaining jobs to be processed. Jobs at the head of the list will
|
214
|
+
# be processed first.
|
215
|
+
def unprocessed_jobs
|
216
|
+
@redis.lrange(key_queue_unprocessed, 0, -1)
|
207
217
|
end
|
208
218
|
|
209
219
|
# redis: STRING [STATUS_INITIALIZING, STATUS_READY]
|
@@ -279,6 +289,12 @@ module RSpecQ
|
|
279
289
|
"build_times"
|
280
290
|
end
|
281
291
|
|
292
|
+
private
|
293
|
+
|
294
|
+
def key(*keys)
|
295
|
+
[@build_id, keys].join(":")
|
296
|
+
end
|
297
|
+
|
282
298
|
# We don't use any Ruby `Time` methods because specs that use timecop in
|
283
299
|
# before(:all) hooks will mess up our times.
|
284
300
|
def current_time
|
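The new private `key` helper relies on `Array#join` flattening nested arrays, so the splatted `keys` array is joined with the same separator as the build ID. A small standalone sketch follows; the `build_key` name and the `"queue"`/`"status"` segments are made up for illustration and are not necessarily the key names RSpecQ uses.

```ruby
# Standalone version of the `key` helper; takes the build ID as an argument
# instead of reading @build_id.
def build_key(build_id, *keys)
  # Array#join recurses into nested arrays, so [id, ["a", "b"]] joins
  # exactly like [id, "a", "b"].
  [build_id, keys].join(":")
end

namespaced = build_key("123", "queue", "status")
puts namespaced
```

Prefixing every key with the build ID is what keeps the queues of concurrent builds separate on a shared Redis instance.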
data/lib/rspecq/reporter.rb
CHANGED
@@ -1,10 +1,9 @@
|
|
1
1
|
module RSpecQ
|
2
2
|
class Reporter
|
3
|
-
def initialize(build_id:,
|
3
|
+
def initialize(build_id:, timeout:, redis_host:)
|
4
4
|
@build_id = build_id
|
5
|
-
@worker_id = worker_id
|
6
5
|
@timeout = timeout
|
7
|
-
@queue = Queue.new(build_id,
|
6
|
+
@queue = Queue.new(build_id, "reporter", redis_host)
|
8
7
|
|
9
8
|
# We want feedback to be immediately printed to CI users, so
|
10
9
|
# we disable buffering.
|
@@ -12,7 +11,7 @@ module RSpecQ
|
|
12
11
|
end
|
13
12
|
|
14
13
|
def report
|
15
|
-
|
14
|
+
@queue.wait_until_published
|
16
15
|
|
17
16
|
finished = false
|
18
17
|
|
data/lib/rspecq/version.rb
CHANGED
data/lib/rspecq/worker.rb
CHANGED
@@ -1,10 +1,16 @@
|
|
1
1
|
require "json"
|
2
|
+
require "pathname"
|
2
3
|
require "pp"
|
3
4
|
|
4
5
|
module RSpecQ
|
5
6
|
class Worker
|
6
7
|
HEARTBEAT_FREQUENCY = WORKER_LIVENESS_SEC / 6
|
7
8
|
|
9
|
+
# The root path or individual spec files to execute.
|
10
|
+
#
|
11
|
+
# Defaults to "spec" (just like in RSpec)
|
12
|
+
attr_accessor :files_or_dirs_to_run
|
13
|
+
|
8
14
|
# If true, job timings will be populated in the global Redis timings key
|
9
15
|
#
|
10
16
|
# Defaults to false
|
@@ -12,15 +18,27 @@ module RSpecQ
|
|
12
18
|
|
13
19
|
# If set, spec files that are known to take more than this value to finish,
|
14
20
|
# will be split and scheduled on a per-example basis.
|
21
|
+
#
|
22
|
+
# Defaults to 999999
|
15
23
|
attr_accessor :file_split_threshold
|
16
24
|
|
17
|
-
|
25
|
+
# Retry failed examples up to N times (with N being the supplied value)
|
26
|
+
# before considering them legit failures
|
27
|
+
#
|
28
|
+
# Defaults to 3
|
29
|
+
attr_accessor :max_requeues
|
30
|
+
|
31
|
+
attr_reader :queue
|
32
|
+
|
33
|
+
def initialize(build_id:, worker_id:, redis_host:)
|
18
34
|
@build_id = build_id
|
19
35
|
@worker_id = worker_id
|
20
36
|
@queue = Queue.new(build_id, worker_id, redis_host)
|
21
|
-
@files_or_dirs_to_run =
|
37
|
+
@files_or_dirs_to_run = "spec"
|
22
38
|
@populate_timings = false
|
23
39
|
@file_split_threshold = 999999
|
40
|
+
@heartbeat_updated_at = nil
|
41
|
+
@max_requeues = 3
|
24
42
|
|
25
43
|
RSpec::Core::Formatters.register(Formatters::JobTimingRecorder, :dump_summary)
|
26
44
|
RSpec::Core::Formatters.register(Formatters::ExampleCountRecorder, :dump_summary)
|
@@ -31,23 +49,23 @@ module RSpecQ
|
|
31
49
|
def work
|
32
50
|
puts "Working for build #{@build_id} (worker=#{@worker_id})"
|
33
51
|
|
34
|
-
try_publish_queue!(
|
35
|
-
|
52
|
+
try_publish_queue!(queue)
|
53
|
+
queue.wait_until_published
|
36
54
|
|
37
55
|
loop do
|
38
56
|
# we have to bootstrap this so that it can be used in the first call
|
39
57
|
# to `requeue_lost_job` inside the work loop
|
40
58
|
update_heartbeat
|
41
59
|
|
42
|
-
lost =
|
60
|
+
lost = queue.requeue_lost_job
|
43
61
|
puts "Requeued lost job: #{lost}" if lost
|
44
62
|
|
45
63
|
# TODO: can we make `reserve_job` also act like exhausted? and get
|
46
64
|
# rid of `exhausted?` (i.e. return false if no jobs remain)
|
47
|
-
job =
|
65
|
+
job = queue.reserve_job
|
48
66
|
|
49
67
|
# build is finished
|
50
|
-
return if job.nil? &&
|
68
|
+
return if job.nil? && queue.exhausted?
|
51
69
|
|
52
70
|
next if job.nil?
|
53
71
|
|
@@ -60,112 +78,125 @@ module RSpecQ
|
|
60
78
|
RSpec.configuration.detail_color = :magenta
|
61
79
|
RSpec.configuration.seed = srand && srand % 0xFFFF
|
62
80
|
RSpec.configuration.backtrace_formatter.filter_gem('rspecq')
|
63
|
-
RSpec.configuration.add_formatter(Formatters::FailureRecorder.new(
|
64
|
-
RSpec.configuration.add_formatter(Formatters::ExampleCountRecorder.new(
|
81
|
+
RSpec.configuration.add_formatter(Formatters::FailureRecorder.new(queue, job, max_requeues))
|
82
|
+
RSpec.configuration.add_formatter(Formatters::ExampleCountRecorder.new(queue))
|
65
83
|
RSpec.configuration.add_formatter(Formatters::WorkerHeartbeatRecorder.new(self))
|
66
84
|
|
67
85
|
if populate_timings
|
68
|
-
RSpec.configuration.add_formatter(Formatters::JobTimingRecorder.new(
|
86
|
+
RSpec.configuration.add_formatter(Formatters::JobTimingRecorder.new(queue, job))
|
69
87
|
end
|
70
88
|
|
71
89
|
opts = RSpec::Core::ConfigurationOptions.new(["--format", "progress", job])
|
72
90
|
_result = RSpec::Core::Runner.new(opts).run($stderr, $stdout)
|
73
91
|
|
74
|
-
|
92
|
+
queue.acknowledge_job(job)
|
75
93
|
end
|
76
94
|
end
|
77
95
|
|
78
96
|
# Update the worker heartbeat if necessary
|
79
97
|
def update_heartbeat
|
80
98
|
if @heartbeat_updated_at.nil? || elapsed(@heartbeat_updated_at) >= HEARTBEAT_FREQUENCY
|
81
|
-
|
99
|
+
queue.record_worker_heartbeat
|
82
100
|
@heartbeat_updated_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
|
83
101
|
end
|
84
102
|
end
|
85
103
|
|
86
|
-
private
|
87
|
-
|
88
|
-
def reset_rspec_state!
|
89
|
-
RSpec.clear_examples
|
90
|
-
|
91
|
-
# TODO: remove after https://github.com/rspec/rspec-core/pull/2723
|
92
|
-
RSpec.world.instance_variable_set(:@example_group_counts_by_spec_file, Hash.new(0))
|
93
|
-
|
94
|
-
# RSpec.clear_examples does not reset those, which causes issues when
|
95
|
-
# a non-example error occurs (subsequent jobs are not executed)
|
96
|
-
# TODO: upstream
|
97
|
-
RSpec.world.non_example_failure = false
|
98
|
-
|
99
|
-
# we don't want an error that occurred outside of the examples (which
|
100
|
-
# would set this to `true`) to stop the worker
|
101
|
-
RSpec.world.wants_to_quit = false
|
102
|
-
end
|
103
|
-
|
104
104
|
def try_publish_queue!(queue)
|
105
105
|
return if !queue.become_master
|
106
106
|
|
107
|
-
RSpec.configuration.files_or_directories_to_run =
|
107
|
+
RSpec.configuration.files_or_directories_to_run = files_or_dirs_to_run
|
108
108
|
files_to_run = RSpec.configuration.files_to_run.map { |j| relative_path(j) }
|
109
109
|
|
110
110
|
timings = queue.timings
|
111
111
|
if timings.empty?
|
112
|
-
# TODO: should be a warning reported somewhere (Sentry?)
|
113
112
|
q_size = queue.publish(files_to_run.shuffle)
|
114
|
-
|
115
|
-
|
113
|
+
log_event(
|
114
|
+
"No timings found! Published queue in random order (size=#{q_size})",
|
115
|
+
"warning"
|
116
|
+
)
|
116
117
|
return
|
117
118
|
end
|
118
119
|
|
119
|
-
|
120
|
-
|
121
|
-
|
120
|
+
# prepare jobs to run
|
121
|
+
jobs = []
|
122
|
+
slow_files = []
|
122
123
|
|
123
|
-
if
|
124
|
-
|
124
|
+
if file_split_threshold
|
125
|
+
slow_files = timings.take_while do |_job, duration|
|
126
|
+
duration >= file_split_threshold
|
127
|
+
end.map(&:first) & files_to_run
|
125
128
|
end
|
126
129
|
|
127
|
-
|
128
|
-
|
129
|
-
|
130
|
-
|
130
|
+
if slow_files.any?
|
131
|
+
jobs.concat(files_to_run - slow_files)
|
132
|
+
jobs.concat(files_to_example_ids(slow_files))
|
133
|
+
else
|
134
|
+
jobs.concat(files_to_run)
|
135
|
+
end
|
131
136
|
|
132
|
-
# assign timings to all of them
|
133
137
|
default_timing = timings.values[timings.values.size/2]
|
134
138
|
|
139
|
+
# assign timings (based on previous runs) to all jobs
|
135
140
|
jobs = jobs.each_with_object({}) do |j, h|
|
136
|
-
|
137
|
-
|
141
|
+
puts "Untimed job: #{j}" if timings[j].nil?
|
142
|
+
|
143
|
+
# HEURISTIC: put jobs without previous timings (e.g. a newly added
|
144
|
+
# spec file) in the middle of the queue
|
138
145
|
h[j] = timings[j] || default_timing
|
139
146
|
end
|
140
147
|
|
141
|
-
#
|
148
|
+
# sort jobs based on their timings (slowest to be processed first)
|
142
149
|
jobs = jobs.sort_by { |_j, t| -t }.map(&:first)
|
143
150
|
|
144
151
|
puts "Published queue (size=#{queue.publish(jobs)})"
|
145
152
|
end
|
146
153
|
|
154
|
+
private
|
155
|
+
|
156
|
+
def reset_rspec_state!
|
157
|
+
RSpec.clear_examples
|
158
|
+
|
159
|
+
# see https://github.com/rspec/rspec-core/pull/2723
|
160
|
+
if Gem::Version.new(RSpec::Core::Version::STRING) <= Gem::Version.new("3.9.1")
|
161
|
+
RSpec.world.instance_variable_set(
|
162
|
+
:@example_group_counts_by_spec_file, Hash.new(0))
|
163
|
+
end
|
164
|
+
|
165
|
+
# RSpec.clear_examples does not reset those, which causes issues when
|
166
|
+
# a non-example error occurs (subsequent jobs are not executed)
|
167
|
+
# TODO: upstream
|
168
|
+
RSpec.world.non_example_failure = false
|
169
|
+
|
170
|
+
# we don't want an error that occurred outside of the examples (which
|
171
|
+
# would set this to `true`) to stop the worker
|
172
|
+
RSpec.world.wants_to_quit = false
|
173
|
+
end
|
174
|
+
|
     # NOTE: RSpec has to load the files before we can split them as individual
     # examples. In case a file to be split fails to be loaded
-    # (e.g. contains a syntax error), we return the
-    #
-    #
-    # Their errors will be reported in the normal flow, when they're picked up
-    # as jobs by a worker.
+    # (e.g. contains a syntax error), we return the files unchanged, thereby
+    # falling back to scheduling them as whole files. Their errors will be
+    # reported in the normal flow when they're eventually picked up by a worker.
     def files_to_example_ids(files)
-
-      cmd = "DISABLE_SPRING=1 bin/rspec --dry-run --format json #{files.join(' ')}"
+      cmd = "DISABLE_SPRING=1 bundle exec rspec --dry-run --format json #{files.join(' ')} 2>&1"
       out = `#{cmd}`
+      cmd_result = $?
 
-      if
-
-
+      if !cmd_result.success?
+        rspec_output = begin
+          JSON.parse(out)
+        rescue JSON::ParserError
+          out
+        end
 
-
-
-
-
-
-
+        log_event(
+          "Failed to split slow files, falling back to regular scheduling",
+          "error",
+          rspec_output: rspec_output,
+          cmd_result: cmd_result.inspect,
+        )
+
+        pp rspec_output
 
         return files
       end
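The scheduling heuristic in the hunk above (split files slower than the threshold, give untimed jobs a middle timing, run the slowest jobs first) can be sketched in isolation. This is a minimal sketch under stated assumptions, not RSpecQ's implementation: the `schedule` method, the `example_ids_for` lookup (standing in for `files_to_example_ids`) and the sample data are hypothetical, and `timings` is assumed to be a Hash of job to seconds, ordered slowest first.

```ruby
# Hypothetical, self-contained sketch of the scheduling heuristic shown in the
# diff. Assumes `timings` is a Hash of job => seconds, ordered slowest first.
def schedule(files_to_run, timings, file_split_threshold, example_ids_for)
  slow_files = []
  if file_split_threshold
    # Walk the slowest-first timings until one falls under the threshold and
    # keep only the files that are part of this run.
    slow_files = timings.take_while do |_job, duration|
      duration >= file_split_threshold
    end.map(&:first) & files_to_run
  end

  # Fast files stay whole; slow files are replaced by their example ids.
  jobs = files_to_run - slow_files
  jobs += slow_files.flat_map { |file| example_ids_for.fetch(file, [file]) }

  # Jobs without a recorded timing get the middle timing. The `|| 0` guards
  # the empty-timings case (a choice of this sketch, not of the original).
  default_timing = timings.values[timings.size / 2] || 0

  # Slowest jobs first.
  jobs.sort_by { |job| -(timings[job] || default_timing) }
end

timings = {
  "spec/slow_spec.rb"      => 30.0,
  "spec/slow_spec.rb[1:1]" => 20.0,
  "spec/other_spec.rb"     => 7.0,
  "spec/slow_spec.rb[1:2]" => 4.0,
  "spec/fast_spec.rb"      => 1.0,
}
example_ids = {
  "spec/slow_spec.rb" => ["spec/slow_spec.rb[1:1]", "spec/slow_spec.rb[1:2]"],
}
files = ["spec/fast_spec.rb", "spec/slow_spec.rb", "spec/new_spec.rb"]

puts schedule(files, timings, 10, example_ids)
```

With a threshold of 10 seconds, only `spec/slow_spec.rb` is split into its examples, and the untimed `spec/new_spec.rb` is scheduled with the middle timing (7.0 here), which places it between the two slow examples.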
@@ -181,5 +212,23 @@ module RSpecQ
     def elapsed(since)
       Process.clock_gettime(Process::CLOCK_MONOTONIC) - since
     end
+
+    # Prints msg to standard output and emits an event to Sentry, if the
+    # SENTRY_DSN environment variable is set.
+    def log_event(msg, level, additional={})
+      puts msg
+
+      Raven.capture_message(msg, level: level, extra: {
+        build: @build_id,
+        worker: @worker_id,
+        queue: queue.inspect,
+        files_or_dirs_to_run: files_or_dirs_to_run,
+        populate_timings: populate_timings,
+        file_split_threshold: file_split_threshold,
+        heartbeat_updated_at: @heartbeat_updated_at,
+        object: self.inspect,
+        pid: Process.pid,
+      }.merge(additional))
+    end
   end
 end
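The failure path added in this file (parse the dry-run output as JSON, keep the raw text if that fails, then report the event with worker context merged with call-specific details) can be sketched without a Sentry account. This is a minimal sketch, not the gem's code: `RecordingNotifier` is a hypothetical stand-in for the Sentry client, `parse_or_raw` is a hypothetical helper name, and `log_event` here takes the notifier and context as explicit arguments instead of reading instance state.

```ruby
require "json"

# Hypothetical stand-in for the Sentry client; it only records what it is given.
class RecordingNotifier
  attr_reader :events

  def initialize
    @events = []
  end

  def capture_message(msg, level:, extra:)
    @events << { msg: msg, level: level, extra: extra }
  end
end

# Returns the parsed JSON when possible, otherwise the raw output (for
# example when stderr noise was mixed into the command's stdout).
def parse_or_raw(out)
  JSON.parse(out)
rescue JSON::ParserError
  out
end

# Prints the message and forwards it with the base context merged with any
# call-specific details; keys in `additional` win on conflict.
def log_event(notifier, msg, level, context, additional = {})
  puts msg
  notifier.capture_message(msg, level: level, extra: context.merge(additional))
end

notifier = RecordingNotifier.new
log_event(
  notifier,
  "Failed to split slow files, falling back to regular scheduling",
  "error",
  { worker: "worker-1", pid: Process.pid },
  rspec_output: parse_or_raw("syntax error, unexpected end-of-input"),
)
```

Passing the notifier in explicitly is a design choice of the sketch: it keeps the example runnable and makes the merged payload easy to inspect.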
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: rspecq
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.1.0
 platform: ruby
 authors:
 - Agis Anastasopoulos
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2020-
+date: 2020-08-27 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec-core
@@ -38,22 +38,64 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  name: sentry-raven
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: pry-byebug
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 - !ruby/object:Gem::Dependency
   name: minitest
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - "
+    - - ">="
       - !ruby/object:Gem::Version
-        version: '
+        version: '0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - "
+    - - ">="
       - !ruby/object:Gem::Version
-        version: '
+        version: '0'
 - !ruby/object:Gem::Dependency
-  name:
+  name: rspec
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
@@ -76,6 +118,7 @@ files:
 - CHANGELOG.md
 - LICENSE
 - README.md
+- Rakefile
 - bin/rspecq
 - lib/rspecq.rb
 - lib/rspecq/formatters/example_count_recorder.rb
@@ -101,12 +144,13 @@ required_ruby_version: !ruby/object:Gem::Requirement
       version: '0'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
-  - - "
+  - - ">="
     - !ruby/object:Gem::Version
-      version:
+      version: '0'
 requirements: []
-rubygems_version: 3.1.
+rubygems_version: 3.1.4
 signing_key:
 specification_version: 4
-summary:
+summary: Optimally distribute and run RSpec suites among parallel workers; for faster
+  CI builds
 test_files: []
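The new dependency entries in the metadata above all carry a `>= 0` requirement, which is what RubyGems records when a dependency is declared without a version constraint. The snippet below is a minimal sketch of such declarations, not the gem's actual gemspec: the `example_spec` method and the `example` gem name are hypothetical, and only a subset of the dependencies is shown.

```ruby
require "rubygems"

# Hypothetical specification used only to illustrate how unconstrained
# dependencies are recorded; it is not RSpecQ's gemspec.
def example_spec
  Gem::Specification.new do |s|
    s.name    = "example"
    s.version = "0.1.0"
    s.summary = "Illustrative gem"
    s.authors = ["Example Author"]

    # No version constraint given, so the requirement defaults to ">= 0".
    s.add_runtime_dependency "sentry-raven"
    s.add_development_dependency "rake"
    s.add_development_dependency "pry-byebug"
  end
end

example_spec.dependencies.each do |dep|
  puts "#{dep.name} (#{dep.type}): #{dep.requirement}"
end
```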