rspecq 0.0.1.pre2 → 0.1.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 4fcc5311329946efb2a7801087f4cdb5f0be8becc01bd1bb8f367b6d130a02ea
4
- data.tar.gz: d6f64c6d0c1dae8a53af8bf2d7724e2fb988b03fa795c59d0f5ecd18a92b072a
3
+ metadata.gz: d6b4c91525a2fb29e2198f290877ffe5ef1e753dafe0f9babd75a581e25d7af8
4
+ data.tar.gz: 27d3705ee014a5dc77514238b36386eaf11d1bb76601ab25c463796184ea5795
5
5
  SHA512:
6
- metadata.gz: 7611cf0944ea7751eaf93a7aae5686f6c03563b01e71978d9e1d61f30f14f89b12de0ce9ac590f9351eff22a4d4811f9e2f6c241232754ede3162142225f2c27
7
- data.tar.gz: bdbd6da559607026b8e6fead442d4b15b1bb73e957a63b022c4459a83aba0c2a8e297204d9c29ae3bbc700d39ea2434c7b99e55dfb3752a56af5376d0511fea0
6
+ metadata.gz: 21803fa664abe45f173f7121dc948ebc8dbc0df41046f8e6269e8ec3751b647701b07a8b6a872aad0da22f438c043ebe6bf0fc69bc9c9c8b327e3b157dccab04
7
+ data.tar.gz: 4d683884610e2e28ca5ce5cf891e37135886ea4c6ec905f26607c1f0fdd62c1952e9c49983bb0191eb5d42433a800bf88f8227354a213b50645742f21f95af16
@@ -1,4 +1,22 @@
1
1
  # Changelog
2
2
 
3
+ Breaking changes are prefixed with a "[BREAKING]" label.
4
+
3
5
  ## master (unreleased)
4
6
 
7
+ ## 0.1.0 (2020-08-27)
8
+
9
+ ### Added
10
+
11
+ - Sentry integration for various RSpecQ-level events [[#16](https://github.com/skroutz/rspecq/pull/16)]
12
+ - CLI: Flags can now also be set via environment variables [[c519230](https://github.com/skroutz/rspecq/commit/c5192303e229f361e8ac86ae449b4ea84d42e022)]
13
+ - CLI: Added shorthand versions for some flags [[df9faa8](https://github.com/skroutz/rspecq/commit/df9faa8ec6721af8357cfee4de6a2fe7b32070fc)]
14
+ - CLI: Added `--help` and `--version` flags [[df9faa8](https://github.com/skroutz/rspecq/commit/df9faa8ec6721af8357cfee4de6a2fe7b32070fc)]
15
+ - CLI: Max number of retries for failed examples is now configurable via the `--max-requeues` option [[#14](https://github.com/skroutz/rspecq/pull/14)]
16
+
17
+ ### Changed
18
+
19
+ - [BREAKING] CLI: Renamed `--timings` to `--update-timings` [[c519230](https://github.com/skroutz/rspecq/commit/c5192303e229f361e8ac86ae449b4ea84d42e022)]
20
+ - [BREAKING] CLI: Renamed `--build-id` to `--build` and `--worker-id` to `--worker` [[df9faa8](https://github.com/skroutz/rspecq/commit/df9faa8ec6721af8357cfee4de6a2fe7b32070fc)]
21
+ - CLI: `--worker` is not required when `--reporter` is used [[4323a75](https://github.com/skroutz/rspecq/commit/4323a75ca357274069d02ba9fb51cdebb04e0be4)]
22
+ - CLI: Improved help output [[df9faa8](https://github.com/skroutz/rspecq/commit/df9faa8ec6721af8357cfee4de6a2fe7b32070fc)]
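For scripts that still pass the pre-0.1.0 flag names, the two [BREAKING] renames listed above amount to a small mapping. The following is a minimal sketch and not part of rspecq: `RENAMED_FLAGS` and `migrate_rspecq_args` are hypothetical names introduced here purely for illustration.

```ruby
# Hypothetical helper (not shipped with rspecq): rewrites pre-0.1.0 flag
# names to the names listed under "Changed" in the 0.1.0 changelog.
RENAMED_FLAGS = {
  "--timings"   => "--update-timings",
  "--build-id"  => "--build",
  "--worker-id" => "--worker"
}.freeze

def migrate_rspecq_args(args)
  args.map do |arg|
    # Handles both "--flag=value" and bare "--flag" forms; anything that is
    # not a renamed flag (values, paths, other flags) passes through as-is.
    flag, separator, value = arg.partition("=")
    "#{RENAMED_FLAGS.fetch(flag, flag)}#{separator}#{value}"
  end
end

puts migrate_rspecq_args(%w[--build-id=123 --worker-id=foo1 --timings spec/]).join(" ")
```

Only the three renamed flags are touched, so the helper is safe to run over argument lists that already use the new names.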
data/README.md CHANGED
@@ -1,102 +1,204 @@
1
- # RSpecQ
1
+ RSpec Queue
2
+ =========================================================================
3
+ [![Build Status](https://travis-ci.com/skroutz/rspecq.svg?branch=master)](https://travis-ci.com/github/skroutz/rspecq)
4
+ [![Gem Version](https://badge.fury.io/rb/rspecq.svg)](https://badge.fury.io/rb/rspecq)
2
5
 
3
- RSpecQ (`rspecq`) distributes and executes an RSpec suite over many workers,
4
- using a centralized queue backed by Redis.
6
+ RSpec Queue (RSpecQ) distributes and executes RSpec suites among parallel
7
+ workers. It uses a centralized queue that workers connect to and pop off
8
+ tests from. It ensures optimal scheduling of tests based on their run time,
9
+ facilitating faster CI builds.
5
10
 
6
- RSpecQ is heavily inspired by [test-queue](https://github.com/tmm1/test-queue)
11
+ RSpecQ is inspired by [test-queue](https://github.com/tmm1/test-queue)
7
12
  and [ci-queue](https://github.com/Shopify/ci-queue).
8
13
 
9
- ## Why don't you just use ci-queue?
14
+ ## Features
15
+
16
+ - Run an RSpec suite among many workers
17
+ (potentially located in different hosts) in a distributed fashion,
18
+ facilitating faster CI builds.
19
+ - Consolidated, real-time reporting of a build's progress.
20
+ - Optimal scheduling of test execution by using timings statistics from previous runs and
21
+ automatically scheduling slow spec files as individual examples. See
22
+ [*Spec file splitting*](#spec-file-splitting).
23
+ - Automatic retry of test failures before being considered legit, in order to
24
+ rule out flakiness. See [*Requeues*](#requeues).
25
+ - Handles intermittent worker failures (e.g. network hiccups, faulty hardware etc.)
26
+ by detecting non-responsive workers and requeueing their jobs. See [*Worker failures*](#worker-failures).
27
+ - [Sentry](https://sentry.io) integration for monitoring important
28
+ RSpecQ-level events.
29
+ - [PLANNED] StatsD integration for various build-level metrics and insights.
30
+ See [#2](https://github.com/skroutz/rspecq/issues/2).
10
31
 
11
- While evaluating ci-queue for our RSpec suite, we observed slow boot times
12
- in the workers (up to 3 minutes), increased memory consumption and too much
13
- disk I/O on boot. This is due to the fact that a worker in ci-queue has to
14
- load every spec file on boot. This can be problematic for applications with
15
- a large number of spec files.
16
-
17
- RSpecQ works with spec files as its unit of work (as opposed to ci-queue which
18
- works with individual examples). This means that an RSpecQ worker does not
19
- have to load all spec files at once and so it doesn't have the aforementioned
20
- problems. It also allows suites to keep using `before(:all)` hooks
21
- (which ci-queue explicitly rejects). (Note: RSpecQ also schedules individual
22
- examples, but only when this is deemed necessary, see section
23
- "Spec file splitting").
32
+ ## Usage
24
33
 
25
- We also observed faster build times by scheduling spec files instead of
26
- individual examples, due to way less Redis operations.
34
+ A worker needs to be given a name and the build it will participate in.
35
+ Assuming there's a Redis instance listening at `localhost`, starting a worker
36
+ is as simple as:
27
37
 
28
- The downside of this design is that it's more complicated, since the scheduling
29
- of spec files happens based on timings calculated from previous runs. This
30
- means that RSpecQ maintains a key with the timing of each job and updates it
31
- on every run (if the `--timings` option was used). Also, RSpecQ has a "slow
32
- file threshold" which, currently has to be set manually (but this can be
33
- improved).
38
+ ```shell
39
+ $ rspecq --build=123 --worker=foo1 spec/
40
+ ```
34
41
 
35
- *Update*: ci-queue deprecated support for RSpec, so there's that.
42
+ To start more workers for the same build, use distinct worker IDs but the same
43
+ build ID:
36
44
 
37
- ## Usage
45
+ ```shell
46
+ $ rspecq --build=123 --worker=foo2
47
+ ```
38
48
 
39
- Each worker needs to know the build it will participate in, its name and where
40
- Redis is located. To start a worker:
49
+ To view the progress of the build use `--report`:
41
50
 
42
51
  ```shell
43
- $ rspecq --build-id=foo --worker-id=worker1 --redis=redis://localhost
52
+ $ rspecq --build=123 --report
44
53
  ```
45
54
 
46
- To view the progress of the build print use `--report`:
55
+ For detailed info use `--help`:
47
56
 
48
- ```shell
49
- $ rspecq --build-id=foo --worker-id=reporter --redis=redis://localhost --report
50
57
  ```
58
+ NAME:
59
+ rspecq - Optimally distribute and run RSpec suites among parallel workers
60
+
61
+ USAGE:
62
+ rspecq [<options>] [spec files or directories]
63
+
64
+ OPTIONS:
65
+ -b, --build ID A unique identifier for the build. Should be common among workers participating in the same build.
66
+ -w, --worker ID An identifier for the worker. Workers participating in the same build should have distinct IDs.
67
+ -r, --redis HOST Redis host to connect to (default: 127.0.0.1).
68
+ --update-timings Update the global job timings key with the timings of this build. Note: This key is used as the basis for job scheduling.
69
+ --file-split-threshold N Split spec files slower than N seconds and schedule them as individual examples.
70
+ --report Enable reporter mode: do not pull tests off the queue; instead print build progress and exit when it's finished.
71
+ Exits with a non-zero status code if there were any failures.
72
+ --report-timeout N Fail if build is not finished after N seconds. Only applicable if --report is enabled (default: 3600).
73
+ --max-requeues N Retry failed examples up to N times before considering them legit failures (default: 3).
74
+ -h, --help Show this message.
75
+ -v, --version Print the version and exit.
76
+ ```
77
+
78
+ ### Sentry integration
51
79
 
52
- For detailed info use `--help`.
80
+ RSpecQ can optionally emit build events to a
81
+ [Sentry](https://sentry.io) project by setting the
82
+ [`SENTRY_DSN`](https://github.com/getsentry/raven-ruby#raven-only-runs-when-sentry_dsn-is-set)
83
+ environment variable.
84
+
85
+ This is convenient for monitoring important warnings/errors that may impact
86
+ build times, such as the fact that no previous timings were found and
87
+ therefore job scheduling was effectively random for a particular build.
53
88
 
54
89
 
55
90
  ## How it works
56
91
 
57
- The basic idea is identical to ci-queue so please refer to its README
92
+ The core design is almost identical to ci-queue so please refer to its
93
+ [README](https://github.com/Shopify/ci-queue/blob/master/README.md) instead.
58
94
 
59
95
  ### Terminology
60
96
 
61
- - Job: the smallest unit of work, which is usually a spec file
97
+ - **Job**: the smallest unit of work, which is usually a spec file
62
98
  (e.g. `./spec/models/foo_spec.rb`) but can also be an individual example
63
- (e.g. `./spec/models/foo_spec.rb[1:2:1]`) if the file is too slow
64
- - Queue: a collection of Redis-backed structures that hold all the necessary
65
- information for RSpecQ to function. This includes timing statistics, jobs to
66
- be executed, the failure reports, requeueing statistics and more.
67
- - Worker: a process that, given a build id, pops up jobs of that build and
68
- executes them using RSpec
69
- - Reporter: a process that, given a build id, waits for the build to finish
70
- and prints the summary report (examples executed, build result, failures etc.)
99
+ (e.g. `./spec/models/foo_spec.rb[1:2:1]`) if the file is too slow.
100
+ - **Queue**: a collection of Redis-backed structures that hold all the necessary
101
+ information for an RSpecQ build to run. This includes timing statistics,
102
+ jobs to be executed, the failure reports and more.
103
+ - **Build**: a particular test suite run. Each build has its own **Queue**.
104
+ - **Worker**: an `rspecq` process that, given a build id, consumes jobs off the
105
+ build's queue and executes them using RSpec
106
+ - **Reporter**: an `rspecq` process that, given a build id, waits for the build's
107
+ queue to be drained and prints the build summary report
71
108
 
72
109
  ### Spec file splitting
73
110
 
74
- Very slow files may put a limit to how fast the suite can execute. For example,
75
- a worker may spend 10 minutes running a single slow file, while all the other
76
- workers finish after 8 minutes. To overcome this issue, rspecq splits
77
- files that their execution time is above a certain threshold
78
- (set with the `--file-split-threshold` option) and will instead schedule them as
79
- individual examples.
111
+ Particularly slow spec files may set a limit to how fast a build can be.
112
+ For example, a single file may need 10 minutes to run while all other
113
+ files finish after 8 minutes. This would cause all but one worker to be
114
+ sitting idle for 2 minutes.
115
+
116
+ To overcome this issue, RSpecQ can split files whose execution time is
117
+ above a certain threshold (set with the `--file-split-threshold` option)
118
+ and instead schedule them as individual examples.
80
119
 
81
- In the future, we'd like for the slow threshold to be calculated and set
82
- dynamically.
120
+ Note: In the future, we'd like for the slow threshold to be calculated and set
121
+ dynamically (see #3).
83
122
 
84
123
  ### Requeues
85
124
 
86
- As a mitigation measure for flaky tests, if an example fails it will be put
87
- back to the queue to be picked up by
88
- another worker. This will be repeated up to a certain number of times before,
89
- after which the example will be considered a legit failure and will be printed
90
- in the final report (`--report`).
125
+ As a mitigation technique against flaky tests, if an example fails it will be
126
+ put back to the queue to be picked up by another worker. This will be repeated
127
+ up to a certain number of times (set with the `--max-requeues` option), after
128
+ which the example will be considered a legit failure and printed as such in the
129
+ final report.
91
130
 
92
131
  ### Worker failures
93
132
 
94
- Workers emit a timestamp after each example, as a heartbeat, to denote
95
- that they're fine and performing jobs. If a worker hasn't reported for
96
- a given amount of time (see `WORKER_LIVENESS_SEC`) it is considered dead
97
- and the job it reserved will be requeued, so that it is picked up by another worker.
133
+ It's not uncommon for CI processes to encounter unrecoverable failures for
134
+ various reasons: faulty hardware, network hiccups, segmentation faults in
135
+ MRI etc.
136
+
137
+ For resiliency against such issues, workers emit a heartbeat after each
138
+ example they execute, to signal
139
+ that they're healthy and performing jobs as expected. If a worker hasn't
140
+ emitted a heartbeat for a given amount of time (set by `WORKER_LIVENESS_SEC`)
141
+ it is considered dead and its reserved job will be put back to the queue, to
142
+ be picked up by another healthy worker.
143
+
144
+
145
+ ## Rationale
146
+
147
+ ### Why didn't you use ci-queue?
148
+
149
+ **Update**: ci-queue [deprecated support for RSpec](https://github.com/Shopify/ci-queue/pull/149).
150
+
151
+ While evaluating ci-queue we experienced slow worker boot
152
+ times (up to 3 minutes in some cases) combined with disk IO saturation and
153
+ increased memory consumption. This is due to the fact that a worker in
154
+ ci-queue has to load every spec file on boot. In applications with a large
155
+ number of spec files this may result in a significant performance hit and
156
+ in the case of cloud environments, increased costs.
157
+
158
+ We also observed slower build times compared to our previous solution which
159
+ scheduled whole spec files (as opposed to individual examples), due to
160
+ big differences in runtimes of individual examples, something common in big
161
+ RSpec suites.
162
+
163
+ We decided for RSpecQ to use whole spec files as its main unit of work (as
164
+ opposed to ci-queue which uses individual examples). This means that an RSpecQ
165
+ worker only loads the files needed and ends up with a subset of all the suite's
166
+ files. (Note: RSpecQ also schedules individual examples, but only when this is
167
+ deemed necessary, see [Spec file splitting](#spec-file-splitting)).
168
+
169
+ This kept boot and test run times considerably fast. As a side benefit, this
170
+ allows suites to keep using `before(:all)` hooks (which ci-queue explicitly
171
+ rejects).
172
+
173
+ The downside of this design is that it's more complicated, since the scheduling
174
+ of spec files happens based on timings calculated from previous runs. This
175
+ means that RSpecQ maintains a key with the timing of each job and updates it
176
+ on every run (if the `--update-timings` option was used). Also, RSpecQ has a "slow
177
+ file threshold" which currently has to be set manually (but this can be
178
+ improved in the future).
179
+
180
+
181
+ ## Development
182
+
183
+ Install the required dependencies:
184
+
185
+ ```
186
+ $ bundle install
187
+ ```
188
+
189
+ Then you can execute the tests after spinning up a Redis instance at
190
+ `127.0.0.1:6379`:
191
+
192
+ ```
193
+ $ bundle exec rake
194
+ ```
195
+
196
+ To enable verbose output in the tests:
197
+
198
+ ```
199
+ $ RSPECQ_DEBUG=1 bundle exec rake
200
+ ```
98
201
 
99
- This protects us against unrecoverable worker failures (e.g. segfault).
100
202
 
101
203
  ## License
102
204
 
@@ -0,0 +1,9 @@
1
+ require "rake/testtask"
2
+
3
+ Rake::TestTask.new do |t|
4
+ t.libs << "test"
5
+ t.test_files = FileList['test/test*.rb']
6
+ t.verbose = true
7
+ end
8
+
9
+ task default: :test
data/bin/rspecq CHANGED
@@ -1,67 +1,118 @@
1
1
  #!/usr/bin/env ruby
2
- require "optionparser"
2
+ require "optparse"
3
3
  require "rspecq"
4
4
 
5
+ DEFAULT_REDIS_HOST = "127.0.0.1"
6
+ DEFAULT_REPORT_TIMEOUT = 3600 # 1 hour
7
+ DEFAULT_MAX_REQUEUES = 3
8
+
9
+ def env_set?(var)
10
+ ["1", "true"].include?(ENV[var])
11
+ end
12
+
5
13
  opts = {}
14
+
6
15
  OptionParser.new do |o|
7
- o.banner = "Usage: #{$PROGRAM_NAME} [opts] [files_or_directories_to_run]"
16
+ name = File.basename($PROGRAM_NAME)
8
17
 
9
- o.on("--build-id ID", "A unique identifier denoting the build") do |v|
10
- opts[:build_id] = v
18
+ o.banner = <<~BANNER
19
+ NAME:
20
+ #{name} - Optimally distribute and run RSpec suites among parallel workers
21
+
22
+ USAGE:
23
+ #{name} [<options>] [spec files or directories]
24
+ BANNER
25
+
26
+ o.separator ""
27
+ o.separator "OPTIONS:"
28
+
29
+ o.on("-b", "--build ID", "A unique identifier for the build. Should be " \
30
+ "common among workers participating in the same build.") do |v|
31
+ opts[:build] = v
11
32
  end
12
33
 
13
- o.on("--worker-id ID", "A unique identifier denoting the worker") do |v|
14
- opts[:worker_id] = v
34
+ o.on("-w", "--worker ID", "An identifier for the worker. Workers " \
35
+ "participating in the same build should have distinct IDs.") do |v|
36
+ opts[:worker] = v
15
37
  end
16
38
 
17
- o.on("--redis HOST", "Redis HOST to connect to (default: 127.0.0.1)") do |v|
18
- opts[:redis_host] = v || "127.0.0.1"
39
+ o.on("-r", "--redis HOST", "Redis host to connect to " \
40
+ "(default: #{DEFAULT_REDIS_HOST}).") do |v|
41
+ opts[:redis_host] = v
19
42
  end
20
43
 
21
- o.on("--timings", "Populate global job timings in Redis") do |v|
44
+ o.on("--update-timings", "Update the global job timings key with the " \
45
+ "timings of this build. Note: This key is used as the basis for job " \
46
+ "scheduling.") do |v|
22
47
  opts[:timings] = v
23
48
  end
24
49
 
25
- o.on("--file-split-threshold N", "Split spec files slower than N sec. and " \
26
- "schedule them by example (default: 999999)") do |v|
27
- opts[:file_split_threshold] = Float(v)
50
+ o.on("--file-split-threshold N", Integer, "Split spec files slower than N " \
51
+ "seconds and schedule them as individual examples.") do |v|
52
+ opts[:file_split_threshold] = v
28
53
  end
29
54
 
30
- o.on("--report", "Do not execute tests but wait until queue is empty and " \
31
- "print a report") do |v|
55
+ o.on("--report", "Enable reporter mode: do not pull tests off the queue; " \
56
+ "instead print build progress and exit when it's " \
57
+ "finished.\n#{o.summary_indent*9} " \
58
+ "Exits with a non-zero status code if there were any " \
59
+ "failures.") do |v|
32
60
  opts[:report] = v
33
61
  end
34
62
 
35
- o.on("--report-timeout N", Integer, "Fail if queue is not empty after " \
36
- "N seconds. Only applicable if --report is enabled " \
37
- "(default: 3600)") do |v|
63
+ o.on("--report-timeout N", Integer, "Fail if build is not finished after " \
64
+ "N seconds. Only applicable if --report is enabled " \
65
+ "(default: #{DEFAULT_REPORT_TIMEOUT}).") do |v|
38
66
  opts[:report_timeout] = v
39
67
  end
40
68
 
69
+ o.on("--max-requeues N", Integer, "Retry failed examples up to N times " \
70
+ "before considering them legit failures " \
71
+ "(default: #{DEFAULT_MAX_REQUEUES}).") do |v|
72
+ opts[:max_requeues] = v
73
+ end
74
+
75
+ o.on_tail("-h", "--help", "Show this message.") do
76
+ puts o
77
+ exit
78
+ end
79
+
80
+ o.on_tail("-v", "--version", "Print the version and exit.") do
81
+ puts "#{name} #{RSpecQ::VERSION}"
82
+ exit
83
+ end
41
84
  end.parse!
42
85
 
43
- [:build_id, :worker_id].each do |o|
44
- raise OptionParser::MissingArgument.new(o) if opts[o].nil?
45
- end
86
+ opts[:build] ||= ENV["RSPECQ_BUILD"]
87
+ opts[:worker] ||= ENV["RSPECQ_WORKER"]
88
+ opts[:redis_host] ||= ENV["RSPECQ_REDIS"] || DEFAULT_REDIS_HOST
89
+ opts[:timings] ||= env_set?("RSPECQ_UPDATE_TIMINGS")
90
+ opts[:file_split_threshold] ||= Integer(ENV["RSPECQ_FILE_SPLIT_THRESHOLD"] || 9999999)
91
+ opts[:report] ||= env_set?("RSPECQ_REPORT")
92
+ opts[:report_timeout] ||= Integer(ENV["RSPECQ_REPORT_TIMEOUT"] || DEFAULT_REPORT_TIMEOUT)
93
+ opts[:max_requeues] ||= Integer(ENV["RSPECQ_MAX_REQUEUES"] || DEFAULT_MAX_REQUEUES)
94
+
95
+ raise OptionParser::MissingArgument.new(:build) if opts[:build].nil?
96
+ raise OptionParser::MissingArgument.new(:worker) if !opts[:report] && opts[:worker].nil?
46
97
 
47
98
  if opts[:report]
48
99
  reporter = RSpecQ::Reporter.new(
49
- build_id: opts[:build_id],
50
- worker_id: opts[:worker_id],
51
- timeout: opts[:report_timeout] || 3600,
100
+ build_id: opts[:build],
101
+ timeout: opts[:report_timeout],
52
102
  redis_host: opts[:redis_host],
53
103
  )
54
104
 
55
105
  reporter.report
56
106
  else
57
107
  worker = RSpecQ::Worker.new(
58
- build_id: opts[:build_id],
59
- worker_id: opts[:worker_id],
60
- redis_host: opts[:redis_host],
61
- files_or_dirs_to_run: ARGV[0] || "spec",
108
+ build_id: opts[:build],
109
+ worker_id: opts[:worker],
110
+ redis_host: opts[:redis_host]
62
111
  )
63
112
 
113
+ worker.files_or_dirs_to_run = ARGV[0] if ARGV[0]
64
114
  worker.populate_timings = opts[:timings]
65
- worker.file_split_threshold = opts[:file_split_threshold] || 999999
115
+ worker.file_split_threshold = opts[:file_split_threshold]
116
+ worker.max_requeues = opts[:max_requeues]
66
117
  worker.work
67
118
  end
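The option handling in `bin/rspecq` above resolves each setting in a fixed order: an explicit flag first, then the matching `RSPECQ_*` environment variable, then the built-in default. Below is a minimal sketch of that order for one integer option and one boolean option. `resolve_max_requeues` and `env_flag?` are hypothetical helpers written for this illustration, and the `env` parameter is only there so a plain Hash can stand in for `ENV`.

```ruby
DEFAULT_MAX_REQUEUES = 3 # same default as in bin/rspecq

# Boolean options are enabled by "1" or "true", mirroring env_set? above.
def env_flag?(env, var)
  ["1", "true"].include?(env[var])
end

# Flag value wins; otherwise fall back to the environment, then the default.
# Integer() raises ArgumentError on non-numeric input instead of guessing.
def resolve_max_requeues(cli_value, env = ENV)
  cli_value || Integer(env["RSPECQ_MAX_REQUEUES"] || DEFAULT_MAX_REQUEUES)
end

puts resolve_max_requeues(nil, { "RSPECQ_MAX_REQUEUES" => "5" })
puts env_flag?({ "RSPECQ_REPORT" => "true" }, "RSPECQ_REPORT")
```

Under this scheme a worker could be configured entirely through the environment, for example with `RSPECQ_BUILD` and `RSPECQ_WORKER` set and only the spec path given on the command line.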
@@ -1,11 +1,10 @@
1
1
  require "rspec/core"
2
+ require "sentry-raven"
2
3
 
3
4
  module RSpecQ
4
- MAX_REQUEUES = 3
5
-
6
- # If a worker haven't executed an RSpec example for more than this time
7
- # (in seconds), it is considered dead and its reserved work will be put back
8
- # to the queue, to be picked up by another worker.
5
+ # If a worker hasn't executed an example for more than WORKER_LIVENESS_SEC
6
+ # seconds, it is considered dead and its reserved work will be put back
7
+ # to the queue to be picked up by another worker.
9
8
  WORKER_LIVENESS_SEC = 60.0
10
9
  end
11
10
 
@@ -16,6 +15,5 @@ require_relative "rspecq/formatters/worker_heartbeat_recorder"
16
15
 
17
16
  require_relative "rspecq/queue"
18
17
  require_relative "rspecq/reporter"
19
- require_relative "rspecq/worker"
20
-
21
18
  require_relative "rspecq/version"
19
+ require_relative "rspecq/worker"
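The liveness rule tied to `WORKER_LIVENESS_SEC` can be read as two timing checks: a worker refreshes its heartbeat at most every `WORKER_LIVENESS_SEC / 6` seconds (the `HEARTBEAT_FREQUENCY` constant in the worker class), and any worker whose last heartbeat is older than `WORKER_LIVENESS_SEC` is treated as dead. The sketch below models only that arithmetic. The `HeartbeatSketch` module and its method names are assumptions made for illustration, and the real bookkeeping lives in the Redis-backed queue, which is not shown in this diff.

```ruby
module HeartbeatSketch
  WORKER_LIVENESS_SEC = 60.0                     # value from the source
  HEARTBEAT_FREQUENCY = WORKER_LIVENESS_SEC / 6  # 10 seconds

  # Should the worker emit a heartbeat now? Timestamps are in seconds.
  def self.heartbeat_due?(last_sent_at, now)
    last_sent_at.nil? || (now - last_sent_at) >= HEARTBEAT_FREQUENCY
  end

  # Has the worker been silent for longer than the liveness window?
  def self.dead?(last_heartbeat_at, now)
    (now - last_heartbeat_at) > WORKER_LIVENESS_SEC
  end
end

now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
puts HeartbeatSketch.heartbeat_due?(nil, now)
puts HeartbeatSketch.dead?(now - 120, now)
```

Sending heartbeats several times per liveness window leaves room for a few delayed updates before a healthy worker would be misclassified.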
@@ -1,11 +1,12 @@
1
1
  module RSpecQ
2
2
  module Formatters
3
3
  class FailureRecorder
4
- def initialize(queue, job)
4
+ def initialize(queue, job, max_requeues)
5
5
  @queue = queue
6
6
  @job = job
7
7
  @colorizer = RSpec::Core::Formatters::ConsoleCodes
8
8
  @non_example_error_recorded = false
9
+ @max_requeues = max_requeues
9
10
  end
10
11
 
11
12
  # Here we're notified about errors occurring outside of examples.
@@ -24,7 +25,7 @@ module RSpecQ
24
25
  def example_failed(notification)
25
26
  example = notification.example
26
27
 
27
- if @queue.requeue_job(example.id, MAX_REQUEUES)
28
+ if @queue.requeue_job(example.id, @max_requeues)
28
29
  # HACK: try to avoid picking the job we just requeued; we want it
29
30
  # to be picked up by a different worker
30
31
  sleep 0.5
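The `example_failed` hook above depends on `requeue_job(example.id, @max_requeues)` returning a truthy value while the example may still be retried. The queue's actual implementation is not part of this diff, so the following is only an in-memory sketch of the counting behaviour that call implies. `InMemoryRequeues` is a hypothetical stand-in, not the Redis-backed `Queue` class.

```ruby
# Hypothetical in-memory model of per-example requeue counting.
class InMemoryRequeues
  attr_reader :pending

  def initialize
    @counts = Hash.new(0) # example id => number of requeues so far
    @pending = []         # examples waiting to be picked up again
  end

  # Returns true if the example was put back on the queue, false once it
  # has already been requeued max_requeues times (a legit failure).
  def requeue_job(example_id, max_requeues)
    return false if @counts[example_id] >= max_requeues

    @counts[example_id] += 1
    @pending << example_id
    true
  end
end

queue = InMemoryRequeues.new
results = 4.times.map { queue.requeue_job("./spec/foo_spec.rb[1:1]", 3) }
puts results.inspect
```

With `--max-requeues 0` this model never requeues, so every failure would be reported immediately.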
@@ -57,6 +57,8 @@ module RSpecQ
57
57
  STATUS_INITIALIZING = "initializing".freeze
58
58
  STATUS_READY = "ready".freeze
59
59
 
60
+ attr_reader :redis
61
+
60
62
  def initialize(build_id, worker_id, redis_host)
61
63
  @build_id = build_id
62
64
  @worker_id = worker_id
@@ -150,13 +152,21 @@ module RSpecQ
150
152
  end
151
153
 
152
154
  def example_count
153
- @redis.get(key_example_count) || 0
155
+ @redis.get(key_example_count).to_i
154
156
  end
155
157
 
156
158
  def processed_jobs_count
157
159
  @redis.scard(key_queue_processed)
158
160
  end
159
161
 
162
+ def processed_jobs
163
+ @redis.smembers(key_queue_processed)
164
+ end
165
+
166
+ def requeued_jobs
167
+ @redis.hgetall(key_requeues)
168
+ end
169
+
160
170
  def become_master
161
171
  @redis.setnx(key_queue_status, STATUS_INITIALIZING)
162
172
  end
@@ -200,10 +210,10 @@ module RSpecQ
200
210
  exhausted? && example_failures.empty? && non_example_errors.empty?
201
211
  end
202
212
 
203
- private
204
-
205
- def key(*keys)
206
- [@build_id, keys].join(":")
213
+ # The remaining jobs to be processed. Jobs at the head of the list will
214
+ # be processed first.
215
+ def unprocessed_jobs
216
+ @redis.lrange(key_queue_unprocessed, 0, -1)
207
217
  end
208
218
 
209
219
  # redis: STRING [STATUS_INITIALIZING, STATUS_READY]
@@ -279,6 +289,12 @@ module RSpecQ
279
289
  "build_times"
280
290
  end
281
291
 
292
+ private
293
+
294
+ def key(*keys)
295
+ [@build_id, keys].join(":")
296
+ end
297
+
282
298
  # We don't use any Ruby `Time` methods because specs that use timecop in
283
299
  # before(:all) hooks will mess up our times.
284
300
  def current_time
@@ -1,10 +1,9 @@
1
1
  module RSpecQ
2
2
  class Reporter
3
- def initialize(build_id:, worker_id:, timeout:, redis_host:)
3
+ def initialize(build_id:, timeout:, redis_host:)
4
4
  @build_id = build_id
5
- @worker_id = worker_id
6
5
  @timeout = timeout
7
- @queue = Queue.new(build_id, worker_id, redis_host)
6
+ @queue = Queue.new(build_id, "reporter", redis_host)
8
7
 
9
8
  # We want feedback to be immediately printed to CI users, so
10
9
  # we disable buffering.
@@ -12,7 +11,7 @@ module RSpecQ
12
11
  end
13
12
 
14
13
  def report
15
- t = measure_duration { @queue.wait_until_published }
14
+ @queue.wait_until_published
16
15
 
17
16
  finished = false
18
17
 
@@ -1,3 +1,3 @@
1
1
  module RSpecQ
2
- VERSION = "0.0.1.pre2".freeze
2
+ VERSION = "0.1.0".freeze
3
3
  end
@@ -1,10 +1,16 @@
1
1
  require "json"
2
+ require "pathname"
2
3
  require "pp"
3
4
 
4
5
  module RSpecQ
5
6
  class Worker
6
7
  HEARTBEAT_FREQUENCY = WORKER_LIVENESS_SEC / 6
7
8
 
9
+ # The root path or individual spec files to execute.
10
+ #
11
+ # Defaults to "spec" (just like in RSpec)
12
+ attr_accessor :files_or_dirs_to_run
13
+
8
14
  # If true, job timings will be populated in the global Redis timings key
9
15
  #
10
16
  # Defaults to false
@@ -12,15 +18,27 @@ module RSpecQ
12
18
 
13
19
  # If set, spec files that are known to take more than this value to finish,
14
20
  # will be split and scheduled on a per-example basis.
21
+ #
22
+ # Defaults to 999999
15
23
  attr_accessor :file_split_threshold
16
24
 
17
- def initialize(build_id:, worker_id:, redis_host:, files_or_dirs_to_run:)
25
+ # Retry failed examples up to N times (with N being the supplied value)
26
+ # before considering them legit failures
27
+ #
28
+ # Defaults to 3
29
+ attr_accessor :max_requeues
30
+
31
+ attr_reader :queue
32
+
33
+ def initialize(build_id:, worker_id:, redis_host:)
18
34
  @build_id = build_id
19
35
  @worker_id = worker_id
20
36
  @queue = Queue.new(build_id, worker_id, redis_host)
21
- @files_or_dirs_to_run = files_or_dirs_to_run
37
+ @files_or_dirs_to_run = "spec"
22
38
  @populate_timings = false
23
39
  @file_split_threshold = 999999
40
+ @heartbeat_updated_at = nil
41
+ @max_requeues = 3
24
42
 
25
43
  RSpec::Core::Formatters.register(Formatters::JobTimingRecorder, :dump_summary)
26
44
  RSpec::Core::Formatters.register(Formatters::ExampleCountRecorder, :dump_summary)
@@ -31,23 +49,23 @@ module RSpecQ
31
49
  def work
32
50
  puts "Working for build #{@build_id} (worker=#{@worker_id})"
33
51
 
34
- try_publish_queue!(@queue)
35
- @queue.wait_until_published
52
+ try_publish_queue!(queue)
53
+ queue.wait_until_published
36
54
 
37
55
  loop do
38
56
  # we have to bootstrap this so that it can be used in the first call
39
57
  # to `requeue_lost_job` inside the work loop
40
58
  update_heartbeat
41
59
 
42
- lost = @queue.requeue_lost_job
60
+ lost = queue.requeue_lost_job
43
61
  puts "Requeued lost job: #{lost}" if lost
44
62
 
45
63
  # TODO: can we make `reserve_job` also act like exhausted? and get
46
64
  # rid of `exhausted?` (i.e. return false if no jobs remain)
47
- job = @queue.reserve_job
65
+ job = queue.reserve_job
48
66
 
49
67
  # build is finished
50
- return if job.nil? && @queue.exhausted?
68
+ return if job.nil? && queue.exhausted?
51
69
 
52
70
  next if job.nil?
53
71
 
@@ -60,112 +78,125 @@ module RSpecQ
60
78
  RSpec.configuration.detail_color = :magenta
61
79
  RSpec.configuration.seed = srand && srand % 0xFFFF
62
80
  RSpec.configuration.backtrace_formatter.filter_gem('rspecq')
63
- RSpec.configuration.add_formatter(Formatters::FailureRecorder.new(@queue, job))
64
- RSpec.configuration.add_formatter(Formatters::ExampleCountRecorder.new(@queue))
81
+ RSpec.configuration.add_formatter(Formatters::FailureRecorder.new(queue, job, max_requeues))
82
+ RSpec.configuration.add_formatter(Formatters::ExampleCountRecorder.new(queue))
65
83
  RSpec.configuration.add_formatter(Formatters::WorkerHeartbeatRecorder.new(self))
66
84
 
67
85
  if populate_timings
68
- RSpec.configuration.add_formatter(Formatters::JobTimingRecorder.new(@queue, job))
86
+ RSpec.configuration.add_formatter(Formatters::JobTimingRecorder.new(queue, job))
69
87
  end
70
88
 
71
89
  opts = RSpec::Core::ConfigurationOptions.new(["--format", "progress", job])
72
90
  _result = RSpec::Core::Runner.new(opts).run($stderr, $stdout)
73
91
 
74
- @queue.acknowledge_job(job)
92
+ queue.acknowledge_job(job)
75
93
  end
76
94
  end
77
95
 
78
96
  # Update the worker heartbeat if necessary
79
97
  def update_heartbeat
80
98
  if @heartbeat_updated_at.nil? || elapsed(@heartbeat_updated_at) >= HEARTBEAT_FREQUENCY
81
- @queue.record_worker_heartbeat
99
+ queue.record_worker_heartbeat
82
100
  @heartbeat_updated_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
83
101
  end
84
102
  end
85
103
 
86
- private
87
-
88
- def reset_rspec_state!
89
- RSpec.clear_examples
90
-
91
- # TODO: remove after https://github.com/rspec/rspec-core/pull/2723
92
- RSpec.world.instance_variable_set(:@example_group_counts_by_spec_file, Hash.new(0))
93
-
94
- # RSpec.clear_examples does not reset those, which causes issues when
95
- # a non-example error occurs (subsequent jobs are not executed)
96
- # TODO: upstream
97
- RSpec.world.non_example_failure = false
98
-
99
- # we don't want an error that occured outside of the examples (which
100
- # would set this to `true`) to stop the worker
101
- RSpec.world.wants_to_quit = false
102
- end
103
-
104
104
  def try_publish_queue!(queue)
105
105
  return if !queue.become_master
106
106
 
107
- RSpec.configuration.files_or_directories_to_run = @files_or_dirs_to_run
107
+ RSpec.configuration.files_or_directories_to_run = files_or_dirs_to_run
108
108
  files_to_run = RSpec.configuration.files_to_run.map { |j| relative_path(j) }
109
109
 
110
110
  timings = queue.timings
111
111
  if timings.empty?
112
- # TODO: should be a warning reported somewhere (Sentry?)
113
112
  q_size = queue.publish(files_to_run.shuffle)
114
- puts "WARNING: No timings found! Published queue in " \
115
- "random order (size=#{q_size})"
113
+ log_event(
114
+ "No timings found! Published queue in random order (size=#{q_size})",
115
+ "warning"
116
+ )
116
117
  return
117
118
  end
118
119
 
- slow_files = timings.take_while do |_job, duration|
- duration >= file_split_threshold
- end.map(&:first) & files_to_run
+ # prepare jobs to run
+ jobs = []
+ slow_files = []

- if slow_files.any?
- puts "Slow files (threshold=#{file_split_threshold}): #{slow_files}"
+ if file_split_threshold
+ slow_files = timings.take_while do |_job, duration|
+ duration >= file_split_threshold
+ end.map(&:first) & files_to_run
  end

- # prepare jobs to run
- jobs = []
- jobs.concat(files_to_run - slow_files)
- jobs.concat(files_to_example_ids(slow_files)) if slow_files.any?
+ if slow_files.any?
+ jobs.concat(files_to_run - slow_files)
+ jobs.concat(files_to_example_ids(slow_files))
+ else
+ jobs.concat(files_to_run)
+ end

- # assign timings to all of them
  default_timing = timings.values[timings.values.size/2]

+ # assign timings (based on previous runs) to all jobs
  jobs = jobs.each_with_object({}) do |j, h|
- # heuristic: put untimed jobs in the middle of the queue
- puts "New/untimed job: #{j}" if timings[j].nil?
+ puts "Untimed job: #{j}" if timings[j].nil?
+
+ # HEURISTIC: put jobs without previous timings (e.g. a newly added
+ # spec file) in the middle of the queue
  h[j] = timings[j] || default_timing
  end

- # finally, sort them based on their timing (slowest first)
+ # sort jobs based on their timings (slowest to be processed first)
  jobs = jobs.sort_by { |_j, t| -t }.map(&:first)

  puts "Published queue (size=#{queue.publish(jobs)})"
  end
 
+ private
+
+ def reset_rspec_state!
+ RSpec.clear_examples
+
+ # see https://github.com/rspec/rspec-core/pull/2723
+ if Gem::Version.new(RSpec::Core::Version::STRING) <= Gem::Version.new("3.9.1")
+ RSpec.world.instance_variable_set(
+ :@example_group_counts_by_spec_file, Hash.new(0))
+ end
+
+ # RSpec.clear_examples does not reset those, which causes issues when
+ # a non-example error occurs (subsequent jobs are not executed)
+ # TODO: upstream
+ RSpec.world.non_example_failure = false
+
+ # we don't want an error that occured outside of the examples (which
+ # would set this to `true`) to stop the worker
+ RSpec.world.wants_to_quit = false
+ end
+
  # NOTE: RSpec has to load the files before we can split them as individual
  # examples. In case a file to be splitted fails to be loaded
- # (e.g. contains a syntax error), we return the slow files unchanged,
- # thereby falling back to scheduling them normally.
- #
- # Their errors will be reported in the normal flow, when they're picked up
- # as jobs by a worker.
+ # (e.g. contains a syntax error), we return the files unchanged, thereby
+ # falling back to scheduling them as whole files. Their errors will be
+ # reported in the normal flow when they're eventually picked up by a worker.
  def files_to_example_ids(files)
- # TODO: do this programatically
- cmd = "DISABLE_SPRING=1 bin/rspec --dry-run --format json #{files.join(' ')}"
+ cmd = "DISABLE_SPRING=1 bundle exec rspec --dry-run --format json #{files.join(' ')} 2>&1"
  out = `#{cmd}`
+ cmd_result = $?

- if !$?.success?
- # TODO: emit warning to Sentry
- puts "WARNING: Error splitting slow files; falling back to regular scheduling:"
+ if !cmd_result.success?
+ rspec_output = begin
+ JSON.parse(out)
+ rescue JSON::ParserError
+ out
+ end

- begin
- pp JSON.parse(out)
- rescue JSON::ParserError
- puts out
- end
- puts
+ log_event(
+ "Failed to split slow files, falling back to regular scheduling",
+ "error",
+ rspec_output: rspec_output,
+ cmd_result: cmd_result.inspect,
+ )
+
+ pp rspec_output

  return files
  end
@@ -181,5 +212,23 @@ module RSpecQ
  def elapsed(since)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - since
  end
+
+ # Prints msg to standard output and emits an event to Sentry, if the
+ # SENTRY_DSN environment variable is set.
+ def log_event(msg, level, additional={})
+ puts msg
+
+ Raven.capture_message(msg, level: level, extra: {
+ build: @build_id,
+ worker: @worker_id,
+ queue: queue.inspect,
+ files_or_dirs_to_run: files_or_dirs_to_run,
+ populate_timings: populate_timings,
+ file_split_threshold: file_split_threshold,
+ heartbeat_updated_at: @heartbeat_updated_at,
+ object: self.inspect,
+ pid: Process.pid,
+ }.merge(additional))
+ end
  end
  end
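
The ordering heuristic in try_publish_queue! above can be tried out on plain data. What follows is a minimal sketch, not code from the gem: order_jobs is a hypothetical helper, and it assumes timings is a non-empty Hash already ordered slowest-first, which is what the middle-element lookup in the diff appears to rely on.

```ruby
# Hypothetical helper (not part of RSpecQ) that mirrors the ordering steps
# shown in the diff: untimed jobs get the middle timing, then all jobs are
# sorted slowest-first.
def order_jobs(jobs, timings)
  # ASSUMPTION: timings is non-empty and ordered slowest-first, so the
  # middle value approximates the median duration.
  values = timings.values
  default_timing = values[values.size / 2]

  # Assign a timing to every job; jobs without a previous timing
  # (e.g. a newly added spec file) fall back to the default.
  with_timings = jobs.each_with_object({}) do |job, h|
    h[job] = timings[job] || default_timing
  end

  # Slowest jobs come first so that they are picked up early.
  with_timings.sort_by { |_job, t| -t }.map(&:first)
end

timings = {
  "spec/slow_spec.rb" => 30.0,
  "spec/mid_spec.rb"  => 10.0,
  "spec/fast_spec.rb" => 1.0
}
jobs = %w[spec/fast_spec.rb spec/new_spec.rb spec/slow_spec.rb spec/mid_spec.rb]

puts order_jobs(jobs, timings)
```

Here spec/new_spec.rb has no recorded timing, so it inherits the middle value (10.0) and lands between the slowest and the fastest file. Its position relative to spec/mid_spec.rb is not guaranteed, because Ruby's sort_by is not stable for equal keys.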
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: rspecq
  version: !ruby/object:Gem::Version
- version: 0.0.1.pre2
+ version: 0.1.0
  platform: ruby
  authors:
  - Agis Anastasopoulos
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2020-06-26 00:00:00.000000000 Z
+ date: 2020-08-27 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rspec-core
@@ -38,22 +38,64 @@ dependencies:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
+ - !ruby/object:Gem::Dependency
+ name: sentry-raven
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ - !ruby/object:Gem::Dependency
+ name: rake
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ - !ruby/object:Gem::Dependency
+ name: pry-byebug
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
  - !ruby/object:Gem::Dependency
  name: minitest
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - "~>"
+ - - ">="
  - !ruby/object:Gem::Version
- version: '5.14'
+ version: '0'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - "~>"
+ - - ">="
  - !ruby/object:Gem::Version
- version: '5.14'
+ version: '0'
  - !ruby/object:Gem::Dependency
- name: rake
+ name: rspec
  requirement: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
@@ -76,6 +118,7 @@ files:
  - CHANGELOG.md
  - LICENSE
  - README.md
+ - Rakefile
  - bin/rspecq
  - lib/rspecq.rb
  - lib/rspecq/formatters/example_count_recorder.rb
@@ -101,12 +144,13 @@ required_ruby_version: !ruby/object:Gem::Requirement
  version: '0'
  required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
- - - ">"
+ - - ">="
  - !ruby/object:Gem::Version
- version: 1.3.1
+ version: '0'
  requirements: []
- rubygems_version: 3.1.2
+ rubygems_version: 3.1.4
  signing_key:
  specification_version: 4
- summary: Distribute an RSpec suite among many workers
+ summary: Optimally distribute and run RSpec suites among parallel workers; for faster
+ CI builds
  test_files: []