super_spreader 0.1.0.beta2
- checksums.yaml +7 -0
- data/.circleci/config.yml +145 -0
- data/.github/CODEOWNERS +3 -0
- data/.gitignore +15 -0
- data/.rspec +3 -0
- data/.ruby-version +1 -0
- data/.travis.yml +7 -0
- data/CHANGELOG.md +6 -0
- data/CONTRIBUTING.md +30 -0
- data/Gemfile +6 -0
- data/Gemfile.lock +196 -0
- data/Guardfile +18 -0
- data/LICENSE.txt +13 -0
- data/README.md +240 -0
- data/Rakefile +35 -0
- data/bin/console +14 -0
- data/bin/setup +7 -0
- data/lib/super_spreader/batch_helper.rb +40 -0
- data/lib/super_spreader/peak_schedule.rb +22 -0
- data/lib/super_spreader/redis_model.rb +51 -0
- data/lib/super_spreader/scheduler_config.rb +79 -0
- data/lib/super_spreader/scheduler_job.rb +43 -0
- data/lib/super_spreader/spread_tracker.rb +39 -0
- data/lib/super_spreader/spreader.rb +61 -0
- data/lib/super_spreader/stop_signal.rb +29 -0
- data/lib/super_spreader/version.rb +5 -0
- data/lib/super_spreader.rb +20 -0
- data/super_spreader.gemspec +46 -0
- metadata +341 -0
data/README.md
ADDED
@@ -0,0 +1,240 @@
# SuperSpreader

SuperSpreader is a library for massive, memory- and compute-efficient backfills of ActiveRecord models using ActiveJob.

This tool is built to backfill many millions of records in a resource-efficient way. When paired with a properly written job, it can drastically reduce the wall time of a backfill through parallelization. Jobs are enqueued in small batches so that the ActiveJob backend is not overwhelmed. These jobs can also be stopped at a moment's notice, if needed.

## Example use cases

- Re-encrypt data
- Make API calls to fill in missing data
- Restructure complex data

## Warnings

> [!WARNING]
>
> **Please be aware:** SuperSpreader is still fairly early in development. While it can be used effectively by experienced hands, we are aware that it could have a better developer experience (DevX). It was written to solve a specific problem (see "History"). We are working to generalize the tool as the need arises. Pull requests are welcome!

Please also see "Roadmap" for other known limitations that may be relevant to you.

## History

SuperSpreader was originally written to re-encrypt the Dialer database, a key component of Doximity's telehealth offerings. Without SuperSpreader, it would have taken **several months** to handle many millions of records using a Key Management Service (KMS) that adds an overhead of 11 ms per record. Using SuperSpreader brought the backfill down to a couple of weeks, and it ran safely during very high Dialer usage in the winter of 2020. Of course, the name came from the coronavirus pandemic, which had a number of super-spreader events in the news around the same time. Rather than spreading disease, the SuperSpreader gem spreads out telehealth background jobs to support the healthcare professionals who fight disease.

Since then, our team has started to use SuperSpreader in many other situations. Our hope is that other teams, internal and external, can use it if they have similar problems to solve.

## When should I use it?

SuperSpreader was built for backfills. If you need to touch every record and you have _a lot_ of records, it may be a good fit.

That said, it's **not** common to need a tool like SuperSpreader. Many backfills are better handled through SQL or Rake tasks. SuperSpreader should only be used when the additional complexity is warranted. Before reaching for a shiny tool, **please stop and consider the tradeoffs**.

For some use cases, a pure-SQL migration or Rake task may be a better fit. It may also make sense to use **both** background jobs and a foreground task (a Rails migration or Rake task); consider the database size and number of instances in production versus staging. For that, you might consider [SuperSpreader::BatchHelper](https://github.com/doximity/super_spreader/blob/master/lib/super_spreader/batch_helper.rb), as sketched below.
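`BatchHelper#batch_execute` looks up the table's minimum and maximum `id`, then yields successive id ranges and executes whatever SQL the block returns. A minimal sketch of a migration using it follows; the migration class, table, column, and step size are illustrative, not part of the gem:

```ruby
# Hypothetical migration; table/column names and step size are examples only.
require "super_spreader/batch_helper"

class BackfillExampleFlag < ActiveRecord::Migration[6.1]
  include SuperSpreader::BatchHelper

  # Recommended alongside batch_execute; see the BatchHelper docs.
  disable_ddl_transaction!

  def up
    batch_execute(table_name: "examples", step_size: 10_000) do |min_id, max_id|
      # Return the SQL to run for this id range.
      <<~SQL
        UPDATE examples
        SET example_flag = TRUE
        WHERE id BETWEEN #{min_id} AND #{max_id}
      SQL
    end
  end
end
```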
The primary criterion to consider is whether the backfill in question is _long-running_. If you estimate it would take at least a couple of days to complete, it makes sense to consider SuperSpreader. Another good reason to consider this tool is _code reuse_. If you already have Ruby-land code that would be difficult or impossible to replicate in SQL, it makes sense to use SuperSpreader, assuming the equivalent Rake task would be impractical.

## How does it work?

SuperSpreader enqueues a configurable number of background jobs on a set schedule. These background jobs are executed in small batches so that only a small number of jobs are enqueued at any given time. The jobs start at the most recent record and work back to the first record, based on the auto-incrementing primary key.

The configuration can be tuned to the needs of an individual problem. If the backfill would require months of compute time, it can be run in parallel so that it takes much less time. The resource utilization can be spread out so that shared resources, such as a database, are not overwhelmed with requests. Finally, there is also support for running more jobs during off-peak usage based on a schedule.

Backfills are implemented as ActiveJob classes. SuperSpreader orchestrates running those jobs. Each set of jobs is enqueued by a scheduler using the supplied configuration.

As an example, assume there is a table with 100,000,000 rows that need Ruby-land logic applied using `ExampleBackfillJob`. The rate (that is, how many jobs per second) is configurable. Once configured, SuperSpreader would enqueue jobs in batches like:

    ExampleBackfillJob run_at: "2020-11-16T22:51:59Z", begin_id: 99_999_901, end_id: 100_000_000
    ExampleBackfillJob run_at: "2020-11-16T22:51:59Z", begin_id: 99_999_801, end_id: 99_999_900
    ExampleBackfillJob run_at: "2020-11-16T22:51:59Z", begin_id: 99_999_701, end_id: 99_999_800
    ExampleBackfillJob run_at: "2020-11-16T22:52:00Z", begin_id: 99_999_601, end_id: 99_999_700
    ExampleBackfillJob run_at: "2020-11-16T22:52:00Z", begin_id: 99_999_501, end_id: 99_999_600
    ExampleBackfillJob run_at: "2020-11-16T22:52:00Z", begin_id: 99_999_401, end_id: 99_999_500

Notice that there are 3 jobs per second, 2 seconds of work were enqueued, and the batch size is 100. Again, this is just an example for illustration, and the configuration can be modified to suit the needs of the problem.
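In other words, each scheduling cycle enqueues 3 jobs/second × 2 seconds = 6 jobs, which together cover 6 × 100 = 600 records.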
After that work runs out, SuperSpreader schedules itself to enqueue more:

    SuperSpreader::SchedulerJob run_at: "2020-11-16T22:52:01Z"

And the work continues:

    ExampleBackfillJob run_at: "2020-11-16T22:52:01Z", begin_id: 99_999_301, end_id: 99_999_400
    ExampleBackfillJob run_at: "2020-11-16T22:52:01Z", begin_id: 99_999_201, end_id: 99_999_300
    ExampleBackfillJob run_at: "2020-11-16T22:52:01Z", begin_id: 99_999_101, end_id: 99_999_200
    ExampleBackfillJob run_at: "2020-11-16T22:52:02Z", begin_id: 99_999_001, end_id: 99_999_100
    ExampleBackfillJob run_at: "2020-11-16T22:52:02Z", begin_id: 99_998_901, end_id: 99_999_000
    ExampleBackfillJob run_at: "2020-11-16T22:52:02Z", begin_id: 99_998_801, end_id: 99_998_900

This process continues until there is no more work to be done. For more detail, please see [Spreader](https://github.com/doximity/super_spreader/blob/master/lib/super_spreader/spreader.rb) and [its spec](https://github.com/doximity/super_spreader/blob/master/spec/spreader_spec.rb).
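The batch plan itself comes from `Spreader#spread`, which only computes `run_at`/`begin_id`/`end_id` tuples; it does not enqueue anything or touch Redis. As a rough sketch of the calculation behind the listing above (`ExampleBackfillJob` and `ExampleModel` are placeholder names):

```ruby
# Sketch only; ExampleBackfillJob and ExampleModel are placeholders.
spreader = SuperSpreader::Spreader.new(ExampleBackfillJob, ExampleModel)

# Plan 2 seconds of work at 3 jobs/second, 100 records per job,
# starting from id 100_000_000 and working downward.
batches = spreader.spread(
  batch_size: 100,
  duration: 2,
  per_second: 3.0,
  initial_id: 100_000_000
)

batches.size  # => 6
batches.first # => {run_at: <begin_at>, begin_id: 99_999_901, end_id: 100_000_000}
```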
Additionally, the configuration can be tuned while SuperSpreader is running, because it is re-read each time `SchedulerJob` runs. As it stands, each run of SuperSpreader is hand-tuned, so it is highly recommended that resource utilization be monitored during runs. That said, SuperSpreader is designed to run autonomously once a good configuration is found.

Example tuning:

- Does the process need to go faster? Increase the number of jobs per second.
- Are batches taking too long to complete? Decrease the batch size.
- Is `SchedulerJob` taking a long time to complete? Decrease the duration so that less work is enqueued in each cycle.

Finally, SuperSpreader can be stopped instantly and resumed at a later time, if the need ever arises.

## How do I use it?

To repeat an earlier disclaimer:

> **Please be aware:** SuperSpreader is still fairly early in development. While it can be used by experienced hands, we are aware that it could have a better developer experience (DevX). It was written to solve a specific problem (see "History"). We are working to generalize the tool as the need arises. Pull requests are welcome!

If you haven't yet, please read the "How does it work?" section. The basic workflow is tested in `spec/integration/backfill_spec.rb`.

First, write a backfill job, along the lines of the sketch below. Please see [this example for details](https://github.com/doximity/super_spreader/blob/master/spec/support/example_backfill_job.rb).
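The linked example is not reproduced here, but the essential shape is an ordinary ActiveJob class that extends `SuperSpreader::StopSignal`, exposes the model it walks via `super_spreader_model_class`, returns early when stopped, and processes the records between `begin_id` and `end_id`. A rough sketch, with `ExampleModel` and the per-record work standing in for your own:

```ruby
# Sketch only; ExampleModel and the per-record work are placeholders.
class ExampleBackfillJob < ActiveJob::Base
  extend SuperSpreader::StopSignal

  # Tells the scheduler which model's ids to walk.
  def self.super_spreader_model_class
    ExampleModel
  end

  def perform(begin_id, end_id)
    # When stopped, return immediately so already-enqueued jobs drain harmlessly.
    return if self.class.stopped?

    ExampleModel.where(id: begin_id..end_id).find_each do |record|
      # Per-record backfill work goes here.
    end
  end
end
```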
Next, configure `SuperSpreader` from the console by saving a `SchedulerConfig` to Redis. For documentation on each attribute, please see [SchedulerConfig](https://github.com/doximity/super_spreader/blob/master/lib/super_spreader/scheduler_config.rb). It is recommended that you start slow, with small batches, short durations, and low per-second rates.

**Important:** SuperSpreader currently only supports a _single_ configuration, though removing that limitation is on our Roadmap (please see below).

```ruby
# NOTE: This is an example. You should take your situation into account when
# setting these values.
config = SuperSpreader::SchedulerConfig.new

config.batch_size = 10
config.duration = 10
config.job_class_name = "ExampleBackfillJob"

config.per_second_on_peak = 3.0
config.per_second_off_peak = 3.0

config.on_peak_timezone = "America/Los_Angeles"
config.on_peak_wday_begin = 1
config.on_peak_wday_end = 5
config.on_peak_hour_begin = 5
config.on_peak_hour_end = 17

config.save
```

Now the `SchedulerJob` can be started. It will run until it is stopped or runs out of work.

```ruby
SuperSpreader::SchedulerJob.perform_now
```

At this point, monitor your database and worker instances using the tooling you have available, and make adjustments based on the metrics you see.

Based on those metrics, slowly step up `per_second_on_peak` and `batch_size` while continuing to monitor:

```ruby
config.batch_size = 20
config.save
```

```ruby
config.per_second_on_peak = 4.0
config.save
```

Continue to step up the rates until you arrive at a rate that is acceptable for your situation. As an example, for our re-encryption project, our jobs ran at this rate:

```ruby
# NOTE: This is an example. You should take your situation into account when
# setting these values.
config = SuperSpreader::SchedulerConfig.new

config.batch_size = 70
config.duration = 180
config.job_class_name = "ReencryptJob"

config.per_second_on_peak = 3.0
config.per_second_off_peak = 7.5

config.on_peak_timezone = "America/Los_Angeles"
config.on_peak_wday_begin = 1
config.on_peak_wday_end = 5
config.on_peak_hour_begin = 5
config.on_peak_hour_end = 17

config.save
```

### Disaster recovery

If at any point you need to stop the background jobs, stop all scheduling using:

```ruby
SuperSpreader::SchedulerJob.stop!
```

Optionally, if it is acceptable to have a partially-processed cycle, you can stop the backfill jobs as well:

```ruby
ExampleBackfillJob.stop!
```

(Recovering from a partially-processed cycle requires manually setting the correct `initial_id` in `SpreadTracker`, as sketched below.)
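`SpreadTracker` keeps the next id to process (per job/model pair) in Redis, so a partially-processed cycle can be rewound by hand. A hedged sketch follows; the id value and `ExampleModel` are illustrative:

```ruby
# Sketch only; choose the id based on what your jobs actually completed.
tracker = SuperSpreader::SpreadTracker.new(ExampleBackfillJob, ExampleModel)
tracker.initial_id = 99_999_400

# Setting it to nil removes the override, so the next run falls back to the
# model's maximum id.
# tracker.initial_id = nil
```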
The jobs will still be present in the job runner, but they will all execute instantly because of the early return demonstrated in [the example job](https://github.com/doximity/super_spreader/blob/master/spec/support/example_backfill_job.rb). After the last scheduler job runs, the process will be paused.

### Restarting

If you stop the jobs but wish to restart them later, call `go!` and *then* `SuperSpreader::SchedulerJob.perform_now`. Otherwise, the jobs will not do any work.

```ruby
ExampleBackfillJob.go!
SuperSpreader::SchedulerJob.go!
SuperSpreader::SchedulerJob.perform_now
```

## Installation

If you've gotten this far and think SuperSpreader is a good fit for your problem, here is how to install it.

Add this line to your application's Gemfile:

```ruby
gem 'super_spreader'
```

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install super_spreader

SuperSpreader requires an ActiveRecord-compatible database, an ActiveJob-compatible job runner, and Redis for bookkeeping.

For Rails, please set up SuperSpreader using an initializer:

```ruby
# config/initializers/super_spreader.rb

SuperSpreader.logger = Rails.logger
SuperSpreader.redis = Redis.new(url: ENV["REDIS_URL"])
```

## Roadmap

Please see [the Milestones on GitHub](https://github.com/doximity/super_spreader/milestones?direction=asc&sort=title&state=open).

## Development

You'll need [Redis](https://redis.io/docs/getting-started/) and [Ruby](https://www.ruby-lang.org/en/downloads/) installed. Please ensure both are set up before continuing.

After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).

## Contributing

1. See [CONTRIBUTING.md](./CONTRIBUTING.md)
2. Fork it ( https://github.com/doximity/super_spreader/fork )
3. Create your feature branch (`git checkout -b my-new-feature`)
4. Commit your changes (`git commit -am 'Add some feature'`)
5. Push to the branch (`git push origin my-new-feature`)
6. Create a new Pull Request

## License

`super_spreader` is licensed under an Apache 2 license. Contributors are required to sign a contributor license agreement. See LICENSE.txt and CONTRIBUTING.md for more information.
data/Rakefile
ADDED
@@ -0,0 +1,35 @@
require "bundler/gem_tasks"
require "rspec/core/rake_task"
require "yard"

RSpec::Core::RakeTask.new(:spec)
YARD::Rake::YardocTask.new(:doc)

task default: :spec

desc "Run a REPL with access to this library"
task :console do
  sh("irb -I lib -r super_spreader")
end

namespace :check do
  desc "Run all checks"
  task all: %i[redis]

  desc "Confirm Redis is accessible"
  task :redis do
    require "redis"

    inaccessible_error_message = "Redis: inaccessible (please confirm that a Redis server is installed and running, e.g. redis-server)"

    redis = Redis.new(url: ENV["REDIS_URL"])

    if redis.ping == "PONG"
      puts "Redis: OK"
    else
      raise inaccessible_error_message
    end
  rescue Redis::CannotConnectError
    raise inaccessible_error_message
  end
end
data/bin/console
ADDED
@@ -0,0 +1,14 @@
#!/usr/bin/env ruby

require "bundler/setup"
require "super_spreader"

# You can add fixtures and/or initialization code here to make experimenting
# with your gem easier. You can also use a different console, if you like.

# (If you use this, don't forget to add pry to your Gemfile!)
# require "pry"
# Pry.start

require "irb"
IRB.start(__FILE__)
data/lib/super_spreader/batch_helper.rb
ADDED
@@ -0,0 +1,40 @@
# frozen_string_literal: true

require "active_record"

module SuperSpreader
  # Methods in this module are suitable for use in Rails migrations. It is
  # expected that their interface will remain stable. If breaking changes are
  # introduced, a new module will be introduced so existing migrations will not
  # be affected.
  module BatchHelper
    # Execute SQL in small batches for an entire table.
    #
    # It is assumed that the table has a primary key named +id+.
    #
    # Recommendation for migrations: Use this in combination with +disable_ddl_transaction!+.
    # See also: https://github.com/ankane/strong_migrations#backfilling-data
    #
    # @param table_name [String] the name of the table
    # @param step_size [Integer] how many records to process in each batch
    # @yield [minimum_id, maximum_id] block that returns SQL to migrate records between minimum_id and maximum_id
    def batch_execute(table_name:, step_size:)
      result = execute(<<~SQL).to_a.flatten
        SELECT MIN(id) AS min_id, MAX(id) AS max_id FROM #{quote_table_name(table_name)}
      SQL
      min_id = result[0]["min_id"]
      max_id = result[0]["max_id"]
      return unless min_id && max_id

      lower_id = min_id
      loop do
        sql = yield(lower_id, lower_id + step_size)

        execute(sql)

        lower_id += step_size
        break if lower_id > max_id
      end
    end
  end
end
data/lib/super_spreader/peak_schedule.rb
ADDED
@@ -0,0 +1,22 @@
# frozen_string_literal: true

require "active_support/core_ext/time"

module SuperSpreader
  class PeakSchedule
    def initialize(on_peak_wday_range:, on_peak_hour_range:, timezone:)
      @on_peak_wday_range = on_peak_wday_range
      @on_peak_hour_range = on_peak_hour_range
      @timezone = timezone
    end

    def on_peak?(time = Time.current)
      time_in_zone = time.in_time_zone(@timezone)

      is_on_peak_day = @on_peak_wday_range.cover?(time_in_zone.wday)
      is_on_peak_hour = @on_peak_hour_range.cover?(time_in_zone.hour)

      is_on_peak_day && is_on_peak_hour
    end
  end
end
data/lib/super_spreader/redis_model.rb
ADDED
@@ -0,0 +1,51 @@
# frozen_string_literal: true

require "active_model"
require "redis"

module SuperSpreader
  class RedisModel
    include ActiveModel::Model
    include ActiveModel::Attributes
    include ActiveModel::Serialization

    def initialize(values = default_values)
      super
    end

    def default_values
      redis.hgetall(redis_key)
    end

    def persisted?
      redis.get(redis_key).present?
    end

    def delete
      redis.del(redis_key)
    end

    def save
      redis.multi do |pipeline|
        pipeline.del(redis_key)

        serializable_hash.each do |key, value|
          pipeline.hset(redis_key, key, value)
        end
      end
    end

    # Primarily for factory_bot
    alias_method :save!, :save

    private

    def redis_key
      self.class.name
    end

    def redis
      SuperSpreader.redis
    end
  end
end
data/lib/super_spreader/scheduler_config.rb
ADDED
@@ -0,0 +1,79 @@
# frozen_string_literal: true

require "super_spreader/peak_schedule"
require "super_spreader/redis_model"

module SuperSpreader
  class SchedulerConfig < RedisModel
    # The job class to enqueue on each run of the scheduler.
    attribute :job_class_name, :string
    # The number of records to process in each invocation of the job class.
    attribute :batch_size, :integer
    # The amount of work to enqueue, in seconds.
    attribute :duration, :integer

    # The number of jobs to enqueue per second, allowing for fractional amounts
    # such as 1 job every other second using `0.5`.
    attribute :per_second_on_peak, :float
    # The same as per_second_on_peak, but for times that are not identified as
    # on-peak.
    attribute :per_second_off_peak, :float

    # This section manages the definition of "on peak." Compare this
    # terminology to bus or train schedules.

    # The timezone to use for time calculations.
    #
    # Example: "America/Los_Angeles" for Pacific time
    attribute :on_peak_timezone, :string
    # The 24-hour hour on which on-peak application usage starts.
    #
    # Example: 5 for 5 AM
    attribute :on_peak_hour_begin, :integer
    # The 24-hour hour on which on-peak application usage ends.
    #
    # Example: 17 for 5 PM
    attribute :on_peak_hour_end, :integer
    # The wday value on which on-peak application usage starts.
    #
    # Example: 1 for Monday
    attribute :on_peak_wday_begin, :integer
    # The wday value on which on-peak application usage ends.
    #
    # Example: 5 for Friday
    attribute :on_peak_wday_end, :integer

    attr_writer :schedule

    def job_class
      job_class_name.constantize
    end

    def super_spreader_config
      [job_class, job_class.super_spreader_model_class]
    end

    def spread_options
      {
        batch_size: batch_size,
        duration: duration,
        per_second: per_second
      }
    end

    def per_second
      schedule.on_peak? ? per_second_on_peak : per_second_off_peak
    end

    private

    def schedule
      @schedule ||=
        PeakSchedule.new(
          on_peak_wday_range: on_peak_wday_begin..on_peak_wday_end,
          on_peak_hour_range: on_peak_hour_begin..on_peak_hour_end,
          timezone: on_peak_timezone
        )
    end
  end
end
data/lib/super_spreader/scheduler_job.rb
ADDED
@@ -0,0 +1,43 @@
# frozen_string_literal: true

require "active_job"
require "json"
require "super_spreader/scheduler_config"
require "super_spreader/spreader"
require "super_spreader/stop_signal"

module SuperSpreader
  class SchedulerJob < ActiveJob::Base
    extend StopSignal

    def perform
      return if self.class.stopped?

      log(started_at: Time.current.iso8601)
      log(config.serializable_hash)

      super_spreader = Spreader.new(*config.super_spreader_config)
      next_id = super_spreader.enqueue_spread(**config.spread_options)
      log(next_id: next_id)

      return if next_id.zero?

      self.class.set(wait_until: next_run_at).perform_later
      log(next_run_at: next_run_at.iso8601)
    end

    def next_run_at
      config.duration.seconds.from_now
    end

    def config
      @config ||= SchedulerConfig.new
    end

    private

    def log(hash)
      SuperSpreader.logger.info({subject: self.class.name}.merge(hash).to_json)
    end
  end
end
data/lib/super_spreader/spread_tracker.rb
ADDED
@@ -0,0 +1,39 @@
# frozen_string_literal: true

require "active_record"
require "redis"

module SuperSpreader
  class SpreadTracker
    def initialize(job_class, model_class)
      @job_class = job_class
      @model_class = model_class
    end

    def initial_id
      redis_value = redis.hget(initial_id_key, @model_class.name)

      value = redis_value || @model_class.maximum(:id)

      value.to_i
    end

    def initial_id=(value)
      if value.nil?
        redis.hdel(initial_id_key, @model_class.name)
      else
        redis.hset(initial_id_key, @model_class.name, value)
      end
    end

    private

    def redis
      SuperSpreader.redis
    end

    def initial_id_key
      "#{@job_class.name}:initial_id"
    end
  end
end
data/lib/super_spreader/spreader.rb
ADDED
@@ -0,0 +1,61 @@
# frozen_string_literal: true

require "super_spreader/spread_tracker"

module SuperSpreader
  class Spreader
    def initialize(job_class, model_class, spread_tracker: nil)
      @job_class = job_class
      @model_class = model_class
      @spread_tracker = spread_tracker || SpreadTracker.new(job_class, model_class)
    end

    def spread(batch_size:, duration:, per_second:, initial_id:, begin_at: Time.now.utc)
      end_id = initial_id
      segment_duration = 1.0 / per_second
      time_index = 0.0
      batches = []

      while time_index < duration
        break if end_id <= 0

        # Use floor to prevent subsecond times
        run_at = begin_at + time_index.floor
        begin_id = clamp(end_id - batch_size + 1)
        batches << {run_at: run_at, begin_id: begin_id, end_id: end_id}

        break if begin_id == 1

        end_id = begin_id - 1
        time_index += segment_duration
      end

      batches
    end

    def enqueue_spread(**opts)
      initial_id = @spread_tracker.initial_id
      return 0 if initial_id.zero?

      batches = spread(**opts.merge(initial_id: initial_id))

      batches.each do |batch|
        @job_class
          .set(wait_until: batch[:run_at])
          .perform_later(batch[:begin_id], batch[:end_id])
      end

      last_begin_id = batches.last[:begin_id]
      next_id = last_begin_id - 1
      @spread_tracker.initial_id = next_id

      next_id
    end

    private

    def clamp(value)
      (value <= 0) ? 1 : value
    end
  end
end
data/lib/super_spreader/stop_signal.rb
ADDED
@@ -0,0 +1,29 @@
# frozen_string_literal: true

require "redis"

module SuperSpreader
  module StopSignal
    def stop!
      redis.set(stop_key, true)
    end

    def go!
      redis.del(stop_key)
    end

    def stopped?
      redis.exists(stop_key).positive?
    end

    private

    def redis
      SuperSpreader.redis
    end

    def stop_key
      "#{name}:stop"
    end
  end
end