super_spreader 0.1.0 → 0.2.0

data/README.md ADDED
@@ -0,0 +1,240 @@
+ # SuperSpreader
+
+ SuperSpreader is a library for massive, memory- and compute-efficient backfills of ActiveRecord models using ActiveJob.
+
+ This tool is built to backfill many millions of records in a resource-efficient way. When paired with a properly written job, it can drastically reduce the wall time of a backfill through parallelization. Jobs are enqueued in small batches so that the ActiveJob backend is not overwhelmed. These jobs can also be stopped at a moment's notice, if needed.
+
+ ## Example use cases
+
+ - Re-encrypt data
+ - Make API calls to fill in missing data
+ - Restructure complex data
+
+ ## Warnings
+
+ > [!WARNING]
+ >
+ > **Please be aware:** SuperSpreader is still fairly early in development. While it can be used effectively by experienced hands, we are aware that it could have a better developer experience (DevX). It was written to solve a specific problem (see "History"). We are working to generalize the tool as the need arises. Pull requests are welcome!
+
+ Please also see "Roadmap" for other known limitations that may be relevant to you.
+
+ ## History
+
+ SuperSpreader was originally written to re-encrypt the Dialer database, a key component of Doximity's telehealth offerings. Without SuperSpreader, it would have taken **several months** to handle many millions of records using a Key Management Service (KMS) that adds an overhead of 11 ms per record. Using SuperSpreader brought the backfill down to a couple of weeks. This massive backfill happened safely during very high Dialer usage in the winter of 2020. Of course, the name came from the coronavirus pandemic, which had a number of super-spreader events in the news around the same time. Rather than spreading disease, the SuperSpreader gem spreads out telehealth background jobs to support the healthcare professionals who fight disease.
+
+ Since that time, our team has started to use SuperSpreader in many other situations. Our hope is that other teams, internal and external, can use it if they have similar problems to solve.
+
+ ## When should I use it?
+
+ SuperSpreader was built for backfills. If you need to touch every record and you have _a lot_ of records, it may be a good fit.
+
+ That said, it's **not** common to need a tool like SuperSpreader. Many backfills are better handled through SQL or Rake tasks. SuperSpreader should only be used when the additional complexity is warranted. Before reaching for a shiny tool, **please stop and consider the tradeoffs**.
+
+ For some use cases, a pure-SQL migration or Rake task may be a better fit. It may also make sense to use **both** background jobs and a foreground task (a Rails migration or Rake task); consider the difference in database size and number of instances between production and staging. For that, you might consider [SuperSpreader::BatchHelper](https://github.com/doximity/super_spreader/blob/master/lib/super_spreader/batch_helper.rb).
+
+ The primary criterion to consider is whether the backfill in question is _long-running_. If you estimate it would take at least a couple of days to complete, it makes sense to consider SuperSpreader. Another good reason to consider this tool is _code reuse_. If you already have Ruby-land code that would be difficult or impossible to replicate in SQL, it makes sense to use SuperSpreader, assuming the equivalent Rake task would be impractical.
+
+ ## How does it work?
+
+ SuperSpreader enqueues a configurable number of background jobs on a set schedule. These background jobs are executed in small batches such that only a small number of jobs are enqueued at any given time. The jobs start at the most recent record and work back to the first record, based on the auto-incrementing primary key.
+
+ The configuration can be tuned to the needs of an individual problem. If the backfill would require months of compute time, it can be run in parallel so that it takes much less time. The resource utilization can be spread out so that shared resources, such as a database, are not overwhelmed with requests. Finally, there is also support for running more jobs during off-peak usage based on a schedule.
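
The on-/off-peak decision can be sketched in plain Ruby (a simplified mirror of the gem's `PeakSchedule`): a time is on-peak when both its weekday and its hour fall inside the configured ranges. The ranges below are the illustrative values used later in this README, and plain UTC `Time` stands in for the gem's timezone conversion.

```ruby
# A time is on-peak when its weekday AND its hour fall in the configured
# ranges (simplified; the gem converts to a configured timezone first).
ON_PEAK_WDAYS = 1..5   # Monday..Friday
ON_PEAK_HOURS = 5..17  # 5 AM..5 PM

def on_peak?(time)
  ON_PEAK_WDAYS.cover?(time.wday) && ON_PEAK_HOURS.cover?(time.hour)
end

on_peak?(Time.utc(2020, 11, 16, 12)) # => true (a Monday at noon)
on_peak?(Time.utc(2020, 11, 14, 12)) # => false (a Saturday)
```

The scheduler uses this check to pick between `per_second_on_peak` and `per_second_off_peak` each cycle.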
+
+ Backfills are implemented using ActiveJob classes. SuperSpreader orchestrates running those jobs. Each set of jobs is enqueued by a scheduler using the supplied configuration.
+
+ As an example, assume that there's a table with 100,000,000 rows that need Ruby-land logic applied using `ExampleBackfillJob`. The rate (e.g., how many jobs per second) is configurable. Once configured, SuperSpreader would enqueue jobs in batches like:
+
+     ExampleBackfillJob run_at: "2020-11-16T22:51:59Z", begin_id: 99_999_901, end_id: 100_000_000
+     ExampleBackfillJob run_at: "2020-11-16T22:51:59Z", begin_id: 99_999_801, end_id: 99_999_900
+     ExampleBackfillJob run_at: "2020-11-16T22:51:59Z", begin_id: 99_999_701, end_id: 99_999_800
+     ExampleBackfillJob run_at: "2020-11-16T22:52:00Z", begin_id: 99_999_601, end_id: 99_999_700
+     ExampleBackfillJob run_at: "2020-11-16T22:52:00Z", begin_id: 99_999_501, end_id: 99_999_600
+     ExampleBackfillJob run_at: "2020-11-16T22:52:00Z", begin_id: 99_999_401, end_id: 99_999_500
+
+ Notice that there are 3 jobs per second, 2 seconds of work were enqueued, and the batch size is 100. Again, this is just an example for illustration; the configuration can be modified to suit the needs of the problem.
+
+ After running out of work to enqueue, SuperSpreader schedules its next cycle:
+
+     SuperSpreader::SchedulerJob run_at: "2020-11-16T22:52:01Z"
+
+ And the work continues:
+
+     ExampleBackfillJob run_at: "2020-11-16T22:52:01Z", begin_id: 99_999_301, end_id: 99_999_400
+     ExampleBackfillJob run_at: "2020-11-16T22:52:01Z", begin_id: 99_999_201, end_id: 99_999_300
+     ExampleBackfillJob run_at: "2020-11-16T22:52:01Z", begin_id: 99_999_101, end_id: 99_999_200
+     ExampleBackfillJob run_at: "2020-11-16T22:52:02Z", begin_id: 99_999_001, end_id: 99_999_100
+     ExampleBackfillJob run_at: "2020-11-16T22:52:02Z", begin_id: 99_998_901, end_id: 99_999_000
+     ExampleBackfillJob run_at: "2020-11-16T22:52:02Z", begin_id: 99_998_801, end_id: 99_998_900
+
+ This process continues until there is no more work to be done. For more detail, please see [Spreader](https://github.com/doximity/super_spreader/blob/master/lib/super_spreader/spreader.rb) and [its spec](https://github.com/doximity/super_spreader/blob/master/spec/spreader_spec.rb).
+
+ Additionally, the configuration can be tuned while SuperSpreader is running, because it is re-read each time `SchedulerJob` runs. As it stands, each run of SuperSpreader is hand-tuned, and it is highly recommended that resource utilization be monitored during runs. That said, SuperSpreader is designed to run autonomously once a good configuration is found.
+
+ Example tuning:
+
+ - Does the process need to go faster? Increase the number of jobs per second.
+ - Are batches taking too long to complete? Decrease the batch size.
+ - Is `SchedulerJob` taking a long time to complete? Decrease the duration so that less work is enqueued in each cycle.
+
+ Finally, SuperSpreader can be stopped instantly and resumed at a later time, if the need ever arises.
+
+ ## How do I use it?
+
+ To repeat an earlier disclaimer:
+
+ > **Please be aware:** SuperSpreader is still fairly early in development. While it can be used by experienced hands, we are aware that it could have a better developer experience (DevX). It was written to solve a specific problem (see "History"). We are working to generalize the tool as the need arises. Pull requests are welcome!
+
+ If you haven't yet, please read the "How does it work?" section. This basic workflow is tested in `spec/integration/backfill_spec.rb`.
+
+ First, write a backfill job. Please see [this example for details](https://github.com/doximity/super_spreader/blob/master/spec/support/example_backfill_job.rb).
+
+ Next, configure `SuperSpreader` from the console by saving a `SchedulerConfig` to Redis. For documentation on each attribute, please see [SchedulerConfig](https://github.com/doximity/super_spreader/blob/master/lib/super_spreader/scheduler_config.rb). It is recommended that you start slow, with small batches, short durations, and low per-second rates.
+
+ **Important:** SuperSpreader currently only supports a _single_ configuration, though removing that limitation is on our Roadmap (please see below).
+
+ ```ruby
+ # NOTE: This is an example. You should take your situation into account when
+ # setting these values.
+ config = SuperSpreader::SchedulerConfig.new
+
+ config.batch_size = 10
+ config.duration = 10
+ config.job_class_name = "ExampleBackfillJob"
+
+ config.per_second_on_peak = 3.0
+ config.per_second_off_peak = 3.0
+
+ config.on_peak_timezone = "America/Los_Angeles"
+ config.on_peak_wday_begin = 1
+ config.on_peak_wday_end = 5
+ config.on_peak_hour_begin = 5
+ config.on_peak_hour_end = 17
+
+ config.save
+ ```
+
+ Now the `SchedulerJob` can be started. It will run until it is stopped or runs out of work.
+
+ ```ruby
+ SuperSpreader::SchedulerJob.perform_now
+ ```
+
+ At this point, monitor your database and worker instances using whatever tooling you have available.
+
+ Based on those metrics, slowly step up `per_second_on_peak` and `batch_size` while continuing to monitor:
+
+ ```ruby
+ config.batch_size = 20
+ config.save
+ ```
+
+ ```ruby
+ config.per_second_on_peak = 4.0
+ config.save
+ ```
+
+ Continue stepping up the rates until you arrive at a rate that is acceptable for your situation.
+ As an example, the jobs for our re-encryption project ran with this configuration:
+
+ ```ruby
+ # NOTE: This is an example. You should take your situation into account when
+ # setting these values.
+ config = SuperSpreader::SchedulerConfig.new
+
+ config.batch_size = 70
+ config.duration = 180
+ config.job_class_name = "ReencryptJob"
+
+ config.per_second_on_peak = 3.0
+ config.per_second_off_peak = 7.5
+
+ config.on_peak_timezone = "America/Los_Angeles"
+ config.on_peak_wday_begin = 1
+ config.on_peak_wday_end = 5
+ config.on_peak_hour_begin = 5
+ config.on_peak_hour_end = 17
+
+ config.save
+ ```
+
+ ### Disaster recovery
+
+ If at any point you need to stop the background jobs, stop all scheduling using:
+
+ ```ruby
+ SuperSpreader::SchedulerJob.stop!
+ ```
+
+ Optionally, if it is acceptable to have a partially-processed cycle, you can stop the backfill jobs as well:
+
+ ```ruby
+ ExampleBackfillJob.stop!
+ ```
+
+ (Recovering from a partially-processed cycle requires manually setting the correct `initial_id` in `SpreadTracker`.)
+
+ The jobs will still be present in the job runner, but they will all finish instantly because of the early return demonstrated in [the example job](https://github.com/doximity/super_spreader/blob/master/spec/support/example_backfill_job.rb). After the last scheduler job runs, the process is paused.
+
+ ### Restarting
+
+ If you stop the jobs but wish to restart them later, use the `go!` method and *then* call `SuperSpreader::SchedulerJob.perform_now`. Otherwise, the jobs will not do any work.
+
+ ```ruby
+ ExampleBackfillJob.go!
+ SuperSpreader::SchedulerJob.go!
+ SuperSpreader::SchedulerJob.perform_now
+ ```
+
+ ## Installation
+
+ If you've gotten this far and think SuperSpreader is a good fit for your problem, here is how to install it.
+
+ Add this line to your application's Gemfile:
+
+ ```ruby
+ gem 'super_spreader'
+ ```
+
+ And then execute:
+
+     $ bundle
+
+ Or install it yourself as:
+
+     $ gem install super_spreader
+
+ SuperSpreader requires an ActiveRecord-compatible database, an ActiveJob-compatible job runner, and Redis for bookkeeping.
+
+ For Rails, please set up SuperSpreader using an initializer:
+
+ ```ruby
+ # config/initializers/super_spreader.rb
+
+ SuperSpreader.logger = Rails.logger
+ SuperSpreader.redis = Redis.new(url: ENV["REDIS_URL"])
+ ```
+
+ ## Roadmap
+
+ Please see [the Milestones on GitHub](https://github.com/doximity/super_spreader/milestones?direction=asc&sort=title&state=open).
+
+ ## Development
+
+ You'll need [Redis](https://redis.io/docs/getting-started/) and [Ruby](https://www.ruby-lang.org/en/downloads/) installed. Please ensure both are set up before continuing.
+
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+
+ ## Contributing
+
+ 1. See [CONTRIBUTING.md](./CONTRIBUTING.md)
+ 2. Fork it ( https://github.com/doximity/super_spreader/fork )
+ 3. Create your feature branch (`git checkout -b my-new-feature`)
+ 4. Commit your changes (`git commit -am 'Add some feature'`)
+ 5. Push to the branch (`git push origin my-new-feature`)
+ 6. Create a new Pull Request
+
+ ## License
+
+ `super_spreader` is licensed under an Apache 2 license. Contributors are required to sign a contributor license agreement. See LICENSE.txt and CONTRIBUTING.md for more information.
data/Rakefile ADDED
@@ -0,0 +1,35 @@
+ require "bundler/gem_tasks"
+ require "rspec/core/rake_task"
+ require "yard"
+
+ RSpec::Core::RakeTask.new(:spec)
+ YARD::Rake::YardocTask.new(:doc)
+
+ task default: :spec
+
+ desc "Run a REPL with access to this library"
+ task :console do
+   sh("irb -I lib -r super_spreader")
+ end
+
+ namespace :check do
+   desc "Run all checks"
+   task all: %i[redis]
+
+   desc "Confirm Redis is accessible"
+   task :redis do
+     require "redis"
+
+     inaccessible_error_message = "Redis: inaccessible (please confirm that a Redis server is installed and running, e.g. redis-server)"
+
+     redis = Redis.new(url: ENV["REDIS_URL"])
+
+     if redis.ping == "PONG"
+       puts "Redis: OK"
+     else
+       raise inaccessible_error_message
+     end
+   rescue Redis::CannotConnectError
+     raise inaccessible_error_message
+   end
+ end
data/bin/console ADDED
@@ -0,0 +1,14 @@
+ #!/usr/bin/env ruby
+
+ require "bundler/setup"
+ require "super_spreader"
+
+ # You can add fixtures and/or initialization code here to make experimenting
+ # with your gem easier. You can also use a different console, if you like.
+
+ # (If you use this, don't forget to add pry to your Gemfile!)
+ # require "pry"
+ # Pry.start
+
+ require "irb"
+ IRB.start(__FILE__)
data/bin/setup ADDED
@@ -0,0 +1,7 @@
+ #!/usr/bin/env bash
+ set -euo pipefail
+ IFS=$'\n\t'
+ set -vx
+
+ bundle install
+ rake check:all
data/lib/super_spreader/batch_helper.rb ADDED
@@ -0,0 +1,40 @@
+ # frozen_string_literal: true
+
+ require "active_record"
+
+ module SuperSpreader
+   # Methods in this module are suitable for use in Rails migrations. It is
+   # expected that their interface will remain stable. If breaking changes are
+   # introduced, a new module will be introduced so existing migrations will not
+   # be affected.
+   module BatchHelper
+     # Execute SQL in small batches for an entire table.
+     #
+     # It is assumed that the table has a primary key named +id+.
+     #
+     # Recommendation for migrations: Use this in combination with +disable_ddl_transaction!+.
+     # See also: https://github.com/ankane/strong_migrations#backfilling-data
+     #
+     # @param table_name [String] the name of the table
+     # @param step_size [Integer] how many records to process in each batch
+     # @yield [minimum_id, maximum_id] block that returns SQL to migrate records between minimum_id and maximum_id
+     def batch_execute(table_name:, step_size:)
+       result = execute(<<~SQL).to_a.flatten
+         SELECT MIN(id) AS min_id, MAX(id) AS max_id FROM #{quote_table_name(table_name)}
+       SQL
+       min_id = result[0]["min_id"]
+       max_id = result[0]["max_id"]
+       return unless min_id && max_id
+
+       lower_id = min_id
+       loop do
+         sql = yield(lower_id, lower_id + step_size)
+
+         execute(sql)
+
+         lower_id += step_size
+         break if lower_id > max_id
+       end
+     end
+   end
+ end
data/lib/super_spreader/peak_schedule.rb ADDED
@@ -0,0 +1,22 @@
+ # frozen_string_literal: true
+
+ require "active_support/core_ext/time"
+
+ module SuperSpreader
+   class PeakSchedule
+     def initialize(on_peak_wday_range:, on_peak_hour_range:, timezone:)
+       @on_peak_wday_range = on_peak_wday_range
+       @on_peak_hour_range = on_peak_hour_range
+       @timezone = timezone
+     end
+
+     def on_peak?(time = Time.current)
+       time_in_zone = time.in_time_zone(@timezone)
+
+       is_on_peak_day = @on_peak_wday_range.cover?(time_in_zone.wday)
+       is_on_peak_hour = @on_peak_hour_range.cover?(time_in_zone.hour)
+
+       is_on_peak_day && is_on_peak_hour
+     end
+   end
+ end
data/lib/super_spreader/redis_model.rb ADDED
@@ -0,0 +1,51 @@
+ # frozen_string_literal: true
+
+ require "active_model"
+ require "redis"
+
+ module SuperSpreader
+   class RedisModel
+     include ActiveModel::Model
+     include ActiveModel::Attributes
+     include ActiveModel::Serialization
+
+     def initialize(values = default_values)
+       super
+     end
+
+     def default_values
+       redis.hgetall(redis_key)
+     end
+
+     def persisted?
+       redis.get(redis_key).present?
+     end
+
+     def delete
+       redis.del(redis_key)
+     end
+
+     def save
+       redis.multi do |pipeline|
+         pipeline.del(redis_key)
+
+         serializable_hash.each do |key, value|
+           pipeline.hset(redis_key, key, value)
+         end
+       end
+     end
+
+     # Primarily for factory_bot
+     alias_method :save!, :save
+
+     private
+
+     def redis_key
+       self.class.name
+     end
+
+     def redis
+       SuperSpreader.redis
+     end
+   end
+ end
data/lib/super_spreader/scheduler_config.rb ADDED
@@ -0,0 +1,79 @@
+ # frozen_string_literal: true
+
+ require "super_spreader/peak_schedule"
+ require "super_spreader/redis_model"
+
+ module SuperSpreader
+   class SchedulerConfig < RedisModel
+     # The job class to enqueue on each run of the scheduler.
+     attribute :job_class_name, :string
+     # The number of records to process in each invocation of the job class.
+     attribute :batch_size, :integer
+     # The amount of work to enqueue, in seconds.
+     attribute :duration, :integer
+
+     # The number of jobs to enqueue per second, allowing for fractional amounts
+     # such as 1 job every other second using `0.5`.
+     attribute :per_second_on_peak, :float
+     # The same as per_second_on_peak, but for times that are not identified as
+     # on-peak.
+     attribute :per_second_off_peak, :float
+
+     # This section manages the definition of "on peak." Compare this terminology
+     # to bus or train schedules.
+
+     # The timezone to use for time calculations.
+     #
+     # Example: "America/Los_Angeles" for Pacific time
+     attribute :on_peak_timezone, :string
+     # The 24-hour hour on which on-peak application usage starts.
+     #
+     # Example: 5 for 5 AM
+     attribute :on_peak_hour_begin, :integer
+     # The 24-hour hour on which on-peak application usage ends.
+     #
+     # Example: 17 for 5 PM
+     attribute :on_peak_hour_end, :integer
+     # The wday value on which on-peak application usage starts.
+     #
+     # Example: 1 for Monday
+     attribute :on_peak_wday_begin, :integer
+     # The wday value on which on-peak application usage ends.
+     #
+     # Example: 5 for Friday
+     attribute :on_peak_wday_end, :integer
+
+     attr_writer :schedule
+
+     def job_class
+       job_class_name.constantize
+     end
+
+     def super_spreader_config
+       [job_class, job_class.super_spreader_model_class]
+     end
+
+     def spread_options
+       {
+         batch_size: batch_size,
+         duration: duration,
+         per_second: per_second
+       }
+     end
+
+     def per_second
+       schedule.on_peak? ? per_second_on_peak : per_second_off_peak
+     end
+
+     private
+
+     def schedule
+       @schedule ||=
+         PeakSchedule.new(
+           on_peak_wday_range: on_peak_wday_begin..on_peak_wday_end,
+           on_peak_hour_range: on_peak_hour_begin..on_peak_hour_end,
+           timezone: on_peak_timezone
+         )
+     end
+   end
+ end
data/lib/super_spreader/scheduler_job.rb ADDED
@@ -0,0 +1,43 @@
+ # frozen_string_literal: true
+
+ require "active_job"
+ require "json"
+ require "super_spreader/scheduler_config"
+ require "super_spreader/spreader"
+ require "super_spreader/stop_signal"
+
+ module SuperSpreader
+   class SchedulerJob < ActiveJob::Base
+     extend StopSignal
+
+     def perform
+       return if self.class.stopped?
+
+       log(started_at: Time.current.iso8601)
+       log(config.serializable_hash)
+
+       super_spreader = Spreader.new(*config.super_spreader_config)
+       next_id = super_spreader.enqueue_spread(**config.spread_options)
+       log(next_id: next_id)
+
+       return if next_id.zero?
+
+       self.class.set(wait_until: next_run_at).perform_later
+       log(next_run_at: next_run_at.iso8601)
+     end
+
+     def next_run_at
+       config.duration.seconds.from_now
+     end
+
+     def config
+       @config ||= SchedulerConfig.new
+     end
+
+     private
+
+     def log(hash)
+       SuperSpreader.logger.info({subject: self.class.name}.merge(hash).to_json)
+     end
+   end
+ end
data/lib/super_spreader/spread_tracker.rb ADDED
@@ -0,0 +1,39 @@
+ # frozen_string_literal: true
+
+ require "active_record"
+ require "redis"
+
+ module SuperSpreader
+   class SpreadTracker
+     def initialize(job_class, model_class)
+       @job_class = job_class
+       @model_class = model_class
+     end
+
+     def initial_id
+       redis_value = redis.hget(initial_id_key, @model_class.name)
+
+       value = redis_value || @model_class.maximum(:id)
+
+       value.to_i
+     end
+
+     def initial_id=(value)
+       if value.nil?
+         redis.hdel(initial_id_key, @model_class.name)
+       else
+         redis.hset(initial_id_key, @model_class.name, value)
+       end
+     end
+
+     private
+
+     def redis
+       SuperSpreader.redis
+     end
+
+     def initial_id_key
+       "#{@job_class.name}:initial_id"
+     end
+   end
+ end
data/lib/super_spreader/spreader.rb ADDED
@@ -0,0 +1,61 @@
+ # frozen_string_literal: true
+
+ require "super_spreader/spread_tracker"
+
+ module SuperSpreader
+   class Spreader
+     def initialize(job_class, model_class, spread_tracker: nil)
+       @job_class = job_class
+       @model_class = model_class
+       @spread_tracker = spread_tracker || SpreadTracker.new(job_class, model_class)
+     end
+
+     def spread(batch_size:, duration:, per_second:, initial_id:, begin_at: Time.now.utc)
+       end_id = initial_id
+       segment_duration = 1.0 / per_second
+       time_index = 0.0
+       batches = []
+
+       while time_index < duration
+         break if end_id <= 0
+
+         # Use floor to prevent subsecond times
+         run_at = begin_at + time_index.floor
+         begin_id = clamp(end_id - batch_size + 1)
+         batches << {run_at: run_at, begin_id: begin_id, end_id: end_id}
+
+         break if begin_id == 1
+
+         end_id = begin_id - 1
+         time_index += segment_duration
+       end
+
+       batches
+     end
+
+     def enqueue_spread(**opts)
+       initial_id = @spread_tracker.initial_id
+       return 0 if initial_id.zero?
+
+       batches = spread(**opts.merge(initial_id: initial_id))
+
+       batches.each do |batch|
+         @job_class
+           .set(wait_until: batch[:run_at])
+           .perform_later(batch[:begin_id], batch[:end_id])
+       end
+
+       last_begin_id = batches.last[:begin_id]
+       next_id = last_begin_id - 1
+       @spread_tracker.initial_id = next_id
+
+       next_id
+     end
+
+     private
+
+     def clamp(value)
+       (value <= 0) ? 1 : value
+     end
+   end
+ end
data/lib/super_spreader/stop_signal.rb ADDED
@@ -0,0 +1,29 @@
+ # frozen_string_literal: true
+
+ require "redis"
+
+ module SuperSpreader
+   module StopSignal
+     def stop!
+       redis.set(stop_key, true)
+     end
+
+     def go!
+       redis.del(stop_key)
+     end
+
+     def stopped?
+       redis.exists(stop_key).positive?
+     end
+
+     private
+
+     def redis
+       SuperSpreader.redis
+     end
+
+     def stop_key
+       "#{name}:stop"
+     end
+   end
+ end
data/lib/super_spreader/version.rb ADDED
@@ -0,0 +1,5 @@
+ # frozen_string_literal: true
+
+ module SuperSpreader
+   VERSION = "0.2.0"
+ end