maintenance_tasks 2.6.0 → 2.7.0
- checksums.yaml +4 -4
- data/README.md +105 -20
- data/app/models/maintenance_tasks/batch_csv_collection_builder.rb +4 -3
- data/app/models/maintenance_tasks/csv_collection_builder.rb +11 -8
- data/app/models/maintenance_tasks/progress.rb +5 -5
- data/app/models/maintenance_tasks/run.rb +26 -0
- data/app/models/maintenance_tasks/runner.rb +1 -1
- data/app/models/maintenance_tasks/task.rb +9 -5
- data/lib/generators/maintenance_tasks/task_generator.rb +2 -2
- data/lib/maintenance_tasks/cli.rb +2 -2
- data/lib/tasks/maintenance_tasks_tasks.rake +1 -0
- metadata +4 -4
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 2869ba1c05cb80edb54803f50037bb45b388a5489bec724a79617d2890e78fdc
|
4
|
+
data.tar.gz: 5a936d433f102642e972f4c04e7fd96817193f0075b252e942c1ff86bb6a0e06
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 5e054a909fd05bacc7482caa7cba9cc69ed15850bdec1e0bbf09466426ad6bcfadd12c086a4efe185eb5be1edc40b3fbacaca34b555fac39fa0dcfac2df50405
|
7
|
+
data.tar.gz: fa1c8baed4e9cc6b34acd03e9c16ef2266da288293da212fb96c03d6a6be2d87fb5599e167e4d629e80f16a80127b611ffe6b8895867958a92675dca06457e48
|
data/README.md
CHANGED
@@ -10,8 +10,8 @@ engine helps with the second part of this process, backfilling.
|
|
10
10
|
|
11
11
|
Maintenance tasks are collection-based tasks, usually using Active Record, that
|
12
12
|
update the data in your database. They can be paused or interrupted. Maintenance
|
13
|
-
tasks can operate [in batches]
|
14
|
-
[throttling]
|
13
|
+
tasks can operate [in batches](#processing-batch-collections) and use
|
14
|
+
[throttling](#throttling) to control the load on your database.
|
15
15
|
|
16
16
|
Maintenance tasks aren't meant to happen on a regular basis. They're used as
|
17
17
|
needed, or as one-offs. Normally maintenance tasks are ephemeral, so they are
|
@@ -191,15 +191,57 @@ module Maintenance
|
|
191
191
|
end
|
192
192
|
```
|
193
193
|
|
194
|
+
`posts.csv`:
|
194
195
|
```csv
|
195
|
-
# posts.csv
|
196
196
|
title,content
|
197
197
|
My Title,Hello World!
|
198
198
|
```
|
199
199
|
|
200
200
|
The files uploaded to your Active Storage service provider will be renamed to
|
201
|
-
include an ISO 8601 timestamp and the Task name in snake case format.
|
202
|
-
|
201
|
+
include an ISO 8601 timestamp and the Task name in snake case format.
|
202
|
+
|
203
|
+
The implicit `#count` method loads and parses the entire file to determine the
|
204
|
+
accurate number of rows. With files with millions of rows, it takes several
|
205
|
+
seconds to process. Consider skipping the count (defining a `count` that returns
|
206
|
+
`nil`) or use an approximation, eg: count the number of new lines:
|
207
|
+
|
208
|
+
```ruby
|
209
|
+
def count(task)
|
210
|
+
task.csv_content.count("\n") - 1
|
211
|
+
end
|
212
|
+
```
|
213
|
+
|
214
|
+
#### CSV options
|
215
|
+
|
216
|
+
Tasks can pass [options for Ruby's CSV parser][csv-parse-options] by adding
|
217
|
+
keyword arguments to `csv_collection`:
|
218
|
+
|
219
|
+
[csv-parse-options]: https://ruby-doc.org/3.3.0/stdlibs/csv/CSV.html#class-CSV-label-Options+for+Parsing
|
220
|
+
|
221
|
+
```ruby
|
222
|
+
# app/tasks/maintenance/import_posts_task.rb
|
223
|
+
|
224
|
+
module Maintenance
|
225
|
+
class ImportPosts
|
226
|
+
csv_collection(skip_lines: /^#/, converters: ->(field) { field.strip })
|
227
|
+
|
228
|
+
def process(row)
|
229
|
+
Post.create!(title: row["title"], content: row["content"])
|
230
|
+
end
|
231
|
+
end
|
232
|
+
end
|
233
|
+
```
|
234
|
+
|
235
|
+
These options instruct Ruby's CSV parser to skip lines that start with a `#`,
|
236
|
+
and removes the leading and trailing spaces from any field, so that the
|
237
|
+
following file will be processed identically as the previous example:
|
238
|
+
|
239
|
+
`posts.csv`:
|
240
|
+
```csv
|
241
|
+
# A comment
|
242
|
+
title,content
|
243
|
+
My Title ,Hello World!
|
244
|
+
```
|
203
245
|
|
204
246
|
#### Batch CSV Tasks
|
205
247
|
|
@@ -453,6 +495,64 @@ module Maintenance
|
|
453
495
|
end
|
454
496
|
```
|
455
497
|
|
498
|
+
### Subscribing to instrumentation events
|
499
|
+
|
500
|
+
If you are interested in actioning a specific task event, please refer to the [Using Task Callbacks](#using-task-callbacks) section below. However, if you want to subscribe to all events, irrespective of the task, you can use the following Active Support notifications:
|
501
|
+
|
502
|
+
```ruby
|
503
|
+
enqueued.maintenance_tasks # This event is published when a task has been enqueued by the user.
|
504
|
+
succeeded.maintenance_tasks # This event is published when a task has finished without any errors.
|
505
|
+
cancelled.maintenance_tasks # This event is published when the user explicitly halts the execution of a task.
|
506
|
+
paused.maintenance_tasks # This event is published when a task is paused by the user in the middle of its run.
|
507
|
+
errored.maintenance_tasks # This event is published when the task's code produces an unhandled exception.
|
508
|
+
```
|
509
|
+
|
510
|
+
These notifications offer a way to monitor the lifecycle of maintenance tasks in your application.
|
511
|
+
|
512
|
+
Usage example:
|
513
|
+
|
514
|
+
```ruby
|
515
|
+
ActiveSupport::Notifications.subscribe("succeeded.maintenance_tasks") do |*, payload|
|
516
|
+
task_name = payload[:task_name]
|
517
|
+
arguments = payload[:arguments]
|
518
|
+
metadata = payload[:metadata]
|
519
|
+
job_id = payload[:job_id]
|
520
|
+
run_id = payload[:run_id]
|
521
|
+
time_running = payload[:time_running]
|
522
|
+
started_at = payload[:started_at]
|
523
|
+
ended_at = payload[:ended_at]
|
524
|
+
rescue => e
|
525
|
+
Rails.logger.error(e)
|
526
|
+
end
|
527
|
+
|
528
|
+
ActiveSupport::Notifications.subscribe("errored.maintenance_tasks") do |*, payload|
|
529
|
+
task_name = payload[:task_name]
|
530
|
+
error = payload[:error]
|
531
|
+
error_message = error[:message]
|
532
|
+
error_class = error[:class]
|
533
|
+
error_backtrace = error[:backtrace]
|
534
|
+
rescue => e
|
535
|
+
Rails.logger.error(e)
|
536
|
+
end
|
537
|
+
|
538
|
+
# or
|
539
|
+
|
540
|
+
class MaintenanceTasksInstrumenter < ActiveSupport::Subscriber
|
541
|
+
attach_to :maintenance_tasks
|
542
|
+
|
543
|
+
def enqueued(event)
|
544
|
+
task_name = event.payload[:task_name]
|
545
|
+
arguments = event.payload[:arguments]
|
546
|
+
metadata = event.payload[:metadata]
|
547
|
+
|
548
|
+
SlackNotifier.broadcast(SLACK_CHANNEL,
|
549
|
+
"Job #{task_name} was started by #{metadata[:user_email]}} with arguments #{arguments.to_s.truncate(255)}")
|
550
|
+
rescue => e
|
551
|
+
Rails.logger.error(e)
|
552
|
+
end
|
553
|
+
end
|
554
|
+
```
|
555
|
+
|
456
556
|
### Using Task Callbacks
|
457
557
|
|
458
558
|
The Task provides callbacks that hook into its life cycle.
|
@@ -503,21 +603,6 @@ end
|
|
503
603
|
If any of the other callbacks cause an exception, it will be handled by the
|
504
604
|
error handler, and will cause the task to stop running.
|
505
605
|
|
506
|
-
Callback behaviour can be shared across all tasks using an initializer.
|
507
|
-
|
508
|
-
```ruby
|
509
|
-
# config/initializer/maintenance_tasks.rb
|
510
|
-
Rails.autoloaders.main.on_load("MaintenanceTasks::Task") do
|
511
|
-
MaintenanceTasks::Task.class_eval do
|
512
|
-
after_start(:notify)
|
513
|
-
|
514
|
-
private
|
515
|
-
|
516
|
-
def notify; end
|
517
|
-
end
|
518
|
-
end
|
519
|
-
```
|
520
|
-
|
521
606
|
### Considerations when writing Tasks
|
522
607
|
|
523
608
|
Maintenance Tasks relies on the queue adapter configured for your application to
|
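Note on the README change above: it documents that keyword arguments given to `csv_collection` are forwarded to Ruby's CSV parser. The sketch below uses only the standard `csv` library, outside the gem, with the two example files inlined as strings. It checks that the `skip_lines` and `converters` options from the README example make the commented, padded file parse to the same rows as the plain one. The variable names are illustrative, and `headers: true` is assumed because the diff shows the gem applying it by default.

```ruby
require "csv"

# The original posts.csv from the README.
plain = <<~CSV
  title,content
  My Title,Hello World!
CSV

# The second posts.csv: a comment line and a padded field.
annotated = <<~CSV
  # A comment
  title,content
  My Title ,Hello World!
CSV

# Options from the README example, plus the default of headers: true.
csv_options = {
  headers: true,
  skip_lines: /^#/,
  converters: ->(field) { field.strip },
}

plain_rows = CSV.new(plain, headers: true).map(&:to_h)
annotated_rows = CSV.new(annotated, **csv_options).map(&:to_h)

puts plain_rows.inspect
puts annotated_rows == plain_rows
```

Inside a task, `process(row)` would receive each of these rows as a `CSV::Row` rather than a plain hash.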
data/app/models/maintenance_tasks/batch_csv_collection_builder.rb
CHANGED
@@ -12,16 +12,17 @@ module MaintenanceTasks
|
|
12
12
|
# Initialize a BatchCsvCollectionBuilder with a batch size.
|
13
13
|
#
|
14
14
|
# @param batch_size [Integer] the number of CSV rows in a batch.
|
15
|
-
|
15
|
+
# @param csv_options [Hash] options to pass to the CSV parser.
|
16
|
+
def initialize(batch_size, **csv_options)
|
16
17
|
@batch_size = batch_size
|
17
|
-
super()
|
18
|
+
super(**csv_options)
|
18
19
|
end
|
19
20
|
|
20
21
|
# Defines the collection to be iterated over, based on the provided CSV.
|
21
22
|
# Includes the CSV and the batch size.
|
22
23
|
def collection(task)
|
23
24
|
BatchCsv.new(
|
24
|
-
csv: CSV.new(task.csv_content,
|
25
|
+
csv: CSV.new(task.csv_content, **@csv_options),
|
25
26
|
batch_size: @batch_size,
|
26
27
|
)
|
27
28
|
end
|
data/app/models/maintenance_tasks/csv_collection_builder.rb
CHANGED
@@ -5,24 +5,27 @@ require "csv"
|
|
5
5
|
module MaintenanceTasks
|
6
6
|
# Strategy for building a Task that processes CSV files.
|
7
7
|
#
|
8
|
+
# @param csv_options [Hash] options to pass to the CSV parser.
|
8
9
|
# @api private
|
9
10
|
class CsvCollectionBuilder
|
11
|
+
def initialize(**csv_options)
|
12
|
+
@csv_options = csv_options
|
13
|
+
end
|
14
|
+
|
10
15
|
# Defines the collection to be iterated over, based on the provided CSV.
|
11
16
|
#
|
12
|
-
# @return [CSV] the CSV object constructed from the specified CSV content
|
13
|
-
# with headers.
|
17
|
+
# @return [CSV] the CSV object constructed from the specified CSV content.
|
14
18
|
def collection(task)
|
15
|
-
CSV.new(task.csv_content,
|
19
|
+
CSV.new(task.csv_content, **@csv_options)
|
16
20
|
end
|
17
21
|
|
18
|
-
# The number of rows to be processed.
|
19
|
-
#
|
20
|
-
# Note that
|
21
|
-
# newlines.
|
22
|
+
# The number of rows to be processed.
|
23
|
+
# It uses the CSV library for an accurate row count.
|
24
|
+
# Note that the entire file is loaded. It will take several seconds with files with millions of rows.
|
22
25
|
#
|
23
26
|
# @return [Integer] the approximate number of rows to process.
|
24
27
|
def count(task)
|
25
|
-
task.csv_content.count
|
28
|
+
CSV.new(task.csv_content, **@csv_options).count
|
26
29
|
end
|
27
30
|
|
28
31
|
# Return that the Task processes CSV content.
|
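Note on the `count` change above: the new implementation parses the content with the CSV library instead of relying on newlines. As a rough illustration, assuming the newline-based approximation quoted in the README (`csv_content.count("\n") - 1`) and a made-up CSV string, the two approaches disagree as soon as a quoted field contains a line break:

```ruby
require "csv"

# Illustrative content: the second row has a quoted field spanning two lines.
csv_content = <<~CSV
  title,content
  My Title,"Hello
  World!"
  Another Title,Bye
CSV

# Newline-based approximation, as suggested in the README for large files.
approximate = csv_content.count("\n") - 1

# Parser-based count, mirroring the new implementation with headers: true.
accurate = CSV.new(csv_content, headers: true).count

puts "approximate: #{approximate}"
puts "accurate: #{accurate}"
```

The parser-based count reads the whole content, which is the cost the updated comment warns about for files with millions of rows.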
data/app/models/maintenance_tasks/progress.rb
CHANGED
@@ -52,17 +52,17 @@ module MaintenanceTasks
|
|
52
52
|
total = @run.tick_total
|
53
53
|
|
54
54
|
if !total?
|
55
|
-
"Processed #{number_to_delimited(count)} "\
|
55
|
+
"Processed #{number_to_delimited(count)} " \
|
56
56
|
"#{"item".pluralize(count)}."
|
57
57
|
elsif over_total?
|
58
|
-
"Processed #{number_to_delimited(count)} "\
|
59
|
-
"#{"item".pluralize(count)} "\
|
58
|
+
"Processed #{number_to_delimited(count)} " \
|
59
|
+
"#{"item".pluralize(count)} " \
|
60
60
|
"(expected #{number_to_delimited(total)})."
|
61
61
|
else
|
62
62
|
percentage = 100.0 * count / total
|
63
63
|
|
64
|
-
"Processed #{number_to_delimited(count)} out of "\
|
65
|
-
"#{number_to_delimited(total)} #{"item".pluralize(total)} "\
|
64
|
+
"Processed #{number_to_delimited(count)} out of " \
|
65
|
+
"#{number_to_delimited(total)} #{"item".pluralize(total)} " \
|
66
66
|
"(#{number_to_percentage(percentage, precision: 0)})."
|
67
67
|
end
|
68
68
|
end
|
data/app/models/maintenance_tasks/run.rb
CHANGED
@@ -39,6 +39,8 @@ module MaintenanceTasks
|
|
39
39
|
enum status: STATUSES.to_h { |status| [status, status.to_s] }
|
40
40
|
end
|
41
41
|
|
42
|
+
after_save :instrument_status_change
|
43
|
+
|
42
44
|
validate :task_name_belongs_to_a_valid_task, on: :create
|
43
45
|
validate :csv_attachment_presence, on: :create
|
44
46
|
validate :csv_content_type, on: :create
|
@@ -452,6 +454,30 @@ module MaintenanceTasks
|
|
452
454
|
|
453
455
|
private
|
454
456
|
|
457
|
+
def instrument_status_change
|
458
|
+
return unless status_previously_changed? || id_previously_changed?
|
459
|
+
return if running? || pausing? || cancelling? || interrupted?
|
460
|
+
|
461
|
+
attr = {
|
462
|
+
run_id: id,
|
463
|
+
job_id: job_id,
|
464
|
+
task_name: task_name,
|
465
|
+
arguments: arguments,
|
466
|
+
metadata: metadata,
|
467
|
+
time_running: time_running,
|
468
|
+
started_at: started_at,
|
469
|
+
ended_at: ended_at,
|
470
|
+
}
|
471
|
+
|
472
|
+
attr[:error] = {
|
473
|
+
message: error_message,
|
474
|
+
class: error_class,
|
475
|
+
backtrace: backtrace,
|
476
|
+
} if errored?
|
477
|
+
|
478
|
+
ActiveSupport::Notifications.instrument("#{status}.maintenance_tasks", attr)
|
479
|
+
end
|
480
|
+
|
455
481
|
def run_task_callbacks(callback)
|
456
482
|
task.run_callbacks(callback)
|
457
483
|
rescue Task::NotFoundError
|
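Note on `instrument_status_change` above: the second guard clause means only some statuses publish an event. The plain-Ruby sketch below models just that guard. It does not use Active Support and ignores the `status_previously_changed?` check. `SILENT_STATUSES` and `event_name_for` are hypothetical names introduced for illustration and are not part of the gem.

```ruby
# Statuses for which the guard clause in the hunk returns early.
SILENT_STATUSES = %w[running pausing cancelling interrupted].freeze

# Hypothetical helper: the event name built for a status, or nil if the
# guard would return before instrumenting.
def event_name_for(status)
  return if SILENT_STATUSES.include?(status)

  "#{status}.maintenance_tasks"
end

%w[enqueued running pausing paused errored].each do |status|
  puts "#{status} -> #{event_name_for(status).inspect}"
end
```

This is consistent with the five event names listed in the README section on instrumentation: `enqueued`, `succeeded`, `cancelled`, `paused` and `errored`.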
data/app/models/maintenance_tasks/runner.rb
CHANGED
@@ -74,7 +74,7 @@ module MaintenanceTasks
|
|
74
74
|
|
75
75
|
def enqueue(run, job)
|
76
76
|
unless job.enqueue
|
77
|
-
raise "The job to perform #{run.task_name} could not be enqueued. "\
|
77
|
+
raise "The job to perform #{run.task_name} could not be enqueued. " \
|
78
78
|
"Enqueuing has been prevented by a callback."
|
79
79
|
end
|
80
80
|
rescue => error
|
data/app/models/maintenance_tasks/task.rb
CHANGED
@@ -65,20 +65,24 @@ module MaintenanceTasks
|
|
65
65
|
# Make this Task a task that handles CSV.
|
66
66
|
#
|
67
67
|
# @param in_batches [Integer] optionally, supply a batch size if the CSV
|
68
|
-
#
|
68
|
+
# should be processed in batches.
|
69
|
+
# @param csv_options [Hash] optionally, supply options for the CSV parser.
|
70
|
+
# If not given, defaults to: <code>{ headers: true }</code>
|
71
|
+
# @see https://ruby-doc.org/3.3.0/stdlibs/csv/CSV.html#class-CSV-label-Options+for+Parsing
|
69
72
|
#
|
70
73
|
# An input to upload a CSV will be added in the form to start a Run. The
|
71
74
|
# collection and count method are implemented.
|
72
|
-
def csv_collection(in_batches: nil)
|
75
|
+
def csv_collection(in_batches: nil, **csv_options)
|
73
76
|
unless defined?(ActiveStorage)
|
74
|
-
raise NotImplementedError, "Active Storage needs to be installed\n"\
|
77
|
+
raise NotImplementedError, "Active Storage needs to be installed\n" \
|
75
78
|
"To resolve this issue run: bin/rails active_storage:install"
|
76
79
|
end
|
77
80
|
|
81
|
+
csv_options[:headers] = true unless csv_options.key?(:headers)
|
78
82
|
self.collection_builder_strategy = if in_batches
|
79
|
-
BatchCsvCollectionBuilder.new(in_batches)
|
83
|
+
BatchCsvCollectionBuilder.new(in_batches, **csv_options)
|
80
84
|
else
|
81
|
-
CsvCollectionBuilder.new
|
85
|
+
CsvCollectionBuilder.new(**csv_options)
|
82
86
|
end
|
83
87
|
end
|
84
88
|
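Note on the `headers` default above: the line uses `unless csv_options.key?(:headers)` rather than `||=`, so an explicit `headers: false` is respected. A minimal sketch of that behaviour, using a hypothetical helper `with_default_headers` (not part of the gem) and the standard `csv` library:

```ruby
require "csv"

# Hypothetical helper mirroring the defaulting line in the diff.
def with_default_headers(**csv_options)
  csv_options[:headers] = true unless csv_options.key?(:headers)
  csv_options
end

content = "title,content\nMy Title,Hello World!\n"

# No option given: the first line is treated as the header row.
default_rows = CSV.new(content, **with_default_headers).to_a

# Explicit headers: false is kept, so every line is returned as a plain array.
raw_rows = CSV.new(content, **with_default_headers(headers: false)).to_a

puts default_rows.first["title"]
puts raw_rows.first.inspect
```

With `||=`, a caller passing `headers: false` would have the value silently replaced by `true`, since `false` is falsy.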
|
data/lib/generators/maintenance_tasks/task_generator.rb
CHANGED
@@ -6,7 +6,7 @@ module MaintenanceTasks
|
|
6
6
|
# @api private
|
7
7
|
class TaskGenerator < Rails::Generators::NamedBase
|
8
8
|
source_root File.expand_path("templates", __dir__)
|
9
|
-
desc "This generator creates a task file at app/tasks and a corresponding "\
|
9
|
+
desc "This generator creates a task file at app/tasks and a corresponding " \
|
10
10
|
"test."
|
11
11
|
|
12
12
|
class_option :csv,
|
@@ -24,7 +24,7 @@ module MaintenanceTasks
|
|
24
24
|
# Creates the Task file.
|
25
25
|
def create_task_file
|
26
26
|
if options[:csv] && options[:no_collection]
|
27
|
-
raise "Multiple Task type options provided. Please use either "\
|
27
|
+
raise "Multiple Task type options provided. Please use either " \
|
28
28
|
"--csv or --no-collection."
|
29
29
|
end
|
30
30
|
template_file = File.join(
|
data/lib/maintenance_tasks/cli.rb
CHANGED
@@ -23,11 +23,11 @@ module MaintenanceTasks
|
|
23
23
|
DESC
|
24
24
|
|
25
25
|
# Specify the CSV file to process for CSV Tasks
|
26
|
-
desc = "Supply a CSV file to be processed by a CSV Task, "\
|
26
|
+
desc = "Supply a CSV file to be processed by a CSV Task, " \
|
27
27
|
"--csv path/to/csv/file.csv"
|
28
28
|
option :csv, lazy_default: :stdin, desc: desc
|
29
29
|
# Specify arguments to supply to a Task supporting parameters
|
30
|
-
desc = "Supply arguments for a Task that accepts parameters as a set of "\
|
30
|
+
desc = "Supply arguments for a Task that accepts parameters as a set of " \
|
31
31
|
"<key>:<value> pairs."
|
32
32
|
option :arguments, type: :hash, desc: desc
|
33
33
|
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: maintenance_tasks
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 2.
|
4
|
+
version: 2.7.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Shopify Engineering
|
8
8
|
autorequire:
|
9
9
|
bindir: exe
|
10
10
|
cert_chain: []
|
11
|
-
date: 2024-
|
11
|
+
date: 2024-04-16 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: actionpack
|
@@ -171,7 +171,7 @@ homepage: https://github.com/Shopify/maintenance_tasks
|
|
171
171
|
licenses:
|
172
172
|
- MIT
|
173
173
|
metadata:
|
174
|
-
source_code_uri: https://github.com/Shopify/maintenance_tasks/tree/v2.
|
174
|
+
source_code_uri: https://github.com/Shopify/maintenance_tasks/tree/v2.7.0
|
175
175
|
allowed_push_host: https://rubygems.org
|
176
176
|
post_install_message:
|
177
177
|
rdoc_options: []
|
@@ -188,7 +188,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
188
188
|
- !ruby/object:Gem::Version
|
189
189
|
version: '0'
|
190
190
|
requirements: []
|
191
|
-
rubygems_version: 3.5.
|
191
|
+
rubygems_version: 3.5.9
|
192
192
|
signing_key:
|
193
193
|
specification_version: 4
|
194
194
|
summary: A Rails engine for queuing and managing maintenance tasks
|