maintenance_tasks 2.6.0 → 2.7.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 2dca6b60a4a6d6366eaf05b43d1e7d1244c59baef10fe9d25561cab727a70638
4
- data.tar.gz: c9827e001e131747ea2cb20301ef9749f394bd603a1d657afac0024e1feefc79
3
+ metadata.gz: dac65eae0e9b5f8066a994408059eb42689792fed29f36321632ff42a804a7cd
4
+ data.tar.gz: a8df371985f28c8d97a2e8363f263e93fa40c2291370c63da8bfa115bad0e21a
5
5
  SHA512:
6
- metadata.gz: 269fe1ab563047cf368cb552b91305c0a99b25affa6bea95553f55541c6e5a947726e441b3487181146d90d75bf9e9e1e853fc3302fda483ce0d00032ad7bceb
7
- data.tar.gz: 7b1ae78d3049cb3c28c17bcf068acc735441e4b4a3de6babeb8107294408a1ee139c03b6bad12e1004f0bef59c834314429079df0a11c6384148f007bc3a44a9
6
+ metadata.gz: 5144b9f8ab67ae9ea05b16c74178a96239cf22611627822e6ca00205c145e1767bc51015f796fd95dd40b973f3e5b6b1e1e67a950c72c628995602c631e0e49a
7
+ data.tar.gz: caeefe52b00431e3ba41297cc10f3dfb17c184b57237e74e6825f50b683be58431c079bf4921ac2529286a0b926fdbbe2ee6e61dc4350e3457d477ce18dc1ba1
data/README.md CHANGED
@@ -10,8 +10,8 @@ engine helps with the second part of this process, backfilling.
10
10
 
11
11
  Maintenance tasks are collection-based tasks, usually using Active Record, that
12
12
  update the data in your database. They can be paused or interrupted. Maintenance
13
- tasks can operate [in batches][#processing-batch-collections] and use
14
- [throttling][#throttling] to control the load on your database.
13
+ tasks can operate [in batches](#processing-batch-collections) and use
14
+ [throttling](#throttling) to control the load on your database.
15
15
 
16
16
  Maintenance tasks aren't meant to happen on a regular basis. They're used as
17
17
  needed, or as one-offs. Normally maintenance tasks are ephemeral, so they are
@@ -191,15 +191,57 @@ module Maintenance
191
191
  end
192
192
  ```
193
193
 
194
+ `posts.csv`:
194
195
  ```csv
195
- # posts.csv
196
196
  title,content
197
197
  My Title,Hello World!
198
198
  ```
199
199
 
200
200
  The files uploaded to your Active Storage service provider will be renamed to
201
- include an ISO 8601 timestamp and the Task name in snake case format. The CSV is
202
- expected to have a trailing newline at the end of the file.
201
+ include an ISO 8601 timestamp and the Task name in snake case format.
202
+
203
+ The implicit `#count` method loads and parses the entire file to determine the
204
+ accurate number of rows. With files with millions of rows, it takes several
205
+ seconds to process. Consider skipping the count (defining a `count` that returns
206
+ `nil`) or using an approximation, e.g. counting the number of newlines:
207
+
208
+ ```ruby
209
+ def count(task)
210
+ task.csv_content.count("\n") - 1
211
+ end
212
+ ```
213
+
214
+ #### CSV options
215
+
216
+ Tasks can pass [options for Ruby's CSV parser][csv-parse-options] by adding
217
+ keyword arguments to `csv_collection`:
218
+
219
+ [csv-parse-options]: https://ruby-doc.org/3.3.0/stdlibs/csv/CSV.html#class-CSV-label-Options+for+Parsing
220
+
221
+ ```ruby
222
+ # app/tasks/maintenance/import_posts_task.rb
223
+
224
+ module Maintenance
225
+ class ImportPosts
226
+ csv_collection(skip_lines: /^#/, converters: ->(field) { field.strip })
227
+
228
+ def process(row)
229
+ Post.create!(title: row["title"], content: row["content"])
230
+ end
231
+ end
232
+ end
233
+ ```
234
+
235
+ These options instruct Ruby's CSV parser to skip lines that start with a `#`,
236
+ and to remove the leading and trailing spaces from any field, so that the
237
+ following file will be processed identically to the previous example:
238
+
239
+ `posts.csv`:
240
+ ```csv
241
+ # A comment
242
+ title,content
243
+ My Title ,Hello World!
244
+ ```
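Reviewer note: the effect of these parser options can be checked with Ruby's standard `csv` library alone, without the gem. The sketch below is only an illustration. It assumes `headers: true` is in effect (the `csv_collection` hunk later in this diff adds it when the caller does not pass `headers`), and the `csv_content` and `options` variable names are made up for the example.

```ruby
require "csv"

# Contents of the commented, padded posts.csv from the README hunk above.
csv_content = <<~CSV
  # A comment
  title,content
  My Title ,Hello World!
CSV

# Same options as the README example, plus the `headers: true` default.
options = {
  headers: true,
  skip_lines: /^#/,                       # drop lines starting with "#"
  converters: ->(field) { field.strip },  # trim whitespace around each field
}

# Each parsed row becomes a Hash keyed by the header names.
rows = CSV.new(csv_content, **options).map(&:to_h)
p rows
```

The comment line never reaches the header detection, and the trailing space after `My Title` is stripped by the converter, so the single row matches the one produced by the earlier, clean `posts.csv`.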
203
245
 
204
246
  #### Batch CSV Tasks
205
247
 
@@ -453,6 +495,68 @@ module Maintenance
453
495
  end
454
496
  ```
455
497
 
498
+ ### Subscribing to instrumentation events
499
+
500
+ If you are interested in actioning a specific task event, please refer to the
501
+ [Using Task Callbacks](#using-task-callbacks) section below. However, if you
502
+ want to subscribe to all events, irrespective of the task, you can use the
503
+ following Active Support notifications:
504
+
505
+ ```ruby
506
+ enqueued.maintenance_tasks # This event is published when a task has been enqueued by the user.
507
+ succeeded.maintenance_tasks # This event is published when a task has finished without any errors.
508
+ cancelled.maintenance_tasks # This event is published when the user explicitly halts the execution of a task.
509
+ paused.maintenance_tasks # This event is published when a task is paused by the user in the middle of its run.
510
+ errored.maintenance_tasks # This event is published when the task's code produces an unhandled exception.
511
+ ```
512
+
513
+ These notifications offer a way to monitor the lifecycle of maintenance tasks in
514
+ your application.
515
+
516
+ Usage example:
517
+
518
+ ```ruby
519
+ ActiveSupport::Notifications.subscribe("succeeded.maintenance_tasks") do |*, payload|
520
+ task_name = payload[:task_name]
521
+ arguments = payload[:arguments]
522
+ metadata = payload[:metadata]
523
+ job_id = payload[:job_id]
524
+ run_id = payload[:run_id]
525
+ time_running = payload[:time_running]
526
+ started_at = payload[:started_at]
527
+ ended_at = payload[:ended_at]
528
+ rescue => e
529
+ Rails.logger.error(e)
530
+ end
531
+
532
+ ActiveSupport::Notifications.subscribe("errored.maintenance_tasks") do |*, payload|
533
+ task_name = payload[:task_name]
534
+ error = payload[:error]
535
+ error_message = error[:message]
536
+ error_class = error[:class]
537
+ error_backtrace = error[:backtrace]
538
+ rescue => e
539
+ Rails.logger.error(e)
540
+ end
541
+
542
+ # or
543
+
544
+ class MaintenanceTasksInstrumenter < ActiveSupport::Subscriber
545
+ attach_to :maintenance_tasks
546
+
547
+ def enqueued(event)
548
+ task_name = event.payload[:task_name]
549
+ arguments = event.payload[:arguments]
550
+ metadata = event.payload[:metadata]
551
+
552
+ SlackNotifier.broadcast(SLACK_CHANNEL,
553
+ "Job #{task_name} was started by #{metadata[:user_email]} with arguments #{arguments.to_s.truncate(255)}")
554
+ rescue => e
555
+ Rails.logger.error(e)
556
+ end
557
+ end
558
+ ```
559
+
456
560
  ### Using Task Callbacks
457
561
 
458
562
  The Task provides callbacks that hook into its life cycle.
@@ -503,21 +607,6 @@ end
503
607
  If any of the other callbacks cause an exception, it will be handled by the
504
608
  error handler, and will cause the task to stop running.
505
609
 
506
- Callback behaviour can be shared across all tasks using an initializer.
507
-
508
- ```ruby
509
- # config/initializer/maintenance_tasks.rb
510
- Rails.autoloaders.main.on_load("MaintenanceTasks::Task") do
511
- MaintenanceTasks::Task.class_eval do
512
- after_start(:notify)
513
-
514
- private
515
-
516
- def notify; end
517
- end
518
- end
519
- ```
520
-
521
610
  ### Considerations when writing Tasks
522
611
 
523
612
  Maintenance Tasks relies on the queue adapter configured for your application to
@@ -32,6 +32,7 @@ module MaintenanceTasks
32
32
 
33
33
  def build_enumerator(_run, cursor:)
34
34
  cursor ||= @run.cursor
35
+ self.cursor_position = cursor
35
36
  @collection_enum = @task.enumerator_builder(cursor: cursor)
36
37
 
37
38
  @collection_enum ||= case (collection = @task.collection)
@@ -12,16 +12,17 @@ module MaintenanceTasks
12
12
  # Initialize a BatchCsvCollectionBuilder with a batch size.
13
13
  #
14
14
  # @param batch_size [Integer] the number of CSV rows in a batch.
15
- def initialize(batch_size)
15
+ # @param csv_options [Hash] options to pass to the CSV parser.
16
+ def initialize(batch_size, **csv_options)
16
17
  @batch_size = batch_size
17
- super()
18
+ super(**csv_options)
18
19
  end
19
20
 
20
21
  # Defines the collection to be iterated over, based on the provided CSV.
21
22
  # Includes the CSV and the batch size.
22
23
  def collection(task)
23
24
  BatchCsv.new(
24
- csv: CSV.new(task.csv_content, headers: true),
25
+ csv: CSV.new(task.csv_content, **@csv_options),
25
26
  batch_size: @batch_size,
26
27
  )
27
28
  end
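Reviewer note: this hunk forwards `csv_options` to the parser for batch tasks as well. The gem's own batching (`BatchCsv` and its enumerator) is not part of this diff, so the sketch below uses `Enumerable#each_slice` from the standard library as a stand-in, only to illustrate that the forwarded options shape the rows before they are grouped. The `batch_size` value and the sample data are hypothetical.

```ruby
require "csv"

csv_content = <<~CSV
  title,content
  First,one
  Second,two
  Third,three
CSV

csv_options = { headers: true } # what csv_collection passes when no options are given
batch_size = 2                  # hypothetical batch size

# CSV includes Enumerable, so parsed rows can be grouped with each_slice.
batches = CSV.new(csv_content, **csv_options).each_slice(batch_size).map do |rows|
  rows.map { |row| row["title"] }
end
p batches
```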
@@ -5,24 +5,27 @@ require "csv"
5
5
  module MaintenanceTasks
6
6
  # Strategy for building a Task that processes CSV files.
7
7
  #
8
+ # @param csv_options [Hash] options to pass to the CSV parser.
8
9
  # @api private
9
10
  class CsvCollectionBuilder
11
+ def initialize(**csv_options)
12
+ @csv_options = csv_options
13
+ end
14
+
10
15
  # Defines the collection to be iterated over, based on the provided CSV.
11
16
  #
12
- # @return [CSV] the CSV object constructed from the specified CSV content,
13
- # with headers.
17
+ # @return [CSV] the CSV object constructed from the specified CSV content.
14
18
  def collection(task)
15
- CSV.new(task.csv_content, headers: true)
19
+ CSV.new(task.csv_content, **@csv_options)
16
20
  end
17
21
 
18
- # The number of rows to be processed. Excludes the header row from the
19
- # count and assumes a trailing newline is at the end of the CSV file.
20
- # Note that this number is an approximation based on the number of
21
- # newlines.
22
+ # The number of rows to be processed.
23
+ # It uses the CSV library for an accurate row count.
24
+ # Note that the entire file is loaded. It will take several seconds with files with millions of rows.
22
25
  #
23
26
  # @return [Integer] the approximate number of rows to process.
24
27
  def count(task)
25
- task.csv_content.count("\n") - 1
28
+ CSV.new(task.csv_content, **@csv_options).count
26
29
  end
27
30
 
28
31
  # Return that the Task processes CSV content.
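Reviewer note: the change to `count` trades speed for accuracy. A minimal sketch of the difference, using only the standard `csv` library and a made-up file that contains a comment line, might look like this; the variable names are illustrative and not taken from the gem.

```ruby
require "csv"

csv_content = <<~CSV
  # A comment
  title,content
  My Title,Hello World!
CSV

options = { headers: true, skip_lines: /^#/ }

# New behaviour in this diff: parse the whole file and count the data rows.
exact = CSV.new(csv_content, **options).count

# Old behaviour: count newlines and subtract one for the header row.
approximate = csv_content.count("\n") - 1

puts "exact=#{exact} approximate=#{approximate}"
```

The newline heuristic counts the comment line as a row, while the parser-based count does not. Quoted fields that contain newlines would widen the gap further.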
@@ -52,17 +52,17 @@ module MaintenanceTasks
52
52
  total = @run.tick_total
53
53
 
54
54
  if !total?
55
- "Processed #{number_to_delimited(count)} "\
55
+ "Processed #{number_to_delimited(count)} " \
56
56
  "#{"item".pluralize(count)}."
57
57
  elsif over_total?
58
- "Processed #{number_to_delimited(count)} "\
59
- "#{"item".pluralize(count)} "\
58
+ "Processed #{number_to_delimited(count)} " \
59
+ "#{"item".pluralize(count)} " \
60
60
  "(expected #{number_to_delimited(total)})."
61
61
  else
62
62
  percentage = 100.0 * count / total
63
63
 
64
- "Processed #{number_to_delimited(count)} out of "\
65
- "#{number_to_delimited(total)} #{"item".pluralize(total)} "\
64
+ "Processed #{number_to_delimited(count)} out of " \
65
+ "#{number_to_delimited(total)} #{"item".pluralize(total)} " \
66
66
  "(#{number_to_percentage(percentage, precision: 0)})."
67
67
  end
68
68
  end
@@ -39,6 +39,8 @@ module MaintenanceTasks
39
39
  enum status: STATUSES.to_h { |status| [status, status.to_s] }
40
40
  end
41
41
 
42
+ after_save :instrument_status_change
43
+
42
44
  validate :task_name_belongs_to_a_valid_task, on: :create
43
45
  validate :csv_attachment_presence, on: :create
44
46
  validate :csv_content_type, on: :create
@@ -452,6 +454,30 @@ module MaintenanceTasks
452
454
 
453
455
  private
454
456
 
457
+ def instrument_status_change
458
+ return unless status_previously_changed? || id_previously_changed?
459
+ return if running? || pausing? || cancelling? || interrupted?
460
+
461
+ attr = {
462
+ run_id: id,
463
+ job_id: job_id,
464
+ task_name: task_name,
465
+ arguments: arguments,
466
+ metadata: metadata,
467
+ time_running: time_running,
468
+ started_at: started_at,
469
+ ended_at: ended_at,
470
+ }
471
+
472
+ attr[:error] = {
473
+ message: error_message,
474
+ class: error_class,
475
+ backtrace: backtrace,
476
+ } if errored?
477
+
478
+ ActiveSupport::Notifications.instrument("#{status}.maintenance_tasks", attr)
479
+ end
480
+
455
481
  def run_task_callbacks(callback)
456
482
  task.run_callbacks(callback)
457
483
  rescue Task::NotFoundError
@@ -74,7 +74,7 @@ module MaintenanceTasks
74
74
 
75
75
  def enqueue(run, job)
76
76
  unless job.enqueue
77
- raise "The job to perform #{run.task_name} could not be enqueued. "\
77
+ raise "The job to perform #{run.task_name} could not be enqueued. " \
78
78
  "Enqueuing has been prevented by a callback."
79
79
  end
80
80
  rescue => error
@@ -65,20 +65,25 @@ module MaintenanceTasks
65
65
  # Make this Task a task that handles CSV.
66
66
  #
67
67
  # @param in_batches [Integer] optionally, supply a batch size if the CSV
68
- # should be processed in batches.
68
+ # should be processed in batches.
69
+ # @param csv_options [Hash] optionally, supply options for the CSV parser.
70
+ # If not given, defaults to: <code>{ headers: true }</code>
71
+ # @see https://ruby-doc.org/3.3.0/stdlibs/csv/CSV.html#class-CSV-label-Options+for+Parsing
69
72
  #
70
73
  # An input to upload a CSV will be added in the form to start a Run. The
71
74
  # collection and count method are implemented.
72
- def csv_collection(in_batches: nil)
75
+ def csv_collection(in_batches: nil, **csv_options)
73
76
  unless defined?(ActiveStorage)
74
- raise NotImplementedError, "Active Storage needs to be installed\n"\
77
+ raise NotImplementedError, "Active Storage needs to be installed\n" \
75
78
  "To resolve this issue run: bin/rails active_storage:install"
76
79
  end
77
80
 
81
+ csv_options[:headers] = true unless csv_options.key?(:headers)
82
+ csv_options[:encoding] ||= Encoding.default_external
78
83
  self.collection_builder_strategy = if in_batches
79
- BatchCsvCollectionBuilder.new(in_batches)
84
+ BatchCsvCollectionBuilder.new(in_batches, **csv_options)
80
85
  else
81
- CsvCollectionBuilder.new
86
+ CsvCollectionBuilder.new(**csv_options)
82
87
  end
83
88
  end
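Reviewer note: the two defaulting lines use different idioms on purpose, as far as can be told from the diff. `key?` lets a caller pass `headers: false` explicitly, whereas `||=` would have overwritten it with `true`. The sketch below isolates that logic in a hypothetical helper, `with_csv_defaults`, which is not part of the gem.

```ruby
# Hypothetical helper mirroring the two defaulting lines added to csv_collection.
def with_csv_defaults(**csv_options)
  # Only default `headers` when the caller did not mention it at all,
  # so an explicit `headers: false` survives.
  csv_options[:headers] = true unless csv_options.key?(:headers)
  # `||=` is fine here: a nil or missing encoding falls back to the default.
  csv_options[:encoding] ||= Encoding.default_external
  csv_options
end

p with_csv_defaults()
p with_csv_defaults(headers: false, skip_lines: /^#/)
```

Any other keyword, such as `skip_lines` or `converters`, passes through untouched and ends up in `CSV.new`.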
84
89
 
@@ -6,7 +6,7 @@ module MaintenanceTasks
6
6
  # @api private
7
7
  class TaskGenerator < Rails::Generators::NamedBase
8
8
  source_root File.expand_path("templates", __dir__)
9
- desc "This generator creates a task file at app/tasks and a corresponding "\
9
+ desc "This generator creates a task file at app/tasks and a corresponding " \
10
10
  "test."
11
11
 
12
12
  class_option :csv,
@@ -24,7 +24,7 @@ module MaintenanceTasks
24
24
  # Creates the Task file.
25
25
  def create_task_file
26
26
  if options[:csv] && options[:no_collection]
27
- raise "Multiple Task type options provided. Please use either "\
27
+ raise "Multiple Task type options provided. Please use either " \
28
28
  "--csv or --no-collection."
29
29
  end
30
30
  template_file = File.join(
@@ -23,11 +23,11 @@ module MaintenanceTasks
23
23
  DESC
24
24
 
25
25
  # Specify the CSV file to process for CSV Tasks
26
- desc = "Supply a CSV file to be processed by a CSV Task, "\
26
+ desc = "Supply a CSV file to be processed by a CSV Task, " \
27
27
  "--csv path/to/csv/file.csv"
28
28
  option :csv, lazy_default: :stdin, desc: desc
29
29
  # Specify arguments to supply to a Task supporting parameters
30
- desc = "Supply arguments for a Task that accepts parameters as a set of "\
30
+ desc = "Supply arguments for a Task that accepts parameters as a set of " \
31
31
  "<key>:<value> pairs."
32
32
  option :arguments, type: :hash, desc: desc
33
33
 
@@ -1,4 +1,5 @@
1
1
  # frozen_string_literal: true
2
+
2
3
  # desc "Explaining what the task does"
3
4
  # task :maintenance_tasks do
4
5
  # # Task goes here
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: maintenance_tasks
3
3
  version: !ruby/object:Gem::Version
4
- version: 2.6.0
4
+ version: 2.7.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Shopify Engineering
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2024-02-12 00:00:00.000000000 Z
11
+ date: 2024-06-12 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: actionpack
@@ -52,6 +52,20 @@ dependencies:
52
52
  - - ">="
53
53
  - !ruby/object:Gem::Version
54
54
  version: '6.0'
55
+ - !ruby/object:Gem::Dependency
56
+ name: csv
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ type: :runtime
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - ">="
67
+ - !ruby/object:Gem::Version
68
+ version: '0'
55
69
  - !ruby/object:Gem::Dependency
56
70
  name: job-iteration
57
71
  requirement: !ruby/object:Gem::Requirement
@@ -171,7 +185,7 @@ homepage: https://github.com/Shopify/maintenance_tasks
171
185
  licenses:
172
186
  - MIT
173
187
  metadata:
174
- source_code_uri: https://github.com/Shopify/maintenance_tasks/tree/v2.6.0
188
+ source_code_uri: https://github.com/Shopify/maintenance_tasks/tree/v2.7.1
175
189
  allowed_push_host: https://rubygems.org
176
190
  post_install_message:
177
191
  rdoc_options: []
@@ -188,7 +202,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
188
202
  - !ruby/object:Gem::Version
189
203
  version: '0'
190
204
  requirements: []
191
- rubygems_version: 3.5.6
205
+ rubygems_version: 3.5.11
192
206
  signing_key:
193
207
  specification_version: 4
194
208
  summary: A Rails engine for queuing and managing maintenance tasks