batches_task_processor 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 0d65fcf4a0f70a2c325afe4ac8e61142c810d8fa884ef5eb6caac62c43f6461a
4
- data.tar.gz: 974ef55a01812679696cecc85fbf4d6cd3f9ba3d8849a258a3f5b63ecab9b3e2
3
+ metadata.gz: a9292ab75aea73468e3c48bf668388c3767e6e55aa36872adcd044014b20c7b1
4
+ data.tar.gz: 6955b56074ea63f120010518807a03a58f053b23f57b7dc3064e9c8705b84d14
5
5
  SHA512:
6
- metadata.gz: d1fb278e8d5ca7b07011e635ccef6de8b60f8acb3584b2bb877a7aef1985b0a6f0d2506fa80e0c920d69ae09aecda8a930caf9544f2ed98ba61c6f8cccb751b5
7
- data.tar.gz: 6f4b98fca6315f6c088d32c5d7df7c43c56d66a7304308c9049b76d6762472001fd2347f44f444baf059fbd55f8c439cac45aa484b655ba753b9853952626c37
6
+ metadata.gz: f88cecfa026896d758f24c260cc9b2f6a2516a83e80df80bd8862b44fe650ceedd07cb867fdbb6a723bf0235b79cdc612090c8b0cf361a5af3471af2572c4358
7
+ data.tar.gz: 7fc18b9188e50f2ee84c61a67bbd4b4e6769c39aa2658c0a0de8b92ff44542a3c2d1b2560468f2ae6e829d415cc03af387dd4b85f07d1199ac90bb3ae0a8dd88
data/README.md CHANGED
@@ -1,6 +1,5 @@
1
1
  # BatchesTaskProcessor
2
- Gem that allows to process huge amount of tasks in parallel using batches (Supports for array or activerecord collections).
3
- This gem depends on `Rails.cache` to save results of processing (In the future: use a database table instead).
2
+ Gem that allows to process huge amount of any kind of tasks in parallel using batches.
4
3
 
5
4
  ## Installation
6
5
  Add this line to your application's Gemfile:
@@ -11,42 +10,48 @@ gem "batches_task_processor"
11
10
  And then execute: `bundle install`
12
11
 
13
12
 
14
- ## Usage
15
- - Create an initializer file for your application:
16
- Sample Array:
13
+ ## Usage
14
+ - Register a new task:
17
15
  ```ruby
18
- # config/initializers/batches_task_processor.rb
19
- BatchesTaskProcessor::Config.configure do |config|
20
- config.per_page = 100
21
- config.calculate_items = -> { [1,2,3,5] }
22
- config.preload_job_items = -> { |items| Article.where(id: items) }
23
- config.process_item = -> { |item| MyService.new.process(item) }
24
- end
16
+ task = BatchesTaskProcessor::Model.create!(
17
+ key: 'my_process',
18
+ data: [1, 2, 3],
19
+ qty_jobs: 10,
20
+ process_item: 'puts "my item: #{item}"'
21
+ )
25
22
  ```
26
- Sample ActiveRecord Collection:
27
- ```ruby
28
- # config/initializers/batches_task_processor.rb
29
- BatchesTaskProcessor::Config.configure do |config|
30
- config.per_page = 100
31
- config.calculate_items = -> { Article.where(created_at: 10.days.ago..Time.current) }
32
- config.process_item = -> { |item| MyService.new.process(item) }
33
- end
34
- ```
23
+ Activerecord sample (recommended `preload_job_items` for performance reasons):
24
+ ```ruby
25
+ task = BatchesTaskProcessor::Model.create!(
26
+ key: 'my_process',
27
+ data: Article.all.pluck(:id),
28
+ qty_jobs: 10,
29
+ preload_job_items: 'Article.where(id: items)',
30
+ process_item: 'puts "my article: #{item.id}"'
31
+ )
32
+ ```
33
+
34
+ - Run the corresponding rake task:
35
+ Copy the `task.id` from step one and use it in the following code:
36
+ `RUNNER_MODEL_ID=<id-here> rake batches_task_processor:call`
37
+
38
+ ![Photo](./img.png)
39
+
40
+ ## TODO
41
+ - update tests
35
42
 
36
43
  ## Api
37
44
  Settings:
38
- - `config.calculate_items` (Mandatory) method called to calculate the whole list of items to process
39
- - `config.process_item(item)` (Mandatory) method called to process each item
40
- - `config.per_page` (Optional) number of items in one batch
41
- - `config.preload_job_items(items)` (Optional) Allows to preload associations or load objects list. Provides `items` which is a chunk of items to process.
42
- Tasks:
43
- - `rake batches_task_processor:call` Starts the processing of jobs.
44
- - `rake batches_task_processor:process_job` (Only for internal usage).
45
- - `rake batches_task_processor:retry` Retries the processing of all jobs (ignores already processed).
45
+ - `data` (Array<Integer|String>) Array of whole items to be processed.
46
+ - `key` (Mandatory) key to be used to identify the task.
47
+ - `qty_jobs` (Optional) number of jobs to be created. Default: `10`
48
+ - `process_item` (Mandatory) callback to be called to perform each item where `item` variable holds the current item value. Sample: `'Article.find(item).update_column(:title, "changed")'`
49
+ - `preload_job_items` (Optional) callback that allows to preload items list and/or associations where `items` variable holds the current chunk of items to be processed (by default returns the same list). Sample: `Article.where(id: items)`
50
+
51
+ Tasks (requires `RUNNER_MODEL_ID` env variable):
52
+ - `rake batches_task_processor:call` Starts the processing of jobs (Skips already processed ones when rerunning after cancel).
46
53
  - `rake batches_task_processor:status` Prints the process status.
47
- - `rake batches_task_processor:cancel` Marks as cancelled the process and stops processing jobs.
48
- - `rake batches_task_processor:clear` Removes all process logs or tmp data.
49
-
54
+ - `rake batches_task_processor:cancel` Marks as cancelled the process and stops processing jobs (Change into `pending` to rerun again).
50
55
 
51
56
  ## Contributing
52
57
  Contribution directions go here.
@@ -0,0 +1,34 @@
1
+ # frozen_string_literal: true
2
+
3
+ module BatchesTaskProcessor
4
+ class Model < ActiveRecord::Base
5
+ self.table_name = 'batches_task_processors'
6
+ has_many :items, class_name: 'BatchesTaskProcessor::ModelItem', dependent: :destroy, foreign_key: :batches_task_processors_id
7
+ validates :process_item, presence: true
8
+ validates :key, presence: true
9
+ before_create :apply_data_uniqueness
10
+ # state: :pending, :processing, :finished, :canceled
11
+
12
+ def qty_items_job
13
+ @qty_items_job ||= (data.count.to_f / qty_jobs).ceil
14
+ end
15
+
16
+ def finish!
17
+ update!(state: :finished, finished_at: Time.current)
18
+ end
19
+
20
+ def cancel!
21
+ update!(state: :canceled)
22
+ end
23
+
24
+ def all_processed?
25
+ items.count == data.count
26
+ end
27
+
28
+ private
29
+
30
+ def apply_data_uniqueness
31
+ self.data = data.uniq
32
+ end
33
+ end
34
+ end
@@ -0,0 +1,8 @@
1
+ # frozen_string_literal: true
2
+
3
+ module BatchesTaskProcessor
4
+ class ModelItem < ActiveRecord::Base
5
+ self.table_name = 'batches_task_processor_items'
6
+ belongs_to :parent, class_name: 'BatchesTaskProcessor::Model'
7
+ end
8
+ end
@@ -4,141 +4,95 @@ require 'active_support/all'
4
4
  module BatchesTaskProcessor
5
5
  class Processor
6
6
  RUNNER_JOB_KEY = 'RUNNER_JOB_KEY'
7
+ attr_reader :model_id
8
+
9
+ def initialize(model_id = nil)
10
+ @model_id = model_id || ENV['RUNNER_MODEL_ID']
11
+ end
7
12
 
8
13
  def call
9
- init_cache
10
14
  init_jobs
11
15
  end
12
16
 
13
17
  def process_job(job_no)
14
- run_job(job_no.to_i, calculate_items)
15
- end
16
-
17
- def retry
18
- init_jobs
18
+ run_job(job_no.to_i)
19
19
  end
20
20
 
21
21
  def status
22
- res = Rails.cache.read(RUNNER_JOB_KEY)
23
- res[:jobs] = res[:jobs].times.map { |i| job_registry(i)[:items].count }
24
- puts "Process status: #{res.inspect}"
22
+ log "Process status: #{process_model.items.count}/#{process_model.data.count}"
25
23
  end
26
24
 
27
25
  def cancel
28
- data = Rails.cache.read(RUNNER_JOB_KEY)
29
- data[:cancelled] = true
30
- Rails.cache.write(RUNNER_JOB_KEY, data)
31
- end
32
-
33
- def clear
34
- res = Rails.cache.read(RUNNER_JOB_KEY)
35
- res[:jobs].times.each { |i| job_registry(i, :delete) }
36
- Rails.cache.delete(RUNNER_JOB_KEY)
26
+ process_model.cancel!
37
27
  end
38
28
 
39
29
  private
40
30
 
41
- # ****** customizations
42
- # @example ['article_id1', 'article_id2', 'article_id3']
43
- # @example Article.where(created_at: 1.month_ago..Time.current)
44
- def calculate_items
45
- instance_exec(&BatchesTaskProcessor::Config.calculate_items)
46
- end
47
-
48
31
  # @example item.perform_my_action
49
32
  def process_item(item)
50
- instance_exec(item, &BatchesTaskProcessor::Config.process_item)
51
- end
52
-
53
- def per_page
54
- BatchesTaskProcessor::Config.per_page
33
+ instance_eval(process_model.process_item)
55
34
  end
56
35
 
57
36
  # @example Article.where(no: items)
58
37
  def preload_job_items(items)
59
- instance_exec(items, &BatchesTaskProcessor::Config.preload_job_items)
60
- end
61
- # ****** end customizations
62
-
63
- def init_cache
64
- items = calculate_items
65
- jobs = (items.count.to_f / per_page).ceil
66
- data = { jobs: jobs, count: items.count, date: Time.current, finished_jobs: [], cancelled: false }
67
- main_registry(data)
38
+ instance_eval(process_model.preload_job_items || 'items')
68
39
  end
69
40
 
70
41
  def init_jobs
71
- jobs = main_registry[:jobs]
42
+ jobs = process_model.qty_jobs
72
43
  log "Initializing #{jobs} jobs..."
73
44
  jobs.times.each do |index|
74
45
  log "Starting ##{index} job..."
75
- pid = Process.spawn("RUNNER_JOB_NO=#{index} rake batches_task_processor:process_job &")
46
+ env_vars = "RUNNER_JOB_NO=#{index} RUNNER_MODEL_ID=#{model_id}"
47
+ pid = Process.spawn("#{env_vars} rake batches_task_processor:process_job &")
76
48
  Process.detach(pid)
77
49
  end
78
50
  end
79
51
 
80
- def run_job(job, items)
52
+ def run_job(job)
81
53
  log "Running ##{job} job..."
82
- preload_job_items(job_items(items, job)).each_with_index do |item, index|
54
+ items = job_items(job)
55
+ (items.try(:find_each) || items.each).with_index do |item, index|
83
56
  key = item.try(:id) || item
84
57
  break log('Process cancelled') if process_cancelled?
85
- next log("Skipping #{key}...") if already_processed?(job, key)
58
+ next log("Skipping #{key}...") if already_processed?(key)
86
59
 
87
60
  start_process_item(item, job, key, index)
88
61
  end
89
62
 
90
- mark_finished_job(job)
91
63
  log "Finished #{job} job..."
64
+ process_model.finish! if process_model.all_processed?
92
65
  end
93
66
 
94
- def job_items(items, job)
95
- items.is_a?(Array) ? items.each_slice(per_page).to_a[job] : items.offset(job * per_page).limit(per_page)
67
+ def job_items(job)
68
+ res = process_model.data.each_slice(process_model.qty_items_job).to_a[job]
69
+ preload_job_items(res)
96
70
  end
97
71
 
98
72
  def start_process_item(item, job, key, index)
99
- log "Processing #{job}/#{key}: #{index}/#{per_page}"
100
- process_item(item)
101
- update_job_cache(job, key)
73
+ log "Processing #{job}/#{key}: #{index}/#{process_model.qty_items_job}"
74
+ result = process_item(item)
75
+ process_model.items.create!(key: key, result: result.to_s[0..255])
102
76
  rescue => e
103
- update_job_cache(job, key, e.message)
77
+ process_model.items.create!(key: key, error_details: e.message)
104
78
  log "Process failed #{job}/#{key}: #{e.message}"
105
79
  end
106
80
 
107
- def main_registry(new_data = nil)
108
- Rails.cache.write(RUNNER_JOB_KEY, new_data, expires_in: 1.week) if new_data
109
- new_data || Rails.cache.read(RUNNER_JOB_KEY)
110
- end
111
-
112
- def mark_finished_job(job)
113
- main_registry(main_registry.merge(finished_jobs: main_registry[:finished_jobs] + [job]))
114
- end
115
-
116
- def job_registry(job, new_data = nil)
117
- key = "#{RUNNER_JOB_KEY}/#{job}"
118
- default_data = { items: [], errors: [] }
119
- Rails.cache.write(key, default_data, expires_in: 1.week) unless Rails.cache.read(key)
120
- Rails.cache.write(key, new_data, expires_in: 1.week) if new_data
121
- Rails.cache.delete(key) if new_data == :delete
122
- new_data || Rails.cache.read(key)
123
- end
124
-
125
- def update_job_cache(job, value, error = nil)
126
- data = job_registry(job)
127
- data[:items] << value
128
- data[:errors] << [value, error] if error
129
- job_registry(job, data)
130
- end
131
-
132
- def already_processed?(job, value)
133
- job_registry(job)[:items].include?(value)
81
+ def already_processed?(key)
82
+ process_model.items.where(key: key).exists?
134
83
  end
135
84
 
136
85
  def process_cancelled?
137
- Rails.cache.read(RUNNER_JOB_KEY)[:cancelled]
86
+ process_model.state == 'cancelled'
138
87
  end
139
88
 
140
89
  def log(msg)
141
90
  puts "BatchesTaskProcessor => #{msg}"
142
91
  end
92
+
93
+ def process_model
94
+ klass = BatchesTaskProcessor::Model.all
95
+ model_id ? klass.find(model_id) : klass.last
96
+ end
143
97
  end
144
98
  end
@@ -6,5 +6,9 @@ module BatchesTaskProcessor
6
6
  rake_tasks do
7
7
  load 'tasks/batches_task_processor_tasks.rake'
8
8
  end
9
+ initializer :append_migrations do |app|
10
+ path = File.join(File.expand_path('../../', __FILE__), 'db/migrate')
11
+ app.config.paths["db/migrate"] << path
12
+ end
9
13
  end
10
14
  end
@@ -1,3 +1,3 @@
1
1
  module BatchesTaskProcessor
2
- VERSION = "0.1.0"
2
+ VERSION = "0.2.0"
3
3
  end
@@ -1,18 +1,10 @@
1
+ # frozen_string_literal: true
2
+
1
3
  require "batches_task_processor/version"
2
4
  require "batches_task_processor/railtie"
3
5
  require "batches_task_processor/processor"
4
-
6
+ require "batches_task_processor/model"
7
+ require "batches_task_processor/model_item"
5
8
 
6
9
  module BatchesTaskProcessor
7
- class Config
8
- cattr_accessor(:per_page) { 5000 }
9
- cattr_accessor(:calculate_items) { -> { raise('Implement calculate_items method') } }
10
- cattr_accessor(:process_item) { -> (_item) { raise('Implement calculate_items method') } }
11
- cattr_accessor(:preload_job_items) { -> (items) { items } }
12
-
13
-
14
- def self.configure
15
- yield self
16
- end
17
- end
18
10
  end
@@ -0,0 +1,24 @@
1
+ # frozen_string_literal: true
2
+
3
+ class AddBatchesTaskProcessor < ActiveRecord::Migration[5.0]
4
+ def change
5
+ create_table :batches_task_processors do |t|
6
+ t.string :key
7
+ t.string :state, default: :pending
8
+ t.json :data, default: []
9
+ t.integer :qty_jobs, default: 10
10
+ t.datetime :finished_at
11
+ t.text :preload_job_items
12
+ t.text :process_item, null: false
13
+ t.timestamps
14
+ end
15
+
16
+ create_table :batches_task_processor_items do |t|
17
+ t.belongs_to :batches_task_processors, foreign_key: true, index: { name: 'index_batches_task_processors_parent_id' }
18
+ t.string :key
19
+ t.text :result
20
+ t.text :error_details
21
+ t.timestamps
22
+ end
23
+ end
24
+ end
@@ -3,31 +3,22 @@
3
3
  namespace :batches_task_processor do
4
4
  desc 'Starts the Batches Task Processor'
5
5
  task call: :environment do
6
- BatchesTaskProcessor::Processor.new.call
6
+ BatchesTaskProcessor::Processor.new(ENV['RUNNER_MODEL_ID']).call
7
7
  end
8
8
 
9
9
  desc 'Starts the Batches Task Processor'
10
10
  task process_job: :environment do
11
- BatchesTaskProcessor::Processor.new.process_job(ENV['RUNNER_JOB_NO'])
11
+ BatchesTaskProcessor::Processor.new(ENV['RUNNER_MODEL_ID']).process_job(ENV['RUNNER_JOB_NO'])
12
12
  end
13
13
 
14
- desc 'Retries the Batches Task Processor'
15
- task retry: :environment do
16
- BatchesTaskProcessor::Processor.new.retry
17
- end
18
14
 
19
15
  desc 'Prints the status of the Task Processor'
20
16
  task status: :environment do
21
- BatchesTaskProcessor::Processor.new.status
17
+ BatchesTaskProcessor::Processor.new(ENV['RUNNER_MODEL_ID']).status
22
18
  end
23
19
 
24
20
  desc 'Cancels the Batches Task Processor'
25
21
  task cancel: :environment do
26
- BatchesTaskProcessor::Processor.new.cancel
27
- end
28
-
29
- desc 'Clears the Batches Task Processor cache'
30
- task clear: :environment do
31
- BatchesTaskProcessor::Processor.new.clear
22
+ BatchesTaskProcessor::Processor.new(ENV['RUNNER_MODEL_ID']).cancel
32
23
  end
33
24
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: batches_task_processor
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Owen Peredo
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2022-07-29 00:00:00.000000000 Z
11
+ date: 2022-07-31 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rails
@@ -34,9 +34,12 @@ files:
34
34
  - README.md
35
35
  - Rakefile
36
36
  - lib/batches_task_processor.rb
37
+ - lib/batches_task_processor/model.rb
38
+ - lib/batches_task_processor/model_item.rb
37
39
  - lib/batches_task_processor/processor.rb
38
40
  - lib/batches_task_processor/railtie.rb
39
41
  - lib/batches_task_processor/version.rb
42
+ - lib/db/migrate/20220727101904_add_batches_task_processor.rb
40
43
  - lib/tasks/batches_task_processor_tasks.rake
41
44
  homepage: https://github.com/owen2345/batches-task-processor
42
45
  licenses: []