rocketjob 0.7.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: d784e40e75e2aca697e1258fb6d823be8e03c6f1
4
+ data.tar.gz: 69008d9f1ae0396be4e1838f3d931299af226005
5
+ SHA512:
6
+ metadata.gz: 614602a9f849b27bfbd4e2fda0985da5ae798e4a95e9ccbfe84a49d620fc1fedd9423df5395e7a1acefc4f42f61008d467d136a2c122ef19038e1dfd4b7dd555
7
+ data.tar.gz: be35af2fafca63f647ebb485cfcdbcb76399556ac86693a48ec22d98cbfca512925b7c07ac9ca191a1fb7ce37ff0e2235124ddc7f79c7bf313647797e14e35f1
data/README.md ADDED
@@ -0,0 +1,160 @@
1
+ # rocketjob
2
+
3
+ High volume, priority based, background job processing solution for Ruby.
4
+
5
+ ## Status
6
+
7
+ Alpha - Feedback on the API is welcome. API will change.
8
+
9
+ Already in use in production internally processing large files with millions
10
+ of records, as well as large jobs to walk through large databases.
11
+
12
+ ## Why?
13
+
14
+ We have tried for years to make both `resque` and more recently `sidekiq`
15
+ work for large high performance batch processing.
16
+ Even `sidekiq-pro` was purchased and used in an attempt to process large batches.
17
+
18
+ Unfortunately, after all the pain and suffering with the existing asynchronous
19
+ worker solutions none of them have worked in our production environment without
20
+ significant hand-holding and constant support. Mysteriously the odd record/job
21
+ was disappearing when processing hundreds of millions of jobs, with no indication
22
+ of where those lost jobs went.
23
+
24
+ In our environment we cannot lose even a single job or record, as all data is
25
+ business critical. The existing batch processing solutions do not supply any way
26
+ to collect the output from batch processing and as a result every job has custom
27
+ code to collect its output. rocketjob has built-in support to collect the results
28
+ of any batch job.
29
+
30
+ High availability and high throughput were being limited by how much we could get
31
+ through `redis`. Being a single-threaded process it is constrained to a single
32
+ CPU. Putting `redis` on a large multi-core box does not help since it will not
33
+ use more than one CPU at a time.
34
+ Additionally, `redis` is constrained by the amount of physical memory available
35
+ on the server.
36
+ `redis` worked very well when processing was below around 100,000 jobs a day,
37
+ but when our workload suddenly increased to over 100,000,000 a day it could not keep
38
+ up. Its single CPU would often hit 100% CPU utilization when running many `sidekiq-pro`
39
+ servers. We also had to store actual job data in a separate MySQL database since
40
+ it would not fit in memory on the `redis` server.
41
+
42
+ `rocketjob` was created out of necessity due to constant support. End-users were
43
+ constantly contacting the development team to ask about the status of "hung" or
44
+ "incomplete" jobs, as part of our DevOps role.
45
+
46
+ Another significant production support challenge is trying to get `resque` or `sidekiq`
47
+ to process the batch jobs in a very specific order. Switching from queue-based
48
+ to priority-based job processing means that all jobs are processed in the order of
49
+ their priority and not what queues are defined on what servers and in what quantity.
50
+ This approach has allowed us to significantly increase the CPU and IO utilization
51
+ across all worker machines. The traditional queue based approach required constant
52
+ tweaking in the production environment to try and balance workload without overwhelming
53
+ any one server.
54
+
55
+ End-users are now able to modify the priority of their various jobs at runtime
56
+ so that they can get that business critical job out first, instead of having to
57
+ wait for other jobs of the same type/priority to finish first.
58
+
59
+ Since `rocketjob` uploads the entire file, or all data for processing, it does not
60
+ require jobs to store the data in other databases.
61
+ Additionally, `rocketjob` supports encryption and compression of any data uploaded
62
+ into Sliced Jobs to ensure PCI compliance and to prevent sensitive data from being exposed
63
+ either at rest in the data store, or in flight as it is being read or written to the
64
+ backend data store.
65
+ Often large files received for processing contain sensitive data that must not be exposed
66
+ in the backend job store. Having this capability built-in ensures all our jobs
67
+ are properly securing sensitive data.
68
+
69
+ Since moving to `rocketjob` our production support has diminished and now we can
70
+ focus on writing code again. :)
71
+
72
+ ## Introduction
73
+
74
+ `rocketjob` is a global "priority based queue" (https://en.wikipedia.org/wiki/Priority_queue).
75
+ All jobs are placed in a single global queue and the job with the highest priority
76
+ is processed first. Jobs with the same priority are processed on a first-in
77
+ first-out (FIFO) basis.
78
+
79
+ This differs from the traditional approach of separate queues for jobs which
80
+ quickly becomes cumbersome when there are, for example, over a hundred different
81
+ types of jobs.
82
+
83
+ The global priority based queue ensures that the servers are utilized to their
84
+ capacity without requiring constant manual intervention.
85
+
86
+ `rocketjob` is designed to handle hundreds of millions of concurrent jobs
87
+ that are often encountered in high volume batch processing environments.
88
+ It is designed from the ground up to support large batch file processing.
89
+ For example, a single file that contains millions of records can be processed
90
+ as quickly as possible without impacting other jobs with a higher priority.
91
+
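Note on the Introduction above: the ordering rule it describes (highest priority first, FIFO among equal priorities) can be illustrated with a minimal plain-Ruby sketch. This is not how rocketjob itself selects jobs, since the real jobs live in MongoDB. `QueuedJob` and `next_job` are hypothetical names, and the sketch assumes that a lower number means a higher priority, which the README does not state.

```ruby
# Hypothetical stand-in for a queued job; not part of rocketjob.
QueuedJob = Struct.new(:name, :priority, :created_at)

# Pick the next job: highest priority first (assumed here to be the lowest
# number), and the oldest job first among equal priorities (FIFO).
def next_job(queue)
  queue.min_by { |job| [job.priority, job.created_at] }
end

queue = [
  QueuedJob.new('nightly_report', 50, 1),
  QueuedJob.new('urgent_import',  10, 3),
  QueuedJob.new('older_urgent',   10, 2)
]

# Drain the single global queue in processing order
order = []
until queue.empty?
  job = next_job(queue)
  queue.delete(job)
  order << job.name
end

p order # => ["older_urgent", "urgent_import", "nightly_report"]
```

Because every job competes in the same queue, changing one job's priority is enough to move it ahead of the others; no per-queue worker allocation is involved.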
92
+ ## Management
93
+
94
+ The companion project [rocketjob mission control](https://github.com/lambcr/rocket_job_mission_control)
95
+ contains the Rails Engine that can be loaded into your Rails project to add
96
+ a web interface for viewing and managing `rocketjob` jobs.
97
+
98
+ `rocketjob mission control` can also be run stand-alone in a shell Rails application.
99
+
100
+ Separating `rocketjob mission control` into a separate gem means it does not
101
+ have to be loaded where `rocketjob` jobs are defined or run.
102
+
103
+ ## Jobs
104
+
105
+ Simple single task jobs:
106
+
107
+ Example job to run in a separate worker process
108
+
109
+ ```ruby
110
+ class MyJob < RocketJob::Job
111
+ # Method to call asynchronously by the worker
112
+ def perform(email_address, message)
113
+ # For example send an email to the supplied address with the supplied message
114
+ send_email(email_address, message)
115
+ end
116
+ end
117
+ ```
118
+
119
+ To queue the above job for processing:
120
+
121
+ ```ruby
122
+ MyJob.perform_later('jack@blah.com', 'lets meet')
123
+ ```
124
+
125
+ ## Configuration
126
+
127
+ MongoMapper will already configure itself in Rails environments. Sometimes we want
128
+ to use a different Mongo Database instance for the records and results.
129
+
130
+ For example, the RocketJob::Job can be stored in a Mongo Database that is replicated
131
+ across data centers, whereas we may not want to replicate record and result data
132
+ due to its sheer volume.
133
+
134
+ ```ruby
135
+ config.before_initialize do
136
+ # If this environment has a separate Work server
137
+ # Share the common mongo configuration file
138
+ config_file = root.join('config', 'mongo.yml')
139
+ if config_file.file?
140
+ if config = YAML.load(ERB.new(config_file.read).result)["#{Rails.env}_work"]
141
+ options = (config['options']||{}).symbolize_keys
142
+ # In the development environment the Mongo driver generates a lot of
143
+ # network trace log data, move its debug logging to :trace
144
+ options[:logger] = SemanticLogger::DebugAsTraceLogger.new('Mongo:Work')
145
+ RocketJob::Config.mongo_work_connection = Mongo::MongoClient.from_uri(config['uri'], options)
146
+
147
+ # It is also possible to store the jobs themselves in a separate MongoDB database
148
+ # RocketJob::Config.mongo_connection = Mongo::MongoClient.from_uri(config['uri'], options)
149
+ end
150
+ else
151
+ puts "\nmongo.yml config file not found: #{config_file}"
152
+ end
153
+ end
154
+ ```
155
+
156
+ ## Requirements
157
+
158
+ MongoDB V2.6 or greater. V3 is recommended.
159
+
160
+ * V2.6 includes a feature to allow lookups using the `$or` clause to use an index
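Note on the README's Configuration section: the initializer there renders `mongo.yml` through ERB, parses it as YAML, and picks the `"<environment>_work"` entry. The sketch below reproduces only that lookup step with the Ruby standard library. The YAML contents, the `SKETCH_MONGO_HOST` variable, and the `pool_size` option are made up for illustration, and `env` stands in for `Rails.env`. Keys are symbolized by hand because `symbolize_keys` comes from ActiveSupport.

```ruby
require 'yaml'
require 'erb'

# Hypothetical mongo.yml contents; only the "<env>_work" key name follows
# the README example, the values are invented for this sketch.
MONGO_YML = <<~YAML
  development_work:
    uri: mongodb://<%= ENV.fetch('SKETCH_MONGO_HOST', 'localhost') %>:27017/work
    options:
      pool_size: 5
YAML

env = 'development' # stands in for Rails.env

# Render the ERB first, then parse the YAML and select the work section
config = YAML.load(ERB.new(MONGO_YML).result)["#{env}_work"]

# Plain-Ruby equivalent of ActiveSupport's Hash#symbolize_keys
options = (config['options'] || {}).each_with_object({}) do |(key, value), hash|
  hash[key.to_sym] = value
end

p config['uri']
p options
```

If the `"<environment>_work"` entry is missing, `config` is `nil`, which is why the README example wraps the lookup in an `if` before using it.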
data/Rakefile ADDED
@@ -0,0 +1,28 @@
1
+ require 'rake/clean'
2
+ require 'rake/testtask'
3
+
4
+ $LOAD_PATH.unshift File.expand_path("../lib", __FILE__)
5
+ require 'rocket_job/version'
6
+
7
+ task :gem do
8
+ system "gem build rocketjob.gemspec"
9
+ end
10
+
11
+ task :publish => :gem do
12
+ system "git tag -a v#{RocketJob::VERSION} -m 'Tagging #{RocketJob::VERSION}'"
13
+ system "git push --tags"
14
+ system "gem push rocketjob-#{RocketJob::VERSION}.gem"
15
+ system "rm rocketjob-#{RocketJob::VERSION}.gem"
16
+ end
17
+
18
+ desc "Run Test Suite"
19
+ task :test do
20
+ Rake::TestTask.new(:functional) do |t|
21
+ t.test_files = FileList['test/**/*_test.rb']
22
+ t.verbose = true
23
+ end
24
+
25
+ Rake::Task['functional'].invoke
26
+ end
27
+
28
+ task :default => :test
data/bin/rocketjob ADDED
@@ -0,0 +1,13 @@
1
+ #!/usr/bin/env ruby
2
+ require 'rocketjob'
3
+
4
+ # Start a rocketjob server instance from the command line
5
+ begin
6
+ RocketJob::CLI.new(ARGV).run
7
+ rescue => exc
8
+ # Failsafe logger that writes to STDERR
9
+ SemanticLogger.add_appender(STDERR, :error, &SemanticLogger::Appender::Base.colorized_formatter)
10
+ SemanticLogger['RocketJob'].error('Rocket Job shutting down due to exception', exc)
11
+ SemanticLogger.flush
12
+ exit 1
13
+ end
@@ -0,0 +1,76 @@
1
+ require 'optparse'
2
+ module RocketJob
3
+ # Command Line Interface parser for RocketJob
4
+ class CLI
5
+ attr_reader :name, :threads, :environment, :pidfile, :directory, :quiet
6
+
7
+ def initialize(argv)
8
+ @name = nil
9
+ @threads = nil
10
+
11
+ @quiet = false
12
+ @environment = ENV['RAILS_ENV'] || ENV['RACK_ENV'] || 'development'
13
+ @pidfile = nil
14
+ @directory = '.'
15
+ parse(argv)
16
+ end
17
+
18
+ # Run a RocketJob::Server from the command line
19
+ def run
20
+ SemanticLogger.add_appender(STDOUT, &SemanticLogger::Appender::Base.colorized_formatter) unless quiet
21
+ boot_rails
22
+ write_pidfile
23
+
24
+ opts = {}
25
+ opts[:name] = name if name
26
+ opts[:max_threads] = threads if threads
27
+ Server.run(opts)
28
+ end
29
+
30
+ # Initialize the Rails environment
31
+ def boot_rails
32
+ require File.expand_path("#{directory}/config/environment.rb")
33
+ if Rails.configuration.eager_load
34
+ RocketJob::Server.logger.benchmark_info('Eager loaded Rails and all Engines') do
35
+ Rails.application.eager_load!
36
+ Rails::Engine.subclasses.each { |engine| engine.eager_load! }
37
+ end
38
+ end
39
+ end
40
+
41
+ # Create a PID file if requested
42
+ def write_pidfile
43
+ return unless pidfile
44
+ pid = $$
45
+ File.open(pidfile, 'w') { |f| f.puts(pid) }
46
+
47
+ # Remove pidfile on exit
48
+ at_exit do
49
+ File.delete(pidfile) if pid == $$
50
+ end
51
+ end
52
+
53
+ # Parse command line options placing results in the corresponding instance variables
54
+ def parse(argv)
55
+ parser = OptionParser.new do |o|
56
+ o.on('-n', '--name NAME', 'Unique Name of this server instance (Default: hostname:PID)') { |arg| @name = arg }
57
+ o.on('-t', '--threads COUNT', 'Number of worker threads to start') { |arg| @threads = arg.to_i }
58
+ o.on('-q', '--quiet', 'Do not write to stdout, only to logfile. Necessary when running as a daemon') { @quiet = true }
59
+ o.on('-d', '--dir DIR', 'Directory containing Rails app, if not current directory') { |arg| @directory = arg }
60
+ o.on('-e', '--environment ENVIRONMENT', 'The environment to run the app on (Default: RAILS_ENV || RACK_ENV || development)') { |arg| @environment = arg }
61
+ o.on('--pidfile PATH', 'Use PATH as a pidfile') { |arg| @pidfile = arg }
62
+ o.on('-v', '--version', 'Print the version information') do
63
+ puts "Rocket Job v#{RocketJob::VERSION}"
64
+ exit 1
65
+ end
66
+ end
67
+ parser.banner = 'rocketjob <options>'
68
+ parser.on_tail '-h', '--help', 'Show help' do
69
+ puts parser
70
+ exit 1
71
+ end
72
+ parser.parse! argv
73
+ end
74
+
75
+ end
76
+ end
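Note on the CLI class above: its `parse` method maps each flag to an instance variable through `OptionParser` blocks. A minimal standalone sketch of the same idea follows, using only the standard library. `parse_options` is a hypothetical helper that mirrors a subset of the flags and returns a Hash rather than setting instance variables; it is not part of the gem.

```ruby
require 'optparse'

# Hypothetical helper mirroring a subset of the flags defined in the CLI class.
def parse_options(argv)
  options = { quiet: false, directory: '.' }
  parser = OptionParser.new do |o|
    o.on('-n', '--name NAME', 'Unique name of this server instance') { |arg| options[:name] = arg }
    o.on('-t', '--threads COUNT', 'Number of worker threads')        { |arg| options[:threads] = arg.to_i }
    o.on('-q', '--quiet', 'Do not write to stdout')                  { options[:quiet] = true }
    o.on('-d', '--dir DIR', 'Directory containing the Rails app')    { |arg| options[:directory] = arg }
  end
  # Parse a copy so the caller's array is left untouched
  parser.parse!(argv.dup)
  options
end

options = parse_options(%w[--name worker1 -t 5 -q])
p options
```

As in the class above, options that are not supplied keep their defaults, so only `name` and `threads` need a presence check before being passed on to the server.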
@@ -0,0 +1,157 @@
1
+ # encoding: UTF-8
2
+
3
+ # Worker behavior for a job
4
+ module RocketJob
5
+ module Concerns
6
+ module Worker
7
+ def self.included(base)
8
+ base.extend ClassMethods
9
+ base.class_eval do
10
+ # While working on a slice, the current slice is available via this reader
11
+ attr_reader :rocket_job_slice
12
+
13
+ @rocket_job_defaults = nil
14
+ end
15
+ end
16
+
17
+ module ClassMethods
18
+ # Returns [Job] after queue-ing it for processing
19
+ def later(method, *args, &block)
20
+ if RocketJob::Config.inline_mode
21
+ now(method, *args, &block)
22
+ else
23
+ job = build(method, *args, &block)
24
+ job.save!
25
+ job
26
+ end
27
+ end
28
+
29
+ # Create a job and process it immediately in-line by this thread
30
+ def now(method, *args, &block)
31
+ job = build(method, *args, &block)
32
+ server = Server.new(name: 'inline')
33
+ server.started
34
+ job.start
35
+ while job.running? && !job.work(server)
36
+ end
37
+ job
38
+ end
39
+
40
+ # Build a Rocket Job instance
41
+ #
42
+ # Note:
43
+ # - #save! must be called on the return job instance if it needs to be
44
+ # queued for processing.
45
+ # - If data is uploaded into the job instance before saving, and is then
46
+ # discarded, call #cleanup! to clear out any partially uploaded data
47
+ def build(method, *args, &block)
48
+ job = new(arguments: args, perform_method: method.to_sym)
49
+ @rocket_job_defaults.call(job) if @rocket_job_defaults
50
+ block.call(job) if block
51
+ job
52
+ end
53
+
54
+ # Method to be performed later
55
+ def perform_later(*args, &block)
56
+ later(:perform, *args, &block)
57
+ end
58
+
59
+ # Build a job for the perform method without saving (queueing) it
60
+ def perform_build(*args, &block)
61
+ build(:perform, *args, &block)
62
+ end
63
+
64
+ # Method to be performed now
65
+ def perform_now(*args, &block)
66
+ now(:perform, *args, &block)
67
+ end
68
+
69
+ # Define job defaults
70
+ def rocket_job(&block)
71
+ @rocket_job_defaults = block
72
+ self
73
+ end
74
+ end
75
+
76
+ def rocket_job_csv_parser
77
+ # TODO Change into an instance variable once CSV handling has been re-worked
78
+ RocketJob::Utility::CSVRow.new
79
+ end
80
+
81
+ # Works on this job
82
+ #
83
+ # Returns [true|false] whether this job should be excluded from the next lookup
84
+ #
85
+ # If an exception is thrown the job is marked as failed and the exception
86
+ # is set in the job itself.
87
+ #
88
+ # Thread-safe, can be called by multiple threads at the same time
89
+ def work(server)
90
+ raise 'Job must be started before calling #work' unless running?
91
+ begin
92
+ # before_perform
93
+ call_method(perform_method, arguments, event: :before, log_level: log_level)
94
+
95
+ # perform
96
+ call_method(perform_method, arguments, log_level: log_level)
97
+ if self.collect_output?
98
+ self.output = (result.is_a?(Hash) || result.is_a?(BSON::OrderedHash)) ? result : { result: result }
99
+ end
100
+
101
+ # after_perform
102
+ call_method(perform_method, arguments, event: :after, log_level: log_level)
103
+ complete!
104
+ rescue Exception => exc
105
+ set_exception(server.name, exc)
106
+ raise exc if RocketJob::Config.inline_mode
107
+ end
108
+ false
109
+ end
110
+
111
+ protected
112
+
113
+ # Calls a method on this job, if it is defined
114
+ # Adds the event name to the method call if supplied
115
+ #
116
+ # Returns [Object] the result of calling the method
117
+ #
118
+ # Parameters
119
+ # method [Symbol]
120
+ # The method to call on this job
121
+ #
122
+ # arguments [Array]
123
+ # Arguments to pass to the method call
124
+ #
125
+ # Options:
126
+ # event: [Symbol]
127
+ # Any one of: :before, :after
128
+ # Default: None, just calls the method itself
129
+ #
130
+ # log_level: [Symbol]
131
+ # Log level to apply to silence logging during the call
132
+ # Default: nil ( no change )
133
+ #
134
+ def call_method(method, arguments, options={})
135
+ options = options.dup
136
+ event = options.delete(:event)
137
+ log_level = options.delete(:log_level)
138
+ raise(ArgumentError, "Unknown #{self.class.name}#call_method options: #{options.inspect}") if options.size > 0
139
+
140
+ the_method = event.nil? ? method : "#{event}_#{method}".to_sym
141
+ if respond_to?(the_method)
142
+ method_name = "#{self.class.name}##{the_method}"
143
+ logger.info "Start #{method_name}"
144
+ logger.benchmark_info("Completed #{method_name}",
145
+ metric: "rocketjob/#{self.class.name.underscore}/#{the_method}",
146
+ log_exception: :full,
147
+ on_exception_level: :error,
148
+ silence: log_level
149
+ ) do
150
+ self.send(the_method, *arguments)
151
+ end
152
+ end
153
+ end
154
+
155
+ end
156
+ end
157
+ end
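Note on the Worker concern above: two mechanisms carry most of its behavior. A class-level defaults block (`rocket_job`) is applied to every instance that `build` creates, and `call_method` runs `before_<method>`, `<method>` and `after_<method>` only when the job responds to them. The sketch below imitates both in plain Ruby, with no MongoDB, logging, state machine, or error handling. `SketchJob`, `EmailJob`, `defaults` and `events` are hypothetical names chosen for illustration.

```ruby
# Hypothetical, simplified stand-in for RocketJob::Job.
class SketchJob
  attr_accessor :priority, :arguments, :perform_method
  attr_reader :events

  # Store a class-level defaults block, similar in spirit to `rocket_job`
  def self.defaults(&block)
    @defaults = block
    self
  end

  # Build an instance, apply the class defaults, then any per-call block
  def self.build(method, *args)
    job = new
    job.perform_method = method.to_sym
    job.arguments = args
    @defaults.call(job) if @defaults
    yield(job) if block_given?
    job
  end

  def initialize
    @events = []
  end

  # Run before_<method>, <method> and after_<method>, skipping undefined ones
  def work
    [:before, nil, :after].each do |event|
      name = event ? :"#{event}_#{perform_method}" : perform_method
      send(name, *arguments) if respond_to?(name)
    end
  end
end

class EmailJob < SketchJob
  defaults { |job| job.priority = 25 }

  def before_perform(address, _message)
    events << "validating #{address}"
  end

  def perform(address, message)
    events << "sending '#{message}' to #{address}"
  end
end

job = EmailJob.build(:perform, 'jack@blah.com', 'lets meet')
job.work
p job.priority
p job.events
```

`EmailJob` defines no `after_perform`, so that step is skipped silently, which matches the `respond_to?` guard in `call_method`.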