rocketjob 0.7.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: d784e40e75e2aca697e1258fb6d823be8e03c6f1
4
+ data.tar.gz: 69008d9f1ae0396be4e1838f3d931299af226005
5
+ SHA512:
6
+ metadata.gz: 614602a9f849b27bfbd4e2fda0985da5ae798e4a95e9ccbfe84a49d620fc1fedd9423df5395e7a1acefc4f42f61008d467d136a2c122ef19038e1dfd4b7dd555
7
+ data.tar.gz: be35af2fafca63f647ebb485cfcdbcb76399556ac86693a48ec22d98cbfca512925b7c07ac9ca191a1fb7ce37ff0e2235124ddc7f79c7bf313647797e14e35f1
data/README.md ADDED
@@ -0,0 +1,160 @@
1
+ # rocketjob
2
+
3
+ High volume, priority based, background job processing solution for Ruby.
4
+
5
+ ## Status
6
+
7
+ Alpha - Feedback on the API is welcome. API will change.
8
+
9
+ Already in use in production internally processing large files with millions
10
+ of records, as well as large jobs to walk through large databases.
11
+
12
+ ## Why?
13
+
14
+ We have tried for years to make both `resque` and more recently `sidekiq`
15
+ work for large high performance batch processing.
16
+ Even `sidekiq-pro` was purchased and used in an attempt to process large batches.
17
+
18
+ Unfortunately, after all the pain and suffering with the existing asynchronous
19
+ worker solutions none of them have worked in our production environment without
20
+ significant hand-holding and constant support. Mysteriously the odd record/job
21
+ was disappearing when processing hundreds of millions of jobs, with no indication of
22
+ where those lost jobs went.
23
+
24
+ In our environment we cannot lose even a single job or record, as all data is
25
+ business critical. The existing batch processing solutions do not supply any way
26
+ to collect the output from batch processing and as a result every job has custom
27
+ code to collect its output. rocketjob has built-in support to collect the results
28
+ of any batch job.
29
+
30
+ High availability and high throughput were being limited by how much we could get
31
+ through `redis`. Being a single-threaded process it is constrained to a single
32
+ CPU. Putting `redis` on a large multi-core box does not help since it will not
33
+ use more than one CPU at a time.
34
+ Additionally, `redis` is constrained to the amount of physical memory that is available
35
+ on the server.
36
+ `redis` worked very well when processing was below around 100,000 jobs a day,
37
+ but when our workload suddenly increased to over 100,000,000 a day it could not keep
38
+ up. Its single CPU would often hit 100% CPU utilization when running many `sidekiq-pro`
39
+ servers. We also had to store actual job data in a separate MySQL database since
40
+ it would not fit in memory on the `redis` server.
41
+
42
+ `rocketjob` was created out of necessity due to the constant production support burden.
43
+ As part of our DevOps role, end-users were constantly contacting the development team
44
+ to ask about the status of "hung" or "incomplete" jobs.
45
+
46
+ Another significant production support challenge is trying to get `resque` or `sidekiq`
47
+ to process the batch jobs in a very specific order. Switching from queue-based
48
+ to priority-based job processing means that all jobs are processed in the order of
49
+ their priority and not what queues are defined on what servers and in what quantity.
50
+ This approach has allowed us to significantly increase the CPU and IO utilization
51
+ across all worker machines. The traditional queue based approach required constant
52
+ tweaking in the production environment to try and balance workload without overwhelming
53
+ any one server.
54
+
55
+ End-users are now able to modify the priority of their various jobs at runtime
56
+ so that they can get that business critical job out first, instead of having to
57
+ wait for other jobs of the same type/priority to finish first.
58
+
59
+ Since `rocketjob` uploads the entire file, or all data for processing, it does not
60
+ require jobs to store the data in other databases.
61
+ Additionally, `rocketjob` supports encryption and compression of any data uploaded
62
+ into Sliced Jobs to ensure PCI compliance and to prevent sensitive data from being exposed
63
+ either at rest in the data store, or in flight as it is being read or written to the
64
+ backend data store.
65
+ Often large files received for processing contain sensitive data that must not be exposed
66
+ in the backend job store. Having this capability built-in ensures all our jobs
67
+ are properly securing sensitive data.
68
+
69
+ Since moving to `rocketjob` our production support has diminished and now we can
70
+ focus on writing code again. :)
71
+
72
+ ## Introduction
73
+
74
+ `rocketjob` is a global [priority based queue](https://en.wikipedia.org/wiki/Priority_queue).
75
+ All jobs are placed in a single global queue and the job with the highest priority
76
+ is processed first. Jobs with the same priority are processed on a first-in
77
+ first-out (FIFO) basis.
78
+
79
+ This differs from the traditional approach of separate queues for jobs which
80
+ quickly becomes cumbersome when there are for example over a hundred different
81
+ types of jobs.
82
+
83
+ The global priority based queue ensures that the servers are utilized to their
84
+ capacity without requiring constant manual intervention.
85
+
86
+ `rocketjob` is designed to handle hundreds of millions of concurrent jobs
87
+ that are often encountered in high volume batch processing environments.
88
+ It is designed from the ground up to support large batch file processing.
89
+ For example a single file that contains millions of records to be processed
90
+ as quickly as possible without impacting other jobs with a higher priority.
91
+
92
+ ## Management
93
+
94
+ The companion project [rocketjob mission control](https://github.com/lambcr/rocket_job_mission_control)
95
+ contains the Rails Engine that can be loaded into your Rails project to add
96
+ a web interface for viewing and managing `rocketjob` jobs.
97
+
98
+ `rocketjob mission control` can also be run stand-alone in a shell Rails application.
99
+
100
+ Separating `rocketjob mission control` into a separate gem means it does not
101
+ have to be loaded where `rocketjob` jobs are defined or run.
102
+
103
+ ## Jobs
104
+
105
+ Simple single task jobs:
106
+
107
+ Example job to run in a separate worker process
108
+
109
+ ```ruby
110
+ class MyJob < RocketJob::Job
111
+ # Method to call asynchronously by the worker
112
+ def perform(email_address, message)
113
+ # For example send an email to the supplied address with the supplied message
114
+ send_email(email_address, message)
115
+ end
116
+ end
117
+ ```
118
+
119
+ To queue the above job for processing:
120
+
121
+ ```ruby
122
+ MyJob.perform_later('jack@blah.com', 'lets meet')
123
+ ```
124
+
125
+ ## Configuration
126
+
127
+ MongoMapper will already configure itself in Rails environments. Sometimes we want
128
+ to use a different Mongo Database instance for the records and results.
129
+
130
+ For example, the RocketJob::Job can be stored in a Mongo Database that is replicated
131
+ across data centers, whereas we may not want to replicate record and result data
132
+ due to its sheer volume.
133
+
134
+ ```ruby
135
+ config.before_initialize do
136
+ # If this environment has a separate Work server
137
+ # Share the common mongo configuration file
138
+ config_file = root.join('config', 'mongo.yml')
139
+ if config_file.file?
140
+ if config = YAML.load(ERB.new(config_file.read).result)["#{Rails.env}_work"]
141
+ options = (config['options']||{}).symbolize_keys
142
+ # In the development environment the Mongo driver generates a lot of
143
+ # network trace log data, move its debug logging to :trace
144
+ options[:logger] = SemanticLogger::DebugAsTraceLogger.new('Mongo:Work')
145
+ RocketJob::Config.mongo_work_connection = Mongo::MongoClient.from_uri(config['uri'], options)
146
+
147
+ # It is also possible to store the jobs themselves in a separate MongoDB database
148
+ # RocketJob::Config.mongo_connection = Mongo::MongoClient.from_uri(config['uri'], options)
149
+ end
150
+ else
151
+ puts "\nmongo.yml config file not found: #{config_file}"
152
+ end
153
+ end
154
+ ```
155
+
156
+ ## Requirements
157
+
158
+ MongoDB V2.6 or greater. V3 is recommended.
159
+
160
+ * V2.6 includes a feature to allow lookups using the `$or` clause to use an index
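
The ordering rule described in the README's Introduction (highest priority first, first-in first-out within the same priority) can be illustrated with a small in-memory sketch. This is not how rocketjob stores or selects jobs. `SketchPriorityQueue` and the job names are hypothetical, and the convention that a lower number means a higher priority is an assumption made only for this example.

```ruby
# Illustrative only: an in-memory stand-in for a global priority queue.
# Assumption: a lower number means a higher priority.
class SketchPriorityQueue
  Entry = Struct.new(:priority, :sequence, :name)

  def initialize
    @entries  = []
    @sequence = 0
  end

  # Add a job name with its priority; the sequence number records arrival order
  def push(name, priority)
    @sequence += 1
    @entries << Entry.new(priority, @sequence, name)
    self
  end

  # Remove and return the next job name: best priority first, then oldest first
  def pop
    entry = @entries.min_by { |e| [e.priority, e.sequence] }
    return nil unless entry
    @entries.delete(entry)
    entry.name
  end
end

queue = SketchPriorityQueue.new
queue.push('nightly_report', 50)
queue.push('urgent_invoice', 10)
queue.push('weekly_report', 50)

order = []
while (name = queue.pop)
  order << name
end
# order is now ['urgent_invoice', 'nightly_report', 'weekly_report']
```

Sorting on the pair of priority and arrival sequence is what gives the first-in first-out behavior among jobs that share a priority.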
data/Rakefile ADDED
@@ -0,0 +1,28 @@
1
+ require 'rake/clean'
2
+ require 'rake/testtask'
3
+
4
+ $LOAD_PATH.unshift File.expand_path("../lib", __FILE__)
5
+ require 'rocket_job/version'
6
+
7
+ task :gem do
8
+ system "gem build rocketjob.gemspec"
9
+ end
10
+
11
+ task :publish => :gem do
12
+ system "git tag -a v#{RocketJob::VERSION} -m 'Tagging #{RocketJob::VERSION}'"
13
+ system "git push --tags"
14
+ system "gem push rocketjob-#{RocketJob::VERSION}.gem"
15
+ system "rm rocketjob-#{RocketJob::VERSION}.gem"
16
+ end
17
+
18
+ desc "Run Test Suite"
19
+ task :test do
20
+ Rake::TestTask.new(:functional) do |t|
21
+ t.test_files = FileList['test/**/*_test.rb']
22
+ t.verbose = true
23
+ end
24
+
25
+ Rake::Task['functional'].invoke
26
+ end
27
+
28
+ task :default => :test
data/bin/rocketjob ADDED
@@ -0,0 +1,13 @@
1
+ #!/usr/bin/env ruby
2
+ require 'rocketjob'
3
+
4
+ # Start a rocketjob server instance from the command line
5
+ begin
6
+ RocketJob::CLI.new(ARGV).run
7
+ rescue => exc
8
+ # Failsafe logger that writes to STDERR
9
+ SemanticLogger.add_appender(STDERR, :error, &SemanticLogger::Appender::Base.colorized_formatter)
10
+ SemanticLogger['RocketJob'].error('Rocket Job shutting down due to exception', exc)
11
+ SemanticLogger.flush
12
+ exit 1
13
+ end
@@ -0,0 +1,76 @@
1
+ require 'optparse'
2
+ module RocketJob
3
+ # Command Line Interface parser for RocketJob
4
+ class CLI
5
+ attr_reader :name, :threads, :environment, :pidfile, :directory, :quiet
6
+
7
+ def initialize(argv)
8
+ @name = nil
9
+ @threads = nil
10
+
11
+ @quiet = false
12
+ @environment = ENV['RAILS_ENV'] || ENV['RACK_ENV'] || 'development'
13
+ @pidfile = nil
14
+ @directory = '.'
15
+ parse(argv)
16
+ end
17
+
18
+ # Run a RocketJob::Server from the command line
19
+ def run
20
+ SemanticLogger.add_appender(STDOUT, &SemanticLogger::Appender::Base.colorized_formatter) unless quiet
21
+ boot_rails
22
+ write_pidfile
23
+
24
+ opts = {}
25
+ opts[:name] = name if name
26
+ opts[:max_threads] = threads if threads
27
+ Server.run(opts)
28
+ end
29
+
30
+ # Initialize the Rails environment
31
+ def boot_rails
32
+ require File.expand_path("#{directory}/config/environment.rb")
33
+ if Rails.configuration.eager_load
34
+ RocketJob::Server.logger.benchmark_info('Eager loaded Rails and all Engines') do
35
+ Rails.application.eager_load!
36
+ Rails::Engine.subclasses.each { |engine| engine.eager_load! }
37
+ end
38
+ end
39
+ end
40
+
41
+ # Create a PID file if requested
42
+ def write_pidfile
43
+ return unless pidfile
44
+ pid = $$
45
+ File.open(pidfile, 'w') { |f| f.puts(pid) }
46
+
47
+ # Remove pidfile on exit
48
+ at_exit do
49
+ File.delete(pidfile) if pid == $$
50
+ end
51
+ end
52
+
53
+ # Parse command line options placing results in the corresponding instance variables
54
+ def parse(argv)
55
+ parser = OptionParser.new do |o|
56
+ o.on('-n', '--name NAME', 'Unique Name of this server instance (Default: hostname:PID)') { |arg| @name = arg }
57
+ o.on('-t', '--threads COUNT', 'Number of worker threads to start') { |arg| @threads = arg.to_i }
58
+ o.on('-q', '--quiet', 'Do not write to stdout, only to logfile. Necessary when running as a daemon') { @quiet = true }
59
+ o.on('-d', '--dir DIR', 'Directory containing Rails app, if not current directory') { |arg| @directory = arg }
60
+ o.on('-e', '--environment ENVIRONMENT', 'The environment to run the app on (Default: RAILS_ENV || RACK_ENV || development)') { |arg| @environment = arg }
61
+ o.on('--pidfile PATH', 'Use PATH as a pidfile') { |arg| @pidfile = arg }
62
+ o.on('-v', '--version', 'Print the version information') do
63
+ puts "Rocket Job v#{RocketJob::VERSION}"
64
+ exit 0
65
+ end
66
+ end
67
+ parser.banner = 'rocketjob <options>'
68
+ parser.on_tail '-h', '--help', 'Show help' do
69
+ puts parser
70
+ exit 0
71
+ end
72
+ parser.parse! argv
73
+ end
74
+
75
+ end
76
+ end
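
The option handling in the CLI class above can be tried in isolation, without Rails or rocketjob loaded. The following is a minimal sketch using only Ruby's standard `optparse` library. `SketchOptions` and `parse_sketch_options` are hypothetical names introduced for this example, and only a subset of the flags is reproduced.

```ruby
require 'optparse'

# Hypothetical stand-alone container; not part of rocketjob.
SketchOptions = Struct.new(:name, :threads, :quiet, :directory)

def parse_sketch_options(argv)
  options = SketchOptions.new(nil, nil, false, '.')
  parser = OptionParser.new do |o|
    o.banner = 'sketch <options>'
    o.on('-n', '--name NAME', 'Unique name of this server instance') { |arg| options.name = arg }
    # Passing Integer asks OptionParser to convert and validate the argument
    o.on('-t', '--threads COUNT', Integer, 'Number of worker threads to start') { |arg| options.threads = arg }
    o.on('-q', '--quiet', 'Do not write to stdout') { options.quiet = true }
    o.on('-d', '--dir DIR', 'Directory containing the app') { |arg| options.directory = arg }
  end
  # parse (without the bang) works on a copy and leaves the caller's argv untouched
  parser.parse(argv)
  options
end

options = parse_sketch_options(%w[--name worker1 -t 5 -q])
```

One difference from the class above is the `Integer` coercion: a non-numeric thread count raises `OptionParser::InvalidArgument` instead of silently becoming `0` through `to_i`.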
@@ -0,0 +1,157 @@
1
+ # encoding: UTF-8
2
+
3
+ # Worker behavior for a job
4
+ module RocketJob
5
+ module Concerns
6
+ module Worker
7
+ def self.included(base)
8
+ base.extend ClassMethods
9
+ base.class_eval do
10
+ # While working on a slice, the current slice is available via this reader
11
+ attr_reader :rocket_job_slice
12
+
13
+ @rocket_job_defaults = nil
14
+ end
15
+ end
16
+
17
+ module ClassMethods
18
+ # Returns [Job] after queue-ing it for processing
19
+ def later(method, *args, &block)
20
+ if RocketJob::Config.inline_mode
21
+ now(method, *args, &block)
22
+ else
23
+ job = build(method, *args, &block)
24
+ job.save!
25
+ job
26
+ end
27
+ end
28
+
29
+ # Create a job and process it immediately in-line by this thread
30
+ def now(method, *args, &block)
31
+ job = build(method, *args, &block)
32
+ server = Server.new(name: 'inline')
33
+ server.started
34
+ job.start
35
+ while job.running? && !job.work(server)
36
+ end
37
+ job
38
+ end
39
+
40
+ # Build a Rocket Job instance
41
+ #
42
+ # Note:
43
+ # - #save! must be called on the return job instance if it needs to be
44
+ # queued for processing.
45
+ # - If data is uploaded into the job instance before saving, and is then
46
+ # discarded, call #cleanup! to clear out any partially uploaded data
47
+ def build(method, *args, &block)
48
+ job = new(arguments: args, perform_method: method.to_sym)
49
+ @rocket_job_defaults.call(job) if @rocket_job_defaults
50
+ block.call(job) if block
51
+ job
52
+ end
53
+
54
+ # Method to be performed later
55
+ def perform_later(*args, &block)
56
+ later(:perform, *args, &block)
57
+ end
58
+
59
+ # Build a job for the perform method without saving or queueing it
60
+ def perform_build(*args, &block)
61
+ build(:perform, *args, &block)
62
+ end
63
+
64
+ # Method to be performed now
65
+ def perform_now(*args, &block)
66
+ now(:perform, *args, &block)
67
+ end
68
+
69
+ # Define job defaults
70
+ def rocket_job(&block)
71
+ @rocket_job_defaults = block
72
+ self
73
+ end
74
+ end
75
+
76
+ def rocket_job_csv_parser
77
+ # TODO Change into an instance variable once CSV handling has been re-worked
78
+ RocketJob::Utility::CSVRow.new
79
+ end
80
+
81
+ # Works on this job
82
+ #
83
+ # Returns [true|false] whether this job should be excluded from the next lookup
84
+ #
85
+ # If an exception is thrown the job is marked as failed and the exception
86
+ # is set in the job itself.
87
+ #
88
+ # Thread-safe, can be called by multiple threads at the same time
89
+ def work(server)
90
+ raise 'Job must be started before calling #work' unless running?
91
+ begin
92
+ # before_perform
93
+ call_method(perform_method, arguments, event: :before, log_level: log_level)
94
+
95
+ # perform
96
+ result = call_method(perform_method, arguments, log_level: log_level)
97
+ if self.collect_output?
98
+ self.output = (result.is_a?(Hash) || result.is_a?(BSON::OrderedHash)) ? result : { result: result }
99
+ end
100
+
101
+ # after_perform
102
+ call_method(perform_method, arguments, event: :after, log_level: log_level)
103
+ complete!
104
+ rescue Exception => exc
105
+ set_exception(server.name, exc)
106
+ raise exc if RocketJob::Config.inline_mode
107
+ end
108
+ false
109
+ end
110
+
111
+ protected
112
+
113
+ # Calls a method on this job, if it is defined
114
+ # Adds the event name to the method call if supplied
115
+ #
116
+ # Returns [Object] the result of calling the method
117
+ #
118
+ # Parameters
119
+ # method [Symbol]
120
+ # The method to call on this job
121
+ #
122
+ # arguments [Array]
123
+ # Arguments to pass to the method call
124
+ #
125
+ # Options:
126
+ # event: [Symbol]
127
+ # Any one of: :before, :after
128
+ # Default: None, just calls the method itself
129
+ #
130
+ # log_level: [Symbol]
131
+ # Log level to apply to silence logging during the call
132
+ # Default: nil ( no change )
133
+ #
134
+ def call_method(method, arguments, options={})
135
+ options = options.dup
136
+ event = options.delete(:event)
137
+ log_level = options.delete(:log_level)
138
+ raise(ArgumentError, "Unknown #{self.class.name}#call_method options: #{options.inspect}") if options.size > 0
139
+
140
+ the_method = event.nil? ? method : "#{event}_#{method}".to_sym
141
+ if respond_to?(the_method)
142
+ method_name = "#{self.class.name}##{the_method}"
143
+ logger.info "Start #{method_name}"
144
+ logger.benchmark_info("Completed #{method_name}",
145
+ metric: "rocketjob/#{self.class.name.underscore}/#{the_method}",
146
+ log_exception: :full,
147
+ on_exception_level: :error,
148
+ silence: log_level
149
+ ) do
150
+ self.send(the_method, *arguments)
151
+ end
152
+ end
153
+ end
154
+
155
+ end
156
+ end
157
+ end
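
The before/perform/after dispatch used by `#work` and `#call_method` above can be sketched without MongoMapper or SemanticLogger. This is a simplified illustration under stated assumptions, not the gem's implementation. `SketchJob`, `run`, and `call_hook` are hypothetical names, and logging, benchmarking, and exception handling are left out.

```ruby
# Hypothetical job class used only to illustrate the dispatch pattern.
class SketchJob
  attr_reader :events

  def initialize
    @events = []
  end

  # Optional hook: called before #perform because its name is before_perform
  def before_perform(name)
    @events << "before:#{name}"
  end

  def perform(name)
    @events << "perform:#{name}"
    name.upcase
  end

  # Runs before_<method>, <method>, after_<method>, skipping hooks that are not defined
  def run(method, arguments)
    call_hook(method, arguments, :before)
    result = call_hook(method, arguments)
    call_hook(method, arguments, :after)
    # Wrap non-Hash results so that the output is always a Hash
    result.is_a?(Hash) ? result : { result: result }
  end

  # Builds the hook name from the optional event and calls it only if it exists
  def call_hook(method, arguments, event = nil)
    the_method = event.nil? ? method : "#{event}_#{method}".to_sym
    send(the_method, *arguments) if respond_to?(the_method)
  end
end

job    = SketchJob.new
output = job.run(:perform, ['hello'])
```

Because `after_perform` is not defined on `SketchJob`, that step is skipped, which mirrors the `respond_to?` guard in `call_method`.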