rocketjob 0.7.0 → 0.8.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: d784e40e75e2aca697e1258fb6d823be8e03c6f1
4
- data.tar.gz: 69008d9f1ae0396be4e1838f3d931299af226005
3
+ metadata.gz: 0ce24994db3b436019267874436049b46226fe54
4
+ data.tar.gz: b34a433d970dab95a6f37fcf4d1db25002e17bf6
5
5
  SHA512:
6
- metadata.gz: 614602a9f849b27bfbd4e2fda0985da5ae798e4a95e9ccbfe84a49d620fc1fedd9423df5395e7a1acefc4f42f61008d467d136a2c122ef19038e1dfd4b7dd555
7
- data.tar.gz: be35af2fafca63f647ebb485cfcdbcb76399556ac86693a48ec22d98cbfca512925b7c07ac9ca191a1fb7ce37ff0e2235124ddc7f79c7bf313647797e14e35f1
6
+ metadata.gz: d9f00987987456c79446bab56247fc3c23cb91263d42e1d30957b539851952a5564c9f744e165132120ade41529834d949c1af6918d960c546e2cd76dafc5b74
7
+ data.tar.gz: 77b5a354059d60f59b1f9c8f2f138f02a6817dd391ec7e05d8915998bfa0d98ae6aa1296c97dadc88ff8597e47aeb56122a1e8b526792b28e99b27dd662ff702
data/README.md CHANGED
@@ -1,10 +1,10 @@
1
- # rocketjob
1
+ # rocketjob[![Build Status](https://secure.travis-ci.org/rocketjob/rocketjob.png?branch=master)](http://travis-ci.org/rocketjob/rocketjob) ![](http://ruby-gem-downloads-badge.herokuapp.com/rocketjob?type=total)
2
2
 
3
3
  High volume, priority based, background job processing solution for Ruby.
4
4
 
5
5
  ## Status
6
6
 
7
- Alpha - Feedback on the API is welcome. API will change.
7
+ Beta - Feedback on the API is welcome. API may change.
8
8
 
9
9
  Already in use in production internally processing large files with millions
10
10
  of records, as well as large jobs to walk though large databases.
@@ -91,7 +91,7 @@ as quickly as possible without impacting other jobs with a higher priority.
91
91
 
92
92
  ## Management
93
93
 
94
- The companion project [rocketjob mission control](https://github.com/lambcr/rocket_job_mission_control)
94
+ The companion project [rocketjob mission control](https://github.com/rocketjob/rocket_job_mission_control)
95
95
  contains the Rails Engine that can be loaded into your Rails project to add
96
96
  a web interface for viewing and managing `rocketjob` jobs.
97
97
 
@@ -122,30 +122,167 @@ To queue the above job for processing:
122
122
  MyJob.perform_later('jack@blah.com', 'lets meet')
123
123
  ```
124
124
 
125
- ## Configuration
125
+ ## Directory Monitoring
126
126
 
127
- MongoMapper will already configure itself in Rails environments. Sometimes we want
128
- to use a different Mongo Database instance for the records and results.
127
+ A common task with many batch processing systems is to look for the appearance of
128
+ new files and kick off jobs to process them. `DirmonJob` is a job designed to do
129
+ this task.
129
130
 
130
- For example, the RocketJob::Job can be stored in a Mongo Database that is replicated
131
- across data centers, whereas we may not want to replicate record and result data
132
- due to it's sheer volume.
131
+ `DirmonJob` runs every 5 minutes by default, looking for new files that have appeared
132
+ based on configured entries called `DirmonEntry`. Ultimately these entries will be
133
+ configurable via `rocketjob_mission_control`, the web management interface for `rocketjob`.
134
+
135
+ Example, creating a `DirmonEntry`
136
+
137
+ ```ruby
138
+ RocketJob::DirmonEntry.new(
139
+ path: 'path_to_monitor/*',
140
+ job: 'Jobs::TestJob',
141
+ arguments: [ { input: 'yes' } ],
142
+ properties: { priority: 23, perform_method: :event },
143
+ archive_directory: '/exports/archive'
144
+ )
145
+ ```
146
+
147
+ The attributes of DirmonEntry:
148
+
149
+ * path <String>
150
+
151
+ Wildcard path to search for files in.
152
+ For details on valid path values, see: http://ruby-doc.org/core-2.2.2/Dir.html#method-c-glob
153
+
154
+ Example:
155
+
156
+ * input_files/process1/*.csv*
157
+ * input_files/process2/**/*
158
+
159
+ * job <String>
160
+
161
+ Name of the job to start
162
+
163
+ * arguments <Array>
164
+
165
+ Any user supplied arguments for the method invocation
166
+ All keys must be UTF-8 strings. The values can be any valid BSON type:
167
+
168
+ * Integer
169
+ * Float
170
+ * Time (UTC)
171
+ * String (UTF-8)
172
+ * Array
173
+ * Hash
174
+ * True
175
+ * False
176
+ * Symbol
177
+ * nil
178
+ * Regular Expression
179
+
180
+ _Note_: Date is not supported; convert it to a UTC Time.
181
+
182
+ * properties <Hash>
183
+
184
+ Any job properties to set.
185
+
186
+ Example, override the default job priority:
187
+
188
+ ```ruby
189
+ { priority: 45 }
190
+ ```
191
+
192
+ * archive_directory
193
+
194
+ Archive directory to move the file to before the job is started. It is important to
195
+ move the file before it is processed so that it is not picked up again for processing.
196
+ If no archive_directory is supplied, the file will be moved to a folder called '_archive'
197
+ in the same folder as the file itself.
198
+
199
+ If the `path` above is a relative path, the relative path structure will be
200
+ maintained when the file is moved to the archive path.
201
+
202
+ * enabled <Boolean>
203
+
204
+ Allow a monitoring entry to be disabled so that it is ignored by `DirmonJob`.
205
+ This feature is useful for operations to temporarily stop processing files
206
+ from a particular source, without having to completely delete the `DirmonEntry`.
207
+ It can also be used to create a `DirmonEntry` without it becoming immediately
208
+ active.
209
+
210
+
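The archive rules above (default `_archive` folder next to the file, relative paths kept under `archive_directory`, absolute paths flattened to the file name) can be sketched as a pure path calculation. This is a hypothetical helper, `archive_target`, written only to illustrate the rules as described here; it is not part of the `rocketjob` API and does not move any files.

```ruby
DEFAULT_ARCHIVE_DIR = '_archive'.freeze

# Hypothetical helper: compute where a file would be archived under the
# rules described for DirmonEntry#archive_directory.
def archive_target(file_name, archive_directory = nil)
  # No archive_directory: use an '_archive' folder alongside the file itself
  if archive_directory.nil?
    return File.join(File.dirname(file_name), DEFAULT_ARCHIVE_DIR, File.basename(file_name))
  end

  if file_name.start_with?('/')
    # Absolute path: only the file name is kept
    File.join(archive_directory, File.basename(file_name))
  else
    # Relative path: the relative directory structure is preserved
    File.join(archive_directory, file_name)
  end
end

puts archive_target('abc/myfile.txt')
puts archive_target('input_files/process1/data.csv', '/exports/archive')
puts archive_target('/data/in/data.csv', '/exports/archive')
```

Keeping the relative structure means two monitored folders that receive files with the same name do not overwrite each other in the archive.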
211
+ ### Starting the directory monitor
212
+
213
+ The directory monitor job only needs to be started once per installation by running
214
+ the following code:
215
+
216
+ ```ruby
217
+ RocketJob::Jobs::DirmonJob.perform_later
218
+ ```
219
+
220
+ The polling interval to check for new files can be modified when starting the job
221
+ for the first time by adding:
222
+ ```ruby
223
+ RocketJob::Jobs::DirmonJob.perform_later do |job|
224
+ job.check_seconds = 180
225
+ end
226
+ ```
227
+
228
+ The default priority for `DirmonJob` is 40. To increase its priority:
229
+ ```ruby
230
+ RocketJob::Jobs::DirmonJob.perform_later do |job|
231
+ job.check_seconds = 300
232
+ job.priority = 25
233
+ end
234
+ ```
235
+
236
+ Once `DirmonJob` has been started, its priority and check interval can be
237
+ changed at any time as follows:
238
+
239
+ ```ruby
240
+ RocketJob::Jobs::DirmonJob.first.set(check_seconds: 180, priority: 20)
241
+ ```
242
+
243
+ The `DirmonJob` will automatically re-schedule a new instance of itself to run in
244
+ the future after it completes each scan/run. If successful, the current job instance
245
+ will destroy itself.
246
+
247
+ In this way it avoids having a single Directory Monitor process that constantly
248
+ sits there monitoring folders for changes. More importantly it avoids a "single
249
+ point of failure" that is typical for earlier directory monitoring solutions.
250
+ Every time `DirmonJob` runs and scans the paths for new files it could be running
251
+ on a new worker. If any server/worker is removed or shutdown it will not stop
252
+ `DirmonJob` since it will just run on another worker instance.
253
+
254
+ There can only be one `DirmonJob` instance `queued` or `running` at a time. Any
255
+ attempt to start a second instance will result in an exception.
256
+
257
+ If an exception occurs while running `DirmonJob`, a failed job instance will remain
258
+ in the job list for problem determination. The failed job cannot be restarted and
259
+ should be destroyed if no longer needed.
260
+
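The "only one queued or running instance" rule and the self-rescheduling described above can be modelled in memory. This sketch follows the `DirmonJob.start` method in this release, which returns `false` when an instance is already `queued` or `running`. `DirmonRun` and `DirmonScheduler` are hypothetical stand-ins for the MongoDB-backed job records, not `rocketjob` classes.

```ruby
# Hypothetical in-memory stand-in for the jobs collection; the real
# DirmonJob stores its instances in MongoDB through MongoMapper.
DirmonRun = Struct.new(:state, :run_at, :check_seconds)

class DirmonScheduler
  attr_reader :runs

  def initialize
    @runs = []
  end

  # Like DirmonJob.start: refuse to queue a second instance while one is
  # already queued or running.
  def start(check_seconds: 300.0, now: Time.now)
    return false if @runs.any? { |run| %i[queued running].include?(run.state) }

    @runs << DirmonRun.new(:queued, now, check_seconds)
    true
  end

  # After a scan: drop the finished instance (it destroys itself on success)
  # and queue the next one check_seconds in the future.
  def finish(run, now: Time.now)
    @runs.delete_if { |r| r.equal?(run) }
    @runs << DirmonRun.new(:queued, now + run.check_seconds, run.check_seconds)
  end
end

scheduler = DirmonScheduler.new
scheduler.start  # true
scheduler.start  # false, an instance is already queued
```

Because each run only schedules its successor, any worker can pick up the next scan, which is what removes the single point of failure.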
261
+ ## Rails Configuration
262
+
263
+ MongoMapper will already configure itself in Rails environments. `rocketjob` can
264
+ be configured to use a separate MongoDB instance from the Rails application as follows:
265
+
266
+ For example, we may want `RocketJob::Job` to be stored in a Mongo Database that
267
+ is replicated across data centers, whereas we may not want to replicate the
268
+ `RocketJob::SlicedJob` slices due to their sheer volume.
133
269
 
134
270
  ```ruby
135
271
  config.before_initialize do
136
- # If this environment has a separate Work server
137
272
  # Share the common mongo configuration file
138
273
  config_file = root.join('config', 'mongo.yml')
139
274
  if config_file.file?
140
- if config = YAML.load(ERB.new(config_file.read).result)["#{Rails.env}_work]
275
+ config = YAML.load(ERB.new(config_file.read).result)
276
+ if config["#{Rails.env}_rocketjob"]
141
277
  options = (config['options']||{}).symbolize_keys
142
- # In the development environment the Mongo driver generates a lot of
143
- # network trace log data, move its debug logging to :trace
144
- options[:logger] = SemanticLogger::DebugAsTraceLogger.new('Mongo:Work')
278
+ options[:logger] = SemanticLogger::DebugAsTraceLogger.new('Mongo:rocketjob')
279
+ RocketJob::Config.mongo_connection = Mongo::MongoClient.from_uri(config['uri'], options)
280
+ end
281
+ # It is also possible to store the jobs themselves in a separate MongoDB database
282
+ if config["#{Rails.env}_rocketjob_work"]
283
+ options = (config['options']||{}).symbolize_keys
284
+ options[:logger] = SemanticLogger::DebugAsTraceLogger.new('Mongo:rocketjob_work')
145
285
  RocketJob::Config.mongo_work_connection = Mongo::MongoClient.from_uri(config['uri'], options)
146
-
147
- # It is also possible to store the jobs themselves in a separate MongoDB database
148
- # RocketJob::Config.mongo_connection = Mongo::MongoClient.from_uri(config['uri'], options)
149
286
  end
150
287
  else
151
288
  puts "\nmongo.yml config file not found: #{config_file}"
@@ -153,8 +290,48 @@ config.before_initialize do
153
290
  end
154
291
  ```
155
292
 
293
+ For an example config file, `config/mongo.yml`, see [mongo.yml](https://github.com/rocketjob/rocketjob/blob/master/test/config/mongo.yml)
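The Rails example above reads `config['uri']` and `config['options']` from the top-level YAML hash. The values most likely need to come from the environment's own section, as the earlier `_work` example did. Below is a standalone sketch of that lookup. The YAML text is a made-up sample, and it avoids the `<<: *default_options` anchors used in the real `mongo.yml`, because `YAML.safe_load` rejects aliases unless they are explicitly allowed. `rocketjob_mongo_settings` is a hypothetical helper.

```ruby
require 'yaml'
require 'erb'

# Made-up sample config; ERB tags are expanded before YAML parsing.
MONGO_YML = <<~YML
  test_rocketjob:
    uri: mongodb://localhost:27017/test_rocketjob
    options:
      pool_size: <%= 5 * 2 %>
      pool_timeout: 5
YML

# Hypothetical helper: return [uri, options] from the "<env>_rocketjob"
# section, or nil when that section is absent.
def rocketjob_mongo_settings(yaml_text, env)
  config = YAML.safe_load(ERB.new(yaml_text).result)
  section = config["#{env}_rocketjob"]
  return nil unless section

  options = (section['options'] || {}).transform_keys(&:to_sym)
  [section['uri'], options]
end

uri, options = rocketjob_mongo_settings(MONGO_YML, 'test')
```

The resulting `uri` and `options` are what would then be passed to `Mongo::MongoClient.from_uri(uri, options)`, as in the README examples.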
294
+
295
+ ## Standalone Configuration
296
+
297
+ When running `rocketjob` in a standalone environment without Rails, the MongoDB
298
+ connections will need to be set up as follows:
299
+
300
+ ```ruby
301
+ options = {
302
+ pool_size: 50,
303
+ pool_timeout: 5,
304
+ logger: SemanticLogger::DebugAsTraceLogger.new('Mongo:Work'),
305
+ }
306
+
307
+ # For example when using a replica-set for high availability
308
+ uri = 'mongodb://mongo1.site.com:27017,mongo2.site.com:27017/production_rocketjob'
309
+ RocketJob::Config.mongo_connection = Mongo::MongoClient.from_uri(uri, options)
310
+
311
+ # Use a separate database, or even server for `RocketJob::SlicedJob` slices
312
+ uri = 'mongodb://mongo1.site.com:27017,mongo2.site.com:27017/production_rocketjob_slices'
313
+ RocketJob::Config.mongo_work_connection = Mongo::MongoClient.from_uri(uri, options)
314
+ ```
315
+
156
316
  ## Requirements
157
317
 
158
318
  MongoDB V2.6 or greater. V3 is recommended
159
319
 
160
320
  * V2.6 includes a feature to allow lookups using the `$or` clause to use an index
321
+
322
+ ## Meta
323
+
324
+ * Code: `git clone git://github.com/rocketjob/rocketjob.git`
325
+ * Home: <https://github.com/rocketjob/rocketjob>
326
+ * Bugs: <http://github.com/rocketjob/rocketjob/issues>
327
+ * Gems: <http://rubygems.org/gems/rocketjob>
328
+
329
+ This project uses [Semantic Versioning](http://semver.org/).
330
+
331
+ ## Author
332
+
333
+ [Reid Morrison](https://github.com/reidmorrison) :: @reidmorrison
334
+
335
+ ## Contributors
336
+
337
+ * [Chris Lamb](https://github.com/lambcr)
@@ -18,7 +18,7 @@ module RocketJob
18
18
  # Run a RocketJob::Server from the command line
19
19
  def run
20
20
  SemanticLogger.add_appender(STDOUT, &SemanticLogger::Appender::Base.colorized_formatter) unless quiet
21
- boot_rails
21
+ boot_rails if defined?(Rails)
22
22
  write_pidfile
23
23
 
24
24
  opts = {}
@@ -53,11 +53,15 @@ module RocketJob
53
53
  connection(connection)
54
54
  Server.connection(connection)
55
55
  Job.connection(connection)
56
+ Config.connection(connection)
57
+ DirmonEntry.connection(connection)
56
58
 
57
59
  db_name = connection.db.name
58
60
  set_database_name(db_name)
59
61
  Server.set_database_name(db_name)
60
62
  Job.set_database_name(db_name)
63
+ Config.set_database_name(db_name)
64
+ DirmonEntry.set_database_name(db_name)
61
65
  end
62
66
 
63
67
  # Use a separate Mongo connection for the Records and Results
@@ -0,0 +1,60 @@
1
+ module RocketJob
2
+ class DirmonEntry
3
+ include MongoMapper::Document
4
+
5
+ # Wildcard path to search for files in
6
+ #
7
+ # Example:
8
+ # input_files/process1/*.csv*
9
+ # input_files/process2/**/*
10
+ #
11
+ # For details on valid path values, see: http://ruby-doc.org/core-2.2.2/Dir.html#method-c-glob
12
+ #
13
+ # Note
14
+ # - If there are no '*' in the path then an exact filename match is expected
15
+ key :path, String
16
+
17
+ # Job to start
18
+ #
19
+ # Example:
20
+ # "ProcessItJob"
21
+ key :job, String
22
+
23
+ # Any user supplied arguments for the method invocation
24
+ # All keys must be UTF-8 strings. The values can be any valid BSON type:
25
+ # Integer
26
+ # Float
27
+ # Time (UTC)
28
+ # String (UTF-8)
29
+ # Array
30
+ # Hash
31
+ # True
32
+ # False
33
+ # Symbol
34
+ # nil
35
+ # Regular Expression
36
+ #
37
+ # Note: Date is not supported, convert it to a UTC time
38
+ key :arguments, Array, default: []
39
+
40
+ # Any job properties to set
41
+ #
42
+ # Example, override the default job priority:
43
+ # { priority: 45 }
44
+ key :properties, Hash, default: {}
45
+
46
+ # Archive directory to move files to when processed to prevent processing the
47
+ # file again.
48
+ #
49
+ # If supplied, the file will be moved to this directory before the job is started
50
+ # If the file was in a sub-directory, the corresponding sub-directory will
51
+ # be created in the archive directory, if the path being scanned for files
52
+ # is a relative path (i.e. it does not start with '/').
53
+ key :archive_directory, String
54
+
55
+ # Allow a monitoring path to be temporarily disabled
56
+ key :enabled, Boolean, default: true
57
+
58
+ validates_presence_of :path, :job
59
+ end
60
+ end
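The `path` values documented in `DirmonEntry` are `Dir.glob` patterns. The sketch below builds a throwaway directory tree and shows what each example pattern matches. It also shows that a path without wildcards only matches that exact file. `glob_demo` and the file names are made up for illustration. `**/*` also returns sub-directories, which `DirmonJob` skips with `File.directory?`.

```ruby
require 'tmpdir'
require 'fileutils'

# Build a temporary tree and report what each DirmonEntry-style path matches.
def glob_demo
  Dir.mktmpdir do |root|
    Dir.chdir(root) do
      %w[input_files/process1/a.csv input_files/process1/b.csv.gz
         input_files/process1/notes.txt input_files/process2/x.dat
         input_files/process2/sub/y.dat].each do |path|
        FileUtils.mkdir_p(File.dirname(path))
        File.write(path, 'data')
      end

      {
        csv:   Dir['input_files/process1/*.csv*'].sort,
        # '**/*' recurses and includes directories, so filter them out
        deep:  Dir['input_files/process2/**/*'].reject { |f| File.directory?(f) }.sort,
        # No wildcard: only an exact file name match
        exact: Dir['input_files/process1/notes.txt']
      }
    end
  end
end

p glob_demo
```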
@@ -23,28 +23,28 @@ module RocketJob
23
23
  key :perform_method, Symbol, default: :perform
24
24
 
25
25
  # Priority of this job as it relates to other jobs [1..100]
26
- # 1: Lowest Priority
27
- # 100: Highest Priority
26
+ # 1: Highest Priority
28
27
  # 50: Default Priority
28
+ # 100: Lowest Priority
29
+ #
30
+ # Example:
31
+ # A job with a priority of 40 will execute before a job with priority 50
32
+ #
33
+ # In RocketJob Pro, if a SlicedJob is running and a higher priority job
34
+ # arrives, then the current job will complete the current slices and process
35
+ # the new higher priority job
29
36
  key :priority, Integer, default: 50
30
37
 
31
- # Support running this job in the future
32
- # Also set when a job fails and needs to be re-tried in the future
38
+ # Run this job no earlier than this time
33
39
  key :run_at, Time
34
40
 
35
41
  # If a job has not started by this time, destroy it
36
42
  key :expires_at, Time
37
43
 
38
44
  # When specified a job will be re-scheduled to run at it's next scheduled interval
39
- # Format is the same as cron
40
- key :schedule, String
41
-
42
- # Job should be marked as repeatable when it can be run multiple times
43
- # without changing the system state or modifying database contents.
44
- # Setting to false will result in an additional lookup on the results collection
45
- # before processing the record to ensure it was not previously processed.
46
- # This is necessary for retrying a job.
47
- key :repeatable, Boolean, default: true
45
+ # Format is the same as cron.
46
+ # #TODO Future capability.
47
+ #key :schedule, String
48
48
 
49
49
  # When the job completes destroy it from both the database and the UI
50
50
  key :destroy_on_complete, Boolean, default: true
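With the corrected priority comments, a lower number means a higher priority. Ties are broken by `created_at`, matching the `sort: [['priority', 'asc'], ['created_at', 'asc']]` used when a server looks for its next job. The following in-memory sketch uses a hypothetical `QueuedJob` struct in place of real job documents.

```ruby
# Hypothetical in-memory jobs; only the fields used for ordering are modelled.
QueuedJob = Struct.new(:name, :priority, :created_at)

# Priority ascending (1 = highest), then oldest created_at first.
def next_job(jobs)
  jobs.min_by { |job| [job.priority, job.created_at] }
end

SAMPLE_NOW = Time.at(1_000)
SAMPLE_JOBS = [
  QueuedJob.new('default',    50, SAMPLE_NOW - 60),
  QueuedJob.new('dirmon',     40, SAMPLE_NOW),
  QueuedJob.new('urgent_old', 10, SAMPLE_NOW - 30),
  QueuedJob.new('urgent_new', 10, SAMPLE_NOW)
].freeze

puts next_job(SAMPLE_JOBS).name # urgent_old
```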
@@ -75,9 +75,6 @@ module RocketJob
75
75
  # Levels supported: :trace, :debug, :info, :warn, :error, :fatal
76
76
  key :log_level, Symbol
77
77
 
78
- # Only give access through the Web UI to this group identifier
79
- #key :group, String
80
-
81
78
  #
82
79
  # Read-only attributes
83
80
  #
@@ -121,30 +118,17 @@ module RocketJob
121
118
  set_collection_name 'rocket_job.jobs'
122
119
 
123
120
  validates_presence_of :state, :failure_count, :created_at, :perform_method
124
- # :repeatable, :destroy_on_complete, :collect_output, :arguments
125
121
  validates :priority, inclusion: 1..100
126
122
 
127
123
  # State Machine events and transitions
128
124
  #
129
- # For Job Record jobs, usual processing:
130
125
  # :queued -> :running -> :completed
131
- # -> :paused -> :running ( manual )
132
- # -> :failed -> :running ( manual )
133
- # -> :retry -> :running ( future date )
134
- #
135
- # Any state other than :completed can transition manually to :aborted
136
- #
137
- # Work queue is priority based and then FIFO thereafter
138
- # means that records from existing multi-record jobs will be completed before
139
- # new jobs are started with the same priority.
140
- # Unless, the loader is not fast enough and the
141
- # records queue is empty. In this case the next multi-record job will
142
- # start loading too.
143
- #
144
- # Where: state: [:queued, :running], run_at: $lte: Time.now
145
- # Sort: priority, created_at
146
- #
147
- # Index: state, run_at
126
+ # -> :paused -> :running
127
+ # -> :aborted
128
+ # -> :failed -> :running
129
+ # -> :aborted
130
+ # -> :aborted
131
+ # -> :aborted
148
132
  aasm column: :state do
149
133
  # Job has been created and is queued for processing ( Initial state )
150
134
  state :queued, initial: true
@@ -162,10 +146,6 @@ module RocketJob
162
146
  # Job failed to process and needs to be manually re-tried or aborted
163
147
  state :failed
164
148
 
165
- # Job failed to process previously and is scheduled to be retried at a
166
- # future date
167
- state :retry
168
-
169
149
  # Job was aborted and cannot be resumed ( End state )
170
150
  state :aborted
171
151
 
@@ -253,6 +233,11 @@ module RocketJob
253
233
  Time.at(seconds)
254
234
  end
255
235
 
236
+ # A job has expired if the expiry time has passed before it is started
237
+ def expired?
238
+ started_at.nil? && expires_at && (expires_at < Time.now)
239
+ end
240
+
256
241
  # Returns [Hash] status of this job
257
242
  def status(time_zone='Eastern Time (US & Canada)')
258
243
  h = {
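The new `expired?` check only treats a job as expired if it never started and its `expires_at` is in the past. Here is a standalone sketch using a hypothetical `ExpiringJob` struct. Unlike the original, it always returns `true` or `false`; the original expression yields `nil` when `expires_at` is unset. It also takes the current time as a parameter so it can be tested.

```ruby
# Hypothetical stand-in for the two job fields that expired? reads.
ExpiringJob = Struct.new(:started_at, :expires_at) do
  # Expired only if never started and the expiry time has already passed.
  def expired?(now = Time.now)
    started_at.nil? && !expires_at.nil? && expires_at < now
  end
end

now = Time.at(1_000)
ExpiringJob.new(nil, Time.at(900)).expired?(now)          # true
ExpiringJob.new(Time.at(800), Time.at(900)).expired?(now) # false, already started
```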
@@ -279,13 +264,15 @@ module RocketJob
279
264
  h
280
265
  end
281
266
 
282
- # Same basic formula for calculating retry interval as delayed_job and Sidekiq
283
- # TODO Consider lowering the priority automatically after every retry?
267
+ # TODO Jobs are not currently automatically retried. Is there a need?
284
268
  def seconds_to_delay(count)
269
+ # TODO Consider lowering the priority automatically after every retry?
270
+ # Same basic formula for calculating retry interval as delayed_job and Sidekiq
285
271
  (count ** 4) + 15 + (rand(30)*(count+1))
286
272
  end
287
273
 
288
274
  # Patch the way MongoMapper reloads a model
275
+ # Only reload MongoMapper attributes, leaving other instance variables untouched
289
276
  def reload
290
277
  if doc = collection.find_one(:_id => id)
291
278
  load_from_database(doc)
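`seconds_to_delay` keeps the delayed_job/Sidekiq-style backoff formula, although the TODO notes that jobs are not automatically retried in this release. To show how quickly the delay grows, here is a sketch of the same formula with an injectable random source, added for testability. It also includes a hypothetical `delay_bounds` helper derived from `rand(30)` returning 0..29.

```ruby
# Same formula as seconds_to_delay: count**4 + 15 + rand(30) * (count + 1).
# The random: parameter is an addition so results can be reproduced.
def seconds_to_delay(count, random: Random.new)
  (count**4) + 15 + (random.rand(30) * (count + 1))
end

# Smallest and largest possible delay for a given retry count.
def delay_bounds(count)
  base = (count**4) + 15
  [base, base + (29 * (count + 1))]
end

(0..4).each { |count| p [count, delay_bounds(count)] }
```

The `count**4` term dominates after a few attempts, while the random term spreads retries out so failed jobs do not all retry at the same instant.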
@@ -345,7 +332,7 @@ module RocketJob
345
332
  # Name of the server that will be processing this job
346
333
  #
347
334
  # skip_job_ids [Array<BSON::ObjectId>]
348
- # Job ids to exclude when looking for 3the next job
335
+ # Job ids to exclude when looking for the next job
349
336
  #
350
337
  # Note:
351
338
  # If a job is in queued state it will be started
@@ -368,18 +355,25 @@ module RocketJob
368
355
  }
369
356
  query['_id'] = { '$nin' => skip_job_ids } if skip_job_ids && skip_job_ids.size > 0
370
357
 
371
- if doc = find_and_modify(
358
+ while doc = find_and_modify(
372
359
  query: query,
373
360
  sort: [['priority', 'asc'], ['created_at', 'asc']],
374
361
  update: { '$set' => { 'server_name' => server_name, 'state' => 'running' } }
375
362
  )
376
363
  job = load(doc)
377
- unless job.running?
378
- # Also update in-memory state and run call-backs
379
- job.start
380
- job.set(started_at: job.started_at)
364
+ if job.running?
365
+ return job
366
+ else
367
+ if job.expired?
368
+ job.destroy
369
+ logger.info "Destroyed expired job #{job.class.name}, id:#{job.id}"
370
+ else
371
+ # Also update in-memory state and run call-backs
372
+ job.start
373
+ job.set(started_at: job.started_at)
374
+ return job
375
+ end
381
376
  end
382
- job
383
377
  end
384
378
  end
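The `if` → `while` change means the server keeps claiming candidates until it finds one it can run. A job that was already running is handed back. A queued job that expired before starting is destroyed and skipped. Any other queued job is started. The in-memory sketch below sorts once instead of re-querying with `find_and_modify` on each pass. `Candidate` and `claim_next` are hypothetical names, and the `destroyed` array stands in for `job.destroy` plus the log line.

```ruby
# Hypothetical stand-in for a job document; only the fields the loop reads.
Candidate = Struct.new(:name, :priority, :created_at, :state, :expires_at, :started_at)

def claim_next(candidates, now: Time.now, destroyed: [])
  queue = candidates.sort_by { |job| [job.priority, job.created_at] }
  while (job = queue.shift)
    return job if job.state == :running # already running: return as-is

    if job.started_at.nil? && job.expires_at && job.expires_at < now
      destroyed << job.name # real code: job.destroy and logger.info
      next                  # keep looking for a live job
    end

    job.state = :running    # real code: job.start plus set(started_at:)
    job.started_at = now
    return job
  end
  nil # nothing left to claim
end
```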
385
379
 
@@ -0,0 +1,174 @@
1
+ require 'fileutils'
2
+ module RocketJob
3
+ module Jobs
4
+ # Dirmon monitors folders for files matching the criteria specified in DirmonEntry
5
+ #
6
+ # * The first time Dirmon runs it gathers the names of files in the monitored
7
+ # folders.
8
+ # * On completion Dirmon kicks off a new Dirmon job passing it the list
9
+ # of known files.
10
+ # * On each subsequent Dirmon run it checks the size of each file against the
11
+ # previous list of known files, and only if the file size has not changed
12
+ # the corresponding job is started for that file.
13
+ # * If the job implements #file_store_upload or #upload, that method is called
14
+ # and then the file is deleted, or moved to the archive_directory if supplied
15
+ # * Otherwise, the file is moved to the supplied archive_directory (defaults to
16
+ # `_archive` in the same folder as the file itself). The absolute path and
17
+ # file name of the archived file is passed to the job as `:full_file_name` in its first argument.
18
+ # Note: This means that such jobs _must_ have a Hash as the first argument
19
+ #
20
+ # With RocketJob Pro, the file is automatically uploaded into the job itself
21
+ # using the job's #upload method, after which the file is archived or deleted
22
+ # if no archive_directory was specified in the DirmonEntry.
23
+ #
24
+ # To start Dirmon for the first time
25
+ #
26
+ #
27
+ # Note:
28
+ # Do _not_ start multiple copies of Dirmon as it will result in duplicate
29
+ # jobs being started.
30
+ class DirmonJob < RocketJob::Job
31
+ DEFAULT_ARCHIVE_DIR = '_archive'.freeze
32
+
33
+ rocket_job do |job|
34
+ job.priority = 40
35
+ end
36
+
37
+ # Number of seconds between directory scans. Default 5 mins
38
+ key :check_seconds, Float, default: 300.0
39
+
40
+ # TODO Make :perform_later, :perform_now, :perform, :now protected/private
41
+ # class << self
42
+ # # Ensure that only one instance of the job is running.
43
+ # protected :perform_later, :perform_now, :perform, :now
44
+ # end
45
+ #self.send(:protected, :perform_later)
46
+
47
+ # Start the single instance of this job
48
+ # Returns true if the job was started
49
+ # Returns false if the job is already running and does not need to be started
50
+ def self.start(&block)
51
+ # Prevent multiple Dirmon Jobs from running at the same time
52
+ return false if where(state: [ :running, :queued ]).count > 0
53
+
54
+ perform_later({}, &block)
55
+ true
56
+ end
57
+
58
+ # Iterate over each Dirmon entry looking for new files
59
+ # If a new file is found, it is not processed immediately, instead
60
+ # it is passed to the next run of this job along with the file size.
61
+ # If the file size has not changed, the Job is kicked off.
62
+ def perform(previous_file_names={})
63
+ new_file_names = check_directories(previous_file_names)
64
+ ensure
65
+ # Run again in the future, even if this run fails with an exception
66
+ self.class.perform_later(new_file_names || previous_file_names) do |job|
67
+ job.priority = priority
68
+ job.check_seconds = check_seconds
69
+ job.run_at = Time.now + check_seconds
70
+ end
71
+ end
72
+
73
+ # Checks the directories for new files, starting jobs if files have not changed
74
+ # since the last run
75
+ def check_directories(previous_file_names)
76
+ new_file_names = {}
77
+ DirmonEntry.where(enabled: true).each do |entry|
78
+ logger.tagged("Entry:#{entry.id}") do
79
+ Dir[entry.path].each do |file_name|
80
+ next if File.directory?(file_name)
81
+ next if file_name.include?(DEFAULT_ARCHIVE_DIR)
82
+ # BSON Keys cannot contain periods
83
+ key = file_name.gsub('.', '_')
84
+ previous_size = previous_file_names[key]
85
+ if size = check_file(entry, file_name, previous_size)
86
+ new_file_names[key] = size
87
+ end
88
+ end
89
+ end
90
+ end
91
+ new_file_names
92
+ end
93
+
94
+ # Checks if a file should result in starting a job
95
+ # Returns [Integer] file size, or nil if the file started a job
96
+ def check_file(entry, file_name, previous_size)
97
+ size = File.size(file_name)
98
+ if previous_size && (previous_size == size)
99
+ logger.info("File stabilized: #{file_name}. Starting: #{entry.job}")
100
+ start_job(entry, file_name)
101
+ nil
102
+ else
103
+ logger.info("Found file: #{file_name}. File size: #{size}")
104
+ # Keep for the next run
105
+ size
106
+ end
107
+ rescue Errno::ENOENT => exc
108
+ # File may have been deleted since the scan was performed
109
+ nil
110
+ end
111
+
112
+ # Starts the job for the supplied entry
113
+ def start_job(entry, file_name)
114
+ entry.job.constantize.perform_later(*entry.arguments) do |job|
115
+ # Set properties, also allows :perform_method to be overridden
116
+ entry.properties.each_pair { |k, v| job.send("#{k}=".to_sym, v) }
117
+
118
+ upload_file(job, file_name, entry.archive_directory)
119
+ end
120
+ end
121
+
122
+ # Upload the file to the job
123
+ def upload_file(job, file_name, archive_directory)
124
+ if job.respond_to?(:file_store_upload)
125
+ # Allow the job to determine what to do with the file
126
+ job.file_store_upload(file_name)
127
+ archive_file(file_name, archive_directory)
128
+ elsif job.respond_to?(:upload)
129
+ # With RocketJob Pro the file can be uploaded directly into the Job itself
130
+ job.upload(file_name)
131
+ archive_file(file_name, archive_directory)
132
+ else
133
+ upload_default(job, file_name, archive_directory)
134
+ end
135
+ end
136
+
137
+ # Archives the file for a job where there was no #file_store_upload or #upload method
138
+ def upload_default(job, file_name, archive_directory)
139
+ # The first argument must be a hash
140
+ job.arguments << {} if job.arguments.size == 0
141
+ # If no archive directory is supplied, use DEFAULT_ARCHIVE_DIR under the same path as the file
142
+ archive_directory ||= File.join(File.dirname(file_name), DEFAULT_ARCHIVE_DIR)
143
+ file_name = File.join(archive_directory, File.basename(file_name))
144
+ job.arguments.first[:full_file_name] = File.absolute_path(file_name)
145
+ archive_file(file_name, archive_directory)
146
+ end
147
+
148
+ # Move the file to the archive directory
149
+ # Or, delete it if no archive directory was supplied for this entry
150
+ #
151
+ # If the file_name contains a relative path the relative path will be
152
+ # created in the archive_directory before moving the file.
153
+ #
154
+ # If an absolute path is supplied, then the file is just moved into the
155
+ # archive directory without any sub-directories
156
+ def archive_file(file_name, archive_directory)
157
+ # Move file to archive directory if set
158
+ if archive_directory
159
+ # Absolute path?
160
+ target_file_name = if file_name.start_with?('/')
161
+ File.join(archive_directory, File.basename(file_name))
162
+ else
163
+ File.join(archive_directory, file_name)
164
+ end
165
+ FileUtils.mkdir_p(File.dirname(target_file_name))
166
+ FileUtils.move(file_name, target_file_name)
167
+ else
168
+ File.delete(file_name)
169
+ end
170
+ end
171
+
172
+ end
173
+ end
174
+ end
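The core idea of `check_directories`/`check_file` is that a file is only handed to a job once its size is unchanged between two consecutive scans. That suggests the writer has finished. The simplified, single-pattern sketch below is hypothetical: it reports "ready" files instead of calling `start_job`. It keeps the `.` → `_` key substitution because the real job passes these sizes to its next run as BSON keys. In the real job a ready file is then moved or deleted, so it is not seen again.

```ruby
require 'tmpdir'

# Returns [ready_files, sizes_for_next_scan].
def scan(pattern, previous_sizes)
  ready = []
  sizes = {}
  Dir[pattern].each do |file_name|
    next if File.directory?(file_name)

    key = file_name.gsub('.', '_') # BSON keys cannot contain '.'
    begin
      size = File.size(file_name)
    rescue Errno::ENOENT
      next # deleted between the directory listing and the size check
    end

    if previous_sizes[key] == size
      ready << file_name # real job: start_job(entry, file_name)
    else
      sizes[key] = size  # new or still growing: check again next scan
    end
  end
  [ready, sizes]
end
```

Usage: call `scan` once per polling interval, feeding each run's `sizes` into the next call, as `DirmonJob#perform` does with `previous_file_names`.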
@@ -1,4 +1,4 @@
1
1
  # encoding: UTF-8
2
2
  module RocketJob #:nodoc
3
- VERSION = "0.7.0"
3
+ VERSION = "0.8.0"
4
4
  end
data/lib/rocketjob.rb CHANGED
@@ -8,6 +8,7 @@ require 'rocket_job/version'
8
8
  module RocketJob
9
9
  autoload :CLI, 'rocket_job/cli'
10
10
  autoload :Config, 'rocket_job/config'
11
+ autoload :DirmonEntry, 'rocket_job/dirmon_entry'
11
12
  autoload :Heartbeat, 'rocket_job/heartbeat'
12
13
  autoload :Job, 'rocket_job/job'
13
14
  autoload :JobException, 'rocket_job/job_exception'
@@ -15,4 +16,7 @@ module RocketJob
15
16
  module Concerns
16
17
  autoload :Worker, 'rocket_job/concerns/worker'
17
18
  end
19
+ module Jobs
20
+ autoload :DirmonJob, 'rocket_job/jobs/dirmon_job'
21
+ end
18
22
  end
@@ -11,35 +11,35 @@ default_options: &default_options
11
11
  :reconnect_max_retry_seconds: 5
12
12
 
13
13
  development:
14
- uri: mongodb://localhost:27017/development_rocket_job
14
+ uri: mongodb://localhost:27017/development_rocketjob
15
15
  options:
16
16
  <<: *default_options
17
17
 
18
18
  development_work:
19
- uri: mongodb://localhost:27017/development_rocket_job_work
19
+ uri: mongodb://localhost:27017/development_rocketjob_work
20
20
  options:
21
21
  <<: *default_options
22
22
 
23
23
  test:
24
- uri: mongodb://localhost:27017/test_rocket_job
24
+ uri: mongodb://localhost:27017/test_rocketjob
25
25
  options:
26
26
  <<: *default_options
27
27
 
28
28
  test_work:
29
- uri: mongodb://localhost:27017/test_rocket_job_work
29
+ uri: mongodb://localhost:27017/test_rocketjob_work
30
30
  options:
31
31
  <<: *default_options
32
32
 
33
33
  # Sample Production Settings
34
34
  production:
35
- uri: mongodb://mongo1.site.com:27017,mongo2.site.com:27017/production_rocket_job
35
+ uri: mongodb://mongo1.site.com:27017,mongo2.site.com:27017/production_rocketjob
36
36
  options:
37
37
  <<: *default_options
38
38
  :pool_size: 50
39
39
  :pool_timeout: 5
40
40
 
41
41
  production_work:
42
- uri: mongodb://mongo_local.site.com:27017/production_rocket_job_work
42
+ uri: mongodb://mongo_local.site.com:27017/production_rocketjob_work
43
43
  options:
44
44
  <<: *default_options
45
45
  :pool_size: 50
@@ -0,0 +1,299 @@
1
+ require_relative 'test_helper'
2
+ require_relative 'jobs/test_job'
3
+
4
+ # Unit Test for RocketJob::Jobs::DirmonJob
5
+ class DirmonJobTest < Minitest::Test
6
+ context RocketJob::Jobs::DirmonJob do
7
+ setup do
8
+ @server = RocketJob::Server.new
9
+ @server.started
10
+ @dirmon_job = RocketJob::Jobs::DirmonJob.new
11
+ @archive_directory = '/tmp/archive_directory'
12
+ @entry = RocketJob::DirmonEntry.new(
13
+ path: 'abc/*',
14
+ job: 'Jobs::TestJob',
15
+ arguments: [ { input: 'yes' } ],
16
+ properties: { priority: 23, perform_method: :event },
17
+ archive_directory: @archive_directory
18
+ )
19
+ @job = Jobs::TestJob.new
20
+ @paths = {
21
+ 'abc/*' => %w(abc/file1 abc/file2)
22
+ }
23
+ end
24
+
25
+ teardown do
26
+ @dirmon_job.destroy if @dirmon_job && !@dirmon_job.new_record?
27
+ FileUtils.remove_dir(@archive_directory, true) if Dir.exist?(@archive_directory)
28
+ end
29
+
30
+ context '.config' do
31
+ should 'support multiple databases' do
32
+ assert_equal 'test_rocketjob', RocketJob::DirmonEntry.collection.db.name
33
+ end
34
+ end
35
+
36
+ context '#archive_file' do
37
+ should 'archive absolute path file' do
38
+ begin
39
+ file = Tempfile.new('archive')
40
+ file_name = file.path
41
+ File.open(file_name, 'w') { |file| file.write('Hello World') }
42
+ assert File.exists?(file_name)
43
+ @dirmon_job.archive_file(file_name, @archive_directory)
44
+ archive_file_name = File.join(@archive_directory, File.basename(file_name))
45
+ assert File.exists?(archive_file_name), archive_file_name
46
+ ensure
47
+ file.delete if file
48
+ end
49
+ end
50
+
51
+ should 'archive relative path file' do
52
+ begin
53
+ relative_path = 'tmp'
54
+ FileUtils.mkdir_p(relative_path)
55
+ file_name = File.join(relative_path, 'dirmon_job_test.txt')
56
+ File.open(file_name, 'w') { |file| file.write('Hello World') }
57
+ @dirmon_job.archive_file(file_name, @archive_directory)
58
+ archive_file_name = File.join(@archive_directory, file_name)
59
+ assert File.exists?(archive_file_name), archive_file_name
60
+ ensure
61
+ File.delete(file_name) if file_name && File.exists?(file_name)
62
+ end
63
+ end
64
+ end
65
+
66
+ context '#upload_default' do
67
+ should 'upload default case with no archive_directory' do
68
+ job = Jobs::TestJob.new
69
+ file_name = 'abc/myfile.txt'
70
+ archived_file_name = 'abc/_archive/myfile.txt'
71
+ @dirmon_job.stub(:archive_file, -> fn, tp { assert_equal [archived_file_name, 'abc/_archive'], [fn, tp] }) do
72
+ @dirmon_job.upload_default(job, file_name, nil)
73
+ end
74
+ assert_equal File.absolute_path(archived_file_name), job.arguments.first[:full_file_name]
75
+ end
76
+
77
+ should 'upload default case with archive_directory' do
78
+ job = Jobs::TestJob.new
79
+ file_name = 'abc/myfile.txt'
80
+ archived_file_name = "#{@archive_directory}/myfile.txt"
81
+ @dirmon_job.stub(:archive_file, -> fn, tp { assert_equal [archived_file_name, @archive_directory], [fn, tp] }) do
82
+ @dirmon_job.upload_default(job, file_name, @archive_directory)
83
+ end
84
+ assert_equal File.absolute_path(archived_file_name), job.arguments.first[:full_file_name]
85
+ end
86
+ end
87
+
88
+ context '#upload_file' do
89
+ should 'upload using #file_store_upload' do
90
+ job = Jobs::TestJob.new
91
+ job.define_singleton_method(:file_store_upload) do |file_name|
92
+ file_name
93
+ end
94
+ file_name = 'abc/myfile.txt'
95
+ @dirmon_job.stub(:archive_file, -> fn, tp { assert_equal [file_name, @archive_directory], [fn, tp] }) do
96
+ @dirmon_job.upload_file(job, file_name, @archive_directory)
97
+ end
98
+ end
99
+
100
+ should 'upload using #upload' do
101
+ job = Jobs::TestJob.new
102
+ job.define_singleton_method(:upload) do |file_name|
103
+ file_name
104
+ end
105
+ file_name = 'abc/myfile.txt'
106
+ @dirmon_job.stub(:archive_file, -> fn, tp { assert_equal [file_name, @archive_directory], [fn, tp] }) do
107
+ @dirmon_job.upload_file(job, file_name, @archive_directory)
108
+ end
109
+ end
110
+ end
111
+
112
+ context '#start_job' do
113
+ setup do
114
+ RocketJob::Config.inline_mode = true
115
+ end
116
+
117
+ teardown do
118
+ RocketJob::Config.inline_mode = false
119
+ end
120
+
121
+ should 'upload using #upload' do
122
+ file_name = 'abc/myfile.txt'
123
+ job = @dirmon_job.stub(:upload_file, -> j, fn, sp { assert_equal [file_name, @archive_directory], [fn, sp] }) do
124
+ @dirmon_job.start_job(@entry, file_name)
125
+ end
126
+ assert_equal @entry.job, job.class.name
127
+ assert_equal 23, job.priority
128
+ assert_equal [ {:input=>"yes", "before_event"=>true, "event"=>true, "after_event"=>true} ], job.arguments
129
+ end
130
+ end
131
+
132
+ context '#check_file' do
133
+ should 'check growing file' do
134
+ previous_size = 5
135
+ new_size = 10
136
+ file = Tempfile.new('check_file')
137
+ file_name = file.path
138
+ File.open(file_name, 'w') { |f| f.write('*' * new_size) }
139
+ assert_equal new_size, File.size(file_name)
140
+ result = @dirmon_job.check_file(@entry, file_name, previous_size)
141
+ assert_equal new_size, result
142
+ end
143
+
144
+ should 'check completed file' do
145
+ previous_size = 10
146
+ new_size = 10
147
+ file = Tempfile.new('check_file')
148
+ file_name = file.path
149
+ File.open(file_name, 'w') { |f| f.write('*' * new_size) }
150
+ assert_equal new_size, File.size(file_name)
151
+ started = false
152
+ result = @dirmon_job.stub(:start_job, -> e,fn { started = true } ) do
153
+ @dirmon_job.check_file(@entry, file_name, previous_size)
154
+ end
155
+ assert_nil result
156
+ assert started
157
+ end
158
+
159
+ should 'check deleted file' do
160
+ previous_size = 5
161
+ file_name = 'blah'
162
+ result = @dirmon_job.check_file(@entry, file_name, previous_size)
163
+ assert_nil result
164
+ end
165
+ end
166
+
167
+ context '#check_directories' do
168
+ setup do
169
+ @entry.save!
170
+ end
171
+
172
+ teardown do
173
+ @entry.destroy if @entry
174
+ end
175
+
176
+ should 'no files' do
177
+ previous_file_names = {}
178
+ result = nil
179
+ Dir.stub(:[], -> dir { [] }) do
180
+ result = @dirmon_job.check_directories(previous_file_names)
181
+ end
182
+ assert_equal 0, result.count
183
+ end
184
+
185
+ should 'new files' do
186
+ previous_file_names = {}
187
+ result = nil
188
+ Dir.stub(:[], -> dir { @paths[dir] }) do
189
+ result = @dirmon_job.stub(:check_file, -> e, fn, ps { 5 } ) do
190
+ @dirmon_job.check_directories(previous_file_names)
191
+ end
192
+ end
193
+ assert_equal @paths['abc/*'].count, result.count
194
+ result.each_pair do |k,v|
195
+ assert_equal 5, v
196
+ end
197
+ end
198
+
199
+ should 'allow files to grow' do
200
+ previous_file_names = {}
201
+ @paths['abc/*'].each { |file_name| previous_file_names[file_name] = 5}
202
+ result = nil
203
+ Dir.stub(:[], -> dir { @paths[dir] }) do
204
+ result = @dirmon_job.stub(:check_file, -> e, fn, ps { 10 } ) do
205
+ @dirmon_job.check_directories(previous_file_names)
206
+ end
207
+ end
208
+ assert_equal @paths['abc/*'].count, result.count
209
+ result.each_pair do |k,v|
210
+ assert_equal 10, v
211
+ end
212
+ end
213
+
214
+ should 'start all files' do
215
+ previous_file_names = {}
216
+ @paths['abc/*'].each { |file_name| previous_file_names[file_name] = 10 }
217
+ result = nil
218
+ Dir.stub(:[], -> dir { @paths[dir] }) do
219
+ result = @dirmon_job.stub(:check_file, -> e, fn, ps { nil } ) do
220
+ @dirmon_job.check_directories(previous_file_names)
221
+ end
222
+ end
223
+ assert_equal 0, result.count
224
+ end
225
+
226
+ should 'skip files in archive directory' do
227
+ previous_file_names = {}
228
+ @paths['abc/*'].each { |file_name| previous_file_names[file_name] = 5}
229
+ result = nil
230
+ # Add a file in the archive directory
231
+ @paths['abc/*'] << File.join('abc', RocketJob::Jobs::DirmonJob::DEFAULT_ARCHIVE_DIR, 'test.zip')
232
+ Dir.stub(:[], -> dir { @paths[dir] }) do
233
+ result = @dirmon_job.stub(:check_file, -> e, fn, ps { 10 } ) do
234
+ @dirmon_job.check_directories(previous_file_names)
235
+ end
236
+ end
237
+ assert_equal @paths['abc/*'].count - 1, result.count
238
+ result.each_pair do |k,v|
239
+ assert_equal 10, v
240
+ end
241
+ end
242
+ end
243
+
244
+
245
+ context '#perform' do
246
+ should 'check directories and reschedule' do
247
+ dirmon_job = nil
248
+ previous_file_names = {}
249
+ @paths['abc/*'].each { |file_name| previous_file_names[file_name] = 5 }
250
+ new_file_names = {}
251
+ @paths['abc/*'].each { |file_name| new_file_names[file_name] = 10 }
252
+ RocketJob::Jobs::DirmonJob.destroy_all
253
+ RocketJob::Jobs::DirmonJob.stub_any_instance(:check_directories, new_file_names) do
254
+ # perform_now does not save the job, just runs it
255
+ dirmon_job = RocketJob::Jobs::DirmonJob.perform_now(previous_file_names) do |job|
256
+ job.priority = 11
257
+ job.check_seconds = 30
258
+ end
259
+ end
260
+ assert dirmon_job.completed?, dirmon_job.status.inspect
261
+
262
+ # It should have enqueued another instance to run in the future
263
+ assert_equal 1, RocketJob::Jobs::DirmonJob.count
264
+ assert new_dirmon_job = RocketJob::Jobs::DirmonJob.last
265
+ refute_equal dirmon_job.id, new_dirmon_job.id
266
+ assert new_dirmon_job.run_at
267
+ assert_equal 11, new_dirmon_job.priority
268
+ assert_equal 30, new_dirmon_job.check_seconds
269
+ assert new_dirmon_job.queued?
270
+
271
+ new_dirmon_job.destroy
272
+ end
273
+
274
+ should 'check directories and reschedule even on exception' do
275
+ dirmon_job = nil
276
+ RocketJob::Jobs::DirmonJob.destroy_all
277
+ RocketJob::Jobs::DirmonJob.stub_any_instance(:check_directories, -> previous { raise RuntimeError.new("Oh no") }) do
278
+ # perform_now does not save the job, just runs it
279
+ dirmon_job = RocketJob::Jobs::DirmonJob.perform_now do |job|
280
+ job.priority = 11
281
+ job.check_seconds = 30
282
+ end
283
+ end
284
+ assert dirmon_job.failed?, dirmon_job.status.inspect
285
+
286
+ # It should have enqueued another instance to run in the future
287
+ assert_equal 2, RocketJob::Jobs::DirmonJob.count
288
+ assert new_dirmon_job = RocketJob::Jobs::DirmonJob.last
289
+ assert new_dirmon_job.run_at
290
+ assert_equal 11, new_dirmon_job.priority
291
+ assert_equal 30, new_dirmon_job.check_seconds
292
+ assert new_dirmon_job.queued?
293
+
294
+ new_dirmon_job.destroy
295
+ end
296
+ end
297
+
298
+ end
299
+ end
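The `#check_file` and `#check_directories` tests above describe a polling rule: a file whose size changed since the previous scan is still being written and is carried forward with its new size; a file whose size is unchanged is handed off to a job and dropped from the tracking hash; a missing file is simply dropped. Paths under the archive directory are ignored. The following is a minimal standalone sketch of that rule in plain Ruby, not RocketJob's implementation. The helper names `check_file_size` and `scan_patterns`, the block used to signal a ready file, and the `'_archive'` constant value (inferred from the `#upload_default` test) are all assumptions for illustration.

```ruby
require 'fileutils'
require 'tmpdir'

# Assumption: matches the '_archive' default implied by the #upload_default test.
ARCHIVE_DIR = '_archive'

# Hypothetical helper, not RocketJob's API.
# Returns the current size while the file is still changing, or nil when the
# file has vanished or its size matches the previous scan. In the unchanged
# case the block is called so the caller can start a job for the file.
def check_file_size(file_name, previous_size)
  return nil unless File.exist?(file_name)

  size = File.size(file_name)
  return size unless size == previous_size

  yield file_name if block_given?
  nil
end

# Hypothetical helper: expands every glob, skips paths inside an archive
# directory, and returns { file_name => size } for files still being written.
def scan_patterns(patterns, previous_sizes, &on_ready)
  pending = {}
  patterns.each do |pattern|
    Dir[pattern].each do |file_name|
      # Ignore the archive directory itself and anything inside it.
      next if file_name.split('/').include?(ARCHIVE_DIR)
      next unless File.file?(file_name)

      size = check_file_size(file_name, previous_sizes[file_name], &on_ready)
      pending[file_name] = size if size
    end
  end
  pending
end

# Usage: the first scan records the size; the second sees no change and
# reports the file as ready.
Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, ARCHIVE_DIR))
  File.write(File.join(dir, 'data.csv'), 'a,b,c')
  File.write(File.join(dir, ARCHIVE_DIR, 'old.csv'), 'done')
  patterns = [File.join(dir, '**', '*')]

  started = []
  first = scan_patterns(patterns, {}) { |f| started << f }
  second = scan_patterns(patterns, first) { |f| started << f }
  p first.values                          # => [5]
  p second                                # => {}
  p started.map { |f| File.basename(f) }  # => ["data.csv"]
end
```

Comparing sizes across two consecutive scans is a cheap heuristic; it assumes a writer never stalls for longer than one polling interval.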
data/test/job_test.rb CHANGED
@@ -22,7 +22,7 @@ class JobTest < Minitest::Test
22
22
 
23
23
  context '.config' do
24
24
  should 'support multiple databases' do
25
- assert_equal 'test_rocket_job', RocketJob::Job.collection.db.name
25
+ assert_equal 'test_rocketjob', RocketJob::Job.collection.db.name
26
26
  end
27
27
  end
28
28
 
@@ -55,10 +55,8 @@ class JobTest < Minitest::Test
55
55
  assert_equal @arguments, @job.arguments
56
56
  assert_equal 0, @job.percent_complete
57
57
  assert_equal 50, @job.priority
58
- assert_equal true, @job.repeatable
59
58
  assert_equal 0, @job.failure_count
60
59
  assert_nil @job.run_at
61
- assert_nil @job.schedule
62
60
  assert_nil @job.started_at
63
61
  assert_equal :queued, @job.state
64
62
  end
@@ -181,6 +179,14 @@ class JobTest < Minitest::Test
181
179
  assert_equal @job.id, job.id
182
180
  end
183
181
 
182
+ should 'skip expired jobs' do
183
+ count = RocketJob::Job.count
184
+ @job.expires_at = Time.now - 100
185
+ @job.save!
186
+ assert_nil RocketJob::Job.next_job(@server.name)
187
+ assert_equal count, RocketJob::Job.count
188
+ end
189
+
184
190
  end
185
191
  end
186
192
  end
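The new expired-job test in `job_test.rb` saves a job whose `expires_at` is already in the past, then asserts that `RocketJob::Job.next_job` returns `nil` and that the job count ends where it started. One reading is that expired jobs are discarded as they are encountered rather than handed to a worker. The sketch below models that policy with an in-memory array. `SketchJob`, `next_job(queue, now)`, and the lower-number-runs-first priority ordering are assumptions for illustration, not RocketJob's MongoDB-backed code.

```ruby
# Hypothetical in-memory model of an expiring job; not RocketJob::Job.
SketchJob = Struct.new(:id, :priority, :expires_at, keyword_init: true) do
  # A job with no expires_at never expires.
  def expired?(now = Time.now)
    !expires_at.nil? && expires_at < now
  end
end

# Removes and returns the next runnable job, assuming a lower priority number
# runs first. Expired jobs reached along the way are removed and never
# returned; nil means nothing runnable is left.
def next_job(queue, now = Time.now)
  queue.sort_by!(&:priority)
  while (job = queue.shift)
    return job unless job.expired?(now)
  end
  nil
end

now = Time.now
queue = [
  SketchJob.new(id: 1, priority: 10, expires_at: now - 100),
  SketchJob.new(id: 2, priority: 50, expires_at: nil)
]
p next_job(queue, now).id  # => 2
p queue.size               # => 0
p next_job(queue, now)     # => nil
```

Dropping expired jobs inside the fetch loop keeps workers from ever starting stale work, at the cost of deleting them as a side effect of polling.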
data/test/server_test.rb CHANGED
@@ -23,7 +23,7 @@ class ServerTest < Minitest::Test
23
23
 
24
24
  context '.config' do
25
25
  should 'support multiple databases' do
26
- assert_equal 'test_rocket_job', RocketJob::Job.collection.db.name
26
+ assert_equal 'test_rocketjob', RocketJob::Job.collection.db.name
27
27
  end
28
28
  end
29
29
 
data/test/worker_test.rb CHANGED
@@ -29,10 +29,8 @@ class WorkerTest < Minitest::Test
29
29
  assert_nil @job.expires_at
30
30
  assert_equal 0, @job.percent_complete
31
31
  assert_equal 50, @job.priority
32
- assert_equal true, @job.repeatable
33
32
  assert_equal 0, @job.failure_count
34
33
  assert_nil @job.run_at
35
- assert_nil @job.schedule
36
34
  assert_nil @job.started_at
37
35
  assert_equal :queued, @job.state
38
36
 
@@ -50,10 +48,8 @@ class WorkerTest < Minitest::Test
50
48
  assert_nil @job.expires_at
51
49
  assert_equal 100, @job.percent_complete
52
50
  assert_equal 50, @job.priority
53
- assert_equal true, @job.repeatable
54
51
  assert_equal 0, @job.failure_count
55
52
  assert_nil @job.run_at
56
- assert_nil @job.schedule
57
53
  assert @job.started_at
58
54
  end
59
55
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: rocketjob
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.7.0
4
+ version: 0.8.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Reid Morrison
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2015-07-14 00:00:00.000000000 Z
11
+ date: 2015-07-20 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: aasm
@@ -123,13 +123,16 @@ files:
123
123
  - lib/rocket_job/cli.rb
124
124
  - lib/rocket_job/concerns/worker.rb
125
125
  - lib/rocket_job/config.rb
126
+ - lib/rocket_job/dirmon_entry.rb
126
127
  - lib/rocket_job/heartbeat.rb
127
128
  - lib/rocket_job/job.rb
128
129
  - lib/rocket_job/job_exception.rb
130
+ - lib/rocket_job/jobs/dirmon_job.rb
129
131
  - lib/rocket_job/server.rb
130
132
  - lib/rocket_job/version.rb
131
133
  - lib/rocketjob.rb
132
134
  - test/config/mongo.yml
135
+ - test/dirmon_job_test.rb
133
136
  - test/job_test.rb
134
137
  - test/jobs/test_job.rb
135
138
  - test/server_test.rb
@@ -161,6 +164,7 @@ specification_version: 4
161
164
  summary: High volume, priority based, Enterprise Batch Processing solution for Ruby
162
165
  test_files:
163
166
  - test/config/mongo.yml
167
+ - test/dirmon_job_test.rb
164
168
  - test/job_test.rb
165
169
  - test/jobs/test_job.rb
166
170
  - test/server_test.rb