rocketjob 0.7.0 → 0.8.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: d784e40e75e2aca697e1258fb6d823be8e03c6f1
4
- data.tar.gz: 69008d9f1ae0396be4e1838f3d931299af226005
3
+ metadata.gz: 0ce24994db3b436019267874436049b46226fe54
4
+ data.tar.gz: b34a433d970dab95a6f37fcf4d1db25002e17bf6
5
5
  SHA512:
6
- metadata.gz: 614602a9f849b27bfbd4e2fda0985da5ae798e4a95e9ccbfe84a49d620fc1fedd9423df5395e7a1acefc4f42f61008d467d136a2c122ef19038e1dfd4b7dd555
7
- data.tar.gz: be35af2fafca63f647ebb485cfcdbcb76399556ac86693a48ec22d98cbfca512925b7c07ac9ca191a1fb7ce37ff0e2235124ddc7f79c7bf313647797e14e35f1
6
+ metadata.gz: d9f00987987456c79446bab56247fc3c23cb91263d42e1d30957b539851952a5564c9f744e165132120ade41529834d949c1af6918d960c546e2cd76dafc5b74
7
+ data.tar.gz: 77b5a354059d60f59b1f9c8f2f138f02a6817dd391ec7e05d8915998bfa0d98ae6aa1296c97dadc88ff8597e47aeb56122a1e8b526792b28e99b27dd662ff702
data/README.md CHANGED
@@ -1,10 +1,10 @@
1
- # rocketjob
1
+ # rocketjob [![Build Status](https://secure.travis-ci.org/rocketjob/rocketjob.png?branch=master)](http://travis-ci.org/rocketjob/rocketjob) ![](http://ruby-gem-downloads-badge.herokuapp.com/rocketjob?type=total)
2
2
 
3
3
  High volume, priority based, background job processing solution for Ruby.
4
4
 
5
5
  ## Status
6
6
 
7
- Alpha - Feedback on the API is welcome. API will change.
7
+ Beta - Feedback on the API is welcome. API may change.
8
8
 
9
9
  Already in use in production internally processing large files with millions
10
10
  of records, as well as large jobs to walk through large databases.
@@ -91,7 +91,7 @@ as quickly as possible without impacting other jobs with a higher priority.
91
91
 
92
92
  ## Management
93
93
 
94
- The companion project [rocketjob mission control](https://github.com/lambcr/rocket_job_mission_control)
94
+ The companion project [rocketjob mission control](https://github.com/rocketjob/rocket_job_mission_control)
95
95
  contains the Rails Engine that can be loaded into your Rails project to add
96
96
  a web interface for viewing and managing `rocketjob` jobs.
97
97
 
@@ -122,30 +122,167 @@ To queue the above job for processing:
122
122
  MyJob.perform_later('jack@blah.com', 'lets meet')
123
123
  ```
124
124
 
125
- ## Configuration
125
+ ## Directory Monitoring
126
126
 
127
- MongoMapper will already configure itself in Rails environments. Sometimes we want
128
- to use a different Mongo Database instance for the records and results.
127
+ A common task with many batch processing systems is to look for the appearance of
128
+ new files and kick off jobs to process them. `DirmonJob` is a job designed to do
129
+ this task.
129
130
 
130
- For example, the RocketJob::Job can be stored in a Mongo Database that is replicated
131
- across data centers, whereas we may not want to replicate record and result data
132
- due to it's sheer volume.
131
+ `DirmonJob` runs every 5 minutes by default, looking for new files that have appeared
132
+ based on configured entries called `DirmonEntry`. Ultimately these entries will be
133
+ configurable via `rocketjob_mission_control`, the web management interface for `rocketjob`.
134
+
135
+ Example, creating a `DirmonEntry`
136
+
137
+ ```ruby
138
+ RocketJob::DirmonEntry.new(
139
+ path: 'path_to_monitor/*',
140
+ job: 'Jobs::TestJob',
141
+ arguments: [ { input: 'yes' } ],
142
+ properties: { priority: 23, perform_method: :event },
143
+ archive_directory: '/exports/archive'
144
+ )
145
+ ```
146
+
147
+ The attributes of DirmonEntry:
148
+
149
+ * path <String>
150
+
151
+ Wildcard path to search for files in.
152
+ For details on valid path values, see: http://ruby-doc.org/core-2.2.2/Dir.html#method-c-glob
153
+
154
+ Example:
155
+
156
+ * input_files/process1/*.csv*
157
+ * input_files/process2/**/*
158
+
159
+ * job <String>
160
+
161
+ Name of the job to start
162
+
163
+ * arguments <Array>
164
+
165
+ Any user supplied arguments for the method invocation
166
+ All keys must be UTF-8 strings. The values can be any valid BSON type:
167
+
168
+ * Integer
169
+ * Float
170
+ * Time (UTC)
171
+ * String (UTF-8)
172
+ * Array
173
+ * Hash
174
+ * True
175
+ * False
176
+ * Symbol
177
+ * nil
178
+ * Regular Expression
179
+
180
+ _Note_: Date is not supported, convert it to a UTC time
181
+
182
+ * properties <Hash>
183
+
184
+ Any job properties to set.
185
+
186
+ Example, override the default job priority:
187
+
188
+ ```ruby
189
+ { priority: 45 }
190
+ ```
191
+
192
+ * archive_directory
193
+
194
+ Archive directory to move the file to before the job is started. It is important to
195
+ move the file before it is processed so that it is not picked up again for processing.
196
+ If no archive_directory is supplied the file will be moved to a folder called '_archive'
197
+ in the same folder as the file itself.
198
+
199
+ If the `path` above is a relative path the relative path structure will be
200
+ maintained when the file is moved to the archive path.
201
+
202
+ * enabled <Boolean>
203
+
204
+ Allow a monitoring entry to be disabled so that it is ignored by `DirmonJob`.
205
+ This feature is useful for operations to temporarily stop processing files
206
+ from a particular source, without having to completely delete the `DirmonEntry`.
207
+ It can also be used to create a `DirmonEntry` without it becoming immediately
208
+ active.
210
+
211
+ ### Starting the directory monitor
212
+
213
+ The directory monitor job only needs to be started once per installation by running
214
+ the following code:
215
+
216
+ ```ruby
217
+ RocketJob::Jobs::DirmonJob.perform_later
218
+ ```
219
+
220
+ The polling interval to check for new files can be modified when starting the job
221
+ for the first time by adding:
222
+ ```ruby
223
+ RocketJob::Jobs::DirmonJob.perform_later do |job|
224
+ job.check_seconds = 180
225
+ end
226
+ ```
227
+
228
+ The default priority for `DirmonJob` is 40. To increase its priority:
229
+ ```ruby
230
+ RocketJob::Jobs::DirmonJob.perform_later do |job|
231
+ job.check_seconds = 300
232
+ job.priority = 25
233
+ end
234
+ ```
235
+
236
+ Once `DirmonJob` has been started, its priority and check interval can be
237
+ changed at any time as follows:
238
+
239
+ ```ruby
240
+ RocketJob::Jobs::DirmonJob.first.set(check_seconds: 180, priority: 20)
241
+ ```
242
+
243
+ The `DirmonJob` will automatically re-schedule a new instance of itself to run in
244
+ the future after it completes each scan/run. If successful, the current job instance
245
+ will destroy itself.
246
+
247
+ In this way it avoids having a single Directory Monitor process that constantly
248
+ sits there monitoring folders for changes. More importantly it avoids a "single
249
+ point of failure" that is typical for earlier directory monitoring solutions.
250
+ Every time `DirmonJob` runs and scans the paths for new files it could be running
251
+ on a new worker. If any server/worker is removed or shutdown it will not stop
252
+ `DirmonJob` since it will just run on another worker instance.
253
+
254
+ There can only be one `DirmonJob` instance `queued` or `running` at a time. Any
255
+ attempt to start a second instance will result in an exception.
256
+
257
+ If an exception occurs while running `DirmonJob`, a failed job instance will remain
258
+ in the job list for problem determination. The failed job cannot be restarted and
259
+ should be destroyed if no longer needed.
260
+
261
+ ## Rails Configuration
262
+
263
+ MongoMapper will already configure itself in Rails environments. `rocketjob` can
264
+ be configured to use a separate MongoDB instance from the Rails application as follows:
265
+
266
+ For example, we may want `RocketJob::Job` to be stored in a Mongo Database that
267
+ is replicated across data centers, whereas we may not want to replicate the
268
+ `RocketJob::SlicedJob` slices due to their sheer volume.
133
269
 
134
270
  ```ruby
135
271
  config.before_initialize do
136
- # If this environment has a separate Work server
137
272
  # Share the common mongo configuration file
138
273
  config_file = root.join('config', 'mongo.yml')
139
274
  if config_file.file?
140
- if config = YAML.load(ERB.new(config_file.read).result)["#{Rails.env}_work]
275
+ config = YAML.load(ERB.new(config_file.read).result)
276
+ if config["#{Rails.env}_rocketjob"]
141
277
  options = (config['options']||{}).symbolize_keys
142
- # In the development environment the Mongo driver generates a lot of
143
- # network trace log data, move its debug logging to :trace
144
- options[:logger] = SemanticLogger::DebugAsTraceLogger.new('Mongo:Work')
278
+ options[:logger] = SemanticLogger::DebugAsTraceLogger.new('Mongo:rocketjob')
279
+ RocketJob::Config.mongo_connection = Mongo::MongoClient.from_uri(config['uri'], options)
280
+ end
281
+ # It is also possible to store the jobs themselves in a separate MongoDB database
282
+ if config["#{Rails.env}_rocketjob_work"]
283
+ options = (config['options']||{}).symbolize_keys
284
+ options[:logger] = SemanticLogger::DebugAsTraceLogger.new('Mongo:rocketjob_work')
145
285
  RocketJob::Config.mongo_work_connection = Mongo::MongoClient.from_uri(config['uri'], options)
146
-
147
- # It is also possible to store the jobs themselves in a separate MongoDB database
148
- # RocketJob::Config.mongo_connection = Mongo::MongoClient.from_uri(config['uri'], options)
149
286
  end
150
287
  else
151
288
  puts "\nmongo.yml config file not found: #{config_file}"
@@ -153,8 +290,48 @@ config.before_initialize do
153
290
  end
154
291
  ```
155
292
 
293
+ For an example config file, `config/mongo.yml`, see [mongo.yml](https://github.com/rocketjob/rocketjob/blob/master/test/config/mongo.yml)
294
+
295
+ ## Standalone Configuration
296
+
297
+ When running `rocketjob` in a standalone environment without Rails, the MongoDB
298
+ connections will need to be set up as follows:
299
+
300
+ ```ruby
301
+ options = {
302
+ pool_size: 50,
303
+ pool_timeout: 5,
304
+ logger: SemanticLogger::DebugAsTraceLogger.new('Mongo:Work'),
305
+ }
306
+
307
+ # For example when using a replica-set for high availability
308
+ uri = 'mongodb://mongo1.site.com:27017,mongo2.site.com:27017/production_rocketjob'
309
+ RocketJob::Config.mongo_connection = Mongo::MongoClient.from_uri(uri, options)
310
+
311
+ # Use a separate database, or even server for `RocketJob::SlicedJob` slices
312
+ uri = 'mongodb://mongo1.site.com:27017,mongo2.site.com:27017/production_rocketjob_slices'
313
+ RocketJob::Config.mongo_work_connection = Mongo::MongoClient.from_uri(uri, options)
314
+ ```
315
+
156
316
  ## Requirements
157
317
 
158
318
  MongoDB V2.6 or greater. V3 is recommended
159
319
 
160
320
  * V2.6 includes a feature to allow lookups using the `$or` clause to use an index
321
+
322
+ ## Meta
323
+
324
+ * Code: `git clone git://github.com/rocketjob/rocketjob.git`
325
+ * Home: <https://github.com/rocketjob/rocketjob>
326
+ * Bugs: <http://github.com/rocketjob/rocketjob/issues>
327
+ * Gems: <http://rubygems.org/gems/rocketjob>
328
+
329
+ This project uses [Semantic Versioning](http://semver.org/).
330
+
331
+ ## Author
332
+
333
+ [Reid Morrison](https://github.com/reidmorrison) :: @reidmorrison
334
+
335
+ ## Contributors
336
+
337
+ * [Chris Lamb](https://github.com/lambcr)
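As a rough illustration of the `archive_directory` rule in the README's Directory Monitoring section (relative paths keep their directory structure, absolute paths keep only the file name), here is a minimal sketch in plain Ruby. It only computes the target path and touches no files. The helper name `archive_target` is hypothetical and is not part of `rocketjob`.

```ruby
# Hypothetical helper, for illustration only: compute where a monitored
# file would end up once it is moved into the archive directory.
def archive_target(file_name, archive_directory)
  if file_name.start_with?('/')
    # Absolute source path: keep only the file name
    File.join(archive_directory, File.basename(file_name))
  else
    # Relative source path: preserve the relative directory structure
    File.join(archive_directory, file_name)
  end
end

puts archive_target('/exports/in/a.csv', '/exports/archive')
puts archive_target('input_files/process1/a.csv', '/exports/archive')
```

Keeping the relative structure means two monitored folders that both contain a file with the same name should not overwrite each other in a shared archive directory.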
@@ -18,7 +18,7 @@ module RocketJob
18
18
  # Run a RocketJob::Server from the command line
19
19
  def run
20
20
  SemanticLogger.add_appender(STDOUT, &SemanticLogger::Appender::Base.colorized_formatter) unless quiet
21
- boot_rails
21
+ boot_rails if defined?(Rails)
22
22
  write_pidfile
23
23
 
24
24
  opts = {}
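A note on guarding `boot_rails` with `defined?`: in Ruby, `defined?` applied to a symbol literal describes the literal itself and is therefore always truthy, whereas `defined?` applied to a bare constant name is `nil` when that constant is not loaded. The sketch below demonstrates the difference with a deliberately undefined constant. The method and constant names are made up for this illustration.

```ruby
# `defined?` on a symbol literal reports on the literal, not on a constant
# of that name, so the result is the same whether or not the constant exists.
def symbol_guard
  defined?(:NoSuchConstantForThisDemo)
end

# `defined?` on a bare constant name returns nil when it is not loaded.
def constant_guard
  defined?(NoSuchConstantForThisDemo)
end

puts symbol_guard.inspect
puts constant_guard.inspect
```

Only the second form skips the guarded call when the framework is absent.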
@@ -53,11 +53,15 @@ module RocketJob
53
53
  connection(connection)
54
54
  Server.connection(connection)
55
55
  Job.connection(connection)
56
+ Config.connection(connection)
57
+ DirmonEntry.connection(connection)
56
58
 
57
59
  db_name = connection.db.name
58
60
  set_database_name(db_name)
59
61
  Server.set_database_name(db_name)
60
62
  Job.set_database_name(db_name)
63
+ Config.set_database_name(db_name)
64
+ DirmonEntry.set_database_name(db_name)
61
65
  end
62
66
 
63
67
  # Use a separate Mongo connection for the Records and Results
@@ -0,0 +1,60 @@
1
+ module RocketJob
2
+ class DirmonEntry
3
+ include MongoMapper::Document
4
+
5
+ # Wildcard path to search for files in
6
+ #
7
+ # Example:
8
+ # input_files/process1/*.csv*
9
+ # input_files/process2/**/*
10
+ #
11
+ # For details on valid path values, see: http://ruby-doc.org/core-2.2.2/Dir.html#method-c-glob
12
+ #
13
+ # Note
14
+ # - If there are no '*' in the path then an exact filename match is expected
15
+ key :path, String
16
+
17
+ # Job to start
18
+ #
19
+ # Example:
20
+ # "ProcessItJob"
21
+ key :job, String
22
+
23
+ # Any user supplied arguments for the method invocation
24
+ # All keys must be UTF-8 strings. The values can be any valid BSON type:
25
+ # Integer
26
+ # Float
27
+ # Time (UTC)
28
+ # String (UTF-8)
29
+ # Array
30
+ # Hash
31
+ # True
32
+ # False
33
+ # Symbol
34
+ # nil
35
+ # Regular Expression
36
+ #
37
+ # Note: Date is not supported, convert it to a UTC time
38
+ key :arguments, Array, default: []
39
+
40
+ # Any job properties to set
41
+ #
42
+ # Example, override the default job priority:
43
+ # { priority: 45 }
44
+ key :properties, Hash, default: {}
45
+
46
+ # Archive directory to move files to when processed to prevent processing the
47
+ # file again.
48
+ #
49
+ # If supplied, the file will be moved to this directory before the job is started
50
+ # If the file was in a sub-directory, the corresponding sub-directory will
51
+ # be created in the archive directory, if the path being scanned for files
52
+ # is a relative path. (I.e. Does not start with '/') .
53
+ key :archive_directory, String
54
+
55
+ # Allow a monitoring path to be temporarily disabled
56
+ key :enabled, Boolean, default: true
57
+
58
+ validates_presence_of :path, :job
59
+ end
60
+ end
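The `path` patterns documented in `DirmonEntry` follow `Dir.glob` semantics. The following sketch, which assumes nothing beyond the Ruby standard library, builds a throwaway directory tree and shows which files the two example patterns and an exact file name would match. The helper names `matching_files` and `glob_demo` are hypothetical.

```ruby
require 'tmpdir'
require 'fileutils'

# Returns the files (not directories) matched by `pattern` under `root`,
# as sorted paths relative to `root`.
def matching_files(root, pattern)
  Dir.glob(File.join(root, pattern))
     .reject { |f| File.directory?(f) }
     .map { |f| f.sub("#{root}/", '') }
     .sort
end

def glob_demo
  Dir.mktmpdir do |root|
    FileUtils.mkdir_p(File.join(root, 'process1'))
    FileUtils.mkdir_p(File.join(root, 'process2', 'nested'))
    %w(process1/a.csv process1/b.csv.gz process1/c.txt process2/nested/d.dat).each do |name|
      File.write(File.join(root, name), 'x')
    end
    {
      csv:       matching_files(root, 'process1/*.csv*'),  # trailing * also picks up .csv.gz
      recursive: matching_files(root, 'process2/**/*'),    # descends into sub-directories
      exact:     matching_files(root, 'process1/c.txt')    # no wildcard: exact match only
    }
  end
end

p glob_demo
```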
@@ -23,28 +23,28 @@ module RocketJob
23
23
  key :perform_method, Symbol, default: :perform
24
24
 
25
25
  # Priority of this job as it relates to other jobs [1..100]
26
- # 1: Lowest Priority
27
- # 100: Highest Priority
26
+ # 1: Highest Priority
28
27
  # 50: Default Priority
28
+ # 100: Lowest Priority
29
+ #
30
+ # Example:
31
+ # A job with a priority of 40 will execute before a job with priority 50
32
+ #
33
+ # In RocketJob Pro, if a SlicedJob is running and a higher priority job
34
+ # arrives, then the current job will complete the current slices and process
35
+ # the new higher priority job
29
36
  key :priority, Integer, default: 50
30
37
 
31
- # Support running this job in the future
32
- # Also set when a job fails and needs to be re-tried in the future
38
+ # Run this job no earlier than this time
33
39
  key :run_at, Time
34
40
 
35
41
  # If a job has not started by this time, destroy it
36
42
  key :expires_at, Time
37
43
 
38
44
  # When specified a job will be re-scheduled to run at its next scheduled interval
39
- # Format is the same as cron
40
- key :schedule, String
41
-
42
- # Job should be marked as repeatable when it can be run multiple times
43
- # without changing the system state or modifying database contents.
44
- # Setting to false will result in an additional lookup on the results collection
45
- # before processing the record to ensure it was not previously processed.
46
- # This is necessary for retrying a job.
47
- key :repeatable, Boolean, default: true
45
+ # Format is the same as cron.
46
+ # #TODO Future capability.
47
+ #key :schedule, String
48
48
 
49
49
  # When the job completes destroy it from both the database and the UI
50
50
  key :destroy_on_complete, Boolean, default: true
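Because this release inverts the meaning of `priority` (1 is now the highest, 100 the lowest), a small ordering sketch may help. It uses a plain `Struct` rather than a real `RocketJob::Job`, and sorts by priority and then creation time, mirroring the `priority`, `created_at` ascending sort used when the next job is selected. `QueuedJob` and `execution_order` are hypothetical names.

```ruby
# Stand-in for a queued job; not the real RocketJob::Job
QueuedJob = Struct.new(:name, :priority, :created_at)

# Lowest priority number runs first; ties are broken oldest-first
def execution_order(jobs)
  jobs.sort_by { |job| [job.priority, job.created_at] }.map(&:name)
end

jobs = [
  QueuedJob.new('default_old', 50, Time.at(100)),
  QueuedJob.new('urgent',      40, Time.at(300)),
  QueuedJob.new('default_new', 50, Time.at(200))
]
puts execution_order(jobs).inspect
```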
@@ -75,9 +75,6 @@ module RocketJob
75
75
  # Levels supported: :trace, :debug, :info, :warn, :error, :fatal
76
76
  key :log_level, Symbol
77
77
 
78
- # Only give access through the Web UI to this group identifier
79
- #key :group, String
80
-
81
78
  #
82
79
  # Read-only attributes
83
80
  #
@@ -121,30 +118,17 @@ module RocketJob
121
118
  set_collection_name 'rocket_job.jobs'
122
119
 
123
120
  validates_presence_of :state, :failure_count, :created_at, :perform_method
124
- # :repeatable, :destroy_on_complete, :collect_output, :arguments
125
121
  validates :priority, inclusion: 1..100
126
122
 
127
123
  # State Machine events and transitions
128
124
  #
129
- # For Job Record jobs, usual processing:
130
125
  # :queued -> :running -> :completed
131
- # -> :paused -> :running ( manual )
132
- # -> :failed -> :running ( manual )
133
- # -> :retry -> :running ( future date )
134
- #
135
- # Any state other than :completed can transition manually to :aborted
136
- #
137
- # Work queue is priority based and then FIFO thereafter
138
- # means that records from existing multi-record jobs will be completed before
139
- # new jobs are started with the same priority.
140
- # Unless, the loader is not fast enough and the
141
- # records queue is empty. In this case the next multi-record job will
142
- # start loading too.
143
- #
144
- # Where: state: [:queued, :running], run_at: $lte: Time.now
145
- # Sort: priority, created_at
146
- #
147
- # Index: state, run_at
126
+ # -> :paused -> :running
127
+ # -> :aborted
128
+ # -> :failed -> :running
129
+ # -> :aborted
130
+ # -> :aborted
131
+ # -> :aborted
148
132
  aasm column: :state do
149
133
  # Job has been created and is queued for processing ( Initial state )
150
134
  state :queued, initial: true
@@ -162,10 +146,6 @@ module RocketJob
162
146
  # Job failed to process and needs to be manually re-tried or aborted
163
147
  state :failed
164
148
 
165
- # Job failed to process previously and is scheduled to be retried at a
166
- # future date
167
- state :retry
168
-
169
149
  # Job was aborted and cannot be resumed ( End state )
170
150
  state :aborted
171
151
 
@@ -253,6 +233,11 @@ module RocketJob
253
233
  Time.at(seconds)
254
234
  end
255
235
 
236
+ # A job has expired if the expiry time has passed before it is started
237
+ def expired?
238
+ started_at.nil? && expires_at && (expires_at < Time.now)
239
+ end
240
+
256
241
  # Returns [Hash] status of this job
257
242
  def status(time_zone='Eastern Time (US & Canada)')
258
243
  h = {
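The new `expired?` predicate combines three conditions: the job has not started, an expiry time is set, and that time has passed. A minimal sketch using a `Struct` in place of the MongoMapper document shows the cases. The optional `now` argument is an addition made here so the example is deterministic. It is not part of the method in the diff.

```ruby
# Stand-in for a job document; only the two relevant attributes are modelled
ExpiringJob = Struct.new(:started_at, :expires_at) do
  # A job has expired if its expiry time passed before it was started
  def expired?(now = Time.now)
    started_at.nil? && !expires_at.nil? && expires_at < now
  end
end

now = Time.at(1_000)
puts ExpiringJob.new(nil, Time.at(500)).expired?(now)          # not started, expiry in the past
puts ExpiringJob.new(Time.at(400), Time.at(500)).expired?(now) # already started
puts ExpiringJob.new(nil, nil).expired?(now)                   # no expiry set
```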
@@ -279,13 +264,15 @@ module RocketJob
279
264
  h
280
265
  end
281
266
 
282
- # Same basic formula for calculating retry interval as delayed_job and Sidekiq
283
- # TODO Consider lowering the priority automatically after every retry?
267
+ # TODO Jobs are not currently automatically retried. Is there a need?
284
268
  def seconds_to_delay(count)
269
+ # TODO Consider lowering the priority automatically after every retry?
270
+ # Same basic formula for calculating retry interval as delayed_job and Sidekiq
285
271
  (count ** 4) + 15 + (rand(30)*(count+1))
286
272
  end
287
273
 
288
274
  # Patch the way MongoMapper reloads a model
275
+ # Only reload MongoMapper attributes, leaving other instance variables untouched
289
276
  def reload
290
277
  if doc = collection.find_one(:_id => id)
291
278
  load_from_database(doc)
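Although the comments above note that jobs are not currently retried automatically, the bounds of the `seconds_to_delay` formula are easy to work out, since `rand(30)` yields an integer from 0 to 29. The sketch below copies the formula from the diff and adds a hypothetical helper, `delay_range`, that returns the smallest and largest possible delay for a given count.

```ruby
# Formula as it appears in the diff
def seconds_to_delay(count)
  (count ** 4) + 15 + (rand(30) * (count + 1))
end

# Hypothetical helper: inclusive range of possible delays for `count`
def delay_range(count)
  base = (count ** 4) + 15
  base..(base + 29 * (count + 1))
end

(0..5).each do |count|
  range = delay_range(count)
  puts "count=#{count}: #{range.first}..#{range.last} seconds"
end
```

The fourth-power term dominates quickly, so the random jitter matters mostly for the first few attempts.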
@@ -345,7 +332,7 @@ module RocketJob
345
332
  # Name of the server that will be processing this job
346
333
  #
347
334
  # skip_job_ids [Array<BSON::ObjectId>]
348
- # Job ids to exclude when looking for 3the next job
335
+ # Job ids to exclude when looking for the next job
349
336
  #
350
337
  # Note:
351
338
  # If a job is in queued state it will be started
@@ -368,18 +355,25 @@ module RocketJob
368
355
  }
369
356
  query['_id'] = { '$nin' => skip_job_ids } if skip_job_ids && skip_job_ids.size > 0
370
357
 
371
- if doc = find_and_modify(
358
+ while doc = find_and_modify(
372
359
  query: query,
373
360
  sort: [['priority', 'asc'], ['created_at', 'asc']],
374
361
  update: { '$set' => { 'server_name' => server_name, 'state' => 'running' } }
375
362
  )
376
363
  job = load(doc)
377
- unless job.running?
378
- # Also update in-memory state and run call-backs
379
- job.start
380
- job.set(started_at: job.started_at)
364
+ if job.running?
365
+ return job
366
+ else
367
+ if job.expired?
368
+ job.destroy
369
+ logger.info "Destroyed expired job #{job.class.name}, id:#{job.id}"
370
+ else
371
+ # Also update in-memory state and run call-backs
372
+ job.start
373
+ job.set(started_at: job.started_at)
374
+ return job
375
+ end
381
376
  end
382
- job
383
377
  end
384
378
  end
385
379
 
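The change from `if` to `while` in the hunk above means the lookup keeps claiming candidates until it finds one that has not expired. The following in-memory sketch approximates that control flow with an array in place of the `find_and_modify` call. `PendingJob` and this standalone `next_job` are hypothetical stand-ins, not the library's implementation.

```ruby
# Stand-in for a job document
PendingJob = Struct.new(:id, :priority, :created_at, :expires_at, :state)

# Claim the best queued job, destroying any expired ones found along the way.
# Returns the claimed job, or nil when nothing is left to run.
def next_job(queue, now = Time.now)
  loop do
    job = queue.select { |j| j.state == :queued }
               .min_by { |j| [j.priority, j.created_at] }
    return nil unless job

    if job.expires_at && job.expires_at < now
      queue.delete(job)    # expired before it started: destroy and keep looking
    else
      job.state = :running # claim it
      return job
    end
  end
end

now   = Time.at(1_000)
queue = [
  PendingJob.new(1, 10, Time.at(1), Time.at(500), :queued), # high priority but expired
  PendingJob.new(2, 50, Time.at(2), nil,          :queued)
]
job = next_job(queue, now)
puts "claimed job #{job.id}, #{queue.size} job(s) left in the queue"
```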
@@ -0,0 +1,174 @@
1
+ require 'fileutils'
2
+ module RocketJob
3
+ module Jobs
4
+ # Dirmon monitors folders for files matching the criteria specified in DirmonEntry
5
+ #
6
+ # * The first time Dirmon runs it gathers the names of files in the monitored
7
+ # folders.
8
+ # * On completion Dirmon kicks off a new Dirmon job passing it the list
9
+ # of known files.
10
+ # * On each subsequent Dirmon run it checks the size of each file against the
11
+ # previous list of known files, and only if the file size has not changed
12
+ # the corresponding job is started for that file.
13
+ # * If the job implements #file_store_upload or #upload, that method is called
14
+ # and then the file is deleted, or moved to the archive_directory if supplied
15
+ # * Otherwise, the file is moved to the supplied archive_directory (defaults to
16
+ # `_archive` in the same folder as the file itself). The absolute path and
17
+ # file name of the archived file are passed into the job as its first argument.
18
+ # Note: This means that such jobs _must_ have a Hash as the first argument
19
+ #
20
+ # With RocketJob Pro, the file is automatically uploaded into the job itself
21
+ # using the job's #upload method, after which the file is archived or deleted
22
+ # if no archive_directory was specified in the DirmonEntry.
23
+ #
24
+ # To start Dirmon for the first time
25
+ #
26
+ #
27
+ # Note:
28
+ # Do _not_ start multiple copies of Dirmon as it will result in duplicate
29
+ # jobs being started.
30
+ class DirmonJob < RocketJob::Job
31
+ DEFAULT_ARCHIVE_DIR = '_archive'.freeze
32
+
33
+ rocket_job do |job|
34
+ job.priority = 40
35
+ end
36
+
37
+ # Number of seconds between directory scans. Default 5 mins
38
+ key :check_seconds, Float, default: 300.0
39
+
40
+ # TODO Make :perform_later, :perform_now, :perform, :now protected/private
41
+ # class << self
42
+ # # Ensure that only one instance of the job is running.
43
+ # protected :perform_later, :perform_now, :perform, :now
44
+ # end
45
+ #self.send(:protected, :perform_later)
46
+
47
+ # Start the single instance of this job
48
+ # Returns true if the job was started
49
+ # Returns false if the job is already running and does not need to be started
50
+ def self.start(&block)
51
+ # Prevent multiple Dirmon Jobs from running at the same time
52
+ return false if where(state: [ :running, :queued ]).count > 0
53
+
54
+ perform_later({}, &block)
55
+ true
56
+ end
57
+
58
+ # Iterate over each Dirmon entry looking for new files
59
+ # If a new file is found, it is not processed immediately, instead
60
+ # it is passed to the next run of this job along with the file size.
61
+ # If the file size has not changed, the Job is kicked off.
62
+ def perform(previous_file_names={})
63
+ new_file_names = check_directories(previous_file_names)
64
+ ensure
65
+ # Run again in the future, even if this run fails with an exception
66
+ self.class.perform_later(new_file_names || previous_file_names) do |job|
67
+ job.priority = priority
68
+ job.check_seconds = check_seconds
69
+ job.run_at = Time.now + check_seconds
70
+ end
71
+ end
72
+
73
+ # Checks the directories for new files, starting jobs if files have not changed
74
+ # since the last run
75
+ def check_directories(previous_file_names)
76
+ new_file_names = {}
77
+ DirmonEntry.where(enabled: true).each do |entry|
78
+ logger.tagged("Entry:#{entry.id}") do
79
+ Dir[entry.path].each do |file_name|
80
+ next if File.directory?(file_name)
81
+ next if file_name.include?(DEFAULT_ARCHIVE_DIR)
82
+ # BSON Keys cannot contain periods
83
+ key = file_name.gsub('.', '_')
84
+ previous_size = previous_file_names[key]
85
+ if size = check_file(entry, file_name, previous_size)
86
+ new_file_names[key] = size
87
+ end
88
+ end
89
+ end
90
+ end
91
+ new_file_names
92
+ end
93
+
94
+ # Checks if a file should result in starting a job
95
+ # Returns [Integer] file size, or nil if the file started a job
96
+ def check_file(entry, file_name, previous_size)
97
+ size = File.size(file_name)
98
+ if previous_size && (previous_size == size)
99
+ logger.info("File stabilized: #{file_name}. Starting: #{entry.job}")
100
+ start_job(entry, file_name)
101
+ nil
102
+ else
103
+ logger.info("Found file: #{file_name}. File size: #{size}")
104
+ # Keep for the next run
105
+ size
106
+ end
107
+ rescue Errno::ENOENT => exc
108
+ # File may have been deleted since the scan was performed
109
+ nil
110
+ end
111
+
112
+ # Starts the job for the supplied entry
113
+ def start_job(entry, file_name)
114
+ entry.job.constantize.perform_later(*entry.arguments) do |job|
115
+ # Set properties, also allows :perform_method to be overridden
116
+ entry.properties.each_pair { |k, v| job.send("#{k}=".to_sym, v) }
117
+
118
+ upload_file(job, file_name, entry.archive_directory)
119
+ end
120
+ end
121
+
122
+ # Upload the file to the job
123
+ def upload_file(job, file_name, archive_directory)
124
+ if job.respond_to?(:file_store_upload)
125
+ # Allow the job to determine what to do with the file
126
+ job.file_store_upload(file_name)
127
+ archive_file(file_name, archive_directory)
128
+ elsif job.respond_to?(:upload)
129
+ # With RocketJob Pro the file can be uploaded directly into the Job itself
130
+ job.upload(file_name)
131
+ archive_file(file_name, archive_directory)
132
+ else
133
+ upload_default(job, file_name, archive_directory)
134
+ end
135
+ end
136
+
137
+ # Archives the file for a job where there was no #file_store_upload or #upload method
138
+ def upload_default(job, file_name, archive_directory)
139
+ # The first argument must be a hash
140
+ job.arguments << {} if job.arguments.size == 0
141
+ # If no archive directory is supplied, use DEFAULT_ARCHIVE_DIR under the same path as the file
142
+ archive_directory ||= File.join(File.dirname(file_name), DEFAULT_ARCHIVE_DIR)
143
+ file_name = File.join(archive_directory, File.basename(file_name))
144
+ job.arguments.first[:full_file_name] = File.absolute_path(file_name)
145
+ archive_file(file_name, archive_directory)
146
+ end
147
+
148
+ # Move the file to the archive directory
149
+ # Or, delete it if no archive directory was supplied for this entry
150
+ #
151
+ # If the file_name contains a relative path the relative path will be
152
+ # created in the archive_directory before moving the file.
153
+ #
154
+ # If an absolute path is supplied, then the file is just moved into the
155
+ # archive directory without any sub-directories
156
+ def archive_file(file_name, archive_directory)
157
+ # Move file to archive directory if set
158
+ if archive_directory
159
+ # Absolute path?
160
+ target_file_name = if file_name.start_with?('/')
161
+ File.join(archive_directory, File.basename(file_name))
162
+ else
163
+ File.join(archive_directory, file_name)
164
+ end
165
+ FileUtils.mkdir_p(File.dirname(target_file_name))
166
+ FileUtils.move(file_name, target_file_name)
167
+ else
168
+ File.delete(file_name)
169
+ end
170
+ end
171
+
172
+ end
173
+ end
174
+ end
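The core idea in `check_directories` and `check_file` is that a file is only handed to a job once its size is identical on two consecutive scans, so files that are still being written are left alone. The sketch below isolates that rule using plain hashes of file name to size, with no file system or database access. The method name `scan` and its return shape are assumptions made for this illustration.

```ruby
# One scan pass.
#   sizes    - Hash of file name => current size in bytes
#   previous - Hash returned as `remember` by the prior scan
# Returns [ready, remember]:
#   ready    - files whose size is unchanged since the prior scan
#   remember - files to re-check on the next scan, with their current size
def scan(sizes, previous)
  ready    = []
  remember = {}
  sizes.each do |name, size|
    if previous[name] == size
      ready << name         # stable since last scan: safe to process
    else
      remember[name] = size # new or still growing: look again next time
    end
  end
  [ready, remember]
end

ready, remember = scan({ 'a.csv' => 10, 'b.csv' => 5 }, {})
puts "first scan:  ready=#{ready.inspect} remember=#{remember.inspect}"

ready, remember = scan({ 'a.csv' => 10, 'b.csv' => 8 }, remember)
puts "second scan: ready=#{ready.inspect} remember=#{remember.inspect}"
```

One consequence of this design is that a file is never processed sooner than one polling interval after it first appears.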
@@ -1,4 +1,4 @@
1
1
  # encoding: UTF-8
2
2
  module RocketJob #:nodoc
3
- VERSION = "0.7.0"
3
+ VERSION = "0.8.0"
4
4
  end
data/lib/rocketjob.rb CHANGED
@@ -8,6 +8,7 @@ require 'rocket_job/version'
8
8
  module RocketJob
9
9
  autoload :CLI, 'rocket_job/cli'
10
10
  autoload :Config, 'rocket_job/config'
11
+ autoload :DirmonEntry, 'rocket_job/dirmon_entry'
11
12
  autoload :Heartbeat, 'rocket_job/heartbeat'
12
13
  autoload :Job, 'rocket_job/job'
13
14
  autoload :JobException, 'rocket_job/job_exception'
@@ -15,4 +16,7 @@ module RocketJob
15
16
  module Concerns
16
17
  autoload :Worker, 'rocket_job/concerns/worker'
17
18
  end
19
+ module Jobs
20
+ autoload :DirmonJob, 'rocket_job/jobs/dirmon_job'
21
+ end
18
22
  end
@@ -11,35 +11,35 @@ default_options: &default_options
11
11
  :reconnect_max_retry_seconds: 5
12
12
 
13
13
  development:
14
- uri: mongodb://localhost:27017/development_rocket_job
14
+ uri: mongodb://localhost:27017/development_rocketjob
15
15
  options:
16
16
  <<: *default_options
17
17
 
18
18
  development_work:
19
- uri: mongodb://localhost:27017/development_rocket_job_work
19
+ uri: mongodb://localhost:27017/development_rocketjob_work
20
20
  options:
21
21
  <<: *default_options
22
22
 
23
23
  test:
24
- uri: mongodb://localhost:27017/test_rocket_job
24
+ uri: mongodb://localhost:27017/test_rocketjob
25
25
  options:
26
26
  <<: *default_options
27
27
 
28
28
  test_work:
29
- uri: mongodb://localhost:27017/test_rocket_job_work
29
+ uri: mongodb://localhost:27017/test_rocketjob_work
30
30
  options:
31
31
  <<: *default_options
32
32
 
33
33
  # Sample Production Settings
34
34
  production:
35
- uri: mongodb://mongo1.site.com:27017,mongo2.site.com:27017/production_rocket_job
35
+ uri: mongodb://mongo1.site.com:27017,mongo2.site.com:27017/production_rocketjob
36
36
  options:
37
37
  <<: *default_options
38
38
  :pool_size: 50
39
39
  :pool_timeout: 5
40
40
 
41
41
  production_work:
42
- uri: mongodb://mongo_local.site.com:27017/production_rocket_job_work
42
+ uri: mongodb://mongo_local.site.com:27017/production_rocketjob_work
43
43
  options:
44
44
  <<: *default_options
45
45
  :pool_size: 50
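The sample `mongo.yml` relies on a YAML anchor (`&default_options`) and merge keys (`<<: *default_options`) to share options between environments. The sketch below loads a trimmed copy of that file with Ruby's standard `yaml` library to show how the merged options come out. The constant and method names are hypothetical, and the `unsafe_load` fallback is an assumption made here so that aliases and symbol keys load on both older and newer Psych versions.

```ruby
require 'yaml'

# Trimmed copy of the sample configuration, for illustration only
SAMPLE_MONGO_YAML = <<-YAML
default_options: &default_options
  :reconnect_max_retry_seconds: 5

production:
  uri: mongodb://mongo1.site.com:27017,mongo2.site.com:27017/production_rocketjob
  options:
    <<: *default_options
    :pool_size: 50
YAML

# Newer Psych versions restrict aliases and symbols in YAML.load,
# so prefer unsafe_load where it exists (trusted input only).
def load_sample_config
  if YAML.respond_to?(:unsafe_load)
    YAML.unsafe_load(SAMPLE_MONGO_YAML)
  else
    YAML.load(SAMPLE_MONGO_YAML)
  end
end

options = load_sample_config['production']['options']
puts options.inspect
```

Keys written as `:pool_size:` load as Ruby symbols, so they are read back with `options[:pool_size]`.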
@@ -0,0 +1,299 @@
1
+ require_relative 'test_helper'
2
+ require_relative 'jobs/test_job'
3
+
4
+ # Unit Test for RocketJob::Jobs::DirmonJob
5
+ class DirmonJobTest < Minitest::Test
6
+ context RocketJob::Jobs::DirmonJob do
7
+ setup do
8
+ @server = RocketJob::Server.new
9
+ @server.started
10
+ @dirmon_job = RocketJob::Jobs::DirmonJob.new
11
+ @archive_directory = '/tmp/archive_directory'
12
+ @entry = RocketJob::DirmonEntry.new(
13
+ path: 'abc/*',
14
+ job: 'Jobs::TestJob',
15
+ arguments: [ { input: 'yes' } ],
16
+ properties: { priority: 23, perform_method: :event },
17
+ archive_directory: @archive_directory
18
+ )
19
+ @job = Jobs::TestJob.new
20
+ @paths = {
21
+ 'abc/*' => %w(abc/file1 abc/file2)
22
+ }
23
+ end
24
+
25
+ teardown do
26
+ @dirmon_job.destroy if @dirmon_job && !@dirmon_job.new_record?
27
+ FileUtils.remove_dir(@archive_directory, true) if Dir.exist?(@archive_directory)
28
+ end
29
+
30
+ context '.config' do
31
+ should 'support multiple databases' do
32
+ assert_equal 'test_rocketjob', RocketJob::DirmonEntry.collection.db.name
33
+ end
34
+ end
35
+
36
+ context '#archive_file' do
37
+ should 'archive absolute path file' do
38
+ begin
39
+ file = Tempfile.new('archive')
40
+ file_name = file.path
41
+ File.open(file_name, 'w') { |file| file.write('Hello World') }
42
+ assert File.exists?(file_name)
43
+ @dirmon_job.archive_file(file_name, @archive_directory)
44
+ archive_file_name = File.join(@archive_directory, File.basename(file_name))
45
+ assert File.exists?(archive_file_name), archive_file_name
46
+ ensure
47
+ file.delete if file
48
+ end
49
+ end
50
+
51
+ should 'archive relative path file' do
52
+ begin
53
+ relative_path = 'tmp'
54
+ FileUtils.mkdir_p(relative_path)
55
+ file_name = File.join(relative_path, 'dirmon_job_test.txt')
56
+ File.open(file_name, 'w') { |file| file.write('Hello World') }
57
+ @dirmon_job.archive_file(file_name, @archive_directory)
58
+ archive_file_name = File.join(@archive_directory, file_name)
59
+ assert File.exists?(archive_file_name), archive_file_name
60
+ ensure
61
+ File.delete(file_name) if file_name && File.exists?(file_name)
62
+ end
63
+ end
64
+ end
65
+
66
+ context '#upload_default' do
67
+ should 'upload default case with no archive_directory' do
68
+ job = Jobs::TestJob.new
69
+ file_name = 'abc/myfile.txt'
70
+ archived_file_name = 'abc/_archive/myfile.txt'
71
+ @dirmon_job.stub(:archive_file, -> fn, tp { assert_equal [archived_file_name, 'abc/_archive'], [fn, tp] }) do
72
+ @dirmon_job.upload_default(job, file_name, nil)
73
+ end
74
+ assert_equal File.absolute_path(archived_file_name), job.arguments.first[:full_file_name]
75
+ end
76
+
77
+ should 'upload default case with archive_directory' do
78
+ job = Jobs::TestJob.new
79
+ file_name = 'abc/myfile.txt'
80
+ archived_file_name = "#{@archive_directory}/myfile.txt"
81
+ @dirmon_job.stub(:archive_file, -> fn, tp { assert_equal [archived_file_name, @archive_directory], [fn, tp] }) do
82
+ @dirmon_job.upload_default(job, file_name, @archive_directory)
83
+ end
84
+ assert_equal File.absolute_path(archived_file_name), job.arguments.first[:full_file_name]
85
+ end
86
+ end
87
+
88
+ context '#upload_file' do
89
+ should 'upload using #file_store_upload' do
90
+ job = Jobs::TestJob.new
91
+ job.define_singleton_method(:file_store_upload) do |file_name|
92
+ file_name
93
+ end
94
+ file_name = 'abc/myfile.txt'
95
+ @dirmon_job.stub(:archive_file, -> fn, tp { assert_equal [file_name, @archive_directory], [fn, tp] }) do
96
+ @dirmon_job.upload_file(job, file_name, @archive_directory)
97
+ end
98
+ end
99
+
100
+ should 'upload using #upload' do
101
+ job = Jobs::TestJob.new
102
+ job.define_singleton_method(:upload) do |file_name|
103
+ file_name
104
+ end
105
+ file_name = 'abc/myfile.txt'
106
+ @dirmon_job.stub(:archive_file, -> fn, tp { assert_equal [file_name, @archive_directory], [fn, tp] }) do
107
+ @dirmon_job.upload_file(job, file_name, @archive_directory)
108
+ end
109
+ end
110
+ end
111
+
112
+ context '#start_job' do
113
+ setup do
114
+ RocketJob::Config.inline_mode = true
115
+ end
116
+
117
+ teardown do
118
+ RocketJob::Config.inline_mode = false
119
+ end
120
+
121
+ should 'upload using #upload' do
122
+ file_name = 'abc/myfile.txt'
123
+ job = @dirmon_job.stub(:upload_file, -> j, fn, sp { assert_equal [file_name, @archive_directory], [fn, sp] }) do
124
+ @dirmon_job.start_job(@entry, file_name)
125
+ end
126
+ assert_equal @entry.job, job.class.name
127
+ assert_equal 23, job.priority
128
+ assert_equal [ {:input=>"yes", "before_event"=>true, "event"=>true, "after_event"=>true} ], job.arguments
129
+ end
130
+ end
131
+
132
+ context '#check_file' do
133
+ should 'check growing file' do
134
+ previous_size = 5
135
+ new_size = 10
136
+ file = Tempfile.new('check_file')
137
+ file_name = file.path
138
+ File.open(file_name, 'w') { |file| file.write('*' * new_size) }
139
+ assert_equal new_size, File.size(file_name)
140
+ result = @dirmon_job.check_file(@entry, file_name, previous_size)
141
+ assert_equal new_size, result
142
+ end
143
+
144
+ should 'check completed file' do
145
+ previous_size = 10
146
+ new_size = 10
147
+ file = Tempfile.new('check_file')
148
+ file_name = file.path
149
+ File.open(file_name, 'w') { |file| file.write('*' * new_size) }
150
+ assert_equal new_size, File.size(file_name)
151
+ started = false
152
+ result = @dirmon_job.stub(:start_job, -> e,fn { started = true } ) do
153
+ @dirmon_job.check_file(@entry, file_name, previous_size)
154
+ end
155
+ assert_equal nil, result
156
+ assert started
157
+ end
158
+
159
+ should 'check deleted file' do
160
+ previous_size = 5
161
+ file_name = 'blah'
162
+ result = @dirmon_job.check_file(@entry, file_name, previous_size)
163
+ assert_equal nil, result
164
+ end
165
+ end
166
+
167
+ context '#check_directories' do
168
+ setup do
169
+ @entry.save!
170
+ end
171
+
172
+ teardown do
173
+ @entry.destroy if @entry
174
+ end
175
+
176
+ should 'no files' do
177
+ previous_file_names = {}
178
+ result = nil
179
+ Dir.stub(:[], -> dir { [] }) do
180
+ result = @dirmon_job.check_directories(previous_file_names)
181
+ end
182
+ assert_equal 0, result.count
183
+ end
184
+
185
+ should 'new files' do
186
+ previous_file_names = {}
187
+ result = nil
188
+ Dir.stub(:[], -> dir { @paths[dir] }) do
189
+ result = @dirmon_job.stub(:check_file, -> e, fn, ps { 5 } ) do
190
+ @dirmon_job.check_directories(previous_file_names)
191
+ end
192
+ end
193
+ assert_equal result.count, @paths['abc/*'].count
194
+ result.each_pair do |k,v|
195
+ assert_equal 5, v
196
+ end
197
+ end
198
+
199
+ should 'allow files to grow' do
200
+ previous_file_names = {}
201
+ @paths['abc/*'].each { |file_name| previous_file_names[file_name] = 5}
202
+ result = nil
203
+ Dir.stub(:[], -> dir { @paths[dir] }) do
204
+ result = @dirmon_job.stub(:check_file, -> e, fn, ps { 10 } ) do
205
+ @dirmon_job.check_directories(previous_file_names)
206
+ end
207
+ end
208
+ assert_equal result.count, @paths['abc/*'].count
209
+ result.each_pair do |k,v|
210
+ assert_equal 10, v
211
+ end
212
+ end
213
+
214
+ should 'start all files' do
215
+ previous_file_names = {}
216
+ @paths['abc/*'].each { |file_name| previous_file_names[file_name] = 10 }
217
+ result = nil
218
+ Dir.stub(:[], -> dir { @paths[dir] }) do
219
+ result = @dirmon_job.stub(:check_file, -> e, fn, ps { nil } ) do
220
+ @dirmon_job.check_directories(previous_file_names)
221
+ end
222
+ end
223
+ assert_equal 0, result.count
224
+ end
225
+
226
+ should 'skip files in archive directory' do
227
+ previous_file_names = {}
228
+ @paths['abc/*'].each { |file_name| previous_file_names[file_name] = 5}
229
+ result = nil
230
+ # Add a file in the archive directory
231
+ @paths['abc/*'] << File.join('abc', RocketJob::Jobs::DirmonJob::DEFAULT_ARCHIVE_DIR, 'test.zip')
232
+ Dir.stub(:[], -> dir { @paths[dir] }) do
233
+ result = @dirmon_job.stub(:check_file, -> e, fn, ps { 10 } ) do
234
+ @dirmon_job.check_directories(previous_file_names)
235
+ end
236
+ end
237
+ assert_equal result.count, @paths['abc/*'].count - 1
238
+ result.each_pair do |k,v|
239
+ assert_equal 10, v
240
+ end
241
+ end
242
+ end
243
+
244
+
245
+ context '#perform' do
246
+ should 'check directories and reschedule' do
247
+ dirmon_job = nil
248
+ previous_file_names = {}
249
+ @paths['abc/*'].each { |file_name| previous_file_names[file_name] = 5 }
250
+ new_file_names = {}
251
+ @paths['abc/*'].each { |file_name| new_file_names[file_name] = 10 }
252
+ RocketJob::Jobs::DirmonJob.destroy_all
253
+ RocketJob::Jobs::DirmonJob.stub_any_instance(:check_directories, new_file_names) do
254
+ # perform_now does not save the job, just runs it
255
+ dirmon_job = RocketJob::Jobs::DirmonJob.perform_now(previous_file_names) do |job|
256
+ job.priority = 11
257
+ job.check_seconds = 30
258
+ end
259
+ end
260
+ assert dirmon_job.completed?, dirmon_job.status.inspect
261
+
262
+ # It should have enqueued another instance to run in the future
263
+ assert_equal 1, RocketJob::Jobs::DirmonJob.count
264
+ assert new_dirmon_job = RocketJob::Jobs::DirmonJob.last
265
+ assert_equal false, dirmon_job.id == new_dirmon_job.id
266
+ assert new_dirmon_job.run_at
267
+ assert_equal 11, new_dirmon_job.priority
268
+ assert_equal 30, new_dirmon_job.check_seconds
269
+ assert new_dirmon_job.queued?
270
+
271
+ new_dirmon_job.destroy
272
+ end
273
+
274
+ should 'check directories and reschedule even on exception' do
275
+ dirmon_job = nil
276
+ RocketJob::Jobs::DirmonJob.destroy_all
277
+ RocketJob::Jobs::DirmonJob.stub_any_instance(:check_directories, -> previous { raise RuntimeError.new("Oh no") }) do
278
+ # perform_now does not save the job, just runs it
279
+ dirmon_job = RocketJob::Jobs::DirmonJob.perform_now do |job|
280
+ job.priority = 11
281
+ job.check_seconds = 30
282
+ end
283
+ end
284
+ assert dirmon_job.failed?, dirmon_job.status.inspect
285
+
286
+ # It should have enqueued another instance to run in the future
287
+ assert_equal 2, RocketJob::Jobs::DirmonJob.count
288
+ assert new_dirmon_job = RocketJob::Jobs::DirmonJob.last
289
+ assert new_dirmon_job.run_at
290
+ assert_equal 11, new_dirmon_job.priority
291
+ assert_equal 30, new_dirmon_job.check_seconds
292
+ assert new_dirmon_job.queued?
293
+
294
+ new_dirmon_job.destroy
295
+ end
296
+ end
297
+
298
+ end
299
+ end
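Note on the `#check_file` and `#check_directories` tests above: they imply a size-polling scheme. A file that is new or still growing is reported with its current size, and a file whose size is unchanged since the last poll is started and dropped from the result, as is a deleted file. Below is a minimal sketch of that idea, assuming only what the tests show. `poll_file` and `poll_all` are hypothetical names for illustration, not the gem's methods, and the real `DirmonJob` may differ in detail.

```ruby
require 'tempfile'

# Hypothetical helpers for illustration only; these are NOT the gem's
# DirmonJob#check_file / #check_directories implementations.

# Returns the current size while a file is new or still growing.
# Returns nil once the file should no longer be tracked: it was deleted,
# or its size matched the previous poll and it was handed to the block.
def poll_file(file_name, previous_size)
  return nil unless File.exist?(file_name)

  size = File.size(file_name)
  if previous_size == size
    yield file_name if block_given?
    nil
  else
    size
  end
end

# Polls every file and returns a Hash of file name => size for the files
# that still need to be checked on the next pass.
def poll_all(file_names, previous_sizes, &on_complete)
  file_names.each_with_object({}) do |file_name, sizes|
    size = poll_file(file_name, previous_sizes[file_name], &on_complete)
    sizes[file_name] = size if size
  end
end

# Usage: the first pass records the size, the second pass sees no growth.
file = Tempfile.new('poll_example')
file.write('*' * 10)
file.flush

completed = []
sizes = poll_all([file.path], {}) { |name| completed << name }
sizes = poll_all([file.path], sizes) { |name| completed << name }
file.close!
```

Feeding the returned hash back in as the next call's `previous_sizes` mirrors how the `#perform` test passes `previous_file_names` to the rescheduled job, so no state has to live outside the job arguments.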
data/test/job_test.rb CHANGED
@@ -22,7 +22,7 @@ class JobTest < Minitest::Test
22
22
 
23
23
  context '.config' do
24
24
  should 'support multiple databases' do
25
- assert_equal 'test_rocket_job', RocketJob::Job.collection.db.name
25
+ assert_equal 'test_rocketjob', RocketJob::Job.collection.db.name
26
26
  end
27
27
  end
28
28
 
@@ -55,10 +55,8 @@ class JobTest < Minitest::Test
55
55
  assert_equal @arguments, @job.arguments
56
56
  assert_equal 0, @job.percent_complete
57
57
  assert_equal 50, @job.priority
58
- assert_equal true, @job.repeatable
59
58
  assert_equal 0, @job.failure_count
60
59
  assert_nil @job.run_at
61
- assert_nil @job.schedule
62
60
  assert_nil @job.started_at
63
61
  assert_equal :queued, @job.state
64
62
  end
@@ -181,6 +179,14 @@ class JobTest < Minitest::Test
181
179
  assert_equal @job.id, job.id
182
180
  end
183
181
 
182
+ should 'Skip expired jobs' do
183
+ count = RocketJob::Job.count
184
+ @job.expires_at = Time.now - 100
185
+ @job.save!
186
+ assert_equal nil, RocketJob::Job.next_job(@server.name)
187
+ assert_equal count, RocketJob::Job.count
188
+ end
189
+
184
190
  end
185
191
  end
186
192
  end
data/test/server_test.rb CHANGED
@@ -23,7 +23,7 @@ class ServerTest < Minitest::Test
23
23
 
24
24
  context '.config' do
25
25
  should 'support multiple databases' do
26
- assert_equal 'test_rocket_job', RocketJob::Job.collection.db.name
26
+ assert_equal 'test_rocketjob', RocketJob::Job.collection.db.name
27
27
  end
28
28
  end
29
29
 
data/test/worker_test.rb CHANGED
@@ -29,10 +29,8 @@ class WorkerTest < Minitest::Test
29
29
  assert_nil @job.expires_at
30
30
  assert_equal 0, @job.percent_complete
31
31
  assert_equal 50, @job.priority
32
- assert_equal true, @job.repeatable
33
32
  assert_equal 0, @job.failure_count
34
33
  assert_nil @job.run_at
35
- assert_nil @job.schedule
36
34
  assert_nil @job.started_at
37
35
  assert_equal :queued, @job.state
38
36
 
@@ -50,10 +48,8 @@ class WorkerTest < Minitest::Test
50
48
  assert_nil @job.expires_at
51
49
  assert_equal 100, @job.percent_complete
52
50
  assert_equal 50, @job.priority
53
- assert_equal true, @job.repeatable
54
51
  assert_equal 0, @job.failure_count
55
52
  assert_nil @job.run_at
56
- assert_nil @job.schedule
57
53
  assert @job.started_at
58
54
  end
59
55
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: rocketjob
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.7.0
4
+ version: 0.8.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Reid Morrison
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2015-07-14 00:00:00.000000000 Z
11
+ date: 2015-07-20 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: aasm
@@ -123,13 +123,16 @@ files:
123
123
  - lib/rocket_job/cli.rb
124
124
  - lib/rocket_job/concerns/worker.rb
125
125
  - lib/rocket_job/config.rb
126
+ - lib/rocket_job/dirmon_entry.rb
126
127
  - lib/rocket_job/heartbeat.rb
127
128
  - lib/rocket_job/job.rb
128
129
  - lib/rocket_job/job_exception.rb
130
+ - lib/rocket_job/jobs/dirmon_job.rb
129
131
  - lib/rocket_job/server.rb
130
132
  - lib/rocket_job/version.rb
131
133
  - lib/rocketjob.rb
132
134
  - test/config/mongo.yml
135
+ - test/dirmon_job_test.rb
133
136
  - test/job_test.rb
134
137
  - test/jobs/test_job.rb
135
138
  - test/server_test.rb
@@ -161,6 +164,7 @@ specification_version: 4
161
164
  summary: High volume, priority based, Enterprise Batch Processing solution for Ruby
162
165
  test_files:
163
166
  - test/config/mongo.yml
167
+ - test/dirmon_job_test.rb
164
168
  - test/job_test.rb
165
169
  - test/jobs/test_job.rb
166
170
  - test/server_test.rb