
rocketjob 0.7.0 → 0.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15)

checksums.yaml +4 -4
data/README.md +194 -17
data/lib/rocket_job/cli.rb +1 -1
data/lib/rocket_job/config.rb +4 -0
data/lib/rocket_job/dirmon_entry.rb +60 -0
data/lib/rocket_job/job.rb +42 -48
data/lib/rocket_job/jobs/dirmon_job.rb +174 -0
data/lib/rocket_job/version.rb +1 -1
data/lib/rocketjob.rb +4 -0
data/test/config/mongo.yml +6 -6
data/test/dirmon_job_test.rb +299 -0
data/test/job_test.rb +9 -3
data/test/server_test.rb +1 -1
data/test/worker_test.rb +0 -4
metadata +6 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: d784e40e75e2aca697e1258fb6d823be8e03c6f1
-  data.tar.gz: 69008d9f1ae0396be4e1838f3d931299af226005
+  metadata.gz: 0ce24994db3b436019267874436049b46226fe54
+  data.tar.gz: b34a433d970dab95a6f37fcf4d1db25002e17bf6
 SHA512:
-  metadata.gz: 614602a9f849b27bfbd4e2fda0985da5ae798e4a95e9ccbfe84a49d620fc1fedd9423df5395e7a1acefc4f42f61008d467d136a2c122ef19038e1dfd4b7dd555
-  data.tar.gz: be35af2fafca63f647ebb485cfcdbcb76399556ac86693a48ec22d98cbfca512925b7c07ac9ca191a1fb7ce37ff0e2235124ddc7f79c7bf313647797e14e35f1
+  metadata.gz: d9f00987987456c79446bab56247fc3c23cb91263d42e1d30957b539851952a5564c9f744e165132120ade41529834d949c1af6918d960c546e2cd76dafc5b74
+  data.tar.gz: 77b5a354059d60f59b1f9c8f2f138f02a6817dd391ec7e05d8915998bfa0d98ae6aa1296c97dadc88ff8597e47aeb56122a1e8b526792b28e99b27dd662ff702

data/README.md CHANGED

@@ -1,10 +1,10 @@
-# rocketjob
+# rocketjob[![Build Status](https://secure.travis-ci.org/rocketjob/rocketjob.png?branch=master)](http://travis-ci.org/rocketjob/rocketjob) ![](http://ruby-gem-downloads-badge.herokuapp.com/rocketjob?type=total)
 High volume, priority based, background job processing solution for Ruby.
 ## Status
-Alpha - Feedback on the API is welcome. API will change.
+Beta - Feedback on the API is welcome. API may change.
 Already in use in production internally processing large files with millions
 of records, as well as large jobs to walk though large databases.
@@ -91,7 +91,7 @@ as quickly as possible without impacting other jobs with a higher priority.
 ## Management
-The companion project [rocketjob mission control](https://github.com/lambcr/rocket_job_mission_control)
+The companion project [rocketjob mission control](https://github.com/rocketjob/rocket_job_mission_control)
 contains the Rails Engine that can be loaded into your Rails project to add
 a web interface for viewing and managing `rocketjob` jobs.
@@ -122,30 +122,167 @@ To queue the above job for processing:
 MyJob.perform_later('jack@blah.com', 'lets meet')
 ```
-## Configuration
+## Directory Monitoring
-MongoMapper will already configure itself in Rails environments. Sometimes we want
-to use a different Mongo Database instance for the records and results.
+A common task with many batch processing systems is to look for the appearance of
+new files and kick off jobs to process them. `DirmonJob` is a job designed to do
+this task.
-For example, the RocketJob::Job can be stored in a Mongo Database that is replicated
-across data centers, whereas we may not want to replicate record and result data
-due to it's sheer volume.
+`DirmonJob` runs every 5 minutes by default, looking for new files that have appeared
+based on configured entries called `DirmonEntry`. Ultimately these entries will be
+configurable via `rocketjob_mission_control`, the web management interface for `rocketjob`.
+Example, creating a `DirmonEntry`
+```ruby
+RocketJob::DirmonEntry.new(
+  path:         'path_to_monitor/*',
+  job:          'Jobs::TestJob',
+  arguments:    [ { input: 'yes' } ],
+  properties:   { priority: 23, perform_method: :event },
+  archive_directory: '/exports/archive'
+)
+```
+The attributes of DirmonEntry:
+* path <String>
+Wildcard path to search for files in.
+For details on valid path values, see: http://ruby-doc.org/core-2.2.2/Dir.html#method-c-glob
+Example:
+    * input_files/process1/*.csv*
+    * input_files/process2/**/*
+* job <String>
+Name of the job to start
+* arguments <Array>
+Any user supplied arguments for the method invocation
+All keys must be UTF-8 strings. The values can be any valid BSON type:
+    * Integer
+    * Float
+    * Time    (UTC)
+    * String  (UTF-8)
+    * Array
+    * Hash
+    * True
+    * False
+    * Symbol
+    * nil
+    * Regular Expression
+_Note_: Date is not supported, convert it to a UTC time
+* properties <Hash>
+Any job properties to set.
+Example, override the default job priority:
+```ruby
+{ priority: 45 }
+```
+* archive_directory
+Archive directory to move the file to before the job is started. It is important to
+move the file before it is processed so that it is not picked up again for processing.
+If no archive_directory is supplied the file will be moved to a folder called '_archive'
+in the same folder as the file itself.
+If the `path` above is a relative path the relative path structure will be
+maintained when the file is moved to the archive path.
+* enabled <Boolean>
+Allow a monitoring entry to be disabled so that it is ignored by `DirmonJob`.
+This feature is useful for operations to temporarily stop processing files
+from a particular source, without having to completely delete the `DirmonEntry`.
+It can also be used to create a `DirmonEntry` without it becoming immediately
+active.
+```
+### Starting the directory monitor
+The directory monitor job only needs to be started once per installation by running
+the following code:
+```ruby
+RocketJob::Jobs::DirmonJob.perform_later
+```
+The polling interval to check for new files can be modified when starting the job
+for the first time by adding:
+```ruby
+RocketJob::Jobs::DirmonJob.perform_later do |job|
+  job.check_seconds = 180
+end
+```
+The default priority for `DirmonJob` is 40, to increase its priority:
+```ruby
+RocketJob::Jobs::DirmonJob.perform_later do |job|
+  job.check_seconds = 300
+  job.priority      = 25
+end
+```
+Once `DirmonJob` has been started its priority and check interval can be
+changed at any time as follows:
+```ruby
+RocketJob::Jobs::DirmonJob.first.set(check_seconds: 180, priority: 20)
+```
+The `DirmonJob` will automatically re-schedule a new instance of itself to run in
+the future after it completes each scan/run. If successful the current job instance
+will destroy itself.
+In this way it avoids having a single Directory Monitor process that constantly
+sits there monitoring folders for changes. More importantly it avoids a "single
+point of failure" that is typical for earlier directory monitoring solutions.
+Every time `DirmonJob` runs and scans the paths for new files it could be running
+on a new worker. If any server/worker is removed or shutdown it will not stop
+`DirmonJob` since it will just run on another worker instance.
+There can only be one `DirmonJob` instance `queued` or `running` at a time. Any
+attempt to start a second instance will result in an exception.
+If an exception occurs while running `DirmonJob`, a failed job instance will remain
+in the job list for problem determination. The failed job cannot be restarted and
+should be destroyed if no longer needed.
+## Rails Configuration
+MongoMapper will already configure itself in Rails environments. `rocketjob` can
+be configured to use a separate MongoDB instance from the Rails application as follows:
+For example, we may want `RocketJob::Job` to be stored in a Mongo Database that
+is replicated across data centers, whereas we may not want to replicate the
+`RocketJob::SlicedJob`** slices due to their sheer volume.
 ```ruby
 config.before_initialize do
-  # If this environment has a separate Work server
   # Share the common mongo configuration file
   config_file = root.join('config', 'mongo.yml')
   if config_file.file?
-    if config = YAML.load(ERB.new(config_file.read).result)["#{Rails.env}_work]
+    config = YAML.load(ERB.new(config_file.read).result)
+    if config["#{Rails.env}_rocketjob"]
       options = (config['options']||{}).symbolize_keys
-      # In the development environment the Mongo driver generates a lot of
-      # network trace log data, move its debug logging to :trace
-      options[:logger] = SemanticLogger::DebugAsTraceLogger.new('Mongo:Work')
+      options[:logger] = SemanticLogger::DebugAsTraceLogger.new('Mongo:rocketjob')
+      RocketJob::Config.mongo_connection = Mongo::MongoClient.from_uri(config['uri'], options)
+    end
+    # It is also possible to store the jobs themselves in a separate MongoDB database
+    if config["#{Rails.env}_rocketjob_work"]
+      options = (config['options']||{}).symbolize_keys
+      options[:logger] = SemanticLogger::DebugAsTraceLogger.new('Mongo:rocketjob_work')
       RocketJob::Config.mongo_work_connection = Mongo::MongoClient.from_uri(config['uri'], options)
-      # It is also possible to store the jobs themselves in a separate MongoDB database
-      # RocketJob::Config.mongo_connection = Mongo::MongoClient.from_uri(config['uri'], options)
     end
   else
     puts "\nmongo.yml config file not found: #{config_file}"
@@ -153,8 +290,48 @@ config.before_initialize do
 end
 ```
+For an example config file, `config/mongo.yml`, see [mongo.yml](https://github.com/rocketjob/rocketjob/blob/master/test/config/mongo.yml)
+## Standalone Configuration
+When running `rocketjob` in a standalone environment without Rails, the MongoDB
+connections will need to be setup as follows:
+```ruby
+options = {
+  pool_size:    50,
+  pool_timeout: 5,
+  logger:       SemanticLogger::DebugAsTraceLogger.new('Mongo:Work'),
+}
+# For example when using a replica-set for high availability
+uri = 'mongodb://mongo1.site.com:27017,mongo2.site.com:27017/production_rocketjob'
+RocketJob::Config.mongo_connection = Mongo::MongoClient.from_uri(uri, options)
+# Use a separate database, or even server for `RocketJob::SlicedJob` slices
+uri = 'mongodb://mongo1.site.com:27017,mongo2.site.com:27017/production_rocketjob_slices'
+RocketJob::Config.mongo_work_connection = Mongo::MongoClient.from_uri(uri, options)
+```
 ## Requirements
 MongoDB V2.6 or greater. V3 is recommended
 * V2.6 includes a feature to allow lookups using the `$or` clause to use an index
+## Meta
+* Code: `git clone git://github.com/rocketjob/rocketjob.git`
+* Home: <https://github.com/rocketjob/rocketjob>
+* Bugs: <http://github.com/rocketjob/rocketjob/issues>
+* Gems: <http://rubygems.org/gems/rocketjob>
+This project uses [Semantic Versioning](http://semver.org/).
+## Author
+[Reid Morrison](https://github.com/reidmorrison) :: @reidmorrison
+## Contributors
+* [Chris Lamb](https://github.com/lambcr)
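
The Rails configuration snippet in the README hunk above looks up environment-specific sections of `config/mongo.yml`. As a minimal sketch only (not the gem's own code), the lookup can be exercised without Rails or MongoDB. The inline YAML, the `env` variable, and the `section_for` helper are assumptions made for illustration; the section names follow the `<env>_rocketjob` / `<env>_rocketjob_work` pattern used in the README snippet.

```ruby
require 'yaml'
require 'erb'

# Hypothetical stand-in for config/mongo.yml. Section names follow the
# README snippet; the values are made up for this sketch.
MONGO_YML = <<~YAML
  development_rocketjob:
    uri: mongodb://localhost:27017/development_rocketjob
    options:
      pool_size: 5
  development_rocketjob_work:
    uri: mongodb://localhost:27017/development_rocketjob_work
    options:
      pool_size: 5
YAML

# Returns [uri, options] for the named section, or nil when it is absent.
# Option keys are converted to symbols, similar to symbolize_keys in Rails.
def section_for(config, name)
  section = config[name]
  return nil unless section
  options = (section['options'] || {}).each_with_object({}) { |(k, v), h| h[k.to_sym] = v }
  [section['uri'], options]
end

env    = 'development' # assumption: this would be Rails.env in a Rails app
config = YAML.safe_load(ERB.new(MONGO_YML).result)

uri, options           = section_for(config, "#{env}_rocketjob")
work_uri, work_options = section_for(config, "#{env}_rocketjob_work")

puts "jobs:   #{uri} #{options.inspect}"
puts "slices: #{work_uri} #{work_options.inspect}"
```

In a real application the two results would presumably be handed to `Mongo::MongoClient.from_uri` and assigned to `RocketJob::Config.mongo_connection` and `RocketJob::Config.mongo_work_connection`, as the README shows.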

data/lib/rocket_job/cli.rb CHANGED

@@ -18,7 +18,7 @@ module RocketJob
     # Run a RocketJob::Server from the command line
     def run
       SemanticLogger.add_appender(STDOUT,  &SemanticLogger::Appender::Base.colorized_formatter) unless quiet
-      boot_rails
+      boot_rails if defined?(:Rails)
       write_pidfile
       opts = {}
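
The new guard in this hunk uses `defined?(:Rails)`. When `defined?` is applied to a symbol literal it reports on the literal, not on a constant of that name, so it may be worth comparing it with `defined?(Rails)`. A small sketch in plain Ruby, assuming Rails is not loaded in the process:

```ruby
# `defined?` on a symbol literal describes the literal itself, so the
# result is always truthy regardless of whether Rails is loaded.
symbol_check = defined?(:Rails)

# `defined?` on the constant is nil unless something has defined `Rails`.
constant_check = defined?(Rails)

puts "defined?(:Rails) => #{symbol_check.inspect}"
puts "defined?(Rails)  => #{constant_check.inspect}"
```

If the intent is to skip `boot_rails` outside a Rails application, only the second form distinguishes the two cases. Whether `boot_rails` itself copes with a missing Rails is not visible in this hunk.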

data/lib/rocket_job/config.rb CHANGED

@@ -53,11 +53,15 @@ module RocketJob
       connection(connection)
       Server.connection(connection)
       Job.connection(connection)
+      Config.connection(connection)
+      DirmonEntry.connection(connection)
       db_name = connection.db.name
       set_database_name(db_name)
       Server.set_database_name(db_name)
       Job.set_database_name(db_name)
+      Config.set_database_name(db_name)
+      DirmonEntry.set_database_name(db_name)
     end
     # Use a separate Mongo connection for the Records and Results

data/lib/rocket_job/dirmon_entry.rb ADDED

@@ -0,0 +1,60 @@
+module RocketJob
+  class DirmonEntry
+    include MongoMapper::Document
+    # Wildcard path to search for files in
+    #
+    # Example:
+    #   input_files/process1/*.csv*
+    #   input_files/process2/**/*
+    #
+    # For details on valid path values, see: http://ruby-doc.org/core-2.2.2/Dir.html#method-c-glob
+    #
+    # Note
+    # - If there are no '*' in the path then an exact filename match is expected
+    key :path,          String
+    # Job to start
+    #
+    # Example:
+    #   "ProcessItJob"
+    key :job,           String
+    # Any user supplied arguments for the method invocation
+    # All keys must be UTF-8 strings. The values can be any valid BSON type:
+    #   Integer
+    #   Float
+    #   Time    (UTC)
+    #   String  (UTF-8)
+    #   Array
+    #   Hash
+    #   True
+    #   False
+    #   Symbol
+    #   nil
+    #   Regular Expression
+    #
+    # Note: Date is not supported, convert it to a UTC time
+    key :arguments,     Array, default: []
+    # Any job properties to set
+    #
+    # Example, override the default job priority:
+    #   { priority: 45 }
+    key :properties,    Hash, default: {}
+    # Archive directory to move files to when processed to prevent processing the
+    # file again.
+    #
+    # If supplied, the file will be moved to this directory before the job is started
+    # If the file was in a sub-directory, the corresponding sub-directory will
+    # be created in the archive directory, if the path being scanned for files
+    # is a relative path. (I.e. Does not start with '/') .
+    key :archive_directory,  String
+    # Allow a monitoring path to be temporarily disabled
+    key :enabled,       Boolean, default: true
+    validates_presence_of :path, :job
+  end
+end
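
The `path` key above is documented as a `Dir.glob` pattern. The following sketch, which uses only the Ruby standard library and a throwaway directory tree with made-up file names, shows what the two example patterns from the comments match, along with the exact-match case when the path contains no `*`:

```ruby
require 'tmpdir'
require 'fileutils'

# Builds a temporary directory tree and evaluates the example patterns from
# the DirmonEntry comments against it. All file names are hypothetical.
def glob_examples
  Dir.mktmpdir do |root|
    Dir.chdir(root) do
      FileUtils.mkdir_p('input_files/process1')
      FileUtils.mkdir_p('input_files/process2/nested')
      %w[
        input_files/process1/a.csv
        input_files/process1/b.csv.gz
        input_files/process1/notes.txt
        input_files/process2/top.dat
        input_files/process2/nested/deep.dat
      ].each { |name| File.write(name, 'x') }

      {
        csv:       Dir['input_files/process1/*.csv*'].sort,
        recursive: Dir['input_files/process2/**/*'].sort,
        exact:     Dir['input_files/process1/notes.txt']
      }
    end
  end
end

results = glob_examples
results.each { |name, files| puts "#{name}: #{files.inspect}" }
```

The recursive pattern also returns the `nested` directory itself, which is consistent with `DirmonJob` skipping entries for which `File.directory?` is true.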

data/lib/rocket_job/job.rb CHANGED

@@ -23,28 +23,28 @@ module RocketJob
     key :perform_method,          Symbol, default: :perform
     # Priority of this job as it relates to other jobs [1..100]
-    #   1: Lowest Priority
-    # 100: Highest Priority
+    #   1: Highest Priority
     #  50: Default Priority
+    # 100: Lowest Priority
+    #
+    # Example:
+    #   A job with a priority of 40 will execute before a job with priority 50
+    #
+    # In RocketJob Pro, if a SlicedJob is running and a higher priority job
+    # arrives, then the current job will complete the current slices and process
+    # the new higher priority job
     key :priority,                Integer, default: 50
-    # Support running this job in the future
-    #   Also set when a job fails and needs to be re-tried in the future
+    # Run this job no earlier than this time
     key :run_at,                  Time
     # If a job has not started by this time, destroy it
     key :expires_at,              Time
     # When specified a job will be re-scheduled to run at it's next scheduled interval
-    # Format is the same as cron
-    key :schedule,                String
-    # Job should be marked as repeatable when it can be run multiple times
-    # without changing the system state or modifying database contents.
-    # Setting to false will result in an additional lookup on the results collection
-    # before processing the record to ensure it was not previously processed.
-    # This is necessary for retrying a job.
-    key :repeatable,              Boolean, default: true
+    # Format is the same as cron.
+    # #TODO Future capability.
+    #key :schedule,                String
     # When the job completes destroy it from both the database and the UI
     key :destroy_on_complete,     Boolean, default: true
@@ -75,9 +75,6 @@ module RocketJob
     #   Levels supported: :trace, :debug, :info, :warn, :error, :fatal
     key :log_level,               Symbol
-    # Only give access through the Web UI to this group identifier
-    #key :group,                   String
     #
     # Read-only attributes
     #
@@ -121,30 +118,17 @@ module RocketJob
     set_collection_name 'rocket_job.jobs'
     validates_presence_of :state, :failure_count, :created_at, :perform_method
-    # :repeatable, :destroy_on_complete, :collect_output, :arguments
     validates :priority, inclusion: 1..100
     # State Machine events and transitions
     #
-    # For Job Record jobs, usual processing:
     #   :queued -> :running -> :completed
-    #                       -> :paused     -> :running  ( manual )
-    #                       -> :failed     -> :running  ( manual )
-    #                       -> :retry      -> :running  ( future date )
-    #
-    # Any state other than :completed can transition manually to :aborted
-    #
-    # Work queue is priority based and then FIFO thereafter
-    # means that records from existing multi-record jobs will be completed before
-    # new jobs are started with the same priority.
-    # Unless, the loader is not fast enough and the
-    # records queue is empty. In this case the next multi-record job will
-    # start loading too.
-    #
-    # Where: state: [:queued, :running], run_at: $lte: Time.now
-    # Sort:  priority, created_at
-    #
-    # Index: state, run_at
+    #                       -> :paused     -> :running
+    #                                      -> :aborted
+    #                       -> :failed     -> :running
+    #                                      -> :aborted
+    #                       -> :aborted
+    #           -> :aborted
     aasm column: :state do
       # Job has been created and is queued for processing ( Initial state )
       state :queued, initial: true
@@ -162,10 +146,6 @@ module RocketJob
       # Job failed to process and needs to be manually re-tried or aborted
       state :failed
-      # Job failed to process previously and is scheduled to be retried at a
-      # future date
-      state :retry
       # Job was aborted and cannot be resumed ( End state )
       state :aborted
@@ -253,6 +233,11 @@ module RocketJob
       Time.at(seconds)
     end
+    # A job has expired if the expiry time has passed before it is started
+    def expired?
+      started_at.nil? && expires_at && (expires_at < Time.now)
+    end
     # Returns [Hash] status of this job
     def status(time_zone='Eastern Time (US & Canada)')
       h = {
@@ -279,13 +264,15 @@ module RocketJob
       h
     end
-    # Same basic formula for calculating retry interval as delayed_job and Sidekiq
-    # TODO Consider lowering the priority automatically after every retry?
+    # TODO Jobs are not currently automatically retried. Is there a need?
     def seconds_to_delay(count)
+      # TODO Consider lowering the priority automatically after every retry?
+      # Same basic formula for calculating retry interval as delayed_job and Sidekiq
       (count ** 4) + 15 + (rand(30)*(count+1))
     end
     # Patch the way MongoMapper reloads a model
+    # Only reload MongoMapper attributes, leaving other instance variables untouched
     def reload
       if doc = collection.find_one(:_id => id)
         load_from_database(doc)
@@ -345,7 +332,7 @@ module RocketJob
     #     Name of the server that will be processing this job
     #
     #   skip_job_ids [Array<BSON::ObjectId>]
-    #     Job ids to exclude when looking for 3the next job
+    #     Job ids to exclude when looking for the next job
     #
     # Note:
     #   If a job is in queued state it will be started
@@ -368,18 +355,25 @@ module RocketJob
       }
       query['_id'] = { '$nin' => skip_job_ids } if skip_job_ids && skip_job_ids.size > 0
-      if doc = find_and_modify(
+      while doc = find_and_modify(
           query:  query,
           sort:   [['priority', 'asc'], ['created_at', 'asc']],
           update: { '$set' => { 'server_name' => server_name, 'state' => 'running' } }
         )
         job = load(doc)
-        unless job.running?
-          # Also update in-memory state and run call-backs
-          job.start
-          job.set(started_at: job.started_at)
+        if job.running?
+          return job
+        else
+          if job.expired?
+            job.destroy
+            logger.info "Destroyed expired job #{job.class.name}, id:#{job.id}"
+          else
+            # Also update in-memory state and run call-backs
+            job.start
+            job.set(started_at: job.started_at)
+            return job
+          end
         end
-        job
       end
     end
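
Taken together, the hunks in this file change priority so that 1 is the highest, add `expired?`, and make the job lookup loop until it finds a job that has not expired. The sketch below models that selection in memory, as an illustration only: `SketchJob` and `next_job` are hypothetical names, there is no MongoDB or `find_and_modify` involved, and the `run_at` and state filters of the real query are left out.

```ruby
# In-memory stand-in for a job document. Only the fields needed by the
# selection logic are modelled.
SketchJob = Struct.new(:name, :priority, :created_at, :started_at, :expires_at) do
  # Same rule as Job#expired? in the diff: never started and past its expiry.
  def expired?(now = Time.now)
    started_at.nil? && !expires_at.nil? && expires_at < now
  end
end

# Returns the next job to run, or nil when nothing is left.
# Lowest priority number first, then oldest first. Expired jobs are dropped
# (the real code destroys them). `queue` is mutated.
def next_job(queue, now = Time.now)
  loop do
    job = queue.min_by { |j| [j.priority, j.created_at] }
    return nil unless job
    queue.delete(job)
    next if job.expired?(now)
    job.started_at = now
    return job
  end
end

now   = Time.now
queue = [
  SketchJob.new('report', 50, now - 30, nil, nil),
  SketchJob.new('stale',  10, now - 60, nil, now - 5), # expired before starting
  SketchJob.new('urgent', 40, now - 10, nil, nil)
]

first  = next_job(queue, now)
second = next_job(queue, now)
third  = next_job(queue, now)

puts [first, second, third].map { |j| j ? j.name : 'none' }.inspect
```

Although `stale` has the best priority, it is discarded because its expiry passed before it started, so the priority 40 job runs ahead of the priority 50 job.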

data/lib/rocket_job/jobs/dirmon_job.rb ADDED

@@ -0,0 +1,174 @@
+require 'fileutils'
+module RocketJob
+  module Jobs
+    # Dirmon monitors folders for files matching the criteria specified in DirmonEntry
+    #
+    # * The first time Dirmon runs it gathers the names of files in the monitored
+    #   folders.
+    # * On completion Dirmon kicks off a new Dirmon job passing it the list
+    #   of known files.
+    # * On each subsequent Dirmon run it checks the size of each file against the
+    #   previous list of known files, and only if the file size has not changed
+    #   is the corresponding job started for that file.
+    # * If the job implements #file_store_upload or #upload, that method is called
+    #   and then the file is deleted, or moved to the archive_directory if supplied
+    # * Otherwise, the file is moved to the supplied archive_directory (defaults to
+    #   `_archive` in the same folder as the file itself). The absolute path and
+    #   file name of the archived file is passed into the job as its first argument.
+    #   Note: This means that such jobs _must_ have a Hash as the first argument
+    #
+    # With RocketJob Pro, the file is automatically uploaded into the job itself
+    # using the job's #upload method, after which the file is archived or deleted
+    # if no archive_directory was specified in the DirmonEntry.
+    #
+    # To start Dirmon for the first time
+    #
+    #
+    # Note:
+    #   Do _not_ start multiple copies of Dirmon as it will result in duplicate
+    #   jobs being started.
+    class DirmonJob < RocketJob::Job
+      DEFAULT_ARCHIVE_DIR = '_archive'.freeze
+      rocket_job do |job|
+        job.priority = 40
+      end
+      # Number of seconds between directory scans. Default 5 mins
+      key :check_seconds,         Float, default: 300.0
+      # TODO Make :perform_later, :perform_now, :perform, :now protected/private
+      #      class << self
+      #        # Ensure that only one instance of the job is running.
+      #        protected :perform_later, :perform_now, :perform, :now
+      #      end
+      #self.send(:protected, :perform_later)
+      # Start the single instance of this job
+      # Returns true if the job was started
+      # Returns false if the job is already running and does not need to be started
+      def self.start(&block)
+        # Prevent multiple Dirmon Jobs from running at the same time
+        return false if where(state: [ :running, :queued ]).count > 0
+        perform_later({}, &block)
+        true
+      end
+      # Iterate over each Dirmon entry looking for new files
+      # If a new file is found, it is not processed immediately, instead
+      # it is passed to the next run of this job along with the file size.
+      # If the file size has not changed, the Job is kicked off.
+      def perform(previous_file_names={})
+        new_file_names = check_directories(previous_file_names)
+      ensure
+        # Run again in the future, even if this run fails with an exception
+        self.class.perform_later(new_file_names || previous_file_names) do |job|
+          job.priority      = priority
+          job.check_seconds = check_seconds
+          job.run_at        = Time.now + check_seconds
+        end
+      end
+      # Checks the directories for new files, starting jobs if files have not changed
+      # since the last run
+      def check_directories(previous_file_names)
+        new_file_names = {}
+        DirmonEntry.where(enabled: true).each do |entry|
+          logger.tagged("Entry:#{entry.id}") do
+            Dir[entry.path].each do |file_name|
+              next if File.directory?(file_name)
+              next if file_name.include?(DEFAULT_ARCHIVE_DIR)
+              # BSON Keys cannot contain periods
+              key = file_name.gsub('.', '_')
+              previous_size = previous_file_names[key]
+              if size = check_file(entry, file_name, previous_size)
+                new_file_names[key] = size
+              end
+            end
+          end
+        end
+        new_file_names
+      end
+      # Checks if a file should result in starting a job
+      # Returns [Integer] file size, or nil if the file started a job
+      def check_file(entry, file_name, previous_size)
+        size = File.size(file_name)
+        if previous_size && (previous_size == size)
+          logger.info("File stabilized: #{file_name}. Starting: #{entry.job}")
+          start_job(entry, file_name)
+          nil
+        else
+          logger.info("Found file: #{file_name}. File size: #{size}")
+          # Keep for the next run
+          size
+        end
+      rescue Errno::ENOENT => exc
+        # File may have been deleted since the scan was performed
+        nil
+      end
+      # Starts the job for the supplied entry
+      def start_job(entry, file_name)
+        entry.job.constantize.perform_later(*entry.arguments) do |job|
+          # Set properties, also allows :perform_method to be overridden
+          entry.properties.each_pair { |k, v| job.send("#{k}=".to_sym, v) }
+          upload_file(job, file_name, entry.archive_directory)
+        end
+      end
+      # Upload the file to the job
+      def upload_file(job, file_name, archive_directory)
+        if job.respond_to?(:file_store_upload)
+          # Allow the job to determine what to do with the file
+          job.file_store_upload(file_name)
+          archive_file(file_name, archive_directory)
+        elsif job.respond_to?(:upload)
+          # With RocketJob Pro the file can be uploaded directly into the Job itself
+          job.upload(file_name)
+          archive_file(file_name, archive_directory)
+        else
+          upload_default(job, file_name, archive_directory)
+        end
+      end
+      # Archives the file for a job where there was no #file_store_upload or #upload method
+      def upload_default(job, file_name, archive_directory)
+        # The first argument must be a hash
+        job.arguments << {} if job.arguments.size == 0
+        # If no archive directory is supplied, use DEFAULT_ARCHIVE_DIR under the same path as the file
+        archive_directory ||= File.join(File.dirname(file_name), DEFAULT_ARCHIVE_DIR)
+        file_name = File.join(archive_directory, File.basename(file_name))
+        job.arguments.first[:full_file_name] = File.absolute_path(file_name)
+        archive_file(file_name, archive_directory)
+      end
+      # Move the file to the archive directory
+      # Or, delete it if no archive directory was supplied for this entry
+      #
+      # If the file_name contains a relative path the relative path will be
+      # created in the archive_directory before moving the file.
+      #
+      # If an absolute path is supplied, then the file is just moved into the
+      # archive directory without any sub-directories
+      def archive_file(file_name, archive_directory)
+        # Move file to archive directory if set
+        if archive_directory
+          # Absolute path?
+          target_file_name = if file_name.start_with?('/')
+            File.join(archive_directory, File.basename(file_name))
+          else
+            File.join(archive_directory, file_name)
+          end
+          FileUtils.mkdir_p(File.dirname(target_file_name))
+          FileUtils.move(file_name, target_file_name)
+        else
+          File.delete(file_name)
+        end
+      end
+    end
+  end
+end
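
Two ideas in this file can be tried out without MongoDB: a file is only handed to a job once its size is unchanged between two consecutive scans, and the archive location depends on whether the file name is absolute or relative. The sketch below is a simplified illustration, not the gem's implementation. `scan`, `archive_target`, and `stabilization_demo` are hypothetical helpers, and the real code additionally replaces `.` with `_` in the hash keys because BSON keys cannot contain periods.

```ruby
require 'tmpdir'

# One scan pass. `previous` maps file names to the size seen on the prior scan.
# Returns [sizes_to_remember, files_ready_to_process].
def scan(pattern, previous)
  remembered = {}
  ready      = []
  Dir[pattern].each do |file_name|
    next if File.directory?(file_name)
    size = File.size(file_name)
    if previous[file_name] == size
      ready << file_name            # unchanged since the last scan
    else
      remembered[file_name] = size  # new or still growing, check again later
    end
  end
  [remembered, ready]
end

# Where a file would be archived, following the rule in #archive_file:
# absolute paths keep only the base name, relative paths keep their structure.
def archive_target(file_name, archive_directory)
  if file_name.start_with?('/')
    File.join(archive_directory, File.basename(file_name))
  else
    File.join(archive_directory, file_name)
  end
end

# Writes a file in two steps and scans three times.
# Returns the "ready" list produced by each scan.
def stabilization_demo
  Dir.mktmpdir do |dir|
    pattern = File.join(dir, '*')
    file    = File.join(dir, 'upload.csv')

    File.write(file, 'partial')
    seen, first = scan(pattern, {})      # first sighting: remembered only
    File.write(file, 'partial plus the rest')
    seen, second = scan(pattern, seen)   # size changed: still not ready
    _seen, third = scan(pattern, seen)   # size unchanged: ready
    [first, second, third]
  end
end

first, second, third = stabilization_demo
puts first.inspect, second.inspect, third.inspect
puts archive_target('/exports/in/a.csv', '/exports/archive')
puts archive_target('in/a.csv', '/exports/archive')
```

A consequence of this design is that a file is picked up no sooner than the second scan after it stops growing, so the latency is tied to `check_seconds`.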

data/lib/rocket_job/version.rb CHANGED

@@ -1,4 +1,4 @@
 # encoding: UTF-8
 module RocketJob #:nodoc
-  VERSION = "0.7.0"
+  VERSION = "0.8.0"
 end

data/lib/rocketjob.rb CHANGED

@@ -8,6 +8,7 @@ require 'rocket_job/version'
 module RocketJob
   autoload :CLI,                   'rocket_job/cli'
   autoload :Config,                'rocket_job/config'
+  autoload :DirmonEntry,           'rocket_job/dirmon_entry'
   autoload :Heartbeat,             'rocket_job/heartbeat'
   autoload :Job,                   'rocket_job/job'
   autoload :JobException,          'rocket_job/job_exception'
@@ -15,4 +16,7 @@ module RocketJob
   module Concerns
     autoload :Worker,              'rocket_job/concerns/worker'
   end
+  module Jobs
+    autoload :DirmonJob,           'rocket_job/jobs/dirmon_job'
+  end
 end

data/test/config/mongo.yml CHANGED

@@ -11,35 +11,35 @@ default_options: &default_options
   :reconnect_max_retry_seconds: 5
 development:
-  uri: mongodb://localhost:27017/development_rocket_job
+  uri: mongodb://localhost:27017/development_rocketjob
   options:
     <<: *default_options
 development_work:
-  uri: mongodb://localhost:27017/development_rocket_job_work
+  uri: mongodb://localhost:27017/development_rocketjob_work
   options:
     <<: *default_options
 test:
-  uri: mongodb://localhost:27017/test_rocket_job
+  uri: mongodb://localhost:27017/test_rocketjob
   options:
     <<: *default_options
 test_work:
-  uri: mongodb://localhost:27017/test_rocket_job_work
+  uri: mongodb://localhost:27017/test_rocketjob_work
   options:
     <<: *default_options
 # Sample Production Settings
 production:
-  uri: mongodb://mongo1.site.com:27017,mongo2.site.com:27017/production_rocket_job
+  uri: mongodb://mongo1.site.com:27017,mongo2.site.com:27017/production_rocketjob
   options:
     <<: *default_options
     :pool_size:    50
     :pool_timeout: 5
 production_work:
-  uri: mongodb://mongo_local.site.com:27017/production_rocket_job_work
+  uri: mongodb://mongo_local.site.com:27017/production_rocketjob_work
   options:
     <<: *default_options
     :pool_size:    50

data/test/dirmon_job_test.rb ADDED

@@ -0,0 +1,299 @@
+require_relative 'test_helper'
+require_relative 'jobs/test_job'
+# Unit Test for RocketJob::Jobs::DirmonJob
+class DirmonJobTest < Minitest::Test
+  context RocketJob::Jobs::DirmonJob do
+    setup do
+      @server = RocketJob::Server.new
+      @server.started
+      @dirmon_job   = RocketJob::Jobs::DirmonJob.new
+      @archive_directory = '/tmp/archive_directory'
+      @entry        = RocketJob::DirmonEntry.new(
+        path:         'abc/*',
+        job:          'Jobs::TestJob',
+        arguments:    [ { input: 'yes' } ],
+        properties:   { priority: 23, perform_method: :event },
+        archive_directory: @archive_directory
+      )
+      @job = Jobs::TestJob.new
+      @paths = {
+        'abc/*' => %w(abc/file1 abc/file2)
+      }
+    end
+    teardown do
+      @dirmon_job.destroy if @dirmon_job && !@dirmon_job.new_record?
+      FileUtils.remove_dir(@archive_directory, true) if Dir.exist?(@archive_directory)
+    end
+    context '.config' do
+      should 'support multiple databases' do
+        assert_equal 'test_rocketjob', RocketJob::DirmonEntry.collection.db.name
+      end
+    end
+    context '#archive_file' do
+      should 'archive absolute path file' do
+        begin
+          file = Tempfile.new('archive')
+          file_name = file.path
+          File.open(file_name, 'w') { |file| file.write('Hello World') }
+          assert File.exists?(file_name)
+          @dirmon_job.archive_file(file_name, @archive_directory)
+          archive_file_name = File.join(@archive_directory, File.basename(file_name))
+          assert File.exists?(archive_file_name), archive_file_name
+        ensure
+          file.delete if file
+        end
+      end
+      should 'archive relative path file' do
+        begin
+          relative_path = 'tmp'
+          FileUtils.mkdir_p(relative_path)
+          file_name = File.join(relative_path, 'dirmon_job_test.txt')
+          File.open(file_name, 'w') { |file| file.write('Hello World') }
+          @dirmon_job.archive_file(file_name, @archive_directory)
+          archive_file_name = File.join(@archive_directory, file_name)
+          assert File.exists?(archive_file_name), archive_file_name
+        ensure
+          File.delete(file_name) if file_name && File.exists?(file_name)
+        end
+      end
+    end
+    context '#upload_default' do
+      should 'upload default case with no archive_directory' do
+        job              = Jobs::TestJob.new
+        file_name        = 'abc/myfile.txt'
+        archived_file_name = 'abc/_archive/myfile.txt'
+        @dirmon_job.stub(:archive_file, -> fn, tp { assert_equal [archived_file_name, 'abc/_archive'], [fn, tp] }) do
+          @dirmon_job.upload_default(job, file_name, nil)
+        end
+        assert_equal File.absolute_path(archived_file_name), job.arguments.first[:full_file_name]
+      end
+      should 'upload default case with archive_directory' do
+        job              = Jobs::TestJob.new
+        file_name        = 'abc/myfile.txt'
+        archived_file_name = "#{@archive_directory}/myfile.txt"
+        @dirmon_job.stub(:archive_file, -> fn, tp { assert_equal [archived_file_name, @archive_directory], [fn, tp] }) do
+          @dirmon_job.upload_default(job, file_name, @archive_directory)
+        end
+        assert_equal File.absolute_path(archived_file_name), job.arguments.first[:full_file_name]
+      end
+    end
+    context '#upload_file' do
+      should 'upload using #file_store_upload' do
+        job              = Jobs::TestJob.new
+        job.define_singleton_method(:file_store_upload) do |file_name|
+          file_name
+        end
+        file_name        = 'abc/myfile.txt'
+        @dirmon_job.stub(:archive_file, -> fn, tp { assert_equal [file_name, @archive_directory], [fn, tp] }) do
+          @dirmon_job.upload_file(job, file_name, @archive_directory)
+        end
+      end
+      should 'upload using #upload' do
+        job              = Jobs::TestJob.new
+        job.define_singleton_method(:upload) do |file_name|
+          file_name
+        end
+        file_name        = 'abc/myfile.txt'
+        @dirmon_job.stub(:archive_file, -> fn, tp { assert_equal [file_name, @archive_directory], [fn, tp] }) do
+          @dirmon_job.upload_file(job, file_name, @archive_directory)
+        end
+      end
+    end
+    context '#start_job' do
+      setup do
+        RocketJob::Config.inline_mode = true
+      end
+      teardown do
+        RocketJob::Config.inline_mode = false
+      end
+      should 'upload using #upload' do
+        file_name = 'abc/myfile.txt'
+        job = @dirmon_job.stub(:upload_file, -> j, fn, sp { assert_equal [file_name, @archive_directory], [fn, sp] }) do
+          @dirmon_job.start_job(@entry, file_name)
+        end
+        assert_equal @entry.job, job.class.name
+        assert_equal 23, job.priority
+        assert_equal [ {:input=>"yes", "before_event"=>true, "event"=>true, "after_event"=>true} ], job.arguments
+      end
+    end
+    context '#check_file' do
+      should 'check growing file' do
+        previous_size = 5
+        new_size      = 10
+        file          = Tempfile.new('check_file')
+        file_name     = file.path
+        File.open(file_name, 'w') { |file| file.write('*' * new_size) }
+        assert_equal new_size, File.size(file_name)
+        result = @dirmon_job.check_file(@entry, file_name, previous_size)
+        assert_equal new_size, result
+      end
+      should 'check completed file' do
+        previous_size = 10
+        new_size      = 10
+        file          = Tempfile.new('check_file')
+        file_name     = file.path
+        File.open(file_name, 'w') { |file| file.write('*' * new_size) }
+        assert_equal new_size, File.size(file_name)
+        started = false
+        result = @dirmon_job.stub(:start_job, -> e,fn { started = true } ) do
+          @dirmon_job.check_file(@entry, file_name, previous_size)
+        end
+        assert_equal nil, result
+        assert started
+      end
+      should 'check deleted file' do
+        previous_size = 5
+        file_name     = 'blah'
+        result = @dirmon_job.check_file(@entry, file_name, previous_size)
+        assert_equal nil, result
+      end
+    end
+    context '#check_directories' do
+      setup do
+        @entry.save!
+      end
+      teardown do
+        @entry.destroy if @entry
+      end
+      should 'no files' do
+        previous_file_names = {}
+        result = nil
+        Dir.stub(:[], -> dir { [] }) do
+          result = @dirmon_job.check_directories(previous_file_names)
+        end
+        assert_equal 0, result.count
+      end
+      should 'new files' do
+        previous_file_names = {}
+        result = nil
+        Dir.stub(:[], -> dir { @paths[dir] }) do
+          result = @dirmon_job.stub(:check_file, -> e, fn, ps { 5 } ) do
+            @dirmon_job.check_directories(previous_file_names)
+          end
+        end
+        assert_equal result.count, @paths['abc/*'].count
+        result.each_pair do |k,v|
+          assert_equal 5, v
+        end
+      end
+      should 'allow files to grow' do
+        previous_file_names = {}
+        @paths['abc/*'].each { |file_name| previous_file_names[file_name] = 5}
+        result = nil
+        Dir.stub(:[], -> dir { @paths[dir] }) do
+          result = @dirmon_job.stub(:check_file, -> e, fn, ps { 10 } ) do
+            @dirmon_job.check_directories(previous_file_names)
+          end
+        end
+        assert_equal result.count, @paths['abc/*'].count
+        result.each_pair do |k,v|
+          assert_equal 10, v
+        end
+      end
+      should 'start all files' do
+        previous_file_names = {}
+        @paths['abc/*'].each { |file_name| previous_file_names[file_name] = 10 }
+        result = nil
+        Dir.stub(:[], -> dir { @paths[dir] }) do
+          result = @dirmon_job.stub(:check_file, -> e, fn, ps { nil } ) do
+            @dirmon_job.check_directories(previous_file_names)
+          end
+        end
+        assert_equal 0, result.count
+      end
+      should 'skip files in archive directory' do
+        previous_file_names = {}
+        @paths['abc/*'].each { |file_name| previous_file_names[file_name] = 5}
+        result = nil
+        # Add a file in the archive directory
+        @paths['abc/*'] << File.join('abc', RocketJob::Jobs::DirmonJob::DEFAULT_ARCHIVE_DIR, 'test.zip')
+        Dir.stub(:[], -> dir { @paths[dir] }) do
+          result = @dirmon_job.stub(:check_file, -> e, fn, ps { 10 } ) do
+            @dirmon_job.check_directories(previous_file_names)
+          end
+        end
+        assert_equal result.count, @paths['abc/*'].count - 1
+        result.each_pair do |k,v|
+          assert_equal 10, v
+        end
+      end
+    end
+    context '#perform' do
+      should 'check directories and reschedule' do
+        dirmon_job       = nil
+        previous_file_names = {}
+        @paths['abc/*'].each { |file_name| previous_file_names[file_name] = 5 }
+        new_file_names = {}
+        @paths['abc/*'].each { |file_name| new_file_names[file_name] = 10 }
+        RocketJob::Jobs::DirmonJob.destroy_all
+        RocketJob::Jobs::DirmonJob.stub_any_instance(:check_directories, new_file_names) do
+          # perform_now does not save the job, just runs it
+          dirmon_job = RocketJob::Jobs::DirmonJob.perform_now(previous_file_names) do |job|
+            job.priority = 11
+            job.check_seconds = 30
+          end
+        end
+        assert dirmon_job.completed?, dirmon_job.status.inspect
+        # It should have enqueued another instance to run in the future
+        assert_equal 1, RocketJob::Jobs::DirmonJob.count
+        assert new_dirmon_job = RocketJob::Jobs::DirmonJob.last
+        assert_equal false, dirmon_job.id == new_dirmon_job.id
+        assert new_dirmon_job.run_at
+        assert_equal 11, new_dirmon_job.priority
+        assert_equal 30, new_dirmon_job.check_seconds
+        assert new_dirmon_job.queued?
+        new_dirmon_job.destroy
+      end
+      should 'check directories and reschedule even on exception' do
+        dirmon_job = nil
+        RocketJob::Jobs::DirmonJob.destroy_all
+        RocketJob::Jobs::DirmonJob.stub_any_instance(:check_directories, -> previous { raise RuntimeError.new("Oh no") }) do
+          # perform_now does not save the job, just runs it
+          dirmon_job = RocketJob::Jobs::DirmonJob.perform_now do |job|
+            job.priority = 11
+            job.check_seconds = 30
+          end
+        end
+        assert dirmon_job.failed?, dirmon_job.status.inspect
+        # It should have enqueued another instance to run in the future
+        assert_equal 2, RocketJob::Jobs::DirmonJob.count
+        assert new_dirmon_job = RocketJob::Jobs::DirmonJob.last
+        assert new_dirmon_job.run_at
+        assert_equal 11, new_dirmon_job.priority
+        assert_equal 30, new_dirmon_job.check_seconds
+        assert new_dirmon_job.queued?
+        new_dirmon_job.destroy
+      end
+    end
+  end
+end
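
The `#check_file` tests in the diff above pin down three outcomes: a file whose size differs from the previously recorded size returns its new size, a file whose size is unchanged starts a job and returns `nil`, and a missing file returns `nil`. A minimal sketch of that size-stability rule follows. It is an illustration inferred from the tests only, not the gem's implementation. The name `check_file_sketch` and the block callback standing in for `start_job` are hypothetical.

```ruby
require 'tempfile'

# Hypothetical helper mirroring the outcomes asserted in the '#check_file' tests.
# Returns the current size while the file is still changing, and nil once the
# file is stable (after yielding it for processing) or when it no longer exists.
def check_file_sketch(file_name, previous_size)
  size = File.size(file_name)
  if previous_size && previous_size == size
    # Size unchanged since the last poll: treat the file as complete.
    yield file_name if block_given?
    nil
  else
    # New or still growing: report the size so the next poll can compare.
    size
  end
rescue Errno::ENOENT
  # The file disappeared between polls.
  nil
end

# Usage: poll the same file twice with different remembered sizes.
file = Tempfile.new('check_file_sketch')
File.write(file.path, '*' * 10)

check_file_sketch(file.path, 5)                               # still growing, returns 10
check_file_sketch(file.path, 10) { |fn| puts "start #{fn}" }  # stable, returns nil
check_file_sketch('no_such_file_for_sketch', 5)               # missing, returns nil
```

Returning the size for changing files and `nil` otherwise lets the caller rebuild its hash of file names to sizes on each poll, which matches how the `#check_directories` tests expect completed files to drop out of the result.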

data/test/job_test.rb CHANGED

@@ -22,7 +22,7 @@ class JobTest < Minitest::Test
     context '.config' do
       should 'support multiple databases' do
-        assert_equal 'test_rocket_job', RocketJob::Job.collection.db.name
+        assert_equal 'test_rocketjob', RocketJob::Job.collection.db.name
       end
     end
@@ -55,10 +55,8 @@ class JobTest < Minitest::Test
         assert_equal @arguments, @job.arguments
         assert_equal 0, @job.percent_complete
         assert_equal 50, @job.priority
-        assert_equal true, @job.repeatable
         assert_equal 0, @job.failure_count
         assert_nil   @job.run_at
-        assert_nil   @job.schedule
         assert_nil   @job.started_at
         assert_equal :queued, @job.state
       end
@@ -181,6 +179,14 @@ class JobTest < Minitest::Test
         assert_equal @job.id, job.id
       end
+      should 'Skip expired jobs' do
+        count = RocketJob::Job.count
+        @job.expires_at = Time.now - 100
+        @job.save!
+        assert_equal nil, RocketJob::Job.next_job(@server.name)
+        assert_equal count, RocketJob::Job.count
+      end
     end
   end
 end

data/test/server_test.rb CHANGED

@@ -23,7 +23,7 @@ class ServerTest < Minitest::Test
     context '.config' do
       should 'support multiple databases' do
-        assert_equal 'test_rocket_job', RocketJob::Job.collection.db.name
+        assert_equal 'test_rocketjob', RocketJob::Job.collection.db.name
       end
     end

data/test/worker_test.rb CHANGED

@@ -29,10 +29,8 @@ class WorkerTest < Minitest::Test
           assert_nil   @job.expires_at
           assert_equal 0, @job.percent_complete
           assert_equal 50, @job.priority
-          assert_equal true, @job.repeatable
           assert_equal 0, @job.failure_count
           assert_nil   @job.run_at
-          assert_nil   @job.schedule
           assert_nil   @job.started_at
           assert_equal :queued, @job.state
@@ -50,10 +48,8 @@ class WorkerTest < Minitest::Test
           assert_nil   @job.expires_at
           assert_equal 100, @job.percent_complete
           assert_equal 50, @job.priority
-          assert_equal true, @job.repeatable
           assert_equal 0, @job.failure_count
           assert_nil   @job.run_at
-          assert_nil   @job.schedule
           assert       @job.started_at
         end
       end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: rocketjob
 version: !ruby/object:Gem::Version
-  version: 0.7.0
+  version: 0.8.0
 platform: ruby
 authors:
 - Reid Morrison
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2015-07-14 00:00:00.000000000 Z
+date: 2015-07-20 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: aasm
@@ -123,13 +123,16 @@ files:
 - lib/rocket_job/cli.rb
 - lib/rocket_job/concerns/worker.rb
 - lib/rocket_job/config.rb
+- lib/rocket_job/dirmon_entry.rb
 - lib/rocket_job/heartbeat.rb
 - lib/rocket_job/job.rb
 - lib/rocket_job/job_exception.rb
+- lib/rocket_job/jobs/dirmon_job.rb
 - lib/rocket_job/server.rb
 - lib/rocket_job/version.rb
 - lib/rocketjob.rb
 - test/config/mongo.yml
+- test/dirmon_job_test.rb
 - test/job_test.rb
 - test/jobs/test_job.rb
 - test/server_test.rb
@@ -161,6 +164,7 @@ specification_version: 4
 summary: High volume, priority based, Enterprise Batch Processing solution for Ruby
 test_files:
 - test/config/mongo.yml
+- test/dirmon_job_test.rb
 - test/job_test.rb
 - test/jobs/test_job.rb
 - test/server_test.rb