RubyGems - jobserver - Versions diffs - 0.1.4 - Mend

jobserver 0.1.4

Files changed (8)

data/LICENSE.txt ADDED

@@ -0,0 +1,58 @@
+Ruby is copyrighted free software by Yukihiro Matsumoto <matz@netlab.co.jp>.
+You can redistribute it and/or modify it under either the terms of the GPL
+(see COPYING.txt file), or the conditions below:
+  1. You may make and give away verbatim copies of the source form of the
+     software without restriction, provided that you duplicate all of the
+     original copyright notices and associated disclaimers.
+  2. You may modify your copy of the software in any way, provided that
+     you do at least ONE of the following:
+       a) place your modifications in the Public Domain or otherwise
+          make them Freely Available, such as by posting said
+	  modifications to Usenet or an equivalent medium, or by allowing
+	  the author to include your modifications in the software.
+       b) use the modified software only within your corporation or
+          organization.
+       c) rename any non-standard executables so the names do not conflict
+	  with standard executables, which must also be provided.
+       d) make other distribution arrangements with the author.
+  3. You may distribute the software in object code or executable
+     form, provided that you do at least ONE of the following:
+       a) distribute the executables and library files of the software,
+	  together with instructions (in the manual page or equivalent)
+	  on where to get the original distribution.
+       b) accompany the distribution with the machine-readable source of
+	  the software.
+       c) give non-standard executables non-standard names, with
+          instructions on where to get the original software distribution.
+       d) make other distribution arrangements with the author.
+  4. You may modify and include the part of the software into any other
+     software (possibly commercial).  But some files in the distribution
+     are not written by the author, so that they are not under this terms.
+     They are gc.c(partly), utils.c(partly), regex.[ch], st.[ch] and some
+     files under the ./missing directory.  See each file for the copying
+     condition.
+  5. The scripts and library files supplied as input to or produced as
+     output from the software do not automatically fall under the
+     copyright of the software, but belong to whomever generated them,
+     and may be sold commercially, and may be aggregated with this
+     software.
+  6. THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR
+     IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
+     WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+     PURPOSE.

data/README ADDED

@@ -0,0 +1,58 @@
+JobServer README
+================
+ The JobServer class executes jobs on the local host or on
+ remote hosts.
+ * Jobs encapsulate the call of a command in a shell on a (remote) host.
+ * Each client is controlled by a worker-thread on the server.
+   When a client is idle it receives the next job from the queue.
+ * Remote jobs are launched using ssh. You must therefore set up
+   ssh key authentication that does not prompt for a password.
+ * A common home directory is not needed for the clients.
+   Different machine architectures as well as binaries are possible.
+ * Data for the clients can be saved to files on the client machines.
+Requirements
+------------
+  * Ruby 1.8
+  * ssh
+Install
+-------
+  Unpack the archive and enter its top-level directory.
+  Then type:
+   ($ su)
+    # ruby setup.rb
+  This simple step installs this program under the default
+  location of Ruby libraries.  You can also install files into
+  your favorite directory by supplying setup.rb some options.
+  Try "ruby setup.rb --help".
+  Alternatively you can use the remote installer RubyGems
+  [http://rubygems.rubyforge.org/] for installation.  Having RubyGems installed
+  on your system, just type:
+   ($ su)
+   # gem install jobserver --remote
+Usage
+-----
+  In order to get an overview of the features you can generate
+  the RDoc documentation and have a look at the examples/ directory.
+License
+-------
+  Ruby License
+Christian Bang, cbang AT web.de
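
The README states that each client is driven by a worker thread that takes the next job from the queue whenever it is idle. The following is a minimal sketch of that pattern using only Ruby's standard `Thread` and `Queue` classes. It is not the gem's implementation: `run_jobs`, the worker names, and the block that stands in for running a command are all hypothetical.

```ruby
require 'thread'

# Hypothetical helper: distribute +jobs+ over one thread per worker name.
# The block stands in for "run this job on this worker" and returns a result.
def run_jobs(jobs, worker_names, &work)
  queue = Queue.new
  jobs.each { |job| queue << job }
  results = Queue.new

  threads = worker_names.map do |worker|
    Thread.new do
      loop do
        begin
          job = queue.pop(true) # non-blocking pop raises ThreadError when empty
        rescue ThreadError
          break                 # no jobs left: this worker is done
        end
        results << [worker, job, work.call(worker, job)]
      end
    end
  end

  threads.each(&:join)
  Array.new(results.size) { results.pop }
end

done = run_jobs(%w[job-0 job-1 job-2 job-3], %w[local remote-a]) do |worker, job|
  "#{job} ran on #{worker}"
end
done.each { |_worker, _job, message| puts message }
```

Which worker picks up which job depends on thread scheduling, so the assignment can differ between runs; only the set of completed jobs is fixed.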

data/examples/example1.rb ADDED

@@ -0,0 +1,19 @@
+require 'jobserver'
+# Create the jobs
+myJobQueue = []
+5.times{|i| myJobQueue << Job.new(:name=>"date job-#{i}",       :client_command=> "date")}
+5.times{|i| myJobQueue << Job.new(:name=>"short date job-#{i}", :client_command=> "date +%Y-%m-%d_%H.%M")}
+# Create the server
+server = JobServer.new(myJobQueue) #run one local worker implicitly
+# You may add some remote machines to which you have ssh access with public key authentication without password.
+# server.add_ssh_worker("192.168.0.1")
+# server.add_ssh_worker("foo@mymachine.xy.org","",2)
+# Dump out statistics on the progress.
+# Default is to file "jobserver_stats.txt", every minute
+server.dumpStatistics
+# Wait until all jobs have finished
+server.serve
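
The commented-out `add_ssh_worker` calls above rely on ssh to start commands on remote machines. As a rough illustration of what launching a command over ssh involves, the sketch below builds such a command line with Ruby's standard `Shellwords` module. The helper `ssh_command_line` and the directory `/tmp/work` are assumptions made for this example; the gem may construct its ssh invocation differently.

```ruby
require 'shellwords'

# Hypothetical helper: build a local command line that runs +command+ on
# +host+ via ssh, optionally changing into +dir+ on the remote side first.
def ssh_command_line(host, command, dir = nil)
  remote = dir.to_s.empty? ? command : "cd #{Shellwords.escape(dir)} && #{command}"
  # shelljoin quotes each element so the local shell passes it on unchanged.
  ["ssh", host, remote].shelljoin
end

line = ssh_command_line("foo@mymachine.xy.org", "date +%Y-%m-%d_%H.%M", "/tmp/work")
puts line
```

The remote command is passed as a single ssh argument, so the remote shell, not the local one, interprets the `cd ... && ...` sequence.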

data/examples/example2.rb ADDED

@@ -0,0 +1,66 @@
+require 'jobserver'
+# define the handlers for each job:
+pre_run_handler = Proc.new do |job|
+  puts "Running job #{job.name} on #{job.host}"
+  # Initialize the results object as an empty array:
+  job.results = []
+end
+output_handler = Proc.new do |file, job|
+  line = file.gets
+  # Output the calculation, if the line contains one.
+  puts "Job #{job.name} has made the calculation: #{line}" if line =~ /=/
+  # Extract the date if the line contains one. Collect the results in the
+  # job.results variable.
+  job.results << $& if line =~ /\d?\d:\d\d:\d\d/
+end
+# Just another way of defining a handler. For this you must use method(:post_run_handler) later...
+def post_run_handler(job)
+  if job.results.empty?
+    puts "Error executing job #{job.name} on #{job.host}.\n\t#{job}"
+  else
+    # Now that the job has finished, store its results in the global
+    # hash of results, keyed by host. Create the list on first use so
+    # that the first job's results for a host are not dropped:
+    $result[job.host] ||= []
+    $result[job.host] << job.results
+  end
+end
+# execute the following commands for every job:
+Job.default_client_command = "sleep 10;date;echo"
+Job.nicelevel = nil # in this example we don't need nice
+Job.verbose = 0     # print no messages about the launch of jobs
+# Create the jobs
+myJobQueue = []
+10.times{|i| myJobQueue << Job.new(:name=>"job-#{i}", :params=>"5*#{i}=$((5*#{i}))", :pre_run_handler => pre_run_handler,
+                                    :output_handler=>output_handler, :post_run_handler=>method(:post_run_handler))}
+# Create the server
+server = JobServer.new(myJobQueue) #run one local worker implicitly
+$result = {} # we will store results here
+# You may add some remote machines to which you have ssh access with public key authentication without password.
+# server.add_ssh_worker("192.168.0.1")
+# server.add_ssh_worker("foo@mymachine.xy.org","",2)
+# Dump out statistics on the progress.
+# Default is to file "jobserver_stats.txt", every minute
+server.dumpStatistics
+# Wait until all jobs have finished
+server.serve
+puts "Times on the hosts when the jobs were run:"
+for host,time in $result
+  puts "#{host}: #{time.join(', ')}"
+end
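
The output handler in this example extracts a time stamp from each line a job prints. The sketch below isolates that matching step so it can be tried without a job server: it feeds canned output through the same regular expression. `FakeJob`, `collect_times`, and the sample text are hypothetical stand-ins, not part of the jobserver API.

```ruby
require 'stringio'

# Hypothetical stand-in for a job: only the fields this sketch touches.
FakeJob = Struct.new(:name, :results)

# Same pattern as in example2.rb: an H:MM:SS or HH:MM:SS substring.
TIME_PATTERN = /\d?\d:\d\d:\d\d/

# Read every line from +io+ and collect the first time-like match per line.
def collect_times(io, job)
  io.each_line do |line|
    match = line[TIME_PATTERN] # nil when the line contains no time
    job.results << match if match
  end
  job
end

sample_output = StringIO.new("Mon Jan  2 09:05:07 UTC 2006\n5*3=15\n")
job = collect_times(sample_output, FakeJob.new("job-3", []))
puts job.results.inspect
```

Using `line[TIME_PATTERN]` instead of `=~` with `$&` avoids depending on a global match variable, which is easier to reason about when several handlers run in different threads.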

data/examples/example3.rb ADDED

@@ -0,0 +1,328 @@
+#!/usr/bin/env ruby
+=begin
+  This non-executable example is an excerpt from a real application of the jobserver.
+  It was used to run experiments for combinatorial optimization problems.
+  This example derives a new class from Job which enhances the abilities of the jobserver by:
+  - robust detection of failed jobs
+  - jobs with various resource requirements will be assigned only to hosts that meet
+    their requirements
+  *Note*: This example assumes that all hosts share a common home directory.
+  If you would like to have the original file on which this example is based, write me:
+  Christian Bang, cbang@web.de
+=end
+############################# BEGIN PARAMETER SETTINGS ###################################
+experiment = $*[0].to_i
+# In this section you can add your own experiments. This way you can easily repeat an old experiment without having to
+# change the parameters of the last experiment. The number of the experiment to run is given by command line.
+case experiment # select the experiment you want to make here, or add a new one below
+when 0
+  $stderr.puts "Please give the number of the experiment you want to run as a command line argument."
+  exit
+when 1 ##### Experiment 1
+  # set variables which define the experiment
+  $param1_list = %w{1 3 5 7}
+  $param2_list = %w{xx xy yy}
+  $instances = %w{test01 test02}
+when 2 ##### Experiment 2
+   # set variables which define the experiment
+when 3 ##### Experiment 3
+   # set variables which define the experiment
+else
+  raise "Undefined experiment"
+end
+puts "Running experiment ##{experiment}"
+########################### now the environment dependent variables ####################
+$projectDir = "#{ENV['HOME']}/project"
+$experimentsDir = "#$projectDir/experiments" # where to store the experiments (in subdirectories)
+$experimentName = "MY_PROJECT" # prefix for the job output filenames (if logging enabled), NOT the name of the binary
+$workDir = "#$projectDir/src" # working directory for the processes
+# useHosts is an array of the hosts you want to use for the experiment. If a host has multiple CPUs, all are used.
+# For the definition of the hosts see the $knownHosts matrix below.
+useHosts = %w{hannibal caesar homer asterix}
+#JOB_LOGGING determines whether the output of each job should be kept in a log file (in $logDir)
+JOB_LOGGING = false
+# time when the experiment began. All log filenames contain this
+$timestamp = `date +%Y-%m-%d_%H.%M`.chomp
+$logDir = "#$experimentsDir/log.#{$experimentName}_#{$timestamp}"
+############################# END PARAMETER SETTINGS ###################################
+# usually you won't have to change anything below here
+$LOAD_PATH << File.dirname($0)
+require 'jobserver'
+require 'fileutils'
+# JobData has the following fields:
+# :host_requirements:
+#   An OR concatenated list of REQUIRE_ variables like <tt>REQUIRE_RUBY|REQUIRE_GNUPLOT</tt>.
+#   A job is only run on a host fulfilling these constraints.
+# :inputFiles:
+#   A string or an array of strings containing names of files that are needed to run
+#   the job. If they don't exist, the job fails. But in case a gzipped input file is found,
+#   it is unzipped first. (gzipped again later, if another job does it)
+# :outputFiles:
+#   A string or an array of strings containing names of files that will be created when the
+#   job is run. If all the files already exist and are not corrupt, the job is not run but
+#   marked as success. See next item for what corrupt means.
+# :successToken:
+#   A string or an array of strings corresponding to an entry in +outputFiles+.
+#   If a successToken for the output file is given and the file exists then a grep
+#   search for this token is made in the file. If the token was found, the file is deemed okay,
+#   else it is corrupt and will be deleted before execution of the job.
+# :alwaysOverride: if true, existing output files will be overwritten.
+# :moreData: is some job-type specific data you can use for your own jobs. (E.g. used for latex generation)
+JobData = Struct.new(:host_requirements, :inputFiles, :outputFiles, :successToken, :alwaysOverride, :moreData)
+#add new software requirements here and in the list of $knownHosts below
+REQUIRE_NONE = 0; REQUIRE_RUBY = 1; REQUIRE_GNUPLOT = 2; REQUIRE_MY_PROJECT = 4; REQUIRE_R = 8; REQUIRE_LATEX = 16
+# These tokens should identify a given file that is NOT corrupt. That means the file could be generated
+# completely. Add your own tokens here.
+EPS_FILE_SUCCESS_TOKEN      = "%%Trailer"
+LATEX_SUCCESS_TOKEN         = "end{document}"
+OUTPUT_FORMAT_SUCCESS_TOKEN = "end data"
+# A list of known (and allowed) hosts in your local network.
+$knownHosts = [
+  #hostname, numCPUs, available software
+  ["merlin",   1, REQUIRE_RUBY|REQUIRE_GNUPLOT|REQUIRE_MY_PROJECT|REQUIRE_R|REQUIRE_LATEX],
+  ["asterix",  2, REQUIRE_RUBY|REQUIRE_GNUPLOT|REQUIRE_MY_PROJECT|REQUIRE_R|REQUIRE_LATEX],
+  ["hannibal", 2, REQUIRE_RUBY|REQUIRE_GNUPLOT|REQUIRE_MY_PROJECT|REQUIRE_R              ],
+  ["caesar",   2, REQUIRE_RUBY|REQUIRE_GNUPLOT|REQUIRE_MY_PROJECT          |REQUIRE_LATEX],
+  ["nero",     1, REQUIRE_RUBY|REQUIRE_GNUPLOT|REQUIRE_MY_PROJECT|REQUIRE_R|REQUIRE_LATEX],
+  ["herodot",  2, REQUIRE_RUBY|REQUIRE_GNUPLOT|REQUIRE_MY_PROJECT          |REQUIRE_LATEX],
+  ["cicero",   1,                              REQUIRE_MY_PROJECT|REQUIRE_R|REQUIRE_LATEX],
+  ["brutus",   1,              REQUIRE_GNUPLOT|REQUIRE_MY_PROJECT|REQUIRE_R|REQUIRE_LATEX],
+  ["homer",    2,              REQUIRE_GNUPLOT|                   REQUIRE_R|REQUIRE_LATEX],
+  ["platon",   1,              REQUIRE_GNUPLOT|REQUIRE_MY_PROJECT|REQUIRE_R|REQUIRE_LATEX]
+]
+# If you want to do time sensitive analysis and hence want to use only machines of the same type
+# for an experiment you could insert a new constraint, e.g. REQUIRE_TIME, that is only satisfied on
+# those machines you want to run the experiment on. An alternative approach is to create an object-specific class
+# for those jobs that are time sensitive:
+# timeSensitiveJob = MyProjectJob.new(...)
+# def timeSensitiveJob.runsOnHost(hostname)
+#   return hostname == "myspecialhost" && super(hostname)
+# end
+# hostname variable, used below. Didn't use ENV['HOSTNAME'] since this didn't always work.
+$HOSTNAME = `echo $HOSTNAME`.chomp
+# This subclass of Job doesn't run a job if the output files are already there and intact.
+# Corrupt/incomplete files will be removed and the job will be repeated (up to 3 times).
+# It also writes the job output to a log file for each job in +$logDir+ (if JOB_LOGGING is enabled).
+# Furthermore the data field is expected to be of type +JobData+.
+# The restriction is that jobs can only be run in the same local network where the home directory
+# is common on all hosts.
+class MyProjectJob < Job
+  # project-run could be a shell script that decides which binary to call depending on the machine
+  # on which it is called.
+  @@default_client_command = "#{$projectDir}/bin/project-run"
+  def initialize(name, params, data, dependencies = nil, client_command = nil)
+    super(:data => data,
+          :dependencies => dependencies,
+          :pre_run_handler => method(:pre_run_handler),
+          :post_run_handler => method(:post_run_handler),
+          :client_command => client_command)
+    @outputLogFileName = JOB_LOGGING ?  "%s/job-%05d.log" % [$logDir,@number] : "/dev/null"
+    @name = "%05d-" % @number + name
+    @params = params + " 2>&1 >>#{@outputLogFileName}"
+  end
+  def pre_run_handler(job)
+    info = "date #{`date`.chomp}: Running job \"#{job.name}\" on #{job.host}"
+    puts info
+    if JOB_LOGGING
+      system("echo #{quote(info)}>>#@outputLogFileName")
+      system("echo \"CALL: #{job.client_command} #{job.params}\">>#@outputLogFileName")
+    end
+  end
+  def post_run_handler(job)
+    info = "date #{`date`.chomp}: "
+    info += ((job.results) ? "finished" : "FAILED") +" Job \"#{job.name}\" on #{job.host}"
+    puts info
+    system("echo #{quote(info)}>>#@outputLogFileName") if JOB_LOGGING
+  end
+  def runCommand(command) # override Job.runCommand
+    data.successToken = [data.successToken] unless data.successToken.is_a?(Array)
+    data.inputFiles = [data.inputFiles] unless data.inputFiles.is_a?(Array)
+    data.outputFiles = [data.outputFiles] unless data.outputFiles.is_a?(Array)
+    ## Check, if all output files already exist and are not corrupt. Abort (with success), if true
+    allExists = true; tokens = data.successToken.clone
+    if !data.alwaysOverride
+      for outputFile in data.outputFiles
+        token = tokens.shift
+        if File.size?(outputFile)
+          if token # data.successToken is always an Array here, so test this file's own token
+            if `grep "#{token}" '#{outputFile}'` == '' # no success?
+              File.delete(outputFile)
+              puts "Overwriting corrupt output file: #{outputFile}"
+              allExists = false
+            end
+          # no token? assume, the existing file is okay
+          end
+        else # no file?
+          if File.size?(outputFile+".gz") # but a zipped version?
+            # assume that the zipped file is not corrupt since a corruption check is made
+            # after each run and before the file is zipped
+          else
+            allExists = false
+          end
+        end
+      end
+    end
+    return if !data.alwaysOverride and allExists and not data.outputFiles.empty? # if no output files are written, run anyway
+    for inputFile in data.inputFiles
+      if not File.size?(inputFile) # this should not happen since the dependencies should take care of this
+        if File.size?(inputFile+".gz")
+          puts "Unzipping #{inputFile}.gz ..."
+          system("gunzip -f \"#{inputFile}.gz\"") # uncompress zipped input data if available
+          # we use the server for this which is kind of dirty but this case should not happen too often
+          # it will be recompressed by the "Compressing data"-job
+        else
+          puts "ERROR: input file '#{inputFile}' not found for command #{command}. Job failed!"
+          @results = nil
+          return
+        end
+      end
+    end
+    data.outputFiles.each{|f| FileUtils.mkdir_p(File.dirname(f)) }
+    super(command) # run the command
+    tokens = data.successToken.clone
+    for outputFile in data.outputFiles
+      token = tokens.shift
+      if File.size?(outputFile)
+        if token # data.successToken is always an Array here, so test this file's own token
+          if `grep "#{token}" '#{outputFile}'` == '' # no success?
+            File.delete(outputFile)
+            info = "Deleting corrupt file: #{outputFile}"
+            puts info
+            system("echo '#{info}'>>#@outputLogFileName")
+            @results = nil # mark failure because file corrupt
+          end
+        # no token? assume, the existing file is okay
+        end
+      else
+        @results = nil # mark failure because file not found
+      end
+    end
+  end
+  # Tests, whether the host fulfills the requirements of the job.
+  def runsOnHost(hostname)
+    hostname = $HOSTNAME if hostname == "localhost"
+    hostInfo = $knownHosts.assoc(hostname)
+    return (data.host_requirements & hostInfo[2]) == data.host_requirements
+  end
+end
+# override JobServer.retryJob:
+class JobServer
+  def retryJob(job)
+    if job.numTries < 3
+      puts "FAILURE: Will try to run job later: #{job}"
+      true
+    end # else return false implicitly
+  end
+end
+# Return a list of MyProjectJob instances that can be passed to the JobServer.
+# If you want to change the design of the experiment then change it here.
+def generateJobs
+  print "Generating jobs ... "
+  jobs = []
+  for param1 in $param1_list
+    for param2 in $param2_list
+      for instance in $instances
+        workdir = "#{$experimentsDir}/#{instance}/param1=#{param1},param2=#{param2}"
+        ############################# Generate Experiments #################################
+        outputA = "#{workdir}/my-experiment-type-A.dat"
+        params = "--input Instances/#{instance} --param1 #{param1} --param2 #{param2} --output #{outputA}"
+        jobdata = JobData.new(REQUIRE_MY_PROJECT, [], outputA, OUTPUT_FORMAT_SUCCESS_TOKEN)
+        jobA = MyProjectJob.new("type A: #{instance}; #{param1}, #{param2}", params, jobdata, nil)
+        jobs << jobA
+        # this experiment depends on the previous one of type A
+        outputB = "#{workdir}/my-experiment-type-B.dat"
+        params = "--input #{outputA} --param1 #{param1} --param2 #{param2} --output #{outputB}"
+        jobdata = JobData.new(REQUIRE_MY_PROJECT, outputA, outputB, OUTPUT_FORMAT_SUCCESS_TOKEN)
+        jobB = MyProjectJob.new("type B: #{param1}, #{param2}", params, jobdata, [jobA])
+        jobs << jobB
+        ################################# Create plots ##################################
+        # Create multiplot (cost at iteration)
+        inputs = [outputA, outputB]
+        output = "#{workdir}/multiple.eps"
+        jobdata = JobData.new(REQUIRE_RUBY|REQUIRE_GNUPLOT, inputs, output, EPS_FILE_SUCCESS_TOKEN)
+        jobs << plotJob = MyProjectJob.new("Multiplot: <Title>",
+                  "<input params> -o '#{output}'", jobdata,
+                  [jobA, jobB], "#$projectDir/bin/multiplot_project.rb")
+        ################################# compress data ##################################
+        filesToZip = [outputA, outputB]
+        jobdata = JobData.new(REQUIRE_NONE, [], filesToZip.map{|f|f+".gz"}, nil)
+        jobs << MyProjectJob.new("Compressing data", filesToZip.map{|f| "'"+f+"'"}.join(" "), jobdata,
+                                 [jobA,jobB,plotJob], "gzip")
+        # the files will be uncompressed when needed again (see MyProjectJob.runCommand)
+        ############# Create Latex file with all the plots for this instance. ############
+        # this section has been excluded for this example
+        # See the latex package on www.rubyforge.org
+      end
+    end
+  end
+  puts "#{jobs.length} jobs generated."
+  return jobs
+end # generateJobs
+##################################################
+################ MAIN PROGRAM ####################
+##################################################
+FileUtils.mkdir_p($logDir)
+##### Creating the job server ##############
+Thread.abort_on_exception = true # terminates the program when an exception occurred in a Thread
+server = JobServer.new(generateJobs, $workDir, 0)
+server.dumpStatistics(statsFilename = "#$logDir/jobserver_stats.txt",30)
+puts "The server has started at #{`date`}"
+puts "Waiting for workers to finish the jobs..."
+puts "Look in #{statsFilename} to see the current state of the server"
+for host in useHosts
+  hostInfo = $knownHosts.assoc(host)
+  unless hostInfo
+    raise(ArgumentError, "ERROR: unregistered host: #{host}", caller)
+  end
+  host,numCPUs,features = hostInfo
+  if host == $HOSTNAME or host == "localhost"
+    server.add_local_worker(numCPUs)
+  else
+    server.add_ssh_worker(host, $workDir, numCPUs)
+  end
+end
+server.serve
+puts "The server has finished at #{`date`}"
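
Two checks carry most of the logic in this example: the bitmask test in `runsOnHost` and the success-token test on output files. The sketch below restates both in isolation, assuming a small hypothetical host table. `HOST_FEATURES`, `host_satisfies?`, and `output_intact?` are names invented here. Unlike the `grep` call in the example, the token is matched as a literal substring rather than as a pattern, and an unknown host simply fails the check.

```ruby
require 'tempfile'

# Feature flags, one bit each (values copied from example3.rb).
REQUIRE_NONE    = 0
REQUIRE_RUBY    = 1
REQUIRE_GNUPLOT = 2
REQUIRE_LATEX   = 16

# Hypothetical host table: hostname => bitmask of installed software.
HOST_FEATURES = {
  "asterix" => REQUIRE_RUBY | REQUIRE_GNUPLOT | REQUIRE_LATEX,
  "homer"   => REQUIRE_GNUPLOT | REQUIRE_LATEX
}

# True if every bit set in +requirements+ is also set for the host.
# Unknown hosts never qualify.
def host_satisfies?(hostname, requirements)
  features = HOST_FEATURES[hostname]
  return false if features.nil?
  (requirements & features) == requirements
end

# True if the file is non-empty and contains +token+ as a literal substring.
# A nil token means "no check": any non-empty file is accepted.
def output_intact?(path, token)
  return false unless File.size?(path)
  return true if token.nil?
  File.foreach(path).any? { |line| line.include?(token) }
end

puts host_satisfies?("asterix", REQUIRE_RUBY | REQUIRE_GNUPLOT)
puts host_satisfies?("homer",   REQUIRE_RUBY | REQUIRE_GNUPLOT)

Tempfile.create("plot") do |f|
  f.write("%!PS\n%%Trailer\n")
  f.flush
  puts output_intact?(f.path, "%%Trailer")
  puts output_intact?(f.path, "end data")
end
```

Because `REQUIRE_NONE` is 0, a job with no requirements passes the bitmask test on every known host.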