once-only 0.2.1 → 0.2.2

Files changed (5)
  1. data/README.md +59 -15
  2. data/VERSION +1 -1
  3. data/bin/once-only +34 -5
  4. data/lib/once-only/check.rb +29 -3
  5. metadata +3 -3
data/README.md CHANGED
@@ -2,14 +2,46 @@
 
  [![Build Status](https://secure.travis-ci.org/pjotrp/once-only.png)](http://travis-ci.org/pjotrp/once-only)
 
- Relax with PBS!
+ Relax with PBS!
+
+ No worries about running jobs concurrently from the command line (also
+ on multi-core). Once-only is inspired by the Lisp once-only function,
+ which wraps another function and calculates a result only once, based
+ on the same inputs. Simply prepend your command with once-only:
+
+ When running
+
+ ```bash
+ once-only -d cluster00073 --pbs --in output.best.dnd ~/opt/paml/bin/codeml ~/paml7-8.ctl
+ ```
+
+ This is what you want to see when the same job was executed before
+
+ ```bash
+ **STATUS** Job 00073codemla4817 already completed!
+ ```
+
+ This is what you see when a job is running
+
+ ```bash
+ **STATUS** Job 00073codemla4817 is locked!
+ ```
+
+ With PBS, this is what you want to see when a job is already in the queue
+
+ ```bash
+ **STATUS** Job 00073codemla4817 already in queue!
+ ```
+
+ Features
 
  * Computations only happen once
- * A completed job does not get submitted again to PBS
+ * A completed job does not get submitted again (to PBS)
  * A job already in the queue does not get submitted again to PBS
  * A completed job in the PBS queue does not run again
+ * A running job is locked
  * Guarantee independently executed jobs
- * Do not worry about submitting serial jobs
+ * Do not worry about submitting serial jobs multiple times
 
  and coming
 
@@ -19,12 +51,18 @@ and coming
  Once-only makes a program or script run only *once*, provided the inputs don't
  change (in a functional style!). This is very useful when running a range of
  jobs on a compute cluster or GRID. It may even be useful in the context of
- webservices. Once-only makes it relaxed to run many jobs on compute clusters!
- A mistake, interruption, or even a parameter tweak, does not mean everything
- has to be run again. When running jobs serially you can just batch
- submit them after getting the first results. Any missed jobs can be
- run later again. This way you can get better utilisation of the
- cluster.
+ webservices.
+
+ Once-only makes it relaxed to run many jobs on compute clusters! A
+ mistake, interruption, or even a parameter tweak, does not mean
+ everything has to be run again. When running jobs serially you can
+ just batch submit them after getting the first results. Any missed
+ jobs can be run later again. This way you can get better utilisation
+ of your cores or a cluster. You can even use it as a poor man's PBS on
+ your multi-core machine, or over NFS by firing up scripts
+ concurrently.
+
+ Examples:
 
  Instead of running a tool or script directly, such as
 
@@ -89,12 +127,6 @@ md5sum on the one-only has file, for example
  grep MD5 bio-table-ce4ceee0d2ee08ef235662c35b8238ad47fed030.txt |awk 'BEGIN { FS = "[ \t\n]+" }{ print $2,"",$3 }'|md5sum -c
  ```
 
- Once-only is inspired by the Lisp once-only function, which wraps another
- function and calculates a result only once, based on the same inputs. It is
- also inspired by the NixOS software deployment system, which guarantees
- packages are uniquely deployed, based on the source code inputs and the
- configuration at compile time.
-
  ## Installation
 
  Note: once-only is written in Ruby, but you don't need to understand
@@ -234,6 +266,18 @@ Note that files that come with a path will be stripped of their path
  before execution. When files are very large you may want to consider
  the --scratch option.
 
+ ### Precalculated hashes
+
+ The --precalc option allows for using precalculated hash values. The
+ extension says what hash to use. Example:
+
+ ```sh
+ once-only --precalc hash.md5 /bin/cat ~/.bashrc
+ ```
+
+ Once-only will pick up the values from 'hash.md5' and use those after
+ making sure the time stamp of the hash file is most recent.
+
  ### Use the scratch disk with --scratch (nyi)
 
  watch this page
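
Note on the README text above: the run-once behaviour it describes (a job is skipped when the same command with the same inputs was already completed) can be illustrated with a minimal Ruby sketch. It assumes a marker file named after a SHA1 of the sorted command line plus the input file contents. `marker_name` and `run_once` are hypothetical helpers written for this note, not part of the gem's API, and the gem's real file naming may differ.

```ruby
require 'digest/sha1'
require 'tmpdir'

# Hypothetical marker name: "<executable>-<sha1 of command and inputs>.txt".
# The command words are sorted before hashing, as the diff does for the CLI.
def marker_name(command, inputs)
  digest = Digest::SHA1.new
  digest << command.sort.join(' ')
  inputs.sort.each { |fn| digest << Digest::SHA1.file(fn).hexdigest }
  "#{File.basename(command.first)}-#{digest.hexdigest}.txt"
end

# Yield (that is, run the job) only when no marker exists for these inputs.
def run_once(dir, command, inputs)
  marker = File.join(dir, marker_name(command, inputs))
  return :skipped if File.exist?(marker)
  yield
  File.write(marker, command.join(' ') + "\n")
  :ran
end

Dir.mktmpdir do |dir|
  input = File.join(dir, "input.txt")
  File.write(input, "v1\n")
  command = ["/bin/cat", input]
  runs = 0
  3.times { run_once(dir, command, [input]) { runs += 1 } }
  puts "job ran #{runs} time(s) for unchanged input"
  File.write(input, "v2\n") # changing the input changes the checksum
  status = run_once(dir, command, [input]) { runs += 1 }
  puts "after changing the input: #{status}"
end
```

The marker is written only after the block returns, so a job that raises leaves no marker and can be retried.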
data/VERSION CHANGED
@@ -1 +1 @@
- 0.2.1
+ 0.2.2
data/bin/once-only CHANGED
@@ -19,10 +19,12 @@ Usage:
  --skip-regex regex skip making checksumes of filenames that match the regex (multiple allowed)
  --skip-glob regex skip making checksumes of filenames that match the glob (multiple allowed)
  --include|--in file include input filename for making the checksums (file should exist)
+ --precalc file use precalculated Hash values (extension .md5)
  -v increase verbosity
  -q run quietly
  --debug give debug information
  --dry-run do not execute command
+ --ignore-lock ignore locked files (they expire normally after 5 hours)
  --force force execute command
 
  Examples:
@@ -67,7 +69,7 @@ def exit_error errval = 1, msg = nil
  end
 
  def parse_args(args)
- options = { :skip => [], :skip_regex => [], :skip_glob => [], :include => [] }
+ options = { :precalc => [], :skip => [], :skip_regex => [], :skip_glob => [], :include => [] }
 
  consume = lambda { |args|
  if not args[0]
@@ -112,6 +114,10 @@ def parse_args(args)
  when '--copy'
  options[:copy] = true
  consume.call(args[1..-1])
+ when '--precalc'
+ p args
+ options[:precalc] << args[1]
+ consume.call(args[2..-1])
  when '-h', '--help'
  print USAGE
  exit 1
@@ -127,6 +133,9 @@ def parse_args(args)
  when '--dry-run'
  options[:dry_run] = true
  consume.call(args[1..-1])
+ when '--ignore-lock'
+ options[:ignore_lock] = true
+ consume.call(args[1..-1])
  when '--force'
  options[:force] = true
  consume.call(args[1..-1])
@@ -158,6 +167,10 @@ once_only_args = OnceOnly::Check.drop_pbs_option(once_only_args)
  once_only_args = OnceOnly::Check.drop_dir_option(once_only_args)
  once_only_command = once_only_args.join(' ')
 
+ # --- Fetch the pre-calculated checksums
+ precalc = OnceOnly::Check.precalculated_checksums(options[:precalc])
+
+ # --- Calculate the checksums for the items in the list
  command = args.join(' ')
  command_sorted = args.sort.join(' ')
  command_sha1 = OnceOnly::Check::calc_checksum(command_sorted)
@@ -173,6 +186,7 @@ base_dir = Dir.pwd
  executable = args[0]
  args = args[1..-1] if options[:skip_exe]
 
+ # Handle the file list
  file_list = OnceOnly::Check::get_file_list(args)
  options[:skip_regex].each { |regex|
  file_list = OnceOnly::Check::filter_file_list(file_list,regex)
@@ -186,16 +200,29 @@ OnceOnly::Check::check_files_exist(options[:include])
  file_list += options[:include]
  file_list = file_list.uniq
 
- checksums = OnceOnly::Check::calc_file_checksums(file_list)
+ checksums = OnceOnly::Check::calc_file_checksums(file_list,precalc)
  checksums.push ['SHA1',command_sha1,command_sorted] if not options[:skip_cli]
 
  # ---- Create filenames
  once_only_filename = OnceOnly::Check::make_once_filename(checksums,File.basename(executable))
  $stderr.print "Check file name ",once_only_filename,"\n" if options[:verbose]
  error_filename = once_only_filename + '.err'
- tag_filename = once_only_filename + '.run'
  $stderr.print "**STATUS** Job file exists ",once_only_filename,"!\n" if options[:debug] and File.exist?(once_only_filename)
 
+ # ---- The 'run' file is used to prepare for a job
+ tag_filename = once_only_filename + '.run'
+
+ # ---- The 'lock' file is used when the job is running
+ lock_filename = once_only_filename + '.lock'
+ if File.exist?(lock_filename) and not options[:force] and not options[:ignore_lock]
+ $stderr.print "**STATUS** Job is locked with #{lock_filename} '#{original_commands}'!\n" if not options[:quiet]
+ if File.mtime(lock_filename) < Time.now - 18000
+ $stderr.print "**STATUS ** Lock is stale, retrying now\n"
+ else
+ exit 0
+ end
+ end
+
  # ---- Create job name
  dirname = File.basename(Dir.pwd).rjust(8,"-") # make sure it is long enough
 
@@ -207,7 +234,7 @@ if options[:copy]
  copy_dir = base_dir + '/' + File.basename(once_only_filename,".txt")
  end
 
- if options[:force] or not File.exist?(once_only_filename)
+ if options[:force] or not File.exist?(once_only_filename)
  $stderr.print "Running #{command}\n" if not options[:quiet]
  OnceOnly::Check::write_file(tag_filename,checksums)
  if options[:pbs]
@@ -240,6 +267,7 @@ if options[:force] or not File.exist?(once_only_filename)
  else
  # --- Run on command line
  if !options[:dry_run]
+ File.open(lock_filename, "w") {}
  success =
  if options[:copy]
  exit_error(1,"Directory #{copy_dir} already exists!") if File.directory?(copy_dir)
@@ -274,6 +302,7 @@ if options[:force] or not File.exist?(once_only_filename)
  system(command)
  end
  Dir.chdir(base_dir) if options[:copy]
+ File.unlink(lock_filename)
  if not success
  OnceOnly::Check::write_file(error_filename,checksums)
  File.unlink(tag_filename) if File.exist?(tag_filename)
@@ -283,7 +312,7 @@ if options[:force] or not File.exist?(once_only_filename)
  File.unlink(error_filename) if File.exist?(error_filename)
  OnceOnly::Check::write_file(once_only_filename,checksums)
  File.unlink(tag_filename) if File.exist?(tag_filename)
- end
+ end
  end
  end
  else
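
Note on the lock handling added to `bin/once-only` above: a lock file is created before the command runs, removed afterwards, and treated as stale once it is older than 18000 seconds (the five hours mentioned in the usage text). The following is a minimal sketch of that idea. `lock_state` and `with_lock` are hypothetical helper names, not functions in the gem. Unlike the diff, the sketch removes the lock in an `ensure` clause, so the lock also disappears when the job raises.

```ruby
require 'tmpdir'
require 'fileutils'

LOCK_EXPIRY = 18000 # seconds (5 hours), the value used in the diff

# Returns :free when no lock file exists, :stale when the lock is older
# than LOCK_EXPIRY, and :locked otherwise. (Hypothetical helper.)
def lock_state(lock_filename, now = Time.now)
  return :free unless File.exist?(lock_filename)
  File.mtime(lock_filename) < now - LOCK_EXPIRY ? :stale : :locked
end

# Create the lock, run the block, and always remove the lock afterwards.
def with_lock(lock_filename)
  File.open(lock_filename, "w") {}
  yield
ensure
  File.unlink(lock_filename) if File.exist?(lock_filename)
end

Dir.mktmpdir do |dir|
  lock = File.join(dir, "job.txt.lock")
  puts lock_state(lock)                      # before the job starts
  with_lock(lock) { puts lock_state(lock) }  # while the job is running
  puts lock_state(lock)                      # after the job finished
  FileUtils.touch(lock, mtime: Time.now - LOCK_EXPIRY - 60)
  puts lock_state(lock)                      # a lock left behind long ago
end
```

Checking for the lock and creating it are two separate steps here, as in the diff, so two processes starting at the same instant could both proceed; an atomic create (for example `File::CREAT | File::EXCL`) would close that gap.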
data/lib/once-only/check.rb CHANGED
@@ -31,10 +31,36 @@ module OnceOnly
  list.map { |name| ( Dir.glob(glob).index(name) ? nil : name ) }.compact
  end
 
- # Calculate the checksums for each file in the list
- def Check::calc_file_checksums list
+ # Return a hash of files with their hash type, hash value and check time
+ def Check::precalculated_checksums(files)
+ precalc = {}
+ files.each do | fn |
+ dir = File.dirname(fn)
+ raise "Precalculated hash file should have .md5 extension!" if fn !~ /\.md5$/
+ t = File.mtime(fn)
+ File.open(fn).each { |s|
+ a = s.split
+ checkfn = File.expand_path(a[1],dir)
+ precalc[checkfn] = { type: 'MD5', hash: a[0], time: t }
+ }
+ end
+ precalc
+ end
+
+ # Calculate the checksums for each file in the list and return a list
+ # of array - each row containing the Hash type (MD5), the value and the (relative)
+ # file path.
+ def Check::calc_file_checksums list, precalc
  list.map { |fn|
- ['MD5'] + `/usr/bin/md5sum #{fn}`.split
+ # First see if fn is in the precalculated list
+ fqn = File.expand_path(fn)
+ if precalc[fqn] and File.mtime(fqn) < precalc[fqn][:time]
+ $stderr.print "Precalculated ",fn,"\n"
+ rec = precalc[fqn]
+ [rec[:type],rec[:hash],fqn]
+ else
+ ['MD5'] + `/usr/bin/md5sum #{fqn}`.split
+ end
  }
  end
 
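
Note on the `check.rb` change above: a precalculated MD5 is trusted only when the data file's modification time is older than that of the hash file; otherwise the checksum is recomputed. The sketch below illustrates that rule in isolation. `read_precalc` and `md5_for` are hypothetical names for this note, and the sketch uses Ruby's `Digest::MD5` where the gem shells out to `/usr/bin/md5sum`. It assumes md5sum-style lines (`<hash>  <path>`) with paths that contain no whitespace.

```ruby
require 'digest/md5'
require 'tmpdir'
require 'fileutils'

# Parse an md5sum-style file into { absolute_path => { hash:, time: } }.
# Paths are resolved relative to the directory of the hash file, as in the diff.
def read_precalc(hash_file)
  dir = File.dirname(hash_file)
  stamp = File.mtime(hash_file)
  File.readlines(hash_file).each_with_object({}) do |line, table|
    hash, path = line.split
    next unless hash && path
    table[File.expand_path(path, dir)] = { hash: hash, time: stamp }
  end
end

# Use the precalculated value only when the data file is older than the hash file.
def md5_for(path, precalc)
  fqn = File.expand_path(path)
  rec = precalc[fqn]
  if rec && File.mtime(fqn) < rec[:time]
    [rec[:hash], :precalculated]
  else
    [Digest::MD5.file(fqn).hexdigest, :computed]
  end
end

Dir.mktmpdir do |dir|
  data = File.join(dir, "input.txt")
  File.write(data, "hello\n")
  FileUtils.touch(data, mtime: Time.now - 120)  # data file is older than the hash file
  hashes = File.join(dir, "hash.md5")
  File.write(hashes, "#{Digest::MD5.file(data).hexdigest}  input.txt\n")
  puts md5_for(data, read_precalc(hashes)).last
  FileUtils.touch(data, mtime: Time.now + 60)   # data file now looks newer
  puts md5_for(data, read_precalc(hashes)).last
end
```

The comparison relies on file timestamps only, so a data file that is modified while keeping an old mtime would still be matched against the precalculated value.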
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: once-only
  version: !ruby/object:Gem::Version
- version: 0.2.1
+ version: 0.2.2
  prerelease:
  platform: ruby
  authors:
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2013-08-27 00:00:00.000000000 Z
+ date: 2013-11-02 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rspec
@@ -118,7 +118,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
  version: '0'
  segments:
  - 0
- hash: -4434987383675774210
+ hash: -2270452427765269751
  required_rubygems_version: !ruby/object:Gem::Requirement
  none: false
  requirements: