once-only 0.2.1 → 0.2.2

Files changed (5)
  1. data/README.md +59 -15
  2. data/VERSION +1 -1
  3. data/bin/once-only +34 -5
  4. data/lib/once-only/check.rb +29 -3
  5. metadata +3 -3
data/README.md CHANGED
@@ -2,14 +2,46 @@
2
2
 
3
3
  [![Build Status](https://secure.travis-ci.org/pjotrp/once-only.png)](http://travis-ci.org/pjotrp/once-only)
4
4
 
5
- Relax with PBS!
5
+ Relax with PBS!
6
+
7
+ No worries about running jobs concurrently from the command line (also
8
+ on multi-core). Once-only is inspired by the Lisp once-only function,
9
+ which wraps another function and calculates a result only once, based
10
+ on the same inputs. Simply prepend your command with once-only:
11
+
12
+ When running
13
+
14
+ ```bash
15
+ once-only -d cluster00073 --pbs --in output.best.dnd ~/opt/paml/bin/codeml ~/paml7-8.ctl
16
+ ```
17
+
18
+ This is what you want to see when the same job was executed before
19
+
20
+ ```bash
21
+ **STATUS** Job 00073codemla4817 already completed!
22
+ ```
23
+
24
+ This is what you see when a job is running
25
+
26
+ ```bash
27
+ **STATUS** Job 00073codemla4817 is locked!
28
+ ```
29
+
30
+ With PBS, this is what you want to see when a job is already in the queue
31
+
32
+ ```bash
33
+ **STATUS** Job 00073codemla4817 already in queue!
34
+ ```
35
+
36
+ Features
6
37
 
7
38
  * Computations only happen once
8
- * A completed job does not get submitted again to PBS
39
+ * A completed job does not get submitted again (to PBS)
9
40
  * A job already in the queue does not get submitted again to PBS
10
41
  * A completed job in the PBS queue does not run again
42
+ * A running job is locked
11
43
  * Guarantee independently executed jobs
12
- * Do not worry about submitting serial jobs
44
+ * Do not worry about submitting serial jobs multiple times
13
45
 
14
46
  and coming
15
47
 
@@ -19,12 +51,18 @@ and coming
19
51
  Once-only makes a program or script run only *once*, provided the inputs don't
20
52
  change (in a functional style!). This is very useful when running a range of
21
53
  jobs on a compute cluster or GRID. It may even be useful in the context of
22
- webservices. Once-only makes it relaxed to run many jobs on compute clusters!
23
- A mistake, interruption, or even a parameter tweak, does not mean everything
24
- has to be run again. When running jobs serially you can just batch
25
- submit them after getting the first results. Any missed jobs can be
26
- run later again. This way you can get better utilisation of the
27
- cluster.
54
+ webservices.
55
+
56
+ Once-only makes it relaxed to run many jobs on compute clusters! A
57
+ mistake, interruption, or even a parameter tweak, does not mean
58
+ everything has to be run again. When running jobs serially you can
59
+ just batch submit them after getting the first results. Any missed
60
+ jobs can be run later again. This way you can get better utilisation
61
+ of your cores or a cluster. You can even use it as a poor-mans PBS on
62
+ your multi-core machine, or over NFS by firing up scripts
63
+ concurrently.
64
+
65
+ Examples:
28
66
 
29
67
  Instead of running a tool or script directly, such as
30
68
 
@@ -89,12 +127,6 @@ md5sum on the one-only has file, for example
89
127
  grep MD5 bio-table-ce4ceee0d2ee08ef235662c35b8238ad47fed030.txt |awk 'BEGIN { FS = "[ \t\n]+" }{ print $2,"",$3 }'|md5sum -c
90
128
  ```
91
129
 
92
- Once-only is inspired by the Lisp once-only function, which wraps another
93
- function and calculates a result only once, based on the same inputs. It is
94
- also inspired by the NixOS software deployment system, which guarantees
95
- packages are uniquely deployed, based on the source code inputs and the
96
- configuration at compile time.
97
-
98
130
  ## Installation
99
131
 
100
132
  Note: once-only is written in Ruby, but you don't need to understand
@@ -234,6 +266,18 @@ Note that files that come with a path will be stripped of their path
234
266
  before execution. When files are very large you may want to consider
235
267
  the --scratch option.
236
268
 
269
+ ### Precalculated hashes
270
+
271
+ The --precalc option allows for using precalculated hash values. The
272
+ extension says what hash to use. Example:
273
+
274
+ ```sh
275
+ once-only --precalc hash.md5 /bin/cat ~/.bashrc
276
+ ```
277
+
278
+ Once-only will pick up the values from 'hash.md5' and use those after
279
+ making sure the time stamp of the hash file is most recent.
280
+
237
281
  ### Use the scratch disk with --scratch (nyi)
238
282
 
239
283
  watch this page
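The README's claim that a command runs only once "provided the inputs don't change" rests on deriving a stable key from the command line. A minimal sketch of that idea follows, assuming a plain SHA1 over the sorted arguments. The helper name `command_key` is hypothetical, and the gem's own `calc_checksum` may be implemented differently.

```ruby
require 'digest/sha1'

# Hypothetical helper, not part of once-only: build a key for a command
# line. Sorting the arguments first makes the key independent of the
# order in which they were given.
def command_key(args)
  Digest::SHA1.hexdigest(args.sort.join(' '))
end

key_a = command_key(%w[/bin/cat -n notes.txt])
key_b = command_key(%w[-n notes.txt /bin/cat])
puts key_a == key_b
```

Sorting trades precision for stability: two commands that differ only in argument order get the same key, even when the order matters to the tool being run.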
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.2.1
1
+ 0.2.2
data/bin/once-only CHANGED
@@ -19,10 +19,12 @@ Usage:
19
19
  --skip-regex regex skip making checksumes of filenames that match the regex (multiple allowed)
20
20
  --skip-glob regex skip making checksumes of filenames that match the glob (multiple allowed)
21
21
  --include|--in file include input filename for making the checksums (file should exist)
22
+ --precalc file use precalculated Hash values (extension .md5)
22
23
  -v increase verbosity
23
24
  -q run quietly
24
25
  --debug give debug information
25
26
  --dry-run do not execute command
27
+ --ignore-lock ignore locked files (they expire normally after 5 hours)
26
28
  --force force execute command
27
29
 
28
30
  Examples:
@@ -67,7 +69,7 @@ def exit_error errval = 1, msg = nil
67
69
  end
68
70
 
69
71
  def parse_args(args)
70
- options = { :skip => [], :skip_regex => [], :skip_glob => [], :include => [] }
72
+ options = { :precalc => [], :skip => [], :skip_regex => [], :skip_glob => [], :include => [] }
71
73
 
72
74
  consume = lambda { |args|
73
75
  if not args[0]
@@ -112,6 +114,10 @@ def parse_args(args)
112
114
  when '--copy'
113
115
  options[:copy] = true
114
116
  consume.call(args[1..-1])
117
+ when '--precalc'
118
+ p args
119
+ options[:precalc] << args[1]
120
+ consume.call(args[2..-1])
115
121
  when '-h', '--help'
116
122
  print USAGE
117
123
  exit 1
@@ -127,6 +133,9 @@ def parse_args(args)
127
133
  when '--dry-run'
128
134
  options[:dry_run] = true
129
135
  consume.call(args[1..-1])
136
+ when '--ignore-lock'
137
+ options[:ignore_lock] = true
138
+ consume.call(args[1..-1])
130
139
  when '--force'
131
140
  options[:force] = true
132
141
  consume.call(args[1..-1])
@@ -158,6 +167,10 @@ once_only_args = OnceOnly::Check.drop_pbs_option(once_only_args)
158
167
  once_only_args = OnceOnly::Check.drop_dir_option(once_only_args)
159
168
  once_only_command = once_only_args.join(' ')
160
169
 
170
+ # --- Fetch the pre-calculated checksums
171
+ precalc = OnceOnly::Check.precalculated_checksums(options[:precalc])
172
+
173
+ # --- Calculate the checksums for the items in the list
161
174
  command = args.join(' ')
162
175
  command_sorted = args.sort.join(' ')
163
176
  command_sha1 = OnceOnly::Check::calc_checksum(command_sorted)
@@ -173,6 +186,7 @@ base_dir = Dir.pwd
173
186
  executable = args[0]
174
187
  args = args[1..-1] if options[:skip_exe]
175
188
 
189
+ # Handle the file list
176
190
  file_list = OnceOnly::Check::get_file_list(args)
177
191
  options[:skip_regex].each { |regex|
178
192
  file_list = OnceOnly::Check::filter_file_list(file_list,regex)
@@ -186,16 +200,29 @@ OnceOnly::Check::check_files_exist(options[:include])
186
200
  file_list += options[:include]
187
201
  file_list = file_list.uniq
188
202
 
189
- checksums = OnceOnly::Check::calc_file_checksums(file_list)
203
+ checksums = OnceOnly::Check::calc_file_checksums(file_list,precalc)
190
204
  checksums.push ['SHA1',command_sha1,command_sorted] if not options[:skip_cli]
191
205
 
192
206
  # ---- Create filenames
193
207
  once_only_filename = OnceOnly::Check::make_once_filename(checksums,File.basename(executable))
194
208
  $stderr.print "Check file name ",once_only_filename,"\n" if options[:verbose]
195
209
  error_filename = once_only_filename + '.err'
196
- tag_filename = once_only_filename + '.run'
197
210
  $stderr.print "**STATUS** Job file exists ",once_only_filename,"!\n" if options[:debug] and File.exist?(once_only_filename)
198
211
 
212
+ # ---- The 'run' file is used to prepare for a job
213
+ tag_filename = once_only_filename + '.run'
214
+
215
+ # ---- The 'lock' file is used when the job is running
216
+ lock_filename = once_only_filename + '.lock'
217
+ if File.exist?(lock_filename) and not options[:force] and not options[:ignore_lock]
218
+ $stderr.print "**STATUS** Job is locked with #{lock_filename} '#{original_commands}'!\n" if not options[:quiet]
219
+ if File.mtime(lock_filename) < Time.now - 18000
220
+ $stderr.print "**STATUS ** Lock is stale, retrying now\n"
221
+ else
222
+ exit 0
223
+ end
224
+ end
225
+
199
226
  # ---- Create job name
200
227
  dirname = File.basename(Dir.pwd).rjust(8,"-") # make sure it is long enough
201
228
 
@@ -207,7 +234,7 @@ if options[:copy]
207
234
  copy_dir = base_dir + '/' + File.basename(once_only_filename,".txt")
208
235
  end
209
236
 
210
- if options[:force] or not File.exist?(once_only_filename)
237
+ if options[:force] or not File.exist?(once_only_filename)
211
238
  $stderr.print "Running #{command}\n" if not options[:quiet]
212
239
  OnceOnly::Check::write_file(tag_filename,checksums)
213
240
  if options[:pbs]
@@ -240,6 +267,7 @@ if options[:force] or not File.exist?(once_only_filename)
240
267
  else
241
268
  # --- Run on command line
242
269
  if !options[:dry_run]
270
+ File.open(lock_filename, "w") {}
243
271
  success =
244
272
  if options[:copy]
245
273
  exit_error(1,"Directory #{copy_dir} already exists!") if File.directory?(copy_dir)
@@ -274,6 +302,7 @@ if options[:force] or not File.exist?(once_only_filename)
274
302
  system(command)
275
303
  end
276
304
  Dir.chdir(base_dir) if options[:copy]
305
+ File.unlink(lock_filename)
277
306
  if not success
278
307
  OnceOnly::Check::write_file(error_filename,checksums)
279
308
  File.unlink(tag_filename) if File.exist?(tag_filename)
@@ -283,7 +312,7 @@ if options[:force] or not File.exist?(once_only_filename)
283
312
  File.unlink(error_filename) if File.exist?(error_filename)
284
313
  OnceOnly::Check::write_file(once_only_filename,checksums)
285
314
  File.unlink(tag_filename) if File.exist?(tag_filename)
286
- end
315
+ end
287
316
  end
288
317
  end
289
318
  else
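The lock handling added in this file compares the lock file's modification time against an 18000-second (5 hour) window. The sketch below isolates that comparison so it can be read and tried on its own. The helper `lock_state` and the constant `LOCK_EXPIRY_SECONDS` are illustrative names, not part of once-only.

```ruby
require 'tmpdir'
require 'fileutils'

# 5 hours, the same value as the literal 18000 used in the diff.
LOCK_EXPIRY_SECONDS = 5 * 60 * 60

# Hypothetical helper: classify a lock file as :unlocked (no file),
# :locked (file younger than the expiry window) or :stale (older).
def lock_state(lock_filename, now = Time.now)
  return :unlocked unless File.exist?(lock_filename)
  File.mtime(lock_filename) < now - LOCK_EXPIRY_SECONDS ? :stale : :locked
end

Dir.mktmpdir do |dir|
  lock = File.join(dir, 'job.txt.lock')
  FileUtils.touch(lock)
  puts lock_state(lock)            # fresh lock file

  six_hours_ago = Time.now - 6 * 60 * 60
  File.utime(six_hours_ago, six_hours_ago, lock)
  puts lock_state(lock)            # lock file older than the window
end
```

Relying on mtime keeps the lock a plain empty file, at the cost of assuming that clocks agree when the lock sits on a shared file system such as NFS.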
data/lib/once-only/check.rb CHANGED
@@ -31,10 +31,36 @@ module OnceOnly
31
31
  list.map { |name| ( Dir.glob(glob).index(name) ? nil : name ) }.compact
32
32
  end
33
33
 
34
- # Calculate the checksums for each file in the list
35
- def Check::calc_file_checksums list
34
+ # Return a hash of files with their hash type, hash value and check time
35
+ def Check::precalculated_checksums(files)
36
+ precalc = {}
37
+ files.each do | fn |
38
+ dir = File.dirname(fn)
39
+ raise "Precalculated hash file should have .md5 extension!" if fn !~ /\.md5$/
40
+ t = File.mtime(fn)
41
+ File.open(fn).each { |s|
42
+ a = s.split
43
+ checkfn = File.expand_path(a[1],dir)
44
+ precalc[checkfn] = { type: 'MD5', hash: a[0], time: t }
45
+ }
46
+ end
47
+ precalc
48
+ end
49
+
50
+ # Calculate the checksums for each file in the list and return a list
51
+ # of array - each row containing the Hash type (MD5), the value and the (relative)
52
+ # file path.
53
+ def Check::calc_file_checksums list, precalc
36
54
  list.map { |fn|
37
- ['MD5'] + `/usr/bin/md5sum #{fn}`.split
55
+ # First see if fn is in the precalculated list
56
+ fqn = File.expand_path(fn)
57
+ if precalc[fqn] and File.mtime(fqn) < precalc[fqn][:time]
58
+ $stderr.print "Precalculated ",fn,"\n"
59
+ rec = precalc[fqn]
60
+ [rec[:type],rec[:hash],fqn]
61
+ else
62
+ ['MD5'] + `/usr/bin/md5sum #{fqn}`.split
63
+ end
38
64
  }
39
65
  end
40
66
 
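The two functions above read an `md5sum`-style file and reuse a stored hash only when the data file is older than the hash file. The stand-alone sketch below mirrors that rule so it can be tried outside the gem. The names `read_md5_file` and `precalc_usable?` are hypothetical. Splitting each line into two fields only, so that file names containing spaces survive, is an assumption that differs from the plain `split` in the diff.

```ruby
require 'tmpdir'

# Hypothetical stand-alone reader for an md5sum-style file with lines of
# the form "<digest>  <filename>". File names are resolved relative to
# the directory of the hash file, as in the diff.
def read_md5_file(fn)
  dir   = File.dirname(fn)
  stamp = File.mtime(fn)
  table = {}
  File.foreach(fn) do |line|
    digest, name = line.split(' ', 2)   # two fields only
    next if name.nil? || name.strip.empty?
    table[File.expand_path(name.strip, dir)] =
      { type: 'MD5', hash: digest, time: stamp }
  end
  table
end

# A stored hash is trusted only when the data file was last modified
# before the hash file was written.
def precalc_usable?(table, path)
  fqn = File.expand_path(path)
  rec = table[fqn]
  !rec.nil? && File.mtime(fqn) < rec[:time]
end
```

A data file that is touched after the hash file was written fails the check and is hashed again, which matches the README's note that the time stamp of the hash file must be the most recent.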
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: once-only
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.1
4
+ version: 0.2.2
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2013-08-27 00:00:00.000000000 Z
12
+ date: 2013-11-02 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: rspec
@@ -118,7 +118,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
118
118
  version: '0'
119
119
  segments:
120
120
  - 0
121
- hash: -4434987383675774210
121
+ hash: -2270452427765269751
122
122
  required_rubygems_version: !ruby/object:Gem::Requirement
123
123
  none: false
124
124
  requirements: