once-only 0.0.1 → 0.2.1

Files changed (5)
  1. data/README.md +82 -16
  2. data/VERSION +1 -1
  3. data/bin/once-only +102 -17
  4. data/lib/once-only/check.rb +7 -1
  5. metadata +32 -12
data/README.md CHANGED
@@ -7,13 +7,24 @@ Relax with PBS!
  * Computations only happen once
  * A completed job does not get submitted again to PBS
  * A job already in the queue does not get submitted again to PBS
+ * A completed job in the PBS queue does not run again
+ * Guarantee independently executed jobs
+ * Do not worry about submitting serial jobs
+
+ and coming
+
+ * Automatically use a scratch disk (nyi)
+ * Garbage collect jobs (nyi)
 
  Once-only makes a program or script run only *once*, provided the inputs don't
  change (in a functional style!). This is very useful when running a range of
  jobs on a compute cluster or GRID. It may even be useful in the context of
  webservices. Once-only makes it relaxed to run many jobs on compute clusters!
  A mistake, interruption, or even a parameter tweak, does not mean everything
- has to be run again.
+ has to be run again. When running jobs serially you can just batch
+ submit them after getting the first results. Any missed jobs can be
+ run later again. This way you can get better utilisation of the
+ cluster.
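
The run-once behaviour this README describes can be sketched in a few lines of Ruby. This is a minimal illustration under stated assumptions, not once-only's implementation: the helper name `run_once`, the marker file naming, and the use of Ruby's `Digest::MD5` instead of the external `md5sum` tool are all made up for the example.

```ruby
require 'digest/md5'
require 'tmpdir'

# Hypothetical helper (not part of once-only): yields only when no marker
# file exists yet for this combination of command line and input contents.
def run_once(command, input_files, dir)
  # Fingerprint the command line together with the content of every input.
  parts = [Digest::MD5.hexdigest(command)]
  input_files.sort.each { |fn| parts << Digest::MD5.file(fn).hexdigest }
  marker = File.join(dir, "once-only-#{Digest::MD5.hexdigest(parts.join)}.txt")

  return :skipped if File.exist?(marker)   # same inputs were seen before

  yield                                    # run the actual job
  File.write(marker, parts.join("\n"))     # record the successful run
  :ran
end

# Demo in a throwaway directory
RESULTS = []
Dir.mktmpdir do |dir|
  input = File.join(dir, 'reads.fq')
  File.write(input, "ACGT\n")
  RESULTS << run_once('tool reads.fq', [input], dir) { }   # first run
  RESULTS << run_once('tool reads.fq', [input], dir) { }   # inputs unchanged
  File.write(input, "ACGTT\n")                             # input changes
  RESULTS << run_once('tool reads.fq', [input], dir) { }
end
p RESULTS
```

The second call is skipped because nothing changed. The third runs again because the content of the input file, and therefore the fingerprint, is different.
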
 
  Instead of running a tool or script directly, such as
 
@@ -27,17 +38,18 @@ Prepend once-only
  once-only bowtie -t e_coli reads/e_coli_1000.fq e_coli.map
  ```
 
- and once-only will parse the command line for existing files and run a checksum
- on them (here the binary executable 'bowtie' and data files
+ and once-only will parse the command line for existing files and run a
+ checksum on them (here the binary executable 'bowtie' and data files
  reads/e_coli_1000.fq and e_coli.map). This checksum, in fact an MD5
- cryptographic hash, or optionally [pfff](https://github.com/pfff/pfff) for
- large files, is a unique identifier (aka fingerprint) and saved in a file in the running
- directory. When the checksum file does not exist in the directory the command
- 'bowtie -t e_coli reads/e_coli_1000.fq e_coli.map' is executed.
+ cryptographic hash, or optionally [pfff](https://github.com/pfff/pfff)
+ for large files, is a unique identifier (aka fingerprint) and saved in
+ a file in the running directory. When the checksum file does not
+ exist in the directory the command 'bowtie -t e_coli
+ reads/e_coli_1000.fq e_coli.map' is executed.
 
- When the file already exists execution is skipped. In other words, the checksum
- file guarantees the program is only run once with the same inputs. Really
- simple!
+ When the file already exists execution is skipped. In other words, the
+ checksum file guarantees the program is only run once with the same
+ inputs. Really simple!
 
  In combination with PBS this could be
 
@@ -94,13 +106,18 @@ With Ruby 1.9 or later on your system you can run
  gem install once-only
  ```
 
+ It is also easy to check out the git repository, as once-only has no
+ library dependencies.
+
  ### Dependencies
 
  'md5sum' is used for calculating MD5 hash values.
 
- 'pfff' is optional and used for calculating pfff hash values on very large files.
+ 'pfff' is optional and used for calculating pfff hash values on very
+ large files (nyi).
 
- When you are using PBS, once-only requires the 'qsub' and 'qstat' commands.
+ When you are using the --pbs option, once-only will use the 'qsub' and
+ 'qstat' commands.
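
To make the role of `qstat` concrete: the script in this diff captures the output of `/usr/bin/qstat` and looks for the job name in it before calling `qsub`. The sketch below shows one way such a check could be written as a pure function over the captured text. The helper name `queued?` and the sample listing are invented for illustration; real `qstat` output differs between PBS flavours and may truncate long names, so treat the column position as an assumption.

```ruby
# Hypothetical helper: true when job_name appears in the second
# whitespace-separated column (assumed to be the job name) of a
# qstat-style listing.
def queued?(qstat_output, job_name)
  qstat_output.each_line.any? do |line|
    fields = line.split
    fields[1] == job_name
  end
end

# Made-up sample text, only shaped like a qstat listing.
SAMPLE_QSTAT = <<EOS
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1234.head                 run01bowtie      user1           00:00:00 Q batch
EOS

p queued?(SAMPLE_QSTAT, 'run01bowtie')   # job is listed
p queued?(SAMPLE_QSTAT, 'run02bowtie')   # job is not listed
```

Comparing a whole column rather than matching a regular expression anywhere in the text avoids false hits when one job name is a substring of another.
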
 
  ## Usage (command line)
 
@@ -112,7 +129,7 @@ once-only --help
 
  Useful switches can be -v (verbose) and -q (quiet).
 
- If you want to skip scanning the executable file (useful in heterogenous environments,
+ To skip scanning the executable file (useful in heterogenous environments,
  such as the GRID) use the --skip-exe switch:
 
  ```sh
@@ -141,8 +158,11 @@ once-only --skip-exe --skip-glob 'out*' --skip-glob '*.ph' muscle -in aa.fa -out
 
  For a full range of glob patterns, see this [page](http://ruby.about.com/od/beginningruby/a/dir2.htm).
 
- Sometimes you want to include input files that are not on the command line for generating the hash. Maybe some default input file name is being picked up, or it is defined in a
- configuration file. In that case use the --include/--in options.
+ Sometimes you want to include input files that are not on the command
+ line for generating the hash. Maybe some default input file name is
+ being picked up, or it is defined in a configuration file. In that
+ case use the --include/--in options. Another feature is that if an -in
+ file does not exist once-only does not run.
 
  Another once-only command line option is to change directory before executing the script
 
@@ -152,9 +172,30 @@ once-only -d run001 --skip-regex 'out|\.ph$' muscle -in aa.fa -out out-alignment
 
  which is useful with PBS and in scripted environments.
 
+ ### Pipes and redirection
+
+ Once-only supports pipes and redirection by stringifying a command on
+ STDIN:
+
+ ```sh
+ echo "/bin/cat README.md > tmp.out" | once-only --skip tmp.out
+ ```
+
+ With PBS the tricky thing here is using more quotes for spacing. At this point it is
+ recommended to escape internal quotes, and avoid using single quotes, e.g.
+
+ ```sh
+ echo "/bin/cat \\\"README.md Version 2\\\" > tmp.out" | once-only --pbs --skip tmp.out
+ ```
+
  ### PBS
 
- Once-only has PBS support built-in. It only uses the 'qsub' and 'qstat' commands.
+ Once-only has PBS support built-in. When a job is in the queue, it
+ won't get submitted again. When a job has completed, it won't run
+ again. This is achieved by using once-only before submitting the job
+ to the queue, and right before running the job.
+
+ Once-only with PBS only uses the 'qsub' and 'qstat' commands.
 
  Basically use the --pbs option:
 
@@ -176,6 +217,31 @@ once-only --pbs --skip-exe /bin/cat ~/.bashrc
 
  so once-only won't check the file /bin/cat.
 
+ ### Guarantee independent jobs with --copy
+
+ Because once-only 'knows' the input files we can copy them to a unique
+ place before execution. By using the --copy switch a new directory is
+ created in the run directory using the hash value of the process.
+ Input files are copied and the job is run inside that directory. When
+ the job is finished the output file(s) are copied back to the working
+ directory. Example
+
+ ```sh
+ once-only --copy /bin/cat ~/.bashrc
+ ```
+
+ Note that files that come with a path will be stripped of their path
+ before execution. When files are very large you may want to consider
+ the --scratch option.
+
+ ### Use the scratch disk with --scratch (nyi)
+
+ watch this page
+
+ ### Garbage collect jobs (nyi)
+
+ watch this page
+
  ## Project home page
 
  Information on the source tree, documentation, examples, issues and
data/VERSION CHANGED
@@ -1 +1 @@
- 0.0.1
+ 0.2.1
data/bin/once-only CHANGED
@@ -11,20 +11,35 @@ once-only runs a command once only when inputs don't change!
  Usage:
 
    -d path              change to directory before executing
+   --copy               copy files to hash dir first
    --pbs [opts]         convert to PBS command with optional options
    --skip|--out file    skip making a checksum of the named file (multiple allowed)
    --skip-exe           skip making a checksum of the executable command/script
    --skip-cli           skip making a checksum of full command line
    --skip-regex regex   skip making checksumes of filenames that match the regex (multiple allowed)
    --skip-glob regex    skip making checksumes of filenames that match the glob (multiple allowed)
-   --include|--in file  include input filename for making the checksumes
+   --include|--in file  include input filename for making the checksums (file should exist)
    -v                   increase verbosity
    -q                   run quietly
    --debug              give debug information
    --dry-run            do not execute command
    --force              force execute command
 
- See the README for examples
+ Examples:
+
+ Basic use
+
+   once-only /bin/cat README.md
+
+ With PBS
+
+   once-only --pbs /bin/cat README.md
+
+ Using redirects
+
+   echo "/bin/cat README.md > tmp.out" | ./bin/once-only --skip tmp.out
+
+ See the README for more examples!
 
  EOB
 
@@ -45,8 +60,9 @@ if ARGV.size == 0
    exit 1
  end
 
- def exit_error errval = 1
-   $stderr.print "\nonce-only returned error #{errval}\n"
+ def exit_error errval = 1, msg = nil
+   $stderr.print msg if msg
+   $stderr.print "\n**ERROR** once-only returned error #{errval}\n"
    exit errval
  end
 
@@ -54,20 +70,28 @@ def parse_args(args)
    options = { :skip => [], :skip_regex => [], :skip_glob => [], :include => [] }
 
    consume = lambda { |args|
+     if not args[0]
+       # check stdin
+       cmd = $stdin.gets
+       exit_error(1,"Empty command on STDIN") if cmd == nil
+       $stderr.print "Command (STDIN): ",cmd,"\n"
+       options[:stdin] = true
+       return cmd.split(/\s/)
+     end
      return args if File.exist?(args[0]) # reached the executable command
      case args[0]
      when '-d'
        options[:dir] = File.expand_path(args[1])
        consume.call(args[2..-1])
      when '--pbs'
-       if args[1] =~ /\s+/ # optional argument
+       if args[1] and args[1] =~ /\s+/ # optional PBS argument with spacing
          options[:pbs] = args[1]
          consume.call(args[2..-1])
        else
          options[:pbs] = "''"
          consume.call(args[1..-1])
        end
-     when '--skip','--out'
+     when '--skip', '--out'
        options[:skip] << args[1]
        consume.call(args[2..-1])
      when '--skip-exe'
@@ -82,9 +106,12 @@ def parse_args(args)
      when '--skip-glob'
        options[:skip_glob] << args[1]
        consume.call(args[2..-1])
-     when '--include','--in'
+     when '--include', '--in', '-in'
        options[:include] << args[1]
        consume.call(args[2..-1])
+     when '--copy'
+       options[:copy] = true
+       consume.call(args[1..-1])
      when '-h', '--help'
        print USAGE
        exit 1
@@ -104,7 +131,7 @@ def parse_args(args)
        options[:force] = true
        consume.call(args[1..-1])
      else
-       $stderr.print "Can not parse arguments",args
+       $stderr.print "**ERROR** Can not parse arguments",args
        exit_error(1)
      end
    }
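
The STDIN branch added in this file tokenises the command with `cmd.split(/\s/)`, which breaks a quoted argument containing spaces into several pieces. The sketch below compares that with Ruby's standard `Shellwords.split`, which keeps quoted strings together. It is only an illustration of the difference, not a proposed patch: whether quote-aware splitting fits once-only's PBS re-quoting is not something this diff settles, and the sample command line is made up.

```ruby
require 'shellwords'

# A command as it might arrive on STDIN (trailing newline from gets).
line = %q{/bin/cat "README.md Version 2" > tmp.out} + "\n"

NAIVE  = line.split(/\s/)         # plain whitespace split, quotes ignored
QUOTED = Shellwords.split(line)   # quote-aware tokenisation

p NAIVE
p QUOTED
```

With the plain split the quoted file name ends up as three separate tokens that still carry the quote characters. `Shellwords.split` returns it as a single argument with the quotes removed, while the redirection symbol stays an ordinary token in both cases.
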
@@ -126,6 +153,7 @@ if options[:debug]
    p options
  end
 
+ # --- Rewrite once-only args for PBS
  once_only_args = OnceOnly::Check.drop_pbs_option(once_only_args)
  once_only_args = OnceOnly::Check.drop_dir_option(once_only_args)
  once_only_command = once_only_args.join(' ')
@@ -140,6 +168,8 @@ if options[:dir]
    Dir.chdir options[:dir]
  end
 
+ base_dir = Dir.pwd
+
  executable = args[0]
  args = args[1..-1] if options[:skip_exe]
 
@@ -154,34 +184,53 @@ file_list -= options[:skip]
 
  OnceOnly::Check::check_files_exist(options[:include])
  file_list += options[:include]
+ file_list = file_list.uniq
 
  checksums = OnceOnly::Check::calc_file_checksums(file_list)
  checksums.push ['SHA1',command_sha1,command_sorted] if not options[:skip_cli]
 
+ # ---- Create filenames
  once_only_filename = OnceOnly::Check::make_once_filename(checksums,File.basename(executable))
  $stderr.print "Check file name ",once_only_filename,"\n" if options[:verbose]
  error_filename = once_only_filename + '.err'
- $stderr.print "Job file exists ",once_only_filename,"!\n" if options[:debug] and File.exist?(once_only_filename)
+ tag_filename = once_only_filename + '.run'
+ $stderr.print "**STATUS** Job file exists ",once_only_filename,"!\n" if options[:debug] and File.exist?(once_only_filename)
 
+ # ---- Create job name
  dirname = File.basename(Dir.pwd).rjust(8,"-") # make sure it is long enough
 
  job_name = (dirname[-5..-1] + once_only_filename.split(/-/).map{|s|s[0..5]}.join).gsub(/[_-]/,'')[0..15]
  $stderr.print "Job name ",job_name,"\n" if options[:verbose]
 
+ # ---- Create copy destination
+ if options[:copy]
+   copy_dir = base_dir + '/' + File.basename(once_only_filename,".txt")
+ end
+
  if options[:force] or not File.exist?(once_only_filename)
    $stderr.print "Running #{command}\n" if not options[:quiet]
+   OnceOnly::Check::write_file(tag_filename,checksums)
    if options[:pbs]
+     # --- Submit PBS job
+     pbs_command = 'echo \'' +
+       if options[:stdin]
+         'echo "' + command + '"|'+ once_only_command
+       else
+         once_only_command + ' ' + command
+       end
+     # --- Add PBS part
+     pbs_command += "'|qsub -N #{job_name} "+options[:pbs]
+     pbs_command += ' -d ' + (options[:dir] ? options[:dir] : Dir.pwd)
+
+     $stderr.print("PBS command: ",pbs_command,"\n") if options[:verbose]
+
      # --- Check if job is already queued in PBS
      qstat = `/usr/bin/qstat`
      if qstat =~ /#{job_name}/
-       $stderr.print "Job #{job_name} already in queue!\n"
+       $stderr.print "**STATUS** Job #{job_name} already in queue!\n"
        exit 0
      end
-     # --- Submit PBS job
-     pbs_command = 'echo "' + once_only_command + ' ' + command + "\"|qsub -N #{job_name} "+options[:pbs]+' '
-     pbs_command += '-d ' + (options[:dir] ? options[:dir] : Dir.pwd)
-
-     $stderr.print(pbs_command,"\n") if options[:verbose]
+
      if !options[:dry_run]
        if not system(pbs_command)
          OnceOnly::Check::write_file(error_filename,checksums)
@@ -191,18 +240,54 @@ if options[:force] or not File.exist?(once_only_filename)
    else
      # --- Run on command line
      if !options[:dry_run]
-       if not system(command)
+       success =
+         if options[:copy]
+           exit_error(1,"Directory #{copy_dir} already exists!") if File.directory?(copy_dir)
+           $stderr.print "Running in directory ",copy_dir
+           Dir.mkdir(copy_dir)
+           # --- copy files
+           # p args
+           # p file_list
+           clist = args.dup
+           file_list.each { | fn |
+             # copy file
+             res = `cp #{fn} #{copy_dir}`
+             print res if options[:verbose]
+             # replace command
+             clist = clist.map { |arg| ( fn == arg ? File.basename(arg) : arg ) }
+           }
+           p clist if options[:debug]
+           command_stripped = clist.join(' ')
+           Dir.chdir(copy_dir)
+           system_result = system(command_stripped)
+           if system_result
+             # Copy results back
+             Dir.glob(copy_dir+'/*').each { |outfn|
+               if clist.index(File.basename(outfn)) > 0
+                 res = `cp #{outfn} #{base_dir}`
+                 print res if options[:verbose]
+               end
+             }
+           end
+           system_result
+         else
+           system(command)
+         end
+       Dir.chdir(base_dir) if options[:copy]
+       if not success
          OnceOnly::Check::write_file(error_filename,checksums)
+         File.unlink(tag_filename) if File.exist?(tag_filename)
          exit_error($?.exitstatus)
        else
          # --- Success!
          File.unlink(error_filename) if File.exist?(error_filename)
          OnceOnly::Check::write_file(once_only_filename,checksums)
+         File.unlink(tag_filename) if File.exist?(tag_filename)
        end
      end
    end
  else
-   $stderr.print "Inputs unchanged. No need to rerun '#{original_commands}'!\n" if not options[:quiet]
+   $stderr.print "**STATUS** Inputs unchanged. No need to rerun '#{original_commands}'!\n" if not options[:quiet]
  end
 
  exit 0 # success!
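
The `--copy` flow added above (copy the inputs into a fresh directory, run there with path-stripped arguments, copy results back) can be restated as a small standalone sketch. It is written under assumptions and is not the code from this diff: `run_in_copy` is a hypothetical helper, it uses `FileUtils` instead of shelling out to `cp`, and it passes the command to `system` as an argument list, so shell redirection is not covered. Because `Array#index` returns `nil` when nothing matches, the sketch checks for that before comparing the position. In the hunk above, `clist.index(...) > 0` would raise a `NoMethodError` for any file in the copy directory that is not named on the command line.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper, not once-only's code: run argv inside work_dir on
# copies of input_files, then copy back every file that is named as an
# argument (position > 0, i.e. not the executable itself).
def run_in_copy(argv, input_files, work_dir, base_dir)
  FileUtils.mkdir(work_dir)                          # raises if it already exists
  input_files.each { |fn| FileUtils.cp(fn, work_dir) }
  # Strip the path from arguments that refer to copied inputs.
  stripped = argv.map { |a| input_files.include?(a) ? File.basename(a) : a }
  ok = Dir.chdir(work_dir) { system(*stripped) }
  if ok
    Dir.glob(File.join(work_dir, '*')).each do |out|
      idx = stripped.index(File.basename(out))       # nil when not an argument
      FileUtils.cp(out, base_dir) if idx && idx > 0
    end
  end
  ok
end

# Demo: "cp in.txt out.txt" run in isolation (assumes a Unix cp on the PATH).
COPIED = []
Dir.mktmpdir do |base|
  input = File.join(base, 'in.txt')
  File.write(input, "hello\n")
  work = File.join(base, 'hashdir')
  ok = run_in_copy(['cp', input, 'out.txt'], [input], work, base)
  COPIED << ok << File.read(File.join(base, 'out.txt'))
end
p COPIED
```

Using the block form of `Dir.chdir` restores the previous working directory even when the command fails, which replaces the explicit `Dir.chdir(base_dir)` step.
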
data/lib/once-only/check.rb CHANGED
@@ -16,7 +16,7 @@ module OnceOnly
 
      def Check::check_files_exist list
        list.each { |fn|
-         raise "File #{fn} does not exist!" if not File.exist?(fn)
+         Check::exit_error("File #{fn} does not exist!") if not File.exist?(fn)
        }
      end
 
@@ -95,6 +95,12 @@ protected
        return filename if filename and File.exist?(filename)
        nil
      end
+
+     def Check::exit_error msg, errval=1
+       $stderr.print "\nERROR: ",msg
+       $stderr.print " (once-only returned error #{errval})!\n"
+       exit errval
+     end
    end
 
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: once-only
  version: !ruby/object:Gem::Version
-   version: 0.0.1
+   version: 0.2.1
  prerelease:
  platform: ruby
  authors:
7
  authors:
@@ -9,11 +9,11 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2013-05-12 00:00:00.000000000Z
12
+ date: 2013-08-27 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: rspec
16
- requirement: &17265660 !ruby/object:Gem::Requirement
16
+ requirement: !ruby/object:Gem::Requirement
17
17
  none: false
18
18
  requirements:
19
19
  - - ~>
@@ -21,10 +21,15 @@ dependencies:
          version: 2.8.0
    type: :development
    prerelease: false
-   version_requirements: *17265660
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: 2.8.0
  - !ruby/object:Gem::Dependency
    name: cucumber
-   requirement: &17264920 !ruby/object:Gem::Requirement
+   requirement: !ruby/object:Gem::Requirement
      none: false
      requirements:
      - - ! '>='
@@ -32,10 +37,15 @@ dependencies:
          version: '0'
    type: :development
    prerelease: false
-   version_requirements: *17264920
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: '0'
  - !ruby/object:Gem::Dependency
    name: jeweler
-   requirement: &17264040 !ruby/object:Gem::Requirement
+   requirement: !ruby/object:Gem::Requirement
      none: false
      requirements:
      - - ~>
@@ -43,10 +53,15 @@ dependencies:
          version: 1.8.4
    type: :development
    prerelease: false
-   version_requirements: *17264040
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: 1.8.4
  - !ruby/object:Gem::Dependency
    name: bundler
-   requirement: &17263420 !ruby/object:Gem::Requirement
+   requirement: !ruby/object:Gem::Requirement
      none: false
      requirements:
      - - ! '>='
@@ -54,7 +69,12 @@ dependencies:
          version: 1.0.21
    type: :development
    prerelease: false
-   version_requirements: *17263420
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: 1.0.21
  description: ! "Run programs and scripts once only. Especially\n useful for PBS and
    GRID computing"
  email: pjotr.public01@thebird.nl
@@ -98,7 +118,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
        version: '0'
        segments:
        - 0
-       hash: -995396904619724027
+       hash: -4434987383675774210
  required_rubygems_version: !ruby/object:Gem::Requirement
    none: false
    requirements:
@@ -107,7 +127,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
        version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 1.8.10
+ rubygems_version: 1.8.23
  signing_key:
  specification_version: 3
  summary: Run commands once only if inputs do not change