bio-gemma-wrapper 0.98.1 → 0.99.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
- SHA1:
3
- metadata.gz: b7a8bfe787236f397dba6e05aef202486ba53389
4
- data.tar.gz: 9fe2398ed3fcd053e8258f73a64b1610bfddf9e2
2
+ SHA256:
3
+ metadata.gz: da5f26b8acd9c3782c2b3f5f2a39af965fc7e1785cc820b49faca82924d74e51
4
+ data.tar.gz: 17035ee5fada269ae88dd0ed91d84075b2af88b400de1d0e9829cbdb60d5d0cb
5
5
  SHA512:
6
- metadata.gz: 0a691906f13da3469597517d160874315f5962822b0273757d33c37283c2a9a98da9a41d08c51e4ed7854b3d3593f9d96ac81d59212690edd628a2677499d501
7
- data.tar.gz: 5d4b898ff3566f52652cbb7db6597dc245e9a5da6b0d9da107de81deec0d51c02c822762e0ab921a2309c1dbc3455b55c1adc5fe94c891e31a9c32cb6d343f2e
6
+ metadata.gz: eaec3c7dad4fc1bda713765e056bfe11dd69d4ca850333fed5a1a27e344724365a705ddf7845ce63b5af6b35ab6140da10f4bc7067aaa4539e47f6c6f94de1f0
7
+ data.tar.gz: c26b282c0fd7c70a702467e58c3f6ea22f820d91a8a364b335cdab7e807add9cf1079faa25c46a87b89c78e4293990cdea2210427c5f9c3565bd5040fdbef496
data/README.md CHANGED
@@ -1,12 +1,20 @@
1
1
  [![gemma-wrapper gem version](https://badge.fury.io/rb/bio-gemma-wrapper.svg)](https://badge.fury.io/rb/bio-gemma-wrapper)
2
2
 
3
- # GEMMA wrapper caches K between runs with LOCO support
3
+ # GEMMA with LOCO, permutations and slurm support (and caching)
4
4
 
5
5
  ![Genetic associations identified in CFW mice using GEMMA (Parker et al,
6
6
  Nat. Genet., 2016)](cfw.gif)
7
7
 
8
8
  ## Introduction
9
9
 
10
+ Gemma-wrapper allows running GEMMA with LOCO, GEMMA with caching,
11
+ GEMMA in parallel (now the default with LOCO), and GEMMA on
12
+ PBS. Gemma-wrapper is used to run GEMMA as part of the
13
+ https://genenetwork.org/ environment.
14
+
15
+ Note that a version of gemma-wrapper is projected to be integrated
16
+ into gemma itself.
17
+
10
18
  GEMMA is a software toolkit for fast application of linear mixed
11
19
  models (LMMs) and related models to genome-wide association studies
12
20
  (GWAS) and other large-scale data sets.
@@ -14,15 +22,21 @@ models (LMMs) and related models to genome-wide association studies
14
22
  This repository contains gemma-wrapper, essentially a wrapper of
15
23
  GEMMA that provides support for caching the kinship or relatedness
16
24
  matrix (K) and caching LM and LMM computations with the option of full
17
- leave-one-chromosome-out genome scans (LOCO).
25
+ leave-one-chromosome-out genome scans (LOCO). Jobs can also be
26
+ submitted to HPC PBS, i.e., slurm.
18
27
 
19
28
  gemma-wrapper requires a recent version of GEMMA and essentially
20
29
  does a pass-through of all standard GEMMA invocation switches. On
21
30
  return gemma-wrapper can return a JSON object (--json) which is
22
31
  useful for web-services.
23
32
 
24
- Note that this a work in progress (WIP). What is described below
25
- should work.
33
+ ## Performance
34
+
35
+ LOCO runs in parallel by default which is at least a 5x performance
36
+ improvement on a machine with enough cores. GEMMA without LOCO,
37
+ however, does not run in parallel by default. Performance
38
+ improvements with the parallel implementation for LOCO and non-LOCO
39
+ can be viewed [here](./test/performance/releases.gmi).
26
40
 
27
41
  ## Installation
28
42
 
@@ -32,8 +46,9 @@ Prerequisites are
32
46
  * Standard [Ruby >2.0 ](https://www.ruby-lang.org/en/) which comes on
33
47
  almost all Linux systems
34
48
 
35
- gemma-wrapper comes as a Ruby [gem](https://rubygems.org/gems/bio-gemma-wrapper) and
36
- can be installed with
49
+ gemma-wrapper comes as a Ruby
50
+ [gem](https://rubygems.org/gems/bio-gemma-wrapper) and can be
51
+ installed with
37
52
 
38
53
  gem install bio-gemma-wrapper
39
54
 
@@ -47,14 +62,19 @@ and it will render something like
47
62
  Usage: gemma-wrapper [options] -- [gemma-options]
48
63
  --permutate n Permutate # times by shuffling phenotypes
49
64
  --permute-phenotypes filen Phenotypes to be shuffled in permutations
50
- --loco [x,y,1,2,3...] Run full LOCO
65
+ --loco Run full leave-one-chromosome-out (LOCO)
66
+ --chromosomes [1,2,3] Run specific chromosomes
51
67
  --input filen JSON input variables (used for LOCO)
52
68
  --cache-dir path Use a cache directory
53
69
  --json Create output file in JSON format
54
- --force Force computation
70
+ --force Force computation (override cache)
71
+ --parallel Run jobs in parallel
72
+ --no-parallel Do not run jobs in parallel
73
+ --slurm[=opts] Use slurm PBS for submitting jobs
55
74
  --q, --quiet Run quietly
56
75
  -v, --verbose Run verbosely
57
- --debug Show debug messages and keep intermediate output
76
+ -d, --debug Show debug messages and keep intermediate output
77
+ --dry-run Show commands, but don't execute
58
78
  -- Anything after gets passed to GEMMA
59
79
 
60
80
  -h, --help display this help and exit
@@ -69,6 +89,8 @@ Unpack it and run the tool as
69
89
 
70
90
  ./bin/gemma-wrapper --help
71
91
 
92
+ See below for using a GNU Guix environment.
93
+
72
94
  ## Usage
73
95
 
74
96
  gemma-wrapper picks up GEMMA from the PATH. To override that behaviour
@@ -90,12 +112,13 @@ the data files are found):
90
112
  gemma-wrapper -- \
91
113
  -g test/data/input/BXD_geno.txt.gz \
92
114
  -p test/data/input/BXD_pheno.txt \
115
+ -a test/data/input/BXD_snps.txt \
93
116
  -gk \
94
117
  -debug
95
118
 
96
119
  Run it twice to see
97
120
 
98
- /tmp/3079151e14b219c3b243b673d88001c1675168b4.log.txt gemma-wrapper CACHE HIT!
121
+ /tmp/0bdd7add5e8f7d9af36b283d0341c115124273e0.log.txt CACHE HIT!
99
122
 
100
123
  gemma-wrapper computes the unique HASH value over the command
101
124
  line switches passed into GEMMA as well as the contents of the files
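The cache key described here can be sketched in a few lines of Ruby. This is a minimal illustration that follows the `compute_hash` lambda in `bin/gemma-wrapper`: arguments that name an existing file contribute the SHA1 of the file contents, all other switches contribute their literal text, and the joined result is hashed once more. The helper name `cache_key` and the sample data are assumptions made for the example.

```ruby
require 'digest/sha1'
require 'tempfile'

# Hypothetical helper (not part of gemma-wrapper): build a cache key from
# GEMMA arguments. File arguments are replaced by a digest of their contents,
# so the key changes when the data changes, not when a file is renamed.
def cache_key(args)
  parts = args.map do |item|
    File.file?(item) ? Digest::SHA1.hexdigest(File.read(item)) : item
  end
  Digest::SHA1.hexdigest(parts.join(' '))
end

# Usage with a throwaway genotype file (the contents are placeholders).
Tempfile.create('geno') do |f|
  f.write("rs1, A, B, 0, 1\n")
  f.flush
  puts cache_key(['-g', f.path, '-gk'])
end
```

Because file names never enter the digest, two runs on identical data stored at different paths map to the same cache entry.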
@@ -107,10 +130,12 @@ You can also get JSON output on STDOUT by providing the --json switch
107
130
  gemma-wrapper --json -- \
108
131
  -g test/data/input/BXD_geno.txt.gz \
109
132
  -p test/data/input/BXD_pheno.txt \
133
+ -a test/data/input/BXD_snps.txt \
110
134
  -gk \
111
- -debug
135
+ -debug > K.json
112
136
 
113
- prints out something that can be parsed with a calling program
137
+ K.json is something that can be parsed with a calling program, and is
138
+ also used below as input for the GWA step. Example:
114
139
 
115
140
  ```json
116
141
  {"warnings":[],"errno":0,"debug":[],"type":"K","files":[["/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.log.txt","/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.cXX.txt"]],"cache_hit":true,"gemma_command":"../gemma/bin/gemma -g test/data/input/BXD_geno.txt.gz -p test/data/input/BXD_pheno.txt -gk -debug -outdir /tmp -o 18ce786ab92064a7ee38a7422e7838abf91f5eb0"}
@@ -123,6 +148,7 @@ default. If you want something else provide a --cache-dir, e.g.
123
148
  gemma-wrapper --cache-dir ~/.gemma-cache -- \
124
149
  -g test/data/input/BXD_geno.txt.gz \
125
150
  -p test/data/input/BXD_pheno.txt \
151
+ -a test/data/input/BXD_snps.txt \
126
152
  -gk \
127
153
  -debug
128
154
 
@@ -130,10 +156,10 @@ will store K in ~/.gemma-cache.
130
156
 
131
157
  ### GWA
132
158
 
133
- Run the LMM using the K's captured in K.json using the --input
159
+ Run the LMM using the K's captured earlier in K.json using the --input
134
160
  switch
135
161
 
136
- gemma-wrapper --json --loco --input K.json -- \
162
+ gemma-wrapper --json --input K.json -- \
137
163
  -g test/data/input/BXD_geno.txt.gz \
138
164
  -p test/data/input/BXD_pheno.txt \
139
165
  -c test/data/input/BXD_covariates2.txt \
@@ -153,7 +179,7 @@ https://github.com/genetics-statistics/GEMMA/issues/46). To loop all
153
179
  chromosomes first create all K's with
154
180
 
155
181
  gemma-wrapper --json \
156
- --loco 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,X -- \
182
+ --loco -- \
157
183
  -g test/data/input/BXD_geno.txt.gz \
158
184
  -p test/data/input/BXD_pheno.txt \
159
185
  -a test/data/input/BXD_snps.txt \
@@ -201,12 +227,24 @@ Next, using K.json, permute the phenotypes with something like
201
227
  -lmm 2 -maf 0.1 \
202
228
  -debug > GWA.json
203
229
 
204
- This should get the 95% significant and 67% suggestive thresholds:
230
+ This should get the estimated 95% (significant) and 67% (suggestive) thresholds:
231
+
232
+ ["95 percentile (significant) ", 1.92081e-05, 4.7]
233
+ ["67 percentile (suggestive) ", 5.227785e-05, 4.3]
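As a rough sketch of how such thresholds can be derived: the script collects the smallest p-value of each permutation, sorts the list, and indexes into it with `(ls.size - ls.size*0.95).floor`. The example below reuses that index arithmetic. The 0.67 index for the suggestive threshold and the `-log10` rounding in the third column are assumptions inferred from the output shown above, and `permutation_thresholds` is a hypothetical helper name.

```ruby
# Hypothetical helper (not part of gemma-wrapper): derive thresholds from the
# minimum p-value observed in each shuffled-phenotype run.
def permutation_thresholds(min_pvalues)
  sorted = min_pvalues.sort
  pick = lambda do |fraction|
    # Same index arithmetic as the script uses for the 95% threshold.
    idx = (sorted.size - sorted.size * fraction).floor
    p_value = sorted[idx]
    [p_value, (-Math.log10(p_value)).round(1)]
  end
  { significant: pick.call(0.95), suggestive: pick.call(0.67) }
end

# Ten made-up minimum p-values, one per permutation.
scores = [5.0e-5, 1.0e-5, 9.0e-5, 3.0e-5, 7.0e-5,
          2.0e-5, 1.0e-4, 4.0e-5, 8.0e-5, 6.0e-5]
thresholds = permutation_thresholds(scores)
p thresholds[:significant]
p thresholds[:suggestive]
```

With only ten permutations the estimate is very coarse; in practice a much larger number of permutations would be needed for stable thresholds.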
234
+
235
+ ### Slurm PBS
205
236
 
206
- ["95 percentile (significant) ", 2.015475e-05, 4.7]
207
- ["67 percentile (suggestive) ", 2.015475e-05, 4.7]
237
+ To run gemma-wrapper on HPC use the '--slurm' switch.
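For orientation, here is a minimal sketch of the kind of batch script that gets written for each job when `--slurm` is given. The `#SBATCH` directives are the ones that appear in `bin/gemma-wrapper`; the helper name `slurm_script`, the job name, and the GEMMA command line are placeholders chosen for the example.

```ruby
# Hypothetical helper for illustration; gemma-wrapper builds this text inline.
def slurm_script(cmd, job_name)
  <<~SCRIPT
    #!/bin/bash
    #SBATCH --job-name=#{job_name}
    #SBATCH --ntasks=1
    #SBATCH --time=20:00
    srun #{cmd}
  SCRIPT
end

# The command and file names below are placeholders.
puts slurm_script("gemma -g geno.txt.gz -p pheno.txt -gk", "gemma-demo")
```

The wrapper then hands the written script to `sbatch`, together with any options passed via `--slurm=opts`.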
208
238
 
239
+ ## Development
240
+
241
+ We use GNU Guix for development and deployment. Use the [.guix-deploy](.guix-deploy) script in the checked out git repo:
242
+
243
+ ```
244
+ source .guix-deploy
245
+ ruby bin/gemma-wrapper --help
246
+ ```
209
247
 
210
248
  ## Copyright
211
249
 
212
- Copyright (c) 2017,2018 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
250
+ Copyright (c) 2017-2021 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.98.1
1
+ 0.99.4
data/bin/gemma-wrapper CHANGED
@@ -4,7 +4,7 @@
4
4
  # Author:: Pjotr Prins
5
5
  # License:: GPL3
6
6
  #
7
- # Copyright (C) 2017,2018 Pjotr Prins <pjotr.prins@thebird.nl>
7
+ # Copyright (C) 2017-2021 Pjotr Prins <pjotr.prins@thebird.nl>
8
8
 
9
9
  USAGE = "
10
10
  GEMMA wrapper example:
@@ -14,12 +14,12 @@ GEMMA wrapper example:
14
14
  gemma-wrapper -- \\
15
15
  -g test/data/input/BXD_geno.txt.gz \\
16
16
  -p test/data/input/BXD_pheno.txt \\
17
+ -a test/data/input/BXD_snps.txt \\
17
18
  -gk
18
19
 
19
20
  LOCO K computation with caching and JSON output
20
21
 
21
- gemma-wrapper --json \\
22
- --loco 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,X -- \\
22
+ gemma-wrapper --json --loco -- \\
23
23
  -g test/data/input/BXD_geno.txt.gz \\
24
24
  -p test/data/input/BXD_pheno.txt \\
25
25
  -a test/data/input/BXD_snps.txt \\
@@ -38,11 +38,10 @@ GEMMA wrapper example:
38
38
  Gemma gets used from the path. You can override by setting
39
39
 
40
40
  env GEMMA_COMMAND=path/bin/gemma gemma-wrapper ...
41
-
42
41
  "
43
42
  # These are used for testing compatibility with the gemma tool
44
43
  GEMMA_V_MAJOR = 98
45
- GEMMA_V_MINOR = 0
44
+ GEMMA_V_MINOR = 4
46
45
 
47
46
  basepath = File.dirname(File.dirname(__FILE__))
48
47
  $: << File.join(basepath,'lib')
@@ -66,17 +65,21 @@ if not gemma_command
66
65
  end
67
66
 
68
67
 
68
+ require 'digest/sha1'
69
69
  require 'fileutils'
70
70
  require 'optparse'
71
- require 'tmpdir'
72
71
  require 'tempfile'
72
+ require 'tmpdir'
73
+
74
+ require 'lock'
73
75
 
74
76
  split_at = ARGV.index('--')
77
+
75
78
  if split_at
76
79
  gemma_args = ARGV[split_at+1..-1]
77
80
  end
78
81
 
79
- options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir() }
82
+ options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir(), quiet: false, permute_phenotypes: false, parallel: nil }
80
83
 
81
84
  opts = OptionParser.new do |o|
82
85
  o.banner = "\nUsage: #{File.basename($0)} [options] -- [gemma-options]"
@@ -91,8 +94,12 @@ opts = OptionParser.new do |o|
91
94
  raise "Phenotype input file #{phenotypes} does not exist" if !File.exist?(phenotypes)
92
95
  end
93
96
 
94
- o.on('--loco [x,y,1,2,3...]', Array, 'Run full LOCO') do |lst|
95
- options[:loco] = lst
97
+ o.on('--loco', 'Run full leave-one-chromosome-out (LOCO)') do |b|
98
+ options[:loco] = b
99
+ end
100
+
101
+ o.on('--chromosomes [1,2,3]',Array,'Run specific chromosomes') do |lst|
102
+ options[:chromosomes] = lst
96
103
  end
97
104
 
98
105
  o.on('--input filen',String, 'JSON input variables (used for LOCO)') do |filen|
@@ -112,6 +119,22 @@ opts = OptionParser.new do |o|
112
119
  options[:force] = true
113
120
  end
114
121
 
122
+ o.on("--parallel", "Run jobs in parallel") do |b|
123
+ options[:parallel] = true
124
+ end
125
+
126
+ o.on("--no-parallel", "Do not run jobs in parallel") do |b|
127
+ options[:parallel] = false
128
+ end
129
+
130
+ o.on("--slurm[=opts]",String,"Use slurm PBS for submitting jobs") do |slurm|
131
+ options[:slurm_opts] = ""
132
+ options[:slurm] = true
133
+ if slurm
134
+ options[:slurm_opts] = slurm
135
+ end
136
+ end
137
+
115
138
  o.on("--q", "--quiet", "Run quietly") do |q|
116
139
  options[:quiet] = true
117
140
  end
@@ -120,15 +143,20 @@ opts = OptionParser.new do |o|
120
143
  options[:verbose] = true
121
144
  end
122
145
 
123
- o.on("--debug", "Show debug messages and keep intermediate output") do |v|
146
+ o.on("-d", "--debug", "Show debug messages and keep intermediate output") do |v|
124
147
  options[:debug] = true
125
148
  end
126
149
 
150
+ o.on("--dry-run", "Show commands, but don't execute") do |b|
151
+ options[:dry_run] = b
152
+ end
153
+
127
154
  o.on('--','Anything after gets passed to GEMMA') do
128
155
  o.terminate()
129
156
  end
130
157
 
131
158
  o.separator ""
159
+
132
160
  o.on_tail('-h', '--help', 'display this help and exit') do
133
161
  options[:show_help] = true
134
162
  end
@@ -168,26 +196,46 @@ warning = lambda do |*msg|
168
196
  record[:warnings].push *msg.join("")
169
197
  OUTPUT.print "WARNING: ",*msg,"\n"
170
198
  end
199
+
171
200
  info = lambda do |*msg|
172
201
  record[:debug].push *msg.join("") if options[:json] and options[:debug]
173
202
  OUTPUT.print *msg,"\n" if !options[:quiet]
174
203
  end
175
204
 
205
+ # Fetch chromosomes
206
+ def get_chromosomes annofn
207
+ h = {}
208
+ File.open(annofn,"r").each_line do | line |
209
+ chr = line.split(/\s+/)[2]
210
+ h[chr] = true
211
+ end
212
+ h.map { |k,v| k }
213
+ end
176
214
  # ---- Start banner
177
215
 
178
216
  GEMMA_K_VERSION=version
179
- GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017,2018\n"
217
+ GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017-2021\n"
180
218
  info.call GEMMA_K_BANNER
181
219
 
182
220
  # Check gemma version
183
- GEMMA_COMMAND=options[:gemma_command]
221
+ begin
222
+ gemma_command2 = options[:gemma_command]
223
+ info.call "NOTE: gemma-wrapper is soon to be replaced by gemma2/lib"
224
+
225
+ GEMMA_INFO = `#{gemma_command2}`
226
+ rescue Errno::ENOENT
227
+ gemma_command2 = "gemma"
228
+ error.call "<#{gemma_command2}> command not found"
229
+ end
184
230
 
185
- gemma_version_header = `#{GEMMA_COMMAND}`.split("\n").grep(/GEMMA|Version/)[0].strip
231
+ gemma_version_header = GEMMA_INFO.split("\n").grep(/GEMMA|Version/)[0].strip
186
232
  info.call "Using ",gemma_version_header,"\n"
187
233
  gemma_version = gemma_version_header.split(/[,\s]+/)[1]
188
234
  v_version, v_major, v_minor = gemma_version.split(".")
189
235
  info.call "Found #{gemma_version}, comparing against expected v0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}"
190
236
 
237
+ info.call gemma_version_header
238
+
191
239
  warning.call "GEMMA version is out of date. Update GEMMA to 0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}!" if v_major.to_i < GEMMA_V_MAJOR or (v_major.to_i == GEMMA_V_MAJOR and (v_minor != nil and v_minor.to_i < GEMMA_V_MINOR))
192
240
 
193
241
  options[:gemma_version_header] = gemma_version_header
@@ -203,74 +251,160 @@ if RUBY_VERSION =~ /^1/
203
251
  warning.call "runs on Ruby 2.x only\n"
204
252
  end
205
253
 
254
+ # ---- LOCO defaults to parallel
255
+ if options[:parallel] == nil
256
+ options[:parallel] = true if options[:loco]
257
+ end
258
+
259
+ debug.call(options) # some debug output
260
+ debug.call(record)
261
+
206
262
  DO_COMPUTE_KINSHIP = gemma_args.include?("-gk")
207
263
  DO_COMPUTE_GWA = !DO_COMPUTE_KINSHIP
208
264
 
265
+ if options[:parallel]
266
+ begin
267
+ skip_cite = `echo "will cite" |parallel --citation`
268
+ debug.call(skip_cite)
269
+ PARALLEL_INFO = `parallel --help`
270
+ rescue Errno::ENOENT
271
+ error.call "<parallel> command not found"
272
+ end
273
+ parallel_cmds = []
274
+ end
275
+
276
+ # ---- Fetch chromosomes from SNP annotation file
277
+ anno_idx = gemma_args.index '-a'
278
+ raise "Expected GEMMA -a SNP annotation file switch" if anno_idx == nil
279
+ CHROMOSOMES = get_chromosomes(gemma_args[anno_idx+1])
280
+
209
281
  # ---- Compute HASH on inputs
210
282
  hashme = []
211
283
  geno_idx = gemma_args.index '-g'
212
284
  raise "Expected GEMMA -g genotype file switch" if geno_idx == nil
213
285
  pheno_idx = gemma_args.index '-p'
214
- hashme =
215
- if DO_COMPUTE_KINSHIP and pheno_idx != nil
216
- # Remove the phenotype file from the hash
217
- gemma_args[0..pheno_idx-1] + gemma_args[pheno_idx+2..-1]
218
- else
219
- gemma_args
220
- end
221
286
 
222
287
  if DO_COMPUTE_GWA and options[:permute_phenotypes]
223
288
  raise "Did not expect GEMMA -p phenotype with permutations (only use --permute-phenotypes)" if pheno_idx
224
- hashme += ['-p', options[:permute_phenotypes]]
225
289
  end
226
290
 
227
- require 'digest/sha1'
228
- debug.call "Hashing on ",hashme,"\n"
229
- hashes = []
230
- hashme.each do | item |
231
- if File.exist?(item)
232
- hashes << Digest::SHA1.hexdigest(File.read(item))
233
- debug.call [item,hashes.last]
291
+ execute = lambda { |cmd|
292
+ info.call("Executing: #{cmd}")
293
+ err = 0
294
+ if not options[:debug]
295
+ # send output to stderr line by line
296
+ IO.popen("#{cmd}") do |io|
297
+ while s = io.gets
298
+ $stderr.print s
299
+ end
300
+ io.close
301
+ err = $?.to_i
302
+ end
234
303
  else
235
- hashes << item
304
+ $stderr.print `#{cmd}`
305
+ err = $?.to_i
306
+ end
307
+ err
308
+ }
309
+
310
+ compute_hash = lambda do | phenofn = nil |
311
+ # Compute a HASH on the inputs
312
+ debug.call "Hashing on ",hashme,"\n"
313
+ hashes = []
314
+ hm = if phenofn
315
+ hashme + ["-p", phenofn]
316
+ else
317
+ hashme
318
+ end
319
+ debug.call(hm)
320
+ hm.each do | item |
321
+ if File.file?(item)
322
+ hashes << Digest::SHA1.hexdigest(File.read(item))
323
+ debug.call [item,hashes.last]
324
+ else
325
+ hashes << item
326
+ end
236
327
  end
328
+ debug.call(hashes)
329
+ Digest::SHA1.hexdigest hashes.join(' ')
237
330
  end
238
- HASH = Digest::SHA1.hexdigest hashes.join(' ')
239
331
 
332
+ HASH = compute_hash.call()
240
333
  options[:hash] = HASH
241
334
 
335
+ at_exit do
336
+ Lock.release(HASH)
337
+ end
338
+
339
+ Lock.create(HASH) # this will wait for a lock to expire
340
+
341
+ joblog = options[:cache_dir]+"/"+HASH+"-parallel.log"
342
+
242
343
  # Create cache dir
243
344
  FileUtils::mkdir_p options[:cache_dir]
244
345
 
346
+ Dir.mktmpdir do |tmpdir| # tmpdir for GEMMA output
347
+
245
348
  error.call "Do not use the GEMMA -o switch!" if gemma_args.include? '-o'
246
349
  error.call "Do not use the GEMMA -outdir switch!" if gemma_args.include? '-outdir'
350
+ GEMMA_ARGS_HASH = gemma_args.dup # do not include outdir
247
351
  gemma_args << '-outdir'
248
- gemma_args << options[:cache_dir]
352
+ gemma_args << tmpdir
249
353
  GEMMA_ARGS = gemma_args
250
354
 
355
+ hashme =
356
+ if DO_COMPUTE_KINSHIP and pheno_idx != nil
357
+ # Remove the phenotype file from the hash for GRM computation
358
+ GEMMA_ARGS_HASH[0..pheno_idx-1] + GEMMA_ARGS_HASH[pheno_idx+2..-1]
359
+ else
360
+ GEMMA_ARGS_HASH
361
+ end
362
+
251
363
  debug.call "Options: ",options,"\n" if !options[:quiet]
252
364
 
253
- invoke_gemma = lambda do |extra_args, cache_hit = false|
254
- cmd="#{GEMMA_COMMAND} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
365
+ invoke_gemma = lambda do |extra_args, cache_hit = false, chr = "full", permutation = 1|
366
+ cmd = "#{gemma_command2} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
255
367
  record[:gemma_command] = cmd
256
368
  return if cache_hit
257
- debug.call cmd
369
+ if options[:slurm]
370
+ info.call cmd
371
+ hashi = HASH
372
+ prefix = tmpdir+'/'+hashi
373
+ scriptfn = prefix+".#{chr}.#{permutation}-pbs.sh"
374
+ script = "#!/bin/bash
375
+ #SBATCH --job-name=gemma-#{scriptfn}
376
+ #SBATCH --ntasks=1
377
+ #SBATCH --time=20:00
378
+ srun #{cmd}
379
+ "
380
+ debug.call(script)
381
+ File.open(scriptfn,"w") { |f|
382
+ f.write(script)
383
+ }
384
+ cmd = "sbatch "+options[:slurm_opts]+" "+scriptfn
385
+ end
258
386
  errno =
259
387
  if options[:json]
260
388
  # capture output
261
389
  err = 0
262
- IO.popen(cmd) do |io|
263
- while s = io.gets
264
- $stderr.print s
265
- end
266
- io.close
267
- err = $?.to_i
390
+ if options[:dry_run]
391
+ info.call("Would have invoked: ",cmd)
392
+ elsif options[:parallel]
393
+ info.call("Add parallel job: ",cmd)
394
+ parallel_cmds << cmd
395
+ else
396
+ err = execute.call(cmd)
268
397
  end
269
398
  err
270
399
  else
271
- debug.call("Invoking ",cmd) if options[:debug]
272
- system(cmd)
273
- $?.exitstatus
400
+ if options[:dry_run]
401
+ info.call("Would have invoked ",cmd)
402
+ 0
403
+ else
404
+ debug.call("Invoking ",cmd) if options[:debug]
405
+ system(cmd)
406
+ $?.exitstatus
407
+ end
274
408
  end
275
409
  if errno != 0
276
410
  debug.call "Gemma exit ",errno
@@ -280,11 +414,14 @@ invoke_gemma = lambda do |extra_args, cache_hit = false|
280
414
  end
281
415
  end
282
416
 
417
+ # Takes the hash value and checks whether the (output) file exists
283
418
  # returns datafn, logfn, cache_hit
284
- cache = lambda do | chr, ext |
419
+ cache = lambda do | chr, ext, h=HASH, permutation=0 |
285
420
  inject = (chr==nil ? "" : ".#{chr}" )+ext
286
- hashi = (chr==nil ? HASH : HASH+inject)
287
- prefix = options[:cache_dir]+'/'+hashi
421
+ hashi = (chr==nil ? h : h+inject)
422
+ prefix = options[:cache_dir]+'/'+hashi+(permutation!=0 ? "."+permutation.to_s : "")
423
+ # for chr 3 and permutation 1 forms something like
424
+ # /tmp/1b700-a996f.3.cXX.txt.1.log.txt
288
425
  logfn = prefix+".log.txt"
289
426
  datafn = prefix+ext
290
427
  record[:files] ||= []
@@ -320,25 +457,32 @@ kinship = lambda do | chr = nil |
320
457
  end
321
458
 
322
459
  # ---- Run GWA
323
- gwas = lambda do | chr, kfn, pfn |
460
+ gwas = lambda do | chr, kfn, pfn, permutation=0 |
324
461
  record[:type] = "GWA"
325
462
  error.call "Do not use the GEMMA -k switch with gemma-wrapper - it is automatic!" if GEMMA_ARGS.include? '-k' # K is automatic
326
- hashi, cache_hit = cache.call chr,".assoc.txt"
463
+ # Update hash for each permutation
464
+ hash = compute_hash.call(pfn)
465
+ hashi, cache_hit = cache.call(chr,".assoc.txt",hash,permutation)
327
466
  if not cache_hit
328
467
  args = [ '-k', kfn, '-o', hashi ]
329
468
  args << [ '-loco', chr ] if chr != nil
330
469
  args << [ '-p', pfn ] if pfn
331
- invoke_gemma.call args
470
+ invoke_gemma.call args,false,chr,permutation
332
471
  end
333
472
  end
334
473
 
335
474
  LOCO = options[:loco]
336
- # if GEMMA_ARGS.include? '-gk'
475
+ if LOCO
476
+ if options[:chromosomes]
477
+ CHROMOSOMES = options[:chromosomes]
478
+ end
479
+ end
480
+
337
481
  if DO_COMPUTE_KINSHIP
338
482
  # compute K
339
- info.call LOCO
340
- if LOCO != nil
341
- LOCO.each do |chr|
483
+ info.call CHROMOSOMES
484
+ if LOCO
485
+ CHROMOSOMES.each do |chr|
342
486
  info.call "LOCO for ",chr
343
487
  kinship.call(chr)
344
488
  end
@@ -347,13 +491,24 @@ if DO_COMPUTE_KINSHIP
347
491
  end
348
492
  else
349
493
  # DO_COMPUTE_GWA
350
- json_in = JSON.parse(File.read(options[:input]))
494
+ begin
495
+ json_in = JSON.parse(File.read(options[:input]))
496
+ rescue TypeError
497
+ raise "Missing JSON input file?"
498
+ end
351
499
  raise "JSON problem, file #{options[:input]} is not -gk derived" if json_in["type"] != "K"
352
500
 
353
501
  pfn = options[:permute_phenotypes] # can be nil
354
- k_files = json_in["files"].map { |rec| [rec[0],rec[2]] }
355
- k_files.each do | chr, kfn | # call a GWA for each chromosome
356
- gwas.call(chr,kfn,pfn)
502
+ if LOCO
503
+ k_files = json_in["files"].map { |rec| [rec[0],rec[2]] }
504
+ k_files.each do | chr, kfn | # call a GWA for each chromosome
505
+ gwas.call(chr,kfn,pfn)
506
+ end
507
+ else
508
+ kfn = json_in["files"][0][2]
509
+ CHROMOSOMES.each do | chr |
510
+ gwas.call(chr,kfn,pfn)
511
+ end
357
512
  end
358
513
  # Permute
359
514
  if options[:permutate]
@@ -364,10 +519,10 @@ else
364
519
  end
365
520
  score_list = []
366
521
  debug.call(options[:permutate],"x permutations")
367
- (1..options[:permutate]).each do |i|
368
- $stderr.print "Iteration ",i,"\n"
522
+ (1..options[:permutate]).each do |permutation|
523
+ $stderr.print "Iteration ",permutation,"\n"
369
524
  # Create a shuffled phenotype file
370
- file = File.open("phenotypes-#{i}","w")
525
+ file = File.open("phenotypes-#{permutation}","w")
371
526
  tmp_pfn = file.path
372
527
  p tmp_pfn
373
528
  ps.shuffle.each do | l |
@@ -375,20 +530,23 @@ else
375
530
  end
376
531
  file.close
377
532
  k_files.each do | chr, kfn | # call a GWA for each chromosome
378
- gwas.call(chr,kfn,tmp_pfn)
533
+ gwas.call(chr,kfn,tmp_pfn,permutation)
379
534
  end
380
- # p [:HEY,record[:files].last]
381
- assocfn = record[:files].last[2]
382
- debug.call("Reading ",assocfn)
383
535
  score_min = 1000.0
384
- File.foreach(assocfn).with_index do |assoc, assoc_line_num|
385
- if assoc_line_num > 0
386
- value = assoc.strip.split(/\t/).last.to_f
387
- score_min = value if value < score_min
536
+ if false and not options[:slurm]
537
+ # p [:HEY,record[:files].last]
538
+ assocfn = record[:files].last[2]
539
+ debug.call("Reading ",assocfn)
540
+ File.foreach(assocfn).with_index do |assoc, assoc_line_num|
541
+ if assoc_line_num > 0
542
+ value = assoc.strip.split(/\t/).last.to_f
543
+ score_min = value if value < score_min
544
+ end
388
545
  end
389
546
  end
390
547
  score_list << score_min
391
548
  end
549
+ exit 0 if options[:slurm]
392
550
  ls = score_list.sort
393
551
  p ls
394
552
  significant = ls[(ls.size - ls.size*0.95).floor]
@@ -399,5 +557,40 @@ else
399
557
  end
400
558
  end
401
559
 
560
+ # ---- Invoke parallel
561
+ if options[:parallel]
562
+ # parallel_cmds = ["echo 1","sleep 1 && echo 2", "false", "echo 3"]
563
+ cmd = parallel_cmds.join("\\n")
564
+
565
+ cmd = "echo -e \"#{cmd}\""
566
+ err = execute.call(cmd+"|parallel --joblog #{joblog}") # first try optimistically to run all jobs in parallel
567
+ if err != 0
568
+ [16,8,4,1].each do |jobs|
569
+ info.call("Failed to complete parallel run -- retrying with smaller RAM footprint!")
570
+ err = execute.call(cmd+"|parallel -j #{jobs} --resume --joblog #{joblog}")
571
+ break if err == 0
572
+ end
573
+ if err != 0
574
+ info.call("Run failed!")
575
+ # Remove remaining files
576
+ FileUtils.rm_rf(Dir.glob("#{tmpdir}/*"), secure: true)
577
+ exit err
578
+ end
579
+ end
580
+ info.call("Run successful!")
581
+ end
402
582
  json_out.call
403
- exit 0
583
+
584
+ # copy all output files to the cache_dir. If a file exists only emit a warning
585
+ Dir.glob("*.txt", base: tmpdir) do | fn |
586
+ source = tmpdir + "/" + fn
587
+ dest = options[:cache_dir] + "/" + fn
588
+ if not File.exist?(dest) or options[:force]
589
+ info.call "Move #{source} to #{dest}"
590
+ FileUtils.mv source, dest, verbose: false
591
+ else
592
+ warning.call "File #{dest} already exists. Not overwriting"
593
+ end
594
+ end
595
+
596
+ end # tmpdir
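The parallel retry strategy above (run everything at once, then fall back to 16, 8, 4 and finally 1 simultaneous jobs) can be isolated as a small function. This is a sketch under stated assumptions: `run_with_fallback` is a hypothetical helper, and the block stands in for the real call to `parallel` (with `-j`, `--resume` and `--joblog` on the retries) so the example runs without GNU parallel installed.

```ruby
# Hypothetical helper, not part of gemma-wrapper. The block receives the job
# limit (nil means "no limit") and returns an exit status; 0 means success.
def run_with_fallback(job_limits = [16, 8, 4, 1], &runner)
  status = runner.call(nil)        # optimistic first attempt, no job limit
  return status if status == 0
  job_limits.each do |jobs|
    status = runner.call(jobs)     # retry with fewer simultaneous jobs
    break if status == 0
  end
  status                           # non-zero if every attempt failed
end

# Stand-in runner: pretend the machine only copes with four jobs or fewer.
attempts = []
status = run_with_fallback do |jobs|
  attempts << jobs
  jobs && jobs <= 4 ? 0 : 1
end
puts "attempts=#{attempts.inspect} status=#{status}"
# prints: attempts=[nil, 16, 8, 4] status=0
```

Reducing the job count lowers peak memory use, which is why the script's retry message mentions a smaller RAM footprint.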
@@ -2,10 +2,11 @@ Gem::Specification.new do |s|
2
2
  s.name = 'bio-gemma-wrapper'
3
3
  s.version = File.read('VERSION')
4
4
  s.summary = "GEMMA with LOCO and permutations"
5
- s.description = "GEMMA wrapper adds LOCO and permutation support. Also caches K between runs with LOCO support"
5
+ s.description = "GEMMA wrapper adds LOCO and permutation support. Also runs in parallel and caches K between runs with LOCO support"
6
6
  s.authors = ["Pjotr Prins"]
7
7
  s.email = 'pjotr.public01@thebird.nl'
8
8
  s.files = ["bin/gemma-wrapper",
9
+ "lib/lock.rb",
9
10
  "Gemfile",
10
11
  "LICENSE.txt",
11
12
  "README.md",
data/lib/lock.rb ADDED
@@ -0,0 +1,95 @@
1
+ # Locking module for gemma (wrapper)
2
+ #
3
+
4
+ =begin
5
+
6
+ The logic is as follows:
7
+
8
+ 1. a program creates a named lock file (based on a hash of its inputs) with its PID
9
+ 2. on exit it destroys the file
10
+ 3. a new program checks for the lock file
11
+ 4. if it exists and the PID is still in the ps table - wait
12
+ 5. when the pid disappears or the lock file - continue
13
+ 6. a timeout will return an error in 3 minutes
14
+
15
+ Note that there is a theoretical chance the lock file existed, but disappeared. I think I have it covered by ignoring the unlink errors. Also the use of /proc/PID is Linux specific.
16
+
17
+ =end
18
+
19
+
20
+ require 'timeout'
21
+
22
+ module Lock
23
+
24
+ def self.local name
25
+ ENV['HOME']+"/."+name.gsub("/","-")+".lck"
26
+ end
27
+
28
+ def self.lock_pid name
29
+ lockfn = local(name)
30
+ if File.exist?(lockfn)
31
+ File.read(lockfn).to_i
32
+ else
33
+ 0
34
+ end
35
+ end
36
+
37
+ def self.locked? name
38
+ lockfn = local(name)
39
+ pid = lock_pid(name)
40
+ if File.exist?("/proc/#{pid}")
41
+ true
42
+ else
43
+ # the program went away - remove any 'stale' lock
44
+ begin
45
+ File.unlink(lockfn)
46
+ rescue Errno::ENOENT
47
+ # ignore error when the lock file went missing
48
+ end
49
+ false # --> no longer locked
50
+ end
51
+ end
52
+
53
+ def Lock::create name
54
+ wait_for(name)
55
+ lockfn = local(name)
56
+ if File.exist?(lockfn)
57
+ $stderr.print "\nERROR: Can not steal #{lockfn}"
58
+ exit 1
59
+ end
60
+ File.open(lockfn, File::RDWR|File::CREAT, 0644) do |f|
61
+ f.flock(File::LOCK_EX)
62
+ f.print(Process.pid)
63
+ end
64
+ end
65
+
66
+ def Lock::wait_for name
67
+ lockfn = local(name)
68
+ begin
69
+ status = Timeout::timeout(180) { # 3 minutes
70
+ while locked?(name)
71
+ $stderr.print("\nWaiting for lock #{lockfn}...")
72
+ sleep 2
73
+ end
74
+ }
75
+ rescue Timeout::Error
76
+ $stderr.print "\nERROR: Timed out, but I can not steal #{lockfn}"
77
+ exit 1
78
+ end
79
+ # yah! lock is released
80
+ end
81
+
82
+ def Lock::release name
83
+ lockfn = local(name)
84
+ if Process.pid == lock_pid(name)
85
+ begin
86
+ File.unlink(lockfn) # PID expired
87
+ rescue Errno::ENOENT
88
+ # ignore error when the lock file went missing
89
+ end
90
+ else
91
+ $stderr.print "\nERROR: can not release #{lockfn} because it is not owned by me"
92
+ end
93
+ end
94
+
95
+ end
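The header comment of `lib/lock.rb` notes that the `/proc/PID` check is Linux specific. The sketch below shows one possible portable variant of the same PID-file idea. It is not the gem's implementation: it swaps in `Process.kill(0, pid)` for the `/proc` lookup and `File::EXCL` for atomic creation, and the module name `PidLock` and the explicit `dir` argument are assumptions made for the example.

```ruby
require 'tmpdir'

# Hypothetical PidLock module for illustration; it is not the gem's Lock module.
module PidLock
  def self.path(dir, name)
    File.join(dir, "." + name.gsub("/", "-") + ".lck")
  end

  # True when the PID recorded in the lock file belongs to a live process.
  def self.held?(dir, name)
    pid = File.read(path(dir, name)).to_i
    return false if pid <= 0
    Process.kill(0, pid) # signal 0 only checks that the process exists
    true
  rescue Errno::ENOENT, Errno::ESRCH
    false # no lock file, or the owning process is gone
  rescue Errno::EPERM
    true  # the process exists but belongs to another user
  end

  # Atomically create the lock file; returns false if it already exists.
  def self.acquire(dir, name)
    File.open(path(dir, name), File::WRONLY | File::CREAT | File::EXCL, 0644) do |f|
      f.print(Process.pid)
    end
    true
  rescue Errno::EEXIST
    false
  end

  # Remove the lock file, but only when this process owns it.
  def self.release(dir, name)
    file = path(dir, name)
    File.unlink(file) if File.read(file).to_i == Process.pid
  rescue Errno::ENOENT
    nil # the lock file already went away
  end
end

Dir.mktmpdir do |dir|
  PidLock.acquire(dir, "demo")
  puts PidLock.held?(dir, "demo")
  PidLock.release(dir, "demo")
  puts PidLock.held?(dir, "demo")
end
```

Opening with `File::EXCL` lets the file system decide which of two competing processes creates the lock, which closes the gap between checking for the file and writing it.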
metadata CHANGED
@@ -1,17 +1,17 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: bio-gemma-wrapper
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.98.1
4
+ version: 0.99.4
5
5
  platform: ruby
6
6
  authors:
7
7
  - Pjotr Prins
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2018-11-20 00:00:00.000000000 Z
11
+ date: 2021-11-25 00:00:00.000000000 Z
12
12
  dependencies: []
13
- description: GEMMA wrapper adds LOCO and permutation support. Also caches K between
14
- runs with LOCO support
13
+ description: GEMMA wrapper adds LOCO and permutation support. Also runs in parallel
14
+ and caches K between runs with LOCO support
15
15
  email: pjotr.public01@thebird.nl
16
16
  executables:
17
17
  - gemma-wrapper
@@ -24,6 +24,7 @@ files:
24
24
  - VERSION
25
25
  - bin/gemma-wrapper
26
26
  - gemma-wrapper.gemspec
27
+ - lib/lock.rb
27
28
  homepage: https://github.com/genetics-statistics/gemma-wrapper
28
29
  licenses:
29
30
  - GPL3
@@ -43,8 +44,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
43
44
  - !ruby/object:Gem::Version
44
45
  version: '0'
45
46
  requirements: []
46
- rubyforge_project:
47
- rubygems_version: 2.6.8
47
+ rubygems_version: 3.1.4
48
48
  signing_key:
49
49
  specification_version: 4
50
50
  summary: GEMMA with LOCO and permutations