bio-gemma-wrapper 0.98.1 → 0.99.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
- SHA1:
3
- metadata.gz: b7a8bfe787236f397dba6e05aef202486ba53389
4
- data.tar.gz: 9fe2398ed3fcd053e8258f73a64b1610bfddf9e2
2
+ SHA256:
3
+ metadata.gz: da5f26b8acd9c3782c2b3f5f2a39af965fc7e1785cc820b49faca82924d74e51
4
+ data.tar.gz: 17035ee5fada269ae88dd0ed91d84075b2af88b400de1d0e9829cbdb60d5d0cb
5
5
  SHA512:
6
- metadata.gz: 0a691906f13da3469597517d160874315f5962822b0273757d33c37283c2a9a98da9a41d08c51e4ed7854b3d3593f9d96ac81d59212690edd628a2677499d501
7
- data.tar.gz: 5d4b898ff3566f52652cbb7db6597dc245e9a5da6b0d9da107de81deec0d51c02c822762e0ab921a2309c1dbc3455b55c1adc5fe94c891e31a9c32cb6d343f2e
6
+ metadata.gz: eaec3c7dad4fc1bda713765e056bfe11dd69d4ca850333fed5a1a27e344724365a705ddf7845ce63b5af6b35ab6140da10f4bc7067aaa4539e47f6c6f94de1f0
7
+ data.tar.gz: c26b282c0fd7c70a702467e58c3f6ea22f820d91a8a364b335cdab7e807add9cf1079faa25c46a87b89c78e4293990cdea2210427c5f9c3565bd5040fdbef496
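The checksums above move from SHA1 to SHA256 while keeping SHA512. As a minimal sketch (not part of the gem), a digest listed here could be checked against a locally extracted archive with Ruby's standard `digest` library. The helper names `file_digest` and `checksum_ok?` and the file path in the usage note are hypothetical.

```ruby
require 'digest/sha2'

# Hypothetical helper: hex digest of a file for the given algorithm
# (:sha256 or :sha512, the two algorithms listed in checksums.yaml).
def file_digest(path, algorithm = :sha256)
  klass = { sha256: Digest::SHA256, sha512: Digest::SHA512 }.fetch(algorithm)
  klass.file(path).hexdigest
end

# Hypothetical helper: true when the file's digest equals the expected hex string.
def checksum_ok?(path, expected_hex, algorithm = :sha256)
  file_digest(path, algorithm) == expected_hex.strip.downcase
end
```

For example, `checksum_ok?('data.tar.gz', '17035ee5fada269ae88dd0ed91d84075b2af88b400de1d0e9829cbdb60d5d0cb')` should return true for the 0.99.4 archive, assuming `data.tar.gz` has been extracted from the downloaded `.gem` file into the current directory.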
data/README.md CHANGED
@@ -1,12 +1,20 @@
1
1
  [![gemma-wrapper gem version](https://badge.fury.io/rb/bio-gemma-wrapper.svg)](https://badge.fury.io/rb/bio-gemma-wrapper)
2
2
 
3
- # GEMMA wrapper caches K between runs with LOCO support
3
+ # GEMMA with LOCO, permutations and slurm support (and caching)
4
4
 
5
5
  ![Genetic associations identified in CFW mice using GEMMA (Parker et al,
6
6
  Nat. Genet., 2016)](cfw.gif)
7
7
 
8
8
  ## Introduction
9
9
 
10
+ Gemma-wrapper allows running GEMMA with LOCO, GEMMA with caching,
11
+ GEMMA in parallel (now the default with LOCO), and GEMMA on
12
+ PBS. Gemma-wrapper is used to run GEMMA as part of the
13
+ https://genenetwork.org/ environment.
14
+
15
+ Note that a version of gemma-wrapper is projected to be integrated
16
+ into gemma itself.
17
+
10
18
  GEMMA is a software toolkit for fast application of linear mixed
11
19
  models (LMMs) and related models to genome-wide association studies
12
20
  (GWAS) and other large-scale data sets.
@@ -14,15 +22,21 @@ models (LMMs) and related models to genome-wide association studies
14
22
  This repository contains gemma-wrapper, essentially a wrapper of
15
23
  GEMMA that provides support for caching the kinship or relatedness
16
24
  matrix (K) and caching LM and LMM computations with the option of full
17
- leave-one-chromosome-out genome scans (LOCO).
25
+ leave-one-chromosome-out genome scans (LOCO). Jobs can also be
26
+ submitted to HPC PBS, i.e., slurm.
18
27
 
19
28
  gemma-wrapper requires a recent version of GEMMA and essentially
20
29
  does a pass-through of all standard GEMMA invocation switches. On
21
30
  return gemma-wrapper can return a JSON object (--json) which is
22
31
  useful for web-services.
23
32
 
24
- Note that this a work in progress (WIP). What is described below
25
- should work.
33
+ ## Performance
34
+
35
+ LOCO runs in parallel by default which is at least a 5x performance
36
+ improvement on a machine with enough cores. GEMMA without LOCO,
37
+ however, does not run in parallel by default. Performance
38
+ improvements with the parallel implementation for LOCO and non-LOCO
39
+ can be viewed [here](./test/performance/releases.gmi).
26
40
 
27
41
  ## Installation
28
42
 
@@ -32,8 +46,9 @@ Prerequisites are
32
46
  * Standard [Ruby >2.0 ](https://www.ruby-lang.org/en/) which comes on
33
47
  almost all Linux systems
34
48
 
35
- gemma-wrapper comes as a Ruby [gem](https://rubygems.org/gems/bio-gemma-wrapper) and
36
- can be installed with
49
+ gemma-wrapper comes as a Ruby
50
+ [gem](https://rubygems.org/gems/bio-gemma-wrapper) and can be
51
+ installed with
37
52
 
38
53
  gem install bio-gemma-wrapper
39
54
 
@@ -47,14 +62,19 @@ and it will render something like
47
62
  Usage: gemma-wrapper [options] -- [gemma-options]
48
63
  --permutate n Permutate # times by shuffling phenotypes
49
64
  --permute-phenotypes filen Phenotypes to be shuffled in permutations
50
- --loco [x,y,1,2,3...] Run full LOCO
65
+ --loco Run full leave-one-chromosome-out (LOCO)
66
+ --chromosomes [1,2,3] Run specific chromosomes
51
67
  --input filen JSON input variables (used for LOCO)
52
68
  --cache-dir path Use a cache directory
53
69
  --json Create output file in JSON format
54
- --force Force computation
70
+ --force Force computation (override cache)
71
+ --parallel Run jobs in parallel
72
+ --no-parallel Do not run jobs in parallel
73
+ --slurm[=opts] Use slurm PBS for submitting jobs
55
74
  --q, --quiet Run quietly
56
75
  -v, --verbose Run verbosely
57
- --debug Show debug messages and keep intermediate output
76
+ -d, --debug Show debug messages and keep intermediate output
77
+ --dry-run Show commands, but don't execute
58
78
  -- Anything after gets passed to GEMMA
59
79
 
60
80
  -h, --help display this help and exit
@@ -69,6 +89,8 @@ Unpack it and run the tool as
69
89
 
70
90
  ./bin/gemma-wrapper --help
71
91
 
92
+ See below for using a GNU Guix environment.
93
+
72
94
  ## Usage
73
95
 
74
96
  gemma-wrapper picks up GEMMA from the PATH. To override that behaviour
@@ -90,12 +112,13 @@ the data files are found):
90
112
  gemma-wrapper -- \
91
113
  -g test/data/input/BXD_geno.txt.gz \
92
114
  -p test/data/input/BXD_pheno.txt \
115
+ -a test/data/input/BXD_snps.txt \
93
116
  -gk \
94
117
  -debug
95
118
 
96
119
  Run it twice to see
97
120
 
98
- /tmp/3079151e14b219c3b243b673d88001c1675168b4.log.txt gemma-wrapper CACHE HIT!
121
+ /tmp/0bdd7add5e8f7d9af36b283d0341c115124273e0.log.txt CACHE HIT!
99
122
 
100
123
  gemma-wrapper computes the unique HASH value over the command
101
124
  line switches passed into GEMMA as well as the contents of the files
@@ -107,10 +130,12 @@ You can also get JSON output on STDOUT by providing the --json switch
107
130
  gemma-wrapper --json -- \
108
131
  -g test/data/input/BXD_geno.txt.gz \
109
132
  -p test/data/input/BXD_pheno.txt \
133
+ -a test/data/input/BXD_snps.txt \
110
134
  -gk \
111
- -debug
135
+ -debug > K.json
112
136
 
113
- prints out something that can be parsed with a calling program
137
+ K.json is something that can be parsed with a calling program, and is
138
+ also below as input for the GWA step. Example:
114
139
 
115
140
  ```json
116
141
  {"warnings":[],"errno":0,"debug":[],"type":"K","files":[["/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.log.txt","/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.cXX.txt"]],"cache_hit":true,"gemma_command":"../gemma/bin/gemma -g test/data/input/BXD_geno.txt.gz -p test/data/input/BXD_pheno.txt -gk -debug -outdir /tmp -o 18ce786ab92064a7ee38a7422e7838abf91f5eb0"}
@@ -123,6 +148,7 @@ default. If you want something else provide a --cache-dir, e.g.
123
148
  gemma-wrapper --cache-dir ~/.gemma-cache -- \
124
149
  -g test/data/input/BXD_geno.txt.gz \
125
150
  -p test/data/input/BXD_pheno.txt \
151
+ -a test/data/input/BXD_snps.txt \
126
152
  -gk \
127
153
  -debug
128
154
 
@@ -130,10 +156,10 @@ will store K in ~/.gemma-cache.
130
156
 
131
157
  ### GWA
132
158
 
133
- Run the LMM using the K's captured in K.json using the --input
159
+ Run the LMM using the K's captured earlier in K.json using the --input
134
160
  switch
135
161
 
136
- gemma-wrapper --json --loco --input K.json -- \
162
+ gemma-wrapper --json --input K.json -- \
137
163
  -g test/data/input/BXD_geno.txt.gz \
138
164
  -p test/data/input/BXD_pheno.txt \
139
165
  -c test/data/input/BXD_covariates2.txt \
@@ -153,7 +179,7 @@ https://github.com/genetics-statistics/GEMMA/issues/46). To loop all
153
179
  chromosomes first create all K's with
154
180
 
155
181
  gemma-wrapper --json \
156
- --loco 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,X -- \
182
+ --loco -- \
157
183
  -g test/data/input/BXD_geno.txt.gz \
158
184
  -p test/data/input/BXD_pheno.txt \
159
185
  -a test/data/input/BXD_snps.txt \
@@ -201,12 +227,24 @@ Next, using K.json, permute the phenotypes with something like
201
227
  -lmm 2 -maf 0.1 \
202
228
  -debug > GWA.json
203
229
 
204
- This should get the 95% significant and 67% suggestive thresholds:
230
+ This should get the estimated 95% (significant) and 67% (suggestive) thresholds:
231
+
232
+ ["95 percentile (significant) ", 1.92081e-05, 4.7]
233
+ ["67 percentile (suggestive) ", 5.227785e-05, 4.3]
234
+
235
+ ### Slurm PBS
205
236
 
206
- ["95 percentile (significant) ", 2.015475e-05, 4.7]
207
- ["67 percentile (suggestive) ", 2.015475e-05, 4.7]
237
+ To run gemma-wrapper on HPC use the '--slurm' switch.
208
238
 
239
+ ## Development
240
+
241
+ We use GNU Guix for development and deployment. Use the [.guix-deploy](.guix-deploy) script in the checked out git repo:
242
+
243
+ ```
244
+ source .guix-deploy
245
+ ruby bin/gemma-wrapper --help
246
+ ```
209
247
 
210
248
  ## Copyright
211
249
 
212
- Copyright (c) 2017,2018 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
250
+ Copyright (c) 2017-2021 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
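The threshold lines in the README hunk above pair a p-value with a rounded score. The following is a minimal sketch of how such a pair could be derived from the minimum p-value of each permuted scan, reusing the index arithmetic of the `significant = ls[(ls.size - ls.size*0.95).floor]` line in bin/gemma-wrapper. The function name `permutation_threshold` is hypothetical, and reading the third column as -log10(p) rounded to one decimal is an assumption, although it agrees with the values printed in the README (1.92081e-05 gives 4.7, 5.227785e-05 gives 4.3).

```ruby
# Hypothetical helper: pick a threshold from per-permutation minimum p-values.
# percentile is a fraction such as 0.95 (significant) or 0.67 (suggestive).
def permutation_threshold(min_pvalues, percentile)
  sorted = min_pvalues.sort
  # Same index arithmetic as the wrapper: smaller p-values sort first,
  # so the 95% threshold sits 5% of the way into the sorted list.
  idx = (sorted.size - sorted.size * percentile).floor
  p_value = sorted[idx]
  # Assumption: the reported score is -log10(p), rounded to one decimal.
  [p_value, (-Math.log10(p_value)).round(1)]
end
```

With only a handful of permutations the chosen index is coarse, which is why the README describes these values as estimated thresholds.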
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.98.1
1
+ 0.99.4
data/bin/gemma-wrapper CHANGED
@@ -4,7 +4,7 @@
4
4
  # Author:: Pjotr Prins
5
5
  # License:: GPL3
6
6
  #
7
- # Copyright (C) 2017,2018 Pjotr Prins <pjotr.prins@thebird.nl>
7
+ # Copyright (C) 2017-2021 Pjotr Prins <pjotr.prins@thebird.nl>
8
8
 
9
9
  USAGE = "
10
10
  GEMMA wrapper example:
@@ -14,12 +14,12 @@ GEMMA wrapper example:
14
14
  gemma-wrapper -- \\
15
15
  -g test/data/input/BXD_geno.txt.gz \\
16
16
  -p test/data/input/BXD_pheno.txt \\
17
+ -a test/data/input/BXD_snps.txt \
17
18
  -gk
18
19
 
19
20
  LOCO K computation with caching and JSON output
20
21
 
21
- gemma-wrapper --json \\
22
- --loco 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,X -- \\
22
+ gemma-wrapper --json --loco -- \\
23
23
  -g test/data/input/BXD_geno.txt.gz \\
24
24
  -p test/data/input/BXD_pheno.txt \\
25
25
  -a test/data/input/BXD_snps.txt \\
@@ -38,11 +38,10 @@ GEMMA wrapper example:
38
38
  Gemma gets used from the path. You can override by setting
39
39
 
40
40
  env GEMMA_COMMAND=path/bin/gemma gemma-wrapper ...
41
-
42
41
  "
43
42
  # These are used for testing compatibility with the gemma tool
44
43
  GEMMA_V_MAJOR = 98
45
- GEMMA_V_MINOR = 0
44
+ GEMMA_V_MINOR = 4
46
45
 
47
46
  basepath = File.dirname(File.dirname(__FILE__))
48
47
  $: << File.join(basepath,'lib')
@@ -66,17 +65,21 @@ if not gemma_command
66
65
  end
67
66
 
68
67
 
68
+ require 'digest/sha1'
69
69
  require 'fileutils'
70
70
  require 'optparse'
71
- require 'tmpdir'
72
71
  require 'tempfile'
72
+ require 'tmpdir'
73
+
74
+ require 'lock'
73
75
 
74
76
  split_at = ARGV.index('--')
77
+
75
78
  if split_at
76
79
  gemma_args = ARGV[split_at+1..-1]
77
80
  end
78
81
 
79
- options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir() }
82
+ options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir(), quiet: false, permute_phenotypes: false, parallel: nil }
80
83
 
81
84
  opts = OptionParser.new do |o|
82
85
  o.banner = "\nUsage: #{File.basename($0)} [options] -- [gemma-options]"
@@ -91,8 +94,12 @@ opts = OptionParser.new do |o|
91
94
  raise "Phenotype input file #{phenotypes} does not exist" if !File.exist?(phenotypes)
92
95
  end
93
96
 
94
- o.on('--loco [x,y,1,2,3...]', Array, 'Run full LOCO') do |lst|
95
- options[:loco] = lst
97
+ o.on('--loco', 'Run full leave-one-chromosome-out (LOCO)') do |b|
98
+ options[:loco] = b
99
+ end
100
+
101
+ o.on('--chromosomes [1,2,3]',Array,'Run specific chromosomes') do |lst|
102
+ options[:chromosomes] = lst
96
103
  end
97
104
 
98
105
  o.on('--input filen',String, 'JSON input variables (used for LOCO)') do |filen|
@@ -112,6 +119,22 @@ opts = OptionParser.new do |o|
112
119
  options[:force] = true
113
120
  end
114
121
 
122
+ o.on("--parallel", "Run jobs in parallel") do |b|
123
+ options[:parallel] = true
124
+ end
125
+
126
+ o.on("--no-parallel", "Do not run jobs in parallel") do |b|
127
+ options[:parallel] = false
128
+ end
129
+
130
+ o.on("--slurm[=opts]",String,"Use slurm PBS for submitting jobs") do |slurm|
131
+ options[:slurm_opts] = ""
132
+ options[:slurm] = true
133
+ if slurm
134
+ options[:slurm_opts] = slurm
135
+ end
136
+ end
137
+
115
138
  o.on("--q", "--quiet", "Run quietly") do |q|
116
139
  options[:quiet] = true
117
140
  end
@@ -120,15 +143,20 @@ opts = OptionParser.new do |o|
120
143
  options[:verbose] = true
121
144
  end
122
145
 
123
- o.on("--debug", "Show debug messages and keep intermediate output") do |v|
146
+ o.on("-d", "--debug", "Show debug messages and keep intermediate output") do |v|
124
147
  options[:debug] = true
125
148
  end
126
149
 
150
+ o.on("--dry-run", "Show commands, but don't execute") do |b|
151
+ options[:dry_run] = b
152
+ end
153
+
127
154
  o.on('--','Anything after gets passed to GEMMA') do
128
155
  o.terminate()
129
156
  end
130
157
 
131
158
  o.separator ""
159
+
132
160
  o.on_tail('-h', '--help', 'display this help and exit') do
133
161
  options[:show_help] = true
134
162
  end
@@ -168,26 +196,46 @@ warning = lambda do |*msg|
168
196
  record[:warnings].push *msg.join("")
169
197
  OUTPUT.print "WARNING: ",*msg,"\n"
170
198
  end
199
+
171
200
  info = lambda do |*msg|
172
201
  record[:debug].push *msg.join("") if options[:json] and options[:debug]
173
202
  OUTPUT.print *msg,"\n" if !options[:quiet]
174
203
  end
175
204
 
205
+ # Fetch chromosomes
206
+ def get_chromosomes annofn
207
+ h = {}
208
+ File.open(annofn,"r").each_line do | line |
209
+ chr = line.split(/\s+/)[2]
210
+ h[chr] = true
211
+ end
212
+ h.map { |k,v| k }
213
+ end
176
214
  # ---- Start banner
177
215
 
178
216
  GEMMA_K_VERSION=version
179
- GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017,2018\n"
217
+ GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017-2021\n"
180
218
  info.call GEMMA_K_BANNER
181
219
 
182
220
  # Check gemma version
183
- GEMMA_COMMAND=options[:gemma_command]
221
+ begin
222
+ gemma_command2 = options[:gemma_command]
223
+ info.call "NOTE: gemma-wrapper is soon to be replaced by gemma2/lib"
224
+
225
+ GEMMA_INFO = `#{gemma_command2}`
226
+ rescue Errno::ENOENT
227
+ gemma_command2 = "gemma"
228
+ error.call "<#{gemma_command2}> command not found"
229
+ end
184
230
 
185
- gemma_version_header = `#{GEMMA_COMMAND}`.split("\n").grep(/GEMMA|Version/)[0].strip
231
+ gemma_version_header = GEMMA_INFO.split("\n").grep(/GEMMA|Version/)[0].strip
186
232
  info.call "Using ",gemma_version_header,"\n"
187
233
  gemma_version = gemma_version_header.split(/[,\s]+/)[1]
188
234
  v_version, v_major, v_minor = gemma_version.split(".")
189
235
  info.call "Found #{gemma_version}, comparing against expected v0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}"
190
236
 
237
+ info.call gemma_version_header
238
+
191
239
  warning.call "GEMMA version is out of date. Update GEMMA to 0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}!" if v_major.to_i < GEMMA_V_MAJOR or (v_major.to_i == GEMMA_V_MAJOR and (v_minor != nil and v_minor.to_i < GEMMA_V_MINOR))
192
240
 
193
241
  options[:gemma_version_header] = gemma_version_header
@@ -203,74 +251,160 @@ if RUBY_VERSION =~ /^1/
203
251
  warning "runs on Ruby 2.x only\n"
204
252
  end
205
253
 
254
+ # ---- LOCO defaults to parallel
255
+ if options[:parallel] == nil
256
+ options[:parallel] = true if options[:loco]
257
+ end
258
+
259
+ debug.call(options) # some debug output
260
+ debug.call(record)
261
+
206
262
  DO_COMPUTE_KINSHIP = gemma_args.include?("-gk")
207
263
  DO_COMPUTE_GWA = !DO_COMPUTE_KINSHIP
208
264
 
265
+ if options[:parallel]
266
+ begin
267
+ skip_cite = `echo "will cite" |parallel --citation`
268
+ debug.call(skip_cite)
269
+ PARALLEL_INFO = `parallel --help`
270
+ rescue Errno::ENOENT
271
+ error.call "<parallel> command not found"
272
+ end
273
+ parallel_cmds = []
274
+ end
275
+
276
+ # ---- Fetch chromosomes from SNP annotation file
277
+ anno_idx = gemma_args.index '-a'
278
+ raise "Expected GEMMA -a genotype file switch" if anno_idx == nil
279
+ CHROMOSOMES = get_chromosomes(gemma_args[anno_idx+1])
280
+
209
281
  # ---- Compute HASH on inputs
210
282
  hashme = []
211
283
  geno_idx = gemma_args.index '-g'
212
284
  raise "Expected GEMMA -g genotype file switch" if geno_idx == nil
213
285
  pheno_idx = gemma_args.index '-p'
214
- hashme =
215
- if DO_COMPUTE_KINSHIP and pheno_idx != nil
216
- # Remove the phenotype file from the hash
217
- gemma_args[0..pheno_idx-1] + gemma_args[pheno_idx+2..-1]
218
- else
219
- gemma_args
220
- end
221
286
 
222
287
  if DO_COMPUTE_GWA and options[:permute_phenotypes]
223
288
  raise "Did not expect GEMMA -p phenotype whith permutations (only use --permutate-phenotypes)" if pheno_idx
224
- hashme += ['-p', options[:permute_phenotypes]]
225
289
  end
226
290
 
227
- require 'digest/sha1'
228
- debug.call "Hashing on ",hashme,"\n"
229
- hashes = []
230
- hashme.each do | item |
231
- if File.exist?(item)
232
- hashes << Digest::SHA1.hexdigest(File.read(item))
233
- debug.call [item,hashes.last]
291
+ execute = lambda { |cmd|
292
+ info.call("Executing: #{cmd}")
293
+ err = 0
294
+ if not options[:debug]
295
+ # send output to stderr line by line
296
+ IO.popen("#{cmd}") do |io|
297
+ while s = io.gets
298
+ $stderr.print s
299
+ end
300
+ io.close
301
+ err = $?.to_i
302
+ end
234
303
  else
235
- hashes << item
304
+ $stderr.print `#{cmd}`
305
+ err = $?.to_i
306
+ end
307
+ err
308
+ }
309
+
310
+ compute_hash = lambda do | phenofn = nil |
311
+ # Compute a HASH on the inputs
312
+ debug.call "Hashing on ",hashme,"\n"
313
+ hashes = []
314
+ hm = if phenofn
315
+ hashme + ["-p", phenofn]
316
+ else
317
+ hashme
318
+ end
319
+ debug.call(hm)
320
+ hm.each do | item |
321
+ if File.file?(item)
322
+ hashes << Digest::SHA1.hexdigest(File.read(item))
323
+ debug.call [item,hashes.last]
324
+ else
325
+ hashes << item
326
+ end
236
327
  end
328
+ debug.call(hashes)
329
+ Digest::SHA1.hexdigest hashes.join(' ')
237
330
  end
238
- HASH = Digest::SHA1.hexdigest hashes.join(' ')
239
331
 
332
+ HASH = compute_hash.call()
240
333
  options[:hash] = HASH
241
334
 
335
+ at_exit do
336
+ Lock.release(HASH)
337
+ end
338
+
339
+ Lock.create(HASH) # this will wait for a lock to expire
340
+
341
+ joblog = options[:cache_dir]+"/"+HASH+"-parallel.log"
342
+
242
343
  # Create cache dir
243
344
  FileUtils::mkdir_p options[:cache_dir]
244
345
 
346
+ Dir.mktmpdir do |tmpdir| # tmpdir for GEMMA output
347
+
245
348
  error.call "Do not use the GEMMA -o switch!" if gemma_args.include? '-o'
246
349
  error.call "Do not use the GEMMA -outdir switch!" if gemma_args.include? '-outdir'
350
+ GEMMA_ARGS_HASH = gemma_args.dup # do not include outdir
247
351
  gemma_args << '-outdir'
248
- gemma_args << options[:cache_dir]
352
+ gemma_args << tmpdir
249
353
  GEMMA_ARGS = gemma_args
250
354
 
355
+ hashme =
356
+ if DO_COMPUTE_KINSHIP and pheno_idx != nil
357
+ # Remove the phenotype file from the hash for GRM computation
358
+ GEMMA_ARGS_HASH[0..pheno_idx-1] + GEMMA_ARGS_HASH[pheno_idx+2..-1]
359
+ else
360
+ GEMMA_ARGS_HASH
361
+ end
362
+
251
363
  debug.call "Options: ",options,"\n" if !options[:quiet]
252
364
 
253
- invoke_gemma = lambda do |extra_args, cache_hit = false|
254
- cmd="#{GEMMA_COMMAND} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
365
+ invoke_gemma = lambda do |extra_args, cache_hit = false, chr = "full", permutation = 1|
366
+ cmd = "#{gemma_command2} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
255
367
  record[:gemma_command] = cmd
256
368
  return if cache_hit
257
- debug.call cmd
369
+ if options[:slurm]
370
+ info.call cmd
371
+ hashi = HASH
372
+ prefix = tmpdir+'/'+hashi
373
+ scriptfn = prefix+".#{chr}.#{permutation}-pbs.sh"
374
+ script = "#!/bin/bash
375
+ #SBATCH --job-name=gemma-#{scriptfn}
376
+ #SBATCH --ntasks=1
377
+ #SBATCH --time=20:00
378
+ srun #{cmd}
379
+ "
380
+ debug.call(script)
381
+ File.open(scriptfn,"w") { |f|
382
+ f.write(script)
383
+ }
384
+ cmd = "sbatch "+options[:slurm_opts] + scriptfn
385
+ end
258
386
  errno =
259
387
  if options[:json]
260
388
  # capture output
261
389
  err = 0
262
- IO.popen(cmd) do |io|
263
- while s = io.gets
264
- $stderr.print s
265
- end
266
- io.close
267
- err = $?.to_i
390
+ if options[:dry_run]
391
+ info.call("Would have invoked: ",cmd)
392
+ elsif options[:parallel]
393
+ info.call("Add parallel job: ",cmd)
394
+ parallel_cmds << cmd
395
+ else
396
+ err = execute.call(cmd)
268
397
  end
269
398
  err
270
399
  else
271
- debug.call("Invoking ",cmd) if options[:debug]
272
- system(cmd)
273
- $?.exitstatus
400
+ if options[:dry_run]
401
+ info.call("Would have invoked ",cmd)
402
+ 0
403
+ else
404
+ debug.call("Invoking ",cmd) if options[:debug]
405
+ system(cmd)
406
+ $?.exitstatus
407
+ end
274
408
  end
275
409
  if errno != 0
276
410
  debug.call "Gemma exit ",errno
@@ -280,11 +414,14 @@ invoke_gemma = lambda do |extra_args, cache_hit = false|
280
414
  end
281
415
  end
282
416
 
417
+ # Takes the hash value and checks whether the (output) file exists
283
418
  # returns datafn, logfn, cache_hit
284
- cache = lambda do | chr, ext |
419
+ cache = lambda do | chr, ext, h=HASH, permutation=0 |
285
420
  inject = (chr==nil ? "" : ".#{chr}" )+ext
286
- hashi = (chr==nil ? HASH : HASH+inject)
287
- prefix = options[:cache_dir]+'/'+hashi
421
+ hashi = (chr==nil ? h : h+inject)
422
+ prefix = options[:cache_dir]+'/'+hashi+(permutation!=0 ? "."+permutation.to_s : "")
423
+ # for chr 3 and permutation 1 forms something like
424
+ # /tmp/1b700-a996f.3.cXX.txt.1.log.txt
288
425
  logfn = prefix+".log.txt"
289
426
  datafn = prefix+ext
290
427
  record[:files] ||= []
@@ -320,25 +457,32 @@ kinship = lambda do | chr = nil |
320
457
  end
321
458
 
322
459
  # ---- Run GWA
323
- gwas = lambda do | chr, kfn, pfn |
460
+ gwas = lambda do | chr, kfn, pfn, permutation=0 |
324
461
  record[:type] = "GWA"
325
462
  error.call "Do not use the GEMMA -k switch with gemma-wrapper - it is automatic!" if GEMMA_ARGS.include? '-k' # K is automatic
326
- hashi, cache_hit = cache.call chr,".assoc.txt"
463
+ # Update hash for each permutation
464
+ hash = compute_hash.call(pfn)
465
+ hashi, cache_hit = cache.call(chr,".assoc.txt",hash,permutation)
327
466
  if not cache_hit
328
467
  args = [ '-k', kfn, '-o', hashi ]
329
468
  args << [ '-loco', chr ] if chr != nil
330
469
  args << [ '-p', pfn ] if pfn
331
- invoke_gemma.call args
470
+ invoke_gemma.call args,false,chr,permutation
332
471
  end
333
472
  end
334
473
 
335
474
  LOCO = options[:loco]
336
- # if GEMMA_ARGS.include? '-gk'
475
+ if LOCO
476
+ if options[:chromosomes]
477
+ CHROMOSOMES = options[:chromosomes]
478
+ end
479
+ end
480
+
337
481
  if DO_COMPUTE_KINSHIP
338
482
  # compute K
339
- info.call LOCO
340
- if LOCO != nil
341
- LOCO.each do |chr|
483
+ info.call CHROMOSOMES
484
+ if LOCO
485
+ CHROMOSOMES.each do |chr|
342
486
  info.call "LOCO for ",chr
343
487
  kinship.call(chr)
344
488
  end
@@ -347,13 +491,24 @@ if DO_COMPUTE_KINSHIP
347
491
  end
348
492
  else
349
493
  # DO_COMPUTE_GWA
350
- json_in = JSON.parse(File.read(options[:input]))
494
+ begin
495
+ json_in = JSON.parse(File.read(options[:input]))
496
+ rescue TypeError
497
+ raise "Missing JSON input file?"
498
+ end
351
499
  raise "JSON problem, file #{options[:input]} is not -gk derived" if json_in["type"] != "K"
352
500
 
353
501
  pfn = options[:permute_phenotypes] # can be nil
354
- k_files = json_in["files"].map { |rec| [rec[0],rec[2]] }
355
- k_files.each do | chr, kfn | # call a GWA for each chromosome
356
- gwas.call(chr,kfn,pfn)
502
+ if LOCO
503
+ k_files = json_in["files"].map { |rec| [rec[0],rec[2]] }
504
+ k_files.each do | chr, kfn | # call a GWA for each chromosome
505
+ gwas.call(chr,kfn,pfn)
506
+ end
507
+ else
508
+ kfn = json_in["files"][0][2]
509
+ CHROMOSOMES.each do | chr |
510
+ gwas.call(chr,kfn,pfn)
511
+ end
357
512
  end
358
513
  # Permute
359
514
  if options[:permutate]
@@ -364,10 +519,10 @@ else
364
519
  end
365
520
  score_list = []
366
521
  debug.call(options[:permutate],"x permutations")
367
- (1..options[:permutate]).each do |i|
368
- $stderr.print "Iteration ",i,"\n"
522
+ (1..options[:permutate]).each do |permutation|
523
+ $stderr.print "Iteration ",permutation,"\n"
369
524
  # Create a shuffled phenotype file
370
- file = File.open("phenotypes-#{i}","w")
525
+ file = File.open("phenotypes-#{permutation}","w")
371
526
  tmp_pfn = file.path
372
527
  p tmp_pfn
373
528
  ps.shuffle.each do | l |
@@ -375,20 +530,23 @@ else
375
530
  end
376
531
  file.close
377
532
  k_files.each do | chr, kfn | # call a GWA for each chromosome
378
- gwas.call(chr,kfn,tmp_pfn)
533
+ gwas.call(chr,kfn,tmp_pfn,permutation)
379
534
  end
380
- # p [:HEY,record[:files].last]
381
- assocfn = record[:files].last[2]
382
- debug.call("Reading ",assocfn)
383
535
  score_min = 1000.0
384
- File.foreach(assocfn).with_index do |assoc, assoc_line_num|
385
- if assoc_line_num > 0
386
- value = assoc.strip.split(/\t/).last.to_f
387
- score_min = value if value < score_min
536
+ if false and not options[:slurm]
537
+ # p [:HEY,record[:files].last]
538
+ assocfn = record[:files].last[2]
539
+ debug.call("Reading ",assocfn)
540
+ File.foreach(assocfn).with_index do |assoc, assoc_line_num|
541
+ if assoc_line_num > 0
542
+ value = assoc.strip.split(/\t/).last.to_f
543
+ score_min = value if value < score_min
544
+ end
388
545
  end
389
546
  end
390
547
  score_list << score_min
391
548
  end
549
+ exit 0 if options[:slurm]
392
550
  ls = score_list.sort
393
551
  p ls
394
552
  significant = ls[(ls.size - ls.size*0.95).floor]
@@ -399,5 +557,40 @@ else
399
557
  end
400
558
  end
401
559
 
560
+ # ---- Invoke parallel
561
+ if options[:parallel]
562
+ # parallel_cmds = ["echo 1","sleep 1 && echo 2", "false", "echo 3"]
563
+ cmd = parallel_cmds.join("\\n")
564
+
565
+ cmd = "echo -e \"#{cmd}\""
566
+ err = execute.call(cmd+"|parallel --joblog #{joblog}") # first try optimistically to run all jobs in parallel
567
+ if err != 0
568
+ [16,8,4,1].each do |jobs|
569
+ info.call("Failed to complete parallel run -- retrying with smaller RAM footprint!")
570
+ err = execute.call(cmd+"|parallel -j #{jobs} --resume --joblog #{joblog}")
571
+ break if err == 0
572
+ end
573
+ if err != 0
574
+ info.call("Run failed!")
575
+ # Remove remaining files
576
+ FileUtils.rm_rf("#{tmpdir}/*", secure: true)
577
+ exit err
578
+ end
579
+ end
580
+ info.call("Run successful!")
581
+ end
402
582
  json_out.call
403
- exit 0
583
+
584
+ # copy all output files to the cache_dir. If a file exists only emit a warning
585
+ Dir.glob("*.txt", base: tmpdir) do | fn |
586
+ source = tmpdir + "/" + fn
587
+ dest = options[:cache_dir] + "/" + fn
588
+ if not File.exist?(dest) or options[:force]
589
+ info.call "Move #{source} to #{dest}"
590
+ FileUtils.mv source, dest, verbose: false
591
+ else
592
+ warning.call "File #{dest} already exists. Not overwriting"
593
+ end
594
+ end
595
+
596
+ end # tmpdir
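The `compute_hash` lambda in this file keys the cache on the GEMMA switches plus the contents of any files named among them. Below is a minimal standalone sketch of that idea. The method name `cache_key` is hypothetical, and the wrapper's special cases (dropping `-p` for kinship runs, appending the permuted phenotype file) are left out.

```ruby
require 'digest/sha1'

# Hypothetical helper: build a cache key from a GEMMA argument list.
# Arguments that name an existing regular file are replaced by the SHA1
# of the file contents; all other arguments are used verbatim.
def cache_key(args)
  parts = args.map do |item|
    File.file?(item) ? Digest::SHA1.hexdigest(File.read(item)) : item
  end
  # The final key is the SHA1 of the space-joined parts.
  Digest::SHA1.hexdigest(parts.join(' '))
end
```

Because file contents rather than file names are digested, copying an input file to a new path leaves the key unchanged, while editing the file produces a new key and therefore a cache miss.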
data/gemma-wrapper.gemspec CHANGED
@@ -2,10 +2,11 @@ Gem::Specification.new do |s|
2
2
  s.name = 'bio-gemma-wrapper'
3
3
  s.version = File.read('VERSION')
4
4
  s.summary = "GEMMA with LOCO and permutations"
5
- s.description = "GEMMA wrapper adds LOCO and permutation support. Also caches K between runs with LOCO support"
5
+ s.description = "GEMMA wrapper adds LOCO and permutation support. Also runs in parallel and caches K between runs with LOCO support"
6
6
  s.authors = ["Pjotr Prins"]
7
7
  s.email = 'pjotr.public01@thebird.nl'
8
8
  s.files = ["bin/gemma-wrapper",
9
+ "lib/lock.rb",
9
10
  "Gemfile",
10
11
  "LICENSE.txt",
11
12
  "README.md",
data/lib/lock.rb ADDED
@@ -0,0 +1,95 @@
1
+ # Locking module for gemma (wrapper)
2
+ #
3
+
4
+ =begin
5
+
6
+ The logic is as follows:
7
+
8
+ 1. a program creates a named lock file (based on a hash of its inputs) with its PID
9
+ 2. on exit it destroys the file
10
+ 3. a new program checks for the lock file
11
+ 4. if it exists and the PID is still in the ps table - wait
12
+ 5. when the pid disappears or the lock file - continue
13
+ 6. a timeout will return an error in 3 minutes
14
+
15
+ Note that there is a theoretical chance the lock file existed, but disappeared. I think I have it covered by ignoring the unlink errors. Also the use of /proc/PID is Linux specific.
16
+
17
+ =end
18
+
19
+
20
+ require 'timeout'
21
+
22
+ module Lock
23
+
24
+ def self.local name
25
+ ENV['HOME']+"/."+name.gsub("/","-")+".lck"
26
+ end
27
+
28
+ def self.lock_pid name
29
+ lockfn = local(name)
30
+ if File.exist?(lockfn)
31
+ File.read(lockfn).to_i
32
+ else
33
+ 0
34
+ end
35
+ end
36
+
37
+ def self.locked? name
38
+ lockfn = local(name)
39
+ pid = lock_pid(name)
40
+ if File.exist?("/proc/#{pid}")
41
+ true
42
+ else
43
+ # the program went away - remove any 'stale' lock
44
+ begin
45
+ File.unlink(lockfn)
46
+ rescue Errno::ENOENT
47
+ # ignore error when the lock file went missing
48
+ end
49
+ false # --> no longer locked
50
+ end
51
+ end
52
+
53
+ def Lock::create name
54
+ wait_for(name)
55
+ lockfn = local(name)
56
+ if File.exist?(lockfn)
57
+ $stderr.print "\nERROR: Can not steal #{lockfn}"
58
+ exit 1
59
+ end
60
+ File.open(lockfn, File::RDWR|File::CREAT, 0644) do |f|
61
+ f.flock(File::LOCK_EX)
62
+ f.print(Process.pid)
63
+ end
64
+ end
65
+
66
+ def Lock::wait_for name
67
+ lockfn = local(name)
68
+ begin
69
+ status = Timeout::timeout(180) { # 3 minutes
70
+ while locked?(name)
71
+ $stderr.print("\nWaiting for lock #{lockfn}...")
72
+ sleep 2
73
+ end
74
+ }
75
+ rescue Timeout::Error
76
+ $stderr.print "\nERROR: Timed out, but I can not steal #{lockfn}"
77
+ exit 1
78
+ end
79
+ # yah! lock is released
80
+ end
81
+
82
+ def Lock::release name
83
+ lockfn = local(name)
84
+ if Process.pid == lock_pid(name)
85
+ begin
86
+ File.unlink(lockfn) # PID expired
87
+ rescue Errno::ENOENT
88
+ # ignore error when the lock file went missing
89
+ end
90
+ else
91
+ $stderr.print "\nERROR: can not release #{lockfn} because it is not owned by me"
92
+ end
93
+ end
94
+
95
+ end
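The locking logic described in the comment block above relies on `/proc/PID`, which the author notes is Linux specific, and it checks for the lock file before opening it. As a sketch of two possible variations, not what `lib/lock.rb` does: `Process.kill(0, pid)` can probe for a live process on any POSIX system, and `File::EXCL` makes lock creation fail if the file already exists. All method names here are hypothetical.

```ruby
# Hypothetical helper: true if a process with this PID currently exists.
def process_alive?(pid)
  return false if pid <= 0
  Process.kill(0, pid) # signal 0 only checks existence, nothing is delivered
  true
rescue Errno::ESRCH
  false                # no such process
rescue Errno::EPERM
  true                 # process exists but belongs to another user
end

# Hypothetical helper: a lock file is stale when its recorded PID is gone.
def stale_lock?(lockfn)
  return false unless File.exist?(lockfn)
  !process_alive?(File.read(lockfn).to_i)
end

# Hypothetical helper: create the lock file only if it does not exist yet.
# Returns true when this process now owns the lock, false otherwise.
def try_lock(lockfn)
  File.open(lockfn, File::WRONLY | File::CREAT | File::EXCL, 0644) do |f|
    f.print(Process.pid)
  end
  true
rescue Errno::EEXIST
  false
end
```

A caller could loop on `try_lock`, removing the file when `stale_lock?` reports true, and give up after a timeout in the same way `Lock.wait_for` does.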
metadata CHANGED
@@ -1,17 +1,17 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: bio-gemma-wrapper
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.98.1
4
+ version: 0.99.4
5
5
  platform: ruby
6
6
  authors:
7
7
  - Pjotr Prins
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2018-11-20 00:00:00.000000000 Z
11
+ date: 2021-11-25 00:00:00.000000000 Z
12
12
  dependencies: []
13
- description: GEMMA wrapper adds LOCO and permutation support. Also caches K between
14
- runs with LOCO support
13
+ description: GEMMA wrapper adds LOCO and permutation support. Also runs in parallel
14
+ and caches K between runs with LOCO support
15
15
  email: pjotr.public01@thebird.nl
16
16
  executables:
17
17
  - gemma-wrapper
@@ -24,6 +24,7 @@ files:
24
24
  - VERSION
25
25
  - bin/gemma-wrapper
26
26
  - gemma-wrapper.gemspec
27
+ - lib/lock.rb
27
28
  homepage: https://github.com/genetics-statistics/gemma-wrapper
28
29
  licenses:
29
30
  - GPL3
@@ -43,8 +44,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
43
44
  - !ruby/object:Gem::Version
44
45
  version: '0'
45
46
  requirements: []
46
- rubyforge_project:
47
- rubygems_version: 2.6.8
47
+ rubygems_version: 3.1.4
48
48
  signing_key:
49
49
  specification_version: 4
50
50
  summary: GEMMA with LOCO and permutations