bio-gemma-wrapper 0.97.1 → 0.99.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
- SHA1:
3
- metadata.gz: 740af7561d0fb801b810713cd623f3d25e78971a
4
- data.tar.gz: ce3aaae3418073dc1365bc8755ac73745ef4b01a
2
+ SHA256:
3
+ metadata.gz: e27a8a3abb00b758095df5956b3854674faf5ff681a93bc028df273c40125c0d
4
+ data.tar.gz: e9675dbb0ea0f087dd21774635d38f3cda11b46a88b36c77dd308086fd0ec5f2
5
5
  SHA512:
6
- metadata.gz: 30043473cf8b09ecf8a6fdcfa97468ba5d169bc7a0b6b6ab2102f7f978b8cb2c63fb5c5cefd7267cfe526e4748b6481aa9b6dbe7309dceff62f0351f79b16b0e
7
- data.tar.gz: 07e50ab7b2b2d87bed1ce21b5b2d123693898163ff9a175841db190fcc22b43ee37ecc1df7cbc697e89b3dcaa193c489c299d3c50c1aa557d9512c4d494adec6
6
+ metadata.gz: 81cf5440fa531d5a831efa787800c8bea230d47cddc666a31fff066551ff347708a41ddf1368c0d3946c7ba9faef8e5882e398ad340850253c53961cce96f662
7
+ data.tar.gz: 582ae78c48a1eb8eeca01172eaeaba9d5ca23e69601967e334f8c218e3a4dd74b297861b01ce49b1357798b49a96c12e737100dcacec7fc34b70da1fc9c75f0d
data/README.md CHANGED
@@ -1,10 +1,19 @@
1
- # GEMMA wrapper caches K between runs with LOCO support
1
+ [![gemma-wrapper gem version](https://badge.fury.io/rb/bio-gemma-wrapper.svg)](https://badge.fury.io/rb/bio-gemma-wrapper)
2
+
3
+ # GEMMA with LOCO, permutations and slurm support (and caching)
2
4
 
3
5
  ![Genetic associations identified in CFW mice using GEMMA (Parker et al,
4
6
  Nat. Genet., 2016)](cfw.gif)
5
7
 
6
8
  ## Introduction
7
9
 
10
+ Gemma-wrapper allows running GEMMA with LOCO, GEMMA with caching,
11
+ GEMMA in parallel (now the default), and GEMMA on PBS. Gemma-wrapper
12
+ is used to run GEMMA as part of the https://genenetwork.org/
13
+ environment.
14
+
15
+ Note that gemma-wrapper is projected to be integrated into gemma2/lib.
16
+
8
17
  GEMMA is a software toolkit for fast application of linear mixed
9
18
  models (LMMs) and related models to genome-wide association studies
10
19
  (GWAS) and other large-scale data sets.
@@ -12,16 +21,14 @@ models (LMMs) and related models to genome-wide association studies
12
21
  This repository contains gemma-wrapper, essentially a wrapper of
13
22
  GEMMA that provides support for caching the kinship or relatedness
14
23
  matrix (K) and caching LM and LMM computations with the option of full
15
- leave-one-chromosome-out genome scans (LOCO).
24
+ leave-one-chromosome-out genome scans (LOCO). Jobs can also be
25
+ submitted to an HPC cluster through the Slurm scheduler.
16
26
 
17
27
  gemma-wrapper requires a recent version of GEMMA and essentially
18
28
  does a pass-through of all standard GEMMA invocation switches. On
19
29
  return gemma-wrapper can return a JSON object (--json) which is
20
30
  useful for web-services.
21
31
 
22
- Note that this a work in progress (WIP). What is described below
23
- should work.
24
-
25
32
  ## Installation
26
33
 
27
34
  Prerequisites are
@@ -30,8 +37,9 @@ Prerequisites are
30
37
  * Standard [Ruby >2.0 ](https://www.ruby-lang.org/en/) which comes on
31
38
  almost all Linux systems
32
39
 
33
- gemma-wrapper comes as a Ruby [gem](https://rubygems.org/gems/bio-gemma-wrapper) and
34
- can be installed with
40
+ gemma-wrapper comes as a Ruby
41
+ [gem](https://rubygems.org/gems/bio-gemma-wrapper) and can be
42
+ installed with
35
43
 
36
44
  gem install bio-gemma-wrapper
37
45
 
@@ -39,15 +47,18 @@ Invoke the tool with
39
47
 
40
48
  gemma-wrapper --help
41
49
 
42
- and it will render
50
+ and it will render something like
43
51
 
44
52
  ```
45
53
  Usage: gemma-wrapper [options] -- [gemma-options]
54
+ --permutate n Permutate # times by shuffling phenotypes
55
+ --permute-phenotypes filen Phenotypes to be shuffled in permutations
46
56
  --loco [x,y,1,2,3...] Run full LOCO
47
57
  --input filen JSON input variables (used for LOCO)
48
58
  --cache-dir path Use a cache directory
49
59
  --json Create output file in JSON format
50
60
  --force Force computation
61
+ --slurm [options] Submit to slurm PBS
51
62
  --q, --quiet Run quietly
52
63
  -v, --verbose Run verbosely
53
64
  --debug Show debug messages and keep intermediate output
@@ -65,6 +76,8 @@ Unpack it and run the tool as
65
76
 
66
77
  ./bin/gemma-wrapper --help
67
78
 
79
+ See below for using a GNU Guix environment.
80
+
68
81
  ## Usage
69
82
 
70
83
  gemma-wrapper picks up GEMMA from the PATH. To override that behaviour
@@ -91,11 +104,12 @@ the data files are found):
91
104
 
92
105
  Run it twice to see
93
106
 
94
- /tmp/3079151e14b219c3b243b673d88001c1675168b4.log.txt gemma-wrapper CACHE HIT!
107
+ /tmp/0bdd7add5e8f7d9af36b283d0341c115124273e0.log.txt CACHE HIT!
95
108
 
96
109
  gemma-wrapper computes the unique HASH value over the command
97
110
  line switches passed into GEMMA as well as the contents of the files
98
- passed in (here the genotype and phenotype files).
111
+ passed in (here the genotype and phenotype files; for K the phenotype file is
112
+ left out of the hash, because GEMMA always computes the same K).
99
113
 
100
114
  You can also get JSON output on STDOUT by providing the --json switch
101
115
 
@@ -103,9 +117,10 @@ You can also get JSON output on STDOUT by providing the --json switch
103
117
  -g test/data/input/BXD_geno.txt.gz \
104
118
  -p test/data/input/BXD_pheno.txt \
105
119
  -gk \
106
- -debug
120
+ -debug > K.json
107
121
 
108
- prints out something that can be parsed with a calling program
122
+ K.json is something that can be parsed with a calling program, and is
123
+ also used below as input for the GWA step. Example:
109
124
 
110
125
  ```json
111
126
  {"warnings":[],"errno":0,"debug":[],"type":"K","files":[["/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.log.txt","/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.cXX.txt"]],"cache_hit":true,"gemma_command":"../gemma/bin/gemma -g test/data/input/BXD_geno.txt.gz -p test/data/input/BXD_pheno.txt -gk -debug -outdir /tmp -o 18ce786ab92064a7ee38a7422e7838abf91f5eb0"}
@@ -123,6 +138,23 @@ default. If you want something else provide a --cache-dir, e.g.
123
138
 
124
139
  will store K in ~/.gemma-cache.
125
140
 
141
+ ### GWA
142
+
143
+ Run the LMM using the K's captured earlier in K.json using the --input
144
+ switch
145
+
146
+ gemma-wrapper --json --loco --input K.json -- \
147
+ -g test/data/input/BXD_geno.txt.gz \
148
+ -p test/data/input/BXD_pheno.txt \
149
+ -c test/data/input/BXD_covariates2.txt \
150
+ -a test/data/input/BXD_snps.txt \
151
+ -lmm 2 -maf 0.1 \
152
+ -debug > GWA.json
153
+
154
+ Running it twice should show that GWA is not recomputed.
155
+
156
+ /tmp/9e411810ad341de6456ce0c6efd4f973356d0bad.log.txt CACHE HIT!
157
+
126
158
  ### LOCO
127
159
 
128
160
  Recent versions of GEMMA have LOCO support for a single chromosome
@@ -158,6 +190,45 @@ GWA.json contains the file names of every chromosome
158
190
  The -k switch is injected automatically. Again output switches are not
159
191
  allowed (-o, -outdir)
160
192
 
193
+ ### Permutations
194
+
195
+ Permutations can be run with and without LOCO. First create K
196
+
197
+ gemma-wrapper --json -- \
198
+ -g test/data/input/BXD_geno.txt.gz \
199
+ -p test/data/input/BXD_pheno.txt \
200
+ -gk \
201
+ -debug > K.json
202
+
203
+ Next, using K.json, permute the phenotypes with something like
204
+
205
+ gemma-wrapper --json --loco --input K.json \
206
+ --permutate 100 --permute-phenotypes test/data/input/BXD_pheno.txt -- \
207
+ -g test/data/input/BXD_geno.txt.gz \
208
+ -p test/data/input/BXD_pheno.txt \
209
+ -c test/data/input/BXD_covariates2.txt \
210
+ -a test/data/input/BXD_snps.txt \
211
+ -lmm 2 -maf 0.1 \
212
+ -debug > GWA.json
213
+
214
+ This should get the estimated 95% (significant) and 67% (suggestive) thresholds:
215
+
216
+ ["95 percentile (significant) ", 1.92081e-05, 4.7]
217
+ ["67 percentile (suggestive) ", 5.227785e-05, 4.3]
218
+
219
+ ### Slurm PBS
220
+
221
+ To run gemma-wrapper on HPC use the '--slurm' switch.
222
+
223
+ ## Development
224
+
225
+ We use GNU Guix for development and deployment. Use the [.guix-deploy](.guix-deploy) script in the checked out git repo:
226
+
227
+ ```
228
+ source .guix-deploy
229
+ ruby bin/gemma-wrapper --help
230
+ ```
231
+
161
232
  ## Copyright
162
233
 
163
- Copyright (c) 2017 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
234
+ Copyright (c) 2017-2021 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
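The README's description of the cache key (a hash over the GEMMA switches plus the contents of any files passed in, with the phenotype file ignored when computing K) can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the code shipped in the gem: the helper names `cache_key` and `without_phenotype` are hypothetical, and the sketch assumes `-p` is always followed by a file name.

```ruby
require 'digest/sha1'

# Hypothetical helper (not part of gemma-wrapper): build a cache key from a
# list of command-line arguments. An argument that names an existing regular
# file contributes the SHA1 of its contents; any other argument contributes
# its literal text. The parts are joined and hashed once more.
def cache_key(args)
  parts = args.map do |item|
    File.file?(item) ? Digest::SHA1.hexdigest(File.read(item)) : item
  end
  Digest::SHA1.hexdigest(parts.join(' '))
end

# Hypothetical helper: return a copy of the arguments without "-p <file>",
# so that a kinship (-gk) key does not depend on the phenotype file.
def without_phenotype(args)
  idx = args.index('-p')
  return args.dup if idx.nil?
  args[0...idx] + (args[(idx + 2)..-1] || [])
end
```

Because file contents rather than file names feed the key, renaming an input file leaves the key unchanged, while editing the file produces a new key and therefore a cache miss.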
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.97.1
1
+ 0.99.2
data/bin/gemma-wrapper CHANGED
@@ -4,7 +4,7 @@
4
4
  # Author:: Pjotr Prins
5
5
  # License:: GPL3
6
6
  #
7
- # Copyright (C) 2017,2018 Pjotr Prins <pjotr.prins@thebird.nl>
7
+ # Copyright (C) 2017-2021 Pjotr Prins <pjotr.prins@thebird.nl>
8
8
 
9
9
  USAGE = "
10
10
  GEMMA wrapper example:
@@ -35,10 +35,13 @@ GEMMA wrapper example:
35
35
  -lmm 2 -maf 0.1 \\
36
36
  -debug > GWA.json
37
37
 
38
+ Gemma gets used from the path. You can override by setting
39
+
40
+ env GEMMA_COMMAND=path/bin/gemma gemma-wrapper ...
38
41
  "
39
- # These are used for testing compatibility
40
- GEMMA_V_MAJOR = 97
41
- GEMMA_V_MINOR = 0
42
+ # These are used for testing compatibility with the gemma tool
43
+ GEMMA_V_MAJOR = 98
44
+ GEMMA_V_MINOR = 4
42
45
 
43
46
  basepath = File.dirname(File.dirname(__FILE__))
44
47
  $: << File.join(basepath,'lib')
@@ -61,32 +64,34 @@ if not gemma_command
61
64
  end
62
65
  end
63
66
 
67
+
68
+ require 'digest/sha1'
64
69
  require 'fileutils'
65
70
  require 'optparse'
66
- require 'tmpdir'
67
71
  require 'tempfile'
72
+ require 'tmpdir'
68
73
 
69
74
  split_at = ARGV.index('--')
70
75
  if split_at
71
76
  gemma_args = ARGV[split_at+1..-1]
72
77
  end
73
78
 
74
- options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir() }
79
+ options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir(), quiet: false, parallel: true }
75
80
 
76
81
  opts = OptionParser.new do |o|
77
82
  o.banner = "\nUsage: #{File.basename($0)} [options] -- [gemma-options]"
78
83
 
79
- o.on('--permutate n', Integer, 'Permutate by shuffling phenotypes') do |lst|
84
+ o.on('--permutate n', Integer, 'Permutate # times by shuffling phenotypes') do |lst|
80
85
  options[:permutate] = lst
81
86
  options[:force] = true
82
87
  end
83
88
 
84
- o.on('--phenotypes filen',String, 'Phenotypes to be shuffled in permutations') do |phenotypes|
85
- options[:phenotypes] = phenotypes
89
+ o.on('--permute-phenotypes filen',String, 'Phenotypes to be shuffled in permutations') do |phenotypes|
90
+ options[:permute_phenotypes] = phenotypes
86
91
  raise "Phenotype input file #{phenotypes} does not exist" if !File.exist?(phenotypes)
87
92
  end
88
93
 
89
- o.on('--loco [x,y,1,2,3...]', Array, 'Run full LOCO') do |lst|
94
+ o.on('--loco [x,y,1,2,3...]', Array, 'Run full leave-one-chromosome-out (LOCO)') do |lst|
90
95
  options[:loco] = lst
91
96
  end
92
97
 
@@ -107,6 +112,18 @@ opts = OptionParser.new do |o|
107
112
  options[:force] = true
108
113
  end
109
114
 
115
+ o.on("--no-parallel", "Do not run jobs in parallel") do |b|
116
+ options[:parallel] = false
117
+ end
118
+
119
+ o.on("--slurm[=opts]",String,"Use slurm PBS for submitting jobs") do |slurm|
120
+ options[:slurm_opts] = ""
121
+ options[:slurm] = true
122
+ if slurm
123
+ options[:slurm_opts] = slurm
124
+ end
125
+ end
126
+
110
127
  o.on("--q", "--quiet", "Run quietly") do |q|
111
128
  options[:quiet] = true
112
129
  end
@@ -115,15 +132,20 @@ opts = OptionParser.new do |o|
115
132
  options[:verbose] = true
116
133
  end
117
134
 
118
- o.on("--debug", "Show debug messages and keep intermediate output") do |v|
135
+ o.on("-d", "--debug", "Show debug messages and keep intermediate output") do |v|
119
136
  options[:debug] = true
120
137
  end
121
138
 
139
+ o.on("--dry-run", "Show commands, but don't execute") do |b|
140
+ options[:dry_run] = b
141
+ end
142
+
122
143
  o.on('--','Anything after gets passed to GEMMA') do
123
144
  o.terminate()
124
145
  end
125
146
 
126
147
  o.separator ""
148
+
127
149
  o.on_tail('-h', '--help', 'display this help and exit') do
128
150
  options[:show_help] = true
129
151
  end
@@ -171,17 +193,28 @@ end
171
193
  # ---- Start banner
172
194
 
173
195
  GEMMA_K_VERSION=version
174
- GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017,2018\n"
196
+ GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017-2021\n"
175
197
  info.call GEMMA_K_BANNER
176
198
 
177
199
  # Check gemma version
178
200
  GEMMA_COMMAND=options[:gemma_command]
179
- gemma_version_header = `#{GEMMA_COMMAND}`.split("\n").grep(/GEMMA|Version/)[0].strip
201
+ info.call "NOTE: gemma-wrapper is soon to be replaced by gemma2/lib"
202
+
203
+ begin
204
+ GEMMA_INFO = `#{GEMMA_COMMAND}`
205
+ rescue Errno::ENOENT
206
+ GEMMA_COMMAND = "gemma" if not GEMMA_COMMAND
207
+ error.call "<#{GEMMA_COMMAND}> command not found"
208
+ end
209
+
210
+ gemma_version_header = GEMMA_INFO.split("\n").grep(/GEMMA|Version/)[0].strip
180
211
  info.call "Using ",gemma_version_header,"\n"
181
212
  gemma_version = gemma_version_header.split(/[,\s]+/)[1]
182
213
  v_version, v_major, v_minor = gemma_version.split(".")
183
214
  info.call "Found #{gemma_version}, comparing against expected v0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}"
184
215
 
216
+ info.call gemma_version_header
217
+
185
218
  warning.call "GEMMA version is out of date. Update GEMMA to 0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}!" if v_major.to_i < GEMMA_V_MAJOR or (v_major.to_i == GEMMA_V_MAJOR and (v_minor != nil and v_minor.to_i < GEMMA_V_MINOR))
186
219
 
187
220
  options[:gemma_version_header] = gemma_version_header
@@ -197,60 +230,143 @@ if RUBY_VERSION =~ /^1/
197
230
  warning "runs on Ruby 2.x only\n"
198
231
  end
199
232
 
233
+ debug.call(options) # some debug output
234
+ debug.call(record)
235
+
236
+ DO_COMPUTE_KINSHIP = gemma_args.include?("-gk")
237
+ DO_COMPUTE_GWA = !DO_COMPUTE_KINSHIP
238
+
239
+ # ---- Set up parallel
240
+ if options[:parallel]
241
+ begin
242
+ skip_cite = `echo "will cite" |parallel --citation`
243
+ debug.call(skip_cite)
244
+ PARALLEL_INFO = `parallel --help`
245
+ rescue Errno::ENOENT
246
+ error.call "<parallel> command not found"
247
+ end
248
+ parallel_cmds = []
249
+ end
250
+
200
251
  # ---- Compute HASH on inputs
201
252
  hashme = []
202
253
  geno_idx = gemma_args.index '-g'
203
- raise "Expected GEMMA -g switch" if geno_idx == nil
204
- hashme = gemma_args
205
- hashme += ['-p', options[:phenotypes]] if options[:phenotypes]
254
+ raise "Expected GEMMA -g genotype file switch" if geno_idx == nil
255
+ pheno_idx = gemma_args.index '-p'
206
256
 
207
- require 'digest/sha1'
208
- debug.call "Hashing on ",hashme,"\n"
209
- hashes = []
210
- hashme.each do | item |
211
- if File.exist?(item)
212
- hashes << Digest::SHA1.hexdigest(File.read(item))
213
- debug.call [item,hashes.last]
257
+ if DO_COMPUTE_GWA and options[:permute_phenotypes]
258
+ raise "Did not expect GEMMA -p phenotype with permutations (only use --permute-phenotypes)" if pheno_idx
259
+ end
260
+
261
+ execute = lambda { |cmd|
262
+ info.call("Executing: #{cmd}")
263
+ err = 0
264
+ if not options[:debug]
265
+ # send output to stderr line by line
266
+ IO.popen("#{cmd}") do |io|
267
+ while s = io.gets
268
+ $stderr.print s
269
+ end
270
+ io.close
271
+ err = $?.to_i
272
+ end
214
273
  else
215
- hashes << item
274
+ $stderr.print `#{cmd}`
275
+ err = $?.to_i
276
+ end
277
+ err
278
+ }
279
+
280
+ compute_hash = lambda do | phenofn = nil |
281
+ # Compute a HASH on the inputs
282
+ debug.call "Hashing on ",hashme,"\n"
283
+ hashes = []
284
+ hm = if phenofn
285
+ hashme + ["-p", phenofn]
286
+ else
287
+ hashme
288
+ end
289
+ debug.call(hm)
290
+ hm.each do | item |
291
+ if File.file?(item)
292
+ hashes << Digest::SHA1.hexdigest(File.read(item))
293
+ debug.call [item,hashes.last]
294
+ else
295
+ hashes << item
296
+ end
216
297
  end
298
+ debug.call(hashes)
299
+ Digest::SHA1.hexdigest hashes.join(' ')
217
300
  end
218
- HASH = Digest::SHA1.hexdigest hashes.join(' ')
219
301
 
302
+ HASH = compute_hash.call()
220
303
  options[:hash] = HASH
221
304
 
222
305
  # Create cache dir
223
306
  FileUtils::mkdir_p options[:cache_dir]
224
307
 
308
+ Dir.mktmpdir do |tmpdir| # tmpdir for GEMMA output
309
+
225
310
  error.call "Do not use the GEMMA -o switch!" if gemma_args.include? '-o'
226
311
  error.call "Do not use the GEMMA -outdir switch!" if gemma_args.include? '-outdir'
312
+ GEMMA_ARGS_HASH = gemma_args.dup # do not include outdir
227
313
  gemma_args << '-outdir'
228
- gemma_args << options[:cache_dir]
314
+ gemma_args << tmpdir
229
315
  GEMMA_ARGS = gemma_args
230
316
 
317
+ hashme =
318
+ if DO_COMPUTE_KINSHIP and pheno_idx != nil
319
+ # Remove the phenotype file from the hash for GRM computation
320
+ GEMMA_ARGS_HASH[0..pheno_idx-1] + GEMMA_ARGS_HASH[pheno_idx+2..-1]
321
+ else
322
+ GEMMA_ARGS_HASH
323
+ end
324
+
231
325
  debug.call "Options: ",options,"\n" if !options[:quiet]
232
326
 
233
- invoke_gemma = lambda do |extra_args, cache_hit = false|
234
- cmd="#{GEMMA_COMMAND} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
327
+ invoke_gemma = lambda do |extra_args, cache_hit = false, chr = "full", permutation = 1|
328
+ cmd = "#{GEMMA_COMMAND} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
235
329
  record[:gemma_command] = cmd
236
330
  return if cache_hit
237
- # debug.call cmd
331
+ if options[:slurm]
332
+ info.call cmd
333
+ hashi = HASH
334
+ prefix = tmpdir+'/'+hashi
335
+ scriptfn = prefix+".#{chr}.#{permutation}-pbs.sh"
336
+ script = "#!/bin/bash
337
+ #SBATCH --job-name=gemma-#{scriptfn}
338
+ #SBATCH --ntasks=1
339
+ #SBATCH --time=20:00
340
+ srun #{cmd}
341
+ "
342
+ debug.call(script)
343
+ File.open(scriptfn,"w") { |f|
344
+ f.write(script)
345
+ }
346
+ cmd = "sbatch #{options[:slurm_opts]} #{scriptfn}" # keep a space between the slurm options and the script name
347
+ end
238
348
  errno =
239
349
  if options[:json]
240
350
  # capture output
241
351
  err = 0
242
- IO.popen(cmd) do |io|
243
- while s = io.gets
244
- $stderr.print s
245
- end
246
- io.close
247
- err = $?.to_i
352
+ if options[:dry_run]
353
+ info.call("Would have invoked: ",cmd)
354
+ elsif options[:parallel]
355
+ info.call("Add parallel job: ",cmd)
356
+ parallel_cmds << cmd
357
+ else
358
+ err = execute.call(cmd)
248
359
  end
249
360
  err
250
361
  else
251
- debug.call("Invoking ",cmd) if options[:debug]
252
- system(cmd)
253
- $?.exitstatus
362
+ if options[:dry_run]
363
+ info.call("Would have invoked ",cmd)
364
+ 0
365
+ else
366
+ debug.call("Invoking ",cmd) if options[:debug]
367
+ system(cmd)
368
+ $?.exitstatus
369
+ end
254
370
  end
255
371
  if errno != 0
256
372
  debug.call "Gemma exit ",errno
@@ -260,11 +376,14 @@ invoke_gemma = lambda do |extra_args, cache_hit = false|
260
376
  end
261
377
  end
262
378
 
379
+ # Takes the hash value and checks whether the (output) file exists
263
380
  # returns datafn, logfn, cache_hit
264
- cache = lambda do | chr, ext |
381
+ cache = lambda do | chr, ext, h=HASH, permutation=0 |
265
382
  inject = (chr==nil ? "" : ".#{chr}" )+ext
266
- hashi = (chr==nil ? HASH : HASH+inject)
267
- prefix = options[:cache_dir]+'/'+hashi
383
+ hashi = (chr==nil ? h : h+inject)
384
+ prefix = options[:cache_dir]+'/'+hashi+(permutation!=0 ? "."+permutation.to_s : "")
385
+ # for chr 3 and permutation 1 forms something like
386
+ # /tmp/1b700-a996f.3.cXX.txt.1.log.txt
268
387
  logfn = prefix+".log.txt"
269
388
  datafn = prefix+ext
270
389
  record[:files] ||= []
@@ -300,20 +419,22 @@ kinship = lambda do | chr = nil |
300
419
  end
301
420
 
302
421
  # ---- Run GWA
303
- gwas = lambda do | chr, kfn, pfn |
422
+ gwas = lambda do | chr, kfn, pfn, permutation=0 |
304
423
  record[:type] = "GWA"
305
- error.call "Do not use the GEMMA -k switch with gemma-wrapper!" if GEMMA_ARGS.include? '-k' # K is automatic
306
- hashi, cache_hit = cache.call chr,".assoc.txt"
424
+ error.call "Do not use the GEMMA -k switch with gemma-wrapper - it is automatic!" if GEMMA_ARGS.include? '-k' # K is automatic
425
+ # Update hash for each permutation
426
+ hash = compute_hash.call(pfn)
427
+ hashi, cache_hit = cache.call(chr,".assoc.txt",hash,permutation)
307
428
  if not cache_hit
308
429
  args = [ '-k', kfn, '-o', hashi ]
309
430
  args << [ '-loco', chr ] if chr != nil
310
431
  args << [ '-p', pfn ] if pfn
311
- invoke_gemma.call args
432
+ invoke_gemma.call args,false,chr,permutation
312
433
  end
313
434
  end
314
435
 
315
436
  LOCO = options[:loco]
316
- if GEMMA_ARGS.include? '-gk'
437
+ if DO_COMPUTE_KINSHIP
317
438
  # compute K
318
439
  info.call LOCO
319
440
  if LOCO != nil
@@ -325,11 +446,11 @@ if GEMMA_ARGS.include? '-gk'
325
446
  kinship.call # no LOCO
326
447
  end
327
448
  else
328
- # GWAS
449
+ # DO_COMPUTE_GWA
329
450
  json_in = JSON.parse(File.read(options[:input]))
330
451
  raise "JSON problem, file #{options[:input]} is not -gk derived" if json_in["type"] != "K"
331
452
 
332
- pfn = options[:phenotypes] # can be nil
453
+ pfn = options[:permute_phenotypes] # can be nil
333
454
  k_files = json_in["files"].map { |rec| [rec[0],rec[2]] }
334
455
  k_files.each do | chr, kfn | # call a GWA for each chromosome
335
456
  gwas.call(chr,kfn,pfn)
@@ -337,16 +458,16 @@ else
337
458
  # Permute
338
459
  if options[:permutate]
339
460
  ps = []
340
- raise "You should supply --phenotype with gemma-wrapper --permutate" if not pfn
461
+ raise "You should supply --permute-phenotypes with gemma-wrapper --permutate" if not pfn
341
462
  File.foreach(pfn).with_index do |line, line_num|
342
463
  ps << line
343
464
  end
344
465
  score_list = []
345
466
  debug.call(options[:permutate],"x permutations")
346
- (1..options[:permutate]).each do |i|
347
- $stderr.print "Iteration ",i,"\n"
467
+ (1..options[:permutate]).each do |permutation|
468
+ $stderr.print "Iteration ",permutation,"\n"
348
469
  # Create a shuffled phenotype file
349
- file = File.open("phenotypes-#{i}","w")
470
+ file = File.open("phenotypes-#{permutation}","w")
350
471
  tmp_pfn = file.path
351
472
  p tmp_pfn
352
473
  ps.shuffle.each do | l |
@@ -354,20 +475,23 @@ else
354
475
  end
355
476
  file.close
356
477
  k_files.each do | chr, kfn | # call a GWA for each chromosome
357
- gwas.call(chr,kfn,tmp_pfn)
478
+ gwas.call(chr,kfn,tmp_pfn,permutation)
358
479
  end
359
- # p [:HEY,record[:files].last]
360
- assocfn = record[:files].last[2]
361
- debug.call("Reading ",assocfn)
362
480
  score_min = 1000.0
363
- File.foreach(assocfn).with_index do |assoc, assoc_line_num|
364
- if assoc_line_num > 0
365
- value = assoc.strip.split(/\t/).last.to_f
366
- score_min = value if value < score_min
481
+ if false and not options[:slurm]
482
+ # p [:HEY,record[:files].last]
483
+ assocfn = record[:files].last[2]
484
+ debug.call("Reading ",assocfn)
485
+ File.foreach(assocfn).with_index do |assoc, assoc_line_num|
486
+ if assoc_line_num > 0
487
+ value = assoc.strip.split(/\t/).last.to_f
488
+ score_min = value if value < score_min
489
+ end
367
490
  end
368
491
  end
369
492
  score_list << score_min
370
493
  end
494
+ exit 0 if options[:slurm]
371
495
  ls = score_list.sort
372
496
  p ls
373
497
  significant = ls[(ls.size - ls.size*0.95).floor]
@@ -378,5 +502,38 @@ else
378
502
  end
379
503
  end
380
504
 
505
+ # ---- Invoke parallel
506
+ if options[:parallel]
507
+ # parallel_cmds = ["echo 1","sleep 1 && echo 2", "false", "echo 3"]
508
+ cmd = parallel_cmds.join("\\n")
509
+
510
+ cmd = "echo -e \"#{cmd}\""
511
+ err = execute.call(cmd+"|parallel") # all jobs in parallel
512
+ if err != 0
513
+ [16,8,4,1].each do |jobs|
514
+ info.call("Failed to complete parallel run -- retrying with smaller RAM footprint!")
515
+ err = execute.call(cmd+"|parallel -j #{jobs}")
516
+ break if err == 0
517
+ end
518
+ if err != 0
519
+ info.call("Run failed!")
520
+ exit err
521
+ end
522
+ end
523
+ info.call("Run successful!")
524
+ end
381
525
  json_out.call
382
- exit 0
526
+
527
+ # copy all output files to the cache_dir. If a file exists only emit a warning
528
+ Dir.glob("*.txt", base: tmpdir) do | fn |
529
+ source = tmpdir + "/" + fn
530
+ dest = options[:cache_dir] + "/" + fn
531
+ if not File.exist?(dest) or options[:force]
532
+ info.call "Move #{source} to #{dest}"
533
+ FileUtils.mv source, dest, verbose: false
534
+ else
535
+ warning.call "File #{dest} already exists. Not overwriting"
536
+ end
537
+ end
538
+
539
+ end # tmpdir
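The threshold step at the end of a permutation run takes the sorted list of per-permutation minimum p-values and reads off the entry at the 95% (significant) or 67% (suggestive) position. A minimal sketch of that selection is given below. It is an assumption-laden illustration rather than the script's own code: the helper name `permutation_threshold` is hypothetical, the index is computed with integer arithmetic instead of the floating-point expression used in the script, and the second returned value assumes the reported score is -log10(p) rounded to one decimal.

```ruby
# Hypothetical helper (not part of gemma-wrapper).
# min_pvalues: the smallest p-value observed in each permutation.
# percentile:  an integer such as 95 (significant) or 67 (suggestive).
# Returns [p_value, -log10(p_value) rounded to one decimal].
def permutation_threshold(min_pvalues, percentile)
  raise ArgumentError, "no permutation scores" if min_pvalues.empty?
  sorted = min_pvalues.sort
  # Position of the cut-off counted from the smallest p-value,
  # clamped so it always stays inside the list.
  idx = [sorted.size * (100 - percentile) / 100, sorted.size - 1].min
  p_value = sorted[idx]
  [p_value, (-Math.log10(p_value)).round(1)]
end
```

With 100 permutations the 95% threshold is the sixth-smallest minimum p-value (index 5) and the 67% threshold is the entry at index 33.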
@@ -2,7 +2,7 @@ Gem::Specification.new do |s|
2
2
  s.name = 'bio-gemma-wrapper'
3
3
  s.version = File.read('VERSION')
4
4
  s.summary = "GEMMA with LOCO and permutations"
5
- s.description = "GEMMA wrapper adds LOCO and permutation support. Also caches K between runs with LOCO support"
5
+ s.description = "GEMMA wrapper adds LOCO and permutation support. Also runs in parallel and caches K between runs with LOCO support"
6
6
  s.authors = ["Pjotr Prins"]
7
7
  s.email = 'pjotr.public01@thebird.nl'
8
8
  s.files = ["bin/gemma-wrapper",
metadata CHANGED
@@ -1,17 +1,17 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: bio-gemma-wrapper
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.97.1
4
+ version: 0.99.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Pjotr Prins
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2018-07-19 00:00:00.000000000 Z
11
+ date: 2021-08-08 00:00:00.000000000 Z
12
12
  dependencies: []
13
- description: GEMMA wrapper adds LOCO and permutation support. Also caches K between
14
- runs with LOCO support
13
+ description: GEMMA wrapper adds LOCO and permutation support. Also runs in parallel
14
+ and caches K between runs with LOCO support
15
15
  email: pjotr.public01@thebird.nl
16
16
  executables:
17
17
  - gemma-wrapper
@@ -43,8 +43,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
43
43
  - !ruby/object:Gem::Version
44
44
  version: '0'
45
45
  requirements: []
46
- rubyforge_project:
47
- rubygems_version: 2.6.8
46
+ rubygems_version: 3.2.5
48
47
  signing_key:
49
48
  specification_version: 4
50
49
  summary: GEMMA with LOCO and permutations