bio-gemma-wrapper 0.97.1 → 0.99.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
- SHA1:
- metadata.gz: 740af7561d0fb801b810713cd623f3d25e78971a
- data.tar.gz: ce3aaae3418073dc1365bc8755ac73745ef4b01a
+ SHA256:
+ metadata.gz: e27a8a3abb00b758095df5956b3854674faf5ff681a93bc028df273c40125c0d
+ data.tar.gz: e9675dbb0ea0f087dd21774635d38f3cda11b46a88b36c77dd308086fd0ec5f2
  SHA512:
- metadata.gz: 30043473cf8b09ecf8a6fdcfa97468ba5d169bc7a0b6b6ab2102f7f978b8cb2c63fb5c5cefd7267cfe526e4748b6481aa9b6dbe7309dceff62f0351f79b16b0e
- data.tar.gz: 07e50ab7b2b2d87bed1ce21b5b2d123693898163ff9a175841db190fcc22b43ee37ecc1df7cbc697e89b3dcaa193c489c299d3c50c1aa557d9512c4d494adec6
+ metadata.gz: 81cf5440fa531d5a831efa787800c8bea230d47cddc666a31fff066551ff347708a41ddf1368c0d3946c7ba9faef8e5882e398ad340850253c53961cce96f662
+ data.tar.gz: 582ae78c48a1eb8eeca01172eaeaba9d5ca23e69601967e334f8c218e3a4dd74b297861b01ce49b1357798b49a96c12e737100dcacec7fc34b70da1fc9c75f0d
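
The checksum file switches from SHA1 to SHA256 and keeps SHA512. As a rough illustration of how such a digest could be checked locally, the sketch below uses Ruby's standard `Digest` library. The helper name `digest_matches?` and the throwaway sample file are assumptions made for this example; they are not part of the package.

```ruby
require 'digest'
require 'tempfile'

# Hypothetical helper (not part of bio-gemma-wrapper): true when the digest of
# the file at `path` equals `expected_hex`, compared case-insensitively.
def digest_matches?(path, expected_hex, algo = Digest::SHA256)
  algo.file(path).hexdigest == expected_hex.downcase
end

# Demonstration on a temporary file rather than the real metadata.gz
sample = Tempfile.new('sample')
sample.write('hello')
sample.flush

puts digest_matches?(sample.path, Digest::SHA256.hexdigest('hello'))
puts digest_matches?(sample.path, Digest::SHA512.hexdigest('hello'), Digest::SHA512)
puts digest_matches?(sample.path, Digest::SHA256.hexdigest('something else'))
```

For a downloaded gem, the same helper could be pointed at the extracted `metadata.gz` or `data.tar.gz` together with the value listed in `checksums.yaml`.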
data/README.md CHANGED
@@ -1,10 +1,19 @@
1
- # GEMMA wrapper caches K between runs with LOCO support
1
+ [![gemma-wrapper gem version](https://badge.fury.io/rb/bio-gemma-wrapper.svg)](https://badge.fury.io/rb/bio-gemma-wrapper)
2
+
3
+ # GEMMA with LOCO, permutations and slurm support (and caching)
2
4
 
3
5
  ![Genetic associations identified in CFW mice using GEMMA (Parker et al,
4
6
  Nat. Genet., 2016)](cfw.gif)
5
7
 
6
8
  ## Introduction
7
9
 
10
+ Gemma-wrapper allows running GEMMA with LOCO, GEMMA with caching,
11
+ GEMMA in parallel (now the default), and GEMMA on PBS. Gemma-wrapper
12
+ is used to run GEMMA as part of the https://genenetwork.org/
13
+ environment.
14
+
15
+ Note that gemma-wrapper is projected to be integrated into gemma2/lib.
16
+
8
17
  GEMMA is a software toolkit for fast application of linear mixed
9
18
  models (LMMs) and related models to genome-wide association studies
10
19
  (GWAS) and other large-scale data sets.
@@ -12,16 +21,14 @@ models (LMMs) and related models to genome-wide association studies
12
21
  This repository contains gemma-wrapper, essentially a wrapper of
13
22
  GEMMA that provides support for caching the kinship or relatedness
14
23
  matrix (K) and caching LM and LMM computations with the option of full
15
- leave-one-chromosome-out genome scans (LOCO).
24
+ leave-one-chromosome-out genome scans (LOCO). Jobs can also be
25
+ submitted to an HPC cluster through the Slurm workload manager.
16
26
 
17
27
  gemma-wrapper requires a recent version of GEMMA and essentially
18
28
  does a pass-through of all standard GEMMA invocation switches. On
19
29
  return gemma-wrapper can return a JSON object (--json) which is
20
30
  useful for web-services.
21
31
 
22
- Note that this a work in progress (WIP). What is described below
23
- should work.
24
-
25
32
  ## Installation
26
33
 
27
34
  Prerequisites are
@@ -30,8 +37,9 @@ Prerequisites are
30
37
  * Standard [Ruby >2.0 ](https://www.ruby-lang.org/en/) which comes on
31
38
  almost all Linux systems
32
39
 
33
- gemma-wrapper comes as a Ruby [gem](https://rubygems.org/gems/bio-gemma-wrapper) and
34
- can be installed with
40
+ gemma-wrapper comes as a Ruby
41
+ [gem](https://rubygems.org/gems/bio-gemma-wrapper) and can be
42
+ installed with
35
43
 
36
44
  gem install bio-gemma-wrapper
37
45
 
@@ -39,15 +47,18 @@ Invoke the tool with
39
47
 
40
48
  gemma-wrapper --help
41
49
 
42
- and it will render
50
+ and it will render something like
43
51
 
44
52
  ```
45
53
  Usage: gemma-wrapper [options] -- [gemma-options]
54
+ --permutate n Permutate # times by shuffling phenotypes
55
+ --permute-phenotypes filen Phenotypes to be shuffled in permutations
46
56
  --loco [x,y,1,2,3...] Run full LOCO
47
57
  --input filen JSON input variables (used for LOCO)
48
58
  --cache-dir path Use a cache directory
49
59
  --json Create output file in JSON format
50
60
  --force Force computation
61
+ --slurm [options] Submit to slurm PBS
51
62
  --q, --quiet Run quietly
52
63
  -v, --verbose Run verbosely
53
64
  --debug Show debug messages and keep intermediate output
@@ -65,6 +76,8 @@ Unpack it and run the tool as
65
76
 
66
77
  ./bin/gemma-wrapper --help
67
78
 
79
+ See below for using a GNU Guix environment.
80
+
68
81
  ## Usage
69
82
 
70
83
  gemma-wrapper picks up GEMMA from the PATH. To override that behaviour
@@ -91,11 +104,12 @@ the data files are found):
91
104
 
92
105
  Run it twice to see
93
106
 
94
- /tmp/3079151e14b219c3b243b673d88001c1675168b4.log.txt gemma-wrapper CACHE HIT!
107
+ /tmp/0bdd7add5e8f7d9af36b283d0341c115124273e0.log.txt CACHE HIT!
95
108
 
96
109
  gemma-wrapper computes the unique HASH value over the command
97
110
  line switches passed into GEMMA as well as the contents of the files
98
- passed in (here the genotype and phenotype files).
111
+ passed in (here the genotype and phenotype files; when computing K the
112
+ phenotype file is left out of the hash, because GEMMA always computes the same K).
99
113
 
100
114
  You can also get JSON output on STDOUT by providing the --json switch
101
115
 
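
The cache key described in this hunk can be sketched roughly as follows. This is a simplified illustration and not the wrapper's exact code: `cache_key` is a hypothetical name, and the sketch assumes the GEMMA arguments arrive as a flat array in which `-p` is immediately followed by its file name.

```ruby
require 'digest/sha1'

# Hypothetical sketch: derive a cache key from GEMMA arguments.
# Arguments that name an existing file contribute a digest of the file
# contents; every other argument contributes its literal text.
def cache_key(gemma_args)
  args = gemma_args.dup
  # For kinship runs (-gk) drop '-p <file>' so the phenotype does not
  # influence the key.
  if args.include?('-gk') && (i = args.index('-p'))
    args.slice!(i, 2)
  end
  parts = args.map do |arg|
    File.file?(arg) ? Digest::SHA1.file(arg).hexdigest : arg
  end
  Digest::SHA1.hexdigest(parts.join(' '))
end
```

Because file contents rather than file names are hashed, renaming an input file leaves the key unchanged, while editing it produces a new key and therefore a cache miss.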
@@ -103,9 +117,10 @@ You can also get JSON output on STDOUT by providing the --json switch
103
117
  -g test/data/input/BXD_geno.txt.gz \
104
118
  -p test/data/input/BXD_pheno.txt \
105
119
  -gk \
106
- -debug
120
+ -debug > K.json
107
121
 
108
- prints out something that can be parsed with a calling program
122
+ K.json is something that can be parsed with a calling program, and is
123
+ also below as input for the GWA step. Example:
109
124
 
110
125
  ```json
111
126
  {"warnings":[],"errno":0,"debug":[],"type":"K","files":[["/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.log.txt","/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.cXX.txt"]],"cache_hit":true,"gemma_command":"../gemma/bin/gemma -g test/data/input/BXD_geno.txt.gz -p test/data/input/BXD_pheno.txt -gk -debug -outdir /tmp -o 18ce786ab92064a7ee38a7422e7838abf91f5eb0"}
@@ -123,6 +138,23 @@ default. If you want something else provide a --cache-dir, e.g.
123
138
 
124
139
  will store K in ~/.gemma-cache.
125
140
 
141
+ ### GWA
142
+
143
+ Run the LMM using the K's captured earlier in K.json using the --input
144
+ switch
145
+
146
+ gemma-wrapper --json --loco --input K.json -- \
147
+ -g test/data/input/BXD_geno.txt.gz \
148
+ -p test/data/input/BXD_pheno.txt \
149
+ -c test/data/input/BXD_covariates2.txt \
150
+ -a test/data/input/BXD_snps.txt \
151
+ -lmm 2 -maf 0.1 \
152
+ -debug > GWA.json
153
+
154
+ Running it twice should show that GWA is not recomputed.
155
+
156
+ /tmp/9e411810ad341de6456ce0c6efd4f973356d0bad.log.txt CACHE HIT!
157
+
126
158
  ### LOCO
127
159
 
128
160
  Recent versions of GEMMA have LOCO support for a single chromosome
@@ -158,6 +190,45 @@ GWA.json contains the file names of every chromosome
158
190
  The -k switch is injected automatically. Again output switches are not
159
191
  allowed (-o, -outdir)
160
192
 
193
+ ### Permutations
194
+
195
+ Permutations can be run with and without LOCO. First create K
196
+
197
+ gemma-wrapper --json -- \
198
+ -g test/data/input/BXD_geno.txt.gz \
199
+ -p test/data/input/BXD_pheno.txt \
200
+ -gk \
201
+ -debug > K.json
202
+
203
+ Next, using K.json, permute the phenotypes with something like
204
+
205
+ gemma-wrapper --json --loco --input K.json \
206
+ --permutate 100 --permute-phenotypes test/data/input/BXD_pheno.txt -- \
207
+ -g test/data/input/BXD_geno.txt.gz \
208
+ -p test/data/input/BXD_pheno.txt \
209
+ -c test/data/input/BXD_covariates2.txt \
210
+ -a test/data/input/BXD_snps.txt \
211
+ -lmm 2 -maf 0.1 \
212
+ -debug > GWA.json
213
+
214
+ This should get the estimated 95% (significant) and 67% (suggestive) thresholds:
215
+
216
+ ["95 percentile (significant) ", 1.92081e-05, 4.7]
217
+ ["67 percentile (suggestive) ", 5.227785e-05, 4.3]
218
+
219
+ ### Slurm PBS
220
+
221
+ To run gemma-wrapper on HPC use the '--slurm' switch.
222
+
223
+ ## Development
224
+
225
+ We use GNU Guix for development and deployment. Use the [.guix-deploy](.guix-deploy) script in the checked out git repo:
226
+
227
+ ```
228
+ source .guix-deploy
229
+ ruby bin/gemma-wrapper --help
230
+ ```
231
+
161
232
  ## Copyright
162
233
 
163
- Copyright (c) 2017 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
234
+ Copyright (c) 2017-2021 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
data/VERSION CHANGED
@@ -1 +1 @@
- 0.97.1
+ 0.99.2
data/bin/gemma-wrapper CHANGED
@@ -4,7 +4,7 @@
4
4
  # Author:: Pjotr Prins
5
5
  # License:: GPL3
6
6
  #
7
- # Copyright (C) 2017,2018 Pjotr Prins <pjotr.prins@thebird.nl>
7
+ # Copyright (C) 2017-2021 Pjotr Prins <pjotr.prins@thebird.nl>
8
8
 
9
9
  USAGE = "
10
10
  GEMMA wrapper example:
@@ -35,10 +35,13 @@ GEMMA wrapper example:
35
35
  -lmm 2 -maf 0.1 \\
36
36
  -debug > GWA.json
37
37
 
38
+ Gemma gets used from the path. You can override by setting
39
+
40
+ env GEMMA_COMMAND=path/bin/gemma gemma-wrapper ...
38
41
  "
39
- # These are used for testing compatibility
40
- GEMMA_V_MAJOR = 97
41
- GEMMA_V_MINOR = 0
42
+ # These are used for testing compatibility with the gemma tool
43
+ GEMMA_V_MAJOR = 98
44
+ GEMMA_V_MINOR = 4
42
45
 
43
46
  basepath = File.dirname(File.dirname(__FILE__))
44
47
  $: << File.join(basepath,'lib')
@@ -61,32 +64,34 @@ if not gemma_command
61
64
  end
62
65
  end
63
66
 
67
+
68
+ require 'digest/sha1'
64
69
  require 'fileutils'
65
70
  require 'optparse'
66
- require 'tmpdir'
67
71
  require 'tempfile'
72
+ require 'tmpdir'
68
73
 
69
74
  split_at = ARGV.index('--')
70
75
  if split_at
71
76
  gemma_args = ARGV[split_at+1..-1]
72
77
  end
73
78
 
74
- options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir() }
79
+ options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir(), quiet: false, parallel: true }
75
80
 
76
81
  opts = OptionParser.new do |o|
77
82
  o.banner = "\nUsage: #{File.basename($0)} [options] -- [gemma-options]"
78
83
 
79
- o.on('--permutate n', Integer, 'Permutate by shuffling phenotypes') do |lst|
84
+ o.on('--permutate n', Integer, 'Permutate # times by shuffling phenotypes') do |lst|
80
85
  options[:permutate] = lst
81
86
  options[:force] = true
82
87
  end
83
88
 
84
- o.on('--phenotypes filen',String, 'Phenotypes to be shuffled in permutations') do |phenotypes|
85
- options[:phenotypes] = phenotypes
89
+ o.on('--permute-phenotypes filen',String, 'Phenotypes to be shuffled in permutations') do |phenotypes|
90
+ options[:permute_phenotypes] = phenotypes
86
91
  raise "Phenotype input file #{phenotypes} does not exist" if !File.exist?(phenotypes)
87
92
  end
88
93
 
89
- o.on('--loco [x,y,1,2,3...]', Array, 'Run full LOCO') do |lst|
94
+ o.on('--loco [x,y,1,2,3...]', Array, 'Run full leave-one-chromosome-out (LOCO)') do |lst|
90
95
  options[:loco] = lst
91
96
  end
92
97
 
@@ -107,6 +112,18 @@ opts = OptionParser.new do |o|
107
112
  options[:force] = true
108
113
  end
109
114
 
115
+ o.on("--no-parallel", "Do not run jobs in parallel") do |b|
116
+ options[:parallel] = false
117
+ end
118
+
119
+ o.on("--slurm[=opts]",String,"Use slurm PBS for submitting jobs") do |slurm|
120
+ options[:slurm_opts] = ""
121
+ options[:slurm] = true
122
+ if slurm
123
+ options[:slurm_opts] = slurm
124
+ end
125
+ end
126
+
110
127
  o.on("--q", "--quiet", "Run quietly") do |q|
111
128
  options[:quiet] = true
112
129
  end
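
The `--slurm[=opts]` declaration in this hunk uses OptionParser's optional-argument form. The sketch below shows how that form behaves in isolation. `parse_wrapper_options` is a hypothetical function written for this example, and the negatable `--[no-]parallel` spelling is a substitution for the `--no-parallel` switch that the diff registers.

```ruby
require 'optparse'

# Hypothetical sketch of the two new switches; not the wrapper's own code.
def parse_wrapper_options(argv)
  options = { parallel: true, slurm: false, slurm_opts: '' }
  OptionParser.new do |o|
    # Negatable form: --parallel yields true, --no-parallel yields false
    o.on('--[no-]parallel', 'Run jobs in parallel (default)') do |b|
      options[:parallel] = b
    end
    # [=opts] marks the argument as optional: a bare --slurm passes nil
    o.on('--slurm[=opts]', String, 'Use slurm for submitting jobs') do |opts|
      options[:slurm] = true
      options[:slurm_opts] = opts if opts
    end
  end.parse(argv)
  options
end

p parse_wrapper_options(['--slurm'])
p parse_wrapper_options(['--no-parallel', '--slurm=--mem=4G'])
```

Attaching the value with `=` keeps anything that itself starts with dashes from being read as a separate switch.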
@@ -115,15 +132,20 @@ opts = OptionParser.new do |o|
115
132
  options[:verbose] = true
116
133
  end
117
134
 
118
- o.on("--debug", "Show debug messages and keep intermediate output") do |v|
135
+ o.on("-d", "--debug", "Show debug messages and keep intermediate output") do |v|
119
136
  options[:debug] = true
120
137
  end
121
138
 
139
+ o.on("--dry-run", "Show commands, but don't execute") do |b|
140
+ options[:dry_run] = b
141
+ end
142
+
122
143
  o.on('--','Anything after gets passed to GEMMA') do
123
144
  o.terminate()
124
145
  end
125
146
 
126
147
  o.separator ""
148
+
127
149
  o.on_tail('-h', '--help', 'display this help and exit') do
128
150
  options[:show_help] = true
129
151
  end
@@ -171,17 +193,28 @@ end
171
193
  # ---- Start banner
172
194
 
173
195
  GEMMA_K_VERSION=version
174
- GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017,2018\n"
196
+ GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017-2021\n"
175
197
  info.call GEMMA_K_BANNER
176
198
 
177
199
  # Check gemma version
178
200
  GEMMA_COMMAND=options[:gemma_command]
179
- gemma_version_header = `#{GEMMA_COMMAND}`.split("\n").grep(/GEMMA|Version/)[0].strip
201
+ info.call "NOTE: gemma-wrapper is soon to be replaced by gemma2/lib"
202
+
203
+ begin
204
+ GEMMA_INFO = `#{GEMMA_COMMAND}`
205
+ rescue Errno::ENOENT
206
+ GEMMA_COMMAND = "gemma" if not GEMMA_COMMAND
207
+ error.call "<#{GEMMA_COMMAND}> command not found"
208
+ end
209
+
210
+ gemma_version_header = GEMMA_INFO.split("\n").grep(/GEMMA|Version/)[0].strip
180
211
  info.call "Using ",gemma_version_header,"\n"
181
212
  gemma_version = gemma_version_header.split(/[,\s]+/)[1]
182
213
  v_version, v_major, v_minor = gemma_version.split(".")
183
214
  info.call "Found #{gemma_version}, comparing against expected v0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}"
184
215
 
216
+ info.call gemma_version_header
217
+
185
218
  warning.call "GEMMA version is out of date. Update GEMMA to 0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}!" if v_major.to_i < GEMMA_V_MAJOR or (v_major.to_i == GEMMA_V_MAJOR and (v_minor != nil and v_minor.to_i < GEMMA_V_MINOR))
186
219
 
187
220
  options[:gemma_version_header] = gemma_version_header
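
The version check in this hunk compares the major and minor components as integers. An alternative way to express the same idea is sketched below with `Gem::Version`, which ships with RubyGems. The function names and the banner text are made up for illustration. Unlike the code in the diff, this variant treats a missing patch component as zero, so `0.98` counts as older than `0.98.4`.

```ruby
require 'rubygems'

# Hypothetical sketch: pull the first dotted version number out of a banner.
def gemma_version(header)
  header[/\d+\.\d+(?:\.\d+)?/]
end

# True when `found` is strictly older than `minimum`.
def outdated?(found, minimum)
  Gem::Version.new(found) < Gem::Version.new(minimum)
end

header = 'GEMMA 0.98.1 (example banner)' # made-up banner text
found = gemma_version(header)
puts "GEMMA #{found} is older than 0.98.4" if outdated?(found, '0.98.4')
```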
@@ -197,60 +230,143 @@ if RUBY_VERSION =~ /^1/
197
230
  warning "runs on Ruby 2.x only\n"
198
231
  end
199
232
 
233
+ debug.call(options) # some debug output
234
+ debug.call(record)
235
+
236
+ DO_COMPUTE_KINSHIP = gemma_args.include?("-gk")
237
+ DO_COMPUTE_GWA = !DO_COMPUTE_KINSHIP
238
+
239
+ # ---- Set up parallel
240
+ if options[:parallel]
241
+ begin
242
+ skip_cite = `echo "will cite" |parallel --citation`
243
+ debug.call(skip_cite)
244
+ PARALLEL_INFO = `parallel --help`
245
+ rescue Errno::ENOENT
246
+ error.call "<parallel> command not found"
247
+ end
248
+ parallel_cmds = []
249
+ end
250
+
200
251
  # ---- Compute HASH on inputs
201
252
  hashme = []
202
253
  geno_idx = gemma_args.index '-g'
203
- raise "Expected GEMMA -g switch" if geno_idx == nil
204
- hashme = gemma_args
205
- hashme += ['-p', options[:phenotypes]] if options[:phenotypes]
254
+ raise "Expected GEMMA -g genotype file switch" if geno_idx == nil
255
+ pheno_idx = gemma_args.index '-p'
206
256
 
207
- require 'digest/sha1'
208
- debug.call "Hashing on ",hashme,"\n"
209
- hashes = []
210
- hashme.each do | item |
211
- if File.exist?(item)
212
- hashes << Digest::SHA1.hexdigest(File.read(item))
213
- debug.call [item,hashes.last]
257
+ if DO_COMPUTE_GWA and options[:permute_phenotypes]
258
+ raise "Did not expect GEMMA -p phenotype with permutations (only use --permute-phenotypes)" if pheno_idx
259
+ end
260
+
261
+ execute = lambda { |cmd|
262
+ info.call("Executing: #{cmd}")
263
+ err = 0
264
+ if not options[:debug]
265
+ # send output to stderr line by line
266
+ IO.popen("#{cmd}") do |io|
267
+ while s = io.gets
268
+ $stderr.print s
269
+ end
270
+ io.close
271
+ err = $?.to_i
272
+ end
214
273
  else
215
- hashes << item
274
+ $stderr.print `#{cmd}`
275
+ err = $?.to_i
276
+ end
277
+ err
278
+ }
279
+
280
+ compute_hash = lambda do | phenofn = nil |
281
+ # Compute a HASH on the inputs
282
+ debug.call "Hashing on ",hashme,"\n"
283
+ hashes = []
284
+ hm = if phenofn
285
+ hashme + ["-p", phenofn]
286
+ else
287
+ hashme
288
+ end
289
+ debug.call(hm)
290
+ hm.each do | item |
291
+ if File.file?(item)
292
+ hashes << Digest::SHA1.hexdigest(File.read(item))
293
+ debug.call [item,hashes.last]
294
+ else
295
+ hashes << item
296
+ end
216
297
  end
298
+ debug.call(hashes)
299
+ Digest::SHA1.hexdigest hashes.join(' ')
217
300
  end
218
- HASH = Digest::SHA1.hexdigest hashes.join(' ')
219
301
 
302
+ HASH = compute_hash.call()
220
303
  options[:hash] = HASH
221
304
 
222
305
  # Create cache dir
223
306
  FileUtils::mkdir_p options[:cache_dir]
224
307
 
308
+ Dir.mktmpdir do |tmpdir| # tmpdir for GEMMA output
309
+
225
310
  error.call "Do not use the GEMMA -o switch!" if gemma_args.include? '-o'
226
311
  error.call "Do not use the GEMMA -outdir switch!" if gemma_args.include? '-outdir'
312
+ GEMMA_ARGS_HASH = gemma_args.dup # do not include outdir
227
313
  gemma_args << '-outdir'
228
- gemma_args << options[:cache_dir]
314
+ gemma_args << tmpdir
229
315
  GEMMA_ARGS = gemma_args
230
316
 
317
+ hashme =
318
+ if DO_COMPUTE_KINSHIP and pheno_idx != nil
319
+ # Remove the phenotype file from the hash for GRM computation
320
+ GEMMA_ARGS_HASH[0..pheno_idx-1] + GEMMA_ARGS_HASH[pheno_idx+2..-1]
321
+ else
322
+ GEMMA_ARGS_HASH
323
+ end
324
+
231
325
  debug.call "Options: ",options,"\n" if !options[:quiet]
232
326
 
233
- invoke_gemma = lambda do |extra_args, cache_hit = false|
234
- cmd="#{GEMMA_COMMAND} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
327
+ invoke_gemma = lambda do |extra_args, cache_hit = false, chr = "full", permutation = 1|
328
+ cmd = "#{GEMMA_COMMAND} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
235
329
  record[:gemma_command] = cmd
236
330
  return if cache_hit
237
- # debug.call cmd
331
+ if options[:slurm]
332
+ info.call cmd
333
+ hashi = HASH
334
+ prefix = tmpdir+'/'+hashi
335
+ scriptfn = prefix+".#{chr}.#{permutation}-pbs.sh"
336
+ script = "#!/bin/bash
337
+ #SBATCH --job-name=gemma-#{scriptfn}
338
+ #SBATCH --ntasks=1
339
+ #SBATCH --time=20:00
340
+ srun #{cmd}
341
+ "
342
+ debug.call(script)
343
+ File.open(scriptfn,"w") { |f|
344
+ f.write(script)
345
+ }
346
+ cmd = "sbatch "+options[:slurm_opts]+" "+scriptfn
347
+ end
238
348
  errno =
239
349
  if options[:json]
240
350
  # capture output
241
351
  err = 0
242
- IO.popen(cmd) do |io|
243
- while s = io.gets
244
- $stderr.print s
245
- end
246
- io.close
247
- err = $?.to_i
352
+ if options[:dry_run]
353
+ info.call("Would have invoked: ",cmd)
354
+ elsif options[:parallel]
355
+ info.call("Add parallel job: ",cmd)
356
+ parallel_cmds << cmd
357
+ else
358
+ err = execute.call(cmd)
248
359
  end
249
360
  err
250
361
  else
251
- debug.call("Invoking ",cmd) if options[:debug]
252
- system(cmd)
253
- $?.exitstatus
362
+ if options[:dry_run]
363
+ info.call("Would have invoked ",cmd)
364
+ 0
365
+ else
366
+ debug.call("Invoking ",cmd) if options[:debug]
367
+ system(cmd)
368
+ $?.exitstatus
369
+ end
254
370
  end
255
371
  if errno != 0
256
372
  debug.call "Gemma exit ",errno
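
The Slurm branch in this hunk writes a small batch script and then submits it with `sbatch`. A minimal sketch of that pattern follows. The function names are hypothetical; the `#SBATCH` directives are the ones that appear in the diff. The submit command is assembled from separate parts so that extra options and the script path stay separated by a space.

```ruby
# Hypothetical sketch: build the text of a batch script around one command.
def sbatch_script(cmd, job_name:, time: '20:00')
  # The squiggly heredoc strips indentation, so the shebang starts at column 0
  <<~SCRIPT
    #!/bin/bash
    #SBATCH --job-name=#{job_name}
    #SBATCH --ntasks=1
    #SBATCH --time=#{time}
    srun #{cmd}
  SCRIPT
end

# Hypothetical sketch: build the submit command, skipping empty options.
def sbatch_command(script_path, slurm_opts = '')
  ['sbatch', slurm_opts, script_path].reject(&:empty?).join(' ')
end

puts sbatch_script('gemma -g geno.txt.gz -gk', job_name: 'gemma-K')
puts sbatch_command('/tmp/job-pbs.sh', '--mem=4G')
```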
@@ -260,11 +376,14 @@ invoke_gemma = lambda do |extra_args, cache_hit = false|
260
376
  end
261
377
  end
262
378
 
379
+ # Takes the hash value and checks whether the (output) file exists
263
380
  # returns datafn, logfn, cache_hit
264
- cache = lambda do | chr, ext |
381
+ cache = lambda do | chr, ext, h=HASH, permutation=0 |
265
382
  inject = (chr==nil ? "" : ".#{chr}" )+ext
266
- hashi = (chr==nil ? HASH : HASH+inject)
267
- prefix = options[:cache_dir]+'/'+hashi
383
+ hashi = (chr==nil ? h : h+inject)
384
+ prefix = options[:cache_dir]+'/'+hashi+(permutation!=0 ? "."+permutation.to_s : "")
385
+ # for chr 3 and permutation 1 forms something like
386
+ # /tmp/1b700-a996f.3.cXX.txt.1.log.txt
268
387
  logfn = prefix+".log.txt"
269
388
  datafn = prefix+ext
270
389
  record[:files] ||= []
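
The naming rules in this hunk (hash, optional chromosome, optional permutation number) can be restated as a small pure function. `cache_names` is a hypothetical name, and returning a Hash is a choice made for this sketch; the wrapper's own lambda returns different values.

```ruby
# Hypothetical sketch mirroring the file-name construction shown above.
def cache_names(cache_dir, hash, ext, chr: nil, permutation: 0)
  # With a chromosome, the chromosome and extension become part of the stem
  stem = chr.nil? ? hash : "#{hash}.#{chr}#{ext}"
  prefix = File.join(cache_dir, stem)
  # Permutation 0 means "no permutation" and adds no suffix
  prefix += ".#{permutation}" unless permutation.zero?
  { log: "#{prefix}.log.txt", data: "#{prefix}#{ext}" }
end

p cache_names('/tmp', '1b700-a996f', '.cXX.txt', chr: 3, permutation: 1)
p cache_names('/tmp', '1b700-a996f', '.cXX.txt')
```

For chromosome 3 and permutation 1 the log name comes out as `/tmp/1b700-a996f.3.cXX.txt.1.log.txt`, matching the comment in the diff.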
@@ -300,20 +419,22 @@ kinship = lambda do | chr = nil |
300
419
  end
301
420
 
302
421
  # ---- Run GWA
303
- gwas = lambda do | chr, kfn, pfn |
422
+ gwas = lambda do | chr, kfn, pfn, permutation=0 |
304
423
  record[:type] = "GWA"
305
- error.call "Do not use the GEMMA -k switch with gemma-wrapper!" if GEMMA_ARGS.include? '-k' # K is automatic
306
- hashi, cache_hit = cache.call chr,".assoc.txt"
424
+ error.call "Do not use the GEMMA -k switch with gemma-wrapper - it is automatic!" if GEMMA_ARGS.include? '-k' # K is automatic
425
+ # Update hash for each permutation
426
+ hash = compute_hash.call(pfn)
427
+ hashi, cache_hit = cache.call(chr,".assoc.txt",hash,permutation)
307
428
  if not cache_hit
308
429
  args = [ '-k', kfn, '-o', hashi ]
309
430
  args << [ '-loco', chr ] if chr != nil
310
431
  args << [ '-p', pfn ] if pfn
311
- invoke_gemma.call args
432
+ invoke_gemma.call args,false,chr,permutation
312
433
  end
313
434
  end
314
435
 
315
436
  LOCO = options[:loco]
316
- if GEMMA_ARGS.include? '-gk'
437
+ if DO_COMPUTE_KINSHIP
317
438
  # compute K
318
439
  info.call LOCO
319
440
  if LOCO != nil
@@ -325,11 +446,11 @@ if GEMMA_ARGS.include? '-gk'
325
446
  kinship.call # no LOCO
326
447
  end
327
448
  else
328
- # GWAS
449
+ # DO_COMPUTE_GWA
329
450
  json_in = JSON.parse(File.read(options[:input]))
330
451
  raise "JSON problem, file #{options[:input]} is not -gk derived" if json_in["type"] != "K"
331
452
 
332
- pfn = options[:phenotypes] # can be nil
453
+ pfn = options[:permute_phenotypes] # can be nil
333
454
  k_files = json_in["files"].map { |rec| [rec[0],rec[2]] }
334
455
  k_files.each do | chr, kfn | # call a GWA for each chromosome
335
456
  gwas.call(chr,kfn,pfn)
@@ -337,16 +458,16 @@ else
337
458
  # Permute
338
459
  if options[:permutate]
339
460
  ps = []
340
- raise "You should supply --phenotype with gemma-wrapper --permutate" if not pfn
461
+ raise "You should supply --permute-phenotypes with gemma-wrapper --permutate" if not pfn
341
462
  File.foreach(pfn).with_index do |line, line_num|
342
463
  ps << line
343
464
  end
344
465
  score_list = []
345
466
  debug.call(options[:permutate],"x permutations")
346
- (1..options[:permutate]).each do |i|
347
- $stderr.print "Iteration ",i,"\n"
467
+ (1..options[:permutate]).each do |permutation|
468
+ $stderr.print "Iteration ",permutation,"\n"
348
469
  # Create a shuffled phenotype file
349
- file = File.open("phenotypes-#{i}","w")
470
+ file = File.open("phenotypes-#{permutation}","w")
350
471
  tmp_pfn = file.path
351
472
  p tmp_pfn
352
473
  ps.shuffle.each do | l |
@@ -354,20 +475,23 @@ else
354
475
  end
355
476
  file.close
356
477
  k_files.each do | chr, kfn | # call a GWA for each chromosome
357
- gwas.call(chr,kfn,tmp_pfn)
478
+ gwas.call(chr,kfn,tmp_pfn,permutation)
358
479
  end
359
- # p [:HEY,record[:files].last]
360
- assocfn = record[:files].last[2]
361
- debug.call("Reading ",assocfn)
362
480
  score_min = 1000.0
363
- File.foreach(assocfn).with_index do |assoc, assoc_line_num|
364
- if assoc_line_num > 0
365
- value = assoc.strip.split(/\t/).last.to_f
366
- score_min = value if value < score_min
481
+ if false and not options[:slurm]
482
+ # p [:HEY,record[:files].last]
483
+ assocfn = record[:files].last[2]
484
+ debug.call("Reading ",assocfn)
485
+ File.foreach(assocfn).with_index do |assoc, assoc_line_num|
486
+ if assoc_line_num > 0
487
+ value = assoc.strip.split(/\t/).last.to_f
488
+ score_min = value if value < score_min
489
+ end
367
490
  end
368
491
  end
369
492
  score_list << score_min
370
493
  end
494
+ exit 0 if options[:slurm]
371
495
  ls = score_list.sort
372
496
  p ls
373
497
  significant = ls[(ls.size - ls.size*0.95).floor]
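
The threshold lookup above takes the sorted per-permutation minimum p-values and reads off the entry at the lower 5% position. The sketch below restates that idea with integer arithmetic and adds the -log10 transform that appears as the third column in the README output. `permutation_threshold` is a hypothetical name, and applying the same formula to the 67th percentile is an assumption, since that line is not part of this hunk.

```ruby
# Hypothetical sketch: empirical threshold from per-permutation minimum p-values.
# Returns [p_value, -log10(p_value) rounded to one decimal].
def permutation_threshold(min_pvalues, percentile)
  sorted = min_pvalues.sort
  # Integer division floors, e.g. 100 values at the 95th percentile -> index 5
  index = sorted.size * (100 - percentile) / 100
  p_value = sorted[index]
  [p_value, (-Math.log10(p_value)).round(1)]
end

min_pvalues = (1..20).map { |i| i * 1e-5 }.shuffle # made-up scores
p permutation_threshold(min_pvalues, 95)
p permutation_threshold(min_pvalues, 67)
```

Smaller p-values are more extreme, which is why the significant threshold sits near the start of the ascending list rather than near the end.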
@@ -378,5 +502,38 @@ else
378
502
  end
379
503
  end
380
504
 
505
+ # ---- Invoke parallel
506
+ if options[:parallel]
507
+ # parallel_cmds = ["echo 1","sleep 1 && echo 2", "false", "echo 3"]
508
+ cmd = parallel_cmds.join("\\n")
509
+
510
+ cmd = "echo -e \"#{cmd}\""
511
+ err = execute.call(cmd+"|parallel") # all jobs in parallel
512
+ if err != 0
513
+ [16,8,4,1].each do |jobs|
514
+ info.call("Failed to complete parallel run -- retrying with smaller RAM footprint!")
515
+ err = execute.call(cmd+"|parallel -j #{jobs}")
516
+ break if err == 0
517
+ end
518
+ if err != 0
519
+ info.call("Run failed!")
520
+ exit err
521
+ end
522
+ end
523
+ info.call("Run successful!")
524
+ end
381
525
  json_out.call
382
- exit 0
526
+
527
+ # copy all output files to the cache_dir. If a file exists only emit a warning
528
+ Dir.glob("*.txt", base: tmpdir) do | fn |
529
+ source = tmpdir + "/" + fn
530
+ dest = options[:cache_dir] + "/" + fn
531
+ if not File.exist?(dest) or options[:force]
532
+ info.call "Move #{source} to #{dest}"
533
+ FileUtils.mv source, dest, verbose: false
534
+ else
535
+ warning.call "File #{dest} already exists. Not overwriting"
536
+ end
537
+ end
538
+
539
+ end # tmpdir
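
The parallel step above first runs all jobs without a job limit and, on failure, retries with 16, 8, 4 and finally 1 concurrent jobs. The retry pattern on its own can be sketched as below. Both function names are hypothetical. The runner is passed in as a callable so the logic can be exercised without GNU parallel installed, and `parallel_runner` feeds commands over standard input instead of the `echo -e` pipe used in the diff.

```ruby
# Hypothetical sketch: run once without a limit, then retry with fewer jobs.
# `runner` takes a job limit (nil means no explicit limit) and returns an
# exit status.
def run_with_backoff(runner, limits = [16, 8, 4, 1])
  status = runner.call(nil)
  limits.each do |jobs|
    break if status.zero?
    warn "Parallel run failed, retrying with #{jobs} jobs"
    status = runner.call(jobs)
  end
  status
end

# Hypothetical runner that pipes newline-separated commands into GNU parallel.
def parallel_runner(commands)
  lambda do |jobs|
    flag = jobs ? " -j #{jobs}" : ''
    IO.popen("parallel#{flag}", 'w') { |io| io.puts(commands) }
    $?.exitstatus
  end
end

# Stand-in runner for demonstration: succeeds only once the limit reaches 8
attempts = []
flaky = ->(jobs) { attempts << jobs; jobs == 8 ? 0 : 1 }
status = run_with_backoff(flaky)
puts "status=#{status} attempts=#{attempts.inspect}"
```

With GNU parallel available, `run_with_backoff(parallel_runner(cmds))` would follow the same sequence against real commands.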
@@ -2,7 +2,7 @@ Gem::Specification.new do |s|
  s.name = 'bio-gemma-wrapper'
  s.version = File.read('VERSION')
  s.summary = "GEMMA with LOCO and permutations"
- s.description = "GEMMA wrapper adds LOCO and permutation support. Also caches K between runs with LOCO support"
+ s.description = "GEMMA wrapper adds LOCO and permutation support. Also runs in parallel and caches K between runs with LOCO support"
  s.authors = ["Pjotr Prins"]
  s.email = 'pjotr.public01@thebird.nl'
  s.files = ["bin/gemma-wrapper",
metadata CHANGED
@@ -1,17 +1,17 @@
  --- !ruby/object:Gem::Specification
  name: bio-gemma-wrapper
  version: !ruby/object:Gem::Version
- version: 0.97.1
+ version: 0.99.2
  platform: ruby
  authors:
  - Pjotr Prins
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2018-07-19 00:00:00.000000000 Z
+ date: 2021-08-08 00:00:00.000000000 Z
  dependencies: []
- description: GEMMA wrapper adds LOCO and permutation support. Also caches K between
- runs with LOCO support
+ description: GEMMA wrapper adds LOCO and permutation support. Also runs in parallel
+ and caches K between runs with LOCO support
  email: pjotr.public01@thebird.nl
  executables:
  - gemma-wrapper
@@ -43,8 +43,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubyforge_project:
- rubygems_version: 2.6.8
+ rubygems_version: 3.2.5
  signing_key:
  specification_version: 4
  summary: GEMMA with LOCO and permutations