bio-gemma-wrapper 0.92.2 → 0.99.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
- SHA1:
3
- metadata.gz: a5750dc833b64764d55eaec57f9ce981302659b9
4
- data.tar.gz: f8ebc55f80ca39d84f03f4bae4385212aa5e35f7
2
+ SHA256:
3
+ metadata.gz: 9ddfd904e74beebe0de1b97732d872fce171732965a835b101b9cc9be815bb05
4
+ data.tar.gz: 2dae1c019da23f2f87216694d641fc1eb852aa7800557bd10cfb08cb3425e844
5
5
  SHA512:
6
- metadata.gz: 634361cf98042f4653d3ea4bd19d883ec08f7a1edff5f7c44935b4ef58d0b17d007451a23f7304ae29dd08c477549459b3d0cb2bcbda3060a6641d9ad665d012
7
- data.tar.gz: 55f75d9a5208595e5f3ff1dffd530f17f4fdb3040c2311ea993d69b1c21001673d3825015d5e4ec327d450225470d078b46f2db84a364d27bdbcbd5852741404
6
+ metadata.gz: 38454a3f12dab85bef711051e73e20a015fe6b6d9c71bafada2197b9aef1aa0eabe3f3709cb0dc9d0c39f4cc454c15bc4d3aea5d06140ccde72fa13aa6285f51
7
+ data.tar.gz: 28e77a6995893245c501e602d488b5e0c504549fa91d8c94f902591b87b4454fe9b7923667dfacae2ab1dac7f6f7d814df1ec036b2b4f616dfd4b84c549d35d1
data/README.md CHANGED
@@ -1,10 +1,19 @@
1
- # GEMMA wrapper caches K between runs with LOCO support
1
+ [![gemma-wrapper gem version](https://badge.fury.io/rb/bio-gemma-wrapper.svg)](https://badge.fury.io/rb/bio-gemma-wrapper)
2
+
3
+ # GEMMA with LOCO, permutations and slurm support (and caching)
2
4
 
3
5
  ![Genetic associations identified in CFW mice using GEMMA (Parker et al,
4
6
  Nat. Genet., 2016)](cfw.gif)
5
7
 
6
8
  ## Introduction
7
9
 
10
+ Gemma-wrapper allows running GEMMA with LOCO, GEMMA with caching,
11
+ GEMMA in parallel (now the default), and GEMMA on PBS. Gemma-wrapper
12
+ is used to run GEMMA as part of the https://genenetwork.org/
13
+ environment.
14
+
15
+ Note that gemma-wrapper is projected to be integrated into gemma2/lib.
16
+
8
17
  GEMMA is a software toolkit for fast application of linear mixed
9
18
  models (LMMs) and related models to genome-wide association studies
10
19
  (GWAS) and other large-scale data sets.
@@ -12,16 +21,14 @@ models (LMMs) and related models to genome-wide association studies
12
21
  This repository contains gemma-wrapper, essentially a wrapper of
13
22
  GEMMA that provides support for caching the kinship or relatedness
14
23
  matrix (K) and caching LM and LMM computations with the option of full
15
- leave-one-chromosome-out genome scans (LOCO).
24
+ leave-one-chromosome-out genome scans (LOCO). Jobs can also be
25
+ submitted to HPC PBS, i.e., slurm.
16
26
 
17
27
  gemma-wrapper requires a recent version of GEMMA and essentially
18
28
  does a pass-through of all standard GEMMA invocation switches. On
19
29
  return gemma-wrapper can return a JSON object (--json) which is
20
30
  useful for web-services.
21
31
 
22
- Note that this a work in progress (WIP). What is described below
23
- should work.
24
-
25
32
  ## Installation
26
33
 
27
34
  Prerequisites are
@@ -30,8 +37,9 @@ Prerequisites are
30
37
  * Standard [Ruby >2.0 ](https://www.ruby-lang.org/en/) which comes on
31
38
  almost all Linux systems
32
39
 
33
- gemma-wrapper comes as a Ruby [gem](https://rubygems.org/gems/bio-gemma-wrapper) and
34
- can be installed with
40
+ gemma-wrapper comes as a Ruby
41
+ [gem](https://rubygems.org/gems/bio-gemma-wrapper) and can be
42
+ installed with
35
43
 
36
44
  gem install bio-gemma-wrapper
37
45
 
@@ -39,15 +47,18 @@ Invoke the tool with
39
47
 
40
48
  gemma-wrapper --help
41
49
 
42
- and it will render
50
+ and it will render something like
43
51
 
44
52
  ```
45
53
  Usage: gemma-wrapper [options] -- [gemma-options]
54
+ --permutate n Permutate # times by shuffling phenotypes
55
+ --permute-phenotypes filen Phenotypes to be shuffled in permutations
46
56
  --loco [x,y,1,2,3...] Run full LOCO
47
57
  --input filen JSON input variables (used for LOCO)
48
58
  --cache-dir path Use a cache directory
49
59
  --json Create output file in JSON format
50
60
  --force Force computation
61
+ --slurm [options] Submit to slurm PBS
51
62
  --q, --quiet Run quietly
52
63
  -v, --verbose Run verbosely
53
64
  --debug Show debug messages and keep intermediate output
@@ -65,6 +76,8 @@ Unpack it and run the tool as
65
76
 
66
77
  ./bin/gemma-wrapper --help
67
78
 
79
+ See below for using a GNU Guix environment.
80
+
68
81
  ## Usage
69
82
 
70
83
  gemma-wrapper picks up GEMMA from the PATH. To override that behaviour
@@ -91,11 +104,12 @@ the data files are found):
91
104
 
92
105
  Run it twice to see
93
106
 
94
- /tmp/3079151e14b219c3b243b673d88001c1675168b4.log.txt gemma-wrapper CACHE HIT!
107
+ /tmp/0bdd7add5e8f7d9af36b283d0341c115124273e0.log.txt CACHE HIT!
95
108
 
96
109
  gemma-wrapper computes the unique HASH value over the command
97
110
  line switches passed into GEMMA as well as the contents of the files
98
- passed in (here the genotype and phenotype files).
111
+ passed in (here the genotype and phenotype files; when computing K the phenotype file is
112
+ left out of the hash because GEMMA computes the same K for any phenotype).
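
The hashing described here can be sketched in a few lines of Ruby. This is a minimal illustration modelled on the `compute_hash` lambda that appears later in this diff, not the wrapper's exact code: `cache_key` is a hypothetical helper name, and the temporary files only stand in for real genotype and phenotype inputs.

```ruby
require 'digest/sha1'
require 'tempfile'

# Hypothetical helper: build a cache key from a GEMMA argument list.
# Arguments that name an existing file are replaced by the SHA1 of the
# file contents; all other switches are hashed as plain text.
def cache_key(gemma_args)
  args = gemma_args.dup
  pheno_idx = args.index('-p')
  # For a kinship run (-gk) drop "-p <file>" so that the phenotype
  # does not influence the key.
  args.slice!(pheno_idx, 2) if args.include?('-gk') && pheno_idx
  parts = args.map do |item|
    File.file?(item) ? Digest::SHA1.hexdigest(File.read(item)) : item
  end
  Digest::SHA1.hexdigest(parts.join(' '))
end

# Stand-in input files (contents are made up for the demonstration)
geno = Tempfile.new('geno')
geno.write("rs1 A B 0 1\n")
geno.flush
pheno = Tempfile.new('pheno')
pheno.write("1.0\n2.0\n")
pheno.flush

k_key   = cache_key(['-g', geno.path, '-p', pheno.path, '-gk'])
k_nop   = cache_key(['-g', geno.path, '-gk'])
gwa_key = cache_key(['-g', geno.path, '-p', pheno.path, '-lmm', '2'])
puts k_key == k_nop    # true: the phenotype is ignored for K
puts k_key == gwa_key  # false: different switches give a different key
```

Because file contents rather than file names are hashed, renaming an input does not change the contribution of its contents to the key, while editing it does.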
99
113
 
100
114
  You can also get JSON output on STDOUT by providing the --json switch
101
115
 
@@ -103,9 +117,10 @@ You can also get JSON output on STDOUT by providing the --json switch
103
117
  -g test/data/input/BXD_geno.txt.gz \
104
118
  -p test/data/input/BXD_pheno.txt \
105
119
  -gk \
106
- -debug
120
+ -debug > K.json
107
121
 
108
- prints out something that can be parsed with a calling program
122
+ K.json is something that can be parsed with a calling program, and is
123
+ also used below as input for the GWA step. Example:
109
124
 
110
125
  ```json
111
126
  {"warnings":[],"errno":0,"debug":[],"type":"K","files":[["/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.log.txt","/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.cXX.txt"]],"cache_hit":true,"gemma_command":"../gemma/bin/gemma -g test/data/input/BXD_geno.txt.gz -p test/data/input/BXD_pheno.txt -gk -debug -outdir /tmp -o 18ce786ab92064a7ee38a7422e7838abf91f5eb0"}
@@ -123,6 +138,23 @@ default. If you want something else provide a --cache-dir, e.g.
123
138
 
124
139
  will store K in ~/.gemma-cache.
125
140
 
141
+ ### GWA
142
+
143
+ Run the LMM using the K's captured earlier in K.json using the --input
144
+ switch
145
+
146
+ gemma-wrapper --json --loco --input K.json -- \
147
+ -g test/data/input/BXD_geno.txt.gz \
148
+ -p test/data/input/BXD_pheno.txt \
149
+ -c test/data/input/BXD_covariates2.txt \
150
+ -a test/data/input/BXD_snps.txt \
151
+ -lmm 2 -maf 0.1 \
152
+ -debug > GWA.json
153
+
154
+ Running it twice should show that GWA is not recomputed.
155
+
156
+ /tmp/9e411810ad341de6456ce0c6efd4f973356d0bad.log.txt CACHE HIT!
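
A calling program can sanity-check K.json before passing it to `--input`. The sketch below assumes only the fields visible in the README's JSON example (`type`, `errno`, `files`); `kinship_files` is a hypothetical helper, and the shortened file names are placeholders rather than real cache entries.

```ruby
require 'json'

# Hypothetical helper: validate a gemma-wrapper JSON record and return
# its file entries. Raises ArgumentError if the record did not come
# from a -gk (kinship) run or reports a non-zero errno.
def kinship_files(json_text)
  rec = JSON.parse(json_text)
  raise ArgumentError, "record is not -gk derived" if rec["type"] != "K"
  raise ArgumentError, "K run reported errno #{rec['errno']}" if rec["errno"] != 0
  rec["files"]
end

# Abbreviated record in the shape shown in the README (paths are placeholders)
example = '{"warnings":[],"errno":0,"debug":[],"type":"K",' \
          '"files":[["/tmp/example.log.txt","/tmp/example.cXX.txt"]],' \
          '"cache_hit":true}'

kinship_files(example).each do |entry|
  # In the README example the last element of each entry is the K matrix
  puts entry.last
end
```

The wrapper itself performs a similar `type` check on the `--input` file before starting the GWA step.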
157
+
126
158
  ### LOCO
127
159
 
128
160
  Recent versions of GEMMA have LOCO support for a single chromosome
@@ -158,6 +190,45 @@ GWA.json contains the file names of every chromosome
158
190
  The -k switch is injected automatically. Again output switches are not
159
191
  allowed (-o, -outdir)
160
192
 
193
+ ### Permutations
194
+
195
+ Permutations can be run with and without LOCO. First create K
196
+
197
+ gemma-wrapper --json -- \
198
+ -g test/data/input/BXD_geno.txt.gz \
199
+ -p test/data/input/BXD_pheno.txt \
200
+ -gk \
201
+ -debug > K.json
202
+
203
+ Next, using K.json, permute the phenotypes with something like
204
+
205
+ gemma-wrapper --json --loco --input K.json \
206
+ --permutate 100 --permute-phenotypes test/data/input/BXD_pheno.txt -- \
207
+ -g test/data/input/BXD_geno.txt.gz \
208
+ -p test/data/input/BXD_pheno.txt \
209
+ -c test/data/input/BXD_covariates2.txt \
210
+ -a test/data/input/BXD_snps.txt \
211
+ -lmm 2 -maf 0.1 \
212
+ -debug > GWA.json
213
+
214
+ This should get the estimated 95% (significant) and 67% (suggestive) thresholds:
215
+
216
+ ["95 percentile (significant) ", 1.92081e-05, 4.7]
217
+ ["67 percentile (suggestive) ", 5.227785e-05, 4.3]
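
How such thresholds can be derived from the permutation results is sketched below, following the index arithmetic used in the permutation code later in this diff. `permutation_thresholds` is a hypothetical helper name, and the input values are synthetic; in a real run each value would be the smallest p-value found in one permuted genome scan.

```ruby
# Hypothetical helper: derive significance thresholds from the minimum
# p-value of each permutation. Returns, for each threshold, the p-value
# and its -log10 rounded to one decimal.
def permutation_thresholds(min_pvalues)
  ls = min_pvalues.sort
  # The p-value below which (1 - fraction) of the permutation minima fall
  pick = lambda { |fraction| ls[(ls.size - ls.size * fraction).floor] }
  significant = pick.call(0.95)
  suggestive  = pick.call(0.67)
  {
    significant: [significant, (-Math.log10(significant)).round(1)],
    suggestive:  [suggestive,  (-Math.log10(suggestive)).round(1)]
  }
end

# Synthetic minima for 100 permutations (made-up values)
minima = (1..100).map { |i| i * 1e-5 }
p permutation_thresholds(minima)
```

With 100 permutations the significant threshold is the sixth smallest minimum (index 5) and the suggestive threshold is the 34th smallest (index 33), so more permutations give a finer-grained estimate.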
218
+
219
+ ### Slurm PBS
220
+
221
+ To run gemma-wrapper on HPC use the '--slurm' switch.
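
What `--slurm` does with each GEMMA command can be approximated as follows. This is a sketch based on the batch-script template in the `invoke_gemma` change later in this diff; the helper names `slurm_script` and `sbatch_command`, the job name and the `/tmp/job.sh` path are illustrative assumptions, not part of the tool.

```ruby
# Hypothetical helper: wrap a command in a minimal Slurm batch script
# using the same directives as the wrapper's template.
def slurm_script(cmd, job_name, time_limit = "20:00")
  [
    "#!/bin/bash",
    "#SBATCH --job-name=#{job_name}",
    "#SBATCH --ntasks=1",
    "#SBATCH --time=#{time_limit}",
    "srun #{cmd}"
  ].join("\n") + "\n"
end

# Hypothetical helper: build the sbatch invocation, keeping any extra
# options separated from the script path by a space.
def sbatch_command(script_path, slurm_opts = "")
  ["sbatch", slurm_opts, script_path].reject(&:empty?).join(" ")
end

puts slurm_script("gemma -g test/data/input/BXD_geno.txt.gz -gk", "gemma-K")
puts sbatch_command("/tmp/job.sh", "--partition=short")
```

Options given as `--slurm=opts` would take the place of `--partition=short` here; which options are valid depends on the local Slurm installation.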
222
+
223
+ ## Development
224
+
225
+ We use GNU Guix for development and deployment. Use the [.guix-deploy](.guix-deploy) script in the checked out git repo:
226
+
227
+ ```
228
+ source .guix-deploy
229
+ ruby bin/gemma-wrapper --help
230
+ ```
231
+
161
232
  ## Copyright
162
233
 
163
- Copyright (c) 2017 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
234
+ Copyright (c) 2017-2021 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.92.2
1
+ 0.99.1
data/bin/gemma-wrapper CHANGED
@@ -4,9 +4,10 @@
4
4
  # Author:: Pjotr Prins
5
5
  # License:: GPL3
6
6
  #
7
- # Copyright (C) 2017 Pjotr Prins <pjotr.prins@thebird.nl>
7
+ # Copyright (C) 2017-2021 Pjotr Prins <pjotr.prins@thebird.nl>
8
8
 
9
- USAGE = "GEMMA wrapper example:
9
+ USAGE = "
10
+ GEMMA wrapper example:
10
11
 
11
12
  Simple caching of K computation with
12
13
 
@@ -34,9 +35,13 @@ USAGE = "GEMMA wrapper example:
34
35
  -lmm 2 -maf 0.1 \\
35
36
  -debug > GWA.json
36
37
 
38
+ Gemma gets used from the path. You can override by setting
39
+
40
+ env GEMMA_COMMAND=path/bin/gemma gemma-wrapper ...
37
41
  "
38
- GEMMA_V_MAJOR = 97
39
- GEMMA_V_MINOR = 2
42
+ # These are used for testing compatibility with the gemma tool
43
+ GEMMA_V_MAJOR = 98
44
+ GEMMA_V_MINOR = 1
40
45
 
41
46
  basepath = File.dirname(File.dirname(__FILE__))
42
47
  $: << File.join(basepath,'lib')
@@ -59,8 +64,11 @@ if not gemma_command
59
64
  end
60
65
  end
61
66
 
67
+
68
+ require 'digest/sha1'
62
69
  require 'fileutils'
63
70
  require 'optparse'
71
+ require 'tempfile'
64
72
  require 'tmpdir'
65
73
 
66
74
  split_at = ARGV.index('--')
@@ -68,12 +76,22 @@ if split_at
68
76
  gemma_args = ARGV[split_at+1..-1]
69
77
  end
70
78
 
71
- options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir() }
79
+ options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir(), quiet: false, parallel: true }
72
80
 
73
81
  opts = OptionParser.new do |o|
74
- o.banner = "Usage: #{File.basename($0)} [options] -- [gemma-options]"
82
+ o.banner = "\nUsage: #{File.basename($0)} [options] -- [gemma-options]"
83
+
84
+ o.on('--permutate n', Integer, 'Permutate # times by shuffling phenotypes') do |lst|
85
+ options[:permutate] = lst
86
+ options[:force] = true
87
+ end
88
+
89
+ o.on('--permute-phenotypes filen',String, 'Phenotypes to be shuffled in permutations') do |phenotypes|
90
+ options[:permute_phenotypes] = phenotypes
91
+ raise "Phenotype input file #{phenotypes} does not exist" if !File.exist?(phenotypes)
92
+ end
75
93
 
76
- o.on('--loco [x,y,1,2,3...]', Array, 'Run full LOCO') do |lst|
94
+ o.on('--loco [x,y,1,2,3...]', Array, 'Run full leave-one-chromosome-out (LOCO)') do |lst|
77
95
  options[:loco] = lst
78
96
  end
79
97
 
@@ -90,10 +108,22 @@ opts = OptionParser.new do |o|
90
108
  options[:json] = b
91
109
  end
92
110
 
93
- o.on("--force", "Force computation") do |q|
111
+ o.on("--force", "Force computation (override cache)") do |q|
94
112
  options[:force] = true
95
113
  end
96
114
 
115
+ o.on("--no-parallel", "Do not run jobs in parallel") do |b|
116
+ options[:parallel] = false
117
+ end
118
+
119
+ o.on("--slurm[=opts]",String,"Use slurm PBS for submitting jobs") do |slurm|
120
+ options[:slurm_opts] = ""
121
+ options[:slurm] = true
122
+ if slurm
123
+ options[:slurm_opts] = slurm
124
+ end
125
+ end
126
+
97
127
  o.on("--q", "--quiet", "Run quietly") do |q|
98
128
  options[:quiet] = true
99
129
  end
@@ -102,15 +132,20 @@ opts = OptionParser.new do |o|
102
132
  options[:verbose] = true
103
133
  end
104
134
 
105
- o.on("--debug", "Show debug messages and keep intermediate output") do |v|
135
+ o.on("-d", "--debug", "Show debug messages and keep intermediate output") do |v|
106
136
  options[:debug] = true
107
137
  end
108
138
 
139
+ o.on("--dry-run", "Show commands, but don't execute") do |b|
140
+ options[:dry_run] = b
141
+ end
142
+
109
143
  o.on('--','Anything after gets passed to GEMMA') do
110
144
  o.terminate()
111
145
  end
112
146
 
113
147
  o.separator ""
148
+
114
149
  o.on_tail('-h', '--help', 'display this help and exit') do
115
150
  options[:show_help] = true
116
151
  end
@@ -129,6 +164,7 @@ json_out = lambda do
129
164
  print record.to_json if options[:json]
130
165
  end
131
166
 
167
+ # ---- Some error handlers
132
168
  error = lambda do |*msg|
133
169
  if options[:json]
134
170
  record[:error] = *msg.join(" ")
@@ -137,12 +173,14 @@ error = lambda do |*msg|
137
173
  end
138
174
  raise *msg
139
175
  end
176
+
140
177
  debug = lambda do |*msg|
141
178
  if options[:debug]
142
179
  record[:debug].push *msg.join("") if options[:json]
143
180
  OUTPUT.print "DEBUG: ",*msg,"\n"
144
181
  end
145
182
  end
183
+
146
184
  warning = lambda do |*msg|
147
185
  record[:warnings].push *msg.join("")
148
186
  OUTPUT.print "WARNING: ",*msg,"\n"
@@ -152,18 +190,32 @@ info = lambda do |*msg|
152
190
  OUTPUT.print *msg,"\n" if !options[:quiet]
153
191
  end
154
192
 
193
+ # ---- Start banner
194
+
155
195
  GEMMA_K_VERSION=version
156
- GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017\n"
196
+ GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017-2021\n"
157
197
  info.call GEMMA_K_BANNER
158
198
 
159
199
  # Check gemma version
160
200
  GEMMA_COMMAND=options[:gemma_command]
161
- gemma_version_header = `#{GEMMA_COMMAND}`.split("\n").grep(/Version/)[0].strip
162
- info.call "Using GEMMA ",gemma_version_header,"\n"
201
+ info.call "NOTE: gemma-wrapper is soon to be replaced by gemma2/lib"
202
+
203
+ begin
204
+ GEMMA_INFO = `#{GEMMA_COMMAND}`
205
+ rescue Errno::ENOENT
206
+ GEMMA_COMMAND = "gemma" if not GEMMA_COMMAND
207
+ error.call "<#{GEMMA_COMMAND}> command not found"
208
+ end
209
+
210
+ gemma_version_header = GEMMA_INFO.split("\n").grep(/GEMMA|Version/)[0].strip
211
+ info.call "Using ",gemma_version_header,"\n"
163
212
  gemma_version = gemma_version_header.split(/[,\s]+/)[1]
164
213
  v_version, v_major, v_minor = gemma_version.split(".")
214
+ info.call "Found #{gemma_version}, comparing against expected v0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}"
215
+
216
+ info.call gemma_version_header
165
217
 
166
- error.call "GEMMA version is out of date. Update GEMMA to 0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}!" if v_major.to_i < GEMMA_V_MAJOR or (v_major.to_i == GEMMA_V_MAJOR and (v_minor == nil or v_minor.to_i < GEMMA_V_MINOR))
218
+ warning.call "GEMMA version is out of date. Update GEMMA to 0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}!" if v_major.to_i < GEMMA_V_MAJOR or (v_major.to_i == GEMMA_V_MAJOR and (v_minor != nil and v_minor.to_i < GEMMA_V_MINOR))
167
219
 
168
220
  options[:gemma_version_header] = gemma_version_header
169
221
  options[:gemma_version] = gemma_version
@@ -178,25 +230,82 @@ if RUBY_VERSION =~ /^1/
178
230
  warning "runs on Ruby 2.x only\n"
179
231
  end
180
232
 
233
+ debug.call(options) # some debug output
234
+ debug.call(record)
235
+
236
+ DO_COMPUTE_KINSHIP = gemma_args.include?("-gk")
237
+ DO_COMPUTE_GWA = !DO_COMPUTE_KINSHIP
238
+
239
+ # ---- Set up parallel
240
+ if options[:parallel]
241
+ begin
242
+ PARALLEL_INFO = `parallel --help`
243
+ rescue Errno::ENOENT
244
+ error.call "<parallel> command not found"
245
+ end
246
+ parallel_cmds = []
247
+ end
248
+
181
249
  # ---- Compute HASH on inputs
182
250
  hashme = []
183
251
  geno_idx = gemma_args.index '-g'
184
- raise "Expected GEMMA -g switch" if geno_idx == nil
185
- hashme = gemma_args
252
+ raise "Expected GEMMA -g genotype file switch" if geno_idx == nil
253
+ pheno_idx = gemma_args.index '-p'
186
254
 
187
- require 'digest/sha1'
188
- debug.call "Hashing on ",hashme,"\n"
189
- hashes = []
190
- hashme.each do | item |
191
- if File.exist?(item)
192
- hashes << Digest::SHA1.hexdigest(File.read(item))
193
- debug.call [item,hashes.last]
255
+ if DO_COMPUTE_GWA and options[:permute_phenotypes]
256
+ raise "Did not expect GEMMA -p phenotype with permutations (only use --permute-phenotypes)" if pheno_idx
257
+ end
258
+
259
+
260
+ execute = lambda { |cmd|
261
+ info.call("Executing: #{cmd}")
262
+ err = 0
263
+ if not options[:debug]
264
+ # send output to stderr line by line
265
+ IO.popen("#{cmd}") do |io|
266
+ while s = io.gets
267
+ $stderr.print s
268
+ end
269
+ io.close
270
+ err = $?.to_i
271
+ end
194
272
  else
195
- hashes << item
273
+ $stderr.print `#{cmd}`
274
+ err = $?.to_i
196
275
  end
276
+ err
277
+ }
278
+
279
+ hashme =
280
+ if DO_COMPUTE_KINSHIP and pheno_idx != nil
281
+ # Remove the phenotype file from the hash for GRM computation
282
+ gemma_args[0..pheno_idx-1] + gemma_args[pheno_idx+2..-1]
283
+ else
284
+ gemma_args
285
+ end
286
+
287
+ compute_hash = lambda do | phenofn = nil |
288
+ # Compute a HASH on the inputs
289
+ debug.call "Hashing on ",hashme,"\n"
290
+ hashes = []
291
+ hm = if phenofn
292
+ hashme + ["-p", phenofn]
293
+ else
294
+ hashme
295
+ end
296
+ debug.call(hm)
297
+ hm.each do | item |
298
+ if File.file?(item)
299
+ hashes << Digest::SHA1.hexdigest(File.read(item))
300
+ debug.call [item,hashes.last]
301
+ else
302
+ hashes << item
303
+ end
304
+ end
305
+ Digest::SHA1.hexdigest hashes.join(' ')
197
306
  end
198
- HASH = Digest::SHA1.hexdigest hashes.join(' ')
199
307
 
308
+ HASH = compute_hash.call()
200
309
  options[:hash] = HASH
201
310
 
202
311
  # Create cache dir
@@ -210,26 +319,49 @@ GEMMA_ARGS = gemma_args
210
319
 
211
320
  debug.call "Options: ",options,"\n" if !options[:quiet]
212
321
 
213
- invoke_gemma = lambda do |extra_args, cache_hit = false|
214
- cmd="#{GEMMA_COMMAND} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
322
+ invoke_gemma = lambda do |extra_args, cache_hit = false, chr = "full", permutation = 1|
323
+ cmd = "#{GEMMA_COMMAND} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
215
324
  record[:gemma_command] = cmd
216
325
  return if cache_hit
217
- # debug.call cmd
326
+ if options[:slurm]
327
+ info.call cmd
328
+ hashi = HASH
329
+ prefix = options[:cache_dir]+'/'+hashi
330
+ scriptfn = prefix+".#{chr}.#{permutation}-pbs.sh"
331
+ script = "#!/bin/bash
332
+ #SBATCH --job-name=gemma-#{scriptfn}
333
+ #SBATCH --ntasks=1
334
+ #SBATCH --time=20:00
335
+ srun #{cmd}
336
+ "
337
+ debug.call(script)
338
+ File.open(scriptfn,"w") { |f|
339
+ f.write(script)
340
+ }
341
+ cmd = "sbatch "+options[:slurm_opts]+" "+scriptfn
342
+ end
218
343
  errno =
219
344
  if options[:json]
220
345
  # capture output
221
346
  err = 0
222
- IO.popen(cmd) do |io|
223
- while s = io.gets
224
- $stderr.print s
225
- end
226
- io.close
227
- err = $?.to_i
347
+ if options[:dry_run]
348
+ info.call("Would have invoked: ",cmd)
349
+ elsif options[:parallel]
350
+ info.call("Add parallel job: ",cmd)
351
+ parallel_cmds << cmd
352
+ else
353
+ err = execute.call(cmd)
228
354
  end
229
355
  err
230
356
  else
231
- system(cmd)
232
- $?.exitstatus
357
+ if options[:dry_run]
358
+ info.call("Would have invoked ",cmd)
359
+ 0
360
+ else
361
+ debug.call("Invoking ",cmd) if options[:debug]
362
+ system(cmd)
363
+ $?.exitstatus
364
+ end
233
365
  end
234
366
  if errno != 0
235
367
  debug.call "Gemma exit ",errno
@@ -240,10 +372,12 @@ invoke_gemma = lambda do |extra_args, cache_hit = false|
240
372
  end
241
373
 
242
374
  # returns datafn, logfn, cache_hit
243
- cache = lambda do | chr, ext |
375
+ cache = lambda do | chr, ext, h=HASH, permutation=0 |
244
376
  inject = (chr==nil ? "" : ".#{chr}" )+ext
245
- hashi = HASH+inject
246
- prefix = options[:cache_dir]+'/'+hashi
377
+ hashi = (chr==nil ? h : h+inject)
378
+ prefix = options[:cache_dir]+'/'+hashi+(permutation!=0 ? "."+permutation.to_s : "")
379
+ # for chr 3 and permutation 1 forms something like
380
+ # /tmp/1b700-a996f.3.cXX.txt.1.log.txt
247
381
  logfn = prefix+".log.txt"
248
382
  datafn = prefix+ext
249
383
  record[:files] ||= []
@@ -260,6 +394,7 @@ cache = lambda do | chr, ext |
260
394
  return hashi,false
261
395
  end
262
396
 
397
+ # ---- Compute K
263
398
  kinship = lambda do | chr = nil |
264
399
  record[:type] = "K"
265
400
  ext = case (GEMMA_ARGS[GEMMA_ARGS.index('-gk')+1]).to_i
@@ -277,21 +412,23 @@ kinship = lambda do | chr = nil |
277
412
  end
278
413
  end
279
414
 
280
- gwas = lambda do | chr, kfn |
415
+ # ---- Run GWA
416
+ gwas = lambda do | chr, kfn, pfn, permutation=0 |
281
417
  record[:type] = "GWA"
282
- error.call "Do not use the GEMMA -k switch!" if GEMMA_ARGS.include? '-k'
283
- hashi, cache_hit = cache.call chr,".assoc.txt"
418
+ error.call "Do not use the GEMMA -k switch with gemma-wrapper - it is automatic!" if GEMMA_ARGS.include? '-k' # K is automatic
419
+ # Update hash for each permutation
420
+ hash = compute_hash.call(pfn)
421
+ hashi, cache_hit = cache.call(chr,".assoc.txt",hash,permutation)
284
422
  if not cache_hit
285
- if chr != nil
286
- invoke_gemma.call [ '-loco', chr, '-k', kfn, '-o', hashi ]
287
- else
288
- error.call "Not supported"
289
- end
423
+ args = [ '-k', kfn, '-o', hashi ]
424
+ args << [ '-loco', chr ] if chr != nil
425
+ args << [ '-p', pfn ] if pfn
426
+ invoke_gemma.call args,false,chr,permutation
290
427
  end
291
428
  end
292
429
 
293
430
  LOCO = options[:loco]
294
- if GEMMA_ARGS.include? '-gk'
431
+ if DO_COMPUTE_KINSHIP
295
432
  # compute K
296
433
  info.call LOCO
297
434
  if LOCO != nil
@@ -303,14 +440,80 @@ if GEMMA_ARGS.include? '-gk'
303
440
  kinship.call # no LOCO
304
441
  end
305
442
  else
306
- # GWAS
443
+ # DO_COMPUTE_GWA
307
444
  json_in = JSON.parse(File.read(options[:input]))
308
445
  raise "JSON problem, file #{options[:input]} is not -gk derived" if json_in["type"] != "K"
446
+
447
+ pfn = options[:permute_phenotypes] # can be nil
309
448
  k_files = json_in["files"].map { |rec| [rec[0],rec[2]] }
310
- k_files.each do | chr, kfn |
311
- gwas.call(chr,kfn)
449
+ k_files.each do | chr, kfn | # call a GWA for each chromosome
450
+ gwas.call(chr,kfn,pfn)
451
+ end
452
+ # Permute
453
+ if options[:permutate]
454
+ ps = []
455
+ raise "You should supply --permute-phenotypes with gemma-wrapper --permutate" if not pfn
456
+ File.foreach(pfn).with_index do |line, line_num|
457
+ ps << line
458
+ end
459
+ score_list = []
460
+ debug.call(options[:permutate],"x permutations")
461
+ (1..options[:permutate]).each do |permutation|
462
+ $stderr.print "Iteration ",permutation,"\n"
463
+ # Create a shuffled phenotype file
464
+ file = File.open("phenotypes-#{permutation}","w")
465
+ tmp_pfn = file.path
466
+ p tmp_pfn
467
+ ps.shuffle.each do | l |
468
+ file.print(l)
469
+ end
470
+ file.close
471
+ k_files.each do | chr, kfn | # call a GWA for each chromosome
472
+ gwas.call(chr,kfn,tmp_pfn,permutation)
473
+ end
474
+ score_min = 1000.0
475
+ if false and not options[:slurm]
476
+ # p [:HEY,record[:files].last]
477
+ assocfn = record[:files].last[2]
478
+ debug.call("Reading ",assocfn)
479
+ File.foreach(assocfn).with_index do |assoc, assoc_line_num|
480
+ if assoc_line_num > 0
481
+ value = assoc.strip.split(/\t/).last.to_f
482
+ score_min = value if value < score_min
483
+ end
484
+ end
485
+ end
486
+ score_list << score_min
487
+ end
488
+ exit 0 if options[:slurm]
489
+ ls = score_list.sort
490
+ p ls
491
+ significant = ls[(ls.size - ls.size*0.95).floor]
492
+ suggestive = ls[(ls.size - ls.size*0.67).floor]
493
+ p ["95 percentile (significant) ",significant,(-Math.log10(significant)).round(1)]
494
+ p ["67 percentile (suggestive) ",suggestive,(-Math.log10(suggestive)).round(1)]
495
+ exit 0
312
496
  end
313
497
  end
314
498
 
499
+ # ---- Invoke parallel
500
+ if options[:parallel]
501
+ # parallel_cmds = ["echo 1","sleep 1 && echo 2", "false", "echo 3"]
502
+ cmd = parallel_cmds.join("\\n")
503
+
504
+ cmd = "echo -e \"#{cmd}\""
505
+ err = execute.call(cmd+"|parallel") # all jobs in parallel
506
+ if err != 0
507
+ [16,8,4,1].each do |jobs|
508
+ info.call("Failed to complete parallel run -- retrying with smaller RAM footprint!")
509
+ err = execute.call(cmd+"|parallel -j #{jobs}")
510
+ break if err == 0
511
+ end
512
+ if err != 0
513
+ info.call("Run failed!")
514
+ exit err
515
+ end
516
+ end
517
+ info.call("Run successful!")
518
+ end
315
519
  json_out.call
316
- exit 0
@@ -1,8 +1,8 @@
1
1
  Gem::Specification.new do |s|
2
2
  s.name = 'bio-gemma-wrapper'
3
3
  s.version = File.read('VERSION')
4
- s.summary = "Cache GEMMA with LOCO"
5
- s.description = "GEMMA wrapper caches K between runs with LOCO support"
4
+ s.summary = "GEMMA with LOCO and permutations"
5
+ s.description = "GEMMA wrapper adds LOCO and permutation support. Also caches K between runs with LOCO support"
6
6
  s.authors = ["Pjotr Prins"]
7
7
  s.email = 'pjotr.public01@thebird.nl'
8
8
  s.files = ["bin/gemma-wrapper",
metadata CHANGED
@@ -1,16 +1,17 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: bio-gemma-wrapper
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.92.2
4
+ version: 0.99.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Pjotr Prins
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2017-09-03 00:00:00.000000000 Z
11
+ date: 2021-07-11 00:00:00.000000000 Z
12
12
  dependencies: []
13
- description: GEMMA wrapper caches K between runs with LOCO support
13
+ description: GEMMA wrapper adds LOCO and permutation support. Also caches K between
14
+ runs with LOCO support
14
15
  email: pjotr.public01@thebird.nl
15
16
  executables:
16
17
  - gemma-wrapper
@@ -43,8 +44,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
43
44
  version: '0'
44
45
  requirements: []
45
46
  rubyforge_project:
46
- rubygems_version: 2.5.1
47
+ rubygems_version: 2.7.6.2
47
48
  signing_key:
48
49
  specification_version: 4
49
- summary: Cache GEMMA with LOCO
50
+ summary: GEMMA with LOCO and permutations
50
51
  test_files: []